Title: Ambitious title?
1 Ambitious title?
- Confidence intervals, design effects and significance tests for surveys.
- How to calculate sample numbers when planning a survey.
2 Summary
- Statistical inference
- Design based
- Model based
- Confidence intervals and hypothesis tests: general
- Their modification for survey designs
- Design effects and design factors
- Calculation of sample numbers for studies
- Their modification for complex surveys
3 Statistical inference
- Making inferences about some aspect of the population: using observations to draw conclusions about the population as it is now, or how it will evolve in future
- Data are what we are given
- Inference allows us to turn them into information
4 Elements needed for statistical inference: design based
- Want to learn something about a population
- You have
- A model of how the sample was selected from the population.
- Some data obtained from the sample
- Knowledge of how to estimate!
- E.g. obtain data on the income of 10,000 people from a population of 5 million.
- Need inference to estimate the income distribution of the whole 5 million and to know how close this estimate is to the population value
5 Elements needed for statistical inference: model based
- You have
- A model that could have generated the data for your population, along with ideas about what current and future populations this might generalise to.
- Some data that can be assumed to be generated by this model.
- Knowledge of how to carry out the inference!
- E.g. obtain data on the income of 10,000 people from a population, and assume that the income distribution follows some mathematical distribution
- Need inference about the assumed model for the income distribution of the whole 5 million and how close your estimate will be to the true value
6 How do design and model based inferences differ?
- Conceptually poles apart
- In practice they give the same answers
- Except when numbers are small
- Or when a large proportion of the population has been sampled
- But it's good to think about what you are doing and decide which type fits your problem
7 Next set of results
- Apply to a simple unstructured sample
- No clustering
- No stratification
- No weighting
- Taken from a population with replacement (not a problem in model based inference)
- Exactly the same large-sample results apply for model-based and design-based inferences
8 Mean of 9 x's
[Figure: the mean of a sample of 9 observations x, and the unknown population mean $\mu$]
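9 Standard error of the mean
The sample mean is approximately normally distributed, with s.d. $\sigma/\sqrt{n}$, where $\sigma$ is the population standard deviation and $n$ is the sample size.
The data are fixed, so this tells us where $\mu$ is likely to be.
$\sigma/\sqrt{n}$ is called the standard error of the sample mean, sometimes written s.e.mean. It measures the expected distance of the true mean from the mean of the observed sample.
A $100(1-\alpha)\%$ confidence interval for $\mu$ from the normal distribution is
$$\bar{x} \pm Z \times \text{s.e.mean}$$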
10 Values of Z for confidence intervals
- 95% c.i. gives Z = 1.96
- 99% c.i. gives Z = 2.58
- 68% c.i. gives Z = 1
- 90% c.i. gives Z = 1.64
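As a rough illustration of where these Z values come from and how they enter the interval for a mean, the sketch below uses only Python's standard library. The function names `z_value` and `mean_confidence_interval` and the income figures are made up for illustration. It assumes a simple random sample and estimates the standard error by s divided by the square root of n, where s is the sample standard deviation.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_value(confidence):
    """Two-sided normal multiplier for a given confidence level (e.g. 0.95)."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def mean_confidence_interval(data, confidence=0.95):
    """Normal-approximation interval: sample mean +/- Z * s / sqrt(n)."""
    n = len(data)
    centre = mean(data)
    se = stdev(data) / sqrt(n)  # estimated standard error of the mean
    z = z_value(confidence)
    return centre - z * se, centre + z * se


# Hypothetical incomes (in thousands) for 9 people, for illustration only.
incomes = [18.2, 25.1, 31.4, 22.8, 40.3, 27.9, 19.5, 35.0, 29.6]

for level in (0.68, 0.90, 0.95, 0.99):
    print(level, round(z_value(level), 2))
print(mean_confidence_interval(incomes))
```

With only 9 observations the normal approximation is crude; the sketch is meant to show the arithmetic, not to recommend the method for samples this small.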
11 We can use it for proportions too
- Want to estimate a proportion p, e.g. the proportion of 20 year olds who use the internet
- If r of the n people sampled have the characteristic, then r/n estimates p
- with standard error $\sqrt{p(1-p)/n}$
- To use this formula we replace p with its estimate r/n
- A rule of thumb is that this approximation is OK if the smaller of r and (n-r) is greater than 5.
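The proportion calculation can be sketched in a few lines of Python. This is an illustration under the simple-sample assumptions of these slides, not a survey-ready routine: the function name `proportion_interval` and the counts 160 and 200 are hypothetical, and the rule of thumb is enforced simply by raising an error.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(r, n, confidence=0.95):
    """Return (estimate, standard error, lower, upper) for r out of n."""
    # Rule of thumb from the slide: smaller of r and n - r should exceed 5.
    if min(r, n - r) <= 5:
        raise ValueError("normal approximation doubtful: min(r, n - r) <= 5")
    p_hat = r / n
    se = sqrt(p_hat * (1.0 - p_hat) / n)  # p replaced by its estimate r/n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return p_hat, se, p_hat - z * se, p_hat + z * se


# Hypothetical: 160 of 200 sampled 20 year olds use the internet.
p_hat, se, lower, upper = proportion_interval(160, 200)
print(p_hat, se, lower, upper)
```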
12 Are these formulae good enough?
- Yes, unless your survey is too small to be any use
- They extend easily to differences in means and proportions
- Similar approximate results apply to regression models and logistic regressions
- BUT they only apply to simple samples
13 But my data are more complicated than this. And nobody will let me put standard errors or confidence intervals in my report
- A goal of a good statistical report is that it should not include any tables or graphs where what seems to be information is just the result of chance variation (noise).
- Set out your task in terms of an outcome predicted from other factors
- Carry out a set of regression predictions
- Base the tables to go in the report on the regression models that are found to be more than chance effects
14 Inferences for complex surveys
- The usual formulae and regression models don't hold
- Most surveys use weighting
- And allowances for clustering and stratification have to be made
- Software that modifies the results we have just discussed and calculates them correctly for complex surveys is now available
15 Two main methods are used
- Taylor linearisation: the theory of this was all worked out in the 1940s and 50s
- Replication methods (jackknives and bootstraps): 1960s and 1970s
- Only now is software readily available to do things properly
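To give a feel for what a replication method does, here is a minimal sketch of a delete-one-cluster jackknife for a weighted mean. It assumes an unstratified design with clusters (PSUs) treated as sampled with replacement, which is much simpler than most real surveys; the function names and the data are hypothetical, and proper survey software should be preferred in practice.

```python
def weighted_mean(values, weights):
    """Weighted mean of the values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def jackknife_se(values, weights, clusters):
    """Delete-one-cluster jackknife standard error of a weighted mean."""
    labels = sorted(set(clusters))
    g = len(labels)
    full = weighted_mean(values, weights)
    replicates = []
    for label in labels:
        # Drop one whole cluster and re-estimate from the rest.
        keep = [i for i, c in enumerate(clusters) if c != label]
        replicates.append(
            weighted_mean([values[i] for i in keep], [weights[i] for i in keep])
        )
    variance = (g - 1) / g * sum((rep - full) ** 2 for rep in replicates)
    return variance ** 0.5


# Hypothetical data: 8 respondents in 4 clusters, with survey weights.
values = [21.0, 24.0, 30.0, 33.0, 18.0, 20.0, 27.0, 29.0]
weights = [1.2, 0.8, 1.0, 1.0, 1.5, 0.9, 1.1, 0.7]
clusters = ["a", "a", "b", "b", "c", "c", "d", "d"]
print(weighted_mean(values, weights), jackknife_se(values, weights, clusters))
```

When every observation is its own cluster and all weights are equal, this reduces to the usual standard error of the mean, which is a handy sanity check.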
16 Getting by without the correct software
- Carry out an analysis using an ordinary computer package (e.g. SAS, SPSS simple procedures)
- But use a weight in the analysis to get results that will correct the bias in the estimates
- Your weighted analysis will get you the wrong standard errors and wrong tests, but the estimates will be about right.
- Use design effect tables to get some idea of the standard errors
17 Using the correct software
- Is not difficult: the PEAS web site explains how
- Routines are available in SAS, SPSS, STATA and R
- But it does mean that you need to get details of the survey design
- E.g. PSU and stratification variables need to be available
- Easier for you than for me
18 Getting by without the correct software
- Use a table of design effects (DE)
- Often published with the surveys
- To get a s.e. from a complex survey
- Calculate the design factor (DF) as the square root of the DE
- Multiply the s.e. from a simple analysis by DF
- For most household surveys DEs vary from about 0.8 to 2 or 3.
- This is a rough and ready method and will only work if weights are not too far from 1.0
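The design factor adjustment is a one-line calculation. The sketch below shows it in Python; the function name `adjusted_standard_error` and the numbers are hypothetical, and the result is only as good as the published design effect you plug in.

```python
from math import sqrt


def adjusted_standard_error(simple_se, design_effect):
    """Multiply a simple-analysis s.e. by the design factor DF = sqrt(DE)."""
    if design_effect <= 0:
        raise ValueError("design effect must be positive")
    design_factor = sqrt(design_effect)
    return simple_se * design_factor


# Hypothetical: s.e. of 0.028 from a simple analysis, published DE of 2.
print(adjusted_standard_error(0.028, 2.0))
```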
19 Disadvantages of this
- DEs are not constant for a survey
- They are also different (usually lower) when subgroups of a survey are selected
- They may also be lower in complicated models, like regressions, where it is also very hard to know how to apply them.
- Methods are approximate
20 Uses of design effects (DEs)
- They tell you about how well your survey design has worked
- Most survey software produces estimates of design effects with its output
- A design effect of 2 means your effective sample size is halved
- It is good to have such estimates when planning sample numbers for surveys.
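The link between design effect and effective sample size can be written as actual sample size divided by DE. A minimal sketch, with a hypothetical function name and figures:

```python
def effective_sample_size(n, design_effect):
    """Size of a simple random sample giving the same precision: n / DE."""
    if design_effect <= 0:
        raise ValueError("design effect must be positive")
    return n / design_effect


# Hypothetical: 10,000 respondents in a survey with a design effect of 2.
print(effective_sample_size(10_000, 2.0))
```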
21 Sample numbers for planning studies
- Think ahead about the sort of comparisons you might want to make
- Are you interested in time trends?
- Or in comparisons between certain groups?
- If so, what proportions are in each?
- Do you want to estimate something (e.g. the percentage of children in poverty)?
22 Use spreadsheet sample numbers.xls
23 To modify these for surveys
- Simply multiply your answer by an estimate of the design effect
- Or try to do the next survey better by getting a smaller design effect
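As an illustration of multiplying a simple-sample answer by the design effect, the sketch below sizes a survey to estimate a proportion to within a chosen margin of error. It is not a reproduction of the spreadsheet: the function name `sample_size_for_proportion`, the planning values and the use of the normal-approximation formula n = Z² p(1-p) / margin² are assumptions made for this example.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p, margin, design_effect=1.0, confidence=0.95):
    """Sample size to estimate a proportion p to within +/- margin."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    n_simple = z ** 2 * p * (1.0 - p) / margin ** 2  # simple random sample
    return ceil(n_simple * design_effect)            # inflate for the design


# Hypothetical planning values: expect about 20%, want +/- 3 percentage points.
print(sample_size_for_proportion(0.20, 0.03))                     # simple sample
print(sample_size_for_proportion(0.20, 0.03, design_effect=1.5))  # complex survey
```

The second call shows why a smaller design effect is worth pursuing: the required number of interviews scales directly with it.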