Ambitious title? presentation

1
Ambitious title?

Confidence intervals, design effects and
significance tests for surveys.
How to calculate sample numbers when planning a
survey.

2
Summary

Statistical inference
Design based
Model based
Confidence intervals and hypothesis tests -
general
Their modification for survey designs
Design effects and design factors
Calculation of sample numbers for studies
Their modification for complex surveys

3
Statistical inference

Making inferences about some aspect of the
population, using observation to draw conclusions
about the population now, or about how it will
evolve in future
Data are what we are given
Inference allows us to turn them into information

4
Elements needed for statistical inference
design based

Want to learn something about a population
You have
A model of how the sample was selected from the
population.
Some data obtained from the sample
Knowledge of how to estimate!
E.g. Obtain data on the income of 10,000 people
from a population of 5 million.
Need inference to estimate the income
distribution of the whole 5 million and to know
how close this is to the population value

5
Elements needed for statistical inference model
based

You have
A model that could have generated the data for
your population, along with ideas about what
current and future populations this might
generalise to.
Some data that can be assumed to be generated by
this model.
Knowledge of how to carry out the inference!
E.g. Obtain data on the income of 10,000 people
from a population of 5 million, and assume that
the income distribution follows some mathematical
distribution
Need inference about the assumed model for the
income distribution of the whole 5 million and
how close your estimate will be to the true value

6
How do design and model based inferences differ?

Conceptually poles apart
In practice they give the same answers
Except when numbers are small
Or when a large proportion of the population has
been sampled
But it's good to think about what you are doing
and decide which type fits your problem

7
Next set of results

Apply to a simple unstructured sample
No clustering
No stratification
No weighting
Taken from a population with replacement (not a
problem in model based inference)
Exactly the same large-sample results apply for
model-based and design-based inferences

8
Mean of 9 x's

[Figure: where is the population mean μ?]

9
Standard error of the mean
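
The sample mean $\bar{x}$ has approx a normal
distr with s.d. $s/\sqrt{n}$, where s is the
sample standard deviation and n the sample size
The data are fixed, so this tells us where $\mu$
is likely to be.
$s/\sqrt{n}$ is called the standard error of the
sample mean
Sometimes written s.e.mean - it measures the
expected distance of the true mean from the mean
of the observed sample.
A 100(1-$\alpha$)% confidence interval for $\mu$
from the normal distribution is
$\bar{x} \pm Z \, s/\sqrt{n}$
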
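The standard error and confidence interval on this slide can be sketched in a few lines of Python. This is only an illustration under the slide's assumptions (a simple sample and the normal approximation). The function name and the nine sample values are made up for the example.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, standard error, lower, upper) for a simple sample."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / math.sqrt(n)  # standard error: s / sqrt(n)
    # Z value for a two-sided interval from the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return m, se, m - z * se, m + z * se

# Hypothetical sample of 9 observations (illustrative values only)
sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0, 18.0]
m, se, lower, upper = mean_confidence_interval(sample)
print(f"mean={m:.2f} s.e.={se:.3f} 95% CI=({lower:.2f}, {upper:.2f})")
```

With so few observations the normal approximation is rough. The example keeps to it only because that is what the slide describes.
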
10
Values of Z for confidence intervals

95% C.I. gives Z = 1.96
99% C.I. gives Z = 2.58
68% C.I. gives Z = 1
90% C.I. gives Z = 1.64
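These Z values can be checked against the standard normal distribution. The sketch below uses Python's standard library. The helper name `z_value` is an assumption made for the example.

```python
from statistics import NormalDist

def z_value(confidence):
    """Two-sided Z multiplier for a given confidence level (e.g. 0.95)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Z multipliers rounded to two decimals, for the levels on the slide
z_table = {level: round(z_value(level), 2) for level in (0.68, 0.90, 0.95, 0.99)}
print(z_table)
```

The 68% value comes out slightly below 1, so the slide's Z = 1 is a rounded figure.
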

11
We can use it for proportions too

Want to estimate a proportion p - e.g. the
proportion of 20 year olds who use the internet
Then r/n estimates p, where r of the n people
sampled have the characteristic,
with standard error $\sqrt{p(1-p)/n}$
To use this formula we replace p with r/n
A rule of thumb is that this approximation is OK
if the smaller of r and (n-r) is greater than 5.
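The same calculation for a proportion might look like the sketch below. It assumes a simple sample and the normal approximation. The counts (160 out of 200) and the function name are invented for illustration.

```python
import math
from statistics import NormalDist

def proportion_interval(r, n, confidence=0.95):
    """Estimate p by r/n with its standard error and a normal-theory interval."""
    p_hat = r / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # p replaced by its estimate r/n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Rule of thumb from the slide: smaller of r and n - r should exceed 5
    approximation_ok = min(r, n - r) > 5
    return p_hat, se, p_hat - z * se, p_hat + z * se, approximation_ok

# Hypothetical survey: 160 of 200 respondents use the internet
p_hat, se, lower, upper, ok = proportion_interval(r=160, n=200)
print(f"p={p_hat:.2f} s.e.={se:.4f} CI=({lower:.3f}, {upper:.3f}) ok={ok}")
```
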

12
Are these formulae good enough?

Yes, unless your survey is too small to be of any
use
They extend easily to differences in means and
proportions
Similar approximate results apply to regression
models and logistic regressions
BUT they only apply to simple samples

13
But my data are more complicated than this. And
nobody will let me put standard errors or
confidence intervals in my report

A goal of a good statistical report is that it
should not include any tables or graphs where
what seems to be information is just the result
of chance variation (noise).
Set out your task in terms of an outcome
predicted from other factors
Carry out a set of regression predictions
Base the tables to go in the report on the
regression models that are found to be more than
chance effects

14
Inferences for complex surveys

The usual formulae and regression models don't
hold
Most surveys use weighting
And allowances for clustering and stratification
have to be made
Software that modifies the results we have just
discussed and calculates them correctly for
complex surveys is now available

15
Two main methods are used

Taylor linearisation: the theory of this was all
worked out in the 1940s and 50s
Replication methods (jackknives and bootstraps):
1960s and 1970s
Only now is software readily available to do
things properly

16
Getting by without the correct software

Carry out an analysis using an ordinary computer
package (e.g. the simple procedures in SAS or SPSS)
But use a weight in the analysis to get results
that will correct the bias in the estimates
Your weighted analysis will get you the wrong
standard errors and wrong tests, but the
estimates will be about right.
Use design effect tables to get some idea of the
standard errors
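As a small illustration of why the weight matters for the estimate itself, the sketch below compares an unweighted and a weighted mean. The incomes and weights are invented. The code makes no attempt at a standard error, since the slide says that an ordinary weighted analysis gets those wrong.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w*x divided by sum of w."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical incomes and survey weights (illustrative values only)
incomes = [20000.0, 30000.0, 50000.0]
weights = [2.0, 1.0, 1.0]

unweighted = sum(incomes) / len(incomes)
weighted = weighted_mean(incomes, weights)
print(f"unweighted={unweighted:.0f} weighted={weighted:.0f}")
```
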

17
Using the correct software

Is not difficult: the PEAS web site explains how
Routines are available in SAS, SPSS, STATA and R
But it does mean that you need to get details of
the survey design
E.g. PSU, stratification variables need to be
available
Easier for you than for me

18
Getting by without the correct software

Use a table of design effects (DE)
Often published with the surveys
To get a s.e. from a complex survey
Calculate the design factor (DF) as the square
root of the DE
Multiply the s.e. from a simple analysis by DF
For most household surveys DEs vary from about
0.8 to 2 or 3.
This is a rough and ready method and will only
work if weights are not too far from 1.0
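The rough-and-ready adjustment above amounts to one multiplication. A minimal sketch, with a hypothetical function name and made-up input values:

```python
import math

def complex_survey_se(simple_se, design_effect):
    """Approximate s.e. for a complex survey from a simple-analysis s.e."""
    design_factor = math.sqrt(design_effect)  # DF is the square root of DE
    return simple_se * design_factor

# Hypothetical values: s.e. of 0.02 from a simple analysis, published DE of 2.25
adjusted_se = complex_survey_se(simple_se=0.02, design_effect=2.25)
print(f"adjusted s.e. = {adjusted_se:.3f}")
```
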

19
Disadvantages of this

DEs are not constant for a survey
They are also different (usually lower) when
subgroups of a survey are selected
They may also be lower in complicated models,
like regressions where it is also very hard to
know how to apply them.
Methods are approximate

20
Uses of design effects (DEs)

They tell you about how well your survey design
has worked
Most survey software produces estimates of design
effects with its output
A design effect of 2 means your effective sample
size is halved
It is good to have such estimates when planning
sample numbers for surveys.
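The link between the design effect and the effective sample size can be written as a one-line calculation. The function name and the sample size of 10,000 are assumptions for the example.

```python
def effective_sample_size(n, design_effect):
    """Size of a simple sample that would give the same precision."""
    return n / design_effect

# A design effect of 2 halves the effective sample size
n_eff = effective_sample_size(10_000, 2.0)
print(n_eff)
```
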

21
Sample numbers for planning studies

Think ahead about the sort of comparisons you
might want to make
Are you interested in time trends?
Or in comparisons between certain groups
If so, what proportions in each?
Do you want to estimate something (e.g. the
percentage of children in poverty)?
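For the last case, estimating a percentage, one common textbook calculation gives the sample number needed for a chosen margin of error and then inflates it by the design effect. The slides do not give a formula, so this sketch is an assumption rather than the presenter's method. The function name, the margin of 5 percentage points and the design effect of 2 are all illustrative.

```python
import math
from statistics import NormalDist

def required_sample_size(p, margin, confidence=0.95, design_effect=1.0):
    """Sample number so that the interval for p is about +/- margin wide."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_simple = z ** 2 * p * (1 - p) / margin ** 2  # simple-sample requirement
    return math.ceil(n_simple * design_effect)     # inflate for the design

# Hypothetical planning values: p near 0.5, margin of 0.05
n_simple = required_sample_size(p=0.5, margin=0.05)
n_complex = required_sample_size(p=0.5, margin=0.05, design_effect=2.0)
print(n_simple, n_complex)
```

Taking p = 0.5 gives the largest requirement, which makes it a cautious choice when little is known in advance.
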
