Ambitious title? - PowerPoint PPT Presentation

Provided by: susa2652

1
Ambitious title?
  • Confidence intervals, design effects and
    significance tests for surveys.
  • How to calculate sample numbers when planning a
    survey.

2
Summary
  • Statistical inference
  • Design based
  • Model based
  • Confidence intervals and hypothesis tests -
    general
  • Their modification for survey designs
  • Design effects and design factors
  • Calculation of sample numbers for studies
  • Their modification for complex surveys

3
Statistical inference
  • Making inferences about some aspect of the
    population, using observations to draw conclusions
    about the population as it is now, or as it will
    evolve in future
  • Data are what we are given
  • Inference allows us to turn them into information

4
Elements needed for statistical inference:
design based
  • Want to learn something about a population
  • You have
  • A model of how the sample was selected from the
    population.
  • Some data obtained from the sample
  • Knowledge of how to estimate!
  • E.g. obtain data on the incomes of 10,000 people
    from a population of 5 million.
  • Need inference to estimate the income
    distribution of the whole 5 million and to know
    how close this estimate is to the population value

5
Elements needed for statistical inference:
model based
  • You have
  • A model that could have generated the data for
    your population, along with ideas about which
    current and future populations it might
    generalise to.
  • Some data that can be assumed to be generated by
    this model.
  • Knowledge of how to carry out the inference!
  • E.g. obtain data on the incomes of 10,000 people
    from a population, and make the assumption that
    the income distribution follows some mathematical
    distribution
  • Need inference about the assumed model for the
    income distribution of the whole 5 million and
    how close your estimate will be to the true value

6
How do design and model based inferences differ?
  • Conceptually poles apart
  • In practice they give the same answers
  • Except when numbers are small
  • Or when a large proportion of the population has
    been sampled
  • But it's good to think about what you are doing
    and decide which type fits your problem

7
Next set of results
  • Apply to a simple unstructured sample
  • No clustering
  • No stratification
  • No weighting
  • Taken from a population with replacement (not a
    problem in model based inference)
  • Exactly the same large-sample results apply for
    model-based and design-based inferences

8
Mean of 9 x's
[Figure: a sample of 9 values of x with spread σ, and the question of where the population mean μ lies]
9
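Standard error of the mean
  • The sample mean $\bar{x}$ of $n$ observations has
    approximately a normal distribution with mean $\mu$
    and s.d. $\sigma/\sqrt{n}$.
  • The data are fixed, so this tells us where $\mu$ is
    likely to be.
  • $\sigma/\sqrt{n}$ is called the standard error of the
    sample mean (sometimes s.e.mean). It measures the
    typical distance of the true mean from the mean of
    the observed sample.
  • A 100(1−α)% confidence interval for $\mu$ from the
    normal distribution is
$$\bar{x} \pm Z\,\frac{\sigma}{\sqrt{n}}$$
    with σ replaced by the sample standard deviation s
    in practice.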
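The interval above can be computed directly. The sketch below is a minimal illustration assuming a simple random sample and the large-sample normal approximation; the function name `mean_confidence_interval` and the nine data values are made up for illustration, and s stands in for σ.

```python
import math
import statistics


def mean_confidence_interval(data, z=1.96):
    """Return (mean, standard error, lower, upper) for a simple random sample.

    Large-sample normal approximation: mean +/- z * s / sqrt(n),
    with the sample standard deviation s standing in for sigma.
    """
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)      # sample s.d. (divisor n - 1)
    se = s / math.sqrt(n)           # standard error of the mean
    return xbar, se, xbar - z * se, xbar + z * se


# Nine made-up observations, echoing the "mean of 9 x's" slide
sample = [12, 15, 9, 11, 14, 10, 13, 12, 12]
xbar, se, lo, hi = mean_confidence_interval(sample)
print(f"mean={xbar:.2f}  s.e.={se:.3f}  95% CI=({lo:.2f}, {hi:.2f})")
```

Changing `z` to one of the other values on the next slide gives the other interval widths.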
10
Values of Z for confidence intervals
  • 95% C.I. gives Z = 1.96
  • 99% C.I. gives Z = 2.58
  • 68% C.I. gives Z = 1
  • 90% C.I. gives Z = 1.64
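These Z values are two-sided critical values of the standard normal distribution. As a sketch, they can be recovered with Python's standard-library `statistics.NormalDist`; the helper name `z_for_confidence` is hypothetical.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided critical value Z for a confidence level given as a fraction."""
    alpha = 1 - level
    # Leave alpha/2 in each tail of the standard normal distribution
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.68, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {z_for_confidence(level):.3f}")
```

The exact values are slightly different from the rounded ones on the slide (for example about 0.994 for 68% and 1.645 for 90%).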

11
We can use it for proportions too
  • Want to estimate a proportion p, e.g. the
    proportion of 20-year-olds who use the internet
  • If r of the n people sampled have the
    characteristic, then r/n estimates p
  • with standard error $\sqrt{p(1-p)/n}$
  • To use this formula we replace p with its
    estimate $\hat{p} = r/n$
  • A rule of thumb is that this approximation is OK
    if the smaller of r and (n − r) is > 5.
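A minimal sketch of the proportion calculation, including the rule-of-thumb check, assuming a simple random sample. The function name `proportion_ci` and the counts (140 of 200) are hypothetical.

```python
import math
import warnings


def proportion_ci(r, n, z=1.96):
    """Estimate p by r/n with a normal-approximation confidence interval."""
    p_hat = r / n
    # p in sqrt(p(1-p)/n) is replaced by its estimate r/n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    if min(r, n - r) <= 5:
        # Rule of thumb: the approximation needs min(r, n - r) > 5
        warnings.warn("normal approximation may be poor")
    return p_hat, se, p_hat - z * se, p_hat + z * se


# Hypothetical: 140 of 200 sampled 20-year-olds use the internet
p_hat, se, lo, hi = proportion_ci(140, 200)
print(f"p={p_hat:.2f}  s.e.={se:.4f}  95% CI=({lo:.3f}, {hi:.3f})")
```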

12
Are these formulae good enough?
  • Yes unless your survey is too small to be any
    use
  • They extend easily to differences in means and
    proportions
  • Similar approximate results apply to regression
    models and logistic regressions
  • BUT they only apply to simple samples
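As one example of how the formulas extend, the sketch below gives a confidence interval for the difference between two proportions, assuming two independent simple random samples so that the variances add. The function name and counts are hypothetical.

```python
import math


def diff_proportions_ci(r1, n1, r2, n2, z=1.96):
    """CI for p1 - p2 from two independent simple random samples."""
    p1, p2 = r1 / n1, r2 / n2
    # Variances of independent estimates add
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, se, diff - z * se, diff + z * se


# Hypothetical: 140/200 in one group versus 110/200 in another
diff, se, lo, hi = diff_proportions_ci(140, 200, 110, 200)
print(f"diff={diff:.3f}  s.e.={se:.4f}  95% CI=({lo:.3f}, {hi:.3f})")
```

If the interval excludes zero, the difference is significant at the 5% level under these assumptions; for a clustered or weighted survey this s.e. would be too small.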

13
But my data are more complicated than this. And
nobody will let me put standard errors or
confidence intervals in my report
  • A goal of a good statistical report is that it
    should not include any tables or graphs where
    what seems to be information is just the result
    of chance variation (noise).
  • Set out your task in terms of an outcome
    predicted from other factors
  • Carry out a set of regression predictions
  • Base the tables that go in the report on the
    regression models whose effects are found to be
    more than chance

14
Inferences for complex surveys
  • The usual formulae and regression models don't
    hold
  • Most surveys use weighting
  • And allowances for clustering and stratification
    have to be made
  • Software that modifies the results we have just
    discussed and calculates them correctly for
    complex surveys is now available

15
Two main methods are used
  • Taylor linearisation: the theory was all worked
    out in the 1940s and 1950s
  • Replication methods (jackknives and bootstraps):
    1960s and 1970s
  • Only now is software readily available to do
    things properly
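To give a flavour of replication methods, here is a minimal sketch of a delete-one-cluster (JK1) jackknife standard error for the overall mean. It assumes an unstratified, unweighted sample of at least two clusters (PSUs); real survey software also handles strata and weights. The function name and the example clusters are hypothetical.

```python
import math


def jackknife_se_cluster_mean(clusters):
    """Delete-one-cluster (JK1) jackknife s.e. of the overall mean.

    `clusters` is a list of lists, one inner list of observations per PSU.
    Assumes an unstratified, unweighted sample of k >= 2 clusters.
    """
    k = len(clusters)
    totals = [sum(c) for c in clusters]
    sizes = [len(c) for c in clusters]
    grand_total, grand_n = sum(totals), sum(sizes)
    # Replicate estimates: the mean recomputed with cluster j left out
    reps = [(grand_total - t) / (grand_n - m) for t, m in zip(totals, sizes)]
    rep_mean = sum(reps) / k
    var = (k - 1) / k * sum((r - rep_mean) ** 2 for r in reps)
    return grand_total / grand_n, math.sqrt(var)


# Hypothetical clustered sample: four PSUs of households
clusters = [[3, 4, 5], [6, 7], [2, 3, 4, 5], [5, 6, 6]]
mean, se = jackknife_se_cluster_mean(clusters)
print(f"mean={mean:.3f}  jackknife s.e.={se:.3f}")
```

When every cluster holds a single observation, this reduces exactly to the simple-sample s.e. $s/\sqrt{n}$, which is a useful check.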

16
Getting by without the correct software
  • Carry out an analysis using an ordinary computer
    package (e.g. SAS, SPSS simple procedures)
  • But use a weight in the analysis to get results
    that will correct the bias in the estimates
  • Your weighted analysis will get you the wrong
    standard errors and wrong tests, but the
    estimates will be about right.
  • Use design effect tables to get some idea of the
    standard errors
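A minimal sketch of what the weight does to an estimate: the weighted mean $\sum w_i y_i / \sum w_i$ corrects for unequal selection chances, but says nothing about the correct standard error. The incomes and weights below are hypothetical, with the last two cases assumed to come from an over-sampled group.

```python
def weighted_mean(values, weights):
    """Weighted estimate of a population mean: sum(w * y) / sum(w)."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


# Hypothetical incomes (thousands); the last two cases were over-sampled,
# so they get a smaller weight
incomes = [20, 22, 25, 40, 42]
weights = [1.5, 1.5, 1.5, 0.5, 0.5]

unweighted = sum(incomes) / len(incomes)
weighted = weighted_mean(incomes, weights)
print(f"unweighted={unweighted:.2f}  weighted={weighted:.2f}")
```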

17
Using the correct software
  • Is not difficult: the PEAS web site explains how
  • Routines are available in SAS, SPSS, STATA and R
  • But it does mean that you need to get details of
    the survey design
  • E.g. the PSU (primary sampling unit) and
    stratification variables need to be available
  • Easier for you than for me

18
Getting by without the correct software
  • Use a table of design effects (DE)
  • Often published with the surveys
  • To get a s.e. from a complex survey
  • Calculate the design factor (DF) as the square
    root of the DE
  • Multiply the s.e. from a simple analysis by DF
  • For most household surveys DEs vary from about
    0.8 to 2 or 3.
  • This is a rough and ready method and will only
    work if weights are not too far from 1.0
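The rough-and-ready recipe above can be written as two small functions. This is a sketch with hypothetical names and numbers (a simple-analysis s.e. of 0.032 and a published DE of 2); it inherits all the limitations listed on the next slide.

```python
import math


def design_factor(design_effect):
    """DF is the square root of the design effect DE."""
    return math.sqrt(design_effect)


def complex_survey_se(simple_se, design_effect):
    """Rough complex-survey s.e.: the simple-analysis s.e. multiplied by DF."""
    return simple_se * design_factor(design_effect)


# Hypothetical: estimate 0.70 with s.e. 0.032 from a simple analysis, DE = 2
se = complex_survey_se(0.032, 2.0)
lower, upper = 0.70 - 1.96 * se, 0.70 + 1.96 * se
print(f"adjusted s.e.={se:.4f}  95% CI=({lower:.3f}, {upper:.3f})")
```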

19
Disadvantages of this
  • DEs are not constant for a survey
  • They are also different (usually lower) when
    subgroups of a survey are selected
  • They may also be lower in complicated models,
    like regressions, where it is also very hard to
    know how to apply them.
  • Methods are approximate

20
Uses of design effects (DEs)
  • They tell you about how well your survey design
    has worked
  • Most survey software produces estimates of design
    effects with its output
  • A design effect of 2 means your effective sample
    size is halved
  • It is good to have such estimates when planning
    sample numbers for surveys.
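The "effective sample size" reading of a design effect is $n_{\text{eff}} = n / \text{DE}$. The sketch below, with hypothetical names and numbers, checks that using $n_{\text{eff}}$ in the simple formula gives the same s.e. as multiplying by the design factor $\sqrt{\text{DE}}$.

```python
import math


def effective_sample_size(n, design_effect):
    """Size of a simple random sample that would give the same precision."""
    return n / design_effect


# Hypothetical: 10,000 interviews, DE = 2, proportion about 0.3
n, de, p = 10_000, 2.0, 0.3
n_eff = effective_sample_size(n, de)
se_simple = math.sqrt(p * (1 - p) / n)
se_via_n_eff = math.sqrt(p * (1 - p) / n_eff)  # equals se_simple * sqrt(DE)
print(n_eff, se_simple, se_via_n_eff)
```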

21
Sample numbers for planning studies
  • Think ahead about the sort of comparisons you
    might want to make
  • Are you interested in time trends?
  • Or in comparisons between certain groups?
  • If so, what proportion will be in each group?
  • Do you want to estimate something (e.g. the % of
    children in poverty)?

22
Use spreadsheet sample numbers.xls
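The contents of sample numbers.xls are not reproduced here. As an assumption, the sketch below shows two standard calculations a spreadsheet like this might perform for a simple random sample: the n needed to estimate a proportion to within a margin of error, and the n per group needed to detect a difference between two proportions with a two-sided test at a given power. All names and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def n_to_estimate_proportion(p, margin, level=0.95):
    """Sample size so that a CI for p has half-width `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def n_per_group_two_proportions(p1, p2, level=0.95, power=0.80):
    """Per-group sample size to detect p1 vs p2 with a two-sided test."""
    z_a = NormalDist().inv_cdf(1 - (1 - level) / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)


# Hypothetical: estimate a proportion near 50% to within +/- 3 points
print(n_to_estimate_proportion(0.5, 0.03))
# Hypothetical: detect 50% versus 40% with 80% power
print(n_per_group_two_proportions(0.5, 0.4))
```

Using p = 0.5 in the first function is the cautious choice, since it gives the largest n for a given margin.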
23
To modify these for surveys
  • Simply multiply your answer by an estimate of the
    design effect
  • Or try to do the next survey better by getting a
    smaller design effect
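Inflating a simple-random-sample answer by an anticipated design effect is a one-line calculation. A minimal sketch, assuming a hypothetical starting value of 1068 and DEs spanning the 0.8 to 3 range quoted for household surveys:

```python
import math


def survey_sample_size(srs_n, design_effect):
    """Sample size for a complex survey: simple-random-sample size times DE."""
    return math.ceil(srs_n * design_effect)


# Hypothetical: 1068 needed under simple random sampling
for de in (0.8, 1.5, 2.0, 3.0):
    print(de, survey_sample_size(1068, de))
```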