Sampling and Sample Size Calculation - PowerPoint PPT Presentation

Thomas Grein, Denis Coulombier, Philippe Sudre, Mike Catchpole, Denise Antona -IDEA. Brigitte Helynck, Philippe Malfait, Institut de veille sanitaire ...

1
Sampling and Sample Size Calculation
Lazereto de Mahón, Menorca, Spain, September 2006
Sources: EPIET Introductory course; Thomas Grein, Denis Coulombier, Philippe Sudre, Mike Catchpole, Denise Antona - IDEA; Brigitte Helynck, Philippe Malfait, Institut de veille sanitaire
Modified: Viviane Bremer, EPIET 2004; Suzanne Cotter 2005; Richard Pebody 2006
2
Objectives: sampling
  • To understand
    • Why we use sampling
    • Definitions in sampling
    • Sampling errors
    • Main methods of sampling
    • Sample size calculation

3
Why do we use sampling?
  • Get information from large populations with
    • Reduced costs
    • Reduced field time
    • Increased accuracy
    • Enhanced methods

4
Definition of sampling
  • Procedure by which some members of a given population
    are selected as representatives of the entire population

5
Definition of sampling terms
  • Sampling unit (element)
    • Subject under observation on which information is
      collected
    • Example: children <5 years, hospital discharges,
      health events
  • Sampling fraction
    • Ratio between sample size and population size
    • Example: 100 out of 2000 (5%)

6
Definition of sampling terms
  • Sampling frame
    • List of all the sampling units from which the sample
      is drawn
    • Lists of e.g. children <5 years of age,
      households, health care units
  • Sampling scheme
    • Method of selecting sampling units from the sampling
      frame
    • E.g. random sample, convenience sample

7
Survey errors
  • Systematic error (or bias)
  • Sample not typical of population
  • Inaccurate response (information bias)
  • Selection bias
  • Sampling error (random error)

8
Representativeness (validity)
  • A sample should accurately reflect the distribution
    of the relevant variable in the population
    • Person, e.g. age, sex
    • Place, e.g. urban vs. rural
    • Time, e.g. seasonality
  • Representativeness is essential to generalise
  • Ensure representativeness before starting,
    confirm once completed

9
Sampling and representativeness
Target Population → Sampling Population → Sample
10
Sampling error
  • Random difference between the sample and the population
    from which the sample is drawn
  • Size of error can be measured in probability
    samples
  • Expressed as the standard error of a mean or proportion
  • Standard error (or precision) depends upon
    • Size of the sample
    • Distribution of the characteristic of interest in the
      population

11
Sampling error
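When a simple random sample of size n is selected
from a population of size N, the standard error (SE) of the
sample mean or proportion is
  Mean: $SE = \dfrac{s}{\sqrt{n}}$, where s is the standard deviation
  Proportion: $SE = \sqrt{\dfrac{p(1-p)}{n}}$
Used to calculate 95% confidence intervals
Estimated 95% confidence interval: estimate ± 1.96 × SE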
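The standard error and confidence interval for a proportion can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `proportion_ci` is hypothetical, the values p = 0.15 and n = 544 are borrowed from the sample size slide later in the deck, and the finite population size N is ignored.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Return (standard error, lower bound, upper bound) for a sample proportion."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    # Standard error of a proportion under simple random sampling.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # z = 1.96 corresponds to a 95% confidence interval.
    return se, p_hat - z * se, p_hat + z * se

se, low, high = proportion_ci(0.15, 544)
print(f"SE = {se:.4f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With these illustrative inputs the half-width of the interval comes out close to 0.03, which is the absolute precision used in the sample size example.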
12
Quality of a sampling estimate
Precision and validity
13
Survey errors example
  • Measuring height
  • Measuring tape held differently by different
    investigators
    • → loss of precision
    • Large standard error
  • Tape shrunk/wrong
    • → systematic error
    • Bias (cannot be corrected afterwards)

[Figure: measuring scale with marks from 173 to 179]
14
Types of sampling
  • Non-probability samples
  • Probability samples

15
Non probability samples
  • Convenience samples (ease of access)
  • Snowball sampling (friend of friend, etc.)
  • Purposive sampling (judgemental)
    • You choose who you think should be in the study

Probability of being chosen is unknown. Cheaper,
but unable to generalise; potential for bias.
16
Probability samples
  • Random sampling
    • Each subject has a known probability of being
      selected
  • Allows application of statistical sampling theory
    to results to
    • Generalise
    • Test hypotheses

17
Methods used in probability samples
  • Simple random sampling
  • Systematic sampling
  • Stratified sampling
  • Multi-stage sampling
  • Cluster sampling

18
Simple random sampling
  • Principle
    • Equal chance/probability of drawing each unit
  • Procedure
    • Take sampling population
    • Need listing of all sampling units (sampling
      frame)
    • Number all units
    • Randomly draw units

19
Simple random sampling
  • Advantages
    • Simple
    • Sampling error easily measured
  • Disadvantages
    • Need complete list of units
    • Does not always achieve best representativeness
    • Units may be scattered and poorly accessible

20
Simple random sampling
  • Example: evaluate the prevalence of tooth decay
    among 1200 children attending a school
    • List of children attending the school
    • Children numbered from 1 to 1200
    • Sample size: 100 children
    • Random sampling of 100 numbers between 1 and 1200

How to randomly select?
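One way to answer this for the school example is Python's standard `random` module. This is only a sketch: the function name `simple_random_sample` and the seed value are illustrative assumptions, and the slides themselves use EPITABLE or Excel.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct unit numbers from 1..population_size."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    # random.sample draws without replacement, so every unit has the
    # same chance of selection and no child is picked twice.
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

chosen = simple_random_sample(1200, 100, seed=2006)
print(chosen[:10])
```

The numbers returned are positions on the numbered list of children, so the sampling frame still has to exist before the draw.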
21
EPITABLE random number listing
22
EPITABLE random number listing
Also possible in Excel
23
Simple random sampling
24
Systematic sampling
  • Principle
    • Select sample at regular intervals based on
      sampling fraction
  • Advantages
    • Simple
    • Sampling error easily measured
  • Disadvantages
    • Need complete list of units
    • Periodicity

25
Systematic sampling
  • N = 1200 and n = 60
  • → sampling interval = 1200/60 = 20 (sampling fraction 60/1200 = 1/20)
  • List persons from 1 to 1200
  • Randomly select a number between 1 and 20 (e.g. 8)
  • → 1st person selected = the 8th on the list
  • → 2nd person = 8 + 20 = the 28th, etc.
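The procedure above can be sketched in Python as follows. The function name `systematic_sample` and its `start` parameter are hypothetical, and the sketch assumes the population size is an exact multiple of the sample size, as in the 1200/60 example.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Pick every k-th unit from a list numbered 1..population_size."""
    # Interval k between selected units; assumes N is a multiple of n.
    interval = population_size // sample_size
    if start is None:
        # Random starting point between 1 and k, inclusive.
        start = random.Random(seed).randint(1, interval)
    return [start + i * interval for i in range(sample_size)]

selected = systematic_sample(1200, 60, start=8)
print(selected[:3], selected[-1])  # [8, 28, 48] 1188
```

Passing `start=8` reproduces the slide's example. Leaving it out draws the starting point at random, which is what makes this a probability sample.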

26
Systematic sampling
27
Stratified sampling
  • Principle
    • Divide sampling frame into homogeneous subgroups
      (strata), e.g. age-group, occupation
    • Draw a random sample in each stratum

28
Stratified sampling
  • Advantages
    • Can acquire information about whole population
      and individual strata
    • Precision increased if variability within strata
      is less (homogeneous) than between strata
  • Disadvantages
    • Can be difficult to identify strata
    • Loss of precision if small numbers in individual
      strata
    • Resolve by sampling proportionate to stratum
      population
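Sampling proportionate to stratum population can be sketched as below. The function name, the three age strata and their sizes are made-up illustrations, not data from the slides.

```python
import random

def proportionate_stratified_sample(strata, total_sample_size, seed=None):
    """Draw a simple random sample within each stratum, sized by stratum share."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Allocation proportional to stratum size; rounding can shift
        # the overall total by a unit or two in less tidy examples.
        k = round(total_sample_size * len(units) / population_size)
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical frame: 1200 children in three age strata.
strata = {
    "5-9 years": list(range(1, 601)),
    "10-14 years": list(range(601, 1001)),
    "15-19 years": list(range(1001, 1201)),
}
sample = proportionate_stratified_sample(strata, 120, seed=1)
print({name: len(units) for name, units in sample.items()})
```

Each stratum contributes the same fraction (here 10%) of its units, so the combined sample mirrors the population's composition.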

29
Multiple stage sampling
  • Principle
    • Consecutive sampling
    • Example: sampling unit = household
      • 1st stage: draw neighborhoods
      • 2nd stage: draw buildings
      • 3rd stage: draw households
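A three-stage draw like the one above might look as follows in Python. The nested dictionary used as a sampling frame, the labels (`N1`, `B1`, `H1`) and the number of units drawn at each stage are all hypothetical.

```python
import random

def multistage_sample(frame, n_neighborhoods, n_buildings, n_households, seed=None):
    """Draw neighborhoods, then buildings within them, then households."""
    rng = random.Random(seed)
    selected = []
    # Stage 1: draw neighborhoods.
    for hood in rng.sample(sorted(frame), n_neighborhoods):
        buildings = frame[hood]
        # Stage 2: draw buildings within each selected neighborhood.
        for building in rng.sample(sorted(buildings), n_buildings):
            # Stage 3: draw households within each selected building.
            for household in rng.sample(buildings[building], n_households):
                selected.append((hood, building, household))
    return selected

# Hypothetical frame: 10 neighborhoods x 5 buildings x 8 households.
frame = {
    f"N{i}": {f"B{j}": [f"H{k}" for k in range(1, 9)] for j in range(1, 6)}
    for i in range(1, 11)
}
households = multistage_sample(frame, 3, 2, 4, seed=1)
print(len(households), households[0])
```

Only the selected neighborhoods and buildings need a full listing of their contents, which is the practical appeal of sampling in stages.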

30
Cluster sampling
  • Principle
    • Sample units not identified independently but in
      a group (or cluster)
    • Provides logistical advantage

31
Cluster sampling
  • Principle
    • Whole population divided into groups, e.g.
      neighbourhoods
    • Random sample taken of these groups (clusters)
    • Within selected clusters, all units (e.g.
      households) included, or a random sample of these
      units
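The two variants described above (take every unit in a selected cluster, or subsample within it) can be sketched like this. The function name, the five sections and the 40 households per section are illustrative assumptions.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Randomly select clusters, then take all or some units in each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        units = clusters[name]
        if per_cluster is None:
            # Include every unit in the selected cluster.
            sample[name] = list(units)
        else:
            # Or draw a random subsample within the cluster.
            sample[name] = rng.sample(units, min(per_cluster, len(units)))
    return sample

# Hypothetical population: 5 sections of 40 households each.
sections = {
    f"Section {s}": [f"S{s}-H{h}" for h in range(1, 41)] for s in range(1, 6)
}
whole = cluster_sample(sections, 2, seed=7)
partial = cluster_sample(sections, 2, per_cluster=10, seed=7)
print({k: len(v) for k, v in whole.items()}, {k: len(v) for k, v in partial.items()})
```

Only the list of clusters is needed up front. Units are listed only inside the clusters that are actually drawn.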

32
Example: cluster sampling
[Figure: area divided into Sections 1 to 5]
33
Cluster sampling
  • Advantages
    • Simple, as complete list of sampling units within
      population not required
    • Less travel/resources required
  • Disadvantages
    • Potential problem is that cluster members are
      more likely to be alike (homogeneous) than those in
      another cluster
    • This dependence needs to be taken into account
      in the sample size and the analysis (design
      effect)

34
Selecting a sampling method
  • Population to be studied
    • Size/geographical distribution
    • Heterogeneity with respect to variable
  • Availability of list of sampling units
  • Level of precision required
  • Resources available

35
Sample size estimation
  • Estimate number needed to
    • reliably measure factor of interest
    • detect significant association
  • Trade-off between study size and resources
  • Sample size determined by various factors
    • significance level (alpha)
    • power (1-beta)
    • expected prevalence of factor of interest
36
Type 1 error
  • The probability of finding a difference between our
    sample and the population when there really
    isn't one
  • Known as α (or type 1 error)
  • Usually set at 5% (or 0.05)

37
Type 2 error
  • The probability of not finding a difference that
    actually exists between our sample and
    the population
  • Known as β (or type 2 error)
  • Power is (1-β) and is usually 80%

38
A question?
  • Are the English more intelligent than the Dutch?
  • H0 (null hypothesis): the English and Dutch have
    the same mean IQ
  • Ha (alternative hypothesis): the mean IQ of the
    English is greater than that of the Dutch

39
Type 1 and 2 errors
                 Truth
  Decision       H0 true             H0 false
  Reject H0      Type I error        Correct decision
  Accept H0      Correct decision    Type II error

40
Power
  • The easiest ways to increase power are to
    • increase sample size
    • increase the difference (effect size) to be detected
    • relax the significance level (raise alpha), e.g. to 10%
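How each of these three levers moves power can be illustrated with a normal-approximation calculation. This is a rough sketch rather than anything taken from the slides: it assumes a one-sided two-sample z-test with equal group sizes and a known, common standard deviation, and the IQ-style numbers (difference of 5 points, standard deviation 15) are invented for illustration.

```python
from statistics import NormalDist

def power_two_sample_z(diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a one-sided two-sample z-test."""
    std_normal = NormalDist()
    # Standard error of the difference between two group means.
    se = sd * (2 / n_per_group) ** 0.5
    # Critical value for a one-sided test at level alpha.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Probability that the test statistic exceeds the critical value
    # when the true difference is diff.
    return 1 - std_normal.cdf(z_crit - diff / se)

base = power_two_sample_z(diff=5, sd=15, n_per_group=100)
more_people = power_two_sample_z(diff=5, sd=15, n_per_group=200)
bigger_effect = power_two_sample_z(diff=10, sd=15, n_per_group=100)
relaxed_alpha = power_two_sample_z(diff=5, sd=15, n_per_group=100, alpha=0.10)
print(f"{base:.2f} {more_people:.2f} {bigger_effect:.2f} {relaxed_alpha:.2f}")
```

Each of the three variations gives a higher power than the baseline, matching the list above.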

41
Steps in estimating sample size for descriptive survey
  • Identify major study variable
  • Determine type of estimate (%, mean, ratio, ...)
  • Indicate expected frequency of factor of interest
  • Decide on desired precision of the estimate
  • Decide on acceptable risk that estimate will fall
    outside its real population value
  • Adjust for estimated design effect
  • Adjust for expected response rate

42
Sample size for descriptive survey
Simple random / systematic sampling
$$n = \frac{z^2 \, p \, q}{d^2} = \frac{1.96^2 \times 0.15 \times 0.85}{0.03^2} \approx 544$$
Cluster sampling
$$n = g \, \frac{z^2 \, p \, q}{d^2} = \frac{2 \times 1.96^2 \times 0.15 \times 0.85}{0.03^2} \approx 1088$$
z: alpha risk expressed as a z-score
p: expected prevalence
q: 1 - p
d: absolute precision
g: design effect
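The two calculations on this slide can be reproduced with a short Python function. The function name is hypothetical, and the `response_rate` parameter is an added assumption: it inflates the sample by dividing by the expected response rate, which is one common way to adjust but is not spelled out in the slides.

```python
def descriptive_sample_size(p, d, z=1.96, design_effect=1.0, response_rate=1.0):
    """Sample size for estimating a prevalence p with absolute precision d."""
    q = 1 - p
    # n = g * z^2 * p * q / d^2, with g = 1 for simple random sampling.
    n = design_effect * z**2 * p * q / d**2
    # Assumed adjustment: inflate for expected non-response.
    return n / response_rate

srs = descriptive_sample_size(p=0.15, d=0.03)
cluster = descriptive_sample_size(p=0.15, d=0.03, design_effect=2)
print(round(srs), round(cluster))  # 544 1088
```

The unrounded values are slightly above 544 and 1088, so in practice one would usually round up rather than to the nearest integer.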
43
Case-control sample size: issues to consider
  • Number of cases
  • Number of controls per case
  • Odds ratio worth detecting
  • Proportion of exposed persons in source
    population
  • Desired level of significance (α)
  • Power of the study (1-β)
    • to detect at a statistically significant level a
      particular odds ratio

44
Case-control: STATCALC sample size
45
Case-control: STATCALC sample size
  • Risk of alpha error: 5%
  • Power: 80%
  • Proportion of controls exposed: 20%
  • OR to detect: > 2
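For orientation, the inputs above can be plugged into a common normal-approximation formula for comparing two proportions in an unmatched case-control study, without continuity correction. This is a sketch under those assumptions, with hypothetical function names. It is not claimed to reproduce the STATCALC output, which may use a different formula or correction.

```python
from statistics import NormalDist

def exposed_proportion_in_cases(p0, odds_ratio):
    """Exposure proportion among cases implied by p0 (controls) and the OR."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))

def case_control_cases_needed(p0, odds_ratio, alpha=0.05, power=0.80,
                              controls_per_case=1.0):
    """Approximate number of cases (two-sided test, no continuity correction)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    r = controls_per_case
    p1 = exposed_proportion_in_cases(p0, odds_ratio)
    # Exposure proportion pooled across cases and controls.
    p_bar = (p1 + r * p0) / (1 + r)
    return ((z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) * (r + 1)
            / (r * (p1 - p0) ** 2))

cases = case_control_cases_needed(p0=0.20, odds_ratio=2)
print(f"about {cases:.1f} cases with 1 control per case")
```

Raising `controls_per_case` lowers the number of cases required, at the cost of recruiting more controls.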

46
Case-control: STATCALC sample size
47
Statistical power of a case-control study for
different control-to-case ratios and odds ratios
(with 50 cases)
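A table of this kind can be approximated with the normal approximation for two proportions. The sketch below is an assumption-laden illustration: the function name is hypothetical, the 20% exposure among controls is borrowed from the STATCALC example rather than stated for this figure, and the values will not necessarily match the original chart.

```python
from statistics import NormalDist

def case_control_power(n_cases, controls_per_case, p0, odds_ratio, alpha=0.05):
    """Approximate power of an unmatched case-control study (two-sided test)."""
    std_normal = NormalDist()
    r = controls_per_case
    # Exposure proportion among cases implied by p0 and the odds ratio.
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    p_bar = (p1 + r * p0) / (1 + r)
    # Standard error of the difference in exposure proportions.
    se = (p_bar * (1 - p_bar) * (r + 1) / (r * n_cases)) ** 0.5
    return std_normal.cdf(abs(p1 - p0) / se - std_normal.inv_cdf(1 - alpha / 2))

# Power for 50 cases, assuming 20% of controls are exposed.
for odds_ratio in (2, 3, 4):
    row = [case_control_power(50, r, 0.20, odds_ratio) for r in (1, 2, 3, 4)]
    print(f"OR {odds_ratio}: " + "  ".join(f"{p:.2f}" for p in row))
```

Each row shows power rising as controls per case go from 1 to 4, with diminishing gains from each extra control.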
48
Conclusions
  • Probability samples are the best
  • Ensure
    • Representativeness
    • Precision
  • ...within available constraints

49
Conclusions
  • If in doubt
  • Call a statistician !!!!