Title: Sampling
1Sample size calculation
Ioannis Karagiannis, based on previous EPIET material
2Objectives sample size
- To understand
- Why we estimate sample size
- Principles of sample size calculation
- Ingredients needed to estimate sample size
3The idea of statistical inference
Generalisation to the population
Conclusions based on the sample
Population
Hypotheses
Sample
4Why bother with sample size?
- Pointless if power is too small
- Waste of resources if the sample is larger than needed
5Questions in sample size calculation
- A national Salmonella outbreak has occurred with several hundred cases
- You plan a case-control study to identify whether consumption of food X is associated with infection
- How many cases and controls should you recruit?
6Questions in sample size calculation
- An outbreak of 14 cases of a mysterious disease has occurred in cohort 2012
- You suspect exposure to an activity is associated with illness and plan to undertake a cohort study under the kind auspices of coordinators
- With the available cases, how much power will you have to detect an RR of 1.5?
7Issues in sample size estimation
- Estimate the sample needed to measure the factor of interest
- Trade-off between study size and resources
- Sample size determined by various factors:
- significance level (α)
- power (1-β)
- expected prevalence of factor of interest
8Which variables should be included in the sample
size calculation?
- The sample size calculation should relate to the study's primary outcome variable.
- If the study has secondary outcome variables which are also considered important, the sample size should also be sufficient for the analyses of these variables.
9Allowing for response rates and other losses to
the sample
- The sample size calculation should relate to the final, achieved sample.
- Need to increase the initial numbers in accordance with:
- the expected response rate
- loss to follow-up
- lack of compliance
- The link between the initial numbers approached and the final achieved sample size should be made explicit.
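The link between the numbers approached and the achieved sample can be made explicit with a small calculation. The sketch below assumes that non-response, loss to follow-up and non-compliance act independently, so their retention proportions multiply. The function name and the example figures are hypothetical and not taken from the slides.

```python
import math

def initial_sample_size(n_final, response_rate=1.0,
                        loss_to_follow_up=0.0, non_compliance=0.0):
    """Number of people to approach so that about n_final remain for analysis.

    Assumption (for illustration): the three sources of loss are
    independent, so the retained fraction is their product.
    """
    retained = response_rate * (1 - loss_to_follow_up) * (1 - non_compliance)
    if not 0 < retained <= 1:
        raise ValueError("retained fraction must be in (0, 1]")
    # Round up: recruiting a fraction of a person is not possible.
    return math.ceil(n_final / retained)

# Hypothetical scenario: 400 participants needed in the final analysis,
# 70% expected response, 10% expected loss to follow-up.
n_to_approach = initial_sample_size(400, response_rate=0.70,
                                    loss_to_follow_up=0.10)
print(n_to_approach)
```

Running the function for several plausible response rates shows how sensitive the recruitment target is to that estimate.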
10Significance testing: null and alternative hypotheses
- Null hypothesis (H0)
- There is no difference
- Any difference is due to chance
- Alternative hypothesis (H1)
- There is a true difference
11Examples of null hypotheses
- Case-control study
- H0: OR = 1
- the odds of exposure among cases are the same as the odds of exposure among controls
- Cohort study
- H0: RR = 1
- the attack rate (AR) among the exposed is the same as the AR among the unexposed
12Significance level (p-value)
- probability of finding a difference (RR ≠ 1, reject H0) when no difference exists
- α, or type I error, usually set at 5%
- the p-value is compared with the significance level to decide whether to reject H0
- NB: a hypothesis is never accepted
13Type II error and power
- β is the type II error
- probability of not finding a difference when a difference really does exist
- Power is (1-β) and is usually set to 80%
- probability of finding a difference when a difference really does exist (sensitivity)
14Significance and power
                                Truth
Decision            H0 true (no difference)                  H0 false (difference)
Cannot reject H0    Correct decision                         Type II error (β)
Reject H0           Type I error (α, significance level)     Correct decision (power, 1-β)
15How to increase power
- increase sample size
- increase the difference (effect size) you want to detect
- NB: increasing the desired difference in RR/OR means moving it away from 1!
- increase the significance level (α error)
- → narrower confidence intervals
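The three levers listed on this slide can be illustrated numerically. The following is a minimal sketch that uses the normal approximation for comparing two proportions in equal-sized groups. It is one common approximation, not necessarily the method behind the course tools, and the risks and group sizes are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

def power_two_proportions(p_unexposed, p_exposed, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test of two proportions.

    Normal approximation, equal group sizes. The small chance of
    rejecting H0 in the wrong direction is ignored.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    p_bar = (p_unexposed + p_exposed) / 2
    # Standard error of the difference under H0 (pooled) and under H1.
    se_null = (2 * p_bar * (1 - p_bar) / n_per_group) ** 0.5
    se_alt = ((p_unexposed * (1 - p_unexposed)
               + p_exposed * (1 - p_exposed)) / n_per_group) ** 0.5
    diff = abs(p_exposed - p_unexposed)
    return z.cdf((diff - z_alpha * se_null) / se_alt)

# Hypothetical baseline: risk 20% in unexposed, RR = 1.5, 100 per group.
baseline = power_two_proportions(0.20, 0.30, 100)
bigger_n = power_two_proportions(0.20, 0.30, 300)           # larger sample
bigger_effect = power_two_proportions(0.20, 0.40, 100)      # RR = 2.0
bigger_alpha = power_two_proportions(0.20, 0.30, 100, alpha=0.10)

for label, value in [("baseline", baseline), ("larger n", bigger_n),
                     ("larger effect", bigger_effect),
                     ("larger alpha", bigger_alpha)]:
    print(f"{label}: power = {value:.2f}")
```

Each of the three changes raises power relative to the baseline, at the cost of more resources, a less ambitious detectable effect, or a higher type I error risk.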
16The effect of sample size
- Consider 3 cohort studies looking at exposure to oysters with N = 10, 100, 1000
- In all 3 studies, 60% of the exposed are ill compared to 40% of the unexposed (RR = 1.5)
17Table A (N = 10)
                    Became ill
Ate oysters     Yes    Total    AR
Yes               3        5    3/5
No                2        5    2/5
Total             5       10    5/10
RR = 1.5, 95% CI 0.4-5.4, p = 0.53
18Table B (N = 100)
                    Became ill
Ate oysters     Yes    Total    AR
Yes              30       50    30/50
No               20       50    20/50
Total            50      100    50/100
RR = 1.5, 95% CI 1.0-2.3, p = 0.046
19Table C (N = 1000)
                    Became ill
Ate oysters     Yes    Total    AR
Yes             300      500    300/500
No              200      500    200/500
Total           500     1000    500/1000
RR = 1.5, 95% CI 1.3-1.7, p < 0.001
20Sample size and power
- In Table A, with a sample of n = 10, there was no significant association with oysters
- In Tables B and C, with larger samples and the same RR, the association became significant
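The figures under Tables A to C can be approximately reproduced from the cell counts. The sketch below assumes a Wald confidence interval on the log scale and an uncorrected chi-square test with 1 degree of freedom. The slides do not state which methods were used, so the last digit may differ slightly.

```python
import math

def risk_ratio_summary(ill_exposed, n_exposed, ill_unexposed, n_unexposed):
    """Return (RR, lower 95% CI, upper 95% CI, p-value) for a 2x2 cohort table.

    Assumptions: Wald interval on the log(RR) scale and an uncorrected
    chi-square test (1 degree of freedom).
    """
    a, b = ill_exposed, n_exposed - ill_exposed
    c, d = ill_unexposed, n_unexposed - ill_unexposed
    n = n_exposed + n_unexposed
    rr = (a / n_exposed) / (c / n_unexposed)
    se_log_rr = math.sqrt(1 / a - 1 / n_exposed + 1 / c - 1 / n_unexposed)
    lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
    upper = math.exp(math.log(rr) + 1.96 * se_log_rr)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return rr, lower, upper, p_value

tables = {"A": (3, 5, 2, 5), "B": (30, 50, 20, 50), "C": (300, 500, 200, 500)}
for name, counts in tables.items():
    rr, lower, upper, p = risk_ratio_summary(*counts)
    print(f"Table {name}: RR = {rr:.1f}, "
          f"95% CI {lower:.2f}-{upper:.2f}, p = {p:.3g}")
```

The RR is identical in all three tables. Only the standard error, and with it the interval width and the p-value, changes with sample size.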
21Cohort sample size: parameters to consider
- Risk ratio worth detecting
- Expected frequency of disease in unexposed population
- Ratio of unexposed to exposed
- Desired level of significance (α)
- Power of the study (1-β)
22Cohort: Episheet power calculation
- Risk of α error: 5%
- Population exposed: 100
- Expected frequency of disease in unexposed: 5%
- Ratio of unexposed to exposed: 1:1
- RR to detect: 1.5
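As a rough cross-check of a calculation like this one, power and the required sample size can be sketched with the normal approximation for two proportions. This is not Episheet's implementation, so its results may differ. The function names are hypothetical, and reading "Population exposed: 100" as 100 exposed people is an assumption.

```python
import math
from statistics import NormalDist

def cohort_power(n_exposed, risk_unexposed, rr, ratio=1.0, alpha=0.05):
    """Approximate power of a cohort study (normal approximation).

    ratio = number of unexposed per exposed person.
    """
    z = NormalDist()
    p0, p1 = risk_unexposed, rr * risk_unexposed
    n0 = ratio * n_exposed
    p_bar = (n_exposed * p1 + n0 * p0) / (n_exposed + n0)
    se_null = math.sqrt(p_bar * (1 - p_bar) * (1 / n_exposed + 1 / n0))
    se_alt = math.sqrt(p1 * (1 - p1) / n_exposed + p0 * (1 - p0) / n0)
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf((abs(p1 - p0) - z_alpha * se_null) / se_alt)

def exposed_needed(risk_unexposed, rr, ratio=1.0, alpha=0.05, power=0.80):
    """Exposed people needed for the requested power (no continuity correction)."""
    z = NormalDist()
    p0, p1 = risk_unexposed, rr * risk_unexposed
    p_bar = (p1 + ratio * p0) / (1 + ratio)
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    numerator = (z_alpha * math.sqrt((1 + 1 / ratio) * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0) / ratio))
    return math.ceil(numerator ** 2 / (p1 - p0) ** 2)

# Parameters from the slide: alpha 5%, 100 exposed, 5% risk in unexposed,
# ratio 1:1, RR to detect 1.5.
power_now = cohort_power(100, 0.05, 1.5)
n_needed = exposed_needed(0.05, 1.5)
print(f"power with 100 exposed: {power_now:.2f}")
print(f"exposed needed for 80% power: {n_needed}")
```

Under these assumptions, a study with 100 exposed and 100 unexposed people has low power to detect an RR of 1.5 when the baseline risk is only 5%, which is the kind of result worth knowing before fieldwork starts.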
25Case-control sample size: parameters to consider
- Number of cases
- Number of controls per case
- Odds ratio worth detecting
- % of exposed persons in source population
- Desired level of significance (α)
- Power of the study (1-β)
26Case-control: power calculation
- α error: 5%
- Number of cases: 200
- Proportion of controls exposed: 5%
- OR to detect: 1.5
- No. of controls per case: 1:1
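A case-control power calculation follows the same logic once the odds ratio is converted into an expected exposure proportion among cases. The sketch below uses the normal approximation for an unmatched design. It is an illustration under that assumption and not the tool used for the slides, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def exposure_among_cases(p_controls, odds_ratio):
    """Expected proportion of cases exposed, given the OR and control exposure."""
    return odds_ratio * p_controls / (1 + p_controls * (odds_ratio - 1))

def case_control_power(n_cases, controls_per_case, p_controls, odds_ratio,
                       alpha=0.05):
    """Approximate power of an unmatched case-control study."""
    z = NormalDist()
    n_controls = controls_per_case * n_cases
    p0 = p_controls
    p1 = exposure_among_cases(p_controls, odds_ratio)
    p_bar = (n_cases * p1 + n_controls * p0) / (n_cases + n_controls)
    se_null = math.sqrt(p_bar * (1 - p_bar) * (1 / n_cases + 1 / n_controls))
    se_alt = math.sqrt(p1 * (1 - p1) / n_cases + p0 * (1 - p0) / n_controls)
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf((abs(p1 - p0) - z_alpha * se_null) / se_alt)

# Parameters from the slide: 200 cases, 1 control per case,
# 5% of controls exposed, OR to detect 1.5, alpha 5%.
power_1_to_1 = case_control_power(200, 1, 0.05, 1.5)
# Trying another scenario: 4 controls per case, everything else unchanged.
power_4_to_1 = case_control_power(200, 4, 0.05, 1.5)
print(f"1 control per case:  power = {power_1_to_1:.2f}")
print(f"4 controls per case: power = {power_4_to_1:.2f}")
```

Adding controls per case raises power when the number of cases is fixed, although the gain from each additional control becomes smaller as the ratio grows.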
28Statistical power of a case-control study for different control-to-case ratios and odds ratios (50 cases)
29Statistical power of a case-control study
30Sample size for proportions: parameters to consider
- Population size
- Anticipated p
- α error
- Design effect
- → Easy to calculate on openepi.com
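The parameters on this slide can be combined in a commonly used formula for estimating a proportion with a finite population correction. The sketch below also needs a desired precision (half-width of the confidence interval), which the slide does not list, so that parameter and the example values are assumptions. Online calculators may round differently.

```python
import math
from statistics import NormalDist

def sample_size_proportion(population_size, anticipated_p, precision,
                           alpha=0.05, design_effect=1.0):
    """Sample size to estimate a proportion to within +/- precision.

    n = DEFF * N * p * (1 - p) / ((d^2 / z^2) * (N - 1) + p * (1 - p))
    where d is the absolute precision and z the two-sided normal quantile.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    pq = anticipated_p * (1 - anticipated_p)
    n = (design_effect * population_size * pq
         / ((precision ** 2 / z ** 2) * (population_size - 1) + pq))
    return math.ceil(n)

# Hypothetical scenario: population of 10,000, anticipated p = 50%,
# precision of +/- 5 percentage points, alpha 5%.
n_simple = sample_size_proportion(10_000, 0.50, 0.05)
# Same scenario with cluster sampling and an assumed design effect of 2.
n_cluster = sample_size_proportion(10_000, 0.50, 0.05, design_effect=2.0)
print(n_simple, n_cluster)
```

Using p = 50% is the most conservative choice because it maximises p(1-p). A design effect above 1 scales the requirement up to account for clustering.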
31Conclusions
- Don't forget to undertake sample size/power calculations
- Use all sources of currently available data to inform your estimates
- Try several scenarios
- Adjust for non-response
- Let it be feasible
32Acknowledgements
Nick Andrews, Richard Pebody, Viviane Bremer