Title: Sample Size
1. Sample Size
- Brian Yuen
- Public Health Sciences Medical Statistics
2. Learning outcomes
- By the end of this session you should:
- understand why determining the sample size is important
- appreciate some statistical concepts
- be aware of the considerations needed to perform sample size calculations
- be able to perform a sample size calculation to a given precision with a continuous or binary outcome
- be able to perform a sample size calculation based on two independent groups with a continuous or binary outcome
- be aware of useful resources on this topic
- be able to use PS to assist with calculating sample size
3. Contents
- Introduction
- Reasons and objectives
- Sample size calculations
- for a precision with continuous outcome
- for a precision with binary outcome
- practical session
- Considerations for two independent groups
- 7 ingredients for sample size calculations
- Sample size calculations
- for two independent groups with continuous outcome
- for two independent groups with binary outcome
- practical session
- Fixing sample size and other methods
- Useful resources
- Sample size calculations for two independent groups in PS
- practical session
- Solutions to exercises
4. Why calculate sample size?
- Required by ethics committees
- approval is needed for a confirmatory trial or study
- you must document your estimate of the required sample size; committees will not approve research projects with too few or too many subjects
- Required for grant applications
- Required by many journals; it is one item on the checklist for writing up a paper
- A thoroughly designed study should include a sample size calculation
- This needs careful thought, as recruiting too few or too many subjects can cause ethical problems
- too few: the study cannot estimate the size of the effect in the population precisely enough, so it becomes uninformative, unethical and a waste of resources
- too many: more patients than necessary are allocated to the inferior treatment, which is unethical and also wastes resources
5. Sample size objectives
- Aim: to find a sample size large enough to give a high probability of detecting a clinically worthwhile treatment effect, if it exists
- Some information is required to perform the calculation
- The calculation generally relates to a single primary outcome
- occasionally more than one is considered
6. Using sample estimates
- Uncertainty is introduced whenever we use a sample to make inferences about the population, because the information we collect is only an estimate
- To get the best estimate, we need a representative, unbiased and reasonably sized sample
- We can quantify uncertainty through
- the standard error
- the confidence interval
- and use these to calculate sample size
7. Sample Size Calculations for a Precision
8. With continuous outcome
Objective: estimate a population mean to a required (or pre-defined) precision.
If we know a sensible value for the SD (σ) and the desired confidence interval (CI) width, then we can obtain n, the number of observations required.
9. With continuous outcome
- Example: peak expiratory flow rate (PEFR) in young men
- Standard deviation: 48 litres/min (Gregg et al, BMJ, 1973)
- Desired 95% confidence interval: ±20 litres/min
- CI half-width: $1.96 \times \frac{\sigma}{\sqrt{n}} = 1.96 \times \frac{48}{\sqrt{n}} = 20$
- Now, solving for n gives
- $n = \left(\frac{1.96 \times 48}{20}\right)^2 = 22.1$, which rounds up to 23
- Hence, a sample of 23 would enable us to estimate the population mean PEFR to within ±20 litres/min (with 95% confidence)
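The precision calculation for a mean can be checked with a short Python sketch. It assumes the normal approximation used on this slide and takes the exact quantile from the standard library's `statistics.NormalDist`; the function name `n_for_mean_precision` is hypothetical, chosen for illustration.

```python
import math
from statistics import NormalDist


def n_for_mean_precision(sd: float, half_width: float, confidence: float = 0.95) -> int:
    """Observations needed so that the CI for a mean is +/- half_width.

    Solves half_width = z * sd / sqrt(n) for n, then rounds up.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil((z * sd / half_width) ** 2)


# PEFR example: SD = 48 litres/min, desired 95% CI of +/-20 litres/min
print(n_for_mean_precision(48, 20))  # 23
```

Rounding up with `math.ceil` follows the rule that a sample size is always rounded up, never down.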
10. With binary outcome
Objective: estimate a population proportion to a required precision.
This is useful when estimating a prevalence from a survey. However, the standard error of a proportion depends on the proportion itself, which is the quantity we are trying to estimate. Hence, we need an initial estimate of p.
11. With binary outcome
- Example: prevalence of pancreatic cancer
- Suppose we are trying to estimate the prevalence of pancreatic cancer, which we suspect to be about 3%, and we want the 95% confidence interval to be ±0.5%
- CI half-width: $1.96 \times \sqrt{\frac{p(100-p)}{n}} = 1.96 \times \sqrt{\frac{3 \times 97}{n}} = 0.5$
- Now, solving for n gives
- $n = \frac{1.96^2 \times 3 \times 97}{0.5^2} = 4471.6$, which rounds up to 4472
- Hence, 4472 subjects are required to estimate the prevalence to within ±0.5% (with 95% confidence).
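The same idea for a prevalence can be sketched in Python, working in percentages as the slide does (SE = √(p(100 − p)/n)). The function name `n_for_prevalence_precision` is hypothetical, and the exact normal quantile comes from `statistics.NormalDist` rather than the rounded 1.96.

```python
import math
from statistics import NormalDist


def n_for_prevalence_precision(p_percent: float, half_width_percent: float,
                               confidence: float = 0.95) -> int:
    """Subjects needed so that the CI for a prevalence is +/- half_width_percent.

    Uses SE = sqrt(p(100 - p)/n) with p in percent, then rounds up.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * p_percent * (100 - p_percent) / half_width_percent ** 2
    return math.ceil(n)


# Pancreatic cancer example: expected prevalence 3%, 95% CI of +/-0.5%
print(n_for_prevalence_precision(3, 0.5))  # 4472
```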
12. Considerations Needed for Two Independent Groups
13. The 7 ingredients for sample size calculations
- Research question to be answered
- Outcome measure
- Effect size
- Variability & success proportions
- For continuous outcome
- For binary outcome
- Type I error
- Type II error
- Other factors
14. Further explanations of ingredient 1
- Research question to be answered
- Translate the question into a clear hypothesis!
- For example,
- H0: there is no difference between treatment and control
- H1: there is a difference between treatment and control
- Hypothesis → Statistical results → Conclusion
- statistically significant result (that is, p < 0.05) → enough evidence to reject H0 → accept H1
- statistically non-significant result (that is, p > 0.05) → not enough evidence to reject H0
15. Further explanations of ingredient 2
- Outcome measures
- There should be only one primary outcome measure per study!
- You could have a secondary outcome measure, but the sample size calculation (powering) can only be done for the primary outcome
- the study may not have enough power for any results relating to the secondary outcome
- Recall the two types of variables
- Continuous
- Categorical
- If the variable has 2 categories → Binary
16. Further explanations of ingredient 3
- Effect size (d, from the word "difference")
- The magnitude of difference that we are looking for
- Clinically important difference
- For 2 treatment arms
- difference in means if the outcome is continuous
- difference in success proportions if the outcome is binary
- Minimum value worth detecting
- Decide what the minimum worthwhile improvement means by looking at the endpoint and by considering background noise
- Headache? Moderate/severe headache? Migraine?
- Values can be found in previous literature if similar studies have been done, or estimated based on clinical experience, but make sure they are reasonable (remember GIGO: garbage in, garbage out!)
17. Further explanations of ingredient 3
- Effect size (d)
- Example: in a previous study, morbidity of a certain illness under conventional care was known to be 73%
- We are interested in reducing morbidity to 50% (clinically important)
- Therefore the effect size is 23%
- i.e. the difference between these two morbidities
- Example: summarising all the studies with similar settings and characteristics for a specific outcome measure, e.g. pain relief
- The overall response rate on placebo is 32%
- The overall response rate on active treatment is 50%
- The overall estimate of the difference between active and placebo is 18%
- Of all the differences found in these studies, the smallest observed is 12%
- This could be the minimum value worth detecting
18. Further explanations of ingredient 4.1
- Variability (σ, pronounced "sigma")
- For continuous outcomes only!
- The standard deviation (σ) or variance (σ²) represents the spread of the distribution of a continuous variable
- Values can usually be found in previous literature or estimated based on clinical experience, but make sure they are reasonable (GIGO!)
- If a paper reports the standard deviation of the outcome separately for each group or treatment arm, choose the largest or pool the values
- For further details on pooling standard deviations, read the next slide
- Consider carefully if different standard deviations of the outcome were found in different studies
- ask yourself whether the design and study population of those studies were similar to yours
- if so, be conservative and choose the largest reasonable value
- If none can be found in previous literature but the range of the outcome is available, you can divide the range by 6 (remember that mean ± 3SD covers about 99.7% of observations) to estimate the standard deviation
19. Pooled standard deviation
- If several studies with variance estimates are available, it is recommended to obtain an overall estimate of the population variance, the pooled variance estimate $s_p^2$, from the following formula:
- $s_p^2 = \frac{\sum_{i=1}^{k} df_i \, s_i^2}{\sum_{i=1}^{k} df_i}$
- where k is the number of studies, $s_i^2$ is the variance estimate from the ith study and $df_i$ is the degrees of freedom for this variance (the corresponding number of observations in the group minus 1, i.e. $n_i - 1$).
20. Pooled standard deviation
- Example: the following descriptive statistics (number of subjects, mean, standard deviation) of an outcome measure were reported for each treatment arm:
- Treatment A: nA = 83, meanA = 40.98, sA = 22.52
- Treatment B: nB = 87, meanB = 37.89, sB = 19.74
- Using the formula above, the pooled variance ($s_p^2$) and the pooled SD ($s_p$) are
- $s_p^2 = \frac{82 \times 22.52^2 + 86 \times 19.74^2}{82 + 86} = \frac{75097.7}{168} \approx 447.0$, so $s_p = \sqrt{447.0} \approx 21.14$
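The pooling formula generalises to any number of groups or studies. A minimal Python sketch, where `pooled_sd` is a hypothetical helper name and each group is given as an (n, SD) pair:

```python
import math


def pooled_sd(groups):
    """Pooled SD from (n_i, s_i) pairs, weighting each variance by df_i = n_i - 1."""
    weighted_var = sum((n - 1) * s ** 2 for n, s in groups)
    total_df = sum(n - 1 for n, _ in groups)
    return math.sqrt(weighted_var / total_df)


# Treatment A: n = 83, SD = 22.52; Treatment B: n = 87, SD = 19.74
sp = pooled_sd([(83, 22.52), (87, 19.74)])
print(f"pooled variance = {sp ** 2:.1f}, pooled SD = {sp:.2f}")
# pooled variance = 447.0, pooled SD = 21.14
```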
21. Further explanations of ingredient 4.2
- Success proportions (p)
- For binary outcomes only!
- Usually concerning Cured/Not Cured, Alive/Dead, etc.
- You need to know the success proportion of the binary outcome for each group or treatment arm first; these can be found in previous literature or estimated from clinical experience (GIGO!)
- For example, in a table of treatment (A or B) by outcome (Alive or Dead), suppose we are interested in the proportion Alive; then the success proportions for treatments A and B are pA and pB respectively
- Denote by $\bar{p}$ the average success proportion, i.e. $\bar{p} = (p_A + p_B)/2$
- We can use this information to find the effect size and the standard deviation
- The effect size is the difference between the two success proportions, i.e. $p_A - p_B$
- The estimated standard deviation is $\sqrt{\bar{p}(100 - \bar{p})}$, where $\bar{p}$ is a percentage between 0 and 100
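As a sketch of these definitions, the Python helper below (hypothetical name `binary_effect`) takes two percentages and returns the effect size, the estimated standard deviation and their ratio. The example values are the placebo (25%) and active (65%) responses used in the binary two-group example of this session.

```python
import math


def binary_effect(p_a: float, p_b: float):
    """Effect size, estimated SD and standardised effect for two percentages (0-100)."""
    p_bar = (p_a + p_b) / 2                  # average success proportion
    sd = math.sqrt(p_bar * (100 - p_bar))    # estimated standard deviation
    d = p_a - p_b                            # effect size
    return d, sd, abs(d) / sd


d, sd, delta = binary_effect(65, 25)
print(f"d = {d}, sd = {sd:.2f}, delta = {delta:.3f}")
# d = 40, sd = 49.75, delta = 0.804
```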
22. Further explanations of ingredients 5 & 6
- Type I error (α) & Type II error (β)
- You should have heard these mentioned in the Hypothesis Testing session, so this is just a reminder
- Because we are sampling from a population
- uncertainty is introduced
- the quality of the sample will have an impact on our conclusion
- errors do occur
- There are two types of error
- Type I error (α): we observe an effect in our sample that does not exist in the population (the truth)
- e.g. concluding that drinking water leads to cancer
- Type II error (β): we observe nothing in our sample, but an effect does exist in the population (the truth)
- e.g. concluding that smoking doesn't lead to cancer
23. Further explanations of ingredients 5 & 6
- Type I error (α) & Type II error (β)
- Type I error (α): usually set at 5%
- Significance level α → cut-off point for the p-value, i.e. 0.05
24. Further explanations of ingredients 5 & 6
- Type I error (α) & Type II error (β)
- Type II error (β): usually set at 10% or 20%, larger than the Type I error (a Type I error is regarded as society's risk and is therefore held to a stricter level, whereas a Type II error is mainly a financial risk to the pharmaceutical company)
- Power of the study = 1 − Type II error = 1 − β, usually 80% or 90%: the probability of detecting a difference in our study if there is one in the whole population
25. Further explanations of ingredient 7
- Other factors
- The calculated sample size is the number of subjects required at analysis (not the number to start recruiting with) to detect a certain effect size with a specific significance level and power
- Study design
- Response rate: the data-gathering method affects the response rate, e.g. about 50% for a postal questionnaire
- Drop-out rate: when subjects are followed for a long period of time, e.g. in a cohort study, drop-out is usually 20% to 25%
- You can increase the sample size by a suitable percentage to allow for these problems
- For example, increase the calculated sample size (n) by 25%
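One common way to allow for these losses is to divide the calculated n by the fraction expected to remain; with 20% drop-out, dividing by 0.8 is the same as increasing n by 25%. A minimal Python sketch, with `inflate_for_losses` as a hypothetical helper name:

```python
import math


def inflate_for_losses(n_analysed: int, retained_fraction: float) -> int:
    """Number to recruit (or contact) so that about n_analysed remain for analysis.

    retained_fraction is the expected response rate, or 1 - expected drop-out rate.
    """
    # round() guards against floating-point noise such as 124.99999999999999
    return math.ceil(round(n_analysed / retained_fraction, 6))


print(inflate_for_losses(175, 0.8))   # 20% drop-out: 219
print(inflate_for_losses(1360, 0.7))  # 70% response rate: 1943
```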
26. Sample Size Calculations for Two Independent Groups
27. Formula for 2 independent groups
- Of the 7 ingredients, 4 are crucial factors in the actual sample size calculation
- Effect size (d): the size of the difference we want to be able to detect
- Variability (σ) or $\sqrt{\bar{p}(100 - \bar{p})}$: the standard deviation of the continuous outcome, or its estimate for the binary outcome
- Level of significance (α): the risk of a Type I error we will accept
- Power (1 − β): where β is the risk of a Type II error we will accept
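28. Formula for 2 independent groups
- We use these 4 factors to derive a general formula for the sample size for 2 groups with a continuous or binary outcome
- The formula is
- $n \text{ (per group)} = \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}$
- where Δ is the standardised effect size
- i.e. effect size / variability
- $\Delta = d/\sigma$ for a continuous outcome
- $\Delta = \frac{p_A - p_B}{\sqrt{\bar{p}(100 - \bar{p})}}$ for a binary outcome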
29. What is a z-score?
- A z-score is the number of standard deviations above/below the mean: $z = (x - \mu)/\sigma$
30. What are $z_{1-\alpha/2}$ and $z_{1-\beta}$?
- $z_{1-\alpha/2}$ is a value from the Normal distribution relating to the significance level
- If the level of significance is set to 5%, then α = 0.05
- For a 2-sided test, $z_{1-\alpha/2} = z_{0.975} = 1.9600$
- If the level of significance is set to 1%, then α = 0.01
- For a 2-sided test, $z_{1-\alpha/2} = z_{0.995} = 2.5758$
- $z_{1-\beta}$ is a value from the Normal distribution relating to power
- If β is set to 10%, then the power is 90%, so 1 − β = 0.90
- For a 1-sided test, $z_{1-\beta} = z_{0.90} = 1.2816$
- If β is set to 20%, then the power is 80%, so 1 − β = 0.80
- For a 1-sided test, $z_{1-\beta} = z_{0.80} = 0.8416$
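These z-values can be reproduced with the inverse cumulative distribution function of the standard Normal distribution, available in Python's standard library as `statistics.NormalDist().inv_cdf`. The helper names `z_alpha` and `z_beta` are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1


def z_alpha(alpha: float) -> float:
    """z_(1 - alpha/2) for a two-sided test at significance level alpha."""
    return std_normal.inv_cdf(1 - alpha / 2)


def z_beta(power: float) -> float:
    """z_(1 - beta), where power = 1 - beta."""
    return std_normal.inv_cdf(power)


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: z = {z_alpha(alpha):.4f}")
for power in (0.80, 0.90):
    print(f"power = {power}: z = {z_beta(power):.4f}")
```

This prints 1.9600, 2.5758, 0.8416 and 1.2816, matching the values listed above.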
31. Table of z-scores
[Table of z-scores not reproduced in this text]
32. The quick formula
- We can pre-calculate $(z_{1-\alpha/2} + z_{1-\beta})^2$ and call it k, using the relevant z-scores from the table on the previous slide for different combinations of the level of significance α and power 1 − β; the formula then becomes
- $n \text{ (per group)} = 2k/\Delta^2$
- Remember to multiply the calculated sample size (n) by 2 to allow for 2 groups!
- Always round up your final sample size
- where Δ = effect size / variability: $\Delta = d/\sigma$ for a continuous outcome, $\Delta = (p_A - p_B)/\sqrt{\bar{p}(100 - \bar{p})}$ for a binary outcome
33. Even simpler!
- For a 5% significance level and 80% power
- $n = 2 \times (2 \times 7.85)/\Delta^2$
- $\approx 32/\Delta^2$ (total for 2 groups)
- For a 1% significance level and 90% power
- $n = 2 \times (2 \times 14.88)/\Delta^2$
- $\approx 60/\Delta^2$ (total for 2 groups)
- A total sample size of n, split equally between two groups, will have 80% (and 90% respectively) power to detect the standardised effect size Δ, with the test performed at the 5% (and 1% respectively) significance level (two-sided). Note that $\Delta = d/\sigma$, hence the required sample size increases as σ increases, or as d decreases.
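The values of k, and the total-sample multipliers 4k behind the "even simpler" rules, can be computed directly. A minimal Python sketch; `k_value` is a hypothetical helper name and the quantiles come from `statistics.NormalDist`.

```python
from statistics import NormalDist


def k_value(alpha: float, power: float) -> float:
    """k = (z_(1 - alpha/2) + z_(1 - beta))^2 for a two-sided test."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2


for alpha in (0.05, 0.01):
    for power in (0.80, 0.90):
        k = k_value(alpha, power)
        # 4k is the multiplier for the TOTAL sample size: 2 groups x 2k/delta^2
        print(f"alpha = {alpha}, power = {power}: k = {k:.2f}, total = {4 * k:.1f}/delta^2")
```

This gives k = 7.85, 10.51, 11.68 and 14.88. For 5% significance and 80% power the total multiplier is about 31.4, which the rule of thumb rounds up to 32; for 1% and 90% it is about 59.5, rounded up to 60.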
34. The 4 factors & sample size
- Referring to the quick formula, we can predict the effect on the sample size of increasing or decreasing each of the 4 factors
- If the level of significance (α) decreases, e.g. from 5% to 1% → sample size increases
- If the Type II error rate (β) decreases, i.e. power (1 − β) increases, e.g. from 80% to 90% → sample size increases
- If the effect size (d) decreases, e.g. detecting a smaller difference between the 2 groups → sample size increases
- If the variability (σ) decreases, e.g. assuming the outcome measure has a smaller spread → sample size decreases
35. Confidence intervals & sample size
- Recall that the standard error (SE) is used when calculating confidence intervals
- $SE = \sigma/\sqrt{n}$, which involves n
- 95% CI = point estimate ± 1.96 × SE
- If you want to be more confident in your point estimate without widening the interval (e.g. moving from a 95% to a 99% CI), or want a narrower confidence interval
- → increase the sample size
36. With continuous outcome
- Example: difference between means
- In a trial to compare the effects of two oral contraceptives on blood pressure (over one year), it is anticipated that one drug will increase diastolic blood pressure by 3 mmHg, and the other will not change it. The standard deviation (of the changes in blood pressure) in both groups is expected to be 10 mmHg. How many patients are required for this difference to be significant at the 5% level (with 80% power)?
- $\Delta = 3/10 = 0.3$, so $n \text{ (per group)} = 2 \times 7.85/0.3^2 = 174.4$, which rounds up to 175 women per group
- and a total of 350 women need to be recruited.
37. With binary outcome
- Example: difference between proportions
- In a randomised clinical trial, the placebo response is anticipated to be 25%, and the active treatment response 65%. How many patients are needed if a two-sided test at the 1% level is planned, and a power of 90% is required?
- $\bar{p} = (25 + 65)/2 = 45$, so $\Delta = 40/\sqrt{45 \times 55} = 40/49.75 = 0.804$
- $n \text{ (per group)} = 2 \times 14.88/0.804^2 = 46.0$, which rounds up to 47
- so n = 47 per group and a total of 94 patients are needed for this study.
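Both worked examples can be reproduced with the quick formula in a few lines of Python. This sketch uses the rounded k values from the slides; the names `K`, `n_per_group` and `binary_delta` are hypothetical.

```python
import math

# k = (z_(1-alpha/2) + z_(1-beta))^2, rounded to 2 decimals as in the quick formula
K = {(0.05, 0.80): 7.85, (0.05, 0.90): 10.51,
     (0.01, 0.80): 11.68, (0.01, 0.90): 14.88}


def n_per_group(delta: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Quick formula n (per group) = 2k / delta^2, always rounded up."""
    return math.ceil(2 * K[(alpha, power)] / delta ** 2)


def binary_delta(p_a: float, p_b: float) -> float:
    """Standardised effect for two percentages: |pA - pB| / sqrt(pbar(100 - pbar))."""
    p_bar = (p_a + p_b) / 2
    return abs(p_a - p_b) / math.sqrt(p_bar * (100 - p_bar))


# Continuous example: d = 3 mmHg, sigma = 10 mmHg, 5% level, 80% power
n_bp = n_per_group(3 / 10)
# Binary example: 25% vs 65%, 1% level, 90% power
n_trial = n_per_group(binary_delta(25, 65), alpha=0.01, power=0.90)
print(n_bp, 2 * n_bp)        # 175 350
print(n_trial, 2 * n_trial)  # 47 94
```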
38. Fixing sample size
- The maximum sample size is often fixed by practical constraints.
- The research question could then become
- What power will I have to detect a (specified) clinically important difference?
- What is the smallest difference I will be able to detect?
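Both questions can be answered approximately by rearranging the same formula, $n = 2(z_{1-\alpha/2} + z_{1-\beta})^2/\Delta^2$. The Python sketch below is an illustration under that normal approximation (it ignores the negligible chance of rejecting in the wrong direction); `power_for_n` and `smallest_delta` are hypothetical helper names.

```python
import math
from statistics import NormalDist

N01 = NormalDist()


def power_for_n(n_per_group: int, delta: float, alpha: float = 0.05) -> float:
    """Approximate power for a fixed n per group (quick formula rearranged)."""
    z_b = math.sqrt(n_per_group * delta ** 2 / 2) - N01.inv_cdf(1 - alpha / 2)
    return N01.cdf(z_b)


def smallest_delta(n_per_group: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest standardised effect detectable with the given n per group."""
    return (N01.inv_cdf(1 - alpha / 2) + N01.inv_cdf(power)) * math.sqrt(2 / n_per_group)


# Blood pressure example with n fixed at 175 per group and sigma = 10 mmHg
print(round(power_for_n(175, 3 / 10), 3))
print(round(smallest_delta(175) * 10, 2))  # smallest difference in mmHg
```

With 175 women per group this gives a power of about 0.80 for a 3 mmHg difference, and a smallest detectable difference of about 3 mmHg, consistent with the earlier calculation.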
39. Other methods
- Other sample sizing methods are available for
- equivalence studies, where the aim is to show that two groups do not differ by more than a specified amount
- matched case-control studies with a binary outcome (exposed or unexposed), which require specification of the anticipated odds ratio and the proportion of pairs with differing outcomes
- crossover studies with a binary outcome (success or failure) (essentially the same as above)
40. Useful resources
- Machin D, Campbell M, Fayers P, Pinol A. Sample size tables for clinical studies. 2nd Ed. 1997. Blackwell Science.
- Sampsize included with book.
- St George's, University of London (Statistics Guide for Research Grant Applicants)
- http://www.sgul.ac.uk/depts/chs/chs_research/stat_guide/size.cfm
- PS: Power and Sample Size
- http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize
- Java applets for power and sample size
- http://www.stat.uiowa.edu/~rlenth/Power/
- the web browser on your PC must be able to run Java applets (version 1.1 or higher); Java can be downloaded from java.sun.com
41. Useful commercial resources
- nQuery Advisor (Statistical Solutions)
- http://www.statsol.ie/nquery/nquery.htm
- PASS (NCSS Inc., 7-day free trial)
- http://www.ncss.com/pass.html
- SamplePower (SPSS Inc., 10-day free trial)
- http://www.spss.com/spower/
- Stata: help sampsi
- http://www.stata.com/
- http://www.stata.com/help.cgi?sampsi
42. Free software: PS
- http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize
43. Two groups with continuous outcomes using PS
44. Two groups with continuous outcomes using PS
- Example: difference between means
- In a trial to compare the effects of two oral contraceptives on blood pressure (over one year), it is anticipated that one drug will increase diastolic blood pressure by 3 mmHg, and the other will not change it. The standard deviation (of the changes in blood pressure) in both groups is expected to be 10 mmHg. How many patients are required for this difference to be significant at the 5% level (with 80% power)?
- Recall from our calculation with the quick formula that we would need 175 women per group, a total of 350 women to be recruited.
45. Two groups with continuous outcomes using PS
46. Two groups with binary outcomes using PS
47. Two groups with binary outcomes using PS
- Example: difference between proportions
- In a randomised clinical trial, the placebo response is anticipated to be 25%, and the active treatment response 65%. How many patients are needed if a two-sided test at the 1% level is planned, and a power of 90% is required?
- Recall from our calculation with the quick formula that we would need 47 patients per group, a total of 94 patients to be recruited.
48. Two groups with binary outcomes using PS
49. Summary
- You should now be able to:
- understand why determining the sample size is important
- appreciate some statistical concepts
- be aware of the considerations needed to perform sample size calculations
- perform a sample size calculation to a given precision with a continuous or binary outcome
- perform a sample size calculation based on two independent groups with a continuous or binary outcome
- be aware of useful resources on this topic
- use PS to assist with calculating sample size
50. References
- Kirkwood BR, Sterne JAC. Essential Medical Statistics, 2nd Edition. Oxford: Blackwell; 2004. (Chapters 4, 8 and 35)
- Bland M. An Introduction to Medical Statistics, 3rd Edition. Oxford: Oxford University Press; 2000. (Chapters 8, 9 and 18)
- Altman DG. Practical Statistics for Medical Research. London: Chapman & Hall; 1999. (Chapters 8 and 15)
- Machin D, Campbell M, Fayers P, Pinol A. Sample Size Tables for Clinical Studies, 2nd Edition. Oxford: Blackwell Science; 1997.
51. Exercises
52. Exercise 1
- Suppose we are trying to estimate the prevalence of a disease in a country, which we suspect to be 5%, and we would like to estimate it to within ±1% of the true value (with 95% confidence). How many patients are required?
53. Exercise 2
- A psychologist wishes to test the IQ of a certain population. His null hypothesis is that the mean IQ is 100, and he wishes to be able to detect a fairly small difference. He assumes a standard deviation of 0.2 and wants a 95% confidence interval of ±0.028, so that if he gets a non-significant result from his analysis, he can be sure the mean IQ of this population lies very close to 100. How many subjects should he recruit?
54. Exercise 3
- An investigator compares the change in blood pressure due to placebo with that due to a drug. The investigator is looking for a difference between groups of 5 mmHg, and the between-subject SD (σ) is 10 mmHg. Assuming a two-sided test at the 5% level (α = 0.05) and a power of 90% (1 − β = 0.9), how many patients should be recruited?
55. Exercise 4
- IgA nephropathy (IgAN) is a world-wide disease that causes end-stage renal failure (ESRF) in 15% to 20% of patients within 10 years. No specific treatment has yet been established, but many approaches have been investigated.
- A new two-arm parallel-group RCT is going to randomise patients to either an immunosuppressive agent (steroid) group or a placebo group. Assuming 15% of patients in the placebo group will develop ESRF within 10 years, how many patients are required to detect an anticipated reduction in ESRF of 5% (from 15% to 10%) in the steroid group, with a 5% significance level and 90% power?
- The Cochrane Database of Systematic Reviews 2005, Issue 2. Samuels JA, Strippoli GFM, Craig JC, Schena FP, Molony DA. Immunosuppressive agents for treating IgA nephropathy.
- The Cochrane Database of Systematic Reviews 2003, Issue 4. Art. No. CD003965. DOI: 10.1002/14651858.CD003965.
56. Exercise 5
- Use PS to calculate the sample size for the following exercise.
- In a randomised clinical trial, we are interested in detecting a difference of at least 15% between two proportions. We expect the two success proportions to be 60% and 75%. How many patients are needed if we are going to perform a two-sided test at the 5% level with a power of 80%?
57. Exercise 6
- Use PS to calculate the sample size for the following exercise.
- We are developing a study to investigate the relationship between constant exposure to stress (within a month) and blood pressure in men aged 25-34. A pilot study found that the mean blood pressure of the group who were constantly under stress was 132.86, while that of the group who were not constantly under stress was 127.44. The common standard deviation is 16.79. How many men are required for this larger-scale study, given a 5% significance level and 90% power?
58. Homework
59. Exercise 7
- A survey is being planned to estimate the prevalence of secondary infertility amongst couples aged 20-45. The investigators expect the prevalence to be 10%, and would like to estimate it to within ±5% of the true value (with 90% confidence). How many couples are required?
60. Exercise 8
- Suppose we want to sample a stable process that deposits a 500 Angstrom film on a semiconductor wafer, in order to determine the process mean so that we can set up a control chart on the process. We want to estimate the mean to within ±10 Angstroms of the true mean with 95% confidence. Our initial guess is that the process standard deviation is about 20 Angstroms. How many samples do we need to take?
61. Exercise 9
- A previous small study investigated the total cost of 10 specific items of fruit and vegetables sold in 3 supermarkets, compared with the cost of the same items sold in 3 local corner shops.
- On average, the total cost of the 10 items in supermarkets was 5.50. It was suggested that the total cost of those items in the local corner shops was 1.6 times that in the supermarkets, with a common standard deviation of 1.65.
- Use the suggested formula and PS to find how many supermarkets and local corner shops are required (assuming equal groups) to conduct a similar study investigating a difference in cost of this size, with 90% power and a 1% significance level.
62. Exercise 10
- In a pilot study, it was shown that adult Cypriots who had followed a traditional Cypriot Mediterranean type of diet had a 27% risk of hypertension, 7% lower than those who had followed a Western diet.
- Use the suggested formula and PS to find the number of subjects required (assuming equal groups) if we conduct a similar but larger-scale study to evaluate a difference in the risk of hypertension of this magnitude, with 80% power and a 5% significance level.
- A postal questionnaire will be sent to the target population. Assuming a high response rate of 70%, how many questionnaires should be sent out?
63. Solutions
64. Exercise 1
- Solution
- Problem type: precision with binary outcome
- $n = 1.96^2 \times 5 \times 95/1^2 = 1824.8$, which rounds up to 1825
- Thus, 1825 patients are required in the study.
65. Exercise 2
- Solution
- Problem type: precision with continuous outcome
- $n = (1.96 \times 0.2/0.028)^2 = 14^2 = 196$
- Thus, he should recruit 196 subjects.
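66. Exercise 3
- Solution
- Problem type: two independent groups with continuous outcome
- $\Delta = 5/10 = 0.5$, so $n \text{ (per group)} = 2 \times 10.51/0.5^2 = 84.1$, which rounds up to 85
- Thus, 85 patients per treatment are required, i.e. a total of 170 patients need to be recruited.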
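67. Exercise 4
- Solution
- Problem type: two independent groups with binary outcome
- $\bar{p} = (15 + 10)/2 = 12.5$, so $\Delta = 5/\sqrt{12.5 \times 87.5} = 5/33.07 = 0.151$
- $n \text{ (per group)} = 2 \times 10.51 \times (12.5 \times 87.5)/5^2 = 919.6$, which rounds up to 920
- Therefore, 920 patients per group and 1840 patients in total are needed for the new trial.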
68. Exercise 5
- Solution
- Problem type: two independent groups with binary outcome
- α = 0.05
- power = 0.80
- p0 = 0.6
- p1 = 0.75
- m = 1
- n (per group) = 152
69. Exercise 6
- Solution
- Problem type: two independent groups with continuous outcome
- α = 0.05
- power = 0.90
- δ = 5.42
- σ = 16.79
- m = 1
- n (per group) = 203
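70. Exercise 7
- Solution
- Problem type: precision with binary outcome
- For 90% confidence, $z_{0.95} = 1.6449$, so $n = 1.6449^2 \times 10 \times 90/5^2 = 97.4$, which rounds up to 98
- Thus, 98 couples are required in the study.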
71. Exercise 8
- Solution
- Problem type: precision with continuous outcome
- $n = (1.96 \times 20/10)^2 = 15.4$, which rounds up to 16
- Therefore, we need to take at least 16 samples from this process.
72. Exercise 9
- Solution
- Problem type: two independent groups with continuous outcome
- The average total cost of fruit and vegetables
- in supermarkets is 5.50
- in local corner shops is 5.50 × 1.6 = 8.8
- the difference in total cost = 3.3
- common standard deviation = 1.65
- To detect such a difference with 90% power and a 1% significance level
- $n \text{ (per group)} = (2 \times 14.88)/(3.3/1.65)^2$
- $n \text{ (per group)} = 7.44 \rightarrow 8$
- Therefore we will require 8 supermarkets and 8 local corner shops.
73. Exercise 9
- Solution
- Problem type: two independent groups with continuous outcome
- α = 0.01
- power = 0.90
- δ = 3.3
- σ = 1.65
- m = 1
- n (per group) = 9
74. Exercise 10
- Solution
- Problem type: two independent groups with binary outcome
- Risk of hypertension among those who follow
- a traditional Cypriot Mediterranean diet is 27%, i.e. pA = 27%
- a Western diet is 34%, i.e. pB = 27% + 7% = 34%
- the difference in proportions = 7%
- $\bar{p} = (27 + 34)/2 = 30.5\%$
- estimated standard deviation = $\sqrt{30.5 \times (100 - 30.5)} = 46.04$
- To detect such a difference with 80% power and a 5% significance level
- $n \text{ (per group)} = (2 \times 7.85)/(7/46.04)^2$
- $n \text{ (per group)} = 679.16 \rightarrow 680$
- Therefore, in total, we will require 1360 subjects for both groups.
- With a 70% response rate, we should send out 1360/0.7 = 1942.9 → 1943 questionnaires.
75. Exercise 10
- Solution
- Problem type: two independent groups with binary outcome
- α = 0.05
- power = 0.80
- p0 = 0.27
- p1 = 0.34
- m = 1
- n (per group) = 678
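The formula-based answers to the two-group exercises can be checked with a short Python sketch using the rounded k values of the quick formula. The helper names are hypothetical. PS gives slightly different numbers (e.g. 9 rather than 8 per group for Exercise 9), presumably because it uses different approximations from the quick formula.

```python
import math

# Rounded k = (z_(1-alpha/2) + z_(1-beta))^2 values used by the quick formula
K = {(0.05, 0.80): 7.85, (0.05, 0.90): 10.51,
     (0.01, 0.80): 11.68, (0.01, 0.90): 14.88}


def n_continuous(d, sd, alpha, power):
    """n (per group) for a difference in means d with common SD sd."""
    return math.ceil(2 * K[(alpha, power)] / (d / sd) ** 2)


def n_binary(p_a, p_b, alpha, power):
    """n (per group) for two percentages, using SD = sqrt(pbar(100 - pbar))."""
    p_bar = (p_a + p_b) / 2
    delta = abs(p_a - p_b) / math.sqrt(p_bar * (100 - p_bar))
    return math.ceil(2 * K[(alpha, power)] / delta ** 2)


answers = {
    "Exercise 3": n_continuous(5, 10, 0.05, 0.90),
    "Exercise 4": n_binary(15, 10, 0.05, 0.90),
    "Exercise 9": n_continuous(3.3, 1.65, 0.01, 0.90),  # 8.80 - 5.50 = 3.3
    "Exercise 10": n_binary(27, 34, 0.05, 0.80),
}
for name, n in answers.items():
    print(f"{name}: {n} per group, {2 * n} in total")

# Exercise 10 also allows for a 70% postal response rate
print(math.ceil(2 * answers["Exercise 10"] / 0.7))
```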