Title: Clinical Investigation and Outcomes Research Statistical Issues in Designing Clinical Research
1Clinical Investigation and Outcomes Research
Statistical Issues in Designing Clinical Research
- Marcia A. Testa, MPH, PhD
- Department of Biostatistics
- Harvard School of Public Health
2Objective of Presentation
- Introduce statistical issues that are critical for designing a clinical research study and developing a research protocol, with a special focus on:
- Power and sample size
- Readings: Textbook, Designing Clinical Research, Chapter 6, "Estimating Sample Size and Power: Applications and Examples," and Chapter 19, "Writing and Funding a Research Proposal."
3Research Proposal
- Carefully planning the analytical and statistical methods is critical to any clinical research study.
- The main elements of a research proposal are outlined in Table 19.1 of your textbook.
- Two very important components of the Research Methods section are Measurements and Statistical Issues.
4Measurement and Statistical Components of the
Research Proposal
- Measurements: you first must define
- Main predictor/independent variables (intervention, if an experiment)
- Potential confounding variables
- Outcome/dependent variables
- Statistical Issues: you should outline
- Approach to statistical analyses
- Hypothesis, sample size and power
5Power and Sample Size
- Depends upon
- measurements and study hypotheses
- statistical test used on primary outcome
- study design
- variability and precision of the dependent measure
- alpha (type 1 error)
- effect size
- number of hypotheses that you want to test
6Types of Errors
Confidence
7What is power analysis?
- Statistical power
- the probability of correctly identifying a trend or effect
- (Being correct that there is a trend or effect)
- Statistical confidence
- the probability of not identifying a false trend or effect (false alarm)
- (Being correct that there is no trend)
8Why is power analysis useful in research planning?
- Clinical research is primarily concerned with detecting improvements or worsening due to interventions or risk factors.
- Power analysis answers the question: "How likely is my statistical test to detect important clinical effects given my research design?"
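The question above can also be explored by simulation. The following is a minimal sketch, not taken from the slides: it assumes normally distributed outcomes in two equal-sized groups and uses a large-sample z criterion in place of an exact t-test. The function name and all numeric inputs are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

def simulated_power(n_per_group, true_diff, sd, alpha=0.05, n_sims=2000, seed=1):
    """Estimate power as the fraction of simulated studies that reject H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_sims):
        # Simulate one study: control group and a group shifted by true_diff.
        a = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
        b = [rng.gauss(true_diff, sd) for _ in range(n_per_group)]
        se = (stdev(a) ** 2 / n_per_group + stdev(b) ** 2 / n_per_group) ** 0.5
        z = (mean(b) - mean(a)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_sims

# Hypothetical design: 64 per group, true difference of half an SD.
print(simulated_power(64, 0.5, 1.0))
```

Setting `true_diff` to 0 gives the false-alarm rate instead, which should sit near alpha.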
9Elements of power analysis
- Variability (stochastic noise in the data)
- Sample Size (accumulated information)
- time horizon (e.g., survival analysis)
- sampling frequency
- replication
- Confidence level/statistical test
Beyond our control
Within our control
10Dealing with Variability
- Variability is often a barrier to detection
- Minimizing variability is often the goal
- Choose variables with a high signal to noise ratio
- Caution: these variables may be less sensitive to change
- Sample within a more homogeneous population
- Caution: greater homogeneity often means we are limiting the inferences we can make. At the extreme we would have highly reliable results that are for the most part clinically irrelevant
11The Balancing of Cost and Power
High Cost
Low Cost
12Limitations of power analysis
- Power analysis is only as good as the information you provide
- How appropriate is the statistical test?
- How accurate are estimates of variability?
- Power analysis can't tell you
- How much power is enough?
- What's a meaningful change?
13How much power is enough?
- There is no universal standard
- What is more important?
- Not missing a trend?
- Power > Confidence
- Reporting a false trend?
- Confidence > Power
- Usual range for confidence and power: 80%-95%
14What's a meaningful change?
Example: You want to be able to detect the
withdrawal (decline in participation) from a diet
and exercise program under usual care.
effect size
15What's a meaningful change?
Power = 80% for decline = -13%
effect size
16What's a meaningful change?
Power = 60% for decline = -10%
effect size
17Is a 17% annual withdrawal rate clinically meaningful?
- Example: Start with 100 patients
Year No. of individuals
1 100
2 83
3 69
4 57
5 47
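The table follows from compounding a 17% annual withdrawal rate on the starting cohort. A small sketch of that arithmetic, assuming withdrawal is applied once per year and counts are rounded to whole patients (the function name is illustrative):

```python
def remaining_by_year(start, annual_withdrawal, years):
    """Patients remaining at the start of each year under a constant withdrawal rate."""
    return [round(start * (1 - annual_withdrawal) ** k) for k in range(years)]

cohort = remaining_by_year(100, 0.17, 5)
for year, n in enumerate(cohort, start=1):
    print(year, n)
```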
18What is a meaningful change?
- Most people would concur that a withdrawal of 17% per year from a diet and exercise program is large enough to be considered clinically meaningful.
- However, how meaningful are smaller withdrawal rates (13%, 10%, 5%, 1%)?
- This cannot be answered using a formula.
- The answer will depend on the research objectives, the clinical objectives, and the research budget.
19Step 1. Choose Statistical Hypothesis
- Set up Null Hypotheses: Examples
- 1. Compare sample group mean to a known value μ0
- Mean of group = Known population mean
- (H0: μ = μ0) vs (HA: μ ≠ μ0)
- 2. Compare two sample group means
- Mean Group (1) = Mean Group (2)
- (H0: μ1 = μ2) vs (HA: μ1 ≠ μ2)
- Note: because you are testing "not equal" in the alternative hypothesis (≠), you have selected a two-tailed test.
20Step 2. Choose Statistical Test
- There are many statistical tests that are used in clinical research; however, for this presentation we will restrict ourselves to the following
21Step 3. Choose Alpha Level and Effect Size
- Alpha = 0.05: probability of rejecting the null when the null is true = 5%
- You will conclude that there was a difference 5% of the time when there really was no difference
- You would like to detect a difference of X units or higher (effect size) in one group as compared to the other
22Step 4. Need SD of the Dependent Variable
- Use historical data if available
- Use the sample data from a feasibility study (e.g., 15 subjects)
- If you have no data to serve as a reference, you have to make an educated guess. Here's a trick if your data are mound shaped and approximately normal.
- Choose a representative low and high from your clinical experience, take the difference and divide by 4.
- ((high) - (low))/4 = SD estimate
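The divide-by-4 trick works because, for a roughly normal variable, about 95% of values fall within 2 SDs of the mean, so a representative low-to-high range spans about 4 SDs. A tiny sketch follows; the low and high values are hypothetical and chosen only to illustrate the arithmetic.

```python
def sd_from_range(low, high):
    """Rough SD guess for mound-shaped data: a typical range spans about 4 SDs."""
    return (high - low) / 4

# Hypothetical "representative" low and high values from clinical experience.
print(sd_from_range(70, 200))
```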
23Step 5. Calculate a Standardized Effect Size
- Effect size/standard deviation = standardized effect size
- Choose the β error
- Remember: Power = 1 - β, so a type 2 error of 0.20 yields a power of 0.80
- β is the probability of failing to reject the null hypothesis when the null hypothesis is false (concluding no difference when there really is a difference); power is the probability of correctly rejecting it.
24Power and Sample Size Example
- Continuous Glucose Monitoring Diabetes Study
25CGM Study: Two-group Comparison
- How many subjects do we need to be able to detect a difference in CGM mean daily glucose between patients on Lantus and Apidra insulin versus Premix analogue insulin?
- Before you can answer this question, you must gather some more information.
26Break down the problem
- CGM glucose at Week 12 = dependent variable of interest
- Want to compare two groups; each group has different patients
- Simple independent t-test
- Need SD of daily glucose
- Need to specify how large an effect you want to detect
27Data from feasibility study
Week 12 Data
28CGM Study: Two-group Comparison
- Compare Lantus + Apidra to Premix at 12 weeks
- Feasibility data available on 15 patients
- Independent t-test will be used
- Alpha = 0.05, beta = 0.20, 2-tailed test
- Power = 0.80
- Null: Mean (L + A) = Mean (Premix)
- (H0: μ1 = μ2) vs (HA: μ1 ≠ μ2)
29CGM Study: Two-group Comparison
- SD from 15-patient feasibility study = 33 mg/dL
30Estimating Sample Size of CGM Study
- Alpha = 0.05 for the 2-sided test (equivalent to 0.025 for a 1-sided test)
- Beta = 0.20, hence power = 0.80
- Clinically meaningful effect = 10 mg/dL difference (based upon clinical judgement)
- SD of CGM glucose = 33 mg/dL (from feasibility study)
- Standardized effect = 10/33 = 0.30
- Check Appendix 6A in textbook for power
- Table 6A says you need 176 subjects per treatment group for a total of 352 subjects.
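The table lookup can be cross-checked with the standard normal-approximation formula for two equal groups, n per group = 2(z_alpha/2 + z_beta)^2 / ES^2. The sketch below is not the textbook's method, and the function name is illustrative. It returns 175 for a standardized effect of 0.30, one fewer than the table's 176, presumably because the table allows for the t distribution.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_two_sample(effect_size, alpha=0.05, power=0.80):
    """Subjects per group for a two-sided, two-sample comparison of means
    (normal approximation; effect_size is the difference divided by the SD)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # 1.96 for two-sided alpha = 0.05
    z_beta = z.inv_cdf(power)           # 0.84 for power = 0.80
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

print(n_per_group_two_sample(0.30))
```

Because the sample size scales with 1/ES^2, halving the detectable effect roughly quadruples the required number of subjects.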
31http://www.epibiostat.ucsf.edu/biostat/sampsize.html
This is a directory of where you can find sample size and power programs
32Useful Power Calculator Website
http://www.stat.uiowa.edu/~rlenth/Power/
33Online Power/Sample Size
Power = 0.9, detect ES = 0.35 (11.6 mg/dL): N = 175 per group
Power = 0.8, detect ES = 0.3 (10 mg/dL): N = 175 per group
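These two results describe the same design (175 per group) at two different effect sizes. A rough check, using a normal approximation rather than whatever the online calculator computes internally, is sketched below; the function name is illustrative and the negligible opposite-tail rejection probability is ignored.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    # Expected test statistic under the alternative, minus the critical value.
    return z.cdf(effect_size * sqrt(n_per_group / 2) - z_alpha)

print(round(power_two_sample(0.30, 175), 2))
print(round(power_two_sample(0.35, 175), 2))
```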
34Online Power/Sample Size
Power = 0.8, detect ES = 0.5 (16.5 mg/dL): Sample size = 64/group
Power = 0.8, detect ES = 1.57 (52 mg/dL): Sample size N1 = 7, N2 = 8
35CGM Study: Paired Comparison
- Useful for longitudinal assessments
- CGM Study: You want to detect a decrease between Week 12 and Week 24 of 10 mg/dL
- You only have one group of patients, but they are measured on two separate occasions (Week 12 and Week 24).
36Feasibility study (15 patients): What is the mean glucose parameter for the subjects at Week 12 versus Week 24? For simplicity, we are going to use the single-value summary, mean glucose level, at Wk 12 and Wk 24.
Wk 0 Wk 12 Wk 24
37Power and Sample Size for Paired t-test
Power = 0.8, detect ES = 0.30: Need 92 subjects, or pairs of (Wk 12 and Wk 24) data. Remember, with two independent groups we needed 175 subjects per group for a total of 350 subjects. When patients serve as their own control, you need fewer subjects to detect an equivalent effect size (ES) with the same power.
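The saving can be seen in the normal-approximation formula for a paired design, n pairs = (z_alpha/2 + z_beta)^2 / ES^2, which lacks the factor of 2 in the two-group formula. In the sketch below the ES is assumed to be the mean change divided by the SD of the within-patient differences. It gives 88 pairs, a little below the 92 quoted above; the gap presumably reflects the t-distribution correction and the calculator's settings.

```python
from math import ceil
from statistics import NormalDist

def n_pairs_paired(effect_size, alpha=0.05, power=0.80):
    """Number of pairs for a two-sided paired comparison (normal approximation).
    effect_size = mean difference / SD of the paired differences (assumed)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil((z_alpha + z_beta) ** 2 / effect_size ** 2)

print(n_pairs_paired(0.30))
```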
38HRV Study: Correlation and Multiple Regression
- Single-Group Study
- Session 1, Signal 1 → HRV
- Session 1, Signal 2 → BP
- Demographic variables: Age, Gender
- Clinical characteristics: Disease Status
- Suppose you want to look at associations between HRV, BP, demographic and clinical characteristics → use the bivariate correlation coefficient for 2 variables, or multiple regression R² for multiple predictors.
39Power and Sample Size for Correlations (H0: r = 0)
Power = 0.97, r = 0.4, ES R² = 0.16, Sample size = 85
Power = 0.80, r = 0.3, ES R² = 0.09, Sample size = 85
Only 1 regressor or predictor
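Both single-predictor results can be reproduced with the Fisher z approximation for testing H0: r = 0. This is a sketch under that approximation, not necessarily the method the online calculator uses; the function names are illustrative.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist

Z = NormalDist()

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Sample size to detect correlation r (two-sided), via Fisher's z transform."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)

def power_for_correlation(r, n, alpha=0.05):
    """Approximate power to detect correlation r with n subjects (two-sided)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(atanh(r) * sqrt(n - 3) - z_alpha)

print(n_for_correlation(0.3))                    # subjects needed for r = 0.3
print(round(power_for_correlation(0.4, 85), 2))  # power for r = 0.4 with n = 85
```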
40Power and Sample Size for Correlations (H0: r = 0)
Power = 0.80, r = 0.3, ES R² = 0.09, Sample size = 177, if number of predictor variables = 10
Power = 0.80, r = 0.3, ES R² = 0.09, Sample size = 139, if number of predictor variables = 5
41Power and Sample Size for Test of Two Proportions
You want to detect a difference between two proportions. Example: How many patients do you need in each group to detect a difference in the proportion of patients who adhere to diet and exercise at the end of 5 years?
Old Program = 0.5 Adhere, New Program = 0.7 Adhere
Alpha = 0.05, Power = 0.8. You will need 103 individuals in each group.
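The figure of 103 per group is consistent with the usual normal-approximation formula for two proportions once a continuity correction is applied; without the correction the same formula gives about 93. The sketch below assumes a two-sided test and equal group sizes. The function name is illustrative, and it is an assumption that the slide's calculator used this correction.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80, continuity=True):
    """Subjects per group to detect p1 vs p2 (two-sided, equal group sizes)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    diff = abs(p1 - p2)
    # Uncorrected sample size from the pooled and unpooled variances.
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    if continuity:
        # Continuity-corrected version of the same sample size.
        n = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return ceil(n)

print(n_per_group_two_proportions(0.5, 0.7))
```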
42Final Points
- Design your study such that you will have a sufficient number of subjects to be able to detect the effects that are clinically meaningful (high power).
- If you have a limited budget, and you cannot afford to increase your sample size to the necessary levels, and lowering the variability is not feasible, you should consider alternative designs and hypotheses rather than proceeding with a study design with low power.