Title: STATISTICS 542 Introduction to Clinical Trials SAMPLE SIZE ISSUES
1. STATISTICS 542 Introduction to Clinical Trials
SAMPLE SIZE ISSUES
- Ref: Lachin JM. Controlled Clinical Trials 2:93-113, 1981.
2. Sample Size Issues
- Fundamental point: the trial must have sufficient statistical power to detect differences of clinical interest
- A high proportion of published negative trials do not have adequate power (Freiman et al, NEJM, 1978)
- 50/71 could have missed a 50% benefit
3. Example: How Many Subjects?
- Compare a new treatment (T) with a control (C)
- Previous data suggest a control failure rate (PC) of 40%
- Investigator believes treatment can reduce PC by 25%, i.e. PT = .30, PC = .40
- N = number of subjects per group?
4.
- Estimates are only approximate
- Uncertain assumptions
- Over-optimism about treatment
- Healthy screening effect
- Need a series of estimates
- Try various assumptions
- Must pick the most reasonable
- Be conservative, yet reasonable
5. Statistical Considerations
- Null Hypothesis (H0): No difference in response exists between treatment and control groups
- Alternative Hypothesis (HA): A difference of a specified amount (Δ) exists between treatment and control
- Significance Level (α) = Type I Error: The probability of rejecting H0 given that H0 is true
- Power (1 - β), where β = Type II Error: The probability of rejecting H0 given that H0 is not true
6. Standard Normal Distribution
Ref: Brown & Hollander. Statistics: A Biomedical Introduction. John Wiley & Sons, 1977.
7. Standard Normal Table
8. Distribution of Sample Means (1)
9. Distribution of Sample Means (2)
10. Distribution of Sample Means (3)
11. Distribution of Sample Means (4)
12. Test Statistics
13. Distribution of Test Statistics
- Many test statistics have this common form
- Testing a population parameter (e.g. a difference in means)
- T = sample estimate of the population parameter
- Then Z = [T - E(T)] / √V(T)
- Z has a Normal(0,1) distribution for large sample sizes
14.
- If the statistic z is large enough (e.g. falls into the red area of the scale), we believe this result is too large to have come from a distribution with mean 0 (i.e. PC - PT = 0)
- Thus we reject H0: PC - PT = 0, claiming that there is at most a 5% chance this result could have come from a distribution with no difference
15. Normal Distribution
16. Two Groups
(two equivalent forms of the test statistic; formulas not transcribed)
17. Test of Hypothesis
- Two-sided vs. one-sided
- e.g. two-sided H0: PT = PC; one-sided HA: PT < PC
- Classic test: zα = critical value; if z > zα, reject H0
- Two-sided: α = .05, zα = 1.96; one-sided: α = .05, zα = 1.645
- where z = test statistic
- Recommend zα be the same value in both cases (e.g. 1.96)
- i.e. two-sided α = .05 or one-sided α = .025, so zα = 1.96 either way
18. Typical Design Assumptions (1)
- 1. α = .05, .025, .01
- 2. Power = .80, .90
- Should be at least .80 for design
- 3. Δ = smallest difference we hope to detect
- e.g. Δ = PC - PT = .40 - .30 = .10, a 25% reduction!
19. Typical Design Assumptions (2)
- Critical values for two-sided significance level: α = .05 → Zα = 1.96; α = .025 → Zα = 2.24; α = .01 → Zα = 2.58
- Critical values for power: 1 - β = .80 → Zβ = 0.842; 1 - β = .90 → Zβ = 1.282
20. Sample Size Exercise
- How many do I need?
- Next question: what's the question?
- Reason: the sample size depends on the outcome being measured and the method of analysis to be used
21. Simple Case - Binomial
- 1. H0: PC = PT
- 2. Test statistic (normal approximation)
- 3. Sample size
- Assume NT = NC = N
- HA: Δ = PC - PT
22. Sample Size Formula (1): Two Proportions
- Simple case
- Zα = constant associated with α
- P(|Z| > Zα) = α (two-sided!)
- (e.g. α = .05, Zα = 1.96)
- Zβ = constant associated with 1 - β
- P(Z < Zβ) = 1 - β
- (e.g. 1 - β = .90, Zβ = 1.282)
- Solve for Zβ (→ 1 - β) or Δ
23. Sample Size Formula (2): Two Proportions
- Zα = constant associated with α
- P(|Z| > Zα) = α (two-sided!)
- (e.g. α = .05, Zα = 1.96)
- Zβ = constant associated with 1 - β
- P(Z < Zβ) = 1 - β
- (e.g. 1 - β = .90, Zβ = 1.282)
24. Sample Size Formula
- Power: solve for Zβ → 1 - β
- Difference detected: solve for Δ
25. Simple Example (1)
- H0: PC = PT
- HA: PC = .40, PT = .30
- Δ = .40 - .30 = .10
- Assume:
- α = .05, Zα = 1.96 (two-sided)
- 1 - β = .90, Zβ = 1.282
- p̄ = (.40 + .30)/2 = .35
26. Simple Example (2)
- Thus:
- a. N = 476, 2N = 952
- b. 2N = 956, N = 478
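As a numerical check, both versions of the two-proportion formula are easy to compute. A minimal sketch (the function names are mine; the z-values are the rounded constants used in these slides) reproduces the two answers above:

```python
import math

def n_per_group_unpooled(p_c, p_t, z_alpha=1.96, z_beta=1.282):
    """Formula (a): pooled variance under H0, unpooled variance under HA."""
    p_bar = (p_c + p_t) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar)) +
           z_beta * math.sqrt(p_c * (1 - p_c) + p_t * (1 - p_t))) ** 2
    return num / (p_c - p_t) ** 2

def n_per_group_pooled(p_c, p_t, z_alpha=1.96, z_beta=1.282):
    """Formula (b): simpler version using the pooled variance throughout."""
    p_bar = (p_c + p_t) / 2
    return (z_alpha + z_beta) ** 2 * 2 * p_bar * (1 - p_bar) / (p_c - p_t) ** 2

print(round(n_per_group_unpooled(0.40, 0.30)))  # 476 per group (2N = 952)
print(round(n_per_group_pooled(0.40, 0.30)))    # 478 per group (2N = 956)
```

The two formulas differ only in which variance estimate is used, which is why the answers (476 vs. 478) are so close.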
27. Approximate Total Sample Size for Comparing Various Proportions in Two Groups with Significance Level (α) of 0.05 and Power (1 - β) of 0.80 and 0.90
28. (No Transcript)
29. Comparison of Means
- Some outcome variables are continuous
- Blood pressure
- Serum chemistry
- Pulmonary function
- Hypothesis tested by comparison of mean values between groups, or comparison of mean changes
30. Comparison of Two Means
- H0: μC = μT, i.e. μC - μT = 0
- HA: μC - μT = Δ
- Test statistic based on sample means ~ N(Δ, σ)
- Let N = NC = NT for design
- ~ N(0,1) under H0
31. Comparison of Means
32. Example
- e.g. IQ: σ = 15, Δ = 0.3 × 15 = 4.5
- Set 2α = .05
- β = 0.10, 1 - β = 0.90
- HA: Δ = 0.3σ, i.e. Δ/σ = 0.3
- Sample size: N = 234, so 2N = 468
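The IQ example can be sketched with the usual per-group formula for two means, N = 2(Zα + Zβ)² / (Δ/σ)² (an assumption on my part, since the slide's formula is not transcribed; the function name is mine):

```python
def n_per_group_means(delta_over_sigma, z_alpha=1.96, z_beta=1.282):
    """Sample size per group for comparing two means:
    N = 2 (Z_alpha + Z_beta)^2 / (delta/sigma)^2."""
    return 2 * (z_alpha + z_beta) ** 2 / delta_over_sigma ** 2

# IQ example: delta = 0.3 sigma (i.e. 4.5 points when sigma = 15)
n = n_per_group_means(0.3)
print(round(n))   # 234 per group, so 2N = 468
```

Note that only the standardized difference Δ/σ matters, not Δ and σ separately.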
33. (No Transcript)
34. Comparing Time-to-Event Distributions
- Primary efficacy endpoint is the time to an event
- Compare the survival distributions for the two groups
- Measure of treatment effect is the ratio of the hazard rates in the two groups (equivalently, the ratio of the medians)
- Must also consider the length of follow-up
35. Assuming Exponential Survival Distributions
- Then define the effect size by the standardized difference
36. Time to Failure (1)
- Use a parametric model for sample size
- Common model: exponential
- S(t) = e^(-λt), λ = hazard rate
- H0: λI = λC
- Estimate N
- George & Desu (1974)
- Assumes all patients are followed to an event (no censoring)
- Assumes all patients are entered immediately
37. Assuming Exponential Survival Distributions
- Simple case
- The statistical test is powered by the total number of events observed at the time of the analysis, d
38. Converting Number of Events (d) to Required Sample Size (2N)
- d = 2N × P(event), so 2N = d / P(event)
- P(event) is a function of the length of total follow-up at the time of analysis and the average hazard rate
- Let AR = accrual rate (patients per year)
- A = period of uniform accrual (2N = AR × A)
- F = period of follow-up after accrual is complete
- A/2 + F = average total follow-up at the planned analysis
- λ̄ = average hazard rate
- Then P(event) = 1 - P(no event)
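The conversion can be sketched under an exponential model. The required-events formula below is the standard log-rank approximation, d = 4(Zα + Zβ)² / (ln HR)² for 1:1 allocation (my assumption, since the slide's own formula is not transcribed), and P(event) uses the slide's average total follow-up A/2 + F; all numbers are illustrative:

```python
import math

def required_events(hazard_ratio, z_alpha=1.96, z_beta=1.282):
    """Total events d for a two-arm comparison, standard approximation:
    d = 4 (Z_alpha + Z_beta)^2 / (ln HR)^2, assuming 1:1 allocation."""
    return 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2

def p_event(lam_bar, accrual_years, extra_follow_up):
    """P(event) under an exponential model with average hazard lam_bar,
    using the average total follow-up A/2 + F from the slide."""
    return 1 - math.exp(-lam_bar * (accrual_years / 2 + extra_follow_up))

d = required_events(0.3 / 0.2)  # hazard ratio 1.5
pe = p_event(lam_bar=0.25, accrual_years=2, extra_follow_up=3)
total_n = d / pe                # 2N = d / P(event)
```

The smaller P(event) is (short follow-up, low hazard), the larger the sample size needed to observe the same number of events.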
39. Time to Failure (2)
- In many clinical trials:
- 1. Not all patients are followed to an event (i.e. censoring)
- 2. Patients are recruited over some period of time (i.e. staggered entry)
- More general model (Lachin, 1981), where g(λ) is defined as follows:
40.
- 1. Instant recruitment, study censored at time T
- 2. Continuous recruiting over (0, T), censored at T
- 3. Recruitment over (0, T0), study censored at T (T > T0)
41.
- Example
- Assume α = .05 (two-sided), 1 - β = .90
- λC = .3 and λI = .2
- T = 5 years of follow-up
- T0 = 3
- 0. No censoring, instant recruiting: N = 128
- 1. Censoring at T, instant recruiting: N = 188
- 2. Censoring at T, continual recruitment: N = 310
- 3. Censoring at T, recruitment to T0: N = 233
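The four answers can be reproduced under the exponential model. The g(λ) expressions below are my reconstruction of the Lachin (1981) forms (the slide's formulas did not survive transcription); they match all four of the example's results:

```python
import math

Z = 1.96 + 1.282          # Z_alpha + Z_beta from the example
LC, LI = 0.3, 0.2         # hazard rates: control vs. intervention
T, T0 = 5, 3              # total follow-up and recruitment period

def g_censored_at_T(lam, T):
    """Case 1: instant recruitment, censoring at T."""
    return lam ** 2 / (1 - math.exp(-lam * T))

def g_continuous_recruitment(lam, T):
    """Case 2: uniform recruitment over (0, T), censoring at T."""
    return lam ** 3 * T / (lam * T - 1 + math.exp(-lam * T))

def g_recruit_to_T0(lam, T, T0):
    """Case 3: uniform recruitment over (0, T0), censoring at T > T0."""
    return lam ** 3 * T0 / (lam * T0 - math.exp(-lam * (T - T0)) + math.exp(-lam * T))

def n_per_group(g):
    """Lachin-style sample size: N = Z^2 [g(LC) + g(LI)] / (LC - LI)^2."""
    return Z ** 2 * (g(LC) + g(LI)) / (LC - LI) ** 2

# Case 0 (George & Desu, no censoring): N = 2 Z^2 / [ln(LC/LI)]^2
n0 = 2 * Z ** 2 / math.log(LC / LI) ** 2                    # ~128
n1 = n_per_group(lambda lam: g_censored_at_T(lam, T))       # ~188
n2 = n_per_group(lambda lam: g_continuous_recruitment(lam, T))   # ~310
n3 = n_per_group(lambda lam: g_recruit_to_T0(lam, T, T0))   # ~233
```

Note how censoring and staggered entry inflate N (128 → 188 → 310), while extending follow-up past the recruitment window (case 3) recovers some of that loss (233).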
42. Sample Size Adjustment for Non-Compliance (1)
- References:
- 1. Schork & Remington (1967) Journal of Chronic Diseases
- 2. Halperin et al (1968) Journal of Chronic Diseases
- 3. Wu, Fisher & DeMets (1988) Controlled Clinical Trials
- Problem: some patients may not adhere to the treatment protocol
- Impact: dilutes whatever true treatment effect exists
43. Sample Size Adjustment for Non-Compliance (2)
- Fundamental principle: analyze all subjects randomized
- Called the Intent-to-Treat (ITT) principle
- Noncompliance will dilute the treatment effect
- A solution: adjust the sample size to compensate for the dilution effect (reduced power)
- Definitions of noncompliance:
- Dropout: a patient in the treatment group stops taking therapy
- Dropin: a patient in the control group starts taking the experimental therapy
44.
- Comparing two proportions
- Assumes event rates will be altered by non-compliance
- Define:
- PT* = adjusted treatment group rate
- PC* = adjusted control group rate
- If PT < PC, the adjusted rates lie between them on the (0, 1) scale: PT ≤ PT* < PC* ≤ PC
45. Adjusted Sample Size
- Simple model:
- Compute the unadjusted N
- Assume no dropins
- Assume dropout proportion R
- Thus PC* = PC
- PT* = (1 - R) PT + R PC
- Then adjust N: N* = N / (1 - R)²
- Example:
- R = .1: 1/(1 - R)² = 1.23, a 23% increase
- R = .25: 1/(1 - R)² = 1.78, a 78% increase
46. Sample Size Adjustment for Non-Compliance
- Dropouts and dropins (R0, RI)
- Adjustment factor: 1/(1 - R0 - RI)²
- Example:
- R0 = RI = .1: factor = 1.56, a 56% increase
- R0 = RI = .25: factor = 4.0, i.e. 4 times the sample size
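Both adjustment tables above follow from the single expression 1/(1 - R0 - RI)², with the dropin rate set to zero in the simple model; a minimal sketch (the function name is mine):

```python
def inflation_factor(dropout, dropin=0.0):
    """Sample size inflation for an ITT analysis with noncompliance:
    multiply the unadjusted N by 1 / (1 - R0 - RI)^2."""
    return 1 / (1 - dropout - dropin) ** 2

print(round(inflation_factor(0.10), 2))        # 1.23 -> 23% increase
print(round(inflation_factor(0.25), 2))        # 1.78 -> 78% increase
print(round(inflation_factor(0.10, 0.10), 2))  # 1.56 -> 56% increase
print(inflation_factor(0.25, 0.25))            # 4.0  -> 4 times
```

Because the factor is quadratic in the total noncompliance rate, dropouts and dropins together are far more costly than either alone.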
47. Sample Size Adjustments
- More complex model
- Ref: Wu, Fisher & DeMets (1980)
- Further assumptions:
- Length of follow-up divided into intervals
- Hazard rate may vary
- Dropout rate may vary
- Dropin rate may vary
- Lag in time for treatment to become fully effective
48. Example: Beta-Blocker Heart Attack Trial (BHAT) (1)
- Used the complex model
- Assumptions:
- 1. α = .05 (two-sided), 1 - β = .90
- 2. 3-year follow-up
- 3. PC = .18 (control rate)
- 4. PT = .13, treatment assumed to give a 28% reduction
- 5. Dropout: 26% (12%, 8%, 6%)
- 6. Dropin: 21% (7%, 7%, 7%)
49. Example: Beta-Blocker Heart Attack Trial (BHAT) (2)
- Unadjusted: PC = .18, PT = .13 (a 28% reduction); N = 1100, 2N = 2200
- Adjusted: PC = .175, PT = .14 (a 20% reduction); N = 2000, 2N = 4000
50. Multiple Response Variables
- Many trials measure several outcomes (e.g. MILIS, NOTT)
- Must force the investigator to rank them by importance
- Do sample size calculations on a few outcomes (2-3)
- If the estimates agree, OK
- If not, must seek a compromise
51. Equivalency or Non-Inferiority Trials
- Compare a new therapy with a standard
- Wish to show the new therapy is "as good as" the standard
- Rationale may be cost, toxicity, profit
- Examples:
- Intermittent Positive Pressure Breathing (IPPB) Trial: expensive IPPB vs. cheaper treatment
- Nocturnal Oxygen Therapy Trial (NOTT): 12 hours of oxygen vs. 24 hours
- Problem: can't show H0: Δ = 0
- A solution: specify a minimum difference Δmin
52. Sample Size Formula: Two Proportions
- Simple case
- Zα = constant associated with α
- Zβ = constant associated with 1 - β
- Solve for Zβ (→ 1 - β) or Δ
53. Difference in Events: Test Drug vs. Standard Drug
54. Mid-Stream Adjustments
- Murphy's Law applies to sample size
- May find event rate assumptions are way off from early results, and the power of the study is very inadequate
- Problem:
- Quit?
- Continue toward almost certain doom?
- Adjust sample size?
- Extend follow-up?
- Early decision: best to decide early, and not to look at treatment comparisons
55. Adaptive Designs
- One class allows re-estimating the sample size once the trial is underway
- Chung et al
- Chen, Lan & DeMets
- Methods have been criticized for allowing bias (e.g. Mehta & Tsiatis)
- Thus, these methods are still not widely used
- The A-HeFT Trial is one example
- Will be discussed later in the data monitoring lecture
56. Event Rate Assumptions
- Challenging to get event rate assumptions correct
- Inclusion/exclusion criteria effect
- Healthy volunteer effect
- Changing background therapy/standard of care
- Even if trials are conducted back to back
57. PRAISE I vs. PRAISE II: Placebo Arms
58. Event Driven Trials
- For time-to-event trials, most of the information is in the events
- Power is a function of the number of events
- The real target is the number of observed events (d), not the total sample size (2N)
- Thus, target the number of events
59. Event Driven Trials
- Can adjust or adapt the trial to hit the target number of events if the assumed event rate was too high
- The steering committee can:
- Increase the sample size
- Increase follow-up
- A combination of both
60. Examples of Event Driven Trials
- PROMISE (based on control arm)
- PRAISE I & II
- COPERNICUS
- CARS (based on control arm)
61. Response Adaptive Designs
- The observed treatment effect may differ from (i.e. be less than) assumptions
- Treatment actually less effective
- Compliance worse than assumed
- Background therapy changed
- A smaller observed effect may still be of clinical interest, if real
62. Response Adaptive Designs
- The probability of rejecting H0 is then also small
- Power
- Conditional power
- The question is whether to:
- quit and start over, or
- make a design modification and continue
63. Response Adaptive Designs
- Stopping and starting over is problematic
- Waste of financial resources
- Ethical issues of wasting the contributions of patients who have already participated
- Probably can't afford a policy of designing all trials for the minimum treatment effect of interest
64. Response Adaptive Designs
- Adjust/increase the sample size if the assumed treatment effect was too large
- Traditionally, this approach was discouraged
- Recent methodology suggests possible approaches
65. Response Adaptive Designs
- These methods are relatively new and still controversial
- Many leading biostatisticians are very critical (e.g., Fleming, Emerson, Turnbull, Tsiatis)
- The issues are often more than statistical control of Type I error
- Introducing other sources of bias
66. Response Adaptive Designs
- Increasing the sample size based on the observed treatment effect may inflate the false positive rate
- By 30% to 40% (Cui et al)
- Can double it (Proschan et al)
- Inflation of Type I error of that magnitude is not acceptable
67. Response Adaptive Designs
- Statistical adjustments to control alpha:
- Weighted z-statistic
- Adjustment to the critical value
- Enforcing rules for sample size recalculation
68. Weighted Z Statistic
- References:
- Cui, Hung & Wang (1999, Biometrics)
- Fisher (1998, Stat Med)
- Shen & Fisher (1999, Biometrics)
- Tsiatis & Mehta (2003, Biometrika)
69. Weighted Z
- Xi ~ N(0,1) distribution
- n = current sample size
- N0 = initial total sample size
- δa = hypothesized treatment effect
- t = n/N0
70. Weighted Z
- N = proposed sample size based on the interim data
- Reject H0 if the weighted statistic exceeds the critical value
- Note: less weight is assigned to the new/additional observations
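The weighting idea can be sketched as follows (the slide's formulas are not transcribed, so this is my sketch of the Cui-Hung-Wang statistic; the function name and the illustrative numbers are mine). The stage weights √t and √(1 - t) are fixed by the original design, so observations added beyond N0 receive less weight and cannot inflate the Type I error:

```python
import math

def weighted_z(z_first, z_second, t):
    """Cui-Hung-Wang weighted statistic: combine the first-stage and
    second-stage z-statistics with the pre-planned weights sqrt(t) and
    sqrt(1 - t), where t = n / N0 is the originally planned information
    fraction at the interim look, even if the second-stage sample size
    was later increased."""
    return math.sqrt(t) * z_first + math.sqrt(1 - t) * z_second

# Interim look at half the originally planned sample size (t = 0.5):
zw = weighted_z(1.5, 1.5, t=0.5)
reject = zw > 1.96   # compare to the usual critical value
```

Because the weights do not depend on the re-estimated sample size, the final statistic keeps its N(0,1) null distribution; the price is that each additional observation contributes less than it would in an unweighted analysis.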
71. Weighted Z
- Possible to modify the design and increase the sample size based on an interim analysis while controlling Type I error
- Flexibility has a price
72. Tsiatis & Mehta Criticism
- Argue that a properly designed group sequential trial is more efficient than these adaptive designs
- The challenge is to design it properly
- (However, that can be a bigger challenge than often realized)
73. Weighted/Unweighted Modification
- Both:
- Type I error < α
- No real loss of power
- Ref: Chen, DeMets & Lan
74. P-Value Method
- Reference: Proschan & Hunsberger (1995, Biometrics)
- Requires a promising p-value before allowing an increase in sample size
- Requires stopping if the first-stage p-value is not promising
- Requires a larger critical value at the second stage to control the Type I error
75. P-Value Method
- One-sided alpha = 0.05
- P(1): .10, .15, .20, .25, .50
- Z(2): 1.77, 1.82, 1.85, 1.875, 1.95
- These second-stage critical values apply regardless of n2
76. Proschan & Hunsberger Method
- The simple method may make the Type I error substantially less than 0.05
- Developed another method to obtain the exact Type I error as a function of Z1 and n2, using a conditional-power-type calculation (details to be discussed later)
77. Proschan & Hunsberger
- Conditional power and the p-value required in stage 2 as a function of R = n2/n1, for the NHLBI Type II study example
78. Proschan & Hunsberger
- Allows for sample size adjustment based on the observed treatment effect
- Requires increasing the final critical value
79. Adaptive Design Remarks
- A need exists for adaptive designs (even FDA statisticians agree)
- Technical advances have been made through several new methods
- Adaptive designs are still not widely accepted and are subject to (strong) criticism
- May be useful for non-pivotal trials
- Practice precedes theory; perhaps theory will catch up in time
80. Sample Size Summary
- Ethically, the size of the study must be large
enough to achieve the stated goals with
reasonable probability (power) - Sample size estimates are only approximate due to
uncertainty in assumptions - Need to be conservative but realistic
81. Demo of Sample Size Program (www.biostat.wisc.edu)
- Program covers comparison of proportions, means, and time to failure
- Can vary control group rates or responses, alpha, power, and hypothesized differences
- Program produces a sample size table and a power curve for a particular sample size
82. Sample Size Program Output
83. Union Terrace/Lakefront