Title: Sample Size Issues Involved in Sequential Analysis/Sequential Trials
1Sample Size Issues Involved in Sequential
Analysis/Sequential Trials
- Jonathan J. Shuster
- Dept of Epidemiology and Health Policy Research
- College of Medicine
- University of Florida
- August 5, 2006
2Everyone wants to Peek
3Outline of Talk
- Motivation for Group Sequential Methods in Clinical Trials
- Motivation for Group Sequential Methods in Tissue Bank case-control studies
- A non-technical look at Brownian Motion and its role in Sample Size determination for Group Sequential methods
4Outline (Continued)
- Sample Reference Designs
- Real Example
- Brief look at Continuous Monitoring by O'Brien-Fleming Method
- Take Home Messages
5Motivation
- Second International Study of Infarct Survival (ISIS-2)
6ISIS 2 (Clot busters)
- Lancet 8/88, pp. 349-360.
- Second International Study of Infarct Survival (ISIS-2)
- 3-year accrual. Major goal: to prevent early deaths (5-week mortality).
- Design: Double-blind 2×2 factorial of Aspirin vs. Placebo and Streptokinase vs. Placebo.
7ISIS 2 (Cont)
- Death Rates (at 5 weeks)
- (1) A/SK: 343/4292 (8.0%)
- (2) P/SK: 448/4300 (10.4%)
- (3) A/P: 461/4295 (10.7%)
- (4) P/P: 568/4300 (13.2%)
8ISIS 2 (Cont)
- Z (Pooled variance) for Double Drug vs. Double Placebo = 7.85, $P = 4.2 \times 10^{-15}$
- Z (Mantel-Haenszel) Aspirin vs. Placebo = 5.23, $P = 1.7 \times 10^{-8}$
- Z (Mantel-Haenszel) Streptokinase vs. Placebo = 5.90, $P = 3.6 \times 10^{-10}$
9How would existing Group Sequential Designs have
Fared?
- O'Brien-Fleming (OF) or Pocock (P) Design with three equally spaced looks and the same operating characteristics.
- OF: Double drug vs. Placebo has an average predicted sample size of 8542 (slightly under 50% of the fixed).
- P: Double drug vs. Placebo has an average predicted sample size of 6545 (under 40% of the fixed).
- Savings: about 18-24 months of accrual, with the public informed earlier.
10Tissue Banking Case-Control Studies
- Childhood Leukemia Bone Marrow Bank (Children's Oncology Group). There are about 10,000 patients with available samples for research. Samples cannot be reused.
- Is a Genetic Marker (+/-) prognostic for survival in a well defined subgroup (including defined therapy)?
- Available material: 1000 patients (with sufficient follow-up).
11Planning Parameters
- Frequency of Marker: about 20%
- Planning occurrence: Long-term survivors 15% vs. Failures 25%. (Odds ratio is near 2 (1.9).)
- Fixed sample size needs 248/Group (496 total)
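The fixed-sample requirement above can be reproduced with the usual unpooled-variance normal approximation for comparing two proportions. The sketch below is an assumption about how the 248 per group was obtained, since the slides do not state the formula. It assumes a two-sided 5% test with 80% power, and `n_per_group` and `odds_ratio` are illustrative names only.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-proportion Z test (unpooled variance)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    z_beta = NormalDist().inv_cdf(power)           # about 0.84
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

def odds_ratio(p1, p2):
    """Odds of marker positivity at rate p2 relative to rate p1."""
    return (p2 / (1 - p2)) / (p1 / (1 - p1))

n = n_per_group(0.15, 0.25)              # planning values: 15% vs. 25%
print(n, 2 * n)                          # 248 496
print(round(odds_ratio(0.15, 0.25), 2))  # 1.89
```

The same function gives 55 per group for 50% vs. 25%, the planning values of the numerical example later in the talk.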
12Using a 2-Stage Reference Design (Shuster et al. 2002, from Table 1b)
- Stage 1: Take 64% of the single-stage study (64% of 496 = 318; 159 cases and 159 controls). Stop for futility if Z < 1.08 and for significance if Z > 2.28.
- Stage 2: Take 113% of the single-stage study (49% more) (560 = 280 cases and 280 controls, or 121 more of each). Declare significance if and only if Z > 2.00.
13Properties
- The power is 80% at P = .05 (two-sided).
- The expected sample size is less than 426 (86% of the fixed), irrespective of the true proportions positive amongst cases and controls (fixed requires 496). (No other 2-stage design beats the 426.)
- Under the null, the expected sample size is 353 (about 71%).
- Under the alternative, the expected sample size is 409 (about 82%).
14Ingredients Needed
- Single Stage Sample Size Requirement
- Number of interim looks
- Timing of Each Look (we will use equally spaced for 3 stages, but this is not an absolute) (Expressed relative to Single Stage)
- Cutoffs for futility and significance at each look
15Group Sequential Designs
- Why bother with sequential designs?
- Why not fully continuous sequential designs?
16Why do Sequential Studies
- Concerns about assigning knowingly inferior or more toxic treatment to trial participants
- Concerns about getting knowledge to the public sooner
- Concerns about conservation of resources (especially in tissue banking).
17Why not do Sequential Studies
- They may need to be temporarily closed to accrue the data
- There may be no safety issues involved, and no need to beat the competition to publish
- May be impractical for small studies.
- Results may come in too slowly to be of value
- Effect sizes are estimated with lower precision. (Sequential nature must be taken into account.)
- Multivariate Endpoints add complexity (but can deal with this if needed).
18Optimization of the Group Sequential Design
- Absolute minimum sample size: No matter what, I want a design that has a maximum sample size very close to the fixed.
- Minimize the average sample size under the Null Hypothesis
- Minimize the average sample size under the Alternative Hypothesis
- Minimize the expected value of the mean of the sample size under null and alternative
- Minimax: Minimize the maximum expected sample size over all values of the effect size.
19Other Considerations
- We shall enforce Uniform Look times (except 2 stage).
- We shall impose a maximum number of looks.
20Brownian Motion (E.G)
- $S_n = Y_1 + \dots + Y_n$, where the $Y_i$ are iid
- $E(S_n) = n\mu$
- $\operatorname{Var}(S_n) = n\sigma^2$
- $Y_i = U_i - V_i$ (Diff in Means)
- Normal distribution, independent stationary increments, mean and variance proportional to time
21Brownian Motion
- $X(\tau) \sim N(\theta\tau, \tau\sigma^2)$
- $\tau$ = Time ($\tau = 1$ is the time of the non-sequential study)
- $\theta$ = Effect size for the non-sequential study
- $\sigma$ is the population standard deviation of the estimate for the non-sequential study, $\tau = 1$.
22Brownian Motion
- The process has independent, stationary increments. For example:
- $X(\tau_1)$ and $X(\tau_2) - X(\tau_1)$, $\tau_1 < \tau_2$, are independent
- This implies that
- $\operatorname{Cov}[X(\tau_1), X(\tau_2)] = \tau_1\sigma^2$, $\tau_1 < \tau_2$.
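The covariance result follows in one line from the independent-increments property. The short derivation below is added as a worked step. It writes $\tau$ for time and $\sigma^2$ for the variance of the process at $\tau = 1$ (symbol choices assumed here), and takes $\tau_1 < \tau_2$.

```latex
\begin{align*}
\operatorname{Cov}[X(\tau_1), X(\tau_2)]
  &= \operatorname{Cov}[X(\tau_1),\, X(\tau_1)]
   + \operatorname{Cov}[X(\tau_1),\, X(\tau_2) - X(\tau_1)] \\
  &= \operatorname{Var}[X(\tau_1)] + 0 \\
  &= \tau_1 \sigma^2 .
\end{align*}
```

Dividing by the two standard deviations, $\sigma\sqrt{\tau_1}$ and $\sigma\sqrt{\tau_2}$, gives a correlation of $\sqrt{\tau_1/\tau_2}$ between the Z statistics at two looks.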
23Typical Examples approximating BM
- Sum of independent identically distributed (iid) random variables (One sample problem for means and proportions). Time is proportional to sample size.
- Differences between partial sums with equal sample sizes for two populations. (Two sample problem for means or proportions.)
24Typical Examples approximating BM
- Two sample analysis of covariance for randomized study with a completely random covariate.
- Mann-Whitney U-statistic
- Cox Regression (Logrank) test in survival analysis, under PROPORTIONAL HAZARDS and equal randomization
- Matched Proportions (Unconditional version of McNemar's Test)
25Reference Design Appearance
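- For look times $\tau_1 < \tau_2 < \tau_3 < \dots < \tau_k$:
- Reject if $X(\tau_j) > \tau_j^{1/2}\,\sigma\,Z_R(\tau_j)$
- Accept if $X(\tau_j) < \tau_j^{1/2}\,\sigma\,Z_A(\tau_j)$
- Continue if $\tau_j^{1/2}\,\sigma\,Z_A(\tau_j) < X(\tau_j) < \tau_j^{1/2}\,\sigma\,Z_R(\tau_j)$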
26Optimizing the Design
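- Suppose we wish to test the null hypothesis
- $H_0: \theta = 0$ vs. $H_a: \theta = \pm\theta_a$ with Type I error $\alpha$ and Type II error $\beta$.
- Can we minimize $E(\tau)$, the average time required, defined for us as
- $E(\tau) = .5\,E(\tau \mid \theta = 0) + .25\,E(\tau \mid \theta = \theta_a) + .25\,E(\tau \mid \theta = -\theta_a)$?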
27Risk Function
- In other words, for symmetric situations, the risk function is the average of the expected sample size under the null and alternative hypotheses. We reward closure for futility and efficacy equally. (Other researchers have used the expected sample size under the null only or under the alternative only.)
28Characterization of a Group Sequential Design
- $\alpha$ = Type I error, $\beta$ = Type II error, $\eta$ = Expected sample size
- Let D be the set of all designs, including random combinations of single designs (e.g. use design 1 with 65% probability and design 2 with 35% probability).
29Admissible Designs
- A design $d_1$ with parameters $(\alpha, \beta, \eta)$ is admissible if there is no design $d_2$ with parameters $(\alpha', \beta', \eta')$ with
- $\alpha' \le \alpha$, $\beta' \le \beta$, and $\eta' \le \eta$, with at least one of these inequalities strict.
30Backward Induction Method
- Although the task is daunting, the admissible designs can be characterized as Bayesian solutions under the previously stated risk function, and this connection allows us to develop a search procedure to optimize the designs. This methodology was adapted to this problem by Myron Chang (1996) from the 1957 work of Kiefer and Weiss.
31Example of 4 Optimal Look Design
- Type I error: $\alpha = 0.05$
- Type II error: $\beta = 0.20$
- Maximum Look Time: 1.333 times the equivalent single-stage sample size.
- Equally Spaced Look times.
32And the Champ is
- Look 1: $\tau = 0.33$; Acc if Z < 0.40; Rej if Z > 2.59
- Look 2: $\tau = 0.67$; Acc if Z < 0.93; Rej if Z > 2.36
- Look 3: $\tau = 1.00$; Acc if Z < 1.34; Rej if Z > 2.31
- Look 4: $\tau = 1.33$; Acc if Z < 2.02; Rej if Z > 2.02
33Properties ($\tau = 1$ is single Stage)
- $E(\tau \mid H_0) = 0.677$ (vs. 0.666)
- $E(\tau \mid H_a) = 0.755$ (vs. 0.751)
- $.5\,E(\tau \mid H_0) + .5\,E(\tau \mid H_a) = 0.716$ (Optimum)
- $\sup_\theta E(\tau \mid \theta) = .807$ (vs. .809)
- It is the champion in a grid search over 300 plausible designs with equally spaced looks and the same operating characteristics. This speaks well to robustness to other optimization standards.
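The operating characteristics of the four-look design can be checked by simulating Brownian motion at the look times. The following is a minimal Monte Carlo sketch, not the backward-induction search itself. It assumes that the cutoffs apply to |Z| (a two-sided test), that $\sigma = 1$, and that the drift under the alternative is 1.96 + 0.84 = 2.80 per unit time. `LOOKS` and `simulate_design` are illustrative names.

```python
import math
import random

# (look time, accept cutoff, reject cutoff); single-stage time = 1.
LOOKS = [(1 / 3, 0.40, 2.59), (2 / 3, 0.93, 2.36),
         (1.0, 1.34, 2.31), (4 / 3, 2.02, 2.02)]

def simulate_design(drift, n_paths=50_000, seed=1):
    """Return (rejection rate, mean stopping time) for Brownian motion with this drift."""
    rng = random.Random(seed)
    rejections, total_time = 0, 0.0
    for _ in range(n_paths):
        x, prev_t = 0.0, 0.0
        for t, z_acc, z_rej in LOOKS:
            dt = t - prev_t
            x += rng.gauss(drift * dt, math.sqrt(dt))  # independent increment
            prev_t = t
            z = abs(x) / math.sqrt(t)                  # ASSUMPTION: cutoffs apply to |Z|
            if z >= z_rej:                             # stop for significance
                rejections += 1
                break
            if z < z_acc:                              # stop for futility
                break
        total_time += prev_t
    return rejections / n_paths, total_time / n_paths

alpha_hat, tau_null = simulate_design(drift=0.0)
power_hat, tau_alt = simulate_design(drift=2.80)       # 1.96 + 0.84 with sigma = 1
print(f"H0: reject={alpha_hat:.3f}, E(tau)={tau_null:.3f}")
print(f"Ha: reject={power_hat:.3f}, E(tau)={tau_alt:.3f}")
```

If these assumptions match the design, the simulated error rates should fall near 5% and 80%, and the mean stopping times near the values quoted for the design, up to simulation noise and rounding of the cutoffs.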
34Numerical Example
- Based on historical control data, a group of patients with aneurysms and unstable urinary creatinine had a 50% chance of dying or needing dialysis within 28 days.
- Can drug treatment cut this rate in half?
35Single Stage Study
- Using my AGS program, CLASSZTEST.SAS, we conclude that for $\alpha = 0.05$ (two-sided) and Type II error $\beta = 0.20$ (80% power) we need 55 patients per group (110 total).
- $\tau = 1$ (Single Stage Study) corresponds to N = 110.
- Using the optimal design, we would look after 37, 74, 110, and 147 patients.
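The look schedule follows from scaling the single-stage sample size by the look times 1/3, 2/3, 1, and 4/3. The sketch below assumes each look is rounded up to the next whole patient, which is a guess that happens to reproduce the four numbers on the slide. `look_schedule` is an illustrative name.

```python
import math
from fractions import Fraction

def look_schedule(n_single, n_looks=4, max_fraction=Fraction(4, 3)):
    """Cumulative sample sizes at equally spaced looks, rounded up."""
    # Exact fractions avoid floating-point surprises at the look that equals n_single.
    return [math.ceil(n_single * max_fraction * j / n_looks)
            for j in range(1, n_looks + 1)]

print(look_schedule(110))  # [37, 74, 110, 147]
```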
36Fixed requires 110 Subjects (Reference Design)
- $E(N \mid H_0) = 74.5$
- $E(N \mid H_a) = 82.6$
- $.5\,E(N \mid H_0) + .5\,E(N \mid H_a) = 78.8$
- $\sup_\theta E(N \mid \theta) = 88.8$
- Maximum Possible N = 147
37O'Brien-Fleming Results
- $E(N \mid H_0) = 114.4$ (vs. 74.5)
- $E(N \mid H_a) = 93.2$ (vs. 82.6)
- $.5\,E(N \mid H_0) + .5\,E(N \mid H_a) = 101.4$ (vs. 78.8)
- Maximum Sample Size = 115 (vs. 147)
- For looks 1, 2, 3, 4: Stops with rejection if Z > 2.03/sqrt(look/4)
38Pocock Method
- $E(N \mid H_0) = 128.0$ (vs. 74.5)
- $E(N \mid H_a) = 88.2$ (vs. 82.6)
- $.5\,E(N \mid H_0) + .5\,E(N \mid H_a) = 108.1$ (vs. 78.8)
- Maximum Sample Size = 131 (vs. 147)
- For looks 1, 2, 3, 4: Stops with rejection if Z > 2.36 (Same cutoff for all looks)
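The expected sample sizes of the two off-the-shelf designs can be approximated by simulating Brownian motion with efficacy-only stopping. This is a hedged sketch under stated assumptions: four equally spaced looks up to the maximum sample size, cutoffs applied to |Z|, $\sigma = 1$, and a drift of 2.80 per unit time under the alternative, with time 1 corresponding to N = 110. `simulate_efficacy_only` is an illustrative name.

```python
import math
import random

def simulate_efficacy_only(cutoffs, n_max, drift, n_single=110,
                           n_paths=50_000, seed=1):
    """Return (rejection rate, expected N) for a design with efficacy stops only."""
    rng = random.Random(seed)
    k = len(cutoffs)
    dt = (n_max / n_single) / k               # time between looks (single stage = 1)
    rejections, total_n = 0, 0.0
    for _ in range(n_paths):
        x, look = 0.0, 0
        for look, cutoff in enumerate(cutoffs, start=1):
            x += rng.gauss(drift * dt, math.sqrt(dt))
            z = abs(x) / math.sqrt(look * dt)  # ASSUMPTION: two-sided |Z| at this look
            if z > cutoff:
                rejections += 1
                break
        total_n += n_max * look / k            # patients accrued when the trial stops
    return rejections / n_paths, total_n / n_paths

obf = [2.03 / math.sqrt(j / 4) for j in range(1, 5)]  # about 4.06, 2.87, 2.34, 2.03
pocock = [2.36] * 4
for name, cutoffs, n_max in (("OBF", obf, 115), ("Pocock", pocock, 131)):
    for label, drift in (("H0", 0.0), ("Ha", 2.80)):
        rate, mean_n = simulate_efficacy_only(cutoffs, n_max, drift)
        print(f"{name} {label}: reject={rate:.3f}, E(N)={mean_n:.1f}")
```

Because neither design stops for futility, almost every null path runs to the maximum, which is why the expected sample sizes under the null sit so close to 115 and 131.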
39Convincing the Skeptic
- Step 1: Show them that the O'Brien-Fleming Design is so highly correlated with the Non-Sequential Design with the same operating characteristics that it behooves them to use a group sequential design, if feasible.
- Step 2: Convince them to consider more efficient designs.
40Amazing Results
- 4 Stage Design with O'Brien-Fleming vs. Single Stage (5% two-sided type I error and 80% power)
- Null: Sample paths that are significant for both 4.1%. Sample paths non-significant by both 94.1%. Discordance 0.9% in each direction.
- Alternate: Sample paths that are significant for both 77.9%. Sample paths not significant for both 17.9%. Discordance 2.1% in each direction.
- Max sample size for OBF < 105% of Single Stage
41Continuous Monitoring via O'Brien-Fleming
- [Figure: horizontal lines at Z = 2.24, Z = 0, and Z = -2.24, drawn over time from $\tau = 0$ to $\tau = 1$, with an X marked on the Z = 2.24 line.]
- X represents the first time Brownian Motion (Mean 0) hits +/- 2.24.
42Reflection (Null Hypothesis)
- If a path ends up above 2.24, it had to have crossed 2.24 at some time. Place a mirror for any path hitting 2.24, and it is equally likely (under the null hypothesis of zero drift) that it ends up above vs. below 2.24.
- P(hits Z = 2.24) = 2P(ends above Z = 2.24) = .025.
- P(hits Z = -2.24) = 2P(ends below Z = -2.24) = .025.
- P(Hits both) is virtually zero.
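The reflection argument reduces the hitting probability to a normal tail area, which is quick to check numerically. The sketch below assumes the process is on the Z scale (variance 1 at time 1). `hit_probability` is an illustrative name.

```python
from statistics import NormalDist

def hit_probability(boundary):
    """P(driftless Brownian motion reaches `boundary` by time 1), via reflection."""
    return 2 * (1 - NormalDist().cdf(boundary))

one_side = hit_probability(2.24)
print(round(one_side, 3))      # 0.025
print(round(2 * one_side, 3))  # 0.05, ignoring the rare paths that hit both boundaries
```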
43Power Function (Alternate Hypothesis for Z = 2.24)
- Power implies that the first time Z exceeds 2.24 is before time = 1. (Time = 1 has, say, 80% power for the fixed study at P = 0.05 2-sided.) (First passage distribution in Brownian Motion)
- Detectable effect size is 2.80 (1.96 + 0.84) for the non-sequential study.
- Detectable difference for a study of the same duration, continuously monitored, with the same power: 2.88. (Cox and Miller reference provided at end)
- Inflation of maximum time for Continuous OBF to have the same power as non-sequential: 6%, from $(2.88/2.80)^2$. (Inflate n by 6% and look continuously with OBF.)
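The 2.88 and the 6% inflation can be recovered from the standard first-passage formula for Brownian motion with drift, which is taken here to be the result the slide attributes to Cox and Miller. This is a sketch under that assumption. It considers only the upper boundary (crossing the lower one under a positive drift is negligible) and uses unit variance. The function names are illustrative.

```python
import math

def phi(x):
    """Standard normal CDF, written with erfc for accuracy in the far tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2))

def crossing_power(theta, boundary=2.24):
    """P(Brownian motion with drift theta, unit variance, reaches boundary by time 1)."""
    return (phi(theta - boundary)
            + math.exp(2 * theta * boundary) * phi(-theta - boundary))

def detectable_effect(power=0.80, boundary=2.24):
    """Bisection for the drift at which the crossing probability equals `power`."""
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if crossing_power(mid, boundary) < power:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

theta = detectable_effect()
print(round(theta, 2))                # 2.88
print(round((theta / 2.80) ** 2, 2))  # 1.06, i.e. about 6% inflation
```

With zero drift the same formula reduces to twice the upper tail area, which is the reflection result for the null case.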
44Numerical Simulation
- Sign Test: We wish to accrue enough subjects to test P = 0.50 vs. a two-sided alternative P ≠ 0.50 to have 80% power when |P - .50| > 0.10 at P = 0.05 two-sided.
- Non-sequential sample size:
- $N = [(Z_{\alpha/2} + Z_\beta)\,\sigma/\delta]^2 = [(1.96 + .84)(.5)/.1]^2$
- N = 196 (Non-Sequential requirement)
45Continuous O'Brien-Fleming
- Inflate by 6% (per first passage distribution)
- 106% of 196 = 208.
- Type I error at P = 0.50: 4.4% (100,000 sim)
- Power at P = 0.60: 79.2% (100,000 sim)
- ASN = 206.1 (Null) and 154.0 (Alt)
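For the sign test, these operating characteristics can also be computed exactly by stepping through the distribution of successes minus failures, instead of simulating. The sketch below is an exact recursion, not the author's simulation. It assumes the efficacy rule is |Z_n| > 2.24 sqrt(208/n) after every observation, which is equivalent to |successes - failures| > 2.24 sqrt(208). That boundary form is inferred, not stated on this slide. `obf_sign_test` is an illustrative name.

```python
import math

def obf_sign_test(p, n_max=208, z_crit=2.24):
    """Exact (rejection probability, average sample number) under success rate p."""
    # ASSUMPTION: stop when |Z_n| > z_crit * sqrt(n_max / n), with Z_n = d / sqrt(n)
    # and d = successes - failures; this is the same as |d| > z_crit * sqrt(n_max).
    bound = z_crit * math.sqrt(n_max)
    alive = {0: 1.0}          # P(d, not yet stopped)
    reject, asn = 0.0, 0.0
    for n in range(1, n_max + 1):
        nxt = {}
        for d, pr in alive.items():
            for step, q in ((1, p), (-1, 1 - p)):
                d2 = d + step
                if abs(d2) > bound:
                    reject += pr * q      # boundary crossed at observation n
                    asn += n * pr * q
                else:
                    nxt[d2] = nxt.get(d2, 0.0) + pr * q
        alive = nxt
    asn += n_max * sum(alive.values())    # paths that run to the maximum
    return reject, asn

alpha, asn_null = obf_sign_test(p=0.50)
power, asn_alt = obf_sign_test(p=0.60)
print(f"Type I error {alpha:.3f}, ASN {asn_null:.1f}")
print(f"Power {power:.3f}, ASN {asn_alt:.1f}")
```

If the assumed boundary matches the one used in the talk, the exact values should agree with the simulated figures above to within simulation error.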
46Adding Futility is Almost Free
- (Computer trial and error)
- Start with a small number of simulations to zero in on parameters.
- Conditional Power idea: Calculate the binomial probability of rejection under the alternate hypothesis at the last observation.
- $\text{Succ at final} > .5\,N_{\text{final}} + .5\,Z\sqrt{N_{\text{final}}}$.
- If the probability of rejection under the alternative is < 10%, stop for futility. Manipulate Z and $N_{\text{final}}$ in simulations.
47Continuous Monitoring with Futility
- N(Max) = 210 (Up from 208 OBF, 196 non-seq)
- Stop for efficacy: Z > 2.20 sqrt(210/N)
- N = Number sampled to date (Critical value is lowered from 2.24 to 2.20, due to provision for futility).
- Empirical Results (based on 100,000 simulations): Type I error 5.1%, Power 80.2%
- ASN: Null 144.6 (was 206.1) vs. Alternate 142.3 (was 154.0).
- ASN (Optimal 4 Stage): Null 132.7 vs. Alt 148.0
48Continuous Looking
- If no sequential monitoring was planned, but the client continuously looked, there is a 41% chance of finding significance at P < 0.05, two-sided, at some point in a study of 196 for the sign test, when indeed the success rate is 50% (Null).
- When you are asked to do an analysis of a study you did not design, ask if this was the planned sample size. Is this a random high?
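The inflation from unplanned continuous looking can be explored with an exact recursion over the sign-test random walk. This is a sketch under assumptions: a look after every observation starting from the first, and significance declared whenever the normal-approximation statistic satisfies |Z_n| > 1.96. The talk does not say how its figure was computed, so the result need not match it exactly. `prob_ever_significant` is an illustrative name.

```python
import math

def prob_ever_significant(n_max=196, p=0.5, z_crit=1.96):
    """Exact chance that |Z_n| > z_crit at some n <= n_max for a sign test."""
    alive = {0: 1.0}      # P(successes - failures = d, never significant so far)
    crossed = 0.0
    for n in range(1, n_max + 1):
        limit = z_crit * math.sqrt(n)   # Z_n = d / sqrt(n) for a null value of 0.5
        nxt = {}
        for d, pr in alive.items():
            for step, q in ((1, p), (-1, 1 - p)):
                d2 = d + step
                if abs(d2) > limit:
                    crossed += pr * q   # first significant look is at n
                else:
                    nxt[d2] = nxt.get(d2, 0.0) + pr * q
        alive = nxt
    return crossed

print(f"{prob_ever_significant():.3f}")
```

Under this rule the earliest possible "significant" result is four successes or four failures in a row, which already has a 12.5% chance under the null.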
49Take Home Message
- As statisticians we understand uncertainty. Are you willing to gamble about when a study may be completed? (This affects your choice of fixed or what type of Group Sequential Design to consider.)
- Is it important that the study be stopped early for a positive result (efficacy)?
- Is it important that a study be stopped early for a negative result (futility)?
50Take Home Message
- Some knowledge of Sequential Methods is useful
when dealing with your response to analyzing data
from studies where the design is unclear to you.
(Have your colleagues screened for variables
based on potential significance? Have they
picked a point that is premature so they can
present an abstract at a meeting? Have they
based the question on a possible random high?)
51Reference Designs
- Sample size calculations for a single-stage design give you the sample size at $\tau = 1$. The reference designs give you the rest.
- Reference designs of the preprint (4 stages), and two Shuster et al. references in the preprint (3 stages and 2 stages) give greater efficiency than off-the-shelf designs such as O'Brien-Fleming or Pocock.
52Thank You!
53Reference Designs
- Shuster, J. J., Chang, M. N., and Tian, L. (2004). Design of Group Sequential Clinical Trials with Ordinal Categorical Data based on the Mann-Whitney-Wilcoxon Tests. Sequential Analysis 23: 414-426. (3 stage)
- Shuster, J. J., Link, M., Camitta, B., Pullen, J., and Behm, F. (2002). Minimax Two-Stage Designs with Applications to Tissue Banking Case-Control Studies. Statistics in Medicine 21: 2479-2493. (2 stage)
54Reference Design (4 Stage)
- Shuster, J. J. and Chang, M. N. (2007). Second Guessing Clinical Trial Designs. (In press, Sequential Analysis)
- Cox, D. R. and Miller, H. D. (1965). The Theory of Stochastic Processes. Methuen Publications, London. (Equation 72, p. 221 gives the cumulative distribution for first passage time in Brownian Motion.)