Title: Statistics in the Design and Analysis of Clinical Trials
1 Statistics in the Design and Analysis of Clinical Trials
- Dan Gillen, PhD
- Department of Statistics
- University of California, Irvine
2 Outline
- 1. Introduction
  - Interplay of science and statistics in trial design and implementation
- 2. Fundamental clinical trial design
  - Defining scientific hypotheses
  - Statistical issues
- 3. Case Study
  - Hodgkin's trial
3 Competing goals of a trial
- Scientific
  - Questions regarding mechanistic pathways
- Ethical
  - Minimize harm (due to treatment or disease) done to patients
- Clinical
  - Improve the overall health of patients
- Statistical
  - Quantifying scientific questions in a precise manner
4 Minimum scientific standards
- It must address a meaningful question
  - Discriminate between viable hypotheses (Science)
- Trial results must be credible to the scientific community
  - Valid materials, methods (Science, Statistics)
  - Valid measurement of experimental outcome (Science, Clinical, Statistics)
  - Valid quantification of uncertainty in experimental procedure (Statistics)
5 Defining the scientific hypotheses: Defining treatment(s)
- Treatment must be completely defined at the time of randomization
  - Dose(s)
  - Administration(s)
  - Frequency and duration
  - Ancillary treatments and treatment reduction protocol
6 Defining the scientific hypotheses: Defining the target patient population
- Inclusion/exclusion criteria to identify a population for whom
  - A new treatment is needed
  - Experimental treatment is likely to work
  - Expected to work equally well in all subgroups
  - All patients likely to eventually use the new treatment are represented (safety)
  - Clinical experimentation with the new treatment is not unethical
7 Defining the scientific hypotheses
- Goals/discrimination of hypotheses
  - One-sided hypothesis test (superiority)
  - Two-sided hypothesis test (superiority/inferiority)
  - Two-sided equivalence test (e.g., bioequivalence)
  - One-sided equivalence (non-inferiority) test
- How to choose?
  - Base decision on what conditions will change current practice by
    - Adopting a new treatment
    - Discarding an existing treatment
8 Defining the scientific hypotheses: Conditions under which current practice will be changed
- 1. Adoption of a new treatment
  - Superiority
    - Better than using no treatment (efficacious)
    - Better than existing treatment
  - Equivalence or non-inferiority
    - Equal to some existing efficacious treatment
    - Not markedly worse than some existing efficacious treatment
- 2. Discarding an existing treatment
  - Inferiority
    - Worse than using no treatment (harmful)
    - Markedly worse than another treatment
9 Defining the scientific hypotheses
- Ethical issues when specifying hypotheses
  - Clinical versus biological (surrogate) endpoints
    - Typically, subjects participating in a trial are hoping that they will benefit in some way from the trial
    - Clinical endpoints are therefore of more interest than purely biological endpoints
    - For late-stage trials, how well does the proposed surrogate correlate with the targeted clinical endpoint?
10 Defining the scientific hypotheses
- Ethical issues when specifying hypotheses
  - When is it ethical to establish efficacy by comparing a treatment to no treatment?
  - When is it ethical to establish harm by comparing a treatment to no treatment?
11 Statistical design issues: Goals of statistical design
- Interested in identifying beneficial treatments in such a way as to maintain
  - Scientific credibility
  - Ethical experiments
  - Efficient experiments
    - Minimize time
    - Minimize cost
- Basic goal: Attain a high positive predictive value with minimal cost
12 Statistical design issues
- Predictive value of statistically significant result depends on
  - 1. Probability of beneficial drug
    - Fixed when treatment is chosen
  - 2. Specificity
    - Fixed by level of significance (alpha level)
  - 3. Sensitivity
    - Statistical power made as high as possible by design
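The dependence of predictive value on these three quantities can be sketched with Bayes' rule. The function name and the 10% prior below are illustrative assumptions, not values from the slides; specificity is taken as 1 - alpha and sensitivity as power.

```python
def positive_predictive_value(prior, alpha, power):
    """P(treatment truly beneficial | statistically significant result)."""
    true_positive = prior * power            # beneficial and detected
    false_positive = (1.0 - prior) * alpha   # not beneficial, yet significant
    return true_positive / (true_positive + false_positive)


# Hypothetical prior: 10% of candidate treatments are truly beneficial.
for power in (0.50, 0.80, 0.975):
    ppv = positive_predictive_value(prior=0.10, alpha=0.025, power=power)
    print(f"power={power:.3f}  PPV={ppv:.3f}")
```

With the prior and alpha held fixed, raising power is the only remaining way to raise the predictive value, which is why power is the quantity controlled by design.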
13 Statistical design issues
- Power is increased by
  - 1. Minimizing bias
    - Remove confounding and account for effect modification
  - 2. Decreasing variability of measurements
    - Homogeneity of population, appropriate endpoints, appropriate sampling strategy
  - 3. Increasing sample size
    - Hmmm....
14 Statistical design issues
- Statistical tasks
  - 1. Definition of the probability model
    - Comparison group
    - Refinement of statistical hypothesis
    - Method of analysis
  - 2. Definition of statistical hypotheses
  - 3. Definition of statistical criteria for evidence
15 Statistical design issues
- Statistical tasks (cont'd)
  - 4. Determination of sample size
  - 5. Evaluation of operating characteristics
  - 6. Planning for interim monitoring
  - 7. Plans for analysis and reporting results
16 Statistical design issues
- Possible comparison groups
  - 1. No comparison group
    - Single arm clinical trial (cohort design)
    - Appropriate when absolute criterion for treatment effect exists
  - 2. Historical controls
    - Single arm clinical trial
    - Compare results to criteria defined from historical trial or sample from historical trial
17 Statistical design issues
- Possible comparison groups (cont'd)
  - 3. Internal controls
    - Subject serves as his/her own control (cross-over design)
    - Different treatments at different times (washout period?)
    - Different treatment for different body parts (e.g., eyes)
  - 4. Concurrent control group
    - Two or more arms
    - Active treatments or more than one level of same treatment
18 Statistical design issues
- Statistical hypotheses: Choice of summary measure
  - Wish to determine the tendency for a new treatment to have a beneficial effect on a clinical outcome
  - Consider the distribution of outcomes for individuals receiving intervention
  - Usually choose a summary measure of the distribution (e.g., mean, median, proportion cured, etc.)
  - Hypotheses then refined and expressed as values of the summary measure
19 Statistical design issues
- Statistical hypotheses: Choice of summary measure
  - Typically have many choices for the summary measure to compare across treatment groups
  - Consider the distribution of outcomes for individuals receiving intervention
  - Example: Treatment of high blood pressure with a primary outcome of systolic blood pressure at end of treatment
  - Possible analyses might compare
    - Average, median, percent above 160 mmHg, or mean or median time until blood pressure below 140 mmHg
20 Statistical design issues
- Statistical hypotheses: Choice of summary measure
  - Choice of summary measure GREATLY affects the scientific relevance of the trial
  - Summary measure should be chosen based on (in order of importance)
    - Most clinically relevant summary measure
    - Summary measure most likely to be affected by the intervention
    - Summary measure affording the greatest statistical precision
21 Statistical design issues
- Statistical hypotheses: Choice of summary measure
  - In addition to choosing the summary measure within groups, also need to choose how to contrast measures across groups
  - Again, many choices are available with different implications
    - Ex: Difference in means or proportions
    - Ex: Ratio of odds, medians, or risks (probabilities)
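A small sketch of how the choice of contrast changes the story, using made-up cure proportions (the numbers and the function name are hypothetical and serve only as illustration):

```python
def contrasts(p_treat, p_ctrl):
    """Three common ways to contrast two proportions across groups."""
    def odds(p):
        return p / (1.0 - p)

    return {
        "risk_difference": p_treat - p_ctrl,
        "risk_ratio": p_treat / p_ctrl,
        "odds_ratio": odds(p_treat) / odds(p_ctrl),
    }


# Hypothetical scenarios: a common outcome and a rare outcome.
for p_treat, p_ctrl in [(0.30, 0.20), (0.03, 0.02)]:
    c = contrasts(p_treat, p_ctrl)
    print(p_treat, p_ctrl, {k: round(v, 3) for k, v in c.items()})
```

Both scenarios share the same risk ratio, yet the risk differences differ tenfold, and the odds ratio tracks the risk ratio closely only when the outcome is rare. The contrast therefore has to be fixed when the hypotheses are defined, not after the data are in.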
22 Monitoring Trials
- Usual fixed sample design
  - Collect all data → Perform a single hypothesis test
- For larger (longer running) trials, it may be necessary to intermittently test accruing data
23 Monitoring Trials
- Reasons for monitoring clinical trials
  - Ethics
    - Early stopping to reduce the number of patients exposed to harmful treatments
    - Avoid delaying the availability of an effective treatment
  - Administration
    - Early monitoring may reveal design flaws (e.g., compliance issues)
  - Economics
    - Early stopping for null or inferior effects will reduce study costs
    - Early stopping for beneficial effects allows quicker marketing
24 Monitoring Trials
- Group sequential monitoring
  - Periodically analyze data after groups of observations have been accrued
    - Assume groups independent
  - Analyses must take into account the repeated analyses of the same data
    - Sampling distribution of the test statistic is altered
    - Frequentist properties of statistical tests are altered
  - Monitoring plan must be specified a priori!
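Why repeated analyses alter the frequentist properties can be seen in a small simulation. This is a sketch under simplifying assumptions that are not from the slides: equally sized independent groups, a normally distributed test statistic, and the naive rule of testing at the unadjusted two-sided 0.05 critical value (1.96) at every look.

```python
import math
import random


def type1_error_with_looks(n_looks, z_crit=1.96, n_sims=20000, seed=1):
    """Monte Carlo estimate of the overall type I error when the same
    unadjusted critical value is applied at each of n_looks analyses."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        total = 0.0
        for k in range(1, n_looks + 1):
            total += rng.gauss(0.0, 1.0)   # standardized contribution of group k under the null
            z_k = total / math.sqrt(k)     # cumulative test statistic at look k
            if abs(z_k) > z_crit:          # naive rule: stop and reject at first crossing
                rejections += 1
                break
    return rejections / n_sims


for looks in (1, 2, 5):
    print(looks, "look(s): estimated type I error =", type1_error_with_looks(looks))
```

With a single look the estimate sits near the nominal 0.05; with more looks it climbs well above it, which is why the stopping boundaries need to be adjusted and fixed in advance.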
25 Case Study-Hodgkin's Trial: Background
- Hodgkin's lymphoma represents a class of neoplasms that start in lymphatic tissue
- Approximately 7,350 new cases of Hodgkin's are diagnosed in the US each year (nearly equally split between males and females)
- 5-year survival rate among stage IV (most severe) cases is approximately 60-70%
26 Case Study-Hodgkin's Trial: Background
- Common treatments include the use of chemotherapy, radiation therapy, immunotherapy, and possible bone marrow transplantation
- Treatment typically characterized by high rate of initial response followed by relapse
- Hypothesize that experimental monoclonal antibody in addition to standard of care will increase time to relapse among patients in remission
27 Case Study-Hodgkin's Trial: Defining the Treatment
- Administered via IV once a week for 4 weeks
- Patients randomized to receive standard of care plus active treatment or placebo (administered similarly)
- Treatment discontinued in the event of grade 3 or 4 AEs
- Primary efficacy analysis based upon intention-to-treat (effectiveness as target of inference...)
- What about safety?
28 Case Study-Hodgkin's Trial: Defining Target Patient Population
- Histologically confirmed Hodgkin's lymphoma, Stage 1-3
- Progressive disease requiring treatment after at least 1 prior chemotherapy
- Recovered fully from any significant toxicity associated with prior surgery, radiation treatments, chemotherapy, biological therapy, autologous bone marrow or stem cell transplant, or investigational drugs
- Exclusions
  - Stage IV patients
  - Patients with previous exposure to experimental treatment
29 Case Study-Hodgkin's Trial: Logistical Considerations
- Multicenter clinical trial
  - Adherence to clinical protocol difficult
  - Uniform patient recruitment across centers difficult
- Data management is complicated
  - Uniform measurement of outcome
  - Data flow?
30 Case Study-Hodgkin's Trial: Defining Comparison Group
- Need to ensure scientific credibility for regulatory approval
- Crossover designs impossible
- Ultimate decision
  - Single comparison group treated with placebo
    - Not interested in studying dose response
    - No similar current therapy (still ethical to use placebo)
  - Randomized
    - Allow for causal inference
    - No blocking
31 Case Study-Hodgkin's Trial: Defining Outcomes of Interest
- Definition of event
  - First occurrence of death or relapse
  - Relapse defined as presence of measurable lesion at 3-month scheduled visits
- Goals
  - Primary: Increase relapse-free survival
    - Long term (always best)
    - Short term (many other processes may intervene)
  - Secondary: Decrease morbidity
32 Case Study-Hodgkin's Trial: Refinement of Primary Endpoint
- Option 1: Time to death (censored continuous data)
  - Trial should have roughly uniform censoring patterns
  - If heavy early censoring exists, this might place emphasis on clinically meaningless improvements in short-term survival
  - E.g., we may be detecting differences in 3-month survival even though there is no difference in survival at 1 year
33 Case Study-Hodgkin's Trial: Refinement of Primary Endpoint (cont'd)
- Option 2: Mortality rate at a fixed point in time (binary data)
  - Allows for choice of a scientifically relevant time frame
    - Treatment is a single administration, short half-life
  - Allows for choice of a clinically relevant time frame
    - Avoids sensitivity to improvements lasting only short periods of time
  - Ignores event rates at other time periods (How to choose?)
  - (Statistically) inefficient in particular settings
34 Case Study-Hodgkin's Trial: Refinement of Primary Endpoint (cont'd)
- Option 3: Quantile of event rate distribution
  - Focus on representative survival times (e.g., difference in median survival)
  - Ignores other quantiles that may be of interest (How to choose?)
  - (Statistically) inefficient in particular settings
35 Case Study-Hodgkin's Trial: Refinement of Primary Endpoint (cont'd)
- Final Choice: Comparison of hazards for event (censored continuous data)
  - Censoring resulting from staggered patient accrual, study dropout, and end of study
  - Common statistics for comparing survival may overemphasize short-term survival if early censoring is high
    - E.g., log rank statistic weights differences in hazards by number of patients at risk
36 Case Study-Hodgkin's Trial: Duration of Follow-up
- Wish to compare relapse-free survival over 4 years
  - Patients accrued over 3 years in order to guarantee at least one year of follow-up for all patients
- Test for hazard ratio
  - Interpretation under (roughly) proportional hazards
  - 1:1 correspondence with log rank test
  - No adjustment for covariates
  - Statistical information dictated by number of events
37 Case Study-Hodgkin's Trial: Definition of Statistical Hypotheses
- Null hypothesis
  - Hazard ratio of 1 (no difference in hazards)
  - Estimated baseline survival
    - Median progression-free survival approximately 9 months (needed in this case to estimate variability)
- Alternative hypothesis
  - One-sided test for decreased hazard
    - Unethical to prove harm in a placebo-controlled trial (always?)
  - 33% decrease in hazard considered clinically meaningful
    - Corresponds to a difference in median survival of 4.4 months assuming exponential survival
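The 4.4-month figure follows from the exponential assumption, under which the median equals ln(2) divided by the hazard. A short check (variable and function names are illustrative):

```python
import math


def exponential_median(hazard):
    """Median survival time for an exponential distribution with the given hazard."""
    return math.log(2.0) / hazard


control_median = 9.0                          # months, from the slide
control_hazard = math.log(2.0) / control_median
hazard_ratio = 1.0 - 0.33                     # 33% decrease in hazard

treated_median = exponential_median(hazard_ratio * control_hazard)
median_difference = treated_median - control_median

print(f"treated median  = {treated_median:.1f} months")
print(f"difference      = {median_difference:.1f} months")
```

Because the median is inversely proportional to the hazard, the treated median is simply 9 / 0.67 months.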
38 Case Study-Hodgkin's Trial: Criteria for Statistical Evidence
- Frequentist criteria
  - Type I error: Probability of falsely rejecting the null hypothesis
    - Standards
      - Two-sided hypothesis test: 0.050
      - One-sided hypothesis test: 0.025
  - Power: Probability of correctly rejecting the null hypothesis (1 - type II error)
    - Popular choice
      - 80% power
39 Case Study-Hodgkin's Trial: Determination of Sample Size
- Sample size chosen to provide desired operating characteristics
  - Type I error: 0.025 when no difference in mortality
  - Power: 0.80 when 33% reduction in hazard
- Expected number of events determined by assuming
  - Exponential survival in placebo group with median survival of 9 months
  - Uniform accrual of patients over 3 years with negligible dropout
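One common way to turn these operating characteristics into a required number of events is the Schoenfeld approximation for a log rank test with 1:1 allocation. The slides do not say which formula was used, so this is an assumption; it does, however, give the 196 events quoted later in the case study.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha_one_sided, power):
    """Schoenfeld approximation: events needed for a log rank test, 1:1 allocation."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha_one_sided)
    z_beta = NormalDist().inv_cdf(power)
    return 4.0 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2


events = required_events(hazard_ratio=0.67, alpha_one_sided=0.025, power=0.80)
print(f"required events = {events:.1f}, rounded up to {math.ceil(events)}")
```

Note that the formula involves only the hazard ratio, the type I error, and the power: statistical information is driven by events, not by the number of patients enrolled.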
42 Case Study-Hodgkin's Trial: Determination of Sample Size (cont'd)
- But how many patients would we need to accrue?
- Depends on
  - The total follow-up and accrual time
  - The underlying survival distribution
  - The accrual distribution
  - Drop-out
- Potentially ugly math
43 Case Study-Hodgkin's Trial: Determination of Sample Size (cont'd)
- Assuming exponential survival with a 9-month median in the control arm, uniform accrual, and minimal dropout, we would need
  - N = 76 patients per year for 3 years if the null hypothesis were true (total of 228 patients)
  - N = 81 patients per year for 3 years if the alternative hypothesis were true (total of 243 patients)
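Under these assumptions the math is manageable. A patient entering uniformly during a 3-year accrual window, with the study ending 1 year after accrual closes, is followed for between 1 and 4 years, so the probability of observing an event has a closed form. The sketch below assumes 196 required events (the figure used later in the case study), 1:1 allocation, no dropout, and rounding up to whole patients per year; the function names are illustrative.

```python
import math


def event_probability(hazard, accrual_years, min_followup_years):
    """P(event observed before study end): exponential survival,
    uniform accrual, no dropout."""
    a, f = accrual_years, min_followup_years
    surviving = (math.exp(-hazard * f) - math.exp(-hazard * (a + f))) / (hazard * a)
    return 1.0 - surviving


def patients_per_year(events, arm_hazards, accrual_years, min_followup_years):
    """Yearly accrual needed to expect `events` events with equal allocation across arms."""
    p_event = sum(
        event_probability(h, accrual_years, min_followup_years) for h in arm_hazards
    ) / len(arm_hazards)
    return math.ceil(events / p_event / accrual_years)


control_hazard = math.log(2.0) / 0.75          # 9-month median, expressed per year
null_rate = patients_per_year(196, [control_hazard, control_hazard], 3.0, 1.0)
alt_rate = patients_per_year(196, [control_hazard, 0.67 * control_hazard], 3.0, 1.0)
print("per year under null:", null_rate, " under alternative:", alt_rate)
```

More patients are needed under the alternative because an effective treatment delays events in the treated arm, so fewer events are observed per patient enrolled.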
44 Case Study-Hodgkin's Trial: Evaluation of Operating Characteristics
- Critical values
  - Observed value which rejects the null hypothesis
  - Point estimate of treatment effect
    - Will that effect be considered important?
    - (Clinical and marketing relevance)
45 Case Study-Hodgkin's Trial: Evaluation of Operating Characteristics
- Confidence interval at the critical value
  - Observed value which fails to reject the null hypothesis
  - Set of hypothesized treatment effects which might reasonably generate data like that observed
    - Have we excluded all scientifically meaningful alternatives with a negative study?
    - If not, study is underpowered...
46 Case Study-Hodgkin's Trial: Evaluation of Operating Characteristics
- Operating characteristics with D = 196 events
  - Critical value: 0.756
  - Corresponding p-value: 0.025
  - 95% confidence interval: (0.57, 1)
- Interpretation: Smallest magnitude of (observed) effect which would result in a significant result is a 24.4% decrease in the hazard on the treatment arm, with corresponding CI (0.57, 1).
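These values can be reproduced with the usual large-sample approximation that the standard error of the log hazard ratio is about 2 / sqrt(D) under 1:1 allocation. That approximation is an assumption here, not something the slides state, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def critical_value_and_ci(events, alpha_one_sided=0.025):
    """Hazard ratio at the rejection boundary and its two-sided confidence interval,
    using SE(log HR) ~ 2 / sqrt(events)."""
    z = NormalDist().inv_cdf(1.0 - alpha_one_sided)
    se_log_hr = 2.0 / math.sqrt(events)
    log_crit = -z * se_log_hr                  # boundary for a decreased hazard
    lower = math.exp(log_crit - z * se_log_hr)
    upper = math.exp(log_crit + z * se_log_hr)
    return math.exp(log_crit), lower, upper


crit, lower, upper = critical_value_and_ci(196)
print(f"critical HR = {crit:.3f}  ({100 * (1 - crit):.1f}% decrease)")
print(f"95% CI at the critical value = ({lower:.2f}, {upper:.2f})")
```

By construction the upper confidence limit at the critical value sits exactly on the null hazard ratio of 1.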
47 Take-Home Message
- Multiple competing goals in trial design
  - Scientific, ethical, clinical, statistical
- Scientific and statistical tasks constantly overlap
  - Definition of hypotheses, definition of control groups, choice of summary measures, etc.
- Solid trial design requires that clinicians and statisticians communicate regularly throughout the entire process
48 Where to Get Help
- Cancer Center's Biostatistics Shared Resource
  - Headed by Dr. Christine McLaren
  - http://www.ucihs.uci.edu/biostatistics
- UCI Statistical Consulting Center
  - Headed by Dr. Robert Newcombe
  - http://stats.uci.edu