Title: Multiple Endpoint Testing in Clinical Trials
Some Issues and Considerations
- Mohammad Huque, Ph.D.
- Division of Biometrics III/Office of Biostatistics/OPaSS/CDER/FDA
- 2005 Industry/FDA Workshop, Washington, DC
Disclaimer
- The views expressed here are those of the presenter and not necessarily those of the FDA.
Sources of Multiplicity in Clinical Trials
- Multiple endpoints
- Multiple comparisons
- Interim analysis
- Subgroup analysis
- Selection of covariates in an analysis model
- Others
OUTLINE
- Type I error concept and type I error control when testing multiple endpoints. Complexities?
- Multiple endpoints are often triaged into primary, secondary and other types of endpoints. Reasons for doing so, and how are these endpoints tested?
- Sequential testing of endpoints: no alpha adjustment is needed. Issues and fixes?
- Some trials require that 2 or more endpoints must show effects for clinical evidence. Reasons for doing so, and consequences?
- Composite endpoints: underlying concepts and complexities.
Trial has a single endpoint to test: type I and type II errors

| True state | Concludes treatment not beneficial | Concludes treatment beneficial |
|---|---|---|
| Truly not beneficial ($H_0$) | Correct decision | Type I error |
| Truly beneficial ($H_a$) | Type II error | Correct decision |

- A test is conducted for claiming that a new treatment is beneficial.
- $\alpha$ = probability of a Type I error
- $\beta$ = probability of a Type II error (power = $1 - \beta$)
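To make $\alpha$ and $\beta$ concrete for a single endpoint, here is a minimal Python sketch. It assumes a one-sided two-sample z-test with a known common $\sigma$ and equal group sizes; the function name `power_two_arm` and the numbers used are hypothetical. The same function returns $\alpha$ when the true difference is 0 (the $H_0$ row of the table) and $1 - \beta$ when it is not (the $H_a$ row).

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution N(0, 1)


def power_two_arm(delta, sigma, n_per_arm, alpha=0.025):
    """Probability of concluding 'treatment beneficial' with a one-sided z-test.

    delta: true mean difference (test minus control); delta = 0 means H0 is true.
    Hypothetical setup: known common sigma, n_per_arm patients in each arm.
    """
    z_crit = STD.inv_cdf(1 - alpha)                    # reject H0 when Z > z_crit
    mean_z = delta / (sigma * (2 / n_per_arm) ** 0.5)  # expected value of Z
    return 1 - STD.cdf(z_crit - mean_z)


# Under H0 the rejection probability is the type I error rate alpha.
print(round(power_two_arm(0.0, 1.0, 85), 3))  # 0.025
# Under Ha (delta/sigma = 0.5) it is the power 1 - beta, just above 0.90 with 85 per arm.
print(round(power_two_arm(0.5, 1.0, 85), 3))
```

Because the z-test is fixed in advance, $\alpha$ is controlled by the choice of `z_crit` alone, while $\beta$ depends on the effect size and sample size.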
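Trial has multiple endpoints to test
- Consider a two-arm superiority trial: a test treatment versus a control.
- Endpoints $y_1, y_2, \ldots, y_K$
- Multiple null hypotheses $F = \{H_{01}, H_{02}, \ldots, H_{0K}\}$
- $H_{0j}: \delta_j = 0$ versus $H_{aj}: \delta_j \neq 0$, $j = 1, \ldots, K$, where $\delta_j$ is the treatment difference for endpoint $y_j$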
Trial has multiple endpoints to test
- Two scenarios
- (A) In the family $F$, all are true null hypotheses.
- (B) Some may be true null hypotheses and some may be false null hypotheses, but their true states are unknown.
Testing under scenario (A)
- Scenario (A), and the trial has 3 endpoints $y_1$, $y_2$ and $y_3$.
- A test procedure can give a type I error in multiple ways: (-, -, +), (-, +, -), (+, -, -), (-, +, +), (+, -, +), (+, +, -), (+, +, +), where + marks a statistically significant result. These are chance events, caused by the multiplicity of tests, when in fact there is no treatment benefit for any of the endpoints.
- $\alpha_0 = \Pr(\text{at least one of these chance events} \mid \text{test procedure}, H_0)$, where $H_0 = \bigcap_j H_{0j}$
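Under scenario (A) the seven error patterns can be enumerated directly. A minimal Python sketch, assuming (the slide does not say so) that the three tests are independent and each is run at level 0.05; with positively correlated test statistics the resulting $\alpha_0$ would typically be smaller.

```python
from itertools import product


def fwer_global_null(alpha, k):
    """alpha_0 = P(at least one chance 'win') for k independent level-alpha tests
    when every H0j is true (scenario A)."""
    total = 0.0
    # Enumerate every pattern of results, e.g. (False, False, True) is (-, -, +).
    for outcome in product([False, True], repeat=k):
        if any(outcome):  # at least one false positive = a type I error
            prob = 1.0
            for significant in outcome:
                prob *= alpha if significant else 1 - alpha
            total += prob
    return total


print(round(fwer_global_null(0.05, 3), 6))  # 0.142625, i.e. 1 - 0.95**3
```

Summing the seven patterns gives the same answer as the complement rule $1 - (1 - \alpha)^K$, nearly three times the per-test level.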
Testing under scenario (A)
- $\alpha_0$ is called the global alpha (or overall alpha). It is also called the familywise type I error rate (FWER) under $H_0$, where $H_0 = \bigcap_j H_{0j}$ is the global null hypothesis.
- A test procedure for testing $H_0$ is called a global test procedure.
Global Test Procedures
- Useful for non-specific global claims, but the result is difficult to interpret. The type I error rate can remain inflated for specific (endpoint-level) claims.
- Examples: Simes test, O'Brien's OLS/GLS tests, Hotelling's $T^2$ test (Sankoh et al., DIA Journal, 1999)
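As an illustration of a global test, here is a minimal Python sketch of the Simes test: reject the global null $H_0 = \bigcap_j H_{0j}$ if the $i$-th smallest p-value satisfies $p_{(i)} \le i\alpha/K$ for at least one $i$. The function name and p-values are hypothetical.

```python
def simes_global_test(p_values, alpha=0.05):
    """Simes global test of H0 = intersection of all H0j.

    Reject H0 if p_(i) <= i * alpha / K for at least one i,
    where p_(1) <= ... <= p_(K) are the sorted p-values.
    """
    k = len(p_values)
    ordered = sorted(p_values)
    return any(p <= (i + 1) * alpha / k for i, p in enumerate(ordered))


print(simes_global_test([0.04, 0.045, 0.03]))  # True: largest p 0.045 <= 3 * 0.05 / 3
print(simes_global_test([0.02, 0.30, 0.60]))   # False: 0.02 > 0.05 / 3, 0.30 > 2 * 0.05 / 3, ...
```

The first call shows the interpretation difficulty: the global null is rejected, yet no single endpoint has $p \le 0.05/3$, so the result does not say which endpoint benefits.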
Testing under scenario (B)
- Some of the null hypotheses in $F = \{H_{01}, H_{02}, \ldots, H_{0K}\}$ may be true and some may be false, but it is not known which are which.
- Question: Is there a treatment effect specifically for the endpoint $y_1$?
- To answer this question, the null hypothesis is not a single null hypothesis like the global null hypothesis. Rather, it is a class of null hypothesis configurations in which there is no treatment effect for $y_1$, together with all possible scenarios for treatment effects for the remaining endpoints $y_2, \ldots, y_K$.
Testing under scenario (B)
- Consider 3 endpoints $y_1$, $y_2$ and $y_3$.
- Question: Is there a treatment effect specifically for the endpoint $y_1$?
- Null hypothesis configurations $F_1$ for testing for a treatment effect specifically for the endpoint $y_1$:
- $(\delta_1 = 0, \delta_2 = 0, \delta_3 = 0)$,
- $(\delta_1 = 0, \delta_2 = 0, \delta_3 \neq 0)$,
- $(\delta_1 = 0, \delta_2 \neq 0, \delta_3 = 0)$,
- $(\delta_1 = 0, \delta_2 \neq 0, \delta_3 \neq 0)$.
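The configuration set $F_1$ can be generated mechanically: fix $\delta_1 = 0$ and let each remaining endpoint be either null or non-null, giving $2^{K-1}$ configurations. A small Python sketch (the function name is hypothetical); a procedure with strong FWER control must keep the error rate for $y_1$ at or below $\alpha$ under every one of them.

```python
from itertools import product


def null_configurations(k, endpoint=0):
    """Configurations where `endpoint` (0-based) has no effect (d = 0) and each
    of the other k - 1 endpoints may have d = 0 or d != 0."""
    configs = []
    for others in product(["= 0", "!= 0"], repeat=k - 1):
        states = list(others)
        states.insert(endpoint, "= 0")  # the endpoint of interest is always null
        configs.append(tuple(f"d{j + 1} {s}" for j, s in enumerate(states)))
    return configs


for config in null_configurations(3):
    print(config)
# ('d1 = 0', 'd2 = 0', 'd3 = 0')
# ('d1 = 0', 'd2 = 0', 'd3 != 0')
# ('d1 = 0', 'd2 != 0', 'd3 = 0')
# ('d1 = 0', 'd2 != 0', 'd3 != 0')
```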
Control of FWER (two types)
- Weak control: controls the FWER only under the global null configuration.
- Strong control: controls the FWER under all null configurations.
- Specificity property: useful for making specific claims.
- Examples of methods: Bonferroni, Holm, Hochberg, closed statistical tests, and other methods, with some caveats.
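Three of the named strong-control methods fit in a few lines each. This is a minimal Python sketch with hypothetical function names, each returning a reject/retain flag per endpoint. One caveat: Hochberg's step-up procedure relies on assumptions about the joint distribution of the test statistics (e.g., independence or certain forms of positive dependence), whereas Bonferroni and Holm do not.

```python
def bonferroni(p, alpha=0.05):
    """Reject H0j if p_j <= alpha / K."""
    k = len(p)
    return [pj <= alpha / k for pj in p]


def holm(p, alpha=0.05):
    """Step-down: compare sorted p-values with alpha/K, alpha/(K-1), ...;
    stop at the first failure."""
    k = len(p)
    order = sorted(range(k), key=lambda i: p[i])  # indices, smallest p first
    reject = [False] * k
    for step, i in enumerate(order):
        if p[i] > alpha / (k - step):
            break
        reject[i] = True
    return reject


def hochberg(p, alpha=0.05):
    """Step-up: starting from the largest p-value, find the first p_(i) <= alpha/(K - i + 1)
    and reject it together with all smaller p-values."""
    k = len(p)
    order = sorted(range(k), key=lambda i: p[i])
    reject = [False] * k
    for step in range(k - 1, -1, -1):  # step is the 0-based rank
        if p[order[step]] <= alpha / (k - step):
            for i in order[: step + 1]:
                reject[i] = True
            break
    return reject


for p in ([0.01, 0.02, 0.04], [0.03, 0.04, 0.045]):
    print(p, bonferroni(p), holm(p), hochberg(p))
```

With $p = (0.01, 0.02, 0.04)$, Bonferroni rejects only the first hypothesis while Holm rejects all three. With $p = (0.03, 0.04, 0.045)$, Bonferroni and Holm reject nothing but Hochberg rejects all three, which shows the power ordering Bonferroni ≤ Holm ≤ Hochberg.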
Triaging of multiple endpoints into meaningful families by trial objectives
- Primary endpoints and secondary endpoints: (1) prospectively defined, (2) FWER controlled
- Exploratory endpoints: usually not prospectively defined
- Primary endpoints are the primary focus of the trial. Their results determine the main benefits of the clinical trial's intervention.
- Secondary endpoints by themselves are generally not sufficient for characterizing treatment benefit. They are generally tested for statistical significance for an extended indication and labeling after the primary objectives of the trial are met.
Statistical methods
- Prospective alpha allocation schemes (PAAS) (Moyé, 2000)
- Spend $\alpha_1$ on the primary endpoints and the remaining alpha on the secondary endpoints; the FWER is controlled.
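A minimal Python sketch of the allocation idea, assuming a simple additive (Bonferroni) split: $\alpha_1$ shared equally among the primary endpoints and $\alpha - \alpha_1$ shared equally among the secondary endpoints. Moyé's own scheme may allocate differently; the function names and the choice $\alpha_1 = 0.04$ are hypothetical.

```python
def paas_levels(alpha, alpha1, n_primary, n_secondary):
    """Split alpha: alpha1 across primaries, alpha - alpha1 across secondaries (equal shares)."""
    if not 0 < alpha1 <= alpha:
        raise ValueError("alpha1 must lie in (0, alpha]")
    alpha2 = alpha - alpha1  # alpha left over for the secondary family
    primary = [alpha1 / n_primary] * n_primary
    secondary = [alpha2 / n_secondary] * n_secondary if n_secondary else []
    return primary, secondary


def paas_test(p_primary, p_secondary, alpha=0.05, alpha1=0.04):
    """Reject each hypothesis whose p-value is at or below its allocated level."""
    lv1, lv2 = paas_levels(alpha, alpha1, len(p_primary), len(p_secondary))
    reject1 = [p <= a for p, a in zip(p_primary, lv1)]
    reject2 = [p <= a for p, a in zip(p_secondary, lv2)]
    return reject1, reject2


# The levels sum to alpha, so the Bonferroni inequality keeps the FWER <= alpha.
print(paas_levels(0.05, 0.04, 2, 2))
print(paas_test([0.01, 0.30], [0.004, 0.02]))  # ([True, False], [True, False])
```

In this additive version the secondary endpoints are tested regardless of the primary results, because their alpha was set aside in advance.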
Statistical methods
- Parallel gatekeeping strategies for clinical trials
- Dmitrienko, Offen and Westfall (Statistics in Medicine, 2003)
- Chen, Luo and Capizzi (Statistics in Medicine, 2005)
- These strategies allow testing of secondary endpoints when at least one of the primary endpoints exhibits a statistically significant result.
- These methods control the FWER for both the primary and secondary endpoints in the strong sense.
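A minimal Python sketch of one simple member of this class, a Bonferroni-based parallel gatekeeper with equal weights. The primary family is tested by Bonferroni at $\alpha/m$. The alpha of each rejected primary hypothesis is then carried forward, so the secondary family is tested at $\alpha \times (\text{number rejected})/m$. The cited papers also describe more powerful variants; the function name and p-values here are hypothetical.

```python
def bonferroni_parallel_gatekeeper(p_primary, p_secondary, alpha=0.05):
    """Bonferroni-based parallel gatekeeping with equal weights within each family.

    Primary family: Bonferroni at alpha / m.
    Secondary family: tested at alpha * (number of rejected primaries) / m,
    split equally (Bonferroni) across the secondary hypotheses.
    """
    m, n = len(p_primary), len(p_secondary)
    reject1 = [p <= alpha / m for p in p_primary]
    alpha_secondary = alpha * sum(reject1) / m  # alpha carried forward from rejected primaries
    reject2 = [alpha_secondary > 0 and p <= alpha_secondary / n for p in p_secondary]
    return reject1, reject2


# One of two primaries wins, so the gate opens with alpha/2 for the secondary family.
print(bonferroni_parallel_gatekeeper([0.01, 0.20], [0.01, 0.03]))     # ([True, False], [True, False])
# No primary wins, so the gate stays closed regardless of the secondary p-values.
print(bonferroni_parallel_gatekeeper([0.04, 0.20], [0.001, 0.001]))  # ([False, False], [False, False])
```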
Sequential testing of multiple endpoints
- A fixed-sequence approach allows each of the $k$ null hypotheses to be tested at the same significance level $\alpha$ without any adjustment, as long as the null hypotheses are hierarchically ordered and tested in a pre-defined sequential order.
- Hierarchical ordering of the null hypotheses can be achieved, for example, by their clinical relevance.
Sequential testing of multiple endpoints
- For this fixed-sequence approach, however, there are two caveats:
- Pre-specification of the testing sequence
- No further testing once the sequence breaks
- This is a problem when the sequence breaks and the next p-value is extreme (e.g., $p_1 = 0.50$, $p_2 = 0.001$).
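The fixed-sequence rule and its second caveat can be sketched in Python as follows. The function name is hypothetical, and the p-values in the first call are made up; the second call uses the pair from the slide.

```python
def fixed_sequence(p_ordered, alpha=0.05):
    """Fixed-sequence test: hypotheses in a pre-specified order, each at full alpha.

    Testing stops at the first non-significant result; later hypotheses are not rejected.
    """
    reject = []
    for p in p_ordered:
        if p > alpha:
            break  # the sequence breaks here
        reject.append(True)
    return reject + [False] * (len(p_ordered) - len(reject))


print(fixed_sequence([0.01, 0.03, 0.20, 0.001]))  # [True, True, False, False]
print(fixed_sequence([0.50, 0.001]))              # [False, False]
```

In the second call, $p_2 = 0.001$ cannot be claimed because the sequence already broke at $p_1 = 0.50$. This is the situation the flexible approach is designed to address.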
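A flexible fixed-sequence approach
- Test $H_{(01)}$ at level $\alpha_1$.
- If $H_{(01)}$ is rejected, test $H_{(02)}$ at level $\alpha$.
- If $H_{(01)}$ is not rejected, test $H_{(02)}$ at level $\alpha_2$.
- E.g., $\alpha_1 = 0.04$, $\alpha = 0.05$: $\alpha_2 = 0.0104$ for correlation $\rho = 0$ ($\alpha_2 = 0.0214$ for $\rho = 0.8$).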
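Call the level used for $H_{(02)}$ when $H_{(01)}$ is not rejected $\alpha_2$; this label, and the derivation below, are assumptions of this sketch. The quoted value for $\rho = 0$ is reproduced if $\alpha_2$ is chosen so that the FWER under the global null equals $\alpha$:

$$\alpha_1 + \Pr(p_1 > \alpha_1,\ p_2 \le \alpha_2) = \alpha.$$

Under independence, this gives $\alpha_2 = (\alpha - \alpha_1)/(1 - \alpha_1) = 0.01/0.96 \approx 0.0104$. The Python sketch below solves the same equation by bisection. It assumes one-sided tests whose z-statistics are bivariate normal with correlation $\rho$, and it integrates numerically using only the standard library. It has not been checked to reproduce the slide's $\rho = 0.8$ value exactly.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def joint_tail(c1, c2, rho, n=2000, lower=-8.0):
    """P(Z1 < c1, Z2 >= c2) for standard bivariate normal (Z1, Z2) with correlation rho.

    Integrates phi(z1) * P(Z2 >= c2 | Z1 = z1) over z1 with Simpson's rule (n even).
    """
    s = (1 - rho ** 2) ** 0.5  # conditional SD of Z2 given Z1
    h = (c1 - lower) / n
    total = 0.0
    for i in range(n + 1):
        z = lower + i * h
        f = STD.pdf(z) * (1 - STD.cdf((c2 - rho * z) / s))
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * f
    return total * h / 3


def flexible_alpha2(alpha, alpha1, rho, iters=40):
    """Largest alpha2 with alpha1 + P(p1 > alpha1, p2 <= alpha2) <= alpha under the global null."""
    c1 = STD.inv_cdf(1 - alpha1)

    def fwer(a2):
        return alpha1 + joint_tail(c1, STD.inv_cdf(1 - a2), rho)

    if fwer(alpha) <= alpha:  # never exceed the full level alpha
        return alpha
    lo, hi = 0.0, alpha
    for _ in range(iters):  # bisection: fwer() increases with alpha2
        mid = (lo + hi) / 2
        if fwer(mid) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo


print(round(flexible_alpha2(0.05, 0.04, 0.0), 4))  # 0.0104, i.e. 0.01 / 0.96
print(round(flexible_alpha2(0.05, 0.04, 0.8), 4))  # larger than under independence
```

Positive correlation helps here. When $p_1$ is not significant, $p_2$ also tends to be unremarkable, so a larger $\alpha_2$ can be spent without exceeding $\alpha$.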
Example: flexible fixed-sequence method
Some trials require that 2 or more endpoints must show effects
- Examples
- Alzheimer's disease trial: (win on the ADAS-Cognitive subscale) and (win on the Clinician's Interview-Based Impression of Change)
- Many other examples (PhRMA draft paper)
- Main reason
- Clinical expectations of the desired clinical benefit (a concept beyond statistics)
Adjustments in the Type I error rate: some winning criteria require adjustments and some don't
* Adjustment by Sidak's method on accounting for correlation
Note: Which method to use depends on the clinical decision rule set in advance.
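Two decision rules show why some winning criteria need adjustment and others do not. Requiring a win on all endpoints is an intersection-union test, so each endpoint can be tested at the full $\alpha$. Accepting a win on any one endpoint needs a per-endpoint adjustment, shown here with the Šidák level $1 - (1 - \alpha)^{1/K}$. That level is exact for independent tests and tends to be conservative for positively correlated ones. This is a minimal Python sketch; the helper names are hypothetical.

```python
def sidak_level(alpha, k):
    """Per-endpoint level giving P(at least one false win) = alpha for k independent tests."""
    return 1 - (1 - alpha) ** (1 / k)


def trial_wins(p_values, alpha=0.05, rule="all"):
    """Apply a pre-specified clinical decision rule (hypothetical helper).

    rule="all": every endpoint must win; each test at full alpha (no adjustment needed).
    rule="any": at least one endpoint must win; each test at the Sidak-adjusted level.
    """
    if rule == "all":
        return all(p <= alpha for p in p_values)
    if rule == "any":
        level = sidak_level(alpha, len(p_values))
        return any(p <= level for p in p_values)
    raise ValueError(f"unknown rule: {rule!r}")


print(round(sidak_level(0.05, 2), 4))        # 0.0253
print(trial_wins([0.03, 0.04], rule="all"))  # True
print(trial_wins([0.03, 0.04], rule="any"))  # False
```

With the same data, the two rules reach opposite conclusions. The rule must therefore be fixed before unblinding.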
Power comparison: case of $K = 2$ endpoints
Loss in power when winning in all endpoints ($K$ = number of endpoints)
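The power loss from requiring a win on all $K$ endpoints can be approximated by simulation. This is a minimal Python sketch under hypothetical assumptions: equicorrelated normal test statistics with common correlation $\rho \ge 0$, the same effect size on every endpoint, and a per-endpoint power of 0.90 at one-sided $\alpha = 0.025$. Under independence the answer is $0.9^K$; positive correlation reduces the loss.

```python
import random
from statistics import NormalDist

STD = NormalDist()


def power_win_all(k, rho, alpha=0.025, marginal_power=0.90, n_sim=50_000, seed=2005):
    """Monte Carlo probability of winning on all k endpoints (one-sided level alpha each).

    Assumptions: equicorrelated normal test statistics with common correlation
    rho in [0, 1), and the same effect size on every endpoint, chosen so that a
    single endpoint alone has power `marginal_power`.
    """
    rng = random.Random(seed)
    crit = STD.inv_cdf(1 - alpha)
    shift = crit + STD.inv_cdf(marginal_power)  # mean of each Z statistic under Ha
    a, b = rho ** 0.5, (1 - rho) ** 0.5
    wins = 0
    for _ in range(n_sim):
        shared = rng.gauss(0.0, 1.0)  # common component gives corr(Z_i, Z_j) = rho
        if all(shift + a * shared + b * rng.gauss(0.0, 1.0) > crit for _ in range(k)):
            wins += 1
    return wins / n_sim


for k in (2, 3, 4):
    print(k, power_win_all(k, 0.0), power_win_all(k, 0.6))
```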
Sample size increase (%)(1) when winning in all $K$ endpoints compared to the single-endpoint case
- Alpha = 0.025 (1-sided), power = 0.90

| Correlation | K = 2 | K = 3 | K = 4 |
|---|---|---|---|
| 0.0 | 22.8 | 35.9 | 45.0 |
| 0.3 | 21.1 | 33.1 | 41.2 |
| 0.4 | 20.2 | 31.7 | 39.7 |
| 0.5 | 19.1 | 29.8 | 37.3 |
| 0.6 | 17.7 | 27.5 | 34.4 |
| 0.7 | 15.9 | 24.6 | 30.7 |
| 0.8 | 13.5 | 20.8 | 25.8 |
| 0.9 | 10.0 | 15.3 | 18.9 |

(1) Calculations use the multivariate normal distribution of the test statistics comparing active treatment versus placebo in a 2-arm trial, assuming the same delta/sigma for all $K$ endpoints.
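The zero-correlation row can be checked in closed form. With independent statistics, a win on all $K$ endpoints with overall power 0.90 requires per-endpoint power $0.9^{1/K}$. Because sample size is proportional to $(z_{1-\alpha} + z_{\text{power}})^2$, the percentage increase follows directly. This is a minimal Python sketch with a hypothetical function name; the correlated rows need multivariate normal probabilities that this stdlib-only sketch does not compute.

```python
from statistics import NormalDist

STD = NormalDist()


def pct_increase_win_all_independent(k, alpha=0.025, power=0.90):
    """% sample-size increase for 'win on all k independent endpoints' vs one endpoint."""
    z_a = STD.inv_cdf(1 - alpha)
    single = (z_a + STD.inv_cdf(power)) ** 2                # n is proportional to this
    all_k = (z_a + STD.inv_cdf(power ** (1 / k))) ** 2      # per-endpoint power 0.9**(1/k)
    return 100 * (all_k / single - 1)


for k in (2, 3, 4):
    print(k, round(pct_increase_win_all_independent(k), 1))
# 2 22.8
# 3 35.9
# 4 45.0
```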
Composite Endpoints
- Two types:
- Total score or index based on a rating scale, e.g., HAMD total score in depression trials, ACR20/ACR70 in rheumatoid arthritis trials
- Issues: validity and reliability
Composite Endpoints
- Another type
- The composite endpoint is defined in terms of the time to the first event, where the event is one of several possible event types.
- LIFE study: composite of cardiovascular death, stroke and myocardial infarction events.
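The time-to-first-event definition can be sketched in Python as follows. The record layout, field names and toy data are hypothetical, invented purely for illustration. Tallying which component supplied each patient's first event shows how a single component, such as stroke, can drive the composite count.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    """Hypothetical patient record: days to each component event (None = not observed)."""
    cv_death: Optional[float]
    stroke: Optional[float]
    mi: Optional[float]
    follow_up: float  # censoring time if no component event occurs


def first_event(p: Patient):
    """Composite endpoint: (time, component) of the first event, or (follow_up, None) if censored."""
    times = {"CV death": p.cv_death, "stroke": p.stroke, "MI": p.mi}
    observed = {name: t for name, t in times.items() if t is not None}
    if not observed:
        return p.follow_up, None
    name = min(observed, key=observed.get)  # ties resolved by dictionary order
    return observed[name], name


# Toy data, invented for illustration only.
patients = [
    Patient(None, 120.0, 300.0, 1000.0),
    Patient(400.0, None, None, 1000.0),
    Patient(None, None, None, 900.0),
    Patient(None, 50.0, None, 1000.0),
]
print(Counter(name for _, name in map(first_event, patients) if name))
```

Only the first event counts, so the MI at day 300 in the first record is absorbed by the earlier stroke.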
Composite Endpoint Issues
- LIFE study
- The composite endpoint was significantly positive. However, analysis of the first events by individual components and sub-composite endpoints indicated that the overall composite result was mainly due to a reduction in fatal and non-fatal stroke.
- Issue
- How should composite endpoint results be interpreted? How should benefits be characterized in terms of the component endpoints?
Extent of multiplicity adjustments between endpoints
[Figure: regions labelled "practically no adjustments", "small adjustments", "large adjustments" and "good case for combining endpoints", arranged along homogeneity of treatment effects across endpoints (low to high) and a second low-to-high axis.]
Concluding Remarks
- For endpoint-specific claims, strong control of the type I error rate is needed.
- Parallel gatekeeping strategies can be used for primary and secondary endpoint claims.
- A flexible sequential test procedure can be used to gain power.
- There is a scientific basis when a reasonable clinical decision rule asks for statistically significant efficacy results in more than one endpoint, but what about the loss of power?
- When 4 or more endpoints are included as primary (e.g., arthritis trials) and homogeneity of treatment effects across endpoints is expected, a composite or responder endpoint approach will be effective.