Title: Study Design and Hypothesis Testing in Clinical Research
- Jonathan J. Shuster, Ph.D. (jshuster_at_biostat.ufl.edu)
- Research Professor of Biostatistics
- Univ. of Florida, College of Medicine
Slide 2: Take-home Messages
- Rely on Evidence-Based Medicine. Conventional wisdom can easily lead us astray.
- The objective of Statistics is to make informed inferences about a population, based on a sample. It is imperative to quantify the uncertainty.
- The P-value is a quantity that allows us to infer something about whether a scientific hypothesis is false.
- Non-significant results are inconclusive.
- Randomization and intent-to-treat are vital components in sound clinical research.
Slide 4: Topics
- 1. Motivating Evidence-Based Clinical Studies
- 2. Objective of Statistics
- 3. Hypothesis testing and P-values
- 4. Real Examples and their lessons
Slide 6: 1. Motivating Evidence-Based Medicine
- A coin is loaded, with a 70% chance of landing heads. One player picks a three-outcome sequence (e.g. HTH), then the other picks a different sequence. Whoever's sequence comes up first is the winner.
- Do you want to choose first, and if so, what sequence do you select?
Slide 7: Evidence-Based Medicine
- So you decided to go first and pick HHH, right?
- OK, I pick THH.
- HHH can only occur before THH if it is on the first three flips. (If the first time HHH occurs is flips 6,7,8, then flip 5 is T, so flips 5,6,7 are THH and I win. I make your first 2 my last 2, so I tend to stay ahead.)
- Your chance of winning: 0.7^3 = 0.343 (34.3%)
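The 34.3% figure can be checked by simulation. The sketch below is illustrative only: the function names, the number of trials, and the random seed are assumptions made for this example and are not part of the lecture.

```python
import random

def first_player_wins(p_heads, first="HHH", second="THH", rng=random):
    """Flip a loaded coin until one sequence appears; True if `first` shows up first."""
    recent = ""
    while True:
        flip = "H" if rng.random() < p_heads else "T"
        recent = (recent + flip)[-3:]      # keep only the last three flips
        if recent == first:
            return True
        if recent == second:
            return False

def estimate_win_rate(p_heads=0.7, trials=100_000, seed=2006):
    """Monte Carlo estimate of the chance that HHH beats THH."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    wins = sum(first_player_wins(p_heads, rng=rng) for _ in range(trials))
    return wins / trials

exact = 0.7 ** 3                           # HHH must occur on the first three flips
simulated = estimate_win_rate()
print(f"exact = {exact:.3f}, simulated = {simulated:.3f}")
```

The simulated rate should sit close to the exact value, even though picking first and choosing the most likely sequence feels like the stronger position.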
Slide 8: Evidence-Based Medicine
- Lesson from this example.
- Things are not always what they seem. You need to be a healthy skeptic.
- Reference: Shuster, J. A two-player coin game paradox in the classroom. American Statistician, 2006 (Feb), vol 60, pp 68-70.
Slide 10: 2. Objective of Statistics
- To make an inference about a defined target population from a representative sample.
- That is, for us, to start from a medical hypothesis about a medical condition, help design a study that can collect data to test the question, and draw conclusions. Quantifying the uncertainty about the inference is a key part.
Slide 11: 2. Comment on This
- Should we compare treatment groups statistically in a randomized study with respect to baseline parameters (e.g. age, gender, ethnicity, blood pressure)?
Slide 12: 2. Provenzano, Clin J Am Soc Nephrol 4, 386-93, 2009
- Baseline characteristics were similar except for more men in the oral iron group compared with the ferumoxytol group (62.9% versus 50.0%, P = 0.04). Mean baseline laboratory measures were similar between the two treatment groups.
Slide 13: 2. Comment on This
- For hypothesis-driven research, should we test for normality before using a t-test, and if we reject normality, try to transform the data?
Slide 14: Nissen Article
- JAMA. 2008;299(13):1561-1573. Comparison of Pioglitazone vs Glimepiride on Progression of Coronary Atherosclerosis in Patients With Type 2 Diabetes
- For continuous variables with a normal distribution, the mean and 95% confidence intervals (CIs) are reported. For variables not normally distributed, median and interquartile ranges are reported and 95% CIs around median changes were computed using bootstrap resampling. (N = 273 vs 270 in groups)
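For readers unfamiliar with the bootstrap mentioned in the Nissen excerpt, here is a minimal sketch of a percentile bootstrap confidence interval for a median. The article does not say which bootstrap variant it used, so the percentile method, the function name `bootstrap_median_ci`, the number of resamples, and the made-up skewed data are all assumptions for illustration.

```python
import random
import statistics

def bootstrap_median_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the median (one common variant among several)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, recompute the median each time, then sort.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lo_idx = int(((1 - level) / 2) * n_boot)
    hi_idx = int((1 - (1 - level) / 2) * n_boot) - 1
    return medians[lo_idx], medians[hi_idx]

# Made-up, right-skewed "change from baseline" values; NOT data from the article.
data_rng = random.Random(1)
changes = [data_rng.expovariate(1.0) - 0.5 for _ in range(270)]

low, high = bootstrap_median_ci(changes)
print(f"median = {statistics.median(changes):.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```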
Slide 15: 2. Testing Assumptions
Slide 17: 3. Testing a Hypothesis (P-Value)
- Put a statement on trial: the Null Hypothesis.
- ISIS-2 (Second International Study of Infarct Survival): The five-week mortality rates for Streptokinase and Placebo are equivalent in patients with recent MIs.
- Results: Strep (791/8592 = 9.2%) vs. Plac (1029/8595 = 12.0%)
Slide 18: 3. P-Value
- P = 3.8 x 10^-9
- If you replicated the experiment in a population where the null hypothesis was true, there would be a 3.8 in a billion chance of seeing a difference at least as extreme in either direction (2-sided).
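The slides do not say which test produced this P-value. As a hedged illustration, a pooled two-proportion z-test applied to the ISIS-2 counts (791/8592 vs. 1029/8595) gives a two-sided P-value of the same order of magnitude. The function name `two_proportion_z_test` is an assumption for this sketch.

```python
from math import erfc, sqrt

def two_proportion_z_test(events1, n1, events2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided P-value)."""
    p1, p2 = events1 / n1, events2 / n2
    pooled = (events1 + events2) / (n1 + n2)        # common rate under the null
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))                # area in both normal tails
    return z, p_value

# Streptokinase vs. placebo five-week deaths, as quoted on the slide.
z, p_value = two_proportion_z_test(791, 8592, 1029, 8595)
print(f"z = {z:.2f}, two-sided P = {p_value:.1e}")
```

Using `erfc` rather than `1 - cdf` avoids losing precision when the tail area is this small.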
Slide 19: 3. ISIS 2 Reference
- ISIS 2 Collaborative Group. (1988) Randomised trial of intravenous streptokinase, oral aspirin, both, or neither among 17,187 cases of acute myocardial infarction: ISIS 2. Lancet 2: 349-360.
Slide 20: 3. P-Value and Proof by Contradiction
- What is the probability that, if you replicated your experiment in a target population where your null hypothesis is true, you would see differences at least as extreme as what you actually observed? If this value (the P-value) is small, it is evidence against the null hypothesis.
- The analogy is "beyond a reasonable doubt." Science uses 5%, arbitrarily, as reasonable doubt in most cases.
Slide 21: 3. Was this overkill in terms of sample size?
- Suppose the results were 79/859 vs. 103/860 (the same percentages, 9.2% vs. 12.0%, but with one tenth the sample size).
- Now P = 0.071 (7.1%), and the result would not be statistically significant. Would we be using this clot buster today? It was the biostatistician, Sir Richard Peto, who determined this sample size.
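The slide does not state which test gives P = 0.071 for the one-tenth-size study. One candidate is Fisher's exact test, sketched below with only the standard library. The function name and the two-sided rule (summing every table no more probable than the observed one) are assumptions for this example, and the result may differ slightly from the slide depending on the method actually used.

```python
from math import comb

def fisher_exact_two_sided(events1, n1, events2, n2):
    """Two-sided Fisher exact P-value for a 2x2 table with fixed margins."""
    total_events = events1 + events2
    denom = comb(n1 + n2, total_events)

    def prob(x):
        # Hypergeometric probability that group 1 has exactly x events.
        return comb(n1, x) * comb(n2, total_events - x) / denom

    lo = max(0, total_events - n2)
    hi = min(total_events, n1)
    p_obs = prob(events1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p_value = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                  if p <= p_obs * (1 + 1e-7))
    return min(1.0, p_value)

# One-tenth-size version of the ISIS-2 result, as posed on the slide.
p_small = fisher_exact_two_sided(79, 859, 103, 860)
print(f"two-sided exact P = {p_small:.3f}")
```

The same 9.2% vs. 12.0% difference that was overwhelming at full size is no longer significant at the 5% level with a tenth of the patients.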
Slide 22: 3. ISIS 2
- Any other questions about the study?
Slide 23: 3. ISIS 2 Issues
- Who was watching the store? Accrual took 3.5 years and outcome was known for each patient within five weeks.
- Always report a sample size justification in your papers (Provenzano, slide 12, did not).
Slide 24: 4. Real Example
Slide 25: The Coronary Drug Project Research Group (1980)
- Influence of adherence to treatment and response of cholesterol on mortality in the Coronary Drug Project. NEJM 303: 1038-1041.
- Double-blind randomized study of Clofibrate vs. Placebo in men who had prior MI.
Slide 26: Compliers vs. Not on Drug
Slide 27: Compliers vs. Not
Slide 28: Drug vs. Placebo
Slide 29: Coronary Drug Project Take-home Message
- What can this study teach us about Clinical
Studies?
Slide 30: Intent-to-Treat
- The gold standard for analyzing randomized
clinical trials is Intent-to-treat. Patients are
analyzed in the groups they were assigned to,
irrespective of what they actually received.
Slide 32: 4. Real UF Example
- Effectiveness of Nesiritide on Dialysis or All-Cause Mortality in Patients Undergoing Cardiothoracic Surgery. Clinical Cardiology. 2006 Jan;29(1):18-24, with T. Beaver et al.
- Motivation: The impression at Shands was that it was harmful and costly.
Slide 33: 4. Nesiritide Example
- Study Null Hypothesis: Patients getting nesiritide within two days of surgery have the same 20-day death/dialysis rate as similar patients not getting it.
- Design Suggestions?
Slide 34: 4. Possible Designs (+/-)
- Observational, Historical Control: Compare the period before the drug to the period after the drug started to be given to a sizable fraction of patients (with a gap during the ramping up of use). Must include all comers and use electronic chart review.
- Observational: Compare those getting the drug to those not getting it.
- Randomized controlled prospective trial
Slide 35: 4. Sources of Variation
- Within treatments, why might we not get the same result for every patient?
- Historical Control?
- Comparing concurrent nesiritide vs. not?
- Randomized prospective trial?
Slide 36: 4. Sources of Bias (Confounders)
- Why might we see differences that might be totally unrelated to the treatment (nesiritide vs. not)?
- Historical Control?
- Comparing concurrent nesiritide vs. not?
- Randomized prospective trial?
Slide 37: 4. Nesiritide Propensity Scoring
- Actual Design: Compared Nesiritide vs. Not by Propensity Score Matching.
- Using 12 key covariates, we estimated the probability that a patient would get Nesiritide given these covariates. Then we matched the nesiritide patients to non-nesiritide patients on the propensity score, and did a matched analysis.
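The matching step can be sketched as follows. The slides do not describe the matching algorithm used in the study, so this greedy 1:1 nearest-neighbor match with a caliper is only one plausible approach. The function name, the caliper value, and the patient IDs and scores are all hypothetical, and the propensity scores are taken as already estimated (for example by logistic regression on the covariates).

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on propensity score.

    treated, controls: dicts mapping patient id -> estimated propensity score.
    Returns a list of (treated_id, control_id) pairs; each control is used once.
    """
    available = dict(controls)
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control by absolute difference in propensity.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]          # match without replacement
    return pairs

# Hypothetical scores, for illustration only.
treated = {"T1": 0.30, "T2": 0.62, "T3": 0.90}
controls = {"C1": 0.28, "C2": 0.33, "C3": 0.60, "C4": 0.10}
pairs = greedy_match(treated, controls)
print(pairs)
```

Treated patients with no control inside the caliper (T3 here) are left unmatched and drop out of the matched analysis, which is one reason matched results may not generalize to all treated patients.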
Slide 38: 4. Conclusions
- Nesiritide showed no significant difference (inconclusive) within CABG patients.
- Nesiritide showed promise in aneurysm subjects with baseline elevated SCR, but was inconclusive in other such patients.
- Run a future randomized double-blind trial in aneurysms with elevated SCR. (Just completed and close to being in press with an inconclusive result.)
Slide 39: 4. Conclusion (continued)
- Note that the Shands study data were very
important in designing the randomized follow-up
study, in terms of the number of subjects needed
(power analysis).
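A power analysis of this kind can be sketched with the usual normal-approximation formula for comparing two proportions. The nesiritide event rates are not given in these slides, so the example borrows the ISIS-2 rates quoted earlier (9.2% vs. 12.0%) purely as an illustration. The function name and the choices of a two-sided 5% level and 90% power are assumptions.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.90):
    """Approximate subjects per arm needed to detect p1 vs. p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for the test
    z_beta = NormalDist().inv_cdf(power)            # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

n_90 = n_per_group(0.092, 0.120)               # 90% power
n_80 = n_per_group(0.092, 0.120, power=0.80)   # 80% power
print(f"per group: {n_90} (90% power), {n_80} (80% power)")
```

Pilot or observational data feed this calculation through the assumed event rates. A small change in the anticipated difference changes the required sample size substantially, because the difference enters as a square in the denominator.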
Slide 40: Take-home Messages
- Rely on Evidence-Based Medicine. Conventional wisdom can easily lead us astray.
- The objective of Statistics is to make informed inferences about a population, based on a sample. It is imperative to quantify the uncertainty.
- The P-value is a quantity that allows us to infer something about whether a scientific hypothesis is false.
- Non-significant results are inconclusive.
- Randomization and intent-to-treat are vital components in sound clinical research.
Slide 41: Design One Together
- Medical Question: Does Caffeine Withdrawal cause Headaches?
Slide 42: Eligibility
Slide 43: Design
- What are the sources of variation besides caffeine consumption?
- How do we control caffeine consumption?
- Should we use deception (hide the purpose of the study)? Is this ethical?
Slide 44: Design
- Pre-Post?
- Double Blind Parallel Study?
- Double Blind Crossover Study?
Slide 45: Forensics for Irregularity
Slide 46: Phenylephrine Crossover Studies
Slide 47: Phenylephrine (Baseline NAR)
Study (10 mg vs Placebo)   Std Dev   CV = 100 x SD/Mean
1 (N = 16) (EB)            2.0       15.3
2 (N = 10) (EB)            0.9        6.7
3 (N = 16)                 7.8       36.3
4 (N = 15)                 9.5       35.6
5 (N = 16)                 6.2       29.3
6 (N = 16)                 9.8       40.4
7 (N = 14)                 9.4       35.3
Slide 48: How do we test for Data Irregularities?
- Background: Baseline NAR (Nasal Airway Resistance) measures are typically xx.x (e.g. 20.2), and are always based on the mean of 10 observations (5 from each nostril).
- What null hypothesis can we test to find potential irregularities? What P-value might we use to declare significance?
Slide 49: Baseline Last Digit (3rd sign)
Last digit   Study 1   Study 2
0             2         5
1             4         2
2             2         1
3             6         9
4             2         4
5            23         7
6             8         5
7             9        10
8             3         3
9             5         4
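One candidate null hypothesis for these counts is that every last digit (0 through 9) is equally likely. The sketch below computes a chi-square goodness-of-fit statistic for each study under that assumption. The function name is hypothetical, and the 5% critical value is used only as a placeholder, since the slide leaves open what threshold is appropriate when several studies are screened.

```python
def chi_square_uniform(counts):
    """Chi-square goodness-of-fit statistic against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# Upper 5% point of the chi-square distribution with 9 degrees of freedom.
CRITICAL_5PCT_DF9 = 16.919

# Last-digit counts for digits 0..9, as listed on the slide.
study1 = [2, 4, 2, 6, 2, 23, 8, 9, 3, 5]
study2 = [5, 2, 1, 9, 4, 7, 5, 10, 3, 4]

for name, counts in (("Study 1", study1), ("Study 2", study2)):
    stat = chi_square_uniform(counts)
    flagged = stat > CRITICAL_5PCT_DF9
    print(f"{name}: chi-square = {stat:.2f}, exceeds 5% critical value: {flagged}")
```

Study 1's excess of values ending in 5 dominates its statistic. A flagged result is a reason to look more closely at how the data were recorded, not proof of an irregularity.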
Slide 51: Coronary Drug Project Data
- Five-Year Mortality (Clofibrate)
- Compliers: 15.0% (15.7%) (N = 708)
- Non-Compliers: 24.6% (22.5%) (N = 357)
- Compliers took >80% of their meds to death or to 5 years, whichever was first.
- Values in parentheses are 5-year mortality adjusted for prognostic factors.
Slide 52: Coronary Drug Project
- Five-Year Mortality (Placebo)
- Compliers: 15.1% (16.4%) (N = 1813)
- Non-Compliers: 28.2% (25.8%) (N = 882)
- Compliers took >80% of their meds to death or to 5 years, whichever was first.
- Values in parentheses are 5-year mortality adjusted for prognostic factors.
Slide 53: Coronary Drug Project
- Five-Year Mortality (As randomized)
- Clofibrate: 20.0% (N = 1103)
- Placebo: 20.9% (N = 2789)
- NB: Compliance could not be assessed in a small number of patients.
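The arithmetic behind the intent-to-treat lesson can be laid out directly from the unadjusted percentages on these three slides. This is a small illustrative sketch; the dictionary layout and the helper name `gap` are assumptions made for the example.

```python
# Five-year mortality (%), unadjusted, as quoted on the Coronary Drug Project slides.
mortality = {
    "clofibrate": {"compliers": 15.0, "non_compliers": 24.6, "as_randomized": 20.0},
    "placebo": {"compliers": 15.1, "non_compliers": 28.2, "as_randomized": 20.9},
}

def gap(a, b):
    """Difference in percentage points, rounded to one decimal place."""
    return round(a - b, 1)

# "Effect" of complying, within each arm (non-compliers minus compliers).
compliance_gap = {
    arm: gap(rates["non_compliers"], rates["compliers"])
    for arm, rates in mortality.items()
}

# Effect of the drug under intent-to-treat (placebo minus clofibrate).
drug_gap = gap(mortality["placebo"]["as_randomized"],
               mortality["clofibrate"]["as_randomized"])

print("compliance gap by arm:", compliance_gap)
print("as-randomized drug gap:", drug_gap)
```

The gap between compliers and non-compliers is large even on placebo, where the pills can have no pharmacological effect, while the as-randomized difference between arms is under one percentage point. Comparing compliers on drug to anyone else would therefore credit the drug with a difference that belongs to the kind of patient who complies.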