Title: Why some/many (all?) published clinical trials are false
1. Why some/many (all?) published clinical trials are false
- John P.A. Ioannidis
- Professor and Chairman, Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece
- Professor of Medicine (adjunct), Tufts University School of Medicine, Boston, USA
2. Why might research findings not be credible?
- There is bias
- There is random error
- Usually there is plenty of both
3. Discrepancies over time occur in randomized trials
Ioannidis and Lau, PNAS 2001
4. Diminishing effects are common in clinical medicine
- Across 100 meta-analyses of mental-health-related interventions, effect sizes for pharmacotherapies were far more likely to diminish than to increase as newer trials appeared
- Trikalinos et al. J Clin Epidemiol 2004
5. (No transcript: figure-only slide)
6. Highly-cited contradicted findings in early randomized trials
- Vitamin E and cardiovascular mortality (two large prospective cohorts, but also one large trial of 2,002 subjects, claimed large decreases in mortality)
- Hormone replacement therapy and coronary artery disease (major benefits claimed by the Nurses' Health Study, but also by small trials)
- A well-conducted randomized trial suggested that the monoclonal antibody HA-1A halves mortality from gram(-) sepsis; no effect was seen in a 10-times larger RCT
7. Overall credibility
- Depends on the pre-evidence odds
- Depends on the data (the study at hand)
- Depends on bias
- Depends on the field
- All of these may depend on each other
8. Simple model: no bias, one team of researchers
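For reference, the positive predictive value (PPV) of a claimed finding in this simple setting, as given in the PLoS Medicine 2005 paper cited on the slides below (R = pre-study odds that the probed relationship is true, α = type I error rate, β = type II error rate):

\[ \mathrm{PPV} = \frac{(1-\beta)\,R}{R + \alpha - \beta R} \]

A finding is more likely true than false when (1-β)R > α.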
9. Bias present
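With bias u (the proportion of analyses that would not have been "findings" but end up reported as such anyway), the same paper gives:

\[ \mathrm{PPV} = \frac{(1-\beta)R + u\beta R}{R + \alpha - \beta R + u - u\alpha + u\beta R} \]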
10. Many teams of researchers
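When n independent teams probe the same question and any single positive result can be published as a finding, the paper's formula becomes:

\[ \mathrm{PPV} = \frac{R\,(1-\beta^{n})}{R + 1 - (1-\alpha)^{n} - R\beta^{n}} \]

PPV falls as n grows, because the chance that at least one team obtains a false positive rises quickly.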
11. (No transcript: figure-only slide)
12. Illustrative PPV for clinical research designs
Ioannidis. Why most published research findings are false. PLoS Medicine 2005
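A minimal Python sketch of the bias-adjusted formula from slide 9, to make such illustrative PPVs easy to reproduce; the parameter choices below are illustrative assumptions, not values taken from the paper's table:

```python
def ppv(R, alpha=0.05, beta=0.20, u=0.0):
    """Positive predictive value of a claimed finding (bias-adjusted).

    R:     pre-study odds that the probed relationship is true
    alpha: type I error rate
    beta:  type II error rate (power = 1 - beta)
    u:     bias, the proportion of non-findings reported as findings
    """
    true_pos = (1 - beta) * R + u * beta * R
    false_pos = alpha + u * (1 - alpha)
    return true_pos / (true_pos + false_pos)

# Hypothetical designs (parameter values are assumptions for illustration):
print(ppv(R=1.0, u=0.10))              # adequately powered RCT, little bias
print(ppv(R=0.25, beta=0.80, u=0.30))  # small underpowered trial, more bias
print(ppv(R=0.001, u=0.10))            # exploratory, very low pre-study odds
```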
13. Post-study odds of a true finding are small
- When effect sizes are small
- When studies are small
- When fields are hot (many teams work on them, fiercely competing)
- When there is strong interest in the results
- When databases are large
- When analyses are more flexible
Ioannidis JP. PLoS Medicine 2005
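To see the "hot field" effect numerically, here is a small sketch using the many-teams formula from slide 10; the values of R, α, and β are illustrative assumptions:

```python
def ppv_teams(R, n, alpha=0.05, beta=0.20):
    """PPV when n independent teams probe the same question and any
    single positive result can be reported as a finding."""
    true_pos = R * (1 - beta ** n)
    false_pos = 1 - (1 - alpha) ** n
    return true_pos / (true_pos + false_pos)

for n in (1, 5, 10, 25):
    print(f"{n:>2} teams: PPV = {ppv_teams(R=0.5, n=n):.2f}")
```

With these assumptions, PPV drops from about 0.89 with one team to about 0.41 with 25 teams.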
14. A research finding cannot reach credibility over 50% unless u < R, i.e., bias must be less than the pre-study odds
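A short sketch of why this holds, from the bias-adjusted formula on slide 9:

\[ \mathrm{PPV} > \tfrac{1}{2} \iff (1-\beta)R + u\beta R > \alpha + u(1-\alpha) \]

The left-hand side is at most R (since u ≤ 1), while the right-hand side is at least u (since α(1−u) ≥ 0); so PPV above 50% forces u < R.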
15. Quality of studies
- Early empirical evaluations suggested that effect sizes may depend on aggregate quality scores; this has since been dismissed, because there are so many quality scores that inferences differ widely depending on which is used
- Other empirical evaluations suggested that specific quality items, such as lack of blinding and lack of allocation concealment in RCTs, may inflate treatment effects (e.g. Schulz et al. JAMA 1995)
- Now it seems more likely that such quality deficits may be associated with either inflated or deflated treatment effects (e.g. Balk et al. JAMA 2002)
16. Averaging quality is wrong
- A randomized trial with one major flaw may get the wrong answer
- A randomized trial with two major flaws may get an even more wrong answer, or may paradoxically get a somewhat more correct answer
- Flaws do not cancel out, of course, and they may even have multiplicative detrimental effects
17. The two kinds of bad quality
- Quality is bad on (evil) purpose: the effect sizes are almost always inflated
- Quality is bad because of stupidity: the effect sizes may be anything; usually, but not always, they are deflated
18. Potential conflicts
Patsopoulos et al. BMJ 2006
19. Ioannidis PLoS Clinical Trials 2006 and Clinical Trials 2007
20. Exploratory test for significance chasing
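The citations on the previous slide point to an exploratory excess-significance test: compare how many studies in a field report "significant" results against how many would be expected given the studies' power. Below is a simplified binomial sketch of that idea, not the published procedure; the example numbers are hypothetical:

```python
from scipy import stats

def excess_significance(observed_sig, powers):
    """Compare the observed count of 'significant' studies with the
    count expected from each study's estimated power (simplified:
    a one-sided binomial test at the mean power)."""
    n = len(powers)
    expected = sum(powers)
    p_bar = expected / n
    result = stats.binomtest(observed_sig, n, p_bar, alternative="greater")
    return expected, result.pvalue

# Hypothetical field: 10 studies with average power ~0.4, 9 'significant'
expected, p = excess_significance(9, [0.4] * 10)
print(f"expected significant: {expected:.1f}, p = {p:.4f}")
```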
21. Spurious claims of subgroups
Rothwell P. Lancet 2005
22. Month of birth and benefit from endarterectomy
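The birth-month subgroup is a classic illustration of multiplicity: test enough subgroups of a null trial and one will often look "significant". A minimal simulation sketch (all parameters are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def has_spurious_subgroup(n=1200, subgroups=12, alpha=0.05):
    """Simulate one null trial (no true effect anywhere), test each
    birth-month subgroup, and report whether any looks 'significant'."""
    treated = rng.integers(0, 2, n).astype(bool)
    outcome = rng.integers(0, 2, n).astype(bool)  # independent of treatment
    month = rng.integers(0, subgroups, n)
    for m in range(subgroups):
        idx = month == m
        a = int(np.sum(idx & treated & outcome))
        b = int(np.sum(idx & treated & ~outcome))
        c = int(np.sum(idx & ~treated & outcome))
        d = int(np.sum(idx & ~treated & ~outcome))
        if stats.fisher_exact([[a, b], [c, d]])[1] < alpha:
            return True
    return False

trials = 200
hits = sum(has_spurious_subgroup() for _ in range(trials))
print(f"{hits / trials:.0%} of null trials show a 'significant' subgroup")
```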
23. Time lag: bad news takes longer to appear
Ioannidis JP. JAMA 1998
24. ...even though they are obtained as fast...
25. ...but publication is delayed
26. Trial registration
- Upfront study registration has been adopted for randomized clinical trials as a means of minimizing publication and reporting biases and maximizing transparency
- This is an extremely important step forward
- Still, many trials are not registered, and even among those that are registered there is room for eventual selective reporting of outcomes and analyses
- Even with transparent and complete reporting, there is room for biases that act before the level of study design
27. Biases that precede the study design
- Setting the wider research agenda
  - Poor scientific relevance
  - Poor clinical utility
  - Poor consideration of prior evidence
    - Non-consideration of prior evidence
    - Biased consideration of prior evidence
    - Consideration of biased prior evidence
- Setting the specific research agenda
  - Straw man effects
  - Avoidance of head-to-head comparisons
  - Head-to-head comparisons bypassing demonstration of effectiveness
  - Overpowered studies
  - Unilateral aims
    - Benefits versus harms
  - Research as bulk advertisement
  - Ghost management of the literature
28. Clinical trials and burden of disease in sub-Saharan Africa
29. Geometry of treatment networks
30. Inflated effects with early stopping
Pocock et al. Controlled Clinical Trials 1989
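A minimal simulation sketch of the phenomenon (the stopping rule, effect size, and look schedule are illustrative assumptions): trials that cross an interim boundary early tend to do so on a random high, so their effect estimates are inflated.

```python
import numpy as np

rng = np.random.default_rng(1)

def run_trial(true_effect=0.2, n_per_look=100, looks=5, z_stop=2.5):
    """Group-sequential trial: at each interim look, stop early if the
    z-statistic crosses z_stop; return the effect estimate at stopping."""
    diffs = rng.normal(true_effect, 1.0, n_per_look * looks)
    for look in range(1, looks + 1):
        n = look * n_per_look
        est = diffs[:n].mean()
        if est * np.sqrt(n) > z_stop:      # z = est / (1 / sqrt(n))
            return est, True               # stopped early
    return est, False                      # ran to the final analysis

early, full = [], []
for _ in range(5000):
    est, stopped = run_trial()
    (early if stopped else full).append(est)

print(f"mean estimate when stopped early: {np.mean(early):.3f}")
print(f"mean estimate at full follow-up : {np.mean(full):.3f}")
print("true effect: 0.200")
```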
31. Biases after study completion
- Interpretation biases for the single study
  - Bias related to metric selection
    - OR vs. RR
    - Absolute versus relative effects
    - P-values versus effect sizes
  - Selective discussion of results
  - Selective invocation of external evidence
  - Silencing of limitations
  - Inappropriate generalization
- Interpretation biases in the wider scientific field
  - Publication bias
  - Time lag bias
  - Selective outcome and analysis reporting bias
  - Bias related to metrics of effect
  - Ghost management of the literature
  - Scientific citation bias
  - Skewed public dissemination
  - Resistance to independent replication
32. Correct, but unilateral, false evidence: neglecting harms
- Among 375,143 entries in the Cochrane Central Register of Controlled Trials, the search terms "harm OR harms" yielded 337 references
- Compare 55,374 retrieved using "efficacy" and 23,415 retrieved using "safety"
- Of the 337, excluding several cases (articles on self-harm or on harm-reduction, an efficacy-equivalent term), only 3 trial reports and 2 abstracts had these words in their titles
- Of the 3 trial reports, one started with the clause "more good than harm"
- The other two actually focused on the harms of the placebo arm
33. Harms
- An intervention is usually considered safe unless proven otherwise
- It may be more appropriate to consider an intervention potentially harmful until proven otherwise
34. Reporting of harms in RCTs is neglected
- The space allocated to harms in the Results section is typically the same as, or smaller than, the space allocated to the author names and affiliations
- Ioannidis and Lau, JAMA 2001
35. (No transcript: figure-only slide)
36. Emphasis on harms is often further limited
- When no dose comparison is involved
- When a trial appears in a high-impact-factor journal
- When there is a prior indication for the intervention
- When the trial shows significant results for efficacy
37. Reporting of harms is worse for non-pharmacological (NP) than for pharmacological interventions
38. Mental health trials: no harms recorded for any NP trials
39. Large-scale evidence is very sparse
40. Integration after the fact is not easy
41. Concluding comments
- Randomized controlled trials are a brilliant, simple design with a solid history of successful use in clinical research
- They can offer extremely useful evidence, and they are a must for documenting the effectiveness of proposed interventions
- This does not mean that they cannot suffer from important, major biases
- Caveat lector