Title: Systematic Reviews
1. Systematic Reviews
GME Evidence Based Medicine Course Module 5
2. Systematic Reviews
Objectives: At the completion of module 5, residents should be able to
- Define and list the main characteristics of a systematic review.
- Differentiate between a review article, a systematic review, and a meta-analysis.
- Describe the meaning of "heterogeneity" in terms of systematic reviews.
- Properly interpret a forest plot.
- Critically appraise a systematic review in terms of internal validity, interpretation of results, and applying results to a particular patient.
3. Definitions
Review articles
A broad overview of a topic, similar to a textbook chapter.
- Often covers multiple background aspects of a disease, such as natural history, etiology, epidemiology, signs and symptoms, diagnosis, treatment, and prognosis.
- The article summarizes the results from many primary studies.
- The studies to summarize are chosen at the discretion of the author.
4. Definitions
Review articles
A broad overview of a topic, similar to a textbook chapter.
Systematic Review
A type of review article that addresses a focused clinical question.
Studies are chosen using a standardized protocol to minimize selection bias.
5. Definitions
Review articles
A broad overview of a topic, similar to a textbook chapter.
Systematic Review
A type of review article that addresses a focused clinical question.
Meta-analysis
A type of systematic review in which the numerical results from individual studies are mathematically combined to give a single, overall estimate of treatment effect.
6. Definitions
Review articles
Systematic Review
Meta-analysis
- A systematic review can be thought of as a
research project done on the medical literature
itself.
- Instead of human beings acting as subjects, the
subjects of a systematic review are individual
RCTs
7. Finding Systematic Reviews
- Produces high-quality systematic reviews
- Managed by the Cochrane Collaboration
- A not-for-profit international organization and one of the initial developers of systematic reviews
- Available through the HSLIC web site.
8. Finding Systematic Reviews
- PubMed Clinical Queries
- They are accessed from the "Clinical Queries"
link on the blue side bar of the PubMed home
page.
9. Essential Concepts
Three concepts are essential to understanding the critical appraisal of systematic reviews. These are:
- Publication bias. Publication bias is one of the factors that systematic reviews attempt to avoid by selecting studies in a systematic way.
- Heterogeneity. Heterogeneity is a statistical measure of the difference between the results from different studies. The less heterogeneous the results are, the easier it becomes to estimate the overall effect.
- Forest plots. These graphical displays show study data in a way that makes it easy to see similarities and differences between studies.
10. Publication Bias
Publication bias is the tendency for positive studies (those that demonstrate a likely difference between two or more populations) to be preferentially published over studies that do not show a difference (negative studies).
Many factors contribute to publication bias:
- The perception that positive findings are more interesting and publication-worthy than negative findings.
- The tendency for researchers to delay or avoid submission of manuscripts with negative findings (the "file drawer phenomenon").
- The tendency for journal editors to reserve most of the space in their journals for positive studies.
Citation bias. Not only are positive studies more likely to be published, but once published, they are more likely to be cited than negative studies.
11. Publication Bias
Overcoming publication bias. Publication bias is out of the control of the researchers working on systematic reviews. Instead, they use techniques to check for it and minimize its effects:
- Searching a wide variety of bibliographic databases using pre-defined search criteria.
- Checking other publication sources, such as reference lists from articles and symposium and meeting proceedings.
- Contacting experts in the field, other researchers, journal editors, funding agencies, and pharmaceutical companies or manufacturers for information on unpublished studies.
- Registries of RCTs have recently been created so that RCTs are registered prior to conducting the study. This provides a clear trail to follow when tracking down study results, either positive or negative. Many funding agencies require registration in order to fund a study.
12. ClinicalTrials.gov
Registration information is often included in
PubMed
13. Funnel Plots
Funnel plots are a device for checking for publication bias.
- Each dot represents the overall effect from one RCT.
- As sample size increases, the width of the confidence interval should decrease.
- Results should be located in a symmetric, triangular area centered on the overall effect for all studies.
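The link between sample size and confidence interval width can be sketched numerically. The example below is a minimal illustration that assumes a normal approximation for a mean outcome. The function name `ci_half_width`, the standard deviation, and the sample sizes are hypothetical and are not taken from any study.

```python
import math

def ci_half_width(sd, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a mean.

    Assumes a normal approximation; sd is the outcome standard deviation
    and n is the number of subjects.
    """
    return z * sd / math.sqrt(n)

# Hypothetical standard deviation and study sizes (illustrative only).
sd = 10.0
for n in (25, 100, 400):
    print(n, round(ci_half_width(sd, n), 2))
```

Each fourfold increase in sample size halves the interval width, which is why results from large studies cluster near the tip of the funnel while small studies spread out across its base.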
14. Funnel Plots
Missing studies will manifest as an asymmetry in the funnel plot.
- Missing studies will appear as a gap in the portion of the funnel plot where you would expect to find negative studies.
- The unopposed positive studies will shift the apparent treatment effect (blue line) towards a larger effect size than the true effect.
15. Heterogeneity
- Refers to differences between the outcomes of studies included in a meta-analysis.
- If most studies are similar to each other and show a similar result (low heterogeneity), this increases confidence that the effect being measured is real.
- If results from different studies are vastly different from each other, this suggests that each study is measuring something slightly different from the other studies.
- High heterogeneity can be due to:
- Random chance
- Differences in patient populations between studies
- Differences in treatment
- Differences in assessing outcomes
- Other differences in study methodology
16. Measures of Heterogeneity
You may see two parameters related to heterogeneity: Cochran's Q (reported as a chi-square statistic, χ²) and a second parameter abbreviated as I².
- The p-value reported with the heterogeneity test reflects how likely it would be to see differences between studies at least as large as those observed if random chance alone were at work.
- I² describes the percentage of the variation between study results that is attributable to heterogeneity rather than chance. Lower values indicate more consistent studies.
- A high p-value is consistent with low heterogeneity. Heterogeneity should be low in order to combine individual study results into a single, overall estimate of treatment effect.
- In the case of meta-analyses, the p-value should be high to ensure that combining study results is appropriate.
- In general, p should be greater than 0.1 in order to statistically combine study results.
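The two heterogeneity parameters can be computed from each study's effect estimate and standard error. The sketch below assumes the standard inverse-variance formulation of Cochran's Q and the usual definition of I² as (Q - df) / Q, floored at zero. The function name `cochran_q` and the three study results are hypothetical.

```python
def cochran_q(effects, std_errors):
    """Return (Q, degrees of freedom, I-squared in percent).

    effects: per-study effect estimates (e.g., mean differences)
    std_errors: per-study standard errors of those estimates
    """
    # Inverse-variance weights: more precise studies count for more.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation beyond what chance would explain.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2

# Three hypothetical studies (illustrative numbers only).
q, df, i2 = cochran_q([0.10, 0.30, 0.50], [0.10, 0.10, 0.10])
print(round(q, 2), df, round(i2, 1))
```

For these made-up studies Q is about 8 on 2 degrees of freedom and I² is about 75%, which would usually be read as substantial heterogeneity. The p-value is obtained by comparing Q with a chi-square distribution on `df` degrees of freedom, for example with `scipy.stats.chi2.sf(q, df)` if SciPy is available.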
17. Measures of Heterogeneity
Systematic reviews with high heterogeneity should either not combine results (in a meta-analysis) or should use statistical methods to compensate for the heterogeneity.
Fixed effects model. Assumes that any differences between study results are due only to random chance. Appropriate when heterogeneity is low.
Random effects model. Makes some conservative assumptions in order to combine studies. The overall result should be interpreted with caution since each study seems to be actually measuring something slightly different from the others. In a sense, the random effects model is comparing apples and oranges.
Subgroup analysis. If heterogeneity is high, but the differences may be due to known factors (e.g., patient age), results are sometimes stratified by these known factors, and then individual strata from different studies become similar enough that they can be combined.
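The difference between the two models can be sketched in a few lines. The example below assumes inverse-variance weighting for the fixed effects model and the DerSimonian-Laird estimator for the random effects model. DerSimonian-Laird is one common choice and is an assumption here, since the module does not name a specific method. The function names and study values are hypothetical.

```python
def fixed_effect(effects, std_errors):
    """Inverse-variance pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return pooled, (1.0 / sum(weights)) ** 0.5

def random_effects_dl(effects, std_errors):
    """DerSimonian-Laird pooled estimate, standard error, and tau-squared."""
    weights = [1.0 / se ** 2 for se in std_errors]
    fixed, _ = fixed_effect(effects, std_errors)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    # tau2 estimates the between-study variance; it is zero when Q <= df.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Adding tau2 to every study's variance flattens the weights.
    star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(star, effects)) / sum(star)
    return pooled, (1.0 / sum(star)) ** 0.5, tau2

# Three hypothetical, heterogeneous studies (illustrative numbers only).
effects, ses = [0.10, 0.30, 0.50], [0.10, 0.10, 0.10]
print(fixed_effect(effects, ses))
print(random_effects_dl(effects, ses))
```

With these made-up values both models give the same pooled estimate, but the random effects standard error is about twice as large. That wider uncertainty is the sense in which the random effects model is the more conservative choice.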
18. Forest Plots
Forest plots are a graphical display of individual study results that were included in a systematic review. If the systematic review was a meta-analysis, they also show an estimate of the overall treatment effect.
The following series of slides comes from a Cochrane systematic review: Lee T, Southern KW. Topical cystic fibrosis transmembrane conductance regulator gene replacement for cystic fibrosis-related lung disease. Cochrane Database Syst Rev. 2007. 18(2):CD005599.
Synopsis: The effectiveness of gene therapy for cystic fibrosis was reviewed. Only three high quality RCTs were found. While gene therapy did seem to improve some laboratory tests of cystic fibrosis, clinically meaningful outcomes were not improved.
19. Reading Forest Plots
- Column 1: Subgroups
- This particular analysis used two subgroups (arrows):
- Outcomes measured between 1 and 30 days post treatment
- Outcomes measured between 1 and 2 months post treatment.
20. Reading Forest Plots
Column 2: Experimental groups. The number of subjects receiving gene therapy for each study and each subgroup is shown.
21. Reading Forest Plots
Column 3: Experimental groups. The average FEV1 (forced expiratory volume in 1 second) for the experimental groups is shown.
22. Reading Forest Plots
Columns 4 and 5: Control groups. The number of subjects in each placebo group and their FEV1 are shown.
23. Reading Forest Plots
- Column 6: Graphical summary
- Green squares represent point estimates.
- The size of the square is proportional to the number of subjects in the group.
- The horizontal lines show the 95% confidence interval.
- The black diamonds represent the combined results for each subgroup.
- Note that this analysis used a fixed effects model.
24. Reading Forest Plots
The last column shows the point estimate and
confidence interval using a traditional numerical
display.
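To show where a point estimate and its confidence interval come from, the sketch below computes a mean difference with an approximate 95% confidence interval from each group's mean, standard deviation, and size. The standard deviations are an added assumption, since the slides mention only the number of subjects and the mean FEV1. All numbers are hypothetical and are not taken from the Cochrane review.

```python
import math

def mean_difference_ci(mean_t, sd_t, n_t, mean_c, sd_c, n_c, z=1.96):
    """Mean difference (treatment minus control) with an approximate 95% CI.

    Uses a normal approximation with unpooled variances.
    """
    diff = mean_t - mean_c
    se = math.sqrt(sd_t ** 2 / n_t + sd_c ** 2 / n_c)
    return diff, diff - z * se, diff + z * se

# Hypothetical FEV1 summaries (percent predicted) for one study.
diff, lower, upper = mean_difference_ci(75.0, 10.0, 50, 72.0, 10.0, 50)
print(round(diff, 2), round(lower, 2), round(upper, 2))
```

In this made-up example the interval runs from roughly -0.9 to 6.9. Because it crosses zero, the horizontal line for this study would cross the "zero effect" line on the forest plot.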
25. Reading Forest Plots
Each subgroup shows the chi-square, I², and p-value for heterogeneity.
26. Critical Appraisal
Remember that systematic reviews can be thought of as research projects done on the medical literature itself, with entire studies acting as the "subjects" instead of human volunteers.
- The critical appraisal step is the third step of the EBM process:
- Formulate a focused clinical question.
- Search for the best evidence.
- Critically appraise the evidence.
- Apply the evidence to your patients.
27. Critical Appraisal
The critical appraisal asks three broad questions:
- How valid are the results?
- What are the results?
- How can the results be applied to the patient?
The same Cochrane systematic review of gene therapy for cystic fibrosis will be used to illustrate the critical appraisal of systematic reviews: Lee T, Southern KW. Topical cystic fibrosis transmembrane conductance regulator gene replacement for cystic fibrosis-related lung disease. Cochrane Database Syst Rev. 2007. 18(2):CD005599.
28. Validity Questions
Four sub-questions address the validity of the study:
- Did the systematic review address a focused clinical question?
- How detailed and exhaustive was the search for studies?
- How high was the quality of the selected studies?
- How reproducible were the assessments of study quality?
29. Validity Questions
- Did the systematic review address a focused clinical question?
- This validity question relates to the PICO question (Patient, Intervention, Comparison, Outcomes).
- Systematic reviews should be specific enough to answer a genuine clinical question, but general enough to apply to a broad range of patients.
- For the systematic review of gene therapy for cystic fibrosis, the answer to this question is likely to be "No" for clinical care, but "Yes" for research, since gene therapy is generally an area of research only.
- Additionally, the review included both viral and non-viral vectors in the delivery of gene products, which may or may not have drastically different response profiles.
30. Validity Questions
- How detailed and exhaustive was the search for studies?
- This question relates to publication bias and methods to overcome it.
- The only two search terms used were "gene therapy" and "cystic fibrosis."
- Both MEDLINE and EMBASE were searched.
- Registries of clinical trials were consulted.
- Hand searching included pediatric pulmonology journals and abstracts from major conferences.
- Experts and researchers were also personally contacted for information on unpublished studies.
- Not done (or at least not listed): contacting pharmaceutical companies, searching without language restrictions, or checking a funnel plot for asymmetry.
31. Validity Questions
- How high was the quality of the selected studies?
- Just as in the critical appraisal of RCTs, the methodology of each study must be known so that the validity of its results can be assessed.
- In the CF systematic review, studies were generally of moderate or uncertain risk of bias. 16 studies were initially identified, but only 3 were randomized controlled trials that were relevant to the focus.
- Of these 3 studies (with a total of 155 subjects), none described the process of randomization.
- None included information on allocation concealment.
- Only 1 study explicitly listed the roles that were blinded. The other two used the non-specific expression "double-blind."
- Follow-up ranged from 75% to 89%, which is acceptable for these kinds of studies.
32. Validity Questions
- How reproducible were the assessments of study quality?
- This question asks if assessments of quality were reproducible, not if one RCT could reproduce the results of another RCT.
- Sometimes a statistical measure known as "inter-rater reliability" or "kappa" is used to quantify the reproducibility of assessments. In general, a kappa (κ) of 0.7 or above indicates good inter-rater reliability.
- High inter-rater reliability is important because it lends credibility to the method of assessment of study quality.
- The use of standardized forms or protocols and explicit quality criteria can increase the reproducibility of assessments.
- In the example article, no kappa or quantification was given. Only a statement that "there was agreement" between the reviewers was included.
- Articles were independently assessed, and a standardized form was used.
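As an illustration of how kappa quantifies agreement, the sketch below computes Cohen's kappa for two reviewers who each rate the same set of studies. Cohen's kappa for two raters is assumed here, since the module does not specify which kappa statistic is meant. The function name `cohens_kappa` and the ratings are hypothetical and are not data from the example article.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categories to the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items rated identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1.0:
        # Both raters used one identical category for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)

# Hypothetical risk-of-bias ratings for ten studies by two reviewers.
reviewer_1 = ["low"] * 6 + ["high"] * 4
reviewer_2 = ["low"] * 5 + ["high"] * 4 + ["low"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 2))
```

The two hypothetical reviewers agree on 8 of 10 studies, yet kappa is only about 0.58 because some of that agreement would be expected by chance. This falls below the 0.7 threshold for good inter-rater reliability mentioned above.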
33. Results Questions
Three sub-questions address the results of the study:
- How similar to each other were the results of each study?
- What are the overall results and conclusions?
- How precise are the overall results?
34. Results Questions
- How similar to each other were the results of each study?
- This question is about heterogeneity.
- In the CF systematic review, meta-analysis of the gene replacement studies was limited because of different study designs.
- The forest plots near the end of the systematic review provide a quick, visual reference to answer this question.
- The point estimates can be compared to see if they all fall to the same side of the "zero effect" line.
- The confidence intervals can be visually inspected to see if there is a range that falls within the confidence interval for each study.
- The p-value for heterogeneity can be checked. The p-value indicates the probability that any differences between results are due to random chance alone.
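The two visual checks described above can also be expressed as simple calculations. The sketch below is an informal aid and does not replace a formal heterogeneity test. The function names and the study values are hypothetical.

```python
def same_side_of_zero(estimates):
    """True if every point estimate is positive or every one is negative."""
    return all(e > 0 for e in estimates) or all(e < 0 for e in estimates)

def common_interval(intervals):
    """Range shared by all confidence intervals, or None if there is none.

    intervals: list of (lower, upper) confidence limits, one per study.
    """
    lower = max(lo for lo, _ in intervals)
    upper = min(hi for _, hi in intervals)
    return (lower, upper) if lower <= upper else None

# Hypothetical point estimates and 95% confidence intervals for three studies.
estimates = [1.5, 3.0, 0.5]
intervals = [(-1.0, 4.0), (0.5, 5.5), (-2.0, 3.0)]
print(same_side_of_zero(estimates))
print(common_interval(intervals))
```

Here all three made-up estimates favor the same direction and the intervals share the range 0.5 to 3.0, a pattern that would suggest reasonably consistent results.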
35. Results Questions
- What are the overall results and conclusions?
- This question asks if the systematic review demonstrates a clear treatment effect or not.
- Heterogeneity or low quality studies can prohibit making any general conclusions.
- See the "Implications for practice" section on p. 13 of the gene therapy article. The authors state there is no evidence to support the use of nebulized gene therapy treatments.
36. Results Questions
- How precise are the overall results?
- In general, a systematic review of high quality studies should be able to come to a more precise result than any of its individual studies.
- In the gene therapy review, the ability to combine studies was limited by different study designs and the unclear quality of some of the studies.
- The best estimates of overall results are shown by the black, diamond-shaped summaries in the forest plots.
- The authors comment (p. 13) that the most important finding was an increase in flu-like side effects in those receiving gene therapy.
37. Patient Applicability
Three sub-questions address the issue of applying results to an individual patient:
- How should results be interpreted for a specific patient?
- Were all clinically important outcomes considered?
- Are the benefits worth the potential risks and costs?
38. Patient Applicability
- How should results be interpreted for a specific patient?
- Many factors go into answering this question, including the amount of correspondence between your clinical question and the focus of the systematic review.
- If any part of your PICO question (patients, interventions, comparisons, or outcomes) differs from that of the review, then clinical judgment is required to determine if the systematic review is similar enough to help inform decisions.
- Both adults and children with cystic fibrosis were included in the gene therapy review. However, this systematic review is not likely to be useful for clinical care because gene therapy remains largely in the experimental stage.
39. Patient Applicability
- Were all clinically important outcomes considered?
- Outcomes to consider are effectiveness, adverse drug events, expertise needed to deliver the treatment effectively, and patient preferences.
- This systematic review set out to compare a comprehensive set of real and surrogate outcomes, as well as a broad range of adverse drug events and social factors such as days lost from school or work (see pp. 3-4).
- Unfortunately, few of the studies included these measures (see pp. 9-12).
40. Patient Applicability
- Are the benefits worth the potential risks and costs?
- This question requires a subjective synthesis of all results of the critical appraisal.
- Clinical judgment and patient preference are required to prioritize benefits, risks, and costs.
- The Cochrane systematic review concludes that the evidence does not support recommending gene therapy to patients.