Title: The Campbell Collaboration Systematic Review
1. The Campbell Collaboration Systematic Review
- Presented by
- Harris Cooper
- University of Missouri-Columbia
- Larry V. Hedges
- University of Chicago
2. A Campbell Systematic Review
- Is meant to synthesize evidence on social and
behavioral interventions and public policy
3. A Campbell Systematic Review
- Is primarily concerned with
- Evidence on overall intervention or policy effectiveness and how effectiveness is influenced by variations in
- Process
- Implementation
- Intervention components
- Recipients
- Other factors
4. A Campbell Systematic Review
- Uses systematic, transparent rules to define,
gather, summarize, integrate and present research
evidence
5. How have the methods of systematic reviewing
developed?
6. Statistical procedures for the integration of research have existed for 100 years
- 1904 K. Pearson. Report on certain enteric fever inoculation statistics. British Medical Journal, 3, 1243-1246.
- 1932 R. A. Fisher. Statistical Methods for Research Workers. London: Oliver & Boyd.
- "it sometimes happens that although few or no statistical tests can be claimed individually as significant, yet the aggregate gives an impression that the probabilities are lower than would have been obtained by chance." (p. 99)
- 1932 R. T. Birge. The calculation of errors by the method of least squares. Physical Review, 40, 207-227.
7. Statistical procedures for the integration of research have existed for 100 years
- Additional references to early work can be found in
- Chalmers, I., Hedges, L. V., & Cooper, H. (in press). A brief history of research synthesis. Evaluation & the Health Professions.
- Olkin, I. (1990). History and goals. In K. W. Wachter & M. L. Straf (Eds.), The future of meta-analysis. New York: Russell Sage Foundation.
8. Modern interest in statistical synthesis of research exploded in the mid-1970s
- 1976 G. V. Glass. Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3-8.
- "Meta-analysis is the statistical analysis of a large collection of analysis results from individual studies for purposes of integrating the findings." (p. 3)
- 1977 F. Schmidt & J. Hunter. Development of a general solution to the problem of validity generalization. Journal of Applied Psychology, 62, 529-540.
- 1997 M. Hunt. How Science Takes Stock: The Story of Meta-Analysis. New York: Russell Sage Foundation.
9. Use of statistical procedures was precipitated by the increase in social science research
- 1978 R. Rosenthal & D. Rubin. Interpersonal expectancy effects: The first 345 studies. Behavioral and Brain Sciences, 3, 377-415.
- 1979 G. V. Glass & M. L. Smith. Meta-analysis of research on class size and achievement. Educational Evaluation and Policy Analysis, 1, 2-16.
- 1979 J. Hunter, F. Schmidt & R. Hunter. Differential validity of employment tests by race: A comprehensive review and analysis. Psychological Bulletin, 86, 721-735.
10. ...and by demonstrations of flaws in traditional reviewing procedures
- 1980 H. Cooper & R. Rosenthal. Statistical versus traditional procedures for summarizing research findings. Psychological Bulletin, 87, 442-449.
11. The first textbooks on statistical procedures appeared in the 1980s
- 1981 G. V. Glass, B. McGaw & M. L. Smith. Meta-Analysis in Social Research. Beverly Hills, CA: Sage.
- 1982 J. Hunter, F. Schmidt & G. Jackson. Meta-Analysis: Cumulating Research Findings Across Studies. Beverly Hills, CA: Sage.
- 1984 R. Rosenthal. Meta-Analytic Procedures for Social Research. Beverly Hills, CA: Sage.
12. The first textbooks on statistical procedures appeared in the 1980s
- 1985 L. V. Hedges & I. Olkin. Statistical Methods for Meta-Analysis. Orlando, FL: Academic Press.
- 1986 F. Wolf. Meta-Analysis: Quantitative Methods for Research Synthesis. Beverly Hills, CA: Sage.
13. ...while, simultaneously, a scientific paradigm for research synthesis emerged
- 1971 K. Feldman. Using the work of others: Some observations on reviewing and integrating. Sociology of Education, 4, 86-102.
- 1971 R. Light & P. Smith. Accumulating evidence: Procedures for resolving contradictions among research studies. Harvard Educational Review, 41, 429-471.
- 1980 G. Jackson. Methods for integrative reviews. Review of Educational Research, 50, 438-460.
14. ...while, simultaneously, a scientific paradigm for research synthesis emerged
- 1982 H. Cooper. Scientific guidelines for conducting integrative research reviews. Review of Educational Research, 52, 291-302.
- "the integration of separate research projects involves scientific inferences as central to the validity of knowledge as the inferences made in primary research." (p. 291)
- "Most important, the methodological choices at each review stage may engender threats to the validity of the review's conclusions." (p. 292)
- "Because of the increasing role that reviews play in our definition of knowledge, it seems that these adjustments in procedures are inevitable if behavioral scientists hope to retain their claim to objectivity." (p. 301)
15. The Integrative Review Conceptualized as a Research Project
- Stages of a Research Synthesis
- Problem Formulation
- Data Collection
- Data Evaluation
- Data Analysis
- Public Presentation
16. The Integrative Review Conceptualized as a
Research Project
- Characteristics of Each Stage
- Research Question Asked
- Primary Function in Review
- Procedural Differences That Create Variation in Review Conclusions
- Sources of Potential Invalidity in Review Conclusions
17. The first texts on scientific research synthesis appeared shortly thereafter
- 1984 Harris Cooper. The Integrative Research Review: A Systematic Approach. Beverly Hills, CA: Sage.
- 1984 Richard Light & David Pillemer. Summing Up: The Science of Research Reviewing. Cambridge, MA: Harvard University Press.
18. These were followed by other excellent texts
- Some treat research synthesis from the perspective of particular research design conceptualizations
- Eddy, Hasselblad & Shachter, 1992; Mullen, 1989
- Some are tied to particular software packages
- Johnson, 1989; Bushman & Wang, 1999
- Some treat research synthesis generally
- Cooper & Hedges, 1994; Lipsey & Wilson, 2001
- And some look at potential future developments in research synthesis
- Wachter & Straf, 1990; Cook et al., 1992
19. What is a C2 Review Protocol?
- A C2 Review Protocol is a document that
- Sets out the reviewer's intentions with regard to the topic and the methods to be used in carrying out a proposed review
- Is meant for inclusion in the Campbell Database of Systematic Reviews
20. What should a C2 Protocol contain?
- Cover Sheet
- Background for the Review
- Objectives for the Review
21. What should a C2 Protocol contain?
- Methods
- Criteria for inclusion and exclusion of studies in the review
- Search strategy for identification of relevant studies
- Description of methods used in the component studies
- Criteria for determination of independent findings
- Details of study coding categories
- Statistical procedures and conventions
- Treatment of qualitative research
22. What should a C2 Protocol contain?
- Timeframe
- Plans for Updating the Review
- Acknowledgements
- Statement Concerning Conflict of Interest
- References
- Tables
23. What are the key aspects of methods in C2 Reviews?
- Criteria for inclusion and exclusion of studies in the review
- Search strategy for identification of relevant studies
- Description of methods used in primary research
- Criteria for determining independent findings
24. What are the key aspects of methods in C2 Reviews?
- Details of coding of study characteristics
- Statistical procedures and conventions
- Treatment of qualitative research
25. What is the major source of bias in systematic
reviews?
26. Selection biases due to systematic non-inclusion
of studies that give different results than other
studies
- Non-inclusion may happen because
- The studies were never identified
- The studies were identified, but never retrieved
- Bias exists in evaluation of studies for
inclusion
27. Selection biases due to systematic non-inclusion of studies that give different results than other studies
- A particularly important source of selection bias is publication bias, associated with non-inclusion of studies with statistically insignificant results
- Publication selection effects can produce very large biases
28. Such biases are problematic because they do not cancel across studies
- A systematic review that does not control for selection bias may be just as biased as a single study
- However, the review may be more misleading because it appears to be more precise
29. How are study results used in systematic reviews?
30. Study results are usually represented quantitatively as effect sizes
- Effect sizes are chosen to be comparable (to mean the same thing) across all of the studies in the review
- Sometimes the effect size will be as simple as the raw mean difference between the treatment and control groups
31. Study results are usually represented quantitatively as effect sizes
- However, when the outcome is measured by different instruments in different studies, the raw mean difference may not be comparable across studies. In such cases, a standardized effect size may be used
- The standardized mean difference (the raw mean difference divided by the standard deviation) is often used as an effect size measure, as in the sketch below
- The correlation coefficient is also used as an effect size measure
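As a minimal illustration (a sketch only; the function name and the summary statistics are hypothetical), the standardized mean difference can be computed from the usual group summaries. The pooled-standard-deviation version shown here is one common convention:

```python
import math

def standardized_mean_difference(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Raw mean difference divided by the pooled standard deviation."""
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                          / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd

# Example: treatment scores 5 points higher on a scale with SD near 10.
d = standardized_mean_difference(mean_t=55, mean_c=50, sd_t=10, sd_c=10,
                                 n_t=40, n_c=40)
print(round(d, 3))  # -> 0.5
```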
32. Study results are usually represented quantitatively as effect sizes
- When studies have dichotomous outcomes, other effect sizes are typically used (see the sketch below)
- The odds ratio between treatment and control groups
- The rate ratio (ratio of proportions) between treatment and control groups
- The rate difference (difference in proportions) between treatment and control groups
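A minimal sketch of the three dichotomous effect sizes listed above, computed from a 2x2 table of hypothetical counts:

```python
def dichotomous_effect_sizes(events_t, n_t, events_c, n_c):
    """Odds ratio, rate (risk) ratio, and rate difference from a 2x2 table."""
    p_t = events_t / n_t
    p_c = events_c / n_c
    odds_ratio = (p_t / (1 - p_t)) / (p_c / (1 - p_c))
    rate_ratio = p_t / p_c
    rate_difference = p_t - p_c
    return odds_ratio, rate_ratio, rate_difference

# Example: 30/100 events under treatment vs 20/100 under control.
print(dichotomous_effect_sizes(30, 100, 20, 100))
# -> odds ratio ~ 1.714, rate ratio 1.5, rate difference 0.10
```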
33. Study results are usually represented
quantitatively as effect sizes
- When studies use a quantitative independent
variable, raw regression coefficients or
partially or fully standardized regression
coefficients can be used as effect sizes.
34. Study results are usually represented
quantitatively as effect sizes
- In rare cases, p-values may be used to represent
study results.
35. Why do we need special methods for systematic reviews?
- Can't we just see how many studies found significant results?
- We usually interpret studies by determining whether they found an effect that was big enough to be statistically significant
- We could just see if a large proportion of studies found the effect
36. Why do we need special methods for systematic reviews?
- Such a strategy is usually called vote counting. Each study metaphorically casts a vote for or against effectiveness of the treatment
37. Why do we need special methods for systematic reviews?
- Intuitive though it may be, vote counting can be shown to have terrible properties
- Vote counting has low power; it often fails to find effects even when they exist
- The power doesn't necessarily increase as the amount of evidence (the number of studies) increases
- In fact, the chance that vote counting detects effects that exist in all studies may tend to zero as the amount of evidence increases! (see the sketch below)
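A minimal sketch of why this happens. Assume (hypothetically) that every study tests a real effect but each has power 0.30, and that vote counting declares an effect when a majority of studies are individually significant. The binomial probability of a majority shrinks toward zero as studies accumulate:

```python
from math import comb

def prob_majority_significant(k, power):
    """P(more than half of k studies are individually significant),
    when each study independently has the stated power."""
    return sum(comb(k, j) * power**j * (1 - power)**(k - j)
               for j in range(k // 2 + 1, k + 1))

# With modest per-study power, the chance that vote counting "detects"
# the effect falls as more evidence accumulates.
for k in (5, 10, 50, 100):
    print(k, round(prob_majority_significant(k, power=0.30), 4))
```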
38. Why do we need special methods for systematic
reviews?
- Vote counting also gives little insight about the
size of effects or the consistency of effects
across studies.
39. Are the effect sizes from every study equal?
- Even if every study in the review is equally free
of bias, the effect sizes do not contain the same
amount of information.
40. Are the effect sizes from every study equal?
- Sample size usually varies substantially from study to study
- Large studies provide more precise information about effect size than small studies
- Large variation in sample sizes, and therefore large variation in precision across studies, is common in systematic reviews
41. Are the effect sizes from every study equal?
- The precision of a study's effect size is characterized by its sampling error variance
- Sampling error variances can usually be computed from formulas involving sample size and effect size, as in the sketch below
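A minimal sketch for the standardized mean difference: its large-sample sampling error variance has a standard formula (Hedges & Olkin, 1985) involving only the two group sample sizes and the effect size itself. The example values are hypothetical:

```python
def variance_of_d(d, n_t, n_c):
    """Large-sample sampling error variance of the standardized
    mean difference d."""
    return (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))

v = variance_of_d(d=0.5, n_t=40, n_c=40)
print(round(v, 4))          # sampling error variance
print(round(v ** 0.5, 4))   # standard error
```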
42. Are the effect sizes from every study equal?
- Systematic reviews often contain a graphical presentation of each study's effect size and precision, called a forest plot
43. Are the effect sizes from every study equal?
- A forest plot gives each study's effect size and a 95% confidence interval for the effect
- The width of the confidence interval indicates the uncertainty of a study's effect size estimate
44. Are the effect sizes from every study equal?
- Often there is a dot denoting the effect size whose area corresponds to the relative sample size
- Overlap between the confidence intervals represents consistency of the effect sizes across studies (a minimal plotting sketch follows)
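A minimal plotting sketch of a forest plot, using hypothetical effect sizes and variances; the marker-size scaling is only a rough stand-in for precision:

```python
import matplotlib.pyplot as plt

# Hypothetical effect sizes and sampling error variances for five studies.
studies = ["Study A", "Study B", "Study C", "Study D", "Study E"]
effects = [0.30, 0.55, 0.10, 0.42, 0.25]
variances = [0.02, 0.05, 0.01, 0.08, 0.03]

fig, ax = plt.subplots()
for i, (es, v) in enumerate(zip(effects, variances)):
    half_width = 1.96 * v ** 0.5                  # 95% confidence interval
    ax.plot([es - half_width, es + half_width], [i, i], color="black")
    ax.plot(es, i, "s", color="black",
            markersize=4 / v ** 0.25)             # larger marker = more precise study
ax.axvline(0, linestyle="--", color="gray")       # line of no effect
ax.set_yticks(range(len(studies)))
ax.set_yticklabels(studies)
ax.invert_yaxis()                                 # first study at the top
ax.set_xlabel("Effect size (95% confidence interval)")
plt.show()
```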
45. How are the results combined across studies?
- Effect sizes are usually combined across studies by averaging
- Since effect sizes usually differ in precision, we usually want to give more weight to studies with greater precision (that is, smaller sampling error variance)
46. How are the results combined across studies?
- Weighting effect sizes by precision leads to a weighted average of the form
- T̄ = (Σ wi Ti) / (Σ wi)
- where the Ti are the effect sizes and the weights wi are the inverses of the sampling error variances
47. How are the results combined across studies?
- The weighted average has precision (sampling error variance) given by the inverse of the sum of the weights
- v(T̄) = 1 / (Σ wi), as in the sketch below
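A minimal computational sketch of the inverse-variance weighted average and its variance, directly implementing the two formulas above with hypothetical effect sizes and variances:

```python
def inverse_variance_summary(effects, variances):
    """Weighted average of effect sizes T_i with weights w_i = 1 / v_i,
    plus the variance of that average, 1 / sum(w_i)."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * t for w, t in zip(weights, effects)) / sum(weights)
    var = 1.0 / sum(weights)
    return mean, var

effects = [0.30, 0.55, 0.10, 0.42, 0.25]
variances = [0.02, 0.05, 0.01, 0.08, 0.03]
mean, var = inverse_variance_summary(effects, variances)
print(round(mean, 3), round(var ** 0.5, 3))  # pooled effect and its standard error
```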
48. Is the average effect size the only important summary?
- The average is important, but variation in effect sizes across studies is important too
- The variation we care about is the degree to which the true effect sizes differ across studies
- Total variation = Sampling error variation + True variation
49. Is the average effect size the only important
summary?
- The standard deviation of the effect sizes
measures total variation, which includes sampling
error variation and true variation in effect
sizes.
50. Is the average effect size the only important summary?
- There are two strategies for assessing true variation in effect sizes across studies
- Tests of homogeneity, which are statistical tests of the hypothesis that the true effects are equal across studies
- Estimates of the between-studies variance component
51. Is the average effect size the only important summary?
- The between-studies variance component is a quantitative estimate of true effect size variation across studies
- The homogeneity test is a test that the variance component is zero (both are sketched below)
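A minimal sketch of both quantities: the homogeneity statistic Q and an estimate of the between-studies variance component. The moment-based DerSimonian-Laird estimator shown here is one standard choice (not named in the slides; others exist), and the data are hypothetical:

```python
def q_statistic_and_tau2(effects, variances):
    """Homogeneity statistic Q and the DerSimonian-Laird estimate of
    the between-studies variance component tau^2."""
    weights = [1.0 / v for v in variances]
    w_sum = sum(weights)
    mean = sum(w * t for w, t in zip(weights, effects)) / w_sum
    q = sum(w * (t - mean) ** 2 for w, t in zip(weights, effects))
    df = len(effects) - 1
    c = w_sum - sum(w * w for w in weights) / w_sum
    tau2 = max(0.0, (q - df) / c)   # truncated at zero
    return q, tau2

effects = [0.30, 0.55, 0.10, 0.42, 0.25]
variances = [0.02, 0.05, 0.01, 0.08, 0.03]
q, tau2 = q_statistic_and_tau2(effects, variances)
print(round(q, 2), round(tau2, 4))
# Compare Q to a chi-square with k - 1 degrees of freedom to test homogeneity.
```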
52. How do we account for variation in effects across studies?
- We account for variation across studies by using statistical models
- Just as in primary research, there are basically two modeling approaches
- Analogues to Analysis of Variance
- Analogues to Multiple Regression
53. How do we account for variation in effects across studies?
- Analogues to Analysis of Variance allow us to compare groups of studies, for example whether one group of studies (such as studies conducted in the US) has a different average effect than another (such as studies conducted in Europe); a minimal sketch follows
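A minimal sketch of the ANOVA analogue with a hypothetical US/Europe split: compute each group's inverse-variance weighted mean, then compare them. With two groups, the between-groups statistic is a one-degree-of-freedom chi-square test of equal group mean effects:

```python
def fixed_effect_mean(effects, variances):
    """Inverse-variance weighted mean and its variance."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * t for w, t in zip(weights, effects)) / sum(weights)
    return mean, 1.0 / sum(weights)

# Hypothetical grouping: three US studies and two European studies.
us_mean, us_var = fixed_effect_mean([0.30, 0.55, 0.42], [0.02, 0.05, 0.08])
eu_mean, eu_var = fixed_effect_mean([0.10, 0.25], [0.01, 0.03])

# Between-groups chi-square statistic (1 df for two groups).
q_between = (us_mean - eu_mean) ** 2 / (us_var + eu_var)
print(round(us_mean, 3), round(eu_mean, 3), round(q_between, 2))
```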
54. How do we account for variation in effects across studies?
- Analogues to Multiple Regression allow us to determine the relation between a quantitative study characteristic (such as treatment intensity or duration) and effect size, as in the sketch below
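A minimal sketch of the regression analogue: weighted least squares regression of effect size on a moderator, with inverse-variance weights. The moderator here (treatment duration in weeks) and all values are hypothetical:

```python
import numpy as np

effects = np.array([0.30, 0.55, 0.10, 0.42, 0.25])
variances = np.array([0.02, 0.05, 0.01, 0.08, 0.03])
duration = np.array([4, 12, 2, 8, 6])  # hypothetical treatment duration (weeks)

# Weighted least squares: multiply both sides by sqrt(w), w = 1/v,
# then solve the ordinary least squares problem.
w = 1.0 / variances
X = np.column_stack([np.ones_like(duration, dtype=float), duration])
sqrt_w = np.sqrt(w)
coef, *_ = np.linalg.lstsq(X * sqrt_w[:, None], effects * sqrt_w, rcond=None)
print(coef)  # intercept and slope: change in effect size per week of treatment
```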
55. How do we account for variation in effects across
studies?
- It is important to recognize that comparisons
among groups of experiments are observational
(correlational) studies.
56. What about studies that have many results?
- Sometimes a study produces many results that
could lead to the computation of many effect
sizes. How should each effect size be treated?
57. What about studies that have many results?
- The answer depends on whether each result (each effect size) is computed from data on different individuals
- Effect sizes computed from the same individuals (e.g., from different outcome measures on the same people, or by comparing different groups to the same control group) are statistically dependent
58. What about studies that have many results?
- Statistically dependent effect sizes do not contain independent information; therefore, two dependent effect sizes contain less (often much less) information than two independent effect sizes, even if they have the same sampling error variance
59. What about studies that have many results?
- Effect sizes computed from different individuals
are statistically independent and therefore can
be treated for most purposes as if they came from
different studies
60. What about studies that have many results?
- We usually try to obtain statistically independent effect sizes because the analysis and interpretation are much simpler
61. How do we check on the robustness of findings in systematic reviews?
- It is critical to check the robustness of findings by sensitivity analyses of various kinds
- Sensitivity analyses check the effect of the various methodological choices made in the review
62. How do we check on the robustness of findings in systematic reviews?
- Important kinds of sensitivity analyses include examination of the impact of
- Particular studies on results of the review
- Synthesis methods on results of the review
- Primary study methodology on results of the review
- Study heterogeneity on results of the review
- Publication bias on results of the review
63. How do we check on the robustness of findings in systematic reviews?
- Sensitivity analyses should be tailored to the systematic review at hand
- Different sensitivity analyses will be appropriate for reviews in different areas
64. What are the currently established C2 methods
groups?
65. The Statistics Group
- Will focus primarily on statistical methods used to develop summary indicators of study results and how to combine these across studies
- The Statistics Group will provide
- advice to review groups and the C2 Steering Committee on statistical methods
- training and support
- research on statistical methods
- monitoring of the quality of statistical aspects of C2 reviews
- a forum for discussion of statistical issues
66. The Quasi-Experimental Design Group
- Will focus primarily on critically assessing the power and limitations of trials that do not use random assignment
- These designs will be assessed with regard to validity and generalizability
- The group will also refine methods of meta-analysis for non-randomized trials
- The group will make recommendations to C2 regarding software and the inclusion of non-randomized studies in the Campbell Library
67. The Process and Implementation Group
- Will focus primarily on both quantitative and qualitative methods for uncovering those parts of the implementation process that might influence the success or failure of an intervention
- These might include the population under study, characteristics of the intervention itself and the non-intervention condition, the setting, and the outcome variables
68. The Editorial Review Group
- Will serve as the editorial team for reviews
undertaken by other methods groups
69. What future C2 methods groups might be
established?
70. Literature Searching & Data Retrieval
- Publication bias
- Prospective registers
- Coding techniques and reliability
71. Research Design
- Research quality judgments
- Cluster randomized trials
72. Statistical Issues
- Effect size metrics
- Stochastically dependent effect sizes
- Explanatory models
- Fixed, random, and multilevel effects
- Missing data