Title: Meta-analysis
Meta-analysis: psychotherapy outcome research
Overview
- What is a meta-analysis?
- How is a meta-analysis conducted?
- Robinson et al. (1990): Is psychotherapy effective as a treatment for depression?
What is meta-analysis? An analogy
- Items are to testing as studies are to meta-analysis
- Testing is to psychological constructs as meta-analysis is to experimental/clinical effects
What is a meta-analysis?
- "Meta-analysis refers to the analysis of analyses . . . the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings. It connotes a rigorous alternative to the casual, narrative discussions of research studies in traditional review papers which typify our attempts to make sense of the rapidly expanding research literature." (G. Glass, 1976)
What is a meta-analysis?
- By comparing results from many different studies, we can look for general conclusions in domains where the conclusions of individual studies may be uncertain and/or disputed:
- because they are subject to many variables, or
- because the literature is a mess, offering selective support for conflicting viewpoints
What is meta-analysis?
- Each trial (experiment or treatment assessment) is treated as one estimate of an effect, assumed to be underlain by some global population value
- This is analogous to individual items on a psychometric test
- Each item (question or study) is one estimate of the construct to which it relates, but may by itself be subject to error or contamination
- Just as many items on a test allow us to quantify loading on a construct, so many studies allow us to quantify an effect of interest.
Five steps in conducting a meta-analysis
- 1.) Define a question that you want to answer
- 2.) Select studies according to some specified inclusion criteria
- 3.) Select your statistical model (fixed effects versus random effects)
- 4.) Calculate summary effects
- 5.) Interpret the results
1.) Define a question that you want to answer
- The question may be posed in terms of an independent variable, or a set of commonly researched variables, or by the causes and consequences of important variables.
- E.g. How effective is psychotherapy for depression?
2.) Select studies according to some specified inclusion criteria
- The purpose is to include only comparable studies of good quality
- E.g. in Robinson et al. (1990):
- 1.) Studies from 1976-1986
- 2.) Patients suffering only and explicitly from depression
- 3.) Outpatients only
- 4.) Adults only
- 5.) Included a comparison of treatment versus no treatment or of different types of therapy: no case histories, no pre/post designs (Why not?)
- 6.) Verbal psychotherapy only
3.) Select your statistical model
- Fixed effects: assumes that the data are consistent with the treatment effect being constant (i.e. there is a single fixed treatment effect, with no interaction between study and effect)
- Random effects: assumes that the studies included in the meta-analysis are a random sample generalizing to the domain of all similar studies (under the assumption or finding that there is a study X treatment interaction, i.e. different treatment effects in different studies)
- We can still generalize, under the assumption that our studies constitute a random sample of possible study X treatment interactions, but the confidence interval will be wider due to the increased error, i.e. less certainty in conclusions (see the sketch after this list)
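To make the fixed/random distinction concrete, here is a minimal sketch in Python. The effect sizes and standard errors are invented for illustration (they are not from Robinson et al.), and this is not the procedure of any particular package; it pools the same five studies under a fixed-effect model and under one standard random-effects estimator (DerSimonian-Laird), where the estimated between-study variance widens the confidence interval.

```python
import numpy as np

# Invented per-study effect sizes (Cohen's d) and standard errors
d  = np.array([1.2, 0.2, 0.8, 0.1, 1.0])
se = np.array([0.30, 0.25, 0.20, 0.35, 0.40])

# Fixed effect: one true effect, each study weighted by 1/variance
w_fixed  = 1 / se**2
d_fixed  = np.sum(w_fixed * d) / np.sum(w_fixed)
se_fixed = np.sqrt(1 / np.sum(w_fixed))

# Random effects (DerSimonian-Laird): estimate between-study variance tau^2
# from the heterogeneity statistic Q and add it to each study's variance
k    = len(d)
Q    = np.sum(w_fixed * (d - d_fixed) ** 2)
C    = np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)
tau2 = max(0.0, (Q - (k - 1)) / C)

w_rand  = 1 / (se**2 + tau2)
d_rand  = np.sum(w_rand * d) / np.sum(w_rand)
se_rand = np.sqrt(1 / np.sum(w_rand))

for label, est, s in (("fixed ", d_fixed, se_fixed), ("random", d_rand, se_rand)):
    print(f"{label}: d = {est:.2f}, 95% CI [{est - 1.96*s:.2f}, {est + 1.96*s:.2f}]")
```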
Fixed versus random effects in experimental psychology
- The same distinctions apply in experimental psychology: experimental items are also fixed or random effects
- E.g. consider lexical decision (decide whether a string is a word or not) with low- and high-frequency words
- We (used to) assume that the effect is a fixed effect: the frequency effect is the same for every word for every subject (no subject x treatment interaction, and no subject x item interaction)
- However, such interactions may exist (i.e. the effect is a random effect)
- If they do, we will have greater error, because now we have error due to those interactions: it is harder to get reliable results, we have less confidence in our conclusions, and we need to test for item generalizability in the same way we test for subject generalizability
- Psycholinguists have taken this problem fairly seriously (e.g. Clark, 1973; Raaijmakers, Schrijnemakers, & Gremmen, 1999), but much other experimental psychology has this problem and has not faced it.
What is an effect size?
- To understand what an effect size is, we first need to remind ourselves what a p-value is
- A p-value measures the probability of error in claiming to have found a difference
- NB: A p-value does not measure the size of an effect
- A p-value quantifies the evidence that two (or more) groups are really different, by computing how likely it is that an apparent difference of that size could arise by chance alone
- As you may recall, a p-value depends on two things: the size of the effect and the size of the sample.
- You can get a significant effect (p < 0.05) either if the effect is very big (despite a small sample) or if the sample is very big (despite a small effect size, as in many medical studies); see the sketch below
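A minimal sketch of that last point, using two simulated t-tests. The group sizes and effect sizes are invented for illustration, and the exact p-values will vary with the random draw, but a large effect in a small sample and a tiny effect in a very large sample can both come out significant.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Large effect (about 1.5 SD), small sample (n = 15 per group)
small_ctrl = rng.normal(0.0, 1.0, 15)
small_trt  = rng.normal(1.5, 1.0, 15)
print("big effect, small n:", stats.ttest_ind(small_trt, small_ctrl).pvalue)

# Tiny effect (about 0.05 SD), huge sample (n = 20000 per group)
big_ctrl = rng.normal(0.00, 1.0, 20000)
big_trt  = rng.normal(0.05, 1.0, 20000)
print("tiny effect, huge n:", stats.ttest_ind(big_trt, big_ctrl).pvalue)

# Both comparisons will typically come out p < 0.05, for opposite reasons.
```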
What is an effect size?
- You can't average p-values, because they do not reflect the same things in different studies (more generally, we can't average probabilities when they are drawn from different domains)
- Effect size is a way of quantifying the size of the difference in standardized terms
- It is the standardized mean difference between two groups (see the sketch below)
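A minimal sketch of the standardized mean difference (Cohen's d), with invented scores standing in for any outcome measure: the raw difference between group means divided by the pooled standard deviation.

```python
import numpy as np

def cohens_d(treated, control):
    """Standardized mean difference: (M1 - M2) / pooled SD."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * np.var(treated, ddof=1) +
                  (n2 - 1) * np.var(control, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(treated) - np.mean(control)) / np.sqrt(pooled_var)

# Illustrative scores (e.g., improvement on some depression inventory)
treated = np.array([12, 15, 9, 14, 11, 13])
control = np.array([8, 10, 7, 9, 11, 6])
print(cohens_d(treated, control))   # roughly how many SDs apart the groups are
```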
What is an effect size?
- Averaging each study equally would give each study equal weight, which we know must be wrong: surely studies with more subjects should be weighted more heavily (more likely to be true) than studies with fewer
- Meta-analytic methods give more weight to studies that are more informative, because they have more subjects, more measurements, or lower variance
- All of these are related to the confidence interval of the study, i.e. the probability of random error (see the sketch below)
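A minimal sketch of the weighting idea, with invented numbers: each study is weighted by the inverse of its variance (its squared standard error), so large, low-noise studies pull the summary effect toward themselves.

```python
import numpy as np

d  = np.array([0.80, 0.30, 0.60])   # illustrative effect sizes
se = np.array([0.40, 0.10, 0.25])   # smaller SE = bigger / less noisy study

w = 1 / se**2
print("relative weights:", np.round(w / w.sum(), 2))  # the middle study dominates
print("unweighted mean :", d.mean())
print("weighted mean   :", np.sum(w * d) / w.sum())   # pulled toward the big study
```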
How does meta-analysis work?
- We won't consider the (rather complex) mathematical details in this class
- Specialized computer programs are available
- The basic idea is to convert values of significance (i.e. t, F, χ², or p values) into some common format: Pearson's r, or Cohen's d (a measure of effect size: the standardized mean difference between two groups); see the conversion sketch after this list
- As noted on the last slide, these common values must be corrected for error (within each study) due to sample size, measurement error, and range restriction (i.e. studies selecting for extremes of the possible range)
- It is more difficult to control for (although one can check for) publication bias (only significant results get published) and publication quality
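A minimal sketch of the conversion step, using standard textbook formulas for the simplest cases only (two roughly equal groups, and F with one numerator degree of freedom); real packages handle many more cases and apply the corrections mentioned above.

```python
import math

def d_from_t(t, df):
    """Cohen's d from t, two independent groups of roughly equal size."""
    return 2 * t / math.sqrt(df)

def d_from_F(F, df_error):
    """Only valid for F with 1 numerator degree of freedom (F = t^2)."""
    return d_from_t(math.sqrt(F), df_error)

def d_from_r(r):
    """Cohen's d from a Pearson correlation."""
    return 2 * r / math.sqrt(1 - r**2)

print(d_from_t(2.5, 38))    # e.g. t(38) = 2.5
print(d_from_F(6.25, 38))   # the same result, since F(1, 38) = t(38)^2
print(d_from_r(0.38))
```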
How does it work?
- When the disparate measures from each study are all converted to a single measure, they are directly comparable (assuming they used comparable outcome measures!)
- The process is analogous to converting disparate measures (number of hockey goals scored versus number of baskets achieved) to z-scores to make them directly comparable.
- The effect size measure is standardized and is essentially equivalent to a z-score, but it is a z-score of differences
The problem of moderator variables
- Moderator variables: extraneous variables influencing the results in a particular study
- There are mathematical ways to deal with these
4.) Calculate summary effects
- In Robinson et al., the mean effect size of psychotherapy compared to no treatment (37 studies) was 0.73
- What does this mean?
- An effect size of 0.73 means that patients who received psychotherapy had average outcomes about 3/4 of a standard deviation better than those who had no treatment (see the sketch below)
- The mean effect size of psychotherapy compared to a waiting list was 0.84
- The mean effect size of psychotherapy compared to placebo was 0.28 (p > 0.05). What does this tell us?
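One common way to read an effect size of 0.73, as a minimal sketch (assuming roughly normal outcome distributions): the average treated patient ends up better off than about three quarters of untreated patients.

```python
from scipy.stats import norm

d = 0.73
# ~0.77: the average treated patient scores better than ~77% of controls
print(norm.cdf(d))
```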
4.) Calculate summary effects
- There was no reliable global effect of type of therapy, but in individual planned comparisons cognitive, behavioral, and cognitive-behavioral therapies were all better than 'general verbal' therapy
- The effect size comparing psychotherapy to (all) drug therapy was 0.13 (p < 0.05), but there was no difference between a combination of the two versus psychotherapy alone (d = 0.01, p > 0.05) or versus drug therapy alone (d = 0.17, p > 0.05)
- This cuts both ways in the drug/therapy debate: there are costs and benefits to both, and the question is: How much benefit is worth how much cost?
5.) Interpret the results
- The results of this meta-analysis suggest that psychotherapy does work as a treatment for depression
- BUT it does not work better than placebo
- It works slightly (but significantly) better than drug therapy, but the two treatments do not have a significantly additive effect
- Treatment cost, in human and dollar terms, must be factored into treatment planning
- Some costs may vary between individuals: some hate drugs and others hate paying more than they need to
What is meta-analysis? An analogy
- Items are to testing as studies are to meta-analysis
- Testing is to psychological constructs as meta-analysis is to experimental/clinical effects
- Just as we can use psychometric testing to quantify the degree to which a construct matters for any particular purpose, we can use meta-analysis to quantify the degree to which measured effects matter for a specified purpose
- Just as constructs exist in a hazy but quantifiable uncertainty, so do treatments and effects
Significance testing ≠ meta-analysis
- It is important to distinguish significance testing from measurement of effect sizes
- When we select from the extremes of a normal distribution (high/low), we can often get highly reliable effects that are nevertheless of negligible import in explaining the phenomenon under study
- i.e. Some variables may have highly reliable effects on some dependent measure when groups are selected from the extremes, but correlate with that measure with r < 0.1
- How much of the variance do these highly reliable effects account for? (See the sketch below.)
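The arithmetic behind that question, as a minimal sketch: the proportion of variance explained is r squared, so a "highly reliable" correlation of r = 0.1 accounts for only about 1% of the variance in the dependent measure.

```python
# Variance explained is the square of the correlation coefficient
for r in (0.1, 0.3, 0.5):
    print(f"r = {r:.1f}  ->  variance explained = {r**2:.0%}")
```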
More is not always better
- More is not always better: effects that are significant individually may be accounting for shared variance, and therefore not sum together
- i.e. drug therapy and psychotherapy are both better than nothing at all, but adding drugs to psychotherapy is not better than psychotherapy alone
- This is similar to having two strongly correlated factors loading on a single construct: you may think you have more information than you actually do, because you have redundant information (see the sketch below)
- It is like counting one $10 bill ten times and claiming to have $100
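A minimal sketch of the redundancy point, using simulated data (the variable names and the mapping to drug therapy and psychotherapy are purely illustrative): two predictors that share most of their variance each predict the outcome reasonably well on their own, but using both barely improves prediction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
shared = rng.normal(size=n)                 # the common ingredient
x1 = shared + 0.3 * rng.normal(size=n)      # e.g. psychotherapy response (illustrative)
x2 = shared + 0.3 * rng.normal(size=n)      # e.g. drug response, strongly correlated with x1
y  = shared + rng.normal(size=n)            # the outcome

def r_squared(predictors, y):
    """Proportion of variance in y explained by a least-squares fit."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

print("x1 alone  :", round(r_squared([x1], y), 3))
print("x2 alone  :", round(r_squared([x2], y), 3))
print("x1 and x2 :", round(r_squared([x1, x2], y), 3))  # barely higher than either alone
```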
Ask and you shall receive
- The question you ask matters: 'Which treatment is better?' ≠ 'Which treatment should I prescribe?'
- It is one thing to show that two treatments differ, but quite another to make a decision about which one is best for any particular individual
- Again, this is a point we have made over and over in this class: a construct can only be usefully defined in terms of why and how it matters
- There are no constructs independent of the purpose for which they are defined