Title: Publication bias in impact evaluation: evidence from a systematic review of farmer field schools
Slide 1: Publication bias in impact evaluation: evidence from a systematic review of farmer field schools
International Initiative for Impact Evaluation
Slide 2: Acknowledgements
- Jorge Hombrados, J-PAL Latin America (co-author)
- Birte Snilstveit, co-PI on the farmer field school review
- FFS review co-authors Martina Vojtkova and Daniel Phillips
- Presentation based on training on publication bias provided by Emily Tanner-Smith, Campbell Collaboration
Slide 3:
"The haphazard way we individually and collectively study the fragility of inferences leaves most of us unconvinced that any inference is believable... It is important we study fragility in a much more systematic way."
- Edward Leamer, "Let's take the con out of econometrics", AER, 1983
Slide 4: What is publication bias?
- Publication bias refers to bias that occurs when research found in the published literature is systematically unrepresentative of the population of studies (Rothstein et al., 2005)
- On average, published studies have a larger mean effect size than unpublished studies, providing evidence of publication bias (Lipsey and Wilson, 1993)
- Also referred to as the file drawer problem: "journals are filled with the 5% of studies that show Type I errors, while the file drawers back at the lab are filled with the 95% of the studies that show non-significant (e.g. p > 0.05) results" (Rosenthal, 1979)
- Well documented in other fields of research (biomedicine, public health, education, criminal justice, social welfare); entertaining overviews in Ben Goldacre's Bad Science and Bad Pharma
Slide 5: Types of reporting biases

Bias type | Definition
Publication bias | The publication or non-publication of research findings, depending on the nature and direction of results
Time lag bias | The rapid or delayed publication of research findings, depending on the nature and direction of results
Multiple publication bias | The multiple or singular publication of research findings, depending on the nature and direction of results
Location bias | The publication of research findings in journals with different ease of access or levels of indexing in standard databases, depending on the nature and direction of results
Citation bias | The citation or non-citation of research findings, depending on the nature and direction of results
Language bias | The publication of research findings in a particular language, depending on the nature and direction of results
Outcome reporting bias | The selective reporting of some outcomes but not others, depending on the nature and direction of results

Source: Sterne et al. (Eds.) (2008: 298)
Slide 6: How much of a problem is it likely to be in international development research?
- The exploratory research tradition in the social sciences suggests potentially severe file drawer effects
- Publication bias may be partly mitigated by the tradition of publishing working papers and by modern electronic dissemination
- File drawer effects are arguably more problematic for observational data (and small sample intervention studies)
- Testing for publication bias usually relies on testing for small study effects, but biases due to small study effects may also result from other factors
→ But we need more evidence, since very little development research has addressed this topic
Slide 7: Farmer field schools
- FFS originally associated with FAO and Integrated Pest Management (IPM)
- Originated in response to the overuse of pesticides in irrigated rice systems in Asia
- Belief that farmers need confidence to reduce dependence on pesticides, built through discovery learning
- Aim to promote use of good practices and improve agricultural and other outcomes
- Now applied globally in 90 countries, with millions of beneficiaries and a range of crops and curricula
Slide 8: A best practice FFS
- Group of 25 farmers, meeting once a week in a designated field during the growing season
- Exploratory: the facilitator encourages farmers to ask questions and to seek answers, rather than lecturing or giving recommendations
- Experimentation: the group manages two plots
- Participatory: emphasis on social learning, with exercises to build group dynamics
- Field days and follow-up activities may be provided for diffusion of the message to neighbours
(Photo © JM Micaud for FAO)
Slide 9: 3ie review motivated by polarised debate
- "Studies reported substantial and consistent reductions in pesticide use attributable to the effect of training. In a number of cases, there was also a convincing increase in yield due to training... Results demonstrated remarkable, widespread and lasting developmental impacts" (Van den Berg 2004, FAO)
- "The analysis, employing a modified difference-in-differences model, indicates that the program did not have significant impacts on the performance of graduates and their neighbors" (Feder et al. 2004)
- But how good are these studies really? What does a systematic review of the evidence say?
Slide 10: 3ie's review objectives and background
- Produce a high quality review of relevance to decision makers
- Mixed methods review of effects on outcomes along the causal chain, and of barriers and facilitators of change
- Peer review managed by the Campbell Collaboration
- Discussion with FAO led to the inclusion of a wide range of impact evaluation research in the effectiveness review
Slide 11: Large body of evidence found
- The 3ie systematic review found 93 separate impact evaluations in LMICs
- Experimental and quasi-experimental studies with a controlled comparison (no treatment, pipeline, or other intervention) were included
- Wide variation in attribution methods used: no RCTs, and quasi-experiments of varying quality
- Small samples: 400 farmers on average (sample sizes range from 24 to 3,000), often in only a handful of villages, and short follow-up periods (usually less than 2 years)
- Studies collected data measuring outcomes across the causal chain:
  - Knowledge
  - Adoption
  - Agriculture outcomes (yields, net revenues)
  - Health, environment, empowerment outcomes
- Analysis today focuses on impacts on yields for FFS participants, usually self-reported weight of production per unit of area
Slide 12: Study characteristics

Study | Region (country) | Crop | Yield outcome measure
Ali and Sharif, 2011 | SA (Pakistan) | Cotton | Yield (kg per ha)
Birthal et al., 2000 | SA (India) | Cotton | Value of yield (value per ha)
Carlberg et al., 2012 | SSA (Ghana) | Other staple/veg. | Yield (50 kg bags per acre, 2010)
Cavatassi et al., 2011 | LAC (Ecuador) | Other staple/veg. | Yield (kg per ha)
Davis et al., 2012 | SSA (Kenya, Tanzania) | Other staple/veg. | Value of yield (growth rate in value, local currency per acre)
Dinpanah et al., 2010 | MENA (Iran) | Rice | Yield (ton per ha)
Feder et al., 2004 | EAP (Indonesia) | Rice | Yield (growth rate in yield, kg per ha)
Gockowski et al., 2010 | SSA (Ghana) | Tree crop | Sales (quantity of produce sold in 2004/05 season)
Hiller et al., 2009 | SSA (Kenya) | Tree crop | Yield (growth rate in yield, kg per acre)
Huan et al., 1999 | EAP (Vietnam) | Rice | Yield (ton per ha)
Khan et al., n.d. | SA (Pakistan) | Cotton | Yield (growth rate in yield, kg per ha)
Labarta, 2005 | LAC (Nicaragua) | Other staple/veg. | Yield (per ha)
Mutandwa and Mpangwa, 2004 | SSA (Zimbabwe) | Cotton | Yield (number of bales)
Naik et al., 2008 | SA (India) | Other staple/veg. | Yield (quintals of produce)
Orozco Cirilo et al., 2008 | LAC (Mexico) | Other staple/veg. | Yield (growth rate in ton per ha)
Palis, 1998 | EAP (Philippines) | Rice | Yield (growth rate in ton per ha)
Pananurak, 2010 | EAP (China), SA (India, Pakistan) | Cotton | Yield (growth rate in kg per ha)
Pande et al., 2009 | SA (Nepal) | Rice | Yield (ton per ha)
Rejesus et al., 2010 | EAP (Vietnam) | Rice | Yield (growth rate in tonnes per ha)
Todo and Takahashi, 2011 | SSA (Ethiopia) | Other staple/veg. | Value of production (growth rate, in Eth. birr)
Van den Berg et al., 2002 | SA (Sri Lanka) | Rice | Yield (kg per ha)
Van Rijn, 2010 | LAC (Peru) | Tree crop | Yield (kg per ha, 2007)
Wandji et al., 2007 | SSA (Cameroon) | Tree crop | Sales (kg of cocoa sold in the 2004/05 season)
Wu Lifeng, 2010 | EAP (China) | Cotton | Yield (growth rate in kg per ha)
Yang et al., 2005 | EAP (China) | Cotton | Yield (kg per ha)
Yamazaki and Resosudarmo, 2007 | EAP (Indonesia) | Rice | Yield (growth rate in kg per ha)
Zuger, 2004 | LAC (Peru) | Other staple/veg. | Yield (ton per ha)
Slide 13: Unit of analysis is the study-level effect size
- Response ratio effect size calculated for each study:
  RR = Ȳ_T / Ȳ_C, or equivalently ln(RR) = ln(Ȳ_T) − ln(Ȳ_C)
- RR standard error calculation:
  SE(ln RR) = √( SD_T² / (n_T Ȳ_T²) + SD_C² / (n_C Ȳ_C²) )
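The response-ratio calculation above can be sketched in a few lines of Python; this uses the standard delta-method variance for the log response ratio (as in Hedges, Gurevitch & Curtis, 1999), and the numbers in the example are made-up summary statistics, not data from the review.

```python
import math

def log_response_ratio(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Log response ratio ln(RR) = ln(mean_T / mean_C) and its standard error,
    computed from group-level summary statistics (delta-method variance)."""
    ln_rr = math.log(mean_t / mean_c)
    se = math.sqrt(sd_t**2 / (n_t * mean_t**2) + sd_c**2 / (n_c * mean_c**2))
    return ln_rr, se

# Hypothetical study: FFS yields 2.0 t/ha (SD 0.5, n=25) vs control 1.0 t/ha (SD 0.4, n=25)
ln_rr, se = log_response_ratio(2.0, 0.5, 25, 1.0, 0.4, 25)
```

Exponentiating the pooled ln(RR) at the end of a meta-analysis recovers the ratio scale used in the results tables later in the deck (e.g. 1.23 means yields 23% higher).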
Slide 14:
- Before we turn to the examination of publication bias, here are some summary results from the meta-analysis of outcomes along the causal chain
Slide 15: Farmer field school stylised causal chain
Slide 16: Knowledge of improved farming practices
Slide 17: Pesticide demand
Slide 18: Yields
Slide 19: Net revenues (income less costs)
Slide 20: Detecting publication bias
- The only direct evidence for publication bias comes from comparing published and unpublished study results
- But there are also ways of assessing the likelihood of publication bias, directly and indirectly:
  - Assess reporting biases in each study
  - Statistical analysis based on sample size
Slide 21: An ounce of prevention is worth a pound of cure
- Sources of grey literature:
  - Multidisciplinary: Google, Google Scholar
  - International development specific: JOLIS, BLDS and ELDIS (Institute of Development Studies)
  - Good sources for impact evaluations: J-PAL/IPA databases, 3ie's database of impact evaluations
  - Subject-specific, e.g. IDEAS/RePEc for economics, ERIC for education, LILACS for Latin American health publications, ALNAP for humanitarian research
  - Conference proceedings, technical reports (research and governmental agencies), organization websites, dissertations, theses, contact with primary researchers
Slide 22: Meta-analysis of studies by publication status: journal vs. other
Slide 23: Assess likelihood of file-drawer effects in each study
- Is there evidence that results have been reported selectively?
  - Outcomes not reported despite data being collected (or indicated in the methods section, or reported in the study protocol if available)?
  - Existence of studies reporting other outcomes?
- Have outcomes been constructed in an uncommon way, which might suggest biased exploratory research?
Slide 24: Risk of bias (including file drawer effects) assessment for studies included in meta-analysis
Slide 25: Additional evidence for file-drawer effects
- 34% (14/41) of studies reporting data on yields could not be included in the meta-analysis because they do not provide standard errors or the information needed to calculate them
- 30% (27/91) of all studies do not provide information on yields or other agriculture outcomes (net revenues), despite collecting data on knowledge/adoption
Slide 26: Detecting publication bias statistically
- Methods for detecting publication bias assume:
  - Large n studies are likely to get published regardless of results, due to the time and money invested
  - Medium n studies will have some modest significant effects that are reported; others may never be published
  - Small n studies with the largest effects are most likely to be reported, but many will never be published or will be difficult to locate
Slide 27: Funnel plots
- Exploratory tool used to visually assess the possibility of publication bias in a meta-analysis
- Scatter plot of effect size (x-axis) against some measure of study size (y-axis)
- Precision of estimates increases as the sample size of a study increases
- Estimates from small n studies (i.e., less precise, with larger standard errors) will show more variability, and thus a wider scatter on the plot
- Estimates from larger n studies will show less variability in effect size estimates, and thus a narrower scatter on the plot
- If publication bias is present, we would expect null or negative findings from small n studies to be suppressed (i.e., missing from the plot)
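A small simulation can illustrate both points above: the funnel's width grows as precision falls, and censoring non-significant small studies inflates the published mean. All numbers here (true effect 0.3, the two SE levels) are invented for illustration, not taken from the review.

```python
import numpy as np

rng = np.random.default_rng(42)
true_effect = 0.3  # assumed true effect for the simulation

# Simulated observed effects at two precision levels
small = rng.normal(true_effect, 0.40, size=200)  # small-n studies, SE = 0.40
large = rng.normal(true_effect, 0.05, size=200)  # large-n studies, SE = 0.05

# Funnel width at each precision level = spread of observed effects
print(f"spread (small n): {small.std():.3f}, spread (large n): {large.std():.3f}")

# File-drawer censoring: only "significant" small studies (z > 1.96) get published
published_small = small[small / 0.40 > 1.96]
print(f"mean of published small studies: {published_small.mean():.2f} vs true {true_effect}")
```

The surviving small studies all have effects above 1.96 × 0.40 ≈ 0.78, so their mean far exceeds the true effect, which is exactly the asymmetric hole a funnel plot is meant to expose.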
Slide 28: (No transcript)
Slide 29: Farmer field schools: FFS-participant yields
Slide 30: Tests for funnel plot asymmetry
- Several regression tests are available for funnel plot asymmetry; they attempt to overcome the subjectivity of visual funnel plot inspection
- Framed as tests for small study effects, or the tendency for smaller n studies to show greater effects than larger n studies (i.e., the effects aren't necessarily a result of bias)
- Egger test, Peters test (modified Egger test for use with log odds ratio effect sizes), Begg's test, selection modeling (Hedges & Vevea, 2005), fail-safe n (not recommended; Becker, 2005)
Slide 31: Egger test
- Weighted regression of the effect size on its standard error (w = inverse variance)
- β0 = 0 indicates a symmetric funnel plot
- β0 > 0 indicates that less precise (i.e., smaller n) studies yield bigger effects
- Can be extended to include p predictors hypothesized to explain funnel plot asymmetry (Sterne et al., 2001) (see analysis below)
- Limitations:
  - Low power unless there is severe bias and a large number of studies
  - Inflated Type I error with large treatment effects, rare event data, or equal sample sizes across studies
  - Inflated Type I error with log odds ratio effect sizes
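One way to sketch the Egger test is via its equivalent standardized form: ordinary regression of the standardized effect (effect/SE) on precision (1/SE). In this parametrisation the intercept plays the role of the asymmetry coefficient described above. The four studies below are invented so that the fit is exact; this is an illustrative sketch, not a full implementation with standard errors and t-tests.

```python
import numpy as np

def egger_test(effect, se):
    """Egger's regression test in standardized form: regress effect/SE on 1/SE.
    b0 (intercept) estimates funnel-plot asymmetry; b0 = 0 means symmetric.
    b1 (coefficient on precision) estimates the underlying effect size."""
    effect, se = np.asarray(effect, float), np.asarray(se, float)
    X = np.column_stack([np.ones_like(se), 1.0 / se])  # [1, precision]
    (b0, b1), *_ = np.linalg.lstsq(X, effect / se, rcond=None)
    return b0, b1

# Made-up studies whose effects grow exactly with SE (a clear small-study effect)
se = np.array([0.1, 0.2, 0.3, 0.4])
effect = 0.2 + 1.5 * se
b0, b1 = egger_test(effect, se)  # b0 ≈ 1.5 (asymmetry), b1 ≈ 0.2 (effect)
```

Because the fake effects were built as 0.2 + 1.5·SE, the test recovers exactly those two numbers: a positive asymmetry coefficient of 1.5 and an underlying effect of 0.2.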
Slide 32: Egger test for FFS-participant yields

| Coef. | t | P>|t|
Constant | -0.047 | -1.70 | 0.100
Slope | 3.085 | 4.14 | 0.000
Slide 33: Trim and fill analysis (Duval & Tweedie, 2000)
- Iteratively trims (removes) the smaller studies causing asymmetry
- Uses the trimmed plot to re-estimate the mean effect size
- Fills in (imputes) the omitted studies as mirror images of the trimmed ones
- Provides an estimate of the number of missing (filled) studies and a new estimate of the mean effect size
- Major limitations include misinterpretation of results, the assumption of a symmetric funnel plot, and poor performance in the presence of heterogeneity
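The iteration described above can be sketched roughly as follows. This is a deliberately simplified version (right-side trimming only, the L0 estimator, fixed-effect inverse-variance weights, no tie handling), not a substitute for packaged implementations such as R's metafor::trimfill; the ten example studies are made up.

```python
import numpy as np

def trim_and_fill(effect, se, max_iter=25):
    """Simplified Duval & Tweedie trim-and-fill (L0 estimator, right-side
    trimming, fixed-effect weights). Assumes no tied absolute deviations."""
    effect, se = np.asarray(effect, float), np.asarray(se, float)
    w = 1.0 / se**2
    n = len(effect)
    order = np.argsort(effect)               # studies sorted by effect size
    k0 = 0
    for _ in range(max_iter):
        keep = order[: n - k0]               # trim the k0 largest effects
        mean = np.sum(w[keep] * effect[keep]) / np.sum(w[keep])
        dev = effect - mean
        ranks = np.empty(n)
        ranks[np.argsort(np.abs(dev))] = np.arange(1, n + 1)
        T = ranks[dev > 0].sum()             # rank sum of right-side studies
        new_k0 = max(0, int(round((4 * T - n * (n + 1)) / (2 * n - 1))))
        if new_k0 == k0:
            break
        k0 = new_k0
    # fill: impute mirror images of the trimmed studies about the trimmed mean
    trimmed = order[n - k0:]
    all_w = np.concatenate([w, w[trimmed]])
    all_e = np.concatenate([effect, 2 * mean - effect[trimmed]])
    filled_mean = np.sum(all_w * all_e) / np.sum(all_w)
    return k0, mean, filled_mean

# Made-up example: seven precise studies near zero plus three imprecise outliers
effect = np.array([0.0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 1.5, 1.8, 2.1])
se = np.array([0.1] * 7 + [0.5] * 3)
k0, trimmed_mean, filled_mean = trim_and_fill(effect, se)
```

On these fake data the procedure estimates two missing studies and pulls the pooled mean down, mirroring the drop from 1.23 to 1.10 reported for the FFS yield meta-analysis on the next slides.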
Slide 34: Trim and fill for FFS-participant yields
Slide 35: Results of meta-trim

| 95% lower | Effect size | 95% upper | Num. studies
Meta-analysis | 1.16 | 1.23 | 1.32 | 31
Filled meta-analysis | 1.03 | 1.10 | 1.17 | 40
Slide 36: Cumulative meta-analysis
- Typically used to update the pooled effect size estimate with each new study, cumulatively over time
- Can be used as an alternative to update the pooled effect size estimate with each study, in order of largest to smallest sample size
- If the pooled effect size does not shift with the addition of small n studies, this provides some evidence against publication bias
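A minimal sketch of this sample-size-ordered cumulative pooling, using fixed-effect inverse-variance weights; the four studies are invented so that smaller samples report larger effects, the pattern the slide warns about.

```python
import numpy as np

def cumulative_meta(effect, se, n_sample):
    """Fixed-effect pooled estimates, adding studies from largest to smallest n."""
    order = np.argsort(n_sample)[::-1]           # largest sample size first
    e = np.asarray(effect, float)[order]
    w = 1.0 / np.asarray(se, float)[order] ** 2  # inverse-variance weights
    return np.cumsum(w * e) / np.cumsum(w)

# Made-up studies where smaller samples report larger effects
effect = [0.10, 0.15, 0.40, 0.80]
se     = [0.05, 0.08, 0.20, 0.40]
n      = [2000, 900, 150, 30]
pooled = cumulative_meta(effect, se, n)
```

Here the pooled estimate drifts steadily upward as the small studies are added, the signature of small study effects; a flat sequence would instead count as evidence against publication bias.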
Slide 37: Cumulative meta-analysis for FFS-participant yields, studies ordered by sample size from largest to smallest
Slide 38:
- The evidence for small study effects seems strong, but is this due to publication bias?
- Asymmetry could be due to factors other than publication bias, e.g.:
  - Methodological quality (smaller studies with lower quality may have exaggerated treatment effects)
  - Artefactual variation (e.g. outcome measurement)
  - Chance
  - True heterogeneity due to intervention characteristics (FFS type, region, crop, follow-up length)
- Assessing funnel plot symmetry visually relies entirely on subjective judgment
Slide 39: Analysis by study quality
Slide 40: Contour-enhanced funnel plots
- Based on the premise that statistical significance is the most important factor determining publication
- Funnel plot with additional contour lines marking milestones of statistical significance: p = .01, .05, .1
- If studies are missing in areas of statistical non-significance, publication bias may be present
- If studies are missing in areas of statistical significance, asymmetry may be due to factors other than publication bias
- If there are no studies in areas of statistical significance, publication bias may be present
- Can help distinguish funnel plot asymmetry due to publication bias from asymmetry due to other factors (Peters et al., 2008)
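The contours on these plots are just p-value bands under a normal approximation. A small helper (illustrative only; the band labels are my own) shows how a study's effect size and standard error map into those regions:

```python
import math

def two_sided_p(effect, se):
    """Two-sided p-value for H0: effect = 0 (normal approximation)."""
    z = abs(effect) / se
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def contour_band(effect, se):
    """Significance band a study falls in on a contour-enhanced funnel plot."""
    p = two_sided_p(effect, se)
    if p < 0.01:
        return "p < .01"
    if p < 0.05:
        return ".01 <= p < .05"
    if p < 0.10:
        return ".05 <= p < .10"
    return "p >= .10"
```

If the observed studies cluster in the significant bands while the non-significant region is empty, publication bias is the natural suspect; holes inside the significant bands point to other sources of asymmetry.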
Slide 41: (No transcript)
Slide 42: Meta-regression analysis (t-statistics reported)

| (1) | (2) | (3) | (4) | (5) | (6) | (7)
Standard error (LN_SE) | 4.37 | 4.33 | 3.90 | 3.81 | 3.21 | 3.53 | 4.60
High quality | 0.52 | 0.61 | 0.15 | 1.37 | 0.07 | 1.64
Interaction (high quality × LN_SE) | -1.83
FFS | 0.51 | -1.01 | 0.73 | 0.35
Yield measure dummies | Yes
Region dummies | Yes
Crop-type dummies | Yes
Adj. R-sq. | 0.36 | 0.34 | 0.33 | 0.45 | 0.42 | 0.29 | 0.39
N. obs. | 33 | 33 | 33 | 33 | 33 | 33 | 33

Specification 7 suggests heterogeneity from small study effects is due to study quality
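The specifications above can be sketched as an inverse-variance-weighted meta-regression of effect sizes on the standard error plus study-level moderators. The six studies and the coefficients below are fabricated so the fit is exact; this illustrates the mechanics, not the review's actual estimates.

```python
import numpy as np

def meta_regression(effect, se, moderators):
    """WLS meta-regression: effect size on SE and study-level moderators,
    weighted by inverse variance (solved as OLS on sqrt-weight-scaled data)."""
    effect, se = np.asarray(effect, float), np.asarray(se, float)
    X = np.column_stack(
        [np.ones_like(se), se] + [np.asarray(m, float) for m in moderators]
    )
    sw = np.sqrt(1.0 / se**2)                       # sqrt of inverse-variance weights
    beta, *_ = np.linalg.lstsq(X * sw[:, None], effect * sw, rcond=None)
    return beta  # [intercept, small-study coefficient on SE, moderator coefs...]

# Made-up data: effects rise with SE and with a low-quality dummy (exact relation)
se = np.array([0.05, 0.10, 0.20, 0.30, 0.40, 0.50])
low_quality = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
effect = 0.1 + 2.0 * se + 0.3 * low_quality
beta = meta_regression(effect, se, [low_quality])  # recovers [0.1, 2.0, 0.3]
```

A significant interaction between quality and the SE term (as in specification 7) would show up here as a moderator built from the product of the two columns, separating small-study effects by study quality.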
Slide 43: Meta-analysis also suggests bias due to study quality
(Medium risk of bias)
Slide 44: Final thoughts
- Evidence of upwards bias in low quality vs. higher quality quasi-experiments
→ Where the relevance of the review to users is important, careful risk of bias assessment and sensitivity analysis are required
- Study quality appears more important than publication bias in explaining small study effects, but we do also find evidence of file drawer effects in the literature
- The statistical tests available are sensitive to the number of effect sizes available, and are of limited validity where sample sizes are homogeneous
Slide 45: Recommended Reading
- Duval, S. J., & Tweedie, R. L. (2000). A non-parametric "trim and fill" method of accounting for publication bias in meta-analysis. Journal of the American Statistical Association, 95, 89-98.
- Egger, M., Davey Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315, 629-634.
- Hammerstrøm, K., Wade, A., & Jørgensen, A. K. (2010). Searching for studies: A guide to information retrieval for Campbell systematic reviews. Campbell Systematic Reviews, Supplement 1.
- Harbord, R. M., Egger, M., & Sterne, J. A. C. (2006). A modified test for small-study effects in meta-analyses of controlled trials with binary endpoints. Statistics in Medicine, 25, 3443-3457.
- Peters, J. L., Sutton, A. J., Jones, D. R., Abrams, K. R., & Rushton, L. (2008). Contour-enhanced meta-analysis funnel plots help distinguish publication bias from other causes of asymmetry. Journal of Clinical Epidemiology, 61, 991-996.
Slide 46: Recommended Reading (continued)
- Rosenthal, R. (1979). The "file-drawer problem" and tolerance for null results. Psychological Bulletin, 86, 638-641.
- Rothstein, H. R., Sutton, A. J., & Borenstein, M. (Eds.). (2005). Publication bias in meta-analysis: Prevention, assessment and adjustments. Hoboken, NJ: Wiley.
- Rücker, G., Schwarzer, G., & Carpenter, J. (2008). Arcsine test for publication bias in meta-analyses with binary outcomes. Statistics in Medicine, 27, 746-763.
- Sterne, J. A., & Egger, M. (2001). Funnel plots for detecting bias in meta-analysis: Guidelines on choice of axis. Journal of Clinical Epidemiology, 54, 1046-1055.
- Sterne, J. A. C., Egger, M., & Moher, D. (Eds.) (2008). Chapter 10: Addressing reporting biases. In J. P. T. Higgins & S. Green (Eds.), Cochrane handbook for systematic reviews of interventions (pp. 297-333). Chichester, UK: Wiley.
- Sterne, J. A. C., et al. (2011). Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ, 343, d4002.
- Waddington, H., White, H., Snilstveit, B., Hombrados, J., & Vojtkova, M. (2012). How to do a good systematic review of effects in international development: A tool-kit. Journal of Development Effectiveness, 4(3).