Title: Impact Evaluation: An Overview
1Impact Evaluation
- An Overview
- Lori Beaman, PhD
- RWJF Scholar in Health Policy
- UC Berkeley
2What is Impact Evaluation?
- IE assesses how a program affects the well-being
or welfare of individuals, households, or
communities (or businesses)
- Well-being at the individual level can be
captured by income and consumption, health
outcomes, or ideally both
- At the community level, poverty levels or growth
rates may be appropriate, depending on the
question
3Outline
- Advantages of Impact Evaluation
- Challenges for IE: Need for Comparison Groups
- Methods for Constructing Comparison Groups
4IE Versus Other M&E Tools
- The key distinction between impact evaluation and
other M&E tools is the focus on separating the
impact of the program from all other confounding
effects
- IE seeks to provide evidence of the causal link
between an intervention and outcomes
6Monitoring and IE
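[Figure: results chain running from INPUTS to
OUTPUTS to OUTCOMES to IMPACTS. Inputs to outputs:
govt/program production function. Outputs to
outcomes: users meet service delivery. Outcomes to
impacts: program impacts confounded by local,
national, and global effects. The difficulty of
showing causality grows along the chain.]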
7Logic Model: An Example
- Consider a program that provides
Insecticide-Treated Nets (ITNs) to poor
households
- What are the
- Inputs?
- Outputs?
- Outcomes?
- Impacts?
8Logic Model: An Example
- Inputs: number of ITNs; number of health or NGO
employees to help dissemination
- Outputs: number of ITNs received by HHs
- Outcomes: number of households utilizing ITNs
- Impact: reduction in illness from malaria;
increase in income; improvements in children's
school attendance and performance
9Advantages of IE
- To determine which projects are successful, we
need a carefully designed impact evaluation
strategy
- This is useful for
- Understanding if projects worked
- Justification for funding
- Scaling up
- Meta-analysis: learning from others
- Cost-benefit tradeoffs across projects
- Can test between different approaches of the same
program or different projects to meet a national
indicator
10Essential Methodology
- The difficulty is determining what would have
happened to the individuals or communities of
interest in the absence of the project
- The key component of an impact evaluation is to
construct a suitable comparison group to proxy
for the counterfactual
- Problem: we can only observe people in one state
of the world at one time
11Before/After Comparisons
- Why not collect data on individuals before and
after the intervention (the reflexive comparison)?
The difference in income, etc., would be due to
the project
- Problem: many things besides the project change
over time
- The country is growing and ITN usage is
increasing generally (from 2000-2003 in NetMark
data), so how do we know whether an increase in
ITN use is due to the program or would have
occurred in the absence of the program?
- Many factors affect the malaria rate in a given
year
12Example: Providing Insecticide-Treated Nets
(ITNs) to Poor Households
- The intervention: provide free ITNs to households
in Zamfara
- Program targets poor areas
- Women have to enroll at the local NGO office in
order to receive bednets
- Starts in 2002, ends in 2003; we have data on
malaria rates from 2001-2004
- Scenario 1: we observe that the households in
Zamfara that received bednets show an increase
in malaria from 2002 to 2003
13Basic Problem of Impact Evaluation: Scenario 1
Underestimated impact when using before/after
comparisons: high rainfall year
[Figure: malaria rate (vertical axis) by year,
2001-2004 (horizontal axis), with the treatment
period marked. The rate for Zamfara households
with bednets rises from A to C. Impact = C - A?
An increase in the malaria rate!]
14Basic Problem of Impact Evaluation: Scenario 1
Underestimated impact when using before/after
comparisons: high rainfall year
[Figure: malaria rate (vertical axis) by year,
2001-2004 (horizontal axis), with the treatment
period marked. Zamfara households with bednets
move from A to C; the counterfactual (Zamfara
households if no bednets were provided) ends at B.
Impact = C - B: a decline in the malaria rate!
Impact ≠ C - A.]
15Basic Problem of Impact Evaluation: Scenario 2
Overestimated impact: bad rainfall
[Figure: malaria rate (vertical axis) by year,
2001-2004 (horizontal axis), with the treatment
period marked. Zamfara households move from A to
C; the counterfactual (Zamfara households if no
bednets were provided) ends at B.
TRUE impact = C - B. Impact ≠ C - A.]
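The gap between the before/after estimate and the true impact in Scenarios 1 and 2 can be made concrete with a small sketch. The malaria rates below are made-up numbers chosen only for illustration. The labels follow the figures: A is the rate before the program, C is the observed rate after it, and B is the counterfactual rate without bednets.

```python
def naive_and_true_impact(a_before, c_after, b_counterfactual):
    """Return (before/after estimate, true impact) for one treated group."""
    naive = c_after - a_before          # C - A: what a reflexive comparison reports
    true = c_after - b_counterfactual   # C - B: change relative to the no-program path
    return naive, true

# Hypothetical malaria rates (cases per 100 people); not taken from the slides.
# Scenario 1: a high-rainfall year pushes malaria up for everyone.
scenario_1 = naive_and_true_impact(a_before=20, c_after=25, b_counterfactual=32)
# Scenario 2: a low-rainfall year pulls malaria down for everyone.
scenario_2 = naive_and_true_impact(a_before=20, c_after=10, b_counterfactual=14)

print(scenario_1)  # (5, -7)
print(scenario_2)  # (-10, -4)
```

In the first case the reflexive comparison shows an increase although the program lowered malaria; in the second it reports a larger drop than the program actually caused.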
16Comparison Groups
- Instead of using before/after comparisons, we
need to use comparison groups to proxy for the
counterfactual
- Two core problems in finding suitable groups
- Programs are targeted
- Recipients receive the intervention for a
particular reason
- Participation is voluntary
- Individuals who participate differ in observable
and unobservable ways (selection bias)
- Hence, a comparison of participants and an
arbitrary group of non-participants can lead to
misleading or incorrect results
17Comparison 1: Treatment and Region B
- Scenario 1: failure of the reflexive comparison
due to higher rainfall; everyone experienced an
increase in malaria rates
- We compare the households in the program region
to those in another region
- We find that our treatment households in
Zamfara have a larger increase in malaria rates
than those in region B, Oyo. Did the program
have a negative impact?
- Not necessarily! Program placement is important
- Region B has better sanitation and is therefore
affected less by rainfall (unobservable)
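18Basic Problem of Impact Evaluation: Program
Placement
High rainfall
[Figure: malaria rate (vertical axis) by year,
2001-2004 (horizontal axis), with the treatment
period marked. Treatment households in Zamfara
move from A to E; D marks where they would have
been without the program. TRUE IMPACT = E - D.]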
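19Basic Problem of Impact Evaluation: Program
Placement
Underestimated impact when using region B as the
comparison group: high rainfall
[Figure: malaria rate (vertical axis) by year,
2001-2004 (horizontal axis), with the treatment
period marked. Region B (Oyo) moves from B to C;
treatment households in Zamfara move from A to E;
D marks where Zamfara would have been without the
program. E - A > C - B because Region B is
affected less by rainfall. TRUE IMPACT = E - D.]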
20Comparison 2: Treatment vs. Neighbors
- We compare treatment households with their
neighbors. We think the sanitation and rainfall
patterns are about the same.
- Scenario 2: let's say we observe that treatment
households' malaria rates decrease more than
those of comparison households. Did the program
work?
- Not necessarily! There may be two types of
households, A and B, with type A households
knowing how malaria is transmitted and also
burning mosquito coils
- Type A households were more likely to register
with the program. However, their other
characteristics mean they would have had lower
malaria rates even in the absence of the ITNs
(individual unobservables).
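21Basic Problem of Impact Evaluation: Selection
Bias
Comparing project beneficiaries (Type A) to
neighbors (Type B)
[Figure: malaria rates (vertical axis) over
periods Y1-Y4 (horizontal axis), with the
treatment period marked, for Type B HHs and for
Type A HHs with the project. The gap between the
two lines is the observed difference.]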
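22Basic Problem of Impact Evaluation: Selection
Bias
Participants are often different from
non-participants
[Figure: malaria rates (vertical axis) over
periods Y1-Y4 (horizontal axis), with the
treatment period marked, for Type B HHs, Type A
households, and Type A HHs with the project. The
observed difference between Type B HHs and Type A
HHs with the project splits into selection bias
(Type B HHs versus Type A households) and the true
impact (Type A households versus Type A HHs with
the project).]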
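The decomposition in the figure can be written out in standard potential-outcomes notation. The notation is an assumption added here, not something used on the slides: Y_1 and Y_0 denote a household's malaria rate with and without the project, and D = 1 marks participants (Type A) while D = 0 marks their neighbors (Type B).

```latex
\begin{aligned}
\underbrace{E[Y_1 \mid D=1] - E[Y_0 \mid D=0]}_{\text{observed difference}}
&= \underbrace{E[Y_1 \mid D=1] - E[Y_0 \mid D=1]}_{\text{true impact on participants}} \\
&\quad + \underbrace{E[Y_0 \mid D=1] - E[Y_0 \mid D=0]}_{\text{selection bias}}
\end{aligned}
```

In the malaria example the selection-bias term is negative, because Type A households would have had lower malaria rates even without ITNs, so the observed difference overstates the reduction caused by the program.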
23Basic Problem of Impact Evaluation: Spillover
Effects
- Another difficulty in finding a true
counterfactual has to do with spillover or
contagion effects
- Example: ITNs will not only reduce malaria rates
for those sleeping under nets, but may also lower
overall rates because ITNs kill mosquitoes
- Problem: children who did not receive treatment
may also have lower malaria rates and therefore
higher school attendance rates
- Generally leads to an underestimate of the
treatment effect
24Basic Problem of Impact Evaluation: Spillover
Effects
[Figure: school attendance (vertical axis) by
year, 2001-2004 (horizontal axis), with the
treatment period marked. Treatment children reach
B; the control group of children in the
neighborhood school reaches C; A marks attendance
without any spillover. Impact ≠ B - C.
Impact = B - A. C > A due to spillover from
treatment children.]
25Counterfactual Methodology
- We need a comparison group that is as similar as
possible, in observable and unobservable
dimensions, to those receiving the program, and
that will not receive spillover benefits
- A number of techniques
- Randomization as the gold standard
- Various techniques of matching
26How to construct a comparison group: building
the counterfactual
- Randomization
- Difference-in-Difference
- Regression discontinuity
- Matching
- Pipeline comparisons
- Propensity score
271. Randomization
- Individuals/communities/firms are randomly
assigned into participation
- Counterfactual: randomized-out group
- Advantages
- Often referred to as the gold standard: by
design, selection bias is zero on average and
mean impact is revealed
- Perceived as a fair process of allocation with
limited resources
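A small simulation can illustrate why random assignment removes selection bias on average. This is a sketch under invented assumptions: the household types, enrollment probabilities, malaria rates, and the true effect of -5 are all hypothetical and are not drawn from the ITN program.

```python
import random
from statistics import mean

def difference_in_means(groups):
    """Mean outcome of the treated minus mean outcome of the untreated."""
    return mean(groups[True]) - mean(groups[False])

def simulate(n=20000, true_effect=-5.0, seed=0):
    """Compare self-selected enrollment with random assignment (hypothetical numbers)."""
    rng = random.Random(seed)
    self_selected = {True: [], False: []}
    randomized = {True: [], False: []}
    for _ in range(n):
        informed = rng.random() < 0.5  # unobserved household type
        # Malaria rate without ITNs: informed households start 10 points lower.
        y0 = 30.0 - (10.0 if informed else 0.0) + rng.gauss(0.0, 5.0)
        y1 = y0 + true_effect          # malaria rate with ITNs
        # Voluntary enrollment: informed households sign up more often.
        enrolls = rng.random() < (0.8 if informed else 0.2)
        self_selected[enrolls].append(y1 if enrolls else y0)
        # Random assignment: a coin flip unrelated to household type.
        assigned = rng.random() < 0.5
        randomized[assigned].append(y1 if assigned else y0)
    return difference_in_means(self_selected), difference_in_means(randomized)

naive_estimate, randomized_estimate = simulate()
print(f"self-selected comparison: {naive_estimate:.2f}")
print(f"randomized comparison:    {randomized_estimate:.2f}")
```

With these made-up parameters the self-selected comparison should land near -11 (the true effect of -5 plus a selection bias of about -6), while the randomized comparison should land near -5.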
28Randomization: Disadvantages
- Disadvantages
- Ethical issues, political constraints
- Internal validity (exogeneity): people might not
comply with the assignment (selective
non-compliance)
- External validity (generalizability): usually run
as a controlled experiment on a small-scale pilot;
difficult to extrapolate the results to a larger
population
- Does not always solve the problem of spillovers
29When to Randomize
- If funds are insufficient to treat all eligible
recipients
- Randomization can be the most fair and
transparent approach
- The program is administered at the individual,
household, or community level
- Higher levels of implementation are difficult,
for example trunk roads
- Program will be scaled up: learning what works is
very valuable
302. Difference-in-difference
- Observations over time: compare observed changes
in the outcomes for a sample of participants and
non-participants
- Identification assumption: the selection bias or
unobservable characteristics are time-invariant
(parallel trends in the absence of the program)
- Counterfactual: changes over time for the
non-participants
31Diff-in-Diff Continued
- Constraint: requires at least two cross-sections
of data, pre-program and post-program, on
participants and non-participants
- Need to think about the evaluation ex-ante,
before the program
- More valid if there are 2 pre-periods, so we can
observe whether the trend is the same
- Can in principle be combined with matching to
adjust for pre-treatment differences that affect
the growth rate
32Implementing differences in differences:
Different Strategies
- Some arbitrary comparison group
- Matched diff in diff
- Randomized diff in diff
- These are in order from more problems to fewer
problems; think about this as we look at this
graphically
33Essential Assumptions of Diff-in-Diff
- Initial difference must be time invariant
- In absence of program, the change over time
would be identical
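Under these two assumptions the estimator reduces to simple arithmetic: the change among participants minus the change among non-participants. The sketch below uses hypothetical malaria rates, not NetMark or any other survey data, and the function name `diff_in_diff` is introduced here only for illustration.

```python
def diff_in_diff(treat_pre, treat_post, comp_pre, comp_post):
    """Change among participants minus change among non-participants."""
    return (treat_post - treat_pre) - (comp_post - comp_pre)

# Hypothetical malaria rates (cases per 100 people); not real data.
zamfara_2001, zamfara_2003 = 20, 25   # program state
niger_2001, niger_2003 = 15, 27       # comparison state

before_after = zamfara_2003 - zamfara_2001
did = diff_in_diff(zamfara_2001, zamfara_2003, niger_2001, niger_2003)

print(before_after)  # 5
print(did)           # -7
```

The two states start at different levels (20 versus 15), which is allowed. What the estimate relies on is that, without the program, Zamfara would also have risen by 12 points, as the comparison state did.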
34Difference-in-Difference in ITN Example
- Instead of comparing Zamfara to Oyo, compare
Zamfara to Niger if, while the two states have
different malaria rates and different ITN usage,
we expect that they change in parallel
- Use NetMark data to compare 2000 to 2003 in
Zamfara and Niger states
- Use additional data (GHS, NLSS) to compare
incomes and sanitation infrastructure levels and
changes prior to program implementation
353. Regression discontinuity design
- Exploit the rule generating assignment into a
program given to individuals only above a given
threshold. Assume a discontinuity in
participation but not in counterfactual outcomes
- Counterfactual: individuals just below the
cut-off who did not participate
- Advantages
- Identification is built into the program design
- Delivers marginal gains from the program around
the eligibility cut-off point. Important for
program expansion
- Disadvantages
- Threshold has to be applied in practice, and
individuals should not be able to manipulate the
score used in the program to become eligible
36RDD in ITN Example
- Program available for poor households
- Eligibility criteria: must be below the national
poverty line or have < 1 ha of land
- Treatment group: those below the cut-off
- Those with income below the poverty line and
therefore qualified for ITNs
- Comparison group: those right above the cut-off
- Those with income just above the poverty line and
therefore not eligible
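The comparison of households just below and just above the poverty line can be sketched as follows. This is a minimal illustration that assumes the eligibility rule is perfectly enforced (a sharp design) and fits a separate straight line on each side of the cut-off. The income scale, poverty line of 100, bandwidth, and the built-in effect of -6 are all invented for the example.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def rdd_estimate(rows, cutoff, bandwidth):
    """Eligible-side minus ineligible-side predicted outcome at the cutoff."""
    below = [(x, y) for x, y in rows if cutoff - bandwidth <= x < cutoff]
    above = [(x, y) for x, y in rows if cutoff <= x <= cutoff + bandwidth]
    b0, b1 = fit_line(*zip(*below))   # households eligible for ITNs
    a0, a1 = fit_line(*zip(*above))   # households just above the line
    return (b0 + b1 * cutoff) - (a0 + a1 * cutoff)

# Hypothetical households: malaria falls smoothly with income, and
# eligible households (income below the line) get an extra drop of 6.
rng = random.Random(1)
POVERTY_LINE = 100.0
households = []
for _ in range(4000):
    income = rng.uniform(0.0, 200.0)
    eligible = income < POVERTY_LINE
    malaria = 40.0 - 0.1 * income + (-6.0 if eligible else 0.0) + rng.gauss(0.0, 1.0)
    households.append((income, malaria))

estimate = rdd_estimate(households, cutoff=POVERTY_LINE, bandwidth=20.0)
print(f"estimated jump at the poverty line: {estimate:.2f}")
```

Fitting a line on each side, rather than comparing raw means within the bandwidth, keeps the smooth income gradient from being mistaken for a program effect. The estimate applies only to households near the cut-off.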
37RDD in ITN Example
- Problems
- How well enforced was the rule?
- Can the rule be manipulated?
- Local effect may not be generalizable if the
program expands to households well above the
poverty line
- Particularly relevant since NetMark data indicate
low ITN usage across all socio-economic status
groups
384. Matching
- Match participants with non-participants from a
larger survey
- Counterfactual: matched comparison group
- Each program participant is paired with one or
more non-participants who are similar based on
observable characteristics
- Assumes that, conditional on the set of
observables, there is no selection bias based on
unobserved heterogeneity
- When the set of variables to match on is large,
often match on a summary statistic: the
probability of participation as a function of the
observables (the propensity score)
394. Matching
- Advantages
- Does not require randomization, nor baseline
(pre-intervention) data
- Disadvantages
- Strong identification assumptions
- In many cases, may make interpretation of results
very difficult
- Requires very good quality data: need to control
for all factors that influence program placement
- Requires a sufficiently large sample size to
generate the comparison group
40Matching in Practice
- Using statistical techniques, we match a group of
non-participants with participants using
variables like gender, household size, education,
experience, land size, rainfall (to control for
drought), and irrigation (as many observable
characteristics not affected by the program
intervention as possible)
- One common method: Propensity Score Matching
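The matching idea can be illustrated with a deliberately simple variant. The sketch below is not propensity score matching: it pairs each participant with the nearest non-participant directly on two observables (household size and land size), with replacement. All records are hypothetical, and the names `distance` and `matched_effect` are introduced only for this example.

```python
from statistics import mean

# (household size, land in ha, malaria rate): hypothetical records only.
participants = [(4, 0.5, 10), (6, 1.0, 14), (3, 0.2, 8)]
non_participants = [(4, 0.6, 15), (7, 1.1, 20), (3, 0.3, 12), (10, 3.0, 30)]

def distance(a, b):
    """Squared distance on the observable characteristics (outcome excluded)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def matched_effect(treated, pool):
    """Average outcome gap between each participant and its nearest non-participant."""
    gaps = []
    for t in treated:
        # Nearest neighbor on observables; the same household may be reused.
        match = min(pool, key=lambda c: distance(t, c))
        gaps.append(t[2] - match[2])
    return mean(gaps)

naive = mean(p[2] for p in participants) - mean(c[2] for c in non_participants)
matched = matched_effect(participants, non_participants)
print(f"all non-participants as comparison: {naive:.2f}")
print(f"matched comparison:                 {matched:.2f}")
```

Here the unmatched comparison gives about -8.6, largely because one large, land-rich household with a high malaria rate sits in the comparison pool. The matched comparison gives -5. Neither figure corrects for unobservables, which is the identification assumption the slides flag.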
41Matching in Practice: 2 Approaches
- Approach 1: After program implementation, we
match (within region) those who received ITNs
with those who did not. Problem?
- Problem: households' likelihood of using the nets
is unobservable, so it is not included in the
propensity score
- This creates selection bias
- Approach 2: The program is allocated based on
land size. After implementation, we match those
eligible in region A with those in region B.
Problem?
- Problems: same issues of individual
unobservables, but lessened because we compare
eligible to potentially eligible households
- Now there is a problem of unobservable factors
across regions
42An extension of matching: pipeline comparisons
- Idea: compare those just about to get an
intervention with those getting it now
- Assumption: the stopping point of the
intervention does not separate two fundamentally
different populations
- Example: extending irrigation networks
- In the ITN example: if only some communities
within Zamfara receive ITNs in round 1, compare
them to nearby communities that will receive ITNs
in round 2
- Difficulty with infrastructure: spillover effects
may be strong, or there may be anticipatory
effects