Correcting for Self-Selection Bias Using the Heckman Selection Correction in a Valuation Survey Using Knowledge Networks


1
Correcting for Self-Selection Bias Using the
Heckman Selection Correction in a Valuation
Survey Using Knowledge Networks

Presented at the 2005 Annual Meeting of the
American Association of Public Opinion Research
Trudy Cameron, University of Oregon
J. R. DeShazo, UCLA
Mike Dennis, Knowledge Networks

2
Motivating Insights

It is possible to have a 20% response rate, yet
still have a representative sample
It is possible to have an 80% response rate, yet
still have a very non-representative sample
Need to know:
What factors affect response propensity
Whether response propensity is correlated with
the survey outcome of interest
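
The two claims above can be checked with a small simulation. This is only a sketch under stated assumptions, not the authors' analysis: the outcome and a latent response propensity are taken to be standard normal, `rho` is a hypothetical correlation between them, and the function name `respondent_bias` is invented for illustration.

```python
import random
from statistics import NormalDist, fmean


def respondent_bias(response_rate, rho, n=200_000, seed=1):
    """Mean outcome among respondents minus the population mean.

    rho is the assumed correlation between the survey outcome and the
    latent response propensity; rho = 0 means response is unrelated
    to the outcome.
    """
    rng = random.Random(seed)
    # Cutoff on the latent propensity that yields the target response rate.
    cutoff = NormalDist().inv_cdf(1.0 - response_rate)
    outcomes, responded = [], []
    for _ in range(n):
        y = rng.gauss(0.0, 1.0)       # survey outcome of interest
        noise = rng.gauss(0.0, 1.0)   # part of propensity unrelated to y
        propensity = rho * y + (1.0 - rho ** 2) ** 0.5 * noise
        outcomes.append(y)
        if propensity > cutoff:
            responded.append(y)
    return fmean(responded) - fmean(outcomes)


# Hypothetical scenarios mirroring the two bullets above.
low_rate_random = respondent_bias(response_rate=0.20, rho=0.0)
high_rate_selected = respondent_bias(response_rate=0.80, rho=0.6)
print(f"20% response, propensity unrelated to outcome: bias = {low_rate_random:+.3f}")
print(f"80% response, propensity correlated with outcome: bias = {high_rate_selected:+.3f}")
```

With these assumed settings, the low-response sample stays close to the population mean while the high-response sample is visibly shifted, which is why the response rate alone says little about representativeness.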

3
Research Questions

Can the inferences from our two samples of
respondents from the Knowledge Networks Inc (KN)
consumer panel be generalized to the population?
Do observed/unobserved factors that affect the
odds of a respondent being in the sample also
affect the answers that he/she gives on the
survey?

4
What we find

Insignificant selectivity using one method
Significant, but very tiny, selectivity using an
alternative method
A bit disappointing for us (no sensational
results, reduced publication potential)
Probably reassuring for Knowledge Networks, since
our samples appear to be reasonably representative

5
Heckman Selectivity Correction Intuition: Example 1

Suppose your sample matches the US population on
observables: age, gender, income
Survey is about government regulation
Suppose liberals are more likely to fill out
surveys, but no data on political ideology for
non-respondents (unobserved heterogeneity)
Sample will have disproportionate number of
liberals
Your sample is likely to overstate sympathy for
government regulation (sample selection bias)

6
Heckman Selectivity Correction Intuition: Example 2

Suppose your sample matches the US population on
observables: age, gender, income
Survey is about WTP for health programs
People who are fearful about their future health
are more likely to respond, but we have no data for
anyone on fearfulness (the salience of health
programs)
Sample will tend to overestimate WTP for health
programs

7
Level curves,
8
Level curves,
9
The selection process

Several phases of attrition (transitions) in KN
samples. Assume RDD = random
RDD → recruited
Recruited → profiled
Profiled → active at time of sampling
Active → drawn for survey sample
Drawn → member of estimating sample
Can explore a different selection process for
each of these transitions
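
One way to see how the transitions combine: because each stage is a subset of the previous one, the marginal probability that an RDD contact ends up in the estimating sample factors into the conditional probabilities of the individual transitions. The notation below is an illustrative assumption, not taken from the slides.

```latex
\Pr(\text{estimating sample})
  = \Pr(\text{recruited} \mid \text{RDD})
    \cdot \Pr(\text{profiled} \mid \text{recruited})
    \cdot \Pr(\text{active} \mid \text{profiled})
    \cdot \Pr(\text{drawn} \mid \text{active})
    \cdot \Pr(\text{estimating sample} \mid \text{drawn})
```

Each factor on the right can be given its own selection model, while the left-hand side corresponds to a single model for overall selection from the RDD pool.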

10
Exploit RDD telephone numbers

Recruits can be asked their addresses
Some non-recruit numbers matched to addresses
using reverse directories
Phone numbers with no street address? Matched
(approximately) to best census tract using the
geographic extent of the telephone exchange
Link to other geocoded data

11
Panel protection during geocoding

Using dummy identifiers, match street addresses
to relevant census tract (or telephone exchange
to tract, courtesy Dale Kulp at MSG), return data
to KN
Get back from KN the pool of initial RDD
contacts, minus all confidential addresses, with
our respondent case IDs restored
Merge with auxiliary data about census tract
attributes and voting behaviors

12
Cameron and Crawford (2004)

Data for each of the 65,000 census tracts in the
year 2000 U.S. census
95 count variables for different categories
Convert to proportions of population (or
households, or dwellings)
Factor analysis: 15 orthogonal factors that
together account for 88% of variation in
sociodemographic characteristics across tracts

13
Categories of Census Variables

Population density
Ethnicity, Gender, Age distribution
Family structure
Housing occupancy status, Housing characteristics
Urbanization, Residential mobility
Linguistic isolation
Educational Attainment
Disabilities
Employment Status
Industry, Occupation, Type of income

14
Labels: 15 orthogonal factors

"well-to-do prime"          "elderly disabled"
"well-to-do seniors"        "rural farm., self-employ."
"single renter twenties"    "low mobil., stable neigh."
"unemployed"                "Native American"
"minority single moms"      "female"
"thirty-somethings"         "health-care workers"
"working-age disabled"      "asian-hispanic-multi, language isolation"
"some college, no grad"

15
2000 Presidential Voting Data

Leip (2004) Atlas of U.S. Presidential Elections:
vote counts for each county
Use % of county votes for Gore, % for Nader,
versus % for Bush and others (omitted category)
These will not be orthogonal to our 15 census factors

16
Empirical Illustrations

1. Analysis of government question in public
interventions sample--by naïve OLS, and via
preferred Heckman model
2. Analysis of selection processes leading to
private interventions sample
Marginal selection probabilities
Conditional selection probabilities
Allow marginal utilities to depend on propensity
to respond to survey

17
Analysis 1: Heckman Selectivity Model

Public intervention sample
Find an outcome variable that:
Measures an attitude that may be relevant to
other research questions
Can be treated as cardinal and continuous
(although it is actually discrete and ordinal)
Can be modeled (naively) by OLS methods
Can be generalized into a two-equation FIML
selectivity model
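
For reference, a standard textbook form of the two-equation selectivity model is sketched below. The symbols (z_i, x_i, γ, β, ρ, σ) are generic assumptions rather than the authors' notation; in this application z_i would hold the census tract factors and county vote shares, and y_i the government-involvement rating.

```latex
\begin{aligned}
s_i^{*} &= z_i'\gamma + u_i, \qquad s_i = \mathbf{1}[\, s_i^{*} > 0 \,]
  && \text{(selection: in the estimating sample or not)} \\
y_i &= x_i'\beta + \varepsilon_i, \qquad y_i \text{ observed only if } s_i = 1
  && \text{(outcome)} \\
\begin{pmatrix} u_i \\ \varepsilon_i \end{pmatrix}
  &\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^{2} \end{pmatrix} \right) \\
E[\, y_i \mid s_i = 1 \,] &= x_i'\beta + \rho\sigma\,
  \frac{\phi(z_i'\gamma)}{\Phi(z_i'\gamma)}
\end{aligned}
```

Naive OLS on respondents omits the last term, so it is biased whenever the error correlation ρ is nonzero. FIML estimates both equations jointly, and a test of ρ = 0 is a test for selection on unobservables.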

18
Government Involvement in Regulating Env.,
Health, Safety?
19
Heckman correction model: bias not statistically
significant

Fail to reject the null of no selection bias at
the 5% level or at the 10% level (but close)
Point estimate of error correlation: 0.10
May be more likely to respond if approve of govt
regulation
Interpretation: insufficient signal-to-noise to
conclude that there is non-random selection
Reassuring, but could stem from noise due to:
Census tract factors, county votes rather than
individual characteristics
Treating ordinal ratings as cardinal and
continuous

20
Implications for govt variable
21
Analysis 2: Conditional Logit Choice Models

Very attractive properties for analyzing multiple
discrete choices, but
No established methods for joint modeling of
selection propensity and outcomes in the form of
3-way choices

22
Testable Hypothesis

Do the marginal utilities of key attributes
depend upon the fitted selection index (or the
fitted selection probability)?
If yes: observable heterogeneity in the
odds of being in the sample contributes to
heterogeneity in the apparent preferences in the
estimating sample
If no: greater confidence in representativeness
of the estimating sample (although still no
certainty)
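
The hypothesis can be written as a conditional logit with systematically varying parameters. This is a sketch in assumed notation: the fitted response probability is written as \hat{P}_i, and centering it on its mean \bar{P} is an assumption made here for illustration.

```latex
\begin{aligned}
U_{ij} &= \sum_{k} \beta_{ik}\, x_{ijk} + e_{ij}, \qquad j = 1, 2, 3 \\
\beta_{ik} &= \beta_{k0} + \beta_{k1}\,\bigl(\hat{P}_i - \bar{P}\bigr) \\
\Pr(i \text{ chooses } j) &=
  \frac{\exp\!\bigl(\sum_{k} \beta_{ik}\, x_{ijk}\bigr)}
       {\sum_{m=1}^{3} \exp\!\bigl(\sum_{k} \beta_{ik}\, x_{imk}\bigr)} \\
H_0 &: \beta_{k1} = 0 \quad \text{for each key attribute } k
\end{aligned}
```

Rejecting H_0 for attribute k corresponds to the "yes" case: the marginal utility of that attribute shifts with the respondent's fitted propensity to be in the sample.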

23
RDD contact dispositions
24
Results: Response propensity models

Use response propensity as a shifter on the
parameters of conditional logit choice models
Only one marginal utility parameter (related to
the disutility of a sick-year) appears robustly
sensitive to selection propensity
Baseline coefficient is -50 units, average shift
coefficient is on the order of 3 units, times a
deviation in fitted response probabilities that
averages about 0.004
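
The magnitudes in the last bullet can be combined in a few lines of arithmetic. The figures are the approximate ones quoted on the slide; the variable names are illustrative, and the sign of the shift coefficient is not stated in the source, so only its size is compared with the baseline.

```python
# Approximate figures quoted on the slide.
baseline_coef = -50.0    # baseline marginal (dis)utility of a sick-year
shift_coef = 3.0         # average shift coefficient (order of magnitude only)
avg_deviation = 0.004    # average deviation in fitted response probabilities

# Typical change in the marginal utility attributable to response propensity.
typical_shift = shift_coef * avg_deviation
relative_size = abs(typical_shift / baseline_coef)

print(f"typical shift: {typical_shift:.3f} units")
print(f"relative to baseline: {relative_size:.3%}")
```

The typical shift works out to about 0.012 units, a small fraction of one percent of the baseline coefficient, which is the sense in which the heterogeneity is statistically detectable but tiny.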

25
Conclusions 1

For our samples: hard to find convincing and
robust evidence of substantial sample selection
bias in models for outcome variables in two KN
samples
Good news for Knowledge Networks; not so good for
us, as researchers, in terms of the publication
prospects for this dimension of our work

26
Conclusions 2

Analysis 1: Statistically insignificant point
estimate of bias in distribution of attitudes
toward regulation, on the order of 10% too much
in favor
Analysis 2: Statistically significant (but tiny)
heterogeneity in key parameters across response
propensities in systematically varying parameters
models

27
Guidance

High response rates do not necessarily eliminate
biases in survey samples
Weights help (so sample matches population on
observable dimensions), but weights are not
necessarily a fix if unobservables that affect
response are correlated with the survey outcome
Cannot tell if correlated unobservables are a
problem without doing this type of analysis
Need to model the selection process explicitly
Need to explain differences in response
propensities
Need analogous data for respondents and for
people who do not respond
E.g. Census tract factors and county voting
percentages
Anything else that might capture salience of
survey topic (e.g. county mortality rates from
same diseases covered in survey, hospital
densities)