Correcting for Self-Selection Bias Using the Heckman Selection Correction in a Valuation Survey Using Knowledge Networks

1
Correcting for Self-Selection Bias Using the
Heckman Selection Correction in a Valuation
Survey Using Knowledge Networks
  • Presented at the 2005 Annual Meeting of the
    American Association for Public Opinion Research
  • Trudy Cameron, University of Oregon
  • J. R. DeShazo, UCLA
  • Mike Dennis, Knowledge Networks

2
Motivating Insights
  • It is possible to have a 20% response rate, yet
    still have a representative sample
  • It is possible to have an 80% response rate, yet
    still have a very non-representative sample
  • Need to know:
      • What factors affect response propensity
      • Whether response propensity is correlated with
        the survey outcome of interest
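
The two claims above can be illustrated with a small simulation. This is only a sketch under invented assumptions: the outcome scale, the unobserved "taste" variable, and the response rule are all hypothetical and are not taken from the study. In the first case people respond at random with a 20% rate; in the second, 80% respond, but the propensity to respond is correlated with the outcome.

```python
import random
from statistics import NormalDist

def simulate(response_rate, selective, n=100_000, seed=1):
    """Return (population mean, respondent mean, achieved response rate)."""
    rng = random.Random(seed)
    # Threshold on (taste + noise) ~ N(0, 2) that yields the target response rate.
    threshold = NormalDist(0.0, 2 ** 0.5).inv_cdf(1.0 - response_rate)
    pop_total = resp_total = 0.0
    n_resp = 0
    for _ in range(n):
        taste = rng.gauss(0.0, 1.0)   # unobserved trait (hypothetical)
        outcome = 5.0 + taste          # hypothetical survey answer
        pop_total += outcome
        if selective:
            # Response propensity is correlated with the outcome via `taste`.
            responds = taste + rng.gauss(0.0, 1.0) > threshold
        else:
            # Response is purely random, unrelated to the outcome.
            responds = rng.random() < response_rate
        if responds:
            resp_total += outcome
            n_resp += 1
    return pop_total / n, resp_total / n_resp, n_resp / n

pop_a, resp_a, rate_a = simulate(0.20, selective=False)
pop_b, resp_b, rate_b = simulate(0.80, selective=True)
print(f"20% random response:    rate={rate_a:.2f}, bias={resp_a - pop_a:+.3f}")
print(f"80% selective response: rate={rate_b:.2f}, bias={resp_b - pop_b:+.3f}")
```

Under these assumptions the low-response-rate sample should show a bias near zero, while the high-response-rate sample overstates the mean, because what matters is the correlation between response propensity and the outcome, not the response rate itself.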

3
Research Questions
  • Can the inferences from our two samples of
    respondents from the Knowledge Networks Inc (KN)
    consumer panel be generalized to the population?
  • Do observed/unobserved factors that affect the
    odds of a respondent being in the sample also
    affect the answers that he/she gives on the
    survey?

4
What we find
  • Insignificant selectivity using one method
  • Significant, but very tiny, selectivity using an
    alternative method
  • A bit disappointing for us (no sensational
    results, reduced publication potential)
  • Probably reassuring for Knowledge Networks, since
    our samples appear to be reasonably representative

5
Heckman Selectivity Correction Intuition: Example 1
  • Suppose your sample matches the US population on
    observables: age, gender, income
  • Survey is about government regulation
  • Suppose liberals are more likely to fill out
    surveys, but there are no data on political ideology for
    non-respondents (unobserved heterogeneity)
  • Sample will have disproportionate number of
    liberals
  • Your sample is likely to overstate sympathy for
    government regulation (sample selection bias)

6
Heckman Selectivity Correction Intuition: Example 2
  • Suppose your sample matches the US population on
    observables: age, gender, income
  • Survey is about WTP (willingness to pay) for health programs
  • People who are fearful about their future health
    are more likely to respond, but we have no data for
    anyone on fearfulness (the salience of health
    programs)
  • Sample will tend to overestimate WTP for health
    programs

7
[Figure: level curves]
8
[Figure: level curves]
9
The selection process
  • Several phases of attrition (transitions) in KN
    samples. Assume the RDD (random-digit-dial) sample is random
      • RDD → recruited
      • Recruited → profiled
      • Profiled → active at time of sampling
      • Active → drawn for survey sample
      • Drawn → member of estimating sample
  • Can explore a different selection process for
    each of these transitions
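
To make the chain of transitions concrete, the sketch below computes a conditional probability for each transition and multiplies them to get the marginal probability of ending up in the estimating sample. The stage counts are invented purely for illustration and are not the study's figures.

```python
from math import prod

# Hypothetical counts at each stage (illustrative only, not from the study).
stages = [
    ("RDD contact", 10_000),
    ("recruited", 4_000),
    ("profiled", 3_000),
    ("active at sampling", 2_000),
    ("drawn for survey", 500),
    ("estimating sample", 350),
]

# Conditional probability of making each transition, given the previous stage.
conditional = {
    f"{prev} -> {cur}": n_cur / n_prev
    for (prev, n_prev), (cur, n_cur) in zip(stages, stages[1:])
}

# The marginal probability of reaching the estimating sample from the RDD pool
# is the product of the conditional transition probabilities.
marginal = prod(conditional.values())

for name, p in conditional.items():
    print(f"{name}: {p:.3f}")
print(f"RDD contact -> estimating sample: {marginal:.4f}")
```

Each conditional probability could, in principle, be modeled separately as a function of tract-level covariates, which is the sense in which every transition can have its own selection process.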

10
Exploit RDD telephone numbers
  • Recruits can be asked their addresses
  • Some non-recruit numbers matched to addresses
    using reverse directories
  • Phone numbers with no street address? Matched
    (approximately) to best census tract using the
    geographic extent of the telephone exchange
  • Link to other geocoded data

11
Panel protection during geocoding
  • Using dummy identifiers, match street addresses
    to relevant census tract (or telephone exchange
    to tract, courtesy Dale Kulp at MSG), return data
    to KN
  • Get back from KN the pool of initial RDD
    contacts, minus all confidential addresses, with
    our respondent case IDs restored
  • Merge with auxiliary data about census tract
    attributes and voting behaviors

12
Cameron and Crawford (2004)
  • Data for each of the 65,000 census tracts in the
    year 2000 U.S. census
  • 95 count variables for different categories
  • Convert to proportions of population (or
    households, or dwellings)
  • Factor analysis: 15 orthogonal factors that
    together account for 88% of variation in
    sociodemographic characteristics across tracts

13
Categories of Census Variables
  • Population density
  • Ethnicity; Gender; Age distribution
  • Family structure
  • Housing occupancy status; Housing characteristics
  • Urbanization; Residential mobility
  • Linguistic isolation
  • Educational attainment
  • Disabilities
  • Employment status
  • Industry; Occupation; Type of income

14
Labels: 15 orthogonal factors
  "well-to-do prime"          "elderly disabled"
  "well-to-do seniors"        "rural farm., self-employ."
  "single renter twenties"    "low mobil., stable neigh."
  "unemployed"                "Native American"
  "minority single moms"      "female"
  "thirty-somethings"         "health-care workers"
  "working-age disabled"      "asian-hispanic-multi, language isolation"
  "some college, no grad"

15
2000 Presidential Voting Data
  • Leip (2004) Atlas of U.S. Presidential Elections
    vote counts for each county
  • Use % of county votes for Gore, Nader,
    versus Bush and others (omitted category)
  • Vote shares will not be orthogonal to our 15 census factors

16
Empirical Illustrations
  • 1. Analysis of government question in public
    interventions sample, by naïve OLS and via
    preferred Heckman model
  • 2. Analysis of selection processes leading to
    private interventions sample
      • Marginal selection probabilities
      • Conditional selection probabilities
      • Allow marginal utilities to depend on propensity
        to respond to survey

17
Analysis 1: Heckman Selectivity Model
  • Public intervention sample
  • Find an outcome variable that:
      • Measures an attitude that may be relevant to
        other research questions
      • Can be treated as cardinal and continuous
        (although it is actually discrete and ordinal)
      • Can be modeled (naively) by OLS methods
      • Can be generalized into a two-equation FIML
        (full-information maximum likelihood) selectivity model
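
For reference, a two-equation selectivity model of this kind is usually written as follows. This is the textbook form of the Heckman model, not a specification taken from the slides; the symbols (w_i for selection covariates such as census tract factors and county vote shares, x_i for outcome covariates, ρ for the error correlation, σ for the outcome error standard deviation) are notation assumed here.

```latex
\begin{aligned}
z_i^* &= w_i'\gamma + u_i, \qquad s_i = \mathbf{1}[z_i^* > 0]
  && \text{(selection: in the estimating sample or not)} \\
y_i &= x_i'\beta + \varepsilon_i, \qquad y_i \text{ observed only if } s_i = 1
  && \text{(outcome: attitude rating)} \\
(u_i, \varepsilon_i) &\sim N\!\left(0,\;
  \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix}\right) \\
E[\,y_i \mid s_i = 1\,] &= x_i'\beta + \rho\,\sigma\,
  \frac{\phi(w_i'\gamma)}{\Phi(w_i'\gamma)}
\end{aligned}
```

Naive OLS on respondents is unbiased for β only when ρ = 0 (or the last term is otherwise uncorrelated with x_i); FIML estimates γ, β, σ, and ρ jointly, so a test of ρ = 0 is a test for selection on unobservables.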

18
Government Involvement in Regulating Env.,
Health, Safety?
19
Heckman correction model: bias not statistically
significant
  • Fail to reject at 5% level, at 10% level (but
    close)
  • Point estimate of error correlation: 0.10
  • People may be more likely to respond if they approve of govt
    regulation
  • Interpretation: insufficient signal-to-noise to
    conclude that there is non-random selection
  • Reassuring, but could stem from noise due to:
      • Census tract factors, county votes rather than
        individual characteristics
      • Treating ordinal ratings as cardinal and
        continuous
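
To get a feel for what an error correlation of 0.10 implies, the sketch below evaluates the selection term from the textbook Heckman model, ρ·σ·λ(index), where λ is the inverse Mills ratio φ/Φ. Only ρ = 0.10 comes from the slide; the outcome error standard deviation and the selection index values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def inverse_mills(index):
    """phi(index) / Phi(index) for a standard normal selection error."""
    return STD_NORMAL.pdf(index) / STD_NORMAL.cdf(index)

def selection_shift(rho, sigma, index):
    """Expected outcome error among respondents: rho * sigma * lambda(index)."""
    return rho * sigma * inverse_mills(index)

rho = 0.10     # point estimate of the error correlation, from the slide
sigma = 1.0    # hypothetical outcome error standard deviation
for index in (-1.0, 0.0, 1.0):   # hypothetical selection indices
    shift = selection_shift(rho, sigma, index)
    print(f"index={index:+.1f}: shift={shift:+.3f} rating units")
```

The shift is larger for people with a low selection index (unlikely responders who nevertheless responded), but with ρ this small it stays a modest fraction of σ under these assumptions.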

20
Implications for govt variable
21
Analysis 2: Conditional Logit Choice Models
  • Very attractive properties for analyzing multiple
    discrete choices, but
  • No established methods for joint modeling of
    selection propensity and outcomes in the form of
    3-way choices

22
Testable Hypothesis
  • Do the marginal utilities of key attributes
    depend upon the fitted selection index (or the
    fitted selection probability)?
  • If yes: then observable heterogeneity in the
    odds of being in the sample contributes to
    heterogeneity in the apparent preferences in the
    estimating sample
  • If no: greater confidence in representativeness
    of the estimating sample (although still no
    certainty)
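
One way to write this hypothesis down is as a systematically varying parameter in the utility function. The notation is assumed here for illustration: x_ij is a key attribute of alternative j for respondent i, z_ij collects the other attributes, p̂_i is the fitted selection probability, and p̄ is a reference value such as its mean.

```latex
U_{ij} = \bigl(\beta_0 + \beta_1\,(\hat{p}_i - \bar{p})\bigr)\,x_{ij}
         + \delta' z_{ij} + \varepsilon_{ij},
\qquad H_0:\ \beta_1 = 0
```

Rejecting H_0 would mean the marginal utility of the attribute differs with response propensity; failing to reject is consistent with, but does not prove, representativeness.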

23
RDD contact dispositions
24
Results: Response propensity models
  • Use response propensity as a shifter on the
    parameters of conditional logit choice models
  • Only one marginal utility parameter (related to
    the disutility of a sick-year) appears robustly
    sensitive to selection propensity
  • Baseline coefficient is -50 units, average shift
    coefficient is on the order of 3 units, times a
    deviation in fitted response probabilities that
    averages about 0.004
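
The size of this effect can be checked with the figures quoted above. The variable names are hypothetical, and only the magnitudes are used, since the slide does not give the sign of the shift coefficient.

```python
baseline = -50.0        # baseline marginal (dis)utility of a sick-year, from the slide
shift_coef = 3.0        # approximate magnitude of the shift coefficient, from the slide
avg_deviation = 0.004   # average deviation in fitted response probability, from the slide

# Typical change in the marginal utility attributable to response propensity.
shift = shift_coef * avg_deviation
relative = abs(shift / baseline)
print(f"typical shift: {shift:.3f} units ({relative:.3%} of the baseline)")
```

A typical shift of about 0.012 units against a baseline of -50 is why the effect, although statistically significant, is described as tiny.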

25
Conclusions 1
  • For our samples: hard to find convincing and
    robust evidence of substantial sample selection
    bias in models for outcome variables in two KN
    samples
  • Good news for Knowledge Networks; not so good for
    us, as researchers, in terms of the publication
    prospects for this dimension of our work

26
Conclusions 2
  • Analysis 1: Insignificant point estimate of bias
    in distribution of attitudes toward regulation, on
    the order of 10% too much in favor
  • Analysis 2: Statistically significant (but tiny)
    heterogeneity in key parameters across response
    propensities in systematically varying parameters
    models

27
Guidance
  • High response rates do not necessarily eliminate
    biases in survey samples
  • Weights help (so sample matches population on
    observable dimensions), but weights are not
    necessarily a fix if unobservables are correlated
  • Cannot tell if correlated unobservables are a
    problem without doing this type of analysis
  • Need to model the selection process explicitly
  • Need to explain differences in response
    propensities
  • Need analogous data for respondents and for
    people who do not respond
      • E.g. census tract factors and county voting
        percentages
      • Anything else that might capture salience of
        survey topic (e.g. county mortality rates from
        same diseases covered in survey, hospital
        densities)