1 Causal Structure, Endogeneity, and the Missing Data Problem in Modeling the Impact of Information and Communication Technology Use on Society
- Monday, October 19, 2009
- Hun Myoung Park
- University Information Technology Services
- Indiana University
- kucc625_at_indiana.edu
2 Outline
- ICT Use and Society
- Competing Perspectives
- Review of Traditional Approaches
- Nature of Problems
- Alternative Approaches
- Data and Illustrations
- Findings
- Implications
3 ICT Use and Society
- Does ICT use influence society?
- Positive, negative, or negligible effect?
- Technological determinism
- Optimistic perspective
- Pessimistic perspective
- Skeptical perspective
4 Optimistic Perspective
[Diagram: ICT Use → Society]
- Positive impact on society
- Transformation theory
- Rheingold (1993), Grossman (1995), Morris (1999)
- Getting the general public engaged
5 Pessimistic Perspective
[Diagram: ICT Use → Society]
- Negative impact on society
- Reinforcement theory
- David (1999, 2005), Norris (2001)
- Digital inequality (digital divide)
- Engaging the engaged rather than the disenfranchised
6 Skeptical Perspective
[Diagram: Society → ICT Use]
- ICT use shaped by society
- Reflection of the real world
- Normalization theory
- Margolis and Resnick (2000), Bimber (2001, 2003), Uslaner (2004)
- Politics as usual
7 Conflicting Evidence: How?
- Conflicting empirical results depending on perspectives
- What is wrong?
- Failure to deal properly with the nature of the problems
- How do we assess the impact of ICT use (the treatment effect) more correctly?
8 Review: T-test (ANOVA)
- Comparing means/proportions
- Scott (2006)
- Impact of ICT use: mean difference
- Simplicity and easy interpretation
- The two groups are assumed to have the same characteristics except for the treatment
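A minimal sketch of the two-group comparison in Python, using only the standard library. The helper `welch_t`, the choice of Welch's unequal-variance form, and the 0/1 outcome values are illustrative assumptions, not the study's code or data. The difference is a causal impact only if users and nonusers really are alike apart from ICT use, which is the assumption listed above.

```python
import math
from statistics import mean, variance


def welch_t(users, nonusers):
    """Difference in means (users minus nonusers), Welch t statistic, and Welch df."""
    m1, m0 = mean(users), mean(nonusers)
    n1, n0 = len(users), len(nonusers)
    a1, a0 = variance(users) / n1, variance(nonusers) / n0  # squared standard errors
    t = (m1 - m0) / math.sqrt(a1 + a0)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (a1 + a0) ** 2 / (a1 ** 2 / (n1 - 1) + a0 ** 2 / (n0 - 1))
    return m1 - m0, t, df


# Hypothetical 0/1 outcomes (1 = engaged); not data from the study
users = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
nonusers = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
diff, t, df = welch_t(users, nonusers)
print(f"difference = {diff:.2f}, t = {t:.2f}, df = {df:.1f}")
```

With 0/1 outcomes the mean difference is a difference in proportions, which matches "means/proportions" on the slide.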
9 Review: Linear Regression
- Least squares dummy variable (LSDV) model
- Jennings and Zeitner (2003), Uslaner (2004), Welch and Pandey (2007)
- Impact: the coefficient of the dummy d
- What if the dummy d is related to the disturbance e?
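To make the LSDV setup concrete, the sketch below fits y = b0 + b1·x + b2·d by ordinary least squares (normal equations solved by Gauss-Jordan elimination) on hypothetical, noiseless data; the helper `ols` and the data are illustrative assumptions. The dummy coefficient recovers the built-in impact here only because the data contain no disturbance correlated with d.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y via Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: covariate x, ICT-use dummy d, outcome with a built-in impact of 1.5
x = [1, 2, 3, 4, 5, 6, 7, 8]
d = [0, 1, 0, 1, 1, 0, 1, 0]
y = [2.0 + 0.5 * xi + 1.5 * di for xi, di in zip(x, d)]
X = [[1.0, xi, di] for xi, di in zip(x, d)]  # columns: intercept, x, dummy d

b0, b_x, b_d = ols(X, y)
print(f"intercept = {b0:.3f}, x = {b_x:.3f}, dummy d = {b_d:.3f}")
```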
10 Review: Binary Response Model
- Binary logit and probit models for binary dependent variables
- Bimber (2001, 2003) and Thomas and Streib (2003)
- Impact: a discrete change of d, i.e., the difference in predicted probabilities
- Large N required
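For a probit model, the discrete change of d is Φ(x′β + δ) − Φ(x′β), where Φ is the standard normal CDF and δ is the dummy coefficient. A minimal sketch, assuming a single hypothetical index value x′β and coefficient δ (in practice the change is evaluated at chosen covariate values or averaged over the sample):

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_discrete_change(xb, delta):
    """Change in P(y = 1) when the dummy d switches from 0 to 1, holding x'beta fixed."""
    return norm_cdf(xb + delta) - norm_cdf(xb)


# Hypothetical index value and dummy coefficient; not estimates from the study
change = probit_discrete_change(xb=-1.0, delta=0.5)
print(f"discrete change in predicted probability: {change:.4f}")
```

Because Φ is nonlinear, the same δ implies different discrete changes at different values of x′β, which is why the impact is reported as a change in predicted probability rather than as the raw coefficient.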
11 Nature of Problems
- Measurement issues: categorical and binary DVs
- Limited DVs (self-selected)
- Ambiguous causal structure
- Endogeneity: d and e are related
- The missing data problem in nonexperimental research
12 Causal Structure
[Diagrams: ICT Use → Society (unidirectional); ICT Use ⇄ Society (bidirectional)]
- Unidirectional versus bidirectional
- Interactive and jointly determined?
- Iterative "virtuous circle" (Norris 2000)
13 Endogeneity
- ICT use may not be exogenous
- Is the disturbance e related to ICT use d? If so, a key OLS assumption is violated
- Jointly determined in a system
- Instrumental variable (IV) approach?
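A small simulation shows why endogeneity matters and what an IV estimator does. The setup is entirely hypothetical: a continuous treatment d (rather than a binary one, for simplicity), a disturbance e built to be correlated with d, and an instrument z that moves d but is unrelated to e. OLS absorbs cov(d, e)/var(d) into its slope, while the simple IV (Wald) ratio cov(z, y)/cov(z, d) recovers the true effect.

```python
import random
from statistics import fmean


def cov(a, b):
    """Sample covariance."""
    ma, mb = fmean(a), fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


rng = random.Random(2009)
n = 20000
true_effect = 1.0
z = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical instrument: shifts d, unrelated to e
u = [rng.gauss(0, 1) for _ in range(n)]  # other shocks to d
e = [0.8 * ui + 0.6 * rng.gauss(0, 1) for ui in u]  # disturbance correlated with d via u
d = [zi + ui for zi, ui in zip(z, u)]
y = [true_effect * di + ei for di, ei in zip(d, e)]

ols_slope = cov(d, y) / cov(d, d)  # biased by cov(d, e) / var(d), about 0.8 / 2 = 0.4 here
iv_slope = cov(z, y) / cov(z, d)   # simple IV (Wald) estimator
print(f"OLS: {ols_slope:.3f}  IV: {iv_slope:.3f}  true: {true_effect}")
```

The IV estimate is only as good as the instrument: if z were itself related to e, the IV ratio would be biased too.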
14 Missing Data Problem
- A subject is either an ICT user (participant) or a nonuser, not both
- This does NOT necessarily mean many missing values in the data
- Users and nonusers may have different characteristics that are not controlled for in research (surveys): self-selection bias
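The missing data framing can be sketched with potential outcomes: every subject has an outcome with ICT use (y1) and without it (y0), but only one is observed. In the hypothetical simulation below (all names and parameters are assumptions), an unobserved trait raises both the chance of becoming a user and the outcome, so the naive user-minus-nonuser difference overstates the true effect. The true effect is known here only because the simulation keeps both potential outcomes.

```python
import random
from statistics import fmean

rng = random.Random(42)
n = 10000
effect = 0.1  # hypothetical true effect of ICT use, identical for everyone
motivation = [rng.gauss(0, 1) for _ in range(n)]  # unobserved trait, not measured in the survey

# Potential outcomes: y0 without ICT use, y1 with ICT use
y0 = [0.3 * m for m in motivation]
y1 = [v + effect for v in y0]

# Self-selection: more motivated people are more likely to become users
user = [m + rng.gauss(0, 1) > 0 for m in motivation]

# Only one potential outcome is observed per subject
observed = [a if is_user else b for a, b, is_user in zip(y1, y0, user)]

true_ate = fmean(a - b for a, b in zip(y1, y0))
naive = (fmean(o for o, is_user in zip(observed, user) if is_user)
         - fmean(o for o, is_user in zip(observed, user) if not is_user))
print(f"true effect: {true_ate:.3f}  naive user-nonuser difference: {naive:.3f}")
```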
15 Nonexperimental Design
- Randomized control-group pretest-posttest design
- Nonrandomized posttest-only design
- Is ICT use a real treatment?
16 Propensity Score Matching 1
- Rosenbaum and Rubin (1983, 1984)
- Binary probit model to compute predicted probabilities
- Match users and nonusers who have similar likelihoods of ICT use (propensity scores)
- Pair matching/subclassification: one-to-one pair matching without replacement
- Controlling many covariates using a one-dimensional propensity score
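A minimal sketch of one-to-one pair matching without replacement, assuming propensity scores have already been estimated (for example, as predicted probabilities from a probit model of ICT use). The greedy nearest-neighbor rule, the processing order, the optional caliper, and the helper name `pair_match` are illustrative choices, not necessarily the procedure used in the study. Unmatched subjects are dropped, which is where the loss of N comes from.

```python
def pair_match(users, nonusers, caliper=None):
    """Greedy one-to-one nearest-neighbor matching without replacement.

    users, nonusers: dicts mapping subject id -> propensity score.
    Returns a list of (user_id, nonuser_id) pairs.
    """
    available = dict(nonusers)
    pairs = []
    # Process users from highest to lowest score (one common greedy ordering; an assumption)
    for uid, ps in sorted(users.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        nid = min(available, key=lambda k: abs(available[k] - ps))
        if caliper is not None and abs(available[nid] - ps) > caliper:
            continue  # no acceptable nonuser within the caliper: this user is dropped
        pairs.append((uid, nid))
        del available[nid]  # without replacement: each nonuser is used at most once
    return pairs


# Hypothetical propensity scores
users = {"u1": 0.81, "u2": 0.55, "u3": 0.30}
nonusers = {"n1": 0.28, "n2": 0.52, "n3": 0.77, "n4": 0.10}
pairs = pair_match(users, nonusers)
print(pairs)
```

Here nonuser n4 finds no partner and drops out; with a tight caliper, users can be dropped as well.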
17 Propensity Score Matching 2
- Rosenbaum and Rubin (1984), Dehejia and Wahba (1999)
- Matching → (paired) T-test
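After matching, each user is compared with the matched nonuser, so the test is on within-pair differences. A minimal sketch with hypothetical matched 0/1 outcomes; `paired_t` is an illustrative helper, not the study's code.

```python
import math
from statistics import mean, stdev


def paired_t(treated, controls):
    """Mean within-pair difference, paired t statistic, and degrees of freedom."""
    diffs = [t - c for t, c in zip(treated, controls)]
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return d_bar, d_bar / se, n - 1


# Hypothetical outcomes: pair i is (users[i], controls[i])
users = [1, 1, 0, 1, 1, 0, 1, 1]
controls = [0, 1, 0, 0, 1, 0, 0, 1]
effect, t, df = paired_t(users, controls)
print(f"matched difference = {effect:.3f}, t = {t:.2f}, df = {df}")
```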
18 Treatment Effect Model
- Subjects decide whether or not to receive the treatment: selection bias
- A selection equation estimates predicted probabilities of ICT use
- Impact: the dummy coefficient adjusted by ρ, the correlation between the disturbances of the selection (ICT use) and outcome equations
- When ρ = 0, the impact is simply the dummy coefficient
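One common textbook way to write this model (e.g., Maddala 1983; Greene's Econometric Analysis) is sketched below. The symbols x_i, w_i, β, γ, δ, σ, and u_i are assumed notation, not the slide's own; e_i is the outcome disturbance. With bivariate normal disturbances, the expected user-nonuser difference is the dummy coefficient δ plus a term scaled by ρ:

```latex
\begin{aligned}
y_i &= \mathbf{x}_i'\boldsymbol\beta + \delta d_i + e_i
  && \text{(outcome equation)}\\
d_i^* &= \mathbf{w}_i'\boldsymbol\gamma + u_i, \qquad d_i = \mathbf{1}[d_i^* > 0]
  && \text{(selection equation: ICT use)}\\
(e_i, u_i) &\sim \mathcal{N}_2\!\left(0, 0, \sigma^2, 1, \rho\right)\\
E[y_i \mid d_i = 1] - E[y_i \mid d_i = 0]
  &= \delta + \rho\sigma\,
     \frac{\phi(\mathbf{w}_i'\boldsymbol\gamma)}
          {\Phi(\mathbf{w}_i'\boldsymbol\gamma)\,\bigl[1 - \Phi(\mathbf{w}_i'\boldsymbol\gamma)\bigr]}
\end{aligned}
```

When ρ = 0 the correction term vanishes and the impact reduces to δ.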
19 Recursive Bivariate Probit Model
- Maddala (1983), Greene (1998)
- Two equations, with ICT use as an endogenous independent variable
- Correlation between the disturbances of the two equations (ρ)
- If ρ ≠ 0, both direct and indirect effects are considered in the RBPM
- If ρ = 0, the binary response model (BRM) examines the direct impact only
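A generic form of the two equations, following the standard recursive bivariate probit setup (Maddala 1983; Greene 1998) with assumed notation (x_i, w_i, β, γ, δ, u_i), is:

```latex
\begin{aligned}
y_i^* &= \mathbf{x}_i'\boldsymbol\beta + \delta d_i + e_i, && y_i = \mathbf{1}[y_i^* > 0] \quad \text{(civic engagement)}\\
d_i^* &= \mathbf{w}_i'\boldsymbol\gamma + u_i, && d_i = \mathbf{1}[d_i^* > 0] \quad \text{(ICT use)}\\
(e_i, u_i) &\sim \mathcal{N}_2(0, 0, 1, 1, \rho)
\end{aligned}
```

Here δ carries the effect of ICT use in the engagement equation, and ρ links the two equations through their disturbances; with ρ = 0 the first equation is an ordinary probit that can be estimated on its own.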
20 Specification (RBPM)
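Estimation of a recursive bivariate probit rests on four joint probabilities P(y, d), each a standard bivariate normal CDF Φ2 evaluated at sign-adjusted indices. The sketch below computes them with Φ2 obtained by one-dimensional numerical integration (Simpson's rule) so that it needs only the standard library; the index values and ρ are hypothetical, and this is a generic illustration rather than the slide's own specification. Real estimation software uses dedicated bivariate normal routines (SciPy, for example, offers `scipy.stats.multivariate_normal.cdf`). The four cells summing to 1 is a quick consistency check.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bvn_cdf(a, b, rho, lower=-10.0, steps=2000):
    """P(Z1 <= a, Z2 <= b) for a standard bivariate normal with correlation rho (|rho| < 1).

    Integrates phi(x) * Phi((b - rho * x) / sqrt(1 - rho^2)) from `lower` to a
    with composite Simpson's rule (steps must be even).
    """
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)

    def f(x):
        return norm_pdf(x) * norm_cdf((b - rho * x) / s)

    h = (a - lower) / steps
    total = f(lower) + f(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(lower + k * h)
    return total * h / 3.0


def rbpm_cell_probs(xb, delta, wg, rho):
    """Joint probabilities P(y, d) in a recursive bivariate probit, keyed by (y, d)."""
    probs = {}
    for y in (0, 1):
        for d in (0, 1):
            q1, q2 = 2 * y - 1, 2 * d - 1  # sign flips for 0/1 outcomes
            probs[(y, d)] = bvn_cdf(q1 * (xb + delta * d), q2 * wg, q1 * q2 * rho)
    return probs


# Hypothetical index values and parameters; not estimates from the study
p = rbpm_cell_probs(xb=-0.5, delta=0.6, wg=0.2, rho=0.3)
print({k: round(v, 4) for k, v in p.items()})
```

The log-likelihood is the sum, over observations, of the log of the cell probability matching each observed (y, d) pair.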
21 Secondary Data
- The PEW Internet and American Life Project
- 2004 Post-Election Internet Tracking Survey (cross-sectional)
- N = 2,146
- The American National Election Studies
- Longitudinal data from 1996, 1998, 2000, and 2004
- N = 6,014
22 Illustration 1: E-government Use
- IV (d): whether citizens look for information on government websites
- DV: whether citizens sent email about voting (deliberative civic engagement)
- DV: attendance at a rally during the election campaign (action-oriented engagement)
23 Illustration 1: E-government Use
- Average effect: 9.8 vs. 2.2
- Discrete change: 15.3 vs. 3.3
24 Illustration 1: E-government Use
25 Illustration 2: Internet Use
- IV (d): whether citizens have used the Internet for political information
- DV: discussing politics (deliberative civic engagement)
- DV: whether citizens gave money to a candidate (action-oriented engagement)
26 Illustration 2: Internet Use
- Average effect: 10.1 vs. 4.4
- Discrete change: 8.3 vs. 5.2
27 Illustration 2: Internet Use
28 Finding 1: T-test vs. PSM
- PSM gives robust estimates at the expense of a loss of N
- The T-test overestimates the impact on deliberative civic engagement because of the missing data problem
- No big difference in action-oriented engagement
29 Finding 2: BRM vs. RBPM
- BRM overestimates the impact on deliberative civic engagement: endogeneity matters
- Both direct and indirect effects
- No big difference in action-oriented engagement: the impact of ICT use is direct
30 Finding 3: Deliberative Engagement
- Both direct and indirect effects considered
- The overall impact depends on the signs and magnitudes of the effects
- They may have opposite signs that cancel each other out
- BRM may report misleading results
31 Implications and Conclusion
- Types of civic engagement should be differentiated: variety of civic engagement (Verba et al. 1995)
- Characteristics of dependent variables should be carefully examined
- Causal structure, endogeneity, the missing data problem, and sample size should be considered
- Specific uses of ICT applications should be differentiated as well
32 Questions?