Title: Recent Advances in Causal Modelling Using Directed Graphs
1. Causal Data Mining
Richard Scheines
Dept. of Philosophy, Machine Learning, Human-Computer Interaction
Carnegie Mellon
2. Causal Graphs
- Causal Graph G = {V, E}
- Each edge X → Y represents a direct causal claim:
- X is a direct cause of Y relative to V
[Figure: example causal graph including a Chicken Pox node]
3. Causal Bayes Networks
The joint distribution factors according to the causal graph, i.e., over all X in V:
$P(V) = \prod_{X \in V} P(X \mid \text{Immediate Causes of}(X))$
- P(S = 0) = .7
- P(S = 1) = .3
- P(YF = 0 | S = 0) = .99    P(LC = 0 | S = 0) = .95
- P(YF = 1 | S = 0) = .01    P(LC = 1 | S = 0) = .05
- P(YF = 0 | S = 1) = .20    P(LC = 0 | S = 1) = .80
- P(YF = 1 | S = 1) = .80    P(LC = 1 | S = 1) = .20
$P(S, YF, LC) = P(S) \, P(YF \mid S) \, P(LC \mid S)$
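As a quick check on the factorization, the sketch below multiplies the conditional probability tables from this slide to build the full joint distribution over S, YF, and LC, then reads a marginal off it. The dictionary layout and the names `joint`, `table`, and `p_lc1` are illustrative assumptions, not part of Tetrad or of the original slides.

```python
from itertools import product

# Conditional probability tables copied from the slide.
p_s = {0: 0.7, 1: 0.3}                                   # P(S = s)
# p_yf[s][yf] = P(YF = yf | S = s); p_lc[s][lc] = P(LC = lc | S = s)
p_yf = {0: {0: 0.99, 1: 0.01}, 1: {0: 0.20, 1: 0.80}}
p_lc = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.80, 1: 0.20}}


def joint(s, yf, lc):
    """P(S, YF, LC) = P(S) * P(YF | S) * P(LC | S)."""
    return p_s[s] * p_yf[s][yf] * p_lc[s][lc]


# Full joint table over the eight value combinations.
table = {(s, yf, lc): joint(s, yf, lc)
         for s, yf, lc in product((0, 1), repeat=3)}

total = sum(table.values())                               # should be 1
# Marginal P(LC = 1), obtained by summing out S and YF.
p_lc1 = sum(p for (s, yf, lc), p in table.items() if lc == 1)

print(f"sum of joint = {total:.3f}, P(LC = 1) = {p_lc1:.3f}")
```

Three small tables (2 + 4 + 4 entries) are enough to determine all eight joint probabilities, which is the practical payoff of factoring according to the graph.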
4. Structural Equation Models
Causal Graph
- Structural Equations
- One equation for each variable V in the graph:
- V = f(parents(V), error_V)
- For SEM (linear regression), f is a linear function
- Statistical Constraints
- Joint distribution over the error terms
5. Structural Equation Models
- Equations
- $Education = \epsilon_{ed}$
- $Income = \beta_1 \, Education + \epsilon_{Income}$
- $Longevity = \beta_2 \, Education + \epsilon_{Longevity}$
- Statistical Constraints
- $(\epsilon_{ed}, \epsilon_{Income}, \epsilon_{Longevity}) \sim N(0, \Sigma^2)$
- $\Sigma^2$ diagonal
- - no variance is zero
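To make the model concrete, here is a minimal simulation sketch of a linear SEM with this structure: Education is driven only by its error term, and Income and Longevity each depend linearly on Education plus an independent error. The coefficient values, unit error variances, sample size, and all function names are hypothetical choices for illustration; none come from the slides.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical path coefficients (assumptions, not values from the slides).
B_INCOME, B_LONGEVITY = 0.8, 0.5


def draw_unit():
    """Draw one unit; errors are independent N(0, 1), i.e. a diagonal covariance."""
    education = random.gauss(0.0, 1.0)
    income = B_INCOME * education + random.gauss(0.0, 1.0)
    longevity = B_LONGEVITY * education + random.gauss(0.0, 1.0)
    return education, income, longevity


def corr(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y given Z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


sample = [draw_unit() for _ in range(20000)]
edu, inc, lon = zip(*sample)

r_inc_lon = corr(inc, lon)
r_inc_lon_given_edu = partial_corr(r_inc_lon, corr(inc, edu), corr(lon, edu))

print(f"corr(Income, Longevity)             = {r_inc_lon:.3f}")
print(f"corr(Income, Longevity | Education) = {r_inc_lon_given_edu:.3f}")
```

Income and Longevity come out correlated because they share the cause Education, while their partial correlation given Education is close to zero. That vanishing partial correlation is the kind of statistical constraint the causal graph implies.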
6. Tetrad 4 Demo: www.phil.cmu.edu/projects/tetrad
7. Causal Data Mining in Ed. Research
- Collect Raw Data
- Build Meaningful Variables
- Constrain Model Space with Background Knowledge
- Search for Models
- Estimate and Test
- Interpret
8. CSR Online
Are online students learning as much?
What features of online behavior matter?
9. CSR Online
Are online students learning as much?
Raw Data: Pitt 2001, 87 students
For everyone: pre-test, recitation attendance, final exam
For online students (logged): voluntary question attempts, online quizzes, requests to print modules
10. CSR Online
- Build Meaningful Variables
- Online (0 or 1)
- Pre-test
- Recitation Attendance
- Final Exam
11. CSR Online
- Data: Correlation Matrix (corrs.dat, N = 83)
          Pre     Online  Rec     Final
Pre       1.0
Online     .023   1.0
Rec       -.004   -.255   1.0
Final      .287    .182    .297   1.0
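Constraint-based searches decide which edges to keep by testing whether correlations (or partial correlations) differ from zero. One common choice for correlation data is the Fisher z test, sketched below on the matrix above with N = 83. Whether the demo used this exact test or a .05 threshold is not stated in the slides, so both are assumptions here, as are the names `fisher_z_pvalue` and `ALPHA`.

```python
from math import atanh, erf, sqrt

N = 83          # sample size reported with corrs.dat
ALPHA = 0.05    # assumed significance level, not given in the slides

# Lower triangle of the correlation matrix from the slide.
corrs = {
    ("Pre", "Online"): 0.023,
    ("Pre", "Rec"): -0.004,
    ("Online", "Rec"): -0.255,
    ("Pre", "Final"): 0.287,
    ("Online", "Final"): 0.182,
    ("Rec", "Final"): 0.297,
}


def fisher_z_pvalue(r, n, n_cond=0):
    """Two-sided p-value for H0: (partial) correlation is zero.

    n_cond is the number of conditioning variables (0 for a plain correlation).
    """
    z = atanh(r) * sqrt(n - n_cond - 3)
    # Two-sided tail of the standard normal: 1 - erf(|z| / sqrt(2)).
    return 1.0 - erf(abs(z) / sqrt(2.0))


pvalues = {pair: fisher_z_pvalue(r, N) for pair, r in corrs.items()}

for (a, b), p in pvalues.items():
    verdict = "dependent" if p < ALPHA else "independence not rejected"
    print(f"{a:>6} - {b:<6} r = {corrs[(a, b)]:+.3f}  p = {p:.3f}  {verdict}")
```

With only 83 cases, small correlations such as Pre and Online at .023 are far from distinguishable from zero, which is why a search would treat those two variables as unconnected.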
12. CSR Online
- Background Knowledge
- Temporal Tiers
- - Tier 1: Online, Pre
- - Tier 2: Rec
- - Tier 3: Final
13. CSR Online
- Model Search
- No latents (patterns with PC or GES)
- - no time order: 729 models
- - temporal tiers: 96 models
- With latents (PAGs with FCI search)
- - no time order: 4,096 models
- - temporal tiers: 2,916 models
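The two no-latent counts are consistent with a simple enumeration, sketched below: each of the six variable pairs is either unconnected or joined by an edge in one of two directions, and temporal tiers forbid edges that point from a later tier to an earlier one. This is one reading of where 729 and 96 could come from, an assumption rather than something the slides state. It applies no acyclicity check, and the tier assignment follows the Background Knowledge slide.

```python
from itertools import combinations, product

# Temporal tiers from the background-knowledge slide (lower = earlier).
TIER = {"Online": 0, "Pre": 0, "Rec": 1, "Final": 2}
PAIRS = list(combinations(TIER, 2))          # the 6 unordered variable pairs
STATES = ("none", "a->b", "a<-b")            # possible states for each pair


def allowed(pair, state):
    """An edge may not point from a later tier into an earlier tier."""
    a, b = pair
    if state == "a->b":
        return TIER[a] <= TIER[b]
    if state == "a<-b":
        return TIER[b] <= TIER[a]
    return True                               # "none" is always allowed


def count_models(use_tiers):
    """Count edge assignments over all pairs, optionally respecting tiers."""
    total = 0
    for assignment in product(STATES, repeat=len(PAIRS)):
        if not use_tiers or all(allowed(p, s) for p, s in zip(PAIRS, assignment)):
            total += 1
    return total


print("no time order: ", count_models(use_tiers=False))
print("temporal tiers:", count_models(use_tiers=True))
```

Under this reading the tiers remove one orientation from five of the six pairs, leaving only the Online and Pre pair free to point either way, so background knowledge alone cuts the space by a large factor before any data are examined.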
14. Tetrad Demo
- Online vs. Lecture
- Data file: corrs.dat
15. Estimate and Test Results
- Model fit excellent
- Online students attended 10 fewer recitations
- Each recitation gives an increase of 2 on the final exam
- Online students did 1/2 a Stdev better than lecture students (p = .059)
16. References
- An Introduction to Causal Inference (1997), R. Scheines, in Causality in Crisis?, V. McKim and S. Turner (eds.), Univ. of Notre Dame Press, pp. 185-200.
- Causation, Prediction, and Search, 2nd Edition (2000), P. Spirtes, C. Glymour, and R. Scheines, MIT Press.
- Causality: Models, Reasoning, and Inference (2000), Judea Pearl, Cambridge Univ. Press.
- Causal Inference (2004), P. Spirtes, R. Scheines, C. Glymour, T. Richardson, and C. Meek, in Handbook of Quantitative Methodology in the Social Sciences, David Kaplan (ed.), Sage Publications, pp. 447-478.
- Computation, Causation, Discovery (1999), C. Glymour and G. Cooper (eds.), MIT Press.