Title: Recent Advances in Causal Modelling Using Directed Graphs
Automatic Causal Discovery
Richard Scheines, Peter Spirtes, Clark Glymour,
and many others
Dept. of Philosophy and CALD
Carnegie Mellon
Outline
- Motivation
- Representation
- Discovery
- TETRAD vs. Regression
Tetrad 4 Demo
- Login
- philoso
- guest
- www.phil.cmu.edu/projects/tetrad_download/
- Launch Tetrad
1. Motivation
- Non-experimental Evidence
- Typical Predictive Questions
  - Can we predict aggressiveness from the amount of violent TV watched?
  - Can we predict crime rates from abortion rates 20 years ago?
- Causal Questions
  - Does watching violent TV cause aggression?
  - Does abortion reduce crime?
Causal Estimation
When and how can we use non-experimental data to tell us about the effect of an intervention?
- Estimate the manipulated probability $P(Y \mid X \text{ set= } x, Z = z)$
- from the unmanipulated probability $P(Y \mid X = x, Z = z)$
Conditioning vs. Intervening
$P(Y \mid X = x_1)$ vs. $P(Y \mid X \text{ set= } x_1)$ (Teeth slides)
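A small numerical sketch of the contrast, assuming the three-variable network S → YF, S → LC with the CPT values given on the Bayes Networks slides (reading S, YF and LC as smoking, yellow fingers and lung cancer is our assumption). Observing YF = 1 is evidence about S and therefore about LC; setting YF = 1 by an ideal intervention cuts S → YF, so LC keeps its unmanipulated marginal distribution.

```python
# Sketch: conditioning vs. intervening in the network S -> YF, S -> LC.
# CPT values follow the slides' example; variable meanings are assumed.
P_S = {0: 0.7, 1: 0.3}
P_YF1_given_S = {0: 0.01, 1: 0.80}   # P(YF = 1 | S = s)
P_LC1_given_S = {0: 0.05, 1: 0.20}   # P(LC = 1 | S = s)

def p_lc1_given_yf1_observed() -> float:
    """P(LC = 1 | YF = 1): condition on observing YF = 1."""
    joint = sum(P_S[s] * P_YF1_given_S[s] * P_LC1_given_S[s] for s in P_S)
    marginal_yf1 = sum(P_S[s] * P_YF1_given_S[s] for s in P_S)
    return joint / marginal_yf1

def p_lc1_given_yf1_set() -> float:
    """P(LC = 1 | YF set= 1): the intervention removes S -> YF, so S
    (and therefore LC) keeps its pre-intervention distribution."""
    return sum(P_S[s] * P_LC1_given_S[s] for s in P_S)

print(f"conditioning: {p_lc1_given_yf1_observed():.4f}")  # conditioning: 0.1957
print(f"intervening:  {p_lc1_given_yf1_set():.4f}")        # intervening:  0.0950
```

Conditioning roughly doubles the probability of LC = 1 because YF = 1 makes S = 1 more likely; intervening on YF leaves it at the marginal value.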
2. Representation
- Representing causal structure, and modeling interventions
- Statistical Causal Models
  - Bayes Networks
  - Structural Equation Models
Causation and Association
- X and Y are associated iff $\exists\, x_1 \neq x_2:\; P(Y \mid X = x_1) \neq P(Y \mid X = x_2)$
- X is a cause of Y iff $\exists\, x_1 \neq x_2:\; P(Y \mid X \text{ set= } x_1) \neq P(Y \mid X \text{ set= } x_2)$
Direct Causation
- X is a direct cause of Y relative to S iff
  - $\exists\, z, x_1 \neq x_2:\; P(Y \mid X \text{ set= } x_1, Z \text{ set= } z) \neq P(Y \mid X \text{ set= } x_2, Z \text{ set= } z)$
  - where $Z = S \setminus \{X, Y\}$
Causal Graphs
- Causal graph $G = \langle V, E \rangle$
- Each edge $X \to Y$ represents a direct causal claim:
  - X is a direct cause of Y relative to V
Modeling Ideal Interventions
- Ideal interventions (on a variable X)
  - Completely determine the value or distribution of a variable X
  - Directly target only X (no "fat hand")
- E.g., variables: Confidence, Athletic Performance
  - Intervention 1: hypnosis for confidence
  - Intervention 2: anti-anxiety drug (also a muscle relaxer)
Modeling Ideal Interventions
Interventions on the Effect
[Figure: pre-experimental system and post-experimental system]
Modeling Ideal Interventions
Interventions on the Cause
[Figure: pre-experimental system and post-experimental system]
Interventions and Causal Graphs
- Model an ideal intervention by adding an intervention variable outside the original system
- Erase all arrows pointing into the variable intervened upon
[Figure: pre-intervention graph; intervene to change Inf; post-intervention graph?]
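The two rules above can be sketched as a graph operation. This assumes a DAG stored as a Python dict mapping each variable to its set of parents; the function `intervene` and the default name `I` for the intervention variable are hypothetical, not part of Tetrad.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # each variable -> set of its direct causes (parents)

def intervene(graph: Graph, target: str, intervention_var: str = "I") -> Graph:
    """Post-intervention graph for an ideal intervention on `target`:
    erase every arrow into `target` and add `intervention_var -> target`,
    where the intervention variable lies outside the original system."""
    if target not in graph:
        raise KeyError(f"unknown variable: {target}")
    if intervention_var in graph:
        raise ValueError(f"{intervention_var!r} already names a system variable")
    post = {v: set(parents) for v, parents in graph.items()}  # leave input intact
    post[target] = {intervention_var}   # only the intervention now causes target
    post[intervention_var] = set()      # exogenous: no causes inside the system
    return post

# Hypothetical example: intervene on YF in S -> YF, S -> LC.
pre = {"S": set(), "YF": {"S"}, "LC": {"S"}}
post = intervene(pre, "YF")
print(post)  # {'S': set(), 'YF': {'I'}, 'LC': {'S'}, 'I': set()}
```

Because the target's old parent set is replaced rather than extended, the intervention "directly targets only X" and completely determines it, as in the definition of an ideal intervention.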
Bayes Networks
The joint distribution factors according to the graph, i.e., for all X in V:
$P(V) = \prod_{X \in V} P(X \mid \text{Parents}(X))$
$P(S, YF, LC) = P(S)\,P(YF \mid S)\,P(LC \mid S)$
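Bayes Networks
$P(S, YF, LC) = P(S)\,P(YF \mid S)\,P(LC \mid S)$
- $P(S = 0) = .7$
- $P(S = 1) = .3$
- $P(YF = 0 \mid S = 0) = .99$, $P(LC = 0 \mid S = 0) = .95$
- $P(YF = 1 \mid S = 0) = .01$, $P(LC = 1 \mid S = 0) = .05$
- $P(YF = 0 \mid S = 1) = .20$, $P(LC = 0 \mid S = 1) = .80$
- $P(YF = 1 \mid S = 1) = .80$, $P(LC = 1 \mid S = 1) = .20$

$P(S = 1, YF = 1, LC = 1) = P(S = 1)\,P(YF = 1 \mid S = 1)\,P(LC = 1 \mid S = 1) = .3 \times .80 \times .20 = .048$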
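The factorization can be checked directly in code. A minimal sketch, assuming binary variables and the CPT values listed above; it recomputes P(S = 1, YF = 1, LC = 1) and confirms that the eight joint probabilities sum to 1.

```python
from itertools import product

# CPTs from the slide (binary S, YF, LC).
P_S = {0: 0.7, 1: 0.3}
P_YF_given_S = {(0, 0): 0.99, (1, 0): 0.01, (0, 1): 0.20, (1, 1): 0.80}  # key: (yf, s)
P_LC_given_S = {(0, 0): 0.95, (1, 0): 0.05, (0, 1): 0.80, (1, 1): 0.20}  # key: (lc, s)

def joint(s: int, yf: int, lc: int) -> float:
    """P(S=s, YF=yf, LC=lc) = P(S=s) * P(YF=yf | S=s) * P(LC=lc | S=s)."""
    return P_S[s] * P_YF_given_S[(yf, s)] * P_LC_given_S[(lc, s)]

print(round(joint(1, 1, 1), 6))  # 0.048
total = sum(joint(s, yf, lc) for s, yf, lc in product((0, 1), repeat=3))
print(round(total, 6))           # 1.0
```

Storing one small table per variable, rather than the full eight-entry joint, is exactly the saving the factorization buys; the savings grow quickly with more variables.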
Causal Bayes Networks
The joint distribution factors according to the causal graph, i.e., for all X in V:
$P(V) = \prod_{X \in V} P(X \mid \text{Immediate Causes of}(X))$
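- $P(S = 0) = .7$
- $P(S = 1) = .3$
- $P(YF = 0 \mid S = 0) = .99$, $P(LC = 0 \mid S = 0) = .95$
- $P(YF = 1 \mid S = 0) = .01$, $P(LC = 1 \mid S = 0) = .05$
- $P(YF = 0 \mid S = 1) = .20$, $P(LC = 0 \mid S = 1) = .80$
- $P(YF = 1 \mid S = 1) = .80$, $P(LC = 1 \mid S = 1) = .20$

$P(S, YF, LC) = P(S)\,P(YF \mid S)\,P(LC \mid S)$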
Structural Equation Models
[Figure: causal graph]
- Statistical Model
  - 1. Structural Equations
  - 2. Statistical Constraints
Structural Equation Models
[Figure: causal graph]
- Structural Equations
  - One equation for each variable V in the graph: $V = f(\text{parents}(V), \varepsilon_V)$
  - For SEM (linear regression), f is a linear function
- Statistical Constraints
  - Joint distribution over the error terms
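Structural Equation Models
[Figure: causal graph]
- Equations
  - $\text{Education} = \varepsilon_{ed}$
  - $\text{Income} = \beta_1\,\text{Education} + \varepsilon_{Income}$
  - $\text{Longevity} = \beta_2\,\text{Education} + \varepsilon_{Longevity}$
- Statistical Constraints
  - $(\varepsilon_{ed}, \varepsilon_{Income}, \varepsilon_{Longevity}) \sim N(0, \Sigma^2)$
  - $\Sigma^2$ diagonal
  - no variance is zero
[Figure: SEM graph (path diagram)]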
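A minimal sketch of how this linear SEM generates data, assuming numpy. The coefficients β1 = 0.5, β2 = 0.3 and the unit error variances are illustrative assumptions, not values from the slides; the errors are drawn independently, matching a diagonal Σ² with no zero variance.

```python
import numpy as np

def simulate_sem(n: int, beta1: float = 0.5, beta2: float = 0.3, seed: int = 0):
    """Draw n samples from Education -> Income, Education -> Longevity.
    Coefficients and unit error variances are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    # Independent standard normal errors: diagonal covariance, no zero variance.
    eps_ed, eps_income, eps_longevity = rng.standard_normal((3, n))
    education = eps_ed
    income = beta1 * education + eps_income
    longevity = beta2 * education + eps_longevity
    return education, income, longevity

education, income, longevity = simulate_sem(100_000)
# The sample regression slope of Income on Education should be close to beta1.
slope = np.cov(education, income)[0, 1] / np.var(education, ddof=1)
print(round(slope, 2))
```

Each variable is computed only from its parents and its own error term, one equation per variable, which is what makes the model structural rather than merely a set of regressions.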
Tetrad 4 Demo
- www.phil.cmu.edu/projects/tetrad_download/
- Launch Tetrad
- 1. Build a Causal Graph
- 2. Parameterize it as Bayes net
- 3. Parameterize it as a SEM
- 4. Generate Pseudo-random data from each
The Markov Condition
[Figure: causal graphs, linked by the Markov condition to statistical predictions]
Independence: $X \perp\!\!\!\perp Z \mid Y$, i.e., $P(X \mid Y) = P(X \mid Y, Z)$
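For a discrete illustration, a sketch that checks the independence YF ⫫ LC | S entailed by the Markov condition for S → YF, S → LC, using the CPT values from the Bayes Networks slides. The helper names are hypothetical; the check compares P(YF | S) with P(YF | S, LC) for every assignment, then shows that YF and LC remain associated when S is not conditioned on.

```python
from itertools import product

# Joint P(S, YF, LC) for S -> YF, S -> LC, built from the slides' CPT values.
P_S = {0: 0.7, 1: 0.3}
P_YF1 = {0: 0.01, 1: 0.80}  # P(YF = 1 | S = s)
P_LC1 = {0: 0.05, 1: 0.20}  # P(LC = 1 | S = s)
NAMES = ("s", "yf", "lc")

def bern(p1: float, value: int) -> float:
    """P(V = value) for a binary V with P(V = 1) = p1."""
    return p1 if value == 1 else 1.0 - p1

joint = {(s, yf, lc): P_S[s] * bern(P_YF1[s], yf) * bern(P_LC1[s], lc)
         for s, yf, lc in product((0, 1), repeat=3)}

def prob(**fixed: int) -> float:
    """Marginal probability of a partial assignment, e.g. prob(s=1, lc=0)."""
    return sum(p for key, p in joint.items()
               if all(key[NAMES.index(name)] == v for name, v in fixed.items()))

def markov_holds(tol: float = 1e-9) -> bool:
    """Check YF _||_ LC | S: P(YF | S) == P(YF | S, LC) for every assignment."""
    return all(
        abs(prob(yf=yf, s=s) / prob(s=s) - prob(yf=yf, s=s, lc=lc) / prob(s=s, lc=lc)) < tol
        for s, yf, lc in product((0, 1), repeat=3))

print(markov_holds())  # True
# Without conditioning on S, YF and LC are associated:
print(prob(yf=1, lc=1) / prob(lc=1), prob(yf=1, lc=0) / prob(lc=0))
```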
Causal Structure → Statistical Data
Causal Discovery: Statistical Data → Causal Structure
D-separation Equivalence
- D-separation Equivalence Theorem (Verma and Pearl, 1988)
  - Two acyclic graphs over the same set of variables are d-separation equivalent iff they have
    - the same adjacencies
    - the same unshielded colliders
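The theorem gives a purely graphical test. A sketch, assuming DAGs stored as dicts from each variable to its set of parents (every variable must appear as a key); the function names are illustrative. Adjacencies are compared as unordered pairs, and an unshielded collider is a triple a → c ← b with a and b non-adjacent.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

DAG = Dict[str, Set[str]]  # child -> set of parents

def adjacencies(g: DAG) -> Set[FrozenSet[str]]:
    """Unordered pairs of variables joined by an edge in either direction."""
    return {frozenset((p, c)) for c, ps in g.items() for p in ps}

def unshielded_colliders(g: DAG) -> Set[Tuple[str, str, str]]:
    """Triples (a, c, b) with a -> c <- b, a and b non-adjacent, a < b."""
    adj = adjacencies(g)
    return {(a, c, b) for c, ps in g.items()
            for a, b in combinations(sorted(ps), 2)
            if frozenset((a, b)) not in adj}

def d_separation_equivalent(g1: DAG, g2: DAG) -> bool:
    """Verma-Pearl criterion for acyclic graphs over the same variables."""
    if set(g1) != set(g2):
        raise ValueError("graphs must share the same variable set")
    return (adjacencies(g1) == adjacencies(g2)
            and unshielded_colliders(g1) == unshielded_colliders(g2))

chain    = {"X": set(), "Y": {"X"}, "Z": {"Y"}}       # X -> Y -> Z
fork     = {"X": {"Y"}, "Y": set(), "Z": {"Y"}}       # X <- Y -> Z
collider = {"X": set(), "Y": {"X", "Z"}, "Z": set()}  # X -> Y <- Z
print(d_separation_equivalent(chain, fork))      # True
print(d_separation_equivalent(chain, collider))  # False
```

The chain and the fork entail the same independence (X ⫫ Z | Y), whereas the collider entails X ⫫ Z marginally, which is why only the collider falls outside the class.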
Representations of D-separation Equivalence Classes
- We want the representations to
  - characterize the independence relations entailed by the equivalence class
  - represent causal features that are shared by every member of the equivalence class
Patterns and PAGs
- Patterns (Verma and Pearl, 1990): graphical representation of an acyclic d-separation equivalence class, with no latent variables.
- PAGs (Richardson, 1994): graphical representation of an equivalence class of models, including latent variable models and models with sample selection bias, that are d-separation equivalent over a set of measured variables X
Patterns
Patterns: What the Edges Mean
Patterns
Not all Boolean combinations of orientations of unoriented pattern adjacencies occur in the equivalence class.
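A brute-force sketch of this point for the smallest case, the pattern X - Y - Z with X and Z non-adjacent (our example, not from the slides). Of the 2 × 2 = 4 orientations of its two edges, only those that create no unshielded collider belong to the class; every orientation keeps the same adjacencies, so by the Verma and Pearl theorem the colliders are the only thing to check.

```python
from itertools import product

# Pattern X - Y - Z: two unoriented adjacencies, X and Z not adjacent.
undirected = [("X", "Y"), ("Y", "Z")]

def unshielded_colliders(arrows):
    """Triples (a, c, b) with a -> c <- b, a < b, and a, b non-adjacent."""
    adjacent = {frozenset(edge) for edge in arrows}
    return {(a, c, b) for (a, c) in arrows for (b, d) in arrows
            if c == d and a < b and frozenset((a, b)) not in adjacent}

in_class, excluded = [], []
for flips in product((False, True), repeat=len(undirected)):
    # flips[i] says whether to reverse the i-th edge from its listed order.
    arrows = [(b, a) if flip else (a, b) for (a, b), flip in zip(undirected, flips)]
    # Every orientation has the same adjacencies, so membership in the class
    # depends only on whether an unshielded collider appears.
    if unshielded_colliders(arrows):
        excluded.append(arrows)
    else:
        in_class.append(arrows)

print(len(in_class), "of", 2 ** len(undirected), "orientations are in the class")
print("excluded:", excluded)  # excluded: [[('X', 'Y'), ('Z', 'Y')]]
```

The two edges cannot be oriented independently: choosing X → Y forces Y → Z, and choosing Z → Y forces Y → X.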
PAGs: Partial Ancestral Graphs
What PAG edges mean.
Overview of Search Methods
- Constraint-Based Searches
  - TETRAD
- Scoring Searches
  - Scores: BIC, AIC, etc.
  - Search: hill climbing, genetic algorithms, simulated annealing
  - Very difficult to extend to latent variable models
  - Heckerman, Meek and Cooper (1999). "A Bayesian Approach to Causal Discovery," ch. 4 in Computation, Causation, and Discovery, ed. by Glymour and Cooper, MIT Press, pp. 141-166
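To make "score" concrete, a sketch of a BIC-style score for a linear-Gaussian DAG, assuming numpy: each node is regressed on its parents by OLS, and the maximized Gaussian log-likelihood is penalized by (k/2) log n. The higher-is-better sign convention and the helper names are our assumptions; this is not Tetrad's or the cited chapter's implementation, and a real scoring search would wrap such a score in hill climbing or a similar procedure.

```python
import numpy as np

def node_bic(data: np.ndarray, child: int, parents: list) -> float:
    """Gaussian log-likelihood of `child` regressed on `parents` (OLS with
    intercept), minus (k / 2) * log(n), where k counts the free parameters."""
    n = data.shape[0]
    y = data[:, child]
    design = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / n                      # MLE of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = design.shape[1] + 1                         # coefficients + variance
    return loglik - 0.5 * k * np.log(n)

def dag_bic(data: np.ndarray, parents_of: dict) -> float:
    """The score decomposes into one term per node given its parents."""
    return sum(node_bic(data, c, ps) for c, ps in parents_of.items())

rng = np.random.default_rng(1)
x = rng.standard_normal(2_000)
y = 0.8 * x + rng.standard_normal(2_000)   # simulated truth: X -> Y
data = np.column_stack([x, y])
print(dag_bic(data, {0: [], 1: [0]}) > dag_bic(data, {0: [], 1: []}))  # True
```

Reversing the edge (column 0 with parent 1) gives the same score in this linear-Gaussian setting, because X → Y and Y → X are d-separation equivalent; a score alone cannot orient that edge.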
Tetrad 4 Demo
- www.phil.cmu.edu/projects/tetrad_download/
- 1. Apply search to Causal Graph
- 2. Include latent variables
- 3. Apply search to your own data
- 4. Generate data - send to partner
4. Problems with Using Regression for Causal Inference
Regression to Estimate Causal Influence
- Let $\mathbf{V} = \{\mathbf{X}, Y, \mathbf{T}\}$, where
  - Y: measured outcome
  - $\mathbf{X} = \{X_1, X_2, \ldots, X_n\}$: measured regressors
  - $\mathbf{T} = \{T_1, \ldots, T_k\}$: latent common causes of pairs in $\mathbf{X} \cup \{Y\}$
- Let the true causal model over $\mathbf{V}$ be a structural equation model in which each $V \in \mathbf{V}$ is a linear combination of its direct causes and independent Gaussian noise.
Regression to Estimate Causal Influence
- Consider the regression equation
  - $Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n$
- Let the OLS regression estimate $\hat{b}_i$ be the estimated causal influence of $X_i$ on Y.
  - That is, holding $\mathbf{X} \setminus X_i$ experimentally constant, $\hat{b}_i$ is an estimate of the change in $E(Y)$ that results from an intervention that changes $X_i$ by 1 unit.
- Let the real causal influence of $X_i$ on Y ($X_i \to Y$) be $b_i$.
- When is the OLS estimate $\hat{b}_i$ an unbiased estimate of $b_i$?
Regression Example
[Figure: true graph over X1, X2, X3 and Y with coefficients b1, b2, b3, and the corresponding PAG]
Regression Bias
- If
  - $X_i$ is d-separated from Y conditional on $\mathbf{X} \setminus X_i$ in the true graph after removing $X_i \to Y$, and
  - $\mathbf{X}$ contains no descendant of Y,
- then $\hat{b}_i$ is an unbiased estimate of $b_i$.
- See Using Path Diagrams .
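A simulation sketch of the first condition failing, assuming numpy and an illustrative linear SEM (our numbers, not the slides'): a latent T causes both X1 and Y, and X1 → Y with true coefficient 0.5. With T unmeasured, X1 and Y stay d-connected through X1 ← T → Y after removing X1 → Y, so OLS is biased; including T as a regressor closes that path.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
b1_true = 0.5                      # illustrative true effect of X1 on Y

t = rng.standard_normal(n)         # latent common cause of X1 and Y
x1 = t + rng.standard_normal(n)    # X1 <- T
y = b1_true * x1 + t + rng.standard_normal(n)  # Y <- X1, Y <- T

def ols(target: np.ndarray, *regressors: np.ndarray) -> np.ndarray:
    """OLS coefficients (intercept first) of target on the regressors."""
    design = np.column_stack([np.ones(len(target)), *regressors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

# T unmeasured: the path X1 <- T -> Y remains open, so the estimate is biased
# (in this setup it converges to 1.0 rather than 0.5).
b1_latent = ols(y, x1)[1]
# T measured and included: once X1 -> Y is removed, X1 is d-separated from Y
# given T, so the coefficient on X1 is an unbiased estimate of 0.5.
b1_adjusted = ols(y, x1, t)[1]
print(round(b1_latent, 2), round(b1_adjusted, 2))
```

The limit 1.0 follows from the model: Cov(X1, Y) = 0.5 · Var(X1) + Cov(X1, T) = 0.5 · 2 + 1 = 2, divided by Var(X1) = 2.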
Tetrad 4 Demo
- www.phil.cmu.edu/projects/tetrad_download/
- 1. Build a causal graph among X1, X2, X3, Y (with latent variables)
- 2. Build a SEM, generate pseudo-random data (N = 10,000), and save it to the desktop
- 3. Apply FCI
- 4. Use Minitab to do a regression
References
- Causation, Prediction, and Search, 2nd Edition (2000), by P. Spirtes, C. Glymour, and R. Scheines, MIT Press
- Causality: Models, Reasoning, and Inference (2000), by Judea Pearl, Cambridge Univ. Press
- Computation, Causation, and Discovery (1999), edited by C. Glymour and G. Cooper, MIT Press
- Causality in Crisis? (1997), edited by V. McKim and S. Turner, Univ. of Notre Dame Press
- TETRAD IV: www.phil.cmu.edu/projects/tetrad
- Web Course on Causal and Statistical Reasoning: www.phil.cmu.edu/projects/csr/