Title: Why causality is important
1Why causality is important
- Conceptual structure
- Asymmetric dependence of features
2Why causality is important
- Conceptual structure
- Explaining transformation studies (Keil, Rips)
3Why causality is important
- Conceptual structure
- Successful action planning
- Treat the disease, not the symptom
4Why causality is important
- Conceptual structure
- Successful action planning
- Regime change
5Why inferring causal structure is hard
- Causation ≠ Correlation
- If A and B are correlated, we could have at least three different causal structures:
- A → B
- B → A
- A ← X → B
- How do we learn the underlying causal structure, if all we observe are correlations?
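A minimal simulation sketch of the point above: three different generating structures can all yield a strong A-B correlation. The structure labels, the fidelity values (0.8, 0.9), and the helper names are illustrative assumptions, not part of the original slides.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def noisy_copy(rng, value, fidelity):
    """Return `value` with probability `fidelity`, otherwise its flip."""
    return value if rng.random() < fidelity else 1 - value


def sample(structure, n=5000, seed=0):
    """Draw n binary (A, B) pairs from one hypothetical causal structure."""
    rng = random.Random(seed)
    a_vals, b_vals = [], []
    for _ in range(n):
        if structure == "A->B":          # A causes B
            a = rng.randint(0, 1)
            b = noisy_copy(rng, a, 0.8)
        elif structure == "B->A":        # B causes A
            b = rng.randint(0, 1)
            a = noisy_copy(rng, b, 0.8)
        elif structure == "A<-X->B":     # hidden common cause X
            x = rng.randint(0, 1)
            a = noisy_copy(rng, x, 0.9)
            b = noisy_copy(rng, x, 0.9)
        else:
            raise ValueError(structure)
        a_vals.append(a)
        b_vals.append(b)
    return a_vals, b_vals


correlations = {s: pearson(*sample(s)) for s in ("A->B", "B->A", "A<-X->B")}
for s, r in correlations.items():
    print(f"{s:8s} corr(A, B) = {r:.2f}")
```

All three structures give a clearly positive correlation, so the correlation alone does not identify which structure produced the data.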
6Hume (1748)
- Causal structure is inferred from the constant conjunction of two kinds of events.
- "We may define a cause to be an object, followed by another, and where all the objects, similar to the first, are followed by objects similar to the second."
7Hume (1748)
- Causal structure is inferred from the constant conjunction of two kinds of events.
- "We may define a cause to be an object, followed by another, and where all the objects, similar to the first, are followed by objects similar to the second."
- Example: Observe B always follows A.
[Figure: the three candidate structures A → B, B → A, and A ← X → B]
8Hume (1748)
- Causal structure is inferred from the constant conjunction of two kinds of events.
- "We may define a cause to be an object, followed by another, and where all the objects, similar to the first, are followed by objects similar to the second."
- Example: Observe B always follows A.
[Figure: the three candidate structures A → B, B → A, and A ← X → B]
9Hume (1748)
- Causal structure is inferred from the constant conjunction of two kinds of events.
- "Every idea is copied from some preceding impression or sentiment and where we cannot find any impression, we may be certain that there is no idea."
- Example: Observe B always follows A.
[Figure: the three candidate structures A → B, B → A, and A ← X → B]
10Hume's legacy in psychology: associationism
11Hume's legacy in psychology: associationism
- Pavlov
- Human causal learning
Given a sample of mice:
                     Injected with X   Not injected with X
Expressed Y                15                  20
Did not express Y          15                  10
To what extent does chemical X cause gene Y to be expressed?
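One way to put a number on the question above is the ΔP contingency measure (the "DP" model evaluated on later slides). The sketch below assumes the counts are read row by row, so that 15 of 30 injected mice and 20 of 30 non-injected mice expressed Y; that reading of the table and the function name are assumptions.

```python
from fractions import Fraction


def delta_p(e_and_c, not_e_and_c, e_and_not_c, not_e_and_not_c):
    """ΔP = P(effect | cause) - P(effect | no cause), from the four cell counts."""
    p_e_given_c = Fraction(e_and_c, e_and_c + not_e_and_c)
    p_e_given_not_c = Fraction(e_and_not_c, e_and_not_c + not_e_and_not_c)
    return p_e_given_c - p_e_given_not_c


# Assumed reading of the mouse table:
#   injected with X:     15 expressed Y, 15 did not
#   not injected with X: 20 expressed Y, 10 did not
dp = delta_p(15, 15, 20, 10)
print(dp, float(dp))
```

Under that reading ΔP is negative, because Y is expressed less often in injected mice than in non-injected mice.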
12Hume's legacy in psychology: associationism
- Pavlov
- Human causal learning
- Rescorla-Wagner model of causal strength
- Cue-competition
- Blocking
We may define a cause to be an object, followed
by another, and where all the objects, similar to
the first, are followed by objects similar to the
second. Or, in other words, where, if the first
object had not been, the second never had
existed. (Hume)
13Levels of analysis (Shanks)
- Level 1: Computational theory
- A rational measure of causal strength
- ΔP = P(E = 1 | C = 1) − P(E = 1 | C = 0)
- Level 2: Representation and algorithm
- Linear model for predicting effects from causes
- Rescorla-Wagner algorithm for learning causal strength
14Evidence for ΔP model
[Figure: mean judgment (axis from -50 to 50) by trial; Lopez & Shanks, 1995]
P(e|c)   P(e|not c)   ΔP
0.75     0.25          0.5
0.75     0.75          0.0
0.25     0.25          0.0
0.25     0.75         -0.5
15Evidence against ΔP?
P(e|c)   P(e|not c)   ΔP    Mean judgment
0.1      0.1          0.0    5
0.3      0.3          0.0   14
0.5      0.5          0.0   15
0.7      0.7          0.0   20
0.9      0.9          0.0   30
0.3      0.1          0.2   25
0.9      0.7          0.2   28
0.5      0.1          0.4   33
0.9      0.5          0.4   48
0.9      0.1          0.8   70
Allan & Jenkins, 1983
16Levels of analysis (Shanks)
- Level 1: Computational theory
- A rational measure of causal strength
- ΔP = P(E = 1 | C = 1) − P(E = 1 | C = 0)
- Level 2: Representation and algorithm
- Linear model for predicting effects from causes
- Rescorla-Wagner algorithm for learning causal strength
17Linear model for predicting effects from causes
C = 1 if cause occurs, else 0
E = 1 if effect occurs, else 0
B = 1 if background cause is present (always true)
Predicted E = wB B + wC C
18Rescorla-Wagner learning model
Under RW (in the limit of infinite intermixed trials), wC converges to ΔP and wB converges to P(E = 1 | C = 0).
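A minimal sketch of Rescorla-Wagner learning on a linear model with a cause C and an always-present background B, to illustrate the long-run behavior of the weights on intermixed trials. The update rule used here (each present cue's weight moves by learning rate times prediction error), the learning rate, the number of passes, and the trial schedule are assumptions chosen for illustration.

```python
def rescorla_wagner(trials, learning_rate=0.005, epochs=2000):
    """Run RW over repeated passes through `trials`, a list of (C, E) pairs.

    The background cue B is present on every trial. Returns (wB, wC).
    """
    w_b, w_c = 0.0, 0.0
    for _ in range(epochs):
        for c, e in trials:
            prediction = w_b + w_c * c          # linear model with B = 1
            error = e - prediction              # prediction error
            w_b += learning_rate * error        # B is present on every trial
            w_c += learning_rate * error * c    # C changes only when present
    return w_b, w_c


# Hypothetical intermixed schedule with P(E|C) = 0.75 and P(E|not C) = 0.25,
# so ΔP = 0.5.
trials = [(1, 1), (0, 0), (1, 1), (0, 1), (1, 0), (0, 0), (1, 1), (0, 0)]
w_b, w_c = rescorla_wagner(trials)
print(f"wB = {w_b:.3f}, wC = {w_c:.3f}")
```

With a small learning rate and many intermixed trials, wC settles near ΔP and wB near P(E|not C) for this schedule.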
19Evidence for Rescorla-Wagner model of causal learning
Under RW (in the limit of infinite intermixed trials), wC converges to ΔP and wB converges to P(E = 1 | C = 0).
- Learning from small (not infinite) samples
- Learning from blocked (not intermixed) samples
20Evidence for RW
- Learning from small (not infinite) samples
P(e|c)   P(e|not c)   ΔP    Mean judgment
0.1      0.1          0.0    5
0.3      0.3          0.0   14
0.5      0.5          0.0   15
0.7      0.7          0.0   20
0.9      0.9          0.0   30
0.3      0.1          0.2   25
0.9      0.7          0.2   28
0.5      0.1          0.4   33
0.9      0.5          0.4   48
0.9      0.1          0.8   70
Small sample effect
Allan & Jenkins, 1983
21[Figure: mean human judgment, by trial]
22[Figure: mean human judgment and mean RW strength, by trial]
23Evidence for RW
- Learning from small (not infinite) samples
- Learning from blocked (not intermixed) samples
Stage 1                Stage 2                Judgment   Rating
AB → O1, B → no O      AC → no O, C → O1      A → O1     19.7
DE → no O, E → O2      DF → O2, F → no O      D → O2     69.4
Lopez & Shanks, 1995
24Is human causal inference just associative learning?
- A single domain-general, bottom-up principle of association
- "We may define a cause to be an object, followed by another, and where all the objects, similar to the first, are followed by objects similar to the second."
- "Every idea is copied from some preceding impression or sentiment and where we cannot find any impression, we may be certain that there is no idea."
- No hidden variables.
25Some problems with simple associationism
- Garcia effect
- 3 pm: eat potato chips, see flashing lights
- 5 pm: get sick → avoid potato chips; get shocked → avoid flashing lights
26Some problems with simple associationism
- Garcia effect
- Mailman effect
- Imagine that your neighbor's dog barks always and
only when the mailman comes around the corner.
Whenever you hear the dog bark, a few seconds
later the mailman invariably appears at the
corner. Does the dog's barking cause the mailman
to appear?
27Some problems with simple associationism
- Garcia effect
- Mailman effect
- Perception of causality
- Hume
This is the sole difference between one
instance, from which we can never receive the
idea of connection, and a number of similar
instances, by which it is suggested. The first
time a man saw the communication of motion by
impulse, as by the shock of two billiard-balls,
he could not pronounce that the one event was
connected but only that it was conjoined with
the other. After he had observed several
instances of this nature, he then pronounces them
to be connected.
28Some problems with simple associationism
- Garcia effect
- Mailman effect
- Perception of causality
- Hume
- Michotte
29Michotte Demos
- Launching
- Entraining
- Spatial Gap
- Temporal Gap
- Triggering
- Pulling
- Smashing
30Perception of causality
- Does not require a constant conjunction
- May posit hidden causal structure or causal powers
- Pulling
- Triggering
- Sensitive to domain knowledge
- Present in six-month-old infants (innate?)
31Some problems with simple associationism
- Garcia effect
- Mailman effect
- Perception of causality
- One-shot causal inferences
- Adults
- Children
32Blicket detector (Sobel & Gopnik)
33Blocking
- Two objects: A and B
- Trial 1: A on detector → detector active
- Trial 2: B on detector → detector inactive
- Trials 3, 4: A & B on detector → detector active
- 3- and 4-year-olds judge whether each object is a blicket
- A: a blicket
- B: not a blicket
34Associative learning?
- RW with very high learning rate.
- Trials
- Trial 1: A on detector → detector active
- Trial 2: B on detector → detector inactive
- Trials 3, 4: A & B on detector → detector active
- Output causal strengths
- wA = 1
- wB = 0
35A deductive inference?
- Causal law: detector activates if and only if one or more objects on top of it are blickets.
- Premises
- Trial 1: A on detector → detector active
- Trial 2: B on detector → detector inactive
- Trials 3, 4: A & B on detector → detector active
- Conclusions deduced from premises and causal law
- A: a blicket
- B: not a blicket
36Indirect inferences (Sobel, Tenenbaum & Gopnik, 2002)
- Two objects: A and B
- Trial 1: A & B on detector → detector active
- Trial 2: A on detector → detector inactive
- 4-year-olds judge whether each object is a blicket
- A: not a blicket
- B: a blicket
37Associative learning?
- RW with very high learning rate.
- Trials
- Trial 1: A & B on detector → detector active
- Trial 2: A on detector → detector inactive
- Output causal strengths
- wA = 0
- wB = 0.5
38A deductive inference?
- Causal law: detector activates if and only if one or more objects on top of it are blickets.
- Premises
- Trial 1: A & B on detector → detector active
- Trial 2: A on detector → detector inactive
- Conclusions deduced from premises and causal law
- A: not a blicket
- B: a blicket
39Backwards blocking
- Two objects: A and B
- Trial 1: A & B on detector → detector active
- Trial 2: A on detector → detector active
- 4-year-olds judge whether each object is a blicket
- A: a blicket (100% of judgments)
- B: probably not a blicket (66% of judgments)
40Associative learning?
- RW with very high learning rate.
- Trials
- Trial 1: A & B on detector → detector active
- Trial 2: A on detector → detector active
- Output causal strengths
- wA = 1.0
- wB = 0.5
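The RW outputs quoted on the three "Associative learning?" slides can be reproduced with a short sketch. It assumes that "very high learning rate" means each trial's prediction error is corrected in full and split equally among the objects on the detector; that normalization, the function name, and the design labels are assumptions chosen because they match the slides' numbers.

```python
def rw_one_shot(trials, cues=("A", "B")):
    """RW-style updates in which each trial's prediction error is fully
    corrected and shared equally by the cues present (an assumed variant)."""
    w = {cue: 0.0 for cue in cues}
    for present, active in trials:
        error = (1.0 if active else 0.0) - sum(w[c] for c in present)
        for c in present:
            w[c] += error / len(present)
    return w


# Each trial is (objects on the detector, detector active?).
designs = {
    "blocking": [("A", True), ("B", False), ("AB", True), ("AB", True)],
    "indirect": [("AB", True), ("A", False)],
    "backwards": [("AB", True), ("A", True)],
}
results = {name: rw_one_shot(trials) for name, trials in designs.items()}
for name, w in results.items():
    print(f"{name:10s} wA = {w['A']}, wB = {w['B']}")
```

In the backwards blocking design, B is absent on Trial 2, so its weight stays at 0.5; the model has no way to lower its estimate for B after seeing A alone.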
41A deductive inference?
- Causal law: detector activates if and only if one or more objects on top of it are blickets.
- Premises
- Trial 1: A & B on detector → detector active
- Trial 2: A on detector → detector active
- Conclusions deduced from premises and causal law
- A: a blicket
- B: can't tell
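The deductive account can be sketched as brute-force elimination: list every assignment of blicket status to A and B, and keep those that satisfy the causal law on every trial. The function names and the trial encoding below are illustrative assumptions.

```python
from itertools import product


def consistent_hypotheses(trials):
    """Return all (A is blicket, B is blicket) assignments that satisfy the
    causal law on every trial: detector active iff some blicket is on it."""
    survivors = []
    for a_blicket, b_blicket in product([False, True], repeat=2):
        is_blicket = {"A": a_blicket, "B": b_blicket}
        if all(any(is_blicket[obj] for obj in on_detector) == active
               for on_detector, active in trials):
            survivors.append((a_blicket, b_blicket))
    return survivors


def verdict(survivors, index):
    """Summarize what the surviving hypotheses agree on for one object."""
    values = {h[index] for h in survivors}
    if values == {True}:
        return "a blicket"
    if values == {False}:
        return "not a blicket"
    return "can't tell"


# Each trial is (objects on the detector, detector active?).
designs = {
    "blocking": [("A", True), ("B", False), ("AB", True), ("AB", True)],
    "indirect": [("AB", True), ("A", False)],
    "backwards": [("AB", True), ("A", True)],
}
for name, trials in designs.items():
    s = consistent_hypotheses(trials)
    print(f"{name:10s} A: {verdict(s, 0)}, B: {verdict(s, 1)}")
```

Deduction alone leaves B undetermined in the backwards blocking design, since both "A only" and "A and B" survive.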
42Graphical models tutorial
- Idea of a graphical model
- Conditional dependence and independence
- Markovian learning
- Bayesian learning
- The need for theories
43Bayesian causal inference
- Hypotheses: h00 (no edges), h10 (A → E), h01 (B → E), h11 (A → E and B → E)
- A = 1 if block A on detector, else 0
- B = 1 if block B on detector, else 0
- E = 1 if detector active, else 0
- An edge A → E means "A is a blicket".
44Bayesian causal inference
- Hypotheses: h00 (no edges), h10 (A → E), h01 (B → E), h11 (A → E and B → E)
- Causal law: E = 1 if and only if A = 1 or B = 1.
- Data: d1: A = 1, B = 1, E = 1; d2: A = 1, B = 0, E = 1
45A theory-based inference
- A block's position may affect detector activation, but not vice versa. All block positions are independent of each other.
- No hypotheses with E → A, E → B, or A → B
- Whether any block is a blicket is independent of the identities of other blocks.
- Prior probability: P(h_{ij}) = q^i (1-q)^{1-i} q^j (1-q)^{1-j}
- Causal law: Detector activates if and only if one or more blickets are on it.
- Likelihood: P(d | h) = 1 if causal law is satisfied, else 0.
46- Assume some independent probability q that A or B is a blicket.
- Priors
- h1 (no edges): (1-q)(1-q)
- h2 (A → E): q(1-q)
- h3 (B → E): (1-q)q
- h4 (A → E and B → E): qq
- Likelihood: P(d | h) = 1 if causal law satisfied, else 0.
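47- Data: d1: A = 1, B = 1, E = 1
- Posteriors (unnormalized: prior × likelihood)
- h1: 0
- h2: q(1-q)
- h3: (1-q)q
- h4: qq
- Inferences: P(A is a blicket | d1) = P(B is a blicket | d1) = 1/(2-q)
- e.g., if q = 1/3, each probability is 3/5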
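48- Data: d1: A = 1, B = 1, E = 1; d2: A = 1, B = 0, E = 1
- Posteriors (unnormalized: prior × likelihood)
- h1: 0
- h2: q(1-q)
- h3: 0
- h4: qq
- Inferences: P(A is a blicket | d1, d2) = 1; P(B is a blicket | d1, d2) = q
- e.g., if q = 1/3, P(B is a blicket | d1, d2) = 1/3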
49Bayesian causal inference
- Causal law: detector activates if and only if one or more objects on top of it are blickets.
- Prior
- Blickets are rare (e.g., q = 1/3).
- Data
- Trial 1: A & B on detector → detector active
- Trial 2: A on detector → detector active
- Inferences
- A: definitely a blicket (100% confidence)
- B: probably not a blicket (67% confidence)
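The inferences on this slide follow from enumerating the four hypotheses, weighting each by its prior, and zeroing out any hypothesis that violates the causal law on some trial. The sketch below does that with exact fractions; the function names and the tuple encoding of hypotheses and data are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def posterior(data, q):
    """Posterior over hypotheses (a, b), where a = 1 means 'A is a blicket'.

    data: list of (A_on, B_on, E) observations.
    Prior: each block is a blicket independently with probability q.
    Likelihood: 1 if E matches the causal law on every trial, else 0.
    """
    weights = {}
    for a, b in product([0, 1], repeat=2):
        prior = (q if a else 1 - q) * (q if b else 1 - q)
        fits = all(e == int((a_on and a) or (b_on and b))
                   for a_on, b_on, e in data)
        weights[(a, b)] = prior if fits else Fraction(0)
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def p_blicket(post, index):
    """Marginal probability that block 0 (A) or block 1 (B) is a blicket."""
    return sum(p for h, p in post.items() if h[index])


q = Fraction(1, 3)
d1 = (1, 1, 1)   # A and B on detector, detector active
d2 = (1, 0, 1)   # A alone on detector, detector active

after_d1 = posterior([d1], q)
after_d2 = posterior([d1, d2], q)
print("after d1:     P(A) =", p_blicket(after_d1, 0), " P(B) =", p_blicket(after_d1, 1))
print("after d1, d2: P(A) =", p_blicket(after_d2, 0), " P(B) =", p_blicket(after_d2, 1))
```

After both trials, P(B is a blicket) falls back to the prior q = 1/3, which corresponds to the 67% confidence that B is not a blicket.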
50Backwards blocking
- Two objects: A and B
- Trial 1: A & B on detector → detector active
- Trial 2: A on detector → detector active
- 4-year-olds judge whether each object is a blicket
- A: a blicket (100% of judgments)
- B: probably not a blicket (66% of judgments)
51Manipulating the prior
I. Pre-training phase: Blickets are rare . . . .
II. Backwards blocking phase
- After each trial, adults judge the probability that each object is a blicket.
52- Rare condition: First observe 12 objects on detector, of which 2 set it off.
53- Common condition: First observe 12 objects on detector, of which 10 set it off.
54Inferences from ambiguous data
I. Pre-training phase: Blickets are rare . . . .
II. Two trials: A & B → detector, B & C → detector
- After each trial, adults judge the probability that each object is a blicket.
55Same domain theory generates hypothesis space for 3 objects
- Hypotheses: h000, h100, h010, h001, h110, h011, h101, h111
[Figure: the eight causal graphs over blocks A, B, C and detector E]
- Likelihoods: P(E = 1 | A, B, C; h) = 1 if A = 1 and A → E exists, or B = 1 and B → E exists, or C = 1 and C → E exists, else 0.
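The same enumeration extends to three blocks. The sketch below computes each block's marginal posterior for the ambiguous-data design, assuming the detector activates on both trials (A & B, then B & C) and that q equals the rare-condition base rate of 2/12. Both assumptions, and the function name, are illustrative.

```python
from fractions import Fraction
from itertools import product


def blicket_posteriors(names, trials, q):
    """Marginal posterior that each named block is a blicket.

    trials: list of (set of blocks on the detector, detector active?).
    """
    marginals = {name: Fraction(0) for name in names}
    total = Fraction(0)
    for assignment in product([0, 1], repeat=len(names)):
        h = dict(zip(names, assignment))
        # Deterministic likelihood: E = 1 iff some block on the detector
        # has an edge to E under this hypothesis.
        if any(any(h[b] for b in on) != active for on, active in trials):
            continue
        prior = Fraction(1)
        for is_blicket in assignment:
            prior *= q if is_blicket else 1 - q
        total += prior
        for name in names:
            if h[name]:
                marginals[name] += prior
    return {name: m / total for name, m in marginals.items()}


q = Fraction(2, 12)  # assumed prior, taken from the rare pre-training condition
trials = [({"A", "B"}, True), ({"B", "C"}, True)]  # assumed: detector active both times
post = blicket_posteriors("ABC", trials, q)
for name, p in post.items():
    print(f"P({name} is a blicket) = {p} ≈ {float(p):.2f}")
```

Under these assumptions B, which appears on both trials, is judged much more likely to be a blicket than A or C, while none of the three reaches certainty.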
56- Rare condition: First observe 12 objects on detector, of which 2 set it off.
57Sensitivity analysis
- How much work does domain theory do?
- Alternative model: Bayes with "noisy sufficient causes" theory (Cheng).
- How much work does Bayes do?
- Alternative model: Constraint-based learning using χ² measures of independence, fictional sample sizes (Glymour).
58Bayes with correct theory
Bayes with noisy sufficient causes theory
59Bayes with correct theory
Markov with fictional sample sizes
60Summary of blicket studies
- Given unambiguous data, people make all-or-none causal inferences, with complete explaining away.
- Given ambiguous data, people make graded causal inferences, with partial explaining away.
- Only Bayes + correct theory matches this full spectrum from certainty to uncertainty.
61Conclusions
- Explain how people can reliably acquire true causal beliefs given very limited data
- Prior causal knowledge: Domain theory
- Causal inference procedure: Bayes
- Domain theory generates the hypothesis space that allows Bayes to draw reasonable causal inferences.
62Scope of Bayesian causal inference
- One-shot causal inferences (e.g., blickets)
- Causal strength judgments
- Inferring hidden variables
- Perception of causality
- Perception of hidden causes
- Learning causal theories