Why causality is important - PowerPoint PPT Presentation

About This Presentation

Title:

Why causality is important

Description:

Title: No Slide Title. Author: Josh Tenenbaum. Last modified by: Josh Tenenbaum. Created Date: 3/30/2001 1:43:37 PM. Document presentation format: On-screen Show.

Slides: 63

Provided by: JoshT162

Learn more at: http://www.mit.edu

Transcript and Presenter's Notes

Title: Why causality is important

1
Why causality is important

Conceptual structure
Asymmetric dependence of features

2
Why causality is important

Conceptual structure
Explaining transformation studies (Keil, Rips)

3
Why causality is important

Conceptual structure
Successful action planning
Treat the disease, not the symptom

4
Why causality is important

Conceptual structure
Successful action planning
Regime change

5
Why inferring causal structure is hard

Causation ≠ correlation
If A and B are correlated, we could have at least
three different causal structures.
How do we learn the underlying causal structure,
if all we observe are correlations?

[Figure: three candidate causal structures: A → B; B → A; A ← X → B]
6
Hume (1748)

Causal structure is inferred from the constant
conjunction of two kinds of events.
We may define a cause to be an object, followed
by another, and where all the objects, similar to
the first, are followed by objects similar to the
second.

7
Hume (1748)

Causal structure is inferred from the constant
conjunction of two kinds of events.
We may define a cause to be an object, followed
by another, and where all the objects, similar to
the first, are followed by objects similar to the
second.
Example: Observe that B always follows A.

[Figure: candidate causal structures A → B, B → A, and A ← X → B]
9
Hume (1748)

Causal structure is inferred from the constant
conjunction of two kinds of events.
Every idea is copied from some preceding
impression or sentiment and where we cannot find
any impression, we may be certain that there is
no idea.
Example: Observe that B always follows A.

[Figure: candidate causal structures A → B, B → A, and A ← X → B]
10
Hume's legacy in psychology: associationism

Pavlov

11
Hume's legacy in psychology: associationism

Pavlov
Human causal learning

Given a sample of mice:

                     Injected with X   Not injected with X
Expressed Y                15                 20
Did not express Y          15                 10

To what extent does chemical X cause gene Y
to be expressed?
12
Hume's legacy in psychology: associationism

Pavlov
Human causal learning
Rescorla-Wagner model of causal strength
Cue-competition
Blocking

We may define a cause to be an object, followed
by another, and where all the objects, similar to
the first, are followed by objects similar to the
second. Or, in other words, where, if the first
object had not been, the second never had
existed. (Hume)
13
Levels of analysis (Shanks)

Level 1: Computational theory
A rational measure of causal strength
Level 2: Representation and algorithm
Linear model for predicting effects from causes
Rescorla-Wagner algorithm for learning causal
strength

14
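Evidence for ΔP model
[Figure: mean judgment (axis from -50 to 50) plotted against trial, one curve per contingency condition. Condition labels as they appear on the slide: 0.75 0.25 0.5 0.75; 0.75 0.0 0.25 0.25; 0.0 0.25 0.75 -0.5. Lopez & Shanks, 1995]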
15
Evidence against ΔP?

P(e|c)   P(e|¬c)   ΔP     Mean judgment
0.1      0.1       0.0    5
0.3      0.3       0.0    14
0.5      0.5       0.0    15
0.7      0.7       0.0    20
0.9      0.9       0.0    30
0.3      0.1       0.2    25
0.9      0.7       0.2    28
0.5      0.1       0.4    33
0.9      0.5       0.4    48
0.9      0.1       0.8    70

Allan & Jenkins, 1983
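The point of the table can be checked with a short sketch. It assumes the first two columns are P(e|c) and P(e|¬c), which is an inference from the fact that the third column equals their difference; the function names are illustrative only. Grouping the mean judgments by ΔP shows that judgments vary widely even when ΔP is held fixed.

```python
# Rows from the table above: (P(e|c), P(e|not c), mean judgment).
# The column meanings are an assumption inferred from the arithmetic.
rows = [
    (0.1, 0.1, 5), (0.3, 0.3, 14), (0.5, 0.5, 15), (0.7, 0.7, 20), (0.9, 0.9, 30),
    (0.3, 0.1, 25), (0.9, 0.7, 28), (0.5, 0.1, 33), (0.9, 0.5, 48), (0.9, 0.1, 70),
]


def delta_p(p_e_given_c, p_e_given_not_c):
    """Contingency Delta-P = P(e|c) - P(e|not c)."""
    return p_e_given_c - p_e_given_not_c


def judgments_by_delta_p(table):
    """Group mean judgments by Delta-P (rounded to absorb float error)."""
    groups = {}
    for p1, p0, judgment in table:
        groups.setdefault(round(delta_p(p1, p0), 2), []).append(judgment)
    return groups


groups = judgments_by_delta_p(rows)
for dp, judgments in sorted(groups.items()):
    print(f"Delta-P = {dp:.1f}: mean judgments {judgments}")
```

If ΔP alone determined judgments, every group would contain a single repeated value; instead the ΔP = 0 group ranges from 5 to 30.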
16
Levels of analysis (Shanks)

Level 1: Computational theory
A rational measure of causal strength
Level 2: Representation and algorithm
Linear model for predicting effects from causes
Rescorla-Wagner algorithm for learning causal
strength

17
Linear model for predicting effects from causes
C = 1 if cause occurs, else 0. E = 1 if effect
occurs, else 0. B = 1 if background cause is
present (always true).
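One common way to write such a linear model is sketched below. It is an assumption, since only the variable definitions appear here; the weight symbols $w_C$ and $w_B$ follow the names used on the Rescorla-Wagner slides, and $\hat{E}$ (the predicted effect) is introduced for illustration.

```latex
\hat{E} = w_C \, C + w_B \, B, \qquad B = 1 \text{ on every trial}
```

Under this reading, $w_B$ captures how often the effect occurs without the cause, and $w_C$ captures the additional contribution of the cause.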
18
Rescorla-Wagner learning model
Under RW (in the limit of infinite intermixed
trials), wC converges to ΔP = P(e|c) - P(e|¬c) and wB converges to P(e|¬c).
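The asymptotic behavior can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the contingency (P(e|c) = 0.75, P(e|¬c) = 0.25), the learning rate, and the function name are all hypothetical choices, and a single learning rate is used for both cues. A standard result for this setup is that the cause weight approaches ΔP = P(e|c) - P(e|¬c) and the background weight approaches P(e|¬c).

```python
import random


def rescorla_wagner(trials, learning_rate=0.01, epochs=300, seed=0):
    """Run RW on intermixed (cause, effect) trials.

    Returns (w_C, w_B) averaged over the last 100 epochs to smooth out
    trial-to-trial fluctuation.
    """
    rng = random.Random(seed)
    trials = list(trials)            # copy so the caller's list is not shuffled
    w_c, w_b = 0.0, 0.0
    sum_c = sum_b = 0.0
    count = 0
    for epoch in range(epochs):
        rng.shuffle(trials)          # intermixed presentation order
        for cause, effect in trials:
            prediction = w_c * cause + w_b   # background B is always present
            error = effect - prediction
            w_c += learning_rate * error * cause
            w_b += learning_rate * error
            if epoch >= epochs - 100:
                sum_c += w_c
                sum_b += w_b
                count += 1
    return sum_c / count, sum_b / count


# Hypothetical contingency: P(e|c) = 15/20 = 0.75, P(e|not c) = 5/20 = 0.25.
trials = [(1, 1)] * 15 + [(1, 0)] * 5 + [(0, 1)] * 5 + [(0, 0)] * 15
w_c, w_b = rescorla_wagner(trials)
print(round(w_c, 2), round(w_b, 2))
```

With these settings the printed weights should land close to 0.5 and 0.25, that is, ΔP and P(e|¬c) for the assumed contingency.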
19
Evidence for Rescorla-Wagner model of causal
learning
Under RW (in the limit of infinite intermixed
trials), wC converges to ΔP = P(e|c) - P(e|¬c) and wB converges to P(e|¬c).

Learning from small (not infinite) samples
Learning from blocked (not intermixed) samples

20
Evidence for RW

Learning from small (not infinite) samples

P(e|c)   P(e|¬c)   ΔP     Mean judgment
0.1      0.1       0.0    5
0.3      0.3       0.0    14
0.5      0.5       0.0    15
0.7      0.7       0.0    20
0.9      0.9       0.0    30
0.3      0.1       0.2    25
0.9      0.7       0.2    28
0.5      0.1       0.4    33
0.9      0.5       0.4    48
0.9      0.1       0.8    70

Small sample effect
Allan & Jenkins, 1983
21
[Figure: mean human judgment plotted against trial]
22
[Figure: mean human judgment and mean RW strength plotted against trial]
23
Evidence for RW

Learning from small (not infinite) samples
Learning from blocked (not intermixed) samples

Stage 1               Stage 2               Judgment   Rating
AB → O1, B → no O     AC → no O, C → O1     A → O1     19.7
DE → no O, E → O2     DF → O2, F → no O     D → O2     69.4

Lopez & Shanks, 1995
24
Is human causal inference just associative
learning?

A single domain-general, bottom-up principle of
association
We may define a cause to be an object, followed
by another, and where all the objects, similar to
the first, are followed by objects similar to the
second.
Every idea is copied from some preceding
impression or sentiment and where we cannot find
any impression, we may be certain that there is
no idea. No hidden variables.

25
Some problems with simple associationism

Garcia effect
3 pm: eat potato chips, see flashing lights
5 pm: if you get sick, you avoid potato chips;
if you get shocked, you avoid flashing lights

26
Some problems with simple associationism

Garcia effect
Mailman effect
Imagine that your neighbor's dog barks always and
only when the mailman comes around the corner.
Whenever you hear the dog bark, a few seconds
later the mailman invariably appears at the
corner. Does the dog's barking cause the mailman
to appear?

27
Some problems with simple associationism

Garcia effect
Mailman effect
Perception of causality
Hume

This is the sole difference between one
instance, from which we can never receive the
idea of connection, and a number of similar
instances, by which it is suggested. The first
time a man saw the communication of motion by
impulse, as by the shock of two billiard-balls,
he could not pronounce that the one event was
connected, but only that it was conjoined with
the other. After he had observed several
instances of this nature, he then pronounces them
to be connected.
28
Some problems with simple associationism

Garcia effect
Mailman effect
Perception of causality
Hume
Michotte

29
Michotte Demos

Launching
Entraining
Spatial Gap
Temporal Gap
Triggering
Pulling
Smashing

30
Perception of causality

Does not require a constant conjunction
May posit hidden causal structure or causal
powers
Pulling
Triggering
Sensitive to domain knowledge
Present in six-month-old infants (innate?)

31
Some problems with simple associationism

Garcia effect
Mailman effect
Perception of causality
One-shot causal inferences
Adults
Children

32
Blicket detector (Sobel & Gopnik)
33
Blocking

Two objects: A and B
Trial 1: A on detector; detector active
Trial 2: B on detector; detector inactive
Trials 3, 4: A and B on detector; detector active
3- and 4-year-olds judge whether each object is a
blicket
A: a blicket
B: not a blicket

34
Associative learning?

RW with very high learning rate.
Trials:
Trial 1: A on detector; detector active
Trial 2: B on detector; detector inactive
Trials 3, 4: A and B on detector; detector active
Output causal strengths:
wA = 1
wB = 0

35
A deductive inference?

Causal law: detector activates if and only if one
or more objects on top of it are blickets.
Premises:
Trial 1: A on detector; detector active
Trial 2: B on detector; detector inactive
Trials 3, 4: A and B on detector; detector active
Conclusions deduced from premises and causal law:
A: a blicket
B: not a blicket

36
Indirect inferences (Sobel, Tenenbaum & Gopnik,
2002)

Two objects: A and B
Trial 1: A and B on detector; detector active
Trial 2: A on detector; detector inactive
4-year-olds judge whether each object is a
blicket
A: not a blicket
B: a blicket

37
Associative learning?

RW with very high learning rate.
Trials:
Trial 1: A and B on detector; detector active
Trial 2: A on detector; detector inactive
Output causal strengths:
wA = 0
wB = 0.5

38
A deductive inference?

Causal law: detector activates if and only if one
or more objects on top of it are blickets.
Premises:
Trial 1: A and B on detector; detector active
Trial 2: A on detector; detector inactive
Conclusions deduced from premises and causal law:
A: not a blicket
B: a blicket

39
Backwards blocking

Two objects: A and B
Trial 1: A and B on detector; detector active
Trial 2: A on detector; detector active
4-year-olds judge whether each object is a
blicket
A: a blicket (100% of judgments)
B: probably not a blicket (66% of judgments)

40
Associative learning?

RW with very high learning rate.
Trials:
Trial 1: A and B on detector; detector active
Trial 2: A on detector; detector active
Output causal strengths:
wA = 1.0
wB = 0.5
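The causal strengths listed on the three "Associative learning?" slides can be reproduced with a small sketch. It rests on one assumed reading of "very high learning rate": each trial's prediction error is corrected in full and split equally among the objects on the detector. The function name and trial encoding are illustrative, not taken from the slides.

```python
def rw_full_correction(trials):
    """RW-style learner that fully corrects each trial's prediction error.

    trials: list of (objects_on_detector, detector_active) pairs, where
    objects_on_detector is a string such as "AB". The error is divided
    equally among the objects present (an assumption, see lead-in).
    """
    weights = {}
    for objects, active in trials:
        for obj in objects:
            weights.setdefault(obj, 0.0)
        prediction = sum(weights[obj] for obj in objects)
        error = (1.0 if active else 0.0) - prediction
        for obj in objects:
            weights[obj] += error / len(objects)
    return weights


blocking = [("A", True), ("B", False), ("AB", True), ("AB", True)]
indirect = [("AB", True), ("A", False)]
backwards_blocking = [("AB", True), ("A", True)]

for name, design in [("blocking", blocking),
                     ("indirect", indirect),
                     ("backwards blocking", backwards_blocking)]:
    print(name, rw_full_correction(design))
```

Under this assumption the learner leaves wB at 0.5 after backwards blocking, because B is absent on Trial 2 and its weight is never revisited.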

41
A deductive inference?

Causal law: detector activates if and only if one
or more objects on top of it are blickets.
Premises:
Trial 1: A and B on detector; detector active
Trial 2: A on detector; detector active
Conclusions deduced from premises and causal law:
A: a blicket
B: can't tell
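The deductive reading can be made concrete by enumerating every assignment of "blicket" or "not a blicket" to the objects and keeping those that satisfy the causal law on all trials. The sketch below is illustrative; the function names and the trial encoding are assumptions, not part of the slides.

```python
from itertools import product


def consistent_hypotheses(objects, trials):
    """Return every blicket assignment that satisfies the causal law.

    Causal law: the detector is active if and only if at least one
    object on it is a blicket.
    """
    survivors = []
    for flags in product([False, True], repeat=len(objects)):
        is_blicket = dict(zip(objects, flags))
        if all(any(is_blicket[o] for o in on_detector) == active
               for on_detector, active in trials):
            survivors.append(is_blicket)
    return survivors


def deduce(objects, trials):
    """Label each object 'blicket', 'not a blicket', or "can't tell"."""
    survivors = consistent_hypotheses(objects, trials)
    verdicts = {}
    for o in objects:
        values = {h[o] for h in survivors}
        if len(values) > 1:
            verdicts[o] = "can't tell"
        else:
            verdicts[o] = "blicket" if values == {True} else "not a blicket"
    return verdicts


blocking = [("A", True), ("B", False), ("AB", True), ("AB", True)]
indirect = [("AB", True), ("A", False)]
backwards_blocking = [("AB", True), ("A", True)]

print(deduce("AB", blocking))
print(deduce("AB", indirect))
print(deduce("AB", backwards_blocking))
```

Only the backwards blocking design leaves more than one hypothesis standing, which is why pure deduction cannot say anything about B there.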

42
Graphical models tutorial

Idea of a graphical model
Conditional dependence and independence
Markovian learning
Bayesian learning
The need for theories

43
Bayesian causal inference
Hypotheses:
h00: no causal links
h10: A → E
h01: B → E
h11: A → E and B → E

A = 1 if block A on detector, else 0. B = 1 if
block B on detector, else 0. E = 1 if detector
active, else 0.
An arrow A → E means A is a blicket.
44
Bayesian causal inference
Hypotheses: h00 (no links), h10 (A → E), h01 (B → E), h11 (A → E and B → E)

Causal law: E = 1 if and only if A = 1 or B = 1.
Data: d1: A = 1, B = 1, E = 1; d2: A = 1, B = 0, E = 1
45
A theory-based inference

A block's position may affect detector
activation, but not vice versa. All block
positions are independent of each other.
No hypotheses with E → A, E → B, or A → B.
Whether any block is a blicket is independent of
the identities of other blocks.
Prior probability: $P(h_{ij}) = q^{i}(1-q)^{1-i}\, q^{j}(1-q)^{1-j}$
Causal law: Detector activates if and only if
one or more blickets are on it.
Likelihood: $P(d \mid h) = 1$ if causal law is
satisfied, else 0.

Assume that each of A and B is independently a
blicket with probability q.
Priors:
h1 (no causal links): (1-q)(1-q)
h2 (A → E): q(1-q)
h3 (B → E): (1-q)q
h4 (A → E and B → E): qq
Likelihood: $P(d \mid h) = 1$ if causal law
satisfied, else 0.
47

Data: d1: A = 1, B = 1, E = 1
Posteriors (unnormalized):
h1: 0
h2: q(1-q)
h3: (1-q)q
h4: qq
Inferences: e.g., if q = 1/3, normalizing gives
P(A is a blicket | d1) = P(B is a blicket | d1) = 3/5.
48

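Data: d1: A = 1, B = 1, E = 1;
d2: A = 1, B = 0, E = 1
Posteriors (unnormalized):
h1: 0
h2: q(1-q)
h3: 0
h4: qq
Inferences: e.g., if q = 1/3, normalizing gives
P(A is a blicket | d1, d2) = 1 and P(B is a blicket | d1, d2) = q = 1/3.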
49
Bayesian causal inference

Causal law: detector activates if and only if one
or more objects on top of it are blickets.
Prior:
Blickets are rare (e.g., q = 1/3).
Data:
Trial 1: A and B on detector; detector active
Trial 2: A on detector; detector active
Inferences:
A: definitely a blicket (100% confidence)
B: probably not a blicket (67% confidence)
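These inferences follow from the prior and the deterministic likelihood by direct enumeration. The sketch below is one way to carry out that calculation with exact fractions; the function name and trial encoding are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def posterior_blicket(objects, trials, q):
    """P(object is a blicket | trials), by enumerating all hypotheses.

    Prior: each object is independently a blicket with probability q.
    Likelihood: 1 if the causal law holds on every trial, else 0.
    """
    total = Fraction(0)
    mass = {o: Fraction(0) for o in objects}
    for flags in product([0, 1], repeat=len(objects)):
        h = dict(zip(objects, flags))
        prior = Fraction(1)
        for o in objects:
            prior *= q if h[o] else 1 - q
        law_holds = all(any(h[o] for o in on) == active
                        for on, active in trials)
        if law_holds:
            total += prior
            for o in objects:
                if h[o]:
                    mass[o] += prior
    return {o: mass[o] / total for o in objects}


q = Fraction(1, 3)
after_d1 = posterior_blicket("AB", [("AB", True)], q)
after_d2 = posterior_blicket("AB", [("AB", True), ("A", True)], q)
print(after_d1)  # A and B are each 3/5 after Trial 1
print(after_d2)  # A is 1 and B is 1/3 after Trial 2
```

After Trial 2 the probability that B is a blicket falls back to the prior q = 1/3, so "probably not a blicket" corresponds to 1 - 1/3, roughly 67%.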

50
Backwards blocking

Two objects: A and B
Trial 1: A and B on detector; detector active
Trial 2: A on detector; detector active
4-year-olds judge whether each object is a
blicket
A: a blicket (100% of judgments)
B: probably not a blicket (66% of judgments)

51
Manipulating the prior
I. Pre-training phase: Blickets are rare . . . .
II. Backwards blocking phase

After each trial, adults judge the probability
that each object is a blicket.

Rare condition: First observe 12 objects on
detector, of which 2 set it off.

Common condition: First observe 12 objects on
detector, of which 10 set it off.

54
Inferences from ambiguous data
I. Pre-training phase: Blickets are rare . . . .
II. Two trials: A, B → detector; B, C → detector

After each trial, adults judge the probability
that each object is a blicket.

55
Same domain theory generates hypothesis space for
3 objects

Hypotheses: h000, h100, h010, h001, h110, h011, h101, h111
Likelihoods: $P(E = 1 \mid A, B, C; h) = 1$ if A = 1 and A → E exists, or B = 1 and
B → E exists, or C = 1 and C → E exists;
else 0.
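The eight-hypothesis space and its likelihood can be enumerated directly. The sketch below makes two assumptions that should be read as such: the detector activates on both trials (A with B, then B with C), and the prior is q = 2/12 = 1/6, taken from the rare pre-training condition. The function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def blicket_posteriors(names, trials, q):
    """Posterior that each object is a blicket, enumerating h000 ... h111."""
    weights = {}
    for h in product([0, 1], repeat=len(names)):
        links = dict(zip(names, h))
        # Likelihood: E = 1 exactly when some object on the detector
        # has a causal link to E under this hypothesis.
        if all(any(links[n] for n in on) == active for on, active in trials):
            k = sum(h)
            weights[h] = q**k * (1 - q)**(len(names) - k)
    total = sum(weights.values())
    return {n: sum(w for h, w in weights.items() if h[i]) / total
            for i, n in enumerate(names)}


q = Fraction(1, 6)                      # assumed prior from the rare condition
trial_1 = [("AB", True)]                # assumed: detector active
trial_2 = trial_1 + [("BC", True)]      # assumed: detector active

after_1 = blicket_posteriors("ABC", trial_1, q)
after_2 = blicket_posteriors("ABC", trial_2, q)
print(after_1)
print(after_2)
```

Under these assumptions B, the object present on both trials, ends up far more probable than A or C, while neither A nor C is ruled out. This is the graded pattern that ambiguous data should produce.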
56

Rare condition: First observe 12 objects on
detector, of which 2 set it off.

57
Sensitivity analysis

How much work does domain theory do?
Alternative model: Bayes with noisy sufficient
causes theory (Cheng).
How much work does Bayes do?
Alternative model: Constraint-based learning
using χ² measures of independence, fictional
sample sizes (Glymour).

58
[Figure: Bayes with correct theory compared with Bayes with noisy sufficient causes theory]
59
[Figure: Bayes with correct theory compared with Markov with fictional sample sizes]
60
Summary of blicket studies

Given unambiguous data, people make all-or-none
causal inferences, with complete explaining away.
Given ambiguous data, people make graded causal
inferences, with partial explaining away.
Only Bayes with the correct theory matches this full
spectrum from certainty to uncertainty.

61
Conclusions

Explain how people can reliably acquire true
causal beliefs given very limited data
Prior causal knowledge: Domain theory
Causal inference procedure: Bayes
Domain theory generates the hypothesis space that
allows Bayes to draw reasonable causal
inferences.

62
Scope of Bayesian causal inference