Why causality is important
Author: Josh Tenenbaum
Source: http://www.mit.edu

Transcript
1
Why causality is important
  • Conceptual structure
  • Asymmetric dependence of features

2
Why causality is important
  • Conceptual structure
  • Explaining transformation studies (Keil, Rips)

3
Why causality is important
  • Conceptual structure
  • Successful action planning
  • Treat the disease, not the symptom

4
Why causality is important
  • Conceptual structure
  • Successful action planning
  • Regime change

5
Why inferring causal structure is hard
  • Causation ≠ Correlation
  • If A and B are correlated, we could have at least
    three different causal structures
  • How do we learn the underlying causal structure,
    if all we observe are correlations?

[Diagrams: three candidate structures: A → B; A ← B; A ← X → B (hidden common cause X)]
6
Hume (1748)
  • Causal structure is inferred from the constant
    conjunction of two kinds of events.
  • We may define a cause to be an object, followed
    by another, and where all the objects, similar to
    the first, are followed by objects similar to the
    second.

7
Hume (1748)
  • Causal structure is inferred from the constant
    conjunction of two kinds of events.
  • We may define a cause to be an object, followed
    by another, and where all the objects, similar to
    the first, are followed by objects similar to the
    second.
  • Example: Observe B always follows A.

8
Hume (1748)
  • Causal structure is inferred from the constant
    conjunction of two kinds of events.
  • We may define a cause to be an object, followed
    by another, and where all the objects, similar to
    the first, are followed by objects similar to the
    second.
  • Example: Observe B always follows A.

9
Hume (1748)
  • Causal structure is inferred from the constant
    conjunction of two kinds of events.
  • Every idea is copied from some preceding
    impression or sentiment and where we cannot find
    any impression, we may be certain that there is
    no idea.
  • Example: Observe B always follows A.

10
Hume's legacy in psychology: associationism
  • Pavlov

11
Hume's legacy in psychology: associationism
  • Pavlov
  • Human causal learning

Given a sample of mice:

                      Injected with X    Not injected with X
  Expressed Y               15                    20
  Did not express Y         15                    10

To what extent does chemical X cause gene Y to be expressed?
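
A small sketch of the ΔP computation for this table (the variable names are mine; the counts are from the slide):

```python
# delta-P = P(Y expressed | injected with X) - P(Y expressed | not injected)
expressed_x, not_expressed_x = 15, 15         # injected with X
expressed_no_x, not_expressed_no_x = 20, 10   # not injected with X

p_e_given_c = expressed_x / (expressed_x + not_expressed_x)               # 0.50
p_e_given_not_c = expressed_no_x / (expressed_no_x + not_expressed_no_x)  # ~0.67
delta_p = p_e_given_c - p_e_given_not_c                                   # ~ -0.17
print(p_e_given_c, p_e_given_not_c, delta_p)
```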
12
Hume's legacy in psychology: associationism
  • Pavlov
  • Human causal learning
  • Rescorla-Wagner model of causal strength
  • Cue-competition
  • Blocking

We may define a cause to be an object, followed
by another, and where all the objects, similar to
the first, are followed by objects similar to the
second. Or, in other words, where, if the first
object had not been, the second never had
existed. (Hume)
13
Levels of analysis (Shanks)
  • Level 1: Computational theory
  • A rational measure of causal strength
  • Level 2: Representation and algorithm
  • Linear model for predicting effects from causes
  • Rescorla-Wagner algorithm for learning causal
    strength

14
Evidence for the ΔP model

[Figure: mean causal judgment (scale −50 to +50) plotted over trials for several conditions with different values of P(e|c), P(e|¬c), and ΔP; judgments track ΔP. Lopez & Shanks, 1995]
15
Evidence against ΔP?

  P(e|c)   P(e|¬c)   ΔP    Mean judgment
   0.1      0.1      0.0        5
   0.3      0.3      0.0       14
   0.5      0.5      0.0       15
   0.7      0.7      0.0       20
   0.9      0.9      0.0       30
   0.3      0.1      0.2       25
   0.9      0.7      0.2       28
   0.5      0.1      0.4       33
   0.9      0.5      0.4       48
   0.9      0.1      0.8       70

(Allan & Jenkins, 1983)
16
Levels of analysis (Shanks)
  • Level 1: Computational theory
  • A rational measure of causal strength
  • Level 2: Representation and algorithm
  • Linear model for predicting effects from causes
  • Rescorla-Wagner algorithm for learning causal
    strength

17
Linear model for predicting effects from causes
C = 1 if the cause occurs, else 0
E = 1 if the effect occurs, else 0
B = 1 if the background cause is present (always true)
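
The model equation itself appears only as an image in the original slides; a standard way to write the linear model it refers to (my reconstruction) is:

```latex
\hat{E} \;=\; w_B \, B \;+\; w_C \, C
```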
18
Rescorla-Wagner learning model
Under RW (in the limit of infinite intermixed trials):
  • wC converges to ΔP = P(E = 1 | C = 1) − P(E = 1 | C = 0)
  • wB converges to P(E = 1 | C = 0)
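
A minimal simulation sketch of the RW update rule (the learning rate, trial count, and the 0.75/0.25 contingency below are illustrative assumptions, not values from the slides); the cue weight settles near ΔP and the background weight near P(e|¬c):

```python
import random

def rescorla_wagner(trials, alpha=0.05):
    # RW update: each cue present on a trial is adjusted by alpha * prediction error.
    w = {"B": 0.0, "C": 0.0}
    for cues, e in trials:
        error = e - sum(w[x] for x in cues)
        for x in cues:
            w[x] += alpha * error
    return w

random.seed(0)
p_e_given_c, p_e_given_not_c = 0.75, 0.25   # illustrative contingency (delta-P = 0.5)
trials = []
for _ in range(5000):                        # many intermixed trials
    c = random.random() < 0.5                # cause present on half the trials
    p = p_e_given_c if c else p_e_given_not_c
    e = 1.0 if random.random() < p else 0.0
    trials.append((("B", "C") if c else ("B",), e))

print(rescorla_wagner(trials))   # wC close to 0.5 (delta-P), wB close to 0.25
```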
19
Evidence for Rescorla-Wagner model of causal
learning
Under RW (in the limit of infinite intermixed trials):
  • wC converges to ΔP = P(E = 1 | C = 1) − P(E = 1 | C = 0)
  • wB converges to P(E = 1 | C = 0)
  • Learning from small (not infinite) samples
  • Learning from blocked (not intermixed) samples

20
Evidence for RW
  • Learning from small (not infinite) samples

  P(e|c)   P(e|¬c)   ΔP    Mean judgment
   0.1      0.1      0.0        5
   0.3      0.3      0.0       14
   0.5      0.5      0.0       15
   0.7      0.7      0.0       20
   0.9      0.9      0.0       30
   0.3      0.1      0.2       25
   0.9      0.7      0.2       28
   0.5      0.1      0.4       33
   0.9      0.5      0.4       48
   0.9      0.1      0.8       70

Small sample effect (Allan & Jenkins, 1983)
21
[Figure: mean human judgment plotted over trials]
22
[Figure: mean human judgment and mean RW strength plotted over trials]
23
Evidence for RW
  • Learning from small (not infinite) samples
  • Learning from blocked (not intermixed) samples

  Stage 1               Stage 2               Judgment    Rating
  AB → O1, B → no O     AC → no O, C → O1     A → O1?      19.7
  DE → no O, E → O2     DF → O2, F → no O     D → O2?      69.4

(Lopez & Shanks, 1995)
24
Is human causal inference just associative
learning?
  • A single domain-general, bottom-up principle of
    association
  • We may define a cause to be an object, followed
    by another, and where all the objects, similar to
    the first, are followed by objects similar to the
    second.
  • Every idea is copied from some preceding
    impression or sentiment and where we cannot find
    any impression, we may be certain that there is
    no idea. No hidden variables.

25
Some problems with simple associationism
  • Garcia effect
  • 3 pm: eat potato chips, see flashing lights
  • 5 pm: get sick → avoid the potato chips
  •         get shocked → avoid the flashing lights

26
Some problems with simple associationism
  • Garcia effect
  • Mailman effect
  • Imagine that your neighbor's dog barks always and
    only when the mailman comes around the corner.
    Whenever you hear the dog bark, a few seconds
    later the mailman invariably appears at the
    corner. Does the dog's barking cause the mailman
    to appear?

27
Some problems with simple associationism
  • Garcia effect
  • Mailman effect
  • Perception of causality
  • Hume

This is the sole difference between one
instance, from which we can never receive the
idea of connection, and a number of similar
instances, by which it is suggested. The first
time a man saw the communication of motion by
impulse, as by the shock of two billiard-balls,
he could not pronounce that the one event was
connected but only that it was conjoined with
the other. After he had observed several
instances of this nature, he then pronounces them
to be connected.
28
Some problems with simple associationism
  • Garcia effect
  • Mailman effect
  • Perception of causality
  • Hume
  • Michotte

29
Michotte Demos
  • Launching
  • Entraining
  • Spatial Gap
  • Temporal Gap
  • Triggering
  • Pulling
  • Smashing

30
Perception of causality
  • Does not require a constant conjunction
  • May posit hidden causal structure or causal
    powers
  • Pulling
  • Triggering
  • Sensitive to domain knowledge
  • Present in six-month-old infants (innate?)

31
Some problems with simple associationism
  • Garcia effect
  • Mailman effect
  • Perception of causality
  • One-shot causal inferences
  • Adults
  • Children

32
Blicket detector (Sobel & Gopnik)
33
Blocking
  • Two objects: A and B
  • Trial 1: A on detector → detector active
  • Trial 2: B on detector → detector inactive
  • Trials 3, 4: A and B on detector → detector active
  • 3- and 4-year-olds judge whether each object is a blicket
  • A: a blicket
  • B: not a blicket

34
Associative learning?
  • RW with very high learning rate.
  • Trials:
  • Trial 1: A on detector → detector active
  • Trial 2: B on detector → detector inactive
  • Trials 3, 4: A and B on detector → detector active
  • Output causal strengths:
  • wA = 1
  • wB = 0
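
A small sketch of this RW account (treating "very high learning rate" as α = 1, which is my assumption; it reproduces the weights on the slide):

```python
# Rescorla-Wagner with a very high learning rate, run on the trials above.
def rw(trials, alpha=1.0):
    w = {"A": 0.0, "B": 0.0}
    for cues, active in trials:
        error = active - sum(w[c] for c in cues)   # prediction error on this trial
        for c in cues:
            w[c] += alpha * error
    return w

trials = [
    (("A",), 1),       # Trial 1: A on detector -> active
    (("B",), 0),       # Trial 2: B on detector -> inactive
    (("A", "B"), 1),   # Trial 3: A and B on detector -> active
    (("A", "B"), 1),   # Trial 4: A and B on detector -> active
]
print(rw(trials))      # {'A': 1.0, 'B': 0.0}
```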

35
A deductive inference?
  • Causal law: the detector activates if and only if one
    or more objects on top of it are blickets.
  • Premises:
  • Trial 1: A on detector → detector active
  • Trial 2: B on detector → detector inactive
  • Trials 3, 4: A and B on detector → detector active
  • Conclusions deduced from premises and causal law:
  • A: a blicket
  • B: not a blicket
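
A sketch of this deductive account as explicit enumeration (the encoding is mine, not from the slides): keep only the blicket assignments that are consistent with the causal law and every observed trial.

```python
from itertools import product

def law(blickets, on_detector):
    # Causal law: detector is active iff at least one blicket is on it.
    return int(any(obj in blickets for obj in on_detector))

# Premises: (objects on the detector, observed detector state).
trials = [(("A",), 1), (("B",), 0), (("A", "B"), 1), (("A", "B"), 1)]

consistent = []
for a_is_blicket, b_is_blicket in product([False, True], repeat=2):
    blickets = {o for o, is_b in [("A", a_is_blicket), ("B", b_is_blicket)] if is_b}
    if all(law(blickets, on) == state for on, state in trials):
        consistent.append(blickets)

print(consistent)   # [{'A'}]: A must be a blicket, B must not be
```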

36
Indirect inferences (Sobel, Tenenbaum & Gopnik, 2002)
  • Two objects: A and B
  • Trial 1: A and B on detector → detector active
  • Trial 2: A on detector → detector inactive
  • 4-year-olds judge whether each object is a blicket
  • A: not a blicket
  • B: a blicket

37
Associative learning?
  • RW with very high learning rate.
  • Trials:
  • Trial 1: A and B on detector → detector active
  • Trial 2: A on detector → detector inactive
  • Output causal strengths:
  • wA = 0
  • wB = 0.5

38
A deductive inference?
  • Causal law: the detector activates if and only if one
    or more objects on top of it are blickets.
  • Premises:
  • Trial 1: A and B on detector → detector active
  • Trial 2: A on detector → detector inactive
  • Conclusions deduced from premises and causal law:
  • A: not a blicket
  • B: a blicket

39
Backwards blocking
  • Two objects: A and B
  • Trial 1: A and B on detector → detector active
  • Trial 2: A on detector → detector active
  • 4-year-olds judge whether each object is a blicket
  • A: a blicket (100% of judgments)
  • B: probably not a blicket (66% of judgments)

40
Associative learning?
  • RW with very high learning rate.
  • Trials:
  • Trial 1: A and B on detector → detector active
  • Trial 2: A on detector → detector active
  • Output causal strengths:
  • wA = 1.0
  • wB = 0.5

41
A deductive inference?
  • Causal law: the detector activates if and only if one
    or more objects on top of it are blickets.
  • Premises:
  • Trial 1: A and B on detector → detector active
  • Trial 2: A on detector → detector active
  • Conclusions deduced from premises and causal law:
  • A: a blicket
  • B: can't tell

42
Graphical models tutorial
  • Idea of a graphical model
  • Conditional dependence and independence
  • Markovian learning
  • Bayesian learning
  • The need for theories

43
Bayesian causal inference
  • Hypotheses: h00, h10, h01, h11

[Diagrams: the four candidate graphs over A, B, and E. h00 has no edge into E; h10 has A → E ("A is a blicket"); h01 has B → E; h11 has both A → E and B → E.]

A = 1 if block A is on the detector, else 0
B = 1 if block B is on the detector, else 0
E = 1 if the detector is active, else 0
44
Bayesian causal inference
  • Hypotheses: h00, h10, h01, h11 (the same four graphs over A, B, and E)

Causal law: E = 1 if and only if A = 1 or B = 1.
Data: d1: A = 1, B = 1, E = 1;  d2: A = 1, B = 0, E = 1
45
A theory-based inference
  • A block's position may affect detector activation,
    but not vice versa. All block positions are
    independent of each other.
  • No hypotheses with E → A, E → B, or A → B
  • Whether any block is a blicket is independent of
    the identities of other blocks.
  • Prior probability: P(hij) = q^i (1 − q)^(1 − i) · q^j (1 − q)^(1 − j)
  • Causal law: the detector activates if and only if
    one or more blickets are on it.
  • Likelihood: P(d | h) = 1 if the causal law is
    satisfied, else 0.

46
  • Assume some independent probability q that each of A and B is a blicket.
  • Priors:
      P(h00) = (1 − q)(1 − q)     P(h10) = q(1 − q)
      P(h01) = (1 − q)q           P(h11) = q · q
  • Likelihood: P(d | h) = 1 if the causal law is satisfied, else 0.
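
As a worked illustration, using the q = 1/3 value that appears on the following slides:

```latex
P(h_{00}) = (1-q)^2 = \tfrac{4}{9}, \qquad
P(h_{10}) = P(h_{01}) = q(1-q) = \tfrac{2}{9}, \qquad
P(h_{11}) = q^2 = \tfrac{1}{9}.
```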
47
  • Data: d1: A = 1, B = 1, E = 1
  • Posteriors (proportional to prior × likelihood):
      P(h00 | d1) ∝ 0              P(h10 | d1) ∝ q(1 − q)
      P(h01 | d1) ∝ (1 − q)q       P(h11 | d1) ∝ q · q
  • Inferences: e.g., if q = 1/3 (a worked computation follows slide 48 below)
48
  • Data: d1: A = 1, B = 1, E = 1
  •          d2: A = 1, B = 0, E = 1
  • Posteriors (proportional to prior × likelihood):
      P(h00 | d1, d2) ∝ 0          P(h10 | d1, d2) ∝ q(1 − q)
      P(h01 | d1, d2) ∝ 0          P(h11 | d1, d2) ∝ q · q
  • Inferences: e.g., if q = 1/3 (see the sketch below)
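
A minimal enumeration sketch of these posterior computations (the code and normalization are mine; the hypothesis space, priors, and 0/1 likelihood follow the slides):

```python
from itertools import product

q = 1/3  # prior probability that each block is a blicket (value from the slides)

# Each datum is (A on detector, B on detector, detector active).
d1 = (1, 1, 1)
d2 = (1, 0, 1)

def likelihood(a_blicket, b_blicket, data):
    # Causal law: detector active iff at least one blicket is on it (0/1 likelihood).
    return all(bool((a and a_blicket) or (b and b_blicket)) == bool(e)
               for a, b, e in data)

def posteriors(data):
    unnorm = {}
    for i, j in product([0, 1], repeat=2):          # hypotheses h00, h01, h10, h11
        prior = (q if i else 1 - q) * (q if j else 1 - q)
        unnorm[(i, j)] = prior * likelihood(bool(i), bool(j), data)
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

for data in ([d1], [d1, d2]):
    post = posteriors(data)
    p_a = post[(1, 0)] + post[(1, 1)]               # P(A is a blicket | data)
    p_b = post[(0, 1)] + post[(1, 1)]               # P(B is a blicket | data)
    print(f"P(A blicket) = {p_a:.2f}, P(B blicket) = {p_b:.2f}")
# After d1 alone:   P(A blicket) = 0.60, P(B blicket) = 0.60
# After d1 and d2:  P(A blicket) = 1.00, P(B blicket) = 0.33
```

With q = 1/3 this reproduces the inference on the next slide: A is a blicket with certainty, and B is probably not a blicket (about 67% confidence that it is not).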
49
Bayesian causal inference
  • Causal law: the detector activates if and only if one
    or more objects on top of it are blickets.
  • Prior:
  • Blickets are rare (e.g., q = 1/3).
  • Data:
  • Trial 1: A and B on detector → detector active
  • Trial 2: A on detector → detector active
  • Inferences:
  • A: definitely a blicket (100% confidence)
  • B: probably not a blicket (67% confidence)

50
Backwards blocking
  • Two objects: A and B
  • Trial 1: A and B on detector → detector active
  • Trial 2: A on detector → detector active
  • 4-year-olds judge whether each object is a blicket
  • A: a blicket (100% of judgments)
  • B: probably not a blicket (66% of judgments)

51
Manipulating the prior
I. Pre-training phase: Blickets are rare . . . .
II. Backwards blocking phase
  • After each trial, adults judge the probability
    that each object is a blicket.
52
  • Rare condition: First observe 12 objects on the
    detector, of which 2 set it off.

53
  • Common condition: First observe 12 objects on the
    detector, of which 10 set it off.

54
Inferences from ambiguous data
I. Pre-training phase: Blickets are rare . . . .
II. Two trials: A, B → detector; B, C → detector
  • After each trial, adults judge the probability
    that each object is a blicket.

55
Same domain theory generates hypothesis space for
3 objects
  • Hypotheses: h000, h100, h010, h001, h110, h011, h101, h111
    (one graph for each subset of {A, B, C} with an edge into E)
  • Likelihoods:

P(E = 1 | A, B, C; h) = 1 if (A = 1 and A → E exists),
                          or (B = 1 and B → E exists),
                          or (C = 1 and C → E exists);
                        else 0.

[Diagrams: the eight candidate graphs over A, B, C, and E.]
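
A sketch extending the two-object computation to this three-object hypothesis space (the assumption that both ambiguous trials activate the detector, and the q = 1/3 rare-condition prior, are taken from the surrounding slides; the code itself is mine):

```python
from itertools import product

q = 1/3  # "rare" prior probability that each object is a blicket

# Ambiguous trials: (objects on the detector, detector active).
trials = [({"A", "B"}, 1), ({"B", "C"}, 1)]

objects = ("A", "B", "C")
unnorm = {}
for bits in product([0, 1], repeat=3):               # h000 ... h111
    blickets = {o for o, bit in zip(objects, bits) if bit}
    prior = 1.0
    for bit in bits:
        prior *= q if bit else 1 - q
    consistent = all(int(bool(blickets & on)) == e for on, e in trials)
    unnorm[bits] = prior * consistent

z = sum(unnorm.values())
for i, o in enumerate(objects):
    p = sum(v for bits, v in unnorm.items() if bits[i]) / z
    print(f"P({o} is a blicket) = {p:.2f}")
# With q = 1/3: P(B) is highest (about 0.82), while A and C get graded,
# intermediate probabilities (about 0.45 each): partial explaining away.
```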
56
  • Rare condition: First observe 12 objects on the
    detector, of which 2 set it off.

57
Sensitivity analysis
  • How much work does the domain theory do?
  • Alternative model: Bayes with a noisy sufficient
    causes theory (Cheng).
  • How much work does Bayes do?
  • Alternative model: Constraint-based learning
    using χ² measures of independence, with fictional
    sample sizes (Glymour).

58
[Plots: model predictions, Bayes with the correct theory vs. Bayes with a noisy sufficient causes theory]
59
[Plots: model predictions, Bayes with the correct theory vs. constraint-based (Markov) learning with fictional sample sizes]
60
Summary of blicket studies
  • Given unambiguous data, people make all-or-none
    causal inferences, with complete explaining away.
  • Given ambiguous data, people make graded causal
    inferences, with partial explaining away.
  • Only Bayes + correct theory matches this full
    spectrum from certainty to uncertainty.

61
Conclusions
  • Explain how people can reliably acquire true
    causal beliefs given very limited data
  • Prior causal knowledge: Domain theory
  • Causal inference procedure: Bayes
  • Domain theory generates the hypothesis space that
    allows Bayes to draw reasonable causal
    inferences.

62
Scope of Bayesian causal inference
  • One-shot causal inferences (e.g., blickets)
  • Causal strength judgments
  • Inferring hidden variables
  • Perception of causality
  • Perception of hidden causes
  • Learning causal theories