Logistics

Transcript and Presenter's Notes



1
Logistics
  • Class size? Who is new? Who is listening?
  • Everyone on Athena mailing list
    concepts-and-theories? If not, write to me.
  • Everyone on Stellar yet? If not, write to Melissa
    Yeh (mjyeh@mit.edu).
  • Interest in having a printed course pack, even if
    a few readings get changed?

2
Plan for tonight
  • Why be Bayesian?
  • Informal introduction to learning as
    probabilistic inference
  • Formal introduction to probabilistic inference
  • A little bit of mathematical psychology
  • An introduction to Bayes nets

3
Plan for tonight
  • Why be Bayesian?
  • Informal introduction to learning as
    probabilistic inference
  • Formal introduction to probabilistic inference
  • A little bit of mathematical psychology
  • An introduction to Bayes nets

4
Virtues of Bayesian framework
  • Generates principled models with strong
    explanatory and descriptive power.

5
Virtues of Bayesian framework
  • Generates principled models with strong
    explanatory and descriptive power.
  • Unifies models of cognition across tasks and
    domains.
  • Categorization
  • Concept learning
  • Word learning
  • Inductive reasoning
  • Causal inference
  • Conceptual change
  • Biology
  • Physics
  • Psychology
  • Language
  • . . .

6
Virtues of Bayesian framework
  • Generates principled models with strong
    explanatory and descriptive power.
  • Unifies models of cognition across tasks and
    domains.
  • Explains which processing models work, and why.
  • Associative learning
  • Connectionist networks
  • Similarity to examples
  • Toolkit of simple heuristics

7
Virtues of Bayesian framework
  • Generates principled models with strong
    explanatory and descriptive power.
  • Unifies models of cognition across tasks and
    domains.
  • Explains which processing models work, and why.
  • Allows us to move beyond classic dichotomies.
  • Symbols (rules, logic, hierarchies, relations)
    versus Statistics
  • Domain-general versus Domain-specific
  • Nature versus Nurture

8
Virtues of Bayesian framework
  • Generates principled models with strong
    explanatory and descriptive power.
  • Unifies models of cognition across tasks and
    domains.
  • Explains which processing models work, and why.
  • Allows us to move beyond classic dichotomies.
  • A framework for understanding theory-based
    cognition
  • How are theories used to learn about the
    structure of the world?
  • How are theories acquired?

9
Rational statistical inference (Bayes, Laplace)
  • Fundamental question
  • How do we update beliefs in light of data?
  • Fundamental (and only) assumption
  • Represent degrees of belief as probabilities.
  • The answer
  • Mathematics of probability theory.

10
What does probability mean?
  • Frequentists: Probability as expected frequency
  • P(A) = 1: A will always occur.
  • P(A) = 0: A will never occur.
  • 0.5 < P(A) < 1: A will occur more often than not.
  • Subjectivists: Probability as degree of belief
  • P(A) = 1: believe A is true.
  • P(A) = 0: believe A is false.
  • 0.5 < P(A) < 1: believe A is more likely to be
    true than false.

11
What does probability mean?
  • Frequentists: Probability as expected frequency
  • P(heads) = 0.5: If we flip 100 times, we
    expect to see about 50 heads.
  • Subjectivists: Probability as degree of belief
  • P(heads) = 0.5: On the next flip, it's an
    even bet whether it comes up heads or tails.
  • P(rain tomorrow) = 0.8
  • P(Saddam Hussein is dead) = 0.1
  • . . .

12
Is subjective probability cognitively viable?
  • Evolutionary psychologists (Gigerenzer, Cosmides,
    Tooby, Pinker) argue it is not.

13
  • To understand the design of statistical
    inference mechanisms, then, one needs to examine
    what form inductive-reasoning problems -- and the
    information relevant to solving them -- regularly
    took in ancestral environments. Asking for
    the probability of a single event seems
    unexceptionable in the modern world, where we are
    bombarded with numerically expressed statistical
    information, such as weather forecasts telling us
    there is a 60% chance of rain today. In
    ancestral environments, the only external
    database available from which to reason
    inductively was one's own observations and,
    possibly, those communicated by the handful of
    other individuals with whom one lived.
  • The probability of a single event cannot be
    observed by an individual, however. Single
    events either happen or they don't -- either it
    will rain today or it will not. Natural
    selection cannot build cognitive mechanisms
    designed to reason about, or receive as input,
    information in a format that did not regularly
    exist.

(Brase, Cosmides and Tooby, 1998)
14
Is subjective probability cognitively viable?
  • Evolutionary psychologists (Gigerenzer, Cosmides,
    Tooby, Pinker) argue it is not.
  • Reasons to think it is
  • Intuitions are old and potentially universal
    (Aristotle, the Talmud).
  • Represented in semantics (and syntax?) of natural
    language.
  • Extremely useful.

15
Why be subjectivist?
  • Often need to make inferences about singular
    events
  • e.g., How likely is it to rain tomorrow?
  • Cox Axioms
  • A formal model of common sense
  • Dutch Book / Survival of the Fittest
  • If your beliefs do not accord with the laws of
    probability, then you can always be out-gambled
    by someone whose beliefs do so accord.
  • Provides a theory of learning
  • A common currency for combining prior knowledge
    and the lessons of experience.

16
Cox Axioms (via Jaynes)
  • Degrees of belief are represented by real
    numbers.
  • Qualitative correspondence with common sense,
    e.g.
  • Consistency
  • If a conclusion can be reasoned in more than one
    way, then every possible way must lead to the
    same result.
  • All available evidence should be taken into
    account when inferring a degree of belief.
  • Equivalent states of knowledge should be
    represented with equivalent degrees of belief.
  • Accepting these axioms implies Bel can be
    represented as a probability measure.

17
Plan for tonight
  • Why be Bayesian?
  • Informal introduction to learning as
    probabilistic inference
  • Formal introduction to probabilistic inference
  • A little bit of mathematical psychology
  • An introduction to Bayes nets

18
Example: flipping coins
  • Flip a coin 10 times and see 5 heads, 5 tails.
  • P(heads) on next flip? 50%
  • Why? 50% = 5 / (5+5) = 5/10.
  • Future will be like the past.
  • Suppose we had seen 4 heads and 6 tails.
  • P(heads) on next flip? Closer to 50% than to 40%.
  • Why? Prior knowledge.

19
Example: flipping coins
  • Represent prior knowledge as fictional
    observations F.
  • E.g., F = 1000 heads, 1000 tails: strong
    expectation that any new coin will be fair.
  • After seeing 4 heads, 6 tails, P(heads) on next
    flip = 1004 / (1004+1006) = 49.95%
  • E.g., F = 3 heads, 3 tails: weak expectation
    that any new coin will be fair.
  • After seeing 4 heads, 6 tails, P(heads) on next
    flip = 7 / (7+9) = 43.75%. Prior knowledge too
    weak.
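
A minimal sketch of this pseudo-count rule in Python (the function name is ours, not the slides'; the calls reproduce the numbers above and the thumbtack example on the next slide):

    def predict_heads(obs_heads, obs_tails, fict_heads=0, fict_tails=0):
        """P(heads on next flip): pool real observations with fictional observations F."""
        heads = obs_heads + fict_heads
        tails = obs_tails + fict_tails
        return heads / (heads + tails)

    print(predict_heads(4, 6, 1000, 1000))  # 0.4995  strong fair-coin prior (49.95%)
    print(predict_heads(4, 6, 3, 3))        # 0.4375  weak fair-coin prior (43.75%)
    print(predict_heads(2, 0, 4, 3))        # 0.667   thumbtack prior, after 2 heads (about 67%)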

20
Example: flipping thumbtacks
  • Represent prior knowledge as fictional
    observations F.
  • E.g., F = 4 heads, 3 tails: weak expectation
    that tacks are slightly biased towards heads.
  • After seeing 2 heads, 0 tails, P(heads) on next
    flip = 6 / (6+3) = 67%.
  • Some prior knowledge is always necessary to avoid
    jumping to hasty conclusions.
  • Suppose F = 0 heads, 0 tails: after seeing 2 heads,
    0 tails, P(heads) on next flip = 2 / (2+0) = 100%.

21
Origin of prior knowledge
  • Tempting answer: prior experience
  • Suppose you have previously seen 2000 coin flips:
    1000 heads, 1000 tails.
  • By assuming all coins (and flips) are alike,
    these observations of other coins are as good as
    actual observations of the present coin.

22
Problems with simple empiricism
  • Haven't really seen 2000 coin flips, or any
    thumbtack flips.
  • Prior knowledge is stronger than raw experience
    justifies.
  • Haven't seen exactly equal numbers of heads and
    tails.
  • Prior knowledge is smoother than raw experience
    justifies.
  • Should be a difference between observing 2000
    flips of a single coin versus observing 10 flips
    each for 200 coins, or 1 flip each for 2000
    coins.
  • Prior knowledge is more structured than raw
    experience.

23
A simple theory
  • Coins are manufactured by a standardized
    procedure that is effective but not perfect.
  • Justifies generalizing from previous coins to the
    present coin.
  • Justifies smoother and stronger prior than raw
    experience alone.
  • Explains why seeing 10 flips each for 200 coins
    is more valuable than seeing 2000 flips of one
    coin.
  • Tacks are asymmetric, and manufactured to less
    exacting standards.

24
Limitations
  • Can all domain knowledge be represented so
    simply, in terms of an equivalent number of
    fictional observations?
  • Suppose you flip a coin 25 times and get all
    heads. Something funny is going on.
  • But with F = 1000 heads, 1000 tails, P(heads) on
    next flip = 1025 / (1025+1000) = 50.6%. Looks
    like nothing unusual.

25
Plan for tonight
  • Why be Bayesian?
  • Informal introduction to learning as
    probabilistic inference
  • Formal introduction to probabilistic inference
  • A little bit of mathematical psychology
  • An introduction to Bayes nets

26
Basics
  • Propositions: A, B, C, . . .
  • Negation
  • Logical operators: and, or
  • Obey classical logic, e.g.,

27
Basics
  • Conservation of belief
  • Joint probability
  • For independent propositions
  • More generally
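
Presumably these bullets refer to the standard identities, written here in LaTeX:

    P(A) + P(\neg A) = 1                                   % conservation of belief
    P(A \wedge B) = P(A)\,P(B)                             % independent propositions
    P(A \wedge B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)  % more generally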

28
Basics
  • Example
  • A = Heads on flip 2
  • B = Tails on flip 2

29
Basics
  • All probabilities should be conditioned on
    background knowledge K, e.g.,
  • All the same rules hold conditioned on any K,
    e.g.,
  • Often background knowledge will be implicit,
    brought in as needed.

30
Bayesian inference
  • Definition of conditional probability
  • Bayes theorem

31
Bayesian inference
  • Definition of conditional probability
  • Bayes rule
  • Posterior probability
  • Prior probability
  • Likelihood
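
In standard form, matching the labels above:

    P(H \mid D) = \frac{P(H \wedge D)}{P(D)}       % definition of conditional probability
    P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}   % Bayes' rule
    % posterior: P(H|D), prior: P(H), likelihood: P(D|H)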

32
Bayesian inference
  • Bayes rule
  • What makes a good scientific argument? P(H|D) is
    high if:
  • Hypothesis is plausible: P(H) is high
  • Hypothesis strongly predicts the observed data:
    P(D|H) is high
  • Data are surprising: P(D) is low

33
Bayesian inference
  • Deriving a more useful version

34
Bayesian inference
  • Deriving a more useful version

35
Bayesian inference
  • Deriving a more useful version

Conditionalization
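
Presumably the "more useful version" expands P(D) by conditionalization (the law of total probability over H and not-H):

    P(H \mid D) = \frac{P(D \mid H)\,P(H)}
                       {P(D \mid H)\,P(H) + P(D \mid \neg H)\,P(\neg H)}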
36
Bayesian inference
  • Deriving a more useful version

37
Bayesian inference
  • Deriving a more useful version

38
Bayesian inference
  • Deriving a more useful version

39
Bayesian inference
  • Deriving a more useful version

40
Random variables
  • Random variable X denotes a set of mutually
    exclusive and exhaustive propositions (states of
    the world)
  • Bayes theorem for random variables

41
Random variables
  • Random variable X denotes a set of mutually
    exclusive and exhaustive propositions (states of
    the world)
  • Bayes rule for more than two hypotheses
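
For hypotheses h_1, . . . , h_n (the mutually exclusive and exhaustive values of X), the standard form is:

    P(h_i \mid d) = \frac{P(d \mid h_i)\,P(h_i)}{\sum_{j=1}^{n} P(d \mid h_j)\,P(h_j)}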

42
Sherlock Holmes
  • How often have I said to you that when you have
    eliminated the impossible, whatever remains,
    however improbable, must be the truth? (The Sign
    of the Four)
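
Holmes' maxim in these terms: if every alternative hypothesis has zero likelihood ("eliminated the impossible"), the remaining one gets all the posterior, however small its prior ("however improbable"). A sketch, using the multi-hypothesis form above:

    \text{If } P(d \mid h_j) = 0 \text{ for all } j \neq i, \text{ then }
    P(h_i \mid d) = \frac{P(d \mid h_i)\,P(h_i)}{P(d \mid h_i)\,P(h_i) + 0} = 1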

43
Sherlock Holmes
  • How often have I said to you that when you have
    eliminated the impossible, whatever remains,
    however improbable, must be the truth? (The Sign
    of the Four)

44
Sherlock Holmes
  • How often have I said to you that when you have
    eliminated the impossible, whatever remains,
    however improbable, must be the truth? (The Sign
    of the Four)

45
Sherlock Holmes
  • How often have I said to you that when you have
    eliminated the impossible, whatever remains,
    however improbable, must be the truth? (The Sign
    of the Four)

46
Plan for tonight
  • Why be Bayesian?
  • Informal introduction to learning as
    probabilistic inference
  • Formal introduction to probabilistic inference
  • A little bit of mathematical psychology
  • An introduction to Bayes nets

47
Representativeness in reasoning
  • Which sequence is more likely to be produced by
    flipping a fair coin?
  • HHTHT
  • HHHHH

48
A reasoning fallacy
  • Kahneman & Tversky: people judge the probability
    of an outcome based on the extent to which it is
    representative of the generating process.

49
Predictive versus inductive reasoning
Hypothesis H
Data D
50
Predictive versus inductive reasoning
Prediction: D given H
51
Predictive versus inductive reasoning
Prediction: D given H
Induction: H given D
52
Bayes Rule in odds form
  • P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] x [P(H1) / P(H2)]
  • D: data
  • H1, H2: models
  • P(H1|D): posterior probability that model 1
    generated the data.
  • P(D|H1): likelihood of data given model 1
  • P(H1): prior probability that model 1 generated
    the data

53
Bayesian analysis of coin flipping
  • D = HHTHT
  • H1, H2 = fair coin, trick "all heads" coin.
  • P(D|H1) = 1/32; P(H1) = 999/1000
  • P(D|H2) = 0; P(H2) = 1/1000
  • P(H1|D) / P(H2|D) = infinity

P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] x [P(H1) / P(H2)]
54
Bayesian analysis of coin flipping
  • D = HHHHH
  • H1, H2 = fair coin, trick "all heads" coin.
  • P(D|H1) = 1/32; P(H1) = 999/1000
  • P(D|H2) = 1; P(H2) = 1/1000
  • P(H1|D) / P(H2|D) = 999/32, about 30:1

P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] x [P(H1) / P(H2)]
55
Bayesian analysis of coin flipping
  • D = HHHHHHHHHH
  • H1, H2 = fair coin, trick "all heads" coin.
  • P(D|H1) = 1/1024; P(H1) = 999/1000
  • P(D|H2) = 1; P(H2) = 1/1000
  • P(H1|D) / P(H2|D) = 999/1024, about 1:1

P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] x [P(H1) / P(H2)]
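
A quick check of the three cases above in Python (a sketch; the helper name is ours, and the default priors are the slides' 999/1000 vs. 1/1000):

    def posterior_odds(lik1, lik2, prior1=999/1000, prior2=1/1000):
        """Bayes rule in odds form: P(H1|D) / P(H2|D)."""
        if lik2 * prior2 == 0:
            return float('inf')
        return (lik1 / lik2) * (prior1 / prior2)

    print(posterior_odds(1/32, 0))     # inf    D = HHTHT rules out the trick coin
    print(posterior_odds(1/32, 1))     # ~31    D = HHHHH still favors the fair coin
    print(posterior_odds(1/1024, 1))   # ~0.98  ten heads in a row: roughly even odds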
56
The role of theories
  • The fact that HHTHT looks representative of a
    fair coin and HHHHH does not, reflects our
    implicit theories of how the world works.
  • Easy to imagine how a trick all-heads coin could
    work: high prior probability.
  • Hard to imagine how a trick HHTHT coin could
    work: low prior probability.

57
Plan for tonight
  • Why be Bayesian?
  • Informal introduction to learning as
    probabilistic inference
  • Formal introduction to probabilistic inference
  • A little bit of mathematical psychology
  • An introduction to Bayes nets

58
Scaling up
  • Three binary variables: Cavity, Toothache, Catch
    (whether the dentist's probe catches in your tooth).

59
Scaling up
  • Three binary variables: Cavity, Toothache, Catch
    (whether the dentist's probe catches in your tooth).
  • With n pieces of evidence, we need 2^(n+1)
    conditional probabilities.
  • Here n = 2. Realistically, many more: X-ray, diet,
    oral hygiene, personality, . . .

60
Conditional independence
  • All three variables are dependent, but Toothache
    and Catch are independent given the presence or
    absence of Cavity.
  • Both Toothache and Catch are caused by Cavity,
    but via independent causal mechanisms.
  • In probabilistic terms:
  • With n pieces of evidence, x1, . . . , xn, we need
    2n conditional probabilities.
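
Presumably the probabilistic statement is the standard one:

    P(\text{Toothache}, \text{Catch} \mid \text{Cavity})
      = P(\text{Toothache} \mid \text{Cavity})\,P(\text{Catch} \mid \text{Cavity})

and, with n conditionally independent pieces of evidence,

    P(x_1, \ldots, x_n \mid \text{Cavity}) = \prod_{i=1}^{n} P(x_i \mid \text{Cavity}),

which needs only the 2n conditionals P(x_i | Cavity) and P(x_i | not Cavity).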

61
A simple Bayes net
  • Graphical representation of relations between a
    set of random variables
  • Causal interpretation: independent local
    mechanisms
  • Probabilistic interpretation: factorizing complex
    terms
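
A minimal sketch of this factorization for the Cavity/Toothache/Catch net; the numerical parameters are made up for illustration:

    # Made-up parameters: one prior plus 2n = 4 conditionals.
    p_cavity = 0.1
    p_ache_given_cavity = {True: 0.6, False: 0.05}   # P(Toothache | Cavity)
    p_catch_given_cavity = {True: 0.9, False: 0.2}   # P(Catch | Cavity)

    def joint(cavity, ache, catch):
        """P(Cavity, Toothache, Catch) = P(Cavity) P(Toothache|Cavity) P(Catch|Cavity)."""
        pc = p_cavity if cavity else 1 - p_cavity
        pa = p_ache_given_cavity[cavity] if ache else 1 - p_ache_given_cavity[cavity]
        pk = p_catch_given_cavity[cavity] if catch else 1 - p_catch_given_cavity[cavity]
        return pc * pa * pk

    # Any inference follows, e.g. P(Cavity | Toothache) by summing out Catch:
    num = sum(joint(True, True, catch) for catch in (True, False))
    den = num + sum(joint(False, True, catch) for catch in (True, False))
    print(num / den)   # about 0.57 with these made-up numbers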

62
A more complex system
Nodes: Battery, Radio, Ignition, Gas, Starts, On time to work
  • Joint distribution sufficient for any inference
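
Assuming the usual edges for this example (Battery to Radio, Battery to Ignition, Ignition and Gas to Starts, Starts to On time to work), the joint distribution factorizes as:

    P(B, R, I, G, S, O) = P(B)\,P(R \mid B)\,P(I \mid B)\,P(G)\,P(S \mid I, G)\,P(O \mid S)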

63
A more complex system
Nodes: Battery, Radio, Ignition, Gas, Starts, On time to work
  • Joint distribution sufficient for any inference

64
A more complex system
Nodes: Battery, Radio, Ignition, Gas, Starts, On time to work
  • Joint distribution sufficient for any inference
  • General inference algorithm local message passing

65
Explaining away
  • Assume the grass will be wet if and only if it rained
    last night, or if the sprinklers were left on.

66
Explaining away
Compute probability it rained last night, given
that the grass is wet
67
Explaining away
Compute probability it rained last night, given
that the grass is wet
68
Explaining away
Compute probability it rained last night, given
that the grass is wet
69
Explaining away
Compute probability it rained last night, given
that the grass is wet
70
Explaining away
Compute probability it rained last night, given
that the grass is wet
71
Explaining away
Compute probability it rained last night, given
that the grass is wet and sprinklers were left
on
72
Explaining away
Compute probability it rained last night, given
that the grass is wet and sprinklers were left
on
73
Explaining away
Discounting to prior probability.
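
A sketch of the discounting computation in Python, under the slide's assumption that the grass is wet iff it rained or the sprinklers were on; the priors are made up for illustration:

    p_rain, p_sprinkler = 0.3, 0.4   # made-up priors; Rain and Sprinkler independent

    def prior_joint(rain, sprinkler):
        return ((p_rain if rain else 1 - p_rain) *
                (p_sprinkler if sprinkler else 1 - p_sprinkler))

    # Grass is wet iff Rain or Sprinkler.
    p_wet = 1 - prior_joint(False, False)
    p_rain_given_wet = (prior_joint(True, True) + prior_joint(True, False)) / p_wet
    print(p_rain_given_wet)   # about 0.52: wet grass raises belief in rain

    # Learning the sprinklers were on explains the wet grass away:
    p_rain_given_wet_and_sprinkler = (prior_joint(True, True) /
                                      (prior_joint(True, True) + prior_joint(False, True)))
    print(p_rain_given_wet_and_sprinkler)   # 0.3: Rain is discounted back to its prior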
74
Contrast w/ spreading activation
Nodes: Rain, Sprinkler, Grass Wet
  • Observing rain, Wet becomes more active.
  • Observing grass wet, Rain and Sprinkler become
    more active.
  • Observing grass wet and sprinkler, Rain cannot
    become less active. No explaining away!
  • Excitatory links: Rain → Wet, Sprinkler → Wet

75
Contrast w/ spreading activation
Nodes: Rain, Sprinkler, Grass Wet
  • Excitatory links: Rain → Wet, Sprinkler → Wet
  • Inhibitory link: Rain ↔ Sprinkler
  • Observing grass wet, Rain and Sprinkler become
    more active.
  • Observing grass wet and sprinkler, Rain becomes
    less active explaining away.

76
Contrast w/ spreading activation
Nodes: Rain, Burst pipe, Sprinkler, Grass Wet
  • Each new variable requires more inhibitory
    connections.
  • Interactions between variables are not causal.
  • Not modular.
  • Whether a connection exists depends on what other
    connections exist, in non-transparent ways.
  • Big holism problem.
  • Combinatorial explosion.

77
Causality and the Markov property
  • Markov property Any variable is conditionally
    independent of its non-descendants, given its
    parents.
  • Example
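
One consequence, for variables X_1, . . . , X_n, is the standard Bayes-net factorization:

    P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P\big(x_i \mid \mathrm{parents}(X_i)\big)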

78
Causality and the Markov property
  • Markov property Any variable is conditionally
    independent of its non-descendants, given its
    parents.
  • Example

79
Causality and the Markov property
  • Markov property Any variable is conditionally
    independent of its non-descendants, given its
    parents.
  • Example

80
Causality and the Markov property
  • Markov property Any variable is conditionally
    independent of its non-descendants, given its
    parents.
  • Example

81
Causality and the Markov property
  • Markov property Any variable is conditionally
    independent of its non-descendants, given its
    parents.
  • Example

82
Causality and the Markov property
  • Markov property Any variable is conditionally
    independent of its non-descendants, given its
    parents.
  • Example

83
Causality and the Markov property
  • Markov property Any variable is conditionally
    independent of its non-descendants, given its
    parents.
  • Suppose we get the direction of causality wrong,
    thinking that symptoms cause diseases:
  • Does not capture the correlation between
    symptoms: we falsely believe P(Ache, Catch) =
    P(Ache) P(Catch).

Nodes: Ache, Catch, Cavity
84
Causality and the Markov property
  • Markov property Any variable is conditionally
    independent of its non-descendants, given its
    parents.
  • Suppose we get the direction of causality wrong,
    thinking that symptoms cause diseases:
  • Inserting a new arrow allows us to capture this
    correlation.
  • This model is too complex: we do not believe that

Nodes: Ache, Catch, Cavity
85
Causality and the Markov property
  • Markov property Any variable is conditionally
    independent of its non-descendants, given its
    parents.
  • Suppose we get the direction of causality wrong,
    thinking that symptoms cause diseases:
  • New symptoms require a combinatorial
    proliferation of new arrows. Too general, not
    modular, holism, yuck . . . .

Nodes: Ache, X-ray, Catch, Cavity
86
Still to come
  • Applications to models of categorization
  • More on the relation between causality and
    probability
  • Learning causal graph structures.
  • Learning causal abstractions (diseases cause
    symptoms)
  • What's missing

87
The end
88
Mathcamp data raw
89
Mathcamp data collapsed over parity
90
Zenith radio data collapsed over parity