Title: Logistics
1 Logistics
- Class size? Who is new? Who is listening?
- Everyone on the Athena mailing list concepts-and-theories? If not, write to me.
- Everyone on Stellar yet? If not, write to Melissa Yeh (mjyeh_at_mit.edu).
- Interest in having a printed course pack, even if a few readings get changed?
2 Plan for tonight
- Why be Bayesian?
- Informal introduction to learning as probabilistic inference
- Formal introduction to probabilistic inference
- A little bit of mathematical psychology
- An introduction to Bayes nets
4 Virtues of Bayesian framework
- Generates principled models with strong explanatory and descriptive power.
- Unifies models of cognition across tasks and domains:
  - Categorization
  - Concept learning
  - Word learning
  - Inductive reasoning
  - Causal inference
  - Conceptual change
  - Domains: biology, physics, psychology, language, . . .
- Explains which processing models work, and why:
  - Associative learning
  - Connectionist networks
  - Similarity to examples
  - Toolkit of simple heuristics
- Allows us to move beyond classic dichotomies:
  - Symbols (rules, logic, hierarchies, relations) versus statistics
  - Domain-general versus domain-specific
  - Nature versus nurture
- A framework for understanding theory-based cognition:
  - How are theories used to learn about the structure of the world?
  - How are theories acquired?
9 Rational statistical inference (Bayes, Laplace)
- Fundamental question: How do we update beliefs in light of data?
- Fundamental (and only) assumption: Represent degrees of belief as probabilities.
- The answer: The mathematics of probability theory.
10 What does probability mean?
- Frequentists: Probability as expected frequency
  - P(A) = 1: A will always occur.
  - P(A) = 0: A will never occur.
  - 0.5 < P(A) < 1: A will occur more often than not.
- Subjectivists: Probability as degree of belief
  - P(A) = 1: believe A is true.
  - P(A) = 0: believe A is false.
  - 0.5 < P(A) < 1: believe A is more likely to be true than false.
11 What does probability mean?
- Frequentists: Probability as expected frequency
  - P(heads) = 0.5: If we flip 100 times, we expect to see about 50 heads.
- Subjectivists: Probability as degree of belief
  - P(heads) = 0.5: On the next flip, it's an even bet whether it comes up heads or tails.
  - P(rain tomorrow) = 0.8
  - P(Saddam Hussein is dead) = 0.1
  - . . .
12 Is subjective probability cognitively viable?
- Evolutionary psychologists (Gigerenzer, Cosmides, Tooby, Pinker) argue it is not.
13 (continued)
- "To understand the design of statistical inference mechanisms, then, one needs to examine what form inductive-reasoning problems -- and the information relevant to solving them -- regularly took in ancestral environments. Asking for the probability of a single event seems unexceptionable in the modern world, where we are bombarded with numerically expressed statistical information, such as weather forecasts telling us there is a 60% chance of rain today. In ancestral environments, the only external database available from which to reason inductively was one's own observations and, possibly, those communicated by the handful of other individuals with whom one lived."
- "The probability of a single event cannot be observed by an individual, however. Single events either happen or they don't -- either it will rain today or it will not. Natural selection cannot build cognitive mechanisms designed to reason about, or receive as input, information in a format that did not regularly exist."
(Brase, Cosmides and Tooby, 1998)
14 Is subjective probability cognitively viable?
- Evolutionary psychologists (Gigerenzer, Cosmides, Tooby, Pinker) argue it is not.
- Reasons to think it is:
  - Intuitions are old and potentially universal (Aristotle, the Talmud).
  - Represented in the semantics (and syntax?) of natural language.
  - Extremely useful . . .
15 Why be subjectivist?
- Often need to make inferences about singular events, e.g., how likely is it to rain tomorrow?
- Cox Axioms: a formal model of common sense.
- Dutch Book / survival of the fittest: if your beliefs do not accord with the laws of probability, then you can always be out-gambled by someone whose beliefs do so accord.
- Provides a theory of learning: a common currency for combining prior knowledge and the lessons of experience.
16 Cox Axioms (via Jaynes)
- Degrees of belief are represented by real numbers.
- Qualitative correspondence with common sense.
- Consistency:
  - If a conclusion can be reasoned in more than one way, then every possible way must lead to the same result.
  - All available evidence should be taken into account when inferring a degree of belief.
  - Equivalent states of knowledge should be represented with equivalent degrees of belief.
- Accepting these axioms implies that Bel can be represented as a probability measure.
17 Plan for tonight
- Why be Bayesian?
- Informal introduction to learning as probabilistic inference
- Formal introduction to probabilistic inference
- A little bit of mathematical psychology
- An introduction to Bayes nets
18 Example: flipping coins
- Flip a coin 10 times and see 5 heads, 5 tails.
- P(heads) on next flip? 50%.
- Why? 50% = 5 / (5+5) = 5/10.
- Future will be like the past.
- Suppose we had seen 4 heads and 6 tails.
- P(heads) on next flip? Closer to 50% than to 40%.
- Why? Prior knowledge.
19 Example: flipping coins
- Represent prior knowledge as fictional observations F.
- E.g., F = 1000 heads, 1000 tails: strong expectation that any new coin will be fair.
  - After seeing 4 heads, 6 tails, P(heads) on next flip = 1004 / (1004+1006) ≈ 49.95%.
- E.g., F = 3 heads, 3 tails: weak expectation that any new coin will be fair.
  - After seeing 4 heads, 6 tails, P(heads) on next flip = 7 / (7+9) = 43.75%. Prior knowledge too weak.
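The fictional-observation arithmetic above can be sketched in a few lines. This is just the count ratio from the slide; the helper name `predictive` is my own choice, not from the slides.

```python
def predictive(h, t, fh, ft):
    """P(heads) on the next flip, given h heads and t tails actually
    observed, plus fh fictional heads and ft fictional tails."""
    return (h + fh) / (h + t + fh + ft)

# Strong expectation of fairness: F = 1000 heads, 1000 tails.
print(predictive(4, 6, 1000, 1000))  # 1004/2010, about 0.4995
# Weak expectation of fairness: F = 3 heads, 3 tails.
print(predictive(4, 6, 3, 3))        # 7/16 = 0.4375
```

The same function reproduces the later slides too: with F = 1000/1000, even 25 straight heads only moves the prediction to 1025/2025 ≈ 0.506, which is the limitation slide 24 points out.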
20 Example: flipping thumbtacks
- Represent prior knowledge as fictional observations F.
- E.g., F = 4 heads, 3 tails: weak expectation that tacks are slightly biased towards heads.
  - After seeing 2 heads, 0 tails, P(heads) on next flip = 6 / (6+3) ≈ 67%.
- Some prior knowledge is always necessary to avoid jumping to hasty conclusions.
- Suppose F is empty: after seeing 2 heads, 0 tails, P(heads) on next flip = 2 / (2+0) = 100%.
21 Origin of prior knowledge
- Tempting answer: prior experience.
- Suppose you have previously seen 2000 coin flips: 1000 heads, 1000 tails.
- By assuming all coins (and flips) are alike, these observations of other coins are as good as actual observations of the present coin.
22 Problems with simple empiricism
- Haven't really seen 2000 coin flips, or any thumbtack flips.
  - Prior knowledge is stronger than raw experience justifies.
- Haven't seen an exactly equal number of heads and tails.
  - Prior knowledge is smoother than raw experience justifies.
- Should be a difference between observing 2000 flips of a single coin versus observing 10 flips each for 200 coins, or 1 flip each for 2000 coins.
  - Prior knowledge is more structured than raw experience.
23 A simple theory
- Coins are manufactured by a standardized procedure that is effective but not perfect.
- Justifies generalizing from previous coins to the present coin.
- Justifies a smoother and stronger prior than raw experience alone.
- Explains why seeing 10 flips each for 200 coins is more valuable than seeing 2000 flips of one coin.
- Tacks are asymmetric, and manufactured to less exacting standards.
24 Limitations
- Can all domain knowledge be represented so simply, in terms of an equivalent number of fictional observations?
- Suppose you flip a coin 25 times and get all heads. Something funny is going on . . .
- But with F = 1000 heads, 1000 tails: P(heads) on next flip = 1025 / (1025+1000) ≈ 50.6%. Looks like nothing unusual.
25 Plan for tonight
- Why be Bayesian?
- Informal introduction to learning as probabilistic inference
- Formal introduction to probabilistic inference
- A little bit of mathematical psychology
- An introduction to Bayes nets
26 Basics
- Propositions: A, B, C, . . .
- Negation: ¬A
- Logical operators: ∧ (and), ∨ (or)
- Obey classical logic, e.g., ¬(A ∧ B) = ¬A ∨ ¬B.
27 Basics
- Conservation of belief: P(A) + P(¬A) = 1
- Joint probability: P(A ∧ B)
- For independent propositions: P(A ∧ B) = P(A) P(B)
- More generally: P(A ∧ B) = P(A) P(B|A)
28 Basics
- Example:
  - A = heads on flip 2
  - B = tails on flip 2
  - P(A ∧ B) = 0, but P(A) P(B) = 1/4, so A and B are not independent.
29 Basics
- All probabilities should be conditioned on background knowledge K, e.g., P(A|K).
- All the same rules hold conditioned on any K, e.g., P(A|K) + P(¬A|K) = 1.
- Often background knowledge will be implicit, brought in as needed.
30 Bayesian inference
- Definition of conditional probability: P(A|B) = P(A ∧ B) / P(B)
- Bayes' theorem: P(H|D) = P(D|H) P(H) / P(D)
31 Bayesian inference
- Definition of conditional probability: P(A|B) = P(A ∧ B) / P(B)
- Bayes' rule: P(H|D) = P(D|H) P(H) / P(D)
  - Posterior probability: P(H|D)
  - Prior probability: P(H)
  - Likelihood: P(D|H)
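A minimal numeric check of Bayes' rule as just stated. The three input numbers are illustrative values I picked, not figures from the slides.

```python
def posterior(prior, likelihood, evidence):
    """P(H|D) = P(D|H) P(H) / P(D)."""
    return likelihood * prior / evidence

p_h = 0.01          # P(H): the hypothesis is fairly implausible a priori
p_d_given_h = 0.9   # P(D|H): the hypothesis strongly predicts the data
p_d = 0.05          # P(D): the data are fairly surprising overall
print(posterior(p_h, p_d_given_h, p_d))  # 0.18
```

Note how the surprising data (small P(D) in the denominator) boost the posterior well above the 1% prior, which is exactly the "good scientific argument" point on the next slide.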
32 Bayesian inference
- Bayes' rule: P(H|D) = P(D|H) P(H) / P(D)
- What makes a good scientific argument? P(H|D) is high if:
  - The hypothesis is plausible: P(H) is high.
  - The hypothesis strongly predicts the observed data: P(D|H) is high.
  - The data are surprising: P(D) is low.
33 Bayesian inference
- Deriving a more useful version:
  - Start from Bayes' rule: P(H|D) = P(D|H) P(H) / P(D).
  - Conditionalization (conservation of belief, given the data): P(H|D) + P(¬H|D) = 1.
  - So P(D) = P(D|H) P(H) + P(D|¬H) P(¬H).
  - Giving: P(H|D) = P(D|H) P(H) / [P(D|H) P(H) + P(D|¬H) P(¬H)].
40 Random variables
- Random variable X denotes a set of mutually exclusive, exhaustive propositions (states of the world): x1, . . . , xn.
- Bayes' theorem for random variables: P(X = x | D) = P(D | X = x) P(X = x) / Σx' P(D | X = x') P(X = x')
41 Random variables
- Random variable X denotes a set of mutually exclusive, exhaustive propositions (states of the world).
- Bayes' rule for more than two hypotheses: P(hi|d) = P(d|hi) P(hi) / Σj P(d|hj) P(hj)
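The multi-hypothesis form above is just "multiply prior by likelihood, then normalize." A sketch with illustrative priors and likelihoods (the numbers are my own, not from the slides):

```python
def posterior_over(priors, likelihoods):
    """P(h_i|d) = P(d|h_i) P(h_i) / sum_j P(d|h_j) P(h_j)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(joint)                 # P(d): the normalizing constant
    return [j / z for j in joint]

# Three mutually exclusive, exhaustive hypotheses.
post = posterior_over([0.5, 0.3, 0.2], [0.1, 0.7, 0.0])
print(post)  # sums to 1; the zero-likelihood hypothesis gets posterior 0
```

A hypothesis with zero likelihood is eliminated outright, however large its prior, which is the Sherlock Holmes maxim on the next slide.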
42 Sherlock Holmes
- "How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?" (The Sign of the Four)
- In Bayesian terms: if P(d|hj) = 0 for every hypothesis but one, then P(hi|d) = 1 for the remaining hi, no matter how low its prior P(hi).
46 Plan for tonight
- Why be Bayesian?
- Informal introduction to learning as probabilistic inference
- Formal introduction to probabilistic inference
- A little bit of mathematical psychology
- An introduction to Bayes nets
47 Representativeness in reasoning
- Which sequence is more likely to be produced by flipping a fair coin?
  - HHTHT
  - HHHHH
48 A reasoning fallacy
- Kahneman and Tversky: people judge the probability of an outcome based on the extent to which it is representative of the generating process.
49 Predictive versus inductive reasoning
- Hypothesis H, data D.
- Prediction: P(D|H), the probability of the data given the hypothesis.
- Induction: P(H|D), the probability of the hypothesis given the data.
52 Bayes' rule in odds form
- P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] x [P(H1) / P(H2)]
- D: data
- H1, H2: models
- P(H1|D): posterior probability that model 1 generated the data
- P(D|H1): likelihood of the data given model 1
- P(H1): prior probability that model 1 generated the data
53 Bayesian analysis of coin flipping
- D = HHTHT
- H1: fair coin; H2: trick all-heads coin.
- P(D|H1) = 1/32; P(H1) = 999/1000
- P(D|H2) = 0; P(H2) = 1/1000
- P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] x [P(H1) / P(H2)] = infinity
54 Bayesian analysis of coin flipping
- D = HHHHH
- H1: fair coin; H2: trick all-heads coin.
- P(D|H1) = 1/32; P(H1) = 999/1000
- P(D|H2) = 1; P(H2) = 1/1000
- P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] x [P(H1) / P(H2)] = 999/32 ≈ 30
55 Bayesian analysis of coin flipping
- D = HHHHHHHHHH
- H1: fair coin; H2: trick all-heads coin.
- P(D|H1) = 1/1024; P(H1) = 999/1000
- P(D|H2) = 1; P(H2) = 1/1000
- P(H1|D) / P(H2|D) = [P(D|H1) / P(D|H2)] x [P(H1) / P(H2)] = 999/1024 ≈ 1
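The three coin analyses above can be reproduced with one small function. The likelihood model (fair coin versus a deterministic all-heads coin) and the 999:1 prior are taken from the slides; the function name is my own.

```python
def posterior_odds(data, p_h1=999/1000, p_h2=1/1000):
    """Posterior odds P(H1|D)/P(H2|D): fair coin (H1) vs all-heads coin (H2).

    data is a string of 'H' and 'T' outcomes, e.g. 'HHTHT'."""
    like_h1 = 0.5 ** len(data)                    # fair coin: each flip 1/2
    like_h2 = 1.0 if set(data) <= {'H'} else 0.0  # trick coin: heads only
    if like_h2 == 0.0:
        return float('inf')                       # trick coin eliminated
    return (like_h1 / like_h2) * (p_h1 / p_h2)

print(posterior_odds('HHTHT'))        # inf: a tail rules out the trick coin
print(posterior_odds('HHHHH'))        # 999/32, roughly 30:1 for fair
print(posterior_odds('HHHHHHHHHH'))   # 999/1024, roughly even odds
```

Each extra head halves the odds in favor of the fair coin, so after ten heads the 999:1 prior has been almost exactly cancelled.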
56 The role of theories
- The fact that HHTHT looks representative of a fair coin and HHHHH does not reflects our implicit theories of how the world works.
- Easy to imagine how a trick all-heads coin could work: high prior probability.
- Hard to imagine how a trick HHTHT coin could work: low prior probability.
57 Plan for tonight
- Why be Bayesian?
- Informal introduction to learning as probabilistic inference
- Formal introduction to probabilistic inference
- A little bit of mathematical psychology
- An introduction to Bayes nets
58 Scaling up
- Three binary variables: Cavity, Toothache, Catch (whether the dentist's probe catches in your tooth).
59 Scaling up
- Three binary variables: Cavity, Toothache, Catch (whether the dentist's probe catches in your tooth).
- With n pieces of evidence, we need 2^(n+1) conditional probabilities.
- Here n = 2. Realistically, many more: X-ray, diet, oral hygiene, personality, . . .
60 Conditional independence
- All three variables are dependent, but Toothache and Catch are independent given the presence or absence of Cavity.
- Both Toothache and Catch are caused by Cavity, but via independent causal mechanisms.
- In probabilistic terms: P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity).
- With n pieces of evidence x1, . . . , xn, we need only 2n conditional probabilities: P(xi | Cavity) and P(xi | ¬Cavity) for each xi.
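The parameter-count contrast between slides 59 and 60 is easy to tabulate. The exact counts depend on bookkeeping conventions (whether the prior P(Cavity) is included, and that the rows of each table must sum to one); the convention below is one reasonable choice.

```python
def full_joint_params(n):
    """Free parameters in a full joint over Cavity plus n binary
    evidence variables: 2^(n+1) entries, minus 1 for normalization."""
    return 2 ** (n + 1) - 1

def cond_indep_params(n):
    """With evidence conditionally independent given Cavity:
    P(Cavity), plus P(x_i|Cavity) and P(x_i|no Cavity) per variable."""
    return 2 * n + 1

for n in (2, 10, 30):
    print(n, full_joint_params(n), cond_indep_params(n))
# exponential growth vs linear growth in the number of evidence variables
```

This linear-versus-exponential gap is the practical argument for Bayes nets in the slides that follow.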
61 A simple Bayes net
- Graphical representation of relations between a set of random variables.
- Causal interpretation: independent local mechanisms.
- Probabilistic interpretation: factorizing complex terms, e.g., P(Cavity, Toothache, Catch) = P(Cavity) P(Toothache | Cavity) P(Catch | Cavity).
62 A more complex system
- (Network over: Battery, Radio, Ignition, Gas, Starts, On time to work)
- Joint distribution sufficient for any inference.
- General inference algorithm: local message passing.
65 Explaining away
- Assume grass will be wet if and only if it rained last night, or if the sprinklers were left on.
66 Explaining away
- Compute the probability it rained last night, given that the grass is wet: P(Rain | Wet) > P(Rain). Observing wet grass raises the probability of rain.
71 Explaining away
- Compute the probability it rained last night, given that the grass is wet and the sprinklers were left on: the sprinkler already explains the wet grass, so the evidence for rain goes away.
73 Explaining away
- Discounting to the prior probability: once the sprinkler is known to have been on, the wet grass provides no additional evidence for rain, and P(Rain | Wet, Sprinkler) falls back to P(Rain).
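Explaining away can be verified by brute-force enumeration under the slide's assumption that Wet = Rain OR Sprinkler (deterministic). The two prior values are illustrative choices of mine, not numbers from the slides.

```python
P_RAIN, P_SPRINKLER = 0.3, 0.5  # assumed independent priors (illustrative)

def p_rain_given(wet=True, sprinkler=None):
    """P(Rain | Wet = wet [, Sprinkler = sprinkler]) by enumerating the
    four (rain, sprinkler) worlds consistent with the evidence."""
    num = den = 0.0
    for rain in (True, False):
        for spr in (True, False):
            if sprinkler is not None and spr != sprinkler:
                continue            # inconsistent with observed sprinkler
            if (rain or spr) != wet:
                continue            # inconsistent with Wet = Rain OR Sprinkler
            p = (P_RAIN if rain else 1 - P_RAIN) * \
                (P_SPRINKLER if spr else 1 - P_SPRINKLER)
            den += p
            if rain:
                num += p
    return num / den

print(p_rain_given(wet=True))                  # raised above the 0.3 prior
print(p_rain_given(wet=True, sprinkler=True))  # discounted back to 0.3
```

With the deterministic OR, learning the sprinkler was on discounts rain exactly back to its prior, which is slide 73's point.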
74 Contrast with spreading activation
- (Nodes: Rain, Sprinkler, Grass Wet)
- Excitatory links: Rain -> Wet, Sprinkler -> Wet.
- Observing rain, Wet becomes more active.
- Observing grass wet, Rain and Sprinkler become more active.
- Observing grass wet and sprinkler, Rain cannot become less active. No explaining away!
75 Contrast with spreading activation
- (Nodes: Rain, Sprinkler, Grass Wet)
- Excitatory links: Rain -> Wet, Sprinkler -> Wet.
- Inhibitory link: Rain -> Sprinkler.
- Observing grass wet, Rain and Sprinkler become more active.
- Observing grass wet and sprinkler, Rain becomes less active: explaining away.
76 Contrast with spreading activation
- (Nodes: Rain, Burst pipe, Sprinkler, Grass Wet)
- Each new variable requires more inhibitory connections.
- Interactions between variables are not causal.
- Not modular: whether a connection exists depends on what other connections exist, in non-transparent ways.
- Big holism problem.
- Combinatorial explosion.
77 Causality and the Markov property
- Markov property: any variable is conditionally independent of its non-descendants, given its parents.
- Example: in the dentist network, Ache and Catch are conditionally independent given their common parent, Cavity.
83 Causality and the Markov property
- (Network over: Cavity, Ache, Catch)
- Suppose we get the direction of causality wrong, thinking that symptoms cause diseases: Ache -> Cavity <- Catch.
- Does not capture the correlation between symptoms: falsely implies P(Ache, Catch) = P(Ache) P(Catch).
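The failure just described can be checked numerically: in the true model Cavity -> Ache and Cavity -> Catch, the symptoms are correlated marginally, so any model that implies P(Ache, Catch) = P(Ache) P(Catch) is wrong. All conditional probabilities below are illustrative assumptions, not numbers from the slides.

```python
P_CAVITY = 0.1
P_ACHE  = {True: 0.9, False: 0.05}  # P(Ache  | Cavity = cav)
P_CATCH = {True: 0.8, False: 0.1}   # P(Catch | Cavity = cav)

def joint(ache, catch):
    """P(Ache = ache, Catch = catch), marginalizing over Cavity in the
    true causal model Cavity -> Ache, Cavity -> Catch."""
    total = 0.0
    for cav, p_cav in ((True, P_CAVITY), (False, 1 - P_CAVITY)):
        pa = P_ACHE[cav] if ache else 1 - P_ACHE[cav]
        pc = P_CATCH[cav] if catch else 1 - P_CATCH[cav]
        total += p_cav * pa * pc
    return total

p_ache  = joint(True, True) + joint(True, False)
p_catch = joint(True, True) + joint(False, True)
# Marginally, the symptoms are NOT independent:
print(joint(True, True), p_ache * p_catch)  # 0.0765 vs about 0.023
```

The common cause Cavity induces the marginal correlation, which the reversed Ache -> Cavity <- Catch model (with no Ache-Catch arrow) cannot represent.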
84 Causality and the Markov property
- (Network over: Cavity, Ache, Catch)
- With the direction of causality wrong, inserting a new arrow (Ache -> Catch) allows us to capture this correlation.
- But this model is too complex: we do not believe that Ache directly influences Catch once Cavity is known.
85 Causality and the Markov property
- (Network over: Cavity, Ache, Catch, X-ray)
- With the direction of causality wrong, new symptoms require a combinatorial proliferation of new arrows. Too general, not modular, holism, yuck . . .
86 Still to come
- Applications to models of categorization.
- More on the relation between causality and probability.
- Learning causal graph structures.
- Learning causal abstractions ("diseases cause symptoms").
- What's missing.
87 The end
88 Mathcamp data (raw)
89 Mathcamp data (collapsed over parity)
90 Zenith radio data (collapsed over parity)