Title: Part III: Learning structured representations. Hierarchical Bayesian models
1. Part III: Learning structured representations. Hierarchical Bayesian models
2. Universal Grammar
Hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG)
[Diagram: generative hierarchy: Universal Grammar → Grammar → Phrase structure → Utterance → Speech signal]
3. Outline
- Learning structured representations
  - grammars
  - logical theories
- Learning at multiple levels of abstraction
4. A historical divide
Structured representations with innate knowledge vs. unstructured representations with learning.
5. Structured representations
[Diagram: two axes. Chomsky and Keil: structured representations, innate knowledge. McClelland and Rumelhart: unstructured representations, learning. Structure learning: structured representations acquired by learning.]
6. Representations
- Causal networks (e.g., asbestos → lung cancer → coughing, chest pain)
- Grammars
- Logical theories
7. Representations
- Phonological rules
- Semantic networks [diagram: chemicals cause diseases; diseases affect biological functions; bio-active substances disrupt biological functions and interact with chemicals]
8. How to learn a representation R
- Search for the R that maximizes P(R | D) ∝ P(D | R) P(R)
- Prerequisites:
  - Put a prior over a hypothesis space of Rs.
  - Decide how observable data are generated from an underlying R.
9. How to learn a representation R (where R can be anything)
- Search for the R that maximizes P(R | D) ∝ P(D | R) P(R)
- Prerequisites:
  - Put a prior over a hypothesis space of Rs.
  - Decide how observable data are generated from an underlying R.
10. Context-free grammar
S → N VP
VP → V
VP → V N
N → Alice
N → Bob
V → scratched
V → cheered
[Two example parse trees: "Alice cheered" and "Alice scratched Bob"]
11. Probabilistic context-free grammar
1.0  S → N VP
0.6  VP → V
0.4  VP → V N
0.5  N → Alice
0.5  N → Bob
0.5  V → scratched
0.5  V → cheered
[Two example parse trees]
"Alice cheered": probability = 1.0 × 0.5 × 0.6 × 0.5 = 0.15
"Alice scratched Bob": probability = 1.0 × 0.5 × 0.4 × 0.5 × 0.5 = 0.05
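A minimal Python sketch (mine, not from the original slides) of the computation above; the rule table is copied from this slide:

```python
# The probability of a PCFG derivation is the product of the
# probabilities of the rules it applies.
PCFG = {
    ("S", ("N", "VP")): 1.0,
    ("VP", ("V",)): 0.6,
    ("VP", ("V", "N")): 0.4,
    ("N", ("Alice",)): 0.5,
    ("N", ("Bob",)): 0.5,
    ("V", ("scratched",)): 0.5,
    ("V", ("cheered",)): 0.5,
}

def derivation_probability(rules):
    p = 1.0
    for rule in rules:
        p *= PCFG[rule]
    return p

# "Alice cheered": S -> N VP, N -> Alice, VP -> V, V -> cheered
print(derivation_probability([("S", ("N", "VP")), ("N", ("Alice",)),
                              ("VP", ("V",)), ("V", ("cheered",))]))  # 0.15
```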
12. The learning problem
Grammar G:
1.0  S → N VP
0.6  VP → V
0.4  VP → V N
0.5  N → Alice
0.5  N → Bob
0.5  V → scratched
0.5  V → cheered
Data D:
Alice scratched. Bob scratched. Alice scratched Alice. Alice scratched Bob. Bob scratched Alice. Bob scratched Bob. Alice cheered. Bob cheered. Alice cheered Alice. Alice cheered Bob. Bob cheered Alice. Bob cheered Bob.
13. Grammar learning
- Search for the G that maximizes P(G | D) ∝ P(D | G) P(G)
- Prior: P(G)
- Likelihood: P(D | G); assume that sentences in the data are independently generated from the grammar (a scoring sketch follows below).
(Horning, 1969; Stolcke, 1994)
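A hedged sketch of the scoring step in this search, assuming two hypothetical helpers not defined in the slides: log_prior(G) and sentence_probability(G, s), the latter summing the probabilities of all derivations of s under G:

```python
import math

def log_posterior_score(G, data, log_prior, sentence_probability):
    # log P(G | D) = log P(G) + sum_i log P(s_i | G) + constant,
    # using the slide's assumption that sentences are generated
    # independently from the grammar.
    return log_prior(G) + sum(math.log(sentence_probability(G, s))
                              for s in data)

# The search then keeps whichever candidate grammar scores highest.
```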
14. Experiment
[Results figure] (Stolcke, 1994)
15. Generating grammar vs. model solution
[Figure: the grammar that generated the data alongside the grammar recovered by the model]
16. Predicate logic
- For all x and y, if y is the sibling of x then x is the sibling of y.
- For all x, y, and z, if x is the ancestor of y and y is the ancestor of z, then x is the ancestor of z.
17. Learning a kinship theory
Theory T: [logical rules over kinship predicates]
Data D:
- Sibling(victoria, arthur), Sibling(arthur, victoria),
- Ancestor(chris, victoria), Ancestor(chris, colin),
- Parent(chris, victoria), Parent(victoria, colin),
- Uncle(arthur, colin), Brother(arthur, victoria)
(Hinton; Quinlan)
18. Learning logical theories
- Search for the T that maximizes P(T | D) ∝ P(D | T) P(T)
- Prior: P(T)
- Likelihood: P(D | T); assume that the data include all facts that are true according to T (sketched below).
(Conklin and Witten; Kemp et al., 2008; Katz et al., 2008)
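One simple reading of the likelihood assumption, as a hypothetical sketch (the cited models use more refined scores): a theory is compatible with D only if the data contain every fact it entails.

```python
def log_likelihood(entailed_facts, data):
    # Slide's assumption: the data include all facts true according to T.
    # Incompatible theories get log-likelihood -inf; among compatible
    # theories, the prior over T does the rest of the work.
    return 0.0 if set(entailed_facts) <= set(data) else float("-inf")
```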
19. Theory-learning in the lab
Observed facts: R(c,b), R(k,c), R(f,c), R(c,l), R(f,k), R(k,l), R(l,b), R(f,l), R(l,h), R(f,b), R(k,b), R(f,h), R(b,h), R(c,h), R(k,h).
(cf. Krueger, 1979)
20. Theory-learning in the lab
Transitive: the 15 observed facts compress to five core facts plus one rule (a sketch of this compression appears below):
R(f,k). R(k,c). R(c,l). R(l,b). R(b,h).
R(X,Z) ← R(X,Y), R(Y,Z).
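A small sketch (mine, not the cited papers' code) showing that forward-chaining the transitivity rule over the five core facts regenerates all 15 facts from the previous slide:

```python
def transitive_closure(facts):
    # Repeatedly apply R(X,Z) <- R(X,Y), R(Y,Z) until nothing new is derived.
    closure = set(facts)
    while True:
        new = {(x, z) for (x, y) in closure for (y2, z) in closure
               if y == y2 and (x, z) not in closure}
        if not new:
            return closure
        closure |= new

core = {("f", "k"), ("k", "c"), ("c", "l"), ("l", "b"), ("b", "h")}
print(len(transitive_closure(core)))  # 15, matching the facts on slide 19
```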
21. Learning time
[Figure: human learning time against theory complexity (theory length); conditions include transitive theories and transitive theories with exceptions (cf. Goodman; Kemp et al., 2008)]
22. Conclusion, Part 1
- Bayesian models can combine structured representations with statistical inference.
23. Outline
- Learning structured representations
  - grammars
  - logical theories
- Learning at multiple levels of abstraction
24. Vision
[Figure] (Han and Zhu, 2006)
25. Motor control
[Figure] (Wolpert et al., 2003)
26. Causal learning
Schema: chemicals cause diseases; diseases cause symptoms.
Causal models: asbestos → lung cancer → {coughing, chest pain}; mercury → minamata disease → muscle wasting.
Contingency data: Patient 1: asbestos exposure, coughing, chest pain. Patient 2: mercury exposure, muscle wasting.
(Kelley; Cheng; Waldmann)
27. Universal Grammar
Hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG)
Universal Grammar
  P(grammar | UG)
Grammar
  P(phrase structure | grammar)
Phrase structure
  P(utterance | phrase structure)
Utterance
  P(speech signal | utterance)
Speech signal
28. Hierarchical Bayesian model
U: Universal Grammar
  P(G | U)
G: Grammar
  P(si | G)
s1, s2, s3, s4, s5, s6: Phrase structures
  P(ui | si)
u1, u2, u3, u4, u5, u6: Utterances
A hierarchical Bayesian model specifies a joint distribution over all variables in the hierarchy:
P({ui}, {si}, G | U) = P(G | U) ∏i P(si | G) P(ui | si)
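As a hedged, generic sketch (the component distributions are passed in as hypothetical functions), the joint factorizes level by level:

```python
def log_joint(G, ss, us, U, log_p_G, log_p_s, log_p_u):
    # log P({u_i}, {s_i}, G | U)
    #   = log P(G | U) + sum_i [ log P(s_i | G) + log P(u_i | s_i) ]
    return log_p_G(G, U) + sum(log_p_s(s, G) + log_p_u(u, s)
                               for s, u in zip(ss, us))
```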
29. Top-down inferences
[Hierarchy as above: U → G → si → ui]
Infer si given ui and G:
P(si | ui, G) ∝ P(ui | si) P(si | G)
30. Bottom-up inferences
[Hierarchy as above]
Infer G given {si} and U:
P(G | {si}, U) ∝ [ ∏i P(si | G) ] P(G | U)
31. Simultaneous learning at multiple levels
[Hierarchy as above]
Infer G and {si} given {ui} and U:
P(G, {si} | {ui}, U) ∝ [ ∏i P(ui | si) P(si | G) ] P(G | U)
32. Word learning
Words in general: whole-object bias, shape bias
Individual words: gavagai, duck, monkey, car
Data: [labeled examples]
33. A hierarchical Bayesian model
Physical knowledge
  ↓
Coins in general: FH, FT
  ↓
Coin 1, Coin 2, ..., Coin 200: θ1, θ2, ..., θ200, with θ ~ Beta(FH, FT)
  ↓
Data per coin: d1 d2 d3 d4
- Qualitative physical knowledge (symmetry) can influence estimates of the continuous parameters (FH, FT).
- Explains why 10 flips each of 200 coins are better than 2000 flips of a single coin: they are more informative about FH and FT.
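A numerical sketch of that second point (my own illustration; the coins-in-general prior, the grid, and the seed are all assumptions). With θ integrated out, each coin contributes a beta-binomial evidence term for the population-level parameters; 200 coins constrain them sharply, while one coin, however many flips, barely does:

```python
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(0)

def log_evidence(a, b, heads, n):
    # log P(heads out of n | F_H=a, F_T=b), theta integrated out
    # (beta-binomial marginal, dropping the constant binomial coefficient).
    return betaln(a + heads, b + n - heads) - betaln(a, b)

heads_200 = rng.binomial(10, rng.beta(5, 5, size=200))  # 200 coins x 10 flips
heads_1 = rng.binomial(2000, rng.beta(5, 5))            # 1 coin x 2000 flips

grid = np.linspace(0.5, 20, 60)                         # symmetric case F_H = F_T = a
score_200 = [sum(log_evidence(a, a, h, 10) for h in heads_200) for a in grid]
score_1 = [log_evidence(a, a, heads_1, 2000) for a in grid]
# score_200 peaks near the true value (a = 5); score_1 varies little,
# because a single coin amounts to one draw from Beta(F_H, F_T).
```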
34. Word learning
"This is a dax." "Show me the dax."
- 24-month-olds show a shape bias.
- 20-month-olds do not.
(Landau, Smith, Gleitman)
35. Is the shape bias learned?
- Smith et al. (2002) trained 17-month-olds on labels for 4 artificial categories.
- After 8 weeks of training, these (by then 19-month-old) children showed the shape bias.
"This is a dax." "Show me the dax."
36. Learning about feature variability
[Figure: bags of marbles, with an unknown draw to predict] (cf. Goodman)
37. Learning about feature variability
[Figure: bags of marbles, continued] (cf. Goodman)
38. A hierarchical model
Meta-constraints M
  ↓
Bags in general: color varies across bags but not much within bags.
  ↓
Bag proportions: mostly red; mostly brown; mostly green; mostly yellow; mostly blue?
  ↓
Data: [draws from each bag]
39. A hierarchical Bayesian model
Meta-constraints M
Bags in general: within-bag variability 0.1 (low); color mix (0.4, 0.4, 0.2)
Bag proportions: (1,0,0), (0,1,0), (1,0,0), (0,1,0), (.1,.1,.8)
Data (counts per bag): (6,0,0), (0,6,0), (6,0,0), (0,6,0), (0,0,1)
40. A hierarchical Bayesian model
Meta-constraints M
Bags in general: within-bag variability 5 (high); color mix (0.4, 0.4, 0.2)
Bag proportions: (.5,.5,0), (.5,.5,0), (.5,.5,0), (.5,.5,0), (.4,.4,.2)
Data (counts per bag): (3,3,0), (3,3,0), (3,3,0), (3,3,0), (0,0,1)
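A hedged sketch of the contrast between these two slides, under one common formalization (notation mine, not the slides'): bag proportions θ ~ Dirichlet(α·β), where α is the within-bag-variability knob and β the overall color mix. By conjugacy, the posterior mean for a new bag that has yielded one marble of the third color is proportional to α·β + counts:

```python
import numpy as np

def new_bag_posterior_mean(alpha, beta, counts):
    # theta ~ Dirichlet(alpha * beta); Dirichlet-multinomial conjugacy
    # gives E[theta | counts] proportional to alpha * beta + counts.
    post = alpha * np.asarray(beta, float) + np.asarray(counts, float)
    return post / post.sum()

beta = [0.4, 0.4, 0.2]
print(new_bag_posterior_mean(0.1, beta, [0, 0, 1]))  # ~(.04, .04, .93): "mostly blue"
print(new_bag_posterior_mean(5.0, beta, [0, 0, 1]))  # ~(.33, .33, .33): one draw
                                                     # barely overrides the mix
```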
41. Shape of the Beta prior
[Figure: Beta distributions for different parameter settings]
42. A hierarchical Bayesian model
[Hierarchy: meta-constraints M → bags in general → bag proportions → data]
43. A hierarchical Bayesian model
[Hierarchy: meta-constraints M → bags in general → bag proportions → data]
44. Learning about feature variability
[Hierarchy: meta-constraints M → categories in general → individual categories → data]
45. [Training stimuli: categories labeled wib, lug, zup, div]
46. [Test: a novel category labeled dax, alongside wib, lug, zup, div]
47. Model predictions
[Figure: choice probabilities for "Show me the dax"]
48. Where do priors come from?
[Hierarchy: meta-constraints M → categories in general → individual categories → data]
49. Knowledge representation
50. The discovery of structural form
[Figure: domains and candidate forms: BIOLOGY, POLITICS (judges Scalia, Ginsburg, Stevens, Thomas), COLOR, FRIENDSHIP, CHEMISTRY]
51. Children discover structural form
- Children may discover that:
  - social networks are often organized into cliques
  - the months form a cycle
  - "heavier than" is transitive
  - category labels can be organized into hierarchies
52. A hierarchical Bayesian model
[Hierarchy: meta-constraints M → form (e.g., tree) → structure → data]
53. A hierarchical Bayesian model
[Hierarchy: meta-constraints M → form F (e.g., tree) → structure S → data D]
54. Structural forms
Order, chain, ring, partition, hierarchy, tree, grid, cylinder
55. P(S | F, n): generating structures
- Each structure is weighted by the number of nodes it contains:
  P(S | F, n) = 0 if S is inconsistent with F; otherwise P(S | F, n) ∝ θ^|S|,
  where |S| is the number of nodes in S.
56. P(S | F, n): generating structures from forms
- Simpler forms are preferred: a form that can generate only a few structures (e.g., a chain) places more probability on each one than a form that can generate many (e.g., a grid).
[Figure: P(S | F) across all possible graph structures S over nodes A, B, C, D, for chain and grid forms]
57. A hierarchical Bayesian model
[Hierarchy: meta-constraints M → form F (e.g., tree) → structure S → data D]
58. p(D | S): generating feature data
- Intuition: features should be smooth over the graph S.
[Figure: a relatively smooth feature assignment vs. a non-smooth one]
59. p(D | S): generating feature data
Let fi be the feature value at node i. Smooth features are more probable:
p(f | S) ∝ exp( -(1/2) Σ(i,j)∈E wij (fi - fj)² ),
where wij is the weight of the edge between nodes i and j.
(Zhu, Lafferty, Ghahramani)
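A small sketch of this score (mine; the exponent equals -(1/2) fᵀLf for the graph Laplacian L built from the edge weights), showing that smooth features score higher:

```python
import numpy as np

def log_smoothness(W, f):
    # Unnormalized log p(f | S): -(1/2) * sum over edges of w_ij (f_i - f_j)^2,
    # computed via the graph Laplacian L = D - W.
    W = np.asarray(W, float)
    L = np.diag(W.sum(axis=1)) - W
    return -0.5 * f @ L @ f

chain = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)  # 4-node chain graph
print(log_smoothness(chain, np.array([0., 1., 2., 3.])))  # smooth: -1.5
print(log_smoothness(chain, np.array([3., 0., 2., 1.])))  # jagged: -7.0
```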
60. A hierarchical Bayesian model
[Hierarchy: meta-constraints M → form F (e.g., tree) → structure S → data D]
61. Feature data results
[Figures: structures discovered from animals × features data and from judges × cases data]
62. Developmental shifts
[Figure: structures discovered from 5, 20, and 110 features]
63. Similarity data results
[Figure: structure discovered from color similarity data]
64. Relational data
[Hierarchy: meta-constraints M → form (cliques) → structure (nodes 1-8 assigned to cliques) → data]
65. Relational data results
[Figures: primates ("x dominates y"), prisoners ("x is friends with y"), Bush cabinet ("x tells y")]
66. Universal structure grammar
[Hierarchy: U → form → structure → data; example feature: "warm"]
67. Node-replacement graph grammars
[Figure: a production for the chain form and an example derivation]
68. A hypothesis space of forms
[Figure: candidate forms and the generative processes that produce them]
69. The complete space of grammars
[Figure: all 4096 grammars, from grammar 1 to grammar 4096]
70. Universal structure grammar
[Hierarchy: U → form → structure → feature data]
71. Conclusions, Part 2
- Hierarchical Bayesian models provide a unified framework that helps to explain:
  - how abstract knowledge is acquired
  - how abstract knowledge is used for induction
72. Outline
- Learning structured representations
  - grammars
  - logical theories
- Learning at multiple levels of abstraction
73. Handbook of Mathematical Psychology, 1963