Part III Learning structured representations Hierarchical Bayesian models - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Part III Learning structured representations Hierarchical Bayesian models


1
Part III: Learning structured representations. Hierarchical Bayesian models
2
Universal Grammar
Hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG)
[Hierarchy: Universal Grammar → Grammar → Phrase structure → Utterance → Speech signal]
3
Outline
  • Learning structured representations
  • grammars
  • logical theories
  • Learning at multiple levels of abstraction

4
A historical divide
Structured representations vs. unstructured representations
Innate knowledge vs. learning
5
Structured vs. unstructured representations; innate knowledge vs. learning:
  • Chomsky, Keil: structured representations, innate knowledge
  • McClelland, Rumelhart: unstructured representations, learning
  • Structure learning: structured representations, learning
6
Representations
Causal networks (e.g., asbestos → lung cancer → coughing, chest pain)
Grammars
Logical theories
7
Representations
Phonological rules
Semantic networks [example network over Chemicals, Diseases, Bio-active substances, and Biological functions, with relations cause, affect, interact with, disrupt]
8
How to learn a representation R
  • Search for the R that maximizes P(R | D) ∝ P(D | R) P(R)
  • Prerequisites
  • Put a prior over a hypothesis space of Rs.
  • Decide how observable data are generated from an
    underlying R.

9
How to learn anything
  • Search for the R that maximizes P(R | D) ∝ P(D | R) P(R)
  • Prerequisites
  • Put a prior over a hypothesis space of Rs.
  • Decide how observable data are generated from an
    underlying R.

10
Context free grammar
S → N VP
VP → V
VP → V N
N → Alice
N → Bob
V → scratched
V → cheered
[Two example parse trees: 'Alice cheered' and 'Alice scratched Bob']
11
Probabilistic context free grammar
S → N VP (1.0)
VP → V (0.6)
VP → V N (0.4)
N → Alice (0.5)
N → Bob (0.5)
V → scratched (0.5)
V → cheered (0.5)
[Parse tree for 'Alice cheered': probability 1.0 × 0.5 × 0.6 × 0.5 = 0.15]
[Parse tree for 'Alice scratched Bob': probability 1.0 × 0.5 × 0.4 × 0.5 × 0.5 = 0.05]
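A quick check of this arithmetic, as a minimal Python sketch (not part of the slides; the rule encoding and the parse_probability helper are illustrative):

```python
# Score a parse under the PCFG above by multiplying the probabilities
# of the rules it uses.
rules = {
    ("S",  ("N", "VP")):     1.0,
    ("VP", ("V",)):          0.6,
    ("VP", ("V", "N")):      0.4,
    ("N",  ("Alice",)):      0.5,
    ("N",  ("Bob",)):        0.5,
    ("V",  ("scratched",)):  0.5,
    ("V",  ("cheered",)):    0.5,
}

def parse_probability(used_rules):
    p = 1.0
    for r in used_rules:
        p *= rules[r]
    return p

# "Alice cheered": S -> N VP, N -> Alice, VP -> V, V -> cheered
print(parse_probability([("S", ("N", "VP")), ("N", ("Alice",)),
                         ("VP", ("V",)), ("V", ("cheered",))]))    # ~0.15
# "Alice scratched Bob": S -> N VP, N -> Alice, VP -> V N, V -> scratched, N -> Bob
print(parse_probability([("S", ("N", "VP")), ("N", ("Alice",)),
                         ("VP", ("V", "N")), ("V", ("scratched",)),
                         ("N", ("Bob",))]))                        # ~0.05
```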
12
The learning problem
Grammar G:
S → N VP (1.0)
VP → V (0.6)
VP → V N (0.4)
N → Alice (0.5)
N → Bob (0.5)
V → scratched (0.5)
V → cheered (0.5)
Data D:
Alice scratched. Bob scratched. Alice scratched Alice. Alice scratched Bob. Bob scratched Alice. Bob scratched Bob.
Alice cheered. Bob cheered. Alice cheered Alice. Alice cheered Bob. Bob cheered Alice. Bob cheered Bob.
13
Grammar learning
  • Search for the G that maximizes P(G | D) ∝ P(D | G) P(G)
  • Prior: P(G)
  • Likelihood: P(D | G)
  • assume that sentences in the data are
    independently generated from the grammar

(Horning, 1969; Stolcke, 1994)
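Spelled out, the quantity being maximized (a sketch under the independence assumption above; the slide's own notation is not preserved in this transcript):

```latex
P(G \mid D) \;\propto\; P(G)\, P(D \mid G)
           \;=\; P(G) \prod_{i=1}^{n} P(s_i \mid G),
\qquad D = \{ s_1, \ldots, s_n \}
```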
14
Experiment
  • Data: 100 sentences

...
(Stolcke, 1994)
15
Generating grammar
Model solution
16
Predicate logic
  • A compositional language

For all x and y, if y is the sibling of x, then x
is the sibling of y.
For all x, y and z, if x is the ancestor of y and
y is the ancestor of z, then x is the ancestor of
z.
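In first-order logic notation (a sketch of the formulas the slide presumably showed; only the English glosses survive in this transcript):

```latex
\forall x \,\forall y \;\big( \mathrm{Sibling}(y, x) \rightarrow \mathrm{Sibling}(x, y) \big)

\forall x \,\forall y \,\forall z \;\big( \mathrm{Ancestor}(x, y) \wedge \mathrm{Ancestor}(y, z) \rightarrow \mathrm{Ancestor}(x, z) \big)
```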
17
Learning a kinship theory
Theory T
Data D
  • Sibling(victoria, arthur), Sibling(arthur, victoria),
  • Ancestor(chris, victoria), Ancestor(chris, colin),
  • Parent(chris, victoria), Parent(victoria, colin),
  • Uncle(arthur, colin), Brother(arthur, victoria)

(Hinton; Quinlan)
18
Learning logical theories
  • Search for the T that maximizes P(T | D) ∝ P(D | T) P(T)
  • Prior: P(T)
  • Likelihood: P(D | T)
  • assume that the data include all facts that are
    true according to T

(Conklin & Witten; Kemp et al., 2008; Katz et al., 2008)
19
Theory-learning in the lab
R(c,b), R(k,c), R(f,c), R(c,l), R(f,k), R(k,l), R(l,b), R(f,l),
R(l,h), R(f,b), R(k,b), R(f,h), R(b,h), R(c,h), R(k,h)
(cf. Krueger 1979)
20
Theory-learning in the lab
Transitive theory:
R(f,k). R(k,c). R(c,l). R(l,b). R(b,h).
R(X,Z) ← R(X,Y), R(Y,Z).
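A minimal Python sketch (not from the slides) confirming that this theory, the five base facts closed under the transitivity rule, generates exactly the 15 facts listed on the previous slide:

```python
# Base facts of the transitive theory.
base = {("f", "k"), ("k", "c"), ("c", "l"), ("l", "b"), ("b", "h")}

def transitive_closure(pairs):
    """Repeatedly apply R(X,Z) <- R(X,Y), R(Y,Z) until nothing new is added."""
    closure = set(pairs)
    while True:
        new = {(x, z) for (x, y1) in closure for (y2, z) in closure if y1 == y2}
        if new <= closure:
            return closure
        closure |= new

# The 15 observations from the previous slide.
observed = {("c", "b"), ("k", "c"), ("f", "c"), ("c", "l"), ("f", "k"),
            ("k", "l"), ("l", "b"), ("f", "l"), ("l", "h"), ("f", "b"),
            ("k", "b"), ("f", "h"), ("b", "h"), ("c", "h"), ("k", "h")}

print(transitive_closure(base) == observed)   # True
```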
21
Learning time
[Plot: learning time vs. complexity (theory length; cf. Goodman) for transitive and transitive-with-exceptions conditions]
(Kemp et al., 2008)
22
Conclusion Part 1
  • Bayesian models can combine structured
    representations with statistical inference.

23
Outline
  • Learning structured representations
  • grammars
  • logical theories
  • Learning at multiple levels of abstraction

24
Vision
(Han and Zhu, 2006)
25
Motor Control
(Wolpert et al., 2003)
26
Causal learning
Schema: chemicals → diseases → symptoms
Causal models: asbestos → lung cancer → coughing, chest pain;
mercury → minamata disease → muscle wasting
Contingency data:
Patient 1: asbestos exposure, coughing, chest pain
Patient 2: mercury exposure, muscle wasting
(Kelley; Cheng; Waldmann)
27
Universal Grammar
Hierarchical phrase structure grammars (e.g.,
CFG, HPSG, TAG)
P(grammar | UG)
Grammar
P(phrase structure | grammar)
Phrase structure
P(utterance | phrase structure)
Utterance
P(speech signal | utterance)
Speech signal
28
Hierarchical Bayesian model
[Hierarchy: Universal Grammar U → Grammar G → Phrase structures s1 … s6 → Utterances u1 … u6]
A hierarchical Bayesian model specifies a joint
distribution over all variables in the hierarchy:
P({ui}, {si}, G | U) = [ Πi P(ui | si) P(si | G) ] P(G | U)
29
Top-down inferences
[Hierarchy: Universal Grammar U → Grammar G → Phrase structures s1 … s6 → Utterances u1 … u6]
Infer si given ui and G:
P(si | ui, G) ∝ P(ui | si) P(si | G)
30
Bottom-up inferences
[Hierarchy: Universal Grammar U → Grammar G → Phrase structures s1 … s6 → Utterances u1 … u6]
Infer G given si and U:
P(G | si, U) ∝ P(si | G) P(G | U)
31
Simultaneous learning at multiple levels
[Hierarchy: Universal Grammar U → Grammar G → Phrase structures s1 … s6 → Utterances u1 … u6]
Infer G and si given ui and U:
P(G, si | ui, U) ∝ P(ui | si) P(si | G) P(G | U)
32
Word learning
[Hierarchy: Words in general (whole-object bias, shape bias) → Individual words ('gavagai', 'duck', 'monkey', 'car') → Data]
33
A hierarchical Bayesian model
[Hierarchy: physical knowledge → FH, FT → per-coin biases θ1, θ2, …, θ200, with θ ~ Beta(FH, FT) → flip data d1 d2 d3 d4 for each of Coin 1, Coin 2, …, Coin 200]
  • Qualitative physical knowledge (symmetry) can
    influence estimates of continuous parameters (FH, FT).
  • Explains why 10 flips of each of 200 coins are more
    informative about FH, FT than 2000 flips of a single coin.
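A small Python sketch of this point (the true hyperparameter value, the candidate grid, and the simulated data are illustrative assumptions, not taken from the slides). It fits a symmetric FH = FT by grid search over the Beta-Binomial marginal likelihood for the two designs:

```python
import numpy as np
from scipy.special import betaln

rng = np.random.default_rng(0)
F_TRUE = 2.0                                 # assumed symmetric Beta(FH, FT) prior

def simulate(n_coins, flips_per_coin):
    theta = rng.beta(F_TRUE, F_TRUE, size=n_coins)    # one bias per coin
    heads = rng.binomial(flips_per_coin, theta)
    return heads, flips_per_coin - heads

def log_marginal(heads, tails, f):
    # Beta-Binomial marginal log-likelihood of all coins under Beta(f, f);
    # binomial coefficients are dropped because they do not depend on f.
    return float(np.sum(betaln(f + heads, f + tails) - betaln(f, f)))

grid = [0.5, 1, 2, 4, 8, 16, 32]             # candidate values for FH = FT
designs = {"200 coins x 10 flips": simulate(200, 10),
           "1 coin x 2000 flips":  simulate(1, 2000)}
for name, (heads, tails) in designs.items():
    scores = {f: log_marginal(heads, tails, f) for f in grid}
    print(name, "-> best FH = FT on grid:", max(scores, key=scores.get))

# The 200-coin design constrains both the average bias and how much biases vary
# across coins; 2000 flips of one coin pin down that coin's bias precisely but
# say little about variability across coins, so FH, FT remain poorly determined.
```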

34
Word Learning
This is a dax.
Show me the dax.
  • 24-month-olds show a shape bias
  • 20-month-olds do not

(Landau, Smith & Gleitman)
35
Is the shape bias learned?
  • Smith et al. (2002) trained 17-month-olds on
    labels for 4 artificial categories
  • After 8 weeks of training, the children (now
    19-month-olds) showed the shape bias

Show me the dax.
This is a dax.
36
Learning about feature variability
(cf. Goodman)
37
Learning about feature variability
(cf. Goodman)
38
A hierarchical model
Meta-constraints M
'Color varies across bags but not much within bags'
Bags in general
Bag proportions: mostly red, mostly brown, mostly green, mostly yellow, mostly blue?
Data
39
A hierarchical Bayesian model
Meta-constraints M
Within-bag variability: 0.1
Bags in general: 0.4, 0.4, 0.2
Bag proportions: (1,0,0), (0,1,0), (1,0,0), (0,1,0), (.1,.1,.8)
Data (counts per bag): (6,0,0), (0,6,0), (6,0,0), (0,6,0), (0,0,1)
40
A hierarchical Bayesian model
Meta-constraints M
Within-bag variability: 5
Bags in general: 0.4, 0.4, 0.2
Bag proportions: (.5,.5,0), (.5,.5,0), (.5,.5,0), (.5,.5,0), (.4,.4,.2)
Data (counts per bag): (3,3,0), (3,3,0), (3,3,0), (3,3,0), (0,0,1)
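A small Python sketch of the two settings above. The parameterization, with the 'bags in general' vector as the Dirichlet mean and the within-bag variability value as a concentration parameter, is an assumption about the intended model rather than something stated on the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([0.4, 0.4, 0.2])        # "bags in general" (from the slide)

for alpha in (0.1, 5.0):                # within-bag variability settings from the slides
    bags = rng.dirichlet(alpha * beta, size=5)   # one proportion vector per bag
    print("alpha =", alpha)
    print(np.round(bags, 2))

# alpha = 0.1 tends to give near-pure bags such as (1, 0, 0) or (0, 1, 0);
# alpha = 5 tends to give mixed bags closer to (0.4, 0.4, 0.2).
```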

41
Shape of the Beta prior
42
A hierarchical Bayesian model
Meta-constraints M
Bags in general
Bag proportions
Data
43
A hierarchical Bayesian model
Meta-constraints M
Bags in general
Bag proportions
Data
44
Learning about feature variability
Meta-constraints M
Categories in general
Individual categories
Data
45
[Categories: wib, lug, zup, div]
46
[Categories: wib, lug, zup, div, dax]
47
Model predictions
[Bar chart: choice probability for each object in response to 'Show me the dax']
48
Where do priors come from?
Meta-constraints M
Categories in general
Individual categories
Data
49
Knowledge representation
50
The discovery of structural form
[Example domains: BIOLOGY, POLITICS (Scalia, Ginsburg, Stevens, Thomas), COLOR, FRIENDSHIP, CHEMISTRY]
51
Children discover structural form
  • Children may discover that
  • Social networks are often organized into cliques
  • The months form a cycle
  • 'Heavier than' is transitive
  • Category labels can be organized into hierarchies

52
A hierarchical Bayesian model
Meta-constraints M
Form: Tree
Structure
Data
53
A hierarchical Bayesian model
Meta-constraints M
F: form (Tree)
S: structure
D: data
54
Structural forms
Order
Chain
Ring
Partition
Hierarchy
Tree
Grid
Cylinder
55
P(S | F, n): Generating structures
  • Each structure is weighted by the number of
    nodes it contains:

P(S | F, n) = 0 if S is inconsistent with F; otherwise
P(S | F, n) decreases with |S|, the number of nodes in S.
56
P(S | F, n): Generating structures from forms
  • Simpler forms are preferred

[Plot: P(S | F) over all possible graph structures S on nodes A, B, C, D, for the Chain and Grid forms]
57
A hierarchical Bayesian model
Meta-constraints M
F: form (Tree)
S: structure
D: data
58
p(D | S): Generating feature data
  • Intuition: features should be smooth over graph S

[Example feature assignments: relatively smooth vs. not smooth]
59
p(D | S): Generating feature data
Let fi be the feature value at node i; values at adjacent nodes i and j should tend to be similar.
(Zhu, Lafferty & Ghahramani)
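A sketch of the standard form of this smoothness prior, following Zhu, Lafferty & Ghahramani (the constants and notation on the original slide may differ):

```latex
p(f \mid S) \;\propto\; \exp\!\Big( -\tfrac{1}{2} \sum_{(i,j) \in E(S)} w_{ij}\, (f_i - f_j)^2 \Big)
```

where E(S) is the edge set of S, w_ij is the weight on edge (i, j), and f_i is the feature value at node i; each feature (each column of D) is assumed to be generated independently in this way.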
60
A hierarchical Bayesian model
Meta-constraints M
F: form (Tree)
S: structure
D: data
61
Feature data results
[Structures learned from animals × features data and from judges × cases data]
62
Developmental shifts
[Structures learned from 5, 20, and 110 features]
63
Similarity data results
[Structure learned from color similarity data]
64
Relational data
Meta-constraints M
Form: Cliques
Structure: [eight nodes (1-8) grouped into cliques]
Data
65
Relational data results
Primates ('x dominates y'), Prisoners ('x is friends with y'), Bush cabinet ('x tells y')
66
Universal Structure grammar
[Hierarchy: Universal structure grammar U → Form → Structure → Data (example feature: 'warm')]
67
Node-replacement graph grammars
[Figure: a production rule for the Chain form and an example derivation]
68
A hypothesis space of forms
[Table: each structural form paired with its generative process]
69
The complete space of grammars
[Grammars 1 through 4096]
70
Universal Structure grammar
[Hierarchy: Universal structure grammar U → Form → Structure → Data (features)]
71
Conclusions Part 2
  • Hierarchical Bayesian models provide a unified
    framework that helps to explain
  • How abstract knowledge is acquired
  • How abstract knowledge is used for induction

72
Outline
  • Learning structured representations
  • grammars
  • logical theories
  • Learning at multiple levels of abstraction

73
Handbook of Mathematical Psychology, 1963