Title: A Bayesian view of language evolution by iterated learning
1. A Bayesian view of language evolution by iterated learning
- Tom Griffiths, Brown University
- Mike Kalish, University of Louisiana
2. Linguistic universals
- Human languages are a subset of all logically possible communication schemes
- Universal properties are common to all languages
- (Comrie, 1981; Greenberg, 1963; Hawkins, 1988)
- Two questions
- why do linguistic universals exist?
- why are particular properties universal?
3. Possible explanations
- Traditional answer
- linguistic universals reflect innate constraints specific to a system for acquiring language (e.g., Chomsky, 1965)
- Alternative answer
- linguistic universals emerge as the result of the fact that language is learned anew by each generation (e.g., Briscoe, 1998; Kirby, 2001)
4. Iterated learning (Kirby, 2001)
- Each learner sees data, forms a hypothesis, and produces the data given to the next learner
- cf. the playground game "telephone"
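A minimal sketch of this loop in Python; the learn() and produce() functions here are hypothetical stand-ins for a learner's inference and production steps, not the model used in the talk:

```python
import random

def learn(data):
    # Hypothetical inference step: map observed data to a hypothesis.
    # Here, a trivial learner that memorizes the most common item.
    return max(set(data), key=data.count)

def produce(hypothesis, n_utterances=3):
    # Hypothetical production step: generate data from the hypothesis,
    # with a small chance of error introducing variation.
    return [hypothesis if random.random() > 0.1 else "noise"
            for _ in range(n_utterances)]

def iterated_learning(initial_data, n_generations=10):
    data = initial_data
    for generation in range(n_generations):
        hypothesis = learn(data)    # learner forms a hypothesis from the data
        data = produce(hypothesis)  # that hypothesis generates the next learner's data
        print(generation, hypothesis, data)

iterated_learning(["hello", "hello", "noise"])
```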
5. The information bottleneck (Kirby, 2001)
(Figure: size indicates compressibility)
6. Analyzing iterated learning
What are the consequences of iterated learning?
(Komarova, Niyogi, & Nowak, 2002; Brighton, 2002; Kirby, 2001; Smith, Kirby, & Brighton, 2003)
7. Outline
- Iterated Bayesian learning
- Markov chains
- Convergence results
- Example: Emergence of compositionality
- Conclusion
8. Outline
- Iterated Bayesian learning
- Markov chains
- Convergence results
- Example: Emergence of compositionality
- Conclusion
9. Bayesian inference
- Rational procedure for updating beliefs
- Foundation of many learning algorithms
- (e.g., MacKay, 2003)
- Widely used for language learning
- (e.g., Charniak, 1993)
(Image: Reverend Thomas Bayes)
10. Bayes' theorem
p(h|d) = p(d|h) p(h) / Σh′ p(d|h′) p(h′)
- h: hypothesis
- d: data
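As a toy worked example (with made-up numbers, not from the talk), the posterior over two hypothetical hypotheses given one observation:

```python
# Toy Bayes' theorem computation with two hypothetical hypotheses.
prior = {"h1": 0.5, "h2": 0.5}       # p(h)
likelihood = {"h1": 0.8, "h2": 0.2}  # p(d | h) for some observed d

evidence = sum(prior[h] * likelihood[h] for h in prior)              # p(d)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}  # p(h | d)
print(posterior)  # {'h1': 0.8, 'h2': 0.2}
```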
11. Iterated Bayesian learning
(Diagram: a chain of learners; each learner infers a hypothesis from data via p(h|d), then produces data for the next learner via p(d|h))
12. Outline
- Iterated Bayesian learning
- Markov chains
- Convergence results
- Example: Emergence of compositionality
- Conclusion
13. Markov chains
(Diagram: a chain of variables x(0) → x(1) → x(2) → ...)
Transition matrix P(x(t+1) | x(t))
- Variables: x(t+1) is independent of history given x(t)
- Converges to a stationary distribution under easily checked conditions for ergodicity
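As a concrete illustration (my own example, not from the slides), a two-state chain simulated long enough that the empirical state frequencies approach the stationary distribution obtained from the eigenvectors of the transition matrix:

```python
import numpy as np

# Transition matrix P[i, j] = P(x(t+1) = j | x(t) = i) for a two-state chain.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

rng = np.random.default_rng(0)
x, counts = 0, np.zeros(2)
for _ in range(100_000):
    x = rng.choice(2, p=P[x])  # the next state depends only on the current state
    counts[x] += 1

print("empirical:", counts / counts.sum())

# Stationary distribution: left eigenvector of P with eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
print("stationary:", pi / pi.sum())  # about [0.833, 0.167]
```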
14. Markov chain Monte Carlo
- A strategy for sampling from complex probability distributions
- Key idea: construct a Markov chain which converges to a particular distribution
- e.g., the Metropolis algorithm
- e.g., Gibbs sampling
15. Gibbs sampling
- For variables x = (x1, x2, ..., xn)
- Draw xi(t+1) from P(xi | x-i)
- where x-i = (x1(t+1), x2(t+1), ..., xi-1(t+1), xi+1(t), ..., xn(t))
- Converges to P(x1, x2, ..., xn)
(Geman & Geman, 1984)
(a.k.a. the heat bath algorithm in statistical physics)
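A standard textbook-style illustration (my choice of target distribution, not from the slides): a Gibbs sampler for a bivariate Gaussian with correlation rho, where each conditional P(xi | x-i) is itself Gaussian:

```python
import numpy as np

rho, n_samples = 0.8, 10_000
rng = np.random.default_rng(0)

x1, x2 = 0.0, 0.0
samples = np.empty((n_samples, 2))
for t in range(n_samples):
    # Draw each variable from its conditional given the current value of the other.
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
    samples[t] = (x1, x2)

print("sample correlation:", np.corrcoef(samples.T)[0, 1])  # close to rho
```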
16. Gibbs sampling
(Figure: illustration of Gibbs sampling; MacKay, 2003)
17. Outline
- Iterated Bayesian learning
- Markov chains
- Convergence results
- Example: Emergence of compositionality
- Conclusion
18. Analyzing iterated learning
- Iterated learning is a Markov chain on (h,d)
19. Analyzing iterated learning
(Diagram: learners alternately sampling h from p(h|d) and d from p(d|h))
- Iterated learning is a Markov chain on (h, d)
- Iterated Bayesian learning is a Gibbs sampler for the joint distribution p(d, h)
20. Analytic results
- Iterated Bayesian learning converges to the joint distribution p(d, h)
- (geometrically; Liu, Wong, & Kong, 1995)
21. Analytic results
- Iterated Bayesian learning converges to the joint distribution p(d, h)
- Corollaries
- the distribution over hypotheses converges to p(h)
- the distribution over data converges to p(d)
- the proportion of a population of iterated learners with hypothesis h converges to p(h)
(geometrically; Liu, Wong, & Kong, 1995)
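A minimal simulation (my own sketch, with made-up numbers) of iterated Bayesian learning over two hypothetical hypotheses, checking that the long-run frequency of each hypothesis matches the prior p(h):

```python
import numpy as np

# Two hypothetical hypotheses, each a distribution over two possible data items.
prior = np.array([0.7, 0.3])             # p(h)
likelihood = np.array([[0.9, 0.1],       # p(d | h = 0)
                       [0.2, 0.8]])      # p(d | h = 1)

rng = np.random.default_rng(0)
h = 0
visits = np.zeros(2)
for _ in range(50_000):
    d = rng.choice(2, p=likelihood[h])   # learner n produces data from p(d | h)
    posterior = prior * likelihood[:, d] # learner n+1 computes p(h | d)
    posterior /= posterior.sum()
    h = rng.choice(2, p=posterior)       # learner n+1 samples a hypothesis
    visits[h] += 1

print("hypothesis frequencies:", visits / visits.sum())  # approaches the prior [0.7, 0.3]
```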
22. Outline
- Iterated Bayesian learning
- Markov chains
- Convergence results
- Example: Emergence of compositionality
- Conclusion
23. A simple language model
24. A simple language model
- Data: m event-utterance pairs
- Hypotheses: languages, with error ε
(Figure: example languages mapping events to utterances, coded with 0s and 1s; one example labeled "holistic")
25. Analysis technique
- Compute the transition matrix on languages
- Sample Markov chains
- Compare language frequencies with the prior
- (can also compute eigenvalues, etc.)
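To make this concrete, a sketch (with hypothetical numbers, not the language model from the slides) that builds the transition matrix on hypotheses induced by one generation of iterated learning, then reads the stationary distribution and eigenvalues off it:

```python
import numpy as np

prior = np.array([0.7, 0.3])             # p(h) over two hypothetical languages
likelihood = np.array([[0.9, 0.1],       # p(d | h = 0)
                       [0.2, 0.8]])      # p(d | h = 1)

# posterior[d, h'] = p(h' | d)
posterior = likelihood.T * prior
posterior /= posterior.sum(axis=1, keepdims=True)

# T[h, h'] = sum_d p(d | h) p(h' | d): one generation of iterated learning.
T = likelihood @ posterior

eigvals, eigvecs = np.linalg.eig(T.T)
stationary = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
stationary /= stationary.sum()
print("stationary:", stationary)         # matches the prior [0.7, 0.3]
print("second eigenvalue:", np.sort(np.real(eigvals))[-2])  # geometric convergence rate
```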
26. Convergence to priors
(Figure: language frequencies in the chain compared with the prior, plotted over iterations, for parameter settings (0.50, 0.05, m = 3) and (0.01, 0.05, m = 3))
27. The information bottleneck
(Figure: chain frequencies compared with the prior over iterations, for parameter settings (0.50, 0.05, m = 1), (0.01, 0.05, m = 3), and (0.50, 0.05, m = 10))
28. The information bottleneck
The bottleneck affects the relative stability of the languages favored by the prior.
29. Outline
- Iterated Bayesian learning
- Markov chains
- Convergence results
- Example: Emergence of compositionality
- Conclusion
30. Implications for linguistic universals
- Two questions
- why do linguistic universals exist?
- why are particular properties universal?
- Different answers
- existence explained through iterated learning
- universal properties depend on the prior
- Focuses inquiry on the priors of the learners
- languages reflect the biases of human learners
31. Extensions and future directions
- Results extend to
- unbounded populations
- continuous time population dynamics
- Iterated learning applies to other knowledge
- religious concepts, social norms, legends
- Provides a method for evaluating priors
- experiments in iterated learning with humans
33. Iterated function learning
- Each learner sees a set of (x, y) pairs
- Makes predictions of y for new x values
- Predictions are data for the next learner
34. Function learning in the lab
Examine iterated learning with different initial data
35. Initial data
(Figure: predictions produced at iterations 1-9 for different initial data; Kalish, 2004)
37. An example: Gaussians
- If we assume
- data, d, is a single real number, x
- hypotheses, h, are means of a Gaussian, μ
- the prior, p(μ), is Gaussian(μ0, σ0²)
- then p(xn+1 | xn) is Gaussian(μn, σx² + σn²)
38. μ0 = 0, σ0² = 1, x0 = 20
Iterated learning results in rapid convergence to the prior
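A quick sketch of the Gaussian case using the values on this slide; the data variance σx² = 1 is an assumption of mine, since the slides do not state it. Each learner posterior-samples a mean from its single observation and generates the next observation:

```python
import numpy as np

# mu0 = 0, sigma0^2 = 1, x0 = 20 from the slide; var_x = 1 is assumed.
mu0, var0, var_x = 0.0, 1.0, 1.0
rng = np.random.default_rng(0)

x = 20.0
xs = []
for n in range(1000):
    # Conjugate Gaussian posterior over mu given the single observation x.
    var_n = 1.0 / (1.0 / var0 + 1.0 / var_x)
    mu_n = var_n * (mu0 / var0 + x / var_x)
    mu = rng.normal(mu_n, np.sqrt(var_n))  # learner samples a hypothesis
    x = rng.normal(mu, np.sqrt(var_x))     # and produces data for the next learner
    xs.append(x)

# After a few iterations the chain forgets x0 = 20; the long-run mean of x
# approaches the prior mean mu0 = 0.
print("mean of later samples:", np.mean(xs[10:]))
```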
39. An example: Linear regression
- Assume
- data, d, are pairs of real numbers (x, y)
- hypotheses, h, are functions
- An example: linear regression
- hypotheses have slope θ and pass through the origin
- p(θ) is Gaussian(θ0, σ0²)
(Figure: a line with slope θ through the origin, marked at x = 1)
40. (Figure: the line with slope θ at x = 1, for θ0 = 1, σ0² = 0.1, y0 = -1)
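A sketch of iterated learning for the origin-constrained regression, using the slide-40 values θ0 = 1, σ0² = 0.1, y0 = -1; the observation noise variance and the single design point x = 1 are assumptions of mine:

```python
import numpy as np

# theta0 = 1, sigma0^2 = 0.1, y0 = -1 from slide 40;
# noise variance var_y = 0.1 and design point x_pt = 1 are assumed.
theta0, var0, var_y, x_pt = 1.0, 0.1, 0.1, 1.0
rng = np.random.default_rng(0)

y = -1.0
for n in range(20):
    # Conjugate posterior over the slope from one observation (x_pt, y).
    var_n = 1.0 / (1.0 / var0 + x_pt**2 / var_y)
    mean_n = var_n * (theta0 / var0 + x_pt * y / var_y)
    theta = rng.normal(mean_n, np.sqrt(var_n))    # learner samples a slope
    y = rng.normal(theta * x_pt, np.sqrt(var_y))  # its prediction is the next learner's data
    print(n, round(theta, 3))
# The sampled slopes quickly drift away from the region implied by y0 = -1
# and settle around the prior mean theta0 = 1.
```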