Transcript and Presenter's Notes

Title: A Bayesian view of language evolution by iterated learning


1
A Bayesian view of language evolution by iterated learning
  • Tom Griffiths, Brown University
  • Mike Kalish, University of Louisiana
2
Linguistic universals
  • Human languages are a subset of all logically
    possible communication schemes
  • there are universal properties common to all languages
  • (Comrie, 1981; Greenberg, 1963; Hawkins, 1988)
  • Two questions:
  • why do linguistic universals exist?
  • why are particular properties universal?

3
Possible explanations
  • Traditional answer:
  • linguistic universals reflect innate constraints
    specific to a system for acquiring language
  • (e.g., Chomsky, 1965)
  • Alternative answer:
  • linguistic universals emerge because language is
    learned anew by each generation
  • (e.g., Briscoe, 1998; Kirby, 2001)

4
Iterated learning (Kirby, 2001)
  • Each learner sees data, forms a hypothesis, and
    produces the data given to the next learner
  • cf. the playground game "telephone"

5
The information bottleneck (Kirby, 2001)
[Figure: size indicates compressibility]
6
Analyzing iterated learning
What are the consequences of iterated learning?
(Komarova, Niyogi, & Nowak, 2002; Brighton, 2002;
Kirby, 2001; Smith, Kirby, & Brighton, 2003)
7
Outline
  • Iterated Bayesian learning
  • Markov chains
  • Convergence results
  • Example: Emergence of compositionality
  • Conclusion

8
Outline
  • Iterated Bayesian learning
  • Markov chains
  • Convergence results
  • Example: Emergence of compositionality
  • Conclusion

9
Bayesian inference
  • Rational procedure for updating beliefs
  • Foundation of many learning algorithms
  • (e.g., MacKay, 2003)
  • Widely used for language learning
  • (e.g., Charniak, 1993)

Reverend Thomas Bayes
10
Bayes' theorem
h: hypothesis, d: data
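The formula shown on this slide is not preserved in the transcript; presumably it is the standard form of Bayes' rule,

  \[ p(h \mid d) = \frac{p(d \mid h)\, p(h)}{\sum_{h'} p(d \mid h')\, p(h')} \]

where the sum ranges over all hypotheses under consideration.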
11
Iterated Bayesian learning
[Diagram: each learner samples a hypothesis from p(h|d), then generates
data from p(d|h) for the next learner]
12
Outline
  • Iterated Bayesian learning
  • Markov chains
  • Convergence results
  • Example: Emergence of compositionality
  • Conclusion

13
Markov chains
[Diagram: a chain of variables x(1) → x(2) → ... → x(t)]
Transition matrix P(x(t+1) | x(t))
  • x(t+1) is independent of the history given x(t)
  • Converges to a stationary distribution under
    easily checked conditions (ergodicity); the
    stationarity condition is given below
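As a reminder (standard definition, not text from the slide), a distribution π is stationary for the chain when one step of the transition matrix leaves it unchanged:

  \[ \pi(x') = \sum_{x} P(x' \mid x)\, \pi(x) \quad \text{for all } x'. \]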

14
Markov chain Monte Carlo
  • A strategy for sampling from complex probability
    distributions
  • Key idea: construct a Markov chain that
    converges to a particular distribution
  • e.g. Metropolis algorithm
  • e.g. Gibbs sampling

15
Gibbs sampling
  • For variables x = (x1, x2, ..., xn)
  • Draw xi(t+1) from P(xi | x-i)
  • x-i = (x1(t+1), ..., xi-1(t+1), xi+1(t), ..., xn(t))
  • Converges to P(x1, x2, ..., xn) (sketched below)

(Geman & Geman, 1984)
(a.k.a. the heat bath algorithm in statistical
physics)
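A minimal sketch of the idea (my illustration, not code from the talk): Gibbs sampling a standard bivariate Gaussian with correlation rho, where each full conditional P(xi | x-i) is itself Gaussian and easy to draw from.

  import numpy as np

  rho = 0.8                      # target: standard bivariate Gaussian with correlation rho
  rng = np.random.default_rng(0)

  def gibbs(n_samples=5000):
      x1, x2 = 0.0, 0.0
      samples = np.empty((n_samples, 2))
      for t in range(n_samples):
          # alternately draw each variable from its full conditional
          x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))   # P(x1 | x2)
          x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))   # P(x2 | x1)
          samples[t] = (x1, x2)
      return samples

  samples = gibbs()
  print(np.corrcoef(samples[1000:].T))   # sample correlation approaches rho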
16
Gibbs sampling
(MacKay, 2003)
17
Outline
  • Iterated Bayesian learning
  • Markov chains
  • Convergence results
  • Example: Emergence of compositionality
  • Conclusion

18
Analyzing iterated learning
  • Iterated learning is a Markov chain on (h,d)

19
Analyzing iterated learning
[Diagram: each learner samples a hypothesis from p(h|d), then generates
data from p(d|h) for the next learner]
  • Iterated learning is a Markov chain on (h,d)
  • Iterated Bayesian learning is a Gibbs sampler for
    the joint distribution p(d,h) = p(d|h) p(h)
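Spelled out (a restatement of the claim above, not additional slide text): the chain alternates

  \[ h_{n} \sim p(h \mid d_{n-1}), \qquad d_{n} \sim p(d \mid h_{n}), \]

which is exactly a two-variable Gibbs sampler on the pair (h, d), so its stationary distribution is the joint p(d, h) = p(d | h) p(h).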
20
Analytic results
  • Iterated Bayesian learning converges to the joint
    distribution p(d,h) = p(d|h) p(h)

(geometrically; Liu, Wong, & Kong, 1995)
21
Analytic results
  • Iterated Bayesian learning converges to the joint
    distribution p(d,h) = p(d|h) p(h)
  • Corollaries:
  • distribution over hypotheses converges to p(h)
  • distribution over data converges to p(d)
  • the proportion of a population of iterated
    learners with hypothesis h converges to p(h)

(geometrically; Liu, Wong, & Kong, 1995)
22
Outline
  • Iterated Bayesian learning
  • Markov chains
  • Convergence results
  • Example: Emergence of compositionality
  • Conclusion

23
A simple language model
24
A simple language model
  • Data: m event-utterance pairs
  • Hypotheses: languages, with error rate ε
[Diagram: example mappings from events to utterances,
including a holistic language]
25
Analysis technique
  • Compute transition matrix on languages
  • Sample Markov chains
  • Compare language frequencies with prior
  • (can also compute eigenvalues etc.; see the sketch below)
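A hypothetical sketch of the first step (my illustration; the matrix values are made up, not from the talk): given a transition matrix T over languages, where T[i, j] is the probability that a learner who saw data from language j acquires language i, the long-run language frequencies are the eigenvector of T with eigenvalue 1, which can then be compared with the prior p(h).

  import numpy as np

  def stationary_distribution(T):
      # leading eigenvector of a column-stochastic matrix, normalized to sum to 1
      vals, vecs = np.linalg.eig(T)
      k = np.argmax(vals.real)
      v = np.abs(vecs[:, k].real)
      return v / v.sum()

  # toy 2-language transition matrix (columns sum to 1)
  T = np.array([[0.9, 0.3],
                [0.1, 0.7]])
  print(stationary_distribution(T))   # -> [0.75, 0.25]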

26
Convergence to priors
[Plot: frequency of each language in the chain vs. the prior, across
iterations, for prior parameter 0.50, ε = 0.05, m = 3 and for prior
parameter 0.01, ε = 0.05, m = 3]
27
The information bottleneck
[Plot: chain vs. prior language frequencies across iterations for
different bottleneck sizes: prior parameter 0.50, ε = 0.05, m = 1;
prior parameter 0.01, ε = 0.05, m = 3; prior parameter 0.50,
ε = 0.05, m = 10]
28
The information bottleneck
The bottleneck affects the relative stability of the
languages favored by the prior
29
Outline
  • Iterated Bayesian learning
  • Markov chains
  • Convergence results
  • Example: Emergence of compositionality
  • Conclusion

30
Implications for linguistic universals
  • Two questions:
  • why do linguistic universals exist?
  • why are particular properties universal?
  • Different answers:
  • existence explained through iterated learning
  • universal properties depend on the prior
  • Focuses inquiry on the priors of the learners
  • languages reflect the biases of human learners

31
Extensions and future directions
  • Results extend to:
  • unbounded populations
  • continuous-time population dynamics
  • Iterated learning applies to other kinds of knowledge:
  • religious concepts, social norms, legends
  • Provides a method for evaluating priors:
  • experiments in iterated learning with humans

32
(No Transcript)
33
Iterated function learning
  • Each learner sees a set of (x,y) pairs
  • Makes predictions of y for new x values
  • Predictions are data for the next learner

34
Function learning in the lab
Examine iterated learning with different initial
data
35
[Figure: functions produced at iterations 1-9, starting from
different initial data]
(Kalish, 2004)
36
(No Transcript)
37
An example: Gaussians
  • If we assume:
  • data, d, is a single real number, x
  • hypotheses, h, are means of a Gaussian, μ
  • prior, p(μ), is Gaussian(μ0, σ0²)
  • then p(xn+1 | xn) is Gaussian(μn, σx² + σn²)
    (simulated in the sketch below)
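A minimal simulation of this setup (my illustration, not code from the talk), assuming each learner sees a single observation, samples μ from its posterior, and produces one observation for the next learner, with data variance σx² = 1:

  import numpy as np

  mu0, var0 = 0.0, 1.0       # prior on the mean: μ ~ N(mu0, var0)
  var_x = 1.0                # known variance of the data
  rng = np.random.default_rng(0)

  x = 20.0                   # initial datum x0 = 20, far from the prior
  chain = []
  for t in range(50):
      # posterior over μ after seeing one observation x
      post_var = 1.0 / (1.0 / var0 + 1.0 / var_x)
      post_mean = post_var * (mu0 / var0 + x / var_x)
      mu = rng.normal(post_mean, np.sqrt(post_var))   # sample a hypothesis
      x = rng.normal(mu, np.sqrt(var_x))              # produce data for the next learner
      chain.append(mu)

  # after a few iterations the hypotheses behave like draws from the prior N(0, 1)
  print(np.mean(chain[10:]), np.var(chain[10:]))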

38
[Plot: with μ0 = 0, σ0² = 1, x0 = 20, iterated learning results in
rapid convergence to the prior]
39
An example: Linear regression
  • Assume:
  • data, d, are pairs of real numbers (x, y)
  • hypotheses, h, are functions
  • An example: linear regression
  • hypotheses have slope θ and pass through the origin
  • p(θ) is Gaussian(θ0, σ0²)

[Diagram: a line y = θx through the origin, evaluated at x = 1]
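For completeness (my own derivation, assuming Gaussian observation noise with variance σy², which is not stated on the slide), the posterior over the slope after a single pair (x, y) is

  \[ p(\theta \mid x, y) = \mathcal{N}\!\left( \frac{\theta_0/\sigma_0^2 + x y/\sigma_y^2}{1/\sigma_0^2 + x^2/\sigma_y^2},\; \left( \frac{1}{\sigma_0^2} + \frac{x^2}{\sigma_y^2} \right)^{-1} \right), \]

so iterating sample-then-produce plays the same role here as in the Gaussian example above.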
40
[Plot: iterated learning of the slope with θ0 = 1, σ0² = 0.1, y0 = -1]
41
(No Transcript)