Title: Analyzing iterated learning
1. Analyzing iterated learning
- Tom Griffiths, Brown University
- Mike Kalish, University of Louisiana
2. Cultural transmission
- Most knowledge is based on secondhand data
- Some things can only be learned from others
- Cultural objects are transmitted across generations
- Studying the cognitive aspects of cultural transmission provides unique insights
3. Iterated learning (Kirby, 2001)
- Each learner sees data, forms a hypothesis, and produces the data given to the next learner
- cf. the playground game "Telephone"
4. Objects of iterated learning
- It's not just about languages
- In the wild
- religious concepts
- social norms
- myths and legends
- causal theories
- In the lab
- functions and categories
5. Outline
- Analyzing iterated learning
- Iterated Bayesian learning
- Examples
- Iterated learning with humans
- Conclusions and open questions
6. Outline
- Analyzing iterated learning
- Iterated Bayesian learning
- Examples
- Iterated learning with humans
- Conclusions and open questions
7. Discrete generations of single learners
[Diagram: data → hypothesis → data → hypothesis → ..., alternating P_L(h|d) and P_P(d|h)]
- P_L(h|d): probability of inferring hypothesis h from data d
- P_P(d|h): probability of generating data d from hypothesis h
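In symbols, one generation of this process can be sketched as two sampling steps (notation follows the definitions above; t indexes generations):

```latex
% Learner t infers a hypothesis from the previous learner's data,
% then produces the data seen by learner t+1.
h_t \sim P_L(h \mid d_{t-1}), \qquad d_t \sim P_P(d \mid h_t)
```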
8. Markov chains
[Diagram: chain of variables x(1) → x(2) → ... → x(t)]
Transition matrix: T = P(x(t+1) | x(t))
- Variable x(t+1) is independent of history given x(t)
- Converges to a stationary distribution under easily checked conditions (ergodicity)
9. Stationary distributions
- Stationary distribution: π(x) = Σ_x' P(x | x') π(x')
- In matrix form: π = T π
- π is the first eigenvector of the matrix T (eigenvalue 1)
- Second eigenvalue sets the rate of convergence
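As a numerical illustration, a minimal sketch (assuming NumPy; the two-state transition matrix is made up for the example) that recovers π as the eigenvector of T with eigenvalue 1 and reads the convergence rate off the second eigenvalue:

```python
import numpy as np

# Column-stochastic transition matrix: T[i, j] = P(x(t+1) = i | x(t) = j).
# The values are arbitrary, chosen only to illustrate the computation.
T = np.array([[0.9, 0.3],
              [0.1, 0.7]])

eigvals, eigvecs = np.linalg.eig(T)
order = np.argsort(-eigvals.real)          # sort eigenvalues, largest first
pi = eigvecs[:, order[0]].real
pi = pi / pi.sum()                         # normalize the leading eigenvector

print("stationary distribution:", pi)                # eigenvector with eigenvalue 1
print("second eigenvalue:", eigvals[order[1]].real)  # sets the convergence rate
```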
10. Analyzing iterated learning
11. A Markov chain on hypotheses
- Transition probabilities sum out data: Q(h(t+1) | h(t)) = Σ_d P_L(h(t+1) | d) P_P(d | h(t))
- Stationary distribution and convergence rate follow from the eigenvectors and eigenvalues of Q
- Can be computed numerically for matrices of reasonable size, and analytically in some cases
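A sketch of the "sum out data" step under the same assumptions (NumPy; the learner and production matrices here are arbitrary placeholders, not the models analyzed in the talk):

```python
import numpy as np

# PL[h, d]: probability of inferring hypothesis h from data d.
# PP[d, h]: probability of generating data d from hypothesis h.
# Arbitrary placeholder matrices; each column is a probability distribution.
PL = np.array([[0.8, 0.2],
               [0.2, 0.8]])
PP = np.array([[0.6, 0.1],
               [0.4, 0.9]])

# Summing out data gives the hypothesis-to-hypothesis transition matrix:
# Q[h_next, h] = sum_d PL[h_next, d] * PP[d, h], i.e. a matrix product.
Q = PL @ PP
assert np.allclose(Q.sum(axis=0), 1.0)  # columns of Q are distributions

# The eigen-analysis from the previous sketch applies directly to Q.
```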
12. Infinite populations in continuous time
- Language dynamical equation
- Neutral model (f_j(x) constant)
- Stable equilibrium at first eigenvector of Q
(Nowak, Komarova, & Niyogi, 2001; Komarova & Nowak, 2003)
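For reference, a sketch of the language dynamical equation as given in the cited papers (x_j is the fraction of the population using language j, f_j(x) its fitness, Q_{ij} the probability that a learner exposed to language i acquires language j, and φ(x) the average fitness):

```latex
% Language dynamical equation (Nowak, Komarova, & Niyogi, 2001):
\dot{x}_j = \sum_i x_i \, f_i(x) \, Q_{ij} - \phi(x) \, x_j,
\qquad \phi(x) = \sum_i f_i(x) \, x_i
% Neutral model: f_j(x) constant, so the dynamics reduce to
% \dot{x}_j = \sum_i x_i Q_{ij} - x_j, whose stable equilibrium is the
% first eigenvector of Q.
```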
13. Outline
- Analyzing iterated learning
- Iterated Bayesian learning
- Examples
- Iterated learning with humans
- Conclusions and open questions
14. Bayesian inference
- Rational procedure for updating beliefs
- Foundation of many learning algorithms
- (e.g., MacKay, 2003)
- Widely used for language learning
- (e.g., Charniak, 1993)
Reverend Thomas Bayes
15. Bayes' theorem
P(h | d) = P(d | h) P(h) / Σ_h' P(d | h') P(h')
- h: hypothesis
- d: data
16. Iterated Bayesian learning
17. Markov chains on h and d
- Markov chain on h has stationary distribution p(h), the prior
- Markov chain on d has stationary distribution p(d) = Σ_h p(d | h) p(h), the prior predictive distribution
18. Markov chain Monte Carlo
- A strategy for sampling from complex probability distributions
- Key idea: construct a Markov chain which converges to a particular distribution
- e.g., the Metropolis algorithm
- e.g., Gibbs sampling
19. Gibbs sampling
- For variables x = (x_1, x_2, ..., x_n)
- Draw x_i(t+1) from P(x_i | x_-i)
- x_-i = (x_1(t+1), x_2(t+1), ..., x_{i-1}(t+1), x_{i+1}(t), ..., x_n(t))
- Converges to P(x_1, x_2, ..., x_n)
(Geman & Geman, 1984)
(a.k.a. the heat bath algorithm in statistical physics)
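A minimal Gibbs-sampling sketch (assuming NumPy), targeting a bivariate Gaussian with correlation ρ; the target is a toy choice for illustration, not an example from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8          # correlation of the target bivariate Gaussian (toy choice)
n_iter = 5000
x1, x2 = 0.0, 0.0  # arbitrary starting point
samples = []

for _ in range(n_iter):
    # Each full conditional of a standard bivariate Gaussian is itself Gaussian:
    # x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2 | x1.
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
    samples.append((x1, x2))

samples = np.array(samples)
print("sample correlation:", np.corrcoef(samples.T)[0, 1])  # should be close to rho
```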
20. Gibbs sampling
(MacKay, 2003)
21. Iterated learning is a Gibbs sampler
- Iterated Bayesian learning is a Gibbs sampler for the joint distribution p(d, h) = p(d | h) p(h)
- Implies
- (h, d) converges to this distribution
- convergence rates are known
- (Liu, Wong, & Kong, 1995)
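A small simulation sketch of this claim (assuming NumPy) for a discrete hypothesis and data space: each learner samples a hypothesis from the posterior given the previous learner's data, then generates new data, and the chain's distribution over hypotheses approaches the prior. The prior and likelihood values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
prior = np.array([0.7, 0.2, 0.1])          # p(h), arbitrary illustrative values
like = np.array([[0.6, 0.3, 0.1],          # like[h, d] = p(d | h), rows sum to 1
                 [0.2, 0.5, 0.3],
                 [0.1, 0.2, 0.7]])

h = 2                                       # arbitrary initial hypothesis
counts = np.zeros(3)
for _ in range(50000):
    d = rng.choice(3, p=like[h])            # P_P: generate data from h
    post = prior * like[:, d]               # P_L: Bayesian posterior over h given d
    h = rng.choice(3, p=post / post.sum())
    counts[h] += 1

print("empirical distribution over h:", counts / counts.sum())
print("prior:                        ", prior)   # the two should match closely
```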
22. Outline
- Analyzing iterated learning
- Iterated Bayesian learning
- Examples
- Iterated learning with humans
- Conclusions and open questions
23. An example: Gaussians
- If we assume
- data, d, is a single real number, x
- hypotheses, h, are means of a Gaussian, μ
- prior, p(μ), is Gaussian(μ0, σ0²)
- then p(x_{n+1} | x_n) is Gaussian(μn, σx² + σn²)
24. An example: Gaussians
- If we assume
- data, d, is a single real number, x
- hypotheses, h, are means of a Gaussian, μ
- prior, p(μ), is Gaussian(μ0, σ0²)
- then p(x_{n+1} | x_n) is Gaussian(μn, σx² + σn²)
- p(x_n | x_0) is Gaussian(c^n x_0 + (1 - c^n) μ0, (σx² + σ0²)(1 - c^2n)), with c = σ0² / (σ0² + σx²)
- i.e. geometric convergence to the prior
25. An example: Gaussians
- p(x_n | x_0) is Gaussian(c^n x_0 + (1 - c^n) μ0, (σx² + σ0²)(1 - c^2n))
26. [Plot: μ0 = 0, σ0² = 1, x0 = 20; iterated learning results in rapid convergence to the prior]
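A simulation sketch of this Gaussian example (assuming NumPy), using the parameter values above; the likelihood variance σx² = 1 is an assumption, since the slide does not state it:

```python
import numpy as np

rng = np.random.default_rng(2)
mu0, var0 = 0.0, 1.0     # prior on the mean: N(mu0, var0)
var_x = 1.0              # likelihood variance (assumed; not given on the slide)
x = 20.0                 # initial data x0

for t in range(1, 21):
    # P_L: posterior over the mean mu given the single observation x
    post_var = 1.0 / (1.0 / var0 + 1.0 / var_x)
    post_mean = post_var * (mu0 / var0 + x / var_x)
    mu = rng.normal(post_mean, np.sqrt(post_var))
    # P_P: the next learner's datum is generated from the inferred mean
    x = rng.normal(mu, np.sqrt(var_x))
    print(f"generation {t:2d}: x = {x:6.2f}")   # drifts quickly toward mu0 = 0
```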
27. An example: linear regression
- Assume
- data, d, are pairs of real numbers (x, y)
- hypotheses, h, are functions
- An example: linear regression
- hypotheses have slope θ and pass through the origin
- p(θ) is Gaussian(θ0, σ0²)
[Plot: line y = θx through the origin, shown at x = 1]
28. [Plot: iterated learning of the slope θ, with θ0 = 1, σ0² = 0.1, y0 = -1, data at x = 1]
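A parallel sketch for the regression example (assuming NumPy); the observation noise variance is an assumption, since only the prior parameters and the initial datum appear on the slide. Each learner sees a single (x, y) pair at x = 1, infers the slope, and generates the next learner's y:

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, var0 = 1.0, 0.1   # prior over the slope: N(theta0, var0)
var_y = 0.1               # observation noise variance (assumed)
x_obs = 1.0               # each learner sees a single pair at x = 1
y = -1.0                  # initial datum y0

for t in range(1, 16):
    # P_L: posterior over the slope theta given the pair (x_obs, y)
    post_var = 1.0 / (1.0 / var0 + x_obs**2 / var_y)
    post_mean = post_var * (theta0 / var0 + x_obs * y / var_y)
    theta = rng.normal(post_mean, np.sqrt(post_var))
    # P_P: generate the next learner's y at the same x
    y = rng.normal(theta * x_obs, np.sqrt(var_y))
    print(f"generation {t:2d}: y at x=1 is {y:6.2f}")  # moves toward theta0 = 1
```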
29. An example: compositionality
30. An example: compositionality
- Data: m event-utterance pairs
- Hypotheses: languages, with error ε
[Diagram: example languages mapping events (0/1 features) to utterances, compositional vs. holistic]
31. Analysis technique
- Compute transition matrix on languages
- Sample Markov chains
- Compare language frequencies with prior
- (can also compute eigenvalues etc.)
32. Convergence to priors
[Plots: language frequencies in the chain vs. the prior, over iterations, for α = 0.50, ε = 0.05, m = 3 and α = 0.01, ε = 0.05, m = 3]
33. The information bottleneck
[Plots: chain vs. prior language frequencies over iterations for α = 0.50, ε = 0.05 with m = 1 and m = 10, and for α = 0.01, ε = 0.05, m = 3]
34. The information bottleneck
Bottleneck affects relative stability of languages favored by prior
35. Outline
- Analyzing iterated learning
- Iterated Bayesian learning
- Examples
- Iterated learning with humans
- Conclusions and open questions
36. A method for discovering priors
- Iterated learning converges to the prior
- Evaluate the prior by reproducing iterated learning in the lab
37. Iterated function learning
- Each learner sees a set of (x,y) pairs
- Makes predictions of y for new x values
- Predictions are data for the next learner
38. Function learning in the lab
Examine iterated learning with different initial data
39. [Figure: human iterated function learning, initial data followed by iterations 1-9]
(Kalish, 2004)
40. Outline
- Analyzing iterated learning
- Iterated Bayesian learning
- Examples
- Iterated learning with humans
- Conclusions and open questions
41. Conclusions and open questions
- Iterated Bayesian learning converges to the prior
- properties of languages are properties of learners
- the information bottleneck doesn't affect the equilibrium
- What about other learning algorithms?
- What determines rates of convergence?
- amount and structure of input data
- What happens with people?