Modelling Language Evolution Lecture 2: Learning Syntax

1
Modelling Language Evolution
Lecture 2: Learning Syntax

Simon Kirby
University of Edinburgh
Language Evolution & Computation Research Unit

2
Multi-layer networks

For many modelling problems, multi-layer networks
are used
Three layers are common
Input layer
Hidden layer
Output layer
What do the hidden-node activations correspond
to?
Internal representation
For some problems, networks need to compute an
intermediate representation of the data

3
XOR network - step 1

XOR is the same as OR but not AND
Calculate OR
Calculate NOT AND
AND the results

[Figure: OR and NOT AND units feeding into an AND unit]
4
XOR network - step 2
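[Figure: XOR network. INPUT 1 and INPUT 2 feed two hidden units (HIDDEN 1, HIDDEN 2) computing OR (input weights 10, 10) and NOT AND (input weights -5, -5); both feed the OUTPUT unit, which computes AND (weights 5, 5). A BIAS NODE supplies -7.5 to OR, 7.5 to NOT AND and -7.5 to AND.]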
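A minimal sketch of this network, assuming a sigmoid activation on every unit and the weights as read off the slide's figure (OR unit: 10, 10 with bias -7.5; NOT AND unit: -5, -5 with bias 7.5; AND output: 5, 5 with bias -7.5). The helper names `unit` and `xor_net` are illustrative only.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation: squashes any net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def unit(inputs, weights, bias):
    """One unit: weighted sum of its inputs plus the bias-node weight, then sigmoid."""
    return sigmoid(sum(i * w for i, w in zip(inputs, weights)) + bias)


def xor_net(x1, x2):
    """Forward pass; returns ((hidden activations), output activation)."""
    h_or = unit([x1, x2], [10, 10], -7.5)     # hidden unit computing OR
    h_nand = unit([x1, x2], [-5, -5], 7.5)    # hidden unit computing NOT AND
    out = unit([h_or, h_nand], [5, 5], -7.5)  # output unit: AND of the two hidden units
    return (h_or, h_nand), out


for x1 in (0, 1):
    for x2 in (0, 1):
        (h_or, h_nand), out = xor_net(x1, x2)
        print(x1, x2, f"OR={h_or:.2f} NAND={h_nand:.2f} out={out:.2f}")
```

Thresholding each activation at 0.5 recovers the truth tables: the hidden layer re-represents the two inputs as OR and NOT AND, and XOR then becomes a simple AND over that internal representation.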
5
Simple example (Smith 2003)

Smith wanted to model a simple language-using
population
Needed a model that could learn a vocabulary
3 meanings: (1 0 0), (0 1 0), (0 0 1)
6 possible signals, e.g. (0 0 0), (1 0 0), (1 1 0)
Used networks for reception and production

[Figure: networks mapping SIGNAL to MEANING and MEANING to SIGNAL, labelled Train and Perform]

After training, knowledge of the language is stored in
the weights
During reception and production, the internal
representation is held in the activations of the
hidden nodes
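A minimal sketch of the two directions, assuming single-layer sigmoid networks trained with the delta rule and a hypothetical vocabulary that pairs each meaning with one of the signals listed above. The slide does not give Smith's architecture or learning rule, and unlike the slide's networks this sketch has no hidden layer, so it only illustrates that the learned vocabulary ends up stored in the weights. The class `Net` and the helper `binarise` are hypothetical names.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class Net:
    """Single-layer sigmoid network (no hidden layer, for brevity)."""

    def __init__(self, n_in, n_out, seed=0):
        rng = random.Random(seed)
        # w[j][i] links input i to output j; the last weight in each row is the bias
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                  for _ in range(n_out)]

    def run(self, x):
        xb = list(x) + [1.0]  # append the constant bias input
        return [sigmoid(sum(wi * xi for wi, xi in zip(row, xb))) for row in self.w]

    def train(self, pairs, epochs=2000, lr=0.5):
        """Delta rule: move each weight to reduce the squared output error."""
        for _ in range(epochs):
            for x, target in pairs:
                y = self.run(x)
                xb = list(x) + [1.0]
                for row, t, yj in zip(self.w, target, y):
                    delta = (t - yj) * yj * (1 - yj)
                    for i, xi in enumerate(xb):
                        row[i] += lr * delta * xi


def binarise(v):
    """Read each activation as a bit, using 0.5 as the threshold."""
    return tuple(1 if a > 0.5 else 0 for a in v)


# Hypothetical vocabulary: one signal per meaning, drawn from the slide's examples
vocab = [((1, 0, 0), (0, 0, 0)),
         ((0, 1, 0), (1, 0, 0)),
         ((0, 0, 1), (1, 1, 0))]

production = Net(3, 3)         # MEANING to SIGNAL
production.train(vocab)
reception = Net(3, 3, seed=1)  # SIGNAL to MEANING
reception.train([(s, m) for m, s in vocab])

for meaning, signal in vocab:
    print(meaning, "produces", binarise(production.run(meaning)),
          "| hearing", signal, "gives", binarise(reception.run(signal)))
```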

6
Can a network learn syntax? (Elman 1993)

Important question for the evolution of language
Modelling can tell us what we can do without
Can we model the acquisition of syntax using a
neural network?
One problem: sentences can be arbitrarily long

How much knowledge of grammar are we born with?
7
Representing time

Imagine we presented words one at a time to a
network
Would it matter in what order the words were given?
No: each word is a brand-new experience
The net has no way of relating each experience
to what has gone before
It needs some kind of working memory
Intuitively, each word needs to be presented
along with what the network was thinking about
when it heard the previous word

8
The Simple Recurrent Net (SRN)
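[Figure: SRN architecture. Input and Context units feed the Hidden layer, which feeds the Output; copy-back connections copy the Hidden activations into Context.]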

At each time step, the input is
a new experience
plus a copy of the hidden-unit activations from the
previous time step
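A minimal sketch of the SRN forward pass, assuming sigmoid units, random untrained weights, no bias terms, and context units that start each sequence at 0.5 (the same neutral value used later in this lecture for blanked memory). The class `SRN` and the function `run` are illustrative names.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class SRN:
    """Simple recurrent net: hidden units see the current input plus the context units."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

        self.w_in = rand(n_hidden, n_in)       # input to hidden
        self.w_ctx = rand(n_hidden, n_hidden)  # context to hidden
        self.w_out = rand(n_out, n_hidden)     # hidden to output
        self.reset()

    def reset(self):
        """Start of a new sequence: context units hold the neutral value 0.5."""
        self.context = [0.5] * len(self.w_ctx)

    def step(self, x):
        hidden = [sigmoid(sum(w * a for w, a in zip(wi, x)) +
                          sum(w * c for w, c in zip(wc, self.context)))
                  for wi, wc in zip(self.w_in, self.w_ctx)]
        self.context = list(hidden)  # copy-back connections, used at the next step
        return [sigmoid(sum(w * h for w, h in zip(wo, hidden))) for wo in self.w_out]


def run(net, sequence):
    """Present a sequence one item at a time; return the output at each step."""
    net.reset()
    return [net.step(x) for x in sequence]


a, b = [1, 0], [0, 1]  # two one-hot "words"
net = SRN(n_in=2, n_hidden=3, n_out=2)
print(run(net, [a, b])[-1])  # response to b after a
print(run(net, [b])[-1])     # response to b on its own
```

The two printed outputs differ even though the current input is the same word, because the context units carry a trace of what came before; without the copy-back connections they would be identical.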

9
What inputs and outputs?

How do we force the network to learn syntactic
relations?
Can we do it without an external teacher?
Answer: the next-word prediction task
Inputs: current word (and context)
Outputs: predicted next word
The error signal is implicit in the data
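Because the target at each step is just the word that follows, the training pairs can be read straight off the corpus. A minimal sketch, assuming whitespace-tokenised sentences and an end-of-sentence marker "." as the target after the last word (the marker and the function name `prediction_pairs` are assumptions).

```python
def prediction_pairs(sentences, end="."):
    """Turn each sentence into (current word, next word) training pairs.

    No teacher is needed: the correct output at every step is simply
    the next word in the data."""
    pairs = []
    for sentence in sentences:
        words = sentence.split() + [end]
        pairs.extend(zip(words, words[1:]))
    return pairs


corpus = ["cats chase dogs", "mary walks"]
for current, target in prediction_pairs(corpus):
    print(f"input: {current:6} target: {target}")
```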

10
Long distance dependencies and hierarchy

Elman's question: how much is innate?
Many argue that
long-distance dependencies and hierarchical
embedding are unlearnable without an innate
language faculty
How well can an SRN learn them?
Examples
boys who chase dogs see girls
cats chase dogs
dogs see boys who cats who mary feeds chase
mary walks

11
First experiments

Each word is encoded by turning a single input
unit on (a localist, or one-hot, encoding).
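A minimal sketch of this localist encoding, assuming a vocabulary built only from the example sentences on the previous slide; `build_vocab` and `one_hot` are illustrative names.

```python
def build_vocab(sentences):
    """Give each distinct word an index, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def one_hot(word, vocab):
    """Localist code: a vector with exactly one unit on, for this word."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


examples = ["boys who chase dogs see girls", "cats chase dogs",
            "dogs see boys who cats who mary feeds chase", "mary walks"]
vocab = build_vocab(examples)
print(len(vocab), one_hot("dogs", vocab))
```

With this coding every word is equally distant from every other, so nothing about nouns, verbs or number is given to the network in advance; any such categories must be discovered from how the words are distributed.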

12
Initial results

How can we tell if the net has learned syntax?
Check whether it predicts the correct number
agreement (e.g. a plural verb after a plural subject)
It gets some things right, but makes many mistakes, e.g.

boys who girl chase see dog

It seems not to have learned the long-distance
dependencies.
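One way to score agreement, sketched under assumptions: after a plural subject such as "boys who mary feeds", the network's predicted next-word activations should favour plural verbs over singular ones. The verb sets and the prediction dictionary below are hypothetical placeholders, not Elman's lexicon or results.

```python
# Hypothetical verb lexicon, split by number
SINGULAR_VERBS = {"sees", "chases", "feeds", "walks"}
PLURAL_VERBS = {"see", "chase", "feed", "walk"}


def agreement_correct(prediction, subject_is_plural):
    """True if the predicted next-word activations favour verbs whose
    number matches the subject.

    `prediction` maps each word to the activation the network gives it
    as the next word."""
    plural = sum(prediction.get(v, 0.0) for v in PLURAL_VERBS)
    singular = sum(prediction.get(v, 0.0) for v in SINGULAR_VERBS)
    return plural > singular if subject_is_plural else singular > plural


# Hypothetical output after "boys who mary feeds": the main verb must agree with "boys"
prediction = {"see": 0.40, "chase": 0.20, "sees": 0.15, "walks": 0.05, ".": 0.20}
print(agreement_correct(prediction, subject_is_plural=True))
```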

13
Incremental input

Elman tried teaching the network in stages
Five stages
10,000 simple sentences (x 5)
7,500 simple 2,500 complex (x 5)
5,000 simple 5,000 complex (x 5)
2,500 simple 7,500 complex (x 5)
10,000 complex sentences (x 5)
Surprisingly, this training regime led to
success!
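The five stages can be written down as a schedule of corpus mixes. A minimal sketch: the sentence counts come from the slide, while reading "(x 5)" as five passes over each stage's corpus is an assumption, as is the name `schedule`.

```python
# (simple, complex, passes) for each stage, from the slide;
# treating "x 5" as five passes over the corpus is an assumption
STAGES = [(10_000, 0, 5),
          (7_500, 2_500, 5),
          (5_000, 5_000, 5),
          (2_500, 7_500, 5),
          (0, 10_000, 5)]


def schedule(stages):
    """Yield (stage number, fraction of complex sentences, sentences presented)."""
    for n, (simple, complex_, passes) in enumerate(stages, start=1):
        total = simple + complex_
        yield n, complex_ / total, total * passes


for stage, frac_complex, presented in schedule(STAGES):
    print(f"stage {stage}: {frac_complex:.0%} complex, {presented:,} sentences presented")
```

Each stage keeps the corpus size fixed and shifts only the proportion of complex sentences, which is what makes the input, rather than the learner, the source of the developmental schedule.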

14
Is this realistic?

Elman reasons that this is in some ways like
children's behaviour
Children seem to learn to produce simple
sentences first
Is this a reasonable suggestion?
Where is the incremental input coming from?
The developmental schedule appears to be a product of
changing the input.

15
Another route to incremental learning

Rather than the experimenter selecting simple,
then complex sentences, could the network do this itself?
Children's data isn't changing; children are
changing
Elman gets the network to change throughout its
life
What is a reasonable way for the network to
change?
One possibility: memory

16
Reducing the attention span of a network

Destroy memory by setting context nodes to 0.5
Five stages of learning (with both simple and
complex sentences)
Memory blanked every 3-4 words (x 12)
Memory blanked every 4-5 words (x 5)
Memory blanked every 5-6 words (x 5)
Memory blanked every 6-7 words (x 5)
No memory limitations (x 5)
The network learned the task.
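A minimal sketch of the blanking schedule, assuming the gap between resets is drawn uniformly from the stated range (e.g. 3 or 4 words) and that blanking sets every context unit to 0.5 as the slide describes. Reading "(x 12)" and "(x 5)" as passes, and the names `blank_points` and `blank`, are assumptions.

```python
import random

# (shortest gap, longest gap, passes) for the four memory-limited stages,
# as listed on the slide; reading "x 12" / "x 5" as passes is an assumption
LIMITED_STAGES = [(3, 4, 12), (4, 5, 5), (5, 6, 5), (6, 7, 5)]


def blank_points(n_words, lo, hi, rng):
    """Word positions at which memory is wiped, with a fresh gap of lo..hi words each time."""
    points, pos = [], 0
    while True:
        pos += rng.randint(lo, hi)  # randint includes both endpoints
        if pos >= n_words:
            return points
        points.append(pos)


def blank(context):
    """Destroy the network's memory: every context unit goes to 0.5."""
    return [0.5] * len(context)


rng = random.Random(0)
for lo, hi, passes in LIMITED_STAGES:
    print(f"every {lo}-{hi} words (x {passes}): reset at word positions",
          blank_points(30, lo, hi, rng))
```

In a full simulation, the SRN's context would be replaced by `blank(context)` whenever the current word position is in the list, and the final stage would skip blanking altogether.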

17
Counter-intuitive conclusion: starting small

A fully functioning network cannot learn syntax.
A network that is initially limited (but matures)
learns well.
This seems a strange result, suggesting that
networks aren't good models of language learning
after all
On the other hand:
Children mature during learning
Infancy in humans is prolonged relative to other
species
Ultimate language ability seems to be related to
how early learning starts,
i.e., there is a critical period for language
acquisition.

18
Next lecture