Modelling Language Evolution Lecture 2: Learning Syntax

Simon Kirby, University of Edinburgh, Language Evolution & Computation Research Unit

1
Modelling Language Evolution, Lecture 2: Learning Syntax
  • Simon Kirby
  • University of Edinburgh
  • Language Evolution & Computation Research Unit

2
Multi-layer networks
  • For many modelling problems, multi-layer networks
    are used
  • Three layers are common: an input layer, a
    hidden layer, and an output layer
  • What do the hidden-node activations correspond
    to? An internal representation
  • For some problems, networks need to compute an
    intermediate representation of the data

3
XOR network - step 1
  • XOR is the same as OR but not AND
  • Calculate OR
  • Calculate NOT AND
  • AND the results

[Figure: OR and NOT AND are computed first, and their results are combined by AND]
4
XOR network - step 2
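[Figure: XOR network. INPUT 1 and INPUT 2 feed two hidden nodes (HIDDEN 1, HIDDEN 2) computing NOT AND and OR; these feed an OUTPUT node computing AND. The diagram includes a BIAS NODE. Weight labels shown: 10, 10, -5, -5, 5, 5, 7.5, -7.5, -7.5.]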
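
The weights labelled in the diagram can be checked by hand. The sketch below is one reading of the figure, not a definitive reconstruction: it assumes that the NOT AND node has input weights -5, -5 and bias weight 7.5, that the OR node has input weights 10, 10 and bias weight -7.5, and that the output (AND) node has weights 5, 5 and bias weight -7.5. It also assumes simple threshold units, where the original network may well have used sigmoid units.

```python
def step(x):
    """Threshold activation: the unit fires (1) when its net input is positive."""
    return 1 if x > 0 else 0


def xor_net(in1, in2):
    # Hidden node computing NOT AND (assumed pairing of weights and bias).
    not_and = step(-5 * in1 - 5 * in2 + 7.5)
    # Hidden node computing OR (assumed pairing of weights and bias).
    or_ = step(10 * in1 + 10 * in2 - 7.5)
    # Output node: AND of the two hidden nodes.
    return step(5 * not_and + 5 * or_ - 7.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```

Under these assumptions the output is 1 only when exactly one input is 1, which is XOR. The two hidden activations are the intermediate representation that a single-layer network lacks.
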
5
Simple example (Smith 2003)
  • Smith wanted to model a simple language-using
    population
  • Needed a model that learned vocabulary
  • 3 meanings: (1 0 0), (0 1 0), (0 0 1)
  • 6 possible signals, e.g. (0 0 0), (1 0 0), (1 1 0)
  • Used networks for reception and production

[Figure: network diagrams with the labels SIGNAL, MEANING, Train, Perform, MEANING, SIGNAL]
  • After training, knowledge of language stored in
    the weights
  • During reception/production, internal
    representation is in the activations of the
    hidden nodes

6
Can a network learn syntax? (Elman 1993)
  • Important question for the evolution of language
  • Modelling can tell us what we can do without
  • Can we model the acquisition of syntax using a
    neural network?
  • One problem: sentences can be arbitrarily long

How much knowledge of grammar are we born with?
7
Representing time
  • Imagine we presented words one at a time to a
    network
  • Would it matter what order the words were given?
  • No. Each word is a brand new experience
  • The net has no way of relating each experience
    with what has gone before
  • Needs some kind of working memory
  • Intuitively each word needs to be presented
    along with what the network was thinking about
    when it heard the previous word

8
The Simple Recurrent Net (SRN)
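[Figure: SRN architecture. The Input and Context layers feed the Hidden layer, which feeds the Output layer; copy-back connections run from Hidden to Context.]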
  • At each time step, the input is a new experience
    plus a copy of the hidden unit activations at the
    last time step
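
The copy-back mechanism can be sketched in a few lines. This is a forward pass only, with no learning. The layer sizes, the random weights, the sigmoid activation and the initial context value of 0.5 are all assumptions made for illustration, and the class name `SimpleRecurrentNet` is hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class SimpleRecurrentNet:
    """Forward pass of an SRN with hypothetical sizes and random weights."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

        # Each hidden unit sees the input units AND the context units.
        self.w_hidden = rand_matrix(n_hidden, n_in + n_hidden)
        self.w_out = rand_matrix(n_out, n_hidden)
        # Assumed neutral starting value for the context units.
        self.context = [0.5] * n_hidden

    def step(self, inputs):
        full_input = list(inputs) + self.context
        hidden = [sigmoid(sum(w * x for w, x in zip(row, full_input)))
                  for row in self.w_hidden]
        output = [sigmoid(sum(w * h for w, h in zip(row, hidden)))
                  for row in self.w_out]
        # Copy-back: the hidden activations become the next step's context.
        self.context = list(hidden)
        return output


net = SimpleRecurrentNet(n_in=3, n_hidden=4, n_out=3)
first = net.step([1, 0, 0])
second = net.step([1, 0, 0])  # same word again, but the context has changed
print(first)
print(second)
```

Presenting the same input twice gives two different outputs, because the second presentation arrives together with a record of the first. That is the sense in which the context units act as a working memory.
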

9
What inputs and outputs?
  • How do we force the network to learn syntactic
    relations?
  • Can we do it without an external teacher?
  • Answer: the next-word prediction task
  • Inputs: current word (and context)
  • Outputs: predicted next word
  • The error signal is implicit in the data
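
To see why no external teacher is needed, note that the training targets can be read straight off the word stream. A minimal sketch, using one of the example sentences from later in the lecture (the helper name `prediction_pairs` is hypothetical):

```python
def prediction_pairs(words):
    """Pair each word with the word that follows it in the stream."""
    return list(zip(words, words[1:]))


sentence = "boys who chase dogs see girls".split()
for current, target in prediction_pairs(sentence):
    print(current, "->", target)
```

Each pair gives the network an input (the current word) and the output it should have produced (the word that actually came next), so the error can be computed from the data alone.
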

10
Long distance dependencies and hierarchy
  • Elman's question: how much is innate?
  • Many argue that long-distance dependencies and
    hierarchical embedding are unlearnable without
    an innate language faculty
  • How well can an SRN learn them?
  • Examples
  • boys who chase dogs see girls
  • cats chase dogs
  • dogs see boys who cats who mary feeds chase
  • mary walks

11
First experiments
  • Each word is encoded by switching on a single
    unit in the input.
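
A one-unit-per-word encoding of this kind might look as follows. The vocabulary and its ordering are hypothetical, chosen from words in the lecture's example sentences:

```python
def one_hot(word, vocabulary):
    """Return a vector with a single 1 at the word's position in the vocabulary."""
    return [1 if w == word else 0 for w in vocabulary]


vocabulary = ["boys", "cats", "chase", "dogs", "see"]  # hypothetical ordering
print(one_hot("chase", vocabulary))  # [0, 0, 1, 0, 0]
```

Because every word gets its own unit, the encoding itself carries no information about which words are similar. Anything the network learns about word classes has to come from the contexts in which words occur.
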

12
Initial results
  • How can we tell if the net has learned syntax?
  • Check whether it predicts the correct number
    agreement
  • Gets some things right, but makes many mistakes

boys who girl chase see dog
  • Seems not to have learned long-distance
    dependency.

13
Incremental input
  • Elman tried teaching the network in stages
  • Five stages
  • 10,000 simple sentences (x 5)
  • 7,500 simple + 2,500 complex (x 5)
  • 5,000 simple + 5,000 complex (x 5)
  • 2,500 simple + 7,500 complex (x 5)
  • 10,000 complex sentences (x 5)
  • Surprisingly, this training regime led to
    success!
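
The five-stage regime can be written down as a schedule. The sketch below only builds the corpora; the stage sizes and the five passes per stage come from the slide, while the tiny sentence pools, the sampling with replacement and the function name `staged_corpora` are assumptions made for illustration.

```python
import random

# (simple sentences, complex sentences) per stage, as listed on the slide.
STAGES = [(10000, 0), (7500, 2500), (5000, 5000), (2500, 7500), (0, 10000)]
PASSES_PER_STAGE = 5

# Hypothetical pools, taken from the lecture's example sentences.
SIMPLE = ["cats chase dogs", "mary walks"]
COMPLEX = ["boys who chase dogs see girls",
           "dogs see boys who cats who mary feeds chase"]


def staged_corpora(simple_pool, complex_pool, rng):
    """Yield one shuffled corpus per stage, moving from simple to complex."""
    for n_simple, n_complex in STAGES:
        corpus = (rng.choices(simple_pool, k=n_simple)
                  + rng.choices(complex_pool, k=n_complex))
        rng.shuffle(corpus)
        yield corpus


for stage, corpus in enumerate(staged_corpora(SIMPLE, COMPLEX, random.Random(0)), 1):
    n_complex = sum("who" in s.split() for s in corpus)
    # A real run would train on this corpus PASSES_PER_STAGE times before moving on.
    print(stage, len(corpus), n_complex)
```

The corpus size stays fixed at 10,000 sentences throughout; only the proportion of complex sentences changes from stage to stage.
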

14
Is this realistic?
  • Elman reasons that this is in some ways like
    children's behaviour
  • Children seem to learn to produce simple
    sentences first
  • Is this a reasonable suggestion?
  • Where is the incremental input coming from?
  • Developmental schedule appears to be a product of
    changing the input.

15
Another route to incremental learning
  • Rather than the experimenter selecting simple,
    then complex sentences, could the network?
  • Children's data isn't changing: children are
    changing
  • Elman gets the network to change throughout its
    life
  • What is a reasonable way for the network to
    change?
  • One possibility: memory

16
Reducing the attention span of a network
  • Destroy memory by setting context nodes to 0.5
  • Five stages of learning (with both simple and
    complex sentences)
  • Memory blanked every 3-4 words (x 12)
  • Memory blanked every 4-5 words (x 5)
  • Memory blanked every 5-6 words (x 5)
  • Memory blanked every 6-7 words (x 5)
  • No memory limitations (x 5)
  • The network learned the task.
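
The blanking schedule can be sketched as a function that decides where in the word stream the context units get reset. The window sizes and pass counts come from the slide. Choosing each gap at random within the window is an assumption about what "every 3-4 words" means, and the names `MEMORY_STAGES` and `blank_points` are hypothetical.

```python
import random

# (shortest gap, longest gap, passes) per stage; None means memory is never blanked.
MEMORY_STAGES = [(3, 4, 12), (4, 5, 5), (5, 6, 5), (6, 7, 5), (None, None, 5)]


def blank_points(n_words, low, high, rng):
    """Return the word counts after which the context is reset to 0.5."""
    points = []
    if low is None:
        return points  # final stage: no memory limitation
    position = rng.randint(low, high)
    while position < n_words:
        points.append(position)
        position += rng.randint(low, high)
    return points


rng = random.Random(0)
for low, high, passes in MEMORY_STAGES:
    # At each listed point a real run would set every context unit to 0.5.
    print(low, high, passes, blank_points(20, low, high, rng))
```

Unlike the incremental-input regime, the training data here is the same mixture at every stage. What changes is how far back the network can remember.
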

17
Counter-intuitive conclusion: starting small
  • A fully-functioning network cannot learn syntax.
  • A network that is initially limited (but matures)
    learns well.
  • This seems a strange result, suggesting that
    networks aren't good models of language learning
    after all
  • On the other hand:
  • Children mature during learning
  • Infancy in humans is prolonged relative to other
    species
  • Ultimate language ability seems to be related to
    how early learning starts
  • i.e., there is a critical period for language
    acquisition.

18
Next lecture
  • We've seen how we can model aspects of language
    learning in simulations
  • What about evolution?