1
Connectionist Models of Language Development
Grammar and the Lexicon
  • Steve R. Howell
  • McMaster University, 1999

2
Overview
  • Description of Research Plan
  • Explanation of Research Goals
  • Examine the inspiration for this research, in
    both the connectionist and language sub-fields
  • Methods
  • Results (Preliminary)
  • Discussion and Future Directions

3
Overall Research Plan
  • Pursuit of an integrated, multi-level
    connectionist model of language development
  • Multi-level: dealing with several different
    levels or parts of the language task
  • Integrated: non-modular, homogeneous
    functioning throughout the multi-level design

4
Research Goals
  • Better understanding of language development
    process
  • Ability to test different interventions on a
    successful model, instead of on children,
    including possibly lesioning the model
  • Functional language-learning model for AI and
    software (e.g. chatterbots on the net)

5
Connectionist Inspiration
  • Work of Jeff Elman on models of grammar learning
    using Simple Recurrent Networks (SRNs)
  • Work of Landauer et al. on acquisition of
    semantic information (i.e. the lexicon) through
    analysis of many weak word-to-word relations in
    real-world text

6
Language-Domain Inspiration
  • Evidence against a sharp divide in the
    acquisition of the lexicon and grammar (e.g.
    Bates)
  • The lexicon develops first, but grammar
    development overlaps with it, proceeding
    seemingly in step with it
  • Hence the present focus on homogeneous
    mechanisms to explain the two

7
Method
  • Computer Simulation of Connectionist (Neural
    Network) model
  • The base algorithm and structure is Elman's
    (1990) Simple Recurrent Network
  • Modifications include sub-word-level input,
    multi-level architecture, and automated localist
    to distributed representation conversion

8
Diagram of SRN
9
Parts of an Elman SRN
  • Input Layer of Units
  • Larger (usually) Hidden Layer of Units
  • Context Layer: a memory connected to the hidden
    layer
  • Output Layer of units, same size as input layer
  • Uses back-propagation learning algorithm
  • Uses a prediction task to provide a more
    plausible teaching signal
  • Recurrent context units take a copy of the
    hidden units at each time step (sketched below)
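
To make the mechanics concrete, here is a minimal sketch of one SRN training step in Python/NumPy. The layer sizes, sigmoid units, and learning rate are illustrative assumptions, not values from the present simulations.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid = 8, 16                        # illustrative sizes only
    W_ih = rng.normal(0, 0.1, (n_hid, n_in))   # input -> hidden weights
    W_ch = rng.normal(0, 0.1, (n_hid, n_hid))  # context -> hidden weights
    W_ho = rng.normal(0, 0.1, (n_in, n_hid))   # hidden -> output weights
    context = np.zeros(n_hid)                  # context units start at rest
    lr = 0.1                                   # assumed learning rate

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def train_step(x_t, x_next):
        """One prediction-task step: the target is simply the next input."""
        global context, W_ih, W_ch, W_ho
        hidden = sigmoid(W_ih @ x_t + W_ch @ context)
        output = sigmoid(W_ho @ hidden)
        # Plain back-propagation on the one-step error (no unrolling in time)
        err_out = (output - x_next) * output * (1 - output)
        err_hid = (W_ho.T @ err_out) * hidden * (1 - hidden)
        W_ho -= lr * np.outer(err_out, hidden)
        W_ih -= lr * np.outer(err_hid, x_t)
        W_ch -= lr * np.outer(err_hid, context)
        context = hidden.copy()   # context takes a copy of the hidden units
        return output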

10
Modifications: Sub-word Input
  • Triples (Mozer; Wickelgren), or artificial
    phonemes
  • A recently completed simulation demonstrates the
    superiority of triple- or phoneme-level word
    representations over whole-word localist
    representations for grammar learning (phonics?)

11
Representations of Words
  • Localist (Elman, 1990)
    0 0 0 0 0 0 0 1
    0 0 0 1 0 0 0 0
  • Binary Distributed (triples)
    0 1 0 0 1 0 1 1
    1 0 1 1 1 1 1 1
  • Fully Distributed (semantic encoding)
    0.43 0.23 0.03 0.1 0.04
    0.22 0.12 0.04 0.42 0.5
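
The three coding schemes can be illustrated directly in Python/NumPy; the toy vocabulary and the particular vectors below are assumed for the example, not taken from the simulations.

    import numpy as np

    vocab = ["dog", "cat", "chased", "the"]    # assumed toy vocabulary

    # Localist (Elman, 1990): one unit per word, exactly one unit on
    localist = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

    # Binary distributed (triples): several units on per word; a fixed
    # random binary code stands in here for the triple-derived pattern
    rng = np.random.default_rng(0)
    binary_distributed = {w: rng.integers(0, 2, size=8) for w in vocab}

    # Fully distributed (semantic encoding): graded values on all units
    fully_distributed = {w: rng.random(5).round(2) for w in vocab}

    print(localist["dog"])             # [1. 0. 0. 0.]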

12
Route to Multi-level Architecture
  • Elman SRN showed how word co-occurrence
    information could be used to learn word
    relationships (simple grammar)
  • Learning mapped from the previous words (the
    context) to the next word predicted
  • Even with a sub-word distributed representation,
    prediction is still of the next word

13
Elman (1990) Clustering Results
14
Sub-word prediction
  • If we use a sliding window on the input text
    (e.g. five letters for three-letter triples),
    then we are predicting the next triple from the
    previous triples: true sub-word prediction
  • e.g. The dog chased the cat...
  • Time 1 - The_d: The, he_, e_d (predict _do)
  • Time 2 - he_do: he_, e_d, _do
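
A sketch of the sliding-window extraction in Python follows; pairing each window with the next incoming triple as the prediction target is one reading of the scheme above.

    def triples(window):
        """All three-letter triples inside a character window."""
        return [window[i:i + 3] for i in range(len(window) - 2)]

    text = "The_dog_chased_the_cat"   # word boundaries written as '_'
    W = 5                             # a five-letter window holds three triples

    for t in range(len(text) - W - 1):
        window = text[t:t + W]
        target = text[t + 3:t + 6]    # the next triple, to be predicted
        print(f"Time {t + 1}: {window} -> {triples(window)}, predict {target}")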

15
Sub-word Advantages
  • Richer representations, accessing more of the
    data inherent in the text or speech stream
  • Makes prediction/internal representation easier
  • Eliminates the need for artificial
    pre-processing of text into word vectors;
    letters are translated into triple vectors
    automatically

16
Sub-word Disadvantages
  • Cannot output words easily; we have only a
    collection of triples
  • Must stack a clean-up net on top in order to
    reach word representations from the existing
    triple representations
  • Hence the multi-layer approach: combine
    prediction at two time-scales and levels of
    granularity, using the same method (sketched
    below)

17
Multi-layer SRN Diagram
18
Multi-layer SRN
  • Triples/letters level: Input Layer 1, Hidden
    Layer 1, Context Layer 1, Output Layer 1
  • Learns to predict triples/phonemes
  • Word level: Input Layer 2 (= Hidden Layer 1),
    Hidden Layer 2, Context Layer 2, Output Layer 2
  • Predicts words from triples/phonemes
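
A structural sketch of the two-level forward pass in Python/NumPy, assuming, as listed above, that Input Layer 2 is Hidden Layer 1; all layer sizes are placeholders.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    n_triple, n_hid1, n_hid2, n_word = 8, 16, 20, 12   # placeholder sizes
    rng = np.random.default_rng(1)
    init = lambda rows, cols: rng.normal(0, 0.1, (rows, cols))
    W1_ih, W1_ch, W1_ho = init(n_hid1, n_triple), init(n_hid1, n_hid1), init(n_triple, n_hid1)
    W2_ih, W2_ch, W2_ho = init(n_hid2, n_hid1), init(n_hid2, n_hid2), init(n_word, n_hid2)
    ctx1, ctx2 = np.zeros(n_hid1), np.zeros(n_hid2)

    def forward(triple_vec):
        """One time step through both levels of the stacked SRN."""
        global ctx1, ctx2
        # Level 1: predicts the next triple/phoneme from the triple stream
        hid1 = sigmoid(W1_ih @ triple_vec + W1_ch @ ctx1)
        next_triple = sigmoid(W1_ho @ hid1)
        # Level 2: Input Layer 2 = Hidden Layer 1; predicts the next word
        hid2 = sigmoid(W2_ih @ hid1 + W2_ch @ ctx2)
        next_word = sigmoid(W2_ho @ hid2)
        ctx1, ctx2 = hid1.copy(), hid2.copy()   # contexts copy hidden units
        return next_triple, next_word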
