Title: Connectionist Sentence Comprehension and Production System

1. Connectionist Sentence Comprehension and Production System
- A model by Dr. Douglas Rohde, MIT
- Presented by Dave Cooke
- Nov. 6, 2004
2. Overview
- Introduction
- A brief overview of Artificial Neural Networks
- The basic architecture
- Introduce Douglas Rohde's CSCP model
  - Overview
  - Penglish language
  - Architecture
  - Semantic System
  - Comprehension, Prediction, and Production (CPP) System
  - Training
  - Testing
- Conclusions
- Bibliography
3. A Brief Overview
- Basic definition of an Artificial Neural Network:
  - A network of interconnected neurons inspired by the biological nervous system.
  - The function of an Artificial Neural Network is to produce an output pattern from a given input.
- First described by Warren McCulloch and Walter Pitts in 1943 in their seminal paper "A Logical Calculus of the Ideas Immanent in Nervous Activity."
4.
- Artificial neurons are modeled after biological neurons.
- The architecture of an Artificial Neuron (see the sketch below).
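To make the diagram's idea concrete, here is a minimal Python sketch of a single artificial neuron: a weighted sum of the inputs plus a bias, squashed by a sigmoid activation. The inputs and weights are invented illustration values, not anything from the CSCP model.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus a bias,
    passed through a sigmoid activation into the range (0, 1)."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-net))

# Illustrative values only
print(neuron([1.0, 0.5], [0.8, -0.3], bias=0.1))
```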
5. Architecture -- Structure
- Network structure
  - There are many types of neural network structures, e.g., feedforward and recurrent.
- Feedforward
  - Can be single-layered or multi-layered.
  - Inputs are propagated forward, layer by layer, to the output layer (see the sketch below).
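A minimal sketch of forward propagation, assuming sigmoid units: each layer's outputs become the next layer's inputs until the output layer is reached. The toy 2-2-1 network and its weights are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(layers, x):
    """Propagate input x through each layer in turn.
    A layer is a list of (weights, bias) pairs, one per unit."""
    for layer in layers:
        x = [sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)
             for w, b in layer]
    return x

# Toy 2-2-1 feedforward network (invented weights)
net = [
    [([0.5, -0.4], 0.1), ([0.3, 0.8], -0.2)],  # hidden layer
    [([1.0, -1.0], 0.0)],                      # output layer
]
print(forward(net, [1.0, 0.0]))
```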
6. Architecture -- Recurrent NN
- Recurrent Neural Networks
  - Operate on an input space and an internal state space; in other words, they have memory.
- Primary types of recurrent neural networks:
  - simple recurrent
  - fully recurrent
- Below is an example of a simple recurrent network (SRN).
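Since the original diagram is not reproduced here, the following sketch shows the defining move of an Elman-style SRN: the hidden layer sees the current input together with a copy of its own previous activations (the context layer), which is what gives the network memory. All weights are illustrative placeholders.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def srn_step(x, context, W_in, W_ctx, W_out):
    """One time step of an Elman-style SRN: each hidden unit sums
    weighted input and weighted context, then feeds the output layer."""
    hidden = [sigmoid(sum(xi * w for xi, w in zip(x, wi)) +
                      sum(ci * w for ci, w in zip(context, wc)))
              for wi, wc in zip(W_in, W_ctx)]
    output = [sigmoid(sum(hi * w for hi, w in zip(hidden, wo)))
              for wo in W_out]
    return output, hidden  # the new hidden state becomes the next context

# Process a short sequence, carrying the context forward (toy weights)
context = [0.0, 0.0]
for x in [[1.0], [0.0], [1.0]]:
    out, context = srn_step(x, context,
                            W_in=[[0.5], [-0.3]],
                            W_ctx=[[0.2, 0.1], [0.4, -0.2]],
                            W_out=[[1.0, -1.0]])
    print(out)
```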
7. Architecture -- Learning
- Learning used in NNs
  - Learning = a change in connection weights (see the weight-update sketch below).
- Supervised networks: the network is told the correct answer.
  - e.g., backpropagation, backpropagation through time, reinforcement learning
- Unsupervised networks: the network must find structure in the input on its own.
  - e.g., competitive learning, self-organizing (Kohonen) maps
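As a concrete instance of "learning = a change in connection weights," here is the delta rule for a single sigmoid unit: the supervised error signal nudges each weight opposite the gradient of the squared error. This is plain single-unit gradient descent, shown only to fix ideas; it is not the CSCP training procedure.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_unit(samples, lr=0.2, epochs=500):
    """Supervised learning on one sigmoid unit via the delta rule."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            y = sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)
            delta = (y - target) * y * (1.0 - y)  # dE/dnet for E = (y-t)^2/2
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b -= lr * delta
    return w, b

# Learn logical OR (a linearly separable toy task)
print(train_unit([([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]))
```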
8. Architecture -- Learning (BPTT)
- Backpropagation Through Time (BPTT) is used in the CSCP model and in SRNs.
- In BPTT, the network runs ALL of its forward passes and then performs ALL of the backward passes.
- This is equivalent to unrolling the network backwards through time (see the sketch below).
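A sketch of BPTT on the smallest possible recurrent net, one tanh unit with scalar weights, to show the two phases: all forward steps run first and the states are stored, then all backward steps run in reverse, accumulating gradients through the unrolled copies of the shared weights. The sequence and weights are invented.

```python
import math

def bptt_grads(xs, targets, w_x, w_h):
    """BPTT for h[t] = tanh(w_x * x[t] + w_h * h[t-1]) with a
    squared-error loss at every time step."""
    # Phase 1: ALL forward passes, storing every hidden state
    h = [0.0]  # initial state
    for x in xs:
        h.append(math.tanh(w_x * x + w_h * h[-1]))
    # Phase 2: ALL backward passes, from the last step to the first
    g_wx = g_wh = 0.0
    carry = 0.0  # gradient arriving from later time steps
    for t in range(len(xs), 0, -1):
        dh = (h[t] - targets[t - 1]) + carry  # loss term + recurrent term
        da = dh * (1.0 - h[t] ** 2)           # back through tanh
        g_wx += da * xs[t - 1]                # same weight, every unrolled copy
        g_wh += da * h[t - 1]
        carry = da * w_h                      # pass gradient to step t-1
    return g_wx, g_wh

print(bptt_grads([1.0, 0.0, 1.0], [0.5, 0.5, 0.5], w_x=0.3, w_h=0.7))
```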
9. The CSCP Model
- Connectionist Sentence Comprehension and Production model
- Primary goal: learn to comprehend and produce sentences in the Penglish (Pseudo-English) language.
- Secondary goal: construct a model that will account for a wide range of human sentence processing behaviours.
10. Basic Architecture
- A simple recurrent NN is used.
- Penglish (Pseudo-English) was used to train and test the model.
- Consists of 2 separate parts connected by a message layer:
  - Semantic System (encoding/decoding system)
  - CPP system
- Backpropagation Through Time (BPTT) is the learning algorithm.
  - A method for learning temporal tasks.
11. Penglish
- Goal: produce only sentences that are reasonably valid in English.
- Built around the framework of a stochastic context-free grammar (SCFG).
- Given an SCFG, it is easy to generate sentences, parse sentences, and perform optimal prediction (a toy generator is sketched below).
- A subset of English; some grammatical structures used are:
  - 56 verb stems
  - 45 noun stems
  - adjectives, determiners, adverbs, subordinate clauses
  - several types of logical ambiguity
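To show why generation from an SCFG is easy, here is a toy stochastic grammar and sampler. The grammar is a hypothetical fragment invented for this example; the real Penglish grammar is far larger, with its 56 verb stems, 45 noun stems, and ambiguity-producing constructions.

```python
import random

# Hypothetical toy SCFG (NOT the real Penglish grammar):
# each nonterminal maps to (probability, expansion) pairs summing to 1.
GRAMMAR = {
    "S":   [(1.0, ["NP", "VP"])],
    "NP":  [(0.6, ["Det", "N"]), (0.4, ["Det", "Adj", "N"])],
    "VP":  [(0.7, ["V", "NP"]), (0.3, ["V"])],
    "Det": [(0.5, ["the"]), (0.5, ["a"])],
    "Adj": [(0.5, ["new"]), (0.5, ["nice"])],
    "N":   [(0.4, ["teacher"]), (0.3, ["dog"]), (0.3, ["book"])],
    "V":   [(0.5, ["saw"]), (0.5, ["took"])],
}

def generate(symbol="S"):
    """Expand a symbol by sampling one rule according to its probability."""
    if symbol not in GRAMMAR:  # terminal: emit the word itself
        return [symbol]
    r, acc = random.random(), 0.0
    for p, expansion in GRAMMAR[symbol]:
        acc += p
        if r <= acc:
            return [word for s in expansion for word in generate(s)]
    return []

print(" ".join(generate()))  # e.g. "the new dog took a book"
```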
12. Penglish
- Penglish sentences do not always sound entirely natural, even though constraints to avoid semantic violations were implemented.
- Example sentences:
  - (1) We had played a trumpet for you.
  - (2) A answer involves a nice school.
  - (3) The new teacher gave me a new book of baseball.
  - (4) Houses have had something the mother has forgotten.
13. The CSCP Model
[Figure: overall CSCP architecture. The Semantic System, which stores all propositions seen for the current sentence, is connected to the CPP System.]
14. Semantic System
[Figure: the Semantic System. Propositions are loaded sequentially and stored in memory.]
15. Semantic System
[Figure: the Semantic System, showing where the error measure is taken.]
16. Training (SS)
- Backpropagation
- The Semantic System is trained separately from, and prior to, the rest of the model.
- The decoder uses standard single-step backpropagation.
- The encoder is trained using BPTT.
- The majority of the running time is in the decoding stage.
17. Training (SS)
[Figure: Semantic System training, showing where the error is assessed.]
18. CPP System
[Figure: the CPP System, showing the phonologically encoded word input and where the error measure is taken.]
19. CPP System (cont.)
[Figure: the CPP System in operation. Processing starts by trying to predict the next word in the sentence; the goal is to produce the next word and pass it to the Word Input layer.]
20. The CPP System - Training
[Figure: CPP System training. (1) BPTT starts at the output; (2) the error is backpropagated through the network; (3) previously recorded output errors are injected; (4) BPTT continues backward through time.]
21. Training
- 16 Penglish training sets
  - Each set: 250,000 sentences; 4 million sentences in total.
  - 50,000 weight updates per set = 1 epoch, for a total of 16 epochs.
- The learning rate started at 0.2 for the first epoch and was then gradually reduced over the course of learning.
- After the Semantic System, the CPP system was similarly trained.
- Training began with sentences of limited complexity, and complexity was increased gradually (a schedule sketch follows below).
- Training a single network took about 2 days on a 500 MHz Alpha; total training time was about two months.
- Overall, 3 networks were trained.
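A sketch of the regimen described above: 16 sets, 50,000 weight updates per set, a learning rate starting at 0.2, and sentence complexity raised over the course of training. The specific decay curve and complexity staging below are assumptions; the slides only say both changed "gradually."

```python
def training_schedule(num_sets=16, updates_per_set=50_000, lr0=0.2):
    """Print an assumed CSCP-style schedule: one epoch per Penglish set,
    with a decaying learning rate and rising sentence complexity."""
    for epoch in range(num_sets):
        lr = lr0 / (1 + epoch)                     # assumed decay
        complexity = min(1.0, 0.3 + 0.05 * epoch)  # assumed staging
        print(f"epoch {epoch + 1:2d}: {updates_per_set} updates, "
              f"lr = {lr:.3f}, max complexity = {complexity:.2f}")
        # here: sample sentences up to `complexity`, run BPTT,
        # and apply each update with rate `lr`

training_schedule()
```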
22. Testing
- 50,000 sentences
- 33.8% of the testing sentences also appeared in one of the training sets.
- Nearly all of the sentences had 1 or 2 propositions.
- 3 forms of measurement are used in measuring comprehension:
  - multiple-choice measure
  - reading-time measure
  - grammaticality-rating measure
23. Testing (Multiple Choice)
- Example: "When the owner let go, the dog ran after the mailman."
  - Expressed as (ran after, theme, ?)
- Possible answers:
  - mailman (correct answer)
  - owner, dog, girls, cats (distractors)
- When applying four distractors, chance performance is 20% correct (a sketch of the idea follows below).
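The slides do not spell out the mechanics of this measure, but the idea can be sketched as follows: the model's answer for the queried role is compared against the correct filler plus four distractors, so a model that guesses is right 1 time in 5, i.e. 20%. The scores below are hypothetical.

```python
def multiple_choice(model_scores, correct, distractors):
    """Return True if the model's highest-scoring candidate for the
    queried role is the correct filler. With 4 distractors, chance
    performance is 1/5 = 20% correct."""
    candidates = [correct] + distractors
    best = max(candidates, key=lambda c: model_scores.get(c, 0.0))
    return best == correct

# Hypothetical scores for the query (ran after, theme, ?)
scores = {"mailman": 0.81, "owner": 0.05, "dog": 0.09,
          "girls": 0.02, "cats": 0.03}
print(multiple_choice(scores, "mailman", ["owner", "dog", "girls", "cats"]))
```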
24. Testing (Reading Time)
- Also known as simulated reading time.
- It's a weighted average of 4 components:
  - 1st and 2nd: measure the degree to which the current word was expected.
  - 3rd: the change in the message that occurred when the current word was read.
  - 4th: the average level of activation in the message layer.
- The four components are multiplied by scaling factors to achieve average values close to 1.0 for each of them, and a weighted average is then taken (see the sketch below).
- Ranges from 0.4 for easy words to 2.5 or more for very hard words.
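A sketch of that computation: each raw component is multiplied by a scaling factor chosen so that its average is close to 1.0, and a weighted average of the scaled values is taken. All numbers below (component values, scales, and weights) are placeholders, not Rohde's fitted values.

```python
def reading_time(components, scales, weights):
    """Simulated reading time: scale each component toward an average
    of ~1.0, then take a weighted average of the scaled values."""
    scaled = [c * s for c, s in zip(components, scales)]
    return sum(w * v for w, v in zip(weights, scaled)) / sum(weights)

# Placeholder values for the four components:
# (expectedness 1, expectedness 2, message change, message activation)
print(reading_time([0.02, 0.15, 3.1, 0.4],
                   scales=[40.0, 6.0, 0.33, 2.5],
                   weights=[1.0, 1.0, 1.0, 1.0]))
```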
25. Testing (Grammaticality)
- The Grammaticality Method
  - (1) Prediction error (PE)
    - An indicator of syntactic complexity.
    - Based on the point in the sentence at which the worst two consecutive predictions occur.
  - (2) Comprehension error (CE)
    - The average strict-criterion comprehension error rate on the sentence.
    - Intended to reflect the degree to which the sentence makes sense.
- Simulated ungrammaticality rating (SUR)
  - SUR = (PE + 8) x (CE + 0.5)
  - Combines the two components into a single measure of ungrammaticality (see the sketch below).
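The formula above as a one-line function. The constants 8 and 0.5 are from the slide, but the "+" placements are an assumption about the intended operators.

```python
def simulated_ungrammaticality(pe, ce):
    """SUR = (PE + 8) * (CE + 0.5); the '+' operators are assumed,
    while the constants 8 and 0.5 come from the slide."""
    return (pe + 8) * (ce + 0.5)

print(simulated_ungrammaticality(pe=2.0, ce=0.1))
```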
26. Conclusions
- General comprehension results
  - The final networks are able to provide complete, accurate answers:
    - Given NO choices: 77%
    - Given 5 choices: 92%
- Sentential complement ambiguity
  - Strict-criterion error rate: 13.5%
  - Multiple choice: 2%
- Subordinate clause ambiguity
  - Ex. "Although the teacher saw a book was taken in the school."
  - The intransitive, weak-bad, weak-good, strong-bad, and strong-good conditions all had under 20% error rates on multiple-choice questions.
27. Bibliography
- Luger, G.F., Artificial Intelligence, 4th ed., Addison Wesley, 2002
- Russell, S. and Norvig, P., Artificial Intelligence, 2nd ed., Prentice Hall, 2003
- Picton, P., Neural Networks, 2nd ed., Palgrave, 2000
- Rohde, D., A Connectionist Model of Sentence Comprehension and Production, MIT, March 2, 2002
- Elman, J.L., "Finding Structure in Time", Cognitive Science, 14, 179-211, 1990
- Fausett, L., Fundamentals of Neural Networks, Pearson, 1994