Title: Syntactic and Semantic Systematicity in Connectionist Models of Sentence Processing


1
Syntactic and Semantic Systematicity in
Connectionist Models of Sentence Processing
  • Stefan Frank
  • Nijmegen Institute for Cognition and Information

2
Systematicity in language
Understood: Would you like some milk in your
coffee? Do you want sugar in your tea?
Not understood: Would you like some sugar in your
coffee? Do you want milk in your tea?
3
Systematicity and connectionism: Fodor & Pylyshyn
(Cognition, 1988)
  • Systematicity requires combinatorial
    representations
  • Neural networks do not have a combinatorial
    syntax/semantics
  • So, they cannot display (let alone explain)
    systematicity
  • Connectionism will never result in viable models
    of human cognition

4
Systematicity and connectionism: Hadley (Mind &
Language, 1994)
Systematicity: knowing sentence X → knowing new
sentence Y
Generalisation: training on input X → ability to
process new input Y
The extent to which a network is systematic is
apparent in its ability to generalise.
5
Systematicity and connectionism: Hadley (Mind &
Language, 1994)
  • Alleged demonstrations of connectionist
    systematicity only show minimal generalisation
    (many training examples, few test items, and a
    small difference between training and test inputs).
  • Syntactic systematicity: decide whether the novel
    word string is a sentence.
  • Semantic systematicity: assign a semantic
    representation to the novel sentence.

6
Connectionist sentence processing
  • Simple Recurrent Network (SRN): feedforward
    network with connections from the hidden layer to
    itself.
  • SRNs have not only long-term memory (LTM) but
    also short-term memory (STM).

[Figure: network diagram, input layer → recurrent hidden layer → hidden layer → output layer]
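The SRN on this slide can be sketched in a few lines of code. This is a hypothetical illustration, not the original implementation: the weight initialisation, activation function, and all names here are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SRN:
    """Simple Recurrent Network: the hidden layer receives the current
    word plus its own previous activation (the short-term memory)."""

    def __init__(self, n_words, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_words))    # input -> hidden (LTM)
        self.W_rec = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # hidden -> hidden (recurrent)
        self.W_out = rng.normal(0.0, 0.1, (n_words, n_hidden))   # hidden -> output (LTM)
        self.h = np.zeros(n_hidden)                              # short-term memory (STM)

    def step(self, word_index):
        """Process one word; return a probability distribution over next words."""
        x = np.zeros(self.W_in.shape[1])
        x[word_index] = 1.0                                      # one-hot word input
        self.h = sigmoid(self.W_in @ x + self.W_rec @ self.h)    # update STM
        y = np.exp(self.W_out @ self.h)
        return y / y.sum()                                       # softmax over the lexicon
```

Training (not shown) would adjust the three weight matrices; the recurrent connections are what give the network its memory for the sentence so far.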
7
Connectionist sentence processing
  • Simple Recurrent Network (SRN): feedforward
    network with connections from the hidden layer to
    itself.
  • Training examples are grammatical sentences →
    the network cannot learn to make grammaticality
    judgements.
  • Next-word prediction: after processing words
    w1, …, wt of the input sentence, the network is
    trained to predict wt+1.
  • (e.g., Elman, 1990, 1991, 1993; Servan-Schreiber
    et al., 1991; Christiansen & Chater, 1994, 1999;
    Rohde & Plaut, 1999; Tabor & Tanenhaus, 1999;
    MacDonald & Christiansen, 2002; Van der Velde et
    al., 2004)
  • If the network makes correct predictions for new
    inputs, it must have acquired knowledge of the
    grammar.
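The next-word prediction regime can be made concrete with a small helper that turns a sentence into training pairs (illustrative code, not from the cited papers):

```python
def prediction_pairs(sentence):
    """Split a sentence into (context, target) pairs: after processing
    words w1, ..., wt the network is trained to predict w(t+1)."""
    words = sentence.split()
    return [(words[:t], words[t]) for t in range(1, len(words))]
```

For "boy sees girl ." this yields three pairs, the last one predicting the end-of-sentence marker.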

8
Testing for syntactic systematicity
  • How would we test for syntactic systematicity in
    human subjects using next-word prediction?
  • Give (the first words of) a new sentence, for
    instance "The averages smell …", and ask what could
    come next.
  • Correct answers: bricks, daily, some, the, ., …
    Incorrect answers: eat, see, …
  • If the subjects give only correct answers, they
    seem to have no problem with the new sentence:
    systematicity. But if they give many incorrect
    answers, the new sentence is just a string of
    words to them: no systematicity.

9
Testing for syntactic systematicity
  • When testing networks, the wrong questions are
    often asked:
  • Are correct next-word probabilities predicted?
  • Are all possible words (all correct word classes)
    predicted?
  • We don't expect human subjects to answer such
    questions.
  • Van der Velde et al. (2004): does the network
    avoid incorrect next-word predictions on novel
    inputs?

10
Network training and testing: Van der Velde et al.
(Connection Science, 2004)
  • An SRN processed a minilanguage with:
  • 18 words (boy, girl, …, loves, sees, …, who, .)
  • 3 sentence types:
  • simple: N V N . (boy sees girl.)
  • right-branching: N V N who V N . (boy sees girl
    who loves boy.)
  • center-embedded: N who N V V N . (boy who girl
    sees loves boy.)
  • Nouns and verbs were divided into four groups;
    each had two nouns and two verbs.
  • In training sentences, nouns and verbs were from
    the same group: < 0.44% of possible sentences were
    used for training.
  • In test sentences, nouns and verbs came from
    different groups.

11
Results and conclusions: Van der Velde et al.
(Connection Science, 2004)
  • The network failed on test sentences, so they
    do not generalise to structurally similar
    sentences.
  • But:
  • what does it mean to "fail"? Maybe the network
    displayed some systematicity.
  • was the language complex enough? With more word
    types there is more reason to abstract to
    syntactic classes.
  • was the size of the network appropriate?
  • larger recurrent layer → more STM → better
    processing?
  • smaller recurrent layer → less LTM → better
    generalisation?

12
Syntactic systematicity revisited: Frank
(Connection Science, 2006)
  • Replicating Van der Velde et al., but
  • Using a measure of generalisation performance
    rather than just prediction performance
  • Varying language size (number of word types)
  • Varying recurrent-layer size (number of units)
  • Varying LTM/STM ratio

13
Rating generalisation
  • Baseline: output of a (hypothetical) perfectly
    trained but non-generalising network.
  • The best such a network can do is use the last
    word of the test input that also appeared in the
    training input.
  • Generalisation scores:
  • score +1: the network never makes ungrammatical
    predictions
  • score 0: the network does not generalise, but gives
    the best possible output based on the last word
  • score −1: the network only makes ungrammatical
    predictions
  • A positive generalisation score (at each word)
    indicates systematicity.
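The three anchor points (+1, 0, −1) suggest a piecewise-linear mapping from the probability mass a network assigns to grammatical continuations. The sketch below is one way to operationalise such a score; it is an assumption, not necessarily the exact formula used in Frank (2006):

```python
def generalisation_score(p_grammatical, p_baseline):
    """Scale the probability mass on grammatical next words to [-1, +1].

    +1: no ungrammatical predictions at all,
     0: exactly the non-generalising baseline,
    -1: only ungrammatical predictions.
    Assumes 0 < p_baseline < 1.
    """
    if p_grammatical >= p_baseline:
        return (p_grammatical - p_baseline) / (1.0 - p_baseline)
    return (p_grammatical - p_baseline) / p_baseline
```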

14
Network architecture
output layer: w units (one for each word, w = 18, …, 42)
hidden layer: 10 units
recurrent hidden layer: n units (n = 10, …, 100)
input layer: w units (one for each word, w = 18, …, 42)
15
Results
[Figure: generalisation scores]
Positive generalisation at each word of each test
sentence type, so there is some systematicity.
16
Results: effect of lexicon size
[Figure: generalisation scores for sentence types N V N,
N V N who V N, and N who N V V N]
Larger lexicon leads to improved generalisation
(even though a smaller percentage of possible
sentences is used for training).
17
Results: effect of recurrent layer size
[Figure: generalisation scores for sentence types N V N,
N V N who V N, and N who N V V N]
Larger networks (n = 40) do better, but very
large ones (n = 100) overfit.
18
SRN generalisation and memory
  • SRNs do show systematicity to some extent.
  • But generalisation remains limited:
  • small n → limited processing capacity (STM)
  • large n → large LTM → overfitting
  • How to combine large STM with small LTM?

19
Echo State Networks: Jaeger (Advances in NIPS, 2003)
  • Keep the connections to and within the recurrent
    layer fixed at random values.
  • The recurrent layer becomes a dynamical
    reservoir: a task-unspecific STM for the input
    sequence.
  • Some constraints on the dynamical reservoir:
  • large enough
  • sparsely connected (here: 15%)
  • suitable spectral radius of the weight matrix (here:
    0.7)
  • LTM capacity:
  • in SRNs: O(n²)
  • in ESNs: O(n)
  • Can ESNs successfully combine large STM and
    small LTM?
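Constructing such a reservoir is straightforward. The sketch below follows the constraints listed on this slide (15% connectivity, spectral radius 0.7), though the weight distribution and scaling details are assumptions:

```python
import numpy as np

def make_reservoir(n, connectivity=0.15, spectral_radius=0.7, seed=0):
    """Fixed random recurrent weights for an Echo State Network.

    Only the output weights of an ESN are trained; this matrix is
    generated once and never changed.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 1.0, (n, n))
    W *= rng.random((n, n)) < connectivity         # keep ~15% of the connections
    radius = np.max(np.abs(np.linalg.eigvals(W)))  # current spectral radius
    return W * (spectral_radius / radius)          # rescale to the target radius
```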

20
Network architecture
output layer: w units (one for each word, w = 18, …, 42)
hidden layer: 10 units
recurrent hidden layer: n units (n = 10, …, 100)
input layer: w units (one for each word, w = 18, …, 42)
21
Results
[Figure: generalisation scores]
Positive performance at each word of each test
sentence type, so there is some systematicity
(but less than in an SRN of the same size).
22
Results: effect of recurrent layer size
[Figure: generalisation scores for sentence types N V N,
N V N who V N, and N who N V V N]
Bigger is better: no overfitting (even when n = 1530).
23
Results: effect of lexicon size
[Figure: generalisation scores for sentence types N V N,
N V N who V N, and N who N V V N]
Larger lexicon leads to improved generalisation
(even though a smaller percentage of possible
sentences is used for training).
24
Conclusions: syntactic systematicity
  • Generalisation scores at all points of test
    sentences are significantly larger than zero.
  • Both SRNs and ESNs can be syntactically
    systematic
  • Even with few training sentences and many test
    sentences
  • By doing less training, the network can learn
    more
  • Training fewer connections gives better results
  • Training a smaller fraction of possible sentences
    gives better results

25
Semantic systematicity
  • People can assign a semantic representation to
    most sentences that are new to them
  • Usually, this is the representation intended by
    the speaker/writer
  • How to account for this semantic systematicity?

26
Connectionist semantic systematicity: Frank &
Haselager (Proceedings of the CogSci Conference, 2006)
  • An ESN transforms sentences into their semantic
    representations.
  • Trained on a fraction of the possible sentences:
    particular sentence structures and semantics are
    withheld.
  • It can correctly process these untrained sentences.

27
Levels of representation
  • The result of text comprehension is a mental
    representation of the described situation (Zwaan
    & Radvansky, Psych. Bull., 1994):
  • not linguistic
  • no predicate-argument structure
  • involves the reader's/listener's knowledge of and
    experience with the world
  • Sentences: The sun is shining. The sky is blue.
  • Propositions: SHINE(SUN), BLUE(SKY)
  • Situation: [image]

28
The microworld
  • Characters: Bob, Jilly
  • Basic events, among others:
  • Bob/Jilly is outside
  • Bob and Jilly play soccer/hide-and-seek
  • Bob/Jilly plays a computer game
  • Bob/Jilly plays with the dog
  • Bob/Jilly wins
  • Complex events, e.g.:
  • ¬(Bob outside) = Bob is inside
  • (Bob and Jilly play soccer) ∧ (Jilly wins) =
    Jilly wins at soccer
  • (Bob wins) ∨ (Jilly wins) = Someone wins
  • Constraints, e.g.:
  • no winning when playing with the dog
  • computer games are only played inside
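The complex events above are plain Boolean combinations of basic events. A toy illustration (the event names are invented for the example):

```python
# One microworld situation, as truth values of basic events (illustrative)
situation = {
    "bob_outside": False,
    "play_soccer": True,
    "play_with_dog": False,
    "bob_wins": False,
    "jilly_wins": True,
}

bob_inside = not situation["bob_outside"]                                     # negation
jilly_wins_at_soccer = situation["play_soccer"] and situation["jilly_wins"]  # conjunction
someone_wins = situation["bob_wins"] or situation["jilly_wins"]              # disjunction

# Constraint check: no winning when playing with the dog
consistent = not (situation["play_with_dog"] and someone_wins)
```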

29
Situational representations
  • The Distributed Situation Space model of
    inference during story comprehension (Frank et
    al., Cognitive Science, 2003) represents story
    situations as vectors in situation space.
  • Situation space is self-organised by training on
    a large number of situations occurring in the
    microworld.
  • Similarities among vectors reflect similarities
    among represented situations:
  • playing soccer and being outside often
    co-occur → similar vectors
  • playing soccer and playing hide-and-seek → dissimilar
    vectors
  • Belief value of some event p in situation X:
  • estimated probability that event p occurs in
    situation X
  • can be computed from the vector representations of p
    and X

30
Microworld and microlanguage
  • Microworld situations can be described by 3,558
    possible sentences from a 20-word microlanguage,
    e.g.:

jilly plays soccer outside
bob and jilly play with dog
someone loses to jilly
jilly or bob wins inside
jilly beats bob at hide-and-seek
bob loses at game outside
31
Network training
  • Input: microlanguage sentences, e.g., jilly wins
    at soccer outside
  • Target output: the vector for Jilly wins ∧ soccer ∧
    Jilly is outside
  • The network is not trained on the 2,288 (> 64%)
    sentences in which:
  • (1) anyone beats bob or anyone loses to jilly
  • (2) both dog and inside occur, or both
    hide-and-seek and outside occur
  • So it is not trained to understand sentences:
  • (1) that use a transitive verb to describe that
    bob loses (= jilly wins)
  • (2) in which the dog is inside, or hide-and-seek
    is played outside
  • Understanding test sentences:
  • (1) requires generalisation to new sentences
  • (2) requires generalisation to new sentences and
    new situations

32
Measuring comprehension
  • After processing jilly wins at soccer outside:
  • compute the belief values of Jilly wins, soccer,
    and Jilly is outside in the network's output
    vector
  • if these are larger than the prior probabilities
    of these events, the sentence was understood to
    some extent
  • Comprehension score (between −1 and +1): the increase
    in belief value of the stated event relative to the
    event's prior probability
  • Negative comprehension score → belief value <
    prior probability → error: the sentence was
    misunderstood
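The two quantities on this slide can be sketched as follows. The belief-value formula is one common way to compute it from situation-space vectors, and the score's exact scaling is likewise an assumption rather than a quotation of the model:

```python
import numpy as np

def belief(p_vec, x_vec):
    """Belief value of event p in situation x: the estimated probability
    that p occurs in x, computed from their situation-space vectors."""
    return float(p_vec @ x_vec) / float(x_vec.sum())

def comprehension_score(belief_after, prior):
    """Increase in the stated event's belief value relative to its prior,
    scaled to [-1, +1]; a negative score means the sentence was misunderstood."""
    if belief_after >= prior:
        return (belief_after - prior) / (1.0 - prior)
    return (belief_after - prior) / prior
```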

33
Results
New sentences, old situations
34
Results
New sentences, new situations
35
Conclusions: semantic systematicity
  • Comprehension scores are significantly larger
    than zero and error rates are very low.
  • The network learned how new combinations of known
    words refer to (new) combinations of known
    microworld events.
  • It displays semantic systematicity

36
Open questions
  • Upscaling: will it work with languages of more
    realistic size? → work in progress
  • How does it work? → look at hidden
    representations
  • Can connectionism explain systematicity?
  • No, because neural networks do not need to be
    systematic.
  • Yes, because they need to adapt to systematicity
    in the training input.
  • Where does systematic cognition come from? Not
    from the cognitive system, but from systematicity
    in the world and in language.