Title: Connectionism
1. Connectionism
2. Neurally inspired computation
- Neurons integrate information.
- Neurons indicate their level of input.
- Brain structure is layered.
- A neuron's influence on another depends on the strength of their connection.
- Learning involves changing the strength of connections between neurons.
3. Features & Principles
- Massively parallel processing
- Active representation
  - representations are directly involved in processing
- Implicit knowledge in connections; learning through adjusting them
- Initial architecture constrains learning
- Distributed representations permit graceful degradation
- Memory access by content
4. Nature/nurture
- Connectionism = neo-behaviorism?
- Assumptions about what is innate
  - decisions about network architecture, learning rules, and activation rules (and potentially input/output representations)
- Assumptions about the structure of the environment
  - decisions about the contents, order, and frequency of items in the training set
5. Localist representations
- A node dedicated to each meaningful representation
  - e.g., word nodes, sound nodes, feature nodes
- Language examples with only localist representations
  - McClelland & Rumelhart (1981) for visual word recognition
  - McClelland & Elman (1986) for spoken word recognition
  - Dell (1986) for word production/speech errors
6. Distributed representations
- Less information built in
- More opportunity to see how the task and input shape representations, but analysis may be difficult
- Learning required
- Various types of architectures
- Language examples
  - Elman (1990, 1991) and St. John & McClelland (1990) for sentence comprehension
  - Plaut et al. (1996) for word recognition
  - Dell, Juliano, & Govindjee (1993) for word production
7. Linear separability
- "X AND Y" is linearly separable; "exclusively X or Y" (XOR) is not.
- For XOR, there is no way to draw a straight line through the 2-dimensional input space to group the outputs.
- XOR cannot be computed with perceptrons.
[Figure: the four inputs from (0,0) to (1,1) plotted in the plane; a single straight line separates the AND outputs, but no line separates the XOR outputs]
8. Linear threshold XOR network
- Activation rule: if Σ_j (w_ij · o_j) > 0, then a_i = 1, else a_i = 0 (no spreading inhibition)
  - w_ij = connection weight between units i and j
  - o_j = output of unit j
  - a_i = activation of unit i
- Adapted from Rumelhart et al. (1986); a hand-wired sketch follows.
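To make the activation rule concrete, here is a minimal hand-wired sketch in Python. The weight and threshold values are illustrative choices, not values from Rumelhart et al.; each threshold is treated as a weight from an always-on bias unit so the "fire if net input > 0" rule applies unchanged.

```python
# Hand-wired linear threshold network for XOR: a hidden unit detects the
# (1,1) case and inhibits the output unit. A unit fires (activation 1)
# when its summed input exceeds 0, as in the activation rule above.
def step(net):
    return 1 if net > 0 else 0

def xor_net(x1, x2):
    h = step(1.0 * x1 + 1.0 * x2 - 1.5)            # on only for input (1,1)
    y = step(1.0 * x1 + 1.0 * x2 - 2.0 * h - 0.5)  # OR, minus the AND case
    return y

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))       # outputs 0, 1, 1, 0
```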
9. Input-to-hidden layer connections
- Make items that yield similar outputs more similar to each other, and items with different outputs less similar.
10. Multilayer networks
- An extra layer is needed to solve complex mappings such as XOR, in which similar inputs don't always correspond to similar outputs
- A hidden layer sits between input and output
- The hidden layer is smaller than the input and output layers, so it must find an efficient way to compress information
- Use back-propagation of error to train the weights (a minimal sketch follows)
- Multilayer networks are as powerful as Turing machines.
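As a concrete illustration of back-propagation on the XOR mapping, here is a minimal NumPy sketch. The layer sizes, learning rate, epoch count, and random seed are arbitrary choices; with very few hidden units, training can occasionally stall in a poor local minimum.

```python
import numpy as np

# Minimal back-propagation on XOR: a 2-4-1 sigmoid network trained by
# batch gradient descent. All hyperparameters are illustrative.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # input  -> hidden weights
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # hidden -> output weights
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for _ in range(10000):
    H = sigmoid(X @ W1 + b1)          # forward: hidden activations
    Y = sigmoid(H @ W2 + b2)          # forward: output activations
    dY = (Y - T) * Y * (1 - Y)        # error derivative at the output
    dH = (dY @ W2.T) * H * (1 - H)    # error propagated back to the hidden layer
    W2 -= lr * H.T @ dY; b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(axis=0)

print(np.round(Y.ravel(), 2))         # typically close to [0, 1, 1, 0]
```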
11. Supervised learning isn't always easy
[Figure: Object 1 and Object 2 are shown; the task is to predict the output for each]
12. Supervised learning isn't always easy
13. Supervised learning isn't always easy
- If Object 1 is edible to a vegetarian, you get an ant; otherwise you get a lamp.
14. Example from reading aloud
15. GPC rules
- Grapheme-to-Phoneme Correspondence rules: spelling-to-sound rules
  - E → /E/, A → /æ/, but EA → /I/
  - BED, BAD, BEAD
- Allow pronunciation of novel words or nonwords: FLORP
- The learner already knows the sound-meaning relationship
  - for sound-based orthographies, a good strategy would be to go from spelling to sound to meaning (a toy rule applier follows)
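A toy GPC rule applier might look like the sketch below. The longest-match strategy (so that EA overrides E) and the tiny rule table are assumptions for illustration, not an actual proposed rule set.

```python
# Toy GPC rule application: longer graphemes are tried before shorter
# ones, so EA -> /I/ wins over E -> /E/. The rules are an illustrative set.
RULES = {"EA": "I", "E": "E", "A": "æ", "B": "b", "D": "d", "F": "f",
         "L": "l", "O": "o", "R": "r", "P": "p"}

def gpc(word):
    phonemes, i = [], 0
    while i < len(word):
        for size in (2, 1):            # longest grapheme first
            grapheme = word[i:i + size]
            if grapheme in RULES:
                phonemes.append(RULES[grapheme])
                i += size
                break
        else:
            i += 1                     # skip letters with no rule
    return "/" + "".join(phonemes) + "/"

print(gpc("BED"), gpc("BAD"), gpc("BEAD"), gpc("FLORP"))  # /bEd/ /bæd/ /bId/ /florp/
```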
16. Dual route model
- Via sound (mediated, assembled)
  - Convert letters to phonemes and use the phonological form to find meaning
  - Phonics: sounding out words
  - Shows effects associated with spelling-sound correspondences
    - regularity effects
- Direct (lexical)
  - From the whole visual word form to meaning and phonological form
  - Need experience with a word to know its pronunciation (AISLE, PREFACE)
  - Whole-word reading method
  - Shows effects associated with whole words
    - word frequency
    - semantic priming
17. Rough sketch of a dual route model (Coltheart, Curtis, Atkins, & Haller, 1993)
[Diagram: letter detectors feed two routes. The route for irregular words runs from visual word detectors through the semantic system to the phonological output lexicon; the route for regular words runs through the GPC rule system. Both routes converge on phonemes.]
18. Frequency by regularity interaction
[Graph: response time (ms, roughly 500-600) as a function of word frequency (high vs. low) for regular and irregular words; irregular words show a much larger frequency effect. Sources: Coltheart (1978, 1985); Marshall & Newcombe (1973); Morton & Patterson (1980); Paap & Noel (1991)]
19. Patterns of dyslexia
- Phonological
  - Can read high-frequency words, but have trouble with uncommon words
  - Can't read aloud pronounceable nonwords
    - SLORF
- Surface
  - Can read and sound out regular words and nonwords, but not irregular words
  - Regularize irregular words
    - e.g., say PINT so it rhymes with MINT
20. Seidenberg & McClelland (1989)
- Learning: weights between units start out random; weight adjustment is scaled by the frequency of the word.
- Results: the model learns the correct pronunciation of ~3000 single-syllable words.
- Unit activations may be close to 0 or 1, but not exact; the distance from the target is the error score.
- The model's analog of human naming latency is the phonological error score.
[Diagram: the SM89 framework, in which orthography (input MAKE) maps to phonology (output /mAk/) via hidden units, with meaning and context units also shown]
21. Implemented SM89 model
- Learns to activate phonological units given orthographic ones (no meaning units)
  - the distance between the model's activation levels and the target activations is the error measure, the analog of RT (see the sketch below)
- No word units or explicit GPC rules
- Showed the frequency by regularity interaction
- Showed pronunciation priming (TINT after MINT/PINT) and repetition priming
- Could pronounce some nonwords
- Showed performance during learning like kids learning to read
- Suggested degrees of spelling-sound regularity
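A sketch of that error measure: the distance between the obtained and target phonological patterns, standing in for naming latency. The activation values below are hypothetical, and summed squared distance is just one reasonable choice of metric.

```python
import numpy as np

def phonological_error_score(output, target):
    # Summed squared distance between obtained and target activations;
    # larger scores stand in for slower naming latencies.
    output, target = np.asarray(output), np.asarray(target)
    return float(np.sum((output - target) ** 2))

# Hypothetical trained outputs: close to 0/1 but never exact.
print(phonological_error_score([0.9, 0.1, 0.8], [1, 0, 1]))  # 0.06
```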
22. Regularity
- Regular: CODE, BIRD
- Regular inconsistent: CONE, SHONE, BONE; GAVE, PAVE, SAVE
- Ambiguous: WIND, LEAD
- Regular nonword: GLIP
- Inconsistent nonword: MAVE, NUST
- Pseudohomophone: BURD
- Irregular: NONE, GONE, DONE, HAVE, PINT
- Unique/strange: SOAP, AISLE, FUGUE
23. Regularity versus consistency
- Consistency effect
  - Naming time for regular inconsistent words (e.g., SAND, whose neighbor WAND is irregular) is longer than for regular consistent words (e.g., WEEK) (Glushko, 1979)
- Consistency is a statistical property of words and is highly variable across words in the English language.
24. Model performance
- Frequency by regularity interaction
[Graph: phonological error score (roughly 2-5) as a function of word frequency (high vs. low) for regular, regular inconsistent, and irregular words; irregular words show the largest frequency effect]
25. Neighborhoods
- Lexical neighbors
  - words spelled similarly to the target
  - DOG has neighbors LOG, BOG, DOE, DIG, etc.
- Jared, McRae, & Seidenberg (1990)
  - Friends are neighbors that share the target's spelling-to-sound correspondence; enemies do not.
  - The size of the consistency effect depends on the summed frequency of friends versus enemies (see the sketch below).
  - Higher-frequency friends, smaller consistency effect.
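A toy version of that computation; the words, the frequency counts, and the ratio used to summarize friends versus enemies are hypothetical illustrations.

```python
# Friends share the target's spelling-sound mapping; enemies do not.
# Hypothetical -INT neighborhood for the target MINT, with made-up counts.
friends = {"tint": 30, "hint": 90, "lint": 10}   # -int rhymes with MINT
enemies = {"pint": 50}                           # -int rhymes with PINT

f = sum(friends.values())
e = sum(enemies.values())
consistency = f / (f + e)     # nearer 1.0 -> smaller consistency effect
print(round(consistency, 2))  # 0.72
```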
26. Parallel distributed processing approach to word naming
- PDP models excel at extracting statistical relations between input and output patterns.
- The learning process (e.g., back-propagation) is sensitive to the idiosyncratic characteristics of words
  - SAND is more consistent than PINT, which is more consistent than AISLE
27. Sublexical units
- In a symbolic model
  - each new grouping requires its own level of representation
    - syllable
    - morpheme
    - bigram
- In a connectionist model
  - sublexical representations may emerge without being built in
    - syllables in phonological representations, from frequently used groups of phonemes
    - morphemes, from similarity in form and meaning
28. Deep dyslexia
- Mostly semantic errors in reading
  - Semantic: NIGHT → "sleep"
  - Visual: SCANDAL → "sandals"
  - Visual + semantic: SHIRT → "skirt"
  - Visual then semantic: SYMPATHY → (symphony) → "orchestra"
- Symbolic models need to assume that multiple modules are damaged
  - visual and semantic, but there is no principled reason for that combination to co-occur frequently
29. Attractors
- In models with recurrent connections, activation levels eventually stop changing
  - they reach a stable state, an attractor
  - a state in which many constraints are satisfied
- Connection weights are trained to drive activation levels to a stable state
  - they create a landscape in which the input determines the starting point, and the activation pattern is a rolling ball that settles at the lowest point
- When closer to a stable state, activation levels change faster
- Time steps to settle correspond to reaction times (see the sketch below)
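A toy attractor network in the Hopfield style, a simplification of the recurrent models discussed here: units update until the activation pattern stops changing, and the number of steps to settle stands in for reaction time. The stored pattern and start state are arbitrary.

```python
import numpy as np

# One stored pattern defines an attractor; a noisy start state rolls into it.
p = np.array([1, -1, 1, -1, 1])         # learned stable state (+1/-1 units)
W = np.outer(p, p).astype(float)
np.fill_diagonal(W, 0)                  # no self-connections

state = np.array([1, -1, -1, -1, 1])    # input = starting point in the landscape
for step in range(1, 20):
    new = np.where(W @ state >= 0, 1, -1)
    if np.array_equal(new, state):
        break                           # activations stopped changing: settled
    state = new

print(step, state)                      # settling steps (the RT analog) and the attractor reached
```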
30. Hinton & Shallice (1991)
- Connectionist model of deep dyslexia
- Trained the model to map from orthography to semantics
  - 40-word set, 5 categories
- The recurrent network developed an attractor structure
- Lesioned different parts of the model by removing units, removing connections, or randomly changing weights
[Diagram: graphemes → hidden units → sememes, with clean-up units feeding back onto the sememes]
31. Modeling dyslexia
- All locations and ways of lesioning led to a similar mix of errors, like those of deep dyslexics.
- Lesions changed the attractor shapes; shape is determined by all the weights in the model.
[Diagram: mapping from orthographic space (cat, cap, dog) to semantic space (cat, dog, cap); words that are close in spelling need not be close in meaning]
32. Modeling dyslexia
[Diagram: the same orthography-to-semantics mapping as on the previous slide, with the correspondence between the two spaces altered, illustrating how lesions change which attractor an input settles into]
33. Dependent on architecture?
- Variations on the network architecture produced the same results (Plaut & Shallice, 1993).
- But abstract words (few sememes) depend less on the clean-up units than concrete words (many sememes).
- Lesion the orthography→hidden or hidden→semantics connections: concrete words are spared relative to abstract words.
- Lesion the semantics→clean-up connections: abstract words are spared relative to concrete words.
- A double dissociation within one unified processing system!
34. Simple recurrent network (SRN)
- A feedforward network with a memory
- The memory is a bank of context units that stores the hidden values from the previous time step (a minimal sketch follows)
- Used for modeling sequential behavior
  - word-by-word sentence prediction/production
  - phoneme-by-phoneme prediction
  - learns statistical regularities in language
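A skeleton of the SRN forward pass; the layer sizes and random weights are illustrative, and training (e.g., back-propagating the prediction error) is omitted. The key move is copying the hidden activations into the context bank after every time step.

```python
import numpy as np

# Simple recurrent network (Elman-style) forward pass with illustrative sizes.
rng = np.random.default_rng(0)
n_in, n_hid, n_out = 5, 8, 5
W_ih = rng.normal(scale=0.5, size=(n_in, n_hid))   # input   -> hidden
W_ch = rng.normal(scale=0.5, size=(n_hid, n_hid))  # context -> hidden
W_ho = rng.normal(scale=0.5, size=(n_hid, n_out))  # hidden  -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

context = np.zeros(n_hid)               # memory of the previous time step
for x in np.eye(n_in):                  # a toy sequence of one-hot "words"
    hidden = sigmoid(x @ W_ih + context @ W_ch)
    output = sigmoid(hidden @ W_ho)     # prediction for the next item
    context = hidden.copy()             # store hidden state for the next step
    print(np.round(output, 2))
```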
35. Implemented models
- Elman's sentence prediction model
  - SRN trained to predict the next word in a sentence
  - Elman (1990, 1991)
  - hidden units came to reflect similarity in word use
  - important because syntax and word classes were the specialty of symbolic models
- Christiansen & Chater's recursive model
  - Same design and task as Elman's SRN models
  - Demonstrated learning of recursion, with performance degrading as embeddings increase
36. Christiansen & Chater (1999)
- Modeled the types of recursion that Chomsky (1957) said finite state models and context-free grammars couldn't handle
- Right branching: easy
  - "John loves Mary who likes Jim who dislikes Martha."
- Counting recursion
  - "if S1 then S2"; "if (if S1 then S2) then S3"
- Mirror recursion (center embedding)
  - NP1 NP2 V2 V1: "The cat the dog chased died."
- Identity recursion (cross-dependency)
  - NP1 NP2 V1 V2: Dutch has these structures
37. Performance after training
- For humans
  - difficulty ordering: if-then < cross-dependency < center embedding
  - performance declines with multiple embeddings
- For SRN models
  - same ordering: if-then < cross-dependency < center embedding
  - generalized to deeper embeddings than seen in training
  - performance declines for all structures (even right branching) with multiple embeddings
- For trigram/bigram models (transitional probabilities for word pairs or triplets; a sketch follows)
  - if-then < center embedding < cross-dependency
  - a worse match to human data than the SRN models
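For comparison, a minimal bigram model of the kind mentioned above: transitional probabilities estimated from word-pair counts. The "corpus" here is a hypothetical toy.

```python
from collections import Counter

# Bigram transitional probabilities P(next word | current word).
corpus = "the cat the dog chased died".split()   # hypothetical toy corpus
pair_counts = Counter(zip(corpus, corpus[1:]))
first_counts = Counter(corpus[:-1])

def p_next(w1, w2):
    # Estimated probability that w2 follows w1 in the corpus.
    return pair_counts[(w1, w2)] / first_counts[w1] if first_counts[w1] else 0.0

print(p_next("the", "cat"))  # 0.5: "the" precedes "cat" once and "dog" once
```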
38. "The apartment that the maid who the service had sent over was well decorated."
39. "The apartment that the maid who the service had sent over was cleaning every week was well decorated."
40. Unexpected results
- The ungrammatical NNNVV pattern is rated more grammatical than the grammatical NNNVVV pattern
  - Gibson & Thomas (1997); Christiansen & MacDonald (1999)
- The Christiansen & Chater recursive model shows the same effect
  - after NNNVV, the model activates the end-of-sentence marker more highly than the set of possible third verbs
- People rate sentences as less grammatical as right-branching structures (PPs) are added; recursive models show the same trend
  - "The blooming flowers in the vase on the table by the window resemble roses."
41. PDP Features & Principles
- Massively parallel processing
  - Neurons are much slower than computers, so processing must be parallel to accomplish tasks in under a second
- Knowledge/representation and processing are merged
- Implicit knowledge in connections; learning through adjusting them
  - Information is not available outside of processing
- Initial architecture constrains learning
- Distributed representations permit graceful degradation
42. PDP approach best at
- Classification
  - automatic similarity-based generalization
- Pattern recognition
  - finding the best match quickly, even with noisy data
- Memory/recall
  - content addressable: a retrieval cue leads to reconstruction of the memory
- Optimization
  - finding the best organization given constraints
- Prediction/inference
  - e.g., causes from effects, diseases from symptoms
43. Problems for connectionist models
- Trade-off between the ability to generalize and the ability to recall individual episodes or examples
  - Models usually opt for generalization
  - No one-trial learning for dissimilar examples
- Learning new information may interfere with old information (catastrophic interference)
- Difficult to model the entire appropriate training set, but training on only part of it might give unrealistic results
- Networks may fail due to non-critical assumptions made in implementing the model
- Often require sophisticated analysis to understand how a problem is solved
44. PDP vs. Symbolic
- Learning/development
- Damage (graceful degradation)
- Time course
- Generalization within training space
- Generalization outside training space
- Scaling up
45. Trends in modelling
- Training on real input
  - e.g., parental speech from transcripts
- Scaling up
  - e.g., modeling a whole domain
- Training on multiple tasks
  - e.g., sentence comprehension and production
- Adding more neurobiological constraints
  - e.g., modeling specific patients and predicting their recovery; using neuroanatomy to constrain the architecture