Connectionism - PowerPoint PPT Presentation

Slides: 46
Provided by: zenzig

Title: Connectionism


1
Connectionism
  • A subsymbolic approach

2
neurally inspired computation
  • Neurons integrate information.
  • Neurons indicate their level of input.
  • Brain structure is layered.
  • A neuron's influence on another depends on the
    strength of their connection.
  • Learning involves changing the strength of
    connections between neurons.

3
Features & Principles
  • Massively parallel processing
  • Active representation: representations are directly
    involved in processing
  • Implicit knowledge in connections; learning happens
    through adjusting them
  • Initial architecture constrains learning
  • Distributed representations permit graceful
    degradation
  • Memory access by content

4
Nature/nurture
  • Connectionism = Neo-behaviorism?
  • Assumptions about what is innate are built into
    decisions about network architecture, learning
    rules, and activation (potentially also
    input/output representations).
  • Assumptions about the structure of the environment
    are built into decisions about the contents, order,
    and frequency of items in the training set.

5
Localist representations
  • A node dedicated to each meaningful
    representation
  • e.g., word nodes, sound nodes, feature nodes
  • Language examples with only local representations
  • McClelland & Rumelhart (1981) for visual word
    recognition
  • McClelland & Elman (1986) for spoken word
    recognition
  • Dell (1986) for word production/speech errors

6
Distributed representations
  • Less information built in
  • More opportunity to see how task and input shape
    representations, but analysis may be difficult.
  • Learning required
  • Various types of architectures
  • Language examples
  • Elman (1990, 1991) and St. John & McClelland
    (1990) for sentence comprehension
  • Plaut et al. (1996) for word recognition
  • Dell, Juliano, & Govindjee (1993) for word
    production

7
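Linearly separable
  • "X AND Y" is linearly separable.
  • "X XOR Y" (exclusively X or Y) is not linearly
    separable: there is no way to draw a straight line
    through the 2-dimensional space to group the outputs.
  • XOR cannot be computed with a single-layer perceptron.
[Figure: two plots with axes x and y spanning (0,0) to (1,1), one for "X and Y" and one for "exclusively X or Y"]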
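
The separability claim on this slide can be checked with a minimal sketch of the classic perceptron learning rule. The function name `train_perceptron`, the learning rate, and the epoch limit are illustrative assumptions, not part of the original slides. A single threshold unit learns AND in a few passes but never gets through a full pass over XOR without an error.

```python
def train_perceptron(samples, epochs=25, lr=1.0):
    """Train one threshold unit; return (weights, bias, converged)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            delta = target - out
            if delta != 0:
                # Perceptron rule: nudge weights toward the correct answer.
                errors += 1
                w[0] += lr * delta * x1
                w[1] += lr * delta * x2
                b += lr * delta
        if errors == 0:
            # A full pass with no mistakes: a separating line was found.
            return w, b, True
    return w, b, False


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

if __name__ == "__main__":
    print(train_perceptron(AND)[2], train_perceptron(XOR)[2])  # prints: True False
```

Raising `epochs` does not help for XOR: an error-free pass would require a single line that separates the two output classes, which does not exist.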
8
Linear threshold XOR network
  • Activation rule: if $\sum_j w_{ij} o_j > 0$, then $a_i = 1$,
    else $a_i = 0$ (no spreading inhibition)
  • $w_{ij}$ = connection weight between units $i$ and $j$
  • $o_j$ = output of unit $j$
  • $a_i$ = activation of unit $i$
  • Adapted from Rumelhart et al., 1986
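
As a hedged illustration of the activation rule on this slide, here is a sketch of a linear threshold network that computes XOR with one hidden unit. The specific weights and the use of negative bias terms in place of thresholds are assumptions chosen for the example; the slide's figure with the actual values is not recoverable.

```python
def step(net):
    """Linear threshold activation: 1 if the summed input exceeds 0, else 0."""
    return 1 if net > 0 else 0


def xor_net(x1, x2):
    # Assumed weights: the hidden unit fires only when both inputs are on.
    hidden = step(1.0 * x1 + 1.0 * x2 - 1.5)
    # The output unit is excited by each input and inhibited by the hidden unit.
    return step(1.0 * x1 + 1.0 * x2 - 2.0 * hidden - 0.5)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))
```

The hidden unit acts as an "AND detector" whose inhibitory connection cancels the output for the (1,1) case, which is the one case a single threshold unit cannot handle.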
9
Input-to-Hidden layer connections
  • make items that yield similar outputs more
    similar to each other, and items with different
    outputs less similar.

10
Multilayer networks
  • Extra layer is needed to solve complex mappings
    such as XOR, in which similar inputs don't always
    correspond to similar outputs
  • Hidden layer between input and output.
  • Hidden layer is smaller than the input and output
    layers, so it must find an efficient way to compress
    information
  • Use back-propagation of error to train weights
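  • Multilayer networks are, in principle, as powerful
    as Turing machines (this holds for networks with
    recurrent connections; feedforward networks are
    universal function approximators).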
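
To make "back-propagation of error" concrete, here is a minimal pure-Python sketch of a network with one hidden layer trained on XOR by full-batch gradient descent. The layer size, learning rate, random seed, and function names are all assumptions for illustration. Whether a given run reaches a perfect solution depends on the initial weights, so the sketch only reports the loss before and after training.

```python
import math
import random

XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init(n_hidden=3, seed=0):
    rng = random.Random(seed)
    # Each hidden unit has two input weights plus a bias.
    w_h = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
    # The output unit has one weight per hidden unit plus a bias (last entry).
    w_o = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return w_h, w_o


def forward(w_h, w_o, x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    y = sigmoid(sum(wo * hj for wo, hj in zip(w_o, h)) + w_o[-1])
    return h, y


def loss(w_h, w_o):
    return sum((forward(w_h, w_o, x)[1] - t) ** 2 for x, t in XOR) / 2.0


def train(w_h, w_o, epochs=3000, lr=0.5):
    for _ in range(epochs):
        g_h = [[0.0, 0.0, 0.0] for _ in w_h]
        g_o = [0.0] * len(w_o)
        for x, t in XOR:
            h, y = forward(w_h, w_o, x)
            d_out = (y - t) * y * (1.0 - y)  # error signal at the output unit
            for j, hj in enumerate(h):
                g_o[j] += d_out * hj
                # Error propagated back through the hidden-to-output weight.
                d_hid = d_out * w_o[j] * hj * (1.0 - hj)
                g_h[j][0] += d_hid * x[0]
                g_h[j][1] += d_hid * x[1]
                g_h[j][2] += d_hid
            g_o[-1] += d_out
        for j in range(len(w_h)):
            for k in range(3):
                w_h[j][k] -= lr * g_h[j][k]
        for j in range(len(w_o)):
            w_o[j] -= lr * g_o[j]
    return w_h, w_o


if __name__ == "__main__":
    w_h, w_o = init()
    print("loss before:", loss(w_h, w_o))
    train(w_h, w_o)
    print("loss after: ", loss(w_h, w_o))
```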

11
Supervised learning isn't always easy
[Figure: Object 1 and Object 2 are shown; the task is to predict the output.]
12
Supervised learning isn't always easy
13
Supervised learning isn't always easy
  • If Object 1 is edible to a vegetarian, you get
    an ant; otherwise you get a lamp.

14
Example from reading aloud
15
GPC rules
  • Grapheme-to-Phoneme Correspondence (GPC) rules:
    spelling-to-sound rules
  • E → /E/, A → /æ/, but EA → /I/
  • e.g., BED, BAD, BEAD
  • allow pronunciation of novel words or nonwords
    (e.g., FLORP)
  • Learner already knows sound-meaning relationship
  • for sound-based orthographies, good strategy
    would be to go from spelling to sound to meaning
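
The three example rules above can be sketched as a toy longest-match converter. The rule table contains only the slide's three correspondences; the fallback that maps every other letter to itself in lowercase is an assumption made so that whole words can be processed, and is not a claim about real English GPC rules.

```python
GPC_RULES = {
    "EA": "I",  # the two-letter grapheme is tried before E and A alone
    "E": "E",
    "A": "æ",
}


def apply_gpc(word):
    """Convert a spelling to a phoneme string using longest-match rules."""
    word = word.upper()
    phonemes = []
    i = 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if chunk in GPC_RULES:
                phonemes.append(GPC_RULES[chunk])
                i += len(chunk)
                break
        else:
            # Assumption: letters without a rule (consonants here) map to themselves.
            phonemes.append(word[i].lower())
            i += 1
    return "/" + "".join(phonemes) + "/"


if __name__ == "__main__":
    for w in ("BED", "BAD", "BEAD", "FLORP"):
        print(w, apply_gpc(w))
```

Because the rules apply to any letter string, the same function produces an output for a nonword such as FLORP, which is the property the slide highlights.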

16
Dual route model
  • Via sound (mediated, assembled)
  • Convert to phonemes and use phonological form to
    find meaning
  • Phonics: sounding out words
  • Shows effects associated with spelling-sound
    correspondences
  • regularity effects
  • Direct (lexical)
  • From whole visual word form to meaning and
    phonological form
  • Need experience with word to know pronunciation
    (AISLE, PREFACE)
  • Whole word reading method
  • Shows effects associated with whole word
  • word frequency
  • semantic priming

17
Rough Sketch of a Dual Route Model (Coltheart,
Curtis, Atkins, & Haller, 1993)
[Figure: diagram with the components letter detectors, visual word detectors, semantic system, GPC rule system, phonological output lexicon, and phonemes; the lexical pathways are labeled "Routes for irregular words" and the GPC pathway "Route for regular words"]
18
Frequency by Regularity Interaction
Coltheart (1978, 1985); Marshall & Newcombe (1973); Morton & Patterson (1980); Paap & Noel (1991)
[Figure: response time (ms, axis from about 500 to 600) by word frequency (high vs. low), with separate lines for irregular and regular words]
19
Patterns of dyslexia
  • Phonological
  • Can read high-frequency words, but have trouble
    with uncommon words
  • Can't read aloud pronounceable nonwords
    (e.g., SLORF).
  • Surface
  • Can read and sound out regular words and nonwords,
    but not irregular words.
  • Regularize irregular words:
    say PINT so that it rhymes with MINT.
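
The two dyslexia patterns can be illustrated with a toy dual-route reader in which each route can be switched off. This is only a sketch under stated assumptions: the two-word lexicon, the informal ASCII transcriptions, the one-rule GPC system, and the names `read_aloud`, `lexical_ok`, and `gpc_ok` are all hypothetical and not taken from any published model.

```python
# Direct (lexical) route: stored whole-word pronunciations (informal transcriptions).
LEXICON = {"MINT": "/mInt/", "PINT": "/paInt/"}


def gpc_route(word):
    # Toy rule system: the letter I always gets its regular sound "I";
    # every other letter maps to itself (an assumption for this sketch).
    rules = {"I": "I"}
    return "/" + "".join(rules.get(ch, ch.lower()) for ch in word.upper()) + "/"


def read_aloud(word, lexical_ok=True, gpc_ok=True):
    """Return a pronunciation, or None if no intact route can produce one."""
    word = word.upper()
    if lexical_ok and word in LEXICON:
        return LEXICON[word]
    if gpc_ok:
        return gpc_route(word)
    return None


if __name__ == "__main__":
    print(read_aloud("PINT"))                    # intact reader
    print(read_aloud("PINT", lexical_ok=False))  # surface pattern: regularized
    print(read_aloud("SLORF", gpc_ok=False))     # phonological pattern: no output
```

With the lexical route removed, PINT is assembled by rule and comes out rhyming with MINT; with the rule route removed, known words are still read but the nonword SLORF gets no pronunciation.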

20
Seidenberg & McClelland (1989)
  • Learning: weights between units start out random.
    Weight adjustment is scaled by the frequency of
    the word.
  • Results: model learns the correct pronunciation
    for 3000 single-syllable words.
  • Unit activations may be close to 0 or 1, but not
    exact. The distance from the target is the error score.
  • The model's analog of human naming latency is the
    phonological error score.
[Figure: framework diagram with the components Context, Meaning, Orthography, and Phonology; example input MAKE with output /mAk/]
21
Implemented SM89 model
  • Learn to activate phonological units given
    orthographic ones (no meaning units)
  • distance between the model's activation levels and
    the target activations is the error measure,
    which stands in for RT
  • No word units or explicit GPC rules
  • showed frequency by regularity interaction.
  • showed pronunciation priming (TINT after
    MINT/PINT) and repetition priming
  • could pronounce some nonwords.
  • showed performance during learning like kids
    learning to read.
  • suggested degrees of spelling-sound regularity
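
The error measure described here can be sketched as a summed squared distance between the output activations and the target pattern. The four-unit patterns below are made-up numbers for illustration only, and `error_score` is a hypothetical name.

```python
def error_score(activations, targets):
    """Summed squared distance between output activations and their targets."""
    return sum((a - t) ** 2 for a, t in zip(activations, targets))


if __name__ == "__main__":
    target = [1, 0, 1, 0]
    well_learned = [0.9, 0.1, 0.8, 0.2]    # close to target: low error, fast "naming"
    poorly_learned = [0.6, 0.4, 0.5, 0.5]  # far from target: high error, slow "naming"
    print(error_score(well_learned, target))
    print(error_score(poorly_learned, target))
```

Under this reading, a word the model has learned well yields a small score, which is interpreted as a short naming latency.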

22
Regularity
  • Regular
  • CODE
  • BIRD
  • Regular inconsistent
  • CONE
  • SHONE
  • BONE
  • GAVE
  • PAVE
  • SAVE
  • Ambiguous
  • WIND
  • LEAD
  • Reg. Nonword
  • GLIP
  • Incon. Nonword
  • MAVE
  • NUST
  • Pseudohomophone
  • BURD
  • Irregular
  • NONE
  • GONE
  • DONE
  • HAVE
  • PINT
  • Unique/strange
  • SOAP
  • AISLE
  • FUGUE

23
Regularity versus Consistency
  • Consistency effect
  • Naming time for regular inconsistent words
    (e.g., SAND, whose neighbor WAND is irregular) is
    longer than for regular consistent words
    (e.g., WEEK) (Glushko, 1979)
  • Consistency is a statistical property of words,
    highly variable across words in the English language.

24
Model Performance
  • Frequency by Regularity interaction

[Figure: phonological error score (axis from about 2 to 5) by word frequency (high vs. low), with separate lines for irregular, regular inconsistent, and regular words]
25
Neighborhoods
  • Lexical Neighbors
  • words spelled similarly to target.
  • Dog has neighbors log, bog, doe, dig,
    etc.
  • Jared, McRae, and Seidenberg (1990)
  • Friends are neighbors that share spelling to
    sound correspondences, enemies do not.
  • Consistency effect size depends on summed
    frequency of friends versus enemies
  • Higher frequency friends, smaller consistency
    effect.
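
One way to picture the friends-versus-enemies idea is as a frequency-weighted ratio. The sketch below is an assumption about how such a measure could be computed, not the measure used by Jared, McRae, and Seidenberg; the frequencies and rime codes are invented for the example.

```python
def consistency(target_pron, neighbors):
    """Share of summed neighbor frequency that belongs to friends.

    neighbors: list of (word, rime_pronunciation, frequency) tuples.
    """
    friends = sum(f for _, pron, f in neighbors if pron == target_pron)
    enemies = sum(f for _, pron, f in neighbors if pron != target_pron)
    if friends + enemies == 0:
        return 1.0  # no neighbors: nothing competes with the target
    return friends / (friends + enemies)


if __name__ == "__main__":
    # Hypothetical frequencies and rime codes for neighbors of GAVE.
    gave_neighbors = [
        ("PAVE", "Av", 20),   # friend: same spelling-to-sound mapping as GAVE
        ("SAVE", "Av", 60),   # friend
        ("HAVE", "av", 400),  # enemy: same spelling, different sound
    ]
    print(consistency("Av", gave_neighbors))
```

A high-frequency enemy such as HAVE pulls the ratio down, which corresponds to a larger expected consistency effect for the target word.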

26
Parallel distributed processing approach to word
naming
  • PDP models excel at extracting statistical
    relations between input and output patterns.
  • Learning process (e.g., back propagation) is
    sensitive to idiosyncratic characteristics of
    words
  • SAND is more consistent than PINT, which is
    more consistent than AISLE

27
Sublexical units
  • In symbolic model
  • each new grouping requires a level of
    representation
  • syllable
  • morpheme
  • bigram
  • In connectionist model
  • sublexical representations may emerge without
    being built in
  • syllables in phonological reps from frequently
    used groups of phonemes
  • morphemes from similarity in form and meaning

28
Deep dyslexia
  • Mostly semantic errors in reading.
  • Semantic: NIGHT → sleep
  • Visual: SCANDAL → sandals
  • Visual and semantic: SHIRT → skirt
  • Visual then semantic: SYMPATHY → (symphony) →
    orchestra
  • Symbolic models need to assume multiple modules
    are damaged:
  • visual and semantic, but there is no principled
    reason for that combination to co-occur frequently

29
Attractors
  • In models with recurrent connections, eventually
    activation levels stop changing
  • reach stable state, an attractor
  • where many constraints are satisfied
  • connection weights trained to get activation
    levels to stable state
  • create landscape in which input determines
    starting point; activation pattern is like a
    rolling ball that settles at the lowest point
  • when closer to a stable state, activation levels
    change faster
  • number of time steps to settle models reaction times
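
Settling into an attractor can be sketched with a small Hopfield-style network. Note the substitutions: the slides describe weights trained by error-driven learning, whereas this sketch sets them with a simple Hebbian outer-product rule and uses +1/-1 units, both assumptions made to keep the example short. The number of unit flips before the state stops changing plays the role of settling time.

```python
def hebbian_weights(patterns):
    """Symmetric weights that make each stored pattern a stable state."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def settle(w, state, max_sweeps=20):
    """Update units one at a time until nothing changes; return (state, flips)."""
    state = list(state)
    flips = 0
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            net = sum(w[i][j] * state[j] for j in range(len(state)))
            new = 1 if net >= 0 else -1
            if new != state[i]:
                state[i] = new
                flips += 1
                changed = True
        if not changed:
            break  # stable state reached: the network sits in an attractor
    return state, flips


P1 = [1, 1, 1, 1, -1, -1, -1, -1]
P2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = hebbian_weights([P1, P2])

if __name__ == "__main__":
    near = [-1, 1, 1, 1, -1, -1, -1, -1]   # P1 with one unit corrupted
    far = [-1, -1, 1, 1, -1, -1, -1, -1]   # P1 with two units corrupted
    print(settle(W, near))
    print(settle(W, far))
```

Both starting points roll into the same stored pattern, and the one that starts farther away needs more updates, which is the sense in which settling time can stand in for reaction time.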

30
Hinton & Shallice (1991)
  • Connectionist model of deep dyslexia
  • Trained model to map from orthography to
    semantics
  • 40 word set, 5 categories
  • recurrent network resulted in attractor
    structure
  • lesioned different parts of model by removing
    units or connections, or by randomly changing weights.

[Figure: network layers labeled graphemes, hidden, sememes, and clean up]
31
Modeling dyslexia
  • All locations and ways of lesioning led to a similar
    mix of errors, like deep dyslexics.
  • Lesions changed attractor shapes; shape is
    determined by all weights in the model.

[Figure: mapping from orthographic space (Ortho) to semantic space (Sem) for the words cat, cap, and dog]
32
Modeling dyslexia
  • All locations and ways of lesioning led to a similar
    mix of errors, like deep dyslexics.
  • Lesions changed attractor shapes; shape is
    determined by all weights in the model.

[Figure: mapping from orthographic space (Ortho) to semantic space (Sem) for the words cat, dog, and cap]
33
Dependent on architecture?
  • Variations on network architecture produced same
    results (Plaut & Shallice, 1993).
  • But abstract words (few sememes) depend less on
    clean-up units than concrete words (many).
  • Lesion ortho→hidden or hidden→sem connections:
    concrete spared relative to abstract.
  • Lesion sem→clean-up connections: abstract
    spared relative to concrete.
  • Double dissociation within one unified processing
    system!

34
Simple recurrent network (SRN)
  • A feedforward network with a memory
  • Memory is bank of context units that stores
    values from previous time step.
  • Used for modeling sequential behavior
  • Word-by-word sentence prediction/production
  • phoneme-by-phoneme prediction
  • learns statistical regularities in language
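
The context-unit idea can be sketched as a forward pass in which the hidden activations from one time step are copied back in as extra input on the next. The weights below are arbitrary fixed numbers chosen for illustration (no training is shown), and `srn_step` and `run_sequence` are hypothetical names.

```python
import math


def srn_step(x, context, w_in, w_ctx, w_out):
    """One time step: hidden units see the current input plus the previous hidden state."""
    hidden = [
        math.tanh(
            sum(wi * xi for wi, xi in zip(w_in[h], x))
            + sum(wc * ci for wc, ci in zip(w_ctx[h], context))
        )
        for h in range(len(w_in))
    ]
    output = [sum(wo * hj for wo, hj in zip(row, hidden)) for row in w_out]
    return output, hidden  # the hidden state becomes the next context


def run_sequence(seq, w_in, w_ctx, w_out):
    context = [0.0] * len(w_in)  # context units start empty
    outputs = []
    for x in seq:
        out, context = srn_step(x, context, w_in, w_ctx, w_out)
        outputs.append(out)
    return outputs


# Arbitrary illustrative weights: 2 inputs, 2 hidden units, 2 outputs.
W_IN = [[0.5, -0.5], [0.3, 0.8]]
W_CTX = [[0.1, 0.2], [-0.3, 0.4]]
W_OUT = [[1.0, -1.0], [0.5, 0.5]]

if __name__ == "__main__":
    # The same input twice in a row gives different outputs,
    # because the context differs at the second step.
    for out in run_sequence([[1, 0], [1, 0]], W_IN, W_CTX, W_OUT):
        print(out)
```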

35
Implemented models
  • Elman's sentence prediction model
  • SRN trained to predict next word in a sentence
  • Elman (1990, 1991)
  • hidden units reflected similarity in word use
  • important because syntax and word classes were
    the specialty of symbolic models
  • Christiansen & Chater's recursive model
  • Same design and task as Elman's SRN models
  • Demonstrated learning of recursion, with
    degrading performance for increased embeddings
36
Christiansen & Chater (1999)
  • Model the types of recursion that Chomsky (1957)
    said finite state models and context-free grammars
    couldn't handle.
  • Right branching (easy):
  • John loves Mary who likes Jim who dislikes
    Martha.
  • Counting recursion:
  • if S1 then S2; if (if S1 then S2) then S3.
  • Mirror recursion (center embedding):
  • NP1 NP2 V2 V1: The cat the dog chased died.
  • Identity recursion (cross-dependency):
  • NP1 NP2 V1 V2: Dutch has these structures.
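
The three non-trivial recursion types differ only in how the second half of the string depends on the first. A small sketch that generates the abstract patterns (the function names and the token labels are illustrative assumptions, not the training grammars used in the paper):

```python
def counting_recursion(depth):
    """n "if" tokens followed by n "then" tokens; only the count must match."""
    return ["if"] * depth + ["then"] * depth


def mirror_recursion(depth):
    """Center embedding: verbs appear in the reverse order of their nouns."""
    nouns = [f"NP{i}" for i in range(1, depth + 1)]
    verbs = [f"V{i}" for i in range(depth, 0, -1)]
    return nouns + verbs


def identity_recursion(depth):
    """Cross-dependency: verbs appear in the same order as their nouns."""
    nouns = [f"NP{i}" for i in range(1, depth + 1)]
    verbs = [f"V{i}" for i in range(1, depth + 1)]
    return nouns + verbs


if __name__ == "__main__":
    print(" ".join(counting_recursion(2)))  # prints: if if then then
    print(" ".join(mirror_recursion(2)))    # prints: NP1 NP2 V2 V1
    print(" ".join(identity_recursion(2)))  # prints: NP1 NP2 V1 V2
```

Increasing `depth` gives the deeper embeddings on which both human and model performance are reported to decline.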

37
Performance after training
  • For humans
  • If-Then < cross-dependency < center embedding
  • performance declines with multiple embeddings
  • For SRN models
  • If-Then < cross-dependency < center embedding
  • generalized to deeper embeddings than seen in
    training
  • performance declines for all (even right
    branching) with multiple embeddings
  • For trigrams/bigrams (transitional probabilities
    for word pairs or triplets)
  • If-Then < center embedding < cross-dependency
  • worse than SRN models
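
For comparison with the SRN, here is a minimal sketch of the bigram baseline: transitional probabilities estimated from word-pair counts. The three-sentence corpus is invented for the example and `bigram_probs` is a hypothetical name.

```python
from collections import Counter


def bigram_probs(sentences):
    """Estimate P(next word | current word) from adjacent word pairs."""
    pair_counts = Counter()
    first_counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for a, b in zip(words, words[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return {pair: c / first_counts[pair[0]] for pair, c in pair_counts.items()}


if __name__ == "__main__":
    corpus = ["the cat chased the dog", "the dog died", "the cat died"]
    probs = bigram_probs(corpus)
    print(probs[("the", "cat")])   # prints: 0.5
    print(probs[("dog", "died")])  # prints: 1.0
```

A model of this kind only sees the immediately preceding word (or two words, for trigrams), so it has no way to keep track of which noun a later verb belongs to once other material intervenes.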

38
The apartment that the maid who the service had
sent over was well decorated.
39
The apartment that the maid who the service had
sent over was cleaning every week was well
decorated.
40
Unexpected results
  • Ungrammatical NNNVV rated more grammatical than
    grammatical NNNVVV
  • Thomas & Gibson (1997); Christiansen & MacDonald
    (1999)
  • C&C recursive model
  • after NNNVV model activates End-of-sentence
    marker more highly than the set of third Vs.
  • People rate sentences as less grammatical with
    increasing right-branching structures - PPs
  • recursive models show same trend.
  • The blooming flowers in the vase on the table
    by the window resemble roses.

41
PDP Features & Principles
  • Massively parallel processing
  • Neurons are much slower than computers, so processing
    must be parallel to accomplish tasks in under a
    second
  • Merge knowledge/representation and processing
  • Implicit knowledge in connections; learning happens
    through adjusting them
  • Information not available outside of processing
  • Initial architecture constrains learning
  • Distributed representations permit graceful
    degradation

42
PDP approach best at
  • Classification
  • automatic similarity based generalization
  • Pattern recognition
  • finding best match quickly even with noisy data
  • Memory/recall
  • content addressable - retrieval cue leads to
    reconstruction of memory
  • Optimization
  • finding best organization given constraints
  • Prediction/inference
  • e.g., causes from effects, disease from symptoms

43
Problems for connectionist models
  • Trade-off between ability to generalize and ability
    to recall individual episodes or examples
  • Usually opt for generalization
  • No one-trial learning for dissimilar examples
  • Learning new information may interfere with old
    (catastrophic interference)
  • Difficult to model the entire appropriate training
    set, but training on only part of it might give
    unrealistic results
  • Networks may fail due to non-critical assumptions
    made in implementing model
  • Often require sophisticated analysis to
    understand how a problem is solved

44
PDP Symbolic
  • Learning/development
  • Damage (graceful degradation)
  • Time course
  • Generalization within training space
  • Generalization outside training space
  • Scaling up

45
Trends in modelling
  • Training on real input
  • e.g., parental speech from transcripts
  • Scaling up
  • e.g., model whole domain
  • Train multiple tasks
  • e.g., sentence comprehension and production
  • Add more neurobiological constraints
  • e.g., model specific patients, predict their
    recovery. Use neuroanatomy to constrain
    architecture.