Connections


1
Chapter 7
  • Connections
  • Associations
  • Neural Networks
  • Parallel Distributed Processing (PDP)

2
Origin of Parallel Distributed Processing
  • Work in PDP began from an effort to computationally model how the networks of neurons in the brain might contribute to thought.

3
Neurons
  • Neurons are cells of the nervous system.
  • Neurons are specialized to carry "messages"
    through an electrochemical process.
  • Neurons send these messages through a single axon
    and receive messages through multiple dendrites.
  • The human brain has about 100 billion neurons.

4
The Neuron
5
Differences between axons and dendrites
  • Axons
  • Take information away from the cell body
  • Smooth Surface
  • Generally only 1 axon per cell
  • Branch further from the cell body
  • Dendrites
  • Bring information to the cell body
  • Rough Surface (dendritic spines)
  • Usually many dendrites per cell
  • Branch near the cell body

6
Axonal Transmission
  • A nerve cell at rest is electrically more negatively charged inside than outside.
  • A stimulus will cause the gates in the axonal membrane to open, allowing positively charged sodium ions to rush into the cell.
  • When the cell reaches its maximal positive state, these gates close.
  • Another set of gates allows positively charged potassium ions to leave the cell, restoring the resting charge in a process called repolarization.

7
Synaptic Transmission
(Figure: transmission across the synaptic junction to a dendrite)
8
Synaptic Junctions
  • Two types
  • Excitatory
  • depolarization occurs at postsynaptic membrane sites, raising the postsynaptic membrane potential
  • Inhibitory
  • hyperpolarization occurs at postsynaptic membrane sites, lowering the postsynaptic membrane potential
  • polarization is the production of a reverse electromotive force

9
From Neuron to Perceptron
  • In the 1950s and '60s, computational models of networks resembling the neuron were tested.
  • The model was called a perceptron.

10
From Neuron to Neural Network
(Figure from Rich & Knight)
11

A Perceptron

(Figure: inputs x1 and x2 feed a summation unit Σ through connections with weights w1 and w2; the sum is compared to a threshold.)

x1 and x2 are inputs, w1 and w2 are connection weights, and Σ is the weighted sum of the inputs. If Σ exceeds the threshold, the perceptron will fire. (Rosenblatt) A sketch in code follows.
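To make the arithmetic concrete, here is a minimal sketch in Python (the function name is mine, not from the slides): it computes the weighted sum and compares it to a threshold, exactly as the diagram describes.

```python
def perceptron(x1, x2, w1, w2, threshold):
    """Fire (return 1) if the weighted sum of the inputs exceeds the threshold."""
    sigma = w1 * x1 + w2 * x2   # the summation unit (Σ)
    return 1 if sigma > threshold else 0

# With weights of 1 and a threshold of 1.5 the perceptron computes AND;
# with a threshold of 0.5 it computes OR (see the tables on the next slide).
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              "AND:", perceptron(x1, x2, 1, 1, 1.5),
              "OR:",  perceptron(x1, x2, 1, 1, 0.5))
```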
12
The AND and OR Relations are natural inputs to a perceptron
  • AND (with w1 = w2 = 1; the unit fires when Σ exceeds a threshold of 1.5)
    x1  x2  Sum (Σ)  Fires?
    0   0   0        no
    0   1   1        no
    1   0   1        no
    1   1   2        yes
  • OR (with w1 = w2 = 1; the unit fires when Σ exceeds a threshold of 0.5)
    x1  x2  Sum (Σ)  Fires?
    0   0   0        no
    0   1   1        yes
    1   0   1        yes
    1   1   2        yes

13
Sample AND Relation
  • AND is the English 'and,' meaning both. For example, to take PSYC 301 you must have taken PSYC 220 and PSYC 203.

    220   203   Take 301?
    no    no    no
    no    yes   no
    yes   no    no
    yes   yes   yes

14
Sample OR Relation
  • OR is 'inclusive or,' meaning either, possibly both. For example, an insurance policy states that insurance premiums will be waived in the event of sickness or unemployment.

    Sick   Unemployed   Premium Waiver?
    no     no           no
    no     yes          yes
    yes    no           yes
    yes    yes          yes

15
Perceptrons and Learnability
  • A perceptron, faced with new data, can reset the
    weights
  • This means it can learn

16
Learnability
  • The symbol/rule systems (logic, rules, concepts, analogies, images) have proposed only one explanation of human learning:
  • innate knowledge plus parameter setting (as in Universal Grammar)
  • The perceptron provided another explanation of learning, one that involved learning entirely from experience

17
The exclusive OR relation cannot be computed by a perceptron
  • Exclusive or (XOR), with w1 = w2 = 1
    x1  x2  Sum (Σ)  Desired output
    0   0   0        0
    0   1   1        1
    1   0   1        1
    1   1   2        0
  • No single threshold works: the input (1, 1) produces the largest sum but must not fire, while (0, 1) and (1, 0) must fire.
  • An example of an XOR relation: at a restaurant the lunch special is a cheeseburger with either salad or french fries, but not both. (Minsky and Papert)

18
The Fate of the Perceptron
  • Due to Minsky and Papert's XOR argument, interest in the perceptron waned.
  • In the 1980s interest in neural networks revived with the notions of
  • hidden units in the network
  • backpropagation

19
Stereopsis: assigning structure to data
  • Interest in artificial neural nets revived with
    studies of stereoscopic vision (thumbs)
  • the perception of depth involves assigning
    structure (our perception of the object, its
    distance, its depth, its structure) to data (the
    object in the physical world)
  • whenever structure is assigned to data, the
    question arises as to whether the assignment is
    done top-down (from structure to data) or
    bottom-up (from data to structure)

20
Top-down vs. Bottom-up Processing
  • Working top down, a system uses knowledge of
    structure to predict the details to be found in
    the data
  • Working bottom up, a system uses the data to
    predict high-level structure
  • With stereopsis, the question is whether we work
    bottom up from simple disparities between right
    and left image, or whether we anticipate the
    image in depth by knowing something about its
    structure in advance

21
The Perception of Unstructured Data
  • Bela Julesz (1971) showed that pictures composed of random dots could produce depth effects. (worksheet)
  • This implies that stereopsis can work bottom up from simple disparities in the data alone.
  • Even telling subjects what they should see does not speed up the perception of depth. (Frisby and Clatworthy)

22
Bottom up vs. Top down
  • Julesz's discovery indicated that depth perception is driven by low-level brain activity making sense out of the data, rather than by high-level principles or rules imposing structure on the data.
  • Marr and Poggio (1976) built a computer program for stereopsis that relies on two low-level constraints they believed could be wired into the brain to guide the matching of the two images.

23
Stereo Matching
  • 3 lines of sight from each eye give nine possible points of fusion
  • 3 adjacent points lie on the surface of an object
  • How do we resolve to these 3 out of the possible 9?
(Figure: candidate fusion points plotted along the depth axis)
24
Uniqueness Constraint on Stereo Matching
  • A point in one image can normally be matched with one and only one point in the other image.
  • (Each link in the figure is inhibitory: if one fusion point is active, the points to which it is linked are not active.)
(Figure: fusion points along the depth axis)
25
Continuity Constraint on Stereo Matching
  • Two adjacent points in an image will tend to represent points at about the same depth.
  • (Each link in the figure is excitatory, and so if one fusion point is active it excites all those to which it is connected.)
(Figure: fusion points along the depth axis)
26
Marr & Poggio's Program for Stereopsis
  • The brain has to apply these constraints in parallel (simultaneously)
  • Marr and Poggio designed an array of processors to handle the fusion of images under these constraints
  • Each processor unit does the same computation
    on the particular values that are local to it
    (its own value and those of its neighbors)
  • The output from each processor is then used on a
    fresh cycle of activity
  • The system continues to compute until it settles
    down to stable values at each processor
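A toy version of such a cooperative array can be sketched in Python with NumPy. This is a simplified 1-D illustration of the idea, not Marr and Poggio's actual program: units stand for candidate fusion points (i, j), the uniqueness constraint supplies inhibition along rows and columns, and the continuity constraint supplies excitation along diagonals of constant disparity; all the parameter values are my assumptions.

```python
import numpy as np

def cooperative_step(state, initial, excite=2.0, inhibit=1.0, theta=3.0):
    """One parallel update of every fusion-point unit under both constraints."""
    n = state.shape[0]
    new = np.zeros_like(state)
    for i in range(n):              # line of sight from the left eye
        for j in range(n):          # line of sight from the right eye
            # uniqueness: other active matches in the same row/column inhibit (i, j)
            inh = state[i, :].sum() + state[:, j].sum() - 2 * state[i, j]
            # continuity: active neighbours at the same disparity (same diagonal)
            # excite (i, j)
            exc = sum(state[i + d, j + d]
                      for d in (-1, 1)
                      if 0 <= i + d < n and 0 <= j + d < n)
            net = excite * exc - inhibit * inh + initial[i, j]
            new[i, j] = 1.0 if net > theta else 0.0
    return new

# The array computes until it settles down to stable values, as on the slide.
rng = np.random.default_rng(0)
initial = (rng.random((8, 8)) > 0.7).astype(float)   # noisy candidate matches
state = initial.copy()
for _ in range(25):
    nxt = cooperative_step(state, initial)
    if np.array_equal(nxt, state):
        break                                        # settled
    state = nxt
```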

27
  • Connectionism was back in business.
  • Marr and Poggio's program was followed by
  • Feldman's 1981 model of visual representation in memory
  • McClelland and Rumelhart's 1981 model of letter perception

28
Key Elements in Current Connectionist Models
  • hidden units/layers
  • distributed representation
  • parallel processing
  • parallel constraint satisfaction
  • excitatory and inhibitory links
  • Learning
  • Hebbian learning
  • delta rule
  • backpropagation
  • feedforward network
  • spreading activation
  • relaxation/settling
  • graceful degradation

29

Hidden Units: A Simple XOR Network with One Hidden Unit

(Figure: input units x1 and x2 each connect to the output unit with weight 1 and to the hidden unit with weight 1; the hidden unit connects to the output unit with weight -2; the threshold is 1.5 for the hidden unit and 0.5 for the output unit.)

The numbers on the arrows show the strengths of connections among units. The numbers in the circles show the thresholds of the units. If both input units are on, the hidden unit's threshold will be tripped, sending an inhibitory signal (here -2) to the output unit. The output unit's summed input (1 + 1 - 2 = 0) is then below its 0.5 threshold, so the output unit will not fire. (Rumelhart, Hinton, & Williams)
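A direct transcription of this network into Python (a sketch; the weights and thresholds follow the figure) shows that it reproduces XOR:

```python
def step(total, threshold):
    """A unit fires (outputs 1) when its summed input exceeds its threshold."""
    return 1 if total > threshold else 0

def xor_net(x1, x2):
    hidden = step(1 * x1 + 1 * x2, 1.5)                 # hidden unit, threshold 1.5
    return step(1 * x1 + 1 * x2 - 2 * hidden, 0.5)      # output unit, threshold 0.5

for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, "->", xor_net(*pair))   # prints 0, 1, 1, 0
```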
30
A Local Connectionist Network
  • Involves a one-to-one correspondence between concepts and hardware units
  • Each unit is given an identifiable interpretation in terms of specifiable concepts or propositions.
(Figure: units labeled 'likes programming', 'likes parties', 'computer geek', 'outgoing', and 'shy', joined by excitatory and inhibitory (-) links)
31
Distributed Connectionist Networks
  • Each entity is represented by a pattern of
    activity distributed over many computing elements
  • Each computing element is involved in
    representing many different entities.
  • As an example, look at the Necker cube in B on
    the worksheet. How many ways can you interpret
    the cube?

32
The Necker Cube
  • There are 2 global interpretations of the Necker
    cube
  • in one, point a is the front upper left point
  • in the other, point a is the back upper left
    point
  • In each interpretation, the cube has 8 points
  • But the orientation of each point differs
    depending on the interpretation

33
  • When point a is the front upper left (FUL) point
  • point b is the front lower left (FLL) point
  • point c is the front lower right (FLR) point
  • point d is the back upper right (BUR) point
  • and so on for the remaining points (wksht B)

34
  • The interpretation of the Necker cube as a cube
    with its front facing downward (i.e., with point
    a as FUL) depends on all of the points in the
    cube having one and only one orientation.
  • In other words, this interpretation is
    distributed over 8 orientations that must be on
    together in order for us to interpret the cube as
    having its front facing downward.

35
(No Transcript)
36
Parallel Processing
  • To interpret the cube in one orientation, we do
    not
  • first establish a as FUL
  • then establish b as FLL
  • then establish c as FLR
  • in a serial fashion
  • We interpret all the points simultaneously, in parallel

37
Parallel Processing (2)
  • A single computer processor
  • Time: nanoseconds (1 billionth of a sec.) for each operation
  • Mode: serial (consecutive)
  • The human brain
  • Time: milliseconds (1 thousandth of a sec.) for each operation
  • Mode: parallel (simultaneous)

38
Parallel Constraint Satisfaction
  • There are 3 constraints on the relations between
    the points in the Necker cube
  • Each point can have only one label at a time
    (point a cannot be both FUL and BUL
    simultaneously)
  • Each point depends on the interpretation of its
    near neighbors
  • Each label can be used only once in an
    interpretation (e.g., no 2 points can be FUL)
  • These constraints all hold simultaneously

39
Parallel Constraint Satisfaction in Word
Recognition
40
Constraints on Word Recognition
(Figure: three levels of units: features of letters at the bottom, 1st-letter units R, K, T in the middle, and word units RED, KEY, BEE at the top.)

41
Word Recognition
  • Constraints operate at the feature, letter, and word levels.
  • Each word is represented by a processing unit
  • Each letter at each position in a word is represented by a processing unit
  • Some pairs of units excite each other
  • e.g., "the word is RED" and "the 1st letter is R"
  • Inconsistent pairs inhibit each other
  • e.g., "the word is KEY" and "the 1st letter is R"
  • (McClelland & Rumelhart)
42
Excitatory & Inhibitory Links
  • Each point can have only one label at a time
  • → negative link between a point's label in one interpretation and the other
  • Each point depends on the interpretation of its near neighbors
  • → positive link between a point and its 3 closest neighbors
  • Each label can be used only once in an interpretation
  • → negative link between 2 identical labels representing different points

43
Representation of Links as a Connectivity Matrix

        u1  u2  u3  u4  u5  u6  u7  u8
  u1     0   0   0   0   0  -5  -5   0
  u2     0   0   0   0   6   5   0   6
  u3     0   0   0   0   6   5   0   6
  u4     0   0   0   0   6   5   0   6
  u5     0   0   0   0   6   5   0   6

  • u5 strongly excites u2 (the entry in row u2, column u5 is 6)
  • u6 strongly inhibits u1 (the entry in row u1, column u6 is -5)
44
Hebb: association between neurons
  • "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
  • From The Organization of Behavior, 1949.

45
Hebb: a cell assembly
  • Many repetitions of a sensory event will lead to the gradual building up of a set of perhaps 25 to 100 neurons in a cell assembly
  • "One assembly will form connections with others, and it may therefore be made active by one of them in the total absence of the adequate stimulus. In short, the assembly activity is the simplest case of an image or an idea . . . ."

46
Hebbian Learning
  • a type of learning in which the weight on a link between two units is increased if both units are active at the same time, as in the sketch below
  • expressed as the delta rule
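The core of the Hebbian rule fits in one line: the weight change on a link is proportional to the product of the two units' activities, so it grows only when both are active together. A minimal sketch (the learning rate η and its value are my choices):

```python
def hebbian_update(w, a_i, a_j, eta=0.1):
    """Increase the weight on the link between units i and j
    in proportion to their simultaneous activity."""
    return w + eta * a_i * a_j

w = 0.0
w = hebbian_update(w, 1, 1)   # both units active -> weight grows to 0.1
w = hebbian_update(w, 1, 0)   # only one unit active -> no change
print(w)                      # 0.1
```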

47
The Delta Rule
  • Δwij = g(ai(t), ti(t)) · h(oj(t), wij)
  • A change (Δ) in the weight of a link between unit i and unit j is the product of
  • the function g of the activation of unit i (ai at time t) and its teaching input (ti(t)), and
  • the function h of the output value of unit j and the connection strength between unit i and unit j (wij).

48
The Delta Rule (2)
  • a change in the weight from unit i to unit j is the product of
  • the value of the teaching input
  • the value of the current activation state of unit i
  • the value of the output of unit j
  • the current weight between unit i and unit j

49
The Delta Rule (3)

(Figure: unit i, with activation ai and teaching input ti, is linked by weight wij to unit j, whose output is oj.)

  • If there is no teaching input, the weights change in proportion to the change in ai
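In the common special case where g is the difference between the teaching input and the activation, and h is simply the output of unit j, the rule reduces to the familiar error-correction form Δwij = η(ti - ai)·oj. A sketch under that assumption (the learning rate η is my choice):

```python
def delta_update(w_ij, a_i, t_i, o_j, eta=0.5):
    """Error-correction form of the delta rule: adjust the weight so that
    unit i's activation moves toward its teaching input."""
    return w_ij + eta * (t_i - a_i) * o_j

# Example: unit j is on (o_j = 1); unit i is at 0.2 but should be at 1.0,
# so the weight on the link is increased.
print(delta_update(w_ij=0.3, a_i=0.2, t_i=1.0, o_j=1.0))   # 0.7
```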

50
Advantages of Hebbian Learning
  • founded on known neurological activity
    (neurologically real)
  • can occur without explicit correction (without a
    teacher)

51
Feedforward Networks
  • A network in which information flows only from input layer to output layer, but not back
  • input layer - the set of units in a network that is directly activated by the input
  • output layer - the last layer of units
  • hidden layer - an intermediate layer whose weights and thresholds are readjusted during learning

52
Learning by Backpropagation
  • 1. Weights between units are randomly assigned
  • 2. There is an initial test phase in which an input activation is introduced and propagates through the network to yield an output
  • 3. This output is compared with the required output given by an external source (teacher). If there is a difference, an error signal is calculated.
  • 4. The error signal causes the weights to change backward through the network (sketched below).
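A compact sketch of the procedure on the XOR problem, using NumPy and a small sigmoid network (the architecture, learning rate, and names are my assumptions, not from the slides). It follows the four steps above: random weights, a forward pass, an error signal from the teacher, and weight changes propagated backward:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs
T = np.array([[0], [1], [1], [0]], dtype=float)               # teacher (XOR)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: weights (and biases) are randomly assigned.
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)   # input  -> hidden
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)   # hidden -> output
eta = 0.5

for epoch in range(20000):
    # Step 2: input activation propagates forward to yield an output.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # Step 3: compare with the required output; compute the error signal.
    err = y - T
    delta2 = err * y * (1 - y)                 # output-layer error signal
    # Step 4: the error signal changes the weights backward through the net.
    delta1 = (delta2 @ W2.T) * h * (1 - h)     # hidden-layer error signal
    W2 -= eta * h.T @ delta2;  b2 -= eta * delta2.sum(axis=0)
    W1 -= eta * X.T @ delta1;  b1 -= eta * delta1.sum(axis=0)

# With enough epochs this typically approaches [0, 1, 1, 0]; a net this small
# can occasionally settle into a local minimum on some random seeds.
print(y.round(2).ravel())
```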

53
Disadvantages of Backpropagation
  • not neurologically realistic
  • requires an external teaching source

54
Relaxation / Settling
  • Relaxation -- the procedure whereby a system settles into a locally optimal state in which as many as possible of the constraints are satisfied
  • Suppose all units are off, and you focus on the FLL unit on the left of the network. This unit receives positive input from the diagram. The activation of this FLL unit spreads to FUL, BLL, and FLR on the left of the network, turning them on, and on to the units for which they have excitatory links. Why?

55
Relaxation / Settling
  • The activation of this FLL unit also spreads to
    FLL and BLL toward the right of the network,
    turning them off. Why?
  • At this point, the network 'settles' into the
    interpretation that corresponds to the network of
    8 units on the left.
  • Gazing at the lower left corner of the diagram
    may turn on the BLL unit near the center of the
    network, turning on the BLL, FLL, and BLR units
    with which it has excitatory links, and turning
    off its competitors (FLL and BLL in the network
    on the left).
  • At this point, the network settles into the other interpretation.
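A generic settling loop is short: repeatedly update every unit from the weighted activity of its neighbours until nothing changes. A toy sketch (the weights and bias values are mine; the matrix convention matches the connectivity-matrix slide):

```python
import numpy as np

def settle(W, state, bias, theta=0.0, max_steps=100):
    """Synchronously update binary units until the network reaches a stable state."""
    for _ in range(max_steps):
        new = (W @ state + bias > theta).astype(float)
        if np.array_equal(new, state):
            return new            # settled: constraints locally satisfied
        state = new
    return state

# Toy example: units 0 and 1 excite each other, and both inhibit unit 2
# (two mutually supporting units vs. a rival, as with the Necker cube).
W = np.array([[ 0.0,  1.0, -1.0],
              [ 1.0,  0.0, -1.0],
              [-1.0, -1.0,  0.0]])
bias = np.array([0.5, 0.0, 0.0])      # external input favouring unit 0
print(settle(W, np.zeros(3), bias))   # settles to [1, 1, 0]
```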

56
Graceful Degradation
  • Because knowledge is distributed, i.e.,
  • every unit is involved in the storage of patterns of connections, and
  • each pattern of connections involves many units,
  • if some units or connections are lost, the stored knowledge will be degraded but not entirely lost
  • in other CRUM models, the failure of one component means a loss of knowledge

57
Rumelhart: metaphors for mind
  • New: the brain
  • slow (milliseconds)
  • parallel processing
  • parallel constraint satisfaction
  • knowledge stored in links between units
  • accommodates approximate matching (by prototype)
  • Old: the computer
  • fast (nanoseconds)
  • serial processing
  • serial constraint satisfaction
  • knowledge stored as facts & rules
  • demands exact matching

58
Approximate Matching
  • PDP networks have content-addressable memory
  • many patterns (units & links) are stored
  • the network can perform a match when only a portion of the pattern is given
  • the network finds the closest match
  • Consider the Necker cube. How many units and links do you need to be given to match one orientation of the cube?

59
Issues in Rumelhart
60
Architecture: more than just input, hidden, and output units
  • Human activity that takes place in time involves sequential as well as parallel behavior (e.g., movement, speech)
  • How do PDP networks blend the sequential and the parallel?
  • Plan units tell the network which sequence it is producing (Jordan, 1986)
  • Context units keep track of where the system is in the sequence (Jordan, 1986; Elman, 1988)

61
The Scaling Problem
  • moderately difficult problems require a few hundred thousand input examples
  • "One view grows from viewing learning and evolution as continuous with one another. On this view the fact that networks take a long time to learn is to be expected because we normally compare their behavior to organisms that have long evolutionary histories" (Rumelhart, p. 235)
  • Compare innateness and universal grammar

62
The Generalization Problem (wksht C)
  • How do neural networks perform in inducing the best generalization from input data?
  • Rumelhart chose "the simplest, most robust network that is consistent with the observations made."
  • But . . . .

63
An Example of Generalization (Rich and Knight)

              hair  scales  feathers  flies  water  eggs
  dog          1     0       0         0      0      0
  cat          1     0       0         0      0      0
  bat          1     0       0         1      0      0
  whale        1     0       0         0      1      0
  canary       0     0       1         1      0      1
  robin        0     0       1         1      0      1
  ostrich      0     0       1         1      0      1
  snake        0     1       0         0      0      1
  lizard       0     1       0         0      0      1
  alligator    0     1       0         0      1      1

64
A Common Generalization Effect in Neural Net Learning

(Figure: performance vs. training time; the training-set curve keeps improving while the test-set curve worsens after a plateau.)

  • After a certain plateau, performance on the test set gets worse
  • Given large amounts of input, the network begins to memorize individual input-output pairs
  • It stores the entire training set, rather than generalizing over it. (Rich & Knight)
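The standard remedy is early stopping: track performance on a held-out test set during training and keep the weights from the point where it was best. A schematic sketch of that idea (the `train_one_epoch` and `evaluate` functions, and a `model` that supports `.copy()` such as a NumPy weight array, are placeholder assumptions):

```python
def early_stopping(model, train_one_epoch, evaluate, patience=5, max_epochs=1000):
    """Stop training once held-out performance has not improved for
    `patience` consecutive epochs; return the best weights seen."""
    best_score, best_weights, waited = float("-inf"), model.copy(), 0
    for epoch in range(max_epochs):
        train_one_epoch(model)       # one pass over the training set
        score = evaluate(model)      # performance on the held-out test set
        if score > best_score:
            best_score, best_weights, waited = score, model.copy(), 0
        else:
            waited += 1              # test performance plateaued or worsened
            if waited >= patience:
                break                # the network has begun to memorize
    return best_weights
```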
65
Applications of Connectionism
  • Transforming text to speech (wksht D)
  • Teaching sound discrimination to non-native
    speakers
  • Language processing (past tense)
  • Decision systems
  • Teaching reading

66
Distinguishing /r/ from /l/
  • The following four slides are from a talk entitled
  • "Intervention Strategies that Promote Learning: Their Basis and Use in Enhancing Literacy"
  • from the Center for the Neural Basis of Cognition

67
Learning to identify speech sounds
  • Key hypothesis: the brain reinforces whatever pattern of neural representation is elicited.
  • Training that elicits undesired neural representations may be counterproductive, so ensure correct perception during training!
  • Findings:
  • Learning without feedback requires correct perception.
  • Learning with feedback occurs effectively, whether or not correct perception is ensured.

68
Learning to distinguish /l/ and /r/: Behavioral experiment
  • Train Japanese natives on normal vs. adaptively exaggerated /l/ and /r/.
  • After only 3 training sessions, adaptive training yields much better performance!

(Figure: two panels plotting the proportion of tokens classified as /r/ along a lock-rock continuum, before and after training.)
69
Learning the /l/-/r/ distinction: Neural network model

(Figure: a network mapping acoustic input units to percept units.)

Nearby units model /l/ and /r/ acoustic inputs. English training on /l/ and /r/ learns 2 percepts. Japanese training maps both to a single percept. Later training on /l/ and /r/ reinforces this output, preventing learning of the English contrast.

But training on exaggerated inputs learns the /l/-/r/ contrast successfully, and retains it even under later training on normal /l/ and /r/ input.
70
Distinguishing /l/ from /r/: Functional magnetic resonance imaging (fMRI) study

Auditory brain areas in English speakers habituate to a stream of similar speech input (load), but dishabituate to oddballs that vary only by the sound /r/ (road).

(Figure: schematic of the acoustic input, a long run of 'load' tokens with occasional 'road' oddballs followed by a 14-sec post-oddball time window; graphs of % signal change from average vs. post-oddball time (1.6-14.4 sec) for left and right auditory cortex, each p < .0001. Areas in white show parts of auditory cortex that respond transiently to oddballs; the graphs show the time course of this response (arrows show peak) relative to baseline (dashed line).)
Future work will determine if the same auditory
areas in Japanese speakers respond to /l/ vs. /r/
oddballs after training, so as to test whether
training modifies perceptual representations
learned in childhood, versus downstream,
non-acoustic processes.
71
Language Processing: the past tense (Rumelhart & McClelland)
  • Net is trained on both regular and irregular past tense forms
  • Training input: go, stand, look
  • Training output: went, stood, looked
  • no knowledge of verb stems (e.g., that looked decomposes into look and -ed)
  • explicitly coded word boundary information

72
Language Processing: the past tense (Rumelhart & McClelland)
  • Testing sample: 86 unseen low-frequency verbs (14 irregular, 72 regular)
  • Performance:
  • Irregulars: 78.6% error rate
  • Regulars: 33.3% error rate
  • for 6 regular verbs it produced no response (it cannot generalize to V+ed)
  • strange errors: squat → squakt, mail → membled, tour → toureder, shape → shipt, brown → brawned

73
Issues in Evaluating Connectionism
  • Implementational connectionism: PDP models have to be able to implement symbolic structures in order to enable them to manipulate mental representations with constituent structure
  • Eliminative connectionism: once PDP models are fully developed, they will replace symbol-processing models as explanations of cognitive processes

74
Issues in Evaluating Connectionism
  • Neural networks are best suited to handle classification problems; they have not been tried extensively on planning, language modeling, etc.
  • Still serious problems with their ability to handle phenomena that involve time
  • Inability of the network to capture generalization

75
Issues in Evaluating Connectionism
  • Inability to deal with infinite sets that have no finite sample for inductive modeling
  • In such dealings, humans are guided by a knowledge of which similarities are important and which are spurious
  • With pairs like (5 × 6) + 2 = 32, (8 × 4) + 7 = 39, and (105 × 72) + 3 = 7563, the net has a potentially infinite number of pairs and no knowledge of the structure of the arithmetic expression
  • scaling: PDP networks require a large number of examples for tasks a human does with one example (e.g., face recognition)

76
References
  • Frisby, J.P. and J.L. Clatworthy. 1975. Learning to see complex random-dot stereograms. Perception, 4, 173-8.
  • Interactive Tutorial: Building Blocks of the Nervous System. http://www.wwnorton.com/gleitman/ch2/tutorials/2tut2.htm
  • Johnson-Laird, P. 1988. The Computer and the Mind. Harvard Univ. Press.
  • Julesz, B. 1971. Foundations of Cyclopean Perception. Univ. of Chicago Press.
  • Marr, D. and T. Poggio. 1976. Co-operative computation of stereo disparity. Science, 194, 283-7.
  • Rich, E. and K. Knight. 1991. Artificial Intelligence. 2nd Edition. McGraw-Hill.
  • Rumelhart, D., G. Hinton, and R. Williams. 1986. Learning internal representations by error propagation. In Rumelhart, McClelland, et al.
  • Rumelhart, D., J. McClelland, and the PDP Research Group. 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press.
  • http://neuromod.uva.nl/courses/connectionism1999/intro/sld029.htm

77
References (2)
  • Rumelhart, D. and J. McClelland. 1986. Learning the past tense of English verbs. In Rumelhart, D., J. McClelland, and the PDP Research Group, vol. 2.