Title: Connections
1 Chapter 7
- Connections
- Associations
- Neural Networks
- Parallel Distributed Processing (PDP)
2 Origin of Parallel Distributed Processing
- Work in PDP began as an effort to computationally model how the networks of neurons in the brain might contribute to thought.
3 Neurons
- Neurons are cells of the nervous system.
- Neurons are specialized to carry "messages" through an electrochemical process.
- Neurons send these messages through a single axon and receive messages through multiple dendrites.
- The human brain has about 100 billion neurons.
4 The Neuron
5 Differences between axons and dendrites
- Axons
- Take information away from the cell body
- Smooth Surface
- Generally only 1 axon per cell
- Branch further from the cell body
- Dendrites
- Bring information to the cell body
- Rough Surface (dendritic spines)
- Usually many dendrites per cell
- Branch near the cell body
6 Axonal Transmission
- A nerve cell at rest is electrically negatively charged relative to its surroundings.
- A stimulus causes gates in the axonal membrane to open, allowing positively charged sodium ions to rush into the cell.
- When the cell reaches its maximal positive state, these gates close.
- Another set of gates allows positively charged potassium ions to leave the cell, restoring the resting potential (repolarization).
7 Synaptic Transmission
8 Synaptic Junctions
- Two types
- Excitatory
- depolarization occurs at postsynaptic membrane sites, raising the postsynaptic membrane potential
- Inhibitory
- hyperpolarization occurs at postsynaptic membrane sites, lowering the postsynaptic membrane potential
- polarization is the production of a reverse electromotive force
9 From Neuron to Perceptron
- In the 1950s and 60s, researchers tested computational models of networks resembling networks of neurons.
- The model was called a perceptron.
10 From Neuron to Neural Network
[Figure (Rich & Knight)]
11 A Perceptron
[Figure: inputs x1 and x2 feed a threshold unit Σ through connections weighted w1 and w2]
x1 and x2 are inputs; w1 and w2 are connection weights; Σ is the weighted sum of the inputs. If Σ exceeds the threshold, the perceptron will fire. (Rosenblatt)
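The firing rule just described can be sketched in a few lines of Python (an illustrative sketch; the function name and the list representation of inputs are my own, not Rosenblatt's notation):

```python
def perceptron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of the inputs exceeds the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

# Two inputs, both weights 1, threshold 1.5: fires only when both inputs are on.
print(perceptron([1, 1], [1, 1], 1.5))  # 1 (fires)
print(perceptron([1, 0], [1, 1], 1.5))  # 0 (does not fire)
```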
12 The AND and OR relations are natural input to a perceptron
- AND (with w1 = w2 = 1, a threshold between 1 and 2 separates the cases):
    x1  x2  Sum (Σ)  Fires?
    0   0   0        no
    0   1   1        no
    1   0   1        no
    1   1   2        yes
- OR (with w1 = w2 = 1, a threshold between 0 and 1 separates the cases):
    x1  x2  Sum (Σ)  Fires?
    0   0   0        no
    0   1   1        yes
    1   0   1        yes
    1   1   2        yes
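The two tables can be checked with a short sketch. The weights of 1 come from the tables; the particular thresholds (1.5 for AND, 0.5 for OR) are illustrative choices within the separating ranges:

```python
def fires(x1, x2, threshold, w1=1, w2=1):
    """A two-input perceptron: fire when the weighted sum exceeds the threshold."""
    return int(x1 * w1 + x2 * w2 > threshold)

# AND: a threshold between 1 and 2 (here 1.5) is exceeded only by input (1, 1).
print([fires(x1, x2, 1.5) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 0, 0, 1]

# OR: a threshold between 0 and 1 (here 0.5) is exceeded whenever either input is on.
print([fires(x1, x2, 0.5) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 1]
```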
13 Sample AND Relation
- AND is the English 'and,' meaning both. For example, to take PSYC 301 you must have taken PSYC 220 and PSYC 203.
    220   203   Take 301?
    no    no    no
    no    yes   no
    yes   no    no
    yes   yes   yes
14 Sample OR Relation
- OR is 'inclusive or,' meaning either, possibly both. For example, an insurance policy states that insurance premiums will be waived in the event of sickness or unemployment.
    Sick   Unemployed   Premium Waiver?
    no     no           no
    no     yes          yes
    yes    no           yes
    yes    yes          yes
15 Perceptrons and Learnability
- A perceptron, faced with new data, can reset its weights.
- This means it can learn.
16 Learnability
- The symbol/rule systems (logic, rules, concepts, analogies, images) have proposed only one explanation of human learning:
- innate knowledge and parameter setting (as in Universal Grammar)
- The perceptron provided another explanation of learning, one that involves learning entirely from experience.
17 The exclusive OR relation cannot be computed by a perceptron
- Exclusive or (XOR), with w1 = w2 = 1:
    x1  x2  Sum (Σ)  Should fire?
    0   0   0        no
    0   1   1        yes
    1   0   1        yes
    1   1   2        no
- No single threshold works: it would have to be below 1 so that (0, 1) and (1, 0) fire, yet at least 2 so that (1, 1) does not.
- An example of an XOR relation: at a restaurant, the lunch special is a cheeseburger with either salad or french fries, but not both.
- Minsky and Papert.
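Minsky and Papert's point can be illustrated by brute force (a sketch, not their analytical argument): no pair of weights and threshold, here searched over a coarse grid, lets a single threshold unit compute XOR.

```python
import itertools

def fires(x1, x2, w1, w2, threshold):
    # A single threshold unit: fire when the weighted sum exceeds the threshold.
    return int(x1 * w1 + x2 * w2 > threshold)

xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

# Search a grid of weights and thresholds from -3.0 to 3.0 in steps of 0.1.
values = [i / 10 for i in range(-30, 31)]
found = any(
    all(fires(x1, x2, w1, w2, t) == out for (x1, x2), out in xor_table.items())
    for w1, w2, t in itertools.product(values, repeat=3)
)
print(found)  # False: no single unit in the grid computes XOR
```

The grid search is only suggestive; the underlying reason is that XOR is not linearly separable, so no weights at all would work.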
18 The Fate of the Perceptron
- Due to Minsky and Papert's XOR argument, interest in the perceptron waned.
- In the 1980s, interest in neural networks revived with the notions of
- hidden units in the network
- backpropagation
19 Stereopsis: assigning structure to data
- Interest in artificial neural nets revived with studies of stereoscopic vision.
- The perception of depth involves assigning structure (our perception of the object: its distance, its depth, its structure) to data (the object in the physical world).
- Whenever structure is assigned to data, the question arises as to whether the assignment is done top-down (from structure to data) or bottom-up (from data to structure).
20 Top-down vs. Bottom-up Processing
- Working top-down, a system uses knowledge of structure to predict the details to be found in the data.
- Working bottom-up, a system uses the data to predict high-level structure.
- With stereopsis, the question is whether we work bottom-up from simple disparities between the right and left images, or whether we anticipate the image in depth by knowing something about its structure in advance.
21 The Perception of Unstructured Data
- Bela Julesz (1971) showed that pictures composed of random dots could produce depth effects. (worksheet)
- This implies that stereopsis can work bottom-up from simple disparities in the data alone.
- Even telling subjects what they should see does not speed up the perception of depth (Frisby and Clatworthy).
22 Bottom-up vs. Top-down
- Julesz's discovery indicated that depth perception is driven by low-level brain activity making sense of the data, rather than by high-level principles or rules imposing structure on the data.
- Marr and Poggio (1976) built a computer program for stereopsis that relies on two low-level constraints they believed could be wired into the brain to guide the matching of the two images.
23 Stereo Matching
- 3 lines of sight from each eye give nine possible points of fusion.
- There are 3 adjacent points on the surface of the object.
- How do we resolve to these 3 out of the possible 9?
[Figure: candidate fusion points arrayed in depth]
24 Uniqueness Constraint on Stereo Matching
- A point in one image can normally be matched with one and only one point in the other image.
- (Each link in the figure is inhibitory: if one fusion point is active, the points to which it is linked are not active.)
25 Continuity Constraint on Stereo Matching
- Two adjacent points in an image will tend to represent points at about the same depth.
- (Each link in the figure is excitatory, and so if one fusion point is active it excites all those to which it is connected.)
26 Marr and Poggio's Program for Stereopsis
- The brain has to apply these constraints in parallel (simultaneously).
- Marr and Poggio designed an array of processors to handle the fusion of images under these constraints.
- Each processor unit does the same computation on the particular values that are local to it (its own value and those of its neighbors).
- The output from each processor is then used on a fresh cycle of activity.
- The system continues to compute until it settles down to stable values at each processor.
27
- Connectionism was back in business.
- Marr and Poggio's program was followed by
- Feldman's (1981) model of visual representation in memory
- McClelland and Rumelhart's (1981) model of letter perception
28 Key Elements in Current Connectionist Models
- hidden units/layers
- distributed representation
- parallel processing
- parallel constraint satisfaction
- excitatory and inhibitory links
- Learning
- Hebbian learning
- delta rule
- backpropagation
- feedforward network
- spreading activation
- relaxation/settling
- graceful degradation
29 Hidden Units: A Simple XOR Network with One Hidden Unit
[Figure: input units x1 and x2, a hidden unit with threshold 1.5, and an output unit with threshold .5. Each input connects to the hidden unit and to the output unit with weight 1; the hidden unit connects to the output unit with weight -2.]
The numbers on the arrows show the strengths of connections among units. The numbers in the circles show the thresholds of the units. If both input units are on, the hidden unit's threshold will be tripped, sending an inhibitory signal (here -2) to the output unit. Since the output unit's net input (1 + 1 - 2 = 0) is below its .5 threshold, the output unit will not fire. (Rumelhart, Hinton, & Williams)
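The network just described can be simulated directly (an illustrative sketch; the unit and function names are my own):

```python
def unit(total, threshold):
    """A threshold unit: fire (1) when its net input exceeds its threshold."""
    return 1 if total > threshold else 0

def xor_net(x1, x2):
    # Hidden unit: weight 1 from each input, threshold 1.5 (on only for (1, 1)).
    hidden = unit(x1 * 1 + x2 * 1, 1.5)
    # Output unit: weight 1 from each input, weight -2 from the hidden unit,
    # threshold .5. The hidden unit vetoes the (1, 1) case.
    return unit(x1 * 1 + x2 * 1 + hidden * -2, 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))  # fires exactly when x1 != x2
```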
30 A Local Connectionist Network
- Involves a one-to-one correspondence between concepts and hardware units.
- Each unit is given an identifiable interpretation in terms of specifiable concepts or propositions.
[Figure: units labeled "likes programming," "likes parties," "computer geek," "outgoing," "shy"]
31 Distributed Connectionist Networks
- Each entity is represented by a pattern of activity distributed over many computing elements.
- Each computing element is involved in representing many different entities.
- As an example, look at the Necker cube in B on the worksheet. How many ways can you interpret the cube?
32 The Necker Cube
- There are 2 global interpretations of the Necker cube:
- in one, point a is the front upper left point
- in the other, point a is the back upper left point
- In each interpretation, the cube has 8 points.
- But the orientation of each point differs depending on the interpretation.
33
- When point a is the front upper left (FUL) point:
- point b is the front lower left (FLL) point
- point c is the front lower right (FLR) point
- point d is the back upper right (BUR) point
- point . . .
- (wksht B)
34
- The interpretation of the Necker cube as a cube with its front facing downward (i.e., with point a as FUL) depends on all of the points in the cube having one and only one orientation.
- In other words, this interpretation is distributed over 8 orientations that must be on together in order for us to interpret the cube as having its front facing downward.
36 Parallel Processing
- To interpret the cube in one orientation, we do not
- first establish a as FUL
- then establish b as FLL
- then establish c as FLR
- in a serial fashion.
- We interpret all the points simultaneously, in parallel.
37 Parallel Processing (2)
- A single computer processor
- Time: nanoseconds (1 billionth of a sec.) per operation
- Mode: serial (consecutive)
- The human brain
- Time: milliseconds (1 thousandth of a sec.) per operation
- Mode: parallel (simultaneous)
38 Parallel Constraint Satisfaction
- There are 3 constraints on the relations between the points in the Necker cube:
- Each point can have only one label at a time (point a cannot be both FUL and BUL simultaneously).
- Each point depends on the interpretation of its near neighbors.
- Each label can be used only once in an interpretation (e.g., no 2 points can be FUL).
- These constraints all hold simultaneously.
39 Parallel Constraint Satisfaction in Word Recognition
40 Constraints on Word Recognition
[Figure: three levels of units: words (RED, KEY, BEE), first letters (R, K, T), and the features of those letters]
41 Word Recognition
- Constraints operate at the feature, letter, and word levels.
- Each word is represented by a processing unit.
- Each letter at each position in a word is represented by a processing unit.
- Some pairs of units excite each other:
- "the word is RED" and "the 1st letter is R"
- Inconsistent pairs inhibit each other:
- "the word is KEY" and "the 1st letter is R"
- (McClelland & Rumelhart)
42 Excitatory and Inhibitory Links
- Each point can have only one label at a time
- → negative link between a point's label in one interpretation and the other
- Each point depends on the interpretation of its near neighbors
- → positive link between a point and its 3 closest neighbors
- Each label can be used only once in an interpretation
- → negative link between 2 identical labels representing different points
43 Representation of Links as a Connectivity Matrix
         u1  u2  u3  u4  u5  u6  u7  u8
    u1    0   0   0   0   0  -5  -5   0
    u2    0   0   0   0   6   5   0   6
    u3    0   0   0   0   6   5   0   6
    u4    0   0   0   0   6   5   0   6
    u5    0   0   0   0   6   5   0   6
- u5 strongly excites u2
- u6 strongly inhibits u1
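A connectivity matrix like this drives settling: on each cycle, every unit sums its external input plus matrix-weighted input from the currently active units, and turns on when the total exceeds its threshold. Here is a toy 3-unit sketch; the matrix values are hypothetical, merely echoing the excitatory (6) and inhibitory (-5) magnitudes above.

```python
# W[i][j] is the strength of the link from unit j to unit i:
# u1 and u2 inhibit each other (like rival interpretations); u3 excites u2.
W = [
    [0, -5, 0],
    [-5, 0, 6],
    [0, 0, 0],
]
external = [1, 1, 1]  # constant external input to each unit

def settle(W, external, steps=10, threshold=0.5):
    state = [0, 0, 0]
    for _ in range(steps):
        totals = [external[i] + sum(W[i][j] * state[j] for j in range(3))
                  for i in range(3)]
        state = [1 if t > threshold else 0 for t in totals]
    return state

print(settle(W, external))  # [0, 1, 1]: u2 wins because u3 backs it up
```

After two cycles the state stops changing, which is what "settling" means in the later slides.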
44 Hebb: association between neurons
- "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
- From The Organization of Behavior, 1949.
45 Hebb: a cell assembly
- Many repetitions of a sensory event will lead to the gradual building up of a set of perhaps 25 to 100 neurons in a cell assembly.
- "One assembly will form connections with others, and it may therefore be made active by one of them in the total absence of the adequate stimulus. In short, the assembly activity is the simplest case of an image or an idea . . . ."
46 Hebbian Learning
- A type of learning in which the weight on a link between two units is increased if both units are active at the same time.
- Expressed as the delta rule.
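The rule "strengthen the link when both units are active together" is a one-liner (a sketch; the learning rate of 0.25 is an arbitrary illustrative choice):

```python
def hebbian_update(w, a_i, a_j, rate=0.25):
    """Hebbian learning: the weight grows only when both units are active."""
    return w + rate * a_i * a_j

w = 0.0
for _ in range(4):               # four co-activations of the two units
    w = hebbian_update(w, 1, 1)
print(w)                         # 1.0: repeated co-activation strengthened the link
w = hebbian_update(w, 1, 0)      # one unit inactive: no change
print(w)                         # 1.0
```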
47 The Delta Rule
- Δwij = g(ai(t), ti(t)) · h(oj(t), wij)
- A change (Δ) in the weight of a link between unit i and unit j is the product of
- the function g of the activation of unit i (ai at time t) and its teaching input (ti(t)), and
- the function h of the output value of unit j and the connection strength between unit i and unit j (wij).
48 The Delta Rule (2)
- A change in the weight from unit i to unit j is the product of
- the value of the teaching input
- the value of the current activation state of unit i
- the value of the output of unit j
- the current weight between unit i and unit j
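The slide leaves g and h unspecified. One common concrete instance (an assumption of this sketch, not the slide's definition) takes g as the learning rate times the error at unit i, and h as simply the output of unit j:

```python
def delta_update(w_ij, a_i, t_i, o_j, rate=0.5):
    # g(a_i, t_i) taken as rate * (t_i - a_i): the error at unit i.
    # h(o_j, w_ij) taken as o_j: credit is assigned to active sending units.
    return w_ij + rate * (t_i - a_i) * o_j

# Unit i is under-activated (a_i = 0) relative to its teaching input (t_i = 1)
# and unit j is on (o_j = 1), so the weight from j to i increases.
print(delta_update(0.25, a_i=0, t_i=1, o_j=1))  # 0.75
# With no error (t_i == a_i), the weight is unchanged.
print(delta_update(0.25, a_i=1, t_i=1, o_j=1))  # 0.25
```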
49 The Delta Rule (3)
[Figure: unit j sends its output (outputj) across the weight wij to unit i, which has activation ai and teaching input ti]
- If there is no teaching input, the weights change in proportion to the change in ai.
50 Advantages of Hebbian Learning
- founded on known neurological activity (neurologically real)
- can occur without explicit correction (without a teacher)
51 Feedforward Networks
- A network in which information flows only from the input layer to the output layer, but not back.
- input layer: the set of units in a network that is directly activated by the input
- output layer: the last layer of units
- hidden layer: an intermediate layer that serves to readjust threshold weights
52 Learning by Backpropagation
- 1. Weights between units are randomly assigned.
- 2. There is an initial test phase in which an input activation is introduced and propagates through the network to yield an output.
- 3. This output is compared with the required output given by an external source (teacher). If there is a difference, an error signal is calculated.
- 4. The error signal causes the weights to change backward through the network.
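The steps above can be sketched as a tiny 2-2-1 network trained on XOR. Sigmoid units, squared error, the learning rate, and the epoch count are all assumptions of this sketch; the slide fixes none of them.

```python
import math
import random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

random.seed(0)
# Step 1: weights between units are randomly assigned.
W1 = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(2)]  # input -> hidden
b1 = [0.0, 0.0]
W2 = [random.uniform(-1, 1), random.uniform(-1, 1)]                      # hidden -> output
b2 = 0.0
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
rate = 0.5

def forward(x):
    # Step 2: input activation propagates through the network to yield an output.
    h = [sigmoid(W1[i][0] * x[0] + W1[i][1] * x[1] + b1[i]) for i in range(2)]
    o = sigmoid(W2[0] * h[0] + W2[1] * h[1] + b2)
    return h, o

def total_error():
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

initial_error = total_error()
for _ in range(5000):
    for x, t in data:
        h, o = forward(x)
        # Step 3: compare output with the teacher's target; compute error signals.
        delta_o = (o - t) * o * (1 - o)
        delta_h = [delta_o * W2[i] * h[i] * (1 - h[i]) for i in range(2)]
        # Step 4: the error signal changes the weights backward through the network.
        for i in range(2):
            W2[i] -= rate * delta_o * h[i]
            W1[i][0] -= rate * delta_h[i] * x[0]
            W1[i][1] -= rate * delta_h[i] * x[1]
            b1[i] -= rate * delta_h[i]
        b2 -= rate * delta_o
print(initial_error, total_error())  # the error shrinks with training
```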
53 Disadvantages of Backpropagation
- not neurologically realistic
- requires an external teaching source
54 Relaxation and Settling
- Relaxation: the procedure whereby a system settles into a locally optimal state in which as many as possible of the constraints are satisfied.
- Suppose all units are off, and you focus on the FLL unit on the left of the network. This unit receives positive input from the diagram. The activation of this FLL unit spreads to FUL, BLL, and FLR on the left of the network, turning them on, and on to the units for which they have excitatory links. Why?
55 Relaxation and Settling (2)
- The activation of this FLL unit also spreads to FLL and BLL toward the right of the network, turning them off. Why?
- At this point, the network 'settles' into the interpretation that corresponds to the network of 8 units on the left.
- Gazing at the lower left corner of the diagram may turn on the BLL unit near the center of the network, turning on the BLL, FLL, and BLR units with which it has excitatory links, and turning off its competitors (FLL and BLL in the network on the left).
- At this point, the network settles into the other interpretation.
56 Graceful Degradation
- Because knowledge is distributed, i.e.,
- every unit is involved in the storage of patterns of connections
- each pattern of connections involves many units
- if some units or connections are lost, the stored knowledge will be degraded but not entirely lost.
- In other CRUM models, the failure of one component means a loss of knowledge.
57 Rumelhart: metaphors for mind
- New: the brain
- slow (milliseconds)
- parallel processing
- parallel constraint satisfaction
- knowledge stored in links between units
- accommodates approximate matching (by prototype)
- Old: the computer
- fast (nanoseconds)
- serial processing
- serial constraint satisfaction
- knowledge stored as facts and rules
- demands exact matching
58 Approximate Matching
- PDPs have content-addressable memory:
- many patterns (units and links) are stored
- the network can perform a match when only a portion of the pattern is given
- the network finds the closest match
- Consider the Necker cube. How many units and links do you need to be given to match one orientation of the cube?
59 Issues in Rumelhart
60 Architecture: more than just input, hidden, and output units
- Human activity that takes place in time involves sequential as well as parallel behavior (e.g., movement, speech).
- How do PDPs blend the sequential and the parallel?
- Plan units tell the network which sequence it is producing (Jordan, 1986).
- Context units keep track of where the system is in the sequence (Jordan, 1986; Elman, 1988).
61 The Scaling Problem
- Moderately difficult problems require a few hundred thousand input examples.
- "One view grows from viewing learning and evolution as continuous with one another. On this view the fact that networks take a long time to learn is to be expected because we normally compare their behavior to organisms that have long evolutionary histories" (Rumelhart, p. 235).
- Compare innateness and universal grammar.
62 The Generalization Problem (wksht C)
- How do neural networks perform in inducing the best generalization from input data?
- Rumelhart chose the simplest, most robust network consistent with the observations made.
- But . . . .
63 An Example of Generalization (Rich and Knight)
               hair  scales  feathers  flies  water  eggs
    dog         1     0       0         0      0      0
    cat         1     0       0         0      0      0
    bat         1     0       0         1      0      0
    whale       1     0       0         0      1      0
    canary      0     0       1         1      0      1
    robin       0     0       1         1      0      1
    ostrich     0     0       1         1      0      1
    snake       0     1       0         0      0      1
    lizard      0     1       0         0      0      1
    alligator   0     1       0         0      1      1
64 A Common Generalization Effect in Neural Net Learning
- After a certain plateau, performance on the test set gets worse.
- Given large amounts of input, the network begins to memorize individual input-output pairs.
- It stores the entire training set, rather than generalizing over it. (Rich & Knight)
[Figure: performance vs. training time, plotted separately for the training set and the test set]
65 Applications of Connectionism
- Transforming text to speech (wksht D)
- Teaching sound discrimination to non-native speakers
- Language processing (past tense)
- Decision systems
- Teaching reading
66 Distinguishing /r/ from /l/
- The following four slides are from a talk entitled "Intervention Strategies that Promote Learning: Their Basis and Use in Enhancing Literacy" from the Center for the Neural Basis of Cognition.
67 Learning to identify speech sounds
- Key hypothesis: The brain reinforces whatever pattern of neural representation is elicited.
- Training that elicits undesired neural representations may be counterproductive, so ensure correct perception during training!
- Findings:
- Learning without feedback requires correct perception.
- Learning with feedback occurs effectively, whether or not correct perception is ensured.
68 Learning to distinguish /l/ and /r/: Behavioral experiment
- Train Japanese natives on normal vs. adaptively exaggerated /l/ and /r/.
- After only 3 training sessions, adaptive training yields much better performance!
[Figure: proportion of "lock" and "rock" tokens classified as /r/, for normal vs. adaptive training]
69 Learning the /l/-/r/ distinction: Neural network model
[Figure: network mapping acoustic input units to percept units]
- Nearby units model /l/ and /r/ acoustic inputs. English training on /l/ and /r/ learns 2 percepts. Japanese training maps both to a single percept. Later training on /l/ and /r/ reinforces this output, preventing learning of the English contrast.
- But training on exaggerated inputs learns the /l/-/r/ contrast successfully, and retains it even under later training on normal /l/ and /r/ input.
70 Distinguishing /l/ from /r/: Functional magnetic resonance imaging (fMRI) study
- Auditory brain areas in English speakers habituate to a stream of similar speech input ("load"), but dishabituate to oddballs that vary only by the sound /r/ ("road").
[Figure: schematic of the acoustic input: a long stream of "load" tokens with occasional "road" oddballs, and a 14-sec post-oddball time window. Graphs: % signal change from average in left and right auditory cortex over post-oddball time (1.6 to 14.4 sec), p < .0001.]
Areas in white show parts of auditory cortex that
respond transiently to oddballs. Graphs show the
time course of this response (arrows show peak)
relative to baseline (dashed line).
Future work will determine if the same auditory
areas in Japanese speakers respond to /l/ vs. /r/
oddballs after training, so as to test whether
training modifies perceptual representations
learned in childhood, versus downstream,
non-acoustic processes.
71 Language Processing: the past tense (Rumelhart & McClelland)
- The net is trained on both regular and irregular past tense forms.
- Training input: go, stand, look
- Training output: went, stood, looked
- no knowledge of verb stems (e.g., that looked decomposes into look and -ed)
- explicitly coded word boundary information
72 Language Processing: the past tense (Rumelhart & McClelland)
- Testing sample: 86 unseen low-frequency verbs (14 irregular, 72 regular)
- Performance:
- Irregulars: 78.6% error rate
- Regulars: 33.3% error rate
- for 6 regular verbs it produced no response (it cannot generalize to V+ed)
- strange errors: squat → squakt, mail → membled, tour → toureder, shape → shipt, brown → brawned
73 Issues in Evaluating Connectionism
- Implementational connectionism: PDP models have to be able to implement symbolic structures in order to enable them to manipulate mental representations with constituent structure.
- Eliminative connectionism: once PDP models are fully developed, they will replace symbol-processing models as explanations of cognitive processes.
74 Issues in Evaluating Connectionism (2)
- Neural networks are best suited to handle classification problems; they have not been tried extensively on planning, language modeling, etc.
- Still serious problems with their ability to handle phenomena that involve time.
- Inability of the network to capture generalization.
75 Issues in Evaluating Connectionism (3)
- Inability to deal with infinite sets that have no finite sample for inductive modeling.
- In such dealings, humans are guided by a knowledge of which similarities are important and which are spurious.
- With pairs like (5 × 6) + 2 = 32, (8 × 4) + 7 = 39, (105 × 72) + 3 = 7563, the net has a potentially infinite number of pairs and no knowledge of the structure of the arithmetic expression.
- Scaling: PDPs require a large number of examples for tasks a human does with one example (e.g., face recognition).
76 References
- Frisby, J.P. and J.L. Clatworthy. 1975. Learning to see complex random-dot stereograms. Perception, 4, 173-8.
- Interactive Tutorial: Building Blocks of the Nervous System. http://www.wwnorton.com/gleitman/ch2/tutorials/2tut2.htm
- Johnson-Laird, P. 1988. The Computer and the Mind. Harvard Univ. Press.
- Julesz, B. 1971. Foundations of Cyclopean Perception. Univ. of Chicago Press.
- Marr, D. and T. Poggio. 1976. Co-operative computation of stereo disparity. Science, 194, 283-7.
- Rich, E. and K. Knight. 1991. Artificial Intelligence. 2nd Edition. McGraw Hill.
- Rumelhart, D., G. Hinton, and R. Williams. 1986. Learning internal representations by error propagation. In Rumelhart, McClelland et al.
- Rumelhart, D., J. McClelland and the PDP Research Group. 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press.
- http://neuromod.uva.nl/courses/connectionism1999/intro/sld029.htm
77 References (2)
- Rumelhart, D. and J. McClelland. 1986. Learning the past tense of English verbs. In Rumelhart, D., J. McClelland and the PDP Research Group, vol. 2.