Title: Connectionist Models: Basics
1Connectionist Models Basics
- Srini Narayanan
- CS182/CogSci110/Ling109
- Spring 2004
2Lecture Overview
- Spreading Activation: Toward a Model
- Connectionist Models: Introduction
- Model of a neuron
- McCulloch-Pitts Neuron
- Activation Functions
- Node Types: Sigma-Pi, Temporal
- Network Types
- Perceptron and Feed-forward Nets
- Hopfield Nets
- Pattern generator networks
- Winner Take All Networks
- Triangle Nodes
- Representing Concepts
- Connectionist Encoding of Concepts
- Distributed vs. Localist representations
- Coarse Coding
3Cross-modal priming effects
4Results
5Toward a Model of Priming
6Toward a Model of Priming
Mental connections are implemented as neural connections.
7Toward a Model of Priming
TRIANGLE NODES
8Toward a Model of Priming
Mutual Inhibition
9Toward a Model of Priming
Initially both noun and verb get activation
(bottom-up)
10Toward a Model of Priming
Flower gets primed with some residual activation
11Toward a Model of Priming
Strong activation from other context
12Toward a Model of Priming
Strong activation from other context
Mutual inhibition kills the bottom-up noun
activation
13Toward a Model of Priming
No activation from the noun
Strong activation from other context
Mutual inhibition kills the bottom-up noun
activation
14Toward a Model of Priming
Strong activation from other context
No priming effect
15More complicated structure
- The man saw the girl with the telescope.
- The women discussed the dogs on the beach.
- She threw a ball for charity.
- The cop arrested by the police turned out to be unreliable.
- The complex houses married and single students.
16Other results that can be explained by the simple model
- Word superiority effect
- Isolated letters are harder than letters in context
- One of the first parallel spreading activation models (McClelland et al.)
- Top-down and bottom-up processing
- Priming effects in sub-word phones
- Semantic priming effects
- Gender priming from syntax
- And others...
17Link to Vision: The Necker Cube
18Basic Ideas behind the model
- Parallel activation streams.
- Top-down and bottom-up activation combine to determine the best-matching structure.
- Triangle nodes bind features of objects to values.
- Mutual inhibition and competition between structures.
- Mental connections are active neural connections.
19Can we formalize/model these intuitions?
- What is a neurally plausible computational model of spreading activation that captures these features?
- What does semantics mean in neurally embodied terms?
- What are the neural substrates of concepts that underlie verbs, nouns, and spatial predicates?
20Lecture Overview
- Spreading Activation: Toward a Model
- Connectionist Models: Introduction
- Model of a neuron
- McCulloch-Pitts Neuron
- Activation Functions
- Node Types: Sigma-Pi, Temporal
- Network Types
- Perceptron and Feed-forward Nets
- Hopfield Nets
- Pattern Generator Nets
- Winner Take All Networks
- Triangle Nodes
- Representing Concepts
- Connectionist Encoding of Concepts
- Distributed vs. Localist representations
- Coarse Coding
22Information Processing Abstraction
26A Node in a NN
- The weighted sum is called the net input to unit i, often written net_i.
- The function f is the unit's activation function.
- In the simplest case, f is the identity function, and the unit's output is just its net input. This is called a linear unit.
- If the output is based on a threshold (fire if net_i > threshold), then the unit is called a Threshold Linear Unit (TLU).
- f(net_i) is also sometimes written a_i.
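A minimal sketch of such a unit in Python (the function and variable names are illustrative, not from the slides): compute the net input as a weighted sum, then apply an activation function f.

```python
def unit_output(inputs, weights, f=lambda net: net):
    """Compute net_i as the weighted sum of the inputs, then apply the
    activation function f. With the identity f this is a linear unit."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return f(net)

# Linear unit: the output equals the net input.
print(unit_output([1, 0, 1], [0.5, 0.2, 0.3]))            # 0.8
# Threshold Linear Unit (TLU): fire only if net reaches a threshold of 1.0.
print(unit_output([1, 0, 1], [0.5, 0.2, 0.3],
                  f=lambda net: 1 if net >= 1.0 else 0))   # 0
```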
27Simple Threshold Linear Unit
28Simple Neuron Model
29A Simple Example
- a = x1·w1 + x2·w2 + x3·w3 + ... + xn·wn
- a = 1·x1 + 0.5·x2 + 0.1·x3
- x1 = 1, x2 = 0, x3 = 0
- Net input = 1·1 + 0.5·0 + 0.1·0 = 1
- Threshold = 2
- Net input - threshold < 0
- Output = 0
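The same arithmetic as a small Python check (weights, inputs, and threshold taken from the example above; the code itself is just illustrative):

```python
# Reproducing the example arithmetic for a threshold linear unit.
weights = [1.0, 0.5, 0.1]
inputs = [1, 0, 0]                                   # x1 = 1, x2 = 0, x3 = 0
net = sum(x * w for x, w in zip(inputs, weights))    # 1*1 + 0.5*0 + 0.1*0 = 1.0
threshold = 2.0
output = 1 if net >= threshold else 0                # 1.0 - 2.0 < 0, so the unit does not fire
print(net, output)                                   # 1.0 0
```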
30Operation of a simple Neuron
31Activation Functions
32Different Activation Functions
- Threshold Activation Function (step)
- Piecewise Linear Activation Function
- Sigmoid Activation Function
- Gaussian Activation Function
- Radial Basis Function
33Types of Activation functions
34The Sigmoid Function
(Plot: sigmoid curve; y-axis = a, x-axis = net_i)
35The Sigmoid Function
(Plot: sigmoid curve saturating at output = 1 and output = 0; y-axis = a, x-axis = net_i)
36The Sigmoid Function
(Plot: sigmoid curve; the region of greatest sensitivity to the input lies between the output = 0 and output = 1 asymptotes; y-axis = a, x-axis = net_i)
37Changing the exponent k(neti)
k > 1
k < 1
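The slides show the curves but not the formula; a common choice (an assumption here) is the logistic sigmoid 1/(1 + e^(-k·net)), where k controls the steepness:

```python
import math

def sigmoid(net, k=1.0):
    """Logistic sigmoid; k scales the net input and so controls the slope."""
    return 1.0 / (1.0 + math.exp(-k * net))

for k in (0.5, 1.0, 2.0):
    # Larger k -> steeper transition around net = 0; smaller k -> more gradual.
    print(k, [round(sigmoid(net, k), 3) for net in (-2, -1, 0, 1, 2)])
```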
38Radial Basis Function
39Sigma-Pi nodes
- The previous spatial summation function supposes that each input contributes to the activation independently of the others.
- That is, the contribution to the activation from input 1, say, is always a constant multiplier (w1) times x1.
- Suppose, however, that the contribution from input 1 depends also on input 2, and that the larger input 2 is, the larger input 1's contribution becomes.
- The simplest way of modelling this is to include a term in the activation like w12·x1·x2, where w12 > 0 (for a diminishing influence of input 2 we would, of course, have w12 < 0). In general we might have terms containing all possible pairs of inputs and also a term in the three inputs together:
- w1x1 + w2x2 + w3x3 + w12x1x2 + w23x2x3 + w13x1x3 + w123x1x2x3
40Sigma-Pi units
41Sigma-Pi Unit
42Biological Evidence for Sigma-Pi Units
- axo-dendritic synapse: The stereotypical synapse consists of an electro-chemical connection between an axon and a dendrite - hence it is an axo-dendritic synapse.
- presynaptic inhibition: However, there is a large variety of synaptic types and connection groupings. Of special importance are cases where the efficacy of the axo-dendritic synapse between axon 1 and the dendrite is modulated (inhibited) by the activity in axon 2, via the axo-axonic synapse between the two axons. This might therefore be modelled by a quadratic term like w12x1x2.
- synapse cluster: Here the effects of the individual synapses will surely not be independent, and we should look to model this with a multilinear term in all the inputs.
43Biological Evidence for Sigma-Pi units
(Figure: axo-dendritic synapse, presynaptic inhibition, synapse cluster)
44Lecture Overview
- Spreading Activation: Toward a Model
- Connectionist Models: Introduction
- Model of a neuron
- McCulloch-Pitts Neuron
- Activation Functions
- Node Types: Sigma-Pi, Temporal
- Network Types
- Perceptron and Feed-forward Nets
- Hopfield Nets
- Winner Take All Networks
- Triangle Nodes
- Representing Concepts
- Connectionist Encoding of Concepts
- Distributed vs. Localist representations
- Coarse Coding
45Temporal Aspects
- Decay functions
- Explicit delays/time
- Temporal Summation
- Temporal AND
- Sequence and Recurrent connections
46Decay functions
47Temporal-AND
48Types of Neuron parameters
- The form of the activation function - e.g. linear, sigma-pi, cubic.
- The activation-output relation - linear, hard-limiter, or sigmoidal.
- The nature of the signals used to communicate between nodes - analogue or boolean.
- The dynamics of the node - deterministic or stochastic.
49Lecture Overview
- Spreading Activation: Toward a Model
- Connectionist Models: Introduction
- Model of a neuron
- McCulloch-Pitts Neuron
- Activation Functions
- Node Types: Sigma-Pi, Temporal
- Network Types
- Perceptron and Feed-forward Nets
- Hopfield Nets
- Winner Take All Networks
- Triangle Nodes
- Representing Concepts
- Connectionist Encoding of Concepts
- Distributed vs. Localist representations
- Coarse Coding
50The Perceptron
51The Perceptron
Input Pattern
52The Perceptron
Input Pattern
Output Classification
53A Pattern Classification
54The Input Pattern Space
55Pattern Space
- The space in which the inputs reside is referred to as the pattern space. Each pattern determines a point in the space by using its component values as space coordinates. In general, for n inputs, the pattern space will be n-dimensional.
- Clearly, for n-D, the pattern space cannot be drawn or represented in physical space. This is not a problem - we shall return to the idea of using higher-dimensional spaces later. However, the geometric insight obtained in 2-D will carry over (when expressed algebraically) into n-D.
56The Linear Separation of Classes
- Since the critical condition for classification occurs when the activation equals the threshold, it is useful to examine the geometric implication of this. Putting the activation equal to the threshold gives
- Σ wi·xi = θ
- In the 2-D case we are considering:
- w1·x1 + w2·x2 = θ
- x2 = -(w1/w2)·x1 + θ/w2
- x2 = a·x1 + b
- That is, a straight line with slope a and intercept b on the x2 axis.
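A small sketch of this derivation in code (the weights and threshold are illustrative, and w2 is assumed non-zero):

```python
def decision_line(w1, w2, theta):
    """Return slope a and intercept b of the 2-D decision line x2 = a*x1 + b,
    obtained by setting w1*x1 + w2*x2 equal to the threshold theta."""
    a = -w1 / w2
    b = theta / w2
    return a, b

def classify(x1, x2, w1, w2, theta):
    """TLU decision: 1 if the activation reaches the threshold, else 0."""
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

a, b = decision_line(w1=1.0, w2=1.0, theta=1.5)   # line x2 = -x1 + 1.5
print(a, b)                                       # -1.0 1.5
print(classify(1, 1, 1.0, 1.0, 1.5))              # 1: (1,1) lies above the line
print(classify(0, 1, 1.0, 1.0, 1.5))              # 0: (0,1) lies below the line
```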
57Decision Hyperplane
Pattern Classifying Hyperplane
58Decision Hyperplane
- The two classes are therefore separated by the 'decision' line, which is defined by putting the activation equal to the threshold.
- It turns out that it is possible to generalise this result to TLUs with n inputs.
- In 3-D the two classes are separated by a decision plane.
- In n-D this becomes a decision hyperplane.
59Linearly separable patterns
An architecture for a Perceptron which can solve
this type of decision boundary problem. An "on"
response in the output node represents one
class, and an "off" response represents the
other.
Linearly Separable Patterns
60The XOR Function
61The Input Pattern Space
62The Decision planes
63Multi-layer Feed-forward Network
64Pattern Separation and NN architecture
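To make the contrast concrete: XOR cannot be separated by a single decision line, but a two-layer feed-forward net of TLUs handles it. The sketch below uses hand-picked weights chosen purely for illustration (they are not from the slides):

```python
def tlu(inputs, weights, threshold):
    """Threshold linear unit: fire (1) if the weighted sum reaches the threshold."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

def xor_net(x1, x2):
    """Two hidden TLUs (OR and AND) feed an output TLU computing OR-and-not-AND,
    which is XOR."""
    h_or = tlu((x1, x2), (1, 1), 1)        # fires if at least one input is on
    h_and = tlu((x1, x2), (1, 1), 2)       # fires only if both inputs are on
    return tlu((h_or, h_and), (1, -1), 1)  # on for OR but not AND

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))         # 0, 1, 1, 0
```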
65Hopfield Networks
- Symmetrical connections: if there is a connection going from unit j to unit i with connection weight W_ij, then there is also a connection going from unit i to unit j with an equal weight.
- Linear threshold activation: if the total weighted summed input (dot product of input and weights) to a unit is greater than or equal to zero, its state is set to 1; otherwise it is -1. Normally, the threshold is zero. Note that the Hopfield network for the travelling salesman problem (assignment 3) behaved slightly differently from this.
- Asynchronous state updates: units are visited in random order and updated according to the above linear threshold rule.
- Energy function: it can be shown that the above state dynamics minimizes an energy function.
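A minimal sketch of these dynamics (illustrative code; states are +1/-1 and the threshold is zero, as described above):

```python
import random

def hopfield_step(weights, state, rng):
    """Pick one unit at random and apply the linear threshold rule:
    weighted summed input >= 0 sets the state to +1, otherwise to -1."""
    k = rng.randrange(len(state))
    net = sum(weights[k][j] * state[j] for j in range(len(state)) if j != k)
    state[k] = 1 if net >= 0 else -1
    return state

def hopfield_run(weights, state, steps=100, seed=0):
    """Asynchronous updates: repeat single-unit updates in random order."""
    rng = random.Random(seed)
    state = list(state)
    for _ in range(steps):
        hopfield_step(weights, state, rng)
    return state
```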
66Hopfield Nets
67Hopfield Net: Symmetric weights
- Every node is connected to every other node (but not to itself), and the connection strengths or weights are symmetric in that the weight from node i to node j is the same as that from node j to node i.
- W_ii = 0
68Recurrence in Hopfield Nets
- That is, there is feedback in the network, and so these are known as feedback or recurrent nets, as opposed to feedforward nets.
69Hopfield Net
- The state of the net at any time is given by the vector of the node outputs.
- Suppose we now start this net in some initial state, choose a node at random, and let it update its output or 'fire'.
- That is, it evaluates its activation in the normal way and outputs a '1' if this is greater than or equal to zero and a '-1' otherwise.
70Hopfield Net Activation
71Hopfield Nets
72States and Transitions
73State Transition Diagram
74Interpreting the STN
- States are represented by the circles with their associated state number.
- Directed arcs represent possible transitions between states, and the number alongside each arc is the probability that that transition will take place.
- The states have been arranged in such a way that transitions tend to take place down the diagram; this will be shown to reflect the way the system decreases its energy.
- The important thing to notice at this stage is that, no matter where we start in the diagram, the net will eventually find itself in one of the states '3' or '6'. These re-enter themselves with probability 1.
- That is, they are stable states - once the net finds itself in one of these it stays there.
- The state vectors for '3' and '6' are (0,1,1) and (1,1,0) respectively, and so these are the 'memories' stored by the net.
75Associative memory
- A Hopfield net stores specific patterns as attractor states of the network.
- When an input is presented (a specific state vector), the system settles into the closest attractor state (via the energy function).
- Partial or degraded input can trigger recall of the closest associated memory vector.
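The slides do not specify how the weights are set; one standard choice (an assumption here) is the Hebbian outer-product rule. Reusing hopfield_run from the sketch above, a degraded probe settles back to the nearest stored pattern:

```python
def hebbian_weights(patterns):
    """Outer-product (Hebbian) rule: w_ij = sum over stored patterns of x_i*x_j,
    with zero self-connections (w_ii = 0). A common choice, not from the slides."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

stored = [[1, -1, 1, -1, 1, -1], [1, 1, 1, -1, -1, -1]]
w = hebbian_weights(stored)
probe = [1, -1, 1, -1, 1, 1]               # first pattern with its last bit flipped
print(hopfield_run(w, probe, steps=60))    # settles back to [1, -1, 1, -1, 1, -1]
```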
76- If j were given the chance to update or fire, the contribution to its activation from i is positive, and this may well serve to bring j's activation above threshold and make it output a '1'.
- A similar situation would prevail if the initial output states of the two nodes had been reversed, since the connection is symmetric.
- If, on the other hand, both units are 'on', they are reinforcing each other's current output. The weight may therefore be thought of as fixing a constraint between i and j that tends to make them both take on the value '1'.
- A negative weight would tend to enforce opposite outputs.
- One way of viewing these networks is therefore as constraint satisfaction nets.
77Energy Function
- The idea can be quantified using an energy function. Consider the energy function e_ij = -w_ij·x_i·x_j.
78The lowest energy
- If the weight is positive, then the last entry is negative and is the lowest value in the table.
- If e_ij is regarded as the 'energy' of the pair ij, then the lowest energy occurs when both units are on, which is consistent with the arguments above.
- If the weight is negative, the '11' state is the highest-energy state and is not favoured.
79Hopfield Net energy
- The energy of the net is found by summing over all pairs of nodes.
- Note that since the network is symmetric, we count each pair twice (ij and ji).
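- One standard way of writing this is E = -½ Σ_i Σ_j w_ij·x_i·x_j, where the factor ½ compensates for counting each pair twice.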
80For a particular unit k
- Suppose node k is chosen to be updated. Write the energy E by singling out the terms involving this node.
- Now, because w_ij = w_ji, the last two sums may be combined.
81Energy function
- Pulling x_k out and rewriting (denoting the first sum by S): E = S - x_k·a_k
- a_k is the activation of unit k
82Activation function
- Let the energy after k has updated be E′ and the new output be x′_k. Then
- E′ = S - x′_k·a_k
- The change in energy E′ - E is thus
- E′ - E = -(x′_k - x_k)·a_k
83Consider the cases for the energy function
- There are now two cases to consider:
- a_k ≥ 0: Then the output goes from '0' to '1' or stays at '1'. In either case (x′_k - x_k) ≥ 0. Therefore E′ - E ≤ 0.
- a_k < 0: Then the output goes from '1' to '0' or stays at '0'. In either case (x′_k - x_k) ≤ 0. Therefore, once again, E′ - E ≤ 0.
- Thus, for any node being updated we always have E′ - E ≤ 0, and so the energy of the net decreases or stays the same. But the energy is bounded below by a value obtained by putting all the x_i = 1, x_j = 1 in the equation for E.
- Thus E must reach some fixed value and the net then stays in the same state.
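A quick numerical check of this argument (illustrative code with random symmetric weights, a zero diagonal, and 0/1 states to match the cases above):

```python
import random

def energy(w, x):
    """E = -1/2 * sum_ij w_ij x_i x_j (the double-counted pair sum, halved)."""
    n = len(x)
    return -0.5 * sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

rng = random.Random(1)
n = 5
w = [[0.0] * n for _ in range(n)]
for i in range(n):                        # random symmetric weights, zero diagonal
    for j in range(i + 1, n):
        w[i][j] = w[j][i] = rng.uniform(-1, 1)

x = [rng.choice([0, 1]) for _ in range(n)]
for _ in range(50):
    e_before = energy(w, x)
    k = rng.randrange(n)
    a_k = sum(w[k][j] * x[j] for j in range(n))
    x[k] = 1 if a_k >= 0 else 0           # linear threshold update of one unit
    assert energy(w, x) <= e_before + 1e-12   # E never increases
print("final state", x, "final energy", round(energy(w, x), 3))
```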
84Energy surface and gradients
- The notion of an energy surface is a central component in understanding constraint satisfaction systems.
- Other examples include:
- MAP estimation and belief update in Bayes Nets.
- Boltzmann machine dynamics.