Title: Connectionist Models: Basics
1Connectionist Models Basics
- Srini Narayanan
- CS182/CogSci110/Ling109
- Spring 2004
2Lecture Overview
- Spreading Activation: Toward a Model
- Connectionist Models: Introduction
- Model of a neuron
- McCulloch-Pitts Neuron
- Activation Functions
- Node Types: Sigma-Pi, Temporal
- Network Types
- Perceptron and Feed-forward Nets
- Hopfield Nets
- Pattern generator networks
- Winner Take All Networks
- Triangle Nodes
- Representing Concepts
- Connectionist Encoding of Concepts
- Distributed vs. Localist representations
- Coarse Coding
3Cross-modal priming effects
4Results
5Toward a Model of Priming
6Toward a Model of Priming
Mental connections are implemented as neural connections.
7Toward a Model of Priming
TRIANGLE NODES
8Toward a Model of Priming
Mutual Inhibition
9Toward a Model of Priming
Initially both noun and verb get activation
(bottom-up)
10Toward a Model of Priming
Flower gets primed with some residual activation
11Toward a Model of Priming
Strong activation from other context
12Toward a Model of Priming
Strong activation from other context
Mutual inhibition kills the bottom-up noun
activation
13Toward a Model of Priming
No activation from the noun
Strong activation from other context
Mutual inhibition kills the bottom-up noun
activation
14Toward a Model of Priming
Strong activation from other context
No priming effect
15More complicated structure
- The man saw the girl with the telescope.
- The women discussed the dogs on the beach.
- She threw a ball for charity.
- The cop arrested by the police turned out to be unreliable.
- The complex houses married and single students.
16Other results that can be explained by the simple model
- Word superiority effect
- Isolated letters are harder than letters in context
- One of the first parallel spreading activation models (McClelland et al.)
- Top-down and bottom-up processing
- Priming effects in sub-word phones
- Semantic priming effects
- Gender priming from syntax
- And others...
17Link to Vision: The Necker Cube
18Basic Ideas behind the model
- Parallel activation streams.
- Top-down and bottom-up activation combine to determine the best-matching structure.
- Triangle nodes bind features of objects to values.
- Mutual inhibition and competition between structures.
- Mental connections are active neural connections.
19Can we formalize/model these intuitions?
- What is a neurally plausible computational model of spreading activation that captures these features?
- What does semantics mean in neurally embodied terms?
- What are the neural substrates of concepts that underlie verbs, nouns, and spatial predicates?
20Lecture Overview
- Spreading Activation: Toward a Model
- Connectionist Models: Introduction
- Model of a neuron
- McCulloch-Pitts Neuron
- Activation Functions
- Node Types: Sigma-Pi, Temporal
- Network Types
- Perceptron and Feed-forward Nets
- Hopfield Nets
- Pattern Generator Nets
- Winner Take All Networks
- Triangle Nodes
- Representing Concepts
- Connectionist Encoding of Concepts
- Distributed vs. Localist representations
- Coarse Coding
22Information Processing Abstraction
26A Node in a NN
- The weighted sum is called the net input to unit i, often written net_i.
- The function f is the unit's activation function.
- In the simplest case, f is the identity function, and the unit's output is just its net input. This is called a linear unit.
- If the output is based on a threshold (fire if net_i > threshold), then the unit is called a Threshold Linear Unit (TLU).
- f(net_i) is also sometimes written a_i.
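A minimal sketch of such a unit in Python (the function and variable names are illustrative, not from the slides): compute the net input as a weighted sum, then apply an activation function f.

```python
def unit_output(inputs, weights, f=lambda net: net):
    """Compute net_i as the weighted sum of the inputs, then apply the
    activation function f. With the identity f this is a linear unit."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return f(net)

# Linear unit: the output equals the net input.
print(unit_output([1, 0, 1], [0.5, 0.2, 0.3]))            # 0.8
# Threshold Linear Unit (TLU): fire only if net reaches a threshold of 1.0.
print(unit_output([1, 0, 1], [0.5, 0.2, 0.3],
                  f=lambda net: 1 if net >= 1.0 else 0))   # 0
```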
27Simple Threshold Linear Unit
28Simple Neuron Model
29A Simple Example
- a = x1·w1 + x2·w2 + x3·w3 + ... + xn·wn
- a = 1·x1 + 0.5·x2 + 0.1·x3
- x1 = 1, x2 = 0, x3 = 0
- Net input = 1·1 + 0.5·0 + 0.1·0 = 1
- Threshold = 2
- Net input - threshold < 0
- Output = 0
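The same arithmetic as a small Python check (weights, inputs, and threshold taken from the example above; the code itself is just illustrative):

```python
# Reproducing the example arithmetic for a threshold linear unit.
weights = [1.0, 0.5, 0.1]
inputs = [1, 0, 0]                                   # x1 = 1, x2 = 0, x3 = 0
net = sum(x * w for x, w in zip(inputs, weights))    # 1*1 + 0.5*0 + 0.1*0 = 1.0
threshold = 2.0
output = 1 if net >= threshold else 0                # 1.0 - 2.0 < 0, so the unit does not fire
print(net, output)                                   # 1.0 0
```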
30Operation of a simple Neuron
31Activation Functions
32Different Activation Functions
- Threshold Activation Function (step)
- Piecewise Linear Activation Function
- Sigmoid Activation Function
- Gaussian Activation Function
- Radial Basis Function
33Types of Activation functions
34The Sigmoid Function
(Plot: sigmoid curve; y-axis = a, x-axis = net_i)
35The Sigmoid Function
(Plot: sigmoid curve saturating at output = 1 and output = 0; y-axis = a, x-axis = net_i)
36The Sigmoid Function
(Plot: sigmoid curve; the region of greatest sensitivity to the input lies between the output = 0 and output = 1 asymptotes; y-axis = a, x-axis = net_i)
37Changing the exponent k(neti)
k > 1
k < 1
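The slides show the curves but not the formula; a common choice (an assumption here) is the logistic sigmoid 1/(1 + e^(-k·net)), where k controls the steepness:

```python
import math

def sigmoid(net, k=1.0):
    """Logistic sigmoid; k scales the net input and so controls the slope."""
    return 1.0 / (1.0 + math.exp(-k * net))

for k in (0.5, 1.0, 2.0):
    # Larger k -> steeper transition around net = 0; smaller k -> more gradual.
    print(k, [round(sigmoid(net, k), 3) for net in (-2, -1, 0, 1, 2)])
```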
38Radial Basis Function
39Sigma-Pi nodes
- The previous spatial summation function supposes that each input contributes to the activation independently of the others.
- That is, the contribution to the activation from input 1, say, is always a constant multiplier (w1) times x1.
- Suppose, however, that the contribution from input 1 depends also on input 2, and that the larger input 2 is, the larger input 1's contribution becomes.
- The simplest way of modelling this is to include a term in the activation like w12·x1·x2, where w12 > 0 (for a diminishing influence of input 2 we would, of course, have w12 < 0). In general we might have terms containing all possible pairs of inputs and also a term in the three inputs together:
- w1x1 + w2x2 + w3x3 + w12x1x2 + w23x2x3 + w13x1x3 + w123x1x2x3
40Sigma-Pi units
41Sigma-Pi Unit
42Biological Evidence for Sigma-Pi Units
- axo-dendritic synapse: The stereotypical synapse consists of an electro-chemical connection between an axon and a dendrite - hence it is an axo-dendritic synapse.
- presynaptic inhibition: However, there is a large variety of synaptic types and connection groupings. Of special importance are cases where the efficacy of the axo-dendritic synapse between axon 1 and the dendrite is modulated (inhibited) by the activity in axon 2, via the axo-axonic synapse between the two axons. This might therefore be modelled by a quadratic term like w12x1x2.
- synapse cluster: Here the effects of the individual synapses will surely not be independent, and we should look to model this with a multilinear term in all the inputs.
43Biological Evidence for Sigma-Pi units
(Figure: axo-dendritic synapse, presynaptic inhibition, synapse cluster)
44Lecture Overview
- Spreading Activation: Toward a Model
- Connectionist Models: Introduction
- Model of a neuron
- McCulloch-Pitts Neuron
- Activation Functions
- Node Types: Sigma-Pi, Temporal
- Network Types
- Perceptron and Feed-forward Nets
- Hopfield Nets
- Winner Take All Networks
- Triangle Nodes
- Representing Concepts
- Connectionist Encoding of Concepts
- Distributed vs. Localist representations
- Coarse Coding
45Temporal Aspects
- Decay functions
- Explicit delays/time
- Temporal Summation
- Temporal AND
- Sequence and Recurrent connections
46Decay functions
47Temporal-AND
48Types of Neuron parameters
- The form of the activation function - e.g. linear, sigma-pi, cubic.
- The activation-output relation - linear, hard-limiter, or sigmoidal.
- The nature of the signals used to communicate between nodes - analogue or boolean.
- The dynamics of the node - deterministic or stochastic.
49Lecture Overview
- Spreading Activation: Toward a Model
- Connectionist Models: Introduction
- Model of a neuron
- McCulloch-Pitts Neuron
- Activation Functions
- Node Types: Sigma-Pi, Temporal
- Network Types
- Perceptron and Feed-forward Nets
- Hopfield Nets
- Winner Take All Networks
- Triangle Nodes
- Representing Concepts
- Connectionist Encoding of Concepts
- Distributed vs. Localist representations
- Coarse Coding
50The Perceptron
51The Perceptron
Input Pattern
52The Perceptron
Input Pattern
Output Classification
53A Pattern Classification
54The Input Pattern Space
55Pattern Space
- The space in which the inputs reside is referred to as the pattern space. Each pattern determines a point in the space by using its component values as space coordinates. In general, for n inputs, the pattern space will be n-dimensional.
- Clearly, for n-D, the pattern space cannot be drawn or represented in physical space. This is not a problem - we shall return to the idea of using higher-dimensional spaces later. However, the geometric insight obtained in 2-D will carry over (when expressed algebraically) into n-D.
56The Linear Separation of Classes
- Since the critical condition for classification occurs when the activation equals the threshold, it is useful to examine the geometric implication of this. Putting the activation equal to the threshold gives
- Σ wi·xi = θ
- In the 2-D case we are considering:
- w1·x1 + w2·x2 = θ
- x2 = -(w1/w2)·x1 + θ/w2
- x2 = a·x1 + b
- That is, a straight line with slope a and intercept b on the x2 axis.
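A small sketch of this derivation in code (the weights and threshold are illustrative, and w2 is assumed non-zero):

```python
def decision_line(w1, w2, theta):
    """Return slope a and intercept b of the 2-D decision line x2 = a*x1 + b,
    obtained by setting w1*x1 + w2*x2 equal to the threshold theta."""
    a = -w1 / w2
    b = theta / w2
    return a, b

def classify(x1, x2, w1, w2, theta):
    """TLU decision: 1 if the activation reaches the threshold, else 0."""
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

a, b = decision_line(w1=1.0, w2=1.0, theta=1.5)   # line x2 = -x1 + 1.5
print(a, b)                                       # -1.0 1.5
print(classify(1, 1, 1.0, 1.0, 1.5))              # 1: (1,1) lies above the line
print(classify(0, 1, 1.0, 1.0, 1.5))              # 0: (0,1) lies below the line
```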
57Decision Hyperplane
Pattern Classifying Hyperplane
58Decision Hyperplane
- The two classes are therefore separated by the 'decision' line, which is defined by putting the activation equal to the threshold.
- It turns out that it is possible to generalise this result to TLUs with n inputs.
- In 3-D the two classes are separated by a decision plane.
- In n-D this becomes a decision hyperplane.
59Linearly separable patterns
An architecture for a Perceptron which can solve
this type of decision boundary problem. An "on"
response in the output node represents one
class, and an "off" response represents the
other.
Linearly Separable Patterns
60The XOR Function
61The Input Pattern Space
62The Decision planes
63Multi-layer Feed-forward Network
64Pattern Separation and NN architecture
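To make the contrast concrete: XOR cannot be separated by a single decision line, but a two-layer feed-forward net of TLUs handles it. The sketch below uses hand-picked weights chosen purely for illustration (they are not from the slides):

```python
def tlu(inputs, weights, threshold):
    """Threshold linear unit: fire (1) if the weighted sum reaches the threshold."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

def xor_net(x1, x2):
    """Two hidden TLUs (OR and AND) feed an output TLU computing OR-and-not-AND,
    which is XOR."""
    h_or = tlu((x1, x2), (1, 1), 1)        # fires if at least one input is on
    h_and = tlu((x1, x2), (1, 1), 2)       # fires only if both inputs are on
    return tlu((h_or, h_and), (1, -1), 1)  # on for OR but not AND

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))         # 0, 1, 1, 0
```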
65Hopfield Networks
- Symmetrical connections: if there is a connection going from unit j to unit i with connection weight W_ij, then there is also a connection going from unit i to unit j with an equal weight.
- Linear threshold activation: if the total weighted summed input (dot product of input and weights) to a unit is greater than or equal to zero, its state is set to 1; otherwise it is -1. Normally, the threshold is zero. Note that the Hopfield network for the travelling salesman problem (assignment 3) behaved slightly differently from this.
- Asynchronous state updates: units are visited in random order and updated according to the above linear threshold rule.
- Energy function: it can be shown that the above state dynamics minimizes an energy function.
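A minimal sketch of these dynamics (illustrative code; states are +1/-1 and the threshold is zero, as described above):

```python
import random

def hopfield_step(weights, state, rng):
    """Pick one unit at random and apply the linear threshold rule:
    weighted summed input >= 0 sets the state to +1, otherwise to -1."""
    k = rng.randrange(len(state))
    net = sum(weights[k][j] * state[j] for j in range(len(state)) if j != k)
    state[k] = 1 if net >= 0 else -1
    return state

def hopfield_run(weights, state, steps=100, seed=0):
    """Asynchronous updates: repeat single-unit updates in random order."""
    rng = random.Random(seed)
    state = list(state)
    for _ in range(steps):
        hopfield_step(weights, state, rng)
    return state
```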
66Hopfield Nets
67Hopfield Net: Symmetric weights
- Every node is connected to every other node (but not to itself), and the connection strengths or weights are symmetric in that the weight from node i to node j is the same as that from node j to node i.
- W_ii = 0
68Recurrence in Hopfield Nets
- That is, there is feedback in the network, and so these are known as feedback or recurrent nets, as opposed to feedforward nets.
69Hopfield Net
- The state of the net at any time is given by the vector of the node outputs.
- Suppose we now start this net in some initial state, choose a node at random, and let it update its output or 'fire'.
- That is, it evaluates its activation in the normal way and outputs a '1' if this is greater than or equal to zero and a '-1' otherwise.
70Hopfield Net Activation
71Hopfield Nets
72States and Transitions
73State Transition Diagram
74Interpreting the STN
- States are represented by the circles with their associated state number.
- Directed arcs represent possible transitions between states, and the number alongside each arc is the probability that that transition will take place.
- The states have been arranged in such a way that transitions tend to take place down the diagram; this will be shown to reflect the way the system decreases its energy.
- The important thing to notice at this stage is that, no matter where we start in the diagram, the net will eventually find itself in one of the states '3' or '6'. These re-enter themselves with probability 1.
- That is, they are stable states - once the net finds itself in one of these it stays there.
- The state vectors for '3' and '6' are (0,1,1) and (1,1,0) respectively, and so these are the 'memories' stored by the net.
75Associative memory
- A Hopfield net stores specific patterns as attractor states of the network.
- When an input is presented (a specific state vector), the system settles into the closest attractor state (via the energy function).
- Partial or degraded input can trigger recall of the closest associated memory vector.
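The slides do not specify how the weights are set; one standard choice (an assumption here) is the Hebbian outer-product rule. Reusing hopfield_run from the sketch above, a degraded probe settles back to the nearest stored pattern:

```python
def hebbian_weights(patterns):
    """Outer-product (Hebbian) rule: w_ij = sum over stored patterns of x_i*x_j,
    with zero self-connections (w_ii = 0). A common choice, not from the slides."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

stored = [[1, -1, 1, -1, 1, -1], [1, 1, 1, -1, -1, -1]]
w = hebbian_weights(stored)
probe = [1, -1, 1, -1, 1, 1]               # first pattern with its last bit flipped
print(hopfield_run(w, probe, steps=60))    # settles back to [1, -1, 1, -1, 1, -1]
```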
76- If j were given the chance to update or fire, the contribution to its activation from i is positive, and this may well serve to bring j's activation above threshold and make it output a '1'.
- A similar situation would prevail if the initial output states of the two nodes had been reversed, since the connection is symmetric.
- If, on the other hand, both units are 'on', they are reinforcing each other's current output. The weight may therefore be thought of as fixing a constraint between i and j that tends to make them both take on the value '1'.
- A negative weight would tend to enforce opposite outputs.
- One way of viewing these networks is therefore as constraint satisfaction nets.
77Energy Function
- The idea can be quantified using an energy function. Consider the energy function e_ij = -w_ij·x_i·x_j.
78The lowest energy
- If the weight is positive, then the last entry is negative and is the lowest value in the table.
- If e_ij is regarded as the 'energy' of the pair ij, then the lowest energy occurs when both units are on, which is consistent with the arguments above.
- If the weight is negative, the '11' state is the highest-energy state and is not favoured.
79Hopfield Net energy
- The energy of the net is found by summing over all pairs of nodes.
- Note that since the network is symmetric, we count each pair twice (ij and ji).
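- One standard way of writing this is E = -½ Σ_i Σ_j w_ij·x_i·x_j, where the factor ½ compensates for counting each pair twice.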
80For a particular unit k
- Suppose node k is chosen to be updated. Write the energy E by singling out the terms involving this node.
- Now, because w_ij = w_ji, the last two sums may be combined.
81Energy function
- Pulling x_k out and rewriting (denoting the first sum by S): E = S - x_k·a_k
- a_k is the activation of unit k
82Activation function
- Let the energy after k has updated be E′ and the new output be x′_k. Then
- E′ = S - x′_k·a_k
- The change in energy E′ - E is thus
- E′ - E = -(x′_k - x_k)·a_k
83Consider the cases for the energy function
- There are now two cases to consider:
- a_k ≥ 0: Then the output goes from '0' to '1' or stays at '1'. In either case (x′_k - x_k) ≥ 0. Therefore E′ - E ≤ 0.
- a_k < 0: Then the output goes from '1' to '0' or stays at '0'. In either case (x′_k - x_k) ≤ 0. Therefore, once again, E′ - E ≤ 0.
- Thus, for any node being updated we always have E′ - E ≤ 0, and so the energy of the net decreases or stays the same. But the energy is bounded below by a value obtained by putting all the x_i = 1, x_j = 1 in the equation for E.
- Thus E must reach some fixed value and the net then stays in the same state.
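A quick numerical check of this argument (illustrative code with random symmetric weights, a zero diagonal, and 0/1 states to match the cases above):

```python
import random

def energy(w, x):
    """E = -1/2 * sum_ij w_ij x_i x_j (the double-counted pair sum, halved)."""
    n = len(x)
    return -0.5 * sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

rng = random.Random(1)
n = 5
w = [[0.0] * n for _ in range(n)]
for i in range(n):                        # random symmetric weights, zero diagonal
    for j in range(i + 1, n):
        w[i][j] = w[j][i] = rng.uniform(-1, 1)

x = [rng.choice([0, 1]) for _ in range(n)]
for _ in range(50):
    e_before = energy(w, x)
    k = rng.randrange(n)
    a_k = sum(w[k][j] * x[j] for j in range(n))
    x[k] = 1 if a_k >= 0 else 0           # linear threshold update of one unit
    assert energy(w, x) <= e_before + 1e-12   # E never increases
print("final state", x, "final energy", round(energy(w, x), 3))
```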
84Energy surface and gradients
- The notion of an energy surface is a central component in understanding constraint satisfaction systems.
- Other examples include:
- MAP estimation and belief update in Bayes Nets.
- Boltzmann machine dynamics.