1
Connectionist Models: Basics
  • Srini Narayanan
  • CS182/CogSci110/Ling109
  • Spring 2004

2
Lecture Overview
  • Spreading Activation: Toward a Model
  • Connectionist Models: Introduction
  • Model of a neuron
  • McCulloch-Pitts Neuron
  • Activation Functions
  • Node Types: Sigma-Pi, Temporal
  • Network Types
  • Perceptron and Feed-forward Nets
  • Hopfield Nets
  • Pattern generator networks
  • Winner Take All Networks
  • Triangle Nodes
  • Representing Concepts
  • Connectionist Encoding of Concepts
  • Distributed vs. Localist representations
  • Coarse Coding

3
Cross-modal priming effects
4
Results
5
Toward a Model of Priming
6
Toward a Model of Priming
Mental connections are implemented as neural
connections
7
Toward a Model of Priming
TRIANGLE NODES
8
Toward a Model of Priming
Mutual Inhibition
9
Toward a Model of Priming
Initially both noun and verb get activation
(bottom-up)
10
Toward a Model of Priming
Flower gets primed with some residual activation
11
Toward a Model of Priming
Strong activation from other context
12
Toward a Model of Priming
Strong activation from other context
Mutual inhibition kills the bottom-up noun
activation
13
Toward a Model of Priming
No activation from the noun
Strong activation from other context
Mutual inhibition kills the bottom-up noun
activation
14
Toward a Model of Priming
Strong activation from other context
No priming effect
15
More complicated structure
  • The man saw the girl with the telescope.
  • The women discussed the dogs on the beach.
  • She threw a ball for charity.
  • The cop arrested by the police turned out to be
    unreliable.
  • The complex houses married and single students.

16
Other results that can be explained by the simple
model
  • Word superiority effect
  • Isolated letters are harder than letters in
    context
  • One of the first parallel spreading activation
    models (McClelland et al.)
  • Top-down and bottom-up processing.
  • Priming effects in sub-word phones
  • Semantic priming effects
  • Gender priming from syntax
  • And others..

17
Link to Vision: The Necker Cube
18
Basic Ideas behind the model
  • Parallel activation streams.
  • Top-down and bottom-up activation combine to
    determine the best matching structure.
  • Triangle nodes bind features of objects to values
  • Mutual inhibition and competition between
    structures (a minimal sketch follows below)
  • Mental connections are active neural connections
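The competition idea can be made concrete with a small sketch: two units suppress each other through mutual inhibition, so the unit with stronger support wins and the other is driven toward zero. This is a minimal illustration, not code from the slides; the parameter values and names are assumptions.

# Minimal sketch (illustrative parameters): two mutually inhibiting units,
# e.g. the noun and verb readings of an ambiguous word.

def compete(input_a, input_b, inhibition=0.6, decay=0.1, steps=20):
    """Iteratively update two mutually inhibiting units and return their
    final activations; the unit with stronger support wins."""
    a = b = 0.0
    for _ in range(steps):
        a_new = max(0.0, (1 - decay) * a + input_a - inhibition * b)
        b_new = max(0.0, (1 - decay) * b + input_b - inhibition * a)
        a, b = a_new, b_new
    return a, b

# Strong activation from context supports the verb reading:
noun, verb = compete(input_a=0.4, input_b=0.9)
print(round(noun, 2), round(verb, 2))  # the noun unit is suppressed to 0.0, the verb unit wins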

19
Can we formalize/model these intuitions?
  • What is a neurally plausible computational model
    of spreading activation that captures these
    features?
  • What does semantics mean in neurally embodied
    terms?
  • What are the neural substrates of concepts that
    underlie verbs, nouns, spatial predicates?

20
Lecture Overview
  • Spreading Activation: Toward a Model
  • Connectionist Models: Introduction
  • Model of a neuron
  • McCulloch-Pitts Neuron
  • Activation Functions
  • Node Types: Sigma-Pi, Temporal
  • Network Types
  • Perceptron and Feed-forward Nets
  • Hopfield Nets
  • Pattern Generator Nets
  • Winner Take All Networks
  • Triangle Nodes
  • Representing Concepts
  • Connectionist Encoding of Concepts
  • Distributed vs. Localist representations
  • Coarse Coding

21
(No Transcript)
22
Information Processing Abstraction
23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
A Node in a NN
  • The weighted sum is called the net input to unit
    i, often written neti.
  • The function f is the unit's activation function.
  • In the simplest case, f is the identity function,
    and the unit's output is just its net input. This
    is called a linear unit.
  • If the output is based on a threshold (fire if
    net input > threshold), then the unit is called a
    Threshold Linear Unit (TLU).
  • f(neti) is also sometimes called ai.

27
Simple Threshold Linear Unit
28
Simple Neuron Model
29
A Simple Example
  • a = x1w1 + x2w2 + x3w3 + ... + xnwn
  • a = 1·x1 + 0.5·x2 + 0.1·x3
  • x1 = 1, x2 = 0, x3 = 0
  • Net input = 1
  • Threshold = 2
  • Net input - threshold < 0
  • Output = 0

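A minimal Python sketch of the threshold linear unit described above, checked against the numbers in this example (the function name is an illustrative choice):

def tlu(inputs, weights, threshold):
    """Weighted sum of the inputs, then a hard threshold."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net > threshold else 0

print(tlu(inputs=[1, 0, 0], weights=[1, 0.5, 0.1], threshold=2))  # net = 1 < 2, so output is 0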
30
Operation of a simple Neuron
31
Activation Functions
32
Different Activation Functions
  • Threshold Activation Function (step)
  • Piecewise Linear Activation Function
  • Sigmoid Activation Function
  • Gaussian Activation Function
  • Radial Basis Function

33
Types of Activation functions
34
The Sigmoid Function
(Plot: output y = a against net input x = neti)
35
The Sigmoid Function
(Plot: y = a against x = neti; the curve saturates at output = 1 and output = 0)
36
The Sigmoid Function
(Plot: y = a against x = neti; the steep central region marks the sensitivity to input)
37
Changing the exponent k(neti)
(Plot: sigmoid curves for k > 1 and k < 1)
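The sigmoid plots above are figures only. A small sketch, assuming the common logistic form a = 1 / (1 + exp(-k·neti)), shows how the steepness parameter k changes the curve:

import math

def sigmoid(net, k=1.0):
    """Logistic activation: squashes the net input into (0, 1); a larger k
    gives a steeper transition around net = 0."""
    return 1.0 / (1.0 + math.exp(-k * net))

for k in (0.5, 1.0, 4.0):
    print(k, [round(sigmoid(x, k), 3) for x in (-2, -1, 0, 1, 2)])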
38
Radial Basis Function
39
Sigma-Pi nodes
  • The previous spatial summation function supposes
    that each input contributes to the activation
    independently of the others.
  • That is, the contribution to the activation from
    input 1, say, is always a constant multiplier
    (= w1) times x1.
  • Suppose, however, that the contribution from input
    1 depends also on input 2 and that, the larger
    input 2, the larger is input 1's contribution.
  • The simplest way of modelling this is to include
    a term in the activation like w12x1x2 where
    w12 > 0 (for a diminishing influence of input 2 we
    would, of course, have w12 < 0). In general we
    might have terms containing all possible pairs of
    inputs and also a term in the three inputs
    together:
  • a = w1x1 + w2x2 + w3x3 + w12x1x2 + w23x2x3 +
    w13x1x3 + w123x1x2x3
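A minimal sketch of a sigma-pi unit with three inputs, implementing the linear and pairwise product terms of the expression above (the triple term is omitted, and the weight values are illustrative assumptions):

def sigma_pi(x, w, w_pair):
    """Activation with linear terms plus multiplicative (pi) terms:
    sum_i w_i * x_i  +  sum_{i<j} w_ij * x_i * x_j."""
    a = sum(wi * xi for wi, xi in zip(w, x))
    for (i, j), wij in w_pair.items():
        a += wij * x[i] * x[j]
    return a

x = [1.0, 0.5, 0.2]
w = [0.3, 0.3, 0.3]
w_pair = {(0, 1): 0.8, (1, 2): -0.4, (0, 2): 0.0}  # (0, 1) plays the role of w12 > 0
print(sigma_pi(x, w, w_pair))  # 0.87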

40
Sigma-Pi units
41
Sigma-Pi Unit
42
Biological Evidence for Sigma-Pi Units
  • axo-dendritic synapse: The stereotypical synapse
    consists of an electro-chemical connection
    between an axon and a dendrite - hence it is an
    axo-dendritic synapse.
  • presynaptic inhibition: However, there is a large
    variety of synaptic types and connection
    grouping. Of special importance are cases where
    the efficacy of the axo-dendritic synapse between
    axon 1 and the dendrite is modulated (inhibited)
    by the activity in axon 2 via the axo-axonic
    synapse between the two axons. This might
    therefore be modelled by a quadratic term like
    w12x1x2.
  • synapse cluster: Here the effect of the
    individual synapses will surely not be
    independent, and we should look to model this with
    a multilinear term in all the inputs.

43
Biological Evidence for Sigma-Pi units
presynaptic inhibition
axo-dendritic synapse
synapse cluster
44
Lecture Overview
  • Spreading Activation: Toward a Model
  • Connectionist Models: Introduction
  • Model of a neuron
  • McCulloch-Pitts Neuron
  • Activation Functions
  • Node Types: Sigma-Pi, Temporal
  • Network Types
  • Perceptron and Feed-forward Nets
  • Hopfield Nets
  • Winner Take All Networks
  • Triangle Nodes
  • Representing Concepts
  • Connectionist Encoding of Concepts
  • Distributed vs. Localist representations
  • Coarse Coding

45
Temporal Aspects
  • Decay functions
  • Explicit delays/time
  • Temporal Summation
  • Temporal AND
  • Sequence and Recurrent connections

46
Decay functions
47
Temporal-AND
48
Types of Neuron parameters
  • The form of the function - e.g. linear,
    sigma-pi, cubic.
  • The activation-output relation - linear,
    hard-limiter, or sigmoidal.
  • The nature of the signals used to communicate
    between nodes - analogue or boolean.
  • The dynamics of the node - deterministic or
    stochastic.

49
Lecture Overview
  • Spreading Activation: Toward a Model
  • Connectionist Models: Introduction
  • Model of a neuron
  • McCulloch-Pitts Neuron
  • Activation Functions
  • Node Types: Sigma-Pi, Temporal
  • Network Types
  • Perceptron and Feed-forward Nets
  • Hopfield Nets
  • Winner Take All Networks
  • Triangle Nodes
  • Representing Concepts
  • Connectionist Encoding of Concepts
  • Distributed vs. Localist representations
  • Coarse Coding

50
The Perceptron
51
The Perceptron
Input Pattern
52
The Perceptron
Input Pattern
Output Classification
53
A Pattern Classification
54
The Input Pattern Space
 
55
Pattern Space
  • The space in which the inputs reside is referred
    to as the pattern space. Each pattern determines
    a point in the space by using its component
    values as space-coordinates. In general, for
    n inputs, the pattern space will be
    n-dimensional.
  • Clearly, for n > 3, the pattern space cannot be
    drawn or represented in physical space. This is
    not a problem; we shall return to the idea of
    using higher dimensional spaces later. However,
    the geometric insight obtained in 2-D will carry
    over (when expressed algebraically) into n-D.

56
The Linear Separation of Classes
  • Since the critical condition for classification
    occurs when the activation equals the threshold,
    it is useful to examine the geometric implication
    of this. Putting the activation equal to the
    threshold gives
  • Σ wixi = θ
  • In the 2-D case we are considering
  • w1x1 + w2x2 = θ
  • x2 = -(w1/w2)x1 + θ/w2
  • x2 = ax1 + b
  • That is, a straight line with slope a and
    intercept b on the x2 axis.
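A small numerical sketch of this 2-D case: given the weights and threshold, the decision line is x2 = -(w1/w2)x1 + θ/w2, and the TLU classifies a point by which side of the line it falls on. The example values below are assumptions:

def tlu_2d(x1, x2, w1, w2, theta):
    """Fire (class 1) when w1*x1 + w2*x2 >= theta, otherwise class 0."""
    return 1 if w1 * x1 + w2 * x2 >= theta else 0

w1, w2, theta = 1.0, 1.0, 1.5
slope, intercept = -w1 / w2, theta / w2   # decision line: x2 = slope*x1 + intercept
print(slope, intercept)                   # -1.0 1.5
for point in [(0, 0), (1, 0), (1, 1)]:    # only (1, 1) lies on the firing side
    print(point, tlu_2d(*point, w1, w2, theta))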

57
Decision Hyperplane
Pattern Classifying Hyperplane
 
58
Decision Hyperplane
  • The two classes are therefore separated by the
    'decision' line which is defined by putting the
    activation equal to the threshold.
  • It turns out that it is possible to generalise
    this result to TLUs with n inputs.
  • In 3-D the two classes are separated by a
    decision-plane.
  • In n-D this becomes a decision-hyperplane.

59
Linearly separable patterns
An architecture for a Perceptron which can solve
this type of decision boundary problem. An "on"
response in the output node represents one
class, and an "off" response represents the
other.
Linearly Separable Patterns
60
The XOR Function
61
The Input Pattern Space
 
62
The Decision planes
 
63
Multi-layer Feed-forward Network
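The XOR slides above are figures only. The sketch below is one common hand-wired two-layer threshold-unit solution, not necessarily the exact architecture drawn on the slides: one hidden unit computes OR, another computes AND, and the output fires for OR-but-not-AND.

def step(net, theta):
    """Hard threshold unit: fire when the net input reaches the threshold."""
    return 1 if net >= theta else 0

def xor_net(x1, x2):
    h_or = step(x1 + x2, 0.5)           # hidden unit 1: x1 OR x2
    h_and = step(x1 + x2, 1.5)          # hidden unit 2: x1 AND x2
    return step(h_or - h_and, 0.5)      # output: OR and not AND

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))  # 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0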
64
Pattern Separation and NN architecture
65
Hopfield Networks
  • symmetrical connections: if there is a connection
    going from unit j to unit i having a connection
    weight equal to W_ij then there is also a
    connection going from unit i to unit j with an
    equal weight.
  • linear threshold activation: if the total
    weighted summed input (dot product of input and
    weights) to a unit is greater than or equal to
    zero, its state is set to 1, otherwise it is -1.
    Normally, the threshold is zero. Note that the
    Hopfield network for the travelling salesman
    problem (assignment 3) behaved slightly
    differently from this.
  • asynchronous state updates: units are visited in
    random order and updated according to the above
    linear threshold rule.
  • Energy function: it can be shown that the above
    state dynamics minimizes an energy function.
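A minimal sketch of these dynamics: symmetric weights with zero self-connections, ±1 states, asynchronous updates against a zero threshold, and an energy that never increases. The weights below are set with a simple Hebbian rule, which is an assumption these slides do not spell out.

import random

def energy(w, x):
    """E = -1/2 * sum over i, j of w[i][j] * x[i] * x[j] (each pair counted twice)."""
    n = len(x)
    return -0.5 * sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def hopfield_settle(w, x, sweeps=10):
    """Visit units in random order; set each to +1 if its weighted input
    is >= 0, otherwise to -1 (the asynchronous linear threshold rule)."""
    n = len(x)
    x = list(x)
    for _ in range(sweeps):
        for i in random.sample(range(n), n):
            net = sum(w[i][j] * x[j] for j in range(n))
            x[i] = 1 if net >= 0 else -1
    return x

# Hebbian weights storing one pattern: w_ij = p_i * p_j, w_ii = 0 (symmetric).
pattern = [1, -1, 1, -1]
n = len(pattern)
w = [[0 if i == j else pattern[i] * pattern[j] for j in range(n)] for i in range(n)]

noisy = [1, 1, 1, -1]             # degraded copy of the stored pattern
print(energy(w, noisy))           # 0.0  (higher energy)
print(hopfield_settle(w, noisy))  # [1, -1, 1, -1]  (settles into the attractor)
print(energy(w, pattern))         # -6.0 (lower energy)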

66
Hopfield Nets
67
Hopfield Net: Symmetric weights
  • Every node is connected to every other node (but
    not to itself) and the connection strengths or
    weights are symmetric in that the weight from
    node i to node j is the same as that from node j
    to node i.
  • Wii = 0

68
Recurrence in Hopfield Nets
  • That is, there is feedback in the network and so
    they are known as feedback or recurrent nets as
    opposed to feedforward nets

69
Hopfield Net
  • The state of the net at any time is given by the
    vector of the node outputs.
  • Suppose we now start this net in some initial
    state and choose a node at random and let it
    update its output or 'fire'.
  • That is, it evaluates its activation in the
    normal way and outputs a '1' if this is greater
    than or equal to zero and a '-1' otherwise.

70
Hopfield Net Activation
71
Hopfield Nets
72
States and Transitions
73
State Transition Diagram
74
Interpreting the STN
  • States are represented by the circles with their
    associated state number.
  • Directed arcs represent possible transitions
    between states and the number alongside each arc
    is the probability that each transition will take
    place.
  • The states have been arranged in such a way that
    transitions tend to take place down the diagram;
    this will be shown to reflect the way the system
    decreases its energy.
  • The important thing to notice at this stage is
    that, no matter where we start in the diagram,
    the net will eventually find itself in one of the
    states '3' or '6'. These re-enter themselves with
    probability 1.
  • That is, they are stable states - once the net
    finds itself in one of these it stays there.
  • The state vectors for '3' and '6' are (0,1,1) and
    (1,1,0) respectively, and so these are the
    'memories' stored by the net.

75
Associative memory
  • A Hopfield net stores specific patterns as
    attractor states of the network.
  • When an input is presented (a specific state
    vector), the system settles into the closest
    attractor state (via the energy function).
  • Partial or degraded input can trigger recall of
    the closest associated memory vector.

76
  • If j were given the chance to update or fire, the
    contribution to its activation from i is positive
    and this may well serve to bring j's activation
    above threshold and make it output a '1'.
  • A similar situation would prevail if the initial
    output states of the two nodes had been reversed
    since the connection is symmetric.
  • If, on the other hand, both units are 'on', they
    are reinforcing each other's current output. The
    weight may therefore be thought of as fixing a
    constraint between i and j that tends to make
    them both take on the value '1'.
  • A negative weight would tend to enforce opposite
    outputs.
  • One way of viewing these networks is therefore as
    constraint satisfaction nets.

77
Energy Function
  • The idea can be quantified using an energy
    function. Consider the energy function
    eij = -wij·xi·xj
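The table discussed on the next slide is not transcribed here; for binary outputs xi, xj in {0, 1}, the four values of eij = -wij·xi·xj would be:

  xi  xj   eij
  0   0    0
  0   1    0
  1   0    0
  1   1   -wij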

78
The lowest energy
  • If the weight is positive then the last entry is
    negative and is the lowest value in the table.
  • If eij is regarded as the 'energy' of the pair ij
    then the lowest energy occurs when both units are
    on, which is consistent with the arguments above.
  • If the weight is negative, the '11' state is the
    highest energy state and is not favoured.

79
Hopfield Net energy
  • The energy of the net is found by summing over
    all pairs of nodes.
  • Note that, since the network is symmetric, we
    count each pair twice (ij and ji).
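The summation itself is not transcribed; in the standard formulation, with a factor of one half compensating for the double counting, it is
E = -(1/2) Σi Σj wij·xi·xj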

80
For a particular unit k
  • Suppose node k is chosen to be updated. Write the
    energy E by singling out the terms involving this
    node.

Now, because wij = wji, the last two sums may
be combined.
81
Energy function
  • Pulling xk out and rewriting (denoting the first
    sum by S)
  • ak is the activation of unit k
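The rewritten expression is not transcribed on this slide; in standard notation it is
E = S - xk·ak, where ak = Σj wkj·xj
and S collects all the terms that do not involve node k.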

82
Activation function
  • Let the energy after k has updated be E' and the
    new output be x'k. Then
  • E' = S - x'k·ak
  • The change in energy E' - E is thus
  • E' - E = -(x'k - xk)·ak

83
Consider the cases for the energy function
  • There are now two cases to consider:
  • ak > 0: Then the output goes from '0' to '1' or
    stays at '1'. In either case (x'k - xk) ≥ 0.
    Therefore E' - E ≤ 0.
  • ak < 0: Then the output goes from '1' to '0' or
    stays at '0'. In either case (x'k - xk) ≤ 0.
    Therefore, once again, E' - E ≤ 0.
  • Thus, for any node being updated we always have
    E' - E ≤ 0, and so the energy of the net decreases
    or stays the same. But the energy is bounded
    below by a value obtained by putting all the
    xi = 1, xj = 1 in the equation for E.
  • Thus E must reach some fixed value, and the net
    must then stay in the same state.

84
Energy surface and gradients
  • The notion of an energy surface is a central
    component in understanding constraint
    satisfaction systems.
  • Other examples include
  • MAP estimation and belief update in Bayes Nets.
  • Boltzmann machine dynamics