1
WK7: Hebbian Learning
CS 476 Networks of Neural Computation
Dr. Stathis Kasderidis
Dept. of Computer Science, University of Crete
Spring Semester, 2009
2
Contents
  • Introduction to Hebbian Learning
  • Definitions on Pattern Association
  • Pattern Association Network
  • Formal Theory of Associations: Building Correlations
  • Examples
  • Conclusions

Contents
3
Hebbian Learning
  • Hebbian learning is the oldest and most famous of all learning rules. It was postulated by Donald Hebb (1949) in his book The Organisation of Behaviour:
  • "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
  • Hebb proposed this change as a basis of associative learning. We may expand this into a two-part rule:

Hebb. Learn.
4
Hebbian Learning-1
  • If two neurons on either side of a synapse
    (connection) are activated simultaneously then
    the strength of that synapse is selectively
    increased.
  • If two neurons on either side of a synapse are
    activated asynchronously, then that synapse is
    selectively weakened or eliminated.
  • Such a synapse is called a Hebbian synapse. More
    precisely, we define a Hebbian synapse as a
    synapse that uses a time-dependent, highly local,
    and strongly interactive mechanism to increase
    synaptic efficiency as a function of the
    correlation between the pre-synaptic and
    post-synaptic activities.

Hebb. Learn.
5
Hebbian Learning-2
  • We analyse the four key mechanisms mentioned above:
  • Time-dependent mechanism: The modifications in the synapse depend on the exact time of occurrence of the pre-synaptic and post-synaptic signals.
  • Local mechanism: By its very nature, a synapse is the transmission site where information-bearing signals are in spatiotemporal contiguity. This locally available information is used by the synapse to produce a local modification that is input specific.
  • Interactive mechanism: The occurrence of a change in a synapse depends on signals on both

Hebb. Learn.
6
Hebbian Learning-3
  • sides of the synapse. That is, a Hebbian form of learning depends on a true interaction between the pre- and post-synaptic signals, in the sense that we cannot make any prediction from either one of these two activities by itself. The interaction may be deterministic or stochastic.
  • Correlational mechanism: The condition for a change in synaptic efficiency is the co-occurrence of pre- and post-synaptic signals. The correlation over time between the two signals is responsible for the synaptic change.

Hebb. Learn.
7
Hebbian Learning-4
  • We may classify synaptic modifications of a synapse as:
  • Hebbian: a synapse that increases its strength when positively correlated pre- and post-synaptic signals are present and decreases its strength when these signals are either uncorrelated or negatively correlated.
  • Anti-Hebbian: a synapse that weakens with positively correlated pre- and post-synaptic signals and strengthens with negatively correlated signals.
  • Non-Hebbian: a synapse whose modification does not involve any mechanism that is time-dependent, highly local and strongly interactive in nature (as in the previous cases).

Hebb. Learn.
8
Hebbian Learning-5
  • To formulate the Hebbian rule mathematically we consider a weight w_kj of a neuron k, with the pre- and post-synaptic signals denoted by x_j and y_k respectively. The adjustment to the weight w_kj at time step n is given by
  • Δw_kj(n) = F(y_k(n), x_j(n))
  • where F(·,·) is a function of both signals. The above formula can take many specific forms. Typical examples are:
  • Hebb's hypothesis: In the simplest case we have just the product of the two signals (this is also called the activity product rule):

Hebb. Learn.
9
Hebbian Learning-6
  • Δw_kj(n) = η y_k(n) x_j(n)
  • where η is the learning rate. This form emphasises the correlational nature of a Hebbian synapse. However, this simple rule leads to exponential growth of the weights (they become unbounded). Thus we need a mechanism to stop the unbounded increase of the weights. One such mechanism is the following.
  • Covariance hypothesis: In this case we replace the product of the pre- and post-synaptic signals with the departure of the same signals from their respective average values over a certain time interval. If x̄_j and ȳ_k are their

Hebb. Learn.
10
Hebbian Learning-7
  • time-averaged values, then the covariance form is defined by
  • Δw_kj(n) = η (y_k(n) - ȳ_k)(x_j(n) - x̄_j)
  • The covariance hypothesis allows for the following:
  • Convergence to a non-trivial state, which is reached when x_j(n) = x̄_j or y_k(n) = ȳ_k
  • Prediction of both synaptic potentiation (i.e. increase in synaptic strength) and synaptic depression (i.e. decrease in synaptic strength). A minimal sketch of both update rules is given below.
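  • The following minimal Python sketch illustrates the two update rules above for a single weight; the data, the learning rate value and the signals are invented for illustration, and only the update equations come from the slides.

    import numpy as np

    # Minimal sketch of the Hebb product rule and the covariance rule for one
    # weight w_kj; the signals below are invented, correlated toy data.
    eta = 0.1                                       # learning rate (eta)
    rng = np.random.default_rng(0)
    x = rng.normal(size=100)                        # pre-synaptic signal x_j(n)
    y = 0.5 * x + rng.normal(scale=0.1, size=100)   # correlated post-synaptic signal y_k(n)

    # Hebb's hypothesis (activity product rule): repeated application grows w without bound.
    w_hebb = 0.0
    for xn, yn in zip(x, y):
        w_hebb += eta * yn * xn                     # delta w = eta * y_k(n) * x_j(n)

    # Covariance hypothesis: use departures from the time-averaged values.
    x_bar, y_bar = x.mean(), y.mean()
    w_cov = 0.0
    for xn, yn in zip(x, y):
        w_cov += eta * (yn - y_bar) * (xn - x_bar)  # delta w = eta (y - y_bar)(x - x_bar)

    print(w_hebb, w_cov)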

Hebb. Learn.
11
Pattern Association
  • An associative memory is a brain-like distributed
    memory that learns associations. Association is a
    known and prominent feature of human memory.
  • Association takes two forms:
  • Auto-association: Here the task of a network is to store a set of patterns (vectors) by repeatedly presenting them to the network. The network is subsequently presented with a partial description or distorted (noisy) version of an original pattern stored in it, and the task is to retrieve (recall) that particular pattern.
  • Hetero-association: In this task we want to pair an arbitrary set of input patterns with an arbitrary set of output patterns.

Patt. Assoc.
12
Pattern Association-1
  • Auto-association involves the use of unsupervised
    learning (Hebbian, Hopfield) while
    hetero-association involves the use of
    unsupervised (Hebbian) or supervised learning
    (e.g. MLP/BP) approaches.
  • Let x_k denote a key pattern applied to an associative memory and y_k denote a memorised pattern. The pattern association performed by the network is described by
  • x_k → y_k ,  k = 1, 2, ..., q
  • where q is the number of patterns stored in the network. The key pattern x_k acts as a stimulus that not only determines the storage location of the memorised pattern y_k but also holds the key for its retrieval.

Patt. Assoc.
13
Pattern Association-2
  • In an auto-associative memory, y_k = x_k, so the input and output spaces have the same dimensionality. In a hetero-associative memory, y_k ≠ x_k, hence in this case the dimensionality of the output space may or may not equal the dimensionality of the input space.
  • There are two phases involved in the operation of the associative memory:
  • Storage phase, which refers to the training of the network in accordance with a suitable learning rule.
  • Recall phase, which involves the retrieval of a memorised pattern in response to the presentation of a noisy version of a key pattern to the network.

Patt. Assoc.
14
Pattern Association-3
  • Let the stimulus x (input) represent a noisy version of a key pattern x_j. This stimulus produces a response y (output). For perfect recall, we should find that y = y_j, where y_j is the memorised pattern associated with the key pattern x_j. When y ≠ y_j for x = x_j, the associative memory is said to have made an error in recall.
  • The number q of patterns stored in an associative
    memory provides a direct measure of the storage
    capacity of the network. In designing an
    associative memory, we want to make the storage
    capacity q (expressed as a percentage of the
    total number N of neurons) as large as possible
    and yet insist that a large fraction of the
    patterns is recalled correctly.

Patt. Assoc.
15
Pattern Association Network
  • A pattern associator is a network which is able
    to learn hetero-associations of two patterns. A
    schematic representation is given below

Associator
16
Pattern Association Network-1
  • The net input that arrives at every output unit is calculated as
  • net_i = Σ_j w_ij a_j
  • where i indexes an output neuron and j an input neuron. The dimensionality of the input space is N and of the output space is M. w_ij is the weight from neuron j to neuron i, and a_j is the activation of input neuron j.
  • The activation of each neuron is produced by using a suitable threshold function and a threshold. For example, we can assume that the activations are binary (i.e. either 0 or 1) and to achieve this we use the step

Associator
17
Pattern Association Network-2
  • function.
  • The training of the network takes place by using, for example, the Hebbian form. Thus what we have is a matrix of weights, all of them zero initially. Assume an input pattern of (1 0 1 0 1 0) and an output pattern of (1 1 0 0).

Associator
18
Pattern Association Network-3
  • If we assume a learning rate η = 1, then after a single learning step we get the weight matrix W = η y xᵀ, i.e. the outer product of the output and input patterns.
  • To recall from the matrix we simply apply the input pattern and perform matrix multiplication of the weight matrix with the input vector. In our example we get the net input vector (3 3 0 0).

Associator
19
Pattern Association Network-4
  • If we assume a threshold of 2, we get the correct answer (1 1 0 0) using the step function as the activation function.
  • We can learn multiple associations using the same weight matrix. For example, assume that a new input vector (1 1 0 0 0 1) is given with corresponding output

Associator
20
Pattern Association Network-5
  • vector (0 1 0 1). In this case, after a single presentation (with η = 1), we will get an updated weight matrix.
  • Again, we can get the correct output vectors when we introduce the corresponding input vectors.

Associator
21
Pattern Association Network-6
  • Again, by using the threshold of 2 and a step function, we get the correct answers (1 1 0 0) and (0 1 0 1). A short sketch of the whole storage/recall procedure is given below.
  • However, keep in mind that only a limited number of patterns can be stored before perfect recall fails. The typical capacity of an associator network is about 20% of the total number of neurons.
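  • The worked example of the previous slides can be reproduced with a few lines of NumPy. This is a minimal sketch, assuming the step function fires when the net input reaches the threshold; the patterns, learning rate (η = 1) and threshold (2) are the ones quoted above.

    import numpy as np

    eta = 1.0        # learning rate
    theta = 2.0      # threshold of the step activation function

    def train(pairs, n_in, n_out):
        """Store (input, output) pairs with the Hebbian outer-product rule."""
        W = np.zeros((n_out, n_in))
        for x, y in pairs:
            W += eta * np.outer(y, x)        # delta w_ij = eta * y_i * x_j
        return W

    def recall(W, x):
        """Step activation applied to the net input W x."""
        return (W @ x >= theta).astype(int)

    pairs = [
        (np.array([1, 0, 1, 0, 1, 0]), np.array([1, 1, 0, 0])),
        (np.array([1, 1, 0, 0, 0, 1]), np.array([0, 1, 0, 1])),
    ]
    W = train(pairs, n_in=6, n_out=4)
    for x, y in pairs:
        print(recall(W, x), "expected", y)   # both stored patterns are recalled correctly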

Associator
22
Pattern Association Network-7
  • Recall accuracy reflects the similarity of a key pattern to the stored patterns. The network can generalise in the sense that when an input pattern is not exactly the same as any of the stored patterns, it returns the stored pattern which most closely resembles the input.
  • Properties of pattern associators:
  • Generalisation
  • Fault tolerance
  • Distributed representations are necessary for generalisation and fault tolerance

Associator
23
Pattern Association Network-8
  • Prototype extraction and noise removal
  • Speed
  • Interference is not necessarily a bad thing (it
    is the basis of generalisation).

Associator
24
Correlations
  • We have stated that the simple Hebb form creates unbounded weights. One way to overcome this problem was the covariance rule. A second one is Oja's rule. The latter rule has the benefit that it is closely related to the principal components analysis method.
  • Let us restate the Hebb form for a single linear unit in the output layer and for an input vector with dimension larger than 1:
  • Δw_i = η V ξ_i
  • where V is the activation of the output unit, ξ_i is the activation of input neuron i, and η is the learning rate.

Correlations
25
Correlations-1
  • This rule as it stands does not have any (non-trivial) stable fixed point. To see this, let us assume for the moment that there are (hypothetically) some fixed points. (A fixed point is a pair (V, ξ) such that <Δw> = 0.) In this case we will have
  • 0 = <Δw_i> ∝ <V ξ_i> = <Σ_j w_j ξ_j ξ_i> = Σ_j C_ij w_j = (Cw)_i
  • where the angle brackets indicate an average over the input distribution P(ξ) and we have defined the correlation matrix C by
  • C_ij ≡ <ξ_i ξ_j>
  • or, in matrix form, C ≡ <ξ ξᵀ>

Correlations
26
Correlations-2
  • Several things should be noted about C (see also the numerical sketch below):
  • C is not the covariance matrix of the input, which would be defined in terms of the means ξ̄_i ≡ <ξ_i> as <(ξ_i - ξ̄_i)(ξ_j - ξ̄_j)>
  • C is symmetric, i.e. C_ij = C_ji, which implies that its eigenvalues are real and its eigenvectors can be taken as orthogonal
  • Because of the outer-product form, C is positive semi-definite, thus all its eigenvalues are positive or zero.
  • Now let us return to the equation
  • Cw = 0
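  • As a quick numerical illustration of these properties, the correlation matrix can be estimated from samples and its spectrum inspected; a minimal NumPy sketch with invented data follows.

    import numpy as np

    rng = np.random.default_rng(0)
    xi = rng.normal(size=(1000, 3))      # 1000 invented sample input vectors xi
    C = (xi.T @ xi) / len(xi)            # C_ij ~ <xi_i xi_j>
    print(np.allclose(C, C.T))           # True: C is symmetric
    print(np.linalg.eigvalsh(C))         # real eigenvalues, all >= 0 (positive semi-definite)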

Correlations
27
Correlations-3
  • This equation says that w is an eigenvector of C with eigenvalue 0. But this can never be stable because C has some positive eigenvalues. Thus we conclude that there are only unstable fixed points for the plain Hebb learning procedure.
  • One can prevent the divergence of Hebbian learning by constraining the growth of the weight vector w. There are several methods by which this can be achieved:
  • One way is to renormalise all the weights after each update, w_i → α w_i, choosing α such that |w| = 1

Correlations
28
Correlations-4
  • Another way is to clip the value of each weight at a lower and an upper bound, in other words to constrain the weight to those bounds whenever it tries to cross them, i.e.
  • w⁻ ≤ w_i ≤ w⁺
  • Another way is to use Oja's rule, which we examine next.
  • Oja modified the plain Hebb rule in such a way as to make it possible for the weight vector to approach a constant length |w| = 1, without having to do any renormalisation by hand.
  • Moreover, w approaches the eigenvector of C with the largest eigenvalue λ_max. We call this the maximal

Correlations
29
Correlations-5
  • eigenvector.
  • Oja's modification corresponds to adding a weight decay proportional to V² to the plain Hebb rule:
  • Δw_i = η V (ξ_i - V w_i)
  • Note that this form looks like a delta rule, where the correction Δw_i depends on the difference between the actual input and the back-propagated output.
  • We state some properties of Oja's rule without proof:
  • Unit length: |w| = 1
  • Eigenvector direction: w lies in the maximal

Correlations
30
Correlations-6
  • eigenvector direction of C
  • Variance maximisation: w lies in a direction that maximises <V²>
  • Other rules exist in the literature for modifying the plain Hebb rule. In most cases these are more complex forms. A minimal sketch of Oja's rule is given below.
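  • The convergence properties above are easy to check numerically; below is a minimal Python sketch of Oja's rule applied to invented two-dimensional data with one dominant variance direction.

    import numpy as np

    rng = np.random.default_rng(0)
    # Invented zero-mean inputs whose first component has the larger variance.
    xi = rng.normal(size=(5000, 2)) * np.array([3.0, 1.0])
    eta = 0.01                              # learning rate, chosen for illustration
    w = rng.normal(size=2)                  # random initial weights
    for x in xi:
        V = w @ x                           # linear output V = sum_i w_i xi_i
        w += eta * V * (x - V * w)          # Oja's rule: delta w_i = eta V (xi_i - V w_i)
    print(np.linalg.norm(w))                # close to 1 (unit length)
    print(w)                                # close to +/-(1, 0), the maximal eigenvector of C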

Correlations
31
Examples
  • Ex1 - Hippocampal model: There is strong support to date for the suggestion that the brain area known as the hippocampus uses Hebb-style learning for forming episodic memories.
  • A model which captures the interactions of the hippocampus (DG / CA3 / CA1) with the immediately surrounding regions (entorhinal cortex, subiculum) and the neocortical areas is given below

Examples
32
Examples-1
Examples
33
Examples-2
  • The module details are as follows (see also the configuration sketch below):
  • Entorhinal cortex: 600 neurons, each with 200 synapses and sparseness = 0.05
  • DG: 1000 neurons, with 60 synapses each and sparseness = 0.05
  • CA3: 1000 neurons, each with
  • 200 recurrent synapses (from other CA3 neurons)
  • 120 synapses from the entorhinal cortex
  • 4 synapses from DG
  • and a sparseness = 0.05

Examples
34
Examples-3
  • CA1: 1000 neurons, with 200 synapses each and sparseness = 0.01
  • Sparseness is the fraction of neurons activated when a new stimulus arrives. It is determined from real data from the rat hippocampal area.
  • Input arrives at the entorhinal cortex
  • The connections from entorhinal cortex → DG are trained using Hebbian learning
  • DG is a competitive network
  • CA3 is an auto-association network
  • CA3 recurrent connections use Hebbian learning
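  • For reference, the module parameters listed above can be collected into a small configuration. This is purely illustrative; the names are hypothetical and not taken from the original model code.

    # Hypothetical summary of the module sizes and sparseness values above.
    hippocampal_model = {
        "entorhinal_cortex": {"neurons": 600,  "synapses": 200, "sparseness": 0.05},
        "DG":                {"neurons": 1000, "synapses": 60,  "sparseness": 0.05},
        "CA3":               {"neurons": 1000, "sparseness": 0.05,
                              "synapses": {"recurrent": 200, "entorhinal": 120, "DG": 4}},
        "CA1":               {"neurons": 1000, "synapses": 200, "sparseness": 0.01},
    }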

Examples
35
Examples-4
  • The connections CA3 → CA1 are trained with a Hebbian rule
  • CA1 is a competitive network
  • The connections from CA1 → entorhinal cortex use Hebbian learning
  • Simulations of the model showed that one-shot learning is possible, and the model matched a number of experimental findings well.

Examples
36
Examples-5
  • Ex2 - VisNet: This network is a model of biological vision which tries to solve the problem of building position- and view-invariant representations from multiple views of the same object, e.g. a human face.
  • It uses a hierarchical layered structure where the neurons of an upper layer are connected to neurons of the previous layer through receptive fields of appropriate size. The fields become progressively wider as we move up the hierarchy.
  • In each layer we have an array of 32x32 cells, which use lateral inhibition in a competitive network arrangement.

Examples
37
Examples-6
  • Forward connections from one layer to another are trained by Hebbian-style learning.
  • Each cell receives 100 connections from the previous layer, with a 67% probability that a connection comes from within 4 cells of the centre of the distribution.
  • The architecture is shown below

Examples
38
Examples-7
Examples
39
Examples-8
  • The input to the model is an image of a face, which is then convolved with appropriate filters so as to detect different orientations and edges in the input image. This corresponds roughly to brain area V1.
  • The learning law that is used is a Hebbian rule with a memory trace:
  • Δw_kj(n) = η a_k(n) m_j(n)
  • m_i(n) = (1 - λ) a_i(n) + λ m_i(n-1)
  • where η is the learning rate and λ is a constant which determines the relative contribution of the memory trace and of the current activation. a_i(n) is the activation of the neuron at time n and is calculated in the usual way (see the sketch below).
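  • To make the trace rule concrete, here is a minimal Python sketch using the notation just introduced; the η and λ values and the activations are invented, and the memory trace is kept on the input side as in the formula above.

    import numpy as np

    eta = 0.1    # learning rate (eta), value chosen for illustration
    lam = 0.8    # trace constant (lambda), value chosen for illustration

    rng = np.random.default_rng(0)
    a_in = rng.random(size=(20, 5))   # invented input activations a_j(n) over 20 time steps
    a_out = rng.random(size=20)       # invented output activations a_k(n)

    w = np.zeros(5)                   # weights w_kj of one output neuron k
    m = np.zeros(5)                   # memory trace m_j of the input activations
    for n in range(20):
        m = (1 - lam) * a_in[n] + lam * m   # m_j(n) = (1 - lambda) a_j(n) + lambda m_j(n-1)
        w += eta * a_out[n] * m             # delta w_kj(n) = eta a_k(n) m_j(n)
    print(w)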

Examples
40
Examples-9
  • The model successfully recognises faces at different angles and positions in the input image. For more details, see the literature (Rolls & Treves, 1998).

Examples
41
Conclusions
  • Hebbian learning is the oldest learning law in neural networks.
  • It is used mainly to build pattern associators.
  • The original Hebb rule creates unbounded weights; for this reason there are other forms which try to correct this problem. There are also temporal forms of the Hebbian rule; a hybrid case is the memory-trace rule presented above for VisNet.
  • It has wide application in pattern association problems and in models of computational neuroscience and cognitive science.

Conclusions