Self Organization: Hebbian Learning
1
Self Organization: Hebbian Learning
  • CS/CMPE 333 Neural Networks

2
Introduction
  • So far, we have studied neural networks that
    learn from their environment in a supervised
    manner
  • Neural networks can also learn in an unsupervised
    manner; this is known as self-organized learning
  • Self-organized learning discovers significant
    features or patterns in the input data through
    general rules that operate locally
  • Self-organizing networks typically consist of two
    layers with feedforward connections and elements
    to facilitate local learning

3
Self-Organization
  • Global order can arise from local interactions
    (Turing, 1952)
  • An input signal produces certain activity patterns
    in the network <-> weights are modified (feedback
    loop)
  • Principles of self-organization:
  • Modifications in weights tend to self-amplify
  • Limitation of resources leads to competition,
    selection of the most active synapses, and
    disregard of less active synapses
  • Modifications in weights tend to cooperate

4
Hebbian Learning
  • A self-organizing principle was proposed by Hebb
    in 1949 in the context of biological neurons
  • Hebb's principle:
  • When a neuron repeatedly excites another neuron,
    the threshold of the latter neuron is decreased,
    or the synaptic weight between the neurons is
    increased, in effect increasing the likelihood
    that the second neuron will fire
  • Hebbian learning rule:
  • Δw_ji = η y_j x_i
  • No desired or target signal is required in the
    Hebbian rule, hence it is unsupervised learning
  • The update rule is local to the weight
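
A minimal Python/NumPy sketch of the Hebbian update Δw_ji = η y_j x_i for a single linear neuron; the variable names and learning rate are illustrative:

```python
import numpy as np

def hebbian_update(w, x, eta=0.01):
    """One Hebbian step for a single linear neuron: w <- w + eta * y * x."""
    y = w @ x                  # post-synaptic activity (linear activation)
    return w + eta * y * x     # local, unsupervised update: uses only x and y

# toy usage
rng = np.random.default_rng(0)
w = rng.normal(size=3)
x = rng.normal(size=3)
w = hebbian_update(w, x)
```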

5
Hebbian Update
  • Consider the update of a single weight w (x and y
    are the pre- and post-synaptic activities)
  • w(n+1) = w(n) + η x(n) y(n)
  • For a linear activation function y(n) = w(n) x(n):
  • w(n+1) = w(n) [1 + η x²(n)]
  • Weights increase without bound. If the initial
    weight is negative, it grows more negative; if it
    is positive, it grows more positive
  • Hebbian learning is intrinsically unstable, unlike
    error-correction learning with the BP algorithm
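
A small numeric illustration of this instability (the learning rate and data are arbitrary): under the simple Hebbian rule with a linear activation, the weight norm keeps growing:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)
eta = 0.05
for n in range(2001):
    x = rng.normal(size=3)             # zero-mean random inputs
    w += eta * (w @ x) * x             # simple Hebbian step
    if n % 500 == 0:
        print(n, np.linalg.norm(w))    # the norm grows without bound
```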

6
Geometric Interpretation of Hebbian Learning
  • Consider a single linear neuron with p inputs
  • y = w^T x = x^T w
  • and
  • Δw = η [x_1 y, x_2 y, ..., x_p y]^T = η y x
  • The dot product can be written as
  • y = ||w|| ||x|| cos(α)
  • α = angle between vectors x and w
  • If α is zero (x and w are aligned), y is large; if
    α is 90° (x and w are orthogonal), y is zero
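
A quick numeric check of this relation (the vectors are arbitrary): the output equals ||w|| ||x|| cos(α), so an input aligned with the weight vector gives a large response:

```python
import numpy as np

w = np.array([1.0, 2.0, 0.5])
x = np.array([0.8, 1.9, 0.6])              # roughly aligned with w

y = w @ x                                  # neuron output
cos_a = y / (np.linalg.norm(w) * np.linalg.norm(x))
print(y, cos_a)                            # large y, cos(a) close to 1
```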

7
Similarity Measure
  • A network trained with Hebbian learning creates a
    similarity measure (the inner product) in its
    input space according to the information
    contained in the weights
  • The weights capture (memorize) the information
    in the data during training
  • During operation, when the weights are fixed, a
    large output y signifies that the present input
    is "similar" to the inputs x that created the
    weights during training
  • Similarity measures:
  • Hamming distance
  • Correlation
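
Small sketches of these similarity measures (the function names are illustrative): the inner product computed by the trained neuron, Hamming distance for binary patterns, and correlation:

```python
import numpy as np

def inner_product_similarity(w, x):
    return float(np.dot(w, x))            # what the trained linear neuron computes

def hamming_distance(a, b):
    # number of mismatching positions, for binary patterns
    return int(np.sum(np.asarray(a) != np.asarray(b)))

def correlation(a, b):
    # Pearson correlation coefficient between two patterns
    return float(np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1])
```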

8
Hebbian Learning as Correlation Learning
  • Hebbian learning (pattern-by-pattern mode):
  • Δw(n) = η y(n) x(n) = η x(n) x^T(n) w(n)
  • Using batch mode:
  • Δw = η Σ_{n=1}^{N} x(n) x^T(n) w(0)
  • The term Σ_{n=1}^{N} x(n) x^T(n) is a sample
    approximation of the autocorrelation of the input
    data
  • Thus Hebbian learning can be thought of as learning
    the autocorrelation of the input space
  • Correlation is a well-known operation in signal
    processing and statistics. In particular, it
    completely describes signals with Gaussian
    distributions
  • Applications in signal processing
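
A short NumPy sketch of the batch-mode view above: the summed update applies a sample estimate of the input autocorrelation matrix to the initial weights (data and constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))        # N = 1000 samples x(n) of a 3-d input (rows)
eta = 0.001
w0 = rng.normal(size=3)               # initial weight vector w(0)

R_hat = X.T @ X                       # sum_n x(n) x(n)^T: sample autocorrelation (unnormalized)
delta_w = eta * R_hat @ w0            # batch-mode Hebbian update applied to w(0)
```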

9
Oja's Rule
  • The simple Hebbian rule causes the weights to
    increase (or decrease) without bound
  • The weights can be normalized to unit length as
  • w_ji(n+1) = [w_ji(n) + η x_i(n) y_j(n)] /
    √( Σ_i [w_ji(n) + η x_i(n) y_j(n)]² )
  • This equation effectively imposes the constraint
    that the sum of squared weights at a neuron (the
    norm of the weight vector) be equal to 1
  • Oja approximated the normalization (for small η)
    as
  • w_ji(n+1) = w_ji(n) + η y_j(n) [x_i(n) − y_j(n) w_ji(n)]
  • This is Oja's rule, or the generalized Hebbian
    rule
  • It involves a forgetting term that prevents the
    weights from growing without bound
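
A minimal sketch of Oja's rule for a single linear neuron (the data, seed and learning rate are illustrative); the y·w term is the forgetting term that keeps the weight norm near one:

```python
import numpy as np

def oja_step(w, x, eta=0.01):
    """One step of Oja's rule: w <- w + eta * y * (x - y * w)."""
    y = w @ x
    return w + eta * y * (x - y * w)       # y*w is the forgetting (normalizing) term

rng = np.random.default_rng(0)
w = rng.normal(size=3)
for _ in range(5000):
    x = rng.multivariate_normal(np.zeros(3), np.diag([3.0, 1.0, 0.5]))
    w = oja_step(w, x)
print(np.linalg.norm(w))                   # stays close to 1
```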

10
Oja's Rule: Geometric Interpretation
  • The simple Hebbian rule finds the weight vector
    direction of largest variance in the input data
  • However, the magnitude of the weight vector
    increases without bound
  • Oja's rule has a similar interpretation:
    normalization only changes the magnitude, while
    the direction of the weight vector is the same
  • The magnitude is equal to one
  • Oja's rule converges asymptotically, unlike the
    Hebbian rule, which is unstable

11
(No Transcript)
12
The Maximum Eigenfilter
  • A linear neuron trained with Oja's rule produces a
    weight vector that is the principal eigenvector of
    the input autocorrelation matrix, and the variance
    of its output equals the largest eigenvalue
  • A linear neuron trained with Oja's rule solves the
    following eigenvalue problem:
  • R e_1 = λ_1 e_1
  • R = autocorrelation matrix of the input data
  • e_1 = principal eigenvector, which corresponds to
    the weight vector w obtained by Oja's rule
  • λ_1 = largest eigenvalue, which corresponds to the
    variance of the network's output
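
A sketch (with synthetic data and illustrative constants) checking the eigenfilter property: the Oja weight vector aligns with the principal eigenvector of R, and the output variance approaches λ_1:

```python
import numpy as np

rng = np.random.default_rng(0)
C = np.array([[4.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 0.5]])
X = rng.multivariate_normal(np.zeros(3), C, size=20000)   # zero-mean input data

# train a single linear neuron with Oja's rule
w = rng.normal(size=3)
eta = 0.005
for x in X:
    y = w @ x
    w += eta * y * (x - y * w)

R = X.T @ X / len(X)                      # sample autocorrelation matrix
vals, vecs = np.linalg.eigh(R)
e1 = vecs[:, -1]                          # principal eigenvector (largest eigenvalue)
print(abs(w @ e1))                        # close to 1: w aligns with e1
print(np.var(X @ w), vals[-1])            # output variance is close to lambda_1
```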

13
Principal Component Analysis (1)
  • Oja's rule, when applied to a single neuron,
    extracts the principal component of the input
    space in the form of the weight vector
  • How can we find other components in the input
    space with significant variance?
  • In statistics, PCA is used to obtain the
    significant components of data in the form of
    orthogonal principal axes
  • PCA is also known as the Karhunen-Loève (K-L)
    transform in signal processing
  • First proposed in 1901. Later developments
    occurred in the 1930s, 1940s and 1960s
  • A Hebbian network with Oja's rule can perform PCA

14
Principal Component Analysis (2)
  • PCA:
  • Consider a set of vectors x with zero mean and
    unit variance. There exists an orthogonal
    transformation y = Q^T x such that the covariance
    matrix of y, Λ = E[y y^T], is diagonal:
  • Λ_ij = λ_i if i = j and Λ_ij = 0 otherwise
    (diagonal matrix)
  • λ_1 > λ_2 > ... > λ_p are the eigenvalues of the
    covariance matrix of x, C = E[x x^T]
  • Columns of Q are the corresponding eigenvectors
  • The components of y are the principal components;
    the first has the maximum variance among all
    components
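
A compact NumPy sketch of classical PCA as summarized above: eigendecompose the covariance of zero-mean data and project with y = Q^T x (the data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
C_true = np.array([[3.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 0.4]])
X = rng.multivariate_normal(np.zeros(3), C_true, size=5000)
X = X - X.mean(axis=0)                  # enforce zero mean

C = X.T @ X / len(X)                    # covariance matrix C = E[x x^T]
vals, Q = np.linalg.eigh(C)             # columns of Q are eigenvectors
order = np.argsort(vals)[::-1]          # sort so that lambda_1 > lambda_2 > ...
vals, Q = vals[order], Q[:, order]

Y = X @ Q                               # y = Q^T x for every sample (rows)
print(np.cov(Y.T).round(2))             # approximately diag(lambda_1, ..., lambda_p)
```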

15
PCA Example
16
Hebbian Network for PCA
17
Hebbian Network for PCA
  • Procedure:
  • Use Oja's rule to find the principal component
  • Project the data onto the subspace orthogonal to
    the principal component
  • Use Oja's rule on the projected data to find the
    next major component
  • Repeat the above for m < p (m = number of desired
    components, p = input space dimensionality)
  • How to find the projection onto the orthogonal
    direction?
  • Deflation method: subtract the principal component
    from the input
  • Oja's rule can be modified to perform this
    operation: Sanger's rule
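
A sketch of this deflation procedure in NumPy (the oja_fit helper and its settings are illustrative, not from the slides): fit one direction with Oja's rule, subtract its projection from the data, and repeat:

```python
import numpy as np

def oja_fit(X, eta=0.005, epochs=5, seed=0):
    """Fit one principal direction with Oja's rule (illustrative helper)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    for _ in range(epochs):
        for x in X:
            y = w @ x
            w += eta * y * (x - y * w)
    return w / np.linalg.norm(w)

def pca_by_deflation(X, m):
    """Extract m components via Oja's rule plus deflation."""
    components = []
    Xd = X.copy()
    for _ in range(m):
        w = oja_fit(Xd)
        components.append(w)
        Xd = Xd - np.outer(Xd @ w, w)     # deflation: remove the found component
    return np.array(components)
```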

18
Sanger's Rule
  • Sanger's rule is a modification of Oja's rule
    that implements the deflation method for PCA
  • Classical PCA involves matrix operations
  • Sanger's rule implements PCA in an iterative
    fashion for neural networks
  • Consider p inputs and m outputs, where m < p
  • y_j(n) = Σ_{i=1}^{p} w_ji(n) x_i(n),  j = 1, ..., m
  • and the update (Sanger's rule):
  • Δw_ji(n) = η y_j(n) [x_i(n) − Σ_{k=1}^{j} w_ki(n) y_k(n)]
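
A minimal sketch of Sanger's rule with m outputs and p inputs; W is an m-by-p weight matrix whose row j holds w_j (the names, data and constants are illustrative):

```python
import numpy as np

def sanger_step(W, x, eta=0.001):
    """Sanger update: dw_ji = eta * y_j * (x_i - sum_{k<=j} w_ki * y_k)."""
    y = W @ x                              # y_j = sum_i w_ji x_i, j = 1..m
    LT = np.tril(np.outer(y, y))           # lower-triangular part of y y^T (k <= j)
    return W + eta * (np.outer(y, x) - LT @ W)

# illustrative usage: extract m = 2 components of 3-d data
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(3), np.diag([4.0, 2.0, 0.5]), size=20000)
W = 0.1 * rng.normal(size=(2, 3))
for x in X:
    W = sanger_step(W, x)
print(W.round(2))          # rows approach the two leading eigenvectors (up to sign)
```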

19
PCA for Feature Extraction
  • PCA is the optimal linear feature extractor. This
    means that no other linear system can provide
    better features for reconstruction
  • PCA may or may not be the best preprocessing for
    pattern classification or recognition.
    Classification requires good discrimination, which
    PCA might not be able to provide
  • Feature extraction: transform the p-dimensional
    input space to an m-dimensional space (m < p),
    such that the m dimensions capture the information
    with minimal loss
  • The error e in the reconstruction is given by
  • e² = Σ_{i=m+1}^{p} λ_i
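
A numeric check (synthetic data, illustrative constants) that the mean squared reconstruction error with the first m components kept is the sum of the discarded eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(4), np.diag([5.0, 2.0, 1.0, 0.3]), size=50000)
X = X - X.mean(axis=0)

vals, Q = np.linalg.eigh(np.cov(X.T))
order = np.argsort(vals)[::-1]
vals, Q = vals[order], Q[:, order]

m = 2
Qm = Q[:, :m]                                   # keep the m leading eigenvectors
X_rec = (X @ Qm) @ Qm.T                         # project to m dims, then reconstruct
mse = np.mean(np.sum((X - X_rec) ** 2, axis=1))
print(mse, vals[m:].sum())                      # both close to lambda_3 + lambda_4
```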

20
PCA for Data Compression
  • PCA identifies an orthogonal coordinate system
    for the input data such that the variance of the
    projection on the principal axis is largest,
    followed by the next major axis, and so on
  • By discarding some of the minor components, PCA
    can be used for data compression, where a
    p-dimensional input is encoded in an m < p
    dimensional space
  • Weights are computed by Sanger's rule on typical
    inputs
  • The de-compressor (receiver) must know the
    weights of the network to reconstruct the
    original signal
  • x̂ = W^T y
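
A sketch of this compression scheme (all names are illustrative; the weights here come from an eigendecomposition, though Sanger's rule could supply them as on the slide): the sender transmits y = W x and the receiver reconstructs x̂ = W^T y:

```python
import numpy as np

def fit_compressor(X, m):
    """Learn W (m x p) from typical inputs; rows are the m leading eigenvectors."""
    vals, Q = np.linalg.eigh(np.cov((X - X.mean(axis=0)).T))
    order = np.argsort(vals)[::-1][:m]
    return Q[:, order].T                        # W, shape (m, p)

def compress(W, x):
    return W @ x                                # y = W x: m numbers instead of p

def decompress(W, y):
    return W.T @ y                              # x_hat = W^T y

# usage sketch
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(5), np.diag([6.0, 3.0, 1.0, 0.2, 0.1]), size=10000)
W = fit_compressor(X, m=2)
x_hat = decompress(W, compress(W, X[0]))        # approximate reconstruction of X[0]
```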

21
PCA for Classification (1)
  • Can PCA enhance classification?
  • In general, no. PCA is good for reconstruction,
    not for feature discrimination or classification

22
PCA for Classification (2)
23
PCA: Some Remarks
  • Practical uses of PCA:
  • Data compression
  • Cluster analysis
  • Feature extraction
  • Preprocessing for classification/recognition
    (e.g. preprocessing for MLP training)
  • Biological basis:
  • It is unlikely that the processing performed by
    biological neurons in, say, perception, involves
    only PCA. More complex feature extraction
    processes are involved

24
Anti-Hebbian Learning
  • Modify the Hebbian rule as:
  • Δw_ji(n) = −η x_i(n) y_j(n)
  • The anti-Hebbian rule finds the direction in space
    that has the minimum variance. In other words, it
    is the complement of the Hebbian rule
  • Anti-Hebbian learning performs decorrelation: it
    decorrelates the output from the input
  • The Hebbian rule is unstable, since it tries to
    maximize the variance. The anti-Hebbian rule, on
    the other hand, is stable and converges
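
A minimal sketch of the anti-Hebbian update, i.e. the Hebbian step with its sign flipped (names and constants are illustrative):

```python
import numpy as np

def anti_hebbian_step(w, x, eta=0.01):
    """Anti-Hebbian update: w <- w - eta * y * x (Hebbian rule with the sign flipped)."""
    y = w @ x                   # post-synaptic activity
    return w - eta * y * x      # correlated input/output activity weakens the weight

# toy usage
rng = np.random.default_rng(0)
w = rng.normal(size=3)
for _ in range(1000):
    w = anti_hebbian_step(w, rng.normal(size=3))
```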