CS515 Neural Networks

Transcript and Presenter's Notes
1
  • Back-Propagation

2
Objectives
  • A generalization of the LMS algorithm, called
    backpropagation, can be used to train multilayer
    networks.
  • Backpropagation is an approximate steepest
    descent algorithm, in which the performance index
    is mean square error.
  • In order to calculate the derivatives, we need to
    use the chain rule of calculus.

3
Motivation
  • The perceptron learning rule and the LMS algorithm
    were designed to train single-layer
    perceptron-like networks.
  • They are only able to solve linearly separable
    classification problems.
  • Parallel Distributed Processing
  • The multilayer perceptron, trained by the
    backpropagation algorithm, is currently the most
    widely used neural network.

4
Three-Layer Network
Number of neurons in each layer
5
Pattern Classification: XOR Gate
  • The limitations of the single-layer perceptron
    (Minsky & Papert, 1969)

6
Two-Layer XOR Network
  • Two-layer, 2-2-1 network
[Diagram: the two first-layer neurons make individual decisions that are combined by an AND operation in the second layer]
7
Solved Problem P11.1
  • Design a multilayer network to distinguish these
    categories.

Class I
Class II
There is no hyperplane that can separate these
two categories.
8
Solution of Problem P11.1
[Diagram: first-layer decision boundaries combined by AND and OR operations]
9
Function Approximation
  • Two-layer, 1-2-1 network

10
Function Approximation
  • The centers of the steps occur where the net
    input to a neuron in the first layer is zero.
  • The steepness of each step can be adjusted by
    changing the network weights.
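
In the layer-superscript notation used below (the specific values on this slide are not reproduced), the step produced by the ith first-layer neuron is centered where its net input crosses zero:

\[
n_i^1 = w_{i,1}^1\, p + b_i^1 = 0 \;\Longrightarrow\; p = -\frac{b_i^1}{w_{i,1}^1},
\]

so changing the bias shifts the step while changing the weight adjusts its steepness.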

11
Effect of Parameter Changes
12
Effect of Parameter Changes
13
Effect of Parameter Changes
14
Effect of Parameter Changes
15
Function Approximation
  • Two-layer networks, with sigmoid transfer
    functions in the hidden layer and linear transfer
    functions in the output layer, can approximate
    virtually any function of interest to any degree
    of accuracy, provided sufficiently many hidden
    units are available.

16
Backpropagation Algorithm
  • For multilayer networks the outputs of one layer
    become the inputs to the following layer.
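
Written out in the standard layered-network notation (an assumption here, since the slide's equation is not reproduced), for an M-layer network:

\[
\mathbf{a}^{m+1} = \mathbf{f}^{m+1}\!\left(\mathbf{W}^{m+1}\mathbf{a}^{m} + \mathbf{b}^{m+1}\right), \quad m = 0, 1, \dots, M-1,
\]

with \(\mathbf{a}^{0} = \mathbf{p}\) (the network input) and \(\mathbf{a} = \mathbf{a}^{M}\) (the network output).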

17
Performance Index
  • Training Set
  • Mean Square Error
  • Vector Case
  • Approximate Mean Square Error
  • Approximate Steepest Descent Algorithm
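
A sketch of these definitions in symbols (standard forms; the slide's own equations are not reproduced in this transcript):

\[
F(\mathbf{x}) = E\!\left[(t-a)^2\right], \qquad
F(\mathbf{x}) = E\!\left[\mathbf{e}^{T}\mathbf{e}\right] \;\text{(vector case)},
\]
\[
\hat F(\mathbf{x}) = \bigl(\mathbf{t}(k)-\mathbf{a}(k)\bigr)^{T}\bigl(\mathbf{t}(k)-\mathbf{a}(k)\bigr) = \mathbf{e}^{T}(k)\,\mathbf{e}(k),
\]

where \(\mathbf{x}\) collects all weights and biases; the approximate steepest descent algorithm updates each parameter in the direction of \(-\partial \hat F / \partial x\), using the single-sample error at iteration k in place of the expectation.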

18
Chain Rule
  • If f(n) = e^n and n = 2w, then f(n(w)) = e^{2w} and,
    by the chain rule, df/dw = (df/dn)(dn/dw) = e^n · 2 = 2e^{2w}.
  • Approximate mean square error
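
Applied to a network weight or bias, the chain rule splits the derivative of the approximate error into two factors (standard form, reconstructed here):

\[
\frac{\partial \hat F}{\partial w_{i,j}^{m}} =
\frac{\partial \hat F}{\partial n_i^{m}} \cdot
\frac{\partial n_i^{m}}{\partial w_{i,j}^{m}},
\qquad
\frac{\partial \hat F}{\partial b_{i}^{m}} =
\frac{\partial \hat F}{\partial n_i^{m}} \cdot
\frac{\partial n_i^{m}}{\partial b_{i}^{m}}.
\]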

19
Sensitivity Gradient
  • The net input to the ith neuron of layer m
  • The sensitivity of F̂ (the approximate performance
    index) to changes in the ith element of the net
    input at layer m
  • Gradient
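
In the usual notation (assumed here, as the slide equations are not shown), these quantities are:

\[
n_i^{m} = \sum_{j} w_{i,j}^{m}\, a_j^{m-1} + b_i^{m},
\qquad
s_i^{m} \equiv \frac{\partial \hat F}{\partial n_i^{m}},
\qquad
\frac{\partial \hat F}{\partial w_{i,j}^{m}} = s_i^{m}\, a_j^{m-1},
\qquad
\frac{\partial \hat F}{\partial b_i^{m}} = s_i^{m}.
\]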

20
Steepest Descent Algorithm
  • The steepest descent algorithm for the
    approximate mean square error
  • Matrix form
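
A sketch of the updates in matrix form (α denotes the learning rate; standard form, reconstructed here):

\[
\mathbf{W}^{m}(k+1) = \mathbf{W}^{m}(k) - \alpha\, \mathbf{s}^{m} \left(\mathbf{a}^{m-1}\right)^{T},
\qquad
\mathbf{b}^{m}(k+1) = \mathbf{b}^{m}(k) - \alpha\, \mathbf{s}^{m}.
\]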

21
BP the Sensitivity
  • Backpropagation uses a recurrence relationship in
    which the sensitivity at layer m is computed from
    the sensitivity at layer m+1.
  • Jacobian matrix

22
Matrix Representation
  • The i,j element of the Jacobian matrix

23
Recurrence Relation
  • The recurrence relation for the sensitivity
  • The sensitivities are propagated backward through
    the network from the last layer to the first
    layer.
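
In matrix form (standard notation, where \(\dot{\mathbf{F}}^{m}(\mathbf{n}^{m})\) is the diagonal matrix of transfer-function derivatives at layer m):

\[
\mathbf{s}^{m} = \dot{\mathbf{F}}^{m}(\mathbf{n}^{m})\, \left(\mathbf{W}^{m+1}\right)^{T} \mathbf{s}^{m+1},
\qquad m = M-1, \dots, 2, 1.
\]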

24
Backpropagation Algorithm
  • At the final layer
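
The output-layer sensitivity follows directly from the squared-error performance index (standard form, reconstructed here):

\[
\mathbf{s}^{M} = -2\, \dot{\mathbf{F}}^{M}(\mathbf{n}^{M})\, (\mathbf{t} - \mathbf{a}).
\]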

25
Summary
  • The first step is to propagate the input forward
    through the network
  • The second step is to propagate the sensitivities
    backward through the network
  • Output layer
  • Hidden layer
  • The final step is to update the weights and
    biases (a code sketch of all three steps follows)
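
A minimal sketch of these three steps in Python/NumPy for a 1-2-1 network (log-sigmoid hidden layer, linear output layer). The initial weights, learning rate, and target function below are illustrative assumptions, not values taken from the slides.

import numpy as np

def logsig(n):
    # log-sigmoid transfer function used in the hidden layer
    return 1.0 / (1.0 + np.exp(-n))

# 1-2-1 network parameters (illustrative starting values, not the slide's)
W1 = np.array([[-0.27], [-0.41]])   # first-layer weights, 2x1
b1 = np.array([[-0.48], [-0.13]])   # first-layer biases,  2x1
W2 = np.array([[0.09, -0.17]])      # second-layer weights, 1x2
b2 = np.array([[0.48]])             # second-layer bias,    1x1
alpha = 0.1                         # learning rate (assumed)

def train_step(p, t):
    """One backpropagation iteration for a scalar input p and target t."""
    global W1, b1, W2, b2
    # Step 1: propagate the input forward through the network
    a0 = np.array([[p]])
    a1 = logsig(W1 @ a0 + b1)            # hidden-layer output
    a2 = W2 @ a1 + b2                    # linear output layer
    e = t - a2                           # error
    # Step 2: propagate the sensitivities backward
    s2 = -2.0 * e                        # output layer (purelin derivative = 1)
    F1dot = np.diagflat((1.0 - a1) * a1) # logsig derivative: (1 - a1) * a1
    s1 = F1dot @ W2.T @ s2               # hidden-layer sensitivity
    # Step 3: update the weights and biases (approximate steepest descent)
    W2 -= alpha * s2 @ a1.T
    b2 -= alpha * s2
    W1 -= alpha * s1 @ a0.T
    b1 -= alpha * s1
    return e.item()

# Example: train on an assumed target function g(p) = 1 + sin(pi*p/4)
g = lambda p: 1.0 + np.sin(np.pi * p / 4.0)
for _ in range(2000):
    p = np.random.uniform(-2.0, 2.0)
    train_step(p, g(p))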

26
BP Neural Network
27
Ex Function Approximation
[Network diagram: input p, target t, error e]
1-2-1 Network
28
Network Architecture
[Network diagram: input p, output a]
1-2-1 Network
29
Initial Values
Initial Network Response
30
Forward Propagation
Initial input
Output of the 1st layer
Output of the 2nd layer
error
31
Transfer Func. Derivatives
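
For the transfer functions used in this example, the derivatives can be expressed in terms of the layer outputs (standard results; the slide's formulas are not reproduced here):

\[
\dot f^{1}(n) = \frac{d}{dn}\!\left(\frac{1}{1+e^{-n}}\right) = \bigl(1 - a^{1}\bigr)\,a^{1},
\qquad
\dot f^{2}(n) = \frac{d}{dn}(n) = 1.
\]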
32
Backpropagation
  • The second layer sensitivity
  • The first layer sensitivity

33
Weight Update
  • Learning rate

34
Choice of Network Structure
  • Multilayer networks can be used to approximate
    almost any function, if we have enough neurons in
    the hidden layers.
  • We cannot say, in general, how many layers or how
    many neurons are necessary for adequate
    performance.

35
Illustrated Example 1
1-3-1 Network
36
Illustrated Example 2
1-2-1
1-3-1
1-5-1
1-4-1
37
Convergence
Convergence to Global Min.
Convergence to Local Min.
The numbers along each curve indicate the sequence
of iterations.
38
Generalization
  • In most cases the multilayer network is trained
    with a finite number of examples of proper
    network behavior
  • This training set is normally representative of a
    much larger class of possible input/output pairs.
  • Can the network successfully generalize what it
    has learned to the total population?

39
Generalization Example
1-9-1
1-2-1
Generalize well
Not generalize well
For a network to be able to generalize, it should
have fewer parameters than there are data points
in the training set.
40
Objectives
  • Neural networks trained in a supervised manner
    require a target signal to define correct network
    behavior.
  • The unsupervised learning rules give networks the
    ability to learn associations between patterns
    that occur together frequently.
  • Associative learning allows networks to perform
    useful tasks such as pattern recognition (instar)
    and recall (outstar).

41
What is an Association?
  • An association is any link between a system's
    input and output such that when a pattern A is
    presented to the system it will respond with
    pattern B.
  • When two patterns are linked by an association,
    the input pattern is referred to as the stimulus
    and the output pattern is referred to as the
    response.

42
Classic Experiment
  • Ivan Pavlov
  • He trained a dog to salivate at the sound of a
    bell by ringing the bell whenever food was
    presented. When the bell was repeatedly paired
    with the food, the dog became conditioned to
    salivate at the sound of the bell, even when no
    food was present.
  • B. F. Skinner
  • He trained a rat to press a bar in order to
    obtain a food pellet.

43
Associative Learning
  • Anderson and Kohonen independently developed the
    linear associator in the late 1960s and early
    1970s.
  • Grossberg introduced nonlinear continuous-time
    associative networks during the same time period.

44
Simple Associative Network
  • Single-Input Hard Limit Associator
  • Restrict the value of p to be either 0 or 1,
    indicating whether a stimulus is absent or
    present.
  • The output a indicates the presence or absence of
    the network's response.

45
Two Types of Inputs
  • Unconditioned Stimulus
  • Analogous to the food presented to the dog in
    Pavlov's experiment.
  • Conditioned Stimulus
  • Analogous to the bell in Pavlov's experiment.
  • The dog salivates only when food is presented.
    This is an innate response that does not have to
    be learned.

46
Banana Associator
  • An unconditioned stimulus (banana shape) and a
    conditioned stimulus (banana smell)
  • Initially, the network responds to the shape of a
    banana but not to its smell.

47
Associative Learning
  • Both animals and humans tend to associate things
    that occur simultaneously.
  • If a banana smell stimulus occurs simultaneously
    with a banana concept response (activated by some
    other stimulus such as the sight of a banana
    shape), the network should strengthen the
    connection between them so that later it can
    activate its banana concept in response to the
    banana smell alone.

48
Unsupervised Hebb Rule
  • Increase the weight wij between a neuron's input
    pj and output ai in proportion to their product.
  • The Hebb rule uses only signals available within
    the layer containing the weight being updated, so
    it is a local learning rule.
  • Vector form
  • Learning is performed in response to the training
    sequence
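
In equation form (a standard statement of the rule; α is the learning rate and q indexes the training sequence):

\[
w_{ij}(q) = w_{ij}(q-1) + \alpha\, a_i(q)\, p_j(q),
\qquad
\mathbf{W}(q) = \mathbf{W}(q-1) + \alpha\, \mathbf{a}(q)\, \mathbf{p}^{T}(q).
\]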

49
Ex Banana Associator
  • Initial weights
  • Training sequence
  • Learning rule

[Network diagram: sight (unconditioned stimulus) and smell (conditioned stimulus) inputs; banana response output]
50
Ex Banana Associator
  • First iteration (sight fails)

  • (no
    response)
  • Second iteration (sight works)



  • (banana)

51
Ex Banana Associator
  • Third iteration (sight fails)



  • (banana)
  • From now on, the network is capable of responding
    to bananas that are detected by either sight or
    smell. Even if both detection systems suffer
    intermittent faults, the network will be correct
    most of the time (a small simulation sketch
    follows).
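
A small simulation sketch of this example in Python. The fixed shape weight of 1, initial smell weight of 0, bias of -0.5, and learning rate of 1 are assumptions chosen to reproduce the behavior described above; they are not quoted from the slides.

import numpy as np

def hardlim(n):
    # hard-limit transfer function: 1 if n >= 0, else 0
    return 1.0 if n >= 0 else 0.0

w0 = 1.0     # weight for the unconditioned stimulus (shape), kept fixed (assumed)
w = 0.0      # weight for the conditioned stimulus (smell), learned (assumed start)
b = -0.5     # bias/threshold (assumed)
alpha = 1.0  # learning rate for the unsupervised Hebb rule (assumed)

# training sequence: (shape sensor p0, smell sensor p)
# sight fails on odd iterations and works on even ones; smell is always present
sequence = [(0, 1), (1, 1), (0, 1), (1, 1)]

for q, (p0, p) in enumerate(sequence, start=1):
    a = hardlim(w0 * p0 + w * p + b)   # network response (1 = "banana")
    w = w + alpha * a * p              # unsupervised Hebb rule update
    print(f"iteration {q}: response={int(a)}, smell weight w={w}")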

52
Problems of Hebb Rule
  • Weights will become arbitrarily large
  • Synapses cannot grow without bound.
  • There is no mechanism for weights to decrease
  • If the inputs or outputs of a Hebb network
    experience any noise, every weight will grow
    (however slowly) until the network responds to
    any stimulus.

53
Hebb Rule with Decay
  • γ, the decay rate, is a positive constant less
    than one.
  • This keeps the weight matrix from growing without
    bound. The maximum weight value, found by setting
    both ai and pj to 1 for all q and solving for the
    steady state, is determined by the decay rate γ
    (see below).
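
A standard statement of the rule and of the resulting bound (reconstructed; α is the learning rate, γ the decay rate):

\[
\mathbf{W}(q) = \mathbf{W}(q-1) + \alpha\, \mathbf{a}(q)\, \mathbf{p}^{T}(q) - \gamma\, \mathbf{W}(q-1).
\]

Setting \(a_i = p_j = 1\) for all q and solving the steady-state condition \(w_{ij}^{\max} = (1-\gamma)\,w_{ij}^{\max} + \alpha\) gives

\[
w_{ij}^{\max} = \frac{\alpha}{\gamma}.
\]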

54
Ex Banana Associator
  • First iteration (sight fails): no response
  • Second iteration (sight works): banana
  • Third iteration (sight fails): banana

55
Ex Banana Associator
Hebb Rule
Hebb with Decay
56
Prob. of Hebb Rule with Decay
  • Associations will decay away if stimuli are not
    occasionally presented.
  • If ai = 0, then wij(q) = (1 - γ) wij(q-1).
    If γ = 0.1, this reduces to wij(q) = 0.9 wij(q-1).
  • The weight decays by 10% at each iteration for
    which ai = 0 (no stimulus).

57
Instar (Recognition Network)
  • A neuron that has a vector input and a scalar
    output is referred to as an instar.
  • This neuron is capable of pattern recognition.
  • The instar is similar to the perceptron, ADALINE,
    and linear associator.

58
Instar Operation
  • Input-output expression: a = hardlim(wᵀp + b)
  • The instar is active when wᵀp ≥ -b, or
    ‖w‖ ‖p‖ cos θ ≥ -b,
    where θ is the angle between the two vectors.
  • For fixed vector lengths, the inner product is
    maximized when the angle θ is 0, i.e. when p
    points in the same direction as the weight vector.
  • Assume that all input vectors have the same
    length (norm).

59
Vector Recognition
  • If b = -‖w‖ ‖p‖, then the instar will be active
    only when θ = 0.
  • If b > -‖w‖ ‖p‖, then the instar will be active
    for a range of angles.
  • The larger the value of b, the more patterns
    there will be that can activate the instar, thus
    making it less discriminatory.

60
Instar Rule
  • Hebb rule
  • Hebb rule with decay
  • Instar rule: to address the forgetting problem, a
    decay term is added that is proportional to the
    instar's output ai(q).
  • If the decay rate is set equal to the learning
    rate α, the rule simplifies (see below).
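
A standard form of the resulting rule (reconstructed): with the decay applied only when the instar is active and the decay rate equal to the learning rate α,

\[
w_{ij}(q) = w_{ij}(q-1) + \alpha\, a_i(q)\,\bigl(p_j(q) - w_{ij}(q-1)\bigr),
\qquad
\mathbf{w}(q) = \mathbf{w}(q-1) + \alpha\, a(q)\,\bigl(\mathbf{p}(q) - \mathbf{w}(q-1)\bigr),
\]

so an active instar moves its weight vector toward the current input vector.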

61
Graphical Representation
  • For the case where the instar is active (ai = 1),
    the weight vector moves toward the input vector
    along a line between the old weight vector and p.
  • For the case where the instar is inactive
    (ai = 0), the weight vector remains unchanged.

62
Ex Orange Recognizer
  • The elements of p will be constrained to ±1 values.

63
Initialization and Training
  • Initial weights
  • The instar rule (learning rate α = 1)
  • Training sequence
  • First iteration

64
Second Training Iteration
  • Second iteration
  • The network can now recognize the orange by its
    measurements.

65
Third Training Iteration
  • Third iteration

The orange will now be detected if either set of
sensors works.
66
Kohonen Rule
  • Kohonen rule
  • Learning occurs when the neuron's index i is a
    member of the set X(q).
  • The Kohonen rule can be made equivalent to the
    instar rule by defining X(q) as the set of all i
    such that ai(q) = 1.
  • The Kohonen rule allows the weights of a neuron
    to learn an input vector and is therefore
    suitable for recognition applications.
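
In symbols (a standard form; \(\mathbf{w}_i\) denotes the weight vector of neuron i, and learning applies only to the set X(q)):

\[
\mathbf{w}_i(q) = \mathbf{w}_i(q-1) + \alpha\,\bigl(\mathbf{p}(q) - \mathbf{w}_i(q-1)\bigr), \qquad i \in X(q).
\]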

67
Outstar (Recall Network)
  • The outstar network has a scalar input and a
    vector output.
  • It can perform pattern recall by associating a
    stimulus with a vector response.

68
Outstar Operation
  • Input-output expression: a = satlins(Wp)
  • If we would like the outstar network to associate
    a stimulus (an input of 1) with a particular
    output vector a, we can set W = a.
  • If p = 1, then a = satlins(Wp) = satlins(a) = a,
    provided the elements of a are in [-1, 1].
    Hence, the pattern is correctly recalled.
  • A column of the weight matrix represents the
    pattern to be recalled.

69
Outstar Rule
  • In the instar rule, the weight decay term of the
    Hebb rule is proportional to the output of the
    network, ai.
  • In the outstar rule, the weight decay term of the
    Hebb rule is proportional to the input of the
    network, pj.
  • If the decay rate is set equal to the learning
    rate, the rule simplifies (see below).
  • Learning occurs whenever pj is nonzero (instead
    of ai). When learning occurs, column wj moves
    toward the output vector. (This is complementary
    to the instar rule.)
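
A standard form of the rule (reconstructed; β denotes the learning rate and \(\mathbf{w}_j\) the jth column of the weight matrix):

\[
w_{ij}(q) = w_{ij}(q-1) + \beta\, p_j(q)\,\bigl(a_i(q) - w_{ij}(q-1)\bigr),
\qquad
\mathbf{w}_j(q) = \mathbf{w}_j(q-1) + \beta\, p_j(q)\,\bigl(\mathbf{a}(q) - \mathbf{w}_j(q-1)\bigr).
\]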

70
Ex Pineapple Recaller
  • Any set of p0 (with ±1 values) will be copied to
    a.

71
Initialization
  • The outstar rule (learning rate β = 1)
  • Training sequence
  • Pineapple measurements

72
First Training Iteration
  • First iteration

73
Second Training Iteration
  • Second iteration
  • The network forms an association between the
    sight and the measurements.

74
Third Training Iteration
  • Third iteration
  • Even if the measurement system fails, the network
    is now able to recall the measurements of the
    pineapple when it sees one.