Transcript and Presenter's Notes

Title: Learning Rules


1
Topic 3.
  • Learning Rules of Artificial Neural Networks.

2
Multilayer Perceptron.
  • The first layer is the input layer,
  • and the last layer is the output layer.
  • All other layers with no direct connections from
    or to the outside are called hidden layers.

3
Multilayer Perceptron.
  • The input is processed and relayed from one layer
    to the next, until the final result has been
    computed.
  • This process represents the feedforward scheme.
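A minimal sketch of this feedforward computation for a small fully
connected network; the layer sizes and the sigmoid activation
(introduced later in the slides) are illustrative assumptions:

    import numpy as np

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    def feedforward(x, weights):
        # Relay the input through every layer in turn and return the final output.
        activation = np.asarray(x, dtype=float)
        for W in weights:
            activation = sigmoid(W @ activation)
        return activation

    # Illustrative 2-3-1 network with random weights
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
    output = feedforward([0.5, -0.2], weights)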

4
Multilayer Perceptron.
  • The structural credit assignment problem: when an
    error is made at the output of a network, how is
    credit (or blame) to be assigned to neurons deep
    within the network?
  • One of the most popular techniques to train the
    hidden neurons is error backpropagation,
  • whereby the error of the output units is propagated
    back to yield estimates of how much a given
    hidden unit contributed to the output error.

5
Multilayer Perceptron.
  • The error function of the multilayer perceptron is
    the total squared error over all output units j and
    training patterns p (the factor 1/2 merely
    simplifies the derivatives):

    $E = \frac{1}{2}\sum_{p}\sum_{j}\left(d_{jp} - X_{jp}\right)^{2} = \frac{1}{2}\sum_{p}\sum_{j}e_{jp}^{2}$,

    where $d_{jp}$ is the desired and $X_{jp}$ the actual
    output of unit j for training pattern p, and
    $e_{jp} = d_{jp} - X_{jp}$ is the output error.

The best performance of the network corresponds
to the minimum of the total squared error, and
during the network training, we adjust the
weights of connections in order to get to that
minimum.
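A short sketch of computing this total squared error over a training
set; the function and argument names are illustrative, not from the
slides:

    import numpy as np

    def total_squared_error(desired, outputs):
        # E = 1/2 * sum over patterns p and output units j of (d_jp - X_jp)^2
        return 0.5 * sum(np.sum((d - x) ** 2) for d, x in zip(desired, outputs))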
6
Multilayer Perceptron.
  • The combination of weights, including those of the
    hidden neurons, which minimises the error function
    E is considered to be a solution of the multilayer
    perceptron learning problem.

7
Multilayer Perceptron.
  • The error function E of the multilayer perceptron
    depends on all the weights of connections.
  • The backpropagation algorithm looks for the
    minimum of the multi-variable error function E in
    the space of weights of connections w using the
    method of gradient descent.

8
Multilayer Perceptron.
  • Following calculus, a local minimum of a function
    of two or more variables is defined by equality
    to zero of its gradient:

    $\frac{\partial E}{\partial w_{ht}^{k}} = 0$ for every weight $w_{ht}^{k}$,

where $\frac{\partial E}{\partial w_{ht}^{k}}$ is the partial derivative of
the error function E with respect to the weight of
connection between the h-th unit in the layer k and
the t-th unit in the previous layer number k-1.
9
Multilayer Perceptron.
We would like to go in the direction opposite to the
gradient to most rapidly minimise E. Therefore,
during the iterative process of gradient descent
each weight of connection, including the hidden
ones, is updated as

    $w_{ht}^{k} \leftarrow w_{ht}^{k} + \Delta w_{ht}^{k}$

using the increment

    $\Delta w_{ht}^{k} = -C\,\frac{\partial E}{\partial w_{ht}^{k}}$,

where C represents the learning rate.
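A minimal numeric sketch of this update rule on a single weight, using
a made-up one-variable error function for illustration:

    # Gradient-descent update for one weight: w <- w - C * dE/dw
    def gradient_descent_step(w, dE_dw, C=0.1):
        return w - C * dE_dw

    # Illustrative example: E(w) = (w - 3)^2, so dE/dw = 2 * (w - 3)
    w = 0.0
    for _ in range(50):
        w = gradient_descent_step(w, 2.0 * (w - 3.0))
    # w ends up close to the minimiser w = 3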
10
Multilayer Perceptron.
Since calculus-based methods of minimisation rest
on the taking of derivatives, their application
to network training requires the error function E
to be a differentiable function.
11
Multilayer Perceptron.
Since calculus-based methods of minimisation rest
on the taking of derivatives, their application
to network training requires the error function E
to be a differentiable function, which requires the
network output Xjp to be differentiable, which
requires the activation functions f(S) to be
differentiable, where $X_{jp} = f(S_{jp})$ is the output
of unit j for pattern p computed from its weighted
input sum $S_{jp}$.
12
Multilayer Perceptron.
Since calculus-based methods of minimisation rest
on the taking of derivatives, their application
to network training requires the error function
E to be a differentiable function, which requires
the network output Xjp to be differentiable,
which requires the activation functions f(S) to
be differentiable.
This provides a powerful motivation for using
continuous and differentiable activation
functions f(S).
13
Multilayer Perceptron.
Since calculus-based methods of minimisation rest
on the taking of derivatives, their application
to network training requires the activation
functions f(S) to be differentiable.
  • To make a multilayer perceptron able to learn,
    here is a useful generic sigmoid activation
    function associated with a hidden or output
    neuron:

    $f(S) = \frac{1}{1 + e^{-S}}$,

where S is the weighted sum of the neuron's inputs.
14
Multilayer Perceptron.
Since calculus-based methods of minimisation rest
on the taking of derivatives, their application
to network training requires the activation
functions f(S) to be differentiable.
  • To make a multilayer perceptron able to learn,
    here is a useful generic sigmoid activation
    function associated with a hidden or output
    neuron:

    $f(S) = \frac{1}{1 + e^{-S}}$

The important thing about the generic sigmoid
function is that it is differentiable, with a
very simple and easy to compute derivative:

    $f'(S) = f(S)\,\bigl(1 - f(S)\bigr)$,

where S is the weighted sum of the neuron's inputs.
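A short sketch of this sigmoid and its derivative, a direct
transcription of the two formulas above:

    import math

    def sigmoid(s):
        # f(S) = 1 / (1 + e^(-S))
        return 1.0 / (1.0 + math.exp(-s))

    def sigmoid_derivative(s):
        # f'(S) = f(S) * (1 - f(S)); reuses the already computed output
        f = sigmoid(s)
        return f * (1.0 - f)

    # e.g. sigmoid(0.0) == 0.5 and sigmoid_derivative(0.0) == 0.25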
15
Multilayer Perceptron.
Since calculus-based methods of minimisation rest
on the taking of derivatives, their application
to network training requires the activation
functions f(S) to be differentiable.
  • To make a multilayer perceptron able to learn,
    here is a useful generic sigmoid activation
    function associated with a hidden or output
    neuron:

    $f(S) = \frac{1}{1 + e^{-S}}$

If all activation functions f(S) in the network
are differentiable then, according to the chain
rule of calculus, differentiating the error
function E with respect to the weight of
connection in consideration, we can express the
corresponding partial derivative of the error
function.
16
Multilayer Perceptron.
(Chain-rule expansion of the partial derivative,
carried out layer by layer.)
17
Multilayer Perceptron.
(Chain-rule expansion continued.)
18
Multilayer Perceptron.
Thus, the correction to the hidden weight of
connection between the h-th unit in the k-th layer
and the t-th unit in the previous (k-1)-th layer
can be found from the expression below.
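A hedged reconstruction of that expression, using the standard
backpropagation formulas for the squared error and sigmoid units; the
delta notation is an assumption introduced here, not taken from the
slides:

    $\Delta w^{k}_{ht} = C\,\delta^{k}_{h}\,X^{k-1}_{t}$,

    where $\delta^{k}_{h} = e_{hp}\,f'(S^{k}_{h})$ if layer k is the output layer, and
    $\delta^{k}_{h} = f'(S^{k}_{h})\,\sum_{j}\delta^{k+1}_{j}\,w^{k+1}_{jh}$ if layer k is a hidden layer.

This matches the description on the next slide: the correction collects
the output errors, the activation derivatives of the layers above, the
derivative of unit h itself, and the output of the connected unit t in
the previous layer.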
19
Multilayer Perceptron Learning rule!!!
  • The correction is defined by:
  • the output layer errors ejp,
  • the derivatives of the activation functions of all
    neurons in the upper layers with numbers p > k,
  • the derivative of the activation function of
    the neuron h itself in the layer k,
  • and the value of the activation function of the
    connected neuron t in the previous layer (k-1).

20
Multilayer Perceptron Learning rule!!!
We can easily measure the output errors of the
network, and it is up to us to define all the
activation functions. If we also know the
derivatives of the activation functions, then we
can easily find all the corrections to the weights
of connections of all neurons in the network,
including the hidden ones, during a second run
back through the network.
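A compact sketch of that second, backward run for a fully connected
network of sigmoid units; the function and variable names are
illustrative assumptions:

    import numpy as np

    def backward_pass(weights, activations, desired, C=0.5):
        # weights[k] maps layer k outputs to layer k+1 inputs;
        # activations[k] holds the outputs of layer k from the forward run.
        error = desired - activations[-1]                        # e = d - X
        delta = error * activations[-1] * (1 - activations[-1])  # e * f'(S), since f' = f(1 - f)
        corrections = []
        for k in range(len(weights) - 1, -1, -1):
            corrections.insert(0, C * np.outer(delta, activations[k]))
            if k > 0:
                # propagate the error back through the weights of layer k
                delta = (weights[k].T @ delta) * activations[k] * (1 - activations[k])
        return corrections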
21
Multilayer Perceptron Training.
The training process of the multilayer perceptron
consists of two phases. Initial values of the
weights of connections are set up randomly. Then,
during the first, feedforward phase, starting
from the input layer and proceeding layer by layer,
the outputs of every unit in the network are
computed together with the corresponding derivatives.
Figure: Directions of the two basic signal flows in a
multilayer perceptron: forward propagation of
function signals and back-propagation of error
signals.
22
Multilayer Perceptron Training.
The training process of the multilayer perceptron
consists of two phases. Initial values of the
weights of connections are set up randomly. Then,
during the first, feedforward phase, starting
from the input layer and proceeding layer by layer,
the outputs of every unit in the network are
computed together with the corresponding
derivatives. In the second, feedback phase,
corrections to all weights of connections of all
units, including the hidden ones, are computed
using the outputs and derivatives computed during
the feedforward phase.
Figure: Directions of the two basic signal flows in a
multilayer perceptron: forward propagation of
function signals and back-propagation of error
signals.
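A minimal, self-contained sketch of these two phases repeated over a
training set; the network size, data, and epoch count are made up for
illustration:

    import numpy as np

    def sigmoid(s):
        return 1.0 / (1.0 + np.exp(-s))

    def train(weights, inputs, targets, C=0.5, epochs=2000):
        # Repeats the feedforward and feedback phases over all training patterns.
        for _ in range(epochs):
            for x, d in zip(inputs, targets):
                # Feedforward phase: compute and keep every layer's outputs
                acts = [np.asarray(x, dtype=float)]
                for W in weights:
                    acts.append(sigmoid(W @ acts[-1]))
                # Feedback phase: propagate the output error back and update weights
                delta = (d - acts[-1]) * acts[-1] * (1 - acts[-1])
                for k in range(len(weights) - 1, -1, -1):
                    dW = C * np.outer(delta, acts[k])
                    if k > 0:
                        delta = (weights[k].T @ delta) * acts[k] * (1 - acts[k])
                    weights[k] += dW
        return weights

    # Illustrative use: a 2-3-1 network on a tiny made-up training set
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
    train(weights, [[0.1, 0.9], [0.8, 0.2]], [np.array([1.0]), np.array([0.0])])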
23
Multilayer Perceptron Training.
To understand the second, error back-propagation
phase of computing corrections to the weights,
let us follow an example of a small three-layer
perceptron.
24
Multilayer Perceptron Training.
To understand the second, error back-propagation
phase of computing corrections to the weights,
let us follow an example of a small three-layer
perceptron.
Suppose that we have found all outputs and
corresponding derivatives of activation functions
of all computing units including the hidden ones
in the network.
25
Multilayer Perceptron Training.
We shall mark values of the layer in consideration
with that layer's number as the upper index, and
values of the layer previous to the one in
consideration with the previous layer's number.
26
Multilayer Perceptron Training.
The weight of connection between unit number 1
(first lower index) in the output layer (layer
number 2, shown as the upper index) and unit
number 0 (second lower index) in the previous
layer (number 1 = 2-1), after presentation of a
training pattern, would have a correction

    $\Delta w^{2}_{10} = C\,e_{1}\,f'(S^{2}_{1})\,X^{1}_{0}$.
27
Multilayer Perceptron Training.
Analogously, corrections to all six weights of
connections between the output layer and the
hidden layer are obtained as

    $\Delta w^{2}_{ht} = C\,e_{h}\,f'(S^{2}_{h})\,X^{1}_{t}$

for every unit h of the output layer and every
unit t of the hidden layer.
28
Multilayer Perceptron Training. Corrections to
hidden units' connections.
We shall mark values of the layer in consideration
with that layer's number as the upper index, values
of the layer previous to the one in consideration
with the previous layer's number, and values of the
layers above the one in consideration with the
numbers of those layers.
29
Multilayer Perceptron Training. Corrections to
hidden units' connections.
The weight of connection between unit number 1
(first lower index) in the hidden layer (layer
number 1, shown as the upper index) and unit
number 0 in the previous input layer (second
lower index) would have a correction

    $\Delta w^{1}_{10} = C\,\Bigl(\sum_{j} e_{j}\,f'(S^{2}_{j})\,w^{2}_{j1}\Bigr)\,f'(S^{1}_{1})\,X^{0}_{0}$.
30
Multilayer Perceptron Training. Corrections to
hidden units' connections.
Analogously, for all six weights of connections
between the hidden layer and the input layer:

    $\Delta w^{1}_{ht} = C\,\Bigl(\sum_{j} e_{j}\,f'(S^{2}_{j})\,w^{2}_{jh}\Bigr)\,f'(S^{1}_{h})\,X^{0}_{t}$

for every unit h of the hidden layer and every
unit t of the input layer.
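A tiny numeric sketch of these per-layer corrections for an
illustrative 2-2-1 network; all weights, inputs, and the learning rate
are made up for the example:

    import numpy as np

    sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

    C = 0.5
    x = np.array([0.3, 0.7])                   # input layer outputs X^0
    W1 = np.array([[0.1, -0.4], [0.6, 0.2]])   # input -> hidden weights w^1
    W2 = np.array([[0.5, -0.3]])               # hidden -> output weights w^2
    d = np.array([1.0])                        # desired output

    h = sigmoid(W1 @ x)                        # hidden outputs X^1
    y = sigmoid(W2 @ h)                        # network output X^2
    e = d - y                                  # output error e

    delta2 = e * y * (1 - y)                   # e * f'(S^2)
    dW2 = C * np.outer(delta2, h)              # corrections to output-layer weights

    delta1 = (W2.T @ delta2) * h * (1 - h)     # error propagated back to the hidden layer
    dW1 = C * np.outer(delta1, x)              # corrections to hidden-layer weights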
31
Multilayer Perceptron Training. Corrections to
hidden units' connections.
  • In this way, going backwards through the network,
    one obtains the corrections to all weights,

32
Multilayer Perceptron Training. Corrections to
hidden units' connections.
  • In this way, going backwards through the network,
    one obtains the corrections to all weights,
  • then updates the weights.

33
Multilayer Perceptron Training. Corrections to
hidden units' connections.
  • In this way, going backwards through the network,
    one obtains the corrections to all weights,
  • then updates the weights.
  • After that, with the new weights, one goes forward
    to get new outputs,

34
Multilayer Perceptron Training. Corrections to
hidden units' connections.
  • In this way, going backwards through the network,
    one obtains the corrections to all weights,
  • then updates the weights.
  • After that, with the new weights, one goes forward
    to get new outputs,
  • finds the new error, goes backwards, and so on.

35
Multilayer Perceptron Training.
  • In this way, going backwards through the network,
    one obtains the corrections to all weights, then
    updates the weights.
  • After that, with the new weights, one goes forward
    to get new outputs,
  • finds the new error, goes backwards, and so on.
  • Hopefully, sooner or later the iterative
    procedure will arrive at an output with the minimum
    error, i.e. the absolute minimum of the error
    function E.

36
Multilayer Perceptron Training.
  • In this way, going backwards through the network,
    one obtains the corrections to all weights, then
    updates the weights.
  • After that, with the new weights, one goes forward
    to get new outputs,
  • finds the new error, goes backwards, and so on.
  • Hopefully, sooner or later the iterative
    procedure will arrive at an output with the minimum
    error, i.e. the absolute minimum of the error
    function E.
  • Unfortunately, as a function of many variables,
    the error function might have more than one
    minimum, and one may arrive not at the absolute
    minimum but at a relative one.

37
Multilayer Perceptron Training.
  • Unfortunately, as a function of many variables,
    the error function might have more than one
    minimum, and one may arrive not at the absolute
    minimum but at a relative one.
  • If this happens, the error function stops
    decreasing regardless of the number of iterations.
  • Some measures must be taken to get out of the
    relative minimum, for example, adding small random
    values, i.e. noise, to one or more of the weights,
    as in the sketch below.
  • Then the iterative procedure starts from that new
    point, to reach the absolute minimum eventually.
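A minimal sketch of that escape measure; the helper name, the noise
scale, and the plateau test are illustrative assumptions:

    import numpy as np

    def perturb_weights(weights, scale=0.05, rng=np.random.default_rng()):
        # Add small random values (noise) to the weights to leave a relative minimum
        for W in weights:
            W += rng.normal(scale=scale, size=W.shape)

    # Inside a training loop (sketch):
    # if abs(previous_error - current_error) < 1e-6:   # error has stopped decreasing
    #     perturb_weights(weights)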

38
Multilayer Perceptron Training.
  • Finally, after successful training, the perceptron
    is able to produce the desired responses to all
    input patterns of the training set.

39
Multilayer Perceptron Training.
  • Finally, after successful training, the perceptron
    is able to produce the desired responses to all
    input patterns of the training set.
  • Then all the network weights of connections are
    fixed,

40
Multilayer Perceptron Training.
  • Finally, after successful training, the perceptron
    is able to produce the desired responses to all
    input patterns of the training set.
  • Then all the network weights of connections are
    fixed,
  • and the network is presented with inputs it must
    recognise, i.e. inputs not from the training set.

41
Multilayer Perceptron Training.
  • Finally, after successful training, the perceptron
    is able to produce the desired responses to all
    input patterns of the training set.
  • Then all the network weights of connections are
    fixed,
  • and the network is presented with inputs it must
    recognise, i.e. inputs not from the training set.
  • If an input in consideration produces an output
    similar to one of the training set, such an input
    is said to belong to the same type or cluster of
    inputs as the corresponding one of the training
    set.

42
Multilayer Perceptron Training.
  • Then all the network weights of connections are
    fixed,
  • and the network is presented with inputs it must
    recognise, i.e. inputs not from the training set.
  • If an input in consideration produces an output
    similar to one of the training set, such an input
    is said to belong to the same type or cluster of
    inputs as the corresponding one of the training
    set.
  • If the network produces an output not similar to
    any of the training set, then such an input is
    said not to have been recognised.
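A small sketch of this recognition step; the Euclidean similarity
measure and the tolerance value are illustrative assumptions:

    import numpy as np

    def recognise(output, training_outputs, tolerance=0.1):
        # Return the index of the most similar training-set output,
        # or None if nothing is close enough (the input is not recognised).
        distances = [np.linalg.norm(output - t) for t in training_outputs]
        best = int(np.argmin(distances))
        return best if distances[best] <= tolerance else None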

43
Multilayer Perceptron Training. Conclusion.
  • In 1969 Minsky and Papert not only found the
    solution to the XOR problem in the form of a
    multilayer perceptron, they also gave a very
    thorough mathematical analysis of the time it
    takes to train such networks.
  • Minsky and Papert emphasized that training times
    increase very rapidly for certain problems as the
    number of input lines and weights of connections
    increases.

44
Multilayer Perceptron Training. Conclusion.
  • Minsky and Papert emphasized that training times
    increase very rapidly for certain problems as the
    number of input lines and weights of connections
    increases.
  • The difficulties were seized upon by opponents
    of the subject. In particular, this was true of
    those working in the field of artificial
    intelligence (AI), who at that time did not want
    to concern themselves with the underlying
    wetware of the brain, but only with the
    functional aspects, regarded by them solely as
    logical processing.
  • Due to the limitations of funding, the competition
    between the AI and neural network communities
    could have only one victor.

45
Multilayer Perceptron Training. Conclusion.
  • Due to the limitations of funding, the competition
    between the AI and neural network communities
    could have only one victor.
  • Neural networks then went into a relative
    quietude for more than fifteen years, with only a
    few devotees still working on them.

46
Multilayer Perceptron Training. Conclusion.
  • Due to the limitations of funding, the competition
    between the AI and neural network communities
    could have only one victor.
  • Neural networks then went into a relative
    quietude for more than fifteen years, with only a
    few devotees still working on them.
  • Then new vigour came from various sources. One
    was the increasing power of computers, allowing
    simulations of otherwise intractable problems.

47
Multilayer Perceptron Training. Conclusion.
  • New vigour came from various sources. One was
    the increasing power of computers, allowing
    simulations of otherwise intractable problems.
  • Finally, the backpropagation algorithm,
    established by the mid-1980s, solved the difficulty
    of training hidden neurons.

48
Multilayer Perceptron Training. Conclusion.
  • New vigour came from various sources. One was
    the increasing power of computers, allowing
    simulations of otherwise intractable problems.
  • Finally, the backpropagation algorithm,
    established by the mid-1980s, solved the difficulty
    of training hidden neurons.
  • Nowadays, the perceptron is an effective tool for
    recognising protein and amino-acid sequences and
    processing other complex biological data.