Title: Learning Rules
1. Topic 3.
- Learning Rules of Artificial Neural Networks.
2. Multilayer Perceptron.
- The first layer is the input layer, and the last layer is the output layer.
- All other layers, with no direct connections from or to the outside, are called hidden layers.
3. Multilayer Perceptron.
- The input is processed and relayed from one layer to the next, until the final result has been computed.
- This process represents the feedforward scheme.
4. Multilayer Perceptron.
- The structural credit assignment problem: when an error is made at the output of a network, how is credit (or blame) to be assigned to neurons deep within the network?
- One of the most popular techniques to train the hidden neurons is error backpropagation, whereby the error of the output units is propagated back to yield estimates of how much a given hidden unit contributed to the output error.
5. Multilayer Perceptron.
- The error function of the multilayer perceptron.
The best performance of the network corresponds to the minimum of the total squared error, and during network training we adjust the weights of connections in order to get to that minimum (a common way to write this error is sketched below).
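As a sketch of that total squared error, assuming X_jp denotes the actual output of output unit j for training pattern p and D_jp the corresponding desired output, the error function can be written as

E = \frac{1}{2} \sum_{p} \sum_{j} ( D_{jp} - X_{jp} )^{2},

and the output errors used later are e_{jp} = D_{jp} - X_{jp}.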
6. Multilayer Perceptron.
- The combination of the weights, including those of the hidden neurons, which minimises the error function E is considered to be a solution of the multilayer perceptron learning problem.
7. Multilayer Perceptron.
- The error function of the multilayer perceptron.
- The backpropagation algorithm looks for the minimum of the multi-variable error function E in the space of weights of connections w using the method of gradient descent.
8. Multilayer Perceptron.
- Following calculus, a local minimum of a function of two or more variables is defined by equality to zero of its gradient:
\nabla E = 0, i.e. \partial E / \partial w_{ht}^{k} = 0 for every weight,
where \partial E / \partial w_{ht}^{k} is the partial derivative of the error function E with respect to the weight of connection between the h-th unit in layer k and the t-th unit in the previous layer, number k-1.
9. Multilayer Perceptron.
- We would like to go in the direction opposite to the gradient, to most rapidly minimise E. Therefore, during the iterative process of gradient descent, each weight of connection, including the hidden ones, is updated using the increment
\Delta w_{ht}^{k} = -C \, \partial E / \partial w_{ht}^{k},
where C represents the learning rate.
10. Multilayer Perceptron.
- Since calculus-based methods of minimisation rest on the taking of derivatives, their application to network training requires the error function E to be a differentiable function.
11. Multilayer Perceptron.
- Since calculus-based methods of minimisation rest on the taking of derivatives, their application to network training requires the error function E to be a differentiable function, which requires the network output X_jp to be differentiable, which requires the activation functions f(S) to be differentiable.
12. Multilayer Perceptron.
- Since calculus-based methods of minimisation rest on the taking of derivatives, their application to network training requires the error function E to be a differentiable function, which requires the network output X_jp to be differentiable, which requires the activation functions f(S) to be differentiable.
- This provides a powerful motivation for using continuous and differentiable activation functions f(w,a).
13. Multilayer Perceptron.
- Since calculus-based methods of minimisation rest on the taking of derivatives, their application to network training requires the activation functions f(S) to be differentiable.
- To enable a multilayer perceptron to learn, a useful generic sigmoid activation function is associated with each hidden or output neuron.
14. Multilayer Perceptron.
- Since calculus-based methods of minimisation rest on the taking of derivatives, their application to network training requires the activation functions f(S) to be differentiable.
- To enable a multilayer perceptron to learn, a useful generic sigmoid activation function is associated with each hidden or output neuron.
- The important thing about the generic sigmoid function is that it is differentiable, with a very simple and easy-to-compute derivative (a common form is sketched below).
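As a sketch, assuming the generic sigmoid referred to here is the standard logistic function of the weighted sum S of a neuron's inputs, the activation function and its derivative are

f(S) = \frac{1}{1 + e^{-S}}, \qquad f'(S) = f(S) (1 - f(S)),

so the derivative is obtained directly from the neuron's already-computed output.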
15. Multilayer Perceptron.
- Since calculus-based methods of minimisation rest on the taking of derivatives, their application to network training requires the activation functions f(S) to be differentiable.
- To enable a multilayer perceptron to learn, a useful generic sigmoid activation function is associated with each hidden or output neuron.
- If all activation functions f(S) in the network are differentiable then, according to the chain rule of calculus, differentiating the error function E with respect to the weight of connection in consideration, we can express the corresponding partial derivative of the error function (a sketch of this expansion follows).
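As a sketch of that chain-rule expansion, assuming S_{h}^{k} = \sum_{t} w_{ht}^{k} X_{t}^{k-1} is the weighted input of unit h in layer k and X_{h}^{k} = f(S_{h}^{k}) is its output, the derivative factorises as

\frac{\partial E}{\partial w_{ht}^{k}} = \frac{\partial E}{\partial S_{h}^{k}} \cdot \frac{\partial S_{h}^{k}}{\partial w_{ht}^{k}} = \delta_{h}^{k} X_{t}^{k-1},

where \delta_{h}^{k} = \partial E / \partial S_{h}^{k} is the local error of unit h, obtained by propagating the output errors back from the layers above.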
18. Multilayer Perceptron.
- Thus, the correction to the hidden weight of connection between the h-th unit in the k-th layer and the t-th unit in the previous (k-1)-th layer can be found; the resulting learning rule is summarised on the next slide.
19. Multilayer Perceptron Learning rule!!!
- The correction is defined by (a formula sketch follows the list):
- the output layer errors e_jp,
- the derivatives of the activation functions of all neurons in the upper layers with numbers p > k,
- the derivative of the activation function of the neuron h itself in layer k,
- the activation (output) of the connected neuron t in the previous layer (k-1).
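As a sketch of the rule in formula form, using the notation of the earlier slides (learning rate C, output errors e_{jp}, weighted inputs S and outputs X of each unit), the correction can be written recursively as

\Delta w_{ht}^{k} = C \, \delta_{h}^{k} \, X_{t}^{k-1}, \qquad \delta_{j}^{out} = e_{jp} f'(S_{j}^{out}), \qquad \delta_{h}^{k} = f'(S_{h}^{k}) \sum_{m} \delta_{m}^{k+1} w_{mh}^{k+1},

where each hidden unit's \delta collects the \delta values of the layer above through the connecting weights, exactly as the list above describes.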
20. Multilayer Perceptron Learning rule!!!
We can easily measure the output errors of the network, and it is up to us to define all the activation functions. If we also know the derivatives of the activation functions, then we can easily find all the corrections to the weights of connections of all neurons in the network, including the hidden ones, during the second run back through the network.
21. Multilayer Perceptron Training.
The training process of a multilayer perceptron consists of two phases. Initial values of the weights of connections are set up randomly. Then, during the first, feedforward phase, starting from the input layer and proceeding layer by layer, the outputs of every unit in the network are computed together with the corresponding derivatives.
Figure: Directions of the two basic signal flows in a multilayer perceptron: forward propagation of function signals and back-propagation of error signals.
22. Multilayer Perceptron Training.
The training process of a multilayer perceptron consists of two phases. Initial values of the weights of connections are set up randomly. Then, during the first, feedforward phase, starting from the input layer and proceeding layer by layer, the outputs of every unit in the network are computed together with the corresponding derivatives. In the second, feedback phase, corrections to all weights of connections of all units, including the hidden ones, are computed using the outputs and derivatives computed during the feedforward phase (a code sketch of the two phases follows the figure).
Figure: Directions of the two basic signal flows in a multilayer perceptron: forward propagation of function signals and back-propagation of error signals.
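As a minimal sketch of these two phases, assuming a single hidden layer, logistic sigmoid activations and the squared-error function above (the layer sizes, variable names and the XOR training set here are illustrative, not from the source):

import numpy as np

def sigmoid(s):
    # generic sigmoid activation f(S) = 1 / (1 + exp(-S))
    return 1.0 / (1.0 + np.exp(-s))

# Tiny illustrative training set: the XOR problem (2 inputs, 1 output).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
D = np.array([[0.], [1.], [1.], [0.]])

rng = np.random.default_rng(0)
C = 0.5                                  # learning rate
W1 = rng.uniform(-1, 1, size=(2, 3))     # input  -> hidden weights (random start)
b1 = rng.uniform(-1, 1, size=(1, 3))     # hidden biases
W2 = rng.uniform(-1, 1, size=(3, 1))     # hidden -> output weights
b2 = rng.uniform(-1, 1, size=(1, 1))     # output bias

for epoch in range(10000):
    # Feedforward phase: compute every unit's output; the derivatives
    # f'(S) = f(S) * (1 - f(S)) follow directly from these outputs.
    H = sigmoid(X @ W1 + b1)             # hidden layer outputs
    Y = sigmoid(H @ W2 + b2)             # output layer outputs

    # Output errors e = D - Y; total squared error E = 0.5 * sum(e^2).
    e = D - Y

    # Feedback (backpropagation) phase: deltas for each layer.
    delta_out = e * Y * (1 - Y)                   # e * f'(S) at the output layer
    delta_hid = (delta_out @ W2.T) * H * (1 - H)  # propagated back to hidden units

    # Corrections: learning rate * delta * output of the previous layer.
    W2 += C * H.T @ delta_out
    b2 += C * delta_out.sum(axis=0, keepdims=True)
    W1 += C * X.T @ delta_hid
    b1 += C * delta_hid.sum(axis=0, keepdims=True)

print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))  # outputs after training

Each weight correction follows the learning rule above: the learning rate times the unit's delta times the output of the connected unit in the previous layer; occasionally a different random start (or more epochs) is needed if the descent lands in a relative minimum.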
23. Multilayer Perceptron Training.
To understand the second, error back-propagation
phase of computing corrections to the weights,
let us follow an example of a small three-layer
perceptron.
24. Multilayer Perceptron Training.
To understand the second, error back-propagation
phase of computing corrections to the weights,
let us follow an example of a small three-layer
perceptron.
Suppose that we have found all outputs and
corresponding derivatives of activation functions
of all computing units including the hidden ones
in the network.
25. Multilayer Perceptron Training.
We shall mark values of the layer in consideration with its layer number as an upper index, and values of the layer previous to the one in consideration with the previous layer's number.
26. Multilayer Perceptron Training.
The weight of connection between unit number 1 (first lower index) in the output layer (layer number 2, shown as the upper index) and unit number 0 (second lower index) in the previous layer (number 1 = 2-1), after presentation of a training pattern, would have a correction (sketched below).
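As a sketch, applying the learning rule above and assuming e_1 is the measured error of output unit 1 for the presented pattern, that correction is

\Delta w_{10}^{2} = C \, e_{1} \, f'(S_{1}^{2}) \, X_{0}^{1}.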
27. Multilayer Perceptron Training.
Analogously, corrections to all six weights of connections between the output layer and the hidden layer are obtained in the same way (the general form is sketched below).
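As a sketch, under the same assumptions, each of those corrections has the form

\Delta w_{jt}^{2} = C \, e_{j} \, f'(S_{j}^{2}) \, X_{t}^{1},

with j running over the output units and t over the units of the hidden layer.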
28. Multilayer Perceptron Training. Corrections to hidden units' connections.
We shall mark values of the layer in consideration, values of the layer previous to the one in consideration, and values of the layers above the one in consideration, each with the corresponding layer number as an upper index.
29. Multilayer Perceptron Training. Corrections to hidden units' connections.
The weight of connection between unit number 1 (first lower index) in the hidden layer (layer number 1, shown as the upper index) and unit number 0 (second lower index) in the previous, input layer would have a correction of the same form, with the output errors propagated back through the output layer.
30. Multilayer Perceptron Training. Corrections to hidden units' connections.
Analogously, corrections are obtained for all six weights of connections between the hidden layer and the input layer (the general form is sketched below).
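As a sketch, under the same assumptions, a correction to a hidden weight collects the output-layer errors back through the connecting weights:

\Delta w_{ht}^{1} = C \Big( \sum_{j} e_{j} \, f'(S_{j}^{2}) \, w_{jh}^{2} \Big) f'(S_{h}^{1}) \, X_{t}^{0},

with h running over the hidden units and t over the input units; X_{t}^{0} denotes the t-th component of the input pattern.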
31. Multilayer Perceptron Training. Corrections to hidden units' connections.
- In this way, going backwards through the network, one obtains the corrections to all weights.
32. Multilayer Perceptron Training. Corrections to hidden units' connections.
- In this way, going backwards through the network, one obtains the corrections to all weights,
- then updates the weights.
33. Multilayer Perceptron Training. Corrections to hidden units' connections.
- In this way, going backwards through the network, one obtains the corrections to all weights,
- then updates the weights.
- After that, with the new weights, go forward to get new outputs.
34. Multilayer Perceptron Training. Corrections to hidden units' connections.
- In this way, going backwards through the network, one obtains the corrections to all weights,
- then updates the weights.
- After that, with the new weights, go forward to get new outputs.
- Find the new error, go backwards, and so on.
35. Multilayer Perceptron Training.
- In this way, going backwards through the network, one obtains the corrections to all weights, then updates the weights.
- After that, with the new weights, go forward to get new outputs.
- Find the new error, go backwards, and so on.
- Hopefully, sooner or later the iterative procedure will arrive at the output with the minimum error, i.e. the absolute minimum of the error function E.
36. Multilayer Perceptron Training.
- In this way, going backwards through the network, one obtains the corrections to all weights, then updates the weights.
- After that, with the new weights, go forward to get new outputs.
- Find the new error, go backwards, and so on.
- Hopefully, sooner or later the iterative procedure will arrive at the output with the minimum error, i.e. the absolute minimum of the error function E.
- Unfortunately, as a function of many variables, the error function might have more than one minimum, and one may get not to the absolute minimum but to a relative one.
37. Multilayer Perceptron Training.
- Unfortunately, as a function of many variables, the error function might have more than one minimum, and one may get not to the absolute minimum but to a relative one.
- If this happens, the error function stops decreasing regardless of the number of iterations.
- Some measures must be taken to get out of the relative minimum of the function, for example, adding small random values, i.e. noise, to one or more of the weights (a sketch follows).
- Then the iterative procedure starts from that new point, to get to the absolute minimum eventually.
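As a minimal sketch of that measure, assuming the weights are kept in numpy arrays as in the training sketch above (the function name and noise scale are illustrative):

import numpy as np

def jitter(weights, scale=0.01, rng=np.random.default_rng()):
    # Add small random values (noise) to every weight array, to push the
    # search out of a relative (local) minimum of the error function E.
    return [w + rng.normal(0.0, scale, size=w.shape) for w in weights]

# e.g. W1, b1, W2, b2 = jitter([W1, b1, W2, b2]), then continue the iterations.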
38. Multilayer Perceptron Training.
- Finally, after successful training, the perceptron is able to produce the desired responses to all input patterns of the training set.
39. Multilayer Perceptron Training.
- Finally, after successful training, the perceptron is able to produce the desired responses to all input patterns of the training set.
- Then all the network weights of connections are fixed,
40. Multilayer Perceptron Training.
- Finally, after successful training, the perceptron is able to produce the desired responses to all input patterns of the training set.
- Then all the network weights of connections are fixed,
- and the network is presented with inputs it must recognise, i.e. inputs not from the training set.
41. Multilayer Perceptron Training.
- Finally, after successful training, the perceptron is able to produce the desired responses to all input patterns of the training set.
- Then all the network weights of connections are fixed,
- and the network is presented with inputs it must recognise, i.e. inputs not from the training set.
- If the input in consideration produces an output similar to one of the training set, such an input is said to belong to the same type or cluster of inputs as the corresponding one of the training set.
42. Multilayer Perceptron Training.
- Then all the network weights of connections are fixed,
- and the network is presented with inputs it must recognise, i.e. inputs not from the training set.
- If the input in consideration produces an output similar to one of the training set, such an input is said to belong to the same type or cluster of inputs as the corresponding one of the training set (a sketch of this decision follows).
- If the network produces an output not similar to any of the training set, then such an input is said not to have been recognised.
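As a minimal sketch of this recognition decision, assuming the network's outputs for the training patterns have been stored and using a hypothetical similarity tolerance (both the function and the tolerance are illustrative, not from the source):

import numpy as np

def recognise(output, training_outputs, tol=0.1):
    # Compare the output for a new input with the outputs for the training
    # patterns; return the index of the most similar one, or None if no
    # training output is similar enough (input not recognised).
    dists = [float(np.max(np.abs(output - t))) for t in training_outputs]
    best = int(np.argmin(dists))
    return best if dists[best] <= tol else None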
43. Multilayer Perceptron Training. Conclusion.
- In 1969 Minsky and Papert not only found the solution to the XOR problem in the form of a multilayer perceptron, they also gave a very thorough mathematical analysis of the time it takes to train such networks.
- Minsky and Papert emphasized that training times increase very rapidly for certain problems as the number of input lines and weights of connections increases.
44. Multilayer Perceptron Training. Conclusion.
- Minsky and Papert emphasized that training times increase very rapidly for certain problems as the number of input lines and weights of connections increases.
- The difficulties were seized upon by opponents of the subject. In particular, this was true of those working in the field of artificial intelligence (AI), who at that time did not want to concern themselves with the underlying wetware of the brain, but only with the functional aspects, regarded by them solely as logical processing.
- Due to the limitations of funding, competition between the AI and neural network communities could have only one victor.
45. Multilayer Perceptron Training. Conclusion.
- Due to the limitations of funding, competition between the AI and neural network communities could have only one victor.
- Neural networks then went into a relative quietude for more than fifteen years, with only a few devotees still working on them.
46. Multilayer Perceptron Training. Conclusion.
- Due to the limitations of funding, competition between the AI and neural network communities could have only one victor.
- Neural networks then went into a relative quietude for more than fifteen years, with only a few devotees still working on them.
- Then new vigour came from various sources. One was the increasing power of computers, allowing simulations of otherwise intractable problems.
47. Multilayer Perceptron Training. Conclusion.
- New vigour came from various sources. One was the increasing power of computers, allowing simulations of otherwise intractable problems.
- Finally, the backpropagation algorithm, established by the mid-80s, solved the difficulty of training hidden neurons.
48. Multilayer Perceptron Training. Conclusion.
- New vigour came from various sources. One was the increasing power of computers, allowing simulations of otherwise intractable problems.
- Finally, the backpropagation algorithm, established by the mid-80s, solved the difficulty of training hidden neurons.
- Nowadays, the perceptron is an effective tool for recognising protein and amino-acid sequences and processing other complex biological data.