1
Artificial Intelligence Methods
  • Neural Networks
  • Lecture 4
  • Rakesh K. Bissoondeeal

2
Learning in Multilayer Networks
  • Backpropagation Learning
  • A multilayer neural network trained using the
    backpropagation learning algorithm is one of the
    most powerful supervised neural network systems.
  • The training of such a network involves three
    stages (sketched in code below):
  • 1) the feedforward of the input training
    pattern,
  • 2) the calculation and backpropagation of the
    associated error,
  • 3) the adjustment of the weights.
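As a rough sketch of how the three stages fit together (Python, with hypothetical helper names that are not from the lecture):

```python
# Sketch of the three-stage backpropagation training loop.
# feedforward, backpropagate, update_weights and stopping_condition_met
# are hypothetical helpers; the following slides fill in their details.
max_epochs = 100                          # assumed training cap
for epoch in range(max_epochs):
    for x, d in training_set:             # one input pattern and its target
        y, z = feedforward(x)             # 1) feedforward of the input pattern
        deltas = backpropagate(z, d, y)   # 2) compute and backpropagate the error
        update_weights(x, y, deltas)      # 3) adjust the weights
    if stopping_condition_met():          # tested once per epoch (slide 10)
        break
```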

3
Architecture of Network
  • In a typical multilayer network, the input units
    (Xi) are fully connected to all hidden layer
    units (Yj) and the hidden layer units are fully
    connected to all output layer units (Zk).

4
Architecture of Network
  • Each of the connections between the input to
    hidden and hidden to output layer units has an
    associated weight attached to it (Wij or Vjk).
  • The hidden and output layer units also receive
    signals from weighted connections (bias) from
    units whose values are always 1.

5
Architecture of Network
  • Activation Functions
  • The choice of activation function to use in a
    backpropagation network is limited to functions
    that are continuous, differentiable and
    monotonically non-decreasing.
  • Furthermore, for computational efficiency, it is
    desirable that its derivative is easy to compute.
    Usually the function is also expected to
    saturate, i.e. approach finite maximum and
    minimum values asymptotically.
  • One of the most typical activation functions used
    is the binary sigmoidal function
  • f(x) = 1 / (1 + exp(-x))
  • where the derivative is given by f'(x) = f(x)[1
    - f(x)]
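A minimal Python sketch of this function and its derivative:

```python
import numpy as np

def sigmoid(x):
    # Binary sigmoid f(x) = 1 / (1 + exp(-x)); saturates at 0 and 1.
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # f'(x) = f(x) * (1 - f(x)) -- cheap to evaluate once f(x) is known.
    fx = sigmoid(x)
    return fx * (1.0 - fx)
```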

6
Backpropagation Learning Algorithm
  • During the feedforward phase, each of the input
    units (Xi) is set to its given input pattern
    value
  • Xi = inputi
  • Each input unit is then multiplied by the weight
    of its connection. The weighted inputs are then
    fed into the hidden units (Y1 to Yj).
  • Each hidden unit then sums the incoming signals
    and applies an activation function to produce an
    output: Yj = f(bj + Σi XiWij)
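A minimal sketch of this step, assuming illustrative layer sizes and starting values (none of these numbers come from the lecture):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumed sizes: 3 input units, 4 hidden units.
X = np.array([0.5, -1.2, 0.3])                 # input pattern values Xi
W = np.random.uniform(-0.5, 0.5, size=(3, 4))  # input-to-hidden weights Wij
b_hidden = np.full(4, 0.1)                     # bias weights (bias unit value is 1)

Y = sigmoid(b_hidden + X @ W)                  # hidden-unit outputs Yj
```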

7
Backpropagation Learning Algorithm
  • Each of the outputs of the hidden units is then
    multiplied by the weight of its connection and
    the weighted signals are fed into the output
    units (Z1 - Zk).
  • Each output unit then sums the incoming signals
    from the hidden units and applies an activation
    function to form the response of the net for a
    given input pattern.
  • Zk = f(bk + Σj YjVjk)
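Continuing the previous sketch, the output-layer step looks like this (the hidden outputs Y are illustrative stand-ins):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

Y = np.array([0.2, 0.4, 0.7, 0.1])             # hidden outputs from the previous step
V = np.random.uniform(-0.5, 0.5, size=(4, 2))  # hidden-to-output weights Vjk
b_out = np.full(2, 0.1)                        # output bias weights

Z = sigmoid(b_out + Y @ V)                     # network response Zk for this pattern
```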

8
Backpropagation Learning Algorithm
  • Backpropagation of errors
  • During training, each output unit then compares
    its output (Zk) with the required target value
    (dk) to determine the associated error for that
    pattern. Based on this error, a factor δk is
    computed that is used to distribute the error at
    Zk back to all units in the previous layer.
  • δk = f'(Zk)(dk - Zk)
  • Each hidden unit then computes a similar factor
    δj: the weighted sum of the delta terms
    backpropagated from the output units, multiplied
    by the derivative of the activation function for
    that unit.
  • δj = f'(Yj) Σk δkVjk
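A sketch of both delta computations, using the fact (slide 5) that with the binary sigmoid f'(x) = f(x)[1 - f(x)], so the derivative can be written directly from the unit outputs; all values are illustrative:

```python
import numpy as np

Z = np.array([0.7, 0.2])                  # output responses Zk
d = np.array([1.0, 0.0])                  # target values dk
Y = np.array([0.2, 0.4, 0.7, 0.1])        # hidden outputs Yj
V = np.random.uniform(-0.5, 0.5, (4, 2))  # hidden-to-output weights Vjk

delta_k = Z * (1 - Z) * (d - Z)           # output deltas: f'(Zk)(dk - Zk)
delta_j = Y * (1 - Y) * (V @ delta_k)     # hidden deltas: f'(Yj) * sum_k delta_k*Vjk
```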

9
Weight adjustment
  • After all the delta terms have been calculated,
    each hidden and output layer unit updates its
    connection weights and bias weights accordingly.
  • Output layer
  • bk(new) = bk(old) + ηδk
  • Vjk(new) = Vjk(old) + ηδkYj
  • Hidden layer
  • bj(new) = bj(old) + ηδj
  • Wij(new) = Wij(old) + ηδjXi
  • where η is a learning rate coefficient that is
    given a value between 0 and 1 at the start of
    training.
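A sketch of these rules as an in-place update function (the name and the default learning rate are assumptions):

```python
import numpy as np

def update_weights(W, b_hidden, V, b_out, X, Y, delta_j, delta_k, eta=0.2):
    # Apply the backpropagation update rules in place, one pattern at a time.
    V += eta * np.outer(Y, delta_k)    # Vjk(new) = Vjk(old) + eta*delta_k*Yj
    b_out += eta * delta_k             # bk(new)  = bk(old)  + eta*delta_k
    W += eta * np.outer(X, delta_j)    # Wij(new) = Wij(old) + eta*delta_j*Xi
    b_hidden += eta * delta_j          # bj(new)  = bj(old)  + eta*delta_j
```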

10
Test stopping condition
  • After each epoch of training (one epoch = one
    cycle through the entire training set) the
    performance of the network is measured by
    computing the average root mean square (RMS)
    error of the network for all of the patterns in
    the training set and for all of the patterns in a
    validation set; these two sets are disjoint.
  • Training is terminated when the RMS value for the
    training set is continuing to decrease but the
    RMS value for the validation set is starting to
    increase. This prevents the network from being
    OVERTRAINED (i.e. memorising the training set)
    and ensures that the ability of the network to
    GENERALISE (i.e. correctly classify non-trained
    patterns) will be at its maximum.
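A sketch of this stopping test; train_one_epoch, predict and the data variables are hypothetical stand-ins:

```python
import numpy as np

def rms_error(targets, outputs):
    # Root-mean-square error over all patterns in a set.
    return np.sqrt(np.mean((targets - outputs) ** 2))

max_epochs = 500                       # assumed cap
best_val_rms = float("inf")
for epoch in range(max_epochs):
    train_one_epoch(training_set)      # one full cycle through the training set
    val_rms = rms_error(val_targets, predict(val_inputs))
    if val_rms > best_val_rms:
        break                          # validation error rising: stop training
    best_val_rms = val_rms
```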

11
A simple example of overfitting (overtraining)
  • Which model is better?
  • The complicated model fits the data better.
  • But it is not economical
  • A model is convincing when it fits a lot of data
    surprisingly well.

12
Validation
[Figure: error E against amount of training / parameter adjustment, with curves for the training and validation sets; training is stopped at the point where the validation error starts to rise.]
13
Problems with basic Backpropagation
  • One of the problems with the basic
    backpropagation algorithm is that it is possible
    for the network to get stuck in a local minimum
    area on the error surface rather than in the
    desired global minimum.
  • The weight updating therefore ceases in a local
    minimum and the network becomes trapped because
    it cannot alter the weights to get out of the
    local minimum.

14
Local Minima
[Figure: error surface showing a local minimum and the global minimum.]
15
Backpropagation with Momentum
  • One solution to the problems with the basic
    backpropagation algorithm is to use a slightly
    modified weight updating procedure. In
    backpropagation with momentum, the weight change
    is in a direction that is a combination of the
    current error gradient and the previous error
    gradient.
  • The modified weight updating procedures are
  • Wij(t+1) = Wij(t) + ηδjXi + α[Wij(t) - Wij(t-1)]
  • Vjk(t+1) = Vjk(t) + ηδkYj + α[Vjk(t) - Vjk(t-1)]
  • where α is a momentum term coefficient that is
    given a value between 0 and 1 at the start of
    training.
  • The use of the extra momentum term can help the
    network to climb out of local minima and can
    also help speed up the network training.
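A sketch of one momentum update for the input-to-hidden weights (the function name and signature are assumptions):

```python
import numpy as np

def momentum_step(W, W_prev, X, delta_j, eta=0.2, alpha=0.9):
    # New change = current gradient step + alpha * previous weight change.
    W_new = W + eta * np.outer(X, delta_j) + alpha * (W - W_prev)
    return W_new, W                    # the current W becomes W_prev next time
```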

16
Momentum
  • Adds a percentage of the last movement to the
    current movement

17
Choice of Parameters
  • Initial weight set
  • Normally, the network weights are initialised to
    small random values before training is started.
    However, the choice of starting weight set can
    affect whether or not the network can find the
    global error minimum. This is due to the presence
    of local minima within the error surface. Some
    starting weight sets may therefore set the
    network off on a path that leads to a given local
    minimum whilst other starting weight sets avoid
    the local minimum.
  • It may therefore be necessary for several
    training runs to be performed using different
    random starting weight sets in order to determine
    whether or not the network has achieved the
    desired global minimum.
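A sketch of such a multi-start procedure; init_weights, train and validation_error are hypothetical helpers:

```python
import numpy as np

best_weights, best_err = None, float("inf")
for run in range(5):                      # assumed number of restarts
    rng = np.random.default_rng(seed=run)
    weights = init_weights(rng)           # e.g. small uniform values in [-0.5, 0.5]
    weights = train(weights)
    err = validation_error(weights)
    if err < best_err:                    # keep the run closest to the global minimum
        best_weights, best_err = weights, err
```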

18
Choice of Parameters
  • Number of hidden neurons
  • Usually determined by experimentation
  • Too many - the network will memorise the
    training set and will not generalise well
  • Too few - the network may not be able to learn
    the patterns in the training set
  • Learning rate
  • Value between 0 and 1
  • Too low - training will be very slow
  • Too high - the network may never reach the
    global minimum
  • It is often necessary to train the network with
    different learning rates to find the optimum
    value for the problem under investigation
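For instance, a small sweep over candidate learning rates (train_and_evaluate is a hypothetical helper returning the validation RMS error):

```python
for eta in (0.01, 0.05, 0.1, 0.3, 0.5):   # assumed candidate values
    print(f"eta={eta}: validation RMS = {train_and_evaluate(eta):.4f}")
```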

19
Choice of Parameters
  • Training, validation and test sets
  • Training set - The choice of training set can
    also affect the ability of the network to reach
    the global minimum. The aim is to have a set of
    patterns that are representative of the whole
    population of patterns that the network is
    expected to encounter.
  • Example
  • Training set - 75%
  • Validation set - 10%
  • Test set - 5%

20
Pre-processing and Post-processing
  • data → Pre-process → Train network →
    Post-process → data
  • Why pre-process?
  • Input variables sometimes differ by several orders
    of magnitude and the sizes of the variables do
    not necessarily reflect their importance in
    finding the required output
  • Types of pre-processing
  • Input normalisation - normalised inputs will
    fall in the range [-1, 1]
  • Standardisation - normalise the mean and
    standard deviation of the training set so that
    each input variable has mean 0 and standard
    deviation 1
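A minimal sketch of both pre-processing options, with illustrative data:

```python
import numpy as np

# Two input variables that differ by several orders of magnitude.
X = np.array([[1000.0, 0.002],
              [1500.0, 0.004],
              [ 800.0, 0.001]])

# Min-max normalisation of each input variable into the range [-1, 1].
X_minmax = 2 * (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0)) - 1

# Standardisation: each input variable gets mean 0 and standard deviation 1.
X_standard = (X - X.mean(axis=0)) / X.std(axis=0)
```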

21
Recommended Reading
  • Fundamentals of Neural Networks: Architectures,
    Algorithms and Applications, L. Fausett, 1994.
  • Artificial Intelligence: A Modern Approach, S.
    Russell and P. Norvig, 1995.
  • An Introduction to Neural Networks, 2nd Edition,
    Morton, IM.