1
Artificial Intelligence Methods
  • Neural Networks
  • Lecture 4
  • Rakesh K. Bissoondeeal

2
Learning in Multilayer Networks
  • Backpropagation Learning
  • A multilayer neural network trained using the
    backpropagation learning algorithm is one of the
    most powerful supervised neural network systems.
  • The training of such a network involves three
    stages (sketched in code below):
  • 1) the feedforward of the input training
    pattern,
  • 2) the calculation and backpropagation of the
    associated error,
  • 3) the adjustment of the weights.
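As a rough sketch of how the three stages fit together (Python, with hypothetical helper names that are not from the lecture):

```python
# Sketch of the three-stage backpropagation training loop.
# feedforward, backpropagate, update_weights and stopping_condition_met
# are hypothetical helpers; the following slides fill in their details.
max_epochs = 100                          # assumed training cap
for epoch in range(max_epochs):
    for x, d in training_set:             # one input pattern and its target
        y, z = feedforward(x)             # 1) feedforward of the input pattern
        deltas = backpropagate(z, d, y)   # 2) compute and backpropagate the error
        update_weights(x, y, deltas)      # 3) adjust the weights
    if stopping_condition_met():          # tested once per epoch (slide 10)
        break
```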

3
Architecture of Network
  • In a typical multilayer network, the input units
    (Xi) are fully connected to all hidden layer
    units (Yj) and the hidden layer units are fully
    connected to all output layer units (Zk).

4
Architecture of Network
  • Each of the connections between the input to
    hidden and hidden to output layer units has an
    associated weight attached to it (Wij or Vjk).
  • The hidden and output layer units also receive
    signals from weighted connections (bias) from
    units whose values are always 1.

5
Architecture of Network
  • Activation Functions
  • The choice of activation function to use in a
    backpropagation network is limited to functions
    that are continuous, differentiable and
    monotonically non-decreasing.
  • Furthermore, for computational efficiency, it is
    desirable that its derivative is easy to compute.
    Usually the function is also expected to
    saturate, i.e. approach finite maximum and
    minimum values asymptotically.
  • One of the most typical activation functions used
    is the binary sigmoidal function
  • f(x) = 1 / (1 + exp(-x))
  • where the derivative is given by f'(x) = f(x)[1
    - f(x)]
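A minimal Python sketch of this function and its derivative:

```python
import numpy as np

def sigmoid(x):
    # Binary sigmoid f(x) = 1 / (1 + exp(-x)); saturates at 0 and 1.
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # f'(x) = f(x) * (1 - f(x)) -- cheap to evaluate once f(x) is known.
    fx = sigmoid(x)
    return fx * (1.0 - fx)
```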

6
Backpropagation Learning Algorithm
  • During the feedforward phase, each of the input
    units (Xi) is set to its given input pattern
    value
  • Xi = inputi
  • Each input unit is then multiplied by the weight
    of its connection. The weighted inputs are then
    fed into the hidden units (Y1 to Yj).
  • Each hidden unit then sums the incoming signals
    and applies an activation function to produce an
    output: Yj = f(bj + Σi XiWij)
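A minimal sketch of this step, assuming illustrative layer sizes and starting values (none of these numbers come from the lecture):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumed sizes: 3 input units, 4 hidden units.
X = np.array([0.5, -1.2, 0.3])                 # input pattern values Xi
W = np.random.uniform(-0.5, 0.5, size=(3, 4))  # input-to-hidden weights Wij
b_hidden = np.full(4, 0.1)                     # bias weights (bias unit value is 1)

Y = sigmoid(b_hidden + X @ W)                  # hidden-unit outputs Yj
```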

7
Backpropagation Learning Algorithm
  • Each of the outputs of the hidden units is then
    multiplied by the weight of its connection and
    the weighted signals are fed into the output
    units (Z1 - Zk).
  • Each output unit then sums the incoming signals
    from the hidden units and applies an activation
    function to form the response of the net for a
    given input pattern.
  • Zk = f(bk + Σj YjVjk)
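Continuing the previous sketch, the output-layer step looks like this (the hidden outputs Y are illustrative stand-ins):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

Y = np.array([0.2, 0.4, 0.7, 0.1])             # hidden outputs from the previous step
V = np.random.uniform(-0.5, 0.5, size=(4, 2))  # hidden-to-output weights Vjk
b_out = np.full(2, 0.1)                        # output bias weights

Z = sigmoid(b_out + Y @ V)                     # network response Zk for this pattern
```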

8
Backpropagation Learning Algorithm
  • Backpropagation of errors
  • During training, each output unit then compares
    its output (Zk) with the required target value
    (dk) to determine the associated error for that
    pattern. Based on this error, a factor δk is
    computed that is used to distribute the error at
    Zk back to all units in the previous layer.
  • δk = f'(Zk)(dk - Zk)
  • Each hidden unit then computes a similar factor
    δj: the weighted sum of the delta terms
    backpropagated from the output units, multiplied
    by the derivative of the activation function for
    that unit.
  • δj = f'(Yj) Σk δkVjk
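A sketch of both delta computations, using the fact (slide 5) that with the binary sigmoid f'(x) = f(x)[1 - f(x)], so the derivative can be written directly from the unit outputs; all values are illustrative:

```python
import numpy as np

Z = np.array([0.7, 0.2])                  # output responses Zk
d = np.array([1.0, 0.0])                  # target values dk
Y = np.array([0.2, 0.4, 0.7, 0.1])        # hidden outputs Yj
V = np.random.uniform(-0.5, 0.5, (4, 2))  # hidden-to-output weights Vjk

delta_k = Z * (1 - Z) * (d - Z)           # output deltas: f'(Zk)(dk - Zk)
delta_j = Y * (1 - Y) * (V @ delta_k)     # hidden deltas: f'(Yj) * sum_k delta_k*Vjk
```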

9
Weight adjustment
  • After all the delta terms have been calculated,
    each hidden and output layer unit updates its
    connection weights and bias weights accordingly.
  • Output layer
  • bk(new) = bk(old) + ηδk
  • Vjk(new) = Vjk(old) + ηδkYj
  • Hidden layer
  • bj(new) = bj(old) + ηδj
  • Wij(new) = Wij(old) + ηδjXi
  • where η is a learning rate coefficient that is
    given a value between 0 and 1 at the start of
    training.
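A sketch of these rules as an in-place update function (the name and the default learning rate are assumptions):

```python
import numpy as np

def update_weights(W, b_hidden, V, b_out, X, Y, delta_j, delta_k, eta=0.2):
    # Apply the backpropagation update rules in place, one pattern at a time.
    V += eta * np.outer(Y, delta_k)    # Vjk(new) = Vjk(old) + eta*delta_k*Yj
    b_out += eta * delta_k             # bk(new)  = bk(old)  + eta*delta_k
    W += eta * np.outer(X, delta_j)    # Wij(new) = Wij(old) + eta*delta_j*Xi
    b_hidden += eta * delta_j          # bj(new)  = bj(old)  + eta*delta_j
```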

10
Test stopping condition
  • After each epoch of training (one epoch = one
    cycle through the entire training set) the
    performance of the network is measured by
    computing the average root mean square (RMS)
    error of the network for all of the patterns in
    the training set and for all of the patterns in a
    validation set; these two sets are disjoint.
  • Training is terminated when the RMS value for the
    training set is continuing to decrease but the
    RMS value for the validation set is starting to
    increase. This prevents the network from being
    OVERTRAINED (i.e. memorising the training set)
    and ensures that the ability of the network to
    GENERALISE (i.e. correctly classify non-trained
    patterns) will be at its maximum.
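A sketch of this stopping test; train_one_epoch, predict and the data variables are hypothetical stand-ins:

```python
import numpy as np

def rms_error(targets, outputs):
    # Root-mean-square error over all patterns in a set.
    return np.sqrt(np.mean((targets - outputs) ** 2))

max_epochs = 500                       # assumed cap
best_val_rms = float("inf")
for epoch in range(max_epochs):
    train_one_epoch(training_set)      # one full cycle through the training set
    val_rms = rms_error(val_targets, predict(val_inputs))
    if val_rms > best_val_rms:
        break                          # validation error rising: stop training
    best_val_rms = val_rms
```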

11
A simple example of overfitting (overtraining)
  • Which model is better?
  • The complicated model fits the data better.
  • But it is not economical
  • A model is convincing when it fits a lot of data
    surprisingly well.

12
Validation
[Figure: error E against amount of training / parameter adjustment, with curves for the training and validation sets; training is stopped at the point where the validation error starts to rise.]
13
Problems with basic Backpropagation
  • One of the problems with the basic
    backpropagation algorithm is that it is possible
    for the network to get stuck in a local minimum
    area on the error surface rather than in the
    desired global minimum.
  • The weight updating therefore ceases in a local
    minimum and the network becomes trapped because
    it cannot alter the weights to get out of the
    local minimum.

14
Local Minima
[Figure: error surface showing a local minimum and the global minimum.]
15
Backpropagation with Momentum
  • One solution to the problems with the basic
    backpropagation algorithm is to use a slightly
    modified weight updating procedure. In
    backpropagation with momentum, the weight change
    is in a direction that is a combination of the
    current error gradient and the previous error
    gradient.
  • The modified weight updating procedures are
  • Wij(t+1) = Wij(t) + ηδjXi + α[Wij(t) - Wij(t-1)]
  • Vjk(t+1) = Vjk(t) + ηδkYj + α[Vjk(t) - Vjk(t-1)]
  • where α is a momentum term coefficient that is
    given a value between 0 and 1 at the start of
    training.
  • The use of the extra momentum term can help the
    network to climb out of local minima and can
    also help speed up the network training.
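A sketch of one momentum update for the input-to-hidden weights (the function name and signature are assumptions):

```python
import numpy as np

def momentum_step(W, W_prev, X, delta_j, eta=0.2, alpha=0.9):
    # New change = current gradient step + alpha * previous weight change.
    W_new = W + eta * np.outer(X, delta_j) + alpha * (W - W_prev)
    return W_new, W                    # the current W becomes W_prev next time
```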

16
Momentum
  • Adds a percentage of the last movement to the
    current movement

17
Choice of Parameters
  • Initial weight set
  • Normally, the network weights are initialised to
    small random values before training is started.
    However, the choice of starting weight set can
    affect whether or not the network can find the
    global error minimum. This is due to the presence
    of local minima within the error surface. Some
    starting weight sets may therefore set the
    network off on a path that leads to a given local
    minimum whilst other starting weight sets avoid
    the local minimum.
  • It may therefore be necessary for several
    training runs to be performed using different
    random starting weight sets in order to determine
    whether or not the network has achieved the
    desired global minimum.
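A sketch of such a multi-start procedure; init_weights, train and validation_error are hypothetical helpers:

```python
import numpy as np

best_weights, best_err = None, float("inf")
for run in range(5):                      # assumed number of restarts
    rng = np.random.default_rng(seed=run)
    weights = init_weights(rng)           # e.g. small uniform values in [-0.5, 0.5]
    weights = train(weights)
    err = validation_error(weights)
    if err < best_err:                    # keep the run closest to the global minimum
        best_weights, best_err = weights, err
```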

18
Choice of Parameters
  • Number of hidden neurons
  • Usually determined by experimentation
  • Too many - the network will memorise the
    training set and will not generalise well
  • Too few - the network may not be able to learn
    the patterns in the training set
  • Learning rate
  • Value between 0 and 1
  • Too low - training will be very slow
  • Too high - the network may never reach the
    global minimum
  • It is often necessary to train the network with
    different learning rates to find the optimum
    value for the problem under investigation
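For instance, a small sweep over candidate learning rates (train_and_evaluate is a hypothetical helper returning the validation RMS error):

```python
for eta in (0.01, 0.05, 0.1, 0.3, 0.5):   # assumed candidate values
    print(f"eta={eta}: validation RMS = {train_and_evaluate(eta):.4f}")
```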

19
Choice of Parameters
  • Training, validation and test sets
  • Training set - The choice of training set can
    also affect the ability of the network to reach
    the global minimum. The aim is to have a set of
    patterns that are representative of the whole
    population of patterns that the network is
    expected to encounter.
  • Example
  • Training set - 75%
  • Validation set - 10%
  • Test set - 5%

20
Pre-processing and Post-processing
  • data → Pre-process → Train network →
    Post-process → data
  • Why pre-process?
  • Input variables sometimes differ by several orders
    of magnitude and the sizes of the variables do
    not necessarily reflect their importance in
    finding the required output
  • Types of pre-processing
  • Input normalisation - normalised inputs will
    fall in the range [-1, 1]
  • Standardisation - normalise the mean and
    standard deviation of the training set so that
    each input variable has mean 0 and standard
    deviation 1
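A minimal sketch of both pre-processing options, with illustrative data:

```python
import numpy as np

# Two input variables that differ by several orders of magnitude.
X = np.array([[1000.0, 0.002],
              [1500.0, 0.004],
              [ 800.0, 0.001]])

# Min-max normalisation of each input variable into the range [-1, 1].
X_minmax = 2 * (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0)) - 1

# Standardisation: each input variable gets mean 0 and standard deviation 1.
X_standard = (X - X.mean(axis=0)) / X.std(axis=0)
```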

21
Recommended Reading
  • Fundamentals of Neural Networks: Architectures,
    Algorithms and Applications, L. Fausett, 1994.
  • Artificial Intelligence: A Modern Approach, S.
    Russell and P. Norvig, 1995.
  • An Introduction to Neural Networks, 2nd Edition,
    Morton, IM.