Title: Multilayer Perceptron
1. Multilayer Perceptron
- One- and more-layer neural networks
2. The association problem
- $\xi$: input to the network, of length $N_I$, i.e., $\xi_k$, $k = 1, 2, \ldots, N_I$
- $O$: output, of length $N_O$, i.e., $O_i$, $i = 1, 2, \ldots, N_O$
- $\zeta$: desired output, i.e., $\zeta_i$, $i = 1, 2, \ldots, N_O$
- $w$: weights in the network, i.e., $w_{ik}$ is the weight between $\xi_k$ and $O_i$
- $T_i$: threshold value for output unit $i$ to be activated
- $g$: function that converts the net input to output values between 0 and 1. Special case: the threshold function, $g(x) = \Theta(x) = 1$ if $x > 0$ and $0$ otherwise.
Each output unit thus computes $O_i = g\bigl(\sum_k w_{ik}\,\xi_k - T_i\bigr)$. Given an input pattern $\xi$, we would like the output $O$ to be the desired one, $\zeta$. Indeed, we would like this to hold for a whole set of $p$ input patterns $\xi^\mu$ and desired output patterns $\zeta^\mu$, $\mu = 1, \ldots, p$. The inputs and outputs may be continuous or Boolean.
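To make the notation concrete, here is a minimal sketch of this forward pass in Python/NumPy (the function name and array shapes are illustrative assumptions, not part of the original):

```python
import numpy as np

def perceptron_output(xi, w, T):
    """Forward pass of a single-layer perceptron.

    xi : input pattern, shape (N_I,)
    w  : weight matrix, shape (N_O, N_I)
    T  : thresholds, shape (N_O,)
    Returns O, shape (N_O,), using the threshold activation Theta.
    """
    h = w @ xi - T                # net input h_i = sum_k w_ik * xi_k - T_i
    return (h > 0).astype(float)  # g(x) = Theta(x): 1 if x > 0, else 0
```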
3. The geometric view of the weights
- For the Boolean case, we want $\Theta\bigl(\sum_k w_{ik}\,\xi_k^\mu - T_i\bigr) = \zeta_i^\mu$ for every pattern $\mu$. The boundary between a positive and a negative net input is defined by $\sum_k w_{ik}\,\xi_k = T_i$, which gives a plane (hyperplane) perpendicular to the weight vector $w_i$.
- The solution is to find the hyperplane that separates all the inputs according to the desired classification.
- Example: the Boolean function AND, where the hyperplane (a line in two dimensions) must separate the input $(1,1)$ from the other three inputs.
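As a quick check, a sketch of the AND example with one hand-picked hyperplane (the weights $w = (1, 1)$ and threshold $T = 1.5$ are an illustrative choice, not the only solution):

```python
import numpy as np

# Illustrative weights solving AND: the line xi_1 + xi_2 = 1.5
# separates (1, 1) from the other three Boolean inputs.
w = np.array([1.0, 1.0])
T = 1.5

for xi in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = w @ np.array(xi) - T       # signed distance from the hyperplane
    O = 1.0 if h > 0 else 0.0      # threshold activation
    print(xi, "->", O)             # prints 1 only for (1, 1)
```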
4. Learning: Steepest descent on the weights
- The optimal set of weights minimizes the cost $E(w) = \frac{1}{2} \sum_{\mu,i} \bigl(\zeta_i^\mu - O_i^\mu\bigr)^2$.
- The steepest descent method will find a local minimum via $\Delta w_{ik} = -\eta\,\frac{\partial E}{\partial w_{ik}} = \eta \sum_\mu \bigl(\zeta_i^\mu - O_i^\mu\bigr)\, g'(h_i^\mu)\, \xi_k^\mu$
- or, updating one pattern $\mu$ at a time, $\Delta w_{ik} = \eta\,\delta_i^\mu\,\xi_k^\mu$,
- where $\eta$ is the learning rate, $h_i^\mu = \sum_k w_{ik}\,\xi_k^\mu - T_i$, and $\delta_i^\mu = \bigl(\zeta_i^\mu - O_i^\mu\bigr)\, g'(h_i^\mu)$. This requires a differentiable $g$, e.g., a sigmoid rather than the threshold function $\Theta$.
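A minimal sketch of this pattern-at-a-time delta rule, assuming the logistic sigmoid for $g$ (the names and the initialization are illustrative assumptions; the thresholds are kept fixed here, cf. section 10):

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def train_perceptron(patterns, targets, eta=0.5, epochs=1000, seed=0):
    """Pattern-by-pattern steepest descent (delta rule), a sketch.

    patterns : array (p, N_I) of inputs xi^mu
    targets  : array (p, N_O) of desired outputs zeta^mu
    """
    rng = np.random.default_rng(seed)
    p, n_in = patterns.shape
    n_out = targets.shape[1]
    w = rng.normal(scale=0.5, size=(n_out, n_in))
    T = np.zeros(n_out)                         # thresholds, fixed here
    for _ in range(epochs):
        for xi, zeta in zip(patterns, targets):
            h = w @ xi - T                      # net input h_i
            O = sigmoid(h)                      # smooth g, so g' exists
            delta = (zeta - O) * O * (1.0 - O)  # delta_i = (zeta-O) g'(h)
            w += eta * np.outer(delta, xi)      # Dw_ik = eta delta_i xi_k
    return w, T
```

On a linearly separable task such as AND above, this drives the outputs toward the targets; for XOR it cannot (see section 6).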
5. Analysis of Learning the Weights
- The steepest descent rule $\Delta w_i = \eta \sum_\mu \delta_i^\mu\,\xi^\mu$ produces changes in the weight vector only in the direction of each pattern vector $\xi^\mu$. Thus, components of the weight vector perpendicular to the input patterns are left unchanged. If a direction is perpendicular to all input patterns, no weight change ever occurs along it, so any component of $w_i$ in that direction does not affect the solution.
- For the sigmoid $g(h) = 1/(1 + e^{-h})$, the derivative $g'(h)$ is largest when $|h|$ is small. Since $\delta_i^\mu = (\zeta_i^\mu - O_i^\mu)\,g'(h_i^\mu)$, the largest changes occur for units in doubt (those whose net input is close to the threshold value).
(Figure: sigmoid activation curve rising from 0 to 1.)
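Concretely, for the logistic sigmoid assumed above, $g'(h) = \frac{e^{-h}}{(1 + e^{-h})^2} = g(h)\bigl(1 - g(h)\bigr)$, which attains its maximum $g'(0) = 1/4$ at $h = 0$, i.e., exactly on the decision boundary $\sum_k w_{ik}\,\xi_k = T_i$.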
6. Limitations of the Perceptron
- Many problems, even ones as simple as the XOR problem, cannot be solved by the perceptron: no hyperplane can separate the inputs. For XOR, the positive examples $(0,1)$ and $(1,0)$ cannot be separated from the negative examples $(0,0)$ and $(1,1)$ by any line.
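A brute-force sanity check of this claim (a sketch, scanning a coarse grid of weights and thresholds rather than a proof):

```python
import numpy as np

# Verify that no threshold unit O = Theta(w . xi - T) reproduces XOR
# on all four inputs, for (w1, w2, T) on a coarse grid.
xor_in = np.array([(0, 0), (0, 1), (1, 0), (1, 1)])
xor_out = np.array([0, 1, 1, 0])

grid = np.linspace(-2.0, 2.0, 21)
solvable = any(
    np.array_equal(((xor_in @ np.array([w1, w2]) - T) > 0).astype(int), xor_out)
    for w1 in grid for w2 in grid for T in grid
)
print(solvable)  # False: no hyperplane on the grid separates XOR
```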
7. Multilayer Neural Network
- $V^L$: output of layer $L$, which serves as the input from layer $L$ to layer $L+1$; $V^0 \equiv \xi$
- $w^L$: weights connecting layer $L$ to layer $L+1$
- $T^L$: threshold values for the units at layer $L$
- Thus, the output of a two-layer network is written as $O_i = g\Bigl(\sum_j w_{ij}^{(2)}\, g\bigl(\sum_k w_{jk}^{(1)}\,\xi_k - T_j^{(1)}\bigr) - T_i^{(2)}\Bigr)$.
- The cost to be optimized over all the weights is again $E(w) = \frac{1}{2} \sum_{\mu,i} \bigl(\zeta_i^\mu - O_i^\mu\bigr)^2$.
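A minimal sketch of this two-layer forward pass (the shapes and names are illustrative assumptions):

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def two_layer_output(xi, w1, T1, w2, T2):
    """Forward pass of a two-layer network.

    w1 : (N_hidden, N_I)  weights from input to hidden layer
    T1 : (N_hidden,)      hidden-layer thresholds
    w2 : (N_O, N_hidden)  weights from hidden layer to output
    T2 : (N_O,)           output-layer thresholds
    """
    V = sigmoid(w1 @ xi - T1)    # hidden-layer activities V_j
    return sigmoid(w2 @ V - T2)  # O_i = g(sum_j w2_ij V_j - T2_i)
```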
8. Properties and How It Works
- With one input layer, one output layer, one or more hidden layers, and enough units in each layer, any classification problem can be solved.
- Example: the XOR problem, which a two-layer network solves (see the sketch after this list).
(Figure: a two-layer network, with hidden layer $L_2$, computing XOR on the inputs 0 and 1.)
- Later we address the generalization problem (for
new examples)
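For instance, a hand-picked two-layer solution of XOR (these particular weights and thresholds are one illustrative choice: the hidden units compute OR and AND, and the output fires for "OR but not AND"):

```python
import numpy as np

def theta(h):                      # threshold activation g(x) = Theta(x)
    return (h > 0).astype(float)

# Hidden layer: V1 = OR(xi_1, xi_2), V2 = AND(xi_1, xi_2)
w1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
T1 = np.array([0.5, 1.5])

# Output: O = Theta(V1 - V2 - 0.5), i.e. "OR but not AND" = XOR
w2 = np.array([[1.0, -1.0]])
T2 = np.array([0.5])

for xi in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    V = theta(w1 @ np.array(xi) - T1)
    O = theta(w2 @ V - T2)
    print(xi, "->", int(O[0]))     # 0, 1, 1, 0
```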
9. Learning: Steepest descent on weights
10. Learning: Threshold values
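A minimal sketch of one such steepest descent step for the two-layer network, updating the weights and the thresholds together (assuming the same cost $E$ and logistic $g$ as above; a threshold behaves like a weight on a constant input of $-1$, so its update is $-\eta\,\delta$):

```python
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def backprop_step(xi, zeta, w1, T1, w2, T2, eta=0.5):
    """One pattern-at-a-time steepest descent step on E = 1/2 |zeta - O|^2."""
    # Forward pass
    V = sigmoid(w1 @ xi - T1)                 # hidden activities
    O = sigmoid(w2 @ V - T2)                  # network output

    # Backward pass: delta = (error) * g'(h), with g' = g (1 - g)
    d2 = (zeta - O) * O * (1.0 - O)           # output-layer deltas
    d1 = (w2.T @ d2) * V * (1.0 - V)          # hidden-layer deltas

    # Updates: Dw = eta * delta * (input to that layer); DT = -eta * delta
    w2 += eta * np.outer(d2, V)
    T2 += -eta * d2
    w1 += eta * np.outer(d1, xi)
    T1 += -eta * d1
    return w1, T1, w2, T2
```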