Title: 2L490 CNperceptrons 1
1 Disadvantages of Discrete Neurons
- Only Boolean-valued functions can be computed
- A simple learning algorithm for multi-layer discrete-neuron perceptrons is lacking
- The computational capabilities of single-layer discrete-neuron perceptrons are limited
- These disadvantages disappear when we consider multi-layer continuous-neuron perceptrons
2 Preliminaries
- A continuous-neuron perceptron with n inputs and m outputs computes
  - a function $\mathbb{R}^n \to (0,1)^m$ when the sigmoid activation function is used
  - a function $\mathbb{R}^n \to \mathbb{R}^m$ when a linear activation function is used
- The learning rules for continuous-neuron perceptrons are based on optimization techniques for error functions. This requires a continuous and differentiable error function.
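As a concrete illustration of these two cases, here is a minimal sketch in Python/NumPy (the names sigmoid and perceptron_output are illustrative, not from the slides) of a single-layer continuous-neuron perceptron with n inputs and m outputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def perceptron_output(W, b, x, activation="sigmoid"):
    """Single-layer continuous-neuron perceptron: m outputs from n inputs.
    W has shape (m, n), b has shape (m,)."""
    net = W @ x + b
    return sigmoid(net) if activation == "sigmoid" else net  # linear: identity

# Example: n = 3 inputs, m = 2 outputs
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))
b = rng.normal(size=2)
x = np.array([0.5, -1.0, 2.0])
print(perceptron_output(W, b, x, "sigmoid"))  # values in (0,1)^2
print(perceptron_output(W, b, x, "linear"))   # values in R^2
```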
3 Sigmoid transfer function
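The body of this slide is not in the transcript; the standard (logistic) sigmoid referred to elsewhere in the deck is

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
```

The derivative identity $\sigma' = \sigma(1 - \sigma)$ is what makes the sigmoid convenient in the delta rule below.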
4 Computational Capabilities
- Let $g : [0,1]^n \to \mathbb{R}$ be a continuous function and let $\varepsilon > 0$. Then there exists a two-layer perceptron with
  - first layer built from neurons with threshold and standard sigmoid activation function
  - second layer built from one neuron without threshold and linear activation function
- such that the function $G$ computed by this network satisfies $|G(x) - g(x)| < \varepsilon$ for all $x \in [0,1]^n$.
5 Single-layer networks
- Compute functions from $\mathbb{R}^n$ to $(0,1)^m$
- Sufficient to consider a single neuron
- It computes a function $f\!\left(w_0 + \sum_{1 \le j \le n} w_j x_j\right)$
- Assume $x_0 = 1$; then it computes a function $f\!\left(\sum_{0 \le j \le n} w_j x_j\right)$
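A minimal sketch of this bias-absorption trick (illustrative names; the logistic sigmoid is used here as an example f):

```python
import numpy as np

def neuron(w, x, f=lambda net: 1.0 / (1.0 + np.exp(-net))):
    """Single neuron: prepend x_0 = 1 so that w[0] plays the role of the bias w_0."""
    x_ext = np.concatenate(([1.0], x))   # x_0 = 1
    return f(w @ x_ext)                  # f(sum_{j=0..n} w_j x_j)

w = np.array([0.1, 0.5, -0.3])           # w_0, w_1, w_2
x = np.array([2.0, 1.0])                 # x_1, x_2
print(neuron(w, x))
```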
6 Error function
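The slide body is missing from the transcript; the error function normally used with the delta rule is the (halved) sum of squared errors over the training pairs, e.g. for a single output neuron:

```latex
E(\mathbf{w}) = \frac{1}{2} \sum_{q} \bigl( t^{(q)} - o^{(q)} \bigr)^2 ,
\qquad
o^{(q)} = f\!\left( \sum_{j=0}^{n} w_j x_j^{(q)} \right)
```

where $t^{(q)}$ is the target and $o^{(q)}$ the network output for training pair $q$.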
7 Gradient Descent
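Again only the title survives in the transcript; the generic gradient-descent step on the error function, with learning parameter $\alpha$, is

```latex
w_i \leftarrow w_i + \Delta w_i ,
\qquad
\Delta w_i = -\alpha \, \frac{\partial E}{\partial w_i}
```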
8 Update of Weight i by Training Pair q
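A standard way to write this per-pair update (the slide's own formula is missing): with $\mathrm{net}^{(q)} = \sum_j w_j x_j^{(q)}$ and $E^{(q)} = \tfrac{1}{2}\bigl(t^{(q)} - o^{(q)}\bigr)^2$,

```latex
\Delta_q w_i
= -\alpha \, \frac{\partial E^{(q)}}{\partial w_i}
= \alpha \, \bigl( t^{(q)} - o^{(q)} \bigr) \, f'\!\bigl( \mathrm{net}^{(q)} \bigr) \, x_i^{(q)}
```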
9 Delta Rule Learning (incremental version, arbitrary transfer function)
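A minimal Python sketch of the incremental (per-pattern) delta rule with a pluggable transfer function f and its derivative df; all names are illustrative, as the slide's own pseudocode is not in the transcript:

```python
import numpy as np

def delta_rule_incremental(X, t, f, df, alpha=0.1, epochs=100):
    """Incremental delta rule for a single neuron with transfer function f.
    X: (Q, n) inputs, t: (Q,) targets. A bias input x_0 = 1 is prepended."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])    # x_0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_q, t_q in zip(X, t):                  # one training pair at a time
            net = w @ x_q
            o = f(net)
            w += alpha * (t_q - o) * df(net) * x_q  # delta rule update
    return w

# Example with the logistic sigmoid as transfer function
sigm = lambda z: 1.0 / (1.0 + np.exp(-z))
dsigm = lambda z: sigm(z) * (1.0 - sigm(z))
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([0., 0., 0., 1.])                      # AND-like targets
w = delta_rule_incremental(X, t, sigm, dsigm, alpha=0.5, epochs=2000)
print(w, sigm(np.hstack([np.ones((4, 1)), X]) @ w))
```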
10 Delta Rule Learning (incremental version, sigmoid transfer function)
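For the sigmoid, $f'(\mathrm{net}) = f(\mathrm{net})\bigl(1 - f(\mathrm{net})\bigr) = o(1 - o)$, so the update specializes to (standard form, not copied from the slide):

```latex
\Delta_q w_i = \alpha \, \bigl( t^{(q)} - o^{(q)} \bigr) \, o^{(q)} \bigl( 1 - o^{(q)} \bigr) \, x_i^{(q)}
```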
11 Delta Rule Learning (incremental version, linear transfer function)
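For a linear transfer function $f(\mathrm{net}) = \mathrm{net}$ we have $f'(\mathrm{net}) = 1$, which gives the classical Widrow-Hoff / LMS update:

```latex
\Delta_q w_i = \alpha \, \bigl( t^{(q)} - o^{(q)} \bigr) \, x_i^{(q)}
```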
12 Stop criteria
- The mean square error becomes small enough
- The mean square error does not decrease anymore, i.e. the gradient has become very small or even changes sign
- The maximum number of iterations has been exceeded
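These three criteria could be combined as in the following sketch (a hypothetical helper; eps, min_decrease and max_iter are illustrative thresholds):

```python
def should_stop(mse_history, eps=1e-4, min_decrease=1e-8, max_iter=10_000):
    """Return True if any of the three stop criteria from the slide holds."""
    if mse_history[-1] < eps:                      # error small enough
        return True
    if len(mse_history) > 1 and mse_history[-2] - mse_history[-1] < min_decrease:
        return True                                # error no longer decreases
    return len(mse_history) >= max_iter            # iteration budget exceeded
```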
13 Remarks
- Delta rule learning is also called L(east) M(ean) S(quare) learning or Widrow-Hoff learning
- Note that the incremental version of the delta rule is, strictly speaking, not a gradient descent algorithm, because in each step a different error function $E^{(q)}$ is used
- Convergence of the incremental version can only be guaranteed if the learning parameter $\alpha$ goes to 0 during learning
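The second remark can be made precise: the total error is the sum of the per-pattern errors, and each incremental step descends on only one term of this sum (standard decomposition, not from the slide):

```latex
E(\mathbf{w}) = \sum_{q} E^{(q)}(\mathbf{w}),
\qquad
E^{(q)}(\mathbf{w}) = \tfrac{1}{2} \bigl( t^{(q)} - o^{(q)} \bigr)^2
```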
14 Perceptron Learning Rule (batch version, arbitrary transfer function)
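In the batch version all training pairs are processed before the weights change; the accumulated update typically has the form (the slide's own formula is missing)

```latex
\Delta w_i = \alpha \sum_{q} \bigl( t^{(q)} - o^{(q)} \bigr) \, f'\!\bigl( \mathrm{net}^{(q)} \bigr) \, x_i^{(q)}
```

The sigmoid and linear variants on the next two slides follow by substituting $f'(\mathrm{net}) = o(1-o)$ and $f'(\mathrm{net}) = 1$, respectively.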
15 Perceptron Learning Delta Rule (batch version, sigmoidal transfer function)
16 Perceptron Learning Rule (batch version, linear transfer function)
17 Convergence of the batch version
For a small enough learning parameter the batch version of the delta rule always converges. The resulting weights, however, may correspond to a local minimum of the error function instead of the global minimum.
18 Linear Neurons and Least Squares
19 Linear Neurons and Least Squares
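The bodies of slides 18 and 19 are missing from the transcript; for a linear neuron the error is quadratic in the weights, and its minimizer is characterized by the least-squares normal equations. Writing $X$ for the matrix whose rows are the (extended) input vectors and $\mathbf{t}$ for the target vector, and assuming the matrix $C$ of slide 20 denotes $X^{\top}X$, a standard formulation is

```latex
E(\mathbf{w}) = \tfrac{1}{2} \lVert \mathbf{t} - X\mathbf{w} \rVert^2 ,
\qquad
C \, \mathbf{w}^{*} = X^{\top} \mathbf{t}
\quad \text{with} \quad C = X^{\top} X
```

so that when $C$ is non-singular the least-squares solution $\mathbf{w}^{*} = C^{-1} X^{\top}\mathbf{t}$ is unique, which is what the next slide addresses.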
20 C is non-singular
21 Linear Least Squares Convergence
23 Linear Least Squares Convergence
24 Find the line
25 Solution
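The "Find the line" and "Solution" slides presumably work a small line-fitting exercise; as an illustration with made-up data (not from the slides), a least-squares line $y = w_0 + w_1 x$ can be computed as follows:

```python
import numpy as np

# Hypothetical data points (not from the slides)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.9])

X = np.column_stack([np.ones_like(x), x])      # extended inputs: x_0 = 1
w, *_ = np.linalg.lstsq(X, y, rcond=None)      # solves the least-squares problem
print(f"y = {w[0]:.3f} + {w[1]:.3f} * x")
```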