Title: Neural Networks I
1 Neural Networks I
- Course V
- Alexandra Cristea, Huub ten Eikelder
2 Contents
- Summary course IV
- Delta Rule
- Linear Neurons
- Next
- Error Backpropagation
- Practical Aspects of Error BP
3 Summary of course IV
- Delta Rule
- incremental version
- batch version
- Linear Neurons
4 Minimum of a function: gradient
5 Minimum of a function: gradient
[Figure: plot of a function with axes x and y, illustrating the gradient and the minimum]
6 Gradient examples
7 The gradient in one dimension
Let $y(x) = mgh$ be the gravitational potential energy ($x$ = horizontal direction along the hill, $h = h(x)$).
Then the gradient $dy/dx$ is
- the slope of the hill
- the horizontal component of the net force, pointing downhill (the downhill force is proportional to $-dy/dx$)
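As a sketch of the physics behind this analogy (standard mechanics, not spelled out on the slide; a small-slope approximation is assumed):

```latex
% Potential energy of a ball on the hill, as a function of horizontal position x
\[ U(x) = m\,g\,h(x) \]
% Horizontal force component = minus the gradient of the potential
\[ F_x = -\frac{dU}{dx} = -m\,g\,\frac{dh}{dx} \]
% The force points downhill, opposite to the slope, which is exactly how
% gradient descent moves a weight: \Delta w \propto -\partial E / \partial w.
```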
8 The gradient in two dimensions
Let $x_1$ = East, $x_2$ = North ($h = h(x_1, x_2)$).
[Figure: contour plot of the hill with axes $x_1$ and $x_2$ and the gradient arrow]
The gradient points uphill, in the direction of steepest ascent.
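For reference, the two-dimensional gradient the slide refers to is (standard definition):

```latex
% Gradient of the height function h(x1, x2)
\[ \nabla h(x_1, x_2) =
   \left( \frac{\partial h}{\partial x_1},\; \frac{\partial h}{\partial x_2} \right) \]
% It points uphill (steepest ascent); gradient descent on an error surface
% therefore steps in the opposite direction, along -\nabla E.
```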
9 Mnemotechnics
- This is why the error function is usually compared with the gravitational potential energy
10
- What does this mean for an Error Function?
11 Delta rule (Widrow-Hoff, 1960)
$\partial E / \partial w > 0 \Rightarrow E$ ?
12 Downhill force: $-\nabla E$
- $y_k(t) = f\bigl(\sum_j w_{k,j}(t)\, x_j\bigr)$
- $\Delta w \propto -\nabla E$ (gradient of the energy)
- $\Delta w_{k,j} = \alpha\, e_k\, f'\, x_j$, with $\alpha > 0$ ($x$ = input, $e_k = d_k - y_k$)
- ($\alpha$ = learning rate)
- $w_{k,j}(t+1) = w_{k,j}(t) + \Delta w_{k,j}$
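A minimal runnable sketch of the incremental delta rule for a single linear neuron (NumPy; the function name, learning rate, and toy target below are illustrative assumptions, not from the course):

```python
import numpy as np

def delta_rule_step(w, x, d, alpha=0.1):
    """One incremental delta-rule update for a single linear neuron.

    For a linear neuron f is the identity, so f' = 1 and the update is
    Delta w_j = alpha * (d - y) * x_j.
    """
    y = w @ x                     # y_k(t) = sum_j w_{k,j}(t) x_j
    e = d - y                     # error e_k = d_k - y_k(t)
    return w + alpha * e * x      # w_{k,j}(t+1) = w_{k,j}(t) + Delta w_{k,j}

# Toy usage: learn the target d = 2*x1 - x2 from random examples.
rng = np.random.default_rng(0)
w = np.zeros(2)
for _ in range(500):
    x = rng.uniform(-1.0, 1.0, size=2)
    w = delta_rule_step(w, x, 2.0 * x[0] - x[1])
print(w)                          # approaches [2, -1]
```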
13 Meaning: error correction / forecast
- $y_k(t+1) = \sum_j w_{k,j}(t)\, x_j + \alpha\,(d_k - y_k(t)) \sum_j x_j^2$ (f linear)
- $d_k - y_k(t+1) = (d_k - y_k(t)) \cdot \mathrm{fct}(x_j)$
- so the error $E$ decreases for the presented input $x$
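The factor fct($x_j$) can be made explicit by one substitution (derived from the update rule above, for a linear neuron; not shown on the slide):

```latex
% Substitute w_{k,j}(t+1) = w_{k,j}(t) + alpha (d_k - y_k(t)) x_j into y_k(t+1):
\[ y_k(t+1) = \sum_j w_{k,j}(t+1)\, x_j
            = y_k(t) + \alpha\,(d_k - y_k(t)) \sum_j x_j^2 \]
% hence the new error is the old error times an explicit factor,
\[ d_k - y_k(t+1) = (d_k - y_k(t))\Bigl(1 - \alpha \sum_j x_j^2\Bigr) \]
% so for a small enough learning rate the error on this input shrinks
% after every update.
```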
14 Intro BP
- Disadvantage of the discrete MLP: lack of a simple learning algorithm
- Continuous MLP: several learning algorithms exist
- Most of them are variants of one basic learning algorithm: error backpropagation
15 Backpropagation
- Most famous learning algorithm
- Uses a rule similar to Widrow-Hoff
- (slightly more complicated)
16 Facts
- We know how to compute the weights with the gradient descent rule
- Gradient descent is based on the error computation
- We know how to compute the error in the output layer
17 BKP Error
[Figure: network with output y1 and target t1; question: what is the error of the hidden layer?]
18 Synapse
[Figure: neuron1 connected to neuron2 through a synapse with weight W; the values (v1, v2) are internal activations]
The weight serves as an amplifier!
19 Inverse Synapse
[Figure: the same connection traversed backwards, neuron2 to neuron1 through weight W; the values (v1, v2) are now errors]
The weight serves as an amplifier!
21 BKP Error
[Figure: layered network with system input I1, hidden layer O2/I2, and system output O1 (output y1, target t1); question: what is the error of the hidden layer?]
22 Backpropagation to the hidden layer
[Figure: propagating the output error back to the hidden layer O2/I2]
23 Update rule for 2 weight types
- weights from I2 (hidden layer) to O1 (system output)
- weights from I1 (system input) to O2 (hidden layer)
- output layer: $\Delta w_{j,i} = \alpha\,(t_i - y_i)\, f'(S_i)\, f(S_j) = \alpha\, d_i\, f(S_j)$ (simplification: $f' = 1$ for a repeater, e.g.), with $S_i = \sum_j w_{j,i}(t)\, h_j$
- hidden layer: $\Delta w_{k,j} = \alpha\,\bigl(\sum_i d_i\, w_{j,i}\bigr)\, f'(S_j)\, x_k = \alpha\, d_j\, x_k$, with $S_j = \sum_k w_{k,j}(t)\, x_k$
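A minimal NumPy sketch of these two update types for one hidden layer (a sigmoid activation is assumed, so $f' = y(1-y)$; all names and shapes are illustrative, not from the course):

```python
import numpy as np

def f(s):
    """Sigmoid activation, so f'(S) = f(S) * (1 - f(S))."""
    return 1.0 / (1.0 + np.exp(-s))

def bp_updates(W1, W2, x, t, alpha=0.5):
    """One-pattern weight updates for the two weight types.

    W1: input -> hidden weights (n_hidden x n_in), W2: hidden -> output
    weights (n_out x n_hidden), x: input vector, t: target vector.
    """
    S_j = W1 @ x                             # S_j = sum_k w_{k,j}(t) x_k
    h = f(S_j)                               # hidden outputs f(S_j)
    S_i = W2 @ h                             # S_i = sum_j w_{j,i}(t) h_j
    y = f(S_i)                               # system outputs

    d_i = (t - y) * y * (1.0 - y)            # (t_i - y_i) f'(S_i)
    d_j = (W2.T @ d_i) * h * (1.0 - h)       # (sum_i d_i w_{j,i}) f'(S_j)

    dW2 = alpha * np.outer(d_i, h)           # Delta w_{j,i} = alpha d_i f(S_j)
    dW1 = alpha * np.outer(d_j, x)           # Delta w_{k,j} = alpha d_j x_k
    return dW1, dW2

# Example call with assumed sizes (3 inputs, 4 hidden units, 2 outputs).
rng = np.random.default_rng(0)
dW1, dW2 = bp_updates(rng.normal(size=(4, 3)), rng.normal(size=(2, 4)),
                      rng.uniform(size=3), np.array([0.0, 1.0]))
```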
24 (more) Formal Derivation of BP
25 Notations for the BP derivation
26 BP premises
- Suppose a finite training set $X = \{(x^{(q)}, t^{(q)}) \mid x^{(q)} \in \mathbb{R}^{n+1},\ t^{(q)} \in (0,1)\}$ for $q = 1,\dots,P$
- $x^{(q)}$: inputs, $t^{(q)}$: required outputs
- Consider $(x,t) \in X$ with actual output $y^r$
- Then we can define the squared error $E^{(q)} = \tfrac{1}{2}\,\|t - y^r\|^2$
- The I-O relation for layer $s$ is $y^s = F(W^s y^{s-1})$
- For the output layer $r$: $y^r = F(W^r F(W^{r-1} \cdots F(W^1 x)))$
- Gradient descent for a weight $w$ will then be $\Delta w = -\alpha\,\partial E^{(q)}/\partial w$
27 Weight computation in BP
$E^{(q)} = \tfrac{1}{2}\,\|t - y^r\|^2$
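A sketch of the step this slide builds on (standard chain rule; the slide's own derivation images are not reproduced here). The factor A is the output variation discussed on the next slides:

```latex
% Chain rule applied to the squared error
\[ \frac{\partial E^{(q)}}{\partial w}
   = \frac{\partial}{\partial w}\,\tfrac{1}{2}\,\bigl\|t - y^r\bigr\|^2
   = -\,(t - y^r)^{T}\,\underbrace{\frac{\partial y^r}{\partial w}}_{A} \]
% Gradient descent then gives
\[ \Delta w = -\alpha\,\frac{\partial E^{(q)}}{\partial w}
            = \alpha\,(t - y^r)^{T}\,\frac{\partial y^r}{\partial w} \]
```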
28 Cases for A (output variation)
$y^s = F(W^s y^{s-1})$
$y^r = F(W^r y^{r-1})$
- $w$ is in the output layer, between layer $r-1$ and layer $r$: $w = w^r_{ij}$
- $w$ is before layer $r-1$: $w = w^s_{ij}$ (in layer $s < r$)
29 Cases for A (cont.)
$y^r = F(W^r F(W^{r-1} \cdots F(W^1 x)))$
- $w$ is in the output layer, between layer $r-1$ and layer $r$: $w = w^r_{ij}$
- $w$ is before layer $r-1$: $w = w^s_{ij}$ (in layer $s < r$)
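A sketch of what the two cases give for A (standard BP algebra, consistent with the layer relation $y^s = F(W^s y^{s-1})$ above; not a reproduction of the slide's own worked images):

```latex
% Case 1: w = w^r_{ij}, a weight into the output layer.
% Only output i depends on it:
\[ \frac{\partial y^r_i}{\partial w^r_{ij}} = f'(S^r_i)\, y^{r-1}_j,
   \qquad S^r = W^r y^{r-1} \]
% Case 2: w = w^s_{ij} with s < r.  The variation propagates forward through
% every later layer, one Jacobian F'_k W^k per layer:
\[ \frac{\partial y^r}{\partial w^s_{ij}}
   = F'_r W^r\, F'_{r-1} W^{r-1} \cdots F'_{s+1} W^{s+1}\,
     \frac{\partial y^s}{\partial w^s_{ij}},
   \qquad
   \frac{\partial y^s_i}{\partial w^s_{ij}} = f'(S^s_i)\, y^{s-1}_j \]
% where F'_k is the diagonal matrix of derivatives f'(S^k).
```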
30 Arbitrary output variation?
$y^s = F(W^s y^{s-1})$
31
- Up to now, we computed A = the output variation with respect to a weight change
- What does this mean for $\Delta w$, the weight change?
32 Cases of weight backpropagation in BP (1)
- Case 1. $w$ is between layer $r-1$ and layer $r$
33 Cases of weight backpropagation in BP (2)
- Case 2. $w$ is before layer $r-1$ (layer $s < r$)
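The resulting weight updates for the two cases, written with the backpropagated error $d$ (consistent with the algorithm on slide 38; a sketch, not the slide's own formula images):

```latex
% Case 1: w between layer r-1 and the output layer r
\[ \Delta w^r_{ij} = \alpha\, d^r_i\, y^{r-1}_j,
   \qquad d^r = F'_r\,(t - y^r) \]
% Case 2: w in an earlier layer s < r, using the backpropagated error
\[ \Delta w^s_{ij} = \alpha\, d^s_i\, y^{s-1}_j,
   \qquad d^{s-1} = F'_{s-1}\,(W^s)^T d^s \]
% In matrix form: Delta W^s = alpha d^s (y^{s-1})^T for every layer s.
```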
34 Actual vectors and matrices we used in the BP derivation (1)
- Weights
- Activation function: $f$ = the standard activation function or another continuous function
35 Actual vectors and matrices we used in the BP derivation (2)
Why must $f$ be continuous?
- Its derivative is used in the formulas
- Its output is used in the formulas
36 Elements of BP
37 Backpropagated error
- We have defined the error in the output layer: $d^r = F'_r\,(t - y^r)$
- which is backpropagated as $d^{s-1} = F'_{s-1}\,(W^s)^T d^s$
- and we defined the weight increase $\Delta W^s = \alpha\, d^s\,(y^{s-1})^T$
38 Backpropagation algorithm
- FOR s = 1 TO r DO W^s := initial matrix (often random) END
- REPEAT
-   select a pair (x,t) in X; y^0 := x
-   forward phase: compute the actual output y^s of the network with input x
-     FOR s = 1 TO r DO y^s := F(W^s y^{s-1}) END
-     y^r is the output vector of the network
-   backpropagation phase: propagate the errors back through the network and adapt the weights of all layers
-     d^r := F'_r (t - y^r)
-     FOR s = r TO 2 DO
-       d^{s-1} := F'_{s-1} (W^s)^T d^s
-       W^s := W^s + α d^s (y^{s-1})^T
-     END
-     W^1 := W^1 + α d^1 (y^0)^T
- UNTIL stop criterion
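A runnable NumPy sketch of this algorithm. The sigmoid activation, the random toy training set, and the fixed iteration count used as a stop criterion are all illustrative assumptions, not from the course:

```python
import numpy as np

rng = np.random.default_rng(1)
f  = lambda s: 1.0 / (1.0 + np.exp(-s))   # activation function F (sigmoid)
df = lambda y: y * (1.0 - y)              # its derivative, written via the output y

# Layer sizes: 3 inputs, 5 hidden units, 2 outputs.  W[s] connects layer s to
# layer s+1 (zero-based; it plays the role of the slides' W^{s+1}).
sizes = [3, 5, 2]
W = [rng.normal(0.0, 0.5, (sizes[s + 1], sizes[s])) for s in range(len(sizes) - 1)]

# Toy training set X = {(x, t)} with made-up data.
X = [(rng.uniform(-1, 1, 3), rng.uniform(0.2, 0.8, 2)) for _ in range(50)]

alpha = 0.5
for _ in range(2000):                     # stop criterion: fixed number of updates
    x, t = X[rng.integers(len(X))]        # select a pair (x, t) in X
    y = [x]                               # forward phase: y^0 = x
    for Ws in W:
        y.append(f(Ws @ y[-1]))           # y^s = F(W^s y^{s-1})
    d = df(y[-1]) * (t - y[-1])           # d^r = F'_r (t - y^r)
    for s in range(len(W) - 1, -1, -1):   # backpropagation phase
        d_prev = df(y[s]) * (W[s].T @ d)  # d^{s-1} = F'_{s-1} (W^s)^T d^s
        W[s] += alpha * np.outer(d, y[s]) # W^s := W^s + alpha d^s (y^{s-1})^T
        d = d_prev
```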
39 Summarizing BP
- To train a NN, we adjust the weights to reduce the error between the desired and the actual output.
- The NN should compute the error derivatives of the weights, i.e., how the error changes as each weight is increased or decreased slightly.
- BP is the most widely used method for determining these error derivatives.
40 Summarizing BP explanation (1)
- BP is easiest to understand if the NN units are linear.
- BP computes the error derivatives by computing the rate at which the error changes as a unit's activation changes.
- For output units: this is the difference between the actual and the desired output.
- For a hidden unit just before the output layer: multiply those rates by the weights between the hidden and output units and add the products (see the sketch after this list).
- For other layers: move from layer to layer in the direction opposite to the way activities propagate through the NN.
- This is what gives backpropagation its name.
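A tiny numeric illustration of the hidden-unit rule above (all numbers are made up):

```python
import numpy as np

# Error rates of the output units (dE/dy) and the weights from one hidden
# unit to those output units.
output_error_rates = np.array([0.4, -0.1])
weights_hidden_to_output = np.array([0.7, -2.0])

# "Multiply the weights by the error rates and add the products":
hidden_error_rate = weights_hidden_to_output @ output_error_rates
print(hidden_error_rate)   # 0.7 * 0.4 + (-2.0) * (-0.1) = 0.48
```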
41 BKP Error
[Figure: the output error (y1 vs. t1) flowing back through the weights W]
42 Summarizing BP explanation (2)
- For non-linear units, BP includes an extra step.
- Before back-propagating, the rate at which the error changes with a unit's activation must be converted into the rate at which the error changes with the total input received by that unit.
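Written out, the extra step is one application of the chain rule (standard form; the sigmoid example is an assumption):

```latex
% Converting the error rate w.r.t. a unit's activation y_i = f(S_i) into the
% error rate w.r.t. its total input S_i:
\[ \frac{\partial E}{\partial S_i}
   = \frac{\partial E}{\partial y_i}\, f'(S_i) \]
% e.g. for the sigmoid, f'(S_i) = y_i (1 - y_i).
```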
43 BKP Error
[Figure: the output error (y1 vs. t1) flowing back through the weights W, now for non-linear units]
44 Algorithms and their relations
- Perceptron Learning (discrete neuron): $\Delta w = \alpha\,(t - y)\,x_i$
- Delta Rule (continuous neuron, gradient descent): $\Delta w = \alpha\,(t - y)\,f'\,x_i = \alpha\,(t - y)\,y\,(1 - y)\,x_i$
- BP (continuous neurons, gradient descent): $d^r = F'_r\,(t - y^r)$, $d^{s-1} = F'_{s-1}\,(W^s)^T d^s$, $W^s := W^s + \alpha\, d^s\,(y^{s-1})^T$
45 Course to be found at
- http://wwwis.win.tue.nl/alex/
- Neural Networks (2L490)