Title: Neural Networks I
Neural Networks I
- Course V
- Alexandra Cristea, Huub ten Eikelder
- Summary of course IV
- Delta Rule
- Linear Neurons
- Next
- Error Backpropagation
- Practical Aspects of Error BP
Summary of course IV
- Delta Rule
- incremental version
- batch version
- Linear Neurons
Minimum of a function: gradient
Gradient examples
The gradient in one dimension
Let $y(x) = m g h$ be the gravitational potential energy ($x$ = horizontal hill direction, $h = h(x)$)
Then the gradient $dy/dx$ is
proportional to the slope of the hill
opposite in sign to the horizontal component of the net force, which points downhill
The gradient in two dimensions
Let $x_1$ = East, $x_2$ = North ($h = h(x_1, x_2)$)
The gradient points uphill, in the direction of steepest ascent
- This is why the error function is usually compared with the gravitational potential energy
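A minimal sketch of the hill picture, assuming a made-up height function `height` (not from the slides): the gradient is estimated with central differences, and a small step along it goes uphill while a step against it goes downhill.

```python
def height(x1, x2):
    # Hypothetical hill: a smooth surface whose top is at (1, 2).
    return -((x1 - 1.0) ** 2) - ((x2 - 2.0) ** 2)


def numerical_gradient(h, x1, x2, eps=1e-6):
    # Central differences approximate the two partial derivatives of h.
    d1 = (h(x1 + eps, x2) - h(x1 - eps, x2)) / (2 * eps)
    d2 = (h(x1, x2 + eps) - h(x1, x2 - eps)) / (2 * eps)
    return d1, d2


g1, g2 = numerical_gradient(height, 0.0, 0.0)

# Take a small step along the gradient and a small step against it.
step = 0.01
here = height(0.0, 0.0)
uphill = height(0.0 + step * g1, 0.0 + step * g2)
downhill = height(0.0 - step * g1, 0.0 - step * g2)
```

Replacing `height` by an error function gives the idea behind gradient descent: stepping against the gradient lowers the error.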
- What does this mean for an error function?
Delta rule (Widrow-Hoff, 1960)
$\partial E / \partial w > 0$: $E$ increases as $w$ increases, so $w$ must decrease to reduce $E$
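Downhill force: $-\nabla E$
- $y_k(\text{time}) = f\left(\sum_j w_{k,j}(\text{time})\, x_j\right)$
- $\Delta w \propto -\nabla E$ (gradient of energy)
- $\Delta w_{k,j} = \alpha\, e_k\, f'\, x_j$, with $\alpha > 0$ and $e_k = d_k - y_k$ ($x$ = input)
- ($\alpha$ = learning rate)
- $w_{k,j}(\text{time}+1) = w_{k,j}(\text{time}) + \Delta w_{k,j}$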
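The incremental delta rule for one continuous neuron can be sketched as follows. The sigmoid activation, the toy data (logical OR with a constant bias input), the learning rate and the number of passes are all assumptions chosen for illustration, not values from the course.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def delta_rule_step(w, x, d, alpha):
    """One incremental delta-rule update for a single sigmoid neuron."""
    s = sum(wj * xj for wj, xj in zip(w, x))
    y = sigmoid(s)
    e = d - y                      # error e = d - y
    f_prime = y * (1.0 - y)        # derivative of the sigmoid at s
    # w_j(time+1) = w_j(time) + alpha * e * f' * x_j
    return [wj + alpha * e * f_prime * xj for wj, xj in zip(w, x)]


# Hypothetical toy data: logical OR; the third input is a constant 1 (bias).
samples = [([0.0, 0.0, 1.0], 0.0), ([0.0, 1.0, 1.0], 1.0),
           ([1.0, 0.0, 1.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]


def total_error(w):
    # Sum of squared errors (with factor 1/2) over all samples.
    return sum(0.5 * (d - sigmoid(sum(wj * xj for wj, xj in zip(w, x)))) ** 2
               for x, d in samples)


w = [0.0, 0.0, 0.0]
error_before = total_error(w)
for _ in range(2000):              # arbitrary number of passes over the data
    for x, d in samples:
        w = delta_rule_step(w, x, d, alpha=0.5)
error_after = total_error(w)
```

Each update moves the weights a small step against the gradient of the squared error for the sample just presented.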
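Meaning: error correction (forecast)
- $y_k(\text{time}+1) = \sum_j w_{k,j}(\text{time})\, x_j + \alpha\, (d_k - y_k(\text{time})) \sum_j x_j^2$ ($f$ linear)
- $d_k - y_k(\text{time}+1) = (d_k - y_k(\text{time})) \cdot \mathrm{fct}(x_j)$, with $\mathrm{fct}(x_j) = 1 - \alpha \sum_j x_j^2$
- The error for input $x_j$ approaches 0 when $|\mathrm{fct}(x_j)| < 1$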
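The forecast for a linear neuron can be checked numerically. In this sketch the weights, input, target and learning rate are hypothetical values; after one delta-rule step on the same input, the error is the old error multiplied by one minus the learning rate times the sum of the squared inputs.

```python
def linear_output(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


w = [0.2, -0.4, 0.1]        # hypothetical weights
x = [1.0, 2.0, -1.0]        # hypothetical input
d = 1.5                     # hypothetical desired output
alpha = 0.05                # hypothetical learning rate

error_before = d - linear_output(w, x)
# Delta rule for a linear neuron (f' = 1): w_j += alpha * error * x_j.
w_new = [wj + alpha * error_before * xj for wj, xj in zip(w, x)]
error_after = d - linear_output(w_new, x)

# Forecast: the error is multiplied by (1 - alpha * sum of x_j squared).
factor = 1.0 - alpha * sum(xj * xj for xj in x)
```

With these values the factor is 0.7, so each presentation of this input removes 30% of the remaining error.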
Intro BP
- Disadvantage of the discrete MLP: lack of a simple learning algorithm
- Continuous MLP: several learning algorithms
- Most of them are variants of one basic learning algorithm: error backpropagation
- Most famous learning algorithm
- Uses a rule similar to Widrow-Hoff
- (slightly more complicated)
- We know how to compute the weights with the gradient descent rule
- Gradient descent is based on the error computation
- We know how to compute the error at the output
Hidden layer: error?
W: weight
The weight serves as an amplifier!
Value (v1, v2): internal activation
Inverse synapse
W: weight
The weight serves as an amplifier!
Value (v1, v2): error
Backpropagation to hidden layer
O2, I2
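Update rule for 2 weight types
- I2 (hidden layer), O1 (system output)
- I1 (system input), O2 (hidden layer)
- $\Delta w_{j,i} = \alpha\, (t_i - y_i)\, f'(S_i)\, f(S_j) = \alpha\, \delta_i\, h_j$ (simplification: $f' = 1$ for a repeater, e.g.)
- $S_i = \sum_j w_{j,i}(t)\, h_j$, with $h_j = f(S_j)$ and $\delta_i = (t_i - y_i)\, f'(S_i)$
- $\Delta w_{k,j} = \alpha \left(\sum_i \delta_i\, w_{j,i}\right) f'(S_j)\, x_k = \alpha\, \delta_j\, x_k$
- $S_j = \sum_k w_{k,j}(t)\, x_k$, with $\delta_j = \left(\sum_i \delta_i\, w_{j,i}\right) f'(S_j)$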
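The two weight types can be sketched for a network with one hidden layer. The sigmoid activation, the layer sizes, the starting weights and the training pair are assumptions for illustration. `w_in[k][j]` connects input k to hidden unit j, and `w_out[j][i]` connects hidden unit j to output i.

```python
import math


def f(s):
    return 1.0 / (1.0 + math.exp(-s))


def f_prime(s):
    y = f(s)
    return y * (1.0 - y)


def forward(w_in, w_out, x):
    # s_hid[j] = sum_k w_in[k][j] * x[k];  h[j] = f(s_hid[j])
    s_hid = [sum(w_in[k][j] * x[k] for k in range(len(x)))
             for j in range(len(w_in[0]))]
    h = [f(s) for s in s_hid]
    # s_out[i] = sum_j w_out[j][i] * h[j];  y[i] = f(s_out[i])
    s_out = [sum(w_out[j][i] * h[j] for j in range(len(h)))
             for i in range(len(w_out[0]))]
    y = [f(s) for s in s_out]
    return s_hid, h, s_out, y


def update(w_in, w_out, x, t, alpha):
    s_hid, h, s_out, y = forward(w_in, w_out, x)
    # Output deltas: delta_i = (t_i - y_i) * f'(S_i)
    d_out = [(t[i] - y[i]) * f_prime(s_out[i]) for i in range(len(y))]
    # Hidden deltas: delta_j = (sum_i delta_i * w_out[j][i]) * f'(S_j)
    d_hid = [sum(d_out[i] * w_out[j][i] for i in range(len(y)))
             * f_prime(s_hid[j]) for j in range(len(h))]
    # Hidden-to-output weights: add alpha * delta_i * h_j
    new_out = [[w_out[j][i] + alpha * d_out[i] * h[j] for i in range(len(y))]
               for j in range(len(h))]
    # Input-to-hidden weights: add alpha * delta_j * x_k
    new_in = [[w_in[k][j] + alpha * d_hid[j] * x[k] for j in range(len(h))]
              for k in range(len(x))]
    return new_in, new_out


def error(w_in, w_out, x, t):
    y = forward(w_in, w_out, x)[3]
    return 0.5 * sum((ti - yi) ** 2 for ti, yi in zip(t, y))


w_in = [[0.1, -0.2], [0.4, 0.3]]     # hypothetical: 2 inputs, 2 hidden units
w_out = [[0.5], [-0.3]]              # hypothetical: 2 hidden units, 1 output
x, t = [1.0, 0.5], [1.0]             # hypothetical training pair
e_before = error(w_in, w_out, x, t)
w_in, w_out = update(w_in, w_out, x, t, alpha=0.5)
e_after = error(w_in, w_out, x, t)
```

The hidden deltas are computed from the old output weights, before either weight matrix is changed.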
(More) formal derivation of BP
Notations for the BP derivation
BP premises
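- Suppose a finite training set $X = \{(x^{(q)}, t^{(q)}) \mid x^{(q)} \in \mathbb{R}^{n_1},\ t^{(q)} \in (0,1)\}$ for $q = 1, \ldots, P$
- $x^{(q)}$: inputs; $t^{(q)}$: required outputs
- Consider $(x, t) \in X$ with actual output $y^r$
- Then we can define the squared error
- $E^{(q)} = \tfrac{1}{2} \| t - y^r \|^2$
- I-O relation for layer $s$ is: $y^s = F(W^s y^{s-1})$
- For the output layer: $y^r = F(W^r F(W^{r-1} \cdots F(W^1 x)))$
- Gradient descent for a weight $w$ will be: $\Delta w = -\alpha\, \partial E^{(q)} / \partial w$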
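The layer-by-layer input-output relation and the squared error can be sketched as below. The sigmoid as the component-wise function F, the layer sizes and the all-zero weights are assumptions chosen so the result is easy to check by hand.

```python
import math


def F(v):
    # Sigmoid applied to every component (assumed activation function).
    return [1.0 / (1.0 + math.exp(-s)) for s in v]


def matvec(W, y):
    # Matrix-vector product; each row of W holds the weights of one neuron.
    return [sum(wij * yj for wij, yj in zip(row, y)) for row in W]


def network_output(weights, x):
    y = x                       # y^0 = x
    for W in weights:           # y^s = F(W^s y^{s-1}) for s = 1..r
        y = F(matvec(W, y))
    return y                    # y^r


def squared_error(t, y):
    # E = 1/2 * ||t - y||^2
    return 0.5 * sum((ti - yi) ** 2 for ti, yi in zip(t, y))


# Hypothetical network with r = 2 layers: 2 inputs, 2 hidden units, 1 output.
weights = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0]]]
y_r = network_output(weights, [1.0, -1.0])
E = squared_error([1.0], y_r)
```

With all weights zero every sigmoid receives total input 0 and outputs 0.5, so the error for target 1 is 0.125.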
Weight computation in BP
$E^{(q)} = \tfrac{1}{2} \| t - y^r \|^2$
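Cases for A (output variation)
- $w$ is in the output layer (between layer $r-1$ and layer $r$): $w = w^r_{ij}$
- $w$ is before layer $r-1$: $w = w^s_{ij}$ (in layer $s < r$)
Cases for A (cont.)
$y^r = F(W^r F(W^{r-1} \cdots F(W^1 x)))$
- $w$ is in the output layer (between layer $r-1$ and layer $r$): $w = w^r_{ij}$
- $w$ is before layer $r-1$: $w = w^s_{ij}$ (in layer $s < r$)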
Arbitrary output variation?
- Up to now, we computed
- A = output variation with respect to weight change
- What does this mean for
- $\Delta w$, the weight change?
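The link between the output variation A and the weight change can be illustrated on a single sigmoid neuron. This is a sketch with hypothetical weights, input, target and learning rate. By the chain rule the error derivative equals minus (t - y) times A, and gradient descent moves the weight against that derivative.

```python
import math


def output(w, x):
    s = sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-s))


def error(w, x, t):
    return 0.5 * (t - output(w, x)) ** 2


def finite_difference(fn, w, j, eps=1e-6):
    # Central difference of fn with respect to weight number j.
    up = list(w)
    up[j] += eps
    down = list(w)
    down[j] -= eps
    return (fn(up) - fn(down)) / (2 * eps)


w, x, t = [0.3, -0.1], [1.0, 2.0], 1.0       # hypothetical values
j = 1                                         # index of the weight under study

A = finite_difference(lambda v: output(v, x), w, j)         # output variation
dE_dw = finite_difference(lambda v: error(v, x, t), w, j)   # error derivative
chain_rule = -(t - output(w, x)) * A

alpha = 0.1                                   # hypothetical learning rate
delta_w = -alpha * dE_dw                      # gradient descent weight change
```

Here the output is below the target and A is positive, so the weight change comes out positive.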
Cases of weight backpropagation in BP (1)
- Case 1. $w$ is between layer $r-1$ and layer $r$
Cases of weight backpropagation in BP (2)
- Case 2. $w$ is before layer $r-1$ (layer $s < r$)
Actual vectors and matrices we used in the BP derivation (1)
Activation function
f: standard activation function or other continuous function
Actual vectors and matrices we used in the BP derivation (2)
Why f continuous?
Derivative used in formulas
Output used in formulas
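Elements of BP
Backpropagated error
- We have defined the error in the output layer: $\delta^r = {F'}^{r} (t - y^r)$
- Which is backpropagated as: $\delta^{s-1} = {F'}^{s-1} (W^s)^T \delta^s$
- And we defined the weight increase: $\Delta W^s = \alpha\, \delta^s (y^{s-1})^T$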
Backpropagation algorithm
- FOR s = 1 TO r DO $W^s$ := initial matrix (often random)
- REPEAT
- select a pair $(x, t)$ in $X$; $y^0 := x$
- forward phase: compute the actual output $y^s$ of the network with input $x$
- FOR s = 1 TO r DO $y^s := F(W^s y^{s-1})$ END
- $y^r$ is the output vector of the network
- backpropagation phase: propagate the errors back through the network and adapt the weights of all layers
- $\delta^r := {F'}^{r} (t - y^r)$
- FOR s = r TO 2 DO $\delta^{s-1} := {F'}^{s-1} (W^s)^T \delta^s$; $W^s := W^s + \alpha\, \delta^s (y^{s-1})^T$ END
- $W^1 := W^1 + \alpha\, \delta^1 (y^0)^T$
- UNTIL stop criterion
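The algorithm above can be sketched in plain Python. Several choices here are assumptions rather than part of the slides: the sigmoid as activation function, the layer sizes, the fixed random seed, the toy training set (logical OR with a constant bias input), cycling through the pairs in order, and a fixed number of passes as stop criterion.

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def matvec(W, y):
    return [sum(w * v for w, v in zip(row, y)) for row in W]


def forward(weights, x):
    ys = [x]                                   # y^0 = x
    for W in weights:                          # y^s = F(W^s y^{s-1})
        ys.append([sigmoid(s) for s in matvec(W, ys[-1])])
    return ys


def backprop_step(weights, x, t, alpha):
    ys = forward(weights, x)
    r = len(weights)
    # delta^r = F'^r (t - y^r); for the sigmoid, f' = y (1 - y).
    delta = [y * (1 - y) * (ti - y) for y, ti in zip(ys[r], t)]
    for s in range(r, 0, -1):
        W, y_prev = weights[s - 1], ys[s - 1]
        if s > 1:
            # delta^{s-1} = F'^{s-1} (W^s)^T delta^s, using W^s before its update
            back = [sum(W[i][j] * delta[i] for i in range(len(delta)))
                    for j in range(len(y_prev))]
            delta_prev = [y * (1 - y) * b for y, b in zip(y_prev, back)]
        # W^s = W^s + alpha * delta^s (y^{s-1})^T
        for i in range(len(delta)):
            for j in range(len(y_prev)):
                W[i][j] += alpha * delta[i] * y_prev[j]
        if s > 1:
            delta = delta_prev


def total_error(weights, data):
    return sum(0.5 * sum((ti - yi) ** 2
                         for ti, yi in zip(t, forward(weights, x)[-1]))
               for x, t in data)


rng = random.Random(0)                         # fixed seed, arbitrary choice
sizes = [3, 3, 1]                              # hypothetical layer sizes
weights = [[[rng.uniform(-0.5, 0.5) for _ in range(sizes[s])]
            for _ in range(sizes[s + 1])] for s in range(len(sizes) - 1)]
# Hypothetical training set: logical OR; the third input is a constant 1 (bias).
data = [([0.0, 0.0, 1.0], [0.0]), ([0.0, 1.0, 1.0], [1.0]),
        ([1.0, 0.0, 1.0], [1.0]), ([1.0, 1.0, 1.0], [1.0])]

error_before = total_error(weights, data)
for _ in range(2000):                          # stop criterion: fixed pass count
    for x, t in data:
        backprop_step(weights, x, t, alpha=0.5)
error_after = total_error(weights, data)
```

As in the pseudocode, the error for layer s-1 is computed from the weights of layer s before those weights are adapted.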
Summarizing BP
- To train a NN, we adjust the weights to reduce the error between the desired and the actual output.
- The NN should compute the error derivative with respect to the weights:
- how the error changes as each weight is increased or decreased slightly.
- BP is the most widely used method for determining the error derivative.
Summarizing BP explanation (1)
- BP is easiest to understand if the NN units are linear.
- BP computes the error derivative by first computing the rate at which the error changes as a unit's activation changes.
- Output units: the difference between the actual and the desired output.
- Hidden unit just before the output layer: multiply the weights between the hidden and output units by the error rates of those output units and add the products.
- For other layers: move from layer to layer, opposite to the way activities propagate through the NN.
- This is what gives backpropagation its name.
Summarizing BP explanation (2)
- For non-linear units, BP includes an extra step.
- Before back-propagating, the rate of error change per activation change must be converted into the rate at which the error changes as the total input received by a unit is changed.
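The extra step for a non-linear unit can be sketched for one sigmoid unit, with a hypothetical total input and target. The rate of error change per activation change is multiplied by the derivative of the activation function, and the result is compared with a finite-difference estimate.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def error_from_input(s, t):
    # Squared error of one sigmoid unit as a function of its total input s.
    return 0.5 * (t - sigmoid(s)) ** 2


s, t = 0.4, 1.0                 # hypothetical total input and target
y = sigmoid(s)
dE_dy = -(t - y)                # rate of error change per activation change
dE_ds = dE_dy * y * (1.0 - y)   # extra step: multiply by f'(s) = y (1 - y)

# Central-difference estimate of the same derivative, for comparison.
eps = 1e-6
numeric = (error_from_input(s + eps, t) - error_from_input(s - eps, t)) / (2 * eps)
```

For a linear unit the derivative of the activation is 1, so this conversion changes nothing.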
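Algorithms and their relations
- Discrete neuron: perceptron learning
- Gradient descent, continuous neuron: delta rule, $\Delta w = \alpha\, (t - y)\, f'\, x_i = \alpha\, (t - y)\, y (1 - y)\, x_i$ (the second form holds for the sigmoid)
- Gradient descent, continuous neurons: backpropagation, $\delta^r = {F'}^{r} (t - y^r)$, $\delta^{s-1} = {F'}^{s-1} (W^s)^T \delta^s$, $\Delta W^s = \alpha\, \delta^s (y^{s-1})^T$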
Course to be found at
- http://wwwis.win.tue.nl/alex/
- Neural Networks (2L490)