Title: Neural Networks I
1 Neural Networks I
- Course V
- Alexandra Cristea, Huub ten Eikelder
2 Contents
- Summary course IV
- Delta Rule
- Linear Neurons
- Next
- Error Backpropagation
- Practical Aspects of Error BP
3 Summary of course IV
- Delta Rule
- incremental version
- batch version
- Linear Neurons
4 Minimum of a function: gradient
5 Minimum of a function: gradient
[Figure: plot of a function with axes x and y, illustrating the gradient and the minimum]
6 Gradient examples
7 The gradient in one dimension
Let $y(x) = mgh$ be the gravitational potential energy ($x$ = horizontal direction along the hill, $h = h(x)$).
Then the gradient $dy/dx$ is
- the slope of the hill
- the horizontal component of the net force, pointing downhill (the downhill force is proportional to $-dy/dx$)
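As a sketch of the physics behind this analogy (standard mechanics, not spelled out on the slide; a small-slope approximation is assumed):

```latex
% Potential energy of a ball on the hill, as a function of horizontal position x
\[ U(x) = m\,g\,h(x) \]
% Horizontal force component = minus the gradient of the potential
\[ F_x = -\frac{dU}{dx} = -m\,g\,\frac{dh}{dx} \]
% The force points downhill, opposite to the slope, which is exactly how
% gradient descent moves a weight: \Delta w \propto -\partial E / \partial w.
```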
8 The gradient in two dimensions
Let $x_1$ = East, $x_2$ = North ($h = h(x_1, x_2)$).
[Figure: contour plot of the hill with axes $x_1$ and $x_2$ and the gradient arrow]
The gradient points uphill, in the direction of steepest ascent.
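For reference, the two-dimensional gradient the slide refers to is (standard definition):

```latex
% Gradient of the height function h(x1, x2)
\[ \nabla h(x_1, x_2) =
   \left( \frac{\partial h}{\partial x_1},\; \frac{\partial h}{\partial x_2} \right) \]
% It points uphill (steepest ascent); gradient descent on an error surface
% therefore steps in the opposite direction, along -\nabla E.
```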
9 Mnemotechnics
- This is why the error function is usually compared with the gravitational potential energy
10
- What does this mean for an Error Function?
11 Delta rule (Widrow-Hoff, 1960)
$\partial E / \partial w > 0 \Rightarrow E$ ?
12 Downhill force: $-\nabla E$
- $y_k(t) = f\bigl(\sum_j w_{k,j}(t)\, x_j\bigr)$
- $\Delta w \propto -\nabla E$ (gradient of the energy)
- $\Delta w_{k,j} = \alpha\, e_k\, f'\, x_j$, with $\alpha > 0$ ($x$ = input, $e_k = d_k - y_k$)
- ($\alpha$ = learning rate)
- $w_{k,j}(t+1) = w_{k,j}(t) + \Delta w_{k,j}$
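A minimal runnable sketch of the incremental delta rule for a single linear neuron (NumPy; the function name, learning rate, and toy target below are illustrative assumptions, not from the course):

```python
import numpy as np

def delta_rule_step(w, x, d, alpha=0.1):
    """One incremental delta-rule update for a single linear neuron.

    For a linear neuron f is the identity, so f' = 1 and the update is
    Delta w_j = alpha * (d - y) * x_j.
    """
    y = w @ x                     # y_k(t) = sum_j w_{k,j}(t) x_j
    e = d - y                     # error e_k = d_k - y_k(t)
    return w + alpha * e * x      # w_{k,j}(t+1) = w_{k,j}(t) + Delta w_{k,j}

# Toy usage: learn the target d = 2*x1 - x2 from random examples.
rng = np.random.default_rng(0)
w = np.zeros(2)
for _ in range(500):
    x = rng.uniform(-1.0, 1.0, size=2)
    w = delta_rule_step(w, x, 2.0 * x[0] - x[1])
print(w)                          # approaches [2, -1]
```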
13 Meaning: error correction / forecast
- $y_k(t+1) = \sum_j w_{k,j}(t)\, x_j + \alpha\,(d_k - y_k(t)) \sum_j x_j^2$ (f linear)
- $d_k - y_k(t+1) = (d_k - y_k(t)) \cdot \mathrm{fct}(x_j)$
- so the error $E$ decreases for the presented input $x$
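The factor fct($x_j$) can be made explicit by one substitution (derived from the update rule above, for a linear neuron; not shown on the slide):

```latex
% Substitute w_{k,j}(t+1) = w_{k,j}(t) + alpha (d_k - y_k(t)) x_j into y_k(t+1):
\[ y_k(t+1) = \sum_j w_{k,j}(t+1)\, x_j
            = y_k(t) + \alpha\,(d_k - y_k(t)) \sum_j x_j^2 \]
% hence the new error is the old error times an explicit factor,
\[ d_k - y_k(t+1) = (d_k - y_k(t))\Bigl(1 - \alpha \sum_j x_j^2\Bigr) \]
% so for a small enough learning rate the error on this input shrinks
% after every update.
```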
14 Intro BP
- Disadvantage of the discrete MLP: lack of a simple learning algorithm
- Continuous MLP: several learning algorithms exist
- Most of them are variants of one basic learning algorithm: error backpropagation
15 Backpropagation
- Most famous learning algorithm
- Uses a rule similar to Widrow-Hoff
- (slightly more complicated)
16 Facts
- We know how to compute the weights with the gradient descent rule
- Gradient descent is based on the error computation
- We know how to compute the error in the output layer
17 BKP Error
[Figure: network with output y1 and target t1; question: what is the error of the hidden layer?]
18 Synapse
[Figure: neuron1 connected to neuron2 through a synapse with weight W; the values (v1, v2) are internal activations]
The weight serves as an amplifier!
19 Inverse Synapse
[Figure: the same connection traversed backwards, neuron2 to neuron1 through weight W; the values (v1, v2) are now errors]
The weight serves as an amplifier!
21 BKP Error
[Figure: layered network with system input I1, hidden layer O2/I2, and system output O1 (output y1, target t1); question: what is the error of the hidden layer?]
22 Backpropagation to the hidden layer
[Figure: propagating the output error back to the hidden layer O2/I2]
23 Update rule for 2 weight types
- weights from I2 (hidden layer) to O1 (system output)
- weights from I1 (system input) to O2 (hidden layer)
- output layer: $\Delta w_{j,i} = \alpha\,(t_i - y_i)\, f'(S_i)\, f(S_j) = \alpha\, d_i\, f(S_j)$ (simplification: $f' = 1$ for a repeater, e.g.), with $S_i = \sum_j w_{j,i}(t)\, h_j$
- hidden layer: $\Delta w_{k,j} = \alpha\,\bigl(\sum_i d_i\, w_{j,i}\bigr)\, f'(S_j)\, x_k = \alpha\, d_j\, x_k$, with $S_j = \sum_k w_{k,j}(t)\, x_k$
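A minimal NumPy sketch of these two update types for one hidden layer (a sigmoid activation is assumed, so $f' = y(1-y)$; all names and shapes are illustrative, not from the course):

```python
import numpy as np

def f(s):
    """Sigmoid activation, so f'(S) = f(S) * (1 - f(S))."""
    return 1.0 / (1.0 + np.exp(-s))

def bp_updates(W1, W2, x, t, alpha=0.5):
    """One-pattern weight updates for the two weight types.

    W1: input -> hidden weights (n_hidden x n_in), W2: hidden -> output
    weights (n_out x n_hidden), x: input vector, t: target vector.
    """
    S_j = W1 @ x                             # S_j = sum_k w_{k,j}(t) x_k
    h = f(S_j)                               # hidden outputs f(S_j)
    S_i = W2 @ h                             # S_i = sum_j w_{j,i}(t) h_j
    y = f(S_i)                               # system outputs

    d_i = (t - y) * y * (1.0 - y)            # (t_i - y_i) f'(S_i)
    d_j = (W2.T @ d_i) * h * (1.0 - h)       # (sum_i d_i w_{j,i}) f'(S_j)

    dW2 = alpha * np.outer(d_i, h)           # Delta w_{j,i} = alpha d_i f(S_j)
    dW1 = alpha * np.outer(d_j, x)           # Delta w_{k,j} = alpha d_j x_k
    return dW1, dW2

# Example call with assumed sizes (3 inputs, 4 hidden units, 2 outputs).
rng = np.random.default_rng(0)
dW1, dW2 = bp_updates(rng.normal(size=(4, 3)), rng.normal(size=(2, 4)),
                      rng.uniform(size=3), np.array([0.0, 1.0]))
```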
24 (more) Formal Derivation of BP
25 Notations for the BP derivation
26 BP premises
- Suppose a finite training set $X = \{(x^{(q)}, t^{(q)}) \mid x^{(q)} \in \mathbb{R}^{n+1},\ t^{(q)} \in (0,1)\}$ for $q = 1,\dots,P$
- $x^{(q)}$: inputs, $t^{(q)}$: required outputs
- Consider $(x,t) \in X$ with actual output $y^r$
- Then we can define the squared error $E^{(q)} = \tfrac{1}{2}\,\|t - y^r\|^2$
- The I-O relation for layer $s$ is $y^s = F(W^s y^{s-1})$
- For the output layer $r$: $y^r = F(W^r F(W^{r-1} \cdots F(W^1 x)))$
- Gradient descent for a weight $w$ will then be $\Delta w = -\alpha\,\partial E^{(q)}/\partial w$
27 Weight computation in BP
$E^{(q)} = \tfrac{1}{2}\,\|t - y^r\|^2$
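A sketch of the step this slide builds on (standard chain rule; the slide's own derivation images are not reproduced here). The factor A is the output variation discussed on the next slides:

```latex
% Chain rule applied to the squared error
\[ \frac{\partial E^{(q)}}{\partial w}
   = \frac{\partial}{\partial w}\,\tfrac{1}{2}\,\bigl\|t - y^r\bigr\|^2
   = -\,(t - y^r)^{T}\,\underbrace{\frac{\partial y^r}{\partial w}}_{A} \]
% Gradient descent then gives
\[ \Delta w = -\alpha\,\frac{\partial E^{(q)}}{\partial w}
            = \alpha\,(t - y^r)^{T}\,\frac{\partial y^r}{\partial w} \]
```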
28 Cases for A (output variation)
$y^s = F(W^s y^{s-1})$
$y^r = F(W^r y^{r-1})$
- $w$ is in the output layer, between layer $r-1$ and layer $r$: $w = w^r_{ij}$
- $w$ is before layer $r-1$: $w = w^s_{ij}$ (in layer $s < r$)
29 Cases for A (cont.)
$y^r = F(W^r F(W^{r-1} \cdots F(W^1 x)))$
- $w$ is in the output layer, between layer $r-1$ and layer $r$: $w = w^r_{ij}$
- $w$ is before layer $r-1$: $w = w^s_{ij}$ (in layer $s < r$)
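A sketch of what the two cases give for A (standard BP algebra, consistent with the layer relation $y^s = F(W^s y^{s-1})$ above; not a reproduction of the slide's own worked images):

```latex
% Case 1: w = w^r_{ij}, a weight into the output layer.
% Only output i depends on it:
\[ \frac{\partial y^r_i}{\partial w^r_{ij}} = f'(S^r_i)\, y^{r-1}_j,
   \qquad S^r = W^r y^{r-1} \]
% Case 2: w = w^s_{ij} with s < r.  The variation propagates forward through
% every later layer, one Jacobian F'_k W^k per layer:
\[ \frac{\partial y^r}{\partial w^s_{ij}}
   = F'_r W^r\, F'_{r-1} W^{r-1} \cdots F'_{s+1} W^{s+1}\,
     \frac{\partial y^s}{\partial w^s_{ij}},
   \qquad
   \frac{\partial y^s_i}{\partial w^s_{ij}} = f'(S^s_i)\, y^{s-1}_j \]
% where F'_k is the diagonal matrix of derivatives f'(S^k).
```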
30 Arbitrary output variation?
$y^s = F(W^s y^{s-1})$
31
- Up to now, we computed A = the output variation with respect to a weight change
- What does this mean for $\Delta w$, the weight change?
32 Cases of weight backpropagation in BP (1)
- Case 1. $w$ is between layer $r-1$ and layer $r$
33 Cases of weight backpropagation in BP (2)
- Case 2. $w$ is before layer $r-1$ (layer $s < r$)
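The resulting weight updates for the two cases, written with the backpropagated error $d$ (consistent with the algorithm on slide 38; a sketch, not the slide's own formula images):

```latex
% Case 1: w between layer r-1 and the output layer r
\[ \Delta w^r_{ij} = \alpha\, d^r_i\, y^{r-1}_j,
   \qquad d^r = F'_r\,(t - y^r) \]
% Case 2: w in an earlier layer s < r, using the backpropagated error
\[ \Delta w^s_{ij} = \alpha\, d^s_i\, y^{s-1}_j,
   \qquad d^{s-1} = F'_{s-1}\,(W^s)^T d^s \]
% In matrix form: Delta W^s = alpha d^s (y^{s-1})^T for every layer s.
```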
34 Actual vectors and matrices we used in the BP derivation (1)
- Weights
- Activation function: $f$ = the standard activation function or another continuous function
35 Actual vectors and matrices we used in the BP derivation (2)
Why must $f$ be continuous?
- Its derivative is used in the formulas
- Its output is used in the formulas
36 Elements of BP
37 Backpropagated error
- We have defined the error in the output layer: $d^r = F'_r\,(t - y^r)$
- which is backpropagated as $d^{s-1} = F'_{s-1}\,(W^s)^T d^s$
- and we defined the weight increase $\Delta W^s = \alpha\, d^s\,(y^{s-1})^T$
38 Backpropagation algorithm
- FOR s = 1 TO r DO W^s := initial matrix (often random) END
- REPEAT
-   select a pair (x,t) in X; y^0 := x
-   forward phase: compute the actual output y^s of the network with input x
-     FOR s = 1 TO r DO y^s := F(W^s y^{s-1}) END
-     y^r is the output vector of the network
-   backpropagation phase: propagate the errors back through the network and adapt the weights of all layers
-     d^r := F'_r (t - y^r)
-     FOR s = r TO 2 DO
-       d^{s-1} := F'_{s-1} (W^s)^T d^s
-       W^s := W^s + α d^s (y^{s-1})^T
-     END
-     W^1 := W^1 + α d^1 (y^0)^T
- UNTIL stop criterion
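A runnable NumPy sketch of this algorithm. The sigmoid activation, the random toy training set, and the fixed iteration count used as a stop criterion are all illustrative assumptions, not from the course:

```python
import numpy as np

rng = np.random.default_rng(1)
f  = lambda s: 1.0 / (1.0 + np.exp(-s))   # activation function F (sigmoid)
df = lambda y: y * (1.0 - y)              # its derivative, written via the output y

# Layer sizes: 3 inputs, 5 hidden units, 2 outputs.  W[s] connects layer s to
# layer s+1 (zero-based; it plays the role of the slides' W^{s+1}).
sizes = [3, 5, 2]
W = [rng.normal(0.0, 0.5, (sizes[s + 1], sizes[s])) for s in range(len(sizes) - 1)]

# Toy training set X = {(x, t)} with made-up data.
X = [(rng.uniform(-1, 1, 3), rng.uniform(0.2, 0.8, 2)) for _ in range(50)]

alpha = 0.5
for _ in range(2000):                     # stop criterion: fixed number of updates
    x, t = X[rng.integers(len(X))]        # select a pair (x, t) in X
    y = [x]                               # forward phase: y^0 = x
    for Ws in W:
        y.append(f(Ws @ y[-1]))           # y^s = F(W^s y^{s-1})
    d = df(y[-1]) * (t - y[-1])           # d^r = F'_r (t - y^r)
    for s in range(len(W) - 1, -1, -1):   # backpropagation phase
        d_prev = df(y[s]) * (W[s].T @ d)  # d^{s-1} = F'_{s-1} (W^s)^T d^s
        W[s] += alpha * np.outer(d, y[s]) # W^s := W^s + alpha d^s (y^{s-1})^T
        d = d_prev
```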
39 Summarizing BP
- To train a NN, we adjust the weights to reduce the error between the desired and the actual output.
- The NN should compute the error derivatives of the weights, i.e., how the error changes as each weight is increased or decreased slightly.
- BP is the most widely used method for determining these error derivatives.
40 Summarizing BP explanation (1)
- BP is easiest to understand if the NN units are linear.
- BP computes the error derivatives by computing the rate at which the error changes as a unit's activation changes.
- For output units: this is the difference between the actual and the desired output.
- For a hidden unit just before the output layer: multiply those rates by the weights between the hidden and output units and add the products (see the sketch after this list).
- For other layers: move from layer to layer in the direction opposite to the way activities propagate through the NN.
- This is what gives backpropagation its name.
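A tiny numeric illustration of the hidden-unit rule above (all numbers are made up):

```python
import numpy as np

# Error rates of the output units (dE/dy) and the weights from one hidden
# unit to those output units.
output_error_rates = np.array([0.4, -0.1])
weights_hidden_to_output = np.array([0.7, -2.0])

# "Multiply the weights by the error rates and add the products":
hidden_error_rate = weights_hidden_to_output @ output_error_rates
print(hidden_error_rate)   # 0.7 * 0.4 + (-2.0) * (-0.1) = 0.48
```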
41 BKP Error
[Figure: the output error (y1 vs. t1) flowing back through the weights W]
42 Summarizing BP explanation (2)
- For non-linear units, BP includes an extra step.
- Before back-propagating, the rate at which the error changes with a unit's activation must be converted into the rate at which the error changes with the total input received by that unit.
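Written out, the extra step is one application of the chain rule (standard form; the sigmoid example is an assumption):

```latex
% Converting the error rate w.r.t. a unit's activation y_i = f(S_i) into the
% error rate w.r.t. its total input S_i:
\[ \frac{\partial E}{\partial S_i}
   = \frac{\partial E}{\partial y_i}\, f'(S_i) \]
% e.g. for the sigmoid, f'(S_i) = y_i (1 - y_i).
```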
43 BKP Error
[Figure: the output error (y1 vs. t1) flowing back through the weights W, now for non-linear units]
44 Algorithms and their relations
- Perceptron Learning (discrete neuron): $\Delta w = \alpha\,(t - y)\,x_i$
- Delta Rule (continuous neuron, gradient descent): $\Delta w = \alpha\,(t - y)\,f'\,x_i = \alpha\,(t - y)\,y\,(1 - y)\,x_i$
- BP (continuous neurons, gradient descent): $d^r = F'_r\,(t - y^r)$, $d^{s-1} = F'_{s-1}\,(W^s)^T d^s$, $W^s := W^s + \alpha\, d^s\,(y^{s-1})^T$
45 Course to be found at
- http://wwwis.win.tue.nl/alex/
- Neural Networks (2L490)