Transcript and Presenter's Notes

Title: Neural Networks I


1
Neural Networks I
  • Course V
  • Alexandra Cristea & Huub ten Eikelder

2
Contents
  • Summary course IV
  • Delta Rule
  • Linear Neurons
  • Next
  • Error Backpropagation
  • Practical Aspects of Error BP

3
Summary of course IV
  • Delta Rule
  • incremental version
  • batch version
  • Linear Neurons

4
Minimum of a function & the gradient
5
Minimum of a function & the gradient
(figure: plot of y versus x, showing the minimum of the function and the gradient direction)
6
Gradient examples
7
The gradient in one dimension
Let y(x) = m·g·h, the gravitational potential energy
(x = horizontal hill direction, h = h(x))
Then the gradient dy/dx is
the slope of the hill
the horizontal component of the net force points downhill (it is minus the gradient: the downhill force)
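A compact version of this example in symbols (a sketch; the sign convention "force = minus the derivative of the potential" is the standard physics one, not spelled out on the slide):

```latex
\[
  y(x) = m\,g\,h(x), \qquad
  \frac{dy}{dx} = m\,g\,\frac{dh}{dx}, \qquad
  F_x = -\frac{dy}{dx}
\]
% dh/dx is the slope of the hill; the horizontal force F_x points downhill,
% opposite to the gradient -- the same direction a gradient-descent step takes.
```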
8
The gradient in two dimensions
Let x1 = East, x2 = North (h = h(x1, x2))
(figure: contour map over the x1, x2 plane with the gradient arrow)
The gradient points uphill, in the direction of steepest ascent
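In symbols (standard notation, assumed rather than copied from the slide image):

```latex
\[
  \nabla h(x_1, x_2)
    = \left( \frac{\partial h}{\partial x_1},\; \frac{\partial h}{\partial x_2} \right)
\]
% \nabla h points uphill, in the direction of steepest ascent;
% the downhill force is proportional to -\nabla h.
```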
9
Mnemonics
  • This is why the error function is usually compared with the gravitational potential energy

10
  • What does this mean for an Error Function?

11
Delta rule (Widrow & Hoff, 1960)
∂E/∂w > 0 ⇒ E increases as w increases
12
Downhill force: −∇E
  • y_k(t) = f( Σ_j w_k,j(t) x_j ) ⇒
  • Δw = −α ∇E (gradient of the energy)
  • Δw_k,j = α e_k f' x_j, with α > 0 (x = input)
  • (α = learning rate)
  • w_k,j(t+1) = w_k,j(t) + Δw_k,j
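A minimal sketch of one incremental delta-rule step for a single linear neuron (NumPy; the function name, the identity activation and the learning rate are illustrative assumptions, not taken from the slides):

```python
import numpy as np

def delta_rule_step(w, x, d, alpha=0.1):
    """One incremental delta-rule update for a linear neuron.

    w: weight vector, x: input vector, d: desired output, alpha: learning rate.
    """
    y = np.dot(w, x)          # linear neuron: f is the identity, so f' = 1
    e = d - y                 # error e_k = d_k - y_k
    return w + alpha * e * x  # Δw_k,j = α e_k x_j  (a downhill step on E = ½ e²)

# Usage: drive a 3-input neuron towards the desired output d = 1.0
w = np.zeros(3)
x = np.array([1.0, 0.5, -0.2])
for _ in range(20):
    w = delta_rule_step(w, x, d=1.0)
print(w, np.dot(w, x))        # the actual output approaches 1.0
```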

13
Meaning: error correction → forecast
  • y_k(t+1) = Σ_j w_k,j(t) x_j + α (d_k − y_k(t)) Σ_j x_j²
  • (f linear)
  • d_k − y_k(t+1) = (d_k − y_k(t)) · fct(x_j)
  • the error on input x_j is driven towards zero (for a suitable α)
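The step the slide compresses, written out (a sketch assuming a linear neuron, i.e. f is the identity):

```latex
\[
  y_k(t+1) = \sum_j w_{k,j}(t+1)\, x_j
           = \sum_j \bigl[ w_{k,j}(t) + \alpha\,(d_k - y_k(t))\, x_j \bigr] x_j
           = y_k(t) + \alpha\,(d_k - y_k(t)) \sum_j x_j^2
\]
% hence the new error is a scaled copy of the old one:
\[
  d_k - y_k(t+1) = \bigl( d_k - y_k(t) \bigr) \Bigl( 1 - \alpha \sum_j x_j^2 \Bigr)
\]
% the error on this input shrinks whenever 0 < \alpha \sum_j x_j^2 < 2.
```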

14
Intro BP
  • Disadvantage of the discrete MLP: lack of a simple learning algorithm
  • Continuous MLP: several learning algorithms exist
  • Most of them are variants of one basic learning algorithm: error backpropagation

15
Backpropagation
  • Most famous learning algorithm
  • Uses a rule similar to Widrow-Hoff
  • (slightly more complicated)

16
Facts
  • We know how to compute the weights with the
    gradient descent rule
  • Gradient descent is based on the error
    computation
  • We know how to compute the error in the output
    layer

17
BKP Error
(diagram: network output y1 with target t1)
Hidden layer: Error = ?
18
Synapse
(diagram: neuron1 connected to neuron2 through weight W)
The weight serves as an amplifier!
Values (v1, v2) = internal activations
19
Inverse Synapse
(diagram: the same connection traversed backwards, from neuron2 to neuron1, through weight W)
The weight serves as an amplifier!
Values (v1, v2) = errors
20
Inverse Synapse
(same diagram as slide 19: the error is propagated backwards through the weight W)
21
BKP Error
(diagram: input layer I1, hidden layer O2/I2, output layer O1 with output y1 and target t1)
Hidden layer: Error = ?
22
Backpropagation to the hidden layer
(diagram: the error is propagated back from the output layer to the hidden layer O2/I2)
23
Update rule for 2 weight types
  • weights from I2 (hidden layer) to O1 (system output)
  • weights from I1 (system input) to O2 (hidden layer)
  • Δw_j,i = α (t_i − y_i) f'(S_i) h_j = α δ_i h_j (simplification: f' = 1 for a repeater, e.g.)
  • S_i = Σ_j w_j,i(t) h_j
  • Δw_k,j = α (Σ_i δ_i w_j,i) f'(S_j) x_k = α δ_j x_k
  • S_j = Σ_k w_k,j(t) x_k
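A minimal sketch of these two update types for a network with one hidden layer (NumPy; the sigmoid activation and all variable names are assumptions for illustration, not fixed by the slides):

```python
import numpy as np

def f(s):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-s))

def bp_step_two_layers(x, t, W_hidden, W_out, alpha=0.1):
    """One BP step: W_hidden holds the I1->O2 weights, W_out the I2->O1 weights."""
    # forward pass
    S_j = W_hidden @ x            # S_j = Σ_k w_k,j x_k
    h = f(S_j)                    # hidden outputs h_j = f(S_j)
    S_i = W_out @ h               # S_i = Σ_j w_j,i h_j
    y = f(S_i)                    # system outputs y_i

    # deltas (for the sigmoid, f'(S) = f(S) (1 - f(S)))
    delta_i = (t - y) * y * (1 - y)                # δ_i = (t_i - y_i) f'(S_i)
    delta_j = (W_out.T @ delta_i) * h * (1 - h)    # δ_j = (Σ_i δ_i w_j,i) f'(S_j)

    # weight updates: Δw_j,i = α δ_i h_j  and  Δw_k,j = α δ_j x_k
    W_out = W_out + alpha * np.outer(delta_i, h)
    W_hidden = W_hidden + alpha * np.outer(delta_j, x)
    return W_hidden, W_out
```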

24
(more) Formal Derivation of BP
25
Notations for the BP derivation
26
BP premises
  • Suppose a finite training set X = { (x(q), t(q)) : x(q) ∈ R^{n+1}, t(q) ∈ (0,1) }, for q = 1, ..., P
  • x(q) = inputs, t(q) = required outputs
  • Consider (x, t) ∈ X, with actual output y^r
  • Then we can define the squared error
  • E(q) = ½ ‖t − y^r‖²
  • The I-O relation for layer s is y^s = F(W^s y^{s-1})
  • For the output layer: y^r = F(W^r F(W^{r-1} ... F(W^1 x)))
  • Gradient descent for a weight w will then be Δw = −α ∂E(q)/∂w

27
Weight computation in BP
E(q) = ½ ‖t − y^r‖²
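The missing link between this error and a weight update, written out (a sketch; naming the output-variation factor A anticipates the next slides):

```latex
\[
  \frac{\partial E^{(q)}}{\partial w}
    = \frac{\partial}{\partial w}\, \tfrac{1}{2}\, \| t - y^{r} \|^{2}
    = -\,(t - y^{r})^{T} \underbrace{\frac{\partial y^{r}}{\partial w}}_{A},
  \qquad
  \Delta w = -\alpha\, \frac{\partial E^{(q)}}{\partial w}
           = \alpha\, (t - y^{r})^{T} A
\]
```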
28
Cases for A (output variation)
y^s = F(W^s y^{s-1})
y^r = F(W^r y^{r-1})
  • w is in the output layer (between layer r-1 and layer r): w = w^r_ij
  • w is before layer r-1: w = w^s_ij (in a layer s < r)

29
Cases for A (cont.)
y^r = F(W^r F(W^{r-1} ... F(W^1 x)))
  • w is in the output layer (between layer r-1 and layer r): w = w^r_ij
  • w is before layer r-1: w = w^s_ij (in a layer s < r)
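A sketch of what A looks like in the two cases (component form; standard BP algebra, not copied from the slide images):

```latex
% Case 1: w = w^{r}_{ij}, directly before the output layer
% (\delta_{ki} below is the Kronecker delta, 1 iff k = i)
\[
  \frac{\partial y^{r}_{k}}{\partial w^{r}_{ij}}
    = f'\!\bigl(S^{r}_{i}\bigr)\, y^{r-1}_{j}\, \delta_{ki}
\]
% Case 2: w = w^{s}_{ij} in a deeper layer (s < r); the same local factor
% is pushed forward through every layer between s and r
% (F'_{u} = diag(f'(S^{u})), e_{i} = i-th unit vector):
\[
  \frac{\partial y^{r}}{\partial w^{s}_{ij}}
    = F'_{r} W^{r}\, F'_{r-1} W^{r-1} \cdots F'_{s+1} W^{s+1}\,
      e_{i}\, f'\!\bigl(S^{s}_{i}\bigr)\, y^{s-1}_{j}
\]
```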

30
Arbitrary output variation?
y^s = F(W^s y^{s-1})
31
  • Up to now, we have computed
  • A = the output variation with respect to a weight change
  • What does this mean for
  • Δw = the weight change?

32
Cases of weight backpropagation in BP (1)
  • Case 1. w is between layer r-1 and layer r

33
Cases of weight backpropagation in BP (2)
  • Case 2. w is before layer r-1 (in a layer s < r)
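The update rules these two cases lead to, in the matrix form used by the algorithm on slide 38 (a sketch):

```latex
% output layer (case 1):
\[ \delta^{r} = F'_{r}\,(t - y^{r}) \]
% any earlier layer (case 2): errors propagated back through the weights
\[ \delta^{s-1} = F'_{s-1}\,\bigl(W^{s}\bigr)^{T} \delta^{s} \]
% weight increase for every layer s:
\[ \Delta W^{s} = \alpha\, \delta^{s}\, \bigl(y^{s-1}\bigr)^{T} \]
```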

34
Actual vectors & matrices used in the BP derivation (1)
Weights
Activation function
f = the standard activation function, or another continuous function
35
Actual vectors & matrices used in the BP derivation (2)
Why must f be continuous?
The derivative of f is used in the formulas
The output of f is used in the formulas
36
Elements of BP
37
Backpropagated error
  • We have defined the error in the output layer: δ^r = F'_r (t − y^r)
  • which is backpropagated as: δ^{s-1} = F'_{s-1} (W^s)^T δ^s
  • and we defined the weight increase: ΔW^s = α δ^s (y^{s-1})^T

38
Backpropagation algorithm
  • FOR s = 1 TO r DO W^s = initial matrix (often random)
  • REPEAT
  • select a pair (x, t) in X; y^0 = x
  • forward phase: compute the actual output y^s of the network with input x
  • FOR s = 1 TO r DO y^s = F(W^s y^{s-1}) END
  • y^r is the output vector of the network
  • backpropagation phase: propagate the errors back through the network
    and adapt the weights of all layers
  • δ^r = F'_r (t − y^r)
  • FOR s = r TO 2 DO δ^{s-1} = F'_{s-1} (W^s)^T δ^s
  • W^s = W^s + α δ^s (y^{s-1})^T END
  • W^1 = W^1 + α δ^1 (y^0)^T
  • UNTIL stop criterion
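A minimal runnable sketch of this algorithm (NumPy; the sigmoid activation, the random initialisation, the fixed number of epochs as the stop criterion and the XOR usage example are illustrative assumptions; the example may need more epochs or another seed to converge fully):

```python
import numpy as np

def F(s):
    """Sigmoid activation, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-s))

def F_prime(y):
    """Sigmoid derivative written in terms of the output: f'(S) = y (1 - y)."""
    return y * (1.0 - y)

def backprop(X, layer_sizes, alpha=0.5, epochs=5000, seed=0):
    """Train an MLP with the BP algorithm of slide 38.

    X: list of (x, t) pairs (1-D arrays); layer_sizes: [n_inputs, ..., n_outputs].
    """
    rng = np.random.default_rng(seed)
    # FOR s = 1 TO r DO W^s = initial matrix (often random)
    W = [rng.normal(0.0, 0.5, (layer_sizes[s + 1], layer_sizes[s]))
         for s in range(len(layer_sizes) - 1)]
    for _ in range(epochs):                      # REPEAT ... UNTIL stop criterion
        for x, t in X:                           # select a pair (x, t) in X
            # forward phase: y^0 = x, y^s = F(W^s y^{s-1})
            y = [x]
            for Ws in W:
                y.append(F(Ws @ y[-1]))
            # backpropagation phase: δ^r = F'_r (t - y^r), then go backwards
            delta = F_prime(y[-1]) * (t - y[-1])
            for s in range(len(W) - 1, -1, -1):
                new_delta = F_prime(y[s]) * (W[s].T @ delta)  # δ^{s-1} = F'_{s-1} (W^s)^T δ^s
                W[s] = W[s] + alpha * np.outer(delta, y[s])   # W^s += α δ^s (y^{s-1})^T
                delta = new_delta
    return W

# Usage: XOR with a constant bias input appended to x and one hidden layer
data = [(np.array([a, b, 1.0]), np.array([float(a ^ b)]))
        for a in (0, 1) for b in (0, 1)]
W = backprop(data, layer_sizes=[3, 4, 1])
for x, t in data:
    y = x
    for Ws in W:
        y = F(Ws @ y)
    print(x[:2], t, np.round(y, 2))  # compare network outputs with the XOR targets
```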

39
Summarizing BP
  • to train a NN, we adjust the weights to reduce the error between the desired & actual output
  • the NN should compute the error derivative of the weights:
  • how the error changes as each weight is increased or decreased slightly
  • BP is the most widely used method for determining the error derivative

40
Summarizing BP explanation (1)
  • BP is easiest to understand if the NN units are linear
  • BP computes each error derivative as the rate of error change per unit of activation change
  • output units: the difference between the actual & desired output
  • hidden unit just before the output layer: multiply by the weights between the hidden & output units and add the products
  • for the other layers: move from layer to layer, opposite to the way activities propagate through the NN
  • This is what gives back-propagation its name

41
BKP Error
(diagram: output y1 with target t1; the error flows back through the weight W)
42
Summarizing BP explanation (2)
  • for non-linear units, BP includes an extra step
  • before back-propagating, the rate of error change per activation change must be converted into the rate at which the error changes as the total input received by a unit is changed

43
BKP Error
(diagram: same network as slide 41, with output y1, target t1 and weight W)
44
Algorithms and their relations
Discrete neuron — Perceptron Learning: Δw = α (t − y) x_i
Continuous neuron — Delta Rule (gradient descent): Δw = α (t − y) f' x_i = α (t − y) y (1 − y) x_i
Continuous neurons — BP (gradient descent): δ^r = F'_r (t − y^r), δ^{s-1} = F'_{s-1} (W^s)^T δ^s, ΔW^s = α δ^s (y^{s-1})^T
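The step from f' x_i to y (1 − y) x_i assumes the logistic (sigmoid) activation; a quick check:

```latex
\[
  y = f(S) = \frac{1}{1 + e^{-S}}
  \qquad\Longrightarrow\qquad
  f'(S) = \frac{e^{-S}}{\bigl(1 + e^{-S}\bigr)^{2}}
        = \frac{1}{1 + e^{-S}} \cdot \frac{e^{-S}}{1 + e^{-S}}
        = y\,(1 - y)
\]
```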
45
Course to be found at
  • http://wwwis.win.tue.nl/alex/
  • Neural Networks (2L490)