Title: 2L490 Backpropagation 1
1. Error Backpropagation
- All learning algorithms for (layered) feed-forward networks are based on a technique called error backpropagation.
- This is a form of corrective supervised learning which consists of two phases. In the first (forward) phase the output of each neuron is computed; in the second (backward) phase the partial derivatives of the error function with respect to the weights are computed, after which the weights are updated.
2. Approach
- The approach we take
  - is a minor variation of the one in R. Rojas, Neural Networks, Springer, 1996,
  - applies to general feed-forward networks,
  - allows distinct activation functions for each of the neurons,
  - uses a graphical method called B-diagrams to illustrate how partial derivatives of the error function can be computed.
3. General Feed-forward Networks
- A general feed-forward network consists of
  - n input nodes (numbered 1, ..., n)
  - l hidden neurons (numbered n+1, ..., n+l)
  - m output neurons (numbered n+l+1, ..., n+l+m)
  - a set of connections such that the network does not contain cycles. Hence the hidden neurons can be topologically sorted, i.e. numbered such that, for every connection (i, j), i < j, n < j, and i < n+l+1.
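A minimal sketch of this representation and of the numbering condition above (my own illustration; the sizes and weights are made up):

import random  # not needed; plain Python suffices

# inputs 1..n, hidden n+1..n+l, outputs n+l+1..n+l+m
n, l, m = 2, 2, 1
# weights[(i, j)] = w_ij for each connection (i, j); indices are 1-based
weights = {
    (1, 3): 0.5, (2, 3): -0.3,   # inputs 1, 2 feed hidden neuron 3
    (1, 4): 0.8, (3, 4): 1.2,    # input 1 and neuron 3 feed hidden neuron 4
    (3, 5): -1.0, (4, 5): 0.7,   # neurons 3, 4 feed output neuron 5
}
# check the topological-numbering condition: i < j, n < j, i <= n + l
assert all(i < j and j > n and i <= n + l for (i, j) in weights)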
4. (No transcript)
5. B-diagrams
- A B-diagram is a directed acyclic network containing four types of nodes:
  - Fan-in nodes
  - Fan-out nodes
  - Product nodes
  - Function nodes
- The forward phase computes function composition, the backward phase computes partial derivatives.
6. B-diagram (fan-in node)
(Figure: forward phase / backward phase)
7. B-diagram (fan-out node)
(Figure: forward phase / backward phase)
8. B-diagram (product node)
(Figure: forward phase / backward phase)
9. B-diagram (function node)
(Figure: forward phase / backward phase)
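The figures for the four node types are not reproduced here; the following sketch (my own reconstruction, following Rojas's construction) summarizes the usual forward and backward rules they depict:

class FanIn:                        # addition node
    def forward(self, xs):          # sums its inputs
        self.n = len(xs)
        return sum(xs)
    def backward(self, d):          # passes the traversing value unchanged to every input
        return [d] * self.n

class FanOut:                       # copy node
    def forward(self, x, k):        # copies its input to k outgoing edges
        return [x] * k
    def backward(self, ds):         # sums the traversing values arriving from its outputs
        return sum(ds)

class Product:                      # models an edge with weight w
    def __init__(self, w):
        self.w = w
    def forward(self, x):           # multiplies the signal by the weight
        return self.w * x
    def backward(self, d):          # multiplies the traversing value by the weight
        return self.w * d

class Function:                     # node for a differentiable activation f
    def __init__(self, f, fprime):
        self.f, self.fprime = f, fprime
    def forward(self, x):           # evaluates f and stores f'(x) for the backward phase
        self.stored = self.fprime(x)
        return self.f(x)
    def backward(self, d):          # multiplies the traversing value by the stored derivative
        return self.stored * d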
10. Chain-rule
(g ∘ f)(x) = g(f(x))
(g ∘ f)'(x) = g'(f(x)) · f'(x)
11. Remark
Note that the product node, the fan-in node, and the function node are all special cases of a more general node for functions with an arbitrary number of arguments that stores all partial derivatives.
(Figure: a node for f(x1, x2))
12. (No transcript)
13. Translation scheme
- As a first step in the development of the error backpropagation algorithm we show how to translate a general feed-forward net into a B-diagram:
  - Replace each input node by a fan-out node
  - Replace each edge by a product node
  - Replace each neuron by a fan-in node, followed by a function node, followed by a fan-out node
14. Translation of a neuron
Note that this translation only captures the activation function and connection pattern of a neuron. The weights are modeled by separate product nodes.
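As an illustration (my own example, not the slide's figure): a neuron with inputs x_1, x_2, weights w_1, w_2 and activation f is translated into two product nodes, one fan-in node and one function node. The forward phase then computes, and the backward phase (fed with 1 at the output) delivers,

y = f(w_1 x_1 + w_2 x_2), \qquad
\frac{\partial y}{\partial x_1} = f'(w_1 x_1 + w_2 x_2)\, w_1, \qquad
\frac{\partial y}{\partial x_2} = f'(w_1 x_1 + w_2 x_2)\, w_2 .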
15. Simplifications
- The B-diagram of a general feed-forward net can be simplified as follows:
  - Neurons with a single output do not require a fan-out node
  - Neurons with a single input do not require a fan-in node
  - Neurons with activation function f(z) = z do not require a function node
  - Edges with weight 1 do not require a product node
16. Backpropagation theorem
Let B be the B-diagram of a general feed-forward net N that computes a function F: R^n → R. Presenting value x_i at input node i of B and performing the forward phase of each node (in the order indicated by the numbering of the nodes of N) will result in the value F(x) at the output of B. Subsequently presenting the value 1 at the output node and performing the backward phase will result in the partial derivative ∂F(x)/∂x_i at input i.
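A small numerical check of the theorem (my own example, with made-up weights): the B-diagram of F(x1, x2) = tanh(w1·x1 + w2·x2) is run forward and then backward with 1 at the output, and the result is compared with a finite-difference estimate of ∂F/∂x1.

import math

w1, w2 = 0.7, -1.3
x1, x2 = 0.4, 0.9

# forward phase: product nodes, fan-in node, function node
p1, p2 = w1 * x1, w2 * x2          # product nodes
s = p1 + p2                        # fan-in node
y = math.tanh(s)                   # function node; stores f'(s) = 1 - tanh(s)^2
stored = 1.0 - y * y

# backward phase: present 1 at the output
d_s = 1.0 * stored                 # function node multiplies by the stored derivative
d_p1, d_p2 = d_s, d_s              # fan-in node copies the traversing value
d_x1, d_x2 = w1 * d_p1, w2 * d_p2  # product nodes multiply by their weights

# compare with a finite-difference approximation of dF/dx1
eps = 1e-6
F = lambda a, b: math.tanh(w1 * a + w2 * b)
print(d_x1, (F(x1 + eps, x2) - F(x1 - eps, x2)) / (2 * eps))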
17. Error function
Consider a general FFN that computes F: R^n → R^m, with training set {(x_q, t_q)}. Then the error of training pair q is defined as follows.
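Presumably the usual sum-of-squares error (the formula itself is not shown here), with F(x_q) the network output and t_q the target of pair q:

E_q = \tfrac{1}{2}\,\lVert F(x_q) - t_q \rVert^{2}, \qquad E = \sum_{q} E_q .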
18. FFNs that compute Error Functions
Hidden neurons
19. Error Dependence on Weight w_ij
20. E(rror)B(ack)P(ropagation) Learning
21. EBP learning (forward phase)
22. EBP learning (backward phase)
23. EBP learning (update phase)
Beware: a weight update can only be performed after all errors that depend on that weight have been computed. A separate phase trivially guarantees this requirement.
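To make the caveat concrete, a hedged sketch (made-up values, my own code): gradients are accumulated during the backward phase and only applied in a separate update phase, so no delta is ever computed with partially updated weights.

weights = {(1, 3): 0.5, (2, 3): -0.3}   # connection (i, j) -> w_ij
output  = {1: 0.4, 2: 0.9}              # forward-phase outputs y_i (made up)
delta   = {3: 0.25}                     # backward-phase errors delta_j (made up)

# backward phase: only accumulate gradients, do not modify 'weights' yet
grad = {(i, j): delta[j] * output[i] for (i, j) in weights}   # dE/dw_ij = delta_j * y_i

# separate update phase: apply all updates at once
eta = 0.1
for key in weights:
    weights[key] -= eta * grad[key]
print(weights)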
24. Layered version of EBP
- To obtain a version of the error backpropagation algorithm for layered feed-forward networks, i.e. multi-layer perceptrons, we
  - introduce a layer-oriented node numbering,
  - visit the nodes on a layer-by-layer basis,
  - introduce vector notation for quantities pertaining to a single layer.
25. Layer-oriented Node Numbers
- Assume that the nodes of the network can be organized in r+1 layers, numbered 0, ..., r
- For 0 ≤ s ≤ r+1, let n_s denote the number of nodes in layers 0, ..., (s-1). Hence node i lies in layer s iff n_s < i ≤ n_{s+1}
- Renumber the nodes according to this scheme
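A small worked example of this numbering (my own, for illustration): a network with 2 input nodes and layers of 3 and 1 neurons has r = 2 and layer sizes 2, 3, 1, so

n_0 = 0, \quad n_1 = 2, \quad n_2 = 5, \quad n_3 = 6,

and nodes 1-2 form layer 0, nodes 3-5 layer 1, and node 6 layer 2.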
26. Weight Matrix of Layer s
Let W^s be the (n_s × n_{s-1})-matrix of weights on the connections from layer s-1 to layer s. Note that for the sake of simplicity we have added zero weights, so that there exists a connection between any pair of nodes in successive layers. For convenience we write w^s_ij instead of (W^s)_ij.
27. EBP (forward phase, layered)
28. EBP (backward phase, layered)
29. EBP (update phase, layered)
30. Vector notation
For a continuous and differentiable function f: R → R and a vector z ∈ R^n of arbitrary dimension n, define the n-dimensional vector f(z) and the corresponding diagonal matrix as follows.
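The standard componentwise definitions, which are presumably what the slide shows:

f(z) = \bigl(f(z_1), \dots, f(z_n)\bigr)^{\mathsf T}, \qquad
D_f(z) = \operatorname{diag}\bigl(f'(z_1), \dots, f'(z_n)\bigr).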
31. EBP (layered and vectorized)
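A sketch of the layered, vectorized EBP step that slides 27-31 describe (the slide formulas are not reproduced here, so the sigmoid activation, sum-of-squares error, learning rate and variable names are my own assumptions):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ebp_step(W, x, t, eta=0.1):
    """One forward/backward/update pass. W is a list of weight matrices,
    W[s] mapping the outputs of layer s to the inputs of layer s+1."""
    # forward phase: y[s] is the output vector of layer s, z[s] its net input
    y, z = [x], []
    for Ws in W:
        z.append(Ws @ y[-1])
        y.append(sigmoid(z[-1]))

    # backward phase: delta[s] = D_f(z[s]) * (backpropagated error)
    deriv = [ys * (1.0 - ys) for ys in y[1:]]          # sigmoid' written via its output
    delta = [None] * len(W)
    delta[-1] = deriv[-1] * (y[-1] - t)                # output layer
    for s in range(len(W) - 2, -1, -1):                # hidden layers, back to front
        delta[s] = deriv[s] * (W[s + 1].T @ delta[s + 1])

    # update phase: Delta W[s] = -eta * delta[s] * y[s]^T
    for s in range(len(W)):
        W[s] -= eta * np.outer(delta[s], y[s])
    return 0.5 * np.sum((y[-1] - t) ** 2)

# tiny usage example with made-up sizes and data
rng = np.random.default_rng(0)
W = [rng.standard_normal((3, 2)), rng.standard_normal((1, 3))]
for _ in range(5):
    err = ebp_step(W, np.array([0.2, 0.8]), np.array([1.0]))
print(err)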
32. Practical Aspects
- Convergence improvements
  - Elementary improvements
  - Advanced first-order methods
  - Second-order methods
- Generalization
  - Overtraining
  - Training with cross validation
33. Elementary Improvements
- Momentum term
- Resilient backpropagation
  - the gradient determines the sign of the weight updates
  - the learning rate increases for a stable gradient
  - the learning rate decreases for an alternating gradient
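Hedged sketches of both improvements (my own code; the slides give no specific constants, so the defaults below are common but assumed):

import numpy as np

def momentum_update(w, grad, velocity, eta=0.1, alpha=0.9):
    """Gradient descent with a momentum term: part of the previous update
    is re-applied, which damps oscillations in narrow valleys."""
    velocity = alpha * velocity - eta * grad
    return w + velocity, velocity

def rprop_update(w, grad, prev_grad, step, inc=1.2, dec=0.5):
    """Resilient backpropagation: only the sign of the gradient is used;
    the per-weight step grows while the sign stays stable and shrinks
    when it alternates."""
    same_sign = grad * prev_grad
    step = np.where(same_sign > 0, step * inc,
           np.where(same_sign < 0, step * dec, step))
    return w - np.sign(grad) * step, step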
34. First-order Methods
- Steepest descent, where the step size is chosen such that the error along the descent direction is minimal (a line search).
- Conjugate gradient methods: the search directions are given by the negative gradient plus a suitably chosen multiple of the previous direction.
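In the usual notation (a reconstruction, since the slide's formulas are omitted): steepest descent performs a line search along the negative gradient,

w_{k+1} = w_k - \eta_k \nabla E(w_k), \qquad
\eta_k = \arg\min_{\eta \ge 0} E\bigl(w_k - \eta\,\nabla E(w_k)\bigr),

while conjugate gradient methods use search directions

d_k = -\nabla E(w_k) + \beta_k\, d_{k-1},

with \beta_k suitably chosen (e.g. by the Fletcher-Reeves or Polak-Ribière formula).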
35. Second-order Methods (derivation)
- Consider the Taylor expansion of the error function around w_0.
- Ignore third- and higher-order terms and choose the weight change such that the resulting quadratic approximation is minimal, i.e.
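Written out (a reconstruction of the omitted formulas), with H the Hessian of E at w_0:

E(w_0 + \Delta w) \approx E(w_0) + \nabla E(w_0)^{\mathsf T}\,\Delta w
  + \tfrac{1}{2}\,\Delta w^{\mathsf T} H\,\Delta w,
\qquad
\Delta w = -H^{-1}\,\nabla E(w_0).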
36. (Quasi-)Newton methods
- Quasi-Newton methods use the update rule given below.
- Fast convergence (Newton's method requires 1 iteration for a quadratic error function)
- Solving the above equation is time consuming
- The Hessian matrix H can be very large
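The update rule is presumably the Newton step

H(w_k)\,\Delta w = -\nabla E(w_k), \qquad w_{k+1} = w_k + \Delta w,

where quasi-Newton methods (e.g. BFGS) replace H or its inverse by an approximation built up from successive gradients.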
37. Levenberg-Marquardt Methods
- LM-methods use the update rule given below.
- This is a combination of gradient descent and Newton's method:
  - if the damping parameter is small, the step approaches the Newton step
  - if it is large, the step approaches a small gradient-descent step
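The rule referred to is presumably

\Delta w = -\bigl(H + \lambda I\bigr)^{-1}\,\nabla E(w),

so that for small \lambda the step approaches the Newton step, while for large \lambda it approaches -\tfrac{1}{\lambda}\nabla E(w), i.e. gradient descent with a small step size. (For sum-of-squares errors, H is often approximated by J^T J with J the Jacobian of the residuals.)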
38. Generalization
- Generalization addresses how well a net performs on fresh samples from the population (samples not part of the training set).
- Generalization is influenced by three factors:
  - The architecture of the network
  - The size of the training set
  - The complexity of the problem
39. Overtraining
- Overtraining is the situation in which the network memorizes the data of the training set but generalizes poorly.
- The size of the training set must be related to the amount of data the network can memorize (i.e. the number of weights).
- Vice versa: in order to prevent overtraining, the number of weights must be kept in proportion to the size of the training set.
40. Cross Validation
- To protect against overtraining a technique called cross-validation can be used. It involves
  - an additional data set called the validation set,
  - computing the error made by the net on this validation set while training with the training set,
  - stopping training when the error on the validation set starts increasing.
- Usually the size of the validation set is chosen to be roughly half the size of the training set.
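A hedged sketch of this early-stopping procedure (the net object and its methods train_one_epoch, error, get_weights and set_weights are hypothetical names, not from the slides):

def train_with_validation(net, train_set, val_set, max_epochs=1000):
    """Train with ordinary EBP, but monitor the validation error and stop
    as soon as it starts increasing; keep the best weights seen so far."""
    best_err, best_weights = float("inf"), net.get_weights()
    for epoch in range(max_epochs):
        net.train_one_epoch(train_set)        # EBP on the training set
        val_err = net.error(val_set)          # error on the validation set
        if val_err > best_err:                # validation error starts increasing
            break                             # -> stop training
        best_err, best_weights = val_err, net.get_weights()
    net.set_weights(best_weights)
    return best_err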
41. Practical Aspects
- Preprocessing
  - Normalization
  - Decorrelation
- Network pruning
  - Magnitude-based
  - Optimal brain damage
  - Optimal brain surgeon
42. Preprocessing
- Normalization
- Decorrelation
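A hedged sketch of both preprocessing steps (my own code): normalization to zero mean and unit variance per input component, and decorrelation by rotating the inputs onto the principal axes of their covariance matrix.

import numpy as np

def normalize(X):
    """X: array of shape (num_samples, num_inputs)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def decorrelate(X):
    """Rotate the centered inputs onto the eigenvectors of their covariance
    matrix, so that the components become uncorrelated."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    return Xc @ eigvecs

X = np.random.default_rng(1).normal(size=(100, 3))
print(np.cov(decorrelate(normalize(X)), rowvar=False).round(3))  # ~diagonal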
43. Pruning
- Pruning is a technique to increase network performance by elimination (pruning in the strict sense) or addition (pruning in the broad sense) of neurons and/or connections.

  training set error   validation set error   action taken
  too large            irrelevant             add neurons
  small                too large              remove neurons
  small                small                  stop pruning
44. Pruning connections
Optimal Brain Damage
Optimal Brain Surgeon
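For reference (the slide's formulas are not transcribed): Optimal Brain Damage estimates the saliency of a weight from a diagonal approximation of the Hessian, while Optimal Brain Surgeon uses the full inverse Hessian,

s_i \approx \tfrac{1}{2}\, H_{ii}\, w_i^{2} \quad \text{(OBD)},
\qquad
s_i = \frac{w_i^{2}}{2\,[H^{-1}]_{ii}} \quad \text{(OBS)},

and the connection with the smallest saliency is pruned (OBS additionally adjusts the remaining weights after each removal).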