Transcript and Presenter's Notes

Title: Neural Networks


1
Neural Networks
and
2
Pattern Recognition
3
Giansalvo EXIN Cirrincione
Sabine Van Huffel
4
unit 5
5
The multi-layer perceptron
Feed-forward network mappings
Feed-forward neural networks provide a general
framework for representing non-linear functional
mappings.
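As a concrete illustration (a minimal NumPy sketch, not taken from the slides), a two-layer network with tanh hidden units and linear output units implements the mapping y = W2 tanh(W1 x + b1) + b2:

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """Two-layer feed-forward mapping: tanh hidden units, linear outputs."""
    a1 = W1 @ x + b1          # first-layer activations
    z = np.tanh(a1)           # hidden-unit outputs
    return W2 @ z + b2        # network outputs: deterministic functions of x

# example: d = 3 inputs, M = 5 hidden units, c = 2 outputs (random weights)
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)
W2, b2 = rng.normal(size=(2, 5)), np.zeros(2)
print(forward(rng.normal(size=3), W1, b1, W2, b2))
```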
6
The multi-layer perceptron
Feed-forward network mappings
Layered networks
7
The multi-layer perceptron
Feed-forward network mappings
six layers
8
The multi-layer perceptron
Feed-forward network mappings
Hinton diagram
The size of a square is proportional to the
magnitude of the corresponding parameter and the
square is black or white according to whether the
parameter is positive or negative.
9
The multi-layer perceptron
Feed-forward network mappings
The outputs can be expressed as deterministic
functions of the inputs
General topologies
10
Threshold units
A two-layer network can generate any Boolean
function provided the number M of hidden units is
sufficiently large.
Binary inputs
11
Each hidden unit acts as a template for the
corresponding input pattern and only generates an
output when the input pattern matches the
template pattern
Threshold units
(diagram: hidden-unit weights 1, -1, 1 and bias 1 - b)
Binary inputs
no generalization
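A sketch of the template construction for Boolean functions, assuming {0,1} inputs, step units that fire for strictly positive activation, and that b denotes the number of 1-bits in the template pattern (all names here are illustrative):

```python
import itertools
import numpy as np

step = lambda a: (a > 0).astype(float)   # threshold unit

def template_network(positive_patterns, d):
    """One hidden 'template' unit per input pattern that should map to 1:
    weight +1 where the pattern bit is 1, -1 where it is 0, bias 1 - b
    (b = number of ones), so the unit fires only on an exact match.
    The output unit ORs the hidden units (all weights 1, bias -0.5)."""
    W = np.array([[1.0 if p[i] else -1.0 for i in range(d)] for p in positive_patterns])
    b = np.array([1.0 - sum(p) for p in positive_patterns])
    def net(x):
        z = step(W @ np.asarray(x, float) + b)    # hidden layer
        return step(z.sum() - 0.5)                # output layer (OR)
    return net

# check: reproduce 3-input parity, which needs M = 4 hidden units
target = lambda x: sum(x) % 2
net = template_network([p for p in itertools.product([0, 1], repeat=3) if target(p)], d=3)
assert all(net(x) == target(x) for x in itertools.product([0, 1], repeat=3))
```

Each hidden unit fires only on an exact match with its template, which is why the construction memorizes the training patterns and gives no generalization.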
12
Possible decision boundaries
single convex region
AND
  • M hidden units
  • output bias -M

Relaxing this, more general decision boundaries
can be constructed
Continuous inputs
13
Possible decision boundaries
hidden unit activation transitions from 0 to 1
(diagram: regions labelled 2, 3 and 4)
hyperplanes corresponding to hidden units
The second-layer weights are all set to 1 and so
the numbers represent the value of the linear sum
presented to the output unit
Continuous inputs
14
Possible decision boundaries
output unit bias -3.5
(diagram: regions labelled 2, 3 and 4)
Continuous inputs
non-convex decision boundary
15
Possible decision boundaries
output unit bias -4.5
Continuous inputs
disjoint decision region
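The role of the output-unit bias in these constructions can be reproduced in a few lines (a sketch with illustrative hyperplanes and bias values, not the ones in the figures): with all second-layer weights set to 1, the output fires wherever the number of active hidden units exceeds the bias magnitude, so requiring all M units gives a convex AND region while smaller counts give more general regions.

```python
import numpy as np

# hidden threshold units = half-planes w.x + b > 0 (illustrative choices)
W = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])  # M = 4
b = np.array([1.0, 1.0, 1.0, 1.0])

def decision(x, output_bias):
    z = (W @ x + b > 0).astype(float)        # hidden-unit outputs
    return float(z.sum() + output_bias > 0)  # second-layer weights all 1

grid = [np.array([x1, x2]) for x1 in np.linspace(-2, 2, 9) for x2 in np.linspace(-2, 2, 9)]
for bias in (-3.5, -2.5):  # -3.5: all 4 half-planes required (convex AND); -2.5: any 3 suffice
    on = sum(decision(x, bias) for x in grid)
    print(f"output bias {bias}: {int(on)} of {len(grid)} grid points in class 1")
```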
16
IMPOSSIBLE decision boundaries: an example
However, any given decision boundary can be
approximated arbitrarily closely by a two-layer
network having sigmoidal activation functions.
Continuous inputs
17
Possible decision boundaries
Arbitrary decision region
Continuous inputs
18
Possible decision boundaries
divide the input space into a fine grid of
hypercubes
Continuous inputs
19
Possible decision boundaries
CONCLUSION: Feed-forward neural networks with
threshold units can generate arbitrarily complex
decision boundaries.
Problem: classify a dichotomy
For N data points in general position in
d-dimensional space, a network with ⌈N/d⌉ hidden
units in a single hidden layer can separate them
correctly into two classes.
Continuous inputs
20
Sigmoidal units
linear transformation
A neural network using tanh activation functions
is equivalent to one using logistic activation
functions but having different values for the
weights and biases. Empirically, tanh activation
functions often give rise to faster convergence
of training algorithms than logistic functions.
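The equivalence follows from tanh(a) = 2σ(2a) - 1: a logistic network reproduces a tanh network once the first-layer weights and biases are doubled, the second-layer weights doubled, and the output bias shifted. A quick numerical check (a sketch with arbitrary random weights):

```python
import numpy as np

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(1)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # first layer
w2, b2 = rng.normal(size=4), rng.normal()              # second layer (single output)

# tanh network
y_tanh = w2 @ np.tanh(W1 @ x + b1) + b2

# equivalent logistic network, using tanh(a) = 2*sigmoid(2a) - 1
y_logistic = (2 * w2) @ sigmoid(2 * W1 @ x + 2 * b1) + (b2 - w2.sum())
print(np.isclose(y_tanh, y_logistic))   # True
```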
21
Sigmoidal units
linear output units
22
Three-layer networks
They approximate, to arbitrary accuracy, any
smooth mapping.
23
Sigmoidal units
They approximate arbitrarily well any continuous
functional mapping
They approximate arbitrarily well any decision
boundary
They approximate arbitrarily well both a function
and its derivative
two-layer networks
24
Sigmoidal units
  • 1-5-1
  • BFGS

two-layer networks
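A sketch of a 1-5-1 experiment of this kind, using SciPy's BFGS optimizer; the target function, data and initialization are illustrative assumptions, not the ones on the slide:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 50)
t = np.sin(2 * np.pi * x)            # illustrative 1-D target

M = 5                                # 1-5-1 architecture

def unpack(p):
    w1, b1 = p[:M], p[M:2*M]         # first layer (1 input)
    w2, b2 = p[2*M:3*M], p[3*M]      # second layer (1 linear output)
    return w1, b1, w2, b2

def predict(p, x):
    w1, b1, w2, b2 = unpack(p)
    return np.tanh(np.outer(x, w1) + b1) @ w2 + b2

def sse(p):
    return 0.5 * np.sum((predict(p, x) - t) ** 2)   # sum-of-squares error

res = minimize(sse, rng.normal(scale=0.5, size=3*M + 1), method="BFGS")
print("final error:", res.fun)
```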
25
Generalized Mapping Regressor (GMR)
Pollock, Convergence 10
26
Weight-space symmetries
27
Error back-propagation
Credit assignment problem
  • Hessian matrix evaluation
  • Jacobian evaluation
  • several error functions
  • several kinds of networks

back-propagation
e.g. gradient descent
28
Error back-propagation
  • arbitrary feed-forward topology
  • arbitrary differentiable non-linear activation
    function
  • arbitrary differentiable error function

29
Error back-propagation
First step forward propagation
30
Error back-propagation
δ computation
hidden unit
output unit
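For the common case of tanh hidden units, linear output units and a sum-of-squares error, the two steps look as follows (a minimal single-pattern sketch; the δ's are the quantities back-propagated from the output units to the hidden units):

```python
import numpy as np

def backprop(x, t, W1, b1, W2, b2):
    # forward propagation
    a1 = W1 @ x + b1
    z = np.tanh(a1)
    y = W2 @ z + b2                                # linear output units

    # delta computation
    delta_out = y - t                              # output units: delta_k = y_k - t_k
    delta_hid = (1 - z**2) * (W2.T @ delta_out)    # hidden units: g'(a_j) * sum_k w_kj delta_k

    # derivatives: dE/dw = delta of the destination unit times the input to that weight
    return np.outer(delta_out, z), delta_out, np.outer(delta_hid, x), delta_hid

# usage: gradients (dW2, db2, dW1, db1) for one pattern
rng = np.random.default_rng(3)
W1, b1, W2, b2 = rng.normal(size=(4, 2)), np.zeros(4), rng.normal(size=(1, 4)), np.zeros(1)
dW2, db2, dW1, db1 = backprop(rng.normal(size=2), np.array([0.5]), W1, b1, W2, b2)
```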
31
Error back-propagation
example
32
Error back-propagation
33
Homework 1
Show, for a feed-forward network with tanh hidden
unit activation functions and a sum-of-squares
error function, that the origin in weight space
is a stationary point of the error function.
34
Homework 2
Let W be the total number of weights and biases.
Show that, for each input pattern, the cost of
backpropagation for the evaluation of all the
derivatives is O(W) (if the derivatives are
evaluated numerically by forward propagation, the
total cost is O(W²)).
35
Numerical differentiation
finite differences (perturb each weight in turn): O(W²)
symmetrical central finite differences: O(W²)
BP correctness check
node perturbation: O(MW)
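A sketch of the correctness check using symmetrical central finite differences; `error` and `gradient` stand for any differentiable error function and its back-propagation routine over a flattened weight vector (the quadratic toy error below is only a stand-in):

```python
import numpy as np

def gradient_check(error, gradient, w, eps=1e-6):
    """Compare analytic derivatives with symmetrical central finite differences:
    dE/dw_i ~ (E(w_i + eps) - E(w_i - eps)) / (2 eps).  One perturbation per
    weight, so O(W) forward passes per derivative and O(W^2) in total."""
    num = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        num[i] = (error(w_plus) - error(w_minus)) / (2 * eps)
    return np.max(np.abs(num - gradient(w)))

# usage with a quadratic toy error standing in for a network error function
A = np.array([[3.0, 1.0], [1.0, 2.0]])
err = lambda w: 0.5 * w @ A @ w
grad = lambda w: A @ w
print(gradient_check(err, grad, np.array([0.4, -0.7])))   # ~1e-10
```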
36
The Jacobian matrix
It provides a measure of the local sensitivity of
the outputs to changes in each of the input
variables.
It is valid only for small perturbations of the
inputs and the Jacobian must be re-evaluated for
each new input vector.
forward propagation
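For a two-layer network with tanh hidden units and linear outputs, forward propagation of the chain rule gives J = W2 · diag(1 - z^2) · W1, which depends on the current input through the hidden activations z and must therefore be recomputed for every input vector. A sketch with a finite-difference cross-check:

```python
import numpy as np

def jacobian(x, W1, b1, W2):
    """J_ki = dy_k/dx_i for y = W2 tanh(W1 x + b1) + b2 (linear outputs)."""
    z = np.tanh(W1 @ x + b1)
    return W2 @ (np.diag(1 - z**2) @ W1)   # chain rule through the hidden layer

# finite-difference cross-check
rng = np.random.default_rng(4)
W1, b1, W2, b2 = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)
f = lambda x: W2 @ np.tanh(W1 @ x + b1) + b2
eps = 1e-6
J_num = np.column_stack([(f(x + eps*e) - f(x - eps*e)) / (2*eps) for e in np.eye(3)])
print(np.allclose(jacobian(x, W1, b1, W2), J_num, atol=1e-6))   # True
```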
37
The Jacobian matrix
38
The Jacobian matrix
39
The Jacobian matrix
40
1. Several non-linear optimization algorithms
used for training neural networks are based
on the second-order properties of the error
surface.
2. The Hessian forms the basis of a fast
procedure for re-training a feed-forward network
following a small change in the training data.
3. The inverse Hessian is used to identify the
least significant weights in a network as
part of a pruning algorithm.
4. The inverse Hessian is used to assign error
bars to the predictions made by a trained
network.
41
The inverse of a diagonal matrix is trivial to
compute.
O(W)
42
Regression problems
straightforward extension
Levenberg-Marquardt approximation (outer product
approximation)
O(W²)
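A sketch of the outer-product approximation for a single-output network with a sum-of-squares error: the Hessian is approximated by the sum over patterns of g_n g_n^T, where g_n is the gradient of the network output with respect to the weights (obtained here by central differences for brevity; the tiny 1-2-1 network and parameter packing are illustrative assumptions):

```python
import numpy as np

def outer_product_hessian(y_of_w, w, patterns, eps=1e-6):
    """Levenberg-Marquardt (outer product) approximation for a sum-of-squares
    error: H ~ sum_n g_n g_n^T with g_n = grad_w y(x_n; w).  Second-derivative
    terms are dropped."""
    W = w.size
    H = np.zeros((W, W))
    for x in patterns:
        g = np.array([(y_of_w(w + eps*e, x) - y_of_w(w - eps*e, x)) / (2*eps)
                      for e in np.eye(W)])        # output gradient for this pattern
        H += np.outer(g, g)
    return H

# usage: tiny 1-2-1 tanh network, w packed as [w1 (2), b1 (2), w2 (2), b2]
def y_of_w(w, x):
    w1, b1, w2, b2 = w[0:2], w[2:4], w[4:6], w[6]
    return w2 @ np.tanh(w1 * x + b1) + b2

rng = np.random.default_rng(5)
H = outer_product_hessian(y_of_w, rng.normal(size=7), rng.normal(size=10))
print(H.shape, np.allclose(H, H.T))   # (7, 7) True
```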
43
Sherman-Morrison-Woodbury formula
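The rank-one (Sherman-Morrison) case of the formula, (H + g g^T)^-1 = H^-1 - (H^-1 g)(g^T H^-1) / (1 + g^T H^-1 g), lets the inverse of the outer-product Hessian be built up one pattern at a time. A numerical sketch, starting (as assumed here) from a small multiple of the identity so that the first inverse exists:

```python
import numpy as np

def sm_update(H_inv, g):
    """Sherman-Morrison rank-one update of an inverse: (H + g g^T)^-1."""
    Hg = H_inv @ g
    return H_inv - np.outer(Hg, Hg) / (1.0 + g @ Hg)

rng = np.random.default_rng(6)
alpha, W = 1e-2, 5
H = alpha * np.eye(W)          # start from a small multiple of the identity
H_inv = np.eye(W) / alpha
for _ in range(20):            # add one outer product per pattern
    g = rng.normal(size=W)
    H += np.outer(g, g)
    H_inv = sm_update(H_inv, g)
print(np.allclose(H_inv, np.linalg.inv(H)))   # True
```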
44
BP check
O(W²)
45
  • arbitrary feed-forward topology
  • arbitrary differentiable activation function
  • arbitrary differentiable error function

O(W²)
w_ij does not occur on any forward propagation
path connecting unit l to the outputs of the
network
46
Initial conditions: for each unit j (except for
input units) set h_jj = 1 and set h_kj = 0 ∀ k ≠ j
(units which do not lie on any forward
propagation path starting from unit j).
forward propagation
47
back propagation
48
ALGORITHM
1. Evaluate the activations of all of the hidden
and output units, for a given input pattern,
by forward propagation. Similarly, compute
the initial conditions for the h_kj and forward
propagate through the network to find the
remaining non-zero elements of h_kj.
2. Evaluate δ_k for the output units and,
similarly, evaluate the H_kk′ for all the
output units.
3. Use BP to find δ_j for all hidden units.
Similarly, back propagate to find the b_lj
by using the given initial conditions.
4. Evaluate the elements of the Hessian for this
input pattern.
5. Repeat the above steps for each pattern in the
TS and then sum to obtain the full Hessian.
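As with the gradient, an exact-Hessian routine can be cross-checked numerically (the BP check mentioned on an earlier slide): symmetrical central differences of the back-propagated gradient give the Hessian one column at a time, at O(W²) gradient evaluations. A sketch, with a quadratic toy error standing in for the network error function:

```python
import numpy as np

def numerical_hessian(gradient, w, eps=1e-5):
    """Column j ~ (g(w + eps e_j) - g(w - eps e_j)) / (2 eps)."""
    cols = []
    for e in np.eye(w.size):
        cols.append((gradient(w + eps*e) - gradient(w - eps*e)) / (2*eps))
    return np.column_stack(cols)

# toy check: E(w) = 0.5 w^T A w has exact Hessian A
A = np.array([[2.0, 0.5, 0.0], [0.5, 3.0, 1.0], [0.0, 1.0, 1.5]])
grad = lambda w: A @ w
print(np.allclose(numerical_hessian(grad, np.array([0.1, -0.2, 0.3])), A))   # True
```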
49
Exact Hessian for two-layer network
  • Legend
  • indices i and i′ denote inputs
  • indices j and j′ denote hidden units
  • indices k and k′ denote outputs

50
homework
51
Consider a feed-forward network which has been
trained to a minimum of some error function E,
corresponding to a set of weights w_j. Suppose
that all of the input values x_i^n and target
values t_k^n in the TS are perturbed by small
amounts Δx_i^n and Δt_k^n respectively. This causes
the minimum of the error function to change to a
new set of weight values given by w_j + Δw_j. Write
down the Taylor expansion (to second order in the
Δ's) of the new error function
By minimizing this expression w.r.t. the Δw_j,
show that the new set of weights which minimizes
the error function can be calculated from the
original set of weights by adding corrections Δw_j
which are given by solutions of the following
equation
where H_lj are the elements of the Hessian matrix,
and we have defined
52
Projection pursuit regression
Parameters are optimized cyclically in groups.
Specifically, training takes place for one hidden
unit at a time, and for each hidden unit the
second-layer weights are optimized first (OLS),
followed by the activation function
(one-dimensional curve fitting, e.g. cubic
splines), followed by the first-layer weights
(non-linear techniques). The process is repeated
for each hidden unit in turn until a stopping
criterion is satisfied.
Several generalizations to more than one output
variable are possible depending on whether the
outputs share common basis functions f_j and, if
not, whether the separate basis functions f_jk
(where k labels the outputs) share common
projection directions.
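A compact sketch of one such cyclic optimization for a single output variable: a low-order polynomial fit stands in for the cubic-spline step (and absorbs the second-layer weight that OLS would otherwise fit separately), SciPy's BFGS handles the first-layer projection weights, and the data, model size and number of sweeps are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 2))
t = np.sin(X[:, 0] + X[:, 1]) + 0.5 * X[:, 0] ** 2           # illustrative target

M, deg = 2, 3                                                # hidden units, polynomial degree
w = [rng.normal(size=2) for _ in range(M)]                   # first-layer (projection) weights
phi = [np.poly1d([0.0]) for _ in range(M)]                   # activation functions, start at zero

def unit_output(j):
    return phi[j](X @ w[j])

for sweep in range(5):                                       # cyclic passes over the hidden units
    for j in range(M):
        r = t - sum(unit_output(k) for k in range(M) if k != j)   # partial residual for unit j
        # 1) activation function: one-dimensional curve fit of residual vs. projection
        #    (polynomial stand-in for cubic-spline smoothing)
        s = X @ w[j]
        phi[j] = np.poly1d(np.polyfit(s, r, deg))
        # 2) first-layer weights: non-linear optimization with phi[j] held fixed
        obj = lambda wj: np.sum((r - phi[j](X @ wj)) ** 2)
        w[j] = minimize(obj, w[j], method="BFGS").x
    print("sweep", sweep, "SSE", np.sum((t - sum(unit_output(k) for k in range(M))) ** 2))
```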
53
END