Title: Function Approximation With ANNs
1. Function Approximation With ANNs
- Resources
  - Chapter 20, textbook; Section 20.5
  - Fausett (1994), chapter on the delta rule
- Supervised training
  - Delta rule: single-layer networks
  - Backpropagation: multi-layer networks
2. Single-Layer Feed-Forward ANN
- Network has n inputs and m outputs
- One layer of weights
- Training data
  - pairs (input, output)
  - input is a vector of length n
  - output is a vector of length m
- The function approximation problem is to find neural network weights defining a function f such that f(input) ≈ output
- Sometimes called association learning
[Figure: single-layer network, n inputs fully connected to m outputs]
3. Interpolation Problem
- May be viewed as multidimensional interpolation
  - the numbers of dimensions correspond to the numbers of input units and output units
- Example: a one-dimensional input, one-dimensional output problem
  - input vectors have only one component; output vectors also have one component
  - this is now a curve-fitting problem: which curve we choose to fit the data will change the result for new inputs, i.e., for generalization
  - in general, there is no unique curve to choose
- Raises a sampling issue
  - we need data that adequately sample the distribution
4. Sampling Issue

[Figure: example of an underlying system from which data are obtained]

If this were the underlying system we are obtaining data from, how would we select the samples?
5. Example of Association Learning
- Images of three types, converted to grayscale
- Want to associate each image with a shape name
  - dog shape with an audio representation of the word "dog"
  - yellow pail with an audio representation of "pail"
  - green bucket with an audio representation of "bucket"
- Assume each image is 16x16 pixels (256 inputs)
- Assume the output audio is represented by 10 real-number values
- How do we find neural network weights to approximate f(image) ≈ audio?
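A rough sketch of the shapes involved, assuming the 256-input, 10-output layout above; the random image and weights here are placeholders, not the course data.

```python
import numpy as np

# Illustrative sketch only: a single weight layer mapping a 16x16 grayscale
# image (flattened to 256 inputs) to a 10-value audio representation.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(10, 256))   # one weight per (output, input) pair

def forward(image_16x16):
    x = image_16x16.reshape(256)             # flatten the image into the input vector
    return W @ x                             # weighted sums give the 10 output values

audio_estimate = forward(rng.random((16, 16)))
print(audio_estimate.shape)                  # (10,)
```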
6. Simplify the Problem
- One output unit
- Activation function: identity, f(x) = x
  - output is defined just by the weighted sum of the inputs
- Example problem, with n = 3
[Figure: n inputs fully connected to 1 output unit]
How do we establish weights for the ANN?
7. General Weight Update Algorithm
- Initialize the weights to random values
- Compute errors
- While the errors produced by the network are too great:
  - for the current sample:
    - update the network weights using a weight update rule
    - re-compute the error for the current sample
- (Incremental weight update; a sketch of this loop follows below)
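A minimal sketch of this loop in Python. The update rule is passed in (e.g., the delta rule introduced on the next slide); the error measure, tolerance, and names are illustrative assumptions, not part of the slides.

```python
import numpy as np

# Incremental (per-sample) training loop sketched on this slide.
def train(samples, n_inputs, update_rule, tolerance=1e-3, max_epochs=1000):
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.5, 0.5, size=n_inputs)     # initialize weights randomly
    for _ in range(max_epochs):
        total_error = 0.0
        for x, t in samples:                      # one pass over the data = one epoch
            w = update_rule(w, x, t)              # update weights for the current sample
            total_error += (t - w @ x) ** 2       # re-compute error for this sample (identity activation)
        if total_error < tolerance:               # stop once the errors are not "too great"
            break
    return w
```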
8. Weight Update Rule 1: Delta Rule

Δw_I = α (t − y_in) x_I

- Δw_I: change in the I-th weight of the weight vector
- α: learning rate (scalar, constant)
- t: target, or correct, output
- y_in: net (summed, weighted) input to the output unit
- x_I: I-th input value
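A minimal Python rendering of this rule (the function and parameter names are mine, not from the slides):

```python
import numpy as np

# Delta rule for a single output unit with an identity activation:
# delta_w_I = alpha * (t - y_in) * x_I
def delta_rule_update(w, x, t, alpha=0.5):
    y_in = w @ x                       # net (summed, weighted) input to the output unit
    return w + alpha * (t - y_in) * x  # adjust every weight in proportion to its input

# Usable as the 'update_rule' in the training loop sketched above.
```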
9. Example
- W = (W1, W2, W3)
- Initially W = (0.5, 0.2, 0.4)
- Let α = 0.5
- Apply the delta rule, Δw_I = α (t − y_in) x_I

[Figure: three-input network with weights W1, W2, W3 feeding one output unit]
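As an illustration of a single update (the actual training samples are in the linked spreadsheet; this input/target pair is made up): with x = (1, 1, 1) and t = 1, y_in = 0.5 + 0.2 + 0.4 = 1.1, so Δw_I = 0.5 · (1 − 1.1) · 1 = −0.05 for every weight, and W becomes (0.45, 0.15, 0.35).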
10. One Epoch of Training
[Table: delta-rule weight updates over one epoch of training]

11. Step 1 of Training
[Table: delta-rule weight update for the first training step]

12. Remaining Steps in First Epoch of Training
[Table: delta-rule weight updates for the remaining steps of the first epoch]
13. Completing the Example
- After 18 epochs, the weights are
  - W1 = 0.990735
  - W2 = -0.970018005
  - W3 = 0.98147
- Does this adequately approximate the training data?

[Figure: network with the learned weights W1, W2, W3]
http://www.cprince.com/courses/cs5541fall03/lectures/neural-networks/delta-rule1.xls
14. Actual Outputs
So, we have one method to incrementally adjust the network weights, based on a series of training samples. This is typically called training or learning.
15. What About...
- The following weights?
  - W1 = 1
  - W2 = -1
  - W3 = 1
- Generalization? E.g., the new inputs below (see the quick check that follows):
  - (0 1 0)
  - (1 1 0)
  - (0 1 1)
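A quick check of what these weights produce for the new inputs, using the identity activation from slide 6 (the output is just the weighted sum):

```python
import numpy as np

W = np.array([1.0, -1.0, 1.0])
for x in ([0, 1, 0], [1, 1, 0], [0, 1, 1]):
    print(x, W @ np.array(x, dtype=float))   # -> -1.0, 0.0, 0.0
```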
16. Why Is the Delta Rule Effective?
- The delta rule implements a form of error minimization
  - change the weights to reduce the sum squared error, E
- For a specific training pattern, the sum squared error is E = (t − y_in)²
- Recall
  - t: desired output of the network
  - y_in: actual output of the network
- The derivative of E gives the slope, or gradient, of E
  - it gives both the direction of most rapid increase in E, the error, and the direction of most rapid decrease
- We want the derivative with respect to the weights
  - we are adjusting the weights in an effort to decrease the error
  - since y_in is a function of multiple weights, we will have partial derivatives
- Adjusting weight w_I in the direction of −∂E/∂w_I will reduce the error (see the numeric check below)
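A quick numeric check of this claim; the weights, sample, and target here are hypothetical.

```python
import numpy as np

# One delta-rule step moves the weights opposite the gradient of
# E = (t - y_in)^2, so E for this sample should shrink.
w = np.array([0.5, 0.2, 0.4])
x = np.array([1.0, 1.0, 1.0])
t, alpha = 1.0, 0.5

error_before = (t - w @ x) ** 2
w_new = w + alpha * (t - w @ x) * x    # delta-rule step, proportional to -dE/dw
error_after = (t - w_new @ x) ** 2

print(error_before, error_after)       # 0.01 -> 0.0025: error_after < error_before
```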
17. Delta Rule Approach

E = (t − y_in)²

- E and y_in are defined as before
  - note that y_in is computed for one training sample
- Define training as modifying the weights so that the error is reduced
  - typically, this is done iteratively
  - e.g., the weights are modified to reduce the error for the current training sample, then modified to reduce the error for another training sample, etc.
- Approach
  - take the derivative of E, the error, with respect to the weights
  - this tells us how to change the weights so as to minimize E
  - results in changes in the weights that reduce E, the error
18. Partial Derivative of E, the Error

Since E = (t − y_in)²,

∂E/∂w_I = −2 (t − y_in) ∂y_in/∂w_I        (i.e., chain rule; t is a constant in this context)

Since y_in = Σ_i x_i w_i, we have ∂y_in/∂w_I = x_I, so

∂E/∂w_I = −2 (t − y_in) x_I
19. Completing the Derivation of the Delta Rule

Because we are looking for the direction that decreases E, we negate the gradient:

−∂E/∂w_I = 2 (t − y_in) x_I

Incorporating the constant of 2 into the learning rate, α, gives

Δw_I = α (t − y_in) x_I

Changing the weight by this amount will reduce the error, E, for this data sample.
20. Delta Rule With Activation Functions

Δw_I = α (t − y) f′(y_in) x_I

- Δw_I: change in the I-th weight of the weight vector
- α: learning rate (scalar, constant)
- t: target, or correct, output
- y_in: net (summed, weighted) input to the output unit
- x_I: I-th input value
- f: differentiable activation function (e.g., a sigmoid)
- y: output of the network, y = f(y_in)
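A direct Python rendering of this rule; f and its derivative are passed in by the caller, and the names are illustrative.

```python
import numpy as np

# Delta rule with a differentiable activation function f:
# delta_w_I = alpha * (t - y) * f'(y_in) * x_I
def delta_rule_update_f(w, x, t, f, f_prime, alpha=0.5):
    y_in = w @ x               # net input to the output unit
    y = f(y_in)                # network output y = f(y_in)
    return w + alpha * (t - y) * f_prime(y_in) * x
```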
21. Derivation of the Delta Rule for Use With Activation Functions
- f: differentiable activation function
  - e.g., sigmoidal
- Output of the network: y = f(y_in)

Now we need ∂E/∂w_I for E = (t − y)².
22. Derivation of ∂E/∂w_I

∂E/∂w_I = −2 (t − y) ∂y/∂w_I

In the derivation, we can apply the chain rule to this: d/dx f(g(x)) = f′(g(x)) g′(x), giving

∂y/∂w_I = f′(y_in) ∂y_in/∂w_I = f′(y_in) x_I
23. Delta Rule With Activation Function

Because we are looking for the direction that decreases E, we negate:

−∂E/∂w_I = 2 (t − y) f′(y_in) x_I

Incorporating the constant into α gives

Δw_I = α (t − y) f′(y_in) x_I
24. How do we modify this to use the sigmoidal activation function?
25. Delta Rule With Sigmoidal Activation Function
- Need to take the derivative of the sigmoidal function (see below)
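Assuming the binary sigmoid f(x) = 1 / (1 + e^(−x)), its derivative can be written in terms of the function itself:

f′(x) = e^(−x) / (1 + e^(−x))² = f(x) (1 − f(x))

so, with y = f(y_in), the update becomes Δw_I = α (t − y) y (1 − y) x_I.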
26. Extension to Multiple Output Units
- We have been dealing with only a single output unit
- Need to generalize to multiple output units

Δw_IJ = α (t_J − y_J) f′(y_in_J) x_I

- w_ij: weight from the i-th input unit to the j-th output unit
- t_j: expected output from the j-th output unit
- y_j: actual output from the j-th output unit
- y_in_j: summed, weighted input to the j-th output unit
- x_i: i-th input value

[Figure: n inputs fully connected to m outputs]
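A vectorized sketch of the multi-output rule; the sigmoid activation and the (outputs × inputs) matrix layout are my choices, not fixed by the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# W has shape (m outputs, n inputs); every weight w_ij is updated from the
# error of its own output unit: delta_w_ij = alpha * (t_j - y_j) f'(y_in_j) x_i
def delta_rule_update_multi(W, x, t, alpha=0.5):
    y_in = W @ x                             # summed weighted input per output unit
    y = sigmoid(y_in)                        # actual outputs
    delta = (t - y) * y * (1.0 - y)          # (t_j - y_j) f'(y_in_j), using f' = y(1 - y)
    return W + alpha * np.outer(delta, x)    # outer product gives the full weight update
```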
27. What About Multiple Layers?
- This works when we have
  - known outputs (supervision)
  - a single-layer ANN
- How can we train the weights of multi-layer ANNs?
28. Backpropagation
- A method of training the weights in a multi-layer, feed-forward network
- The problem is to establish correct or expected values for layers other than the output layer
- Method
  - start at the output layer
  - work backwards from the output layer to successive layers to the left
  - modify the weights at each step
29. Problem

[Figure: general form of the connections between layers — inputs x, hidden units z reached through weights v, output units y reached through weights w]

- First, compute the weight updates for the rightmost (hidden-to-output) weight layer using the delta rule
30. Now, We Need to Consider the Input-Layer-to-Hidden-Layer Weights
- Previously, in the delta rule, we needed the partial derivative ∂E/∂w_IJ for a hidden-to-output weight
- We now need the partial derivative ∂E/∂v_IJ for an input-to-hidden weight
31. Error Over All Output Units

Previously, E = (t − y)², where t is the expected output and y is the actual output.

Now, we'll consider all p output units (the error for one training sample):

E = Σ_{j=1..p} (t_j − y_j)²

Our goal is to find ∂E/∂v_IJ.
32. Taking the partial derivative of this, we'll find it defined in terms of the v weights. Why? Because the y outputs are indirectly generated, in part, by the v weights. Since each v weight may have an indirect effect on potentially each of the y outputs, we need to consider each of the y outputs in the formulation.
33. Since E = Σ_j (t_j − y_j)², the derivative can be taken term by term. Collapsing back to the summation, we have

∂E/∂v_IJ = −2 Σ_j (t_j − y_j) ∂y_j/∂v_IJ
34. Recall, y_j = f(y_in_j). Now, deriving ∂y_j/∂v_IJ:

∂y_j/∂v_IJ = f′(y_in_j) ∂y_in_j/∂v_IJ        (chain rule)

Substituting gives

∂E/∂v_IJ = −2 Σ_j (t_j − y_j) f′(y_in_j) ∂y_in_j/∂v_IJ
35. For convenience, let

δ_j = (t_j − y_j) f′(y_in_j)        (we can calculate this directly)

Now, derive ∂y_in_j/∂v_IJ.
36. Deriving ∂y_in_j/∂v_IJ

Recall, y_in_j = Σ_k z_k w_kj, where z_J is the single hidden-layer unit that weight v_IJ is affecting, so

∂y_in_j/∂v_IJ = w_Jj ∂z_J/∂v_IJ

Now, we need ∂z_J/∂v_IJ.
37. Deriving ∂z_J/∂v_IJ

Recall, z_J = f(z_in_J), with z_in_J = Σ_i x_i v_iJ.

∂z_J/∂v_IJ = f′(z_in_J) ∂z_in_J/∂v_IJ        (chain rule)

Since ∂z_in_J/∂v_IJ = x_I        (we can calculate this directly)

Finally, we have an expression in terms of the v weights!
38. Now, we have all the pieces. Putting it together:

∂E/∂v_IJ = −2 Σ_j (t_j − y_j) f′(y_in_j) w_Jj f′(z_in_J) x_I = −2 f′(z_in_J) x_I Σ_j δ_j w_Jj
39. Finally, the weight change for connections to hidden-layer units (negating the gradient and folding the constant into α):

Δv_IJ = α x_I f′(z_in_J) Σ_j δ_j w_Jj
40. Application of Backpropagation
- The equations so far have been general, for two-weight-layer networks
- Specialize the weight update equations for the following network architecture

[Figure: specific two-layer architecture — input units x, hidden units z, output units y]
41. z-y and x-z Weight Updates

z-y weights:  Δw_JK = α (t_K − y_K) f′(y_in_K) z_J

x-z weights:  Δv_IJ = α x_I f′(z_in_J) Σ_k (t_k − y_k) f′(y_in_k) w_Jk

These update rules are applied once per data sample per epoch (a code sketch follows below).
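A minimal sketch of one per-sample backpropagation update for the two-layer network above, assuming sigmoid activations on both layers; the variable names follow the x/z/y and v/w notation, but the code layout is mine.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# V maps inputs x to hidden units z; W maps hidden units z to outputs y.
# V has shape (n_inputs, n_hidden), W has shape (n_hidden, n_outputs).
def backprop_step(V, W, x, t, alpha=0.5):
    # Forward pass
    z_in = x @ V                 # z_in_J = sum_i x_i v_iJ
    z = sigmoid(z_in)
    y_in = z @ W                 # y_in_K = sum_J z_J w_JK
    y = sigmoid(y_in)

    # Output-layer error terms: delta_K = (t_K - y_K) f'(y_in_K), with f' = y(1 - y)
    delta_out = (t - y) * y * (1.0 - y)

    # z-y weight updates: delta_w_JK = alpha * delta_K * z_J
    W_new = W + alpha * np.outer(z, delta_out)

    # Hidden-layer error terms: f'(z_in_J) * sum_K delta_K w_JK
    delta_hidden = (W @ delta_out) * z * (1.0 - z)

    # x-z weight updates: delta_v_IJ = alpha * x_I * delta_hidden_J
    V_new = V + alpha * np.outer(x, delta_hidden)
    return V_new, W_new
```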