One-layer neural networks: Approximation problems
1
One-layer neural networks: Approximation problems
  • Approximation problems
  • Architecture and functioning (ADALINE, MADALINE)
  • Learning based on error minimization
  • The gradient algorithm
  • Widrow-Hoff and delta algorithms

2
Approximation problems
  • Approximation (regression)
  • Problem: estimate a functional dependence between two variables
  • The training set contains pairs of corresponding values (a small
    illustrative training set is sketched below)

(Figure: examples of a linear approximation and a nonlinear approximation of
a data set)
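As a concrete illustration (not from the original slides), a training set for
such an approximation problem can be generated in Python; the linear
dependence d = 2x - 1 and the noise level are arbitrary choices:

import numpy as np

# Illustrative training set {(x_l, d_l), l = 1..L} for a noisy linear dependence
rng = np.random.default_rng(0)
L = 50                                              # number of training examples
x = rng.uniform(-1.0, 1.0, size=L)                  # inputs x_l
d = 2.0 * x - 1.0 + rng.normal(0.0, 0.1, size=L)    # desired outputs d_l (with noise)
training_set = list(zip(x, d))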
3
Architecture
  • One-layer NN: one layer of input units and one layer of functional units

(Architecture diagram: input vector X feeding the N input units, total
connectivity through the weight matrix W to the M functional (output) units,
which produce the output vector Y; a fictive unit with constant input -1
supplies the bias weights)
4
Functioning
  • Computing the output signal: y_i = f(sum_j w_ij * x_j - w_i0), i = 1..M,
    where the fictive unit with input -1 contributes the bias term w_i0 (see
    the sketch below)
  • Usually the activation function is linear
  • Examples
  • ADALINE (ADAptive LINear Element)
  • MADALINE (Multiple ADAptive LINear Element)
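A minimal Python sketch of this computation, assuming the weights are stored
as an M x (N+1) matrix whose last column holds the bias weights of the
fictive -1 input (the function name and sample values are illustrative, not
from the slides):

import numpy as np

def forward(W, x, f=lambda a: a):
    # One-layer network output: y_i = f(sum_j w_ij * x_j - w_i0)
    # W : (M, N+1) weights, last column = bias weights (fictive -1 input)
    # x : (N,) input vector; f : activation (identity for ADALINE/MADALINE)
    x_ext = np.append(x, -1.0)      # append the fictive input -1
    return f(W @ x_ext)             # linear aggregation, then activation

# Example: an ADALINE with N = 2 inputs and M = 1 output unit
W = np.array([[0.5, -0.3, 0.1]])    # illustrative weights
y = forward(W, np.array([1.0, 2.0]))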

5
Learning based on error minimization
  • Training set: {(X1,d1), ..., (XL,dL)}
  • Xl is a vector from R^N, dl is a vector from R^M
  • Error function: a measure of the distance between the output produced by
    the network and the desired output
  • Notations: a common choice is the quadratic error
    E(W) = 1/2 * sum_l sum_i (d_i(l) - y_i(l))^2 (see the sketch below)
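A small sketch of this error function, assuming the quadratic form above and
the same weight-matrix convention as the forward() sketch (illustrative, not
from the slides):

import numpy as np

def error(W, X, D, f=lambda a: a):
    # Quadratic error E(W) = 1/2 * sum_l sum_i (d_i(l) - y_i(l))^2
    # X : (L, N) training inputs, D : (L, M) desired outputs
    X_ext = np.hstack([X, -np.ones((X.shape[0], 1))])   # add fictive -1 inputs
    Y = f(X_ext @ W.T)                                   # network outputs, shape (L, M)
    return 0.5 * np.sum((D - Y) ** 2)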

6
Learning based on error minimization
  • Learning = optimization task: find W which minimizes E(W)
  • Variants:
  • In the case of linear activation functions, W can be computed by using
    tools from linear algebra
  • In the case of nonlinear activation functions, the minimum can be
    estimated by using a numerical method

7
Learning based on error minimization
  • First variant. Particular case:
  • M = 1 (one output unit with linear activation function)
  • L = 1 (one example)

8
Learning based on error minimization
  • First variant (general case): W is the least-squares solution of the
    linear system defined by the training set (see the sketch below)
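A minimal numerical sketch of this linear-algebra variant, assuming the
quadratic error E(W) and using np.linalg.lstsq for the least-squares solution
(names are illustrative, not from the slides):

import numpy as np

def solve_linear_network(X, D):
    # Least-squares weights of a linear one-layer network:
    # minimizes E(W) = 1/2 * ||X_ext @ W.T - D||^2
    # X : (L, N) inputs, D : (L, M) desired outputs
    # Returns W of shape (M, N+1), last column = bias weights
    X_ext = np.hstack([X, -np.ones((X.shape[0], 1))])   # fictive -1 inputs
    Wt, *_ = np.linalg.lstsq(X_ext, D, rcond=None)      # (N+1, M) solution
    return Wt.T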

9
Learning based on error minimization
  • Second variant: use of a numerical minimization method
  • Gradient method
  • It is an iterative method based on the idea that the gradient of a
    function indicates the direction in which the function increases
  • In order to estimate the minimum of a function, the current position is
    moved in the direction opposite to the gradient

10
Learning based on error minimization
  • Gradient method

(Figure: one-dimensional illustration of the gradient method. Starting from
x0, the successive points x1, ..., xk-1 move in the direction opposite to the
gradient: to the right where the derivative is negative, to the left where it
is positive.)
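A tiny sketch of the idea in the figure, using an arbitrary one-dimensional
function (f(x) = (x - 3)^2 and the step size are illustrative choices, not
from the slides):

def grad_descent_1d(df, x0, eta=0.1, steps=50):
    # Repeatedly move in the direction opposite to the derivative df
    x = x0
    for _ in range(steps):
        x = x - eta * df(x)
    return x

# f(x) = (x - 3)^2 has derivative f'(x) = 2*(x - 3) and its minimum at x = 3
x_min = grad_descent_1d(lambda x: 2 * (x - 3), x0=0.0)   # approaches 3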
11
Learning based on error minimization
  • Algorithm to minimize E(W) based on the gradient method (a runnable
    sketch follows the pseudocode)
  • Initialization:
  • W(0) = initial values
  • k = 0 (iteration counter)
  • Iterative process:
  • REPEAT
  • W(k+1) = W(k) - eta * grad(E(W(k)))
  • k = k + 1
  • UNTIL a stopping condition is satisfied
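A minimal runnable sketch of this loop for a linear one-layer network,
assuming the quadratic error and the weight convention of the earlier
sketches (all names are illustrative):

import numpy as np

def gradient_training(X, D, eta=0.01, kmax=1000, eps=1e-6):
    # Batch gradient descent: W(k+1) = W(k) - eta * grad E(W(k))
    L, N = X.shape
    M = D.shape[1]
    X_ext = np.hstack([X, -np.ones((L, 1))])        # fictive -1 inputs
    W = np.random.uniform(-1, 1, size=(M, N + 1))   # W(0): random initial values
    for k in range(kmax):
        Y = X_ext @ W.T                             # linear outputs y_i(l)
        grad = -(D - Y).T @ X_ext                   # dE/dW for the quadratic error
        W = W - eta * grad
        if 0.5 * np.sum((D - Y) ** 2) < eps:        # stopping condition
            break
    return W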

12
Learning based on error minimization
  • Remark: the gradient method is a local optimization method, so it can
    easily be trapped in local minima

13
Widrow-Hoff algorithm
  • learning algorithm for a linear network
  • it minimizes E(W) by applying a gradient-like adjustment for each example
    from the training set
  • Gradient computation: for a linear unit the per-example gradient is
    dE_l/dw_ij = -(d_i(l) - y_i(l)) * x_j(l)

14
Widrow-Hoff algorithm
  • Algorithm structure (a runnable sketch follows):
  • Initialization:
  • wij(0) = rand(-1,1) (the weights are randomly initialized in [-1,1])
  • k = 0 (iteration counter)
  • Iterative process:
  • REPEAT
  • FOR l = 1, L DO
  • Compute yi(l) and deltai(l) = di(l) - yi(l), i = 1..M
  • Adjust the weights: wij = wij + eta * deltai(l) * xj(l)
  • Compute E(W) for the new values of the weights
  • k = k + 1
  • UNTIL E(W) < E OR k > kmax
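A minimal runnable sketch of this structure, assuming the conventions of the
earlier sketches (quadratic error, weight matrix with the bias weights in the
last column; names are illustrative):

import numpy as np

def widrow_hoff(X, D, eta=0.05, E_target=1e-3, kmax=1000):
    # Per-example (online) Widrow-Hoff training of a linear one-layer network
    # X : (L, N) inputs, D : (L, M) desired outputs
    L, N = X.shape
    M = D.shape[1]
    X_ext = np.hstack([X, -np.ones((L, 1))])            # fictive -1 inputs
    W = np.random.uniform(-1, 1, size=(M, N + 1))       # wij(0) = rand(-1, 1)
    for k in range(kmax):
        for l in range(L):                              # FOR l = 1, L DO
            y = W @ X_ext[l]                            # yi(l)
            delta = D[l] - y                            # deltai(l) = di(l) - yi(l)
            W += eta * np.outer(delta, X_ext[l])        # wij = wij + eta*deltai(l)*xj(l)
        E = 0.5 * np.sum((D - X_ext @ W.T) ** 2)        # E(W) for the new weights
        if E < E_target:                                # UNTIL E(W) < E (or k > kmax)
            break
    return W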

15
Widrow-Hoff algorithm
  • Remarks:
  • If the error function has only one optimum, the algorithm converges (but
    not in a finite number of steps) to the optimal values of W
  • The convergence speed is influenced by the value of the learning rate
    (eta)
  • The value E is a measure of the accuracy we expect to obtain
  • It is one of the simplest learning algorithms, but it can be applied only
    to one-layer networks with linear activation functions

16
Delta algorithm
  • algorithm similar to Widrow-Hoff, but for networks with nonlinear
    activation functions
  • the only difference is in the gradient computation
  • Gradient computation: for an activation function f, the per-example
    gradient is dE_l/dw_ij = -(d_i(l) - y_i(l)) * f'(net_i(l)) * x_j(l),
    where net_i(l) = sum_j w_ij * x_j(l) (a sketch follows)

17
Delta algorithm
  • Particularities:
  • 1. The error function can have many minima, so the algorithm can be
    trapped in one of them (meaning that the learning is not complete)
  • 2. For sigmoidal functions the derivatives can be computed efficiently by
    using the following relations: for the logistic function,
    f'(x) = f(x)(1 - f(x)); for the hyperbolic tangent, f'(x) = 1 - f(x)^2

18
Limits of one-layer networks
  • One-layer networks have limited capabilities, being able only to:
  • Solve simple (e.g. linearly separable) classification problems
  • Approximate simple (e.g. linear) dependences
  • Solution: include hidden layers
  • Remark: the hidden units should have nonlinear activation functions