One-layer neural networks: Approximation problems
1
One-layer neural networks: Approximation problems
  • Approximation problems
  • Architecture and functioning (ADALINE, MADALINE)
  • Learning based on error minimization
  • The gradient algorithm
  • Widrow-Hoff and delta algorithms

2
Approximation problems
  • Approximation (regression)
  • Problem: estimate a functional dependence between two variables
  • The training set contains pairs of corresponding values (a small
    illustrative training set is sketched below)

(Figure: examples of a linear approximation and a nonlinear approximation of
a data set)
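As a concrete illustration (not from the original slides), a training set for
such an approximation problem can be generated in Python; the linear
dependence d = 2x - 1 and the noise level are arbitrary choices:

import numpy as np

# Illustrative training set {(x_l, d_l), l = 1..L} for a noisy linear dependence
rng = np.random.default_rng(0)
L = 50                                              # number of training examples
x = rng.uniform(-1.0, 1.0, size=L)                  # inputs x_l
d = 2.0 * x - 1.0 + rng.normal(0.0, 0.1, size=L)    # desired outputs d_l (with noise)
training_set = list(zip(x, d))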
3
Architecture
  • One-layer NN: one layer of input units and one layer of functional units

(Architecture diagram: input vector X feeding the N input units, total
connectivity through the weight matrix W to the M functional (output) units,
which produce the output vector Y; a fictive unit with constant input -1
supplies the bias weights)
4
Functioning
  • Computing the output signal: y_i = f(sum_j w_ij * x_j - w_i0), i = 1..M,
    where the fictive unit with input -1 contributes the bias term w_i0 (see
    the sketch below)
  • Usually the activation function is linear
  • Examples
  • ADALINE (ADAptive LINear Element)
  • MADALINE (Multiple ADAptive LINear Element)
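A minimal Python sketch of this computation, assuming the weights are stored
as an M x (N+1) matrix whose last column holds the bias weights of the
fictive -1 input (the function name and sample values are illustrative, not
from the slides):

import numpy as np

def forward(W, x, f=lambda a: a):
    # One-layer network output: y_i = f(sum_j w_ij * x_j - w_i0)
    # W : (M, N+1) weights, last column = bias weights (fictive -1 input)
    # x : (N,) input vector; f : activation (identity for ADALINE/MADALINE)
    x_ext = np.append(x, -1.0)      # append the fictive input -1
    return f(W @ x_ext)             # linear aggregation, then activation

# Example: an ADALINE with N = 2 inputs and M = 1 output unit
W = np.array([[0.5, -0.3, 0.1]])    # illustrative weights
y = forward(W, np.array([1.0, 2.0]))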

5
Learning based on error minimization
  • Training set: {(X1,d1), ..., (XL,dL)}
  • Xl is a vector from R^N, dl is a vector from R^M
  • Error function: a measure of the distance between the output produced by
    the network and the desired output
  • Notations: a common choice is the quadratic error
    E(W) = 1/2 * sum_l sum_i (d_i(l) - y_i(l))^2 (see the sketch below)
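A small sketch of this error function, assuming the quadratic form above and
the same weight-matrix convention as the forward() sketch (illustrative, not
from the slides):

import numpy as np

def error(W, X, D, f=lambda a: a):
    # Quadratic error E(W) = 1/2 * sum_l sum_i (d_i(l) - y_i(l))^2
    # X : (L, N) training inputs, D : (L, M) desired outputs
    X_ext = np.hstack([X, -np.ones((X.shape[0], 1))])   # add fictive -1 inputs
    Y = f(X_ext @ W.T)                                   # network outputs, shape (L, M)
    return 0.5 * np.sum((D - Y) ** 2)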

6
Learning based on error minimization
  • Learning = optimization task: find W which minimizes E(W)
  • Variants:
  • In the case of linear activation functions, W can be computed by using
    tools from linear algebra
  • In the case of nonlinear activation functions, the minimum can be
    estimated by using a numerical method

7
Learning based on error minimization
  • First variant. Particular case:
  • M = 1 (one output unit with linear activation function)
  • L = 1 (one example)

8
Learning based on error minimization
  • First variant (general case): W is the least-squares solution of the
    linear system defined by the training set (see the sketch below)
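A minimal numerical sketch of this linear-algebra variant, assuming the
quadratic error E(W) and using np.linalg.lstsq for the least-squares solution
(names are illustrative, not from the slides):

import numpy as np

def solve_linear_network(X, D):
    # Least-squares weights of a linear one-layer network:
    # minimizes E(W) = 1/2 * ||X_ext @ W.T - D||^2
    # X : (L, N) inputs, D : (L, M) desired outputs
    # Returns W of shape (M, N+1), last column = bias weights
    X_ext = np.hstack([X, -np.ones((X.shape[0], 1))])   # fictive -1 inputs
    Wt, *_ = np.linalg.lstsq(X_ext, D, rcond=None)      # (N+1, M) solution
    return Wt.T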

9
Learning based on error minimization
  • Second variant: use of a numerical minimization method
  • Gradient method
  • It is an iterative method based on the idea that the gradient of a
    function indicates the direction in which the function increases
  • In order to estimate the minimum of a function, the current position is
    moved in the direction opposite to the gradient

10
Learning based on error minimization
  • Gradient method

(Figure: one-dimensional illustration of the gradient method. Starting from
x0, the successive points x1, ..., xk-1 move in the direction opposite to the
gradient: to the right where the derivative is negative, to the left where it
is positive.)
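A tiny sketch of the idea in the figure, using an arbitrary one-dimensional
function (f(x) = (x - 3)^2 and the step size are illustrative choices, not
from the slides):

def grad_descent_1d(df, x0, eta=0.1, steps=50):
    # Repeatedly move in the direction opposite to the derivative df
    x = x0
    for _ in range(steps):
        x = x - eta * df(x)
    return x

# f(x) = (x - 3)^2 has derivative f'(x) = 2*(x - 3) and its minimum at x = 3
x_min = grad_descent_1d(lambda x: 2 * (x - 3), x0=0.0)   # approaches 3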
11
Learning based on error minimization
  • Algorithm to minimize E(W) based on the gradient method (a runnable
    sketch follows the pseudocode)
  • Initialization:
  • W(0) = initial values
  • k = 0 (iteration counter)
  • Iterative process:
  • REPEAT
  • W(k+1) = W(k) - eta * grad(E(W(k)))
  • k = k + 1
  • UNTIL a stopping condition is satisfied
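A minimal runnable sketch of this loop for a linear one-layer network,
assuming the quadratic error and the weight convention of the earlier
sketches (all names are illustrative):

import numpy as np

def gradient_training(X, D, eta=0.01, kmax=1000, eps=1e-6):
    # Batch gradient descent: W(k+1) = W(k) - eta * grad E(W(k))
    L, N = X.shape
    M = D.shape[1]
    X_ext = np.hstack([X, -np.ones((L, 1))])        # fictive -1 inputs
    W = np.random.uniform(-1, 1, size=(M, N + 1))   # W(0): random initial values
    for k in range(kmax):
        Y = X_ext @ W.T                             # linear outputs y_i(l)
        grad = -(D - Y).T @ X_ext                   # dE/dW for the quadratic error
        W = W - eta * grad
        if 0.5 * np.sum((D - Y) ** 2) < eps:        # stopping condition
            break
    return W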

12
Learning based on error minimization
  • Remark: the gradient method is a local optimization method, so it can
    easily be trapped in local minima

13
Widrow-Hoff algorithm
  • learning algorithm for a linear network
  • it minimizes E(W) by applying a gradient-like adjustment for each example
    from the training set
  • Gradient computation: for a linear unit the per-example gradient is
    dE_l/dw_ij = -(d_i(l) - y_i(l)) * x_j(l)

14
Widrow-Hoff algorithm
  • Algorithm structure (a runnable sketch follows):
  • Initialization:
  • wij(0) = rand(-1,1) (the weights are randomly initialized in [-1,1])
  • k = 0 (iteration counter)
  • Iterative process:
  • REPEAT
  • FOR l = 1, L DO
  • Compute yi(l) and deltai(l) = di(l) - yi(l), i = 1..M
  • Adjust the weights: wij = wij + eta * deltai(l) * xj(l)
  • Compute E(W) for the new values of the weights
  • k = k + 1
  • UNTIL E(W) < E OR k > kmax
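A minimal runnable sketch of this structure, assuming the conventions of the
earlier sketches (quadratic error, weight matrix with the bias weights in the
last column; names are illustrative):

import numpy as np

def widrow_hoff(X, D, eta=0.05, E_target=1e-3, kmax=1000):
    # Per-example (online) Widrow-Hoff training of a linear one-layer network
    # X : (L, N) inputs, D : (L, M) desired outputs
    L, N = X.shape
    M = D.shape[1]
    X_ext = np.hstack([X, -np.ones((L, 1))])            # fictive -1 inputs
    W = np.random.uniform(-1, 1, size=(M, N + 1))       # wij(0) = rand(-1, 1)
    for k in range(kmax):
        for l in range(L):                              # FOR l = 1, L DO
            y = W @ X_ext[l]                            # yi(l)
            delta = D[l] - y                            # deltai(l) = di(l) - yi(l)
            W += eta * np.outer(delta, X_ext[l])        # wij = wij + eta*deltai(l)*xj(l)
        E = 0.5 * np.sum((D - X_ext @ W.T) ** 2)        # E(W) for the new weights
        if E < E_target:                                # UNTIL E(W) < E (or k > kmax)
            break
    return W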

15
Widrow-Hoff algorithm
  • Remarks:
  • If the error function has only one optimum, the algorithm converges (but
    not in a finite number of steps) to the optimal values of W
  • The convergence speed is influenced by the value of the learning rate
    (eta)
  • The value E is a measure of the accuracy we expect to obtain
  • It is one of the simplest learning algorithms, but it can be applied only
    to one-layer networks with linear activation functions

16
Delta algorithm
  • algorithm similar to Widrow-Hoff, but for networks with nonlinear
    activation functions
  • the only difference is in the gradient computation
  • Gradient computation: for an activation function f, the per-example
    gradient is dE_l/dw_ij = -(d_i(l) - y_i(l)) * f'(net_i(l)) * x_j(l),
    where net_i(l) = sum_j w_ij * x_j(l) (a sketch follows)

17
Delta algorithm
  • Particularities:
  • 1. The error function can have many minima, so the algorithm can be
    trapped in one of them (meaning that the learning is not complete)
  • 2. For sigmoidal functions the derivatives can be computed efficiently by
    using the following relations: for the logistic function,
    f'(x) = f(x)(1 - f(x)); for the hyperbolic tangent, f'(x) = 1 - f(x)^2

18
Limits of one-layer networks
  • One-layer networks have limited capabilities, being able only to:
  • Solve simple (e.g. linearly separable) classification problems
  • Approximate simple (e.g. linear) dependences
  • Solution: include hidden layers
  • Remark: the hidden units should have nonlinear activation functions