Transcript: Connectionist Machine Learning IIa
1
Connectionist Machine Learning IIa
  • Basics
  • Backpropagation Algorithm
  • Momentum
  • Summary

2
Basics
In contrast to perceptrons, multilayer networks can learn multiple decision boundaries. In addition, the boundaries may be nonlinear.
[Diagram: a multilayer network with a layer of input nodes, a layer of internal (hidden) nodes, and a layer of output nodes]
3
Example
[Plot: nonlinear decision regions in the (x1, x2) input space]
4
Example
5
One Single Unit
To make nonlinear partitions of the input space, we need to define each unit as a nonlinear function (unlike the perceptron). One solution is to use the sigmoid unit.
[Diagram: a sigmoid unit with inputs x1, ..., xn and weights w1, ..., wn, plus a bias input x0 = 1 with weight w0; the unit forms net = Σi wi xi and squashes it]
O = σ(net) = 1 / (1 + e^(-net))
6
One Single Unit
The sigmoid or squashing function.
[Plot: the S-shaped curve of σ(net) against net]
O = σ(net) = 1 / (1 + e^(-net))
7
More Precisely
O(x1, x2, ..., xn) = σ(W · X)
where σ(W · X) = 1 / (1 + e^(-W·X))
The function σ is called the sigmoid or logistic function. It has the following property:
dσ(y)/dy = σ(y) (1 - σ(y))
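As a concrete check of these formulas, here is a minimal Python/NumPy sketch of the sigmoid and its derivative property; the function names and sample values are only illustrative, not part of the slides.

```python
import numpy as np

def sigmoid(net):
    # Logistic (squashing) function: 1 / (1 + e^(-net))
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_derivative_from_output(y):
    # Derivative expressed through the output: d sigma(net)/d net = y * (1 - y)
    return y * (1.0 - y)

# A single sigmoid unit: O = sigmoid(W . X), with x0 = 1 acting as the bias input.
X = np.array([1.0, 0.5, -1.2])   # x0 (bias), x1, x2
W = np.array([0.1, 0.4, -0.3])   # w0, w1, w2
O = sigmoid(np.dot(W, X))
print(O, sigmoid_derivative_from_output(O))
```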
8
Connectionist Machine Learning IIa
  • Basics
  • Backpropagation Algorithm
  • Momentum
  • Summary

9
Many weights need adjustment
In a multilayer network, many weights need to be adjusted.
[Diagram: a multilayer network with input, internal, and output nodes; every link between layers carries a weight]
10
Backpropagation Algorithm
Goal: learn the weights for all links in an interconnected multilayer network. We begin by defining our measure of error:
E(W) = ½ Σd Σk (tkd - okd)²
where k ranges over the output nodes and d over the training examples. The idea is to again use gradient descent over the space of weights to find a global minimum.
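As a concrete illustration, here is a minimal Python/NumPy sketch of this error measure; the function name and the toy targets and outputs are made up for the example.

```python
import numpy as np

def network_error(targets, outputs):
    # E(W) = 1/2 * sum over examples d and output nodes k of (t_kd - o_kd)^2
    targets = np.asarray(targets, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    return 0.5 * np.sum((targets - outputs) ** 2)

# Two training examples, two output nodes each.
t = [[1.0, 0.0], [0.0, 1.0]]
o = [[0.8, 0.1], [0.3, 0.7]]
print(network_error(t, o))  # 0.5 * (0.04 + 0.01 + 0.09 + 0.09) = 0.115
```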
11
Output Nodes
[Diagram: the output nodes of the multilayer network]
12
Algorithm
The idea is to again use gradient descent over the space of weights to find a global minimum (there is no guarantee of reaching it). A sketch of this loop is given after the list below.
  • Create a network with n_in input nodes, n_hidden internal nodes, and n_out output nodes.
  • Initialize all weights to small random numbers.
  • Until the error is small, do:
  • For each example X, do:
  • Propagate example X forward through the network.
  • Propagate errors backward through the network.
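As promised above, here is a compact sketch of the whole loop in Python/NumPy for a network with one layer of hidden sigmoid units. The layer sizes, learning rate, iteration count, and toy data are all assumptions made for illustration, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# Toy data: 4 examples, 2 inputs, 1 output target each (chosen only for illustration).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

n_in, n_hidden, n_out = 2, 3, 1
eta = 0.5                                   # learning rate (illustrative)

# Initialize all weights to small random numbers; biases kept as separate vectors here.
W_h = rng.uniform(-0.05, 0.05, (n_in, n_hidden))
b_h = np.zeros(n_hidden)
W_o = rng.uniform(-0.05, 0.05, (n_hidden, n_out))
b_o = np.zeros(n_out)

for epoch in range(10000):                  # "until error is small" simplified to a fixed count
    for x, t in zip(X, T):
        # Propagate the example forward through the network.
        h = sigmoid(x @ W_h + b_h)
        o = sigmoid(h @ W_o + b_o)
        # Propagate errors backward through the network.
        delta_o = o * (1 - o) * (t - o)               # output-node error terms
        delta_h = h * (1 - h) * (W_o @ delta_o)       # hidden-node error terms
        # Update each weight: W <- W + eta * delta * input.
        W_o += eta * np.outer(h, delta_o)
        b_o += eta * delta_o
        W_h += eta * np.outer(x, delta_h)
        b_h += eta * delta_h

# Network outputs for all examples after training.
print(sigmoid(sigmoid(X @ W_h + b_h) @ W_o + b_o).round(2))
```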

13
Propagating Forward
Given example X, compute the output of every node until we reach the output nodes.
[Diagram: example X enters at the input nodes; each internal and output node computes its sigmoid function as activity propagates forward]
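In matrix form, the forward step for one example can be sketched as follows; the weight values and the names W_hidden and W_out are illustrative, and biases are folded in by fixing x0 = 1.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# Illustrative example X and weights.
x = np.array([1.0, 0.5, -1.0])                 # x0 = 1 (bias), x1, x2
W_hidden = np.array([[0.1, -0.2, 0.05],        # one row of weights per internal node
                     [0.3,  0.1, -0.4]])
hidden_out = sigmoid(W_hidden @ x)             # outputs of the internal nodes

W_out = np.array([[0.2, -0.1]])                # one row of weights per output node
output = sigmoid(W_out @ hidden_out)           # outputs of the output nodes
print(hidden_out, output)
```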
14
Error: Output Nodes
[Diagram: at each output node, the error is the difference between the target function value and the network's estimation]
15
Propagating Error Backward
  • For each output node k, compute the error term:
    δk = Ok (1 - Ok)(tk - Ok)
  • Update each network weight:
    Wji ← Wji + ΔWji
    where ΔWji = η δj Xji
    (Xji is the input from node i to node j, and Wji is the corresponding weight; a numeric sketch follows below)
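A tiny numeric illustration of these two formulas, using made-up values for the target, output, input, learning rate, and weight.

```python
# Error term for one output node k and the resulting weight update.
t_k, o_k = 1.0, 0.8          # target and actual output (illustrative values)
delta_k = o_k * (1 - o_k) * (t_k - o_k)     # delta_k = O_k (1 - O_k)(t_k - O_k)

eta = 0.1                    # learning rate (illustrative)
x_ki = 0.5                   # input from node i into output node k
w_ki = 0.3                   # current weight on that link
w_ki = w_ki + eta * delta_k * x_ki          # W_ji <- W_ji + eta * delta_j * X_ji
print(delta_k, w_ki)         # 0.032, 0.3016
```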

16
Error: Intermediate Nodes
[Diagram: the error term for an intermediate node is estimated from the error terms of the output nodes it feeds into]
17
Propagating Error Backward
  • For each hidden unit h, calculate the error term:
    δh = Oh (1 - Oh) Σk Wkh δk
  • Update each network weight:
    Wji ← Wji + ΔWji
    where ΔWji = η δj Xji
    (Xji is the input from node i to node j, and Wji is the corresponding weight; a small sketch of the hidden-unit error term follows below)
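A small numeric sketch of the hidden-unit error term, assuming a single hidden unit h that feeds two output nodes; all values are made up.

```python
import numpy as np

o_h = 0.6                               # output of hidden unit h
w_kh = np.array([0.4, -0.2])            # weights from h to the output nodes k
delta_k = np.array([0.032, -0.050])     # error terms already computed at the output nodes

# delta_h = O_h (1 - O_h) * sum_k W_kh delta_k
delta_h = o_h * (1 - o_h) * np.dot(w_kh, delta_k)
print(delta_h)   # 0.6 * 0.4 * (0.4*0.032 + (-0.2)*(-0.05)) = 0.005472
```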

18
Connectionist Machine Learning IIa
  • Basics
  • Backpropagation Algorithm
  • Momentum
  • Summary

19
Adding Momentum
  • The weight update rule can be modified so that it depends on the update from the previous iteration. At iteration n we have (see the sketch below):
    ΔWji(n) = η δj Xji + α ΔWji(n-1)
  • where α (0 < α < 1) is a constant called the momentum.
  • It can carry the search through small local minima.
  • It increases the speed along flat regions.
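A minimal sketch of the momentum rule, keeping the previous update ΔWji(n-1) around between iterations; the variable names and values are illustrative.

```python
eta, alpha = 0.1, 0.9        # learning rate and momentum constant (0 < alpha < 1)

delta_j, x_ji = 0.032, 0.5   # error term and input on the link (illustrative)
prev_dw = 0.004              # Delta W_ji(n-1), the update from the previous iteration

dw = eta * delta_j * x_ji + alpha * prev_dw    # Delta W_ji(n)
w_ji = 0.3 + dw                                # W_ji <- W_ji + Delta W_ji(n)
prev_dw = dw                                   # remember for iteration n+1
print(dw, w_ji)  # 0.0052, 0.3052
```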

20
Adding Momentum
[Plot: E(W) versus W, showing a flat region of the error surface, where do we go?]
21
Remarks on Backpropagation
  • It implements a gradient descent search over the weight space.
  • It may become trapped in local minima.
  • In practice, it is very effective.
  • How to avoid local minima?
    • Add momentum.
    • Use stochastic gradient descent.
    • Use different networks with different initial values for the weights.

22
Representational Power
  • Boolean functions. Every boolean function can be represented with a network having two layers of units.
  • Continuous functions. All bounded continuous functions can also be approximated with a network having two layers of units.
  • Arbitrary functions. Any arbitrary function can be approximated with a network having three layers of units.

23
Connectionist Machine Learning IIa
  • Basics
  • Backpropagation Algorithm
  • Momentum
  • Summary

24
Summary
  • In multilayer neural networks, the output of each node is a sigmoid or squashing function.
  • In propagating error backwards, intermediate nodes compute a weighted sum of the error terms of the output nodes.
  • Momentum helps carry the search through small local minima and increases the speed along flat regions.
  • Any arbitrary function can be approximated with a network with three layers of units.