Transcript: Connectionist Machine Learning IIa
1
Connectionist Machine Learning IIa
  • Basics
  • Backpropagation Algorithm
  • Momentum
  • Summary

2
Basics
In contrast to perceptrons, multilayer networks can learn multiple decision boundaries. In addition, the boundaries may be nonlinear.
[Diagram: a multilayer network with a layer of input nodes, a layer of internal (hidden) nodes, and a layer of output nodes]
3
Example
[Plot: nonlinear decision regions in the (x1, x2) input space]
4
Example
5
One Single Unit
To make nonlinear partitions of the input space, we need to define each unit as a nonlinear function (unlike the perceptron). One solution is to use the sigmoid unit.
[Diagram: a sigmoid unit with inputs x1, ..., xn and weights w1, ..., wn, plus a bias input x0 = 1 with weight w0; the unit forms net = Σi wi xi and squashes it]
O = σ(net) = 1 / (1 + e^(-net))
6
One Single Unit
The sigmoid or squashing function.
[Plot: the S-shaped curve of σ(net) against net]
O = σ(net) = 1 / (1 + e^(-net))
7
More Precisely
O(x1, x2, ..., xn) = σ(W · X)
where σ(W · X) = 1 / (1 + e^(-W·X))
The function σ is called the sigmoid or logistic function. It has the following property:
dσ(y)/dy = σ(y) (1 - σ(y))
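As a concrete check of these formulas, here is a minimal Python/NumPy sketch of the sigmoid and its derivative property; the function names and sample values are only illustrative, not part of the slides.

```python
import numpy as np

def sigmoid(net):
    # Logistic (squashing) function: 1 / (1 + e^(-net))
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_derivative_from_output(y):
    # Derivative expressed through the output: d sigma(net)/d net = y * (1 - y)
    return y * (1.0 - y)

# A single sigmoid unit: O = sigmoid(W . X), with x0 = 1 acting as the bias input.
X = np.array([1.0, 0.5, -1.2])   # x0 (bias), x1, x2
W = np.array([0.1, 0.4, -0.3])   # w0, w1, w2
O = sigmoid(np.dot(W, X))
print(O, sigmoid_derivative_from_output(O))
```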
8
Connectionist Machine Learning IIa
  • Basics
  • Backpropagation Algorithm
  • Momentum
  • Summary

9
Many weights need adjustment
In a multilayer network, many weights need to be adjusted.
[Diagram: a multilayer network with input, internal, and output nodes; every link between layers carries a weight]
10
Backpropagation Algorithm
Goal: learn the weights for all links in an interconnected multilayer network. We begin by defining our measure of error:
E(W) = ½ Σd Σk (tkd - okd)²
where k ranges over the output nodes and d over the training examples. The idea is to again use gradient descent over the space of weights to find a global minimum.
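As a concrete illustration, here is a minimal Python/NumPy sketch of this error measure; the function name and the toy targets and outputs are made up for the example.

```python
import numpy as np

def network_error(targets, outputs):
    # E(W) = 1/2 * sum over examples d and output nodes k of (t_kd - o_kd)^2
    targets = np.asarray(targets, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    return 0.5 * np.sum((targets - outputs) ** 2)

# Two training examples, two output nodes each.
t = [[1.0, 0.0], [0.0, 1.0]]
o = [[0.8, 0.1], [0.3, 0.7]]
print(network_error(t, o))  # 0.5 * (0.04 + 0.01 + 0.09 + 0.09) = 0.115
```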
11
Output Nodes
[Diagram: the output nodes of the multilayer network]
12
Algorithm
The idea is to again use gradient descent over the space of weights to find a global minimum (there is no guarantee of reaching it). A sketch of this loop is given after the list below.
  • Create a network with n_in input nodes, n_hidden internal nodes, and n_out output nodes.
  • Initialize all weights to small random numbers.
  • Until the error is small, do:
  • For each example X, do:
  • Propagate example X forward through the network.
  • Propagate errors backward through the network.
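As promised above, here is a compact sketch of the whole loop in Python/NumPy for a network with one layer of hidden sigmoid units. The layer sizes, learning rate, iteration count, and toy data are all assumptions made for illustration, not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# Toy data: 4 examples, 2 inputs, 1 output target each (chosen only for illustration).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

n_in, n_hidden, n_out = 2, 3, 1
eta = 0.5                                   # learning rate (illustrative)

# Initialize all weights to small random numbers; biases kept as separate vectors here.
W_h = rng.uniform(-0.05, 0.05, (n_in, n_hidden))
b_h = np.zeros(n_hidden)
W_o = rng.uniform(-0.05, 0.05, (n_hidden, n_out))
b_o = np.zeros(n_out)

for epoch in range(10000):                  # "until error is small" simplified to a fixed count
    for x, t in zip(X, T):
        # Propagate the example forward through the network.
        h = sigmoid(x @ W_h + b_h)
        o = sigmoid(h @ W_o + b_o)
        # Propagate errors backward through the network.
        delta_o = o * (1 - o) * (t - o)               # output-node error terms
        delta_h = h * (1 - h) * (W_o @ delta_o)       # hidden-node error terms
        # Update each weight: W <- W + eta * delta * input.
        W_o += eta * np.outer(h, delta_o)
        b_o += eta * delta_o
        W_h += eta * np.outer(x, delta_h)
        b_h += eta * delta_h

# Network outputs for all examples after training.
print(sigmoid(sigmoid(X @ W_h + b_h) @ W_o + b_o).round(2))
```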

13
Propagating Forward
Given example X, compute the output of every node until we reach the output nodes.
[Diagram: example X enters at the input nodes; each internal and output node computes its sigmoid function as activity propagates forward]
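In matrix form, the forward step for one example can be sketched as follows; the weight values and the names W_hidden and W_out are illustrative, and biases are folded in by fixing x0 = 1.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# Illustrative example X and weights.
x = np.array([1.0, 0.5, -1.0])                 # x0 = 1 (bias), x1, x2
W_hidden = np.array([[0.1, -0.2, 0.05],        # one row of weights per internal node
                     [0.3,  0.1, -0.4]])
hidden_out = sigmoid(W_hidden @ x)             # outputs of the internal nodes

W_out = np.array([[0.2, -0.1]])                # one row of weights per output node
output = sigmoid(W_out @ hidden_out)           # outputs of the output nodes
print(hidden_out, output)
```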
14
Error: Output Nodes
[Diagram: at each output node, the error is the difference between the target function value and the network's estimation]
15
Propagating Error Backward
  • For each output node k, compute the error term:
    δk = Ok (1 - Ok)(tk - Ok)
  • Update each network weight:
    Wji ← Wji + ΔWji
    where ΔWji = η δj Xji
    (Xji is the input from node i to node j, and Wji is the corresponding weight; a numeric sketch follows below)
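A tiny numeric illustration of these two formulas, using made-up values for the target, output, input, learning rate, and weight.

```python
# Error term for one output node k and the resulting weight update.
t_k, o_k = 1.0, 0.8          # target and actual output (illustrative values)
delta_k = o_k * (1 - o_k) * (t_k - o_k)     # delta_k = O_k (1 - O_k)(t_k - O_k)

eta = 0.1                    # learning rate (illustrative)
x_ki = 0.5                   # input from node i into output node k
w_ki = 0.3                   # current weight on that link
w_ki = w_ki + eta * delta_k * x_ki          # W_ji <- W_ji + eta * delta_j * X_ji
print(delta_k, w_ki)         # 0.032, 0.3016
```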

16
Error: Intermediate Nodes
[Diagram: the error term for an intermediate node is estimated from the error terms of the output nodes it feeds into]
17
Propagating Error Backward
  • For each hidden unit h, calculate the error term:
    δh = Oh (1 - Oh) Σk Wkh δk
  • Update each network weight:
    Wji ← Wji + ΔWji
    where ΔWji = η δj Xji
    (Xji is the input from node i to node j, and Wji is the corresponding weight; a small sketch of the hidden-unit error term follows below)
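A small numeric sketch of the hidden-unit error term, assuming a single hidden unit h that feeds two output nodes; all values are made up.

```python
import numpy as np

o_h = 0.6                               # output of hidden unit h
w_kh = np.array([0.4, -0.2])            # weights from h to the output nodes k
delta_k = np.array([0.032, -0.050])     # error terms already computed at the output nodes

# delta_h = O_h (1 - O_h) * sum_k W_kh delta_k
delta_h = o_h * (1 - o_h) * np.dot(w_kh, delta_k)
print(delta_h)   # 0.6 * 0.4 * (0.4*0.032 + (-0.2)*(-0.05)) = 0.005472
```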

18
Connectionist Machine Learning IIa
  • Basics
  • Backpropagation Algorithm
  • Momentum
  • Summary

19
Adding Momentum
  • The weight update rule can be modified so that it depends on the update from the previous iteration. At iteration n we have (see the sketch below):
    ΔWji(n) = η δj Xji + α ΔWji(n-1)
  • where α (0 < α < 1) is a constant called the momentum.
  • It can carry the search through small local minima.
  • It increases the speed along flat regions.
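A minimal sketch of the momentum rule, keeping the previous update ΔWji(n-1) around between iterations; the variable names and values are illustrative.

```python
eta, alpha = 0.1, 0.9        # learning rate and momentum constant (0 < alpha < 1)

delta_j, x_ji = 0.032, 0.5   # error term and input on the link (illustrative)
prev_dw = 0.004              # Delta W_ji(n-1), the update from the previous iteration

dw = eta * delta_j * x_ji + alpha * prev_dw    # Delta W_ji(n)
w_ji = 0.3 + dw                                # W_ji <- W_ji + Delta W_ji(n)
prev_dw = dw                                   # remember for iteration n+1
print(dw, w_ji)  # 0.0052, 0.3052
```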

20
Adding Momentum
[Plot: E(W) versus W, showing a flat region of the error surface, where do we go?]
21
Remarks on Backpropagation
  • It implements a gradient descent search over the weight space.
  • It may become trapped in local minima.
  • In practice, it is very effective.
  • How to avoid local minima?
    • Add momentum.
    • Use stochastic gradient descent.
    • Use different networks with different initial values for the weights.

22
Representational Power
  • Boolean functions. Every boolean function can be represented with a network having two layers of units.
  • Continuous functions. All bounded continuous functions can also be approximated with a network having two layers of units.
  • Arbitrary functions. Any arbitrary function can be approximated with a network having three layers of units.

23
Connectionist Machine Learning IIa
  • Basics
  • Backpropagation Algorithm
  • Momentum
  • Summary

24
Summary
  • In multilayer neural networks, the output of each node is a sigmoid or squashing function.
  • In propagating error backwards, intermediate nodes compute a weighted sum of the error terms of the output nodes.
  • Momentum helps carry the search through small local minima and increases the speed along flat regions.
  • Any arbitrary function can be approximated with a network with three layers of units.