Title: Cooperating Intelligent Systems
1. Cooperating Intelligent Systems
- Statistical learning methods
- Chapter 20, AIMA
- (only ANNs and SVMs)
2. Artificial neural networks
- The brain is a pretty intelligent system.
- Can we copy it?
- There are approx. 10^11 neurons in the brain.
- There are approx. 23×10^9 neurons in the male cortex (females have about 15% fewer).
3. The simple model
- The McCulloch-Pitts model (1943)
[Figure: a neuron with inputs x1, x2, x3, weights w1, w2, w3 and a bias weight w0.]
y = g(w0 + w1*x1 + w2*x2 + w3*x3)
Image from Neuroscience: Exploring the Brain by Bear, Connors, and Paradiso
4. Transfer functions g(z)
- The logistic function: g(z) = 1 / (1 + e^(-z))
- The Heaviside (step) function: g(z) = 1 if z >= 0, else 0
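As a concrete illustration of the model on slides 3-4 (not taken from the course material; all names are my own), a minimal Java sketch of a McCulloch-Pitts-style unit with both transfer functions:

// Minimal sketch of a McCulloch-Pitts-style unit (illustrative only, not the lab code).
public class SimpleUnit {

    // The logistic transfer function: g(z) = 1 / (1 + e^(-z))
    static double logistic(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // The Heaviside transfer function: g(z) = 1 if z >= 0, else 0
    static double heaviside(double z) {
        return z >= 0.0 ? 1.0 : 0.0;
    }

    // The weighted sum z = w0 + w1*x1 + ... + wn*xn
    static double activation(double w0, double[] w, double[] x) {
        double z = w0;
        for (int i = 0; i < w.length; i++) z += w[i] * x[i];
        return z;
    }

    public static void main(String[] args) {
        double[] w = {1.0, 1.0};   // weights w1, w2
        double w0 = -1.5;          // bias chosen so the unit computes AND of binary inputs
        double[] x = {1.0, 1.0};
        double z = activation(w0, w, x);
        System.out.println("logistic(z)  = " + logistic(z));
        System.out.println("heaviside(z) = " + heaviside(z));
    }
}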
5. The simple perceptron
- With a {-1, +1} representation.
- Traditionally (early 60s) trained with perceptron learning.
6. Perceptron learning
- Desired output: f(n)
- Repeat until no errors are made anymore:
  - Pick a random example (x(n), f(n)).
  - If the classification is correct, i.e. if y(x(n)) = f(n), then do nothing.
  - If the classification is wrong, then update the parameters as shown below (η, the learning rate, is a small positive number).
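The update formula itself did not survive the conversion. For the {-1, +1} representation used on slide 5, the standard perceptron rule (assumed here) is, for a misclassified example:

w_0 \leftarrow w_0 + \eta\, f(n), \qquad w_i \leftarrow w_i + \eta\, f(n)\, x_i(n)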
7. Example: Perceptron learning
[Plot: the AND function in the (x1, x2) plane. Initial values; η = 0.3.]
8. Example: Perceptron learning
[Plot: this example is correctly classified, no action.]
9. Example: Perceptron learning
[Plot: this example is incorrectly classified, learning action.]
10. Example: Perceptron learning
[Plot: this example is incorrectly classified, learning action.]
11. Example: Perceptron learning
[Plot: this example is correctly classified, no action.]
12. Example: Perceptron learning
[Plot: this example is incorrectly classified, learning action.]
13. Example: Perceptron learning
[Plot: this example is incorrectly classified, learning action.]
14. Example: Perceptron learning
[Plot: the AND function, final solution.]
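The walk-through above can be reproduced in a few lines of code. Below is a minimal, self-contained Java sketch (my own illustration, not the course code), using {-1, +1} targets, a sign-like activation and η = 0.3 as on slide 7:

// Perceptron learning on the AND function (illustrative sketch).
public class PerceptronAnd {
    public static void main(String[] args) {
        double[][] x = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};  // input patterns
        double[] f = {-1, -1, -1, 1};                     // AND targets in {-1, +1}
        double w0 = 0.0, w1 = 0.0, w2 = 0.0;              // initial parameters
        double eta = 0.3;                                 // learning rate

        boolean anyError = true;
        while (anyError) {                                // repeat until no errors are made
            anyError = false;
            for (int n = 0; n < x.length; n++) {
                double z = w0 + w1 * x[n][0] + w2 * x[n][1];
                double y = (z >= 0) ? 1 : -1;             // perceptron output
                if (y != f[n]) {                          // wrong: learning action
                    w0 += eta * f[n];
                    w1 += eta * f[n] * x[n][0];
                    w2 += eta * f[n] * x[n][1];
                    anyError = true;
                }                                         // correct: no action
            }
        }
        System.out.println("Final solution: w0=" + w0 + ", w1=" + w1 + ", w2=" + w2);
    }
}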
15. Perceptron learning
- Perceptron learning is guaranteed to find a solution in finite time, if a solution exists.
- Perceptron learning cannot be generalized to more complex networks.
- Better to use gradient descent, based on formulating an error and differentiable functions.
16. Gradient search
- The learning rate (η) is set heuristically.
[Plot: the error surface E(W) versus the weights W; from the current point W(k), go downhill.]
- W(k+1) = W(k) + ΔW(k)
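The definition of the step ΔW(k) is missing from the dump; in plain gradient descent it is

\Delta W(k) = -\eta\, \nabla_{W} E\big(W(k)\big)

i.e. move a small step in the direction of steepest descent of the error.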
17. The Multilayer Perceptron (MLP)
- Combine several single-layer perceptrons.
- Each single-layer perceptron uses a sigmoid function, e.g. the logistic function.
[Figure: a layered network mapping the input to the output.]
- Can be trained using gradient descent.
18. Example: One hidden layer
- Can approximate any continuous function.
- q(z): sigmoid or linear,
- f(z): sigmoid.
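The network equation was an image in the original. With the notation of the slide (f for the hidden units, q for the output unit), a one-hidden-layer MLP of this form can be written (my indexing):

y(\mathbf{x}) \;=\; q\Big( v_0 + \sum_j v_j\, f\big( w_{j0} + \sum_i w_{ji}\, x_i \big) \Big)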
19. Example of computing the gradient
- What we need to do is to compute the gradient of the error with respect to every weight.
- We have the complete equation for the network.
20. Example of computing the gradient
[The derivation on these two slides was shown as equations/images; a sketch is given below.]
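Under the assumptions above (one hidden layer, squared-error cost E = ½(y − t)², h_j = f(b_j) the output of hidden unit j), the chain rule gives gradients of the form

\frac{\partial E}{\partial v_j} = (y - t)\, q'(a)\, h_j,
\qquad
\frac{\partial E}{\partial w_{ji}} = (y - t)\, q'(a)\, v_j\, f'(b_j)\, x_i

where a = v_0 + \sum_j v_j h_j and b_j = w_{j0} + \sum_i w_{ji} x_i. This is a sketch of the standard calculation, not necessarily the exact notation used on the slides.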
21. When should you stop learning?
- After a set number of learning epochs.
- When the change in the gradient becomes smaller than a certain number.
- Validation data: early stopping.
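As an illustration of the validation-data criterion (the names and the simulated error curve below are my own, not from the lab code), a minimal early-stopping loop looks like this:

// Early-stopping skeleton: stop when the validation error has not improved for 'patience' epochs.
public class EarlyStopping {

    // Stand-in for the validation error after epoch e (decreases, then rises as the model overfits).
    static double simulatedValidationError(int e) {
        return Math.exp(-e / 20.0) + 0.002 * e;
    }

    public static void main(String[] args) {
        int patience = 10;                     // epochs to wait without improvement
        double bestErr = Double.MAX_VALUE;
        int bestEpoch = 0;

        for (int e = 0; e < 500; e++) {
            // in practice: train one epoch here, then measure the error on the validation set
            double valErr = simulatedValidationError(e);
            if (valErr < bestErr) {            // improvement: remember this epoch (and save the weights)
                bestErr = valErr;
                bestEpoch = e;
            } else if (e - bestEpoch >= patience) {
                System.out.println("Stop at epoch " + e + "; best model was from epoch " + bestEpoch);
                break;
            }
        }
    }
}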
22. RPROP (Resilient PROPagation)
- Parameter update rule.
- Learning rate update rule.
- No parameter tuning, unlike standard backpropagation!
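The two update rules were images; the standard RPROP rules (Riedmiller & Braun, 1993), which this slide presumably summarizes, are

\Delta_{ij}(t) =
\begin{cases}
\min\big(\eta^{+}\Delta_{ij}(t-1),\ \Delta_{\max}\big) & \text{if } \partial E/\partial w_{ij}(t)\cdot \partial E/\partial w_{ij}(t-1) > 0\\
\max\big(\eta^{-}\Delta_{ij}(t-1),\ \Delta_{\min}\big) & \text{if } \partial E/\partial w_{ij}(t)\cdot \partial E/\partial w_{ij}(t-1) < 0\\
\Delta_{ij}(t-1) & \text{otherwise}
\end{cases}

w_{ij} \leftarrow w_{ij} - \operatorname{sign}\!\big(\partial E/\partial w_{ij}(t)\big)\, \Delta_{ij}(t)

with the default values \eta^{+} = 1.2 and \eta^{-} = 0.5, which is why no tuning is needed.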
23. Use this to determine
- Number of hidden nodes
- Which input signals to use
- If a pre-processing strategy is good or not
- Etc...
- Variability typically induced by:
  - Varying train and test data sets
  - Random initial model parameters
24. Support vector machines
25. Linear classifier on a linearly separable problem
There are infinitely many lines that have zero training error. Which line should we choose?
26. Linear classifier on a linearly separable problem
There are infinitely many lines that have zero training error. Which line should we choose?
Choose the line with the largest margin: the large margin classifier.
[Plot: the separating line with its margin marked.]
27. Linear classifier on a linearly separable problem
There are infinitely many lines that have zero training error. Which line should we choose?
Choose the line with the largest margin: the large margin classifier.
[Plot: the separating line with its margin marked; the points lying on the margin are the support vectors.]
28. Computing the margin
- The plane separating the two classes is defined by w^T x + a = 0.
- The dashed planes (through the closest examples of each class) are given by w^T x + a = ±b.
[Figure: the separating plane, the normal vector w, and the margin.]
29. Computing the margin
- Divide by b: define the new w = w/b and a = a/b, so that the dashed planes become w^T x + a = ±1.
- We have thereby defined a scale for w and b.
[Figure: the separating plane, the normal vector w, and the margin.]
30. Computing the margin
- Take a point x on one dashed plane; the closest point on the other dashed plane is x + λw.
- We have w^T x + a = -1 and w^T (x + λw) + a = +1, which gives λ = 2 / (w^T w) and a margin |λw| = 2 / |w|.
[Figure: the two dashed planes, the points x and x + λw, and the margin.]
31. Linear classifier on a linearly separable problem
- Maximizing the margin is equal to minimizing |w|, subject to the constraints:
  w^T x(n) + a ≥ +1 for all examples in the positive class,
  w^T x(n) + a ≤ -1 for all examples in the negative class.
- This is a quadratic programming problem; the constraints can be included with Lagrange multipliers.
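With targets t(n) ∈ {-1, +1} encoding the two classes, the problem can be written compactly as

\min_{w,\,a}\ \tfrac{1}{2}\,\|w\|^{2}
\quad \text{subject to} \quad
t(n)\,\big(w^{T}x(n) + a\big) \ge 1 \ \text{ for all } n

(this standard form is assumed; the slide states the two constraints separately for each class).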
32. Quadratic programming problem
- Minimize the cost (the Lagrangian L_p).
- The minimum of L_p occurs at the maximum of the Wolfe dual L_D.
- Only scalar products appear in the cost. IMPORTANT!
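The expressions themselves were images. The standard primal Lagrangian and Wolfe dual for this problem (assumed here) are

L_{p} = \tfrac{1}{2}\,\|w\|^{2} - \sum_{n} \alpha_{n}\Big[t(n)\big(w^{T}x(n) + a\big) - 1\Big], \qquad \alpha_{n} \ge 0

L_{D} = \sum_{n} \alpha_{n} - \tfrac{1}{2}\sum_{n}\sum_{m} \alpha_{n}\alpha_{m}\, t(n)\,t(m)\; x(n)^{T}x(m),
\qquad \text{subject to } \sum_{n}\alpha_{n} t(n) = 0,\ \alpha_{n} \ge 0

Note that the data enter L_D only through the scalar products x(n)^T x(m), which is the point emphasized on the slide.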
33. Linear Support Vector Machine
- Test phase: the predicted output for a new input x.
- Still only scalar products in the expression.
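The prediction formula did not survive the conversion; in the same notation it is

y(x) = \operatorname{sign}\Big(\sum_{n}\alpha_{n}\, t(n)\, x(n)^{T}x \;+\; a\Big)

where the sum effectively runs only over the support vectors (the examples with α_n > 0), and again only scalar products appear.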
34. Example: Robot color vision (Competition 1999)
- Classify the Lego pieces into red, blue, and yellow.
- Classify white balls, the black sideboard, and the green carpet.
35. What the camera sees (RGB space)
[Plot: pixel clusters in RGB space, labeled Yellow, Red, and Green.]
36. Mapping RGB (3D) to rgb (2D)
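The mapping itself is not spelled out in the text, but it is presumably the usual chromaticity normalization r = R/(R+G+B), g = G/(R+G+B) (b = 1 - r - g is then redundant, so 2D suffices). A small Java sketch, my own illustration:

// Map an RGB pixel to normalized rg chromaticity coordinates (illustrative sketch).
public class RgbNormalize {

    static double[] toNormalizedRg(int R, int G, int B) {
        double sum = R + G + B;
        if (sum == 0) return new double[] {1.0 / 3.0, 1.0 / 3.0};  // black pixel: no chromaticity
        return new double[] {R / sum, G / sum};                    // intensity is divided out
    }

    public static void main(String[] args) {
        double[] rg = toNormalizedRg(200, 150, 40);                // a yellowish pixel
        System.out.println("r = " + rg[0] + ", g = " + rg[1]);
    }
}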
37. Lego in normalized rgb space
[Plot: the training data in the normalized (x1, x2) plane.]
- Input is 2D.
- Output is 6D: red, blue, yellow, green, black, white.
38. MLP classifier
- E_train = 0.21, E_test = 0.24
- 2-3-1 MLP, Levenberg-Marquardt
- Training time (150 epochs): 51 seconds
39. SVM classifier
- E_train = 0.19, E_test = 0.20
- SVM with g = 1000
- Training time: 22 seconds
40. Lab 4: Digit recognition
- Inputs (digits) are provided as 32x32 bitmaps. The task is to investigate how well these handwritten digits can be recognized by neural networks.
- The assignment includes changing the program code to answer:
  - How good is the generalization performance? (test data error)
  - Can pre-processing improve performance?
  - What is the best configuration of the network?
41. AppTrain (training code)
public AppTrain() {
    // create a new network of given size
    nn = new NN(32 * 32, 10, seed);

    // create the matrix holding the data and read the data into it;
    // each row contains 32*32 + 1 integers
    file = new TFile("digits.dat");
    System.out.println(file.rows() + " digits have been loaded");

    double[] input = new double[32 * 32];
    double[] target = new double[10];

    // the training session (below) is iterative
    for (int e = 0; e < nEpochs; e++) {
        // reset the error accumulated over each training epoch
        double err = 0;

        // in each epoch, go through all examples/tuples/digits.
        // note: all examples are here used for training, consequently no systematic testing;
        // you may consider dividing the data set into training, testing and validation sets.
        for (int p = 0; p < file.rows(); p++) {
            for (int i = 0; i < 32 * 32; i++)
                input[i] = file.values[p][i];

            // the last value on each row contains the target (0-9);
            // convert it to a double target vector
            for (int i = 0; i < 10; i++) {
                if (file.values[p][32 * 32] == i)
                    target[i] = 1;
                else
                    target[i] = 0;
            }

            // present a sample, calculate errors and adjust weights
            err += nn.train(input, target, eta);
        }
        System.out.println("Epoch " + e + " finished with error " + err / file.rows());
    }

    // save network weights in a file for later use, e.g. in AppDigits
    nn.save("network.m");
}
42. classify (classification method)
/**
 * classify
 * @param map the bitmap on the screen
 * @return int, the most likely digit (0-9) according to the network
 */
public int classify(boolean[][] map) {
    double[] input = new double[32 * 32];
    for (int c = 0; c < map.length; c++) {
        for (int r = 0; r < map[c].length; r++) {
            if (map[c][r])                        // bit set
                input[r * map[r].length + c] = 1;
            else
                input[r * map[r].length + c] = 0;
        }
    }

    // activate the network, produce the output vector
    double[] output = nn.feedforward(input);
    // alternative version, assumes that the network has been trained on an 8x8 map
    // double[] output = nn.feedforward(to8x8(input));

    double highscore = 0;
    int highscoreIndex = 0;
    // print out each output value (gives an idea of the network's support for each digit)
    System.out.println("--------------");
    for (int k = 0; k < 10; k++) {
        System.out.println(k + " " + (double) ((int) (output[k] * 1000) / 1000.0));
        if (output[k] > highscore) {
            highscore = output[k];
            highscoreIndex = k;
        }
    }
    System.out.println("--------------");
    return highscoreIndex;
}