Perceptron presentation

1
Perceptron

Danny Meisler
Koby Lion

2
Neural Networks

A large number of very simple neuron-like
processing elements
A large number of weighted connections between
the elements
Highly parallel, distributed control
An emphasis on learning internal representations
automatically

3
Why Neural Nets?

Solving problems under constraints similar to
those of the brain may lead to solutions to AI
problems that might otherwise be overlooked.
Individual neurons operate relatively slowly, but
make up for that with massive parallelism.

4
The Parts of a Neuron
5
How it Works

Each neuron has branching from it a number of
small fibers called dendrites and a single long
fiber, the axon.

6
How it Works

The axon eventually splits and ends in a number
of synapses which connect the axon to the
dendrites of other neurons.

7
How it Works

Communication between neurons occurs along these
paths. When the electric potential in a neuron
rises above a threshold, the neuron activates.

8
How it Works

The neuron sends the electrical impulse down the
axon to the synapses.

9
How it Works

A synapse can either add to the electrical
potential or subtract from the electrical
potential.

10
How it Works

The pulse then enters the connected neurons'
dendrites, and the process begins again.

11
Neural Network

A neural network is made up of the
interconnection of a large number
of nonlinear processing units (neurons)
The network may consist of
feedforward and feedback paths
Interesting properties
nonlinearity
learning

12
McCulloch and Pitts, 1943

The modern era of neural networks starts in the
1940s, when Warren McCulloch (a psychiatrist and
neuroanatomist) and Walter Pitts (a mathematician)
explored the computational capabilities of
networks made of very simple neurons
A McCulloch-Pitts neuron fires if the sum of its
excitatory inputs exceeds its threshold, as long
as it does not receive an inhibitory input
Using a network of such neurons, they showed that
it was possible to construct any logical function
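
As a rough illustration of the firing rule above, the following Python sketch models a single McCulloch-Pitts unit with binary (0/1) inputs. The function names and the specific thresholds are assumptions chosen for illustration, and "exceeds" is read here as a strict inequality.

```python
def mp_neuron(excitatory, inhibitory, threshold):
    """Return 1 if the unit fires, else 0. All inputs are 0 or 1."""
    # Any active inhibitory input blocks firing outright.
    if any(inhibitory):
        return 0
    # Otherwise fire when the excitatory sum exceeds the threshold.
    return 1 if sum(excitatory) > threshold else 0


def gate_and(a, b):
    return mp_neuron([a, b], [], threshold=1)


def gate_or(a, b):
    return mp_neuron([a, b], [], threshold=0)


def gate_not(a):
    # A constant excitatory input of 1, with `a` wired as inhibitory.
    return mp_neuron([1], [a], threshold=0)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, gate_and(a, b), gate_or(a, b))
```

Since AND, OR and NOT are enough to express any Boolean formula, units like these can in principle be composed into larger logical functions.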

13
Each logical function can be computed by a
two-layered McCulloch-Pitts network.
Every finite automaton can be simulated by a
network of (recurrent) McCulloch-Pitts cells.
14
Hebb, 1949

In his book The Organization of Behavior,
Donald Hebb introduced his postulate of learning
(a.k.a. Hebbian learning), which states that the
effectiveness of a variable synapse between two
neurons is increased by the repeated activation
of one neuron by the other across that synapse
The Hebbian rule has a strong similarity to the
biological process in which a neural pathway is
strengthened each time it is used
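
One common way to write Hebb's postulate as an update rule is to increase a weight in proportion to the product of the two neurons' activities. The slide states the idea only in words, so the formula, the function name `hebbian_update` and the rate value below are assumptions for illustration.

```python
def hebbian_update(weight, pre, post, rate=0.1):
    """Return the synaptic weight after one Hebbian step.

    The weight changes only when the presynaptic activity `pre` and the
    postsynaptic activity `post` are both nonzero.
    """
    return weight + rate * pre * post


w = 0.0
for _ in range(3):  # three co-activations of the two neurons
    w = hebbian_update(w, pre=1, post=1)
print(round(w, 2))
```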

15
Rosenblatt, 1958

Frank Rosenblatt introduced the perceptron, the
simplest form of a neural network
The perceptron consists of a single neuron with
adjustable synaptic weights and a threshold
activation function
Rosenblatt's original perceptron in fact
consisted of three layers (sensory, association
and response), of which only one had variable
weights.

16
Rosenblatt, 1958 - continuation

Rosenblatt also developed an error-correction
rule to adapt these weights (a.k.a. the
perceptron learning rule), and proved that if the
(two) classes were linearly separable, the
algorithm would converge to a solution (a.k.a.
the perceptron convergence theorem)

18
Inputs To Neurons

Arise from other neurons or from outside the
network
Nodes whose inputs arise outside the network are
called input nodes and simply copy values
An input may excite or inhibit the response of
the neuron to which it is applied, depending upon
the weight of the connection

19
Weights

Represent synaptic efficacy and may be excitatory
or inhibitory
Normally, positive weights are considered as
excitatory while negative weights are thought of
as inhibitory
Learning is the process of modifying the weights
in order to produce a network that performs some
function

20
Output

The response function is normally nonlinear
Examples include
Sigmoid
Piecewise linear

23
Representational Power of Perceptrons

Perceptrons can represent the logical AND, OR,
and NOT functions as above.
We consider 1 to represent True and -1 to
represent False.
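
To make the claim concrete, here is a small Python sketch of single perceptrons computing AND, OR and NOT on inputs in {1, -1}. The weight values are one possible choice and are assumptions, not the weights from the original slides; the first weight multiplies a constant input of 1.

```python
def perceptron(weights, inputs):
    """Threshold unit: weights[0] is the bias weight for a constant input of 1."""
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else -1


# Hypothetical weights (bias first), chosen by hand for illustration.
AND_WEIGHTS = [-1.0, 1.0, 1.0]
OR_WEIGHTS = [1.0, 1.0, 1.0]
NOT_WEIGHTS = [0.0, -1.0]

for x1 in (1, -1):
    for x2 in (1, -1):
        print(x1, x2,
              perceptron(AND_WEIGHTS, (x1, x2)),
              perceptron(OR_WEIGHTS, (x1, x2)))
```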

Here there is no way to draw a single line that
separates the "+" (true) values from the "-"
(false) values.

25
The Good and the Bad News

Good: every Boolean function can be represented
by some network of perceptrons only two levels
deep.
Bad: any single perceptron can only represent
linearly separable functions.
Good: there is a perceptron algorithm that will
learn any linearly separable function.

26
Train a Perceptron

To train a perceptron, Rosenblatt developed a
procedure for changing the synaptic weights.
$y(t) = \operatorname{sgn}\left(\sum_i X_i(t)\,W_i(t) - \theta\right)$
sgn: 1 if its argument is positive, otherwise -1
$X_i(t)$: the input signals
$W_i(t)$: the synaptic weights
$\theta$: the threshold for that node
If the sum of the weighted inputs $X_i W_i$ exceeds
the threshold, y(t) = 1; otherwise y(t) = -1.
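
The output rule above can be sketched in a few lines of Python. The function names and the sample numbers are assumptions for illustration only.

```python
def sgn(value):
    """1 if the argument is positive, otherwise -1."""
    return 1 if value > 0 else -1


def perceptron_output(inputs, weights, theta):
    """Compare the weighted input sum against the threshold theta."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return sgn(weighted_sum - theta)


# Hypothetical node with two inputs, equal weights and threshold 0.2.
print(perceptron_output((1, 1), (0.5, 0.5), theta=0.2))
print(perceptron_output((1, -1), (0.5, 0.5), theta=0.2))
```

The worked example later in the deck folds the threshold into an extra weight W0 on a constant input X0 = 1, which is equivalent to setting W0 = -theta here.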

27
Train a Perceptron - continuation

At the start of the experiment, the weights W(0)
and the threshold θ are set to random values.
Then training begins, with the objective of
teaching the node to differentiate two classes of
inputs, I and II.
The goal is to have the node's output y(t) = 1 if
the input is of class I, and y(t) = -1 if the
input is of class II.
You are free to choose any inputs (Xi) and to
designate them as being of class I or II.

28
Train a Perceptron - continuation

If the node outputs 1 when given a class I
input, or -1 when given a class II input, the
weights Wi do not change.
If the node outputs -1 when given a class I
input, or 1 when given a class II input, the
weights Wi change according to the following
rule.

29
Train a Perceptron - continuation

$W_i(t+1) = W_i(t) + r\,[d(t) - y(t)]\,X_i(t)$
d(t): the desired or target output (1 or -1)
Since d and y can each be 1 or -1, the difference
d(t) - y(t), if nonzero, can only equal 2 or -2.
r: a positive learning rate (no greater than 1
or 2)
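
A minimal Python sketch of this update rule follows. The function name `update_weights` and the sample values are assumptions; the numbers were picked so the arithmetic is easy to check by hand.

```python
def update_weights(weights, inputs, desired, actual, rate):
    """Apply W_i(t+1) = W_i(t) + r [d(t) - y(t)] X_i(t) to every weight."""
    return [w + rate * (desired - actual) * x
            for w, x in zip(weights, inputs)]


# Misclassified pattern: desired -1, actual 1, so d - y = -2.
print(update_weights([0.5, -0.5], [1, -1], desired=-1, actual=1, rate=0.25))

# Correctly classified pattern: d - y = 0, so nothing moves.
print(update_weights([0.5, -0.5], [1, -1], desired=1, actual=1, rate=0.25))
```

Because the correction is multiplied by d(t) - y(t), no separate check for "was the output wrong?" is needed: a correct output yields a zero update.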

30
Example

Let's say we want to figure out the appropriate
weights to model the AND function we discussed
above (1 AND 1 = 1,
1 AND (-1) = -1, (-1) AND (-1) = -1).
We're assuming, of course, that no one gave us
the weights.
We set up a perceptron with two inputs (three, if
we include the constant input X0 = 1). Now let's
guess some weights.

31
Reminder

d(t): the desired or target output (1 or -1)
$y(t) = 1$ if $\sum_i X_i(t)\,W_i(t) > 0$,
and $-1$ otherwise
Change the weights if $d(t) \neq y(t)$:
$W_i(t+1) = W_i(t) + r\,[d(t) - y(t)]\,X_i(t)$

32
Example - continuation

W0 = 0.1, W1 = 0.1, W2 = 0.1
And let our learning rate r be 0.1.
Our first training example has
X1 = X2 = 1, so the output of the perceptron
should be 1.
Fortunately, that is the output of the
perceptron, so no modifications are needed.

33
Example - continuation

Our second training example has
X1 = 1, X2 = -1, so the target output of the
perceptron should be -1.
Unfortunately, the actual output of the
perceptron is 1. So we need to modify the
weights.
Following the equations above, we calculate
W0 = 0.1 + (0.1)(-2)(1) = -0.1
W1 = 0.1 + (0.1)(-2)(1) = -0.1
W2 = 0.1 + (0.1)(-2)(-1) = 0.3

34
Example - continuation

Now we get a third training example,
X1 = X2 = -1, for which the target output is
-1.
Fortunately, that is what the perceptron
outputs, so the weights need not be modified.

35
Example - continuation

Our fourth training example has
X1 = -1, X2 = 1, so the target output of the
perceptron should be -1.
Unfortunately, the actual output is 1.
So it's time for more modification of the
weights.
Again following the equations above, we
calculate
W0 = -0.1 + (0.1)(-2)(1) = -0.3
W1 = -0.1 + (0.1)(-2)(-1) = 0.1
W2 = 0.3 + (0.1)(-2)(1) = 0.1

36
Example - continuation

Our fifth training example has
X1 = X2 = 1, so the output of the perceptron
should be 1.
This time the output is -1.
So it's time for more modification of the
weights.
Again following the equations above, we
calculate
W0 = -0.3 + (0.1)(2)(1) = -0.1
W1 = 0.1 + (0.1)(2)(1) = 0.3
W2 = 0.1 + (0.1)(2)(1) = 0.3

37
Example - continuation

Our sixth training example has
X1 = 1, X2 = -1, so the target output of the
perceptron should be -1.
Indeed, that's what the perceptron produces.

38
Example - continuation

Our seventh example has
X1 = X2 = -1, for which the target output is
-1.
Fortunately, that is the actual output, so the
weights need not be modified.
Our eighth example has
X1 = -1, X2 = 1, so the target output of the
perceptron should be -1.
Again, it is!
We've converged on appropriate weights!
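
The whole worked example can be replayed with a short Python sketch. It presents the four AND patterns in the same order as the slides, twice over, starting from all weights at 0.1 with r = 0.1. The function names are assumptions; each input tuple starts with the constant X0 = 1.

```python
def output(weights, inputs):
    """y = 1 if the weighted sum is positive, otherwise -1."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else -1


def train(weights, examples, rate, passes):
    """Run the perceptron learning rule; return final weights and update count."""
    weights = list(weights)
    updates = 0
    for _ in range(passes):
        for inputs, desired in examples:
            actual = output(weights, inputs)
            if actual != desired:
                weights = [w + rate * (desired - actual) * x
                           for w, x in zip(weights, inputs)]
                updates += 1
    return weights, updates


# Each input is (X0, X1, X2) with the constant X0 = 1.
AND_EXAMPLES = [
    ((1, 1, 1), 1),
    ((1, 1, -1), -1),
    ((1, -1, -1), -1),
    ((1, -1, 1), -1),
]

final_weights, updates = train([0.1, 0.1, 0.1], AND_EXAMPLES, rate=0.1, passes=2)
print([round(w, 2) for w in final_weights], updates)  # prints [-0.1, 0.3, 0.3] 3
```

The three updates correspond to the second, fourth and fifth training examples in the slides.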

39
Single Layer Perceptron
40
Single Layer Perceptron

For a problem which calls for more than 2
classes, several perceptrons can be combined into
a network.
Can distinguish only linearly separable functions

41
Single Layer Perceptron
Single layer, five nodes: 2 inputs and 3 outputs
Recognizes 3 linearly separable classes, by means
of 2 features
42
For a general problem we have to resort to a
multi-layer network, as in our brain
Perceptron can do it
Perceptron cannot do it
43
Multi-Layer Networks
44
Multi-Layer Networks

A multi-layer perceptron can classify problems
that are not linearly separable.
A multi-layer (feedforward) network has one or
more hidden layers.

45
Multi-layer networks
[Figure: multi-layer network. Inputs x1, x2, ..., xn (visual input) feed the hidden layers, which feed the output (motor output).]
46
XOR
47
XOR

Activation function: if (input > threshold),
fire; else, don't fire.
All weights are 1, unless otherwise labeled.
[Figure: XOR network diagram. Surviving labels: 1, 1, 0, 0, 2, 1, 1, -2; last two values 1, 0.]
48
XOR

[Figure: same XOR network diagram, activation function and labels as slide 47; last two values 1, 1.]
49
XOR

[Figure: same as slide 47; last two values 1, 1.]
50
XOR

[Figure: same as slide 47; last two values 1, 0.]
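
Since the diagram itself did not survive, here is a hedged Python sketch of one well-known two-layer threshold network that computes XOR with the same "fire if input > threshold" rule. The thresholds 1.5 and 0.5 are assumptions; only the -2 connection weight echoes a label that survives from the slides.

```python
def fire(total, threshold):
    """Return 1 if the summed input exceeds the threshold, else 0."""
    return 1 if total > threshold else 0


def xor_network(x1, x2):
    # Hidden unit: fires only when both inputs are 1.
    hidden = fire(x1 + x2, threshold=1.5)
    # Output unit: input weights are 1; the hidden unit connects with
    # weight -2, cancelling the (1, 1) case.
    return fire(x1 + x2 - 2 * hidden, threshold=0.5)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_network(x1, x2))
```

The hidden unit is what a single perceptron lacks: it carves out the (1, 1) corner so the output unit can treat it differently from (1, 0) and (0, 1).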
51
Network Topology
52
Feedforward Networks

Feedforward Networks
Solutions are known
Weights are learned
Evolves in the weight space
Mostly Used for
Interpolation.
System modeling.
Classification, for example of faces, handwriting
and voice.
Adaptive Filtering.
Non Linear Control.

53
Training Multilayer Perceptron

The training of multilayer networks raises some
important issues:
How many layers? How many neurons per layer?
Too few neurons make the network unable to learn
the desired behavior. Too many neurons increase
the complexity of the learning algorithm.

54
Training Multilayer Perceptron

A desired property of a neural network is its
ability to generalize from the training set.
If there are too many neurons, there is the
danger of overfitting.
Does there exist an effective training algorithm?

55
Neural Network Model Building (Supervised
Learning)
56
The Backpropagation Algorithm
57
Backpropagation Algorithm

It is a gradient-descent method, a
generalization of the LMS rule.
Requires that the function describing the neural
network be differentiable. In particular, the
activation function must be differentiable.
An activation function that is often used is the
sigmoid function.

58
Gradient Descent Learning Rule

Consider a linear unit without threshold and
with continuous output o (not just -1, 1):
$o = w_0 + w_1 x_1 + \dots + w_n x_n$
Train the $w_i$ such that they minimize the
squared error (LMS, least mean square):
$E[w_1, \dots, w_n] = \frac{1}{2} \sum_{d \in S} (t_d - o_d)^2$
where S is the set of training examples
The opposite of hill climbing.

59
Gradient Descent
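$S = \{\langle(1,1),1\rangle, \langle(-1,-1),1\rangle, \langle(1,-1),-1\rangle, \langle(-1,1),-1\rangle\}$
$\Delta w = -\eta\,\nabla E[w]$
$\Delta w_i = -\eta\,\partial E / \partial w_i$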
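
As a sketch of this rule, the Python below runs batch gradient descent for a linear unit on the four-example set S from this slide, using the partial derivative of the squared error, which is minus the sum over examples of (t_d - o_d) times x_i. The starting weights, step count and value of eta are assumptions.

```python
# Training set from the slide: ((x1, x2), target).
S = [((1, 1), 1), ((-1, -1), 1), ((1, -1), -1), ((-1, 1), -1)]


def linear_output(w, x):
    """o = w0 + w1*x1 + ... + wn*xn."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def squared_error(w, examples):
    return 0.5 * sum((t - linear_output(w, x)) ** 2 for x, t in examples)


def gradient(w, examples):
    """dE/dw_i = -sum_d (t_d - o_d) * x_{i,d}, with x_0 = 1."""
    grad = [0.0] * len(w)
    for x, t in examples:
        residual = t - linear_output(w, x)
        for i, xi in enumerate((1,) + tuple(x)):
            grad[i] -= residual * xi
    return grad


def gradient_descent(w, examples, eta, steps):
    w = list(w)
    for _ in range(steps):
        grad = gradient(w, examples)
        w = [wi - eta * gi for wi, gi in zip(w, grad)]
    return w


w = gradient_descent([0.5, -0.3, 0.2], S, eta=0.05, steps=200)
print([round(wi, 6) for wi in w], round(squared_error(w, S), 6))
```

For this particular S the targets are uncorrelated with each input, so the minimum-error linear unit has all weights at zero with E = 2. The descent still converges even though the data are not linearly separable.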
60
Sigmoid Unit
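[Figure: sigmoid unit. Inputs x_0 = 1, x_1, ..., x_n with weights w_0, w_1, w_2, ..., w_n feed a summation node, which produces the output o.]
$z = \sum_{i=0}^{n} w_i x_i$
$o = \sigma(z) = 1 / (1 + e^{-z})$
$\sigma(z) = 1 / (1 + e^{-z})$ is the sigmoid function.
$d\sigma(z)/dz = \sigma(z)\,(1 - \sigma(z))$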
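
The derivative identity can be checked numerically with a short Python sketch. The helper `numerical_derivative` and the sample points are assumptions added for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Closed form: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def numerical_derivative(f, z, h=1e-6):
    """Central-difference approximation of f'(z)."""
    return (f(z + h) - f(z - h)) / (2 * h)


for z in (-2.0, 0.0, 1.5):
    print(z, sigmoid_derivative(z), numerical_derivative(sigmoid, z))
```

The closed form is what makes the sigmoid convenient for backpropagation: the derivative comes directly from the unit's output, with no extra exponential to evaluate.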
61
Backpropagation Preparation

Training Set: a collection of input-output
patterns that are used to train the network
Testing Set: a collection of input-output patterns
that are used to assess network performance
Learning Rate η: a scalar parameter, analogous to
step size in numerical integration, used to set
the rate of adjustments

62
A Pseudo-Code Algorithm

Randomly choose the initial weights
While the error is too large (E > E_acceptable)
  For each training pattern
    Apply the inputs to the network
    Calculate the output for every neuron from the
    input layer, through the hidden layer(s), to the
    output layer
    Calculate the error at the outputs
    Use the output error to compute error signals for
    pre-output layers
    Use the error signals to compute weight
    adjustments
    Apply the weight adjustments

63
Backpropagation Math

Consider the squared error
$E_S[w] = \frac{1}{2} \sum_{d \in S} \sum_{k \in \text{outputs}} (t_{d,k} - o_{d,k})^2$
Gradient: $\nabla E_S[w]$
Update: $w \leftarrow w - \eta\,\nabla E_S[w]$
How do we compute the Gradient?
Use the chain rule to compute the Gradient

64
Calculate The Error Signal For Each Output Neuron

The output neuron error signal $\delta_{pj}$ is given by
$\delta_{pj} = (T_{pj} - O_{pj})\,O_{pj}\,(1 - O_{pj})$
$T_{pj}$ is the target value of output neuron j for
pattern p
$O_{pj}$ is the actual output value of output neuron j
for pattern p

65
Calculate The Error Signal For Each Hidden Neuron

The hidden neuron error signal $\delta_{pj}$ is given by
$\delta_{pj} = O_{pj}\,(1 - O_{pj}) \sum_k \delta_{pk}\,W_{kj}$
where $\delta_{pk}$ is the error signal of a post-synaptic
neuron k and $W_{kj}$ is the weight of the connection
from hidden neuron j to the post-synaptic neuron
k

66
Calculate And Apply Weight Adjustments

Compute weight adjustments $\Delta W_{ji}$ by
$\Delta W_{ji} = \eta\,\delta_{pj}\,O_{pi}$
Apply weight adjustments according to
$W_{ji} \leftarrow W_{ji} + \Delta W_{ji}$
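
The three steps (output error signals, hidden error signals, weight adjustments) can be sketched for one training pattern as follows. This is a minimal illustration, not the presenters' code: the one-hidden-layer layout, the function names and the bias-first weight rows are assumptions. The hidden-layer formula, which did not survive in the slide text, is assumed to be the standard one for sigmoid units.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Each weight row holds the bias weight first, then one weight per input."""
    xb = [1.0] + list(x)
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    hb = [1.0] + hidden
    outputs = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_out]
    return xb, hb, outputs


def squared_error(x, target, w_hidden, w_out):
    _, _, outputs = forward(x, w_hidden, w_out)
    return 0.5 * sum((t - o) ** 2 for t, o in zip(target, outputs))


def backprop_step(x, target, w_hidden, w_out, eta):
    """Update both weight matrices in place for a single pattern."""
    xb, hb, outputs = forward(x, w_hidden, w_out)
    # Output error signals: (T - O) * O * (1 - O).
    delta_out = [(t - o) * o * (1 - o) for t, o in zip(target, outputs)]
    # Hidden error signals: O * (1 - O) * sum_k delta_k * W_kj,
    # computed before any output weight is changed.
    delta_hidden = []
    for j in range(len(w_hidden)):
        o_j = hb[j + 1]  # skip the bias entry
        back = sum(delta_out[k] * w_out[k][j + 1] for k in range(len(w_out)))
        delta_hidden.append(o_j * (1 - o_j) * back)
    # Weight adjustments: W_ji <- W_ji + eta * delta_j * O_i.
    for k, row in enumerate(w_out):
        for i in range(len(row)):
            row[i] += eta * delta_out[k] * hb[i]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += eta * delta_hidden[j] * xb[i]
```

A typical use is to call `backprop_step` for every pattern in the training set and repeat until `squared_error` summed over the set is acceptably small.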

67
Backpropagation: Momentum

Backpropagation has the disadvantage of being too
slow if η is small, and it can oscillate too
widely if η is large.
To solve this problem, we can add a momentum term
(α) to give each connection some inertia, forcing
it to change in the direction of the downhill
force.
The weight change depends on the current gradient
and the previous weight change
New Delta Rule:
$\Delta W_{ji}(t+1) = -\eta\,\partial E / \partial W_{ji} + \alpha\,\Delta W_{ji}(t)$
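
A one-weight Python sketch of the momentum rule follows. The function name `momentum_step`, the symbol names `eta` and `alpha`, and the sample numbers are assumptions for illustration.

```python
def momentum_step(weight, gradient, previous_delta, eta, alpha):
    """Return (new_weight, delta): delta = -eta * gradient + alpha * previous_delta."""
    delta = -eta * gradient + alpha * previous_delta
    return weight + delta, delta


# One step with a hypothetical gradient of 2.0 and a previous change of 0.5.
new_weight, delta = momentum_step(1.0, gradient=2.0, previous_delta=0.5,
                                  eta=0.1, alpha=0.9)
print(round(new_weight, 4), round(delta, 4))
```

In this sample step the momentum term (0.45) outweighs the gradient term (-0.2), so the weight keeps moving in its previous direction. That is the inertia the slide describes.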

68
Backpropagation Summary

Gradient descent over entire network weight
vector
Finds a local, not necessarily global error
minimum
in practice often works well
requires multiple invocations with different
initial weights
Training is fairly slow, yet prediction is fast

69
Problems with training

Nets get stuck
Not enough degrees of freedom
Hidden layer is too small
Training becomes unstable
too many degrees of freedom
Hidden layer is too big / too many hidden layers
Over-fitting
Can find every pattern, not all are significant.
If neural net is over-fit it will not
generalize well to the testing dataset

70
Comparison Perceptron and Gradient Descent Rule

Perceptron learning rule guaranteed to succeed if
Training examples are linearly separable
No guarantee otherwise
Linear unit using Gradient Descent
Converges to hypothesis with minimum squared
error.
Given sufficiently small learning rate η
Even when training data contains noise
Even when training data not linearly separable
