Perceptron - PowerPoint PPT Presentation

About This Presentation
Title:

Perceptron


Provided by: scie200
Tags: perceptron


Transcript and Presenter's Notes

Title: Perceptron


1
Perceptron
  • Danny Meisler
  • Koby Lion

2
Neural Networks
  • A large number of very simple neuron-like
    processing elements
  • A large number of weighted connections between
    the elements
  • Highly parallel, distributed control
  • An emphasis on learning internal representations
    automatically

3
Why Neural Nets?
  • Solving problems under constraints similar to
    those of the brain may lead to solutions to AI
    problems that might otherwise be overlooked.
  • Individual neurons operate relatively slowly, but
    make up for that with massive parallelism.

4
The Parts of a Neuron
5
How it Works
  • Each neuron has branching from it a number of
    small fibers called dendrites and a single long
    fiber, the axon.

6
How it Works
  • The axon eventually splits and ends in a number
    of synapses which connect the axon to the
    dendrites of other neurons.

7
How it Works
  • Communication between neurons occurs along these
    paths. When the electric potential in a neuron
    rises above a threshold, the neuron activates.

8
How it Works
  • The neuron sends the electrical impulse down the
    axon to the synapses.

9
How it Works
  • A synapse can either add to the electrical
    potential or subtract from the electrical
    potential.

10
How it Works
  • The pulse then enters the connected neurons'
    dendrites, and the process begins again.

11
Neural Network
  • A neural network is made up of the
    interconnection of a large number
    of nonlinear processing units (neurons)
  • The network may consist of
    feedforward and feedback paths
  • Interesting properties: nonlinearity and learning

12
McCulloch and Pitts, 1943
  • The modern era of neural networks starts in the
    1940s, when Warren McCulloch (a psychiatrist and
    neuroanatomist) and Walter Pitts (a mathematician)
    explored the computational capabilities of
    networks made of very simple neurons
  • A McCulloch-Pitts neuron fires if the sum of its
    excitatory inputs exceeds its threshold, as long
    as it does not receive an inhibitory input
  • Using a network of such neurons, they showed that
    it was possible to construct any logical function
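The firing rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `mcculloch_pitts`, the 0/1 input encoding, and the thresholds chosen for AND and OR are assumptions, not values taken from the slides.

```python
def mcculloch_pitts(excitatory, inhibitory, threshold):
    """Return 1 if the unit fires, else 0 (inputs are 0 or 1)."""
    # Any active inhibitory input vetoes firing outright.
    if any(inhibitory):
        return 0
    # Otherwise fire when the excitatory sum exceeds the threshold.
    return 1 if sum(excitatory) > threshold else 0

# Two-input AND: both inputs must be active to exceed a threshold of 1.
and_gate = [mcculloch_pitts([a, b], [], 1) for a in (0, 1) for b in (0, 1)]
# Two-input OR: one active input is enough to exceed a threshold of 0.
or_gate = [mcculloch_pitts([a, b], [], 0) for a in (0, 1) for b in (0, 1)]
```

Here `and_gate` and `or_gate` list the outputs for the inputs (0,0), (0,1), (1,0), (1,1) in that order.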

13
Each logical function can be computed by a
two-layered McCulloch-Pitts network. Every finite
automaton can be simulated by a network of
(recurrent) McCulloch-Pitts cells.
14
Hebb, 1949
  • In his book The Organization of Behavior,
    Donald Hebb introduced his postulate of learning
    (a.k.a. Hebbian learning), which states that the
    effectiveness of a variable synapse between two
    neurons is increased by the repeated activation
    of one neuron by the other across that synapse
  • The Hebbian rule has a strong similarity to the
    biological process in which a neural pathway is
    strengthened each time it is used

15
Rosenblatt, 1958
  • Frank Rosenblatt introduced the perceptron, the
    simplest form of a neural network
  • The perceptron consists of a single neuron with
    adjustable synaptic weights and a threshold
    activation function
  • Rosenblatt's original perceptron in fact
    consisted of three layers (sensory, association
    and response), of which only one layer had variable
    weights.

16
Rosenblatt, 1958 - continuation
  • Rosenblatt also developed an error-correction
    rule to adapt these weights (a.k.a. the
    perceptron learning rule), and proved that if the
    (two) classes were linearly separable, the
    algorithm would converge to a solution (a.k.a.
    the perceptron convergence theorem)

17
(No Transcript)
18
Inputs To Neurons
  • Arise from other neurons or from outside the
    network
  • Nodes whose inputs arise outside the network are
    called input nodes and simply copy values
  • An input may excite or inhibit the response of
    the neuron to which it is applied, depending upon
    the weight of the connection

19
Weights
  • Represent synaptic efficacy and may be excitatory
    or inhibitory
  • Normally, positive weights are considered as
    excitatory while negative weights are thought of
    as inhibitory
  • Learning is the process of modifying the weights
    in order to produce a network that performs some
    function

20
Output
  • The response function is normally nonlinear
  • Examples include:
      • Sigmoid
      • Piecewise linear
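The two response functions named above can be sketched as follows. The breakpoints of the piecewise-linear function (saturation at -1 and 1) are an assumption made for illustration; the slide does not specify them.

```python
import math

def sigmoid(z):
    """Smooth, S-shaped response in the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def piecewise_linear(z):
    """Linear between -1 and 1, saturating outside (assumed breakpoints)."""
    return max(-1.0, min(1.0, z))
```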

21
(No Transcript)
22
(No Transcript)
23
Representational Power of Perceptrons
  • Perceptrons can represent the logical AND, OR,
    and NOT functions as above.
  • We consider +1 to represent True and -1 to
    represent False.
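As a rough illustration of the claim, the sketch below hard-codes one possible set of weights for AND, OR and NOT with the +1/-1 encoding. The weight values and the names `perceptron`, `AND_W`, `OR_W` and `NOT_W` are assumptions chosen for this example; many other weight choices work equally well.

```python
def perceptron(weights, inputs):
    """Threshold unit: weights[0] is the bias weight for a constant input of 1."""
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total > 0 else -1

AND_W = [-1.0, 1.0, 1.0]   # positive sum only when both inputs are +1
OR_W = [1.0, 1.0, 1.0]     # positive sum when at least one input is +1
NOT_W = [0.0, -1.0]        # flips the sign of its single input

values = (1, -1)
and_table = {(a, b): perceptron(AND_W, [a, b]) for a in values for b in values}
or_table = {(a, b): perceptron(OR_W, [a, b]) for a in values for b in values}
not_table = {a: perceptron(NOT_W, [a]) for a in values}
```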

24
  • Here there is no way to draw a single line that
    separates the "+" (true) values from the "-"
    (false) values.

25
The Good and the Bad News
  • Good: every Boolean function can be represented
    by some network of perceptrons only two levels
    deep.
  • Bad: any single perceptron can only represent
    linearly separable functions.
  • Good: there is a perceptron algorithm that will
    learn any linearly separable function.

26
Train a perceptron
  • To train a perceptron, Rosenblatt developed a
    procedure for changing the synaptic weights
  • $y(t) = \operatorname{sgn}\left(\sum_i X_i(t)\, W_i(t) - \theta\right)$
  • sgn: 1 if its argument is positive, otherwise -1
  • $X_i(t)$: the input signals
  • $W_i(t)$: the synaptic weights
  • $\theta$: the threshold for that node
  • If the sum of the weighted inputs $X_i W_i$ exceeds
    the threshold, y(t) = 1; otherwise y(t) = -1
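The output rule can be written directly as code. This is a minimal sketch; the function names `sgn` and `perceptron_output` and the argument name `theta` (for the threshold) are chosen here for illustration.

```python
def sgn(value):
    """Return 1 if the argument is positive, otherwise -1."""
    return 1 if value > 0 else -1

def perceptron_output(x, w, theta):
    """y = sgn(sum_i x_i * w_i - theta)."""
    weighted_sum = sum(xi * wi for xi, wi in zip(x, w))
    return sgn(weighted_sum - theta)
```

For example, with weights (0.5, 0.5) and a threshold of 0.75, the input (1, 1) gives a weighted sum of 1.0, which exceeds the threshold, so the node outputs 1.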

27
Train a perceptron - continuation
  • At the start of the experiment, W(0) and the
    threshold $\theta$ are set to random values
  • Then training begins, with the objective of
    teaching the node to differentiate two classes of
    inputs, I and II
  • The goal is to have the node output y(t) = 1 if
    the input is of class I, and y(t) = -1 if the
    input is of class II
  • You are free to choose any inputs (Xi) and to
    designate them as being of class I or II

28
Train a perceptron - continuation
  • If the node outputs 1 when given a class I
    input, or -1 when given a class II input, the
    classification is correct and the weights Wi do
    not change
  • If the node outputs -1 when given a class I
    input, or 1 when given a class II input, the
    weights Wi change according to the perceptron
    learning rule

29
Train a perceptron - continuation
  • $W_i(t+1) = W_i(t) + r\,[d(t) - y(t)]\,X_i(t)$
  • d(t): the desired or target output (1 or -1)
  • Since d and y can each be 1 or -1, their
    difference, if nonzero, can only equal 2 or -2
  • r: a positive learning rate (no greater than 1
    or 2)
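A single application of the learning rule, W_i(t+1) = W_i(t) + r [d(t) - y(t)] X_i(t), might look like the sketch below. The function name `update_weights` is hypothetical, and the weights are assumed to be held in a plain list.

```python
def update_weights(w, x, d, y, r):
    """Return w_i + r * (d - y) * x_i for every weight."""
    return [wi + r * (d - y) * xi for wi, xi in zip(w, x)]

# A misclassified example (target -1, output 1) moves every weight
# by r * (-2) * x_i; a correctly classified one leaves them unchanged.
moved = update_weights([0.5, 0.5], [1, -1], d=-1, y=1, r=0.25)
unchanged = update_weights([0.5, 0.5], [1, -1], d=1, y=1, r=0.25)
```

Because d - y is zero for a correct output, the same formula covers both the "change" and "no change" cases.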

30
Example
  • Let's say we want to figure out the appropriate
    weights to model the AND function we discussed
    above (1 AND 1 = 1, 1 AND (-1) = -1,
    (-1) AND (-1) = -1)
  • We're assuming, of course, that no one gave us
    the weights.
  • We set up a perceptron with two inputs (three, if
    we include the constant input X0 = 1). Now let's
    guess some weights.

31
Reminder
  • d(t): the desired or target output for the
    input (1 or -1)
  • $y(t) = 1$ if $\sum_i X_i(t)\, W_i(t) > 0$,
    and $-1$ otherwise
  • Change the weights if $d(t) \neq y(t)$:
  • $W_i(t+1) = W_i(t) + r\,[d(t) - y(t)]\,X_i(t)$

32
Example - continuation
  • W0 = 0.1, W1 = 0.1, W2 = 0.1
  • And let's let our learning rate r be 0.1
  • Our first training example has
    X1 = X2 = 1, so the output of the perceptron
    should be 1.
  • Fortunately, that is the output of the
    perceptron, so no modifications are needed.

33
Example - continuation
  • Our second training example has
    X1 = 1, X2 = -1, so the target output of the
    perceptron should be -1.
  • Unfortunately, the actual output of the
    perceptron is 1. So we need to modify the
    weights.
  • Following the equations above, we calculate
  • W0 = 0.1 + (0.1)(-2)(1) = -0.1
  • W1 = 0.1 + (0.1)(-2)(1) = -0.1
  • W2 = 0.1 + (0.1)(-2)(-1) = 0.3

34
Example - continuation
  • Now we get a third training example,
    X1 = X2 = -1, for which the target output is
    -1.
  • Fortunately, that is the actual output, so the
    weights need not be modified.

35
Example - continuation
  • Our fourth training example has
    X1 = -1, X2 = 1, so the target output of the
    perceptron should be -1.
  • Unfortunately, the actual output is 1.
  • So it's time for more modification of the
    weights.
  • Again following the equations above, we
    calculate
  • W0 = -0.1 + (0.1)(-2)(1) = -0.3
  • W1 = -0.1 + (0.1)(-2)(-1) = 0.1
  • W2 = 0.3 + (0.1)(-2)(1) = 0.1

36
Example - continuation
  • Our fifth training example has
    X1 = X2 = 1, so the output of the perceptron
    should be 1.
  • This time the output is -1.
  • So it's time for more modification of the
    weights.
  • Again following the equations above, we
    calculate
  • W0 = -0.3 + (0.1)(2)(1) = -0.1
  • W1 = 0.1 + (0.1)(2)(1) = 0.3
  • W2 = 0.1 + (0.1)(2)(1) = 0.3

37
Example - continuation
  • Our sixth training example has
    X1 = 1, X2 = -1, so the target output of the
    perceptron should be -1.
  • Indeed, that's what the perceptron produces.

38
Example - continuation
  • Our seventh example has
    X1 = X2 = -1, for which the target output is
    -1.
  • Fortunately, that is the actual output, so the
    weights need not be modified.
  • Our eighth example has
    X1 = -1, X2 = 1, so the target output of the
    perceptron should be -1.
  • Again, it is!
  • We've converged on appropriate weights!
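The worked example can be replayed in code. The sketch below assumes the same starting weights (0.1, 0.1, 0.1), learning rate 0.1, a constant input X0 = 1, and the same presentation order of the four training examples; the function names `predict` and `train_and` are hypothetical.

```python
def predict(w, x):
    """Output 1 if the weighted sum is positive, otherwise -1."""
    total = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if total > 0 else -1

def train_and(w, r=0.1, max_passes=10):
    # Each example is (X0, X1, X2) with X0 fixed at 1, paired with the AND target.
    examples = [((1, 1, 1), 1), ((1, 1, -1), -1),
                ((1, -1, -1), -1), ((1, -1, 1), -1)]
    for _ in range(max_passes):
        changed = False
        for x, d in examples:
            y = predict(w, x)
            if y != d:
                # Perceptron learning rule: w_i <- w_i + r * (d - y) * x_i
                w = [wi + r * (d - y) * xi for wi, xi in zip(w, x)]
                changed = True
        if not changed:
            break  # a full pass with no corrections: converged
    return w

weights = train_and([0.1, 0.1, 0.1])
print([round(wi, 1) for wi in weights])
```

Rounded to one decimal place, the final weights match the ones reached in the slides: W0 = -0.1, W1 = 0.3, W2 = 0.3.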

39
Single Layer Perceptron
40
Single Layer Perceptron
  • For a problem which calls for more than 2
    classes, several perceptrons can be combined into
    a network.
  • Can distinguish only linearly separable functions

41
Single Layer Perceptron
Single layer, five nodes: 2 inputs and 3 outputs.
Recognizes 3 linearly separable classes by means of
2 features.
42
For a general problem we have to resort to a
multi-layer network, as in our brain.
[Figure: two example problems, one labeled "Perceptron can do it" and one labeled "Perceptron cannot do it".]
43
Multi-Layer Networks
44
Multi-Layer Networks
  • A multi-layer perceptron can classify problems
    that are not linearly separable.
  • A multilayer (feedforward) network has one or
    more hidden layers.

45
Multi-layer networks
[Figure: inputs x1, x2, ..., xn (visual input) pass through hidden layers to the output (motor output).]
46
XOR
47
XOR
Activation function: if (input > threshold),
fire; else, don't fire.
All weights are 1, unless otherwise labeled.
[Figure: XOR network diagram.]
48
XOR
Activation function: if (input > threshold),
fire; else, don't fire.
All weights are 1, unless otherwise labeled.
[Figure: XOR network diagram.]
49
XOR
Activation function: if (input > threshold),
fire; else, don't fire.
All weights are 1, unless otherwise labeled.
[Figure: XOR network diagram.]
50
XOR
Activation function: if (input > threshold),
fire; else, don't fire.
All weights are 1, unless otherwise labeled.
[Figure: XOR network diagram.]
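One way to realize XOR with the "fire if input > threshold" activation is sketched below. The exact layout of the network on the slides cannot be read from the transcript, so the topology here (one hidden unit, an inhibitory weight of -2 from it to the output, all other weights 1) and the thresholds 1.5 and 0.5 are assumptions made for illustration.

```python
def fires(total_input, threshold):
    """Activation function: fire (1) if input > threshold, else don't (0)."""
    return 1 if total_input > threshold else 0

def xor_network(x1, x2):
    # Hidden unit: fires only when both inputs are on (assumed threshold 1.5).
    hidden = fires(x1 + x2, 1.5)
    # Output unit: input weights are 1 and the hidden unit's weight is -2
    # (assumed threshold 0.5), so the hidden unit cancels the (1, 1) case.
    return fires(x1 + x2 - 2 * hidden, 0.5)

truth_table = {(a, b): xor_network(a, b) for a in (0, 1) for b in (0, 1)}
```

The hidden unit is what a single perceptron lacks: it detects the (1, 1) case and suppresses the output for it.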
51
Network Topology
52
Feedforward Networks
  • Solutions are known
  • Weights are learned
  • Evolves in the weight space
  • Mostly used for:
      • Interpolation
      • System modeling
      • Classification, for example of faces,
        handwriting and voice
      • Adaptive filtering
      • Nonlinear control

53
Training Multilayer Perceptron
  • The training of multilayer networks raises some
    important issues
  • How many layers? How many neurons per layer?
  • Too few neurons make the network unable to learn
    the desired behavior. Too many neurons increase
    the complexity of the learning algorithm.

54
Training Multilayer Perceptron
  • A desired property of a neural network is its
    ability to generalize from the training set.
  • If there are too many neurons, there is the
    danger of overfitting.
  • Does there exist an effective training algorithm?

55
Neural Network Model Building (Supervised
Learning)
56
The Backpropagation Algorithm
57
Backpropagation Algorithm
  • It is a gradient-descent method. A
    generalization of the LMS rule.
  • Requires that the function describing the neural
    network should be differentiable. This especially
    means that the activation function should be
    differentiable.
  • An activation function that is often used is the
    sigmoid function.

58
Gradient Descent Learning Rule
  • Consider a linear unit without threshold and with
    continuous output o (not just -1, 1)
  • $o = w_0 + w_1 x_1 + \dots + w_n x_n$
  • Train the $w_i$ such that they minimize the
    squared error (LMS, least mean squares)
  • $E[w_1, \dots, w_n] = \frac{1}{2} \sum_{d \in S} (t_d - o_d)^2$
  • where S is the set of training examples
  • The opposite of hill climbing.

59
Gradient Descent
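$S = \{\langle(1,1), 1\rangle, \langle(-1,-1), 1\rangle, \langle(1,-1), -1\rangle, \langle(-1,1), -1\rangle\}$
$\Delta w = -\eta\, \nabla E[w]$
$\Delta w_i = -\eta\, \partial E / \partial w_i$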
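The rule can be tried on the training set S given above. This is a minimal batch gradient descent sketch for a linear unit o = w0 + w1 x1 + w2 x2; the starting weights, the learning rate of 0.1, the step count, and all function names are assumptions chosen for illustration.

```python
# Training set S from the slide: ((x1, x2), target).
S = [((1, 1), 1), ((-1, -1), 1), ((1, -1), -1), ((-1, 1), -1)]

def output(w, x):
    """Linear unit: o = w0 + w1*x1 + ... + wn*xn."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def squared_error(w):
    """E = 1/2 * sum over S of (t - o)^2."""
    return 0.5 * sum((t - output(w, x)) ** 2 for x, t in S)

def gradient(w):
    """dE/dw_i = -sum_d (t_d - o_d) * x_{i,d}, with x_0 = 1."""
    grad = [0.0] * len(w)
    for x, t in S:
        err = t - output(w, x)
        for i, xi in enumerate((1,) + x):
            grad[i] -= err * xi
    return grad

def gradient_descent(w, eta=0.1, steps=100):
    for _ in range(steps):
        # Delta w_i = -eta * dE/dw_i
        w = [wi - eta * gi for wi, gi in zip(w, gradient(w))]
    return w

w_final = gradient_descent([0.5, -0.3, 0.8])
```

This particular S is not linearly separable, and for it the gradient works out to 4w, so the weights shrink toward zero and the error settles at its minimum value of 2. The procedure still converges; it simply cannot fit the data with a single linear unit.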
60
Sigmoid Unit
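[Figure: sigmoid unit with inputs x_0 = 1, x_1, ..., x_n and weights w_0, w_1, ..., w_n feeding a summation node that produces the output o.]
$z = \sum_{i=0}^{n} w_i x_i$
$o = \sigma(z) = \frac{1}{1 + e^{-z}}$
$\sigma(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function.
$\frac{d\sigma(z)}{dz} = \sigma(z)\,(1 - \sigma(z))$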
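The derivative identity for the sigmoid, d sigma(z)/dz = sigma(z) (1 - sigma(z)), can be checked numerically. The sketch below compares the closed form against a central finite difference; the test point z = 0.7 and the step size h are arbitrary choices.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    """Closed form: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def numerical_derivative(f, z, h=1e-6):
    """Central finite difference, used only as a sanity check."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

z = 0.7
difference = abs(sigmoid_derivative(z) - numerical_derivative(sigmoid, z))
```

The two values should agree to many decimal places, which is why backpropagation can compute the derivative from the unit's output alone.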
61
Backpropagation Preparation
  • Training set: a collection of input-output
    patterns that are used to train the network
  • Testing set: a collection of input-output patterns
    that are used to assess network performance
  • Learning rate ($\eta$): a scalar parameter, analogous to
    step size in numerical integration, used to set
    the rate of adjustments

62
A Pseudo-Code Algorithm
  • Randomly choose the initial weights
  • While the error is too large (E > E_acceptable)
      • For each training pattern
          • Apply the inputs to the network
          • Calculate the output for every neuron from the
            input layer, through the hidden layer(s), to the
            output layer
          • Calculate the error at the outputs
          • Use the output error to compute error signals for
            pre-output layers
          • Use the error signals to compute weight
            adjustments
          • Apply the weight adjustments
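The steps above can be sketched in plain Python for a network with one hidden layer of sigmoid units and a single sigmoid output. Everything specific here is an assumption made for illustration: the function names, the network size, the learning rate, the epoch cap, the random seed, and the use of XOR with 0/1 targets as a toy data set. Convergence to the acceptable error is not guaranteed, which is why the loop also stops after `max_epochs`.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w_hidden, w_out, x):
    """Return (hidden outputs, network output); index 0 of each weight row is the bias."""
    xb = [1.0] + list(x)
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    out = sigmoid(sum(w * v for w, v in zip(w_out, [1.0] + hidden)))
    return hidden, out

def total_error(w_hidden, w_out, patterns):
    return 0.5 * sum((t - forward(w_hidden, w_out, x)[1]) ** 2 for x, t in patterns)

def train_step(w_hidden, w_out, x, t, eta):
    """One pattern: forward pass, error signals, then in-place weight adjustments."""
    hidden, out = forward(w_hidden, w_out, x)
    delta_out = (t - out) * out * (1.0 - out)
    # Hidden error signals use the output weights before they are adjusted.
    delta_hidden = [h * (1.0 - h) * delta_out * w_out[j + 1]
                    for j, h in enumerate(hidden)]
    for j, v in enumerate([1.0] + hidden):
        w_out[j] += eta * delta_out * v
    xb = [1.0] + list(x)
    for j, row in enumerate(w_hidden):
        for i, v in enumerate(xb):
            row[i] += eta * delta_hidden[j] * v

def train(patterns, n_hidden=3, eta=0.5, acceptable=0.01, max_epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in = len(patterns[0][0])
    # Randomly choose the initial weights.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(max_epochs):
        if total_error(w_hidden, w_out, patterns) <= acceptable:
            break
        for x, t in patterns:
            train_step(w_hidden, w_out, x, t, eta)
    return w_hidden, w_out

# XOR with 0/1 targets, chosen here only as a small illustrative data set.
xor_patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w_hidden, w_out = train(xor_patterns)
```

Weights are updated after every pattern (online training) rather than once per pass; either variant fits the pseudo-code.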

63
Backpropagation Math
  • Consider the squared error
  • $E_S[w] = \frac{1}{2} \sum_{d \in S} \sum_{k \in \text{output}} (t_{d,k} - o_{d,k})^2$
  • Gradient: $\nabla E_S[w]$
  • Update: $w \leftarrow w - \eta\, \nabla E_S[w]$
  • How do we compute the gradient?
  • Use the chain rule to compute the gradient

64
Calculate The Error Signal For Each Output Neuron
  • The output neuron error signal $\delta_{pj}$ is given by
    $\delta_{pj} = (T_{pj} - O_{pj})\, O_{pj}\, (1 - O_{pj})$
  • $T_{pj}$ is the target value of output neuron j for
    pattern p
  • $O_{pj}$ is the actual output value of output neuron j
    for pattern p

65
Calculate The Error Signal For Each Hidden Neuron
  • The hidden neuron error signal $\delta_{pj}$ is given by
    $\delta_{pj} = O_{pj}\, (1 - O_{pj}) \sum_k \delta_{pk}\, W_{kj}$
  • where $\delta_{pk}$ is the error signal of a post-synaptic
    neuron k and $W_{kj}$ is the weight of the connection
    from hidden neuron j to the post-synaptic neuron
    k

66
Calculate And Apply Weight Adjustments
  • Compute weight adjustments $\Delta W_{ji}$ by
    $\Delta W_{ji} = \eta\, \delta_{pj}\, O_{pi}$
  • Apply weight adjustments according to
    $W_{ji} \leftarrow W_{ji} + \Delta W_{ji}$
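The error signals and the weight adjustment can each be written as a small function. This sketch assumes sigmoid units, for which the hidden-neuron error signal takes the standard form O_pj (1 - O_pj) multiplied by the sum over k of delta_pk W_kj; the function names and the symbol `eta` for the learning rate are chosen here for illustration.

```python
def output_error_signal(target, output):
    """delta_pj = (T_pj - O_pj) * O_pj * (1 - O_pj) for an output neuron."""
    return (target - output) * output * (1.0 - output)

def hidden_error_signal(output_j, downstream_deltas, downstream_weights):
    """delta_pj = O_pj * (1 - O_pj) * sum_k delta_pk * W_kj for a hidden neuron."""
    back = sum(d * w for d, w in zip(downstream_deltas, downstream_weights))
    return output_j * (1.0 - output_j) * back

def adjusted_weight(w_ji, eta, delta_j, output_i):
    """W_ji <- W_ji + eta * delta_pj * O_pi."""
    return w_ji + eta * delta_j * output_i

# One output neuron with target 1 and actual output 0.5, fed by a
# hidden neuron whose output is 0.5 through a weight of 2.0.
d_out = output_error_signal(1.0, 0.5)
d_hid = hidden_error_signal(0.5, [d_out], [2.0])
w_new = adjusted_weight(2.0, 0.5, d_out, 0.5)
```

Note that the hidden error signal is computed with the weight as it was before the adjustment.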

67
Backpropagation: The Momentum
  • Backpropagation has the disadvantage of being too
    slow if $\eta$ is small, and it can oscillate too
    widely if $\eta$ is large.
  • To solve this problem, we can add a momentum term
    ($\alpha$) to give each connection some inertia,
    forcing it to change in the direction of the
    downhill force.
  • The weight change depends on the current gradient
    and on the previous weight change
  • New delta rule:
  • $\Delta W_{ji}(t+1) = -\eta\, \partial E / \partial W_{ji} + \alpha\, \Delta W_{ji}(t)$
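The momentum rule can be sketched for a single weight as follows. The names `eta` (learning rate) and `alpha` (momentum coefficient), their default values, and the toy error function E(w) = w^2 are assumptions made for this illustration.

```python
def momentum_update(weight, gradient, previous_change, eta=0.1, alpha=0.9):
    """Return (new weight, weight change), where
    change(t+1) = -eta * gradient + alpha * change(t)."""
    change = -eta * gradient + alpha * previous_change
    return weight + change, change

# Minimize the toy error E(w) = w**2, whose gradient is 2*w.
w, change = 1.0, 0.0
for _ in range(200):
    w, change = momentum_update(w, 2.0 * w, change)
```

With these settings the weight overshoots and oscillates around the minimum at w = 0 while the oscillation decays, which is the inertia the slide describes.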

68
Backpropagation Summary
  • Gradient descent over the entire network weight
    vector
  • Finds a local, not necessarily global, error
    minimum
      • In practice often works well
      • Requires multiple invocations with different
        initial weights
  • Training is fairly slow, yet prediction is fast

69
Problems with training
  • Nets get stuck
      • Not enough degrees of freedom
      • Hidden layer is too small
  • Training becomes unstable
      • Too many degrees of freedom
      • Hidden layer is too big / too many hidden layers
  • Overfitting
      • Can find every pattern, but not all are
        significant. If the neural net is overfit it
        will not generalize well to the testing dataset

70
Comparison Perceptron and Gradient Descent Rule
  • Perceptron learning rule: guaranteed to succeed if
      • Training examples are linearly separable
      • No guarantee otherwise
  • Linear unit using gradient descent
      • Converges to the hypothesis with minimum squared
        error
      • Given a sufficiently small learning rate $\eta$
      • Even when the training data contains noise
      • Even when the training data is not linearly
        separable