Learning and Perceptrons

1
Learning and Perceptrons
  • CIS 479/579
  • Bruce R. Maxim
  • UM-Dearborn

2
Momentum and Friction
  • When human players use a mouse to aim
  • Momentum turns the view more than they expect for
    large angles (the ballistic mouse thing)
  • Friction slows down the turn for small angles
  • Adjustments are needed to avoid losing accuracy
  • AI players simply aim at an exact target and
    shoot
  • Perfect shooters may not be fun to play against
    so turning errors can be introduced

4
Explicit Model
  • A mathematical function for computing the actual
    turning angle in terms of desired angle and
    previous output
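  • $\beta = 1.0 + 0.1 \cdot \mathrm{noise}(\mathit{angle})$
  • $\mathrm{output}(t) = (\mathit{angle} \cdot \beta)\,\alpha + \mathrm{output}(t-1)\,(1 - \alpha)$
  • $\alpha$: scaling factor for blending previous output
    with angle request, in range $[0.3, 0.5]$
  • $\beta$: initialized to a random value in range
    $[0.9, 1.1]$
  • noise( ) returns a value in range $[-1, 1]$; could use
    $\cos(\mathit{angle}^2 \cdot 0.217 + 342/\mathit{angle})$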
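
A minimal Python sketch of this explicit model may help make the blending concrete. It assumes the two lost symbols are a blend factor (called `alpha` here) and a noise scale (`beta`), and that the two terms inside the cosine are added; the zero-angle guard and the function names are illustrative additions, not part of the slides.

```python
import math
import random


def noise(angle: float) -> float:
    """Deterministic pseudo-noise in [-1, 1] (assumed form of the cosine)."""
    if angle == 0.0:
        return 0.0  # guard added here to avoid dividing by zero
    return math.cos(angle * angle * 0.217 + 342.0 / angle)


def turn(angle: float, previous: float, alpha: float) -> float:
    """Blend the noisy angle request with the previous output."""
    beta = 1.0 + 0.1 * noise(angle)  # stays within [0.9, 1.1]
    return (angle * beta) * alpha + previous * (1.0 - alpha)


# Assumption: the blend factor is drawn once per animat from [0.3, 0.5].
alpha = random.uniform(0.3, 0.5)
output = 0.0
for request in (0.5, 0.5, -0.2):
    output = turn(request, output, alpha)
```

Because only a fraction `alpha` of each request is applied, repeated requests for the same angle approach it gradually, which is the friction-like lag described above.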

5
Linear Approximation
  • We can use a perceptron to approximate the
    function described earlier
  • Once the animat learns a faster approximation of
    the function, the function can be removed from the AI code
  • Aiming errors just become a constraint on animat
    behavior

7
Methodology
  • Approximation computed by training network
    iteratively
  • Desired output is computed for random inputs
  • By grouping results, the batch algorithm can be
    used to find values for weights and bias
  • A small perceptron is applied twice (to get pitch
    and then yaw) rather than creating a larger one
    that does both
  • This reduces memory use at the expense of
    programming time
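
The batch procedure can be sketched in Python as follows. The target function `desired`, the learning rate, and the epoch count are hypothetical stand-ins chosen so the example is exactly learnable by one linear unit; the slides do not give these values.

```python
import random


def desired(angle, previous):
    # Hypothetical stand-in for the turning function being approximated.
    return 0.4 * angle + 0.6 * previous


def train_batch(samples, epochs=500, rate=0.5):
    """Batch delta rule for one linear unit with two inputs and a bias."""
    w = [0.0, 0.0]
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x, target in samples:
            error = target - (w[0] * x[0] + w[1] * x[1] + bias)
            grad_w[0] += error * x[0]
            grad_w[1] += error * x[1]
            grad_b += error
        # One update per pass, using the error averaged over the batch.
        w[0] += rate * grad_w[0] / n
        w[1] += rate * grad_w[1] / n
        bias += rate * grad_b / n
    return w, bias


rng = random.Random(0)
inputs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
samples = [(x, desired(*x)) for x in inputs]
weights, bias = train_batch(samples)
```

The same trained unit can then be evaluated once for pitch and once for yaw, which is the memory saving the slide refers to.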

8
Accumulating Errors
  • Momentum and friction cause errors or drift that
    tend to accumulate after several turns
  • These errors make the AI's performance more
    realistic
  • Ignoring the variations in aiming will make the
    AI too error prone to challenge human players

9
Inverse Error
  • To compensate for aiming errors, we could define
    an inverse error function to help correct the
    aiming errors
  • Not every function has a definable inverse, so
    the AI would be better served by a math-free
    method of approximating this type of function
  • Given enough trial and error through simulation
    opportunities the AI should be able to predict
    the corrected angles needed

10
Learning - 1
  • In effect the AI learns how to deal with aiming
    errors by receiving evaluative feedback
  • Using this feedback the AI can incrementally
    improve its task performance
  • The AI uses its sensors to detect the actual
    angles the body was turned since the last update
  • Unfortunately the AI learns to shoot where it
    should have shot last time

11
Learning - 2
  • With enough trials the AI can learn to anticipate
    where to shoot (the NN weights provide a crude
    memory to work with)
  • Both the inputs and outputs will need to be
    scaled because the perceptron will have to deal
    with values that are not within the unit range

12
Aimy
  • Perceptron is used to learn corrected angles
    needed to prevent undershooting and overshooting
  • Gathers data from its sensors to determine how
    far its body turned based on each requested angle
  • Incremental training is used to approximate the
    inverse function needed to prevent aiming errors
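
The following Python sketch illustrates the idea of learning an inverse incrementally. The body model `body_turn` (a constant 20% undershoot), the scaling constant, and the learning rate are assumptions made for illustration; they are not Aimy's actual code.

```python
import math
import random


def body_turn(request):
    # Hypothetical body model: every request is undershot by 20%.
    return 0.8 * request


SCALE = math.pi  # angles in [-pi, pi] are mapped to [-1, 1]
weight, bias, rate = 1.0, 0.0, 0.1
rng = random.Random(1)

for _ in range(2000):
    request = rng.uniform(-math.pi, math.pi)  # varied, randomly chosen turns
    actual = body_turn(request)               # what the sensors report
    # Learn the inverse: given the turn that happened, predict the request.
    x, target = actual / SCALE, request / SCALE
    error = target - (weight * x + bias)
    weight += rate * error * x                # incremental (per-sample) update
    bias += rate * error


def corrected_request(desired_angle):
    """Angle to request so the body ends up turning by desired_angle."""
    return (weight * desired_angle / SCALE + bias) * SCALE
```

After training, requesting `corrected_request(d)` makes the simulated body turn by approximately `d`.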

13
Evaluation - 1
  • Animat should have the opportunity to correct
    aiming while moving around
  • Perceptrons can learn more quickly when more
    training samples are presented
  • The animat can correct its aim in only two
    dimensions (pitch and yaw)
  • Only when pitch is near horizontal can the animat
    aim while it is moving

14
Evaluation - 2
  • When looking fully up or fully down, no forward
    movement is possible, which prevents learning
  • To prevent this trap, the animat is only allowed
    to control yaw until satisfactory results are
    obtained
  • The worst that happens is the animat spinning
    around while learning

15
Evaluation - 3
  • The way in which the yaw is chosen determines the
    angles available for learning
  • If the animat has full control over the yaw, it
    can decide what to learn and what to ignore (the
    effect may be for the NN to always predict the
    same turn to correct aiming errors)
  • This is a good reason for forcing the NN to
    examine a variety of randomly generated angles
    during training to get a more representative
    training set and better learning

16
Multilayer Perceptrons
  • Single layer perceptrons can only deal with
    linear problems
  • Single layer perceptrons can only approximate
    non-linear problems
  • Multilayer perceptrons (MLP)
  • Have extra middle layers known as hidden layers
  • The middle layers require more sophisticated
    activation functions than single layer
    perceptrons (e.g. linear activations would make
    MLP behave like single layer perceptron)
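
The remark about linear activations can be checked with one line of algebra. Assume a hidden layer with weights $W_1$ and bias $b_1$ feeding an output layer with weights $W_2$ and bias $b_2$ (symbols introduced here for illustration), all with identity activations:

```latex
\begin{aligned}
y &= W_2\,(W_1 x + b_1) + b_2 \\
  &= (W_2 W_1)\,x + (W_2 b_1 + b_2)
\end{aligned}
```

The result is a single linear layer with weights $W_2 W_1$ and bias $W_2 b_1 + b_2$, so the hidden layer adds no representational power unless its activation is nonlinear.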

18
Topology
  • MLP topology is said to be feed-forward because
    there are no backward (recurrent) connections
  • There can be an arbitrary number of hidden layers
    in MLP
  • Adding too many hidden layers increases the
    computational complexity of the network
  • One hidden layer is usually enough to allow the
    MLP to be a universal approximator capable of
    approximating any continuous function

19
Hidden Layers
  • In some cases, there may be many interdependencies
    among the input variables and adding an extra
    hidden layer can be helpful
  • Adding hidden layers can sometimes reduce the
    total number of weights needed for a suitable
    approximation
  • An MLP with two hidden layers can approximate any
    non-continuous function

20
Hidden Neurons
  • Choosing the number of neurons in the hidden
    layer is an art that often depends on the AI
    designer's intuition and experience
  • The neurons in the hidden layer are needed to
    represent the problem knowledge internally
  • As the number of dimensions grows the complexity
    of the decision surface (path through hidden
    layer) increases
  • Basically the output on one side of the surface
    is positive and negative on the other side

21
Connections
  • Neurons can be fully connected to one another
    within and between layers
  • Neurons can also be sparsely connected and even
    skip layers (e.g. straight from input to output)
  • Most MLP are fully connected to simplify
    programming

22
Activation Function Properties
  • Differentiable (known and computable derivative)
  • Continuous (derivative defined everywhere)
  • Complexity (nonlinear for higher order tasks)
  • Monotonic (derivative positive)
  • Bounded (activation output and its derivative
    are finite)
  • Polarity (bipolar preferred to positive)

23
Activation Functions
  • Activation functions for the input and output
    layers are usually one of the following
  • Step, Linear, Threshold logic, Sigmoid
  • Hidden layer activation functions might be one of
    the following
  • Sigmoid: $\mathrm{sig}(x) = 1 / (1 + e^{-\lambda x})$
  • Hyperbolic tangent
  • Bipolar sigmoid: $\mathrm{sigb}(x) = 2 / (1 + e^{-\lambda x}) - 1$
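
As a rough illustration, the two sigmoid variants can be written in Python as below. The parameter name `slope` stands in for the scaling symbol in the formulas, and the derivative helper is an addition that later training code would typically need.

```python
import math


def sigmoid(x, slope=1.0):
    """Logistic sigmoid with outputs in (0, 1).

    Naive form: math.exp overflows for very large negative slope * x.
    """
    return 1.0 / (1.0 + math.exp(-slope * x))


def bipolar_sigmoid(x, slope=1.0):
    """Sigmoid rescaled to (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-slope * x)) - 1.0


def sigmoid_derivative(x, slope=1.0):
    """Derivative of sigmoid, expressed through its own output."""
    s = sigmoid(x, slope)
    return slope * s * (1.0 - s)
```

The bipolar sigmoid is the hyperbolic tangent in disguise: `bipolar_sigmoid(x, slope)` equals `math.tanh(slope * x / 2)`.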

24
Role of Hidden Layers
  • The use of a hidden layer implies that the
    information needed to compute the output must be
    filtered before passing it on to the next layer
  • Each layer of the MLP receives its input from the
    previous layer and passes its modified output on
    to the next layer

25
Feed-Forward Algorithm
  • current = input // process input layer
  • for layer = 1 to n
  •   for i = 1 to m // compute output of each neuron
  •     // multiply arrays and sum result
  •     s = NetSum(neuron[i].weights, current)
  •     output[i] = Activate(s)
  •   // next layer uses this layer's output as input
  •   current = output
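
A runnable Python version of the feed-forward pass might look like the sketch below. The network layout (a list of layers, each a list of `(weights, bias)` pairs), the explicit bias term, and the sigmoid activation are assumptions; the pseudocode does not fix any of them.

```python
import math


def activate(s):
    """Sigmoid activation (one possible choice)."""
    return 1.0 / (1.0 + math.exp(-s))


def feed_forward(layers, inputs):
    """layers: list of layers; each layer is a list of (weights, bias) per neuron."""
    current = list(inputs)
    for layer in layers:
        output = []
        for weights, bias in layer:
            # Multiply weights by inputs element-wise and sum the result.
            s = sum(w * x for w, x in zip(weights, current)) + bias
            output.append(activate(s))
        # The next layer uses this layer's output as its input.
        current = output
    return current


network = [
    [([1.0, -1.0], 0.0), ([0.5, 0.5], -0.5)],  # hidden layer: 2 neurons
    [([1.0, 1.0], 0.0)],                       # output layer: 1 neuron
]
result = feed_forward(network, [1.0, 1.0])
```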

26
Benefits of MLP
  • The importance of MLPs is not that they mimic
    animal brains (they do not)
  • MLP have a thoroughly researched mathematical
    foundation and have been proven to work well in
    some applications
  • MLP can be trained to do interesting things and
    this training really just involves numeric
    optimization (minimizing output error)

27
Back Propagation - 1
  • BP is the process of filtering error from the
    output layer back through the preceding layers
  • BP was developed in response to the fact that
    single layer perceptron algorithms do not train
    hidden layers
  • BP is the essence of most MLP learning algorithms

29
Back Propagation - 2
  • Form of hill climbing known as gradient descent
    (the search moves downhill on the error surface)
  • several directions tried simultaneously
  • steepest gradient used to direct search
  • Training may require thousands of
    backpropagations
  • BP can get stuck or become unstable during
    training
  • BP can be done in stages

30
Back Propagation - 3
  • BP can train a net to recognize several concepts
    simultaneously
  • Trained neural networks can be used to make
    predictions
  • Too many trainable weights relative to the number
    of training facts can lead to overfitting problems

31
Back Propagation Algorithm - 1
  • Given set of input-output pairs
  • Task: compute weights for a 3-layer network that
    maps inputs to corresponding outputs
  • Algorithm
  • 1. Determine the number of neurons required
  • 2. Initialize weights to random values
  • 3. Set activation values for threshold units

32
Back Propagation Algorithm - 2
  • 4. Choose an input-output pair and assign
    activation levels to input neurons
  • 5. Propagate activations from input neurons to
    hidden layer neurons; for each neuron
  • $h_j = 1 / (1 + e^{-\sum_i w^{(1)}_{ij} x_i})$
  • 6. Propagate activations from hidden layer neurons
    to output neurons; for each neuron
  • $o_j = 1 / (1 + e^{-\sum_i w^{(2)}_{ij} h_i})$

33
Back Propagation Algorithm - 3
  • 7. Compute error for output neurons by comparing
    the desired pattern to the actual output
  • 8. Compute error for neurons in hidden layer
  • 9. Adjust weights between hidden layer and
    output layer
  • 10. Adjust weights between input layer and hidden
    layer
  • 11. Go to step 4

34
Backprop - 1
  • // compute gradient in last layer neurons
  • for j = 1 to m
  •   delta[j] = deriv_activate(net_sum) *
  •              (desired[j] - output[j])
  • for i = last - 1 to first // process layers
  •   for j = 1 to m
  •     total = 0
  •     for k = 1 to n
  •       total += delta[k] * weights[j][k]
  •     delta[j] = deriv_activate(net_sum) * total

35
Backprop - 2
  • // steepest descent for error gradient for
  • // each weight
  • for j = 1 to m
  •   for i = 1 to n
  •     // adjust weights using error gradient
  •     weight[j][i] += learning_rate *
  •                     delta[j] * output[i]
  • // The generalized delta rule is used to
  • // compute each weight change Δw[i][j]
  • // learning_rate set by KE
  • // delta[j] is gradient of neuron j error
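
The two backprop fragments can be combined into a small runnable Python sketch. It assumes one hidden layer, sigmoid activations, a bias stored as the last weight of each neuron, and a learning rate of 0.5; none of these choices come from the slides.

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def forward(w_hidden, w_out, x):
    """Return hidden and output activations; last weight of each neuron is its bias."""
    xb = x + [1.0]
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in w_hidden]
    hb = hidden + [1.0]
    out = [sigmoid(sum(w * v for w, v in zip(ws, hb))) for ws in w_out]
    return hidden, out


def backprop_step(w_hidden, w_out, x, desired, rate=0.5):
    """One incremental update using the generalized delta rule."""
    hidden, out = forward(w_hidden, w_out, x)
    # Output deltas: the sigmoid derivative is o * (1 - o).
    d_out = [o * (1 - o) * (d - o) for o, d in zip(out, desired)]
    # Hidden deltas: weighted sum of the next layer's deltas (old weights).
    d_hid = [h * (1 - h) * sum(d_out[k] * w_out[k][j] for k in range(len(w_out)))
             for j, h in enumerate(hidden)]
    for k, ws in enumerate(w_out):
        for j, v in enumerate(hidden + [1.0]):
            ws[j] += rate * d_out[k] * v
    for j, ws in enumerate(w_hidden):
        for i, v in enumerate(x + [1.0]):
            ws[i] += rate * d_hid[j] * v


def squared_error(w_hidden, w_out, x, desired):
    _, out = forward(w_hidden, w_out, x)
    return 0.5 * sum((d - o) ** 2 for d, o in zip(desired, out))


rng = random.Random(0)
w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # 2 inputs + bias
w_out = [[rng.uniform(-1, 1) for _ in range(3)]]                       # 2 hidden + bias
sample, target = [1.0, 0.0], [1.0]
before = squared_error(w_hidden, w_out, sample, target)
for _ in range(200):
    backprop_step(w_hidden, w_out, sample, target)
after = squared_error(w_hidden, w_out, sample, target)
```

Training on a single sample is only a smoke test that the error shrinks; a real training set would cycle through many input-output pairs.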

36
Quick Propagation
  • Batch technique
  • Exploits locally adaptive techniques to adjust
    step magnitude based on local parameters
  • Uses knowledge of higher-order derivatives (e.g.
    Newton's method)
  • Allows for better prediction of the slope of the
    curve and location of minima
  • Weights updated using method similar to backprop

37
Quickprop - 1
  • // Requires two additional arrays for step and
  • // gradient - it remembers last set of values
  • // New weight update replaces steepest descent
  • for j = 1 to m
  •   for i = 1 to n // compute gradient and step
  •     new_gradient[j][i] = -delta[j] * input[i]
  •     new_step[j][i] = new_gradient[j][i] /
  •                      (old_gradient[j][i] -
  •                       new_gradient[j][i])
  •                      * old_step[j][i]

38
Quickprop - 2
  •     // adjust weight
  •     weight[j][i] += new_step[j][i]
  •     // store values for next iteration
  •     old_step[j][i] = new_step[j][i]
  •     old_gradient[j][i] = new_gradient[j][i]
  • Note: since this is a batch algorithm, the
    gradients for all training samples are added
    together
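
A minimal Python sketch of the Quickprop update for a single weight is shown below. The steepest-descent fallback for the first pass (when there is no previous step) and the quadratic test error are assumptions added so the example runs; practical implementations usually also cap how much the step may grow.

```python
def quickprop_step(new_gradient, old_gradient, old_step, rate=0.1):
    """Step for one weight from the batch gradients of two consecutive passes."""
    denominator = old_gradient - new_gradient
    if old_step == 0.0 or denominator == 0.0:
        # Assumed fallback: plain steepest descent when no history is usable.
        return -rate * new_gradient
    return new_gradient / denominator * old_step


def error_gradient(w):
    # Gradient of the illustrative error (w - 3)^2.
    return 2.0 * (w - 3.0)


weight, old_gradient, old_step = 0.0, 0.0, 0.0
for _ in range(3):
    gradient = error_gradient(weight)
    step = quickprop_step(gradient, old_gradient, old_step)
    weight += step
    old_gradient, old_step = gradient, step
```

On a quadratic error the second step lands on the minimum, because the update fits a parabola through the last two gradients and jumps to its lowest point.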

39
Resilient Propagation
  • Weights updated only after all training samples
    have been seen
  • The step size is not determined by the gradient
    unlike steepest descent techniques
  • Equations are not too hard to implement

40
Rprop - 1
  • // New weight update replaces steepest descent
  • for j = 1 to m
  •   for i = 1 to n // compute gradient and step
  •     new_gradient[j][i] = -delta[j] * input[i]
  •     // analyze change to get size of update
  •     if (new_gradient[j][i] * old_gradient[j][i] > 0)
  •       new_update[j][i] = nplus *
                             old_update[j][i]
  •     else if (new_gradient[j][i] * old_gradient[j][i] <
                 0)
  •       new_update[j][i] = nminus *
                             old_update[j][i]
  •     else
  •       new_update[j][i] = old_update[j][i]

41
Rprop - 2
  •     // determine step direction
  •     if (new_gradient[j][i] > 0)
  •       step[j][i] = -new_update[j][i]
  •     else if (new_gradient[j][i] < 0)
  •       step[j][i] = new_update[j][i]
  •     else
  •       step[j][i] = 0
  •     // adjust weight and store values
  •     weight[j][i] += step[j][i]
  •     old_update[j][i] = new_update[j][i]
  •     old_gradient[j][i] = new_gradient[j][i]
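
The Rprop rule for one weight can be sketched in Python as follows. The values `nplus=1.2` and `nminus=0.5`, the update limits, and the quadratic test error are assumptions commonly used with Rprop; the slides do not specify them.

```python
def rprop_step(new_gradient, old_gradient, old_update,
               nplus=1.2, nminus=0.5, min_update=1e-6, max_update=50.0):
    """Return (step, new_update) for one weight; only gradient signs are used."""
    product = new_gradient * old_gradient
    if product > 0:
        new_update = min(old_update * nplus, max_update)   # same direction: speed up
    elif product < 0:
        new_update = max(old_update * nminus, min_update)  # overshot: slow down
    else:
        new_update = old_update
    if new_gradient > 0:
        step = -new_update
    elif new_gradient < 0:
        step = new_update
    else:
        step = 0.0
    return step, new_update


weight, old_gradient, update = 0.0, 0.0, 0.1
for _ in range(100):
    gradient = 2.0 * (weight - 3.0)  # batch gradient of the error (w - 3)^2
    step, update = rprop_step(gradient, old_gradient, update)
    weight += step
    old_gradient = gradient
```

The gradient's magnitude never enters the step size, which is the contrast with steepest descent drawn on the Resilient Propagation slide.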

42
Building Neural Networks
  • Define the problem in terms of neurons
  • think in terms of layers
  • Represent information as neurons
  • operationalize neurons
  • select their data type
  • locate data for testing and training
  • Define the network
  • Train the network
  • Test the network

43
Structuring the Training Facts
  • Use randomly ordered facts
  • Use representative data
  • Include people who survive surgery as well as
    people who do not
  • Neurons can't be coded
  • 1 = horse1, 2 = horse2, etc.
  • Networks like lots of inputs and outputs
  • Better to use two output neurons (one for buy and
    one for sell) than one coded 1 = buy and 0 = sell
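
The advice against numeric category codes amounts to using one neuron per category. A small Python sketch, with the helper name `one_hot` and the example category lists chosen here for illustration:

```python
def one_hot(value, categories):
    """One neuron per category: 1.0 for the match, 0.0 elsewhere."""
    return [1.0 if value == c else 0.0 for c in categories]


horses = ["horse1", "horse2", "horse3"]
actions = ["buy", "sell"]
encoded_horse = one_hot("horse2", horses)
encoded_action = one_hot("sell", actions)
```

With this encoding the network is not misled into treating horse2 as "twice" horse1, which a single neuron coded 1, 2, 3 would imply.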

44
Structuring the Training Facts
  • For historical data, use rows not columns
  • don't use
  • day1 day2 day3
  • 3 4 5
  • do use
  • day
  • 3
  • 4
  • 5

45
Structuring the Training Facts
  • Neural networks like differences over big numbers
  • use 50
  • not 350 vs 400
  • For seasonal data
  • use 1 column per month with winter cases coded
  • 1 for Dec, Jan, Feb, and 0 for other months
  • Think qualitatively not quantitatively
  • use restaurant visit on Monday in early Feb
  • not restaurant visit on day 43
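
Two of these preprocessing ideas can be sketched in a few lines of Python. The helper names and the sample prices are hypothetical, and the winter flag is one reading of the seasonal-coding bullet.

```python
def differences(values):
    """Replace absolute values by the change from one entry to the next."""
    return [b - a for a, b in zip(values, values[1:])]


def winter_flag(month):
    """1.0 for Dec, Jan, Feb; 0.0 for every other month (month is 1..12)."""
    return 1.0 if month in (12, 1, 2) else 0.0


prices = [350, 400, 390]
changes = differences(prices)
flags = [winter_flag(m) for m in (1, 6, 12)]
```

Feeding `changes` instead of `prices` keeps the inputs small and centered on what actually varies.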

46
Generalization 1
  • Learning phase is responsible for optimizing the
    weights from the training examples
  • It would be good if the NN could also process new
    or unseen examples correctly as well
    (generalization)
  • A NN that is bound too tightly to its training
    examples is said to be overfitting
  • Overfitting is rarely a problem with single layer
    perceptrons

47
Generalization 2
  • For MLP number of hidden neurons affects
    complexity of decision surface
  • Need to find the trade-off between the number of
    hidden neurons and result quality
  • Incorrect or incomplete data interferes with
    generalization
  • Bad training examples are usually to blame for
    failure of MLP to learn concepts

48
Testing and Validation
  • Training sets used to optimize the weights for
    a given set of parameters
  • Validation sets used to check the quality of
    training, help to find best combination of
    parameters
  • Testing sets check final quality of validated
    perceptrons (no test info is used to improve NN)
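
One simple way to produce the three sets is a shuffled split, sketched below in Python. The 60/20/20 proportions and the fixed seed are assumptions chosen for illustration.

```python
import random


def split_facts(facts, train=0.6, validation=0.2, seed=0):
    """Shuffle, then cut into training, validation, and test sets."""
    shuffled = list(facts)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


training, validating, testing = split_facts(range(100))
```

The test set is touched only once, after the parameters have been chosen with the validation set.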

49
How can you tell things aren't working out?
  • Your network refuses to learn 10-20% of the
    training facts
  • Things to try
  • Check definition file for data range errors
  • Check for bad (incorrect) facts
  • Some training facts may conflict with one another
  • The training tolerance level may be too strict
    for the data being used
  • Switch from absolute score to differences

50
Batch vs Incremental
  • Batch preferred over incremental training
  • Converge to answer faster
  • Have greater accuracy
  • Incremental data can be gathered for batch
    processing if necessary
  • Incremental approaches best suited for real-time,
    in-game learning (requires less memory)

51
Forgetting
  • With incremental learning, it may be wise to slow
    down learning rate later in the game to avoid
    forgetting earlier lessons
  • No formal approach to reducing learning rate,
    linear or exponential decay strategies are often
    successful
  • This implies that learning will eventually become
    frozen as time passes
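
Linear and exponential decay schedules can be sketched in Python as follows. The parameter names `total_steps` and `half_life` are illustrative choices, not values from the slides.

```python
import math


def linear_decay(initial_rate, step, total_steps):
    """Rate falls in a straight line and reaches zero at total_steps."""
    return initial_rate * max(0.0, 1.0 - step / total_steps)


def exponential_decay(initial_rate, step, half_life):
    """Rate halves every half_life steps and never quite reaches zero."""
    return initial_rate * math.pow(0.5, step / half_life)
```

With linear decay learning stops completely after `total_steps`, while exponential decay only freezes it in practice, once the rate becomes too small to change the weights noticeably.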

52
Perceptron Advantages
  • Good mathematical foundation
  • If solution exists it can be found
  • Work best for well defined problems
  • If things go wrong the parameters can be adjusted
  • Lots of training algorithms exist
  • MLP works easily with continuous values
  • Deals well with noise

53
Perceptron Disadvantages - 1
  • NN do not contain an easily understood
    representation of their knowledge
  • MLP depends entirely on the algorithms used to
    create it
  • MLP does not scale well
  • Once trained MLP is not updated without
    retraining
  • Retraining does not preserve previous MLP
    knowledge

54
Perceptron Disadvantages - 2
  • Design of inputs and outputs can have a profound
    impact on MLP success
  • Input may require pre-processing and outputs may
    require post-processing
  • Getting the right number of layers and neurons
    requires trial and error

55
Onno
  • Uses a large neural network to handle shooting
    (prediction, target selection, aiming)
  • Input is similar to that described in previous
    chapters
  • Results are moderate, but demonstrates
    versatility of MLP and benefits of decomposing
    behaviors

56
Why haven't there been more NN commercial
successes?
  • Programming neural networks is very difficult:
    each constraint must be hardwired with O(N^2)
    lateral inhibitory and O(N^3) diagonal excitatory
    connections
  • Learning in NN is hard
  • learning algorithms are hard to write
  • choosing the right knowledge representation in
    the hidden layer is non-trivial

57
Why haven't there been more NN commercial
successes?
  • For many applications symbol-based knowledge is
    superior to circuit-based knowledge in terms of
    performance
  • Neural networks may have been oversold (as has
    been the problem with many early AI technologies)