Learning and Perceptrons

1
Learning and Perceptrons

CIS 479/579
Bruce R. Maxim
UM-Dearborn

2
Momentum and Friction

When human players use a mouse to aim:
  Momentum turns the view more than they expect for large angles (the "ballistic mouse" effect)
  Friction slows down the turn for small angles
  Adjustments are needed to avoid losing accuracy
AI players, by contrast, can simply aim at an exact target and shoot
Perfect shooters may not be fun to play against, so turning errors can be introduced

4
Explicit Model

A mathematical function for computing the actual turning angle in terms of the desired angle and the previous output:
  $\alpha = 1.0 + 0.1 \cdot \mathrm{noise}(\mathit{angle})$
  $\mathrm{output}(t) = (\mathit{angle} \cdot \alpha) \cdot \beta + \mathrm{output}(t-1) \cdot (1 - \beta)$
β is the scaling factor for blending the previous output with the angle request, in the range [0.3, 0.5]
α is a pseudo-random scaling factor in the range [0.9, 1.1]
noise() returns a value in the range [-1, 1]; one option is $\cos(\mathit{angle}^2 \cdot 0.217 + 342/\mathit{angle})$
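The following is a minimal Python sketch of the explicit model. It assumes angles in degrees and a fixed β = 0.4, one value from the stated [0.3, 0.5] range. The class name `TurningModel` is hypothetical. Making `noise()` return 0 at angle 0 is an added assumption that avoids dividing by zero in 342/angle.

```python
import math


def noise(angle: float) -> float:
    """Deterministic pseudo-noise in [-1, 1] using the cos() formula above."""
    if angle == 0.0:
        return 0.0  # assumption: avoid the 342/angle division by zero
    return math.cos(angle * angle * 0.217 + 342.0 / angle)


class TurningModel:
    """Hypothetical helper: blends each turn request with the previous output."""

    def __init__(self, beta: float = 0.4) -> None:
        if not 0.3 <= beta <= 0.5:
            raise ValueError("beta is expected to lie in [0.3, 0.5]")
        self.beta = beta
        self.previous = 0.0  # output(t - 1); the body starts at rest

    def turn(self, angle: float) -> float:
        alpha = 1.0 + 0.1 * noise(angle)  # error factor in [0.9, 1.1]
        output = (angle * alpha) * self.beta + self.previous * (1.0 - self.beta)
        self.previous = output
        return output


model = TurningModel(beta=0.4)
for request in (30.0, 30.0, 30.0, -10.0):
    print(f"requested {request:6.1f}  actual {model.turn(request):7.2f}")
```

A single request from rest produces only a fraction β of the turn. Repeating the same request makes the output creep toward angle × α. This lag and the error that comes with it are what the perceptron in the following slides has to approximate.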

5
Linear Approximation

We can use a perceptron to approximate the function described earlier
Once the animat has learned a faster approximation of the function, the explicit function can be removed from the AI code
Aiming errors then just become a constraint on animat behavior

7
Methodology

The approximation is computed by training the network iteratively
The desired output is computed for random inputs
By grouping results, the batch algorithm can be used to find values for the weights and bias
A small perceptron is applied twice (to get pitch and then yaw) rather than creating a larger one that does both
This reduces memory use at the expense of programming time
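The methodology can be sketched in Python as follows. Random (angle, previous output) pairs are sampled, the explicit model from the earlier slide gives the desired output, and a single linear neuron is fitted with the batch delta rule. The batch rule accumulates the error over every sample before each update. The choice of inputs, the scaling by 180°, β = 0.4, the learning rate and the epoch count are all assumptions for illustration.

```python
import math
import random

MAX_ANGLE = 180.0  # assumption: angles in degrees, scaled into [-1, 1]


def noise(angle: float) -> float:
    """Pseudo-noise in [-1, 1]; returning 0 at angle 0 avoids 342/0."""
    return 0.0 if angle == 0.0 else math.cos(angle * angle * 0.217 + 342.0 / angle)


def turning_model(angle: float, previous: float, beta: float = 0.4) -> float:
    """Explicit model: output(t) = (angle * alpha) * beta + output(t-1) * (1 - beta)."""
    alpha = 1.0 + 0.1 * noise(angle)
    return angle * alpha * beta + previous * (1.0 - beta)


def make_samples(count: int, rng: random.Random):
    """Desired outputs computed for random (angle, previous output) inputs, all scaled."""
    samples = []
    for _ in range(count):
        angle = rng.uniform(-MAX_ANGLE, MAX_ANGLE)
        previous = rng.uniform(-MAX_ANGLE, MAX_ANGLE)
        desired = turning_model(angle, previous)
        samples.append(((angle / MAX_ANGLE, previous / MAX_ANGLE), desired / MAX_ANGLE))
    return samples


def train_batch(samples, epochs: int = 1000, rate: float = 0.5):
    """Batch delta rule: sum the error gradient over all samples, then update once."""
    weights = [0.0, 0.0]
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (x0, x1), desired in samples:
            error = desired - (weights[0] * x0 + weights[1] * x1 + bias)
            grad_w[0] += error * x0
            grad_w[1] += error * x1
            grad_b += error
        weights[0] += rate * grad_w[0] / n
        weights[1] += rate * grad_w[1] / n
        bias += rate * grad_b / n
    return weights, bias


weights, bias = train_batch(make_samples(300, random.Random(0)))
print(f"w_angle={weights[0]:.3f}  w_previous={weights[1]:.3f}  bias={bias:.3f}")
```

Because α stays within 0.9 to 1.1, the fitted weights should land close to β and 1 - β, roughly 0.4 and 0.6. The leftover noise is the part a linear perceptron cannot capture. In the slide's setup, one such neuron would be applied once for pitch and once for yaw.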

8
Accumulating Errors

Momentum and friction cause errors, or drift, that tend to accumulate after several turns
These errors make the AI's performance more realistic
If the AI ignores these variations in aiming, it becomes too error-prone to challenge human players

9
Inverse Error

To compensate for aiming errors, we could define an inverse error function to help correct them
Not every function has a definable inverse, so the AI would be better served by a math-free method of approximating this type of function
Given enough trial-and-error opportunities in simulation, the AI should be able to predict the corrected angles needed

10
Learning - 1

In effect, the AI learns how to deal with aiming errors by receiving evaluative feedback
Using this feedback, the AI can incrementally improve its task performance
The AI uses its sensors to detect the actual angles by which its body turned since the last update
Unfortunately, the AI learns to shoot where it should have shot last time

11
Learning - 2

With enough trials, the AI can learn to anticipate where to shoot (the NN weights provide a crude memory to work with)
Both the inputs and outputs need to be scaled, because the perceptron has to deal with values that are not within the unit interval

12
Aimy

A perceptron is used to learn the corrected angles needed to prevent undershooting and overshooting
Aimy gathers data from its sensors to determine how far its body turned for each requested angle
Incremental training is used to approximate the inverse function needed to prevent aiming errors
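Here is a minimal Python sketch of the Aimy idea. A single linear neuron is trained incrementally, one sensed turn at a time. It learns to map the turn the body actually made back to the request that caused it, which approximates the inverse function. `body_turn()` is a hypothetical stand-in for the game that simply overshoots by 20%. Requests are drawn at random, following the advice in Evaluation - 3, and angles are scaled by an assumed 180° limit.

```python
import random

MAX_ANGLE = 180.0  # assumption: turns are limited to [-180, 180] degrees


def body_turn(requested: float) -> float:
    """Hypothetical game stand-in: the body overshoots every request by 20%."""
    return requested * 1.2


class InversePerceptron:
    """Single linear neuron trained with the incremental delta rule."""

    def __init__(self, rate: float = 0.1) -> None:
        self.weight = 0.0
        self.bias = 0.0
        self.rate = rate

    def predict(self, desired: float) -> float:
        """Return the request expected to make the body turn by `desired` degrees."""
        x = desired / MAX_ANGLE  # scale the input into [-1, 1]
        return (self.weight * x + self.bias) * MAX_ANGLE  # unscale the output

    def train(self, actual: float, requested: float) -> None:
        """One incremental update from a sensed (actual turn, request) pair."""
        x = actual / MAX_ANGLE
        error = requested / MAX_ANGLE - (self.weight * x + self.bias)
        self.weight += self.rate * error * x
        self.bias += self.rate * error


rng = random.Random(1)
aimy = InversePerceptron()
for _ in range(5000):
    request = rng.uniform(-150.0, 150.0)  # keeps the actual turn within +/-180
    aimy.train(body_turn(request), request)

corrected = aimy.predict(90.0)
print(f"request {corrected:.2f} to turn {body_turn(corrected):.2f} degrees")
```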

13
Evaluation - 1

The animat should have the opportunity to correct its aim while moving around
Perceptrons can learn more quickly when more training samples are presented
The animat can correct its aim in only two dimensions (pitch and yaw)
The animat can aim while moving only when its pitch is near horizontal

14
Evaluation - 2

When looking fully up or fully down, no forward movement is possible, which prevents learning
To prevent this trap, the animat is only allowed to control yaw until satisfactory results are obtained
The worst that can happen is that the animat spins around while learning
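The yaw-first rule can be expressed as a small gate, sketched here in Python with some assumed details. "Satisfactory" is taken to mean that an exponential moving average of the absolute yaw error falls below a threshold (2° here). Until then, pitch requests are held at 0° (horizontal). The class and its parameters are hypothetical.

```python
from typing import Optional


class YawFirstGate:
    """Hypothetical gate: pitch stays horizontal until yaw aiming is good enough."""

    def __init__(self, threshold_deg: float = 2.0, smoothing: float = 0.1) -> None:
        self.threshold = threshold_deg
        self.smoothing = smoothing
        self.avg_yaw_error: Optional[float] = None
        self.pitch_enabled = False

    def record_yaw_error(self, error_deg: float) -> None:
        """Update the running average after each sensed yaw turn."""
        e = abs(error_deg)
        if self.avg_yaw_error is None:
            self.avg_yaw_error = e
        else:
            self.avg_yaw_error += self.smoothing * (e - self.avg_yaw_error)
        if self.avg_yaw_error < self.threshold:
            self.pitch_enabled = True  # satisfactory yaw results: unlock pitch

    def choose_pitch(self, desired_pitch: float) -> float:
        """Keep the view near horizontal (0 degrees) so the animat can keep moving."""
        return desired_pitch if self.pitch_enabled else 0.0


gate = YawFirstGate()
print(gate.choose_pitch(45.0))  # 0.0 until yaw errors are small enough
```

Once pitch is unlocked it stays unlocked. This is a deliberate simplification, so the animat does not flip back and forth between the two modes.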

15
Evaluation - 3

The way in which the yaw is chosen determines the angles available for learning
If the animat has full control over the yaw, it can decide what to learn and what to ignore (the effect may be that the NN always predicts the same turn to correct aiming errors)
This is a good reason to force the NN to examine a variety of randomly generated angles during training, giving a more representative training set and better learning

16
Multilayer Perceptrons

Single layer perceptrons can only deal with linear problems
Non-linear problems can only be approximated by single layer perceptrons
Multilayer perceptrons (MLP)
  have extra middle layers known as hidden layers
  the middle layers require more sophisticated activation functions than single layer perceptrons (e.g., linear activations would make an MLP behave like a single layer perceptron)
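A quick numerical check of the parenthetical remark, in Python. Two layers with linear (identity) activations compute exactly the same function as one layer whose weight matrix is the product of the two. The weights are arbitrary illustrative values, and biases are omitted for brevity.

```python
from typing import List

Matrix = List[List[float]]


def linear_layer(weights: Matrix, inputs: List[float]) -> List[float]:
    """Each row is one neuron; the activation is the identity."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product a @ b for list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


w_hidden = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]  # 3 hidden neurons, 2 inputs
w_output = [[1.0, -1.0, 2.0]]                      # 1 output neuron, 3 hidden inputs
x = [0.3, -0.7]

two_layers = linear_layer(w_output, linear_layer(w_hidden, x))
one_layer = linear_layer(matmul(w_output, w_hidden), x)
print(two_layers, one_layer)  # the two results agree up to rounding
```

Putting a nonlinear activation such as the sigmoid between the layers breaks this equivalence. That is why hidden layers need one.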

18
Topology

MLP topology is said to be feed-forward because there are no backward (recurrent) connections
There can be an arbitrary number of hidden layers in an MLP
Adding too many hidden layers increases the computational complexity of the network
One hidden layer (with enough hidden neurons) is usually enough to make the MLP a universal approximator capable of approximating any continuous function

19
Hidden Layers

In some cases there may be many interdependencies among the input variables, and adding an extra hidden layer can be helpful
Adding hidden layers can sometimes reduce the total number of weights needed for a suitable approximation
An MLP with two hidden layers can approximate non-continuous functions as well

20
Hidden Neurons

Choosing the number of neurons in the hidden layer is an art that often depends on the AI designer's intuition and experience
The neurons in the hidden layer are needed to represent the problem knowledge internally
As the number of dimensions grows, the complexity of the decision surface (path through the hidden layer) increases
Basically, the output is positive on one side of the surface and negative on the other

21
Connections

Neurons can be fully connected to one another within and between layers
Neurons can also be sparsely connected and can even skip layers (e.g., straight from input to output)
Most MLPs are fully connected to simplify programming

22
Activation Function Properties

Derivable (known and computable derivative)
Continuous (derivative defined everywhere)
Complexity (nonlinear for higher order tasks)
Monotonic (derivative positive)
Bounded (activation output and its derivative are finite)
Polarity (bipolar preferred to positive)

23
Activation Functions

Activation functions for the input and output layers are usually one of the following:
  Step, Linear, Threshold logic, Sigmoid
Hidden layer activation functions might be one of the following:
  Sigmoid: $\mathrm{sig}(x) = \frac{1}{1 + e^{-\beta x}}$
  Hyperbolic tangent: $\tanh(x)$
  Bipolar sigmoid: $\mathrm{sigb}(x) = \frac{2}{1 + e^{-\beta x}} - 1$
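The activation functions and their derivatives (which backpropagation needs later) can be sketched in Python as follows. β is treated as a steepness parameter with a default of 1.0, which is an assumption. The sigmoid is written in a numerically stable form so that `math.exp` does not overflow for large |x|.

```python
import math


def sig(x: float, beta: float = 1.0) -> float:
    """Logistic sigmoid 1 / (1 + e^(-beta x)), output in (0, 1)."""
    z = beta * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for large negative z
    return e / (1.0 + e)


def sig_deriv(x: float, beta: float = 1.0) -> float:
    s = sig(x, beta)
    return beta * s * (1.0 - s)


def sigb(x: float, beta: float = 1.0) -> float:
    """Bipolar sigmoid 2 / (1 + e^(-beta x)) - 1, output in (-1, 1)."""
    return 2.0 * sig(x, beta) - 1.0


def sigb_deriv(x: float, beta: float = 1.0) -> float:
    s = sigb(x, beta)
    return 0.5 * beta * (1.0 - s * s)


def tanh_deriv(x: float) -> float:
    t = math.tanh(x)
    return 1.0 - t * t
```

Note that sigb(x) equals tanh(βx/2), so the bipolar sigmoid is a rescaled hyperbolic tangent. Each derivative can be computed from the function's own output, which keeps backpropagation cheap.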

24
Role of Hidden Layers

The use of a hidden layer implies that the information needed to compute the output must be filtered before passing it on to the next layer
Each layer of the MLP receives its input from the previous layer and passes its modified output on to the next layer

25
Feed-Forward Algorithm

current = input                      // process input layer
for layer = 1 to n
    for i = 1 to m                   // compute output of each neuron
        // multiply arrays and sum result
        s = NetSum(neuron[i].weights, current)
        output[i] = Activate(s)
    // next layer uses this layer's output as input
    current = output
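Here is a runnable Python version of the feed-forward loop. The weight layout is an assumption, since the pseudocode leaves it open: each neuron has one row of weights, with the bias weight stored last and fed by a constant input of 1.

```python
import math
from typing import Callable, List

Layer = List[List[float]]  # one row of weights per neuron; last entry = bias weight


def net_sum(weights: List[float], inputs: List[float]) -> float:
    """Multiply arrays and sum the result, plus the bias weight."""
    if len(weights) != len(inputs) + 1:
        raise ValueError("each neuron needs one weight per input plus a bias")
    return sum(w * x for w, x in zip(weights, inputs)) + weights[-1]


def feed_forward(layers: List[Layer], inputs: List[float],
                 activate: Callable[[float], float]) -> List[float]:
    current = list(inputs)  # process input layer
    for layer in layers:
        # compute the output of each neuron in this layer
        output = [activate(net_sum(neuron, current)) for neuron in layer]
        current = output  # next layer uses this layer's output as input
    return current


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


network = [
    [[1.0, 2.0, 0.5], [-1.0, 0.0, 0.0]],  # hidden layer: 2 neurons, 2 inputs each
    [[1.0, 1.0, -1.0]],                   # output layer: 1 neuron, 2 inputs
]
print(feed_forward(network, [1.0, 1.0], sigmoid))
```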

26
Benefits of MLP

The importance of MLPs is not that they really mimic animal brains; they do not
MLPs have a thoroughly researched mathematical foundation and have been proven to work well in some applications
MLPs can be trained to do interesting things, and this training really just involves numeric optimization (minimizing output error)

27
Back Propagation - 1

BP is the process of filtering error from the output layer back through the preceding layers
BP was developed in response to the fact that single layer perceptron algorithms cannot train hidden layers
BP is the essence of most MLP learning algorithms

29
Back Propagation - 2

A form of hill climbing known as gradient ascent (equivalently, gradient descent on the output error)
  several directions are tried simultaneously
  the steepest gradient is used to direct the search
Training may require thousands of backpropagations
BP can get stuck or become unstable during training
BP can be done in stages

30
Back Propagation - 3

BP can train a net to recognize several concepts simultaneously
Trained neural networks can be used to make predictions
Too many trainable weights relative to the number of training facts can lead to overfitting problems

31
Back Propagation Algorithm - 1

Given: a set of input-output pairs
Task: compute weights for a 3-layer network that maps inputs to corresponding outputs
Algorithm:
1. Determine the number of neurons required
2. Initialize weights to random values
3. Set activation values for threshold units

32
Back Propagation Algorithm - 2

4. Choose an input-output pair and assign activation levels to the input neurons
5. Propagate activations from the input neurons to the hidden layer neurons; for each hidden neuron:
   $h_j = \frac{1}{1 + e^{-\sum_i w1_{ij} x_i}}$
6. Propagate activations from the hidden layer neurons to the output neurons; for each output neuron:
   $o_j = \frac{1}{1 + e^{-\sum_i w2_{ij} h_i}}$

33
Back Propagation Algorithm - 3

7. Compute the error for the output neurons by comparing the desired pattern to the actual output
8. Compute the error for the neurons in the hidden layer
9. Adjust the weights between the hidden layer and the output layer
10. Adjust the weights between the input layer and the hidden layer
11. Go to step 4

34
Backprop - 1

// compute gradient in last layer neurons
for j = 1 to m
    delta[j] = deriv_activate(net_sum[j]) * (desired[j] - output[j])
for i = last - 1 down to first       // process layers
    for j = 1 to m
        total = 0
        for k = 1 to n
            total = total + delta[k] * weights[j][k]
        delta[j] = deriv_activate(net_sum[j]) * total

35
Backprop - 2

// steepest descent on the error gradient
// for each weight
for j = 1 to m
    for i = 1 to n
        // adjust weights using error gradient
        weight[j][i] = weight[j][i] + learning_rate * delta[j] * output[i]
// The generalized delta rule is used to
// compute each Δw_ij
// learning_rate is set by the knowledge engineer (KE)
// delta[j] is the gradient of neuron j's error
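The following compact Python sketch implements the 3-layer algorithm (steps 1 to 11) with the generalized delta rule from the pseudocode. It trains on XOR, a problem a single layer perceptron cannot solve. Several details are assumptions: the layer sizes, learning rate, epoch count and weight range. Instead of separate threshold units, each neuron gets a bias weight fed by a constant input of 1. With an unlucky initialization, backprop can still get stuck in a local minimum. `ThreeLayerNet` is a hypothetical name.

```python
import math
import random
from typing import List, Tuple


def sig(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerNet:
    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Step 2: random initial weights; the extra weight per neuron is its bias
        self.w1 = [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-1.0, 1.0) for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x: List[float]) -> Tuple[List[float], List[float], List[float]]:
        xb = list(x) + [1.0]  # constant bias input
        # Step 5: h_j = sig(sum_i w1_ij x_i)
        hb = [sig(sum(w * xi for w, xi in zip(row, xb))) for row in self.w1] + [1.0]
        # Step 6: o_j = sig(sum_i w2_ij h_i)
        o = [sig(sum(w * hi for w, hi in zip(row, hb))) for row in self.w2]
        return xb, hb, o

    def train_pair(self, x: List[float], desired: List[float], rate: float) -> None:
        xb, hb, o = self.forward(x)
        # Step 7: output deltas; the sigmoid derivative is o * (1 - o)
        d_out = [oj * (1.0 - oj) * (dj - oj) for oj, dj in zip(o, desired)]
        # Step 8: hidden deltas, back-propagated through the old w2 (bias excluded)
        d_hid = [hb[j] * (1.0 - hb[j]) * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j in range(len(hb) - 1)]
        # Step 9: hidden -> output weights
        for k, row in enumerate(self.w2):
            for j in range(len(row)):
                row[j] += rate * d_out[k] * hb[j]
        # Step 10: input -> hidden weights
        for j, row in enumerate(self.w1):
            for i in range(len(row)):
                row[i] += rate * d_hid[j] * xb[i]

    def mean_squared_error(self, data) -> float:
        total = 0.0
        for x, desired in data:
            _, _, o = self.forward(x)
            total += sum((d - oj) ** 2 for d, oj in zip(desired, o))
        return total / len(data)


XOR = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]), ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]

net = ThreeLayerNet(n_in=2, n_hidden=4, n_out=1, seed=1)
before = net.mean_squared_error(XOR)
order = list(XOR)
shuffler = random.Random(2)
for epoch in range(5000):
    shuffler.shuffle(order)  # Step 4: choose input-output pairs in random order
    for x, desired in order:
        net.train_pair(x, desired, rate=0.5)  # Step 11: repeat
after = net.mean_squared_error(XOR)
print(f"MSE before {before:.4f}, after {after:.4f}")
for x, _ in XOR:
    print(x, round(net.forward(x)[2][0], 3))
```

The hidden deltas are computed before any weights change, so steps 9 and 10 both use the weights from the current forward pass. This matches the order of the algorithm slides.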

36
Quick Propagation

Batch technique
Exploits locally adaptive techniques to adjust the step magnitude based on local parameters
Uses knowledge of higher-order derivatives (e.g., Newton's method)
Allows for better prediction of the slope of the curve and the location of minima
Weights are updated using a method similar to backprop

37
Quickprop - 1

// Requires two additional arrays for step and
// gradient: it remembers the last set of values
// New weight update replaces steepest descent
for j = 1 to m
    for i = 1 to n   // compute gradient and step
        new_gradient[j][i] = -delta[j] * input[i]
        new_step[j][i] = new_gradient[j][i] /
            (old_gradient[j][i] - new_gradient[j][i]) * old_step[j][i]

38
Quickprop - 2

// adjust weight
weight[j][i] = weight[j][i] + new_step[j][i]
// store values for next iteration
old_step[j][i] = new_step[j][i]
old_gradient[j][i] = new_gradient[j][i]
Note: since this is a batch algorithm, the gradients for all training samples are added together
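Here is a one-weight Python sketch of the quickprop update from the Quickprop slides. It fits w in d = w·x by minimizing the summed squared error over all samples. Two details are assumptions borrowed from common quickprop practice rather than taken from the slides. First, the very first step (when there is no previous step) falls back to steepest descent with a small `rate`. Second, each step is capped at `mu` = 1.75 times the previous one.

```python
from typing import Callable


def quickprop(grad: Callable[[float], float], w: float, rate: float = 0.01,
              mu: float = 1.75, iterations: int = 20) -> float:
    """Minimize a 1-D error with the step new_step = g / (g_old - g) * old_step."""
    old_gradient = 0.0
    old_step = 0.0
    for _ in range(iterations):
        gradient = grad(w)  # batch gradient: summed over all training samples
        if old_step == 0.0 or old_gradient == gradient:
            step = -rate * gradient  # bootstrap / degenerate case: steepest descent
        else:
            step = gradient / (old_gradient - gradient) * old_step
            limit = mu * abs(old_step)  # maximum growth factor (assumed safeguard)
            if abs(step) > limit:
                step = limit if step > 0 else -limit
        w += step  # adjust weight
        old_step, old_gradient = step, gradient  # store values for next iteration
    return w


XS = [1.0, 2.0, 3.0]
DS = [2.0 * x for x in XS]  # targets generated with weight 2


def error_gradient(w: float) -> float:
    """dE/dw for E(w) = sum (d - w x)^2, summed over all samples (batch)."""
    return sum(-2.0 * (d - w * x) * x for x, d in zip(XS, DS))


print(quickprop(error_gradient, w=0.0))
```

Because this error is quadratic, the secant estimate of the minimum is exact, and the weight reaches 2 after a few iterations. In a real network, each weight would be updated this way independently.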

39
Resilient Propagation

Weights are updated only after all training samples have been seen
Unlike steepest descent techniques, the step size is not determined by the magnitude of the gradient; only its sign is used
The equations are not too hard to implement

40
Rprop - 1

// New weight update replaces steepest descent
for j = 1 to m
    for i = 1 to n   // compute gradient and step
        new_gradient[j][i] = -delta[j] * input[i]
        // analyze change to get size of update
        if (new_gradient[j][i] * old_gradient[j][i] > 0)
            new_update[j][i] = nplus * old_update[j][i]
        else if (new_gradient[j][i] * old_gradient[j][i] < 0)
            new_update[j][i] = nminus * old_update[j][i]
        else
            new_update[j][i] = old_update[j][i]

41
Rprop - 2

// determine step direction
if (new_gradient[j][i] > 0)
    step[j][i] = -new_update[j][i]
else if (new_gradient[j][i] < 0)
    step[j][i] = new_update[j][i]
else
    step[j][i] = 0
// adjust weight and store values
weight[j][i] = weight[j][i] + step[j][i]
old_update[j][i] = new_update[j][i]
old_gradient[j][i] = new_gradient[j][i]
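The following Python sketch applies the Rprop update to a two-weight linear neuron (a weight and a bias) fitted to targets d = 2x + 1. The slides do not fix the constants, so common choices are assumed here: increase/decrease factors nplus = 1.2 and nminus = 0.5, an initial update of 0.1, and clamping bounds on the update size.

```python
from typing import Callable, List


def rprop(grad: Callable[[List[float]], List[float]], params: List[float],
          iterations: int = 300, initial_update: float = 0.1,
          nplus: float = 1.2, nminus: float = 0.5,
          update_min: float = 1e-9, update_max: float = 50.0) -> List[float]:
    params = list(params)
    old_gradient = [0.0] * len(params)
    update = [initial_update] * len(params)
    for _ in range(iterations):
        gradient = grad(params)  # batch gradient over all training samples
        for i, g in enumerate(gradient):
            # analyze the sign change to get the size of the update
            change = g * old_gradient[i]
            if change > 0:
                update[i] = min(update[i] * nplus, update_max)
            elif change < 0:
                update[i] = max(update[i] * nminus, update_min)
            # step direction uses only the sign of the gradient
            if g > 0:
                params[i] -= update[i]
            elif g < 0:
                params[i] += update[i]
            old_gradient[i] = g
    return params


XS = [-2.0, -1.0, 0.0, 1.0, 2.0]
DS = [2.0 * x + 1.0 for x in XS]  # targets from weight 2, bias 1


def linear_gradient(p: List[float]) -> List[float]:
    """Gradient of E = sum (d - (w x + b))^2 with respect to (w, b)."""
    w, b = p
    g_w = sum(-2.0 * (d - (w * x + b)) * x for x, d in zip(XS, DS))
    g_b = sum(-2.0 * (d - (w * x + b)) for x, d in zip(XS, DS))
    return [g_w, g_b]


weight, bias = rprop(linear_gradient, [0.0, 0.0])
print(f"weight={weight:.4f}  bias={bias:.4f}")
```

Only the sign of each gradient component steers the step, and its magnitude never enters the update. This is what distinguishes Rprop from steepest descent.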

42
Building Neural Networks

Define the problem in terms of neurons
think in terms of layers
Represent information as neurons
operationalize neurons
select their data type
locate data for testing and training
Define the network
Train the network
Test the network

43
Structuring the Training Facts

Use randomly ordered facts
Use representative data
  include people who survive surgery as well as people who do not
Neurons can't be coded
  e.g., 1 = horse1, 2 = horse2, etc.
Networks like lots of inputs and outputs
It is better to use two output neurons (one for buy and one for sell) than one coded 1 = buy and 0 = sell
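Here is a small Python sketch of the alternative to coded neurons. Inputs get one neuron per category (one-hot encoding). Outputs get one neuron per decision, read back by picking the most active neuron. The helper names are hypothetical.

```python
from typing import List, Sequence


def one_hot(category: str, categories: Sequence[str]) -> List[float]:
    """One input (or output) neuron per category instead of one coded neuron."""
    if category not in categories:
        raise ValueError(f"unknown category: {category!r}")
    return [1.0 if c == category else 0.0 for c in categories]


def decode(outputs: Sequence[float], categories: Sequence[str]) -> str:
    """Pick the category whose output neuron is most active."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return categories[best]


HORSES = ["horse1", "horse2", "horse3"]
print(one_hot("horse2", HORSES))            # [0.0, 1.0, 0.0]
print(decode([0.2, 0.9], ["buy", "sell"]))  # sell
```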

44
Structuring the Training Facts

For historical data, use rows, not columns
  don't use:
    day1  day2  day3
    3     4     5
  do use:
    day
    3
    4
    5

45
Structuring the Training Facts

Neural networks prefer differences over big numbers
  use 50
  not 350 vs 400
For seasonal data
  use 1 column per month, with winter cases coded 1 for Dec, Jan, and Feb, and 0 for other months
Think qualitatively, not quantitatively
  use "restaurant visit on Monday in early Feb"
  not "restaurant visit on day 43"
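Here is a Python sketch of these preprocessing ideas. The early/mid/late cut-offs at day 10 and day 20 and the feature names are assumptions for illustration. String-valued features such as `part_of_month` would still need one-hot encoding before reaching the network.

```python
from datetime import date
from typing import Dict, Union

WINTER_MONTHS = {12, 1, 2}


def price_change(previous: float, current: float) -> float:
    """Feed the network the difference (400 - 350 = 50), not the two big numbers."""
    return current - previous


def qualitative_features(day: date) -> Dict[str, Union[int, float, str]]:
    """Describe a day qualitatively instead of as 'day 43 of the year'."""
    if day.day <= 10:
        part = "early"
    elif day.day <= 20:
        part = "mid"
    else:
        part = "late"
    return {
        "weekday": day.weekday(),  # 0 = Monday ... 6 = Sunday
        "month": day.month,
        "part_of_month": part,
        "winter": 1.0 if day.month in WINTER_MONTHS else 0.0,
    }


print(price_change(350.0, 400.0))               # 50.0
print(qualitative_features(date(2024, 2, 5)))   # a Monday in early February
```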

46
Generalization 1

The learning phase is responsible for optimizing the weights from the training examples
It would be good if the NN could also process new or unseen examples correctly (generalization)
If the NN is bound too tightly to the training examples, this is known as overfitting
Overfitting is never a problem with single layer perceptrons

47
Generalization 2

For an MLP, the number of hidden neurons affects the complexity of the decision surface
We need to find the trade-off between the number of hidden neurons and result quality
Incorrect or incomplete data interferes with generalization
Bad training examples are usually to blame when an MLP fails to learn concepts

48
Testing and Validation

Training sets are used to optimize the weights for a given set of parameters
Validation sets are used to check the quality of training and help find the best combination of parameters
Testing sets check the final quality of validated perceptrons (no test information is used to improve the NN)
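Here is a minimal Python sketch of the three-way split. It assumes the facts are shuffled once with a fixed seed and that 20% go to validation and 20% to testing. These fractions are illustrative; the slides do not prescribe them.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_facts(facts: Sequence[T], validation_fraction: float = 0.2,
                test_fraction: float = 0.2, seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once, then carve out disjoint test, validation, and training sets."""
    shuffled = list(facts)
    random.Random(seed).shuffle(shuffled)  # randomly ordered facts
    n_test = int(len(shuffled) * test_fraction)
    n_validation = int(len(shuffled) * validation_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_validation]
    training = shuffled[n_test + n_validation:]
    return training, validation, test


training, validation, test = split_facts(list(range(100)))
print(len(training), len(validation), len(test))  # 60 20 20
```

The weights are tuned on `training`, and candidate settings such as the hidden neuron count or learning rate are compared on `validation`. `test` is consulted only once, at the end.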

49
How can you tell things aren't working out?

Your network refuses to learn 10-20% of the training facts
Things to try:
  check the definition file for data range errors
  check for bad (incorrect) facts
  some training facts may conflict with one another
  the training tolerance level may be too strict for the data being used
  switch from absolute scores to differences

50
Batch vs Incremental

Batch training is preferred over incremental training
  it converges to an answer faster
  it has greater accuracy
Incremental data can be gathered for batch processing if necessary
Incremental approaches are best suited for real-time, in-game learning (they require less memory)

51
Forgetting

With incremental learning, it may be wise to slow down the learning rate later in the game to avoid forgetting earlier lessons
There is no formal approach to reducing the learning rate; linear or exponential decay strategies are often successful
This implies that learning will eventually become frozen as time passes
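Here are two simple learning-rate schedules, sketched in Python. The function names and the half-life form of the exponential schedule are assumptions. The linear schedule reaches zero at `total_steps`, which is the point where learning becomes frozen. The exponential schedule only approaches zero.

```python
def linear_decay(initial_rate: float, step: int, total_steps: int) -> float:
    """Decrease the rate linearly, reaching 0 (frozen learning) at total_steps."""
    fraction = min(step / total_steps, 1.0)
    return initial_rate * (1.0 - fraction)


def exponential_decay(initial_rate: float, step: int, half_life: float) -> float:
    """Halve the rate every half_life steps; it approaches 0 but never reaches it."""
    return initial_rate * 0.5 ** (step / half_life)


for step in (0, 250, 500, 1000):
    print(step, linear_decay(0.1, step, 1000), exponential_decay(0.1, step, 250))
```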

52
Perceptron Advantages

Good mathematical foundation
If a solution exists, it can be found
Work best for well-defined problems
If things go wrong, the parameters can be adjusted
Lots of training algorithms exist
MLPs work easily with continuous values
Deal well with noise

53
Perceptron Disadvantages - 1

NNs do not contain an easily understood representation of their knowledge
An MLP depends entirely on the algorithms used to create it
MLPs do not scale well
Once trained, an MLP is not updated without retraining
Retraining does not preserve previous MLP knowledge

54
Perceptron Disadvantages - 2

Design of inputs and outputs can have a profound impact on MLP success
Input may require pre-processing and outputs may require post-processing
Getting the right number of layers and neurons requires trial and error

55
Onno

Uses a large neural network to handle shooting (prediction, target selection, aiming)
Input is similar to that described in previous chapters
Results are moderate, but it demonstrates the versatility of MLPs and the benefits of decomposing behaviors

56
Why haven't there been more NN commercial successes?

Programming neural networks is very difficult
  each constraint must be hardwired with $O(N^2)$ lateral inhibitory and $O(N^3)$ diagonal excitatory connections
Learning in NNs is hard
  learning algorithms are hard to write
  choosing the right knowledge representation in the hidden layer is non-trivial

57
Why haven't there been more NN commercial successes?

For many applications, symbol-based knowledge is superior to circuit-based knowledge in terms of performance
Neural networks may have been oversold (as has been the problem with many early AI technologies)
