Title: Neural Networks
1Neural Networks
2Pattern Recognition
- Humans are very good at recognition. It is easy
for us to identify the Dalmatian dog in the image.
- This recognition capability would be very
difficult to implement in a program.
3Biological Neurons
- The human body is made up of trillions of cells.
Cells of the nervous system, called nerve cells
or neurons, are specialized to carry "messages"
through an electrochemical process. The human
brain has approximately 100 billion neurons.
http://faculty.washington.edu/chudler/cells.html
4A Tour of our Neural Circuit
From brain to Neurons
Nerves Flash
Communications between Neurons
http://www.learner.org/channel/courses/biology/video/hires/a_neuro1.c.synapse.mov
http://www.onintelligence.org/forum/viewtopic.php?t=173&sid=b0e0b92b35f74c1cdc21adbce6302b60
5Neurons come in many different shapes and sizes.
Some of the smallest neurons have cell bodies
that are only 4 microns wide. Some of the biggest
neurons have cell bodies that are 100 microns
wide.
1 micron is equal to one thousandth of a
millimeter!
6ANN
- Although neural networks are the natural form of
information processing mechanism, each cell has
very little processing power. Cells just
accumulate information and pass it on.
- The human body has on the order of 10^10 neurons
with 10100 connections between them. Their
processing cycle time is on the order of 1
millisecond. Their power comes from the extent of
the network and the fact that they are all
operating in parallel.
- In computer terms we can think of 10 billion
simple CPUs processing 10 billion times 10
billion variables once every millisecond.
Modern computers and modern ANNs do not even
begin to approach this level of complexity.
7Artificial Neural Networks
- adaptive sets of interconnected simple
biologically-inspired units which operate in some
parallel and distributed mode to perform some
common global task
- Connectionism, PDP networks, Neural Computing,
Empirical Learning Systems...
8Artificial Neural Networks
- Neural nets are quantitative and numerical, and don't
require a knowledge engineer to extract expert
information.
- Neural networks are inductive programs: they take
in a great amount of information all at once and
then draw a conclusion.
9NN Features
- Learning ability
- inherent parallelism
- distributed mode of operation
- simplicity of unit behavior
- absence of centralized control
10Components borrowed from the biological neuron
- soma
- axon
- dendrites
- synapse
- neuro-transmitters
Could receive excitatory/inhibitory nerve impulses
11The computational architecture borrowed several
components and functionalities from the
biological neuron
- Soma: cell body
- Axon: output link
- Dendrites: input link
- Synaptic junction (synapse): connects the axon of
one neuron to various parts of other neurons
- Neurotransmitters: chemicals released by the
presynaptic cells to communicate with other neurons
- Nerve impulses arriving through these connections
can result in local changes in the potential in
the cell body of the receiving neuron.
- Excitatory: decreasing the polarization of the
cell
- Inhibitory: increasing the polarization of the
cell
12the artificial neuron
Input
Output (Activation)
connection weights
13Activation/ Squashing/Transfer Function
Activation functions: logistic function,
hyperbolic tangent, etc.
14Neural Network Models
- Perceptron
- Hopfield Networks
- Bi-Directional Associative Memory
- Self-Organizing Maps
- Neocognitron
- Adaptive Resonance Theory
- Boltzmann Machine
- Radial Basis Function Networks
- Cascade-Correlation Networks
- Reduced-Coulomb Energy Networks
- Multi-layered Feed-forward Network
15Various NN Architectures
16Learning
- Supervised: requires input-output pairs for
training
- Unsupervised: only inputs are given; the network is able
to organize itself in response to external stimuli
17A Simple Kohonen Network
Neural Network Architecture with Unsupervised
Learning
[Figure: a 4x4 lattice of nodes, each with a weight vector, fed by the input nodes that hold the input vector]
18SOM for Color Clustering
Unsupervised learning
Reduces dimensionality of information
Clustering of data
Topological relationship between data is
maintained
Input: 3D, output: 2D
Vector quantisation
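A minimal sketch of one self-organizing map training step for the colour example, assuming a 4x4 lattice of 3-D weight vectors. The function names, the square neighbourhood and the fixed learning rate are simplifying assumptions for illustration; practical SOMs usually shrink both the rate and the neighbourhood over time.

```c
#include <stddef.h>

#define ROWS 4
#define COLS 4
#define DIM  3   /* e.g. one RGB colour */

/* Index of the node whose weight vector is closest to the input
 * (squared Euclidean distance). */
size_t best_matching_unit(double w[ROWS * COLS][DIM], const double *x)
{
    size_t best = 0;
    double best_d = -1.0;
    for (size_t n = 0; n < ROWS * COLS; ++n) {
        double d = 0.0;
        for (size_t k = 0; k < DIM; ++k) {
            double diff = w[n][k] - x[k];
            d += diff * diff;
        }
        if (best_d < 0.0 || d < best_d) {
            best_d = d;
            best = n;
        }
    }
    return best;
}

/* Move the winner and its lattice neighbours (at most `radius` grid steps
 * away in each direction) a fraction `rate` of the way towards the input. */
void som_update(double w[ROWS * COLS][DIM], const double *x,
                double rate, int radius)
{
    size_t bmu = best_matching_unit(w, x);
    int br = (int)(bmu / COLS), bc = (int)(bmu % COLS);
    for (int r = 0; r < ROWS; ++r) {
        for (int c = 0; c < COLS; ++c) {
            int dr = r > br ? r - br : br - r;
            int dc = c > bc ? c - bc : bc - c;
            if (dr > radius || dc > radius)
                continue;
            for (size_t k = 0; k < DIM; ++k)
                w[r * COLS + c][k] += rate * (x[k] - w[r * COLS + c][k]);
        }
    }
}
```

Because neighbours are pulled along with the winner, nearby lattice nodes end up with similar weight vectors, which is how the topological relationship between the data is kept.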
19Character Recognition
Pattern Classification Network
5 output nodes
16 hidden nodes
100 input nodes
C\NN\FFPR
20Multi-layer Feed-forward Network
What are the components of a Network?
How does a network respond to a stimulus?
How does a network learn or train?
21Neural Network Architecture
- Multi-layer Feed-forward Network
[Figure: multi-layer feed-forward network. Layer 1: input nodes (e.g. temperature, pressure, color, age, valve status, etc.); Layers 2 and 3: hidden nodes; Layer 4: output node. Each connection carries a weight.]
22Multi-layer Feed-forward Network Sample
Three-layered Network (2-1-1) for solving the XOR
Problem
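[Figure: the network evaluated for inputs x = 1, y = 0. Input layer: x = 1, y = 0. Hidden layer: unit h with weights 7.1 from x and 7.1 from y, bias weight -2.76, activation 0.98. Output layer: unit z with weights -4.95 from x and -4.95 from y, weight 10.9 from h, bias weight -3.29, activation 0.91. Each bias unit has the value 1.]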
23Multi-layer Feed-forward Network
Components of a Network
Circles represent neurons (also called units or
nodes), which are extremely simple analog
computing devices.
The numbers within the circles are the
activation values of the units.
There are three layers: the input layer, which
holds the values of x and y; a hidden layer
with one node, h; and an output layer with one
unit, which gives the output value, z.
24Multi-layer Feed-forward Network
There are two other units present, called bias
units, whose values are always 1.0.
The lines connecting the circles represent
weights, and the number beside a weight is the
value of the weight.
Much of the time backpropagation networks only
have connections between adjacent layers; however,
this one has two extra connections that go
directly from the input units to the output unit.
In some problems, like XOR, these extra
input-output connections make training the
network much faster.
25Evaluating a Network
Networks are usually just described by the number
of units in each layer, so the network in the
figure can be described as a 2-1-1 network with
extra input-output connections, or 2-1-1-x.
To compute the value of the output unit, z, we
place values for x and y on the input layer
units, say x = 1.0 and y = 0.0, then propagate
the signals up to the next layer.
For the hidden node h, find all lower level units
connected to it. For each of the connections,
multiply the weight attached to the link by the
value of the unit and sum them all up. With the
weights in the figure this gives
1.0 × 7.1 + 0.0 × 7.1 + 1.0 × (-2.76) = 4.34.
26Evaluating a Network
In some neural networks we might just leave the
activation value of the unit at 4.34. In this
case we would say that we are using the linear
activation function; however, backprop is at its
best when this value is passed to certain types
of nonlinear functions.
The most commonly used nonlinear function is the
standard sigmoid,
$$ v = \frac{1}{1 + e^{-s}} $$
where s is the sum of the inputs to the neuron
and v is the value of the neuron. Thus, with s =
4.34, v = 0.987. The output unit z is evaluated in
the same way; the figure shows its value as 0.91.
Of course, 0.91 is not quite 1, but for this
example it is close enough. When using this
particular activation function for a problem
where the output is supposed to be a 0 or 1,
getting the output to within 0.1 of the target
value is a very common standard.
27Evaluating a Network
With this particular activation function it is
actually somewhat hard to get very close to 1 or
0, because the function only approaches 1 and 0 as
the input to the function approaches +∞ and -∞.
[Figure: plot of the standard sigmoid]
The other values the network computes for the XOR
function are
28Evaluating a Network
Computing the activation value for a neuron
The formulas for computing the activation value
for a neuron, j, can be written more concisely as
follows. Let the activation value for neuron j
be $o_j$. Let the activation function be the
general function, f. Let the weight between
neuron j and neuron i be $w_{ij}$. Let the net input
to neuron j be $\mathrm{net}_j$; then
$$ \mathrm{net}_j = \sum_{i=1}^{n} w_{ij}\, o_i $$
where n is the number of units feeding into unit
j, and
$$ o_j = f(\mathrm{net}_j) $$
29Now, let's look more closely to see how a network
is trained
30Training a Network
BACKPROPAGATION TRAINING
We will now look at the formulas for adjusting
the weights that lead into the output units of a
backpropagation network. The actual activation
value of an output unit, k, will be $o_k$ and the
target for unit k will be $t_k$. First of all
there is a term in the formula for $\delta_k$, the error
signal,
$$ \delta_k = (t_k - o_k)\, f'(\mathrm{net}_k) $$
where $f'$ is the derivative of the activation
function, f. If we use the usual activation
function,
$$ f(\mathrm{net}_k) = \frac{1}{1 + e^{-\mathrm{net}_k}} $$
the derivative term is
$$ f'(\mathrm{net}_k) = o_k\,(1 - o_k) $$
31Training a Network
BACKPROPAGATION TRAINING
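The formula to change the weight, $w_{jk}$, between the
output unit, k, and unit j is
$$ w_{jk} \leftarrow w_{jk} + \eta\, \delta_k\, o_j $$
where $\eta$ is some relatively small positive
constant called the learning rate. With the
network given, assuming that all weights start
with zero values, and with $\eta = 0.1$, the pattern
x = 1.0, y = 0.0 (target 1) gives an activation of
0.5 on both the hidden and the output unit, so we have
$$ \delta_z = (1 - 0.5) \times 0.5 \times (1 - 0.5) = 0.125 $$
and the weights into the output unit become
bias weight: 0 + 0.1 × 0.125 × 1.0 = 0.0125
weight from h: 0 + 0.1 × 0.125 × 0.5 = 0.00625
weight from x: 0 + 0.1 × 0.125 × 1.0 = 0.0125
weight from y: 0 + 0.1 × 0.125 × 0.0 = 0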
32Training a Network
BACKPROPAGATION TRAINING
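The formula for computing the error $\delta_j$ for a
hidden unit, j, is
$$ \delta_j = f'(\mathrm{net}_j) \sum_k \delta_k\, w_{jk} $$
The k subscript is for all the units in the
output layer; however, in this example there is
only one unit. In the example, then, with the
hidden-to-output weight still at its starting value of zero,
$$ \delta_h = o_h (1 - o_h)\, \delta_z\, w_{hz} = 0.5 \times (1 - 0.5) \times \delta_z \times 0 = 0 $$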
33Training a Network
BACKPROPAGATION TRAINING
The weight change formula for a weight, $w_{ij}$, that
goes between the hidden unit, j, and the input
unit, i, is essentially the same as before:
$$ w_{ij} \leftarrow w_{ij} + \eta\, \delta_j\, o_i $$
The new weights will be
34Training a Network
BACKPROPAGATION TRAINING
The activation value for the output layer will
now be 0.507031. If we now do the same for the
other three patterns the output will be
Sad to say, getting the outputs to within 0.1
requires 20,862 iterations, a very long time
especially for such a short problem. Fortunately
there are a large number of things that can be
done to speed up the training, and the time to do
the XOR problem can be reduced to around 12-20
iterations or so. The very simplest thing to do
is to increase the learning rate, η. The
following table shows how many iterations are
used for different values of η.
35Training a Network
BACKPROPAGATION TRAINING
Another unfortunate problem with backprop is that
when the learning rate is too large the training
can fail, as it did in the case when η = 3.0.
Here, after 10,000 iterations the results were
where the output for the last pattern is 1, not 0.
The geometric interpretation of this problem is
that when the network tries to make the error go
down, the network may get stuck in a valley that
is not the lowest possible valley.
When backprop starts at point A and tries to
minimize the error, you hope the process will stop
when it hits the low point at B; however, you could
get unlucky and hit the not so low point at C.
The low point is a global minimum and the not so
low point is a local minimum.
36Backpropagation Training
- Iterative minimization of error over training set
1. Put one of the training patterns to be learned on
the input units.
2. Find the values for the hidden unit and output
unit.
3. Find out how large the error is on the output
unit.
4. Use one of the back-propagation formulas to
adjust the weights leading into the output unit.
5. Use another formula to find out errors for the
hidden layer unit.
6. Adjust the weights leading into the hidden layer
unit via another formula.
7. Repeat steps 1 through 6 for the second, third
patterns, ...
37Training Neural Nets
- Given: a data set, desired outputs and a neural
net with m weights. Find a setting for the
weights that will give good predictive
performance on new data. Estimate expected
performance on new data.
1. Split data set (randomly) into three subsets:
- Training set: used for picking weights
- Validation set: used to stop training
- Test set: used to evaluate performance
2. Pick random, small weights as initial values.
3. Perform iterative minimization of error over
training set.
4. Stop when error on validation set reaches a
minimum (to avoid overfitting).
5. Repeat training (from Step 2) several times
(avoid local minima).
6. Use best weights to compute error on test set,
which is an estimate of performance on new data. Do
not repeat training to improve this.
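The random three-way split could be done along these lines. This is only a sketch: the helper names are invented, the fractions are whatever the caller chooses, and `rand()` is used for brevity even though its modulo reduction is slightly biased.

```c
#include <stdlib.h>
#include <stddef.h>

/* Fill idx[0..n-1] with a random permutation of 0..n-1 (Fisher-Yates shuffle). */
void shuffle_indices(size_t *idx, size_t n, unsigned seed)
{
    srand(seed);
    for (size_t i = 0; i < n; ++i)
        idx[i] = i;
    for (size_t i = n; i > 1; --i) {
        size_t j = (size_t)rand() % i;   /* pick a position in 0..i-1 */
        size_t tmp = idx[i - 1];
        idx[i - 1] = idx[j];
        idx[j] = tmp;
    }
}

/* Sizes of the three subsets: the first *n_train shuffled examples are the
 * training set, the next *n_valid the validation set, the rest the test set. */
void split_sizes(size_t n, double train_frac, double valid_frac,
                 size_t *n_train, size_t *n_valid, size_t *n_test)
{
    *n_train = (size_t)(train_frac * (double)n);
    *n_valid = (size_t)(valid_frac * (double)n);
    *n_test  = n - *n_train - *n_valid;
}
```

Training then reads examples `idx[0]` to `idx[n_train - 1]`, early stopping watches the next `n_valid` entries, and the remaining entries are touched only once for the final error estimate.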
38BACKPROPAGATION TRAINING
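[Figure: three-layer network with input unit i (activation Oi), hidden unit j (activation Oj) and output unit k (activation Ok); weight Wij links input to hidden, Wjk links hidden to output, and Wik links input directly to output]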
A summary of all the formulas can be viewed in
Backprop Formulas.
39Duration of training
- 1 training cycle = feedforward propagation +
retropropagation
- Each training cycle is repeated for each training
pattern (e.g. aggtccattacgctatatgcgacttc)
- 1 epoch = all training patterns have been
subjected to one training cycle each
- Neural network training usually takes many
training cycles (until the Sum of Squared Errors is
at an acceptable level)
- (NOTE: Sum of Squared Errors is used to gauge
the accuracy of the constructed neural network)
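A small sketch of how the Sum of Squared Errors over one epoch might be computed. The `predict` callback stands in for whatever network is being trained, and `untrained_net` is a placeholder that always answers 0.5; both are assumptions for illustration, as is the fixed input width of two.

```c
#include <stddef.h>

/* Sum of squared errors over one epoch, i.e. one pass in which every
 * training pattern is presented once. `predict` maps an input vector
 * to a single network output. */
double sum_squared_errors(double (*predict)(const double *input),
                          const double inputs[][2], const double *targets,
                          size_t n_patterns)
{
    double sse = 0.0;
    for (size_t p = 0; p < n_patterns; ++p) {
        double err = targets[p] - predict(inputs[p]);
        sse += err * err;
    }
    return sse;
}

/* Stand-in for an untrained network: every output is 0.5. */
double untrained_net(const double *input)
{
    (void)input;
    return 0.5;
}
```

For the four XOR patterns the stand-in gives an SSE of 1.0; a training loop would keep running epochs until this figure drops to an acceptable level.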
40Simulation
Let's see a working model of our Neural Network
41Three-layer Network to Solve the XOR Problem
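[Figure: the 2-1-1 XOR network with direct input-output connections, evaluated for inputs 1 and 0. Hidden layer: weights 7.1 and 7.1 from the inputs, bias weight -2.76, activation 0.98. Output layer: weights -4.95 and -4.95 from the inputs, weight 10.9 from the hidden unit, bias weight -3.29, activation 0.91.]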
42Design Issues
- Number of Nodes
- Connection
- Learning Paradigm
Adjustment of weights
43Design Issues
Input/output nodes: easy to determine
Hidden nodes?
Number of hidden layers?
Bias units?
44Error Signal Formulas
1. Standard Backpropagation
$$ e_i = (d_i - a_i)\, a_i\, (1 - a_i) $$
2.
Cross entropy error formula
log
(d
)
d
( 1
-
)
e
/
a
((1
-
)/(1
-
a
))
a
a
2
i
i
i
i
i
i
i
3. Normalized Exponential Error Formula
$$ e_i = d_i - a_i $$
where
$a_i$ is the actual output signal,
$d_i$ is the desired output signal,
$e_i$ is the error signal propagating back from each output unit.
45Multi-Layer Feed-Forward Neural Network
Why do we need BIAS UNITS?
Apart from improving the speed of learning for
some problems (XOR problem), bias units or
threshold nodes are required for universal
approximation. Without them, the feedforward
network always assigns 0 output to 0 input.
Without thresholds, it would be impossible to
approximate functions which assign nonzero output
to zero input. Threshold nodes are needed in
much the same way that the constant polynomial
1 is required for approximation by polynomials.
46Sigmoid Unit
47Data Sets
- Split data set (randomly) into three subsets
- Training set used for picking weights
- Validation set used to stop training
- Test set used to evaluate performance
48Training
49Input Representation
- All the signals in a neural net are in [0, 1].
Input values should also be scaled to this range
(or approximately so) so as to speed training.
50Input Representation
- All the signals in a neural net are in [0, 1].
Input values should also be scaled to this range
(or approximately so) so as to speed training.
- If the input values are discrete, e.g. A, B, C, D
or 1, 2, 3, 4, they need to be coded in unary form.
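Both input conventions can be sketched in a few lines of C. The function names are invented here, min-max scaling is one common way of reaching the unit range (the slides do not prescribe a method), and the midpoint returned for a degenerate range is an arbitrary choice.

```c
#include <stddef.h>

/* Min-max scaling of a raw value into [0, 1]; min and max are taken
 * from the training data. */
double scale_unit(double value, double min, double max)
{
    if (max == min)
        return 0.5;   /* degenerate range: arbitrary midpoint (assumption) */
    return (value - min) / (max - min);
}

/* Unary (one-of-n) coding: category k out of n becomes n inputs,
 * exactly one of which is 1. */
void encode_unary(size_t k, size_t n, double *out)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (i == k) ? 1.0 : 0.0;
}
```

For example, the third of four categories (k = 2, such as "C" in A, B, C, D) is presented to the network as 0, 0, 1, 0.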
51Output Representation
- A neural net with a single sigmoid output is
aimed at binary classification. Class is 0 if y
< 0.5 and 1 otherwise.
- For multi-class problems:
- Can use one output per class (unary encoding)
- There may be confusing outputs (two outputs > 0.5
in unary encoding)
- A more sophisticated method is to use special
softmax units, which force outputs to sum to 1.
52Target Value
- During training it is impossible for the outputs
to reach 0 or 1 (with finite weights)
- Customary to use 0.1 and 0.9 as targets
- But most termination criteria, e.g. small change
in training or validation error, will stop
training before targets are reached.
53Parameters
54Character Recognition
Pattern Classification Network
5 output nodes
16 hidden nodes
100 input nodes
C\NN\FFPR
55Character Recognition
Learning Curve for Training the Pattern
Classification Network
The chart depicts the learning curve, as the
error tolerance was successively lowered through
the values 0.01, 0.005, 0.0025, 0.001, 0.0005,
0.00025, and finally 0.0001.
The vertical axis shows the cumulative iterations
needed to achieve the perfect 5 out of 5
training performance at each level of error
tolerance.
56Training the Network
Tips for Training
Training Tip 1
Start with a relatively large error tolerance,
and incrementally lower it to the desired level
as training is achieved at each succeeding level.
This usually results in fewer training
iterations than starting out with the desired
final error tolerance.
Training Tip 2
If the network fails to train at a certain error
tolerance, try successively lowering the learning
rate.
57Training the Network
Tips for Training
Training Tip 3
In a system to be used for a real-world
application, such as character recognition, you
would want the network to be able to handle not
only pixel noise, but also size variance
(slightly smaller letters), some rotation, as
well as some variations in font style.
To produce such a robust network classifier, you
would need to add representative samples to the
training set. For example, samples of rotated
versions of a character could be added to the
training set, etc.
58Training a Network
The Momentum Term
Smooth out the effect of weight adjustments over
time.
In general, the formula is
Weight change = learning rate × input × output error
+ momentum_parameter × previous_weight_change
Momentum term can be disabled by setting it to
zero.
Warning! Setting the momentum term and learning
rate too large can overshoot a good minimum as it
takes large steps!
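The general momentum rule can be written as a small C helper. The function name and argument layout are assumptions made for this sketch; the caller keeps one `prev_change` value per weight.

```c
/* One weight update with momentum. Returns the new weight and stores the
 * change just applied in *prev_change so the next call can reuse it.
 * Setting momentum to 0 gives the plain update rule. */
double momentum_update(double weight, double learning_rate, double input,
                       double error, double momentum, double *prev_change)
{
    double change = learning_rate * input * error
                  + momentum * (*prev_change);
    *prev_change = change;
    return weight + change;
}
```

With a learning rate of 0.1, input 1.0, error 0.125 and momentum 0.9, the first change is 0.0125 and the second is 0.0125 + 0.9 × 0.0125 = 0.02375, which shows how repeated pushes in the same direction build up speed.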
59Training a Network
The Momentum Term
Smooth out the effect of weight adjustments over
time.
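More accurately,
$$ \Delta w_{ij}(t) = \eta\, \delta_j\, o_i + \alpha\, \Delta w_{ij}(t - 1) $$
where $\alpha$ is the momentum parameter.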
Momentum term can be disabled by setting it to
zero.
60Using Neural Networks in a Control Problem
Inverted Pendulum Problem
1 output node
20 hidden nodes
5 input nodes
Input: x, v, theta, angular velocity
Output: angle
See FFBRM.EXE
61Using Neural Networks in a Control Problem
Inverted Pendulum Problem (Dynamics of the
Problem)
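Formulas
$$ \ddot{\theta} = \frac{m g \sin\theta - \cos\theta\,\left(F + m_b\, l\, \dot{\theta}^2 \sin\theta\right)}{\tfrac{4}{3}\, m\, l - m_b\, l \cos^2\theta} $$
$$ \ddot{x} = \frac{F + m_b\, l\,\left(\dot{\theta}^2 \sin\theta - \ddot{\theta} \cos\theta\right)}{m} $$
$\theta(t)$: broom angle (with vertical) at time t (in
radians)
$\dot{\theta}(t)$: angular velocity
$x(t)$: cart position at time t (in meters)
$\dot{x}(t)$: cart velocity
$F(t)$: force applied to cart at time t (Newtons)
$m$: combined mass of cart and broom (1.1 kg)
$m_b$: mass of broom (0.1 kg)
$l$: length of broom (pivot point to center of
mass, 0.5 meters)
$g$: acceleration due to gravity
Input: x, v, theta, angular velocity
Output: angle
See FFBRM.EXE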
62Using Neural Networks in a Control Problem
Inverted Pendulum Problem (Limits of Network
Training)
The equations we've seen ignore the effects of
friction. Control system failure occurs when
the cart hits either end of the track (at x = ±2.4
meters), or the angle θ reaches π radians (180
degrees).
For practical purposes, though, unrecoverable
failure occurs when the angle θ reaches 12
degrees in magnitude, so the neural network
training will be restricted to this range.
State of the cart-broom system: x, x-dot, theta
θ, angular velocity θ-dot
63Using Neural Networks in a Control Problem
SIMULATION ISSUES
How can we simulate the complete behaviour of the
system now that we know how to derive all the
state variables describing the system?
What is the cart-broom state given any time t?
Euler's Method
Euler's method relates a variable and its
derivative via the simple approximation equation
$$ x(t + h) \approx x(t) + h\, \dot{x}(t) $$
where h is a small time step.
Euler's method has only its simplicity to
recommend it, and is normally not used when any
amount of accuracy is required of a numerical
solution. However, it is adequate for our needs.
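To make the approximation concrete, here is a generic Euler integrator in C applied to a toy equation, dx/dt = -x, rather than to the cart-broom system. The function names and the toy equation are chosen purely for illustration; the step size 0.02 matches the one used in the deck's own broom code.

```c
/* Integrate dx/dt = f(t, x) from (t0, x0) with a fixed step h for
 * `steps` steps using Euler's method. */
double euler_integrate(double (*f)(double t, double x),
                       double t0, double x0, double h, int steps)
{
    double t = t0, x = x0;
    for (int i = 0; i < steps; ++i) {
        x += h * f(t, x);   /* x(t + h) ~ x(t) + h * x'(t) */
        t += h;
    }
    return x;
}

/* Toy right-hand side: exponential decay, dx/dt = -x. */
double decay(double t, double x)
{
    (void)t;
    return -x;
}
```

Calling `euler_integrate(decay, 0.0, 1.0, 0.02, 50)` advances the solution to t = 1. The result is slightly below the exact value exp(-1), an error of well under 0.01, which illustrates both the method's inaccuracy and why it can still be good enough for a simulation like this one.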
64Using Neural Networks in a Control Problem
SIMULATION ISSUES
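Here's an application of Euler's method:
#include <math.h>
/* m, mb, l, g and four_thirds are constants defined elsewhere in the program. */
float f_theta (float frce, float th1, float th2)   /* angular acceleration of the broom */
{
    float denom, numer, cost, sint;
    cost = cos (th1);
    sint = sin (th1);
    denom = four_thirds * m * l - mb * l * cost * cost;   /* Always > 0 */
    numer = m * g * sint - cost * (frce + mb * l * th2 * th2 * sint);
    return numer / denom;
}
float f_x (float frce, float th1, float th2, float th3)   /* acceleration of the cart */
{
    float cost, sint, term;
    cost = cos (th1);
    sint = sin (th1);
    term = mb * l * (th2 * th2 * sint - th3 * cost);
    return (frce + term) / m;
}
Euler's Method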
65Using Neural Networks in a Control Problem
SIMULATION ISSUES
Here's an application of Euler's method:
void new_broom_state (float frce, state_rec old_state, state_rec *new_state)
{
    const float h = 0.02;   /* time step */
    float th3;              /* angular acceleration */
    /* Euler's method applied to system of equations */
    /* (not known for accuracy, but good enough here !) */
    new_state->theta = old_state.theta + h * old_state.theta_dot;
    th3 = f_theta (frce, old_state.theta, old_state.theta_dot);
    new_state->theta_dot = old_state.theta_dot + h * th3;
    new_state->x_pos = old_state.x_pos + h * old_state.x_dot;
    new_state->x_dot = old_state.x_dot + h * f_x (frce, old_state.theta, old_state.theta_dot, th3);
    return;
}
Euler's Method
66Using Neural Networks in a Control Problem
Feed-Forward Controller System
How can we teach the Network to learn broom
balancing?
Analogy to a Human Controller
Suppose that a human controller has no idea how
the cart-broom system is going to respond to
input forces. By randomly applying a force and
observing whether or not that force helps in
balancing the broom, eventually, the controller
may notice that pushing the cart in the same
direction that the broom is leaning slides
the cart back under the broom and tends to
restore a vertical angle.
With enough repetitions, the controller learns
how much force to supply and how often. An
expert controller can anticipate which way the
broom is going to fall and can apply a corrective
force before the broom goes far in that direction.
67Using Neural Networks in a Control Problem
Feed-Forward Controller System
Training the Network
The approach in teaching the network is similar
to the process of teaching a human controller.
The networks will learn the dynamics of the
cart-broom system by observing numerous
repetitions of random-force applications at
different broom states.
The trained network will then be a model of the
cart-broom dynamics. Like the human expert, the
network can predict the next broom state, and
with this knowledge the correct force can be
applied.
Collection of Training Data
Run the cart-broom system through 100 random
force applications and collect max-min data.
All input data should be normalized. Each
parameter will have its own min, max values.
68Using Neural Networks in a Control Problem
Feed-Forward Controller System (BANG-BANG Control)
Running the Network
The controller operates by looking ahead, like an
experienced controller would. The trained
network emulates the broom dynamics, so that the
controller can ask, "What will happen if I do
nothing (zero force)?"
The trained network answers this question by
supplying the broom angle that would result on
the next iteration if zero force were applied.
Once the angle is predicted, the appropriate
action can be taken.
69Normalisation of Input Parameters
1. From the training data, calculate max, min
values of each input parameter
2. xFactor
3. xDotFactor
70Normalisation of Input Parameters
4. thetaFactor
5. thetaDotFactor
6. forceFactor
71Normalisation of Input Parameters
Normalisation of the x position input
72Calculating the actual state
Given Normalised x position input, calculate
the actual state
73Running the FF-Network Controller (Bang-bang)
outAngle = (networkOutput - 0.5) * 2 * thetaFactor;   /* back to an actual angle */
if (fabs(outAngle) < zeroTheta) return 0;             /* close enough to vertical: no force */
if (outAngle > zeroTheta) return systemForceIncrement;
return -systemForceIncrement;
74Using Neural Networks in a Control Problem
Slight variation to the Inverted Pendulum Problem
1 output node
20 hidden nodes
5 input nodes
Output: force, direction
Input: x, v, theta, angular velocity
You could check all three possibilities with the
look-ahead network: zero, positive and negative
force. Then you could pick the force input that
resulted in the smallest output angle.
This is more computationally expensive, and in
tests it did not do any better than the zero
look-ahead strategy.
75Using Neural Networks in a Control Problem
Alternative Solution to the Inverted Pendulum
Problem
2 output nodes (force magnitude, direction)
16 (4x4) hidden nodes
4 input nodes
Output: force, direction
Input: x, v, theta, angular velocity
Noise factor could be added during training (e.g.
0.05) to make the network more robust.
76Advantages of Neural Nets
- Learning is inherent
- Uses examples for training
- Broad response capability
- Very powerful in handling noise and uncertainty
- Once the network is trained, execution is very
fast.
- Easy to manage and maintain
77Limitations of Neural Nets
- Functions like a black box.
- (no explanation facility)
[Figure: Input → ? → Output]
78Limitations of Neural Nets
- Paucity of available examples could deter
accuracy of learning
- Learns solely from examples
79Numerous Applications
- Character recognition
- Detection of stock market
patterns that presage interesting moves
- Classification of objects that cause SONAR
returns
- Classification of bioelectric signals (EKG,
EEG) into normal or pathological conditions
- Classification of lesions based on
photomicrographs
Let's see the NN solving the character
recognition problem
80Some observations (2004, MIT)
- Although Neural Nets kicked off the current phase
of interest in machine learning, they are
extremely problematic:
- Too many parameters (weights, learning rate,
momentum, etc.)
- Hard to choose the architecture
- Very slow to train
- Easy to get stuck in local minima
- Interest has shifted to other methods, such as
support vector machines, which can be viewed as
variants of perceptrons (with a twist or two).
81Fujitsu Robot Project HOAP Series
HOAP-2 is a fast learner because of how it was
designed. HOAP, or Humanoid for Open Architecture
Platform, represents a fundamentally different
approach to creating humanoid robots. Instead of
using the popular model-based approach to robot
motion control, it harnesses the power of a
neural network, processors that emulate the human
brain and how it learns, to tackle movements and
other tasks.
Feb 4th, 2005
This dynamically reconfigurable neural network,
the first of its kind developed by Fujitsu for
humanoid robots, speeds up and simplifies the
huge computational task of motion generation. The
neural network can also be expanded with little
effort and requires minimal software to run.
http://www.fujitsu.com/nz/about/rd/200506hoap-series.html
82Activation Functions
The sigmoid function varies between zero and one.
This is inconvenient in calculation because it cannot
represent negative values. It is more convenient to
use the tanh() function, which has the same S
shape (it is a rescaled and shifted sigmoid) but
swings between +/- 1.
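The relationship between the two functions can be made explicit. Writing the standard sigmoid as $\sigma(s)$ (a symbol introduced here for convenience), a short derivation shows that tanh is the sigmoid stretched to the range (-1, 1) with its input doubled:

```latex
\sigma(s) = \frac{1}{1 + e^{-s}}, \qquad
\tanh(s) = \frac{e^{s} - e^{-s}}{e^{s} + e^{-s}}
         = \frac{1 - e^{-2s}}{1 + e^{-2s}}
         = \frac{2}{1 + e^{-2s}} - 1
         = 2\,\sigma(2s) - 1
```

So a network using tanh units can represent the same functions as one using sigmoid units, only with outputs centred on zero.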