Title: Multi-Layer Feedforward Neural Networks
1. Multi-Layer Feedforward Neural Networks
- CAP 4630 Intro. to Artificial Intelligence
- Xingquan (Hill) Zhu
2. Outline
- Multi-layer Neural Networks
- Feedforward Neural Networks
- FF NN model
- Backpropagation (BP) Algorithm
- Practical Issues of FFNN
- FFNN for Face Recognition
3. Multi-Layer NN
- Between the input and output layers there are hidden layers, as illustrated below.
- Hidden nodes do not directly send outputs to the external environment.
- Multi-layer NNs overcome the limitation of a single-layer NN: they can handle non-linearly separable learning tasks.
[Figure: a network with an input layer, a hidden layer, and an output layer]
4. XOR Problem
Two classes, green and red, cannot be separated using one line, but they can be separated using two lines. The NN below, with two hidden nodes, realizes this non-linear separation, where each hidden node represents one of the two blue lines.
[Figure: the XOR input space divided by two lines, and a network with inputs x1, x2, two hidden nodes y1, y2, an output node z, and weights of +/-1]
5. Types of Decision Regions
- A network with a single node realizes a single separating line (a half-plane decision region).
- A one-hidden-layer network can realize a convex region: each hidden node realizes one of the lines bounding the convex region.
- A two-hidden-layer network can realize the union of three convex regions: each box represents a one-hidden-layer network realizing one convex region.
[Figure: example networks and their decision regions P1, P2, P3]
6. Different Non-Linearly Separable Problems

| Structure | Types of Decision Regions |
|---|---|
| Single-Layer | Half plane bounded by hyperplane |
| Two-Layer | Convex open or closed regions |
| Three-Layer | Arbitrary (complexity limited by no. of nodes) |

[Figure columns: each structure illustrated on the Exclusive-OR problem, on class separation, and on the most general region shapes]
7. Outline
- Multi-layer Neural Networks
- Feedforward Neural Networks
- FF NN model
- Backpropagation (BP) Algorithm
- BP rules derivation
- Practical Issues of FFNN
- FFNN for Face Recognition
8. FFNN Neuron Model
- The classical learning algorithm of FFNN is based on the gradient descent method.
- The activation functions used in FFNN are therefore continuous and differentiable everywhere, so the error is a differentiable function of the weights.
- A typical activation function is the sigmoid function.
9. FFNN Neuron Model
- A typical activation function is the sigmoid function with slope parameter a: φ(v) = 1 / (1 + e^(-a v)).
- When a approaches 0, φ tends to a linear function (it is approximately linear over a wide range of inputs).
- When a tends to infinity, φ tends to the step function.
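A minimal NumPy sketch of this parameterized sigmoid (the sample points are arbitrary):

```python
import numpy as np

def sigmoid(v, a=1.0):
    """Sigmoid activation with slope parameter a: phi(v) = 1 / (1 + exp(-a*v))."""
    return 1.0 / (1.0 + np.exp(-a * v))

v = np.linspace(-5.0, 5.0, 5)
print(sigmoid(v, a=0.1))   # small a: almost flat, approximately linear near v = 0
print(sigmoid(v, a=50.0))  # large a: approaches the 0/1 step function
```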
10. FFNN Model
- x_ij: the input from node i to node j
- w_ij: the weight on the connection from node i to node j
- Δw_ij: the weight update amount from node i to node j
- o_k: the output of node k
11. The Objective of Multi-Layer NN
- The error of output neuron j after the activation of the network on the n-th training example is e_j(n) = d_j(n) - o_j(n), the difference between the desired and the actual output.
- The network error is the sum of the squared errors of the output neurons: E(n) = (1/2) Σ_j e_j(n)² (the conventional factor 1/2 simplifies the gradient).
- The total mean squared error is the average of the network errors over the N training examples: E_av = (1/N) Σ_n E(n).
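A small sketch of these three quantities (the desired and actual output values are made up for illustration):

```python
import numpy as np

# Desired outputs d_j(n) and actual outputs o_j(n) for N = 3 examples,
# with two output neurons per example.
d = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
o = np.array([[0.9, 0.2], [0.1, 0.7], [0.8, 0.6]])

e = d - o                          # e_j(n): error of output neuron j on example n
E = 0.5 * np.sum(e**2, axis=1)     # E(n): network error on example n
E_av = np.mean(E)                  # total mean squared error over the N examples
print(E, E_av)
```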
12. Feedforward NN
- Idea: the credit assignment problem.
- The problem of assigning credit or blame to the individual elements (here, the hidden units) involved in forming the overall response of a learning system.
- In neural networks, the problem amounts to distributing the network error over the weights.
13. Outline
- Multi-layer Neural Networks
- Feedforward Neural Networks
- FF NN model
- Backpropagation (BP) Algorithm
- Practical Issues of FFNN
- FFNN for Face Recognition
14. Training: The Backprop Algorithm
- Searches for weight values that minimize the total error of the network over the set of training examples.
- Repeats the following two passes:
- Forward pass: compute the outputs of all units in the network, and the error of the output layer.
- Backward pass: the network error is used for updating the weights (the credit assignment problem). Starting at the output layer, the error is propagated backwards through the network, layer by layer, by recursively computing the local gradient of each neuron.
15. Backprop
- The back-propagation training algorithm, illustrated below.
- Backprop adjusts the weights of the NN in order to minimize the network's total mean squared error.
[Figure: forward step (network activation, error computation) followed by backward step (error propagation)]
16. BP
17. BP Example: XOR

| x0 | x1 | x2 | y |
|----|----|----|---|
| 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 1 |
| 1 | 1 | 1 | 0 |

(x0 = 1 is the bias input.)

18. With learning rate η = 0.5: for instance (1, 0, 0), the desired output is 0.
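A minimal NumPy sketch of backprop on this XOR data: one hidden layer with two sigmoid units, bias inputs, and η = 0.5. The architecture and initialization details beyond the slides are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])   # XOR targets

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

W1 = rng.uniform(-0.5, 0.5, (3, 2))   # input (incl. bias x0) -> 2 hidden units
W2 = rng.uniform(-0.5, 0.5, (3, 1))   # hidden (incl. bias)   -> 1 output unit
eta = 0.5

for epoch in range(20000):
    # Forward pass: compute all unit outputs and the output error.
    h = sigmoid(X @ W1)
    hb = np.hstack([np.ones((len(X), 1)), h])   # prepend hidden-layer bias
    o = sigmoid(hb @ W2)
    # Backward pass: local gradients (deltas), output layer first.
    delta_o = (y - o) * o * (1 - o)
    delta_h = (delta_o @ W2[1:].T) * h * (1 - h)
    # Weight updates (batch mode).
    W2 += eta * hb.T @ delta_o
    W1 += eta * X.T @ delta_h

print(o.round(2))   # typically approaches [0, 1, 1, 0]
```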
20. Outline
- Multi-layer Neural Networks
- Feedforward Neural Networks
- FF NN model
- Backpropagation (BP) Algorithm
- Practical Issues of FFNN
- FFNN for Face Recognition
21. Network Training
- Two types of network training:
- Incremental mode (on-line, stochastic, or per-observation): weights are updated after each instance is presented.
- Batch mode (off-line or per-epoch): weights are updated after all the patterns are presented (see the sketch below).
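A sketch contrasting the two modes on a toy problem (a single linear unit with squared-error loss; the data and helper are illustrative only):

```python
import numpy as np

# Toy data: one linear unit, squared-error loss. Column 0 is the bias input.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 2.0, 3.0])
eta = 0.1

def grad(w, xi, yi):
    """Gradient of 0.5*(yi - w.xi)^2 for a single pattern."""
    return -(yi - xi @ w) * xi

# Incremental (on-line) mode: update after each pattern is presented.
w = np.zeros(2)
for epoch in range(100):
    for xi, yi in zip(X, y):
        w -= eta * grad(w, xi, yi)

# Batch (off-line) mode: one update per epoch, summed over all patterns.
wb = np.zeros(2)
for epoch in range(100):
    g = sum(grad(wb, xi, yi) for xi, yi in zip(X, y))
    wb -= eta * g

print(w, wb)   # both converge near [1, 1]
```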
22. Stopping Criteria
- Sensible stopping criteria:
- Total mean squared error change: back-prop is considered to have converged when the absolute rate of change in the average squared error per epoch is sufficiently small (in the range [0.01, 0.1]).
- Generalization-based criterion: after each epoch, the NN is tested for generalization using a different set of examples (the validation set). If the generalization performance is adequate, then stop.
23. Use of the Available Data Set for Training
The available data set is normally split into three sets, as follows:
- Training set: used to update the weights. Patterns in this set are repeatedly presented in random order. The weight update equation is applied after a certain number of patterns.
- Validation set: used to decide when to stop training, only by monitoring the error.
- Test set: used to test the performance of the neural network. It should not be used as part of the neural network development cycle.
24. Early Stopping for Good Generalization
- Running too many epochs may overtrain the network, resulting in overfitting and poor generalization.
- Keep a hold-out validation set and test accuracy on it after every epoch. Maintain the weights of the network that performs best on the validation set, and stop training when the validation error increases beyond this point (see the sketch below).
[Figure: training-set error keeps decreasing with the number of epochs, while validation-set error reaches a minimum and then rises]
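A minimal early-stopping sketch; `train_one_epoch` and `validation_error` are hypothetical stand-ins for a real training step and a real validation pass:

```python
import copy

def train_one_epoch(model):      # hypothetical: one backprop pass over the training set
    model["epochs"] += 1

def validation_error(model):     # hypothetical: error on the held-out validation set
    return (model["epochs"] - 30) ** 2 + 100   # toy curve with a minimum at epoch 30

model = {"epochs": 0}
best_err, best_model = float("inf"), None
patience, bad = 5, 0             # stop after 5 epochs without improvement

while bad < patience:
    train_one_epoch(model)
    err = validation_error(model)
    if err < best_err:           # maintain the best-performing weights so far
        best_err, best_model, bad = err, copy.deepcopy(model), 0
    else:
        bad += 1

print(best_model, best_err)      # weights from the best epoch on the validation set
```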
25. Model Selection by Cross-Validation
- Too few hidden units prevent the network from adequately fitting the data and learning the concept.
- Too many hidden units lead to overfitting.
- Similar cross-validation methods can be used to determine an appropriate number of hidden units: use the optimal validation error to select the model with the optimal number of hidden layers and nodes.
[Figure: training-set and validation-set error versus number of epochs]
26. NN Design
- Data representation
- Network Topology
- Network Parameters
- Training
27. Data Representation
- Data representation depends on the problem. In general, NNs work on continuous (real-valued) attributes, so symbolic attributes are encoded into continuous ones.
- Attributes of different types may have different ranges of values, which affects the training process. Normalization may be used, like the following, which scales each attribute to values between 0 and 1:
x' = (x - min) / (max - min)
for each value x of the attribute, where min and max are the minimum and maximum values of that attribute over the training set.
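A one-function sketch of this normalization (the attribute values are made up):

```python
import numpy as np

def min_max_scale(col):
    """Scale one attribute column to [0, 1] using the training-set min and max."""
    lo, hi = col.min(), col.max()
    return (col - lo) / (hi - lo)

ages = np.array([18.0, 35.0, 52.0, 61.0])
print(min_max_scale(ages))   # [0.0, 0.395..., 0.790..., 1.0]
```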
28. Network Topology
- The number of layers and neurons depends on the specific task. In practice, this issue is solved by trial and error.
- Two types of adaptive algorithms can be used:
- Pruning: start from a large network and successively remove neurons and links until network performance degrades.
- Growing: begin with a small network and introduce new neurons until performance is satisfactory.
29. Network Parameters
- How are the weights initialized?
- How is the learning rate chosen?
- How many hidden layers and how many neurons?
- How many examples in the training set?
30. Initialization of Weights
- In general, initial weights are randomly chosen, with typical values between -1.0 and 1.0 or -0.5 and 0.5.
- If some inputs are much larger than others, random initialization may bias the network to give much more importance to the larger inputs. In such a case, weights can be initialized separately:
- for weights from the input to the first layer, and
- for weights from the first to the second layer.
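A sketch of the simple random initialization described above; the layer sizes are arbitrary, and the fan-in-scaled variant at the end is a common practice assumed here, not necessarily the slide's per-layer formula:

```python
import numpy as np

rng = np.random.default_rng()
n_in, n_hidden = 4, 3

# Random initial weights in [-0.5, 0.5], as suggested above.
W1 = rng.uniform(-0.5, 0.5, (n_in, n_hidden))

# A common fan-in-scaled variant (an assumption, not the slide's formula):
# dividing by sqrt(fan-in) keeps nodes with many inputs from dominating.
W1_scaled = W1 / np.sqrt(n_in)
print(W1, W1_scaled)
```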
31. Choice of Learning Rate
- The right value of η depends on the application. Values between 0.1 and 0.9 have been used in many applications.
32. Size of Training Set
- Rule of thumb: the number of training examples should be at least five to ten times the number of weights of the network.
- Another rule: N ≥ W / (1 - a), where W is the number of weights and a is the expected accuracy.
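For instance, under the second rule, a network with W = 300 weights trained to an expected accuracy of a = 0.9 needs N ≥ 300 / (1 - 0.9) = 3000 examples; for the same network, the five-to-ten-times rule of thumb would suggest 1500 to 3000 examples.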
33. Applications of FFNN
- Classification, pattern recognition:
- FFNN can be applied to tackle non-linearly separable learning tasks, such as:
- Recognizing printed or handwritten characters
- Face recognition
- Classification of loan applications into credit-worthy and non-credit-worthy groups
- Analysis of sonar and radar signals to determine the nature of the source
- Regression and forecasting:
- FFNN can be applied to learn non-linear functions (regression), and in particular functions whose inputs are a sequence of measurements over time (time series).
34. Outline
- Multi-layer Neural Networks
- Feedforward Neural Networks
- FF NN model
- Backpropagation (BP) Algorithm
- BP rules derivation
- Practical Issues of FFNN
- FFNN for Face Recognition
35. Categorical Attributes and Multiple Classes
- A categorical attribute is usually decomposed into a series of (0, 1) continuous attributes, each indicating whether a particular attribute value is present or not.
- Each class corresponds to one output node; the desired output of the node is 1 for any instance belonging to this class (otherwise, 0).
- For each test instance, the final class label is determined by the output node with the maximum output value (see the sketch below).
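A short sketch of both conventions (the attribute values and class names are made up):

```python
import numpy as np

colors = ["red", "green", "blue"]        # values of a categorical attribute
classes = ["car", "van", "truck"]        # class labels, one output node each

def one_hot(value, vocabulary):
    """Decompose a categorical value into a series of (0, 1) attributes."""
    return np.array([1.0 if v == value else 0.0 for v in vocabulary])

print(one_hot("green", colors))          # [0. 1. 0.]

outputs = np.array([0.2, 0.7, 0.4])      # network outputs for one test instance
print(classes[int(np.argmax(outputs))])  # predicted label: "van"
```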
36. A Generalized Delta Rule
- If η is small, the algorithm learns the weights very slowly; if η is large, the resulting large weight changes may cause unstable behavior, with oscillations of the weight values.
- A technique for tackling this problem is the introduction of a momentum term into the delta rule, taking previous updates into account. We obtain the following generalized delta rule:
Δw_ij(n) = α Δw_ij(n-1) + η δ_j(n) x_ij(n)
where α is the momentum constant. The momentum term accelerates the descent in steady downhill directions.
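A sketch of the momentum update on a toy quadratic error surface; a generic negative gradient stands in for the per-weight term δ_j(n) x_ij(n):

```python
import numpy as np

eta, alpha = 0.1, 0.9          # learning rate and momentum constant
w = np.zeros(3)
dw_prev = np.zeros(3)          # previous update, Delta-w(n-1)

def gradient(w):               # toy error surface: 0.5 * ||w - target||^2
    return w - np.array([1.0, 2.0, 3.0])

for step in range(200):
    # Generalized delta rule: momentum term plus learning-rate term
    # (the negative gradient plays the role of delta_j * x_ij).
    dw = alpha * dw_prev - eta * gradient(w)
    w += dw
    dw_prev = dw

print(w.round(3))              # converges to [1, 2, 3]
```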
37. Neural Nets for Object Recognition from Images
- Objective:
- Identify interesting objects in input images
- Face recognition:
- Locate faces; determine happy/sad expression, gender, face pose, and orientation
- Recognize specific faces (authorization)
- Vehicle recognition (traffic control or safe-driving assistance):
- Passenger car, van, pick-up, bus, truck
- Traffic sign detection
- Challenges:
- Image size (from 100x100 to 10240x10240)
- Object size, pose, and orientation
- Illumination
38. Example
[Figure: example images]
39. Example: Face Detection Challenges
- Pose variation
- Lighting condition variation
- Facial expression variation
40. Normal Procedures: Training
- Training (identify your problem and build a specific model):
- Build the training dataset:
- Isolate sample images (images containing faces)
- Extract the regions containing the objects (regions containing faces)
- Normalize for size and illumination (e.g., to 200x200)
- Select counter-class examples (non-face regions)
- Determine the neural net:
- The input layer is determined by the input images; e.g., a 200x200 image requires 40,000 input dimensions, each holding a value between 0 and 255.
- Neural net architecture: a three-layer FF NN (two hidden layers) is common practice.
- The output layer is determined by the learning problem: bi-class or multi-class classification.
- Train the neural net.
41. Normal Procedures: Test
- Given a test image:
- Select a small region, considering all possibilities of object location and size:
- Scan from the top left to the bottom right
- Sample at different scale levels
- Feed the region into the network to determine whether the region contains the object or not
- Repeat the above process, which is time consuming (see the sketch below)
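A sketch of this scanning procedure; `is_face` is a hypothetical stand-in for the trained network:

```python
import numpy as np

def sliding_windows(image, win=20, step=10, scales=(1.0, 0.5)):
    """Yield candidate regions, scanning top-left to bottom-right at several scales."""
    for s in scales:
        img = image[::int(1 / s), ::int(1 / s)]   # crude downsampling per scale
        h, w = img.shape
        for r in range(0, h - win + 1, step):
            for c in range(0, w - win + 1, step):
                yield img[r:r + win, c:c + win]

def is_face(region):                  # hypothetical trained FF NN
    return region.mean() > 0.5        # stand-in for the network's output node

image = np.random.rand(100, 100)      # placeholder test image
detections = sum(1 for region in sliding_windows(image) if is_face(region))
print(detections, "candidate detections")
```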
42. CMU Neural Nets for Face Pose Recognition
- Head pose (1-of-4): 90% accuracy
- Face recognition (1-of-20): 90% accuracy
43. Neural-Net-Based Face Detection
- A large training set of faces and a small set of non-faces.
- The training set of non-faces is built up automatically:
- Start from a set of images with no faces.
- Every face detected in these images is a false positive, and is added to the non-face training set.
44. Traffic Sign Detection
- Demo: http://www.mathworks.com/products/demos/videoimage/traffic_sign/vipwarningsigns.html
- Intelligent traffic light control system:
- Instead of using loop detectors (like metal detectors),
- use surveillance video to detect vehicles and bicycles.
45. Vehicle Detection
- Intelligent vehicles aim at improving driving safety by machine vision techniques.
- http://www.mobileye.com/visionRange.shtml
46. Outline
- Multi-layer Neural Networks
- Feedforward Neural Networks
- FF NN model
- Backpropagation (BP) Algorithm
- BP rules derivation
- Practical Issues of FFNN
- FFNN for Face Recognition