1
Multi-Layer Feedforward Neural Networks
  • CAP 4630 Intro. to Artificial Intelligence
  • Xingquan (Hill) Zhu

2
Outline
  • Multi-layer Neural Networks
  • Feedforward Neural Networks
  • FF NN model
  • Backpropagation (BP) Algorithm
  • Practical Issues of FFNN
  • FFNN for Face Recognition

3
Multi-layer NN
  • Between the input and output layers there are
    hidden layers, as illustrated below.
  • Hidden nodes do not directly send outputs to the
    external environment.
  • Multi-layer NNs overcome the limitation of single-layer NNs: they can handle non-linearly separable learning tasks.

[Figure: a multi-layer network with an input layer, a hidden layer, and an output layer]
4
XOR problem
The two classes, green and red, cannot be separated by a single line, but can be separated by two lines. The NN below, with two hidden nodes, realizes this non-linear separation: each hidden node represents one of the two blue lines.
[Figure: the XOR input space (x1, x2) with the two separating lines, and a 2-2-1 network with inputs x1, x2, hidden nodes y1, y2, output z, bias weight w0, and weights of +1/-1 on the connections]
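As a concrete check, here is a minimal Python sketch of such a 2-2-1 network with hand-set step-unit weights that realizes XOR. The particular thresholds (0.5 and 1.5) are illustrative assumptions, since the figure's exact values did not survive transcription:

```python
def step(v):
    # Threshold (step) activation: 1 if v >= 0, else 0.
    return 1 if v >= 0 else 0

def xor_net(x1, x2):
    y1 = step(x1 + x2 - 0.5)    # fires on or above the line x1 + x2 = 0.5
    y2 = step(x1 + x2 - 1.5)    # fires on or above the line x1 + x2 = 1.5
    return step(y1 - y2 - 0.5)  # fires only in the strip between the two lines

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor_net(x1, x2))  # 0, 1, 1, 0
```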
5
Types of decision regions
  • Network with a single node
  • One-hidden-layer network that realizes a convex region: each hidden node realizes one of the lines bounding the convex region
  • Two-hidden-layer network that realizes the union of three convex regions: each box represents a one-hidden-layer network realizing one convex region

[Figure: the three networks, with convex regions P1, P2, P3 and example thresholds -3.5 and 1.5]
6
Different Non-Linearly Separable Problems
Structure      Types of Decision Regions
Single-Layer   Half plane bounded by hyperplane
Two-Layer      Convex open or closed regions
Three-Layer    Arbitrary (complexity limited by no. of nodes)

[Figure columns: each structure illustrated on the exclusive-OR problem, class separation, and the most general region shapes]
7
Outline
  • Multi-layer Neural Networks
  • Feedforward Neural Networks
  • FF NN model
  • Backpropagation (BP) Algorithm
  • BP rules derivation
  • Practical Issues of FFNN
  • FFNN for Face Recognition

8
FFNN NEURON MODEL
  • The classical learning algorithm of FFNN is based on the gradient descent method.
  • The activation functions used in FFNN are continuous functions of the weights, differentiable everywhere.
  • A typical activation function is the sigmoid function: φ(v) = 1 / (1 + e^(-a·v))

9
FFNN NEURON MODEL
  • A typical activation function is the sigmoid function: φ(v) = 1 / (1 + e^(-a·v))
  • When a approaches 0, φ tends to a linear function.
  • When a tends to infinity, φ tends to the step function.
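A small numerical sketch of this limiting behavior (plain numpy; a is the slope parameter defined above):

```python
import numpy as np

def sigmoid(v, a=1.0):
    # phi(v) = 1 / (1 + e^(-a*v)); a controls the steepness.
    return 1.0 / (1.0 + np.exp(-a * v))

v = np.linspace(-5, 5, 5)
print(sigmoid(v, a=0.1))  # small a: nearly linear around 0
print(sigmoid(v, a=10))   # large a: approaches the 0/1 step function
```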

10
FFNN MODEL
  • xij: the input from node i to node j
  • wij: the weight on the connection from node i to node j
  • Δwij: the weight update amount for the connection from node i to node j
  • ok: the output of node k

11
The objective of multi-layer NN
  • The error of output neuron j after the activation of the network on the n-th training example is ej(n) = dj(n) - oj(n), where dj is the desired output and oj the actual output.
  • The network error is the sum of the squared errors of the output neurons: E(n) = 1/2 Σj ej(n)² (the conventional 1/2 factor simplifies the gradient).
  • The total mean squared error is the average of the network errors over the N training examples: EAV = (1/N) Σn E(n).
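These definitions translate directly into code. A minimal sketch, assuming the conventional 1/2 factor in the squared-error sum:

```python
import numpy as np

def output_errors(d, o):
    # e_j(n) = d_j(n) - o_j(n), per output neuron j.
    return d - o

def network_error(d, o):
    # E(n) = 1/2 * sum_j e_j(n)^2, for one training example.
    e = output_errors(d, o)
    return 0.5 * np.sum(e ** 2)

def total_mse(D, O):
    # E_av = (1/N) * sum_n E(n), averaged over the N training examples.
    return np.mean([network_error(d, o) for d, o in zip(D, O)])
```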

12
  • Feedforward NN
  • Idea: the credit assignment problem
  • The problem of assigning credit or blame to the individual elements (hidden units) involved in forming the overall response of a learning system.
  • In neural networks, the problem amounts to distributing the network error over the weights.

13
Outline
  • Multi-layer Neural Networks
  • Feedforward Neural Networks
  • FF NN model
  • Backpropagation (BP) Algorithm
  • Practical Issues of FFNN
  • FFNN for Face Recognition

14
Training: the Backprop algorithm
  • Searches for weight values that minimize the total error of the network over the set of training examples.
  • Repeats the following two passes:
  • Forward pass: compute the outputs of all units in the network, and the error of the output layer.
  • Backward pass: the network error is used for updating the weights (credit assignment problem). Starting at the output layer, the error is propagated backwards through the network, layer by layer, by recursively computing the local gradient of each neuron. (A minimal sketch of both passes follows.)
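A compact sketch of the two passes for a network with one hidden layer of sigmoid units. This is the standard backprop recipe; the variable names and single-example update are my own framing:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def train_step(x, d, W1, W2, eta=0.5):
    # Forward pass: compute the outputs of all units and the output error.
    y = sigmoid(W1 @ x)   # hidden layer outputs
    o = sigmoid(W2 @ y)   # output layer outputs
    e = d - o             # error of the output layer

    # Backward pass: propagate the error back, layer by layer,
    # by recursively computing each neuron's local gradient (delta).
    delta_out = o * (1 - o) * e                    # output-layer deltas
    delta_hid = y * (1 - y) * (W2.T @ delta_out)   # hidden-layer deltas

    # Credit assignment: distribute the error onto the weights.
    W2 += eta * np.outer(delta_out, y)
    W1 += eta * np.outer(delta_hid, x)
    return W1, W2
```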

15
Backprop
  • Back-propagation training algorithm illustrated
  • Backprop adjusts the weights of the NN in order
    to minimize the network total mean squared error.

[Figure: forward step (network activation, error computation) followed by backward step (error propagation)]
16
BP
  • For sigmoid units, the local gradients are computed recursively:
  • Output node j: δj = oj(1 - oj)(dj - oj)
  • Hidden node j: δj = oj(1 - oj) Σk δk wjk, summing over the nodes k that j feeds
  • Weight update: Δwij = η δj xij
17
BP Example
  • XOR
  • X0 X1 X2 Y
  • 1 0 0 0
  • 1 0 1 1
  • 1 1 0 1
  • 1 1 1 0

18
η = 0.5. For instance (1, 0, 0), the desired output is 0.
19
  • Weight updating

20
Outline
  • Multi-layer Neural Networks
  • Feedforward Neural Networks
  • FF NN model
  • Backpropagation (BP) Algorithm
  • Practical Issues of FFNN
  • FFNN for Face Recognition

21
  • Network training
  • Two types of network training:
  • Incremental mode (on-line, stochastic, or per-observation): weights updated after each instance is presented.
  • Batch mode (off-line or per-epoch): weights updated after all the patterns are presented. (Both modes are sketched below.)
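A sketch of the two modes, reusing the train_step function from the earlier sketch; accumulate_gradients is a hypothetical helper standing in for summing the per-pattern weight gradients over the epoch:

```python
def train_incremental(data, W1, W2, epochs, eta=0.5):
    # Incremental (on-line) mode: update weights after every instance.
    for _ in range(epochs):
        for x, d in data:
            W1, W2 = train_step(x, d, W1, W2, eta)
    return W1, W2

def train_batch(data, W1, W2, epochs, eta=0.5):
    # Batch (off-line) mode: accumulate the gradients over all patterns,
    # then apply a single weight update per epoch.
    for _ in range(epochs):
        dW1, dW2 = accumulate_gradients(data, W1, W2)  # hypothetical helper
        W1 += eta * dW1
        W2 += eta * dW2
    return W1, W2
```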

22
Stopping criteria
  • Sensible stopping criteria:
  • Total mean squared error change: back-prop is considered to have converged when the absolute rate of change in the average squared error per epoch is sufficiently small (typically in the range [0.01, 0.1]).
  • Generalization-based criterion: after each epoch the NN is tested for generalization using a different set of examples (the validation set). If the generalization performance is adequate, then stop. (A sketch of both criteria follows.)
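A sketch of both criteria. The three-epoch patience on the validation error is an illustrative choice, not from the slide:

```python
def should_stop(train_errors, val_errors, tol=0.01):
    # Criterion 1: the absolute rate of change of the average squared
    # error per epoch is sufficiently small (e.g., in [0.01, 0.1]).
    if len(train_errors) >= 2 and abs(train_errors[-1] - train_errors[-2]) < tol:
        return True
    # Criterion 2 (generalization-based): the validation error has
    # stopped improving, e.g., rising for two consecutive epochs.
    if len(val_errors) >= 3 and val_errors[-1] > val_errors[-2] > val_errors[-3]:
        return True
    return False
```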

23
Use of Available Data Set for Training
The available data set is normally split into three sets, as follows:
  • Training set: used to update the weights. Patterns in this set are repeatedly presented in random order. The weight update equations are applied after a certain number of patterns.
  • Validation set: used to decide when to stop training, purely by monitoring the error.
  • Test set: used to test the performance of the neural network. It should not be used as part of the neural network development cycle.
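A minimal random-split sketch; the 60/20/20 proportions are an assumption, not from the slide:

```python
import numpy as np

def split_data(X, y, f_train=0.6, f_val=0.2, seed=0):
    # Shuffle once, then carve out training, validation, and test sets.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr = int(f_train * len(X))
    n_va = int(f_val * len(X))
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])
```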

24
Early Stopping - Good Generalization
  • Running too many epochs may overtrain the network, resulting in overfitting and poor generalization.
  • Keep a hold-out validation set and test accuracy after every epoch. Maintain the weights of the best-performing network on the validation set, and stop training when the validation error increases beyond this point.

[Figure: training set and validation set error vs. number of epochs]
25
Model Selection by Cross-validation
  • Too few hidden units prevent the network from adequately fitting the data and learning the concept.
  • Too many hidden units lead to overfitting.
  • Similar cross-validation methods can be used to determine an appropriate architecture: use the validation error to select the model with the optimal number of hidden layers and nodes.

[Figure: training set and validation set error vs. number of epochs]
26
NN DESIGN
  • Data representation
  • Network Topology
  • Network Parameters
  • Training

27
Data Representation
  • Data representation depends on the problem. In general, NNs work on continuous (real-valued) attributes; therefore, symbolic attributes are encoded into continuous ones.
  • Attributes of different types may have different ranges of values, which affects the training process. Normalization may be used, like the following one, which scales each attribute to assume values between 0 and 1:
  • x' = (x - min) / (max - min) for each value x of the attribute, where min and max are the minimum and maximum values of that attribute over the training set.
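The same scaling in code; note that min and max must come from the training set only:

```python
import numpy as np

def min_max_normalize(X_train, X):
    # Scale each attribute (column) to [0, 1] using the min/max
    # over the TRAINING set; assumes each column is non-constant.
    lo = X_train.min(axis=0)
    hi = X_train.max(axis=0)
    return (X - lo) / (hi - lo)
```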

28
Network Topology
  • The number of layers and neurons depend on the
    specific task. In practice this issue is solved
    by trial and error.
  • Two types of adaptive algorithms can be used
  • start from a large network and successively
    remove some neurons and links until network
    performance degrades.
  • begin with a small network and introduce new
    neurons until performance is satisfactory.

29
Network parameters
  • How are the weights initialized?
  • How is the learning rate chosen?
  • How many hidden layers and how many neurons?
  • How many examples in the training set?

30
Initialization of weights
  • In general, initial weights are randomly chosen, with typical values between -1.0 and 1.0, or between -0.5 and 0.5.
  • If some inputs are much larger than others, random initialization may bias the network to give much more importance to the larger inputs. In such a case, weights can be initialized as follows:

  • For weights from the input to the first layer
  • For weights from the first to the second layer
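A sketch of the plain random initialization, together with one common remedy for the large-input case: scaling by the fan-in. The slide's exact per-layer formulas did not survive transcription, so the 1/n_in scaling is an assumption:

```python
import numpy as np

def init_uniform(n_out, n_in, r=0.5, seed=0):
    # Plain initialization: weights uniform in [-r, r] (e.g., r = 0.5 or 1.0).
    rng = np.random.default_rng(seed)
    return rng.uniform(-r, r, size=(n_out, n_in))

def init_fanin(n_out, n_in, seed=0):
    # Fan-in scaled: shrink each weight by the number of incoming links,
    # so neurons with many (or large) inputs do not saturate at the start.
    rng = np.random.default_rng(seed)
    return rng.uniform(-0.5, 0.5, size=(n_out, n_in)) / n_in
```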
31
Choice of learning rate
  • The right value of η depends on the application. Values between 0.1 and 0.9 have been used in many applications.

32
Size of Training set
  • Rule of thumb: the number of training examples should be at least five to ten times the number of weights of the network.
  • Other rule: N ≥ |W| / (1 - a), where |W| is the number of weights and a is the expected accuracy.
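A worked instance of both rules:

```python
def examples_needed(n_weights, accuracy=None):
    # Rule of thumb: 5-10x the number of weights.
    low, high = 5 * n_weights, 10 * n_weights
    # Other rule: N >= |W| / (1 - a), with a the expected accuracy.
    by_accuracy = n_weights / (1 - accuracy) if accuracy else None
    return low, high, by_accuracy

print(examples_needed(100, accuracy=0.95))  # roughly (500, 1000, 2000)
```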
33
Applications of FFNN
  • Classification, pattern recognition:
  • FFNN can be applied to tackle non-linearly separable learning tasks:
  • Recognizing printed or handwritten characters
  • Face recognition
  • Classification of loan applications into credit-worthy and non-credit-worthy groups
  • Analysis of sonar and radar signals to determine the nature of the source of a signal
  • Regression and forecasting:
  • FFNN can be applied to learn non-linear functions (regression), and in particular functions whose input is a sequence of measurements over time (time series).

34
Outline
  • Multi-layer Neural Networks
  • Feedforward Neural Networks
  • FF NN model
  • Backpropagation (BP) Algorithm
  • BP rules derivation
  • Practical Issues of FFNN
  • FFNN for Face Recognition

35
Categorical attributes and multi-classes
  • A categorical attribute is usually decomposed into a series of (0, 1) continuous attributes, each indicating whether a particular attribute value is present or not.
  • Each class corresponds to one output node; the desired output of the node is 1 for any instance belonging to this class (otherwise, 0).
  • For each test instance, the final class label is determined by the output node with the maximum output value. (Both conventions are sketched below.)
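A sketch of both conventions:

```python
import numpy as np

def one_hot(values, vocabulary):
    # Decompose a categorical attribute into (0, 1) continuous
    # attributes, one per possible value.
    return np.array([[1.0 if v == u else 0.0 for u in vocabulary]
                     for v in values])

def predict_class(outputs):
    # One output node per class; pick the node with the maximum output.
    return int(np.argmax(outputs))

print(one_hot(["red", "green"], ["red", "green", "blue"]))
print(predict_class(np.array([0.1, 0.7, 0.2])))  # -> class 1
```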

36
A generalized delta rule
  • If η is small, the algorithm learns the weights very slowly; if η is large, the large weight changes may cause unstable behavior, with oscillations of the weight values.
  • A technique for tackling this problem is the introduction of a momentum term in the delta rule, which takes previous updates into account. We obtain the following generalized delta rule:
  • Δwij(n) = η δj(n) xij(n) + α Δwij(n-1)

α: momentum constant
The momentum term accelerates the descent in steady downhill directions.
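The generalized rule in code, following the slide's notation:

```python
def momentum_update(w, x, delta, prev_dw, eta=0.5, alpha=0.9):
    # Generalized delta rule:
    #   dw(n) = eta * delta_j(n) * x_ij(n) + alpha * dw(n-1)
    dw = eta * delta * x + alpha * prev_dw
    # Return the new weight and this update, to be reused at step n+1.
    return w + dw, dw
```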
37
Neural Net for object recognition from images
  • Objective:
  • Identify interesting objects from input images
  • Face recognition: locate faces, happy/sad faces, gender, face pose and orientation
  • Recognize specific faces (authorization)
  • Vehicle recognition (traffic control or safe-driving assistance): passenger car, van, pickup, bus, truck
  • Traffic sign detection
  • Challenges:
  • Image size (100x100, 10240x10240)
  • Object size, pose, and orientation
  • Illumination

38
Example
39
Example: Face Detection Challenges
  • Pose variation
  • Lighting condition variation
  • Facial expression variation
40
Normal procedures
  • Training (identify your problem and build a specific model)
  • Build the training dataset:
  • Isolate sample images (e.g., images containing faces)
  • Extract regions containing the objects (e.g., regions containing faces)
  • Normalization of size and illumination (e.g., to 200x200)
  • Select counter-class examples (e.g., non-face regions)
  • Determine the neural net:
  • The input layer is determined by the input images. E.g., a 200x200 image requires 40,000 input dimensions, each containing a value between 0 and 255 (see the sketch after this list).
  • Neural net architecture: a three-layer FF NN (two hidden layers) is common practice.
  • The output layer is determined by the learning problem: bi-class or multi-class classification.
  • Train the neural net
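A sketch of turning a normalized 200x200 region into the network's input vector. The 200x200 size and 0-255 pixel range are from the slide; the extra scaling to [0, 1] is a common practice I am assuming:

```python
import numpy as np

def image_to_input(region):
    # region: 200x200 array of pixel values in 0-255.
    assert region.shape == (200, 200)
    # Flatten to a 40,000-dimensional input vector, scaled to [0, 1].
    return region.astype(np.float64).ravel() / 255.0
```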

41
Normal procedures
  • Test:
  • Given a test image, select a small region (considering all possibilities of object location and size):
  • Scan from the top left to the bottom right
  • Sample at different scale levels
  • Feed the region into the network and determine whether this region contains the object or not.
  • Repeat the above process, which is time-consuming. (A sketch of the scan follows.)
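A sketch of the scan. Window size, stride, and scale levels are illustrative assumptions, and classify stands in for a forward pass through the trained network:

```python
def scan_image(image, classify, win=200, stride=20, subsamples=(1, 2, 4)):
    # Slide a window from the top left to the bottom right,
    # at several scale levels, feeding every region to the network.
    detections = []
    for k in subsamples:
        scaled = image[::k, ::k]  # crude nearest-neighbor downsampling
        h, w = scaled.shape
        for r in range(0, h - win + 1, stride):
            for c in range(0, w - win + 1, stride):
                region = scaled[r:r + win, c:c + win]
                if classify(region):  # does this region contain the object?
                    detections.append((k, r, c))
    return detections
```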

42
CMU Neural Nets for Face Pose Recognition
Head pose (1-of-4): 90% accuracy. Face recognition (1-of-20): 90% accuracy.
43
Neural Net Based Face Detection
  • Large training set of faces and a small set of non-faces
  • The training set of non-faces is automatically built up:
  • Start with a set of images containing no faces.
  • Every face detected in these images is a false positive and is added to the non-face training set.

44
Traffic sign detection
  • Demo: http://www.mathworks.com/products/demos/videoimage/traffic_sign/vipwarningsigns.html
  • Intelligent traffic light control system:
  • Instead of using loop detectors (like metal detectors),
  • use surveillance video to detect vehicles and bicycles.

45
Vehicle Detection
  • Intelligent vehicles aim at improving driving safety by means of machine vision techniques.

http://www.mobileye.com/visionRange.shtml
46
Outline
  • Multi-layer Neural Networks
  • Feedforward Neural Networks
  • FF NN model
  • Backpropagation (BP) Algorithm
  • BP rules derivation
  • Practical Issues of FFNN
  • FFNN for Face Recognition