Title: Overview of Back Propagation Algorithm
1. Overview of Back Propagation Algorithm
2. A Sample Network
3. Forward Operation
- The general feed-forward operation is given by the equation sketched below.
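In standard notation for a single-hidden-layer network (assumed here: inputs x_i, hidden units indexed by j, outputs z_k, activation function f), the feed-forward operation can be written as
\[
z_k \;=\; f\Bigl(\sum_{j=1}^{n_H} w_{kj}\, f\Bigl(\sum_{i=1}^{d} w_{ji}\, x_i + w_{j0}\Bigr) + w_{k0}\Bigr),
\qquad k = 1,\dots,c.
\]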
4. Back Propagation Algorithm
- The hidden-to-output weights can be learned by minimizing the error.
- The power of back-propagation is that it allows us to calculate an effective error for each hidden unit, and thus derive a learning rule for the input-to-hidden weights.
- We consider the error function and the update rule sketched below.
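A standard choice (assumed here) is the sum-of-squared-error criterion with gradient-descent updates:
\[
J(\mathbf{w}) \;=\; \tfrac{1}{2}\sum_{k=1}^{c}\bigl(t_k - z_k\bigr)^2,
\qquad
\Delta w \;=\; -\,\eta\,\frac{\partial J}{\partial w},
\]
where t_k is the target for output unit k, z_k its actual output, and \(\eta\) the learning rate.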
5. Hidden-to-output Weights
The chain rule, the sensitivity of unit k, and the resulting overall derivative are spelled out below.
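One way to write this derivation, under the notation assumed above (y_j is the output of hidden unit j and net_k the weighted input to output unit k):
\[
\frac{\partial J}{\partial w_{kj}} \;=\; \frac{\partial J}{\partial net_k}\,\frac{\partial net_k}{\partial w_{kj}},
\qquad
\delta_k \;\equiv\; -\frac{\partial J}{\partial net_k} \;=\; (t_k - z_k)\, f'(net_k),
\qquad
\frac{\partial net_k}{\partial w_{kj}} \;=\; y_j,
\]
so overall \(\partial J / \partial w_{kj} = -\delta_k\, y_j\), giving the update \(\Delta w_{kj} = \eta\,\delta_k\, y_j\).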
6. Input-to-hidden Weights
The chain rule, the actual back-propagation step, and the resulting update rule are spelled out below.
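Under the same assumed notation, with x_i the inputs and net_j the weighted input to hidden unit j:
\[
\frac{\partial J}{\partial w_{ji}} \;=\; \frac{\partial J}{\partial y_j}\,\frac{\partial y_j}{\partial net_j}\,\frac{\partial net_j}{\partial w_{ji}},
\qquad
\delta_j \;\equiv\; f'(net_j)\sum_{k=1}^{c} w_{kj}\,\delta_k,
\qquad
\Delta w_{ji} \;=\; \eta\,\delta_j\, x_i.
\]
The middle step is the actual back propagation: the output sensitivities \(\delta_k\) are passed back through the weights \(w_{kj}\) to form the hidden-unit sensitivity \(\delta_j\).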
7. Back Propagation of Sensitivity
- The sensitivity at a hidden unit is proportional to the weighted sum of the sensitivities at the output units.
- The output unit sensitivities are thus propagated back to the hidden units (a code sketch of one full update follows).
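To make the two update rules concrete, here is a minimal NumPy sketch of one training step for a one-hidden-layer network; the tanh activation, the omission of bias terms, and all names are illustrative assumptions, not the slides' own implementation.

```python
import numpy as np

def backprop_step(x, t, W_hidden, W_out, eta=0.1):
    """One gradient-descent step for a one-hidden-layer tanh network.

    x: input vector (d,), t: target vector (c,),
    W_hidden: (n_H, d) input-to-hidden weights,
    W_out: (c, n_H) hidden-to-output weights.
    """
    # Forward operation
    net_j = W_hidden @ x                              # hidden pre-activations
    y = np.tanh(net_j)                                # hidden outputs
    net_k = W_out @ y                                 # output pre-activations
    z = np.tanh(net_k)                                # network outputs

    # Sensitivities
    delta_k = (t - z) * (1.0 - z ** 2)                # output units: (t_k - z_k) f'(net_k)
    delta_j = (1.0 - y ** 2) * (W_out.T @ delta_k)    # hidden units: f'(net_j) * sum_k w_kj delta_k

    # Weight updates
    W_out = W_out + eta * np.outer(delta_k, y)        # delta w_kj = eta * delta_k * y_j
    W_hidden = W_hidden + eta * np.outer(delta_j, x)  # delta w_ji = eta * delta_j * x_i
    return W_hidden, W_out
```

Calling backprop_step repeatedly over a training set would implement online (per-sample) gradient descent.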
8. Training Hierarchical Feed-forward Visual Recognition Models Using Transfer Learning from Pseudo-Tasks
- ECCV 2008
- Kai Yu
- Presented by Shuiwang Ji
9. Transfer Learning
- Transfer learning, also known as multi-task learning, is a mechanism that improves generalization by leveraging shared domain-specific information contained in related tasks.
- In the setting considered in this paper, all tasks share the same input space.
10. General Formulation
- The main task to be learned has index m, with its own set of training examples.
- A neural network has a natural architecture for tackling this learning problem by minimizing the objective sketched below.
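A generic form of this objective, assuming a shared feature map phi(.; theta), task-specific output weights w_m, a per-example loss l, and a regularizer Omega (these symbols are assumptions, not the paper's own notation):
\[
\min_{\theta,\,\mathbf{w}_m}\;
\sum_{n} \ell\bigl(y^{m}_{n},\, \mathbf{w}_m^{\top}\phi(\mathbf{x}_n;\theta)\bigr)
\;+\; \gamma\,\Omega(\theta).
\]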
11. General Formulation
- The shared representation is learned by additionally introducing pseudo auxiliary tasks, each represented by learning a set of input-output pairs.
- The regularization term then becomes the pseudo-task fitting loss sketched below.
- A Bayesian perspective (skipped)
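A generic form of the resulting regularizer, assuming K pseudo-tasks with targets y^k_n, task-specific weights w_k, and a squared loss (again assumed symbols rather than the paper's notation):
\[
\Omega(\theta) \;=\;
\sum_{k=1}^{K}\sum_{n}\bigl(y^{k}_{n} - \mathbf{w}_k^{\top}\phi(\mathbf{x}_n;\theta)\bigr)^{2},
\]
so that fitting the pseudo-task targets constrains the shared feature map phi(.; theta).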
12. CNN for Transfer Learning
- Input: 140x140 pixel images, including R/G/B channels and additionally two channels Dx and Dy, which are the horizontal and vertical gradients of the gray intensities.
- C1 layer: 16 filters of size 16x16.
- P1 layer: max pooling over each 5x5 neighborhood.
- C2 layer: 256 filters of size 6x6, with connections of sparsity 0.5 between the 16 dimensions of the P1 layer and the 256 dimensions of the C2 layer.
- P2 layer: max pooling over each 5x5 neighborhood.
- Output layer: full connections between the (256 x 4 x 4) P2 features and the outputs (an architecture sketch follows this list).
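A minimal PyTorch sketch of this architecture, using the layer sizes above; the tanh nonlinearities and the use of full (rather than 0.5-sparse) P1-to-C2 connections are simplifying assumptions.

```python
import torch
import torch.nn as nn

class TransferCNN(nn.Module):
    """Sketch of the slide's CNN: 5-channel 140x140 input -> C1 -> P1 -> C2 -> P2 -> output."""

    def __init__(self, num_outputs):
        super().__init__()
        self.c1 = nn.Conv2d(5, 16, kernel_size=16)   # 16 filters of 16x16 -> 16 x 125 x 125
        self.p1 = nn.MaxPool2d(5)                    # non-overlapping 5x5 pooling -> 16 x 25 x 25
        self.c2 = nn.Conv2d(16, 256, kernel_size=6)  # 256 filters of 6x6 (dense here) -> 256 x 20 x 20
        self.p2 = nn.MaxPool2d(5)                    # -> 256 x 4 x 4
        self.out = nn.Linear(256 * 4 * 4, num_outputs)

    def forward(self, x):                            # x: (batch, 5, 140, 140) with R/G/B/Dx/Dy channels
        x = self.p1(torch.tanh(self.c1(x)))
        x = self.p2(torch.tanh(self.c2(x)))
        return self.out(torch.flatten(x, 1))
```

In the transfer-learning setup, the output layer would expose one head per task (the main task plus the pseudo-tasks) on top of the shared P2 features.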
13. Generating Pseudo Tasks
- A pseudo-task is constructed by sampling a random 2D patch and using it as a template to form a local 2D filter that operates on every training image. The value assigned to an image under this task is the maximum over the result of this 2D convolution operation (see the sketch below).
- This simple construction is brittle to scale, translation, and slight intensity variations.
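A minimal NumPy/SciPy sketch of this construction; the 12x12 patch size and the way the template is sampled from a training image are illustrative assumptions.

```python
import numpy as np
from scipy.signal import correlate2d

def pseudo_task_value(image, patch):
    """Max response of sliding the patch template over the image."""
    response = correlate2d(image, patch, mode="valid")
    return response.max()

rng = np.random.default_rng(0)
images = rng.random((10, 140, 140))            # toy gray-scale training images
patch = images[3, 30:42, 50:62]                # randomly chosen 12x12 template
targets = [pseudo_task_value(im, patch) for im in images]  # pseudo-task labels
```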
14. Generating Pseudo Tasks
- Applying Gabor filters of 4 orientations and 16 scales results in 64 feature maps of size 104x104 for each image.
- A max-pooling operation is performed first within each non-overlapping 4x4 neighborhood and then within each band of two successive scales, resulting in 32 feature maps of size 26x26 for each image.
- A set of K RBF filters of size 7x7 with 4 orientations is then sampled and used as the parameters of the pseudo-tasks, resulting in 8 feature maps of size 20x20.
- Finally, max pooling is performed on the result across all the scales and within every non-overlapping 10x10 neighborhood, giving a 2x2 feature map which constitutes the value of this image under this pseudo-task (a pooling helper is sketched below).
- This yields 4K pseudo-tasks (K actual random patches, each contributing one value per quadrant of the image).
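The non-overlapping max pooling used at each stage can be sketched as follows; the function name and the example shapes are assumptions for illustration.

```python
import numpy as np

def max_pool(feature_map, n):
    """Max over non-overlapping n x n neighborhoods of a 2-D feature map."""
    h, w = feature_map.shape
    blocks = feature_map[: h - h % n, : w - w % n].reshape(h // n, n, w // n, n)
    return blocks.max(axis=(1, 3))

fmap = np.random.rand(20, 20)          # one 20x20 map from the RBF-filter stage
print(max_pool(fmap, 10).shape)        # (2, 2): one pseudo-task value per quadrant
```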
15. Object Class Recognition and Localization Using Sparse Features with Limited Receptive Fields, IJCV, in press
16. Results on Caltech-101
Testing one image (the forward operation) takes 0.18 seconds.
17. Gender and Ethnicity Recognition
18. First-layer Features
19. Convergence Rate