Transcript and Presenter's Notes

Title: Overview of Back Propagation Algorithm


1
Overview of Back Propagation Algorithm
  • Shuiwang Ji

2
A Sample Network
3
Forward Operation
  • The general feed-forward operation is
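    (The equation on this slide is not preserved in the transcript. The
    standard textbook form for a network with one hidden layer, with inputs
    x_i, hidden units y_j, outputs z_k, and activation f, is assumed below.)

    \[ z_k = f\left( \sum_{j=1}^{n_H} w_{kj}\, f\left( \sum_{i=1}^{d} w_{ji} x_i + w_{j0} \right) + w_{k0} \right) \]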

4
Back Propagation Algorithm
  • The hidden-to-output weights can be learned by
    minimizing the error
  • The power of back-propagation is that it allows
    us to calculate an effective error for each
    hidden unit, and thus derive a learning rule for
    the input-to-hidden weights
  • We consider the error function
  • The update rule is
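    (The error function and update rule are not shown in the transcript; the
    standard squared-error criterion and gradient-descent rule are assumed
    below, with t_k the targets, z_k the outputs, and eta the learning rate.)

    \[ J(w) = \tfrac{1}{2} \sum_{k=1}^{c} (t_k - z_k)^2, \qquad
       \Delta w = -\eta \, \frac{\partial J}{\partial w} \]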

5
Hidden-to-output Weights
The chain rule
The sensitivity of unit k is
and
Overall, the derivative is
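(The equations on this slide are not preserved in the transcript; the
standard textbook derivation, with y_j the hidden outputs and net_k the net
activation of output unit k, is assumed below.)

  \[ \frac{\partial J}{\partial w_{kj}}
     = \frac{\partial J}{\partial net_k}\,\frac{\partial net_k}{\partial w_{kj}}, \qquad
     \delta_k \equiv -\frac{\partial J}{\partial net_k} = (t_k - z_k)\, f'(net_k), \qquad
     \frac{\partial J}{\partial w_{kj}} = -\delta_k\, y_j \]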
6
Input-to-hidden Weights
The chain rule
The real back propagation
Overall the rule is
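(Again the equations are missing from the transcript; the standard form, with
delta_j the sensitivity of hidden unit j, is assumed below.)

  \[ \frac{\partial J}{\partial w_{ji}}
     = \frac{\partial J}{\partial y_j}\,\frac{\partial y_j}{\partial net_j}\,\frac{\partial net_j}{\partial w_{ji}}, \qquad
     \delta_j \equiv f'(net_j) \sum_{k=1}^{c} w_{kj}\, \delta_k, \qquad
     \Delta w_{ji} = \eta\, \delta_j\, x_i \]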
7
Back Propagation of Sensitivity
  1. The sensitivity at a hidden unit is proportional
    to the weighted sum of the sensitivities at the
    output units
  2. The output unit sensitivities are thus propagated
    back to the hidden units
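(A minimal numpy sketch of one back-propagation step for a single-hidden-layer
network, illustrating how the output sensitivities are propagated back through
the hidden-to-output weights to form the hidden sensitivities. The sigmoid
activation, the omission of biases, and all names are illustrative assumptions,
not taken from the slides.)

  import numpy as np

  def sigmoid(a):
      return 1.0 / (1.0 + np.exp(-a))

  def backprop_step(x, t, W1, W2, eta=0.1):
      # Forward pass (biases omitted for brevity)
      y = sigmoid(W1 @ x)            # hidden-unit outputs
      z = sigmoid(W2 @ y)            # network outputs

      # Output-unit sensitivities: delta_k = (t_k - z_k) * f'(net_k)
      delta_out = (t - z) * z * (1.0 - z)
      # Hidden-unit sensitivities: proportional to the weighted sum of the
      # output sensitivities, propagated back through W2
      delta_hidden = y * (1.0 - y) * (W2.T @ delta_out)

      # Gradient-descent updates on J = 0.5 * ||t - z||^2
      W2 = W2 + eta * np.outer(delta_out, y)      # hidden-to-output weights
      W1 = W1 + eta * np.outer(delta_hidden, x)   # input-to-hidden weights
      return W1, W2

  # Hypothetical usage with random data
  rng = np.random.default_rng(0)
  W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
  W1, W2 = backprop_step(np.array([0.5, -0.2]), np.array([1.0]), W1, W2)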

8
Training Hierarchical Feed-forward Visual
Recognition Models Using Transfer Learning from
Pseudo-Tasks
  • ECCV08
  • Kai Yu
  • Presented by Shuiwang Ji

9
Transfer Learning
  • Transfer learning, also known as multi-task
    learning, is a mechanism that improves
    generalization by leveraging shared
    domain-specific information contained in related
    tasks
  • In the setting considered in this paper, all
    tasks share the same input space

10
General Formulation
  • The main task to be learnt has index m with
    training examples
  • A neural network has a natural architecture to
    tackle this learning problem by minimizing
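    (The training set and objective are not preserved in the transcript; a
    generic regularized form consistent with the description, with training
    examples \{(x_n, y_n^m)\}_{n=1}^{N} for main task m, shared network
    parameters \theta, loss \ell, and regularizer \Omega, is assumed below.)

    \[ \min_{\theta}\; \sum_{n=1}^{N} \ell\big( y_n^{m},\, f_m(x_n; \theta) \big)
       + \gamma\, \Omega(\theta) \]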

11
General Formulation
  • The shared representation is learned by additionally
    introducing pseudo auxiliary tasks, each represented
    by learning a set of input-output pairs
  • The regularization term then becomes a sum of losses
    over the pseudo-tasks (a generic form is sketched
    after this list)
  • A Bayesian perspective (skipped)
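    (The exact regularization term is not preserved in the transcript; a
    generic form consistent with the description, with
    \{(x_n, y_n^k)\}_{n=1}^{N} the input-output pairs of the k-th of K
    pseudo-tasks, is:)

    \[ \Omega(\theta) = \sum_{k=1}^{K} \sum_{n=1}^{N}
       \ell\big( y_n^{k},\, f_k(x_n; \theta) \big) \]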

12
CNN for Transfer Learning
  • Input: 140x140-pixel images with five channels:
    R/G/B plus two additional channels Dx and Dy, the
    horizontal and vertical gradients of gray intensity
  • C1 layer: 16 filters of size 16 by 16
  • P1 layer: max pooling over each 5 by 5 neighborhood
  • C2 layer: 256 filters of size 6 by 6, with
    connections of sparsity 0.5 between the 16
    dimensions of the P1 layer and the 256 dimensions
    of the C2 layer
  • P2 layer: max pooling over each 5 by 5 neighborhood
  • Output layer: full connections between the (256 by
    4 by 4) P2 features and the outputs (a rough layer
    sketch follows this list)
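(A rough PyTorch sketch of the layer sizes described above. The sparse C2
connectivity is replaced by a dense convolution, and the choice of
nonlinearity and all names are illustrative assumptions, not from the paper.)

  import torch
  import torch.nn as nn

  class PseudoTaskCNN(nn.Module):
      """Approximate sketch of the CNN above."""
      def __init__(self, num_outputs):
          super().__init__()
          self.c1 = nn.Conv2d(5, 16, kernel_size=16)      # 140x140 -> 125x125
          self.p1 = nn.MaxPool2d(kernel_size=5)           # 125x125 -> 25x25
          self.c2 = nn.Conv2d(16, 256, kernel_size=6)     # 25x25 -> 20x20
          self.p2 = nn.MaxPool2d(kernel_size=5)           # 20x20 -> 4x4
          self.out = nn.Linear(256 * 4 * 4, num_outputs)  # full connections
          self.act = nn.Tanh()                            # nonlinearity is an assumption

      def forward(self, x):                               # x: (batch, 5, 140, 140)
          x = self.p1(self.act(self.c1(x)))
          x = self.p2(self.act(self.c2(x)))
          return self.out(x.flatten(1))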

13
Generating Pseudo Tasks
  • The pseudo-task is constructed by sampling a
    random 2D patch and using it as a template to
    form a local 2D filter that operates on every
    training image. The value assigned to an image
    under this task is taken to be the maximum over
    the result of this 2D convolution operation
  • This simple construction is brittle to scale,
    translation, and slight intensity variations (a toy
    sketch of the construction follows this list)
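(A toy numpy/scipy sketch of the patch-based pseudo-task described above:
sample a random patch, slide it over each image as a local filter, and take
the maximum response. The 16x16 patch size, the use of correlate2d as the
filtering operation, and all names are illustrative assumptions.)

  import numpy as np
  from scipy.signal import correlate2d

  def pseudo_task_value(image, patch):
      """Value of one pseudo-task for one image: the maximum response of
      the image to a randomly sampled 2D patch used as a local filter."""
      response = correlate2d(image, patch, mode='valid')  # slide the filter over the image
      return response.max()                               # max over all positions

  # Hypothetical usage: sample a random patch from one training image,
  # then evaluate every training image under that pseudo-task.
  rng = np.random.default_rng(0)
  images = rng.random((10, 140, 140))                     # toy stand-in for training images
  r, c = rng.integers(0, 140 - 16, size=2)
  patch = images[0, r:r + 16, c:c + 16]                   # random 16x16 template
  labels = np.array([pseudo_task_value(img, patch) for img in images])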

14
Generating Pseudo Tasks
  • Applying Gabor filters of 4 orientations and 16
    scales results in 64 feature maps of size 104x104
    for each image
  • Max pooling is performed first within each
    non-overlapping 4x4 neighborhood and then within
    each band of two successive scales, resulting in 32
    feature maps of size 26x26 for each image
  • A set of K RBF filters of size 7x7 with 4
    orientations is then sampled and used as the
    parameters of the pseudo-tasks, resulting in 8
    feature maps of size 20x20
  • Finally, max pooling is performed on the result
    across all the scales and within every
    non-overlapping 10x10 neighborhood, giving a 2x2
    feature map, which constitutes the value of this
    image under this pseudo-task (the map sizes are
    checked after this list)
  • This yields 4K pseudo-tasks (K actual random
    patches, each operating on a different quadrant
    of the image)
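(For reference, the reconstructed map sizes above are mutually consistent
assuming non-overlapping pooling and "valid" filtering:)

  \[ 104 / 4 = 26, \qquad 26 - 7 + 1 = 20, \qquad 20 / 10 = 2 \]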

15
Object Class Recognition and Localization Using
Sparse Features with Limited Receptive Fields,
IJCV, in press
16
Results on Caltech-101
0.18 seconds to test one image (the forward operation)
17
Gender and Ethnicity Recognition
18
First-layer Features
19
Convergence Rate