Title: Outline
1. Outline
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, November 1998.
2. Invariant Object Recognition
- The central goal of computer vision research is
to detect and recognize objects invariant to
scale, viewpoint, illumination, and other changes
3. (Invariant) Object Recognition
4. Generalization Performance
- Many classifiers are available: maximum likelihood estimation, Bayesian estimation, Parzen windows, k-nearest neighbor, discriminant functions, support vector machines, neural networks, decision trees, ...
- Which method is best for classifying unseen test data?
- The performance is often determined by the features
- In addition, we are interested in systems that can solve a particular problem well
5. Error Rate on Handwritten Digit Recognition
6. No Free Lunch Theorem
7. No Free Lunch Theorem (cont.)
8. Ugly Duckling Theorem
In the absence of prior information, there is no
principled reason to prefer one representation
over another.
9. Bias and Variance Dilemma
- Regression: find an estimate of a true but unknown function F(x) based on n samples generated by F(x)
- Bias: the difference between the expected value of the estimate and the true value; a low bias means that on average we accurately estimate F from the dataset D
- Variance: the variability of the estimate; a low variance means that the estimate does not change much as the training set varies
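Made precise, these two terms come from the standard decomposition of the mean-squared error of an estimate g(x; D) learned from a dataset D (a textbook identity, stated here for completeness):

    \mathbb{E}_D\big[(g(x;D) - F(x))^2\big]
      = \underbrace{\big(\mathbb{E}_D[g(x;D)] - F(x)\big)^2}_{\text{bias}^2}
      + \underbrace{\mathbb{E}_D\big[(g(x;D) - \mathbb{E}_D[g(x;D)])^2\big]}_{\text{variance}}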
10. Bias-Variance Dilemma
- When the training data is finite, there is an intrinsic problem for any family of classifier functions
- If the family is very generic, i.e., a non-parametric family, it suffers from high variance
- If the family is very specific, i.e., a parametric family, it suffers from high bias
- The central problem is to design a family of classifiers a priori such that both the variance and the bias are low (see the sketch below)
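A minimal sketch of this trade-off (my own illustration; the polynomial families, noise level, and sample sizes are assumptions, not from the slides): many resampled training sets are fit with a rigid degree-1 and a flexible degree-9 polynomial, and bias and variance are estimated empirically.

    import numpy as np

    rng = np.random.default_rng(0)
    F = lambda x: np.sin(2 * np.pi * x)        # true but unknown function F(x)
    xs = np.linspace(0, 1, 100)                # evaluation grid

    def fit_predict(degree, n=20, trials=200):
        """Fit `trials` independent training sets; return predictions on `xs`."""
        preds = np.empty((trials, xs.size))
        for t in range(trials):
            x = rng.uniform(0, 1, n)
            y = F(x) + rng.normal(0, 0.3, n)   # n noisy samples generated by F
            coef = np.polyfit(x, y, degree)
            preds[t] = np.polyval(coef, xs)
        return preds

    for degree in (1, 9):                      # specific vs. generic family
        p = fit_predict(degree)
        bias2 = np.mean((p.mean(axis=0) - F(xs)) ** 2)  # squared bias
        var = np.mean(p.var(axis=0))                    # variance across datasets
        print(f"degree {degree}: bias^2={bias2:.3f}  variance={var:.3f}")

The rigid family shows high bias and low variance; the flexible family shows the reverse.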
13. Bias and Variance vs. Model Complexity
14. Gap Between Training and Test Error
- Typically the error of a classifier on a disjoint test set is larger than its error on the training set
- The gap follows E_test - E_train = k (h/P)^α, where P is the number of training examples, h is a measure of capacity (model complexity), α is between 0.5 and 1, and k is a constant
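A quick numeric illustration of that scaling law (the values of k, h, and α below are arbitrary assumptions; only the dependence on P matters):

    k, h, alpha = 1.0, 100, 0.75               # assumed constants for illustration

    for P in (1_000, 10_000, 60_000):          # e.g. MNIST has 60,000 training images
        gap = k * (h / P) ** alpha             # E_test - E_train = k (h/P)^alpha
        print(f"P={P:>6}: predicted gap ~ {gap:.4f}")

As P grows, the predicted gap between test and training error shrinks.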
15. Check Reading System
16. End-to-End Training
17. Graph Transformer Networks
18. Training Using Gradient-Based Learning
- A multiple-module system can be trained using a gradient-based method (see the sketch below)
- Similar to the backpropagation used for multilayer perceptrons
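A minimal sketch of the idea (my own illustration, not the paper's code): each module exposes a forward pass and a backward pass, and gradients flow through the cascade of modules exactly as in backpropagation for a multilayer perceptron.

    import numpy as np

    class Linear:
        """One module: x -> Wx. Backward maps dE/dout to dE/din, stores dE/dW."""
        def __init__(self, n_in, n_out, rng):
            self.W = rng.normal(0, 0.1, (n_out, n_in))
        def forward(self, x):
            self.x = x
            return self.W @ x
        def backward(self, grad_out):
            self.dW = np.outer(grad_out, self.x)   # dE/dW
            return self.W.T @ grad_out             # dE/dx for the previous module

    class Tanh:
        def forward(self, x):
            self.y = np.tanh(x)
            return self.y
        def backward(self, grad_out):
            return grad_out * (1 - self.y ** 2)

    rng = np.random.default_rng(0)
    modules = [Linear(4, 8, rng), Tanh(), Linear(8, 2, rng)]  # a multi-module system

    x, target = rng.normal(size=4), np.array([1.0, -1.0])
    for m in modules:                      # forward through the cascade
        x = m.forward(x)
    grad = 2 * (x - target)                # gradient of a squared-error loss
    for m in reversed(modules):            # backward: chain rule, module by module
        grad = m.backward(grad)
    for m in modules:                      # gradient step on each module's parameters
        if hasattr(m, "W"):
            m.W -= 0.01 * m.dW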
20. Convolutional Networks
21. Handwritten Digit Recognition Using a Convolutional Network
22. Training a Convolutional Network
- The loss function used is the maximum-likelihood criterion E(W) = (1/P) Σ_p y_{D^p}(Z^p, W), the average output of the RBF unit for the correct class of each pattern Z^p
- The training algorithm is stochastic diagonal Levenberg-Marquardt
- The RBF output is given by y_i = Σ_j (x_j - w_ij)²
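A small numpy sketch of the two formulas above (the 84-dimensional input and 10 classes match LeNet-5's output layer; everything else here is illustrative):

    import numpy as np

    def rbf_outputs(x, W):
        """y_i = sum_j (x_j - w_ij)^2 : squared distance to each class prototype."""
        return ((x[None, :] - W) ** 2).sum(axis=1)

    def loss(X, labels, W):
        """E(W) = (1/P) sum_p y_{D^p}(Z^p, W): mean output of the correct class."""
        ys = np.array([rbf_outputs(x, W) for x in X])
        return ys[np.arange(len(labels)), labels].mean()

    rng = np.random.default_rng(0)
    W = rng.choice([-1.0, 1.0], size=(10, 84))   # 10 class prototypes x 84 inputs
    X = rng.normal(size=(5, 84))                  # 5 penultimate-layer vectors
    print(loss(X, np.array([0, 1, 2, 3, 4]), W))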
23. MNIST Dataset
- 60,000 training images
- 10,000 test images
- There are several different versions of the
dataset
24. Experimental Results
25. Experimental Results
26. Distorted Patterns
- By training with artificially distorted patterns, the test error rate dropped from 0.95% (without deformation) to 0.8%
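The distortions used in the paper are random planar affine transformations; a rough sketch of such augmentation on a 28x28 image, with nearest-neighbor resampling (the parameter ranges below are my guesses, not the paper's):

    import numpy as np

    def random_affine(img, rng, max_shift=2, max_scale=0.1, max_shear=0.1):
        """Warp `img` (H x W) by a small random affine map, nearest-neighbor sampled."""
        h, w = img.shape
        sx, sy = 1 + rng.uniform(-max_scale, max_scale, 2)   # random squeeze/stretch
        shear = rng.uniform(-max_shear, max_shear)
        tx, ty = rng.uniform(-max_shift, max_shift, 2)       # random translation
        A = np.array([[sx, shear], [0.0, sy]])
        c = np.array([h / 2, w / 2])
        out = np.zeros_like(img)
        for i in range(h):
            for j in range(w):
                src = A @ (np.array([i, j]) - c) + c + [tx, ty]  # source pixel
                si, sj = int(round(src[0])), int(round(src[1]))
                if 0 <= si < h and 0 <= sj < w:
                    out[i, j] = img[si, sj]
        return out

    rng = np.random.default_rng(0)
    digit = rng.random((28, 28))           # stand-in for an MNIST digit
    augmented = random_affine(digit, rng)  # one distorted training pattern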
27. Misclassified Examples
28. Comparison
29. Rejection Performance
30. Number of Operations
(Unit: thousand operations)
31. Memory Requirements
32. Robustness
33. Convolutional Network for Object Recognition
34. NORB Dataset
35. Convolutional Network for Object Recognition
36. Experimental Results
37. Jittered-Cluttered Dataset
38. Experimental Results
39. Face Detection
40. Face Detection
41. Multiple Object Recognition
- Based on heuristic over-segmentation (a toy sketch follows below)
- It avoids making hard decisions about segmentation by considering a large number of different segmentations
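A toy sketch of the idea (my illustration): rather than committing to one segmentation, enumerate every segmentation induced by a set of heuristic candidate cut points and keep them all for the recognizer to score later.

    from itertools import combinations

    def all_segmentations(width, cuts):
        """Every way to slice [0, width) at some subset of the candidate cuts."""
        for r in range(len(cuts) + 1):
            for chosen in combinations(cuts, r):
                bounds = [0, *chosen, width]
                yield [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]

    # e.g. a 40-pixel-wide word with heuristic candidate cuts at x = 12, 20, 28
    for seg in all_segmentations(40, (12, 20, 28)):
        print(seg)   # 2^3 = 8 candidate segmentations, scored later, not pruned now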
42. Graph Transformer Network for Character Recognition
43. Recognition Transformer and Interpretation Graph
44. Viterbi Training
45. Discriminative Viterbi Training
46. Discriminative Forward Training
47. Space Displacement Neural Networks
- By considering all possible locations, one can avoid explicit segmentation
- Similar to detection and recognition
48. Space Displacement Neural Networks
- We can replicate convolutional networks at all possible locations, as in the sketch below
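A small numpy sketch of why this replication comes for free in a convolutional network (sizes are illustrative assumptions): applying the same 1-D convolution to a wider input produces one output per displacement, equivalent to sliding the original network across the input.

    import numpy as np

    def conv1d_valid(x, k):
        """Valid 1-D convolution (correlation): one output per displacement."""
        n = len(x) - len(k) + 1
        return np.array([x[i:i + len(k)] @ k for i in range(n)])

    rng = np.random.default_rng(0)
    kernel = rng.normal(size=5)          # shared weights of one convolutional unit
    window = rng.normal(size=5)          # input exactly the size the net expects
    wide = np.concatenate([rng.normal(size=7), window, rng.normal(size=6)])

    out_single = conv1d_valid(window, kernel)   # network applied at one location
    out_all = conv1d_valid(wide, kernel)        # same weights over a wider input
    print(out_single[0], out_all[7])            # identical: displacement 7 of the SDNN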
49. Space Displacement Neural Networks
50. Space Displacement Neural Networks
51. Space Displacement Neural Networks
52. SDNN/HMM System
53. Graph Transformer Networks and Transducers
54. On-line Handwriting Recognition System
55. On-line Handwriting Recognition System
56. Comparative Results
57. Check Reading System
58. Confidence Estimation
59. Summary
- By carefully designing systems with the desired invariance properties, one can often achieve better generalization performance by limiting the system's capacity
- Multiple-module systems can often be trained effectively using gradient-based learning methods
- Even though in theory local gradient-based methods are subject to local minima, in practice this does not seem to be a serious problem
- Incorporating contextual information into recognition systems is often critical for real-world applications
- End-to-end training is often more effective