Title: Introduction to Pattern Recognition
1. Introduction to Pattern Recognition
2. Pattern Recognition System
Input -> Sensing -> Segmentation -> Feature Extraction -> Classification -> Post-processing -> Decision
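A minimal sketch of the stages above chained as plain functions; every function body here is an illustrative placeholder (not from the slides), with a toy thresholding "sensor" standing in for a real signal:

    def sense(raw):
        # Sensing: acquire the signal (here the raw list already is the signal).
        return raw

    def segment(signal):
        # Segmentation: isolate the object of interest (values above a threshold).
        return [v for v in signal if v > 0.5]

    def extract_features(region):
        # Feature extraction: summarize the region as a fixed-length feature vector.
        n = len(region)
        return (n, sum(region) / n if n else 0.0)

    def classify(features):
        # Classification: map the feature vector to a class label.
        count, mean = features
        return "bright object" if count > 3 and mean > 0.7 else "background"

    def post_process(label):
        # Post-processing: turn the label into the final decision.
        return "decision: " + label

    # Input -> Sensing -> Segmentation -> Feature Extraction -> Classification -> Post-processing -> Decision
    print(post_process(classify(extract_features(segment(sense([0.9, 0.8, 0.1, 0.95, 0.7]))))))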
3. Design Cycle
Start -> Data Collection -> Choose Features -> Choose Model -> Train Classifier -> Evaluate Classifier -> End
(Prior knowledge guides the choice of features and model; the training set is used to train the classifier, and a separate evaluation set is used to evaluate it.)
4. Feature Extractor and Classifier
The feature extractor maps the raw input into an abstract representation (the feature vector) on which the classifier operates. Feature extraction tends to be more heuristic, while classification is more theoretical.
5. Common Image Features
- Color
  - Color Coordinate System (RGB, YUV, HSI, ...)
  - Color Histogram and Its Moments
- Global Shape
  - Moments (Hu, Zernike)
  - Fourier Descriptor
- Texture
  - Co-occurrence Matrix
  - Gabor and Wavelet Transform
- Local Shape
  - Curvature, Turning Angles
6. Common Voice Features
- Pitch
- Voiced / Unvoiced
- Formants
- Silence
- Phoneme
7. Pre-Normalization
- Image
  - Rotation
  - Translation
  - Scaling
- Voice
  - Automatic Gain Control
  - Time warping
8. Feature Extraction
- Heuristic
- Application Specific
- However, there are some general rules for picking features:
  - The dimension of the feature vectors should be far less than the number of samples (the curse of dimensionality).
  - Principal Component Analysis (PCA)
  - Discriminant Analysis (Fisher Linear Discriminant)
9. Principal Component Analysis
- PCA seeks a projection that best represents the data in a least-squares sense.
- PCA reduces the dimensionality of the feature space by restricting attention to those directions along which the scatter of the data cloud is greatest.
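A minimal sketch of PCA as the eigendecomposition of the sample covariance matrix, keeping the directions of greatest scatter (NumPy is assumed; the toy data and variable names are illustrative):

    import numpy as np

    def pca(X, k):
        """Project d-dimensional samples (rows of X) onto the k directions
        of greatest scatter (top-k eigenvectors of the covariance matrix)."""
        mean = X.mean(axis=0)
        Xc = X - mean                           # center the data cloud
        cov = np.cov(Xc, rowvar=False)          # d x d sample covariance
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: covariance is symmetric
        order = np.argsort(eigvals)[::-1]       # sort by decreasing scatter
        W = eigvecs[:, order[:k]]               # d x k projection matrix
        return Xc @ W, W, mean

    # Toy usage: 200 samples in 5-D reduced to 2-D
    X = np.random.randn(200, 5)
    Y, W, mean = pca(X, k=2)
    print(Y.shape)   # (200, 2)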
10. Fisher Linear Discriminant (1)
- Fisher linear discriminant seeks a projection
that is efficient for discrimination.
11. Fisher Linear Discriminant (2)
12. Fisher Linear Discriminant (3)
- The discrimination ability of a particular
feature.
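A minimal two-class sketch of the Fisher linear discriminant in its usual form w proportional to Sw^-1 (m1 - m2), which maximizes between-class scatter relative to within-class scatter (NumPy assumed; the toy data is illustrative):

    import numpy as np

    def fisher_lda(X1, X2):
        """Two-class Fisher linear discriminant: the projection direction
        w = Sw^-1 (m1 - m2), where Sw is the within-class scatter matrix."""
        m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
        S1 = (X1 - m1).T @ (X1 - m1)
        S2 = (X2 - m2).T @ (X2 - m2)
        Sw = S1 + S2                          # within-class scatter
        w = np.linalg.solve(Sw, m1 - m2)      # direction maximizing the Fisher criterion
        return w / np.linalg.norm(w)

    # Toy usage: two Gaussian clouds in 2-D
    X1 = np.random.randn(100, 2) + [2.0, 0.0]
    X2 = np.random.randn(100, 2) + [-2.0, 0.0]
    w = fisher_lda(X1, X2)
    print("projection direction:", w)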
13. Features (Summary)
- Heuristic and Application Dependent.
- Curse of Dimensionality
- Principal Component Analysis
- Fisher Linear Discriminant
14. Types of Pattern Classification
- Supervised Classification
  - With Training Samples
- Unsupervised Classification (Clustering)
  - Without Training Samples
15. Approaches to Pattern Recognition
- Heuristic
  - Nearest Neighbor
- Statistical
  - Bayesian Classifier
  - Parameter Estimation
- Decision Tree
- Neural Networks
- Syntactic Method
16. Nearest Neighbor (1)
- Suppose there are n training samples, and let x' be the training sample nearest to a test sample x. The nearest-neighbor rule classifies x by assigning it the label associated with x'.
(In the figure, the test point would be labeled as red.)
17. Nearest Neighbor (2)
- Very simple.
- Computation intensive; there are data structures and algorithms to speed it up (k-d trees, BD trees).
- Requires a metric or distance function.
- In practice, if there are a large number of training samples, the performance of the nearest-neighbor rule is good.
- In theory, with an unlimited number of training samples, the error rate is never worse than twice the Bayes error.
18. K-Nearest-Neighbor
- The k-nearest-neighbor rule starts at the test point and grows a region until it encloses k training samples; it labels the test point by a majority vote of these samples.
(In the figure, k = 3 and the test point would be labeled as white.)
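A minimal brute-force sketch of the k-nearest-neighbor rule (k = 1 gives the nearest-neighbor rule of slides 16-17); NumPy, Euclidean distance, and the toy data are illustrative assumptions:

    import numpy as np
    from collections import Counter

    def knn_classify(X_train, y_train, x, k=3):
        """Label a test point x by majority vote among its k nearest
        training samples (Euclidean distance, brute force)."""
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]
        votes = Counter(y_train[i] for i in nearest)
        return votes.most_common(1)[0][0]

    # Toy usage: two labeled clusters in 2-D
    X_train = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
    y_train = np.array(["red", "red", "white", "white"])
    print(knn_classify(X_train, y_train, np.array([4.8, 5.2]), k=3))  # "white"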
19. Bayesian Classification (1)
20. Bayesian Classification (2)
21. Bayesian Classification (3)
22. Bayesian Classification (4)
(Figure: example of 1-D Gaussians with two classes. Scanned from Pattern Classification by Duda, Hart, and Stork.)
23. Bayesian Classification (5)
(Figure: example of 1-D Gaussians with two classes. Scanned from Pattern Classification by Duda, Hart, and Stork.)
24. Bayesian Classification (6)
(Figure: the decision boundary is a circle. Scanned from Pattern Classification by Duda, Hart, and Stork.)
25. Bayesian Classification (7)
(Figure: the decision boundaries are lines. Scanned from Pattern Classification by Duda, Hart, and Stork.)
26. Bayesian Classification (8)
(Figure: the decision boundary is an ellipse. Scanned from Pattern Classification by Duda, Hart, and Stork.)
27. Bayesian Classification (9)
(Figure: the decision boundary is a parabola. Scanned from Pattern Classification by Duda, Hart, and Stork.)
28. Bayesian Classification (10)
(Figure: the decision boundary is a hyperbola. Scanned from Pattern Classification by Duda, Hart, and Stork.)
29. Bayesian Classification (11)
(Figure: example of 2-D Gaussians with several classes. Scanned from Pattern Classification by Duda, Hart, and Stork.)
30. Bayesian Classification (Summary)
- The basic idea underlying Bayesian classification is very simple: assign each sample to the class with the maximum posterior probability.
- If the underlying distributions are multivariate Gaussian, the decision boundaries will be hyperquadrics.
- The Bayes error rate is the minimum achievable error rate.
- In practice, the likelihood (class-conditional distribution) is unknown.
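A minimal sketch of a maximum-posterior classifier with Gaussian class-conditional densities, assuming (unrealistically, as the last bullet notes) that the class means, covariances, and priors are known; NumPy and the toy numbers are illustrative:

    import numpy as np

    def gaussian_log_likelihood(x, mean, cov):
        """Log of the multivariate Gaussian density N(x; mean, cov)."""
        d = len(mean)
        diff = x - mean
        return -0.5 * (diff @ np.linalg.solve(cov, diff)
                       + np.log(np.linalg.det(cov))
                       + d * np.log(2 * np.pi))

    def bayes_classify(x, means, covs, priors):
        """Assign x to the class with maximum posterior:
        argmax_i  log p(x | class i) + log P(class i)."""
        scores = [gaussian_log_likelihood(x, m, c) + np.log(p)
                  for m, c, p in zip(means, covs, priors)]
        return int(np.argmax(scores))

    # Toy usage: two 2-D Gaussian classes with equal priors
    means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
    covs = [np.eye(2), 2.0 * np.eye(2)]
    priors = [0.5, 0.5]
    print(bayes_classify(np.array([2.5, 2.0]), means, covs, priors))  # class 1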
31. Decision Tree (Basic)
Cut the feature space with straight lines which
are parallel to the axes.
How do we find the cut automatically?
32. Decision Tree (Impurity Measurement)
- Entropy
- Gini index
- Misclassification
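A minimal sketch of the three impurity measures and a greedy search for the best axis-parallel cut, one way to "find the cut automatically" as asked on slide 31 (NumPy assumed; function names and the toy data are illustrative):

    import numpy as np

    def impurity(labels, measure="entropy"):
        """Impurity of a node containing the given class labels."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        if measure == "entropy":
            return -np.sum(p * np.log2(p))
        if measure == "gini":
            return 1.0 - np.sum(p ** 2)
        return 1.0 - p.max()          # misclassification impurity

    def best_split(X, y, measure="entropy"):
        """Greedy search over axis-parallel cuts for the largest impurity drop."""
        n, d = X.shape
        best = (None, None, -np.inf)  # (feature, threshold, impurity drop)
        parent = impurity(y, measure)
        for j in range(d):
            for t in np.unique(X[:, j]):
                left, right = y[X[:, j] <= t], y[X[:, j] > t]
                if len(left) == 0 or len(right) == 0:
                    continue
                drop = parent - (len(left) * impurity(left, measure)
                                 + len(right) * impurity(right, measure)) / n
                if drop > best[2]:
                    best = (j, t, drop)
        return best

    # Toy usage: feature 0 separates the classes
    X = np.array([[0.1, 3.0], [0.2, 1.0], [0.9, 2.0], [0.8, 0.5]])
    y = np.array([0, 0, 1, 1])
    print(best_split(X, y))   # splits on feature 0 near 0.2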
33. Decision Tree (Impurity Measurement)
(Figure scanned from Pattern Classification by Duda, Hart, and Stork.)
34. Decision Tree (Construction)
35. Decision Tree (Overfitting Problem)
We can eventually make the leaf nodes contain training samples from only one class. Is that good?
No, because we are going to classify unseen test samples, not training samples.
If there is no convincing prior knowledge, a less complex classifier should be preferred, especially if the number of training samples is small (Occam's razor).
There are ways to decide experimentally when to stop splitting (cross-validation).
Which curve do you prefer?
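A minimal sketch of picking the tree complexity by cross-validation instead of growing pure leaves; scikit-learn's DecisionTreeClassifier and cross_val_score are assumed available, and the synthetic data is illustrative only:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    # Synthetic two-class data (illustrative)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

    # Pick the depth whose cross-validated accuracy is highest,
    # instead of splitting until every leaf is pure (which overfits).
    scores = {d: cross_val_score(DecisionTreeClassifier(max_depth=d), X, y, cv=5).mean()
              for d in range(1, 11)}
    best_depth = max(scores, key=scores.get)
    print(best_depth, scores[best_depth])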
36. Decision Tree (Summary)
- Cut the feature space with straight lines (hyperplanes) parallel to the axes.
- An impurity measurement is used to select the best cut; the idea is to make the children as pure as possible.
- Avoid the overfitting problem.
- CART, ID3, C4.5
37. Parameter Estimation
- Non-parametric method (histogram)
- Parametric method (parameter estimation)
  - Parameter estimation determines the values of the parameters from a set of training samples.
  - Maximum-Likelihood (ML) Estimation
  - Maximum-A-Posteriori (MAP) Estimation
  - Bayesian Estimation (Bayesian Learning)
38. ML Parameter Estimation
Assume that the true parameters are fixed but unknown. The goal is to find their values from a set of training samples.
39. ML Parameter Estimation - Example
The variance is known; we want to estimate the mean. What is the ML estimate for this example?
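A worked step under the slide's assumptions (Gaussian likelihood with known variance sigma^2): maximizing the log-likelihood over the mean gives the sample mean.

    \hat{\mu}_{\mathrm{ML}}
      = \arg\max_{\mu} \sum_{i=1}^{n} \ln p(x_i \mid \mu)
      = \arg\max_{\mu} \Bigl( -\frac{1}{2\sigma^{2}} \sum_{i=1}^{n} (x_i - \mu)^{2} \Bigr)
      = \frac{1}{n} \sum_{i=1}^{n} x_i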
40. MAP Parameter Estimation
Similar to ML, except that it maximizes the posterior p(theta | D), proportional to p(D | theta) p(theta).
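Continuing the same Gaussian example, and additionally assuming a Gaussian prior N(mu_0, sigma_0^2) on the mean (an illustrative choice, not from the slides), the MAP estimate is a compromise between the prior mean and the sample mean:

    \hat{\mu}_{\mathrm{MAP}}
      = \arg\max_{\mu} \Bigl[ \ln p(\mu) + \sum_{i=1}^{n} \ln p(x_i \mid \mu) \Bigr]
      = \frac{\sigma^{2}\mu_0 + \sigma_0^{2}\sum_{i=1}^{n} x_i}{\sigma^{2} + n\,\sigma_0^{2}}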
41. Bayesian Parameter Estimation (1)
Consider the parameters to be random variables.
What is the difference between p(x) and p(x | D)?
42. Bayesian Parameter Estimation (2)
43. Bayesian Parameter Estimation (3)
Incremental Learning (Online Learning)
44. Bayesian Parameter Estimation (4)
Example: Bayesian estimation of a Gaussian mean. (Figure scanned from Pattern Classification by Duda, Hart, and Stork.)
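A minimal sketch of Bayesian estimation of a Gaussian mean with known noise variance and a Gaussian prior, updated one sample at a time to illustrate the incremental (online) learning of slide 43 (NumPy assumed; the prior and the data are illustrative):

    import numpy as np

    def update_posterior(mu_0, var_0, x, noise_var):
        """One incremental Bayesian update of a Gaussian posterior over the mean.
        Prior N(mu_0, var_0) plus one observation x ~ N(mean, noise_var)
        gives another Gaussian posterior, returned as (mean, variance)."""
        var_n = 1.0 / (1.0 / var_0 + 1.0 / noise_var)
        mu_n = var_n * (mu_0 / var_0 + x / noise_var)
        return mu_n, var_n

    # Toy usage: samples from N(2, 1), broad prior on the mean
    rng = np.random.default_rng(0)
    post_mu, post_var = 0.0, 100.0          # prior belief about the mean
    for x in rng.normal(2.0, 1.0, size=50):
        post_mu, post_var = update_posterior(post_mu, post_var, x, noise_var=1.0)
    print(post_mu, post_var)                # posterior concentrates near 2.0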
45. Parameter Estimation (Summary)
- Both ML and Bayesian methods are used for parameter estimation in parametric models.
- Generally, the estimates we get from both methods are nearly identical.
- However, their approaches are conceptually different: Maximum-Likelihood views the true parameters as fixed, while Bayesian learning considers the parameters to be random variables.
46. Advanced Topics
- Hidden Markov Model
- Support Vector Machine
- Bayesian Belief Networks
47. Clustering
48. Applications
- Optical Character Recognition (OCR)
  - Printed Character
  - Handwritten Character
  - Online Handwritten Character
- Face Recognition
- Fingerprint Identification