Title: Introduction to Pattern Recognition
1. Introduction to Pattern Recognition
2. Pattern Recognition System
Input -> Sensing -> Segmentation -> Feature Extraction -> Classification -> Post-processing -> Decision
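A minimal sketch of the stages above chained as plain functions; every function body here is an illustrative placeholder (not from the slides), with a toy thresholding "sensor" standing in for a real signal:

    def sense(raw):
        # Sensing: acquire the signal (here the raw list already is the signal).
        return raw

    def segment(signal):
        # Segmentation: isolate the object of interest (values above a threshold).
        return [v for v in signal if v > 0.5]

    def extract_features(region):
        # Feature extraction: summarize the region as a fixed-length feature vector.
        n = len(region)
        return (n, sum(region) / n if n else 0.0)

    def classify(features):
        # Classification: map the feature vector to a class label.
        count, mean = features
        return "bright object" if count > 3 and mean > 0.7 else "background"

    def post_process(label):
        # Post-processing: turn the label into the final decision.
        return "decision: " + label

    # Input -> Sensing -> Segmentation -> Feature Extraction -> Classification -> Post-processing -> Decision
    print(post_process(classify(extract_features(segment(sense([0.9, 0.8, 0.1, 0.95, 0.7]))))))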
3. Design Cycle
Start -> Data Collection -> Choose Features -> Choose Model -> Train Classifier -> Evaluate Classifier -> End
(Prior knowledge guides the choice of features and model; the training set is used to train the classifier, and a separate evaluation set is used to evaluate it.)
4. Feature Extractor and Classifier
The feature extractor maps the raw input into an abstract representation (the feature vector) on which the classifier operates. Feature extraction tends to be more heuristic, while classification is more theoretical.
5. Common Image Features
- Color
  - Color Coordinate System (RGB, YUV, HSI, ...)
  - Color Histogram and Its Moments
- Global Shape
  - Moments (Hu, Zernike)
  - Fourier Descriptor
- Texture
  - Co-occurrence Matrix
  - Gabor and Wavelet Transform
- Local Shape
  - Curvature, Turning Angles
6. Common Voice Features
- Pitch
- Voiced / Unvoiced
- Formants
- Silence
- Phoneme
7. Pre-Normalization
- Image
  - Rotation
  - Translation
  - Scaling
- Voice
  - Automatic Gain Control
  - Time warping
8. Feature Extraction
- Heuristic
- Application Specific
- However, there are some general rules for picking features:
  - The dimension of the feature vectors should be far less than the number of samples (the curse of dimensionality).
  - Principal Component Analysis (PCA)
  - Discriminant Analysis (Fisher Linear Discriminant)
9. Principal Component Analysis
- PCA seeks a projection that best represents the data in a least-squares sense.
- PCA reduces the dimensionality of the feature space by restricting attention to those directions along which the scatter of the data cloud is greatest.
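A minimal sketch of PCA as the eigendecomposition of the sample covariance matrix, keeping the directions of greatest scatter (NumPy is assumed; the toy data and variable names are illustrative):

    import numpy as np

    def pca(X, k):
        """Project d-dimensional samples (rows of X) onto the k directions
        of greatest scatter (top-k eigenvectors of the covariance matrix)."""
        mean = X.mean(axis=0)
        Xc = X - mean                           # center the data cloud
        cov = np.cov(Xc, rowvar=False)          # d x d sample covariance
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: covariance is symmetric
        order = np.argsort(eigvals)[::-1]       # sort by decreasing scatter
        W = eigvecs[:, order[:k]]               # d x k projection matrix
        return Xc @ W, W, mean

    # Toy usage: 200 samples in 5-D reduced to 2-D
    X = np.random.randn(200, 5)
    Y, W, mean = pca(X, k=2)
    print(Y.shape)   # (200, 2)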
10. Fisher Linear Discriminant (1)
- Fisher linear discriminant seeks a projection
that is efficient for discrimination.
11. Fisher Linear Discriminant (2)
12. Fisher Linear Discriminant (3)
- The discrimination ability of a particular
feature.
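A minimal two-class sketch of the Fisher linear discriminant in its usual form w proportional to Sw^-1 (m1 - m2), which maximizes between-class scatter relative to within-class scatter (NumPy assumed; the toy data is illustrative):

    import numpy as np

    def fisher_lda(X1, X2):
        """Two-class Fisher linear discriminant: the projection direction
        w = Sw^-1 (m1 - m2), where Sw is the within-class scatter matrix."""
        m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
        S1 = (X1 - m1).T @ (X1 - m1)
        S2 = (X2 - m2).T @ (X2 - m2)
        Sw = S1 + S2                          # within-class scatter
        w = np.linalg.solve(Sw, m1 - m2)      # direction maximizing the Fisher criterion
        return w / np.linalg.norm(w)

    # Toy usage: two Gaussian clouds in 2-D
    X1 = np.random.randn(100, 2) + [2.0, 0.0]
    X2 = np.random.randn(100, 2) + [-2.0, 0.0]
    w = fisher_lda(X1, X2)
    print("projection direction:", w)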
13. Features (Summary)
- Heuristic and Application Dependent.
- Curse of Dimensionality
- Principal Component Analysis
- Fisher Linear Discriminant
14. Types of Pattern Classification
- Supervised Classification
  - With Training Samples
- Unsupervised Classification (Clustering)
  - Without Training Samples
15. Approaches to Pattern Recognition
- Heuristic
  - Nearest Neighbor
- Statistical
  - Bayesian Classifier
  - Parameter Estimation
- Decision Tree
- Neural Networks
- Syntactic Method
16. Nearest Neighbor (1)
- Suppose there are n training samples, and let x' be the training sample nearest to a test sample x. The nearest-neighbor rule classifies x by assigning it the label associated with x'.
(In the figure, the test point would be labeled as red.)
17. Nearest Neighbor (2)
- Very simple.
- Computation intensive; there are data structures and algorithms to speed it up (k-d trees, BD trees).
- Requires a metric or distance function.
- In practice, if there are a large number of training samples, the performance of the nearest-neighbor rule is good.
- In theory, with an unlimited number of training samples, the error rate is never worse than twice the Bayes error.
18. K-Nearest-Neighbor
- The k-nearest-neighbor rule starts at the test point and grows a region until it encloses k training samples; it labels the test point by a majority vote of these samples.
(In the figure, k = 3 and the test point would be labeled as white.)
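A minimal brute-force sketch of the k-nearest-neighbor rule (k = 1 gives the nearest-neighbor rule of slides 16-17); NumPy, Euclidean distance, and the toy data are illustrative assumptions:

    import numpy as np
    from collections import Counter

    def knn_classify(X_train, y_train, x, k=3):
        """Label a test point x by majority vote among its k nearest
        training samples (Euclidean distance, brute force)."""
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]
        votes = Counter(y_train[i] for i in nearest)
        return votes.most_common(1)[0][0]

    # Toy usage: two labeled clusters in 2-D
    X_train = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
    y_train = np.array(["red", "red", "white", "white"])
    print(knn_classify(X_train, y_train, np.array([4.8, 5.2]), k=3))  # "white"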
19. Bayesian Classification (1)
20. Bayesian Classification (2)
21. Bayesian Classification (3)
22. Bayesian Classification (4)
(Figure: example of 1-D Gaussians with two classes. Scanned from Pattern Classification by Duda, Hart, and Stork.)
23. Bayesian Classification (5)
(Figure: example of 1-D Gaussians with two classes. Scanned from Pattern Classification by Duda, Hart, and Stork.)
24. Bayesian Classification (6)
(Figure: the decision boundary is a circle. Scanned from Pattern Classification by Duda, Hart, and Stork.)
25. Bayesian Classification (7)
(Figure: the decision boundaries are lines. Scanned from Pattern Classification by Duda, Hart, and Stork.)
26. Bayesian Classification (8)
(Figure: the decision boundary is an ellipse. Scanned from Pattern Classification by Duda, Hart, and Stork.)
27. Bayesian Classification (9)
(Figure: the decision boundary is a parabola. Scanned from Pattern Classification by Duda, Hart, and Stork.)
28. Bayesian Classification (10)
(Figure: the decision boundary is a hyperbola. Scanned from Pattern Classification by Duda, Hart, and Stork.)
29. Bayesian Classification (11)
(Figure: example of 2-D Gaussians with several classes. Scanned from Pattern Classification by Duda, Hart, and Stork.)
30. Bayesian Classification (Summary)
- The basic idea underlying Bayesian classification is very simple: assign each sample to the class with the maximum posterior probability.
- If the underlying distributions are multivariate Gaussian, the decision boundaries will be hyperquadrics.
- The Bayes error rate is the minimum achievable error rate.
- In practice, the likelihood (class-conditional distribution) is unknown.
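A minimal sketch of a maximum-posterior classifier with Gaussian class-conditional densities, assuming (unrealistically, as the last bullet notes) that the class means, covariances, and priors are known; NumPy and the toy numbers are illustrative:

    import numpy as np

    def gaussian_log_likelihood(x, mean, cov):
        """Log of the multivariate Gaussian density N(x; mean, cov)."""
        d = len(mean)
        diff = x - mean
        return -0.5 * (diff @ np.linalg.solve(cov, diff)
                       + np.log(np.linalg.det(cov))
                       + d * np.log(2 * np.pi))

    def bayes_classify(x, means, covs, priors):
        """Assign x to the class with maximum posterior:
        argmax_i  log p(x | class i) + log P(class i)."""
        scores = [gaussian_log_likelihood(x, m, c) + np.log(p)
                  for m, c, p in zip(means, covs, priors)]
        return int(np.argmax(scores))

    # Toy usage: two 2-D Gaussian classes with equal priors
    means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
    covs = [np.eye(2), 2.0 * np.eye(2)]
    priors = [0.5, 0.5]
    print(bayes_classify(np.array([2.5, 2.0]), means, covs, priors))  # class 1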
31. Decision Tree (Basic)
Cut the feature space with straight lines which
are parallel to the axes.
How do we find the cut automatically?
32. Decision Tree (Impurity Measurement)
- Entropy
- Gini index
- Misclassification
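A minimal sketch of the three impurity measures and a greedy search for the best axis-parallel cut, one way to "find the cut automatically" as asked on slide 31 (NumPy assumed; function names and the toy data are illustrative):

    import numpy as np

    def impurity(labels, measure="entropy"):
        """Impurity of a node containing the given class labels."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        if measure == "entropy":
            return -np.sum(p * np.log2(p))
        if measure == "gini":
            return 1.0 - np.sum(p ** 2)
        return 1.0 - p.max()          # misclassification impurity

    def best_split(X, y, measure="entropy"):
        """Greedy search over axis-parallel cuts for the largest impurity drop."""
        n, d = X.shape
        best = (None, None, -np.inf)  # (feature, threshold, impurity drop)
        parent = impurity(y, measure)
        for j in range(d):
            for t in np.unique(X[:, j]):
                left, right = y[X[:, j] <= t], y[X[:, j] > t]
                if len(left) == 0 or len(right) == 0:
                    continue
                drop = parent - (len(left) * impurity(left, measure)
                                 + len(right) * impurity(right, measure)) / n
                if drop > best[2]:
                    best = (j, t, drop)
        return best

    # Toy usage: feature 0 separates the classes
    X = np.array([[0.1, 3.0], [0.2, 1.0], [0.9, 2.0], [0.8, 0.5]])
    y = np.array([0, 0, 1, 1])
    print(best_split(X, y))   # splits on feature 0 near 0.2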
33. Decision Tree (Impurity Measurement)
(Figure scanned from Pattern Classification by Duda, Hart, and Stork.)
34. Decision Tree (Construction)
35. Decision Tree (Overfitting Problem)
We can eventually make the leaf nodes contain training samples from only one class. Is that good?
No, because we are going to classify unseen test samples, not training samples.
If there is no convincing prior knowledge, a less complex classifier should be preferred, especially if the number of training samples is small (Occam's razor).
There are ways to decide experimentally when to stop splitting (cross-validation).
Which curve do you prefer?
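A minimal sketch of picking the tree complexity by cross-validation instead of growing pure leaves; scikit-learn's DecisionTreeClassifier and cross_val_score are assumed available, and the synthetic data is illustrative only:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.model_selection import cross_val_score

    # Synthetic two-class data (illustrative)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

    # Pick the depth whose cross-validated accuracy is highest,
    # instead of splitting until every leaf is pure (which overfits).
    scores = {d: cross_val_score(DecisionTreeClassifier(max_depth=d), X, y, cv=5).mean()
              for d in range(1, 11)}
    best_depth = max(scores, key=scores.get)
    print(best_depth, scores[best_depth])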
36. Decision Tree (Summary)
- Cut the feature space with straight lines (hyperplanes) parallel to the axes.
- An impurity measurement is used to select the best cut; the idea is to make the children as pure as possible.
- Avoid the overfitting problem.
- CART, ID3, C4.5
37. Parameter Estimation
- Non-parametric method (histogram)
- Parametric method (parameter estimation)
  - Parameter estimation determines the values of the parameters from a set of training samples.
  - Maximum-Likelihood (ML) Estimation
  - Maximum-A-Posteriori (MAP) Estimation
  - Bayesian Estimation (Bayesian Learning)
38. ML Parameter Estimation
Assume that the true parameters are fixed but unknown. The goal is to find their values from a set of training samples.
39. ML Parameter Estimation - Example
The variance is known; we want to estimate the mean. What is the ML estimate for this example?
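A worked step under the slide's assumptions (Gaussian likelihood with known variance sigma^2): maximizing the log-likelihood over the mean gives the sample mean.

    \hat{\mu}_{\mathrm{ML}}
      = \arg\max_{\mu} \sum_{i=1}^{n} \ln p(x_i \mid \mu)
      = \arg\max_{\mu} \Bigl( -\frac{1}{2\sigma^{2}} \sum_{i=1}^{n} (x_i - \mu)^{2} \Bigr)
      = \frac{1}{n} \sum_{i=1}^{n} x_i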
40. MAP Parameter Estimation
Similar to ML, except that it maximizes the posterior p(theta | D), proportional to p(D | theta) p(theta).
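Continuing the same Gaussian example, and additionally assuming a Gaussian prior N(mu_0, sigma_0^2) on the mean (an illustrative choice, not from the slides), the MAP estimate is a compromise between the prior mean and the sample mean:

    \hat{\mu}_{\mathrm{MAP}}
      = \arg\max_{\mu} \Bigl[ \ln p(\mu) + \sum_{i=1}^{n} \ln p(x_i \mid \mu) \Bigr]
      = \frac{\sigma^{2}\mu_0 + \sigma_0^{2}\sum_{i=1}^{n} x_i}{\sigma^{2} + n\,\sigma_0^{2}}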
41. Bayesian Parameter Estimation (1)
Consider the parameters to be random variables.
What is the difference between p(x) and p(x | D)?
42. Bayesian Parameter Estimation (2)
43. Bayesian Parameter Estimation (3)
Incremental Learning (Online Learning)
44. Bayesian Parameter Estimation (4)
Example: Bayesian estimation of a Gaussian mean. (Figure scanned from Pattern Classification by Duda, Hart, and Stork.)
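A minimal sketch of Bayesian estimation of a Gaussian mean with known noise variance and a Gaussian prior, updated one sample at a time to illustrate the incremental (online) learning of slide 43 (NumPy assumed; the prior and the data are illustrative):

    import numpy as np

    def update_posterior(mu_0, var_0, x, noise_var):
        """One incremental Bayesian update of a Gaussian posterior over the mean.
        Prior N(mu_0, var_0) plus one observation x ~ N(mean, noise_var)
        gives another Gaussian posterior, returned as (mean, variance)."""
        var_n = 1.0 / (1.0 / var_0 + 1.0 / noise_var)
        mu_n = var_n * (mu_0 / var_0 + x / noise_var)
        return mu_n, var_n

    # Toy usage: samples from N(2, 1), broad prior on the mean
    rng = np.random.default_rng(0)
    post_mu, post_var = 0.0, 100.0          # prior belief about the mean
    for x in rng.normal(2.0, 1.0, size=50):
        post_mu, post_var = update_posterior(post_mu, post_var, x, noise_var=1.0)
    print(post_mu, post_var)                # posterior concentrates near 2.0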
45. Parameter Estimation (Summary)
- Both ML and Bayesian methods are used for parameter estimation in parametric models.
- Generally, the estimates we get from both methods are nearly identical.
- However, their approaches are conceptually different: Maximum-Likelihood views the true parameters as fixed, while Bayesian learning considers the parameters to be random variables.
46. Advanced Topics
- Hidden Markov Model
- Support Vector Machine
- Bayesian Belief Networks
47. Clustering
48. Applications
- Optical Character Recognition (OCR)
  - Printed Character
  - Handwritten Character
  - Online Handwritten Character
- Face Recognition
- Fingerprint Identification