Introduction to Pattern Recognition - PowerPoint PPT Presentation

About This Presentation

Title:

Introduction to Pattern Recognition

Description:

Most of the material in these slides was taken from the figures in ... (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2001 ...

Slides: 55

Provided by: ctap1

Transcript and Presenter's Notes

Title: Introduction to Pattern Recognition

1
Introduction to Pattern Recognition

Charles Tappert
Seidenberg School of CSIS, Pace University

2
Pattern Classification

Most of the material in these slides was taken
from the figures in Pattern Classification (2nd ed)
by R. O. Duda, P. E. Hart and D. G. Stork,
John Wiley & Sons, 2001

3
What is pattern recognition?

Definition from Duda et al.: "the act of taking
in raw data and taking an action based on the
category of the pattern"
We gain an understanding of and appreciation for
pattern recognition in the real world, most
particularly in humans

4
An Introductory Example

Sorting incoming fish on a conveyor according to
species using optical sensing
Species: sea bass, salmon

5
Problem Analysis

Set up a camera and take some sample images to
extract features
Length
Lightness
Width
Number and shape of fins
Position of the mouth, etc.

6
Pattern Classification System

Preprocessing
Segment (isolate) the fish from one another and
from the background
Feature Extraction
Reduce the data by measuring certain features
Classification
Divide the feature space into decision regions

7
(No Transcript)
8
Classification

Initially use the length of the fish as a
possible feature for discrimination

9
(No Transcript)
10
Feature Selection

Length alone is a poor feature!
Select the lightness as a possible feature

11
(No Transcript)
12
Threshold decision boundary and cost relationship

Move the decision boundary toward smaller values of
lightness in order to minimize the cost (reduce
the number of sea bass that are classified as
salmon!)
Task of decision theory
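
The effect of unequal costs on the threshold can be sketched as follows. This is only an illustration, not the procedure from the slides: the decision rule (lightness below the threshold means salmon), the cost values, and the function name best_threshold are all assumptions.

```python
def best_threshold(lightness, labels, cost_bass_as_salmon=2.0, cost_salmon_as_bass=1.0):
    """Return (threshold, total_cost) with the lowest total cost.

    Assumed rule: lightness < threshold -> "salmon", otherwise "sea bass".
    The cost values are illustrative, not taken from the slides.
    """
    candidates = sorted(set(lightness))
    # Also try a threshold above every sample (everything called salmon).
    candidates.append(max(lightness) + 1.0)
    best_t, best_cost = None, float("inf")
    for t in candidates:
        cost = 0.0
        for x, y in zip(lightness, labels):
            predicted = "salmon" if x < t else "sea bass"
            if predicted == "salmon" and y == "sea bass":
                cost += cost_bass_as_salmon
            elif predicted == "sea bass" and y == "salmon":
                cost += cost_salmon_as_bass
        if cost < best_cost:  # keep the first threshold that reaches the minimum
            best_t, best_cost = t, cost
    return best_t, best_cost
```

Raising cost_bass_as_salmon relative to cost_salmon_as_bass should push the chosen threshold toward smaller lightness values, which is the behavior the slide describes.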

13
Feature Vector

Adopt the lightness and add the width of the fish
to the feature vector
Fish: x^T = [x1, x2]
Features: lightness, width
14
Straight line decision boundary
15
Features

We might add other features that are not highly
correlated with the ones we already have. Be sure
not to reduce the performance by adding noisy
features
Ideally, you might think the best decision
boundary is the one that provides optimal
performance on the training data (see the
following figure)

16
Is this a good decision boundary?
17
Decision Boundary Choice

Our satisfaction is premature because the central
aim of designing a classifier is to correctly
classify new (test) input
Issue of generalization!

18
Better decision boundary
19
Pattern Recognition Stages

Sensing
Use of a transducer (camera or microphone)
The PR system depends on the bandwidth,
resolution, sensitivity, and distortion of the
transducer
Segmentation and grouping
Patterns should be well separated and should not
overlap

20
Pattern Recognition Stages (cont)

Feature extraction
Discriminative features
Invariant features with respect to translation,
rotation, and scale
Classification
Use the feature vector provided by a feature
extractor to assign the object to a category
Post Processing
Exploit context-dependent information to improve
performance

21
(No Transcript)
22
The Design Cycle

Data collection
Feature Choice
Model Choice
Training
Evaluation
Computational Complexity

23
(No Transcript)
24
Data Collection

How do we know when we have collected an
adequately large and representative set of
examples for training and testing the system?

25
Choice of Features

Depends on the characteristics of the problem
domain
Simple to extract, invariant to irrelevant
transformations, insensitive to noise

26
Model Choice

If we are unsatisfied with the performance of our
fish classifier, we may want to jump to another
class of model

27
Training

Use data to determine the classifier
(Many different procedures for training
classifiers and choosing models)

28
Evaluation

Measure the error rate (or performance)
Possibly switch from one set of features to
another one

29
Computational Complexity

What is the trade-off between computational ease
and performance?
How does an algorithm scale as a function of the
number of features, patterns, or categories?

30
Learning and Adaptation

Supervised learning
A teacher provides a category label for each
pattern in the training set
Unsupervised learning
The system forms clusters or natural groupings
of the unlabeled input patterns

31
Introductory example conclusion

The reader may be overwhelmed by the number,
complexity, and magnitude of the sub-problems of
Pattern Recognition
Many of these sub-problems can indeed be solved
Many fascinating unsolved problems still remain

32
Bayesian Decision Theory

Fundamental statistical approach
Assumes relevant probabilities are known
Makes optimal decisions

e.g., P(x | ω1) and P(x | ω2) describe the
difference in lightness between populations of
sea bass and salmon (see next slide)
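
A minimal sketch of a Bayes decision for one feature x, assuming the class-conditional densities and the priors are known. The Gaussian form and the function names are assumptions made for illustration; the slides do not specify the densities.

```python
import math

def gaussian_pdf(x, mean, std):
    """Normal density; used here only as an assumed class-conditional p(x | w_i)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def posteriors(x, likelihoods, priors):
    """Bayes rule: P(w_i | x) = p(x | w_i) P(w_i) / sum_j p(x | w_j) P(w_j)."""
    joint = [lik(x) * prior for lik, prior in zip(likelihoods, priors)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

def bayes_decide(x, likelihoods, priors):
    """Return the index of the class with the largest posterior."""
    post = posteriors(x, likelihoods, priors)
    return max(range(len(post)), key=lambda i: post[i])
```

With equal priors the decision boundary falls where the two densities cross; unequal priors shift it toward the less probable class.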
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
Nonparametric Techniques

Probabilities are not known
Two approaches
Estimate the density functions from sample
patterns
Bypass probability estimation
Nearest neighbor: asymptotic error never worse
than twice the Bayesian error
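
The first approach, estimating a density from sample patterns, might look like the sketch below. It uses a Gaussian window (a Parzen-window style estimate) as one possible choice; the slides do not name a specific estimator, and the window width h is an assumed parameter.

```python
import math

def parzen_density(x, samples, h):
    """Gaussian-window estimate of p(x) from 1-D samples.

    h is the window width (an assumed smoothing parameter): small h gives a
    spiky estimate, large h a smooth one.
    """
    total = 0.0
    for s in samples:
        z = (x - s) / h
        total += math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return total / (len(samples) * h)
```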

37
(No Transcript)
38
(No Transcript)
39
Simple PR System

Good for feasibility studies; easy to implement
Extract features
Normalize features to 0-1 range
Classify by nearest neighbor
Using Euclidean distance
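
The three steps above (after feature extraction) can be sketched as follows. The function names are hypothetical and this is not the code used in the studies mentioned later; it only illustrates min-max normalization followed by a Euclidean nearest-neighbor decision.

```python
import math

def fit_min_max(vectors):
    """Per-feature minimum and maximum over the training vectors."""
    dims = range(len(vectors[0]))
    lows = [min(v[d] for v in vectors) for d in dims]
    highs = [max(v[d] for v in vectors) for d in dims]
    return lows, highs

def normalize(vector, lows, highs):
    """Scale each feature to the 0-1 range of the training data.

    Constant features (high == low) map to 0.0 to avoid division by zero.
    """
    return [(x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(vector, lows, highs)]

def nearest_neighbor(query, train_vectors, train_labels):
    """Label of the training vector closest to query in Euclidean distance."""
    best = min(range(len(train_vectors)),
               key=lambda i: math.dist(query, train_vectors[i]))
    return train_labels[best]
```

Normalizing first keeps a feature with a large numeric range from dominating the Euclidean distance.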

40
Simple PR System (cont)

Two modes of operation
Leave one out procedure
One input file of training/test patterns
Good for feasibility study with little data
Train and test on separate files
One input file for training
One input file for testing
Good for measuring performance change when
varying an independent variable (e.g., different
keyboards for keystroke biometric)
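
A sketch of the leave-one-out mode with a nearest-neighbor classifier, assuming the feature vectors are already normalized. The function name is hypothetical.

```python
import math

def leave_one_out_accuracy(vectors, labels):
    """Classify each pattern by its nearest neighbor among all the others."""
    correct = 0
    for i, query in enumerate(vectors):
        # Every pattern except the i-th serves as training data for this test.
        others = [j for j in range(len(vectors)) if j != i]
        nearest = min(others, key=lambda j: math.dist(query, vectors[j]))
        if labels[nearest] == labels[i]:
            correct += 1
    return correct / len(vectors)
```

Each pattern is used once for testing and n-1 times for training, which is why the procedure suits studies with little data.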

41
Simple PR System (cont)

Used in keystroke biometric studies
Feasibility study: Dr. Mary Curtin
Different keyboards/modes: Dr. Mary Villani
Study of procedures for handling incomplete and
missing data (e.g., fallback procedures in the
keystroke biometric system): Mark Ritzmann
Also used in Mouse Movement and Stylometry
Projects

42
Linear Discriminant Functions

Linear function of a set of parameters
Hyperplane decision boundaries
Methods
Simple Perceptron
Solve linear algebra directly
Support Vector Machines (SVM)
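
As an illustration of the first method, here is a minimal fixed-increment perceptron sketch. The labels are assumed to be +1 and -1, and the epoch limit and learning rate are illustrative defaults; the loop only stops early if the data are linearly separable.

```python
def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Fixed-increment perceptron; labels are +1 or -1. Returns (weights, bias)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or exactly on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every sample is on the correct side of the hyperplane
            break
    return w, b

def predict(w, b, x):
    """Side of the hyperplane w.x + b = 0 on which x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```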

43
(No Transcript)
44
(No Transcript)
45
Multilayer Neural Networks

Feedforward networks trained with the backpropagation algorithm
A three-layer neural network has an input layer,
a hidden layer, and an output layer
interconnected by modifiable weights represented
by links between layers
Benefits
Simplicity of learning algorithm
Ease of model selection
Incorporation of heuristics/constraints
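
A minimal sketch of a three-layer network with one output unit, showing a forward pass and a single backpropagation update for the squared error 0.5 * (target - output)^2. Sigmoid units, the single output, and the learning rate are assumptions made to keep the example short.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, w1, b1, w2, b2):
    """Input -> hidden (sigmoid) -> single output (sigmoid)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)
    return hidden, output

def backprop_step(x, target, w1, b1, w2, b2, lr=0.5):
    """One gradient-descent update on a single training pattern."""
    hidden, output = forward(x, w1, b1, w2, b2)
    # Error signal at the output unit (derivative of the error w.r.t. its net input).
    delta_out = (output - target) * output * (1.0 - output)
    # Error signals propagated back to the hidden units through the old weights.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w2, hidden)]
    new_w2 = [w - lr * delta_out * h for w, h in zip(w2, hidden)]
    new_b2 = b2 - lr * delta_out
    new_w1 = [[w - lr * d * xi for w, xi in zip(row, x)]
              for row, d in zip(w1, delta_hidden)]
    new_b1 = [b - lr * d for b, d in zip(b1, delta_hidden)]
    return new_w1, new_b1, new_w2, new_b2
```

Training would repeat backprop_step over all patterns for many passes; only one step is shown here.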

46
(No Transcript)
47
(No Transcript)
48
Stochastic Methods

Rely on randomness to find model parameters
Used for highly complex problems where gradient
descent algorithms are unlikely to work
Methods
Simulated annealing
Boltzmann learning
Genetic algorithms
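
A sketch of simulated annealing on a one-dimensional parameter, to illustrate the idea of a randomized search. The geometric cooling schedule, the step size, and all default values are assumptions, not settings from the slides.

```python
import math
import random

def simulated_annealing(energy, start, steps=2000, t0=1.0, cooling=0.995,
                        step=0.5, seed=0):
    """Minimize energy(x) over a single real parameter x.

    Returns the best (x, energy) pair seen during the search.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    x, e = start, energy(start)
    best_x, best_e = x, e
    t = t0
    for _ in range(steps):
        cand = x + rng.uniform(-step, step)
        e_cand = energy(cand)
        # Always accept improvements; accept worse moves with prob exp(-dE / T).
        if e_cand <= e or rng.random() < math.exp(-(e_cand - e) / t):
            x, e = cand, e_cand
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling  # lower the temperature a little on every step
    return best_x, best_e
```

At high temperature the search accepts many uphill moves and can escape local minima; as the temperature falls it behaves more and more like a greedy descent.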

49
Nonmetric Methods

Nominal data
No measure of distance between vectors
No notion of similarity or ordering
Methods
Decision trees
Grammatical methods
e.g., finite state machines
Rule-based systems
e.g., propositional logic or first-order logic
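
A decision tree over nominal attributes can be sketched as nested branches that are followed by exact match, with no distance computation. The fruit attributes and categories in TREE are a made-up illustration, and the tree is written by hand rather than learned from data.

```python
def classify(tree, sample):
    """Follow branches until a leaf is reached.

    Leaves are category strings; internal nodes are (attribute, branches) pairs.
    """
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[sample[attribute]]
    return tree

# Hypothetical hand-built tree over nominal attributes.
TREE = ("color", {
    "green": ("size", {"big": "watermelon", "small": "grape"}),
    "yellow": "banana",
    "red": "apple",
})
```

For example, classify(TREE, {"color": "green", "size": "small"}) follows the "green" branch and then the "small" branch.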

50
Unsupervised Learning

Often called clustering
The system is not given a set of labeled patterns
for training
Instead the system establishes the classes itself
based on the regularities of the patterns

51
Clustering Separate Clouds

Methods work fine when clusters form well
separated compact clouds
Less well when there are great differences in the
number of samples in different clusters

52
Hierarchical Clustering

Sometimes clusters are not disjoint, but may have
subclusters, which in turn have
sub-subclusters, etc.
Consider partitioning n samples into clusters
Start with n clusters, each one containing exactly
one sample
Then partition into n-1 clusters, then into n-2,
etc.
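
The n, n-1, n-2, ... sequence above can be sketched as a bottom-up merging procedure. Merging the two closest clusters by single linkage (distance between their nearest members) is an assumption; the slide does not say which merge criterion is used.

```python
import math

def agglomerative(points):
    """Return the clusterings from n singletons down to one cluster.

    Each clustering is a list of clusters; each cluster is a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    history = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
        history.append([list(c) for c in clusters])
    return history
```

The returned history holds one clustering per level, which is the information a dendrogram draws.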

53
Dendrogram of uppercase As from DPS
Dissertation by Mary Manfredi
54
Conclusions