1
Introduction to Pattern Recognition
  • Charles Tappert
  • Seidenberg School of CSIS, Pace University

2
Pattern Classification
Most of the material in these slides was taken
from the figures in Pattern Classification (2nd
ed) by R. O. Duda, P. E. Hart, and D. G. Stork,
John Wiley & Sons, 2001
3
What is pattern recognition?
  • Definition from Duda et al.: the act of taking
    in raw data and taking an action based on the
    category of the pattern
  • We gain an understanding of and appreciation
    for pattern recognition in the real world, most
    particularly in humans

4
An Introductory Example
  • Sorting incoming fish on a conveyor according
    to species using optical sensing
  • Two species: sea bass and salmon

5
Problem Analysis
  • Set up a camera and take some sample images to
    extract features
  • Length
  • Lightness
  • Width
  • Number and shape of fins
  • Position of the mouth, etc.

6
Pattern Classification System
  • Preprocessing
  • Segment (isolate) fishes from one another and
    from the background
  • Feature Extraction
  • Reduce the data by measuring certain features
  • Classification
  • Divide the feature space into decision regions

7
(No Transcript)
8
Classification
  • Initially use the length of the fish as a
    possible feature for discrimination

9
(No Transcript)
10
Feature Selection
  • The length is a poor feature alone!
  • Select the lightness as a possible feature

11
(No Transcript)
12
Threshold decision boundary and cost relationship
  • Move the decision boundary toward smaller
    values of lightness in order to minimize the
    cost (reduce the number of sea bass classified
    as salmon!)
  • This is a task of decision theory; a sketch
    follows below
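
A minimal sketch of this cost-weighted threshold
choice, in Python; the lightness values and the
two costs are made-up assumptions, not numbers
from the slides:

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical lightness samples for each species
    sea_bass = rng.normal(7.0, 1.0, 100)
    salmon = rng.normal(4.0, 1.0, 100)

    # Classify "salmon" below the threshold. Calling a sea
    # bass "salmon" is costed higher than the reverse, so the
    # best threshold shifts toward smaller lightness values.
    COST_BASS_AS_SALMON = 2.0
    COST_SALMON_AS_BASS = 1.0

    def total_cost(threshold):
        bass_errors = np.sum(sea_bass < threshold)    # bass called salmon
        salmon_errors = np.sum(salmon >= threshold)   # salmon called bass
        return (COST_BASS_AS_SALMON * bass_errors
                + COST_SALMON_AS_BASS * salmon_errors)

    candidates = np.linspace(2.0, 9.0, 200)
    best = min(candidates, key=total_cost)
    print(f"cost-minimizing threshold: {best:.2f}")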

13
Feature Vector
  • Adopt the lightness and add the width of the
    fish to the feature vector
  • Fish: x^T = [x1, x2], where x1 = lightness and
    x2 = width
14
Straight line decision boundary
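
A sketch of such a straight-line boundary over the
two features; the weights and bias here are
arbitrary illustrative values, not fitted ones:

    import numpy as np

    w = np.array([-1.0, 0.8])   # hypothetical weights for [lightness, width]
    b = 2.5                     # hypothetical bias

    def classify(x):
        # One side of the line w.x + b = 0 is salmon, the other sea bass.
        return "salmon" if np.dot(w, x) + b > 0 else "sea bass"

    print(classify(np.array([4.0, 5.0])))   # darker, wider     -> salmon
    print(classify(np.array([8.0, 3.0])))   # lighter, narrower -> sea bass
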
15
Features
  • We might add other features that are not highly
    correlated with the ones we already have. Be sure
    not to reduce the performance by adding noisy
    features
  • Ideally, you might think the best decision
    boundary is the one that provides optimal
    performance on the training data (see the
    following figure)

16
Is this a good decision boundary?
17
Decision Boundary Choice
  • Our satisfaction is premature because the central
    aim of designing a classifier is to correctly
    classify new (test) input
  • Issue of generalization!

18
Better decision boundary
19
Pattern Recognition Stages
  • Sensing
  • Use of a transducer (camera or microphone)
  • The PR system depends on the bandwidth,
    resolution, sensitivity, and distortion of the
    transducer
  • Segmentation and grouping
  • Patterns should be well separated and should not
    overlap

20
Pattern Recognition Stages (cont)
  • Feature extraction
  • Discriminative features
  • Invariant features with respect to translation,
    rotation, and scale
  • Classification
  • Use the feature vector provided by a feature
    extractor to assign the object to a category
  • Post Processing
  • Exploit context-dependent information to improve
    performance

21
(No Transcript)
22
The Design Cycle
  • Data collection
  • Feature Choice
  • Model Choice
  • Training
  • Evaluation
  • Computational Complexity

23
(No Transcript)
24
Data Collection
  • How do we know when we have collected an
    adequately large and representative set of
    examples for training and testing the system?

25
Choice of Features
  • Depends on the characteristics of the problem
    domain
  • Simple to extract, invariant to irrelevant
    transformations, insensitive to noise

26
Model Choice
  • We may be unsatisfied with the performance of
    our fish classifier and want to jump to another
    class of model

27
Training
  • Use data to determine the classifier
  • (Many different procedures for training
    classifiers and choosing models)

28
Evaluation
  • Measure the error rate (or performance)
  • Possibly switch from one set of features to
    another one

29
Computational Complexity
  • What is the trade-off between computational ease
    and performance?
  • How does an algorithm scale as a function of the
    number of features, patterns, or categories?

30
Learning and Adaptation
  • Supervised learning
  • A teacher provides a category label for each
    pattern in the training set
  • Unsupervised learning
  • The system forms clusters or natural groupings
    of the unlabeled input patterns

31
Introductory example conclusion
  • Reader may be overwhelmed by the number,
    complexity, and magnitude of the sub-problems of
    Pattern Recognition
  • Many of these sub-problems can indeed be solved
  • Many fascinating unsolved problems still remain

32
Bayesian Decision Theory
  • Fundamental statistical approach
  • Assumes relevant probabilities are known
  • Makes optimal decisions

e.g., P(x | ω1) and P(x | ω2) describe the
difference in lightness between populations of
sea bass and salmon (see next slide)
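
A sketch of the resulting minimum-error decision
rule, assuming Gaussian class-conditional
densities for lightness; the means, spreads, and
priors are illustrative assumptions:

    from math import exp, pi, sqrt

    def gaussian_pdf(x, mean, std):
        return exp(-0.5 * ((x - mean) / std) ** 2) / (std * sqrt(2 * pi))

    PRIOR = {"sea bass": 0.6, "salmon": 0.4}    # assumed P(ωi)
    LIKELIHOOD = {"sea bass": (7.0, 1.0),       # assumed mean, std of P(x|ωi)
                  "salmon": (4.0, 1.0)}

    def decide(x):
        # Pick the class with the larger posterior P(ωi|x) ∝ P(x|ωi) P(ωi).
        posteriors = {c: gaussian_pdf(x, *LIKELIHOOD[c]) * PRIOR[c]
                      for c in PRIOR}
        return max(posteriors, key=posteriors.get)

    print(decide(5.0))   # nearer the salmon mean   -> salmon
    print(decide(6.5))   # nearer the sea bass mean -> sea bass
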
33
(No Transcript)
34
(No Transcript)
35
(No Transcript)
36
Nonparametric Techniques
  • Probabilities are not known
  • Two approaches
  • Estimate the density functions from sample
    patterns
  • Bypass probability estimation
  • e.g., nearest neighbor, whose asymptotic error
    is never worse than twice the Bayesian error
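
A sketch of the first approach, estimating a
density from sample patterns with a Parzen
(kernel) window; the samples and the window width
are made up:

    import numpy as np

    samples = np.array([3.1, 3.8, 4.0, 4.4, 5.2, 6.9, 7.3, 7.8])
    H = 0.5   # window (bandwidth) parameter, chosen arbitrarily

    def parzen_estimate(x):
        # Average of Gaussian kernels centered on each sample.
        kernels = (np.exp(-0.5 * ((x - samples) / H) ** 2)
                   / (H * np.sqrt(2 * np.pi)))
        return kernels.mean()

    for x in (4.0, 6.0, 7.5):
        print(f"estimated density at {x}: {parzen_estimate(x):.3f}")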

37
(No Transcript)
38
(No Transcript)
39
Simple PR System
  • Good for feasibility studies; easy to implement
  • Extract features
  • Normalize features to 0-1 range
  • Classify by nearest neighbor
  • Using Euclidean distance
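
A minimal sketch of this system: features are
normalized to the 0-1 range using the training
set's minima and maxima, and a test pattern takes
the label of its nearest training pattern by
Euclidean distance. The training data is made up:

    import numpy as np

    train_X = np.array([[3.9, 5.1], [4.2, 4.8],
                        [7.1, 3.0], [7.6, 3.3]])   # [lightness, width]
    train_y = ["salmon", "salmon", "sea bass", "sea bass"]

    lo, hi = train_X.min(axis=0), train_X.max(axis=0)

    def normalize(X):
        return (X - lo) / (hi - lo)   # map each feature to the 0-1 range

    train_N = normalize(train_X)

    def nearest_neighbor(x):
        dists = np.linalg.norm(train_N - normalize(x), axis=1)
        return train_y[int(np.argmin(dists))]

    print(nearest_neighbor(np.array([4.0, 5.0])))   # -> salmon
    print(nearest_neighbor(np.array([7.4, 3.1])))   # -> sea bass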

40
Simple PR System (cont)
  • Two modes of operation
  • Leave-one-out procedure
  • One input file of training/test patterns
  • Good for feasibility study with little data
  • Train and test on separate files
  • One input file for training
  • One input file for testing
  • Good for measuring performance change when
    varying an independent variable (e.g., different
    keyboards for keystroke biometric)
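
A sketch of the leave-one-out mode on a single
file of (already normalized) patterns: each
pattern is classified against all the others. The
data is illustrative:

    import numpy as np

    X = np.array([[0.10, 0.90], [0.20, 0.80], [0.15, 0.85],
                  [0.90, 0.10], [1.00, 0.20]])
    y = ["salmon", "salmon", "salmon", "sea bass", "sea bass"]

    def leave_one_out_error(X, y):
        errors = 0
        for i in range(len(X)):
            rest_X = np.delete(X, i, axis=0)    # hold out pattern i
            rest_y = y[:i] + y[i + 1:]
            dists = np.linalg.norm(rest_X - X[i], axis=1)
            if rest_y[int(np.argmin(dists))] != y[i]:
                errors += 1
        return errors / len(X)

    print(f"leave-one-out error rate: {leave_one_out_error(X, y):.2f}")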

41
Simple PR System (cont)
  • Used in keystroke biometric studies
  • Feasibility study (Dr. Mary Curtin)
  • Different keyboards/modes (Dr. Mary Villani)
  • Study of procedures for handling incomplete and
    missing data, e.g., fallback procedures in the
    keystroke biometric system (Mark Ritzmann)
  • Also used in Mouse Movement and Stylometry
    Projects

42
Linear Discriminant Functions
  • Linear function of a set of parameters
  • Hyperplane decision boundaries
  • Methods
  • Simple Perceptron
  • Solve directly with linear algebra
  • Support Vector Machines (SVM)
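
A sketch of the simple perceptron rule on
illustrative, linearly separable data: the weight
vector is corrected whenever a training pattern
is misclassified:

    import numpy as np

    X = np.array([[4.0, 5.0], [4.5, 4.6], [7.0, 3.0], [7.5, 3.4]])
    y = np.array([1, 1, -1, -1])    # +1 = salmon, -1 = sea bass

    Xa = np.hstack([X, np.ones((len(X), 1))])   # augment with a bias input
    w = np.zeros(3)

    for _ in range(100):            # bounded number of passes
        updated = False
        for xi, yi in zip(Xa, y):
            if yi * np.dot(w, xi) <= 0:   # misclassified or on the boundary
                w += yi * xi              # perceptron update
                updated = True
        if not updated:
            break                         # all training patterns correct

    print("weights (w1, w2, bias):", w)
    print("predictions:", np.sign(Xa @ w))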

43
(No Transcript)
44
(No Transcript)
45
Multilayer Neural Networks
  • Feedforward networks trained with the
    backpropagation algorithm
  • A three-layer neural network has an input layer,
    a hidden layer, and an output layer
    interconnected by modifiable weights represented
    by links between layers
  • Benefits
  • Simplicity of learning algorithm
  • Ease of model selection
  • Incorporation of heuristics/constraints
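
A minimal sketch of a three-layer feedforward
network trained with backpropagation, here on the
XOR problem; the layer sizes, learning rate, and
iteration count are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(1)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    t = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

    W1 = rng.normal(0.0, 1.0, (2, 4))   # input -> hidden weights
    b1 = np.zeros(4)
    W2 = rng.normal(0.0, 1.0, (4, 1))   # hidden -> output weights
    b2 = np.zeros(1)
    LR = 0.5

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for _ in range(10000):
        hidden = sigmoid(X @ W1 + b1)           # forward pass
        out = sigmoid(hidden @ W2 + b2)
        d_out = (out - t) * out * (1 - out)     # backward pass
        d_hid = (d_out @ W2.T) * hidden * (1 - hidden)
        W2 -= LR * hidden.T @ d_out             # modify the weights
        b2 -= LR * d_out.sum(axis=0)
        W1 -= LR * X.T @ d_hid
        b1 -= LR * d_hid.sum(axis=0)

    print(out.ravel().round(2))   # should approach the targets 0, 1, 1, 0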

46
(No Transcript)
47
(No Transcript)
48
Stochastic Methods
  • Rely on randomness to find model parameters
  • Used for highly complex problems where gradient
    descent algorithms are unlikely to work
  • Methods
  • Simulated annealing
  • Boltzmann learning
  • Genetic algorithms
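
A sketch of simulated annealing on a made-up
one-dimensional objective: worse moves are
accepted with a probability that shrinks as the
temperature falls, which lets the search escape
local minima:

    import math, random

    random.seed(0)

    def energy(x):
        return x ** 4 - 3 * x ** 2 + x   # illustrative function to minimize

    x = 0.0
    temperature = 2.0
    while temperature > 1e-3:
        candidate = x + random.uniform(-0.5, 0.5)   # random neighbor
        delta = energy(candidate) - energy(x)
        if delta < 0 or random.random() < math.exp(-delta / temperature):
            x = candidate                           # accept the move
        temperature *= 0.995                        # cool down

    print(f"found x = {x:.3f}, energy = {energy(x):.3f}")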

49
Nonmetric Methods
  • Nominal data
  • No measure of distance between vectors
  • No notion of similarity or ordering
  • Methods
  • Decision trees
  • Grammatical methods
  • e.g., finite state machines
  • Rule-based systems
  • e.g., propositional logic or first-order logic
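
A sketch of a nonmetric, rule-based classifier:
nested tests on nominal attributes, with no
distance or similarity computed anywhere. The
attributes and rules are invented for
illustration:

    def classify(fish):
        # A hand-built decision tree over nominal features.
        if fish["skin"] == "dark":
            return "salmon" if fish["mouth"] == "upturned" else "sea bass"
        return "sea bass"

    print(classify({"skin": "dark", "mouth": "upturned"}))    # -> salmon
    print(classify({"skin": "light", "mouth": "upturned"}))   # -> sea bass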

50
Unsupervised Learning
  • Often called clustering
  • The system is not given a set of labeled patterns
    for training
  • Instead the system establishes the classes itself
    based on the regularities of the patterns

51
Clustering Separate Clouds
  • These methods work well when the clusters form
    well-separated, compact clouds
  • They work less well when there are great
    differences in the number of samples in
    different clusters
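
A sketch using k-means, a standard clustering
method (the slides do not name one), to form two
groups from unlabeled points drawn as two compact,
well-separated clouds:

    import numpy as np

    rng = np.random.default_rng(2)
    data = np.vstack([rng.normal(0.0, 0.3, (30, 2)),    # cloud 1
                      rng.normal(3.0, 0.3, (30, 2))])   # cloud 2

    k = 2
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(20):
        # Assign each point to its nearest center, then recompute centers.
        labels = np.argmin(np.linalg.norm(data[:, None] - centers, axis=2),
                           axis=1)
        centers = np.array([data[labels == j].mean(axis=0) for j in range(k)])

    print("cluster centers:")
    print(centers.round(2))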

52
Hierarchical Clustering
  • Sometimes clusters are not disjoint, but may
    have subclusters, which in turn have
    sub-subclusters, etc.
  • Consider partitioning n samples into clusters
  • Start with n clusters, each containing exactly
    one sample
  • Then partition into n-1 clusters, then into
    n-2, etc. (see the sketch below)
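
A sketch of this bottom-up scheme: start with n
singleton clusters and repeatedly merge the two
closest, printing each partition on the way.
Single-linkage distance is used here as one common
choice; the points are illustrative:

    import numpy as np

    points = np.array([[0.0, 0.0], [0.1, 0.2], [3.0, 3.0],
                       [3.2, 2.9], [6.0, 0.0]])
    clusters = [[i] for i in range(len(points))]   # n singleton clusters

    def linkage(a, b):
        # Single linkage: smallest pairwise distance between two clusters.
        return min(np.linalg.norm(points[i] - points[j])
                   for i in a for j in b)

    while len(clusters) > 1:
        # Merge the closest pair: n clusters -> n-1 -> ... -> 1.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters.pop(j)
        print(clusters)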

53
Dendrogram of uppercase A's from the DPS
dissertation by Mary Manfredi
54
Conclusions
  • PR systems are used in many areas of research
  • DPS dissertations that used PR systems
  • Visual systems: Rick Bassett, Sheb Bishop, Tom
    Lombardi
  • Speech recognition: Jonathan Law
  • Handwriting: Mary Manfredi
  • NLP: Bashir Ahmed
  • Keystroke biometric: Mary Curtin, Mary Villani,
    Mark Ritzmann
  • Fundamental research areas: Kwang Lee, Carl
    Abrams
  • DPS dissertations in progress using PR systems
  • Ted Markowitz, John Galatti