Title: Pattern Recognition Concepts
1. Pattern Recognition Concepts
Dr. Ramprasad Bala, Computer and Information Science, UMASS Dartmouth
CIS 585: Image Processing and Machine Vision
2. Pattern Recognition Problem
- In many practical problems, there is a need to make some decision about the content of an image or about the classification of an object that it contains.
- Character recognition on a PDA
- ATM face recognition
- In general, we must also accommodate an unknown state (handled later by a reject class).
3. Pattern Recognition
- Definition: The process of matching an object instance to a single object prototype or class definition is called verification.
- One definition of recognition is "to know again."
- A recognition system must contain some memory of the objects that it is to recognize.
4. Image Memory
- This memory representation might be:
- built-in (a frog's model of a fly)
- taught by a large number of samples (a student learning the alphabet)
- programmed in terms of specific image features.
6. Common Model for Classification
- 1. Classes
- There is a set of m known classes of objects (either programmed or trained).
- There will be a reject class for objects that cannot be placed in one of the known classes.
- An ideal class is a set of objects having some important common property; the class to which an object belongs is denoted by a class label. Classification is a process that assigns a label to an object according to the properties of the object. A classifier is a device or algorithm that takes an object as input and outputs the object's label.
7. Common Model
- 2. Sensor/Transducer
- There must be some device that captures the object and provides a digital representation (image) of the object as input to the classifier.
- Generic, off-the-shelf devices are often used.
- For computer vision: digital color cameras.
8. Common Model
- 3. Feature Extractor
- The feature extractor extracts information relevant to classification from the data produced by the sensor. This is usually done in software, but can also be done in hardware.
- 4. Classifier
- The classifier uses the features extracted from the sensed object data to assign the object to one of the m designated classes C1, C2, ..., Cm-1, Cm, or to Cr, where Cr denotes the reject class.
9. A d-dimensional input feature vector is processed by m class-specific function boxes; each box uses the knowledge K of its class. Finally, the results are compared and a decision is made on the class label.
10. Building a classification system
- Each part of this system has many alternative implementations:
- Chapter 2: image sensors
- Chapter 3: binary image features
- Chapter 5: gray-scale image features
- Chapter 6: color image features
- Chapter 7: texture features
11. Evaluation of System Error
- The error rate of a classification system is one measure of how well the system solves the problem for which it was designed.
- Other factors are speed, cost (hardware, software), development cost, etc.
- The classifier makes a classification error whenever it classifies the input object as class Ci when the true class is Cj (i ≠ j and Ci ≠ Cr).
12.
- The empirical error rate of a classification system is the number of errors made on independent test data divided by the number of classifications attempted.
- The empirical reject rate of a classification system is the number of rejects made on independent test data divided by the number of classifications attempted (both rates are computed in the sketch after this list).
- Independent test data are sample objects with true class known (including objects from the reject class) that were not used in designing the feature extraction and classification algorithms.
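A minimal sketch of these two rates in Python; the function name and the reject-label convention are illustrative assumptions, not from the text:

```python
def empirical_rates(true_labels, assigned_labels, reject_label="reject"):
    """Empirical error and reject rates on independent test data."""
    attempts = len(true_labels)
    rejects = sum(1 for a in assigned_labels if a == reject_label)
    # A reject is not counted as an error (assigned class != Cr).
    errors = sum(1 for t, a in zip(true_labels, assigned_labels)
                 if a != reject_label and a != t)
    return errors / attempts, rejects / attempts
```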
13. False Alarms and False Dismissals
- Some problems are special two-class problems:
- good objects vs. bad objects
- object present vs. absent
- person has disease vs. does not have disease
- In each case, false positives and false negatives have different consequences.
14. Receiver Operating Curve
In order to increase the percentage of objects correctly recognized, one usually has to pay the cost of incorrectly passing along objects that should be rejected.
15. Precision versus Recall
- In document retrieval (DR) or image retrieval, the objective is to retrieve interesting objects of class C1, and not too many uninteresting objects of class C2, according to the features supplied.
- The precision of a DR system is the number of relevant documents retrieved divided by the total number of documents retrieved.
- The recall of a DR system is the number of relevant documents retrieved by the system divided by the total number of relevant documents in the database (see the sketch after this list).
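A minimal sketch, assuming the retrieved and relevant document IDs are available as Python sets (the names are illustrative):

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one retrieval experiment."""
    hits = len(retrieved & relevant)          # relevant documents retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```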
16. Features used for Representation
- What representation or encoding of the object is used in the recognition process?
- Consider the case of a handwritten character recognition (CR) system. Useful features include:
- the area of the character in units of black pixels
- the height and width of the bounding box
- the number of holes inside the character
- the center (centroid) of the set of pixels
- the best axis direction through the pixels, as the axis of least inertia
- the second moments of the pixels about the axes of least and most inertia (a sketch of a few of these follows).
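A sketch of a few of these features, assuming the character is given as a binary NumPy array with 1s at black pixels (hole counting and the inertia axes are omitted here):

```python
import numpy as np

def basic_features(char):
    """Area, bounding box, and centroid of a binary character image."""
    rows, cols = np.nonzero(char)                 # coordinates of black pixels
    area = rows.size                              # black-pixel count
    height = rows.max() - rows.min() + 1          # bounding-box height
    width = cols.max() - cols.min() + 1           # bounding-box width
    centroid = (rows.mean(), cols.mean())         # center of the pixel set
    return area, height, width, centroid
```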
17. Assuming there is no error in computing the features, a sequential decision procedure can be used to classify instances of these 10 classes. Algorithm 4.1 outlines this procedure; the structure is called a decision tree.
19. Feature Vector Representation
- Objects may be classified based on their representation as a vector of measurements.
- Suppose each object is represented by exactly d measurements.
- The similarity, or closeness, between feature vectors can be described using the Euclidean distance between the vectors (written out below).
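Written out, the Euclidean distance between two d-dimensional feature vectors x and y is:

```latex
\|\mathbf{x} - \mathbf{y}\| = \sqrt{\sum_{k=1}^{d} (x_k - y_k)^2}
```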
20. Implementing the Classifier
- Assume there are m classes of objects, not including the reject class, and that there are ni sample vectors for class i.
- In our character recognition example, we had:
- m = 10 (classes)
- n = 100 (samples, say)
- d = 8 (features)
21. Nearest Class Mean
- A simple classification algorithm is to summarize the sample data for each class using the class mean vector, or centroid.
- An unknown object with feature vector x is classified as class i if it is closer to the mean vector of class i than to any other class mean.
- x can be put into the reject class if it is not close enough to any of the classes, as in the sketch after this list.
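A minimal sketch of the rule, assuming the class means have already been computed from the training samples (the rejection-threshold convention is an illustrative assumption):

```python
import numpy as np

def nearest_class_mean(x, class_means, reject_threshold=None):
    """Assign x to the class with the closest mean vector."""
    dists = {label: np.linalg.norm(x - mean)
             for label, mean in class_means.items()}
    best = min(dists, key=dists.get)              # closest class mean
    if reject_threshold is not None and dists[best] > reject_threshold:
        return "reject"                           # not close enough to any class
    return best
```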
22. Returning to Figure 4.2, we now have a definition of the function boxes: the ith function box computes the distance between the unknown input x and the mean vector of the training samples from that class.
23.
- While simple to implement, the nearest-class-mean rule does not work well for a variety of problems:
- the classes are not regular, and in one case the class is multi-modal.
- In such cases a scaled Euclidean distance may be considered (one common form is shown below).
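One common form scales each feature dimension by that class's standard deviation, so that dimensions with large spread count for less:

```latex
d(\mathbf{x}, \boldsymbol{\mu}_i) =
  \sqrt{\sum_{k=1}^{d} \left(\frac{x_k - \mu_{i,k}}{\sigma_{i,k}}\right)^2}
```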
24. Nearest Neighbor
- A more flexible but more expensive method is to classify an unknown feature vector x into the class of the individual sample closest to it.
- This is the nearest-neighbor (NN) rule.
- NN classification can be effective even when classes have complex structure in d-space and when classes overlap.
- The algorithm uses only the existing training samples. A brute-force approach computes the distance from x to all samples in the database and remembers the minimum distance (sketched after this list).
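A brute-force sketch in Python, assuming the training samples are stacked in a NumPy array:

```python
import numpy as np

def nearest_neighbor(x, samples, labels):
    """Classify x by the class of the single closest training sample.

    samples: (n, d) array of training feature vectors.
    labels:  sequence of n class labels, one per row of samples.
    """
    dists = np.linalg.norm(samples - x, axis=1)   # distance to every sample
    return labels[int(np.argmin(dists))]          # class of the closest one
```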
25.
- A better classification decision can often be made by examining the nearest k feature vectors in the database (see the sketch after this list).
- Using a larger k requires a larger number of samples in each neighborhood of the space, to prevent us from searching too far from x.
- In a two-class problem with k = 3, we would classify x into the class that has 2 of the 3 samples nearest to x.
- If there are more than two classes, then more combinations are possible and the decision is more complex.
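Extending the brute-force sketch above to k neighbors with a majority vote:

```python
from collections import Counter
import numpy as np

def k_nearest_neighbors(x, samples, labels, k=3):
    """Classify x by majority vote among its k closest training samples."""
    dists = np.linalg.norm(samples - x, axis=1)
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]             # the winning class
```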
27. Structural Techniques
Consider the two character images in the figure. They both have the same bounding box, the same centroid, the same number of strokes and hole pixels, and the same moments. Each has two bays (intrusions of the background into the character). Each bay has a lid (a virtual line that closes the bay). The difference is in the placement of the lids (structural relations).
28. Statistical PR
- Traditionally represents entities by feature vectors (sets of real and Boolean values).
- These values measure some global aspect of the entities, such as area or spatial moments.
- Our example even took into account holes and strokes (do we know how to find them?).
29. Structural PR
- An entity is represented by its primitive parts, their attributes, and their relationships, as well as by its global features.
Structural properties of the example character:
- 4 major strokes: 2 vertical or slanted, 2 horizontal
- one hole or lake on top
- one bay
- lake and bay separated by a horizontal line
30. Graphical representation
- When the relationships are binary, a structural description of an entity can be viewed as a graph structure.
- Define:
- CON: specifies the connection of two strokes
- ADJ: specifies that a stroke is immediately adjacent to a lake or bay region
- ABOVE: specifies that one hole (lake or bay) lies above another
- We can then construct a graph of the structure, as sketched after this list.
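A tiny illustration of such a graph as a list of labeled edges; the part names (stroke1, lake1, etc.) are hypothetical:

```python
# Structural description as labeled edges between primitive parts.
edges = [
    ("stroke1", "stroke2", "CON"),    # two strokes connect
    ("stroke1", "lake1", "ADJ"),      # a stroke borders the lake
    ("lake1", "bay1", "ABOVE"),       # the lake lies above the bay
]

def holds(a, b, relation, graph=edges):
    """Check whether relation(a, b) appears in the structural description."""
    return (a, b, relation) in graph
```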
31. In this example we use three binary relations. Even ternary or quaternary relations can be used for stronger constraints. For example, a ternary relation exists between the lake, the horizontal line, and the bay.
32. Structural PR
- Structural PR is often achieved through graph-matching algorithms.
- However, the relationship between two primitives can itself be considered an atomic feature, and thus can be used in a feature vector (e.g., the number of such relationships can be used as a feature) for statistical PR.
- Structural methods are useful for recognizing complex patterns involving sub-patterns. They also offer understanding of a scene, especially when multiple objects are present.
33. The Confusion Matrix
- The confusion matrix is often used to report the results of classification experiments.
- The entry in row i, column j records the number of times that an object truly of class i was classified as class j.
- The diagonal indicates success. High off-diagonal numbers indicate confusion between classes and force us to reconsider our feature extraction or classification procedures (a tallying sketch follows).
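A small sketch that tallies a confusion matrix from true and assigned labels (names are illustrative):

```python
import numpy as np

def confusion_matrix(true_labels, assigned_labels, classes):
    """Rows index the true class; columns index the assigned class."""
    index = {c: k for k, c in enumerate(classes)}
    M = np.zeros((len(classes), len(classes)), dtype=int)
    for t, a in zip(true_labels, assigned_labels):
        M[index[t], index[a]] += 1                # count one classification
    return M                                      # diagonal = successes
```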
35. Decision Trees
- When the PR task is complex, computing similarities between feature vectors can become very expensive (or impossible).
- In such cases one should consider using a decision tree: a compact structure that uses one feature (or a small set of features) at a time to split the search space.
- Algorithm 4.1 presented a simple decision tree.
- Each branching node tests a different feature of the feature vector; at each stage a feature is selected for the decision.
37. Binary Decision Tree
- A binary decision tree is a binary tree structure that has a decision function associated with each node. The decision function is applied to the unknown vector and determines whether the next node to be visited is the left child or the right child of the current node.
- Each leaf node stores the name of a pattern class. A minimal sketch follows.
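A minimal sketch of this structure; the decision functions and class names here are illustrative:

```python
class Node:
    """One node of a binary decision tree."""
    def __init__(self, decide=None, left=None, right=None, label=None):
        self.decide = decide              # maps a feature vector to True/False
        self.left, self.right = left, right
        self.label = label                # pattern-class name, leaves only

def classify(node, x):
    """Apply each node's decision function to x until a leaf is reached."""
    while node.label is None:
        node = node.left if node.decide(x) else node.right
    return node.label

# Example tree over a hypothetical 2-feature vector x:
tree = Node(decide=lambda x: x[0] < 0.5,
            left=Node(label="A"),
            right=Node(decide=lambda x: x[1] < 2.0,
                       left=Node(label="B"),
                       right=Node(label="C")))
print(classify(tree, (0.7, 1.1)))         # -> B
```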
39. Automatic Construction of DT
- Decision trees are not unique for a given problem; several different trees may yield the same results.
- One simple but effective construction method is grounded in information theory. The most basic concept in information theory is entropy.
- Definition: the entropy of a set of events is given by the formula below.
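The formula itself was not transcribed; the standard definition, for a set of events with probabilities p1, ..., pn, is:

```latex
H = -\sum_{i=1}^{n} p_i \log_2 p_i
```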
41.
- Entropy can be interpreted as the average uncertainty of the information source.
- Information theory also lets us measure the information content of an event; in particular, the information content of the class variable with respect to each of the feature variables.
- The information content I(C;F) of the class variable C, with possible values c1, c2, ..., cm, with respect to the feature variable F, with possible values f1, f2, ..., fd, is given by the formula below, where P(C = ci) is the probability of class C having value ci, P(F = fj) is the probability of feature F having value fj, and P(C = ci, F = fj) is their joint probability.
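The formula was likewise lost in transcription; the standard average mutual information, consistent with the definitions above, is:

```latex
I(C;F) = \sum_{i=1}^{m} \sum_{j=1}^{d} P(C = c_i, F = f_j)\,
         \log_2 \frac{P(C = c_i, F = f_j)}{P(C = c_i)\,P(F = f_j)}
```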
42.
- These probabilities can be estimated from the frequencies of the associated events in the training data.
- In the previous example, since class I occurs in two of the four training samples, P(C = I) = 0.5; and since three of the four training samples have value 1 for feature X, P(X = 1) = 0.75.
- We can use this information-content measure to decide which feature is the best one to select at the root of the tree.
- We calculate I(C;F) for each of the three features X, Y, and Z, as in the sketch after this list.
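A sketch of that calculation, estimating every probability from event frequencies as described above (the function name is illustrative):

```python
from collections import Counter
from math import log2

def information_content(class_vals, feature_vals):
    """Estimate I(C;F) from paired (class, feature) training observations."""
    n = len(class_vals)
    pc = Counter(class_vals)                      # class frequencies
    pf = Counter(feature_vals)                    # feature-value frequencies
    joint = Counter(zip(class_vals, feature_vals))
    return sum((k / n) * log2((k / n) / ((pc[c] / n) * (pf[f] / n)))
               for (c, f), k in joint.items())
```

Calling this once per feature column (X, Y, Z) and keeping the largest value selects the root feature.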
43. Since Y has the largest information content, it is chosen as the root.
44. Character Classification
- Now consider a more complex example: handwritten numerical digits.
- These are characters with bays, lakes, and lids.
- The following features can be computed:
- lake_num: the number of lakes
- bay_num: the number of bays
- bay_above_bay: Boolean feature (T or F)
- lid_rightof_bay: Boolean feature
- bay_above_lake: Boolean feature
- lid_bottomof_image: Boolean feature
46. Bayesian Decision Making
47. The practical difficulty is in coming up with the probabilities. Empirically, we collect enough samples for each class, fit a distribution, and obtain the probabilities in that fashion. For example, if we collect enough samples for each class and construct a histogram, we can normalize the histogram to obtain a probability distribution function.
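A sketch of that normalization for a single scalar feature, using NumPy (the bin count and value range are illustrative assumptions):

```python
import numpy as np

def class_conditional_pdf(samples, bins=16, value_range=(0.0, 1.0)):
    """Estimate P(x | class) by normalizing a histogram of one feature."""
    counts, edges = np.histogram(samples, bins=bins, range=value_range)
    density = counts / (counts.sum() * np.diff(edges))   # area integrates to 1
    return edges, density
```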
49. Decisions Using Multidimensional Data
- Understanding the multidimensional feature space goes a long way toward understanding the classification.
- All the techniques we have seen so far form a basic type of machine learning called supervised learning: we assumed that labeled samples were available for all the classes that were to be distinguished.
- In unsupervised learning, the machine must also determine the class structure.