Title: Pattern Recognition Concepts
Dr. Ramprasad Bala, Computer and Information Science, UMASS Dartmouth
CIS 465: Topics in Computer Vision
Pattern Recognition Problem
- In many practical problems, there is a need to make some decision about the content of an image or about the classification of an object that it contains.
- Character recognition on a PDA
- ATM face recognition
- In general, we must also accommodate an unknown state: objects that belong to none of the known classes.
Pattern Recognition
- Definition: The process of matching an object instance to a single object prototype or class definition is called verification.
- One definition of recognition is "to know again."
- A recognition system must contain some memory of the objects that it is to recognize.
Image Memory
- This memory representation might be
- built in (a frog's model of a fly)
- taught by a large number of samples (a student learning the alphabet)
- programmed in terms of specific image features.
Common Model for Classification
- 1. Classes
- There is a set of m known classes of objects (either programmed or trained).
- There will be a reject class for objects that cannot be placed in one of the known classes.
- An ideal class is a set of objects having some important common property; the class to which an object belongs is denoted by a class label. Classification is the process that assigns a label to an object according to the properties of the object. A classifier is a device or algorithm that takes an object as input and outputs a class label.
Common Model
- 2. Sensor/Transducer
- There must be some device that captures the object and provides a digital representation (image) of it as input to the classifier.
- Generic, off-the-shelf devices
- For computer vision: digital color cameras.
Common Model
- 3. Feature Extractor
- The feature extractor extracts information relevant to classification from the sensed data. Usually done in software; it can also be done in hardware.
- 4. Classifier
- The classifier uses the features extracted from the sensed object data to assign the object to one of the m designated classes C1, C2, ..., Cm, or to the reject class Cr.
A d-dimensional input feature vector is processed by the m class computations; K denotes the stored knowledge of each class. Finally, the results are compared and a decision is made on the class label.
Building a classification system
- Each part of this system has many alternate implementations.
- Chapter 2: image sensors
- Chapter 3: binary image features
- Chapter 5: gray-scale image features
- Chapter 6: color image features
- Chapter 7: texture features
Evaluation of System Error
- The error rate of a classification system is one measure of how well the system solves the problem for which it is designed.
- Other factors are speed, cost (hardware, software), development cost, etc.
- The classifier makes a classification error whenever it classifies the input object as class Ci when the true class is Cj (i ≠ j and Ci ≠ Cr).
- The empirical error rate of a classification system is the number of errors made on independent test data divided by the number of classifications attempted.
- The empirical reject rate of a classification system is the number of rejects made on independent test data divided by the number of classifications attempted.
- Independent test data are sample objects with true class known, including objects from the reject class, that were not used in designing the feature extraction and classification algorithms.
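A minimal sketch of these two rates, assuming hypothetical label lists and the string "reject" standing in for the reject class Cr:

```python
# Empirical error and reject rates: counts divided by attempts.
def empirical_rates(true_labels, predicted_labels, reject="reject"):
    attempts = len(true_labels)
    rejects = sum(1 for p in predicted_labels if p == reject)
    # A reject is tallied separately; only wrong non-reject labels are errors.
    errors = sum(1 for t, p in zip(true_labels, predicted_labels)
                 if p != reject and p != t)
    return errors / attempts, rejects / attempts

error_rate, reject_rate = empirical_rates(
    ["a", "e", "o", "u"], ["a", "o", "reject", "u"])
print(error_rate, reject_rate)  # 0.25 0.25
```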
False Alarms and False Dismissals
- Some problems are special two-class problems
- Good objects vs. bad objects
- Object present vs. absent
- Person has disease vs. does not have disease
- In each case false positives and false negatives have different consequences.
Receiver Operating Curve
In order to increase the percentage of objects correctly recognized, one usually has to pay a cost of incorrectly passing along objects that should be rejected.
Precision versus Recall
- In document retrieval (DR) or image retrieval, the objective is to retrieve interesting objects of class C1 and not too many uninteresting objects of class C2, according to the features supplied.
- The precision of a DR system is the number of relevant documents retrieved divided by the total number of documents retrieved.
- The recall of a DR system is the number of relevant documents retrieved by the system divided by the total number of relevant documents in the DB.
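A minimal sketch of these two measures, using hypothetical sets of document ids:

```python
# Precision and recall as defined above, over sets of document ids.
def precision(retrieved, relevant):
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    return len(retrieved & relevant) / len(relevant)

retrieved = {1, 2, 3, 4}
relevant = {2, 4, 5, 6, 7}
print(precision(retrieved, relevant))  # 2/4 = 0.5
print(recall(retrieved, relevant))     # 2/5 = 0.4
```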
Features used for Representation
- What representation or encoding of the object is used in the recognition process?
- Consider the case of a handwriting character recognition (CR) system
- The area of the character in units of black pixels
- The height and width of the bounding box
- The number of holes inside the character
- The center (centroid) of the set of pixels
- The best axis direction through the pixels as the axis of least inertia
- The second moments of the pixels about the axes of least and most inertia.
Assuming there is no error in computing the features, a sequential decision procedure can be used to classify instances of these 10 classes. Algorithm 4.1 outlines this procedure. This structure is called a decision tree; a sketch in this spirit follows.
Feature Vector Representation
- Objects may be classified based on their representation as a vector of measurements.
- Suppose each object is represented by exactly d measurements.
- The similarity, or closeness, between feature vectors can be described using the Euclidean distance between the vectors.
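A minimal sketch of the Euclidean distance between two d-dimensional feature vectors:

```python
# Euclidean distance: ||x - y|| = sqrt(sum_k (x_k - y_k)^2).
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(euclidean([1.0, 2.0, 3.0], [4.0, 6.0, 3.0]))  # 5.0
```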
Implementing the Classifier
- Assume there are m classes of objects, not including the reject class, and there are ni sample vectors for class i.
- In our character recognition example, we had m = 10
- ni = 100 (say)
- d = 8 (features)
Nearest Class Mean
- A simple classification algorithm is to summarize the sample data for each class using the class mean vector, or centroid.
- An unknown object with feature vector x is classified as class i if it is closer to the mean vector of class i than to any other class mean.
- x could be put into the reject class if it is not close enough to any of the classes.
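A minimal sketch of this rule; the reject threshold is an assumption here, since the slide does not fix how "close enough" is measured:

```python
# Nearest-class-mean classification with an optional reject threshold.
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def class_means(samples_by_class):
    # Centroid of the training vectors of each class.
    means = {}
    for label, vectors in samples_by_class.items():
        d = len(vectors[0])
        means[label] = [sum(v[k] for v in vectors) / len(vectors)
                        for k in range(d)]
    return means

def nearest_class_mean(x, means, reject_threshold=None):
    label, dist = min(((c, euclidean(x, mu)) for c, mu in means.items()),
                      key=lambda pair: pair[1])
    if reject_threshold is not None and dist > reject_threshold:
        return "reject"
    return label

means = class_means({"A": [[0, 0], [2, 0]], "B": [[10, 10], [12, 10]]})
print(nearest_class_mean([1, 1], means))  # A
```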
Returning to Figure 4.2, we now have a definition of the function boxes: the ith function box computes the distance between the unknown input x and the mean vector of the training samples from that class.
- While simple to implement, the nearest-class-mean rule does not work well for a variety of problems:
- the classes are not regular, and in one case are multi-modal.
- In such cases a scaled Euclidean distance may be considered, as sketched below.
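A common form of the scaled Euclidean distance divides each feature difference by that feature's standard deviation; this exact scaling is an assumption here, but it is the usual choice:

```python
# Scaled Euclidean distance: features with large spread do not dominate.
import math

def scaled_euclidean(x, mu, sigma):
    return math.sqrt(sum(((a - b) / s) ** 2
                         for a, b, s in zip(x, mu, sigma)))

print(scaled_euclidean([5.0, 100.0], [4.0, 90.0], [1.0, 20.0]))  # ~1.118
```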
Nearest Neighbor
- A more flexible but expensive method is to classify an unknown feature vector x into the class of the individual sample closest to it.
- This is the nearest-neighbor rule.
- NN classification can be effective even when classes have complex structure in d-space and when classes overlap.
- The algorithm uses only the existing training samples. A brute-force approach computes the distance from x to all samples in the database and remembers the minimum distance.
- A better classification decision can be made by examining the nearest k feature vectors in the DB.
- Using a larger k depends on having a larger number of samples in each neighborhood of the space, to prevent us from searching too far from x.
- In a two-class problem, using k = 3, we would classify x into the class that has 2 of the three samples nearest x.
- If there are more than two classes, then more combinations are possible and the decision is more complex. A k-NN sketch follows.
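A minimal brute-force k-NN sketch (k = 1 reduces to the nearest-neighbor rule above); the sample data is illustrative:

```python
# Brute-force k-nearest-neighbor: distance to every stored sample,
# then a majority vote among the k closest labels.
import math
from collections import Counter

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_classify(x, samples, k=3):
    # samples is a list of (feature_vector, label) pairs.
    nearest = sorted(samples, key=lambda s: euclidean(x, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

samples = [([0, 0], "A"), ([1, 0], "A"), ([10, 10], "B"), ([9, 10], "B")]
print(knn_classify([2, 1], samples, k=3))  # A
```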
Structural Techniques
Consider the two images above. They both have the same bounding box, the same centroid, the same number of strokes and hole pixels, and the same moments. Each has two bays (intrusions of the background into the character). Each bay has a lid (a virtual line that closes the bay). The difference is the placement of the lids (structural relations).
Statistical PR
- Traditionally represents entities by feature vectors (sets of real and Boolean values).
- These values measure some global aspect of the entities, such as area or spatial moments.
- Our example even took into account holes and strokes (do we know how to find them?).
Structural PR
- An entity is represented by its primitive parts, their attributes, and their relationships, as well as by its global features.
Structural properties:
- 4 major strokes
- 2 vertical or slanted, 2 horizontal
- one hole or lake on top
- one bay
- lake and bay separated by a horizontal line.
Graphical representation
- When the relationships are binary, a structural description of an entity can be viewed as a graph structure.
- Define:
- CON specifies the connection of two strokes
- ADJ specifies that a stroke is immediately adjacent to a lake or bay region
- ABOVE specifies that one hole (lake or bay) lies above another.
- We can then construct a graph of the structure, as sketched below.
In this example we use three binary relations. Even ternary or quaternary relations can be used for stronger constraints. For example, a ternary relation exists between the lake, the horizontal line, and the bay.
Structural PR
- Structural PR is often achieved through graph-matching algorithms.
- However, the relationship between two primitives can itself be considered an atomic feature and thus can be used in a feature vector (the number of such relationships can be used as a feature) for statistical PR.
- Structural methods are useful for recognizing complex patterns involving sub-patterns. They also offer understanding of a scene, especially when multiple objects are present.
The Confusion Matrix
- The confusion matrix is often used to report results of classification experiments.
- The entry in row i, column j records the number of times that an object truly of class i was classified as class j.
- The diagonal indicates success. High off-diagonal numbers indicate confusion between classes and force us to reconsider our feature extraction or classification procedures.
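A minimal sketch that tallies the matrix exactly as defined above, from hypothetical label lists:

```python
# Confusion matrix: entry [i][j] counts objects truly of class i
# that the classifier labeled as class j.
def confusion_matrix(true_labels, predicted_labels, classes):
    index = {c: k for k, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

classes = ["a", "e", "o"]
true_labels = ["a", "a", "e", "o", "o", "o"]
predicted   = ["a", "o", "e", "o", "a", "o"]
for row in confusion_matrix(true_labels, predicted, classes):
    print(row)
# [1, 0, 1]
# [0, 1, 0]
# [1, 0, 2]
```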
Decision Trees
- When the PR task is complex, computing similarities between feature vectors can become very expensive (or impossible).
- In such cases one should consider using a decision tree. A decision tree is a compact structure that uses one feature (or a small set) at a time to split the search space.
- Algorithm 4.1 presented a simple decision tree.
- Each branching node tests some feature of the feature vector; at each stage a feature is selected for the decision.
Binary Decision Tree
- A binary decision tree is a binary tree structure that has a decision function associated with each node. The decision function is applied to the unknown vector and determines whether the next node to be visited is the left child or the right child of the current node.
- Each leaf node stores the name of a pattern class.
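A minimal sketch of this structure; the example tree and its decision functions are illustrative assumptions:

```python
# Binary decision tree: internal nodes hold a decision function over
# the feature vector, leaves hold a pattern-class name.
class Node:
    def __init__(self, decide=None, left=None, right=None, label=None):
        self.decide = decide          # True -> visit left child, else right
        self.left, self.right = left, right
        self.label = label            # set only at leaf nodes

def classify(node, x):
    while node.label is None:
        node = node.left if node.decide(x) else node.right
    return node.label

# Example: x = (holes, strokes); a hypothetical two-level tree.
tree = Node(decide=lambda x: x[0] >= 1,
            left=Node(label="has-hole class"),
            right=Node(decide=lambda x: x[1] > 2,
                       left=Node(label="many-stroke class"),
                       right=Node(label="few-stroke class")))
print(classify(tree, (0, 4)))  # many-stroke class
```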
Automatic Construction of DT
- Decision trees are not unique for a specific problem; several trees may yield the same result.
- One simple but effective method is grounded in information theory. The most basic concept in information theory is entropy.
- Definition: the entropy of a set of events is given by the formula below.
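The slide that carried the formula did not survive extraction; what follows is the standard Shannon entropy, which matches the "average uncertainty" interpretation on the next slide:

```latex
% Shannon entropy of a set of m events with probabilities p_i
H = -\sum_{i=1}^{m} p_i \log_2 p_i
```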
- Entropy can be interpreted as the average uncertainty of the information source.
- Information theory allows us to measure the information content of an event; in particular, the information content of a class w.r.t. each of the feature events.
- The information content I(C;F) of the class variable C with possible values c1, c2, ..., cm w.r.t. the feature variable F with possible values f1, f2, ..., fd is

I(C;F) = \sum_{i=1}^{m} \sum_{j=1}^{d} P(C=c_i, F=f_j) \log_2 \frac{P(C=c_i, F=f_j)}{P(C=c_i)\, P(F=f_j)}

- where P(C=ci) is the probability of class C having value ci, P(F=fj) is the probability of feature F having value fj, and P(C=ci, F=fj) is their joint probability.
- These probabilities can be estimated from the frequency of the associated events in the training data.
- In the previous example, since class I occurs in two out of the four training samples, P(C=I) = 0.5. Since three of the four training samples have 1 for feature X, P(X=1) = 0.75.
- We can use this information content measure to decide which feature is the best one to select at the root of the tree.
- We calculate I(C;F) for each of the three features X, Y, and Z.
Since Y has the largest information content, it is chosen as the root. A sketch of this computation appears below.
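A minimal sketch of the frequency-count estimate of I(C;F). The class labels and the X column match the counts quoted above (P(C=I) = 0.5, P(X=1) = 0.75); the Y and Z columns are illustrative assumptions:

```python
# Estimate I(C;F) from labeled samples and pick the best root feature.
import math
from collections import Counter

def information_content(classes, feature_values):
    n = len(classes)
    p_c = Counter(classes)                    # class frequencies
    p_f = Counter(feature_values)             # feature-value frequencies
    p_cf = Counter(zip(classes, feature_values))  # joint frequencies
    # Sum only over observed (c, f) pairs, so log is never taken of zero.
    return sum((cnt / n) * math.log2(
                   (cnt / n) / ((p_c[c] / n) * (p_f[f] / n)))
               for (c, f), cnt in p_cf.items())

classes = ["I", "I", "II", "II"]
features = {"X": [1, 1, 1, 0], "Y": [1, 1, 0, 0], "Z": [0, 1, 0, 1]}
scores = {name: information_content(classes, vals)
          for name, vals in features.items()}
print(max(scores, key=scores.get))  # Y: it predicts the class perfectly
```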
Character Classification
- Now consider a more complex example: handwritten numerical digits.
- Characters with bays, lakes, and lids.
- The following features can be computed:
- lake_num: the number of lakes
- bay_num: the number of bays
- bay_above_bay: Boolean feature (T or F)
- lid_rightof_bay: Boolean feature
- bay_above_lake: Boolean feature
- lid_bottomof_image: Boolean feature
Bayesian Decision Making
The practical difficulty is in coming up with the probabilities. Empirically, we collect enough samples for each class, fit a distribution, and find the probabilities in that fashion. For example, if we collect enough samples for each class and construct a histogram, we can normalize the histogram to obtain a probability distribution function.
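A minimal sketch of that normalization step; the bin layout and sample values are illustrative:

```python
# Bin one class's training samples, then normalize the counts so they
# sum to 1, estimating the class-conditional probability distribution.
def histogram_pdf(samples, num_bins, lo, hi):
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for s in samples:
        k = min(int((s - lo) / width), num_bins - 1)
        counts[k] += 1
    total = len(samples)
    return [c / total for c in counts]

samples = [1.1, 1.3, 2.2, 2.4, 2.5, 3.7]  # illustrative feature values
print(histogram_pdf(samples, num_bins=4, lo=1.0, hi=4.0))
# [0.333..., 0.333..., 0.166..., 0.166...]
```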
Decisions Using Multidimensional Data
- Understanding the multidimensional space will go a long way in understanding the classification.
- All the techniques we have seen so far form a basic type of machine learning called supervised learning: we assumed that labeled samples were available for all the classes that were to be distinguished.
- In unsupervised learning the machine must also determine the class structure.
Template Matching
- One of the most popular methods of extracting useful information from an image is template matching.
- This is based on the idea that if one wants to recognize a certain feature or region of an image, one can compare the image with a database of sample templates; if any of them produces a strong similarity, then a reasonable estimate can be made as to what the image contains.
Template Matching
- Consider the following example.
Matching
- The process of matching is accomplished by using convolution or correlation.
- Let f(x,y) be the sample image (size M x N).
- Let g(x,y) be the template (size (2m+1) x (2n+1)).
- m << M
- n << N
Matching
- One method to measure similarity is given below, where (i, j) is the pixel location, with m < i < M - m and n < j < N - n:

p(i, j) = \sum_{p=-m}^{m} \sum_{q=-n}^{n} g(p, q)\, f(i+p, j+q)
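A minimal pure-Python sketch of this correlation; the image and template values are illustrative:

```python
# Slide the (2m+1) x (2n+1) template g over image f and accumulate
# the sum of products at each valid center (i, j).
def correlate(f, g):
    M, N = len(f), len(f[0])
    m, n = len(g) // 2, len(g[0]) // 2
    out = [[0] * N for _ in range(M)]
    for i in range(m, M - m):
        for j in range(n, N - n):
            out[i][j] = sum(g[p + m][q + n] * f[i + p][j + q]
                            for p in range(-m, m + 1)
                            for q in range(-n, n + 1))
    return out

f = [[0, 0, 0, 0, 0],
     [0, 1, 1, 1, 0],
     [0, 1, 1, 1, 0],
     [0, 1, 1, 1, 0],
     [0, 0, 0, 0, 0]]
g = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # 3x3 template, m = n = 1
print(correlate(f, g)[2][2])  # 9: strongest response at the blob center
```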
Problems with Template Matching
- Many factors affect whether or not the appropriate region of the data image produces the correct effect with the template. The primary factors include
- Magnification (zoom factor)
- Rotation
- Perspective
Problems
- Comparing the template around the border of the data image produces calculation problems: since the center of the template must be compared to each pixel location, including along the border, the neighboring pixels will "overhang" the data image.
- Another challenge that this method of recognizing parts of the data image presents is how to obtain an optimum template. Every occurrence of an object in nature differs at least slightly from every other, and the subject of noise arises with any image processing of this type.
Assignment
- Optical Character Recognition (OCR)
- The goal of this assignment is to find instances of the four lowercase vowels "a", "e", "o", "u" in four paragraphs. The program has to detect each instance of these letters and color them differently.
Solution
- Convert the input grayscale image to a binary image using an appropriate threshold. Clean this binary image using different morphological operations to aid detection.
- Find the connected components in the binary image. This step, if successful, would detect each instance of a letter in the image.
- Identify the letters and classify them.
- Color each of the different vowels in a different color.
A sketch of the first two steps appears below.
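A hedged sketch of those first two steps. The assignment does not prescribe a library; cv2.threshold, cv2.morphologyEx, and cv2.connectedComponents are standard OpenCV calls, and "page.png" is a hypothetical input file:

```python
import cv2
import numpy as np

gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input

# Step 1: threshold to binary (letters as foreground) and clean up
# with a morphological opening to remove small specks of noise.
_, binary = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
kernel = np.ones((2, 2), np.uint8)
binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)

# Step 2: connected components; ideally each component is one letter.
num_labels, labels = cv2.connectedComponents(binary)
print(num_labels - 1, "candidate letter components")  # label 0 = background
```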
- The steps for this process are
- Build a template for each of the letters to be recognized: "a", "e", "o", "u". A good first approximation for a template is the intersection of all instances of that letter in the training paragraph. However, more fine-tuning of this template must be done for good performance.
- Erode the original image using this template as the structuring element. All 1 pixels in the resulting image correspond to the matches found for the given template.
- Find the objects in the original image corresponding to these 1 pixels. Another way of doing this is to implement this last step as an opening operation (i.e., erosion followed by dilation) using the template as the structuring element. A sketch follows.
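A hedged sketch of the erode-and-match step; cv2.erode and cv2.dilate are standard OpenCV calls, while the file name and the all-ones template are illustrative stand-ins for a learned letter template:

```python
# A pixel survives erosion only where the whole structuring element
# fits, so the surviving 1-pixels mark template matches.
import cv2
import numpy as np

binary = cv2.imread("page_binary.png", cv2.IMREAD_GRAYSCALE)  # hypothetical
template = np.ones((5, 5), np.uint8)   # stand-in for a learned letter template

matches = cv2.erode(binary, template)  # 1-pixels = template-fit locations
ys, xs = np.nonzero(matches)
print(len(xs), "match pixels found")

# Opening variant: erosion then dilation restores the matched letter
# shapes rather than just their match pixels.
restored = cv2.dilate(matches, template)
```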
Testing Data
Training Data