Pattern Recognition Concepts - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Pattern Recognition Concepts


1
Pattern Recognition Concepts
Dr. Ramprasad Bala
Computer and Information Science, UMASS Dartmouth
CIS 585: Image Processing and Machine Vision
2
Pattern Recognition Problem
  • In many practical problems, there is a need to
    make some decision about the content of an image
    or about the classification of an object that it
    contains.
  • Character recognition on a PDA
  • ATM face recognition
  • In general we will also accommodate an unknown
    state (the reject class).

3
Pattern Recognition
  • Definition: the process of matching an object
    instance to a single object prototype or class
    definition is called verification.
  • One definition of recognition is "to know again".
  • A recognition system must contain some memory of
    the objects that it is to recognize.

4
Image Memory
  • This memory representation might be
  • built-in (a frog's model of a fly)
  • taught by a large number of samples (a student
    learning the alphabet)
  • programmed in terms of specific image features.

5
(No Transcript)
6
Common Model for Classification
  • 1. Classes
  • There is a set of m known classes of objects
    (either programmed or trained).
  • There will be a reject class for objects that
    cannot be placed in one of the known classes.
  • An ideal class is a set of objects having some
    important common property; the class to which an
    object belongs is denoted by some class label.
    Classification is a process that assigns a label
    to an object according to the properties of the
    object. A classifier is a device or algorithm
    that takes an object as input and outputs the
    object's class label.

7
Common Model
  • 2. Sensor/Transducer
  • There must be some device that captures the
    object and provides a digital representation
    (image) of the object as an input to the
    classifier.
  • Generic devices: off-the-shelf devices.
  • For computer vision: digital color cameras.

8
Common Model
  • 3. Feature Extractor
  • The feature extractor extracts information
    relevant to classification from the data provided
    by the sensor. Usually done in software; it can
    also be done in hardware.
  • 4. Classifier
  • The classifier uses the features extracted from
    the sensed object data to assign the object to
    one of the m designated classes C1, C2, ..., Cm-1,
    Cm, or Cr, where Cr denotes the reject class.

9
A d-dimensional input feature vector is processed
by m class-specific function boxes; K is the
stored knowledge of each class. Finally the
results are compared and a decision is made on
the class label.
10
Building a classification system
  • Each part of this system has many alternate
    implementations.
  • Chapter 2: image sensors
  • Chapter 3: binary image features
  • Chapter 5: gray-scale image features
  • Chapter 6: color image features
  • Chapter 7: texture features

11
Evaluation of System Error
  • The error rate of a classification system is one
    measure of how well the system solves the problem
    for which it is designed.
  • Other factors are speed, cost (h/w, s/w),
    developmental cost etc.
  • The classifier makes a classification error
    whenever it classifies the input object as class
    Ci when the true class is Cj (i ≠ j and Ci ≠ Cr).

12
  • The empirical error rate of a classification
    system is the number of errors made on
    independent test data divided by the number of
    classifications attempted.
  • The empirical reject rate of a classification
    system is the number of rejects made on
    independent test data divided by the number of
    classifications attempted.
  • Independent test data are sample objects with
    known true class, including objects from the
    reject class, that were not used in designing the
    feature extraction and classification algorithms.
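
A minimal Python sketch of the two rates just defined, assuming the labels are plain strings and a placeholder label "reject" marks rejected inputs (names are illustrative, not from the slides):

import itertools

def empirical_rates(true_labels, predicted_labels, reject_label="reject"):
    # Empirical error rate and reject rate over all attempted classifications.
    n = len(true_labels)
    rejects = sum(1 for p in predicted_labels if p == reject_label)
    errors = sum(1 for t, p in zip(true_labels, predicted_labels)
                 if p != reject_label and p != t)
    return errors / n, rejects / n   # (error rate, reject rate)

# Example: 1 error and 1 reject out of 5 attempted classifications.
print(empirical_rates(["a", "b", "a", "c", "b"],
                      ["a", "b", "b", "reject", "b"]))   # (0.2, 0.2)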

13
False Alarms and False Dismissals
  • Some problems are special two-class problems
  • Good objects vs bad objects
  • Object present vs absent
  • Person has disease vs does not have disease
  • In each case false-positives and false-negatives
    have different consequences.

14
Receiver Operating Curve
In order to increase the percentage of objects
correctly recognized, one usually has to pay a
cost of incorrectly passing along objects that
should be rejected.
15
Precision versus Recall
  • In document retrieval (DR) or image retrieval, the
    objective is to retrieve interesting objects of
    class C1 and not too many of uninteresting class
    C2, according to the features supplied.
  • The precision of a DR system is the number of
    relevant documents retrieved divided by the total
    number of documents retrieved.
  • The recall of a DR system is the number of
    relevant documents retrieved by the system
    divided by the total number of relevant documents
    in the DB.
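
A small Python sketch of both measures, assuming the retrieved and relevant documents are given as sets of IDs (illustrative data):

def precision_recall(retrieved, relevant):
    # precision: fraction of retrieved documents that are relevant
    # recall: fraction of all relevant documents that were retrieved
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 3 of 4 retrieved docs are relevant; 3 of 6 relevant docs were found.
print(precision_recall({1, 2, 3, 4}, {1, 2, 3, 7, 8, 9}))   # (0.75, 0.5)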

16
Features used for Representation
  • What representation or encoding of the object is
    used in the recognition process?
  • Consider the case of a handwritten character
    recognition (CR) system
  • The area of the character in units of black
    pixels
  • The height and width of the bounding box
  • The number of holes inside the character
  • The center (centroid) of the set of pixels
  • The best axis direction through the pixels, taken
    as the axis of least inertia
  • The second moments of the pixels about the axis
    of least inertia and most inertia.
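
A rough Python/NumPy sketch of a few of these features for a binary character image; the hole count and the inertia axes need more machinery and are omitted (function and variable names are illustrative):

import numpy as np

def simple_char_features(img):
    # img: 2-D array with 1 for black (character) pixels, 0 for background.
    ys, xs = np.nonzero(img)
    area = len(xs)                           # black-pixel count
    height = ys.max() - ys.min() + 1         # bounding-box height
    width = xs.max() - xs.min() + 1          # bounding-box width
    centroid = (ys.mean(), xs.mean())        # center of the pixel set
    return area, height, width, centroid

# A tiny L-shaped test character.
img = np.array([[1, 0, 0],
                [1, 0, 0],
                [1, 1, 1]])
print(simple_char_features(img))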

17
Assuming there is no error in computing the
features, a sequential decision procedure can be
used to classify instances of these 10 classes.
Algorithm 4.1 outlines this procedure. This
structure is called a decision tree.
18
(No Transcript)
19
Feature Vector Representation
  • Objects may be classified based on their
    representation as a vector of measurements.
  • Suppose each object is represented by exactly d
    measurements.
  • The similarity, or closeness, between two feature
    vectors can be described using the Euclidean
    distance between the vectors.
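
For d-dimensional vectors x and y the Euclidean distance is sqrt( (x1-y1)^2 + ... + (xd-yd)^2 ); a one-function Python sketch:

import math

def euclidean_distance(x, y):
    # ||x - y|| for two equal-length feature vectors
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean_distance([0, 0], [3, 4]))   # 5.0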

20
Implementing the Classifier
  • Assume there are m classes of objects, not
    including the reject class, and there are ni
    sample vectors for class i.
  • In our character recognition example, we had
    m = 10
  • n = 100 (say)
  • d = 8 (features)

21
Nearest Class Mean
  • A simple classification algorithm is to summarize
    the sample data for each class using the class
    mean vector or centroid.
  • An unknown object with feature vector x is
    classified as class i if it is closer to the mean
    vector of class i than to any other class mean.
  • x could be put into the reject class if it is not
    close enough to any of the classes.
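
A minimal Python sketch of the nearest-class-mean rule, assuming the training samples arrive as one array per class and using an optional, assumed distance threshold for the reject decision:

import numpy as np

def train_class_means(samples_by_class):
    # samples_by_class: {class_label: array of shape (n_i, d)}
    return {c: np.mean(v, axis=0) for c, v in samples_by_class.items()}

def nearest_class_mean(x, means, reject_threshold=None):
    # Classify x as the class whose mean vector is closest; optionally
    # reject if even the best distance exceeds a threshold (assumption).
    best_class, best_dist = None, float("inf")
    for c, mu in means.items():
        dist = np.linalg.norm(np.asarray(x) - mu)
        if dist < best_dist:
            best_class, best_dist = c, dist
    if reject_threshold is not None and best_dist > reject_threshold:
        return "reject"
    return best_class

means = train_class_means({"A": np.array([[0.0, 0.0], [0.0, 2.0]]),
                           "B": np.array([[5.0, 5.0], [7.0, 5.0]])})
print(nearest_class_mean([1.0, 1.0], means))   # "A"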

22
Returning to Figure 4.2, we now have a definition
of the function boxes. The ith function box
computes the distance between the unknown input x
and the mean vector of the training samples from
that class.
23
  • While simple to implement, nearest class mean does
    not work well for a variety of problems.
  • The classes may not be regular in shape, and a
    class may even be multimodal.
  • In such cases a scaled Euclidean distance may be
    considered (sketched below).
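
One common scaled Euclidean distance divides each feature difference by an estimate of that feature's spread, d(x, m) = sqrt( sum_i ((xi - mi) / si)^2 ); a small Python sketch under that assumption (the textbook's exact scaling may differ):

import math

def scaled_euclidean(x, mean, sigma):
    # Each feature difference is scaled by that feature's spread sigma_i,
    # so features with large natural variance do not dominate the distance.
    return math.sqrt(sum(((xi - mi) / si) ** 2
                         for xi, mi, si in zip(x, mean, sigma)))

# Feature 2 varies much more than feature 1, so it is down-weighted.
print(scaled_euclidean([2.0, 40.0], [0.0, 0.0], [1.0, 20.0]))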

24
Nearest Neighbor
  • A more flexible but expensive method is to
    classify an unknown feature vector x into the
    class of the individual sample closest to it.
  • This is the nearest-neighbor rule.
  • NN classification can be effective even when
    classes have complex structure in d-space and
    when classes overlap.
  • The algorithm uses only the existing training
    samples. A brute force approach computes the
    distance from x to all samples in the database of
    samples and remembers the minimum distance.

25
  • A better classification decision can be made by
    examining the nearest k feature vectors in the
    DB.
  • Using a larger k requires a larger number of
    samples in each neighborhood of the space, so
    that we need not search too far from x.
  • In a two-class problem, using k = 3, we would
    classify x into the class that has two of the
    three samples nearest x.
  • If there are more than two classes then there are
    more combinations possible and the decision is
    more complex.
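
A brute-force Python sketch of the k-nearest-neighbor rule described on the last two slides (k = 1 gives the plain nearest-neighbor rule); the sample data are made up:

import math
from collections import Counter

def knn_classify(x, samples, labels, k=3):
    # samples: training feature vectors; labels: their known true classes.
    dists = [(math.dist(x, s), lbl) for s, lbl in zip(samples, labels)]
    dists.sort(key=lambda t: t[0])                  # brute-force search
    votes = Counter(lbl for _, lbl in dists[:k])    # k closest samples vote
    return votes.most_common(1)[0][0]

samples = [[0, 0], [1, 0], [0, 1], [5, 5], [6, 5]]
labels = ["A", "A", "A", "B", "B"]
print(knn_classify([0.5, 0.5], samples, labels, k=3))   # "A"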

26
(No Transcript)
27
Structural Techniques
Consider the two images above. They both have the
same bounding box, the same centroid, same number
of strokes and hole pixels and the same moments.
Each has two bays (intrusions of the
background into the character). Each bay has a
lid (a virtual line that closes the bay). The
difference is the placement of the lids
(structural relations).
28
Statistical PR
  • Traditionally represents entities by feature
    vectors (set of real and Boolean values).
  • These values measure some global aspect of the
    entities, such as area or spatial moments.
  • Our example even took into account holes and
    strokes (do we know how to find them?)

29
Structural PR
  • An entity is represented by its primitive parts,
    their attributes and their relationships as well
    as by its global features.

Structural properties:
- 4 major strokes
- 2 vertical or slanted
- 2 horizontal
- one hole or lake on top
- one bay
- lake and bay separated by a horizontal line
30
Graphical representation
  • When the relationships are binary, a structural
    description of an entity can be viewed as a graph
    structure.
  • Define:
  • CON: specifies the connection of two strokes
  • ADJ: specifies that a stroke is immediately
    adjacent to a lake or bay region
  • ABOVE: specifies that one hole (lake or bay)
    lies above another.
  • We can then construct a graph of the structure.
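
One simple way to hold such a description in code is as a list of labeled edges between named parts; the parts and relations below are an illustrative Python encoding, not the book's notation:

# Parts of the character and the binary relations that hold between them.
parts = ["stroke1", "stroke2", "stroke3", "stroke4", "lake", "bay"]
relations = [
    ("CON",   "stroke1", "stroke2"),   # two strokes are connected
    ("CON",   "stroke3", "stroke4"),
    ("ADJ",   "stroke3", "lake"),      # a stroke borders the lake
    ("ADJ",   "stroke3", "bay"),       # the same stroke borders the bay
    ("ABOVE", "lake", "bay"),          # the lake lies above the bay
]

# Adjacency view: for each part, the (relation, other part) pairs it joins.
graph = {p: [] for p in parts}
for rel, a, b in relations:
    graph[a].append((rel, b))
    graph[b].append((rel, a))
print(graph["stroke3"])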

31
In this example we use three binary relations.
Even ternary or quaternary relations can be used
for stronger constraints. For example, a ternary
relation exists between the lake, the horizontal
line and the bay.
32
Structural PR
  • Structural PR is often achieved through
    graph-matching algorithms.
  • However, the relationship between two primitives
    can itself be considered an atomic feature and
    thus can be used in a feature vector (the number
    of such relationships can be used as a feature)
    for statistical PR.
  • Structural methods are useful for recognizing
    complex patterns involving sub-patterns. They
    also offer understanding of a scene, especially
    when multiple objects are present.

33
The Confusion Matrix
  • The confusion matrix is often used to report
    results of classification experiments.
  • The entry in row i, column j records the number
    of times that an object whose true class is i was
    classified as class j.
  • The diagonal indicates success. High off-diagonal
    numbers indicate confusion between classes and
    force us to reconsider our feature extraction or
    classification procedures.
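
A quick Python sketch of building the matrix from paired true and predicted labels (illustrative digit labels):

from collections import defaultdict

def confusion_matrix(true_labels, predicted_labels):
    # counts[i][j] = number of objects of true class i classified as j
    counts = defaultdict(lambda: defaultdict(int))
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    return counts

cm = confusion_matrix(["1", "1", "7", "7", "7"],
                      ["1", "7", "7", "7", "1"])
print(cm["7"]["1"])   # one '7' confused as '1'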

34
(No Transcript)
35
Decision Trees
  • When the PR task is complex, computing
    similarities between feature vectors can become
    very expensive (or impossible).
  • In such cases one should consider using a
    decision tree. A decision tree is a compact
    structure that uses one feature (or a small set)
    at a time to split the search space.
  • Algorithm 4.1 presented a simple decision tree.
  • Each branching node examines a different feature
    of the feature vector; at each stage a feature is
    selected for the decision.

36
(No Transcript)
37
Binary Decision Tree
  • A binary decision tree is a binary tree structure
    that has a decision function associated with each
    node. The decision function is applied to the
    unknown vector and determines whether the next
    node to be visited is the left child or right
    child of the current node.
  • Each leaf node stores the name of a pattern class.
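
A minimal Python sketch of such a structure: each internal node holds a decision function applied to the unknown vector, each leaf holds a class name; the tree and thresholds below are made up, not Algorithm 4.1:

class Node:
    def __init__(self, decide=None, left=None, right=None, label=None):
        self.decide = decide   # function(x) -> True (go left) / False (go right)
        self.left, self.right, self.label = left, right, label

def classify(node, x):
    while node.label is None:                    # descend until a leaf
        node = node.left if node.decide(x) else node.right
    return node.label

# Tiny tree: test feature 0, then feature 1.
tree = Node(decide=lambda x: x[0] < 2.0,
            left=Node(label="class_A"),
            right=Node(decide=lambda x: x[1] < 5.0,
                       left=Node(label="class_B"),
                       right=Node(label="class_C")))
print(classify(tree, [3.0, 7.0]))   # class_C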

38
(No Transcript)
39
Automatic Construction of DT
  • Decision trees are not unique for a specific
    problem. Several trees may yield the same result.
  • One simple but effective method is grounded in
    information theory. The most basic concept in
    information theory is entropy.
  • Definition: the entropy of a set of events is
    given by
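
The equation on the original slide was an image; the standard definition is H = - sum_i p_i log2 p_i over the event probabilities p_i. A short Python sketch:

import math

def entropy(probabilities):
    # H = -sum_i p_i * log2(p_i); the 0 * log 0 term is treated as 0.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: maximally uncertain two-way event
print(entropy([1.0, 0.0]))   # 0.0 bits: no uncertainty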

40
(No Transcript)
41
  • Entropy can be interpreted as the average
    uncertainty of the information source.
  • Information theory allows us to measure the
    information content of an event. In particular,
    we can measure the information content of a class
    w.r.t. each of the feature events.
  • The information content I(C;F) of the class
    variable C with possible values c1, c2, ..., cm
    w.r.t. the feature variable F with possible
    values f1, f2, ..., fd is

    I(C;F) = sum over i,j of P(C=ci, F=fj) *
             log2 [ P(C=ci, F=fj) / ( P(C=ci) P(F=fj) ) ]

  • where P(C=ci) is the probability of class C
    having value ci, P(F=fj) is the probability of
    feature F having value fj, and P(C=ci, F=fj) is
    their joint probability.

42
  • These probabilities can be estimated from the
    frequency of the associated events in the
    training data.
  • In the previous example, since class I occurs in
    two out of the four training samples, P(C = I) =
    0.5. Since three of the four training samples
    have 1 for feature X, P(X = 1) = 0.75.
  • We can use this information content measure to
    decide which feature is the best one to select at
    the root of the tree.
  • We calculate I(C;F) for each of the three
    features X, Y and Z.
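
A Python sketch of estimating I(C;F) from frequencies and picking the feature with the largest value; the X values match the slide (three of four samples have X = 1, and P(C = I) = 0.5), while the Y and Z values are made up for illustration:

import math
from collections import Counter

def information_content(classes, feature_values):
    # I(C;F) estimated from event frequencies in the training data.
    n = len(classes)
    pc = Counter(classes)                          # counts for P(C = ci)
    pf = Counter(feature_values)                   # counts for P(F = fj)
    joint = Counter(zip(classes, feature_values))  # counts for P(C = ci, F = fj)
    return sum((cnt / n) * math.log2((cnt / n) / ((pc[c] / n) * (pf[f] / n)))
               for (c, f), cnt in joint.items())

# Four training samples of classes I and II with three candidate features.
classes = ["I", "I", "II", "II"]
features = {"X": [1, 1, 1, 0], "Y": [0, 0, 1, 1], "Z": [1, 0, 1, 0]}
scores = {name: information_content(classes, vals)
          for name, vals in features.items()}
print(scores)
print("root feature:", max(scores, key=scores.get))   # Y, in this made-up data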

43
Since Y has the largest information content, it
is chosen as the root.
44
Character Classification
  • Now consider a more complex example: handwritten
    numerical digits.
  • Characters with bays, lakes and lids.
  • The following features can be computed
  • lake_num: the number of lakes
  • bay_num: the number of bays
  • bay_above_bay: Boolean feature (T or F)
  • lid_rightof_bay: Boolean feature
  • bay_above_lake: Boolean feature
  • lid_bottomof_image: Boolean feature

45
(No Transcript)
46
Bayesian Decision Making
47
The practical difficulty is in coming up with the
probabilities. Empirically, we collect enough
samples for each class, fit a distribution, and
find the probabilities in that fashion. For
example, if we collect enough samples for each
class and construct a histogram, we can normalize
the histogram to obtain a probability
distribution function.
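
A Python sketch of that histogram approach for a single scalar feature: normalize a per-class histogram to approximate P(x | Ci), then pick the class maximizing P(x | Ci) P(Ci); the data, bin counts, and equal priors are assumptions:

import numpy as np

def fit_histograms(samples_by_class, bins, value_range):
    # A normalized histogram per class approximates the class-conditional density.
    models = {}
    for c, samples in samples_by_class.items():
        hist, edges = np.histogram(samples, bins=bins, range=value_range,
                                   density=True)
        models[c] = (hist, edges)
    return models

def bayes_classify(x, models, priors):
    def likelihood(x, hist, edges):
        idx = np.clip(np.searchsorted(edges, x) - 1, 0, len(hist) - 1)
        return hist[idx]
    # Choose the class maximizing P(x | Ci) * P(Ci).
    return max(models, key=lambda c: likelihood(x, *models[c]) * priors[c])

rng = np.random.default_rng(0)
data = {"A": rng.normal(2.0, 1.0, 500), "B": rng.normal(6.0, 1.0, 500)}
models = fit_histograms(data, bins=20, value_range=(-2.0, 10.0))
print(bayes_classify(3.0, models, {"A": 0.5, "B": 0.5}))   # likely "A"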
48
(No Transcript)
49
Decisions Using Multidimensional data
  • Understanding the multidimensional space will go
    a long way in understanding the classification.
  • All the techniques we have seen so far form a
    basic type of machine learning called supervised
    learning. We assumed that labeled samples were
    available for all the classes that were to be
    distinguished.
  • In unsupervised learning the machine must also
    determine the class structure.