Pattern/Object Recognition PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Pattern/Object Recognition


1
Pattern/Object Recognition
  • Statistical-based
  • Feature-based
  • Neural-network based
  • Structure-based

2
Some concepts
Class: a set of objects having some important
common properties
Classification: a process that assigns a label to
an object according to some representation of the
object's properties
Classifier: a device or algorithm that inputs an
object representation and outputs a class label
Pattern: an arrangement of descriptors, such as
features, length, area, texture, etc.
Feature extractor: extracts information relevant
to classification from the data input by sensors.
3
Some concepts
  • Example of features used for representation of
    character
  • A, B, C, Z
  • - area
  • - height/width
  • - number of holes
  • - center
  • - axis direction
  • - second moments
  • Feature extraction
  • We select the features or combine the
    features (e.g., fusion of features)
    in order to achieve the highest possible
    recognition rate (a sketch follows below)
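
A minimal sketch (not from the slides) of this kind of feature extraction,
computing area, height/width, center, axis direction and second moments from a
hypothetical binary character image with NumPy; the hole count from the list
above is omitted since it needs connected-component analysis.

```python
import numpy as np

def character_features(img: np.ndarray) -> dict:
    ys, xs = np.nonzero(img)                      # pixel coordinates of the character
    area = xs.size                                # number of foreground pixels
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    cx, cy = xs.mean(), ys.mean()                 # center (centroid)
    # second central moments (related to elongation and axis direction)
    mu20 = ((xs - cx) ** 2).mean()
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()
    axis_angle = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)   # principal axis direction
    return {"area": area, "height/width": height / width,
            "center": (cx, cy), "axis_direction": axis_angle,
            "second_moments": (mu20, mu02, mu11)}

img = np.zeros((20, 12), dtype=int)
img[2:18, 5:7] = 1                                # a crude vertical stroke, e.g. an "I"
print(character_features(img))
```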

4
Some concepts
  • Example of features used for representation of
    character (contd)
  • Clustering (classification)
  • E.g.,
  • -- K-NN (Nearest neighbors)
  • -- K-means algorithm

5
Some concepts
  • Example of features used for representation of
    character (contd)
  • Evaluation of the recognition
  • Cumulative Match Score
  • A plot of rank n scores versus the probability of
    correct identification is called a Cumulative
    Match Score curve. An example is given below.
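
A small sketch of how a Cumulative Match Score curve can be computed, assuming
we already know, for each probe, the rank at which its correct identity appears
in the sorted candidate list; the rank values below are invented for
illustration.

```python
import numpy as np

def cumulative_match_score(correct_ranks, max_rank):
    ranks = np.asarray(correct_ranks)
    # CMS(n) = probability that the correct identity appears within the top n
    return [(ranks <= n).mean() for n in range(1, max_rank + 1)]

ranks = [1, 1, 2, 1, 3, 1, 5, 1, 2, 1]            # hypothetical ranks for 10 probes
print(cumulative_match_score(ranks, max_rank=5))  # [0.6, 0.8, 0.9, 0.9, 1.0]
```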

6
Some concepts
  • Example of features used for representation of
    character (contd)
  • Evaluation of the recognition
  • False accept rate
  • the system thinks that the individual is who he
    says he is, even though he is not. This is called
    a false accept. The percentage of times that a
    false accept occurs across all individuals is
    called the false accept rate.
  • False reject rate
  • the system thinks that the individual is not who
    they say they are, even though they really are.
    This is called a false reject. The percentage
    of times that a false reject occurs across all
    individuals is called the false reject rate.
    Subtracting this rate from 100 (100 - false
    reject rate) gives us the Probability of
    Verification.
  • Receiver operating characteristic
  • The false accept rate and the probability of
    verification are not mutually exclusive. Instead,
    there is a give-and-take relationship between the
    two. The system parameters can be changed to
    achieve a lower false accept rate, but this
    also lowers the probability of verification. A
    plot that shows this relationship is called a
    receiver operating characteristic or ROC. An
    example computation sketch is given below.
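
A hedged sketch of computing the false accept rate, false reject rate and
verification ROC points from genuine and impostor match scores; the score
values, and the convention that higher scores support the claimed identity,
are assumptions.

```python
import numpy as np

genuine  = np.array([0.9, 0.8, 0.85, 0.7, 0.95, 0.6])   # same-person comparisons
impostor = np.array([0.3, 0.5, 0.4, 0.65, 0.2, 0.55])   # different-person comparisons

def far_frr(threshold):
    far = (impostor >= threshold).mean()     # impostors wrongly accepted
    frr = (genuine < threshold).mean()       # genuine users wrongly rejected
    return far, frr

# ROC points: sweep the decision threshold; P(verification) = 1 - FRR
for t in np.linspace(0.0, 1.0, 11):
    far, frr = far_frr(t)
    print(f"threshold={t:.1f}  FAR={far:.2f}  P(verification)={1 - frr:.2f}")
```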

7
Some concepts
  • Example of features used for representation of
    character (contd)
  • Evaluation of the recognition (contd)
  • Equal error rate
  • Some resources utilize a term called the equal
    error rate to show performance of a biometric
    system when operating using the verification
    task. The equal error rate is the rate at which
    the false accept rate is exactly equal to the
    false reject rate. For example, if a straight line
    is drawn on our example ROC curve from the upper
    left corner (coordinates 0,1) to the lower right
    corner (coordinates 1,0), the equal error rate is
    the point at which the curve crosses this line.
    This one point on the curve is not adequate to
    fully explain the performance of biometric
    systems being used for verification.
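
A short sketch of locating the equal error rate by sweeping the decision
threshold until the false accept and false reject rates cross; the score arrays
are the same kind of invented data as in the previous sketch.

```python
import numpy as np

genuine  = np.array([0.9, 0.8, 0.85, 0.7, 0.95, 0.6])
impostor = np.array([0.3, 0.5, 0.4, 0.65, 0.2, 0.55])

thresholds = np.linspace(0.0, 1.0, 1001)
far = np.array([(impostor >= t).mean() for t in thresholds])  # false accept rate
frr = np.array([(genuine < t).mean() for t in thresholds])    # false reject rate
i = int(np.argmin(np.abs(far - frr)))                          # FAR and FRR cross here
print(f"EER ~= {(far[i] + frr[i]) / 2:.3f} at threshold {thresholds[i]:.3f}")
```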

8
Algorithms of classification
  • Two classes are known
  • NN (nearest neighbor classifier)

(Figure: a sample x and two class regions W1 and W2)
If the distance d(x, W1) < d(x, W2), then x ∈ W1
(a sketch follows below)
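
A minimal sketch of the nearest-neighbor rule above; the class samples are
invented, and d(x, W) is taken as the distance to the closest sample of each
class (an assumption; a class mean could be used instead).

```python
import numpy as np

W1 = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0]])   # samples of class w1
W2 = np.array([[6.0, 6.0], [7.0, 5.5], [6.5, 7.0]])   # samples of class w2

def d(x, W):
    # distance from x to the nearest sample of class W
    return np.linalg.norm(W - x, axis=1).min()

x = np.array([2.5, 1.5])
label = "w1" if d(x, W1) < d(x, W2) else "w2"
print(label)   # -> w1
```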
9
Algorithms of classification (contd)
  • Two classes are known
  • K-NN (K-nearest neighbor classifier)
  • - If more of the K nearest neighbors of P belong
    to W1 than to W2, then P ∈ W1 (a sketch follows
    below)

(Figure: a point P and two class regions W1 and W2)
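
A sketch of the K-nearest-neighbor rule: P is assigned to the class holding the
majority of its K nearest neighbors. The sample points and K = 3 are
assumptions.

```python
import numpy as np

samples = np.array([[1, 1], [1.5, 2], [2, 1],          # class w1
                    [6, 6], [7, 5.5], [6.5, 7]])       # class w2
labels  = np.array([1, 1, 1, 2, 2, 2])

def knn_classify(P, samples, labels, k=3):
    dist = np.linalg.norm(samples - P, axis=1)
    nearest = labels[np.argsort(dist)[:k]]             # labels of the k closest samples
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]                   # majority vote

print(knn_classify(np.array([2.0, 2.0]), samples, labels))   # -> 1 (class w1)
```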
10
Algorithms of classification (contd)
  • K-means clustering (dynamic classifier)
  • - Two classes are unknown, but the number of
    classes is known
  • Algorithm (a sketch follows below)
  • (1) Arbitrarily choose two samples as the
    initial cluster centers
  • (2) Distribute the pattern samples into the
    cluster domains according to the distance from
    each sample to the cluster centers
  • (3) Update the cluster centers
  • (4) Repeat (2) and (3) until the updated
    cluster centers are unchanged
  • (5) End
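
A sketch of steps (1) to (5) for two clusters, assuming 2-D sample points; the
data values are invented for illustration.

```python
import numpy as np

def kmeans_two_clusters(X, max_iter=100):
    centers = X[:2].astype(float)                      # (1) two samples as initial centers
    assign = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        # (2) assign each sample to the nearest cluster center
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        # (3) update the cluster centers
        new_centers = np.array([X[assign == j].mean(axis=0) for j in range(2)])
        # (4) stop when the centers no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assign                             # (5) end

X = np.array([[1, 1], [1.2, 0.8], [0.9, 1.1], [5, 5], [5.2, 4.8], [4.9, 5.3]])
print(kmeans_two_clusters(X))   # one center near (1, 1), the other near (5, 5)
```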

11
Algorithms of classification (contd)
  • Decision function
  • - Discrimination function
  • - For N pattern classes w1, w2, ..., wN, the
    pattern recognition problem is to find N decision
    functions d1(x), d2(x), ..., dN(x) with the
    property that
  •      if a pattern x belongs to class wi,
  •      then di(x) > dj(x) for all j = 1, 2, ..., N (j ≠ i)
  • E.g., statistical classifier: Bayes classifier

12
Algorithms of classification (contd)
  • Classification system

(Diagram: input feature vector x = (x1, x2, ..., xd) → distance or probability
computations by discriminant functions F1(x, K), F2(x, K), ..., Fm(x, K) with
knowledge K → compare and decide → output classification C(x))
13
Algorithms of classification (contd)
E.g., statistical classifier: Bayes classifier
(contd) -- The principle is to minimize the
total average loss from making a wrong decision
     dj(x) = p(x|wj) p(wj)
where p(wj) is the probability of occurrence of class wj
and p(x|wj) is the probability density function of the
patterns from class wj
E.g., given two classes w1 and w2:
     if p(x|w1) p(w1) > p(x|w2) p(w2), then x ∈ w1
     if p(x|w1) p(w1) < p(x|w2) p(w2), then x ∈ w2
(a sketch follows below)
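
A hedged sketch of the rule dj(x) = p(x|wj) p(wj) for two classes, assuming
Gaussian class-conditional densities; the means, variances and priors are
invented, and SciPy's norm is used only to evaluate the densities.

```python
import numpy as np
from scipy.stats import norm

priors = {"w1": 0.6, "w2": 0.4}                     # p(wj)
densities = {"w1": norm(loc=0.0, scale=1.0),        # p(x|w1), assumed Gaussian
             "w2": norm(loc=3.0, scale=1.5)}        # p(x|w2), assumed Gaussian

def bayes_classify(x):
    scores = {c: densities[c].pdf(x) * priors[c] for c in priors}   # dj(x)
    return max(scores, key=scores.get)               # decide the class with the larger dj

print(bayes_classify(0.5))   # -> w1
print(bayes_classify(2.8))   # -> w2
```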
14
Algorithms of classification (contd)
  • Neural Networks
  • - Training patterns are used for estimating
    the parameters for decision functions
  • - The training process uses the training set to
    obtain the decision functions
  • Discrimination function
  • (Equation: the discriminant d(x) is formed from
    the weight vector w and the pattern vector x,
    e.g. d(x) = wᵀx)
15
Algorithms of classification (contd)
  • Neural Networks (NN) (contd)
  • NN Training algorithm
  • -- for linearly separable classes
  • -- e.g., two classes w1 and w2

16
Algorithms of classification (contd)
  • NN training algorithm (contd)
  • -- e.g., two classes w1 and w2
  • First, arbitrarily choose an initial weight
    vector w(1)
  • Second, at the kth iterative step:
  • (1) if x(k) ∈ w1 and wᵀ(k)x(k) ≤ 0,
    then replace w(k) by w(k+1) = w(k) + c·x(k)
  • (2) if x(k) ∈ w2 and wᵀ(k)x(k) ≥ 0,
    then replace w(k) by w(k+1) = w(k) - c·x(k)
  • (3) otherwise w(k+1) = w(k)
  • Note: c is a positive correction increment
    (a sketch follows below)
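
A sketch of the perceptron-style correction rule above, using augmented pattern
vectors (x, 1) so the bias is part of the weight vector; the sample points and
c = 1 are assumptions.

```python
import numpy as np

w1 = np.array([[0.0, 0.0], [0.0, 1.0]])    # samples of class w1
w2 = np.array([[1.0, 0.0], [1.0, 1.0]])    # samples of class w2

def train_perceptron(w1, w2, c=1.0, max_epochs=100):
    aug = lambda X: np.hstack([X, np.ones((len(X), 1))])   # append 1 for the bias
    w = np.zeros(3)                                        # initial weight vector (arbitrary)
    for _ in range(max_epochs):
        changed = False
        for y in aug(w1):                   # class w1 should satisfy w.y > 0
            if w @ y <= 0:
                w = w + c * y               # correction toward the misclassified sample
                changed = True
        for y in aug(w2):                   # class w2 should satisfy w.y < 0
            if w @ y >= 0:
                w = w - c * y
                changed = True
        if not changed:                     # no corrections: the classes are separated
            break
    return w

print(train_perceptron(w1, w2))             # -> [-2.  0.  1.]
```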
17
Algorithms of classification (contd)
  • Hidden Markov Models
  • -- A statistical approach to constructing
    classifiers
  • -- A statistical model for an ordered
    sequence of symbols
  • -- Acting as a stochastic state machine that
    generates a symbol each time a transition is made
    from one state to the next.
  • -- Transitions between states are specified
    by transition probabilities.
  • -- A Markov process is a process that moves
    from state to state depending on the previous n
    states.

18
Algorithms of classification (contd)
  • Hidden Markov Models (contd)
  • -- Concept of a Markov chain: a process that
    can be in one of a number of states at any given
    time.
  • -- Each state generates an observation, from
    which the state sequence can be inferred.
  • -- A Markov chain is defined by the
    probabilities of each state transition occurring,
    given the current state.
  • -- It assumes that the probability of moving
    from one state to another does not vary with time.
  • -- An HMM is a variation of a Markov chain in
    which the states in the chain are hidden.

19
Algorithms of classification (contd)
  • Hidden Markov Models (contd)
  • -- Like a neural network classifier, an HMM
    must be trained before it can be used.
  • -- Training establishes the transition
    probabilities for each state in the Markov chain.
  • -- When presented with data from the database,
    the HMM provides a measure of how closely the data
    patterns resemble the data used to train the
    model.

20
Algorithms of classification (contd)
  • Hidden Markov Models (contd)

Example of a Markov chain: A, B, and C represent
states, and the arrows connecting the states
represent transitions (a simulation sketch follows
below)
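
A sketch simulating a three-state Markov chain like the one in the figure; the
transition probabilities are invented, since the slide only shows the diagram.

```python
import numpy as np

states = ["A", "B", "C"]
P = np.array([[0.7, 0.2, 0.1],     # transition probabilities from A
              [0.3, 0.4, 0.3],     # from B
              [0.2, 0.3, 0.5]])    # from C  (each row sums to 1)

def simulate(start="A", steps=10, seed=0):
    rng = np.random.default_rng(seed)
    i = states.index(start)
    chain = [start]
    for _ in range(steps):
        i = rng.choice(3, p=P[i])          # next state depends only on the current state
        chain.append(states[i])
    return chain

print(simulate())
```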
21
Pattern Matching
  • Computational methods for pattern matching and
    sequence alignment
  • -- e.g., Bayesian methods, neural networks,
    HMMs, genetic algorithms, dynamic programming, dot
    matrix (syntactic method), etc.
  • Dot matrix analysis
  • -- a visually intuitive method of pattern
    detection
  • -- it is used for bioinformatics data analysis

22
Pattern Matching
  • Dot matrix analysis (contd)

Sequence 1: A T T C G G C A T T C
Sequence 2: A T T C G A C A T T
23
Pattern Matching
Note: a filter with a combination of window and
stringency is used to screen out noise in the
pattern. The window refers to the number of
data points examined at a time; the stringency is
the minimum number of matches required within
each window. For example, with a filter in which
the window size is set to 2 and the stringency
to 1, a dot is printed at a matrix position
only if at least 1 out of the 2 positions matches.
Dot matrix analysis is useful in identifying
repeats -- repeating characters or short sequences.
  • Dot matrix analysis (contd) (a sketch follows below)

Sequence 1: A T T C G G C A T T C
Sequence 2: A T T C G A C A T T
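
A sketch of dot matrix analysis with the window/stringency filter described
above, applied to the two sequences from the slide (window = 2, stringency = 1).

```python
seq1 = "ATTCGGCATTC"
seq2 = "ATTCGACATT"

def dot_matrix(s1, s2, window=2, stringency=1):
    rows = []
    for i in range(len(s2) - window + 1):
        row = ""
        for j in range(len(s1) - window + 1):
            matches = sum(s2[i + k] == s1[j + k] for k in range(window))
            row += "*" if matches >= stringency else "."   # dot only if enough matches
        rows.append(row)
    return "\n".join(rows)

print(dot_matrix(seq1, seq2))   # the long diagonal of '*' marks the similar region
```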
24
Pattern Recognition for knowledge discovery
  • Knowledge discovery process
  • -- Selection and sampling of the appropriate
    data from the databases
  • -- Preprocessing and cleaning of the data to
    remove redundancies, errors, and conflicts
  • -- Transforming and reducing the data to a
    format more suitable for data mining
  • -- Data mining
  • -- Evaluation of the mined data
  • -- Visualization of the evaluation results
  • -- Designing new data queries to test new
    hypotheses and returning to step 1

25
Pattern Recognition for knowledge discovery
  • Knowledge discovery process
(Diagram: separate databases → selection/sampling →
pre-processing/cleaning → transformation/reduction →
data warehouse → data mining → evaluation →
visualization)
26
Pattern Recognition for knowledge discovery
  • Pattern discovery
  • -- Data mining is the process of identifying
    patterns and relationships in data that often are
    not obvious in large, complex data sets. As such,
    data mining involves pattern recognition and, by
    extension, pattern discovery.
  • -- It is concerned with the automatic
    classification of objects, character sequences,
    3D structures, etc.

(Diagram: pattern → feature extraction)
27
Pattern Recognition for knowledge discovery
  • Pattern discovery
  • -- The process of data mining is concerned with
    extracting patterns from the data, typically using
  •   - classification (e.g., mapping to a class or a group)
  •   - regression (e.g., statistical analysis)
  •   - link analysis (e.g., correlation of data)
  •   - segmentation (e.g., similarity function)
  •   - deviation detection (e.g., difference from norm)

28
Motion object tracking
  • Optical flow (motion velocity of an object)
  • - The intensity of the motion image is a
    function of the position and time:
  •        I = f(x, y, t)
  • - In a small time period, the intensity
    change is small enough to be considered as
    unchanged. After time step dt, point (x, y) moves
    to point (x+dx, y+dy). If dt is small, then
  •        f(x, y, t) = f(x+dx, y+dy, t+dt)

29
Motion object tracking
  • Optical flow (motion velocity of object)
  • - Taylor series expansion of the above equation
    gives the optical flow constraint:
         fx·u + fy·v + ft = 0
    where (u, v) is the velocity of point (x, y) and
    (fx, fy) is the gray-level gradient
30
Motion object tracking (contd)
  • Optical flow (contd)
  • - Note: the temporal gray-level change at a
    point is equal to the gradient (spatial change)
    times the motion velocity
  • In order to solve the equation for the two
    unknowns (u, v), an additional condition
    (constraint) is necessary.
  • Assume the velocity in neighboring pixels is
    similar. If there are two different velocities in
    an image, we treat them as two different rigid
    objects.

31
Motion object tracking (contd)
  • Optical flow (contd)
  • - Find (u, v) such that the error ε² is minimized,
    where ε² combines the squared optical flow
    constraint error (fx·u + fy·v + ft)² with a
    smoothness term that penalizes velocity variation
    between neighboring pixels
32
Motion object tracking (contd)
  • Optical flow (contd)
  • - Solution (u, v): an iterative relaxation update
    in which each velocity estimate is replaced by the
    average of its 4 neighboring velocities (which
    reduces the noise influence), corrected by the
    gray-level gradient constraint
  • Note: the relaxation starts from initial velocity
    estimates, commonly (u, v) = (0, 0) (a sketch
    follows below)
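
A hedged sketch of the relaxation solution described above, in the spirit of
the classic Horn-Schunck iteration: velocities start at zero and are repeatedly
replaced by their 4-neighbor averages, corrected by the gray-level gradient
constraint. The test frames and the smoothness weight alpha are assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def optical_flow(f1, f2, alpha=1.0, iters=100):
    f1, f2 = f1.astype(float), f2.astype(float)
    fx = np.gradient(f1, axis=1)               # spatial gray-level gradients
    fy = np.gradient(f1, axis=0)
    ft = f2 - f1                               # temporal gray-level change
    u = np.zeros_like(f1)                      # initial velocities (u, v) = (0, 0)
    v = np.zeros_like(f1)
    avg = np.array([[0, 0.25, 0], [0.25, 0, 0.25], [0, 0.25, 0]])   # 4-neighbor average
    for _ in range(iters):
        u_bar = convolve(u, avg)
        v_bar = convolve(v, avg)
        t = (fx * u_bar + fy * v_bar + ft) / (alpha**2 + fx**2 + fy**2)
        u = u_bar - fx * t                     # relaxation update toward the constraint
        v = v_bar - fy * t
    return u, v

# toy pair of frames: a bright square shifted one pixel to the right
f1 = np.zeros((20, 20)); f1[8:12, 8:12] = 1.0
f2 = np.zeros((20, 20)); f2[8:12, 9:13] = 1.0
u, v = optical_flow(f1, f2)
print(u[9:11, 9:11].round(2))                  # u is roughly positive near the square
```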
33
Similarity measurement of visual information
  • Texture and shape
  • (application: multimedia information retrieval)
  • - Geometric shape (e.g., deformable mesh model)
  • - Texture correlation
  • - Hybrid measurement of texture and shape
    similarity
34
Similarity measurement of visual information
  • Texture signatures (a sketch follows below)
  • - contrast: based on the image standard deviation
    and the image fourth moment
  • - coarseness: k is obtained by maximizing the
    average intensity of the moving 2k x 2k window
  • - directionality: based on the magnitude of the
    gradient vector
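
A sketch of simple texture signatures in the spirit of this slide: contrast
from the image standard deviation and fourth moment, directionality from the
gradient vector. These are illustrative, Tamura-style formulas rather than the
exact definitions behind the original figure; the ramp image is an assumption,
and coarseness is omitted.

```python
import numpy as np

def texture_signatures(img):
    img = img.astype(float)
    mu, sigma = img.mean(), img.std()
    m4 = ((img - mu) ** 4).mean()                 # image fourth (central) moment
    kurtosis = m4 / sigma ** 4
    contrast = sigma / kurtosis ** 0.25           # Tamura-style contrast
    gy, gx = np.gradient(img)
    grad_mag = np.hypot(gx, gy)                   # magnitude of the gradient vector
    return {"contrast": contrast,
            "mean_gradient_magnitude": grad_mag.mean(),
            "mean_gradient_direction": float(np.arctan2(gy.mean(), gx.mean()))}

img = np.tile(np.arange(16.0), (16, 1))           # a simple horizontal ramp texture
print(texture_signatures(img))
```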
35
Similarity measurement of visual information
  • Shape features
  • - Coefficients of 2D Fourier transformation

Note: similarity is evaluated in the frequency
domain. Advantage: translation is reflected only
in a change of the F(m,n) phase, while the F(m,n)
modulus remains constant (see the sketch below).
Disadvantage: changes in scale and orientation of
the object shape cause substantial changes in this
representation.
- Chain encoding
A simple representation is obtained by
considering the boundary points.
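
A short sketch verifying the stated advantage: translating a shape changes only
the phase of its 2-D Fourier transform, while the modulus |F(m, n)| stays the
same. The square shape and the circular shift are assumptions.

```python
import numpy as np

shape = np.zeros((32, 32)); shape[10:18, 10:18] = 1.0      # a binary square
shifted = np.roll(shape, shift=(5, 7), axis=(0, 1))         # translated copy (circular shift)

F1, F2 = np.fft.fft2(shape), np.fft.fft2(shifted)
print(np.allclose(np.abs(F1), np.abs(F2)))                  # True: the modulus is unchanged
print(np.allclose(np.angle(F1), np.angle(F2)))              # False: the phase carries the shift
```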
36
Similarity measurement of visual information
  • Shape features
  • - Moments

For a 2D function f(x, y), the moment of
order (p+q) is defined as
     m_pq = ∫∫ x^p y^q f(x, y) dx dy
Note: the moment sequence (m_pq) and f(x, y)
uniquely determine each other
37
Similarity measurement of visual information
  • Shape features (contd)
  • - Moments (contd)

Central moments:
     mu_pq = ∫∫ (x - x̄)^p (y - ȳ)^q f(x, y) dx dy,
     where x̄ = m10/m00 and ȳ = m01/m00
In a digital image:
     mu_pq = Σx Σy (x - x̄)^p (y - ȳ)^q f(x, y)
38
Similarity measurement of visual information
  • Shape features (contd)
  • - Moments (contd)

Digital moments: if we consider a binary
image with f(x, y) = 1 in region R, the central
(p, q)-th moment is
     mu_pq = Σ_(x,y)∈R (x - x̄)^p (y - ȳ)^q
Note: m00 represents the area of R
39
Similarity measurement of visual information
  • Shape features (contd)
  • - Digital Moments (contd)

Coordinates normalized by the standard deviations;
normalization by the area
40
Similarity measurement of visual information
  • Shape features (contd)
  • - Digital Moments (contd)

Normalized central moments are obtained from the
central moments according to the following
transformation:
     eta_pq = mu_pq / mu_00^γ,   γ = (p+q)/2 + 1
Note: powerful descriptors based on digital
moments are functions of moments that are invariant
under scaling, translation, rotation or squeezing
41
Similarity measurement of visual information
  • Shape features (contd)
  • - Digital Moments (contd)

A set of seven invariant moments can be derived
from the 2nd and 3rd order moments. Six of them
(φ1 - φ6) are rotation invariant and one (φ7) is
both skew and rotation invariant.
42
Similarity measurement of visual information
  • Shape features (contd)
  • - Digital Moments (contd)

A set of seven invariant moments can be derived
from the 2nd and 3rd order moments. Six of them
(φ1 - φ6) are rotation invariant and one (φ7) is
both skew and rotation invariant (a sketch follows
below).
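
A sketch computing digital central moments, normalized central moments, and the
first two invariant moments φ1 and φ2 for a binary region, then checking that
the values are unchanged when the region is rotated by 90 degrees; the test
rectangle is an assumption.

```python
import numpy as np

def invariant_moments(img):
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m = lambda p, q: np.sum((xs ** p) * (ys ** q) * img)        # raw moment m_pq
    m00 = m(0, 0)                                               # area of the region
    xc, yc = m(1, 0) / m00, m(0, 1) / m00                       # centroid
    mu = lambda p, q: np.sum(((xs - xc) ** p) * ((ys - yc) ** q) * img)   # central moments
    eta = lambda p, q: mu(p, q) / m00 ** ((p + q) / 2 + 1)      # normalized central moments
    phi1 = eta(2, 0) + eta(0, 2)                                # rotation-invariant moment 1
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2    # rotation-invariant moment 2
    return phi1, phi2

img = np.zeros((30, 30)); img[5:15, 5:25] = 1                   # a 10 x 20 rectangle
rot = img.T                                                     # the same rectangle rotated 90 degrees
print(invariant_moments(img))
print(invariant_moments(rot))   # identical values: invariance under rotation
```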
43
Support Vector Machine
  • For a given set of points belonging to two
    classes, an SVM tries to find a decision function
    for an optimal separating hyperplane (OSH) which
    maximizes the margin between the two sets of data
    points. The solutions to such optimization
    problems are derived by training the SVM with
    sets of data similar to what it may encounter
    during its application.

Classification using a small-margin hyperplane
(left) and the optimal hyperplane (right)
44
Support Vector Machine
Consider a binary classification task with a
training set of points x_i ∈ R^n, i = 1, 2, ..., N,
where each point belongs to a corresponding class
label y_i ∈ {-1, +1}, and let the decision function
be f(x) = sign(w·x + b), where w denotes the weights
and b the bias of the decision function. A point
lying directly on the hyperplane satisfies
w·x + b = 0, and the points lying on either side of
the hyperplane must satisfy one of the following
conditions:
     x_i·w + b ≥ 0   for points with y_i = +1   (9)
     x_i·w + b ≤ 0   for points with y_i = -1   (10)
Eqs. (9) and (10) can be combined into the single
condition y_i (x_i·w + b) ≥ 0 ∀i, which holds true
for all input points if the classification is
correct.
45
Support Vector Machine
There is only one optimal hyperplane: the one that
maximizes the distance to the support vectors. For
a maximum margin, the distance from the hyperplane
to the support vectors on either side must be
equal. Let H- denote the hyperplane which satisfies
x_i·w + b = -1 and H+ the hyperplane which
satisfies x_i·w + b = +1. Hence the maximum margin
between the hyperplanes becomes 2/||w||. Thus, the
hyperplane that optimally separates the data is the
one that minimizes ||w||²/2 = 0.5 wᵀw = 0.5 (w1² +
w2² + ... + wn²), subject to the constraints above.
This is a classic quadratic optimization problem
with inequality constraints and is solved at the
saddle point of the Lagrange functional
(Lagrangian) L_p, where the a_i are Lagrange
multipliers. The optimal saddle point (w_o, b_o,
a_o) is found by minimizing L_p with respect to w
and b and maximizing it with respect to the
non-negative a_i (a training sketch follows below).
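
A sketch of training a maximum-margin linear SVM on a toy two-class problem.
scikit-learn is an assumption here (the slides name no library); its SVC solves
the quadratic optimization problem described above, and a large C approximates
the hard-margin optimal separating hyperplane.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],    # class -1
              [6.0, 6.0], [7.0, 5.5], [6.5, 7.0]])   # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)        # large C approximates the hard-margin OSH
clf.fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]   # decision function f(x) = sign(w.x + b)
print("w =", w, "b =", b)
print("support vectors:", clf.support_vectors_)
print("prediction for [4, 4]:", clf.predict([[4.0, 4.0]]))
```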