Title: Pattern/Object Recognition
1. Pattern/Object Recognition
- Statistical-based
- Feature-based
- Neural-network based
- Structure-based
2. Some concepts
- Class: a set of objects having some important common properties
- Classification: a process that assigns a label to an object according to some representation of the object's properties
- Classifier: a device or algorithm that takes an object representation as input and outputs a class label
- Pattern: an arrangement of descriptors (features), such as length, area, texture, etc.
- Feature extractor: extracts the information relevant to classification from the data input by sensors
3. Some concepts
- Example of features used for representation of characters A, B, C, ..., Z:
-- area
-- height/width
-- number of holes
-- center
-- axis direction
-- second moments
- Feature extraction
-- We select or combine features (e.g., fusion of features) in order to achieve the highest recognition rate. A minimal sketch of these character features is given below.
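The following minimal Python sketch (assuming NumPy and SciPy are available) shows one way the listed features could be computed from a small binary character image; the function name, the toy glyph, and the hole-counting approach are illustrative choices, not the method prescribed by the slides.

```python
import numpy as np
from scipy import ndimage

def character_features(img):
    """Compute the slide's example features for a binary character image
    (nonzero pixels = ink). Names and choices here are illustrative."""
    ys, xs = np.nonzero(img)
    area = len(xs)                                   # number of ink pixels
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    cy, cx = ys.mean(), xs.mean()                    # center (centroid)

    # number of holes: background components not touching the image border
    bg_labels, n_bg = ndimage.label(img == 0)
    border = np.unique(np.concatenate([bg_labels[0, :], bg_labels[-1, :],
                                       bg_labels[:, 0], bg_labels[:, -1]]))
    holes = n_bg - len(border[border > 0])

    # second (central) moments; the principal axis direction follows from them
    mu20 = ((xs - cx) ** 2).mean()
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()
    axis_angle = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)

    return {"area": area, "height/width": height / width, "holes": holes,
            "center": (cx, cy), "axis_direction": axis_angle,
            "second_moments": (mu20, mu11, mu02)}

# usage: a tiny 'O'-like glyph with one hole
glyph = np.array([[0, 1, 1, 1, 0],
                  [0, 1, 0, 1, 0],
                  [0, 1, 0, 1, 0],
                  [0, 1, 1, 1, 0]])
print(character_features(glyph))
```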
4. Some concepts
- Example of features used for representation of characters (contd)
- Clustering (classification), e.g.:
-- K-NN (K nearest neighbors)
-- K-means algorithm
5. Some concepts
- Example of features used for representation of characters (contd)
- Evaluation of the recognition
-- Cumulative Match Score
-- A plot of rank-n scores versus the probability of correct identification is called a Cumulative Match Score. (Figure: an example Cumulative Match Score curve.)
6. Some concepts
- Example of features used for representation of characters (contd)
- Evaluation of the recognition
-- False accept rate: the system thinks that the individual is who he says he is, even though he is not. This is called a false accept. The percentage of times that a false accept occurs across all individuals is called the false accept rate.
-- False reject rate: the system thinks that the individual is not who they say they are, even though they really are. This is called a false reject. The percentage of times that a false reject occurs across all individuals is called the false reject rate. Subtracting this rate from 100 (100 - false reject rate) gives the Probability of Verification.
-- Receiver operating characteristic: the false accept rate and the probability of verification are not mutually exclusive; instead, there is a give-and-take relationship between the two. The system parameters can be changed to obtain a lower false accept rate, but this also lowers the probability of verification. A plot that shows this relationship is called a receiver operating characteristic, or ROC. (Figure: an example verification ROC curve; a computational sketch of these rates follows below.)
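To make the rates concrete, here is a small NumPy sketch that sweeps a decision threshold over invented genuine and impostor match scores and reports the false accept rate, false reject rate, and probability of verification; the score values and threshold grid are made up for illustration.

```python
import numpy as np

# Illustrative similarity scores (higher = stronger claim "same person").
genuine  = np.array([0.91, 0.85, 0.78, 0.66, 0.95, 0.72])   # same-person comparisons
impostor = np.array([0.31, 0.55, 0.42, 0.68, 0.22, 0.49])   # different-person comparisons

thresholds = np.linspace(0.0, 1.0, 101)
far = np.array([(impostor >= t).mean() for t in thresholds])  # false accept rate
frr = np.array([(genuine  <  t).mean() for t in thresholds])  # false reject rate
pv  = 1.0 - frr                                               # probability of verification

# A few points of the verification ROC: FAR versus probability of verification.
for t in (0.4, 0.6, 0.8):
    i = np.argmin(np.abs(thresholds - t))
    print(f"threshold={t:.1f}  FAR={far[i]:.2f}  FRR={frr[i]:.2f}  PV={pv[i]:.2f}")
```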
7. Some concepts
- Example of features used for representation of characters (contd)
- Evaluation of the recognition (contd)
-- Equal error rate: some resources use a term called the equal error rate to describe the performance of a biometric system operating in the verification task. The equal error rate is the rate at which the false accept rate is exactly equal to the false reject rate. For example, if a straight line is drawn on our example ROC curve from the upper left corner (coordinates 0,1) to the lower right corner (coordinates 1,0), the equal error rate is the point at which the curve crosses this line. This one point on the curve is not adequate to fully characterize the performance of biometric systems used for verification. (A sketch of locating this point follows below.)
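A short sketch of how the equal error rate can be located numerically from FAR and FRR curves computed over a threshold sweep; the two curves below are synthetic stand-ins.

```python
import numpy as np

# Hypothetical FAR/FRR curves over increasing thresholds (e.g. from a sweep like the one above).
thresholds = np.linspace(0.0, 1.0, 101)
far = np.clip(1.0 - thresholds * 1.2, 0.0, 1.0)   # falls as the threshold rises
frr = np.clip(thresholds * 1.1 - 0.1, 0.0, 1.0)   # rises as the threshold rises

i = np.argmin(np.abs(far - frr))                   # where the two rates cross
print(f"EER ~ {(far[i] + frr[i]) / 2:.3f} at threshold {thresholds[i]:.2f}")
```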
8. Algorithms of classification
- NN (nearest neighbor classifier)
-- If distance d(x, W1) < d(x, W2), then x ∈ W1.
(Figure: a sample x between the two classes W1 and W2.)
9. Algorithms of classification (contd)
- K-NN (K-nearest neighbor classifier)
-- If the majority of the K nearest neighbors of P belong to W1 rather than W2, then P ∈ W1. (A minimal sketch follows below.)
(Figure: a sample P surrounded by neighbors from classes W1 and W2.)
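A minimal NumPy sketch of this majority-vote rule; the function name, the Euclidean distance, and the toy two-class data are illustrative assumptions.

```python
import numpy as np

def knn_classify(P, samples, labels, k=3):
    """Assign P to the class that owns the majority of its k nearest neighbors."""
    d = np.linalg.norm(samples - P, axis=1)          # distances to every stored sample
    nearest = labels[np.argsort(d)[:k]]              # labels of the k closest samples
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]                 # majority vote

# usage: two classes W1 and W2 in the plane
samples = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],    # W1
                    [1.0, 1.0], [1.1, 0.9], [0.9, 1.2]])   # W2
labels = np.array(["W1", "W1", "W1", "W2", "W2", "W2"])
print(knn_classify(np.array([0.3, 0.2]), samples, labels, k=3))   # -> W1
```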
10. Algorithms of classification (contd)
- K-means clustering (dynamic classifier)
-- The classes themselves are unknown, but the number of classes is known.
-- Algorithm (see the sketch below):
(1) Arbitrarily choose two samples as the initial cluster centers.
(2) Distribute the pattern samples into the cluster domains according to the distance from each sample to the cluster centers.
(3) Update the cluster centers.
(4) Repeat (2) and (3) until the updated cluster centers are unchanged.
(5) End.
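A short NumPy sketch of steps (1)-(5); the random initialization and the toy data are illustrative choices.

```python
import numpy as np

def k_means(X, k=2, n_iter=100, seed=0):
    """Steps (1)-(4) above: pick initial centers, assign each sample to the
    nearest center, update the centers, and repeat until they stop changing."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]        # (1) initial centers
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)                                  # (2) distribute samples
        new_centers = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):                      # (4) centers unchanged
            break
        centers = new_centers                                      # (3) update the centers
    return centers, assign

# usage: two unlabeled clusters; the number of classes (k = 2) is known
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.2, (20, 2)), rng.normal(3.0, 0.2, (20, 2))])
centers, assign = k_means(X, k=2)
print(centers)
```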
11. Algorithms of classification (contd)
- Decision function (discrimination function)
-- For N pattern classes w1, w2, ..., wN, the pattern recognition problem is to find N decision functions d1(x), d2(x), ..., dN(x) with the property that if a pattern x belongs to class wi, then di(x) > dj(x) for all j = 1, 2, ..., N (j ≠ i).
-- E.g., a statistical classifier: the Bayes classifier.
12. Algorithms of classification (contd)
(Block diagram: an input feature vector X = (x1, x2, ..., xd) is fed to distance or probability computations F1(x, K), F2(x, K), ..., Fm(x, K) by a discriminant function F with knowledge K; a compare-and-decide stage then outputs the classification C(x).)
13. Algorithms of classification (contd)
- E.g., statistical classifier: Bayes classifier (contd)
-- The principle is to minimize the total average loss incurred by making a wrong decision:
dj(x) = p(x|wj) p(wj)
where p(wj) is the probability of occurrence of class wj and p(x|wj) is the probability density function of the patterns from class wj.
-- E.g., given two classes w1 and w2 (see the sketch below):
if p(x|w1) p(w1) > p(x|w2) p(w2), then x ∈ w1
if p(x|w1) p(w1) < p(x|w2) p(w2), then x ∈ w2
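A minimal sketch of this decision rule for two one-dimensional classes, assuming Gaussian class-conditional densities; the densities, priors, and test points are invented for illustration (requires NumPy and SciPy).

```python
import numpy as np
from scipy.stats import norm

# Decision rule d_j(x) = p(x|w_j) p(w_j) for two classes with made-up parameters.
priors = {"w1": 0.6, "w2": 0.4}                       # p(w_j)
densities = {"w1": norm(loc=0.0, scale=1.0),          # p(x|w1)
             "w2": norm(loc=3.0, scale=1.0)}          # p(x|w2)

def bayes_classify(x):
    scores = {c: densities[c].pdf(x) * priors[c] for c in priors}   # d_j(x)
    return max(scores, key=scores.get)                              # largest d_j wins

for x in (0.5, 1.6, 2.5):
    print(x, "->", bayes_classify(x))
```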
14. Algorithms of classification (contd)
- Neural Networks
-- Training patterns are used for estimating the parameters of the decision functions.
-- The training process uses the training set to obtain the decision functions.
-- Discrimination function: a function of the pattern vector x and a weight vector w (e.g., d(x) = w·x).
15. Algorithms of classification (contd)
- Neural Networks (NN) (contd)
- NN training algorithm
-- for linearly separable classes
-- e.g., two classes w1 and w2
16. Algorithms of classification (contd)
- NN training algorithm (contd)
-- e.g., two classes w1 and w2
-- First, arbitrarily choose an initial weight vector w(1).
-- Second, at the kth iterative step:
(1) If x(k) ∈ w1 and w(k)·x(k) <= 0, then replace w(k) by w(k+1) = w(k) + c x(k).
(2) If x(k) ∈ w2 and w(k)·x(k) >= 0, then replace w(k) by w(k+1) = w(k) - c x(k).
(3) Otherwise, w(k+1) = w(k).
Note: c is a positive correction increment. A sketch of this procedure is given below.
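A small Python sketch of this iterative rule; augmenting each pattern with a trailing 1 to absorb the bias, and the toy data, are illustrative choices.

```python
import numpy as np

def train_perceptron(X1, X2, c=1.0, max_epochs=100):
    """Iterative correction rule above for two linearly separable classes.
    Patterns are augmented with a trailing 1 so the bias is absorbed into w."""
    aug = lambda X: np.hstack([X, np.ones((len(X), 1))])
    samples = np.vstack([aug(X1), aug(X2)])
    classes = np.array([1] * len(X1) + [2] * len(X2))
    w = np.zeros(samples.shape[1])                       # arbitrary initial weight vector
    for _ in range(max_epochs):
        changed = False
        for x, cls in zip(samples, classes):
            if cls == 1 and w @ x <= 0:
                w = w + c * x                            # rule (1)
                changed = True
            elif cls == 2 and w @ x >= 0:
                w = w - c * x                            # rule (2)
                changed = True
            # rule (3): otherwise w is left unchanged
        if not changed:                                  # every sample classified correctly
            break
    return w

W1 = np.array([[0.0, 0.0], [0.5, 1.0]])
W2 = np.array([[3.0, 3.0], [3.5, 2.5]])
print(train_perceptron(W1, W2))
```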
17. Algorithms of classification (contd)
- Hidden Markov Models
-- A statistical approach to constructing classifiers.
-- A statistical model for an ordered sequence of symbols.
-- Acts as a stochastic state machine that generates a symbol each time a transition is made from one state to the next.
-- Transitions between states are specified by transition probabilities.
-- A Markov process is a process that moves from state to state depending on the previous n states.
18. Algorithms of classification (contd)
- Hidden Markov Models (contd)
-- Concept of a Markov chain: a process that can be in one of a number of states at any given time.
-- Each state generates an observation, from which the state sequence can be inferred.
-- A Markov chain is defined by the probabilities of each transition between states occurring, given the current state.
-- It assumes that the probability of moving from one state to another does not vary with time.
-- An HMM is a variation of a Markov chain in which the states in the chain are hidden.
19. Algorithms of classification (contd)
- Hidden Markov Models (contd)
-- Like a neural network classifier, an HMM must be trained before it can be used.
-- Training establishes the transition probabilities for each state in the Markov chain.
-- When presented with data from the database, the HMM provides a measure of how closely the data patterns resemble the data used to train the model.
20. Algorithms of classification (contd)
- Hidden Markov Models (contd)
(Figure: example of a Markov chain; A, B, and C represent states, and the arrows connecting the states represent transitions. A sampling sketch of such a chain is given below.)
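A tiny sketch of such a chain with states A, B, and C; the transition probabilities are invented for illustration, and sampling simply follows the rule that the next state depends only on the current one.

```python
import numpy as np

# Hypothetical transition probabilities for the three states A, B, C of the example;
# row i gives the probability of moving from state i to each state.
states = ["A", "B", "C"]
P = np.array([[0.1, 0.6, 0.3],    # from A
              [0.4, 0.2, 0.4],    # from B
              [0.5, 0.3, 0.2]])   # from C

def sample_chain(n_steps, start="A", seed=0):
    """Walk the chain: the next state depends only on the current one."""
    rng = np.random.default_rng(seed)
    state = states.index(start)
    path = [start]
    for _ in range(n_steps):
        state = rng.choice(3, p=P[state])
        path.append(states[state])
    return path

print("".join(sample_chain(15)))   # a sequence of visited states
```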
21. Pattern Matching
- Computational methods for pattern matching and sequence alignment
-- e.g., Bayesian methods, neural networks, HMMs, genetic algorithms, dynamic programming, dot matrix (syntactic method), etc.
- Dot matrix analysis
-- a visually intuitive method of pattern detection
-- used for bioinformatics data analysis
22. Pattern Matching
- Dot matrix analysis (contd)
(Figure: dot matrix plot comparing the sequences A T T C G G C A T T C and A T T C G A C A T T.)
23. Pattern Matching
- Dot matrix analysis (contd)
-- Note: a filter with a combination of window and stringency is used to screen out noise in the pattern. The window refers to the number of data points examined at a time; the stringency is the minimum number of matches required within each window. For example, with a filter in which the window size is set to 2 and the stringency to 1, a dot is printed at a matrix position only if 1 out of the 2 positions matches.
-- Dot matrix analysis is useful in identifying repeats -- repeating characters or short sequences.
(Figure: dot matrix plot of the sequences A T T C G G C A T T C and A T T C G A C A T T. A sketch of the filtered dot matrix is given below.)
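A short Python sketch of dot matrix analysis with the window/stringency filter described above, applied to the two example sequences; the text rendering of the plot is an illustrative choice.

```python
import numpy as np

def dot_matrix(seq1, seq2, window=2, stringency=1):
    """A dot is kept at (i, j) only if, within the window starting there,
    at least `stringency` positions of the two sequences match."""
    m = np.zeros((len(seq1), len(seq2)), dtype=int)
    for i in range(len(seq1)):
        for j in range(len(seq2)):
            span = min(window, len(seq1) - i, len(seq2) - j)
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(span))
            m[i, j] = int(matches >= stringency)
    return m

s1, s2 = "ATTCGGCATTC", "ATTCGACATT"   # the two example sequences from the slide
m = dot_matrix(s1, s2, window=2, stringency=1)
for i, row in enumerate(m):            # crude text rendering of the plot
    print(s1[i], " ".join("." if v == 0 else "*" for v in row))
print("  " + " ".join(s2))
```

The diagonal runs of dots reveal the shared subsequences, including the repeated ATT at both ends of the sequences.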
24. Pattern Recognition for knowledge discovery
- Knowledge discovery process
-- Selection and sampling of the appropriate data from the databases
-- Preprocessing and cleaning of the data to remove redundancies, errors, and conflicts
-- Transforming and reducing the data to a format more suitable for the data mining
-- Data mining
-- Evaluation of the mined data
-- Visualization of the evaluation results
-- Designing new data queries to test new hypotheses and returning to step 1
25. Pattern Recognition for knowledge discovery
- Knowledge discovery process
(Flow diagram: separate databases -> selection/sampling -> pre-processing/cleaning -> transformation/reduction -> data warehouse -> data mining -> evaluation -> visualization.)
26. Pattern Recognition for knowledge discovery
- Pattern discovery
-- Data mining is the process of identifying patterns and relationships in data that often are not obvious in large, complex data sets. As such, data mining involves pattern recognition and, by extension, pattern discovery.
-- It is concerned with the automatic classification of objects, character sequences, 3D structures, etc.
(Diagram: pattern -> feature extraction.)
27. Pattern Recognition for knowledge discovery
- Pattern discovery
-- The process of data mining is concerned with extracting patterns from the data, typically using:
-- classification (e.g., mapping to a class or a group)
-- regression (e.g., statistical analysis)
-- link analysis (e.g., correlation of data)
-- segmentation (e.g., similarity function)
-- deviation detection (e.g., difference from the norm)
28. Motion object tracking
- Optical flow (motion velocity of objects)
-- The intensity of a motion image is a function of position and time: I = f(x, y, t).
-- Over a small time period, the intensity change is small enough to be considered unchanged. After a time step dt, the point (x, y) moves to the point (x+dx, y+dy). If dt is small, then
f(x, y, t) = f(x+dx, y+dy, t+dt)
29. Motion object tracking
- Optical flow (motion velocity of objects)
-- Taylor series expansion: expanding f(x+dx, y+dy, t+dt) and using the equality above gives
fx u + fy v + ft = 0
where (u, v) = (dx/dt, dy/dt) is the velocity of the point (x, y) and (fx, fy) is the gray-level gradient.
30. Motion object tracking (contd)
- Optical flow (contd)
-- Note: the temporal gray-level change at a point is equal to the gradient (spatial change) times the motion velocity.
-- In order to solve this single equation for the two unknowns (u, v), an additional condition (constraint) is necessary.
-- Assume the velocity in neighboring pixels is similar. If there are two different velocities in an image, we treat them as two different rigid objects.
31. Motion object tracking (contd)
- Optical flow (contd)
-- Find (u, v) such that the error ε^2 is minimized, where
ε^2 = ∫∫ [ (fx u + fy v + ft)^2 + λ (ux^2 + uy^2 + vx^2 + vy^2) ] dx dy
and λ weights the smoothness constraint on the velocity field (ux, uy, vx, vy are the partial derivatives of u and v).
32. Motion object tracking (contd)
- Optical flow (contd)
-- Solution (u, v):
u = uavg - fx (fx uavg + fy vavg + ft) / (λ + fx^2 + fy^2)
v = vavg - fy (fx uavg + fy vavg + ft) / (λ + fx^2 + fy^2)
where uavg and vavg are the averages of the 4-neighbor velocities (this averaging reduces the noise influence).
-- A relaxation method is used to obtain (u, v): the update is iterated, with the velocities initialized to u = v = 0 (see the sketch below).
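A NumPy/SciPy sketch of this relaxation scheme (in the spirit of Horn and Schunck); the gradient estimates, the smoothness weight, and the toy moving-square example are illustrative assumptions rather than the slides' exact formulation.

```python
import numpy as np
from scipy.ndimage import convolve

def optical_flow(f1, f2, lam=1.0, n_iter=200):
    """Relaxation sketch of the iterative update above. f1, f2 are consecutive
    gray-level frames; lam weights the smoothness term. Centred differences on
    the averaged frame are one of several possible gradient choices."""
    avg_frame = (f1 + f2) / 2.0
    fy, fx = np.gradient(avg_frame)          # spatial gray-level gradient
    ft = f2 - f1                             # temporal gray-level change
    neigh = np.array([[0, .25, 0], [.25, 0, .25], [0, .25, 0]])   # 4-neighbor average
    u = np.zeros_like(f1)                    # initial velocities u = v = 0
    v = np.zeros_like(f1)
    for _ in range(n_iter):
        u_avg = convolve(u, neigh)
        v_avg = convolve(v, neigh)
        common = (fx * u_avg + fy * v_avg + ft) / (lam + fx**2 + fy**2)
        u = u_avg - fx * common
        v = v_avg - fy * common
    return u, v

# usage: a bright square shifted one pixel to the right between the two frames
f1 = np.zeros((16, 16)); f1[6:10, 6:10] = 1.0
f2 = np.zeros((16, 16)); f2[6:10, 7:11] = 1.0
u, v = optical_flow(f1, f2)
print(u[6:10, 6:11].mean())                  # positive u: motion to the right
```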
33. Similarity measurement of visual information
- Texture and shape
-- (application: multi-media information retrieval)
-- geometric shape (e.g., deformable mesh model)
-- texture correlation
-- hybrid measurement of texture and shape similarity
34. Similarity measurement of visual information
- Texture signatures
-- contrast: based on the image standard deviation and the image fourth moment
-- coarseness: k is obtained by maximizing the average intensity of the moving window 2k x 2k
-- directionality: based on the magnitude of the gradient vector
35. Similarity measurement of visual information
- Shape features
-- Coefficients of the 2D Fourier transformation
Note: similarity is evaluated in the frequency domain.
Advantage: a translation is reflected only in a change of the F(m,n) phase, while the F(m,n) modulus remains constant (see the check below).
Disadvantage: changes in scale and orientation of the object shape determine substantial changes in this representation.
-- Chain encoding
A simple representation is obtained by considering boundary points.
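A quick numerical check of the stated advantage: for a (circularly) shifted shape the magnitude of the 2D Fourier coefficients is unchanged while the phase changes (NumPy assumed; the test image is invented).

```python
import numpy as np

# A translated shape changes only the phase of its 2D Fourier coefficients F(m, n);
# the magnitude is unchanged (exactly so for the circular shift simulated here).
img = np.zeros((32, 32))
img[8:14, 10:20] = 1.0                          # a simple rectangular shape
shifted = np.roll(img, shift=(5, 7), axis=(0, 1))

F1, F2 = np.fft.fft2(img), np.fft.fft2(shifted)
print(np.allclose(np.abs(F1), np.abs(F2)))      # True: |F(m, n)| is translation invariant
print(np.allclose(np.angle(F1), np.angle(F2)))  # False: the phase carries the shift
```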
36. Similarity measurement of visual information
- Shape features (contd)
-- Moments
For a 2D function f(x, y), the moment of order (p+q) is defined as
mpq = ∫∫ x^p y^q f(x, y) dx dy
Note: the moment sequence {mpq} and f(x, y) uniquely determine each other.
37. Similarity measurement of visual information
- Shape features (contd)
-- Moments (contd)
Central moments:
μpq = ∫∫ (x - xc)^p (y - yc)^q f(x, y) dx dy, where xc = m10/m00 and yc = m01/m00
In a digital image the integrals become sums:
μpq = Σx Σy (x - xc)^p (y - yc)^q f(x, y)
38. Similarity measurement of visual information
- Shape features (contd)
-- Moments (contd)
Digital moments: if we consider a binary image with f(x, y) = 1 in a region R, the central (p,q)-th moment is
μpq = Σ(x,y)∈R (x - xc)^p (y - yc)^q
Note: m00 represents the area of R.
39. Similarity measurement of visual information
- Shape features (contd)
-- Digital moments (contd)
-- Coordinates normalized by the standard deviations: x' = (x - xc)/σx, y' = (y - yc)/σy
-- Normalizing by the area: moments are divided by the area m00
40. Similarity measurement of visual information
- Shape features (contd)
-- Digital moments (contd)
Normalized central moments are obtained from the central moments according to the following transformation:
ηpq = μpq / μ00^γ, with γ = (p+q)/2 + 1 for p+q = 2, 3, ...
Note: powerful descriptors based on digital moments are functions of moments that are invariant under scaling, translation, rotation, or squeezing.
41. Similarity measurement of visual information
- Shape features (contd)
-- Digital moments (contd)
A set of seven invariant moments can be derived from the 2nd and 3rd moments. Six of them (φ1 - φ6) are rotation invariant and one (φ7) is both skew and rotation invariant. (A sketch of the first two is given below.)
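A compact NumPy sketch of raw, central, and normalized central moments, together with the first two of the seven invariant moments (φ1 and φ2); the helper names and the toy binary regions are illustrative.

```python
import numpy as np

def raw_moment(img, p, q):
    """Raw moment m_pq of a gray-level (or binary) image f(x, y)."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    return np.sum((x ** p) * (y ** q) * img)

def central_moment(img, p, q):
    m00, m10, m01 = raw_moment(img, 0, 0), raw_moment(img, 1, 0), raw_moment(img, 0, 1)
    xc, yc = m10 / m00, m01 / m00                    # centroid
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    return np.sum(((x - xc) ** p) * ((y - yc) ** q) * img)

def eta(img, p, q):
    """Normalized central moment: mu_pq / mu_00 ** (1 + (p + q) / 2)."""
    return central_moment(img, p, q) / central_moment(img, 0, 0) ** (1 + (p + q) / 2)

def phi1_phi2(img):
    """First two of the seven invariant moments (rotation invariant)."""
    e20, e02, e11 = eta(img, 2, 0), eta(img, 0, 2), eta(img, 1, 1)
    return e20 + e02, (e20 - e02) ** 2 + 4 * e11 ** 2

# usage: the invariants do not change when the binary region is translated
img = np.zeros((40, 40)); img[5:15, 5:25] = 1.0
moved = np.zeros((40, 40)); moved[20:30, 12:32] = 1.0
print(phi1_phi2(img))
print(phi1_phi2(moved))    # same values: translation invariance
```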
43. Support Vector Machine
- For a given set of points belonging to two classes, an SVM tries to find a decision function for an optimal separating hyperplane (OSH) which maximizes the margin between the two sets of data points. The solutions to such optimization problems are derived by training the SVM with sets of data similar to what it may encounter during its application.
(Figure: classification using a smallest-margin hyperplane (left) and the optimal hyperplane (right).)
44. Support Vector Machine
Consider a binary classification task with a training set of points xi ∈ R^n, i = 1, 2, ..., N, where each point belongs to a corresponding class label yi ∈ {-1, +1}, and let the decision function be f(x) = sign(w·x + b), where w denotes the weights and b the bias of the decision function. A point lying directly on the hyperplane therefore satisfies the condition w·x + b = 0, and the points lying on the right and left of the hyperplane must satisfy the following conditions:
xi·w + b > 0 for the points on one side of the hyperplane   (9)
xi·w + b < 0 for the points on the other side               (10)
Eqs. (9) and (10) can be formulated jointly as yi (xi·w + b) > 0 for all i, which holds true for all input points if the classification is correct.
45. Support Vector Machine
Only one optimal hyperplane exists which maximizes the distance between the support vectors. For a maximum margin, the hyperplane must be equidistant from the support vectors on either side. Let us denote by H- the hyperplane which satisfies xi·w + b = -1 and by H+ the hyperplane which satisfies xi·w + b = +1. The maximum margin between these two hyperplanes is then 2/||w||. Thus, the hyperplane that optimally separates the data is the one that minimizes
||w||^2 / 2 = 0.5 w^T w = 0.5 (w1^2 + w2^2 + ... + wn^2),
subject to the constraints above. This is a classic quadratic optimization problem with inequality constraints, and it is solved at the saddle point of the Lagrange functional (Lagrangian)
Lp = 0.5 w^T w - Σi ai [ yi (xi·w + b) - 1 ],
where the ai are Lagrange multipliers. The optimal saddle point (wo, bo, ao) must be found by minimizing Lp with respect to w and b and maximizing it with respect to the non-negative ai. A small numerical sketch is given below.
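A small numerical sketch, assuming scikit-learn is available: a linear SVC with a large C approximates the hard-margin problem above, and its weight vector, bias, support vectors, and margin 2/||w|| can be read off directly; the toy data are invented.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable classes with labels y in {-1, +1}
X = np.array([[0.0, 0.0], [0.5, 1.0], [1.0, 0.5],      # class -1
              [3.0, 3.0], [3.5, 2.5], [2.5, 3.5]])      # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# A large C approximates the hard-margin problem: minimize 0.5 w.w
# subject to y_i (x_i.w + b) >= 1.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]                       # weight vector of the separating hyperplane
b = clf.intercept_[0]                  # bias
print("w =", w, " b =", b)
print("margin 2/||w|| =", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
print("decision f(x) = sign(w.x + b):", clf.predict([[1.0, 1.0], [3.0, 2.0]]))
```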