Title: Pattern/Object Recognition
1. Pattern/Object Recognition
- Statistical-based
- Feature-based
- Neural-network based
- Structure-based
2. Some concepts
- Class: a set of objects having some important common properties
- Classification: a process that assigns a label to an object according to some representation of the object's properties
- Classifier: a device or algorithm that takes an object representation as input and outputs a class label
- Pattern: an arrangement of descriptors (features), such as length, area, texture, etc.
- Feature extractor: extracts the information relevant to classification from the data input by sensors
3. Some concepts
- Example of features used for representation of characters A, B, C, ..., Z:
-- area
-- height/width
-- number of holes
-- center
-- axis direction
-- second moments
- Feature extraction
-- We select or combine features (e.g., fusion of features) in order to achieve the highest recognition rate. A minimal sketch of these character features is given below.
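The following minimal Python sketch (assuming NumPy and SciPy are available) shows one way the listed features could be computed from a small binary character image; the function name, the toy glyph, and the hole-counting approach are illustrative choices, not the method prescribed by the slides.

```python
import numpy as np
from scipy import ndimage

def character_features(img):
    """Compute the slide's example features for a binary character image
    (nonzero pixels = ink). Names and choices here are illustrative."""
    ys, xs = np.nonzero(img)
    area = len(xs)                                   # number of ink pixels
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    cy, cx = ys.mean(), xs.mean()                    # center (centroid)

    # number of holes: background components not touching the image border
    bg_labels, n_bg = ndimage.label(img == 0)
    border = np.unique(np.concatenate([bg_labels[0, :], bg_labels[-1, :],
                                       bg_labels[:, 0], bg_labels[:, -1]]))
    holes = n_bg - len(border[border > 0])

    # second (central) moments; the principal axis direction follows from them
    mu20 = ((xs - cx) ** 2).mean()
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()
    axis_angle = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)

    return {"area": area, "height/width": height / width, "holes": holes,
            "center": (cx, cy), "axis_direction": axis_angle,
            "second_moments": (mu20, mu11, mu02)}

# usage: a tiny 'O'-like glyph with one hole
glyph = np.array([[0, 1, 1, 1, 0],
                  [0, 1, 0, 1, 0],
                  [0, 1, 0, 1, 0],
                  [0, 1, 1, 1, 0]])
print(character_features(glyph))
```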
4. Some concepts
- Example of features used for representation of characters (contd)
- Clustering (classification), e.g.:
-- K-NN (K nearest neighbors)
-- K-means algorithm
5. Some concepts
- Example of features used for representation of characters (contd)
- Evaluation of the recognition
-- Cumulative Match Score
-- A plot of rank-n scores versus the probability of correct identification is called a Cumulative Match Score. (Figure: an example Cumulative Match Score curve.)
6. Some concepts
- Example of features used for representation of characters (contd)
- Evaluation of the recognition
-- False accept rate: the system thinks that the individual is who he says he is, even though he is not. This is called a false accept. The percentage of times that a false accept occurs across all individuals is called the false accept rate.
-- False reject rate: the system thinks that the individual is not who they say they are, even though they really are. This is called a false reject. The percentage of times that a false reject occurs across all individuals is called the false reject rate. Subtracting this rate from 100 (100 - false reject rate) gives the Probability of Verification.
-- Receiver operating characteristic: the false accept rate and the probability of verification are not mutually exclusive; instead, there is a give-and-take relationship between the two. The system parameters can be changed to obtain a lower false accept rate, but this also lowers the probability of verification. A plot that shows this relationship is called a receiver operating characteristic, or ROC. (Figure: an example verification ROC curve; a computational sketch of these rates follows below.)
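To make the rates concrete, here is a small NumPy sketch that sweeps a decision threshold over invented genuine and impostor match scores and reports the false accept rate, false reject rate, and probability of verification; the score values and threshold grid are made up for illustration.

```python
import numpy as np

# Illustrative similarity scores (higher = stronger claim "same person").
genuine  = np.array([0.91, 0.85, 0.78, 0.66, 0.95, 0.72])   # same-person comparisons
impostor = np.array([0.31, 0.55, 0.42, 0.68, 0.22, 0.49])   # different-person comparisons

thresholds = np.linspace(0.0, 1.0, 101)
far = np.array([(impostor >= t).mean() for t in thresholds])  # false accept rate
frr = np.array([(genuine  <  t).mean() for t in thresholds])  # false reject rate
pv  = 1.0 - frr                                               # probability of verification

# A few points of the verification ROC: FAR versus probability of verification.
for t in (0.4, 0.6, 0.8):
    i = np.argmin(np.abs(thresholds - t))
    print(f"threshold={t:.1f}  FAR={far[i]:.2f}  FRR={frr[i]:.2f}  PV={pv[i]:.2f}")
```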
7. Some concepts
- Example of features used for representation of characters (contd)
- Evaluation of the recognition (contd)
-- Equal error rate: some resources use a term called the equal error rate to describe the performance of a biometric system operating in the verification task. The equal error rate is the rate at which the false accept rate is exactly equal to the false reject rate. For example, if a straight line is drawn on our example ROC curve from the upper left corner (coordinates 0,1) to the lower right corner (coordinates 1,0), the equal error rate is the point at which the curve crosses this line. This one point on the curve is not adequate to fully characterize the performance of biometric systems used for verification. (A sketch of locating this point follows below.)
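A short sketch of how the equal error rate can be located numerically from FAR and FRR curves computed over a threshold sweep; the two curves below are synthetic stand-ins.

```python
import numpy as np

# Hypothetical FAR/FRR curves over increasing thresholds (e.g. from a sweep like the one above).
thresholds = np.linspace(0.0, 1.0, 101)
far = np.clip(1.0 - thresholds * 1.2, 0.0, 1.0)   # falls as the threshold rises
frr = np.clip(thresholds * 1.1 - 0.1, 0.0, 1.0)   # rises as the threshold rises

i = np.argmin(np.abs(far - frr))                   # where the two rates cross
print(f"EER ~ {(far[i] + frr[i]) / 2:.3f} at threshold {thresholds[i]:.2f}")
```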
8. Algorithms of classification
- NN (nearest neighbor classifier)
-- If distance d(x, W1) < d(x, W2), then x ∈ W1.
(Figure: a sample x between the two classes W1 and W2.)
9. Algorithms of classification (contd)
- K-NN (K-nearest neighbor classifier)
-- If the majority of the K nearest neighbors of P belong to W1 rather than W2, then P ∈ W1. (A minimal sketch follows below.)
(Figure: a sample P surrounded by neighbors from classes W1 and W2.)
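A minimal NumPy sketch of this majority-vote rule; the function name, the Euclidean distance, and the toy two-class data are illustrative assumptions.

```python
import numpy as np

def knn_classify(P, samples, labels, k=3):
    """Assign P to the class that owns the majority of its k nearest neighbors."""
    d = np.linalg.norm(samples - P, axis=1)          # distances to every stored sample
    nearest = labels[np.argsort(d)[:k]]              # labels of the k closest samples
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]                 # majority vote

# usage: two classes W1 and W2 in the plane
samples = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],    # W1
                    [1.0, 1.0], [1.1, 0.9], [0.9, 1.2]])   # W2
labels = np.array(["W1", "W1", "W1", "W2", "W2", "W2"])
print(knn_classify(np.array([0.3, 0.2]), samples, labels, k=3))   # -> W1
```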
10. Algorithms of classification (contd)
- K-means clustering (dynamic classifier)
-- The classes themselves are unknown, but the number of classes is known.
-- Algorithm (see the sketch below):
(1) Arbitrarily choose two samples as the initial cluster centers.
(2) Distribute the pattern samples into the cluster domains according to the distance from each sample to the cluster centers.
(3) Update the cluster centers.
(4) Repeat (2) and (3) until the updated cluster centers are unchanged.
(5) End.
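A short NumPy sketch of steps (1)-(5); the random initialization and the toy data are illustrative choices.

```python
import numpy as np

def k_means(X, k=2, n_iter=100, seed=0):
    """Steps (1)-(4) above: pick initial centers, assign each sample to the
    nearest center, update the centers, and repeat until they stop changing."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]        # (1) initial centers
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)                                  # (2) distribute samples
        new_centers = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):                      # (4) centers unchanged
            break
        centers = new_centers                                      # (3) update the centers
    return centers, assign

# usage: two unlabeled clusters; the number of classes (k = 2) is known
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.2, (20, 2)), rng.normal(3.0, 0.2, (20, 2))])
centers, assign = k_means(X, k=2)
print(centers)
```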
11. Algorithms of classification (contd)
- Decision function (discrimination function)
-- For N pattern classes w1, w2, ..., wN, the pattern recognition problem is to find N decision functions d1(x), d2(x), ..., dN(x) with the property that if a pattern x belongs to class wi, then di(x) > dj(x) for all j = 1, 2, ..., N (j ≠ i).
-- E.g., a statistical classifier: the Bayes classifier.
12. Algorithms of classification (contd)
(Block diagram: an input feature vector X = (x1, x2, ..., xd) is fed to distance or probability computations F1(x, K), F2(x, K), ..., Fm(x, K) by a discriminant function F with knowledge K; a compare-and-decide stage then outputs the classification C(x).)
13. Algorithms of classification (contd)
- E.g., statistical classifier: Bayes classifier (contd)
-- The principle is to minimize the total average loss incurred by making a wrong decision:
dj(x) = p(x|wj) p(wj)
where p(wj) is the probability of occurrence of class wj and p(x|wj) is the probability density function of the patterns from class wj.
-- E.g., given two classes w1 and w2 (see the sketch below):
if p(x|w1) p(w1) > p(x|w2) p(w2), then x ∈ w1
if p(x|w1) p(w1) < p(x|w2) p(w2), then x ∈ w2
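A minimal sketch of this decision rule for two one-dimensional classes, assuming Gaussian class-conditional densities; the densities, priors, and test points are invented for illustration (requires NumPy and SciPy).

```python
import numpy as np
from scipy.stats import norm

# Decision rule d_j(x) = p(x|w_j) p(w_j) for two classes with made-up parameters.
priors = {"w1": 0.6, "w2": 0.4}                       # p(w_j)
densities = {"w1": norm(loc=0.0, scale=1.0),          # p(x|w1)
             "w2": norm(loc=3.0, scale=1.0)}          # p(x|w2)

def bayes_classify(x):
    scores = {c: densities[c].pdf(x) * priors[c] for c in priors}   # d_j(x)
    return max(scores, key=scores.get)                              # largest d_j wins

for x in (0.5, 1.6, 2.5):
    print(x, "->", bayes_classify(x))
```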
14. Algorithms of classification (contd)
- Neural Networks
-- Training patterns are used for estimating the parameters of the decision functions.
-- The training process uses the training set to obtain the decision functions.
-- Discrimination function: a function of the pattern vector x and a weight vector w (e.g., d(x) = w·x).
15. Algorithms of classification (contd)
- Neural Networks (NN) (contd)
- NN training algorithm
-- for linearly separable classes
-- e.g., two classes w1 and w2
16. Algorithms of classification (contd)
- NN training algorithm (contd)
-- e.g., two classes w1 and w2
-- First, arbitrarily choose an initial weight vector w(1).
-- Second, at the kth iterative step:
(1) If x(k) ∈ w1 and w(k)·x(k) <= 0, then replace w(k) by w(k+1) = w(k) + c x(k).
(2) If x(k) ∈ w2 and w(k)·x(k) >= 0, then replace w(k) by w(k+1) = w(k) - c x(k).
(3) Otherwise, w(k+1) = w(k).
Note: c is a positive correction increment. A sketch of this procedure is given below.
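A small Python sketch of this iterative rule; augmenting each pattern with a trailing 1 to absorb the bias, and the toy data, are illustrative choices.

```python
import numpy as np

def train_perceptron(X1, X2, c=1.0, max_epochs=100):
    """Iterative correction rule above for two linearly separable classes.
    Patterns are augmented with a trailing 1 so the bias is absorbed into w."""
    aug = lambda X: np.hstack([X, np.ones((len(X), 1))])
    samples = np.vstack([aug(X1), aug(X2)])
    classes = np.array([1] * len(X1) + [2] * len(X2))
    w = np.zeros(samples.shape[1])                       # arbitrary initial weight vector
    for _ in range(max_epochs):
        changed = False
        for x, cls in zip(samples, classes):
            if cls == 1 and w @ x <= 0:
                w = w + c * x                            # rule (1)
                changed = True
            elif cls == 2 and w @ x >= 0:
                w = w - c * x                            # rule (2)
                changed = True
            # rule (3): otherwise w is left unchanged
        if not changed:                                  # every sample classified correctly
            break
    return w

W1 = np.array([[0.0, 0.0], [0.5, 1.0]])
W2 = np.array([[3.0, 3.0], [3.5, 2.5]])
print(train_perceptron(W1, W2))
```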
17. Algorithms of classification (contd)
- Hidden Markov Models
-- A statistical approach to constructing classifiers.
-- A statistical model for an ordered sequence of symbols.
-- Acts as a stochastic state machine that generates a symbol each time a transition is made from one state to the next.
-- Transitions between states are specified by transition probabilities.
-- A Markov process is a process that moves from state to state depending on the previous n states.
18. Algorithms of classification (contd)
- Hidden Markov Models (contd)
-- Concept of a Markov chain: a process that can be in one of a number of states at any given time.
-- Each state generates an observation, from which the state sequence can be inferred.
-- A Markov chain is defined by the probabilities of each transition between states occurring, given the current state.
-- It assumes that the probability of moving from one state to another does not vary with time.
-- An HMM is a variation of a Markov chain in which the states in the chain are hidden.
19. Algorithms of classification (contd)
- Hidden Markov Models (contd)
-- Like a neural network classifier, an HMM must be trained before it can be used.
-- Training establishes the transition probabilities for each state in the Markov chain.
-- When presented with data from the database, the HMM provides a measure of how closely the data patterns resemble the data used to train the model.
20. Algorithms of classification (contd)
- Hidden Markov Models (contd)
(Figure: example of a Markov chain; A, B, and C represent states, and the arrows connecting the states represent transitions. A sampling sketch of such a chain is given below.)
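A tiny sketch of such a chain with states A, B, and C; the transition probabilities are invented for illustration, and sampling simply follows the rule that the next state depends only on the current one.

```python
import numpy as np

# Hypothetical transition probabilities for the three states A, B, C of the example;
# row i gives the probability of moving from state i to each state.
states = ["A", "B", "C"]
P = np.array([[0.1, 0.6, 0.3],    # from A
              [0.4, 0.2, 0.4],    # from B
              [0.5, 0.3, 0.2]])   # from C

def sample_chain(n_steps, start="A", seed=0):
    """Walk the chain: the next state depends only on the current one."""
    rng = np.random.default_rng(seed)
    state = states.index(start)
    path = [start]
    for _ in range(n_steps):
        state = rng.choice(3, p=P[state])
        path.append(states[state])
    return path

print("".join(sample_chain(15)))   # a sequence of visited states
```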
21. Pattern Matching
- Computational methods for pattern matching and sequence alignment
-- e.g., Bayesian methods, neural networks, HMMs, genetic algorithms, dynamic programming, dot matrix (syntactic method), etc.
- Dot matrix analysis
-- a visually intuitive method of pattern detection
-- used for bioinformatics data analysis
22. Pattern Matching
- Dot matrix analysis (contd)
(Figure: dot matrix plot comparing the sequences A T T C G G C A T T C and A T T C G A C A T T.)
23. Pattern Matching
- Dot matrix analysis (contd)
-- Note: a filter with a combination of window and stringency is used to screen out noise in the pattern. The window refers to the number of data points examined at a time; the stringency is the minimum number of matches required within each window. For example, with a filter in which the window size is set to 2 and the stringency to 1, a dot is printed at a matrix position only if 1 out of the 2 positions matches.
-- Dot matrix analysis is useful in identifying repeats -- repeating characters or short sequences.
(Figure: dot matrix plot of the sequences A T T C G G C A T T C and A T T C G A C A T T. A sketch of the filtered dot matrix is given below.)
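A short Python sketch of dot matrix analysis with the window/stringency filter described above, applied to the two example sequences; the text rendering of the plot is an illustrative choice.

```python
import numpy as np

def dot_matrix(seq1, seq2, window=2, stringency=1):
    """A dot is kept at (i, j) only if, within the window starting there,
    at least `stringency` positions of the two sequences match."""
    m = np.zeros((len(seq1), len(seq2)), dtype=int)
    for i in range(len(seq1)):
        for j in range(len(seq2)):
            span = min(window, len(seq1) - i, len(seq2) - j)
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(span))
            m[i, j] = int(matches >= stringency)
    return m

s1, s2 = "ATTCGGCATTC", "ATTCGACATT"   # the two example sequences from the slide
m = dot_matrix(s1, s2, window=2, stringency=1)
for i, row in enumerate(m):            # crude text rendering of the plot
    print(s1[i], " ".join("." if v == 0 else "*" for v in row))
print("  " + " ".join(s2))
```

The diagonal runs of dots reveal the shared subsequences, including the repeated ATT at both ends of the sequences.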
24. Pattern Recognition for knowledge discovery
- Knowledge discovery process
-- Selection and sampling of the appropriate data from the databases
-- Preprocessing and cleaning of the data to remove redundancies, errors, and conflicts
-- Transforming and reducing the data to a format more suitable for the data mining
-- Data mining
-- Evaluation of the mined data
-- Visualization of the evaluation results
-- Designing new data queries to test new hypotheses and returning to step 1
25. Pattern Recognition for knowledge discovery
- Knowledge discovery process
(Flow diagram: separate databases -> selection/sampling -> pre-processing/cleaning -> transformation/reduction -> data warehouse -> data mining -> evaluation -> visualization.)
26. Pattern Recognition for knowledge discovery
- Pattern discovery
-- Data mining is the process of identifying patterns and relationships in data that often are not obvious in large, complex data sets. As such, data mining involves pattern recognition and, by extension, pattern discovery.
-- It is concerned with the automatic classification of objects, character sequences, 3D structures, etc.
(Diagram: pattern -> feature extraction.)
27. Pattern Recognition for knowledge discovery
- Pattern discovery
-- The process of data mining is concerned with extracting patterns from the data, typically using:
-- classification (e.g., mapping to a class or a group)
-- regression (e.g., statistical analysis)
-- link analysis (e.g., correlation of data)
-- segmentation (e.g., similarity function)
-- deviation detection (e.g., difference from the norm)
28. Motion object tracking
- Optical flow (motion velocity of objects)
-- The intensity of a motion image is a function of position and time: I = f(x, y, t).
-- Over a small time period, the intensity change is small enough to be considered unchanged. After a time step dt, the point (x, y) moves to the point (x+dx, y+dy). If dt is small, then
f(x, y, t) = f(x+dx, y+dy, t+dt)
29. Motion object tracking
- Optical flow (motion velocity of objects)
-- Taylor series expansion: expanding f(x+dx, y+dy, t+dt) and using the equality above gives
fx u + fy v + ft = 0
where (u, v) = (dx/dt, dy/dt) is the velocity of the point (x, y) and (fx, fy) is the gray-level gradient.
30. Motion object tracking (contd)
- Optical flow (contd)
-- Note: the temporal gray-level change at a point is equal to the gradient (spatial change) times the motion velocity.
-- In order to solve this single equation for the two unknowns (u, v), an additional condition (constraint) is necessary.
-- Assume the velocity in neighboring pixels is similar. If there are two different velocities in an image, we treat them as two different rigid objects.
31. Motion object tracking (contd)
- Optical flow (contd)
-- Find (u, v) such that the error ε^2 is minimized, where
ε^2 = ∫∫ [ (fx u + fy v + ft)^2 + λ (ux^2 + uy^2 + vx^2 + vy^2) ] dx dy
and λ weights the smoothness constraint on the velocity field (ux, uy, vx, vy are the partial derivatives of u and v).
32. Motion object tracking (contd)
- Optical flow (contd)
-- Solution (u, v):
u = uavg - fx (fx uavg + fy vavg + ft) / (λ + fx^2 + fy^2)
v = vavg - fy (fx uavg + fy vavg + ft) / (λ + fx^2 + fy^2)
where uavg and vavg are the averages of the 4-neighbor velocities (this averaging reduces the noise influence).
-- A relaxation method is used to obtain (u, v): the update is iterated, with the velocities initialized to u = v = 0 (see the sketch below).
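A NumPy/SciPy sketch of this relaxation scheme (in the spirit of Horn and Schunck); the gradient estimates, the smoothness weight, and the toy moving-square example are illustrative assumptions rather than the slides' exact formulation.

```python
import numpy as np
from scipy.ndimage import convolve

def optical_flow(f1, f2, lam=1.0, n_iter=200):
    """Relaxation sketch of the iterative update above. f1, f2 are consecutive
    gray-level frames; lam weights the smoothness term. Centred differences on
    the averaged frame are one of several possible gradient choices."""
    avg_frame = (f1 + f2) / 2.0
    fy, fx = np.gradient(avg_frame)          # spatial gray-level gradient
    ft = f2 - f1                             # temporal gray-level change
    neigh = np.array([[0, .25, 0], [.25, 0, .25], [0, .25, 0]])   # 4-neighbor average
    u = np.zeros_like(f1)                    # initial velocities u = v = 0
    v = np.zeros_like(f1)
    for _ in range(n_iter):
        u_avg = convolve(u, neigh)
        v_avg = convolve(v, neigh)
        common = (fx * u_avg + fy * v_avg + ft) / (lam + fx**2 + fy**2)
        u = u_avg - fx * common
        v = v_avg - fy * common
    return u, v

# usage: a bright square shifted one pixel to the right between the two frames
f1 = np.zeros((16, 16)); f1[6:10, 6:10] = 1.0
f2 = np.zeros((16, 16)); f2[6:10, 7:11] = 1.0
u, v = optical_flow(f1, f2)
print(u[6:10, 6:11].mean())                  # positive u: motion to the right
```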
33. Similarity measurement of visual information
- Texture and shape
-- (application: multi-media information retrieval)
-- geometric shape (e.g., deformable mesh model)
-- texture correlation
-- hybrid measurement of texture and shape similarity
34. Similarity measurement of visual information
- Texture signatures
-- contrast: based on the image standard deviation and the image fourth moment
-- coarseness: k is obtained by maximizing the average intensity of the moving window 2k x 2k
-- directionality: based on the magnitude of the gradient vector
35. Similarity measurement of visual information
- Shape features
-- Coefficients of the 2D Fourier transformation
Note: similarity is evaluated in the frequency domain.
Advantage: a translation is reflected only in a change of the F(m,n) phase, while the F(m,n) modulus remains constant (see the check below).
Disadvantage: changes in scale and orientation of the object shape determine substantial changes in this representation.
-- Chain encoding
A simple representation is obtained by considering boundary points.
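A quick numerical check of the stated advantage: for a (circularly) shifted shape the magnitude of the 2D Fourier coefficients is unchanged while the phase changes (NumPy assumed; the test image is invented).

```python
import numpy as np

# A translated shape changes only the phase of its 2D Fourier coefficients F(m, n);
# the magnitude is unchanged (exactly so for the circular shift simulated here).
img = np.zeros((32, 32))
img[8:14, 10:20] = 1.0                          # a simple rectangular shape
shifted = np.roll(img, shift=(5, 7), axis=(0, 1))

F1, F2 = np.fft.fft2(img), np.fft.fft2(shifted)
print(np.allclose(np.abs(F1), np.abs(F2)))      # True: |F(m, n)| is translation invariant
print(np.allclose(np.angle(F1), np.angle(F2)))  # False: the phase carries the shift
```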
36. Similarity measurement of visual information
- Shape features (contd)
-- Moments
For a 2D function f(x, y), the moment of order (p+q) is defined as
mpq = ∫∫ x^p y^q f(x, y) dx dy
Note: the moment sequence {mpq} and f(x, y) uniquely determine each other.
37. Similarity measurement of visual information
- Shape features (contd)
-- Moments (contd)
Central moments:
μpq = ∫∫ (x - xc)^p (y - yc)^q f(x, y) dx dy, where xc = m10/m00 and yc = m01/m00
In a digital image the integrals become sums:
μpq = Σx Σy (x - xc)^p (y - yc)^q f(x, y)
38. Similarity measurement of visual information
- Shape features (contd)
-- Moments (contd)
Digital moments: if we consider a binary image with f(x, y) = 1 in a region R, the central (p,q)-th moment is
μpq = Σ(x,y)∈R (x - xc)^p (y - yc)^q
Note: m00 represents the area of R.
39. Similarity measurement of visual information
- Shape features (contd)
-- Digital moments (contd)
-- Coordinates normalized by the standard deviations: x' = (x - xc)/σx, y' = (y - yc)/σy
-- Normalizing by the area: moments are divided by the area m00
40. Similarity measurement of visual information
- Shape features (contd)
-- Digital moments (contd)
Normalized central moments are obtained from the central moments according to the following transformation:
ηpq = μpq / μ00^γ, with γ = (p+q)/2 + 1 for p+q = 2, 3, ...
Note: powerful descriptors based on digital moments are functions of moments that are invariant under scaling, translation, rotation, or squeezing.
41. Similarity measurement of visual information
- Shape features (contd)
-- Digital moments (contd)
A set of seven invariant moments can be derived from the 2nd and 3rd moments. Six of them (φ1 - φ6) are rotation invariant and one (φ7) is both skew and rotation invariant. (A sketch of the first two is given below.)
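A compact NumPy sketch of raw, central, and normalized central moments, together with the first two of the seven invariant moments (φ1 and φ2); the helper names and the toy binary regions are illustrative.

```python
import numpy as np

def raw_moment(img, p, q):
    """Raw moment m_pq of a gray-level (or binary) image f(x, y)."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    return np.sum((x ** p) * (y ** q) * img)

def central_moment(img, p, q):
    m00, m10, m01 = raw_moment(img, 0, 0), raw_moment(img, 1, 0), raw_moment(img, 0, 1)
    xc, yc = m10 / m00, m01 / m00                    # centroid
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    return np.sum(((x - xc) ** p) * ((y - yc) ** q) * img)

def eta(img, p, q):
    """Normalized central moment: mu_pq / mu_00 ** (1 + (p + q) / 2)."""
    return central_moment(img, p, q) / central_moment(img, 0, 0) ** (1 + (p + q) / 2)

def phi1_phi2(img):
    """First two of the seven invariant moments (rotation invariant)."""
    e20, e02, e11 = eta(img, 2, 0), eta(img, 0, 2), eta(img, 1, 1)
    return e20 + e02, (e20 - e02) ** 2 + 4 * e11 ** 2

# usage: the invariants do not change when the binary region is translated
img = np.zeros((40, 40)); img[5:15, 5:25] = 1.0
moved = np.zeros((40, 40)); moved[20:30, 12:32] = 1.0
print(phi1_phi2(img))
print(phi1_phi2(moved))    # same values: translation invariance
```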
43. Support Vector Machine
- For a given set of points belonging to two classes, an SVM tries to find a decision function for an optimal separating hyperplane (OSH) which maximizes the margin between the two sets of data points. The solutions to such optimization problems are derived by training the SVM with sets of data similar to what it may encounter during its application.
(Figure: classification using a smallest-margin hyperplane (left) and the optimal hyperplane (right).)
44. Support Vector Machine
Consider a binary classification task with a training set of points xi ∈ R^n, i = 1, 2, ..., N, where each point belongs to a corresponding class label yi ∈ {-1, +1}, and let the decision function be f(x) = sign(w·x + b), where w denotes the weights and b the bias of the decision function. A point lying directly on the hyperplane therefore satisfies the condition w·x + b = 0, and the points lying on the right and left of the hyperplane must satisfy the following conditions:
xi·w + b > 0 for the points on one side of the hyperplane   (9)
xi·w + b < 0 for the points on the other side               (10)
Eqs. (9) and (10) can be formulated jointly as yi (xi·w + b) > 0 for all i, which holds true for all input points if the classification is correct.
45. Support Vector Machine
Only one optimal hyperplane exists which maximizes the distance between the support vectors. For a maximum margin, the hyperplane must be equidistant from the support vectors on either side. Let us denote by H- the hyperplane which satisfies xi·w + b = -1 and by H+ the hyperplane which satisfies xi·w + b = +1. The maximum margin between these two hyperplanes is then 2/||w||. Thus, the hyperplane that optimally separates the data is the one that minimizes
||w||^2 / 2 = 0.5 w^T w = 0.5 (w1^2 + w2^2 + ... + wn^2),
subject to the constraints above. This is a classic quadratic optimization problem with inequality constraints, and it is solved at the saddle point of the Lagrange functional (Lagrangian)
Lp = 0.5 w^T w - Σi ai [ yi (xi·w + b) - 1 ],
where the ai are Lagrange multipliers. The optimal saddle point (wo, bo, ao) must be found by minimizing Lp with respect to w and b and maximizing it with respect to the non-negative ai. A small numerical sketch is given below.
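A small numerical sketch, assuming scikit-learn is available: a linear SVC with a large C approximates the hard-margin problem above, and its weight vector, bias, support vectors, and margin 2/||w|| can be read off directly; the toy data are invented.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable classes with labels y in {-1, +1}
X = np.array([[0.0, 0.0], [0.5, 1.0], [1.0, 0.5],      # class -1
              [3.0, 3.0], [3.5, 2.5], [2.5, 3.5]])      # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# A large C approximates the hard-margin problem: minimize 0.5 w.w
# subject to y_i (x_i.w + b) >= 1.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]                       # weight vector of the separating hyperplane
b = clf.intercept_[0]                  # bias
print("w =", w, " b =", b)
print("margin 2/||w|| =", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
print("decision f(x) = sign(w.x + b):", clf.predict([[1.0, 1.0], [3.0, 2.0]]))
```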