Image Processing, Computer Vision, and Pattern Recognition
Statistical Pattern Recognition - Part II: Classical Model
Dr. Claude C. Chibelushi
Faculty of Computing, Engineering and Technology, Staffordshire University
Outline
- Introduction
- Classical Pattern Recognition Model
  - Feature Extraction
  - Classification
- Applications
  - Optical Character Recognition
  - Others
- Summary
Introduction
- Pattern recognition often consists of a sequence of processes
  - these are often configured according to the classical pattern recognition model
Classical Recognition Model
[Figure: simplified block diagram of the classical recognition model, with a facial image as an example input]
Classical Recognition Model
- Performance issues
  - all stages of the recognition pipeline, and their connections, affect performance
  - typical performance measures: recognition accuracy, speed, storage requirements
  - optimisation of components/connections is often required
  - careful selection / design / implementation of
    - data capture equipment / environment
    - processing techniques
Feature Extraction
- Aim: to capture the discriminant characteristics of a pattern
- Extracts pattern descriptors from raw data
  - descriptors should contain the information most relevant to the recognition task
  - descriptors may be numerical (quantitative) or linguistic
  - a group of numerical descriptors is often known as a feature vector
Feature Extraction
- Common features for computer vision
  - Shape descriptors
    - external (e.g. boundary), internal (e.g. holes)
  - Surface descriptors
    - texture, brightness, colour, ...
  - Spatial configuration descriptors
    - arrangement of basic elements
  - Temporal configuration descriptors
    - deformation or motion of basic elements
Feature Extraction
- Example
  - Pattern recognition application: gender detection
  - Classes: male, female
Feature Extraction
- Example (ctd.)
  - Chosen features: height, silhouette area
Feature Extraction
- Example (ctd.): pseudocode

  frontEnd(image)
      // foreground-background image segmentation
      // (e.g. thresholding, possibly after noise removal)
      prpImage = preprocess(image)

      // calculate height and width of image segments
      features = FeatExtr(prpImage)
      return features
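The front end above can be sketched in Python. This is a minimal illustration, not the lecture's implementation: the threshold value and the assumption that the foreground is darker than the background are choices made here for the sketch, and silhouette height and area stand in for the chosen features.

```python
import numpy as np

def preprocess(image, threshold=128):
    """Foreground-background segmentation by simple thresholding.
    Assumption for this sketch: foreground pixels are darker than background."""
    return image < threshold  # boolean foreground mask

def feat_extr(mask):
    """Extract the example features: silhouette height and silhouette area."""
    rows = np.any(mask, axis=1)          # which rows contain foreground
    if not rows.any():
        return (0, 0)
    top = int(np.argmax(rows))                        # first foreground row
    bottom = len(rows) - 1 - int(np.argmax(rows[::-1]))  # last foreground row
    height = bottom - top + 1
    area = int(mask.sum())               # number of foreground pixels
    return (height, area)

def front_end(image):
    mask = preprocess(image)
    return feat_extr(mask)
```

For example, a 10x8 image with a dark 5x2 rectangle yields a height of 5 and an area of 10.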
Feature Extraction
- Graphical representation of feature distribution
  - Example (ctd.): data set of 5 male and 5 female subjects
[Figure: feature plot in a 2D feature space]
Classification
- Aim: to identify the class (category) to which an unknown pattern belongs
- Wide variety of classifiers
  - classifier selection is problem-dependent
  - use a simple classifier if it is effective
Classification
- Some classifiers
  - Minimum-distance classifier
    - classification based on distance from class-prototype (e.g. average) patterns
    - the closest prototype determines the class
  - k-nearest neighbour classifier
    - classification based on distance from class patterns (or clusters)
    - the closest k patterns (or clusters) determine the class
Classification
- Some classifiers
  - Bayesian classifier
    - classification based on the probability of belonging to each class
    - the most likely class is chosen
  - Artificial neural network classifier
    - classification based on neuron activations (shown to relate to class probability)
    - the most likely class is chosen
Classification
[Figure: minimum-distance classifier in a 2D feature space]
Classification
- Minimum-distance classifier: pseudocode

  minDistClassifier(unknownFeatVect, prototypes)
      minDist = MAX
      class = UNKNOWN
      // assign the unknown sample to the class of the nearest prototype
      for each protoVect in prototypes
          dist = distance(unknownFeatVect, protoVect)
          if (dist < minDist)
              minDist = dist
              class = class of protoVect
      return class
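A compact Python version of the minimum-distance classifier above; the prototype values in the usage example are hypothetical, chosen only to illustrate the gender-detection example.

```python
import math

def min_dist_classifier(unknown, prototypes):
    """Assign the unknown feature vector to the class of the nearest prototype.
    `prototypes` maps each class label to its prototype (e.g. class mean) vector."""
    best_class, best_dist = None, math.inf
    for label, proto in prototypes.items():
        dist = math.dist(unknown, proto)  # Euclidean distance
        if dist < best_dist:
            best_dist, best_class = dist, label
    return best_class

# Hypothetical prototypes (height in cm, silhouette area in arbitrary units)
prototypes = {"male": (180, 90), "female": (165, 70)}
label = min_dist_classifier((178, 88), prototypes)
```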
Classification
- k-nearest neighbour classifier
[Figure: k-nearest neighbour classification in a 2D feature space]
Classification
- k-nearest neighbour classifier: pseudocode

  kNNClassifier(unknownFeatVect, dataSamples, k)
      size = k
      initialise(minDist, minDistClasses, size)
      class = UNKNOWN
      // find the k samples nearest to the unknown sample
      for each sampleVect in dataSamples
          dist = distance(unknownFeatVect, sampleVect)
          updateKNearestDist(dist, minDist, minDistClasses)
      class = majorityClass(minDistClasses)
      return class
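A minimal Python sketch of the same k-nearest neighbour procedure, assuming the training data is a list of (feature vector, class label) pairs; sorting replaces the incremental `updateKNearestDist` bookkeeping for brevity.

```python
import math
from collections import Counter

def knn_classifier(unknown, data_samples, k):
    """k-nearest-neighbour classification.
    `data_samples` is a list of (feature_vector, class_label) pairs."""
    # sort the samples by distance to the unknown vector, keep the k nearest
    nearest = sorted(data_samples,
                     key=lambda s: math.dist(unknown, s[0]))[:k]
    # majority vote among the labels of the k nearest samples
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With toy samples clustered around (1, 1) for class "A" and (5, 5) for class "B", a query near either cluster is assigned to that cluster's class.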
Classification
- Some distance metrics (for distance-based classifiers)
  - Measure similarity between the unknown pattern and a prototype pattern
  - based on the differences between corresponding features in both patterns, e.g.
    - Euclidean distance: square root of the sum of squared differences
    - City-block (Manhattan or taxi-cab) distance: sum of absolute values of differences
Classification
- Decision boundary for the minimum-distance classifier
[Figure: decision boundary in a 2D feature space]
Classification
- Limitations of the minimum-distance classifier
  - Prone to misclassification for
    - high feature correlation
    - problems requiring a non-linear decision boundary, e.g.
      - curved decision boundary
    - data with subclasses (i.e. clusters)
      - intricate decision boundary
Classification
[Figures: 2D feature-space illustrations of minimum-distance classifier limitations:
feature correlation; curved decision boundary; distinct subclasses; complex decision boundary]
Classification
- Bayesian classifier
  - Bayes rule: P(C|F) = ( P(F|C) x P(C) ) / P(F)
    - P(C|F): a posteriori probability that the observed feature F is from a pattern that belongs to class C
    - P(F|C): conditional probability of observing feature F given class C
    - P(C): a priori probability that a randomly-drawn feature is from a pattern that belongs to class C
    - P(F): total probability of observing feature F
      - for mutually exclusive classes: P(F) = Σ_C ( P(F|C) x P(C) )
      - P(F) is class-independent, and can be seen as a normalisation so that Σ_C P(C|F) = 1
      - hence P(F) can be omitted from classification calculations (since the decision criterion is maximum likelihood)
Classification
- Bayesian classifier
  - Extension to multiple features
    - for simplicity, assume the features are statistically independent, for each class (i.e. conditional independence): naïve Bayesian classifier
    - compute the class conditional probability, for each class, as the product of the class conditional probabilities of the individual features
    - Bayes rule: P(C|F1,F2,...,Fn) = ( P(F1,F2,...,Fn|C) x P(C) ) / P(F1,F2,...,Fn)
                = ( P(F1|C) x P(F2|C) x ... x P(Fn|C) x P(C) ) / P(F1,F2,...,Fn)
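The naïve product rule can be sketched for two features; the probability values below are illustrative placeholders, not taken from the lecture.

```python
# Hypothetical class-conditional probabilities for two conditionally
# independent features F1 and F2 (illustrative values only)
p_f1 = {"A": 0.6, "B": 0.2}   # P(F1 = observed value | class)
p_f2 = {"A": 0.3, "B": 0.7}   # P(F2 = observed value | class)
prior = {"A": 0.5, "B": 0.5}  # P(class)

# Naive Bayes: scaled posterior = P(F1|C) x P(F2|C) x P(C); P(F1,F2) omitted
scaled = {c: p_f1[c] * p_f2[c] * prior[c] for c in prior}
best = max(scaled, key=scaled.get)  # maximum-likelihood decision
```

Here class "A" wins (0.6 x 0.3 x 0.5 = 0.09 versus 0.2 x 0.7 x 0.5 = 0.07).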
Classification
- Bayesian classifier: numerical example
  - Image segmentation
    - image: grey-level image of a scene showing road, pavement and grass (e.g. for an autonomous unmanned vehicle)
    - known proportions of pixels in common scenes: Pavement : Grass : Road = 3 : 2 : 5
    - problem: classify each pixel in an image captured by the vehicle camera as road, pavement or grass, using Bayesian maximum likelihood classification
    - solution
      - classifier: Bayesian classifier
      - feature F: pixel grey level
      - for a colour image, each colour component can be a feature (see naïve Bayesian classifier)
Classification
- Bayesian classifier: numerical example (ctd.)
  - Classifier training
    - collect many typical images
      - a training set that contains representative data
    - for each known class, select image areas containing pixels from that class
    - compute a grey-level histogram for the class
    - normalise the histogram so that the sum of the grey-level frequencies is 1
      - i.e. estimation of the class conditional probabilities P(F|C)
    - estimate the a priori probability for the class, P(C)
Classification
- Bayesian classifier: numerical example (ctd.)

              Grey level
  Label        0     1     2     3   Total
  Pavement    15    12    26    28    81
  Grass       30    20     6     4    60
  Road         4     0    20    16    40

Training: grey-level histogram matrix
Classification
- Bayesian classifier: numerical example (ctd.)

              Grey level
  Label        0              1              2              3
  Pavement    15/81 = 0.19   12/81 = 0.15   26/81 = 0.32   28/81 = 0.35
  Grass       30/60 = 0.5    20/60 = 0.33    6/60 = 0.1     4/60 = 0.067
  Road         4/40 = 0.1     0/40 = 0      20/40 = 0.5    16/40 = 0.4

Training: class conditional probability matrix
Classification
- Bayesian classifier: numerical example (ctd.)
  - Training
    - Pavement : Grass : Road = 3 : 2 : 5
    - hence, the a priori class probabilities are
      - P(Pavement) = 3 / (3 + 2 + 5) = 0.3
      - P(Grass) = 2 / (3 + 2 + 5) = 0.2
      - P(Road) = 5 / (3 + 2 + 5) = 0.5
Classification
- Bayesian classifier: numerical example (ctd.)

              Grey level
  Label        0                   1                   2                   3
  Pavement    0.19 x 0.3 = 0.057  0.15 x 0.3 = 0.045  0.32 x 0.3 = 0.096  0.35 x 0.3 = 0.105
  Grass       0.5 x 0.2 = 0.1     0.33 x 0.2 = 0.066  0.1 x 0.2 = 0.02    0.067 x 0.2 = 0.013
  Road        0.1 x 0.5 = 0.05    0 x 0.5 = 0         0.5 x 0.5 = 0.25    0.4 x 0.5 = 0.2

Classification: scaled a posteriori class probability matrix
Classification
- Bayesian classifier: numerical example (ctd.)

  Grey level   0       1       2      3
  Label        Grass   Grass   Road   Road

Classification: class label matrix
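The whole numerical example can be reproduced in a few lines of NumPy: normalise the training histograms into class conditional probabilities, scale by the priors, and take the most likely class per grey level.

```python
import numpy as np

# Training histograms (rows: Pavement, Grass, Road; columns: grey levels 0-3)
cond = np.array([[15, 12, 26, 28],
                 [30, 20,  6,  4],
                 [ 4,  0, 20, 16]], dtype=float)
cond /= cond.sum(axis=1, keepdims=True)   # class conditional probabilities P(F|C)

priors = np.array([0.3, 0.2, 0.5])        # P(Pavement), P(Grass), P(Road)
labels = ["Pavement", "Grass", "Road"]

# Scaled a posteriori probabilities P(F|C) x P(C); P(F) omitted
scaled = cond * priors[:, None]

# Maximum-likelihood class label for each grey level
decisions = [labels[i] for i in scaled.argmax(axis=0)]
```

This reproduces the class label matrix: Grass for grey levels 0 and 1, Road for grey levels 2 and 3.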
Classification
- Classifier training
  - Data-driven capture of parameters representing the statistical distribution or syntactic configuration of salient class characteristics
  - supervised training
    - class labels are used during training
  - unsupervised training
    - class labels are not used during training (e.g. clustering)
Classification
- Classifier testing
  - Testing: estimation of recognition accuracy
    - often uses real data; simulation may be used (Monte Carlo)
  - Accuracy measures
    - error rate (often expressed as a percentage)
    - e.g. correct recognition rate, insertion rate, false acceptance rate, false rejection rate, ...
Optical Character Recognition
[Figure: block diagram of a generic OCR system]
Optical Character Recognition
- Feature extraction methods
  - Spatial-domain to frequency-domain transform
    - Hartley, Fourier, or other transform
  - Statistics
    - mean, variance, projection histograms, orientation histograms
Optical Character Recognition
- Feature extraction methods
  - Miscellaneous
    - geometric measures
      - ratio of width to height of the bounding box, ...
    - description of skeletonised characters
      - graph description comprising line segments (e.g. strokes of Chinese characters)
      - number of L, T, or X junctions, ...
Optical Character Recognition
- Feature extraction methods
[Figure: projection histograms of a character image]
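Projection histograms are simple to compute: each bin counts the foreground pixels in one row (horizontal projection) or one column (vertical projection) of the binary character image. A minimal sketch:

```python
import numpy as np

def projection_histograms(char_mask):
    """Row and column projection histograms of a binary character image.
    `char_mask` is a 2D array where foreground pixels are 1 and background 0."""
    rows = char_mask.sum(axis=1)   # horizontal projection (one bin per row)
    cols = char_mask.sum(axis=0)   # vertical projection (one bin per column)
    return rows, cols

# Tiny "T" shape: full top row, single-pixel stem
t_shape = np.array([[1, 1, 1],
                    [0, 1, 0],
                    [0, 1, 0]])
rows, cols = projection_histograms(t_shape)
```

For the "T" above, the row projection is [3, 1, 1] and the column projection is [1, 3, 1], a compact signature that already distinguishes many character shapes.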
Optical Character Recognition
- OCR examples: artificial neural network
[Figures: ANN-based OCR examples (1) and (2); further examples in AAL's book]
Other Recognition Applications
- Sample applications
  - recognition of faces or facial expressions
  - recognition of body movement (gestures, gait)
  - recognition of handwriting (text, signature)
  - industrial inspection
  - autonomous vehicles, traffic monitoring
  - ...
- (Exercise: identify the architectural components for these applications, and discuss the factors affecting performance)
Summary
- Classical pattern recognition model
  - pre/post-processing
  - feature extraction
  - classification
- Feature extraction: representation of discriminant pattern characteristics
- Classification
  - wide variety of classifiers
  - supervised or unsupervised classifier training
Summary
- Components of a generic OCR system
  - image capture, image pre-processing, feature extraction, classification, post-processing
- Wide variety of features for OCR, e.g.
  - frequency-domain representation
  - statistical or geometric measurements
  - skeleton descriptors
  - ...