Title: Object Recognition and Location in a Practical Context
1Object Recognition and Location in a Practical
Context
- J. R. Parker
- Digital Media Laboratory
- University of Calgary
2- For some workers in the computer vision field, determining what objects are being seen is a crucial step. Assigning a name or label to objects in an image is called object recognition, and it is difficult because two objects that are really the same will rarely be identical pixel by pixel in an image. We must find ways to tolerate the small natural variations that occur between identical objects (in-class) while discarding the larger variations that occur between classes.
- The enhancements to an image that are required to enable recognition are often the same as, or simpler than, those required for human image interpretation. Reliability is an issue: how well (what percentage of the time) are objects correctly recognized? In this talk we will discuss object recognition in various practical contexts and talk about ways to increase the reliability of recognizers. Examples from document analysis, biological systems, computer interfaces, and simple biometrics will be used.
4Object Recognition?
Given a coherent region in an image, assign a label to that region, ignoring variations due to illumination, viewing position, occlusion, background, colour variation, abstraction, and noise. E.g., "This is a boat."
5Object Recognition?
6Object Location?
Where (2D or 3D position) in an image do objects
occur? Does not necessarily involve object
recognition.
7Real life: unexpected objects
8Summary of this talk
- Methods for object recognition
- pre-processing
- features
- What can we do with object recognition?
- gestures, symbols
- signatures, searches
- Tracking small objects - Daphnia
- Future: audio, entertainment, interfaces
9Vision/PR is HARD!
- Ill defined
- Many simple object classes have a huge number of significant variations.
- Noise is random and hard to model.
- Models used are not defined well enough for segmentation.
10Which object is a chair? Which is a model?
11Vision/PR is HARD!
- Ill posed
- One image does not give enough information to disambiguate geometry, illumination, and reflectance (the problem is underconstrained).
12Vision/PR is HARD!
- Intractable
- Combinatorial complexity.
- Parallelism speeds up some steps, but communication between phases is a problem.
- We usually need to solve these problems in real time, which means we usually sacrifice accuracy for speed.
13So what do we do?
- Restrict domain
- Do no more than needed. (2 classes)
- Improve input (On-camera processing)
- Throw hardware at it (multiple processors)
- Improve output (algorithm fusion, consensus)
14Object Recognition
- Classify methods as statistical or structural
- Statistical: use measurements as independent dimensions; the known object closest to a set of measurements indicates the class.
- Structural: try to match to a pre-defined object by using its structure (the relationship between parts).
15Statistical Recognition uses Features
A feature is a measurement or a computed value
based on measurements.
16Example of structural recognition: the letter E
[Figure: the letter E described structurally - a long vertical stroke on the left, with a long horizontal stroke at the top left, a short horizontal stroke at the centre left, and a long horizontal stroke at the bottom left.]
17Is this how we see things?
- Probably not.
- However, at the recognition stage, things like scale and position do not matter (rotation does!).
- Elements of the object are specified relative to one another.
- Can use formal grammars!
18I prefer statistical methods
- Take some objects for which we know the labels and measure them.
- Can use many measurements.
- For fun, plot the measurements so we can see them.
19I prefer statistical methods
- These objects are nuts, screws, and washers; measure area and perimeter, and compute circularity.
- Do these measurements allow us to discriminate the three object types (classes)?
20Statistical methods
- Three features here.
- Do these measurements allow us to discriminate
the three object types (Classes)? Means below.
Class     Area    Perimeter    Circularity
Screw     117     58           2.3
Washer    378     124          3.5
Nut       224     72           1.9
21Variation
- V = s/m (standard deviation divided by the mean)
- Coefficient of variation
- Smaller is better (less variation)

Class     Area    Perimeter    Circularity
Screw     0.19    0.15         0.22
Washer    0.13    0.11         0.39
Nut       0.06    0.15         0.35
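As a minimal sketch of how these numbers are obtained, the coefficient of variation of a feature can be computed directly from the per-class measurements; the sample values and names below are illustrative, not the original data.

```python
import numpy as np

def coefficient_of_variation(samples):
    """V = s/m: the sample standard deviation divided by the sample mean."""
    samples = np.asarray(samples, dtype=float)
    return samples.std(ddof=1) / samples.mean()

# Hypothetical area measurements for a handful of screws.
screw_areas = [110, 121, 99, 130, 125]
print(coefficient_of_variation(screw_areas))  # smaller means a more stable feature
```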
22More than one feature
We can plot the position in feature space of all
objects!
[Scatter plot: each object plotted as a point in feature space (area on the horizontal axis, perimeter on the vertical axis), with each point labelled S (screw), W (washer), or N (nut).]
23Nearest neighbor
[The same feature-space scatter plot, used to illustrate nearest-neighbour classification.]
24Nearest 5 neighbor
[The same feature-space scatter plot, used to illustrate nearest-5-neighbour classification.]
25Nearest centroid
[The same feature-space scatter plot, used to illustrate nearest-centroid classification.]
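The three decision rules sketched on the preceding slides (nearest neighbour, nearest 5 neighbours, nearest centroid) are simple to implement. Below is a minimal sketch in Python/NumPy; the training data is invented for illustration and only loosely mimics the (area, perimeter) means quoted earlier.

```python
import numpy as np

def nearest_neighbour(train_x, train_y, query, k=1):
    """Label a query point by the majority label among its k closest
    training points in feature space (Euclidean distance)."""
    d = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def nearest_centroid(train_x, train_y, query):
    """Label a query point by the class whose mean feature vector is closest."""
    classes = np.unique(train_y)
    centroids = np.array([train_x[train_y == c].mean(axis=0) for c in classes])
    return classes[np.argmin(np.linalg.norm(centroids - query, axis=1))]

# Hypothetical (area, perimeter) training data: screws (S), washers (W), nuts (N).
X = np.array([[117, 58], [120, 60], [378, 124], [370, 120], [224, 72], [230, 75]])
y = np.array(['S', 'S', 'W', 'W', 'N', 'N'])
print(nearest_neighbour(X, y, np.array([200, 70]), k=3))  # 'N'
print(nearest_centroid(X, y, np.array([200, 70])))        # 'N'
```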
26Other ways to classify
Decision functions (LDA)
Support vector machines (SVM)
Various other definitions of distance
And a host of others
27Recognition/image processing
Low level: edge detection, noise reduction, texture
Medium level: segmentation -> feature extraction
High level: recognition
28Features
- The choice of features depends on the objects (e.g., minimize variation) and the properties of the mixture.
29Simple (Scalar) Features
- Area, perimeter (BAD - scale dependent)
- Circularity: c = perimeter² / (4π × area)
- Convexity: ratio of convex hull perimeter (or area) to actual object perimeter (or area).
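A minimal sketch of these two shape features, assuming the object's area, perimeter, and boundary points have already been extracted (SciPy's ConvexHull is used for the hull; the measurement code itself is not shown).

```python
import numpy as np
from scipy.spatial import ConvexHull

def circularity(perimeter, area):
    """c = perimeter**2 / (4 * pi * area); 1.0 for a circle, larger otherwise."""
    return perimeter ** 2 / (4.0 * np.pi * area)

def convexity(boundary_points, perimeter):
    """Ratio of convex-hull perimeter to the object's actual perimeter.
    Near 1.0 for convex shapes, smaller for shapes with deep concavities."""
    hull = ConvexHull(boundary_points)          # boundary_points: (N, 2) array
    hull_pts = boundary_points[hull.vertices]   # hull vertices in order
    edges = np.roll(hull_pts, -1, axis=0) - hull_pts
    hull_perimeter = np.linalg.norm(edges, axis=1).sum()
    return hull_perimeter / perimeter
```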
30Simple Features convex deficiencies
31Vector features
- Multiple values, sometimes hierarchical
- Some examples:
- Fourier descriptor: Fourier coefficients of a closed curve.
- Outline feature: max/min points from the centroid, etc.
- Moment: sum of (deltas to powers) from the centroid.
- Profile: e.g., distances from the left margin of the image.
- Slope histogram: distribution of slopes in the object.
- Signature: a 2D-to-1D transform; move along the boundary.
32Vector features- Fourier descriptors
- Represents the shape of the boundary.
- Let x(k) and y(k), k = 0, 1, ..., N-1, be the boundary points for shape s.
- a(u) = DFT[x(k) + j·y(k)] = DFT[s(k)] gives us the Fourier descriptors of s.
- High-frequency terms are at the end, low-frequency at the beginning; e.g., low-pass filtering turns a square into an oval.
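A minimal sketch of Fourier descriptors using NumPy's FFT, following the definition above; the optional keep parameter is an illustrative way of doing the low-pass filtering that turns a square into an oval.

```python
import numpy as np

def fourier_descriptors(x, y, keep=None):
    """Fourier descriptors of a closed boundary: the DFT of the complex
    sequence s(k) = x(k) + j*y(k)."""
    s = np.asarray(x, dtype=float) + 1j * np.asarray(y, dtype=float)
    a = np.fft.fft(s)
    if keep is not None:                 # keep only the lowest 'keep' frequencies
        filtered = np.zeros_like(a)
        filtered[:keep] = a[:keep]
        filtered[-keep:] = a[-keep:]     # negative frequencies as well
        a = filtered
    return a

def reconstruct(a):
    """Invert the descriptors back to boundary coordinates (x, y)."""
    s = np.fft.ifft(a)
    return s.real, s.imag
```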
33Vector features- Fourier descriptors
34Vector features: Outline measures
- The object is broken into sections for matching; measurements in each section count.
- In each section: leftmost point, rightmost point, greatest slope, greatest thickness.
35Vector features - Moments
- m_pq = sum over all object pixels of (x - x̄)^p (y - ȳ)^q I(x, y), where I is the image and (x̄, ȳ) is the centroid.
- There are an infinite number of moments.
- The image can be reconstructed perfectly from all of them, and approximately from some of them.
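A minimal sketch of a central moment as described above (sums of powers of offsets from the centroid, weighted by the image); the function name and interface are illustrative.

```python
import numpy as np

def central_moment(image, p, q):
    """mu_pq = sum over object pixels of (x - xbar)^p * (y - ybar)^q * I(x, y).
    'image' is a 2-D array that is non-zero inside the object."""
    image = np.asarray(image, dtype=float)
    ys, xs = np.nonzero(image)
    weights = image[ys, xs]
    xbar = (xs * weights).sum() / weights.sum()
    ybar = (ys * weights).sum() / weights.sum()
    return ((xs - xbar) ** p * (ys - ybar) ** q * weights).sum()
```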
36Vector features profiles/projections
37Vector features slope histograms
- Travel the boundary of the shape.
- At each sample point, compute the slope of the
tangent to the boundary at that point. Save in a
histogram.
[Example slope histograms for a square and for a circle.]
38Vector features signatures
- From the centroid, histogram the distance to the boundary in t-degree increments.
[Example signatures for a square and for a circle.]
39How do we compare two vectors?
Normalize with respect to vector length (norm), i.e., sum the elements of the vector and divide every element by this sum. Then compute the difference at each element and sum the squares. There are other ways to do this.
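A minimal sketch of the comparison just described: normalize each vector by the sum of its elements, then sum the squared element-wise differences.

```python
import numpy as np

def vector_distance(a, b):
    """Normalize two feature vectors by the sum of their elements, then
    return the sum of squared element-wise differences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / a.sum()
    b = b / b.sum()
    return np.sum((a - b) ** 2)
```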
40How well do these simple methods work?
- Depends on context.
- How do we measure success?
- Do controlled experiments on known data, and calculate the percentage correct (success rate).
- Confusion matrix: not only success rates, but how the classifier fails and which classes are confused.
41Simple example Handprinted digits
- Outline features

Digit     0    1    2    3    4    5    6    7    8    9
Right    94   95   96  100   95  100   84   94   90   94
Wrong     1    4    1    0    5    0   10    5    4    6
Reject    5    1    3    0    0    0    6    1    6    0

- Overall success: 94.2%
42Digits using convex deficiencies
Success rates (%):
Digit     0    1    2    3    4    5    6    7    8    9
         99   94   98   96   94   90   90   93   95   92
Overall success rate: 94.1%
43Digits using vector templates (tell you about these later)
Success rates (%):
Digit     0    1    2    3    4    5    6    7    8    9
         99   94   98   96   94   92   90   93   95   92
Overall success rate: 94.3%
44Digits shape tracing
Success rates (%):
Digit     0    1    2    3    4    5    6    7    8    9
        100   94   92   99   90   94  100   88   99   98
Overall success rate: 93.7%
45Digits neural network
48 pixels -> input, 96 hidden, 10 output; backpropagation.
Success rates (%):
Digit     0    1    2    3    4    5    6    7    8    9
         99   93   99   95  100   95   99  100   95   74
Overall success rate: 94.9%
46So which method do we use?
Method                   Success rate
Outline                  94.2%
Convex deficiencies      94.1%
Vector templates         94.3%
Shape tracing            93.7%
Neural net               94.9%
47We use all of them.
Our laboratory has achieved very high success rates and reliability by merging multiple algorithms. This was quite a new idea when we started (1992), but it has since caught on. There are still many places where it is not routinely considered.
48Multiple algorithms How?
- Simple voting methods
- Majority vote: out of the 5 classifiers, a majority (3 or more) selects the correct class.
- Straightforward and intuitive.
- For the 5-classifier problem we get a success rate of 99.4% using a simple majority vote.
- We deleted classifier 4, which was seen to reduce the success rate.
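A minimal sketch of a simple majority vote over classifier outputs; returning None when no strict majority exists corresponds to rejecting the input.

```python
from collections import Counter

def majority_vote(labels):
    """Return the class chosen by a strict majority of classifiers,
    or None (a rejection) if no class reaches a majority."""
    winner, count = Counter(labels).most_common(1)[0]
    return winner if count > len(labels) / 2 else None

print(majority_vote(['3', '3', '8', '3', '9']))  # '3'
print(majority_vote(['3', '8', '8', '3', '9']))  # None: no majority
```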
49Multiple algorithms How?
- Rank voting methods
- Borda count - Each alternative is given a number
of points depending on where in the ranking it
has been placed. A selection is given no points
for placing last, one point for placing next to
last, and so on up to N-1 points for placing
first. In other words, the number of points given
to a selection is the number of classes below it
in the ranking.
50Multiple algorithms How?
- Borda Count
- Voter 1: a b c d   gives   a (3), b (2), c (1), d (0)
- Voter 2: c a b d   gives   c (3), a (2), b (1), d (0)
- Voter 3: b d c a   gives   b (3), d (2), c (1), a (0)
- So: a: 3 + 2 + 0 = 5
- b: 2 + 1 + 3 = 6   The winner! (most points)
- c: 1 + 3 + 1 = 5
- d: 0 + 0 + 2 = 2
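A minimal sketch of the Borda count, reproducing the worked example above; the function name and candidate labels are illustrative.

```python
def borda_count(rankings):
    """Each voter ranks the candidates best-to-worst; a candidate gets one
    point for every candidate ranked below it.  Highest total wins."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for place, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + (n - 1 - place)
    return max(scores, key=scores.get), scores

# The worked example above: three voters, four candidates.
rankings = [['a', 'b', 'c', 'd'],
            ['c', 'a', 'b', 'd'],
            ['b', 'd', 'c', 'a']]
print(borda_count(rankings))  # ('b', {'a': 5, 'b': 6, 'c': 5, 'd': 2})
```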
51Multiple algorithms How?
- Borda Count problems
- Voter 1 a b c
- Voter 2 a b c
- Voter 3 a b c
- Voter 4 b c a
- Voter 5 b c a
- Borda winner is choice b. However, the majority
vote winner is a.
52Rules for voting systems
- If a majority of classifiers select X as their first choice, a voting rule should select X as the overall winner. (Majority)
- If there is an alternative X that could obtain a majority of votes in pair-wise contests against every other alternative, a voting rule should choose X. (Condorcet)
- If X is a winner, and then one or more voters change their preferences in a way favourable to X without changing the order of any other alternative, X should remain the winner. (Monotonicity)
53Other combining schemes
- We can use the magnitude of each classifier's response in a number of ways. The responses for a given unknown object (four possible classes) are shown below.
- MAX RULE

          x1    x2    x3    x4
C1        .6    .3    .0    .1
C2        .5    .1    .3    .2
C3        .9    .0    .0    .1
C4        .5    .1    .2    .1
C5        .4    .3    .3    .0
Max       .9    .3    .3    .2      winner (max) is x1
54Other combining schemes
- MEDIAN RULE

          x1    x2    x3    x4
C1        .6    .3    .0    .1
C2        .5    .1    .3    .2
C3        .9    .0    .0    .1
C4        .5    .1    .2    .1
C5        .4    .3    .3    .0
Median    .5    .1    .2    .1      winner (max) is x1
55Other combining schemes
- SUM RULE

          x1    x2    x3    x4
C1        .6    .3    .0    .1
C2        .5    .1    .3    .2
C3        .9    .0    .0    .1
C4        .5    .1    .2    .1
C5        .4    .3    .3    .0
Sum      2.9    .8    .8    .5      winner (max) is x1
56Other combining schemes
- PRODUCT RULE

          x1     x2    x3    x4
C1        .6     .3    .0    .1
C2        .5     .1    .3    .2
C3        .9     .0    .0    .1
C4        .5     .1    .2    .1
C5        .4     .3    .3    .0
Product   .054   .0    .0    .0     winner (max) is x1
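A minimal sketch of the max, median, sum, and product rules applied to the five-classifier example above; each rule combines the per-class responses column-wise and then takes the class with the largest combined score.

```python
import numpy as np

def combine(responses, rule="sum"):
    """responses: (n_classifiers, n_classes) array of per-class scores.
    Combine column-wise with the chosen rule, then pick the best class."""
    ops = {"max": np.max, "median": np.median, "sum": np.sum, "prod": np.prod}
    combined = ops[rule](responses, axis=0)
    return int(np.argmax(combined)), combined

# The five-classifier, four-class example from the slides.
R = np.array([[.6, .3, .0, .1],
              [.5, .1, .3, .2],
              [.9, .0, .0, .1],
              [.5, .1, .2, .1],
              [.4, .3, .3, .0]])
for rule in ("max", "median", "sum", "prod"):
    print(rule, combine(R, rule))  # every rule selects class x1 (index 0) here
```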
57What can we do if we can recognize objects?
Giving the computer an effective vision sense
allows a great many options not available now.
Many of these are in the realm of user
interface enhancements. A more natural and
intelligent interface is very desirable.
58We are going to look at some vision apps that
work in the world
They must be fast enough to be practical. They must be accurate enough to be useful. They must take into account the usual problems seen in the field (be robust).
59Respirogram classification
Gesture recognition
Hand-printed symbols
60Respirograms
- Why? Wastewater assessment. Don't let toxins into your waste treatment system, because they will kill the useful bacteria.
- Use a model reactor: let in small samples and see what happens. There are two decay processes of interest, and the oxygen consumption of both is measured simultaneously.
61Respirograms
62We used many methods to determine curve class
- Template matching: 96.2%
- Profiles: 90.3%
- Convex deficiencies: 95.4%
- Slope histogram: 95.5%
- Signature: 95.4%
- X1 convexity: 78.4%
- X2 convexity: 94.0%
- Moments: 96.5%
- Rectangularity: 68.7%
- Circularity: 77.6%
63Majority vote is the obvious choice
- There are only two possible results (single and double).
- A simple majority vote gives 97.8% (1.3% better than Moments at 96.5%).
- Consider a classifier that always votes for double: there were 27 class 1 curves in the data set, so 27 errors out of 134, i.e., 79.8%.
64Majority vote is the obvious choice
- Three of the classifiers used have worse performance: circularity, rectangularity, and X1 convexity.
- Deleting those three classifiers from the mix improves the simple majority vote classifier to 98.5%.
- This is 2% better than Moments at 96.5%.
65Gesture recognition
This involves recognizing the hand (as an object) and its pose, possibly as a function of time. My example here is applied to a game, and was done with one of my students (Mark Baumback).
66The System
- Solitaire
- The most-used Windows application
- Straightforward and intuitive interaction
- Ability to pick up and move cards
- Deal new cards or a new game
- Hand gesture input
- Visual and audio output
67The System
- The camera setup
- Looks down onto any flat surface
- Recognizes two different hand postures
- An open hand
- A closed hand
68Open Hand
- An open hand is a request for information
- E.g. Retrieve information about a card or stack
69Closed Hand
- A closed hand carries out an action
- E.g. Pick up a card
70Hand Posture Recognition
- Recognition process
- Segmentation
- Background subtraction
- Skin detection
- Forearm removal
- Palm detection
- Wrist detection
- Recognition
- Nearest neighbor classification
71Stage 1 - Segmentation
- Two stage process
- Background subtraction
- Skin detection
72Background Subtraction
- Subtraction
- YCrCb Color space
73Background Subtraction: Complex Background
(Top) Reference image. (Bottom) After background subtraction.
(Top) Input image. (Bottom) After skin detection.
74Skin Detection
- Skin detection is applied to pixels that have passed the background subtraction.
- Three different skin detectors:
- A static bounding region defined in each of
- HSV color space
- YCrCb color space
- A distance measure
75Skin Detector 1: HSV
- The (R,G,B) values of the pixels are converted into the HSV color space.
- A static bounding region is used, ignoring intensity.
76Skin Detector 2: YCrCb
- The (R,G,B) values of the pixels are converted into the YCrCb color space.
- A static bounding region is used, ignoring intensity.
77Skin Detector 3: Distance Measure
- Calculates the distance between an input pixel and the center of the skin region defined in the YCrCb color space.
- The closest 10% of pixels are chosen.
- This is effective since the background subtraction has already removed a large portion of the image.
- Compensates for:
- Slightly different skin tones
- Changing lighting conditions
78Skin Detector: Decision Fusion
- A pixel is classified as skin if:
- It has passed the background subtraction, AND
- It was selected by both the HSV and YCrCb skin detectors,
- OR it has passed the distance measure.
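A minimal sketch of this decision fusion using OpenCV colour conversions. The bounding-region thresholds, the skin-colour centre, and the "closest 10%" fraction are illustrative guesses, not the values used in the actual system.

```python
import numpy as np
import cv2

def skin_mask(frame_bgr, foreground,
              cr_box=(133, 173), cb_box=(77, 127),
              hue_box=(0, 25), sat_box=(40, 255),
              centre_crcb=(150, 105), dist_frac=0.10):
    """Fuse three skin detectors on foreground pixels.  'foreground' is the
    boolean mask produced by background subtraction.  A pixel is skin if it
    is foreground AND (inside BOTH the HSV and YCrCb bounding regions, OR
    among the closest dist_frac of foreground pixels to the skin-colour
    centre in (Cr, Cb) space).  All thresholds here are guesses."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    h, s = hsv[..., 0], hsv[..., 1]
    cr, cb = ycrcb[..., 1].astype(float), ycrcb[..., 2].astype(float)

    in_hsv = (h >= hue_box[0]) & (h <= hue_box[1]) & \
             (s >= sat_box[0]) & (s <= sat_box[1])
    in_ycrcb = (cr >= cr_box[0]) & (cr <= cr_box[1]) & \
               (cb >= cb_box[0]) & (cb <= cb_box[1])

    # Distance detector: closest fraction of foreground pixels to the centre.
    dist = np.hypot(cr - centre_crcb[0], cb - centre_crcb[1])
    dist[~foreground] = np.inf
    k = max(1, int(dist_frac * foreground.sum()))
    near = np.zeros(foreground.shape, dtype=bool)
    near.flat[np.argsort(dist, axis=None)[:k]] = True

    return foreground & ((in_hsv & in_ycrcb) | near)
```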
79False Skin Classification
- Removing falsely classified skin pixels
- Many non-skin objects fall into skin-toned ranges.
- Most of these objects tend to be less dense than actual skin.
- Segmented pixels which don't have enough neighbors are removed.
- This causes falsely classified groups to break up.
- Pixel groups which are too small are then removed.
80False Skin Classification
[Images: reference image; pixel-neighborhood removal; segmented-pixel removal; forearm removal and recognition.]
81Final Skin Classification
- The final segmented object is then chosen
The segmented object
82Stage 2 - Forearm Removal
- The forearm drastically changes the shape of a segmented object.
- This is usually caused by the arm crossing the image boundary.
- Forearm removal process:
- Locate the palm
- Remove the forearm where the palm intersects the wrist
83Locating The Palm
- Calculate a distance transform
- Largest value indicates
- Location of the palm
- Radius of circle that encompasses the palm
Distance transform with palm circle
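A minimal sketch of palm location with a distance transform (SciPy's distance_transform_edt is used here; the original system's implementation details are not specified in the slides).

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def locate_palm(hand_mask):
    """Locate the palm as the point of the hand mask farthest from any
    background pixel.  The distance-transform value at that point is the
    radius of the largest circle that fits inside the hand: the palm circle."""
    dist = distance_transform_edt(hand_mask)     # distance to nearest zero pixel
    row, col = np.unravel_index(np.argmax(dist), dist.shape)
    return (row, col), dist[row, col]            # centre and radius
```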
84Wrist Detection
- The palm circle is extended by
- 1.2 Radius
- 1.5 Radius
- 1.7 Radius
- Lines tangent to the palm circle are checked
- The line must be below the center of the circle
- The line crosses the largest segmented region
Wrist detection and forearm removal
85Stage 3 - Recognition
- Three different signatures are calculated on the segmented object:
- Center Signature
- Wrist Signature
- Circle Signature
- Each signature is compared against
- Five open and five closed templates
- A nearest neighbor classification is used for
each signature
86The Angle Distance Signature
- Plots the distance from a reference point to the
boundary of the object as a function of angle
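A minimal sketch of the angle-distance signature; the binning resolution and the choice of keeping the farthest boundary point per angular bin are illustrative.

```python
import numpy as np

def angle_distance_signature(boundary, reference, n_bins=360):
    """Distance from a reference point to the object boundary as a function
    of angle.  'boundary' is an (N, 2) array of (row, col) boundary points;
    bins with no boundary point are left at zero."""
    dy = boundary[:, 0] - reference[0]
    dx = boundary[:, 1] - reference[1]
    angles = np.degrees(np.arctan2(dy, dx)) % 360
    dists = np.hypot(dy, dx)
    signature = np.zeros(n_bins)
    bins = (angles / 360 * n_bins).astype(int) % n_bins
    for b, d in zip(bins, dists):
        signature[b] = max(signature[b], d)      # keep the farthest point per bin
    return signature
```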
87Center Signature
- Angle Distance Signature is calculated
- Reference point is the center of the palm circle
- Distances are normalized with respect to the width of the wrist
- Scale invariant
- Incorporates the difference between the height and width of the hand
88Center Signature
89Wrist Signature
- Angle Distance Signature is calculated
- Reference point is the center of the wrist
- Distances are normalized with respect to the
width of the wrist
90Wrist Signature
91Circle Signature
- Built from concentric circles
- Origin is the center of the palm circle
- 7 concentric circles
- The signature consists of the ratio of object pixels to total pixels within each circle
92Circle Signature
93Rotation Compensation
- Required for wrist and center signatures
- A rotation of the hand is a translation of the Angle Distance Signature.
- The input signature is circularly shifted against each template.
- The minimum distance defines the rotation of the hand.
94Rotation Adjustment
- Example of a hand rotated by 90 degrees from
normal
Center signature Wrist signature
Circle signature
95Rotation Results
- The input signature is translated until the
minimum distance is found
Original signature Adjusted
signature
96The Distance Measure
- The distance between the template and the input signature is calculated.
- Mean squared error across the signature.
- The distance is calculated against each template, for each possible rotation, for:
- Five open hands
- Five closed hands
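A minimal sketch combining the rotation compensation and distance measure just described: circularly shift the input signature against each template, keep the smallest mean squared error, and classify by the nearest template over the five open and five closed examples. Names and structure are illustrative.

```python
import numpy as np

def best_rotation_distance(signature, template):
    """Circularly shift the input signature against the template and return
    the smallest mean squared error together with the shift (in bins) that
    produced it; the shift corresponds to the rotation of the hand."""
    best = (np.inf, 0)
    for shift in range(len(signature)):
        mse = np.mean((np.roll(signature, shift) - template) ** 2)
        if mse < best[0]:
            best = (mse, shift)
    return best  # (distance, rotation in bins)

def classify(signature, open_templates, closed_templates):
    """Nearest-neighbour decision over the open and closed templates."""
    d_open = min(best_rotation_distance(signature, t)[0] for t in open_templates)
    d_closed = min(best_rotation_distance(signature, t)[0] for t in closed_templates)
    return "open" if d_open < d_closed else "closed"
```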
97Nearest Neighbor Classification
- There are 5 different templates for each type.
- The one with the minimum distance (after translation) is selected for each signature.
- This defines the classification (open / closed) that each signature has chosen.
98Voting
- The three signatures vote for either an open or closed hand.
- The majority vote usually classifies the hand correctly.
Individual signature accuracy rates
99Voting Results
- The majority vote between the three signatures usually classifies the hand correctly.
- All three signatures must agree to switch the hand from a closed to an open posture.
- Correctly classifying a closed hand is more important than falsely classifying an open hand.
Classification results
100Video Demonstration
101Thank You