Title: Object Recognition and Location in a Practical Context
1Object Recognition and Location in a Practical
Context
- J. R. Parker
- Digital Media Laboratory
- University of Calgary
2- For some workers in the computer vision field, determining what objects are being seen is a crucial step. Assigning a name or label to objects in an image is called object recognition, and it is difficult because two objects that are really the same will rarely be identical pixel by pixel in an image. We must find ways to tolerate the small natural variations that occur between identical objects (in-class) while discarding the larger variations that occur between classes.
- The enhancements to an image that are required to enable recognition are often the same as, or simpler than, those required for human image interpretation. Reliability is an issue: how well (what percentage of the time) are objects correctly recognized? In this talk we will discuss object recognition in various practical contexts and talk about ways to increase the reliability of recognizers. Examples from document analysis, biological systems, computer interfaces, and simple biometrics will be used.
4Object Recognition?
Given a coherent region in an image, assign a label to that region, ignoring variations due to illumination, viewing position, occlusion, background, colour variation, abstraction, and noise. E.g., "This is a boat."
5Object Recognition?
6Object Location?
Where (2D or 3D position) in an image do objects
occur? Does not necessarily involve object
recognition.
7Real life: unexpected objects
8Summary of this talk
- Methods for object recognition
- pre-processing
- features
- What can we do with object recognition?
- gestures, symbols
- signatures, searches
- Tracking small objects - Daphnia
- Future: audio, entertainment, interfaces
9Vision/PR is HARD!
- Ill defined
- Many simple object classes have a huge number of significant variations.
- Noise is random and hard to model.
- Models used are not defined well enough for segmentation.
10Which object is a chair? Which is a model?
11Vision/PR is HARD!
- Ill posed
- One image does not give enough information to disambiguate geometry, illumination, and reflectance (the problem is underconstrained).
12Vision/PR is HARD!
- Intractable
- Combinatorial complexity.
- Parallelism speeds up some steps, but communication between phases is a problem.
- We usually need to solve these problems in real time, which means we usually sacrifice accuracy for speed.
13So what do we do?
- Restrict domain
- Do no more than needed. (2 classes)
- Improve input (On-camera processing)
- Throw hardware at it (multiple processors)
- Improve output (algorithm fusion, consensus)
14Object Recognition
- Classify methods as statistical or structural
- Statistical: use measurements as independent dimensions; the known object closest to a set of measurements indicates the class.
- Structural: try to match to a pre-defined object by using its structure (the relationship between parts).
15Statistical Recognition uses Features
A feature is a measurement or a computed value
based on measurements.
16Example of structural recognition: the letter E
[Figure: the letter E described structurally - a long vertical stroke on the left, with a long horizontal stroke at the top left, a short horizontal stroke at the centre left, and a long horizontal stroke at the bottom left.]
17Is this how we see things?
- Probably not.
- However, at the recognition stage, things like scale and position do not matter (rotation does!).
- Elements of the object are specified relative to one another.
- Can use formal grammars!
18I prefer statistical methods
- Take some objects for which we know the labels and measure them.
- Can use many measurements.
- For fun, plot the measurements so we can see them.
19I prefer statistical methods
- These objects are nuts, screws, and washers; measure area and perimeter, and compute circularity.
- Do these measurements allow us to discriminate the three object types (classes)?
20Statistical methods
- Three features here.
- Do these measurements allow us to discriminate
the three object types (Classes)? Means below.
Class     Area    Perimeter    Circularity
Screw     117     58           2.3
Washer    378     124          3.5
Nut       224     72           1.9
21Variation
- V = s/m (standard deviation divided by the mean)
- Coefficient of variation
- Smaller is better (less variation)

Class     Area    Perimeter    Circularity
Screw     0.19    0.15         0.22
Washer    0.13    0.11         0.39
Nut       0.06    0.15         0.35
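As a minimal sketch of how these numbers are obtained, the coefficient of variation of a feature can be computed directly from the per-class measurements; the sample values and names below are illustrative, not the original data.

```python
import numpy as np

def coefficient_of_variation(samples):
    """V = s/m: the sample standard deviation divided by the sample mean."""
    samples = np.asarray(samples, dtype=float)
    return samples.std(ddof=1) / samples.mean()

# Hypothetical area measurements for a handful of screws.
screw_areas = [110, 121, 99, 130, 125]
print(coefficient_of_variation(screw_areas))  # smaller means a more stable feature
```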
22More than one feature
We can plot the position in feature space of all
objects!
[Scatter plot: each object plotted as a point in feature space (area on the horizontal axis, perimeter on the vertical axis), with each point labelled S (screw), W (washer), or N (nut).]
23Nearest neighbor
[The same feature-space scatter plot, used to illustrate nearest-neighbour classification.]
24Nearest 5 neighbor
[The same feature-space scatter plot, used to illustrate nearest-5-neighbour classification.]
25Nearest centroid
[The same feature-space scatter plot, used to illustrate nearest-centroid classification.]
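The three decision rules sketched on the preceding slides (nearest neighbour, nearest 5 neighbours, nearest centroid) are simple to implement. Below is a minimal sketch in Python/NumPy; the training data is invented for illustration and only loosely mimics the (area, perimeter) means quoted earlier.

```python
import numpy as np

def nearest_neighbour(train_x, train_y, query, k=1):
    """Label a query point by the majority label among its k closest
    training points in feature space (Euclidean distance)."""
    d = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def nearest_centroid(train_x, train_y, query):
    """Label a query point by the class whose mean feature vector is closest."""
    classes = np.unique(train_y)
    centroids = np.array([train_x[train_y == c].mean(axis=0) for c in classes])
    return classes[np.argmin(np.linalg.norm(centroids - query, axis=1))]

# Hypothetical (area, perimeter) training data: screws (S), washers (W), nuts (N).
X = np.array([[117, 58], [120, 60], [378, 124], [370, 120], [224, 72], [230, 75]])
y = np.array(['S', 'S', 'W', 'W', 'N', 'N'])
print(nearest_neighbour(X, y, np.array([200, 70]), k=3))  # 'N'
print(nearest_centroid(X, y, np.array([200, 70])))        # 'N'
```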
26Other ways to classify
Decision functions (LDA)
Support vector machines (SVM)
Various other definitions of distance
And a host of others
27Recognition/image processing
Low level: edge detection, noise reduction, texture
Medium level: segmentation -> feature extraction
High level: recognition
28Features
- The choice of features depends on the objects (e.g., minimize variation) and the properties of the mixture.
29Simple (Scalar) Features
- Area, perimeter (BAD - scale dependent)
- Circularity: c = perimeter² / (4π × area)
- Convexity: ratio of convex hull perimeter (or area) to actual object perimeter (or area).
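A minimal sketch of these two shape features, assuming the object's area, perimeter, and boundary points have already been extracted (SciPy's ConvexHull is used for the hull; the measurement code itself is not shown).

```python
import numpy as np
from scipy.spatial import ConvexHull

def circularity(perimeter, area):
    """c = perimeter**2 / (4 * pi * area); 1.0 for a circle, larger otherwise."""
    return perimeter ** 2 / (4.0 * np.pi * area)

def convexity(boundary_points, perimeter):
    """Ratio of convex-hull perimeter to the object's actual perimeter.
    Near 1.0 for convex shapes, smaller for shapes with deep concavities."""
    hull = ConvexHull(boundary_points)          # boundary_points: (N, 2) array
    hull_pts = boundary_points[hull.vertices]   # hull vertices in order
    edges = np.roll(hull_pts, -1, axis=0) - hull_pts
    hull_perimeter = np.linalg.norm(edges, axis=1).sum()
    return hull_perimeter / perimeter
```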
30Simple Features convex deficiencies
31Vector features
- Multiple values, sometimes hierarchical
- Some examples:
- Fourier descriptor: Fourier coefficients of a closed curve.
- Outline feature: max/min points from the centroid, etc.
- Moment: sum of (deltas to powers) from the centroid.
- Profile: e.g., distances from the left margin of the image.
- Slope histogram: distribution of slopes in the object.
- Signature: a 2D-to-1D transform; move along the boundary.
32Vector features- Fourier descriptors
- Represents the shape of the boundary.
- Let x(k) and y(k), k = 0, 1, ..., N-1, be the boundary points for shape s.
- a(u) = DFT[x(k) + j·y(k)] = DFT[s(k)] gives us the Fourier descriptors of s.
- High-frequency terms are at the end, low-frequency at the beginning; e.g., low-pass filtering turns a square into an oval.
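A minimal sketch of Fourier descriptors using NumPy's FFT, following the definition above; the optional keep parameter is an illustrative way of doing the low-pass filtering that turns a square into an oval.

```python
import numpy as np

def fourier_descriptors(x, y, keep=None):
    """Fourier descriptors of a closed boundary: the DFT of the complex
    sequence s(k) = x(k) + j*y(k)."""
    s = np.asarray(x, dtype=float) + 1j * np.asarray(y, dtype=float)
    a = np.fft.fft(s)
    if keep is not None:                 # keep only the lowest 'keep' frequencies
        filtered = np.zeros_like(a)
        filtered[:keep] = a[:keep]
        filtered[-keep:] = a[-keep:]     # negative frequencies as well
        a = filtered
    return a

def reconstruct(a):
    """Invert the descriptors back to boundary coordinates (x, y)."""
    s = np.fft.ifft(a)
    return s.real, s.imag
```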
33Vector features- Fourier descriptors
34Vector features: Outline measures
- The object is broken into sections for matching; measurements in each section count.
- In each section: leftmost point, rightmost point, greatest slope, greatest thickness.
35Vector features - Moments
- m_pq = sum over all object pixels of (x - x̄)^p (y - ȳ)^q I(x, y), where I is the image and (x̄, ȳ) is the centroid.
- There are an infinite number of moments.
- The image can be reconstructed perfectly from all of them, and approximately from some of them.
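A minimal sketch of a central moment as described above (sums of powers of offsets from the centroid, weighted by the image); the function name and interface are illustrative.

```python
import numpy as np

def central_moment(image, p, q):
    """mu_pq = sum over object pixels of (x - xbar)^p * (y - ybar)^q * I(x, y).
    'image' is a 2-D array that is non-zero inside the object."""
    image = np.asarray(image, dtype=float)
    ys, xs = np.nonzero(image)
    weights = image[ys, xs]
    xbar = (xs * weights).sum() / weights.sum()
    ybar = (ys * weights).sum() / weights.sum()
    return ((xs - xbar) ** p * (ys - ybar) ** q * weights).sum()
```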
36Vector features profiles/projections
37Vector features slope histograms
- Travel the boundary of the shape.
- At each sample point, compute the slope of the
tangent to the boundary at that point. Save in a
histogram.
[Example slope histograms for a square and for a circle.]
38Vector features signatures
- From the centroid, histogram the distance to the boundary in t-degree increments.
[Example signatures for a square and for a circle.]
39How do we compare two vectors?
Normalize with respect to vector length (norm), i.e., sum the elements of the vector and divide every element by this sum. Then compute the difference at each element and sum the squares. There are other ways to do this.
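A minimal sketch of the comparison just described: normalize each vector by the sum of its elements, then sum the squared element-wise differences.

```python
import numpy as np

def vector_distance(a, b):
    """Normalize two feature vectors by the sum of their elements, then
    return the sum of squared element-wise differences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / a.sum()
    b = b / b.sum()
    return np.sum((a - b) ** 2)
```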
40How well do these simple methods work?
- Depends on context.
- How do we measure success?
- Do controlled experiments on known data, and calculate the percentage correct (success rate).
- Confusion matrix: not only success rates, but how the classifier fails and which classes are confused.
41Simple example Handprinted digits
- Outline features

Digit     0    1    2    3    4    5    6    7    8    9
Right    94   95   96  100   95  100   84   94   90   94
Wrong     1    4    1    0    5    0   10    5    4    6
Reject    5    1    3    0    0    0    6    1    6    0

- Overall success: 94.2%
42Digits using convex deficiencies
Success rates (%):
Digit     0    1    2    3    4    5    6    7    8    9
         99   94   98   96   94   90   90   93   95   92
Overall success rate: 94.1%
43Digits using vector templates (tell you about these later)
Success rates (%):
Digit     0    1    2    3    4    5    6    7    8    9
         99   94   98   96   94   92   90   93   95   92
Overall success rate: 94.3%
44Digits shape tracing
Success rates (%):
Digit     0    1    2    3    4    5    6    7    8    9
        100   94   92   99   90   94  100   88   99   98
Overall success rate: 93.7%
45Digits neural network
48 pixels -> input, 96 hidden, 10 output; backpropagation.
Success rates (%):
Digit     0    1    2    3    4    5    6    7    8    9
         99   93   99   95  100   95   99  100   95   74
Overall success rate: 94.9%
46So which method do we use?
Method                   Success rate
Outline                  94.2%
Convex deficiencies      94.1%
Vector templates         94.3%
Shape tracing            93.7%
Neural net               94.9%
47We use all of them.
Our laboratory has achieved very high success rates and reliability by merging multiple algorithms. This was quite a new idea when we started (1992), but it has since caught on. There are still many places where it is not routinely considered.
48Multiple algorithms How?
- Simple voting methods
- Majority vote: out of the 5 classifiers, a majority (3 or more) selects the correct class.
- Straightforward and intuitive.
- For the 5-classifier problem we get a success rate of 99.4% using a simple majority vote.
- We deleted classifier 4, which was seen to reduce the success rate.
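A minimal sketch of a simple majority vote over classifier outputs; returning None when no strict majority exists corresponds to rejecting the input.

```python
from collections import Counter

def majority_vote(labels):
    """Return the class chosen by a strict majority of classifiers,
    or None (a rejection) if no class reaches a majority."""
    winner, count = Counter(labels).most_common(1)[0]
    return winner if count > len(labels) / 2 else None

print(majority_vote(['3', '3', '8', '3', '9']))  # '3'
print(majority_vote(['3', '8', '8', '3', '9']))  # None: no majority
```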
49Multiple algorithms How?
- Rank voting methods
- Borda count - Each alternative is given a number
of points depending on where in the ranking it
has been placed. A selection is given no points
for placing last, one point for placing next to
last, and so on up to N-1 points for placing
first. In other words, the number of points given
to a selection is the number of classes below it
in the ranking.
50Multiple algorithms How?
- Borda Count
- Voter 1: a b c d   gives   a (3), b (2), c (1), d (0)
- Voter 2: c a b d   gives   c (3), a (2), b (1), d (0)
- Voter 3: b d c a   gives   b (3), d (2), c (1), a (0)
- So: a: 3 + 2 + 0 = 5
- b: 2 + 1 + 3 = 6   The winner! (most points)
- c: 1 + 3 + 1 = 5
- d: 0 + 0 + 2 = 2
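A minimal sketch of the Borda count, reproducing the worked example above; the function name and candidate labels are illustrative.

```python
def borda_count(rankings):
    """Each voter ranks the candidates best-to-worst; a candidate gets one
    point for every candidate ranked below it.  Highest total wins."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for place, candidate in enumerate(ranking):
            scores[candidate] = scores.get(candidate, 0) + (n - 1 - place)
    return max(scores, key=scores.get), scores

# The worked example above: three voters, four candidates.
rankings = [['a', 'b', 'c', 'd'],
            ['c', 'a', 'b', 'd'],
            ['b', 'd', 'c', 'a']]
print(borda_count(rankings))  # ('b', {'a': 5, 'b': 6, 'c': 5, 'd': 2})
```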
51Multiple algorithms How?
- Borda Count problems
- Voter 1 a b c
- Voter 2 a b c
- Voter 3 a b c
- Voter 4 b c a
- Voter 5 b c a
- Borda winner is choice b. However, the majority
vote winner is a.
52Rules for voting systems
- If a majority of classifiers select X as their first choice, a voting rule should select X as the overall winner. (Majority)
- If there is an alternative X that could obtain a majority of votes in pair-wise contests against every other alternative, a voting rule should choose X. (Condorcet)
- If X is a winner, and then one or more voters change their preferences in a way favourable to X without changing the order of any other alternative, X should remain the winner. (Monotonicity)
53Other combining schemes
- We can use the magnitude of each classifier's response in a number of ways. The responses for a given unknown object (four possible classes) are shown below.
- MAX RULE

          x1    x2    x3    x4
C1        .6    .3    .0    .1
C2        .5    .1    .3    .2
C3        .9    .0    .0    .1
C4        .5    .1    .2    .1
C5        .4    .3    .3    .0
Max       .9    .3    .3    .2      winner (max) is x1
54Other combining schemes
- MEDIAN RULE

          x1    x2    x3    x4
C1        .6    .3    .0    .1
C2        .5    .1    .3    .2
C3        .9    .0    .0    .1
C4        .5    .1    .2    .1
C5        .4    .3    .3    .0
Median    .5    .1    .2    .1      winner (max) is x1
55Other combining schemes
- SUM RULE

          x1    x2    x3    x4
C1        .6    .3    .0    .1
C2        .5    .1    .3    .2
C3        .9    .0    .0    .1
C4        .5    .1    .2    .1
C5        .4    .3    .3    .0
Sum      2.9    .8    .8    .5      winner (max) is x1
56Other combining schemes
- PRODUCT RULE

          x1     x2    x3    x4
C1        .6     .3    .0    .1
C2        .5     .1    .3    .2
C3        .9     .0    .0    .1
C4        .5     .1    .2    .1
C5        .4     .3    .3    .0
Product   .054   .0    .0    .0     winner (max) is x1
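A minimal sketch of the max, median, sum, and product rules applied to the five-classifier example above; each rule combines the per-class responses column-wise and then takes the class with the largest combined score.

```python
import numpy as np

def combine(responses, rule="sum"):
    """responses: (n_classifiers, n_classes) array of per-class scores.
    Combine column-wise with the chosen rule, then pick the best class."""
    ops = {"max": np.max, "median": np.median, "sum": np.sum, "prod": np.prod}
    combined = ops[rule](responses, axis=0)
    return int(np.argmax(combined)), combined

# The five-classifier, four-class example from the slides.
R = np.array([[.6, .3, .0, .1],
              [.5, .1, .3, .2],
              [.9, .0, .0, .1],
              [.5, .1, .2, .1],
              [.4, .3, .3, .0]])
for rule in ("max", "median", "sum", "prod"):
    print(rule, combine(R, rule))  # every rule selects class x1 (index 0) here
```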
57What can we do if we can recognize objects?
Giving the computer an effective vision sense
allows a great many options not available now.
Many of these are in the realm of user
interface enhancements. A more natural and
intelligent interface is very desirable.
58We are going to look at some vision apps that
work in the world
They must be fast enough to be practical. They must be accurate enough to be useful. They must take into account the usual problems seen in the field (be robust).
59Respirogram classification
Gesture recognition
Hand-printed symbols
60Respirograms
- Why? Wastewater assessment. Don't let toxins into your waste treatment system, because they will kill the useful bacteria.
- Use a model reactor: let in small samples and see what happens. There are two decay processes of interest, and the oxygen consumption of both is measured simultaneously.
61Respirograms
62We used many methods to determine curve class
- Template matching: 96.2%
- Profiles: 90.3%
- Convex deficiencies: 95.4%
- Slope histogram: 95.5%
- Signature: 95.4%
- X1 convexity: 78.4%
- X2 convexity: 94.0%
- Moments: 96.5%
- Rectangularity: 68.7%
- Circularity: 77.6%
63Majority vote is the obvious choice
- There are only two possible results (single and double).
- A simple majority vote gives 97.8% (1.3% better than Moments at 96.5%).
- Consider a classifier that always votes for double: there were 27 class 1 curves in the data set, so 27 errors out of 134, i.e., 79.8%.
64Majority vote is the obvious choice
- Three of the classifiers used have worse performance: circularity, rectangularity, and X1 convexity.
- Deleting those three classifiers from the mix improves the simple majority vote classifier to 98.5%.
- This is 2% better than Moments at 96.5%.
65Gesture recognition
This involves recognizing the hand (as an object) and its pose, possibly as a function of time. My example here is applied to a game, and was done with one of my students (Mark Baumback).
66The System
- Solitaire
- The most-used Windows application
- Straightforward and intuitive interaction
- Ability to pick up and move cards
- Deal new cards or a new game
- Hand gesture input
- Visual and audio output
67The System
- The camera setup
- Looks down onto any flat surface
- Recognizes two different hand postures
- An open hand
- A closed hand
68Open Hand
- An open hand is a request for information
- E.g. Retrieve information about a card or stack
69Closed Hand
- A closed hand carries out an action
- E.g. Pick up a card
70Hand Posture Recognition
- Recognition process
- Segmentation
- Background subtraction
- Skin detection
- Forearm removal
- Palm detection
- Wrist detection
- Recognition
- Nearest neighbor classification
71Stage 1 - Segmentation
- Two stage process
- Background subtraction
- Skin detection
72Background Subtraction
- Subtraction
- YCrCb Color space
73Background Subtraction: Complex Background
(Top) Reference image. (Bottom) After background subtraction.
(Top) Input image. (Bottom) After skin detection.
74Skin Detection
- Skin detection is applied to pixels that have passed the background subtraction.
- Three different skin detectors:
- A static bounding region defined in each of
- HSV color space
- YCrCb color space
- A distance measure
75Skin Detector 1: HSV
- The (R,G,B) values of the pixels are converted into the HSV color space.
- A static bounding region is used, ignoring intensity.
76Skin Detector 2: YCrCb
- The (R,G,B) values of the pixels are converted into the YCrCb color space.
- A static bounding region is used, ignoring intensity.
77Skin Detector 3: Distance Measure
- Calculates the distance between an input pixel and the center of the skin region defined in the YCrCb color space.
- The closest 10% of pixels are chosen.
- This is effective since the background subtraction has already removed a large portion of the image.
- Compensates for:
- Slightly different skin tones
- Changing lighting conditions
78Skin Detector: Decision Fusion
- A pixel is classified as skin if:
- It has passed the background subtraction, AND
- It was selected by both the HSV and YCrCb skin detectors,
- OR it has passed the distance measure.
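A minimal sketch of this decision fusion using OpenCV colour conversions. The bounding-region thresholds, the skin-colour centre, and the "closest 10%" fraction are illustrative guesses, not the values used in the actual system.

```python
import numpy as np
import cv2

def skin_mask(frame_bgr, foreground,
              cr_box=(133, 173), cb_box=(77, 127),
              hue_box=(0, 25), sat_box=(40, 255),
              centre_crcb=(150, 105), dist_frac=0.10):
    """Fuse three skin detectors on foreground pixels.  'foreground' is the
    boolean mask produced by background subtraction.  A pixel is skin if it
    is foreground AND (inside BOTH the HSV and YCrCb bounding regions, OR
    among the closest dist_frac of foreground pixels to the skin-colour
    centre in (Cr, Cb) space).  All thresholds here are guesses."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    h, s = hsv[..., 0], hsv[..., 1]
    cr, cb = ycrcb[..., 1].astype(float), ycrcb[..., 2].astype(float)

    in_hsv = (h >= hue_box[0]) & (h <= hue_box[1]) & \
             (s >= sat_box[0]) & (s <= sat_box[1])
    in_ycrcb = (cr >= cr_box[0]) & (cr <= cr_box[1]) & \
               (cb >= cb_box[0]) & (cb <= cb_box[1])

    # Distance detector: closest fraction of foreground pixels to the centre.
    dist = np.hypot(cr - centre_crcb[0], cb - centre_crcb[1])
    dist[~foreground] = np.inf
    k = max(1, int(dist_frac * foreground.sum()))
    near = np.zeros(foreground.shape, dtype=bool)
    near.flat[np.argsort(dist, axis=None)[:k]] = True

    return foreground & ((in_hsv & in_ycrcb) | near)
```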
79False Skin Classification
- Removing falsely classified skin pixels
- Many non-skin objects fall into skin-toned ranges.
- Most of these objects tend to be less dense than actual skin.
- Segmented pixels which don't have enough neighbors are removed.
- This causes falsely classified groups to break up.
- Pixel groups which are too small are then removed.
80False Skin Classification
[Images: reference image; pixel-neighborhood removal; segmented-pixel removal; forearm removal and recognition.]
81Final Skin Classification
- The final segmented object is then chosen
The segmented object
82Stage 2 - Forearm Removal
- The forearm drastically changes the shape of a segmented object.
- This is usually caused by the arm crossing the image boundary.
- Forearm removal process:
- Locate the palm
- Remove the forearm where the palm intersects the wrist
83Locating The Palm
- Calculate a distance transform
- Largest value indicates
- Location of the palm
- Radius of circle that encompasses the palm
Distance transform with palm circle
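A minimal sketch of palm location with a distance transform (SciPy's distance_transform_edt is used here; the original system's implementation details are not specified in the slides).

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def locate_palm(hand_mask):
    """Locate the palm as the point of the hand mask farthest from any
    background pixel.  The distance-transform value at that point is the
    radius of the largest circle that fits inside the hand: the palm circle."""
    dist = distance_transform_edt(hand_mask)     # distance to nearest zero pixel
    row, col = np.unravel_index(np.argmax(dist), dist.shape)
    return (row, col), dist[row, col]            # centre and radius
```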
84Wrist Detection
- The palm circle is extended by
- 1.2 Radius
- 1.5 Radius
- 1.7 Radius
- Lines tangent to the palm circle are checked
- The line must be below the center of the circle
- The line crosses the largest segmented region
Wrist detection and forearm removal
85Stage 3 - Recognition
- Three different signatures are calculated on the segmented object:
- Center Signature
- Wrist Signature
- Circle Signature
- Each signature is compared against
- Five open and five closed templates
- A nearest neighbor classification is used for
each signature
86The Angle Distance Signature
- Plots the distance from a reference point to the
boundary of the object as a function of angle
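A minimal sketch of the angle-distance signature; the binning resolution and the choice of keeping the farthest boundary point per angular bin are illustrative.

```python
import numpy as np

def angle_distance_signature(boundary, reference, n_bins=360):
    """Distance from a reference point to the object boundary as a function
    of angle.  'boundary' is an (N, 2) array of (row, col) boundary points;
    bins with no boundary point are left at zero."""
    dy = boundary[:, 0] - reference[0]
    dx = boundary[:, 1] - reference[1]
    angles = np.degrees(np.arctan2(dy, dx)) % 360
    dists = np.hypot(dy, dx)
    signature = np.zeros(n_bins)
    bins = (angles / 360 * n_bins).astype(int) % n_bins
    for b, d in zip(bins, dists):
        signature[b] = max(signature[b], d)      # keep the farthest point per bin
    return signature
```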
87Center Signature
- Angle Distance Signature is calculated
- Reference point is the center of the palm circle
- Distances are normalized with respect to the width of the wrist
- Scale invariant
- Incorporates the difference between the height and width of the hand
88Center Signature
89Wrist Signature
- Angle Distance Signature is calculated
- Reference point is the center of the wrist
- Distances are normalized with respect to the
width of the wrist
90Wrist Signature
91Circle Signature
- Built from concentric circles
- Origin is the center of the palm circle
- 7 concentric circles
- The signature consists of the ratio of object pixels to total pixels within each circle
92Circle Signature
93Rotation Compensation
- Required for wrist and center signatures
- A rotation of the hand is a translation of the Angle Distance Signature.
- The input signature is circularly shifted against each template.
- The minimum distance defines the rotation of the hand.
94Rotation Adjustment
- Example of a hand rotated by 90 degrees from
normal
Center signature Wrist signature
Circle signature
95Rotation Results
- The input signature is translated until the
minimum distance is found
Original signature Adjusted
signature
96The Distance Measure
- The distance between the template and the input signature is calculated.
- Mean squared error across the signature.
- The distance is calculated against each template, for each possible rotation, for:
- Five open hands
- Five closed hands
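A minimal sketch combining the rotation compensation and distance measure just described: circularly shift the input signature against each template, keep the smallest mean squared error, and classify by the nearest template over the five open and five closed examples. Names and structure are illustrative.

```python
import numpy as np

def best_rotation_distance(signature, template):
    """Circularly shift the input signature against the template and return
    the smallest mean squared error together with the shift (in bins) that
    produced it; the shift corresponds to the rotation of the hand."""
    best = (np.inf, 0)
    for shift in range(len(signature)):
        mse = np.mean((np.roll(signature, shift) - template) ** 2)
        if mse < best[0]:
            best = (mse, shift)
    return best  # (distance, rotation in bins)

def classify(signature, open_templates, closed_templates):
    """Nearest-neighbour decision over the open and closed templates."""
    d_open = min(best_rotation_distance(signature, t)[0] for t in open_templates)
    d_closed = min(best_rotation_distance(signature, t)[0] for t in closed_templates)
    return "open" if d_open < d_closed else "closed"
```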
97Nearest Neighbor Classification
- There are 5 different templates for each type.
- The one with the minimum distance (after translation) is selected for each signature.
- This defines the classification (open / closed) that each signature has chosen.
98Voting
- The three signatures vote for either an open or closed hand.
- The majority vote usually classifies the hand correctly.
Individual signature accuracy rates
99Voting Results
- The majority vote between the three signatures usually classifies the hand correctly.
- All three signatures must agree to switch the hand from a closed to an open posture.
- Correctly classifying a closed hand is more important than falsely classifying an open hand.
Classification results
100Video Demonstration
101Thank You