Object Recognition and Location in a Practical Context
1
Object Recognition and Location in a Practical
Context
  • J. R. Parker
  • Digital Media Laboratory
  • University of Calgary

2
  • For some workers in the computer vision field,
    determining what objects are being seen is a
    crucial step. Assigning a name or label to
    objects in an image is called object
    recognition, and it is difficult because two
    objects that are really the same will rarely be
    identical pixel by pixel in an image. We must
    find ways to allow for the small natural
    variations that occur within a class (in-class
    variation) while still respecting the larger
    variations that occur between classes.
  • The enhancements to an image that are required to
    enable recognition are often the same as, or
    simpler than, those needed for human image
    interpretation. Reliability is an issue: how well
    are objects correctly recognized (what percentage
    of the time)? In this talk we will discuss object
    recognition in various practical contexts, and
    talk about ways to increase the reliability of
    recognizers. Examples from document analysis,
    biological systems, computer interfaces, and
    simple biometrics will be used.

3
(No Transcript)
4
Object Recognition?
Given a coherent region in an image, assign a
label to that region. Ignore variations due to
illumination, viewing position, occlusion,
background, colour variation, abstraction,
noise, etc. E.g.: "This is a boat."
5
Object Recognition?
6
Object Location?
Where (2D or 3D position) in an image do objects
occur? Does not necessarily involve object
recognition.
7
Real life: unexpected objects
8
Summary of this talk
  • Methods for object recognition
  • pre-processing
  • features
  • What can we do with object recognition?
  • gestures, symbols
  • signatures, searches
  • Tracking small objects - Daphnia
  • Future: audio, entertainment, interfaces

9
Vision/PR is HARD!
  • Ill Defined
  • many simple object classes have a huge number of
    significant variations.
  • Noise is random and hard to model
  • Models used are not defined well enough for
    segmentation

10
Which object is a chair? Which is a model?
11
Vision/PR is HARD!
  • Ill posed
  • One image does not give enough information to
    disambiguate (underconstrained): geometry,
    illumination, reflectance

12
Vision/PR is HARD!
  • Intractable
  • Combinatorial complexity.
  • Parallelism speeds up some steps, but
    communication between phases is a problem.
  • Usually need to solve these problems in real
    time. This means that we usually sacrifice
    accuracy for speed.

13
So what do we do?
  • Restrict domain
  • Do no more than needed. (2 classes)
  • Improve input (On-camera processing)
  • Throw hardware at it (multiple processors)
  • Improve output (algorithm fusion, consensus)

14
Object Recognition
  • Classify methods as statistical or structural
  • Statistical: treat the measurements as independent
    dimensions of a feature space. The closest labelled
    object to a set of measurements indicates the class.
  • Structural: try to match to a pre-defined object
    by using its structure (the relationships between
    parts).

15
Statistical Recognition uses Features
A feature is a measurement or a computed value
based on measurements.
16
Example of structural recognition: the letter E
[Figure: E decomposed into structural parts: a long
vertical stroke; a long horizontal stroke at the top
left; a short horizontal stroke at the centre left;
a long horizontal stroke at the bottom left.]
17
Is this how we see things?
  • Probably not.
  • However, at the recognition stage, things like
    scale and position do not matter (rotation does!)
  • Elements of the object are specified relative to
    one another.
  • Can use formal grammars!

18
I prefer statistical methods
  • Take some objects for which we know the labels,
    and measure them.
  • Can use many measurements.
  • For fun, plot the measurements so we can see them.

19
I prefer statistical methods
  • These objects are nuts, screws, and washers:
    measure area and perimeter, compute circularity.
  • Do these measurements allow us to discriminate
    the three object types (Classes)?

20
Statistical methods
  • Three features here.
  • Do these measurements allow us to discriminate
    the three object types (Classes)? Means below.

Class     Area   Perimeter   Circularity
Screw     117    58          2.3
Washer    378    124         3.5
Nut       224    72          1.9
21
Variation
  • V = s/m: the coefficient of variation
    (standard deviation over mean; see the sketch below)
  • Smaller is better (less variation)

              Area   Perimeter   Circularity
    Screw     0.19   0.15        0.22
    Washer    0.13   0.11        0.39
    Nut       0.06   0.15        0.35
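A minimal Python sketch (numpy assumed) of the coefficient of variation per feature; the example measurements are made up for illustration and are not the data behind the table above.

```python
import numpy as np

def coefficient_of_variation(values):
    """V = s/m: standard deviation divided by the mean; smaller means less variation."""
    values = np.asarray(values, dtype=float)
    return values.std() / values.mean()

# Hypothetical measurements: one row per screw, columns (area, perimeter, circularity).
screws = np.array([[117.0, 58.0, 2.3],
                   [110.0, 55.0, 2.1],
                   [125.0, 62.0, 2.6]])
for name, column in zip(["area", "perimeter", "circularity"], screws.T):
    print(f"screw {name}: V = {coefficient_of_variation(column):.2f}")
```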

22
More than one feature
We can plot the position in feature space of all
objects!
[Scatter plot: objects plotted in feature space, Area (x-axis)
vs. Perimeter (y-axis); each point is labelled W (washer),
N (nut), or S (screw).]
23
Nearest neighbor
[The same Area vs. Perimeter scatter plot; an unknown point
takes the class of its single nearest labelled neighbour.]
24
Nearest 5 neighbor
[The same scatter plot; an unknown point takes the majority
class among its 5 nearest labelled neighbours.]
25
Nearest centroid
[The same scatter plot; an unknown point takes the class of
the nearest class centroid (mean feature vector). See the
sketch below.]
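A sketch of the three decision rules illustrated on the last few slides. The training data is hypothetical (area, perimeter) pairs, not the measured nuts, screws, and washers.

```python
import numpy as np
from collections import Counter

def nearest_neighbor(x, train_X, train_y):
    """Label of the single closest training point."""
    d = np.linalg.norm(train_X - x, axis=1)
    return train_y[int(np.argmin(d))]

def k_nearest_neighbors(x, train_X, train_y, k=5):
    """Majority label among the k closest training points."""
    d = np.linalg.norm(train_X - x, axis=1)
    nearest = [train_y[i] for i in np.argsort(d)[:k]]
    return Counter(nearest).most_common(1)[0][0]

def nearest_centroid(x, train_X, train_y):
    """Label of the class whose mean feature vector (centroid) is closest."""
    labels = np.asarray(train_y)
    classes = sorted(set(train_y))
    centroids = {c: train_X[labels == c].mean(axis=0) for c in classes}
    return min(classes, key=lambda c: np.linalg.norm(centroids[c] - x))

# Hypothetical (area, perimeter) training data: screws (S), washers (W), nuts (N).
train_X = np.array([[117, 58], [120, 60], [378, 124], [360, 120], [224, 72], [230, 75]])
train_y = ["S", "S", "W", "W", "N", "N"]
unknown = np.array([350, 118])
print(nearest_neighbor(unknown, train_X, train_y),
      k_nearest_neighbors(unknown, train_X, train_y, k=3),
      nearest_centroid(unknown, train_X, train_y))
```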
26
Other ways to classify
Decision functions (LDA), support vector machines
(SVM), various other definitions of distance, and
a host of others.
27
Recognition/image processing
Low level: edge detection, noise reduction, texture.
Medium level: segmentation -> feature extraction.
High level: recognition.
28
Features
  • The choice of features depends on the objects
    (e.g. minimize variation) and the properties of
    the mixture.

29
Simple (Scalar) Features
  • Area, perimeter (BAD - scale dependent)
  • Circularity: C = perimeter² / (4π · area)
    (see the sketch below)
  • Convexity: ratio of convex hull perimeter (area)
    to actual object perimeter (area).
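A sketch of how these scalar features might be computed from OpenCV contours; the function and its binary-mask input are illustrative, not the talk's implementation.

```python
import cv2
import numpy as np

def simple_features(binary_mask):
    """Area, perimeter, circularity, and convexity for the largest object
    in a binary mask (uint8 image, object pixels non-zero)."""
    contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    c = max(contours, key=cv2.contourArea)
    area = cv2.contourArea(c)
    perimeter = cv2.arcLength(c, True)
    circularity = perimeter ** 2 / (4 * np.pi * area)   # 1.0 for a perfect circle
    hull = cv2.convexHull(c)
    convexity = cv2.arcLength(hull, True) / perimeter    # < 1 for non-convex shapes
    return area, perimeter, circularity, convexity
```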
30
Simple Features convex deficiencies
31
Vector features
  • Multiple values, sometimes hierarchical
  • Some examples:
  • Fourier descriptor: Fourier coefficients of a
    closed curve.
  • Outline feature: max/min points from the centroid,
    etc.
  • Moment: sum of (deltas to powers) from the centroid.
  • Profile: e.g. distances from the left margin of the
    image.
  • Slope histogram: distribution of slopes in the
    object.
  • Signatures: a 2D-to-1D transform; move along the
    boundary.

32
Vector features- Fourier descriptors
  • Represents the shape of the boundary.
  • Let x(k) and y(k), k = 0, 1, ..., N-1, be the
    boundary points for shape s.
  • a(u) = DFT[x(k) + j·y(k)] = DFT[s(k)] gives us the
    Fourier descriptors of s (see the sketch below).
  • High-frequency content is at the end, low at the
    beginning; e.g. square -> oval under a low-pass
    filter (LPF).
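A minimal numpy sketch of Fourier descriptors; the `keep` parameter for a crude low-pass filter is a hypothetical addition, used only to illustrate the square-to-oval smoothing effect.

```python
import numpy as np

def fourier_descriptors(x, y, keep=None):
    """a(u) = DFT[x(k) + j*y(k)]: Fourier descriptors of the boundary."""
    s = np.asarray(x, dtype=float) + 1j * np.asarray(y, dtype=float)
    a = np.fft.fft(s)
    if keep is not None:
        # Crude low-pass: numpy stores the highest frequencies in the middle
        # of the array, so zero the middle and keep `keep` terms at each end.
        a[keep:-keep] = 0
    return a

def boundary_from_descriptors(a):
    """Inverse DFT recovers the (possibly smoothed) boundary."""
    s = np.fft.ifft(a)
    return s.real, s.imag
```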

33
Vector features- Fourier descriptors
34
Vector features: Outline measures
  • The object is broken into sections for matching.
    Measurements in each section count.

In each section: leftmost point, rightmost point,
greatest slope, greatest thickness.
[Figure: a sample digit divided into sections.]
35
Vector features - Moments
  • I is the image; the sum is taken over all pixels
    in the object.
  • The central moment of order (p,q) is
    μ_pq = Σ (x - x̄)^p (y - ȳ)^q I(x, y),
    with (x̄, ȳ) the centroid (see the sketch below).
  • There are an infinite number of moments.
  • The image can be reconstructed perfectly from all
    of them, approximately from some of them.
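A sketch of central moments over a binary object mask; the choice of which orders to keep as a feature vector is a placeholder, not the selection used in the talk.

```python
import numpy as np

def central_moment(mask, p, q):
    """mu_pq = sum over object pixels of (x - xbar)^p * (y - ybar)^q."""
    ys, xs = np.nonzero(mask)            # coordinates of object pixels
    xbar, ybar = xs.mean(), ys.mean()    # centroid of the object
    return np.sum((xs - xbar) ** p * (ys - ybar) ** q)

def moment_features(mask, max_order=3):
    """A small moment vector as a feature (hypothetical choice of orders 2..max_order)."""
    return [central_moment(mask, p, q)
            for p in range(max_order + 1) for q in range(max_order + 1)
            if 2 <= p + q <= max_order]
```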

36
Vector features: profiles/projections
[Figure: projection profile of a sample digit.]
37
Vector features slope histograms
  • Travel the boundary of the shape.
  • At each sample point, compute the slope of the
    tangent to the boundary at that point. Save in a
    histogram.

[Figure: slope histograms for a square and a circle.]
38
Vector features signatures
  • From the centroid, histogram the distance to the
    boundary at t degree increments.

[Figure: signatures for a square and a circle.]
39
How do we compare two vectors?
Normalize with respect to the vector length (norm),
i.e. sum the elements of the vector and divide all
elements by this sum. Then compute the difference at
each element and sum the squares (see the sketch
below). There are other ways to do this.
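A direct sketch of the comparison just described; normalizing by the Euclidean norm instead would be an equally valid choice.

```python
import numpy as np

def feature_distance(a, b):
    """Normalize each vector by the sum of its elements, then sum squared differences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a / a.sum()
    b = b / b.sum()
    return float(np.sum((a - b) ** 2))
```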
40
How well do these simple methods work?
  • Depends on context.
  • How do we measure success?
  • Do controlled experiments on known data, and
    calculate the percentage correct (success rate).
  • Confusion matrix: not only success rates, but also
    how the classifier fails and which classes are
    confused with one another.

41
Simple example: Handprinted digits
  • Outline features

    Digit     0    1    2    3    4    5    6    7    8    9
    Right    94   95   96  100   95  100   84   94   90   94
    Wrong     1    4    1    0    5    0   10    5    4    6
    Reject    5    1    3    0    0    0    6    1    6    0

  • Overall success: 94.2%

42
Digits using convex deficiencies
Success rates:
    Digit     0    1    2    3    4    5    6    7    8    9
             99   94   98   96   94   90   90   93   95   92
Overall success rate: 94.1%
43
Digits using vector templates (more about these later)
Success rates:
    Digit     0    1    2    3    4    5    6    7    8    9
             99   94   98   96   94   92   90   93   95   92
Overall success rate: 94.3%
44
Digits: shape tracing
Success rates:
    Digit     0    1    2    3    4    5    6    7    8    9
            100   94   92   99   90   94  100   88   99   98
Overall success rate: 93.7%
45
Digits: neural network
48 pixels -> input, 96 hidden, 10 output. Backpropagation.
Success rates:
    Digit     0    1    2    3    4    5    6    7    8    9
             99   93   99   95  100   95   99  100   95   74
Overall success rate: 94.9%
46
So which method do we use?
Method                 Success rate
Outline                94.2%
Convex deficiencies    94.1%
Vector templates       94.3%
Shape tracing          93.7%
Neural net             94.9%
47
We use all of them.
Our laboratory has achieved very high success
rates and reliability by merging multiple
algorithms. This was quite a new idea when we
started (1992), but it has caught on. There are
still many places where this is not routinely
considered.
48
Multiple algorithms: How?
  • Simple voting methods
  • Majority vote: out of the 5 classifiers, a
    majority (3 or more) selects the correct class.
  • Straightforward and intuitive.
  • For the 5-classifier problem we get a success
    rate of 99.4% using a simple majority vote
    (see the sketch below).
  • We deleted classifier 4, which was seen to
    reduce the success rate.
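A sketch of a simple majority vote over classifier outputs; the example labels are hypothetical.

```python
from collections import Counter

def majority_vote(labels):
    """Return the label chosen by a strict majority of classifiers, or None if there is none."""
    winner, count = Counter(labels).most_common(1)[0]
    return winner if count > len(labels) / 2 else None

# e.g. five classifiers voting on one digit image (hypothetical outputs):
print(majority_vote(["3", "3", "8", "3", "9"]))   # -> "3"
```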

49
Multiple algorithms: How?
  • Rank voting methods
  • Borda count - Each alternative is given a number
    of points depending on where in the ranking it
    has been placed. A selection is given no points
    for placing last, one point for placing next to
    last, and so on up to N-1 points for placing
    first. In other words, the number of points given
    to a selection is the number of classes below it
    in the ranking.

50
Multiple algorithms: How?
  • Borda Count
  • Voter 1: a b c d   ->  a (3), b (2), c (1), d (0)
  • Voter 2: c a b d   ->  c (3), a (2), b (1), d (0)
  • Voter 3: b d c a   ->  b (3), d (2), c (1), a (0)
  • So: a = 3 + 2 + 0 = 5
  •     b = 2 + 1 + 3 = 6   The winner! (most points)
  •     c = 1 + 3 + 1 = 5
  •     d = 0 + 0 + 2 = 2
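A sketch of the Borda count as described above; each classifier contributes a best-first ranking of the classes.

```python
from collections import defaultdict

def borda_count(rankings):
    """Each ranking lists classes best-first; a class scores the number of classes ranked below it."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for place, cls in enumerate(ranking):
            scores[cls] += n - 1 - place
    return max(scores, key=scores.get), dict(scores)

# The example from the slide:
winner, scores = borda_count([["a", "b", "c", "d"],
                              ["c", "a", "b", "d"],
                              ["b", "d", "c", "a"]])
print(winner, scores)   # b, {'a': 5, 'b': 6, 'c': 5, 'd': 2}
```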

51
Multiple algorithms: How?
  • Borda Count problems
  • Voter 1: a b c
  • Voter 2: a b c
  • Voter 3: a b c
  • Voter 4: b c a
  • Voter 5: b c a
  • Borda winner is choice b. However, the majority
    vote winner is a.

52
Rules for voting systems
  • If a majority of classifiers select X as their
    first choice, a voting rule should select X as
    the overall winner. (Majority)
  • If there is an alternative X that could obtain a
    majority of votes in pair-wise contests against
    every other alternative, a voting rule should
    choose X. (Condorcet)
  • If X is a winner, and then one or more voters
    change their preferences in a way favourable to X
    without changing the order of any other
    alternative, X should remain the winner.
    (Monotonicity)

53
Other combining schemes
  • We can use the magnitudes of the classifier
    responses in a number of ways. Below are the
    responses of five classifiers (C1-C5) for one
    unknown object over four possible classes (x1-x4).
  • MAX RULE

           x1    x2    x3    x4
    C1     .6    .3    .0    .1
    C2     .5    .1    .3    .2
    C3     .9    .0    .0    .1
    C4     .5    .1    .2    .1
    C5     .4    .3    .3    .0
    Max    .9    .3    .3    .2    winner (max) is x1

54
Other combining schemes
  • MEDIAN RULE

            x1    x2    x3    x4
    C1      .6    .3    .0    .1
    C2      .5    .1    .3    .2
    C3      .9    .0    .0    .1
    C4      .5    .1    .2    .1
    C5      .4    .3    .3    .0
    Median  .5    .1    .2    .1    winner (max) is x1

55
Other combining schemes
  • SUM RULE

           x1    x2    x3    x4
    C1     .6    .3    .0    .1
    C2     .5    .1    .3    .2
    C3     .9    .0    .0    .1
    C4     .5    .1    .2    .1
    C5     .4    .3    .3    .0
    Sum    2.9   .8    .8    .5    winner (max) is x1

56
Other combining schemes
  • PRODUCT RULE

             x1     x2    x3    x4
    C1       .6     .3    .0    .1
    C2       .5     .1    .3    .2
    C3       .9     .0    .0    .1
    C4       .5     .1    .2    .1
    C5       .4     .3    .3    .0
    Product  .054   0     0     0    winner (max) is x1
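A sketch covering all four combining rules at once, applied to the classifier-response matrix from these slides.

```python
import numpy as np

def combine(responses, rule):
    """responses: classifiers x classes matrix of scores; returns (winning class index, combined scores)."""
    combined = {"max":     responses.max(axis=0),
                "median":  np.median(responses, axis=0),
                "sum":     responses.sum(axis=0),
                "product": responses.prod(axis=0)}[rule]
    return int(np.argmax(combined)), combined

# The 5-classifier, 4-class example from the slides:
R = np.array([[.6, .3, .0, .1],
              [.5, .1, .3, .2],
              [.9, .0, .0, .1],
              [.5, .1, .2, .1],
              [.4, .3, .3, .0]])
for rule in ("max", "median", "sum", "product"):
    print(rule, combine(R, rule))   # every rule picks class index 0 (x1) here
```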

57
What can we do if we can recognize objects ?
Giving the computer an effective vision sense
allows a great many options not available now.
Many of these are in the realm of user
interface enhancements. A more natural and
intelligent interface is very desirable.
58
We are going to look at some vision applications
that work in the real world
They must be fast enough to be practical. They
must be accurate enough to be useful. They take
into account the usual problems seen in the field
(robust).
59
Respirogram classification
Gesture recognition
Hand-printed symbols
60
Respirograms
  • Why? Wastewater assessment. Don't let toxins
    into your waste treatment system, because they
    will kill the useful bacteria.
  • Use a model reactor. Let in small samples and see
    what happens. There are two decay processes of
    interest, and oxygen consumption of both is
    measured simultaneously.

61
Respirograms
  • Two types of curve.

62
We used many methods to determine curve class
  • Template matching: 96.2%
  • Profiles: 90.3%
  • Convex deficiencies: 95.4%
  • Slope histogram: 95.5%
  • Signature: 95.4%
  • X1 convexity: 78.4%
  • X2 convexity: 94.0%
  • Moments: 96.5%
  • Rectangularity: 68.7%
  • Circularity: 77.6%

63
Majority vote is obvious choice
  • There are only two possible results (single and
    double).
  • A simple majority vote gives 97.8%, which is 1.3%
    better than Moments at 96.5%.
  • Consider a classifier that always votes for
    double. There were 27 class 1 curves in the data
    set, so it makes 27 errors out of 134: 79.8%.

64
Majority vote is obvious choice
  • Three of the classifiers used have a worse
    performance: circularity, rectangularity, and X1
    convexity.
  • Deleting those three classifiers from the mix
    improves the simple majority vote classifier to
    98.5%.
  • This is 2% better than Moments at 96.5%.

65
Gesture recognition
This involves the recognition of the hand (as an
object) and its pose, possibly as a function of
time. My example here is applied to a game, and
was done with one of my students (Mark Baumback).
66
The System
  • Solitaire
  • The most-used Windows application
  • Straightforward and intuitive interaction
  • Ability to pick up and move cards
  • Deal new cards or a new game
  • Hand gesture input
  • Visual and audio output

67
The System
  • The camera setup
  • Looks down onto any flat surface
  • Recognizes two different hand postures
  • An open hand
  • A closed hand

68
Open Hand
  • An open hand is a request for information
  • E.g. Retrieve information about a card or stack

69
Closed Hand
  • A closed hand carries out an action
  • E.g. Pick up a card

70
Hand Posture Recognition
  • Recognition process
  • Segmentation
  • Background subtraction
  • Skin detection
  • Forearm removal
  • Palm detection
  • Wrist detection
  • Recognition
  • Nearest neighbor classification

71
Stage 1 - Segmentation
  • Two stage process
  • Background subtraction
  • Skin detection

72
Background Subtraction
  • Subtraction
  • YCrCb Color space

73
Background Subtraction: Complex Background
[Figures: (top) reference image, (bottom) after
background subtraction; (top) input image,
(bottom) after skin detection.]
74
Skin Detection
  • Skin detection is applied to pixels that have
    passed the background subtraction
  • Three different skin detectors
  • A static bounding region defined in both
  • HSV color space
  • YCrCb color space
  • Distance measure

75
Skin Detector 1: HSV
  • The (R,G,B) values of the pixels are converted
    into the HSV color space
  • A static bounding region
  • Ignoring intensity

76
Skin Detector 2: YCrCb
  • The (R,G,B) values of the pixels are converted
    into YCrCb
  • A static bounding region
  • Ignoring intensity

77
Skin Detector 3: Distance Measure
  • Calculates the distance between the center of the
    skin region defined in the YCrCb color space and
    an input pixel
  • The closest 10 pixels are chosen
  • Is effective since the background subtraction has
    already removed a large portion of the image
  • Compensates for
  • Slightly different skin tones
  • Altering lighting conditions

78
Skin Detector: Decision Fusion
  • A pixel is classified as skin if
  • It has passed the background subtraction
  • It was selected by
  • The HSV and YCrCb skin detectors
  • OR it has passed the distance measure
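A sketch of this fusion rule in Python with OpenCV. All numeric bounds, the reference skin colour, and the 10% fraction are placeholder values for illustration, not the ones used in the actual system.

```python
import numpy as np
import cv2

def skin_mask(bgr, fg_mask):
    """Pixel is skin if it survived background subtraction AND
    ((inside both the HSV and YCrCb bounding regions) OR among the
    closest pixels to a reference skin colour in YCrCb)."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    ycc = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)

    # Hypothetical static bounding regions (intensity channels left unconstrained).
    hsv_ok = cv2.inRange(hsv, (0, 40, 0), (25, 180, 255)) > 0
    ycc_ok = cv2.inRange(ycc, (0, 135, 85), (255, 180, 135)) > 0

    # Distance to a hypothetical reference skin colour in (Cr, Cb); keep the closest 10%.
    ref = np.array([150.0, 110.0])
    dist = np.linalg.norm(ycc[..., 1:3].astype(float) - ref, axis=2)
    thresh = np.percentile(dist[fg_mask > 0], 10) if np.any(fg_mask) else 0
    close_ok = dist <= thresh

    return (fg_mask > 0) & ((hsv_ok & ycc_ok) | close_ok)
```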

79
False Skin Classification
  • Removing falsely classified skin pixels
  • Many non-skin objects fall into skin-toned ranges
  • Most of these objects tend to be less dense than
    actual skin
  • Segmented pixels which don't have enough
    neighbors are removed
  • Causes falsely classified groups to break up
  • Pixel groups which are too small are then removed

80
False Skin Classification
[Figures: reference image; pixel-neighborhood removal;
segmented-pixel removal; forearm removal and recognition.]
81
Final Skin Classification
  • The final segmented object is then chosen

The segmented object
82
Stage 2: Forearm Removal
  • The forearm drastically changes the shape of a
    segmented object
  • Usually caused by intersection of the image
    boundaries and the arm
  • Forearm removal process
  • Locate the palm
  • Remove forearm where the palm intersects the
    wrist

83
Locating The Palm
  • Calculate a distance transform
  • Largest value indicates
  • Location of the palm
  • Radius of circle that encompasses the palm

Distance transform with palm circle
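A sketch of palm location via the Euclidean distance transform (scipy assumed); `hand_mask` stands for the segmented binary hand image.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def locate_palm(hand_mask):
    """The maximum of the distance transform gives the palm centre, and the
    value there is the radius of the largest circle that fits inside the hand."""
    dist = distance_transform_edt(hand_mask)
    cy, cx = np.unravel_index(np.argmax(dist), dist.shape)
    return (cx, cy), dist[cy, cx]
```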
84
Wrist Detection
  • The palm circle is extended by
  • 1.2 x radius
  • 1.5 x radius
  • 1.7 x radius
  • Lines tangent to the palm circle are checked
  • The line must be below the center of the circle
  • The line crosses the largest segmented region

Wrist detection and forearm removal
85
Stage 3 - Recognition
  • Three different signatures are calculated on the
    segmented object
  • Center Signature
  • Wrist Signature
  • Circle Signature
  • Each signature is compared against
  • Five open and five closed templates
  • A nearest neighbor classification is used for
    each signature

86
The Angle Distance Signature
  • Plots the distance from a reference point to the
    boundary of the object as a function of angle
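A sketch of an angle-distance signature; the number of angle bins and the choice of keeping the farthest boundary point per bin are illustrative implementation details.

```python
import numpy as np

def angle_distance_signature(boundary, reference, n_bins=360):
    """For each angle bin, record the distance from the reference point to the boundary.
    boundary: (N, 2) array of (x, y) boundary points."""
    dx = boundary[:, 0] - reference[0]
    dy = boundary[:, 1] - reference[1]
    angles = (np.degrees(np.arctan2(dy, dx)) + 360) % 360
    dist = np.hypot(dx, dy)
    signature = np.zeros(n_bins)
    bins = (angles / (360 / n_bins)).astype(int) % n_bins
    for b, d in zip(bins, dist):
        signature[b] = max(signature[b], d)   # keep the farthest boundary point per bin
    return signature
```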

87
Center Signature
  • Angle Distance Signature is calculated
  • Reference point is the center of the palm circle
  • Distances are normalized with respect to the
    width of the wrist
  • Scale invariant
  • Incorporates the difference between the height and
    width of the hand

88
Center Signature
89
Wrist Signature
  • Angle Distance Signature is calculated
  • Reference point is the center of the wrist
  • Distances are normalized with respect to the
    width of the wrist

90
Wrist Signature
91
Circle Signature
  • Built from concentric circles
  • Origin is the center of the palm circle
  • 7 concentric circles
  • The signature consists of the number of object
    pixels versus total pixels within each circle
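A sketch of the circle signature as described above; the circle radii are spaced evenly out to the supplied maximum radius, an assumption made for illustration.

```python
import numpy as np

def circle_signature(hand_mask, center, max_radius, n_circles=7):
    """For each of n concentric circles around the palm centre, record the fraction
    of pixels inside the circle that belong to the object."""
    h, w = hand_mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(xs - center[0], ys - center[1])
    signature = []
    for i in range(1, n_circles + 1):
        inside = dist <= max_radius * i / n_circles
        signature.append((hand_mask[inside] > 0).sum() / inside.sum())
    return np.array(signature)
```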

92
Circle Signature
93
Rotation Compensation
  • Required for wrist and center signatures
  • A rotation of the hand is a translation of the
    Angle Distance Signature
  • Input signature is circularly shifted along each
    template
  • The minimum distance defines the rotation of the
    hand
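A sketch combining this rotation compensation with the mean-squared-error distance described two slides later; the signature and template are assumed to be equal-length 1-D arrays.

```python
import numpy as np

def best_rotation_distance(signature, template):
    """Circularly shift the input signature against the template; the shift with the
    minimum mean squared error gives both the distance and the hand's rotation."""
    n = len(signature)
    errors = [np.mean((np.roll(signature, shift) - template) ** 2) for shift in range(n)]
    best = int(np.argmin(errors))
    return errors[best], best   # (distance, rotation in signature bins)
```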

94
Rotation Adjustment
  • Example of a hand rotated by 90 degrees from
    normal

[Figures: center, wrist, and circle signatures for the rotated hand.]
95
Rotation Results
  • The input signature is translated until the
    minimum distance is found

[Figures: original signature and adjusted signature.]
96
The Distance Measure
  • The distance between the template and the input
    signature is calculated
  • Mean squared error across signature
  • The distance is calculated between each template
    for each possible rotation for
  • Five open hands
  • Five closed hands

97
Nearest Neighbor Classification
  • There are 5 different templates for each type
  • The one with the minimum distance (after
    translation) is selected for each signature
  • This defines the classification (open / closed)
    that each signature has chosen

98
Voting
  • The three signatures vote for either an open or
    closed hand
  • The majority vote usually classifies the hand

[Table: individual signature accuracy rates.]
99
Voting Results
  • The majority vote between the three signatures
    usually classifies the hand
  • All three signatures must agree to switch the
    hand from a closed to an open posture
  • Correctly classifying a closed hand is more
    important than falsely classifying an open hand


[Table: classification results.]
100
Video Demonstration
101
Thank You
  • Questions?

102
(No Transcript)