Title: Gesture Recognition

1
Gesture Recognition

1. Recognition of parameterized gestures
2. Real-time sign language recognition using a
single video camera
Jaron Schaeffer
Jaron.Schaeffer_at_jayweb.de

2
TOC

Recognition of parameterized gestures
Parametric gestures
Previous Approaches
Parametric Gaussian Hidden Markov Models
Training/Testing
Results
Real-time sign language recognition using a
single video camera
Objective
Feature Extraction
The desk-based recognizer
The wearable-based recognizer

3
Part 1: Recognition and Interpretation of
Parametric Gesture
1. Parametric gestures
2. Previous Approaches
3. Parametric Gaussian Hidden Markov Models
4. Training/Testing
5. Results
4
Recognition and Interpretation of Parametric
Gesture

What is a parametric gesture?
A gesture that has a parameter T
which is needed to fully understand
the gesture.
In the example, T is the size of
the fish, given by the distance
between the signer's hands.
Another example is a pointing gesture,
where T is the direction pointed
to.

I caught a fish. It was this big.
5
Recognition of Parametric Gesture: Previous
Approaches (1)

Ad-hoc method for each gesture to be
recognized
Use an ad-hoc method to extract the parameter T
for each different parametric gesture
Problems:
Difficult to write
Only works for gestures already labeled
Unknown gestures have to be modelled as noise
from an existing prototype
A new method is needed for each gesture

6
Recognition of Parametric Gesture: Previous
Approaches (2)

Use multiple HMMs to cover the parameter
space
Use an HMM for each possible value of T in
parameter space
Problems:
It is unclear how many separate models will be
necessary
As dimensionality of parameter space increases, a
large number of models will be needed
Unreasonable demands on the amount of training
data
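A back-of-the-envelope sketch of the growth problem, under the hypothetical assumption that each dimension of T is quantized into k values and one HMM is trained per grid cell: d dimensions then need k^d models, each with its own training sequences. The numbers used (10 levels, 15 examples per model) are illustrative only.

```python
def models_needed(levels_per_dim: int, dims: int) -> int:
    """HMMs required if each of `dims` parameter dimensions is quantized
    into `levels_per_dim` values and one HMM is trained per grid cell
    (a hypothetical quantization scheme, for illustration)."""
    return levels_per_dim ** dims


def sequences_needed(levels_per_dim: int, dims: int, examples_per_model: int) -> int:
    """Training sequences required if every model needs its own examples."""
    return models_needed(levels_per_dim, dims) * examples_per_model


for dims in (1, 2, 3):
    print(dims, models_needed(10, dims), sequences_needed(10, dims, 15))
```

Each extra parameter dimension multiplies both the model count and the training data by k, which is the "unreasonable demands" problem above.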

7
Repetition: Standard continuous Gaussian HMMs
8
Repetition: Standard continuous Gaussian HMMs

Example Gaussian HMM (figure)

Likelihood for an output of 2 is about 10, given that the
system is in state 2
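As a refresher: in a continuous Gaussian HMM each state j emits observations from its own Gaussian N(x; mu_j, sigma_j^2), and the probability of a whole observation sequence is computed with the forward algorithm (see Rabiner and Juang in the References). Below is a minimal 1-D sketch with NumPy, rescaling at each step to avoid underflow; the three-state parameters in the usage lines are made-up values, not read off the slide's figure.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) at x (broadcasts over states)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def forward_log_likelihood(obs, pi, A, means, variances):
    """log P(obs | model) for a 1-D continuous Gaussian HMM.

    obs: (T,) observations; pi: (N,) initial state probabilities;
    A: (N, N) transitions, A[i, j] = P(q_{t+1} = j | q_t = i);
    means, variances: (N,) parameters of each state's output Gaussian.
    """
    obs = np.asarray(obs, dtype=float)
    alpha = pi * gaussian_pdf(obs[0], means, variances)
    scale = alpha.sum()
    log_lik = np.log(scale)
    alpha = alpha / scale
    for x in obs[1:]:
        alpha = (alpha @ A) * gaussian_pdf(x, means, variances)
        scale = alpha.sum()            # P(x_t | x_1..x_{t-1})
        log_lik += np.log(scale)
        alpha = alpha / scale          # rescale to avoid numerical underflow
    return float(log_lik)

# Hypothetical left-to-right 3-state model.
pi = np.array([1.0, 0.0, 0.0])
A = np.array([[0.8, 0.2, 0.0],
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])
means = np.array([0.0, 2.0, 4.0])
variances = np.array([0.25, 0.25, 0.25])
print(forward_log_likelihood([0.1, 1.9, 2.1, 3.8], pi, A, means, variances))
```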
9
Parametric Gaussian HMMs: The model
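In Wilson and Bobick's parametric HMM (see References), each state j keeps a Gaussian output distribution, but its mean depends linearly on the parameter: mu_hat_j(T) = W_j T + mu_bar_j, while the covariance Sigma_j does not depend on T. The W_j here are the matrices discussed on the results slides. A small NumPy sketch of that output density follows; the function names are our own and the numbers in the usage lines are invented.

```python
import numpy as np

def parametric_mean(W_j, mu_bar_j, theta):
    """Mean of state j's output Gaussian for parameter value theta:
    mu_hat_j(theta) = W_j @ theta + mu_bar_j."""
    return W_j @ np.atleast_1d(theta) + mu_bar_j

def parametric_output_density(x, W_j, mu_bar_j, Sigma_j, theta):
    """Multivariate Gaussian density N(x; mu_hat_j(theta), Sigma_j)."""
    diff = np.asarray(x, dtype=float) - parametric_mean(W_j, mu_bar_j, theta)
    d = diff.shape[0]
    _, logdet = np.linalg.slogdet(Sigma_j)
    maha = diff @ np.linalg.solve(Sigma_j, diff)   # squared Mahalanobis distance
    return np.exp(-0.5 * (maha + logdet + d * np.log(2.0 * np.pi)))

# Hypothetical 2-D observation whose first component grows with T.
W = np.array([[1.0], [0.0]])
mu_bar = np.array([0.0, 5.0])
print(parametric_mean(W, mu_bar, 20.0))
print(parametric_output_density([20.0, 5.0], W, mu_bar, np.eye(2), 20.0))
```

A single set of (W_j, mu_bar_j, Sigma_j) thus covers the whole parameter range, instead of one HMM per value of T.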
10
Parametric Gaussian HMMs: Training

Training means: set the HMM parameters to
maximize the probability of the training
sequences
Each training sequence is paired with a value of
T
The Baum-Welch form of the expectation-maximization
algorithm is used to update the parameters of the output
probability distributions
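In the parametric case, the maximization step for each state's mean becomes a weighted linear regression of the observations on [T; 1], weighted by the state-occupation probabilities gamma from the forward/backward pass. The sketch below follows the update given by Wilson and Bobick as we read it; the function name and array layout are our own, and the covariance update is omitted. It needs at least p + 1 distinct values of T to be solvable.

```python
import numpy as np

def update_state_mean_params(sequences, thetas, gammas):
    """Re-estimate W_j and mu_bar_j for one state j.

    sequences: list of (T_k, d) observation arrays x_kt
    thetas:    list of (p,) parameter vectors, one per training sequence
    gammas:    list of (T_k,) probabilities gamma_kt(j) of being in state j
    Returns (W_j with shape (d, p), mu_bar_j with shape (d,)).
    """
    d = sequences[0].shape[1]
    p = np.atleast_1d(thetas[0]).shape[0]
    lhs = np.zeros((d, p + 1))      # sum_k sum_t gamma * x_kt * Omega_k^T
    rhs = np.zeros((p + 1, p + 1))  # sum_k sum_t gamma * Omega_k * Omega_k^T
    for X, theta, gamma in zip(sequences, thetas, gammas):
        omega = np.append(np.atleast_1d(theta).astype(float), 1.0)  # [theta; 1]
        lhs += np.outer(gamma @ X, omega)        # Omega_k is constant over t
        rhs += gamma.sum() * np.outer(omega, omega)
    Z = np.linalg.solve(rhs.T, lhs.T).T          # Z = [W_j | mu_bar_j]
    return Z[:, :p], Z[:, p]
```

With all gamma equal to 1, this reduces to ordinary least squares of x on [T; 1].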

11
Training Parametric Gaussian HMMs: Expectation-Maximization
algorithm

Assumption: in addition to the observable data
(the observation sequence x_t), there is hidden
data (the state sequence q_t)
Expectation-Maximization algorithm
Expectation:
Compute/guess the value of the hidden data given
the observable data and the current parameters
(Forward/Backward algorithm)
Maximization:
Given this guess at the hidden data, compute an
updated value of the parameters
Repeat until satisfied (change in parameters is
small)
A lot of math; no more details here
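The expectation step needs, for every frame t and state j, the posterior gamma_t(j) = P(q_t = j | x_1..x_T), which the forward/backward algorithm provides. A scaled sketch with NumPy follows; the interface is hypothetical. Emission likelihoods are passed in precomputed, so the same routine works for a standard Gaussian HMM and for the parametric model, where they depend on the current value of T.

```python
import numpy as np

def state_posteriors(B, pi, A):
    """Scaled forward/backward pass.

    B:  (T, N) emission likelihoods, B[t, j] = p(x_t | q_t = j)
    pi: (N,) initial state probabilities
    A:  (N, N) transition matrix
    Returns gamma with shape (T, N), gamma[t, j] = P(q_t = j | x_1..x_T).
    """
    T, N = B.shape
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    scale = np.zeros(T)
    # Forward pass, normalizing alpha at every frame.
    alpha[0] = pi * B[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    # Backward pass, reusing the forward scale factors.
    for t in range(T - 2, -1, -1):
        beta[t] = (A @ (B[t + 1] * beta[t + 1])) / scale[t + 1]
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```

The maximization step then re-estimates the output distributions using these gamma values as weights.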

12
Training Parametric Gaussian HMMs: Training
results

After applying the EM algorithm to the training
sequences, we get new values for the parameters
of the output probability distributions
Ready for testing!

13
Recognition of Parametric Gesture: Testing

Testing
Given a parameterized HMM and an input sequence,
we wish to compute T and the probability of the
input sequence.
Extracting T
Complicated in contrast to normal HMM testing
Again, use an Expectation-Maximization (EM)
algorithm, which finally yields an estimate of T
Probability of the input sequence given T: use
Viterbi.
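With the trained model fixed, the maximization step for T has a closed form: a gamma-weighted, covariance-weighted least-squares fit of the observations to the per-state means W_j T + mu_bar_j. The NumPy sketch below implements this single update, following the form in Wilson and Bobick's paper as we read it; the function name and argument layout are hypothetical. In a full implementation it alternates with a forward/backward pass that recomputes gamma under the current T, until T stops changing.

```python
import numpy as np

def estimate_theta(X, gamma, Ws, mu_bars, Sigmas):
    """One M-step update of the gesture parameter theta.

    X:       (T, d) observations of the test sequence
    gamma:   (T, N) state posteriors from the E-step
    Ws:      list of N (d, p) matrices W_j
    mu_bars: list of N (d,) vectors mu_bar_j
    Sigmas:  list of N (d, d) covariance matrices Sigma_j
    """
    p = Ws[0].shape[1]
    lhs = np.zeros((p, p))
    rhs = np.zeros(p)
    for j, (W, mu_bar, Sigma) in enumerate(zip(Ws, mu_bars, Sigmas)):
        WtSinv = W.T @ np.linalg.inv(Sigma)     # W_j^T Sigma_j^{-1}
        g = gamma[:, j]
        lhs += g.sum() * (WtSinv @ W)
        rhs += WtSinv @ (g @ (X - mu_bar))      # sum_t g_t (x_t - mu_bar_j)
    return np.linalg.solve(lhs, rhs)
```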

14
Recognition of Parametric Gesture: Results
STIVE input and output

Testing for the fish size
gesture
30 examples of the fish gesture were collected
using STIVE (STereo INteractive Virtual
Environment) at a frame rate of 20 Hz
STIVE returned the 3D positions of head and hands
Each sequence is on average 43 samples long
T interpreted as fish size in inches
Values varied from 7.7 inches (small fish) to 36.6
inches (respectable catch)

15
Recognition of Parametric Gesture: Results
STIVE input and output

Testing for the fish size
gesture
6-state parameterized HMM with no skip
transitions or back transitions
Training with 15 randomly chosen sequences out of
the 30, the rest for testing

16
Recognition of Parametric Gesture: Results
Testing for the size gesture
(Figure: mean and standard deviation)
Average absolute error of only 0.16 inches
17
Recognition of Parametric Gesture: Results

Testing for the pointing gesture
HMM now parameterized by more than one variable
(the X/Y position on a plane in front of the
user)
Motion capture system to record the wrist position of the
right hand at a frame rate of 30 Hz
50 sequences collected
T interpreted as the position of the wrist on the
pointing plane
8-state parameterized HMM with no skip
transitions or back transitions
20 sequences for training, 30 for testing

18
Recognition of Parametric Gesture: Results

Testing for the pointing gesture: results (figure)

19
Recognition of Parametric Gesture: Results under
noise
The average error as a function of noise

N(0, x)-distributed noise added for testing
f(x) is the mean error between T estimated (or measured)
under noise and T measured in the noise-free case
Under noise, the HMM performs even better than
directly measuring T
Why?
Direct measuring is more sensitive to noise,
since only one still image is used to measure T;
the HMM uses the complete sequence to extract T.

(Plot: f(x) against the noise level x)
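The argument that a whole sequence is less sensitive to noise than a single frame can be illustrated with a deliberately simplified toy, which is not the PHMM itself: estimate a constant from one noisy sample versus from the mean of 43 noisy samples (43 being the average length of the fish sequences). The noise levels, trial count and seed are arbitrary choices, and noise_std is used as the standard deviation.

```python
import numpy as np

def mean_abs_errors(true_value, noise_std, n_samples=43, trials=2000, seed=0):
    """Compare a single-frame estimate with a whole-sequence (averaged) estimate.

    Returns (mean error using only one frame, mean error using all frames),
    each averaged over `trials` simulated noisy sequences.
    """
    rng = np.random.default_rng(seed)
    noisy = true_value + rng.normal(0.0, noise_std, size=(trials, n_samples))
    single_frame = np.abs(noisy[:, 0] - true_value).mean()
    whole_sequence = np.abs(noisy.mean(axis=1) - true_value).mean()
    return single_frame, whole_sequence

for sd in (0.5, 1.0, 2.0):
    print(sd, mean_abs_errors(20.0, sd))
```

For independent Gaussian noise, averaging n frames shrinks the error by roughly 1/sqrt(n); the PHMM does not simply average, but it likewise pools evidence from every frame.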
20
Recognition of Parametric Gesture: Results

Results are quite good
Why?
The magnitude of W_j is greatest for states corresponding
to the middle phase of the gestures
In the middle phases of the gestures, variation
of T maximally impacts the execution of the
gesture
System automatically learns which segment in the
gesture is most diagnostic of T

21
Part 2: Real-time sign language recognition using
a single video camera
1. Objective
2. Feature Extraction
3. The desk-based recognizer
4. The wearable-based recognizer
22
Objective

Recognition of sentence-level American Sign
Language (ASL)
Sentences of the form
personal pronoun, verb, noun, adjective,
(same) personal pronoun
are to be recognized
Example: "I like cars red"
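The part-of-speech grammar can be expressed as a simple check on a five-word sentence. The word classes below are a small hypothetical subset chosen for illustration, not the actual 40-word lexicon.

```python
# Hypothetical word classes (an illustrative subset, not the real lexicon).
WORD_CLASSES = {
    "pronoun": {"I", "you", "he", "we", "they"},
    "verb": {"like", "want", "buy"},
    "noun": {"car", "book", "table"},
    "adjective": {"red", "blue", "big"},
}

PATTERN = ["pronoun", "verb", "noun", "adjective", "pronoun"]

def fits_grammar(sentence):
    """True if `sentence` (a list of words) has the form
    pronoun verb noun adjective <same pronoun>."""
    if len(sentence) != len(PATTERN):
        return False
    if any(word not in WORD_CLASSES[cls] for word, cls in zip(sentence, PATTERN)):
        return False
    return sentence[0] == sentence[-1]   # the closing pronoun repeats the first

print(fits_grammar(["I", "like", "car", "red", "I"]))
```

During recognition, such a grammar restricts which word models may follow each other, so the recognizer only has to choose within one word class at each position.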

23
The American Sign Language

Language of choice for most deaf people in the United
States
Uses approx. 6000 gestures for common words and
finger spelling for communicating obscure words
Signed conversations proceed at about the pace of
spoken conversation
Some aspects of ASL are ignored for simplification:
storing objects in space for later reference,
eyebrow movements for questions or directives

24
Understanding ASL: The Task

Two extensible HMM-based recognition systems are presented,
both using one color camera:
Desk-mounted camera in front of the user
Camera mounted in a cap worn by the user
The tracking stage does not attempt a fine description
of hand shape; instead, it concentrates on the
evolution of the gestures through time
40-word test lexicon with words that would
generate coherent sentences given the grammar
constraint

25
Understanding ASL: Hidden Markov Modeling

Estimate the number of different states involved
in specifying a sign to determine the initial HMM
topology
For less complicated signs, skip transitions can
be introduced
Here, a 4-state HMM with one skip transition was
determined to be appropriate
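The left-to-right topologies used in this talk (6 states without skip or back transitions for the fish gesture, 4 states with one skip transition per sign) can be written down as initial transition matrices. This sketch only marks which transitions are allowed and gives them equal starting probabilities before training. The slide does not say where the single skip sits, so the skip from state 0 to state 2 below is an assumption.

```python
import numpy as np

def left_to_right_transitions(n_states, skips=()):
    """Initial transition matrix for a left-to-right HMM.

    Every state may stay or advance to the next state; `skips` lists extra
    forward (from_state, to_state) jumps. Back transitions are never allowed.
    Allowed transitions out of each state start with equal probability.
    """
    allowed = np.zeros((n_states, n_states), dtype=bool)
    for i in range(n_states):
        allowed[i, i] = True                  # self-loop
        if i + 1 < n_states:
            allowed[i, i + 1] = True          # advance to the next state
    for i, j in skips:
        if j <= i:
            raise ValueError("skip transitions must move forward")
        allowed[i, j] = True
    A = allowed.astype(float)
    return A / A.sum(axis=1, keepdims=True)

A_fish = left_to_right_transitions(6)                  # no skips
A_sign = left_to_right_transitions(4, skips=[(0, 2)])  # skip position assumed
print(A_sign)
```

Zero entries stay zero under Baum-Welch re-estimation, so the topology chosen here is preserved during training.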

26
Understanding ASL: Feature extraction - Hardware

Hands are tracked in real time using a single
color camera
320x243 pixel resolution
A Silicon Graphics 200 MHz workstation maintains
hand tracking at 10 frames per second
(sufficient)
Tracking relies on the natural skin color of the hands

27
Understanding ASL: Feature extraction - Hand
segmentation

Hand segmentation
To segment each hand initially, find a pixel of
the natural hand color in the image
Take this pixel as a seed and tolerantly grow the
hand region by checking the 8 neighbours for the
appropriate color
The labels "left hand" and "right hand" are assigned
to whichever blob is leftmost and rightmost, respectively.
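The seeded, tolerant region growing over the 8-neighbourhood can be sketched as a breadth-first flood fill, followed by labelling the two blobs by the column of their centroids. The color distance (Euclidean in RGB), the tolerance value, and comparing each pixel against the seed color are all assumptions made for this illustration; the slide does not specify them.

```python
from collections import deque
import numpy as np

# The 8 neighbours of a pixel as (row offset, column offset).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def grow_region(image, seed, tolerance):
    """Boolean mask of pixels connected to `seed` (8-neighbourhood) whose color
    lies within `tolerance` (assumed: Euclidean RGB distance) of the seed color."""
    h, w, _ = image.shape
    seed_color = image[seed].astype(float)
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc]:
                if np.linalg.norm(image[nr, nc].astype(float) - seed_color) <= tolerance:
                    mask[nr, nc] = True
                    queue.append((nr, nc))
    return mask

def label_hands(mask_a, mask_b):
    """Call the blob whose centroid is further left in the image 'left hand'."""
    col_a = np.nonzero(mask_a)[1].mean()
    col_b = np.nonzero(mask_b)[1].mean()
    if col_a <= col_b:
        return {"left hand": mask_a, "right hand": mask_b}
    return {"left hand": mask_b, "right hand": mask_a}
```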

(Figure: seed pixels and the segmented right-hand and left-hand blobs)
What about occluding hands?
28
Understanding ASL: Feature extraction - Features
used

A 16-element feature vector is constructed, with
eight features for each hand:
Centroid (X,Y) position
Change in (X,Y) from the previous frame
Area in pixels
Angle of the axis of least inertia(1) (found from the first
eigenvector of the blob's second-moment matrix)
Length of this eigenvector
Eccentricity(2) of the bounding ellipse

(1) Inertia (German: Trägheit)
(2) Eccentricity: here, the deviation from a circular shape
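The eight per-hand features can be computed from a blob's pixel mask with image moments: the centroid, the pixel count, and the eigen-decomposition of the 2x2 covariance of the pixel coordinates, whose eigenvector with the largest eigenvalue gives the axis of least inertia. The slides do not say how the eigenvector "length" and the eccentricity were scaled, so using the largest eigenvalue and the ellipse eccentricity sqrt(1 - lambda_min/lambda_max) is an assumption. Concatenating the values for both hands gives the 16-element vector.

```python
import numpy as np

def hand_features(mask, prev_centroid=None):
    """Eight features for one hand blob given its boolean mask:
    [x, y, dx, dy, area, angle, axis_length, eccentricity].
    dx, dy are 0 when no previous centroid is known."""
    rows, cols = np.nonzero(mask)
    x, y = cols.mean(), rows.mean()                 # centroid (X, Y)
    if prev_centroid is None:
        dx = dy = 0.0
    else:
        dx, dy = x - prev_centroid[0], y - prev_centroid[1]
    area = float(rows.size)                         # area in pixels
    coords = np.vstack([cols - x, rows - y])        # 2 x n centred coordinates
    cov = coords @ coords.T / rows.size             # second central moments
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    major = eigvecs[:, 1]                           # axis of least inertia
    angle = float(np.arctan2(major[1], major[0]))   # defined up to a half-turn
    lam_min, lam_max = eigvals
    axis_length = float(lam_max)                    # assumed scaling
    ecc = float(np.sqrt(1.0 - lam_min / lam_max)) if lam_max > 0 else 0.0
    return np.array([x, y, dx, dy, area, angle, axis_length, ecc])
```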
29
Understanding ASL: Feature extraction - Occluding
hands

Occlusion in hand
segmentation
Only one large blob
Assign each of the two hands the features of this
single large blob
This method, combined with the time context
provided by the HMM, is sufficient to distinguish
many different signs that have hand occlusion as
a trait

30
Understanding ASL: The desk-based recognizer

Camera on a desk in front of the user
478 sentences used, constructed from the 40-word
lexicon
Each sign is 1 to 3 seconds long
No pause between signs within a sentence, but the
sentences themselves are distinct
384 sentences used for training, the remaining 94 for testing

31
Understanding ASL: The desk-based recognizer -
Training

Sentences are divided into five equal portions (one
per word) for initial segmentation
Initial estimates for the means and variances of
the output probabilities are computed iteratively using
Viterbi alignment
The results are fed into a Baum-Welch re-estimator,
whose estimates are refined in embedded training
Context-dependent models are not used, since they
would require more training data
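The first training step, splitting each sentence's frames into five equal portions to seed one model per word, amounts to the following NumPy sketch. The 30-frame example sentence is hypothetical (3 seconds at the 10 frames per second mentioned earlier, with 16 features per frame). These boundaries are only a starting point; Viterbi alignment and Baum-Welch re-estimation then refine them.

```python
import numpy as np

def initial_word_segments(frames, n_words=5):
    """Split a sentence's feature frames (shape (T, d)) into n_words
    nearly equal, contiguous chunks, one per word, for initial training."""
    return np.array_split(frames, n_words)

sentence = np.zeros((30, 16))   # hypothetical 3 s sentence, 16 features per frame
segments = initial_word_segments(sentence)
print([len(s) for s in segments])
```

When T is not divisible by five, np.array_split gives the first chunks one extra frame each.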

32
Understanding ASL: The desk-based recognizer
Test 1

Uses the part-of-speech grammar
personal pronoun, verb, noun, adjective,
(same) personal pronoun
Word recognition accuracy Acc is calculated by
Acc = (N - S) / N
N: total number of words in the test set
S: number of substitutions
No insertions or deletions, since the number and
class of words to be recognized are known
Acc: percentage of correctly recognized words

33
Understanding ASL: The desk-based recognizer
Test 2

Does not use the part-of-speech grammar
Word recognition accuracy Acc is calculated by
Acc = (N - D - S - I) / N
N: total number of words in the test set
S: number of substitutions
I: number of insertions
D: number of deletions
Insertions and deletions are possible, since the number
of words and the word classes are unknown
Acc can now be negative
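Counting substitutions, deletions and insertions requires aligning the recognized word string with the reference, usually by minimum edit distance. A small pure-Python sketch, assuming unit costs for all three error types, computes Acc = (N - D - S - I) / N. For the grammar-constrained test, where every position holds exactly one recognized word, a position-by-position comparison gives Acc = (N - S) / N.

```python
def edit_counts(ref, hyp):
    """Minimum-edit-distance alignment of two word lists (unit costs).
    Returns (substitutions, deletions, insertions)."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (total, S, D, I) for aligning ref[:i] with hyp[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)              # delete every reference word
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)              # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, ins = cost[i - 1][j - 1]
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            options = [(t + sub, s + sub, d, ins)]       # match or substitution
            t, s, d, ins = cost[i - 1][j]
            options.append((t + 1, s, d + 1, ins))       # deletion
            t, s, d, ins = cost[i][j - 1]
            options.append((t + 1, s, d, ins + 1))       # insertion
            cost[i][j] = min(options)
    _, S, D, I = cost[n][m]
    return S, D, I

def word_accuracy(ref, hyp):
    """Acc = (N - D - S - I) / N; negative when insertions are numerous."""
    S, D, I = edit_counts(ref, hyp)
    N = len(ref)
    return (N - D - S - I) / N

def word_accuracy_fixed_length(ref, hyp):
    """Acc = (N - S) / N when each reference position has one recognized word."""
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must have equal length")
    S = sum(r != h for r, h in zip(ref, hyp))
    return (len(ref) - S) / len(ref)
```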

34
Understanding ASL: The desk-based recognizer
Results

Third test performed: strip the absolute (X,Y)
positions from the feature vector
Simulates everyday use of the recognizer, where
the signer is not always in the same position
when the system is used
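Removing the absolute (X,Y) positions amounts to dropping four entries of the 16-element vector (the centroid coordinates of each hand) while keeping their frame-to-frame changes. The index layout assumed here (eight features per hand, left hand first, centroid X and Y as the first two of each hand's block) is hypothetical.

```python
import numpy as np

FEATURES_PER_HAND = 8
# Hypothetical layout: [x, y, dx, dy, area, angle, axis_length, eccentricity]
# for the left hand, followed by the same eight values for the right hand.
ABSOLUTE_POSITION_IDX = [0, 1, FEATURES_PER_HAND, FEATURES_PER_HAND + 1]

def strip_absolute_positions(features):
    """Drop the absolute centroid coordinates of both hands from a (T, 16)
    feature matrix, leaving a position-independent (T, 12) matrix."""
    return np.delete(features, ABSOLUTE_POSITION_IDX, axis=1)
```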
Word accuracy results

35
Understanding ASL: The wearable-based recognizer

Camera mounted on a cap worn by the signer
500 sentences from the same 40-word lexicon
At the beginning and end of a sentence, the hands were
often found in a resting position
To take this into account, another token called
"silence" was added to the dictionary
400 sentences for training, 100 for testing

36
Understanding ASL: The wearable-based recognizer

New grammar for testing purposes: the only
restriction is that each sentence is 5 words long
The word accuracy Acc is calculated in the same
way as with the desk-based recognizer

37
Understanding ASL: The wearable-based recognizer - Results
38
End of presentation

Thanks for your attention!
References
Thad Starner, Joshua Weaver, Alex Pentland: Real-Time American Sign Language
Recognition Using Desk and Wearable Computer Based Video. M.I.T. Media
Laboratory Perceptual Computing Section Technical Report No. 466; IEEE PAMI,
1998.
Andrew D. Wilson, Aaron F. Bobick: Recognition and Interpretation of Parametric
Gesture. M.I.T. Media Laboratory Perceptual Computing Section Technical Report
No. 421; International Conference on Computer Vision, 1998.
L.R. Rabiner, B.H. Juang: An Introduction to Hidden Markov Models. IEEE ASSP
Magazine, pp. 4-16, Jan. 1986.