Title: Vision-Based Recognition of Continuous Dynamic Hand Gestures
Vision-Based Recognition of Continuous Dynamic Hand Gestures
- Yuanxin Zhu
- Department of Computer Science and Technology
- Tsinghua University, Beijing, China
Outline
- 1. Introduction
- 2. Literature Review
- 3. Interaction Model and Prototype System Design
- 4. Real-time Segmentation of Hand Gestures
- 5. Parameterized Image Motion Model and Robust Regression
- 6. Spatio-temporal Appearance Modeling
- 7. Dynamic Time Warping Based Recognition
- 8. Experiment Results
- 9. Summary
- 10. Future Work
1. Introduction
- Human-computer interaction (HCI) has become an increasingly important part of our daily lives.
- Keyboards and mice are the most popular modes of HCI.
- Virtual Reality and Wearable Computing require novel interaction modalities with the following characteristic: interaction in a way that resembles how humans communicate with each other.
- Hand gesture is a natural and intuitive communication mode.
- Other applications: sign language recognition, video transmission, and so on.
1. Introduction
- Vision-based recognition of dynamic hand gestures is a challenging interdisciplinary problem.
- Hand gestures are highly diverse, carry multiple meanings, and vary in space and time.
- The human hand is a complex non-rigid object.
- Computer vision itself is an ill-posed problem.
1. Introduction
- To recognize continuous dynamic hand gestures:
- Design of the gesture command set and interaction model.
- Real-time segmentation of gesture streams.
- Modeling, analysis, and recognition of gestures.
- Real-time processing is mandatory for the practical use of hand gestures in HCI.
2. State of the Art of Hand Gesture Recognition
- 2.1 Hand gesture taxonomy and interaction model
- 2.2 Hand gesture modeling
- 2.3 Hand gesture analysis
- 2.4 Hand gesture recognition techniques
2.1 Taxonomy of Gestures for Human-Computer Interaction
Fig. 1 A taxonomy of hand gestures for human-computer interaction. Meaningful gestures are differentiated from unintentional movements. Gestures used for the manipulation of objects are separated from gestures that possess an inherent communicational character. Symbols are those gestures having a linguistic role; they symbolize some referential action or are used as modalizers, often of speech.
2.2 Hand Gesture Modeling
- Fig. 2 Classification of hand gesture models
2.2 Hand Gesture Modeling
- Fig. 3 Representing the same hand posture with different hand models. (a) 3-D textured volumetric model (b) 3-D wireframe volumetric model (c) 3-D skeletal model (d) Binary silhouette (e) Contour model.
2.3 Gesture Analysis
- Gesture detection and feature extraction
- skin-color-cue based approaches
- motion-cue based approaches
- multiple-cue based approaches
- features include gray image, binary silhouette, moving region, edge, contour, and so on.
2.3 Gesture Analysis
- Recovering gesture model parameters
- Estimation of 3-D hand/arm model parameters
- two sets of parameters: angular (joint angles) and linear (palm dimensions)
- initial parameter estimation
- parameter update as the hand gesture evolves in time
- Estimation of appearance-based model parameters
- image motion estimation (e.g. optical flow)
- shape analysis (e.g. computing moments)
- histogram-based feature parameters
- active contour models
2.4 Gesture Recognition Techniques
- Fig. 4 Classification of hand gesture recognition techniques
3.1 Interaction Model
- Strengths and weaknesses of gesture-based interaction
- Structure of the interaction model
- users perform gestures in three steps
- suitable feedback
- apply gesture-based input to appropriate tasks
- A set of rules for designing the gesture command set
- gestures should be performed intentionally and intensively, be easy to learn, be symmetrical, ...
3.2 A Prototype System: Gesture-Controlled Panoramic Map Browser
- Fig. 5 Gesture-controlled panoramic map browser. (a) System setting (b) User interface.
3.3 Gesture Command Set
- Four translation gesture commands
- move up (1), move down (2), move left (3), move right (4)
- Six rotation gesture commands
- yaw right (7), yaw left (8), roll clockwise (9), roll counterclockwise (10), pitch down (11), pitch up (12)
- Two other gesture commands
- zoom in (5), zoom out (6)
4. Real-Time Segmentation of Continuous Dynamic Hand Gestures
- Goals
- segment the moving hand from the background
- partition gesture streams into meaningful sections
- Methodology (see the sketch below)
- integrate multiple cues: skin color, motion
- post-processing (morphological filtering techniques)
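A minimal sketch of this cue-fusion step, assuming OpenCV, a YCrCb skin-chrominance threshold, and simple frame differencing; the threshold values are illustrative and not the ones used in the prototype.

```python
# Minimal sketch of cue-fusion segmentation (assumed thresholds, not the thesis values).
import cv2
import numpy as np

# Assumed Cr/Cb bounds for skin chrominance; tune per camera and illumination.
SKIN_LOW = np.array([0, 133, 77], dtype=np.uint8)
SKIN_HIGH = np.array([255, 173, 127], dtype=np.uint8)

def segment_hand(frame_bgr, prev_gray):
    """Return a binary hand mask by fusing skin-color and motion cues."""
    # Skin-color cue: threshold chrominance in YCrCb space.
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    skin_mask = cv2.inRange(ycrcb, SKIN_LOW, SKIN_HIGH)

    # Motion cue: coarse frame differencing.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev_gray)
    _, motion_mask = cv2.threshold(diff, 15, 255, cv2.THRESH_BINARY)

    # Fuse the two cues, then clean up with morphological filtering.
    mask = cv2.bitwise_and(skin_mask, motion_mask)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return mask, gray
```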
Fig. 6 Processing flow chart of real-time segmentation
5. Recovering Image Motion Model Parameters by Robust Regression
- 5.1 Parameterized Image Motion Models
- 5.2 Constructing the Objective Function
- 5.3 Robust Error Norms
- 5.4 Simultaneous Over-Relaxation with a Continuation Method
- 5.5 Multi-resolution Analysis
- 5.6 Examples of Experiment Results
5.1 Parameterized Image Motion Models
- Definition of the parameterized flow field (standard forms are sketched below)
- Translation model
- Affine model
- Planar model
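The models' equations are not preserved in these notes; the standard parameterizations consistent with the names above are:

```latex
% Standard parameterized flow models (assumed forms; the slide's exact notation is not preserved).
\begin{aligned}
\text{Translation: } & u(x,y) = a_0, \qquad v(x,y) = a_1 \\
\text{Affine: } & u(x,y) = a_0 + a_1 x + a_2 y, \qquad v(x,y) = a_3 + a_4 x + a_5 y \\
\text{Planar (eight-parameter): } & u(x,y) = a_0 + a_1 x + a_2 y + a_6 x^2 + a_7 x y, \\
& v(x,y) = a_3 + a_4 x + a_5 y + a_6 x y + a_7 y^2
\end{aligned}
```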
5.2 Constructing the Objective Function
- Brightness constancy assumption.
- Taking the Taylor series expansion, simplifying, and dropping terms above first order gives the linearized optical flow constraint.
- Recover the model parameters by minimizing the objective function sketched below.
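A sketch of the derivation in standard notation (the slide's own symbols are not preserved): with the flow (u, v) given by one of the parameterized models and parameter vector a,

```latex
% Brightness constancy (assumed standard form):
I(x, y, t) = I\bigl(x + u(x,y;\mathbf{a})\,\delta t,\; y + v(x,y;\mathbf{a})\,\delta t,\; t + \delta t\bigr)
% First-order Taylor expansion yields the optical flow constraint:
I_x\, u(x,y;\mathbf{a}) + I_y\, v(x,y;\mathbf{a}) + I_t = 0
% Robust objective over the hand region R, with robust error norm \rho and scale \sigma:
E(\mathbf{a}) = \sum_{(x,y)\in R} \rho\bigl(I_x\, u(x,y;\mathbf{a}) + I_y\, v(x,y;\mathbf{a}) + I_t,\; \sigma\bigr)
```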
5.3 Robust Error Norms
- Quadratic
- Truncated quadratic
- Geman-McClure function
- Lorentzian function
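The slide's formulas are not preserved; the commonly used definitions of these norms are given below (scale-parameter conventions vary in the literature, e.g. some authors write the Geman-McClure denominator with sigma squared):

```latex
% Commonly used robust error norms (assumed standard definitions):
\rho_{\text{quad}}(x) = x^2
\qquad
\rho_{\text{trunc}}(x,\sigma) = \min(x^2,\, \sigma^2)
\qquad
\rho_{\text{GM}}(x,\sigma) = \frac{x^2}{\sigma + x^2}
\qquad
\rho_{\text{Lor}}(x,\sigma) = \log\!\Bigl(1 + \tfrac{1}{2}\bigl(\tfrac{x}{\sigma}\bigr)^2\Bigr)
```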
5.3 Robust Error Norms
- Fig. 7 Geman-McClure function. (a) Geman-McClure function (b) Its derivative.
5.4 Simultaneous Over-Relaxation with a Continuation Method
- The iterative updating equation at the (n+1)-th iteration is sketched below.
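The slide's equation is not preserved; a standard SOR update for robust motion regression, consistent with the method named above, is:

```latex
% Assumed standard SOR update for each model parameter a_i:
a_i^{(n+1)} = a_i^{(n)} - \omega\, \frac{1}{T_i}\, \frac{\partial E}{\partial a_i},
\qquad 0 < \omega < 2,
% where T_i is an upper bound on the second partial derivative of the objective:
T_i \;\ge\; \frac{\partial^2 E}{\partial a_i^2}.
% The continuation method starts with a large scale \sigma in the robust norm and
% gradually lowers it, so that more and more points are treated as outliers.
```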
5.5 Multi-resolution Analysis
Fig. 8 Illustration of multi-resolution analysis.
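A coarse-to-fine sketch of the idea, assuming OpenCV Gaussian pyramids and a hypothetical estimate_affine() routine standing in for the robust regression of Section 5:

```python
# Coarse-to-fine sketch of multi-resolution motion estimation (assumed scheme;
# estimate_affine() is a hypothetical stand-in for the robust regression step).
import cv2

def coarse_to_fine(img1, img2, estimate_affine, levels=3):
    """Estimate affine motion parameters over a Gaussian pyramid."""
    # Build Gaussian pyramids, finest level first.
    pyr1, pyr2 = [img1], [img2]
    for _ in range(levels - 1):
        pyr1.append(cv2.pyrDown(pyr1[-1]))
        pyr2.append(cv2.pyrDown(pyr2[-1]))

    params = [0.0] * 6  # a0..a5 of the affine model
    for lvl in reversed(range(levels)):       # start at the coarsest level
        # Refine the current estimate at this resolution.
        params = estimate_affine(pyr1[lvl], pyr2[lvl], init=params)
        if lvl > 0:
            # Translational terms scale by 2 when moving to the finer level;
            # the linear terms (a1, a2, a4, a5) are scale invariant.
            params[0] *= 2.0
            params[3] *= 2.0
    return params
```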
5.6 Examples of Image Motion Estimation
- Fig. 9 An example of robust image motion regression. (a) and (b) are the 2nd and 3rd frames of an image sequence. (c) Inliers and outliers identified according to the result of the first regression. (d) Segmentation of the moving hand. (e) Outliers identified according to the result of the second regression. (f) The difference image between (a) and (b).
5.6 Examples of Image Motion Estimation
- Fig. 10 Another example of robust image motion regression.
6. Spatio-Temporal Appearance Modeling
- 6.1 Inter-frame Motion Appearance
- 6.2 Intra-frame Shape Appearance
- 6.3 Spatio-temporal Appearance
6.1 Inter-frame Motion Appearance
6.2 Intra-frame Shape Appearance
6.3 Spatio-temporal Appearance
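The defining equations of these three subsections are not preserved in these notes; based on the summary in Section 9, one plausible (hypothetical) reconstruction of the spatio-temporal appearance is a per-frame concatenation of the affine motion parameters with ellipse-based shape features:

```latex
% Hypothetical reconstruction, consistent with the summary slides (not the original equations).
% Per-frame appearance at time t: affine motion parameters plus ellipse shape features
% (center, axis lengths, orientation) fitted to the segmented hand region.
\mathbf{f}_t = \bigl(\underbrace{a_0, \dots, a_5}_{\text{motion appearance}},\;
                \underbrace{x_c,\, y_c,\, l_{\text{major}},\, l_{\text{minor}},\, \theta}_{\text{shape appearance}}\bigr)
% Spatio-temporal appearance of a gesture of T frames:
\mathbf{F} = (\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_T)
```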
7.1 Dynamic Time Warping
Fig. 11 DTW assumes that the endpoints of the two patterns have been accurately located and formulates pattern matching as finding the optimal path from the start point to the end point on a finite grid. The optimal path can be found efficiently by dynamic programming.
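A minimal dynamic-programming sketch of DTW between two feature-vector sequences; the local distance is assumed Euclidean, which may differ from the measure used in the thesis.

```python
# Minimal DTW sketch via dynamic programming (illustrative only).
import numpy as np

def dtw_distance(a, b):
    """Return the DTW distance between two sequences of feature vectors a[T1][D], b[T2][D]."""
    t1, t2 = len(a), len(b)
    cost = np.full((t1 + 1, t2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            d = np.linalg.norm(np.asarray(a[i - 1]) - np.asarray(b[j - 1]))
            # Optimal path: extend from the insertion, deletion, or match predecessor.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[t1, t2]
```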
7.2 Modified DTW
- Our experiments show that traditional DTW is not adequate for matching two spatio-temporal appearance patterns.
- Unlike the high sampling rates used in speech recognition, the sampling rate in hand gesture recognition is usually about 10 Hz, so fluctuation along the time axis of gesture patterns is much sharper than that of speech patterns.
- A modified DTW algorithm, a kind of non-linear re-sampling technique, is developed to dynamically warp each spatio-temporal pattern to a fixed temporal length, preserving the necessary temporal information and spatial distribution of the original pattern (see the sketch below).
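The modified DTW itself is not reproduced here; the following sketch only illustrates the fixed-length re-sampling idea with plain linear interpolation, and the target length of 16 frames is an arbitrary assumption.

```python
# Sketch of warping a variable-length pattern to a fixed temporal length.
# The thesis describes a non-linear, DTW-style re-sampling; this simpler linear
# interpolation only illustrates the fixed-length idea and is an assumption.
import numpy as np

def warp_to_fixed_length(pattern, target_len=16):
    """Re-sample a (T, D) spatio-temporal pattern to (target_len, D)."""
    pattern = np.asarray(pattern, dtype=float)
    t, d = pattern.shape
    src = np.linspace(0.0, t - 1, num=t)
    dst = np.linspace(0.0, t - 1, num=target_len)
    # Interpolate each feature dimension independently along the time axis.
    return np.stack([np.interp(dst, src, pattern[:, k]) for k in range(d)], axis=1)
```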
7.3 Template-Based Recognition
- The distance between two spatio-temporal appearance patterns is calculated from the correlation between their warped patterns.
- Given a training set, a reference template is created for each gesture class by a minimax type of optimization; a template-based classification technique is then employed to recognize hand gestures (see the sketch below).
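A sketch of the recognition stage under two assumptions: the correlation-based distance is taken as one minus the Pearson correlation of the flattened warped patterns, and the minimax criterion is read as picking, per class, the training sample whose worst-case distance to its classmates is smallest.

```python
# Sketch of minimax template selection and nearest-template classification
# (assumed distance and minimax criterion, not the thesis's exact formulation).
import numpy as np

def pattern_distance(p, q):
    """Correlation-based distance between two equal-length warped patterns (assumed form)."""
    p, q = np.asarray(p).ravel(), np.asarray(q).ravel()
    corr = np.corrcoef(p, q)[0, 1]
    return 1.0 - corr          # smaller means more similar

def build_templates(samples_by_class):
    """For each class, keep the sample minimizing its maximum distance to classmates."""
    templates = {}
    for label, samples in samples_by_class.items():
        worst = [max((pattern_distance(s, o) for o in samples if o is not s), default=0.0)
                 for s in samples]
        templates[label] = samples[int(np.argmin(worst))]
    return templates

def classify(pattern, templates):
    """Assign the label of the nearest reference template."""
    return min(templates, key=lambda label: pattern_distance(pattern, templates[label]))
```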
8. Experiment Results
- 8.1 Examples of Hand Gesture Segmentation
- 8.2 Choosing Image Motion Models
- 8.3 Examples of Spatio-temporal Appearances
- 8.4 Determining the Warping Length
- 8.5 Examples of Warped Spatio-temporal Appearances
- 8.6 Motion Appearance versus Shape Appearance
- 8.7 Testing
8.1 Examples of Hand Gesture Segmentation
Fig. 12 Segmentation result of a move up hand gesture.
8.1 Examples of Hand Gesture Segmentation
Fig. 13 Segmentation result of a move left hand gesture.
8.1 Examples of Hand Gesture Segmentation
Fig. 14 Segmentation result of a zoom in hand gesture.
8.1 Examples of Hand Gesture Segmentation
Fig. 15 Segmentation result of a yaw right hand gesture.
8.2 Choosing the Image Motion Model
Recognition rates when choosing different image motion models.
Conclusion: for our gesture command set, the affine model is necessary and sufficient.
8.3 Examples of Spatio-temporal Appearances
Table 1 Spatio-temporal appearance model parameters of the move up gesture sample.
8.3 Examples of Spatio-temporal Appearances
Table 2 Spatio-temporal appearance parameters of the move left gesture sample.
8.3 Examples of Spatio-temporal Appearances
Table 3 Spatio-temporal appearance parameters of the zoom in gesture sample.
8.3 Examples of Spatio-temporal Appearances
Table 4 Spatio-temporal appearance parameters of the yaw right gesture sample.
8.4 Determining the Warping Length
8.5 Examples of Warped Spatio-temporal Appearances
Table 6 Parameters of the warped spatio-temporal appearance of the move up gesture sample.
8.5 Examples of Warped Spatio-temporal Appearances
Table 7 Parameters of the warped spatio-temporal appearance of the move left gesture sample.
8.5 Examples of Warped Spatio-temporal Appearances
Table 8 Parameters of the warped spatio-temporal appearance of the zoom in gesture sample.
8.5 Examples of Warped Spatio-temporal Appearances
Table 9 Parameters of the warped spatio-temporal appearance of the yaw right gesture sample.
8.6 Motion Appearance vs. Shape Appearance
- To explore the discriminative power of motion appearance and shape appearance separately, two experiments were carried out: one using only motion appearances as feature vectors, the other using only shape appearances as feature vectors.
8.7 Testing Experiment
- The average recognition rate achieved on the test set is 89.6%.
- Gesture-controlled panoramic map browser.
- The prototype system can recognize hand gestures performed by a trained user with accuracy ranging from 83% to 92%.
9. Summary
- Aiming at real-time gesture-controlled human-computer interaction, we propose novel approaches for visual modeling, analysis, and recognition of continuous dynamic hand gestures.
9. Summary
- A spatio-temporal appearance model is proposed to represent dynamic hand gestures.
- The model integrates temporal information with motion and shape appearances.
- The motion appearance represents image appearance changes caused by the motion itself, not a temporal sequence of static configurations.
- The shape appearance is based on the geometrical features of an ellipse fitted to the hand image region rather than on simple moment-based features.
9. Summary
- Novel approaches are developed to extract model parameters by hierarchically integrating multiple cues.
- At the low level, fusion of skin chrominance analysis and coarse image motion detection is employed to detect and segment hand gestures.
- At the high level, the model parameters are recovered by integrating fine image motion estimation and shape analysis.
- The approaches achieve both real-time processing and high recognition rates.
9. Summary
- A modified Dynamic Time Warping algorithm is proposed to eliminate the temporal variation of spatio-temporal appearance patterns caused by varying gesturing rates.
- It is a kind of non-linear re-sampling technique.
- It preserves the necessary temporal information and spatial distribution of the original patterns.
9. Summary
- A prototype system, a gesture-controlled panoramic map browser, is designed and implemented to demonstrate the usability of gesture-controlled real-time interaction.
- Dynamic hand gestures are recognized without resorting to special markers, a restricted or uniform background, or particular illumination.
- Only one uncalibrated video camera is utilized.
- High recognition rates are achieved.
- The user is allowed to perform continuous hand gestures, starting at any point within the camera's field of view.
10. Future Work
- We currently assume that the moving skin-color region in the scene is the gesturing hand, which can be invalid when a moving human face appears. Exploiting a simple geometrical model of the human body can alleviate this problem, although multiple cameras may then be necessary.
10. Future Work
- To practically use hand gestures in HCI, more gestural commands will be needed.
- Some kinds of commands would be more reasonably input as static hand gestures (hand postures).
- On the other hand, speech commands will be an alternative to some gestural commands.
- Incorporating hand gesture recognition into a multi-modal interface (MMI) is our next step.