Title: Vision-Based Recognition of Continuous Dynamic Hand Gestures
Vision-Based Recognition of Continuous Dynamic Hand Gestures
- Yuanxin Zhu
- Department of Computer Science and Technology
- Tsinghua University, Beijing, China
Outline
- 1. Introduction
- 2. Literature Review
- 3. Interaction Model and Prototype System Design
- 4. Real-time Segmentation of Hand Gestures
- 5. Parameterized Image Motion Model and Robust Regression
- 6. Spatio-temporal Appearance Modeling
- 7. Dynamic Time Warping Based Recognition
- 8. Experiment Results
- 9. Summary
- 10. Future Work
1. Introduction
- Human-computer interaction (HCI) has become an increasingly important part of our daily lives.
- Keyboards and mice are the most popular modes of HCI.
- Virtual Reality and Wearable Computing require novel interaction modalities with the following characteristic: interaction in a way that resembles how humans communicate with each other.
- Hand gesture is a natural and intuitive communication mode.
- Other applications: sign language recognition, video transmission, and so on.
1. Introduction
- Vision-based recognition of dynamic hand gestures is a challenging interdisciplinary problem.
- Hand gestures are highly diverse, carry multiple meanings, and vary in space and time.
- The human hand is a complex non-rigid object.
- Computer vision itself is an ill-posed problem.
1. Introduction
- To recognize continuous dynamic hand gestures:
- Design of the gesture command set and interaction model.
- Real-time segmentation of gesture streams.
- Modeling, analysis, and recognition of gestures.
- Real-time processing is mandatory for the practical use of hand gestures in HCI.
2. State of the Art of Hand Gesture Recognition
- 2.1 Hand gesture taxonomy and interaction model
- 2.2 Hand gesture modeling
- 2.3 Hand gesture analysis
- 2.4 Hand gesture recognition techniques
2.1 Taxonomy of Gestures for Human-Computer Interaction
Fig. 1 A taxonomy of hand gestures for human-computer interaction. Meaningful gestures are differentiated from unintentional movements. Gestures used for the manipulation of objects are separated from gestures that possess an inherent communicational character. Symbols are those gestures having a linguistic role; they symbolize some referential action or are used as modalizers, often of speech.
2.2 Hand Gesture Modeling
- Fig. 2 Classification of hand gesture models
2.2 Hand Gesture Modeling
- Fig. 3 Representing the same hand posture with different hand models. (a) 3-D textured volumetric model (b) 3-D wireframe volumetric model (c) 3-D skeletal model (d) Binary silhouette (e) Contour model.
2.3 Gesture Analysis
- Gesture detection and feature extraction
- skin-color-cue based approaches
- motion-cue based approaches
- multiple-cue based approaches
- features include gray image, binary silhouette, moving region, edge, contour, and so on.
2.3 Gesture Analysis
- Recovering gesture model parameters
- Estimation of 3-D hand/arm model parameters
- two sets of parameters: angular (joint angles) and linear (palm dimensions)
- initial parameter estimation
- parameter update as the hand gesture evolves in time
- Estimation of appearance-based model parameters
- image motion estimation (e.g. optical flow)
- shape analysis (e.g. computing moments)
- histogram-based feature parameters
- active contour models
2.4 Gesture Recognition Techniques
- Fig. 4 Classification of hand gesture recognition techniques
3.1 Interaction Model
- Strengths and weaknesses of gesture-based interaction
- Structure of the interaction model
- users perform gestures in three steps
- suitable feedback
- apply gesture-based input to appropriate tasks
- A set of rules for designing the gesture command set
- gestures should be performed intentionally and intensively, be easy to learn, be symmetrical, ...
3.2 A Prototype System: Gesture-Controlled Panoramic Map Browser
- Fig. 5 Gesture-controlled panoramic map browser. (a) System setting (b) User interface.
3.3 Gesture Command Set
- Four translation gesture commands
- move up (1), move down (2), move left (3), move right (4)
- Six rotation gesture commands
- yaw right (7), yaw left (8), roll clockwise (9), roll counterclockwise (10), pitch down (11), pitch up (12)
- Two other gesture commands
- zoom in (5), zoom out (6)
4. Real-Time Segmentation of Continuous Dynamic Hand Gestures
- Goals
- segment the moving hand from the background
- partition gesture streams into meaningful sections
- Methodology (see the sketch below)
- integrate multiple cues: skin color, motion
- post-processing (morphological filtering techniques)
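A minimal sketch of this cue-fusion step, assuming OpenCV, a YCrCb skin-chrominance threshold, and simple frame differencing; the threshold values are illustrative and not the ones used in the prototype.

```python
# Minimal sketch of cue-fusion segmentation (assumed thresholds, not the thesis values).
import cv2
import numpy as np

# Assumed Cr/Cb bounds for skin chrominance; tune per camera and illumination.
SKIN_LOW = np.array([0, 133, 77], dtype=np.uint8)
SKIN_HIGH = np.array([255, 173, 127], dtype=np.uint8)

def segment_hand(frame_bgr, prev_gray):
    """Return a binary hand mask by fusing skin-color and motion cues."""
    # Skin-color cue: threshold chrominance in YCrCb space.
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    skin_mask = cv2.inRange(ycrcb, SKIN_LOW, SKIN_HIGH)

    # Motion cue: coarse frame differencing.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev_gray)
    _, motion_mask = cv2.threshold(diff, 15, 255, cv2.THRESH_BINARY)

    # Fuse the two cues, then clean up with morphological filtering.
    mask = cv2.bitwise_and(skin_mask, motion_mask)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return mask, gray
```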
Fig. 6 Processing flow chart of real-time segmentation
5. Recovering Image Motion Model Parameters by Robust Regression
- 5.1 Parameterized Image Motion Models
- 5.2 Constructing the Objective Function
- 5.3 Robust Error Norms
- 5.4 Simultaneous Over-Relaxation with a Continuation Method
- 5.5 Multi-resolution Analysis
- 5.6 Examples of Experiment Results
5.1 Parameterized Image Motion Models
- Definition of the parameterized flow field (standard forms are sketched below)
- Translation model
- Affine model
- Planar model
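The models' equations are not preserved in these notes; the standard parameterizations consistent with the names above are:

```latex
% Standard parameterized flow models (assumed forms; the slide's exact notation is not preserved).
\begin{aligned}
\text{Translation: } & u(x,y) = a_0, \qquad v(x,y) = a_1 \\
\text{Affine: } & u(x,y) = a_0 + a_1 x + a_2 y, \qquad v(x,y) = a_3 + a_4 x + a_5 y \\
\text{Planar (eight-parameter): } & u(x,y) = a_0 + a_1 x + a_2 y + a_6 x^2 + a_7 x y, \\
& v(x,y) = a_3 + a_4 x + a_5 y + a_6 x y + a_7 y^2
\end{aligned}
```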
5.2 Constructing the Objective Function
- Brightness constancy assumption.
- Taking the Taylor series expansion, simplifying, and dropping terms above first order gives the linearized optical flow constraint.
- Recover the model parameters by minimizing the objective function sketched below.
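A sketch of the derivation in standard notation (the slide's own symbols are not preserved): with the flow (u, v) given by one of the parameterized models and parameter vector a,

```latex
% Brightness constancy (assumed standard form):
I(x, y, t) = I\bigl(x + u(x,y;\mathbf{a})\,\delta t,\; y + v(x,y;\mathbf{a})\,\delta t,\; t + \delta t\bigr)
% First-order Taylor expansion yields the optical flow constraint:
I_x\, u(x,y;\mathbf{a}) + I_y\, v(x,y;\mathbf{a}) + I_t = 0
% Robust objective over the hand region R, with robust error norm \rho and scale \sigma:
E(\mathbf{a}) = \sum_{(x,y)\in R} \rho\bigl(I_x\, u(x,y;\mathbf{a}) + I_y\, v(x,y;\mathbf{a}) + I_t,\; \sigma\bigr)
```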
5.3 Robust Error Norms
- Quadratic
- Truncated quadratic
- Geman-McClure function
- Lorentzian function
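The slide's formulas are not preserved; the commonly used definitions of these norms are given below (scale-parameter conventions vary in the literature, e.g. some authors write the Geman-McClure denominator with sigma squared):

```latex
% Commonly used robust error norms (assumed standard definitions):
\rho_{\text{quad}}(x) = x^2
\qquad
\rho_{\text{trunc}}(x,\sigma) = \min(x^2,\, \sigma^2)
\qquad
\rho_{\text{GM}}(x,\sigma) = \frac{x^2}{\sigma + x^2}
\qquad
\rho_{\text{Lor}}(x,\sigma) = \log\!\Bigl(1 + \tfrac{1}{2}\bigl(\tfrac{x}{\sigma}\bigr)^2\Bigr)
```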
5.3 Robust Error Norms
- Fig. 7 Geman-McClure function. (a) Geman-McClure function (b) Its derivative.
5.4 Simultaneous Over-Relaxation with a Continuation Method
- The iterative updating equation at the (n+1)-th iteration is sketched below.
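The slide's equation is not preserved; a standard SOR update for robust motion regression, consistent with the method named above, is:

```latex
% Assumed standard SOR update for each model parameter a_i:
a_i^{(n+1)} = a_i^{(n)} - \omega\, \frac{1}{T_i}\, \frac{\partial E}{\partial a_i},
\qquad 0 < \omega < 2,
% where T_i is an upper bound on the second partial derivative of the objective:
T_i \;\ge\; \frac{\partial^2 E}{\partial a_i^2}.
% The continuation method starts with a large scale \sigma in the robust norm and
% gradually lowers it, so that more and more points are treated as outliers.
```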
5.5 Multi-resolution Analysis
Fig. 8 Illustration of multi-resolution analysis.
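A coarse-to-fine sketch of the idea, assuming OpenCV Gaussian pyramids and a hypothetical estimate_affine() routine standing in for the robust regression of Section 5:

```python
# Coarse-to-fine sketch of multi-resolution motion estimation (assumed scheme;
# estimate_affine() is a hypothetical stand-in for the robust regression step).
import cv2

def coarse_to_fine(img1, img2, estimate_affine, levels=3):
    """Estimate affine motion parameters over a Gaussian pyramid."""
    # Build Gaussian pyramids, finest level first.
    pyr1, pyr2 = [img1], [img2]
    for _ in range(levels - 1):
        pyr1.append(cv2.pyrDown(pyr1[-1]))
        pyr2.append(cv2.pyrDown(pyr2[-1]))

    params = [0.0] * 6  # a0..a5 of the affine model
    for lvl in reversed(range(levels)):       # start at the coarsest level
        # Refine the current estimate at this resolution.
        params = estimate_affine(pyr1[lvl], pyr2[lvl], init=params)
        if lvl > 0:
            # Translational terms scale by 2 when moving to the finer level;
            # the linear terms (a1, a2, a4, a5) are scale invariant.
            params[0] *= 2.0
            params[3] *= 2.0
    return params
```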
5.6 Examples of Image Motion Estimation
- Fig. 9 An example of robust image motion regression. (a) and (b) are the 2nd and 3rd frames of an image sequence. (c) Inliers and outliers identified according to the result of the first regression. (d) Segmentation of the moving hand. (e) Outliers identified according to the result of the second regression. (f) The difference image between (a) and (b).
5.6 Examples of Image Motion Estimation
- Fig. 10 Another example of robust image motion regression.
6. Spatio-Temporal Appearance Modeling
- 6.1 Inter-frame Motion Appearance
- 6.2 Intra-frame Shape Appearance
- 6.3 Spatio-temporal Appearance
6.1 Inter-frame Motion Appearance
6.2 Intra-frame Shape Appearance
6.3 Spatio-temporal Appearance
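The defining equations of these three subsections are not preserved in these notes; based on the summary in Section 9, one plausible (hypothetical) reconstruction of the spatio-temporal appearance is a per-frame concatenation of the affine motion parameters with ellipse-based shape features:

```latex
% Hypothetical reconstruction, consistent with the summary slides (not the original equations).
% Per-frame appearance at time t: affine motion parameters plus ellipse shape features
% (center, axis lengths, orientation) fitted to the segmented hand region.
\mathbf{f}_t = \bigl(\underbrace{a_0, \dots, a_5}_{\text{motion appearance}},\;
                \underbrace{x_c,\, y_c,\, l_{\text{major}},\, l_{\text{minor}},\, \theta}_{\text{shape appearance}}\bigr)
% Spatio-temporal appearance of a gesture of T frames:
\mathbf{F} = (\mathbf{f}_1, \mathbf{f}_2, \dots, \mathbf{f}_T)
```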
7.1 Dynamic Time Warping
Fig. 11 DTW assumes that the endpoints of the two patterns have been accurately located and formulates pattern matching as finding the optimal path from the start point to the end point on a finite grid. The optimal path can be found efficiently by dynamic programming.
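A minimal dynamic-programming sketch of DTW between two feature-vector sequences; the local distance is assumed Euclidean, which may differ from the measure used in the thesis.

```python
# Minimal DTW sketch via dynamic programming (illustrative only).
import numpy as np

def dtw_distance(a, b):
    """Return the DTW distance between two sequences of feature vectors a[T1][D], b[T2][D]."""
    t1, t2 = len(a), len(b)
    cost = np.full((t1 + 1, t2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            d = np.linalg.norm(np.asarray(a[i - 1]) - np.asarray(b[j - 1]))
            # Optimal path: extend from the insertion, deletion, or match predecessor.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[t1, t2]
```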
7.2 Modified DTW
- Our experiments show that traditional DTW is not adequate for matching two spatio-temporal appearance patterns.
- Unlike the high sampling rates used in speech recognition, the sampling rate in hand gesture recognition is usually about 10 Hz, so fluctuation along the time axis of gesture patterns is much sharper than that of speech patterns.
- A modified DTW algorithm, a kind of non-linear re-sampling technique, is developed to dynamically warp each spatio-temporal pattern to a fixed temporal length, preserving the necessary temporal information and spatial distribution of the original pattern (see the sketch below).
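The modified DTW itself is not reproduced here; the following sketch only illustrates the fixed-length re-sampling idea with plain linear interpolation, and the target length of 16 frames is an arbitrary assumption.

```python
# Sketch of warping a variable-length pattern to a fixed temporal length.
# The thesis describes a non-linear, DTW-style re-sampling; this simpler linear
# interpolation only illustrates the fixed-length idea and is an assumption.
import numpy as np

def warp_to_fixed_length(pattern, target_len=16):
    """Re-sample a (T, D) spatio-temporal pattern to (target_len, D)."""
    pattern = np.asarray(pattern, dtype=float)
    t, d = pattern.shape
    src = np.linspace(0.0, t - 1, num=t)
    dst = np.linspace(0.0, t - 1, num=target_len)
    # Interpolate each feature dimension independently along the time axis.
    return np.stack([np.interp(dst, src, pattern[:, k]) for k in range(d)], axis=1)
```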
7.3 Template-Based Recognition
- The distance between two spatio-temporal appearance patterns is calculated from the correlation between their warped patterns.
- Given a training set, a reference template is created for each gesture class by a minimax type of optimization; a template-based classification technique is then employed to recognize hand gestures (see the sketch below).
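A sketch of the recognition stage under two assumptions: the correlation-based distance is taken as one minus the Pearson correlation of the flattened warped patterns, and the minimax criterion is read as picking, per class, the training sample whose worst-case distance to its classmates is smallest.

```python
# Sketch of minimax template selection and nearest-template classification
# (assumed distance and minimax criterion, not the thesis's exact formulation).
import numpy as np

def pattern_distance(p, q):
    """Correlation-based distance between two equal-length warped patterns (assumed form)."""
    p, q = np.asarray(p).ravel(), np.asarray(q).ravel()
    corr = np.corrcoef(p, q)[0, 1]
    return 1.0 - corr          # smaller means more similar

def build_templates(samples_by_class):
    """For each class, keep the sample minimizing its maximum distance to classmates."""
    templates = {}
    for label, samples in samples_by_class.items():
        worst = [max((pattern_distance(s, o) for o in samples if o is not s), default=0.0)
                 for s in samples]
        templates[label] = samples[int(np.argmin(worst))]
    return templates

def classify(pattern, templates):
    """Assign the label of the nearest reference template."""
    return min(templates, key=lambda label: pattern_distance(pattern, templates[label]))
```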
8. Experiment Results
- 8.1 Examples of Hand Gesture Segmentation
- 8.2 Choosing Image Motion Models
- 8.3 Examples of Spatio-temporal Appearances
- 8.4 Determining the Warping Length
- 8.5 Examples of Warped Spatio-temporal Appearances
- 8.6 Motion Appearance versus Shape Appearance
- 8.7 Testing
8.1 Examples of Hand Gesture Segmentation
Fig. 12 Segmentation result of a move up hand gesture.
8.1 Examples of Hand Gesture Segmentation
Fig. 13 Segmentation result of a move left hand gesture.
8.1 Examples of Hand Gesture Segmentation
Fig. 14 Segmentation result of a zoom in hand gesture.
8.1 Examples of Hand Gesture Segmentation
Fig. 15 Segmentation result of a yaw right hand gesture.
8.2 Choosing the Image Motion Model
Recognition rates when choosing different image motion models.
Conclusion: for our gesture command set, the affine model is necessary and sufficient.
8.3 Examples of Spatio-temporal Appearances
Table 1 Spatio-temporal appearance model parameters of the move up gesture sample.
8.3 Examples of Spatio-temporal Appearances
Table 2 Spatio-temporal appearance parameters of the move left gesture sample.
8.3 Examples of Spatio-temporal Appearances
Table 3 Spatio-temporal appearance parameters of the zoom in gesture sample.
8.3 Examples of Spatio-temporal Appearances
Table 4 Spatio-temporal appearance parameters of the yaw right gesture sample.
8.4 Determining the Warping Length
8.5 Examples of Warped Spatio-temporal Appearances
Table 6 Parameters of the warped spatio-temporal appearance of the move up gesture sample.
8.5 Examples of Warped Spatio-temporal Appearances
Table 7 Parameters of the warped spatio-temporal appearance of the move left gesture sample.
8.5 Examples of Warped Spatio-temporal Appearances
Table 8 Parameters of the warped spatio-temporal appearance of the zoom in gesture sample.
8.5 Examples of Warped Spatio-temporal Appearances
Table 9 Parameters of the warped spatio-temporal appearance of the yaw right gesture sample.
8.6 Motion Appearance vs. Shape Appearance
- To explore the discriminative power of motion appearance and shape appearance separately, two experiments were carried out: one using only motion appearances as feature vectors, the other using only shape appearances as feature vectors.
8.7 Testing Experiment
- The average recognition rate achieved on the test set is 89.6%.
- Gesture-controlled panoramic map browser.
- The prototype system can recognize hand gestures performed by a trained user with accuracy ranging from 83% to 92%.
9. Summary
- Aiming at real-time gesture-controlled human-computer interaction, we propose novel approaches for visual modeling, analysis, and recognition of continuous dynamic hand gestures.
9. Summary
- A spatio-temporal appearance model is proposed to represent dynamic hand gestures.
- The model integrates temporal information with motion and shape appearances.
- The motion appearance represents image appearance changes caused by the motion itself, not a temporal sequence of static configurations.
- The shape appearance is based on the geometrical features of an ellipse fitted to the hand image region rather than on simple moment-based features.
9. Summary
- Novel approaches are developed to extract model parameters by hierarchically integrating multiple cues.
- At the low level, fusion of skin chrominance analysis and coarse image motion detection is employed to detect and segment hand gestures.
- At the high level, the model parameters are recovered by integrating fine image motion estimation and shape analysis.
- The approaches achieve both real-time processing and high recognition rates.
9. Summary
- A modified Dynamic Time Warping algorithm is proposed to eliminate the temporal variation of spatio-temporal appearance patterns caused by varying gesturing rates.
- It is a kind of non-linear re-sampling technique.
- It preserves the necessary temporal information and spatial distribution of the original patterns.
9. Summary
- A prototype system, a gesture-controlled panoramic map browser, is designed and implemented to demonstrate the usability of gesture-controlled real-time interaction.
- Dynamic hand gestures are recognized without resorting to special markers, a restricted or uniform background, or particular illumination.
- Only one uncalibrated video camera is utilized.
- High recognition rates are achieved.
- The user is allowed to perform continuous hand gestures, starting at any point within the camera's field of view.
10. Future Work
- We currently assume that the moving skin-color region in the scene is the gesturing hand, which can be invalid when a moving human face appears. Exploiting a simple geometrical model of the human body can alleviate this problem, although multiple cameras may then be necessary.
10. Future Work
- To practically use hand gestures in HCI, more gestural commands will be needed.
- Some kinds of commands would be more reasonably input as static hand gestures (hand postures).
- On the other hand, speech commands will be an alternative to some gestural commands.
- Incorporating hand gesture recognition into a multi-modal interface (MMI) is our next step.