Title: Real-Time Vision-Based Gesture Recognition Using Haar-like Features
Real-Time Vision-Based Gesture Recognition Using
Haar-like Features
- By Qing Chen, Nicolas D. Georganas and Emil M. Petriu
- IMTC 2007, Warsaw, Poland, May 1-3, 2007
Outline
- 1. Introduction
- 2. Two-level Approach
- 3. Posture Recognition
- 4. Gesture Recognition
- 5. Conclusions
1. Introduction
- Human-Virtual Environment (H-VE) interaction
requires utilizing different modalities (e.g.
speech, body position, hand gestures, haptic
response) and integrating them for
a more immersive user experience.
- Hand gestures are an intuitive yet powerful
communication modality that has not been fully
explored for H-VE interaction.
- The latest computer vision and image processing
techniques make real-time vision-based hand
gesture recognition feasible for human-computer
interaction.
- A vision-based hand gesture recognition system
needs to meet requirements in terms of
real-time performance, robustness and accurate
recognition.
1. Introduction (cont'd)
- Vision-based gesture recognition techniques
can be divided into two categories:
- Appearance-based approaches
- Pros: simple hand models; efficient implementation;
real-time performance easier to achieve.
- Cons: limited capability to model 3D hand
gestures.
- We choose this approach to achieve
real-time performance.
- 3D hand model-based approaches
- Pros: potential to model more natural
hand gestures.
- Cons: complex hand model;
real-time performance is difficult;
user-dependent.
2. Two-level Approach
- Definition 1 (Posture/Pose): A posture or pose is
defined solely by the (static) hand
configuration and hand location.
- Definition 2 (Gesture): A gesture is a series of
postures over a time span connected by motions
(global hand motion and local finger motion).
2. Two-level Approach (cont'd)
- With the hierarchical nature of the definition,
it is natural to decouple the gesture
classification problem into two levels:
- Lower level: recognition of primitives
(postures)
- Solution: Viola and Jones algorithm
- Higher level: recognition of structure (gesture)
- Solution: Grammar-based analysis
Posture level: Viola and Jones algorithm
Gesture level: Grammar-based analysis
3. Posture Recognition
- Viola and Jones algorithm (2001)
- A statistical approach originally developed for the task of
human face detection and tracking.
- 15 times faster than any previous face detection
approach while achieving accuracy equivalent to
the best published results.
- Employs 3 techniques:
- Haar-like features
- Integral image
- AdaBoost learning algorithm
- Issues for hand postures:
- Applicability
- Classification besides detection
- Selection of posture sets
- Calibration
3. Posture Recognition (cont'd)
- Haar-like features
- The value of a Haar-like feature:
- $f(x) = \mathrm{Sum}_{\text{black rectangle}}(\text{pixel gray level}) - \mathrm{Sum}_{\text{white rectangle}}(\text{pixel gray level})$
- Compared with raw pixels, Haar-like features can
reduce the in-class variability and increase the
out-of-class variability, thus making classification
easier.
Figure 1 The set of basic Haar-like features.
Figure 2 The set of extended Haar-like features.
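As a minimal sketch of the feature value described on this slide, the following Python computes a two-rectangle (edge) Haar-like feature by direct summation. The function names, the layout (black rectangle on the left half, white rectangle on the right half) and the toy image are assumptions for illustration, not the authors' implementation.

```python
def rect_sum(img, top, left, height, width):
    """Sum of gray levels inside a rectangle, by direct summation."""
    return sum(img[r][c]
               for r in range(top, top + height)
               for c in range(left, left + width))


def haar_edge_feature(img, top, left, height, width):
    """Two-rectangle feature: sum over black (left half) minus white (right half)."""
    half = width // 2
    black = rect_sum(img, top, left, height, half)
    white = rect_sum(img, top, left + half, height, half)
    return black - white


# Toy 4x4 grayscale image: bright left half, dark right half (illustrative data).
image = [[9, 9, 1, 1] for _ in range(4)]
feature_value = haar_edge_feature(image, 0, 0, 4, 4)
```

A window with a strong left-to-right intensity edge yields a large magnitude, while a uniform window yields zero, which is why such features respond to structure rather than to raw brightness.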
3. Posture Recognition (cont'd)
- The rectangular Haar-like features can be computed
rapidly using the integral image.
- The integral image at location (x, y) contains the
sum of the pixel values above and to the left of (x, y),
inclusive.
- The sum of pixel values within rectangle D can be
computed as $P_1 + P_4 - P_2 - P_3$, where $P_1$ to $P_4$
are the integral image values at the four corners of D.
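The integral image and the four-lookup rectangle sum can be sketched as follows. This is an illustrative pure-Python version; the function names and the assignment of P2 and P3 to the top-right and bottom-left corners are assumptions, since the original figure is not reproduced here.

```python
def integral_image(img):
    """ii[y][x] = sum of img over all rows <= y and columns <= x (inclusive)."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom and columns left..right (inclusive),
    using four lookups: P1 + P4 - P2 - P3."""
    def at(y, x):
        # Positions outside the image (index -1) contribute zero.
        return ii[y][x] if y >= 0 and x >= 0 else 0

    p1 = at(top - 1, left - 1)   # just above and left of the rectangle
    p2 = at(top - 1, right)      # just above the top-right corner (assumed P2)
    p3 = at(bottom, left - 1)    # just left of the bottom-left corner (assumed P3)
    p4 = at(bottom, right)       # bottom-right corner
    return p1 + p4 - p2 - p3


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
ii = integral_image(image)
inner = rect_sum(ii, 1, 1, 2, 2)   # covers the pixels 5, 6, 8, 9
```

Once the integral image is built in a single pass, any rectangle sum costs four lookups regardless of the rectangle's size, which is what makes evaluating many Haar-like features per sub-window affordable.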
3. Posture Recognition (cont'd)
- To detect the hand, the image is scanned by a
sub-window containing a Haar-like feature.
- Based on each Haar-like feature $f_j$, a weak
classifier $h_j(x)$ is defined as
$h_j(x) = 1$ if $p_j f_j(x) < p_j \theta_j$, and $h_j(x) = 0$ otherwise,
where $x$ is a sub-window, $\theta_j$ is a threshold, and $p_j$ indicates
the direction of the inequality sign.
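A thresholded weak classifier of the kind used in the Viola and Jones framework, together with a simple sub-window scan, might be sketched like this. The names `weak_classifier` and `sliding_windows`, and the fixed window size and step, are hypothetical choices for illustration.

```python
def weak_classifier(feature_value, threshold, polarity):
    """Return 1 if polarity * f(x) < polarity * threshold, else 0.

    polarity is +1 or -1 and selects the direction of the inequality.
    """
    return 1 if polarity * feature_value < polarity * threshold else 0


def sliding_windows(img_height, img_width, win_size, step):
    """Yield the (top, left) corner of every square sub-window position."""
    for top in range(0, img_height - win_size + 1, step):
        for left in range(0, img_width - win_size + 1, step):
            yield top, left


# With polarity +1 the classifier fires for small feature values,
# with polarity -1 it fires for large ones.
fires_low = weak_classifier(3, threshold=5, polarity=1)
fires_high = weak_classifier(7, threshold=5, polarity=-1)
positions = list(sliding_windows(4, 4, win_size=2, step=2))
```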
3. Posture Recognition (cont'd)
- In machine vision:
- HARD to find a single accurate classification
rule
- EASY to find rules with classification accuracy
slightly better than 50% (weak classifiers).
- AdaBoost (Adaptive Boosting) is an iterative
algorithm that improves accuracy stage by stage
based on a series of weak classifiers.
- Adaptive: later classifiers are tuned in favor
of the samples misclassified by previous
classifiers.
3. Posture Recognition (cont'd)
- AdaBoost starts with a uniform distribution of
weights over the training examples. The weights
tell the learning algorithm the importance of each
example.
- Obtain a weak classifier $h_j(x)$ from the weak learning
algorithm.
- Increase the weights on the training examples
that were misclassified.
- (Repeat)
- At the end, form a weighted linear combination
of the weak classifiers obtained at all
iterations.
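The loop above can be sketched with a generic discrete AdaBoost over threshold stumps on a single scalar feature. This is a textbook-style illustration, not necessarily the exact variant used by the authors: labels are assumed to be +1/-1 rather than 1/0, and the function names and toy data are hypothetical.

```python
import math


def stump_predict(x, threshold, polarity):
    """Threshold stump with +1/-1 outputs."""
    return 1 if polarity * x < polarity * threshold else -1


def best_stump(xs, ys, weights):
    """Weak learner: return (weighted_error, threshold, polarity) with lowest error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n                      # uniform initial distribution
    ensemble = []                                # (alpha, threshold, polarity)
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this weak classifier
        ensemble.append((alpha, threshold, polarity))
        # Misclassified examples are scaled up, correct ones scaled down.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]   # renormalize to a distribution
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted linear combination of weak classifiers."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy feature values and labels (illustrative only).
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=2)
```

In this toy run the second stump differs from the first because the examples the first stump got wrong carry more weight in the second round, which is the "adaptive" part of the algorithm.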
3. Posture Recognition (cont'd)
- A series of classifiers is applied to every
sub-window.
- The first classifier:
- Eliminates a large number of negative
sub-windows
- Passes almost all positive sub-windows (high false
positive rate) with very little processing.
- Subsequent layers eliminate additional negative
sub-windows (passed by the first classifier) but
require more computation.
- After several stages of processing, the number of
negative sub-windows has been reduced radically.
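The early-rejection behavior of such a cascade can be sketched as below. Each stage is modeled as a hypothetical callable that returns True to pass a sub-window on; the toy stages and scalar "windows" are placeholders for real boosted classifiers and image patches.

```python
def run_cascade(stages, window):
    """Apply stages in order; stop at the first rejection.

    Returns (accepted, number_of_stages_evaluated).
    """
    for count, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, count
    return True, len(stages)


# Toy stages: each later stage is stricter (and, in practice, more expensive).
stages = [lambda v: v > 0, lambda v: v > 5, lambda v: v > 8]
windows = [-3, -1, 2, 6, 9]
results = [run_cascade(stages, w) for w in windows]
```

Most negative windows here are discarded after a single cheap test, and only the rare candidate reaches the last stage, which mirrors how the cascade keeps the average cost per sub-window low.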
3. Posture Recognition (cont'd)
- Four hand postures have been tested with the Viola and
Jones algorithm.
- Input device: a low-cost Logitech QuickCam
web camera with a resolution of 320 × 240 at up to
15 frames per second.
3. Posture Recognition (cont'd)
- Training sample collection
- Negative samples: images that must not contain
object representations. We collected 500 random
images as negative samples.
- Positive samples: hand posture images that are
collected from human hands, or generated with a
3D hand model. For each posture, we collected
around 450 positive samples. For the initial test,
we used a white wall as the background.
3. Posture Recognition (cont'd)
- After the training process based on the
AdaBoost learning algorithm, we get a cascade
classifier for each hand posture once the
required accuracy is achieved:
- Two-finger posture: 15-stage cascade classifier
- Palm posture: 10-stage cascade classifier
- Fist posture: 15-stage cascade classifier
- Little finger posture: 14-stage cascade classifier
- The performance of the trained classifiers on 100
testing images
3. Posture Recognition (cont'd)
- To recognize these different hand postures, a
parallel structure that includes all of the
cascade classifiers is implemented.
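One way to picture the parallel structure is a set of per-posture detectors applied to the same input, with the accepting detectors naming the posture. The sketch below assumes each cascade is a callable returning True or False; the dictionary keys reuse the posture names from the slides, while the toy detectors and the string "window" are purely hypothetical stand-ins. The detectors run one after another here for simplicity.

```python
def recognize_postures(classifiers, window):
    """Run every posture cascade on the same window.

    classifiers: dict mapping posture name -> callable(window) -> bool.
    Returns the names of all postures whose cascade accepts the window.
    """
    return [name for name, accepts in classifiers.items() if accepts(window)]


# Toy stand-ins for trained cascades (illustrative only).
classifiers = {
    "two-finger": lambda w: w == "two-finger patch",
    "palm": lambda w: w == "palm patch",
    "fist": lambda w: w == "fist patch",
    "little finger": lambda w: w == "little finger patch",
}
detected = recognize_postures(classifiers, "palm patch")
nothing = recognize_postures(classifiers, "background patch")
```

How to resolve a window accepted by more than one cascade is left open here, since the slides do not specify it.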
3. Posture Recognition (cont'd)
- The real-time performance of the posture
recognition
4. Gesture Recognition
- As a gesture is a series of postures, a
grammar-based syntactic analysis is suitable to
describe composite gestures in terms of
postures, and thus enables the system to
recognize gestures based on their
representations.
- For pattern recognition, a grammar is $G = (N, T, P, S)$:
- A finite set $N$ of non-terminal symbols
- A finite set $T$ of terminal symbols that is
disjoint from $N$
- A finite set $P$ of production rules
- A distinguished symbol $S \in N$ that is the start
symbol.
- Issues in modeling the structure of hand
gestures:
- Choice of basic primitives
- Choice of appropriate grammar type (context free,
stochastic context free, regular, HMM)
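To make the tuple (N, T, P, S) concrete, the sketch below encodes a small right-linear (regular) grammar whose terminals are posture labels and checks whether a posture sequence belongs to it. The "grasp" gesture (one or more palm postures followed by one or more fist postures), the production rules and the function name are hypothetical; the slides leave the choice of primitives and grammar type open.

```python
# Hypothetical grammar for a "grasp" gesture: palm+ fist+
GRASP_GRAMMAR = (
    {"S", "A"},                                   # N: non-terminals
    {"palm", "fist"},                             # T: terminals (postures)
    {                                             # P: production rules
        "S": [("palm", "S"), ("palm", "A")],
        "A": [("fist", "A"), ("fist",)],
    },
    "S",                                          # S: start symbol
)


def accepts(grammar, postures):
    """Recognize a posture sequence with a right-linear grammar.

    Each rule has the form X -> t Y or X -> t, with t a terminal.
    """
    nonterminals, terminals, productions, start = grammar
    current = {start}        # non-terminals that may derive the remaining input
    for i, symbol in enumerate(postures):
        if symbol not in terminals:
            return False
        is_last = i == len(postures) - 1
        next_states, finished = set(), False
        for nt in current:
            for rhs in productions.get(nt, []):
                if rhs[0] != symbol:
                    continue
                if len(rhs) == 1:
                    finished = finished or is_last
                else:
                    next_states.add(rhs[1])
        if is_last:
            return finished
        current = next_states
    return False             # the empty sequence is not a gesture


is_grasp = accepts(GRASP_GRAMMAR, ["palm", "palm", "fist"])
not_grasp = accepts(GRASP_GRAMMAR, ["fist", "palm"])
```

The posture recognizer would supply the terminal sequence frame by frame, and the grammar then decides which composite gesture, if any, the sequence spells out.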
5. Conclusions
- The parallel cascade structure based on Haar-like
features and the AdaBoost learning algorithm
can achieve satisfactory real-time hand posture
classification results.
- The experimental results show that the Viola and Jones
algorithm is very robust to scale changes
and has a certain degree of
robustness against in-plane rotation (15°) and
out-of-plane rotation.
- The Viola and Jones algorithm also shows good
performance under different illumination
conditions, but poor performance against different
backgrounds.
- A two-level architecture that captures the
hierarchical nature of gesture classification is
proposed: the lower level focuses on posture
recognition, while the higher level focuses on the
description of composite gestures using
grammar-based syntactic analysis.
Thank you