1
Real-Time Vision-Based Gesture Recognition Using
Haar-like Features

By Qing Chen, Nicolas D. Georganas and Emil M.
Petriu
IMTC 2007, Warsaw, Poland, May 1-3, 2007

2
Outline

1. Introduction
2. Two-level Approach
3. Posture Recognition
4. Gesture Recognition
5. Conclusions

3
1. Introduction

Human-Virtual Environment (VE) interaction
requires utilizing different modalities (e.g.
speech, body position, hand gestures, haptic
response, etc.) and integrating them together for
a more immersive user experience.
Hand gestures are an intuitive yet powerful
communication modality that has not been fully
explored for H-VE interaction.
The latest computer vision and image processing
techniques make real-time vision-based hand
gesture recognition feasible for human-computer
interaction.
A vision-based hand gesture recognition system
must meet requirements for real-time
performance, robustness and accurate
recognition.

4
1. Introduction (contd)

Vision-based gesture recognition techniques
can be divided into two categories:

Appearance-based approaches
- Pros: simple hand models, efficient
  implementation, real-time performance easier
  to achieve.
- Cons: limited capability to model 3D hand
  gestures.
- We choose this approach to achieve real-time
  performance.

3D hand model-based approaches
- Pros: potential to model more natural hand
  gestures.
- Cons: complex hand model, real-time
  performance is difficult, user-dependent.

5
2. Two-level Approach

Definition 1 (Posture/Pose): A posture or pose is
defined solely by the (static) hand
configuration and hand location.
Definition 2 (Gesture): A gesture is a series of
postures over a time span, connected by motions
(global hand motion and local finger motion).

6
2. Two-level Approach (contd)

With the hierarchical nature of the definition,
it is natural to decouple the gesture
classification problem into two levels:
- Lower level: recognition of primitives
  (postures). Solution: Viola and Jones
  algorithm.
- Higher level: recognition of structure
  (gestures). Solution: grammar-based analysis.

7
3. Posture Recognition

Viola and Jones Algorithm (2001)
- A statistical approach originally developed
  for human face detection and tracking.
- 15 times faster than any previous face
  detection approach while achieving accuracy
  equivalent to the best published results.
- Employs 3 techniques:
  - Haar-like features
  - Integral image
  - AdaBoost learning algorithm
- Issues for hand postures:
  - Applicability
  - Classification besides detection
  - Selection of posture sets
  - Calibration

8
3. Posture Recognition (contd)

Haar-like features
The value of a Haar-like feature is
$f(x) = \mathrm{Sum}_{\text{black rectangle}}(\text{pixel gray level}) - \mathrm{Sum}_{\text{white rectangle}}(\text{pixel gray level})$
Compared with raw pixels, Haar-like features can
reduce the in-class variability and increase the
out-of-class variability, thus making
classification easier.
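
A minimal sketch of the feature value, taken as the black-rectangle sum minus the white-rectangle sum. The image is assumed to be a grayscale list of rows, and the two-rectangle layout (white left half, black right half) and the function names are illustrative choices, not taken from the presentation.

```python
def rect_sum(img, top, left, height, width):
    """Sum of gray levels inside a rectangle, by direct summation."""
    return sum(img[r][c]
               for r in range(top, top + height)
               for c in range(left, left + width))


def edge_feature(img, top, left, height, width):
    """Two-rectangle feature: white left half, black right half (assumed layout).

    Returns Sum_black - Sum_white over the given window.
    """
    half = width // 2
    white = rect_sum(img, top, left, height, half)
    black = rect_sum(img, top, left + half, height, half)
    return black - white


# A 4x4 patch that is dark (0) on the left and bright (9) on the right.
patch = [[0, 0, 9, 9] for _ in range(4)]
value = edge_feature(patch, 0, 0, 4, 4)

# A uniform patch has no left/right contrast, so the feature is zero.
flat = [[5, 5, 5, 5] for _ in range(4)]
flat_value = edge_feature(flat, 0, 0, 4, 4)
```

The feature responds strongly to the vertical edge in `patch` and not at all to `flat`, which is the sense in which such features separate classes better than raw pixels.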

Figure 1: The set of basic Haar-like features.
Figure 2: The set of extended Haar-like features.

9
3. Posture Recognition (contd)

Rectangular Haar-like features can be computed
rapidly using the integral image.
The integral image at location (x, y) contains
the sum of the pixel values above and to the
left of (x, y), inclusive.
The sum of pixel values within a rectangle D can
be computed as $P_1 + P_4 - P_2 - P_3$, where
$P_1$ to $P_4$ are the integral image values at
the four corners of D.
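
The four-corner lookup can be sketched as follows. This is an illustrative version only: the figure that labels the corners is not in the transcript, so the mapping of P1 to P4 onto specific corners (P1 above-left, P2 above-right, P3 below-left, P4 bottom-right) is an assumption, as are the function names.

```python
def integral_image(img):
    """ii[y][x] = sum of img over all rows <= y and columns <= x (inclusive)."""
    height, width = len(img), len(img[0])
    ii = [[0] * width for _ in range(height)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom and columns left..right (inclusive).

    Uses P1 + P4 - P2 - P3; corners that fall outside the image count as 0.
    """
    def at(y, x):
        return ii[y][x] if y >= 0 and x >= 0 else 0

    p1 = at(top - 1, left - 1)   # above-left of the rectangle
    p2 = at(top - 1, right)      # above the top-right corner
    p3 = at(bottom, left - 1)    # left of the bottom-left corner
    p4 = at(bottom, right)       # bottom-right corner
    return p1 + p4 - p2 - p3


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
lower_right_block = rect_sum(ii, 1, 1, 2, 2)   # 5 + 6 + 8 + 9
```

Once the integral image is built, every rectangle sum costs four lookups regardless of the rectangle's size, which is what makes scanning many sub-windows affordable.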

10
3. Posture Recognition (contd)

To detect the hand, the image is scanned by a
sub-window containing a Haar-like feature.
Based on each Haar-like feature $f_j$, a weak
classifier $h_j(x)$ is defined as
$$h_j(x) = \begin{cases} 1 & \text{if } p_j f_j(x) < p_j \theta_j \\ 0 & \text{otherwise} \end{cases}$$
where $x$ is a sub-window, $\theta_j$ is a
threshold, and $p_j$ indicates the direction of
the inequality sign.
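
A minimal sketch of such a threshold classifier, assuming the usual Viola and Jones form in which a sub-window is accepted when the polarity times the feature value is below the polarity times the threshold. The function and parameter names are illustrative.

```python
def weak_classifier(feature_value, threshold, polarity):
    """Return 1 if polarity * f < polarity * threshold, else 0.

    polarity is +1 or -1 and flips the direction of the inequality.
    """
    return 1 if polarity * feature_value < polarity * threshold else 0


# With polarity +1 the classifier fires for small feature values,
# with polarity -1 it fires for large ones.
fires_low = weak_classifier(3, threshold=5, polarity=1)
fires_high = weak_classifier(7, threshold=5, polarity=-1)
```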

11
3. Posture Recognition (contd)

In machine vision:
- HARD to find a single accurate classification
  rule.
- EASY to find rules with classification
  accuracy slightly better than 50% (weak
  classifiers).
AdaBoost (Adaptive Boosting) is an iterative
algorithm that improves accuracy stage by stage
based on a series of weak classifiers.
Adaptive: later classifiers are tuned in favor
of the samples misclassified by previous
classifiers.

12
3. Posture Recognition (contd)

AdaBoost starts with a uniform distribution of
weights over the training examples. The weights
tell the learning algorithm the importance of
each example.
1. Obtain a weak classifier hj(x) from the weak
   learning algorithm.
2. Increase the weights on the training examples
   that were misclassified.
3. Repeat.

At the end, combine the weak classifiers
obtained at all iterations into a carefully
weighted linear combination.
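
The loop above can be sketched with one-dimensional threshold classifiers. The slides do not spell out the weight update or the combination weights, so the standard discrete AdaBoost formulas used here (alpha = 0.5 ln((1 - error) / error), exponential re-weighting) are an assumption, as are all names and the toy data.

```python
import math


def train_stump(xs, ys, weights):
    """Pick the (threshold, polarity) with the lowest weighted error.

    xs are scalar feature values, ys are labels in {0, 1}.
    """
    best = None
    for threshold in sorted(set(xs)) + [max(xs) + 1]:
        for polarity in (1, -1):
            preds = [1 if polarity * x < polarity * threshold else 0 for x in xs]
            error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Return a list of (alpha, threshold, polarity) weak classifiers."""
    n = len(xs)
    weights = [1.0 / n] * n                        # uniform start
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weights of misclassified examples, lower the others.
        for i, (x, y) in enumerate(zip(xs, ys)):
            pred = 1 if polarity * x < polarity * threshold else 0
            weights[i] *= math.exp(alpha if pred != y else -alpha)
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of the weak classifiers (votes mapped to -1/+1)."""
    score = sum(alpha * (1 if polarity * x < polarity * threshold else -1)
                for alpha, threshold, polarity in ensemble)
    return 1 if score > 0 else 0


# Toy 1-D data that no single threshold separates.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 0, 0, 1, 1]
model = adaboost(xs, ys, rounds=3)
predictions = [predict(model, x) for x in xs]
```

No single threshold classifies this toy set correctly, but the weighted vote of three of them does, which illustrates why re-weighting the misclassified examples matters.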

13
3. Posture Recognition (contd)

A series of classifiers is applied to every
sub-window.
The first classifier eliminates a large number
of negative sub-windows and passes almost all
positive sub-windows (high false positive rate)
with very little processing.
Subsequent layers eliminate additional negative
sub-windows (passed by the first classifier) but
require more computation.
After several stages of processing, the number
of negative sub-windows has been reduced
radically.
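
The early-rejection behaviour can be sketched as below. The stage functions are hypothetical stand-ins (simple tests on a list of numbers), not trained boosted classifiers; only the control flow is the point.

```python
def run_cascade(stages, window):
    """Apply stages in order; reject as soon as one stage says 'negative'.

    Each stage is a function window -> bool.
    Returns (accepted, number_of_stages_evaluated).
    """
    for index, stage in enumerate(stages, start=1):
        if not stage(window):
            return False, index      # early rejection, later stages skipped
    return True, len(stages)


# Hypothetical stages: a cheap, permissive test first, a stricter one second.
stages = [
    lambda w: sum(w) > 10,
    lambda w: max(w) - min(w) > 5,
]

rejected_early = run_cascade(stages, [1, 1, 1])
rejected_late = run_cascade(stages, [5, 5, 5])
accepted = run_cascade(stages, [0, 5, 10])
```

Most sub-windows in a frame are negatives, so rejecting them at the first, cheapest stage is where the speed of the cascade comes from.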

14
3. Posture Recognition (contd)

Four hand postures have been tested with the
Viola and Jones algorithm.

Input device: a low-cost Logitech QuickCam
web camera with a resolution of 320 x 240 at up
to 15 frames per second.

15
3. Posture Recognition (contd)

Training sample collection
- Negative samples: images that must not contain
  object representations. We collected 500
  random images as negative samples.
- Positive samples: hand posture images that are
  collected from human hands or generated with a
  3D hand model. For each posture, we collected
  around 450 positive samples. For the initial
  test, we used a white wall as the background.

16
3. Posture Recognition (contd)

After the training process based on the
AdaBoost learning algorithm, we get a cascade
classifier for each hand posture when the
required accuracy is achieved:
- Two-finger posture: 15-stage cascade
  classifier
- Palm posture: 10-stage cascade classifier
- Fist posture: 15-stage cascade classifier
- Little finger posture: 14-stage cascade
  classifier
The performance of trained classifiers for 100
testing images

17
3. Posture Recognition (contd)

To recognize these different hand postures, a
parallel structure that includes all of the
cascade classifiers is implemented.
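
One way to picture the parallel structure: the same sub-window is handed to every posture's cascade, and the names of the cascades that accept it are collected. The sketch below uses hypothetical stand-in classifiers that inspect a made-up `extended_fingers` field instead of pixels, so it shows only the dispatch logic, not real detection.

```python
def recognize_posture(classifiers, window):
    """Run every posture's classifier on the same window.

    classifiers maps a posture name to a function window -> bool.
    Returns the names of all postures whose classifier accepts the window.
    """
    return [name for name, accepts in classifiers.items() if accepts(window)]


# Stand-ins for the four trained cascades (hypothetical decision rules).
classifiers = {
    "two-finger": lambda w: w["extended_fingers"] == 2,
    "palm": lambda w: w["extended_fingers"] == 5,
    "fist": lambda w: w["extended_fingers"] == 0,
    "little finger": lambda w: w["extended_fingers"] == 1,
}

labels = recognize_posture(classifiers, {"extended_fingers": 5})
```

Returning a list rather than a single label leaves room for the case where no cascade, or more than one, accepts a window; how such cases are resolved is not stated in the slides.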

18
3. Posture Recognition (contd)

The real-time performance of the posture
recognition

19
4. Gesture Recognition

As a gesture is a series of postures, a
grammar-based syntactic analysis is suitable for
describing composite gestures in terms of
postures, and thus enables the system to
recognize gestures based on their
representations.
For pattern recognition, a grammar is a tuple
$G = (N, T, P, S)$:
- A finite set N of non-terminal symbols
- A finite set T of terminal symbols that is
  disjoint from N
- A finite set P of production rules
- A distinguished symbol $S \in N$ that is the
  start symbol
Issues in modeling the structure of hand
gestures:
- Choice of basic primitives
- Choice of appropriate grammar type (context
  free, stochastic context free, regular, HMM)
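
A minimal sketch of how postures could serve as terminal symbols of such a grammar. The gesture names ("Grasp", "Release") and the production rules are hypothetical and not taken from the presentation. The recognizer is a naive recursive matcher that is adequate only for small, non-recursive grammars like this one.

```python
from itertools import groupby

# Hypothetical grammar G = (N, T, P, S); names and rules are illustrative.
N = {"Gesture", "Grasp", "Release"}
T = {"palm", "fist"}
P = {
    "Gesture": [["Grasp"], ["Release"]],
    "Grasp": [["palm", "fist"]],
    "Release": [["fist", "palm"]],
}
S = "Gesture"


def derives(symbol, tokens):
    """True if `symbol` can derive exactly the token list."""
    if symbol in T:
        return tokens == [symbol]
    return any(matches(rhs, tokens) for rhs in P[symbol])


def matches(rhs, tokens):
    """True if the sequence of symbols `rhs` can derive `tokens`."""
    if not rhs:
        return not tokens
    head, rest = rhs[0], rhs[1:]
    # Try every split between what `head` derives and the remainder.
    return any(derives(head, tokens[:i]) and matches(rest, tokens[i:])
               for i in range(len(tokens) + 1))


def collapse(frames):
    """Merge runs of identical per-frame posture labels into one token."""
    return [label for label, _ in groupby(frames)]


# Per-frame posture labels, as the lower level might emit them.
frames = ["palm", "palm", "palm", "fist", "fist"]
is_gesture = derives(S, collapse(frames))
```

Collapsing repeated frame labels first keeps the grammar independent of how long each posture is held.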

20
5. Conclusions

The parallel cascade structure based on
Haar-like features and the AdaBoost learning
algorithm can achieve satisfactory real-time
hand posture classification results.
The experimental results show that the Viola and
Jones algorithm is very robust to changes in
scale and has a certain degree of robustness
against in-plane rotation (15°) and out-of-plane
rotation.
The Viola and Jones algorithm also shows good
performance under different illumination
conditions, but poor performance against
different backgrounds.
A two-level architecture that can capture the
hierarchical nature of gesture classification is
proposed: the lower level focuses on posture
recognition, while the higher level focuses on
the description of composite gestures using
grammar-based syntactic analysis.