Real-Time Vision-Based Gesture Recognition Using Haar-like Features

1
Real-Time Vision-Based Gesture Recognition Using
Haar-like Features
  • By Qing Chen, Nicolas D. Georganas and Emil M.
    Petriu
  • IMTC 2007, Warsaw, Poland, May 1-3, 2007

2
Outline
  • 1. Introduction
  • 2. Two-level Approach
  • 3. Posture Recognition
  • 4. Gesture Recognition
  • 5. Conclusions

3
1. Introduction
  • Human-Virtual Environment (VE) interaction
    requires utilizing different modalities (e.g.
    speech, body position, hand gestures, haptic
    response, etc.) and integrating them together for
    a more immersive user experience.
  • Hand gestures are an intuitive yet powerful
    communication modality that has not been fully
    explored for H-VE interaction.
  • Recent computer vision and image processing
    techniques make real-time vision-based hand
    gesture recognition feasible for human-computer
    interaction.
  • A vision-based hand gesture recognition system
    must meet requirements for real-time
    performance, robustness and recognition
    accuracy.

4
1. Introduction (cont'd)
  • Vision-based gesture recognition techniques
    can be divided into two categories:
  • Appearance-based approaches
    - Pros: simple hand models; efficient
      implementation; real-time performance is easier
      to achieve.
    - Cons: limited capability to model 3D hand
      gestures.
    - We choose this approach to achieve real-time
      performance.
  • 3D hand model-based approaches
    - Pros: potential to model more natural hand
      gestures.
    - Cons: complex hand model; real-time performance
      is difficult; user-dependent.

5
2. Two-level Approach
  • Definition 1 (Posture/Pose): A posture or pose is
    defined solely by the (static) hand
    configurations and hand locations.
  • Definition 2 (Gesture): A gesture is a series of
    postures over a time span connected by motions
    (global hand motion and local finger motion).

6
2. Two-level Approach (cont'd)
  • With the hierarchical nature of the definition,
    it is natural to decouple the gesture
    classification problem into two levels:
  • Lower level: recognition of primitives
    (postures)
    - Solution: Viola and Jones algorithm
  • Higher level: recognition of structure (gestures)
    - Solution: grammar-based analysis

7
3. Posture Recognition
  • Viola and Jones algorithm (2001)
    - A statistical approach originally developed for
      human face detection and tracking.
    - 15 times faster than any previous face detection
      approach while achieving accuracy equivalent to
      the best published results.
  • Employs three techniques:
    - Haar-like features
    - Integral image
    - AdaBoost learning algorithm
  • Issues for hand postures:
    - Applicability
    - Classification besides detection
    - Selection of posture sets
    - Calibration

8
3. Posture Recognition (cont'd)
  • Haar-like features
  • The value of a Haar-like feature:
    $f(x) = \mathrm{Sum}_{\text{black rectangle}}(\text{pixel gray level}) - \mathrm{Sum}_{\text{white rectangle}}(\text{pixel gray level})$
  • Compared with raw pixels, Haar-like features can
    reduce the in-class variability and increase the
    out-of-class variability, thus making
    classification easier.
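
As a rough illustration of the feature value described above, the sketch below evaluates a two-rectangle (edge) feature on a tiny grayscale patch stored as a list of rows. The function names `rect_sum` and `haar_edge_feature`, and the layout with the white rectangle on the left and the black one on the right, are assumptions made for this example and do not come from the slides.

```python
def rect_sum(img, top, left, height, width):
    """Sum of gray levels inside a rectangle, by direct summation."""
    return sum(img[r][c]
               for r in range(top, top + height)
               for c in range(left, left + width))


def haar_edge_feature(img, top, left, height, width):
    """Two-rectangle feature: sum(black half) - sum(white half).

    Assumed layout: white rectangle on the left, black rectangle on the
    right, each `width` pixels wide and `height` pixels tall.
    """
    white = rect_sum(img, top, left, height, width)
    black = rect_sum(img, top, left + width, height, width)
    return black - white


# Toy 2x4 patch: low gray levels on the left, high on the right.
patch = [[10, 10, 200, 200],
         [10, 10, 200, 200]]
edge_value = haar_edge_feature(patch, 0, 0, 2, 2)

# A uniform patch has no edge, so the two sums cancel.
flat = [[50, 50, 50, 50],
        [50, 50, 50, 50]]
flat_value = haar_edge_feature(flat, 0, 0, 2, 2)
```

A large magnitude signals a strong intensity change between the two halves, while a uniform region yields zero.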

Figure 1: The set of basic Haar-like features.
Figure 2: The set of extended Haar-like features.
9
3. Posture Recognition (cont'd)
  • The rectangular Haar-like features can be computed
    rapidly using the integral image.
  • The integral image at location (x, y) contains the
    sum of the pixel values above and to the left of
    (x, y), inclusive.
  • The sum of pixel values within rectangle D can be
    computed from the integral image values at its
    four corner points as $P_1 + P_4 - P_2 - P_3$.
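
The integral image idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the names `integral_image` and `rect_sum` are hypothetical, and the assignment of P1 to P4 to specific corners follows the usual Viola and Jones convention, which is assumed here because the original figure is not part of the transcript.

```python
def integral_image(img):
    """ii[y][x] = sum of img[r][c] for all r <= y and c <= x (inclusive)."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom and columns left..right (inclusive).

    Four lookups: P1 + P4 - P2 - P3, with (assumed convention)
    P1 just above-left of the rectangle, P2 above its right edge,
    P3 left of its bottom edge, and P4 at its bottom-right corner.
    """
    def at(y, x):
        # Positions outside the image contribute zero.
        return ii[y][x] if y >= 0 and x >= 0 else 0

    p1 = at(top - 1, left - 1)
    p2 = at(top - 1, right)
    p3 = at(bottom, left - 1)
    p4 = at(bottom, right)
    return p1 + p4 - p2 - p3


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
lower_right = rect_sum(ii, 1, 1, 2, 2)   # pixels 5, 6, 8, 9
```

Once the integral image is built in a single pass, any rectangle sum costs four lookups regardless of the rectangle's size, which is what makes scanning many sub-windows affordable.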

10
3. Posture Recognition (cont'd)
  • To detect the hand, the image is scanned by a
    sub-window containing a Haar-like feature.
  • Based on each Haar-like feature $f_j$, a weak
    classifier $h_j(x)$ is defined as
    $h_j(x) = 1$ if $p_j f_j(x) < p_j \theta_j$, and $0$ otherwise,
    where x is a sub-window, $\theta_j$ is a threshold, and
    $p_j$ indicates the direction of the inequality sign.
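
A weak classifier of this kind is essentially a thresholded feature value. The sketch below assumes the usual Viola and Jones form, in which the classifier outputs 1 when p * f(x) < p * theta and 0 otherwise; the function name `weak_classifier` and the example numbers are illustrative only.

```python
def weak_classifier(feature_value, threshold, polarity):
    """Return 1 if polarity * f < polarity * threshold, else 0.

    `polarity` is +1 or -1 and flips the direction of the inequality:
    +1 accepts values below the threshold, -1 accepts values above it.
    """
    return 1 if polarity * feature_value < polarity * threshold else 0


below = weak_classifier(5, 10, 1)     # 5 < 10, accepted
above = weak_classifier(15, 10, 1)    # 15 < 10 is false, rejected
flipped = weak_classifier(15, 10, -1) # -15 < -10, accepted
```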

11
3. Posture Recognition (cont'd)
  • In machine vision:
    - It is HARD to find a single accurate
      classification rule.
    - It is EASY to find rules with classification
      accuracy slightly better than 50% (weak
      classifiers).
  • AdaBoost (Adaptive Boosting) is an iterative
    algorithm that improves accuracy stage by stage
    based on a series of weak classifiers.
  • Adaptive: later classifiers are tuned in favor
    of the samples misclassified by previous
    classifiers.

12
3. Posture Recognition (cont'd)
  • AdaBoost starts with a uniform distribution of
    weights over the training examples. The weights
    tell the learning algorithm the importance of
    each example.
  • Obtain a weak classifier, $h_j(x)$, from the weak
    learning algorithm.
  • Increase the weights on the training examples
    that were misclassified.
  • (Repeat)
  • At the end, form a weighted linear combination
    of the weak classifiers obtained at all
    iterations.
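
The steps above can be sketched as a small discrete AdaBoost loop over threshold classifiers on a single feature value. This is a textbook-style illustration under stated assumptions, not the training code used by the authors: labels are taken to be +1/-1, the weak learner is an exhaustive search over thresholds, and the names `best_stump`, `adaboost` and `predict` are hypothetical.

```python
import math


def best_stump(xs, ys, weights):
    """Pick the (threshold, polarity) stump with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [1 if polarity * x < polarity * threshold else -1
                     for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train on 1-D feature values xs with labels ys in {+1, -1}."""
    n = len(xs)
    weights = [1.0 / n] * n                     # uniform initial weights
    ensemble = []                               # (alpha, threshold, polarity)
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weights of misclassified examples, lower the others.
        for i, (x, y) in enumerate(zip(xs, ys)):
            pred = 1 if polarity * x < polarity * threshold else -1
            weights[i] *= math.exp(-alpha * y * pred)
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * (1 if p * x < p * t else -1)
                for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data: small feature values are positives, large ones negatives.
xs = [1, 2, 3, 6, 7, 8]
ys = [1, 1, 1, -1, -1, -1]
model = adaboost(xs, ys, rounds=3)
```

Each round contributes one stump together with a weight `alpha` that grows as the stump's weighted error shrinks, so more reliable weak classifiers get a larger say in the final vote.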

13
3. Posture Recognition (cont'd)
  • A series of classifiers is applied to every
    sub-window.
  • The first classifier:
    - Eliminates a large number of negative
      sub-windows.
    - Passes almost all positive sub-windows (at the
      cost of a high false positive rate) with very
      little processing.
  • Subsequent layers eliminate additional negative
    sub-windows (those passed by the first classifier)
    but require more computation.
  • After several stages of processing, the number of
    negative sub-windows has been reduced radically.
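
The early-rejection behavior of such a cascade can be sketched as follows. The stage functions here (`cheap` and `costly`) are toy stand-ins invented for this example, each returning a score that is compared against a per-stage threshold; real stages would be boosted combinations of Haar-like features.

```python
def cascade_classify(stages, window):
    """Return True only if every stage accepts the window.

    `stages` is a list of (score_fn, threshold) pairs ordered from the
    cheapest stage to the most expensive one. Evaluation stops at the
    first rejection, so most negative windows cost very little.
    """
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False        # rejected early; later stages never run
    return True


calls = []                      # records which stages actually ran


def cheap(window):
    calls.append("cheap")
    return sum(window) / len(window)      # mean gray level


def costly(window):
    calls.append("costly")
    return max(window) - min(window)      # contrast


stages = [(cheap, 50), (costly, 100)]

rejected = cascade_classify(stages, [0, 10, 20])
calls_after_reject = list(calls)

calls.clear()
accepted = cascade_classify(stages, [0, 100, 200])
calls_after_accept = list(calls)
```

In the rejected case only the cheap stage runs, which mirrors how the first layers of the cascade discard most negative sub-windows before the expensive layers are reached.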

14
3. Posture Recognition (cont'd)
  • Four hand postures have been tested with the Viola
    and Jones algorithm.
  • Input device: a low-cost Logitech QuickCam
    web camera with a resolution of 320 × 240 at up to
    15 frames per second.

15
3. Posture Recognition (cont'd)
  • Training sample collection
  • Negative samples: images that must not contain
    object representations. We collected 500 random
    images as negative samples.
  • Positive samples: hand posture images that are
    collected from human hands or generated with a
    3D hand model. For each posture, we collected
    around 450 positive samples. For the initial test,
    we used a white wall as the background.

16
3. Posture Recognition (cont'd)
  • After the training process based on the
    AdaBoost learning algorithm, we get a cascade
    classifier for each hand posture once the
    required accuracy is achieved:
    - Two-finger posture: 15-stage cascade
      classifier
    - Palm posture: 10-stage cascade classifier
    - Fist posture: 15-stage cascade classifier
    - Little finger posture: 14-stage cascade
      classifier
  • The performance of the trained classifiers on 100
    testing images

17
3. Posture Recognition (cont'd)
  • To recognize these different hand postures, a
    parallel structure that includes all of the
    cascade classifiers is implemented.
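
One way to picture such a parallel structure is to run every posture classifier on the same sub-window and report whichever one accepts it. The slides do not say how conflicts between classifiers are resolved, so picking the highest score is an assumption of this sketch, as are the function name `recognize_posture` and the toy classifiers, which stand in for trained cascades.

```python
def recognize_posture(classifiers, window):
    """Run every posture classifier on the window; return the best match.

    `classifiers` maps a posture name to a function returning
    (accepted, score). Returns None if no classifier accepts.
    Assumption: when several accept, the highest score wins.
    """
    hits = {}
    for name, classify in classifiers.items():
        accepted, score = classify(window)
        if accepted:
            hits[name] = score
    if not hits:
        return None
    return max(hits, key=hits.get)


# Toy stand-ins for trained cascades: the "window" is just a dict with a
# made-up finger count instead of pixel data.
classifiers = {
    "palm": lambda w: (w["open_fingers"] == 5, 0.9),
    "fist": lambda w: (w["open_fingers"] == 0, 0.8),
    "two-finger": lambda w: (w["open_fingers"] == 2, 0.7),
}

label = recognize_posture(classifiers, {"open_fingers": 0})
no_match = recognize_posture(classifiers, {"open_fingers": 3})
```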

18
3. Posture Recognition (cont'd)
  • The real-time performance of the posture
    recognition

19
4. Gesture Recognition
  • As a gesture is a series of postures, a
    grammar-based syntactic analysis is suitable for
    describing composite gestures in terms of
    postures, and thus enables the system to
    recognize gestures based on their
    representations.
  • For pattern recognition, a grammar G = (N, T, P,
    S) consists of:
    - A finite set N of non-terminal symbols
    - A finite set T of terminal symbols that is
      disjoint from N
    - A finite set P of production rules
    - A distinguished symbol S ∈ N that is the start
      symbol.
  • Issues in modeling the structure of hand
    gestures:
    - Choice of basic primitives
    - Choice of appropriate grammar type (context-free,
      stochastic context-free, regular, HMM)
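
To make the grammar definition concrete, the sketch below encodes a tiny regular (right-linear) grammar whose terminals are postures and checks whether a posture sequence belongs to it. The gesture it describes (one or more "palm" frames followed by a "fist") and all names in the code are hypothetical; the slides do not specify the actual productions or grammar type used.

```python
# Grammar G = (N, T, P, S) written as plain Python data.
N = {"S", "A"}                      # non-terminal symbols
T = {"palm", "fist"}                # terminal symbols (recognized postures)
P = {                               # right-linear production rules
    "S": [("palm", "A")],                   # S -> palm A
    "A": [("palm", "A"), ("fist", None)],   # A -> palm A | fist
}
START = "S"                         # start symbol, a member of N


def accepts(sequence, symbol=START):
    """True if the posture sequence can be derived from `symbol`."""
    if not sequence:
        return False
    head, rest = sequence[0], sequence[1:]
    for terminal, next_symbol in P.get(symbol, []):
        if terminal != head:
            continue
        if next_symbol is None:
            if not rest:            # the rule ends the derivation here
                return True
        elif accepts(rest, next_symbol):
            return True
    return False


is_gesture = accepts(["palm", "palm", "fist"])
not_gesture = accepts(["palm", "fist", "palm"])
```

Here the posture classifiers would supply the terminal symbols frame by frame, and the grammar decides whether the resulting sequence forms a valid composite gesture.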

20
5. Conclusions
  • The parallel cascade structure based on Haar-like
    features and the AdaBoost learning algorithm
    can achieve satisfactory real-time hand posture
    classification results.
  • The experimental results show that the Viola and
    Jones algorithm is very robust to changes in
    scale and has a certain degree of
    robustness against in-plane rotation (15°) and
    out-of-plane rotation.
  • The Viola and Jones algorithm also shows good
    performance under different illumination
    conditions, but poor performance against different
    backgrounds.
  • A two-level architecture that can capture the
    hierarchical nature of gesture classification is
    proposed: the lower level focuses on posture
    recognition, while the higher level focuses on the
    description of composite gestures using
    grammar-based syntactic analysis.

21
Thank you (Dziękuję)!