Gesture Recognition - PowerPoint PPT Presentation

Title: Gesture Recognition


1
Gesture Recognition
  • Martin Stein
  • 26/5/04

2
Gesture Recognition
  • Markov Models
  • American Sign Language Recognition Using Hidden
    Markov Models (HMMs)
  • Gesture Recognition Using Finite State Machines
    (FSMs)
  • Conclusion

3
I. Markov Models
  • Hidden Markov Models (HMMs)
  • 3 typical problems
  • HMM topologies

4
a) Hidden Markov Models
  • tuple λ = (S, π, A, K, B)
  • S = {S0, S1, ..., SN} - states
  • π = {p0, p1, ..., pN} - initial distribution
  • A = (aij) - transition probabilities
  • K = {o1, o2, ..., oN} - output signs
  • B = {b0(oi), b1(oi), ..., bN(oi)} - emission
    probabilities
  • discrete in time
  • state transition probability constant

5
Weather example
  • S = {high-pressure, low-pressure}
  • π: 1.00
  • A: [transition diagram not preserved in the transcript]
  • K = {sunshine, rain}
  • B: P(sunshine | high) = 0.8, P(rain | high) = 0.2,
    P(sunshine | low) = 0.3, P(rain | low) = 0.7

6
Weather example
  • Output sequence
  • O = (sunshine, rain, sunshine)
  • State sequence X = ???

7
b) 3 typical problems
  • Evaluation
  • Decoding
  • Training

8
Evaluation
  • Given HMM λ, output sequence O
  • What is P(O | λ)?

Trivial algorithm: O(N^T)
Forward algorithm: O(N^2 T)
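
The two evaluation strategies on this slide can be compared with a minimal sketch on the weather example. The emission values come from the slides; the transition matrix A is hypothetical (it is not preserved in the transcript), and the initial distribution (1.0, 0.0) is an assumed reading of the "1.00" on slide 5. The state and output index encodings are also illustrative choices.

```python
from itertools import product

def forward(pi, A, B, obs):
    """P(O | lambda) by the forward algorithm, O(N^2 * T)."""
    n = len(pi)
    # alpha[i] = P(o_1..o_t, X_t = i | lambda)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

def brute_force(pi, A, B, obs):
    """P(O | lambda) by summing over all N^T state sequences."""
    n, total = len(pi), 0.0
    for path in product(range(n), repeat=len(obs)):
        p = pi[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total

# States: 0 = high pressure, 1 = low pressure. Outputs: 0 = sunshine, 1 = rain.
PI = [1.0, 0.0]                    # assumption: always start in high pressure
A = [[0.7, 0.3], [0.4, 0.6]]       # assumption: hypothetical transition matrix
B = [[0.8, 0.2], [0.3, 0.7]]       # emission probabilities from the slides
OBS = [0, 1, 0]                    # (sunshine, rain, sunshine)

print(forward(PI, A, B, OBS))      # about 0.1568 under these assumed values
```

Both functions return the same probability; the forward version reuses partial sums instead of enumerating every state sequence.
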
9
Decoding
  • Given HMM λ, output sequence O
  • Most probable state sequence?

Viterbi algorithm: alignment of outputs to states
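
A minimal sketch of Viterbi decoding for the weather example follows. As before, only the emission values are taken from the slides; the transition matrix and the initial distribution are assumptions chosen for illustration.

```python
def viterbi(pi, A, B, obs):
    """Return (most probable state sequence, its joint probability with obs)."""
    n = len(pi)
    # delta[i] = probability of the best path that ends in state i
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backptr = []
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        backptr.append(step)
        delta = new_delta
    # Trace the best path backwards from the best final state.
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for step in reversed(backptr):
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]

# States: 0 = high pressure, 1 = low pressure. Outputs: 0 = sunshine, 1 = rain.
PI = [1.0, 0.0]                    # assumption
A = [[0.7, 0.3], [0.4, 0.6]]       # assumption: hypothetical transition matrix
B = [[0.8, 0.2], [0.3, 0.7]]       # emission probabilities from the slides

path, prob = viterbi(PI, A, B, [0, 1, 0])
print(path, prob)                  # [0, 0, 0] with probability about 0.06272
```

The returned path assigns one state to every output, which is the alignment the slide refers to.
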
10
Training
  • Given HMM λ = (S, π, A, K, B), output sequence O
  • Maximize P(O | λ)... How?
  • Global optimum → inefficient
  • Local optimum:
  • Forward-backward algorithm
  • Baum-Welch re-estimator
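
One Baum-Welch re-estimation step can be sketched as below for a discrete-output HMM and a single observation sequence. This is an illustration only: the starting parameters and the observation sequence are made up, and there is no scaling, so it is suitable for short sequences only.

```python
def forward_backward(pi, A, B, obs):
    """Return forward and backward tables and P(O | lambda)."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j]
                              for i in range(n)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return alpha, beta, sum(alpha[T - 1])

def baum_welch_step(pi, A, B, obs):
    """One re-estimation; returns new (pi, A, B) and the old likelihood."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, p_obs = forward_backward(pi, A, B, obs)
    # gamma[t][i]: probability of being in state i at time t, given O
    gamma = [[alpha[t][i] * beta[t][i] / p_obs for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: probability of the transition i -> j at time t, given O
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / p_obs
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, p_obs

# Hypothetical starting point and data (0 = sunshine, 1 = rain).
pi, A, B = [0.6, 0.4], [[0.7, 0.3], [0.4, 0.6]], [[0.8, 0.2], [0.3, 0.7]]
obs = [0, 1, 0, 0, 1, 1, 0]
likelihoods = []
for _ in range(5):
    pi, A, B, p = baum_welch_step(pi, A, B, obs)
    likelihoods.append(p)
print(likelihoods)  # non-decreasing: each step climbs toward a local optimum
```

The likelihood never decreases from one step to the next, which is why the procedure reaches a local rather than a global optimum.
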

11
c) HMM topologies
  • Ergodic
  • Left-to-Right

12
II. ASL Recognition (HMMs)
  • What is American Sign Language?
  • Related work
  • HMM approach (MIT,1996)
  • HMM topology
  • System overview
  • Feature extraction
  • Desk-based recognizer
  • Wearable-based recognizer

13
a) What is American Sign Language?
  • ≈ 6000 gestures
  • Finger spelling
  • Pace of spoken conversation
  • Eyebrows

14
b) Related work
  • 1973, Gunnar Johansson
  • Human gestures can be recognized solely by motion
    information.
  • 1985, Sperling et al.
  • Isolated signs remain intelligible when
    subsampled to 24x16 pixels

15
b) Related work
  • System
  • Instrumented gloves
  • Desktop-based camera systems
  • Methods
  • Template matching
  • Neural nets
  • Model-based approach

16
b) Related work
  • HMMs used successfully in
  • speech recognition
  • handwriting recognition
  • Since '95: several HMM-based recognizers
    demonstrated

17
c) HMM approach (MIT,1996)
  • Thad Starner
  • Joshua Weaver
  • Alex Pentland
  • 1 colour camera
  • Unadorned hands
  • Real time

18
c) HMM approach (MIT,1996)
  • Part-of-speech grammar
  • pronoun, verb, noun, adjective, pronoun

19
d) HMM topology
  • Estimated number of different states: 5

→ To handle less complicated signs: add skip
transitions
→ Empirical fine-tuning: 4-state HMM, 1 skip
transition
20
e) System overview
  • Goal: widely usable real-time system without
    constraints
  • Two different mounting locations
  • 320 x 240 pixels
  • colour
  • 10 fps

21
e) System overview
  • Second-person viewpoint
  • First-person viewpoint

22
f) Feature extraction
  • Algorithm for hand segmentation
  • Scan image for a skin-coloured pixel
  • Grow region by checking neighbours
  • Use centroid as seed for next frame
  • ↓
  • Two blobs
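
The segmentation steps above can be sketched as region growing on a binary skin mask. The mask is assumed to come from a separate skin-colour classifier that is not shown, and the use of 4-connected neighbours is an assumption; the slides do not state the neighbourhood.

```python
from collections import deque

def grow_region(skin, seed):
    """Collect all skin pixels connected to seed (4-neighbourhood)."""
    h, w = len(skin), len(skin[0])
    if not skin[seed[0]][seed[1]]:
        return []
    seen, queue, region = {seed}, deque([seed]), []
    while queue:
        r, c = queue.popleft()
        region.append((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and skin[nr][nc]
                    and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return region

def centroid(region):
    """Mean (row, column) of a blob; used as the seed for the next frame."""
    n = len(region)
    return (sum(r for r, _ in region) / n, sum(c for _, c in region) / n)

# Toy mask with two blobs (1 = skin-coloured pixel).
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
]
blob = grow_region(mask, (1, 1))
print(sorted(blob), centroid(blob))  # four pixels, centroid (1.5, 1.5)
```
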

23
f) Feature extraction
  • Two blobs

→ Second moment analysis
  • ↓
  • Feature vector
  • hand's x,y position
  • change in x,y between frames
  • area/size (in pixels)
  • angle of axis of least inertia
  • length of eigenvector
  • eccentricity of bounding ellipse
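
A minimal sketch of second moment analysis for one blob is given below. The eccentricity formula is the standard ellipse definition, and reading "length of eigenvector" as the square root of the larger eigenvalue is an assumption; the slides give no formulas for either.

```python
import math

def blob_features(pixels, prev_centroid=None):
    """Features of one blob, given as a list of (x, y) pixel coordinates."""
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    # Central second moments (entries of the 2x2 covariance matrix).
    a = sum((x - mx) ** 2 for x, _ in pixels) / n
    b = sum((x - mx) * (y - my) for x, y in pixels) / n
    c = sum((y - my) ** 2 for _, y in pixels) / n
    # Eigenvalues of [[a, b], [b, c]].
    half_diff = math.hypot((a - c) / 2, b)
    lam_max = (a + c) / 2 + half_diff
    lam_min = (a + c) / 2 - half_diff
    # The axis of least inertia points along the major eigenvector.
    angle = 0.5 * math.atan2(2 * b, a - c)
    ecc = math.sqrt(max(0.0, 1 - lam_min / lam_max)) if lam_max > 0 else 0.0
    dx, dy = ((mx - prev_centroid[0], my - prev_centroid[1])
              if prev_centroid else (0.0, 0.0))
    return {"x": mx, "y": my, "dx": dx, "dy": dy, "area": n,
            "angle": angle, "axis_length": math.sqrt(lam_max),
            "eccentricity": ecc}

bar = [(x, 0) for x in range(5)]          # horizontal 5-pixel bar
f = blob_features(bar, prev_centroid=(1.0, 0.0))
print(f["area"], f["angle"], f["eccentricity"])  # 5 0.0 1.0
```
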

24
g) Desk-based recognizer
  • 384 training sentences
  • 94 test sentences
  • Training
  • each sign → separate HMM
  • train output probabilities (means, variances)

25
Training
  • Divide sentence into 5 equal portions
  • ↓
  • Use Viterbi alignment
  • → initial estimates for means and variances

→ Baum-Welch re-estimator → optimized means and
variances
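
The first step, splitting a sentence into equal portions to obtain starting statistics, might look like the sketch below. It assumes a one-dimensional feature per frame for brevity (the real feature vector has several components), and the function names are hypothetical. The subsequent Viterbi alignment and Baum-Welch refinement are not shown.

```python
def equal_segments(frames, k):
    """Split a frame sequence into k contiguous, nearly equal portions."""
    n = len(frames)
    bounds = [i * n // k for i in range(k + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(k)]

def mean_var(values):
    """Mean and (population) variance of a non-empty list of numbers."""
    m = sum(values) / len(values)
    return m, sum((v - m) ** 2 for v in values) / len(values)

# Hypothetical 1-D feature track for one 5-word sentence (10 frames).
frames = [1.0, 3.0, 2.0, 2.0, 5.0, 7.0, 4.0, 4.0, 9.0, 9.0]
initial = [mean_var(seg) for seg in equal_segments(frames, 5)]
print(initial)  # one (mean, variance) pair per portion
```

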
26
Testing
  • Concatenate all HMMs in all combinations
  • Calculate P(O | λ)
  • Recognize sequence with highest probability

27
Testing
  • [Figure: candidate word paths with probabilities;
    labels: You, want, lose, box 0.117, paper 0.165,
    box 0.086]

28
g) Desk-based recognizer
29
h) Wearable-based recognizer
  • 400 training sentences
  • 100 test sentences
  • New grammar added (5-word restriction)
  • Signer is to look forward

30
h) Wearable-based recognizer
31
III. Gesture Recognition (FSM)
  • Finite State Machine approach
  • Modelling using FSMs
  • Training the gesture model
  • Recognition
  • Results

32
a) FSM approach
  • Pengyu Hong
  • Matthew Turk
  • Thomas S. Huang
  • Goal
  • Real-time gesture recognizer
  • Technique to segment and align data automatically

University of Illinois at Urbana-Champaign, Microsoft
Research
33
b) Modelling using FSMs
  • Feature extraction
  • Real-time skin-colour tracking algorithm
  • ↓
  • 2D positions of face and hands
  • ↓
  • Trajectories of the hands relative to the head
  • Training data: observing a repeated gesture
    several times

34
b) Modelling using FSMs
  • Gesture: ordered sequence of states
  • state s = <μs, Σs, ds, Tmin,s, Tmax,s>
  • μs: 2D centroid
  • Σs: spatial covariance matrix
  • ds: distance threshold
  • [Tmin,s, Tmax,s]: duration interval

35
c) Training the gesture model
  • Decouple temporal information from spatial
    information
  • ↓
  • learn spatial information
  • ↓
  • incorporate the temporal data
  • ↓
  • refine spatial information

36
1. Spatial clustering
  • Define a threshold for the spatial variance
  • ↓
  • Begin with a model of two states
  • ↓
  • Train with dynamic k-means algorithm
  • ↓
  • Split state with largest variance...
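
The split-and-retrain loop above can be sketched with plain k-means on 2D points. The seeding of the first two states, the way a state is split (by adding its farthest point as a new centre), and the scalar variance measure are all assumptions; the slides do not specify them.

```python
def dist2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def assign(points, centres):
    """Group points by nearest centre."""
    groups = [[] for _ in centres]
    for p in points:
        k = min(range(len(centres)), key=lambda i: dist2(p, centres[i]))
        groups[k].append(p)
    return groups

def kmeans(points, centres, iters=20):
    for _ in range(iters):
        groups = assign(points, centres)
        centres = [mean(g) if g else c for g, c in zip(groups, centres)]
    return centres, assign(points, centres)

def cluster(points, max_var):
    """Start with two states; split the widest until all are below max_var."""
    centres = [points[0], points[-1]]           # assumption: simple seeding
    while True:
        centres, groups = kmeans(points, centres)
        variances = [sum(dist2(p, c) for p in g) / len(g) if g else 0.0
                     for g, c in zip(groups, centres)]
        worst = max(range(len(centres)), key=lambda i: variances[i])
        if variances[worst] <= max_var or len(centres) >= len(points):
            return centres
        # Split the worst state by seeding a new centre at its farthest point.
        far = max(groups[worst], key=lambda p: dist2(p, centres[worst]))
        centres = centres + [far]

# Three tight clumps of hypothetical hand positions.
pts = [(0, 0), (0.2, 0), (0, 0.2), (5, 8), (5.2, 8), (5, 8.2),
       (10, 0), (10.2, 0), (10, 0.2)]
states = cluster(pts, max_var=1.0)
print(len(states))  # 3
```
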

37
1. Spatial clustering
  • Wave left hand gesture without temporal
    information

38
2. Temporal alignment
  • Each data point
  • is assigned
  • a label

Manually specify the temporal sequence →
structure of the FSM
39
2. Temporal alignment
  • Segment training data into gesture samples
  • 1 1 1 2 2 2 2 0 0 0 0 2 2 2 1 1
  • 1 1 1 2 2 2 0 0 0 0 2 2 1 1 1
  • 1 1 2 2 2 2 0 0 0 0 0 2 2 2 1 1
  • ...

Number of samples per state → duration interval
[Tmin,s, Tmax,s]
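
Using the three label sequences shown on this slide, the duration intervals can be derived by counting run lengths, as in this minimal sketch. The FSM structure 1 → 2 → 0 → 2 → 1 is read off those sequences, and the function names are hypothetical.

```python
from itertools import groupby

def run_lengths(labels):
    """Collapse a label sequence into (label, run length) pairs."""
    return [(label, len(list(run))) for label, run in groupby(labels)]

def duration_intervals(samples, structure):
    """Return (state label, Tmin, Tmax) for each position in the FSM."""
    per_state = [[] for _ in structure]
    for labels in samples:
        runs = run_lengths(labels)
        if [label for label, _ in runs] != list(structure):
            continue  # ignore samples that do not follow the FSM structure
        for k, (_, length) in enumerate(runs):
            per_state[k].append(length)
    return [(s, min(d), max(d)) for s, d in zip(structure, per_state)]

samples = [
    [1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2, 1, 1],
    [1, 1, 1, 2, 2, 2, 0, 0, 0, 0, 2, 2, 1, 1, 1],
    [1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 0, 2, 2, 2, 1, 1],
]
intervals = duration_intervals(samples, structure=[1, 2, 0, 2, 1])
print(intervals)
# [(1, 2, 3), (2, 3, 4), (0, 4, 5), (2, 2, 3), (1, 2, 3)]
```

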
40
Training finished
  • <μs, Σs, ds, Tmin,s, Tmax,s>

41
d) Recognition
  • Real time
  • Start all FSMs simultaneously
  • Check sample after sample → O(n)

FSM requirements violated → reset and ignore FSM
FSM requirements met
Final state reached → recognizer fires
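
A recognizer along these lines can be sketched as follows. The exact transition rules here (stay in a state while the sample is within ds and the duration is below Tmax, advance once Tmin has been reached, otherwise reset) and the use of a Mahalanobis distance are assumptions for illustration, not the authors' published rules. The class names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class State:
    mu: tuple      # 2D centroid
    cov: tuple     # spatial covariance ((sxx, sxy), (sxy, syy))
    d: float       # distance threshold
    t_min: int     # minimum number of samples spent in this state
    t_max: int     # maximum number of samples spent in this state

    def distance(self, p):
        """Mahalanobis distance of point p from the state centroid."""
        (a, b), (_, c) = self.cov
        det = a * c - b * b
        dx, dy = p[0] - self.mu[0], p[1] - self.mu[1]
        return math.sqrt((c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det)

class GestureFSM:
    def __init__(self, states):
        self.states = states
        self.reset()

    def reset(self):
        self.index, self.duration = -1, 0   # -1: gesture not started yet

    def step(self, p):
        """Feed one sample; return True when the gesture is recognized."""
        cur = self.states[self.index] if self.index >= 0 else None
        nxt = self.index + 1
        if cur and cur.distance(p) <= cur.d and self.duration < cur.t_max:
            self.duration += 1                       # stay in current state
        elif (nxt < len(self.states)
              and self.states[nxt].distance(p) <= self.states[nxt].d
              and (cur is None or self.duration >= cur.t_min)):
            self.index, self.duration = nxt, 1       # advance to next state
        else:
            self.reset()                             # requirements violated
            return False
        if (self.index == len(self.states) - 1
                and self.duration >= self.states[-1].t_min):
            self.reset()
            return True                              # recognizer fires
        return False

eye = ((1.0, 0.0), (0.0, 1.0))
fsm = GestureFSM([State((0, 0), eye, 2.0, 2, 4), State((10, 0), eye, 2.0, 2, 4)])
fired = [fsm.step(p) for p in [(0, 0), (0.5, 0), (10, 0), (10, 0.5)]]
print(fired)  # [False, False, False, True]
```

Each sample costs constant work per FSM, so running all gesture models side by side stays linear in the number of samples.

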
42
e) Results
  • Hand gestures

43
e) Results
  • Mouse gestures

44
IV. Conclusion
  • HMMs
  • Use detailed features (orientation, speed, ...)
  • Generalized extremely well
  • Able to recognize large vocabulary
  • FSMs
  • Handle gestures with different lengths
  • Computational complexity greatly reduced
  • Works with small training sets

45
References
  • [1] T. Starner, J. Weaver, and A. Pentland.
    Real-time American Sign Language recognition
    using desk and wearable computer-based video.
    IEEE Trans. Patt. Analy. and Mach. Intell., 1998.
  • [2] P. Hong, M. Turk, and T. S. Huang. Gesture
    modeling and recognition using finite state
    machines. Proc. Fourth IEEE International
    Conference on Automatic Face and Gesture
    Recognition, March 2000, Grenoble, France.
  • [3] P. Hong, M. Turk, and T. S. Huang.
    Constructing Finite State Machines for Fast
    Gesture Recognition. 15th International
    Conference on Pattern Recognition, Barcelona,
    Spain, Sep 3-7, 2000.