Title: Project 3 Sign Language Tutoring Tool
1. Project 3: Sign Language Tutoring Tool
- Members
- Lale Akarun, Alice Caplier,
- Michele Rombaut, Bulent Sankur,
- Oya Aran, Ismail Ari,
- Alexandre Benoit, Pavel Campr,
- Ana Huerta Carrillo, Francois-Xavier Fanard
2. Outline
- Introduction
- Analysis
- Hand Detection and Segmentation
- Hand Motion and Shape
- Head motion and facial expressions
- Recognition
- Data fusion methodology
- Results
- Synthesis
- Hand and arms synthesis
- Head and facial features synthesis
- Demonstrator interface
3. First prototype of Sign Language Tutor
Developed by BUMM, Bogazici University (demonstrator in the Similar WP11)
- 7 signs from TSL
- Uses only one-handed signs
- No head information is used
- Training and practice phases with feedback given to the user
- Developed using MFC
4. Sign Language Tutor (eNTERFACE'06)
- Currently 19 signs from ASL
- Can be easily extended
- Manual and non-manual signs
- Head information as well as hand information
- One-handed or two-handed signs
- Synthesis part is developed
5. Signs
- MANUAL SIGNS
- Signs with different hand motion, shape and position
- One- or two-handed
- Periodic vs non-periodic
- NON-MANUAL SIGNS
- Head motions that complete the hand motions
- Yes and No head nods
- To express the affirmative or negative form of the expression
- (e.g. here)
- Particular head rotations (left/right)
- To emphasize the expression (e.g. VERY clean)
6. Analysis and Recognition
Final Decision
HMM(manual) models (trained)
HMM(nonmanual) models (trained)
HMM(manual) likelihood for each sign
HMM(nonmanual) likelihood for each sign
Eye, Eyebrow, Mouth Motion: open, close, up, down
Head Motion: neutral, no, yes, turn
Hand Shape: area, orientation
Hand Motion: center of mass, velocity
Hand Position: wrt face, wrt other hand
Hand Detection and Segmentation
Face Detection
Sign Language Video
7. Synthesis
Merged avatar
Head Synthesizer
Hand Synthesizer
Final Decision
Hand Shape: area, orientation
Hand Motion: center of mass, velocity
Hand Position: wrt face, wrt other hand
Head Motion: neutral, no, yes, turn
8. Analysis: Feature Extraction
- 19 features
- Axis lengths and angle of best-fitting ellipse
- Ratio of black/white pixels in defined areas of the image
- Compactness, area...
Hand shape
Head motion
Hand motion
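As an illustration of the ellipse-based hand shape features listed above, the following is a minimal sketch that derives area, centroid, axis lengths and orientation from image moments of a binary mask. The function name `hand_shape_features` and the list-of-lists mask format are assumptions made for this example, not the project's actual implementation; compactness and the pixel-ratio features are left out.

```python
import math

def hand_shape_features(mask):
    """Compute a few shape descriptors of a binary hand mask.

    mask: list of rows, each a list of 0/1 values (1 = hand pixel).
    Hypothetical helper: the slides do not give the real interface.
    """
    pixels = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    area = len(pixels)
    if area == 0:
        raise ValueError("mask contains no hand pixels")
    cx = sum(x for x, _ in pixels) / area
    cy = sum(y for _, y in pixels) / area
    # Central second-order moments (covariance of the pixel coordinates).
    mxx = sum((x - cx) ** 2 for x, _ in pixels) / area
    myy = sum((y - cy) ** 2 for _, y in pixels) / area
    mxy = sum((x - cx) * (y - cy) for x, y in pixels) / area
    # Eigenvalues of the covariance matrix give the variance along each axis.
    common = math.sqrt(((mxx - myy) / 2) ** 2 + mxy ** 2)
    lam_major = (mxx + myy) / 2 + common
    lam_minor = (mxx + myy) / 2 - common
    return {
        "area": area,
        "centroid": (cx, cy),
        # A filled ellipse with semi-axis a has variance a^2 / 4 along that axis,
        # so the full axis length is 4 * sqrt(variance).
        "major_axis": 4 * math.sqrt(lam_major),
        "minor_axis": 4 * math.sqrt(max(lam_minor, 0.0)),
        "angle": 0.5 * math.atan2(2 * mxy, mxx - myy),
    }
```

For a wide horizontal rectangle the angle comes out as 0 and the major axis is longer than the minor axis, which is the kind of cue that separates a flat hand from a fist.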
9. Data fusion and Classification
- Train two HMMs for each sign i
- HMM_i(manual+nonmanual)
- models hand and head information jointly
- HMM_i(nonmanual)
- models head information
- Likelihoods of the two HMMs are fused sequentially
Hand features
Sign Clusters
HMM(manual+nonmanual)
Confusion Matrix
Confusion clusters for each sign
Sign with maximum likelihood (Sign i)
Head features
HMM(nonmanual)
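One plausible reading of the sequential fusion in this diagram can be sketched as follows: the joint hand and head HMM proposes a sign, and the head-only HMM then chooses among the signs in that sign's confusion cluster. The function name, the dictionary inputs and the sign labels in the usage example are hypothetical, and the likelihoods are assumed to be log-likelihoods already computed by the trained HMMs.

```python
def fuse_sequentially(joint_ll, nonmanual_ll, confusion_clusters):
    """Two-stage decision over sign labels (illustrative sketch).

    joint_ll: dict sign -> log-likelihood from the hand + head HMM.
    nonmanual_ll: dict sign -> log-likelihood from the head-only HMM.
    confusion_clusters: dict sign -> set of signs it tends to be confused
        with (assumed to come from a confusion matrix on training data).
    """
    # Stage 1: pick the sign whose joint HMM scores highest.
    first_choice = max(joint_ll, key=joint_ll.get)
    # Stage 2: restrict the candidates to that sign's confusion cluster ...
    candidates = set(confusion_clusters.get(first_choice, ())) | {first_choice}
    # ... and let the head-only HMM decide among them (sorted for stable ties).
    return max(sorted(candidates), key=lambda sign: nonmanual_ll[sign])
```

Usage, with made-up scores: if `joint_ll` slightly prefers "clean" over "very clean" and both are in the same cluster, a head-only score that favours "very clean" flips the final decision, while signs outside the cluster are never considered in the second stage.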
10. Recognition Results
- Confusion matrix on the test set
- 12 examples per sign
- 99% success in manual signs
- 85% success in total
11. Sign Language Synthesis
- Hand and Head synthesis
- Two challenges
- Synthesize each sign manually
- 19 videos of the avatar performing each of the 19 signs
- Synthesize what the user is doing
- Synthesize automatically the user's hand and head motions
- PROBLEM: No 3D information!
- SOLUTION (partial)
- Give the decision of the recognizer to the synthesizer
- Adapt the manually synthesized video to the user's motion
12. Automatic Head Synthesis
- Input data processing
- Filter the noise
- Normalize the input
- Write FAP file with processed input data
- Computation of 12 head movements
- Head synthesis rendering
- AVI formatting of the sequence
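The first two input processing steps (noise filtering and normalization) could look like the sketch below. The slides do not say which filter or normalization was used, so the centred moving average and the peak scaling here are assumptions chosen for simplicity; writing the FAP file is not shown.

```python
def moving_average(signal, window=5):
    """Smooth a 1-D signal with a centred moving average.

    Near the edges the average uses only the samples that exist.
    """
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed

def normalize(signal):
    """Scale a signal so that its largest absolute value becomes 1."""
    peak = max((abs(v) for v in signal), default=0.0)
    return [v / peak for v in signal] if peak > 0 else list(signal)
```

Applied to a head velocity track, `normalize(moving_average(track))` gives a smooth signal in the range [-1, 1] that can then be mapped to the rotation range of the avatar's head.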
13. Automatic Head Synthesis
- Result of the input data processing
- (A) Face in neutral state by filtering the input signal
- (horizontal velocity 0, IPL energy ? 0)
- (B) Face at maximal motion position
- (horizontal velocity 0, IPL energy ? 0)
- (C) Face at intermediate motion position
- (horizontal velocity ? 0, IPL energy ? 0)
14. Automatic Hands and Arms Synthesis
- INPUT
- Sign type, animation length and hand coordinates for each frame
- Head position, arm length
- OUTPUT
- Video file
15. Merging Head and Hand
- Head and hand synthesis videos are produced separately.
- The resulting videos are merged in MATLAB.
- 6 backgrounds involving different avatar dresses.
(head)
(background)
(arms and hands)
(merged frame)
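The merging step can be pictured as stacking three layers per frame. The sketch below is written in Python rather than MATLAB and is only an illustration: the layer order (background, then arms and hands, then head) and the use of a single "transparent" pixel value are assumptions, since the slides do not describe how the merge was done.

```python
def merge_frame(background, arms, head, transparent=None):
    """Stack three equally sized frames into one merged frame.

    Each frame is a list of rows of pixel values. A pixel equal to
    `transparent` in an upper layer lets the layer below show through.
    Assumed layer order: background < arms and hands < head.
    """
    merged = []
    for bg_row, arm_row, head_row in zip(background, arms, head):
        row = []
        for bg, arm, hd in zip(bg_row, arm_row, head_row):
            pixel = bg
            if arm != transparent:
                pixel = arm
            if hd != transparent:
                pixel = hd
            row.append(pixel)
        merged.append(row)
    return merged
```

Swapping the background argument is then enough to produce the same animation with a different avatar dress.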
16. Demonstrator Interface
- Combination GUI in MATLAB
Training
Information
Synthesis
Practice
17. Conclusion and Future Work
- Sign Language Tutoring tool: demonstrator
- Direct feedback about the performance
- Sign language recognition combining hand and head information
- 99.5% for hand information
- 85.5% for hand and head information
- (Semi-)automatic synthesis of the signs
- Future research directions
- Different data fusion methods
- Hand segmentation without markers
- Facial expression analysis, detection of shoulders and body parts
18. Questions
19. Hand Segmentation
- Color segmentation
- for blue and yellow gloves separately.
- Histogram approach in HSV space
- Must be trained for different lighting conditions
- Double thresholding and region growing
- maintains connectivity
- best thresholds are found iteratively
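Double thresholding with region growing can be sketched as follows, assuming the histogram step has already produced a per-pixel glove-colour score. The function name, the score map format and the 4-connected neighbourhood are assumptions; the iterative search for the best thresholds mentioned above is not shown.

```python
from collections import deque

def double_threshold_grow(score, low, high):
    """Segment a 2-D score map with two thresholds and region growing.

    score: list of rows of glove-colour scores (higher = more glove-like).
    Pixels with score >= high seed the regions; regions then grow into
    4-connected neighbours whose score is >= low, which keeps the
    segmented hand connected.
    """
    height, width = len(score), len(score[0])
    mask = [[0] * width for _ in range(height)]
    queue = deque()
    for y in range(height):
        for x in range(width):
            if score[y][x] >= high:
                mask[y][x] = 1
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            inside = 0 <= ny < height and 0 <= nx < width
            if inside and not mask[ny][nx] and score[ny][nx] >= low:
                mask[ny][nx] = 1
                queue.append((ny, nx))
    return mask
```

A weakly coloured pixel is kept only if it is connected to a confident seed, so isolated background pixels of a similar colour are rejected.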
20. Hand Motion Analysis
- Kalman Filter tracking
- Used for smoothing the trajectory
- Hand motion features (for each frame)
- Center of mass (CoM): x, y
- Velocity: vx, vy
- Usage
- Recognition
- CoM and velocity as the training features
- Synthesis
- Define the exact movement of the hand from the CoM
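As a rough illustration of Kalman smoothing of the centre-of-mass track, the sketch below filters one coordinate with a constant-velocity model and returns a position and velocity estimate per frame. The motion model, the noise parameters `q` and `r`, and the function name are assumptions for this example; the slides do not state the filter design actually used. Run it once on the x track and once on the y track.

```python
def kalman_smooth(positions, dt=1.0, q=1e-2, r=1.0):
    """Filter a 1-D sequence of noisy positions (constant-velocity model).

    q: process noise added to each state variance per frame (assumed).
    r: measurement noise variance (assumed).
    Returns a list of (position, velocity) estimates, one per frame.
    """
    p, v = positions[0], 0.0            # state estimate
    c00, c01, c11 = 1.0, 0.0, 1.0       # symmetric 2x2 state covariance
    estimates = []
    for z in positions:
        # Predict: p <- p + v*dt, covariance <- F C F^T + Q.
        p = p + v * dt
        c00 = c00 + 2 * dt * c01 + dt * dt * c11 + q
        c01 = c01 + dt * c11
        c11 = c11 + q
        # Update with the measured position z (only position is observed).
        s = c00 + r
        k0, k1 = c00 / s, c01 / s
        residual = z - p
        p, v = p + k0 * residual, v + k1 * residual
        c00, c01, c11 = (1 - k0) * c00, (1 - k0) * c01, c11 - k1 * c01
        estimates.append((p, v))
    return estimates
```

The velocity estimates double as the vx, vy features, so tracking and feature extraction can share one pass over the frames.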
21. Hand shape Analysis
Hand shape (binary image) -> Features
hand 1: 0.45 0.22 ... 0.11
hand 2: 0.29 0.87 ... 0.83
...
Usage
1) Sign language recognition
- new features for improvement of the recognition system
- recognized sign depends on hand and head trajectory and hand shape
2) Sign language synthesis
- not implemented now
- synthesis of the performed sign: recognition of finger movements is difficult (low resolution, overlapping)
- performed hand shape -> feature extraction -> classification into a predefined set of hand shapes -> synthesis of the predefined hand shape
22. Hand shape feature extraction
- 19 features are used to describe hand shape
- Axis lengths and angle of best-fitting ellipse
- Ratio of black/white pixels in defined areas of the image
- Compactness, area...
- Problems
- different hand shapes can have the same binary image and features -> we can classify only shapes with different features
Left hand from top or right hand from bottom?
- Classification
- 20 clusters are used; each cluster is represented by 15 templates
- classification into a cluster is done by the K-means algorithm
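The assignment of a hand shape to one of the clusters can be illustrated with a nearest-template rule, shown below. This is a simplification under stated assumptions: the slide names K-means but gives no details, so the Euclidean distance, the dictionary layout and the cluster labels in the usage note are hypothetical.

```python
import math

def classify_hand_shape(features, clusters):
    """Return the label of the cluster holding the template nearest to `features`.

    features: feature vector of the observed hand shape.
    clusters: dict label -> list of template feature vectors
        (the slides mention 20 clusters with 15 templates each).
    """
    best_label, best_dist = None, math.inf
    for label, templates in clusters.items():
        for template in templates:
            dist = math.dist(features, template)  # Euclidean distance
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label
```

For example, with two made-up clusters "fist" and "flat", a feature vector lying next to one of the "flat" templates is labelled "flat". Two different hand shapes that yield the same features still end up in the same cluster, which is the limitation noted on this slide.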
23. Head motion Analysis
- Detection method based on modeling of the human visual system
- Retina filtering that enhances the response of moving contours
- Optic flow and frequency analysis for motion event description
- Provided data
- Strength of head motion evolution
- Horizontal and vertical velocity evolution
24. Demonstrator Interface (2)
- Training
- Involves teacher videos
- Watch them to learn the sign
- Practice
- Try the sign yourself and see the result in the information panel
- Watch the original captured video or the segmented video
- Synthesize
- Synthesize your sign and watch it
25. Hands and arms synthesis
- PHYSICAL FEATURES EXTRACTION
- we need to normalize the parameters depending on our avatar.
- SPEED PARAMETERS
- detection of the speed features, depending on the coordinate variations. Set variation points (border frames).
- INTERPOLATION OF PREDEFINED POSITIONS
- for each border frame, set the position by interpolating predefined ones, depending on our physical extraction.
- FINAL INTERPOLATION
- empty frames are interpolated between the border frames we have set.
- FINAL ANIMATION VIDEO GENERATION
- keep the final animation (hand.avi) to generate the output.
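The final interpolation step, filling the empty frames between border frames, can be sketched as below. Linear interpolation of 2-D positions is an assumption made for this example (the slides do not name the interpolation scheme), and `interpolate_frames` is a hypothetical helper.

```python
def interpolate_frames(border_frames, n_frames):
    """Fill every frame by linear interpolation between border frames.

    border_frames: dict frame_index -> (x, y) position set at that frame;
        it must contain frame 0 and frame n_frames - 1.
    Returns a list of n_frames (x, y) positions.
    """
    keys = sorted(border_frames)
    if not keys or keys[0] != 0 or keys[-1] != n_frames - 1:
        raise ValueError("first and last frames must be border frames")
    positions = [None] * n_frames
    positions[0] = border_frames[0]
    for start, end in zip(keys, keys[1:]):
        (x0, y0), (x1, y1) = border_frames[start], border_frames[end]
        for frame in range(start, end + 1):
            t = (frame - start) / (end - start)   # 0 at start, 1 at end
            positions[frame] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
    return positions
```

Placing border frames closer together in time makes the motion between them faster, which is how the speed parameters detected earlier can shape the animation.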