Title: Personalized Face Animation Framework for Multimedia Systems
1. Personalized Face Animation Framework for Multimedia Systems
Ph.D. Final Exam, Ali Arya
Supervisors: Dr. Rabab Ward, Dr. Babak Hamidzadeh
Dept. of Electrical and Computer Engineering, University of British Columbia
January 26th, 2004
2. Topics
- Motivations and Objectives
- Related Work
- Content Description
- Content Creation
- System Architecture
- Evaluation Criteria and Results
- Conclusion
3. Introduction
- Making Faces (Virtual Substitute for Real Humans)
- Video Conferencing
- Training and Customer Service
- Games
- Special Effects
4. Sample Applications
5. Personalized Face Animation
6. Motivations (Graphics Level)
- Modeling Information
- 3D Sensor Data, Multiple Images, etc
- Runtime Input Data
- Images, Commands, etc
- Bandwidth and Storage Limitations
- Sending/Storing Textual Description vs. Audio/Visual Data
- Realism
- Algorithmic Complexity
7. Motivations (System Level)
- Real-time Performance
- Streaming, Computational Cost, etc
- Interaction
- Applications (API)
- Users
- Existing and New Standards
- MPEG-4
- XML
- Behavioural Modeling and Scripting
8. Questions
- What are the requirements of a face animation system?
- What architectural components are needed?
- How can we evaluate such a system?
- What do features like realism and interactivity really mean?
9. Objectives (General Requirements)
- Streaming
- structural and computational fitness for continuously receiving and displaying data
- Structured Content Description
- a hierarchical way to provide information about content, from high-level scene description to low-level moves, images, and sounds
- Generalized Decoding
- creating the displayable content with acceptable quality based on the input
10. Objectives (Continued)
- Component-based Architecture
- the flexibility to rearrange the system components, and use new ones, as long as a certain interface is supported
- Compatibility
- the ability to use and work with widely accepted industry standards in multimedia systems
- Algorithm and Data Efficiency
- a minimized database of audio-visual footage and modeling/input data, and simple, efficient algorithms
11. Thesis Contribution
- Modular streaming architecture
- XML-based content description language
- Computationally efficient content creation method based on image transformations
- Evaluation criteria
12. Related Work
Content Description (CD)
Content Creation (CC)
System Architecture (SA)
Evaluation Criteria (EC)
13. Related Work - CD
- Video Editing
- VRML, OpenGL
- SMIL
- MPEG, MPEG-4 XMT
- FACS
- MPEG-4 Face Definition/Animation
- Character Animation Languages
14. Related Work - CC
- 3D
- Volume or Surface
- Based on 3D information and geometry
- With or without 3D sensors
- 2D
- View morphing
- Based on 2D (image) data
- Using normal photographs
15. 3D vs. 2D Face Construction
- 3D advantages
- Flexibility
- Data Size
- 2D advantages
- Computational Simplicity
- Model Acquisition
- Realism
16. Examples of Face Animation Systems
- MikeTalk
- Optical flow-based Talking head
- VideoRewrite
- Image stitching
- PerceptionLab Facial Effects
- Image Transformations
- Virtual Human Director
- 3D models and texture mapping
17. Audio Content Creation
- Text-To-Speech (TTS)
- Model-based
- Model of vocal tract
- Concatenative
- Pre-recorded speech segments
- Lip-synch tools
- Commercial systems available
- Co-articulation
18. Related Work - SA
- Face animation
- No comprehensive structure
- Java applets
- MPEG-4 players
- VRML viewers
- General Streaming
- Windows Media and DirectX
- QuickTime
- RealPlayer
19. Related Work - EC
- No comprehensive study
- Realism
- Flexibility
- Computational efficiency
- Performance (e.g. real-time)
- Compatibility
- MPEG-4
- XML
- VRML, Java3D, etc.
- Scripting
20. Face Modeling Language (FML)
- Structured Content Description
- Hierarchical representation of animation events
- Timeline definition of the relation between facial actions and external events
- Defining capabilities and behavioural templates
- Compatibility with MPEG-4 XMT and FAPs
- Compatibility with XML and related web technologies and existing tools
- Support for different content generation (animation) methods
21. FML Document
<fml>
  <model> <!-- Model Info -->
    <model-item />
  </model>
  <story> <!-- Story Timeline -->
    <act>
      <time-container>
        <FML-move />
      </time-container>
    </act>
  </story>
</fml>
22. FML Time Containers
- <par>
- Parallel moves
- Started with reference to the start of the container
- Can have a relative delay
- <seq>
- Sequential moves
- Started with reference to the end of the previous move
- Can have a relative delay
- <act> is a special-case sequential container
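The <par>/<seq> semantics above can be sketched as a small start-time resolver. This is a hypothetical helper for illustration, not the ShowFace implementation; containers are tuples of ("par"|"seq", children) and leaves are (name, delay, duration).

```python
def schedule(container, start=0.0):
    """Return [(move_name, start, end)] for a nested container tree.

    <par> children start relative to the container's start;
    <seq> children start relative to the end of the previous child.
    Each child may add a relative delay.
    """
    kind, children = container
    timeline, cursor = [], start
    for child in children:
        if child[0] in ("par", "seq"):           # nested time container
            sub = schedule(child, cursor)
            timeline.extend(sub)
            if kind == "seq" and sub:
                cursor = max(end for _, _, end in sub)
        else:                                     # leaf move
            name, delay, dur = child
            begin = (start if kind == "par" else cursor) + delay
            timeline.append((name, begin, begin + dur))
            if kind == "seq":
                cursor = begin + dur              # <par> never advances the cursor
    return timeline
```

For example, a <seq> of a 2-second talk followed, after a 1-second delay, by a 3-second head move yields start times 0 and 3.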
23. Decision-Making
- <excl> time container
- Waits as long as necessary
- Selects one of the items based on the value of an external event
- <event> is defined in the model part
- Controlled through the API
24. FML Iterations
- Controlled by the repeat attribute of time containers
- <event name="select" val="-1" />
- <act repeat="select">
- </act>
- Indefinite loops
- Associated with an event
- Definite loops
- Associated with a number, or an event with a non-negative value
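A minimal sketch of these repeat semantics (hypothetical helper; in FML the value would come from the repeat attribute and its associated event):

```python
def run_container(body, repeat):
    """Run one container body repeatedly.

    repeat: an int (definite loop, fixed count) or a callable returning
    the current event value (indefinite loop runs while the value is -1).
    Returns the number of passes executed.
    """
    passes = 0
    if callable(repeat):
        while repeat() == -1:     # indefinite: loop until a non-negative event value
            body()
            passes += 1
    else:
        for _ in range(repeat):   # definite: fixed number of iterations
            body()
            passes += 1
    return passes
```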
25. Behavioural Templates
- <template>
- In the model part
- Collection of predefined actions
- Normal FML scripts with parameters
- <behaviour>
- In the story part
- Similar to a function call in normal programming languages
26. Compatibility
- XML
- Use of existing parsers
- Document Object Model (DOM) for dynamic interaction
- MPEG-4
- FAPs
- XMT
- FDPs through the <param> model item
27. Feature-based Image Transformation (FIX)
- Learn transformations from a training set of images
- Apply them to any new image
- No complicated 3D model or large image database
28. FIX Basics
- Learning: find transformation T(Fs, M) with mapping vectors M = Ft - Fs
- Runtime: apply T to the new image's features, then warp the image
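A minimal sketch of the learning/runtime split above (illustrative helpers, not the thesis code; feature points are (x, y) pairs):

```python
def learn_transform(src_features, dst_features):
    """M = Ft - Fs: one 2-D displacement per feature point, learned
    from the source and target model images."""
    return [(xt - xs, yt - ys)
            for (xs, ys), (xt, yt) in zip(src_features, dst_features)]

def apply_transform(features, mapping):
    """Move a new face's feature points by the learned mapping vectors."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(features, mapping)]
```

Warping the image pixels to follow the moved feature points is the separate step covered under Image Warping.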
29. Facial Features and Regions
- Facial Features
- Eyes, Eyebrows, Nose, Lips, Ears
- Face Outline
- Extra Control Points
- Facial Patches
30Corresponding Feature Points
? Feature lines are divided into segments. ? j is
the corresponding feature point to i th point of
k th segment in a feature line.
31. Image Transformations
- Library of Transformations
- Visemes in full view
- Facial expressions in full view
- 3D head movements
- T(F, M)
- T: Transformation
- F: Feature Lines/Points
- M: Mapping Vectors
32. Mapping Features
- Apply learned transformations to new features
- Facial states 1, 2, and 3 observed at learning time
33. Mapping Features (continued)
- Use scaling/perspective calibration
- T'12 = fs(T12), where fs is the scaling function
- T'13 = fp(T12, T13), where fp is the perspective calibration function
- Combined transformations: T14 = a T12 + b T13
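The combined transformation T14 = a*T12 + b*T13 amounts to a weighted sum of mapping vectors. A sketch (the coefficients a and b are illustrative):

```python
def combine(m12, m13, a, b):
    """Weighted combination of two learned mapping-vector sets,
    approximating a facial state not observed at learning time."""
    return [(a * dx1 + b * dx2, a * dy1 + b * dy2)
            for (dx1, dy1), (dx2, dy2) in zip(m12, m13)]
```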
34. Image Warping
- Feature islands: interpolation with two feature points
- Face patches: weighted average of feature points around the patch
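The patch rule above (weighted average of the surrounding feature-point movements) can be sketched with inverse-distance weights. The weighting scheme here is an illustrative choice, not necessarily the one used in the thesis:

```python
def patch_displacement(pixel, feature_points, displacements):
    """Displacement of one patch pixel as the inverse-distance weighted
    average of the displacements of the controlling feature points."""
    px, py = pixel
    wsum, dx, dy = 0.0, 0.0, 0.0
    for (fx, fy), (mx, my) in zip(feature_points, displacements):
        d2 = (px - fx) ** 2 + (py - fy) ** 2
        w = 1.0 / d2 if d2 else 1e9   # a pixel on a feature point follows it
        wsum += w
        dx += w * mx
        dy += w * my
    return (dx / wsum, dy / wsum)
```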
35. Feature Islands
36. Facial Patches
- Set of controlling feature lines/points for each patch
- e.g. Left Forehead: HeadTop, HeadLeft, Hair, BrowLeft
37. New Regions
- More than one input image is required for newly appearing regions of the face/head
- Inside of the mouth
- Sides/back of the head
- These regions from the second image are mapped into the proper orientation.
38. Texture Mapping
- Change colour/texture for effects such as wrinkles
- Store/apply local normalized colour changes around each feature
39. Texture Mapping (Learning Algorithm)
- For each feature Fi in the destination model image
- Define a feature area Si around Fi
- For each pixel in Si
- Find the corresponding pixel in the source
- Calculate the normalized colour change: (destination - source) / source
- Store the feature area
40. Texture Mapping (Runtime Algorithm)
- For each feature Fi
- Load the feature area Si
- Resize Si to fit the transformed base image
- For each pixel in Si
- Find the corresponding pixel
- Apply the normalized colour change: colour x (1 + colour change)
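The learning and runtime steps above, in miniature. This is an illustrative scalar-intensity version, not the thesis code; real pixels would carry colour channels and the areas would be resized first:

```python
def learn_colour_change(src_area, dst_area):
    """Normalized colour change per pixel: (destination - source) / source."""
    return [(d - s) / s for s, d in zip(src_area, dst_area)]

def apply_colour_change(area, changes):
    """Recreate the stored relative effect on another image:
    colour x (1 + change)."""
    return [c * (1 + ch) for c, ch in zip(area, changes)]
```

Storing relative rather than absolute changes is what lets an effect learned on one face (e.g. wrinkles) transfer to a differently lit or coloured face.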
41. Sample Images (see more samples)
42. Sample Video Clips
- Welcome Speech (video)
- Frown - Facial Expression (video)
- Turn Left - Head Move (video)
43. ShowFace System (block diagram)
Components: Applications (GUI, Web page, etc.), SF-API, ShowFace Player, Script Reader, Parser/Splitter, Video Kernel, Audio Kernel, Multimedia Mixer, Underlying Multimedia System
44. ShowFace System (continued)
- SF-API
- ShowFace Objects and Interfaces
- Run-time Library Functions
- ShowFace Player
- Wrapper ActiveX Component
- Supporting Technologies
- DirectShow
- MS-XML
45. System-level Concerns
- Streaming
- Components with stream of data
- Modularity
- Independent upgrade and change
- Well-defined API
- Compatibility
- XML, MPEG, Multimedia technologies
- Flexibility
- Web-based, stand-alone, etc
46. Audio Processing
- Text-To-Speech Engine
- MBROLA
- Only for phoneme list
- Pre-recorded diphone database
- Speech segmentation tool
- Smooth Connection
- Static (power, pitch)
- Dynamic (time/phase)
47. Multimedia Mixing
- Synchronization
- Number of video frames based on sound duration
- Underlying multimedia system (DirectShow)
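The synchronization rule above, sketched: the number of video frames is derived from the generated speech duration so that the two streams end together (the frame rate here is illustrative):

```python
import math

def frame_count(audio_duration_s, fps=30.0):
    """Number of video frames needed to cover one speech segment.
    Rounded up so the video track never ends before the audio."""
    return math.ceil(audio_duration_s * fps)
```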
48. Evaluation Criteria
- Content: 1. Realism, 2. Graphic Capabilities, 3. Speech Generation
- Architecture: 4. Timeliness and Streaming, 5. Descriptiveness, 6. Compatibility, 7. Modularity
- Development: 8. Computational Simplicity and Efficiency, 9. Input Requirements
49. Test Procedure
- Application Level: more complicated scenarios; all system components involved.
- System Level (Modular Development): new technologies and methods; all system components involved.
- System Level (Integration/Operation): simple cases with all components; used GraphEditor and a simple Web page.
- Algorithm/Transformation Level: creating individual images/sounds; used ShowFaceStudio.
50. Summarized Comparative Evaluation

Criteria         ShowFace  MikeTalk  VideoRewrite  PerceptionLab  VHD
Realism          2.3       1.6       1.6           2              2.3
Capabilities     3         1.5       1.75          2.25           3
Speech           2         2         2             0              2
Time/Stream      2         1         1             1              1
Descriptiveness  4         0         0             0              3
Compatibility    4         1         1             1              3
Modularity       4         1         1             1              2
Efficiency       3         3         3             3              2
Input            3         1         2             3              3
51. Conclusion
- Structured content description with FML
- Content creation with FIX
- Minimized model information and input images
- Realistic and personalized
- ShowFace modular streaming framework
- Compatibility with, and use of, existing technologies and standards
- Comprehensive evaluation criteria
52. Future Extensions
- Enhanced feature detection
- Perspective calibration
- Enhanced texture transformation
- e.g. Lighting
- MPEG-4 integration
- Behavioural modeling
- Interface for web services
53. To Be Continued
54. Face Animation Timeline
FML Timeline
55. FML Sample
<fml>
  <model> <!-- Model Info -->
    <img src="me.jpg" />
    <range dir="0" val="60" />
    <event id="user" val="-1" />
    <template name="hello">
      <seq begin="0">
        <talk begin="0">Hello</talk>
        <hdmv begin="0" end="5" dir="0" val="30" />
      </seq>
    </template>
  </model>
56. FML Sample (continued)
  <story> <!-- Story Timeline -->
    <action>
      <behavior template="hello" />
      <excl ev_id="user">
        <talk ev_val="0">Hi</talk>
        <talk ev_val="1">Bye</talk>
      </excl>
      <par begin="0">
        <talk begin="1">Hello World</talk>
        <exp begin="0" end="3" type="3" val="50" />
      </par>
    </action>
  </story>
</fml>
57. Basic FML Moves
- Talk
- Text to be spoken
- <talk>Hello</talk>
- HdMv
- 3D head movements
- Around three axes
- Expr
- smile, anger, surprise, sadness, fear, and normal
- Fap
- Any other MPEG-4 FAP (embedded in FML)
58. MikeTalk
59. Video Rewrite
60. Perception Lab
61. FIX New Regions
62. Evaluation - Realism
- Subjective
- Random vs. non-random viewers
- Testable effects and ground truth
- Standard subjects
- Objective
- Regional image comparison (partially used)
- Feature comparison (used here)
- Geometric validity
63. Evaluation - Capabilities
- Wide range of actions
- Talking
- Expressions
- Movements
- Personalization
- Robustness
64. Evaluation - Speech
- SEM = SC x SQM / SDS
- SEM: Speech Evaluation Index
- SQM: Speech Quality Metric (e.g. signal-to-noise ratio and comprehensibility)
- SDS: Speech Database Size (e.g. number of segments and their size)
- SC: a proper scaling coefficient
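The index as a small helper function (the scaling coefficient and the units of SQM and SDS are application-specific):

```python
def speech_evaluation_index(sqm, sds, sc=1.0):
    """SEM = SC x SQM / SDS: speech quality per unit of database size,
    so a smaller diphone database with the same quality scores higher."""
    return sc * sqm / sds
```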
65. Evaluation - Timeliness and Streaming
- Frame per second
- Input/output formats
- Connectivity
- Underlying technologies
- Compatibility
- Quality of service
- Streaming structure
66. Evaluation - Descriptiveness
- Hierarchical view of animation
- from high-level stories to low-level actions (like mouth-open)
- Support for MPEG-4 FAPs
- Support for the MPEG-4 XMT framework
- Behavioural modeling and templates
- Dynamic decision-making
- allowing external events to change the sequence of actions
67. Evaluation - Compatibility
- Overlapped with other criteria
- XML
- Parser
- DOM
- MPEG-4
- XMT
- FAP
- FDP
68. Evaluation - Modularity
- Independent operation
- Upgrade/change with minimum effect
- Well-defined interfaces
69. Evaluation - Efficiency
- Development
- Maintenance
- Runtime
- Software measurement research
70. Evaluation - Input Requirements
- Modeling vs. Runtime
- 3D data vs. 2D images
- Weighted combination of parameters
71. ShowFaceStudio
72. Web Test (see another sample)
73. Filter Graph Editor
74. Ground Truth (Visemes)
75. Ground Truth (Mix Samples)
76. Learning-phase Images
77. Video Editing
- SMPTE Time Coding
- Locates events down to the frame
- Edit Decision List (EDL)
- Electronic Program Guide (EPG)
- Metadata Information
- SMPTE Metadata Dictionary
- Dublin Core
- EBU P/Meta
- TV Anytime
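SMPTE time coding locates events down to the frame; a sketch of the usual hh:mm:ss:ff conversion (non-drop-frame counting at an illustrative 30 fps):

```python
def timecode_to_frames(tc, fps=30):
    """Convert 'hh:mm:ss:ff' to an absolute frame number."""
    h, m, s, f = (int(x) for x in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * fps + f

def frames_to_timecode(n, fps=30):
    """Convert an absolute frame number back to 'hh:mm:ss:ff'."""
    f = n % fps
    s = n // fps
    return "%02d:%02d:%02d:%02d" % (s // 3600, s // 60 % 60, s % 60, f)
```

Frame-accurate addressing like this is what an Edit Decision List relies on.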
78. VRML/X3D
- Virtual Reality Modeling Language (VRML)
- Description of virtual worlds
- Specific viewing and authoring tools
- Shares concepts with 3D programming libraries like OpenGL
- X3D is the XML-based version.
79. Content Description in MPEG
- MPEG-4: Object Content Information (OCI), Face/Body Animation and Modeling Parameters
- MPEG-7: Extended OCI, Descriptors (for objects), Schemas (for descriptors), Description Definition Language
80. Synchronization and Timing
- Synchronized Multimedia Integration Language (SMIL)
- High-level description of events
- XML-based
- Level of abstraction on top of MPEG-4 objects (any type!)
- HTML+TIME from Microsoft and some other companies
81. MPEG-4 XMT
- Extensible MPEG-4 Textual format (XMT)
- Links MPEG-4 to other languages like VRML and SMIL
82. FACS
- Facial Action Coding System (FACS)
- 64 basic facial Action Units (AUs)
- Inner Brow Raiser
- Mouth Stretch
- Head Turn Left
- etc.
- Originally not intended for computer graphics and animation, but used extensively there
83. MPEG-4 FDP
- Face Definition Parameters (FDPs)
- Mostly for calibrating synthetic faces
84. MPEG-4 FAP
- Face Animation Parameters (FAPs)
- High-level
- Visemes (14)
- Expressions (6)
- Low-level
- 3D head movements and other actions (66)
- Mostly apply to feature points defined by FDPs
85. Face/Body Animation Languages
- Languages: VHML, MPML, BEAT, CML, PAR
- Related aspects: MPEG-4, XML, face animation, behavioural modeling, decision-making, temporal relations
86. 3D Modeling
- Volume
- Constructive Solid Geometry
- Volume Elements (Voxels)
- Octrees
- 3D Surfaces
- Meshes
- Splines
87. 3D Face Models (Virtual Human Director; Lee et al.)
88. View Morphing
- View Morphing (Metamorphosis)
- Intermediate frames between source and target
- Based on movements of control points and features
- Computer vision techniques to select the corresponding features (optical flow, template matching, etc.)
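The intermediate-frame idea can be sketched as a linear blend of corresponding control points, with alpha running from 0 (source) to 1 (target) across the morph sequence (an illustrative helper; real morphing also cross-dissolves the pixel colours):

```python
def intermediate_points(src_pts, dst_pts, alpha):
    """Control points of the in-between frame as a linear blend
    of the corresponding source and target points."""
    return [((1 - alpha) * xs + alpha * xt, (1 - alpha) * ys + alpha * yt)
            for (xs, ys), (xt, yt) in zip(src_pts, dst_pts)]
```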
89. Warp and Morph (Digital Image Warping, Wolberg; MikeTalk, Ezzat et al.)
90. Multimedia Presentation
- Face animation as a special case of multimedia presentation
- Examples
- Image encoding/decoding/display
- Textual data used as a description for images to be created and displayed
91. Multimedia Presentation (continued)
- Streaming
- Structured Content Description
- Generalized Decoding
- Component-based Architecture
- Compatibility
- Efficiency
92. Web Sample
93. Web Sample (FML)
<event id="user" val="-1" />
. . .
<seq>
  <talk ev_val="0">Hello </talk>
  <excl ev_id="user">
    <talk ev_val="0">Ali</talk>
    . . .
  </excl>
</seq>
94. Web Sample (HTML)
<body onload="onPageLoad()">
<SELECT id="User" onchange="onUserChange()">
  <OPTION value="0" selected>Ali</OPTION>
  <OPTION value="1">Babak</OPTION>
  <OPTION value="-1"></OPTION>
</SELECT>
<OBJECT id="SFPlayer" >
</OBJECT>
95. Web Sample (Script)
function onPageLoad()
{
  SFPlayer.InputFile = "SampleFml.xml";
  SFPlayer.CreateMedia();
  SFPlayer.Play();
}
function onUserChange()
{
  SFPlayer.SetEvent(USER, User.selectedIndex);
}