Title: Personalized Face Animation Framework for Multimedia Systems
1. Personalized Face Animation Framework for Multimedia Systems
Ph.D. Final Exam, Ali Arya
Supervisors: Dr. Rabab Ward, Dr. Babak Hamidzadeh
Dept. of Electrical and Computer Engineering, University of British Columbia
January 26th, 2004
2. Topics
- Motivations and Objectives
- Related Work
- Content Description
- Content Creation
- System Architecture
- Evaluation Criteria and Results
- Conclusion
3. Introduction
- Making Faces (Virtual Substitute for Real Humans)
- Video Conferencing
- Training and Customer Service
- Games
- Special Effects
4. Sample Applications
5. Personalized Face Animation
6. Motivations (Graphics Level)
- Modeling Information
- 3D Sensor Data, Multiple Images, etc
- Runtime Input Data
- Images, Commands, etc
- Bandwidth and Storage Limitations
- Sending/Storing Textual Description vs. Audio/Visual Data
- Realism
- Algorithmic Complexity
7. Motivations (System Level)
- Real-time Performance
- Streaming, Computational Cost, etc
- Interaction
- Applications (API)
- Users
- Existing and New Standards
- MPEG-4
- XML
- Behavioural Modeling and Scripting
8. Questions
- What are the requirements of a face animation system?
- What architectural components are needed?
- How can we evaluate such a system?
- What do features like realism and interactivity really mean?
9. Objectives (General Requirements)
- Streaming
- structural and computational fitness for continuously receiving and displaying data
- Structured Content Description
- a hierarchical way to provide information about content, from high-level scene description to low-level moves, images, and sounds
- Generalized Decoding
- creating the displayable content with acceptable quality based on the input
10. Objectives (Continued)
- Component-based Architecture
- the flexibility to rearrange the system components, and use new ones, as long as a certain interface is supported
- Compatibility
- the ability to use and work with widely accepted industry standards in multimedia systems
- Algorithm and Data Efficiency
- a minimized database of audio-visual footage and modeling/input data, and simple, efficient algorithms
11. Thesis Contribution
- Modular streaming architecture
- XML-based content description language
- Computationally efficient content creation method based on image transformations
- Evaluation criteria
12. Related Work
Content Description (CD)
Content Creation (CC)
System Architecture (SA)
Evaluation Criteria (EC)
13. Related Work - CD
- Video Editing
- VRML, OpenGL
- SMIL
- MPEG, MPEG-4 XMT
- FACS
- MPEG-4 Face Definition/Animation
- Character Animation Languages
14. Related Work - CC
- 3D
- Volume or Surface
- Based on 3D information and geometry
- With or without 3D sensors
- 2D
- View morphing
- Based on 2D (image) data
- Using normal photographs
15. 3D vs. 2D Face Construction
- 3D advantages
- Flexibility
- Data Size
- 2D advantages
- Computational Simplicity
- Model Acquisition
- Realism
16. Examples of Face Animation Systems
- MikeTalk
- Optical flow-based Talking head
- VideoRewrite
- Image stitching
- PerceptionLab Facial Effects
- Image Transformations
- Virtual Human Director
- 3D models and texture mapping
17. Audio Content Creation
- Text-To-Speech (TTS)
- Model-based
- Model of vocal tract
- Concatenative
- Pre-recorded speech segments
- Lip-synch tools
- Commercial systems available
- Co-articulation
18. Related Work - SA
- Face animation
- No comprehensive structure
- Java applets
- MPEG-4 players
- VRML viewers
- General Streaming
- Windows Media and DirectX
- QuickTime
- RealPlayer
19. Related Work - EC
- No comprehensive study
- Realism
- Flexibility
- Computational efficiency
- Performance (e.g. real-time)
- Compatibility
- MPEG-4
- XML
- VRML, Java3D, etc.
- Scripting
20. Face Modeling Language (FML)
- Structured Content Description
- Hierarchical representation of animation events
- Timeline definition of the relation between facial actions and external events
- Defining capabilities and behavioural templates
- Compatibility with MPEG-4 XMT and FAPs
- Compatibility with XML and related web technologies and existing tools
- Support for different content generation (animation) methods
21. FML Document
<fml>
  <model> <!-- Model Info -->
    <model-item />
  </model>
  <story> <!-- Story Timeline -->
    <act>
      <time-container>
        <FML-move />
      </time-container>
    </act>
  </story>
</fml>
22. FML Time Containers
- <par>
- Parallel moves
- Started with reference to the start of the container
- Can have a relative delay
- <seq>
- Sequential moves
- Started with reference to the end of the previous move
- Can have a relative delay
- <act> is a special-case sequential container
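The <par>/<seq> semantics above can be sketched as a small start-time resolver. This is a hypothetical helper for illustration, not the ShowFace implementation; containers are tuples of ("par"|"seq", children) and leaves are (name, delay, duration).

```python
def schedule(container, start=0.0):
    """Return [(move_name, start, end)] for a nested container tree.

    <par> children start relative to the container's start;
    <seq> children start relative to the end of the previous child.
    Each child may add a relative delay.
    """
    kind, children = container
    timeline, cursor = [], start
    for child in children:
        if child[0] in ("par", "seq"):           # nested time container
            sub = schedule(child, cursor)
            timeline.extend(sub)
            if kind == "seq" and sub:
                cursor = max(end for _, _, end in sub)
        else:                                     # leaf move
            name, delay, dur = child
            begin = (start if kind == "par" else cursor) + delay
            timeline.append((name, begin, begin + dur))
            if kind == "seq":
                cursor = begin + dur              # <par> never advances the cursor
    return timeline
```

For example, a <seq> of a 2-second talk followed, after a 1-second delay, by a 3-second head move yields start times 0 and 3.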
23. Decision-Making
- <excl> time container
- Waits as long as necessary
- Selects one of the items based on the value of an external event
- <event> is defined in the model part
- Controlled through the API
24. FML Iterations
- Controlled by the repeat attribute of time containers
- <event name="select" val="-1" />
- <act repeat="select">
- </act>
- Indefinite loops
- Associated with an event
- Definite loops
- Associated with a number, or an event with a non-negative value
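A minimal sketch of these repeat semantics (hypothetical helper; in FML the value would come from the repeat attribute and its associated event):

```python
def run_container(body, repeat):
    """Run one container body repeatedly.

    repeat: an int (definite loop, fixed count) or a callable returning
    the current event value (indefinite loop runs while the value is -1).
    Returns the number of passes executed.
    """
    passes = 0
    if callable(repeat):
        while repeat() == -1:     # indefinite: loop until a non-negative event value
            body()
            passes += 1
    else:
        for _ in range(repeat):   # definite: fixed number of iterations
            body()
            passes += 1
    return passes
```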
25. Behavioural Templates
- <template>
- In the model part
- Collection of predefined actions
- Normal FML scripts with parameters
- <behaviour>
- In the story part
- Similar to a function call in normal programming languages
26. Compatibility
- XML
- Use of existing parsers
- Document Object Model (DOM) for dynamic interaction
- MPEG-4
- FAPs
- XMT
- FDPs through the <param> model item
27. Feature-based Image Transformation (FIX)
- Learn transformations from a training set of images
- Apply them to any new image
- No complicated 3D model or large image database
28. FIX Basics
- Learning: find transformation T(Fs, M) with mapping vectors M = Ft - Fs
- Runtime: apply T to the new image's features, then warp the image
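A minimal sketch of the learning/runtime split above (illustrative helpers, not the thesis code; feature points are (x, y) pairs):

```python
def learn_transform(src_features, dst_features):
    """M = Ft - Fs: one 2-D displacement per feature point, learned
    from the source and target model images."""
    return [(xt - xs, yt - ys)
            for (xs, ys), (xt, yt) in zip(src_features, dst_features)]

def apply_transform(features, mapping):
    """Move a new face's feature points by the learned mapping vectors."""
    return [(x + dx, y + dy) for (x, y), (dx, dy) in zip(features, mapping)]
```

Warping the image pixels to follow the moved feature points is the separate step covered under Image Warping.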
29. Facial Features and Regions
- Facial Features
- Eyes, Eyebrows, Nose, Lips, Ears
- Face Outline
- Extra Control Points
- Facial Patches
30Corresponding Feature Points
? Feature lines are divided into segments. ? j is
the corresponding feature point to i th point of
k th segment in a feature line.
31. Image Transformations
- Library of Transformations
- Visemes in full view
- Facial expressions in full view
- 3D head movements
- T(F, M)
- T: Transformation
- F: Feature Lines/Points
- M: Mapping Vectors
32. Mapping Features
- Apply learned transformations to new features
- Facial states 1, 2, and 3 observed at learning time
33. Mapping Features (continued)
- Use scaling/perspective calibration
- T'12 = fs(T12), where fs is the scaling function
- T'13 = fp(T12, T13), where fp is the perspective calibration function
- Combined transformations: T14 = a T12 + b T13
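The combined transformation T14 = a*T12 + b*T13 amounts to a weighted sum of mapping vectors. A sketch (the coefficients a and b are illustrative):

```python
def combine(m12, m13, a, b):
    """Weighted combination of two learned mapping-vector sets,
    approximating a facial state not observed at learning time."""
    return [(a * dx1 + b * dx2, a * dy1 + b * dy2)
            for (dx1, dy1), (dx2, dy2) in zip(m12, m13)]
```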
34. Image Warping
- Feature islands: interpolation with two feature points
- Face patches: weighted average of feature points around the patch
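The patch rule above (weighted average of the surrounding feature-point movements) can be sketched with inverse-distance weights. The weighting scheme here is an illustrative choice, not necessarily the one used in the thesis:

```python
def patch_displacement(pixel, feature_points, displacements):
    """Displacement of one patch pixel as the inverse-distance weighted
    average of the displacements of the controlling feature points."""
    px, py = pixel
    wsum, dx, dy = 0.0, 0.0, 0.0
    for (fx, fy), (mx, my) in zip(feature_points, displacements):
        d2 = (px - fx) ** 2 + (py - fy) ** 2
        w = 1.0 / d2 if d2 else 1e9   # a pixel on a feature point follows it
        wsum += w
        dx += w * mx
        dy += w * my
    return (dx / wsum, dy / wsum)
```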
35. Feature Islands
36. Facial Patches
- Set of controlling feature lines/points for each patch
- e.g. Left Forehead: HeadTop, HeadLeft, Hair, BrowLeft
37. New Regions
- More than one input image is required for newly appearing regions of the face/head
- Inside of the mouth
- Sides/back of the head
- These regions from the second image are mapped into the proper orientation.
38. Texture Mapping
- Change colour/texture for effects such as wrinkles
- Store/apply local normalized colour changes around each feature
39. Texture Mapping (Learning Algorithm)
- For each feature Fi in the destination model image
- Define a feature area Si around Fi
- For each pixel in Si
- Find the corresponding pixel in the source
- Calculate the normalized colour change: (destination - source) / source
- Store the feature area
40. Texture Mapping (Runtime Algorithm)
- For each feature Fi
- Load the feature area Si
- Resize Si to fit the transformed base image
- For each pixel in Si
- Find the corresponding pixel
- Apply the normalized colour change: colour x (1 + colour change)
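The learning and runtime steps above, in miniature. This is an illustrative scalar-intensity version, not the thesis code; real pixels would carry colour channels and the areas would be resized first:

```python
def learn_colour_change(src_area, dst_area):
    """Normalized colour change per pixel: (destination - source) / source."""
    return [(d - s) / s for s, d in zip(src_area, dst_area)]

def apply_colour_change(area, changes):
    """Recreate the stored relative effect on another image:
    colour x (1 + change)."""
    return [c * (1 + ch) for c, ch in zip(area, changes)]
```

Storing relative rather than absolute changes is what lets an effect learned on one face (e.g. wrinkles) transfer to a differently lit or coloured face.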
41. Sample Images (see more samples)
42. Sample Video Clips
- Welcome Speech (video)
- Frown - Facial Expression (video)
- Turn Left - Head Move (video)
43. ShowFace System (block diagram)
Components: Applications (GUI, Web page, etc.), SF-API, ShowFace Player, Script Reader, Parser/Splitter, Video Kernel, Audio Kernel, Multimedia Mixer, Underlying Multimedia System
44. ShowFace System (continued)
- SF-API
- ShowFace Objects and Interfaces
- Run-time Library Functions
- ShowFace Player
- Wrapper ActiveX Component
- Supporting Technologies
- DirectShow
- MS-XML
45. System-level Concerns
- Streaming
- Components with stream of data
- Modularity
- Independent upgrade and change
- Well-defined API
- Compatibility
- XML, MPEG, Multimedia technologies
- Flexibility
- Web-based, stand-alone, etc
46. Audio Processing
- Text-To-Speech Engine
- MBROLA
- Only for phoneme list
- Pre-recorded diphone database
- Speech segmentation tool
- Smooth Connection
- Static (power, pitch)
- Dynamic (time/phase)
47. Multimedia Mixing
- Synchronization
- Number of video frames based on sound duration
- Underlying multimedia system (DirectShow)
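The synchronization rule above, sketched: the number of video frames is derived from the generated speech duration so that the two streams end together (the frame rate here is illustrative):

```python
import math

def frame_count(audio_duration_s, fps=30.0):
    """Number of video frames needed to cover one speech segment.
    Rounded up so the video track never ends before the audio."""
    return math.ceil(audio_duration_s * fps)
```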
48. Evaluation Criteria
- Content: 1. Realism, 2. Graphic Capabilities, 3. Speech Generation
- Architecture: 4. Timeliness and Streaming, 5. Descriptiveness, 6. Compatibility, 7. Modularity
- Development: 8. Computational Simplicity and Efficiency, 9. Input Requirements
49. Test Procedure
- Application Level: more complicated scenarios; all system components involved.
- System Level (Modular Development): new technologies and methods; all system components involved.
- System Level (Integration/Operation): simple cases with all components; used GraphEditor and a simple Web page.
- Algorithm/Transformation Level: creating individual images/sounds; used ShowFaceStudio.
50. Summarized Comparative Evaluation

Criteria         ShowFace  MikeTalk  VideoRewrite  PerceptionLab  VHD
Realism          2.3       1.6       1.6           2              2.3
Capabilities     3         1.5       1.75          2.25           3
Speech           2         2         2             0              2
Time/Stream      2         1         1             1              1
Descriptiveness  4         0         0             0              3
Compatibility    4         1         1             1              3
Modularity       4         1         1             1              2
Efficiency       3         3         3             3              2
Input            3         1         2             3              3
51. Conclusion
- Structured content description with FML
- Content creation with FIX
- Minimized model information and input images
- Realistic and personalized
- ShowFace modular streaming framework
- Compatibility with, and use of, existing technologies and standards
- Comprehensive evaluation criteria
52. Future Extensions
- Enhanced feature detection
- Perspective calibration
- Enhanced texture transformation
- e.g. Lighting
- MPEG-4 integration
- Behavioural modeling
- Interface for web services
53. To Be Continued
54. Face Animation Timeline
FML Timeline
55. FML Sample
<fml>
  <model> <!-- Model Info -->
    <img src="me.jpg" />
    <range dir="0" val="60" />
    <event id="user" val="-1" />
    <template name="hello">
      <seq begin="0">
        <talk begin="0">Hello</talk>
        <hdmv begin="0" end="5" dir="0" val="30" />
      </seq>
    </template>
  </model>
56. FML Sample (continued)
  <story> <!-- Story Timeline -->
    <action>
      <behavior template="hello" />
      <excl ev_id="user">
        <talk ev_val="0">Hi</talk>
        <talk ev_val="1">Bye</talk>
      </excl>
      <par begin="0">
        <talk begin="1">Hello World</talk>
        <exp begin="0" end="3" type="3" val="50" />
      </par>
    </action>
  </story>
</fml>
57. Basic FML Moves
- Talk
- Text to be spoken
- <talk>Hello</talk>
- HdMv
- 3D head movements
- Around three axes
- Expr
- smile, anger, surprise, sadness, fear, and normal
- Fap
- Any other MPEG-4 FAP (embedded in FML)
58. MikeTalk
59. Video Rewrite
60. Perception Lab
61. FIX New Regions
62. Evaluation - Realism
- Subjective
- Random vs. non-random viewers
- Testable effects and ground truth
- Standard subjects
- Objective
- Regional image comparison (partially used)
- Feature comparison (used here)
- Geometric validity
63. Evaluation - Capabilities
- Wide range of actions
- Talking
- Expressions
- Movements
- Personalization
- Robustness
64. Evaluation - Speech
- SEM = SC x SQM / SDS
- SEM: Speech Evaluation Index
- SQM: Speech Quality Metric (e.g. signal-to-noise ratio and comprehensibility)
- SDS: Speech Database Size (e.g. number of segments and their size)
- SC: a proper scaling coefficient
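The index as a small helper function (the scaling coefficient and the units of SQM and SDS are application-specific):

```python
def speech_evaluation_index(sqm, sds, sc=1.0):
    """SEM = SC x SQM / SDS: speech quality per unit of database size,
    so a smaller diphone database with the same quality scores higher."""
    return sc * sqm / sds
```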
65. Evaluation - Timeliness and Streaming
- Frame per second
- Input/output formats
- Connectivity
- Underlying technologies
- Compatibility
- Quality of service
- Streaming structure
66. Evaluation - Descriptiveness
- Hierarchical view of animation
- from high-level stories to low-level actions (like mouth-open)
- Support for MPEG-4 FAPs
- Support for the MPEG-4 XMT framework
- Behavioural modeling and templates
- Dynamic decision-making
- allowing external events to change the sequence of actions
67. Evaluation - Compatibility
- Overlapped with other criteria
- XML
- Parser
- DOM
- MPEG-4
- XMT
- FAP
- FDP
68. Evaluation - Modularity
- Independent operation
- Upgrade/change with minimum effect
- Well-defined interfaces
69. Evaluation - Efficiency
- Development
- Maintenance
- Runtime
- Software measurement research
70. Evaluation - Input Requirements
- Modeling vs. Runtime
- 3D data vs. 2D images
- Weighted combination of parameters
71. ShowFaceStudio
72. Web Test (see another sample)
73. Filter Graph Editor
74. Ground Truth (Visemes)
75. Ground Truth (Mix Samples)
76. Learning-phase Images
77. Video Editing
- SMPTE Time Coding
- Locates events down to the frame
- Edit Decision List (EDL)
- Electronic Program Guide (EPG)
- Metadata Information
- SMPTE Metadata Dictionary
- Dublin Core
- EBU P/Meta
- TV Anytime
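SMPTE time coding locates events down to the frame; a sketch of the usual hh:mm:ss:ff conversion (non-drop-frame counting at an illustrative 30 fps):

```python
def timecode_to_frames(tc, fps=30):
    """Convert 'hh:mm:ss:ff' to an absolute frame number."""
    h, m, s, f = (int(x) for x in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * fps + f

def frames_to_timecode(n, fps=30):
    """Convert an absolute frame number back to 'hh:mm:ss:ff'."""
    f = n % fps
    s = n // fps
    return "%02d:%02d:%02d:%02d" % (s // 3600, s // 60 % 60, s % 60, f)
```

Frame-accurate addressing like this is what an Edit Decision List relies on.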
78. VRML/X3D
- Virtual Reality Modeling Language (VRML)
- Description of virtual worlds
- Specific viewing and authoring tools
- Shares concepts with 3D programming libraries like OpenGL
- X3D is the XML-based version.
79. Content Description in MPEG
- MPEG-4: Object Content Information (OCI), Face/Body Animation and Modeling Parameters
- MPEG-7: Extended OCI, Descriptors (for objects), Schemas (for descriptors), Description Definition Language
80. Synchronization and Timing
- Synchronized Multimedia Integration Language (SMIL)
- High-level description of events
- XML-based
- Level of abstraction on top of MPEG-4 objects (any type!)
- HTML+TIME from Microsoft and some other companies
81. MPEG-4 XMT
- Extensible MPEG-4 Textual format (XMT)
- Links MPEG-4 to other languages like VRML and SMIL
82. FACS
- Facial Action Coding System (FACS)
- 64 basic facial Action Units (AUs)
- Inner Brow Raiser
- Mouth Stretch
- Head Turn Left
- etc.
- Originally not intended for computer graphics and animation, but used extensively there
83. MPEG-4 FDP
- Face Definition Parameters (FDPs)
- Mostly for calibrating synthetic faces
84. MPEG-4 FAP
- Face Animation Parameters (FAPs)
- High-level
- Visemes (14)
- Expressions (6)
- Low-level
- 3D head movements and other actions (66)
- Mostly apply to feature points defined by FDPs
85. Face/Body Animation Languages
- Languages: VHML, MPML, BEAT, CML, PAR
- Related aspects: MPEG-4, XML, face animation, behavioural modeling, decision-making, temporal relations
86. 3D Modeling
- Volume
- Constructive Solid Geometry
- Volume Elements (Voxels)
- Octrees
- 3D Surfaces
- Meshes
- Splines
87. 3D Face Models (Virtual Human Director; Lee et al.)
88. View Morphing
- View Morphing (Metamorphosis)
- Intermediate frames between source and target
- Based on movements of control points and features
- Computer vision techniques to select the corresponding features (optical flow, template matching, etc.)
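The intermediate-frame idea can be sketched as a linear blend of corresponding control points, with alpha running from 0 (source) to 1 (target) across the morph sequence (an illustrative helper; real morphing also cross-dissolves the pixel colours):

```python
def intermediate_points(src_pts, dst_pts, alpha):
    """Control points of the in-between frame as a linear blend
    of the corresponding source and target points."""
    return [((1 - alpha) * xs + alpha * xt, (1 - alpha) * ys + alpha * yt)
            for (xs, ys), (xt, yt) in zip(src_pts, dst_pts)]
```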
89. Warp and Morph (Digital Image Warping, Wolberg; MikeTalk, Ezzat et al.)
90. Multimedia Presentation
- Face animation as a special case of multimedia presentation
- Examples
- Image encoding/decoding/display
- Textual data used as a description for images to be created and displayed
91. Multimedia Presentation (continued)
- Streaming
- Structured Content Description
- Generalized Decoding
- Component-based Architecture
- Compatibility
- Efficiency
92. Web Sample
93. Web Sample (FML)
<event id="user" val="-1" />
. . .
<seq>
  <talk ev_val="0">Hello </talk>
  <excl ev_id="user">
    <talk ev_val="0">Ali</talk>
    . . .
  </excl>
</seq>
94. Web Sample (HTML)
<body onload="onPageLoad()">
<SELECT id="User" onchange="onUserChange()">
  <OPTION value="0" selected>Ali</OPTION>
  <OPTION value="1">Babak</OPTION>
  <OPTION value="-1"></OPTION>
</SELECT>
<OBJECT id="SFPlayer" >
</OBJECT>
95. Web Sample (Script)
function onPageLoad()
{
  SFPlayer.InputFile = "SampleFml.xml";
  SFPlayer.CreateMedia();
  SFPlayer.Play();
}
function onUserChange()
{
  SFPlayer.SetEvent(USER, User.selectedIndex);
}