Personalized Face Animation Framework for Multimedia Systems

1
Personalized Face Animation Framework for Multimedia Systems
Ph.D. Final Exam
Ali Arya
Supervisors: Dr. Rabab Ward, Dr. Babak Hamidzadeh
Dept. of Electrical and Computer Engineering
University of British Columbia
January 26th, 2004
2
Topics
  • Motivations and Objectives
  • Related Work
  • Content Description
  • Content Creation
  • System Architecture
  • Evaluation Criteria and Results
  • Conclusion

3
Introduction
  • Making Faces (Virtual Substitute for Real Humans)
  • Video Conferencing
  • Training and Customer Service
  • Games
  • Special Effects

4
Sample Applications
5
Personalized Face Animation
6
Motivations (Graphics Level)
  • Modeling Information
  • 3D Sensor Data, Multiple Images, etc
  • Runtime Input Data
  • Images, Commands, etc
  • Bandwidth and Storage Limitations
  • Sending/Storing Textual Description vs.
    Audio/Visual Data
  • Realism
  • Algorithmic Complexity

7
Motivations (System Level)
  • Real-time Performance
  • Streaming, Computational Cost, etc
  • Interaction
  • Applications (API)
  • Users
  • Existing and New Standards
  • MPEG-4
  • XML
  • Behavioural Modeling and Scripting

8
Questions
  • What are the requirements of a face animation
    system?
  • What architectural components are needed?
  • How can we evaluate such a system?
  • What do features like realism and interactivity
    really mean?

9
Objectives (General Requirements)
  • Streaming
  • structural and computational fitness for
    continuously receiving and displaying data
  • Structured Content Description
  • a hierarchical way to provide information about
    content from high-level scene description to
    low-level moves, images, and sounds
  • Generalized Decoding
  • creating the displayable content with acceptable
    quality based on input.

10
Objectives (Continued)
  • Component-based Architecture
  • the flexibility to rearrange the system
    components, and use new ones, as long as a
    certain interface is supported
  • Compatibility
  • the ability to use and work with widely accepted
    industry standards in multimedia systems
  • Algorithm and Data Efficiency
  • a minimized database of audio-visual footage and
    modeling/input data, and simple efficient
    algorithms

11
Thesis Contribution
  • Modular streaming architecture
  • XML-based content description language
  • Computationally efficient content creation method
    based on image transformations
  • Evaluation criteria

12
Related Work
Content Description (CD)
Content Creation (CC)
System Architecture (SA)
Evaluation Criteria (EC)
13
Related Work - CD
  • Video Editing
  • VRML, OpenGL
  • SMIL
  • MPEG, MPEG-4 XMT
  • FACS
  • MPEG-4 Face Definition/Animation
  • Character Animation Languages

14
Related Work - CC
  • 3D
  • Volume or Surface
  • Based on 3D information and geometry
  • With or without 3D sensors
  • 2D
  • View morphing
  • Based on 2D (image) data
  • Using normal photographs

15
3D vs. 2D Face Construction
  • 3D advantages
  • Flexibility
  • Data Size
  • 2D advantages
  • Computational Simplicity
  • Model Acquisition
  • Realism

16
Examples of Face Animation Systems
  • MikeTalk
  • Optical flow-based Talking head
  • VideoRewrite
  • Image stitching
  • PerceptionLab Facial Effects
  • Image Transformations
  • Virtual Human Director
  • 3D models and texture mapping

17
Audio Content Creation
  • Text-To-Speech (TTS)
  • Model-based
  • Model of vocal tract
  • Concatenative
  • Pre-recorded speech segments
  • Lip-synch tools
  • Commercial systems available
  • Co-articulation

18
Related Work - SA
  • Face animation
  • No comprehensive structure
  • Java applets
  • MPEG-4 players
  • VRML viewers
  • General Streaming
  • Windows Media and DirectX
  • QuickTime
  • RealPlayer

19
Related Work - EC
  • No comprehensive study
  • Realism
  • Flexibility
  • Computational efficiency
  • Performance (e.g. realtime)
  • Compatibility
  • MPEG-4
  • XML
  • VRML, Java3D, etc.
  • Scripting

20
Face Modeling Language (FML)
  • Structured Content Description
  • Hierarchical representation of animation events
  • Timeline definition of the relation between
    facial actions and external events
  • Defining capabilities and behavioural templates
  • Compatibility with MPEG-4 XMT and FAPs
  • Compatibility with XML and related web
    technologies and existing tools
  • Support for different content generation
    (animation) methods

21
FML Document
  • <fml>
  • <model> <!-- Model Info -->
  • <model-item />
  • </model>
  • <story> <!-- Story TimeLine -->
  • <act>
  • <time-container>
  • <FML-move />
  • </time-container>
  • </act>
  • </story>
  • </fml>

22
FML Time Containers
  • <par>
  • Parallel moves
  • Started with reference to the start of the
    container
  • Can have relative delay
  • <seq>
  • Sequential moves
  • Started with reference to the end of previous
  • Can have relative delay
  • <act> is a special-case sequential container
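The timing rules for the parallel and sequential containers can be sketched in a few lines of Python. This is only an illustration, not the ShowFace implementation: the function name `schedule` and the constant `DEFAULT_DURATION` are hypothetical, and it assumes that `begin` is a relative delay in seconds, that `end` is measured from the same reference as `begin`, and that a move without `end` lasts one second.

```python
import xml.etree.ElementTree as ET

DEFAULT_DURATION = 1.0              # assumed length of a move with no 'end' attribute
CONTAINERS = {"par", "seq", "act"}  # <act> is treated as a sequential container


def schedule(node, ref=0.0, out=None):
    """Assign absolute start/end times to every move under `node`.

    Returns (end_time, moves), where moves is a list of (tag, start, end).
    """
    if out is None:
        out = []
    begin = float(node.get("begin", 0))
    start = ref + begin
    if node.tag not in CONTAINERS:
        # Leaf move: its duration is end - begin when 'end' is present.
        end_attr = node.get("end")
        duration = float(end_attr) - begin if end_attr is not None else DEFAULT_DURATION
        end = start + duration
        out.append((node.tag, start, end))
        return end, out
    end = start
    child_ref = start
    for child in node:
        child_end, _ = schedule(child, child_ref, out)
        end = max(end, child_end)
        if node.tag != "par":
            # Sequential: the next child is timed from the end of this one.
            child_ref = child_end
    return end, out


doc = ET.fromstring(
    '<act><par><talk begin="1">Hello World</talk><exp begin="0" end="3" /></par>'
    '<hdmv begin="2" end="5" /></act>'
)
total, moves = schedule(doc)
for tag, start, end in moves:
    print(tag, start, end)
```

In a parallel container every child is timed from the container's own start, so the container ends when its longest child ends; in a sequential container the reference point moves forward after each child.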

23
Decision-Making
  • <excl> time container
  • Waits as long as necessary
  • Selects one of the items based on value of
    external event
  • <event> is defined in the model part
  • Controlled through API
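One possible reading of this selection rule is sketched below. The attribute names `ev_id` and `ev_val` are taken from the FML sample later in the deck; the function `select_excl` and the `events` dictionary are hypothetical stand-ins for whatever the player keeps internally.

```python
import xml.etree.ElementTree as ET


def select_excl(excl, events):
    """Return the child of an <excl> container whose ev_val matches the event value.

    `events` maps event ids to their current integer values. None is returned
    when nothing matches, which a player could treat as "keep waiting".
    """
    value = events.get(excl.get("ev_id"))
    for child in excl:
        ev_val = child.get("ev_val")
        if ev_val is not None and int(ev_val) == value:
            return child
    return None


excl = ET.fromstring(
    '<excl ev_id="user"><talk ev_val="0">Hi</talk><talk ev_val="1">Bye</talk></excl>'
)
waiting = select_excl(excl, {"user": -1})  # no item matches yet
chosen = select_excl(excl, {"user": 1})    # the second <talk> is selected
```

An application would change the event value through the API, after which the next call picks the matching item.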

24
FML Iterations
  • Controlled by the repeat attribute of time
    containers
  • <event name="select" val="-1" />
  • <act repeat="select">
  • </act>
  • Indefinite loops
  • Associated with an event
  • Definite loops
  • Associated with a number or an event with
    non-negative value
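A small sketch of how a `repeat` attribute might be resolved, under one interpretation of the rules above: a literal number or an event with a non-negative value gives a definite loop, and an event with a negative value gives an indefinite loop. The function name `repeat_count` and the use of `None` for "indefinite" are assumptions made for this illustration.

```python
def repeat_count(repeat, events):
    """Resolve a 'repeat' attribute to an iteration count.

    Returns an int for a definite loop, or None for an indefinite loop.
    `events` maps event names to their current integer values.
    """
    if repeat is None:
        return 1                  # no repeat attribute: run once
    try:
        return int(repeat)        # literal number: definite loop
    except ValueError:
        value = events[repeat]    # otherwise the attribute names an event
    return value if value >= 0 else None


print(repeat_count("3", {}))
print(repeat_count("select", {"select": -1}))
print(repeat_count("select", {"select": 2}))
```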

25
Behavioural Templates
  • <template>
  • In model part
  • Collection of predefined actions
  • Normal FML scripts with parameters
  • <behaviour>
  • In story part
  • Similar to a function call in normal programming
    languages
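The function-call analogy can be illustrated by expanding each behaviour element into a copy of the template body it names. This is a hedged sketch only: it uses the `behavior` spelling and the `name`/`template` attributes from the FML sample later in the deck, ignores template parameters, and `expand_behaviours` is a hypothetical helper rather than part of ShowFace.

```python
import copy
import xml.etree.ElementTree as ET


def expand_behaviours(fml):
    """Replace each <behavior template="..."> in the story with the template body."""
    templates = {t.get("name"): t for t in fml.iter("template")}
    story = fml.find("story")
    for parent in list(story.iter()):
        new_children = []
        for child in list(parent):
            if child.tag == "behavior":
                # "Call" the template: splice in a fresh copy of its moves.
                body = templates[child.get("template")]
                new_children.extend(copy.deepcopy(e) for e in body)
            else:
                new_children.append(child)
        parent[:] = new_children
    return fml


doc = ET.fromstring(
    '<fml><model><template name="hello"><seq begin="0">'
    '<talk begin="0">Hello</talk></seq></template></model>'
    '<story><action><behavior template="hello" /><talk>Bye</talk></action></story></fml>'
)
expand_behaviours(doc)
print([e.tag for e in doc.find("story/action")])
```

Copying the template body keeps the definition in the model part intact, so the same template can be "called" from several places in the story.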

26
Compatibility
  • XML
  • Use of existing parsers
  • Document Object Model (DOM) for dynamic
    interaction
  • MPEG-4
  • FAPs
  • XMT
  • FDPs through <param> model item

27
Feature-based Image Transformation (FIX)
  • Learn Xforms from a training set of images
  • Apply them to any new image
  • No complicated 3D model or large image database

28
FIX Basics
Learning: T = (Fs, M), M = Ft - Fs
Runtime: Apply T, Warp image
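Reading the slide as M = Ft - Fs, the learning and runtime steps on feature points can be sketched as follows. The function names and the `scale` parameter (a crude stand-in for a scaling function) are hypothetical, and feature points are assumed to be plain (x, y) tuples in corresponding order; the image warp itself is not shown.

```python
def learn_mapping(source_features, target_features):
    """Learning: M = Ft - Fs, one displacement vector per feature point."""
    return [
        (xt - xs, yt - ys)
        for (xs, ys), (xt, yt) in zip(source_features, target_features)
    ]


def apply_mapping(features, mapping, scale=1.0):
    """Runtime: move the feature points of a new image by the learned vectors."""
    return [
        (x + scale * dx, y + scale * dy)
        for (x, y), (dx, dy) in zip(features, mapping)
    ]


# Two mouth corners in a neutral and in a "target" training image.
m = learn_mapping([(0, 0), (10, 0)], [(0, 2), (10, -2)])
# The same move applied to a larger face in a new image.
moved = apply_mapping([(5, 5), (25, 5)], m, scale=2.0)
print(moved)
```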
29
Facial Features and Regions
  • Facial Features
  • Eyes, Eye-brows, Nose, Lips, Ears
  • Face Outline
  • Extra Control Points
  • Facial Patches

30
Corresponding Feature Points
  • Feature lines are divided into segments.
  • j is the feature point corresponding to the i-th point of
    the k-th segment in a feature line.
31
Image Transformations
  • Library of Transformations
  • Visemes in full-view
  • Facial expressions in full-view
  • 3D head movements
  • T = (F, M)
  • T: Transformation
  • F: Feature Lines/Points
  • M: Mapping Vectors

32
Mapping Features
  • Apply learned transformations to new features
  • Facial states 1,2 and 3 observed at learning time

33
Mapping Features (continued)
  • Use Scaling/Perspective Calibration
  • T 12 fs(T12)
  • fs is scaling function
  • T 13 fp(T12 , T13) T 24
  • T 13 (F3 M12) (F1 M12)
  • fp is perspective calibration function
  • Combined transformations
  • T14 = a T12 + b T13

34
Image Warping
  • Feature islands: interpolation with two feature
    points
  • Face patches: weighted average of feature points
    around the patch
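The two warping rules can be sketched for a single pixel as below. The slides do not say which interpolation or weights are used, so this example assumes linear interpolation along the segment for feature islands and inverse-distance weights for patches; both function names are hypothetical.

```python
import math


def island_displacement(p, a, b, da, db):
    """Displacement of point p between feature points a and b.

    da and db are the displacements of a and b; p is interpolated linearly
    according to its distance from a.
    """
    length = math.dist(a, b)
    t = math.dist(a, p) / length if length else 0.0
    return (da[0] + t * (db[0] - da[0]), da[1] + t * (db[1] - da[1]))


def patch_displacement(p, controls):
    """Displacement of a patch pixel as a weighted average of its controls.

    `controls` is a list of (feature_point, displacement) pairs around the
    patch; weights are inverse distances (an assumption of this sketch).
    """
    weights = [1.0 / (math.dist(p, f) + 1e-6) for f, _ in controls]
    total = sum(weights)
    dx = sum(w * d[0] for w, (_, d) in zip(weights, controls)) / total
    dy = sum(w * d[1] for w, (_, d) in zip(weights, controls)) / total
    return (dx, dy)


mid = island_displacement((5, 0), (0, 0), (10, 0), (0, 0), (0, 4))
inner = patch_displacement((0, 0), [((-1, 0), (2, 0)), ((1, 0), (4, 0))])
```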

35
Feature Islands
36
Facial Patches
  • Set of Controlling Feature Lines/Points for each
    Patch
  • e.g. Left Forehead
  • HeadTop, HeadLeft, Hair, BrowLeft

37
New Regions
  • More than one input image is required for newly
    appearing regions of face/head.
  • Inside Mouth
  • Sides/Back of Head
  • Those regions from the second image will be
    mapped into proper orientation.

38
Texture Mapping
  • Change Colour/Texture for Effects such as
    Wrinkles
  • Store/Apply Local Normalized Colour Changes
    around each Feature

39
Texture Mapping (Learning Algorithm)
  • For each Feature Fi in Destination Model Image
  • Define Feature Area Si around Fi
  • For each pixel in Si
  • Find Corresponding Pixel in Source
  • Calculate Normalized Colour Change
  • (destination - source)/source
  • Store Feature Area

40
Texture Mapping (Runtime Algorithm)
  • For each Feature Fi
  • Load Feature Area Si
  • Resize Si to the Transformed Base Image
  • For each pixel in Si
  • Find Corresponding Pixel
  • Apply Normalized Colour Change
  • (colour + colour × change)
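The learning and runtime texture steps can be sketched on a flat list of single-channel intensities. The runtime formula colour + colour × change is inferred from the learning formula (destination - source)/source rather than stated outright in the slides; the function names, the handling of zero-valued source pixels, and the 0..255 clamp are assumptions. Finding corresponding pixels and resizing the feature area are left out.

```python
def learn_colour_change(source, destination):
    """Learning: normalized change (destination - source) / source per pixel."""
    # A zero source pixel has no defined ratio; this sketch stores 0.0 for it.
    return [(d - s) / s if s else 0.0 for s, d in zip(source, destination)]


def apply_colour_change(colours, changes):
    """Runtime: colour + colour * change, clamped to the 0..255 range."""
    return [min(255.0, max(0.0, c + c * k)) for c, k in zip(colours, changes)]


# A wrinkle darkens the first pixel and brightens the second in the model image.
changes = learn_colour_change([100, 200], [80, 220])
# The same relative change applied to a different face.
result = apply_colour_change([50, 100], changes)
```

Storing the change relative to the source colour is what lets an effect learned on one face be applied to a face with a different skin tone.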

41
Sample Images (see more samples)
42
Sample Video Clips
  • Welcome Speech
  • video
  • Frown - Facial Expression
  • video
  • Turn Left - Head Move
  • video

43
ShowFace System
Applications (GUI, Web page, etc.)
SF-API
ShowFace Player
Video Kernel
Multimedia Mixer
Script Reader
Parser/ Splitter
Audio Kernel
Underlying Multimedia System
44
ShowFace System (continued)
  • SF-API
  • ShowFace Objects and Interfaces
  • Run-time Library Function
  • ShowFace Player
  • Wrapper ActiveX Component
  • Supporting Technologies
  • DirectShow
  • MS-XML

45
System-level Concerns
  • Streaming
  • Components with stream of data
  • Modularity
  • Independent upgrade and change
  • Well-defined API
  • Compatibility
  • XML, MPEG, Multimedia technologies
  • Flexibility
  • Web-based, stand-alone, etc

46
Audio Processing
  • Text-To-Speech Engine
  • MBROLA
  • Only for phoneme list
  • Pre-recorded diphone database
  • Speech segmentation tool
  • Smooth Connection
  • Static (power, pitch)
  • Dynamic (time/phase)

47
Multimedia Mixing
  • Synchronization
  • Number of video frames based on sound duration
  • Underlying multimedia system (DirectShow)
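As an illustration of deriving video frame counts from sound duration, the sketch below gives each speech segment a whole number of frames while keeping the running total aligned with the audio. The function name, the per-segment breakdown, and the default frame rate of 30 are assumptions, since the slides do not state a frame rate.

```python
def frames_per_segment(durations_s, fps=30):
    """Return the number of video frames for each speech segment.

    Rounding is done on the cumulative time, so rounding errors do not
    accumulate and the last frame lands at the end of the sound.
    """
    counts, elapsed, emitted = [], 0.0, 0
    for duration in durations_s:
        elapsed += duration
        target = round(elapsed * fps)  # frames that should exist by now
        counts.append(target - emitted)
        emitted = target
    return counts


counts = frames_per_segment([0.25, 0.5, 0.125], fps=24)
print(counts, sum(counts))
```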

48
Evaluation Criteria
Content: 1-Realism, 2-Graphic Capabilities, 3-Speech Generation
Architecture: 4-Timeliness and Streaming, 5-Descriptiveness, 6-Compatibility, 7-Modularity
Development: 8-Computational Simplicity and Efficiency, 9-Input Requirements
49
Test Procedure
Application Level: More Complicated Scenarios. All System Components Involved.
System Level (Modular Development): New Technologies and Methods. All System Components Involved.
System Level (Integration/Operation): Simple Cases with All Components. Used GraphEditor and Simple Web Page.
Algorithm/Transformation Level: Creating individual images/sounds. Used ShowFaceStudio.
50
Summarized Comparative Evaluation
Criteria         ShowFace  MikeTalk  VideoRewrite  PerceptionLab  VHD
Realism          2.3       1.6       1.6           2              2.3
Capabilities     3         1.5       1.75          2.25           3
Speech           2         2         2             0              2
Time/Stream      2         1         1             1              1
Descriptiveness  4         0         0             0              3
Compatibility    4         1         1             1              3
Modularity       4         1         1             1              2
Efficiency       3         3         3             3              2
Input            3         1         2             3              3
51
Conclusion
  • Structured content description with FML
  • Content creation with FIX
  • Minimized model information and input images
  • Realistic and Personalized
  • ShowFace modular streaming framework
  • Compatibility with and making use of existing
    technologies and standards
  • Comprehensive evaluation criteria

52
Future Extensions
  • Enhanced feature detection
  • Perspective calibration
  • Enhanced texture transformation
  • e.g. Lighting
  • MPEG-4 integration
  • Behavioural modeling
  • Interface for web services

53
To Be Continued
  • Thank you !

54
Face Animation Timeline
FML Timeline
55
FML Sample
  • <fml>
  • <model> <!-- Model Info -->
  • <img src="me.jpg" />
  • <range dir="0" val="60" />
  • <event id="user" val="-1" />
  • <template name="hello" >
  • <seq begin="0">
  • <talk begin="0">Hello</talk>
  • <hdmv begin="0" end="5" dir="0" val="30" />
  • </seq>
  • </template>
  • </model>

56
FML Sample (continued)
  • <story> <!-- Story Timeline -->
  • <action>
  • <behavior template="hello" />
  • <excl ev_id="user">
  • <talk ev_val="0">Hi</talk>
  • <talk ev_val="1">Bye</talk>
  • </excl>
  • <par begin="0">
  • <talk begin="1">Hello World</talk>
  • <exp begin="0" end="3" type="3" val="50" />
  • </par>
  • </action>
  • </story>
  • </fml>

57
Basic FML Moves
  • Talk
  • Text to be spoken
  • <talk>Hello</talk>
  • HdMv
  • 3D head movements
  • Around three axes
  • Expr
  • smile, anger, surprise, sadness, fear, and normal
  • Fap
  • Any other MPEG-4 FAP (embedded in FML)

58
MikeTalk
59
Video Rewrite
60
Perception Lab
61
FIX New Regions
62
Evaluation - Realism
  • Subjective
  • Random vs. non-random viewers
  • Testable effects and ground truth
  • Standard subjects
  • Objective
  • Regional image comparison (partially used)
  • Feature comparison (used here)
  • Geometric validity

63
Evaluation - Capabilities
  • Wide range of actions
  • Talking
  • Expressions
  • Movements
  • Personalization
  • Robustness

64
Evaluation - Speech
  • SEM = SC × SQM / SDS
  • SEM, Speech Evaluation Index
  • SQM, Speech Quality Metric (e.g.
    signal-to-noise-ratio and comprehensibility)
  • SDS, Speech Database Size (e.g. number of
    segments and the size)
  • SC, proper scaling coefficient
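Read as SEM = SC × SQM / SDS, the index rewards speech quality and penalizes database size. A minimal sketch follows; the function name and the sample numbers are purely illustrative, and the units of SQM and SDS are left to whoever applies the metric.

```python
def speech_evaluation(sqm, sds, sc=1.0):
    """Compute SEM = SC * SQM / SDS for a positive database size."""
    if sds <= 0:
        raise ValueError("speech database size must be positive")
    return sc * sqm / sds


# Same quality score, but the second system needs a database four times larger.
small_db = speech_evaluation(sqm=8.0, sds=2.0)
large_db = speech_evaluation(sqm=8.0, sds=8.0)
```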

65
Evaluation - Timeliness and Streaming
  • Frames per second
  • Input/output formats
  • Connectivity
  • Underlying technologies
  • Compatibility
  • Quality of service
  • Streaming structure

66
Evaluation - Descriptiveness
  • Hierarchical view of animation
  • high-level stories to low-level actions (like
    mouth-open)
  • Support for MPEG-4 FAPs
  • Support for MPEG-4 XMT framework
  • Behavioural modeling and templates
  • Dynamic decision-making
  • allowing external events to change the sequence
    of actions

67
Evaluation - Compatibility
  • Overlapped with other criteria
  • XML
  • Parser
  • DOM
  • MPEG-4
  • XMT
  • FAP
  • FDP

68
Evaluation - Modularity
  • Independent operation
  • Upgrade/change with minimum effect
  • Well-defined interfaces

69
Evaluation - Efficiency
  • Development
  • Maintenance
  • Runtime
  • Software measurement research

70
Evaluation - Input Requirements
  • Modeling vs. Runtime
  • 3D data vs. 2D images
  • Weighted combination of parameters

71
ShowFaceStudio
72
Web Test (see another sample)
73
Filter Graph Editor
74
Ground Truth (Visemes)
75
Ground Truth (Mix Samples)
76
Learning-phase Images
77
Video Editing
  • SMPTE Time Coding
  • Location of events down to frame
  • Edit Decision List (EDL)
  • Electronic Program Guide (EPG)
  • Metadata Information
  • SMPTE Metadata Dictionary
  • Dublin Core
  • EBU P/Meta
  • TV Anytime
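Locating an event "down to frame" amounts to converting between a frame index and an HH:MM:SS:FF time code. The sketch below assumes non-drop-frame counting and a default of 25 frames per second; drop-frame time code for 29.97 fps material needs extra handling that is not shown, and both function names are hypothetical.

```python
def frame_to_timecode(frame, fps=25):
    """Format a zero-based frame index as non-drop-frame HH:MM:SS:FF."""
    seconds, ff = divmod(frame, fps)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def timecode_to_frame(timecode, fps=25):
    """Convert non-drop-frame HH:MM:SS:FF back to a frame index."""
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


print(frame_to_timecode(90061))
print(timecode_to_frame("01:00:02:11"))
```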

78
VRML/X3D
  • Virtual Reality Modeling Language (VRML)
  • Description of virtual worlds
  • Specific viewing and authoring tools
  • Shares concepts with 3D programming libraries
    like OpenGL

X3D is the XML-based version.
79
Content Description in MPEG
MPEG-4: Object Content Information (OCI); Face/Body Animation and Modeling Parameters
MPEG-7: Extended OCI; Descriptors (for objects); Schemas (for descriptors); Definition Language
80
Synchronization and Timing
  • Synchronized Multimedia Integration Language
    (SMIL)
  • High-level description of events
  • XML-based
  • Level of abstraction on top of MPEG-4 objects
    (any type!)

HTML+TIME from Microsoft and some other
companies.
81
MPEG-4 XMT
  • Extensible MPEG-4 Textual format (XMT)
  • Links MPEG-4 to other languages like VRML and SMIL

82
FACS
  • Facial Action Coding System (FACS)
  • 64 basic facial Action Units (AUs)
  • Inner Brow Raiser
  • Mouth Stretch
  • Head Turn Left
  • Originally not for computer graphics and
    animation, but used extensively

83
MPEG-4 FDP
  • Face Definition Parameters (FDPs)
  • Mostly for calibrating synthetic faces

84
MPEG-4 FAP
  • Face Animation Parameters (FAPs)
  • High-level
  • Visemes (14)
  • Expressions (6)
  • Low-level
  • 3D head movements and other actions (66)
  • Mostly apply to feature points defined by FDPs

85
Face/Body Animation Languages
MPEG-4
VHML
XML
MPML
Face Animation
BEAT
Behavioural Modeling
CML
Decision-making
PAR
Temporal Relation
86
3D Modeling
  • Volume
  • Constructive Solid Geometry
  • Volume Elements (Voxels)
  • Octrees
  • 3D Surfaces
  • Meshes
  • Splines

87
3D Face Models: Virtual Human Director, Lee et al.
88
View Morphing
  • View Morphing (Metamorphosis)
  • Intermediate Frames between Source and Target
  • Based on Movements of Control Points and Features
  • Computer Vision Techniques to Select the
    Corresponding Features (Optical Flow, Template
    Matching, etc.)

89
Warp and Morph: Digital Image Warping (Wolberg); MikeTalk (Ezzat et al.)
90
Multimedia Presentation
  • Face Animation as a special case of Multimedia
    Presentation
  • Examples
  • Image Encoding/Decoding/Display
  • Textual Data used as description for images to be
    created and displayed

91
Multimedia Presentation (continued)
  • Streaming
  • Structured Content Description
  • Generalized Decoding
  • Component-based Architecture
  • Compatibility
  • Efficiency

92
Web Sample
93
Web Sample (FML)
  • <event id="user" val="-1" />
  • . . .
  • <ser>
  • <talk ev_val="0">Hello </talk>
  • <excl ev_id="user">
  • <talk ev_val="0">Ali</talk>
  • . . .
  • </excl>
  • </ser>

94
Web Sample (HTML)
  • <body onload="onPageLoad()">
  • <SELECT id="User" onchange="onUserChange()">
  • <OPTION value="0" selected>Ali</OPTION>
  • <OPTION value="1">Babak</OPTION>
  • <OPTION value="-1"></OPTION>
  • </SELECT>
  • <OBJECT id="SFPlayer" >
  • </OBJECT>

95
Web Sample (Script)
  • function onPageLoad() {
  • SFPlayer.InputFile = "SampleFml.xml";
  • SFPlayer.CreateMedia();
  • SFPlayer.Play();
  • }
  • function onUserChange() {
  • SFPlayer.SetEvent(USER, User.selectedIndex);
  • }