Title: Nicolas Galoppo von Borries
1 Speech synchronized facial animation
- Speech animation using Viseme Space
- G.A. Kalberer, P. Mueller and L. Van Gool
2 Introduction
- Why not make the job even easier, and generate
animations directly from speech or transcribed
text?
- Make animations directly from the audio channel,
without performance capture
- This animation can be the point of departure for
the animators, who then also get support from the
system to make further changes as desired
3 Visemes
- Visemes can be considered as the visual speech
counterparts of phonemes.
- They are associated with the 3D deformations of a
neutral face
- Animation is achieved by concatenating visemes
VIDEO
4 Animating faces in Viseme Space
- Smooth and convincing transitions by performing
interpolation in Viseme Space rather than in
geometric space
- Viseme Space can be roamed by the animator, as a
convenient tool to make creative modifications to
the animation.
5 Face animation with visemes
- In related work, the authors describe how to
extract a set of visemes from a face, observed in
3D while talking
- This is a time-consuming process: we don't want
to repeat it for every single face to be animated
6 Animation of novel faces
- What if we haven't observed visemes for a novel
face? There are 3 main steps to animate such a
face
- Personalizing the visemes
- Automatic audio-based animation
- Further modifications by the animator
7 Personalizing the Visemes
- Simply cloning the visemes of a particular
example face onto the novel face doesn't look real
- We represent faces as points in Face Space
8 Personalizing the Visemes
- is the orthogonal projection of the novel
face onto this hyperplane
9 Personalizing the Visemes
- Express the projected novel face as a weighted
combination of the example faces, with weights wi
- Now we apply the weights wi to the visemes of the
example faces, to yield a personalized set of
visemes for the novel face
- The effect: a rounded face will get visemes that
are closer to those of the more rounded example
face
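The projection and re-weighting described on the three slides above can be sketched in a few lines of NumPy. This is a minimal sketch under assumptions that are not stated on the slides: faces and viseme displacements are flattened into plain vectors, the hyperplane is taken to be the affine hull of the example faces (so the weights sum to 1), and the function names `projection_weights` and `personalize_visemes` are hypothetical.

```python
import numpy as np

def projection_weights(examples, novel):
    """Weights of the orthogonal projection of `novel` onto the affine
    hull of the example faces (assumption: weights sum to 1).

    examples: (n, d) array, one flattened example face per row
    novel:    (d,) flattened novel face
    """
    base = examples[0]
    diffs = (examples[1:] - base).T                 # (d, n-1) directions in the hyperplane
    # Least squares gives the orthogonal projection onto span(diffs).
    coeffs, *_ = np.linalg.lstsq(diffs, novel - base, rcond=None)
    return np.concatenate(([1.0 - coeffs.sum()], coeffs))

def personalize_visemes(example_visemes, weights):
    """Blend the example faces' visemes with the projection weights.

    example_visemes: (n, k, d) array, k viseme displacements per example face
    weights:         (n,) weights from projection_weights
    Returns a (k, d) array of personalized viseme displacements.
    """
    return np.tensordot(weights, example_visemes, axes=1)
```

A novel face that lies close to one example face receives a weight near 1 for that face, so its personalized visemes stay close to that example's visemes, which matches the "rounded face" remark above.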
10 Automatic audio-based animation
- Basic steps for animation
- Audio track: extract allophones and timings
- Translate the allophones to visemes
- Concatenate the visemes
- Animation sequence is a trajectory in Viseme
Space (similar to Face Space)
- Viseme Space is based on Independent Component
Analysis (ICA) as opposed to PCA
- (more about ICA later)
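The three basic steps above can be illustrated with a small Python sketch. It assumes the audio analysis has already produced timed allophone segments. The mapping table, the viseme labels, and the function name `to_viseme_track` are hypothetical placeholders, not the mapping used by the authors.

```python
# Hypothetical allophone-to-viseme table (illustration only).
ALLOPHONE_TO_VISEME = {
    "p": "PBM", "b": "PBM", "m": "PBM",
    "f": "FV", "v": "FV",
    "a": "A", "o": "O",
    "sil": "REST",
}

def to_viseme_track(segments):
    """Turn timed allophones into a list of (time, viseme) keyframes.

    segments: iterable of (allophone, start_seconds, end_seconds)
    Each keyframe is placed at the midpoint of its segment. Consecutive
    segments that map to the same viseme are merged into one keyframe.
    Unknown allophones fall back to the rest pose.
    """
    track = []
    for allophone, start, end in segments:
        viseme = ALLOPHONE_TO_VISEME.get(allophone, "REST")
        if track and track[-1][1] == viseme:
            continue  # same mouth shape as the previous keyframe
        track.append((0.5 * (start + end), viseme))
    return track
```

The resulting keyframes are the points that the animation trajectory has to visit in Viseme Space.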
11 Automatic audio-based animation
- Two problems with straight interpolation
- Point-to-point navigation between visemes yields
jerky motions
- The temporal samples in the audio track may not
coincide with the pace at which visemes change
- Do a spline fitting to the Viseme Space
coordinates (NURBS)
- All the visited deformations look realistic
- Fixed rate sampling gives smooth interpolations
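The idea of fitting a smooth curve through the viseme keyframes and then sampling it at a fixed frame rate can be sketched as follows. The slides name NURBS; this sketch swaps in a simpler cubic Hermite spline with finite-difference tangents, purely for illustration. The function name `sample_trajectory` and the default frame rate are assumptions.

```python
import numpy as np

def sample_trajectory(key_times, key_points, fps=25.0):
    """Sample a smooth curve through viseme keyframes at a fixed rate.

    key_times:  (n,) increasing keyframe times in seconds, n >= 2
    key_points: (n, d) Viseme Space coordinates of the keyframes
    Returns (frame_times, frames) with frames of shape (n_frames, d).
    """
    t = np.asarray(key_times, dtype=float)
    p = np.asarray(key_points, dtype=float)
    m = np.gradient(p, t, axis=0)                    # tangent estimate at each keyframe
    frame_times = np.arange(t[0], t[-1] + 1e-9, 1.0 / fps)
    # Index of the spline segment that contains each frame time.
    i = np.clip(np.searchsorted(t, frame_times, side="right") - 1, 0, len(t) - 2)
    h = (t[i + 1] - t[i])[:, None]                   # segment durations
    s = (frame_times - t[i])[:, None] / h            # local parameter in [0, 1]
    # Cubic Hermite basis functions.
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    frames = h00 * p[i] + h10 * h * m[i] + h01 * p[i + 1] + h11 * h * m[i + 1]
    return frame_times, frames
```

Because the frame times come from the frame rate rather than from the keyframe times, the output is evenly spaced even when the visemes change at an irregular pace.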
12 Automatic audio-based animation
- 2 types of visemes
- Vowels and labial consonants: strict deformations
- Others can be pronounced with a lot of visual
variation
- First perform the fitting using the first type,
then bend the curve towards points of the second
type.
13 Modifications by the animator
- We want to allow the animator to add his creative
input to the generated animation
- The animator can change
- The visited visemes
- The spline trajectories in between
14 Modifications by the animator
- What happens when the animator changes the spline
trajectory?
- The space of possible deformations is the same
whether it is based on PCA or ICA
- The difference: ICA generates independent
components
15 Modifications by the animator
- In fact, PCA is a preprocessing step for ICA
- PCA gives the major modes of variation in the net
effect of the movements of the individual
muscles
- ICA decouples the net effect again. Therefore,
ICA makes the modifications by the animator more
intuitive.
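The relation between PCA and ICA described above can be illustrated with a toy NumPy example: PCA whitens the data, and ICA then only has to find a rotation inside the same subspace that makes the components independent. This is a sketch on synthetic data, not the authors' pipeline. It uses a brute-force search over 2D rotations that maximizes the absolute excess kurtosis, which is one simple ICA criterion. The function names and the synthetic "muscle" signals are assumptions.

```python
import numpy as np

def pca_whiten(x, k):
    """Project (n_samples, d) data onto its first k principal components
    and scale them to unit variance (uncorrelated scores)."""
    xc = x - x.mean(axis=0)
    u, _, _ = np.linalg.svd(xc, full_matrices=False)
    return u[:, :k] * np.sqrt(len(x) - 1)

def excess_kurtosis(y):
    """Excess kurtosis of each column of zero-mean data."""
    return (y**4).mean(axis=0) / (y**2).mean(axis=0) ** 2 - 3.0

def ica_2d(z, n_angles=360):
    """Brute-force 2D ICA on whitened data: try rotations and keep the
    one whose components are the most non-Gaussian."""
    best, best_score = None, -np.inf
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        y = z @ np.array([[c, -s], [s, c]])
        score = np.abs(excess_kurtosis(y)).sum()
        if score > best_score:
            best, best_score = y, score
    return best

# Synthetic data: two independent signals mixed into 6 coordinates.
rng = np.random.default_rng(0)
sources = rng.uniform(-1.0, 1.0, size=(2000, 2))
deformations = sources @ rng.normal(size=(2, 6))

pcs = pca_whiten(deformations, k=2)   # uncorrelated, but each PC mixes both signals
ics = ica_2d(pcs)                     # rotated so each IC follows one signal
```

Both `pcs` and `ics` span the same two-dimensional space. Only the choice of axes differs, which is why editing a single independent component tends to change one effect at a time.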
16 Modifications by the animator
- Example
- 1 IC can model opening the mouth, whereas more
PCs are needed to model the same change
- Changing 1 PC can open the mouth, but it will
also round it
- Animators want intuitive keyframes like visemes,
but basic emotions as the primary modeling
interface
17 Results
- Cloning the visemes alone does not work;
weighting after projection in Face Space is needed
18 Results
And animated
VIDEO
19 Additional remarks
- The main key to producing realistic animations is
to add non-verbal speech-related facial
expressions
- Add Perlin noise
- Automatic Generation of Non-Verbal Facial
Expressions from Speech
- Irene Albrecht, Jorg Haber, Hans-Peter Seidel