Geometric and Articulatory Models in Audiovisual Speech Technology

1
Geometric and Articulatory Models in Audiovisual
Speech Technology
Tiina Karsikas
Audiovisuelle Sprache in der Sprachtechnologie (Audiovisual Speech in Speech Technology)
Summer semester 2002
2
Papers
  • Parke, F.I. (1982). A Parameterized Model for
    Facial Animation. IEEE Computer Graphics and
    Applications 2(9), pp. 61-70.
  • Magnenat-Thalmann, N., E. Primeau and D. Thalmann
    (1988). Abstract Muscle Action Procedures for
    Human Face Animation. The Visual Computer 3(5),
    pp. 290-297.
Internet sources
  • LCE: http://www.lce.hut.fi/research/face/index.html
  • KTH: http://www.speech.kth.se/multimodal
  • PSL: http://mambo.ucsc.edu/index.html
  • MIRALab: http://www.miralab.unige.ch
3
Outline
  • Application Areas
  • Geometric Models
  • - Introduction
  • - Parke's Model
  • - Parke's descendants (Talking Heads in action)
  • Articulatory Models
  • - Introduction
  • - Magnenat-Thalmann et al.
  • - Rendez-vous à Montréal

4
Application Areas for Visual Speech Synthesis
  • Basic research on audiovisual speech perception
  • Multimedia
  • - e.g. synthetic news reader / story teller
  • Information systems in public and noisy
    environments
  • - airports
  • - train stations
  • - shopping centres
  • - etc.
  • Aids for the hearing impaired
  • - tool for interactive training of speechreading
  • - visualizing tongue positions in speech
    training for deaf children
  • - telephone communication aid with a synthetic
    face

5
Other Potential Applications (Parke 1982)
  • Previews of the effects of corrective surgical
    or dental procedures on given faces
  • - conformation changes or changes in the range
    of possible expressions
  • Data compression for image transmission
  • - transmitting facial images simply by sending
    the appropriate parameter values
  • Forensic Art
  • - a crime victim could interactively modify
    parameters to obtain a 3D approximation
    of the face of an assailant

Identi-KIT 2000: witnesses are shown a whole face
within the basic group matching their
description, after which they point out features
that aren't quite right
6
Geometric (or Parametric) Models
Attempt to reproduce speech signals and facial
deformations in geometrical terms (i.e. without
trying to understand the underlying physiological
mechanisms that produce them)
  • 3D structure of the face which can be modified
    and deformed by the action of parameters
  • defined by a set of 3D meshes that describe the
    surface geometry of various organs (e.g. skin,
    teeth, eyes, etc.) which are generally
    involved in speech production
  • typically a few hundred 3D vertices (and the
    polygons they form) are moved by control
    parameters on the face (e.g. rotation)
  • control parameters can control a single point or
    more complex articulatory gestures or facial
    expressions
  • facial animation is created by changing the
    values of control parameters and redrawing the
    face by using the new values
  • the approach has the advantage of being quite
    simple and efficient as it requires low data
    storage

7
Parke's Model
Parke's 1982 parametric face model is what most
of the present audiovisual speech synthesizers
are based on
  • Mesh of about 800 polygons that approximate the
    surface of a human face, including the eyes, the
    eyebrows, the lips, and the teeth
  • Two main approaches:
  • 1. Key frame animation - a number of facial
    images are specified as key frames and a
    computer algorithm is used to generate the
    in-between frames
  • 2. Parameterized facial models - an animator
    creates any facial image by specifying the
    appropriate parameter values
  • → Parametric models are better for 3D (as the
    number of key frames becomes too high)
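The key frame approach can be illustrated with a minimal sketch. It assumes plain linear blending between two key frames stored as lists of (x, y, z) vertices; the slide does not say which in-betweening algorithm is used, and the two-vertex "mouth" is hypothetical.

```python
def inbetween(key_a, key_b, t):
    """Linearly blend two key frames (lists of (x, y, z) vertices), 0 <= t <= 1."""
    if len(key_a) != len(key_b):
        raise ValueError("key frames must have the same number of vertices")
    return [
        tuple(a + t * (b - a) for a, b in zip(va, vb))
        for va, vb in zip(key_a, key_b)
    ]


def generate_inbetweens(key_a, key_b, n_frames):
    """Return n_frames intermediate frames strictly between the two key frames."""
    return [inbetween(key_a, key_b, i / (n_frames + 1))
            for i in range(1, n_frames + 1)]


# Hypothetical two-vertex "mouth": a closed and an open key frame.
closed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
opened = [(0.0, -1.0, 0.0), (1.0, -1.0, 0.0)]
frames = generate_inbetweens(closed, opened, 3)
```

Every distinct pose needs its own complete key frame here, which is why the number of key frames grows quickly for a 3D face; a parameterized model stores only a few parameter values per pose.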
8
Parke's Model (cont.)
  • Creation of Parameterized Models
  • 1. Underlying concept of parameterization and the
    development of appropriate parameter sets
  • - parameter values can be thought of as criteria
    values describing or specifying individual
    members in any given class
  • - a complete set of parameters allows every
    member of the class to be specified just by
    selecting appropriate parameter values (and
    every possible unique member can be described
    by a unique n-tuple of parameter values)
  • - if certain members of the class cannot be
    uniquely described by parameter values, the
    parameter set is not complete
  • - e.g. cubes
  • 2. Graphic image synthesizers producing images
    based on some defined parameterization

[Diagram labels: MODEL DESIGNER; ANIMATOR; PARAMETERS;
PARAMETRIC MODEL (procedures, functions, data, etc.);
GRAPHIC DESCRIPTORS; GRAPHIC ROUTINES (shading,
rendering, etc.); FACIAL IMAGES]
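The cube example can be made concrete with a small sketch. It assumes, purely for illustration, that a cube is described by an edge length and a centre point; the function name and parameter choice are hypothetical and not taken from Parke's paper.

```python
from itertools import product


def cube_vertices(edge, center=(0.0, 0.0, 0.0)):
    """Return the 8 corner points of an axis-aligned cube."""
    h = edge / 2.0
    cx, cy, cz = center
    return [(cx + sx * h, cy + sy * h, cz + sz * h)
            for sx, sy, sz in product((-1, 1), repeat=3)]


# Each parameter tuple (edge, center) picks out exactly one cube of the class.
unit = cube_vertices(1.0)
big = cube_vertices(2.0, center=(1.0, 1.0, 1.0))
```

With only these two parameters the set is complete for axis-aligned cubes but not for the class of all cubes: a rotated cube cannot be described, so an orientation parameter would have to be added.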
9
Parke's Model (cont.)
  • Facial Parameterization

Developing parameter sets
  • 1. by observing surface properties of faces and
    developing ad hoc sets allowing the observed
    characteristics to be specified parametrically
  • 2. by studying the underlying structures, or
    facial anatomy
  • 3. by blending the two: parameters based on
    structural understanding, supplemented by
    parameters based on observation
  • - Based on Ekman's Facial Action Coding System
    (FACS) (uses about 50 facial actions)
Two broad categories of parameters
  • Expression parameters - controlling expression
    or emotional content
  • - e.g. eyes: eyelid opening, eyebrow position,
    looking direction...
  • - e.g. mouth: jaw rotation, width, expression of
    smile, frown...
  • Conformation parameters - controlling structure
    of an individual face
  • - e.g. color of skin, eyes, lips, etc.; nose,
    chin, forehead shape...
10
Effects of the Expression Parameters
[Images: sadness, surprise, disgust, neutral,
happiness, anger, fear]
11
Parke's Model (cont.)
  • The Model
  • - parameter set includes both expression and
    conformation parameters
  • - derived from earlier, more general versions

Polygon Topology
  • - the facial mask, each eyeball, and the teeth
    are all separate polygons (connected networks)
  • - the 3D position of each polygon vertex varies
    according to the parameter values, eye
    orientation, and face orientation
  • - the polygons are sized and positioned to match
    the features of the face
Operations
  • - five kinds of operations determine vertex
    positions from the parameter values
  • 1. Procedural construction: models the eyes
  • 2. Interpolation: used for those portions of the
    face that change shape
  • 3. Rotation: used to open the mouth
  • 4. Scaling: controls the relative size of facial
    features
  • 5. Position offset: controls the length of the
    nose, corners of the mouth, raising of the
    upper lip
The advantage of parameterized models to the
animator is that he/she need only manipulate a
limited amount of information (the parameters) to
create a sequence of images
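Four of the five operations can be sketched as small vertex functions. This is an illustrative sketch, not Parke's code: the function names, the choice of the x axis for the jaw hinge, and the sample vertex are all assumptions, and procedural construction of the eyes is left out.

```python
import math


def interpolate(v_min, v_max, p):
    """Blend between two extreme vertex positions; p in [0, 1] is the parameter."""
    return tuple(a + p * (b - a) for a, b in zip(v_min, v_max))


def rotate_about_x(v, pivot, angle_deg):
    """Rotate a vertex about an axis parallel to x through `pivot` (a jaw hinge)."""
    a = math.radians(angle_deg)
    x, y, z = v
    _, py, pz = pivot
    dy, dz = y - py, z - pz
    return (x,
            py + dy * math.cos(a) - dz * math.sin(a),
            pz + dy * math.sin(a) + dz * math.cos(a))


def scale(v, origin, factor):
    """Scale a vertex relative to `origin` (relative size of a feature)."""
    return tuple(o + factor * (c - o) for c, o in zip(v, origin))


def offset(v, delta):
    """Translate a vertex by `delta` (e.g. nose length, mouth corners)."""
    return tuple(c + d for c, d in zip(v, delta))


# Hypothetical chin vertex, opened by a 15 degree jaw rotation about the origin.
chin = (0.0, -1.0, 1.0)
chin_open = rotate_about_x(chin, (0.0, 0.0, 0.0), 15.0)
```

The animator would only set values such as the rotation angle or the interpolation parameter; the model applies the matching operation to every affected vertex.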
12
Some of Parke's Descendants
  • Contain a number of modifications to Parke's
    model to improve it and to make it more suitable
    for synthesized speech.
  • → usually a set of rules for generating facial
    control parameter trajectories from phonetic
    text, and a simple tongue model (which were not
    included in the original Parke model)
  • Finnish Talking Head - Laboratory
    of Computational Engineering at Helsinki
    University of Technology
  • Controlled using 49 parameters, of which 12 are
    used in visual speech (lip formation and jaw
    rotation)
  • Use of eyes and eyebrows
  • Future work: improving coarticulation modelling
    and adding a tongue to the model
  • - Self-acknowledged drawback: the current quality
    of the speech synthesizer is far from a natural
    voice
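A rule set that turns phonetic text into control parameter trajectories might look like the following sketch. Everything in it is hypothetical: the phoneme table, the target values, and the linear movement between targets are assumptions for illustration, not the rules used by any of the systems named here.

```python
# Hypothetical target values for two control parameters per phoneme.
TARGETS = {
    "a": {"jaw_rotation": 0.8, "lip_width": 0.5},
    "m": {"jaw_rotation": 0.0, "lip_width": 0.5},
    "i": {"jaw_rotation": 0.2, "lip_width": 0.9},
}


def trajectory(phonemes, frames_per_phoneme):
    """Return one parameter dict per frame, moving linearly between targets."""
    frames = []
    for cur, nxt in zip(phonemes, phonemes[1:]):
        a, b = TARGETS[cur], TARGETS[nxt]
        for i in range(frames_per_phoneme):
            t = i / frames_per_phoneme
            frames.append({k: a[k] + t * (b[k] - a[k]) for k in a})
    frames.append(dict(TARGETS[phonemes[-1]]))  # hold the final target
    return frames


track = trajectory(["m", "a", "m", "i"], 4)
```

Each phoneme reaches its full target regardless of its neighbours, so this sketch has no coarticulation at all, which is the kind of limitation the future-work items refer to.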

13
  • KTH Talking Heads - Centre for Speech
    Technology at KTH Royal Institute of Technology
  • Flexible architecture
  • → allows the creation of new characters
    either by adopting a static wireframe
    model and specifying the required
    deformation parameters for that model,
    or by sculpting and reshaping an already
    parameterized model
  • Use of eyes, eyebrows, and tongue; also several
    different expressions
  • - Currently working on improving dynamic
    articulation modelling

[Images: Kattis, Holger, Sven, Gunnar, Urban, Olga,
August, Gustav]
14
Mr. Smoketoomuch
15
Baldi - UCSC Perceptual Science Lab
  • - Can be aligned with synthetic or natural speech
  • Main use as a language tutor for children with
    hearing problems
  • → help with pronunciation and real-time feedback
  • → children can easily practise lip reading, as
    Baldi produces probably the most accurate
    automatic generation of visible speech in
    the world (Dr. Ron Cole, University of Colorado,
    Boulder)
  • → in addition the children wear headphones over
    an acoustic nerve implant (inserted behind
    the ear during a three-hour operation) that
    converts sound into electrical signals that can
    be relayed to the brain
  • - Also used for autistic people and people with
    reading disabilities

16
Articulatory Models
a.k.a. Pseudomuscle-Based Models
  • - aim not to exactly simulate the detailed facial
    anatomy but to develop models with only a
    few control parameters that emulate the basic
    face muscle actions
  • - based on abstraction of muscle actions, where
    deformation operators define muscle activities
    (ignoring the tissue structures and the exact
    muscle structure)

Platt & Badler (1981): a mass-spring model to
simulate muscles
  • - uses Ekman's Facial Action Coding System (FACS)
  • - based on underlying facial structure
  • - facial points simulated in the skin, the
    muscles and the bones by 3D networks
  • - skin defined by a set of 3D points defining a
    modifiable surface
  • - bones represent an initial unmovable level
  • - between the two levels are muscles as groups of
    points with elastic arcs
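The idea of skin points tied to fixed bone points by elastic arcs can be sketched with a single Hooke's-law spring. This is a minimal illustration under stated assumptions (one spring, overdamped relaxation steps, made-up stiffness and force values), not the Platt and Badler implementation.

```python
import math


def spring_force(p, anchor, rest_length, stiffness):
    """Hooke's-law force on point p from an elastic arc tied to `anchor`."""
    d = [a - c for a, c in zip(anchor, p)]
    length = math.sqrt(sum(x * x for x in d))
    if length == 0.0:
        return (0.0, 0.0, 0.0)
    magnitude = stiffness * (length - rest_length)  # > 0 when stretched
    return tuple(magnitude * x / length for x in d)


def relax(p, anchor, rest_length, stiffness, muscle_pull, steps=500, dt=0.05):
    """Move a skin point until spring and muscle forces balance."""
    p = list(p)
    for _ in range(steps):
        f = spring_force(p, anchor, rest_length, stiffness)
        p = [c + dt * (fs + fm) for c, fs, fm in zip(p, f, muscle_pull)]
    return tuple(p)


bone = (0.0, 0.0, 0.0)   # unmovable level
skin = (0.0, 0.0, 1.0)   # modifiable surface point at rest length from the bone
rest = relax(skin, bone, 1.0, 10.0, muscle_pull=(0.0, 0.0, 0.0))
pulled = relax(skin, bone, 1.0, 10.0, muscle_pull=(0.0, 0.0, 2.0))
```

With no muscle pull the skin point stays put; with a constant pull it settles where the spring force cancels the pull, here at rest length plus pull divided by stiffness.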
17
Articulatory Models (cont.)
N. Magnenat-Thalmann, E. Primeau and D. Thalmann:
Abstract Muscle Action procedures (AMA)
  • - more complex than the single parameter
    approach and a general muscle approach
  • - work on specific regions of the human face
    (which must be defined when the face is
    constructed)
  • - each AMA procedure approximates the action of
    a single muscle or a group of closely related
    muscles
  • - e.g. the vertical jaw action responsible for
    opening the mouth
  • → the single procedure is composed of several
    motions (lowering the corners of the mouth,
    lowering the lip and parts of the upper lip and
    rounding the overall lip shape)
  • - the order of each action is extremely
    important as the AMA procedures are not
    independent
  • → each AMA procedure is responsible for a facial
    parameter corresponding approximately to a
    muscle
  • - similar to, but not the same as, FACS action
    units
  • - Weakness: dependence on the order of the
    actions, as the muscles are not independent of
    each other
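Why the order of actions matters can be shown with two stand-in procedures that do not commute. The functions below are hypothetical simplifications (a rotation and a translation), not the AMA procedures from the paper.

```python
import math


def jaw_open(v, angle_deg):
    """Rotate a vertex about the x axis through the origin (a stand-in jaw hinge)."""
    a = math.radians(angle_deg)
    x, y, z = v
    return (x,
            y * math.cos(a) - z * math.sin(a),
            y * math.sin(a) + z * math.cos(a))


def lip_forward(v, amount):
    """Push a vertex forward along z (a stand-in lip protrusion)."""
    x, y, z = v
    return (x, y, z + amount)


lip = (0.0, -1.0, 1.0)
jaw_then_lip = lip_forward(jaw_open(lip, 20.0), 0.3)
lip_then_jaw = jaw_open(lip_forward(lip, 0.3), 20.0)
```

The two results differ because in the second ordering the protrusion itself gets rotated with the jaw, so a script has to fix the order in which procedures are applied.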
18
LIP AMA procedures
  • - human lips are very complex (they may take
    almost any shape)
  • - the goal of muscle simulation for the lip
    control is to provide the illusion of
    generating the same motion as human lips
    without imitating the complexity
  • → complex lip motions are decomposed into several
    simple motions (each simpler motion produced by
    an AMA procedure)
  • - Vertical Jaw → opening the mouth; only movable
    bone in the head; composed of a series of
    successive small motions
  • - Close_Upper_Lip and Close_Lower_Lip → close the
    lips when they are open; may be manipulated
    separately
  • - Left_Lip_Raiser and Right_Lip_Raiser → control
    the raising of the upper lip by the lip raiser
    muscle on the side of the nose
  • - Compressed_Lip → muscle used in kissing
  • - Mouth_Beak → lips out similarly to a bird beak
  • - Left_Zygomatic and Right_Zygomatic → muscle
    responsible for smiling; raises the commissure
    in the vertical direction
  • - Right_Risorius and Left_Risorius → pull the
    commissure in the horizontal direction
Two higher levels in order to improve the user
interface
The Expression Level
  • - facial expressions: groups of facial parameter
    values
  • 1. Phonemes: facial expression which only uses
    motion and directly contributes to speech
  • - combination of several mouth motions
    corresponding to specific sounds useful for
    speaking
  • - each phoneme corresponds to a lip motion and
    tongue position
  • 2. Emotions: facial expression acting on any
    part of the face (crying, smile, laughter...)
19
The Script Level
  • - a collection of multiple tracks (a track is a
    chronological sequence of key frames)
  • - one track per facial parameter (or AMA
    procedure) and two tracks for facial
    expressions
  • - a track for a facial parameter can at any time
    be modified and mixed with the facial
    expression
  • - animation itself is performed by spline
    interpolation
Constraints on human faces
  • 1. human face assumed to be approximately
    symmetric
  • → only half of the face is entered in the
    computer
  • → AMA procedures assume complete symmetry and
    thus results can be strange in asymmetrical
    cases
  • 2. AMA procedures are translation-independent,
    but can be scale-dependent
  • → parameters of the procedures may need to be
    scaled by a factor F when the face is scaled
    by F
  • 3. Division of the human face into specific
    regions (skin, teeth, etc.)
  • → the order of the parts is significant
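A script made of key-framed tracks can be sampled as in the sketch below. The slide only says "spline interpolation"; the uniform Catmull-Rom spline used here is an assumption, and the key frame times and values are made up. The track names are borrowed from the AMA procedure names above.

```python
from bisect import bisect_right


def catmull_rom(p0, p1, p2, p3, t):
    """Uniform Catmull-Rom spline between p1 and p2 for 0 <= t <= 1."""
    return 0.5 * ((2 * p1)
                  + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t ** 3)


def sample_track(track, time):
    """Evaluate a track (sorted list of (time, value) key frames) at `time`."""
    times = [k[0] for k in track]
    values = [k[1] for k in track]
    if time <= times[0]:
        return values[0]
    if time >= times[-1]:
        return values[-1]
    i = bisect_right(times, time) - 1          # times[i] <= time < times[i + 1]
    t = (time - times[i]) / (times[i + 1] - times[i])
    p0 = values[max(i - 1, 0)]                 # clamp neighbours at the track ends
    p3 = values[min(i + 2, len(values) - 1)]
    return catmull_rom(p0, values[i], values[i + 1], p3, t)


# One track per facial parameter; times in seconds, values are hypothetical.
script = {
    "Vertical_Jaw": [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)],
    "Left_Zygomatic": [(0.0, 0.0), (1.0, 0.8)],
}
frame = {name: sample_track(track, 0.25) for name, track in script.items()}
```

Sampling every track at the same time gives the full set of parameter values for one frame; the curve passes through each key frame value exactly, so editing one key frame changes only the neighbouring segments of that track.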
20
Script Example: Rendez-vous à Montréal
  • Humphrey Bogart: "Here's looking at you, kid."
  • Marilyn Monroe: "Oh, play it again, (Sam)."
Complete version at
http://www.miralab.unige.ch/newMIRA/Multimedia/films/FilmsroomMOVIES.htm
Animation of human faces based on AMA procedures is
part of the HUMAN FACTORY system
  • → system for directing synthetic actors in their
    environment
  • - complete animation of the human body and
    complex hand animation
Conclusion: The use of AMA procedures creates
fairly realistic results