Title: Dynamics of Gestures: Temporal Patterning
1. Dynamics of Gestures: Temporal Patterning
Work supported by NIH grant DC-03663
- Elliot Saltzman 
- Boston University  
- Haskins Laboratories
2. Colleagues
- Dani Byrd
- University of Southern California, USA
- Louis Goldstein
- Yale University and Haskins Laboratories, USA
- Hosung Nam
- Yale University and Haskins Laboratories, USA
3. Question: What is being learned when we learn a skilled behavior?
- Answer: the dynamical system, or coordinative structure, that shapes functional, coordinated activity defined across animal and environment
- But what is a dynamical system?
- Roughly, it is a system of interacting variables whose changes over time are shaped by laws or rules of motion
- What types of variables?
- What types of rules of motion?
4. System states, parameters, and graphs, and their dynamics
- Any dynamical system can be completely characterized according to three types of variables (state, parameter, and graph) and their dynamics (Farmer, 1986)
- State variables: a system's active degrees of freedom
- their number equals the number of autonomous 1st-order equations used to describe the system
- Ex) position and velocity of the mass in a damped mass-spring system
- Ex) activations of nodes in a connectionist network
- State dynamics: the forces (velocity vector field) defined in the space of state variables (state space) that shape the motion patterns of the state variables
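As a worked illustration of the state-variable count, the damped mass-spring example can be written as two autonomous first-order equations. The symbols x (position), v (velocity), and x_0 (rest or target position) are notation assumed here for illustration; m, b, and k are the mass, damping, and stiffness parameters.

```latex
\begin{aligned}
\dot{x} &= v \\
\dot{v} &= -\frac{b}{m}\,v - \frac{k}{m}\,(x - x_0)
\end{aligned}
```

Two first-order equations give two state variables, (x, v). The vector field (\dot{x}, \dot{v}) defined over the (x, v) plane is the state dynamics, while m, b, k, and x_0 are system parameters.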
5. System states, parameters, and graphs, and their dynamics (cont.)
- System parameters 
- Ex) m, b, k, and escapement strength in a limit 
 cycle equation
- Ex) target position in a point attractor equation 
- Ex) pendulum length 
- Ex) inter-node synaptic connection strength in a 
 connectionist network
- Parameter dynamics: the forces/processes that shape motion patterns of the system parameters
- Ex) intentional changes in oscillation frequency 
 in finger-wiggling experiment
- Ex) actor-environment field equation for 
 specifying target position in reaching
- Ex) changing system eigenfrequency due to 
 alteration of pendulum lengths in
 pendulum-swinging experiment
- Ex) connectionist learning algorithms for 
 changing system weights to solve a given
 computational task
6. System states, parameters, and graphs, and their dynamics (cont.)
- System graph: architecture of the system's equation of motion
- the parameterized set of relationships defined among a system's state variables
- Ex) circuit diagram (e.g., Simulink) representation of the m-b-k (mass, damping, stiffness) equation of motion
- Ex) node/connection diagram in a connectionist 
 network
7. System states, parameters, and graphs, and their dynamics (cont.)
- Graph dynamics: the forces/processes that change the system graph
- number of state variables (i.e., system dimensionality)
- Ex) recruitment/selection/assembly of degrees of 
 freedom appropriate for task in a particular
 actor-environment context
- e.g., recruitment of trunk leaning or body 
 twisting for reaching, depending on distance to
 target
- interconnection/linkage structure defined across 
 state variables
- Ex) learning/discovering appropriate interlimb oscillator coupling functions to perform bimanual m:n rhythms
- Ex) constructivist connectionist learning 
 algorithms that add/delete nodes and/or
 connections to implement grammar appropriate
 for learning given class of functions
8. Outline of Remaining Presentation
- Part 1: Overview and review of task-dynamic model of speech production
- Four types of timing phenomena: intragestural, transgestural, intergestural, and global
- Hybrid dynamical model: task dynamics + recurrent connectionist network
- Part 2: Focus on system graphs and intergestural timing/phasing in speech production
- Influence of system graph on patterns of relative timing between vowels and consonants in syllables
- Competitive, coupled oscillator model of syllable structure
- Task-dynamic model of intergestural phasing (Saltzman & Byrd, 2000)
9. Outline of Remaining Presentation (cont.)
- Part 3: State and/or parameter dynamics and transgestural timing
- Phrasal boundary effects on local speaking rate
- Prosodic gestures (p-gestures) induce local slowings of central clock
- Part 4: Intragestural timing: gestural anticipation intervals
- Self-organization of gestural onsets given 
 required times of target attainment
- Constrained temporal elasticity of anticipation 
 intervals
10. Part 1: Overview and Review
- General Theoretical Question 
- How can we characterize the dynamics that 
 underlie the temporal coordination among the
 units (gestures) of speech?
11. Dynamics Defined
- Dynamics
- Laws or rules that specify the forces that change a system's variables (system state) from one moment to the next
12. Speech Gestures
- Equivalence classes of goal-directed actions by different sets of articulators in the vocal tract
- Examples
- /p/, /b/, /m/: upper lip, lower lip, and jaw work together to close the lips.
- /a/, /o/: tongue body and jaw work together to position and shape the tongue dorsum (surface) for the vowel.
13. Articulatory Phonology (Catherine Browman and Louis Goldstein)
- Speech can be described with a unitary structure that captures both phonological and physical properties.
- The act of speaking can be decomposed into atomic units, or gestures.
- Units of information: linguistic primitives of speech production
- Units of action: dynamically controlled constriction actions of distinct vocal tract organs (e.g., lips, tongue tip, tongue body, velum, glottis)
- Gestures are coordinated into larger molecular structures
14. Four Aspects of Speech Timing
- Intragestural: variations of temporal patterns of individual gestures
- Ex. Temporal asymmetry of velocity profiles
- Intergestural: relative phasing among gestures
- Ex. Sequencing and partial temporal overlap (coproduction) of vowel and consonant gestures in the word (and syllable) /pub/
- Transgestural: modulations of temporal patterns of all active gestures during a relatively localized portion of an utterance
- Ex. Temporally localized slowing of all gestures in neighborhood of phrasal boundaries
- Global: temporal pattern of entire utterance
- Ex. Overall speaking rate or style
15. Overview: Hybrid Dynamical Model
- Modeling the dynamics of speech production: a hybrid dynamical model
- 2 components
- Task-dynamic component: shapes articulatory trajectories given gestural timing information as input. Uses tract-variable and model articulator coordinates.
- Recurrent neural network: provides a dynamics of gestural timing. Uses activation coordinates.
16. Tract Variable and Model Articulator Coordinates
17. Gestural Activation
- A gesture's dynamics influence vocal tract activity for a discrete interval of time.
- Activations wax and wane gradually at edges.
- A gesture's strength is defined by its activation level (range 0-1)
[Figure: gestural activations plotted against time; figure label: bad]
18. Gestures as Dynamical Systems
- Gestural activations are used to define gesture-specific control dynamics in goal/task space coordinates
- point attractor dynamics of damped mass-spring systems in the task space
- constriction space (tract variables): closing the lips, raising the tongue tip, etc.
- the constriction target is approached regardless of initial conditions or perturbations along the way
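The point-attractor claim can be illustrated with a minimal sketch: a unit-mass, critically damped mass-spring system integrated with Euler steps. The function name, the stiffness value, the step size, and the velocity "kick" used as a perturbation are all assumptions chosen for illustration; activation gating and the neutral gesture are left out.

```python
import math


def simulate_point_attractor(x0, v0, target, k=400.0, dt=0.001,
                             t_end=0.6, kick_at=None, kick=0.0):
    """Euler-integrate a critically damped, unit-mass spring toward `target`.

    x0, v0  : initial tract-variable position and velocity (hypothetical units)
    kick_at : optional time at which a one-off velocity perturbation is applied
    Returns the position at time t_end.
    """
    b = 2.0 * math.sqrt(k)  # critical damping for unit mass
    x, v = x0, v0
    steps = int(round(t_end / dt))
    kick_step = None if kick_at is None else int(round(kick_at / dt))
    for n in range(steps):
        if kick_step is not None and n == kick_step:
            v += kick  # perturbation along the way
        acc = -k * (x - target) - b * v  # spring force plus damping
        x, v = x + dt * v, v + dt * acc
    return x


# Different starting states, and a perturbed run, all end near the target.
for start in [(-1.0, 0.0), (0.0, 3.0), (2.5, -1.0)]:
    print(start, round(simulate_point_attractor(*start, target=1.0), 3))
print("perturbed", round(
    simulate_point_attractor(0.0, 0.0, 1.0, kick_at=0.1, kick=5.0), 3))
```

Critical damping is used here so that the trajectory approaches the target without oscillating; other damping ratios would still converge, only with a different approach profile.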
19. Gestural Equation of Motion
Total gestural acceleration is the sum of the constriction gesture and neutral gesture acceleration components.
- Constriction gesture
- Neutral gesture (governs return to neutral posture)
20. Hybrid Model: Three Coordinate Systems
21. Hybrid Dynamical Model: Overall Structure
22. Part 2: Intergestural Timing, System Graphs, and Syllable Structure
- Phenomenon: vowel and consonant gestures within syllables show characteristic signatures of relative timing/phasing
- We hypothesized that these different patterns 
 were due to corresponding differences in
 intergestural coupling graphs
- coupling graphs were implemented in simulations 
- simulations were compared with actual data
23. Syllable Structure: Some Definitions
- The vowel and consonant gestures in a syllable can be partitioned into three components: Onset, Nucleus, and Coda
24. Relative Timing in Syllables
- There is an asymmetry in patterns of relative timing displayed within syllable-initial (onset) and syllable-final (coda) consonant clusters
- C-center effect on mean values of intergestural relative phase
- the c-center pattern occurs syllable-initially in onsets but not syllable-finally in codas
- Browman & Goldstein (1988), Byrd (1995)
- Stability of relative phasing
- Greater stability (lower standard deviation) of relative phasing occurs syllable-initially in onsets than syllable-finally in codas
- Byrd (1996), Cho (2001)
- Both effects are hypothesized to emerge from appropriate dynamic coordination of gestures viewed in an oscillatory framework
25. C-center Effect in Onsets, not Codas
Hypothetical model (figure annotations):
- C-V phasing
- What if an additional coordination (C-C phasing) is added?
- CV and CC phasings are in competition
- C-C phasing separates the two consonants in timing
- But C-V phasing is preserved as a global c-center-to-V coordination
26. Why C-center Effect in Onsets and not Codas?
- Browman & Goldstein's (2000) hypothesis
- there are different coupling structures (system graphs) for onsets (C1,o C2,o V) and codas (V C1,c C2,c)
- there is C1,o-V coupling in onsets, but there is no V-C2,c coordination (coupling) in codas
- as a result, there is competition between CV and CC phasings for onsets, but not for codas
27. Proposed Coupling Graphs: CCV vs. VCC
- CCV: competitive coupling structure
- VCC: no V-C2 coordination, so no competition
28. Stability of Relative Phasing
- Browman & Goldstein (2000) additionally hypothesized that
- competitive coupling structures in syllable-initial position may also help explain the greater stability of intergestural phasing in onsets than in codas
29. Outline of Simulation Experiments
- C-center effect in CCV but not VCC? 
- Greater stability (lower variability) between 
 consonants in CCV than VCC?
- Effect of syllable boundary in heterosyllabic CC 
 sequences
30. What do Oscillators Have to do with Speech?
- Oscillatory units have a well-defined variable representing time: phase
- the dynamics of coupled limit cycle oscillators allow their relative timing to emerge in a self-organized manner due to intrinsic oscillator dynamics and the nature of the coupling.
- the best developed theories of inter-unit timing 
 come from work in (non-speech) rhythmic movement
31. What do Oscillators Have to do with Speech? (cont.)
- Phase has also been adopted as a measure of intrinsic gestural time in speech gestures (Browman & Goldstein; Kröger et al.)
- although point attractor models have been used to 
 model these gestures, intrinsic gestural phase
 has been defined relative to an associated
 abstract, underlying gestural oscillator
- Previously, the coordination of gestures in terms 
 of their relative phase has been specified by
 hand in models of word production
- we have been pursuing a model of speech timing 
 that allows relative phasing to self-organize as
 it does in oscillatory systems
32. Task-dynamics of Intergestural Phasing
- We assume that rhythmic and non-rhythmic speech behavior have a common underlying dynamical organization
- here, we attempt to reconcile work in coupled oscillator dynamics and intergestural timing in speech.
- Saltzman & Byrd (2000) implemented a task-dynamic approach to controlling (generalized) relative phase and (m:n) frequency ratio in a single pair of coupled nonlinear oscillators
- For a pair of oscillators in 1:1 frequency locking
- the component oscillators must be coupled to one another in a manner specific to the desired relative phasing
- We have generalized the Saltzman & Byrd (2000) model to implement intergestural coupling among multiple (>2) gestures (Nam, Saltzman, & Goldstein, 2003)
33. Control of Relative Phase: General Approach
- Intergestural coupling is defined in a pairwise manner among a set of oscillators in three steps
- 1st: define a set of task-space potential functions, $V(\psi)$
- the state variable represents relative phase ($\psi = \phi_i - \phi_j$)
- the point minimum corresponds to the desired relative phase value, $\psi_0$
- 2nd: define the corresponding task-space (relative phase) dynamics
- 3rd: transform these dynamics into the required coupling forces between the component oscillators
- see Saltzman & Byrd (2000) for details
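The three steps can be written out for a single pair of oscillators as a minimal worked form. It assumes the cosine potential used later in Simulation Experiment 2 (with the detuning term set to zero), first-order gradient dynamics for relative phase, and an equal split of the task-space force between the two oscillators. The gradient dynamics, the equal split, and the natural-frequency symbols \omega_i and \omega_j are assumptions made here for illustration, not the formulation given in Saltzman & Byrd (2000).

```latex
\begin{aligned}
\text{1st:}\quad V(\psi) &= -a\cos(\psi - \psi_0), \qquad \psi = \phi_i - \phi_j \\
\text{2nd:}\quad \dot{\psi} &= -\frac{dV}{d\psi} = -a\sin(\psi - \psi_0) \\
\text{3rd:}\quad \dot{\phi}_i &= \omega_i - \tfrac{a}{2}\sin(\psi - \psi_0), \qquad
\dot{\phi}_j = \omega_j + \tfrac{a}{2}\sin(\psi - \psi_0)
\end{aligned}
```

Subtracting the two phase equations gives \dot{\psi} = (\omega_i - \omega_j) - a\sin(\psi - \psi_0), which reduces to the second line under 1:1 frequency locking with \omega_i = \omega_j, so the relative phase settles at \psi_0.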
34. Simulation Experiment 1: C-center effect in CCV (competition)
- Target relative phase
- C1-V: 50°
- C2-V: 50°
- C1-C2: 30°
- Resultant relative phase (final output)
- C1-V: 59.94°
- C2-V: 39.96°
- C1-C2: 19.98°
[Figure: timing of C1, C2, and V with the c-centers and the mean of the c-centers marked, showing the c-center effect]
35. Simulation Experiment 1: No C-center effect in VCC (no competition)
- Target relative phase
- V-C1: 50°
- V-C2: none
- C1-C2: 30°
- Resultant relative phase (final output)
- V-C1: 49.96°
- V-C2: 79.90°
- C1-C2: 29.94°
[Figure: timing of V, C1, and C2 with the c-center and the mean of the c-centers marked, showing no c-center effect]
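The competition account in Simulation Experiment 1 can be sketched with a toy phase model: each edge of the coupling graph contributes a cosine potential with its own target, and the phases follow the gradient of the summed potentials. This is a simplification assumed here for illustration (pure phase variables, equal coupling strengths, common frequency removed), not the authors' limit-cycle oscillator implementation; the function names are hypothetical. The targets are the 50, 50, and 30 degree values listed for CCV, and 50 and 30 degrees with no V-C2 edge for VCC.

```python
import math


def settle_phases(edges, n_osc, a=1.0, dt=0.01, steps=20000):
    """Gradient descent on summed potentials -a*cos(psi - psi0) over a graph.

    edges: list of (i, j, target_deg) with relative phase psi = phase[j] - phase[i].
    Returns the settled oscillator phases in radians.
    """
    phase = [0.0] * n_osc
    for _ in range(steps):
        rate = [0.0] * n_osc
        for i, j, target_deg in edges:
            pull = a * math.sin(phase[j] - phase[i] - math.radians(target_deg))
            rate[i] += pull  # equal and opposite forces on the coupled pair
            rate[j] -= pull
        phase = [p + dt * r for p, r in zip(phase, rate)]
    return phase


def rel_phase_deg(phase, i, j):
    """Relative phase of oscillator j with respect to oscillator i, in degrees."""
    return math.degrees(phase[j] - phase[i])


C1, C2, V = 0, 1, 2
# Onset (CCV): both consonants coupled to V and to each other (competition).
ccv = settle_phases([(C1, V, 50), (C2, V, 50), (C1, C2, 30)], 3)
# Coda (VCC): V-C1 and C1-C2 coupling only, no V-C2 edge (no competition).
vcc = settle_phases([(V, C1, 50), (C1, C2, 30)], 3)

print("CCV:", [round(rel_phase_deg(ccv, *p), 2) for p in [(C1, V), (C2, V), (C1, C2)]])
print("VCC:", [round(rel_phase_deg(vcc, *p), 2) for p in [(V, C1), (V, C2), (C1, C2)]])
```

In this toy version the onset graph settles near 60, 40, and 20 degrees, so the mean of the two C-V phasings stays at the 50 degree target while neither consonant reaches it individually. The coda graph settles at its targets, with V-C2 simply following as the sum of the other two. Both outcomes are close to the resultant values reported for the simulations.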
36. Adding noise: Simulation Experiment 2
- Source of noise
- slight differences in frequencies of oscillators (detuning)
- Noise modeled by adding a linear function to the potential energy function
- $V(\psi) = -a\cos(\psi - \psi_0) + b\,(\psi - \psi_0)$
- b represents the amount of inter-oscillator detuning, which perturbs the location of the potential minimum
- b was randomly varied across simulation trials within conditions defined by a given standard deviation
- the standard deviation of b was manipulated across simulation conditions
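How detuning perturbs the potential minimum can be sketched for a single oscillator pair. Setting the derivative of the tilted cosine potential to zero, a sin(psi - psi0) + b = 0, places the minimum at psi0 - asin(b/a) whenever |b| < a. The code below samples b from a zero-mean normal distribution and reports the spread of the shifted minimum. The function names, the coupling strength a = 1, the 30 degree target, and the choice to discard draws with |b| >= a (where no fixed point exists) are assumptions for illustration only.

```python
import math
import random
import statistics


def shifted_minimum(psi0, a, b):
    """Minimum of V(psi) = -a*cos(psi - psi0) + b*(psi - psi0), for |b| < a."""
    if abs(b) >= a:
        raise ValueError("no potential minimum when |b| >= a")
    return psi0 - math.asin(b / a)


def phase_std(sd_b, n_trials=2000, a=1.0, psi0=math.radians(30.0), seed=0):
    """Std. dev. (radians) of the potential minimum across random detunings."""
    rng = random.Random(seed)
    minima = []
    while len(minima) < n_trials:
        b = rng.gauss(0.0, sd_b)
        if abs(b) < a:  # keep only trials where a fixed point exists
            minima.append(shifted_minimum(psi0, a, b))
    return statistics.pstdev(minima)


for sd in (0.05, 0.25, 0.45):
    print(f"std of b = {sd:.2f} -> std of relative phase = {phase_std(sd):.3f} rad")
```

This single-pair sketch only shows that a larger spread of detuning produces a larger spread of relative phase; comparing onsets with codas additionally requires the full coupling graphs.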
37. Results: Simulation Experiment 2
- Interconsonant phasing is more variable in syllable-final position
[Figure: standard deviation of CC phase (radians) for onsets and codas, plotted against the standard deviation of detuning b (0.05 to 0.85)]
- Browman & Goldstein's hypothesis was supported
- Onsets in competition show greater stability
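38. Simulation Experiment 3: Generalizing the Model to Hetero-Syllabic Consonant Sequences
- Onset cluster, e.g. a scab
- Coda cluster, e.g. mask amp
- Cross-boundary sequence, e.g. bag sab
39. Results: Simulation Experiment 3
- C-to-C phasing is more variable across boundaries
[Figure: standard deviation of CC phase (radians) for onsets, codas, and cross-boundary (X-bound) sequences, plotted against the standard deviation of detuning b (0.05 to 0.85)]
- The result (variability ordering V#CCV < VCC#V < VC#CV, with # marking the syllable boundary) corresponds to Byrd's (1994) findings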
40. Conclusion: Importance of System Graph
- Dynamic structure (the system graph's coupling structure) generates observed phonetic asymmetries of intergestural phasing (mean patterns and their stability)
- Competitive coupling structure in onsets:
- C-center effect (mean relative phasing)
- Greater temporal stability
- Consonants not directly coupled across boundaries:
- Effect of boundaries (greater variability)
41. Future Directions: Where are the Underlying Oscillators?
- Hypothesis: underlying oscillators live at the state-unit level of the hybrid model's recurrent network as members of an entrained oscillatory ensemble
- Question: Is there a 1:1 association between oscillators and gestures?
- Question: How are the mappings learned between oscillators and gestural activations?
42. Part 3: Transgestural Effects of Phrasal Boundaries
- It has been shown that prosodic boundaries induce 
 temporally local contextual variation in ongoing
 articulation
- prosodic boundaries are boundaries between words 
 and higher order phrases in speech
- Boundary effects on articulation include 
- lengthening of gestural durations 
- decreased overlap (coarticulation) between 
 adjacent gestures
- spatially larger gestures in phrase-initial 
 positions
- Boundary effects appear to be graded 
- stronger boundaries induce greater lengthening
43. Boundary Adjacent Slowing
- It has been shown that speech gestural durations lengthen in the region of word and phrase boundaries
- It also appears that stronger boundaries induce greater lengthening
- Example (Byrd & Saltzman 1998)
44. Boundary Adjacent Slowing (Byrd & Saltzman 1998)
45. Boundary Adjacent Slowing (Byrd & Saltzman 1998)
[Figure: pre-boundary lip opening duration and post-boundary lip closing duration (ms, axis 0 to 300) by boundary type (none, word, list, vocative, utterance) for Speaker J and Speaker K]
46. Boundary Adjacent Relative Timing
- Additionally, evidence exists suggesting that phrase boundaries affect the relative timing (i.e., overlap) between gestures.
- Chitoran, Goldstein, & Byrd (to appear); Byrd (1996); Hardcastle (1985); Byrd, Kaun, Narayanan, & Saltzman (2000); Jun (1993); Keating et al. (in press)
[Figure: time between displacement extrema in CC]
47. Approach: Prosodic (p-)gestures
- Question: How can we account for the variations of gestural timing associated with prosodic context?
- p-gestures (prosodic gestures) influence the expression of all constriction gestures that are concurrently active with the p-gestures
- Transgestural effect
- The effect is in proportion to the activation level of the p-gesture.
- p-gesture activation is determined by boundary strength.
Byrd, Kaun, Narayanan, & Saltzman (2000); Byrd (2000); Byrd & Saltzman (subm)
48. Two constrictions spanning a phrase boundary
49. How is this Prosodic Action Effected? Parameter Dynamics: Stiffness Lowering
- Lowering of gestural stiffness values has been hypothesized to underlie gestural lengthening adjacent to phrasal boundaries.
- Beckman et al. 1992, Byrd & Saltzman 1997
- Local, transgestural on-line modulation of gestural parameter values.
- E.g., locally lower stiffness yields local slowing
50. But...
- Changes in both duration and relative timing occur at phrase boundaries.
- Stiffness scaling does not account for changes in relative timing.
- It modulates point-attractor parameter values, but does not specifically influence the domain of gestural activation.
51. How is this Prosodic Action Effected? Central Clock Slowing
- Hypothesis: prosodic effects are induced by time slowing at the gestural control level.
- slowing the timecourse of gestural activation (Byrd & Saltzman, subm)
- Slowing the central clock has both intragestural and intergestural timing consequences.
- Related work: V.-Bateson, Hirayama, Honda, & Kawato, 1992; Bailly, Laboissière, & Schwartz, 1991; O'Dell & Nieminen, 1999; and especially Port & Cummins, 1992, and Barbosa & Bailly, 1994
52. Gestural Activation
53. Slowing Activation Timecourse
[Figure: gestural activation (0 to 1) plotted against time (0 to 0.25), comparing the activation with no time slowing and the activation stretched with time slowing]
Equation for time scaling/stretching/slowing:
- $\tau$ is scaled time,
- t is unscaled time, whose flow rate = 1, and
- $a(\tau)$, the gestural activations (constriction and p-gestures), are functions of scaled time.
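The idea of time slowing can be sketched numerically. The clock law used below, d(tau)/dt = 1 - alpha * a_p(tau), is an assumed form chosen for illustration and is not taken from the slide; alpha (slowing strength), the boxcar p-gesture activation, and all interval values are likewise hypothetical. The sketch measures how long a gesture's activation interval, fixed in scaled time, lasts in unscaled time.

```python
def real_time_at(tau_target, alpha, p_on, p_off, dt=1e-4):
    """Unscaled time t at which the scaled clock tau first reaches tau_target.

    Assumed clock law: dtau/dt = 1 - alpha * a_p(tau), where the p-gesture
    activation a_p is 1 for p_on <= tau < p_off and 0 otherwise.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1) so the clock keeps advancing")
    t = tau = 0.0
    while tau < tau_target:
        a_p = 1.0 if p_on <= tau < p_off else 0.0
        tau += dt * (1.0 - alpha * a_p)  # scaled clock runs slow under the p-gesture
        t += dt                          # unscaled time always flows at rate 1
    return t


def gesture_duration(g_on, g_off, alpha, p_on=0.10, p_off=0.20):
    """Unscaled duration of a gesture active for g_on <= tau < g_off."""
    return (real_time_at(g_off, alpha, p_on, p_off)
            - real_time_at(g_on, alpha, p_on, p_off))


print("no slowing:  ", round(gesture_duration(0.05, 0.15, alpha=0.0), 3))
print("with slowing:", round(gesture_duration(0.05, 0.15, alpha=0.5), 3))
```

Because every activation is a function of the same scaled clock, all gestures overlapping the p-gesture are stretched together, which is what makes the effect transgestural.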
54. Simulation data: No p-gesture
[Figure: three panels (activation, position, velocity) plotted against time (0 to 0.25) for GESTURE 1 and GESTURE 2, with the durations of gesture 1 and gesture 2 marked]
55. Simulation: p-gesture realized via clock slowing
[Figure: activation and position panels (faint: unslowed; bold: slowed) plotted against time (0 to 0.25) for GESTURE 1 (phrase-final), GESTURE 2 (phrase-initial), and the p-gesture]
56. Initial Strengthening
- Initial strengthening: apparently spatially larger gestures in phrase-initial positions.
- E.g., more linguapalatal contact in lingual consonants; longer linguapalatal seal durations; longer VOTs (Keating, Jun, Fougeron, Cho, Hsu, others); more breathy /h/s (Pierrehumbert & Talkin, 1992); more lip rounding in rounded vowels (van Lieshout et al., 1995)
- BUT what is the articulatory foundation for these very different types of effects?
Can we unite slowing, lesser overlap, and strengthening in terms of articulatory dynamics, specifically clock slowing?
57. Simulation: Clock slowing with two (same constriction) phrase-initial gestures
- Gesture 1: closing (e.g. lingual C)
- Gesture 2: opening (e.g. following V)
- Measures: Gesture 1 duration, Gesture 2 duration, time between peak velocities, spatial strengthening (phrase-initial)
[Figure: activation and position panels (faint: unslowed; bold: slowed) plotted against time (0 to 0.25) for gesture 1 (consonant), gesture 2 (vowel), and the p-gesture, with a reference line for plausible linguapalatal contact]
58. Summary: p-gestures
- Local slowing of a central clock appears to be a plausible way to capture prosodically driven shaping of articulatory behavior.
- Unlike stiffness modulation, which affects only gestural durations, clock rate modulation generates several experimentally observed prosodic effects:
- gestural lengthening
- reduced intergestural overlap
- spatial strengthening
59. Theoretical Implications of Prosodic Gestures
- First step in conceiving a dynamical implementation of phrasal structure.
- Just like articulatory gestures, phrasal junctures are viewed as
- having inherent durational properties
- being temporally coordinated with other gestures
- Provides a theoretical reconciliation of what in the past has been an inconsistency in the manner in which prosodic structure and segmental structure have been conceptualized in Articulatory Phonology (Browman & Goldstein, 1992 and elsewhere).
60. Part 4: Anticipatory Behavior of Speech Gestures
- Question
- When does gestural motion begin relative to its required time of target attainment in an utterance?
- Answer: controversial
- Look-ahead model: as early as possible given no other conflicting demands
- Frame model: time-locked to the time of target attainment
61. Intragestural Effects: Gestural Anticipation Intervals
- Intragestural shaping of gestural anticipation intervals
- Self-organization of gestural onsets given required times of target attainment
- Emergent behavior from a bidirectionally coupled set of dynamical systems
- Activation dynamics (recurrent neural network)
- Primary responsibility: shaping gestural activation patterns
- Acts as sequence-specific central controller (clock, central pattern generator)
- Drives task-dynamic model (feedforward)
- Interarticulator coordination dynamics (task dynamics)
- Primary responsibility: shaping articulator trajectories
- Ongoing state modulates recurrent controller (feedback)
62. Architecture of a Simple Hybrid Model
[Figure legend: task-dynamic elements; sequential network elements; inter-element synapses; labeled delay lines; numbers and symbols denote fixed weights assigned to some synapses]
63. Network Training: Side Constraints and Interval Types
- Network training/programming.
- backpropagation-in-time; distal supervised learning
- Two constraint types during training
- Task constraints: specific to the current task, e.g., reach target at a specified time
- Side constraints: generic constraints, e.g., maximize smoothness, minimize effort, etc.
- Two types of training interval
- Care: task and side constraints
- Don't care: only side constraints
- We used a side constraint that minimized gestural activation.
64. Anticipatory Behavior: Effect of Side Constraints
[Figure: activation level and tract variable position across the don't-care and care intervals, in two columns]
- Left column: look-ahead behavior occurs when side constraints are absent; gestural onset occurs near the beginning of the don't-care interval, regardless of its length.
- Right column: frame model behavior occurs when side constraints are present; regardless of the don't-care interval's length, gestural onsets are approximately time-locked to the care interval.
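The two extremes can be reproduced with a toy cost function in place of network training. The sketch assumes a critically damped point attractor whose squared distance from the target at the required time is the task error, and an effort term proportional to how long the gesture is active (a stand-in for the activation-minimizing side constraint). The function name, the natural frequency `omega`, the weights, and the grid search are all hypothetical and replace the recurrent network and its learning algorithm entirely.

```python
import math


def best_anticipation(dont_care_len, effort_weight, omega=20.0, step=0.001):
    """Anticipation interval (onset-to-target time) minimizing error + effort.

    dont_care_len : length of the don't-care interval; onset cannot precede it
    effort_weight : weight of the side constraint (0 means absent)
    """
    best_d, best_cost = 0.0, float("inf")
    n = int(round(dont_care_len / step))
    for i in range(n + 1):
        d = i * step
        # Squared residual distance to target after moving for time d
        # (unit-distance step response of a critically damped system).
        miss = ((1.0 + omega * d) * math.exp(-omega * d)) ** 2
        cost = miss + effort_weight * d  # task error plus activation effort
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


for length in (0.3, 0.6):
    print(f"don't-care {length}: "
          f"no side constraint -> {best_anticipation(length, 0.0):.3f}, "
          f"with side constraint -> {best_anticipation(length, 0.5):.3f}")
```

With no effort term the best onset sits at the start of the don't-care interval, so anticipation grows with its length (look-ahead). With the effort term the anticipation interval stays fixed relative to target attainment (frame-like). This toy covers only the two extremes and does not model fractional lengthening.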
65. Constrained Temporal Elasticity in Speech
- Data on anticipatory lip protrusion in French speakers (e.g., Abry & Lallouache, 1995) suggest that anticipatory behavior may be neither rigidly time-locked nor totally unconstrained. This suggests a constrained temporal elasticity, intermediate between these two extremes.
- Abry & Lallouache's Movement Expansion Model: a gesture's anticipatory interval lengthens as the preceding don't-care interval lengthens, but only fractionally. Different speakers show different lengthening fractions.
- We generated temporally elastic behavior using intermediate values of side constraints.
66. Constrained Elasticity in the Hybrid Network
67. Constrained Elasticity: Lengthening Fractions