Motor Control Strategies for Chinese Intonation

Slides: 25

Provided by: gregkoc

Learn more at: https://kochanski.org

Transcript and Presenter's Notes

1
Motor Control Strategies for Chinese Intonation

Greg Kochanski (University of Oxford, UK)
Chilin Shih (University of Illinois, Urbana-Champaign)
Tan Lee (Chinese University of Hong Kong)
Hongyan Jing (IBM)

2
http://kochanski.org/gpk
3

The Goal
Explain intonation in a way that is:
Consistent with linguistic assumptions.
Consistent with known physiology and neuroscience.
The Method
Motion planning over a phrase.
Minimize the sum of:
The error between actual pitch and linguistic target.
An effort cost term that penalizes rapid, jerky motions.
The Result
Intonation in tone languages can be represented by:
A lexically specified tone template (i.e., you use a dictionary to look up which tone a syllable has).
A continuous cost-of-error parameter, one per word.
Evidence that the costs of misinterpretation we measure are real:
Cross-language similarities
Metrical patterns
Other

4
The Challenge
5

Tone languages provide the ideal test case for motor control strategies:
tone is important, and
you can be sure what the speaker is trying to accomplish.
The meaning of each syllable is determined by the pitch contour over the syllable.
Ma (high tone): mother
Ma (rising tone): hemp
Ma (low falling tone): horse
Ma (high falling tone): to scold
You can look up the tone in the dictionary.
Pitch contour is determined primarily by muscle tension in the vocal folds.

6
Another Challenge
[Figure: F0 (Hz) versus time (10 ms intervals), with typical tone shapes shown in green.]
7
People talk nearly as fast as possible, therefore dynamics must be important.
Pitch (f0) for a maximum-rate warble.
Conversational Mandarin on the same time scale.
8
The Data

Male speaker of Mandarin (Chinese)
Female speaker of Cantonese (Chinese)
Text from newspaper news stories.
737 syllables for Mandarin
4 ± 1.4 syllables per second
1.2 ± 0.7 seconds per phrase (between pauses).
Segmented into words by three independent native speakers (Mandarin)
Tracks of fundamental frequency vs. time (pitch) extracted by get_f0 from the ESPS/Waves package.

9
Basic assumptions used in modeling

People plan their utterances several syllables in advance.
People produce optimal, highly practiced speech.
Most of what we say is made from bits and pieces we've said before.
There are only 4 (Mandarin) or 6 (Cantonese) tones to combine.
A speaker has the chance to practice and optimize all the common 3- and 4-tone sequences.
A simple model for f0 (pitch): f0 is linearly related to muscle tensions.
A simple model of the muscle control strategy.
No reason to believe pitch is controlled differently from other muscle motions.

10
Optimize what?

People want to minimize the chance that they will be significantly misunderstood. Some words will be more important than others.
Risk = P(misinterpreted) × cost-of-misinterpretation
Perhaps weight matches importance.
People want to minimize effort and/or talk faster.
Chairs, Cars
How to combine the two?
A weighted sum.
Cost-of-misinterpretation plays the role of the weight.

11
What is the unit of motion planning? Probably a phrase or a sentence.
(Data courtesy Chilin Shih)
People start at a higher pitch when they begin longer sentences. They also plan the volume of inhaled air. Therefore, some plan exists 300 ms before the start of speech.
12
Modeling math
p is the realized pitch.
We're optimizing something.
p is implicitly a function of time.
R is the total risk for the utterance, r_i is the error of the ith target, and s_i is the cost if this particular word is misinterpreted.
(The expression for r_i is an approximation; see elsewhere for the correct, more detailed equation.)
y(t) is the pitch of a point in the ith target. The time-dependence is suppressed for clarity.
13
Modeling math: more detail.
The cost of a misinterpretation of the ith syllable.
Total risk for the utterance.
Alpha (α) controls how much the shape of the pitch contour matters.
Beta (β) controls how much the average pitch of the syllable matters.
Where r_i is the error of the ith target.
y is the pitch of a point in the ith target.
A bar denotes an average over a target.
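
The equations from the two modeling slides did not survive in this transcript. One form that agrees with the symbol definitions above is sketched here. It is an assumed reconstruction, not the slides' own formula: in particular, whether s_i enters linearly or squared, and the use of sums over the time samples t of each target, are guesses.

```latex
% Assumed reconstruction from the symbol definitions; not copied from the slides.
\begin{align}
  E   &= G + R, \\
  R   &= \sum_i s_i \, r_i, \\
  r_i &= \alpha \sum_{t \in i} \bigl[(p_t - \bar{p}_i) - (y_t - \bar{y}_i)\bigr]^2
         + \beta \, (\bar{p}_i - \bar{y}_i)^2 .
\end{align}
```

Here E is the total quantity being minimized, G is the effort, the first term of r_i compares the shape of the realized pitch with the shape of the target, and the second term compares their averages.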
14
Effort
How does G depend on the form of the pitch curve? Large effort implies a curve with larger slopes and sharper corners, i.e. a wigglier curve.
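
As a rough illustration of an effort term that grows with slope and with sharp corners, the sketch below sums squared first and second differences of a sampled pitch curve. The function name `effort`, the `curvature_weight` parameter, and the exact quadratic form are assumptions made for this example; the slide does not give the formula.

```python
def effort(pitch, curvature_weight=1.0):
    """Assumed effort measure: squared slopes plus weighted squared curvatures.

    `pitch` is a list of f0 samples at equal time steps.
    """
    slopes = [b - a for a, b in zip(pitch, pitch[1:])]
    curvatures = [b - a for a, b in zip(slopes, slopes[1:])]
    slope_cost = sum(d * d for d in slopes)
    corner_cost = sum(c * c for c in curvatures)
    return slope_cost + curvature_weight * corner_cost


smooth = [100.0, 105.0, 110.0, 115.0, 120.0]  # steady rise (Hz)
wiggly = [100.0, 120.0, 100.0, 120.0, 100.0]  # rapid alternation (Hz)

print("smooth:", effort(smooth))
print("wiggly:", effort(wiggly))
```

The wiggly curve scores far higher than the smooth one, which is the qualitative behavior the slide describes.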
15
Model behavior

For s ≫ 1, Error (R) dominates, and pitch matches target.
For s ≪ 1, Effort (G) dominates, both speaker and listener accept large deviations, and pitch smoothly interpolates.
For s ≈ 1, everything compromises.
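
The three regimes can be reproduced with a deliberately simplified version of the model. The sketch below assumes a single weight s shared by all samples and an effort term made only of squared slopes (no corner penalty); both are simplifications introduced here, and `plan_pitch` is a hypothetical name. With those assumptions the minimum of s·Σ(p − y)² + Σ(p[t+1] − p[t])² is the solution of a tridiagonal linear system.

```python
def plan_pitch(targets, s):
    """Minimise s*sum((p - y)^2) + sum((p[t+1] - p[t])^2) over the curve p.

    Setting the gradient to zero gives a tridiagonal system with
    off-diagonal entries of -1, solved here with the Thomas algorithm.
    """
    n = len(targets)
    if n == 1:
        return [targets[0]]
    # End points have one neighbour, interior points have two.
    diag = [s + (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    rhs = [s * y for y in targets]
    # Forward elimination.
    for i in range(1, n):
        w = -1.0 / diag[i - 1]
        diag[i] += w
        rhs[i] -= w * rhs[i - 1]
    # Back substitution.
    p = [0.0] * n
    p[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        p[i] = (rhs[i] + p[i + 1]) / diag[i]
    return p


targets = [100.0, 140.0, 100.0, 140.0]  # alternating low/high targets (Hz)
for s in (1e4, 1.0, 1e-3):
    curve = plan_pitch(targets, s)
    print(s, [round(v, 2) for v in curve])
```

With a large s the curve tracks the targets closely, with a small s it flattens toward their average, and near s = 1 it lands in between.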

16
The rest of the model

A model is a sequence of targets.
The type of the target (tone 1, tone 2, …) is looked up in a dictionary.
Each target has a cost-of-misinterpretation.
The cost is adjustable for each word.
Syllables within a word are derived from word cost via the metrical pattern for words of a certain length.
One target per tone.
Targets are stretched to fit syllable duration.
Only one phonological rule: 33 → 23
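
The steps listed on this slide (dictionary lookup, the 33 → 23 rule, one target per tone, stretching to the syllable duration) can be sketched as follows. The template shapes, the two-entry lexicon, and all function names are hypothetical and chosen only for illustration; the sandhi handling is a simple pairwise pass and does not attempt longer runs of tone 3 correctly.

```python
# Hypothetical tone templates: relative pitch at evenly spaced points.
TONE_TEMPLATES = {
    1: [1.0, 1.0, 1.0],  # high
    2: [0.3, 0.6, 1.0],  # rising
    3: [0.3, 0.0, 0.2],  # low
    4: [1.0, 0.5, 0.0],  # falling
}

# Hypothetical lexicon: word -> tone of each syllable.
LEXICON = {"nihao": [3, 3], "zhongguo": [1, 2]}


def apply_sandhi(tones):
    """Apply the single phonological rule 3 3 -> 2 3 to adjacent pairs."""
    out = list(tones)
    for i in range(len(tones) - 1):
        if tones[i] == 3 and tones[i + 1] == 3:
            out[i] = 2
    return out


def stretch(template, n_frames):
    """Linearly resample a template to n_frames points (n_frames >= 2)."""
    last = len(template) - 1
    out = []
    for k in range(n_frames):
        x = k * last / (n_frames - 1)  # position in template coordinates
        i = min(int(x), last - 1)
        frac = x - i
        out.append(template[i] * (1.0 - frac) + template[i + 1] * frac)
    return out


def build_targets(word, frames_per_syllable):
    """Return one stretched target per syllable of `word`."""
    tones = apply_sandhi(LEXICON[word])
    return [stretch(TONE_TEMPLATES[t], n) for t, n in zip(tones, frames_per_syllable)]


for target in build_targets("nihao", [5, 8]):
    print([round(v, 2) for v in target])
```

Each syllable gets exactly one target, and the same template serves short and long syllables because it is resampled to the syllable's frame count.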

17
What's the procedure?
Sequence of tones (phonology)
Data
Compute the pitch curve as a function of phonological inputs and the cost of a misinterpretation.
Predicted F0
Costs of misinterpretations
Nonlinear least-squares fitting algorithm
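
The fitting loop can be illustrated on a toy problem. The slide names a nonlinear least-squares algorithm over many parameters; the sketch below swaps in a one-dimensional golden-section search over a single cost s, and uses a simplified pitch model (squared-slope effort only) with synthetic data. `predict_f0`, `fit_cost`, and the search bounds are all assumptions made for this example.

```python
import math


def predict_f0(targets, s):
    """Simplified model: minimise s*sum((p - y)^2) + sum of squared slopes."""
    n = len(targets)
    diag = [s + (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    rhs = [s * y for y in targets]
    for i in range(1, n):  # forward elimination of the tridiagonal system
        w = -1.0 / diag[i - 1]
        diag[i] += w
        rhs[i] -= w * rhs[i - 1]
    p = [0.0] * n
    p[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        p[i] = (rhs[i] + p[i + 1]) / diag[i]
    return p


def sum_sq_error(log_s, targets, observed):
    """Squared misfit between predicted and observed f0 for a given log10(s)."""
    predicted = predict_f0(targets, 10.0 ** log_s)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def fit_cost(targets, observed, lo=-3.0, hi=3.0, iterations=60):
    """Golden-section search for the log10(s) that minimises the misfit."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iterations):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if sum_sq_error(c, targets, observed) < sum_sq_error(d, targets, observed):
            b = d
        else:
            a = c
    return 10.0 ** ((a + b) / 2.0)


targets = [100.0, 140.0, 100.0, 140.0, 120.0]
observed = predict_f0(targets, 2.5)  # synthetic "data" with a known cost
print("recovered s:", fit_cost(targets, observed))
```

Searching in log10(s) keeps very small and very large costs equally reachable, which suits a parameter that is later plotted on a log scale.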
18
Model fits for Mandarin Chinese
Tone class (input)
Cost-of-misinterpretation (result)
Inside a word, the cost of a misinterpretation is distributed by the metrical pattern.
19
Model fits to Mandarin Chinese
0.61 free parameters per syllable, 13 Hz RMS error.
20
Results are stable under small changes in the model.
This model allows extra freedom: different tones are allowed to define their targets differently.
Costs for misinterpreting different syllables.
This model allows less freedom: all tones have the same type of target.
The two models have words defined by different labelers.
21
Model parameters
Cantonese
Phrasing is marked in speech.
Cantonese data courtesy of Prof. Tan Lee
Mandarin
22
Metrical patterns inside words (Mandarin)
The metrical pattern controls how the cost-of-misinterpretation is split up inside a word. Syllables are marked with ?. The vertical position is proportional to log(s) for each syllable, so higher syllables have larger s, and will be executed more carefully. For 4-syllable words, the error bars are shown by the pairs of arrows.
Normal segmentation of characters into words.
Random segmentation of characters into words: note that the metrical pattern disappears, showing that we are measuring something real that is tied to words.
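
One way to picture how a word-level cost is split across syllables is a per-length table of multiplicative weights. The sketch below assumes that form; the weights in `METRICAL_PATTERNS` are invented for illustration and are not the fitted values from the slides.

```python
import math

# Hypothetical metrical patterns: one weight per syllable position,
# keyed by word length. Values are invented for illustration only.
METRICAL_PATTERNS = {
    1: [1.0],
    2: [1.0, 0.5],
    3: [1.0, 0.4, 0.7],
    4: [1.0, 0.4, 0.7, 0.5],
}


def syllable_costs(word_cost, n_syllables):
    """Derive per-syllable costs s from a word cost (assumed multiplicative)."""
    return [word_cost * w for w in METRICAL_PATTERNS[n_syllables]]


costs = syllable_costs(2.0, 3)
for position, s in enumerate(costs, start=1):
    # log(s) is the quantity plotted as vertical position on the slide.
    print(position, round(s, 3), round(math.log(s), 3))
```

Only the word cost is a free parameter per word in this picture; the pattern is shared by all words of the same length.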
23
Another nice property

The cost-of-misinterpretation parameter for a syllable is correlated with the mutual information with the preceding syllable:
r = -0.175
>95% confidence
Pitch patterns are implemented sloppily for syllables that are unsurprising, and precisely for surprising ones.
(Mutual information values are from a database of 15000 newspaper sentences. Syllable identity was defined by phoneme content and tone.)
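
A minimal sketch of this kind of analysis, assuming the mutual information is computed pointwise from unigram and bigram counts and the correlation is a plain Pearson r. The toy corpus and the cost values are invented, and the function names are hypothetical; nothing here reproduces the reported r.

```python
import math
from collections import Counter


def pointwise_mutual_information(syllables):
    """PMI in bits between each syllable and the one before it."""
    unigrams = Counter(syllables)
    bigrams = Counter(zip(syllables, syllables[1:]))
    n_uni = len(syllables)
    n_bi = len(syllables) - 1
    pmi = []
    for prev, cur in zip(syllables, syllables[1:]):
        p_joint = bigrams[(prev, cur)] / n_bi
        p_prev = unigrams[prev] / n_uni
        p_cur = unigrams[cur] / n_uni
        pmi.append(math.log2(p_joint / (p_prev * p_cur)))
    return pmi


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Toy corpus: syllable identity = phoneme content plus tone number.
corpus = ["zhong1", "guo2", "ren2", "zhong1", "guo2", "hua4", "ren2", "min2"]
pmi = pointwise_mutual_information(corpus)
# Invented cost-of-misinterpretation values, one per non-initial syllable.
costs = [0.6, 1.4, 1.8, 0.5, 1.6, 1.7, 1.2]
print("r =", round(pearson_r(pmi, costs), 3))
```

A negative r in such an analysis means that syllables strongly predicted by their predecessor tend to receive lower costs, which matches the interpretation given on the slide.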

24
Conclusion