Motor Control Strategies for Chinese Intonation - PowerPoint PPT Presentation

Slides: 25
Provided by: gregkoc
Learn more at: https://kochanski.org
Transcript and Presenter's Notes

Title: Motor Control Strategies for Chinese Intonation


1
Motor Control Strategies for Chinese Intonation
  • Greg Kochanski (University of Oxford, UK)
  • Chilin Shih (University of Illinois,
    Urbana-Champaign)
  • Tan Lee (Chinese University of Hong-Kong)
  • Hongyan Jing (IBM)

2
http://kochanski.org/gpk
3
  • The Goal
    • Explain intonation in a way that is
      • Consistent with linguistic assumptions.
      • Consistent with known physiology and
        neuroscience.
  • The Method
    • Motion planning over a phrase.
    • Minimize the sum of
      • Error between actual pitch and linguistic target
      • An effort cost term that penalizes rapid, jerky
        motions.
  • The Result
    • Intonation in tone languages can be represented
      by
      • A lexically specified tone template (i.e., you use
        a dictionary to look up which tone a syllable
        has).
      • A continuous cost-of-error parameter, one per
        word.
    • Evidence that the costs of misinterpretation we
      measure are real
      • Cross-language similarities
      • Metrical patterns
      • Other

4
The Challenge
5
  • Tone languages provide the ideal test case for
    motor control strategies:
    • tone is important, and
    • you can be sure what the speaker is trying to
      accomplish.
  • The meaning of each syllable is determined by the
    pitch contour over the syllable.
    • Ma (high tone): mother
    • Ma (rising tone): hemp
    • Ma (low falling tone): horse
    • Ma (high falling tone): to scold
  • You can look up the tone in the dictionary.
  • Pitch contour is determined primarily by muscle
    tension in the vocal folds.

6
Another Challenge
[Figure: typical tone shapes, shown in green.
Axes: F0 (Hz) vs. time (10 ms intervals).]
7
People talk nearly as fast as possible; therefore,
dynamics must be important.
[Figure: pitch (f0) for a maximum-rate warble, with
conversational Mandarin on the same time scale.]
8
The Data
  • Male speaker of Mandarin (Chinese)
  • Female speaker of Cantonese (Chinese)
  • Text from newspaper news stories.
  • 737 syllables for Mandarin
  • 4 ± 1.4 syllables per second
  • 1.2 ± 0.7 seconds per phrase (between pauses).
  • Segmented into words by three independent native
    speakers (Mandarin)
  • Tracks of fundamental frequency vs. time (pitch)
    extracted by get_f0 from ESPS/Waves package.

9
Basic assumptions used in modeling
  • People plan their utterances several syllables in
    advance.
  • People produce optimal, highly practiced speech.
    • Most of what we say is made from bits and pieces
      we've said before.
    • There are only 4 (Mandarin) or 6 (Cantonese)
      tones to combine.
    • A speaker has the chance to practice and optimize
      all the common 3- and 4-tone sequences.
  • A simple model for f0 (pitch): f0 is linearly
    related to muscle tensions.
  • A simple model of the muscle control strategy.
    • No reason to believe pitch is controlled
      differently from other muscle motions.

10
Optimize what?
  • People want to minimize the chance that they will
    be significantly misunderstood. Some words will
    be more important than others.
    • Risk = P(misinterpreted) × cost-of-misinterpretation
    • Perhaps weight matches importance.
  • People want to minimize effort and/or talk faster.
    • Chairs, Cars
  • How to combine the two?
    • A weighted sum.
    • Cost-of-misinterpretation plays the role of the
      weight.

11
What is the unit of motion planning? Probably a
phrase or a sentence.
(Data courtesy Chilin Shih)
People start at a higher pitch when they begin
longer sentences. Inhaled air volume is also
planned. Therefore, there is some plan 300 ms
before the start of speech.
12
Modeling math
We're optimizing something.
p is the realized pitch; p is implicitly a function
of time.
R is the total risk for the utterance, r_i is the
error of the ith target, and s_i is the cost if
this particular word is misinterpreted.
(The expression for r_i is an approximation; see
elsewhere for the correct, more detailed equation.)
y(t) is the pitch of a point in the ith
target. The time dependence is suppressed for
clarity.
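
The equations from this slide did not survive in the transcript. As a hedged sketch only, one form consistent with the definitions above is given below, with G denoting the effort cost term. The squared-error form of r_i and the linear use of s_i as a weight are assumptions, not the slide's own equations.

```latex
\begin{align}
  p^{*} &= \arg\min_{p}\,\bigl(G + R\bigr), \\
  R     &= \sum_{i} s_i\, r_i, \\
  r_i   &\approx \sum_{t \,\in\, \text{target } i} \bigl(p(t) - y(t)\bigr)^{2}.
\end{align}
```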
13
Modeling math: more detail.
The cost of a misinterpretation of the ith
syllable.
Total risk for the utterance.
Alpha (α) controls how much the shape of the
pitch contour matters.
Beta (β) controls how much the average pitch of
the syllable matters.
r_i is the error of the ith target.
y is the pitch of a point in the ith target.
A bar denotes an average over a target.
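
The detailed equation itself is missing from the transcript. The sketch below is one assumed way to split a target's error into a shape part weighted by alpha and an average-pitch part weighted by beta. The function name `target_error` and the exact decomposition are hypothetical. It is chosen so that alpha = beta = 1 reduces to a plain sum of squared differences between p and y.

```python
def target_error(p, y, alpha=1.0, beta=1.0):
    """Assumed error of one target: alpha * shape error + beta * level error.

    p and y are equal-length lists of realized and target pitch samples.
    """
    if not p or len(p) != len(y):
        raise ValueError("p and y must be non-empty and of equal length")
    n = len(p)
    p_bar = sum(p) / n  # average realized pitch over the target
    y_bar = sum(y) / n  # average target pitch
    # Shape error: mismatch after removing each curve's own average.
    shape = sum(((pt - p_bar) - (yt - y_bar)) ** 2 for pt, yt in zip(p, y))
    # Level error: mismatch between the two averages, scaled by n so that
    # shape + level equals sum((p - y)**2).
    level = n * (p_bar - y_bar) ** 2
    return alpha * shape + beta * level


target = [100.0, 110.0, 120.0]
shifted = [110.0, 120.0, 130.0]  # same shape, 10 Hz higher
shape_only = target_error(shifted, target, alpha=1.0, beta=0.0)
level_only = target_error(shifted, target, alpha=0.0, beta=1.0)
```

With a large alpha and a small beta, a contour that keeps the right shape at the wrong overall pitch is cheap, which matches the roles the slide assigns to the two parameters.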
14
Effort
How does G depend on the form of the pitch
curve? Large effort implies a curve with larger
slopes and sharper corners, i.e., a wigglier curve.
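
As an illustration only, the sketch below scores a sampled pitch curve by its squared slopes plus its squared curvatures, so that steeper and sharper-cornered curves cost more. The function name `effort` and the `curvature_weight` parameter are hypothetical, and the exact form of G used in the talk is not given in this transcript.

```python
def effort(p, curvature_weight=1.0):
    """Assumed effort of a sampled pitch curve p (list of values in Hz)."""
    # First differences approximate the slope between adjacent samples.
    slopes = [b - a for a, b in zip(p, p[1:])]
    # Second differences approximate how sharply the curve bends.
    curvatures = [b - a for a, b in zip(slopes, slopes[1:])]
    return (sum(v * v for v in slopes)
            + curvature_weight * sum(c * c for c in curvatures))


flat = [120, 120, 120, 120, 120]
ramp = [100, 110, 120, 130, 140]    # steady rise, no corners
wiggly = [100, 140, 100, 140, 100]  # same range, many sharp corners
```

Here `effort(flat) < effort(ramp) < effort(wiggly)`, which is the qualitative ordering the slide describes.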
15
Model behavior
  • For s ≫ 1, Error (R) dominates, and pitch matches
    target.
  • For s ≪ 1, Effort (G) dominates, both speaker and
    listener accept large deviations, and pitch
    smoothly interpolates.
  • For s ≈ 1, everything compromises.
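
The three regimes above can be reproduced with a deliberately simplified sketch. It assumes the effort is just the sum of squared slopes and that a single cost s applies to every sample; the talk's model has a richer effort term and per-word costs. The name `plan_pitch` is hypothetical. Because this objective is quadratic, its minimum solves a tridiagonal linear system.

```python
def plan_pitch(targets, s):
    """Minimize sum((p[t+1]-p[t])**2) + s * sum((p[t]-targets[t])**2).

    Setting the gradient to zero gives a tridiagonal linear system, solved
    here with the Thomas algorithm. Assumes s > 0 and a non-empty list.
    """
    n = len(targets)
    # Diagonal: number of neighbours (1 at the ends, 2 inside) plus s.
    diag = [float((t > 0) + (t < n - 1)) + s for t in range(n)]
    rhs = [s * y for y in targets]
    # Forward elimination; every off-diagonal entry equals -1.
    for t in range(1, n):
        w = -1.0 / diag[t - 1]
        diag[t] += w
        rhs[t] -= w * rhs[t - 1]
    # Back substitution.
    p = [0.0] * n
    p[-1] = rhs[-1] / diag[-1]
    for t in range(n - 2, -1, -1):
        p[t] = (rhs[t] + p[t + 1]) / diag[t]
    return p


targets = [100.0] * 3 + [200.0] * 3      # a step between two level targets (Hz)
careful = plan_pitch(targets, s=1000.0)  # s >> 1: stays close to the targets
sloppy = plan_pitch(targets, s=0.001)    # s << 1: nearly flat
middle = plan_pitch(targets, s=1.0)      # s ~ 1: a compromise
```

In this toy setup the `careful` curve keeps the step, the `sloppy` curve collapses toward the average of the targets, and `middle` falls between the two.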

16
The rest of the model
  • A model is a sequence of targets.
  • The type of the target (tone1, tone2, ...) is
    looked up in a dictionary.
  • Each target has a cost-of-misinterpretation.
  • The cost is adjustable for each word
  • Syllables within a word are derived from word
    cost via the metrical pattern for words of a
    certain length.
  • One target per tone.
  • Targets are stretched to fit syllable duration.
  • Only one phonological rule: 33 → 23
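
The steps in this list can be sketched as follows. Everything concrete here is an assumption for illustration: the tone templates on a 1-5 relative pitch scale, the metrical pattern values, and the function names. The phonological rule is read as "a tone 3 directly before another tone 3 becomes tone 2". Tones are passed in directly rather than looked up in a real dictionary.

```python
# Hypothetical tone templates on a 1-5 relative pitch scale (assumed shapes).
TEMPLATES = {1: [5, 5], 2: [3, 5], 3: [2, 1, 4], 4: [5, 1]}


def apply_sandhi(tones):
    """Apply the 33 -> 23 rule: a tone 3 directly before a tone 3 becomes 2."""
    out = list(tones)
    for i in range(len(tones) - 1):
        if tones[i] == 3 and tones[i + 1] == 3:
            out[i] = 2
    return out


def stretch(template, n_frames):
    """Linearly interpolate a template (2+ points) onto n_frames >= 2 samples."""
    last = len(template) - 1
    out = []
    for k in range(n_frames):
        x = k * last / (n_frames - 1)  # position along the template
        lo = min(int(x), last - 1)     # index of the left-hand template point
        frac = x - lo
        out.append(template[lo] * (1 - frac) + template[lo + 1] * frac)
    return out


def build_targets(tones, frames, word_cost, pattern):
    """Return one (stretched template, syllable cost) pair per syllable."""
    tones = apply_sandhi(tones)
    return [(stretch(TEMPLATES[t], n), word_cost * w)
            for t, n, w in zip(tones, frames, pattern)]


# A two-syllable word with tones 3-3, five frames per syllable, and a
# made-up metrical pattern that halves the cost of the second syllable.
word = build_targets(tones=[3, 3], frames=[5, 5], word_cost=2.0,
                     pattern=[1.0, 0.5])
```

The first syllable comes out with the rising tone-2 template because of the rule, and each syllable carries its own share of the word's cost.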

17
What's the procedure?
Compute the pitch curve as a function of
phonological inputs and the cost of a
misinterpretation.
[Diagram labels: Sequence of tones (phonology);
Costs of misinterpretations; Predicted F0; Data;
Nonlinear least-squares fitting algorithm]
18
Model fits for Mandarin Chinese
Tone class (input)
Cost-of-misinterpretation (result)
Inside a word, the cost of a misinterpretation is
distributed by the metrical pattern
19
Model fits to Mandarin Chinese
0.61 free parameters per syllable, 13 Hz RMS
error.
20
Results are stable under small changes in the
model.
This model allows extra freedom: different tones
are allowed to define their targets differently.
Costs for misinterpreting different syllables.
This model allows less freedom: all tones have
the same type of target.
The two models have words defined by different
labelers.
21
Model parameters
Cantonese
Phrasing is marked in speech.
Cantonese data courtesy of Prof. Tan Lee
Mandarin
22
Metrical patterns inside words (Mandarin)
The metrical pattern controls how the
cost-of-misinterpretation is split up inside a
word. Syllables are marked with ?. The
vertical position is proportional to log(s) for
each syllable, so higher syllables have larger s,
and will be executed more carefully. For
4-syllable words, the error bars are shown by the
pairs of arrows.
Normal segmentation of characters into words.
Random segmentation of characters into words:
note that the metrical pattern disappears,
showing that we are measuring something real that
is tied to words.
23
Another nice property
  • The cost-of-misinterpretation parameter for a
    syllable is correlated with the mutual
    information with the preceding syllable:
    • r = -0.175
    • >95% confidence
  • Pitch patterns are implemented
    • sloppily for syllables that are unsurprising, and
    • precisely for surprising ones.
  • (Mutual information from a database of 15000
    newspaper sentences. Syllable identity was
    defined by phoneme content and tone.)
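
A minimal sketch of the two quantities involved is given below. It assumes that "mutual information with the preceding syllable" means pointwise mutual information estimated from adjacent-pair counts, and that r is a Pearson correlation; neither is stated on the slide. The function names and the toy syllable sequence are made up, with each syllable written as phoneme content plus tone (e.g. "ma3").

```python
import math
from collections import Counter


def pointwise_mutual_information(syllables):
    """Return PMI in bits for each adjacent (previous, current) pair."""
    pairs = list(zip(syllables, syllables[1:]))
    total = len(pairs)
    pair_counts = Counter(pairs)
    first_counts = Counter(a for a, _ in pairs)   # syllable as "previous"
    second_counts = Counter(b for _, b in pairs)  # syllable as "current"
    pmi = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / total
        p_a = first_counts[a] / total
        p_b = second_counts[b] / total
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


pmi = pointwise_mutual_information(["ma1", "ma3", "ma1", "ma3", "ma1", "ma4"])
```

With fitted costs in hand, `pearson(costs, pmi_values)` would give the kind of coefficient quoted above; a negative value means predictable syllables tend to get lower costs.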

24
Conclusion
  • Models with motor planning capture important
    aspects of speech.
  • They allow a very compact representation of
    complex behaviors.
  • Intonation is represented as
    • a small set of discrete symbols, in sequence,
    • modulated by a cost-of-misinterpretation, with
  • The cost-of-misinterpretation parameter seems
    real
    • Similar across languages
    • Matches language structure
  • This model can be applied broadly
    • Two dialects of Chinese
    • Some aspects of English
    • Separating different singing and speaking styles
      from the content
  • See http://kochanski.org/papers .