Title: Why a Rising Tone is Falling in Mandarin Sentences
1Why a Rising Tone is Falling in Mandarin Sentences
Word Accents and Tones in Sentence Perspective
A symposium in conjunction with the 60th birthday
of Professor Gösta Bruce
- Chilin Shih
- University of Illinois at Urbana-Champaign
January 10, 2007
Lund, Sweden
2Generated by WordsEye from text description.
Under development at SemanticLight, Inc.
3Outline
- What we know
- Chinese is a lexical tone language.
- Surprise!
- Tones in sentences may deviate considerably from
their lexical specifications.
- Research question
- Explain the difference between lexical tones and
the observed sentence production.
- Implication
- A simulation model linking phonology to
phonetics.
4Chinese Lexical Tones
Tone shapes differentiate lexical meaning.
Ma1 mother
Ma2 hemp
Ma3 horse
Ma4 to scold
5Chinese Sentences
Ma1-ma0 ma4 ma3. Mother scolds the horse.
Ma3 ma4 ma1-ma0. The horse scolds mother.
6Chinese Intonation Types (Data from Jiahong Yuan)
Statement
- Li3bai4wu3 Luo2yan4 yao4 mai3 lu4.
- On Friday Luoyan wants to buy a deer.
Question
7Classification of Tone Shapes
Tone 1 High level
Tone 2 Rising
Tone 3 Low falling
Tone 4 High falling
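A minimal sketch of how the tone-numbered syllables in the example sentences map onto this classification. The function name `lexical_tones`, the regular expression, and the label used for tone 0 (the neutral tone in "ma1-ma0") are assumptions made for illustration.

```python
import re

# Tone categories as listed on the slide.
TONE_SHAPES = {
    1: "high level",
    2: "rising",
    3: "low falling",
    4: "high falling",
    0: "neutral",  # assumption: label chosen for this sketch
}

# A syllable is a run of letters followed by a single tone digit.
SYLLABLE = re.compile(r"([A-Za-z]+)([0-4])")


def lexical_tones(sentence):
    """Return (syllable, tone number, tone shape) for each syllable."""
    return [
        (syl.lower(), int(tone), TONE_SHAPES[int(tone)])
        for syl, tone in SYLLABLE.findall(sentence)
    ]


for item in lexical_tones("Ma1-ma0 ma4 ma3."):
    print(item)
```

This only recovers the lexical specification. The rest of the talk is about why the produced F0 can differ from it.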
8Cause of Tonal Distortion
- Ease of articulatory effort
- Balancing articulatory effort and communication
need
9Physiological constraints
Communication errors
- When you say what you think
you are saying
- When you are not saying what you
think you are saying
10Ease of Articulatory Effort I
11Ease of Articulatory Effort II
12Ease of Articulatory Effort III
13Production of Rising and Falling Tones
14Severe Tonal Distortion I
15People Talk Nearly As Fast As Possible
16Severe Tonal Distortion II
17Local distortion is predictable from global
optimization
18A Racing Game
19Adjusting the Best Path
20Best Path in Tonal Production
Figure labels: 0.5, 1.0, 1.0, 1.0, 0.0
21Stem-ML
- The prosodic modeling is based on Stem-ML
(Soft Template Mark-up Language).
- Stem-ML consists of a set of mathematically
defined tags with value attributes.
For example: tone, prosodic strength.
- It allows user-defined accent shapes, phrase
curves, and other speaker-specific parameters.
Kochanski and Shih (2003), Prosody modeling with
soft templates, Speech Communication V. 39.
Shih (in preparation), Prosody Learning and
Generation, Springer.
22Basic Assumptions
- Pre-planning.
- Balance articulatory effort and communication
needs (Lindblom, Ohala).
- A dynamical model for the muscles that control f0
(Hill).
23We further propose
- Speakers shift weights dynamically
as they speak.
- This is the prosodic strength,
which reflects the articulatory effort.
24Linking Phonology and Phonetics
- A model is a sequence of templates (i.e. points
representing tone/accent shapes). The templates
encode phonological information.
- For tone languages, there is one template per
tone. Templates are stretched to fit duration.
- Each template has a strength. The strength value
determines phonetic variation.
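A minimal sketch of this representation: each tone is a short list of pitch points, stretched to the syllable's duration by linear interpolation, and paired with a strength. The template values, the normalized 0..1 pitch scale, and the names `stretch_template` and `build_targets` are hypothetical; Stem-ML's actual template format is not shown in these slides.

```python
# Hypothetical templates on a normalized 0..1 pitch scale.
TEMPLATES = {
    "T1": [0.8, 0.8],  # high level
    "T2": [0.3, 0.7],  # rising
    "T3": [0.3, 0.1],  # low falling
    "T4": [0.9, 0.2],  # high falling
}


def stretch_template(points, n_samples):
    """Linearly resample a template to n_samples pitch values."""
    if n_samples == 1 or len(points) == 1:
        return [points[0]] * n_samples
    out = []
    for i in range(n_samples):
        # Position of sample i on the template's own point axis.
        pos = i * (len(points) - 1) / (n_samples - 1)
        lo = min(int(pos), len(points) - 2)
        frac = pos - lo
        out.append(points[lo] * (1 - frac) + points[lo + 1] * frac)
    return out


def build_targets(sequence):
    """Turn (tone, n_samples, strength) triples into per-sample lists."""
    targets, strengths = [], []
    for name, n_samples, strength in sequence:
        targets.extend(stretch_template(TEMPLATES[name], n_samples))
        strengths.extend([strength] * n_samples)
    return targets, strengths


targets, strengths = build_targets([("T1", 3, 1.0), ("T2", 5, 0.3)])
print(targets)
print(strengths)
```

The templates carry the phonological information; the strengths are the only part that varies with how carefully each tone is produced.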
25Representation
- Surface F0 contours are coded as a set of
(template, strength) pairs:
T1 1.0  T3 0.3  T4 1.2  T5 0.8  T2 1.0  T1 0.5
- Generation: template, strength → F0
- Learning: template, F0 → template strength
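A toy sketch of both directions, under strong simplifying assumptions: effort is taken to be the sum of squared frame-to-frame pitch changes, error is the squared distance to the template weighted by strength squared, and learning is a grid search over one strength. The function names, template values, and strength values are illustrative and are not the Stem-ML implementation.

```python
def ramp(start, end, n):
    """n evenly spaced values from start to end (a straight-line template)."""
    return [start + (end - start) * i / (n - 1) for i in range(n)]


def generate_f0(targets, strengths):
    """Minimize sum (e[t+1]-e[t])^2 + sum s[t]^2 * (e[t]-y[t])^2.

    The minimum solves a tridiagonal linear system, handled here with
    forward elimination and back substitution (Thomas algorithm).
    """
    n = len(targets)
    weights = [s * s for s in strengths]
    # Diagonal: weight plus the number of neighbours; off-diagonals are -1.
    diag = [weights[t] + (t > 0) + (t < n - 1) for t in range(n)]
    rhs = [weights[t] * targets[t] for t in range(n)]
    for t in range(1, n):
        diag[t] -= 1.0 / diag[t - 1]
        rhs[t] += rhs[t - 1] / diag[t - 1]
    f0 = [0.0] * n
    f0[-1] = rhs[-1] / diag[-1]
    for t in range(n - 2, -1, -1):
        f0[t] = (rhs[t] + f0[t + 1]) / diag[t]
    return f0


N = 10  # samples per syllable (assumption)
# A rising template between a high target and a low target.
TARGETS = [1.0] * N + ramp(0.3, 0.7, N) + [0.0] * N


def realize(rising_strength):
    """Generation: return the F0 of the middle (rising) syllable only."""
    strengths = [3.0] * N + [rising_strength] * N + [3.0] * N
    return generate_f0(TARGETS, strengths)[N:2 * N]


def learn_strength(observed, candidates):
    """Learning: pick the candidate strength that best reproduces observed F0."""
    def error(s):
        return sum((a - b) ** 2 for a, b in zip(realize(s), observed))
    return min(candidates, key=error)


weak, strong = realize(0.1), realize(3.0)
print("weak rising tone:  ", round(weak[0], 2), "->", round(weak[-1], 2))
print("strong rising tone:", round(strong[0], 2), "->", round(strong[-1], 2))
print("learned strength:  ", learn_strength(weak, [0.1, 0.3, 1.0, 3.0]))
```

With a low strength, the rising template gives way to its strong neighbours and the realized contour falls from the high target toward the low one; with a high strength, the rise is kept.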
26Modeling Math (Credit to Greg Kochanski)
Effort (equation not recovered from the slide)
The effort term is defined over the muscle tension (frequency) at time t.
Each target encodes some linguistic information;
r_i is the error of the ith target, and s_i is its
importance.
Error (equation not recovered from the slide)
y is the ith pitch target, and a bar denotes an
average over a target.
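The equations themselves did not survive in this transcript. The block below is a sketch of one form consistent with the definitions given here, in the general spirit of the soft-template approach; it should not be read as the exact formulas on the slide. The symbol e_t for the muscle tension, the use of a squared first derivative as effort, and the weights alpha and beta are assumptions.

```latex
% Assumed notation: e_t = muscle tension (frequency) at time t,
% y_{i,t} = pitch target i at time t, s_i = strength (importance) of target i.
\begin{align}
  \hat{e} &= \arg\min_{e}\; \bigl( G + R \bigr) \\
  G &= \sum_{t} \dot{e}_t^{\,2}
      && \text{(effort: penalizes rapid pitch movement)} \\
  R &= \sum_{i} s_i^{2}\, r_i
      && \text{(strength-weighted error over all targets)} \\
  r_i &= \alpha \sum_{t \in i} \bigl( e_t - y_{i,t} \bigr)^{2}
       + \beta \bigl( \bar{e}_i - \bar{y}_i \bigr)^{2}
      && \text{(shape error plus average-pitch error)}
\end{align}
```

Under this reading, a target with large s_i is reproduced faithfully at some cost in effort, while a target with small s_i is allowed to deviate so that the overall contour stays smooth.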
27Representing F0 As Tone Strength
28Simulation of Tonal Production I
29Simulation of Tonal Production II
30Model Fits to Mandarin Chinese
0.61 free parameters per syllable, 13 Hz RMS
error.
31Works for English
The highest f0 is on a weak, unaccented word.
Example utterance: "Uhm, I would like a flight to Seattle from Albuquerque."
32Muscle Dynamics
Interpolation
33Discourse Functions
- Topic initialization
- Discourse structure
- Phrasing
- Emphasis
- New vs. old information
- Other communicative means
34How Do They Fit Together?
35Conclusion
- Speech is a communication system. Speakers
balance articulatory effort and communication
needs.
- We need a representation that encodes
- Accent template
- Articulatory effort
- Emotional state
- We present a computational simulation model that
generates surface phonetic variations from this
representation.