Title: Meaningful Intonational Variation
1Meaningful Intonational Variation
2Today
- Assigning variation for TTS, CTS
- Contours
- Accent
- Phrasing
- Pitch Range
- Amplitude and timing
3TTS Production Pipeline
- Orthographic input: Dr. Smith lives on Elm Dr.
- Text normalization: abbreviation expansion
- Pronunciation modeling: POS identification, word-sense disambiguation
- Intonation assignment: parsing, POS identification, robust semantics
- Phonetic/phonological realization: phonological parsing, phonetic analysis
- Unit selection: acoustic analysis
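The abbreviation-expansion step can be illustrated with the slide's own example. This is a minimal sketch, not how any particular TTS system does it: the function name `expand_dr` and the single rule it applies (a following capitalized word means a title, otherwise a street suffix) are assumptions made for illustration.

```python
def expand_dr(text: str) -> str:
    """Expand 'Dr.' to 'Doctor' before a capitalized word, else to 'Drive'.

    Illustrative rule only; real normalizers use richer context.
    """
    tokens = text.split()
    out = []
    for i, tok in enumerate(tokens):
        if tok == "Dr.":
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            if nxt[:1].isupper():
                # Followed by a name: treat as a title.
                out.append("Doctor")
            else:
                # Sentence-final or followed by a lowercase word: street suffix.
                # Keep the sentence-final period if this is the last token.
                out.append("Drive" + ("." if i == len(tokens) - 1 else ""))
        else:
            out.append(tok)
    return " ".join(out)


print(expand_dr("Dr. Smith lives on Elm Dr."))
```

A rule this simple fails as soon as a street suffix is followed by a capitalized word, which is one reason the later slides point to dictionaries and corpus-based methods.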
4Intonation Assignment: Phrasing
- Traditional: hand-built rules
- Punctuation: 234-5682
- Context/function word: no breaks after function word (He went to dinner)
- Parse? She favors the nuts and bolts approach
- Current: statistical analysis of large labeled corpus
- Punctuation, POS window, utterance length, ...
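The traditional hand-built approach can be sketched as two rules: break at punctuation, and otherwise break after a run of words unless the current word is a function word. The word list, the `max_len` threshold, and the function name `assign_breaks` are all assumptions for illustration, not taken from any particular system.

```python
# Small illustrative function-word list (an assumption, far from complete).
FUNCTION_WORDS = {
    "a", "an", "the", "to", "of", "in", "on", "and", "or", "but",
    "he", "she", "we", "that", "which", "by", "if",
}
BREAK_PUNCT = ",;:.?!"


def assign_breaks(words, max_len=6):
    """Return indices of words after which a phrase boundary is placed."""
    breaks = []
    since_last = 0
    for i, w in enumerate(words):
        since_last += 1
        is_last = i == len(words) - 1
        if w[-1] in BREAK_PUNCT or is_last:
            # Rule 1: punctuation (or end of utterance) marks a boundary.
            breaks.append(i)
            since_last = 0
        elif since_last >= max_len and w.lower() not in FUNCTION_WORDS:
            # Rule 2: long stretch without a break, and never after a function word.
            breaks.append(i)
            since_last = 0
    return breaks


print(assign_breaks("He went to dinner, and she stayed home.".split()))
```

Rule 1 treats every listed punctuation mark as a boundary; deciding which punctuation really signals a break, and where breaks fall in unpunctuated text, is what the statistical approach learns from labeled data.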
5Functions of Phrasing
- Disambiguates syntactic constructions, e.g. PP attachment
- S: You should buy the ticket with the discount coupon.
- Disambiguates scope ambiguities, e.g. negation
- S: You aren't booked through Rome because of the fare.
- Or modifier scope
- S: This fare is restricted to retired politicians and civil servants.
6Intonation Assignment: Accent
- Hand-built rules
- Function/content distinction: He went out the back door / He threw out the trash
- Complex nominals
- Main Street/Park Avenue
- city hall parking lot
- Statistical procedures trained on large corpora
- Contrastive stress, given/new distinction?
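The function/content rule can be sketched over part-of-speech-tagged input. The tags below follow the Penn Treebank tag set and are supplied by hand here; treating "out" as a preposition (IN) in the first sentence and a particle (RP) in the second is an assumption about what a tagger would output, and `assign_accents` is a hypothetical name.

```python
# Tag prefixes treated as accentable: nouns, verbs, adjectives, adverbs, particles.
CONTENT_TAGS = ("NN", "VB", "JJ", "RB", "RP")


def assign_accents(tagged):
    """Mark each (word, POS) pair as accented (True) or deaccented (False)."""
    return [(word, pos.startswith(CONTENT_TAGS)) for word, pos in tagged]


preposition = [("He", "PRP"), ("went", "VBD"), ("out", "IN"),
               ("the", "DT"), ("back", "JJ"), ("door", "NN")]
particle = [("He", "PRP"), ("threw", "VBD"), ("out", "RP"),
            ("the", "DT"), ("trash", "NN")]

for sentence in (preposition, particle):
    print(" ".join(w.upper() if accented else w
                   for w, accented in assign_accents(sentence)))
```

The rule says nothing about complex nominals, contrast, or given/new status, which is where the slide's remaining bullets come in.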
7Functions of Pitch Accent
- Given/new information
- S: Do you need a return ticket?
- U: No, thanks, I don't need a return.
- Contrast (narrow focus)
- U: No, thanks, I don't need a RETURN. (I need a time schedule, receipt, ...)
- Disambiguation of discourse markers
- S: Now let me get you the train information.
- U: Okay (thanks) vs. Okay. (but I really want ...)
8Intonation Assignment: Contours
- Simple rules
- ".": declarative contour
- "?": yes-no-question contour unless wh-word present at/near front of sentence
- Well, how did he do it? And what do you know?
- What else might we do?
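The simple punctuation rules above can be sketched as follows. The slide does not say which contour a wh-question receives, so returning "declarative" for it is an assumption, as are the `window` size used for "at/near front" and the function name `assign_contour`.

```python
WH_WORDS = {"who", "whom", "whose", "what", "which", "when", "where", "why", "how"}


def assign_contour(sentence, window=2):
    """Pick a contour name from final punctuation and sentence-initial wh-words."""
    text = sentence.strip()
    if text.endswith("?"):
        # Look only at the first `window` words, stripped of punctuation.
        first = [w.strip(",.;:!?\"'").lower() for w in text.split()[:window]]
        if any(w in WH_WORDS for w in first):
            return "declarative"  # assumption: wh-questions fall like statements
        return "yes-no-question"
    return "declarative"


for s in ["Dr. Smith lives on Elm Dr.",
          "Do you want to fly first class?",
          "Well, how did he do it?",
          "And what do you know?"]:
    print(s, "->", assign_contour(s))
```

With `window=2` the rule catches "Well, how ..." and "And what ...", but a wh-word any later in the sentence would be missed.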
9Contours: Accent + Phrasing
- What do intonational contours mean (Ladd 80, Bolinger 89)?
- Speech acts (statements, questions, requests)
- S: That'll be credit card? (L* H- H%)
- Propositional attitude (uncertainty, incredulity)
- S: You'd like an evening flight. (L*+H L- H%)
- Speaker affect (anger, happiness, love)
- U: I said four SEVEN one! (L+H* L- L%)
- Personality
- S: Welcome to the Sunshine Travel System.
10- Propositional attitude (uncertainty)
- Did you feed the animals?
- I fed the L*+H goldfish L-H%
- Distinguish direct/indirect speech acts
- Can you open the door?
11The TTS Front End Today
- Corpus-based statistical methods instead of hand-built rule-sets
- Dictionaries instead of rules (but fall-back to rules)
- Modest attempts to infer contrast, given/new
- Text analysis tools: POS tagger, morphological analyzer, little parsing
12TTS: Where are we now?
- Natural sounding speech for some utterances
- Where good match between input and database
- Still hard to vary prosodic features and retain naturalness
- Yes-no questions: Do you want to fly first class?
- Context-dependent variation still hard to infer from text and hard to realize naturally
13- Appropriate contours from text
- Emphasis, de-emphasis to convey focus, given/new distinction: I own a cat. Or, rather, my cat owns me.
- Variation in pitch range, rate, pausal duration to convey topic structure
- Characteristics of emotional speech little understood, so hard to convey a voice that sounds friendly, sympathetic, authoritative.
- How to mimic real voices?
14TTS vs. CTS
- Decisions in Text-to-Speech (TTS) depend on syntax, information status, topic structure, information explicitly available to NLG
- Concept-to-Speech (CTS) systems should be able to specify better prosody: the system knows what it wants to say and can specify how
- But ... generating prosody for CTS isn't so easy
15To(nes and) B(reak) I(ndices)
- Developed by prosody researchers in four meetings over 1991-94
- Goals
- devise common labeling scheme for Standard American English that is robust and reliable
- promote collection of large, prosodically labeled, shareable corpora
- ToBI standards also proposed for Japanese, German, Italian, Spanish, British and Australian English, ...
16- Minimal ToBI transcription
- recording of speech
- f0 contour
- ToBI tiers
- orthographic tier: words
- break-index tier: degrees of junction (Price et al 89)
- tonal tier: pitch accents, phrase accents, boundary tones (Pierrehumbert 80)
- miscellaneous tier: disfluencies, non-speech sounds, etc.
17Sample ToBI Labeling
18- Online training material, available at
- http://www.ling.ohio-state.edu/phonetics/ToBI/
- Evaluation
- Good inter-labeler reliability for expert and naive labelers: 88% agreement on presence/absence of tonal category, 81% agreement on category label, 91% agreement on break indices to within 1 level (Silverman et al. 92, Pitrelli et al 94)
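The three agreement figures correspond to three different comparisons between labelers. The sketch below computes them for two hypothetical transcriptions of the same four words; the helper names and the toy labels are invented for illustration, and the cited studies may have pooled and counted labeler pairs differently.

```python
def presence_agreement(a, b):
    """Fraction of words where both labelers agree on accent presence/absence."""
    return sum((x is None) == (y is None) for x, y in zip(a, b)) / len(a)


def label_agreement(a, b):
    """Fraction of words where both labelers chose the identical tonal label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def break_agreement(a, b, tolerance=1):
    """Fraction of words whose break indices differ by at most `tolerance`."""
    return sum(abs(x - y) <= tolerance for x, y in zip(a, b)) / len(a)


# Toy data: None means "no pitch accent on this word".
tones_a = ["H*", None, "L*", None]
tones_b = ["H*", "H*", "L+H*", None]
breaks_a = [1, 3, 4, 1]
breaks_b = [1, 4, 4, 3]

print(presence_agreement(tones_a, tones_b))
print(label_agreement(tones_a, tones_b))
print(break_agreement(breaks_a, breaks_b))
```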
19Pitch Accent/Prominence in ToBI
- Which items are made intonationally prominent and how?
- Accent type
- H* simple high (declarative)
- L* simple low (ynq)
- L*+H scooped, late rise (uncertainty/incredulity)
- L+H* early rise to stress (contrastive focus)
- H+!H* fall onto stress (implied familiarity)
20- Downstepped accents
- !H*,
- L+!H*,
- L*+!H
- Degree of prominence
- within a phrase: HiF0
- across phrases
21Prosodic Phrasing in ToBI
- Levels of phrasing
- intermediate phrase: one or more pitch accents plus a phrase accent (H- or L-)
- intonational phrase: 1 or more intermediate phrases + boundary tone (H% or L%)
- ToBI break-index tier
- 0 no word boundary
- 1 word boundary
- 2 strong juncture with no tonal markings
- 3 intermediate phrase boundary
- 4 intonational phrase boundary
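Given a break-index tier, the two levels of phrasing can be recovered by cutting the word sequence at indices of 3 or above (intermediate phrases) or at 4 (intonational phrases). A minimal sketch, with a hypothetical helper `split_phrases` and made-up break indices for the example sentence:

```python
def split_phrases(words, breaks, level=4):
    """Group words into phrases, closing a phrase after any break index >= level."""
    phrases, current = [], []
    for word, brk in zip(words, breaks):
        current.append(word)
        if brk >= level:
            phrases.append(current)
            current = []
    if current:
        # Trailing words with no closing boundary still form a phrase.
        phrases.append(current)
    return phrases


words = ["I", "fed", "the", "goldfish", "but", "not", "the", "cat"]
breaks = [1, 1, 1, 3, 1, 1, 1, 4]  # illustrative labels, one per word

print(split_phrases(words, breaks, level=3))  # intermediate phrases
print(split_phrases(words, breaks, level=4))  # intonational phrases
```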
24Contour Examples
- http://www.cs.columbia.edu/julia/cs6998/cards/examples.html
25And Other Things Contribute: Pitch Range and Timing (Rate, Pause)
- Level of speaker engagement
- Hello vs. HELLO
- Contour interpretation
- Rise/fall/rise (L*+H L-H%): Elephantiasis isn't incurable
- Discourse/topic structure: paratones
26Corpus-Based Research
- Predicting accent, phrasing, contours from large ToBI-labeled corpora
- Features
- Word position, p.o.s. window, word co-occurrence, punctuation, capitalization, sentence length, paragraph position, ...
- Results
- 80-85% correct accent prediction
- 92-96% correct phrase boundary prediction
- Contours????
- Reality
27- This is my version of a rather long sentence which ideally should be broken into several phrases automatically by a smart system but we don't know if this will actually happen do we?
- Is a yes-no question uttered with falling intonation? Does that sound delightful? Mellifluous?
- I don't want cereal I want toast.
- .
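Several of the features listed for corpus-based prediction (word position, POS window, punctuation, capitalization, sentence length) can be read directly off tagged text. The sketch below builds one feature dictionary per word; the feature names, the "PAD" filler, and the hand-supplied Penn Treebank tags are assumptions for illustration, and the classifier that would consume these features is left out.

```python
def word_features(tagged, i, window=1):
    """Build a feature dict for the i-th (word, POS) pair in a sentence."""
    word, pos = tagged[i]
    feats = {
        "position": i,
        "sentence_length": len(tagged),
        "capitalized": word[:1].isupper(),
        "punct_after": word[-1] in ",;:.?!",
        "pos": pos,
    }
    # POS window: tags of neighbouring words, padded at sentence edges.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"pos[{offset:+d}]"] = tagged[j][1] if 0 <= j < len(tagged) else "PAD"
    return feats


tagged = [("I", "PRP"), ("own", "VBP"), ("a", "DT"), ("cat.", "NN")]
print(word_features(tagged, 3))
```

None of these surface features captures the contrast in "I don't want cereal I want toast", which is the kind of case the "Reality" examples are meant to expose.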
28Next
- Story analysis and generation (readings will be available later this week; we'll send mail)