Title: Speech Synthesis Markup Language: Aiming at Extension
1. Speech Synthesis Markup Language: Aiming at Extension
National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences
2. Brief Introduction to the Evolution of SSML
- The original SSML (not W3C SSML)
- STML
- JSML
- SABLE
- W3C SSML
3. The original SSML
- Mark phrase boundaries
- Emphasize words
- Specify pronunciations
- Include other sound files
4. STML
- Developed by Edinburgh and Bell Labs
- Based on the original SSML
- Aimed at giving listeners the same basic impression on different systems, rather than sounding identical
5. JSML
- Developed by Sun
- XML based
- Includes
  - Elements to mark paragraphs and sentences
  - Elements to control pronunciation
  - Elements to represent markers
6. SABLE
- Developed by Edinburgh and Bell Labs
- Based on STML and JSML
- The stated aims
  - Synthesizer control
  - Text structure
  - Speech pronunciation
  - Multilinguality
  - Ease of use
  - Portability
  - Extensibility
7. W3C SSML
- Key design criteria
  - Consistency
  - Interoperability
  - Generality
  - Internationalization
  - Generation and Readability
  - Implementable
8. What we want from a markup language
- Control
- Sharing
- Extension to multimedia
9. Which level should we focus on?
- Text analysis module
- Prosody module
- Acoustic module
10. Sharing
[Diagram: Sys1 and Sys2 exchange SSML; modules shown: text analysis, prosody analysis, acoustic]
11. Text level for Mandarin
- Word boundary
- Pronunciation with tone
- Part of speech (POS)
- Dialect?
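A minimal sketch of how the text-level information listed above (word boundaries, pronunciation with tone, POS) might be carried in markup. The `w` element with its `ph` and `pos` attributes is hypothetical, invented here for illustration; it is not part of W3C SSML 1.0 or of the original slides. The numbered-tone pinyin notation and the one-letter POS tags are likewise assumptions.

```python
import xml.etree.ElementTree as ET

# (word, pinyin with tone numbers, POS tag); notation and tag set are illustrative.
TOKENS = [
    ("语音", "yu3 yin1", "n"),
    ("合成", "he2 cheng2", "v"),
]


def annotate(tokens):
    """Wrap each segmented word in a hypothetical <w> element inside a sentence <s>."""
    sentence = ET.Element("s")
    for word, pinyin, pos in tokens:
        # One element per word makes the word boundary explicit.
        w = ET.SubElement(sentence, "w", {"ph": pinyin, "pos": pos})
        w.text = word
    return ET.tostring(sentence, encoding="unicode")


if __name__ == "__main__":
    print(annotate(TOKENS))
```

Putting each word in its own element lets a receiving system skip its own segmentation, which is one way the sharing goal of the earlier slides could be met at the text level.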
12. Prosody level for Mandarin
13. Extensions to expressive synthesis
14. Current elements related to prosody and style in SSML
- 3.2.1 "voice" Element
- 3.2.2 "emphasis" Element
- 3.2.3 "break" Element
- 3.2.4 "prosody" Element
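As a rough illustration of the four elements listed above, the sketch below embeds a small SSML 1.0 document in a Python string and checks that it is well-formed XML. The sentence text and the chosen attribute values are illustrative, and `local_names` is a hypothetical helper, not part of any SSML toolkit.

```python
import xml.etree.ElementTree as ET

# A small SSML 1.0 document using voice, emphasis, break and prosody.
SSML_DOC = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice gender="female">
    This is <emphasis level="strong">very</emphasis> important.
    <break time="500ms"/>
    <prosody rate="slow" pitch="high">Please listen carefully.</prosody>
  </voice>
</speak>
"""


def local_names(ssml):
    """Parse the document and return element names in document order, without namespaces."""
    root = ET.fromstring(ssml)
    return [el.tag.split("}", 1)[-1] for el in root.iter()]


if __name__ == "__main__":
    print(local_names(SSML_DOC))
```

Here `local_names(SSML_DOC)` lists `speak`, `voice`, `emphasis`, `break`, and `prosody` in document order. None of these elements names an emotion or a speaking style directly.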
15. Emotion and Style
- Emotion
  - Anger, happiness, surprise, sadness, fear, etc.
  - Depends on the speaker's psychological and physical states
  - Local effects on prosody
- Style
  - News, comments, etc.
  - Depends on the semantics of the sentences
  - Global effects on prosody
16. Personalized Voice
- Element: voice
- gender
- age
- name
- variant
- sample
- ??<voice gender="male">?????</voice>
- ???<voice gender="female">??????</voice>
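The attributes above can also be set programmatically. The following is a minimal sketch, assuming only the `gender`, `age`, `variant`, and `name` attributes of the SSML 1.0 `voice` element; `voice_markup` is a hypothetical helper and the sample text is illustrative.

```python
import xml.etree.ElementTree as ET

# Attributes accepted by this sketch (xml:lang is left out for brevity).
ALLOWED = ("gender", "age", "variant", "name")


def voice_markup(text, **attrs):
    """Return a <voice> element, serialized as a string, wrapping the given text."""
    unknown = set(attrs) - set(ALLOWED)
    if unknown:
        raise ValueError("unsupported voice attributes: %s" % sorted(unknown))
    element = ET.Element("voice", {k: str(attrs[k]) for k in ALLOWED if k in attrs})
    element.text = text
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    print(voice_markup("Hello.", gender="male", age=30))
```

Rejecting unknown attribute names keeps the sketch inside the existing element. An attribute for emotion or style, for example, would be an extension rather than standard markup.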
17. Extension?
- To make it more expressive
- Background music
- VTTS
- Combined with a talking head and other media information
- We can only see the "mark" element
18. Thanks!
19.
- Element <Structure>
- Level 0-..: paragraph, phrase, ...
- POS
- <Structure level="paragraph">
- <Structure level="sentence">
- <Structure level="phrase">
- <Structure level="word">
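One possible reading of the proposed element is sketched below. It assumes that `Structure` elements nest from paragraph down to word. That nesting, the sample words, and the `levels_with_depth` helper are assumptions made for illustration; the slide gives only the element name and the `level` values.

```python
import xml.etree.ElementTree as ET

# Hypothetical nesting of the proposed Structure element (not standard SSML).
STRUCTURE_DOC = """<Structure level="paragraph">
  <Structure level="sentence">
    <Structure level="phrase">
      <Structure level="word">speech</Structure>
      <Structure level="word">synthesis</Structure>
    </Structure>
  </Structure>
</Structure>"""


def levels_with_depth(doc):
    """Return (depth, level) pairs for every Structure element, in document order."""
    pairs = []

    def walk(element, depth):
        pairs.append((depth, element.get("level")))
        for child in element:
            walk(child, depth + 1)

    walk(ET.fromstring(doc), 0)
    return pairs


if __name__ == "__main__":
    for depth, level in levels_with_depth(STRUCTURE_DOC):
        print("  " * depth + level)
```

A single element with a `level` attribute keeps the tag set small. The cost is that a validator has to check separately that the levels nest in a sensible order.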