Title: Towards Synthesis of Focus in Mandarin Text-to-speech System
1Towards Synthesis of Focus in Mandarin
Text-to-speech System
Dr. Dezhi HUANG, dezhi.huang_at_francetelecom.com.cn
SNLP Unit, FTRD Beijing
2005/11/2 V1.1
2Table of Contents
3Humans have a strong ability to reconstruct
information
- Evidence from music perception
The Butterfly Lovers violin concerto
4Humans have a strong ability to reconstruct
information (Cont.)
- Evidence from human vision
5Application model of Mandarin Text-to-speech
(Cont.)
Roadside information query
Mandarin Voice-enabled Service Gateway
PSTN/ Wireless
Mandarin TTS Engine
Angry
Environmental noise
6Why do we fail?
- The important content is not as prominent as we expect
- Weaken the background noise (noise reduction)
- Improve the prominence of the information that we need
- Utilize the human ability to reconstruct information
7What do we need in speech communication?
- The key information is always contained in a phrase/word in a sentence
- Have you always seen Prof. Zhao?
- No, I saw him only once.
- The container of key information is called the focus.
- The semantic centre of a sentence
8The value of synthesis of focus
- It is helpful for
- Analyzing the syntactic structure of a sentence
- Understanding the meaning of an utterance
- Capturing turn-taking
- Comprehending the intent and emotion of the speaker
- Improving the acceptance of TTS
9Key challenges in synthesis of focus
- Difficult to locate a focus in a sentence
- Some focuses can be found from the syntactic structure
- ????????????????????
- The other focuses are decided by the context of a sentence
- ????????
- ????????
- ????????
- Lack of an appropriate acoustic model to realize a focus
- Pitch accent
- Duration
- Energy
- Pause
- Weakness
Markup Language for Focus
10Table of Contents
Make the synthesized speech clear
Improve the effectiveness of speech communication with TTS
11What is SSML?
- It is designed to provide a rich, XML-based
markup language for assisting the generation of
synthetic speech in Web and other applications
SSML
Natural Language Processing and Understanding
Speech Synthesis
12<emphasis> in SSML
- The emphasis element requests that the contained text be spoken with emphasis (also referred to as prominence or stress)
- Level: strong, moderate and none
- For a synthesizer, it is easy to know which word has sentence stress
- ??????
- ??????
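The emphasis element described above can be illustrated with a minimal Python sketch that wraps one phrase of a sentence in `<emphasis>`. The helper name `emphasize` and the English sentence are illustrative assumptions, not part of the original slides; the level values are those of SSML 1.0, which also defines `reduced` in addition to the three levels listed on the slide.

```python
import xml.etree.ElementTree as ET

# Emphasis levels defined by SSML 1.0 (the slide lists the first three).
EMPHASIS_LEVELS = {"strong", "moderate", "none", "reduced"}


def emphasize(before, phrase, after, level="strong"):
    """Return an SSML string in which `phrase` is wrapped in <emphasis>.

    `emphasize` is a hypothetical helper for illustration only.
    """
    if level not in EMPHASIS_LEVELS:
        raise ValueError(f"unknown emphasis level: {level!r}")
    speak = ET.Element("speak", {"version": "1.0"})
    speak.text = before
    emph = ET.SubElement(speak, "emphasis", {"level": level})
    emph.text = phrase
    emph.tail = after
    return ET.tostring(speak, encoding="unicode")


print(emphasize("No, I saw him ", "only once", "."))
```

The markup only tells the synthesizer where the stress goes; it does not say why that phrase matters, which is the gap the proposed focus element addresses.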
13The proposed <focus> element
- The focus element indicates that the contained text is the semantic centre of a sentence and the carrier of its important information
- From the perspective of pragmatics
- Contrastive focus (also referred to as identificational focus)
- Informational focus (also referred to as presentational focus or natural focus)
14Samples of focus
- (1) ?????????
- ???????
- (2)?????????
- ????????
- (3)?????????
- (4)????????
- (5)????????
- (6)????????
- (7)???????
15A focus in Mandarin does not correspond
one-to-one with an emphasis
- Most focuses are realized by stress
- ???????
- ????????????????30??
- Some of them are realized by a pause or by intonation
- ????????????????
- ???????
16Differences between focus and emphasis
- Focus is a concept of semantics and pragmatics
- We can mark up the focus without the speech signal
- ????????????????,?????????????????????????????????
????????????,??????????????????????????????
- Emphasis is a concept of psychoacoustics
- Consistent emphasis labels are relatively difficult to achieve without the speech signal
17Differences between focus and emphasis (Cont.)
- Focus always carries the purpose of the utterance
- We can know exactly what the sentence means
- Emphasis is not directly linked to the purpose of the utterance
- The emphasized word may be trivial
- ????,????????????,?????????????
- ????????
18How can we benefit from focus labeling?
- Improve the intelligibility of synthesized speech, especially in noisy communication environments
Q????????????????
A???9?????CZ8071????????
Q????
A?9??
Q????
A?CZ8071?
19How can we benefit from focus labeling? (Cont.)
- Focus labeling can be directly applied to text information processing
- The next generation of search engines will need to know
- which is the topic of a paragraph
- which are the focuses of a sentence
- Text highlighting is an important step in information retrieval
- Keywords in automatic digests are always the focuses
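As a rough illustration of the text-processing use above, the following Python sketch pulls the focus-labeled spans out of a sentence and highlights them. The function names `extract_focuses` and `highlight`, the bracket markers, and the choice of "only once" as the focus of the English example are all assumptions made for this sketch; it also assumes the labeled sentence is well-formed XML.

```python
import xml.etree.ElementTree as ET


def extract_focuses(marked_up):
    """Return the text of every <focus> span, e.g. as candidate keywords."""
    # Wrap the sentence in a dummy root so it parses as one XML document.
    root = ET.fromstring(f"<s>{marked_up}</s>")
    return ["".join(el.itertext()) for el in root.iter("focus")]


def highlight(marked_up, open_mark="[", close_mark="]"):
    """Return plain text with each top-level <focus> span set off by markers."""
    root = ET.fromstring(f"<s>{marked_up}</s>")
    parts = [root.text or ""]
    for child in root:
        words = "".join(child.itertext())
        if child.tag == "focus":
            words = open_mark + words + close_mark
        parts.append(words)
        parts.append(child.tail or "")
    return "".join(parts)


sentence = "No, I saw him <focus>only once</focus>."
print(extract_focuses(sentence))
print(highlight(sentence))
```

Because the labels live in the text itself, both steps work without any speech signal.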
20Table of Contents
<focus> indicates what the semantic centre is
<focus> solves the problem of focus location
21Attributes of <focus>
- Type
- informational
- contrastive
- Method
- StrongStress
- ModerateStress
- None
- Pause
- Intonation
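The attribute values listed above can be checked mechanically. The Python sketch below builds a focus element and rejects values outside the two lists; the helper name `make_focus` and the lowercase attribute names `type` and `method` are assumptions for illustration, and the English phrase is only a stand-in for the Mandarin samples.

```python
import xml.etree.ElementTree as ET

# Allowed values, copied from the attribute lists on the slide.
FOCUS_TYPES = {"informational", "contrastive"}
FOCUS_METHODS = {"StrongStress", "ModerateStress", "None", "Pause", "Intonation"}


def make_focus(text, focus_type, method):
    """Build a <focus> element after validating both attributes.

    `make_focus` is a hypothetical helper, not part of any published API.
    """
    if focus_type not in FOCUS_TYPES:
        raise ValueError(f"unknown focus type: {focus_type!r}")
    if method not in FOCUS_METHODS:
        raise ValueError(f"unknown focus method: {method!r}")
    element = ET.Element("focus", {"type": focus_type, "method": method})
    element.text = text
    return element


focus = make_focus("only once", "contrastive", "StrongStress")
print(ET.tostring(focus, encoding="unicode"))
```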
22Samples of <focus>
- (1) ????<focus type="informational" method="StrongStress">???</focus>??
- ????<focus type="informational" method="Pause">??</focus>?
- (2)?????????
- ????<focus type="informational" method="ModerateStress">???</focus>?
- (3)?<focus type="contrastive" method="StrongStress">??</focus>??????
23Samples of <focus> (Cont.)
- (4)??<focus type="contrastive" method="StrongStress">?</focus>?????
- (5)???<focus type="informational" method="Pause">????</focus>?
- (6)???<focus type="informational" method="ModerateStress">????</focus>?
- (7)??<focus type="informational" method="Intonation">???</focus>??
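One way to try such samples on an engine that only understands standard SSML is to rewrite each focus element into existing SSML elements. The Python sketch below does this for a flat sentence. The mapping itself is an assumption made for illustration and is not specified in the slides: stress methods become `<emphasis>`, `Pause` becomes a `<break>` before the focused text, `Intonation` becomes `<prosody pitch="high">`, and `None` or anything unrecognized is left as plain text.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from focus methods to SSML emphasis levels.
EMPHASIS_LEVEL = {"StrongStress": "strong", "ModerateStress": "moderate"}


def _append_text(parent, text):
    """Append text after the last child of `parent`, or to its own text."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def focus_to_ssml(marked_up):
    """Rewrite top-level <focus> elements as standard SSML (hypothetical helper)."""
    src = ET.fromstring(f"<speak>{marked_up}</speak>")
    out = ET.Element("speak")
    _append_text(out, src.text)
    for child in src:
        words = "".join(child.itertext())
        method = child.get("method", "None") if child.tag == "focus" else "None"
        if method in EMPHASIS_LEVEL:
            el = ET.SubElement(out, "emphasis", {"level": EMPHASIS_LEVEL[method]})
            el.text = words
        elif method == "Pause":
            # Assumed realization: a medium break before the focused words.
            ET.SubElement(out, "break", {"strength": "medium"})
            _append_text(out, words)
        elif method == "Intonation":
            # Assumed realization: raised pitch on the focused words.
            el = ET.SubElement(out, "prosody", {"pitch": "high"})
            el.text = words
        else:
            _append_text(out, words)
        _append_text(out, child.tail)
    return ET.tostring(out, encoding="unicode")


print(focus_to_ssml(
    'No, I saw him <focus type="contrastive" method="StrongStress">only once</focus>.'
))
```

A real engine would choose the acoustic realization itself; fixed values such as `medium` and `high` are placeholders only.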
24Thank you!