1. Position Paper for W3C Workshop on Internationalizing SSML: The Usage of Part-Of-Speech for Resolving Multiple Pronunciations in SSML
- 2005. 11. 3.
- Myoung-Wan Koo and Du-Seong Chang
- KT/KAIT
2. Introduction
- Multiple pronunciation problem
- Same word but different pronunciations
- Newton: /njutən/ vs. /nutən/
- Same spelling but different pronunciations (homograph)
- refuse: /rɪ'fjuz/ vs. /'refjus/
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>njutən</phoneme>
    <phoneme>nutən</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fjuz</phoneme>
    <phoneme>'refjus</phoneme>
  </lexeme>
</lexicon>
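To make the ambiguity concrete, the following is a minimal sketch that reads a lexicon like the one above with Python's standard `xml.etree.ElementTree` module and collects the pronunciation candidates per grapheme. The function name `load_candidates` is hypothetical, and the phoneme strings assume that the unreadable symbols on the slide are the IPA vowels ə and ɪ.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the lexicon example on the slide.
NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Assumption: the unreadable symbols on the slide are the IPA vowels ə and ɪ.
LEXICON_XML = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>njutən</phoneme>
    <phoneme>nutən</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fjuz</phoneme>
    <phoneme>'refjus</phoneme>
  </lexeme>
</lexicon>"""


def load_candidates(xml_text):
    """Map each grapheme to the list of phoneme strings listed for it."""
    root = ET.fromstring(xml_text)
    table = {}
    for lexeme in root.findall("pls:lexeme", NS):
        grapheme = lexeme.findtext("pls:grapheme", namespaces=NS).strip()
        phonemes = [p.text.strip() for p in lexeme.findall("pls:phoneme", NS)]
        table.setdefault(grapheme, []).extend(phonemes)
    return table


candidates = load_candidates(LEXICON_XML)
# Both words come back with two candidates and no information for choosing one.
print(candidates["Newton"])
print(candidates["refuse"])
```

Nothing in the entry itself says which of the two readings of "refuse" applies in a given sentence, which is the gap the rest of the paper addresses.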
3. Multiple pronunciation in SSML/PLS
- SSML
- The Speech Synthesis Markup Language Specification Version 1.0
- Pronunciation information in SSML
- Phoneme element
- Lexicon element
- PLS
- Pronunciation Lexicon Specification Version 1.0
- Pronunciation information in PLS
- Phoneme element
- Prefer attribute
- Neither fully supports a pronunciation lexicon for multiple pronunciations and agglutinative languages.
- Therefore, Part-Of-Speech information is needed.
4. Pronunciation information in PLS (1/2)
- Pronunciation Lexicon Specification
- Version 1.0 / Feb 2005 / W3C Voice Browser Working Group
- It allows interoperable specification of pronunciation information for both ASR and TTS engines within voice browsing applications.
- It is expected to handle multiple pronunciations.
- Example of PLS
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>t?mei??ou</phoneme>
  </lexeme>
</lexicon>
5. Pronunciation information in PLS (2/2)
- Prefer attribute of phoneme element
- Gives one pronunciation high priority among the pronunciation candidates.
- Effective in speech synthesis
- Applies only to multiple pronunciations of the same orthography
- Does not address the homograph problem
- refuse: verb /rɪ'fjuz/ vs. noun /'refjus/
- No information for ASR systems.
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme prefer="true">njutən</phoneme>
    <phoneme>nutən</phoneme>
  </lexeme>
</lexicon>
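As an illustration of how a synthesizer might use the prefer attribute, here is a minimal sketch. The function name `preferred_pronunciation` is hypothetical, and falling back to the first listed phoneme when nothing is marked is an assumption made for this example, not behaviour stated on the slide. The phoneme strings assume the unreadable symbol is ə.

```python
import xml.etree.ElementTree as ET

NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Assumption: the unreadable symbol on the slide is the IPA vowel ə.
LEXICON_XML = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme prefer="true">njutən</phoneme>
    <phoneme>nutən</phoneme>
  </lexeme>
</lexicon>"""


def preferred_pronunciation(lexeme):
    """Return the phoneme marked prefer="true", else the first one listed.

    The fallback to the first phoneme is an assumption for this sketch.
    """
    phonemes = lexeme.findall("pls:phoneme", NS)
    for phoneme in phonemes:
        if phoneme.get("prefer") == "true":
            return phoneme.text.strip()
    return phonemes[0].text.strip()


root = ET.fromstring(LEXICON_XML)
newton = root.find("pls:lexeme", NS)
chosen = preferred_pronunciation(newton)
print(chosen)
```

Because the preference is fixed per entry, the same selection applied to "refuse" would always return one reading regardless of whether the word is used as a verb or a noun.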
6. Typical Korean TTS system structure
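[Figure: block diagram of a typical Korean TTS system. Input: Text. Processing stages: Morphological Analyzer, Grapheme-to-Phoneme, Prosody Analysis, Waveform production. Output: Speech. Labels on the diagram: Structural Information; Morphemes, POS; Phonemes, POS; Phonemes, Prosody.]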
7. POS for resolving multiple pronunciations
- POS information can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems.
- The word "refuse" can have two different pronunciations depending on its POS.
- Proposal: POS attribute
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="verb">rɪ'fjuz</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="noun">'refjus</phoneme>
  </lexeme>
</lexicon>
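A minimal sketch of how an engine could consume the proposed attribute: index the lexicon by (grapheme, POS) so that a morphological analyzer's tag selects the pronunciation directly. The `pos` attribute is the proposal of this paper and not part of the PLS draft; the function name `build_index` is hypothetical, and the phoneme strings assume the unreadable symbol is ɪ.

```python
import xml.etree.ElementTree as ET

NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# "pos" is the attribute proposed in this paper, not part of the PLS draft.
LEXICON_XML = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="verb">rɪ'fjuz</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="noun">'refjus</phoneme>
  </lexeme>
</lexicon>"""


def build_index(xml_text):
    """Map (grapheme, pos) pairs to phoneme strings."""
    root = ET.fromstring(xml_text)
    index = {}
    for lexeme in root.findall("pls:lexeme", NS):
        grapheme = lexeme.findtext("pls:grapheme", namespaces=NS).strip()
        for phoneme in lexeme.findall("pls:phoneme", NS):
            pos = phoneme.get("pos")  # None when the optional attribute is absent
            index[(grapheme, pos)] = phoneme.text.strip()
    return index


index = build_index(LEXICON_XML)
print(index[("refuse", "verb")])
print(index[("refuse", "noun")])
```

With this index the lookup is a single dictionary access once the POS is known, which is the sense in which POS information reduces the overhead of disambiguation.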
8. POS information for LVCSR
- Large vocabulary continuous speech recognition of agglutinative languages
- The basic unit is the morpheme (pseudo-morpheme), which reduces the vocabulary size.
- There are many homographs in the recognition dictionary.
- POS information helps the system select the proper pronunciation from the dictionary as well as resolve multiple pronunciations of some words.
- It reduces the search time, since POS information can cut wrong word connections in the first stage rather than in the semantic interpretation stage.
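The pruning idea in the last bullet can be sketched as follows. This is a toy illustration only: the dictionary entries, the POS labels, and the connection table are all hypothetical, and an English word is used in place of Korean pseudo-morphemes for readability.

```python
# Hypothetical recognition-dictionary entries: (pronunciation, POS) per unit.
DICTIONARY = {
    "refuse": [("rɪ'fjuz", "verb"), ("'refjus", "noun")],
}

# Hypothetical connection table: which POS may follow which.
ALLOWED_CONNECTIONS = {("determiner", "noun"), ("auxiliary", "verb")}


def connectable_entries(previous_pos, unit):
    """Keep only the dictionary entries whose POS may follow previous_pos."""
    return [
        (pronunciation, pos)
        for pronunciation, pos in DICTIONARY.get(unit, [])
        if (previous_pos, pos) in ALLOWED_CONNECTIONS
    ]


after_determiner = connectable_entries("determiner", "refuse")
after_auxiliary = connectable_entries("auxiliary", "refuse")
print(after_determiner)
print(after_auxiliary)
```

Each context keeps one of the two entries, so the decoder has fewer hypotheses to extend before any later interpretation stage runs.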
9. Proposals
- Proposal 1: POS attribute of the phoneme element
- Optional attribute
- Proposal 2: POS element
- The lexeme element contains optional POS elements.
- POS values: language-specific
- Type: allow vendor-specific POS types?
- Well-known POS sets: Penn Treebank, Sejong project (Korean)
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fjuz</phoneme>
    <pos>verb</pos>
  </lexeme>
</lexicon>
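For comparison with the attribute form, here is a minimal sketch of reading the element form of the proposal, where the POS is a child of the lexeme. The `pos` element is the proposal of this paper and not part of the PLS draft. The noun lexeme is added by analogy with the attribute example and does not appear on this slide; the function name `pronunciations_by_pos` is hypothetical, and the phoneme strings assume the unreadable symbol is ɪ.

```python
import xml.etree.ElementTree as ET

NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# "pos" is the element proposed in this paper, not part of the PLS draft.
# The noun lexeme is an assumption added for illustration.
LEXICON_XML = """<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fjuz</phoneme>
    <pos>verb</pos>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>'refjus</phoneme>
    <pos>noun</pos>
  </lexeme>
</lexicon>"""


def pronunciations_by_pos(xml_text, word):
    """Return {pos: phoneme} for every lexeme whose grapheme equals word."""
    root = ET.fromstring(xml_text)
    result = {}
    for lexeme in root.findall("pls:lexeme", NS):
        grapheme = lexeme.findtext("pls:grapheme", default="", namespaces=NS)
        if grapheme.strip() != word:
            continue
        phoneme = lexeme.findtext("pls:phoneme", namespaces=NS).strip()
        # A lexeme may carry several optional pos elements.
        for pos in lexeme.findall("pls:pos", NS):
            result[pos.text.strip()] = phoneme
    return result


by_pos = pronunciations_by_pos(LEXICON_XML, "refuse")
print(by_pos)
```

One practical difference is that a lexeme-level element applies to every phoneme in that lexeme, whereas the attribute form tags each phoneme individually.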
10. Conclusion
- No element or attribute for resolving multiple pronunciations
- In current SSML, PLS
- POS information
- Can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems.
- Can reduce the search time in a large-vocabulary recognition system.
- Can be effective in agglutinative languages.
- Proposals
- POS element
- POS attribute