1
Position Paper for W3C Workshop on
Internationalizing SSML: The Usage of
Part-Of-Speech for Resolving Multiple
Pronunciations in SSML
  • 2005. 11. 3.
  • Myoung-Wan Koo and Du-Seong Chang
  • KT/KAIT

2
Introduction
  • Multiple pronunciation problem
  • Same word but different pronunciations
  • Newton /njutən/ vs. /nutən/
  • Same spelling but different pronunciations
    (homograph)
  • refuse /rɪ'fjuz/ vs. /'refjus/

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>njutən</phoneme>
    <phoneme>nutən</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fjuz</phoneme>
    <phoneme>'refjus</phoneme>
  </lexeme>
</lexicon>
3
Multiple pronunciation in SSML and PLS
  • SSML
  • The Speech Synthesis Markup Language
    Specification Version 1.0
  • Pronunciation information in SSML
  • Phoneme element
  • Lexicon element
  • PLS
  • Pronunciation Lexicon Specification Version 1.0
  • Pronunciation information in PLS
  • Phoneme element
  • Prefer attribute
  • Neither fully supports a pronunciation
    lexicon for multiple pronunciations or for
    agglutinative languages.
  • → Part-of-speech (POS) information is needed

4
Pronunciation information in PLS (1/2)
  • Pronunciation Lexicon Specification
  • Version 1.0/Feb 2005/W3C Voice Browser Working
    Group
  • It allows interoperable specification of
    pronunciation information for both ASR and TTS
    engines within voice browsing applications.
  • It is expected to handle multiple pronunciations.
  • Example of PLS

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>təmei̥ɾou</phoneme>
  </lexeme>
</lexicon>
5
Pronunciation information in PLS (2/2)
  • Prefer attribute of phoneme element
  • Gives one pronunciation priority over the other
    pronunciation candidates.
  • Effective in speech synthesis
  • Only for multiple pronunciations of the same
    orthography
  • Not for the homograph problem
  • refuse: verb /rɪ'fjuz/ vs. noun /'refjus/
  • Provides no information for ASR systems.

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme prefer="true">njutən</phoneme>
    <phoneme>nutən</phoneme>
  </lexeme>
</lexicon>
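A minimal sketch of how a synthesizer might read the prefer attribute, using Python's standard xml.etree.ElementTree. The function name preferred_pronunciation and the fallback to the first listed phoneme are assumptions made for illustration; the IPA strings are illustrative as well.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
NS = {"pls": PLS_NS}

# Lexicon modelled on the Newton example from the slide.
LEXICON_XML = """
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme prefer="true">njutən</phoneme>
    <phoneme>nutən</phoneme>
  </lexeme>
</lexicon>
""".strip()


def preferred_pronunciation(root, word):
    """Return the phoneme marked prefer="true" for `word`.

    Falls back to the first listed phoneme (an assumption of this sketch)
    and returns None when the word is not in the lexicon.
    """
    for lexeme in root.findall("pls:lexeme", NS):
        graphemes = [(g.text or "").strip()
                     for g in lexeme.findall("pls:grapheme", NS)]
        if word not in graphemes:
            continue
        phonemes = lexeme.findall("pls:phoneme", NS)
        for phoneme in phonemes:
            if phoneme.get("prefer") == "true":
                return (phoneme.text or "").strip()
        if phonemes:
            return (phonemes[0].text or "").strip()
    return None


root = ET.fromstring(LEXICON_XML)
print(preferred_pronunciation(root, "Newton"))
```

The lookup takes only the spelling as input, so it cannot choose between the verb and noun readings of a homograph such as refuse.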
6
Typical Korean TTS system structure
[Pipeline diagram] Text → Morphological Analyzer → (Morphemes, POS) → Grapheme-to-Phoneme → (Phonemes, POS) → Prosody Analysis → (Phonemes, Prosody) → Waveform production → Speech
[Diagram label] Structural Information
7
POS for resolving multiple pronunciations
  • POS information can reduce the overhead of
    resolving multiple pronunciations in ASR and TTS
    systems.
  • The word "refuse" has two different
    pronunciations depending on its POS (verb or noun).
  • Proposal: a POS attribute

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="verb">rɪ'fjuz</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="noun">'refjus</phoneme>
  </lexeme>
</lexicon>
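A sketch of how an engine could use the proposed pos attribute to resolve the homograph, assuming the POS tag of the word is already known (for example from a morphological analyzer). The pos attribute is the proposal on this slide; the function name pronunciation_for is hypothetical and the IPA strings are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Lexicon using the proposed (non-standard) pos attribute on <phoneme>.
LEXICON_XML = """
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="verb">rɪ'fjuz</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme pos="noun">'refjus</phoneme>
  </lexeme>
</lexicon>
""".strip()


def pronunciation_for(root, word, pos):
    """Return the pronunciation of `word` whose pos attribute equals `pos`."""
    for lexeme in root.findall("pls:lexeme", NS):
        graphemes = [(g.text or "").strip()
                     for g in lexeme.findall("pls:grapheme", NS)]
        if word not in graphemes:
            continue
        for phoneme in lexeme.findall("pls:phoneme", NS):
            if phoneme.get("pos") == pos:
                return (phoneme.text or "").strip()
    return None  # no entry with this spelling and POS


root = ET.fromstring(LEXICON_XML)
print(pronunciation_for(root, "refuse", "verb"))
print(pronunciation_for(root, "refuse", "noun"))
```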
8
POS information for LVCSR
  • Large vocabulary continuous speech recognition
    (LVCSR) of agglutinative languages
  • The basic unit is the morpheme (pseudo-morpheme),
    which reduces the vocabulary size.
  • The recognition dictionary contains many homographs.
  • POS information helps the system select the proper
    pronunciation from the dictionary and resolve
    multiple pronunciations of some words.
  • It reduces the search time, since POS information
    can cut wrong word connections in the first
    stage rather than in the semantic interpretation stage.
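The pruning idea above can be sketched as follows. This is a toy illustration, not the authors' recognizer: the dictionary, the POS tags, and the table of allowed POS connections are all hypothetical, and English words stand in for morphemes.

```python
# Hypothetical recognition dictionary: unit -> list of (pos, pronunciation).
DICTIONARY = {
    "the": [("det", "ðə")],
    "they": [("pron", "ðeɪ")],
    "refuse": [("noun", "'refjus"), ("verb", "rɪ'fjuz")],
}

# Hypothetical table of POS pairs that are allowed to follow each other.
ALLOWED_CONNECTIONS = {("det", "noun"), ("pron", "verb")}


def expand(units):
    """Return every (pos, pronunciation) path that passes the POS checks.

    A hypothesis is dropped as soon as two adjacent units have a POS pair
    that is not in ALLOWED_CONNECTIONS, so it is never extended further.
    """
    paths = [[]]
    for unit in units:
        next_paths = []
        for path in paths:
            for pos, pron in DICTIONARY[unit]:
                if path and (path[-1][0], pos) not in ALLOWED_CONNECTIONS:
                    continue  # wrong connection: pruned in the first stage
                next_paths.append(path + [(pos, pron)])
        paths = next_paths
    return paths


print(expand(["the", "refuse"]))
print(expand(["they", "refuse"]))
```

In each call only one of the two readings of "refuse" survives, so later stages have fewer hypotheses to score.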

9
Proposals
  • Proposal 1: POS attribute on the phoneme element
  • Optional attribute
  • Proposal 2: POS element
  • The lexeme element contains optional POS elements.
  • POS values: language-specific
  • Type: allow vendor-specific POS types?
  • Well-known POS sets: Penn Treebank, Sejong
    project (Korean)

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fjuz</phoneme>
    <pos>verb</pos>
  </lexeme>
</lexicon>
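A sketch of a lookup against the element form of the proposal (Proposal 2). The pos element is the proposal on this slide; the function name lookup, and the rule that a lexeme without any pos element matches every POS, are assumptions of this sketch. The IPA strings are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

# Lexicon using the proposed (non-standard) <pos> child of <lexeme>.
LEXICON_XML = """
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>rɪ'fjuz</phoneme>
    <pos>verb</pos>
  </lexeme>
  <lexeme>
    <grapheme>refuse</grapheme>
    <phoneme>'refjus</phoneme>
    <pos>noun</pos>
  </lexeme>
  <lexeme>
    <grapheme>Newton</grapheme>
    <phoneme>njutən</phoneme>
  </lexeme>
</lexicon>
""".strip()


def lookup(root, word, pos=None):
    """Return all pronunciations of `word`, optionally filtered by POS.

    Assumption of this sketch: a lexeme with no <pos> child is treated as
    unrestricted and matches any requested POS.
    """
    results = []
    for lexeme in root.findall("pls:lexeme", NS):
        graphemes = [(g.text or "").strip()
                     for g in lexeme.findall("pls:grapheme", NS)]
        if word not in graphemes:
            continue
        tags = {(e.text or "").strip() for e in lexeme.findall("pls:pos", NS)}
        if pos is not None and tags and pos not in tags:
            continue
        results.extend((p.text or "").strip()
                       for p in lexeme.findall("pls:phoneme", NS))
    return results


root = ET.fromstring(LEXICON_XML)
print(lookup(root, "refuse", "noun"))
print(lookup(root, "refuse"))
```

Compared with the attribute form, the element form tags a whole lexeme, so several phoneme entries can share one POS and one lexeme can carry more than one POS element.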
10
Conclusion
  • Current SSML and PLS have no element or attribute
    for resolving multiple pronunciations.
  • POS information
  • Can reduce the overhead of resolving multiple
    pronunciations in ASR and TTS systems.
  • Can reduce the search time in a large vocabulary
    recognition system.
  • Can be effective for agglutinative languages.
  • Proposals
  • POS element
  • POS attribute