Automatic speech recognition of Cantonese-English code-mixing utterances

1
Automatic speech recognition of Cantonese-English
code-mixing utterances
Joyce Y. C. Chan, P. C. Ching, Tan Lee and Houwei Cao
Department of Electronic Engineering
The Chinese University of Hong Kong, Hong Kong SAR, China
  • Presenter: Hsu Ting-Wei

2
References
  • [11] Joyce Y. C. Chan, P. C. Ching and Tan Lee,
    "Development of a Cantonese-English Code-mixing
    Speech Corpus," in Proc. of Eurospeech 2005,
    pp. 1533-1536, Lisbon, 2005.
  • [13] Joyce Y. C. Chan, P. C. Ching, Tan Lee and
    Helen M. Meng, "Detection of Language Boundary in
    Code-switching Utterances by Bi-phone
    Probabilities," in Proc. of ISCSLP 2004,
    pp. 293-296, Hong Kong, 2004.
  • [6] Mirjam Wester, "Syllable Classification using
    Articulatory-Acoustic Features," in Proc. of
    Eurospeech 2003, pp. 233-236, Geneva,
    Switzerland, 2003.
  • [10] W. K. Lo, Tan Lee and P. C. Ching,
    "Development of Cantonese spoken language corpora
    for speech applications," in Proc. of ISCSLP 1998,
    pp. 102-107, Singapore, 1998.

3
Outline
  • 1. Definition
  • 2. Introduction
  • 3. Acoustic modeling
  • 4. Language modeling
  • 5. Language boundary detection (LBD)
  • 6. Experiment
  • 7. Conclusion

4
1. Definition
  • Code-switching
  • John Gumperz (1982):
  • "The juxtaposition within the same speech exchange
    of passages of speech belonging to two different
    grammatical systems or subsystems"
  • Code-mixing
  • In Hong Kong, code-switching tends to be
    intra-sentential, and switching that involves
    linguistic units above the clause level is rare;
    hence the preference for the term "code-mixing"
    in many studies.
  • Ex
    (Cantonese)

5
2. Introduction
  • Hong Kong is a truly international city and most
    people are Cantonese-English bilinguals.
  • Cantonese is usually the matrix language while
    English is the embedded language that is often
    used to better describe meanings, feelings and
    phenomena in Hong Kong.
  • However, the English words uttered by many local
    people carry a Cantonese accent, which makes
    automatic speech recognition difficult.

6
2. Introduction (cont.)
  • 2.0 Phonological structure of Cantonese and
    English
  • Cantonese
  • One of the major Chinese dialects; it belongs to
    the Sino-Tibetan language family.
  • It is monosyllabic in nature and has the general
    syllable structure C1VC2.
  • All Cantonese syllables take one of the canonical
    forms V, CV, CVC or VC.
  • English
  • English is an Indo-European language.
  • Its phonological structure is much more
    complicated than that of Cantonese.
  • In English discourse, over 80% of the syllables
    take one of the Cantonese canonical forms; the
    remainder are C, CC, CCV, VCC, CCCV, CCCVCC.

7
2. Introduction (cont.)
  • 2.1 Cantonese accent in the embedded English
    words
  • This phenomenon is called borrowing. (1990)
  • Cantonese speakers pronounce borrowed words with
    the following characteristics:
  • Softening or dropping the second consonant in a
    CC sequence, e.g. plan /p l ae n/ is pronounced
    as /p ae n/
  • Softening or dropping the final stop consonant,
    e.g. check /ch eh k/ is pronounced as /ch eh/
  • Adapting a monosyllabic word with a fricative
    ending into a disyllabic one, e.g. notes
    /n ow t s/ is pronounced as /n ow t s iy/
  • A retroflex such as /r/ is read as an /l/ or /w/
    sound, e.g. pressure /p r eh sh er/ is
    pronounced as /p l eh sh er/, and repeat
    /r iy p iy t/ is pronounced as /w iy p iy t/
  • A phone that exists in English but not in
    Cantonese is pronounced as a similar Cantonese
    phone, e.g. /th/ becomes /f/ and /eh/ becomes
    /ae/
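The accent characteristics listed above can be read as rewrite rules over phone sequences. The following is a minimal sketch of that reading, not the authors' implementation: the phone classes (`VOWELS`, `FINAL_STOPS`, `FRICATIVES`) and all function names are assumptions made for illustration, and each rule is applied on its own rather than in combination.

```python
# Hypothetical phone classes in the ARPAbet-like notation used on the slide.
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ay", "eh", "er", "ey",
          "ih", "iy", "ow", "oy", "uh", "uw"}
FINAL_STOPS = {"p", "t", "k", "b", "d", "g"}
FRICATIVES = {"s", "z", "f", "v", "sh", "zh", "th", "dh"}
# English-only phone -> similar Cantonese phone (the two pairs named on the slide).
SUBSTITUTIONS = {"th": "f", "eh": "ae"}


def drop_second_consonant(phones):
    """Drop the second consonant of the first CC sequence."""
    for i in range(len(phones) - 1):
        if phones[i] not in VOWELS and phones[i + 1] not in VOWELS:
            return phones[:i + 1] + phones[i + 2:]
    return phones


def drop_final_stop(phones):
    """Drop a word-final stop consonant."""
    if len(phones) > 1 and phones[-1] in FINAL_STOPS:
        return phones[:-1]
    return phones


def add_vowel_after_final_fricative(phones):
    """Turn a monosyllable ending in a fricative into a disyllable."""
    is_monosyllabic = sum(p in VOWELS for p in phones) == 1
    if is_monosyllabic and phones[-1] in FRICATIVES:
        return phones + ["iy"]
    return phones


def replace_r(phones, target):
    """Read /r/ as /l/ or /w/ (target is "l" or "w")."""
    return [target if p == "r" else p for p in phones]


def substitute_missing_phones(phones):
    """Replace English-only phones with similar Cantonese phones."""
    return [SUBSTITUTIONS.get(p, p) for p in phones]


def accent_variants(phones):
    """Collect the distinct variants produced by applying each rule once."""
    candidates = [
        drop_second_consonant(phones),
        drop_final_stop(phones),
        add_vowel_after_final_fricative(phones),
        replace_r(phones, "l"),
        replace_r(phones, "w"),
        substitute_missing_phones(phones),
    ]
    variants = []
    for candidate in candidates:
        if candidate != phones and candidate not in variants:
            variants.append(candidate)
    return variants


print(accent_variants("p l ae n".split()))
```

A dictionary built this way would list the canonical pronunciation together with the generated variants for each English word.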

8
2. Introduction (cont.)
  • 2.2 Phone change and syllable fusion in Cantonese
  • Hong Kong people do not use romanization systems
    when they learn Chinese or Cantonese. They may
    not know the correct pronunciation of a word
    and may confuse one phoneme with another.
  • In addition, syllable fusion may occur in fast
    speech: the second syllable of a disyllabic word
    may be dropped or changed. For example, the
    word 知道 /zi1 dou3/ may be pronounced as
    /zi1 ou3/, and 今日 /gam1 jat6/ becomes
    /gam1 mat6/. (Cantonese)
  • Both effects lead to phone insertion or phone
    deletion.

9
2. Introduction (cont.)
  • Scenario
  • 1. Preparing the monolingual and cross-lingual
    acoustic models
  • 2. Preparing the modified pronunciation
    dictionary
  • To handle accents in the code-switch words, the
    phone sequences of the English entries in the
    pronunciation dictionary are modified.
  • 3. Preparing the language models
  • Four different statistical language models are
    proposed to address the lack of code-mixing
    training text data.

10
2. Introduction (cont.)
  • Scenario
  • 4. Code-mixing speech recognizer
  • Bilingual speech recognizer, which is syllable
    based for Cantonese and word based for English.
  • Two-pass system
  • First pass
  • No language models are applied in the first pass.
  • The bilingual speech recognizer generates a
    lattice, and language boundary (LB) information
    is integrated into the lattice by re-scoring the
    acoustic scores of the hypothesized words.
  • Second pass
  • Language model scores are then integrated into
    the lattice, and the Generalized Word Posterior
    Probability (GWPP) is derived.
  • According to the GWPP score, a character-based
    hypothesis is obtained by best-path searching.
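To make the order of operations in the two passes concrete, here is a heavily simplified sketch. It assumes the lattice has been flattened into independent time slots, each holding competing hypotheses, and it uses a per-slot softmax as a crude stand-in for the GWPP; the real GWPP is computed over lattice paths and is not reproduced here. The `Hyp` class, the `bonus` value and all scores are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Hyp:
    token: str       # Cantonese character or English word
    lang: str        # "CAN" or "ENG"
    acoustic: float  # log acoustic score from the first pass
    lm: float        # log language-model score, used in the second pass


def rescore_with_lb(slot, lb_lang, bonus=1.0):
    """First pass: raise the acoustic score of hypotheses whose language
    agrees with the detected language for this slot (assumed scheme)."""
    return [
        Hyp(h.token, h.lang,
            h.acoustic + (bonus if h.lang == lb_lang else 0.0), h.lm)
        for h in slot
    ]


def posteriors(slot, lm_weight=1.0):
    """Second pass: add LM scores and normalise within the slot."""
    scores = [h.acoustic + lm_weight * h.lm for h in slot]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(slots, lb_langs):
    """Pick the hypothesis with the highest posterior in every slot."""
    best = []
    for slot, lang in zip(slots, lb_langs):
        rescored = rescore_with_lb(slot, lang)
        post = posteriors(rescored)
        best.append(rescored[post.index(max(post))].token)
    return best


# Made-up scores for two slots; the LBD result says CAN then ENG.
slots = [
    [Hyp("gam", "CAN", -2.0, -1.0), Hyp("gum", "ENG", -1.8, -1.5)],
    [Hyp("bou", "CAN", -2.0, -1.0), Hyp("bonus", "ENG", -2.5, -1.2)],
]
print(decode(slots, ["CAN", "ENG"]))
```

With these made-up numbers the English hypothesis in the second slot wins only because of the language-boundary bonus, which is the effect the first-pass re-scoring is meant to have.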

11
3. Acoustic modeling
  • Three speech corpora are involved in this
    research
  • TIMIT: monolingual English corpus (native
    speakers)
  • CUSENT: monolingual Cantonese corpus (newspaper
    content)
  • CUMIX: Cantonese-English code-mixing corpus
    (CE, C, Modified lexicon)

No accents
Cross-lingual
  • All the acoustic models are triphone models
  • The language-dependent models are monolingual
    and include 39 English phones and 56 Cantonese
    phones.

12
3. Acoustic modeling (cont.)
  • In model set C, similar phones of the two
    languages are clustered, which reduces the total
    number of phones to 70.
  • The dictionary contains an average of 2.267
    different pronunciations for each English word.

13
4. Language modeling
  • Mixing between standard Chinese and spoken
    Cantonese is another problem, since the two
    involve different lexicons and grammar.
  • Instead of searching for code-mixing text data,
    we searched for spoken Cantonese text.
  • Articles that contain selected spoken Cantonese
    characters (those that do not appear in standard
    Chinese) are selected.
  • Among the collected data, 10% are code-mixing.

14
4. Language modeling (cont.)
  • All the language models are tri-gram models and
    are character based for Cantonese.
  • Monolingual language model (CAN_LM): considers
    all English words as out-of-vocabulary (OOV).
  • Code-mixing language model (CS_LM): all English
    words share the same probability.
  • Class-based language model (CLASS_LM): classifies
    the English words into 13 classes according to
    their part-of-speech (POS) and meaning. The
    classes are adjective, companies, date and time,
    event and activities, fashion, food, brand name,
    objects and tools, human name, place, sentence
    and phrase, shops and restaurants, software, verb
    and the remaining nouns. Most of the classes are
    nouns, since nouns make up the majority of
    code-switch words.
  • Translation-based language model (TRANS_LM):
    translates the English words into their Cantonese
    equivalents if available; otherwise, uses the
    classes in CLASS_LM. The language model is still
    character based, even if the corresponding
    Cantonese contains multiple characters.
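One way to picture the difference between the four language models is as four ways of rewriting the English tokens in the training text before tri-gram counts are collected. The sketch below follows that reading; it is an assumption about the preprocessing, not the authors' code. The token names (`<OOV>`, `<ENG>`, `<BRAND_NAME>`, `<REMAINING_NOUN>`), the two lookup tables and the sample sentence are all hypothetical.

```python
# Hypothetical lookup tables; a real system would cover the whole lexicon.
CLASS_OF = {"sony": "<BRAND_NAME>"}
TRANSLATION = {"bonus": "花紅"}  # illustrative Cantonese equivalent
DEFAULT_CLASS = "<REMAINING_NOUN>"


def is_english(token):
    """Treat purely alphabetic ASCII tokens as English words."""
    return token.isascii() and token.isalpha()


def map_tokens(tokens, mode):
    """Rewrite English tokens according to the chosen language model."""
    out = []
    for tok in tokens:
        if not is_english(tok):
            out.append(tok)                # Cantonese character: keep
        elif mode == "CAN_LM":
            out.append("<OOV>")            # English treated as OOV
        elif mode == "CS_LM":
            out.append("<ENG>")            # one shared English token
        elif mode == "CLASS_LM":
            out.append(CLASS_OF.get(tok.lower(), DEFAULT_CLASS))
        elif mode == "TRANS_LM":
            translation = TRANSLATION.get(tok.lower())
            if translation is not None:
                out.extend(translation)    # stay character based
            else:
                out.append(CLASS_OF.get(tok.lower(), DEFAULT_CLASS))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out


sentence = ["今", "年", "有", "bonus", "嘅"]  # illustrative code-mixing sentence
for mode in ("CAN_LM", "CS_LM", "CLASS_LM", "TRANS_LM"):
    print(mode, map_tokens(sentence, mode))
```

Note how the TRANS_LM branch splits a multi-character translation into single characters, which keeps the model character based as the slide requires.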

15
5. Language boundary detection (LBD) (cont.)
  • General equation for intra-syllable bi-phone
    probability is given by

16
5. Language boundary detection (LBD) (cont.)
  • The same character may have different phone
    sequences when it has different meanings.
  • For example, the character 行 can be pronounced as
    /haang/, /hong/ and /hang/ in different phrases.
    The following example calculates the
    probability that 行 is pronounced as /haang/.

17
5. Language boundary detection (LBD) (cont.)
  • Ex
    Phone based:     g-am  n-in  j-au  B OW N AH S  g-e
    Intra bi-phone:  g_am  n_in  j_au  B_OW  OW_N  N_AH  AH_S  g_e
    Probability:     CAN   ENG   CAN   ENG   ENG   CAN   CAN   CAN
    Runs:            CAN(1) ENG(1) CAN(1) ENG(2) CAN(3)
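The bi-phone labelling in this example can be sketched as follows: form the intra-unit bi-phones, label each one with the language whose bi-phone probability is higher, then merge consecutive identical labels into runs. This is a minimal illustration under stated assumptions. The probability tables `P_CAN` and `P_ENG` contain made-up numbers chosen only for illustration, and the decision rule (a plain comparison with a floor for unseen bi-phones) is an assumption rather than the formula from the paper.

```python
from itertools import groupby


def intra_biphones(units):
    """Pair consecutive phones inside each unit (syllable or word)."""
    return [f"{a}_{b}" for unit in units for a, b in zip(unit, unit[1:])]


def label_biphones(biphones, p_can, p_eng, floor=1e-6):
    """Label each bi-phone with the language that gives it higher probability."""
    return [
        "CAN" if p_can.get(bp, floor) >= p_eng.get(bp, floor) else "ENG"
        for bp in biphones
    ]


def runs(labels):
    """Collapse consecutive identical labels into (language, count) pairs."""
    return [(lang, len(list(group))) for lang, group in groupby(labels)]


# The example utterance: three Cantonese syllables, one English word,
# one Cantonese syllable.
units = [["g", "am"], ["n", "in"], ["j", "au"],
         ["B", "OW", "N", "AH", "S"], ["g", "e"]]

# Made-up bi-phone probabilities for each language.
P_CAN = {"g_am": 0.020, "n_in": 0.004, "j_au": 0.015, "B_OW": 0.001,
         "OW_N": 0.002, "N_AH": 0.006, "AH_S": 0.005, "g_e": 0.018}
P_ENG = {"g_am": 0.002, "n_in": 0.009, "j_au": 0.001, "B_OW": 0.012,
         "OW_N": 0.010, "N_AH": 0.003, "AH_S": 0.004, "g_e": 0.001}

biphones = intra_biphones(units)
labels = label_biphones(biphones, P_CAN, P_ENG)
print(biphones)
print(runs(labels))
```

Because every bi-phone is judged in isolation, short runs of the "wrong" language can appear inside a word, which is one reason to consider larger units.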
18
6. Experiment

19
6. Experiment (cont.)
  • However, when there are accents, the syllable
    structure of the code-switch words changes, so
    the English words sound like Cantonese words.
  • To tackle the problems caused by accents, larger
    units should be considered.
  • Hence, we propose to use a syllable-based LBD, or
    to apply LBD algorithms to the lattice generated
    by a bilingual speech recognizer.
  • The lattice-based LBD approach searches the word
    lattice for the English word (WE) with the
    longest duration.
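The lattice-based search described above can be sketched in a few lines. The sketch assumes the lattice is available as a flat list of arcs with a language tag and start and end times in seconds; the `Arc` class, its field names and the sample times are hypothetical, and using the selected word's start and end as the language boundaries is an assumption about how the result is used.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    token: str    # hypothesized Cantonese syllable or English word
    lang: str     # "CAN" or "ENG"
    start: float  # start time in seconds
    end: float    # end time in seconds


def longest_english_word(lattice):
    """Return the English arc with the longest duration, or None."""
    english = [arc for arc in lattice if arc.lang == "ENG"]
    if not english:
        return None
    return max(english, key=lambda arc: arc.end - arc.start)


def language_boundaries(lattice):
    """Use the longest English word's start and end as the boundaries."""
    we = longest_english_word(lattice)
    return None if we is None else (we.start, we.end)


# Made-up lattice with two competing English hypotheses.
lattice = [
    Arc("gam", "CAN", 0.0, 0.2),
    Arc("bow", "ENG", 0.6, 0.8),
    Arc("bonus", "ENG", 0.6, 1.1),
    Arc("ge", "CAN", 1.1, 1.3),
]
print(longest_english_word(lattice).token, language_boundaries(lattice))
```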

20
6. Experiment (cont.)

21
7. Conclusion
  • The duration of English words is longer than that
    of Cantonese characters, since Cantonese is
    monosyllabic. Hence, the lattice-based LBD
    algorithm obtains a higher LBD accuracy.
  • When the correct language boundary is obtained,
    the recognition accuracy of the code-switch
    words can be increased.
  • Therefore, studies on language boundary detection
    are necessary for further research.