Speech to Speech Machine Translation (S2SMT)

1
Speech to Speech Machine Translation (S2SMT)
  • Kapita Selekta, 26 November 2005
  • Suyanto

2
Overview
  • Motivation
  • S2SMT System
  • Applications
  • Conclusion
  • Discussion

3
VerbMobil
4
Motivation
  • About 6,500 living languages (www.ling.gu.se)
  • Translation market (Donald Barabé, 2003)
  • - 8 billion global market
  • - Doubling every five years

5
Motivation (cont.)
6
Motivation (cont.)
Customer Service Department in a Chinese company
with 10,000 employees (Chinese and English)

Staffers             Words/Day   Translation Time (hrs)   Percentage (%)
Customer Feedback    235,000     350                      39
Prospective Orders   150,000     228                      25
Technical Support    156,000     235                      26
Dealer Feedback       60,000      93                      10
Total                601,000     906                      100
7
Questions
  • Is it possible to develop S2SMT for these problems?
  • What are the challenges?

8
S2SMT
Source Language Utterance
  → (ASR) → Source Language Text
  → (MT) → Target Language Text
  → (TTS) → Target Language Utterance
Kurt Godden, 2002
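
The four stages above can be read as a chain of three components: ASR, MT, and TTS. The following is only a minimal sketch of that chain. The function names, the `Utterance` class, and the three-word Indonesian-to-English lexicon are hypothetical stand-ins invented for illustration. Each stage is a stub, not a real recognizer, translator, or synthesizer.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Stand-in for audio. A real system would hold waveform samples."""
    language: str
    content: str  # assumption: the 'audio' is represented by its transcript


# Toy lexicon, illustrative only (not from the slides).
TOY_LEXICON = {"tolong": "please", "satu": "one", "kamar": "room"}


def recognize(utterance: Utterance) -> str:
    """ASR stub: source language utterance -> source language text."""
    return utterance.content.lower()


def translate(text: str) -> str:
    """MT stub: word-by-word lookup; unknown words are kept in <brackets>."""
    return " ".join(TOY_LEXICON.get(word, f"<{word}>") for word in text.split())


def synthesize(text: str, language: str) -> Utterance:
    """TTS stub: target language text -> target language utterance."""
    return Utterance(language=language, content=text)


def speech_to_speech(utterance: Utterance, target_language: str = "en") -> Utterance:
    """Chain the three stages in the order shown on the slide."""
    source_text = recognize(utterance)
    target_text = translate(source_text)
    return synthesize(target_text, target_language)


result = speech_to_speech(Utterance("id", "Tolong satu kamar"))
print(result.content)
```

Because each stage only consumes the output of the previous one, an ASR error is passed on to MT and TTS unchanged in this arrangement.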
9
ASR (Automatic Speech Recognition)
10
ASR Challenges
  • Co-articulation
  • Speaker independence
  • Dialect variations
  • Non-native speakers
  • Spontaneous speech
  • Disfluencies
  • Out-of-vocabulary words
  • Noise robustness
  • Convolutive noise: recording/transmission conditions
  • Additive noise: recording environment, transmission SNR
  • Intra-speaker variability: stress, age, mood
  • Prosody
  • Intonation, stress, and phrase boundaries
  • Emotion

11
ASR approaches
  • Word-based ASR (1970s)
  • - Recognizes a word as a whole pattern
  • - For special purposes: isolated digits, connected
    words
  • - How many words do you need to develop an
    application?
  • Syllable-based ASR
  • - Recognizes a word as a set of syllable patterns
  • - In English, there are about 10,000 syllables.
  • Phoneme-based ASR (widely used today)
  • - Recognizes a word as a set of phoneme patterns
  • - Suitable for general purposes
  • - In English, there are about 50 phonemes.
  • - In Indonesian, only 32 phonemes.

12
Phoneme-based ASR
  • Today, it is the most realistic approach.
  • In English, only about 50 phonemes need to be
    recognized.
  • To develop a speech corpus, we should design a
    sentence set that is tri-phone balanced (e.g.
    sil-a+sil, c-a+sil, c-a+n).
  • For 50 phonemes, we need about 125,000
    tri-phones (50^3).
  • For 10,000 syllables, we need 10^12 tri-syllables
    (complicated!)
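
The two counts on this slide follow from cubing the size of the unit inventory, since a tri-phone or tri-syllable is a unit together with its left and right neighbour. A short check of the arithmetic (the function name is just illustrative):

```python
def context_dependent_units(inventory_size: int, span: int = 3) -> int:
    """Upper bound on distinct units of `span` consecutive symbols."""
    return inventory_size ** span


print(context_dependent_units(50))      # tri-phones for 50 phonemes
print(context_dependent_units(10_000))  # tri-syllables for 10,000 syllables
```

These are upper bounds. Many combinations never occur in a real language, so a balanced corpus needs to cover fewer of them in practice.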

13
Today's ASR
  • Phoneme-based approach using statistical models
    (HMM or hybrid HMM/ANN) for acoustics and
    linguistics
  • Large vocabulary, speaker independent
T. Dutoit, 2002
14
Language Model
  • n-gram models (the trigram is widely used for
    ASR)
  • The probability of a sentence is estimated from
    the conditional probabilities of each word given
    the n-1 preceding words (T. Dutoit, 2002)
P(The red hat linux) = P(The | _, _) · P(red | The, _)
                       · P(hat | red, The) · P(linux | hat, red)
  • Helps resolve co-articulation/dialect confusions
    among hat, had, head, heat, hate ...
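
The trigram factorization above can be sketched with plain counts. This is a minimal maximum-likelihood version with no smoothing, trained on a three-sentence toy corpus made up for illustration. The function names are hypothetical. Context is stored oldest word first, whereas the slide writes the nearest word first.

```python
from collections import Counter

PAD = "_"  # sentence-start padding, as on the slide


def train_trigram(sentences):
    """Count trigrams and their two-word contexts."""
    trigram_counts, context_counts = Counter(), Counter()
    for sentence in sentences:
        words = [PAD, PAD] + sentence.split()
        for i in range(2, len(words)):
            context = (words[i - 2], words[i - 1])
            trigram_counts[context + (words[i],)] += 1
            context_counts[context] += 1
    return trigram_counts, context_counts


def sentence_probability(sentence, trigram_counts, context_counts):
    """Product of P(word | two preceding words), unsmoothed."""
    words = [PAD, PAD] + sentence.split()
    probability = 1.0
    for i in range(2, len(words)):
        context = (words[i - 2], words[i - 1])
        if context_counts[context] == 0:
            return 0.0  # unseen context: no estimate without smoothing
        probability *= trigram_counts[context + (words[i],)] / context_counts[context]
    return probability


corpus = ["The red hat linux", "The red hat shop", "The red car"]
tri, ctx = train_trigram(corpus)
print(sentence_probability("The red hat linux", tri, ctx))
```

With this toy corpus the factors are 1, 1, 2/3, and 1/2. Any sentence containing an unseen trigram gets probability zero, which is why practical systems add smoothing or back-off.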
15
Language Model (cont.)
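  • An example in Bahasa Indonesia: two word
    sequences that sound almost the same
  • - Satu kantornya (roughly "one office")
  • - Satukan TOR-nya (roughly "combine the TOR")
  • If the sentences are preceded by "tolong"
    ("please"):
  • - Tolong satu kantornya
  • - Tolong satukan TOR-nya
  • A 3-gram model is better than a 2-gram model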

16
IBM trigram example
17
IBM trigram example (cont.)
18
Language Model (cont.)
  • Advantages
  • - Robust and efficient
  • - Increases accuracy from 85% to 97% (T. Dutoit)
  • Problems
  • - Limited to local linguistic structure only
  • - A vocabulary of size V has V^n possible n-grams
  • - e.g. 20,000 words give 8 trillion trigrams!
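
The V^n growth can be checked directly, and compared with how few n-grams a corpus actually contains. This is a small sketch with hypothetical function names. The two-sentence corpus reuses the Indonesian example from the earlier slide and is far too small to be representative.

```python
def possible_ngrams(vocabulary_size: int, n: int) -> int:
    """Number of distinct n-grams over a vocabulary of the given size."""
    return vocabulary_size ** n


def observed_ngrams(sentences, n: int = 3):
    """Set of n-grams that actually occur in the sentences."""
    seen = set()
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - n + 1):
            seen.add(tuple(words[i:i + n]))
    return seen


print(possible_ngrams(20_000, 3))  # the slide's 20,000-word example

corpus = ["tolong satu kantornya", "tolong satukan TOR-nya"]
vocabulary = {word for sentence in corpus for word in sentence.split()}
print(len(vocabulary), possible_ngrams(len(vocabulary), 3), len(observed_ngrams(corpus)))
```

The gap between possible and observed trigrams is the data-sparsity side of the problem: most trigrams are never seen in training, however large the corpus.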

19
ASR Performance
T. Dutoit, 2002 - Faculté Polytechnique de Mons, Belgium
20
Today's ASR
  • Large Vocabulary Continuous Speech Recognition
    (LVCSR)
  • Minimum vocabulary: 10,000 words
  • Continuous speech
  • Speaker independent

21
Machine Translation (MT)
22
MT Challenges
  • Orthographic Variations
  • Ambiguous spelling
  • Ambiguous word boundaries
  • Lexical Ambiguity
  • Eat → essen (human) vs. fressen (animal)
  • ? he-wrote vs. it-was-written vs. books

23
MT Challenges
  • Morphological Variations
  • Affixation vs. Root+Pattern
  • write → written
  • kill → killed
  • do → done
  • Translation Divergences

24
MT Approaches
  • Grammar-based
  • - Interlingua-based
  • - Transfer-based
  • Direct
  • - Example-based
  • - Statistical

MT Pyramid
Nizar Habash 2004 - Columbia University
25
Multi-Engine Machine Translation
  • Idea: take the output from different translation
    engines and produce a better overall translation
  • Get the best from different worlds
  • - High quality but low coverage from translation
    memory, interlingua system
  • - High coverage but lower quality from statistical
    system
  • How to get a better translation?
  • - Select one translation, i.e. work at the sentence
    level
  • - Create a new one, i.e. combine partial
    translations from different engines
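
The first option, selecting one translation at the sentence level, can be sketched as follows. The selection criterion used here (pick the candidate that agrees most with the other engines, measured by word overlap) is an assumption chosen for illustration. The slides do not say how the selection is made. The engine names and candidate sentences are also made up.

```python
def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the word sets of two sentences."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def select_translation(candidates: dict) -> tuple:
    """Return (engine, translation) with the highest average agreement."""
    def consensus(engine: str) -> float:
        others = [text for name, text in candidates.items() if name != engine]
        if not others:
            return 0.0
        return sum(word_overlap(candidates[engine], text) for text in others) / len(others)

    best_engine = max(candidates, key=consensus)
    return best_engine, candidates[best_engine]


candidates = {
    "translation_memory": "please book a room",
    "interlingua": "please reserve a room",
    "statistical": "please book one room",
}
print(select_translation(candidates))
```

A real multi-engine system would more likely use engine confidence scores or a target language model for this choice. The second option on the slide, building a new sentence from partial translations, needs alignment between the candidates and is not shown.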

26
Text-to-Speech (TTS)
27
TTS Challenges
  • Accurate automatic phonetization (≠ dictionary
    look-up)
  • Prosody generation (i.e., intonation and phoneme
    durations) must be coherent: it is easy to
    produce unnatural prosody
  • Synthesize phoneme sequences with corresponding
    prosody
  • - Co-articulation!
  • - Segmental quality should be maintained after
    pitch and duration modification
  • Engineering
  • - Low design and maintenance cost
  • - Low computational and memory cost
  • - Easy adaptation to other languages

28
TTS Diagram
T. Dutoit, 2002 - Faculté Polytechnique de Mons, Belgium
29
Automatic Phonetization
30
Automatic Phonetization
More complex than that!
31
Intonation
  • Why ups and downs?
  • Stress (word level) vs. accent (phrase level)
  • Modify slightly → unnatural

32
Phoneme Duration
  • Not constant
  • Not fixed for a given phoneme
  • Linked to intonation (longer on accented
    syllables)

33
Applications of S2SMT
Joy (Ying Zhang), 2003, CMU
34
VerbMobil
35
AT&T
36
S2SMT Advantages
  • Data transmission
  • Voice format to text format
  • GSM: from 8 kbps to 40 bps (a 200-fold reduction!)

Can you put S2SMT on the client?
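
The 200-fold figure is simply the ratio of the two bit rates. The slide does not say how 40 bps was obtained. One way to arrive at it, assumed here purely for illustration, is text at about 5 characters per second with 8 bits per character.

```python
def reduction_factor(voice_bps: float, text_bps: float) -> float:
    """How many times smaller the text stream is than the voice stream."""
    return voice_bps / text_bps


def text_bitrate(chars_per_second: float, bits_per_char: int = 8) -> float:
    """Bit rate of plain text; both arguments are illustrative assumptions."""
    return chars_per_second * bits_per_char


print(reduction_factor(8_000, 40))  # 8 kbps voice vs. 40 bps text
print(text_bitrate(5))              # assumed 5 characters per second
```

The saving only applies if recognition runs on the client, so that text rather than voice is sent over the network. That is the point of the question on the slide.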
37
S2SMT Disadvantages
  • The original voice of the speaker?
  • Unnatural intonation and emotion
  • Delay

38
Conclusion
  • At the moment, it is possible to develop S2SMT
    for small special purposes, e.g. reservation,
    helpdesk, etc.
  • The main problem is ASR
  • MT and TTS are reasonably acceptable
  • Many remaining challenges in S2SMT

39
Discussion
  • Coupling of ASR and MT

Takezawa et al. 98
40
Discussion
  • Personalized TTS

Takezawa et al. 98
41
Discussion
  • How about Bahasa Indonesia?
  • Population: 240 million people
  • PT TELKOM has 30 million (12.5%) customers (fixed
    and wireless)
  • Other operators have millions of customers (say,
    20 million)
  • Prospective market for S2SMT?

42
Discussion
  • TELKOM RisTI, ATR (www.atr.jp), and STT Telkom are
    developing Indonesian text and speech corpora
  • Text corpus
  • - 5,000 sentences
  • - Extracted from news and application domains
  • Speech corpus
  • - 400 people (200 male and 200 female)
  • - 4 dialects: Javanese, Sundanese, Jakarta, Batak
  • - 4 age categories: 18-23, 24-35, 36-50, 51-60
  • We need 100 students to record utterances!

43
Discussion
  • Target Applications (not translation)
  • E-governance (status tracking of IMB [building
    permit], PBB [land and building tax], KTP
    [identity card], etc.)
  • Billing info
  • Audio conference (Reservation)
  • Tele Home-Security
  • Telecommunication system for deaf and
    speech-impaired users
  • To develop S2SMT, we need experts in linguistics,
    computer science, electronics engineering,
    communications, etc.

44
Thank you
  • Any questions?