1
The Use of Speech in Speech-to-Speech Translation
  • Andrew Rosenberg
  • 8/31/06
  • Weekly Speech Lab Talk

2
Candidacy Exam Organization
  • Use and Meaning of Intonation
  • Automatic Analysis of Intonation
  • Applications
      • Speech-to-Speech Translation
      • L2 Learning Systems

3
The Use of Speech in Speech-to-Speech Translation
  • The Use of Prosodic Event Information
      • On the Use of Prosody in a Speech-to-Speech
        Translator (Strom et al. 1997)
      • A Japanese-to-English Speech Translation System:
        ATR-MATRIX (Takezawa et al. 1998)
  • Cascaded / Loosely Coupled Approaches
      • Janus-III: Speech-to-Speech Translation in
        Multiple Languages (Lavie et al. 1997)
      • A Unified Approach in Speech Translation:
        Integrating Features of Speech Recognition and
        Machine Translation (Zhang et al. 2004)
  • Integrated / Tightly Coupled Approaches
      • Finite-State Speech-to-Speech Translation
        (Vidal 1997)
      • On the Integration of Speech Recognition and
        Statistical Machine Translation (Matusov et al. 2005)
      • Coupling vs. Unifying Modeling Techniques for
        Speech-to-Speech Translation (Gao 2003)

5
On the Use of Prosody in a Speech-to-Speech
Translator (Strom et al. 1997)
  • INTARC - German-English Translator produced for
    VERBMOBIL project.
  • Spontaneous, limited domain (appointment
    scheduling)
  • 80 minutes of prosodically labeled speech
  • Phrase Boundary (PB) Detector
  • Gaussian classifier based on F0, energy and time
    features with a 4-syllable window (accuracy 80.76%)
  • Focus Detector
  • Rule-based approach: identifies the location of
    the steepest F0 decline (accuracy 78.5%)
  • Syntactic parsing search space is reduced by 65%
  • Baseline syntactic parsing uses:
  • Decoder factor: product of acoustic and bi-gram
    scores
  • Grammar factor: grammar model probability of a
    parse using the hypothesized word
  • Prosody factor: 4-gram model of prosodic events
    (focus and PB)
  • Semantic parsing search space is reduced by 24.7%
  • The semantic grammar was augmented, labeling
    rules as segment-connecting (SC) and
    segment-internal (SI)
  • SC rules are applied when there is a PB between
    segments; SI rules are applied when there is not.
  • Ideal phrase boundaries reduced the number of
    hypotheses by 65.4% (analysis trees by 41.9%)
  • Automatically hypothesized PBs required a backoff
    mechanism to handle errors and PBs that are not
    aligned with grammatical phrase boundaries.
  • Prosodically driven translation is used when deep
    transfer (translation) fails
  • A focused word determines (probabilistically) a
    dialog act which is translated based on available
    information from the word chain.
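
The phrase boundary detector above is described only as a Gaussian classifier over F0, energy and time features. A minimal sketch of that kind of classifier follows. The diagonal covariance, the three features, the equal priors and all numbers are assumptions made for illustration; none of them come from the INTARC system.

```python
import math

def fit_gaussian(rows):
    """Estimate per-feature mean and variance (diagonal covariance) for one class."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    variances = [
        max(sum((r[d] - means[d]) ** 2 for r in rows) / n, 1e-6)  # floor avoids zero variance
        for d in range(dims)
    ]
    return means, variances

def log_likelihood(x, model):
    """Log density of x under an axis-aligned Gaussian."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )

def classify(x, models, priors):
    """Return the class with the highest log prior plus log likelihood."""
    return max(models, key=lambda c: math.log(priors[c]) + log_likelihood(x, models[c]))

# Invented toy features per word boundary: [F0 drop (Hz), energy drop (dB), pause (s)]
boundary = [[30.0, 6.0, 0.40], [25.0, 5.0, 0.35], [35.0, 7.0, 0.50]]
no_boundary = [[2.0, 0.5, 0.02], [5.0, 1.0, 0.05], [3.0, 0.8, 0.00]]

models = {"PB": fit_gaussian(boundary), "noPB": fit_gaussian(no_boundary)}
priors = {"PB": 0.5, "noPB": 0.5}  # assumed equal priors

label_a = classify([28.0, 5.5, 0.38], models, priors)
label_b = classify([4.0, 0.9, 0.03], models, priors)
print(label_a, label_b)
```

A real detector would compute its features over the 4-syllable window mentioned above and train on the labeled speech; the sketch only shows the classification step.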

6
A Japanese-to-English Speech Translation System:
ATR-MATRIX (Takezawa et al. 1998)
  • Limited domain translation system (Hotel
    Reservations)
  • Cascaded approach
  • ASR: sequential model, 2k-word vocabulary
  • MT: syntactically driven, 12k-word vocabulary
  • TTS: CHATR (now unit selection, then
    concatenative)
  • Early Example of Interactive Speech-to-Speech
    Translation
  • When the system has low confidence in either
    recognition or MT outputs, it prompts the user
    for corrections.
  • Speech Information is used in three ways in
    ATR-MATRIX
  • Voice Selection
  • Based on the source voice, either a male or
    female voice is used for synthesis
  • Hypothesized phrase boundaries
  • Using pause information along with POS N-gram
    information, the source utterance is divided into
    meaningful chunks for translation.
  • Phrase Final Behavior
  • If a phrase-final rise is detected, it is passed
    to the MT module as a lexical item potentially
    indicating a question.
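
The last two uses of speech information above can be sketched together: split the utterance at long pauses, and append a marker token when a chunk ends in a rise. This is a simplified illustration, not the ATR-MATRIX method. It uses pause duration only (the POS N-gram information is omitted), and the threshold, the `<RISE>` token and the input record format are all hypothetical.

```python
def chunk_utterance(words, pause_threshold=0.3, rise_token="<RISE>"):
    """Split word records into chunks at long pauses.

    Each record is (word, pause_after_seconds, final_rise). final_rise is True
    when a (hypothetical) pitch tracker saw a rising contour on that word.
    A rise is only kept when the word ends a chunk.
    """
    chunks, current = [], []
    for i, (word, pause_after, final_rise) in enumerate(words):
        current.append(word)
        is_last = i == len(words) - 1
        if pause_after >= pause_threshold or is_last:
            if final_rise:
                # Pass the rise to MT as an extra lexical item.
                current.append(rise_token)
            chunks.append(current)
            current = []
    return chunks

# English glosses stand in for the Japanese source words.
utterance = [
    ("single", 0.02, False),
    ("room", 0.45, False),
    ("is", 0.01, False),
    ("one", 0.03, False),
    ("available", 0.00, True),
]

chunks = chunk_utterance(utterance)
print(chunks)
```

Treating the rise as an ordinary token means the MT module needs no special interface for prosody; it simply sees one more item in its input.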

8
Janus-III: Speech-to-Speech Translation in
Multiple Languages (Lavie et al. 1997)
  • Interlingua and Frame-Slot based Spanish-English
    translation
  • Limited domain (conference registration),
    spontaneous speech
  • Cascaded Approach
  • Two semantic parsing techniques
  • GLR interlingua parsing (transcript: 82.9%, ASR:
    54%)
  • Manually constructed grammar to parse input into
    interlingua
  • Robust: does not require grammatically
    correct input
  • Search for the maximal subset covered by the
    grammar
  • Generation is performed by an interlingua
    generator
  • Phoenix (transcript: 76.3%, ASR: 48.6%)
  • identifies key concepts and their structure
  • parsing grammar contains specific patterns which
    represent domain concepts
  • The patterns are then compiled into a recursive
    transition network
  • Each concept has one or more fixed phrasings in
    the target language
  • Phoenix is used as a backoff when GLR fails.
  • Transcript: 83.3%, ASR: 63.6%
  • Late stage disambiguation
  • Multiple translations are processed through the
    whole system.
  • Translation hypothesis selection occurs just
    before generation using scores from recognition,
    parsing and discourse processing.
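
Two ideas from this slide, parser backoff and late-stage selection, can be sketched as follows. The two parser functions are toy stand-ins invented for illustration; they are not the GLR or Phoenix parsers. The score names, the weights and the assumption that scores are log-domain values combined by a weighted sum are also hypothetical.

```python
def translate_with_backoff(text, primary, fallback):
    """Try the primary parser; if it returns None, use the fallback parser."""
    result = primary(text)
    if result is not None:
        return "primary", result
    return "fallback", fallback(text)

def select_translation(candidates, weights):
    """Pick the candidate with the highest weighted sum of component scores."""
    def total(candidate):
        return sum(weights[name] * candidate["scores"][name] for name in weights)
    return max(candidates, key=total)

def toy_grammar_parser(text):
    """Toy stand-in: succeeds only if every word is covered by a tiny vocabulary."""
    known = {"i", "want", "to", "register"}
    return f"interlingua({text})" if set(text.split()) <= known else None

def toy_concept_parser(text):
    """Toy stand-in: keeps only words that name a known domain concept."""
    concepts = [w for w in text.split() if w in {"register", "cancel"}]
    return f"concepts({', '.join(concepts)})"

clean = translate_with_backoff("i want to register", toy_grammar_parser, toy_concept_parser)
noisy = translate_with_backoff("uh register me", toy_grammar_parser, toy_concept_parser)

# Late-stage selection: all hypotheses are carried to the end, then one is chosen.
candidates = [
    {"text": "I would like to register",
     "scores": {"recognition": -12.0, "parsing": -3.0, "discourse": -1.0}},
    {"text": "I would like two register",
     "scores": {"recognition": -11.5, "parsing": -8.0, "discourse": -2.5}},
]
weights = {"recognition": 1.0, "parsing": 1.0, "discourse": 1.0}
best = select_translation(candidates, weights)["text"]
print(clean, noisy, best)
```

In the example, the second candidate has the better recognition score, but the parsing and discourse scores outweigh it, which is the point of delaying the choice until all scores are available.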

9
A Unified Approach in Speech Translation:
Integrating Features of Speech Recognition and
Machine Translation (Zhang et al. 2004)
  • Process many hypotheses, then select one.
  • In a cascaded architecture
  • HMM-based ASR produces N-best recognition
    hypotheses
  • IBM Model 4 MT processes all N.
  • Rescore MT hypotheses based on weighted
    log-linear combination of ASR and MT features.
  • Construct the feature weight model by optimizing
    a translation distance metric (mWER, mPER, BLEU,
    NIST)
  • Experiment Results
  • Corpus: 162k/510/508 Japanese-English parallel
    sentences
  • Baseline: no optimization of MT features
  • Substantial improvement was obtained by
    optimizing feature weights based on distance
    metric
  • Additional improvement was achieved by including
    ASR features
  • Translation of N-best ASR hypotheses improved
    sentence translation accuracy of incorrectly
    recognized 1-best hypotheses by 7.5%
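
The rescoring step above is a weighted log-linear combination of ASR and MT features. A minimal sketch is shown below. The feature names, the feature values and the hand-set weights are invented for illustration; in the work summarized above the weights are optimized against a translation metric rather than chosen by hand.

```python
def rescore(hypotheses, weights):
    """Sort hypotheses by the weighted sum of their log-domain features, best first.

    Features missing from `weights` are ignored, so an MT-only weight set
    reproduces a system that does not use ASR features.
    """
    def score(h):
        return sum(weights[name] * h["features"][name] for name in weights)
    return sorted(hypotheses, key=score, reverse=True)

# Two translations, each derived from a different ASR hypothesis (toy numbers).
hypotheses = [
    {"text": "translation A",
     "features": {"asr_acoustic": -50.0, "asr_lm": -20.0, "mt_tm": -10.0, "mt_lm": -8.0}},
    {"text": "translation B",
     "features": {"asr_acoustic": -40.0, "asr_lm": -18.0, "mt_tm": -12.0, "mt_lm": -9.0}},
]

mt_only = {"mt_tm": 1.0, "mt_lm": 1.0}
asr_and_mt = {"asr_acoustic": 0.5, "asr_lm": 0.5, "mt_tm": 1.0, "mt_lm": 1.0}

best_mt_only = rescore(hypotheses, mt_only)[0]["text"]
best_combined = rescore(hypotheses, asr_and_mt)[0]["text"]
print(best_mt_only, best_combined)
```

With MT features alone, translation A scores -18 against -21 for B. Adding the ASR features at weight 0.5 gives -53 for A and -50 for B, so the choice flips to the hypothesis the recognizer was more confident in.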

11
Finite-State Speech-to-Speech Translation
(Vidal 1997)
  • FSTs can naturally be applied to translation.
  • FSTs for statistical MT can be learned from
    parallel corpora. (OSTIA)
  • Speech input is handled in two ways
  • Baseline cascaded approach
  • Integrated approach
  • Create an FST on text, replace each edge with an
    acoustic model of the lexical item
  • A major drawback of this approach is its large
    training data requirement.
  • Align the source and target utterances, reducing
    their asynchronicity
  • Cluster lexical items, reducing the vocabulary
    size
  • Proof of concept experiment
  • Text: 30 lexical items used in 16k paired
    sentences (Spanish-English)
  • Greater than 99% translation accuracy is achieved
  • Speech: 50k/400 (training/testing) paired
    utterances, spoken by 4 speakers
  • Best performance: 97.2% translation accuracy,
    97.4% recognition accuracy
  • Requires inclusion of source and target 4-gram
    LMs in FST training.
  • Travel domain experiment
  • Text: 600 lexical items in 169k/2k paired
    sentences
  • 0.7% translation WER with categorization, 13.3%
    WER without
  • Speech: 336 test utterances (3k words) spoken by
    4 speakers
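
To make the idea of translating with a finite-state transducer concrete, here is a small hand-written subsequential transducer for a few Spanish phrases. It is an illustration only: the states, the vocabulary and the outputs are invented, and nothing here is learned from data as OSTIA would do. The integrated approach described above would further replace each input word with an acoustic model, which this sketch does not attempt.

```python
def transduce(words, transitions, final_output, start=0):
    """Run a deterministic transducer over a list of source words.

    transitions maps (state, source_word) to (target_words, next_state).
    final_output maps each accepting state to the target words emitted at the end.
    Returns the target sentence, or None if the input is not accepted.
    """
    state, output = start, []
    for word in words:
        if (state, word) not in transitions:
            return None
        emitted, state = transitions[(state, word)]
        output.extend(emitted)
    if state not in final_output:
        return None
    output.extend(final_output[state])
    return " ".join(output)

# Output for "habitación" is delayed, because English puts the adjective first.
transitions = {
    (0, "una"): (["a"], 1),
    (1, "habitación"): ([], 2),
    (2, "doble"): (["double", "room"], 3),
    (2, "individual"): (["single", "room"], 3),
}
final_output = {2: ["room"], 3: []}

double_room = transduce("una habitación doble".split(), transitions, final_output)
plain_room = transduce("una habitación".split(), transitions, final_output)
rejected = transduce("una doble".split(), transitions, final_output)
print(double_room, plain_room, rejected)
```

The delayed output shows why word order differences between languages are manageable for this kind of machine, and also why reducing asynchrony between source and target helps keep the transducer small.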

12
On the Integration of Speech Recognition and
Statistical Machine Translation (Matusov et al.
2005)
  • Use word lattices weighted by HMM ASR scores as
    input to a weighted FST for translation
  • Noisy Channel Model
  • Using an alignment model, A
  • Instead of modeling the alignment, search for the
    best alignment
  • Evaluation
  • Material: 4 parallel corpora
  • Spontaneous speech in the travel domain
  • 3k-66k paired sentences in Italian-English,
    Spanish-English and Spanish-Catalan
  • Vocabulary size: 1.7k-15k words
  • On all metrics (mWER, mPER, BLEU, NIST), the
    translation results rank as follows, best first:
  • Correct text
  • Word lattice w/ acoustic scores
  • Fully integrated ASR and MT (FUB Italian-English
    only)
  • Word lattice w/o acoustic scores
  • Single best ASR hypothesis (lower mPER than
    lattice w/o scores on FUB I-E)
  • Denser ASR lattices yield reduced translation WER
    (on FUB Italian-English)
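
The equations on this slide did not survive in the transcript. As a reminder of what the noisy channel bullets refer to, a generic textbook formulation is sketched below. It is not recovered from the slide and may differ from the exact model in the paper. The symbols are assumptions: x is the acoustic observation sequence, f the source-language word sequence, e the target-language word sequence, and A the alignment. The second line assumes that x is independent of e given f.

```latex
\begin{align}
\hat{e} &= \arg\max_{e} \Pr(e \mid x) \\
        &= \arg\max_{e} \Pr(e) \sum_{f} \Pr(f \mid e)\, \Pr(x \mid f) \\
        &= \arg\max_{e} \Pr(e) \sum_{f} \sum_{A} \Pr(f, A \mid e)\, \Pr(x \mid f) \\
        &\approx \arg\max_{e} \max_{f, A} \Pr(e)\, \Pr(f, A \mid e)\, \Pr(x \mid f)
\end{align}
```

The last line is the step described above as searching for the best alignment instead of modeling it: the sums over f and A are replaced by a maximization.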

13
Coupling vs. Unifying Modeling Techniques for
Speech-to-Speech Translation (Gao 2003)
  • Application of direct modeling to ASR, with the
    goal of direct modeling of interlingua text for
    MT.
  • A direct model of target text from source
    acoustics could also be constructed using this
    approach.
  • Composing models (e.g., noisy channel models) can
    lead to local or sub-optimal solutions
  • Direct Modeling tries to avoid these by creating
    a single maximum entropy model
  • p(text | acoustics, ...)
  • Direct modeling can also include other
    non-independent observations (features).
  • Major considerations
  • To reduce computational complexity, acoustic
    features are quantized.
  • Since the feature vector can get very large,
    reliable feature selection is necessary.
  • In preliminary experiments, 150M features were
    reduced to 500K via feature selection
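
A single maximum entropy model of p(text | acoustics, ...) with quantized acoustic features can be sketched as below. This is a toy illustration under stated assumptions: the feature templates, the bin width, the vocabulary and the hand-set weights are all invented, and no training or feature selection is shown.

```python
import math

def quantize(value, bin_width=1.0):
    """Map a continuous acoustic value to an integer bin index."""
    return int(value // bin_width)

def features(acoustics, previous_word, word):
    """Binary features that pair the candidate word with its context.

    One feature ties the word to the previous word; one feature per acoustic
    dimension ties the word to that dimension's quantized value.
    """
    feats = [f"prev={previous_word}|w={word}"]
    feats += [f"a{i}={quantize(v)}|w={word}" for i, v in enumerate(acoustics)]
    return feats

def posterior(acoustics, previous_word, vocabulary, weights):
    """p(word | acoustics, previous_word) as a normalized exponential of feature weights."""
    scores = {
        w: sum(weights.get(f, 0.0) for f in features(acoustics, previous_word, w))
        for w in vocabulary
    }
    z = sum(math.exp(s) for s in scores.values())
    return {w: math.exp(s) / z for w, s in scores.items()}

# Hand-set weights; a real model would learn them and prune most features.
weights = {
    "prev=a|w=room": 1.0,
    "a0=2|w=room": 0.5,
    "a0=2|w=rum": 0.2,
}

probs = posterior([2.3], "a", ["room", "rum"], weights)
print(probs)
```

Because lexical context and acoustics enter the same exponent, the model does not need the independence assumptions that a composition of separate acoustic and language models relies on.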

15
Thank you.