Title: Tight Coupling between ASR and MT in Speech-to-Speech Translation
1. Tight Coupling between ASR and MT in Speech-to-Speech Translation
- Arthur Chan
- Prepared for the Advanced Machine Translation Seminar
2. This Seminar
- Introduction (6 slides)
- Ringger's categorization of coupling between ASR and NLU (7 slides)
- Interfaces in loose coupling
  - 1-best and N-best (5 slides)
  - Lattices/confusion networks/confidence estimation (12 slides)
  - Results from the literature
- Tight coupling
  - Theory
- Some As-Is Ideas on This Topic
3. 6 Papers on Tight Coupling of Speech-to-Speech Translation
- H. Ney, "Speech translation: Coupling of recognition and translation," in Proc. ICASSP, 1999.
- Casacuberta et al., "Architectures for speech-to-speech translation using finite-state models," in Proc. Workshop on Speech-to-Speech Translation, 2002.
- E. Matusov, S. Kanthak, and H. Ney, "On the integration of speech recognition and statistical machine translation," in Proc. Interspeech, 2005.
- S. Saleem, S. C. Jou, S. Vogel, and T. Schultz, "Using word lattice information for a tighter coupling in speech translation systems," in Proc. ICSLP, 2004.
- V. H. Quan et al., "Integrated N-best re-ranking for spoken language translation," in Proc. Eurospeech, 2005.
- N. Bertoldi and M. Federico, "A new decoder for spoken language translation based on confusion networks," in Proc. IEEE ASRU Workshop, 2005.
4. A Conceptual Model of Speech-to-Speech Translation
- Pipeline: source waveforms → Speech Recognizer → decoding result(s) → Machine Translator → translation → Speech Synthesizer → target waveforms
5. Motivation for Tight Coupling between ASR and MT
- The 1-best ASR output could be wrong
- MT could benefit from the wide range of supplementary information provided by ASR:
  - N-best list
  - Lattice
  - Sentence/word-based confidence scores
    - E.g., word posterior probability
  - Confusion network
    - Or consensus decoding (Mangu 1999)
- MT quality may depend on the WER of ASR (?)
6. Scope of This Talk
- The interface between the Speech Recognizer and the Machine Translator: should it carry the 1-best, an N-best list, a lattice, or a confusion network?
- (The rest of the pipeline, from input waveforms to the translation passed to the Speech Synthesizer and the output waveforms, is as in the conceptual model.)
7. Topics Covered Today
- The concept of coupling
  - Tightness of coupling between ASR and Technology X (Ringger 95)
- Interfaces between ASR and MT in loose coupling
  - What could ASR provide?
- Systems with semi-tight coupling
- Very tight coupling
  - Ney's formulae
  - Casacuberta's approach
- Some random thoughts on this topic
  - What is missing in the current research?
8. Topics Not Covered
- Direct modeling
  - Using features from both ASR and MT
  - Sometimes referred to as ASR/MT unification
- Implications of the MT search algorithms for the coupling
- Generation of speech from text
  - The presenter doesn't know enough about it.
9. The Concept of Coupling
10. Classification of Coupling between ASR and Natural Language Understanding (NLU)
- Proposed in Ringger 95, Harper 94
- Three dimensions of ASR/NLU coupling:
  - Complexity of the search algorithm
    - Simple N-gram?
  - Incrementality of the coupling
    - On-line? Left-to-right?
  - Tightness of the coupling
    - Tight? Loose? Semi-tight?
11. Tightness of Coupling
- Tight
- Semi-tight
- Loose
12. Notes
- Semi-tight coupling could appear as:
  - A feedback loop between ASR and Technology X over the whole utterance of speech, or
  - A feedback loop between ASR and Technology X at every frame.
- The Ringger system
  - A good way to understand how speech-based systems are developed
13. Example 1: LM
- Suppose someone asserts that ASR has to be used with a 13-gram LM.
- In tight coupling:
  - A search is devised that looks for the word sequence with the best combined acoustic score and 13-gram likelihood.
- In loose coupling (see the rescoring sketch after this list):
  - A simple search is used to generate some outputs (N-best list, lattice, etc.).
  - The 13-gram is then used to rescore those outputs.
- In semi-tight coupling:
  - 1. A simple search is used to generate results.
  - 2. The 13-gram is applied at word ends only (but the exact history is not stored).
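To make the loose-coupling case concrete, here is a minimal sketch of N-best rescoring. The Hypothesis fields and the lm_logprob scorer are assumptions for illustration only, not part of any system cited in this talk.

```python
# Minimal sketch of loose coupling as N-best rescoring.
# Each hypothesis carries the score from the first, simple search pass; a higher-order
# LM (the "13-gram" of the example) rescores the word sequence afterwards.

from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Hypothesis:
    words: List[str]       # recognized word sequence
    decode_score: float    # log score from the first-pass (simple) search

def rescore_nbest(nbest: Sequence[Hypothesis],
                  lm_logprob: Callable[[List[str]], float],
                  lm_weight: float = 1.0) -> List[Hypothesis]:
    """Re-rank the N-best list by first-pass score plus weighted LM log-probability."""
    def total(h: Hypothesis) -> float:
        return h.decode_score + lm_weight * lm_logprob(h.words)
    return sorted(nbest, key=total, reverse=True)

# Any function mapping a word sequence to a log-probability can stand in for the
# high-order LM, e.g. a lookup into a pre-trained 13-gram model.
```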
14. Example 2: Higher-Order AM
- Segmental models do not assume that observation probabilities are conditionally independent.
- Suppose someone asserts that a segmental model is better than a plain HMM.
- Tight coupling: direct search for the best word sequence using the segmental model.
- Loose coupling: use the segmental model to rescore.
- Semi-tight coupling: a hybrid HMM/segmental-model algorithm?
15. Summary of Coupling between ASR and NLU
16. Implications for ASR/MT Coupling
- The classification generalizes across many systems:
- Loose coupling
  - Any system that uses the 1-best, an N-best list, a lattice, or other inputs for one-way module communication
  - (Bertoldi 2005)
  - CMU system (Saleem 2004)
  - (Matusov 2005)
- Tight coupling
  - (Ney 1999)
  - (Casacuberta 2002)
- Semi-tight coupling
  - (Quan 2005)
17. Interfaces in Loose Coupling: 1-best and N-best
18. Perspectives
- ASR outputs:
  - 1-best results
  - N-best results
  - Lattice
  - Consensus network
  - Confidence scores
- How does ASR generate these outputs?
- Why are they generated?
- What if there are multiple ASR systems?
  - (And what if their results are combined?)
19. Origin of the 1-best
- Decoding in HMM-based ASR
  - Searching for the best path in a huge HMM-state lattice
- 1-best ASR result
  - The best path one can recover by backtracking
- State lattice (next page)
20. (Figure only: the HMM-state lattice referred to above.)
21. Note on the 1-best
- Most of the time, the 1-best is a word sequence, not a state sequence
- Why?
  - In LVCSR, storing the backtracking pointer table for the state sequence takes a lot of memory (even nowadays)
  - Compare this with the number of frames of scores that would need to be stored
  - Usually a backtrack pointer stores only the previous words before the current word
  - Clever structures dynamically allocate the backtracking pointer table
- (A minimal Viterbi-with-backtracking sketch follows this list.)
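As a concrete illustration of the decoding and backtracking described on the last two slides, here is a toy Viterbi sketch over a plain HMM state trellis. Real LVCSR decoders search a far larger, dynamically built network and keep word-level backpointers, as noted above; nothing here comes from a specific recognizer.

```python
# Toy Viterbi decoding: find the best state path given per-frame observation
# log-likelihoods, then recover it by backtracking; this is where the "1-best" originates.

import numpy as np

def viterbi(log_init, log_trans, log_obs):
    """log_init: (S,), log_trans: (S, S), log_obs: (T, S) log-likelihoods per frame."""
    T, S = log_obs.shape
    score = np.full((T, S), -np.inf)
    backptr = np.zeros((T, S), dtype=int)
    score[0] = log_init + log_obs[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans          # (prev state, current state)
        backptr[t] = np.argmax(cand, axis=0)              # best predecessor per state
        score[t] = cand[backptr[t], np.arange(S)] + log_obs[t]
    # backtrack from the best final state to obtain the 1-best state sequence
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return list(reversed(path)), float(np.max(score[-1]))
```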
22. What Is an N-best List?
- Traceback not only from the 1st best, but also from the 2nd best, 3rd best, etc.
- Ways to obtain it (a lattice-based sketch follows this list):
  - Directly from the search backtrack pointer table
    - Exact N-best algorithm (Chow 90)
    - Word-pair N-best algorithm (Chow 91)
    - A* search using the Viterbi score as the heuristic (Chow 92)
  - Generate a lattice first, then generate the N-best from the lattice
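A small sketch of the lattice route: sweep the lattice nodes in topological order, keeping the n best partial paths per node. The lattice encoding (a dict from node id to outgoing (next_node, word, log_score) edges) is invented for illustration and does not follow any particular toolkit.

```python
import heapq
from typing import Dict, List, Tuple

# lattice[node] = list of (next_node, word, log_score); node ids are arbitrary strings
Lattice = Dict[str, List[Tuple[str, str, float]]]

def nbest_from_lattice(lattice: Lattice, topo_order: List[str], n: int):
    # best[node] holds up to n (score, word_sequence) partial paths reaching that node
    best: Dict[str, List[Tuple[float, List[str]]]] = {topo_order[0]: [(0.0, [])]}
    for node in topo_order:
        # all predecessors precede this node in topo_order, so its list is complete: prune it
        best[node] = heapq.nlargest(n, best.get(node, []), key=lambda p: p[0])
        for score, words in best[node]:
            for nxt, word, edge_score in lattice.get(node, []):
                best.setdefault(nxt, []).append((score + edge_score, words + [word]))
    return best.get(topo_order[-1], [])

# Example: a two-word lattice with a confusable first word.
lat = {"start": [("a", "I", -0.1), ("a", "eye", -1.2)],
       "a": [("end", "see", -0.3)], "end": []}
print(nbest_from_lattice(lat, ["start", "a", "end"], n=2))
# [(-0.4, ['I', 'see']), (-1.5, ['eye', 'see'])]
```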
23. Interfaces in Loose Coupling: Lattices, Consensus Networks and Confidence Estimation
24. What Is a Lattice?
- A compact representation of the state lattice
  - Only word nodes (or links) are involved
- Difference between an N-best list and a lattice:
  - A lattice can be a compact representation of an N-best list.
25. (Figure only.)
26. How Is a Lattice Generated?
- From the decoding backtracking pointer table
  - Only record the links between word nodes.
- From an N-best list
  - Becomes a compact representation of the N-best
  - Sometimes spurious links will be introduced
27. How Is a Lattice Generated When There Are Phone Contexts at Word Ends?
- Very complicated when phonetic context is involved
  - Not only the word end needs to be stored, but also the phone contexts.
  - The lattice carries the word identity as well as the contexts.
  - The lattice can become very large.
28. How Is This Resolved?
- Some use only approximate triphones to generate the lattice in the first stage (BBN)
- Some generate the lattice with full CD phones but convert it back to a context-free lattice (RWTH)
- Some use the lattice with full CD-phone contexts directly (RWTH)
29. What Do ASR Folks Do When the Lattice Is Still Too Large?
- Use some criterion to prune the lattice (a pruning sketch follows this list).
- Example criteria:
  - Word posterior probability
  - Application of another LM or AM, then filtering
  - General confidence score
  - Maximum lattice density
    - (number of words in the lattice / number of words in the utterance)
- Or generate an even more compact representation than a lattice
  - E.g., a consensus network.
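A rough sketch of posterior- and density-based pruning. It assumes every link already carries a word posterior (see the forward-backward sketch on the confidence-measure slide later); the field names are illustrative, and a real pruner must also keep the lattice connected from start to end, which is omitted here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Link:
    start_node: int
    end_node: int
    word: str
    posterior: float   # word posterior probability of this link

def prune_lattice(links: List[Link], min_posterior: float,
                  num_utt_words: int, max_density: float) -> List[Link]:
    # 1. Drop links whose word posterior falls below the threshold.
    kept = [l for l in links if l.posterior >= min_posterior]
    # 2. Enforce a maximum lattice density (links per spoken word) by keeping
    #    only the highest-posterior links.
    max_links = int(max_density * num_utt_words)
    kept.sort(key=lambda l: l.posterior, reverse=True)
    return kept[:max_links]
```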
30. Conclusions on Lattices
- Lattice generation itself can be a complicated issue.
- Sometimes, what the post-processing stage (e.g., MT) receives is a pre-filtered, pre-processed result.
31. Confusion Network and Consensus Hypothesis
- Confusion network
  - Also called a sausage network
  - Also called a consensus network
32. Special Properties (?)
- More local than a lattice
  - One can apply simple criteria to find the best results
  - E.g., consensus decoding applies word posterior probabilities to the confusion network (a tiny sketch follows this list).
- More tractable
  - In terms of size
- Found to be useful in
  - ?
  - ?
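A tiny sketch of consensus decoding over such a network. The slot encoding (a list of word-to-posterior dicts, with the empty string standing for the epsilon/skip arc) is assumed for illustration.

```python
from typing import Dict, List

def consensus_decode(confusion_network: List[Dict[str, float]]) -> List[str]:
    output = []
    for slot in confusion_network:
        word = max(slot, key=slot.get)   # highest word posterior in this slot
        if word:                         # "" means "no word here" (epsilon), so skip it
            output.append(word)
    return output

# Example: the second slot prefers deleting the word entirely.
cn = [{"i": 0.9, "eye": 0.1}, {"": 0.6, "a": 0.4}, {"see": 1.0}]
print(consensus_decode(cn))   # ['i', 'see']
```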
33. How to Generate a Consensus Network?
- From the lattice
- Summary of Mangu's algorithm:
  - Intra-word clustering
  - Inter-word clustering
34. Notes on the Consensus Network
- Time information might not be preserved in a confusion network.
- The similarity function directly affects the final output of the consensus network.
35. Other Ways to Generate a Confusion Network
- From the N-best list
  - Using ROVER
    - A mixture of voting and adding word confidences (a simplified sketch follows this list)
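Below is a much-simplified, ROVER-style sketch: each hypothesis in the N-best list is aligned against the growing slot network with edit-distance DP, and each slot's winner is then chosen by vote. The real ROVER also weights votes by word confidence and handles ties more carefully; the data layout and function names here are purely illustrative.

```python
from collections import Counter
from typing import List

def align_into_network(network: List[Counter], hyp: List[str], num_prev: int) -> List[Counter]:
    """Align one hypothesis against the current slot network via edit-distance DP."""
    n, m = len(network), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0 if hyp[j - 1] in network[i - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + match,  # word joins an existing slot
                             cost[i - 1][j] + 1,          # hypothesis skips this slot
                             cost[i][j - 1] + 1)          # inserted word: brand-new slot
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (0 if hyp[j - 1] in network[i - 1] else 1):
            slot = network[i - 1].copy()
            slot[hyp[j - 1]] += 1
            merged.append(slot)
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            slot = network[i - 1].copy()
            slot[""] += 1                                  # this hypothesis votes for "no word"
            merged.append(slot)
            i -= 1
        else:
            merged.append(Counter({hyp[j - 1]: 1, "": num_prev}))  # earlier hypotheses skip it
            j -= 1
    return list(reversed(merged))

def rover_vote(nbest: List[List[str]]) -> List[str]:
    """Build a confusion network from an N-best list, then vote slot by slot."""
    network = [Counter({w: 1}) for w in nbest[0]]
    for k, hyp in enumerate(nbest[1:], start=1):
        network = align_into_network(network, hyp, num_prev=k)
    winners = (slot.most_common(1)[0][0] for slot in network)
    return [w for w in winners if w]           # drop slots whose winner is the epsilon

# Example: the middle word is uncertain; the majority wins.
print(rover_vote([["i", "see", "you"], ["i", "sea", "you"], ["i", "see", "you"]]))  # ['i', 'see', 'you']
```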
36. Confidence Measures
- Anything other than the likelihood that can tell whether the answer is useful
- E.g.:
  - Word posterior probability
    - P(W | A)
    - Usually computed using lattices (a forward-backward sketch follows this list)
  - Language model back-off mode
  - Other posterior probabilities (frame, sentence)
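A sketch of computing such word (link) posteriors from a lattice with a forward-backward pass in log space; these are exactly the posteriors the pruning sketch earlier assumed as given. The lattice encoding matches the toy one used in the N-best sketch and is illustrative only.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Lattice = Dict[str, List[Tuple[str, str, float]]]   # node -> [(next_node, word, log_score)]

def logaddexp(a: float, b: float) -> float:
    if a == -math.inf: return b
    if b == -math.inf: return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def link_posteriors(lattice: Lattice, topo_order: List[str]) -> Dict[Tuple[str, str, str], float]:
    start, end = topo_order[0], topo_order[-1]
    fwd = defaultdict(lambda: -math.inf)
    fwd[start] = 0.0
    for node in topo_order:                      # forward pass: log-sum over paths from start
        for nxt, _, s in lattice.get(node, []):
            fwd[nxt] = logaddexp(fwd[nxt], fwd[node] + s)
    bwd = defaultdict(lambda: -math.inf)
    bwd[end] = 0.0
    for node in reversed(topo_order):            # backward pass: log-sum over paths to end
        for nxt, _, s in lattice.get(node, []):
            bwd[node] = logaddexp(bwd[node], s + bwd[nxt])
    total = fwd[end]                             # log-sum of all complete paths
    post = {}
    for node in topo_order:
        for nxt, word, s in lattice.get(node, []):
            post[(node, nxt, word)] = math.exp(fwd[node] + s + bwd[nxt] - total)
    return post                                  # link posterior = P(word link | acoustics)
```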
37. Interfaces in Loose Coupling: Results from the Literature
38. N-best List
39. Lattices: CMU Results (Saleem 2004)
- Just put some graphs here.
40. Consensus Network
41. Confidence: Does It Help?
42. Tight Coupling
43. Motivation
44. Theory (Ney 1999)
- Derivation steps (the equations are reconstructed below):
  - Bayes' rule
  - Introduce f as a hidden variable
  - Bayes' rule again
  - Assume x doesn't depend on the target language
  - Sum to max
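A reconstruction of the derivation these step labels refer to, following the decision rule of Ney (1999); the notation is assumed here (x: acoustic observation, f: source-language word sequence, e: target-language word sequence).

```latex
\begin{align*}
\hat{e} &= \operatorname*{argmax}_{e}\; \Pr(e \mid x)
         = \operatorname*{argmax}_{e}\; \Pr(e)\,\Pr(x \mid e)
        && \text{Bayes' rule} \\
\Pr(x \mid e) &= \sum_{f} \Pr(f, x \mid e)
        && \text{introduce } f \text{ as a hidden variable} \\
        &= \sum_{f} \Pr(f \mid e)\,\Pr(x \mid f, e)
        && \text{Bayes' rule again} \\
        &\approx \sum_{f} \Pr(f \mid e)\,\Pr(x \mid f)
        && x \text{ does not depend on the target language} \\
        &\approx \max_{f}\; \Pr(f \mid e)\,\Pr(x \mid f)
        && \text{sum} \to \text{max} \\
\hat{e} &\approx \operatorname*{argmax}_{e}\; \Pr(e)\,\max_{f}\; \Pr(f \mid e)\,\Pr(x \mid f)
        && \text{resulting decision rule}
\end{align*}
```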
45. A Layman's Point of View
46. Later Approaches
47. Casacuberta's Approach
48. Some As-Is Ideas on This Topic
49. The End. Thanks.
51. Literature
- Eric K. Ringger, "A Robust Loose Coupling for Speech Recognition and Natural Language Understanding," Technical Report 592, Computer Science Department, University of Rochester, 1995.
- The AT&T paper
52. Some As-Is Ideas on This Topic