Title: Speech-to-Speech MT in the JANUS System
1 Speech-to-Speech MT in the JANUS System
- Lori Levin and Alon Lavie
- Language Technologies Institute
- Carnegie Mellon University
2 Outline
- Design and Engineering of the JANUS/C-STAR speech-to-speech MT system
  - Fundamentals of our approach
  - System overview
  - Engineering a multi-domain system
- The C-STAR Travel Domain Interlingua (IF)
- Evaluation and User Studies
- Conclusions, Current and Future Research
3 JANUS Speech Translation
- Translation via an interlingua representation
- Main translation engine is rule-based
- Semantic grammars
- Modular grammar design
- System engineered for multiple domains
- Incorporate alternative translation engines
4 Multilingual Interlingual Machine Translation
5 Advantages of Interlingua
- Avoid the n-squared problem for all-ways translation.
- Monolingual grammar development teams.
- Add a new language easily and automatically get all-ways translation to all previous languages.
6 The C-STAR Travel Planning Domain
- General Scenario
- Dialogue between one traveler and one or more travel agents
- Focus on making travel arrangements for a personal leisure trip (not business)
- Free spontaneous speech
7 The C-STAR Travel Planning Domain
- Natural breakdown into several sub-domains
- Hotel Information and Reservation
- Transportation Information and Reservation
- Information about Sights and Events
- General Travel Information
- Cross Domain
8 A Travel Dialogue (Translated from Italian)
- A: Albergo Gabbia D'Oro. Good evening.
- B: My name is Anna Maria DeGasperi. I'm calling from Rome. I wish to book two single rooms.
- A: Yes.
- B: From Monday to Friday the 18th, I'm sorry, to Monday the 21st.
- A: Friday the 18th of June.
- B: The 18th of July. I'm sorry.
- A: Friday the 18th of July to, you were saying, Sunday.
- B: No. Through Monday the 21st.
9 A Travel Dialogue (Continued)
- B: So with departure on Tuesday the 22nd.
- A: Then leaving on the 22nd. Yes. We have two singles, certainly.
- B: Yes.
- A: Would you like breakfast?
- B: Is it possible to have all meals?
- A: No. We serve meals only in the evening.
- B: Ok. If you can do breakfast and dinner.
- A: Ok.
- B: Do you need a deposit?
10 A Travel Dialogue (Continued)
- A: You can give me your credit card number.
- B: Ok. Just a moment. Ok. My name is Anna Maria DeGasperi. The card is 005792005792.
- A: Good.
- B: Expiration 2002.
- A: 2002. Good. Thank you. We need a confirmation on the 18th of July before 6pm.
- B: Goodbye.
- A: Thanks. Goodbye.
- B: Thanks. Goodbye.
11 A Non-Task-Oriented Dialogue (We can't translate this.)
- A: Are you cooking?
- B: My father is cooking. I'm cleaning. I just finished cleaning the bathroom.
- A: Look. What do you know about Monica?
- B: I don't know anything. Look. I don't know anything.
- A: You don't know anything? I wrote her three weeks ago, but if she hasn't received the letter, they would have returned it. I hope she received it.
- B: Because Celia told me that the address that Monica had given us was wrong. She said that if I was going to write to her, well, ...
12 Semantic Grammars
- Describe the structure of semantic concepts instead of the syntactic constituency of phrases
- Well suited for task-oriented dialogue containing many fixed expressions
- Appropriate for spoken language - often disfluent and syntactically ill-formed
- Faster to develop reasonable coverage for limited domains
13 Semantic Grammars
- Hotel Reservation Example
- Input: we have two hotels available
- Parse Tree (a runnable toy version follows below):
    give-information+availability+hotel
      (we have
        hotel-type
          (quantity (two)
           hotel (hotels))
        available)
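To make this concrete, here is a minimal sketch in Python of a semantic grammar in this spirit: the nonterminals are semantic concepts rather than syntactic categories. The rule names mirror the example above, but the notation is hypothetical and far simpler than the actual JANUS grammar formalism.

    # A toy semantic grammar: nonterminals are concepts, not syntactic
    # categories. Hypothetical notation, not the JANUS formalism.
    SEMANTIC_GRAMMAR = {
        "give-information+availability+hotel":
            [["we", "have", "hotel-type", "available"]],
        "hotel-type": [["quantity", "hotel"]],
        "quantity":   [["two"], ["three"]],
        "hotel":      [["hotels"], ["hotel"]],
    }

    def parse(tokens, symbol):
        """Naive top-down recognizer: return the tokens left after
        matching `symbol`, or None if no expansion matches."""
        if symbol not in SEMANTIC_GRAMMAR:            # terminal symbol
            return tokens[1:] if tokens and tokens[0] == symbol else None
        for expansion in SEMANTIC_GRAMMAR[symbol]:
            rest = tokens
            for sym in expansion:
                rest = parse(rest, sym)
                if rest is None:
                    break
            else:
                return rest
        return None

    tokens = "we have two hotels available".split()
    print(parse(tokens, "give-information+availability+hotel") == [])  # True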
14 The JANUS-III Translation System
15 The JANUS-III Translation System
16 The SOUP Parser
- Specifically designed to parse spoken language using domain-specific semantic grammars
- Robust - can skip over disfluencies in input
- Stochastic - probabilistic CFG encoded as a collection of RTNs with arc probabilities
- Top-Down - parses from top-level concepts of the grammar down to matching of terminals
- Chart-based - dynamic matrix of parse DAGs indexed by start and end positions and head category
17 The SOUP Parser
- Supports parsing with large multiple-domain grammars
- Produces a lattice of parse analyses headed by top-level concepts
- Disambiguation heuristics rank the analyses in the parse lattice and select a single best path through the lattice
- Graphical grammar editor
18 SOUP Disambiguation Heuristics
- Maximize coverage (of input)
- Minimize number of parse trees (fragmentation)
- Minimize number of parse tree nodes
- Minimize the number of wild-card matches
- Maximize the probability of parse trees
- Find the sequence of domain tags with maximal probability given the input words, P(T|W), where T = t1, t2, ..., tn is a sequence of domain tags (a ranking sketch follows below)
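As an illustration only, such heuristics can be applied as a lexicographic ranking over candidate paths through the parse lattice. The field names and numbers below are invented for the sketch; they are not SOUP's actual data structures.

    import math

    # Rank lattice paths by the heuristics above, in order of priority.
    def rank_key(path):
        return (
            -path["words_covered"],   # maximize coverage of the input
            path["num_trees"],        # minimize fragmentation
            path["num_nodes"],        # minimize parse-tree nodes
            path["num_wildcards"],    # minimize wild-card matches
            -path["log_prob"],        # maximize parse/domain-tag probability
        )

    candidates = [
        {"words_covered": 5, "num_trees": 2, "num_nodes": 7,
         "num_wildcards": 0, "log_prob": math.log(0.05)},
        {"words_covered": 5, "num_trees": 1, "num_nodes": 9,
         "num_wildcards": 0, "log_prob": math.log(0.02)},
    ]
    best = min(candidates, key=rank_key)
    print(best["num_trees"])  # 1 - fewer fragments wins once coverage ties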
19 Modular Grammar Design
- Grammar development separated into modules corresponding to sub-domains (Hotel, Transportation, Sights, General Travel, Cross Domain)
- Shared core grammar for lower-level concepts that are common to the various sub-domains (e.g. times, prices)
- Grammars can be developed independently (using the shared core grammar)
- Shared and Cross-Domain grammars significantly reduce the effort of expanding to new domains
- Separate grammar modules facilitate associating parses with domain tags - useful for multi-domain integration within the parser
20 Translation with Multiple Domain Grammars
21 Analysis with Multiple Domain Grammars
- Parser is loaded with all domain grammars
- Domain tag attached to the grammar rules of each domain (see the sketch below)
- Previously developed grammars for other domains can also be incorporated
- Parser creates a parse lattice consisting of multiple analyses of the input into sequences of top-level domain concepts
- Parser disambiguation heuristics rank the analyses in the parse lattice and select a single best sequence of concepts
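A tiny illustration of the domain-tag idea (module and rule names here are hypothetical): each top-level rule carries the tag of the grammar module it came from, so a path through the lattice yields a sequence of domain tags.

    # Grammar modules keyed by sub-domain; a reverse map labels each
    # top-level concept with its domain tag. Names are illustrative only.
    DOMAIN_GRAMMARS = {
        "hotel":        ["give-information+availability+hotel"],
        "transport":    ["give-information+temporal+arrival"],
        "cross-domain": ["greeting", "thank"],
    }
    RULE_DOMAIN = {rule: domain
                   for domain, rules in DOMAIN_GRAMMARS.items()
                   for rule in rules}

    print(RULE_DOMAIN["greeting"])  # cross-domain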
22 A SOUP Parse Lattice
23 Alternative Analysis Approach: SALT
- SALT - Statistical Analyzer for Language Translation
- Combines ML-trainable and rule-based analysis methods for robustness and portability
- Rule-based parsing restricted to a well-defined set of argument-level phrases and fragments
- Trainable classifiers (NN, decision trees, etc.) used to derive the DA (speech act and concepts) from the sequence of argument concepts
- Phrase-level grammars are more robust and portable to new domains
24 SALT Approach
- Example
- Input: we have two hotels available
- Arg-SOUP: exist, hotel-type, available
- SA-Predictor: give-information
- Concept-Predictor: availability+hotel
- Predictors use SOUP argument concepts and input words
- Preliminary results are encouraging (a toy classifier sketch follows below)
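A hedged sketch of the SALT idea using scikit-learn: a trainable classifier maps the bag of argument-level concepts to a speech act. The three training examples are invented for illustration; the real system trains on the IF database.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.tree import DecisionTreeClassifier

    # Toy training set: bag of argument concepts -> speech act.
    train = [
        ({"exist": 1, "hotel-type": 1, "available": 1}, "give-information"),
        ({"question": 1, "availability": 1, "room": 1}, "request-information"),
        ({"want": 1, "reservation": 1, "room": 1}, "request-action"),
    ]
    X, y = zip(*train)
    model = make_pipeline(DictVectorizer(), DecisionTreeClassifier())
    model.fit(X, y)

    print(model.predict([{"exist": 1, "hotel-type": 1, "available": 1}]))
    # ['give-information']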
25 Design Criteria of the Interchange Format
- Suitable for task-oriented dialogue
- Based on the speaker's intent, not literal meaning
- Domain-independent framework with domain-specific parts
- Simple and reliable enough to use
  - at multiple research sites
  - with widely varying types of parsers and generators
26 Domain Actions: Extended, Domain-Specific Speech Acts
- Examples
- c:request-information+availability+room
- a:give-information+personal-data
- c:give-information+temporal+arrival
27 Task-Oriented Sentences
- Perform an action in the domain.
- Are not descriptive.
- Contain fixed expressions that cannot be
translated literally.
28 Components of the Interchange Format
- speaker: a (agent)
- speech act: give-information
- concept: availability+room
- arguments: (room-type=(single & double), time=md12)
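As a sketch of how these components fit together syntactically, a small Python helper can split an IF expression into its four parts. The regex reflects only the notation shown on this slide, not a full grammar for the IF.

    import re

    def split_if(expr):
        """Split 'speaker:speech-act+concept...(arguments)' into parts."""
        m = re.match(r"([ca]):([a-z-]+)((?:\+[a-z-]+)*)\s*(?:\((.*)\))?$", expr)
        speaker, speech_act, concepts, args = m.groups()
        return {
            "speaker": speaker,   # 'a' = agent, 'c' = client
            "speech_act": speech_act,
            "concepts": concepts.lstrip("+").split("+") if concepts else [],
            "arguments": args or "",
        }

    print(split_if("a:give-information+availability+room "
                   "(room-type=(single & double), time=md12)"))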
29 Examples
- no that's not necessary
  c:negate
- yes I am
  c:affirm
- and I was wondering what you have in the way of rooms available during that time
  c:request-information+availability+room
- my name is alex waibel
  c:give-information+personal-data (person-name=(given-name=alex, family-name=waibel))
- and how will you be paying for this
  a:request-information+payment (method=question)
- I have a mastercard
  c:give-information+payment (method=mastercard)
30 Speaker Tag
Client says: Do you take credit cards?
c:request-information+payment (method=credit-card)
Agent says: Will you be paying with a credit card?
a:request-information+payment (method=credit-card)
31 Size of IF
- May 1999
- Speech acts: 54
- Concepts: 84
- Arguments: 118
32 Speech Acts
- accept - "I'll take that", "Sounds good"
- acknowledge - "Okay", "Sure", "Yeah"
- acknowledge-action - "Here you go", "This is it"
- affirm - "Yes", "That is correct"
- affirm-action - "Yes, please do", "Go ahead"
- apologize - "Sorry", "I'm sorry"
- closing - "Bye", "See you next week"
- delay-action - "I'll get back to you on that"
- end-action - "That's all for now"
- give-certainty - "I'm sure"
- give-information - "I have 2 singles available"
33 Speech Acts
- greeting - "Hello", "Good morning"
- greeting-nice-meet - "Nice to meet you"
- greeting-request - "How are you"
- greeting-response-bad - "I'm not good"
- greeting-response-good - "I'm fine"
- greeting-welcome - "Welcome to Pittsburgh"
- introduce-self - "This is Brian, Best Western"
- introduce-topic - "About that flight"
- negate - "No"
- negate-action - "No, don't"
- not-understand - "I don't understand"
34 Speech Acts
- offer - "How about it?"
- offer-information - "Let me get you the information"
- offer-repeat - "Let me repeat that"
- please-wait - "Just a minute", "Let me see"
- reject (e.g., offer) - "No, I don't want that"
- request-action - "Can you reserve that for me?"
- request-affirmation - "Is that correct?"
- request-certainty - "Are you sure?"
- request-delay-action - "Can I get back to you later?"
- request-information - "Do you accept visa?"
- request-introduce-self - "Who am I speaking with?"
35 Speech Acts
- request-knowledge - "Do you know?"
- request-neg-affirmation - "Is that bad?"
- request-repeat - "Could you repeat that?"
- request-suggestion - "Which hotel should I get?"
- request-verification - "Right?", "That was 40 dollars?"
- return-from-delay - "I'm back"
- suggest - "How about a single?"
- thank - "Thank you very much"
- verify - "Yes, that is 40 dollars."
- welcome - "You're welcome"
- x-exclamation - "That is beautiful!" (ETRI only)
36 Meta-Demo Speech Acts
- testing - "Testing 1 2 3", "This is a test"
- testing-problem - "We have a problem"
- testing-start - "Let's start"
- testing-stop - "Let's stop"
- testing-proceed - "Go ahead!"
- testing-request-proceed - "Would you go first"
- testing-ready - "Ready here"
- testing-present - "We are here", "CMU is on line"
- testing-request-present - "Are you there?"
37 Some Concepts
- Actions: change, reservation, confirmation, cancellation, help, purchase, view, display, preference
- Attributes: availability, price, temporal, location, size, features, etc.
- Objects: room, hotel, flight, tour, event, attraction, web-page, etc.
- Other: arrival, departure, numeral, expiration-date, payment
38 Using Concepts to Represent Information Focus
Is there a hotel in Pittsburgh?
c:request-information+availability+hotel (location=pittsburgh)
Is the hotel in Pittsburgh?
c:request-information+location+hotel (location=pittsburgh)
39 Topic vs. Focus
The Hilton Hotel is in Verona.
a:give-information+location+hotel (hotel-name=hilton, location=verona)
The hotel in Verona is the Hilton Hotel.
a:give-information+location+hotel (hotel-name=hilton, location=verona)
40 The Interchange Format Database
Record format (d.u.sdu is the dialogue.utterance.sdu index; Prv is the providing site; a reader sketch follows below):
d.u.sdu olang X lang Y Prv Z   <sdu-in-language-Y on one line>
d.u.sdu olang X lang E Prv Z   <sdu-in-English on one line>
d.u.sdu IF Prv Z               <dialogue-act-on-one-line>
d.u.sdu comments               <your comments go here>
Example:
61.2.3 olang I lang I Prv IRST telefono per prenotare delle stanze per quattro colleghi
61.2.3 olang I lang E Prv IRST I'm calling to book some rooms for four colleagues
61.2.3 IF Prv IRST c:request-action+reservation+features+room (for-whom=(associate, quantity=4))
61.2.3 comments dial-oo5-spkB-roca0-02-3
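A minimal, hypothetical reader for records in this format, assuming one record field per line as shown above:

    def parse_record(line):
        """First token is the dialogue.utterance.sdu index; the remainder
        holds the tag fields (olang/lang/Prv, IF, or comments) and content."""
        index, body = line.split(None, 1)
        dialogue, utterance, sdu = index.split(".")
        return {"dialogue": dialogue, "utterance": utterance,
                "sdu": sdu, "body": body}

    rec = parse_record("61.2.3 IF Prv IRST "
                       "c:request-action+reservation+features+room "
                       "(for-whom=(associate, quantity=4))")
    print(rec["dialogue"], rec["sdu"])  # 61 3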
41 The Interchange Format Database
- English Dialogues: 36
- English Sentences: 2466
- Korean Dialogues: 70
- Korean Sentences: 1142
- Italian Dialogues: 5
- Italian Sentences: 233
- Japanese Dialogues: 124
- Japanese Utterances: 5887
- Distinct Dialogue Acts: 554 (310 agent, 244 client)
42 Phenomena Not Covered
- Anaphora
- Comparative Constructions
- Scope (negation and modifiers)
- Relative Clauses
- Plurality
- Descriptive Sentences
43 Expressivity vs. Simplicity
- If it is not expressive enough, components of meaning will be lost.
- If it is not simple enough, it can't be used reliably across sites.
44 Coverage
- The database includes about 550 distinct dialogue acts.
- About 60 dialogue acts cover about 70% of the data.
- About 5% of unseen data wasn't covered (as judged by human experts).
45 Consistency of Use Across Sites
- Successful international demo.
- After testing English-Italian and English-Korean, Italian-Korean worked without extra effort.
- Inter-coder agreement for each component of IF individually (speech acts, concepts, arguments): around 85%.
- Cross-site evaluation same as intra-site evaluation: 60% spoken, 75% transcribed.
46 User Studies
- We conducted three sets of user tests
- Travel agent played by an experienced system user
- Traveler played by a novice given five minutes of instruction
- Traveler given a general scenario - e.g., plan a trip to Heidelberg
- Communication only via the ST system, a multi-modal interface, and a muted video connection
- Data collected used for system evaluation, error analysis, and then grammar development
47 Evaluations
- Accuracy-Based Evaluation
  - Translation preserves original meaning
- Task-Based Evaluation
  - goal success or failure
  - user effort: how many attempts before succeeding or giving up
48 Accuracy-Based Evaluation
- End-to-end evaluations conducted at the SDU (sentence) level
- Multiple bilingual graders compare the input with the translated output and assign a grade of Perfect, OK, or Bad (an aggregation sketch follows below)
- OK: the meaning of the SDU comes across
- Perfect: OK plus fluent output
- Bad: translation incomplete or incorrect
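Aggregating such judgments is straightforward; the counts below are invented for illustration, with "acceptable" meaning Perfect or OK:

    from collections import Counter

    # Invented grade counts; "acceptable" = Perfect + OK.
    grades = Counter(perfect=40, ok=35, bad=25)
    acceptable = (grades["perfect"] + grades["ok"]) / sum(grades.values())
    print(f"acceptable: {acceptable:.0%}")  # acceptable: 75%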
49 Task-Based Evaluation
- Input: I would like to reserve 1s a single room 2f
- request-action+reservation+hotel (room-type=single)
- Translation:
- I would like to reserve a seating room.
50 Task-Based Evaluation
- Scoring Scheme (implemented in the sketch below)
- For goals that succeed: 1/n
- For goals that fail: -(1 - 1/n)
- where n is the number of attempts
- Overall score: average over all goals
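This scheme translates directly into code; the example goals below are invented:

    def goal_score(succeeded, n):
        """Score one goal: 1/n if it succeeded, -(1 - 1/n) if it failed,
        where n is the number of attempts."""
        return 1.0 / n if succeeded else -(1.0 - 1.0 / n)

    def overall_score(goals):
        return sum(goal_score(s, n) for s, n in goals) / len(goals)

    # Two goals achieved on the first try, one abandoned after 4 attempts:
    print(overall_score([(True, 1), (True, 1), (False, 4)]))  # ~0.417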
51 August-99 Evaluation
- Data from the latest user study - a traveler planning a trip to Japan
- 132 utterances containing one or more SDUs, from six different users
- SR word error rate: 14.7%
- 40.2% of utterances contain recognition error(s)
52 Accuracy and Task-Based Evaluations
53 Accuracy-Based Evaluation
54 Evaluation - Progress Over Time
55 Current and Future Work
- Expanding the travel domain: covering descriptive as well as task-oriented sentences
- Development of the SALT statistical approach and expansion to other domains
- Full integration of multiple MT approaches: SOUP, SALT, Pangloss
- Disambiguation: improved sentence-level disambiguation; applying discourse contextual information to disambiguation
56 Conclusions
- We started skeptically with tools that we thought were too simple: a context-free parser, semantic grammars, and an interlingua based on domain actions.
- We were surprised that they worked adequately for some types of task-oriented dialogue.
- We improved portability.
- We are now working on embedding the simple task-oriented system into a more complete system.
57 The JANUS/C-STAR Team
- Project Leaders
  - Lori Levin, Alon Lavie, Monika Woszczyna, Alex Waibel
- Grammar and Component Developers
  - Donna Gates, Dorcas Wallace, Taro Watanabe, Boris Bartlog, Ariadna Font-Llitjos, Marsal Gavalda, Chad Langley, Marcus Munk, Klaus Ries, Klaus Zechner, Detlef Koll, Michael Finke, Eric Carraux, Celine Morel, Alexandra Slavkovic, Susie Burger, Laura Tomokiyo, Takashi Tomokiyo, Kavita Thomas, Mirella Lapata, Matthew Broadhead, Cortis Clark, Christie Watson, Daniella Mueller, Sondra Ahlen