Some Activities on Speech Translation at ITC-irst



1
(Some) Activities on Speech Translation at
ITC-irst
  • Fabio Pianesi
  • and
  • Roldano Cattoni
  • Erica Costantini
  • Emanuele Pianta
  • ..

2
Scenario
  • NESPOLE! is a STST system allowing a tourist
    operator and a user to interact using their own
    languages.
  • Both customer and agent have thin clients (with
    whiteboard)
  • The customer's terminal connects to the Italian
    (agent-side) mediator, which acts as a
    multimedia dispatcher.
  • The mediator:
  • opens a connection with the tourist agent;
  • transmits web pages;
  • sends the audio to the appropriate HLT servers;
  • buffers and transmits gestures from the client to
    the agent and vice versa.
  • Feedback facilities provide full control by both
    parties on the evolution of the communicative
    exchange.

3
Scenario
CLIENT screen
  • The customer wants to organise a trip in
    Trentino.
  • She starts by browsing APT web pages to get
    information.

4
Interchange Format
5
Motivations
  • Interlingua facilitates translation between as
    many language pairs as possible with minimal
    effort.
  • SUBJECT DOMAINS: tourist information, medical
    domain.
  • COMMUNICATION DOMAIN: spoken dialogue

6
IF
Intermediate Representation Formalism
A lot of work. Goals pursued:
  • a general-purpose IRF to be used in conjunction
    with a more domain-oriented interlingua.
  • the generic part exploits a frame-like
    representation. WordNet 1.6 provides the
    conceptual repertory.
  • Important: the interplay between the
    general-purpose and the domain-oriented IRF.
  • updates and improvements to the domain-oriented
    IF developed within C-STAR II, to cope with the
    new requirements of NESPOLE!.
  • extension of coverage to the new features of the
    application scenarios.
  • improvements over the existing representation of
    such linguistic information as referents' novelty,
    number, nominal.

7
IF Design
  • CMU, USA
  • Karlsruhe University, Germany
  • ITC-irst, Italy
  • CLIPS, France
  • ATR, Japan
  • ETRI, Korea
  • Chinese Academy of Sciences, China
  • Siemens, Germany

8
Language Features
  • Task oriented (rather than descriptive)
  • Many fixed expressions
  • Many fragments

9
Requirements
  • abstract away from the peculiarities of
    particular languages
  • capture the speaker's communication intent rather
    than the literal phrasing
  • usable at different sites with different language
    engines
  • allow reliable data annotation (inter-coder and
    inter-site agreement)
  • allow for robust language engines
    (underspecification, fragments)

10
Formalism
  • DOMAIN ACTION plus ARGUMENTS
  • DOMAIN ACTION: main communicative intention,
    semantic focus
  • ARGUMENTS: semantic details

11
IF - syntax
  • <speaker> : <speech-act> +<concept>* (<argument>*)
  • Domain action: speech act plus concepts
  • speaker: a, c (agent, client)
  • speech act: give-information,
    request-information, greeting, accept,
    apologize...
  • concepts: disposition, feasibility,
    obligation, view, arrival, rent, accommodation,
    trip, ..
  • arguments: accommodation-spec ...
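
A minimal sketch of how an IF string could be read into a structure. It assumes the written notation in which ":" follows the speaker, "+" joins the speech act and concepts, and "=" binds arguments to values; the function names (parse_if, parse_args, split_top_level) are hypothetical and not part of any NESPOLE! tool.

```python
def split_top_level(text, sep=","):
    """Split on sep, ignoring separators nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def parse_args(text):
    """Parse 'a=b, c=(d=e, f)' into a dict; bare values map to True."""
    result = {}
    for item in split_top_level(text):
        name, eq, value = item.partition("=")
        name, value = name.strip(), value.strip()
        if not eq:
            result[name] = True
        elif value.startswith("(") and value.endswith(")"):
            result[name] = parse_args(value[1:-1])
        else:
            result[name] = value
    return result


def parse_if(text):
    """Split an IF string into speaker, speech act, concepts and arguments."""
    head, paren, rest = text.strip().partition("(")
    speaker, _, domain_action = head.strip().partition(":")
    speech_act, *concepts = domain_action.strip().split("+")
    arguments = {}
    if paren:
        body = rest.rstrip()
        if not body.endswith(")"):
            raise ValueError("unbalanced argument list")
        arguments = parse_args(body[:-1])
    return {
        "speaker": speaker.strip(),
        "speech_act": speech_act,
        "concepts": concepts,
        "arguments": arguments,
    }
```

For instance, parse_if("a:offer+help (who=i, to-whom=you)") yields speaker "a", speech act "offer", the single concept "help", and the two arguments who and to-whom.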

12
Recent advances
  • information packaging (new/old)
  • number, gender
  • attitudes (know that, want, prefer)
  • modality (must, need, ...)
  • tense
  • rhetorical relations (because, after that, ...)
  • relatives (partial)
  • focalisers (also, only, for example)
  • Copular constructions (the hotel is cheap and
    near Trento)
  • multimodality coverage (indicate, show, square,
    .. pen)

13
Example 1
  • " thank you . "
  • c:thank
  • " can I help you ? "
  • a:offer+help (who=i, to-whom=you)
  • " my name is Chad "
  • c:give-information+personal-data
    (person-name=(given-name=chad))

14
Example 2
  • " and I would like to arrive around
    September ninth . "
  •  
  • c:give-information+disposition+arrival
  • (disposition=(who=i, desire), / attitude /
  • conjunction=discourse, / rhetorical information /
  • time=(exactness=approximate, month=9, md=9))
    / time /

15
Example 3
  • " and I was hoping that you could help me plan a
    vacation to one of the national parks in the
    Trentino area . "
  •  
  • c:request-action+help+plan+trip
  • (help=(who=you, to-whom=i),
  • conjunction=discourse,
  • visit-spec=(vacation, identifiability=no),
  • destination=(quantity=1,
  • specifier=(national_park,
  • quantity=plural,
  • location=(place-name=trentino))))
  • NB: the focus is on communicative intention, not
    on exact phrasing

16
Example 4
  • " and in a restaurant . "
  • a:give-information+concept
    (conjunction=discourse, location=(restaurant,
    identifiability=no))
  • " which town ? "
  • c:request-information+concept
    (concept-spec=(town, identifiability=question))

17
Quantitative data - IF specifications
  • Last release: February 2002
  • speech acts: 61 (domain independent, 20
    dialog-management SA)
  • concepts: 108 (mostly domain dependent)
  • arguments: 304 (mostly domain dependent)
  • values: 7,652

18
Quantitative data - NESPOLE! database
  • Annotated turns (end 2001):
  • English: 815 (235 distinct DAs)
  • German: 2,873 (367)
  • Italian: 1,286 (233)
  • French: 234 (94)
  • Total distinct DAs: 610
  • Annotated turns (by end 2002): some 30/40 more

19
Support tools
  • IF specifications (available on the web)
  • http://www.is.cs.cmu.edu/nespole/db/index.html
  • IF discussion board
  • http://peace.is.cs.cmu.edu/ISL/get/if.html
  • C-STAR and NESPOLE! databases
  • http://www.is.cs.cmu.edu/nespole/db/index.html
  • IF Checker (web interface)
  • http://tcc.itc.it/projects/xig/xig-on-line.html
  • IF test suite
  • http://tcc.itc.it/projects/xig/xig-ts.html
  • IF emacs mode

20
  • Robust Generation for Speech Translation

21
Problems with the IF: Lack of Well-Defined
Semantics
  • Informal definition of the semantic primitives
  • No predicate-argument structure
  • Only loosely compositional (but this is
    improving)
  • No true formal semantics for IF (huge effort,
    risk of losing flexibility and adaptation to
    languages and linguistic engines)

22
Problems for Generation: Linguistic
Underspecification
  • Analysis engines may not be able to understand or
    disambiguate (e.g. information packaging, tense)
  • Subject in pro-drop languages (the Italian
    analyser can produce an IF representation in
    which the subject is left unspecified)

23
Problems for Generation: Ill-formedness
  • A legal IF representation must fit certain
    constraints.
  • a:give-information+disposition OK
  • a:give-information+arrival OK
  • a:give-information+arrival+disposition NO!

24
Problems for Generation: Ill-formedness
  • Top-level arguments must be licensed by the
    concepts in the DA
  • Example: "a three star hotel would be fine"

25
Problems for Generation: Ill-formedness
  • Sub-arguments are licensed by super-arguments
  • Example: "are there available rooms at Hotel Belvedere?"

26
Problems for Generation: Ill-formedness
  • Values must be licensed by arguments
  • c:give-information+action (action=e-call-2,
    origin=(place-name=mumbay))
  • mumbay is out of the current coverage of the IF
    specs
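
A rough sketch of the four licensing constraints as a checker. The specification tables below are a hypothetical toy fragment invented for illustration (the real IF specs are far larger and are not reproduced in these slides); arguments are assumed to be nested dictionaries.

```python
# Hypothetical toy fragment of an IF specification (assumption, for illustration).
LEGAL_CONCEPT_SEQUENCES = {
    ("disposition",), ("arrival",), ("disposition", "arrival"), ("action",),
}
TOP_LEVEL_ARGS = {
    "disposition": {"disposition"},
    "arrival": {"time"},
    "action": {"action", "origin"},
}
SUB_ARGS = {
    "origin": {"place-name"},
    "time": {"exactness", "month", "md"},
    "disposition": {"who", "desire"},
}
VALUES = {"place-name": {"trentino", "cavalese"}}


def check_value(name, value):
    """Check sub-arguments against their super-argument, and values against arguments."""
    problems = []
    if isinstance(value, dict):
        for sub, sub_value in value.items():
            if sub not in SUB_ARGS.get(name, set()):
                problems.append(f"sub-argument {sub} not licensed by {name}")
            problems.extend(check_value(sub, sub_value))
    elif name in VALUES and value not in VALUES[name]:
        problems.append(f"value {value} not licensed by {name}")
    return problems


def check(concepts, arguments):
    """Return a list of violations; an empty list means the IF is well formed."""
    problems = []
    if tuple(concepts) not in LEGAL_CONCEPT_SEQUENCES:
        problems.append("illegal concept sequence: " + "+".join(concepts))
    licensed = set()
    for concept in concepts:
        licensed |= TOP_LEVEL_ARGS.get(concept, set())
    for name, value in arguments.items():
        if name not in licensed:
            problems.append(f"top-level argument not licensed: {name}")
        problems.extend(check_value(name, value))
    return problems
```

Returning a list of violations rather than a boolean lets a generator decide which parts of an ill-formed IF are still usable.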

27
Strategy: Generating Fragments
  • Some 20-30% of IF representations produced
    during translation are ill-formed.
  • If you cannot generate a complete sentence,
    then generate the main phrases of the sentence
    and adopt a default order (e.g. NP, Verb, NP,
    Adjuncts).
  • If you cannot generate a main phrase,
    then generate fragments adopting some default
    order (e.g. Det, Noun, Modifiers).
  • If you cannot generate a lexical item out of an
    IF value, then return the value as it is.
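
The three fallback levels can be sketched as a cascade. This is only an illustration: the lexicon, the rule signatures and the use of argument order as the "default order" are assumptions, not the actual XIG rules.

```python
# Hypothetical toy lexicon (assumption); real lexical rules are not shown in the slides.
LEXICON = {"hotel": "albergo"}


def lexicalise(value):
    """Last level: return the lexical item, or the raw IF value if none exists."""
    return LEXICON.get(value, value)


def generate_phrase(name, value, phrase_rules):
    """Middle level: try a phrase rule; otherwise fall back to a bare fragment."""
    rule = phrase_rules.get(name)
    if rule is not None:
        return rule(value)
    return lexicalise(value)


def generate(arguments, sentence_rule=None, phrase_rules=None):
    """Top level: try a complete sentence; otherwise emit phrases in a default order."""
    phrase_rules = phrase_rules or {}
    if sentence_rule is not None:
        sentence = sentence_rule(arguments)
        if sentence is not None:
            return sentence
    # Default order here is simply the order of the arguments (an assumption).
    return " ".join(
        generate_phrase(name, value, phrase_rules)
        for name, value in arguments.items()
    )
```

With no applicable sentence or phrase rule, an out-of-coverage value such as "mumbay" is passed through unchanged, which mirrors the last bullet above.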

28
Defaults
  • Agent: "Can you see the map?"
  • a:request-information+feasibility+view+information-object
    (info-object=map)
  • ??? Who is the subject of view ???
  • Default rule: when the agent requests
    information about viewing something, the
    agent of the viewing is the client.
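
A small sketch of such a default rule applied to a parsed IF. The dictionary layout and the choice of who="you" to denote the client (the addressee of the agent's question) are assumptions made for illustration.

```python
def apply_view_default(if_repr):
    """Fill in the viewer when an agent asks about viewing and no subject is given."""
    needs_default = (
        if_repr["speaker"] == "a"
        and if_repr["speech_act"] == "request-information"
        and "view" in if_repr["concepts"]
        and "who" not in if_repr["arguments"]
    )
    if not needs_default:
        return if_repr
    # Assumption: the client is the addressee, so the missing subject is "you".
    arguments = dict(if_repr["arguments"], who="you")
    return {**if_repr, "arguments": arguments}
```

The function returns a new dictionary rather than mutating its input, so the underspecified analysis output stays available for inspection.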

29
The Italian Generation Component (continued)
  • XIG-IF sentence planner:
  • maps IF representations into functional
    representations.
  • 4 layers of mapping rules (sentences, NPs,
    adjuncts, lexicon)
  • Cascade of increasingly less specific rules.
  • Functional representations are mixed
    representations (morphology, potential words,
    strings, ..)
  • HTPL solver: maps mixed representations into text.

30
Multimodality in STST
31
Multimodality
  • NESPOLE! allows users to perform gestures
    (pointing, selection, etc.) on maps.
  • Gestures are performed by means of a tablet
    and/or a mouse on maps displayed through the
    system's whiteboard.
  • Anchoring between gestures and language is
    obtained through a simple time-based procedure.
  • More complex procedures, aiming at conceptual
    anchoring have a greater impact on HLT modules.
    Their investigation has been postponed.
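
The slides do not describe the time-based procedure itself. One plausible reading, sketched here purely as an assumption, attaches each gesture to the most recent spoken turn that ended shortly before it; the function name and the max_gap window are hypothetical.

```python
from bisect import bisect_right


def anchor_gestures(turn_end_times, gesture_times, max_gap=10.0):
    """Pair each gesture with the latest turn that ended at most max_gap seconds earlier.

    turn_end_times must be sorted in ascending order (seconds).
    Returns a list of (gesture_time, turn_index or None).
    """
    anchored = []
    for g in gesture_times:
        i = bisect_right(turn_end_times, g) - 1  # last turn ending at or before g
        if i >= 0 and g - turn_end_times[i] <= max_gap:
            anchored.append((g, i))
        else:
            anchored.append((g, None))
    return anchored
```

A gesture with no sufficiently recent turn is left unanchored (None) instead of being forced onto an unrelated utterance.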

32
Multimodality
Previous results
  • The advantages of multimodal input over
    speech-only input include faster task
    completion, fewer errors, fewer spontaneous
    disfluencies, and a strong preference for
    multimodal interaction (Oviatt, 1997)
  • When combined with spoken input, pen-based input
    can disambiguate badly understood sentences
    (Oviatt, 2000)

37
Multimodality
Usability study
  • Goal: assess the impact of multimodality in a real
    speech-to-speech translation environment
  • Evaluation of the added value of multimodality in
    a multilingual and multimedial environment.
  • Evaluation of the degree of integration of
    multimodality in the multilingual system.

38
MODALITY x LANGUAGE
Multimodality - experiment
Experimental Design
  • MODALITY
  • SO (Speech only)
  • MM (Multimodal)
  • LANGUAGE
  • English
  • German

39
Experimental Design
Users: Customers
  • TOTAL NUMBER: 28
  • FEATURES:
  • English and German speakers
  • similar level of computer literacy and web
    expertise
  • paid volunteers
  • DESIGN: between-subjects (each client took part in
    one dialogue and experienced only one modality)
  • Sex balanced across conditions

40
Experimental Design
Users: Customers
Table 1. Group composition (E = English speakers; G = German speakers)
41
Experimental Design
Users: Agents
  • TOTAL NUMBER: 7
  • Italian volunteers (not involved in the NESPOLE!
    project) acting as Trentino tourist board agents
  • DESIGN: within-subjects (each agent took part in more
    than one dialogue, and experienced both modalities)
  • Sex balanced across conditions and languages

42
Experimental Design
Dependent Variables
  • Variables targeted
  • spoken input
  • gestures
  • effectiveness of the dialogue
  • usability self-reports

43
Experimental Design
Dependent Variables
  • Speech
  • Spontaneous events
  • Agrammatical phrases (repetitions, corrections,
    false starts)
  • empty pauses (silence, breathing)
  • filled pauses (vowels, nasal, other)
  • human noises (laugh, noise)
  • word interruptions (speaker)
  • understandability
  • technical breaks (word break, word missing)
  • turn breaks (the utterance is broken)

44
Experimental Design
Dependent Variables
  • Speech
  • TURNS AND WORDS
  • turns per dialogue
  • tokens (spoken words) per dialogue
  • types (vocabulary) per dialogue
  • tokens per turn
  • types per turn
  • token/type rate (how many words were used before
    a new word was introduced)
  • returns to topics already treated
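
The turn and word figures listed above can be computed from transcribed turns roughly as follows. Whitespace tokenisation and lower-casing are simplifying assumptions, and the function name is hypothetical.

```python
def dialogue_stats(turns):
    """Compute turn and word figures for one dialogue.

    `turns` is a list of transcribed turns (strings).
    """
    per_turn = [[w.lower() for w in turn.split()] for turn in turns]
    tokens = [w for words in per_turn for w in words]
    types = set(tokens)
    n_turns = len(per_turn)
    return {
        "turns": n_turns,
        "tokens": len(tokens),
        "types": len(types),
        "tokens_per_turn": len(tokens) / n_turns if n_turns else 0.0,
        "types_per_turn": (
            sum(len(set(words)) for words in per_turn) / n_turns if n_turns else 0.0
        ),
        # Token/type rate: average number of words used per distinct word.
        "token_type_rate": len(tokens) / len(types) if types else 0.0,
    }
```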

45
Experimental Design
Dependent Variables
  • Pen-based Gestures
  • Number and types of collected gestures
  • loading of an image
  • scroll
  • zoom
  • running a browser
  • selection of an area (only MM condition)
  • pointing to an area (only MM condition)
  • connecting different areas (only MM condition)

Gestures not marked "only MM condition" are available
in the SO modality too: they are not properly
multimodal inputs, but commands concerning multimedia.
46
Experimental Design
Dependent Variables
  • Dialogue effectiveness
  • number of successful turns
  • ambiguities concerning place names (ski-areas,
    towns, hotels)
  • reached goal: did the client find the hotel which
    meets his/her expense budget?

47
Experimental Design
Dependent Variables
  • Usability self-reports
  • S.U.S. (System Usability Scale) (agents and
    clients)
  • Preference concerning experimental conditions
    (agents)

48
Experimental Design
Material
  • Microphone
  • Pen and tablet
  • 3 maps
  • Two web pages
  • Same translation systems for the two conditions
  • Different instructions for agents and customers

49
Experimental Design
Material - screen
  • Netmeeting window with
  • Push-to-talk button
  • Check-uncheck button
  • Feedback window with
  • Hypothesized string
  • Hypothesized meaning
  • Textual translation of remote speech

50
Experiment - results
Successful dialogues
  • CANCELED DIALOGUES: N = 22
  • client didn't show up: 3
  • interrupted (connection or HLT server crashes): 4
  • connection problems (connection failed): 4
  • the system was not yet frozen: 5
  • incomplete recordings: 6
  • FULLY RECORDED DIALOGUES: N = 28
  • delays due to connection problems (about 20
    minutes): 3
  • interruption and restart during dialogue: 3
  • synthesis crashed 10 minutes before the end of
    the dialogue (but dialogue continued in text
    mode): 2

51
Experiment - results
Speech-related variables
  • No significant differences among conditions as to
    spontaneous events, turn and word figures, or
    dialogue length.
  • One spoken turn every 33 seconds (average) in
    both conditions.
  • Average duration per dialogue
  • SO: 36 min; MM: 34.5 min

52
Experiment - results
Successful turns
  • Real turns (excluding non-understandable cases)
  • SO: 486 (83); MM: 368 (79)
  • Average duration of real turns (from the start of
    turn i to the start of turn ii)
  • SO: 33.78 s; MM: 32.45 s

53
Experiment - results
Repetitions
Percentages of repeated turns, repetitions, and
other turns on real turns
54
Percentages of repeated turns and repetitions of
the repeated turns
55
Percentages of successful turns (yes), partially
successful turns (par) non-successful turns (no)
and false turns (false).
56
Experiment - results
Dialogue fluency
 

Return rate = number of turns / number of returns
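
A possible way to compute the return rate from a topic-labelled dialogue. The slides do not say how returns were counted; this sketch assumes one topic label per turn and counts a return whenever a turn switches back to a topic that was already treated earlier.

```python
def count_returns(topics):
    """Count switches back to a topic that has already been treated."""
    seen, returns, previous = set(), 0, None
    for topic in topics:
        if topic != previous and topic in seen:
            returns += 1
        seen.add(topic)
        previous = topic
    return returns


def return_rate(topics):
    """Number of turns divided by number of returns (infinite if no returns)."""
    returns = count_returns(topics)
    return len(topics) / returns if returns else float("inf")
```

Under this definition a higher return rate means fewer returns per turn, that is, a more fluent dialogue.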
57
Experiment - results
Ambiguities
  • Number of dialogues containing ambiguities
    concerning place names (ski-areas, towns, hotels)
  • yes: MM 2, SO 5
  • no: MM 5, SO 2
  • All ambiguities were immediately solved in MM.
  • Ambiguities were harder to solve in SO:
  • Original: "It is not Panchià, it is Cavalese"
  • Translation: "Pachià not Cavalese"
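
With counts this small, a difference between conditions is hard to separate from chance. The slides do not state which statistical test was used; as an illustration only, a two-sided Fisher's exact test on a 2x2 table of counts can be written with the standard library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def weight(k):
        # Unnormalised hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    extreme = sum(
        weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed
    )
    return extreme / comb(total, col1)


# Ambiguity counts from the slide: rows are yes/no, columns are MM/SO.
p_ambiguity = fisher_exact_two_sided(2, 5, 5, 2)
```

Integer weights are compared before normalising, which avoids floating-point ties when deciding which tables count as at least as extreme.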

58
Experiment - results
Gesture-related variables
  • All gestures but 2 were performed by agents
  • Total gestures
  • SO: 63; MM: 182
  • Few or no deictics used. Gestures mostly accompanied
    speech ("I'll show it to you on the map")

59
Experiment - results
Gesture-related variables
  • Average figures for gestures
  • loading of an image: 2.7 (both MM and SO, no
    significant differences)
  • scroll: 1.7 (both MM and SO, no significant
    differences)
  • zoom: 0
  • running a browser: 0.4 (both MM and SO, no
    significant differences)
  • MM-only gestures: 7.6
  • selection of an area: 4.71
  • pointing on an area: 1.36
  • gestures connecting different areas: 1.4

60
Experiment - results
Gesture-related variables
  • All gestures performed at the end of the turn
  • Typical sequence
  • "I'll show you the ice skating rink on the map"
  • Microphone is switched off
  • Gesture is performed
  • Despite the absence of deictics, gestures were
    always appropriately introduced by language.
  • Hence multilinguality and multimodality are
    suitably integrated.

61
Experiment - results
Goals achievement
  • No differences in the number of dialogues in
    which the client found/didn't find the hotel
    meeting the requirements
  • yes: MM 5, SO 5
  • no: MM 2, SO 2

62
Experiment - results
Usability
  • No differences among conditions as to S.U.S.
    scores.
  • No differences between clients group and agents
    group as to S.U.S. scores.
  • Average score 55
  • System Usability Scale (developed by Digital
    Equipment Co. Ltd, Reading, UK)
  • S.U.S. scores have a range of 0 to 100
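
For reference, the standard SUS scoring procedure maps ten 1-to-5 questionnaire responses onto the 0-100 range: odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5. The function name below is hypothetical.

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten responses on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    total = 0
    for item, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("responses must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (response - 1) if item % 2 == 1 else (5 - response)
    return total * 2.5
```

A respondent answering 3 to every item therefore scores 50, close to the average of 55 reported above.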

63
Experiment - results
Usability
  • Strong preference of agents for multimodal
    interaction
  • Weak preference of agents for the English
    Language

X = strong preference; x = weak preference.
Agents no. 5, 6 and 7 took part in 3 or 4 dialogues
(less than half as many as the other agents); no. 5
and 6 expressed no preferences, and no. 7 expressed
no preference concerning language.
64
Experiment
Conclusions
  • Tendency for dialogues to be shorter in MM than
    in SO
  • Tendency for repeated turns to be fewer in MM
    than in SO
  • If returns can be taken as an indicator of
    dialogue fluency, then there is a tendency for
    fluency to be better in MM than in SO.
  • Moreover, this is even clearer for dialogue
    segments dealing with spatial information.

65
Experiment
Conclusions
  • No, or very rare, spontaneous use of deictics.
  • All MM gestures have been used by agents, with a
    clear preference for area selection.
  • Tendency for MM to exhibit less ambiguity
  • Moreover, when present, the ambiguity was
    immediately solved by resorting to MM resources.
  • However, there doesn't seem to be a difference in
    effectiveness (goal achievement) between SO and
    MM.
  • Strong preference for MM by agents.

66
Experiment
Conclusions
  • Pen-based input increases the probability of
    successful interaction, reducing the impact of
    translation errors
  • The advantages of multimodal input are more
    relevant when spatial information is to be
    conveyed.
  • The greater complexity of the MM system does
    not prevent users from enjoying the interaction
    (and from rating it as friendlier and more
    usable than the SO system)

67
Experiment
Conclusions
  • The presence/absence of multimodality does not
    seem to systematically affect low-level
    linguistic variables
  • This seems to be a consequence of the low number
    of turns with gestures and of the very high
    frequency of bad turns (technical problems)

68
Experiment
Conclusions
  • The number of cases is very low considering the
    number of independent (and confounding)
    variables, negatively affecting the power of the
    statistical tests
  • A between-subjects design is not able to capture
    preferences for one modality with complex
    systems; when users can experience both the SO
    and the MM versions, the preference for the MM
    condition is very strong.