The QALL-ME Benchmark: a Multilingual Resource of Annotated Spoken Requests for Question Answering

Slides: 39
Learn more at: http://www.lrec-conf.org

Transcript and Presenter's Notes



1
The QALL-ME Benchmark: a Multilingual Resource of
Annotated Spoken Requests for Question Answering
  • E. Cabrio, M. Kouylekov, B. Magnini, M. Negri
    (FBK-Irst)
  • L. Hasler, C. Orasan (University of
    Wolverhampton)
  • D. Tomas, J.L. Vicedo (University of Alicante)
  • G. Neumann, C. Weber (DFKI)

2
Outline
  • Motivations and goals
  • QALL-ME Project
  • QALL-ME Benchmark
  • Data collection
  • Translation into English
  • Speech Acts Annotation
  • Question Answering Annotation
  • Annotation of relations
  • Conclusion and Future Work

3
Context: the QALL-ME project
  • QALL-ME (Question Answering Learning technologies
    in a multiLingual and multiModal Environment)
  • an EU-funded project aiming at the realization of
    a shared and distributed infrastructure for
    Question Answering systems on mobile devices
    (e.g. mobile phones).

4
QALL-ME details
  • Reference: FP6 IST-033860
  • Contract type: STREP
  • Start date: October 1st, 2006
  • Duration: 36 months
  • Project funding: 2.82 M euros
  • http://qallme.fbk.eu

Partners:
  • FBK-Irst, Italy
  • DFKI, Germany
  • University of Alicante, Spain
  • University of Wolverhampton, UK
  • Comdata S.p.A., Italy
  • Ubiest S.p.A., Italy
  • Waycom S.r.l., Italy
5
Motivations
  • Providing a dataset of requests beyond factoid
    questions (e.g. verification, procedural)

6
Motivation: beyond factoid
  • has Venezia hotel a restaurant
  • is there a toll free number for the INAIL office
    in via Gazzoletti in Trento
  • VERIFICATION
  • where is the INAIL office and how can I get there
  • how can I get to the pharmacy De Gerloni of
    Trento
  • PROCEDURAL

7
Motivations
  • Providing a dataset of requests beyond factoid
    questions (e.g. verification, procedural)
  • Investigating domain-dependent vs domain-
    independent annotation schemas (QALL-ME project
    domain: cultural events in a town).

8
Challenges
  • Context-aware QA
  • What can I see tonight at cinema
  • Where is the nearest pharmacy
  • Persistent vs dynamic information
  • Multiple sources (database, newspaper, web)

9
Challenges related to events
  • Context-aware QA
  • What can I see tonight at cinema (in Trento)
  • Where is the nearest pharmacy (to piazza Duomo)
  • Persistent vs dynamic information
  • Multiple sources (database, newspaper, web)

10
Motivations
  • Providing a dataset of requests beyond factoid
    questions (e.g. verification, procedural)
  • Investigating domain-dependent vs domain-
    independent annotation schemas (QALL-ME project
    domain: cultural events in a town).
  • Experimenting with the impact of QA annotations
    (e.g. EAT, Expected Answer Type) on spoken
    requests (speech vs QA).

11
QA annotation
  • may I know where the ice stadium of Trento is
    located and at what time it opens

Expected Answer Type:
  • where the ice stadium of Trento is located → LOCATION
  • at what time it opens → DATE
12
Motivations
  • Providing a dataset of requests beyond factoid
    questions (e.g. verification, procedural)
  • Investigating domain-dependent vs domain-
    independent annotation schemas (QALL-ME project
    domain: cultural events in a town).
  • Experimenting with the impact of QA annotations
    (e.g. EAT) on spoken requests (speech vs QA).
  • Investigating the portability of semantic
    annotation across languages.

13
Portability of annotations
  • German: ich möchte wissen wo das Eisstadium von
    Trento ist
  • Italian: potrei sapere dov'è lo stadio del
    ghiaccio di Trento
  • English: may I know where the ice stadium of
    Trento is located
  • Spanish: puedo saber donde esta el estadio de
    hielo de Trento
Expected Answer Type: LOCATION
14
Data collection
  • 14645 questions in four different languages
  • ITALIAN, ENGLISH, GERMAN, SPANISH
  • Domain: cultural events in a town
  • Acquisition
  • Every speaker produces 30 questions, based on 15
    scenarios
  • Using a graphical interface, for each scenario the
    speaker first produces a spontaneous request and
    then reads aloud a predefined written one
  • Questions were acquired over the telephone.

15
Data collection
Language  Utterance type  Words  Utterances  Avg. length (words)
ITALIAN   read            25715  2290        11.2
ITALIAN   spontaneous     33492  2374        14.1
ITALIAN   total           59207  4664        12.7
SPANISH   read            25919  2250        11.52
SPANISH   spontaneous     26327  2250        11.70
SPANISH   total           52246  4500        11.61
ENGLISH   read            26626  2215        12
ENGLISH   spontaneous     36000  2286        15.8
ENGLISH   total           62626  4501        13.9
GERMAN    read            10990  903         12.17
GERMAN    spontaneous     985    77          12.79
GERMAN    total           11975  980         12.22
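As a quick consistency check on the table, the sketch below sums the read and spontaneous rows per language and recomputes the average length as words divided by utterances. It reproduces the "total" rows and the overall figure of 14645 questions given in the data collection slide. The counts are copied from the table; the function name is illustrative only.

```python
# Word and utterance counts copied from the data collection table.
# Keys: (language, utterance type) -> (words, utterances)
COUNTS = {
    ("ITALIAN", "read"): (25715, 2290),
    ("ITALIAN", "spontaneous"): (33492, 2374),
    ("SPANISH", "read"): (25919, 2250),
    ("SPANISH", "spontaneous"): (26327, 2250),
    ("ENGLISH", "read"): (26626, 2215),
    ("ENGLISH", "spontaneous"): (36000, 2286),
    ("GERMAN", "read"): (10990, 903),
    ("GERMAN", "spontaneous"): (985, 77),
}


def language_totals(counts):
    """Sum read + spontaneous rows per language; add avg. length in words."""
    sums = {}
    for (lang, _kind), (words, utts) in counts.items():
        w, u = sums.get(lang, (0, 0))
        sums[lang] = (w + words, u + utts)
    return {lang: (w, u, round(w / u, 2)) for lang, (w, u) in sums.items()}


totals = language_totals(COUNTS)
for lang, (words, utts, avg_len) in totals.items():
    print(f"{lang:8} {words:6} words {utts:5} utterances avg {avg_len}")

# Total number of utterances over all four languages.
print(sum(u for _, u, _ in totals.values()))  # 14645
```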
16
Data acquisition features
Language  Speakers  Male  Female  Non-native  Total speech duration  Avg. utterance duration (s)
IT        161       68    93      12          9h20                   7
SP        150       109   41      8           16h4                   5.14
EN        113       46    63      21          7h35                   6.1
GER       9         4     5       2           1h21                   4.9
17
Transcription
  • All the audio files acquired from a speaker were
    joined together and orthographically transcribed
    using the tool Transcriber
    (http://trans.sourceforge.net).
  • Because the scenarios are domain-restricted, they
    sometimes led to the same utterance (identical
    word sequence); however, the number of such
    repetitions is small.
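Treating two utterances as "the same" when their word sequences match, repetitions can be counted with a frequency table. This is a minimal sketch: the lowercasing and whitespace tokenization are assumptions, not the project's documented procedure, and the sample transcriptions are hypothetical.

```python
from collections import Counter


def repetition_stats(utterances):
    """Return (distinct word sequences, utterances whose sequence repeats).

    Two utterances count as the same when their lowercased,
    whitespace-tokenized word sequences match exactly (an assumption).
    """
    keys = [tuple(u.lower().split()) for u in utterances]
    freq = Counter(keys)
    distinct = len(freq)
    repeated = sum(n for n in freq.values() if n > 1)
    return distinct, repeated


# Hypothetical transcriptions, for illustration only.
sample = [
    "where is the nearest pharmacy",
    "Where is the  nearest pharmacy",
    "what can I see tonight at cinema",
]
print(repetition_stats(sample))  # (2, 2)
```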

18
Translation into English
  • Translations were made by simulating the real
    situation of an English speaker visiting a foreign
    city, so local names are kept in Italian.
  • E.g.
  • what is the address of museo dell'aeronautica
    Gianni Caproni
  • Future work: all data collected for one language
    will be translated into the other three languages

19
Annotation of speech acts
  • As a starting point for further analyses, it is
    important to separate within an utterance (each
    speaker's turn) what has to be interpreted as the
    actual request from what does not need an answer.
  • hallo I am in Trento and I would like to visit a
    church in the centre of the town I would like to
    know the name and the location of one of these
    churches thanks

from the QALL-ME benchmark
20
Annotation of speech acts
  • As a starting point for further analyses, it is
    important to separate within an utterance (each
    speaker's turn) what has to be interpreted as the
    actual request from what does not need an answer.
  • to greet
  • hallo I am in Trento and I would like to visit a
    church in the centre of the town I would like to
    know the name and the location of one of these
    churches thanks

from the QALL-ME benchmark
21
Annotation of speech acts
  • As a starting point for further analyses, it is
    important to separate within an utterance (each
    speaker's turn) what has to be interpreted as the
    actual request from what does not need an answer.

  • to contextualise
  • hallo I am in Trento and I would like to visit a
    church in the centre of the town I would like to
    know the name and the location of one of these
    churches thanks

from the QALL-ME benchmark
22
Annotation of speech acts
  • As a starting point for further analyses, it is
    important to separate within an utterance (each
    speaker's turn) what has to be interpreted as the
    actual request from what does not need an answer.
  • hallo I am in Trento and I would like to visit a
    church in the centre of the town I would like to
    know the name and the location of one of these
    churches thanks
  • to ask

from the QALL-ME benchmark
23
Annotation of speech acts
  • As a starting point for further analyses, it is
    important to separate within an utterance (each
    speaker's turn) what has to be interpreted as the
    actual request from what does not need an answer.
  • hallo I am in Trento and I would like to visit a
    church in the centre of the town I would like to
    know the name and the location of one of these
    churches thanks

  • to thank

from the QALL-ME benchmark
24
Annotation of speech acts
UTTERANCE
  • REQUESTS
      - DIRECT: wh-questions; requests introduced by
        "Could you tell me" or "May I know"; requests
        pronounced with rising intonation
      - INDIRECT: requests formulated in indirect or
        implicit ways
  • NON REQUESTS: all the utterances used by the
    speaker to introduce himself, to contextualize
    himself or his request in time and space, to
    thank, or to greet
      - ASSERT
      - THANKS
      - GREETINGS
      - OTHER
For our purposes, we used CLaRK, an XML-based System
for Corpora Development
(http://www.bultreebank.org/clark/index.html).
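The scheme splits every utterance into labelled segments and keeps only the request segments as input to QA. The sketch below encodes that split as two label sets and filters one utterance. The segmentation of the "hallo ... thanks" example is an illustrative reading of slides 20 to 23 (greet, contextualise, ask, thank), not an annotation copied from the corpus, and the function name is hypothetical.

```python
# Speech-act labels from the annotation scheme, grouped by whether the
# segment is a request that needs an answer.
REQUEST_LABELS = {"DIRECT", "INDIRECT"}
NON_REQUEST_LABELS = {"ASSERT", "THANKS", "GREETINGS", "OTHER"}


def requests_only(segments):
    """Return the text of the segments labelled as requests.

    `segments` is a list of (label, text) pairs covering one utterance.
    """
    out = []
    for label, text in segments:
        if label not in REQUEST_LABELS | NON_REQUEST_LABELS:
            raise ValueError(f"unknown speech-act label: {label}")
        if label in REQUEST_LABELS:
            out.append(text)
    return out


# Illustrative segmentation of the example utterance from the slides.
utterance = [
    ("GREETINGS", "hallo"),
    ("ASSERT", "I am in Trento and I would like to visit a church "
               "in the centre of the town"),
    ("INDIRECT", "I would like to know the name and the location "
                 "of one of these churches"),
    ("THANKS", "thanks"),
]
print(requests_only(utterance))
```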
25
Agreement (speech acts)
  • Inter-annotator agreement (calculated on 1000
    randomly picked sentences) for ITALIAN
  • Dice coefficient: $D = \frac{2C}{A + B}$
  • C = number of common annotations
  • A, B = number of annotations provided by the
    first and the second annotator

Category           Agreement (Dice × 100)
Overall agreement  96.1
ASSERT             85.5
DIRECT             97.88
INDIRECT           97.33
OTHER              76.47
THANKS             98.51
GREETINGS          99.49
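The Dice coefficient 2C/(A + B) can be computed directly from two annotators' outputs. In this sketch an annotation is assumed to be a (start, end, label) tuple and two annotations count as "common" only when all three fields match. The slides do not specify the matching criterion, so that part is an assumption. Per-category scores like those in the table follow by restricting both sets to one label first. The spans are hypothetical token offsets.

```python
def dice(annotations_a, annotations_b):
    """Dice coefficient 2C / (A + B) between two annotators.

    Annotations are hashable (start, end, label) tuples (an assumption);
    C counts the annotations that both annotators produced identically.
    """
    a, b = set(annotations_a), set(annotations_b)
    if not a and not b:
        return 1.0  # two empty annotations agree trivially
    common = len(a & b)
    return 2 * common / (len(a) + len(b))


def dice_by_label(annotations_a, annotations_b, label):
    """Dice coefficient restricted to a single speech-act label."""
    return dice([x for x in annotations_a if x[2] == label],
                [x for x in annotations_b if x[2] == label])


# Hypothetical spans (token offsets) for one utterance.
ann1 = [(0, 1, "GREETINGS"), (1, 18, "ASSERT"), (18, 31, "INDIRECT"), (31, 32, "THANKS")]
ann2 = [(0, 1, "GREETINGS"), (1, 18, "ASSERT"), (18, 31, "DIRECT"), (31, 32, "THANKS")]
print(dice(ann1, ann2))  # 0.75
print(dice_by_label(ann1, ann2, "ASSERT"))  # 1.0
```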
26
Expected Answer Type
  • For EAT annotation we propose the following
    scheme:
  • EAT
      - type, extracted from Graesser's (1988)
        taxonomy: PROCEDURAL, VERIFICATION, FACTOID,
        DEFINITION/DESCRIPTION
      - DOMAIN-INDEPENDENT category (Sekine's Extended
        Named Entity (ENE) hierarchy)
      - DOMAIN-SPECIFIC category (QALL-ME ontology)

27
Sekine's ENE vs QALL-ME ontology
what is the restaurant in via Brennero in Trento?
[Figure: the EAT of this question in Sekine's ENE hierarchy and in the QALL-ME ontology]
28
Sekine's ENE vs QALL-ME ontology
can you give me the name of the pharmacy in
piazza Pasi 20 in Trento?
[Figure: the EAT of this question in Sekine's ENE hierarchy and in the QALL-ME ontology]
29
Annotation of Relations
  • Relations among entities convey and complete the
    context in which a specific request has to be
    interpreted
  • At what time is the movie il grande capo
    beginning tomorrow afternoon at Vittoria cinema
  • Rel1 (MOVIE, DATE)
  • Rel2 (MOVIE, STARTINGHOUR)
  • Rel3 (MOVIE, CINEMA)
  • 10 of the Italian questions (referring to the
    Cinema/Movie domain) have been annotated with the
    12 relations holding in that domain (QALL-ME
    ontology).
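One way to see how relations "complete the context" is to bind each relation argument to a mention in the question. The arguments that stay unbound are the values the speaker is asking for, and the bound ones are constraints on the answer. The sketch below does this for the example question. The three relation signatures come from the slide; the mention strings and the None convention for the requested value are hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Relation signatures from the slide: (name, first type, second type).
RELATIONS: List[Tuple[str, str, str]] = [
    ("Rel1", "MOVIE", "DATE"),
    ("Rel2", "MOVIE", "STARTINGHOUR"),
    ("Rel3", "MOVIE", "CINEMA"),
]

# Hypothetical mentions for "At what time is the movie il grande capo
# beginning tomorrow afternoon at Vittoria cinema"; None marks the value
# being asked for, which has no mention in the text.
MENTIONS: Dict[str, Optional[str]] = {
    "MOVIE": "il grande capo",
    "DATE": "tomorrow afternoon",
    "STARTINGHOUR": None,
    "CINEMA": "Vittoria cinema",
}


def open_relations(relations, mentions):
    """Relations with an unfilled argument: what the request asks for."""
    return [name for name, t1, t2 in relations
            if mentions.get(t1) is None or mentions.get(t2) is None]


def context(relations, mentions):
    """Fully filled relations: constraints the answer must satisfy."""
    return [(name, mentions[t1], mentions[t2]) for name, t1, t2 in relations
            if mentions.get(t1) is not None and mentions.get(t2) is not None]


print(open_relations(RELATIONS, MENTIONS))  # ['Rel2']
print(context(RELATIONS, MENTIONS))
```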

30
Status of the benchmark
Present situation and tentative schedule
Language  Audio        Transcription  Translation  Speech acts  EAT (Sekine)  EAT (ontology)
ITALIAN   X            X              X            X            X             X
SPANISH   X            X              X            X            X             in progress
ENGLISH   X            X              ---          in progress  in progress   in progress
GERMAN    in progress  in progress    in progress  in progress  in progress   in progress
The QALL-ME benchmark is being made incrementally
available at the project website
(http://qallme.fbk.eu).
31
Future work
  • Additional annotation layers will be considered
  • Focus of the question
  • Multiwords
  • Named Entities
  • Normalized Temporal Expressions

32
Conclusions
  • QALL-ME benchmark: a multilingual resource (for
    Italian, Spanish, English and German) of
    annotated spoken requests in the tourism domain.
  • Beyond factoid
  • Context-aware QA and dynamic changes
  • QA annotation on spoken requests
  • Portability of semantic annotation
  • Reference resource, useful to train and test
    ML-based QA systems

33
  • Thank you
  • cabrio, kouylekov, magnini, negri_at_fbk.eu
  • L.Hasler, c.orasan_at_wlv.ac.uk
  • tomas, vicedo_at_disi.ua.es
  • neumann, cowe01_at_dfki.de
  • Project website: http://qallme.fbk.eu

34
Acquisition scenarios
Each scenario is described by the following fields:
  • SubDomain
  • DesiredOutput
  • MandatoryItems
  • OptionalItems
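A scenario can be modelled as a small record with the four fields listed on the slide. The sketch below uses a Python dataclass and renders a short prompt for the speaker. The field types, the prompt wording and the example values are assumptions; the slide gives only the field names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Scenario:
    """One acquisition scenario, mirroring the field names on the slide.

    Types and prompt format are assumptions for illustration.
    """
    sub_domain: str
    desired_output: str
    mandatory_items: List[str]
    optional_items: List[str] = field(default_factory=list)

    def prompt(self) -> str:
        """Render a short instruction for the speaker."""
        parts = [f"Ask about: {self.desired_output} ({self.sub_domain})",
                 "Mention: " + ", ".join(self.mandatory_items)]
        if self.optional_items:
            parts.append("Optionally mention: " + ", ".join(self.optional_items))
        return "\n".join(parts)


# Hypothetical scenario.
s = Scenario("Cinema/Movie", "starting time", ["movie title", "cinema name"], ["date"])
print(s.prompt())
```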
35
Example from the corpus
  <question id="3118">
    <text>buongiorno chiamo da Trento avrei bisogno
    dell'indirizzo del teatro Auditorium per un
    concerto di Salvatore Accardo del 17 gennaio
    2007</text>
    <analysis>
      <greetings>buongiorno</greetings>
      <assert>chiamo da Trento</assert>
      <indirect>avrei bisogno dell'indirizzo del teatro
      Auditorium per un concerto di Salvatore Accardo
      del 17 gennaio 2007</indirect>
    </analysis>
    <reference>
      <ref>
        <speaker>spk075_27mar07comd_it_sid023</speaker>
        <turn>6</turn>
        <originalString>buongiorno chiamo da Trento ho mmm
        avrei bisogno dell'indirizzo del teatro Auditorium
        per un eh concerto di Salvatore Accardo del 17
        gennaio 2007 b</originalString>
      </ref>
    </reference>
    <translation>good morning I am calling from Trento I
    would like to know the address of Auditorium theatre
    for Salvatore Accardo's concert on 17th January
    2007</translation>
  </question>
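The corpus entry above can be read with Python's standard `xml.etree.ElementTree`. This sketch separates the request segments from the other speech acts inside `<analysis>`. It assumes the element names shown in the example, treats `direct` and `indirect` as the request tags, and omits the `<reference>` element for brevity. It is not an official reader for the benchmark files.

```python
import xml.etree.ElementTree as ET

# The corpus example, minus <reference>; element names as on the slide.
SAMPLE = """
<question id="3118">
  <text>buongiorno chiamo da Trento avrei bisogno dell'indirizzo del teatro Auditorium per un concerto di Salvatore Accardo del 17 gennaio 2007</text>
  <analysis>
    <greetings>buongiorno</greetings>
    <assert>chiamo da Trento</assert>
    <indirect>avrei bisogno dell'indirizzo del teatro Auditorium per un concerto di Salvatore Accardo del 17 gennaio 2007</indirect>
  </analysis>
  <translation>good morning I am calling from Trento I would like to know the address of Auditorium theatre for Salvatore Accardo's concert on 17th January 2007</translation>
</question>
"""

REQUEST_TAGS = {"direct", "indirect"}  # assumed lowercase tag names


def parse_question(xml_text):
    """Return (id, request segments, other segments, English translation)."""
    root = ET.fromstring(xml_text.strip())
    requests, others = [], []
    for seg in root.find("analysis"):  # iterate over the child elements
        target = requests if seg.tag in REQUEST_TAGS else others
        target.append((seg.tag, (seg.text or "").strip()))
    translation = root.findtext("translation", default="")
    return root.get("id"), requests, others, translation


qid, requests, others, translation = parse_question(SAMPLE)
print(qid)     # 3118
print(others)  # [('greetings', 'buongiorno'), ('assert', 'chiamo da Trento')]
```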

36
Expected Answer Type (1)
  • The semantic category associated with the desired
    answer, chosen out of a predefined set of labels
    (e.g. PERSON, LOCATION, DATE).
  • How many colors are in the Italian flag → QUANTITY
  • Where is the Uffizi museum → LOCATION
  • Most QA systems described in the literature rely
    heavily on EAT information, at least in the Answer
    Extraction phase, to narrow the search space of
    candidate answers.
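How the EAT narrows the candidate space can be shown with a simple filter. Candidates whose semantic category differs from the expected answer type are discarded before any ranking. The sketch assumes an upstream tagger has already assigned a category to each candidate; the candidate list for "Where is the Uffizi museum" is hypothetical.

```python
from typing import List, Tuple

# (candidate answer text, semantic category from an assumed upstream tagger)
Candidate = Tuple[str, str]


def filter_by_eat(candidates: List[Candidate], eat: str) -> List[str]:
    """Keep only candidates whose category matches the Expected Answer Type."""
    return [text for text, category in candidates if category == eat]


# Hypothetical candidates extracted for "Where is the Uffizi museum".
candidates = [
    ("Florence", "LOCATION"),
    ("1560", "DATE"),
    ("Giorgio Vasari", "PERSON"),
    ("Piazzale degli Uffizi", "LOCATION"),
]
print(filter_by_eat(candidates, "LOCATION"))  # ['Florence', 'Piazzale degli Uffizi']
```

Exact label matching is the simplest choice; a system using a hierarchy such as Sekine's ENE could instead accept any subtype of the expected label.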

37
Example from the corpus
What are the address and the telephone number of
Venezia hotel in Trento
<eats>
  <EAT type="FACTOID" sekine="ADDRESS_OTHER" qallme="PostalAddress" eaq="one"/>
  <EAT type="FACTOID" sekine="ADDRESS_OTHER" qallme="Contact" eaq="one"/>
</eats>
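A question with two requested items carries two `<EAT>` elements. Each one combines the Graesser-style type, a Sekine ENE label, a QALL-ME ontology class and an expected answer quantifier. The sketch reads them into dictionaries with `xml.etree.ElementTree`. The attribute names follow the example as reconstructed above and should be treated as an assumption about the file format.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<eats>
  <EAT type="FACTOID" sekine="ADDRESS_OTHER" qallme="PostalAddress" eaq="one"/>
  <EAT type="FACTOID" sekine="ADDRESS_OTHER" qallme="Contact" eaq="one"/>
</eats>"""


def read_eats(xml_text):
    """Return one dict per <EAT> element with its four attributes."""
    root = ET.fromstring(xml_text)
    return [
        {key: eat.get(key) for key in ("type", "sekine", "qallme", "eaq")}
        for eat in root.iter("EAT")
    ]


for eat in read_eats(SAMPLE):
    print(eat["type"], eat["sekine"], eat["qallme"], eat["eaq"])
```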
38
Expected Answer Quantifier
  • Attribute of the EAT that specifies the number of
    expected items in the answer.
  • I would like to know the three colors of the
    Italian flag
  • which movies are on tonight at Multisala Modena
    → all
  • The possible values are: one, at least one, all,
    n.
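The quantifier can be used to check whether an answer list has the expected size. The sketch below uses one possible reading of the four values: "one" means exactly one item, "at least one" means one or more, "n" means exactly n items, and "all" means every item satisfying the request, whose count the caller must supply. These semantics, and treating the Italian-flag request as the "n" case with n = 3, are assumptions; the slide only lists the values.

```python
from typing import List, Optional


def satisfies_eaq(answers: List[str], eaq: str, n: Optional[int] = None,
                  total_available: Optional[int] = None) -> bool:
    """Check an answer list against an Expected Answer Quantifier.

    Assumed semantics: "one" = exactly one item, "at least one" = one or
    more, "n" = exactly n items, "all" = as many items as satisfy the
    request (total_available).
    """
    k = len(answers)
    if eaq == "one":
        return k == 1
    if eaq == "at least one":
        return k >= 1
    if eaq == "n":
        if n is None:
            raise ValueError('EAQ "n" needs an explicit n')
        return k == n
    if eaq == "all":
        if total_available is None:
            raise ValueError('EAQ "all" needs total_available')
        return k == total_available
    raise ValueError(f"unknown EAQ value: {eaq}")


flag = ["green", "white", "red"]
print(satisfies_eaq(flag, "n", n=3))                      # True
print(satisfies_eaq(flag[:1], "all", total_available=3))  # False
```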