Title: Annotating Attribution Relations Towards an Italian Discourse Treebank
1 Annotating Attribution Relations: Towards an
Italian Discourse Treebank
- Silvia Pareti
- Irina Prodanof
2 - Introduction
- Related works
- Goal and methodology
- Proposed scheme
- Some issues
- Pilot annotation
- Attribution figures
- Conclusion and future work
3 ATTRIBUTION in a text is the act of ascribing to an entity
the ownership of an attitude towards some linguistic material,
i.e. the text itself, a portion of it, or their semantic content.
Recognising attribution relations is fundamental
for Information Extraction, (Multi Perspective)
Question Answering, Opinion Mining etc.
Different sources can differ in bias and
reliability and this deeply affects the way we
perceive information.
4 Why should we identify the source of a portion of
text?
ODQA (Open-Domain Question Answering) draws on NLP techniques,
Information Retrieval and Language Generation. Its steps:
- Question comprehension
- Finding text fragments with the answer
- Answer selection
- Answer generation
Knowing the source of a text fragment makes it possible to:
- display only authoritative answers
- collect different opinions, hearsay
- discard second-hand or anonymous information
- retrieve statements from a specific source over
a given time span
5 È meglio vaccinarsi per l'influenza suina?
Is it better to get the swine flu vaccine?
6 È meglio vaccinarsi per l'influenza suina?
Is it better to get the swine flu vaccine?
"The vaccine is useless."
- orsetta90: blogger, not authoritative and not verifiable
source
"Everyone should get the vaccine."
- Novartis: pharmaceutical industry, authoritative but
biased
"Only persons at higher risk of complications from
influenza should get the vaccine."
- Doctor association
7 Opinion holder identification projects
- Bethard et al. (2004): consider only opinion
propositions (source agent)
- Kim and Hovy (2005): identify all possible opinion
holders, agentive and NPs (no pronouns)
- Stoyanov and Cardie (2006): identify NP sources
- Choi et al. (2006)
They do not consider implicit or multiple sources and
test their system on the OPQA corpus.
Opinion recognition has limited coverage and
unsatisfactory precision (60-70%).
8 PDTB (Prasad et al., 2007): assertions, beliefs,
facts, eventualities
- Attribution of discourse connectives and their
arguments only
Opinion Corpus (Wiebe, 2002): speech acts and private
states (opinions, beliefs, thoughts, feelings,
emotions, goals, evaluations and judgements)
- Attribution considered as an intra-sentential
phenomenon
GraphBank (Wolf and Gibson, 2005): attribution
included as a directed coherence relation
(satellite to nucleus)
- Attribution of discourse segments
9 Designing the addition of a level of annotation
for attribution to the ISST (Italian
Syntactic-Semantic Treebank) corpus.
- more complete and independent analysis of
attribution
- development of an annotation schema
- pilot annotation of a portion of the ISST
- partial listing of possible attribution cues
- evaluation
10 - Selection of features to be annotated
- Design of the schema
- Annotation requirement definition
- Match tool characteristics and annotation
requirements
- Setting the tool
- Scope definition
- Identification of characteristics and issues
- Evaluation of the schema applicability
- Pilot annotation and detection of issues
- Linguistic resource creation and release
11 Markables
Relation:
- SOURCE(S): noun phrase, adjective, prepositional phrase
- CUE: verb, noun, adjective, preposition,
prepositional group, graphic marker
- CONTENT(S): word, phrase, clause, sentence, entire
article
- (SUPPLEMENT): cue modifier, indirect object, source of
source, event specification
12 Features
Attribution type
Source type
Factuality
Scopal change
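The markables and features listed above could be held in a small data model. The following is a minimal sketch, not the schema actually implemented by the authors: the class names, the token-index span convention and the helper `markable_text` are assumptions made for illustration, and the feature values shown are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumption: a span is a (start, end) pair of token indices, end exclusive.
Span = Tuple[int, int]


@dataclass
class Markable:
    role: str          # "source", "cue", "content" or "supplement"
    spans: List[Span]  # several spans allow discontinuous selections


@dataclass
class AttributionRelation:
    cue: Markable
    contents: List[Markable]
    sources: List[Markable] = field(default_factory=list)
    supplement: Optional[Markable] = None
    # Feature names follow the slide; the value sets are left open here.
    attribution_type: Optional[str] = None
    source_type: Optional[str] = None
    factuality: Optional[str] = None
    scopal_change: bool = False


def markable_text(markable: Markable, tokens: List[str]) -> str:
    """Join the tokens covered by a markable's spans."""
    return " ".join(" ".join(tokens[s:e]) for s, e in markable.spans)


# Example sentence taken from the slides (ISST cs030), naively tokenised.
tokens = ["Vi", "daremo", "le", "statistiche", "alla", "fine",
          "promettono", "i", "generali", "croati"]
relation = AttributionRelation(
    cue=Markable("cue", [(6, 7)]),
    contents=[Markable("content", [(0, 6)])],
    sources=[Markable("source", [(7, 10)])],
    attribution_type="assertion",  # illustrative value only
    source_type="OTHER",           # illustrative value only
)
print(markable_text(relation.cue, tokens))
```

Keeping sources and contents as lists leaves room for the multiple-source and multiple-content cases discussed in the issues slides.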
13 Source
- Nested attribution
- Multiple sources
- Source of source
- Pronominal and bridging anaphora
14 Some issues
Source
Sue said that Mary believes (that Gore won the
election).
Sources: writer; writer, Sue; (writer, Sue, Mary)
(Wiebe, 2002: 5, with the addition of brackets)
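The nested sources in this example grow by one entity per level of embedding. A minimal sketch of that idea follows; the function name `nested_sources` and the tuple representation are assumptions for illustration, not part of the proposed schema.

```python
from typing import List, Tuple


def nested_sources(chain: List[str], writer: str = "writer") -> List[Tuple[str, ...]]:
    """Return the nested source for each embedding level, outermost first."""
    levels = [(writer,)]
    for speaker in chain:
        # Each embedded attribution extends the chain of the level above it.
        levels.append(levels[-1] + (speaker,))
    return levels


# "Sue said that Mary believes that Gore won the election."
levels = nested_sources(["Sue", "Mary"])
for level in levels:
    print(", ".join(level))
```

The whole sentence is attributed to the writer, "Mary believes ..." to the writer via Sue, and "Gore won the election" to the writer via Sue via Mary.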
Blinder, secondo voci riferite dal New York
Times, sperava di succedere al presidente
Greenspan quando a marzo scadrà la sua nomina.
(ISST re070)
Blinder, according to rumours reported by the New York
Times, hoped to succeed President Greenspan when his
appointment expires in March.
15 Some issues
Source
Tutti, incluse le autorità, conoscono la loro
provenienza, ma nessuno dice e fa nulla per
prevenire il massacro di capi selvatici.
(cs.morph020)
Everyone, including the authorities, knows where they
come from, but no one says or does anything to prevent
the massacre of wild animals.
16 Some issues
Source
(Ø) Ho saputo della squalifica di Garciano da
Maurizio Damilano, vi giuro, non pensavo di
arrivare primo. (ISST cs071)
(I) heard of Garciano's disqualification from Maurizio
Damilano; I swear, I didn't think I would come first.
Poi però, tramite la figlia che sta a Santiago,
prima limita la portata del colloquio con Gaston
Salvatore (non è stata una vera intervista, solo
una conversazione), poi smentisce. (ISST
period005)
Afterwards, however, through the daughter who lives in
Santiago, (she) first plays down the importance of the
talk with Gaston Salvatore (it wasn't a real interview,
just a conversation), then denies it.
17 Some issues
Source
La Fermenta, a sentire l'arabo, è organizzata in
modo che oggi consegue un utile pari al 35 per
cento del fatturato. Questo il vero traguardo che
dovrà nel tempo raggiungere la Pierrel. Ma come?
Con tagli di mano d'opera? Nemmeno per sogno,
dice El Sayed. (ISST els001)
Fermenta, according to the Arab, is organised so that it
currently earns a profit equal to 35 per cent of its
turnover. This is the real goal that Pierrel will have to
reach over time. But how? By cutting the workforce?
No way, says El Sayed.
18 Some issues
Cue
- Type definition
- Multimodal cues
- Scopal change
19 Some issues
Cue
Eventuality
Assertion
"Vi daremo le statistiche alla fine", promettono
i generali croati. (ISST cs030)
"We'll give you the statistics at the end", promise the
Croatian generals.
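assertion              belief                 facts                    eventualities
affermare (to state)   credere (to believe)   ricordare (to remember)  permettere (to allow)
sostenere (to claim)   pensare (to think)     sapere (to know)         sostenere (to support)
osservare (to remark)  dubitare (to doubt)    osservare (to observe)   desiderare (to wish)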
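Reading the verb list column by column (an assumption about the original slide layout), some cues appear under more than one attribution type, which is one reason type definition is listed as an issue. A minimal sketch that inverts the list and reports the ambiguous cues; the dictionary name `CUES_BY_TYPE` is hypothetical.

```python
from collections import defaultdict

# Cue verbs per attribution type, as read from the slide (column-wise reading
# is an assumption).
CUES_BY_TYPE = {
    "assertion": ["affermare", "sostenere", "osservare"],
    "belief": ["credere", "pensare", "dubitare"],
    "fact": ["ricordare", "sapere", "osservare"],
    "eventuality": ["permettere", "sostenere", "desiderare"],
}

# Invert the mapping: for each verb, collect every type it can signal.
types_by_cue = defaultdict(list)
for attribution_type, verbs in CUES_BY_TYPE.items():
    for verb in verbs:
        types_by_cue[verb].append(attribution_type)

# A cue is ambiguous when the lexicon alone cannot decide its type.
ambiguous = {verb: types for verb, types in types_by_cue.items() if len(types) > 1}
for verb, types in ambiguous.items():
    print(verb, types)
```

For such cues a lexicon lookup is not enough, and the type has to be decided from the context of each occurrence.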
20 Some issues
Cue
Arlacchi sorride: "Pura paranoia politica. Non ho
partecipato ai lavori solo a causa di un impegno
privato." (ISST re095)
Arlacchi smiles: "Pure political paranoia. The only
reason I didn't take part in the proceedings was a
private engagement."
"Sì - si adombra Matt - Un ruolo interessante...
con Tarantino eravamo a buon punto, poi è
arrivato Bruce. I suoi film incassano un po' più
dei miei, no? Hanno scelto lui..." (ISST cs060)
"Yes - Matt's face darkens - An interesting role...
with Tarantino we were at a good point, then Bruce
arrived. His films gross a bit more than mine, right?
They chose him..."
21 Some issues
Cue
Strano destino, quello di Civitavecchia: finire
spesso, troppo spesso, sulle pagine dei giornali
per eventi misteriosi, oppure per fatti che
nessuno vorrebbe accadessero nella sua città.
(ISST cs090)
Strange destiny, that of Civitavecchia: ending up often,
too often, in the news because of mysterious events, or
because of events that no one would like to happen in
their town.
→ tutti vorrebbero non accadessero (everyone would like
them not to happen)
22 Some issues
Content
- Multiple contents
- Discontinuous spans
- Event anaphora
23 Some issues
Content
(Ø) Ho detto che ero dalla sua parte e che
ritenevo giusta la sua protesta. (ISST cs063)
(I) said that I was on his side and that I considered
his protest justified.
24 Some issues
Content
"There's no question that some of those workers
and managers contracted asbestos-related
diseases," said Darrell Phillips, vice president
of human resources for Hollingsworth & Vose.
"But you have to recognize that these events
took place 35 years ago. It has no bearing on our
work force today." (PDTB 0003)
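In this example the attributed material is interrupted by the cue and the source, so a single start-end pair cannot describe the content. A minimal sketch of a discontinuous content as a list of character spans follows; treating the two quoted stretches as one content, and the helper name `find_span`, are assumptions made for illustration.

```python
from typing import Tuple

first_part = ("There's no question that some of those workers and managers "
              "contracted asbestos-related diseases,")
attribution = ("said Darrell Phillips, vice president of human resources "
               "for Hollingsworth & Vose.")
second_part = ("But you have to recognize that these events took place "
               "35 years ago. It has no bearing on our work force today.")
text = f'"{first_part}" {attribution} "{second_part}"'


def find_span(haystack: str, needle: str) -> Tuple[int, int]:
    """Return (start, end) character offsets of the first match, end exclusive."""
    start = haystack.index(needle)
    return (start, start + len(needle))


# One content markable made of two spans, with the cue lying between them.
content_spans = [find_span(text, first_part), find_span(text, second_part)]
cue_span = find_span(text, "said")

for start, end in content_spans:
    print(start, end, text[start:end][:30])
```

An annotation tool used for this scheme therefore has to support selecting several non-adjacent stretches of text as one markable.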
25 Some issues
Content
"L'umanità deve proclamare uno storico sciopero
ad oltranza fino alla distruzione di tutti gli
armamenti nucleari". Le parole registrate di
Gheddafi, (ISST cs039)
"Humanity must call a historic all-out strike until all
nuclear weapons are destroyed". Gheddafi's recorded
words,
26 Tool requirements
Discontinuous text selection
Nested selection
Relations
Multiple sources/contents
Pre-defined values selection
Display customizability
Ease of setting a scheme
Ease of annotation
XML stand-off output
Reference to word index
Tools: GATE, Knowtator, Annotator, MMAX2, Callisto
MMAX2:
- Base Data (original text)
- Scheme (annotation schema)
- Style (display structure)
- Customization (preferences)
- Markable (annotation)
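Two of the requirements above, XML stand-off output and reference to word index, can be illustrated with a short sketch. The element and attribute names below (`words`, `word`, `markables`, `markable`, `role`, `span`, `relation`) are hypothetical and chosen for illustration; they are not claimed to match the file format of MMAX2 or of any other tool listed.

```python
import xml.etree.ElementTree as ET

# Example sentence from the slides (ISST cs030), naively tokenised.
tokens = ["Vi", "daremo", "le", "statistiche", "alla", "fine",
          "promettono", "i", "generali", "croati"]

# Base data: one element per token, each with a stable word id.
words = ET.Element("words")
for index, token in enumerate(tokens, start=1):
    word = ET.SubElement(words, "word", id=f"word_{index}")
    word.text = token

# Stand-off layer: markables point at word ids instead of copying the text.
markables = ET.Element("markables")


def add_markable(role: str, first: int, last: int, relation: str) -> ET.Element:
    """Add a markable covering words first..last (1-based, inclusive)."""
    span = f"word_{first}" if first == last else f"word_{first}..word_{last}"
    return ET.SubElement(markables, "markable",
                         id=f"markable_{len(markables) + 1}",
                         role=role, span=span, relation=relation)


add_markable("content", 1, 6, "rel_1")
add_markable("cue", 7, 7, "rel_1")
add_markable("source", 8, 10, "rel_1")

print(ET.tostring(markables, encoding="unicode"))
```

Because the annotation layer only stores word references, the original text stays untouched and several annotation levels can coexist over the same base data.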
- Subcorpus
- 50 articles from the ISST
- balanced
- 37,000 word tokens
- 461 attribution relations
28 Markables
CUE 461
SOURCE 329
CONTENT 468
Source type
WRITER 23
OTHER 375
ARBITRARY 62
MIXED 1
Scopal change
NONE 429
SCOPAL-CHANGE 7
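The raw counts are easier to compare as proportions. The sketch below only restates the source-type figures given above as percentages; the variable names are illustrative.

```python
# Source-type counts as reported on the slide.
source_type_counts = {"WRITER": 23, "OTHER": 375, "ARBITRARY": 62, "MIXED": 1}

# The four counts add up to the number of annotated relations (one cue each).
total = sum(source_type_counts.values())

# Share of each source type, in percent, rounded to one decimal place.
shares = {label: round(100 * count / total, 1)
          for label, count in source_type_counts.items()}

for label, share in shares.items():
    print(f"{label:<10}{share:>6}%")
```

The source-type counts sum to 461, the number of cues, whereas the scopal-change counts sum to 436; the slides do not explain the difference.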
29 Attribution type and Factuality
30 - Achievements
- more complete analysis of attribution
- definition of an annotation schema
- identification of issues and possible solutions
- partial listing of possible attribution cues
- annotation of a portion of the ISST corpus
- Future work
- testing of the inter-annotator agreement for the
proposed schema
- redefinition of problematic or underspecified
attributes
- annotation of the whole ISST corpus
- expanding the list of attribution cues
- relation between attribution and discourse
connectives / anaphora /
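The slides do not say how inter-annotator agreement would be measured. One common choice for categorical features such as source type is Cohen's kappa, sketched below under that assumption; the two label sequences are invented toy data, not results from the pilot annotation.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: proportion of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label]
                   for label in set(counts_a) | set(counts_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented toy labels for six relations (illustration only).
annotator_1 = ["OTHER", "OTHER", "WRITER", "ARBITRARY", "OTHER", "OTHER"]
annotator_2 = ["OTHER", "OTHER", "WRITER", "OTHER", "OTHER", "ARBITRARY"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because one source type (OTHER) dominates the distribution.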
31 Conclusion and future work
ANNOTATED CORPUS
- Discourse generation
- Research on journalistic discourse
- Training tools for ODQA / MPQA / IE
- Testing algorithms for the recognition of
attribution
- Statistical and combinatory analysis
- Development of corpora in other languages
Thank you