Title: Dealing with Italian Temporal Expressions: the ITA-Chronos System
1 Dealing with Italian Temporal Expressions: the ITA-Chronos System
- Matteo Negri
- Fondazione Bruno Kessler - IRST, Trento - Italy
- negri_at_itc.it
- EVALITA 2007: Evaluation of NLP Tools for Italian, Rome, Italy
- September 10, 2007
2 Outline
- Chronos: a multilingual system for TE recognition/normalization
- System description
- Some examples
- Results at EVALITA 2007
3 Chronos
- Multilingual (ITA/ENG) tool for TE recognition and normalization according to the TIMEX2 standard
- Approach
- Rule-based system
- ENG-Chronos: 1500 rules
- ITA-Chronos: 981 rules
- Six phases: Preprocessing, Detection, Bracketing, Information Gathering, Anchors Selection, Normalization
- ENG-Chronos participated in TERN-04 with good results on the Recognition and Normalization task
- Ranked 2nd, with 76 TERN-Value (best system: 78)
4 ITA-Chronos System Architecture
[Architecture diagram]
- Data: Plain Text, Intermediate Annotation, Tagged Text
- Preprocessing: Tokenization, POS Tagging, Multiwords Recognition
- Detection and Bracketing
- Detection: Basic Tagging Rules
- Bracketing: Composition Rules
- Information Gathering: Tagging Rules for SET, Anchor_Dir, Anchor_Val, MOD, Type, T_Cat, Heur, Op, Quant, Val_Ext
- Anchors Selection
- Normalization: Attributes Normalization, Dates Normalization
5 STEP1: Preprocessing
- The first phase of the process performs:
- Tokenization
- POS tagging
- Multiwords recognition
- The preprocessed input text is then passed to the TE detection phase, where around 400 tagging rules are in charge of finding all the TEs it contains.
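The preprocessing steps above can be sketched roughly as follows. This is only an illustration: the slides do not show the actual tokenizer or multiword resources, so the regular expression, the `MULTIWORDS` lexicon, and both function names are assumptions, and POS tagging is left out entirely.

```python
import re

# Hypothetical multiword lexicon; the real ITA-Chronos resources are not shown.
MULTIWORDS = {("più", "di"), ("fine", "settimana")}

def tokenize(text):
    """Split text into numeric/date tokens, word tokens, and punctuation."""
    return re.findall(r"\d+(?:[/.:]\d+)*|\w+|[^\w\s]", text)

def merge_multiwords(tokens, lexicon=MULTIWORDS):
    """Greedily join adjacent tokens that form a known multiword."""
    max_len = max((len(m) for m in lexicon), default=1)
    out, i = [], 0
    while i < len(tokens):
        for n in range(max_len, 1, -1):          # try the longest match first
            cand = tuple(t.lower() for t in tokens[i:i + n])
            if len(cand) == n and cand in lexicon:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:                                    # no multiword starts here
            out.append(tokens[i])
            i += 1
    return out

tokens = tokenize("Il 10/09/2007, più di tre anni fa.")
print(merge_multiwords(tokens))
# ['Il', '10/09/2007', ',', 'più_di', 'tre', 'anni', 'fa', '.']
```

Keeping dates such as 10/09/2007 as a single token is a design choice of this sketch, made so that later rules can treat them as one lexical trigger.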
6 STEP2: Detection
- Markable expressions are detected considering the presence of lexical triggers in the input text
- anno, oggi, Venerdì, Natale, quotidianamente, 10/09/2007, 1982, etc.
- Basic Tagging Rules
- Regular expressions checking for word senses, parts of speech, symbols, or words satisfying specific predicates
- E: preposition
- N: numeral
- TimeUnit-p: satisfied by secondo, minuto, ora, giorno, settimana, mese, etc.
[Figure: tagging rule matching "Fra tre giorni" (in three days)]
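A basic tagging rule of this kind can be sketched as a sequence of predicates applied to consecutive tokens. The rule shape used here (E, then N, then TimeUnit-p) is read off the legend above and is an assumption, since the rule itself is not reproduced in the text; the small word lists stand in for the POS tagger and lexical resources of the real system.

```python
# Hypothetical mini-lexicons standing in for POS tags and predicates.
PREPOSITIONS = {"fra", "tra", "in", "per", "da"}
NUMERALS = {"uno", "due", "tre", "quattro", "cinque"}
TIME_UNITS = {"secondo", "secondi", "minuto", "minuti", "ora", "ore",
              "giorno", "giorni", "settimana", "settimane", "mese", "mesi"}

def is_E(tok):
    """E: the token is a preposition."""
    return tok.lower() in PREPOSITIONS

def is_N(tok):
    """N: the token is a numeral, written as a word or in digits."""
    return tok.lower() in NUMERALS or tok.isdigit()

def time_unit_p(tok):
    """TimeUnit-p: the token names a time unit."""
    return tok.lower() in TIME_UNITS

RULE = (is_E, is_N, time_unit_p)   # assumed pattern: E N TimeUnit-p

def find_matches(tokens, rule=RULE):
    """Return (start, end) token spans where each predicate holds in order."""
    spans = []
    for i in range(len(tokens) - len(rule) + 1):
        window = tokens[i:i + len(rule)]
        if all(pred(tok) for pred, tok in zip(rule, window)):
            spans.append((i, i + len(rule)))
    return spans

print(find_matches("Fra tre giorni parto".split()))
# [(0, 3)]
```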
7 STEP3: Bracketing
- Considers the context surrounding the detected triggers
- inizio, fine, prima, dopo, fa, successivo, precedente, durante, circa, almeno, 3, sesto, etc.
- Composition rules
- In charge of handling conflicts between possible multiple taggings (e.g. when a recognized TE contains, overlaps, or is adjacent to one or more detected TEs)
Composition rule for handling inclusions:
Tutta la notte di sabato: [Tutta la notte] [la notte] [la notte di sabato] [sabato] -> [Tutta la notte di sabato]
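The inclusion example for "Tutta la notte di sabato" can be approximated with plain interval merging over token spans. This is a simplified stand-in, not the system's composition rules: it merges every pair of spans that contain, overlap, or touch each other, whereas the real rules presumably decide case by case. The function name and the end-exclusive span convention are assumptions.

```python
def compose(spans):
    """Merge (start, end) token spans that contain, overlap, or touch."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Extend the current bracket to cover the new span.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

tokens = "Tutta la notte di sabato".split()
candidates = [
    (0, 3),  # Tutta la notte
    (1, 3),  # la notte
    (1, 5),  # la notte di sabato
    (4, 5),  # sabato
]
for start, end in compose(candidates):
    print(" ".join(tokens[start:end]))
# Tutta la notte di sabato
```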
8 STEP4: Information Gathering
- Goal: mine relevant information for normalization
- Considers triggers and their context to assign values to:
- TIMEX2 attributes (e.g. SET, MOD, ANCHOR_DIR)
- TEMPORARY attributes (e.g. Type, T_Cat, Heur, Op, Quant)
- This is done by running separate sets of specialized tagging rules
- Such information is stored in the Intermediate Annotation, and input to the normalization component
9 Information Gathering Example
TIMEX2 attributes and their triggers
- MOD: più di, circa, oltre
- SET: ogni, tutti
- ANCHOR_DIR: prima, durante, dopo...
TEMPORARY attributes and their values
- type: T-ABS, T-REL
- t-cat: second, minute, hour, day, ...
- op: +, -, ...
- quant: n
- heur: CR-DATE, PR-DATE
17 Information Gathering Example
Detected TE: oltre tre anni dopo (more than three years later)
- MOD: MORE_THAN (oltre)
- ANCHOR_DIR: ENDING (dopo)
- type: T-REL
- t-cat: YEAR (anni)
- op: +
- quant: 3 (tre)
- heur: PR-DATE
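The attribute assignment for "oltre tre anni dopo" can be mimicked with a few lookup tables. The tables below contain only the mappings that the example itself supports; the op value "+" is inferred from the normalization example (anchor 2002 and quantity 3 giving 2005). How the real system chooses the heur value is not stated, so the PR-DATE default here is an assumption, as are all names in the code.

```python
# Lookup tables limited to what the slide example shows.
MOD_TRIGGERS = {"oltre": "MORE_THAN"}
DIR_TRIGGERS = {"dopo": ("ENDING", "+")}   # ANCHOR_DIR value and op
UNITS = {"anni": "YEAR"}
NUMERALS = {"tre": 3}

def gather(te_tokens):
    """Collect TIMEX2 and temporary attributes for one detected TE."""
    attrs = {}
    for tok in (t.lower() for t in te_tokens):
        if tok in MOD_TRIGGERS:
            attrs["MOD"] = MOD_TRIGGERS[tok]
        elif tok in DIR_TRIGGERS:
            attrs["ANCHOR_DIR"], attrs["op"] = DIR_TRIGGERS[tok]
        elif tok in UNITS:
            attrs["t-cat"] = UNITS[tok]
        elif tok in NUMERALS:
            attrs["quant"] = NUMERALS[tok]
    # Assumption: a TE with an anchor direction is relative and uses PR-DATE.
    if "ANCHOR_DIR" in attrs:
        attrs["type"], attrs["heur"] = "T-REL", "PR-DATE"
    else:
        attrs["type"] = "T-ABS"
    return attrs

print(gather("oltre tre anni dopo".split()))
# {'MOD': 'MORE_THAN', 'quant': 3, 't-cat': 'YEAR', 'ANCHOR_DIR': 'ENDING',
#  'op': '+', 'type': 'T-REL', 'heur': 'PR-DATE'}
```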
18 Intermediate Annotation Example
- adige20041007_id413938
- Plain Text: Così il 31 Luglio del 2002, quindi oltre tre anni dopo l'incidente, il giovane venne nuovamente ricoverato e sottoposto ad un intervento che si dimostrerà risolutivo (So on 31 July 2002, hence more than three years after the accident, the young man was hospitalized again and underwent an operation that would prove decisive)
- Intermediate Annotation (after Detection and Bracketing): quindi <TIMEX2 MOD="MORE_THAN" ANCHOR_DIR="ENDING" type="T-REL" t-cat="YEAR" op="+" quant="3" heur="PR-DATE">oltre tre anni dopo</TIMEX2> l'incidente
19 STEP5: Anchors Selection
- Goal: connect each detected T-REL to an appropriate anchor date
- While the meaning of T-ABSs (13 Marzo 2005) is context-independent, T-RELs (tre anni dopo) can only be interpreted with respect to a reference TE
- The heur attribute is used for this purpose
- 2 heuristics
- CR-DATE: connects a T-REL to the document's creation date (found at the beginning of the doc, or induced from the doc's name, e.g. adige20041007_)
- PR-DATE: connects a T-REL to the nearest detected TE with a compatible granularity (a t-cat with at least the same degree of specificity)
- e.g. for t-cat month, compatible anchors have t-cat month, week, or day, but not century
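The two heuristics can be sketched as below. Several things are assumptions made for illustration: the `SPECIFICITY` ranking of granularities, the dictionary layout of a TE (`t-cat` plus a character offset `pos`), reading "nearest" as smallest absolute offset distance, and the `YYYYMMDD` pattern used to read a date out of a document name such as adige20041007_id413938.

```python
import re
from datetime import date

# Granularity ranks from coarse to fine (assumed ordering).
SPECIFICITY = {"CENTURY": 0, "YEAR": 1, "MONTH": 2, "WEEK": 3, "DAY": 4}

def creation_date(doc_name):
    """CR-DATE: read a YYYYMMDD creation date out of a document name."""
    m = re.search(r"(\d{4})(\d{2})(\d{2})", doc_name)
    if m is None:
        return None
    year, month, day = (int(g) for g in m.groups())
    return date(year, month, day)

def previous_anchor(rel, candidates):
    """PR-DATE: nearest TE whose t-cat is at least as specific as rel's."""
    compatible = [c for c in candidates
                  if SPECIFICITY[c["t-cat"]] >= SPECIFICITY[rel["t-cat"]]]
    return min(compatible,
               key=lambda c: abs(c["pos"] - rel["pos"]),
               default=None)

print(creation_date("adige20041007_id413938"))
# 2004-10-07

rel = {"text": "oltre tre anni dopo", "t-cat": "YEAR", "pos": 37}
candidates = [{"text": "31 Luglio del 2002", "t-cat": "DAY", "pos": 8}]
print(previous_anchor(rel, candidates)["text"])
# 31 Luglio del 2002
```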
20 STEP6: Dates Normalization
- Goal: fill the VAL attribute of each detected TE
- T-ABSs: regular expressions considering their superficial form (1990s -> 199)
- T-RELs: rewriting rules considering
- the anchor (e.g. 2002)
- the operator (OP) to be applied (e.g. +)
- the quantity (QUANT) to be added/subtracted (e.g. 3)
tre anni dopo -> 2005 = 2002 + 3
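Both normalization cases can be illustrated in a few lines. This sketch covers only the decade pattern and year-level arithmetic from the examples above; the function names, the restriction to "+" and "-", and the bare-year pass-through are assumptions, not the system's actual rewriting rules.

```python
import re

def normalize_abs(text):
    """T-ABS: derive VAL from the surface form (decades and bare years only)."""
    m = re.fullmatch(r"(\d{3})0s", text)
    if m:
        return m.group(1)            # decade: drop the final digit, 1990s -> 199
    if re.fullmatch(r"\d{4}", text):
        return text                  # a bare year is already its own VAL
    return None

def normalize_rel(anchor_year, op, quant):
    """T-REL: apply operator and quantity to the anchor (year granularity)."""
    if op == "+":
        return str(anchor_year + quant)
    if op == "-":
        return str(anchor_year - quant)
    raise ValueError(f"unsupported operator: {op!r}")

print(normalize_abs("1990s"))          # 199
print(normalize_rel(2002, "+", 3))     # 2005
```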
21 ITA-Chronos at EVALITA 2007
- Results over the EVALITA-07 test set (2715 computation time, 50 words/sec)
- Higher scores on MOD and SET attributes
- Activated by the presence of triggers that are easy to identify
- Lower scores with ANCHOR_VAL and ANCHOR_DIR
- Require the analysis of a larger context, e.g. including verb tense
22 Web Demo
- http://www.qallme.itc.it/server/chronos/italian