Dealing with Italian Temporal Expressions: the ITA-Chronos System

1
Dealing with Italian Temporal Expressions: the
ITA-Chronos System
  • Matteo Negri
  • Fondazione Bruno Kessler - IRST, Trento - Italy
  • negri_at_itc.it
  • EVALITA 2007 - Evaluation of NLP Tools for
    Italian
  • Rome - Italy
  • September 10, 2007

2
Outline
  • Chronos: a multilingual system for temporal
    expression (TE) recognition/normalization
  • System description
  • Some examples
  • Results at EVALITA 2007

3
Chronos
  • Multilingual (ITA/ENG) tool for TE recognition
    and normalization according to the TIMEX2
    standard
  • Approach: rule-based system
  • ENG-Chronos: 1500 rules
  • ITA-Chronos: 981 rules
  • Six phases: Preprocessing, Detection, Bracketing,
    Information Gathering, Anchors Selection,
    Normalization
  • ENG-Chronos participated in TERN-04 with good
    results on the Recognition + Normalization Task
  • Ranked 2nd, with a 76% TERN-Value (best system:
    78%)

4
ITA-Chronos System Architecture
  • Input: Plain Text
  • Preprocessing: Tokenization, POS Tagging,
    Multiwords Recognition
  • Detection and Bracketing
  • Detection: Basic Tagging Rules
  • Bracketing: Composition Rules
  • Information Gathering: Tagging Rules for SET,
    Anchor_Dir, Anchor_Val, MOD, Type, T_Cat, Heur,
    Op, Quant, Val_Ext
  • Intermediate Annotation
  • Anchors Selection
  • Normalization: Attributes Normalization, Dates
    Normalization
  • Output: Tagged Text
5
STEP1 Preprocessing
  • The first phase of the process performs:
  • Tokenization
  • POS tagging
  • Multiwords recognition
  • The preprocessed input text is then passed to the
    TE detection phase, where around 400 tagging
    rules are in charge of finding all the TEs it
    contains.

6
STEP2 Detection
  • Markable expressions are detected considering the
    presence of lexical triggers in the input text
  • anno (year), oggi (today), Venerdì (Friday),
    Natale (Christmas), quotidianamente (daily),
    10/09/2007, 1982, etc.
  • Basic Tagging Rules
  • Regular expressions checking for word senses,
    parts of speech, symbols, or words satisfying
    specific predicates

Example: tagging rule matching "Fra tre giorni" (in three days)
  • E: preposition
  • N: numeral
  • TimeUnit-p: predicate satisfied by secondo
    (second), minuto (minute), ora (hour), giorno
    (day), settimana (week), mese (month), etc.
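
A tagging rule of this kind can be read as a sequence of predicates that must hold on consecutive tokens. The sketch below is only an illustration under stated assumptions: the small word lists stand in for the POS tagger and lexicon of the real system, and the names `find_matches` and `RULE_E_N_TIMEUNIT` are hypothetical, not taken from ITA-Chronos.

```python
# Hypothetical mini-lexicons; the real system relies on POS tags and its own resources.
PREPOSITIONS = {"fra", "tra", "entro"}
NUMERALS = {"uno", "due", "tre", "quattro", "cinque"}
TIME_UNITS = {"secondo", "secondi", "minuto", "minuti", "ora", "ore",
              "giorno", "giorni", "settimana", "settimane", "mese", "mesi"}


def is_preposition(token):  # stands in for POS tag E
    return token.lower() in PREPOSITIONS


def is_numeral(token):  # stands in for POS tag N
    return token.lower() in NUMERALS or token.isdigit()


def time_unit_p(token):  # the TimeUnit-p predicate
    return token.lower() in TIME_UNITS


# The rule: preposition, then numeral, then a time unit.
RULE_E_N_TIMEUNIT = (is_preposition, is_numeral, time_unit_p)


def find_matches(tokens, rule):
    """Return (start, end) token spans where each predicate holds in order."""
    width = len(rule)
    spans = []
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        if all(pred(tok) for pred, tok in zip(rule, window)):
            spans.append((start, start + width))
    return spans


tokens = "Partiremo fra tre giorni".split()
matches = find_matches(tokens, RULE_E_N_TIMEUNIT)
matched_text = [" ".join(tokens[a:b]) for a, b in matches]
```

Here `matched_text` contains the single span "fra tre giorni"; a rule set would simply be a list of such predicate tuples tried in turn.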
7
STEP3 Bracketing
  • Considers the context surrounding the detected
    triggers
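  • inizio (beginning), fine (end), prima (before),
    dopo (after), fa (ago), successivo (following),
    precedente (previous), durante (during), circa
    (about), almeno (at least), 3, sesto (sixth), etc.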
  • Composition rules
  • In charge of handling conflicts between possible
    multiple taggings (e.g. when a recognized TE
    contains, overlaps, or is adjacent to one or more
    detected TEs)

Composition rule for handling inclusions
  • Input: Tutta la notte di sabato (all Saturday
    night)
  • Candidate taggings: Tutta la notte; la notte; la
    notte di sabato; sabato
  • Result: Tutta la notte di sabato
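
One simple way to picture the effect of such a rule is to merge nested or overlapping candidate spans into the single span that covers them. This is a minimal sketch assuming token-index spans and unconditional merging; the actual composition rules are not shown in the slides and presumably decide case by case. The function name `compose` is hypothetical.

```python
def compose(spans):
    """Merge nested or overlapping (start, end) token spans into covering spans.

    Adjacent spans (one ends where the next begins) are left separate here.
    """
    merged = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Nested in, or overlapping with, the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


tokens = "Tutta la notte di sabato".split()
# Candidate taggings: "Tutta la notte", "la notte", "la notte di sabato", "sabato".
candidates = [(0, 3), (1, 3), (1, 5), (4, 5)]
result = compose(candidates)
result_text = [" ".join(tokens[a:b]) for a, b in result]
```

With these candidates the only surviving span covers the whole expression, "Tutta la notte di sabato".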
8
STEP4 Information gathering
  • Goal: mine relevant information for normalization
  • Considers triggers and their context to assign
    values to:
  • TIMEX2 attributes (e.g. SET, MOD, ANCHOR_DIR)
  • TEMPORARY attributes (e.g. Type, T_Cat, Heur, Op,
    Quant)
  • This is done by running separate sets of
    specialized tagging rules
  • Such information is stored in the Intermediate
    Annotation, and input to the normalization
    component

9
Information Gathering Example
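TIMEX2 attributes
  • MOD: più di (more than), circa (about), oltre
    (over)
  • SET: ogni (every), tutti (all)
  • ANCHOR_DIR: prima (before), durante (during),
    dopo (after)...
TEMPORARY attributes
  • type: T-ABS, T-REL
  • t-cat: second, minute, hour, day, ...
  • op: [?], [?], -
  • quant: n0
  • heur: CR-DATE, PR-DATE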
17
Information Gathering Example
Detected TE: oltre tre anni dopo (more than three years later)
  • MOD: MORE_THAN
  • ANCHOR_DIR: ENDING
  • type: T-REL
  • t-cat: YEAR
  • op: +
  • quant: 3
  • heur: PR-DATE
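
The attribute assignment walked through in this example can be sketched as a set of trigger lookups. The tables below are hypothetical and contain only the mappings needed for "oltre tre anni dopo"; the `+` operator for "dopo" and the rule that a TE with an ANCHOR_DIR is a T-REL resolved via PR-DATE are assumptions made for the illustration, not documented behaviour of ITA-Chronos.

```python
# Hypothetical trigger tables, limited to what this one example needs.
MOD_TRIGGERS = {"oltre": "MORE_THAN"}
ANCHOR_DIR_TRIGGERS = {"dopo": ("ENDING", "+")}  # (ANCHOR_DIR value, assumed op)
T_CAT_TRIGGERS = {"anno": "YEAR", "anni": "YEAR"}
NUMERALS = {"due": 2, "tre": 3, "quattro": 4}


def gather(te_text):
    """Collect TIMEX2 and temporary attributes for one detected TE."""
    attrs = {}
    for token in te_text.lower().split():
        if token in MOD_TRIGGERS:
            attrs["MOD"] = MOD_TRIGGERS[token]
        elif token in ANCHOR_DIR_TRIGGERS:
            attrs["ANCHOR_DIR"], attrs["op"] = ANCHOR_DIR_TRIGGERS[token]
        elif token in T_CAT_TRIGGERS:
            attrs["t-cat"] = T_CAT_TRIGGERS[token]
        elif token in NUMERALS:
            attrs["quant"] = NUMERALS[token]
        elif token.isdigit():
            attrs["quant"] = int(token)
    # Assumption: an expression with an anchor direction is relative and is
    # resolved against a previously detected date.
    if "ANCHOR_DIR" in attrs:
        attrs["type"] = "T-REL"
        attrs["heur"] = "PR-DATE"
    else:
        attrs["type"] = "T-ABS"
    return attrs


attributes = gather("oltre tre anni dopo")
```

In the slides each attribute family has its own specialized rule set; a single loop over lookup tables is used here only to keep the sketch short.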
18
Intermediate Annotation Example
  • adige20041007_id413938
  • Così il 31 Luglio del 2002, quindi oltre tre
    anni dopo l'incidente, il giovane venne
    nuovamente ricoverato e sottoposto ad un
    intervento che si dimostrerà risolutivo
    (So on 31 July 2002, more than three years after
    the accident, the young man was hospitalized
    again and underwent an operation that would
    prove decisive)
  • quindi <TIMEX2 MOD="MORE_THAN"
    ANCHOR_DIR="ENDING" type="T-REL" t-cat="YEAR"
    op="+" quant="3" heur="PR-DATE">oltre tre anni
    dopo</TIMEX2> l'incidente

Stages shown: Plain Text, Detection and Bracketing, Intermediate Annotation
19
STEP5 Anchors Selection
  • Goal: connect each detected T-REL to an
    appropriate anchor date
  • While the meaning of T-ABSs (13 Marzo 2005) is
    context-independent, T-RELs (tre anni dopo) can
    only be interpreted with respect to a reference
    TE
  • The heur attribute is used for this purpose
  • 2 heuristics:
  • CR-DATE: connects a T-REL to the document's
    creation date (found at the beginning of the doc,
    or induced from the doc's name, e.g.
    adige20041007_)
  • PR-DATE: connects a T-REL to the nearest
    detected TE with a compatible granularity (a
    t-cat with at least the same degree of
    specificity)
  • e.g. for t-cat month: month, week, and day are
    compatible; century is not
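
The two heuristics can be sketched as follows. This is an illustration under assumptions: the granularity ordering, the dictionary layout of a TE, the choice to search only preceding TEs, and the fallback to the creation date are all hypothetical, as are the function names.

```python
import re

# Granularities from coarse to fine (assumed ordering and labels).
GRANULARITY = ["CENTURY", "YEAR", "MONTH", "WEEK", "DAY", "HOUR", "MINUTE", "SECOND"]


def creation_date_from_name(doc_name):
    """Induce a creation date from a name such as 'adige20041007_id413938'."""
    match = re.search(r"(\d{4})(\d{2})(\d{2})", doc_name)
    return "-".join(match.groups()) if match else None


def compatible(candidate_cat, required_cat):
    """True if the candidate is at least as specific as the required t-cat."""
    return GRANULARITY.index(candidate_cat) >= GRANULARITY.index(required_cat)


def select_anchor(tes, index, creation_date):
    """Pick an anchor value for the T-REL at tes[index]."""
    te = tes[index]
    if te.get("heur") == "PR-DATE":
        # Assumption: look backwards for the nearest already-normalized TE.
        for previous in reversed(tes[:index]):
            if "val" in previous and compatible(previous["t-cat"], te["t-cat"]):
                return previous["val"]
    # CR-DATE, or no compatible TE found (fallback is an assumption).
    return creation_date


creation_date = creation_date_from_name("adige20041007_id413938")
tes = [
    {"text": "31 Luglio del 2002", "type": "T-ABS", "t-cat": "DAY", "val": "2002-07-31"},
    {"text": "oltre tre anni dopo", "type": "T-REL", "t-cat": "YEAR", "heur": "PR-DATE"},
]
anchor = select_anchor(tes, 1, creation_date)
```

Here the T-REL is anchored to the preceding day-level date, since DAY is at least as specific as YEAR; a CENTURY-level TE would be skipped for a MONTH-level T-REL.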

20
STEP6 Dates Normalization
  • Goal: fill the VAL attribute of each detected TE
  • T-ABSs: regular expressions considering their
    surface form (1990s → 199)
  • T-RELs: rewriting rules considering
  • the anchor (e.g. 2002)
  • the operator (OP) to be applied (e.g. +)
  • the quantity (QUANT) to be
    added/subtracted (e.g. 3)

Example: tre anni dopo (three years later), 2002 + 3 → 2005
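
Both kinds of normalization on this slide reduce to small computations. The sketch below assumes year-level anchors and the operators `+` and `-` only; the function names are hypothetical and the real rewriting rules cover far more cases.

```python
import operator
import re

# Assumed operator symbols for the OP attribute.
OPS = {"+": operator.add, "-": operator.sub}


def normalize_decade(text):
    """T-ABS example: map a decade such as '1990s' to the TIMEX2 value '199'."""
    match = re.fullmatch(r"(\d{3})0s", text)
    return match.group(1) if match else None


def normalize_relative_year(anchor_year, op, quant):
    """T-REL example: apply OP and QUANT to a year-level anchor."""
    return f"{OPS[op](anchor_year, quant):04d}"


decade_val = normalize_decade("1990s")
relative_val = normalize_relative_year(2002, "+", 3)
```

For the slide's example, `relative_val` is "2005" (2002 + 3) and `decade_val` is "199".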
21
ITA-Chronos at EVALITA 2007
  • Results over the EVALITA-07 test set (2715
    computation time, 50 words/sec)
  • Higher scores on MOD and SET attributes
  • Activated by the presence of triggers that are
    easy to identify
  • Lower scores with ANCHOR_VAL and ANCHOR_DIR
  • Require the analysis of a larger context, e.g.
    including verb tense

22
Web Demo
  • http://www.qallme.itc.it/server/chronos/italian