Coreferencing Treebank data using CESAC - PowerPoint PPT Presentation


Slides: 17
Provided by: erwinrkom

Transcript and Presenter's Notes


1
Coreferencing Treebank data using CESAC
  • Annotating and analysing IS in corpora of
    historical English
  • Berlin, 13-14 November 2009

2
Overview
Contents
Coreferencing using Cesac
  • CESAC
  • - Goals
  • - Coreference types operationalizing IS
  • - Input and output
  • Inter-rater agreement
  • Example
  • Summary and conclusion

3
CESAC goals
Goal of CESAC
  • Overall goal
  • - Referring from any one constituent to any other
    constituent
  • More specifically
  • - Source and destination IP/phrase/node or
    DP/lexeme/endnode
  • - Attributes
  • Coreference type
  • Distance measure
  • (NP type: definite, indefinite, etc.)
  • (Animacy)

4
CESAC coreference types operationalizing IS
  • Two basic rules
  • - Do not omit possible coreference information
  • - A source should be linked to the nearest
    possible destination
  • Labels encoding different forms of anaphoricity
  • - Identity
  • Jacqueline plays the cello. She is an amazing
    musician.
  • - Cross Speech
  • John said to Paul: "Why don't you play the
    guitar?"
  • - Inferred
  • Do you see that house? They say the kitchen
    is extremely spacious.
  • - World knowledge (separate category)
  • According to Burt Reynolds, all dogs go to
    heaven.
  • Cross Speech > Identity > Inferred
  • Encoding facts vs. encoding interpretations:
    objective data
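The two basic rules and the type ordering can be illustrated with a small Python sketch. It is not CESAC code: the `Candidate` record, its linear `position` field and the reading of "Cross Speech > Identity > Inferred" as a tie-breaking precedence are assumptions made for illustration. The function links a source to the nearest preceding candidate and consults the assumed type ranking only when two candidates are equally close.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed reading of "Cross Speech > Identity > Inferred" as a precedence order:
# a lower rank wins when two candidates are equally close to the source.
TYPE_RANK = {"CrossSpeech": 0, "Identity": 1, "Inferred": 2}


@dataclass
class Candidate:
    node_id: int     # ID of a possible destination node (hypothetical record)
    position: int    # linear position of that node in the text
    coref_type: str  # one of the keys of TYPE_RANK


def choose_destination(source_pos: int,
                       candidates: list[Candidate]) -> Optional[Candidate]:
    """Link a source to the nearest preceding candidate; break ties by type rank."""
    preceding = [c for c in candidates if c.position < source_pos]
    if not preceding:
        return None  # no earlier candidate to link to
    return min(preceding,
               key=lambda c: (source_pos - c.position, TYPE_RANK[c.coref_type]))


if __name__ == "__main__":
    cands = [Candidate(1, 3, "Identity"), Candidate(2, 8, "Inferred"),
             Candidate(3, 8, "CrossSpeech"), Candidate(4, 12, "Identity")]
    print(choose_destination(10, cands).node_id)  # 3
```

Here the two candidates at position 8 are equally near the source at position 10, so the assumed ranking picks the Cross Speech candidate (node 3); the candidate at position 12 follows the source and is ignored.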

5
CESAC input 1
The input Penn-Treebank
  • Standard Penn-Treebank format
  • - Collection of <nodes>
  • - Each <node> consists of:
  • Brackets: ( ... )
  • Label: (NP ... )
  • Other node: (NP (N ... ) )
  • Lexeme: (N man)
  • Possibly <lexeme> <node>: (P to (NP him))
  • - Attributes in label
  • (NP-ACC (PROA hine))
  • - Extra-textual data in CODE nodes
  • (CODE <TEXT tylaste>)
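A minimal reader for the bracket format just described, written as a Python sketch; the names `Node`, `parse_tree` and `TOKEN` are hypothetical and not part of CESAC. It keeps each label intact, so attributes such as `-ACC` stay inside `label` and can be split off with `label.split("-")`; CODE nodes are parsed like any other node.

```python
import re
from dataclasses import dataclass, field

# Tokens: an opening bracket, a closing bracket, or any run of other non-space characters.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


@dataclass
class Node:
    label: str  # e.g. "NP-ACC"; "" for the unlabeled outer bracket of a tree
    children: list = field(default_factory=list)  # child Nodes and/or lexeme strings


def parse_tree(text: str) -> Node:
    """Parse one bracketed Penn-Treebank tree into nested Node objects."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        node = Node(label="")
        if tokens[pos] not in ("(", ")"):  # a label directly follows the bracket
            node.label = tokens[pos]
            pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse_node())  # embedded node
            else:
                node.children.append(tokens[pos])   # lexeme
                pos += 1
        pos += 1  # consume the closing bracket
        return node

    return parse_node()


if __name__ == "__main__":
    tree = parse_tree("(P to (NP him))")
    print(tree.label, tree.children[0], tree.children[1].label)  # P to NP
```

Because a node's children may mix lexemes and nodes, the `(P to (NP him))` case parses without special handling.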

6
CESAC input 2
( (CODE <T06080009600,11.4>)
  (IP-MAT (CONJ And)
    (NP-NOM (DN tat) (NN folc))
    (NP-ACC (PROA hine))
    (ADVP-TMP (ADVT ta))
    (PP (P mid) (NP-DAT (ADJD unasecgendlicre) (ND wurdmynte)))
    (PP (P to) (NP-DAT (ND scipe)))
    (VBDI geladdon)
    (. ,))
  (ID coapollo,ApT11.4.183))

( (IP-MAT (CONJ and)
    (NP-NOM (NRN Apollonius))
    (NP-ACC-1 (PROA hi))
    (VBDI bad)
    (IP-INF (NP-ACC-SBJ *ICH*-1)
      (QP-ACC (QA ealle))
      (VB gretan)))
  (ID coapollo,ApT11.4.184))

7
CESAC output 1
enriched Penn-Treebank
  • Penn-Treebank format
  • Enriched with coreference information
  • - Source node ID
  • - Destination node ID
  • - Coreference type
  • - Coreference distance (derivable)
  • Destination node example
  • (NP-SBJ (CODE <Coref_Id="339"_/>) (NPR Crist))
  • Source node example
  • (NP-OB1
      (CODE <Coref_Id="20"_Ref="21"_Type="Identity"_NdDist="16"_/>)
      (PRO hem))
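The coreference CODE leaves can be read back with a regular expression. This Python sketch assumes the attribute syntax `Name="value"` joined by underscores, as in the examples on this slide, and infers from those two examples that only source nodes carry a `Ref`; `parse_coref` and `is_source` are hypothetical helper names, not CESAC functions.

```python
import re

# Name="value" pairs; names are letters only, so the "_" separators are skipped.
ATTR = re.compile(r'([A-Za-z]+)="([^"]*)"')
NUMERIC_KEYS = ("Id", "Ref", "NdDist")


def parse_coref(code_leaf: str) -> dict:
    """Turn one coreference CODE leaf into a dict, converting numeric attributes to int."""
    attrs = dict(ATTR.findall(code_leaf))
    for key in NUMERIC_KEYS:
        if key in attrs:
            attrs[key] = int(attrs[key])
    return attrs


def is_source(attrs: dict) -> bool:
    """Assumption drawn from the two slide examples: only source nodes carry a Ref."""
    return "Ref" in attrs


if __name__ == "__main__":
    src = parse_coref('<Coref_Id="20"_Ref="21"_Type="Identity"_NdDist="16"_/>')
    dst = parse_coref('<Coref_Id="339"_/>')
    print(src, is_source(src), is_source(dst))
```

Restricting attribute names to letters matters here: with `\w+` the underscore separators would be swallowed into the names (`_Ref` instead of `Ref`).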

8
CESAC output 2
Destination node
<node> one-or-more <node> OR <lexeme>

9
CESAC output 3
( (IP-MAT (CONJ and)
    (NP-NOM *con* (CODE <Coref_Id="1488"_Ref="1489"_Type="Identity"_NdDist="12"_/>))
    (VBD ladde)
    (NP-ACC (CODE <Coref_Id="1476"_Ref="1477"_Type="Identity"_NdDist="8"_/>) (PROA hine))
    (PP (P mid)
      (NP-DAT-RFL (CODE <Coref_Id="1487"_Ref="1488"_Type="Identity"_NdDist="6"_/>) (PROD him)))
    (PP (P to)
      (NP-DAT (PRO his (CODE <Coref_Id="1486"_Ref="1487"_Type="Identity"_NdDist="5"_/>))
        (ND huse))))
  (ID coapollo,ApT12.16.209))

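Because each source's `Ref` is the `Id` of its destination, the links in a tree like the one on this slide form chains. The Python sketch below (hypothetical helpers `collect_links` and `follow_chain`, assuming the same `Name="value"` attribute syntax) gathers the Id to Ref pairs and follows them until a node has no further link within the text it was given.

```python
import re

COREF_LEAF = re.compile(r"<Coref_[^>]*/>")          # one coreference CODE leaf
ATTR = re.compile(r'([A-Za-z]+)="([^"]*)"')          # Name="value" pairs inside it


def collect_links(tree_text: str) -> dict[int, int]:
    """Map every source node Id found in the text to its Ref (the destination Id)."""
    links = {}
    for leaf in COREF_LEAF.findall(tree_text):
        attrs = dict(ATTR.findall(leaf))
        if "Ref" in attrs:
            links[int(attrs["Id"])] = int(attrs["Ref"])
    return links


def follow_chain(start: int, links: dict[int, int]) -> list[int]:
    """Follow Ref links from start until a node has no outgoing link or a cycle appears."""
    chain = [start]
    while chain[-1] in links and links[chain[-1]] not in chain:
        chain.append(links[chain[-1]])
    return chain


if __name__ == "__main__":
    fragment = (
        '(NP-DAT-RFL (CODE <Coref_Id="1487"_Ref="1488"_Type="Identity"_NdDist="6"_/>) (PROD him)) '
        '(PRO his (CODE <Coref_Id="1486"_Ref="1487"_Type="Identity"_NdDist="5"_/>))'
    )
    print(follow_chain(1486, collect_links(fragment)))  # [1486, 1487, 1488]
```

On all four CODE leaves of the tree above, `follow_chain(1486, links)` yields `[1486, 1487, 1488, 1489]`: *his* points to *him*, which points to the empty subject, which points to node 1489 outside this fragment.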
10
CESAC output 4
Source node
<node> one-or-more <node> OR <node> <lexeme> OR <lexeme>

11
Inter-rater agreement 1
  • Two features measured
  • - Coreference destination (node ID)
  • - Coreference type
  • Adapted version of Cohen's kappa: κ > .6
  • Two important problems
  • - Identity vs. cross speech
  • - Omission of link
  • Solutions
  • - Create new rule(s)
  • - Adapt/specify existing rule(s)
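The slides report an adapted version of Cohen's kappa without spelling out the adaptation, so the Python sketch below shows only the standard unweighted statistic, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected from each rater's label frequencies. The same function applies to either measured feature, coreference types or destination node IDs passed as strings; `cohens_kappa` is a hypothetical name.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Standard unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of items both raters labelled identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    a = ["Identity", "Identity", "Inferred", "CrossSpeech"]
    b = ["Identity", "Inferred", "Inferred", "CrossSpeech"]
    print(round(cohens_kappa(a, b), 3))  # 0.636
```

With four links and three agreements, p_o = .75 and p_e = .3125, giving κ ≈ .64.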

12
Inter-rater agreement 2
  • Tool used to calculate inter-rater agreement
    concerning
  • - Coreference destination (feature 1): κ = .67
  • - Coreference type (feature 2): κ = .66

13
Example 1
14
Example 2
  • Clean text fragment with translation
  • Ant warshipe hire easked. Hweonene cumest tu
    fearlac deades munegunge. Ich cume he
    seid of helle.
  • And Worship him asked, From where come
    you, Fearlac, death's reminder? I come,
    he said, from hell.
  • Text fragment in CESAC coreference file
  • 170.64 2031 ant2033 warschipe2035 hire
    easked. Hweonene20422043 cumest 2045
    tu2047 fearlac2049 deades munegunge .
  • 170.65 20532054 Ich cume20572058 he seid
    of2063 helle .
  • Text fragment in Penn-Treebank file
    ( (IP-MAT (CONJ ant)
        (NP-SBJ (N warschipe))
        (NP-OB1 (PRO hire))
        (VBP easked)
        (, .)
        (CP-QUE-SPE (WADVP-1 (WADV Hweonene))
          (IP-SUB-SPE (ADVP-DIR *T*-1)
            (VBP cumest)
            (NP-SBJ (CODE <Coref_Id="346"_Ref="345"_Type="CrossSpeech"_NdDist="10"_/>)
                    (PRO tu))
            (NP-VOC (CODE <Coref_Id="50"_Ref="346"_Type="Identity"_NdDist="2"_/>)
                    (N fearlac)

15
Summary and conclusion
  • Annotation program CESAC
  • - Input: standard Penn-Treebank
  • - Output: relatively easy to analyse
  • - Inter-rater agreement measured
  • Operationalizing IS
  • - 4 coreference types
  • - As objective as possible: facts vs.
    interpretations
  • Plans
  • - Fixed set of coreference types
  • - Larger corpus of coreferenced texts

16
Thank you for your attention!