Slides: 22
Provided by: branimir1
Learn more at: http://www.lrec-conf.org
1
What's NEXT? Navigating through Dense
Annotation Spaces
  • Branimir K. Boguraev
  • Mary S. Neff
  • Language Engineering for Content Analysis
  • IBM T.J. Watson Research Center
  • Yorktown Heights, NY

2
Outline
  • Dense annotation spaces
  • Navigational challenges
  • Elements of the annotation-matching Formalism
  • Support for navigational control
  • Conclusion
  • Future work

3
Dense Annotation Spaces
[Figure: overlapping annotation layers (SENT, SC, SUB, OBJ, PP, NP, VG) stacked above the part-of-speech tags of the example sentence]
np nps md vb nn nn in nn to vb dt nn
Service Reps can read customer name, in order to
contact the customer.

4
Annotation trees
[Figure: annotation tree over the example sentence, with nodes SENT, SC, SUB, OBJ, PP, NP, VG above the part-of-speech tags np nps md vb nn nn in nn to vb dt nn]
Service Reps can read customer name, in order to
contact the customer.

5
Annotation lattice
[Figure: annotation lattice over the example sentence, with layers SENT, SC, SUB, OBJ, PP, NP, VG above the part-of-speech tags np nps md vb nn nn in nn to vb dt nn]
Service Reps can read customer name, in order to
contact the customer.

6
Navigational Challenges
  • PNAME
  • Title, Name
  • First, Middle, Last
  • What is visible to the lattice traversal engine?

7
Annotation-Based Finite State Transducer (AFst)
  • UIMA-based
  • A finite state calculus over typed feature
    structures
  • Cf. grep over a sequence of annotations,
    specified as types and features
  • np = <E>/[NP .
    Token[pos=="DT"] | <E> .
    Token[pos=="JJ"] .
    ( Token[pos=="NN"] | Token[pos=="NNS"] ) .
    <E>/]NP
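
The "grep over a sequence of annotations" idea can be sketched in plain Python. This is not AFst or UIMA code: `Token`, `match_np` and `find_nps` are hypothetical names, and the pattern hard-codes one assumed reading of the np rule on this slide (an optional determiner, one adjective, then a singular or plural noun).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a typed annotation with a `pos` feature."""
    text: str
    pos: str


def match_np(tokens: List[Token], start: int) -> Optional[Tuple[int, int]]:
    """Try to match (DT)? JJ (NN|NNS) at `start`; return (begin, end) or None."""
    i = start
    # Optional determiner: the alternative is the empty transition.
    if i < len(tokens) and tokens[i].pos == "DT":
        i += 1
    # Exactly one adjective (an assumption of this sketch).
    if i >= len(tokens) or tokens[i].pos != "JJ":
        return None
    i += 1
    # Nominal head: singular or plural noun.
    if i >= len(tokens) or tokens[i].pos not in ("NN", "NNS"):
        return None
    return (start, i + 1)


def find_nps(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Scan left to right and collect one NP span per successful match."""
    spans = []
    i = 0
    while i < len(tokens):
        span = match_np(tokens, i)
        if span:
            spans.append(span)
            i = span[1]  # resume after the matched span
        else:
            i += 1
    return spans


tokens = [Token("the", "DT"), Token("new", "JJ"), Token("policy", "NN"),
          Token("covers", "VBZ"), Token("old", "JJ"), Token("records", "NNS")]
np_spans = find_nps(tokens)  # spans are (begin, end) token indexes, end exclusive
```

A real transducer compiles such rules into a state graph instead of hand-written conditionals; the sketch only shows that the constraints are stated over annotation types and feature values, not over characters.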

8
Pitching the Iterator: support for navigational
control
[Figure: annotation lattice over the example sentence, with layers SENT, SC, SUB, OBJ, PP, NP, VG above the part-of-speech tags np nps md vb nn nn in nn to vb dt nn]
Service Reps can read customer name, in order to
contact the customer.

9
AFst Traversal Regime
  • Defining a particular path through the annotation
    space requires a lattice traversal engine that
    can focus on, simultaneously:
  • Sequential constraints: pattern matching
  • Horizontal: prenominal modifier and nominal head
  • Structural constraints
  • Vertical: iterate over NPs with a specific
    configurational relationship, e.g. not
    sentence-initial, not in a PP
  • Configurational constraints
  • Type prioritization

10
Linearizing the Lattice: what's next?
[Figure: lattice layers SUB, OBJ, PP, NP, VG over the example sentence]
  • Unambiguous typeset iterator, inferred from the
    grammar: SUB . VG . OBJ . PP
  • UIMA natural annotation sort order
  • Start position ascending
  • Length descending
  • Type priority, defined in UIMA descriptors
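
The sort order and the typeset filter above can be sketched together in Python. This is an illustration only, not UIMA code: the `Annotation` class, the `TYPE_PRIORITY` list, the `linearize` helper and the token-index spans are all hypothetical. In UIMA the type priorities would come from the descriptors rather than from a list in code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    type: str
    begin: int  # token index, inclusive (hypothetical offsets)
    end: int    # token index, exclusive


# Hypothetical type priority: earlier entries win when begin and length tie.
TYPE_PRIORITY = ["SENT", "SC", "SUB", "OBJ", "PP", "NP", "VG"]


def sort_key(a: Annotation):
    """Start ascending, then length descending, then type priority."""
    return (a.begin, -(a.end - a.begin), TYPE_PRIORITY.index(a.type))


def linearize(annotations, typeset):
    """Keep only the types the grammar mentions, then put them in order."""
    return sorted((a for a in annotations if a.type in typeset), key=sort_key)


annotations = [
    Annotation("NP", 4, 6), Annotation("PP", 6, 12), Annotation("VG", 2, 4),
    Annotation("SUB", 0, 2), Annotation("SENT", 0, 12), Annotation("OBJ", 4, 6),
    Annotation("NP", 0, 2),
]

full_order = [a.type for a in sorted(annotations, key=sort_key)]
path = [a.type for a in linearize(annotations, {"SUB", "VG", "OBJ", "PP"})]
```

With these made-up spans the full order still contains overlapping annotations (SENT, SUB and NP all start at position 0), while the filtered path is a single non-overlapping sequence, which is the point of inferring the typeset from the grammar.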

11
Linearizing the Lattice: what's next?
  • Grammar-wide declarations
  • boundary Sentence
  • honour Address
  • month = Token[lemma=="January"] |
    Token[lemma=="February"]
  • date = <E>/[Year .
    month | <E> .
    Token[string=="[12]\d{3}"] .
    <E>/]Year

12
Focus: Selecting Nested Boundary Annotations
<nameValuePair>
  <name>Focus</name>
  <value>
    <array>
      <string>Section[label=="Education"]</string>
      <string>Sentence[number==1]</string>
    </array>
  </value>
</nameValuePair>

13
Linearizing the Lattice: what's next?
  • Grammar-wide declarations
  • match: first, last, longest, shortest, all
  • advance: skip, step
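
One way to read the match and advance declarations is sketched below. The slide does not define the options, so the semantics here are assumptions: "first" and "last" refer to the order in which candidate matches were found, "skip" resumes after the end of the reported match, and "step" moves on by one position. `select` and `next_start` are hypothetical helpers, not AFst functions.

```python
def select(candidates, policy):
    """Choose which candidate spans found at one start position to report."""
    if not candidates:
        return []
    if policy == "all":
        return list(candidates)
    if policy == "first":
        return [candidates[0]]
    if policy == "last":
        return [candidates[-1]]

    def length(span):
        return span[1] - span[0]

    if policy == "longest":
        return [max(candidates, key=length)]
    if policy == "shortest":
        return [min(candidates, key=length)]
    raise ValueError(f"unknown match policy: {policy}")


def next_start(start, reported, advance):
    """Decide where matching resumes (assumed meaning of skip and step)."""
    if advance == "skip" and reported:
        # Jump past the furthest reported match, always making progress.
        return max(start + 1, max(end for _, end in reported))
    return start + 1


# Candidate (begin, end) spans found at position 0, in discovery order.
candidates = [(0, 2), (0, 3), (0, 1)]
longest = select(candidates, "longest")
shortest = select(candidates, "shortest")
after_skip = next_start(0, longest, "skip")
after_step = next_start(0, longest, "step")
```

Under these assumptions, "longest" with "skip" never reports overlapping matches, whereas "all" with "step" reports every match at every position.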

14
What's next? Switching Levels, Mixed Iterator
  • Refocus the iterator to examine inner contour:
    @descend, @ascend
  • findDrSmith =
    <E>/PName@descend .
    Title[string=="Dr."] .
    <E>/Name@descend .
    First | <E> . Last[string=="Smith"] .
    <E>/Name@ascend .
    <E>/PName@ascend
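
The effect of descending and ascending can be illustrated with a small Python sketch. It simplifies heavily: the `Node` class stores children explicitly, whereas the slides describe an iterator that refocuses inside the span of an annotation, and `is_dr_smith` is a hypothetical function that hard-codes one assumed reading of the findDrSmith rule (title "Dr.", an optional First, then Last "Smith").

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical nested annotation with explicit children."""
    type: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def is_dr_smith(pname: Node) -> bool:
    if pname.type != "PName" or len(pname.children) != 2:
        return False
    title, name = pname.children      # descend into PName
    if title.type != "Title" or title.text != "Dr.":
        return False
    if name.type != "Name" or not name.children:
        return False
    parts = name.children             # descend into Name
    firsts, last = parts[:-1], parts[-1]
    if len(firsts) > 1 or any(p.type != "First" for p in firsts):
        return False
    # Returning to the caller plays the role of ascending back out.
    return last.type == "Last" and last.text == "Smith"


def pname(title: str, *parts: Node) -> Node:
    """Build PName -> (Title, Name -> parts) for the examples below."""
    return Node("PName", children=[Node("Title", title),
                                   Node("Name", children=list(parts))])


dr_john_smith = pname("Dr.", Node("First", "John"), Node("Last", "Smith"))
dr_smith = pname("Dr.", Node("Last", "Smith"))
mr_smith = pname("Mr.", Node("Last", "Smith"))
```

The outer level only sees PName; the Title and Name parts become visible after the first descent, and First and Last after the second.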

15
Alternate Multiple Level Access
  • Upper/lower context without switching levels
  • Token_costartsSentencenumber1
  • Subject_coversPName
  • PName_costartsNP,_coendsNP
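
The context tests named above can be read as simple span predicates. The sketch below assumes the obvious meaning of each name (same start, same end, containment); the slide itself does not spell the semantics out, and `Span` with its token-index offsets is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    type: str
    begin: int
    end: int


def costarts(a: Span, b: Span) -> bool:
    """a begins at the same position as b."""
    return a.begin == b.begin


def coends(a: Span, b: Span) -> bool:
    """a ends at the same position as b."""
    return a.end == b.end


def covers(a: Span, b: Span) -> bool:
    """a contains b, boundaries included."""
    return a.begin <= b.begin and b.end <= a.end


sentence = Span("Sentence", 0, 5)
first_token = Span("Token", 0, 1)
second_token = Span("Token", 1, 2)
subject = Span("Subject", 0, 2)
pname = Span("PName", 0, 2)
np = Span("NP", 0, 2)

sentence_initial = costarts(first_token, sentence)
subject_has_name = covers(subject, pname)
name_is_whole_np = costarts(pname, np) and coends(pname, np)
```

Such tests consult the level above or below without moving the iterator, which is what distinguishes them from descending and ascending.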

16
Grammar cascading
  • From simpler to more complex analyses
  • Lower levels of output feed as inputs into higher
    levels
  • Small noun phrases and verb groups
  • Prepositional, possessive and adjectival phrases
  • More complex noun phrases
  • Variety of clause types
  • Grammatical relations (subject, object)
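
A cascade can be sketched as a list of stages, each reading everything produced so far and adding its own annotations. The two stages below are deliberately naive stand-ins, not the grammars from the talk: `small_phrases` groups runs of noun or verb tags, and `grammatical_relations` labels an NP directly before a VG as SUB and one directly after as OBJ. All names and tag sets are hypothetical.

```python
NOUN_TAGS = {"np", "nps", "nn"}
VERB_TAGS = {"md", "vb"}


def phrase_class(tag):
    if tag in NOUN_TAGS:
        return "NP"
    if tag in VERB_TAGS:
        return "VG"
    return None


def small_phrases(annotations):
    """Level 1: wrap adjacent noun tags as NP and adjacent verb tags as VG."""
    out, current = [], None
    for tag, begin, end in annotations:
        cls = phrase_class(tag)
        if current and cls == current[0] and begin == current[2]:
            current[2] = end  # extend the open phrase
        else:
            if current:
                out.append(tuple(current))
            current = [cls, begin, end] if cls else None
    if current:
        out.append(tuple(current))
    return out


def grammatical_relations(annotations):
    """Level 2: consume the NP and VG output of level 1."""
    phrases = sorted((a for a in annotations if a[0] in ("NP", "VG")),
                     key=lambda a: a[1])
    out = []
    for left, right in zip(phrases, phrases[1:]):
        if left[2] != right[1]:
            continue  # only adjacent phrases are related in this toy rule
        if left[0] == "NP" and right[0] == "VG":
            out.append(("SUB", left[1], left[2]))
        if left[0] == "VG" and right[0] == "NP":
            out.append(("OBJ", right[1], right[2]))
    return out


def cascade(annotations, stages):
    """Run the stages in order; each sees all earlier output."""
    for stage in stages:
        annotations = annotations + stage(annotations)
    return annotations


# "Service Reps can read customer name" as (tag, begin, end) token spans.
tokens = [("np", 0, 1), ("nps", 1, 2), ("md", 2, 3),
          ("vb", 3, 4), ("nn", 4, 5), ("nn", 5, 6)]
result = cascade(tokens, [small_phrases, grammatical_relations])
```

The second stage never looks at part-of-speech tags; it works only on the phrases produced by the first, which is the sense in which lower levels feed higher ones.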

17
Implementations
  • Shallow Parsing
  • Named Entity Detection interleaved with shallow
    parsing
  • Terminology identification in new domains
  • Temporal expression parsing
  • Privacy policy rules
  • Information extraction from resumes
  • Information extraction from contact center
    telephone calls

18
Future work list
  • Alternate (semi-ambiguous) iterator, useful for
    disambiguator grammars
  • Actor Director
  • Tree-walk iterator for tree representations where
    children are explicitly referenced in features

19
Performance Notes
  • Performance is a function of:
  • How the grammar is written
  • Optimisation of the FST graph (grammar compiler)
  • Optimisation of the symbol compiler
  • Optimisation of the executor
  • However, for the benefit of the curious:
  • IBM Software Group (Dublin) optimised the last
    two, and

20
IBM LanguageWare (Dublin) text analysis
performance results
  • The results
  • Precision for Company annotations only: 0.81
  • Recall for Company annotations only: 0.67
  • Precision for Person annotations only: 0.93
  • Recall for Person annotations only: 0.91
  • Processing time: 3.4 seconds
  • This is 10 times faster than the best-of-breed
    internal reference annotators.
  • The analysis
  • AFst rules and FST dictionary
  • 26 rules, 7 dictionaries (things like first
    names, indicators like Corp., etc.)
  • Creating Person and Company annotations
  • The test
  • Test set: Enron
  • 924 files (4.5Mb)
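
The slide reports precision and recall separately. For readers who prefer a single figure, the balanced F-measure can be derived from them; the values computed below are arithmetic on the reported numbers, not results stated in the talk.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


company_f1 = f1(0.81, 0.67)  # Company annotations
person_f1 = f1(0.93, 0.91)   # Person annotations

print(f"Company F1: {company_f1:.2f}")
print(f"Person F1:  {person_f1:.2f}")
```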

21
Perpetrators, er... Responsible parties
  • Bran Boguraev
  • Mary Neff
  • Bran Lambov
  • D.J. McCloskey
  • Thilo Goetz
  • Thomas Hampp
  • Oliver Suhre
  • Roy Byrd
  • Herb Chong
  • Albert Eskenazi
  • Paul Kaye
  • Son Bao Pham
  • Lokesh Shresta
  • Max Silberztein

22
For more on AFst and tools --
  • Tomorrow, 12:25 in Fez 1
  • A Development Environment for Configurable
    Meta-Annotators in a Pipelined NLP Environment
  • Youssef Drissi, Branimir Boguraev, David
    Ferrucci, Paul Keyser, and Anthony Levas