1
Annotations Database and Text Annotator: An Architectural Overview
Matthew W. Bilotti (mbilotti_at_cs.cmu.edu)
February 22, 2005
2
Outline
  • Annotations Database
  • Text Annotator
  • Next Steps
  • Demonstration
  • Questions welcome throughout!

3
Annotations Database
  • MySQL Data Model
  • Client-side Java API

[Diagram: DB server]
4
Data Model
  • Documents
  • Spans
  • Tags
  • Frames
  • Roles






5
Annotations Database API
  • Corpus: a Collection of Documents
    • supports addition, removal, iteration
  • Document: contains Tags, Roles and Frames
    • supports update, destruction, iteration over contents
  • Tag, Frame and Role: individual Annotations
    • support update, destruction, equality, natural ordering
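
The container relationships on this slide can be sketched in a few lines. This is a minimal in-memory illustration in Python, not the Java API itself: the class and method names, the character-offset fields on `Tag`, and the choice to order Tags by start offset are all assumptions made for the example, and nothing here talks to MySQL.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class Tag:
    # Assumed fields: character offsets into the document text plus a label.
    # order=True gives a natural ordering by (start, end, label).
    start: int
    end: int
    label: str


@dataclass
class Document:
    doc_id: str
    text: str
    tags: set = field(default_factory=set)

    def add(self, tag):
        self.tags.add(tag)

    def remove(self, tag):
        self.tags.discard(tag)

    def __iter__(self):
        # Iterate over contents in natural order.
        return iter(sorted(self.tags))


class Corpus:
    """A collection of Documents keyed by id (hypothetical layout)."""

    def __init__(self):
        self._docs = {}

    def add(self, doc):
        self._docs[doc.doc_id] = doc

    def remove(self, doc_id):
        self._docs.pop(doc_id, None)

    def __iter__(self):
        return iter(self._docs.values())

    def __len__(self):
        return len(self._docs)


corpus = Corpus()
doc = Document("d1", "India has stockpiled enriched uranium")
doc.add(Tag(6, 9, "VBZ"))   # "has"
doc.add(Tag(0, 5, "NNP"))   # "India"
corpus.add(doc)
```

Iterating over `doc` yields the Tags in offset order regardless of insertion order, which is one plausible reading of "natural ordering" for span-based Annotations.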

6
Filter API
  • Generalized, hierarchical Comparator
  • Encapsulates a Boolean constraint
  • Filtering: does this Object satisfy the
    constraint?
  • Comparison/ordering: which of these two Objects
    better satisfies the constraint?
  • Example: finding all Documents published within
    some range of dates
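
One way to read the Filter idea is as an object that answers both questions: a yes/no test and a pairwise comparison. The sketch below is written in Python for brevity and is not the Java Filter API; every class name, the `published` field, and the scoring rule (count of satisfied leaf constraints) are assumptions chosen to illustrate the date-range example.

```python
from dataclasses import dataclass
from datetime import date
from functools import cmp_to_key


@dataclass(frozen=True)
class Document:
    doc_id: str
    published: date  # hypothetical field; the real schema is not shown


class Filter:
    """A Boolean constraint that can also rank objects."""

    def accepts(self, obj):
        raise NotImplementedError

    def score(self, obj):
        # Number of leaf constraints satisfied; a leaf scores 0 or 1.
        return 1 if self.accepts(obj) else 0

    def compare(self, a, b):
        # Negative when `a` satisfies the constraint better than `b`.
        return self.score(b) - self.score(a)


class PublishedOnOrAfter(Filter):
    def __init__(self, day):
        self.day = day

    def accepts(self, doc):
        return doc.published >= self.day


class PublishedOnOrBefore(Filter):
    def __init__(self, day):
        self.day = day

    def accepts(self, doc):
        return doc.published <= self.day


class AllOf(Filter):
    """Hierarchical node: satisfied only when every child is satisfied."""

    def __init__(self, *children):
        self.children = children

    def accepts(self, obj):
        return all(child.accepts(obj) for child in self.children)

    def score(self, obj):
        return sum(child.score(obj) for child in self.children)


def published_between(start, end):
    return AllOf(PublishedOnOrAfter(start), PublishedOnOrBefore(end))


docs = [
    Document("a", date(2003, 5, 1)),
    Document("b", date(2004, 6, 1)),
    Document("c", date(2005, 1, 1)),
]
in_2004 = published_between(date(2004, 1, 1), date(2004, 12, 31))
matching = [d for d in docs if in_2004.accepts(d)]      # filtering
ranked = sorted(docs, key=cmp_to_key(in_2004.compare))  # ordering
```

Here `matching` keeps only the Documents inside the range, while `ranked` keeps all of them but moves the better matches to the front.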

7
Text Annotator
  • Platform
    • Manages Annotators
    • Dependencies
    • Pipelines work
  • Annotators
    • Read document text and/or existing Annotations
    • Write Annotations to Database

[Diagram: Text Annotator Platform managing Annotators 1 through n]
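
Dependency management of this kind is commonly handled by ordering the annotators topologically, so that each one runs only after the annotators whose output it reads. The following Python sketch assumes that approach; the slides do not say how the Platform actually schedules work. The annotator names, the toy annotator bodies, and the dictionary standing in for the Annotations Database are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def tokenize(doc):
    # Toy whitespace tokenizer standing in for a real segmenter.
    doc["tokens"] = doc["text"].split()


def tag_pos(doc):
    # Placeholder tagger: labels every token "X".
    doc["pos"] = ["X"] * len(doc["tokens"])


def parse(doc):
    # Placeholder parser: only pairs up the tokens and tags it reads.
    doc["parse"] = list(zip(doc["tokens"], doc["pos"]))


# name -> (function, names of annotators whose output it reads); assumed.
ANNOTATORS = {
    "parse": (parse, {"tokenize", "pos"}),
    "pos": (tag_pos, {"tokenize"}),
    "tokenize": (tokenize, set()),
}


def run_pipeline(text, annotators=ANNOTATORS):
    """Run every annotator once, prerequisites first."""
    graph = {name: deps for name, (_, deps) in annotators.items()}
    order = list(TopologicalSorter(graph).static_order())
    doc = {"text": text}  # stands in for the Annotations Database
    for name in order:
        annotators[name][0](doc)
    return order, doc


order, doc = run_pipeline("India has stockpiled enriched uranium")
```

Because each annotator only reads and writes shared annotations, a new annotator can be added by registering its function and prerequisites without touching the others.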
8
Annotators
  • Segmentation/Tokenization
    • MXTerminator, RASP Tokenization
  • POS Tagging
    • CLAWS, Brill
  • Syntax
    • LINK, Stanford, RASP
  • Semantics
    • ASSERT

9
Annotators, continued
  • Named Entities
    • BBN IdentiFinder
  • Special-purpose Annotators
    • MinorThird Annotators for Numerical Expressions
      and Temporal Expressions, Annotators for marking
      up text using WordNet, FrameNet, VerbNet and
      PropBank data, and tools for abstracting
      higher-level content from lower-level Annotations

10
Annotator Details: Link Parser
  • Example: Prithvi Subcorpus Sentence
  • "India has stockpiled enriched uranium to make
    and deploy 15,000 WMDs within weeks."

[Link Parser linkage diagram; link labels shown: MVi, Os, Ss, PP, A, I, MVp, O, Dmcn, Jp]
Tagged words: India has.v stockpiled!.v enriched.v uranium.n to make.v and deploy.v 15,000 WMDs within weeks.n
Legend: Ss = subject; O(s) = object; PP = past participle; MV(i/p) = modifying phrase; A = prenominal adjective; I = infinitive; Jp = preposition; Dmcn = cardinal number
11
Annotator Details: Stanford
  • Same example sentence
  • "India has stockpiled enriched uranium to make
    and deploy 15,000 WMDs within weeks."

(S (NP (NNP India))
   (VP (VBZ has)
       (VP (VBN stockpiled)
           (NP (JJ enriched) (NN uranium))
           (S (VP (TO to)
                  (VP (VB make) (CC and) (VB deploy)
                      (NP (CD 15,000) (NNS WMDs))
                      (PP (IN within) (NP (NNS weeks)))))))
       (. .)))
12
Annotator Details: ASSERT
  • Same example sentence
  • "India has stockpiled enriched uranium to make
    and deploy 15,000 WMDs within weeks."

ARG0 India has TARGET stockpiled ARG1 enriched uranium to make and deploy 15,000 WMDs within weeks
ARG0 India has stockpiled enriched uranium to TARGET make and deploy ARG1 15,000 WMDs ARGM-TMP within weeks
ARG0 India has stockpiled enriched uranium to make and TARGET deploy ARG1 15,000 WMDs ARGM-TMP within weeks
13
Annotator Status
14
Next Steps: Data Model
  • Generalize Higher-Level Annotations Structure
    • Nested Tags can implement Role, Frame, and others
  • Multiple values for Tags
  • Attribute-value pairs for Documents
  • Multiple text passages per Document

15
Next Steps: API
  • Tag-space navigation operations
    • familial relationships
    • enclosure relationships
  • Convenience operations
    • random access for documents
    • building on Filter API
  • Performance-enhancing operations
    • batch-mode Tag insertion
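
The tag-space navigation operations can be pictured with span arithmetic alone. The Python sketch below assumes that each Tag covers a half-open range of token positions and that a Tag's parent is the smallest Tag strictly enclosing it; both are assumptions for illustration, as are the function names. The example Tags are a hand-picked subset of constituents from the example sentence.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    label: str
    start: int  # index of the first token covered
    end: int    # one past the last token covered


def encloses(outer, inner):
    """True when outer's span contains inner's span and is not identical to it."""
    same_span = (outer.start, outer.end) == (inner.start, inner.end)
    return outer.start <= inner.start and inner.end <= outer.end and not same_span


def parent_of(tag, tags):
    """Smallest enclosing Tag, or None for a top-level Tag."""
    enclosing = [t for t in tags if encloses(t, tag)]
    return min(enclosing, key=lambda t: t.end - t.start, default=None)


def children_of(tag, tags):
    """Tags whose parent is `tag`."""
    return [t for t in tags if parent_of(t, tags) == tag]


# Token positions: India=0 has=1 stockpiled=2 enriched=3 uranium=4 ... weeks=12
tags = [
    Tag("S", 0, 13),
    Tag("VP", 1, 13),
    Tag("NP", 3, 5),
    Tag("JJ", 3, 4),
    Tag("NN", 4, 5),
]
np_tag = tags[2]
```

Tags with identical spans are treated as siblings here; a real implementation would need a rule for that case, for example an explicit nesting level stored with each Tag.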

16
Next Steps: Text Annotator
  • Continue wrapping Annotators for use with the
    Platform
  • Integrate more tightly with Annotations Database
  • Stress-test and improve the efficiency of the
    Text Annotator (TA) Platform