Title: Matthew W. Bilotti
Annotations Database and Text Annotator: An Architectural Overview
Matthew W. Bilotti, mbilotti_at_cs.cmu.edu
February 22, 2005
Outline
- Annotations Database
- Text Annotator
- Next Steps
- Demonstration
- Questions welcome throughout!
Annotations Database
- MySQL Data Model
- Client-side Java API
[Diagram: client-side Java API and MySQL DB server]
Data Model
- Documents
- Spans
- Tags
- Frames
- Roles
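A minimal sketch of how the five Data Model entities might relate, written as Java records to match the client-side Java API. The field names, the half-open character-offset convention for Spans, and the way a Frame groups Roles around a target Tag are all assumptions for illustration; the actual MySQL tables may be organized differently.

```java
import java.util.List;

// Hypothetical in-memory mirror of the five Data Model entities.
// Field names and the offset convention are assumptions, not the actual MySQL schema.

/** A document: an identifier plus its raw text. */
record Document(long id, String text) {}

/** A half-open character range [start, end) into a document's text. */
record Span(int start, int end) {
    Span {
        // Compact constructor: reject negative or inverted ranges.
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("bad span [" + start + ", " + end + ")");
        }
    }

    /** Returns the text this span covers in the given document. */
    String coveredText(Document doc) {
        return doc.text().substring(start, end);
    }
}

/** A typed label on a span, e.g. type "POS" with value "VBN". */
record Tag(long documentId, Span span, String type, String value) {}

/** A role name such as "ARG0", bound to the tag that fills it. */
record Role(String name, Tag filler) {}

/** A frame: a target tag (e.g. a predicate) plus the roles around it. */
record Frame(Tag target, List<Role> roles) {}
```

Keeping Spans separate from Tags lets several annotators (a POS tagger, a parser, a named-entity tagger) label the same stretch of text without duplicating offsets.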
Annotations Database API
- Corpus: a Collection of Documents
- supports addition, removal, iteration
- Document: contains Tags, Roles and Frames
- supports update, destruction, iteration over contents
- Tag, Frame and Role: individual Annotations
- support update, destruction, equality, natural ordering
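The operations listed above can be sketched in plain Java as follows. This is a hypothetical in-memory stand-in, not the actual API: the class and method names are invented, and the natural ordering (by start offset, then longer spans first, then type) is an assumed choice that places enclosing annotations before the ones they contain.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of Corpus / Document / Tag operations; names are illustrative.

/** An annotation over [start, end); the record supplies value-based equals/hashCode. */
record Tag(int start, int end, String type) implements Comparable<Tag> {
    // Assumed natural ordering: earlier start first, then longer span, then type name.
    private static final Comparator<Tag> ORDER =
            Comparator.comparingInt(Tag::start)
                      .thenComparing(Comparator.comparingInt(Tag::end).reversed())
                      .thenComparing(Tag::type);

    @Override
    public int compareTo(Tag other) {
        return ORDER.compare(this, other);
    }
}

/** A document that owns its tags and iterates over them in natural order. */
final class Document implements Iterable<Tag> {
    private final String id;
    private final List<Tag> tags = new ArrayList<>();

    Document(String id) { this.id = id; }

    String id() { return id; }

    void add(Tag tag) { tags.add(tag); }

    /** Removes a tag by value equality; returns true if one was removed. */
    boolean remove(Tag tag) { return tags.remove(tag); }

    @Override
    public Iterator<Tag> iterator() {
        List<Tag> sorted = new ArrayList<>(tags);
        Collections.sort(sorted);
        return Collections.unmodifiableList(sorted).iterator();
    }
}

/** A corpus: documents keyed by id, iterated in insertion order. */
final class Corpus implements Iterable<Document> {
    private final Map<String, Document> docs = new LinkedHashMap<>();

    void add(Document doc) { docs.put(doc.id(), doc); }

    Document remove(String id) { return docs.remove(id); }

    @Override
    public Iterator<Document> iterator() {
        return Collections.unmodifiableCollection(docs.values()).iterator();
    }
}
```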
Filter API
- Generalized, hierarchical Comparator
- Encapsulates a Boolean constraint
- Filtering: does this Object satisfy the constraint?
- Comparison/ordering: which of these two Objects better satisfies the constraint?
- Example: finding all Documents published within some range of dates
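A sketch of the Filter idea, assuming a Filter is simply a `java.util.Comparator` that also exposes a Boolean `accepts()` test. The names `Filter`, `DateRangeFilter`, `AllOf` and `Doc` are hypothetical. "Better satisfies" is modeled here as distance in days from the date range, and the hierarchical combination as lexicographic tie-breaking over child filters; both are illustrative assumptions, not the deck's definitions.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a Filter that both tests and ranks objects.

/** A Boolean constraint that can also rank objects by how well they satisfy it. */
interface Filter<T> extends Comparator<T> {
    /** Filtering: does this object satisfy the constraint? */
    boolean accepts(T item);

    /** Returns the accepted items, preserving input order. */
    default List<T> select(List<T> items) {
        List<T> out = new ArrayList<>();
        for (T item : items) {
            if (accepts(item)) out.add(item);
        }
        return out;
    }
}

/** A minimal stand-in for a Document with a publication date. */
record Doc(String id, LocalDate published) {}

/** Accepts documents published within [from, to]; ranks others by days outside the range. */
record DateRangeFilter(LocalDate from, LocalDate to) implements Filter<Doc> {
    /** 0 inside the range, otherwise the number of days to the nearest bound. */
    long distance(Doc d) {
        LocalDate p = d.published();
        if (p.isBefore(from)) return ChronoUnit.DAYS.between(p, from);
        if (p.isAfter(to)) return ChronoUnit.DAYS.between(to, p);
        return 0;
    }

    @Override
    public boolean accepts(Doc d) { return distance(d) == 0; }

    /** Negative when a satisfies the constraint better than b. */
    @Override
    public int compare(Doc a, Doc b) { return Long.compare(distance(a), distance(b)); }
}

/** Hierarchical combination: every child must accept; earlier children decide ranking first. */
record AllOf<T>(List<Filter<T>> children) implements Filter<T> {
    @Override
    public boolean accepts(T item) {
        return children.stream().allMatch(f -> f.accepts(item));
    }

    @Override
    public int compare(T a, T b) {
        for (Filter<T> f : children) {
            int c = f.compare(a, b);
            if (c != 0) return c;
        }
        return 0;
    }
}
```

Sorting a list with such a filter puts in-range documents first, followed by those closest to the range, so one object serves both for filtering and for ranking.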
Text Annotator
- Platform
- Manages Annotators
- Dependencies
- Pipelines work
- Annotators
- Reads document text and/or existing Annotations
- Writes Annotations to Database
[Diagram: the Text Annotator Platform managing Annotators 1 through n]
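One plausible way for the Platform to manage dependencies is to have each Annotator declare the annotation types it reads and writes, then order the Annotators with a topological sort so that, for example, a POS tagger runs after tokenization. The `AnnotatorSpec` record, the type labels and the use of Kahn's algorithm are assumptions for this sketch, not a description of the Platform's actual scheduler.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: annotator names and annotation-type labels are illustrative.

/** What an annotator reads (requires) and what it writes (produces). */
record AnnotatorSpec(String name, Set<String> requires, Set<String> produces) {}

final class Pipeline {
    /**
     * Orders annotators so each one runs after the producers of every type it requires
     * (Kahn's topological sort). Throws if a type has no producer or the dependencies form a cycle.
     */
    static List<AnnotatorSpec> order(List<AnnotatorSpec> specs) {
        // Map each annotation type to the annotator that writes it (last one wins if several do).
        Map<String, AnnotatorSpec> producerOf = new HashMap<>();
        for (AnnotatorSpec s : specs) {
            for (String type : s.produces()) producerOf.put(type, s);
        }
        // Count unmet inputs per annotator and record who waits on whom.
        Map<AnnotatorSpec, Integer> unmet = new HashMap<>();
        Map<AnnotatorSpec, List<AnnotatorSpec>> dependents = new HashMap<>();
        for (AnnotatorSpec s : specs) unmet.put(s, 0);
        for (AnnotatorSpec s : specs) {
            for (String type : s.requires()) {
                AnnotatorSpec p = producerOf.get(type);
                if (p == null) throw new IllegalStateException(s.name() + " requires unproduced type " + type);
                unmet.merge(s, 1, Integer::sum);
                dependents.computeIfAbsent(p, k -> new ArrayList<>()).add(s);
            }
        }
        // Start with annotators that need nothing, then release dependents as their inputs appear.
        Deque<AnnotatorSpec> ready = new ArrayDeque<>();
        for (AnnotatorSpec s : specs) {
            if (unmet.get(s) == 0) ready.add(s);
        }
        List<AnnotatorSpec> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            AnnotatorSpec s = ready.poll();
            result.add(s);
            for (AnnotatorSpec d : dependents.getOrDefault(s, List.of())) {
                if (unmet.merge(d, -1, Integer::sum) == 0) ready.add(d);
            }
        }
        if (result.size() != specs.size()) throw new IllegalStateException("dependency cycle among annotators");
        return result;
    }
}
```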
Annotators
- Segmentation/Tokenization
- MXTerminator, RASP Tokenization
- POS Tagging
- CLAWS, Brill
- Syntax
- LINK, Stanford, RASP
- Semantics
- ASSERT
Annotators, continued
- Named Entities
- BBN IdentiFinder
- Special-purpose Annotators
- MinorThird Annotators for Numerical Expressions
and Temporal Expressions, Annotators for marking
up text using WordNet, FrameNet, VerbNet and
PropBank data, and tools for abstracting
higher-level content from lower-level Annotations
Annotator Details: Link Parser
- Example: Prithvi Subcorpus sentence
- "India has stockpiled enriched uranium to make and deploy 15,000 WMDs within weeks."
[Link Parser linkage diagram. Words as tagged: India has.v stockpiled!.v enriched.v uranium.n to make.v and deploy.v 15,000 WMDs within weeks.n. Link types drawn: MVi, Os, Ss, PP, A, I, MVp, O, Dmcn, Jp]
Link types:
- Ss: Subject
- O(s): Object
- PP: past participle
- MV(i/p): modifying phrase
- A: prenominal adjective
- I: infinitive
- Jp: preposition
- Dmcn: cardinal number
Annotator Details: Stanford
- Same example sentence
- "India has stockpiled enriched uranium to make
and deploy 15,000 WMDs within weeks."
(S
  (NP (NNP India))
  (VP (VBZ has)
    (VP (VBN stockpiled)
      (NP (JJ enriched) (NN uranium))
      (S
        (VP (TO to)
          (VP (VB make) (CC and) (VB deploy)
            (NP (CD 15,000) (NNS WMDs))
            (PP (IN within)
              (NP (NNS weeks))))))))
  (. .))
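Because an Annotator writes its results back as Annotations, a bracketed tree like the one above has to be turned into spans at some point. A minimal sketch, assuming constituents are stored as token ranges: the hypothetical `BracketReader` below parses Penn Treebank-style brackets and records each labeled node as a half-open [start, end) range over the leaf words.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: converts a bracketed parse into labeled token spans.

/** A constituent label over leaf tokens [start, end). */
record Constituent(String label, int start, int end) {}

final class BracketReader {
    private final String[] toks;
    private int pos = 0;
    private final List<String> words = new ArrayList<>();
    private final List<Constituent> out = new ArrayList<>();

    private BracketReader(String tree) {
        // Pad parentheses with spaces so a whitespace split yields "(", ")" and atoms.
        toks = tree.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
    }

    /** Returns constituents in post-order (children before parents); appends the leaves to wordsOut. */
    static List<Constituent> read(String tree, List<String> wordsOut) {
        BracketReader r = new BracketReader(tree);
        r.node();
        if (r.pos != r.toks.length) throw new IllegalArgumentException("trailing input after tree");
        wordsOut.addAll(r.words);
        return r.out;
    }

    private String peek() {
        if (pos >= toks.length) throw new IllegalArgumentException("unbalanced brackets");
        return toks[pos];
    }

    /** Parses "(LABEL child ...)" where each child is a word or a nested node. */
    private void node() {
        if (!peek().equals("(")) throw new IllegalArgumentException("expected '(' at token " + pos);
        pos++;
        String label = peek();
        pos++;
        int start = words.size();
        while (!peek().equals(")")) {
            if (peek().equals("(")) {
                node();
            } else {
                words.add(toks[pos++]);
            }
        }
        pos++; // consume ")"
        out.add(new Constituent(label, start, words.size()));
    }
}
```

For the slide's tree, the noun phrase `(NP (JJ enriched) (NN uranium))` comes out as an NP over tokens [3, 5), which maps directly onto a Tag over the corresponding character span.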
Annotator Details: ASSERT
- Same example sentence
- "India has stockpiled enriched uranium to make
and deploy 15,000 WMDs within weeks."
[ARG0 India] has [TARGET stockpiled] [ARG1 enriched uranium to make and deploy 15,000 WMDs within weeks]
[ARG0 India] has stockpiled enriched uranium to [TARGET make] and deploy [ARG1 15,000 WMDs] [ARGM-TMP within weeks]
[ARG0 India] has stockpiled enriched uranium to make and [TARGET deploy] [ARG1 15,000 WMDs] [ARGM-TMP within weeks]
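Each ASSERT frame above describes one predicate and its arguments. Assuming each frame is available as a single line in bracketed form such as `[ARG0 India] has [TARGET stockpiled] ...`, a short regular-expression reader can pull out the TARGET and its arguments, which map naturally onto Frame and Role annotations. The `FrameReader` and `Argument` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reader for one bracketed semantic-role frame (assumed input format).

/** One labeled argument (or the TARGET predicate) and the text it covers. */
record Argument(String label, String text) {}

final class FrameReader {
    // "[" LABEL whitespace TEXT "]", where TEXT contains no brackets.
    private static final Pattern ARG = Pattern.compile("\\[(\\S+)\\s+([^\\[\\]]+)\\]");

    /** Returns the bracketed arguments in the order they appear; unbracketed words are skipped. */
    static List<Argument> read(String line) {
        List<Argument> args = new ArrayList<>();
        Matcher m = ARG.matcher(line);
        while (m.find()) {
            args.add(new Argument(m.group(1), m.group(2).trim()));
        }
        return args;
    }
}
```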
Annotator Status
Next Steps: Data Model
- Generalize Higher-Level Annotations Structure
- Nested Tags can implement Role, Frame, and others
- Multiple values for Tags
- Attribute-value pairs for Documents
- Multiple text passages per Document
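A sketch of what the generalized structure might look like: a single `NestedTag` type that carries multiple values and child tags, so that a Frame is just a tag whose children are its role tags, plus a document holding attribute-value pairs and several passages. `NestedTag`, `RichDocument` and their fields are hypothetical names for this proposal, not an existing API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proposed generalization.

/** A generalized tag: a typed span that may carry several values and nest child tags. */
final class NestedTag {
    final String type;
    final int start;
    final int end;
    final List<String> values = new ArrayList<>();
    final List<NestedTag> children = new ArrayList<>();

    NestedTag(String type, int start, int end, String... values) {
        this.type = type;
        this.start = start;
        this.end = end;
        Collections.addAll(this.values, values);
    }

    /** Adds a child and returns this tag, so frames can be built fluently. */
    NestedTag with(NestedTag child) {
        children.add(child);
        return this;
    }

    /** Returns the direct children of the given type, e.g. all "ARG1" roles of a frame. */
    List<NestedTag> childrenOfType(String t) {
        List<NestedTag> out = new ArrayList<>();
        for (NestedTag c : children) {
            if (c.type.equals(t)) out.add(c);
        }
        return out;
    }
}

/** A document with free-form attribute-value pairs and more than one text passage. */
final class RichDocument {
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<String> passages = new ArrayList<>();
}
```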
Next Steps: API
- Tag-space navigation operations
- familial relationships
- enclosure relationships
- Convenience operations
- random access for documents
- building on Filter API
- Performance enhancing operations
- batch-mode Tag insertion
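The navigation and batch-insertion operations might look roughly like this. Enclosure is taken to mean span containment and the "parent" of a tag to be its tightest enclosing tag; both definitions, and the `tags` table with its column names, are assumptions. The batch method uses standard JDBC batching (`addBatch`/`executeBatch`) inside a single transaction.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the table and column names are invented for illustration.

/** A tag over [start, end). */
record Tag(String type, int start, int end) {
    /** Enclosure: this tag covers the other tag's whole span (and is not the same tag). */
    boolean encloses(Tag other) {
        return start <= other.start && other.end <= end && !this.equals(other);
    }
}

final class TagSpace {
    /** All tags that enclose the given tag. */
    static List<Tag> enclosing(List<Tag> tags, Tag t) {
        List<Tag> out = new ArrayList<>();
        for (Tag c : tags) {
            if (c.encloses(t)) out.add(c);
        }
        return out;
    }

    /** Familial: the parent is the tightest enclosing tag, or null if there is none. */
    static Tag parent(List<Tag> tags, Tag t) {
        Tag best = null;
        for (Tag c : enclosing(tags, t)) {
            if (best == null || (c.end() - c.start()) < (best.end() - best.start())) best = c;
        }
        return best;
    }

    /** Batch-mode insertion: queue every row, execute once, commit once. */
    static void insertBatch(Connection conn, long docId, List<Tag> tags) throws SQLException {
        String sql = "INSERT INTO tags (doc_id, type, start_offset, end_offset) VALUES (?, ?, ?, ?)";
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Tag t : tags) {
                ps.setLong(1, docId);
                ps.setString(2, t.type());
                ps.setInt(3, t.start());
                ps.setInt(4, t.end());
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

With MySQL Connector/J, batched inserts are rewritten into multi-row INSERT statements only when the `rewriteBatchedStatements` connection property is enabled, so the speed-up depends on driver configuration.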
Next Steps: Text Annotator
- Continue wrapping Annotators for use with the Platform
- Integrate more tightly with the Annotations Database
- Stress test the TA Platform and improve its efficiency