Title: GLARF-ULA: Working Towards Usability
1GLARF-ULA: Working Towards Usability
Unified Linguistic Annotation Workshop
Adam Meyers, New York University
March 19, 2008
2Outline
- Introduction to the GLARF Approach
- What is a standard anyway?
- Improving and Distributing Easy-to-Use Parts
- Participation in CONLL
- Chinese GLARF
3GLARF Approach to ULA
- A Typed Feature Structure Representation
- Produces a single-theory analysis
- Not Reversible
- GLARF System combines
- hand-annotation
- automatically generated annotation
- combination of manual/automatic annotation
4Example Sentence
- Meanwhile, they made three bids.
- Offset of first character: 123
- Meanwhile: ARG1 = previous S, ARG2 = current S (PDTB)
- made: ARG0 = they, ARG1 = three bids (PropBank)
- bids: ARG0 = they, Support = made (NomBank)
- Penn Treebank:
  (S (ADVP (RB Meanwhile)) (, ,)
     (NP (PRP they))
     (VP (VBN made)
         (NP (CD three)
             (NNS bids))) (. .))
5GLARF TFS
(S (ADV (ADVP (HEAD (ADVX (HEAD (RB Meanwhile 0))
                          (P-ARG1 (S (EC-TYPE PB) (INDEX 00)))
                          (P-ARG2 (S (EC-TYPE PB) (INDEX 0)))))
              (POINTER 01)))
   (PUNCTUATION (, , 1))
   (SBJ (NP (HEAD (PRP they 2)) (INDEX 1) (POINTER 21)))
   (PRD (VP (HEAD (VX (HEAD (VBN made 3))
                      (P-ARG0 (NP (EC-TYPE PB) (INDEX 1)))
                      (P-ARG1 (NP (EC-TYPE PB) (INDEX 3)))
                      (INDEX 2)))
            (OBJ (NP (T-POS (CD three 4))
                     (HEAD (NX (HEAD (NNS bids 5))
                               (P-ARG0-Supp (NP (EC-TYPE PB) (INDEX 1)))
                               (Support (VX (EC-TYPE PB) (INDEX 2)))))
                     (INDEX 3)
                     (POINTER 41)))
            (POINTER 31)))
   (PUNCTUATION (. . 6))
   (POINTER 02) (TREE-NUM 1) (INDEX 0))
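
The way empty categories point back to indexed constituents can be illustrated with a small reader for this bracketed notation. The following is a minimal sketch, not the GLARF software itself: the function names (`tokenize`, `parse`, `predicate_arguments`, and the helpers) are hypothetical, and `SAMPLE` is a trimmed fragment of the structure above with the POINTER features and the ADV branch left out. It assumes that a node carrying `(EC-TYPE PB)` is a pointer and that the node with the same `INDEX` value and no `EC-TYPE` is its filler.

```python
import re

# Trimmed fragment of the example TFS (assumption: POINTER features omitted).
SAMPLE = """
(S (SBJ (NP (HEAD (PRP they 2)) (INDEX 1)))
   (PRD (VP (HEAD (VX (HEAD (VBN made 3))
                      (P-ARG0 (NP (EC-TYPE PB) (INDEX 1)))
                      (P-ARG1 (NP (EC-TYPE PB) (INDEX 3)))
                      (INDEX 2)))
            (OBJ (NP (T-POS (CD three 4))
                     (HEAD (NX (HEAD (NNS bids 5))))
                     (INDEX 3))))))
"""

def tokenize(text):
    # Parentheses are structural; anything else is a whitespace-delimited atom.
    return re.findall(r"[()]|[^\s()]+", text)

def parse(tokens):
    """Parse one parenthesised expression into nested Python lists."""
    pos = 0
    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # consume the closing parenthesis
        return node
    return read()

def children(node):
    return [c for c in node[1:] if isinstance(c, list)]

def feature(node, name):
    """Return the value of an atomic feature such as (INDEX 1), or None."""
    for c in children(node):
        if c[0] == name and len(c) == 2 and isinstance(c[1], str):
            return c[1]
    return None

def words(node):
    """Collect the words under a node; terminals look like (POS word number)."""
    if len(node) == 3 and all(isinstance(x, str) for x in node):
        return [node[1]]
    return [w for c in children(node) for w in words(c)]

def index_table(node, table=None):
    """Map each INDEX value to the overt constituent that carries it."""
    table = {} if table is None else table
    idx = feature(node, "INDEX")
    if idx is not None and feature(node, "EC-TYPE") is None:
        table[idx] = node
    for c in children(node):
        index_table(c, table)
    return table

def predicate_arguments(tree):
    """List (predicate, role, filler) triples for every P-ARG feature."""
    table = index_table(tree)
    triples = []
    def visit(node):
        for c in children(node):
            if c[0].startswith("P-ARG"):
                head = next(h for h in children(node) if h[0] == "HEAD")
                target = table.get(feature(children(c)[0], "INDEX"))
                filler = " ".join(words(target)) if target else None
                triples.append((" ".join(words(head)), c[0], filler))
            visit(c)
    visit(tree)
    return triples

if __name__ == "__main__":
    for triple in predicate_arguments(parse(tokenize(SAMPLE))):
        print(triple)
```

On the fragment, the two P-ARG features of "made" resolve to "they" and "three bids", matching the PropBank annotation on the previous slide.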
6What is a Standard Anyway?
- Wide Usage (VHS/Betamax, cassette/8-track, Windows/Mac)
- Quality, the first of its kind, etc.
- Papers written by happy users
- A Shared Task like CONLL
- What need does GLARF-ULA fill?
- Unified Detailed Linguistic Annotation
- German, Czech, Japanese, but not English
- A la carte analyses with compatible encodings insufficient
- Because it is desirable to have common tokenization, phrase boundaries, POS tags, etc.
- obvious to GALE participants (part of SRI team uses GLARF)
- Working toward a standard, not necessarily GLARF
- Make the useful pieces available
- Contribute to the CONLL representation
7Parts of GLARF-ULA that non-GLARF-users Want
- Last Year's ULA meeting
- Tokenization splits around hyphens
- Based on NomBank and NE tags
- Offset information
- Possibly POS correction (if accurate)
- CONLL
- Tokenization splits around hyphens
- All real words (not just NomBank)
- NE tags
- NP-internal relations
- apposition, relative, possessive, etc.
- NE modification relations
- POST-HON, TITLE
8CONLL Splitting at Hyphens/Slashes 1
- Split tokens
- Assign POS tags
- Automatic results for sample of 179 tokens
- 153 correct (85.5%), 14 incorrect (7.8%), 12 unclear (6.7%)
- Decimal token numbers
  (VP (NP (NNP New 6)
          (NNP York 7.1))
      (HYPH - 7.2)
      (VBN based 7.3))
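
The decimal numbering scheme can be sketched in a few lines. This is an illustration under stated assumptions, not the system's code: `split_token` and `number_tokens` are hypothetical names, and the sketch splits at every hyphen or slash, whereas the actual system splits only when the conditions on the next slide hold.

```python
import re

def split_token(token):
    """Split at hyphens and slashes, keeping each separator as its own segment."""
    return [seg for seg in re.split(r"([-/])", token) if seg]

def number_tokens(tokens, start=0):
    """Pair each segment with a token number.

    Unsplit tokens keep their integer number N; the segments of a split
    token are numbered N.1, N.2, ... so the original numbering is preserved.
    """
    numbered = []
    for n, token in enumerate(tokens, start):
        segments = split_token(token)
        if len(segments) == 1:
            numbered.append((token, str(n)))
        else:
            for k, seg in enumerate(segments, 1):
                numbered.append((seg, f"{n}.{k}"))
    return numbered

if __name__ == "__main__":
    for segment, number in number_tokens(["New", "York-based"], start=6):
        print(number, segment)
```

Because only the split token receives decimal suffixes, annotations that refer to the original token numbers remain valid after splitting.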
9CONLL Splitting at Hyphens/Slashes 2
- Split Segments iff
- COMLEX words, numbers, prefixes (from a list)
- Required by BBN NE tags (we made a gazetteer)
- Relations from GLARF
- Conjunction cases: Japan-U.S. agreement
- Everything else (distinguish HMOD/HEAD)
- GLARF distinguishes them further
10NP-internal Relations
- NP internal relations used for CONLL
- Title: Mr. John Smith
- Post-Hon: John Smith Jr., III, Inc., Ph.D., etc.
- APPOSITE: John Smith, president of the U.S.
- SUFFIX: John 's
- Near 100% accuracy for small sample
- 45 correct, 2 unclear
- All NP GLARF Roles
- RELATIVE, COMP, A-POS, T-POS, Q-POS, etc.
- 224 correct (83.9%), 32 wrong (12%), 11 unclear (4.1%)
11Automatic GLARF for ULA-OANC-1
- Out of the Box with Charniak parser
- Role Precision for 1st 5 sentences in Kaufman
- NomBank 8/10 (80%)
- PropBank 25/31 (81%)
- PDTB 7/11 (64%)
- Tune Charniak results
- Run/Tune on Treebank (and other hand data)
- Process CONLL style
- Use for LAW 2 WG task
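
The role precision figures on this slide are correct roles divided by proposed roles, rounded to the nearest percent. A minimal sketch of that arithmetic follows; the names `precision_percent`, `ROLE_COUNTS`, and `report` are hypothetical, and the counts are the ones given on the slide.

```python
def precision_percent(correct, proposed):
    """Return precision as a percentage rounded to the nearest integer."""
    return round(100 * correct / proposed)

# (correct, proposed) role counts for the first five sentences, from the slide.
ROLE_COUNTS = {
    "NomBank": (8, 10),
    "PropBank": (25, 31),
    "PDTB": (7, 11),
}

def report(counts):
    """Map each annotation layer to its rounded precision."""
    return {name: precision_percent(c, p) for name, (c, p) in counts.items()}

if __name__ == "__main__":
    for name, pct in report(ROLE_COUNTS).items():
        print(f"{name}: {pct}%")
```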
12Chinese TreeBank and PropBank
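[Figure: Chinese TreeBank tree with a PropBank Arg0 annotation; node labels NP, VV, NN, AD; the Chinese words are not recoverable]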
13Chinese GLARF
(IP (SBJ (NP (HEAD (NN ??)) (INDEX 1)))
    (PRD (VP (ADV (ADVP (HEAD (AD ??))))
             (HEAD (VX (HEAD (VV ??))
                       (P-ARG0 (NP (EC-TYPE PB) (INDEX 1)))
                       (P-ARG1 (NP (EC-TYPE PB) (INDEX 2)))))
             (OBJ (NP (T-POS (DP (HEAD (DT ?))))
                      (HEAD (NX (HEAD (NN ?))))
                      (INDEX 2))))))
14Summary
- Helped build a CONLL standard
- Adopting the useful parts of GLARF
- Interoperability
- Automatic GLARF
- Input Annotation (hand or automatic)
- Extend to Chinese (and Japanese)
15Future for GLARF-ULA
- NE-like integration, e.g. TIMEX, Opinion
- Structure-changing vs. match dependency head
- NEs with markable Nom/PropBank structure
- PDTB and NomBank overlap occasionally
- "For example", "As a result", etc.
- adjudication procedures needed
- TimeML relations, NonOvert PDTB
- More CONLL integration