GLARF-ULA: Working Towards Usability

Transcript and Presenter's Notes

1
GLARF-ULA: Working Towards Usability
Unified Linguistic Annotation Workshop
Adam Meyers, New York University
March 19, 2008
2
Outline
  • Introduction to the GLARF Approach
  • What is a standard anyway?
  • Improving and Distributing Easy-to-Use Parts
  • Participation in CONLL
  • Chinese GLARF

3
GLARF Approach to ULA
  • A Typed Feature Structure Representation
  • Produces a single-theory analysis
  • Not Reversible
  • GLARF System combines:
    • hand annotation
    • automatically generated annotation
    • a combination of manual and automatic annotation

4
Example Sentence
  • Meanwhile, they made three bids.
  • Offset of first character: 123
  • PDTB: Meanwhile: ARG1 = previous S, ARG2 = current S
  • PropBank: made: ARG0 = they, ARG1 = three bids
  • NomBank: bids: ARG0 = they, Support = made
  • Penn Treebank:
    (S (ADVP (RB Meanwhile)) (, ,)
       (NP (PRP they))
       (VP (VBN made)
           (NP (CD three)
               (NNS bids))) (. .))

5
GLARF TFS
    (S (ADV (ADVP (HEAD (ADVX (HEAD (RB Meanwhile 0))
                              (P-ARG1 (S (EC-TYPE PB) (INDEX 00)))
                              (P-ARG2 (S (EC-TYPE PB) (INDEX 0)))))
                  (POINTER 01)))
       (PUNCTUATION (, , 1))
       (SBJ (NP (HEAD (PRP they 2)) (INDEX 1) (POINTER 21)))
       (PRD (VP (HEAD (VX (HEAD (VBN made 3))
                          (P-ARG0 (NP (EC-TYPE PB) (INDEX 1)))
                          (P-ARG1 (NP (EC-TYPE PB) (INDEX 3)))
                          (INDEX 2)))
                (OBJ (NP (T-POS (CD three 4))
                         (HEAD (NX (HEAD (NNS bids 5))
                                   (P-ARG0-Supp (NP (EC-TYPE PB) (INDEX 1)))
                                   (Support (VX (EC-TYPE PB) (INDEX 2)))))
                         (INDEX 3)
                         (POINTER 41)))
                (POINTER 31)))
       (PUNCTUATION (. . 6))
       (POINTER 02) (TREE-NUM 1) (INDEX 0))
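
In the feature structure on this slide, each P-ARG feature points to its filler through a shared INDEX value. As a minimal sketch of how a consumer might follow those links, the Python below parses a simplified, balanced fragment of the structure (POINTER features, punctuation, and the NomBank features are omitted) and pairs each predicate with its arguments. The function names and the fragment are illustrative assumptions, not part of any GLARF distribution.

```python
import re

# Simplified fragment modeled on the SBJ and PRD subtrees of the slide.
GLARF = """
(S (SBJ (NP (HEAD (PRP they 2)) (INDEX 1)))
   (PRD (VP (HEAD (VX (HEAD (VBN made 3))
                      (P-ARG0 (NP (EC-TYPE PB) (INDEX 1)))
                      (P-ARG1 (NP (EC-TYPE PB) (INDEX 3)))
                      (INDEX 2)))
            (OBJ (NP (T-POS (CD three 4))
                     (HEAD (NX (HEAD (NNS bids 5))))
                     (INDEX 3))))))
"""

def parse(text):
    """Parse a bracketed string into nested lists: [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        items = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                items.append(node())
            else:
                items.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return items

    return node()

def feature(n, name):
    """Return the first child of n labeled `name`, or None."""
    for c in n[1:]:
        if isinstance(c, list) and c[0] == name:
            return c
    return None

def words(n):
    """Collect surface words; a leaf looks like [POS, word, token_number]."""
    if len(n) == 3 and all(isinstance(x, str) for x in n):
        return [n[1]]
    out = []
    for c in n[1:]:
        if isinstance(c, list):
            out.extend(words(c))
    return out

def index_table(n, table=None):
    """Map each INDEX value to the overt (non-empty-category) node bearing it."""
    table = {} if table is None else table
    idx = feature(n, "INDEX")
    if idx and feature(n, "EC-TYPE") is None:
        table[idx[1]] = n
    for c in n[1:]:
        if isinstance(c, list):
            index_table(c, table)
    return table

def resolve_args(n, table, out=None):
    """List (predicate words, role, filler words) for every P-ARG feature."""
    out = [] if out is None else out
    for c in n[1:]:
        if isinstance(c, list):
            if c[0].startswith("P-ARG"):
                target = feature(c[1], "INDEX")[1]
                out.append((" ".join(words(n)), c[0],
                            " ".join(words(table[target]))))
            else:
                resolve_args(c, table, out)
    return out

tree = parse(GLARF)
relations = resolve_args(tree, index_table(tree))
for predicate, role, filler in relations:
    print(predicate, role, filler)
```

On this fragment the sketch recovers the PropBank relations listed on the Example Sentence slide: "they" as P-ARG0 and "three bids" as P-ARG1 of "made".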

6
What is a Standard Anyway?
  • Wide usage (VHS/Betamax, cassette/8-track, Windows/Mac)
  • Quality, the first of its kind, etc.
  • Papers written by happy users
  • A shared task like CONLL
  • What need does GLARF-ULA fill?
    • Unified detailed linguistic annotation
    • German, Czech, Japanese, but not English
  • A la carte analyses with compatible encodings are insufficient
    • Because it is desirable to have common tokenization, phrase boundaries, POS tags, etc.
    • Obvious to GALE participants (part of the SRI team uses GLARF)
  • Working toward a standard, not necessarily GLARF
    • Make the useful pieces available
    • Contribute to the CONLL representation

7
Parts of GLARF-ULA that non-GLARF-users Want
  • Last year's ULA meeting
    • Tokenization splits around hyphens
      • Based on NomBank and NE tags
    • Offset information
    • Possibly POS correction (if accurate)
  • CONLL
    • Tokenization splits around hyphens
      • All real words (not just NomBank)
    • NE tags
    • NP-internal relations
      • apposition, relative, possessive, etc.
    • NE modification relations
      • POST-HON, TITLE

8
CONLL Splitting at Hyphens/Slashes 1
  • Split tokens
  • Assign POS tags
  • Automatic results for a sample of 179 tokens
    • 153 correct (85.5%), 14 incorrect (7.8%), 12 unclear (6.7%)
  • Decimal token numbers
    (VP (NP (NNP New 6)
            (NNP York 7.1))
        (HYPH - 7.2)
        (VBN based 7.3))
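
The decimal numbering on this slide can be sketched as follows. This is a minimal illustration that assumes every hyphen or slash triggers a split; the actual system applies further conditions before splitting, and the function names `split_token` and `renumber` are hypothetical.

```python
import re

def split_token(word, number):
    """Split `word` at hyphens and slashes, keeping the separators.

    Returns (segment, token_number) pairs. A token that is not split keeps
    its integer number; segments of a split token are numbered n.1, n.2, ...
    """
    segments = [s for s in re.split(r"([-/])", word) if s]
    if len(segments) == 1:
        return [(word, str(number))]
    return [(seg, f"{number}.{i}") for i, seg in enumerate(segments, start=1)]

def renumber(words, start=0):
    """Number whitespace tokens from `start`, splitting each one as needed."""
    out = []
    for n, w in enumerate(words, start=start):
        out.extend(split_token(w, n))
    return out

numbered = renumber(["New", "York-based"], start=6)
print(numbered)
```

Because the original token number is kept as the integer part, annotations aligned to the unsplit tokenization can still be mapped onto the split one.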

9
CONLL Splitting at Hyphens/Slashes 2
  • Split segments iff:
    • COMLEX words, numbers, prefixes (from a list)
    • Required by BBN NE tags (we made a gazetteer)
  • Relations from GLARF
    • Conjunction cases: Japan-U.S. agreement
    • Everything else (distinguish HMOD/HEAD)
    • GLARF distinguishes them further
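
One possible reading of the split conditions on this slide is a lexicon test on each segment, sketched below. The word list, prefix list, and gazetteer here are tiny hypothetical stand-ins for COMLEX, the prefix list, and the BBN-derived gazetteer, and requiring every segment to pass the test is an assumption rather than the documented rule.

```python
import re

# Hypothetical stand-ins; the real resources are far larger.
KNOWN_WORDS = {"york", "based", "japan", "agreement"}   # stands in for COMLEX
PREFIXES = {"anti", "non", "pre"}                       # stands in for the prefix list
NE_GAZETTEER = {"u.s."}                                 # stands in for the NE gazetteer

def is_known_segment(segment):
    """True if the segment is a listed word, prefix, NE entry, or number."""
    s = segment.lower()
    if re.fullmatch(r"\d+(\.\d+)?", s):
        return True
    return s in KNOWN_WORDS or s in PREFIXES or s in NE_GAZETTEER

def should_split(token):
    """Split only when the token has a hyphen/slash and all segments are known."""
    segments = [s for s in re.split(r"[-/]", token) if s]
    return len(segments) > 1 and all(is_known_segment(s) for s in segments)

for token in ["York-based", "Japan-U.S.", "Hewlett-Packard", "bids"]:
    print(token, should_split(token))
```

With these toy lists, "York-based" and "Japan-U.S." are split, while "Hewlett-Packard" and "bids" are left whole.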

10
NP-internal Relations
  • NP-internal relations used for CONLL
    • TITLE: Mr. John Smith
    • POST-HON: John Smith Jr.; III, Inc., Ph.D., etc.
    • APPOSITE: John Smith, president of the U.S.
    • SUFFIX: John 's
  • Near 100% accuracy for a small sample
    • 45 correct, 2 unclear
  • All NP GLARF roles
    • RELATIVE, COMP, A-POS, T-POS, Q-POS, etc.
    • 224 correct (83.9%), 32 wrong (12%), 11 unclear (4.1%)
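
The percentages reported for these evaluation samples follow directly from the counts. As a quick check, the short sketch below (the helper name `breakdown` is made up for illustration) converts a tally into percentages rounded to one decimal place.

```python
def breakdown(counts):
    """Convert a {label: count} tally into percentages of the total."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}

np_roles = breakdown({"correct": 224, "wrong": 32, "unclear": 11})
hyphen_splits = breakdown({"correct": 153, "incorrect": 14, "unclear": 12})
print(np_roles)
print(hyphen_splits)
```

The first tally is the all-NP-roles sample from this slide (267 items in total); the second is the 179-token hyphen-splitting sample from the earlier slide.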

11
Automatic GLARF for ULA-OANC-1
  • Out of the box with Charniak parser
    • Role precision for the first 5 sentences in Kaufman
      • NomBank: 8/10 (80%)
      • PropBank: 25/31 (81%)
      • PDTB: 7/11 (64%)
  • Tune Charniak results
  • Run/tune on Treebank (and other hand data)
  • Process CONLL style
  • Use for LAW 2 WG task

12
Chinese TreeBank and PropBank
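[Figure: Chinese TreeBank tree with PropBank annotation. Recoverable labels: Arg0, NP, VV, NN, AD. The Chinese characters were lost in extraction.]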
13
Chinese GLARF
(IP (SBJ (NP (HEAD (NN ??)) (INDEX 1)))
    (PRD (VP (ADV (ADVP (HEAD (AD ??))))
             (HEAD (VX (HEAD (VV ??))
                       (P-ARG0 (NP (EC-TYPE PB) (INDEX 1)))
                       (P-ARG1 (NP (EC-TYPE PB) (INDEX 2)))))
             (OBJ (NP (T-POS (DP (HEAD (DT ?))))
                      (HEAD (NX (HEAD (NN ?))))
                      (INDEX 2))))))
14
Summary
  • Helped build a CONLL standard
  • Adopting the useful parts of GLARF
  • Interoperability
  • Automatic GLARF
  • Input annotation (hand or automatic)
  • Extend to Chinese (and Japanese)

15
Future for GLARF-ULA
  • NE-like integration, e.g. TIMEX, Opinion
  • Structure-changing vs. match dependency head
  • NEs with markable Nom/PropBank structure
  • PDTB and NomBank overlap occasionally
  • e.g. the connectives "For example", "As a result", etc.
  • adjudication procedures needed
  • TimeML relations, NonOvert PDTB
  • More CONLL integration