Construct State Modification in the Arabic Treebank - PowerPoint PPT Presentation

1
Construct State Modification in the Arabic Treebank
  • Ryan Gabbard and Seth Kulick
  • University of Pennsylvania

2
Outline
  • Construct State (iDAfa إضافة) in Arabic
    • What it is
    • The problem of attachment within an iDAfa
  • A Machine Learning Approach
    • Definition, Features, Results
  • Conclusion and Future Work

3
Construct State (iDAfa)
  • 2 words grouped tightly together
  • Like English compound or possessive
  • NOUN with NP complement (recursive)

(NP $awAriE streets
    (NP madiynap city
        (NP luwnog byt$ Long Beach)))
شوارع مدينة لونغ بيتش

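The recursive NOUN + NP-complement structure can be made concrete with a small Python sketch. It parses a bracketed tree (Buckwalter transliteration only, English glosses dropped) and lists the word(s) at each iDAfa level. The names `parse` and `idafa_chain`, and the assumption that the complement is the last NP child, are illustrative choices and not the authors' code.

```python
from typing import List, Optional, Tuple

# A node is (label, children); a child is either a sub-node or a word string.
Tree = Tuple[str, list]


def parse(text: str) -> Tree:
    """Parse a single bracketed tree such as "(NP w1 (NP w2))" (hypothetical helper)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children: list = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def idafa_chain(tree: Tree) -> List[List[str]]:
    """Words at each iDAfa level, outermost (level 0) first.

    Assumes an unmodified iDAfa: each NP holds its head word(s) directly,
    and its NP complement (if any) is its last NP child.
    """
    levels: List[List[str]] = []
    current: Optional[Tree] = tree
    while current is not None:
        _, children = current
        levels.append([c for c in children if isinstance(c, str)])
        nps = [c for c in children if isinstance(c, tuple) and c[0] == "NP"]
        current = nps[-1] if nps else None
    return levels
```

For the tree above, `idafa_chain(parse("(NP $awAriE (NP madiynap (NP luwnog byt$)))"))` gives `[['$awAriE'], ['madiynap'], ['luwnog', 'byt$']]`, i.e. three levels.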
4
Construct State (iDAfa)
(NP $awAriE streets
    (NP (NP madiynap city
            (NP luwnog byt$ Long Beach))
        (PP fiy in
            (NP wilAyap state
                (NP kAliyfuwrniyA California)))))
شوارع مدينة لونغ بيتش في ولاية كاليفورنيا
  • (Multiple) modification at any level
  • Modifiers stacked up at the end
  • No clear pattern of attachment level
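Because the modifiers are stacked at the end, the level each one attaches to has to be read off the tree structure. The sketch below is a hypothetical extraction routine, not taken from the talk: it walks down the iDAfa spine, treats `(NP (NP ...) MOD ...)` as adjunction at the current level, and treats head word(s) followed by an NP complement as a step one level deeper. Levels count from 0 at the outermost noun; `parse` and `modifier_levels` are illustrative names.

```python
from typing import List, Optional, Tuple

Tree = Tuple[str, list]  # (label, children); a child is a sub-node or a word


def parse(text: str) -> Tree:
    """Minimal bracketed-tree parser (hypothetical helper)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node() -> Tree:
        nonlocal pos
        label, pos = tokens[pos + 1], pos + 2  # skip "(" and read the label
        children: list = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def modifier_levels(tree: Tree) -> List[Tuple[str, int]]:
    """(modifier label, attachment level) for each adjoined modifier, in surface order."""
    groups: List[List[Tuple[str, int]]] = []
    current: Optional[Tree] = tree
    level = 0
    while current is not None:
        _, children = current
        first = children[0] if children else None
        if isinstance(first, tuple) and first[0] == "NP" and len(children) > 1:
            # Adjunction (NP (NP ...) MOD ...): these modifiers attach at this level.
            groups.append([(m[0], level) for m in children[1:] if isinstance(m, tuple)])
            current = first  # the modified NP is still at the same level
        else:
            # Head word(s) plus NP complement: the complement is one level deeper.
            nps = [c for c in children if isinstance(c, tuple) and c[0] == "NP"]
            current = nps[-1] if nps else None
            level += 1
    # Shallower adjunctions are visited first but appear later in the string.
    return [m for g in reversed(groups) for m in g]
```

On the tree above this yields `[('PP', 1)]`: the PP `fiy ...` attaches to the level-1 noun `madiynap`, not to the outermost noun or the innermost one.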

5
Restriction on PP attachment in PTB
  • Multiple PP modifiers at same level

Allowed:
(NP (NP )
    (PP )
    (PP ))
Not Allowed:
(NP (NP (NP )
        (PP ))
    (PP ))
  • Parser can learn that PPs attach to base (non-recursive) NPs (Collins, 1999)
  • Not true for ATB, because of the iDAfa.
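The base-NP regularity can be checked mechanically. This is a sketch under two assumptions: a base NP is an NP that dominates no other NP, and an adjoined modifier is any PP or ADJP that follows an NP first child. The function counts modifiers adjoined to non-base NPs; it accepts the Allowed pattern, flags the Not Allowed one, and also flags the iDAfa tree from the previous slide, whose PP adjoins to `(NP madiynap (NP luwnog byt$))`. `dominates_np` and `non_base_attachments` are hypothetical names.

```python
from typing import Tuple

Tree = Tuple[str, list]  # (label, children); a child is a sub-node or a word


def parse(text: str) -> Tree:
    """Minimal bracketed-tree parser (hypothetical helper)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node() -> Tree:
        nonlocal pos
        label, pos = tokens[pos + 1], pos + 2  # skip "(" and read the label
        children: list = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def dominates_np(node: Tree) -> bool:
    """True if any descendant of `node` is an NP (so `node` is not a base NP)."""
    return any(isinstance(c, tuple) and (c[0] == "NP" or dominates_np(c)) for c in node[1])


def non_base_attachments(tree: Tree) -> int:
    """Count PP/ADJP modifiers adjoined to an NP that is not a base NP."""
    label, children = tree
    count = 0
    if label == "NP" and children and isinstance(children[0], tuple) and children[0][0] == "NP":
        mods = [c for c in children[1:] if isinstance(c, tuple) and c[0] in ("PP", "ADJP")]
        if mods and dominates_np(children[0]):
            count += len(mods)  # the modified NP is recursive, i.e. not a base NP
    return count + sum(non_base_attachments(c) for c in children if isinstance(c, tuple))
```

The PTB-style English rendering of the same phrase comes out with a count of 0, since every PP there adjoins to a base NP such as `(NP the city)`.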

6
Modification of non-base NPs
(NP $awAriE streets
    (NP (NP madiynap city
            (NP luwnog byt$ Long Beach))
        (PP fiy in
            (NP wilAyap state
                (NP kAliyfuwrniyA California)))))

(NP (NP streets)
    (PP of
        (NP (NP the city)
            (PP of (NP Long Beach))
            (PP in
                (NP (NP the state)
                    (PP of (NP California)))))))

7
Problem Summary and Approach
  • PP, ADJP attachment harder in ATB
  • Cannot rely on base NP constraint
  • PP attachment to a non-base NP: nearly non-existent in PTB, but the 16th most frequent dependency in ATB
  • PP attachment worse for ATB (Kulick, Gabbard, and Marcus, 2006)
  • Treat attachment within an iDAfa as a problem independent of the parser

8
The Task as a Machine Learning Problem
  • Definition
    • Instances are attachments: extract iDAfas and modifiers from the corpus
    • Labels are the level to attach at
    • Constraint: no attachments crossing levels
  • Technique
    • MaxEnt model to label attachments
    • Dynamic programming to enforce the constraint
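The talk does not spell out the dynamic program, so the following is only a sketch under one assumed reading of the constraint. With the modifiers in surface order and levels numbered from 0 (outermost noun), a later modifier cannot attach deeper than an earlier one without crossing its attachment, so the level sequence must be non-increasing. Given per-level log-probabilities for each modifier (for example from the MaxEnt model), a Viterbi-style pass finds the best-scoring sequence that respects this. `decode` and the score layout are hypothetical.

```python
import math
from typing import List, Tuple


def decode(scores: List[List[float]]) -> Tuple[float, List[int]]:
    """Best non-crossing level assignment.

    scores[i][d] is the log-probability that modifier i (surface order)
    attaches at level d; all rows are assumed to have the same length.
    Assumed constraint: levels are non-increasing from left to right.
    """
    n = len(scores)
    if n == 0:
        return 0.0, []
    depth = len(scores[0])
    # best[i][d]: best total for modifiers 0..i with modifier i at level d
    best = [[-math.inf] * depth for _ in range(n)]
    back = [[-1] * depth for _ in range(n)]
    best[0] = list(scores[0])
    for i in range(1, n):
        for d in range(depth):
            # The previous modifier must sit at the same level or deeper.
            for p in range(d, depth):
                cand = best[i - 1][p] + scores[i][d]
                if cand > best[i][d]:
                    best[i][d], back[i][d] = cand, p
    last = max(range(depth), key=lambda d: best[n - 1][d])
    total = best[n - 1][last]
    path = [last]
    for i in range(n - 1, 0, -1):  # follow back-pointers to modifier 0
        path.append(back[i][path[-1]])
    path.reverse()
    return total, path
```

For two modifiers over three levels with probabilities `[0.6, 0.3, 0.1]` and `[0.1, 0.1, 0.8]`, choosing each modifier's best level independently gives `[0, 2]`, which crosses; `decode` instead returns levels `[2, 2]` with total probability 0.08.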

9
Machine Learning Features
  • Baseline: only the level of attachment
  • Non-Baseline Features
    • AttSym: POS tag or nonterminal label of the modifier
    • Lex: (noun being modified, head word of modifier)
    • TotDepth: baseline combined with the total depth of the iDAfa and AttSym
    • Simple GenAgr: AttSym combined with the gender suffixes of the words corresponding to Lex
    • Full GenAgr: Simple GenAgr, also with number suffixes
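The feature templates can be pictured as string-valued MaxEnt features generated for each candidate level. This Python sketch is hypothetical throughout: the `Instance` fields, the template spellings, conjoining every feature with the candidate level, and the crude gender test (a final Buckwalter `p`, i.e. taa marbuta, taken as a feminine suffix) are all assumptions for illustration. Full GenAgr would add number suffixes in the same way.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Instance:
    """One modifier of one iDAfa (hypothetical representation)."""
    idafa_nouns: List[str]  # head noun at each level, outermost (level 0) first
    mod_label: str          # AttSym: POS tag or nonterminal label of the modifier
    mod_head: str           # head word of the modifier


def gender_suffix(word: str) -> str:
    # Crude assumption: a final Buckwalter 'p' (taa marbuta) marks feminine.
    return "F" if word.endswith("p") else "M"


def features(inst: Instance, level: int, use: Set[str]) -> List[str]:
    """Feature strings for attaching `inst` at `level`, each conjoined with the level."""
    noun = inst.idafa_nouns[level]  # the noun that would be modified
    feats = [f"lvl={level}"]  # Baseline: the level alone
    if "AttSym" in use:
        feats.append(f"lvl={level}|sym={inst.mod_label}")
    if "Lex" in use:
        feats.append(f"lvl={level}|noun={noun}|head={inst.mod_head}")
    if "TotDepth" in use:
        feats.append(f"lvl={level}|depth={len(inst.idafa_nouns)}|sym={inst.mod_label}")
    if "GenAgr" in use:  # Simple GenAgr: gender suffixes of the Lex words
        genders = gender_suffix(noun) + gender_suffix(inst.mod_head)
        feats.append(f"lvl={level}|sym={inst.mod_label}|gen={genders}")
    return feats
```

Conjoining each template with the candidate level lets a classifier whose label is the level weigh, for example, the noun at that level against the modifier's head word.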

10
Machine Learning Results
Features                        Accuracy (%)
Base                            39.7
Base + AttSym                   76.1
Base + Lex                      58.4
Base + Lex + AttSym             79.9
Base + Lex + AttSym + TotDepth  78.7
Base + Lex + AttSym + GenAgr    79.3

11
Future Work
  • For the ML problem in this talk
    • More feature investigation
    • Improved analysis of subclasses of iDAfas
  • In the context of a real system
    • Analysis of iDAfa and attachment accuracy in current parsing
    • Get the attachment problem out of the parser: use the current work as a module after parsing