Interlingua Annotation

1
Interlingua Annotation
  • Owen Rambow
  • Advaith Siddharthan
  • Kathleen McKeown
  • rambow_at_cs.columbia.edu

2
Goal
  • Determine feasible deep semantic,
    language-independent annotation (interlingua)
    for text
  • Different from PropBank, FrameNet, WordNet:
    these projects are language-dependent

3
Expected Results
  • Annotation guidelines (methodology, manual) for
    annotating language-independent meaning
    representation on texts in 7 languages
  • Methodology for porting to new languages
  • Annotated corpora

4
Methodology
  • Use source-language texts and multiple
    translations into English
  • Develop successively more language-independent
    levels of representation:
      • (deep syntax)
      • language-specific lexical disambiguation and
        thematic structure (agent, theme, ...)
      • language-independent representation
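
As a rough illustration of how the three levels could sit on one annotated word, here is a minimal sketch. The class, field names, and label values (`Node`, `sense`, `theta_role`, `concept`, "sense_1", "CONCEPT_BUY") are hypothetical and are not taken from the project's annotation scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One word in a deep-syntactic tree, with optional higher-level labels."""
    word: str
    relation: str                      # level 1: deep-syntactic relation to the head
    sense: Optional[str] = None        # level 2: language-specific sense label
    theta_role: Optional[str] = None   # level 2: thematic role (agent, theme, ...)
    concept: Optional[str] = None      # level 3: language-independent concept


def highest_level(node: Node) -> int:
    """Return the most language-independent level annotated on this node."""
    if node.concept is not None:
        return 3
    if node.sense is not None or node.theta_role is not None:
        return 2
    return 1


# Hypothetical example: a verb annotated step by step.
verb = Node(word="bought", relation="root")
level_after_syntax = highest_level(verb)

verb.sense = "sense_1"
level_after_disambiguation = highest_level(verb)

verb.concept = "CONCEPT_BUY"
level_after_interlingua = highest_level(verb)
```

Keeping all levels on one node is only one possible design; separate layers linked by node identifiers would work equally well.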

5
Methodology (2)
  • Six sites (CMU, Columbia, ISI, Mitre, NMSU, UMd)
  • Each site has one language (Columbia: Hindi)
  • Closer cooperation between Columbia and UMd on
    Arabic and Hindi
  • Division of tasks and expertise among sites

6
Methodology (3)
  • Use annotators from beginning to test
    inter-annotator agreement
  • Columbia has hired a native Hindi annotator
    (near-native English) and an English-language
    annotator
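
The slides do not say how inter-annotator agreement is measured. One common choice is Cohen's kappa, which corrects raw agreement for chance. The sketch below assumes two annotators labelling the same items with one categorical label each; the function name and the sample labels are illustrative only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up thematic-role labels for six items, for illustration only.
annotator_1 = ["agent", "theme", "agent", "goal", "theme", "agent"]
annotator_2 = ["agent", "theme", "theme", "goal", "theme", "agent"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

With more than two annotators, or with structured annotations, a different measure would be needed.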

7
Research Issues
  • Research: develop annotation scheme(s),
    methodology, manuals
  • Levels (reminder):
      • (deep syntax)
      • language-specific lexical disambiguation
      • language-independent representation
  • Questions:
      • Which levels do we annotate explicitly?
      • What is included where?
      • How do we annotate? Using which tools?

8
Timeline
  • January: develop language-specific disambiguation
  • February-March: annotate, measure
  • April-June: develop language-independent
    annotation
  • July-August: annotate, measure
  • Year 2: review results, adjust annotation scheme
  • Year 3: annotate

9
Arabic Dialects
  • Owen Rambow
  • (Nizar Habbash)
  • rambow_at_cs.columbia.edu

10
Goal
  • Investigate representation of linguistic
    resources for closely related languages/dialects
  • Example: Arabic
  • Automatically derive NLP tools for cross-dialect
    MT

11
Note on Arabic
  • Interest: only one written dialect, Modern
    Standard Arabic (MSA), rarely spoken
    spontaneously
  • Many spoken dialects, almost never written
  • Dialects: function of geography, urban/rural,
    Bedouin/sedentary, sex, religion, ...
  • Code switching (mainly dialect-MSA): several
    linguistic systems in the same sentence
  • Challenge for traditional NLP approaches!

12
Expected Results
  • Representation of phonology, lexicon, morphology,
    and syntax for Modern Standard Arabic and
    Egyptian Colloquial Arabic
  • Tools for converting between MSA and ECA
  • Demonstration of tools in several domains (ECA
    speech recognition, ECA -> English translation)

13
Methodology
  • Use existing scholarly resources to compile
    sound-change rules, morphological
    representations, syntactic representations
  • Use native speakers to validate, and augment
    lexicon
  • Develop representation
  • Develop automatic compilation of NLP tools
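
To make the idea of compiling declarative sound-change rules into a tool concrete, here is a minimal sketch. It assumes rules are written as ordered (pattern, replacement) pairs over a phonemic transcription. The three rules are simplified textbook MSA-to-Cairene correspondences chosen for illustration; they are not the project's rule set, and real rules would need phonological context.

```python
import re

# Hypothetical, simplified rule list: ordered (pattern, replacement) pairs.
SOUND_CHANGE_RULES = [
    (r"q", "ʔ"),    # MSA /q/ is typically a glottal stop in Cairene speech
    (r"θ", "t"),    # MSA /θ/ commonly corresponds to /t/
    (r"dʒ", "g"),   # MSA /dʒ/ corresponds to /g/
]


def compile_converter(rules):
    """Compile a rule list into a function that rewrites one word form."""
    compiled = [(re.compile(pattern), repl) for pattern, repl in rules]

    def convert(form):
        # Apply the rules in order; each rule sees the previous rule's output.
        for pattern, repl in compiled:
            form = pattern.sub(repl, form)
        return form

    return convert


msa_to_eca = compile_converter(SOUND_CHANGE_RULES)
heart = msa_to_eca("qalb")       # 'ʔalb'
three = msa_to_eca("θalaaθa")    # 'talaata'
camel = msa_to_eca("dʒamal")     # 'gamal'
```

Because the rules are plain data, native speakers could review or extend them without touching the conversion code.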

14
Timeline
  • Sep-Dec: start compiling sound change rules,
    morphological rules, syntax
  • Jan-April: develop representations for sound
    change rules, morphology
  • Jan-April: develop conversion rules
  • May-August: work on ECA speech recognition
    application
  • Note: also working on MSA syntax
  • Year 2: extend to syntax, extend to second
    dialect (Palestinian? Iraqi?)