1
Parallel Reverse Treebanks for the Discovery of
Morpho-Syntactic Markings
  • Lori Levin
  • Robert Frederking
  • Alison Alvarez
  • Language Technologies Institute
  • School of Computer Science
  • Carnegie Mellon University

Jeff Good, Department of Linguistics,
Max Planck Institute for Evolutionary Anthropology
2
Reverse Treebank (RTB)
  • What?
  • Create the syntactic structures first
  • Then add sentences
  • Why?
  • To elicit data from speakers of less commonly
    taught languages
  • Decide what meaning we want to elicit
  • Represent the meaning in a feature structure
  • Add an English or Spanish sentence (plus context
    notes) to express the meaning
  • Ask the informant to translate it

3
Bengali Example
  • srcsent: The large bus to the post office broke down.
  • context:
  • tgtsent:
  • ((actor ((modifier ((mod-role mod-descriptor)
  • (mod-role role-loc-general-to)))
  • (np-identifiability identifiable)(np-specificity specific)
  • (np-biological-gender bio-gender-n/a)(np-animacy anim-inanimate)
  • (np-person person-third)(np-function fn-actor)
    (np-general-type common-noun-type)(np-number num-sg)
    (np-pronoun-exclusivity inclusivity-n/a)
    (np-pronoun-antecedent antecedent-n/a)
    (np-distance distance-neutral)))
  • (c-general-type declarative-clause)
    (c-my-causer-intentionality intentionality-n/a)
    (c-comparison-type comparison-n/a)
    (c-relative-tense relative-n/a)
    (c-our-boundary boundary-n/a)
    (c-comparator-function comparator-n/a)
    (c-causee-control control-n/a)
    (c-our-situations situations-n/a)
    (c-comparand-type comparand-n/a)
    (c-causation-directness directness-n/a)
    (c-source source-neutral)
    (c-causee-volitionality volition-n/a)
    (c-assertiveness assertiveness-neutral)
    (c-solidarity solidarity-neutral)
    (c-polarity polarity-positive)
    (c-v-grammatical-aspect gram-aspect-neutral)
    (c-adjunct-clause-type adjunct-clause-type-n/a)
    (c-v-phase-aspect phase-aspect-neutral)
    (c-v-lexical-aspect activity-accomplishment)
    (c-secondary-type secondary-neutral)
    (c-event-modality event-modality-none)
    (c-function fn-main-clause)
    (c-minor-type minor-n/a)
    (c-copula-type copula-n/a)
    (c-v-absolute-tense past)
    (c-power-relationship power-peer)
    (c-our-shared-subject shared-subject-n/a)
    (c-question-gap gap-n/a))
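
The parenthesized notation above is a nested list of feature-value pairs. As a minimal sketch of how such a structure could be read into Python lists, the following assumes a simple parenthesis tokenizer. The function names `tokenize` and `parse` are illustrative and are not part of the AVENUE tools, and the input is a shortened fragment of the example.

```python
def tokenize(text):
    """Split a parenthesized string into '(', ')' and atom tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Recursively build nested lists from the token stream."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # discard the closing ')'
        return items
    return token  # an atom such as 'np-number' or 'num-sg'


# Shortened fragment of the noun phrase features shown above.
fragment = "((np-person person-third)(np-number num-sg)(np-animacy anim-inanimate))"
pairs = parse(tokenize(fragment))

# Each inner list is a [feature, value] pair, so a dict is a natural view.
features = {name: value for name, value in pairs}
print(features["np-number"])
```

A dictionary view only works for the flat part of a structure; nested roles such as `actor` or `modifier` stay as nested lists.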

4
Outline
  • Background
  • The AVENUE Machine Translation System
  • Contents of the RTB
  • An inventory of grammatical meanings
  • Languages that have been elicited
  • Tools for RTB creation
  • Future work
  • Evaluation
  • Navigation

5
AVENUE Machine Translation System
SL: the old man, TL: ha-ish ha-zaqen
NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
(X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
((X1 AGR) = *3-SING)
((X1 DEF) = *DEF)
((X3 AGR) = *3-SING)
((X3 COUNT) = +)
((Y1 DEF) = *DEF)
((Y3 DEF) = *DEF)
((Y2 AGR) = *3-SING)
((Y2 GENDER) = (Y4 GENDER))
)
  • Type information
  • Synchronous Context Free Rules
  • Alignments
  • x-side constraints
  • y-side constraints
  • xy-constraints,
  • e.g. ((Y1 AGR) = (X1 AGR))

Jaime Carbonell (PI), Alon Lavie (Co-PI), Lori
Levin (Co-PI). Rule learning: Katharina Probst
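
The alignment part of the rule above maps the source constituents X1 to X3 onto the target positions Y1 to Y4. The sketch below applies only those alignments to the example phrase. It ignores the agreement and definiteness constraints, and both the three-word lexicon and the function name `apply_rule` are assumptions made for this illustration, not the AVENUE transfer engine.

```python
# Alignments from the rule above as (source position, target position):
# X1->Y1, X1->Y3, X2->Y4, X3->Y2.
ALIGNMENTS = [(1, 1), (1, 3), (2, 4), (3, 2)]

# Toy lexicon for this one example only (an assumption, not the AVENUE lexicon).
LEXICON = {"the": "ha", "old": "zaqen", "man": "ish"}


def apply_rule(source_words, alignments, lexicon):
    """Place the translation of each source word at its aligned target slot."""
    target_length = max(y for _, y in alignments)
    target = [None] * target_length
    for x, y in alignments:
        # Positions in the rule are 1-based; Python lists are 0-based.
        target[y - 1] = lexicon[source_words[x - 1]]
    return target


print(apply_rule(["the", "old", "man"], ALIGNMENTS, LEXICON))
```

Because X1 is aligned to both Y1 and Y3, the determiner is copied twice, which gives the DET N DET ADJ order of ha-ish ha-zaqen.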
6
AVENUE
  • Rules can be written by hand or learned
    automatically.
  • Hybrid
  • Rule-based transfer
  • Statistical decoder
  • Multi-engine combinations with SMT and EBMT

7
AVENUE systems (small and experimental, but
tested on unseen data)
  • Hebrew-to-English
  • Alon Lavie, Shuly Wintner, Katharina Probst
  • Hand-written and automatically learned
  • Automatic rules trained on 120 sentences perform
    slightly better than about 20 hand-written rules.
  • Hindi-to-English
  • Lavie, Peterson, Probst, Levin, Font, Cohen,
    Monson
  • Automatically learned
  • Performs better than SMT when training data is
    limited to 50K words

8
AVENUE systems (small and experimental, but
tested on unseen data)
  • English-to-Spanish
  • Ariadna Font Llitjos
  • Hand-written, automatically corrected
  • Mapudungun-to-Spanish
  • Roberto Aranovich and Christian Monson
  • Hand-written
  • Dutch-to-English
  • Simon Zwarts
  • Hand-written

9
Elicitation
  • Get data from someone who is
  • Bilingual
  • Literate
  • Not experienced with linguistics

10
English-Hindi Example
Elicitation Tool: Erik Peterson
11
English-Chinese Example
12
English-Arabic Example
13
Elicitation
  • srcsent: Tú caíste
  • tgtsent: eymi ütrünagimi
  • aligned: ((1,1),(2,2))
  • context: tú = Juan [masculino, 2a persona del
    singular]
  • comment: You (John) fell
  • srcsent: Tú estás cayendo
  • tgtsent: eymi petu ütrünagimi
  • aligned: ((1,1),(2 3,2 3))
  • context: tú = Juan [masculino, 2a persona del
    singular]
  • comment: You (John) are falling
  • srcsent: Tú caíste
  • tgtsent: eymi ütrünagimi
  • aligned: ((1,1),(2,2))
  • context: tú = María [femenino, 2a persona del
    singular]
  • comment: You (Mary) fell
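
The aligned field in the records above pairs word positions in the source sentence with word positions in the target sentence. The following sketch reads that field and returns the aligned word groups. It assumes that positions are 1-based and count whitespace-separated words; the function names `parse_alignment` and `aligned_words` are illustrative and not taken from the elicitation tool.

```python
import re


def parse_alignment(field):
    """Turn '((1,1),(2 3,2 3))' into [([1], [1]), ([2, 3], [2, 3])]."""
    pairs = []
    # Each inner pair looks like '(<source positions>,<target positions>)'.
    for src, tgt in re.findall(r"\(([\d ]+),([\d ]+)\)", field):
        pairs.append(([int(i) for i in src.split()],
                      [int(j) for j in tgt.split()]))
    return pairs


def aligned_words(src_sentence, tgt_sentence, field):
    """Return (source words, target words) for every alignment pair."""
    src, tgt = src_sentence.split(), tgt_sentence.split()
    return [
        (" ".join(src[i - 1] for i in s), " ".join(tgt[j - 1] for j in t))
        for s, t in parse_alignment(field)
    ]


print(aligned_words("Tú estás cayendo", "eymi petu ütrünagimi", "((1,1),(2 3,2 3))"))
```

For the second record this pairs "Tú" with "eymi" and the two-word progressive "estás cayendo" with "petu ütrünagimi".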

15
Size of RTB
  • Around 3200 sentences
  • 20K words

16
Languages
  • The set of feature structures with English
    sentences has been delivered to the Linguistic
    Data Consortium as part of the Reflex program.
  • Translated (by LDC) into
  • Thai
  • Bengali
  • Plans to translate into
  • Seven strategic languages per year for five
    years.
  • As one small part of a language pack (BLARK) for
    each language.

17
Languages
  • Feature structures are being reverse annotated in
    Spanish at New Mexico State University (Helmreich
    and Cowie)
  • Plans to translate into Guarani
  • Reverse annotation into Portuguese in Brazil
    (Marcello Modesto)
  • Plans to translate into Karitiana
  • 200 speakers
  • Plans to translate into Inupiaq (Kaplan and
    MacLean)

18
Previous Elicitation Work
  • Pilot corpus
  • Around 900 sentences
  • No feature structures
  • Mapudungun
  • Two partial translations
  • Quechua
  • Three translations
  • Aymara
  • Seven translations
  • Hebrew
  • Hindi
  • Several translations
  • Dutch

19
Sample clause level
  • Mary is writing a book for John.
  • Who let him eat the sandwich?
  • Who had the machine crush the car?
  • They did not make the policeman run.
  • Mary had not blinked.
  • The policewoman was willing to chase the boy.
  • Our brothers did not destroy files.
  • He said that there is not a manual.
  • The teacher who wrote a textbook left.
  • The policeman chased the man who was a thief.
  • Mary began to work.
  • Tense, aspect, transitivity
  • Questions, causation and permission
  • Interaction of lexical and grammatical aspect
  • Volitionality
  • Embedded clauses and sequence of tense
  • Relative clauses
  • Phase aspect

20
Sample noun phrase level
  • The man quit in November.
  • The man works in the afternoon.
  • The balloon floated over the library.
  • The man walked over the platform.
  • The man came out from among the group of boys.
  • The long weekly meeting ended.
  • The large bus to the post office broke down.
  • The second man laughed.
  • All five boys laughed.
  • Temporal and locative meanings
  • Quantifiers
  • Numbers
  • Combinations of different types of modifiers
  • My book
  • Possession, definiteness
  • A book of mine
  • Possession, indefiniteness

21
Example
  • srcsent: The large bus to the post office broke down.
  • ((actor ((modifier ((mod-role mod-descriptor)
  • (mod-role role-loc-general-to)))
  • (np-identifiability identifiable)(np-specificity specific)
  • (np-biological-gender bio-gender-n/a)(np-animacy anim-inanimate)
  • (np-person person-third)(np-function fn-actor)
    (np-general-type common-noun-type)(np-number num-sg)
    (np-pronoun-exclusivity inclusivity-n/a)
    (np-pronoun-antecedent antecedent-n/a)
    (np-distance distance-neutral)))
  • (c-general-type declarative-clause)
    (c-my-causer-intentionality intentionality-n/a)
    (c-comparison-type comparison-n/a)
    (c-relative-tense relative-n/a)
    (c-our-boundary boundary-n/a)
    (c-comparator-function comparator-n/a)
    (c-causee-control control-n/a)
    (c-our-situations situations-n/a)
    (c-comparand-type comparand-n/a)
    (c-causation-directness directness-n/a)
    (c-source source-neutral)
    (c-causee-volitionality volition-n/a)
    (c-assertiveness assertiveness-neutral)
    (c-solidarity solidarity-neutral)
    (c-polarity polarity-positive)
    (c-v-grammatical-aspect gram-aspect-neutral)
    (c-adjunct-clause-type adjunct-clause-type-n/a)
    (c-v-phase-aspect phase-aspect-neutral)
    (c-v-lexical-aspect activity-accomplishment)
    (c-secondary-type secondary-neutral)
    (c-event-modality event-modality-none)
    (c-function fn-main-clause)
    (c-minor-type minor-n/a)
    (c-copula-type copula-n/a)
    (c-v-absolute-tense past)
    (c-power-relationship power-peer)
    (c-our-shared-subject shared-subject-n/a)
    (c-question-gap gap-n/a))

22
Grammatical meanings vs syntactic categories
  • Features and values are based on a collection of
    grammatical meanings
  • Many of which are similar to the grammatemes of
    the Prague Treebanks

23
Grammatical Meanings
  • YES
  • Semantic Roles
  • Identifiability
  • Specificity
  • Time
  • Before, after, or during time of speech
  • Modality
  • NO
  • Case
  • Voice
  • Determiners
  • Auxiliary verbs

24
Grammatical Meanings
  • YES
  • How is identifiability expressed?
  • Determiner
  • Word order
  • Optional case marker
  • Optional verb agreement
  • How is specificity expressed?
  • How are generics expressed?
  • How are predicate nominals marked?
  • NO
  • How are English determiners translated?
  • The boy cried.
  • The lion is a fierce beast.
  • I ate a sandwich.
  • He is a soldier.
  • Il est soldat.

25
Argument Roles
  • Actor
  • Roughly, deep subject
  • Undergoer
  • Roughly, deep object
  • Predicate and predicatee
  • The woman is the manager.
  • Recipient
  • I gave a book to the students.
  • Beneficiary
  • I made a phone call for Sam.

26
Why not subject and object?
  • Languages use their voice systems for different
    purposes.
  • Mapudungun obligatorily uses an inverse marked
    verb when third person acts on first or second
    person.
  • Verb agrees with undergoer
  • Undergoer exhibits other subjecthood properties
  • Actor may be object.
  • Yes: How are actor and undergoer encoded in
    combination with other semantic features like
    adversity (Japanese) and person (Mapudungun)?
  • No: How is English voice translated into another
    language?

27
Argument Roles
  • Accompaniment
  • With someone
  • With pleasure
  • Material
  • (out) of wood
  • About 20 more roles
  • From the Lingua checklist (Comrie & Smith 1977)
  • Many also found in tectogrammatical
    representations
  • Around 80 locative relations
  • From Lingua checklist
  • Many temporal relations

28
Noun Phrase Features
  • Person
  • Number
  • Biological gender
  • Animacy
  • Distance (for deictics)
  • Identifiability
  • Specificity
  • Possession
  • Other semantic roles
  • Accompaniment, material, location, time, etc.
  • Type
  • Proper, common, pronoun
  • Cardinals
  • Ordinals
  • Quantifiers
  • Given and new information
  • Not used yet because of limited context in the
    elicitation tool.

29
Clause level features
  • Tense
  • Aspect
  • Lexical, grammatical, phase
  • Type
  • Declarative, open-q, yes-no-q
  • Function
  • Main, argument, adjunct, relative
  • Source
  • Hearsay, first-hand, sensory, assumed
  • Assertedness
  • Asserted, presupposed, wanted
  • Modality
  • Permission, obligation
  • Internal, external

30
Other clause types (Constructions)
  • Causative
  • Make/let/have someone do something
  • Predication
  • May be expressed with or without an overt copula.
  • Existential
  • There is a problem.
  • Impersonal
  • One doesn't smoke in restaurants in the US.
  • Lament
  • If only I had read the paper.
  • Conditional
  • Comparative
  • Etc.

32
Tools for RTB Creation
  • Change the inventory of grammatical meanings
  • Make new RTBs for other purposes

33
The Process
  • Feature Specification (Tense & Aspect, Clause-Level,
    Noun-Phrase, Modality): list of semantic features
    and values
  • Feature Maps: which combinations of features and
    values are of interest
  • Feature Structure Sets
  • Reverse Annotated Feature Structure Sets: add
    English sentences
  • The Corpus
  • Sampling
  • Smaller Corpus

34
Feature Specification
  • XML Schema
  • XSLT Script
  • Human readable form
  • Feature: Causer intentionality
  • Values: intentional, unintentional
  • Feature: Causee control
  • Values: in control, not in control
  • Feature: Causee volitionality
  • Values: willing, unwilling
  • Feature: Causation type
  • Values: direct, indirect

35
Feature Combination
  • Person and number interact with tense in many
    fusional languages.
  • In English, tense interacts with questions
  • Will you go?

36
Feature Combination Template
  • ((predicatee
  • ((np-general-type pronoun-type common-noun-type)
  • (np-person person-first person-second
    person-third)
  • (np-number num-sg num-pl)
  • (np-biological-gender bio-gender-male
    bio-gender-female)))
  • (predicate ((np-general-type common-noun-type)
  • (np-person person-third)))
  • (c-copula-type role)
  • (predicate ((adj-general-type quality-type)
  • (c-copula-type attributive)))
  • (predicate ((np-general-type common-noun-type)
  • (np-person person-third)
  • (c-copula-type identity)))
  • (c-secondary-type secondary-copula) (c-polarity
    all)
  • (c-general-type declarative)
  • (c-speech-act sp-act-state)
  • (c-v-grammatical-aspect gram-aspect-neutral)
  • (c-v-lexical-aspect state)
  • (c-v-absolute-tense past present future)
  • (c-v-phase-aspect durative))

Summarizes 288 feature structures, which are
automatically generated.
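
A template like the one above can be expanded by taking every combination of the listed values. The sketch below is one way to do that, not the project's actual generator. It rests on three assumptions that are not stated on the slide: that `c-polarity all` stands for two values (the name `polarity-negative` is a guess), that the three predicate blocks correspond to three copula types, and that common nouns are always third person. Under those assumptions the count comes out to 288, which agrees with the figure quoted above.

```python
from itertools import product

# Alternatives listed for the predicatee noun phrase in the template.
predicatee = {
    "np-general-type": ["pronoun-type", "common-noun-type"],
    "np-person": ["person-first", "person-second", "person-third"],
    "np-number": ["num-sg", "num-pl"],
    "np-biological-gender": ["bio-gender-male", "bio-gender-female"],
}

# Clause-level features that vary; single-valued features are left out.
clause = {
    "c-copula-type": ["role", "attributive", "identity"],
    # Assumption: "all" means positive and negative polarity.
    "c-polarity": ["polarity-positive", "polarity-negative"],
    "c-v-absolute-tense": ["past", "present", "future"],
}


def expand(alternatives):
    """Yield one dict per combination of the listed values."""
    names = list(alternatives)
    for values in product(*(alternatives[name] for name in names)):
        yield dict(zip(names, values))


def allowed(np):
    """Assumed constraint: only pronouns can be first or second person."""
    return (np["np-general-type"] == "pronoun-type"
            or np["np-person"] == "person-third")


structures = [
    {"predicatee": np, **cl}
    for np in expand(predicatee)
    if allowed(np)
    for cl in expand(clause)
]
print(len(structures))
```

The arithmetic is 16 permitted noun phrases (12 pronouns plus 4 third-person common nouns) times 3 copula types, 2 polarities and 3 tenses.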
37
Annotation Tool
  • Feature structure viewer
  • Various views of the feature structure
  • Omit features whose value is not-applicable
  • Group related features together
  • Aspect
  • causation
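
The two viewer behaviours listed above, hiding not-applicable values and grouping related features, can be sketched as follows. The feature names come from the example feature structure earlier in the talk; the grouping rule (matching a keyword inside the feature name) and the function names are assumptions for illustration, not a description of the actual annotation tool.

```python
# Assumed grouping rule: a feature belongs to a group if its name
# contains the group's keyword.
GROUP_KEYWORDS = {"aspect": "aspect", "causation": "caus"}


def visible_features(pairs):
    """Drop features whose value is marked not-applicable ('...n/a')."""
    return [(name, value) for name, value in pairs if not value.endswith("n/a")]


def group_features(pairs):
    """Collect features under 'aspect', 'causation' or 'other'."""
    grouped = {}
    for name, value in pairs:
        label = next(
            (group for group, key in GROUP_KEYWORDS.items() if key in name),
            "other",
        )
        grouped.setdefault(label, []).append((name, value))
    return grouped


sample = [
    ("c-polarity", "polarity-positive"),
    ("c-causee-control", "control-n/a"),
    ("c-v-lexical-aspect", "activity-accomplishment"),
    ("c-v-phase-aspect", "phase-aspect-neutral"),
    ("c-copula-type", "copula-n/a"),
]
print(group_features(visible_features(sample)))
```

Filtering before grouping means a group disappears from the view entirely when all of its features are not-applicable, as happens to the causation group in this sample.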

39
Evaluation
  • Current funding has not covered evaluation of the
    RTB.
  • Except for informal observations as it was
    translated into several languages.
  • Does it elicit the meanings it was intended to
    elicit?
  • Informal observation usually
  • Is it useful for machine translation?

40
Hard Problems
  • Reverse annotating meanings that are not
    grammaticalized in English.
  • Evidentiality
  • He stole the bread.
  • Context: Translate this as if you do not have
    first-hand knowledge. In English, we might say,
    "They say that he stole the bread" or "I hear
    that he stole the bread."

41
Hard Problems
  • Reverse annotating things that can be said in
    several ways in English.
  • Impersonals
  • One doesn't smoke here.
  • You don't smoke here.
  • They don't smoke here.
  • Credit cards aren't accepted.
  • Problem in the Reflex corpus because space was
    limited.

42
Navigation
  • Currently, feature combinations are specified by
    a human.
  • Plan to work in active learning mode.
  • Build seed RTB
  • Translate some data
  • Do some learning
  • Identify most valuable pieces of information to
    get next
  • Generate an RTB for those pieces of information
  • Translate more
  • Learn more
  • Generate more, etc.