Title: An Overview of the AVENUE Project
1. An Overview of the AVENUE Project
- Presented by
- Lori Levin
- Language Technologies Institute
- School of Computer Science
- Carnegie Mellon University
- Pittsburgh, PA USA
2. AVENUE Project
- Dr. Jaime Carbonell, PI
- Dr. Alon Lavie, Co-PI
- Dr. Lori Levin, Co-PI
- Dr. Robert Frederking
- Dr. Ralf Brown
- Dr. Rodolfo Vega
- Mapudungun
- Dr. Eliseo Cañulef
- Rosendo Huisca
- and others
- Erik Peterson
- Christian Monson
- Ariadna Font Llitjós
- Alison Alvarez
- Roberto Aranovich
- Dr. Jeff Good
- Dr. Katharina Probst
- Hebrew
- Dr. Shuly Wintner
- student
This research was funded in part by NSF grant
number IIS-0121-631.
3. MT Approaches
[Vauquois-triangle diagram: Direct approaches (SMT, EBMT) map the source "Mi chiamo Lori" straight to the target "My name is Lori"; Transfer Rules operate on the output of syntactic parsing (Pronoun-acc-1-sg chiamare-1sg N -> np poss-1sg name BE-pres N); Interlingua approaches go up through semantic analysis to an interlingua (introduce-self) and back down through sentence planning and text generation. AVENUE automates rule learning at the transfer level.]
4. Approaches to MT
- Direct
- Works best with large parallel corpora
- Millions of words
- Can be done without linguistic resources
- Interlingua
- Useful when you are translating between more than two languages
- Requires linguistic knowledge
- Transfer
- Requires linguistic knowledge
5. Useful Resources for MT
- Parallel corpus
- Monolingual corpus
- Lexicon
- Morphological Analyzer (lemmatizer)
- Human Linguist
- Human non-linguist
6. Low-Resource Situations
- Indigenous languages
- May lack large corpora
- May lack a computational linguist
- Strategic Languages
- Aside from standard written Arabic and Chinese
- Resource-rich languages, limited domain
- Most of the large parallel corpora are newspaper text, parliamentary proceedings, or broadcast news
- Fewer resources for conversation related to humanitarian aid
7. Why Machine Translation for Languages with Limited Resources?
- We are in the age of the information explosion
- The internet, the web, and Google: anyone can get the information they want, anytime
- But what about the text in all those other languages?
- How do they read all this English stuff?
- How do we read all the stuff that they put online?
- MT for these languages would enable:
- Better government access to native, indigenous, and minority communities
- Better minority and native community participation in information-rich activities (health care, education, government) without giving up their languages
- Civilian and military applications (disaster relief)
- Language preservation
8. Mixed-Resource Situations
- Some resources are available and others aren't.
9. Omnivorous MT
- Eat whatever resources are available
- Eat large or small amounts of data
10. AVENUE's Inventory
- Resources
- Parallel corpus
- Monolingual corpus
- Lexicon
- Morphological Analyzer (lemmatizer)
- Human Linguist
- Human non-linguist
- Techniques
- Rule based transfer system
- Example Based MT
- Morphology Learning
- Rule Learning
- Interactive Rule Refinement
- Multi-Engine MT
11. The AVENUE Low-Resource Scenario
[System diagram: the Elicitation Tool, applied to an Elicitation Corpus, yields a Word-Aligned Parallel Corpus; Morphology and Rule Learning modules build Lexical Resources and transfer rules; the Run-Time Transfer System and Decoder turn INPUT TEXT into OUTPUT TEXT; the Translation Correction Tool and Rule Refinement Module feed user corrections back into the rules.]
12-14. The AVENUE Low-Resource Scenario (the same system diagram, repeated across animation builds)
15. AVENUE
- Rules can be written by hand or learned automatically
- Hybrid:
- Rule-based transfer
- Statistical decoder
- Multi-engine combinations with SMT and EBMT
16. AVENUE Systems (small and experimental, but tested on unseen data)
- Hebrew-to-English
- Alon Lavie, Shuly Wintner, Katharina Probst
- Hand-written and automatically learned rules
- Automatic rules trained on 120 sentences perform slightly better than about 20 hand-written rules
- Hindi-to-English
- Lavie, Peterson, Probst, Levin, Font, Cohen, Monson
- Automatically learned rules
- Performs better than SMT when training data is limited to 50K words
17. AVENUE Systems (small and experimental, but tested on unseen data)
- English-to-Spanish
- Ariadna Font Llitjos
- Hand-written, automatically corrected
- Mapudungun-to-Spanish
- Roberto Aranovich and Christian Monson
- Hand-written
- Dutch-to-English
- Simon Zwarts
- Hand-written
18. The AVENUE Low-Resource Scenario (diagram repeated)
19. Elicitation
- Get data from someone who is
- Bilingual
- Literate
- With consistent spelling
- Not experienced with linguistics
20. English-Hindi Example
[Screenshot of the Elicitation Tool, by Erik Peterson]
21. English-Chinese Example
Note: the translator has to insert spaces between words in Chinese.
22. English-Arabic Example
23. Purpose of Elicitation
- srcsent Tú caíste
- tgtsent eymi ütrünagimi
- aligned ((1,1),(2,2))
- context tú = Juan (masculino, 2a persona del singular)
- comment You (John) fell

- srcsent Tú estás cayendo
- tgtsent eymi petu ütünagimi
- aligned ((1,1),(2 3,2 3))
- context tú = Juan (masculino, 2a persona del singular)
- comment You (John) are falling

- srcsent Tú caíste
- tgtsent eymi ütrunagimi
- aligned ((1,1),(2,2))
- context tú = María (femenino, 2a persona del singular)
- comment You (Mary) fell

- Provide a small but highly targeted corpus of hand-aligned data
- To support machine learning from a small data set
- To discover basic word order
- To discover how syntactic dependencies are expressed
- To discover which grammatical meanings are reflected in the morphology or syntax of the language
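Records in this field-per-line format are easy to process mechanically. A minimal sketch of loading one record (the parsing code is mine, not AVENUE's; the block-alignment notation "(2 3,2 3)" is expanded to a cross product as one plausible reading):

```python
# Hypothetical loader for one elicitation record of the form shown above.
import re

def parse_record(text):
    rec = {}
    for line in text.strip().splitlines():
        field, _, value = line.partition(" ")  # e.g. "srcsent Tú caíste"
        rec[field] = value.strip()
    # Expand "((1,1),(2 3,2 3))" into (source, target) index pairs;
    # a group like "(2 3,2 3)" is read as a many-to-many block.
    pairs = []
    for src, tgt in re.findall(r"\((\d+(?: \d+)*),(\d+(?: \d+)*)\)", rec["aligned"]):
        for s in src.split():
            for t in tgt.split():
                pairs.append((int(s), int(t)))
    rec["aligned"] = pairs
    return rec
```

The structured output can then feed the rule learner directly.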
24. Languages
- The set of feature structures with English sentences has been delivered to the Linguistic Data Consortium as part of the Reflex program
- Translated (by LDC) into:
- Thai
- Bengali
- Plans to translate into:
- Seven strategic languages per year for five years
- As one small part of a language pack (BLARK) for each language
25. Languages
- Spanish version in progress at New Mexico State University (Helmreich and Cowie)
- Plans to translate into Guarani
- Portuguese version in progress in Brazil (Marcello Modesto)
- Plans to translate into Karitiana (200 speakers)
- Plans to translate into Inupiaq (Kaplan and MacLean)
26. Previous Elicitation Work
- Pilot corpus
- Around 900 sentences
- No feature structures
- Mapudungun
- Two partial translations
- Quechua
- Three translations
- Aymara
- Seven translations
- Hebrew
- Hindi
- Several translations
- Dutch
27. The AVENUE Low-Resource Scenario (diagram repeated)
28. AVENUE Machine Translation System

SL: the old man; TL: ha-ish ha-zaqen

NP::NP [DET ADJ N] -> [DET N DET ADJ]
(
  (X1::Y1) (X1::Y3) (X2::Y4) (X3::Y2)
  ((X1 AGR) = 3-SING)
  ((X1 DEF) = DEF)
  ((X3 AGR) = 3-SING)
  ((X3 COUNT) = +)
  ((Y1 DEF) = DEF)
  ((Y3 DEF) = DEF)
  ((Y2 AGR) = 3-SING)
  ((Y2 GENDER) = (Y4 GENDER))
)

- Type information
- Synchronous context-free rules
- Alignments
- x-side constraints
- y-side constraints
- xy-constraints, e.g. ((Y1 AGR) = (X1 AGR))

Jaime Carbonell (PI), Alon Lavie (Co-PI), Lori Levin (Co-PI); rule learning: Katharina Probst
29. Rule Learning: Overview
- Goal: acquire syntactic transfer rules
- Use available knowledge from the major-language side (grammatical structure)
- Three steps:
- Flat Seed Generation: first guesses at transfer rules; flat syntactic structure
- Compositionality Learning: use previously learned rules to learn hierarchical structure
- Constraint Learning: refine rules by learning appropriate feature constraints
30. Flat Seed Rule Generation
31. Flat Seed Rule Generation
- Create a flat transfer rule specific to the sentence pair, partially abstracted to POS
- Words that are aligned word-to-word and have the same POS in both languages are generalized to their POS
- Words that have complex alignments (or not the same POS) remain lexicalized
- One seed rule for each translation example
- No feature constraints associated with seed rules (but mark the example(s) from which each was learned)
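The abstraction step above can be sketched as follows (the data representation and example are mine, not the AVENUE implementation):

```python
# Sketch of flat seed rule generation: words with a one-to-one
# alignment and matching POS are abstracted to their POS tag;
# everything else stays lexicalized.
def flat_seed_rule(src, tgt, alignment):
    """src/tgt: lists of (word, pos); alignment: (i, j) pairs, 1-based."""
    src_counts, tgt_counts = {}, {}
    for i, j in alignment:
        src_counts[i] = src_counts.get(i, 0) + 1
        tgt_counts[j] = tgt_counts.get(j, 0) + 1
    x_side = [w for w, _ in src]
    y_side = [w for w, _ in tgt]
    for i, j in alignment:
        _, pos_s = src[i - 1]
        _, pos_t = tgt[j - 1]
        # one-to-one alignment and same POS => generalize to POS
        if src_counts[i] == 1 and tgt_counts[j] == 1 and pos_s == pos_t:
            x_side[i - 1] = pos_s
            y_side[j - 1] = pos_t
    return {"x": x_side, "y": y_side, "align": alignment}
```

For "a nice house" / "una casa bonita" this yields the flat rule [DET ADJ N] -> [DET N ADJ] with the crossing alignment preserved.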
32. Compositionality Learning
33. Compositionality Learning
- Detection: traverse the c-structure of the English sentence, adding compositional structure for translatable chunks
- Generalization: adjust constituent sequences and alignments
- Two implemented variants:
- Safe Compositionality: there exists a transfer rule that correctly translates the sub-constituent
- Maximal Compositionality: generalize the rule if supported by the alignments, even in the absence of an existing transfer rule for the sub-constituent
34. Constraint Learning
35. Constraint Learning
- Goal: add appropriate feature constraints to the acquired rules
- Methodology:
- Preserve general structural transfer
- Learn specific feature constraints from the example set
- Seed rules are grouped into clusters of similar transfer structure (type, constituent sequences, alignments)
- Each cluster forms a version space: a partially ordered hypothesis space with a specific and a general boundary
- The seed rules in a group form the specific boundary of a version space
- The general boundary is the (implicit) transfer rule with the same type, constituent sequences, and alignments, but no feature constraints
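One elementary move in such a version space can be sketched as follows (the constraint representation is invented for illustration; the actual Constraint Learning search is more involved): generalize a cluster by keeping only the feature constraints that every seed rule shares, stepping from the specific boundary toward the general boundary.

```python
# Sketch: each seed rule carries a set of (feature-path, value)
# constraints; dropping any constraint not shared by all seeds is
# one generalization step in the version space.
def generalize(seed_constraints):
    """seed_constraints: list of sets of (path, value) constraints."""
    shared = set(seed_constraints[0])
    for constraints in seed_constraints[1:]:
        shared &= set(constraints)
    return shared

# Two hypothetical seeds that agree on agreement but not on gender:
seeds = [
    {(("Y2", "AGR"), "3-SING"), (("Y2", "GENDER"), "F")},
    {(("Y2", "AGR"), "3-SING"), (("Y2", "GENDER"), "M")},
]
```

Here only the AGR constraint survives, so the learned rule keeps agreement but does not over-commit to a gender value.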
36. Transfer and Decoding
[Low-resource scenario diagram, repeated]
37. The Transfer Engine
38. Symbolic Decoder
- The system rarely finds a full parse/transfer for the complete input sentence
- The XFER engine produces a comprehensive lattice of segment translations
- The decoder selects the best combination of translation segments
- Searches for the optimal-scoring path of partial translations, based on multiple features:
- Target language model scores
- XFER rule scores
- Path fragmentation
- Other features
- Symbolic decoding is essential for scenarios where there is insufficient data for training a large target LM
- Effective rule scoring is crucial
39. The AVENUE Low-Resource Scenario (diagram repeated)
40. Rule Refinement
[Low-resource scenario diagram, repeated]
41. Interactive and Automatic Refinement of Translation Rules
- Problem: improve machine translation quality
- Proposed solution: put bilingual speakers back into the loop; use their corrections to detect the source of an error and automatically improve the lexicon and the grammar
- Approach: automate post-editing efforts by feeding them back into the MT system
- Automatic refinement of the translation rules that caused an error goes beyond post-editing
- Goal: improve MT coverage and overall quality
42. Technical Challenges
- Automatic evaluation of the refinement process
- Eliciting minimal MT information from non-expert users
43. Error Typology for Automatic Rule Refinement (simplified)
- Missing word
- Extra word
- Wrong word order
- Incorrect word
- Wrong agreement
44. TCTool (Demo)
Interactive elicitation of error information. Actions:
- Add a word
- Delete a word
- Modify a word
- Change word order
45. Types of Refinement Operations (Automatic Rule Adaptation)
- 1. Refine a translation rule: R0 -> R1 (change R0 to make it more specific or more general)
- Example: R0 produces "una casa bonito" for "a nice house"; R1 adds the constraint N gender = ADJ gender and produces "una casa bonita".
46. Types of Refinement Operations (Automatic Rule Adaptation)
- 2. Bifurcate a translation rule: R0 -> R0 (keep the same, general rule) plus R1 (add a new, more specific rule)
- Example: R0 still produces "una casa bonita" for "a nice house"; the new R1, with ADJ type = pre-nominal, produces "un gran artista" for "a great artist".
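The two operations can be sketched on a toy grammar (the rule representation is hypothetical, not the AVENUE rule format): refining edits a rule in place, while bifurcating keeps the general rule and adds a more constrained copy.

```python
# Sketch of the two refinement operations on a toy rule table.
def refine(grammar, rule_id, new_constraints):
    """R0 -> R1: replace the rule's constraints in place."""
    grammar[rule_id]["constraints"] = new_constraints
    return grammar

def bifurcate(grammar, rule_id, extra_constraint):
    """R0 -> R0 + R1: keep R0, add a more specific copy."""
    specific = dict(grammar[rule_id])
    specific["constraints"] = grammar[rule_id]["constraints"] + [extra_constraint]
    grammar[rule_id + "'"] = specific
    return grammar

grammar = {"R0": {"pattern": "DET ADJ N", "constraints": []}}
```

Bifurcation preserves coverage (the general rule still fires) while letting the specific rule win where its extra constraint holds.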
47. Automatic Rule Adaptation: A Concrete Example
- Error information elicitation:
- SL: Gaudí was a great artist
- MT system output (TL): Gaudí era un artista grande
- User correction: Gaudí era un artista grande -> Gaudí era un gran artista
- The correction and the clue word feed the refinement operation typology (here: change word order).
48. Mapudungun
- Indigenous Language of Chile and Argentina
- 1 Million Mapuche Speakers
49. Mapudungun Language
- 900,000 Mapuche people
- At least 300,000 speakers of Mapudungun
- Polysynthetic
- sl: pe-rke-fi-ñ Maria
- gloss: ver(see)-REPORT-3pO-1pSgS/IND
- tl: DICEN QUE LA VI A MARÍA
- "(They say that) I saw Maria."
50. AVENUE Mapudungun
- Joint project between Carnegie Mellon University,
the Chilean Ministry of Education, and
Universidad de la Frontera.
51. Mapudungun-to-Spanish Resources
- Initially:
- Large team of native speakers at the Universidad de la Frontera, Temuco, Chile
- Some knowledge of linguistics
- No knowledge of computational linguistics
- No corpus
- A few short word lists
- No morphological analyzer
- Later: computational linguists with non-native knowledge of Mapudungun
- Other considerations:
- Produce something that is useful to the community, especially for bilingual education
- Experimental MT systems are not useful
52. Mapudungun
[Low-resource scenario diagram, annotated for Mapudungun: a corpus of 170 hours of spoken Mapudungun, Example-Based MT, a spelling checker, and Spanish morphology from UPC, Barcelona.]
53. Mapudungun Products
- http://www.lenguasamerindias.org/
- Click "traductor mapudungún"
- Dictionary lookup (Mapudungun to Spanish)
- Morphological analysis
- Example Based MT (Mapudungun to Spanish)
54. I Didn't See Maria
[Parallel parse trees: Mapudungun "pe-la-fi-ñ Maria" (V pe, VSuff la, VSuff fi, VSuff ñ, N Maria) and Spanish "no vi a María" (no, V vi, a, N María).]
55. Transfer to Spanish: Top-Down
[The same pair of parse trees, with the VP-level transfer rule that inserts the Spanish object marker "a":]

VP::VP [VBar NP] -> [VBar "a" NP]
(
  (X1::Y1) (X2::Y3)
  ((X2 type) = (NOT personal))
  ((X2 human) =c +)
  (X0 = X1)
  ((X0 object) = X2)
  (Y0 = X0)
  ((Y0 object) = (X0 object))
  (Y1 = Y0)
  (Y3 = (Y0 object))
  ((Y1 objmarker person) = (Y3 person))
  ((Y1 objmarker number) = (Y3 number))
  ((Y1 objmarker gender) = (Y3 gender))
)
56. Mapudungun
- Indigenous Language of Chile and Argentina
- 1 Million Mapuche Speakers
57. Collaboration
- Mapuche language experts: Eliseo Cañulef, Rosendo Huisca, Hugo Carrasco, Hector Painequeo, Flor Caniupil, Luis Caniupil Huaiquiñir, Marcela Collio Calfunao, Cristian Carrillan Anton, Salvador Cañulef
- Universidad de la Frontera (UFRO)
- Instituto de Estudios Indígenas (IEI) / Institute for Indigenous Studies
- Chilean funding:
- Chilean Ministry of Education (Mineduc)
- Bilingual and Multicultural Education Program: Carolina Huenchullan Arrúe, Claudio Millacura Salas
58. Accomplishments
- Corpora Collection
- Spoken Corpus
- Collected Luis Caniupil Huaiquiñir
- Medical Domain
- 3 of 4 Mapudungun Dialects
- 120 hours of Nguluche
- 30 hours of Lafkenche
- 20 hours of Pwenche
- Transcribed in Mapudungun
- Translated into Spanish
- Written Corpus
- 200,000 words
- Bilingual Mapudungun-Spanish
- Historical and newspaper text
nmlch-nmjm1_x_0405_nmjm_00
M: <SPA> no pütokovilu kay ko
C: no, si me lo tomaba con agua
M: chumgechi pütokoki femuechi pütokon pu <Noise>
C: como se debe tomar, me lo tomé pués
nmlch-nmjm1_x_0406_nmlch_00
M: Chengewerkelafuymiürke
C: ¡Ya no estabas como gente entonces!
59. Accomplishments
- Developed At UFRO
- Bilingual Dictionary with Examples
- 1,926 entries
- Spelling Corrected Mapudungun Word List
- 117,003 fully-inflected word forms
- Segmented Word List
- 15,120 forms
- Stems translated into Spanish
60. Accomplishments
- Developed at LTI using Mapudungun language resources from UFRO:
- Spelling Checker
- Integrated into OpenOffice
- Hand-built Morphological Analyzer
- Prototype Machine Translation Systems
- Rule-Based
- Example-Based
- Website LenguasAmerindias.org
61. AVENUE Hebrew
- Joint project of Carnegie Mellon University and
University of Haifa
62. Hebrew Language
- Native language of about 3-4 million people in Israel
- Semitic language, closely related to Arabic and with similar linguistic properties
- Root+Pattern word-formation system
- Rich verb and noun morphology
- Particles attach as prefixes to the following word: definite article (H), prepositions (B, K, L, M), coordinating conjunction (W), relativizers ($, K)
- Unique alphabet and writing system
- 22 letters represent (mostly) consonants
- Vowels represented (mostly) by diacritics
- Modern texts omit the diacritic vowels, adding a level of ambiguity: bare word -> word
- Example: MHGR -> mehager, mhagar, mhger
63. Hebrew Resources
- Morphological analyzer developed at the Technion
- Constructed our own Hebrew-to-English lexicon, based primarily on the existing Dahan H-to-E and E-to-H dictionary
- Human computational linguists
- Native speakers
64. Hebrew (low-resource scenario diagram, repeated)
65. Flat Seed Rule Generation
66. Compositionality Learning
67. Constraint Learning
68. Challenges for Hebrew MT
- Paucity of existing language resources for Hebrew:
- No publicly available broad-coverage morphological analyzer
- No publicly available bilingual lexicons or dictionaries
- No POS-tagged corpus or parse treebank for Hebrew
- No large Hebrew/English parallel corpus
- A scenario well suited to the CMU transfer-based MT framework for languages with limited resources
69. Hebrew Morphology Example
- Input word: BWRH
- Character positions: 0 1 2 3 4
- Possible segmentations: BWRH | B + WR + H | B + H + WRH
70. Hebrew Morphology Example
Lattice arcs for BWRH (one feature structure per analysis):

Y0: ((SPANSTART 0) (SPANEND 4) (LEX BWRH) (POS N) (GEN F) (NUM S) (STATUS ABSOLUTE))
Y1: ((SPANSTART 0) (SPANEND 2) (LEX B) (POS PREP))
Y2: ((SPANSTART 1) (SPANEND 3) (LEX WR) (POS N) (GEN M) (NUM S) (STATUS ABSOLUTE))
Y3: ((SPANSTART 3) (SPANEND 4) (LEX LH) (POS POSS))
Y4: ((SPANSTART 0) (SPANEND 1) (LEX B) (POS PREP))
Y5: ((SPANSTART 1) (SPANEND 2) (LEX H) (POS DET))
Y6: ((SPANSTART 2) (SPANEND 4) (LEX WRH) (POS N) (GEN F) (NUM S))
Y7: ((SPANSTART 0) (SPANEND 4) (LEX BWRH) (POS LEX))
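The analyses form a lattice over character positions. A small sketch (arc spans taken from the feature structures above; the enumeration code is mine, not the AVENUE engine) that lists every complete segmentation of the span 0-4:

```python
# Morphology lattice as arcs (start, end, lexeme, POS), keyed by name.
arcs = {
    "Y0": (0, 4, "BWRH", "N"),
    "Y1": (0, 2, "B", "PREP"),
    "Y2": (1, 3, "WR", "N"),
    "Y3": (3, 4, "LH", "POSS"),
    "Y4": (0, 1, "B", "PREP"),
    "Y5": (1, 2, "H", "DET"),
    "Y6": (2, 4, "WRH", "N"),
    "Y7": (0, 4, "BWRH", "LEX"),
}

def segmentations(lattice, start=0, end=4):
    """Enumerate arc sequences that exactly tile the span [start, end)."""
    if start == end:
        return [[]]
    paths = []
    for name, (s, e, _, _) in lattice.items():
        if s == start:
            for rest in segmentations(lattice, e, end):
                paths.append([name] + rest)
    return paths
```

The transfer engine then works over all of these competing analyses at once rather than committing to one up front.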
71. Sample Output (dev-data)
- maxwell anurpung comes from ghana for israel four
years ago and since worked in cleaning in hotels
in eilat - a few weeks ago announced if management club
hotel that for him to leave israel according to
the government instructions and immigration
police - in a letter in broken english which spread among
the foreign workers thanks to them hotel for
their hard work and announced that will purchase
for hm flight tickets for their countries from
their money
72. Quechua-to-Spanish MT
- V-Unit-funded summer project in Cusco (Peru)
- June-August 2005; preparations and data collection started earlier
- Intensive Quechua course at the Centro Bartolome de las Casas (CBC)
- Worked with two native and one non-native Quechua speakers on developing infrastructure (correcting elicited translations, segmenting and translating a list of the most frequent words)
73. Quechua-to-Spanish Prototype MT System
- Stem lexicon (semi-automatically generated): 753 lexical entries
- Suffix lexicon: 21 suffixes (out of about 150 in Cusihuaman)
- Quechua morphology analyzer
- 25 translation rules
- Spanish morphology generation module
- User studies: 10 sentences, 3 users (2 native, 1 non-native)
74. Quechua Facts
- Agglutinative language
- A stem often takes 10 to 12 suffixes, and can take up to 28
- Supposedly clear-cut morpheme boundaries, but in reality several suffixes change when followed by certain other suffixes
- No irregular verbs, nouns, or adjectives
- Does not mark gender
- No adjective agreement
- No definite or indefinite articles (topic and focus markers perform a task similar to that of articles and intonation in English or Spanish)
75. Quechua Examples
- takini (also written takiniy)
- sing + 1sg ("I sing") -> canto
- takishani (takishaniy)
- sing + progressive + 1sg ("I am singing") -> estoy cantando
- takipakuqchu?
- taki: sing
- -paku: to join a group to do something
- -q: agentive
- -chu: interrogative
- -> ¿(para) cantar con la gente (del pueblo)? ("(to) sing with the people (of the village)?")
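The suffix chains in these examples can be peeled off mechanically. A sketch (tiny toy lexicon from the glosses above; greedy longest-match is my simplification, since the real analyzer must also handle the suffix alternations mentioned earlier):

```python
# Illustrative greedy segmenter for the example verbs, not the
# AVENUE Quechua analyzer.
STEMS = {"taki": "sing"}
SUFFIXES = {"paku": "joint-action", "sha": "progressive",
            "chu": "interrogative", "ni": "1sg", "q": "agentive"}

def segment(word):
    """Return (stem, [suffixes]) or None if the word cannot be analyzed."""
    for stem in STEMS:
        if word.startswith(stem):
            rest, morphs = word[len(stem):], []
            while rest:
                for suf in sorted(SUFFIXES, key=len, reverse=True):
                    if rest.startswith(suf):
                        morphs.append(suf)
                        rest = rest[len(suf):]
                        break
                else:
                    return None  # unanalyzable residue
            return stem, morphs
    return None
```

For example, segment("takipakuqchu") recovers the stem plus the three suffixes glossed above.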
76. Quechua Resources
- A few native speakers, not linguists
- A computational linguist learning Quechua
- Two fluent, but non-native linguists
77. Quechua
[Low-resource scenario diagram, annotated for Quechua: the parallel corpus was obtained by OCR with manual correction.]
78. Grammar Rules

takishani -> estoy cantando (I am singing)

{VBar,3}
VBar::VBar [V VSuff VSuff] -> [V V]
(
  (X1::Y2)
  ((x0 person) = (x3 person))
  ((x0 number) = (x3 number))
  ((x2 mood) =c ger)
  ((y2 mood) = (x2 mood))
  ((y1 form) =c estar)
  ((y1 person) = (x3 person))
  ((y1 number) = (x3 number))
  ((y1 tense) = (x3 tense))
  ((x0 tense) = (x3 tense))
  ((y1 mood) = (x3 mood))
  ((x3 inflected) =c +)
  ((x0 inflected) = +)
)

Spanish morphology generation:
- [lex = cantar, mood = ger] -> cantando
- [lex = estar, person = 1, number = sg, tense = pres, mood = ind] -> estoy
79. Hindi Resources
- Large statistical lexicon from the Linguistic Data Consortium (LDC)
- Parallel corpus from LDC
- Morphological analyzer-generator from LDC
- Lots of native speakers
- Computational linguists with little or no knowledge of Hindi
- Experimented with the size of the parallel corpus: miserly and large scenarios
80. Hindi
[Low-resource scenario diagram, annotated for Hindi: a parallel corpus feeding EBMT and SMT; an elicitation corpus of 15,000 noun phrases from the Penn TreeBank. Supported by DARPA TIDES.]
81. Manual Transfer Rules: Example
[Tree diagrams: Hindi NP "jIvana ke eka aXyAya" and English NP "one chapter of life"]

NP1 ke NP2 -> NP2 of NP1
Ex: jIvana ke eka aXyAya
    life of (one) chapter
    -> a chapter of life

{NP,12}
NP::NP [PP NP1] -> [NP1 PP]
(
  (X1::Y2) (X2::Y1)
  ((x2 lexwx) = 'kA')
)

{NP,13}
NP::NP [NP1] -> [NP1]
(
  (X1::Y1)
)

{PP,12}
PP::PP [NP Postp] -> [Prep NP]
(
  (X1::Y2) (X2::Y1)
)
82. Hindi-English
- Very miserly training data
- Seven combinations of components
- Strong decoder allows re-ordering
- Three automatic scoring metrics