Title: AMTEXT: Extraction-based MT for Arabic
1 AMTEXT: Extraction-based MT for Arabic
- Alon Lavie, Jaime Carbonell
- Language Technologies Institute
- Carnegie Mellon University
- Email: {alavie,jgc}@cs.cmu.edu
- Project Members: Laura Kieras, Peter Jansen
- Informant: Loubna El Abadi
2 Objective
- Develop a framework for high-accuracy MT of extracted entities, objects and their relationships, which is:
  - Rapidly portable and adaptable to new source languages
  - Easily expandable to new types of entities and relationships
3 AMTEXT Approach
- Develop an elicitation corpus specifically designed for targeted extraction patterns
- Learn generalized transfer rules for targeted extraction patterns from the elicitation corpus
- Acquire a high-accuracy Named-Entity translation lexicon plus a limited translation lexicon for the targeted vocabulary
- Runtime: use a partial parser plus the transfer rules to translate only the matched portions of the SL text
4 Elicitation Example
5 Learning Transfer Rules
- Different notion of rule generalization than in our full XFER approach:
  - Generalize from examples to NEs that play specific roles in the target extraction pattern
  - Verbs and function words may not be generalized
- Example:
  Peres will meet with Bush today
  peres yipagesh im bush hayom
- Goal Rule:
  S::S [NE-P yipagesh im NE-P TE] -> [NE-P will meet with NE-P TE]
  ((X1::Y1) (X4::Y5) (X5::Y6))
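As an illustration of how the goal rule could be applied, here is a minimal Python sketch. It is not the XFER engine: the `TransferRule` class, the `apply_rule` function, and the tiny lexicon are hypothetical names introduced only for this example. It assumes the source tokens have already been tagged with NE-P/TE categories and that the alignments (X1::Y1 etc.) are 1-indexed as in the rule above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransferRule:
    """Hypothetical container for one extraction transfer rule."""
    source: tuple      # source-side slots: category names or literal words
    target: tuple      # target-side slots
    alignments: tuple  # (x, y) pairs meaning X_x :: Y_y, 1-indexed


# Slots that generalize over entities; verbs and function words stay literal.
CATEGORIES = {"NE-P", "TE"}

GOAL_RULE = TransferRule(
    source=("NE-P", "yipagesh", "im", "NE-P", "TE"),
    target=("NE-P", "will", "meet", "with", "NE-P", "TE"),
    alignments=((1, 1), (4, 5), (5, 6)),
)

# Illustrative lexicon for the aligned slots (an assumption, not project data).
LEXICON = {"peres": "Peres", "bush": "Bush", "hayom": "today"}


def apply_rule(rule, tagged_tokens, lexicon):
    """Return the target-side words, or None if the source side does not match.

    tagged_tokens is a list of (word, tag) pairs; tag is a category or None.
    """
    if len(tagged_tokens) != len(rule.source):
        return None
    for (word, tag), slot in zip(tagged_tokens, rule.source):
        if slot in CATEGORIES:
            if tag != slot:      # category slot: the tag must agree
                return None
        elif word != slot:       # literal slot: the word must be identical
            return None
    output = list(rule.target)
    for x, y in rule.alignments:
        word, _ = tagged_tokens[x - 1]
        # Fall back to the source word if the lexicon has no entry.
        output[y - 1] = lexicon.get(word, word)
    return output


tagged = [("peres", "NE-P"), ("yipagesh", None), ("im", None),
          ("bush", "NE-P"), ("hayom", "TE")]
print(" ".join(apply_rule(GOAL_RULE, tagged, LEXICON)))
```

Only the aligned slots are translated through the lexicon; the unaligned target words ("will", "meet", "with") come from the rule itself, which mirrors the point that verbs and function words are not generalized.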
6 Partial Parsing
- Input: Full text in the foreign language
- Output: Translation of extracted/matched text
- Goal: Extract by effectively matching transfer rules with the full text
  - Identify/parse NEs and words in restricted vocabulary
  - Identify transfer-rule (source-side) patterns
  - Handle expected high levels of ambiguity
- Example:
  Source: Peres, meluve b-sar ha-xuc shalom, yipagesh im bush hayom
  Labels in figure: NE-P, NE-P, NE-P, TE
  Output: Peres will meet with Bush today
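The matching step can be sketched in Python as follows. This is a simplified illustration, not the project's partial parser: it assumes a hypothetical `NE_LEXICON` for tagging, hard-codes the single rule from the previous slide, and uses a gap-tolerant subsequence match (at most `max_gap` skipped tokens between consecutive rule slots) as a stand-in for real partial parsing. All function names are invented for this example.

```python
import re

# Hypothetical lexicon: source word -> (category, English translation).
NE_LEXICON = {
    "peres": ("NE-P", "Peres"),
    "shalom": ("NE-P", "Shalom"),
    "bush": ("NE-P", "Bush"),
    "hayom": ("TE", "today"),
}

SOURCE = ("NE-P", "yipagesh", "im", "NE-P", "TE")
TARGET = ("NE-P", "will", "meet", "with", "NE-P", "TE")
ALIGN = ((1, 1), (4, 5), (5, 6))  # X_i :: Y_j, 1-indexed


def tokenize(text):
    """Lowercase and split on whitespace, dropping commas and periods."""
    return re.findall(r"[^\s,.]+", text.lower())


def slot_matches(slot, word):
    """Category slots match by lexicon tag; literal slots match by identity."""
    if slot in ("NE-P", "TE"):
        return NE_LEXICON.get(word, (None, None))[0] == slot
    return word == slot


def find_matches(tokens, max_gap=4):
    """Return every list of token indices that realizes SOURCE in order,
    skipping at most max_gap tokens between consecutive slots."""
    results = []

    def extend(slot_idx, start, chosen):
        if slot_idx == len(SOURCE):
            results.append(chosen)
            return
        # The first slot may start anywhere; later slots are gap-limited.
        stop = len(tokens) if not chosen else min(len(tokens), start + max_gap + 1)
        for i in range(start, stop):
            if slot_matches(SOURCE[slot_idx], tokens[i]):
                extend(slot_idx + 1, i + 1, chosen + [i])

    extend(0, 0, [])
    return results


def translate(tokens, match):
    """Fill the aligned target slots from the lexicon for one match."""
    out = list(TARGET)
    for x, y in ALIGN:
        out[y - 1] = NE_LEXICON[tokens[match[x - 1]]][1]
    return " ".join(out)


tokens = tokenize("Peres, meluve b-sar ha-xuc shalom, yipagesh im bush hayom")
for match in find_matches(tokens):
    print(match, "->", translate(tokens, match))
```

On the slide's sentence this sketch yields two candidates, one with Peres and one with Shalom in the subject slot, which illustrates the ambiguity the slide mentions. How the actual system ranks or filters such candidates is not specified here.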
7 Input/Output
- Input:
  - Full text in source language (Arabic)
- Output:
  - English translation of extracted entities and relationships
  - (Possibly also a structured representation)
Example (figure): Arabic newswire input -> AMTEXT System -> English output
[Arabic source text lost in extraction]
Extracted output:
The Abu Hafz al-Masri Brigades - al-Qaida warned
car bombs killed 23 people injured 300 others
8 Scope of Pilot System
- Arabic-to-English
- Newswire text (available from TIDES)
- Limited set of actions: (X meet Y) (X attend Y) (X hold Y) (X kill Y) (X announce Y)
- Limited translation patterns:
  - <subj-NE> <verb> <obj> <LOC> <TE>
- Limited vocabulary
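One way to picture the pilot scope, and the structured representation mentioned as a possible output, is a small record type restricted to the listed actions. The following Python sketch is purely illustrative: the `Relation` class and its field names are assumptions that mirror the slots of the translation pattern above, not a format defined by the project.

```python
from dataclasses import dataclass, asdict
from typing import Optional

# The five actions listed for the pilot system.
ACTIONS = {"meet", "attend", "hold", "kill", "announce"}


@dataclass
class Relation:
    """Hypothetical structured form of one extracted relation.

    Fields follow the pattern slots: subj-NE, verb, obj, LOC, TE.
    """
    subj: str
    verb: str
    obj: str
    loc: Optional[str] = None  # location, if present in the text
    te: Optional[str] = None   # temporal expression, if present

    def __post_init__(self):
        # Reject anything outside the limited action set.
        if self.verb not in ACTIONS:
            raise ValueError(f"action {self.verb!r} is outside the pilot scope")


meeting = Relation(subj="Peres", verb="meet", obj="Bush", te="today")
print(asdict(meeting))
```

Keeping optional slots as `None` rather than omitting them gives every relation the same shape, which makes later comparison against a reference set straightforward.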
9 Evaluation Plan
- Compare the AMTEXT approach to full-text Arabic-to-English SMT, on a limited task of translation of relations within the scope of coverage:
  - Establish a test set for evaluation
  - Define an appropriate metric: Precision/Recall/F1 of relations and entities
  - Compare performance
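For reference, precision, recall and F1 over extracted relations can be computed as in the Python sketch below. It assumes relations are compared by exact match as (subject, action, object) tuples; the slides leave the precise metric definition open, so this matching criterion and the toy data are assumptions for illustration only.

```python
def precision_recall_f1(predicted, reference):
    """Exact-match precision, recall and F1 between two collections of items."""
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, invented for illustration only.
reference = {("Peres", "meet", "Bush"), ("car bombs", "kill", "23 people")}
predicted = {("Peres", "meet", "Bush"), ("Shalom", "meet", "Bush")}

p, r, f = precision_recall_f1(predicted, reference)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.50 R=0.50 F1=0.50
```

The same function can score entities alone by passing sets of entity strings instead of relation tuples, so both system outputs (AMTEXT and full-text SMT) could be scored against one test set.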