Title: Bridging the Gap: Cutting Edge Technologies Working for Lesser-Resourced Languages
1. Bridging the Gap: Cutting Edge Technologies Working for Lesser-Resourced Languages
- Christian Monson, Ariadna Font Llitjós, Lori Levin, Alon Lavie, Alison Alvarez, Roberto Aranovich, Jaime Carbonell, Robert Frederking, Erik Peterson, Kathrin Probst
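4. MT Challenges
[Diagram: translation pyramid from Source (e.g. Quechua) to Target (e.g. English), with levels Direct (SMT, EBMT); Syntactic Parsing, Transfer Rules, Text Generation; Semantic Analysis, Sentence Planning]
- Transfer rules and deeper analysis: need human expertise, but high quality
- Direct (SMT, EBMT): need large bilingual corpus, but fast to develop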
6. AVENUE MT Approach
[Diagram: translation pyramid from Source (e.g. Quechua) to Target (e.g. English), with levels Direct (SMT, EBMT); Syntactic Parsing, Transfer Rules, Text Generation; Semantic Analysis, Sentence Planning; Interlingua]
- AVENUE: Automate Rule Learning
- Leverage Linguistic Structure
- Utilize Bilingual Speakers
7.
- Inupiaq: 100s of Speakers
- Marcellos Languages? 100s of Speakers
- Quechua: 6 Million Speakers
8. Three Sub-Problems
- Morphology Induction
- Initial Syntax Learning
- Syntax Refinement
9. Morphology Induction
- Paradigms Organize Morphology
10. Paradigm Discovery in 3 Steps
- Search for partial paradigms in a network of candidates
- Cluster overlapping partial paradigms
- Filter the clusters, keeping the largest clusters most likely to model true paradigms
[Figure: A portion of a Spanish paradigm candidate network]
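The three steps above can be illustrated with a toy sketch. This is not the system described on the slides: the function names, the thresholds (`max_suffix_len`, `min_stems`, `keep_ratio`), the greedy search, and the overlap test used for clustering are all assumptions chosen to keep the example short. A candidate here is a set of suffixes together with the stems that take every suffix in the set.

```python
from collections import defaultdict


def stems_by_suffix(words, max_suffix_len=3):
    """Raw material for the candidate network: suffix -> stems it attaches to."""
    table = defaultdict(set)
    for word in words:
        # Try every split that leaves a non-empty stem and a short suffix.
        for cut in range(max(1, len(word) - max_suffix_len), len(word)):
            table[word[cut:]].add(word[:cut])
    return table


def search_partial_paradigms(table, min_stems=2, keep_ratio=0.8):
    """Step 1: from each single suffix, greedily add suffixes that keep most stems."""
    found = set()
    for seed in table:
        suffixes, stems = {seed}, set(table[seed])
        while True:
            # Candidate whose addition retains the most stems (ties: larger string).
            best = max((s for s in table if s not in suffixes),
                       key=lambda s: (len(stems & table[s]), s), default=None)
            if best is None:
                break
            kept = stems & table[best]
            if len(kept) < min_stems or len(kept) < keep_ratio * len(stems):
                break
            suffixes.add(best)
            stems = kept
        if len(suffixes) > 1:
            found.add((frozenset(suffixes), frozenset(stems)))
    return found


def cluster_partial_paradigms(partials):
    """Step 2: merge partial paradigms that share a suffix and a stem (one greedy pass)."""
    clusters = []
    for suffixes, stems in sorted(partials, key=lambda p: (sorted(p[0]), sorted(p[1]))):
        merged_suffixes, merged_stems = set(suffixes), set(stems)
        untouched = []
        for c_suffixes, c_stems in clusters:
            if merged_suffixes & c_suffixes and merged_stems & c_stems:
                merged_suffixes |= c_suffixes
                merged_stems |= c_stems
            else:
                untouched.append((c_suffixes, c_stems))
        clusters = untouched + [(merged_suffixes, merged_stems)]
    return clusters


def filter_clusters(clusters, min_stems=3):
    """Step 3: keep clusters covering enough stems, largest first."""
    kept = [c for c in clusters if len(c[1]) >= min_stems]
    return sorted(kept, key=lambda c: len(c[1]), reverse=True)


def discover_paradigms(words):
    table = stems_by_suffix(words)
    return filter_clusters(cluster_partial_paradigms(search_partial_paradigms(table)))


# Hypothetical toy vocabulary: a few Spanish present-tense verb forms plus one noun.
WORDS = ["hablo", "hablas", "habla", "hablan", "canto", "cantas", "canta", "cantan",
         "bailo", "bailas", "baila", "bailan", "miro", "miras", "sol"]
paradigms = discover_paradigms(WORDS)
for suffixes, stems in paradigms:
    print(sorted(suffixes), sorted(stems))
```

On this vocabulary the largest surviving cluster groups the suffixes a, an, as, o with the stems bail, cant, habl, mir. The stem mir enters only through clustering, because it is attested with just two of the four suffixes. A competing segmentation (stems habla, canta, baila with suffixes n, s) also survives the size filter, which shows why a real system needs stronger filtering than a stem count.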
11. Morpho Challenge 2007
- Unsupervised Morphology Induction Competition
- English
  - 3rd Place Overall
  - Bested the strong baseline Morfessor (Creutz, 2006)
- German
  - 1st Place with Combined ParaMor-Morfessor System
12. Syntax Induction