Bridging the Gap: Cutting Edge Technologies Working for Lesser-Resourced Languages


Title: Bridging the Gap: Cutting Edge Technologies Working for Lesser-Resourced Languages


1
Bridging the Gap: Cutting Edge Technologies
Working for Lesser-Resourced Languages
  • Christian Monson, Ariadna Font Llitjós, Lori
    Levin, Alon Lavie, Alison Alvarez, Roberto
    Aranovich, Jaime Carbonell, Robert Frederking,
    Erik Peterson, Kathrin Probst

2
MT Challenges
  • Interlingua

Semantic Analysis
Sentence Planning
Text Generation
Transfer Rules
Syntactic Parsing
Source (e.g. Quechua)
Target (e.g. English)
Direct: SMT, EBMT
3
MT Challenges
  • Interlingua

Need human expertise, but high quality
Semantic Analysis
Sentence Planning
Text Generation
Transfer Rules
Syntactic Parsing
Source (e.g. Quechua)
Target (e.g. English)
Direct: SMT, EBMT
4
MT Challenges
  • Interlingua

Need human expertise, but high quality
Semantic Analysis
Sentence Planning
Text Generation
Transfer Rules
Syntactic Parsing
Source (e.g. Quechua)
Target (e.g. English)
Direct: SMT, EBMT
Need large bilingual corpus, but fast to develop
5
AVENUE MT Approach
Interlingua
Semantic Analysis
Sentence Planning
Transfer Rules
Text Generation
Syntactic Parsing
AVENUE: Automate Rule Learning
Source (e.g. Quechua)
Target (e.g. English)
Direct: SMT, EBMT
6
AVENUE MT Approach
Interlingua
Semantic Analysis
Sentence Planning
Transfer Rules
Text Generation
Syntactic Parsing
AVENUE: Automate Rule Learning
Source (e.g. Quechua)
Target (e.g. English)
Direct: SMT, EBMT
Leverage Linguistic Structure
Utilize Bilingual Speakers
7
Inupiaq: 100s of Speakers
Marcellos Languages?: 100s of Speakers
Quechua: 6 Million Speakers
8
Three Sub-Problems
  • Morphology Induction
  • Initial Syntax Learning
  • Syntax Refinement

9
Morphology Induction
  • Paradigms Organize Morphology

10
Paradigm Discovery in 3 Steps
  • Search for partial paradigms in a network of
    candidates
  • Cluster overlapping partial paradigms
  • Filter the clusters, keeping the largest clusters
    most likely to model true paradigms

[Figure: A portion of a Spanish paradigm candidate network]
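
The three steps above can be illustrated with a toy sketch. This is not the authors' system: the function names (`suffix_to_stems`, `search`, `cluster`, `filter_clusters`, `discover_paradigms`), the greedy search strategy, the thresholds, and the small Spanish word list are all assumptions chosen for illustration. A candidate here is a set of suffixes together with the stems that take every one of them.

```python
from collections import defaultdict

# Toy corpus (an assumption for illustration): a few Spanish -ar verb forms.
WORDS = [
    "hablo", "hablas", "habla", "hablamos", "hablan",
    "canto", "cantas", "canta", "cantamos", "cantan",
    "bailo", "bailas", "baila", "bailamos", "bailan",
    "tomo", "toma", "toman",
]


def suffix_to_stems(words, max_suffix_len=4):
    """Map each word-final string (candidate suffix) to the stems it follows."""
    table = defaultdict(set)
    for word in words:
        for cut in range(max(1, len(word) - max_suffix_len), len(word)):
            table[word[cut:]].add(word[:cut])
    return table


def search(table, keep_ratio=0.8, min_stems=2):
    """Step 1 (hypothetical greedy variant): grow a partial paradigm per seed."""
    schemes = set()
    for seed in sorted(table):
        shared = set(table[seed])
        if len(shared) < min_stems:
            continue
        suffixes = {seed}
        while True:
            best, best_stems = None, set()
            for cand in sorted(table):
                common = shared & table[cand]
                if cand not in suffixes and len(common) > len(best_stems):
                    best, best_stems = cand, common
            # Stop when adding any further suffix would lose too many stems.
            if best is None or len(best_stems) < max(min_stems, keep_ratio * len(shared)):
                break
            suffixes.add(best)
            shared = best_stems
        if len(suffixes) > 1:
            schemes.add((frozenset(suffixes), frozenset(shared)))
    return schemes


def cluster(schemes, min_overlap=0.5):
    """Step 2: merge partial paradigms whose suffix sets overlap (Jaccard)."""
    clusters = []
    for suffixes, stems in sorted(schemes, key=lambda s: sorted(s[0])):
        for entry in clusters:
            if len(entry[0] & suffixes) / len(entry[0] | suffixes) >= min_overlap:
                entry[0] |= suffixes
                entry[1] |= stems
                break
        else:
            clusters.append([set(suffixes), set(stems)])
    return clusters


def filter_clusters(clusters, min_forms=10):
    """Step 3: keep only large clusters, sized as suffixes x stems."""
    def size(c):
        return len(c[0]) * len(c[1])
    return [c for c in sorted(clusters, key=size, reverse=True) if size(c) >= min_forms]


def discover_paradigms(words):
    return filter_clusters(cluster(search(suffix_to_stems(words))))


for suffixes, stems in discover_paradigms(WORDS):
    print(sorted(suffixes), sorted(stems))
# prints: ['a', 'amos', 'an', 'as', 'o'] ['bail', 'cant', 'habl', 'tom']
```

On this word list the search step finds two overlapping partial paradigms (one covering only -o, -a, -an because "tom-" lacks the other forms), the cluster step merges them, and the size filter discards the smaller competing analyses that cut words at the wrong boundary, such as -lo/-la/-las/-lan.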
11
Morpho Challenge 2007
  • Unsupervised Morphology Induction Competition
  • English
      • 3rd Place Overall
      • Bested the strong baseline Morfessor (Creutz,
        2006)
  • German
      • 1st Place with Combined ParaMor-Morfessor System

12
Syntax Induction