MT Challenges - PowerPoint PPT Presentation


1
MT Challenges
  • Ed Kenschaft
  • University of Maryland
  • kensch at umd

2
My Perspective
  • Software engineer
  • Linguistics student at UMD
  • Researcher in NLP group
  • Studied with SIL translators
  • Analyst with SIL software development

3
SIL International
  • Faith-based Christian organization
  • Partner with speakers of languages that have
    never been written down
  • Purposes
    • preserve the language and culture
    • document the language for study
    • translate the Bible and community development
      materials
  • Documented 1400 languages in 70 countries

4
Challenges of Bible Translation
  • Ultra-low-density languages
  • Nearly endless variety of target languages
    • 2000-3000 remaining
  • Exceedingly rich domain of discourse
    • approximates all of natural language
  • Demand for 100% accuracy/fluency
  • Cultural variation
  • Intelligibility vs. Fidelity

5
Cultural Context
  • Cleanse me with hyssop, and I will be
    clean; wash me, and I will be whiter than
    snow. (Psalm 51:7, NIV)
  • What is hyssop?
  • What is snow?
  • What does it mean to be white?
  • Cleanse me with a plant indigenous to the lands
    of the ancient Near East, used in Jewish
    religious ceremonies, and I will be whiter than
    the precipitation that falls like rain when the
    weather is very cold, which indicates a state of
    moral purity.

6
Intelligibility vs. Fidelity
  • Where there is no vision, the people perish.
    (Proverbs 29:18a, KJ21)
  • When people do not accept divine guidance, they
    run wild. (Pr 29:18a, NLT)

7
Waste of Time?
  • Can a computer replace a translator?
    • Limited domains only
  • What can it do?
    • Word-processing
    • Data storage and analysis
    • First draft?

8
General Approach
  • CAT vs. MT
  • Linguistically informed systems
  • Supervised learning
  • Exploit all available resources
    • SL resources
    • Existing TL data

9
Data Representation
  • Text encoding
    • Unicode
  • Fonts
    • Graphite
  • Interlinear text
    • LinguaLinks, Toolbox, FieldWorks

10
Elicitation and Analysis
  • Elicit syntactic and morphological data
    • AVENUE, EXPEDITION
  • Elicit word lists for language survey
    • WordSurv

11
SL Resources
  • Related language adaptation
    • CARLA
  • Projection across word alignment
    • GIZA, Multi-Align, Parser Projection
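
The idea of projecting annotations across a word alignment can be sketched roughly as follows. This is only an illustration, not the interface of GIZA, Multi-Align, or any other tool named on the slide: the function name `project_tags`, the tag labels, and the toy alignment are all hypothetical, and the alignment is assumed to arrive as (source index, target index) pairs from some aligner.

```python
from typing import List, Optional, Tuple


def project_tags(
    src_tags: List[str],
    alignment: List[Tuple[int, int]],
    tgt_len: int,
) -> List[Optional[str]]:
    """Copy source-language tags onto target-language tokens.

    alignment holds (source_index, target_index) pairs. A target token
    with no aligned source token stays None. If several source tokens
    align to one target token, the first pair listed wins (a deliberately
    simple tie-break chosen for this sketch).
    """
    projected: List[Optional[str]] = [None] * tgt_len
    for src_i, tgt_i in alignment:
        if projected[tgt_i] is None:
            projected[tgt_i] = src_tags[src_i]
    return projected


# Hypothetical toy data: three tagged SL tokens, three untagged TL tokens.
src_tags = ["DET", "NOUN", "VERB"]
alignment = [(1, 0), (2, 1)]  # SL token 1 -> TL token 0, SL token 2 -> TL token 1
tgt_tags = project_tags(src_tags, alignment, tgt_len=3)
print(tgt_tags)
```

In this sketch the third TL token receives no tag because nothing aligns to it; a real system would need some strategy for such gaps and for many-to-one alignments.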

12
NLG
  • Rich interlingua
    • TBTA (Tod Allman)
  • Statistical fluency enhancement
    • (Sebastian Varges)

13
The Limits of NLP
  • Who knows?
  • TMI-2004