Title: Introduction to MT
1 Introduction to MT
- Ling 575
- Fei Xia
- Week 2, 1/09/07
2 Outline
- Course overview
- Discussion
- Introduction to MT
- Divergence and crossing
3 Questions
- Who is going to do an MA thesis on MT?
- Who knows a language other than English?
- Have you played with MT systems (e.g., BabelFish) before?
- Any particular topics that interest you?
4 Course overview
5 General info
- Course website: http://courses.washington.edu/ling575x/
- Syllabus (incl. slides and papers): updated every week
- Message board
- ESubmit
- Office hours: Fri 10am-11:20am
- Prerequisites
- Ling570
- Programming: C or C++
- Introduction to probability and statistics
6 Expectations
- Reading
- Papers are online
- Finish reading before class. Bring your questions to class.
- Grade
- Assignments: 30%
- Project: 50%
- Class participation: 20%
- No quizzes, exams
7 Assignments
- Three assignments, 10% each
- Goal: become familiar with common MT packages
- GIZA++: IBM Models 1-4
- Pharaoh: phrase-based decoder
8 Project
- Pick a date (via EPost): 4th-9th weeks
- Choose a topic and find papers
- Choose 1-2 papers and tell the group
- Leading discussion: 40 min to 1 hour
- Final short presentation (?): 15-20 mins
- Final report
9 Deadlines
- Jan 13: choose the date (via EPost)
- 1 week before your presentation: email me the topic and 1-2 papers
- 1 day before (by 8am on Monday): email me your slides
- On the day: presentation
- March 13: final report, via ESubmit
10 Final presentation/report
- Leading discussion: literature review
- Final presentation: an update on the topic and/or methodology
- Final report:
- Major previous work on the topic
- Your thoughts and plan to address it
- The literature review and methodology chapters of your MA thesis
11 Possible topics
- MT with bridge languages
- MT for Low-density languages
- Collecting parallel data
- New models
- Decoding
- System combination
- Preprocessing, e.g., NE translation
- Postprocessing, e.g., reranking
- ...
12 Finding papers
- Go to acl.ldc.upenn.edu
- Look under CL, ACL, COLING, EACL, HLT, NAACL, and Workshops (e.g., EMNLP)
- Search for keywords such as "machine translation"
13 Finding papers (cont)
- Citeseer: citeseer.ist.psu.edu
- Start with several papers on a topic and compare the reference lists: find the intersection.
- Feel free to email me with the candidates beforehand.
14 Session layout
- Week 1: no class
- Weeks 2-3: Fei
- Weeks 4-9: Fei and a student
- Week 10: students
15 Questions?
16 Discussion
17 Questions
- What resources do you need to build an MT system?
- How does an MT system work? How would you build an MT system?
- What are the main challenges?
- What are the major approaches?
18 Introduction to MT
19 A brief history of MT (based on work by John Hutchins)
- Before the computer: in the mid-1930s, a French-Armenian, Georges Artsrouni, and a Russian, Petr Troyanskii, applied for patents for translating machines.
- The pioneers (1947-1954): the first public MT demo was given in 1954 (by IBM and Georgetown University).
- The decade of optimism (1954-1966): the ALPAC (Automatic Language Processing Advisory Committee) report in 1966 concluded that "there is no immediate or predictable prospect of useful machine translation."
20 A brief history of MT (cont)
- The aftermath of the ALPAC report (1966-1980): a virtual end to MT research
- The 1980s: Interlingua, example-based MT
- The 1990s: Statistical MT
- The 2000s: Hybrid MT
21 Where are we now?
- Huge potential/need due to the internet, globalization, and international politics.
- Quick development time due to SMT and the availability of parallel data and computers.
- Translation is reasonable for language pairs with a large amount of resources.
- Starting to include more minor languages.
- Got stuck?
22 What is MT good for?
- Rough translation: web data
- Computer-aided human translation
- Translation for limited domains
- Cross-lingual IR
- Machines are better than humans in:
- Speed: much faster than humans
- Memory: can easily memorize millions of word/phrase translations
- Manpower: machines are much cheaper than humans
- Fast learner: it takes minutes or hours to build a new system. Erasable memory?
- Never complain, never get tired, ...
23 Major challenges in MT
24 Translation is hard
- Novels
- Word play, jokes, puns, hidden messages
- Concept gaps: go Greek, bei fen
- Other constraints: lyrics, dubbing, poems, ...
25 Major challenges
- Getting the right words
- Choosing the correct root form
- Getting the correct inflected form
- Inserting spontaneous words
- Putting the words in the correct order
- Word order: SVO vs. SOV, ...
- Unique constructions
- Divergence
26 Lexical choice
- Homonymy/polysemy: bank, run
- Concept gap: no corresponding concepts in another language: go Greek, go Dutch, fen sui, lame duck, ...
- Coding (concept → lexeme mapping) differences
- More distinctions in one language: e.g., kinship vocabulary
- Different division of conceptual space
27 Discussion
- More examples of concept gaps
- More examples of coding differences
- Does language shape thought? (the Sapir-Whorf hypothesis)
- Ex1: Chinese radicals: whale, dolphin, bat, marriage
- Ex2: kinship, color
28 Choosing the appropriate inflection
- Inflection: gender, number, case, tense, ...
- Ex:
- Number: Ch-Eng, all the concrete nouns
- ch_book → book, books
- Gender: Eng-Fr, all the adjectives
- Case: Eng-Korean, all the arguments
- Tense: Ch-Eng, all the verbs
- ch_buy → buy, bought, will buy
29 Inserting spontaneous words
- Function words
- Determiners: Ch-Eng
- ch_book → a book, the book, the books, books
- Prepositions: Ch-Eng
- ch_November → in November
- Relative pronouns: Ch-Eng
- ch_buy ch_book de ch_person → the person who bought /book/
- Possessive pronouns: Ch-Eng
- ch_he ch_raise ch_hand → He raised his hand(s)
- Conjunctions: Eng-Ch
- Although S1, S2 → ch_although S1, ch_but S2
30 Inserting spontaneous words (cont)
- Content words
- Dropped arguments: Ch-Eng
- ch_buy le ma → Has Subj bought Obj?
- Chinese first names: Eng-Ch
- Jiang → ch_Jiang ch_Zemin
- Abbreviations, acronyms: Ch-Eng
- ch_12 ch_big → the 12th National Congress of the CPC (Communist Party of China)
31 Major challenges
- Getting the right words
- Choosing the correct root form
- Getting the correct inflected form
- Inserting spontaneous words
- Putting the words in the correct order
- Word order: SVO vs. SOV, ...
- Unique construction
- Structural divergence
32 Word order
- SVO, SOV, VSO, ...
- VP PP → PP VP
- VP AdvP → AdvP VP
- Adj N → N Adj
- NP PP → PP NP
- NP S → S NP
- P NP → NP P
33 Word order (cont)
34 Unique constructions
- Overt wh-movement: Eng-Ch
- Eng: Why do you think that he came yesterday?
- Ch: you why think he yesterday come ASP?
- Ch: you think he yesterday why come?
- Ba-construction: Ch-Eng
- She ba homework finish ASP → She finished her homework.
- He ba wall dig ASP CL hole → He dug a hole in the wall.
- She ba orange peel ASP skin → She peeled the orange's skin.
35 Translation divergences
- Source and target parse trees (dependency trees) are not identical.
- Example: I like Mary → S: Marta me gusta a mi (Mary pleases me)
- More discussion after the break
36 Major approaches
37 How do humans do translation?
- Learn a foreign language
- Memorize word translations
- Learn some patterns
- Exercise
- Passive activity: read, listen
- Active activity: write, speak
- Translation
- Understand the sentence
- Clarify or ask for help (optional)
- Translate the sentence
- Training stage: translation lexicon; templates, transfer rules; reinforced learning? reranking?
- Decoding stage: parsing, semantic analysis? interactive MT? word-level? phrase-level? generate from meaning?
38 What kinds of resources are available to MT?
- Translation lexicon
- Bilingual dictionary
- Templates, transfer rules
- Grammar books
- Parallel data, comparable data
- Thesaurus, WordNet, FrameNet, ...
- NLP tools: tokenizer, morphological analyzer, parser, ...
- → More resources for major languages, fewer for minor languages.
39 Major approaches
- Transfer-based
- Interlingua
- Example-based (EBMT)
- Statistical MT (SMT)
- Hybrid approach
40 The MT triangle
[Figure: the MT triangle, with "word" at the two bottom corners and "meaning (interlingua)" at the top; analysis runs up the source side and synthesis down the target side. Word-based SMT/EBMT sit near the bottom, phrase-based SMT/EBMT higher up, and transfer-based MT higher still.]
41 Transfer-based MT
- Analysis, transfer, generation:
- Parse the source sentence
- Transform the parse tree with transfer rules
- Translate source words
- Get the target sentence from the tree
- Resources required:
- A source parser
- A translation lexicon
- A set of transfer rules
- An example: Mary bought a book yesterday. (A toy sketch of the pipeline follows below.)
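The following is a minimal sketch of the analysis-transfer-generation pipeline in Python. The hand-built parse, the single reordering rule, and the toy lexicon (Pinyin-like glosses for the Chinese words) are invented for illustration; a real transfer-based system would run a parser, apply many transfer rules, and handle lexical ambiguity.

```python
# Toy transfer-based MT sketch: hand-built parse, one transfer rule, toy lexicon.
# All rules and lexicon entries below are illustrative assumptions.

# Analysis: assume the source parse is already available
# (a real system would call a parser here).
parse = ("S",
         ("NP", "Mary"),
         ("VP",
          ("V", "bought"),
          ("NP", ("Det", "a"), ("N", "book")),
          ("AdvP", "yesterday")))

# Transfer: reorder the children of certain nodes.
# Example rule: temporal adverbs move to the front of the VP in the target language.
def transfer(node):
    if isinstance(node, str):
        return node
    label, children = node[0], [transfer(c) for c in node[1:]]
    if label == "VP":
        advps = [c for c in children if c[0] == "AdvP"]
        others = [c for c in children if c[0] != "AdvP"]
        children = advps + others
    return (label,) + tuple(children)

# Word translation: toy one-to-one lexicon (real lexicons are ambiguous).
lexicon = {"Mary": "Mali", "bought": "mai-le", "a": "yi-ben",
           "book": "shu", "yesterday": "zuotian"}

# Generation: read off the leaves of the transferred tree.
def generate(node):
    if isinstance(node, str):
        return [lexicon.get(node, node)]
    return [w for child in node[1:] for w in generate(child)]

print(" ".join(generate(transfer(parse))))
# -> Mali zuotian mai-le yi-ben shu
```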
42 Transfer-based MT (cont)
- Parsing: linguistically motivated grammar or formal grammar?
- Transfer:
- Context-free rules? A path on a dependency tree?
- Apply at most one rule at each level?
- How are rules created?
- Translating words: word-to-word translation?
- Generation: using an LM or other additional knowledge?
- How to create the needed resources automatically?
43 Interlingua
- For n languages, direct translation between all pairs needs n(n-1) MT systems (e.g., 90 systems for 10 languages).
- Interlingua uses a language-independent representation.
- Conceptually, Interlingua is elegant: we only need n analyzers and n generators.
- Resources needed:
- A language-independent representation
- Sophisticated analyzers
- Sophisticated generators
44 Interlingua (cont)
- Questions:
- Does a language-independent meaning representation really exist? If so, what does it look like?
- It requires deep analysis: how do we get such an analyzer (e.g., semantic analysis)?
- It requires non-trivial generation: how is that done?
- It forces disambiguation at various levels: lexical, syntactic, semantic, discourse.
- It cannot take advantage of similarities between a particular language pair.
45 Example-based MT
- Basic idea: translate a sentence by using the closest match in parallel data.
- First proposed by Nagao (1981).
- Ex:
- Training data:
- w1 w2 w3 w4 → w1' w2' w3' w4'
- w5 w6 w7 → w5' w6' w7'
- w8 w9 → w8' w9'
- Test sentence:
- w1 w2 w6 w7 w9 → w1' w2' w6' w7' w9'
46 EBMT (cont)
- Types of EBMT
- Lexical (shallow)
- Morphological / POS analysis
- Parse-tree based (deep)
- Types of data required by EBMT systems
- Parallel text
- Bilingual dictionary
- Thesaurus for computing semantic similarity
- Syntactic parser, dependency parser, etc.
47 EBMT (cont)
- Word alignment using dictionary and heuristics → exact match
- Generalization:
- Clusters: dates, numbers, colors, shapes, etc.
- Clusters can be built by hand or learned automatically.
- Ex:
- Exact match: 12 players met in Paris last Tuesday → 12 Spieler trafen sich letzten Dienstag in Paris
- Templates: <num> players met in <city> <time> → <num> Spieler trafen sich <time> in <city>
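Below is a minimal sketch of the template idea, assuming hand-written clusters and a single bilingual template. The slot notation (<num>, <city>, <time>) and the regex-based matcher are illustrative choices, not the mechanism of any particular EBMT system, and the slot fillers are copied rather than translated.

```python
# Toy EBMT template matching for the slide's example.
# Clusters, slot names, and the regex approach are illustrative assumptions.
import re

# Hand-built clusters (they could also be learned from data).
clusters = {
    "num": r"\d+",
    "city": r"(?:Paris|Berlin|London)",
    "time": r"(?:last \w+|yesterday)",
}

# One bilingual template extracted from the aligned example pair.
src_template = "<num> players met in <city> <time>"
tgt_template = "<num> Spieler trafen sich <time> in <city>"

def template_to_regex(template):
    """Turn '<num> players met in <city> <time>' into a named-group regex."""
    pattern = re.escape(template)
    for name, cluster_re in clusters.items():
        pattern = pattern.replace(re.escape(f"<{name}>"),
                                  f"(?P<{name}>{cluster_re})")
    return re.compile(pattern + r"$")

def translate(sentence):
    match = template_to_regex(src_template).match(sentence)
    if match is None:
        return None  # fall back to other templates or the closest example
    # Fill the target-side slots with the matched source material
    # (a real system would also translate the slot fillers).
    output = tgt_template
    for name, value in match.groupdict().items():
        output = output.replace(f"<{name}>", value)
    return output

print(translate("3 players met in Berlin yesterday"))
# -> 3 Spieler trafen sich yesterday in Berlin
```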
48 Statistical MT
- Basic idea: learn all the parameters from parallel data.
- Major types:
- Word-based
- Phrase-based
- Strengths:
- Easy to build, and it requires no human knowledge
- Good performance when a large amount of training data is available
- Weaknesses:
- How to express linguistic generalizations?
49 Comparison of resource requirements
50 Hybrid MT
- Basic idea: combine the strengths of different approaches
- Syntax-based: generalization at the syntactic level
- Interlingua: conceptually elegant
- EBMT: memorizing translations of n-grams; generalization at various levels
- SMT: fully automatic; uses an LM; optimizes some objective function
- Types of hybrid MT:
- Borrowing concepts/methods:
- SMT from EBMT: phrase-based SMT, alignment templates
- EBMT from SMT: automatically learned translation lexicon
- Transfer-based from SMT: automatically learned translation lexicon and transfer rules; using an LM
- Using two MTs in a pipeline:
- Using transfer-based MT as a preprocessor for SMT
- Using multiple MTs in parallel, then adding a re-ranker
51 Evaluation of MT
52 Evaluation
- Unlike many NLP tasks (e.g., tagging, chunking, parsing, IE, pronoun resolution), there is no single gold standard for MT.
- Human evaluation: accuracy, fluency, ...
- Problem: expensive, slow, subjective, non-reusable
- Automatic measures:
- Edit distance
- Word error rate (WER), position-independent WER (PER)
- Simple string accuracy (SSA), generation string accuracy (GSA)
- BLEU
53 Edit distance
- The edit distance (a.k.a. Levenshtein distance) is defined as the minimal cost of transforming str1 into str2, using three operations (substitution, insertion, deletion).
- Use DP; the complexity is O(mn).
54 WER, PER, and SSA
- WER (word error rate) is the edit distance divided by the length of Ref.
- PER (position-independent WER): same as WER but disregards word ordering
- SSA (simple string accuracy) = 1 - WER
- Previous example:
- Sys: w1 w2 w3 w4
- Ref: w1 w3 w2
- Edit distance = 2
- WER = 2/3
- PER = 1/3
- SSA = 1/3
55 Generation string accuracy (GSA)
- Example:
- Ref: w1 w2 w3 w4
- Sys: w2 w3 w4 w1
- Del = 1, Ins = 1 → SSA = 1/2
- Move = 1, Del = 0, Ins = 0 → GSA = 3/4
56 BLEU
- Proposed by Papineni et al. (2002)
- Most widely used in the MT community.
- BLEU is a weighted geometric mean of n-gram precisions (p_n) between the system output and all references, multiplied by a brevity penalty (BP).
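Written out, using the standard formulation from Papineni et al. (2002):

```latex
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\Big(\sum_{n=1}^{N} w_n \log p_n\Big),
\qquad \text{typically } w_n = 1/N
```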
57 N-gram precision
- N-gram precision: the percentage of n-grams in the system output that are correct.
- Clipping:
- Sys: the the the the the the
- Ref: the cat sat on the mat
- Unigram precision (clipped): 2/6
- Max_Ref_count: the maximum number of times an n-gram occurs in any single reference translation.
58 N-gram precision (cont)
- i.e., the percentage of n-grams in the system output that are correct (after clipping).
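The formula itself did not survive the conversion of this slide; the standard definition of modified n-gram precision from Papineni et al. (2002) is:

```latex
p_n = \frac{\displaystyle\sum_{C \in \text{Candidates}} \; \sum_{g \in \text{n-grams}(C)} \mathrm{Count}_{\mathrm{clip}}(g)}
           {\displaystyle\sum_{C \in \text{Candidates}} \; \sum_{g \in \text{n-grams}(C)} \mathrm{Count}(g)}
```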
59 Brevity penalty
- For each sentence si in the system output, find the closest matching reference ri (in terms of length).
- Longer system output is already penalized by the n-gram precision measure.
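For reference, the brevity penalty as defined by Papineni et al. (2002), where c is the total length of the system output and r is the effective reference length (the sum of the closest reference lengths):

```latex
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
```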
60 An example
- Sys: The cat was on the mat
- Ref1: The cat sat on a mat
- Ref2: There was a cat on the mat
- Assuming N = 3:
- p1 = 5/6, p2 = 3/5, p3 = 1/4, BP = 1 → BLEU ≈ 0.50
- What if N = 4?
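A self-contained check of this example (clipped n-gram precisions plus the brevity penalty for a single sentence, with uniform weights). This is a sketch that reproduces the slide's numbers, not a substitute for a full BLEU implementation; tokens are lowercased so that "The" and "the" count as the same word.

```python
# Minimal single-sentence BLEU: clipped n-gram precisions + brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(sys_toks, refs, n):
    sys_counts = Counter(ngrams(sys_toks, n))
    max_ref = Counter()
    for ref in refs:
        for gram, cnt in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], cnt)
    clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in sys_counts.items())
    return clipped, sum(sys_counts.values())

def bleu(sys_toks, refs, N):
    precisions = []
    for n in range(1, N + 1):
        clipped, total = clipped_precision(sys_toks, refs, n)
        precisions.append(clipped / total)
    # Brevity penalty: compare against the closest reference length.
    c = len(sys_toks)
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    if any(p == 0 for p in precisions):
        return 0.0  # the geometric mean is zero if any precision is zero
    return bp * math.exp(sum(math.log(p) / N for p in precisions))

sys_out = "the cat was on the mat".split()
refs = ["the cat sat on a mat".split(),
        "there was a cat on the mat".split()]
print(bleu(sys_out, refs, N=3))  # ~0.50  (p1=5/6, p2=3/5, p3=1/4, BP=1)
print(bleu(sys_out, refs, N=4))  # 0.0, since no 4-gram matches any reference
```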
61 Summary
- Course overview
- Major challenges in MT
- Choose the right words (root form, inflection, spontaneous words)
- Put them in the right positions (word order, unique constructions, divergences)
62 Summary (cont)
- Major approaches
- Transfer-based MT
- Interlingua
- Example-based MT
- Statistical MT
- Hybrid MT
- Evaluation of MT systems
- Edit distance
- WER, PER, SSA, GSA
- BLEU
63 Next time
- Hw1 is due.
- Word-based SMT
- Read Knight's tutorial beforehand
64 Additional slides
65 Translation divergences (based on Bonnie Dorr's work)
- (S = Spanish, G = German)
- Thematic divergence: I like Mary → S: Marta me gusta a mi (Mary pleases me)
- Promotional divergence: John usually goes home → S: Juan suele ir a casa (John tends to go home)
- Demotional divergence: I like eating → G: Ich esse gern (I eat likingly)
- Structural divergence: John entered the house → S: Juan entro en la casa (John entered in the house)
66 Translation divergences (cont)
- Conflational divergence: I stabbed John → S: Yo le di punaladas a Juan (I gave knife-wounds to John)
- Categorial divergence: I am hungry → G: Ich habe Hunger (I have hunger)
- Lexical divergence: John broke into the room → S: Juan forzo la entrada al cuarto (John forced the entry to the room)
67 Calculating edit distance
- D(0, 0) = 0
- D(i, 0) = delCost * i
- D(0, j) = insCost * j
- D(i+1, j+1) = min( D(i, j) + sub,
                     D(i+1, j) + insCost,
                     D(i, j+1) + delCost )
- sub = 0 if str1[i+1] == str2[j+1], subCost otherwise
68 An example
- Sys: w1 w2 w3 w4
- Ref: w1 w3 w2
- All three costs are 1.
- Edit distance = 2 (a small implementation sketch follows below)
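A direct Python implementation of the recurrence from the previous slide, used here to reproduce this example (all three costs set to 1); it also prints the corresponding WER.

```python
# Edit distance DP (slide 67's recurrence), checked on slide 68's example.
def edit_distance(str1, str2, sub_cost=1, ins_cost=1, del_cost=1):
    m, n = len(str1), len(str2)
    # D[i][j] = minimal cost of transforming str1[:i] into str2[:j]
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = del_cost * i
    for j in range(1, n + 1):
        D[0][j] = ins_cost * j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if str1[i - 1] == str2[j - 1] else sub_cost
            D[i][j] = min(D[i - 1][j - 1] + sub,   # substitution (or match)
                          D[i][j - 1] + ins_cost,  # insertion
                          D[i - 1][j] + del_cost)  # deletion
    return D[m][n]

sys_out = "w1 w2 w3 w4".split()
ref = "w1 w3 w2".split()
d = edit_distance(sys_out, ref)
print(d)             # 2
print(d / len(ref))  # WER = 2/3
```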