1
Introduction to MT
  • Ling 575
  • Fei Xia
  • Week 2: 1/09/07

2
Outline
  • Course overview
  • Discussion
  • Introduction to MT
  • Divergence and crossing

3
Questions
  • Who is going to do an MA thesis on MT?
  • Who knows a language other than English?
  • Have you played with MT systems (e.g., BabelFish)
    before?
  • Any particular topics that interest you?

4
Course overview
5
General info
  • Course website: http://courses.washington.edu/ling575x/
  • Syllabus (incl. slides and papers) updated every week
  • Message board
  • ESubmit
  • Office hours: Fri 10am-11:20am
  • Prerequisites:
  • Ling570
  • Programming: C or C++
  • Introduction to probability and statistics

6
Expectations
  • Reading:
  • Papers are online.
  • Finish reading before class; bring your questions to class.
  • Grading:
  • Assignments: 30%
  • Project: 50%
  • Class participation: 20%
  • No quizzes or exams

7
Assignments
  • Three assignments, 10% each
  • Goal: become familiar with common MT packages
  • GIZA++: IBM Models 1-4
  • Pharaoh: phrase-based decoder

8
Project
  • Pick a date (via EPost): 4th-9th weeks
  • Choose a topic and find papers
  • Choose 1-2 papers and tell the group
  • Leading discussion: 40 min to 1 hour
  • Final short presentation (?): 15-20 mins
  • Final report

9
Deadlines
  • Jan 13: choose the date (via EPost)
  • 1 week before your presentation: email me the topic and 1-2 papers
  • 1 day before (by 8am on Monday): email me your slides
  • On the day: presentation
  • March 13: final report, via ESubmit

10
Final presentation/report
  • Leading discussion: literature review
  • Final presentation: an update on the topic and/or methodology
  • Final report:
  • Major previous work on the topic
  • Your thoughts and your plan for addressing the problem
  • These can become the literature review and methodology chapters of your MA thesis.

11
Possible topics
  • MT with bridge languages
  • MT for low-density languages
  • Collecting parallel data
  • New models
  • Decoding
  • System combination
  • Preprocessing: e.g., NE translation
  • Postprocessing: e.g., reranking
  • ...

12
Finding papers
  • Go to acl.ldc.upenn.edu
  • Look under CL, ACL, COLING, EACL, HLT, NAACL, and
    Workshops (e.g., EMNLP).
  • Search for keywords such as "machine translation"

13
Finding papers (cont)
  • CiteSeer: citeseer.ist.psu.edu
  • Start with several papers on a topic and compare their reference lists; find the intersection.
  • Feel free to email me with candidate papers beforehand.

14
Session layout
  • Week 1: no class
  • Weeks 2-3: Fei
  • Weeks 4-9: Fei and a student
  • Week 10: students

15
Questions?
16
Discussion
17
Questions
  • What resources do you need to build an MT system?
  • How does an MT system work? How would you build an MT system?
  • What are the main challenges?
  • What are the major approaches?

18
Introduction to MT
19
A brief history of MT (Based on work by John
Hutchins)
  • Before the computer: in the mid-1930s, the French-Armenian Georges Artsrouni and the Russian Petr Troyanskii applied for patents for translating machines.
  • The pioneers (1947-1954): the first public MT demo was given in 1954 (by IBM and Georgetown University).
  • The decade of optimism (1954-1966): the ALPAC (Automatic Language Processing Advisory Committee) report in 1966 concluded that "there is no immediate or predictable prospect of useful machine translation."

20
A brief history of MT (cont)
  • The aftermath of the ALPAC report (1966-1980): a virtual end to MT research
  • The 1980s: interlingua, example-based MT
  • The 1990s: statistical MT
  • The 2000s: hybrid MT

21
Where are we now?
  • Huge potential/need, due to the internet, globalization, and international politics.
  • Quick development time, due to SMT and the availability of parallel data and computing power.
  • Translation quality is reasonable for language pairs with a large amount of resources.
  • Coverage is starting to include more minor languages.
  • Got stuck?

22
What is MT good for?
  • Rough translation: web data
  • Computer-aided human translation
  • Translation for limited domains
  • Cross-lingual IR
  • Machines are better than humans in:
  • Speed: much faster than humans
  • Memory: can easily memorize millions of word/phrase translations
  • Manpower: machines are much cheaper than humans
  • Fast learning: it takes only minutes or hours to build a new system. Erasable memory?
  • They never complain and never get tired, ...

23
Major challenges in MT
24
Translation is hard
  • Novels
  • Word play, jokes, puns, hidden messages
  • Concept gaps: "go Greek", "bei fen"
  • Other constraints: lyrics, dubbing, poems, ...

25
Major challenges
  • Getting the right words
  • Choosing the correct root form
  • Getting the correct inflected form
  • Inserting spontaneous words
  • Putting the words in the correct order
  • Word order: SVO vs. SOV, ...
  • Unique constructions
  • Divergence

26
Lexical choice
  • Homonymy/polysemy: bank, run
  • Concept gaps: no corresponding concept in the other language: "go Greek", "go Dutch", "fen sui", "lame duck", ...
  • Coding (concept → lexeme mapping) differences:
  • More distinctions in one language: e.g., kinship vocabulary
  • Different divisions of the conceptual space

27
Discussion
  • More examples of concept gaps
  • More examples of coding differences
  • Does language shape thought? (the Sapir-Whorf hypothesis)
  • Ex 1: Chinese radicals: whale, dolphin, bat, marriage
  • Ex 2: kinship, color

28
Choosing the appropriate inflection
  • Inflection: gender, number, case, tense, ...
  • Examples:
  • Number (Ch-Eng): all concrete nouns
  • ch_book → book, books
  • Gender (Eng-Fr): all adjectives
  • Case (Eng-Korean): all arguments
  • Tense (Ch-Eng): all verbs
  • ch_buy → buy, bought, will buy

29
Inserting spontaneous words
  • Function words:
  • Determiners (Ch-Eng):
  • ch_book → a book, the book, the books, books
  • Prepositions (Ch-Eng):
  • ch_November → in November
  • Relative pronouns (Ch-Eng):
  • ch_buy ch_book de ch_person → the person who bought the book
  • Possessive pronouns (Ch-Eng):
  • ch_he ch_raise ch_hand → He raised his hand(s)
  • Conjunctions (Eng-Ch):
  • Although S1, S2 → ch_although S1, ch_but S2

30
Inserting spontaneous words (cont)
  • Content words:
  • Dropped arguments (Ch-Eng):
  • ch_buy le ma → Has [Subj] bought [Obj]?
  • Chinese first names (Eng-Ch):
  • Jiang → ch_Jiang ch_Zemin
  • Abbreviations, acronyms (Ch-Eng):
  • ch_12 ch_big → the 12th National Congress of the CPC (Communist Party of China)

31
Major challenges
  • Getting the right words
  • Choosing the correct root form
  • Getting the correct inflected form
  • Inserting spontaneous words
  • Putting the words in the correct order
  • Word order: SVO vs. SOV, ...
  • Unique constructions
  • Structural divergence

32
Word order
  • SVO, SOV, VSO, ...
  • VP PP → PP VP
  • VP AdvP → AdvP VP
  • Adj N → N Adj
  • NP PP → PP NP
  • NP S → S NP
  • P NP → NP P

33
Word order (cont)
  • Languages you know

34
Unique Constructions
  • Overt wh-movement (Eng-Ch):
  • Eng: Why do you think that he came yesterday?
  • Ch: you why think he yesterday come ASP?
  • Ch: you think he yesterday why come?
  • Ba-construction (Ch-Eng):
  • She ba homework finish ASP → She finished her homework.
  • He ba wall dig ASP CL hole → He dug a hole in the wall.
  • She ba orange peel ASP skin → She peeled the orange's skin.

35
Translation divergences
  • Source and target parse trees (dependency trees)
    are not identical.
  • Example: I like Mary → (S) Marta me gusta a mí (Mary pleases me)
  • More discussion after the break

36
Major approaches
37
How do humans do translation?
  • Learn a foreign language (MT analogue: the training stage)
  • Memorize word translations (a translation lexicon)
  • Learn some patterns (templates, transfer rules)
  • Exercise (reinforcement learning? reranking?)
  • Passive activities: read, listen
  • Active activities: write, speak
  • Translation (MT analogue: the decoding stage)
  • Understand the sentence (parsing, semantic analysis?)
  • Clarify or ask for help (optional) (interactive MT?)
  • Translate the sentence (word-level? phrase-level? generate from meaning?)
38
What kinds of resources are available to MT?
  • Translation lexicon
  • Bilingual dictionary
  • Templates, transfer rules
  • Grammar books
  • Parallel data, comparable data
  • Thesaurus, WordNet, FrameNet, ...
  • NLP tools: tokenizer, morphological analyzer, parser, ...
  • → More resources for major languages, fewer for minor languages.

39
Major approaches
  • Transfer-based
  • Interlingua
  • Example-based (EBMT)
  • Statistical MT (SMT)
  • Hybrid approach

40
The MT triangle
[Figure: the MT (Vauquois) triangle. Source words sit at the bottom left and target words at the bottom right, with meaning (interlingua) at the apex. Analysis climbs the source side; synthesis descends the target side. Word-based SMT/EBMT transfer near the bottom, phrase-based SMT/EBMT higher up, transfer-based MT higher still, and interlingua at the top.]
41
Transfer-based MT
  • Analysis, transfer, generation:
  • Parse the source sentence
  • Transform the parse tree with transfer rules
  • Translate the source words
  • Get the target sentence from the tree
  • Resources required:
  • A source parser
  • A translation lexicon
  • A set of transfer rules
  • An example: Mary bought a book yesterday. (See the toy sketch below.)
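
A toy sketch of this pipeline in Python (the tree and lexicon formats and the single reordering rule are invented for illustration; a real transfer system is far richer):

  # Toy transfer-based pipeline: parse -> transfer -> lexical
  # translation -> generation. All formats here are invented.

  # Hand-built parse of "Mary bought a book yesterday".
  tree = ("S", ("NP", "Mary"),
               ("VP", ("VP", ("V", "bought"), ("NP", "a", "book")),
                      ("AdvP", "yesterday")))

  # Toy translation lexicon (pseudo-Chinese glosses; "a" is dropped).
  LEXICON = {"Mary": "ch_Mali", "bought": "ch_buy",
             "a": "", "book": "ch_book", "yesterday": "ch_yesterday"}

  def transfer(node):
      """Apply transfer rules and translate words, top-down. The one
      rule here is 'VP AdvP -> AdvP VP' (the adverbial precedes the
      verb phrase in the target language)."""
      if isinstance(node, str):
          return LEXICON.get(node, node)   # word-to-word translation
      label, *kids = node
      if label == "VP" and len(kids) == 2 and kids[1][0] == "AdvP":
          kids = [kids[1], kids[0]]        # reorder the parse tree
      return (label, *[transfer(k) for k in kids])

  def linearize(node):
      """Generation: read the target sentence off the tree leaves."""
      if isinstance(node, str):
          return [node] if node else []    # skip dropped words
      return [w for kid in node[1:] for w in linearize(kid)]

  print(" ".join(linearize(transfer(tree))))
  # -> ch_Mali ch_yesterday ch_buy ch_book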

42
Transfer-based MT (cont)
  • Parsing: linguistically motivated grammar or formal grammar?
  • Transfer:
  • Context-free rules? A path on a dependency tree?
  • Apply at most one rule at each level?
  • How are rules created?
  • Translating words: word-to-word translation?
  • Generation: using an LM or other additional knowledge?
  • How to create the needed resources automatically?

43
Interlingua
  • For n languages, direct translation needs n(n-1) MT systems.
  • Interlingua uses a language-independent representation.
  • Conceptually, interlingua is elegant: we only need n analyzers and n generators (e.g., for 10 languages, 20 components instead of 90 direct systems).
  • Resources needed:
  • A language-independent representation
  • Sophisticated analyzers
  • Sophisticated generators

44
Interlingua (cont)
  • Questions:
  • Does a language-independent meaning representation really exist? If so, what does it look like?
  • It requires deep analysis: how do we get such an analyzer (e.g., for semantic analysis)?
  • It requires non-trivial generation: how is that done?
  • It forces disambiguation at various levels: lexical, syntactic, semantic, and discourse.
  • It cannot take advantage of similarities between a particular language pair.

45
Example-based MT
  • Basic idea: translate a sentence by using the closest match in parallel data.
  • First proposed by Nagao (1981).
  • Example:
  • Training data:
  • w1 w2 w3 w4 → w1' w2' w3' w4'
  • w5 w6 w7 → w5' w6' w7'
  • w8 w9 → w8' w9'
  • Test sentence:
  • w1 w2 w6 w7 w9 → w1' w2' w6' w7' w9'

46
EBMT (cont)
  • Types of EBMT:
  • Lexical (shallow)
  • Morphological / POS analysis
  • Parse-tree based (deep)
  • Types of data required by EBMT systems:
  • Parallel text
  • Bilingual dictionary
  • Thesaurus for computing semantic similarity
  • Syntactic parser, dependency parser, etc.

47
EBMT (cont)
  • Word alignment: using a dictionary and heuristics
  • → exact matching
  • Generalization:
  • Clusters: dates, numbers, colors, shapes, etc.
  • Clusters can be built by hand or learned automatically.
  • Example:
  • Exact match: 12 players met in Paris last Tuesday → 12 Spieler trafen sich letzten Dienstag in Paris
  • Template: <num> players met in <city> <time> → <num> Spieler trafen sich <time> in <city>
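
A minimal sketch of template matching in Python (the template format and slot names are invented for illustration; a real EBMT system would also translate the slot fillers via its clusters and dictionary):

  import re

  # One learned template pair; <num>, <city>, <time> are cluster slots.
  SRC = r"(?P<num>\d+) players met in (?P<city>\w+) (?P<time>last \w+)"
  TGT = "{num} Spieler trafen sich {time} in {city}"

  def translate(sentence):
      """Return the target side if the template matches, else None."""
      m = re.match(SRC, sentence)
      return TGT.format(**m.groupdict()) if m else None

  print(translate("12 players met in Paris last Tuesday"))
  # -> 12 Spieler trafen sich last Tuesday in Paris
  # (A full system would also map "last Tuesday" -> "letzten Dienstag".)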

48
Statistical MT
  • Basic idea: learn all the parameters from parallel data.
  • Major types:
  • Word-based
  • Phrase-based
  • Strengths:
  • Easy to build; requires little hand-crafted linguistic knowledge
  • Good performance when a large amount of training data is available
  • Weaknesses:
  • How to express linguistic generalizations?

49
Comparison of resource requirements
50
Hybrid MT
  • Basic idea: combine the strengths of different approaches:
  • Syntax-based: generalization at the syntactic level
  • Interlingua: conceptually elegant
  • EBMT: memorizing translations of n-grams; generalization at various levels
  • SMT: fully automatic; uses an LM; optimizes an objective function
  • Types of hybrid MT:
  • Borrowing concepts/methods:
  • SMT from EBMT: phrase-based SMT, alignment templates
  • EBMT from SMT: automatically learned translation lexicon
  • Transfer-based from SMT: automatically learned translation lexicon and transfer rules; using an LM
  • Using two MT systems in a pipeline:
  • Using transfer-based MT as a preprocessor for SMT
  • Using multiple MT systems in parallel, then adding a re-ranker

51
Evaluation of MT
52
Evaluation
  • Unlike many NLP tasks (e.g., tagging, chunking,
    parsing, IE, pronoun resolution), there is no
    single gold standard for MT.
  • Human evaluation: accuracy, fluency, ...
  • Problems: expensive, slow, subjective, non-reusable.
  • Automatic measures:
  • Edit distance
  • Word error rate (WER), Position-independent WER
    (PER)
  • Simple string accuracy (SSA), Generation string
    accuracy (GSA)
  • BLEU

53
Edit distance
  • The edit distance (a.k.a. Levenshtein distance) is defined as the minimal cost of transforming str1 into str2, using three operations (substitution, insertion, deletion).
  • It can be computed with dynamic programming (DP); the complexity is O(mn).

54
WER, PER, and SSA
  • WER (word error rate) is the edit distance divided by the length of the reference, |Ref|.
  • PER (position-independent WER): same as WER, but disregards word order.
  • SSA (simple string accuracy) = 1 - WER
  • Previous example:
  • Sys: w1 w2 w3 w4
  • Ref: w1 w3 w2
  • Edit distance = 2
  • WER = 2/3
  • PER = 1/3
  • SSA = 1/3
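
A minimal sketch of these metrics in Python (my own rendering of the definitions above, not code from any toolkit; the PER formula is one common variant):

  from collections import Counter

  def edit_distance(sys_words, ref_words):
      """Levenshtein distance over tokens via DP, all costs 1."""
      m, n = len(sys_words), len(ref_words)
      d = [[0] * (n + 1) for _ in range(m + 1)]
      for i in range(m + 1):
          d[i][0] = i                                   # deletions
      for j in range(n + 1):
          d[0][j] = j                                   # insertions
      for i in range(1, m + 1):
          for j in range(1, n + 1):
              sub = 0 if sys_words[i-1] == ref_words[j-1] else 1
              d[i][j] = min(d[i-1][j-1] + sub,          # substitution
                            d[i-1][j] + 1,              # deletion
                            d[i][j-1] + 1)              # insertion
      return d[m][n]

  def wer(sys_words, ref_words):
      return edit_distance(sys_words, ref_words) / len(ref_words)

  def per(sys_words, ref_words):
      # Order-independent: count tokens unmatched between the two bags.
      overlap = sum((Counter(sys_words) & Counter(ref_words)).values())
      return (max(len(sys_words), len(ref_words)) - overlap) / len(ref_words)

  sys_out, ref = "w1 w2 w3 w4".split(), "w1 w3 w2".split()
  print(wer(sys_out, ref), per(sys_out, ref), 1 - wer(sys_out, ref))
  # -> 0.667 0.333 0.333   (WER = 2/3, PER = 1/3, SSA = 1/3)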

55
Generation string accuracy (GSA)
  • Example:
  • Ref: w1 w2 w3 w4
  • Sys: w2 w3 w4 w1
  • Del = 1, Ins = 1 → SSA = 1/2
  • Move = 1, Del = 0, Ins = 0 → GSA = 3/4 (GSA charges a block move as a single error rather than a deletion plus an insertion)

56
BLEU
  • Proposed by Papineni et al. (2002).
  • The most widely used metric in the MT community.
  • BLEU is a weighted geometric mean of the n-gram precisions (pn) between the system output and all references, multiplied by a brevity penalty (BP).
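
In full, with uniform weights w_n = 1/N over n-gram orders 1..N (the standard formulation from Papineni et al., 2002):

  BLEU = BP · exp( Σ_{n=1}^{N} w_n · log p_n )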

57
N-gram precision
  • N-gram precision: the percentage of n-grams in the system output that are correct.
  • Clipping:
  • Sys: the the the the the the
  • Ref: the cat sat on the mat
  • Unigram precision: 2/6 (the count of "the" is clipped to 2, its count in the reference)
  • Max_Ref_count: the maximum number of times an n-gram occurs in any single reference translation; each system n-gram count is clipped to this value.
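
A minimal sketch of clipped n-gram precision in Python (my own illustration of the definition above, not the reference BLEU implementation):

  from collections import Counter

  def ngrams(words, n):
      return [tuple(words[i:i+n]) for i in range(len(words) - n + 1)]

  def clipped_precision(sys_words, refs, n):
      sys_counts = Counter(ngrams(sys_words, n))
      # Max_Ref_count: max count of each n-gram in any one reference.
      max_ref = Counter()
      for ref in refs:
          for gram, c in Counter(ngrams(ref, n)).items():
              max_ref[gram] = max(max_ref[gram], c)
      clipped = sum(min(c, max_ref[g]) for g, c in sys_counts.items())
      return clipped / sum(sys_counts.values())

  sys_out = "the the the the the the".split()
  refs = ["the cat sat on the mat".split()]
  print(clipped_precision(sys_out, refs, 1))   # -> 2/6 ≈ 0.333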

58
N-gram precision
  • I.e., the percentage of n-grams in the system output that are correct (after clipping).
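
The standard formula (Papineni et al., 2002):

  p_n = [ Σ_{sent ∈ Sys} Σ_{ngram ∈ sent} Count_clip(ngram) ] / [ Σ_{sent ∈ Sys} Σ_{ngram ∈ sent} Count(ngram) ]

where Count_clip(ngram) = min(Count(ngram), Max_Ref_count(ngram)).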

59
Brevity Penalty
  • For each sentence si in the system output, find the closest matching reference ri (in terms of length).
  • Longer system output is already penalized by the n-gram precision measure, so BP only needs to penalize output that is too short.
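
The standard penalty, where c is the total length of the system output and r is the total length of the closest-matching references:

  BP = 1              if c > r
  BP = e^(1 - r/c)    if c ≤ r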

60
An example
  • Sys: The cat was on the mat
  • Ref1: The cat sat on a mat
  • Ref2: There was a cat on the mat
  • Assuming N = 3:
  • p1 = 5/6, p2 = 3/5, p3 = 1/4, BP = 1 → BLEU = (5/6 × 3/5 × 1/4)^(1/3) = 0.50
  • What if N = 4?
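
Answering the question above: with N = 4, none of the three 4-grams in Sys ("The cat was on", "cat was on the", "was on the mat") occurs in either reference, so p4 = 0 and the geometric mean, and hence BLEU, drops to 0. This is one reason BLEU is normally computed over a whole test set (or with smoothing) rather than sentence by sentence.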

61
Summary
  • Course overview
  • Major challenges in MT:
  • Choosing the right words (root form, inflection, spontaneous words)
  • Putting them in the right positions (word order, unique constructions, divergences)

62
Summary (cont)
  • Major approaches:
  • Transfer-based MT
  • Interlingua
  • Example-based MT
  • Statistical MT
  • Hybrid MT
  • Evaluation of MT systems:
  • Edit distance
  • WER, PER, SSA, GSA
  • BLEU

63
Next time
  • Hw1 is due.
  • Word-based SMT
  • Read Knight's tutorial beforehand.

64
Additional slides
65
Translation divergences (based on Bonnie Dorr's work)
  • Thematic divergence: I like Mary → (S) Marta me gusta a mí (Mary pleases me)
  • Promotional divergence: John usually goes home → (S) Juan suele ir a casa (John tends to go home)
  • Demotional divergence: I like eating → (G) Ich esse gern (I eat likingly)
  • Structural divergence: John entered the house → (S) Juan entró en la casa (John entered in the house)

66
Translation divergences (cont)
  • Conflational divergence: I stabbed John → (S) Yo le di puñaladas a Juan (I gave knife-wounds to John)
  • Categorial divergence: I am hungry → (G) Ich habe Hunger (I have hunger)
  • Lexical divergence: John broke into the room → (S) Juan forzó la entrada al cuarto (John forced the entry to the room)

67
Calculating edit distance
  • D(0, 0) = 0
  • D(i, 0) = delCost × i
  • D(0, j) = insCost × j
  • D(i+1, j+1) = min( D(i, j) + sub,
                      D(i+1, j) + insCost,
                      D(i, j+1) + delCost )
  • sub = 0 if str1[i+1] = str2[j+1]; subCost otherwise

68
An example
  • Sys: w1 w2 w3 w4
  • Ref: w1 w3 w2
  • All three costs are 1.
  • Edit distance = 2
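
As a worked illustration of the recurrence on the previous slide, here is the full DP table for this pair (rows: prefixes of Sys = str1; columns: prefixes of Ref = str2):

            ""   w1   w3   w2
      ""     0    1    2    3
      w1     1    0    1    2
      w2     2    1    1    1
      w3     3    2    1    2
      w4     4    3    2    2

The bottom-right cell is the edit distance, 2 (one deletion of w2 and one substitution of w4 by w2).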