Title: Fall 2004
EECS 595 / LING 541 / SI 661
Natural Language Processing
- Fall 2004
- Lecture Notes 7
Machine Translation
Example (from the Hansards corpus)
- English
- <s id=960001> I would like the government and the Postmaster General to agree that we place the union and the Postmaster General under trusteeship so that we can look at his books and records, including those of his management people and all the memos he has received from them, some of which must have shocked him rigid.
- <s id=960002> If the minister would like to propose that, I for one would be prepared to support him.
- French
- <s id=960001> Je voudrais que le gouvernement et le ministre des Postes conviennent de placer le syndicat et le ministre des Postes sous tutelle afin que nous puissions examiner ses livres et ses dossiers, y compris ceux de ses collaborateurs, et tous les mémoires qu'il a reçus d'eux, dont certains l'ont sidéré.
- <s id=960002> Si le ministre voulait proposer cela, je serais pour ma part disposé à l'appuyer.
Example
- These lies are like their father that begets them; gross as a mountain, open, palpable. (Henry IV, Part 1, act 2, scene 4)
Language similarities and differences
- Word order (SVO: English, Mandarin; VSO: Irish, Classical Arabic; SOV: Hindi, Japanese)
- Prepositions vs. postpositions (Jap.) (to Mariko, Mariko-ni)
- Lexical distinctions (Sp.)
- the bottle floated out
- la botella salió flotando
- Brother (Jap.) otooto (younger), oniisan (older)
- They (Fr.) elles (feminine), ils (masculine)
Why is Machine Translation Hard?
- Analysis
- Transfer/interlingua
- Generation
[Figure: the INPUT passes through analysis, transfer/interlingua, and generation]
Basic Strategies of MT
- Direct Approach
- 1950s, 1960s
- naïve
- Indirect Interlingua
- No looking back
- Language-neutral
- No influence on the target language
- Indirect Transfer
- Preferred
[Figure: diagram relating E (English), F (French), and I (interlingua)]
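To show why the direct approach counts as naive, here is a minimal sketch assuming a hypothetical four-word English-to-Spanish lexicon (`LEXICON`) and an illustrative function `direct_translate`; neither comes from the original notes. Each word is replaced independently and the source word order is kept, with no analysis, transfer, or generation step.

```python
# Hypothetical toy lexicon, for illustration only.
LEXICON = {
    "the": "la",
    "bottle": "botella",
    "floated": "flotó",
    "out": "fuera",
}

def direct_translate(sentence: str) -> str:
    """Naive direct MT: look up each word on its own and keep source word order."""
    out = []
    for word in sentence.lower().split():
        # Words missing from the lexicon are copied through unchanged.
        out.append(LEXICON.get(word, word))
    return " ".join(out)

print(direct_translate("The bottle floated out"))  # la botella flotó fuera
```

On the Spanish example from the language-differences slide, this yields "la botella flotó fuera" instead of the idiomatic "la botella salió flotando": the manner of motion (floating) and the path (out) are expressed by different parts of speech in the two languages, which word-by-word substitution cannot capture.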
Levels of Linguistic Processing
- Phonology
- Orthography
- Morphology (inflectional, derivational)
- Syntax (e.g., agreement)
- Semantics (e.g., concrete vs. abstract terms)
- Discourse (e.g., use of pronouns)
- Pragmatics (world knowledge)
Category Ambiguity
- Morphological ambiguity (Wachtraum)
- Part-of-speech (category) ambiguity (e.g., round)
- Some help comes from morphology (rounding)
- Using syntax, some ambiguities disappear (context
dictates category)
Homography and Polysemy
- Homographs (light, club, bank)
- Polysemous words (channel, crane)
- Different categories: syntax helps disambiguate
- Same category: semantics is needed
Structural Ambiguity
- Humans can have multiple interpretations (parses) for the same sentence
- Example: prepositional phrase attachment
- Use context to disambiguate
- For machine translation, context can be hard to
define
Use of Linguistic Knowledge
- Subcategorization frames
- Semantic features (is an object readable?)
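The semantic-feature idea ("is an object readable?") can be sketched as a selectional-restriction check. The feature lexicon `NOUN_FEATURES`, the restriction table `VERB_OBJECT_REQUIRES`, and the function `object_is_acceptable` are all hypothetical, chosen only to illustrate the idea: a verb accepts an object only if the noun carries every feature the verb requires.

```python
# Hypothetical feature lexicon: semantic features carried by each noun.
NOUN_FEATURES = {
    "book": {"concrete", "readable"},
    "memo": {"concrete", "readable"},
    "mountain": {"concrete"},
    "idea": {"abstract"},
}

# Hypothetical selectional restrictions: features a verb requires of its object.
VERB_OBJECT_REQUIRES = {
    "read": {"readable"},
    "climb": {"concrete"},
}

def object_is_acceptable(verb: str, noun: str) -> bool:
    """True if the noun has every feature the verb demands of its object.

    Verbs with no entry impose no restriction (empty set is a subset of anything).
    """
    required = VERB_OBJECT_REQUIRES.get(verb, set())
    features = NOUN_FEATURES.get(noun, set())
    return required <= features

print(object_is_acceptable("read", "book"))      # True
print(object_is_acceptable("read", "mountain"))  # False
```

A parser or translator could use such a check to discard readings whose verb-object pairs violate the restrictions, which is one way the linguistic knowledge listed on this slide narrows down ambiguous analyses.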
Contextual Knowledge
- In practice, very few sentences are truly ambiguous
- Context makes sense for humans (telescope example), not for machines
- No clear definition of context
Other Strategies
- Pick most natural interpretation
- Ask the author
- Make a guess
- Hope for a free ride
- Direct transfer
Anaphora Resolution
- Use of pronouns (it, him, himself, her)
- Definite anaphora (the young man)
- Antecedents
- Same problems as for ambiguity resolution
- Similar solutions (e.g., subcategorization)
When does MT work?
- Machine-Aided Translation (MAT)
- Restricted Domains (e.g., technical manuals)
- Restricted Languages (sublanguages)
- To give the reader an idea of what the text is
about
The Noisy Channel Model
- Source-channel model of communication
- Parametric probabilistic models of language and translation
- Training such models
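Statistics
[Figure: noisy channel; an encoder maps e to f (E → F) and a decoder recovers ê from f (F → E)]
$$\hat{e} = \arg\max_e P(e \mid f) = \arg\max_e P(f \mid e)\, P(e)$$
- $P(f \mid e)$: translation model
- $P(e)$: language model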
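The decoding rule $\hat{e} = \arg\max_e P(f \mid e)\, P(e)$ can be illustrated over a small, explicit candidate list. In this sketch all probabilities, the French input "la maison", and the function name `noisy_channel_decode` are invented for illustration; a real decoder searches a huge space of candidates rather than scoring three.

```python
import math

# Hypothetical toy models for one French input, f = "la maison".
# Translation model P(f | e): how well each English candidate explains f.
# It ignores word order, so it cannot tell "the house" from "house the".
TRANSLATION_MODEL = {"the house": 0.4, "house the": 0.4, "the home": 0.2}

# Language model P(e): how fluent each English candidate is on its own.
LANGUAGE_MODEL = {"the house": 0.05, "house the": 0.0001, "the home": 0.03}

def noisy_channel_decode(candidates):
    """Return argmax_e P(f | e) * P(e), scored as a sum of log probabilities."""
    def score(e):
        return math.log(TRANSLATION_MODEL[e]) + math.log(LANGUAGE_MODEL[e])
    return max(candidates, key=score)

print(noisy_channel_decode(TRANSLATION_MODEL))  # the house
```

The division of labour is the point of the factorisation: the translation model prefers "house" over "home" as a rendering of "maison", while the language model rules out the disfluent order "house the". Scoring in log space avoids underflow when many small probabilities are multiplied.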
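Parametric probabilistic models
- Language model (LM)
$$P(e) = P(e_1, e_2, \ldots, e_L) = P(e_1)\, P(e_2 \mid e_1) \cdots P(e_L \mid e_1 \ldots e_{L-1})$$
$$P(e_L \mid e_1 \ldots e_{L-1}) \approx P(e_L \mid e_{L-2}, e_{L-1})$$
- Deleted interpolation
- Translation model (TM)
- Alignment: $P(f, a \mid e)$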
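The trigram approximation and interpolation can be sketched as a small count-based model. This is a simplification: the class name and the weights `(0.6, 0.3, 0.1)` are assumptions, and the weights are fixed by hand, whereas deleted interpolation proper estimates them on held-out data. The interpolated estimate mixes trigram, bigram, and unigram relative frequencies so that unseen trigrams still get nonzero probability.

```python
from collections import Counter

class InterpolatedTrigramLM:
    """P(w | u, v) ~ l3*P3(w | u, v) + l2*P2(w | v) + l1*P1(w).

    The lambdas are fixed here (hypothetical values); they must sum to 1.
    Deleted interpolation would instead tune them on held-out data.
    """

    def __init__(self, sentences, lambdas=(0.6, 0.3, 0.1)):
        self.l3, self.l2, self.l1 = lambdas
        self.tri, self.ctx2 = Counter(), Counter()
        self.bi, self.ctx1 = Counter(), Counter()
        self.uni, self.total = Counter(), 0
        for s in sentences:
            w = ["<s>", "<s>"] + s.split() + ["</s>"]
            for i in range(2, len(w)):
                self.tri[(w[i - 2], w[i - 1], w[i])] += 1
                self.ctx2[(w[i - 2], w[i - 1])] += 1
                self.bi[(w[i - 1], w[i])] += 1
                self.ctx1[w[i - 1]] += 1
                self.uni[w[i]] += 1
                self.total += 1

    def prob(self, w, u, v):
        """Interpolated estimate of P(w | u, v); zero-count contexts contribute 0."""
        c2, c1 = self.ctx2[(u, v)], self.ctx1[v]
        p3 = self.tri[(u, v, w)] / c2 if c2 else 0.0
        p2 = self.bi[(v, w)] / c1 if c1 else 0.0
        p1 = self.uni[w] / self.total
        return self.l3 * p3 + self.l2 * p2 + self.l1 * p1

    def sentence_prob(self, s):
        """Chain rule with the trigram approximation, including the end marker."""
        w = ["<s>", "<s>"] + s.split() + ["</s>"]
        p = 1.0
        for i in range(2, len(w)):
            p *= self.prob(w[i], w[i - 2], w[i - 1])
        return p

lm = InterpolatedTrigramLM(["the cat sat", "the dog sat"])
print(lm.sentence_prob("the cat sat") > lm.sentence_prob("sat cat the"))  # True
```

Because each component distribution sums to 1 over the vocabulary and the weights sum to 1, the interpolated estimate is itself a proper distribution for any seen context.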
IBM's EM-trained models
- Model 1: word translation
- Model 2: local alignment
- Model 3: fertilities
- Model 4: class-based alignment
- Model 5: non-deficient algorithm (avoids overlaps, overflow)
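The simplest of these, word translation (IBM Model 1), can be trained with a few lines of EM. The following sketch covers Model 1 only, without the alignment, fertility, or class-based components of the higher models; the function name `train_ibm_model1`, the toy sentence pairs, and the iteration count are assumptions for illustration. The E-step spreads each French word's count over all English words (plus NULL) in proportion to the current $t(f \mid e)$; the M-step renormalises.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(f, e)] ~ P(f | e) with EM, IBM Model 1 style.

    pairs: list of (english_tokens, french_tokens) sentence pairs.
    A NULL token is prepended to every English sentence.
    """
    french_vocab = {f for _, fs in pairs for f in fs}
    uniform = 1.0 / len(french_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in pairs:
            e_sent = ["NULL"] + es
            for f in fs:
                # E-step: share one count for f among all e, weighted by t(f | e).
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise expected counts into P(f | e).
        t = defaultdict(float, {fe: c / total[fe[1]] for fe, c in count.items()})
    return t

# Hypothetical toy parallel corpus.
pairs = [
    (["the", "house"], ["la", "maison"]),
    (["the", "flower"], ["la", "fleur"]),
    (["a", "flower"], ["une", "fleur"]),
]
t = train_ibm_model1(pairs)
print(t[("la", "the")] > t[("maison", "the")])  # True
```

No sentence pair states that "la" translates "the", but because the two co-occur repeatedly, EM gradually concentrates probability on that pairing, and "maison" is then left to be explained by "house".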
Evaluation
- Human judgements: adequacy, grammaticality
- Automatic methods
- BLEU
- ROUGE
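BLEU combines clipped (modified) n-gram precisions for n = 1 to 4 through a geometric mean and multiplies the result by a brevity penalty for short outputs. The following is a minimal sentence-level sketch with no smoothing; the function names are illustrative, and standard BLEU accumulates the counts over a whole test corpus rather than scoring one sentence.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU against one or more references."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        max_ref = Counter()
        for r in refs:
            max_ref |= ngrams(r, n)  # per-n-gram maximum count over references
        # Clip each candidate n-gram count by its maximum reference count.
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU 0
        log_precision += math.log(clipped / sum(cand_counts.values())) / max_n
    c = len(cand)
    # Effective reference length: closest to c, shorter one on ties.
    r = min((abs(len(ref) - c), len(ref)) for ref in refs)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision)

print(sentence_bleu("the cat sat on the mat", ["the cat sat on the mat"]))  # 1.0
```

A candidate such as "the cat sat on" has perfect n-gram precision against that reference, yet the brevity penalty lowers its score to $e^{1 - 6/4} = e^{-0.5}$. Without that penalty, BLEU would reward output that drops content.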
Readings for next time