Fall 2004 - PowerPoint PPT Presentation

Provided by: rad75
1

EECS 595 / LING 541 / SI 661
Natural Language Processing
  • Fall 2004
  • Lecture Notes 7

2
Machine Translation
3
Example (from the Hansards corpus)
  • English
  • <s id=960001> I would like the government and the
    Postmaster General to agree that we place the
    union and the Postmaster General under
    trusteeship so that we can look at his books and
    records, including those of his management people
    and all the memos he has received from them, some
    of which must have shocked him rigid.
  • <s id=960002> If the minister would like to
    propose that, I for one would be prepared to
    support him.
  • French
  • <s id=960001> Je voudrais que le gouvernement et
    le ministre des Postes conviennent de placer le
    syndicat et le ministre des Postes sous tutelle
    afin que nous puissions examiner ses livres et
    ses dossiers, y compris ceux de ses
    collaborateurs, et tous les mémoires qu'il a
    reçus d'eux, dont certains l'ont sidéré.
  • <s id=960002> Si le ministre voulait proposer
    cela, je serais pour ma part disposé à l'appuyer.

4
Example
  • "These lies are like their father that begets
    them; gross as a mountain, open, palpable." (Henry
    IV, Part 1, act 2, scene 4)

5
Language similarities and differences
  • Word order (SVO: English, Mandarin; VSO: Irish,
    Classical Arabic; SOV: Hindi, Japanese)
  • Prepositions vs. postpositions (Jap.): to Mariko, Mariko-ni
  • Lexical distinctions (Sp.):
    • the bottle floated out
    • la botella salió flotando
  • Brother (Jap.): otooto (younger), oniisan
    (older)
  • They (Fr.): elles (feminine), ils (masculine)

6
Why is Machine Translation Hard?
  • Analysis
  • Transfer/interlingua
  • Generation

7
Basic Strategies of MT
  • Direct approach
    • 1950s, 1960s
    • Naïve
  • Indirect: interlingua
    • No looking back
    • Language-neutral
    • No influence on the target language
  • Indirect: transfer
    • Preferred

8
Levels of Linguistic Processing
  • Phonology
  • Orthography
  • Morphology (inflectional, derivational)
  • Syntax (e.g., agreement)
  • Semantics (e.g., concrete vs. abstract terms)
  • Discourse (e.g., use of pronouns)
  • Pragmatics (world knowledge)

9
Category Ambiguity
  • Morphological ambiguity (Ger. Wachtraum: Wach + Traum or Wacht + Raum)
  • Part-of-speech (category) ambiguity (e.g.
    round)
  • Some help comes from morphology (rounding)
  • Using syntax, some ambiguities disappear (context
    dictates category)

10
Homography and Polysemy
  • Homographs (light, club, bank)
  • Polysemous words (channel, crane)
  • Readings in different categories: resolved by syntax
  • Readings in the same category: resolved by semantics

11
Structural Ambiguity
  • Humans can have multiple interpretations (parses)
    for the same sentence
  • Example: prepositional phrase attachment
  • Use context to disambiguate
  • For machine translation, context can be hard to
    define

12
Use of Linguistic Knowledge
  • Subcategorization frames
  • Semantic features (is an object readable?)

13
Contextual Knowledge
  • In practice, very few sentences are truly
    ambiguous
  • Context makes sense for humans (telescope
    example), not for machines
  • no clear definition of context

14
Other Strategies
  • Pick most natural interpretation
  • Ask the author
  • Make a guess
  • Hope for a free ride
  • Direct transfer

15
Anaphora Resolution
  • Use of pronouns (it, him, himself, her)
  • Definite anaphora (the young man)
  • Antecedents
  • Same problems as for ambiguity resolution
  • Similar solutions (e.g., subcategorization)

16
When does MT work?
  • Machine-Aided Translation (MAT)
  • Restricted Domains (e.g., technical manuals)
  • Restricted Languages (sublanguages)
  • To give the reader an idea of what the text is
    about

17
The Noisy Channel Model
  • Source-channel model of communication
  • Parametric probabilistic models of language and
    translation
  • Training such models

18
Statistics
  • Given f, guess e

[Diagram: e → encoder (E → F) → f → decoder (F → E) → ê]

$$\hat{e} = \arg\max_e P(e \mid f) = \arg\max_e P(f \mid e)\, P(e)$$

where P(f | e) is the translation model and P(e) is the language model.
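
The argmax rule on this slide can be illustrated with a minimal Python sketch. It assumes a small, fixed list of candidate translations and made-up probability tables (`TM`, `LM`); the function name `noisy_channel_decode` is hypothetical, and a real decoder would search a far larger space rather than enumerate candidates.

```python
import math

def noisy_channel_decode(f, candidates, tm_logprob, lm_logprob):
    """Return the candidate e that maximizes log P(f|e) + log P(e)."""
    return max(candidates, key=lambda e: tm_logprob(f, e) + lm_logprob(e))

# Toy probabilities, invented purely for illustration.
TM = {  # translation model P(f | e)
    ("la maison", "the house"): 0.6,
    ("la maison", "house the"): 0.6,
    ("la maison", "the home"): 0.3,
}
LM = {  # language model P(e)
    "the house": 0.05,
    "house the": 0.0001,
    "the home": 0.02,
}

def tm_logprob(f, e):
    # A tiny floor avoids log(0) for unseen pairs.
    return math.log(TM.get((f, e), 1e-12))

def lm_logprob(e):
    return math.log(LM.get(e, 1e-12))

best = noisy_channel_decode("la maison", list(LM), tm_logprob, lm_logprob)
print(best)
```

In this toy setup the translation model cannot tell "the house" from "house the"; the language model breaks the tie in favor of the fluent word order.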
19
Parametric probabilistic models
  • Language model (LM)
  • Deleted interpolation
  • Translation model (TM)

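$$P(e) = P(e_1, e_2, \ldots, e_L) = P(e_1)\, P(e_2 \mid e_1) \cdots P(e_L \mid e_1 \ldots e_{L-1})$$
$$P(e_L \mid e_1 \ldots e_{L-1}) \approx P(e_L \mid e_{L-2}, e_{L-1})$$
Alignment: $P(f, a \mid e)$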
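
As a rough illustration of the trigram language model with interpolation, the sketch below mixes maximum-likelihood unigram, bigram, and trigram estimates. The interpolation weights are fixed by hand here, which is an assumption: in deleted interpolation they would be estimated on held-out data. The function names and the `<s>`/`</s>` padding symbols are also illustrative choices.

```python
from collections import Counter

def train_counts(sentences):
    """Collect n-gram and context counts from whitespace-tokenized sentences."""
    m = {"uni": Counter(), "bi": Counter(), "tri": Counter(),
         "ctx1": Counter(), "ctx2": Counter(), "total": 0}
    for sent in sentences:
        toks = ["<s>", "<s>"] + sent.split() + ["</s>"]
        for i in range(2, len(toks)):
            w1, w2, w3 = toks[i - 2], toks[i - 1], toks[i]
            m["uni"][w3] += 1
            m["bi"][(w2, w3)] += 1
            m["tri"][(w1, w2, w3)] += 1
            m["ctx1"][w2] += 1          # how often w2 served as a bigram context
            m["ctx2"][(w1, w2)] += 1    # how often (w1, w2) served as a trigram context
            m["total"] += 1
    return m

def interp_prob(m, w1, w2, w3, lambdas=(0.1, 0.3, 0.6)):
    """P(w3 | w1, w2) as a weighted mix of unigram, bigram, and trigram estimates."""
    l1, l2, l3 = lambdas  # hand-picked weights (assumption); they must sum to 1
    p1 = m["uni"][w3] / m["total"] if m["total"] else 0.0
    p2 = m["bi"][(w2, w3)] / m["ctx1"][w2] if m["ctx1"][w2] else 0.0
    p3 = m["tri"][(w1, w2, w3)] / m["ctx2"][(w1, w2)] if m["ctx2"][(w1, w2)] else 0.0
    return l1 * p1 + l2 * p2 + l3 * p3

model = train_counts(["the cat sat", "the dog sat"])
p_seen = interp_prob(model, "the", "cat", "sat")
p_backoff = interp_prob(model, "a", "dog", "sat")  # unseen trigram context
```

The second query has an unseen trigram context, so its probability comes only from the bigram and unigram terms; this is the smoothing effect that interpolation is meant to provide.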
20
IBM's EM-trained models
  • Word translation
  • Local alignment
  • Fertilities
  • Class-based alignment
  • Non-deficient algorithm (avoid overlaps, overflow)
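
The first item, word translation, is commonly known as IBM Model 1. The following is a minimal sketch of EM training for word-translation probabilities t(f | e) under that model, assuming tokenized sentence pairs and a NULL source word. The function name, the toy corpus, and the iteration count are illustrative, and the later models (alignment, fertility, classes) are not covered.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """EM for word-translation probabilities t[(f, e)] ~ P(f | e).

    pairs: list of (english_tokens, foreign_tokens) lists.
    """
    f_vocab = {f for _, fs in pairs for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialization

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of links from e
        # E-step: distribute each foreign word over the English words
        # (plus NULL) in proportion to the current t(f | e).
        for es, fs in pairs:
            es_null = ["NULL"] + es
            for f in fs:
                z = sum(t[(f, e)] for e in es_null)
                for e in es_null:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

corpus = [
    ("the house".split(), "la maison".split()),
    ("the flower".split(), "la fleur".split()),
    ("a house".split(), "une maison".split()),
]
t = train_ibm_model1(corpus)
print(t[("maison", "house")], t[("la", "house")])
```

Because "maison" co-occurs with "house" in two sentence pairs while "la" is better explained by "the", EM shifts probability mass toward t(maison | house) even though no word alignments are given.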

21
Evaluation
  • Human judgements: adequacy, grammaticality
  • Automatic methods
  • BLEU
  • ROUGE
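
To make the automatic methods more concrete, here is a simplified sketch of a BLEU-style score: clipped (modified) n-gram precision up to 4-grams, combined by a geometric mean and a brevity penalty. It assumes a single reference and sentence-level scoring with no smoothing, which differs from the usual corpus-level, multi-reference setup; the function names are illustrative.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(cand, ref, n):
    """Clipped n-gram precision: each candidate n-gram is credited
    at most as often as it occurs in the reference."""
    c, r = ngrams(cand, n), ngrams(ref, n)
    total = sum(c.values())
    if total == 0:
        return 0.0
    clipped = sum(min(cnt, r[g]) for g, cnt in c.items())
    return clipped / total

def bleu(cand, ref, max_n=4):
    """Single-reference, unsmoothed, sentence-level BLEU sketch."""
    precisions = [modified_precision(cand, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_avg)

ref = "the cat is on the mat".split()
print(bleu(ref, ref))
print(modified_precision("the the the the the the the".split(), ref, 1))
```

Clipping is what keeps a degenerate candidate such as "the the the the the the the" from scoring a perfect unigram precision against this reference.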

22
Readings for next time
  • J&M Chapters 18, 21