1

EECS 595 / LING 541 / SI 661
Natural Language Processing
  • Fall 2005
  • Lecture Notes 9

2
Machine Translation
3
Example (from the Hansards corpus)
  • English
  • <s id=960001> I would like the government and the
    Postmaster General to agree that we place the
    union and the Postmaster General under
    trusteeship so that we can look at his books and
    records, including those of his management people
    and all the memos he has received from them, some
    of which must have shocked him rigid.
  • <s id=960002> If the minister would like to
    propose that, I for one would be prepared to
    support him.
  • French
  • <s id=960001> Je voudrais que le gouvernement et
    le ministre des Postes conviennent de placer le
    syndicat et le ministre des Postes sous tutelle
    afin que nous puissions examiner ses livres et
    ses dossiers, y compris ceux de ses
    collaborateurs, et tous les mémoires qu'il a
    reçus d'eux, dont certains l'ont sidéré.
  • <s id=960002> Si le ministre voulait proposer
    cela, je serais pour ma part disposé à l'appuyer.

4
Example
  • These lies are like their father that begets
    them; gross as a mountain, open, palpable.
    (Henry IV, Part 1, Act 2, Scene 2)

5
Language similarities and differences
  • Word order (SVO: English, Mandarin; VSO: Irish,
    Classical Arabic; SOV: Hindi, Japanese)
  • Prepositions vs. postpositions (Jap.): to Mariko → Mariko-ni
  • Lexical distinctions (Sp.):
  • the bottle floated out
  • la botella salió flotando
  • Brother (Jap.): otooto (younger), oniisan (older)
  • They (Fr.): elles (feminine), ils (masculine)

6
Why is Machine Translation Hard?
  • Analysis
  • Transfer/interlingua
  • Generation

7
Basic Strategies of MT
  • Direct Approach
  • 1950s and 60s
  • naïve
  • Indirect: interlingua
  • No looking back
  • Language-neutral
  • No influence on the target language
  • Indirect: transfer
  • Preferred

(Figure: English (E), French (F), and an interlingua (I))
8
Levels of Linguistic Processing
  • Phonology
  • Orthography
  • Morphology (inflectional, derivational)
  • Syntax (e.g., agreement)
  • Semantics (e.g., concrete vs. abstract terms)
  • Discourse (e.g., use of pronouns)
  • Pragmatics (world knowledge)

9
Category Ambiguity
  • Morphological ambiguity (Wachtraum)
  • Part-of-speech (category) ambiguity (e.g.
    round)
  • Some help comes from morphology (rounding)
  • Using syntax, some ambiguities disappear (context
    dictates category)

10
Homography and Polysemy
  • Homographs (light, club, bank)
  • Polysemous words (channel, crane)
  • for different categories - syntax
  • for same category - semantics

11
Structural Ambiguity
  • Humans can have multiple interpretations (parses)
    for the same sentence
  • Example prepositional phrase attachment
  • Use context to disambiguate
  • For machine translation, context can be hard to
    define

12
Use of Linguistic Knowledge
  • Subcategorization frames
  • Semantic features (is an object readable?)

13
Contextual Knowledge
  • In practice, very few sentences are truly
    ambiguous
  • Context makes sense for humans (telescope
    example), not for machines
  • no clear definition of context

14
Other Strategies
  • Pick most natural interpretation
  • Ask the author
  • Make a guess
  • Hope for a free ride
  • Direct transfer

15
Anaphora Resolution
  • Use of pronouns (it, him, himself, her)
  • Definite anaphora (the young man)
  • Antecedents
  • Same problems as for ambiguity resolution
  • Similar solutions (e.g., subcategorization)

16
The Noisy Channel Model
  • Source-channel model of communication
  • Parametric probabilistic models of language and
    translation
  • Training such models

17
Statistics
  • Given f, guess e

(Figure: source-channel view. An encoder maps E → F; given the observed
French f, the decoder recovers the English e.)

ê = argmax_e P(e|f) = argmax_e P(f|e) P(e)

where P(f|e) is the translation model and P(e) is the language model.
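To make the argmax concrete, here is a minimal sketch (not from the slides) that rescores a small, made-up candidate list by combining a translation-model score and a language-model score; the probabilities, the candidate list, and the decode function are all illustrative placeholders.

```python
import math

# Toy probabilities (assumed, not estimated from data): P(e) and P(f|e).
LM = {"the blue house": 0.02, "the house blue": 0.001}
TM = {("la maison bleue", "the blue house"): 0.3,
      ("la maison bleue", "the house blue"): 0.4}

def decode(f, candidates):
    """Return the e that maximises log P(f|e) + log P(e)."""
    return max(candidates, key=lambda e: math.log(TM[(f, e)]) + math.log(LM[e]))

print(decode("la maison bleue", ["the blue house", "the house blue"]))
# Prints "the blue house": the language model overrides the slightly
# higher translation-model score of the disfluent word order.
```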
18
Parametric probabilistic models
  • Language model (LM)
  • Deleted interpolation
  • Translation model (TM)

P(e) = P(e1, e2, …, eL) = P(e1) P(e2|e1) … P(eL|e1 … eL-1)
P(eL|e1 … eL-1) ≈ P(eL|eL-2, eL-1)   (trigram approximation)
Alignment: P(f,a|e)
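As an illustration of the language-model bullet, here is a minimal sketch of a trigram model smoothed by interpolation; note that in deleted interpolation the weights are estimated on held-out data, whereas here they are simply assumed, and the toy corpus is a placeholder.

```python
from collections import Counter

corpus = "the blue house is near the blue sea".split()  # toy corpus (assumed)

uni = Counter(corpus)
bi  = Counter(zip(corpus, corpus[1:]))
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))
N = len(corpus)

# Interpolation weights; deleted interpolation would tune these on
# held-out data, here they are simply assumed.
L1, L2, L3 = 0.1, 0.3, 0.6

def p_interp(w, u, v):
    """P(w | u, v) ≈ L3*P(w|u,v) + L2*P(w|v) + L1*P(w)."""
    p_uni = uni[w] / N
    p_bi  = bi[(v, w)] / uni[v] if uni[v] else 0.0
    p_tri = tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0
    return L3 * p_tri + L2 * p_bi + L1 * p_uni

print(p_interp("house", "the", "blue"))
```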
19
Statistical MT
  • English and Cebuano
  • In the beginning God created the heaven and the
    earth.
  • Sa sinugdan gibuhat sa Dios ang mga langit ug ang
    yuta.
  • And God called the firmament Heaven.
  • Ug gihinganlan sa Dios ang hawan nga Langit.
  • And God called the dry land Earth
  • Ug ang mamala nga dapit gihinganlan sa Dios nga
    Yuta
  • use co-occurrence, word order, cognates
  • corpora are needed
  • sentence alignment needs to be done first

20
Statistical MT
Translate from French: une fleur rouge?
21
Issues to deal with
  • word order
  • I like to drink coffee
  • watashi wa kohii o nomu no ga suki desu
  • I-subj coffee-obj drink-dat-rheme like
  • vocabulary
  • wall
  • pared, muro
  • phrases
  • play
  • pièce de théâtre

22
MT/noisy channel models
  • Text-to-text (summarization), also text-to-signal, speech
    recognition, OCR, spelling correction
  • P(text|pixels) ∝ P(text) P(pixels|text)

23
IBM's EM-trained models (1-5)
  • Word translation
  • Local alignment
  • Fertilities
  • Class-based alignment
  • Non-deficient algorithm (avoid overlaps, overflow)

24
Steps
  • Tokenization
  • Sentence alignment (1-1, 2-2, 2-1 mappings)
  • Gale and Church (based on sentence length)
  • Church (sequences of character 4-grams), based on cognates
  • Melamed (longest common subsequence of words),
    also using cognates
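In the spirit of the length-based alignment just listed (a simplification, not Gale and Church's actual probabilistic cost), here is a sketch that aligns sentence lists with dynamic programming over 1-1, 1-2, and 2-1 mappings, using a squared difference of character lengths as the penalty; the example sentences are placeholders.

```python
def align(src, tgt):
    """Align sentence lists by length with DP over 1-1, 1-2, 2-1 mappings."""
    def cost(a, b):
        # Crude penalty: squared difference of total character lengths,
        # standing in for Gale & Church's probabilistic distance measure.
        return (sum(map(len, a)) - sum(map(len, b))) ** 2

    INF = float("inf")
    n, m = len(src), len(tgt)
    best = [[(INF, None)] * (m + 1) for _ in range(n + 1)]
    best[0][0] = (0, None)
    moves = [(1, 1), (1, 2), (2, 1)]          # allowed sentence mappings
    for i in range(n + 1):
        for j in range(m + 1):
            d, _ = best[i][j]
            if d == INF:
                continue
            for di, dj in moves:
                if i + di <= n and j + dj <= m:
                    c = d + cost(src[i:i + di], tgt[j:j + dj])
                    if c < best[i + di][j + dj][0]:
                        best[i + di][j + dj] = (c, (i, j))
    # Trace back the best path to recover the aligned groups.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = best[i][j][1]
        pairs.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    return list(reversed(pairs))

src = ["Hello.", "How are you today?"]
tgt = ["Bonjour.", "Comment allez-vous aujourd'hui ?"]
print(align(src, tgt))
```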

25
Model 1
  • Alignments
  • La maison bleue
  • The blue house
  • Alignments: (1,2,3), (1,3,2), (1,3,3), (1,1,1)
  • All are equally likely
  • Conditional probabilities
  • P(f|A,e) = ?

26
Model 1 (contd)
  • Algorithm
  • Pick length of translation
  • Choose an alignment
  • Pick the French words
  • That gives you P(f,A|e)
  • We need P(f|A,e)
  • Use EM (expectation-maximisation) to find the
    hidden variables
  • (see Kevin Knight's tutorial)

27
Model 1
  • We need P(f|e) but we don't know the word
    alignments (which are assumed to be equally
    likely)
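Here is a minimal sketch of the EM training just described for Model 1 word-translation probabilities t(f|e), ignoring the NULL word; the two-sentence corpus and the fixed iteration count are placeholders.

```python
from collections import defaultdict

# Toy parallel corpus (assumed); each pair is (English words, French words).
corpus = [(["the", "house"], ["la", "maison"]),
          (["the", "blue", "house"], ["la", "maison", "bleue"])]

# Initialise t(f|e) uniformly over the French vocabulary.
f_vocab = {f for _, fs in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):                      # EM iterations
    count = defaultdict(float)           # expected counts c(f,e)
    total = defaultdict(float)           # expected counts c(e)
    for es, fs in corpus:
        for f in fs:
            # E-step: spread each f over all e in proportion to t(f|e).
            z = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c
    # M-step: renormalise the expected counts.
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

print(t[("maison", "house")], t[("bleue", "blue")])
```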

28
Model 2
  • Distortion parameters D(i|j,l,m)
  • i and j are word positions in the two sentences
  • l and m are the lengths of these sentences

29
Model 3
  • Fertility
  • P(φi|ei)
  • Examples
  • (a) play → pièce de théâtre
  • (to) place → mettre en place
  • p1 is an extra parameter that defines φ0 (the
    fertility of the NULL word)

30
Current work
  • Handling phrases
  • Using syntax
  • In the model
  • In discriminative reranking
  • Low density languages

31
Evaluation
  • Human judgements: adequacy, grammaticality
  • Automatic methods
  • BLEU
  • ROUGE
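A minimal sketch of BLEU's core idea (clipped n-gram precision with a brevity penalty) for one hypothesis against one reference; real BLEU aggregates counts over a whole test corpus, and the example sentences below are placeholders.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(zip(*(tokens[i:] for i in range(n))))

def bleu(hyp, ref, max_n=4):
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum(min(c, r[g]) for g, c in h.items())   # clipped counts
        precisions.append(overlap / max(sum(h.values()), 1))
    if min(precisions) == 0:
        return 0.0
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

hyp = "the blue house is near the sea".split()
ref = "the blue house is by the sea".split()
print(round(bleu(hyp, ref), 3))
```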

32
When does MT work?
  • Machine-Aided Translation (MAT)
  • Restricted Domains (e.g., technical manuals)
  • Restricted Languages (sublanguages)
  • To give the reader an idea of what the text is
    about

33
Dialogue and conversational agents
REMEMBER TO READ THE NEW VERSION OF THIS CHAPTER
ON THE WEB!
34
Abbott: You know, strange as it may seem, they give ball players nowadays very peculiar names... Now, on the Cooperstown team we have Who's on first, What's on second, I Don't Know is on third--
Costello: That's what I want to find out. I want you to tell me the names of the fellows on the Cooperstown team.
Abbott: I'm telling you. Who's on first, What's on second, I Don't Know is on third.
Costello: You know the fellows' names?
Abbott: Yes.
Costello: Well, then, who's playin' first?
Abbott: Yes.
Costello: I mean the fellow's name on first base.
Abbott: Who.
Costello: The fellow's name on first base for Cooperstown.
Abbott: Who.
Costello: The guy on first base.
Abbott: Who is on first base.
Costello: Well, what are you asking me for?
Abbott: I'm not asking you--I'm telling you. Who is on first.
Costello: I'm asking you--who's on first?
Abbott: That's the man's name.
35
Costello: That's who's name?
Abbott: Yes.
Costello: Well, go ahead, tell me!
Abbott: Who.
Costello: The guy on first.
Abbott: Who.
Costello: The first baseman.
Abbott: Who is on first.
Costello: Have you got a first baseman on first?
Abbott: Certainly.
Costello: Well, all I'm trying to find out is what's the guy's name on first base.
Abbott: Oh, no, no. What is on second base.
Costello: I'm not asking you who's on second.
36
What makes dialogue different
  • Turns and utterances (turn-taking)
  • Turn-taking rules
  • At each TRP (transition-relevance place)
  • designated speaker, any speaker, current speaker
  • Barge-in possible
  • Significant silence
  • A: Is there something bothering you or not? (1.0 s)
  • A: Yes or no? (1.5 s)
  • A: Eh?
  • B: No.

37
Grounding
  • Common ground between speaker and hearer.
  • A: returning on flight 1118
  • C: mm hmmm (backchannel, acknowledgment token)
  • Other continuers
  • Continued attention
  • Relevant next contribution
  • Acknowledgement (e.g. sure)
  • Demonstration (paraphrasing, reformulating)
  • Display (repeat verbatim)
  • Example
  • C: I will take the 5 pm flight on the 11th.
  • A: On the 11th?

38
Conversational Implicature
  • Example
  • When do you want to travel?
  • I have a meeting there early in the morning on
    the 13th.
  • Implicature: licensed inferences reasonable
    hearers can make.
  • Quantity
  • Agent: there are three non-stop flights daily

39
Grice's maxims
  • Maxim of quantity
  • make your contribution informative
  • but not more than needed
  • Maxim of quality
  • do not say what you believe is false
  • do not say that for which you lack evidence
  • Maxim of relevance
  • Maxim of manner
  • avoid ambiguity
  • avoid obscurity
  • be brief
  • be orderly

40
Dialogue acts
  • Performative sentences
  • I name this ship the Titanic
  • I second that motion
  • I bet you five dollars that it will snow tomorrow
  • Speech acts
  • locutionary acts: uttering a sentence with a
    particular meaning
  • illocutionary acts: asking, promising, answering
  • perlocutionary acts: producing effects upon the
    feelings, thoughts, or actions of the addressee

41
Speech acts (cont'd)
  • Assertives: suggesting, putting forward,
    swearing, boasting, concluding
  • Directives: asking, ordering, requesting,
    inviting, advising, begging
  • Commissives: promising, planning, vowing,
    betting, opposing
  • Expressives: thanking, apologizing, welcoming,
    deploring
  • Declarations: "I resign", "You're fired."

42
Automatic interpretation of dialogue acts
  • DAMSL - Dialogue Act Markup in Several Layers
  • Agreement (Accept, Maybe, Reject-Part, Hold)
  • Answer
  • Understanding (Signal-not-understood,
    Signal-understood, ack, repeat-rephrase,
    completion)

43
Techniques for DA recognition
  • Plan theoretic (agents, assumptions, goals)
  • Cue-based (please, are you?, rising pitch,
    stress - agreement vs. backchannel)
  • Statistical approaches
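As a toy illustration of the cue-based approach above, the sketch below scores a few hypothetical lexical cues for each dialogue-act tag and picks the highest-scoring tag; the cue list, weights, and tag inventory are invented for the example and are not DAMSL.

```python
# Hypothetical cue weights: (cue, tag) -> weight; a real system would learn
# these from a labelled corpus and also use prosodic cues (pitch, stress).
CUES = {
    ("please", "REQUEST"): 2.0,
    ("?", "QUESTION"): 1.5,
    ("are you", "QUESTION"): 1.0,
    ("yeah", "BACKCHANNEL"): 1.5,
    ("uh-huh", "BACKCHANNEL"): 2.0,
    ("ok", "AGREEMENT"): 1.0,
}
TAGS = ["STATEMENT", "QUESTION", "REQUEST", "BACKCHANNEL", "AGREEMENT"]

def classify(utterance):
    """Return the dialogue-act tag whose cues best match the utterance."""
    text = utterance.lower()
    scores = {tag: 0.0 for tag in TAGS}
    scores["STATEMENT"] = 0.5          # weak default when no cue fires
    for (cue, tag), w in CUES.items():
        if cue in text:
            scores[tag] += w
    return max(scores, key=scores.get)

print(classify("Could you repeat that, please?"))   # REQUEST (question cue also fires)
print(classify("uh-huh"))                           # BACKCHANNEL
```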