Title: Linguistic Approaches to Machine Translation
1Linguistic Approaches to Machine Translation
- May 5, 2009
- Teresa Herrmann
2Outline
- Motivation
- Linguistic Background
- Translation Challenges
- Linguistic approaches to Machine Translation
- Direct translation
- Translation by transfer
- Interlingua approach
3Motivation
- Machine Translation
- of natural language text / speech
- First MT approaches model the human translation process
- language-specific analysis, transfer and generation
- lexical and grammatical knowledge about source and target language
4Linguistic Terminology and Background
- Word-level
- Morphology
- Word Classes and Grammatical Categories
- Lexical semantics
- Sentence-level
- Syntax: structure of sentences
- Semantics: representation of meaning
5What is a word?
- The quick brown fox jumps over the lazy dog.
- tokenization: segment a sequence of characters into words
- convention in Western languages: word boundaries are marked by spaces
- many Asian languages: no spaces between words or even sentences
- Thequickbrownfoxjumpsoverthelazydog
- ambiguous characters make word segmentation difficult
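The segmentation problem can be made concrete with a small sketch. It assumes a known word list and uses greedy longest-match segmentation, which is a common baseline and not necessarily the method intended on the slide; the lexicon and the function name max_match are illustrative only.

```python
def max_match(text, lexicon):
    """Greedy longest-match segmentation of a string without spaces."""
    words, i = [], 0
    max_len = max(len(w) for w in lexicon)
    while i < len(text):
        # Try the longest candidate first, then shrink it.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            # No lexicon entry starts here: emit a one-character token.
            words.append(text[i])
            i += 1
    return words


# Hypothetical toy lexicon covering the example sentence.
LEXICON = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}

# With spaces, splitting on whitespace is enough.
print("the quick brown fox".split())
# Without spaces, the lexicon has to decide where words end.
print(max_match("thequickbrownfoxjumpsoverthelazydog", LEXICON))
```

Greedy matching commits to the longest word at each position, so it can choose a wrong boundary when a longer lexicon entry overlaps the start of the next word.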
6Morphology
- internal structure of words, word formation
- words are composed of morphemes: the smallest meaning-carrying units of language
- house, house-s, small, small-est, un-predict-able, un-happi-ness
- morpheme types
- stem morphemes: may appear as separate words
- house-s, Tisch-e
- functional or bound morphemes: need to be connected with stem morphemes
- affix types: prefix, suffix, infix, circumfix
- un-happy, kauf-st, Gespräch-s-ablauf, ge-kauf-t
7Morphology
- Word formation through morpheme composition
- Inflection: tense, number, person, case
- walk-s, walk-ed, kauf-st, kauf-te, car, car-s, Haus, (des) Haus-es, ein-e schön-e Blume, arbol-es verde-s
- function of the morpheme: add information about tense, number, person and case
- Derivation
- happi-ness, un-predict-able, Zufrieden-heit, un-brauch-bar, ent-kleiden
- bound morphemes derive new words, e.g. noun → adjective
- Composition
- rain-bow, water-proof, Haus-tür, Einkauf-s-wagen, eis-kalt
- modification of the stem morpheme → create new words
8Morphology Specialities
- morpho-phonological processes at morpheme boundaries
- stick / sticks, go / goes
- Fugen-s in German compounds: Universität-s-professor
- German Umlaut: Mutter / Mütter
- Vowel Harmony
- Hungarian, Turkish: choice of suffix depends on the vowels already present in the word
9Morphological Analysis
- finite state automata
- grammatical and lexical knowledge
- inflectional schemata
- lexical information: which inflectional class each word belongs to
- for most languages: very good coverage and fast
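How a lexicon of inflectional classes and a set of inflectional schemata work together can be sketched as follows. This is a simplified table lookup standing in for a real finite state automaton; the class names, the tables and the function analyze are hypothetical.

```python
# Hypothetical toy lexicon: stem -> inflectional class.
LEXICON = {"walk": "verb_regular", "car": "noun_s", "house": "noun_s"}

# Hypothetical inflectional schemata: class -> {suffix: features}.
SCHEMATA = {
    "verb_regular": {"": "base form", "s": "3sg present", "ed": "past"},
    "noun_s": {"": "singular", "s": "plural"},
}


def analyze(word):
    """Return all (stem, suffix, features) analyses licensed by the tables."""
    analyses = []
    for cut in range(1, len(word) + 1):
        stem, suffix = word[:cut], word[cut:]
        inflection_class = LEXICON.get(stem)
        if inflection_class is not None and suffix in SCHEMATA[inflection_class]:
            analyses.append((stem, suffix, SCHEMATA[inflection_class][suffix]))
    return analyses


print(analyze("walked"))
print(analyze("cars"))
```

The sketch ignores morpho-phonological changes at the morpheme boundary (go / goes, Mutter / Mütter), which is where finite state transducers earn their keep.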
10Word Classes
- The role a word plays within a sentence is determined by its part of speech (POS)
- Noun: power, apple, beauty, ...
- Verb: go, sleep, gehen, essen, ...
- Adjective: red, happy, asleep, groß (Haus), schnell (Auto), ...
- Adverb: often, happily, immediately, schnell (fahren), ...
- Determiner: the, a, all (the), which, der, eine, alle, ...
- Pronoun: she, it, them, er, sie, es, mein, wessen, ...
- Preposition: under, of, in, auf, bei, ...
- Conjunction: and, because, if, und, auch, obwohl, ...
11POS Tagger
- a word's POS gives useful information for translation
- ambiguous words: can (noun), can (verb)
- statistical POS tagger
- assign each word a POS based on relative frequency counts and its context in a training corpus
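A minimal sketch of such a tagger, assuming a tiny hand-made training corpus and using only the previous tag as context. The corpus, the tag names and the back-off scheme are illustrative assumptions, not a description of any particular tagger.

```python
from collections import Counter, defaultdict

# Hypothetical training corpus: sentences as lists of (word, tag) pairs.
CORPUS = [
    [("the", "DET"), ("can", "NOUN"), ("fell", "VERB")],
    [("she", "PRON"), ("can", "VERB"), ("sleep", "VERB")],
    [("a", "DET"), ("can", "NOUN"), ("rusts", "VERB")],
    [("they", "PRON"), ("can", "VERB"), ("go", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("can", "VERB"), ("sleep", "VERB")],
]


def train(corpus):
    """Collect tag counts per word and per (previous tag, word) context."""
    word_tag = defaultdict(Counter)
    context_tag = defaultdict(Counter)
    for sentence in corpus:
        prev = "<s>"
        for word, tag in sentence:
            word_tag[word][tag] += 1
            context_tag[(prev, word)][tag] += 1
            prev = tag
    return word_tag, context_tag


def tag(words, word_tag, context_tag):
    """Pick the most frequent tag in context, backing off to the word alone."""
    tags, prev = [], "<s>"
    for word in words:
        counts = context_tag.get((prev, word)) or word_tag.get(word)
        best = counts.most_common(1)[0][0] if counts else "UNK"
        tags.append(best)
        prev = best
    return tags


word_tag, context_tag = train(CORPUS)
print(tag(["the", "can", "rusts"], word_tag, context_tag))
print(tag(["she", "can", "go"], word_tag, context_tag))
```

In this toy corpus "can" is a verb more often than a noun, so frequency alone would mislabel "the can"; the preceding determiner is what tips the decision.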
12Grammatical Categories - nouns, pronouns, adjectives
- person
- pronoun and verb have to agree in this feature: I go, he goes
- pro-drop (pronoun dropping) languages, e.g. (yo) hablo: I speak
- number
- singular, dual, trial, plural; singular I vs. plural we
- gender
- masculine, feminine, neuter, animate, inanimate
- case
- role of participant within phrase: distinguish subject, object, ...
- nominative, genitive, accusative, dative, partitive, locative, ...
13Grammatical Categories - verbs
- tense: future, past or present
- walk, is walking, walked, have walked, had walked, will walk, will have walked
- Japanese: even adjectives and nouns are marked, e.g. shiroi (white), shirokatta (was white)
- aspect
- completeness, habituality, progressiveness
- e.g. imperfective vs. perfective; habitual: she sings, continuous: she is singing
- mood
- factuality, likelihood, possibility, uncertainty
- indicative (he is here), subjunctive (if he were here), optative
- voice
- active (Mary kisses John), passive (John is kissed by Mary), middle (Greek: I get myself taught), causative
14Lexical Semantics
- ambiguous meanings of words
- polysemy: words with the same surface form have different (related) meanings
- interest: Interesse, Zinsen, Anteil
- bank: financial institution, to sit on, of a river
- homonymy: completely unrelated meanings
- can: you can do it!, a can of beans
- → in order to translate, the correct meaning within the given context has to be identified: word sense disambiguation
- relations between words
- synonymy: need / require
- antonymy: related / unrelated, big / small, cheap / expensive
- hypernymy (is-a): house / building
- meronymy (part-of): door / house
15Sentences and sentence structure
- Sequence of words terminated by a punctuation mark: full stop, question/exclamation mark, ...
- Jane bought the house.
- SUBJECT VERB (OBJECT)
- subject of a sentence is a phrase headed by a noun → noun phrase (NP)
- Jane, the woman, a woman, the young woman, she, the young woman who lives across the street
- the verb is in the second position (in English)
- number of objects is determined by the verb and its valency
- transitive verbs: 1 or more objects; to buy s.th. (valency 1), to give s.o. s.th. (valency 2)
- intransitive verbs: 0 objects; to sleep
- verb and object(s) together form a constituent → verb phrase (VP)
- valency needs to be saturated for the sentence to be complete
- ungrammatical: Jane bought. / Jane slept the house.
16Sentence structure cont.
- additional information can be added to the sentence in terms of adjuncts
- prepositional phrases (PP)
- Jane bought the house (from Jim) (without hesitation).
- Jane bought the house (in the posh neighborhood across the river).
- adverbs
- (Yesterday) Jane bought the house (cheaply).
- embedded clauses
- relative clauses
- Jane (who recently won in the lottery) bought the house (that was just put on the market).
- → nested sentences, recursive structure of sentences
17Syntactical Theory
- Assumption
- natural language sentences follow certain
regularities - constituents
- precedence
- sentence structure can be modeled by a context-free grammar, e.g. Phrase Structure Grammar
- G = <V, Σ, P, S> where
- V: non-terminal symbols, here: syntactic categories
- Σ ⊂ V: terminal symbols, here: POS
- P: set of production rules describing constituent structure
- S: start symbol, category Sentence
18Phrase Structure Grammar
- Example Grammar: Production Rules
- S → NP VP
- NP → det N
- NP → NP PP
- VP → V NP
- VP → V NP PP
- PP → P NP
- Example Lexicon
- Jane: NP
- house: N
- telescope: N
- man: N
- the: det
- a: det
- sees: V
- buys: V
- with: P
19Syntactical Parsing
- given a natural language sentence, return a syntactical parse tree
- headed by an S node
- spanning all words in the sentence
- Parsing strategies
- bottom-up
- top-down
20Syntactical Parsing
Jane buys a house.
21Syntactical Parsing
Lexicon look-up: Jane NP, buys V, a det, house N
[NP Jane] [V buys] [det a] [N house]
22Syntactical Parsing
Grammar look-up: no rules
[NP Jane] [V buys] [det a] [N house]
23Syntactical Parsing
Grammar look-up: no rules
[NP Jane] [V buys] [det a] [N house]
24Syntactical Parsing
Grammar look-up: NP → det N
[NP Jane] [V buys] [NP [det a] [N house]]
25Syntactical Parsing
Grammar look-up: VP → V NP
[NP Jane] [VP [V buys] [NP [det a] [N house]]]
26Syntactical Parsing
- no words or nodes left
- highest node S
- → parse complete
Grammar look-up: S → NP VP
[S [NP Jane] [VP [V buys] [NP [det a] [N house]]]]
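The bottom-up walk-through above can be sketched in a few lines. This is a greedy reduction loop over the example grammar and lexicon, written for illustration; the function names parse and brackets and the tree encoding are assumptions of this sketch.

```python
# Production rules of the example grammar as (left-hand side, right-hand side).
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("det", "N")),
    ("NP", ("NP", "PP")),
    ("VP", ("V", "NP")),
    ("VP", ("V", "NP", "PP")),
    ("PP", ("P", "NP")),
]
LEXICON = {"Jane": "NP", "house": "N", "telescope": "N", "man": "N",
           "the": "det", "a": "det", "sees": "V", "buys": "V", "with": "P"}


def parse(sentence):
    """Reduce greedily until no rule applies; return the remaining nodes."""
    words = sentence.rstrip(".").split()
    # Lexicon look-up: one (category, word) node per word.
    nodes = [(LEXICON[w], w) for w in words]
    reduced = True
    while reduced:
        reduced = False
        for lhs, rhs in RULES:
            n = len(rhs)
            for i in range(len(nodes) - n + 1):
                if tuple(cat for cat, _ in nodes[i:i + n]) == rhs:
                    # Grammar look-up succeeded: replace the matched nodes.
                    nodes[i:i + n] = [(lhs, nodes[i:i + n])]
                    reduced = True
                    break
            if reduced:
                break
    return nodes


def brackets(node):
    """Render a tree in bracket notation."""
    cat, body = node
    if isinstance(body, str):
        return f"[{cat} {body}]"
    return f"[{cat} " + " ".join(brackets(child) for child in body) + "]"


result = parse("Jane buys a house.")
print(brackets(result[0]))
```

A parse is complete when a single S node remains. Greedy reduction commits to the first rule that applies and can get stuck on sentences with PP attachment, which is why practical parsers backtrack or keep a chart of alternatives.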
27Syntactical Parsing
- POS sequence: det N Rel V Prep det N adv V det N Prep det N
- The woman who won in the lottery yesterday bought the house across the street.
- (Figure: parse tree with nodes S, NP, VP, SRel and PP)
28Syntactical Phenomena
- certain syntactical phenomena cannot be covered by simple CFGs
- Agreement of morpho-syntactic features
- English: Subject-Verb (agreement in person and number)
- German: Subject-Verb (person, number), determiner - adjective - noun (case, number, gender)
- Subcategorization: ensure the correct number of arguments for a verb
- intransitive verbs: Jane sleeps. / ungrammatical: Jane sleeps the house.
- transitive verbs: Jane bought the house. / ungrammatical: Jane bought.
29Syntactical Phenomena
- Long-distance dependencies
- Maria hat letzten Sonntag, obwohl sie sich nicht gut gefühlt hat, die Hausaufgaben gemacht. (Last Sunday, although she did not feel well, Maria did her homework.)
- Variable word order in German
- Maria gibt dem Mann das Buch. (Maria gives the man the book.)
- Dem Mann gibt Maria das Buch.
- Das Buch gibt Maria dem Mann.
- ungrammatical: Maria dem Mann gibt das Buch.
30Unification Grammars
- feature structures: represent properties of linguistic objects
- categorial (POS) information
- agreement information
- case, tense, ...
- basic principle: unification
31Unification Grammars
- agreement of subject and verb using reentrancies (shared values, marked by the tag [1] in the feature structure figure)
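Unification itself is easy to sketch when feature structures are modeled as nested dictionaries. The sketch below is an assumption-laden simplification: it has no types and no real structure sharing, so the reentrancy is approximated by unifying the subject's agreement value with the value the verb asks for.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts); return None on a clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, value in b.items():
            if key in result:
                merged = unify(result[key], value)
                if merged is None:
                    return None
                result[key] = merged
            else:
                result[key] = value
        return result
    # Atomic values unify only if they are identical.
    return a if a == b else None


# Hypothetical agreement features.
she = {"person": 3, "number": "sg"}
i_pronoun = {"person": 1, "number": "sg"}
goes_subject = {"person": 3, "number": "sg"}   # what "goes" requires of its subject

print(unify(she, goes_subject))        # compatible: "she goes"
print(unify(i_pronoun, goes_subject))  # clash in person: "I goes" is rejected
```

Unification both checks compatibility and merges information, so an underspecified structure such as {"person": 3} is filled in by its partner.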
32HPSG
- Head-driven phrase structure grammar (Pollard & Sag, 1987)
- composition of sentence from phrase constituents as in phrase structure grammar
- typed feature structures
- unification
- → ensure agreement of feature values and correct subcategorization
33Subcategorization in HPSG
- She drinks wine
- verb drink requires 2 arguments to fill its subcategorization frame
- subject [1]
- object [2]
34Parsing Difficulties
- Lexical Ambiguities
- word is assigned multiple POS tags in lexicon
- Time flies like an arrow.
- → try to disambiguate during parsing
- Structural Ambiguities
- sentence has multiple correct parses
- sentence constituent may be part of several grammar rules
- Jane saw the man with the telescope: NP → NP PP
- Jane saw the man with the telescope: VP → V NP PP
- Garden Path sentences
- initial parse leads into the wrong direction (garden path)
- The horse raced past the barn fell.
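The structural ambiguity can be verified mechanically by counting parse trees under the example grammar from the Phrase Structure Grammar slide. The sketch below is a memoized span-based counter written for illustration; it uses "sees" instead of "saw" because only "sees" is in the example lexicon, and the function name count_parses is an assumption.

```python
from functools import lru_cache

RULES = {
    "S": [("NP", "VP")],
    "NP": [("det", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("V", "NP", "PP")],
    "PP": [("P", "NP")],
}
LEXICON = {"Jane": "NP", "house": "N", "telescope": "N", "man": "N",
           "the": "det", "a": "det", "sees": "V", "buys": "V", "with": "P"}


def count_parses(sentence, goal="S"):
    """Count the distinct parse trees of `sentence` rooted in `goal`."""
    words = tuple(sentence.rstrip(".").split())

    @lru_cache(maxsize=None)
    def count(cat, i, j):
        # Number of trees with root `cat` spanning words[i:j].
        total = 0
        if j - i == 1 and LEXICON.get(words[i]) == cat:
            total += 1
        for rhs in RULES.get(cat, []):
            total += split(rhs, i, j)
        return total

    @lru_cache(maxsize=None)
    def split(rhs, i, j):
        # Ways to divide words[i:j] among the symbols of `rhs`.
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        # The first symbol covers words[i:k]; keep one word per later symbol.
        return sum(count(rhs[0], i, k) * split(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return count(goal, 0, len(words))


print(count_parses("Jane buys a house."))
print(count_parses("Jane sees the man with the telescope."))
```

The second sentence receives one tree in which the PP attaches to the noun phrase (NP → NP PP) and one in which it attaches to the verb phrase (VP → V NP PP).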
35Structure Ambiguities / Garden Path sentences
- NP/VP Attachment Ambiguity
- The cop saw the burglar with the binoculars
- The cop saw the burglar with the gun
- NP/S Complement Attachment Ambiguity
- The athlete realised his goal last week
- The athlete realised his shoes were across the room
- Clause-boundary Ambiguity
- Since Jay always jogs a mile the race doesn't seem very long
- Since Jay always jogs a mile doesn't seem very long
- Reduced Relative-Main Clause Ambiguity
- The woman delivered the junkmail on Thursdays
- The woman delivered the junkmail threw it away
- Relative/Complement Clause Ambiguity
- The doctor told the woman that he was in love with to leave
- The doctor told the woman that he was in love with her
36Semantics
- meaning of natural language constructs: words, sentences, text/discourse
- compositionality of meaning
- meaning of a sentence is composed from the meaning of its parts
- Semantic formalisms to represent natural language meaning
- First order logic
- Higher order logic formalisms
- Frame semantics
37First order logic
- composition of sentence meaning
- incrementally constructed
- meaningful constituents
- NP Peter: peter
- V loves: love(_,_)
- NP Jane: jane
- VP loves Jane: love(_, jane)
- S Peter loves Jane: love(peter, jane)
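The incremental construction can be mimicked with curried functions: the verb takes its object first and its subject second, mirroring the VP and S nodes of the tree. Encoding formulas as strings is an assumption made purely for illustration.

```python
# Lexical meanings: constants for the names, a curried function for the verb.
peter = "peter"
jane = "jane"


def loves(obj):
    """Meaning of the verb: love(_,_), waiting for object and then subject."""
    def with_subject(subj):
        return f"love({subj}, {obj})"
    return with_subject


# VP = V applied to the object NP; S = VP applied to the subject NP.
vp = loves(jane)   # corresponds to love(_, jane)
s = vp(peter)      # corresponds to love(peter, jane)
print(s)
```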
38Higher order logic
- Limitations of first order logic
- The red door: door(x) ∧ red(x)
- The alleged criminal: criminal(x) ∧ alleged(x) ???
- Every student writes a paper: ∀x (student(x) → ∃y (paper(y) ∧ write(x, y)))
- → solution: Type theory
39Backup Type Theory
- words are assigned types according to their abilities to merge with other words
- type e: entity, type t: truth values
- composition along the syntactical tree
- S John sleeps: t; NP John: e; VP sleeps: <e,t>
- S John loves Mary: t; VP loves Mary: <e,t>; NP John: e; V loves: <e,<e,t>>; NP Mary: e
40Backup Type Theory - Complex Example
- S every alleged criminal loves Mary: t
- NP every alleged criminal: <<e,t>,t>
- N alleged criminal: <e,t>
- VP loves Mary: <e,t>
- det every: <<e,t>,<<e,t>,t>>
- adj alleged: <<e,t>,<e,t>>
- N criminal: <e,t>
- V loves: <e,<e,t>>
- NP Mary: e
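The type assignments in this example can be checked mechanically. In the sketch below a functional type <a,b> is encoded as the Python tuple (a, b); the helper names fn and combine are assumptions of this illustration.

```python
E, T = "e", "t"


def fn(arg, res):
    """Functional type <arg,res>."""
    return (arg, res)


def combine(left, right):
    """Type of two sister nodes combined by function application, or None."""
    if isinstance(left, tuple) and left[0] == right:
        return left[1]
    if isinstance(right, tuple) and right[0] == left:
        return right[1]
    return None


ET = fn(E, T)
TYPES = {
    "every": fn(ET, fn(ET, T)),   # <<e,t>,<<e,t>,t>>
    "alleged": fn(ET, ET),        # <<e,t>,<e,t>>
    "criminal": ET,               # <e,t>
    "loves": fn(E, ET),           # <e,<e,t>>
    "Mary": E,                    # e
}

noun = combine(TYPES["alleged"], TYPES["criminal"])   # N: <e,t>
noun_phrase = combine(TYPES["every"], noun)           # NP: <<e,t>,t>
verb_phrase = combine(TYPES["loves"], TYPES["Mary"])  # VP: <e,t>
sentence = combine(noun_phrase, verb_phrase)          # S: t
print(noun, noun_phrase, verb_phrase, sentence)
```

Because "alleged" is a function over noun meanings instead of a predicate over entities, the problematic conjunction criminal(x) ∧ alleged(x) never arises.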
41Frame Semantics
- meaning depends on knowledge about the world
- world knowledge is encoded in frames: mental representations of stereotypical situations
- to buy: commercial transaction frame
- elements involved: seller, buyer, goods, price (required), invoice, receipt (optional)
- relations within frames
- seller owns goods, determines price; buyer has money, pays price; exchange of goods and money; reversed property situation afterwards
- relations between frames
- to buy / to sell: same elements, direction reversed
42Challenges of Natural Language Processing
- disambiguation
- word sense
- structural
- co-references: referring to objects within and across sentence boundaries
- anaphora: pronouns
- The man goes to work. He takes the bus.
- deictic references depend on context
- here, now, I, you: refer to different objects depending on the context in which they occur
- referring to the same object using a synonym or hypernym
- Jane bought the house on Elm Street. The building had just been put on the market.
43Translation Challenges
- problematic issues when dealing with machine translation in particular
- Translation divergences
- meaning is conveyed, but syntactic structure and semantic distribution of meaning components are different in the two languages
- Translation mismatches
- difference in information content between source and target language
44Translation Divergences
- Structural divergence: word order
- En: a delicious soup → Sp: una sopa deliciosa
- En: She saw that he left the house after dinner → De: Sie sah, dass er das Haus nach dem Abendessen verließ.
- Thematic divergence: changes of grammatical role
- En: You like her. → Sp: Ella te gusta.
- Lit: She you_acc pleases.
- Head switching
- En: The baby just ate. → Sp: El bebé acaba de comer.
- Lit: The baby finishes of to-eat
- syntactic head translated as modifier / complement of new head
45Translation Divergences cont.
- Lexical Gap
- En: rice → Japanese: cooked-rice, rice-plant, uncooked-rice
- 1:n mapping, no direct correspondence
- Lexicalization
- En: swim across → Sp: cruzar nadando
- Lit: to cross swimming
- semantic content differently distributed
- Categorial
- En: a little bread → Sp: un poco de pan
- adjective → noun
- Collocational
- En: make a decision → De: eine Entscheidung treffen
46Linguistic approach to Machine Translation
- Perform translation at different levels of linguistic abstraction
- Direct translation: no abstraction
- Syntactic transfer
- Semantic transfer
- Interlingua
- (Figure: Vauquois Triangle)
47Direct Translation
- earliest approach to MT
- simple word-level analysis and generation
- POS
- morphology
- source-target language dictionary
- bilingual word mapping
- Problems
- idiomatic expressions
- different word order, structural shifts
- Pipeline: John eats an apple. → morphological analysis → bilingual dictionary → morphological generation → John isst einen Apfel.
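The three stages of direct translation can be sketched with lookup tables. All three tables are hypothetical toy data for the single example sentence, and the word order of the source is simply kept.

```python
# Hypothetical toy tables for English -> German.
ANALYSIS = {                      # surface form -> (lemma, features)
    "John": ("John", "name"),
    "eats": ("eat", "3sg"),
    "an": ("a", "indef"),
    "apple": ("apple", "sg"),
}
DICTIONARY = {"John": "John", "eat": "essen", "a": "ein", "apple": "Apfel"}
GENERATION = {                    # (target lemma, features) -> surface form
    ("John", "name"): "John",
    ("essen", "3sg"): "isst",
    # Assumption: the table simply lists the accusative masculine form.
    # A direct system has no syntactic analysis from which to derive case.
    ("ein", "indef"): "einen",
    ("Apfel", "sg"): "Apfel",
}


def translate(sentence):
    """Translate word by word: analysis, dictionary look-up, generation."""
    output = []
    for word in sentence.rstrip(".").split():
        lemma, features = ANALYSIS[word]
        output.append(GENERATION[(DICTIONARY[lemma], features)])
    return " ".join(output) + "."


print(translate("John eats an apple."))
```

The example only works because English and German happen to share the word order here; reordering and idioms are exactly what this architecture cannot express.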
48Transfer-based systems
- second generation of MT systems: Rule-based MT
- Transfer in 3 steps
- analysis of source sentence → abstract representation
- transfer of source language representation into target language representation
- generation: target language representation → surface form of target language sentence
- System Components
- lexica
- monolingual source language lexicon
- monolingual target language lexicon
- bilingual dictionary entries
- grammar
- monolingual analysis and generation
- bilingual transfer rules
49Syntactical Transfer
- Level of abstraction: syntactical representation
- Analysis of source language (SL) sentence into source language dependent syntactic tree
- Transfer of SL syntactical tree into target language (TL) syntactical tree
- Generation of TL natural language sentence
- Example: John eats an apple. → John isst einen Apfel.
- Analysis: monolingual lexicon, monolingual grammar
- Transfer: bilingual lexical rules, bilingual grammatical rules
- Generation: monolingual lexicon, monolingual grammar
50Example Translation
- Det Adj N → Det N Adj
- A delicious soup → Una sopa deliciosa
- SL Tree: [NP Det [N1 Adj N]]
- TL Tree: [NP Det [N1 N Adj]]
51Syntactical Tree Transformation Rules
- NP [tv(X) tv(Y)] → NP [tv(X) tv(Y)]
- N1 [Adj N: tv(A) tv(B)] → N1 [N Adj: tv(B) tv(A)]
- Det → Det: A → Una
- delicious → deliciosa
- soup → sopa
- tv: translation variable
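A sketch of how such tree transformation rules might be applied, assuming trees are encoded as (label, children) pairs. The encoding, the function names and the toy bilingual lexicon (which hard-codes the feminine forms) are assumptions of this illustration.

```python
# Hypothetical bilingual lexical rules for the example phrase.
LEXICON = {"a": "una", "delicious": "deliciosa", "soup": "sopa"}


def transfer(tree):
    """Map an SL tree to a TL tree: translate leaves, reorder inside N1."""
    label, body = tree
    if isinstance(body, str):
        return (label, LEXICON[body])
    # Translate the subtrees first (the translation variables).
    children = [transfer(child) for child in body]
    # Structural rule: N1 -> Adj N becomes N1 -> N Adj.
    if label == "N1" and [child[0] for child in children] == ["Adj", "N"]:
        children.reverse()
    return (label, children)


def leaves(tree):
    """Read the words off a tree from left to right."""
    label, body = tree
    if isinstance(body, str):
        return [body]
    return [word for child in body for word in leaves(child)]


sl_tree = ("NP", [("Det", "a"), ("N1", [("Adj", "delicious"), ("N", "soup")])])
tl_tree = transfer(sl_tree)
print(" ".join(leaves(tl_tree)))
```

The NP rule leaves the order of its children untouched, so only the N1 rule contributes the reordering.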
52Semantical Transfer
- Level of abstraction: semantic representation
- Analysis of SL sentence into semantic representation
- Transfer of SL representation into TL representation
- Generation of TL natural language sentence
- Example: John eats an apple. → John isst einen Apfel.
- Analysis: monolingual lexicon, monolingual grammar
- SL representation: ∃x (eat(John, x) ∧ apple(x))
- Transfer: bilingual lexical rules, bilingual grammatical rules
- TL representation: ∃x (essen(John, x) ∧ apfel(x))
- Generation: monolingual lexicon, monolingual grammar
53Example translation
- you like her → ella te gusta
- SL representation: pres, like, L, aterm(pro, X, and, hearer, X, personal, X), aterm(pro, Y, and, fem, Y, sing, Y, personal, Y)
- TL representation: pres, gustar, L, aterm(pro, Y, and, fem, Y, sing, Y, personal, Y), aterm(pro, X, and, hearer, X, personal, X)
- Transfer rule: like, E, Arg1E, Arg2E <> gustar, E, Arg2S, Arg1S :- Arg1E <> Arg1S, Arg2E <> Arg2S.
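The effect of the transfer rule, mapping like to gustar with the two arguments exchanged, can be illustrated on a much simplified representation. The tuple encoding, the rule table and the second entry (eat to comer, without a swap) are assumptions of this sketch, not the formalism used on the slide.

```python
# Hypothetical simplified representation: (tense, predicate, arg1, arg2).
PREDICATE_RULES = {
    # source predicate -> (target predicate, do the arguments swap roles?)
    "like": ("gustar", True),
    "eat": ("comer", False),
}


def transfer(representation):
    """Apply the bilingual predicate rule; arguments are passed through."""
    tense, predicate, arg1, arg2 = representation
    target_predicate, swap = PREDICATE_RULES[predicate]
    if swap:
        arg1, arg2 = arg2, arg1
    return (tense, target_predicate, arg1, arg2)


you = ("pro", "hearer", "personal")
her = ("pro", "fem", "sing", "personal")

print(transfer(("pres", "like", you, her)))
```

The thematic divergence is handled entirely by the rule for the predicate; the pronoun descriptions themselves stay language-neutral.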
54Transfer-based MT systems
- translation process based on linguistic properties
- actual linguistic theories, e.g. HPSG, Frame Semantics
- or system-internal linguistic representation tailored for translation purposes
- variable level of abstraction
- transfer rules explicitly model the differences between languages
- Disadvantages
- (bilingual) language specialist required to develop linguistic components
- including new languages
- 3 new components: Analysis, Transfer, Generation
55Interlingua MT
- Idea
- intermediate language
- abstract language-independent representation: pure meaning
- Translation without transfer
- analyze input sentence and generate interlingua representation
- generate target language sentence directly from interlingua representation
- access to world knowledge
- (Figure: English, German, French, ... source sentences → Interlingua → English, German, French, ... target sentences)
56Representations in Interlingua MT
- Interlingua Representation
- language independent
- encode linguistic knowledge
- subject, objects, ...
- non-linguistic knowledge
- pragmatic factors: focus, relations among text units, speaker attitudes and intentions, stylistics, speech acts, deictic references, prior context, physical context
- Representation of World Knowledge
- ontology: non-linguistic knowledge about things and relations between them
- an apple is a fruit; property of fruits: edible; property of eat: subject needs to be animate
- inference mechanisms
- logical operators: and, or, if-then-else
- textual entailment: John stopped smoking → He must have smoked before.
57Interlingua Translation
- Level of abstraction: pure meaning
- Analysis of SL sentence into interlingua representation
- Generation of TL natural language sentence from interlingua
- Example: John eats an apple. → John isst einen Apfel.
- Analysis: monolingual lexicon, monolingual grammar
- Interlingua representation
- Generation: monolingual lexicon, monolingual grammar
- World knowledge
58Interlingua MT
- lower engineering effort for including new languages
- 1 analysis component for each new source language
- 1 generation component for each new target language
- no transfer rules needed
- Translation divergences are handled at monolingual level
- suitable encoding of lexical and grammar entries necessary
- Disadvantages
- language specialist still required
- world knowledge necessary
- → available systems use a domain model covering only a small domain
- true interlingua not reached so far
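The difference in engineering effort can be made concrete with a small count. The sketch assumes that every one of n languages serves as both source and target and that each transfer module is directional (one per ordered language pair); both assumptions go beyond what the slides state.

```python
def transfer_components(n):
    """n analysis + n generation + one transfer module per ordered pair."""
    return 2 * n + n * (n - 1)


def interlingua_components(n):
    """n analysis + n generation components, no transfer modules."""
    return 2 * n


for n in (2, 3, 10):
    print(n, transfer_components(n), interlingua_components(n))
```

Under these assumptions the transfer approach grows quadratically with the number of languages, while the interlingua approach grows linearly.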
59Conclusion
- linguistic approaches to machine translation
- direct translation
- transfer of linguistic representations
- translation via interlingua
- based on lexical and grammatical knowledge about languages
- require a language expert to design lexicon and grammar components of the MT system