Linguistic Approaches to Machine Translation - PowerPoint PPT Presentation


1
Linguistic Approaches to Machine Translation
  • May 5, 2009
  • Teresa Herrmann

2
Outline
  • Motivation
  • Linguistic Background
  • Translation Challenges
  • Linguistic approaches to Machine Translation
  • Direct translation
  • Translation by transfer
  • Interlingua approach

3
Motivation
  • Machine Translation
  • of natural language text / speech
  • First MT approaches model human translation
    process
  • language-specific analysis, transfer and
    generation
  • lexical and grammatical knowledge about source
    and target language

4
Linguistic Terminology and Background
  • Word-level
  • Morphology
  • Word Classes and Grammatical Categories
  • Lexical semantics
  • Sentence-level
  • Syntax: structure of sentences
  • Semantics: representation of meaning

5
What is a word?
  • The quick brown fox jumps over the lazy dog.
  • tokenization: segment a sequence of characters
    into words
  • convention in Western languages: word boundaries
    are marked by spaces
  • many Asian languages: no spaces between words or
    even sentences
  • Thequickbrownfoxjumpsoverthelazydog
  • Ambiguous characters make word segmentation
    difficult
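
The contrast between the two cases can be sketched in a few lines. This is only an illustration: the vocabulary and both function names are hypothetical, and greedy longest-match segmentation is just one simple strategy for unspaced text, not a method taken from the slides.

```python
def whitespace_tokenize(text):
    """Split on spaces and strip surrounding punctuation."""
    return [tok.strip(".,!?") for tok in text.split() if tok.strip(".,!?")]


def max_match(text, vocab):
    """Greedy longest-match segmentation of text without spaces."""
    tokens, i = [], 0
    lowered = text.lower()
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for j in range(len(text), i, -1):
            if lowered[i:j] in vocab or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens


# Hypothetical toy vocabulary covering only the example sentence.
VOCAB = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}

print(whitespace_tokenize("The quick brown fox jumps over the lazy dog."))
print(max_match("Thequickbrownfoxjumpsoverthelazydog", VOCAB))
```

With a vocabulary that contains overlapping entries, the greedy strategy can pick the wrong boundary, which is one source of the segmentation difficulty mentioned above.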

6
Morphology
  • internal structure of words, word formation
  • words are composed of morphemes: the smallest
    meaning-carrying units of language
  • house, house-s, small, small-est,
    un-predict-able, un-happi-ness
  • morpheme types
  • stem morphemes: may appear as separate words
    (house-s, Tisch-e)
  • functional or bound morphemes: need to be
    connected with stem morphemes
  • affix types: prefix, suffix, infix, circumfix
  • un-happy, kauf-st, Gespräch-s-ablauf, ge-kauf-t

7
Morphology
  • Word formation through morpheme composition
  • Inflection: tense, number, person, case
  • walk-s, walk-ed, kauf-st, kauf-te, car, car-s,
    Haus, (des) Haus-es, ein-e schön-e Blume,
    arbol-es verde-s
  • function of the morpheme: adds information about
    tense, number, person and case
  • Derivation
  • happi-ness, un-predict-able, Zufrieden-heit,
    un-brauch-bar, ent-kleiden
  • bound morphemes derive new words, e.g. noun →
    adjective
  • Composition
  • rain-bow, water-proof, Haus-tür, Einkauf-s-wagen,
    eis-kalt
  • modification of the stem morpheme → creates new
    words

8
Morphology Specialities
  • morpho-phonological processes at morpheme
    boundaries
  • stick / sticks, go / goes
  • Fugen-s in German compounds:
    Universität-s-professor
  • German Umlaut: Mutter / Mütter
  • Vowel harmony
  • Hungarian, Turkish: choice of suffix depends on
    the vowels already present in the word
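
As a small illustration of vowel harmony, the Turkish plural suffix is -lar after a back vowel and -ler after a front vowel. The sketch below assumes lower-case input and ignores loanword exceptions; the function name is hypothetical.

```python
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")


def turkish_plural(noun):
    """Attach -lar or -ler depending on the last vowel of the noun."""
    for ch in reversed(noun):
        if ch in BACK_VOWELS:
            return noun + "lar"
        if ch in FRONT_VOWELS:
            return noun + "ler"
    raise ValueError(f"no vowel found in {noun!r}")


print(turkish_plural("kitap"))  # kitaplar (books)
print(turkish_plural("ev"))     # evler (houses)
```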

9
Morphological Analysis
  • finite state automata
  • grammatical and lexical knowledge
  • inflectional schemata
  • lexical information: which inflectional class
    each word belongs to
  • very good coverage and fast for most languages
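
The interplay of lexicon and inflectional schemata can be sketched with plain lookup tables. This is a table-driven stand-in for a real finite-state analyser, and every entry and class name below is a hypothetical toy example.

```python
# Hypothetical toy lexicon: stem -> inflectional class.
LEXICON = {"walk": "verb_regular", "car": "noun_s", "house": "noun_s"}

# Hypothetical inflectional schemata: class -> {suffix: features}.
SCHEMATA = {
    "verb_regular": {"": "base", "s": "3rd person singular", "ed": "past"},
    "noun_s": {"": "singular", "s": "plural"},
}


def analyze(word):
    """Return every (stem, features) reading licensed by lexicon and schemata."""
    readings = []
    for cut in range(len(word), 0, -1):
        stem, suffix = word[:cut], word[cut:]
        inflection_class = LEXICON.get(stem)
        if inflection_class and suffix in SCHEMATA[inflection_class]:
            readings.append((stem, SCHEMATA[inflection_class][suffix]))
    return readings


print(analyze("walked"))
print(analyze("houses"))
```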

10
Word Classes
  • The role a word plays within a sentence is
    determined by its part of speech (POS)
  • Noun: power, apple, beauty, ...
  • Verb: go, sleep, gehen, essen, ...
  • Adjective: red, happy, asleep, groß (Haus),
    schnell (Auto), ...
  • Adverb: often, happily, immediately, schnell
    (fahren), ...
  • Determiner: the, a, all (the), which, der, eine,
    alle, ...
  • Pronoun: she, it, them, er, sie, es, mein,
    wessen, ...
  • Preposition: under, of, in, auf, bei, ...
  • Conjunction: and, because, if, und, auch,
    obwohl, ...

11
POS Tagger
  • a word's POS gives useful information for
    translation
  • ambiguous words: can (noun), can (verb)
  • statistical POS tagger
  • assign each word a POS based on relative
    frequency counts and its context in a training
    corpus
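
A minimal sketch of such a tagger, assuming a tiny hand-made training corpus: each word gets the tag that maximises the product of two relative frequencies, P(tag | word) and P(tag | previous tag). The corpus, the tag names and the NOUN fallback for unknown words are all illustrative assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical toy training corpus of (word, tag) pairs.
TRAIN = [
    [("I", "PRON"), ("can", "VERB"), ("swim", "VERB")],
    [("the", "DET"), ("can", "NOUN"), ("fell", "VERB")],
    [("a", "DET"), ("can", "NOUN"), ("broke", "VERB")],
]

word_tags = defaultdict(Counter)   # word -> tag counts
tag_after = defaultdict(Counter)   # previous tag -> following tag counts
for sentence in TRAIN:
    prev = "<s>"
    for word, t in sentence:
        word_tags[word.lower()][t] += 1
        tag_after[prev][t] += 1
        prev = t


def rel_freq(counter, key):
    """Relative frequency of key within a Counter (0.0 if the Counter is empty)."""
    total = sum(counter.values())
    return counter[key] / total if total else 0.0


def pos_tag(words):
    """Greedy left-to-right tagging using word and context frequencies."""
    tagged, prev = [], "<s>"
    for word in words:
        candidates = word_tags.get(word.lower())
        if not candidates:
            best = "NOUN"  # assumption: unknown words default to NOUN
        else:
            best = max(
                candidates,
                key=lambda t: rel_freq(candidates, t) * rel_freq(tag_after[prev], t),
            )
        tagged.append((word, best))
        prev = best
    return tagged


print(pos_tag(["I", "can", "swim"]))
print(pos_tag(["the", "can", "fell"]))
```

In the toy corpus "can" is more often a noun, so the context term is what selects the verb reading after a pronoun.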

12
Grammatical Categories - nouns, pronouns, adjectives
  • person
  • pronoun and verb have to agree in this feature:
    I go, he goes
  • pro-drop (pronoun-dropping) languages, e.g.
    (yo) hablo 'I speak'
  • number
  • singular, dual, trial, plural; singular I vs.
    plural we
  • gender
  • masculine, feminine, neuter, animate, inanimate
  • case
  • role of participant within the phrase:
    distinguishes subject, object, ...
  • nominative, genitive, accusative, dative,
    partitive, locative, ...
13
Grammatical Categories - verbs
  • tense: future, past or present
  • walk, is walking, walked, have walked, had
    walked, will walk, will have walked
  • Japanese: even adjectives and nouns are marked,
    e.g. shiroi (white), shirokatta (was white)
  • aspect
  • completeness, habituality, progressiveness
  • e.g. imperfect vs. perfect; habitual (she sings),
    continuous (she is singing)
  • mood
  • factuality, likelihood, possibility, uncertainty
  • indicative (he is here), subjunctive (if he were
    here), optative
  • voice
  • active (Mary kisses John), passive (John is
    kissed by Mary), middle (Greek: I get myself
    taught), causative

14
Lexical Semantics
  • ambiguous meanings of words
  • polysemy: words with the same surface form have
    different (related) meanings
  • interest: Interesse, Zinsen, Anteil
  • bank: financial institution, to sit on, of a
    river
  • homonymy: completely unrelated meanings
  • can: you can do it!, a can of beans
  • → in order to translate, the correct meaning
    within the given context has to be identified:
    word sense disambiguation
  • relations between words
  • synonymy: need - require
  • antonymy: related - unrelated, big - small,
    cheap - expensive
  • hypernymy (is-a): house - building
  • meronymy (part-of): door - house

15
Sentences and sentence structure
  • sequence of words terminated by a punctuation
    mark: full stop, question/exclamation mark, ...
  • Jane bought the house.
  • SUBJECT VERB (OBJECT)
  • subject of a sentence is a phrase headed by a
    noun → noun phrase (NP)
  • Jane, the woman, a woman, the young woman, she,
    the young woman who lives across the street
  • the verb is in the second position (in English)
  • number of objects is determined by the verb and
    its valency
  • transitive verbs: 1 or more objects, e.g. to buy
    s.th. (valency 1), to give s.o. s.th. (valency 2)
  • intransitive verbs: 0 objects, e.g. to sleep
  • verb and object(s) together form a constituent →
    verb phrase (VP)
  • valency needs to be saturated for the sentence to
    be complete
  • ungrammatical: *Jane bought. *Jane slept the
    house.

16
Sentence structure cont.
  • additional information can be added to the
    sentence in terms of adjuncts
  • prepositional phrases (PP)
  • Jane bought the house (from Jim) (without
    hesitation).
  • Jane bought the house (in the posh neighborhood
    across the river).
  • adverbs
  • (Yesterday) Jane bought the house (cheaply).
  • embedded clauses
  • relative clauses
  • Jane (who recently won in the lottery) bought the
    house (that was just put on the market).
  • → nested sentences, recursive structure of
    sentences

17
Syntactical Theory
  • Assumption
  • natural language sentences follow certain
    regularities
  • constituents
  • precedence
  • sentence structure can be modeled by a
    context-free grammar, e.g. Phrase Structure
    Grammar
  • G = <V, Σ, P, S> where
  • V: non-terminal symbols, here: syntactic
    categories
  • Σ, subset of V: terminal symbols, here: POS
  • P: set of production rules describing constituent
    structure
  • S: start symbol, category Sentence

18
Phrase Structure Grammar
  • Example grammar: production rules
  • S → NP VP
  • NP → det N
  • NP → NP PP
  • VP → V NP
  • VP → V NP PP
  • PP → P NP
  • Example lexicon
  • Jane: NP
  • house: N
  • telescope: N
  • man: N
  • the: det
  • a: det
  • sees: V
  • buys: V
  • with: P

19
Syntactical Parsing
  • given a natural language sentence, return a
    syntactical parse tree
  • headed by an S node
  • spanning all words in the sentence
  • Parsing strategies
  • bottom-up
  • top-down

20
Syntactical Parsing
  • Input: Jane buys a house.

21
Syntactical Parsing
  • Lexicon look-up: Jane → NP, buys → V, a → det,
    house → N
  • NP V det N (Jane buys a house.)

22
Syntactical Parsing
  • Grammar look-up: no rules
  • NP V det N (Jane buys a house.)

23
Syntactical Parsing
  • Grammar look-up: no rules
  • NP V det N (Jane buys a house.)

24
Syntactical Parsing
  • Grammar look-up: NP → det N
  • NP V [NP det N] (Jane buys a house.)

25
Syntactical Parsing
  • Grammar look-up: VP → V NP
  • NP [VP V [NP det N]] (Jane buys a house.)

26
Syntactical Parsing
  • Grammar look-up: S → NP VP
  • [S NP [VP V [NP det N]]] (Jane buys a house.)
  • no words or nodes left
  • highest node: S
  • → parse complete
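
The bottom-up walk-through can be sketched as a greedy reduction loop over the example grammar and lexicon from slide 18. This is a simplification: it applies the first matching rule and never backtracks, so it is only meant to reproduce the trace for this sentence. The function name and the tree encoding are assumptions.

```python
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("det", "N")),
    ("NP", ("NP", "PP")),
    ("VP", ("V", "NP")),
    ("VP", ("V", "NP", "PP")),
    ("PP", ("P", "NP")),
]
LEXICON = {"Jane": "NP", "house": "N", "telescope": "N", "man": "N",
           "the": "det", "a": "det", "sees": "V", "buys": "V", "with": "P"}


def parse_bottom_up(sentence):
    """Greedy bottom-up reduction; returns the final nodes and the label trace."""
    # Lexicon look-up: one (label, word) node per word.
    nodes = [(LEXICON[w], w) for w in sentence.rstrip(".").split()]
    trace = [[label for label, _ in nodes]]
    reduced = True
    while reduced:
        reduced = False
        for lhs, rhs in RULES:
            n = len(rhs)
            for i in range(len(nodes) - n + 1):
                if tuple(label for label, _ in nodes[i:i + n]) == rhs:
                    # Replace the matched nodes by one parent node.
                    nodes[i:i + n] = [(lhs, nodes[i:i + n])]
                    trace.append([label for label, _ in nodes])
                    reduced = True
                    break
            if reduced:
                break
    return nodes, trace


nodes, trace = parse_bottom_up("Jane buys a house.")
for step in trace:
    print(" ".join(step))
```

The parse is complete when a single node labelled S spans all words.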
27
Syntactical Parsing
  • det N Rel V Prep det N adv
    V det N Prep det N
  • The woman who won in the lottery yesterday bought
    the house across the street.
  • Figure: parse tree rooted in S, with NP, VP, SRel
    and PP nodes spanning the sentence
28
Syntactical Phenomena
  • certain syntactical phenomena cannot be covered
    by simple CFGs
  • Agreement of morpho-syntactic features
  • English: subject - verb (agreement in person and
    number)
  • German: subject - verb (person, number),
    determiner - adjective - noun (case, number,
    gender)
  • Subcategorization: ensure the correct number of
    arguments for a verb
  • intransitive verbs: Jane sleeps. / *Jane sleeps
    the house.
  • transitive verbs: Jane bought the house. / *Jane
    bought.

29
Syntactical Phenomena
  • Long-distance dependencies
  • Maria hat letzten Sonntag, obwohl sie sich nicht
    gut gefühlt hat, die Hausaufgaben gemacht.
  • Variable word order in German
  • Maria gibt dem Mann das Buch.
  • Dem Mann gibt Maria das Buch.
  • Das Buch gibt Maria dem Mann.
  • *Maria dem Mann gibt das Buch. (ungrammatical)

30
Unification Grammars
  • feature structures represent properties of
    linguistic objects
  • categorial (POS) information
  • agreement information
  • case, tense, ...
  • basic principle: unification

31
Unification Grammars
  • agreement of subject and verb using reentrancies
    (shared tag [1])
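
A minimal sketch of unification over feature structures, encoded here as nested Python dictionaries. It checks compatibility of feature values only and does not model true reentrancy (structure sharing); the feature names and example entries are illustrative assumptions.

```python
def unify(fs1, fs2):
    """Unify two feature structures (nested dicts); return None on a clash."""
    result = dict(fs1)
    for feature, value in fs2.items():
        if feature not in result:
            result[feature] = value
        elif isinstance(result[feature], dict) and isinstance(value, dict):
            sub = unify(result[feature], value)
            if sub is None:
                return None
            result[feature] = sub
        elif result[feature] != value:
            return None
    return result


# Hypothetical entries: what the verb form 'drinks' demands of its subject.
subject_of_drinks = {"agr": {"person": 3, "number": "sg"}}
she = {"cat": "NP", "agr": {"person": 3, "number": "sg", "gender": "fem"}}
they = {"cat": "NP", "agr": {"person": 3, "number": "pl"}}

print(unify(she, subject_of_drinks))   # compatible: merged structure
print(unify(they, subject_of_drinks))  # number clash: None
```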

32
HPSG
  • Head-driven Phrase Structure Grammar (Pollard &
    Sag, 1987)
  • composition of sentence from phrase constituents
    as in phrase structure grammar
  • typed feature structures
  • unification
  • → ensures agreement of feature values and correct
    subcategorization

33
Subcategorization in HPSG
  • She drinks wine
  • the verb drink requires 2 arguments to fill its
    subcategorization frame
  • subject [1]
  • object [2]

34
Parsing Difficulties
  • Lexical Ambiguities
  • word is assigned multiple POS tags in lexicon
  • Time flies like an arrow.
  • → try to disambiguate during parsing
  • Structural Ambiguities
  • sentence has multiple correct parses
  • sentence constituent may be part of several
    grammar rules
  • Jane saw the man with the telescope (NP → NP PP)
  • Jane saw the man with the telescope (VP → V NP
    PP)
  • Garden Path sentences
  • initial parse leads into the wrong direction
    (garden path)
  • The horse raced past the barn fell.

35
Structure Ambiguities / Garden Path sentences
  • NP/VP Attachment Ambiguity
  • The cop saw the burglar with the binoculars
  • The cop saw the burglar with the gun
  • NP/S Complement Attachment Ambiguity
  • The athlete realised his goal last week
  • The athlete realised his shoes were across the
    room
  • Clause-boundary Ambiguity
  • Since Jay always jogs a mile the race doesn't
    seem very long
  • Since Jay always jogs a mile doesn't seem very
    long
  • Reduced Relative / Main Clause Ambiguity
  • The woman delivered the junkmail on Thursdays
  • The woman delivered the junkmail threw it away
  • Relative / Complement Clause Ambiguity
  • The doctor told the woman that he was in love
    with to leave
  • The doctor told the woman that he was in love
    with her

36
Semantics
  • meaning of natural language constructs: words,
    sentences, text/discourse
  • compositionality of meaning
  • meaning of a sentence is composed from the
    meaning of its parts
  • Semantic formalisms to represent natural language
    meaning
  • First order logic
  • Higher order logic formalisms
  • Frame semantics

37
First order logic
  • composition of sentence meaning
  • incrementally constructed
  • meaningful constituents

  • NP Peter: peter; V loves: love(_,_); NP Jane: jane
  • VP loves Jane: love(_, jane)
  • S Peter loves Jane: love(peter, jane)
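
The incremental construction can be mimicked with curried functions. The encoding below (constants as strings, the verb as a function that first takes the object and then the subject) is an assumption made for illustration.

```python
# Lexical meanings (assumed encoding).
peter = "peter"
jane = "jane"
loves = lambda obj: lambda subj: f"love({subj}, {obj})"

# VP 'loves Jane': the object slot is filled, the subject slot is still open.
vp = loves(jane)

# S 'Peter loves Jane': applying the VP meaning to the subject.
s = vp(peter)
print(s)  # love(peter, jane)
```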
38
Higher order logic
  • Limitations of first order logic
  • The red door: door(x) and red(x)
  • The alleged criminal: criminal(x) and alleged(x)
    ???
  • Every student writes a paper: ∀x (student(x) → ∃y
    (paper(y) AND write(x, y)))
  • → solution: Type theory

39
Backup Type Theory
  • words are assigned types according to their
    abilities to merge with other words
  • type e: entity; type t: truth value
  • composition along the syntactical tree
  • S John sleeps: t; NP John: e; VP sleeps: <e,t>
  • S John loves Mary: t; VP loves Mary: <e,t>; NP
    John: e; V loves: <e,<e,t>>; NP Mary: e

40
Backup Type Theory - Complex Example
  • S every alleged criminal loves Mary: t
  • NP every alleged criminal: <<e,t>,t>
  • N alleged criminal: <e,t>
  • VP loves Mary: <e,t>
  • det every: <<e,t>,<<e,t>,t>>; adj alleged:
    <<e,t>,<e,t>>; N criminal: <e,t>; V loves:
    <e,<e,t>>; NP Mary: e
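
The type assignments above can be checked mechanically. In this sketch a functional type <a,b> is encoded as the Python tuple (a, b); the helper names are hypothetical.

```python
E, T = "e", "t"


def fn(arg, result):
    """Functional type <arg, result>."""
    return (arg, result)


def apply_type(function_type, argument_type):
    """Result type of function application; raises TypeError on a mismatch."""
    if isinstance(function_type, tuple) and function_type[0] == argument_type:
        return function_type[1]
    raise TypeError(f"cannot apply {function_type} to {argument_type}")


ET = fn(E, T)
TYPES = {
    "Mary": E,
    "loves": fn(E, ET),            # <e,<e,t>>
    "criminal": ET,                # <e,t>
    "alleged": fn(ET, ET),         # <<e,t>,<e,t>>
    "every": fn(ET, fn(ET, T)),    # <<e,t>,<<e,t>,t>>
}

verb_phrase = apply_type(TYPES["loves"], TYPES["Mary"])      # loves Mary
noun = apply_type(TYPES["alleged"], TYPES["criminal"])       # alleged criminal
noun_phrase = apply_type(TYPES["every"], noun)               # every alleged criminal
sentence = apply_type(noun_phrase, verb_phrase)              # whole sentence
print(verb_phrase, noun, noun_phrase, sentence)
```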

41
Frame Semantics
  • meaning depends on knowledge about the world
  • world knowledge is encoded in frames: mental
    representations of stereotypical situations
  • to buy: commercial transaction frame
  • elements involved: seller, buyer, goods, price
    (required), invoice, receipt (optional)
  • relations within frames
  • seller owns goods, determines price, buyer has
    money, pays price, exchange of goods and money,
    reversed property situation afterwards
  • relations between frames
  • to buy - to sell: same elements, direction
    reversed
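
One way to picture a frame is as a record with required and optional slots, from which both perspectives (to buy, to sell) can be read off. The class, its method names and the example filler values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CommercialTransaction:
    # Required frame elements.
    seller: str
    buyer: str
    goods: str
    price: str
    # Optional frame elements, e.g. invoice or receipt.
    optional: dict = field(default_factory=dict)

    def as_buy(self):
        """Perspective of 'to buy': the buyer is the subject."""
        return f"{self.buyer} buys {self.goods} from {self.seller} for {self.price}"

    def as_sell(self):
        """Perspective of 'to sell': same elements, direction reversed."""
        return f"{self.seller} sells {self.goods} to {self.buyer} for {self.price}"


# Illustrative filler values, not taken from the slides.
frame = CommercialTransaction(seller="Jim", buyer="Jane",
                              goods="the house", price="a fair price")
print(frame.as_buy())
print(frame.as_sell())
```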

42
Challenges of Natural Language Processing
  • disambiguation
  • word sense
  • structural
  • co-references: referring to objects within and
    across sentence boundaries
  • anaphora: pronouns
  • The man goes to work. He takes the bus.
  • deictic references: depend on context
  • here, now, I, you refer to different
    objects depending on the context in which they
    occur
  • referring to the same object using a synonym or
    hypernym
  • Jane bought the house on Elm Street. The building
    had just been put on the market.

43
Translation Challenges
  • problematic issues when dealing with machine
    translation in particular
  • Translation divergences
  • the meaning is conveyed, but syntactic structure
    and semantic distribution of meaning components
    differ between the two languages
  • Translation mismatches
  • difference in information content between source
    and target language

44
Translation Divergences
  • Structural divergence: word order
  • En: a delicious soup → Sp: una sopa deliciosa
  • En: She saw that he left the house after dinner →
    De: Sie sah, dass er das Haus nach dem
    Abendessen verließ.
  • Thematic divergence: changes of grammatical role
  • En: You like her. → Sp: Ella te gusta.
  • Lit.: She you_acc pleases.
  • Head switching
  • En: The baby just ate. → Sp: El bebé acaba de
    comer.
  • Lit.: The baby finishes of to-eat
  • syntactic head translated as modifier /
    complement of new head

45
Translation Divergences cont.
  • Lexical Gap
  • En: rice → Japanese: cooked-rice, rice-plant,
    uncooked-rice
  • 1:n mapping, no direct correspondence
  • Lexicalization
  • En: swim across → Sp: cruzar nadando
  • Lit.: to cross swimming
  • semantic content differently distributed
  • Categorial
  • En: a little bread → Sp: un poco de pan
  • adjective → noun
  • Collocational
  • En: make a decision → De: eine Entscheidung
    treffen

46
Linguistic approach to Machine Translation
  • Perform translation at different levels of
    linguistic abstraction
  • Direct translation: no abstraction
  • Syntactic transfer
  • Semantic transfer
  • Interlingua

Figure: Vauquois Triangle
47
Direct Translation
  • earliest approach to MT
  • simple word-level analysis and generation
  • POS
  • morphology
  • source-target language dictionary
  • bilingual word mapping
  • Problems
  • idiomatic expressions
  • different word order, structural shifts

Figure: John eats an apple. → morphological analysis
→ bilingual dictionary → morphological generation →
John isst einen Apfel.
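
The three stages of direct translation can be sketched with lookup tables. All three tables are hypothetical toy resources that cover only the example sentence; in particular, the generation table simply hard-codes the accusative form "einen" needed here.

```python
# Morphological analysis: surface form -> (lemma, features).
ANALYSIS = {"John": ("John", "name"), "eats": ("eat", "3sg"),
            "an": ("a", "indef"), "apple": ("apple", "sg")}

# Bilingual dictionary: English lemma -> German lemma.
DICTIONARY = {"John": "John", "eat": "essen", "a": "ein", "apple": "Apfel"}

# Morphological generation: (German lemma, features) -> surface form.
GENERATION = {("John", "name"): "John", ("essen", "3sg"): "isst",
              ("ein", "indef"): "einen", ("Apfel", "sg"): "Apfel"}


def translate_direct(sentence):
    """Word-by-word translation: analysis, dictionary look-up, generation."""
    output = []
    for word in sentence.rstrip(".").split():
        lemma, features = ANALYSIS[word]
        output.append(GENERATION[(DICTIONARY[lemma], features)])
    return " ".join(output) + "."


print(translate_direct("John eats an apple."))  # John isst einen Apfel.
```

Because the word order is copied from the source sentence, this scheme breaks as soon as the target language needs a different order.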
48
Transfer-based systems
  • second generation of MT systems: rule-based MT
  • Transfer in 3 steps
  • analysis of source sentence → abstract
    representation
  • transfer of source language representation into
    target language representation
  • generation: target language representation →
    surface form of target language sentence
  • System Components
  • lexica
  • monolingual source language lexicon
  • monolingual target language lexicon
  • bilingual dictionary entries
  • grammar
  • monolingual analysis and generation
  • bilingual transfer rules

49
Syntactical Transfer
  • Level of abstraction: syntactical representation
  • Analysis of source language (SL) sentence into
    source language dependent syntactic tree
  • Transfer of SL syntactical tree into target
    language (TL) syntactical tree
  • Generation of TL natural language sentence

Figure: John eats an apple. → Analysis (monoling.
lexicon, monoling. grammar) → Transfer (biling. lex.
rules, biling. gramm. rules) → Generation (monoling.
lexicon, monoling. grammar) → John isst einen Apfel.
50
Example Translation
  • Det Adj N → Det N Adj
  • A delicious soup → Una sopa deliciosa

Figure: SL tree and TL tree, each an NP node
dominating an N1 node
51
Syntactical Tree Transformation Rules
  • NP [tv(X) tv(Y)] → NP [tv(X) tv(Y)]
  • N1 [Adj tv(A), N tv(B)] → N1 [N tv(B), Adj tv(A)]
  • Det [A] → Det [Una]
  • delicious → deliciosa
  • soup → sopa
  • tv: translation variable
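
The tree transformation rules can be sketched as a recursive function over a small tree encoding, where a node is a (label, children) pair and a leaf is a (label, word) pair. The encoding and function names are assumptions; the single reordering rule and the three lexical entries come from the slide.

```python
LEXICAL_TRANSFER = {"a": "una", "delicious": "deliciosa", "soup": "sopa"}


def transfer(tree):
    """Recursively map an English (SL) tree onto a Spanish (TL) tree."""
    label, children = tree
    if isinstance(children, str):
        # Leaf: lexical transfer via the bilingual dictionary.
        return (label, LEXICAL_TRANSFER[children.lower()])
    translated = [transfer(child) for child in children]
    if label == "N1" and [child[0] for child in translated] == ["Adj", "N"]:
        # Structural rule: N1 [Adj N] -> N1 [N Adj].
        translated.reverse()
    return (label, translated)


def leaves(tree):
    """Read the words off a tree from left to right."""
    label, children = tree
    if isinstance(children, str):
        return [children]
    return [word for child in children for word in leaves(child)]


sl_tree = ("NP", [("Det", "A"), ("N1", [("Adj", "delicious"), ("N", "soup")])])
tl_tree = transfer(sl_tree)
print(" ".join(leaves(tl_tree)))  # una sopa deliciosa
```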
52
Semantical Transfer
  • Level of abstraction: semantic representation
  • Analysis of SL sentence into semantic
    representation
  • Transfer of SL representation into TL
    representation
  • Generation of TL natural language sentence

Figure: John eats an apple. → Analysis (monoling.
lexicon, monoling. grammar) → ∃x (eat(John, x) AND
apple(x)) → Transfer (biling. lex. rules, biling.
gramm. rules) → ∃x (essen(John, x) AND apfel(x)) →
Generation (monoling. lexicon, monoling. grammar) →
John isst einen Apfel.
53
Example translation
  • you like her → ella te gusta
  • SL representation: pres, like, L, aterm(pro, X,
    and, hearer, X, personal, X), aterm(pro, Y,
    and, fem, Y, sing, Y, personal, Y)
  • TL representation: pres, gustar, L, aterm(pro,
    Y, and, fem, Y, sing, Y, personal,
    Y), aterm(pro, X, and, hearer,
    X, personal, X)
  • Transfer rules: like, E, Arg1E, Arg2E
    <> gustar, E, Arg2S, Arg1S - Arg1E <>
    Arg1S, Arg2E <> Arg2S.
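
The effect of the like / gustar rule is an argument swap at the semantic level. The sketch below uses a simplified tuple encoding, (predicate, event, arg1, arg2), with pronouns as feature tuples; this encoding is an assumption and not the notation of the slide.

```python
def transfer_like(sl_representation):
    """Map like(E, Arg1, Arg2) onto gustar(E, Arg2, Arg1)."""
    predicate, event, arg1, arg2 = sl_representation
    if predicate != "like":
        raise ValueError("this rule only covers the predicate 'like'")
    # The arguments themselves carry language-neutral features and are
    # passed through unchanged; only their order is swapped.
    return ("gustar", event, arg2, arg1)


hearer = ("pro", "hearer", "personal")            # you
she = ("pro", "fem", "sing", "personal")          # her / ella

sl = ("like", "L", hearer, she)
tl = transfer_like(sl)
print(tl)
```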

54
Transfer-based MT systems
  • translation process based on linguistic
    properties
  • actual linguistic theories, e.g. HPSG, Frame
    Semantics
  • or system-internal linguistic representation
    tailored for translation purposes
  • variable level of abstraction
  • transfer rules explicitly model the differences
    between languages
  • Disadvantages
  • (bilingual) language specialist required to
    develop linguistic components
  • including a new language requires 3 new
    components: analysis, transfer, generation

55
Interlingua MT
  • Idea
  • intermediate language
  • abstract language-independent representation:
    pure meaning
  • Translation without transfer
  • analyze input sentence and generate interlingua
    representation
  • generate target language sentence directly from
    interlingua representation
  • access to world knowledge

Figure: source sentences (English, German, French,
...) → Interlingua → target sentences (English,
German, French, ...)
56
Representations in Interlingua MT
  • Interlingua Representation
  • language independent
  • encode linguistic knowledge
  • subject, objects, ...
  • non-linguistic knowledge
  • pragmatic factors: focus, relations among text
    units, speaker attitudes and intentions,
    stylistics, speech acts, deictic references,
    prior context, physical context
  • Representation of World Knowledge
  • ontology: non-linguistic knowledge about things
    and relations between them
  • an apple is a fruit; property of fruits: edible;
    property of eat: subject needs to be animate
  • inference mechanisms
  • logical operators: and, or, if-then-else
  • textual entailment: John stopped smoking → He
    must have smoked before.

57
Interlingua Translation
  • Level of abstraction: pure meaning
  • Analysis of SL sentence into interlingua
    representation
  • Generation of TL natural language sentence from
    interlingua

Figure: John eats an apple. → Analysis (monoling.
lexicon, monoling. grammar, world knowledge) →
Interlingua repr. → Generation (monoling. lexicon,
monoling. grammar) → John isst einen Apfel.
58
Interlingua MT
  • lower engineering effort for including new
    languages
  • 1 analysis component for each new source language
  • 1 generation component for each new target
    language
  • no transfer rules needed
  • Translation divergences are handled at
    monolingual level
  • suitable encoding of lexical and grammar entries
    necessary
  • Disadvantages
  • language specialist still required
  • world knowledge necessary
  • → available systems use a domain model covering
    only a small domain
  • true interlingua not reached so far
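
The difference in engineering effort can be made concrete with a standard counting argument. The sketch assumes that each of the n languages serves as both source and target, and that a transfer system needs one transfer module per ordered language pair; the function names are hypothetical.

```python
def transfer_components(n):
    """Components of a transfer system covering all pairs of n languages."""
    return {"analysis": n, "generation": n, "transfer": n * (n - 1)}


def interlingua_components(n):
    """Components of an interlingua system for n languages."""
    return {"analysis": n, "generation": n, "transfer": 0}


for n in (3, 10):
    print(n, sum(transfer_components(n).values()),
          sum(interlingua_components(n).values()))
```

Under these assumptions the transfer modules grow quadratically with the number of languages, while the interlingua design grows linearly.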

59
Conclusion
  • linguistic approaches to machine translation
  • direct translation
  • transfer of linguistic representations
  • translation via interlingua
  • based on lexical and grammatical knowledge about
    languages
  • require language expert to design lexicon and
    grammar components of the MT system