Title: Language Divergences and Solutions
1 Language Divergences and Solutions
- Advanced Machine Translation Seminar
- Alison Alvarez
2 Overview
- Introduction
- Morphology Primer
- Translation Mismatches
- Types
- Solutions
- Translation Divergences
- Types
- Solutions
- Different MT Systems
- Generation Heavy Machine Translation
- DUSTer
3 Source ≠ Target
- Languages don't encode the same information in the same way
- Makes MT complicated
- Keeps all of us employed
4 Morphology in a Nutshell
- Morphemes are word parts
- Work + er
- Iki + ta + ku + na + ku + na + ri + ma + shi + ta
- Types of Morphemes
- Derivational: makes a new word
- Inflectional: adds information to an existing word
5 Morphology in a Nutshell
- Analytic/Isolating
- little or no inflectional morphology, separate words
- Vietnamese, Chinese
- I was made to go
- Synthetic
- Lots of inflectional morphology
- Fusional vs. Agglutinating
- Romance Languages, Finnish, Japanese, Mapudungun
- Ika (to go) se (to make/let) rare (passive) ta (past tense)
- He need s (3rd person singular) it.
6 Translation Differences
- Types
- Translation Mismatches
- Different information from source to target
- Translation Divergences
- Same information from source to target, but the
meaning is distributed differently in each
language
7 Translation Mismatches
- the information that is conveyed is different in the source and target languages
- Types
- Lexical level
- Typological level
8 Lexical Mismatches
- A lexical item in one language may have more distinctions than in another
- English: brother
- Japanese: otouto (younger brother)
- Japanese: ani-san (older brother)
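A minimal sketch of the brother example, assuming a toy two-entry lexicon. The table, the function names and the `relative_age` feature are illustrative and not from the slides. Going from Japanese to English just drops a distinction; going from English to Japanese leaves several candidates unless extra information is supplied.

```python
# Toy lexicon (illustrative): Japanese term -> (English word, features English does not encode).
JA_TO_EN = {
    "otouto": ("brother", {"relative_age": "younger"}),
    "ani-san": ("brother", {"relative_age": "older"}),
}

def ja_to_en(word):
    """More information --> less information: drop the age distinction."""
    english, _features = JA_TO_EN[word]
    return english

def en_to_ja(word, relative_age=None):
    """Less information --> more information: list every candidate consistent with what is known."""
    return sorted(
        ja for ja, (en, feats) in JA_TO_EN.items()
        if en == word and relative_age in (None, feats["relative_age"])
    )

print(ja_to_en("otouto"))            # brother
print(en_to_ja("brother"))           # ['ani-san', 'otouto']
print(en_to_ja("brother", "older"))  # ['ani-san']
```

Without the age feature the English-to-Japanese direction stays ambiguous, which is why the choice has to come from context.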
9 Typological Mismatches
- Mismatch between languages with different levels of grammaticalization
- One language may be more structurally complex
- Source marking, Obligatory Subject
10 Typological Mismatches
- Source marking: Quechua vs. English
- (they say) s/he was singing --> takisharansi
- taki (sing) sha (progressive) ra (past) n (3rd sg) si (reportative)
- Obligatory arguments: English vs. Japanese
- Kusuri wo Nonda --> (I, you, etc.) took medicine.
- Makasemasu! --> (I'll) leave (it) to (you)
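The Quechua gloss can be written down as data to make the mismatch concrete. This is only a sketch: the set of labels treated as expressed by English "was singing" is an assumption made for illustration.

```python
# Morpheme-by-morpheme gloss of "takisharansi", as given on the slide.
GLOSS = [
    ("taki", "sing"),
    ("sha", "progressive"),
    ("ra", "past"),
    ("n", "3rd sg"),
    ("si", "reportative"),
]

# Assumption for illustration: labels that English "was singing" already expresses.
EXPRESSED_IN_ENGLISH = {"sing", "progressive", "past", "3rd sg"}

def surface_form(gloss):
    """Concatenate the morphemes into the Quechua word."""
    return "".join(morpheme for morpheme, _label in gloss)

def unmatched_labels(gloss):
    """Labels with no obligatory English counterpart; they need a paraphrase such as 'they say'."""
    return [label for _morpheme, label in gloss if label not in EXPRESSED_IN_ENGLISH]

print(surface_form(GLOSS))      # takisharansi
print(unmatched_labels(GLOSS))  # ['reportative']
```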
11 Translation Mismatch Solutions
- More information --> Less information (easy)
- Less information --> More information (hard)
- Context clues
- Language Models
- Generalization
- Formal representations
12 Translation Divergences
- the same information is conveyed in source and target texts
- Divergences are quite common
- Occur in about 1 out of every 3 sentences in the TREC El Norte Newspaper corpus (Spanish-English)
- Sentences can have multiple kinds of divergences
13 Translation Divergence Types
- Categorial Divergence
- Conflational Divergence
- Structural Divergence
- Head Swapping Divergence
- Thematic Divergence
14 Categorial Divergence
- Translation that uses different parts of speech
- Tener hambre (have hunger) --> be hungry
- Noun --> adjective
15 Conflational Divergence
- The translation of two words using a single word that combines their meaning
- Can also be called a lexical gap
- X stab Z --> X dar puñaladas a Z (X give stabs to Z)
- glastuinbouw --> cultivation under glass
16 Structural Divergence
- A difference in the realization of incorporated arguments
- PP to Object
- X entrar en Y (X enter in Y) --> X enter Y
- X ask for a referendum --> X pedir un referendum (ask-for a referendum)
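A small sketch of the PP-to-object case, assuming a hand-written table with a single entry for "entrar". The table and the function name are hypothetical.

```python
# Hypothetical table: Spanish verb -> (preposition it takes, English verb with a direct object).
PP_TO_OBJECT = {"entrar": ("en", "enter")}

def spanish_to_english(subject, verb, prep, argument):
    """Map 'X entrar en Y' to 'X enter Y' by dropping the preposition."""
    expected_prep, english_verb = PP_TO_OBJECT[verb]
    if prep != expected_prep:
        raise ValueError(f"{verb!r} expects preposition {expected_prep!r}")
    return (subject, english_verb, argument)

print(spanish_to_english("X", "entrar", "en", "Y"))  # ('X', 'enter', 'Y')
```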
17 Head Swapping Divergence
- Involves the demotion of a head verb and the promotion of a modifier verb to head position
- English (tree nodes: S, NP, VP, N, V, PP): I ran into the room.
- Spanish (tree nodes: S, NP, VP, N, V, PP, VP): Yo entro en el cuarto corriendo
18 Thematic Divergence
- This divergence occurs when sentence arguments switch argument roles from one language to another
- X gustar a Y (X please to Y) --> Y like X
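The role switch in the gustar example can be sketched as a swap of subject and object. The `Clause` type and the function name are illustrative, and only this one verb is covered.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Clause:
    verb: str
    subject: str
    obj: str

def gustar_to_like(clause):
    """Thematic divergence: the Spanish subject becomes the English object and vice versa."""
    if clause.verb != "gustar":
        raise ValueError("this sketch only covers 'gustar'")
    return Clause(verb="like", subject=clause.obj, obj=clause.subject)

print(gustar_to_like(Clause("gustar", subject="X", obj="Y")))
# Clause(verb='like', subject='Y', obj='X')
```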
19 Divergence Solutions and Statistical/EBMT Systems
- Not really addressed explicitly in SMT
- Covered in EBMT only if it is covered extensively
in the data
20 Divergence Solutions and Transfer Systems
- Hand-written transfer rules
- Automatic extraction of transfer rules from bi-texts
- Problematic with multiple divergences
21 Divergence Solutions and Interlingua Systems
- Mel'čuk's Deep Syntactic Structure
- Jackendoff's Lexical Semantic Structure
- Both require explicit symmetric knowledge from both source and target language
- Expensive
22 Divergence Solutions and Interlingua Systems
- John swam across a river
- [event CAUSE JOHN [event GO JOHN [path ACROSS JOHN [position AT JOHN RIVER]]] [manner SWIMINGLY]]
- Juan cruza el río nadando
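The interlingual structure for this sentence pair can be written as a nested structure. This is a sketch only: the bracket nesting is reconstructed from the flattened slide text and is an assumption, and the tuple encoding and helper names are illustrative.

```python
# Each node is (type, primitive, *arguments); arguments are strings or nested nodes.
# The nesting below is an assumed reconstruction of the slide's diagram.
LCS = ("event", "CAUSE", "JOHN",
       ("event", "GO", "JOHN",
        ("path", "ACROSS", "JOHN",
         ("position", "AT", "JOHN", "RIVER"))),
       ("manner", "SWIMINGLY"))

def bracketed(node):
    """Render a node in bracket notation."""
    if isinstance(node, str):
        return node
    node_type, *rest = node
    return "[" + node_type + " " + " ".join(bracketed(child) for child in rest) + "]"

def find(node, node_type):
    """Primitives of every sub-structure of the given type, in depth-first order."""
    if isinstance(node, str):
        return []
    found = [node[1]] if node[0] == node_type else []
    for child in node[2:]:
        found.extend(find(child, node_type))
    return found

print(bracketed(LCS))
print(find(LCS, "path"))    # ['ACROSS']
print(find(LCS, "manner"))  # ['SWIMINGLY']
```

English realizes the manner component as the main verb (swam) and the path as a preposition (across); Spanish realizes the path as the main verb (cruza) and the manner as a gerund (nadando). Both sentences map to the same structure.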
23 Generation-Heavy MT
- Built to address language divergences
- Designed for source-poor/target-rich translation
- Non-Interlingual
- Non-Transfer
- Uses symbolic overgeneration to account for
different translation divergences
24 Generation-Heavy MT
- Source language
- syntactic parser
- translation lexicon
- Target language
- lexical semantics, categorial variations and subcategorization frames for overgeneration
- Statistical language model
25 GHMT System
26 Analysis Stage
- Independent of Target Language
- Creates a deep syntactic dependency
- Only argument structure, top-level conceptual nodes and thematic-role information
- Should normalize over syntactic and morphological phenomena
27 Translation Stage
- Converts SL lexemes to TL lexemes
- Maintains dependency structure
28 Analysis/Translation Stage
- GIVE (v) [cause go]
- I: agent
- STAB (n): theme
- JOHN: goal
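The dependency on this slide can be held in a small tree type. A minimal sketch: the `Node` class and its field names are illustrative and are not the GHMT system's actual data structures.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    lexeme: str
    pos: str = ""    # part of speech, when the slide gives one
    role: str = ""   # thematic role relative to the parent
    children: list = field(default_factory=list)

# GIVE (v) with its three arguments, as on the slide.
tree = Node("GIVE", "v", children=[
    Node("I", role="agent"),
    Node("STAB", "n", role="theme"),
    Node("JOHN", role="goal"),
])

def roles(node):
    """Map each thematic role under a head to the lexeme that fills it."""
    return {child.role: child.lexeme for child in node.children}

print(roles(tree))  # {'agent': 'I', 'theme': 'STAB', 'goal': 'JOHN'}
```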
29 Generation Stage
- Lexical and Structural Selection
- Conversion to a thematic dependency
- Uses syntactic-thematic linking map
- loose linking
- Structural expansion
- Addresses conflation and head-swapped divergences
- Turn thematic dependency to TL syntactic dependency
- Addresses categorial divergence
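A much-simplified sketch of one conflation case during structural expansion, using the GIVE/STAB example from the earlier slide. The tables and the function are hypothetical stand-ins for the target-language resources and do not reproduce the actual GHMT algorithm.

```python
# Hypothetical categorial-variation table: noun lexeme -> verb sharing its root.
CATVAR_N_TO_V = {"STAB": "STAB"}
# Hypothetical list of light verbs that may be conflated with their theme.
LIGHT_VERBS = {"GIVE"}

def expand(head, args):
    """Overgenerate: keep the literal structure and add a conflated variant when possible."""
    variants = [(head, dict(args))]
    theme = args.get("theme")
    if head in LIGHT_VERBS and theme in CATVAR_N_TO_V and "goal" in args:
        # The noun theme becomes the verb; the old goal becomes the new theme.
        conflated = {role: filler for role, filler in args.items()
                     if role not in ("theme", "goal")}
        conflated["theme"] = args["goal"]
        variants.append((CATVAR_N_TO_V[theme], conflated))
    return variants

for variant in expand("GIVE", {"agent": "I", "theme": "STAB", "goal": "JOHN"}):
    print(variant)
# ('GIVE', {'agent': 'I', 'theme': 'STAB', 'goal': 'JOHN'})
# ('STAB', {'agent': 'I', 'theme': 'JOHN'})
```

Both variants are kept at this point; choosing between them is left to a later ranking step.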
30 Generation Stage Structural Expansion
31 Generation Stage
- Linearization Step
- Creates a word lattice to encode different possible realizations
- Implemented using oxyGen engine
- Sentences ranked and extracted
- Nitrogen's statistical extractor
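The overgenerate-then-rank idea can be sketched with a toy lattice and a toy bigram model. Nothing here uses oxyGen or Nitrogen: the lattice is simplified to a sequence of slots, and every candidate phrase and probability is made up for illustration.

```python
import itertools

# Simplified lattice: a sequence of slots, each listing alternative realizations.
lattice = [["I"], ["stabbed", "gave stabs to", "knifed"], ["John"]]

# Toy bigram log-probabilities (made-up numbers); unseen bigrams get a flat penalty.
BIGRAM_LOGP = {
    ("I", "stabbed"): -1.0, ("stabbed", "John"): -1.2,
    ("I", "gave"): -0.8, ("gave", "stabs"): -6.0,
    ("stabs", "to"): -2.0, ("to", "John"): -1.5,
}
UNSEEN = -8.0

def score(sentence):
    """Sum of bigram log-probabilities over the sentence."""
    words = sentence.split()
    return sum(BIGRAM_LOGP.get(pair, UNSEEN) for pair in zip(words, words[1:]))

def rank(lattice):
    """Enumerate every path through the lattice and sort by language-model score."""
    candidates = [" ".join(path) for path in itertools.product(*lattice)]
    return sorted(candidates, key=score, reverse=True)

print(rank(lattice))
# ['I stabbed John', 'I gave stabs to John', 'I knifed John']
```

Enumerating every path only works for tiny lattices; it is used here to keep the sketch short.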
32 Generation Stage
33 GHMT Results
- 4 of 5 Spanish-English divergences can be generated using structural expansion and categorial variations
- The remaining 1 out of 5 needed more world knowledge or idiom handling
- SL syntactic parser can still be hard to come by
34 Divergences and DUSTer
- Helps to overcome divergences for word alignment and improve coder agreement
- Changes an English sentence structure to resemble another language
- More accurate alignment and projection of dependency trees without training on dependency tree data
35 DUSTer
- Motivation for the development of automatic correction of divergences
- Every language pair has translation divergences that are easy to recognize
- Knowing what they are and how to accommodate them provides the basis for refined word-level alignment
- Refined word-level alignment results in improved projection of structural information from English to another language
36 DUSTer
37 DUSTer
- Bi-text parsed on English side only
- Linguistically motivated common search terms
- Conducted on Spanish and Arabic (and later Chinese and Hindi)
- Uses all of the divergences mentioned before, plus a light verb divergence
- Try --> put to trying --> poner a prueba
38 DUSTer Rule Development Methods
- Identify canonical transformations for each divergence type
- Categorize English sentences into divergence type or none
- Apply appropriate transformations
- Humans align E --> E' (transformed English) --> foreign language
39 DUSTer Rules
- "kill" --> "LightVB kill(N)" (LightVB = light verb)
- Presumably, this will work for "kill" --> "give death to"
- "borrow" --> "take lent (thing) to"
- "hurt" --> "make harm to"
- "fear" --> "have fear of"
- "desire" --> "have interest in"
- "rest" --> "have repose on"
- "envy" --> "have envy of"
- type1.B.X English2 1 3 Spanish2 1 3 4 5
- Verb<1,i,CatVarV_N> Noun<2,j,Subj> Noun<3,k,Obj> <--> LightVB<1,Verb> Noun<2,j,Subj> Noun<3,i,Obj> Oblique<4,Pred,Prep> Noun<5,k,PObj>
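A rough sketch of how the light-verb rule rewrites an English clause, assuming a small lookup table built from some of the examples above. The table layout and the function name are hypothetical, and no inflection is handled. This is not DUSTer's rule engine.

```python
# Hypothetical lexicon from the slide's examples: verb -> (light verb, noun, preposition).
LIGHT_VERB_FORMS = {
    "kill": ("give", "death", "to"),
    "hurt": ("make", "harm", "to"),
    "fear": ("have", "fear", "of"),
    "envy": ("have", "envy", "of"),
}

def transform(subject, verb, obj):
    """Rewrite 'Subj Verb Obj' as 'Subj LightVB Noun Prep Obj' when a rule applies."""
    if verb not in LIGHT_VERB_FORMS:
        return [subject, verb, obj]  # no rule for this verb: leave the sentence unchanged
    light_verb, noun, prep = LIGHT_VERB_FORMS[verb]
    return [subject, light_verb, noun, prep, obj]

print(transform("John", "fear", "Mary"))  # ['John', 'have', 'fear', 'of', 'Mary']
print(transform("John", "see", "Mary"))   # ['John', 'see', 'Mary']
```

The rewritten English has five positions where the original has three, matching the index lists 2 1 3 and 2 1 3 4 5 in the rule header.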
40 DUSTer Results
41 Conclusion
- Divergences are common
- They are not handled well by most MT systems
- GHMT can account for divergences, but still needs development
- DUSTer can handle divergences through structure transformations, but requires a great deal of linguistic knowledge
42 The End
43 References
- Dorr, Bonnie J., "Machine Translation Divergences: A Formal Description and Proposed Solution," Computational Linguistics, 20(4), pp. 597--633, 1994.
- Dorr, Bonnie J. and Nizar Habash, "Interlingua Approximation: A Generation-Heavy Approach," In Proceedings of Workshop on Interlingua Reliability, Fifth Conference of the Association for Machine Translation in the Americas, AMTA-2002, Tiburon, CA, pp. 1--6, 2002.
- Dorr, Bonnie J., Clare R. Voss, Eric Peterson, and Michael Kiker, "Concept Based Lexical Selection," Proceedings of the AAAI-94 Fall Symposium on Knowledge Representation for Natural Language Processing in Implemented Systems, New Orleans, LA, pp. 21--30, 1994.
- Dorr, Bonnie J., Lisa Pearl, Rebecca Hwa, and Nizar Habash, "DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment," Proceedings of the Fifth Conference of the Association for Machine Translation in the Americas, AMTA-2002, Tiburon, CA, pp. 31--43, 2002.
- Habash, Nizar and Bonnie J. Dorr, "Handling Translation Divergences: Combining Statistical and Symbolic Techniques in Generation-Heavy Machine Translation," In Proceedings of the Fifth Conference of the Association for Machine Translation in the Americas, AMTA-2002, Tiburon, CA, pp. 84--93, 2002.
- Haspelmath, Martin. Understanding Morphology. Oxford University Press, 2002.
- Kameyama, Megumi, Ryo Ochitani, and Stanley Peters, "Resolving Translation Mismatches With Information Flow," Annual Meeting of the Association for Computational Linguistics, 1991.
44 Other Divergences
- Idioms
- Aspectual Divergences
- Knowledge outside of Lexical Semantics