Title: Latest Developments in SMT
1Latest Developments in (S)MT
MT Wars II: The Empire (Linguistics) strikes back
- Harold Somers
- University of Manchester
2Overview
- The story so far
- EBMT
- SMT
- Latest developments in RBMT
- Is there convergence?
- Some attempts to classify MT
- (Carl's and Wu's MT model spaces)
- Has the empire struck back?
3The story so far: EBMT
- Early history well known
- Nagao (1981/3)
- Early development as part of RBMT
- Relationship with Translation Memories
- Focus (cf. Somers 1998) on
- Matching algorithms
- Selection and storage of examples
- Mainly sentence-based
- TL generation (Recombination) not much addressed
Somers, H. (1998) 'New paradigms in MT', 10th European Summer School in Logic, Language and Information, Workshop on MT, Saarbrücken; revised version in Machine Translation 14 (1999), and 2nd revised version in M. Carl & A. Way (2003) Recent Advances in EBMT (Kluwer).
4EBMT in a nutshell
- (In case you've been on Tatooine for the last 15 years)
- Database of paired examples
- Translation involves
- Finding the best example(s) (matching)
- Identifying which bits do(n't) match (alignment)
- Replacing the non-matching bits (if multiple examples, gluing them together) (recombination)
- All of the above at run-time (a toy sketch follows below)
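To make the run-time loop concrete, here is a deliberately minimal Python sketch (my own toy illustration, not any published system): the example base, the lexicon and the similarity measure are all stand-ins.

    import difflib

    # Toy example base and word-level lexicon (illustrative only).
    EXAMPLES = [("the old man is dead", "le vieil homme est mort")]
    LEXICON = {"man": "homme", "woman": "femme"}

    def translate(inp):
        """A naive run-time EBMT loop: match, align, recombine."""
        # 1. Matching: pick the stored example most similar to the input.
        src, tgt = max(EXAMPLES, key=lambda ex: difflib.SequenceMatcher(
            None, inp, ex[0]).ratio())
        # 2. Alignment: find which input words differ from the example's source side.
        src_toks, inp_toks = src.split(), inp.split()
        ops = difflib.SequenceMatcher(None, src_toks, inp_toks).get_opcodes()
        # 3. Recombination: substitute translations of the differing words.
        for op, i1, i2, j1, j2 in ops:
            if op == "replace" and (i2 - i1) == (j2 - j1) == 1:
                old, new = src_toks[i1], inp_toks[j1]
                if old in LEXICON and new in LEXICON:
                    tgt = tgt.replace(LEXICON[old], LEXICON[new])
        return tgt

    print(translate("the old woman is dead"))
    # -> "le vieil femme est mort": fluent-looking but ungrammatical,
    #    which is exactly the boundary-friction problem on the next slide.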
5EBMT in a nutshell (cont.)
- Main difficulty is boundary friction in two
senses
The old man is dead → Le vieil homme est mort
The old woman is dead → Le vieil femme est mort (ungrammatical; it should be La vieille femme est morte)

Input: The operation was interrupted because the file was hidden.
a. The operation was interrupted because the Ctrl-c key was pressed. → L'opération a été interrompue car la touche Ctrl-c a été enfoncée.
b. The specified method failed because the file is hidden. → La méthode spécifiée a échoué car le fichier est masqué.
6EBMT later developments
- Example generalisation (templates)
- Incorporation of linguistic resources and/or statistical measures
- Structured representation of examples
- Use of statistical techniques
7Example generalisation
- (Furuse & Iida, Kaji et al., Matsumoto et al., Carl, Cicekli & Güvenir, Brown, McTait, Way et al.)
- Similar examples can be combined to give a more general example
- Can be seen as a way of generating transfer rules (and lexicons)
- Process may be entirely automatic, based on string matching
- or seeded using linguistic information (POS tags) or resources (bilingual dictionary)
8Example generalisation (cont.)
The monkey ate a peach → saru wa momo o tabeta
The man ate a peach → hito wa momo o tabeta
monkey → saru; man → hito
The ___ ate a peach → ___ wa momo o tabeta

The dog ate a rabbit → inu wa usagi o tabeta
dog → inu; rabbit → usagi
The X ate a Y → X wa Y o tabeta
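A minimal sketch of the string-matching idea (my own toy code, handling only the simplest case of a single differing word on each side; published methods are far more general):

    def generalise(pair1, pair2):
        """Toy template induction: given two example pairs that differ in
        exactly one (source, target) word each, return the shared template
        plus the word correspondences."""
        (s1, t1), (s2, t2) = pair1, pair2
        s1, t1, s2, t2 = s1.split(), t1.split(), s2.split(), t2.split()
        s_diff = [i for i, (a, b) in enumerate(zip(s1, s2)) if a != b]
        t_diff = [j for j, (a, b) in enumerate(zip(t1, t2)) if a != b]
        if len(s1) != len(s2) or len(t1) != len(t2) or len(s_diff) != 1 or len(t_diff) != 1:
            return None                      # only the simplest case handled
        i, j = s_diff[0], t_diff[0]
        template_src = s1[:i] + ["<X>"] + s1[i + 1:]
        template_tgt = t1[:j] + ["<X>"] + t1[j + 1:]
        lexicon = {s1[i]: t1[j], s2[i]: t2[j]}
        return " ".join(template_src), " ".join(template_tgt), lexicon

    print(generalise(("the monkey ate a peach", "saru wa momo o tabeta"),
                     ("the man ate a peach", "hito wa momo o tabeta")))
    # -> ('the <X> ate a peach', '<X> wa momo o tabeta',
    #     {'monkey': 'saru', 'man': 'hito'})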
9Example generalisation (cont.)
- That's too simple (e.g. because of boundary friction)
- Need to introduce constraints on the slots, e.g. using POS tags and morphological information (which implies some other processing)
- Can use clustering algorithms to infer substitution sets
10Incorporation of linguistic resources
- Actually, early EBMT used all sorts of linguistic resources
- Briefly there was a move towards more pure approaches
- Now we see much use of POS tags (sometimes only partial, e.g. marker words, Way et al.; a rough sketch follows below), morphological analysis (as just mentioned), bilingual lexicons
- Target-language grammars for the recombination/generation phase
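As a rough illustration of the marker-word idea (the Marker Hypothesis used in Way and colleagues' marker-based EBMT), the following toy Python chunker starts a new fragment at each closed-class marker word; the marker list here is a tiny made-up sample, not the one any actual system uses.

    # Tiny, made-up sample of closed-class marker words.
    MARKERS = {"the", "a", "an", "in", "on", "of", "to", "and", "that", "with"}

    def marker_chunks(sentence):
        """Start a new chunk whenever a marker word is seen."""
        chunks, current = [], []
        for tok in sentence.lower().split():
            if tok in MARKERS and current:
                chunks.append(current)
                current = []
            current.append(tok)
        if current:
            chunks.append(current)
        return [" ".join(c) for c in chunks]

    print(marker_chunks("The operation was interrupted because the file was hidden"))
    # -> ['the operation was interrupted because', 'the file was hidden']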
11Incorporation of statistical measures
- Example database preprocessed to assign weights (probabilities) to fragments and their translations (Aramaki et al.)
- Good way of handling ambiguities due to alternative translations
- Clustering words into equivalence classes for example generalization (Brown)
- Using statistical tools to extract translation knowledge from parallel corpora (Yamamoto & Matsumoto)
- Statistically induced grammars for translation or generation, as in ...
12Use of structured representations
- Again, a feature of early EBMT, now reappearing
- Translation grammars induced from the example set
- Examples stored as tree structures
(overwhelmingly dependency structures)
13Translation grammars
- Carl generates translation grammars from aligned, linguistically annotated texts
- Way: Data-Oriented Translation (based on Poutsma's DOP, using both PS and LFG models)
14Structured examples
- Use of tree comparison algorithms to extract translation patterns from parsed corpora/treebanks (Watanabe et al.)
- Translation pairings extracted from aligned parsed examples (Menezes & Richardson)
- Tree-to-string approach used by Langlais & Gotti and Liu et al. (+ statistical generation model)
15Typical use of structured examples
- Rule-based analysis and generation + example-based transfer
- Input is parsed into a representation using a traditional or statistics-based analyser
- TL representation constructed by combining translation mappings learned from the parallel corpus
- TL sentence generated using a hand-written or machine-learned generation grammar
- Is this still EBMT?
- Note that the only example-based part is the use of mappings, which are learned, not computed at run-time
16Pure EBMT (Lepage & Denoual)
- In contrast (but now something of an oddity): pure analogy-based EBMT
- Use of proportional analogies [A : B :: C : D]
- Terms in the analogies are translation pairs: A↔A′, B↔B′, C↔C′, D↔D′ (a toy analogy solver is sketched below)
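A toy sketch of solving A : B :: C : x, assuming B differs from A only at one end of the string (Lepage & Denoual's actual algorithm is considerably more general); on the target side, solving A′ : B′ :: C′ : x then yields a candidate translation.

    def common_prefix_len(s, t):
        """Length of the longest common prefix of two strings."""
        i = 0
        while i < min(len(s), len(t)) and s[i] == t[i]:
            i += 1
        return i

    def solve_analogy(a, b, c):
        """Solve a : b :: c : x when a and b differ only in a suffix or a prefix."""
        # Case 1: suffix change (a = stem + sa, b = stem + sb).
        p = common_prefix_len(a, b)
        sa, sb = a[p:], b[p:]
        if c.endswith(sa):
            return c[:len(c) - len(sa)] + sb
        # Case 2: prefix change (a = pa + stem, b = pb + stem).
        s = common_prefix_len(a[::-1], b[::-1])
        pa, pb = a[:len(a) - s], b[:len(b) - s]
        if c.startswith(pa):
            return pb + c[len(pa):]
        return None

    print(solve_analogy("walk", "walked", "talk"))      # -> "talked"
    print(solve_analogy("unhappy", "happy", "unkind"))  # -> "kind"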
17(No Transcript)
18Pure EBMT
- No explicit transfer
- No extraction of symbolic knowledge
- No use of templates
- Analogies do not always represent any sort of linguistic reality
- No training or preprocessing
- Solving the proportional analogies is done at
run-time
19The story so far (SMT)
- Early history well known
- IBM group inspired by improved results in speech recognition when a non-linguistic approach was taken
- Availability of Canadian Hansards inspired a purely statistical approach to MT (1988)
- Immediate partial success (60%) to the dismay of MT people
- Early observers (Wilks) predicted hybrid methods (stone soup) would evolve
- Later developments
- Phrase-based SMT
- Syntax-based SMT
20SMT in a nutshell
- (In case you've been on Kamino for the last 15 years)
- From the parallel corpus two sets of statistical data are extracted
- Translation model: probabilities that a given word e in the SL gives rise to a word f in the TL
- (Target) language model: the most probable word order for the words predicted by the translation model
- These two models are computed off-line
- Given an input sentence, a decoder applies the two models and juggles the probabilities to get the best score; various methods have been proposed (the standard formulation is sketched below)
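Stated compactly, this is the standard noisy-channel formulation (using the slide's convention that e is the SL sentence and f the TL sentence; P(f) is the language model, and the translation model is applied in its Bayes-inverted direction P(e|f)):

    \hat{f} = \arg\max_{f} \; P(f) \cdot P(e \mid f)

The decoder's job is to search for this argmax over candidate target sentences.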
21SMT in a nutshell (cont.)
- The translation model has to take into account the fact that
- for a given e there may be various different f's depending on context (grammatical variants as well as alternatives due to polysemy or homonymy)
- a given e may not necessarily correspond to a single f, or to any f at all: fertility
- (e.g. may have → aurait, implemented → mis en application)
22SMT in a nutshell (cont.)
- The language model has to take into account the fact that
- the TL words predicted by the translation model will not occur in the same order as the SL words: distortion
- TL word choices can depend on neighbouring words (which may be easy to model) or, especially because of distortion, on more distant words: long-distance dependencies, much harder to model (a toy bigram model is sketched below)
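As a toy illustration of what the language model contributes (my own sketch of a plain unsmoothed bigram model; real systems use higher-order smoothed n-grams), note that it can only see adjacent words, which is exactly why the long-distance effects mentioned above are hard:

    import math
    from collections import Counter

    def train_bigram_lm(corpus):
        """Count unigrams and bigrams over tokenised TL sentences (no smoothing)."""
        unigrams, bigrams = Counter(), Counter()
        for sent in corpus:
            tokens = ["<s>"] + sent + ["</s>"]
            unigrams.update(tokens[:-1])
            bigrams.update(zip(tokens[:-1], tokens[1:]))
        return unigrams, bigrams

    def log_prob(sentence, unigrams, bigrams):
        """Log-probability of one candidate word order under the bigram model."""
        tokens = ["<s>"] + sentence + ["</s>"]
        lp = 0.0
        for prev, curr in zip(tokens[:-1], tokens[1:]):
            if bigrams[(prev, curr)] == 0:
                return float("-inf")          # unseen bigram, no smoothing
            lp += math.log(bigrams[(prev, curr)] / unigrams[prev])
        return lp

    # Rank two orderings of the same words by LM score.
    uni, bi = train_bigram_lm([["the", "green", "witch"], ["the", "old", "man"]])
    print(log_prob(["the", "green", "witch"], uni, bi))   # finite score
    print(log_prob(["green", "the", "witch"], uni, bi))   # -inf: unseen order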
23SMT in a nutshell (cont.)
- Main difficulty: the combination of fertility and distortion
- Zeitmangel erschwert das Problem.
- Lack of time makes the problem more difficult.
- Eine Diskussion erübrigt sich demnach.
- Therefore there is no point in discussion.
- Das ist der Sache nicht angemessen.
- That is not appropriate for this matter.
- Den Vorschlag lehnt die Kommission ab.
- The Commission rejects the proposal.
24SMT later developments
- Phrase-based SMT
- Extend models beyond individual words to word sequences (phrases)
- Direct phrase alignment
- Word alignment induced phrase model
- Alignment templates
- Results better than word-based models, and show improvement proportional (log-linear) to corpus size
- Phrases do not correspond to constituents, and limiting them to do so hurts results
25Direct phrase alignment
- (Wang & Waibel 1998, Och et al. 1999, Marcu & Wong 2002)
- Enhance the word translation model by adding joint probabilities, i.e. probabilities for phrases
- Phrase probabilities compensate for missing lexical probabilities
- Easy to integrate probabilities from different sources/methods, allows for mutual compensation
26Word alignment induced model
- Koehn et al. 2003; example stolen from Knight & Koehn, http://www.iccs.inf.ed.ac.uk/pkoehn/publications/tutorial2003.pdf

Maria did not slap the green witch
Maria no daba una bofetada a la bruja verde

Start with all phrase pairs justified by the word alignment (a rough extraction sketch follows after the full list below)
27Word alignment induced model
- Koehn et al. 2003; example stolen from Knight & Koehn, http://www.iccs.inf.ed.ac.uk/pkoehn/publications/tutorial2003.pdf

(Maria, Maria), (no, did not), (daba una bofetada, slap), (a la, the), (verde, green), (bruja, witch)
28Word alignment induced model
- Koehn et al. 2003; example stolen from Knight & Koehn, http://www.iccs.inf.ed.ac.uk/pkoehn/publications/tutorial2003.pdf

(Maria, Maria), (no, did not), (daba una bofetada, slap), (a la, the), (verde, green), (bruja, witch), (Maria no, Maria did not), (no daba una bofetada, did not slap), (daba una bofetada a la, slap the), (bruja verde, green witch)
etc.
29Word alignment induced model
- Koehn et al. 2003; example stolen from Knight & Koehn, http://www.iccs.inf.ed.ac.uk/pkoehn/publications/tutorial2003.pdf

(Maria, Maria), (no, did not), (daba una bofetada, slap), (a la, the), (bruja, witch), (verde, green), (Maria no, Maria did not), (no daba una bofetada, did not slap), (daba una bofetada a la, slap the), (bruja verde, green witch), (Maria no daba una bofetada, Maria did not slap), (no daba una bofetada a la, did not slap the), (a la bruja verde, the green witch), (Maria no daba una bofetada a la, Maria did not slap the), (daba una bofetada a la bruja verde, slap the green witch), (no daba una bofetada a la bruja verde, did not slap the green witch), (Maria no daba una bofetada a la bruja verde, Maria did not slap the green witch)
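The collection step shown on the previous slides can be written out roughly as follows (my own Python rendering of the standard Och/Koehn-style consistency check, not the authors' code; the alignment links are my reading of the example, and extension over unaligned boundary words is omitted):

    def extract_phrases(src, tgt, alignment):
        """All phrase pairs consistent with the word alignment: every link
        touching the source span must land inside the target span and
        vice versa (and the pair must contain at least one link)."""
        pairs = set()
        for i1 in range(len(src)):
            for i2 in range(i1, len(src)):
                # Target positions linked to the chosen source span.
                linked = [j for (i, j) in alignment if i1 <= i <= i2]
                if not linked:
                    continue
                j1, j2 = min(linked), max(linked)
                # Reject if some target word in [j1, j2] is aligned to a
                # source word outside [i1, i2].
                if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                    pairs.add((" ".join(src[i1:i2 + 1]),
                               " ".join(tgt[j1:j2 + 1])))
        return pairs

    src = "Maria no daba una bofetada a la bruja verde".split()
    tgt = "Maria did not slap the green witch".split()
    links = {(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3),
             (5, 4), (6, 4), (7, 6), (8, 5)}
    print(extract_phrases(src, tgt, links))
    # Includes (no, did not), (daba una bofetada, slap), (bruja verde, green witch), ...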
30Word alignment induced model
- Given the phrase pairs collected, estimate the
phrase translation probability distribution by
relative frequency (without smoothing)
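In symbols, this is the usual maximum-likelihood estimate over the collected pairs (with ē a source phrase and f̄ a target phrase):

    \phi(\bar{f} \mid \bar{e}) = \frac{\mathrm{count}(\bar{e}, \bar{f})}{\sum_{\bar{f}'} \mathrm{count}(\bar{e}, \bar{f}')}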
31Alignment templates
- (Och et al. 1999; further developed by Marcu & Wong 2002, Koehn & Knight 2003, Koehn et al. 2003)
- Problem of sparse data is worse for phrases
- So use word classes instead of words
- alignment templates instead of phrases
- more reliable statistics for the translation table
- smaller translation table
- more complex decoding
- Word classes are induced (by distributional statistics), so may not correspond to intuitive (linguistic) classes
- Takes context into account
32Problems with phrase-based models
- Still do not handle very well ...
- dependencies (especially long-distance)
- distortion
- discontinuities (e.g. bought → habe ... gekauft)
- More promising seems to be ...
33Syntax-based SMT
- Better able to handle
- Constituents
- Function words
- Grammatical context (e.g. case marking)
- Inversion Transduction Grammars
- Hierarchical transduction model
- Tree-to-string translation
- Tree-to-tree translation
34Inversion transduction grammars
- Wu and colleagues (1997 onwards)
- Grammar generates two trees in parallel and mappings between them
- Rules can specify order changes (illustrated below)
- Restriction to binary rules limits complexity
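As a rough illustration of the rule format (standard ITG notation, not an example from the talk): a straight rule keeps its daughters in the same order in both languages, an inverted rule flips them on the target side, and lexical rules pair terminals:

    VP → [ V NP ]      (straight: same order in both languages)
    VP → ⟨ V NP ⟩      (inverted: order reversed on the target side)
    V → eats / mange      NP → an apple / une pomme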
35Inversion transduction grammars
36Inversion transduction grammars
- Grammar is trained on a word-aligned bilingual corpus; note that all the rules are learned automatically
- Translation uses a decoder which effectively works like traditional RBMT
- Parser uses the source side of the transduction rules to build a parse tree
- Transduction rules are applied to transform the tree
- The target text is generated by linearizing the tree
37(No Transcript)
38(No Transcript)
39Almost all possible mappings can be handled. The missing ones (crossing constraints) are not found in Wu's corpus, but examples can be found, apparently.
40Hierarchical transduction model
- (Alshawi et al. 1998)
- Based on finite-state transducers, also uses binary notation
- Uses automatically induced dependency structure
- Initial head-word pair is chosen
- Sentence is then expanded by translating the
dependent structures
41Tree-to-string translation
- (Yamada & Knight 2001, Charniak 2003)
- Uses a (statistical) parser on the input side only
- Tree is then subject to reordering and insertion according to models learned from data
- Lexical translation is then done, again according to probability models
42(No Transcript)
43Tree-to-tree translation
- (Gildea 2003)
- Use parsers on both sides to capture structural differences
- Subtree cloning
- (Habash 2002, Cmejrek et al. 2003)
- Full morphological/syntactic/semantic parsing
- All based on stochastic grammars
44Latest developments in RBMT
- RBMT making a come-back (e.g. METIS)
- Perhaps it was always there, just wasn't represented in CL journals/conferences
- There is some activity, but around the periphery
- Open-source systems
- development for low-density languages
- Much use made of corpus-derived modules, e.g. tagging, chunking
- SMT is now RBMT, only the rules are learned rather than written by linguists
45Overview
- The story so far
- EBMT
- SMT
- Latest developments in RBMT
- Is there convergence?
- Some attempts to classify MT
- (Carl's and Wu's MT model spaces)
- Has the empire struck back?
46Classifications of MT
- Empirical vs. Rationalist
- data- vs theory-driven
- use (or not) of symbolic representation
- From MLIM chapter 4
- high vs. low coverage
- low vs. high quality/fluency
- shallow vs. deep representation
- Distinguish in the above
- design vs. consequence
- How true are they anyway?
47EBMT/SMT: Is there convergence?
- Lively debate on mtlist
- Articles by
- Somers, Turcato & Popowich in Carl & Way (2003)
- Hutchins, Carl, Wu (2006) in special issue of Machine Translation
- Slides marked need your input!
48Essential features of EBMT
- Use of bilingual corpus data as the main (only?) source of knowledge (Somers)
- Most early EBMT systems were hybrids
- We do not know a priori which parts of an example are relevant (Turcato & Popowich)
- Raw data is consulted at run-time, with (little or) no preprocessing
- Therefore template-based EBMT is already a hybrid (with RBMT)
- Act of matching the input against the examples, regardless of how they are stored (Hutchins)
49Pros (and cons) of analogy model
- Like CBR
- Library of cases used during task performance
- Analogous examples broken down, adapted, recombined
- In contrast with other machine learning methods
- Offline learning to compile an abstract performance model
- No loss of coverage due to incorrect generalization during training
- Guaranteed correct when input is exactly like an example in the training set (not true of SMT)
- But: lack of generalization leads to potential runtime inefficiency
(Wu, 2006)
50EBMT/SMT: Common features
- Easily agreed
- Use of bilingual corpus data as the main (only?) source of knowledge
- Translation relations are derived automatically from the data
- Underlying methods are independent of language-pair, and hence of language similarity
- More contentious
- Bilingual corpus data should be real (a practical issue for SMT, but some EBMT systems use hand-crafted examples)
- System can be easily extended just by adding more data
51EBMT/RBMT: common features
- Hybrid is easy to conceive
- Rule-based analysis/generation with example-based transfer
- Example-based processing only for awkward cases
52SMT/RBMT: common features
- Some versions of SMT exactly mirror classic RBMT
- parse-transfer-generate
- Same things are hard
- Long-distance dependency
- Discontinuous constituents
53Wu's 3D classification of all MT
- Example-based vs. schema-based
- abstraction or generalization performed at
run-time - Compositional vs. lexical
- Relates primarily to transfer (or equiv.)
- Statistical vs. logical
- Pictures also show historical development
54Classic (direct and transfer) MT models
- Early systems (Georgetown): lexical and compositional
- Treatment of idioms, collocations, phrasal translations in classical 2G transfer systems
- Modern RBMT systems starting to adopt statistical methods (according to Wu)
- Where do commercial systems sit?
55(No Transcript)
56EBMT systems
57SMT systems
58Example-based SMT systems
59Summary
60Model space for corpus-based MT (Carl 2000)
- Based on Dummett's theory of meaning
- Rich vs austere
- Complexity of representations
- Molecular vs holistic
- Descriptions based on a finite set of predefined features vs global distinctions
- Fine-grained vs coarse-grained
- Based on smaller or larger units
61Rich vs austere
- Translation memories are most austere, depending only on graphemic similarity
- TMs with annotated examples (e.g. Planas, Furuse) are richer
- Early EBMT systems, and recent systems where examples are generalized, are rich
- EBMT using light annotation (e.g. tags, markers) is moderately rich
- Pure EBMT (Lepage & Denoual) is austere
- Early SMT systems were austere, but the move towards syntax makes them richer
- Phrase-based SMT still austere
62[Figure: MT systems placed along the austere–rich axis; diagram labels only]
- METIS
- EBMT where examples are lightly annotated
- Phrase-based SMT
- Syntax-based SMT
- Pure EBMT (Lepage)
- Marker-based EBMT (Way)
- Template-based EBMT (McTait, Brown, Cicekli)
- Early SMT (Brown et al.)
- Annotated translation memories
- Translation memories
- Classic EBMT (Sato, Nagao)
63Molecular vs holistic
- Early SMT purely holistic, as is pure EBMT
- TMs molecular: distance measure based on a fixed set of symbols
- Translation templates are holistic, but molecular if they depend on some sort of analysis
- Phrase-based and syntax-based SMT highly molecular
64[Figure: MT systems placed along the molecular–holistic axis; diagram labels only]
- EBMT where examples are lightly annotated
- Early SMT (Brown et al.)
- METIS generation
- Template-based EBMT (McTait, Brown)
- Pure EBMT (Lepage)
- Annotated translation memories
- Phrase-based SMT
- Syntax-based SMT
- Marker-based EBMT (Way)
- Classic EBMT (Sato, Nagao)
- METIS analysis
- Translation memories
- Template-based EBMT (Cicekli)
65Coarse- vs. fine-grained
- Coarse-grained: translates with bigger units
- TM systems work only on sentences: coarse-grained
- Word-based systems are fine-grained (early SMT)
- Phrase-based SMT slightly more coarse-grained
- Template-based EBMT fine-grained
66[Figure: MT systems placed along the fine–coarse axis; diagram labels only]
- Template-based EBMT (McTait, Brown)
- Early SMT (Brown et al.)
- fine
- Phrase-based SMT
- Marker-based EBMT (Way)
- Translation memories
- coarse
67Overview
- The story so far
- EBMT
- SMT
- Latest developments in RBMT
- Is there convergence?
- Some attempts to classify MT
- (Carl's and Wu's MT model spaces)
- Has the empire struck back?
68Has the empire struck back?
- Is linguistics back in MT?
- Was MT ever of interest to linguists?
- Is SMT like RBMT?
69Vauquois triangle
To what extent can a given system be described in terms of the classic view of MT (G2)?
70Has the empire struck back?
- Is linguistics back in MT?
- Was MT ever of interest to linguists?
- Is SMT like RBMT?
- As predicted by Wilks (stone soup talk, 1992), the way forward is hybrid
- Negative experience (for me) of seeing SMT presenters rediscovering problems first described by Yngve, Vauquois ...
- ... without referencing the original papers!
71LINGUISTICS
72[Figure: 'Fill in the gaps' diagram relating EBMT, SMT and RBMT; diagram labels only]
- EBMT
- SMT
- Fill in the gaps
- Early SMT (Brown et al.)
- EBMT where examples are lightly annotated
- Pure EBMT (Lepage)
- Annotated translation memories
- Phrase-based SMT
- Template-based EBMT (McTait, Brown)
- Syntax-based SMT
- Classic EBMT (Sato, Nagao)
- Marker-based EBMT (Way)
- Template-based EBMT (Cicekli)
- Translation memories
- RBMT