1
Latest Developments in (S)MT
MT Wars II: The Empire (Linguistics) strikes back
  • Harold Somers
  • University of Manchester

2
Overview
  • The story so far
  • EBMT
  • SMT
  • Latest developments in RBMT
  • Is there convergence?
  • Some attempts to classify MT
  • (Carl's and Wu's MT model spaces)
  • Has the empire struck back?

3
The story so far: EBMT
  • Early history well known
  • Nagao (1981/3)
  • Early development as part of RBMT
  • Relationship with Translation Memories
  • Focus (cf. Somers 1998) on
  • Matching algorithms
  • Selection and storage of examples
  • Mainly sentence-based
  • TL generation (Recombination) not much addressed

Somers, H. (1998) 'New paradigms in MT', 10th
European Summer School in Logic, Language and
Information, Workshop on MT, Saarbrücken; revised
version in Machine Translation 14 (1999); 2nd
revised version in M. Carl & A. Way (eds) (2003)
Recent Advances in EBMT (Kluwer).
4
EBMT in a nutshell
  • (In case you've been on Tatooine for the last 15
    years)
  • Database of paired examples
  • Translation involves
  • Finding the best example(s) (matching)
  • Identifying which bits do(n't) match (alignment)
  • Replacing the non-matching bits (if multiple
    examples, gluing them together) (recombination)
  • All of the above at run-time
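As a concrete illustration of the three run-time steps just listed, here is a deliberately naive Python sketch; the example database, lexicon, and similarity measure are all invented stand-ins for the much richer matching and recombination machinery of real systems.

    from difflib import SequenceMatcher

    # Toy data: a real system would hold many thousands of examples.
    EXAMPLES = [
        ("the old man is dead", "le vieil homme est mort"),
        ("the file is hidden", "le fichier est masqué"),
    ]
    LEXICON = {"man": "homme", "woman": "femme"}  # SL word -> TL word

    def translate(src):
        # Matching: retrieve the stored example most similar to the input
        ex_src, ex_tgt = max(
            EXAMPLES, key=lambda ex: SequenceMatcher(None, src, ex[0]).ratio())
        # Alignment: find which input words do(n't) match the example
        tgt = ex_tgt
        for s_word, e_word in zip(src.split(), ex_src.split()):
            if s_word != e_word:
                # Recombination: substitute the translation of the
                # mismatched bit into the example's translation
                tgt = tgt.replace(LEXICON[e_word], LEXICON[s_word])
        return tgt

    print(translate("the old woman is dead"))  # le vieil femme est mort (!)

Note that the output is precisely the boundary-friction error shown on the next slide: the substituted word is fine in isolation, but the surrounding agreement (la vieille ... morte) is not repaired.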

5
EBMT in a nutshell (cont.)
  • Main difficulty is boundary friction, in two
    senses:

Sense 1 (grammatical agreement across the join):
The old man is dead → Le vieil homme est mort
The old woman is dead → *Le vieil femme est mort

Sense 2 (gluing together fragments from different examples):
Input: The operation was interrupted because the file was hidden.
a. The operation was interrupted because the Ctrl-c key was pressed.
   → L'opération a été interrompue car la touche Ctrl-c a été enfoncée.
b. The specified method failed because the file is hidden.
   → La méthode spécifiée a échoué car le fichier est masqué.
6
EBMT: later developments
  • Example generalisation (templates)
  • Incorporation of linguistic resources and/or
    statistical measures
  • Structured representation of examples
  • Use of statistical techniques

7
Example generalisation
  • (Furuse & Iida, Kaji et al., Matsumoto et al.,
    Carl, Cicekli & Güvenir, Brown, McTait, Way
    et al.)
  • Similar examples can be combined to give a more
    general example
  • Can be seen as a way of generating transfer rules
    (and lexicons)
  • Process may be entirely automatic, based on
    string matching
  • or seeded using linguistic information (POS
    tags) or resources (bilingual dictionary)

8
Example generalisation (cont.)
The monkey ate a peach → saru wa momo o tabeta
The man ate a peach → hito wa momo o tabeta
monkey → saru; man → hito
The X ate a peach → X wa momo o tabeta
The dog ate a rabbit → inu wa usagi o tabeta
dog → inu; rabbit → usagi
The X ate a Y → X wa Y o tabeta
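A minimal Python sketch of how such generalisation can be done by pure string matching, restricted to the simplest case (two examples of equal length differing in exactly one token); all names and data are illustrative only.

    def diff_one_slot(a, b):
        # Align token-by-token; replace the differing tokens by a slot X.
        assert len(a) == len(b)
        template, diff = [], []
        for x, y in zip(a, b):
            if x == y:
                template.append(x)
            else:
                template.append("X")
                diff.append((x, y))
        return " ".join(template), diff

    def generalize(pair1, pair2):
        (s1, t1), (s2, t2) = pair1, pair2
        s_tpl, s_diff = diff_one_slot(s1.split(), s2.split())
        t_tpl, t_diff = diff_one_slot(t1.split(), t2.split())
        # The paired templates act as a transfer rule; the paired
        # differences feed a bilingual lexicon.
        return (s_tpl, t_tpl), list(zip(s_diff, t_diff))

    rule, lexicon = generalize(
        ("the monkey ate a peach", "saru wa momo o tabeta"),
        ("the man ate a peach", "hito wa momo o tabeta"))
    print(rule)     # ('the X ate a peach', 'X wa momo o tabeta')
    print(lexicon)  # [(('monkey', 'man'), ('saru', 'hito'))]

As the next slide points out, this is too simple: nothing yet constrains what may fill the slot X.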
9
Example generalisation (cont.)
  • That's too simple (e.g. because of boundary
    friction)
  • Need to introduce constraints on the slots, e.g.
    using POS tags and morphological information
    (which implies some other processing)
  • Can use clustering algorithms to infer
    substitution sets

10
Incorporation of linguistic resources
  • Actually, early EBMT used all sorts of linguistic
    resources
  • Briefly, there was a move towards more 'pure'
    approaches
  • Now we see much use of POS tags (sometimes only
    partial, e.g. marker words: Way et al.),
    morphological analysis (as just mentioned),
    bilingual lexicons
  • Target-language grammars for the
    recombination/generation phase

11
Incorporation of statistical measures
  • Example database preprocessed to assign weights
    (probabilities) to fragments and their
    translations (Aramaki et al.)
  • Good way of handling ambiguities due to
    alternative translations
  • Clustering words into equivalence classes for
    example generalization (Brown)
  • Using statistical tools to extract translation
    knowledge from parallel corpora (Yamamoto &
    Matsumoto)
  • Statistically induced grammars for translation or
    generation, as in ...

12
Use of structured representations
  • Again, a feature of early EBMT, now reappearing
  • Translation grammars induced from the example set
  • Examples stored as tree structures
    (overwhelmingly dependency structures)

13
Translation grammars
  • Carl generates translation grammars from aligned
    linguistically annotated texts
  • Way: Data-Oriented Translation based on Poutsma's
    DOP, using both PS and LFG models

14
Structured examples
  • Use of tree comparison algorithms to extract
    translation patterns from parsed corpora/tree
    banks (Watanabe et al.)
  • Translation pairings extracted from aligned
    parsed examples (Menezes & Richardson)
  • Tree-to-string approach used by Langlais & Gotti
    and Liu et al. (+ statistical generation model)

15
Typical use of structured examples
  • Rule-based analysis and generation +
    example-based transfer
  • Input is parsed into a representation using a
    traditional or statistics-based analyser
  • TL representation constructed by combining
    translation mappings learned from the parallel
    corpus
  • TL sentence generated using a hand-written or
    machine-learned generation grammar
  • Is this still EBMT?
  • Note that the only example-based part is use of
    mappings which are learned, not computed at
    run-time

16
Pure EBMT (Lepage & Denoual)
  • In contrast (but now something of an oddity):
    'pure' analogy-based EBMT
  • Use of proportional analogies A : B :: C : D
  • Terms in the analogies are translation pairs
    A↔A′, B↔B′, C↔C′, D↔D′

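For a flavour of what run-time analogy solving involves, here is a Python sketch that handles only the simplest case, substitution of a shared suffix; Lepage & Denoual's actual solver is far more general.

    def solve_analogy(a, b, c):
        # Solve A : B :: C : D for D, suffix-substitution case only.
        i = 0
        while i < min(len(a), len(b)) and a[i] == b[i]:
            i += 1                      # common prefix of A and B
        suf_a, suf_b = a[i:], b[i:]
        if not c.endswith(suf_a):
            return None                 # outside this sketch's simple case
        return c[:len(c) - len(suf_a)] + suf_b

    print(solve_analogy("walker", "walking", "reader"))  # reading

In the translation setting the analogy is solved twice over: if the input D completes A : B :: C : D on the source side, the output D′ is obtained by solving A′ : B′ :: C′ : D′ over the stored translations.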
17
(No Transcript)
18
Pure EBMT
  • No explicit transfer
  • No extraction of symbolic knowledge
  • No use of templates
  • Analogies do not always represent any sort of
    linguistic reality
  • No training or preprocessing
  • Solving the proportional analogies is done at
    run-time

19
The story so far (SMT)
  • Early history well known
  • IBM group inspired by improved results in speech
    recognition when non-linguistic approach taken
  • Availability of Canadian Hansards inspired purely
    statistical approach to MT (1988)
  • Immediate partial success (60%) to the dismay of
    MT people
  • Early observers (Wilks) predicted hybrid methods
    ('stone soup') would evolve
  • Later developments
  • Phrase-based SMT
  • Syntax-based SMT

20
SMT in a nutshell
  • (In case you've been on Kamino for the last 15
    years)
  • From a parallel corpus, two sets of statistical
    data are extracted
  • Translation model: probabilities that a given
    word e in the SL gives rise to a word f in the TL
  • (Target) language model: the most probable word
    order for the words predicted by the translation
    model
  • These two models are computed off-line
  • Given an input sentence, a decoder applies the
    two models and juggles the probabilities to get
    the best score; various methods have been
    proposed
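In the usual noisy-channel formulation (keeping the slide's lettering, with e the SL sentence and f the TL sentence), the decoder's job is, by Bayes' rule,

    \hat{f} = \arg\max_f P(f \mid e)
            = \arg\max_f \underbrace{P(e \mid f)}_{\text{translation model}} \,
                         \underbrace{P(f)}_{\text{language model}}

where P(e) drops out of the maximization because the input sentence is fixed.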

21
SMT in a nutshell (cont.)
  • The translation model has to take into account
    the fact that
  • for a given e there may be various different
    f's depending on context (grammatical variants as
    well as alternatives due to polysemy or homonymy)
  • a given e may not necessarily correspond to a
    single f, or to any f at all: fertility
  • (e.g. may have → aurait, implemented → mis en
    application)
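Schematically, fertility enters the translation model as an explicit factor. In an IBM Model 3 style decomposition (omitting the NULL-word and combinatorial terms),

    P(f, a \mid e) \propto \prod_{i=1}^{l} n(\phi_i \mid e_i) \;
                           \prod_{j=1}^{m} t(f_j \mid e_{a_j}) \;
                           \prod_{j=1}^{m} d(j \mid a_j, m, l)

where n is the fertility model (how many TL words each e_i produces), t the word translation table, and d the distortion model taken up on the next slide.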

22
SMT in a nutshell (cont.)
  • The language model has to take into account the
    fact that
  • The TL words predicted by the translation model
    will not occur in the same order as the SL words:
    distortion
  • TL word choices can depend on neighbouring words
    (which may be easy to model) or, especially
    because of distortion, more distant words:
    long-distance dependencies, much harder to
    model
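The 'easy to model' part is the standard n-gram approximation; a trigram language model, for instance, scores the TL string as

    P(f_1 \dots f_m) \approx \prod_{j=1}^{m} P(f_j \mid f_{j-2}, f_{j-1})

which is exactly why dependencies that fall outside the n-1 word window (the long-distance cases above) are so hard to capture.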

23
SMT in a nutshell (cont.)
  • Main difficulty: the combination of fertility
    and distortion
  • Zeitmangel erschwert das Problem.
  • Lack of time makes the problem more difficult.
  • Eine Diskussion erübrigt sich demnach.
  • Therefore there is no point in discussion.
  • Das ist der Sache nicht angemessen.
  • That is not appropriate for this matter.
  • Den Vorschlag lehnt die Kommission ab.
  • The Commission rejects the proposal.

24
SMT: later developments
  • Phrase-based SMT
  • Extend models beyond individual words to word
    sequences (phrases)
  • Direct phrase alignment
  • Word alignment induced phrase model
  • Alignment templates
  • Results better than word-based models, and show
    improvement proportional (log-linear) to corpus
    size
  • Phrases need not correspond to constituents;
    limiting them to constituents hurts results

25
Direct phrase alignment
  • (Wang & Waibel 1998; Och et al. 1999; Marcu &
    Wong 2002)
  • Enhance word translation model by adding joint
    probabilities, i.e. probabilities for phrases
  • Phrase probabilities compensate for missing
    lexical probabilities
  • Easy to integrate probabilities from different
    sources/methods, allows for mutual compensation

26
Word alignment induced model
  • Koehn et al. 2003; example stolen from Knight &
    Koehn, http://www.iccs.inf.ed.ac.uk/~pkoehn/publications/tutorial2003.pdf

Maria did not slap the green witch
Maria no daba una bofetada a la bruja verde
Start with all phrase pairs justified by the word
alignment
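The 'justified by the word alignment' criterion can be stated compactly: a candidate phrase pair is kept only if no alignment link crosses its borders. Below is a Python sketch; the alignment points are my illustrative reading of the example (0-based indices), not taken from the tutorial itself.

    # Spanish: Maria(0) no(1) daba(2) una(3) bofetada(4) a(5) la(6)
    #          bruja(7) verde(8)
    # English: Maria(0) did(1) not(2) slap(3) the(4) green(5) witch(6)
    ALIGN = [(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3),
             (5, 4), (6, 4), (7, 6), (8, 5)]

    def extract_phrases(n_src, alignment, max_len=4):
        pairs = []
        for s1 in range(n_src):
            for s2 in range(s1, min(s1 + max_len, n_src)):
                # target positions linked to the source span [s1, s2]
                ts = [t for (s, t) in alignment if s1 <= s <= s2]
                if not ts:
                    continue
                t1, t2 = min(ts), max(ts)
                # consistency: nothing inside the target span may link
                # back to a source word outside [s1, s2]
                if all(s1 <= s <= s2
                       for (s, t) in alignment if t1 <= t <= t2):
                    pairs.append(((s1, s2), (t1, t2)))
        return pairs

    print(extract_phrases(9, ALIGN))
    # keeps ((2, 4), (3, 3)) = (daba una bofetada, slap), but rejects
    # ((2, 2), (3, 3)) = (daba, slap), which fails the consistency test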
27
Word alignment induced model
  • Koehn et al. 2003; example stolen from Knight &
    Koehn, http://www.iccs.inf.ed.ac.uk/~pkoehn/publications/tutorial2003.pdf

(Maria, Maria), (no, did not), (daba una bofetada,
slap), (a la, the), (verde, green), (bruja,
witch)
28
Word alignment induced model
  • Koehn et al. 2003; example stolen from Knight &
    Koehn, http://www.iccs.inf.ed.ac.uk/~pkoehn/publications/tutorial2003.pdf

(Maria, Maria), (no, did not), (daba una bofetada,
slap), (a la, the), (verde, green), (bruja,
witch), (Maria no, Maria did not), (no daba una
bofetada, did not slap), (daba una bofetada a la,
slap the), (bruja verde, green witch)
etc.
29
Word alignment induced model
  • Koehn et al. 2003; example stolen from Knight &
    Koehn, http://www.iccs.inf.ed.ac.uk/~pkoehn/publications/tutorial2003.pdf

(Maria, Maria), (no, did not), (daba una bofetada,
slap), (a la, the), (bruja, witch), (verde,
green), (Maria no, Maria did not), (no daba una
bofetada, did not slap), (daba una bofetada a la,
slap the), (bruja verde, green witch), (Maria no
daba una bofetada, Maria did not slap), (no daba
una bofetada a la, did not slap the), (a la
bruja verde, the green witch), (Maria no daba una
bofetada a la, Maria did not slap the), (daba una
bofetada a la bruja verde, slap the green
witch), (no daba una bofetada a la bruja verde,
did not slap the green witch), (Maria no daba una
bofetada a la bruja verde, Maria did not slap the
green witch)
30
Word alignment induced model
  • Given the phrase pairs collected, estimate the
    phrase translation probability distribution by
    relative frequency (without smoothing)
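In the usual notation, with ē and f̄ ranging over the extracted phrase pairs, the estimate mentioned here is simply

    \phi(\bar{f} \mid \bar{e}) =
        \frac{\operatorname{count}(\bar{e}, \bar{f})}
             {\sum_{\bar{f}'} \operatorname{count}(\bar{e}, \bar{f}')}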

31
Alignment templates
  • (Och et al. 1999; further developed by Marcu &
    Wong 2002, Koehn & Knight 2003, Koehn et al.
    2003)
  • Problem of sparse data worse for phrases
  • So use word classes instead of words
  • alignment templates instead of phrases
  • more reliable statistics for translation table
  • smaller translation table
  • more complex decoding
  • Word classes are induced (by distributional
    statistics), so may not correspond to intuitive
    (linguistic) classes
  • Takes context into account

32
Problems with phrase-based models
  • Still do not handle very well ...
  • dependencies (especially long-distance)
  • distortion
  • discontinuities (e.g. bought ↔ habe ... gekauft)
  • More promising seems to be ...

33
Syntax-based SMT
  • Better able to handle
  • Constituents
  • Function words
  • Grammatical context (e.g. case marking)
  • Inversion Transduction Grammars
  • Hierarchical transduction model
  • Tree-to-string translation
  • Tree-to-tree translation

34
Inversion transduction grammars
  • Wu and colleagues (1997 onwards)
  • Grammar generates two trees in parallel and
    mappings between them
  • Rules can specify order changes
  • Restriction to binary rules limits complexity
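Schematically, in the bracket notation usual for ITGs, a straight rule keeps the order of its two constituents in both languages while an inverted rule flips them on one side; the toy rules below are invented for illustration.

    VP → [ V NP ]       straight: V NP order in both languages
    VP → ⟨ V NP ⟩       inverted: V NP in one language, NP V in the other
    N  → witch / bruja  lexical rule pairing the two vocabularies

The binary-branching restriction mentioned above is what keeps bilingual parsing with such grammars tractable.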

35
Inversion transduction grammars
36
Inversion transduction grammars
  • Grammar is trained on a word-aligned bilingual
    corpus; note that all the rules are learned
    automatically
  • Translation uses a decoder which effectively
    works like traditional RBMT
  • Parser uses source side of transduction rules to
    build a parse tree
  • Transduction rules are applied to transform the
    tree
  • The target text is generated by linearizing the
    tree
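A toy Python sketch of the last two steps, assuming the transduction rules have already marked each binary node as straight or inverted and translated the leaves; the tree encoding is invented for illustration.

    def linearize(node):
        # A node is either a translated leaf (a string) or a triple
        # (orientation, left, right), orientation 'straight'/'inverted'.
        if isinstance(node, str):
            return [node]
        orientation, left, right = node
        l, r = linearize(left), linearize(right)
        return l + r if orientation == "straight" else r + l

    # leaves hold TL words in SL constituent order; the inverted node
    # swaps the Adj/N pair ("green witch" -> "bruja verde")
    tree = ("straight", "la", ("inverted", "verde", "bruja"))
    print(" ".join(linearize(tree)))  # la bruja verde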

37
(No Transcript)
38
(No Transcript)
39
Almost all possible mappings can be handled.
Missing ones (crossing constraints) are not found
in Wu's corpus, but examples can be found,
apparently.
40
Hierarchical transduction model
  • (Alshawi et al. 1998)
  • Based on finite-state transducers, also uses
    binary notation
  • Uses automatically induced dependency structure
  • Initial head-word pair is chosen
  • Sentence is then expanded by translating the
    dependent structures

41
Tree-to-string translation
  • (Yamada & Knight 2001, Charniak 2003)
  • Uses (statistical) parser on input side only
  • Tree is then subject to reordering and insertion
    according to models learned from data
  • Lexical translation is then done, again according
    to probability models

42
(No Transcript)
43
Tree-to-tree translation
  • (Gildea 2003)
  • Use parser on both sides to capture structural
    differences
  • Subtree cloning
  • (Habash 2002, Čmejrek et al. 2003)
  • Full morphological/syntactic/semantic parsing
  • All based on stochastic grammars

44
Latest developments in RBMT
  • RBMT making a come-back (e.g. METIS)
  • Perhaps it was always there, just wasn't
    represented in CL journals/conferences
  • There is some activity, but around the periphery
  • Open-source systems
  • development for low-density languages
  • Much use made of corpus-derived modules, e.g.
    tagging, chunking
  • SMT is now RBMT, only the rules are learned
    rather than written by linguists

45
Overview
  • The story so far
  • EBMT
  • SMT
  • Latest developments in RBMT
  • Is there convergence?
  • Some attempts to classify MT
  • (Carl's and Wu's MT model spaces)
  • Has the empire struck back?

46
Classifications of MT
  • Empirical vs. Rationalist
  • data- vs theory-driven
  • use (or not) of symbolic representation
  • From MLIM chapter 4
  • high vs. low coverage
  • low vs. high quality/fluency
  • shallow vs. deep representation
  • Distinguish in the above
  • design vs. consequence
  • How true are they anyway?

47
EBMT/SMT: Is there convergence?
  • Lively debate on the mtlist mailing list
  • Articles by
  • Somers, Turcato & Popowich in Carl & Way (2003)
  • Hutchins, Carl, Wu (2006) in special issue of
    Machine Translation
  • Slides marked need your input!

48
Essential features of EBMT
  • Use of bilingual corpus data as the main (only?)
    source of knowledge (Somers)
  • Most early EBMT systems were hybrids
  • We do not know a priori which parts of an example
    are relevant (Turcato & Popowich)
  • Raw data is consulted at run-time; (little or) no
    preprocessing
  • Therefore template-based EBMT is already a hybrid
    (with RBMT)
  • Act of matching the input against the examples,
    regardless of how they are stored (Hutchins)

49
Pros (and cons) of analogy model
  • Like CBR (case-based reasoning)
  • Library of cases used during task performance
  • Analogous examples broken down, adapted,
    recombined
  • In contrast with other machine learning methods:
  • Offline learning to compile an abstract
    performance model
  • No loss of coverage due to incorrect
    generalization during training
  • Guaranteed correct when input is exactly like an
    example in the training set (not true of SMT)
  • But Lack of generalization leads to potential
    runtime inefficiency

(Wu, 2006)
50
EBMT/SMT: Common features
  • Easily agreed
  • Use of bilingual corpus data as the main (only?)
    source of knowledge
  • Translation relations are derived automatically
    from the data
  • Underlying methods are independent of
    language-pair, and hence of language similarity
  • More contentious
  • Bilingual corpus data should be real (a practical
    issue for SMT, but some EBMT systems use
    hand-crafted examples)
  • System can be easily extended just by adding more
    data

51
EBMT/RBMT common features
  • Hybrid is easy to conceive
  • Rule-based analysis/generation with example-based
    transfer
  • Example-based processing only for awkward cases

52
SMT/RBMT common features
  • Some versions of SMT exactly mirror classic RBMT
  • parse-transfer-generate
  • Same things are hard
  • Long-distance dependency
  • Discontinuous constituents

53
Wu's 3D classification of all MT
  • Example-based vs. schema-based
  • abstraction or generalization performed at
    run-time
  • Compositional vs. lexical
  • Relates primarily to transfer (or equiv.)
  • Statistical vs. logical
  • Pictures also show historical development

54
Classic (direct and transfer) MT models
  • Early systems (Georgetown) lexical and
    compositional
  • Treatment of idioms, collocations, phrasal
    translations in classical 2G transfer systems
  • Modern RBMT systems starting to adopt statistical
    methods (according to Wu)
  • Where do commercial systems sit?

55
(No Transcript)
56
EBMT systems
57
SMT systems
58
Example-based SMT systems
59
Summary
60
Model space for corpus-based MT (Carl 2000)
  • Based on Dummett's theory of meaning
  • Rich vs austere
  • Complexity of representations
  • Molecular vs holistic
  • Descriptions based on finite set of predefined
    features vs global distinctions
  • Fine-grained vs coarse-grained
  • Based on smaller or larger units

61
Rich vs austere
  • Translation memories are most austere, depending
    only on graphemic similarity
  • TMs with annotated examples (e.g. Planas, Furuse)
    are richer
  • Early EBMT systems, and recent systems where
    examples are generalized, are rich
  • EBMT using light annotation (e.g. tags, markers)
    is moderately rich
  • Pure EBMT (Lepage & Denoual) is austere
  • Early SMT systems were austere, but the move
    towards syntax makes them richer
  • Phrase-based SMT is still austere

62
[Diagram: systems placed on the rich/austere axis. Labels: METIS; EBMT where examples are lightly annotated; Phrase-based SMT; Syntax-based SMT; Pure EBMT (Lepage); Marker-based EBMT (Way); Template-based EBMT (McTait, Brown, Cicekli); Early SMT (Brown et al.); Annotated translation memories; Translation memories; Classic EBMT (Sato, Nagao)]
63
Molecular vs holistic
  • Early SMT purely holistic, as is pure EBMT
  • TMs molecular: distance measure based on a fixed
    set of symbols
  • Translation templates are holistic, but molecular
    if they depend on some sort of analysis
  • Phrase-based and syntax-based SMT highly
    molecular

64
[Diagram: systems placed on the molecular/holistic axis. Labels: EBMT where examples are lightly annotated; Early SMT (Brown et al.); METIS generation; Template-based EBMT (McTait, Brown); Pure EBMT (Lepage); Annotated translation memories; Phrase-based SMT; Syntax-based SMT; Marker-based EBMT (Way); Classic EBMT (Sato, Nagao); METIS analysis; Translation memories; Template-based EBMT (Cicekli)]
65
Coarse- vs. fine-grained
  • Coarse-grained: translates with bigger units
  • TM systems work only on sentences: coarse-grained
  • Word-based systems are fine-grained (early SMT)
  • Phrase-based SMT slightly more coarse-grained
  • Template-based EBMT fine-grained

66
[Diagram: systems on the fine-grained/coarse-grained axis. Labels: Template-based EBMT (McTait, Brown); Early SMT (Brown et al.); Phrase-based SMT; Marker-based EBMT (Way); Translation memories; with 'fine' and 'coarse' marking the ends of the axis]
67
Overview
  • The story so far
  • EBMT
  • SMT
  • Latest developments in RBMT
  • Is there convergence?
  • Some attempts to classify MT
  • (Carl's and Wu's MT model spaces)
  • Has the empire struck back?

68
Has the empire struck back?
  • Is linguistics back in MT?
  • Was MT ever of interest to linguists?
  • Is SMT like RBMT?

69
Vauquois triangle
To what extent can a given system be described in
terms of the classic view of MT (G2)?
70
Has the empire struck back?
  • Is linguistics back in MT?
  • Was MT ever of interest to linguists?
  • Is SMT like RBMT?
  • As predicted by Wilks ('Stone soup' talk, 1992),
    the way forward is hybrid
  • Negative experience (for me) of seeing SMT
    presenters rediscovering problems first described
    by Yngve, Vauquois ...
  • ... without referencing the original papers!

71
LINGUISTICS
72
[Diagram: 'Fill in the gaps': EBMT, SMT, and RBMT regions with systems placed among them: Early SMT (Brown et al.); EBMT where examples are lightly annotated; Pure EBMT (Lepage); Annotated translation memories; Phrase-based SMT; Template-based EBMT (McTait, Brown); Syntax-based SMT; Classic EBMT (Sato, Nagao); Marker-based EBMT (Way); Template-based EBMT (Cicekli); Translation memories]
RBMT