1
Corpora and Translation
  • Parallel corpora
  • Statistical MT
  • (not to mention corpora of translated text, for
    translation studies)

2
Parallel corpora
  • Corpora of texts and their translations
  • Basic idea that such parallel corpora implicitly
    contain lots of information about translation
    equivalence
  • Nowadays many such bitexts are available
  • bilingual countries have laws, parliamentary
    proceedings, and other documents
  • large multinational organizations (the UN; the EU,
    e.g. the Europarl corpus; etc.)
  • multinational commercial organizations produce
    multilingual texts

3
Bilingual concordance
Source: TransSearch, Laboratoire de Recherche
Appliquée en Linguistique Informatique,
Université de Montréal, http://www-rali.iro.umontreal.ca
4
Parallel corpora
  • Usually not corpora in the strict sense (planned,
    annotated, etc.)
  • Usefulness may depend on
  • the quality of translation
  • the closeness of translation
  • whether we have a text and its translation, or a
    multilingually authored text
  • the language pair
  • Parallel corpus needs to be aligned

5
Alignment
  • Means annotating the bilingual corpus to show
    explicitly the correspondences
  • at sentence level
  • at word and phrase level
  • Main difficulty for sentence alignment is that
    translations do not always keep sentence
    boundaries, or even sentence order
  • In addition, translation may be localized and
    therefore not especially faithful

6
Sentence-level alignment
  • If the parallel corpus is a fairly literal
    translation, this can be done using quite
    low-level information (a sketch follows below)
  • sentence length
  • looking for anchors
  • proper names, dates, figures
  • e.g. in a parliamentary debate, speakers' names
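
The length-based idea can be illustrated with a small sketch. The following is a minimal dynamic-programming aligner loosely in the spirit of Gale & Church (1993), not the original algorithm: it scores beads with a crude length-difference penalty, and the example sentences are invented.

```python
# A minimal dynamic-programming sentence aligner, loosely in the spirit of
# Gale & Church (1993). Assumptions: translations roughly preserve length and
# order; only 1:1, 1:0, 0:1, 2:1 and 1:2 beads are allowed; the cost is a crude
# length-difference penalty rather than the original probabilistic model.

def length_cost(src_len, tgt_len):
    """0 for a perfect length match, approaching 1 for very unequal lengths."""
    if src_len == 0 and tgt_len == 0:
        return 0.0
    return abs(src_len - tgt_len) / (src_len + tgt_len)

def align_sentences(src, tgt, skip_penalty=0.5):
    """Return beads as (source sentence indices, target sentence indices) pairs."""
    n, m = len(src), len(tgt)
    INF = float("inf")
    best = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    moves = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]   # allowed bead shapes
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == INF:
                continue
            for di, dj in moves:
                if i + di > n or j + dj > m:
                    continue
                cost = length_cost(sum(len(s) for s in src[i:i + di]),
                                   sum(len(t) for t in tgt[j:j + dj]))
                if di == 0 or dj == 0:                 # discourage dropped sentences
                    cost += skip_penalty
                if best[i][j] + cost < best[i + di][j + dj]:
                    best[i + di][j + dj] = best[i][j] + cost
                    back[i + di][j + dj] = (di, dj)
    beads, i, j = [], n, m                             # trace back to recover the beads
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return list(reversed(beads))

src = ["The law was adopted in 1998.", "It has two parts."]
tgt = ["La loi a été adoptée en 1998.", "Elle comporte deux parties."]
print(align_sentences(src, tgt))                       # two 1:1 beads expected
```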

7
Alignment tools
8
Corpus-based MT
  • Translation memory (tool for translators)
  • database of previous translations
  • find close matching examples to current
    translation unit
  • translator decides what to do with it
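
As an illustration of the lookup step, here is a minimal sketch of a translation memory using fuzzy string matching; the memory contents, the difflib similarity measure and the 0.7 threshold are illustrative choices, not what commercial TM tools actually use.

```python
# A minimal sketch of a translation-memory lookup: find stored segments similar
# enough to the new source segment to be worth reusing. The memory contents,
# the difflib similarity measure and the 0.7 threshold are illustrative only.
from difflib import SequenceMatcher

memory = [  # hypothetical database of (previous source segment, stored translation)
    ("Press the red button to stop the machine.",
     "Appuyez sur le bouton rouge pour arrêter la machine."),
    ("Press the green button to start the machine.",
     "Appuyez sur le bouton vert pour démarrer la machine."),
]

def tm_lookup(segment, memory, threshold=0.7):
    """Return (score, stored source, stored translation), best matches first."""
    matches = []
    for src, tgt in memory:
        score = SequenceMatcher(None, segment.lower(), src.lower()).ratio()
        if score >= threshold:
            matches.append((score, src, tgt))
    return sorted(matches, reverse=True)

# The translator still decides which bits of the suggested translation to change.
for score, src, tgt in tm_lookup("Press the blue button to stop the machine.", memory):
    print(f"{score:.2f}  {src}  ->  {tgt}")
```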

9
Note that the translator has to know/decide which
bits of the target sentence to change
10
Corpus-based MT
  • Translation memory (tool for translators)
  • database of previous translations
  • find close matching examples to current
    translation unit
  • translator decides what to do with it
  • Example-based translation
  • similar idea, but computer program tries to
    manipulate example(s)
  • may involve learning general rules from
    multiple examples

11
Statistical MT
  • Pioneered by IBM in early 1990s
  • Spurred on by better success in speech
    recognition of statistical over linguistic
    rule-based approaches
  • Idea that translation can be modelled as a
    statistical process
  • Seems to work best in limited domain where given
    data is a good model of future translations

12
Translation as a probabilistic problem
  • For a given SL sentence Si there is an open-ended
    number of possible translations T, of varying
    probability
  • The task is to find, for Si, the sentence Tj for
    which the probability P(Tj | Si) is highest

13
Two models
  • P(Tj | Si) is a function of two models
  • The probabilities of the individual words that
    make up Tj, given the individual words in Si:
    the translation model
  • The probability that the individual words that
    make up Tj are in the appropriate order: the
    language model

14
Expressed in mathematical terms
  • By Bayes' rule, P(Tj | Si) = P(Si | Tj) P(Tj) / P(Si)
  • Since Si is a given, and constant, this can be
    simplified: choose the Tj that maximizes
    P(Si | Tj) P(Tj)
  • P(Si | Tj) is the translation model; P(Tj) is the
    language model
15
So how do we translate?
  • For a given input sentence Si we need a
    practical way to find the Tj that maximizes the
    formula
  • We have to start somewhere, so we start with the
    translation model: which words look most likely
    to help us?
  • In a systematic way we can keep trying different
    combinations, together with the language model,
    until we stop getting improvements (a toy
    illustration follows below)
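
The sketch below illustrates the search in the crudest possible way: it enumerates every combination of word choices and word orders and scores each candidate with the translation model times the language model. Real decoders use beam search over partial hypotheses; the probability tables here are invented.

```python
# A toy brute-force "decoder": try every combination of word choices and word
# orders, score each candidate with translation model x language model, keep
# the best. Real decoders use beam search; the probability tables are invented.
from itertools import product, permutations

t_table = {                      # hypothetical word translation probabilities
    "la":     {"the": 0.7, "it": 0.3},
    "maison": {"house": 0.8, "home": 0.2},
    "bleue":  {"blue": 0.9, "navy": 0.1},
}
lm = {                           # hypothetical bigram language model P(word | previous)
    ("<s>", "the"): 0.4, ("the", "blue"): 0.2, ("blue", "house"): 0.3,
    ("the", "house"): 0.3, ("house", "blue"): 0.01, ("<s>", "it"): 0.1,
}

def decode(source):
    best = (0.0, None)
    options = [list(t_table[w].items()) for w in source]
    for choice in product(*options):                    # one translation per source word
        tm_prob = 1.0
        for _, p in choice:
            tm_prob *= p
        for order in permutations([w for w, _ in choice]):   # every target word order
            lm_prob, prev = 1.0, "<s>"
            for w in order:
                lm_prob *= lm.get((prev, w), 1e-6)      # unseen bigrams get a tiny floor
                prev = w
            if tm_prob * lm_prob > best[0]:
                best = (tm_prob * lm_prob, order)
    return best

print(decode(["la", "maison", "bleue"]))                # ('the', 'blue', 'house') should win
```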

16
Seek improvement by trying other combinations
17
Where do the models come from?
  • All the statistical parameters are pre-computed
    (learned), based on a parallel corpus
  • Language model is probabilities of word sequences
    (n-grams)
  • Translation model is derived from aligned
    parallel corpus
  • This approach is attractive to some as an example
    of machine learning
  • The computer learns to translate (just) from
    seeing previous examples of translation

18
The translation model
  • Take sentence-aligned parallel corpus
  • Extract entire vocabulary for both languages
  • For every word pair, calculate the probability
    that they correspond, e.g. by comparing their
    distributions (a crude sketch follows below)
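
A crude version of this estimation can be sketched as follows: count how often a source word and a target word co-occur in aligned sentence pairs and normalise. Real systems refine such counts with EM (the IBM models); the toy bitext is invented.

```python
# A crude estimate of word correspondences from a sentence-aligned corpus:
# count co-occurrences of source and target words in aligned pairs, then
# normalise. Real systems refine this with EM (the IBM models); toy data only.
from collections import defaultdict

bitext = [
    ("the house", "la maison"),
    ("the blue house", "la maison bleue"),
    ("the car", "la voiture"),
]

cooc = defaultdict(float)
for src_sent, tgt_sent in bitext:
    for s in src_sent.split():
        for t in tgt_sent.split():
            cooc[(s, t)] += 1          # s and t appeared in aligned sentences

def p_correspond(t, s):
    """P(target word t | source word s), normalised over all t seen with s."""
    total = sum(c for (s2, _), c in cooc.items() if s2 == s)
    return cooc[(s, t)] / total if total else 0.0

print(p_correspond("la", "the"))       # 3/7: 'la' co-occurs with 'the' every time
print(p_correspond("bleue", "the"))    # 1/7: an accidental co-occurrence
```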

19
Problem: fertility
  • Fertility: not all word correspondences are 1:1
  • Some words have multiple possible translations,
    e.g. the → le, la, l', les
  • Some words have no translation, e.g. in il se
    rase → he shaves, se corresponds to nothing
  • Some words are translated by several words, e.g.
    cheap → peu cher
  • Not always obvious how to align

20
Problem: distortion
  • Notice that corresponding words do not appear in
    the same order
  • The translation model includes probabilities for
    distortion
  • e.g. P(2→5): the probability that the source word
    in position 2 will produce a target word in
    position 5
  • can be more complex: P(5 | 2, 4, 6), the probability
    that the source word in position 2 will produce a
    target word in position 5 when S has 4 words and
    T has 6 (a counting sketch follows below)
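
A minimal sketch of how such distortion probabilities could be estimated by counting over word-aligned sentence pairs; the toy alignments are invented and positions are 1-based as on the slide.

```python
# A minimal sketch of estimating distortion probabilities from word-aligned
# sentence pairs: P(j | i, l, m) is how often a source word in position i ends
# up in target position j, given source length l and target length m.
# Positions are 1-based as on the slide; the toy alignments are invented.
from collections import defaultdict

aligned = [   # (source length l, target length m, [(source position i, target position j), ...])
    (4, 6, [(1, 1), (2, 5), (3, 2), (4, 6)]),
    (4, 6, [(1, 1), (2, 5), (3, 3), (4, 6)]),
    (4, 6, [(1, 1), (2, 2), (3, 3), (4, 6)]),
]

counts = defaultdict(float)   # key: (j, i, l, m)
totals = defaultdict(float)   # key: (i, l, m)
for l, m, links in aligned:
    for i, j in links:
        counts[(j, i, l, m)] += 1
        totals[(i, l, m)] += 1

def p_distortion(j, i, l, m):
    return counts[(j, i, l, m)] / totals[(i, l, m)] if totals[(i, l, m)] else 0.0

print(p_distortion(5, 2, 4, 6))   # 2/3: position 2 usually maps to position 5 here
```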

21
The language model
  • Impractical to calculate probability of every
    word sequence
  • Many will be very improbable
  • Because they are ungrammatical
  • Or because they happen not to occur in the data
  • Probabilities of sequences of n words (n-grams)
    are more practical
  • Bigram model (sketched below),
  • where P(wi | wi-1) ≈ f(wi-1, wi) / f(wi-1)
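
A minimal sketch of the bigram model estimated by relative frequency, matching the formula above; the toy training sentences are invented.

```python
# A minimal bigram language model estimated by relative frequency, matching
# the formula above: P(wi | wi-1) ~ f(wi-1, wi) / f(wi-1). Toy training text.
from collections import Counter

sentences = ["the house is blue", "the house is big", "the car is blue"]

unigrams, bigrams = Counter(), Counter()
for sent in sentences:
    words = ["<s>"] + sent.split() + ["</s>"]
    unigrams.update(words[:-1])            # history counts f(wi-1)
    bigrams.update(zip(words, words[1:]))  # pair counts f(wi-1, wi)

def p_bigram(word, prev):
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

def sentence_prob(sentence):
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= p_bigram(word, prev)
    return p

print(p_bigram("house", "the"))          # f(the, house) / f(the) = 2/3
print(sentence_prob("the car is big"))   # non-zero although never seen as a whole
```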

22
Sparse data
  • Relying on n-grams with a large n risks
    0-probabilities
  • Bigrams are less risky but sometimes not
    discriminatory enough
  • e.g. I hire men who is good pilots (each
    individual bigram is plausible, but the sentence
    is not)
  • 3- or 4-grams allow a nice compromise, and if a
    3-gram is previously unseen, we can give it a
    score based on the component bigrams
    (smoothing; sketched below)
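
A minimal sketch of the back-off idea: when a trigram is unseen, give it a score built from its component bigrams. The 0.4 back-off weight is arbitrary; real systems use more principled smoothing such as Kneser-Ney.

```python
# A minimal sketch of the back-off idea: an unseen trigram gets a score built
# from its component bigrams instead of probability zero. The 0.4 back-off
# weight is arbitrary; real systems use principled smoothing (e.g. Kneser-Ney).
from collections import Counter

sentences = ["i hire men who are good pilots", "men who are good pilots are rare"]

bigrams, trigrams = Counter(), Counter()
for sent in sentences:
    w = sent.split()
    bigrams.update(zip(w, w[1:]))
    trigrams.update(zip(w, w[1:], w[2:]))

def p_bigram(w2, w1):
    total = sum(c for (a, _), c in bigrams.items() if a == w1)
    return bigrams[(w1, w2)] / total if total else 0.0

def p_trigram(w3, w1, w2):
    """Trigram relative frequency, backing off to bigrams when unseen."""
    seen = trigrams[(w1, w2, w3)]
    if seen:
        history = sum(c for (a, b, _), c in trigrams.items() if (a, b) == (w1, w2))
        return seen / history
    return 0.4 * p_bigram(w2, w1) * p_bigram(w3, w2)   # smoothed score for unseen trigram

print(p_trigram("good", "who", "are"))   # 1.0: the trigram "who are good" was seen
print(p_trigram("rare", "who", "are"))   # ~0.13: unseen, scored from its bigrams
```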

23
Put it all together and …
  • To build a statistical MT system we need
  • Aligned bilingual corpus
  • Training programs which will extract from the
    corpus all the statistical data for the models
  • A decoder which takes a given input and seeks
    the output that maximizes the argmax formula,
    using a heuristic search algorithm
  • Software for this purpose is freely available
  • http://www.statmt.org/moses/, http://www.isi.edu/licensed-sw/pharaoh/
  • The claim is that an MT system for a new language
    pair can be built in a matter of hours

24
SMT latest developments
  • Nevertheless, quality is limited
  • SMT researchers quickly learned that this crude
    approach can only get them so far (quite far,
    actually), but that to go the extra distance you
    need linguistic knowledge (e.g. morphology,
    phrases, constituents)
  • Latest developments aim to incorporate this
  • Big difference is that it too can be LEARNED
    (automatically) from corpora
  • So SMT still contrasts with traditional RBMT
    where rules are hand coded by linguists

25
Direct phrase alignment
  • (Wang & Waibel 1998; Och et al. 1999; Marcu &
    Wong 2002)
  • Enhance word translation model by adding joint
    probabilities, i.e. probabilities for phrases
  • Phrase probabilities compensate for missing
    lexical probabilities
  • Easy to integrate probabilities from different
    sources/methods, allows for mutual compensation

26
Word alignment induced model
  • Koehn et al. 2003; example stolen from Knight &
    Koehn, http://www.iccs.inf.ed.ac.uk/pkoehn/publications/tutorial2003.pdf

Maria did not slap the green witch
Maria no daba una bofetada a la bruja verde
Start with all phrase pairs justified by the word
alignment
27
Word alignment induced model
  • Koehn et al. 2003; example stolen from Knight &
    Koehn, http://www.iccs.inf.ed.ac.uk/pkoehn/publications/tutorial2003.pdf

(Maria, Maria), (no, did not), (daba una bofetada,
slap), (a la, the), (verde, green), (bruja,
witch)
28
Word alignment induced model
  • Koehn et al. 2003; example stolen from Knight &
    Koehn, http://www.iccs.inf.ed.ac.uk/pkoehn/publications/tutorial2003.pdf

(Maria, Maria), (no, did not), (daba una bofetada,
slap), (a la, the), (verde, green), (bruja,
witch), (Maria no, Maria did not), (no daba una
bofetada, did not slap), (daba una bofetada a la,
slap the), (bruja verde, green witch)
etc.
29
Word alignment induced model
  • Koehn et al. 2003; example stolen from Knight &
    Koehn, http://www.iccs.inf.ed.ac.uk/pkoehn/publications/tutorial2003.pdf

(Maria, Maria), (no, did not), (slap, daba una
bofetada), (a la, the), (bruja, witch), (verde,
green), (Maria no, Maria did not), (no daba una
bofetada, did not slap), (daba una bofetada a la,
slap the), (bruja verde, green witch), (Maria no
daba una bofetada, Maria did not slap), (no daba
una bofetada a la, did not slap the), (a la
bruja verde, the green witch), (Maria no daba una
bofetada a la, Maria did not slap the), (daba una
bofetada a la bruja verde, slap the green
witch), (no daba una bofetada a la bruja verde,
did not slap the green witch), (Maria no daba una
bofetada a la bruja verde, Maria did not slap the
green witch)
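
The phrase pairs listed above can be reproduced with a short extraction sketch: a phrase pair is kept if its source and target spans are consistent with the word alignment, i.e. no alignment point links a word inside one span to a word outside the other. The alignment points below are a reconstruction consistent with the pairs on the slides, and the sketch omits the extra pairs that unaligned words would licence.

```python
# A sketch of phrase-pair extraction from a word alignment: keep every pair of
# source and target spans that is consistent with the alignment (no point links
# a word inside either span to a word outside the other). The alignment points
# below are a reconstruction consistent with the pairs listed on the slides;
# the sketch ignores the extra pairs that unaligned words would licence.

def extract_phrases(src, tgt, alignment):
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, len(src)):
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            # consistency: nothing in the target span may link outside the source span
            if any(j1 <= j <= j2 and not i1 <= i <= i2 for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

src = "Maria no daba una bofetada a la bruja verde".split()
tgt = "Maria did not slap the green witch".split()
alignment = [(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3),   # (source index, target index)
             (5, 4), (6, 4), (7, 6), (8, 5)]
for es, en in sorted(extract_phrases(src, tgt, alignment)):
    print(f"({es}, {en})")
```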
30
Alignment templates
  • (Och et al. 1999; further developed by Marcu and
    Wong 2002, Koehn and Knight 2003, Koehn et al.
    2003)
  • Problem of sparse data worse for phrases
  • So use word classes instead of words
  • alignment templates instead of phrases
  • more reliable statistics for translation table
  • smaller translation table
  • more complex decoding
  • Word classes are induced (by distributional
    statistics), so may not correspond to intuitive
    (linguistic) classes (a clustering sketch follows
    below)
  • Takes context into account
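
One way to illustrate the induction is to represent each word by counts of its neighbours and cluster the resulting vectors. The sketch below uses k-means over context counts purely for illustration (real systems use exchange/Brown-style clustering); the toy sentences are invented, and it requires numpy and scikit-learn.

```python
# A sketch of inducing word classes distributionally: represent each word by
# counts of its left and right neighbours, then cluster the vectors. k-means is
# used purely for illustration (real systems use exchange/Brown-style
# clustering); the toy sentences are invented. Requires numpy and scikit-learn.
from collections import Counter, defaultdict
import numpy as np
from sklearn.cluster import KMeans

sentences = ["the red house", "the blue house", "a red car",
             "a blue car", "the red car", "a blue house"]

contexts = defaultdict(Counter)
for sent in sentences:
    w = ["<s>"] + sent.split() + ["</s>"]
    for left, word, right in zip(w, w[1:], w[2:]):
        contexts[word][("L", left)] += 1    # who appears to my left
        contexts[word][("R", right)] += 1   # who appears to my right

words = sorted(contexts)
features = sorted({f for c in contexts.values() for f in c})
X = np.array([[contexts[w][f] for f in features] for w in words], dtype=float)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for cls in range(3):
    print(cls, [w for w, l in zip(words, labels) if l == cls])
# determiners, colour adjectives and nouns should end up in separate classes
```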

31
Problems with phrase-based models
  • Still do not handle very well ...
  • dependencies (especially long-distance)
  • distortion
  • discontinuities (e.g. bought → habe ... gekauft)
  • More promising seems to be ...

32
Syntax-based SMT
  • Better able to handle
  • Constituents
  • Function words
  • Grammatical context (e.g. case marking)
  • Inversion Transduction Grammars
  • Hierarchical transduction model
  • Tree-to-string translation
  • Tree-to-tree translation

33
Inversion transduction grammars
  • Wu and colleagues (1997 onwards)
  • Grammar generates two trees in parallel and
    mappings between them
  • Rules can specify order changes
  • Restriction to binary rules limits complexity

34
Inversion transduction grammars
35
Inversion transduction grammars
  • The grammar is trained on a word-aligned bilingual
    corpus. Note that all the rules are learned
    automatically
  • Translation uses a decoder which effectively
    works like traditional RBMT
  • Parser uses source side of transduction rules to
    build a parse tree
  • Transduction rules are applied to transform the
    tree
  • The target text is generated by linearizing the
    tree (a toy sketch follows below)
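
A toy sketch of the last two steps: each binary node of the tree is either straight or inverted, and the target string falls out by linearizing the tree, swapping the children of inverted nodes. The tree, the tiny lexicon and the "maison bleue" example are invented.

```python
# A toy sketch of the ITG pipeline's last steps: each binary node is either
# "straight" (children keep their order in the target) or "inverted" (children
# swap), and the target text falls out by linearizing the tree. The tree, the
# tiny lexicon and the "maison bleue" example are invented.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    word: Optional[str] = None           # leaf: source word
    translation: Optional[str] = None    # leaf: its target translation
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    inverted: bool = False               # internal node: swap children in the target?

def source_side(node):
    if node.word is not None:
        return [node.word]
    return source_side(node.left) + source_side(node.right)

def target_side(node):
    """Linearize the target string, swapping the children of inverted nodes."""
    if node.word is not None:
        return [node.translation]
    first, second = (node.right, node.left) if node.inverted else (node.left, node.right)
    return target_side(first) + target_side(second)

# "la maison bleue" -> "the blue house": the noun-adjective node is inverted
np_tree = Node(left=Node(word="la", translation="the"),
               right=Node(left=Node(word="maison", translation="house"),
                          right=Node(word="bleue", translation="blue"),
                          inverted=True))
print(" ".join(source_side(np_tree)))   # la maison bleue
print(" ".join(target_side(np_tree)))   # the blue house
```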

36
(No Transcript)
37
(No Transcript)
38
Other approaches
  • Other approaches use more and more linguistic
    information
  • In each case automatically learned, especially
    from treebanks
  • Traditional (rule-based) MT used (hand-written)
    grammars and lexicons
  • State-of-the-art MT is moving back in this
    direction, except that linguistic rules are
    machine learned