METIS - PowerPoint PPT Presentation
1
METIS
  • STATISTICAL MACHINE TRANSLATION USING MONOLINGUAL
    CORPORA

2
METIS-AUTHORS
  • IOANNIS DOLOGLOU
  • STELLA MARKANTONATOU
  • GEORGE TAMBOURATZIS
  • OLGA YANNOUTSOU
  • ATHANASSIA FOURLA
  • NIKOS IOANNOU

3
Structure of the presentation
  • The idea
  • Translation Equivalence Information
  • Description of the system
  • Assessment

4
GOALS
  • METIS aimed to assess the feasibility
  • of obtaining free-text translations of reasonably
    high linguistic quality from large annotated
    monolingual corpora
  • using pattern-matching techniques
  • and by incorporating translation-equivalence
    information at the lemma and structure levels (the
    latter by employing tag-mapping rules)

5
TRANSLATION EQUIVALENCE INFORMATION
  • Tag-equivalence tables
  • Tagsets used:
  • English: CLAWS5 (on the BNC)
  • Greek: ILSP-PAROLE (on the HNC)
  • Dutch: CGN tags (on the Corpus of Spoken Dutch (CGN)
    and the Eindhoven corpus)

6
(No Transcript)
7
(No Transcript)
8
TRANSLATION EQUIVALENCE INFORMATION
  • Tag-mapping rules
  • These are rules which map the input structure
    onto an abstract one which is closer to the
    structure of the translation sought
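Such a rule can be sketched as a pattern-to-replacement rewrite over tag sequences. The sketch below is a minimal illustration of the idea; the tag names and the rule itself are invented for the example, not METIS's actual notation.

```python
# Sketch of a tag-mapping rule: rewrite a source-language tag sequence
# into an abstract sequence closer to the target-language structure.
# Tags and the rule are illustrative, not METIS's actual notation.

def apply_rule(tags, pattern, replacement):
    """Replace every occurrence of `pattern` (a tag subsequence)
    in `tags` with `replacement`."""
    out, i = [], 0
    while i < len(tags):
        if tags[i:i + len(pattern)] == pattern:
            out.extend(replacement)
            i += len(pattern)
        else:
            out.append(tags[i])
            i += 1
    return out

# Hypothetical rule: a Greek present-tense verb tag maps to
# "be" + present participle (English present continuous).
source_tags = ["At", "No", "VbPr", "At", "No"]
result = apply_rule(source_tags, ["VbPr"], ["VBZ", "VVG"])
```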

9

 
10
TRANSLATION EQUIVALENCE INFORMATION
  • Appropriate bilingual lexica
  • These are lexica that provide multiple
    translations of lemmata, expressions and PoS
    information for both languages
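Such a lexicon can be modelled as a mapping from a (lemma, PoS) pair to a list of candidate target lemmata, which naturally accommodates multiple translations. The entries below are illustrative, not taken from METIS's actual lexicon.

```python
# Minimal sketch of a bilingual lexicon: each (source lemma, PoS) pair
# yields one or more candidate target lemmata. Entries are illustrative.
lexicon = {
    ("γυναίκα", "No"): ["woman"],
    ("καθαρίζω", "Vb"): ["peel", "clean"],  # multiple translations
    ("μήλο", "No"): ["apple"],
}

def lookup(lemma, pos):
    """Return all candidate translations, or [] if none is listed."""
    return lexicon.get((lemma, pos), [])
```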

11
(No Transcript)
12
The System
  • System Requirements in Resources
  • Bilingual lexicon file with PoS information
  • Tag-mapping Rules file
  • Tagged and Lemmatized source language sentence
  • Tagged and Lemmatized target language corpus
  • Weights file

13
System Operation (1)
  • Step 1. Lemma-to-lemma translation: the bilingual
    lexicon maps the source sentence SS to TS1,
    recording the term correspondence.
  • Step 2. Rule application: the tag-mapping rules
    map TS1 onto TS2.
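The two steps above can be sketched end to end: lemma-to-lemma translation via the lexicon produces TS1, and rule application turns TS1 into TS2. The lexicon entries, the tag names, and the single rule used here are illustrative assumptions, not METIS's actual resources.

```python
# Sketch of System Operation steps 1-2 (illustrative data and tags).

lexicon = {("ο", "At"): "the", ("γυναίκα", "No"): "woman",
           ("καθαρίζω", "Vb"): "peel", ("το", "At"): "the",
           ("μήλο", "No"): "apple"}

def lemma_to_lemma(ss):
    """Step 1: translate each (lemma, tag) of the source sentence SS
    via the bilingual lexicon, keeping the source tag -> TS1."""
    return [(lexicon[(lemma, tag)], tag) for lemma, tag in ss]

def apply_rules(ts1):
    """Step 2: map source tags onto target-side tags -> TS2.
    A single hypothetical rule expands a present-tense verb into
    'be' + present participle (present continuous)."""
    ts2 = []
    for lemma, tag in ts1:
        if tag == "Vb":
            ts2.append(("be", "VBZ"))
            ts2.append((lemma, "VVG"))
        else:
            ts2.append((lemma, {"At": "AT0", "No": "NN1"}[tag]))
    return ts2

ss = [("ο", "At"), ("γυναίκα", "No"), ("καθαρίζω", "Vb"),
      ("το", "At"), ("μήλο", "No")]
ts2 = apply_rules(lemma_to_lemma(ss))
```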
14
System Operation (2)
  • Step 3. Corpus search: TS2 is matched against the
    tagged target-language corpus, using the weights
    file, to retrieve the corpus translation (Corpus TS).
15
Example Sentence
Source Sentence SS
  • Actual translation
  • The woman peels the apple
  • OR
  • The woman is peeling the apple

16
1. Lemma to lemma translation
Target language sentence (lemma-to-lemma translated)
TS1, together with the term correspondence, e.g. ο/At
corresponds to the/AT0
17
2. Rules application (1)
Example rule 1\\VbMnIdPr______IpAv__
- 1\\VVB-VVZ \be\VBB-VBZ1\\VVG
(i.e. the Greek present-tense verb tag maps either to
the English simple present, VVB/VVZ, or to \be\,
VBB/VBZ, plus the present participle, VVG)
18
2. Rules application (2)
1\\VbMnIdPr______IpAv__
correspondence
1\\VVB-VVZ \be\VBB-VBZ1\\VVG
19
2. Rules application (4)
Target language sentence after rules application
TS2
20
3. Sentence Comparison
The system examines all the sentences in the
corpus and finds the one with the highest
similarity percentage.
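A minimal version of this step scores each corpus sentence by a weighted overlap with TS2 and keeps the best match. The weights and the similarity measure below are a simplified assumption for illustration, not METIS's actual scoring scheme.

```python
def similarity(ts2, candidate, w_lemma=0.6, w_tag=0.4):
    """Per-position weighted match of (lemma, tag) pairs, as a
    fraction of the longer sentence. A simplified stand-in for
    METIS's weighted pattern matching."""
    n = max(len(ts2), len(candidate))
    score = 0.0
    for (l1, t1), (l2, t2) in zip(ts2, candidate):
        score += w_lemma * (l1 == l2) + w_tag * (t1 == t2)
    return score / n

def best_match(ts2, corpus):
    """Return the corpus sentence with the highest similarity."""
    return max(corpus, key=lambda sent: similarity(ts2, sent))

# Illustrative mini-corpus of tagged, lemmatized sentences.
ts2 = [("the", "AT0"), ("woman", "NN1")]
corpus = [[("the", "AT0"), ("man", "NN1")],
          [("the", "AT0"), ("woman", "NN1")]]
best = best_match(ts2, corpus)
```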
21
ASSESSMENT
22
Main idea: to facilitate post-editing according to
users' preferences
  • Much discussion has taken place among professional
    translators as to which types of errors are easier
    to correct; e.g. grammatical errors are considered
    easier to correct than semantic ones. Since this is,
    to a certain extent, highly subjective, no fixed
    criteria were set.

23
RANK-BASED EVALUATION
  • Solution: Experimental Set-up
  • Select a given corpus of sentences (in the source
    language) and for each of them provide a corpus
    of translations (in the target language).
  • Use a (group of) human translator(s) to rank the
    target corpus in terms of suitability of each
    sentence as the translation of a source sentence.
  • Rank all target corpus sentences according to
    their suitability as translations.

24
RANK-BASED EVALUATION
  • Solution: Experimental Set-up
  • Provide all target sentences as input to the
    METIS system and allow METIS to rank them as
    potential translations.
  • Compare the rankings of the target-corpus
    sentences according to (i) METIS and (ii) the
    group of translators, generating a measure of the
    correspondence between the two rankings.
  • Vary the values of system parameters to fine-tune
    the system response.
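The correspondence between the two rankings can be quantified with a rank-correlation statistic; the slides do not specify which measure was used, so the Spearman coefficient below is an assumption chosen for illustration.

```python
def spearman(rank_a, rank_b):
    """Spearman rank correlation between two rankings of the same
    items (each given as a list of item ids in ranked order)."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    n = len(rank_a)
    d2 = sum((pos_a[x] - pos_b[x]) ** 2 for x in rank_a)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical rankings of four target sentences.
human = ["s1", "s2", "s3", "s4"]
metis = ["s1", "s3", "s2", "s4"]
rho = spearman(human, metis)
```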

25
Requirements
  • a typology of errors
  • penalties for errors, weighted by how difficult
    they are to correct
  • a target corpus whose sentences are ranked by
    degree of similarity to the source sentence, i.e.
    by number of errors and final score

26
EXAMPLE OF TARGET CORPUS
  • Source sentence
  • Η γυναίκα καθαρίζει το μήλο.
  • (The woman is peeling the apple)
  • Target corpus
  • The woman is peeling the apple. (class A)
  • The woman cleans the apple. (class A)
  • The woman has been peeling the apple. (class A)
  • The woman peeled the apple. (class B)
  • The woman has been washing the apple. (class C)

27
RANK-BASED EVALUATION
28
The benchmarking corpus for assessment
  • Hellenic National Corpus (HNC) as a source corpus
  • British National Corpus (BNC) as a target corpus
  • Construction of a toy corpus (S/T) for dealing
    with specific phenomena

29
The benchmarking corpus for assessment (cont.)
  • Phenomena studied: valency; impersonal, copular
    and ergative verb phrases; agreement (3rd
    singular/plural, as most common); word order;
    tense and aspect; subordinate clauses; sentence
    types; sentence construction; determiner phrases;
    modifiers in different positions; agreement
    between adjectives and nouns; degrees of
    adjectives and adverbs and sentential order of
    adverbs; definite/indefinite article; the
    structure I like (μου αρέσει); clitics; and
    possessives.
  • Phenomena treated: tenses, the structure I like,
    clitics, possessives, definite/indefinite
    articles, passive voice.

30
  • WHERE TO GO
  • Integrate generation at the morphological level
  • Break the sentence barrier (perhaps with
    additional generation capacity)
  • Integrate lexical semantic information (wordnets,
    semantic distances)

31
  • CALCULATING THE DISTANCE BETWEEN METIS AND HUMAN
    TRANSLATORS
  • If the ranking of the given sentence is the same,
    no penalty is imposed and the score is 0.
  • A sentence scores 0, when all sentences ranked by
    translators above it remain above it in the METIS
    ranking and all sentences ranked by translators
    below it remain below it in the METIS ranking.
  • If a sentence belongs to class X and is ranked
    higher than certain sentences of a higher class,
    then it is penalised. The penalty score is equal
    to the number of sentences of a higher ranking
    that it has overtaken.

32
  • CALCULATING THE DISTANCE BETWEEN METIS AND HUMAN
    TRANSLATORS (contd.)
  • A sentence belonging to class X will be
    penalized, if a sentence belonging to a lower
    category has overtaken it. The score is equal to
    the number of sentences that have overtaken it.
  • If a sentence belonging to class X has been both
    (i) overtaken by sentences of a lower category
    and (ii) has overtaken sentences of a higher
    category, then the total penalty is equal to the
    sum of the penalties for cases (i) and (ii), as
    defined above.
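The penalty scheme described above can be sketched directly: for each sentence, count the higher-class sentences it has overtaken plus the lower-class sentences that have overtaken it, with classes ordered A > B > C as in the example target corpus. The sentence ids and class assignments below are hypothetical.

```python
# Sketch of the penalty scheme (classes ordered A > B > C).
CLASS_RANK = {"A": 0, "B": 1, "C": 2}  # lower number = higher class

def penalty(metis_ranking, classes):
    """Per-sentence penalty for METIS's ranking.
    metis_ranking: sentence ids, ordered best-first by METIS.
    classes: id -> class label assigned by human translators."""
    scores = {}
    for i, s in enumerate(metis_ranking):
        overtaken_higher = sum(      # higher-class sentences s passed
            1 for t in metis_ranking[i + 1:]
            if CLASS_RANK[classes[t]] < CLASS_RANK[classes[s]])
        passed_by_lower = sum(       # lower-class sentences above s
            1 for t in metis_ranking[:i]
            if CLASS_RANK[classes[t]] > CLASS_RANK[classes[s]])
        scores[s] = overtaken_higher + passed_by_lower
    return scores

classes = {"s1": "A", "s2": "A", "s3": "B", "s4": "C"}
scores = penalty(["s1", "s3", "s2", "s4"], classes)
```

Here s3 (class B) overtakes one class-A sentence and s2 (class A) is overtaken by one class-B sentence, so each scores 1, while s1 and s4 score 0.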

33
Present continuous
34
Results of the experiments with METIS
  • Present continuous
  • (The woman is peeling the apple)
  • The Greek present tense corresponds to the
    English simple present, present continuous and
    present perfect.
  • The lowest penalty (8) was achieved with the use
    of tag-mapping rules for the present-tense
    correspondence.