Title: METIS
1. METIS: STATISTICAL MACHINE TRANSLATION USING MONOLINGUAL CORPORA
2. METIS AUTHORS
- IOANNIS DOLOGLOU
- STELLA MARKANTONATOU
- GEORGE TAMBOURATZIS
- OLGA YANNOUTSOU
- ATHANASSIA FOURLA
- NIKOS IOANNOU
3. Structure of the presentation
- The idea
- Translation Equivalence Information
- Description of the system
- Assessment
4. GOALS
- METIS aimed to assess the possibility of obtaining free-text translations of reasonably high linguistic quality from large annotated monolingual corpora
- using pattern-matching techniques
- by incorporating translation equivalence information at lemma and structure level (the latter by employing tag-mapping rules)
5. TRANSLATION EQUIVALENCE INFORMATION
- Tag-equivalence tables
- Tagsets used
- English: CLAWS5 (on the BNC)
- Greek: ILSP-PAROLE (on the HNC)
- Dutch: CGN tags (on the Corpus of Spoken Dutch (CGN) and the Eindhoven corpus)
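As an illustration of the tag-equivalence tables above, a minimal sketch is a mapping from source-language tags to candidate target-language tags. The ILSP-PAROLE and CLAWS5 tag names below are illustrative assumptions drawn from the tagsets, not the actual METIS tables:

```python
# Sketch of a tag-equivalence table: Greek ILSP-PAROLE tags mapped to
# candidate English CLAWS5 tags. Entries are illustrative assumptions;
# the actual METIS tables cover the complete tagsets.
TAG_EQUIVALENCES = {
    "At": ["AT0"],                  # article -> CLAWS5 article
    "NoCm": ["NN1", "NN2"],         # common noun -> singular/plural noun
    "VbMn": ["VVB", "VVZ", "VVG"],  # main verb -> base / -s / -ing forms
}

def equivalent_tags(source_tag):
    """Return the candidate target-language tags for a source-language tag."""
    return TAG_EQUIVALENCES.get(source_tag, [])
```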
8. TRANSLATION EQUIVALENCE INFORMATION
- Tag-mapping rules
- These are rules that map the input structure onto a more abstract one, closer to the structure of the translation sought
10. TRANSLATION EQUIVALENCE INFORMATION
- Appropriate bilingual lexica
- These are lexica that provide multiple translations of lemmata and expressions, together with PoS information for both languages
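A minimal sketch of such a lexicon, assuming a simple (lemma, PoS) key and a list of (translation, PoS) candidates (the entries and layout are assumptions for illustration; the real METIS lexica also cover multi-word expressions):

```python
# Sketch of a bilingual lexicon providing multiple translations per lemma
# plus PoS information for both languages. Entries and the (lemma, PoS)
# layout are illustrative assumptions.
LEXICON = {
    ("γυναίκα", "No"): [("woman", "NN1")],
    ("καθαρίζω", "Vb"): [("peel", "VVB"), ("clean", "VVB")],
    ("μήλο", "No"): [("apple", "NN1")],
}

def lookup(lemma, pos):
    """Return all candidate (translation, PoS) pairs for a source lemma."""
    return LEXICON.get((lemma, pos), [])
```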
12. The System
- System resource requirements
- Bilingual lexicon file with PoS information
- Tag-mapping Rules file
- Tagged and Lemmatized source language sentence
- Tagged and Lemmatized target language corpus
- Weights file
13. System Operation (1)
- Step 1. Lemma-to-lemma translation: the bilingual lexicon turns the source sentence SS into TS1.
- Step 2. Rule application: the tag-mapping rules turn TS1 into TS2, keeping the correspondence between the two.
14. System Operation (2)
- Step 3. Corpus search: TS2 is matched against the target-language corpus using the weights file, yielding the corpus translation TS.
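The three steps can be sketched end to end as follows. The toy lexicon, rules, corpus and overlap measure are illustrative assumptions standing in for the real METIS resources:

```python
# Hypothetical sketch of the three-step METIS pipeline. The function
# bodies and resources are toy stand-ins for the real lexicon, the
# tag-mapping rules and the weighted corpus search.

def lemma_to_lemma(source_lemmas, lexicon):
    """Step 1: replace each source lemma with its first lexicon translation (TS1)."""
    return [lexicon.get(lemma, [lemma])[0] for lemma in source_lemmas]

def apply_rules(ts1, rules):
    """Step 2: rewrite TS1 with tag-mapping rules, yielding TS2."""
    ts2 = []
    for token in ts1:
        ts2.extend(rules.get(token, [token]))
    return ts2

def corpus_search(ts2, corpus):
    """Step 3: return the corpus sentence sharing the most tokens with TS2."""
    return max(corpus, key=lambda sent: len(set(ts2) & set(sent)))

# Toy resources for the example sentence
lexicon = {"ο": ["the"], "γυναίκα": ["woman"], "καθαρίζω": ["peel"], "μήλο": ["apple"]}
rules = {"peel": ["be", "peeling"]}  # one present-tense alternative: be + -ing
corpus = [["the", "woman", "is", "peeling", "the", "apple"],
          ["the", "man", "eats", "the", "pear"]]

ts1 = lemma_to_lemma(["ο", "γυναίκα", "καθαρίζω", "ο", "μήλο"], lexicon)
ts2 = apply_rules(ts1, rules)
best = corpus_search(ts2, corpus)
```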
15. Example Sentence
Source Sentence SS
- Actual translation
- The woman peels the apple
- OR
- The woman is peeling the apple
16. Step 1: Lemma-to-lemma translation
Target-language sentence TS1 (lemma-to-lemma translated) and term correspondence, e.g. ο/At corresponds to the/AT0
17. Step 2: Rule application (1)
- Example rule: 1\\VbMnIdPr______IpAv__ → 1\\VVB-VVZ | \be\VBB-VBZ 1\\VVG
- (the Greek main-verb present tense maps either to the English simple present VVB-VVZ, or to be/VBB-VBZ followed by the -ing form VVG)
182. Rules application (2)
1\\VbMnIdPr______IpAv__
correspondence
1\\VVB-VVZ \be\VBB-VBZ1\\VVG
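One way to read the rule above is as a rewrite of each Greek present-tense verb tag into either of its two English realisations. The sketch below assumes a toy representation of sentences as (lemma, tag) pairs; only the tag strings come from the slide:

```python
# Toy sketch of applying the tag-mapping rule from the slide: a Greek
# main-verb present-tense tag is rewritten either as an English simple
# present tag or as "be" plus an -ing form. The (lemma, tag) pair
# representation is an assumption for illustration.

GREEK_PRESENT = "VbMnIdPr______IpAv__"

def apply_present_rule(tokens):
    """Return both English tag realisations for each Greek present-tense verb."""
    simple, continuous = [], []
    for lemma, tag in tokens:
        if tag == GREEK_PRESENT:
            simple.append((lemma, "VVB-VVZ"))
            continuous.append(("be", "VBB-VBZ"))
            continuous.append((lemma, "VVG"))
        else:
            simple.append((lemma, tag))
            continuous.append((lemma, tag))
    return simple, continuous
```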
19. Step 2: Rule application (4)
Target-language sentence after rule application: TS2
20. Step 3: Sentence Comparison
The system examines all the sentences in the
corpus and finds the one with the highest
similarity percentage.
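A hedged sketch of such a comparison, assuming a weighted-overlap similarity (the formula and the default weight of 1.0 are assumptions; METIS reads its actual weights from the weights file):

```python
# Illustrative similarity score between the transformed sentence TS2 and a
# corpus sentence. The weighted-overlap formula is an assumption; the real
# system takes its weights from the weights file.

def similarity_percentage(ts2, corpus_sentence, weights):
    """Percentage of (weighted) TS2 items found in the corpus sentence."""
    total = sum(weights.get(item, 1.0) for item in ts2)
    matched = sum(weights.get(item, 1.0) for item in ts2 if item in corpus_sentence)
    return 100.0 * matched / total if total else 0.0

def best_match(ts2, corpus, weights):
    """Return the corpus sentence with the highest similarity percentage."""
    return max(corpus, key=lambda sent: similarity_percentage(ts2, sent, weights))
```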
21. ASSESSMENT
22. Main idea: facilitate post-editing according to users' preferences
- There has been much discussion among professional translators as to which types of error are easier to correct; e.g. grammatical errors are easier to correct than semantic ones. Since this is to a large extent subjective, no fixed criteria are set.
23. RANK-BASED EVALUATION
- Solution: Experimental set-up
- Select a given corpus of sentences (in the source language) and, for each of them, provide a corpus of translations (in the target language).
- Use a (group of) human translator(s) to rank the target corpus in terms of the suitability of each sentence as the translation of a source sentence.
- Rank all target-corpus sentences according to their suitability as translations.
24. RANK-BASED EVALUATION
- Solution: Experimental set-up (cont.)
- Provide all target sentences as input to the METIS system and allow METIS to rank them as potential translations.
- Compare the rankings of the target-corpus sentences according to (i) METIS and (ii) the group of translators, generating a measure of the correspondence between the two rankings.
- Vary the values of system parameters to fine-tune the system response.
25. Requirements
- a typology of errors
- error penalties weighted according to the difficulty of correction
- a target corpus with sentences ranked according to their degree of similarity to the source sentence, i.e. number of errors and final score
26. EXAMPLE OF TARGET CORPUS
- Source sentence
- Η γυναίκα καθαρίζει το μήλο.
- (The woman is peeling the apple)
- Target corpus
- The woman is peeling the apple. (class A)
- The woman cleans the apple. (class A)
- The woman has been peeling the apple. (class A)
- The woman peeled the apple. (class B)
- The woman has been washing the apple. (class C)
27. RANK-BASED EVALUATION
28. The benchmarking corpus for assessment
- Hellenic National Corpus (HNC) as a source corpus
- British National Corpus (BNC) as a target corpus
- Construction of a toy corpus (S/T) for dealing
with specific phenomena
29. The benchmarking corpus for assessment (cont.)
- Phenomena studied: valency; impersonal, copular and ergative verb phrases; agreement (3rd singular/plural, as most common); word order; tense and aspect; subordinate clauses; sentence types; sentence construction; determiner phrases; modifiers in different positions; agreement between adjectives and nouns; degrees of adjectives and adverbs and sentential order of adverbs; definite/indefinite article; the structure I like (μου αρέσει); clitics; and possessives.
- Phenomena treated: tenses, the structure I like, clitics, possessives, definite/indefinite articles, passive voice.
30. WHERE TO GO
- Integrate generation at the morphological level
- Break the sentence barrier (perhaps with additional generation capacity)
- Integrate lexical semantic information (wordnets, semantic distances)
31. CALCULATING THE DISTANCE BETWEEN METIS AND HUMAN TRANSLATORS
- If the ranking of a given sentence is the same, no penalty is imposed and the score is 0.
- A sentence scores 0 when all sentences ranked by the translators above it remain above it in the METIS ranking, and all sentences ranked by the translators below it remain below it in the METIS ranking.
- If a sentence belongs to class X and is ranked higher than certain sentences of a higher class, it is penalised. The penalty score equals the number of higher-ranked sentences it has overtaken.
32. CALCULATING THE DISTANCE BETWEEN METIS AND HUMAN TRANSLATORS (contd.)
- A sentence belonging to class X is penalised if a sentence belonging to a lower class has overtaken it. The score equals the number of sentences that have overtaken it.
- If a sentence belonging to class X has both (i) been overtaken by sentences of a lower class and (ii) overtaken sentences of a higher class, the total penalty equals the sum of the penalties for cases (i) and (ii), as defined above.
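The penalty scheme can be sketched as follows. This is one literal reading of the two slides, under which each inversion contributes to both the overtaking sentence and the overtaken one; the class labels and data layout are assumptions for illustration:

```python
# Sketch of the METIS-vs-translators ranking penalty, read literally from
# the two penalty clauses: a sentence is penalised for each higher-class
# sentence it has overtaken, and for each lower-class sentence that has
# overtaken it. Class labels compare alphabetically: "A" is better than "B".

def ranking_penalty(metis_ranking, classes):
    """Total penalty of a METIS ranking against human class labels.

    metis_ranking: sentence ids ordered best-first by METIS.
    classes: maps each id to its human-assigned class ("A", "B", "C", ...).
    """
    total = 0
    for i, sent in enumerate(metis_ranking):
        # higher-class sentences ranked below this one (it has overtaken them)
        overtaken = sum(1 for other in metis_ranking[i + 1:]
                        if classes[other] < classes[sent])
        # lower-class sentences ranked above this one (they have overtaken it)
        overtaking = sum(1 for other in metis_ranking[:i]
                         if classes[other] > classes[sent])
        total += overtaken + overtaking
    return total
```

With a perfect ranking the penalty is 0; each class inversion adds 2 under this reading (once for each sentence involved).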
33. Present continuous
34. Results of the experiments with METIS
- Present continuous: (The woman is peeling the apple)
- The Greek present tense corresponds to the English simple present, present continuous and present perfect.
- The lowest penalty (8) was achieved with the use of tag-mapping rules for the present-tense correspondence.