1
Contextual Bitext-derived Paraphrases in Automatic MT Evaluation
Karolina Owczarzak, Declan Groves, Josef van Genabith, Andy Way
National Centre for Language Technology, Dublin City University
HLT-NAACL, 09 June 2006
2
Overview
  • Automatic MT evaluation and its limitations
  • Generation of paraphrases from word alignments
  • Using paraphrases in evaluation
  • Correlation with human judgments
  • Paraphrase quality

3
Automatic evaluation of MT quality
  • Most popular metrics: BLEU and NIST

4
Automatic evaluation of MT quality
  • Most popular metrics: BLEU and NIST
  • Example: admissible variants of the same sentence

But we don't have an answer
But we have no answer to it
However we cannot react
However we have no reply to that
5
Automatic evaluation of MT quality
  • Most popular metrics: BLEU and NIST

[Figure: n-gram matching of the candidate "But we have no answer to it" against the references "But we don't have an answer", "However we cannot react", and "However we have no reply to that"]

1-grams: 6/7   2-grams: 3/6   3-grams: 1/5   4-grams: 0/4
5-grams: 0/3   6-grams: 0/2   7-grams: 0/1

BLEU: 0.0000, or 0.4392 with smoothing
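As a worked illustration (not part of the original slides), the following Python sketch computes these clipped n-gram matches, treating "But we have no answer to it" as the candidate, which reproduces the counts above; the add-one smoothing at the end is one common variant and only approximates the 0.4392 shown on the slide.

```python
from collections import Counter
from math import exp, log

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_matches(cand_tokens, refs_tokens, n):
    # Count candidate n-grams that also occur in a reference,
    # clipping each n-gram by its maximum count in any single reference.
    cand = Counter(ngrams(cand_tokens, n))
    max_ref = Counter()
    for ref in refs_tokens:
        for g, c in Counter(ngrams(ref, n)).items():
            max_ref[g] = max(max_ref[g], c)
    matched = sum(min(c, max_ref[g]) for g, c in cand.items())
    return matched, sum(cand.values())

cand = "But we have no answer to it".split()
refs = ["But we don't have an answer".split(),
        "However we cannot react".split(),
        "However we have no reply to that".split()]

for n in range(1, 8):
    m, t = ngram_matches(cand, refs, n)
    print(f"{n}-grams: {m}/{t}")          # 1-grams: 6/7 ... 7-grams: 0/1

# Plain BLEU multiplies the 1- to 4-gram precisions, so a zero 4-gram
# count zeroes the whole score; smoothing the counts avoids that
# (add-one smoothing shown, not necessarily the slide's variant).
log_p = [log((m + 1.0) / (t + 1.0))
         for m, t in (ngram_matches(cand, refs, n) for n in range(1, 5))]
print(f"smoothed BLEU ~ {exp(sum(log_p) / 4):.4f}")  # brevity penalty is 1 here
```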
6
Automatic evaluation of MT quality
  • Insensitive to admissible lexical differences
  • answer ↔ reply
  • Insensitive to admissible syntactic differences
  • yesterday it was raining ↔ it was raining yesterday
  • we don't have ↔ we have no

7
Automatic evaluation of MT quality
  • Attempts to come up with better metrics
  • - word order:
  • Translation Error Rate (Snover et al. 2005)
  • Maximum Matching String (Turian et al. 2003)
  • - lexical and word-order issues:
  • CDER (Leusch et al. 2006)
  • METEOR (Banerjee and Lavie 2005)
  • linear regression model (Russo-Lassner et al. 2005)
  • Need POS taggers, stemmers, thesauri, WordNet

8
Word and phrase alignment
  • Statistical Machine Translation aligns source-language text with target-language text; word and phrase alignments yield translation pairs such as:

agréable ↔ nice, pleasant, good
nous n'avons pas ↔ we don't have, we have no
9
Generating paraphrases
  • For each word/phrase e_i, find all words/phrases f_i1, …, f_in that e_i aligns with; then, for each f_i, find all words/phrases e_k≠i,1, …, e_k≠i,n that f_i aligns with (Bannard and Callison-Burch 2005)

[Pivot example for "nice"; f1-f3 are its aligned French phrases:]
nice → f1 (0.5) → pleasant (0.75), agreeable (0.25)
nice → f2 (0.25) → good (0.8), great (0.2)
nice → f3 (0.25) → good (0.99)

pleasant: 0.5 × 0.75 = 0.375; agreeable: 0.5 × 0.25 = 0.125
good: 0.25 × 0.8 + 0.25 × 0.99 = 0.2 + 0.2475 = 0.4475; great: 0.25 × 0.2 = 0.05

nice → good (0.4475), pleasant (0.375), agreeable (0.125), great (0.05)
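This pivot computation maps directly to code. The following Python sketch (not part of the deck) implements Bannard and Callison-Burch's formula p(e2 | e1) = Σ_f p(f | e1) · p(e2 | f) with the numbers above; the pivot labels f1-f3 stand in for the French phrases, which are not legible in the transcript.

```python
from collections import defaultdict

# Alignment probabilities from the slide; f1-f3 are placeholder pivots.
p_f_given_e = {"nice": {"f1": 0.5, "f2": 0.25, "f3": 0.25}}
p_e_given_f = {
    "f1": {"pleasant": 0.75, "agreeable": 0.25},
    "f2": {"good": 0.8, "great": 0.2},
    "f3": {"good": 0.99},
}

def paraphrases(e1):
    # p(e2 | e1) = sum over pivots f of p(f | e1) * p(e2 | f)
    scores = defaultdict(float)
    for f, p_f in p_f_given_e[e1].items():
        for e2, p_e2 in p_e_given_f[f].items():
            if e2 != e1:            # a phrase is not its own paraphrase
                scores[e2] += p_f * p_e2
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(paraphrases("nice"))
# [('good', 0.4475), ('pleasant', 0.375), ('agreeable', 0.125), ('great', 0.05)]
```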
10
Paraphrases in automatic MT evaluation
e_a → {p_a1, …, p_an}; …; e_z → {p_z1, …, p_zn} (each word/phrase in the reference paired with its set of paraphrases)


11
Paraphrases in automatic MT evaluation
For each segment:
  • each reference word/phrase e_a, …, e_z has a paraphrase set: e_a → {p_a1, …, p_an}; …; e_z → {p_z1, …, p_zn}
  • paraphrases that occur in the candidate translation replace the corresponding words/phrases in the reference, yielding a best-matching reference (see the sketch below)
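A minimal Python sketch (an illustration, not the authors' code) of the substitution step, assuming the simple rule suggested by slides 12 and 14: a reference word is replaced by one of its paraphrases when that paraphrase occurs in the candidate. Multi-word phrases are simplified to single tokens, and the paraphrase table entry is invented for the example.

```python
def best_match_reference(reference, candidate, paraphrase_table):
    cand_tokens = set(candidate.split())
    out = []
    for token in reference.split():
        if token in cand_tokens:
            out.append(token)                     # already matches the candidate
            continue
        # otherwise, substitute a paraphrase that occurs in the candidate, if any
        subst = next((p for p in paraphrase_table.get(token, [])
                      if p in cand_tokens), token)
        out.append(subst)
    return " ".join(out)

table = {"issue": ["question", "matter"]}         # illustrative entry
ref = "the climate issue is a good example of this"
cand = "the question of climates with is a good example"
print(best_match_reference(ref, cand, table))
# the climate question is a good example of this   (cf. Example 1 on slide 14)
```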


12
Experiment 1
  • Test set: 2,000 sentences, French-English Europarl
  • Two translations:
  • Pharaoh (phrase-based SMT)
  • LogoMedia (rule-based MT)
  • Scored with BLEU and NIST against:
  • the original reference
  • the best-matching reference, using paraphrases derived from the test set
  • Paraphrase lists generated using GIZA++ and the refined word-alignment strategy (Och and Ney, 2003; Koehn et al., 2003; Tiedemann, 2004)
  • Subset of 100 sentences from each translation scored by two human judges (accuracy, fluency)

13
Examples of paraphrases
  • area → field, this area, sector, aspect, this sector
  • above all → specifically, especially
  • agreement → accordance
  • believe that → believe, think that, feel that, think
  • extensive → widespread, broad, wide
  • make progress on → can move forward
  • risk management → management of risks

14
Examples of reference segments
Example 1
Candidate translation: the question of climates with is a good example
Original reference: the climate issue is a good example of this
Best-match reference: the climate question is a good example of this

Example 2
Candidate translation: thank you very much mr commissioner
Original reference: thank you commissioner
Best-match reference: thank you very much commissioner
15
Results
[Table: BLEU and NIST scores for the Pharaoh translation of the 2,000-sentence Europarl test set, scored against the original and best-match references]

16
Pearson's correlation with human judgment

[Table: correlations of the metric scores with human judgments, on a subset of 100 sentences from the Pharaoh translation (Europarl, 2,000-sentence test set)]
17
Paraphrase quality
  • 700,000 sentence pairs, French-English Europarl
  • Paraphrase lists generated using GIZA++ and the refined word-alignment strategy (Och and Ney, 2003; Koehn et al., 2003; Tiedemann, 2004)
  • Quality of paraphrases evaluated with respect to syntactic and semantic accuracy, following Bannard and Callison-Burch (2005)
18
Results
[Tables: syntactic accuracy and semantic accuracy of the extracted paraphrases]
19
Filtering paraphrases
  • Some inaccuracy is still useful:
  • be supported → support, supporting
  • Filters:
  • - exclude closed-class items: prepositions, personal pronouns, possessive pronouns, auxiliary verbs have and be
  • Fr. à → Eng. to, in, at
  • otherwise to ↔ in ↔ at pivot into paraphrases of one another
  • - prevent paraphrases of the form e_i ↔ e_i (w), where w ∈ {prepositions, pronouns, auxiliary verbs, modal verbs, negation, conjunctions}
  • aspect → aspect is
  • hours → hours for
  • available → not available
  • - deeper filtering would require POS taggers, parsers (a sketch of the two surface filters follows below)
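Below is a minimal Python sketch (not from the deck) of how these two surface filters might be applied; the closed-class and function-word lists are small illustrative samples, not the authors' actual lists.

```python
# Illustrative word lists; the real filters cover the full closed classes.
CLOSED_CLASS = {"to", "in", "at", "of", "on", "he", "she", "it",
                "his", "her", "its", "have", "has", "had",
                "be", "is", "are", "was", "were"}
FUNCTION_WORDS = CLOSED_CLASS | {"not", "no", "can", "could", "must",
                                 "and", "or", "but", "for"}

def keep_paraphrase(phrase, paraphrase):
    # Filter 1: drop pairs of bare closed-class items,
    # e.g. to <-> in <-> at pivoted through French 'a'.
    if phrase in CLOSED_CLASS or paraphrase in CLOSED_CLASS:
        return False
    # Filter 2: drop pairs of the form e_i <-> e_i (w), where w is a
    # preposition, pronoun, auxiliary/modal verb, negation, or conjunction,
    # e.g. "aspect" <-> "aspect is", "available" <-> "not available".
    a, b = phrase.split(), paraphrase.split()
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    if len(longer) == len(shorter) + 1:
        extra = [w for w in longer if w not in shorter]
        if (longer[:-1] == shorter or longer[1:] == shorter) and \
           all(w in FUNCTION_WORDS for w in extra):
            return False
    return True

assert not keep_paraphrase("to", "in")
assert not keep_paraphrase("aspect", "aspect is")
assert not keep_paraphrase("available", "not available")
assert keep_paraphrase("risk management", "management of risks")
```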

20
References
  • Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. Proceedings of the ACL 2005 Workshop on Intrinsic and Extrinsic Evaluation Measures for MT and/or Summarization, 65-73.
  • Colin Bannard and Chris Callison-Burch. 2005. Paraphrasing with Bilingual Parallel Corpora. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), 597-604.
  • Philipp Koehn, Franz Och and Daniel Marcu. 2003. Statistical Phrase-Based Translation. Proceedings of the Human Language Technology Conference (HLT-NAACL 2003), 48-54.
  • Grazia Russo-Lassner, Jimmy Lin, and Philip Resnik. 2005. A Paraphrase-based Approach to Machine Translation Evaluation. Technical Report LAMP-TR-125/CS-TR-4754/UMIACS-TR-2005-57, University of Maryland, College Park, MD.
  • Matthew Snover, Bonnie Dorr, Richard Schwartz, John Makhoul, Linnea Micciulla and Ralph Weischedel. 2005. A Study of Translation Error Rate with Targeted Human Annotation. Technical Report LAMP-TR-126/CS-TR-4755/UMIACS-TR-2005-58, University of Maryland, College Park, MD.
  • Jörg Tiedemann. 2004. Word to Word Alignment Strategies. Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), 212-218.
  • Gregor Leusch, Nicola Ueffing and Hermann Ney. 2006. CDER: Efficient MT Evaluation Using Block Movements. To appear in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006).
  • Franz Josef Och and Hermann Ney. 2003. A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29(1):19-51.
  • Franz Josef Och, Daniel Gildea, Sanjeev Khudanpur, Anoop Sarkar, Kenji Yamada, Alex Fraser, Shankar Kumar, Libin Shen, David Smith, Katherine Eng, Viren Jain, Zhen Jin, and Dragomir Radev. 2003. Syntax for Statistical Machine Translation. Technical report, Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD.