Statistical modelling of MT output corpora for Information Extraction

Provided by: ddi4

1
  • Statistical modelling of MT output corpora for
    Information Extraction

2
Overview
  • Using MT output for IE
  • Requirements and evaluation of usability
  • S-score: measuring the degree of word
    significance for a text by contrasting text
    and corpus usages
  • Experiment set-up and MT evaluation metrics
  • using differences in S-scores for MT evaluation
  • Results of MT evaluation for IE
  • Comparison of MT systems
  • Correlations with human evaluation measures of MT
  • Issues of MT architecture and evaluation scores
  • Conclusions and future work

3
Using MT for IE
  • Requirements for human use and for automatic
    processing are different
  • fluency is less important than adequacy
  • stylistic errors are less important than factual
    errors, e.g.
  • MT: "Bill Fisher" -> 'to send a bill to a
    fisher'
  • Frequency issues
  • low-frequency words carry the most important
    information (require accurate disambiguation)
  • Some IE tasks use statistical models (expected to
    be different for MT)

4
Frequency issues: disambiguation
  • Examples

5
Frequency issues: statistical modelling for IE
  • Research on adaptive IE: automatic template
    acquisition via statistical means
  • find sentences containing statistically
    significant words
  • build templates around such sentences
  • Template element fillers (e.g., NEs) often appear
    among statistically significant words
  • Distribution of word frequencies is expected to
    be different for MT; this work checks whether
    that is the case
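
The first two steps above (finding sentences that contain statistically significant words) can be sketched roughly as follows. This is only an illustration: the function name `rank_sentences`, the whitespace tokenisation, and the idea of ranking by hit count are assumptions, not details taken from the slides.

```python
def rank_sentences(sentences, significant_words):
    """Rank sentences by how many statistically significant words they contain.

    `significant_words` is assumed to be a set of lower-cased words that were
    already selected by some significance measure.
    """
    ranked = []
    for sentence in sentences:
        # Crude tokenisation: split on whitespace, strip punctuation, lower-case.
        tokens = [tok.strip(".,;:!?\"'()").lower() for tok in sentence.split()]
        hits = sum(1 for tok in tokens if tok in significant_words)
        if hits:
            ranked.append((hits, sentence))
    # Sentences with the most significant words come first.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


if __name__ == "__main__":
    candidates = rank_sentences(
        ["The judge charged the deputies.",
         "It lasted seven hours.",
         "Emmanuelli and Laignel met the judge."],
        {"judge", "emmanuelli", "laignel"},
    )
    for hits, sentence in candidates:
        print(hits, sentence)
```

The top-ranked sentences would then be the candidates around which templates are built.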

6
Measuring statistical significance
  • S_word[text] -- the score of statistical
    significance for a particular word in a
    particular text
  • P_word[text] -- the relative frequency of the word
    in the text
  • P_word[rest-corp] -- the relative frequency of the
    same word in the rest of the corpus, without this
    text
  • N_word[txt-not-found] -- the proportion of texts
    in the corpus where this word is not found
    (number of texts where it is not found divided
    by the number of texts in the corpus)
  • P_word[all-corp] -- the relative frequency of the
    word in the whole corpus, including this
    particular text
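
The formula that combines these quantities did not survive in this transcript. The sketch below assumes the form S = ln((P_text - P_rest-corp) × N_txt-not-found / P_all-corp), which uses exactly the four quantities defined above; treat that form, the function name `s_scores`, and the pre-tokenised input as assumptions rather than as the slide's own definition.

```python
import math
from collections import Counter


def s_scores(texts, index):
    """Compute an S-score for each word of texts[index].

    `texts` is a list of token lists (one list per text in the corpus).
    Assumed formula: ln((P_text - P_rest) * N_not_found / P_all).
    """
    target = Counter(texts[index])
    rest = Counter()
    for i, tokens in enumerate(texts):
        if i != index:
            rest.update(tokens)

    n_target = sum(target.values())
    n_rest = sum(rest.values())
    vocabularies = [set(tokens) for tokens in texts]

    scores = {}
    for word, freq in target.items():
        p_text = freq / n_target                      # relative frequency in this text
        p_rest = rest[word] / n_rest if n_rest else 0.0   # ... in the rest of the corpus
        p_all = (freq + rest[word]) / (n_target + n_rest)  # ... in the whole corpus
        # Proportion of texts in the corpus that do not contain the word.
        n_not_found = sum(word not in vocab for vocab in vocabularies) / len(texts)
        value = (p_text - p_rest) * n_not_found / p_all
        # Assumption: words whose argument is not positive get no score,
        # because the logarithm is undefined there.
        if value > 0:
            scores[word] = math.log(value)
    return scores


if __name__ == "__main__":
    corpus = [
        ["urba", "gracco", "the", "affair"],
        ["the", "weather", "the", "rain"],
        ["the", "match", "the", "goal"],
    ]
    for word, score in sorted(s_scores(corpus, 0).items(), key=lambda kv: -kv[1]):
        print(word, round(score, 3))
```

A word that is frequent everywhere, such as "the" in the toy corpus, receives no score, while words concentrated in one text do.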

7
Intuitive appeal of significance scores
  • Selecting words potentially important for IE
  • In the Marseille Facet of the Urba-Gracco Affair,
    Messrs. Emmanuelli, Laignel, Pezet, and Sanmarco
    Confronted by the Former Officials of the SP
    Research Department
  • On Wednesday, February 9, the presiding judge of
    the Court of Criminal Appeals of Lyon, Henri
    Blondet, charged with investigating the Marseille
    facet of the Urba-Gracco affair, proceeded with
    an extensive confrontation among several
    Socialist deputies and former directors of
    Urba-Gracco. Ten persons, including Henri
    Emmanuelli and Andre Laignel, former treasurers
    of the SP, Michel Pezet, and Philippe Sanmarco,
    former deputies (SP) from the Bouches-du-Rhône,
    took part in a hearing which lasted more than
    seven hours

8
...Intuitive appeal of significance scores
  • Ordering words

9
Metric for usability of MT for IE
  • Suggestion: measuring differences in statistical
    significance between a human translation and MT
    allows estimating the number of prospective
    problems
  • Question: do any human evaluation measures of MT
    correlate with differences in S-scores for
    different MT systems?

10
Experiment setup
  • Available: 100 texts developed for the DARPA 94 MT
    evaluation exercise
  • French originals
  • 2 different human translations (reference and
    expert)
  • 5 translations by MT systems (French into
    English)
  • knowledge-based: Systran, Reverso, Metal,
    Globalink
  • IBM statistical approach to MT: Candide
  • DARPA evaluation scores available for each system
    and for the human expert translation
  • Informativeness, Adequacy, Fluency
  • Calculating distances of combined S-scores
    between:
  • the human reference translation
  • other translations (MT and the expert
    translation)

11
The distance scores
  • Based on comparing sets of words with S-score > 1
  • words significant in both texts with different
    statistical significance scores
  • words not present in the reference translation
    (overgenerated in MT)
  • words not present in MT, but present in the
    reference translation (undergenerated in MT)
  • Computing distance scores
  • o-score: for avoiding overgeneration (analogous
    to Precision)
  • u-score: for avoiding undergeneration (analogous
    to Recall)
  • uo: combined score (calculated as F-measure)

12
Computing distance scores...
  • Words that changed their significance
  • Overgeneration score
  • Undergeneration score

13
Computing distance scores
  • Scores for avoiding over- and under-generation
  • Making scores compatible across texts
  • (the number of significant words may be
    different)
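
The formulas for these scores are not preserved in this transcript. As a rough illustration of the Precision/Recall/F-measure analogy only, the sketch below uses plain set overlap between the significant words (S > 1) of the reference and of another translation. This is a simplified stand-in, not the presented method: it ignores differences in S-score magnitude for shared words, and the name `distance_scores` and the balanced F-measure are assumptions. Dividing by the size of each set is one simple way to keep scores comparable when texts have different numbers of significant words.

```python
def distance_scores(ref_scores, mt_scores, threshold=1.0):
    """Return (o, u, uo) from two {word: S-score} dictionaries.

    Simplified set-overlap version (an assumption, not the slide formulas):
      o  ~ precision: share of significant MT words also significant in the reference
      u  ~ recall:    share of significant reference words also significant in the MT
      uo ~ balanced F-measure of o and u
    """
    ref_sig = {w for w, s in ref_scores.items() if s > threshold}
    mt_sig = {w for w, s in mt_scores.items() if s > threshold}
    shared = ref_sig & mt_sig

    o = len(shared) / len(mt_sig) if mt_sig else 0.0    # penalises overgeneration
    u = len(shared) / len(ref_sig) if ref_sig else 0.0  # penalises undergeneration
    uo = 2 * o * u / (o + u) if (o + u) else 0.0
    return o, u, uo


if __name__ == "__main__":
    # Toy S-scores invented purely to exercise the function.
    reference = {"judge": 2.0, "hearing": 1.5, "the": 0.2}
    translation = {"judge": 1.8, "audience": 1.4, "court": 1.2, "the": 0.1}
    print(distance_scores(reference, translation))
```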

14
The resulting distance scores
15
DARPA Adequacy and scores
16
o-score and DARPA 94 Adequacy
17
DARPA Fluency and scores
18
uo-score and DARPA 94 Fluency
19
Results and correlation of scores
  • Human expert translation scores higher than MT
  • Statistical MT system Candide is
    characteristically different
  • Strong positive correlation found for:
  • o-score and DARPA adequacy
  • Weak positive correlation found for:
  • uo-score and DARPA fluency
  • No correlation was found between u-score (high
    for statistical MT) and human MT evaluation
    measures
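
The slides do not say which correlation coefficient was used. Assuming Pearson's r, the correlation between per-system automatic scores and per-system human scores could be computed along these lines; the function name `pearson` and the example inputs are illustrative and carry no data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Arbitrary numbers, used only to show the call; not scores from the study.
    print(pearson([1, 2, 3], [2, 4, 6]))
```

In use, `xs` would hold one automatic score per MT system (for example the o-score) and `ys` the matching human score (for example DARPA adequacy).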

20
Conclusions
  • Word-significance measure S is useful in other
    areas
  • (e.g., distinguishing lexical and morphological
    differences)
  • Threshold S > 1 distinguishes content and
    function words across different languages
  • (checked for English, French and Russian)
  • Statistical modelling showed substantial
    differences between human translation and MT
    output corpora
  • Measures of contrastive frequencies for words in
    a particular text and the rest of the corpus
    correlate with human evaluation of MT (scores for
    adequacy)

21
Future work
  • Statistical modelling of Example-based MT
  • Investigating the actual performance of IE
    systems on different tasks using MT of different
    quality (with different "usability for IE"
    scores) and its correlation with proposed MT
    evaluation measures
  • Establishing formal properties for intuitive
    judgements about translation quality (translation
    equivalence, adequacy, and fluency in human
    translation and MT)