Statistical modelling of MT output corpora for Information Extraction

Provided by: ddi4

1

Statistical modelling of MT output corpora for
Information Extraction

2
Overview

Using MT output for IE
Requirements and evaluation of usability
S-score: measuring the degree of word significance for a text by contrasting text and corpus usages
Experiment set-up and MT evaluation metrics: using differences in S-scores for MT evaluation
Results of MT evaluation for IE
Comparison of MT systems
Correlations with human evaluation measures of MT
Issues of MT architecture and evaluation scores
Conclusions and future work

3
Using MT for IE

Requirements for human use and for automatic processing are different:
fluency is less important than adequacy
stylistic errors are less important than factual errors, e.g. MT renders the name "Bill Fisher" as 'to send a bill to a fisher'
Frequency issues:
low-frequency words carry the most important information (and require accurate disambiguation)
some IE tasks use statistical models (expected to be different for MT)

4
Frequency issues: disambiguation

Examples

5
Frequency issues: statistical modelling for IE

Research on adaptive IE: automatic template acquisition via statistical means
find sentences containing statistically significant words
build templates around such sentences
Template element fillers (e.g., NEs) often appear among statistically significant words
Distribution of word frequencies is expected to be different for MT: checking whether this is the case
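
The first template-acquisition step on this slide (finding sentences that contain statistically significant words) could start from a filter like the one below. This is a minimal sketch, not the method of the adaptive IE systems the slide refers to: the function name `candidate_sentences`, the naive whitespace tokenisation and the toy scores are all assumptions made for illustration.

```python
def candidate_sentences(sentences, scores, threshold=1.0):
    """Return (sentence, significant_words) pairs for sentences that contain
    at least one word whose significance score exceeds the threshold."""
    selected = []
    for sentence in sentences:
        # Naive tokenisation (assumption): split on whitespace, strip
        # surrounding punctuation, lowercase.
        tokens = [tok.strip(".,;:!?\"'()").lower() for tok in sentence.split()]
        hits = sorted({tok for tok in tokens if scores.get(tok, 0.0) > threshold})
        if hits:
            selected.append((sentence, hits))
    return selected


# Toy scores: hypothetical values, not taken from the experiment.
toy_scores = {"emmanuelli": 3.2, "treasurers": 2.1, "the": -0.5}
toy_sentences = [
    "Henri Emmanuelli and Andre Laignel, former treasurers of the SP.",
    "The hearing lasted more than seven hours.",
]
result = candidate_sentences(toy_sentences, toy_scores)
```

Only the first toy sentence is kept, because it is the only one containing words scored above the threshold. A real system would then build template candidates around the selected sentences.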

6
Measuring statistical significance

$S_{\mathrm{word[text]}}$ -- the score of statistical significance for a particular word in a particular text
$P_{\mathrm{word[text]}}$ -- the relative frequency of the word in the text
$P_{\mathrm{word[rest\text{-}corp]}}$ -- the relative frequency of the same word in the rest of the corpus, without this text
$N_{\mathrm{word[txt\text{-}not\text{-}found]}}$ -- the proportion of texts in the corpus in which this word is not found (the number of texts in which it is not found divided by the number of texts in the corpus)
$P_{\mathrm{word[all\text{-}corp]}}$ -- the relative frequency of the word in the whole corpus, including this particular text
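
The formula that combines these quantities is not preserved in this transcript. The sketch below assumes one plausible combination, S = ln((P_text - P_rest) * N_not_found / P_all), which is consistent with the definitions above but should be treated as an assumption rather than as the slide's formula. The function name `s_scores` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def s_scores(corpus, text_index):
    """Score each word of corpus[text_index] against the rest of the corpus.

    corpus is a list of token lists. ASSUMED formula (not from the slide):
        S = ln((P_text - P_rest) * N_not_found / P_all)
    Words for which the argument of ln is not positive get no score.
    """
    text = corpus[text_index]
    rest = [tok for i, doc in enumerate(corpus) if i != text_index for tok in doc]
    text_counts, rest_counts = Counter(text), Counter(rest)
    doc_sets = [set(doc) for doc in corpus]
    total_len = len(text) + len(rest)

    scores = {}
    for word, count in text_counts.items():
        p_text = count / len(text)                      # frequency in this text
        p_rest = rest_counts[word] / len(rest) if rest else 0.0
        p_all = (count + rest_counts[word]) / total_len  # frequency in whole corpus
        # Proportion of texts in which the word does not occur.
        n_not_found = sum(word not in s for s in doc_sets) / len(corpus)
        value = (p_text - p_rest) * n_not_found / p_all
        if value > 0:
            scores[word] = math.log(value)
    return scores


# Toy corpus (hypothetical): "the" occurs everywhere, "judge" in one text only.
toy_corpus = [
    ["judge", "hearing", "the", "the"],
    ["the", "cat"],
    ["the", "dog"],
    ["the", "bird"],
]
scores = s_scores(toy_corpus, 0)
```

In the toy corpus, "the" receives no score because it is equally frequent inside and outside the text and occurs in every text, while "judge" and "hearing" receive a score because they are concentrated in the one text.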

7
Intuitive appeal of significance scores

Selecting words potentially important for IE
In the Marseille Facet of the Urba-Gracco Affair,
Messrs. Emmanuelli, Laignel, Pezet, and Sanmarco
Confronted by the Former Officials of the SP
Research Department
On Wednesday, February 9, the presiding judge of
the Court of Criminal Appeals of Lyon, Henri
Blondet, charged with investigating the Marseille
facet of the Urba-Gracco affair, proceeded with
an extensive confrontation among several
Socialist deputies and former directors of
Urba-Gracco. Ten persons, including Henri
Emmanuelli and Andre Laignel, former treasurers
of the SP, Michel Pezet, and Philippe Sanmarco,
former deputies (SP) from the Bouches-du-Rhône,
took part in a hearing which lasted more than
seven hours

8
...Intuitive appeal of significance scores

Ordering words

9
Metric for usability of MT for IE

Suggestion: measuring differences in statistical significance between a human translation and MT allows estimating the number of prospective problems
Question: do any human evaluation measures of MT correlate with differences in S-scores for different MT systems?

10
Experiment setup

Available: 100 texts developed for the DARPA 94 MT evaluation exercise
French originals
2 different human translations (reference and expert)
5 translations by MT systems (French into English)
knowledge-based: Systran, Reverso, Metal, Globalink
IBM statistical approach to MT: Candide
DARPA evaluation scores available for each system and for the human expert translation:
Informativeness, Adequacy, Fluency
Calculating distances of combined S-scores between the human reference translation and the other translations (MT and the expert translation)

11
The distance scores

Based on comparing sets of words with S-score > 1:
words significant in both texts with different statistical significance scores
words not present in the reference translation (overgenerated in MT)
words not present in MT, but present in the reference translation (undergenerated in MT)
Computing distance scores:
o-score: for avoiding overgeneration (analogous to Precision)
u-score: for avoiding undergeneration (analogous to Recall)
uo: combined score (calculated as the F-measure)
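
The precision, recall and F-measure analogy can be illustrated on plain set overlap. The sketch below is a simplification under stated assumptions: it only compares which words are significant in the reference and in the MT output, and ignores differences in the size of the S-scores, which the measure on this slide also takes into account. The function names and the toy scores are hypothetical.

```python
def significant_words(scores, threshold=1.0):
    """Words whose S-score exceeds the threshold."""
    return {word for word, s in scores.items() if s > threshold}


def overlap_scores(ref_scores, mt_scores, threshold=1.0):
    """Return (o, u, uo) computed from set overlap only (simplifying assumption).

    o  : share of MT-significant words also significant in the reference
         (penalises overgeneration, like precision)
    u  : share of reference-significant words also significant in MT
         (penalises undergeneration, like recall)
    uo : harmonic mean of o and u (F-measure)
    """
    ref = significant_words(ref_scores, threshold)
    mt = significant_words(mt_scores, threshold)
    shared = ref & mt
    o = len(shared) / len(mt) if mt else 0.0
    u = len(shared) / len(ref) if ref else 0.0
    uo = 2 * o * u / (o + u) if (o + u) > 0 else 0.0
    return o, u, uo


# Toy S-scores: hypothetical values, not taken from the experiment.
ref_scores = {"judge": 2.0, "hearing": 1.5, "deputies": 1.2, "the": -1.0}
mt_scores = {"judge": 1.8, "hearing": 1.1, "audience": 1.4, "session": 1.3, "the": -0.9}
o, u, uo = overlap_scores(ref_scores, mt_scores)
```

With these toy values, two of the four MT-significant words and two of the three reference-significant words are shared, so the overgeneration side is penalised more heavily than the undergeneration side.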

12
Computing distance scores...

Words that changed their significance

Overgeneration score

Undergeneration score

13
Computing distance scores

Scores for avoiding over- and under-generation

Making scores compatible across texts
(the number of significant words may be
different)

14
The resulting distance scores

15
DARPA Adequacy and scores

16
o-score and DARPA 94 Adequacy

17
DARPA Fluency and scores

18
uo-score and DARPA 94 Fluency

19
Results and correlation of scores

Human expert translation scores higher than MT
Statistical MT system Candide is characteristically different
Strong positive correlation found for o-score and DARPA adequacy
Weak positive correlation found for uo-score and DARPA fluency
No correlation was found between u-score (high for statistical MT) and human MT evaluation measures
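
A system-level correlation check of this kind can be sketched as follows. The transcript does not say which correlation coefficient was used; Pearson's r is assumed here. The function name `pearson` is hypothetical, and the input lists are placeholders rather than the DARPA or distance scores from the experiment.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values, one per system (NOT the scores from the experiment).
placeholder_metric = [1.0, 2.0, 3.0, 4.0, 5.0]
placeholder_human = [2.0, 4.0, 6.0, 8.0, 10.0]
r = pearson(placeholder_metric, placeholder_human)
```

In practice each list would hold one value per MT system: the automatic score (o, u or uo) in one list and the human score (adequacy, fluency or informativeness) in the other. With only five systems, any coefficient rests on very few data points and should be read cautiously.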

20
Conclusions

Word-significance measure S is useful in other areas (e.g., distinguishing lexical and morphological differences)
Threshold S > 1 distinguishes content and function words across different languages (checked for English, French and Russian)
Statistical modelling showed substantial differences between human translation and MT output corpora
Measures of contrastive frequencies for words in a particular text and the rest of the corpus correlate with human evaluation of MT (scores for adequacy)

21
Future work

Statistical modelling of Example-based MT
Investigating the actual performance of IE
systems on different tasks using MT of different
quality (with different "usability for IE"
scores) and its correlation with proposed MT
evaluation measures
Establishing formal properties for intuitive
judgements about translation quality (translation
equivalence, adequacy, and fluency in human
translation and MT)