Rule Learning Latest Results - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Rule Learning Latest Results
  • Katharina Probst
  • September 18, 2002

2
Reminder: Learning Process
  • Seed generation: for each sentence pair, one or
    more seed rules are produced
  • Compositionality: based on previously learned
    rules, as much structure as possible is added to
    the seed rules
  • Version space learning: the compositional seed
    rules are grouped by constituent sequences and
    alignments, then generalized
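
The grouping step of version space learning can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the `SeedRule` fields, the part-of-speech labels, and the function name are assumptions, and the generalization step that follows the grouping is not shown.

```python
from collections import defaultdict
from typing import NamedTuple


class SeedRule(NamedTuple):
    """Hypothetical seed rule; the field names are illustrative assumptions."""
    sl_constituents: tuple  # source-language constituent sequence
    tl_constituents: tuple  # target-language constituent sequence
    alignments: tuple       # (source index, target index) pairs, 1-based


def group_into_version_spaces(rules):
    """Group rules that share constituent sequences and alignments."""
    spaces = defaultdict(list)
    for rule in rules:
        key = (rule.sl_constituents, rule.tl_constituents, rule.alignments)
        spaces[key].append(rule)
    return dict(spaces)


rules = [
    SeedRule(("DET", "ADV", "ADJ", "N"), ("DET", "ADV", "ADJ", "N"),
             ((1, 1), (2, 2), (3, 3), (4, 4))),
    SeedRule(("DET", "ADV", "ADJ", "N"), ("DET", "ADV", "ADJ", "N"),
             ((1, 1), (2, 2), (3, 3), (4, 4))),
    SeedRule(("ADV", "ADJ"), ("ADV", "ADJ"), ((1, 1), (2, 2))),
]
version_spaces = group_into_version_spaces(rules)
group_sizes = sorted(len(group) for group in version_spaces.values())
```

Each resulting group is a candidate version space; rules inside one group would then be merged and generalized.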

3
Summer 2002 timeline
4
Evaluation Corpus
  • 141 phrases and sentences
  • Restricted to simple constructions
  • Some NPs, e.g. 'the very tall man'
  • Some AdjPs, e.g. 'very tall'
  • Simple sentences with DO, INDO, and some other
    expressions, e.g. 'The man saw the woman.', 'The
    family ate dinner.'
  • No PPs, relative clauses, questions, etc.

5
Target language: German
  • Evaluation corpus was translated into German and
    aligned
  • Some alignment issues, e.g.
  • The wife is sitting down.
  • Die Ehefrau setzt sich hin.
  • ((1,1),(2,2),(4,3),(5,5))
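
To make the alignment issue concrete, the sketch below reads the alignment list for this sentence pair and reports which words are left without a partner. It assumes that each pair is (source position, target position), that positions are 1-based word indices, and that the final period is not counted as a word; the function name is hypothetical.

```python
import re


def parse_alignment(text):
    """Turn '((1,1),(2,2))' into a list of (source, target) integer pairs."""
    return [(int(s), int(t)) for s, t in re.findall(r"\((\d+),(\d+)\)", text)]


source = "The wife is sitting down".split()
target = "Die Ehefrau setzt sich hin".split()
links = parse_alignment("((1,1),(2,2),(4,3),(5,5))")

# Look up the words behind each 1-based index pair.
aligned_words = [(source[s - 1], target[t - 1]) for s, t in links]

# Words that take part in no alignment link at all.
linked_source = {s for s, _ in links}
linked_target = {t for _, t in links}
unaligned_source = [w for i, w in enumerate(source, 1) if i not in linked_source]
unaligned_target = [w for i, w in enumerate(target, 1) if i not in linked_target]
```

Under these assumptions, 'is' on the English side and 'sich' on the German side remain unaligned.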

6
Iterative Type Learning
  • Among (many) other things, the system reads in a
    file which specifies in what order the types of
    transfer rules are to be learned.
  • Types are specified in the training corpus for
    each sentence pair.
  • E.g. if the file specifies
  • AdjP,NP,S
  • then the system learns rules of these types, in
    this order, and no other rules.
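
A minimal sketch of this ordering logic, assuming the order is given as one comma-separated line and each training pair carries a `type` label. The function names, the dictionary layout, the PP example, and the stub learner are all hypothetical stand-ins for the real system.

```python
def parse_type_order(spec):
    """Split a line such as 'AdjP,NP,S' into an ordered list of rule types."""
    return [name.strip() for name in spec.split(",") if name.strip()]


def learn_in_order(spec, corpus, learn_rules):
    """Learn rules type by type; types missing from the spec are skipped."""
    learned = []
    for rule_type in parse_type_order(spec):
        examples = [pair for pair in corpus if pair["type"] == rule_type]
        # Earlier rules are passed along so later types can build on them.
        learned.extend(learn_rules(rule_type, examples, learned))
    return learned


def stub_learner(rule_type, examples, learned_so_far):
    """Placeholder learner: records the type and how many examples it saw."""
    return [(rule_type, len(examples))]


corpus = [
    {"type": "S", "sl": "The man saw the woman."},
    {"type": "NP", "sl": "the very tall man"},
    {"type": "AdjP", "sl": "very tall"},
    {"type": "S", "sl": "The family ate dinner."},
    {"type": "PP", "sl": "in the house"},  # hypothetical type, not in the spec
]
result = learn_in_order("AdjP,NP,S", corpus, stub_learner)
```

Learning the smaller types first lets the compositionality step reuse AdjP rules inside NP rules, and NP rules inside S rules.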

7
Cross-Validation
  • The training corpus was divided into 10 parts
  • Training on 90% of the corpus, testing on the
    remaining 10% of the corpus
  • The 10% test set is obtained by excluding every
    (k10)th sentence from the training set, 1 ≤ k ≤ 9
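
One way to read this split is sketched below. The exact indexing formula is garbled in the transcript, so this assumes that fold k holds out the sentences at 1-based positions k, k+10, k+20, and so on; the function name and the placeholder corpus are hypothetical.

```python
def split_fold(sentences, k, n_folds=10):
    """Hold out every n_folds-th sentence, starting at 1-based position k."""
    if not 1 <= k <= n_folds:
        raise ValueError("k must be between 1 and n_folds")
    test = sentences[k - 1::n_folds]
    held_out = set(range(k - 1, len(sentences), n_folds))
    train = [s for i, s in enumerate(sentences) if i not in held_out]
    return train, test


# Placeholder corpus with the same size as the evaluation corpus (141 items).
corpus = [f"sentence {i}" for i in range(1, 142)]
train, test = split_fold(corpus, k=1)
```

Taking every tenth sentence rather than a contiguous block spreads each test set across the whole corpus, so every fold sees a similar mix of NPs, AdjPs, and sentences.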

8
Translation Accuracy (I)
9
Translation Accuracy (II)
10
Translation Accuracy (III) - Comparison
11
Average number of possible translations per sentence
12
Number of version spaces
13
Average number of sentences per version space
14
Average number of merges per version space
15
Proportion of compositional rules
16
Average number of compositional elements per rule
17
Conclusions
  • Preliminary results may not scale, so conclusions
    are to be taken with a grain of salt
  • Version Space learning helps
  • Compositionality improves test set accuracy
  • On average, a version space (VS) contains 4
    sentences and is merged 1.6 times
  • With lots of TL information, on average about 4-5
    possible translations per sentence

18
Future work
  • Bigger corpus
  • Different languages
  • Less information on TL side
  • Reverse translation direction
  • Refine algorithms (e.g. step size)