Number Sense Disambiguation

Stuart Moore. Supervised by Anna Korhonen (Computer Lab) and Sabine Buchholz (Toshiba CRL).

1
Number Sense Disambiguation

Stuart Moore
Supervised by
Anna Korhonen (Computer Lab)
Sabine Buchholz (Toshiba CRL)

2
Number Sense Disambiguation

Similar to Word Sense Disambiguation
Seek to classify numbers into different senses
e.g. year, time, telephone number...

3
Applications

Speech Synthesis
1990
nineteen-ninety
one thousand, nine hundred and ninety
2015
two thousand and fifteen
eight fifteen p.m.
Information Retrieval
Parsing
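The speech synthesis case can be made concrete with a small sketch: the same digit string is read aloud differently depending on the sense it is given. The sense labels (`year`, `cardinal`, `time`) and the function names below are illustrative assumptions, not part of the presented system, and the readers only cover the simple cases shown on the slide.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digits(n: int) -> str:
    """Read 0..99 as words."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def cardinal(n: int) -> str:
    """Read 0..9999 as a cardinal number."""
    thousands, rest = divmod(n, 1000)
    hundreds, low = divmod(rest, 100)
    parts = []
    if thousands:
        parts.append(ONES[thousands] + " thousand")
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    text = ", ".join(parts)
    if low or not parts:
        text = (text + " and " if parts else "") + two_digits(low)
    return text


def year(n: int) -> str:
    """Read a four-digit year as two pairs where that is simple to do."""
    high, low = divmod(n, 100)
    if high % 10 == 0 or low < 10:
        return cardinal(n)  # simplification: e.g. 2015, 1905 fall back to the cardinal reading
    return two_digits(high) + "-" + two_digits(low)


def time_24h(n: int) -> str:
    """Read HHMM (24-hour clock) as a 12-hour time."""
    hour, minute = divmod(n, 100)
    suffix = "p.m." if hour >= 12 else "a.m."
    middle = " " + two_digits(minute) if minute else ""
    return f"{ONES[hour % 12 or 12]}{middle} {suffix}"


READERS = {"year": year, "cardinal": cardinal, "time": time_24h}


def verbalise(digits: str, sense: str) -> str:
    """Expand a digit string according to its (already decided) sense."""
    return READERS[sense](int(digits))


print(verbalise("1990", "year"))      # nineteen-ninety
print(verbalise("1990", "cardinal"))  # one thousand, nine hundred and ninety
print(verbalise("2015", "time"))      # eight fifteen p.m.
```

The point of the sketch is that the expansion step is easy once the sense is known; choosing the sense is the disambiguation problem.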

4
Aim

To successfully classify numbers into sense
categories
To use a semi-supervised method
Avoids the need for a large, human annotated
training set
Allows economical adaptation to different
languages and domains

5
Differences with Word Sense Disambiguation

There are infinitely many numbers: you will
almost certainly come across 'digit strings' you
have not seen in training data.
Intuitively, the models for 2007 and 2008 should
be similar
But the model for 5, or 2007.4, should be
different
There is no resource equivalent to a dictionary,
enumerating all possible senses of a number.

6
Previous System

The report 'Normalization of Non-Standard Words'
(Sproat et al., 2001) defines a taxonomy of 13
'senses' for numbers
They annotated 4 corpora, the largest of which is
a subsection of the North American News Text
Corpus (newswire text from 1994-97)
They used this to create a decision tree
classifier
The main focus of the report was the performance
when expanding abbreviations, and numbers are not
examined in detail.

7
Number Sense Categories
(Counts are from the training data of the North
American News Text Corpus)
8
Overview of my system

Based on work by Yarowsky (1995) investigating
decision lists for Word Sense Disambiguation
Takes a few annotated 'seed examples', together
with a large, unannotated corpus.
Generates one model using the seed examples, and
applies this to the unannotated corpus.
This is used as input to generate another model.
The process can be iterated many times
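The iteration described above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the system's actual code: `bootstrap`, `train_token_votes` and the representation of an example as a tuple of context tokens are all hypothetical, and the toy trainer stands in for the decision-list learner.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Context = Tuple[str, ...]                      # tokens around one number
Model = Callable[[Context], Optional[str]]     # returns a sense, or None to abstain


def bootstrap(seeds: Dict[Context, str],
              unlabelled: Iterable[Context],
              train: Callable[[Dict[Context, str]], Model],
              iterations: int = 2) -> Model:
    """Train on the seeds, label the corpus, retrain, and repeat."""
    model = train(dict(seeds))                 # first model: seed examples only
    pool = list(unlabelled)
    for _ in range(iterations - 1):
        labels = dict(seeds)                   # seed labels are always kept
        for context in pool:
            guess = model(context)
            if guess is not None:              # unclassified examples are skipped
                labels.setdefault(context, guess)
        model = train(labels)                  # next model: seeds plus new labels
    return model


def train_token_votes(labels: Dict[Context, str]) -> Model:
    """Toy trainer: a token predicts a sense if it was only seen with that sense."""
    seen: Dict[str, set] = {}
    for tokens, sense in labels.items():
        for tok in tokens:
            seen.setdefault(tok, set()).add(sense)

    def model(tokens: Context) -> Optional[str]:
        senses = {next(iter(seen[t])) for t in tokens
                  if t in seen and len(seen[t]) == 1}
        return senses.pop() if len(senses) == 1 else None

    return model


seeds = {("in", "summer"): "year", ("call", "now"): "phone"}
corpus = [("in", "early"), ("early", "spring"), ("call", "us")]
model = bootstrap(seeds, corpus, train_token_votes, iterations=2)
print(model(("early", "spring")))  # year
```

In the toy run, the seed model cannot label `("early", "spring")`, but the second model can, because the first pass labelled `("in", "early")` and so taught it the token `early`.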

9
Overview of my system
10
Features

The context of each number is examined for a list
of features.
Local context: 5 tokens from the number
Punctuation, words, word stems, number features
Specific location (e.g. token following number)
Wider context: 15 tokens from the number
Words and word stems only
Bag of words (anywhere within the window)
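A rough sketch of how such features might be extracted for one number token is given below. The feature string format, the function name `extract_features` and the two number features (digit-string length, presence of a decimal point) are assumptions made for illustration; word stems are left out to keep the example short.

```python
import re


def extract_features(tokens, i, local=5, wide=15):
    """Return a set of feature strings for the number at tokens[i]."""
    feats = set()

    # Local context: position-specific features for words and punctuation.
    for offset in range(-local, local + 1):
        j = i + offset
        if offset == 0 or not 0 <= j < len(tokens):
            continue
        tok = tokens[j]
        kind = "punct" if re.fullmatch(r"\W+", tok) else "word"
        feats.add(f"{kind}@{offset:+d}={tok.lower()}")

    # Wider context: bag of words, position ignored, words only.
    lo, hi = max(0, i - wide), min(len(tokens), i + wide + 1)
    for j in range(lo, hi):
        if j != i and tokens[j].isalpha():
            feats.add(f"bag={tokens[j].lower()}")

    # Number features (assumed examples, not the system's actual list).
    number = tokens[i]
    feats.add(f"num_len={len(number)}")
    if "." in number:
        feats.add("num_has_point")
    return feats


tokens = ["He", "was", "born", "in", "1990", ",", "in", "Leeds"]
print(sorted(extract_features(tokens, 4)))
```

Encoding position in the local features (for example `word@-1=in`) is what distinguishes "specific location" features from the bag-of-words features, which only record that a word occurred somewhere in the window.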

11
Rules

Each rule is conditional on the presence of one
or two features
Consider all possible combinations of features
that occur together at least five times in the
training corpus.
Based on Yarowsky's rules, but more powerful
He had 'bag of words' rules, and some rules
combining two words in the local area
He did not have any specific numeric or
punctuation features.
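Candidate rule generation can be sketched as counting single features and feature pairs over the training examples and keeping those seen often enough. The function name `candidate_rules` is hypothetical, and applying the five-occurrence threshold to single features as well as to pairs is an assumption; the slide only states it for features that occur together.

```python
from collections import Counter
from itertools import combinations


def candidate_rules(examples, min_count=5):
    """examples: iterable of feature sets, one per number occurrence.

    Returns the set of rules (tuples of one or two features) that occur
    in at least `min_count` examples.
    """
    counts = Counter()
    for feats in examples:
        ordered = sorted(feats)               # canonical order, so (a, b) == (b, a)
        for feature in ordered:
            counts[(feature,)] += 1
        for pair in combinations(ordered, 2):
            counts[pair] += 1
    return {rule for rule, n in counts.items() if n >= min_count}


examples = [{"a", "b"}] * 5 + [{"a", "c"}] * 4
print(sorted(candidate_rules(examples)))  # [('a',), ('a', 'b'), ('b',)]
```

The number of pairs grows quadratically with the features per example, which is one reason a frequency threshold is useful.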

12
Ranking Rules

Follows Yarowsky (1995)
For each rule, count the number of examples for
each number sense
Calculate log likelihood
α is a parameter that can be varied to change the
effect of negative examples on the model
Rank rules according to log likelihood
When classifying, use the first rule that matches
the target sentence
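The formula from this slide did not survive in the transcript, so the sketch below uses one common smoothed form in the spirit of Yarowsky (1995): the log of (matching count + alpha) over (non-matching count + alpha). That form, and the names `rank_rules` and `classify`, are assumptions for illustration, not the exact scoring used in the system.

```python
import math
from collections import Counter, defaultdict


def rank_rules(examples, rules, alpha=0.1):
    """examples: list of (feature_set, sense); rules: iterable of feature tuples.

    Returns (score, rule, sense) triples sorted from best to worst.
    """
    per_rule = defaultdict(Counter)
    for feats, sense in examples:
        for rule in rules:
            if feats.issuperset(rule):
                per_rule[rule][sense] += 1

    ranked = []
    for rule, senses in per_rule.items():
        sense, hits = senses.most_common(1)[0]      # best sense for this rule
        misses = sum(senses.values()) - hits        # negative examples
        score = math.log((hits + alpha) / (misses + alpha))  # assumed smoothing
        ranked.append((score, rule, sense))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


def classify(feats, ranked, cutoff=0.0):
    """Decision list: the first matching rule at or above the cut-off decides."""
    for score, rule, sense in ranked:
        if score < cutoff:
            break                                   # remaining rules are weaker
        if feats.issuperset(rule):
            return sense
    return None                                     # left unclassified


data = [({"in"}, "year")] * 3 + [({"in"}, "time")] + [({"pm"}, "time")] * 2
ranked = rank_rules(data, [("in",), ("pm",)])
print(classify({"in", "pm"}, ranked))  # time
print(classify({"in"}, ranked))        # year
```

A smaller `alpha` makes a single negative example cost more relative to a rule with none, which is the sense in which the parameter controls the effect of negative examples.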

13
Performance as a fully supervised system

We applied the method to the entire training set,
and investigated its performance on the training
and test sets
This gives an idea of the 'upper bound' of
performance of the system

14
Performance on training data
[Plot: performance against log likelihood cut-off; labelled value: 97.2%]
15
Performance on test data
[Plot: performance against log likelihood cut-off; labelled values: 81.2% and 66.0%]
16
Performance as a fully Supervised system - Summary

Accuracy is 66.0% on test data
Using the most common number type for
unclassified examples increases accuracy to 81.2%
The Sproat et al. system achieves an accuracy of
97.6% on the same task
Uses decision trees instead of decision lists
Decision trees generally classify everything, so
they are less suitable for an iterative process.
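The effect of backing off to the most common number type can be illustrated with a small accuracy calculation in which unclassified examples are represented as `None`. The helper names and the toy labels are assumptions; the figures on the slide come from the real test set, not from this example.

```python
from collections import Counter


def most_common_sense(training_labels):
    """The number type seen most often in the training labels."""
    return Counter(training_labels).most_common(1)[0][0]


def accuracy(predictions, gold, default=None):
    """Fraction correct; None predictions are replaced by `default`."""
    correct = 0
    for predicted, truth in zip(predictions, gold):
        if predicted is None:
            predicted = default          # stays None (wrong) if no default given
        correct += predicted == truth
    return correct / len(gold)


gold = ["year", "cardinal", "cardinal", "time"]
predictions = ["year", None, None, "cardinal"]
fallback = most_common_sense(["cardinal", "cardinal", "year"])

print(accuracy(predictions, gold))                    # 0.25
print(accuracy(predictions, gold, default=fallback))  # 0.75
```

A decision list that abstains scores zero on every unclassified example, so any default that is right some of the time can only raise accuracy.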

17
Performance as a fully Supervised system - Summary

A large proportion of the test data
(approximately 25%) was unclassified.
By adding in unlabelled data to the training set,
we hope to increase coverage of the rules, and
thereby boost accuracy
(experiment not yet performed)

18
Performance as a semi-supervised system

Concept
Provide a small number of seed examples, from
which rules are extrapolated over various
iterations.
Important to have high precision in the first
iteration
(Recall can be low, as long as it's not too low)
Future iterations aim to improve recall

19
Performance as a semi-supervised system

After experimenting with a few different
strategies for the first iteration, the following
was found to perform best:
Rank all rules based on their scores from the
seed examples
For each number type, take the three highest
scoring rules (more if several had an equal
score)
Apply these rules to the unlabelled data.
If a number is matched by rules from more than
one number type, do not classify it
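The first-iteration strategy above might look like the following sketch. It assumes rules arrive as `(score, rule, sense)` triples, as a ranking step could produce; the function names are hypothetical, and extending ties only at the third-place score is one reading of "more if several had an equal score".

```python
from collections import defaultdict


def select_seed_rules(ranked, per_type=3):
    """Keep the `per_type` best rules per number type, plus ties at the last kept score."""
    by_sense = defaultdict(list)
    for score, rule, sense in sorted(ranked, key=lambda r: r[0], reverse=True):
        kept = by_sense[sense]
        if len(kept) < per_type or score == kept[-1][0]:
            kept.append((score, rule))
    return {sense: [rule for _, rule in kept] for sense, kept in by_sense.items()}


def label_unambiguous(feats, seed_rules):
    """Label only when the matching rules all come from a single number type."""
    matched = {sense for sense, rules in seed_rules.items()
               if any(feats.issuperset(rule) for rule in rules)}
    return matched.pop() if len(matched) == 1 else None


ranked = [(3.0, ("a",), "year"), (2.0, ("b",), "year"), (1.0, ("c",), "year"),
          (1.0, ("d",), "year"), (0.5, ("e",), "year"), (2.0, ("x",), "time")]
rules = select_seed_rules(ranked)
print(label_unambiguous({"a"}, rules))       # year
print(label_unambiguous({"a", "x"}, rules))  # None
```

Refusing to label conflicting matches trades recall for precision, which suits a first iteration whose output becomes training data for the next.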

20
How many seed examples are needed?

Seed examples were randomly picked from the
training data
Equal numbers of seed examples for each number
type
Definite improvement seen when going up to 40 seed
examples
Limited improvement after this point

[Plot axis: Precision (% of those assigned where the category
is correct)]
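Seed selection with equal numbers per number type, and the precision measure as defined on the plot, can be sketched as follows. The names `pick_seeds` and `precision` are illustrative assumptions, and the sketch takes no position on whether the seed counts quoted above are totals or per-type figures.

```python
import random
from collections import defaultdict


def pick_seeds(labelled, per_type, rng):
    """Randomly pick up to `per_type` seed examples for each number type."""
    by_sense = defaultdict(list)
    for example, sense in labelled:
        by_sense[sense].append(example)
    seeds = []
    for sense, items in sorted(by_sense.items()):
        for example in rng.sample(items, min(per_type, len(items))):
            seeds.append((example, sense))
    return seeds


def precision(predictions, gold):
    """Of the examples that were assigned a category, the fraction that is correct."""
    assigned = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    if not assigned:
        return 0.0
    return sum(p == g for p, g in assigned) / len(assigned)


labelled = [(i, "year") for i in range(10)] + [(i, "time") for i in range(10, 13)]
seeds = pick_seeds(labelled, per_type=5, rng=random.Random(0))
print(len(seeds))                                                      # 8
print(precision(["year", None, "time"], ["year", "time", "year"]))     # 0.5
```

Unlike accuracy, this precision ignores unclassified examples, so it measures how trustworthy the first-iteration labels are rather than how many numbers were covered.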
21
Performance of the second iteration: training
data
Peak: 84.84% (LogLike > 5.0)
Baseline: 56.24%
[Plot x-axis: Log Likelihood cut-off]
22
Performance of the second iteration: test data
Peak: 75.2% (LogLike > 5.2)
Using the previous peak value (cut-off 5.0) gives
74.93% accuracy
[Plot x-axis: Log Likelihood cut-off]
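Choosing the cut-off on the training data and then reusing it on the test data can be sketched as a simple sweep. This is a hedged illustration: the input format (score of the rule that fired, its prediction, the gold label), the function names, and the choice to back off to a default type below the cut-off are all assumptions rather than details taken from the slides.

```python
def accuracy_at(cutoff, scored, default):
    """Accuracy when only rules scoring above `cutoff` are trusted."""
    correct = 0
    for score, predicted, gold in scored:
        guess = predicted if score > cutoff else default
        correct += guess == gold
    return correct / len(scored)


def pick_cutoff(scored, candidates, default):
    """Return the candidate cut-off with the highest accuracy on `scored`."""
    return max(candidates, key=lambda c: accuracy_at(c, scored, default))


train = [(6.0, "year", "year"), (5.5, "time", "time"),
         (4.0, "time", "year"), (3.0, "cardinal", "year")]
best = pick_cutoff(train, [2.0, 5.0], default="year")
print(best)                                         # 5.0
print(accuracy_at(best, train, default="year"))     # 1.0
```

Tuning on training data and reporting at that value on test data, as the slide does with cut-off 5.0, avoids picking the threshold by looking at the test set.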
23
Future Work

Error analysis of the data
More sophisticated features
Part of Speech tags, or a parser
More sophisticated rules
Try to allow more than two features per rule,
without creating too many rules to be handled.
Different rule strategies
Closer to a decision tree
Other machine learning methods?

24
Future Work

Increase coverage
Investigate use of document-level features, using
the method from Stevenson et al. (2008)
Investigate different strategies for picking the
seed examples
Distribute according to relative frequency of
categories, rather than a set number per category
Investigate the effects of more unannotated data
Can use sections of the North American News
Corpus that haven't been annotated.

25
Future Work

Consider modifying the number classes
Should some categories be combined?
Would moving the categories into a tree structure
improve performance?
Are different classes needed for different
domains (e.g. financial, biomedical) or
languages?
Investigate corpus for consistency
A few inconsistent examples have been identified

27
Number Features