Title: Natural Language Applications of Instance-Based Learning
Natural Language Applications of Instance-Based Learning
- Stress acquisition (Daelemans et al., 1994)
- Grapheme-to-phoneme conversion (van den Bosch & Daelemans, 1993)
- Part-of-speech tagging (Daelemans et al., 1996)
- Domain-specific lexical tagging (Cardie, 1993)
- Word sense disambiguation (Ng & Lee, 1996; Mooney, 1996)
- Partial parsing (Argamon et al., 1998; Cardie & Pierce, 1998)
- PP-attachment (Daelemans et al., 1999)
- Context-sensitive parsing (Simmons & Yu, 1992)
- Text categorization (Riloff & Lehnert, 1994)
Part-of-Speech Tagging (Daelemans & Zavrel, 1996)
- Similarity metric: Hamming distance
- Feature weighting: information gain
- Training method
- Tagging (see the sketch below)
- If the word is in the lexicon, use the known-word case base
- Else use the unknown-word case base
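A minimal sketch of this tagging scheme (not the authors' MBT implementation; the helper names and the representation of a case as a (feature-tuple, tag) pair are assumptions):

```python
import math
from collections import Counter, defaultdict

def information_gain_weights(cases, n_features):
    """Weight each feature by how much it reduces tag entropy."""
    def entropy(labels):
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    base = entropy([tag for _, tag in cases])
    weights = []
    for f in range(n_features):
        groups = defaultdict(list)
        for feats, tag in cases:
            groups[feats[f]].append(tag)
        remainder = sum(len(g) / len(cases) * entropy(g) for g in groups.values())
        weights.append(base - remainder)
    return weights

def nearest_tag(probe, cases, weights):
    """1-NN under information-gain-weighted Hamming (overlap) distance."""
    def distance(a, b):
        return sum(w for w, x, y in zip(weights, a, b) if x != y)
    _, tag = min(cases, key=lambda case: distance(probe, case[0]))
    return tag

def tag_token(word, probe, lexicon, known_cases, unknown_cases, kw, uw):
    # Route the token to the known-word or unknown-word case base.
    if word in lexicon:
        return nearest_tag(probe, known_cases, kw)
    return nearest_tag(probe, unknown_cases, uw)
```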
Case Representation: Known Words (4 features)
- disambiguated tags for the two preceding tokens
- list of possible tags for the focus token
- list of possible tags for the following token
Case Representation: Unknown Words (6 features)
- disambiguated tag for the preceding token
- list of possible tags for the following token
- three last letters (in lieu of morphological info)
- first letter (to provide capitalization and prefix information)
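A sketch of how the two case vectors above might be assembled (the helper names and the "_" padding value are assumptions; `lexicon` maps each known word to its set of possible tags):

```python
def known_word_case(i, words, tags_so_far, lexicon):
    """4 features: disambiguated tags of the two preceding tokens,
    possible tags of the focus token, possible tags of the next token."""
    nxt = words[i + 1] if i + 1 < len(words) else None
    return (
        tags_so_far[i - 2] if i >= 2 else "_",
        tags_so_far[i - 1] if i >= 1 else "_",
        frozenset(lexicon[words[i]]),          # focus word is known here
        frozenset(lexicon.get(nxt, ())) if nxt else "_",
    )

def unknown_word_case(i, words, tags_so_far, lexicon):
    """6 features: disambiguated tag of the preceding token, possible tags
    of the next token, the last three letters (a stand-in for morphology),
    and the first letter (capitalization / prefix information)."""
    w = words[i]
    nxt = words[i + 1] if i + 1 < len(words) else None
    padded = ("___" + w)[-3:]                  # pad words shorter than 3 letters
    return (
        tags_so_far[i - 1] if i >= 1 else "_",
        frozenset(lexicon.get(nxt, ())) if nxt else "_",
        padded[0], padded[1], padded[2],       # three last letters
        w[0],                                  # first letter
    )
```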
Results
- Penn Treebank WSJ (Marcus et al., 1993)
- 2 million words of training data
- 200,000-word blind test set
- numbers are treated as unknown words
- Performs comparably to transformation-based learning (Brill, 1995)
- Outperforms the best Markov-model tagger (Weischedel et al., 1993)
Domain-Specific Lexical Tagging (Cardie, 1993, 1994)
- Information extraction system: business joint ventures
- Task: for each word in the input stream, determine its
- part-of-speech (18 values)
- general semantic class (14 values)
- specific semantic class (42 values)
- information extraction concept (11 values)
- Treat each prediction task independently
Case Retrieval
- Similarity metric
- k-nearest neighbor
- Hamming distance (partial matches allowed)
- use the training corpus to determine k
- Feature selection: decision trees
- Retrieval algorithm (sketched after this slide)
- Retrieve the top k cases.
- Return those cases whose focus-word feature matches the focus word, if any exist; otherwise return all k cases.
- Let the retrieved cases vote on the four class values.
- Case base construction
- p-o-s, IE concept
- 120 sentences from the MUC business JV corpus
- 2056 cases for open-class words
- semantic classes
- 175 sentences from the MUC business JV corpus
- 3060 cases for open-class words
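A sketch of the retrieval-and-voting loop described above (the case layout, `FOCUS` index, and task names are assumptions; the decision-tree feature selection step is omitted):

```python
from collections import Counter

FOCUS = 0  # position of the focus-word feature in the case vector
TASKS = ("pos", "general_sem", "specific_sem", "ie_concept")

def hamming(a, b):
    """Count mismatching features; partial matches give smaller distance."""
    return sum(x != y for x, y in zip(a, b))

def retrieve_and_vote(probe, case_base, k):
    # Step 1: retrieve the top k cases by Hamming distance.
    top_k = sorted(case_base, key=lambda c: hamming(probe, c["features"]))[:k]

    # Step 2: keep only cases whose focus word matches, if any exist.
    same_focus = [c for c in top_k if c["features"][FOCUS] == probe[FOCUS]]
    voters = same_focus or top_k

    # Step 3: the retrieved cases vote on each of the four class values.
    return {t: Counter(c[t] for c in voters).most_common(1)[0][0] for t in TASKS}
```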
Results
- Lexical tagging tasks
- part-of-speech tagging: 95.0%
- general semantic class: 85.7%
- specific semantic class: 86.3%
- IE concept: 96.8%
- 60-70% for non-nil concepts
- Replacing the CBL system for semantic class tagging with handcrafted, conservative heuristics caused a severe drop in the recall of the IE system (41%).
- No separate procedure or case representation is needed for unknown words: 89.1% accuracy for p-o-s tagging of unknown words.
- Can detect when known words appear in entirely new contexts.
Word Sense Disambiguation (Ng & Lee, 1996)
- Sense definitions from WordNet
- Builds one classifier per word
- Case representation for word w
- L_i: correct p-o-s of the word i positions to the left
- R_i: correct p-o-s of the word i positions to the right
- M: morphological form of w
- K_1 ... K_m: binary features indicating the presence of m words that frequently co-occur with w in the same sentence
- determined by computing p_w(sense_i | keyword k) for all words that appear with w in a sentence
- p > M1
- k must appear > M2 times with sense_i
- at most M3 keywords are chosen
- C_1 ... C_9: local collocations containing w
- e.g. "interest", "interest rate", "national interest in"
- V: verb that takes w, with sense i, as an object
- Case vector: (L_3, L_2, L_1, R_3, R_2, R_1, M, K_1, ..., K_m, C_1, ..., C_9, V)
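A simplified sketch of the keyword-selection criterion behind the K features (the paper's exact bookkeeping may differ; M1, M2, M3 are its thresholds, whose values are not reproduced here):

```python
from collections import Counter, defaultdict

def select_keywords(sense_tagged_sentences, w, M1, M2, M3):
    """Keep words k with p(sense_i | k) > M1 that co-occur with sense i
    of w more than M2 times; return at most M3 of them."""
    sense_counts = defaultdict(Counter)   # keyword -> counts over w's senses
    for tokens, sense in sense_tagged_sentences:   # sentences containing w
        for k in set(tokens) - {w}:
            sense_counts[k][sense] += 1

    candidates = []
    for k, counts in sense_counts.items():
        sense, count = counts.most_common(1)[0]
        p = count / sum(counts.values())  # p(sense | k appears with w)
        if p > M1 and count > M2:
            candidates.append((p, k))
    return [k for _, k in sorted(candidates, reverse=True)[:M3]]
```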
Case Retrieval
- Uses the PEBLS system (Cost & Salzberg, 1993)
- 1-nearest neighbor
- Distance between two values v1 and v2 of feature f:
  d(v_1, v_2) = \sum_i \left| \frac{C_{1,i}}{C_1} - \frac{C_{2,i}}{C_2} \right|
  where
- C_{1,i} is the number of training examples with value v1 for f that are classified as sense i, and
- C_1 is the number of training examples with value v1 for f in any sense
- (C_{2,i} and C_2 are defined analogously for v2)
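A sketch of that value-difference computation, following the definitions on this slide (the representation of cases as (feature-tuple, sense) pairs is an assumption):

```python
from collections import Counter, defaultdict

def value_difference(cases, f, v1, v2):
    """d(v1, v2) = sum over senses i of |C_{1,i}/C_1 - C_{2,i}/C_2|,
    where C_{j,i} counts training cases with value vj for feature f
    classified as sense i, and C_j counts all cases with value vj for f.
    Assumes both values occur somewhere in the training cases."""
    per_value = defaultdict(Counter)          # feature value -> sense counts
    for feats, sense in cases:
        per_value[feats[f]][sense] += 1
    c1, c2 = per_value[v1], per_value[v2]
    n1, n2 = sum(c1.values()), sum(c2.values())
    senses = set(c1) | set(c2)
    return sum(abs(c1[s] / n1 - c2[s] / n2) for s in senses)
```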
Evaluation
- Bruce & Wiebe (1994) data set
- 2369 sentences containing the noun "interest"
- 6 possible senses
- 100 trials: 600 random test sentences, 1769 training sentences per trial
- Accuracy: 87.4% (std. dev. 1.37)
- Bruce & Wiebe attain 78% accuracy using decomposable probabilistic models
- substantially higher accuracy than any previous WSD work on "interest"
- New corpus
- 192,800 sense-tagged words
- 121 nouns, 7.8 senses per noun on average
- 70 verbs, 12.0 senses per verb on average
- these frequently occurring words account for 20% of all word occurrences
- test set 1 (Brown corpus): 54.0% vs. 47.1% for the most-frequent-sense baseline
- test set 2 (WSJ): 68.6% vs. 63.7% for the most-frequent-sense baseline