1
Natural Language Applications of Instance-Based
Learning
  • Stress acquisition (Daelemans et al., 1994)
  • Grapheme-to-phoneme conversion (van den Bosch &
    Daelemans, 1993)
  • Part-of-speech tagging (Daelemans et al., 1996)
  • Domain-specific lexical tagging (Cardie, 1993)
  • Word sense disambiguation (Ng & Lee, 1996;
    Mooney, 1996)
  • Partial parsing (Argamon et al., 1998; Cardie &
    Pierce, 1998)
  • PP-attachment (Daelemans et al., 1999)
  • Context-sensitive parsing (Simmons & Yu, 1992)
  • Text categorization (Riloff & Lehnert, 1994)

2
Part-of-Speech Tagging (Daelemans & Zavrel, 1996)
  • Similarity metric: Hamming distance
  • Feature weighting: information gain
  • Training method
  • Tagging (see the sketch after this slide)
  • If the word is in the lexicon, use the
    known-word case base
  • Else use the unknown-word case base
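A minimal sketch of the scheme on this slide, assuming toy data structures (each case base as a `(cases, tags, weights)` triple with cases as feature tuples); the real memory-based tagger differs in many details, and the helper names here are mine, not the paper's:

```python
import math
from collections import Counter

def information_gain(cases, labels, feature_idx):
    """Weight for one feature: label entropy minus the expected
    label entropy after splitting on the feature's values."""
    def entropy(items):
        n = len(items)
        return -sum((c / n) * math.log2(c / n)
                    for c in Counter(items).values())
    split = {}
    for case, label in zip(cases, labels):
        split.setdefault(case[feature_idx], []).append(label)
    expected = sum(len(s) / len(labels) * entropy(s)
                   for s in split.values())
    return entropy(labels) - expected

def weighted_hamming(a, b, weights):
    """IG-weighted Hamming distance: sum the weights of the
    features on which the two cases disagree."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

def tag(word, context, lexicon, known_base, unknown_base):
    """Dispatch to the known- or unknown-word case base, then
    return the tag of the nearest stored case (1-NN here)."""
    cases, tags, weights = known_base if word in lexicon else unknown_base
    nearest = min(range(len(cases)),
                  key=lambda i: weighted_hamming(cases[i], context,
                                                 weights))
    return tags[nearest]
```

Here each feature's weight would be precomputed once with `information_gain` over the training cases, so informative features dominate the distance.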

3
Case Representation
  • Known words: 4 features
  • disambiguated tags of the two preceding tokens
    (2 features)
  • list of possible tags for the focus token
  • list of possible tags for the following token

4
Case Representation
  • Unknown words: 6 features (both representations
    are sketched after this slide)
  • disambiguated tag of the preceding token
  • list of possible tags for the following token
  • last three letters (3 features, in lieu of
    morphological info)
  • first letter (to provide capitalization and
    prefix information)
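Both case representations could be assembled roughly as follows; the helper names and the tag-set encoding are assumptions for illustration, not the original implementation:

```python
def known_word_case(prev2_tag, prev1_tag, focus_tags, next_tags):
    """Known word, 4 features: disambiguated tags of the two
    preceding tokens, plus the ambiguous tag sets of the focus
    and following tokens (each set joined into one symbol)."""
    return (prev2_tag, prev1_tag,
            "-".join(sorted(focus_tags)),
            "-".join(sorted(next_tags)))

def unknown_word_case(prev1_tag, next_tags, word):
    """Unknown word, 6 features: preceding disambiguated tag,
    following ambiguous tag set, last three letters (a stand-in
    for morphology), and the first letter (capitalization and
    prefix information)."""
    padded = word.rjust(3, "_")   # pad very short words
    return (prev1_tag, "-".join(sorted(next_tags)),
            padded[-3], padded[-2], padded[-1], word[0])
```

For example, `unknown_word_case("DT", {"NN", "NNP"}, "Toyota")` yields `("DT", "NN-NNP", "o", "t", "a", "T")`.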

5
Results
  • Penn Treebank WSJ (Marcus et al., 1993)
  • 2 million words of training data
  • 200,000-word blind test set
  • numbers are considered unknown words
  • Performs comparably to transformation-based
    learning (Brill, 1995)
  • Outperforms the best Markov-model-based tagger
    (Weischedel et al., 1993)

6
Domain-Specific Lexical Tagging (Cardie, 1993,
1994)
  • Information extraction system: business joint
    ventures
  • Task: for each word in the input stream,
    determine its
  • part-of-speech (18 tags)
  • general semantic class (14 classes)
  • specific semantic class (42 classes)
  • information extraction concept (11 concepts)
  • Treat each prediction task independently

7
Case Retrieval
  • Similarity metric
  • k-nearest neighbor
  • Hamming distance (partial matches allowed)
  • Use training corpus to determine k
  • Feature selection: decision trees
  • Retrieval algorithm (sketched after this slide)
  • Retrieve the top k cases.
  • Return those cases whose focus-word feature
    matches the focus word, if any exist. Otherwise
    return all k cases.
  • Let the retrieved cases vote on the four class
    values.
  • Case base construction
  • p-o-s, IE concept
  • 120 sentences from the MUC business JV corpus
  • 2,056 cases for open-class words
  • semantic classes
  • 175 sentences from the MUC business JV corpus
  • 3,060 cases for open-class words
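A sketch of the retrieval algorithm above, assuming each stored case is a dict with a `features` tuple and a `classes` dict holding the four class values (the dict layout is an assumption for illustration):

```python
from collections import Counter

def hamming(a, b):
    """Plain Hamming distance; partial matches simply disagree
    on fewer features and so score lower."""
    return sum(x != y for x, y in zip(a, b))

def retrieve_and_vote(query, case_base, k, focus_idx,
                      tasks=("pos", "gen_sem", "spec_sem", "concept")):
    """Retrieve the top k cases, keep those whose focus-word
    feature matches the query's if any exist, then let the
    survivors vote on each class value independently."""
    top_k = sorted(case_base,
                   key=lambda c: hamming(c["features"], query))[:k]
    same_focus = [c for c in top_k
                  if c["features"][focus_idx] == query[focus_idx]]
    voters = same_focus or top_k   # otherwise fall back to all k
    return {t: Counter(c["classes"][t]
                       for c in voters).most_common(1)[0][0]
            for t in tasks}
```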

8
Results
  • Lexical tagging tasks
  • part-of-speech tagging: 95.0%
  • general semantic class: 85.7%
  • specific semantic class: 86.3%
  • IE concept: 96.8%
  • 60-70% for non-nil concepts
  • Replacing the CBL system for semantic class
    tagging with handcrafted, conservative
    heuristics caused a severe drop in the recall
    of the IE system (41%).
  • No separate procedure or case representation is
    needed for unknown words: 89.1% accuracy for
    p-o-s tagging of unknown words.
  • Can detect when known words are appearing in
    entirely new contexts.

9
Word sense disambiguation (Ng & Lee, 1996)
  • Sense definitions from WordNet
  • Builds one classifier per word
  • Case representation for word w
  • Li: correct p-o-s of the word i positions to
    the left
  • Ri: correct p-o-s of the word i positions to
    the right
  • M: morphological form of w
  • K1 ... Km: binary features indicating the
    presence of m keywords that frequently co-occur
    with w in the same sentence
  • determined by computing P(sense i | keyword k)
    for all words that appear with w in a sentence
  • P(sense i | keyword k) > M1
  • k must appear > M2 times with sense i
  • at most M3 keywords are chosen
  • C1 ... C9: local collocations containing w
  • e.g. interest, interest rate, national
    interest in
  • V: verb that takes w as an object and is
    predictive of sense i

L3, L2, L1, R3, R2, R1, M, K1, ..., Km, C1, ...,
C9, V
(this vector is assembled in the sketch below)
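One way to assemble that vector; the collocation windows, padding symbol, and helper names are illustrative assumptions rather than the paper's exact choices:

```python
# Nine local-collocation windows as (lo, hi) offsets around the
# focus word; the focus word itself is included when lo <= 0 <= hi.
# These particular windows are illustrative, not Ng & Lee's list.
WINDOWS = [(-1, -1), (1, 1), (-2, -2), (2, 2), (-2, -1),
           (-1, 1), (1, 2), (-3, -1), (1, 3)]

def wsd_case(tokens, pos_tags, focus, keywords, verb):
    """Build (L3..L1, R3..R1, M, K1..Km, C1..C9, V) for the word
    at position `focus`; `keywords` is the precomputed keyword
    list for this word and `verb` its governing verb (or None)."""
    def pos_at(offset):
        i = focus + offset
        return pos_tags[i] if 0 <= i < len(pos_tags) else "<pad>"
    def colloc(lo, hi):
        return " ".join(tokens[i] if 0 <= i < len(tokens) else "<pad>"
                        for i in range(focus + lo, focus + hi + 1))
    left = [pos_at(-d) for d in (3, 2, 1)]     # L3, L2, L1
    right = [pos_at(d) for d in (3, 2, 1)]     # R3, R2, R1
    morph = tokens[focus].lower()              # M: surface form as a
                                               # stand-in for morphology
    sentence = set(tokens)
    k_feats = [int(k in sentence) for k in keywords]    # K1 .. Km
    c_feats = [colloc(lo, hi) for lo, hi in WINDOWS]    # C1 .. C9
    return left + right + [morph] + k_feats + c_feats + [verb]
```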
10
Case Retrieval
  • Uses the PEBLS system (Cost & Salzberg, 1993)
  • 1-nearest neighbor
  • distance between two values v1 and v2 of
    feature f (sketched after this slide):
    d(v1, v2) = Σ_i | C1,i / C1 - C2,i / C2 |
  • where
  • C1,i is the number of training examples with
    value v1 for f that are classified as sense i,
    and
  • C1 is the number of training examples with
    value v1 for f in any sense
  • (C2,i and C2 are defined analogously for v2)
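A sketch of that value-difference computation (the modified value difference metric used by PEBLS), with an assumed fallback for unseen values:

```python
from collections import Counter, defaultdict

def value_counts(training, feature_idx):
    """Precompute C_{v,i} (count of value v labeled sense i) and
    C_v (count of value v overall) for one feature, from
    (case, sense) training pairs."""
    by_sense = defaultdict(Counter)
    for case, sense in training:
        by_sense[case[feature_idx]][sense] += 1
    totals = {v: sum(c.values()) for v, c in by_sense.items()}
    senses = {s for c in by_sense.values() for s in c}
    return by_sense, totals, senses

def mvdm(v1, v2, by_sense, totals, senses):
    """d(v1, v2) = sum_i |C1,i/C1 - C2,i/C2|: two values are close
    when they induce similar sense distributions."""
    if v1 not in totals or v2 not in totals:
        return 1.0   # arbitrary fallback for unseen values,
                     # not PEBLS's actual handling
    return sum(abs(by_sense[v1][i] / totals[v1] -
                   by_sense[v2][i] / totals[v2]) for i in senses)
```

The distance between two full cases then sums these per-feature value differences; PEBLS additionally weights exemplars by their reliability, which is omitted here.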

11
Evaluation
  • Bruce & Wiebe (1994) data set
  • 2,369 sentences with the noun interest
  • 6 possible senses
  • 100 trials: 600 random test sentences, 1,769
    training sentences each
  • Accuracy: 87.4% (1.37% std. dev.)
  • Bruce & Wiebe attain 78% accuracy using
    decomposable probabilistic models
  • substantially higher accuracy than any previous
    WSD work on interest
  • New corpus
  • 192,800 sense-tagged words
  • 121 nouns, 7.8 senses per noun
  • 70 verbs, 12.0 senses per verb
  • these frequently occurring words account for
    20% of all word occurrences
  • test set 1 (Brown corpus): 54.0% vs. 47.1% for
    the most frequent sense
  • test set 2 (WSJ): 68.6% vs. 63.7% for the most
    frequent sense