Title: Classifying Unknown Proper Noun Phrases Without Context
1. Classifying Unknown Proper Noun Phrases Without Context
- Joseph Smarr, Christopher D. Manning
- Symbolic Systems Program
- Stanford University
- April 5, 2002
2. The Problem of Unknown Words
- No statistics are generated for unknown words → problematic for statistical NLP
- Same problem for Proper Noun Phrases (PNPs)
- Also need to bracket the entire PNP
- Particularly acute in domains with a large number of terms, or where new words are constantly generated
- Drug names
- Company names
- Movie titles
- Place names
- People's names
3. Proper Noun Phrase Classification
- Task: Given a Proper Noun Phrase (one or more words that collectively refer to an entity), assign it a semantic class (e.g. drug name, company name, etc.)
- Example: MUC ENAMEX test (classifying PNPs in text as organizations, places, and people)
- Problem: How do we classify unknown PNPs?
4. Existing Techniques for PNP Classification
- Large, manually constructed lists of names
- Includes common words (Inc., Dr., etc.)
- Syntactic patterns in surrounding context
- "XXXX himself" → person
- "profession of/at/with XXXX" → organization
- Machine learning with word-level features
- Capitalization, punctuation, special characters, etc.
5. Limitations of Existing Techniques
- Manually constructed lists and rules
- Slow/expensive to create and maintain
- Domain-specific solutions
- Won't generalize to new categories
- Misses a valuable source of information
- People often classify PNPs by how they look:
- Cotrimoxazole
- Wethersfield
- Alien Fury: Countdown to Invasion
6. What's in a Name?
- Claim: If people can classify unknown PNPs without context, they must be using the composition of the PNP itself
- Common accompanying words
- Common letters and letter sequences
- Number and length of words in the PNP
- Idea: Build a statistical generative model that captures these features from data
7. Common Words and Letter Sequences
8. Number and Length of Words
9. Generative Model Used for Classification
- Probabilistic generative model for each category
- Parameters set from:
- statistics in training data
- cross-validation on held-out data (20%)
- Standard Bayesian classification:
Predicted-Category(pnp) = argmax_c P(c | pnp) = argmax_c P(c) · P(pnp | c)
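The decision rule above can be sketched in a few lines of Python. The character model here is a deliberately toy unigram stand-in (the deck uses higher-order n-grams), and the training names, smoothing constant, and priors are all illustrative assumptions:

```python
import math

def train_char_model(names):
    """Toy unigram character model with add-one smoothing -- a minimal
    stand-in for the deck's character n-gram model."""
    counts, total = {}, 0
    for name in names:
        for ch in name.lower():
            counts[ch] = counts.get(ch, 0) + 1
            total += 1
    def log_prob(pnp):
        # log P(pnp | c) under this category's character model
        return sum(math.log((counts.get(ch, 0) + 1) / (total + 128))
                   for ch in pnp.lower())
    return log_prob

def classify(pnp, models, log_priors):
    """Predicted-Category(pnp) = argmax_c [log P(c) + log P(pnp | c)]."""
    return max(models, key=lambda c: log_priors[c] + models[c](pnp))

# Illustrative training names (not the paper's data)
models = {
    "drug":   train_char_model(["Prozac", "Xanax", "Zoloft", "Cotrimoxazole"]),
    "person": train_char_model(["Alec Baldwin", "John Henry", "Chris Rock"]),
}
log_priors = {"drug": math.log(0.5), "person": math.log(0.5)}
```

Even this crude sketch picks up the letter-composition signal the deck describes: drug-like letter sequences score higher under the drug model.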
10. Generative Model for Each Category
- Length n-gram model and word model:
P(pnp | c) = P_length-n-gram(word-lengths(pnp)) · Π_{w_i ∈ pnp} P(w_i | word-length(w_i))
- Word model: mixture of character n-gram model and common word model:
P(w_i | len) = [ λ_len · P_char-n-gram(w_i | len) + (1 − λ_len) · P_word(w_i | len) ]^(k/len)
- N-gram models: deleted interpolation:
P_0-gram(symbol | history) = uniform distribution
P_n-gram(s | h) = λ_C(h) · P_empirical(s | h) + (1 − λ_C(h)) · P_(n−1)-gram(s | h)
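The deleted-interpolation character n-gram can be sketched as follows; a simple count-based weight stands in for the tuned λ_C(h), '^' and '$' mark word boundaries, and all details are illustrative:

```python
import math
from collections import defaultdict

def train_char_ngram(words, n):
    """Character n-gram with deleted interpolation: the order-n estimate
    backs off through orders n-1, ..., 1 down to a uniform 0-gram, with
    weights that grow with the count of each history. Assumes n >= 2."""
    counts = [defaultdict(int) for _ in range(n + 1)]  # counts[k][(hist, sym)]
    hists  = [defaultdict(int) for _ in range(n + 1)]  # hists[k][hist]
    for w in words:
        padded = "^" * (n - 1) + w + "$"
        for i in range(n - 1, len(padded)):
            for k in range(1, n + 1):
                h = padded[i - (k - 1):i]
                counts[k][(h, padded[i])] += 1
                hists[k][h] += 1

    def prob(sym, hist):
        p = 1.0 / 28  # 0-gram: uniform over a-z plus the two boundary symbols
        for k in range(1, n + 1):
            h = hist[len(hist) - (k - 1):]  # last k-1 chars ("" when k == 1)
            c = hists[k][h]
            lam = c / (c + 1.0)  # count-based stand-in for the tuned lambda_C(h)
            p = lam * (counts[k][(h, sym)] / c if c else 0.0) + (1 - lam) * p
        return p

    def word_logprob(word):
        padded = "^" * (n - 1) + word + "$"
        return sum(math.log(prob(padded[i], padded[i - (n - 1):i]))
                   for i in range(n - 1, len(padded)))
    return word_logprob
```

The uniform 0-gram guarantees every string gets nonzero probability, so unseen character sequences are penalized rather than zeroed out.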
11. Walkthrough Example: "Alec Baldwin"
- Length sequence: 0, 0, 0, 4, 7, 0
- Words: "____Alec", "lec Baldwin" (each character conditioned on its preceding context, with boundary padding)
(Figure: cumulative log probability per category)
12. Walkthrough Example: "Baldwin"
- Note: "Baldwin" appears both in a person's name and in a place name
13. Experimental Setup
- Five categories of Proper Noun Phrases
- Drugs, companies, movies, places, people
- Train on 90% of data, test on 10%
- 20% of training data held out for parameter setting (cross-validation)
- 5000 examples per category total
- Each result presented is the average/stdev of 10 separate train/test folds
- Three types of tests:
- pairwise: 1 category vs. 1 category
- 1-all: 1 category vs. the union of all other categories
- n-way: every category for itself
14. Experimental Results: Classification Accuracy
15. Experimental Results: Confusion Matrix
(Table: confusion matrix with correct categories as rows and predicted categories as columns: drug, nyse, movie, place, person)
16. Sources of Incorrect Classification
- Words that appear in one category drive classification in other categories
- e.g. "Delaware" misclassified as a company because of "GTE Delaware LP", etc.
- Inherent ambiguity
- e.g. movies named after people/places/etc.
- "Nuremberg", "John Henry"
- "Love, Inc.", "Prozac Nation"
17. Examples of Misclassified PNPs
- Errors from misleading words
- Calcium Stanley
- Best Foods (24 movies with "Best", 2 companies)
- Bloodhounds, Inc.
- Nebraska (movie "One Standing Nebraska")
- Chris Rock (24 movies with "Rock", no other people)
- Can you classify these PNPs?
- R C
- Randall Hopkirk
- Steeple Aston
- Nandanar
- Gerdau
18. Contribution of Model Features
- Character n-gram is the best single feature
- Word model is good, but subsumed by the character n-gram
- Length n-gram helps the character n-gram, but not much
19. Effect of Increasing N-Gram Length
- Classification accuracy of n-gram models alone (character n-gram model vs. length n-gram model)
- Longer n-grams are useful, but only to a point
20. Effect of Increasing Training Data
- Classifier approaches full potential with little training data
- Increasing training data even more is unlikely to help much
21. Compensating for Word-Length Bias
- Problem: The character n-gram model places more emphasis on longer words because more terms get multiplied
- But are longer words really more important?
- Solution: Take the (k/length)-th root of each word's probability
- Treat each word like a single base with an ignored exponent
- Observation: Performance is best when k > 1
- Deviation from theoretical expectation
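One reading of this compensation is to raise each word's probability to the power k/len, i.e. scale its log-probability by k/len; the helper name and dummy model below are illustrative:

```python
def length_compensated_logprob(word_logprob, word, k=1.0):
    """Scale the word's log-probability by k/len (equivalently, raise its
    probability to the power k/len) so longer words don't dominate just
    because more character terms get multiplied. k is tuned on held-out
    data; the deck finds k > 1 works best."""
    return (k / max(len(word), 1)) * word_logprob(word)
```

Under this scaling, a model that assigns a fixed per-character log-probability scores every word identically regardless of length, which is exactly the bias being removed.
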
22. Compensating for Word-Length Bias
23. Generative Models Can Also Generate!
- Step 1: Stochastically generate a word-length sequence using the length n-gram model
- Step 2: Generate each word using the character n-gram model
- movie: Alien in Oz Dragons The Ever Harlane El Tombre
- place: Archfield Lee-Newcastleridge Qatad
- drug: Ambenylin Carbosil DM 49 Esidrine Plus Base with Moisturalent
- nyse: Downe Financial Grp PR Host Manage U.S.B. Householding Ltd. Intermedia Inc.
- person: Benedict W. Suthberg Elias Lindbert Atkinson Hugh Grob II
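Step 2 above can be sketched as a random walk over a trained character n-gram. The model dict and symbols here are illustrative assumptions ('^' pads the initial history, '$' ends a word):

```python
import random

def generate_word(model, n, rng, max_len=20):
    """Sample characters from the empirical distribution over each
    (n-1)-character history until the end symbol '$' is drawn.
    `model` maps a history string to a dict of next-character counts;
    assumes n >= 2."""
    hist, out = "^" * (n - 1), []
    while len(out) < max_len:
        dist = model[hist]
        syms, weights = zip(*dist.items())
        ch = rng.choices(syms, weights=weights)[0]
        if ch == "$":
            break
        out.append(ch)
        hist = (hist + ch)[-(n - 1):]  # slide the history window
    return "".join(out)
```

Running this with per-category models trained as in slide 10 produces made-up names like those above.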
24. Acquiring Proficiency in New Domains
- Challenge: quickly build a high-accuracy PNP classifier for two novel categories
- Example: Cheese or Disease?
- Game show on MTV's "Idiot Savants"
- Result: 93.5% accuracy within 10 minutes of suggesting the categories!
- Not possible with previous methods
25. Conclusions
- There are reliable regularities in the way names are constructed
- Can be used to complement contextual cues (e.g. as a Bayesian prior)
- Not surprising, given the conscious process of constructing names (e.g. Prozac)
- Statistical methods perform well without the need for domain-specific knowledge
- Allows for quick generalization to new domains
26. Bonus: Does Your Name Look Like A Name?
- Ron Kaplan
- Dan Klein
- Miler Lee
- Chris Manning / Christopher D. Manning
- Bob Moore / Robert C. Moore
- Emily Bender
- Ivan Sag
- Chung-chieh Shan
- Stu Shieber / Stuart M. Shieber
- Joseph Smarr
- Mark Stevenson
- Dominic Widdows