Title: Natural Language Processing
1 Natural Language Processing
2 New Topic: Parts of Speech
- What are they?
- Distribution
- Tagsets
3 Parts of Speech
- Eight basic categories
  - Noun, verb, pronoun, preposition, adjective, adverb, article, conjunction
- Based on morphological and distributional properties (not semantics)
- Some are easy, others are murky
4 Parts of Speech
- Two kinds of category
- Closed class
  - Prepositions, articles, conjunctions, pronouns, auxiliary verbs, particles, numerals
- Open class
  - Nouns, verbs, adjectives, adverbs
5 Distribution
- Most words have one part of speech
- Of the rest, most have two
- A small number of words have lots of parts of speech
  - At the bottom of the old well
  - Tears of joy would well in her eyes
  - Well!
  - Do you speak French well?
  - The baby is quite well now, thanks.
- Unfortunately, the words with lots of parts of speech occur with high frequency
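
To make the distribution claim concrete, here is a minimal sketch that counts how many words in a lexicon have one, two, or more possible parts of speech. The lexicon is a hypothetical, simplified toy (only "well" reflects the five uses in the example sentences above); a real lexicon would be far larger.

```python
# Toy lexicon: word -> set of possible parts of speech.
# Entries are illustrative and simplified, not taken from a real dictionary.
TOY_LEXICON = {
    "the": {"article"},
    "of": {"preposition"},
    "french": {"noun", "adjective"},
    # noun (old well), verb (tears well), interjection (Well!),
    # adverb (speak well), adjective (quite well)
    "well": {"noun", "verb", "interjection", "adverb", "adjective"},
}


def ambiguity_histogram(lexicon):
    """Map 'number of possible tags' -> 'number of words with that many tags'."""
    counts = {}
    for tags in lexicon.values():
        counts[len(tags)] = counts.get(len(tags), 0) + 1
    return counts


hist = ambiguity_histogram(TOY_LEXICON)
for n_tags in sorted(hist):
    print(f"{hist[n_tags]} word(s) with {n_tags} possible part(s) of speech")
```

Counting over word types like this understates the problem in running text, because the highly ambiguous words are also the frequent ones.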
6 Sets of Parts of Speech: Tagsets
- There are various standard tagsets, of different sizes
- The choice of tagset is based on the application
- Accurate tagging can be done even with large tagsets
7
Tag   Description             Example
CC    Coordin. conjunction    and, but, or
CD    Cardinal number         one, two, three
DT    Determiner              a, the
EX    Existential "there"     there
FW    Foreign word            mea culpa
IN    Preposition/sub-conj    of, in, by
JJ    Adjective               yellow
JJR   Adj., comparative       bigger
JJS   Adj., superlative       wildest
LS    List item marker        1, 2, One
MD    Modal                   can, should
NN    Noun, sing. or mass     llama
NNS   Noun, plural            llamas
NNP   Proper noun, singular   IBM
NNPS  Proper noun, plural     Carolinas
PDT   Predeterminer           all, both
POS   Possessive ending       's
PP    Personal pronoun        I, you, he
PP$   Possessive pronoun      your, one's
RB    Adverb                  quickly, never
RBR   Adverb, comparative     faster
RBS   Adverb, superlative     fastest
RP    Particle                up, off
SYM   Symbol                  +, %, &
TO    "to"                    to
UH    Interjection            ah, oops
VB    Verb, base form         eat
VBD   Verb, past tense        ate
VBG   Verb, gerund            eating
VBN   Verb, past participle   eaten
VBP   Verb, non-3sg pres      eat
VBZ   Verb, 3sg pres          eats
WDT   Wh-determiner           which, that
WP    Wh-pronoun              what, who
WP$   Possessive wh-          whose
WRB   Wh-adverb               how, where
$     Dollar sign             $
#     Pound sign              #
``    Left quote              ` or ``
''    Right quote             ' or ''
(     Left parenthesis        [, (, {, <
)     Right parenthesis       ], ), }, >
,     Comma                   ,
.     Sentence-final punc     . ! ?
:     Mid-sentence punc       : ; ... -- -
8 Tagging
- Part of speech tagging is the process of assigning a part of speech to each word in a sentence
- Assume we have
  - A tagset
  - A dictionary/lexicon that gives you the possible set of tags for each entry
  - A text to be tagged
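
The three ingredients above can be sketched as plain data structures. This is a minimal illustration, not a tagger: the tiny tagset and lexicon are hypothetical, and the function only looks up the candidate tags for each word, leaving the choice among them open. Treating an unknown word as able to take any tag is also an assumption made here for simplicity.

```python
# Hypothetical miniature tagset (a handful of tags from the table above).
TAGSET = {"DT", "NN", "MD", "VB", "VBP"}

# Hypothetical lexicon: each entry lists the tags that word may take.
LEXICON = {
    "the": {"DT"},
    "llama": {"NN"},
    "can": {"MD", "NN", "VB"},
    "eat": {"VB", "VBP"},
}


def candidate_tags(text, lexicon, tagset):
    """Return (word, possible tags) pairs for a whitespace-separated text."""
    result = []
    for word in text.lower().split():
        # Assumption: a word missing from the lexicon could carry any tag.
        tags = lexicon.get(word, tagset)
        result.append((word, sorted(tags)))
    return result


candidates = candidate_tags("The llama can eat", LEXICON, TAGSET)
for word, tags in candidates:
    print(word, tags)
```

Tagging proper is then the problem of picking one tag from each candidate list, which is where the methods on the following slides come in.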
9 Motivations
- Here's a simple one: pronunciation depends on the part of speech. Someone say:
  - Refuse
  - Project
  - Compact
- Word sense
- Parsing
10 Three Methods
- Rules
- Probabilities
- Sort of both
11 Rules
- Hand-crafted rules for ambiguous words that test the context to make appropriate choices
  - Early attempts fairly error-prone
  - Extremely labor-intensive
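
As a rough illustration of what a hand-crafted, context-testing rule looks like, here is a minimal sketch. The lexicon, the two rules, and the fallback are all hypothetical and chosen only for this example; they are not the rules of any particular historical tagger.

```python
# Hypothetical lexicon: word -> possible tags, using tags from the table above.
LEXICON = {
    "the": ["DT"],
    "to": ["TO"],
    "they": ["PP"],
    "refuse": ["NN", "VB"],
    "project": ["NN", "VB"],
}


def tag_with_rules(words, lexicon):
    """Tag words left to right, using the previous tag as context."""
    tagged = []
    prev_tag = None
    for word in words:
        # Assumption: unknown words default to NN.
        options = lexicon.get(word.lower(), ["NN"])
        if len(options) == 1:
            tag = options[0]          # unambiguous word
        elif prev_tag == "DT" and "NN" in options:
            tag = "NN"                # rule 1: after a determiner, prefer noun
        elif prev_tag == "TO" and "VB" in options:
            tag = "VB"                # rule 2: after "to", prefer base-form verb
        else:
            tag = options[0]          # no rule applies: take the first listed tag
        tagged.append((word, tag))
        prev_tag = tag
    return tagged


print(tag_with_rules(["the", "refuse"], LEXICON))
print(tag_with_rules(["to", "refuse"], LEXICON))
print(tag_with_rules(["they", "refuse"], LEXICON))
```

The third call shows why this approach is error-prone and labor-intensive: no rule covers a verb after a pronoun, so "refuse" falls back to NN, and fixing that means writing yet another rule by hand.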
12 Other approaches
- See the next lecture slides