Title: CS626/449 : Natural Language Processing, Speech and the Web/Topics in AI Lecture 31: POS Tagging (discussion to assist the CMU pronunciation dictionary assignment)
1 CS626/449 Natural Language Processing, Speech and the Web / Topics in AI
Lecture 31: POS Tagging (discussion to assist the CMU pronunciation dictionary assignment)
- Pushpak Bhattacharyya, CSE Dept., IIT Bombay
2 Lexicon
- Example
- _ Some_ People_ Jump_ High_ ._
- Lexicon/Dictionary entries, with the lexical tag and an example for each:

  Lexicon/Dictionary   Lexical Tag      Example
  Some                 A (Adjective)    quantifier
  People               N (Noun)         a lot of people
                       V (Verb)         peopled the city with soldiers
  Jump                 V (Verb)         he jumped high
                       N (Noun)         this was a good jump
  High                 R (Adverb)       he jumped high
                       A (Adjective)    high mountain
                       N (Noun)         Bombay High; on a high
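The lexicon above can be read as a mapping from each word to the tags it may take. The following is a minimal sketch of that idea; the names `LEXICON`, `candidate_tags` and `num_tag_sequences` are illustrative and not part of the lecture. It counts how many tag sequences the lexicon allows for the example sentence, which is the search space a tagger has to resolve.

```python
from math import prod

# Lexicon from the slide: each word maps to the tags it can take.
LEXICON = {
    "some": ["A"],
    "people": ["N", "V"],
    "jump": ["V", "N"],
    "high": ["R", "A", "N"],
    ".": ["."],
}

def candidate_tags(sentence):
    """Return the list of possible tags for each token (case-insensitive lookup)."""
    return [LEXICON[token.lower()] for token in sentence]

def num_tag_sequences(sentence):
    """Number of distinct tag sequences the lexicon allows for the sentence."""
    return prod(len(tags) for tags in candidate_tags(sentence))

sentence = ["Some", "People", "Jump", "High", "."]
print(num_tag_sequences(sentence))  # 1 * 2 * 2 * 3 * 1 = 12
```

Even this five-token sentence has 12 candidate taggings, which is why a probabilistic criterion is needed to pick one.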
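3 Bigram Assumption
- Best tag sequence
- $T^* = \arg\max_T P(T \mid W)$
- $= \arg\max_T P(T)\,P(W \mid T)$ (by Bayes' Theorem)
- $P(T) = P(t_0 t_1 t_2 \ldots t_{n+1})$
- $= P(t_0)\,P(t_1 \mid t_0)\,P(t_2 \mid t_1 t_0)\,P(t_3 \mid t_2 t_1 t_0) \cdots P(t_n \mid t_{n-1} t_{n-2} \ldots t_0)\,P(t_{n+1} \mid t_n t_{n-1} \ldots t_0)$
- $\approx P(t_0)\,P(t_1 \mid t_0)\,P(t_2 \mid t_1) \cdots P(t_n \mid t_{n-1})\,P(t_{n+1} \mid t_n)$
- $= \prod_{i=0}^{n+1} P(t_i \mid t_{i-1})$, reading $P(t_0 \mid t_{-1})$ as $P(t_0)$ (Bigram Assumption)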
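Under the bigram assumption, the probability of a tag sequence is a product of tag-to-tag probabilities. The sketch below illustrates this with the bigram values from the "Probabilities" slide later in this lecture. Two things are assumptions of the sketch: the sentence-start state is written `^`, and $P(t_0)$ is taken to be 1 for that state. Exact fractions are used so the arithmetic is easy to check by hand.

```python
from fractions import Fraction as F

# Bigram probabilities P(next | previous), copied from the "Probabilities" slide.
# "^" is used here for the sentence-start state (an assumed label).
BIGRAM = {
    "^": {"N": F(1)},
    "N": {"N": F(1, 5), "V": F(2, 5), "A": F(1, 5), ".": F(1, 5)},
    "V": {"N": F(1, 2), "A": F(1, 2)},
    "A": {"N": F(1, 3), "R": F(1, 3), ".": F(1, 3)},
    "R": {"A": F(1)},
    ".": {"^": F(1)},
}

def tag_sequence_probability(tags, start="^"):
    """Product of P(t_i | t_{i-1}) over the sequence, starting from `start`."""
    p = F(1)
    prev = start
    for tag in tags:
        # Transitions missing from the table have probability 0.
        p *= BIGRAM.get(prev, {}).get(tag, F(0))
        prev = tag
    return p

print(tag_sequence_probability(["N", "V", "A", "."]))  # 1 * 2/5 * 1/2 * 1/3 = 1/15
```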
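4 Lexical Probability Assumption
- $P(W \mid T) = P(w_0 \mid t_0 \ldots t_{n+1})\,P(w_1 \mid w_0, t_0 \ldots t_{n+1})\,P(w_2 \mid w_1 w_0, t_0 \ldots t_{n+1}) \cdots P(w_n \mid w_0 \ldots w_{n-1}, t_0 \ldots t_{n+1})\,P(w_{n+1} \mid w_0 \ldots w_n, t_0 \ldots t_{n+1})$
- Assumption: a word is determined completely by its own tag, independently of the other words and tags. This is inspired by speech recognition.
- $\approx P(w_0 \mid t_0)\,P(w_1 \mid t_1) \cdots P(w_{n+1} \mid t_{n+1})$
- $= \prod_{i=0}^{n+1} P(w_i \mid t_i)$ (Lexical Probability Assumption)
- Thus, combining this with the bigram assumption,
- $T^* = \arg\max_T P(T)\,P(W \mid T) = \arg\max_T \prod_{i=0}^{n+1} P(t_i \mid t_{i-1})\,P(w_i \mid t_i)$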
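The argmax over tag sequences can be illustrated by brute force on a short sentence. This is a minimal sketch, not the lecture's procedure: all numbers in `TRANSITION` and `EMISSION` are hypothetical values invented for the example, `^` is an assumed label for the sentence-start state, and exhaustive enumeration is only feasible for toy inputs.

```python
from itertools import product

# Hypothetical numbers for illustration only; they are NOT from the lecture.
TRANSITION = {  # P(tag | previous tag); "^" is the assumed sentence-start state
    ("^", "N"): 0.6, ("^", "V"): 0.1,
    ("N", "V"): 0.5, ("N", "N"): 0.2, ("N", "R"): 0.1, ("N", "A"): 0.1,
    ("V", "R"): 0.3, ("V", "A"): 0.2, ("V", "N"): 0.3, ("V", "V"): 0.05,
    ("R", "."): 0.4, ("A", "."): 0.1, ("N", "."): 0.3,
}
EMISSION = {  # P(word | tag)
    ("people", "N"): 0.01, ("people", "V"): 0.001,
    ("jump", "V"): 0.02, ("jump", "N"): 0.005,
    ("high", "R"): 0.03, ("high", "A"): 0.02, ("high", "N"): 0.001,
    (".", "."): 1.0,
}
# Candidate tags per word, following the lexicon slide.
CANDIDATES = {
    "people": ["N", "V"],
    "jump": ["V", "N"],
    "high": ["R", "A", "N"],
    ".": ["."],
}

def score(words, tags):
    """P(T) * P(W | T) under the bigram and lexical probability assumptions."""
    p, prev = 1.0, "^"
    for word, tag in zip(words, tags):
        p *= TRANSITION.get((prev, tag), 0.0) * EMISSION.get((word, tag), 0.0)
        prev = tag
    return p

def best_tags(words):
    """Enumerate every candidate tag sequence and keep the highest-scoring one."""
    candidates = product(*(CANDIDATES[w] for w in words))
    return max(candidates, key=lambda tags: score(words, tags))

words = ["people", "jump", "high", "."]
print(best_tags(words))  # ('N', 'V', 'R', '.') with these made-up numbers
```

The number of candidate sequences grows exponentially with sentence length, so practical taggers replace the enumeration with dynamic programming.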
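5 Generative Model
[Figure: the example sentence tagged as People_N Jump_V High_R ._. , drawn over a set of candidate tag nodes (N, V, A, .). The two kinds of probability in the diagram are labelled "Lexical Probabilities" and "Bigram Probabilities".]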
This model is called a generative model: the tags are the states, and the words
are observed as outputs generated from those states. This is
similar to an HMM.
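The "generative" reading can be made concrete by sampling: walk from tag state to tag state, and let each state emit a word. The sketch below estimates both kinds of counts from the small tagged corpus used later in this lecture ("Ram got many NLP books. He found them all very interesting."). The start label `^`, the helper names, and the choice to stop at the first `.` are assumptions of the sketch.

```python
import random
from collections import Counter, defaultdict

words = "Ram got many NLP books . He found them all very interesting .".split()
tags = "N V A N N . N V N A R A .".split()

# Relative-frequency counts from the toy corpus.
transitions = defaultdict(Counter)  # transitions[prev_tag][next_tag] = count
emissions = defaultdict(Counter)    # emissions[tag][word] = count
prev = "^"  # assumed label for the sentence-start state
for word, tag in zip(words, tags):
    transitions[prev][tag] += 1
    emissions[tag][word] += 1
    prev = "^" if tag == "." else tag  # a new sentence starts after "."

def draw(counter, rng):
    """Sample a key with probability proportional to its count."""
    keys = list(counter)
    return rng.choices(keys, weights=[counter[k] for k in keys])[0]

def generate(rng, max_len=50):
    """Walk the tag states from '^', emitting one word per state, until '.'."""
    out, state = [], "^"
    for _ in range(max_len):
        state = draw(transitions[state], rng)
        out.append((draw(emissions[state], rng), state))
        if state == ".":
            break
    return out

print(generate(random.Random(0)))
```

Each run produces a list of (word, tag) pairs; the words are what an observer sees, while the tag path that produced them is the hidden part a tagger has to recover.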
6 Bigram probabilities
7 Lexical Probability
8 Calculation from actual data
- Corpus
- Ram got many NLP books. He found them all very interesting.
- POS tagged
- N V A N N . N V N A R A .
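9 Recording numbers
Bigram counts from the tagged corpus. Rows give the previous tag and columns the next tag; ^ marks the start of a sentence.

       ^    N    V    A    R    .
  ^    0    2    0    0    0    0
  N    0    1    2    1    0    1
  V    0    1    0    1    0    0
  A    0    1    0    0    1    1
  R    0    0    0    1    0    0
  .    1    0    0    0    0    0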
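The counts above can be reproduced mechanically from the tag sequence. This is a small sketch; the `^` marker inserted before each sentence is an assumed label for the sentence-start state, and the variable names are illustrative.

```python
from collections import Counter

TAGS = ["^", "N", "V", "A", "R", "."]

# Tag sequence of the corpus, with "^" inserted at the start of each sentence.
tagged = "^ N V A N N . ^ N V N A R A .".split()

# Count every adjacent (previous tag, next tag) pair.
counts = Counter(zip(tagged, tagged[1:]))

# Print one row per previous tag, in the same column order as the table.
for prev in TAGS:
    print(prev, [counts[(prev, nxt)] for nxt in TAGS])
```

Each printed row corresponds to one row of the table, so the two can be compared directly.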
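10 Probabilities
Each count divided by its row total, giving P(column tag | row tag); ^ marks the start of a sentence.

       ^    N    V    A    R    .
  ^    0    1    0    0    0    0
  N    0   1/5  2/5  1/5   0   1/5
  V    0   1/2   0   1/2   0    0
  A    0   1/3   0    0   1/3  1/3
  R    0    0    0    1    0    0
  .    1    0    0    0    0    0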
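The step from counts to probabilities is a row-wise normalization: each bigram count is divided by the number of times the previous tag occurs with a successor. A minimal sketch follows, using exact fractions so the values can be compared with the table; `^` is an assumed label for the sentence-start state and `bigram_prob` is an illustrative name.

```python
from collections import Counter
from fractions import Fraction

# Tag sequence of the corpus, with "^" inserted at the start of each sentence.
tagged = "^ N V A N N . ^ N V N A R A .".split()

pair_counts = Counter(zip(tagged, tagged[1:]))
# Count each tag only where it has a successor, so every row sums to 1.
prev_counts = Counter(tagged[:-1])

def bigram_prob(prev, nxt):
    """Relative-frequency estimate of P(nxt | prev)."""
    return Fraction(pair_counts[(prev, nxt)], prev_counts[prev])

print(bigram_prob("N", "V"))  # 2/5
print(bigram_prob("A", "R"))  # 1/3
```

With a corpus this small, most entries are zero; any unseen tag pair would rule out every tag sequence containing it.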
11 Compare with the Pronunciation Dictionary Assignment

  Phoneme   Example   Translation
  -------   -------   -----------
  AE        at        AE T
  AH        hut       HH AH T
  AO        ought     AO T
  AW        cow       K AW
  AY        hide      HH AY D
  B         be        B IY
In POS tagging, the labels are already given on the words, so the alignment
of words with labels is already known. In the assignment, the alignment is not
given: the most likely alignment has to be discovered first, followed by the
best possible mapping.
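To see why the alignment has to be discovered, it helps to enumerate the candidates for one dictionary entry. The sketch below makes a simplifying assumption that is not stated in the lecture: each phoneme aligns to a contiguous, non-empty run of letters, in order. The function name `alignments` is illustrative, and choosing the most likely candidate is deliberately left out.

```python
from itertools import combinations

def alignments(letters, phonemes):
    """Yield every in-order split of `letters` into len(phonemes) non-empty chunks."""
    n, k = len(letters), len(phonemes)
    # Choose k - 1 cut positions strictly inside the word.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        chunks = [letters[a:b] for a, b in zip(bounds, bounds[1:])]
        yield list(zip(chunks, phonemes))

# "ought" -> AO T: five letters, two phonemes.
for alignment in alignments("ought", ["AO", "T"]):
    print(alignment)
```

For "ought" this yields four candidates, from o/AO + ught/T to ough/AO + t/T, whereas "at" has only one. In POS tagging there is always exactly one candidate, because each word carries its own tag.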