CS626/449: Natural Language Processing, Speech and the Web / Topics in AI, Lecture 31: POS Tagging (discussion to assist the CMU pronunciation dictionary assignment)

1
CS626/449: Natural Language Processing, Speech and the Web / Topics in AI
Lecture 31: POS Tagging
(discussion to assist the CMU pronunciation dictionary assignment)

Pushpak Bhattacharyya
CSE Dept., IIT Bombay

2
Lexicon

Example
^_ Some_ People_ Jump_ High_ ._

Lexicon/Dictionary   Lexical Tag       Example
Some                 A (Adjective)     quantifier
People               N (Noun)          a lot of people
                     V (Verb)          peopled the city with soldiers
Jump                 V (Verb)          he jumped high
                     N (Noun)          this was a good jump
High                 R (Adverb)        he jumped high
                     A (Adjective)     high mountain
                     N (Noun)          Bombay High; on a high
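
The lexicon can be read as a mapping from each word to the tags it may take. The sketch below stores it as a Python dictionary and enumerates every tag sequence the lexicon allows for the example sentence. The names `LEXICON` and `candidate_tag_sequences` are hypothetical, chosen only for this illustration.

```python
from itertools import product

# Lexicon from the slide: each word maps to the tags it can take.
LEXICON = {
    "some": ["A"],
    "people": ["N", "V"],
    "jump": ["V", "N"],
    "high": ["R", "A", "N"],
}

def candidate_tag_sequences(words):
    """Return every tag sequence the lexicon allows for the given words."""
    options = [LEXICON[w.lower()] for w in words]
    return [list(tags) for tags in product(*options)]

sequences = candidate_tag_sequences(["Some", "People", "Jump", "High"])
print(len(sequences))  # 1 * 2 * 2 * 3 = 12 candidate sequences
```

Even this four-word sentence has 12 candidate taggings, which is why a probabilistic criterion is needed to choose among them.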

3
Bigram Assumption

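Best tag sequence
\[
\begin{aligned}
T^{*} &= \arg\max_{T} P(T \mid W) \\
      &= \arg\max_{T} P(T)\, P(W \mid T) \quad \text{(by Bayes' theorem)}
\end{aligned}
\]
With $t_0$ = ^ (sentence start) and $t_{n+1}$ = . (sentence end):
\[
\begin{aligned}
P(T) &= P(t_0\, t_1\, t_2 \ldots t_{n+1}) \\
     &= P(t_0)\, P(t_1 \mid t_0)\, P(t_2 \mid t_1 t_0)\, P(t_3 \mid t_2 t_1 t_0) \cdots
        P(t_n \mid t_{n-1} t_{n-2} \ldots t_0)\, P(t_{n+1} \mid t_n t_{n-1} \ldots t_0) \\
     &= P(t_0)\, P(t_1 \mid t_0)\, P(t_2 \mid t_1) \cdots P(t_n \mid t_{n-1})\, P(t_{n+1} \mid t_n) \\
     &= P(t_0) \prod_{i=1}^{n+1} P(t_i \mid t_{i-1}) \quad \text{(Bigram Assumption)}
\end{aligned}
\]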

4
Lexical Probability Assumption

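\[
\begin{aligned}
P(W \mid T) &= P(w_0 \mid t_0 \ldots t_{n+1})\, P(w_1 \mid w_0,\, t_0 \ldots t_{n+1})\, P(w_2 \mid w_1 w_0,\, t_0 \ldots t_{n+1}) \cdots \\
            &\quad\; P(w_n \mid w_0 \ldots w_{n-1},\, t_0 \ldots t_{n+1})\, P(w_{n+1} \mid w_0 \ldots w_n,\, t_0 \ldots t_{n+1})
\end{aligned}
\]
Assumption: a word is determined completely by its tag, i.e. each $w_i$ depends only on $t_i$. This is inspired by speech recognition.
\[
\begin{aligned}
P(W \mid T) &= P(w_0 \mid t_0)\, P(w_1 \mid t_1) \cdots P(w_{n+1} \mid t_{n+1}) \\
            &= \prod_{i=0}^{n+1} P(w_i \mid t_i) \quad \text{(Lexical Probability Assumption)}
\end{aligned}
\]
Thus,
\[
T^{*} = \arg\max_{T} P(T)\, P(W \mid T)
      = \arg\max_{T}\; P(t_0) \prod_{i=1}^{n+1} P(t_i \mid t_{i-1}) \prod_{i=0}^{n+1} P(w_i \mid t_i)
\]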
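
To make the argmax concrete, here is a minimal brute-force sketch that scores every candidate tag sequence by P(T)P(W|T) under the two assumptions and keeps the best one. All probability values in `BIGRAM` and `LEXICAL` are hypothetical numbers invented for this illustration, not taken from the lecture, and the sketch assumes P(t_0 = ^) = 1 and P(^ | ^) = 1. A real tagger would use dynamic programming rather than enumeration.

```python
from itertools import product

# HYPOTHETICAL probabilities, invented only to make the sketch runnable.
BIGRAM = {  # P(tag | previous tag)
    ("^", "N"): 0.7, ("^", "V"): 0.3,
    ("N", "V"): 0.5, ("N", "N"): 0.2,
    ("V", "N"): 0.4, ("V", "V"): 0.1,
    ("N", "."): 0.3, ("V", "."): 0.5,
}
LEXICAL = {  # P(word | tag)
    ("people", "N"): 0.01, ("people", "V"): 0.001,
    ("jump", "V"): 0.02, ("jump", "N"): 0.005,
    (".", "."): 1.0,
}
# Candidate tags per word, following the lexicon slide.
LEXICON = {"^": ["^"], "people": ["N", "V"], "jump": ["V", "N"], ".": ["."]}

def score(words, tags):
    """P(T) * P(W|T) under the bigram and lexical probability assumptions."""
    p = 1.0  # assumes P(t_0 = ^) = 1 and P(w_0 = ^ | t_0 = ^) = 1
    for i in range(1, len(words)):
        p *= BIGRAM.get((tags[i - 1], tags[i]), 0.0)  # P(t_i | t_{i-1})
        p *= LEXICAL.get((words[i], tags[i]), 0.0)    # P(w_i | t_i)
    return p

def best_tag_sequence(words):
    """Enumerate all candidate tag sequences and return the highest scoring."""
    candidates = product(*(LEXICON[w] for w in words))
    return max(candidates, key=lambda tags: score(words, tags))

words = ["^", "people", "jump", "."]
best = best_tag_sequence(words)
print(best)
```

With these made-up numbers the noun-verb reading of "people jump" scores highest; different numbers could of course favour a different sequence.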

5
Generative Model
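[Figure: tag lattice for the sentence "^ People Jump High ." with the tagging ^ People_N Jump_V High_R ._. ; tag nodes N, V, A and "."; arc labels "Bigram Probabilities" and "Lexical Probabilities".]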
This model is called a generative model: the tags are the states, and the
words are observed (emitted) from them. This is similar to an HMM.
6
Bigram probabilities

7
Lexical Probability

8
Calculation from actual data

Corpus
^ Ram got many NLP books. ^ He found them all very interesting.
POS tagged
^ N V A N N . ^ N V N A R A .
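
Lexical probabilities can be estimated from the same tagged corpus. The sketch below assumes the usual relative-frequency estimate, P(word | tag) = count(word, tag) / count(tag). The slides do not show this calculation, so the estimator and the name `lexical_probability` are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction

# The corpus and its tags from the slide, with punctuation as separate tokens.
words = "Ram got many NLP books . He found them all very interesting .".split()
tags = "N V A N N . N V N A R A .".split()

pair_counts = Counter(zip(words, tags))  # count(word, tag)
tag_counts = Counter(tags)               # count(tag)

def lexical_probability(word, tag):
    """Relative-frequency estimate of P(word | tag)."""
    return Fraction(pair_counts[(word, tag)], tag_counts[tag])

print(lexical_probability("books", "N"))  # 1/5: one of the five nouns
print(lexical_probability("very", "R"))   # 1: the only adverb
```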

9
Recording numbers
Rows: previous tag; columns: next tag.

      ^    N    V    A    R    .
^     0    2    0    0    0    0
N     0    1    2    1    0    1
V     0    1    0    1    0    0
A     0    1    0    0    1    1
R     0    0    0    1    0    0
.     1    0    0    0    0    0
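
The counts in this table can be reproduced by counting adjacent tag pairs in the tagged corpus. This is a minimal sketch; it assumes that "^" marks the start of each sentence, and the variable names are chosen only for this illustration.

```python
from collections import Counter

# Tag sequence of the corpus, with "^" marking each sentence start.
tags = "^ N V A N N . ^ N V N A R A .".split()

# Count each (previous tag, next tag) pair of adjacent tags.
bigram_counts = Counter(zip(tags, tags[1:]))

order = ["^", "N", "V", "A", "R", "."]
for prev in order:
    # One table row: how often each tag follows `prev`.
    print(prev, [bigram_counts[(prev, nxt)] for nxt in order])
```
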
10
Probabilities
Rows: previous tag; columns: next tag. Each entry is P(next tag | previous tag).

      ^    N    V    A    R    .
^     0    1    0    0    0    0
N     0    1/5  2/5  1/5  0    1/5
V     0    1/2  0    1/2  0    0
A     0    1/3  0    0    1/3  1/3
R     0    0    0    1    0    0
.     1    0    0    0    0    0
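
Each probability is a bigram count divided by its row total. The sketch below reproduces that normalisation and then applies the bigram assumption to score one tag sequence. It assumes "^" marks each sentence start; the function names are hypothetical, and the sequence ^ N V A . is just an example chosen for illustration.

```python
from collections import Counter
from fractions import Fraction

# Tag sequence of the corpus, with "^" marking each sentence start.
tags = "^ N V A N N . ^ N V N A R A .".split()
pairs = list(zip(tags, tags[1:]))

bigram_counts = Counter(pairs)
# Row totals: how many times each tag is followed by another tag.
row_totals = Counter(prev for prev, _ in pairs)

def bigram_probability(prev, nxt):
    """Estimate P(nxt | prev) as count(prev, nxt) / row total of prev."""
    return Fraction(bigram_counts[(prev, nxt)], row_totals[prev])

def sequence_probability(tag_sequence):
    """P(T) under the bigram assumption, taking P(t_0 = ^) = 1."""
    p = Fraction(1)
    for prev, nxt in zip(tag_sequence, tag_sequence[1:]):
        p *= bigram_probability(prev, nxt)
    return p

print(bigram_probability("N", "V"))                     # 2/5
print(sequence_probability(["^", "N", "V", "A", "."]))  # 1 * 2/5 * 1/2 * 1/3 = 1/15
```
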
11
Compare with the Pronunciation Dictionary Assignment

Phoneme  Example  Translation
-------  -------  -----------
AE       at       AE T
AH       hut      HH AH T
AO       ought    AO T
AW       cow      K AW
AY       hide     HH AY D
B        be       B IY

In POS tagging, the labels are already given on the words, so the
alignment of words with labels is already known. In the assignment, the
most likely alignment has to be discovered first, followed by the best
possible mapping.
