CS626/449: Natural Language Processing, Speech and the Web / Topics in AI. Lecture 31: POS Tagging (discussion to assist the CMU pronunciation dictionary assignment)

1
CS626/449: Natural Language Processing, Speech and the Web / Topics in AI
Lecture 31: POS Tagging (discussion to assist the CMU pronunciation dictionary assignment)
  • Pushpak Bhattacharyya, CSE Dept., IIT Bombay

2
Lexicon
  • Example sentence (tags to be filled in): ^ Some_ People_ Jump_ High_ ._
  • Lexicon / dictionary entries, with lexical tags and examples:

      Word     Lexical tag      Example
      Some     A (Adjective)    quantifier
      People   N (Noun)         a lot of people
               V (Verb)         peopled the city with soldiers
      Jump     V (Verb)         he jumped high
               N (Noun)         this was a good jump
      High     R (Adverb)       he jumped high
               A (Adjective)    high mountain
               N (Noun)         Bombay High; on a high
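A minimal sketch of how such a lexicon could be held in code (the dictionary structure and lower-casing are assumptions for illustration; the words and tags are the ones on this slide):

```python
# Hypothetical lexicon for the words on this slide: each word maps to the
# set of tags it can take (A = Adjective, N = Noun, V = Verb, R = Adverb).
lexicon = {
    "some":   {"A"},
    "people": {"N", "V"},
    "jump":   {"V", "N"},
    "high":   {"R", "A", "N"},
    ".":      {"."},
}

# Candidate tags for each word of the example sentence "^ Some People Jump High ."
for word in ["some", "people", "jump", "high", "."]:
    print(word, "->", sorted(lexicon[word]))
```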

3
Bigram Assumption
  • Best tag sequence:
    T* = argmax_T P(T | W)
       = argmax_T P(T) P(W | T)   (by Bayes' theorem)
  • P(T) = P(t_0 t_1 t_2 ... t_{n+1})
         = P(t_0) P(t_1 | t_0) P(t_2 | t_1 t_0) P(t_3 | t_2 t_1 t_0) ...
           P(t_n | t_{n-1} ... t_0) P(t_{n+1} | t_n t_{n-1} ... t_0)
         ≈ P(t_0) P(t_1 | t_0) P(t_2 | t_1) ... P(t_n | t_{n-1}) P(t_{n+1} | t_n)
         = ∏_i P(t_i | t_{i-1})   (Bigram Assumption)
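As a concrete illustration of this product, a minimal Python sketch (the transition probabilities are the ones estimated from the toy corpus on slide 10; the table layout and function name are just for this sketch):

```python
# Sketch: P(T) under the bigram assumption,
#   P(T) = P(t_0) * prod_i P(t_i | t_{i-1}),
# with t_0 = "^" (sentence start), so only the transition factors remain.
bigram_prob = {
    ("^", "N"): 1.0,
    ("N", "N"): 1/5, ("N", "V"): 2/5, ("N", "A"): 1/5, ("N", "."): 1/5,
    ("V", "N"): 1/2, ("V", "A"): 1/2,
    ("A", "N"): 1/3, ("A", "R"): 1/3, ("A", "."): 1/3,
    ("R", "A"): 1.0,
    (".", "^"): 1.0,
}

def tag_sequence_probability(tags):
    p = 1.0
    for prev, cur in zip(tags, tags[1:]):
        p *= bigram_prob.get((prev, cur), 0.0)
    return p

# Tag sequence of "^ Ram got many NLP books ." from slide 8: ^ N V A N N .
print(tag_sequence_probability(["^", "N", "V", "A", "N", "N", "."]))
```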

4
Lexical Probability Assumption
  • P(WT) P(w0t0-tn1)P(w1w0t0-tn1)P(w2w1w0t0-t
    n1)
  • P(wnw0-wn-1t0-tn1)P(wn1w0-wnt0-tn1)
  • Assumption A word is determined completely by
    its tag. This is inspired by speech recognition
  • P(woto)P(w1t1) P(wn1tn1)
  • P(witi)
  • P(witi) (Lexical Probability
    Assumption)
  • Thus,
  • argmax P(T)P(WT) Equation
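A short sketch that combines the two assumptions to score one tagged sentence (both probability tables below are made-up placeholders, not values from the lecture):

```python
# Sketch: score of one tagged sentence under both assumptions,
#   P(T) * P(W | T) = prod_i P(t_i | t_{i-1}) * P(w_i | t_i).
bigram_prob = {("^", "N"): 1.0, ("N", "V"): 0.4, ("V", "R"): 0.3, ("R", "."): 0.6}
lexical_prob = {("people", "N"): 0.01, ("jump", "V"): 0.02,
                ("high", "R"): 0.05, (".", "."): 1.0}

def score(words, tags):
    """Score a tagged sentence; tags[i] labels words[i], '^' is the start tag."""
    p, prev = 1.0, "^"
    for w, t in zip(words, tags):
        p *= bigram_prob.get((prev, t), 0.0) * lexical_prob.get((w, t), 0.0)
        prev = t
    return p

print(score(["people", "jump", "high", "."], ["N", "V", "R", "."]))
```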

5
Generative Model
[Figure: trellis for the tagged example sentence ^ People_N Jump_V High_R ._. ; each word position carries its candidate tags (N, V, A, R, .), lexical probabilities link tags to the observed words, and bigram probabilities link successive tags.]
This model is called a generative model: the words are observed as emissions from the tags, which act as states. This is similar to an HMM.
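The standard way to recover the argmax tag sequence in such an HMM-like model is Viterbi-style dynamic programming. The slide itself does not name the algorithm, so the following is only an illustrative sketch, with placeholder probability tables:

```python
# Sketch: find argmax_T prod_i P(t_i | t_{i-1}) * P(w_i | t_i) by keeping,
# for each tag, the best-scoring tag sequence ending in that tag.
# All tables below are illustrative placeholders.
TAGS = ["N", "V", "A", "R", "."]
bigram_prob = {("^", "N"): 1.0, ("N", "V"): 0.4, ("N", "N"): 0.2,
               ("V", "R"): 0.3, ("V", "A"): 0.5, ("R", "."): 0.6, ("A", "."): 0.3}
lexical_prob = {("people", "N"): 0.01, ("jump", "V"): 0.02, ("jump", "N"): 0.005,
                ("high", "R"): 0.05, ("high", "A"): 0.04, (".", "."): 1.0}

def viterbi(words):
    # best[t] = (probability, tag path) of the best tag sequence ending in t
    best = {t: (bigram_prob.get(("^", t), 0.0) *
                lexical_prob.get((words[0], t), 0.0), [t]) for t in TAGS}
    for w in words[1:]:
        new_best = {}
        for t in TAGS:
            prob, path = max(
                ((p * bigram_prob.get((prev, t), 0.0) *
                  lexical_prob.get((w, t), 0.0), path)
                 for prev, (p, path) in best.items()),
                key=lambda x: x[0])
            new_best[t] = (prob, path + [t])
        best = new_best
    return max(best.values(), key=lambda x: x[0])

print(viterbi(["people", "jump", "high", "."]))
```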
6
Bigram probabilities

7
Lexical Probability

8
Calculation from actual data
  • Corpus
  • Ram got many NLP books. He found them all very
    interesting.
  • Pos Tagged
  • N V A N N . N V N A R A .
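A small sketch of how the transition counts on the next slide can be collected from this tagged corpus (the variable names are arbitrary; "^" marks the start of each sentence):

```python
from collections import Counter

# Tag sequence of the corpus above, with "^" starting each sentence:
# ^ Ram got many NLP books .   ^ He found them all very interesting .
tags = ["^", "N", "V", "A", "N", "N", ".",
        "^", "N", "V", "N", "A", "R", "A", "."]

# Count tag-bigram transitions; this reproduces the table on the next slide.
counts = Counter(zip(tags, tags[1:]))

print(counts[("^", "N")])   # 2: both sentences start with a noun
print(counts[("N", "V")])   # 2: Ram -> got, He -> found
print(counts[("A", "R")])   # 1: all -> very
```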

9
Recording numbers
N V A R .
0 2 0 0 0 0
N 0 1 2 1 0 1
V 0 1 0 1 0 0
A 0 1 0 0 1 1
R 0 0 0 1 0 0
. 1 0 0 0 0 0
10
Probabilities (estimated P(t_i | t_{i-1}); each count divided by its row total)

        ^    N    V    A    R    .
  ^     0    1    0    0    0    0
  N     0    1/5  2/5  1/5  0    1/5
  V     0    1/2  0    1/2  0    0
  A     0    1/3  0    0    1/3  1/3
  R     0    0    0    1    0    0
  .     1    0    0    0    0    0
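These probabilities are the transition counts divided by the number of outgoing transitions from each tag. A short sketch of that normalization (note that "." has only one outgoing transition in this corpus, which is why P(^ | .) comes out as 1):

```python
from collections import Counter

# Tag sequence of the toy corpus on slide 8 ("^" starts each sentence)
tags = ["^", "N", "V", "A", "N", "N", ".",
        "^", "N", "V", "N", "A", "R", "A", "."]

# Bigram counts and, per tag, the number of outgoing transitions
counts = Counter(zip(tags, tags[1:]))
totals = Counter(prev for prev, _ in zip(tags, tags[1:]))

# P(t_i | t_{i-1}) = count(t_{i-1}, t_i) / outgoing transitions from t_{i-1}
prob = {(prev, cur): c / totals[prev] for (prev, cur), c in counts.items()}

print(prob[("N", "V")])   # 0.4   (= 2/5)
print(prob[("A", "R")])   # 0.333 (= 1/3)
print(prob[(".", "^")])   # 1.0   (the final "." has no following tag)
```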
11
Compare with the Pronunciation Dictionary Assignment

  Phoneme   Example   Translation
  -------   -------   -----------
  AE        at        AE T
  AH        hut       HH AH T
  AO        ought     AO T
  AW        cow       K AW
  AY        hide      HH AY D
  B         be        B IY

In POS tagging, the labels are already given on the words, and the alignment of words with labels is already given. In the assignment, the most likely alignment has to be discovered first, followed by the best possible mapping.
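For reference, a tiny sketch of reading pronunciation-dictionary entries of the form "WORD  PH1 PH2 ..." into a word-to-phoneme mapping (the entries are the examples from this slide; a real run would read the CMU pronunciation dictionary file instead):

```python
# Sketch: parse "WORD  PH1 PH2 ..." lines into word -> phoneme sequence.
entries = """\
AT    AE T
HUT   HH AH T
OUGHT AO T
COW   K AW
HIDE  HH AY D
BE    B IY
"""

pronunciation = {}
for line in entries.splitlines():
    word, *phonemes = line.split()
    pronunciation[word] = phonemes

print(pronunciation["HIDE"])   # ['HH', 'AY', 'D']
```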