Part-of-Speech Tagging - PowerPoint PPT Presentation

Provided by: klplReP

Transcript and Presenter's Notes

1
Part-of-Speech Tagging


2
The beginning

The task of labeling (or tagging) each word in a
sentence with its appropriate part of speech.
The  representative  put  chairs  on  the  table
AT   NN              VBD  NNS     IN  AT   NN
Using Brown/Penn tag sets
A problem of limited scope
Instead of constructing a complete parse,
fix the syntactic categories of the words in a
sentence
Tagging is a limited but useful application.
Information extraction
Question answering
Shallow parsing

3
The Information Sources in Tagging

Syntagmatic: look at the tags assigned to nearby
words. Some combinations are highly likely, while
others are highly unlikely or impossible.
ex) a new play
AT JJ NN (likely)
AT JJ VBP (unlikely)
Lexical: look at the word itself (90% accuracy
just by picking the most likely tag for each
word).
Verb is more likely to be a noun than a verb
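
The lexical baseline mentioned above (pick the most likely tag for each word) can be sketched as follows. This is a minimal illustration, not the presenter's code: the toy corpus, the function names, and the choice of NN as the default tag for unseen words are all assumptions.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_sentences):
    """Record, for every word, the tag it occurs with most often."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_baseline(words, most_likely_tag, default_tag="NN"):
    """Tag each word independently of its context.

    default_tag for unseen words is an assumption of this sketch.
    """
    return [most_likely_tag.get(w.lower(), default_tag) for w in words]

# Hypothetical toy training data using Brown-style tags.
toy_corpus = [
    [("the", "AT"), ("play", "NN"), ("ended", "VBD")],
    [("a", "AT"), ("new", "JJ"), ("play", "NN")],
    [("they", "PPSS"), ("play", "VBP"), ("chess", "NN")],
]
model = train_baseline(toy_corpus)
print(tag_baseline(["a", "new", "play"], model))
```

Because this tagger never looks at neighboring tags, it uses only the lexical information source; the syntagmatic source is what the Markov model in the following slides adds.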

4
Notation

$w_i$: the word at position $i$ in the corpus
$t_i$: the tag of $w_i$
$w_{i,i+m}$: the words occurring at positions $i$
through $i+m$
$t_{i,i+m}$: the tags $t_i \cdots t_{i+m}$ for $w_i \cdots w_{i+m}$
$w^l$: the $l$th word in the lexicon
$t^j$: the $j$th tag in the tag set
$C(w^l)$: the number of occurrences of $w^l$ in the
training set
$C(t^j)$: the number of occurrences of $t^j$ in the
training set
$C(t^j, t^k)$: the number of occurrences of $t^j$ followed
by $t^k$
$C(w^l, t^j)$: the number of occurrences of $w^l$ that are
tagged as $t^j$
$T$: number of tags in the tag set
$W$: number of words in the lexicon
$n$: sentence length

5
The Probabilistic Model (I)

The sequence of tags in a text is modeled as a Markov chain.
A word's tag depends only on the previous tag
(Limited horizon)
The dependency does not change over time (Time
invariance)
Limited Horizon Property in compact notation:
$P(t_{i+1} \mid t_{1,i}) = P(t_{i+1} \mid t_i)$

6
The Probabilistic Model (II)

Maximum likelihood estimate of tag $t^k$ following tag $t^j$:
$P(t^k \mid t^j) = C(t^j, t^k) / C(t^j)$

7
The Probabilistic Model (III)

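(We define $P(t_1 \mid t_0) = 1.0$ to simplify
our notation)
The final equation:
$\hat{t}_{1,n} = \arg\max_{t_{1,n}} P(t_{1,n} \mid w_{1,n}) = \arg\max_{t_{1,n}} \prod_{i=1}^{n} P(w_i \mid t_i) \, P(t_i \mid t_{i-1})$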

8
The Probabilistic Model (III)

Algorithm for training a Visible Markov Model
Tagger
Syntagmatic Probabilities
for all tags $t^j$ do
    for all tags $t^k$ do
        $P(t^k \mid t^j) := C(t^j, t^k) / C(t^j)$
    end
end
Lexical Probabilities
for all tags $t^j$ do
    for all words $w^l$ do
        $P(w^l \mid t^j) := C(w^l, t^j) / C(t^j)$
    end
end
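
The training procedure above amounts to counting and dividing. The following sketch shows one way to do that, under assumptions that are not in the slides: the function name `train_vmm`, the toy corpus, and the use of a PERIOD tag as the context before the first word. It also divides transition counts by the number of times a tag occurs as a predecessor, rather than by its raw count, so that the transition probabilities out of each tag sum to one on a small corpus.

```python
from collections import Counter

def train_vmm(tagged_sentences, boundary="PERIOD"):
    """Maximum likelihood estimates for a bigram (visible Markov model) tagger."""
    tag_count = Counter()    # C(t^j): occurrences of each tag
    prev_count = Counter()   # occurrences of t^j as the predecessor of some tag
    trans_count = Counter()  # C(t^j, t^k): t^j followed by t^k
    emit_count = Counter()   # C(w^l, t^j): w^l tagged as t^j
    for sentence in tagged_sentences:
        prev = boundary      # assumed context before the first word
        for word, tag in sentence:
            trans_count[(prev, tag)] += 1
            prev_count[prev] += 1
            emit_count[(word, tag)] += 1
            tag_count[tag] += 1
            prev = tag
    # Syntagmatic probabilities P(t^k | t^j), keyed by (t^j, t^k).
    trans = {(tj, tk): c / prev_count[tj] for (tj, tk), c in trans_count.items()}
    # Lexical probabilities P(w^l | t^j), keyed by (w^l, t^j).
    emit = {(w, tj): c / tag_count[tj] for (w, tj), c in emit_count.items()}
    return trans, emit

# Hypothetical toy corpus; each sentence ends with a PERIOD-tagged token.
toy_corpus = [
    [("the", "AT"), ("play", "NN"), ("ended", "VBD"), (".", "PERIOD")],
    [("a", "AT"), ("new", "JJ"), ("play", "NN"), (".", "PERIOD")],
]
trans, emit = train_vmm(toy_corpus)
print(trans[("AT", "NN")], emit[("play", "NN")])
```

Unseen tag pairs and word-tag pairs are simply absent from the dictionaries, which corresponds to a probability of zero; smoothing is a separate step.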

9
The Probabilistic Model (IV)

<Idealized counts of some tag transitions in the Brown Corpus>

10
The Probabilistic Model (V)

<Idealized counts for the tags that some words occur with in the Brown Corpus>

11
The Viterbi algorithm

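comment: Given a sentence of length $n$
comment: Initialization
$\delta_1(\text{PERIOD}) = 1.0$
$\delta_1(t) = 0.0$ for $t \neq \text{PERIOD}$
comment: Induction
for $i := 1$ to $n$ step 1 do
    for all tags $t^j$ do
        $\delta_{i+1}(t^j) := \max_{1 \le k \le T} [\delta_i(t^k) \, P(w_{i+1} \mid t^j) \, P(t^j \mid t^k)]$
        $\psi_{i+1}(t^j) := \arg\max_{1 \le k \le T} [\delta_i(t^k) \, P(w_{i+1} \mid t^j) \, P(t^j \mid t^k)]$
    end
end
comment: Termination and path-readout
$X_{n+1} = \arg\max_{1 \le j \le T} \delta_{n+1}(t^j)$
for $j := n$ to 1 step $-1$ do
    $X_j = \psi_{j+1}(X_{j+1})$
end
$P(X_1, \ldots, X_n) = \max_{1 \le j \le T} \delta_{n+1}(t^j)$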
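
A compact Python rendering of the pseudocode above may make the indexing easier to follow. It is a sketch under stated assumptions: the probability tables are plain dictionaries keyed by (previous tag, tag) and (word, tag), missing entries count as zero, and the toy numbers are invented for illustration. Real implementations usually work with log probabilities to avoid underflow, which this sketch does not do.

```python
def viterbi(words, tags, trans, emit, boundary="PERIOD"):
    """Return the most probable tag sequence and its probability.

    trans[(tk, tj)] is P(tj | tk); emit[(w, tj)] is P(w | tj).
    tags must contain the boundary tag.
    """
    if not words:
        return [], 1.0
    # delta[i][t]: probability of the best path that ends in tag t after i words.
    delta = [{t: (1.0 if t == boundary else 0.0) for t in tags}]
    psi = [{}]  # psi[i][t]: best predecessor of tag t at word i
    for word in words:
        prev = delta[-1]
        d, back = {}, {}
        for tj in tags:
            # The emission term does not depend on tk, so it can be left
            # out of the argmax and multiplied in afterwards.
            best_tk = max(tags, key=lambda tk: prev[tk] * trans.get((tk, tj), 0.0))
            d[tj] = prev[best_tk] * trans.get((best_tk, tj), 0.0) * emit.get((word, tj), 0.0)
            back[tj] = best_tk
        delta.append(d)
        psi.append(back)
    # Termination and path readout, following the back pointers.
    last = max(tags, key=lambda t: delta[-1][t])
    path = [last]
    for i in range(len(words), 1, -1):
        path.append(psi[i][path[-1]])
    path.reverse()
    return path, delta[-1][last]

# Invented toy parameters for "a new play".
tags = ["PERIOD", "AT", "JJ", "NN", "VBP"]
trans = {("PERIOD", "AT"): 1.0, ("AT", "JJ"): 0.5, ("AT", "NN"): 0.5,
         ("JJ", "NN"): 0.9, ("JJ", "VBP"): 0.1}
emit = {("a", "AT"): 1.0, ("new", "JJ"): 1.0,
        ("play", "NN"): 0.1, ("play", "VBP"): 0.2}
path, prob = viterbi(["a", "new", "play"], tags, trans, emit)
print(path, prob)
```

On these toy numbers the readout is AT JJ NN: although "play" is emitted more readily by VBP, the transition from JJ to NN is much more likely than from JJ to VBP, so the syntagmatic evidence wins.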

12
Variations (I)

Unknown words
Unknown words are a major problem for taggers
The simplest model for unknown words:
assume that they can be of any part of speech
Use morphological information
Past tense forms: words ending in -ed
Capitalized words
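
The two morphological cues listed above can be turned into a crude guesser for unknown words. The sketch below is only an illustration: the function name, the specific tag choices (Brown-style VBD, NP, NN), and the rule order are assumptions, and a real tagger would estimate a probability distribution over tags from such features rather than return a single tag.

```python
def guess_unknown_tag(word, sentence_initial=False):
    """Guess a tag for a word that is missing from the lexicon."""
    if word.endswith("ed"):
        return "VBD"   # past tense form (assumed tag choice)
    if word[:1].isupper() and not sentence_initial:
        return "NP"    # capitalized inside a sentence: likely a proper noun
    return "NN"        # fallback: common noun (assumption of this sketch)

print(guess_unknown_tag("blorped"), guess_unknown_tag("Zorland"))
```

Capitalization is ignored for sentence-initial words because every sentence starts with a capital letter, so the cue carries no information there.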

13
Variations (II)

Trigram taggers
The basic Markov Model tagger is a bigram tagger
A trigram tagger has a two-tag memory
and can disambiguate more cases
Interpolation and variable memory
A trigram tagger may make worse predictions than a
bigram tagger
Linear interpolation
Variable Memory Markov Model
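
Linear interpolation combines unigram, bigram, and trigram estimates so that a sparse trigram count does not dominate the prediction: $P(t_i \mid t_{i-2}, t_{i-1}) = \lambda_1 P_1(t_i) + \lambda_2 P_2(t_i \mid t_{i-1}) + \lambda_3 P_3(t_i \mid t_{i-2}, t_{i-1})$ with $\lambda_1 + \lambda_2 + \lambda_3 = 1$. The sketch below assumes dictionary-based probability tables and uses made-up weights; in practice the weights are estimated from held-out data.

```python
def interpolated_prob(t, t_prev, t_prev2, unigram, bigram, trigram,
                      lambdas=(0.1, 0.3, 0.6)):
    """Linearly interpolated tag probability.

    unigram[t], bigram[(t_prev, t)], trigram[(t_prev2, t_prev, t)] hold the
    component estimates; missing entries count as 0. The default lambdas
    are hypothetical values chosen for illustration.
    """
    l1, l2, l3 = lambdas
    return (l1 * unigram.get(t, 0.0)
            + l2 * bigram.get((t_prev, t), 0.0)
            + l3 * trigram.get((t_prev2, t_prev, t), 0.0))

# The trigram (AT, JJ, NN) was never seen, yet the estimate stays non-zero.
unigram = {"NN": 0.2}
bigram = {("JJ", "NN"): 0.5}
trigram = {}
p = interpolated_prob("NN", "JJ", "AT", unigram, bigram, trigram)
print(p)
```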

14
Variations (III)

Smoothing
$K_l$ is the number of possible parts of speech of $w^l$
Reversibility
A Markov model that decodes from left to right
is equivalent to one that decodes from right to left

15
Variations (IV)

Maximum Likelihood Sequence vs. tag by tag
Viterbi algorithm: maximize $P(t_{1,n} \mid w_{1,n})$
Alternative: maximize $P(t_i \mid w_{1,n})$
for all $i$, which amounts to summing over different
tag sequences
ex) Time flies like an arrow.
a. NN VBZ RB AT NN. P(.) = 0.01
b. NN NNS VB AT NN. P(.) = 0.01
c. NN NNS RB AT NN. P(.) = 0.001
d. NN VBZ VB AT NN. P(.) = 0
With tag-by-tag maximization, one error does not affect the tagging of other
words
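
The difference between the two criteria can be checked directly on the four sequences of the example. The sketch below sums the sequence probabilities per position and picks the best tag at each position independently; the function name is an assumption, and the probabilities are used unnormalized, which does not change any argmax.

```python
from collections import defaultdict

# Sequence probabilities from the "Time flies like an arrow" example.
sequences = {
    ("NN", "VBZ", "RB", "AT", "NN"): 0.01,   # a
    ("NN", "NNS", "VB", "AT", "NN"): 0.01,   # b
    ("NN", "NNS", "RB", "AT", "NN"): 0.001,  # c
    ("NN", "VBZ", "VB", "AT", "NN"): 0.0,    # d
}

def tag_by_tag(sequences):
    """Pick, for each position, the tag with the largest summed probability."""
    n = len(next(iter(sequences)))
    result = []
    for i in range(n):
        marginal = defaultdict(float)
        for seq, p in sequences.items():
            marginal[seq[i]] += p  # sum over all sequences with this tag at i
        result.append(max(marginal, key=marginal.get))
    return tuple(result)

per_position = tag_by_tag(sequences)
print(per_position, sequences[per_position])
```

The position-wise choice is NN NNS RB AT NN, which is sequence c: NNS (0.011) beats VBZ (0.01) and RB (0.011) beats VB (0.01). Each tag is individually the most likely, but the resulting sequence has probability 0.001, ten times lower than a or b, which is why the Viterbi criterion is usually preferred when a coherent sequence matters.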

16
Applying HMMs to POS tagging(I)

If we have no training data, we can use an HMM to
learn the regularities of tag sequences.
An HMM consists of the following elements:
a set of states (tags)
an output alphabet (words or classes of words)
initial state probabilities
state transition probabilities
symbol emission probabilities

17
Applying HMMs to POS tagging(II)

Jelinek's method
$b_{j.l}$: probability that word (or word class) $l$ is
emitted by tag $j$

18
Applying HMMs to POS tagging(III)

Kupiec's method

$|L|$ is the number of indices in $L$

19
Transformation-Based Learning of Tags

Are the Markov assumptions too crude?
Transformation-based tagging
Exploits a wider range of lexical and syntactic regularities
Requires an order of magnitude fewer decisions
Two key components
A specification of which error-correcting
transformations are admissible
The learning algorithm

20
Transformation(I)

A triggering environment
A rewrite rule
Form $t^1 \rightarrow t^2$: replace $t^1$ by $t^2$

21
Transformation(II)

Triggering environments can be conditioned on a
combination of words and tags
Morphology-triggered transformations
ex) Replace NN by NNS if the unknown word's
suffix is -s
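
A transformation can be represented as a rewrite rule plus a predicate for its triggering environment. The sketch below is illustrative only: the helper names, the toy lexicon, and the rule "NN to VB after TO" are assumptions. Triggers are evaluated against the tags as they were before the rule was applied, which is one of several possible application orders.

```python
def apply_transformation(words, tags, from_tag, to_tag, trigger):
    """Replace from_tag by to_tag wherever the triggering environment holds."""
    return [
        to_tag if tag == from_tag and trigger(i, words, tags) else tag
        for i, tag in enumerate(tags)
    ]

def prev_tag_is(prev):
    """Tag-triggered environment: the preceding word carries tag prev."""
    return lambda i, words, tags: i > 0 and tags[i - 1] == prev

def unknown_with_suffix(suffix, lexicon):
    """Morphology-triggered environment: unknown word ending in suffix."""
    return lambda i, words, tags: (words[i] not in lexicon
                                   and words[i].endswith(suffix))

# Tag-triggered rule (hypothetical example): NN -> VB if the previous tag is TO.
retagged = apply_transformation(["to", "play"], ["TO", "NN"],
                                "NN", "VB", prev_tag_is("TO"))

# Morphology-triggered rule from the slide: NN -> NNS for unknown words in -s.
lexicon = {"the"}  # hypothetical known-word list
plural = apply_transformation(["the", "blorps"], ["AT", "NN"],
                              "NN", "NNS", unknown_with_suffix("s", lexicon))
print(retagged, plural)
```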

22
The learning algorithm

$C_0$ := corpus with each word tagged with its most
frequent tag
for $k := 0$ step 1 do
    $\nu$ := the transformation $u_i$ that minimizes
    $E(u_i(C_k))$
    if $(E(C_k) - E(\nu(C_k))) < \epsilon$ then break fi
    $C_{k+1} := \nu(C_k)$
    $\tau_{k+1} := \nu$
end
Output sequence: $\tau_1, \ldots, \tau_k$
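
The greedy loop above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: rules are restricted to the form "change a tag if the previous tag is X", the error function E counts mismatches against gold tags, the corpus is a single toy tag sequence, and the candidate list must be non-empty.

```python
def apply_rule(tags, rule):
    """rule = (from_tag, to_tag, prev_tag): rewrite from_tag after prev_tag."""
    from_tag, to_tag, prev_tag = rule
    return [
        to_tag if t == from_tag and i > 0 and tags[i - 1] == prev_tag else t
        for i, t in enumerate(tags)
    ]

def errors(tags, gold):
    """E(C): number of positions where the current tag differs from gold."""
    return sum(t != g for t, g in zip(tags, gold))

def learn_transformations(initial_tags, gold_tags, candidates, epsilon=1):
    """Greedily pick the rule that reduces errors most until the gain < epsilon."""
    current = list(initial_tags)
    learned = []
    while True:
        best = min(candidates,
                   key=lambda u: errors(apply_rule(current, u), gold_tags))
        improved = apply_rule(current, best)
        if errors(current, gold_tags) - errors(improved, gold_tags) < epsilon:
            break
        current = improved
        learned.append(best)
    return learned, current

# Toy data: initial tags come from a most-frequent-tag baseline (assumed).
initial = ["TO", "NN", "AT", "NN", "MD", "NN"]
gold = ["TO", "VB", "AT", "NN", "MD", "VB"]
candidates = [("NN", "VB", "TO"), ("NN", "VB", "MD"), ("NN", "VB", "AT")]
learned, final = learn_transformations(initial, gold, candidates)
print(learned, final)
```

With a positive epsilon the loop always terminates, because each accepted rule lowers the error count by at least epsilon. The output is an ordered list of rules, and the order matters: a later rule is applied to the corpus as already modified by the earlier ones.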

23
Relation to other models

Decision trees
Similarity to transformation-based learning:
a series of relabelings
Difference from transformation-based learning:
the data is split at each node in a decision tree,
with a different sequence of transformations for each
node
Probabilistic models in general

24
Automata

Transformation-based tagging has a rule
component, but it also has a quantitative component.
Once learning is complete, transformation-based
tagging is purely symbolic
A transformation-based tagger can be converted into
another symbolic object
Roche and Schabes (1995): finite-state transducer
Advantage: speed

25
Other Methods, Other Languages