1
Shallow Parsing Using CRFs
  • Ashutosh Agarwal
  • Paper by Fei Sha and Fernando Pereira

2
Shallow Parsing (Chunking)
  • Introduction
  • Identifies the non-recursive parts of various
    phrase types in text
  • e.g., [NP Tom Hanks] ran around [NP America].
  • Tom Hanks and America are the two chunks
    identified
  • NP chunking: finding the non-recursive NPs
  • Also called base NPs

3
Previous Approaches
  • Machine Learning Approaches
  • K-order generative probabilistic models
  • e.g., Hidden markov models
  • Makes very naive independence assumptions
  • Otherwise intractable
  • As a sequence of classification tasks
  • Classification of lable depends on input data
    prev classified labels of words
  • Trained to make best local decision
  • Myopic about the effect of current decision on
    later decisions

4
CRF NP Chunker
  • Input to chunker POS tagged corpora
  • Output Sequence of 'B', 'I', 'O' where
  • 'B' represents beginning of chunk
  • 'I' represents continuation of chunk
  • 'O' represents outside of a chunk
  • Hence 'OI' can never occur
  • One label for each word in sentence
  • e.g. Tom Hanks ran around America.
  • Output BIOOB
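A minimal sketch of this encoding (not from the paper): the Python below converts chunk spans into BIO labels; the function name and the (start, end) span representation are assumptions for illustration.

```python
def chunks_to_bio(tokens, chunk_spans):
    """Convert chunk spans (start, end) over a token list into BIO labels.

    Hypothetical helper, not from the paper. 'end' is exclusive.
    """
    labels = ["O"] * len(tokens)          # default: outside any chunk
    for start, end in chunk_spans:
        labels[start] = "B"               # beginning of the chunk
        for i in range(start + 1, end):
            labels[i] = "I"               # continuation of the chunk
    return labels

# [NP Tom Hanks] ran around [NP America] -> B I O O B
print(chunks_to_bio(["Tom", "Hanks", "ran", "around", "America"],
                    [(0, 2), (4, 5)]))
```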

5
Chunking CRF
  • Has a second-order Markov dependency between
    chunk tags
  • i.e., the CRF labels pairs of consecutive tags
  • Thus, the label at position i is c_{i-1}c_i
  • And the label at i-1 is c_{i-2}c_{i-1}
  • And the label at 0 is c_0
  • These constraints can be enforced by giving the
    appropriate features a weight of negative
    infinity (see the sketch below)
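A sketch of this idea, assuming nothing beyond the slide: labels become pairs of consecutive chunk tags, and impossible transitions get a weight of negative infinity. The names below are hypothetical, not the paper's code.

```python
import itertools

TAGS = ["B", "I", "O"]

# A first-order model over paired labels c_{i-1}c_i captures the
# second-order dependency between chunk tags.
PAIR_LABELS = [a + b for a, b in itertools.product(TAGS, TAGS)]

def transition_weight(prev_pair, cur_pair):
    """Hard constraints as -infinity weights (illustrative sketch)."""
    if prev_pair[1] != cur_pair[0]:   # pairs must agree on the shared tag
        return float("-inf")
    if cur_pair == "OI":              # 'I' can never follow 'O'
        return float("-inf")
    return 0.0                        # learned weights would go here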

6
Chunking CRFs
  • We can factor our feature set as
  • f(y_{i-1}, y_i, x, i) = p(x, i) q(y_{i-1}, y_i)
  • p(x, i) is a predicate on the input sequence x
    and position i
  • e.g., the word at position i is 'the'
  • q(y_{i-1}, y_i) is a predicate on pairs of labels
  • e.g., the POS tags at positions i and i-1 are
    DT, NN, etc. (see the sketch below)
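A sketch of this factorization in Python, with hypothetical predicate names; the paper's actual predicate set covers many word, POS, and label patterns.

```python
def p_word_is_the(x, i):
    """Input predicate p(x, i): the word at position i is 'the'."""
    return x[i].lower() == "the"

def q_pair_is_B_I(y_prev, y_cur):
    """Label predicate q(y_{i-1}, y_i): labels at i-1 and i are B, I."""
    return (y_prev, y_cur) == ("B", "I")

def f(y_prev, y_cur, x, i):
    """Factored binary feature f = p(x, i) * q(y_{i-1}, y_i)."""
    return int(p_word_is_the(x, i) and q_pair_is_B_I(y_prev, y_cur))
```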

7
Features in Chunking CRFs
  • Since the number of POS tags, labels to be
    generated, etc. is finite
  • The number of features is also finite
  • Millions of features on large training sets
  • More features lead to better accuracy
  • But might also lead to overfitting

8
Example Feature Set
9
Evaluation Metric
  • Precision P
  • Fraction of output chunks that exactly match the
    reference chunks
  • Recall R
  • Fraction of reference chunks returned by the
    chunker
  • F1 score (used for comparison with other systems)
  • F1 = 2PR / (P + R) (see the sketch below)
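A small sketch of these chunk-level metrics, assuming chunks are represented as exact (start, end) spans; the function name is hypothetical.

```python
def prf1(predicted, reference):
    """Chunk-level precision, recall, and F1 over sets of spans.

    A predicted chunk counts only if it exactly matches a
    reference chunk.
    """
    correct = len(predicted & reference)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(reference) if reference else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# One of two predicted chunks matches: P = R = F1 = 0.5
print(prf1({(0, 2), (4, 5)}, {(0, 2), (3, 5)}))
```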

10
Empirical Results
11
Conclusion
  • Log-linear parsing models have the potential to
    supplant the currently dominant PCFG parsing
    models
  • They allow a much richer feature set
  • Simpler smoothing
  • They avoid the label-bias problem
  • Which is prevalent in classifier-based parsers

12
References
  • Fei Sha and Fernando Pereira. Shallow Parsing
    with Conditional Random Fields. HLT-NAACL 2003,
    University of Pennsylvania.