Title: Kyoto University
1. Kyoto University
Language Knowledge Engineering Lab.
Cascaded Classification for High Quality Head-modifier Pair Selection
Kun Yu (1), Daisuke Kawahara (2), Sadao Kurohashi (1)
(1) Graduate School of Informatics, Kyoto University
(2) Knowledge Creating Communication Research Center, National Institute of Information and Communications Technology
NLP2008, Tokyo, Mar. 18-20, 2008
2. Outline
- Motivation
- High Quality Sentence Selection
- High Quality Head-modifier Pair Selection
- Integrating Selected Head-modifier Pairs into Parsing
- Results and Discussion
- Conclusion and Future Work
3. Motivation
- Un-lexical information for lexicalized parsing
- Head-modifier pairs recognize lexical preference
- Low quality head-modifier pairs affect parsing accuracy
- Good parse selection cannot ensure high quality for all the head-modifier pairs
- Propose a cascaded classification approach to
- - select sentences with good parsing accuracy by Sent classifier
- - select head-modifier pairs with high quality from selected sentences by HM classifier
4. Related Work
- Most related work is about good parse selection
- - Reichart and Rappoport (2007): a sample ensemble parse assessment algorithm to predict the quality of a parse
- - Yates et al. (2006): an algorithm filtering out high quality parses by performing semantic analysis
- Similar to Sent classifier
- Apply HM classifier after Sent classifier to select high quality head-modifier pairs
6. High Quality Sentence Selection
- SVM classification: Sent classifier
- Features
7. High Quality Sentence Selection (cont.)
- Training
- - analyze training corpus by a syntactic analyzer
- - positive examples: sentences whose parsing accuracy is higher than the accuracy threshold
- - negative examples: the remaining sentences
- - accuracy threshold: 0.95
- Sentence selection
- - use SVM score (score_sent) as the criterion
- - sent_i is a high quality sentence if score_sent(sent_i) > score threshold
- - score threshold: 0
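The training-label rule and the selection rule on this slide can be sketched as follows. This is only an illustration, not the authors' implementation: the function names, the data layout, and the toy numbers are assumptions, and the SVM itself is replaced by precomputed scores.

```python
# Illustrative sketch only; names, data layout and toy values are assumptions.
ACCURACY_THRESHOLD = 0.95  # parsing-accuracy threshold for positive training examples
SCORE_THRESHOLD = 0.0      # SVM-score threshold for selecting sentences


def label_training_sentences(accuracies, threshold=ACCURACY_THRESHOLD):
    """Return +1 for sentences parsed with accuracy above the threshold, else -1."""
    return [1 if acc > threshold else -1 for acc in accuracies]


def select_sentences(sentences, scores, threshold=SCORE_THRESHOLD):
    """Keep the sentences whose classifier score exceeds the threshold."""
    return [s for s, score in zip(sentences, scores) if score > threshold]


# Toy per-sentence parsing accuracies measured against gold trees.
labels = label_training_sentences([1.0, 0.95, 0.80, 0.97])

# Toy SVM decision values (hypothetical) for three unseen sentences.
selected = select_sentences(["s1", "s2", "s3"], [0.7, -0.2, 0.0])
```

Both comparisons are strict, matching "higher than" on the slide, so a sentence sitting exactly on a threshold is not selected.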
9. High Quality Head-modifier Pair Selection
- SVM classification: HM classifier
- Features
10. High Quality Head-modifier Pair Selection (cont.)
- Training
- - analyze training corpus by the same syntactic analyzer
- - positive examples: correct head-modifier pairs
- - negative examples: the remaining head-modifier pairs
- Head-modifier pair selection
- - use SVM score (score_hm) as the criterion
- - hm_i is a high quality head-modifier pair if score_hm(hm_i) > score threshold
- - score threshold: 0
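Putting the two stages together, the cascade can be sketched as below. This is a minimal sketch under stated assumptions: `cascaded_selection`, the `(head, modifier)` tuple layout, and the toy scorers are all hypothetical, and the scorers merely stand in for the decision values of the trained Sent and HM classifiers.

```python
# Illustrative sketch only; the scorers stand in for trained SVM decision functions.
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (head, modifier); this layout is an assumption


def cascaded_selection(
    parsed: Iterable[Tuple[str, List[Pair]]],
    score_sent: Callable[[str], float],
    score_hm: Callable[[str, Pair], float],
    sent_threshold: float = 0.0,
    hm_threshold: float = 0.0,
) -> List[Pair]:
    """Keep pairs scoring above hm_threshold from sentences scoring above sent_threshold."""
    selected = []
    for sentence, pairs in parsed:
        if score_sent(sentence) <= sent_threshold:
            continue  # stage 1: the whole sentence is discarded
        for pair in pairs:
            if score_hm(sentence, pair) > hm_threshold:  # stage 2: filter pairs
                selected.append(pair)
    return selected


# Toy scorers (hypothetical): short sentences and pairs headed by "eat" score high.
def toy_score_sent(sentence):
    return 1.0 if len(sentence.split()) <= 3 else -1.0


def toy_score_hm(sentence, pair):
    return 1.0 if pair[0] == "eat" else -1.0


parsed = [
    ("I eat rice", [("eat", "I"), ("eat", "rice"), ("rice", "I")]),
    ("a much longer sentence here", [("eat", "here")]),
]
result = cascaded_selection(parsed, toy_score_sent, toy_score_hm)
```

Note that the pair `("eat", "here")` is never scored by the second stage, because its sentence is rejected by the first.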
12. A Probabilistic Parsing Model
13. Probability Estimation
Estimated by training corpus
Estimated by training corpus and selected high quality head-modifier pairs
Smoothing (Collins, 1996) is applied in all estimation
15. Experimental Setting
- Two experiments
- (1) Head-modifier pair evaluation
- - to prove the validity of the proposed approach
- (2) Parsing evaluation
- - to check the effectiveness of the selected high quality head-modifier pairs
16. Head-modifier Pair Evaluation
- Data set: Penn Chinese Treebank 5.1
- - Sent classifier training: 6,204 sentences
- - HM classifier training: 3,480 sentences
- - testing: 346 sentences
- Tool
- - training and testing data set analyzer: a deterministic parser
- - gold word segmentation and pos-tag
17. Head-modifier Pair Evaluation (cont.)
F-score
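The formula on this slide did not survive in the text. A common way to compute an F-score for selected pairs is sketched below, assuming that precision is the share of selected pairs that are correct and recall is the share of correct pairs that are selected. Those definitions, the function name `f_score`, and the toy pairs are assumptions and may differ from the slide's own definition.

```python
# Illustrative sketch only; the precision/recall definitions are assumptions.
def f_score(selected, correct):
    """Return (precision, recall, F) for a set of selected pairs against the correct pairs."""
    selected, correct = set(selected), set(correct)
    hits = len(selected & correct)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(correct) if correct else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example: 3 pairs selected, 2 of them correct, 4 correct pairs overall.
p, r, f = f_score(
    {("eat", "rice"), ("eat", "I"), ("rice", "I")},
    {("eat", "rice"), ("eat", "I"), ("eat", "now"), ("now", "right")},
)
```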
18. Head-modifier Pair Evaluation (cont.)
- Model
- - baseline: select all the head-modifier pairs in sentences with no more than 30 words
- - proposed: select head-modifier pairs by cascaded classification
- Result
19. Parsing Evaluation
- Data set: Penn Chinese Treebank 5.1
- - parser training: 9,684 sentences
- - testing: 346 sentences
- - head-modifier pair selection corpus: syntactic analysis of Chinese Gigaword (analyzed by the same deterministic parser as in the previous test)
- - dependency structure transformation: Penn2Malt
- - gold word segmentation and pos-tag
- Evaluation metrics
- - unlabeled attachment score (UAS)
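UAS is conventionally the fraction of tokens whose predicted head matches the gold head, ignoring dependency labels. The sketch below is one way to compute it; the head-index encoding (1-based token positions, 0 for the root) and the function name are assumptions, and it does not say how punctuation was treated in the reported experiments.

```python
# Illustrative sketch only; the head-index encoding is an assumption.
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Fraction of tokens whose predicted head index equals the gold head index."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted head lists must have the same length")
    if not gold_heads:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Toy 4-token sentence: heads are 1-based token indices, 0 marks the root.
uas = unlabeled_attachment_score([2, 0, 2, 3], [2, 0, 2, 2])
```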
20. Parsing Evaluation (cont.)
- Head-modifier pair selection model
- - N/A: do not use head-modifier pairs
- - baseline: select all the head-modifier pairs in sentences with no more than 30 words
- - proposed: select head-modifier pairs by cascaded classification
- Result
21. Discussion
22. Discussion (cont.)
- Classifier training
- - Sent classifier and HM classifier are trained on gold word segmentation and pos-tag
- - the syntactic analysis of Chinese Gigaword is based on real word segmentation and pos-tag
- - word segmentation and pos-tag errors may affect the quality of selected head-modifier pairs
24. Conclusion
- Propose a cascaded classification approach
- Select both high quality sentences and high quality head-modifier pairs in sequence
- Increase the F-score of selected head-modifier pairs compared with using only sentence length as the selection criterion
- Selected high quality head-modifier pairs give more help to lexicalized parsing than the head-modifier pairs selected by sentence length
25. Future Work
- Learn parameter settings from development data
- Train Sent classifier and HM classifier on real word segmentation and pos-tag
- Compare the proposed approach with other good parse selection approaches
26. Thanks!
kunyu_at_nlp.kuee.kyoto-u.ac.jp