
Description:

The number of semi-colons in this sentence. #Semi. The number of colons in this sentence. ... If there exists semi-colon between head and modifier, set as 1; ...


Title: Kyoto University


1
Kyoto University
Language Knowledge Engineering Lab.
Cascaded Classification for High Quality Head-modifier Pair Selection
Kun Yu (1), Daisuke Kawahara (2), Sadao Kurohashi (1)
1. Graduate School of Informatics, Kyoto University
2. Knowledge Creating Communication Research Center, National Institute of Information and Communications Technology
NLP2008, Tokyo, Mar. 18-20, 2008
2
Outline
  • Motivation
  • High Quality Sentence Selection
  • High Quality Head-modifier Pair Selection
  • Integrating Selected Head-modifier Pairs into Parsing
  • Results & Discussion
  • Conclusion & Future Work

3
Motivation
  • Un-lexical information for lexicalized parsing
  • Head-modifier pairs recognize lexical preference
  • Low quality head-modifier pairs affect parsing
    accuracy
  • Good parse selection cannot ensure high quality
    for all the head-modifier pairs
  • Propose a cascaded classification approach to:
    - select sentences with good parsing accuracy using the Sent classifier
    - select high quality head-modifier pairs from the selected sentences using the HM classifier

4
Related Work
  • Most related work is about good parse selection:
    - Reichart and Rappoport (2007): a sample ensemble parse assessment algorithm to predict the quality of a parse
    - Yates et al. (2006): an algorithm filtering out high quality parses by performing semantic analysis
  • These approaches are similar to the Sent classifier
  • This work applies the HM classifier after the Sent classifier to select high quality head-modifier pairs

6
High Quality Sentence Selection
  • SVM classification: Sent classifier
  • Features

7
High Quality Sentence Selection (cont.)
  • Training
    - analyze the training corpus with a syntactic analyzer
    - positive examples: sentences whose parsing accuracy is higher than the accuracy threshold
    - negative examples: the remaining sentences
    - accuracy threshold = 0.95
  • Sentence selection
    - use the SVM score (score_sent) as the criterion
    - sent_i is a high quality sentence if score_sent(sent_i) > the score threshold
    - score threshold = 0
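
The training-label rule and the selection rule on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names, the constant names, and the toy numbers are assumptions, and only the values 0.95 and 0 come from the slide. SVM scores are taken as given inputs rather than computed.

```python
ACCURACY_THRESHOLD = 0.95  # from the slide: positive if accuracy is higher than this
SCORE_THRESHOLD = 0.0      # from the slide: select if the SVM score is greater than this


def label_training_sentences(accuracies):
    """Return +1 for sentences parsed above the accuracy threshold, else -1."""
    return [1 if acc > ACCURACY_THRESHOLD else -1 for acc in accuracies]


def select_sentences(sentences, svm_scores):
    """Keep sentences whose Sent classifier score exceeds the score threshold."""
    return [s for s, score in zip(sentences, svm_scores) if score > SCORE_THRESHOLD]


# Toy values (hypothetical): per-sentence parsing accuracies and SVM scores.
labels = label_training_sentences([1.0, 0.95, 0.80, 0.97])
selected = select_sentences(["s1", "s2", "s3"], [0.7, -0.2, 0.0])
print(labels)
print(selected)
```

Both comparisons are strict, following the "higher than" and "greater than" wording on the slide, so a sentence sitting exactly on a threshold is not selected.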

9
High Quality Head-modifier Pair Selection
  • SVM classification: HM classifier
  • Features

10
High Quality Head-modifier Pair Selection (cont.)
  • Training
    - analyze the training corpus with the same syntactic analyzer
    - positive examples: correct head-modifier pairs
    - negative examples: the remaining head-modifier pairs
  • Head-modifier pair selection
    - use the SVM score (score_hm) as the criterion
    - hm_i is a high quality head-modifier pair if score_hm(hm_i) > the score threshold
    - score threshold = 0
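
Putting the two stages together, the cascade can be sketched as follows. This is a minimal illustration under stated assumptions: the sentence representation (a dict with an `id` and a list of `pairs`), the scoring callbacks, and the toy scores are all hypothetical stand-ins for the trained Sent and HM classifiers; only the two thresholds of 0 come from the slides.

```python
def cascaded_selection(parsed_sentences, sent_score, hm_score,
                       sent_threshold=0.0, hm_threshold=0.0):
    """Stage 1 filters sentences; stage 2 filters pairs from surviving sentences."""
    selected_pairs = []
    for sentence in parsed_sentences:
        if sent_score(sentence) <= sent_threshold:
            continue  # sentence rejected by the Sent classifier, pairs never scored
        for pair in sentence["pairs"]:
            if hm_score(sentence, pair) > hm_threshold:
                selected_pairs.append(pair)
    return selected_pairs


# Toy corpus and scores (hypothetical); pairs are (head, modifier) tuples.
corpus = [
    {"id": "a", "pairs": [("eat", "apple"), ("eat", "quickly")]},
    {"id": "b", "pairs": [("see", "dog")]},
]
toy_sent_scores = {"a": 0.8, "b": -0.5}
toy_hm_scores = {("eat", "apple"): 1.2, ("eat", "quickly"): -0.3, ("see", "dog"): 2.0}

pairs = cascaded_selection(
    corpus,
    lambda s: toy_sent_scores[s["id"]],
    lambda s, p: toy_hm_scores[p],
)
print(pairs)
```

In the toy data the pair from sentence "b" has a high HM score but is still dropped, because its sentence fails the first stage. That ordering is what makes the selection cascaded.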

12
A Probabilistic Parsing Model
13
Probability Estimation
Estimated from the training corpus
Estimated from the training corpus and the selected high quality head-modifier pairs
Smoothing (Collins, 1996) is applied in all estimations
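
The formulas for this slide did not survive in the transcript, so the following is only a rough sketch of the idea of pooling counts from the training corpus with counts from the selected pairs. Everything specific here is an assumption: the conditional form P(modifier | head), the function and parameter names, and the toy data. The smoothing shown is simple add-alpha smoothing, swapped in for brevity; it is not the Collins (1996) scheme the slide refers to.

```python
from collections import Counter


def estimate_modifier_given_head(treebank_pairs, selected_pairs, vocab_size, alpha=1.0):
    """Build P(modifier | head) from pooled (head, modifier) counts.

    Uses add-alpha smoothing as a stand-in; vocab_size is the number of
    distinct modifiers the distribution is spread over.
    """
    pair_counts = Counter(treebank_pairs) + Counter(selected_pairs)
    head_counts = Counter(head for head, _ in treebank_pairs)
    head_counts.update(head for head, _ in selected_pairs)

    def prob(head, modifier):
        numerator = pair_counts[(head, modifier)] + alpha
        denominator = head_counts[head] + alpha * vocab_size
        return numerator / denominator

    return prob


# Toy data (hypothetical): one treebank pair plus two automatically selected pairs.
prob = estimate_modifier_given_head(
    treebank_pairs=[("eat", "apple")],
    selected_pairs=[("eat", "apple"), ("eat", "rice")],
    vocab_size=2,
)
print(prob("eat", "apple"), prob("eat", "rice"))
```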
15
Experimental Setting
  • Two experiments
    (1) Head-modifier pair evaluation
      - to verify the validity of the proposed approach
    (2) Parsing evaluation
      - to check the effectiveness of the selected high quality head-modifier pairs

16
Head-modifier Pair Evaluation
  • Data set: Penn Chinese Treebank 5.1
    - Sent classifier training: 6,204 sentences
    - HM classifier training: 3,480 sentences
    - testing: 346 sentences
  • Tool
    - analyzer for the training and testing data sets: a deterministic parser
    - gold word segmentation and POS tags

17
Head-modifier Pair Evaluation (cont.)
  • Evaluation metrics
    - F-score
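
The metric definitions on this slide were lost in the transcript. As a hedged illustration, the sketch below computes the usual balanced F-score from precision and recall; treating precision as the share of selected pairs that are correct and recall as the share of reference pairs that were selected is an assumption, as are the function name and the toy counts.

```python
def f_score(num_selected, num_selected_correct, num_reference):
    """Balanced F-score (harmonic mean of precision and recall) from raw counts."""
    if num_selected == 0 or num_reference == 0:
        return 0.0
    precision = num_selected_correct / num_selected
    recall = num_selected_correct / num_reference
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy counts (hypothetical): 80 pairs selected, 60 of them correct, 100 reference pairs.
score = f_score(num_selected=80, num_selected_correct=60, num_reference=100)
print(round(score, 4))
```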
18
Head-modifier Pair Evaluation (cont.)
  • Model
    - baseline: select all the head-modifier pairs in sentences with no more than 30 words
    - proposed: select head-modifier pairs by cascaded classification
  • Result

19
Parsing Evaluation
  • Data set: Penn Chinese Treebank 5.1
    - parser training: 9,684 sentences
    - testing: 346 sentences
    - head-modifier pair selection corpus: syntactic analysis of Chinese Gigaword (analyzed by the same deterministic parser as in the previous experiment)
    - dependency structure transformation: Penn2Malt
    - gold word segmentation and POS tags
  • Evaluation metrics
    - unlabeled attachment score (UAS)
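
For readers unfamiliar with UAS, it is the fraction of tokens whose predicted head matches the gold head, ignoring dependency labels. The sketch below is an assumption-laden illustration: the data layout (one list of head indices per sentence, with 0 for the root) and the choice to count every token, punctuation included, are not taken from the slides.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Fraction of tokens whose predicted head index equals the gold head index."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted must contain the same number of sentences")
    total = 0
    correct = 0
    for gold, pred in zip(gold_heads, predicted_heads):
        if len(gold) != len(pred):
            raise ValueError("sentence length mismatch between gold and predicted")
        total += len(gold)
        correct += sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / total if total else 0.0


# Toy example (hypothetical): two sentences, head index per token, 0 marks the root.
gold = [[2, 0, 2], [0, 1]]
predicted = [[2, 0, 1], [0, 1]]
uas = unlabeled_attachment_score(gold, predicted)
print(uas)
```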

20
Parsing Evaluation (cont.)
  • Head-modifier pair selection model
    - N/A: do not use head-modifier pairs
    - baseline: select all the head-modifier pairs in sentences with no more than 30 words
    - proposed: select head-modifier pairs by cascaded classification
  • Result

21
Discussion
  • Proper parameter setting

22
Discussion (cont.)
  • Classifier training
    - the Sent classifier and HM classifier are trained on gold word segmentation and POS tags
    - the syntactic analysis of Chinese Gigaword is based on real word segmentation and POS tags
    - word segmentation and POS-tag errors may affect the quality of the selected head-modifier pairs

24
Conclusion
  • Propose a cascaded classification approach
  • Select high quality sentences and then high quality head-modifier pairs, in sequence
  • Increases the F-score of the selected head-modifier pairs compared with using only sentence length as the selection criterion
  • The selected high quality head-modifier pairs help lexicalized parsing more than head-modifier pairs selected by sentence length

25
Future Work
  • Learn parameter settings from development data
  • Train the Sent classifier and HM classifier on real word segmentation and POS tags
  • Compare the proposed approach with other good parse selection approaches

26
Thanks!
kunyu_at_nlp.kuee.kyoto-u.ac.jp