Title: Report on Semi-supervised Training for Statistical Parsing
1. Report on Semi-supervised Training for Statistical Parsing
2. Brief Introduction
- Why semi-supervised training?
- Co-training framework and applications
- Can parsing fit in this framework?
- How?
- Conclusion
3. Why Semi-supervised Training
- A compromise between supervised and unsupervised learning
- Pay-offs
- Minimize the need for labeled data
- Maximize the value of unlabeled data
- Easy portability
4. Co-training Scenario
- Idea: two different students learn from each other, incrementally and mutually improving
- Difference (motive), mutual learning (optimization) -> agreement (objective)
- Task: optimize the objective function of agreement
- Heuristic selection is important: what to learn?
5. Blum & Mitchell, 98: Co-training Assumptions
- Classification problem
- Feature redundancy
- Allows different views of data
- Each view is sufficient for classification
- The views' features are independent of each other, given the class
6. Blum & Mitchell, 98: Co-training Example
- Course home page classification (yes/no)
- Two views: page content text / anchor text of links pointing to the page (a near-perfect example: two sides of a coin)
- Two naïve Bayes classifiers should agree
7. Blum & Mitchell, 98: Co-Training Algorithm
- Given:
  - A set L of labeled training examples
  - A set U of unlabeled examples
- Create a pool U' of examples by choosing u examples at random from U
- Loop for k iterations:
  - Use L to train a classifier h1 that considers only the x1 portion of x
  - Use L to train a classifier h2 that considers only the x2 portion of x
  - Allow h1 to label p positive and n negative examples from U'
  - Allow h2 to label p positive and n negative examples from U'
  - Add these self-labeled examples to L
  - Randomly choose 2p + 2n examples from U to replenish U'
- The selected examples are the most confidently labeled ones, i.e. heuristic selection
- n/p matches the ratio of negative to positive examples
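The loop on this slide can be sketched in Python. The `fit1`/`fit2` classifier interface (a training function that returns a scorer yielding a label and a confidence) is my own illustration, not from the paper:

```python
import random

def cotrain(L, U, fit1, fit2, p=1, n=1, k=5, u=10, seed=0):
    """Blum & Mitchell-style co-training loop (sketch).

    L    : list of ((x1, x2), y) labeled pairs, y in {0, 1}
    U    : list of (x1, x2) unlabeled pairs
    fit1 : trains on L and returns h(x1) -> (label, confidence),
           using only the x1 view; fit2 likewise for the x2 view
    """
    rng = random.Random(seed)
    U = list(U)
    rng.shuffle(U)
    pool, U = U[:u], U[u:]                  # the smaller pool U'
    for _ in range(k):
        h1, h2 = fit1(L), fit2(L)
        for h, view in ((h1, 0), (h2, 1)):
            # rank the pool by this view's confidence, most confident first
            scored = sorted(((h(x[view]), x) for x in pool),
                            key=lambda t: -t[0][1])
            pos = [(x, lab) for (lab, _), x in scored if lab == 1][:p]
            neg = [(x, lab) for (lab, _), x in scored if lab == 0][:n]
            for x, lab in pos + neg:        # self-label p + n examples
                L.append((x, lab))
                pool.remove(x)
        take = min(2 * p + 2 * n, len(U))   # replenish U' from U
        pool.extend(U[:take])
        U = U[take:]
    return L
```

With perfectly redundant views (each view alone determines the label), every self-labeled example the loop adds is correct, which is the intuition behind the sufficiency assumption on slide 5.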
8. Family of Algorithms Related to Co-training (Nigam & Ghani, 2000)
9. Parsing as Supertagging and Attaching (Sarkar 2001)
- The difference between parsing and other NLP applications (WSD, WBPC, TC, NEI):
  - A tree vs. a label
  - Composite vs. monolithic
  - Large parameter space vs. small
- LTAG:
  - Each word is tagged with a lexicalized elementary tree (supertagging)
  - Parsing is a process of substitution and adjoining of elementary trees
  - A supertagger finishes a very large part of the job a traditional parser must do
10. A Glimpse of Supertags
11. Two Models for Co-training
- H1 selects trees based on previous context (tagging probability model)
- H2 computes attachment between trees and returns the best parse (parsing probability model)
12. Sarkar 2000 Co-training Algorithm
- 1. Input: labeled and unlabeled data
- 2. Update the cache:
  - Randomly select sentences from the unlabeled data and refill the cache
  - If the cache is empty: exit
- 3. Train models H1 and H2 using the labeled data
- 4. Apply H1 and H2 to the cache
- 5. Pick the n most probable outputs from H1 (run through H2) and add them to the labeled data
- 6. Pick the n most probable outputs from H2 and add them to the labeled data
- 7. n = n + k; go to step 2
13. JHU SW2002 Tasks
- Co-train the Collins CFG parser with the Sarkar LTAG parser
- Co-train re-rankers
- Co-train CCG supertaggers and parsers
14. Co-training: The Algorithm
- Requires:
  - Two learners with different views of the task
  - A Cache Manager (CM) to interface with the disparate learners
  - A small set of labeled seed data and a larger pool of unlabelled data
- Pseudo-code:
  - Init: train both learners with the labeled seed data
  - Loop:
    - CM picks unlabelled data to add to the cache
    - Both learners label the cache
    - CM selects newly labeled data to add to the learners' respective training sets
    - Learners re-train
15. Novel Methods: Parse Selection
- Want to select training examples for one parser (the student) labeled by the other (the teacher) so as to minimize noise and maximize training utility.
- Top-n: choose the n examples for which the teacher assigned the highest scores.
- Difference: choose the examples for which the teacher assigned a higher score than the student by some threshold.
- Intersection: choose the examples that received high scores from the teacher but low scores from the student.
- Disagreement: choose the examples for which the two parsers provided different analyses and the teacher assigned a higher score than the student.
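The four heuristics above can be sketched as one selection function. The signature, parameter names, and default values are illustrative, not from the workshop report:

```python
def select(examples, teacher, student, disagree, method="top-n",
           n=2, threshold=0.1):
    """Sketch of the four parse-selection heuristics.

    teacher[i] / student[i] : scores the two parsers assign to examples[i]
    disagree[i]             : True if the two parsers' analyses differ
    Returns the indices of the examples selected for the student's
    training set.
    """
    idx = range(len(examples))
    if method == "top-n":          # highest teacher scores
        return sorted(idx, key=lambda i: -teacher[i])[:n]
    if method == "difference":     # teacher beats student by a margin
        return [i for i in idx if teacher[i] - student[i] > threshold]
    if method == "intersection":   # high for teacher AND low for student
        top_teacher = set(sorted(idx, key=lambda i: -teacher[i])[:n])
        low_student = set(sorted(idx, key=lambda i: student[i])[:n])
        return sorted(top_teacher & low_student)
    if method == "disagreement":   # different parses, teacher more confident
        return [i for i in idx if disagree[i] and teacher[i] > student[i]]
    raise ValueError(f"unknown method: {method}")
```

The trade-off the slide hints at is visible in the code: "top-n" maximizes label quality, while "intersection" and "disagreement" target examples with high training utility for the student.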
16. Effect of Parse Selection
17. CFG-LTAG Co-training
18. Re-rankers Co-training
- What is re-ranking?
  - A re-ranker reorders the output of an n-best (probabilistic) parser based on features of the parse
  - While parsers use local features to make decisions, re-rankers use features that can span the entire tree
- Instead of co-training parsers, co-train different re-rankers
19. Re-rankers Co-training
- Motivation: why re-rankers?
  - Speed:
    - The data is parsed once
    - It can be reordered many times
  - Objective function:
    - The lower runtime of re-rankers allows us to explicitly maximize agreement between parses
20. Re-rankers Co-training
- Motivation: why re-rankers?
  - Accuracy:
    - Re-rankers can improve the performance of existing parsers
    - Collins 00 cites a 13-percent reduction in error rate from re-ranking
  - Task closer to classification:
    - A re-ranker can be seen as a binary classifier: either a parse is the best for a sentence or it isn't
    - This is the original domain co-training was intended for
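The core operation of a re-ranker, scoring whole-tree features with a linear model and reordering the n-best list, is small enough to sketch. The feature representation and function names here are my own illustration:

```python
def rerank(nbest, weights):
    """Sketch of linear-model re-ranking over an n-best parse list.

    nbest   : list of (parse, features), where features maps a
              whole-tree feature name to its count in that parse
    weights : feature name -> learned weight
    Returns the parse whose global score is highest.
    """
    def score(feats):
        # unseen features contribute 0, i.e. weight defaults to 0.0
        return sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return max(nbest, key=lambda pf: score(pf[1]))[0]
```

Because the features are computed once per candidate parse, re-scoring under new weights is cheap, which is exactly the speed argument on slide 19.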
21. Re-rankers Co-training
- Experimental, but much remains to be explored. Remember: a re-ranker is easier to develop.
- Re-ranker 1: log-linear model
- Re-ranker 2: linear perceptron model
- Room for improvement: the current best parser scores 89.7; an oracle that picks the best parse from the top 50 scores 95
22. JHU SW2002 Conclusion
- Largest experimental study to date on the use of unlabelled data for improving parser performance.
- Co-training enhances performance for parsers and taggers trained on small amounts (500-10,000 sentences) of labeled data.
- Co-training can be used for porting parsers trained on one genre to parse another without any new human-labeled data at all, improving on the state of the art for this task.
- Even tiny amounts of human-labelled data for the target genre enhance porting via co-training.
- New methods for parse selection have been developed, and they play a crucial role.
23. How to Improve Our Parser?
- Similar setting: limited labeled data (Penn CTB) plus a large amount of unlabeled data from a somewhat different domain (PKU People's Daily)
- To try:
  - The re-ranker development cycle is much shorter, so it is worth trying; many ML techniques may be utilized
  - Re-ranker agreement is still an open question
24. Thanks