Title: Report on Semi-supervised Training for Statistical Parsing
1. Report on Semi-supervised Training for Statistical Parsing
2. Brief Introduction
- Why semi-supervised training?
- Co-training framework and applications
- Can parsing fit in this framework?
- How?
- Conclusion
3. Why Semi-supervised Training
- A compromise between supervised and unsupervised learning
- Pay-offs
- Minimize the need for labeled data
- Maximize the value of unlabeled data
- Easy portability
4. Co-training Scenario
- Idea: two different students learn from each other, incrementally and mutually improving
- Difference (motive), mutual learning (optimization) -> agreement (objective)
- Task: optimize the objective function of agreement
- Heuristic selection is important: what to learn?
5. Blum & Mitchell, 98: Co-training Assumptions
- Classification problem
- Feature redundancy
- Allows different views of data
- Each view is sufficient for classification
- The views' features are independent of each other, given the class
6. Blum & Mitchell, 98: Co-training Example
- Course home page classification (yes/no)
- Two views: page content text / anchor text of links pointing to the page (a near-perfect example: two sides of a coin)
- Two naïve Bayes classifiers should agree
7. Blum & Mitchell, 98: Co-Training Algorithm
- Given:
  - A set L of labeled training examples
  - A set U of unlabeled examples
- Create a pool U' of examples by choosing u examples at random from U
- Loop for k iterations:
  - Use L to train a classifier h1 that considers only the x1 portion of x
  - Use L to train a classifier h2 that considers only the x2 portion of x
  - Allow h1 to label p positive and n negative examples from U'
  - Allow h2 to label p positive and n negative examples from U'
  - Add these self-labeled examples to L
  - Randomly choose 2p + 2n examples from U to replenish U'
- The selected examples are the most confidently labeled ones, i.e. heuristic selection
- n/p matches the ratio of negative to positive examples
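The loop on this slide can be sketched in Python. The `fit1`/`fit2` classifier interface (a training function that returns a scorer yielding a label and a confidence) is my own illustration, not from the paper:

```python
import random

def cotrain(L, U, fit1, fit2, p=1, n=1, k=5, u=10, seed=0):
    """Blum & Mitchell-style co-training loop (sketch).

    L    : list of ((x1, x2), y) labeled pairs, y in {0, 1}
    U    : list of (x1, x2) unlabeled pairs
    fit1 : trains on L and returns h(x1) -> (label, confidence),
           using only the x1 view; fit2 likewise for the x2 view
    """
    rng = random.Random(seed)
    U = list(U)
    rng.shuffle(U)
    pool, U = U[:u], U[u:]                  # the smaller pool U'
    for _ in range(k):
        h1, h2 = fit1(L), fit2(L)
        for h, view in ((h1, 0), (h2, 1)):
            # rank the pool by this view's confidence, most confident first
            scored = sorted(((h(x[view]), x) for x in pool),
                            key=lambda t: -t[0][1])
            pos = [(x, lab) for (lab, _), x in scored if lab == 1][:p]
            neg = [(x, lab) for (lab, _), x in scored if lab == 0][:n]
            for x, lab in pos + neg:        # self-label p + n examples
                L.append((x, lab))
                pool.remove(x)
        take = min(2 * p + 2 * n, len(U))   # replenish U' from U
        pool.extend(U[:take])
        U = U[take:]
    return L
```

With perfectly redundant views (each view alone determines the label), every self-labeled example the loop adds is correct, which is the intuition behind the sufficiency assumption on slide 5.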
8. Family of Algorithms Related to Co-training (Nigam & Ghani, 2000)
9. Parsing as Supertagging and Attaching (Sarkar 2001)
- The difference between parsing and other NLP applications (WSD, WBPC, TC, NEI):
  - A tree vs. a label
  - Composite vs. monolithic
  - Large parameter space vs. small
- LTAG:
  - Each word is tagged with a lexicalized elementary tree (supertagging)
  - Parsing is a process of substitution and adjoining of elementary trees
  - A supertagger finishes a very large part of the job a traditional parser must do
10. A Glimpse of Supertags
11. Two Models for Co-training
- H1 selects trees based on previous context (tagging probability model)
- H2 computes attachment between trees and returns the best parse (parsing probability model)
12. Sarkar 2000 Co-training Algorithm
- 1. Input: labeled and unlabeled data
- 2. Update the cache:
  - Randomly select sentences from the unlabeled data and refill the cache
  - If the cache is empty: exit
- 3. Train models H1 and H2 using the labeled data
- 4. Apply H1 and H2 to the cache
- 5. Pick the n most probable outputs from H1 (run through H2) and add them to the labeled data
- 6. Pick the n most probable outputs from H2 and add them to the labeled data
- 7. n = n + k; go to step 2
13. JHU SW2002 Tasks
- Co-train the Collins CFG parser with the Sarkar LTAG parser
- Co-train re-rankers
- Co-train CCG supertaggers and parsers
14. Co-training: The Algorithm
- Requires:
  - Two learners with different views of the task
  - A Cache Manager (CM) to interface with the disparate learners
  - A small set of labeled seed data and a larger pool of unlabelled data
- Pseudo-code:
  - Init: train both learners with the labeled seed data
  - Loop:
    - CM picks unlabelled data to add to the cache
    - Both learners label the cache
    - CM selects newly labeled data to add to the learners' respective training sets
    - Learners re-train
15. Novel Methods: Parse Selection
- Want to select training examples for one parser (the student) labeled by the other (the teacher) so as to minimize noise and maximize training utility.
- Top-n: choose the n examples for which the teacher assigned the highest scores.
- Difference: choose the examples for which the teacher assigned a higher score than the student by some threshold.
- Intersection: choose the examples that received high scores from the teacher but low scores from the student.
- Disagreement: choose the examples for which the two parsers provided different analyses and the teacher assigned a higher score than the student.
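The four heuristics above can be sketched as one selection function. The signature, parameter names, and default values are illustrative, not from the workshop report:

```python
def select(examples, teacher, student, disagree, method="top-n",
           n=2, threshold=0.1):
    """Sketch of the four parse-selection heuristics.

    teacher[i] / student[i] : scores the two parsers assign to examples[i]
    disagree[i]             : True if the two parsers' analyses differ
    Returns the indices of the examples selected for the student's
    training set.
    """
    idx = range(len(examples))
    if method == "top-n":          # highest teacher scores
        return sorted(idx, key=lambda i: -teacher[i])[:n]
    if method == "difference":     # teacher beats student by a margin
        return [i for i in idx if teacher[i] - student[i] > threshold]
    if method == "intersection":   # high for teacher AND low for student
        top_teacher = set(sorted(idx, key=lambda i: -teacher[i])[:n])
        low_student = set(sorted(idx, key=lambda i: student[i])[:n])
        return sorted(top_teacher & low_student)
    if method == "disagreement":   # different parses, teacher more confident
        return [i for i in idx if disagree[i] and teacher[i] > student[i]]
    raise ValueError(f"unknown method: {method}")
```

The trade-off the slide hints at is visible in the code: "top-n" maximizes label quality, while "intersection" and "disagreement" target examples with high training utility for the student.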
16. Effect of Parse Selection
17. CFG-LTAG Co-training
18. Re-rankers Co-training
- What is re-ranking?
  - A re-ranker reorders the output of an n-best (probabilistic) parser based on features of the parse
  - While parsers use local features to make decisions, re-rankers use features that can span the entire tree
- Instead of co-training parsers, co-train different re-rankers
19. Re-rankers Co-training
- Motivation: why re-rankers?
  - Speed:
    - The data is parsed once
    - It can be reordered many times
  - Objective function:
    - The lower runtime of re-rankers allows us to explicitly maximize agreement between parses
20. Re-rankers Co-training
- Motivation: why re-rankers?
  - Accuracy:
    - Re-rankers can improve the performance of existing parsers
    - Collins 00 cites a 13-percent reduction in error rate from re-ranking
  - Task closer to classification:
    - A re-ranker can be seen as a binary classifier: either a parse is the best for a sentence or it isn't
    - This is the original domain co-training was intended for
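The core operation of a re-ranker, scoring whole-tree features with a linear model and reordering the n-best list, is small enough to sketch. The feature representation and function names here are my own illustration:

```python
def rerank(nbest, weights):
    """Sketch of linear-model re-ranking over an n-best parse list.

    nbest   : list of (parse, features), where features maps a
              whole-tree feature name to its count in that parse
    weights : feature name -> learned weight
    Returns the parse whose global score is highest.
    """
    def score(feats):
        # unseen features contribute 0, i.e. weight defaults to 0.0
        return sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return max(nbest, key=lambda pf: score(pf[1]))[0]
```

Because the features are computed once per candidate parse, re-scoring under new weights is cheap, which is exactly the speed argument on slide 19.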
21. Re-rankers Co-training
- Experimental, but much remains to be explored. Remember: a re-ranker is easier to develop.
- Re-ranker 1: log-linear model
- Re-ranker 2: linear perceptron model
- Room for improvement: the current best parser scores 89.7; an oracle that picks the best parse from the top 50 scores 95
22. JHU SW2002 Conclusion
- Largest experimental study to date on the use of unlabelled data for improving parser performance.
- Co-training enhances performance for parsers and taggers trained on small amounts (500-10,000 sentences) of labeled data.
- Co-training can be used for porting parsers trained on one genre to parse another without any new human-labeled data at all, improving on the state of the art for this task.
- Even tiny amounts of human-labelled data for the target genre enhance porting via co-training.
- New methods for parse selection have been developed, and they play a crucial role.
23. How to Improve Our Parser?
- Similar setting: limited labeled data (Penn CTB) plus a large amount of unlabeled data from a somewhat different domain (PKU People's Daily)
- To try:
  - The re-ranker development cycle is much shorter, so it is worth trying; many ML techniques may be utilized
  - Re-ranker agreement is still an open question
24. Thanks