Large Margin Dependency Parsing: Local Constraints and Laplacian Regularization



1
Large Margin Dependency Parsing Local
Constraints and Laplacian Regularization
  • Qin Iris Wang
    Colin Cherry
  • Dan Lizotte
    Dale Schuurmans
  • University of Alberta
  • {wqin, colinc, dlizotte, dale}@cs.ualberta.ca

2
Large Margin Training in Parsing
  • Discriminative training in parsing
  • Taskar et al. 2004, Tsochantaridis et al. 2004,
    McDonald et al. 2005a
  • State-of-the-art performance in dependency
    parsing
  • McDonald et al. 2005a, 2005b, 2006
  • But they didn't consider
  • The error of any particular component in a tree
  • Global loss of the whole parse tree
  • Smoothing methods

3
Our Contributions
  • Two ideas for improving large margin training
  • Using local constraints to capture local errors
    in a parse tree
  • Using Laplacian regularization (based on
    distributional word similarity) to deal with data
    sparseness

4
Outline
  • Dependency parsing model
  • Large margin training
  • Training with local constraints
  • Laplacian regularization
  • Experimental results
  • Related work and conclusions

5
Dependency Tree
  • A dependency tree structure for the sentence
    "The boy skipped school regularly."
  • Syntactic relationships between word pairs in a
    sentence

6
Dependency Parsing Model
  • W = (w_1, ..., w_n): an input sentence
  • T: a candidate dependency tree
  • T(W): the set of possible dependency trees
    spanning W

Eisner 1996; McDonald et al. 2005
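
The model equation on this slide was an image and did not survive transcription; a reconstruction of the standard arc-factored (first-order) form of the cited work, assuming each arc contributes a linear score over its feature vector f(w_i, w_j) with weights \theta:

  score(W, T) = \sum_{(w_i, w_j) \in T} \theta \cdot f(w_i, w_j)

  parse(W) = \operatorname{argmax}_{T \in T(W)} score(W, T)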
7
Features for an arc
Lots!
  • Word pair indicator
  • Pointwise Mutual Information for that word pair
  • Distance between words

No part-of-speech features
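
As a concrete illustration of the three feature types above (a minimal sketch, not the authors' code; the function name, feature keys, and pmi_table lookup are hypothetical):

def arc_features(head, child, head_idx, child_idx, pmi_table):
    """Sparse feature map for a candidate dependency arc head -> child."""
    feats = {}
    # Word pair indicator: fires only for this exact (head, child) pair
    feats["pair=" + head + "|" + child] = 1.0
    # Pointwise mutual information for the word pair (0.0 if unseen)
    feats["pmi"] = pmi_table.get((head, child), 0.0)
    # Distance between the two words in the sentence
    feats["dist=" + str(abs(head_idx - child_idx))] = 1.0
    return feats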
8
Score of Each Word Pair
[Figure: dependency tree for the sentence "The boy skipped school regularly."]
  • The score of each word pair is based on its
    features
  • Consider the word pair (skipped, regularly):
  • PMI(skipped, regularly) = 0.27
  • dist(skipped, regularly) = 2
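
With a linear arc score \theta \cdot f (as in the model above), these feature values would combine as follows; the decomposition is illustrative, and the weights \theta are whatever training produces:

  score(skipped, regularly) = \theta_{pair(skipped, regularly)} \cdot 1 + \theta_{pmi} \cdot 0.27 + \theta_{dist=2} \cdot 1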

9
Outline
  • Dependency parsing model
  • Large margin training
  • Training with local constraints
  • Laplacian regularization
  • Experimental results
  • Related work and conclusions

10
Large Margin Training
  • Minimizing a regularized loss (Hastie et al.,
    2004)

i: the index of a training sentence
T_i: the target tree for sentence i
L_i: a candidate tree
\Delta(T_i, L_i): the distance between the two trees
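
The loss itself appeared as an image; a reconstruction of the standard margin-scaled hinge loss under these definitions (the regularization weight \beta is assumed):

  \min_\theta \frac{\beta}{2}\|\theta\|^2 + \sum_i \max_{L_i \in T(W_i)} \Big[ \Delta(T_i, L_i) - \big( score(W_i, T_i) - score(W_i, L_i) \big) \Big]_+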
11
Large Margin Training
  • Equivalent to solving the quadratic program

McDonald et al. 2005
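
The program was shown as an image; a standard reconstruction following the cited formulation, with one slack variable \xi_i per training sentence:

  \min_{\theta, \xi} \frac{\beta}{2}\|\theta\|^2 + \sum_i \xi_i
  s.t. score(W_i, T_i) - score(W_i, L_i) \ge \Delta(T_i, L_i) - \xi_i \quad \forall L_i \in T(W_i), \; \xi_i \ge 0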
12
However
  • Exponential number of constraints
  • The loss ignores local errors within the parse tree
  • Over-fitting the training corpus
  • Large number of bi-lexical features, so a good
    smoothing (regularization) method is needed

13
Outline
  • Dependency parsing model
  • Large margin training
  • Training with local constraints
  • Laplacian regularization
  • Experimental results
  • Related work and conclusions

14
Local Constraints (an example)
[Figure: candidate dependency links, numbered 1-6, over the sentence "The boy skipped school regularly."]
score(The, boy) > score(The, skipped) + 1
score(boy, skipped) > score(The, skipped) + 1
score(skipped, school) > score(school, regularly) + 1
score(skipped, regularly) > score(school, regularly) + 1
15
Local Constraints
Convex!
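
The constraint set on this slide was an image; the general pattern, assuming (as in the example above) that each correct link must out-score each competing incorrect link by a margin of 1:

  \theta \cdot f(correct link) \ge \theta \cdot f(incorrect link) + 1

Each such constraint is linear in \theta, so the feasible region, and hence the training problem, stays convex.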
16
Objective with Local Constraints
  • The corresponding new quadratic program

Only a polynomial number of constraints!
j: the number of constraints in A
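
The quadratic program itself was not transcribed; a plausible reconstruction, with one slack variable per local constraint (the indexing is assumed, not taken from the paper):

  \min_{\theta, \xi} \frac{\beta}{2}\|\theta\|^2 + \sum_i \sum_{j \in A_i} \xi_{ij}
  s.t. \theta \cdot f(a^{+}_{ij}) - \theta \cdot f(a^{-}_{ij}) \ge 1 - \xi_{ij}, \quad \xi_{ij} \ge 0

where a^{+}_{ij} and a^{-}_{ij} are the correct and incorrect links compared by the j-th local constraint of sentence i.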
17
Outline
  • Dependency parsing model
  • Large margin training
  • Training with local constraints
  • Laplacian regularization
  • Experimental results
  • Related work and conclusions

18
Distributional Word Similarity
  • Words that tend to appear in the same contexts
    tend to have similar meanings (Harris, 1968)
  • Represent a word by a feature vector of contexts
  • Similarity of two words: cosine similarity of
    their vectors
  • Similarity between word pairs
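
A minimal sketch of the vector representation and cosine computation (illustrative only; the paper's context definition and any feature weighting may differ):

from collections import Counter
import math

def cosine(u, v):
    """Cosine similarity between two sparse context-count vectors."""
    dot = sum(cnt * v[c] for c, cnt in u.items() if c in v)
    norm_u = math.sqrt(sum(cnt * cnt for cnt in u.values()))
    norm_v = math.sqrt(sum(cnt * cnt for cnt in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Each word is represented by counts of the contexts it appears in.
stock = Counter({"buy": 5, "sell": 4, "market": 3})
share = Counter({"buy": 4, "sell": 3, "issue": 2})
print(cosine(stock, share))  # close to 1.0: similar distributional contexts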

19
Laplacian Regularization
  • Encourage similar links (word pairs) to have
    similar weights

L(S) = D(S) - S, where
D(S): a diagonal matrix with D(S)_ii = \sum_j S_ij
S: the similarity matrix of word pairs
L(S): the Laplacian matrix of S
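
The regularization term was shown as an image; the standard Laplacian quadratic form makes the "similar links, similar weights" effect explicit:

  \theta^\top L(S)\, \theta = \frac{1}{2} \sum_{j,k} S_{jk} (\theta_j - \theta_k)^2

so minimizing it penalizes weight differences between highly similar links.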
20
Refined Large Margin Objective
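
The objective on this slide was an image; a plausible reconstruction, replacing the plain \ell_2 regularizer of the earlier quadratic program with the Laplacian term (this combination is inferred from the surrounding slides):

  \min_{\theta, \xi} \frac{\beta}{2}\, \theta^\top L(S)\, \theta + \sum_i \sum_{j \in A_i} \xi_{ij} \quad subject to the local constraints above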
21
(No Transcript)
22
Outline
  • Dependency parsing model
  • Large margin training
  • Training with local constraints
  • Laplacian regularization
  • Experimental results
  • Related work and conclusions

23
Experimental Setup
  • Chinese Treebank (CTB) (Xue et al., 2004), same
    data split as Bikel (2004).
  • Training: Sections 1-270
  • Development: Sections 301-325
  • Testing: Sections 271-300
  • Dependency trees
  • Converted from the constituency trees (Bikel,
    2004)
  • CTB-10, CTB-15
  • Word similarities
  • Chinese Gigaword corpus

24
Experimental Details
  • For any unseen link,
  • the weight is computed as the similarity-weighted
    average of similar links seen in the training
    corpus
  • Parsing accuracy
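
Both quantities were shown as equations; reconstructions matching the descriptions above (the notation is assumed): the weight of an unseen link (u, v) is

  \theta_{uv} = \frac{\sum_{(u',v')\ seen} S\big((u,v),(u',v')\big)\, \theta_{u'v'}}{\sum_{(u',v')\ seen} S\big((u,v),(u',v')\big)}

and, assuming the standard unlabeled measure, parsing accuracy is the percentage of words assigned the correct head:

  accuracy = \frac{\#\ correct\ dependency\ links}{\#\ total\ links} \times 100\%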

25
Experimental Results - 1
Accuracy Results on Dev Set (%)
26
Experimental Results - 2
Accuracy Results on Test Set (%)
27
Comparison with Other Work
  • The probabilistic approach to Chinese dependency
    parsing (Wang et al. 2005)
  • 61.04 (dev set)
  • 76.31 (test set)
  • Our approach
  • 65.71 (dev set) ↑
  • 68.27 (test set) ↓

(with a much simpler feature set)
28
Outline
  • Dependency parsing model
  • Large margin training
  • Training with local constraints
  • Laplacian regularization
  • Experimental results
  • Related work and conclusions

29
Related Work
  • Large margin training for parsing
  • McDonald et al. 2005
  • Taskar et al. 2004, Tsochantaridis et al. 2004
  • Yamada and Matsumoto 2003
  • Maximize conditional likelihood (maximum entropy)
  • Charniak 2000
  • Ratnaparkhi 1999
  • Dependency parsing on Chinese
  • Wang et al. 2005 (Also purely bilexical)
  • Bikel and Chiang (2000)
  • Levy and Manning (2003)

30
Conclusions
  • Two contributions to the standard large margin
    training approach
  • Applied the refined local constraints to the
    large margin criterion
  • Smoothed the parameters according to word
    similarities, via Laplacian regularization
  • Extensions
  • Consider directed and contextual features
  • Parse English and other languages

31
Thanks!
Questions?
  • Updated paper available online at
    http://www.cs.ualberta.ca/~wqin/

32
Lexicalized Dependency Parsing
  • Word-based parameters
  • No POS tags or grammatical categories needed
  • Advantages
  • Makes treebank annotation easier
  • Beneficial for languages such as Chinese

33
Experimental Results - 3
Accuracy Results on Training Set (%)
34
Features for an arc
Lots!
  • Word pair features
  • PMI features
  • Distance features

35
Distributional Word Similarity
  • Words that tend to appear in the same contexts
    tend to have similar meanings (Harris, 1968)
  • Represent a word w by a feature vector f of
    contexts
  • P(w, c): the probability that w and c co-occur
  • Similarity measure: cosine
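
Written out over the context vectors (a standard formulation; any context weighting the paper applies before taking the cosine is not shown here):

  \cos(f_u, f_v) = \frac{\sum_c f_u(c)\, f_v(c)}{\|f_u\|\, \|f_v\|}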