Large Margin Dependency Parsing: Local Constraints and Laplacian Regularization



1
Large Margin Dependency Parsing Local
Constraints and Laplacian Regularization
  • Qin Iris Wang
    Colin Cherry
  • Dan Lizotte
    Dale Schuurmans
  • University of Alberta
  • {wqin, colinc, dlizotte, dale}@cs.ualberta.ca

2
Large Margin Training in Parsing
  • Discriminative training in parsing
  • Taskar et al. 2004, Tsochantaridis et al. 2004,
    McDonald et al. 2005a
  • State-of-the-art performance in dependency
    parsing
  • McDonald et al. 2005a, 2005b, 2006
  • But they didn't consider
  • The error of any particular component in a tree
  • Global loss of the whole parse tree
  • Smoothing methods

3
Our Contributions
  • Two ideas for improving large margin training
  • Using local constraints to capture local errors
    in a parse tree
  • Using Laplacian regularization (based on
    distributional word similarity) to deal with data
    sparseness

4
Outline
  • Dependency parsing model
  • Large margin training
  • Training with local constraints
  • Laplacian regularization
  • Experimental results
  • Related work and conclusions

5
Dependency Tree
  • A dependency tree structure for the sentence
    "The boy skipped school regularly."
  • Syntactic relationships between word pairs in a
    sentence

6
Dependency Parsing Model
  • W = (w_1, ..., w_n): an input sentence
  • T: a candidate dependency tree
  • T(W): the set of possible dependency trees
    spanning W

Eisner 1996; McDonald et al. 2005
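
The model equation on this slide was an image and did not survive transcription; a reconstruction of the standard arc-factored (first-order) form of the cited work, assuming each arc contributes a linear score over its feature vector f(w_i, w_j) with weights \theta:

  score(W, T) = \sum_{(w_i, w_j) \in T} \theta \cdot f(w_i, w_j)

  parse(W) = \operatorname{argmax}_{T \in T(W)} score(W, T)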
7
Features for an arc
Lots!
  • Word pair indicator
  • Pointwise Mutual Information for that word pair
  • Distance between words

No part-of-speech features
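
As a concrete illustration of the three feature types above (a minimal sketch, not the authors' code; the function name, feature keys, and pmi_table lookup are hypothetical):

def arc_features(head, child, head_idx, child_idx, pmi_table):
    """Sparse feature map for a candidate dependency arc head -> child."""
    feats = {}
    # Word pair indicator: fires only for this exact (head, child) pair
    feats["pair=" + head + "|" + child] = 1.0
    # Pointwise mutual information for the word pair (0.0 if unseen)
    feats["pmi"] = pmi_table.get((head, child), 0.0)
    # Distance between the two words in the sentence
    feats["dist=" + str(abs(head_idx - child_idx))] = 1.0
    return feats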
8
Score of Each Word Pair
[Figure: dependency tree for the sentence "The boy skipped school regularly."]
  • The score of each word pair is based on its
    features
  • Consider the word pair (skipped, regularly):
  • PMI(skipped, regularly) = 0.27
  • dist(skipped, regularly) = 2
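
With a linear arc score \theta \cdot f (as in the model above), these feature values would combine as follows; the decomposition is illustrative, and the weights \theta are whatever training produces:

  score(skipped, regularly) = \theta_{pair(skipped, regularly)} \cdot 1 + \theta_{pmi} \cdot 0.27 + \theta_{dist=2} \cdot 1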

9
Outline
  • Dependency parsing model
  • Large margin training
  • Training with local constraints
  • Laplacian regularization
  • Experimental results
  • Related work and conclusions

10
Large Margin Training
  • Minimizing a regularized loss (Hastie et al.,
    2004)

i: the index of a training sentence
T_i: the target tree for sentence i
L_i: a candidate tree
\Delta(T_i, L_i): the distance between the two trees
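
The loss itself appeared as an image; a reconstruction of the standard margin-scaled hinge loss under these definitions (the regularization weight \beta is assumed):

  \min_\theta \frac{\beta}{2}\|\theta\|^2 + \sum_i \max_{L_i \in T(W_i)} \Big[ \Delta(T_i, L_i) - \big( score(W_i, T_i) - score(W_i, L_i) \big) \Big]_+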
11
Large Margin Training
  • Equivalent to solving the quadratic program

McDonald et al. 2005
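
The program was shown as an image; a standard reconstruction following the cited formulation, with one slack variable \xi_i per training sentence:

  \min_{\theta, \xi} \frac{\beta}{2}\|\theta\|^2 + \sum_i \xi_i
  s.t. score(W_i, T_i) - score(W_i, L_i) \ge \Delta(T_i, L_i) - \xi_i \quad \forall L_i \in T(W_i), \; \xi_i \ge 0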
12
However
  • Exponential number of constraints
  • The loss ignores local errors within the parse tree
  • Over-fitting the training corpus
  • Large number of bi-lexical features, so a good
    smoothing (regularization) method is needed

13
Outline
  • Dependency parsing model
  • Large margin training
  • Training with local constraints
  • Laplacian regularization
  • Experimental results
  • Related work and conclusions

14
Local Constraints (an example)
[Figure: candidate dependency links, numbered 1-6, over the sentence "The boy skipped school regularly."]
score(The, boy) > score(The, skipped) + 1
score(boy, skipped) > score(The, skipped) + 1
score(skipped, school) > score(school, regularly) + 1
score(skipped, regularly) > score(school, regularly) + 1
15
Local Constraints
Convex!
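
The constraint set on this slide was an image; the general pattern, assuming (as in the example above) that each correct link must out-score each competing incorrect link by a margin of 1:

  \theta \cdot f(correct link) \ge \theta \cdot f(incorrect link) + 1

Each such constraint is linear in \theta, so the feasible region, and hence the training problem, stays convex.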
16
Objective with Local Constraints
  • The corresponding new quadratic program

Only a polynomial number of constraints!
j: the number of constraints in A
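
The quadratic program itself was not transcribed; a plausible reconstruction, with one slack variable per local constraint (the indexing is assumed, not taken from the paper):

  \min_{\theta, \xi} \frac{\beta}{2}\|\theta\|^2 + \sum_i \sum_{j \in A_i} \xi_{ij}
  s.t. \theta \cdot f(a^{+}_{ij}) - \theta \cdot f(a^{-}_{ij}) \ge 1 - \xi_{ij}, \quad \xi_{ij} \ge 0

where a^{+}_{ij} and a^{-}_{ij} are the correct and incorrect links compared by the j-th local constraint of sentence i.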
17
Outline
  • Dependency parsing model
  • Large margin training
  • Training with local constraints
  • Laplacian regularization
  • Experimental results
  • Related work and conclusions

18
Distributional Word Similarity
  • Words that tend to appear in the same contexts
    tend to have similar meanings (Harris, 1968)
  • Represent a word by a feature vector of contexts
  • Similarity of two words: cosine similarity of
    their vectors
  • Similarity between word pairs
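
A minimal sketch of the vector representation and cosine computation (illustrative only; the paper's context definition and any feature weighting may differ):

from collections import Counter
import math

def cosine(u, v):
    """Cosine similarity between two sparse context-count vectors."""
    dot = sum(cnt * v[c] for c, cnt in u.items() if c in v)
    norm_u = math.sqrt(sum(cnt * cnt for cnt in u.values()))
    norm_v = math.sqrt(sum(cnt * cnt for cnt in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Each word is represented by counts of the contexts it appears in.
stock = Counter({"buy": 5, "sell": 4, "market": 3})
share = Counter({"buy": 4, "sell": 3, "issue": 2})
print(cosine(stock, share))  # close to 1.0: similar distributional contexts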

19
Laplacian Regularization
  • Encourage similar links (word pairs) to have
    similar weights

L(S) = D(S) - S, where
D(S): a diagonal matrix with D(S)_ii = \sum_j S_ij
S: the similarity matrix of word pairs
L(S): the Laplacian matrix of S
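
The regularization term was shown as an image; the standard Laplacian quadratic form makes the "similar links, similar weights" effect explicit:

  \theta^\top L(S)\, \theta = \frac{1}{2} \sum_{j,k} S_{jk} (\theta_j - \theta_k)^2

so minimizing it penalizes weight differences between highly similar links.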
20
Refined Large Margin Objective
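
The objective on this slide was an image; a plausible reconstruction, replacing the plain \ell_2 regularizer of the earlier quadratic program with the Laplacian term (this combination is inferred from the surrounding slides):

  \min_{\theta, \xi} \frac{\beta}{2}\, \theta^\top L(S)\, \theta + \sum_i \sum_{j \in A_i} \xi_{ij} \quad subject to the local constraints above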
21
(No Transcript)
22
Outline
  • Dependency parsing model
  • Large margin training
  • Training with local constraints
  • Laplacian regularization
  • Experimental results
  • Related work and conclusions

23
Experimental Setup
  • Chinese Treebank (CTB) (Xue et al., 2004), same
    data split as Bikel (2004).
  • Training: Sections 1-270
  • Development: Sections 301-325
  • Testing: Sections 271-300
  • Dependency trees
  • Converted from the constituency trees (Bikel,
    2004)
  • CTB-10, CTB-15
  • Word similarities
  • Chinese Gigaword corpus

24
Experimental Details
  • For any unseen link,
  • the weight is computed as the similarity-weighted
    average of similar links seen in the training
    corpus
  • Parsing accuracy
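
Both quantities were shown as equations; reconstructions matching the descriptions above (the notation is assumed): the weight of an unseen link (u, v) is

  \theta_{uv} = \frac{\sum_{(u',v')\ seen} S\big((u,v),(u',v')\big)\, \theta_{u'v'}}{\sum_{(u',v')\ seen} S\big((u,v),(u',v')\big)}

and, assuming the standard unlabeled measure, parsing accuracy is the percentage of words assigned the correct head:

  accuracy = \frac{\#\ correct\ dependency\ links}{\#\ total\ links} \times 100\%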

25
Experimental Results - 1
Accuracy Results on Dev Set (%)
26
Experimental Results - 2
Accuracy Results on Test Set (%)
27
Comparison with Other Work
  • The probabilistic approach to Chinese dependency
    parsing (Wang et al. 2005)
  • 61.04 (dev set)
  • 76.31 (test set)
  • Our approach
  • 65.71 (dev set) ↑
  • 68.27 (test set) ↓

(with a much simpler feature set)
28
Outline
  • Dependency parsing model
  • Large margin training
  • Training with local constraints
  • Laplacian regularization
  • Experimental results
  • Related work and conclusions

29
Related Work
  • Large margin training for parsing
  • McDonald et al. 2005
  • Taskar et al. 2004, Tsochantaridis et al. 2004
  • Yamada and Matsumoto 2003
  • Maximize conditional likelihood (maximum entropy)
  • Charniak 2000
  • Ratnaparkhi 1999
  • Dependency parsing on Chinese
  • Wang et al. 2005 (Also purely bilexical)
  • Bikel and Chiang (2000)
  • Levy and Manning (2003)

30
Conclusions
  • Two contributions to the standard large margin
    training approach
  • Applied the refined local constraints to the
    large margin criterion
  • Smoothed the parameters according to word
    similarities, via Laplacian regularization
  • Extensions
  • Consider directed and contextual features
  • Parse English and other languages

31
Thanks!
Questions?
  • Updated paper available online at
    http://www.cs.ualberta.ca/~wqin/

32
Lexicalized Dependency Parsing
  • Word-based parameters
  • No POS tags or grammatical categories needed
  • Advantages
  • Makes treebank annotation easier
  • Beneficial for languages such as Chinese

33
Experimental Results - 3
Accuracy Results on Training Set (%)
34
Features for an arc
Lots!
  • Word pair features
  • PMI features
  • Distance features

35
Distributional Word Similarity
  • Words that tend to appear in the same contexts
    tend to have similar meanings (Harris, 1968)
  • Represent a word w by a feature vector f of
    contexts
  • P(w, c): the probability that w and c co-occur
  • Similarity measure: cosine
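
Written out over the context vectors (a standard formulation; any context weighting the paper applies before taking the cosine is not shown here):

  \cos(f_u, f_v) = \frac{\sum_c f_u(c)\, f_v(c)}{\|f_u\|\, \|f_v\|}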