Strictly Lexical Dependency Parsing
(Presentation transcript)
1
Strictly Lexical Dependency Parsing
  • Qin Iris Wang and Dale Schuurmans
  • University of Alberta
  • {wqin,dale}@cs.ualberta.ca
  • Dekang Lin
  • Google, Inc.
  • lindek@google.com
2
Lexical Statistics In Parsing
  • Widely used in previous statistical parsers, such
    as
  • Collins (1996, 1997, 1999)
  • Charniak (2000)
  • But they have been shown to be not very useful
  • Gildea (2001)
  • Bikel (2004)
  • Unlexicalized parsing
  • Klein and Manning (2003)

3
Strictly Lexicalized Parsing
  • A dependency parsing model
  • All the parameters are based on word statistics
  • No POS tags or grammatical categories needed
  • Advantages
  • Makes treebank construction easier
  • Especially beneficial for languages where POS
    tags are not as clearly defined as in English
    (such as Chinese)

4
POS tags in Parsing
  • All previous parsers use a POS lexicon
  • Natural language data is sparse
  • Bikel (2004) found that of all the needed bi-gram
    statistics, only 1.49% were observed in the
    treebank
  • Part-of-speech tags
  • Words belonging to the same part-of-speech are
    expected to have the same syntactic behavior

5
An Alternative Approach
  • Distributional word similarities
  • Words tend to have similar meanings if they tend
    to appear in the same contexts
  • Soft clusters of words
  • Computed automatically
  • Has not been used in parsing before

6
Outline
  • A probabilistic dependency parsing model
  • Similarity-based smoothing
  • Experimental results
  • Related work and conclusions

7
An Example Dependency Tree
  • A dependency tree structure for the sentence
  • The kid skipped school regularly.

[Figure: dependency tree. ROOT(0) -> skipped(3); skipped(3) -> kid(2), school(4), regularly(5); kid(2) -> The(1)]
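The example tree can be encoded as a head-index map, a minimal sketch (the representation and names here are mine, not the paper's):

```python
# Dependency tree for "The kid skipped school regularly."
# head[i] is the index of word i's head; 0 is the artificial root.
words = {1: "The", 2: "kid", 3: "skipped", 4: "school", 5: "regularly"}
head = {1: 2, 2: 3, 3: 0, 4: 3, 5: 3}  # The <- kid <- skipped -> school, -> regularly

def modifiers(h):
    """Return the modifiers of head index h, left to right."""
    return [m for m in sorted(words) if head[m] == h]

print(modifiers(3))  # [2, 4, 5]
```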
8
Probabilistic Dependency Parsing
  • S: an input sentence; T: a candidate dependency
    tree
  • F(S): the set of possible dependency trees
    spanning S
  • The goal of parsing: find the tree in F(S) with
    the highest probability
  • The tree T is constructed in steps G1, G2, ..., GN
    (N words in the sentence)

9
Different Sequences of Steps May Lead to the Same
Dependency Tree
[Figure: two different step orderings, 3, 5, 1, 2, 4 and 4, 5, 1, 2, 3, that both produce the dependency tree for "The kid skipped school regularly."]
10
Canonical Order of the Links
  • Left to right
  • Bottom-up
  • Head outward
  • Right attaching first

11
What's Involved in Each Step?
  • Each step involves four events, conditioned on
    the context

Step (events)  |  Context
[Figure: step G1 and its context, shown on the example dependency tree]
12
What's Involved in Each Step?
Step (events)  |  Context
[Figure: the link skipped(3) -> regularly(5) being created on the example tree]
P( regularly(5) | skipped(3), C3R = 1 )
where C3R, the number of modifiers already created on the right of skipped(3), is part of the context
13
What's Involved in Each Step?
Step (events)  |  Context
[Figure: step G4 and its context, shown on the example dependency tree]
14
  • Suppose Gi corresponds to a dependency link (u,
    v, L).
  • Maximum Likelihood estimates are relative
    frequencies of these word-pair events in the
    treebank

Sparse Data!
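The maximum-likelihood estimate is just a ratio of counts, which is exactly why word-pair statistics are sparse. A sketch (function and variable names are mine, not the paper's):

```python
from collections import Counter

link_counts = Counter()     # (head_word, mod_word, direction) -> times linked
context_counts = Counter()  # (head_word, mod_word) -> times the pair was a candidate

def observe(head_word, mod_word, direction, linked):
    """Record one candidate (u, v) pair and whether a link was created."""
    context_counts[(head_word, mod_word)] += 1
    if linked:
        link_counts[(head_word, mod_word, direction)] += 1

def p_mle(head_word, mod_word, direction):
    """P_MLE(link_d(u, v) | u, v) as a ratio of counts; 0.0 if the pair is unseen."""
    denom = context_counts[(head_word, mod_word)]
    return link_counts[(head_word, mod_word, direction)] / denom if denom else 0.0

observe("skipped", "regularly", "R", True)
observe("skipped", "regularly", "R", False)
print(p_mle("skipped", "regularly", "R"))  # 0.5
```

For most word pairs the denominator is zero in a treebank-sized corpus, which motivates the similarity-based smoothing that follows.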
15
Outline
  • A probabilistic dependency parsing model
  • Similarity-based smoothing
  • Experimental results
  • Related work and conclusion

16
Similarity-based Smoothing
P( regularly(5) | skipped(3), C3R = 1 )
[Figure: the probability to be smoothed, on the example tree]
17
Similarity-based Smoothing
P( regularly(5) | skipped(3), ... )

S(skipped):
skipping 0.229951, skip 0.197991, skips 0.169982,
sprinted 0.140535, bounced 0.139547, missed 0.134966,
cruised 0.133933, scooted 0.13387, jogged 0.133638,
wandered 0.132721

S(regularly):
frequently 0.365862, routinely 0.286178, periodically 0.273665,
often 0.24077, constantly 0.234693, occasionally 0.226324,
who 0.200348, continuously 0.194026, continually 0.177632,
repeatedly 0.177434
18
Similarity-based Smoothing
(animation step; same content as the previous slide)
19
Similarity-based Smoothing
P( regularly(5) | skipped(3), ... )

Similar contexts:
(skip, frequently), (skip, routinely), (skip, repeatedly),
(bounced, often), (bounced, who), (bounced, repeatedly)
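One plausible reading of the examples on this slide: a similar context replaces the head and/or the modifier with a distributionally similar word. A hypothetical sketch (the word lists are abbreviated from the slides; the function is mine):

```python
# Illustrative similarity lists (top entries transcribed from the slides).
sim = {
    "skipped": ["skip", "bounced"],
    "regularly": ["frequently", "routinely", "often", "who", "repeatedly"],
}

def similar_contexts(head, mod):
    """Contexts obtained by replacing the head and/or the modifier
    with a distributionally similar word (excluding the original pair)."""
    heads = [head] + sim.get(head, [])
    mods = [mod] + sim.get(mod, [])
    return [(h, m) for h in heads for m in mods if (h, m) != (head, mod)]

pairs = similar_contexts("skipped", "regularly")
print(("skip", "frequently") in pairs)  # True
```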
20
Similarity-based Smoothing
C': a similar context of C, with C' in S(C)

PSIM( regularly(5) | skipped(3), C ) is estimated from
PMLE( regularly(5) | skipped(3), C' ) over the similar
contexts C' in S(C)
An event is more likely to occur after context C
if it tends to occur after similar contexts of C
21
Similarity-based Smoothing
  • Finally,
  • P(E | C) ≈ α · PMLE(E | C) + (1 - α) · PSIM(E | C)

where α is a function of |C|, the frequency count of
the corresponding context C in the training data
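The interpolation above can be sketched in a few lines. The slide does not give the exact form of α, so the frequency-dependent mixture below, α = |C| / (|C| + b) with a hypothetical constant b, is an assumption for illustration:

```python
def p_smoothed(p_mle, p_sim, context_count, b=5.0):
    """P(E|C) ~= alpha * P_MLE(E|C) + (1 - alpha) * P_SIM(E|C).
    alpha grows with |C| (context_count), the frequency of C in training data;
    the form alpha = |C| / (|C| + b) is an assumed choice, not the paper's."""
    alpha = context_count / (context_count + b)
    return alpha * p_mle + (1 - alpha) * p_sim

print(p_smoothed(0.5, 0.2, 0))     # 0.2  (unseen context: rely on similar contexts)
print(p_smoothed(0.5, 0.2, 1000))  # close to 0.5 (frequent context: trust the MLE)
```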
22
(No Transcript)
23
Outline
  • A probabilistic dependency parsing model
  • Similarity-based smoothing
  • Experimental results
  • Related work and conclusion

24
Experimental Setup
  • Word similarities
  • Chinese Gigaword corpus
  • Chinese Treebank (CTB 3.0), same data split as
    Bikel (2004)
  • Training Sections 1-270 and 400-931
  • Development Sections 301-325
  • Testing Sections 271-300
  • Dependency trees
  • Converted from the constituency trees (Bikel,
    2004)

25
Experimental Results - 1
Evaluation Results on Chinese Treebank (CTB) 3.0
  • Performance is highly correlated with sentence
    length

26
Comparison With an Unlexicalized Model
  • For the unlexicalized model, the input to the
    parser is the sequence of POS tags

27
Comparison With a Strictly Lexicalized Joint Model
  • where hi and mi are the head and the modifier of
    the i-th dependency link.
  • In (Dagan et al., 1999)

28
Comparison With a Strictly Less Lexicalized
Conditional Model
  • Only one word in a similar context of C may be
    different from a word in C

29
Experimental Results - 2
Performance of Alternative Models (sentence
length < 40)
30
Outline
  • A probabilistic dependency parsing model
  • Similarity-based smoothing
  • Experimental results
  • Related work and conclusion

31
Related Work
  • Maximize the joint probability
  • Collins (1997)
  • Charniak (2000)
  • Maximize the conditional probability
  • Clark et al. (2002): CCG grammar
  • Ratnaparkhi (1999): maximizes the probability at
    each step
  • Dependency parsing models
  • Yamada and Matsumoto (2002)
  • Eisner (1996)
  • McDonald et al. (2005)
  • Klein and Manning (2004): the DMV model
  • Parsing Chinese with the Penn Chinese Treebank
  • Bikel and Chiang (2000)
  • Levy and Manning (2003)

32
Conclusions
  • First work on parsing without using
    part-of-speech tags
  • The strictly lexicalized parser outperformed its
    unlexicalized counterpart
  • Takes advantage of similarity-based smoothing,
    which had not been successfully applied to
    parsing before

33
Questions?
Thanks!
34
Notation
  • Stop events: no more modifiers on the left /
    right of a word w (one event for each side)
  • Modifier counts: the current number of modifiers
    w has taken on its left / right
  • (u, v, d): a dependency link with direction d;
    u and v are integers denoting the indices of the
    words (u < v)
  • LinkR(u, v): a link from u to v
  • LinkL(u, v): a link from v to u

35
Assumptions
  • Each stop event depends only on w and the number
    of modifiers w has already taken on that side
  • LinkR(u, v) depends only on u, v, and the number
    of modifiers the head has already taken
  • LinkL(u, v) depends only on u, v, and the number
    of modifiers the head has already taken
  • Suppose the dependency link created in step i is
    (u, v, d)
  • If d = L, Gi is the conjunction of the modifier's
    two stop events and LinkL(u, v)
  • If d = R, Gi is the conjunction of the modifier's
    two stop events and LinkR(u, v)

36
Feature Representation
  • Represent a word w by a feature vector
  • The features of w are the words that occur within
    a small context window of w in a large corpus
  • The value of a feature w' is the pointwise mutual
    information between w and w'

PMI(w, w') = log [ P(w, w') / (P(w) P(w')) ]

where P(w, w') is the probability that w and w'
co-occur in a context window
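The feature value above can be computed directly from co-occurrence counts; a self-contained sketch over toy data (the counts and names are illustrative, not from the paper):

```python
import math
from collections import Counter

# Toy co-occurrence data: (word, context_word) counts within a window.
pair_counts = Counter({("skipped", "school"): 4, ("skipped", "class"): 2,
                       ("kid", "school"): 3, ("kid", "class"): 1})
word_counts = Counter()
ctx_counts = Counter()
for (w, c), n in pair_counts.items():
    word_counts[w] += n
    ctx_counts[c] += n
total = sum(pair_counts.values())

def pmi(w, c):
    """log of P(w, c) / (P(w) P(c)) estimated from the toy counts."""
    p_wc = pair_counts[(w, c)] / total
    p_w = word_counts[w] / total
    p_c = ctx_counts[c] / total
    return math.log(p_wc / (p_w * p_c))

def feature_vector(w):
    """Sparse feature vector: PMI value for each observed context word."""
    return {c: pmi(w, c) for c in ctx_counts if pair_counts[(w, c)] > 0}
```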
37
Similarity-based Smoothing
  • Similarity-based smoothing follows (Dagan et
    al., 1999)
  • The parameters in our model consist of
    conditional probabilities P(E | C), where E is
    the binary variable Linkd(u, v) or a stop event,
    and the context C is the corresponding
    conditioning information

38
Similarity-based Smoothing
  • In our model, the similar contexts are defined
    as
  • We compute the similarity between two contexts
    as
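The similarity formula itself was not transcribed on this slide. One standard choice in similarity-based methods is the cosine between the two words' feature vectors; the sketch below uses that as an assumption rather than the paper's actual measure:

```python
import math

def cosine(v1, v2):
    """Cosine similarity between two sparse feature vectors (dicts)."""
    dot = sum(v1[k] * v2[k] for k in v1.keys() & v2.keys())
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Toy PMI-style feature vectors for two words.
a = {"school": 1.2, "class": 0.8}
b = {"school": 0.9, "often": 0.4}
print(round(cosine(a, b), 2))  # 0.76
```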