Title: Strictly Lexical Dependency Parsing
1Strictly Lexical Dependency Parsing
- Qin Iris Wang and Dale Schuurmans
- University of Alberta
- {wqin, dale}@cs.ualberta.ca
- Dekang Lin, Google, Inc., lindek@google.com
2Lexical Statistics In Parsing
- Lexical statistics are widely used in previous statistical parsers, such as
  - Collins (1996, 1997, 1999)
  - Charniak (2000)
- But they have been shown to be of little benefit
  - Gildea (2001)
  - Bikel (2004)
- Unlexicalized parsing
  - Klein and Manning (2003)
3Strictly Lexicalized Parsing
- A dependency parsing model
- All the parameters are based on word statistics
- No POS tags or grammatical categories needed
- Advantages
  - Makes the construction of treebanks easier
  - Especially beneficial for languages (such as Chinese) where POS tags are not as clearly defined as in English
4POS tags in Parsing
- All previous parsers use a POS lexicon
- Natural language data is sparse
  - Bikel (2004) found that, of all the needed bigram statistics, only 1.49% were observed in the treebank
- Part-of-speech tags
  - Words belonging to the same part of speech are expected to have the same syntactic behavior
5An Alternative Approach
- Distributional word similarities
  - Words tend to have similar meanings if they tend to appear in the same contexts
- Soft clusters of words
  - Computed automatically from a large corpus
  - Have not been used in parsing before
6Outline
- A probabilistic dependency parsing model
- Similarity-based smoothing
- Experimental results
- Related work and conclusions
7An Example Dependency Tree
- A dependency tree structure for the sentence: "The kid skipped school regularly."
- [Figure: dependency tree over the indexed words root (0), The (1), kid (2), skipped (3), school (4), regularly (5)]
8Probabilistic Dependency Parsing
- S: an input sentence; T: a candidate dependency tree; F(S): the set of possible dependency trees spanning S
- The goal of parsing: find the most probable tree in F(S) given S
- The tree T is constructed in steps G1, G2, ..., GN (N is the number of words in the sentence)
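The decomposition implied above can be written out as follows (a sketch of the standard stepwise factorization; the exact conditioning context of each step is spelled out on the later slides):

```latex
T^* = \operatorname*{argmax}_{T \in F(S)} P(T \mid S),
\qquad
P(T \mid S) = \prod_{i=1}^{N} P\bigl(G_i \mid S,\, G_1, \ldots, G_{i-1}\bigr)
```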
9Different Sequences of Steps May Lead to the Same
Dependency Tree
- [Figure: the same dependency tree for "The kid skipped school regularly." (root 0, The 1, kid 2, skipped 3, school 4, regularly 5) built by two different orderings of its link-creation steps, labeled (3, 5, 1, 2, 4) and (4, 5, 1, 2, 3)]
10Canonical Order of the Links
- Left to right
- Bottom-up
- Head outward
- Right attaching first
11What's Involved in Each Step?
- Each step involves four events, conditioned on the context
- [Figure: step G1 for the example sentence, showing the events created and the context they are conditioned on]
12What's Involved in Each Step?
- Example step: attaching "regularly" (5) to "skipped" (3)
- P( Link_R(skipped, regularly) | skipped, regularly, C^R_skipped = 1 )
- C^R_skipped = 1 is the number of modifiers "skipped" has already created
13What's Involved in Each Step?
- [Figure: step G4 for the example sentence, with its events and conditioning context]
14
- Suppose Gi corresponds to a dependency link (u, v, L)
- Maximum Likelihood estimates of P(event | context) from treebank counts → Sparse data!
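A minimal sketch of how such Maximum Likelihood estimates could be computed by counting (event, context) pairs; the input iterable and the exact context encoding are hypothetical stand-ins, not the authors' code:

```python
from collections import defaultdict

def mle_tables(event_context_pairs):
    """Count (event, context) pairs and return a P_MLE(event | context) lookup.

    event_context_pairs: iterable of (event, context) tuples, e.g.
        (("Link_R", "skipped", "regularly"), ("skipped", "regularly", 1))
    """
    joint = defaultdict(int)     # count(event, context)
    context = defaultdict(int)   # count(context)
    for e, c in event_context_pairs:
        joint[(e, c)] += 1
        context[c] += 1

    def p_mle(e, c):
        # Relative frequency; returns 0.0 for unseen contexts (the sparse-data problem).
        return joint[(e, c)] / context[c] if context[c] else 0.0

    return p_mle, context

# Usage (toy data):
pairs = [(("Link_R", "skipped", "regularly"), ("skipped", "regularly", 1)),
         (("Link_R", "skipped", "school"),    ("skipped", "school", 0))]
p_mle, context_counts = mle_tables(pairs)
print(p_mle(("Link_R", "skipped", "regularly"), ("skipped", "regularly", 1)))  # 1.0
```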
15Outline
- A probabilistic dependency parsing model
- Similarity-based smoothing
- Experimental results
- Related work and conclusion
16Similarity-based Smoothing
P( Link_R(skipped, regularly) | skipped, regularly, C^R_skipped = 1 )
17Similarity-based Smoothing
P( Link_R(skipped, regularly) | skipped, regularly, C^R_skipped = 1 )
- S(skipped): skipping 0.229951, skip 0.197991, skips 0.169982, sprinted 0.140535, bounced 0.139547, missed 0.134966, cruised 0.133933, scooted 0.13387, jogged 0.133638, wandered 0.132721
- S(regularly): frequently 0.365862, routinely 0.286178, periodically 0.273665, often 0.24077, constantly 0.234693, occasionally 0.226324, who 0.200348, continuously 0.194026, continually 0.177632, repeatedly 0.177434
18Similarity-based Smoothing
- (Same content as the previous slide: P( Link_R(skipped, regularly) | skipped, regularly, C^R_skipped = 1 ) together with the similarity lists S(skipped) and S(regularly))
19Similarity-based Smoothing
P( Link_R(skipped, regularly) | skipped, regularly, C^R_skipped = 1 )
- Similar contexts: (skip, frequently), (skip, routinely), (skip, repeatedly), (bounced, often), (bounced, who), (bounced, repeatedly)
20Similarity-based Smoothing
- S(C): the similar contexts of C
- P_SIM( Link_R(skipped, regularly) | C ) = Σ_{C' ∈ S(C)} sim(C, C') / norm(C) × P_MLE( Link_R(skipped, regularly) | C' )
- An event is more likely to occur after context C if it tends to occur after similar contexts of C
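A minimal sketch of the similarity-based estimate above, assuming `p_mle` from the earlier sketch, a `similar_contexts(ctx)` generator, and a context-level `sim(ctx, ctx2)` score (all hypothetical names); the weights are normalized over S(C):

```python
def p_sim(event, ctx, p_mle, similar_contexts, sim):
    """Similarity-based estimate: a weighted average of P_MLE over contexts similar to ctx."""
    neighbors = list(similar_contexts(ctx))          # S(C)
    norm = sum(sim(ctx, c2) for c2 in neighbors)     # normalizing constant over S(C)
    if norm == 0.0:
        return 0.0
    return sum(sim(ctx, c2) / norm * p_mle(event, c2) for c2 in neighbors)
```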
21Similarity-based Smoothing
- Finally, P(E | C) = α × P_MLE(E | C) + (1 − α) × P_SIM(E | C)
- α depends on |C|, the frequency count of the corresponding context C in the training data
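A sketch of the final interpolated estimate; the slide does not give the exact form of α, so the count-based schedule below (α = |C| / (|C| + 1)) is only an illustrative assumption:

```python
def p_smoothed(event, ctx, p_mle, p_sim_value, context_counts):
    """Interpolate the MLE and similarity-based estimates.

    alpha grows with |C|, the training-data frequency of context ctx, so frequent
    contexts rely mostly on P_MLE and rare ones on P_SIM.
    The schedule |C| / (|C| + 1) is an assumed placeholder, not the paper's exact formula.
    """
    count_c = context_counts.get(ctx, 0)       # |C|
    alpha = count_c / (count_c + 1.0)
    return alpha * p_mle(event, ctx) + (1.0 - alpha) * p_sim_value
```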
23Outline
- A probabilistic dependency parsing model
- Similarity-based smoothing
- Experimental results
- Related work and conclusion
24Experimental Setup
- Word similarities
  - Computed from the Chinese Gigaword corpus
- Chinese Treebank (CTB 3.0), same data split as Bikel (2004)
  - Training: Sections 1-270 and 400-931
  - Development: Sections 301-325
  - Testing: Sections 271-300
- Dependency trees
  - Converted from the constituency trees (Bikel, 2004)
25Experimental Results - 1
[Table: Evaluation results on Chinese Treebank (CTB) 3.0]
- Performance is highly correlated with the length of the sentences
26Comparison With an Unlexicalized Model
- For the unlexicalized model, the input to the parser is the sequence of POS tags
27Comparison With a Strictly Lexicalized Joint Model
- The joint model defines P(S, T) as a product over dependency links, where hi and mi are the head and the modifier of the i-th dependency link
28Comparison With a Strictly Less Lexicalized Conditional Model
- Only one word in a similar context of C may be different from the corresponding word in C
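A sketch of the restricted similar-context definition above, where at most one of the two words is replaced by a distributionally similar word; `similar_words` (the S(w) lists) and the context encoding are illustrative names:

```python
def restricted_similar_contexts(ctx, similar_words):
    """Similar contexts that differ from ctx = (u, v, count) in at most one word."""
    u, v, count = ctx
    for u2, _score in similar_words.get(u, []):
        yield (u2, v, count)          # replace only the first word
    for v2, _score in similar_words.get(v, []):
        yield (u, v2, count)          # replace only the second word
```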
29Experimental Results - 2
Performance of Alternative Models (sentence length < 40)
30Outline
- A probabilistic dependency parsing model
- Similarity-based smoothing
- Experimental results
- Related work and conclusion
31Related Work
- Maximize the joint probability
  - Collins (1997)
  - Charniak (2000)
- Maximize the conditional probability
  - Clark et al. (2002): CCG grammar
  - Ratnaparkhi (1999): maximize the probability at each step
- Dependency parsing models
  - Yamada and Matsumoto (2002)
  - Eisner (1996)
  - McDonald et al. (2005)
  - Klein and Manning (2004): the DMV model
- Parsing Chinese with the Penn Chinese Treebank
  - Bikel and Chiang (2000)
  - Levy and Manning (2003)
32Conclusions
- The first work on parsing without using part-of-speech tags
- The strictly lexicalized parser outperformed its unlexicalized counterpart
- Takes advantage of similarity-based smoothing, which had not been successfully applied to parsing before
33Questions?
Thanks!
34Notation
- E^L_w / E^R_w: no more modifiers on the left/right of a word w
- C^L_w / C^R_w: the current number of modifiers w has taken on its left/right
- (u, v, d): a dependency link with direction d; u and v are integers denoting the indices of the words (u < v)
- Link_R(u, v): a link from u to v
- Link_L(u, v): a link from v to u
35Assumptions
- E^d_w depends only on w and C^d_w
- Link_R(u, v) depends only on u, v, and C^R_u
- Link_L(u, v) depends only on u, v, and C^L_v
- Suppose the dependency link created in step i is (u, v, d)
  - If d = L, Gi is the conjunction of E^L_u, E^R_u, and Link_L(u, v)
  - If d = R, Gi is the conjunction of E^L_v, E^R_v, and Link_R(u, v)
36Feature Representation
- Represent a word w by a feature vector
- The features of w are the set of words that occur within a small context window of w in a large corpus
- The value of a feature w' is the pointwise mutual information PMI(w, w') = log( P(w, w') / (P(w) × P(w')) ), where P(w, w') is the probability that w and w' co-occur in a context window
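A minimal sketch of such distributional feature vectors, assuming co-occurrence counts collected from context windows over a large corpus; the PMI weighting follows the formula above, while the positive-PMI filter and the cosine comparison between vectors are illustrative assumptions (the slides do not state the exact word-similarity measure):

```python
import math
from collections import defaultdict

def pmi_vectors(cooc, marginal, total):
    """Build PMI-weighted feature vectors for each word.

    cooc[w][f]  : count of feature word f appearing in w's context windows
    marginal[w] : total co-occurrence events involving w
    total       : total number of co-occurrence events in the corpus
    """
    vectors = defaultdict(dict)
    for w, feats in cooc.items():
        for f, c in feats.items():
            p_wf = c / total
            p_w = marginal.get(w, 0) / total
            p_f = marginal.get(f, 0) / total
            if p_w == 0 or p_f == 0:
                continue
            pmi = math.log(p_wf / (p_w * p_f))
            if pmi > 0:                       # keep positively associated features (an assumed filter)
                vectors[w][f] = pmi
    return vectors

def cosine(v1, v2):
    """Cosine similarity between two sparse PMI vectors (an assumed similarity measure)."""
    dot = sum(x * v2.get(f, 0.0) for f, x in v1.items())
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0
```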
37Similarity-based Smoothing
- The parameters in our model consist of conditional probabilities P(E | C), where E is the binary variable Link_d(u, v) or E^d_w, and the context C is either (w, C^d_w) or (u, v, C^R_u) / (u, v, C^L_v)
38Similarity-based Smoothing
- In our model, the similar contexts of C are obtained by replacing the words in C with their distributionally similar words (the modifier-count component of C stays fixed)
- We compute the similarity between two contexts from the similarities between the corresponding words
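A sketch of generating similar contexts from the S(w) word lists shown earlier, with an assumed context-similarity score (the product of the two word similarities, which is an illustrative combination, not necessarily the paper's exact definition):

```python
def similar_contexts(ctx, similar_words):
    """Generate contexts similar to ctx = (u, v, count), with their scores.

    similar_words[w] is a list of (w', score) pairs, e.g. S(skipped), S(regularly)
    from the earlier slides. The context score is the product of the two word
    similarities -- an assumed combination.
    """
    u, v, count = ctx
    for u2, su in similar_words.get(u, []):
        for v2, sv in similar_words.get(v, []):
            yield (u2, v2, count), su * sv

# Usage with (truncated) similarity lists from the slides:
S = {"skipped":   [("skip", 0.197991), ("bounced", 0.139547)],
     "regularly": [("frequently", 0.365862), ("often", 0.24077), ("who", 0.200348)]}
for c, score in similar_contexts(("skipped", "regularly", 1), S):
    print(c, round(score, 4))
```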