Title: Dependency Parsing: Parsing Algorithms
1. Dependency Parsing: Parsing Algorithms
- Prashanth Mannem
- LTRC, IIIT-Hyd
- prashanth_at_research.iiit.ac.in
2. Outline
- Introduction
  - Phrase Structure Grammar
  - Dependency Grammar
  - Comparison and Conversion
- Dependency Parsing
  - Formal definition
- Parsing Algorithms
  - Introduction
  - Dynamic programming
  - Constraint satisfaction
  - Deterministic search
3. Introduction
- The syntactic parsing of a sentence consists of finding the correct syntactic structure of that sentence in a given formalism/grammar.
- Dependency Grammar (DG) and Phrase Structure Grammar (PSG) are two such formalisms.
4. Phrase Structure Grammar (PSG)
- Breaks a sentence into constituents (phrases)
  - which are then broken into smaller constituents
- Describes phrase structure, clause structure
  - E.g. NP, PP, VP, etc.
- Structures are often recursive
  - The clever tall blue-eyed old man
5. Phrase Structure Tree Example
6. Dependency Grammar
- Syntactic structure consists of lexical items, linked by binary asymmetric relations called dependencies
- Interested in grammatical relations between individual words (governing and dependent words)
- Does not propose a recursive structure
  - Rather a network of relations
- These relations can also have labels
7. Dependency Tree Example
8. Dependency Tree Example
- Phrasal nodes are missing in the dependency
structure when compared to constituency
structure.
9. Dependency Tree with Labels
10. Comparison
- Dependency structures explicitly represent
  - Head-dependent relations (directed arcs)
  - Functional categories (arc labels)
  - Possibly some structural categories (parts-of-speech)
- Phrase structures explicitly represent
  - Phrases (non-terminal nodes)
  - Structural categories (non-terminal labels)
  - Possibly some functional categories (grammatical functions)
11. Conversion: PSG to DG
- The head of a phrase governs/dominates all its siblings
- Heads are identified using heuristics
- Dependency relations are established between the head of each phrase (as the parent) and its siblings (as children)
- The tree thus formed is the unlabeled dependency tree (a conversion sketch follows below)
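As a rough illustration of this head-percolation style of conversion, here is a minimal sketch. The tree encoding, the head_rules function, and the toy rule are assumptions for illustration, not the exact heuristics used by any particular converter.

    # Sketch: convert a phrase-structure tree to unlabeled dependencies by head percolation.
    # A tree is either a token string or a pair (label, children).
    # Tokens are assumed to be distinct strings; real code would use (word, position) pairs.
    def psg_to_dg(tree, head_rules):
        if isinstance(tree, str):                # a word token is its own head
            return tree, []
        label, children = tree
        child_heads, arcs = [], []
        for child in children:
            head, child_arcs = psg_to_dg(child, head_rules)
            child_heads.append(head)
            arcs.extend(child_arcs)
        head_idx = head_rules(label, children)   # heuristic choice of the head child
        for k, h in enumerate(child_heads):
            if k != head_idx:
                # the head of the phrase governs the heads of its sibling constituents
                arcs.append((child_heads[head_idx], h))
        return child_heads[head_idx], arcs

    # Toy head rule (illustrative only): verbs head VPs, prepositions head PPs,
    # otherwise take the last child (e.g. the noun in an NP, the VP in an S).
    def toy_head_rules(label, children):
        return 0 if label in ("VP", "PP") else len(children) - 1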
12. Phrase Structure Tree with Heads
13. Dependency Tree
14. Conversion: DG to PSG
- Each head together with its dependents (and their dependents) forms a constituent of the sentence
- It is difficult to assign structural categories (NP, VP, S, PP, etc.) to these derived constituents
- Every projective dependency grammar has a strongly equivalent context-free grammar, but not vice versa [Gaifman 1965]
15. Learning: DG over PSG
- Dependency parsing is more straightforward
  - Parsing can be reduced to labeling each token wi with a head wj
- Direct encoding of predicate-argument structure
- Fragments are directly interpretable
- Dependency structure is independent of word order
  - Suitable for free word order languages (like Indian languages)
16. Outline
- Introduction
  - Phrase Structure Grammar
  - Dependency Grammar
  - Comparison and Conversion
- Dependency Parsing
  - Formal definition
- Parsing Algorithms
  - Introduction
  - Dynamic programming
  - Constraint satisfaction
  - Deterministic search
17. Dependency Tree
- Formal definition
  - An input word sequence w1 ... wn
  - Dependency graph D = (W, E) where
    - W is the set of nodes, i.e. word tokens in the input sequence
    - E is the set of unlabeled tree edges (wi, wj), with wi, wj ∈ W
    - (wi, wj) indicates an edge from wi (parent) to wj (child)
- The task of mapping an input string to a dependency graph satisfying certain conditions is called dependency parsing
18. Well-formedness
- A dependency graph is well-formed iff
  - Single head: Each word has exactly one head.
  - Acyclic: The graph is acyclic.
  - Connected: The graph is a single tree containing all the words in the sentence.
  - Projective: If word A depends on word B, then all words between A and B are also subordinate to B (i.e. dominated by B).
- (A sketch of checking these conditions follows below.)
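A small sketch of how these conditions can be checked on a head-indexed representation (my own illustration; heads[d] gives the position of word d's head, and position 0 is an artificial root):

    # Sketch: check single-head, acyclicity, connectedness and projectivity.
    # heads[d] = index of the head of word d; index 0 is ROOT (heads[0] is unused).
    def is_well_formed(heads):
        n = len(heads)
        # Single head holds by construction: heads stores exactly one head per word.
        # Acyclic + connected: every word must reach ROOT without revisiting a node.
        for d in range(1, n):
            seen, h = set(), d
            while h != 0:
                if h in seen or heads[h] is None:
                    return False
                seen.add(h)
                h = heads[h]

        def dominated_by(w, ancestor):
            # Walk up the head chain from w; True if it passes through `ancestor`.
            while w != 0:
                if w == ancestor:
                    return True
                w = heads[w]
            return ancestor == 0

        # Projective: every word strictly between a head and its dependent
        # must be dominated by that head.
        for d in range(1, n):
            h = heads[d]
            for w in range(min(h, d) + 1, max(h, d)):
                if not dominated_by(w, h):
                    return False
        return True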
19. Non-projective dependency tree
- Example: Ram saw a dog yesterday which was a Yorkshire Terrier
- The dependency tree has crossing lines (arcs)
- English has very few non-projective cases.
20. Outline
- Introduction
  - Phrase Structure Grammar
  - Dependency Grammar
  - Comparison and Conversion
- Dependency Parsing
  - Formal definition
- Parsing Algorithms
  - Introduction
  - Dynamic programming
  - Constraint satisfaction
  - Deterministic search
21. Dependency Parsing
- Dependency based parsers can be broadly categorized into
  - Grammar driven approaches
    - Parsing done using grammars
  - Data driven approaches
    - Parsing by training on annotated/un-annotated data
22. Dependency Parsing
- Dependency based parsers can be broadly categorized into
  - Grammar driven approaches
    - Parsing done using grammars
  - Data driven approaches
    - Parsing by training on annotated/un-annotated data
- These approaches are not mutually exclusive
23. Parsing Methods
- Three main traditions
  - Dynamic programming
    - CYK, Eisner, McDonald
  - Constraint satisfaction
    - Maruyama, Foth et al., Duchier
  - Deterministic search
    - Covington, Yamada and Matsumoto, Nivre
24. Dynamic Programming
- Basic idea: Treat dependencies as constituents.
- Use, e.g., a CYK parser (with minor modifications)
25. Dependency Chart Parsing
- The grammar is regarded as context-free, in which each node is lexicalized
- Chart entries are subtrees, i.e., words with all their left and right dependents
- Problem: Different entries are needed for different subtrees spanning a sequence of words with different heads
- Time requirement: O(n^5)
26. Generic Chart Parsing (slide from Eisner, 1997)
- for each of the O(n^2) substrings,
  - for each of O(n) ways of splitting it,
    - for each of S analyses of the first half,
      - for each of S analyses of the second half,
        - for each of c ways of combining them:
          - combine, and add the result to the chart if best
- Total: O(n^3 S^2 c)
- Figure: the two halves of "cap spending at 300 million" contribute S analyses each; combining them gives cS^2 analyses, of which we keep the best S.
27. Headed constituents ... (slide from Eisner, 1997)
- ... have too many signatures.
- How bad is Θ(n^3 S^2 c)?
- For unheaded constituents, S is constant: NP, VP, ... (similarly for dotted trees). So Θ(n^3).
- But when different heads ⇒ different signatures, the average substring has Θ(n) possible heads and S = Θ(n) possible signatures. So Θ(n^5).
28. Dynamic Programming Approaches
- Original version: Hays 1964 (grammar driven)
- Link grammar: Sleator and Temperley 1991 (grammar driven)
- Bilexical grammar: Eisner 1996 (data driven)
- Maximum spanning tree: McDonald 2006 (data driven)
29. Eisner 1996
- Two novel aspects:
  - Modified parsing algorithm
  - Probabilistic dependency parsing
- Time requirement: O(n^3)
- Modification: Instead of storing subtrees, store spans
  - Span: Substring such that no interior word links to any word outside the span
  - Idea: In a span, only the boundary words are active, i.e. still need a head or a child
  - One or both of the boundary words can be active
30. Example
- Sentence: _ROOT_ Red figures on the screen indicated falling stocks
31. Example
- Spans over the same sentence, e.g. [Red figures] and [indicated falling stocks]
32. Assembly of correct parse
- Start by combining adjacent words into minimal spans: [Red figures], [figures on], [on the], ...
33. Assembly of correct parse
- Combine spans which overlap in one word; this word must be governed by a word in the left or right span: [on the] + [the screen] → [on the screen]
34. Assembly of correct parse
- Likewise: [figures on] + [on the screen] → [figures on the screen]
35. Assembly of correct parse
- Invalid span: [Red figures on the screen] (the interior word "figures" still needs a head outside the span)
36. Assembly of correct parse
- Likewise: [indicated falling] + [falling stocks] → [indicated falling stocks]
- (An algorithmic sketch follows below.)
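To make the span-based idea concrete, here is a minimal sketch of a first-order, Eisner-style O(n^3) projective parser over arc scores. This is my own illustration; a real implementation would add labels, probabilities or trained scores, and a proper single-root constraint.

    NEG = float("-inf")

    def eisner_parse(score):
        """score[h][d] = score of the arc h -> d; token 0 is an artificial ROOT.
        Returns heads[d] for the highest-scoring projective tree (heads[0] stays -1)."""
        n = len(score)
        # C = complete spans, I = incomplete spans; direction 0 = head at the left end,
        # direction 1 = head at the right end. The *bp tables store split points.
        C = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
        I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
        Cbp = [[[None, None] for _ in range(n)] for _ in range(n)]
        Ibp = [[[None, None] for _ in range(n)] for _ in range(n)]
        for i in range(n):
            C[i][i][0] = C[i][i][1] = 0.0
        for length in range(1, n):
            for i in range(n - length):
                j = i + length
                # Incomplete spans: add an arc between the two boundary words.
                for k in range(i, j):
                    val = C[i][k][0] + C[k + 1][j][1]
                    if val + score[i][j] > I[i][j][0]:          # arc i -> j
                        I[i][j][0], Ibp[i][j][0] = val + score[i][j], k
                    if val + score[j][i] > I[i][j][1]:          # arc j -> i
                        I[i][j][1], Ibp[i][j][1] = val + score[j][i], k
                # Complete spans: the head has collected all dependents on one side.
                for k in range(i + 1, j + 1):
                    if I[i][k][0] + C[k][j][0] > C[i][j][0]:
                        C[i][j][0], Cbp[i][j][0] = I[i][k][0] + C[k][j][0], k
                for k in range(i, j):
                    if C[i][k][1] + I[k][j][1] > C[i][j][1]:
                        C[i][j][1], Cbp[i][j][1] = C[i][k][1] + I[k][j][1], k
        heads = [-1] * n

        def backtrack(i, j, d, complete):
            if i == j:
                return
            if complete:
                k = Cbp[i][j][d]
                if d == 0:
                    backtrack(i, k, 0, False)
                    backtrack(k, j, 0, True)
                else:
                    backtrack(i, k, 1, True)
                    backtrack(k, j, 1, False)
            else:
                k = Ibp[i][j][d]
                if d == 0:
                    heads[j] = i
                else:
                    heads[i] = j
                backtrack(i, k, 0, True)
                backtrack(k + 1, j, 1, True)

        backtrack(0, n - 1, 0, True)
        return heads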
37. Eisner's Model
- Recursive generation
- Each word generates its actual dependents
- Two Markov chains:
  - Left dependents
  - Right dependents
38. Eisner's Model
- Notation:
  - tw(i) is the i-th tagged word
  - lc(i) and rc(i) are the left and right children of the i-th word
  - lc_j(i) is the j-th left child of the i-th word
  - t(lc_{j-1}(i)) is the tag of the preceding left child
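With this notation, the left-dependent chain can be written roughly as follows. This is a reconstruction from the definitions above, not necessarily the exact formula on the original slide; the right dependents are generated symmetrically.

    P(\text{left dependents of } w_i) \;=\; \prod_{j \ge 1} P\bigl( tw(lc_j(i)) \mid t(lc_{j-1}(i)),\, tw(i) \bigr)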
39. McDonald's Maximum Spanning Trees
- Score of a dependency tree = sum of the scores of its dependencies
- Scores are independent of the other dependencies
- If scores are available, parsing can be formulated as a maximum spanning tree problem
- Two cases:
  - Projective: Use Eisner's parsing algorithm
  - Non-projective: Use the Chu-Liu-Edmonds algorithm [Chu and Liu 1965, Edmonds 1967]
- Uses online learning for determining the weight vector w
- (A toy scoring sketch follows below.)
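As a toy illustration of "parsing = finding the highest-scoring tree" (not McDonald's actual parser, which uses Eisner's algorithm or Chu-Liu-Edmonds rather than brute force), one can enumerate head assignments for a very short sentence:

    from itertools import product

    def tree_score(heads, score):
        # Sum of arc scores, one arc per non-root word (token 0 is ROOT).
        return sum(score[h][d] for d, h in enumerate(heads) if d != 0)

    def is_tree(heads):
        # Every non-root word must reach ROOT (token 0) without running into a cycle.
        for d in range(1, len(heads)):
            seen, h = set(), d
            while h != 0:
                if h in seen:
                    return False
                seen.add(h)
                h = heads[h]
        return True

    def brute_force_mst(score):
        n = len(score)
        best, best_heads = float("-inf"), None
        for choice in product(range(n), repeat=n - 1):   # candidate head for words 1..n-1
            heads = [0] + list(choice)
            if any(heads[d] == d for d in range(1, n)) or not is_tree(heads):
                continue
            s = tree_score(heads, score)
            if s > best:
                best, best_heads = s, heads
        return best_heads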
40. Parsing Methods
- Three main traditions
  - Dynamic programming
    - CYK, Eisner, McDonald
  - Constraint satisfaction
    - Maruyama, Foth et al., Duchier
  - Deterministic parsing
    - Covington, Yamada and Matsumoto, Nivre
41. Constraint Satisfaction
- Uses Constraint Dependency Grammar
- The grammar consists of a set of boolean constraints, i.e. logical formulas that describe well-formed trees
- A constraint is a logical formula with variables that range over a set of predefined values
- Parsing is defined as a constraint satisfaction problem
- Constraint satisfaction removes values that contradict constraints
42. Constraint Satisfaction
- Parsing is an eliminative process rather than a constructive one (as in CFG parsing)
- Constraint satisfaction in general is NP-complete
- Parser design must ensure practical efficiency
- Different approaches:
  - Constraint propagation techniques which ensure local consistency [Maruyama 1990]
  - Weighted CDG [Foth et al. 2000, Menzel and Schröder 1998]
43. Maruyama's Constraint Propagation
- Three steps:
  1) Form an initial constraint network using a core grammar
  2) Remove local inconsistencies
  3) If ambiguity remains, add new constraints and repeat step 2
- (A toy eliminative sketch follows below.)
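A toy sketch of the eliminative idea (my own illustration, not Maruyama's formalism): each word keeps a domain of candidate heads, and candidates that violate a constraint are removed. Real constraint propagation would also use binary constraints between words and iterate to a fixed point.

    # Sketch: eliminate head candidates that violate hard unary constraints.
    def filter_domains(words, tags, constraints):
        # domains[d] = set of candidate head positions for word d (0 is ROOT)
        domains = {d: set(range(len(words))) - {d} for d in range(1, len(words))}
        for d in domains:
            domains[d] = {h for h in domains[d]
                          if all(ok(h, d, words, tags) for ok in constraints)}
        return domains

    # Example constraint (made up): a determiner's head must lie to its right.
    def det_head_on_right(h, d, words, tags):
        return tags[d] != "DT" or h > d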
44. Weighted Constraint Parsing
- A robust parser which uses soft constraints
- Each constraint is assigned a weight between 0.0 and 1.0
  - Weight 0.0: a hard constraint, which can only be violated when no other parse is possible
- Constraint weights are assigned manually (or estimated from a treebank)
- Efficiency: uses a heuristic transformation-based constraint resolution method
45. Transformation-Based Constraint Resolution
- Heuristic search
- Very efficient
- Idea: first construct an arbitrary dependency structure, then try to correct errors
- Error correction by transformations
- Selection of transformations is based on the constraints that cause conflicts
- Anytime property: the parser maintains a complete analysis at all times, so it can be stopped at any time and return a complete analysis
46. Parsing Methods
- Three main traditions
  - Dynamic programming
    - CYK, Eisner, McDonald
  - Constraint satisfaction
    - Maruyama, Foth et al., Duchier
  - Deterministic parsing
    - Covington, Yamada and Matsumoto, Nivre
47. Deterministic Parsing
- Basic idea:
  - Derive a single syntactic representation (dependency graph) through a deterministic sequence of elementary parsing actions
  - Sometimes combined with backtracking or repair
- Motivation:
  - Psycholinguistic modeling
  - Efficiency
  - Simplicity
48. Covington's Incremental Algorithm
- Deterministic incremental parsing in O(n^2) time by trying to link each new word to each preceding one [Covington 2001]
- PARSE(x = (w1, ..., wn))
    for i = 1 up to n
      for j = i - 1 down to 1
        LINK(wi, wj)
- Different conditions, such as Single-Head and Projectivity, can be incorporated into the LINK operation (a sketch follows below).
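A minimal runnable sketch of this strategy. The attach_ok(head, dep, heads) function is a stand-in for whatever grammar or classifier decides whether to add an arc; Single-Head is enforced here, Projectivity is not.

    # Sketch of Covington-style incremental linking.
    def covington_parse(words, attach_ok):
        heads = [None] * len(words)            # heads[i] = position of the head of word i
        for i in range(1, len(words)):
            for j in range(i - 1, -1, -1):     # try to link the new word i to each preceding word j
                if heads[i] is None and attach_ok(j, i, heads):
                    heads[i] = j               # arc j -> i
                elif heads[j] is None and attach_ok(i, j, heads):
                    heads[j] = i               # arc i -> j
        return heads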
49. Shift-Reduce Type Algorithms
- Data structures:
  - Stack [..., wi]S of partially processed tokens
  - Queue [wj, ...]Q of remaining input tokens
- Parsing actions are built from atomic actions:
  - Adding arcs (wi → wj, wi ← wj)
  - Stack and queue operations
- Left-to-right parsing
- Restricted to projective dependency graphs
50. Yamada's Algorithm
- Three parsing actions:
  - Shift: [...]S [wi, ...]Q ⇒ [..., wi]S [...]Q
  - Left: [..., wi, wj]S [...]Q ⇒ [..., wi]S [...]Q, adding wi → wj
  - Right: [..., wi, wj]S [...]Q ⇒ [..., wj]S [...]Q, adding wi ← wj
- Multiple passes over the input give time complexity O(n^2)
51. Yamada and Matsumoto
- Parsing in several rounds: deterministic, bottom-up, O(n^2)
- Looks at pairs of words
- Three actions: shift, left, right
- Shift: shifts the focus to the next word pair
52. Yamada and Matsumoto
- Left: decides that the left word depends on the right one
- Right: decides that the right word depends on the left word
53. Parsing Algorithm
- Go through each pair of words
- Decide which action to take
- If a relation was detected in a pass, do another pass
- E.g. "the little girl"
  - First pass: relation between "little" and "girl"
  - Second pass: relation between "the" and "girl"
- The decision on the action depends on the word pair and its context (see the sketch below)
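A rough sketch of this multi-pass, pairwise strategy (my own illustration; to sidestep the Left/Right naming, the stand-in decide() simply reports which word of the pair is the head, or asks to shift):

    # Sketch of multi-pass pairwise deterministic parsing.
    def pairwise_parse(words, decide):
        heads = [None] * len(words)
        pending = list(range(len(words)))        # words that still need a head
        changed = True
        while changed and len(pending) > 1:
            changed = False
            i = 0
            while i < len(pending) - 1:
                left, right = pending[i], pending[i + 1]
                action = decide(left, right, heads, words)   # "head_left", "head_right" or "shift"
                if action == "head_left":        # the right word depends on the left word
                    heads[right] = left
                    pending.pop(i + 1)
                    changed = True
                elif action == "head_right":     # the left word depends on the right word
                    heads[left] = right
                    pending.pop(i)
                    changed = True
                else:                            # shift: move the focus to the next pair
                    i += 1
        return heads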
54. Nivre's Algorithm
- Four parsing actions:
  - Shift: [...]S [wi, ...]Q ⇒ [..., wi]S [...]Q
  - Reduce: [..., wi]S [...]Q ⇒ [...]S [...]Q, provided ∃wk : wk → wi
  - Left-Arc: [..., wi]S [wj, ...]Q ⇒ [...]S [wj, ...]Q, adding wi ← wj, provided ¬∃wk : wk → wi
  - Right-Arc: [..., wi]S [wj, ...]Q ⇒ [..., wi, wj]S [...]Q, adding wi → wj, provided ¬∃wk : wk → wj
55. Nivre's Algorithm
- Characteristics:
  - Arc-eager processing of right-dependents
  - A single pass over the input gives worst-case time complexity O(2n), i.e. linear time
- (A runnable sketch follows below.)
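A compact runnable sketch of these transitions, under the assumption that next_action is some oracle or trained classifier; preconditions are checked, with a fallback to Shift when an action is not applicable.

    # Sketch of Nivre's arc-eager transition system.
    def arc_eager_parse(words, next_action):
        stack, queue = [0], list(range(1, len(words)))   # token 0 is an artificial ROOT
        heads = [None] * len(words)
        while queue:
            action = next_action(stack, queue, heads, words)
            if action == "left-arc" and stack[-1] != 0 and heads[stack[-1]] is None:
                heads[stack[-1]] = queue[0]              # queue front governs stack top
                stack.pop()
            elif action == "right-arc" and heads[queue[0]] is None:
                heads[queue[0]] = stack[-1]              # stack top governs queue front
                stack.append(queue.pop(0))
            elif action == "reduce" and heads[stack[-1]] is not None:
                stack.pop()
            else:                                        # shift (also the fallback)
                stack.append(queue.pop(0))
        return heads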
56-72. Example
- Sentence: _ROOT_ Red figures on the screen indicated falling stocks
- S is initialized with _ROOT_, Q with the input words
- Transition sequence traced on the slides: Shift, Left-arc, Shift, Right-arc, Shift, Left-arc, Right-arc, Reduce, Reduce, Left-arc, Right-arc, Shift, Left-arc, Right-arc, Reduce, Reduce
73. Classifier-Based Parsing
- Data-driven deterministic parsing
  - Deterministic parsing requires an oracle.
  - An oracle can be approximated by a classifier.
  - A classifier can be trained using treebank data.
- Learning algorithms
  - Support vector machines (SVM) [Kudo and Matsumoto 2002, Yamada and Matsumoto 2003, Isozaki et al. 2004, Cheng et al. 2004, Nivre et al. 2006]
  - Memory-based learning (MBL) [Nivre et al. 2004, Nivre and Scholz 2004]
  - Maximum entropy modeling (MaxEnt) [Cheng et al. 2005]
74. Feature Models
- Learning problem:
  - Approximate a function from parser states, represented by feature vectors, to parser actions, given a training set of gold standard derivations.
- Typical features: tokens and POS tags of
  - Target words
  - Linear context (neighbors in S and Q)
  - Structural context (parents, children, siblings in G)
    - Cannot be used in dynamic programming algorithms.
- (A feature-extraction sketch follows below.)
75. Summary
- Provided an introduction to dependency parsing and various dependency parsing algorithms
- Read up on Nivre's and McDonald's tutorial on dependency parsing at ESSLLI '07