1
Dependency Parsing: Parsing Algorithms
  • Prashanth Mannem
  • LTRC, IIIT-Hyd
  • prashanth@research.iiit.ac.in

2
Outline
  • Introduction
    • Phrase Structure Grammar
    • Dependency Grammar
    • Comparison and Conversion
  • Dependency Parsing
    • Formal definition
  • Parsing Algorithms
    • Introduction
    • Dynamic programming
    • Constraint satisfaction
    • Deterministic search

3
Introduction
  • The syntactic parsing of a sentence consists of
    finding the correct syntactic structure of that
    sentence in a given formalism/grammar.
  • Dependency Grammar (DG) and Phrase Structure
    Grammar (PSG) are two such formalisms.

4
Phrase Structure Grammar (PSG)
  • Breaks the sentence into constituents (phrases),
    which are then broken into smaller constituents
  • Describes phrase structure and clause structure
  • e.g. NP, PP, VP, etc.
  • Structures are often recursive
  • e.g. "the clever tall blue-eyed old man"

5
Phrase Structure Tree Example
6
Dependency Grammar
  • Syntactic structure consists of lexical items,
    linked by binary asymmetric relations called
    dependencies
  • Interested in grammatical relations between
    individual words (governing and dependent words)
  • Does not propose a recursive structure, but
    rather a network of relations
  • These relations can also have labels

7
Dependency Tree Example
8
Dependency Tree Example
  • Phrasal nodes are missing in the dependency
    structure when compared to constituency
    structure.

9
Dependency Tree with Labels
10
Comparison
  • Dependency structures explicitly represent
  • Head-dependent relations (directed arcs)
  • Functional categories (arc labels)
  • Possibly some structural categories
    (parts-of-speech)
  • Phrase structures explicitly represent
  • Phrases (non-terminal nodes)
  • Structural categories (non-terminal labels)
  • Possibly some functional categories (grammatical
    functions)

11
Conversion PSG to DG
  • The head of a phrase governs/dominates its
    siblings
  • Heads are identified using heuristics (head rules)
  • Dependency relations are established between the
    head of each phrase, as the parent, and the heads
    of its sister constituents, as children (see the
    sketch below)
  • The tree thus formed is the unlabeled dependency
    tree

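A minimal sketch of this conversion in Python (my own illustration, not the presenter's code; the nested-tuple tree format and the head_child() heuristic are hypothetical stand-ins for real head rules):

  # Sketch: convert a phrase-structure tree to an unlabeled dependency tree.
  # A tree node is (label, [children]); a leaf is a word string.
  # head_child() is a toy stand-in for real head-finding heuristics (e.g. Collins-style head rules).

  def head_child(label, children):
      if label == "S":                               # S is headed by its VP
          for k, c in enumerate(children):
              if not isinstance(c, str) and c[0] == "VP":
                  return k
      if label == "NP":                              # NPs by their last child
          return len(children) - 1
      return 0                                       # everything else by its first child

  def lexical_head(tree):
      # the lexical head word of a (sub)tree
      if isinstance(tree, str):
          return tree
      label, children = tree
      return lexical_head(children[head_child(label, children)])

  def psg_to_deps(tree, deps=None):
      # the head of each phrase becomes the parent of the heads of its sister constituents
      if deps is None:
          deps = []
      if isinstance(tree, str):
          return deps
      label, children = tree
      head = lexical_head(tree)
      for k, child in enumerate(children):
          if k != head_child(label, children):
              deps.append((head, lexical_head(child)))   # (parent, child)
          psg_to_deps(child, deps)
      return deps

  tree = ("S", [("NP", ["John"]), ("VP", ["saw", ("NP", ["a", "dog"])])])
  print(psg_to_deps(tree))    # [('saw', 'John'), ('saw', 'dog'), ('dog', 'a')]
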
12
Phrase Structure Tree with Heads
13
Dependency Tree
14
Conversion DG to PSG
  • Each head together with its dependents (and their
    dependents) forms a constituent of the sentence
    (see the sketch below)
  • It is difficult to assign structural categories
    (NP, VP, S, PP, etc.) to these derived
    constituents
  • Every projective dependency grammar has a
    strongly equivalent context-free grammar, but not
    vice versa [Gaifman 1965]

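A minimal sketch of the reverse direction (my own illustration; the dict-of-dependents representation is hypothetical): each head together with everything it transitively dominates yields one unlabeled constituent.

  # Sketch: each head plus everything it transitively dominates yields one constituent.
  # children maps a head index to its dependent indices; words is the sentence.

  def dominated(head, children):
      span = {head}
      for dep in children.get(head, []):
          span |= dominated(dep, children)
      return span

  def dg_to_constituents(words, children):
      # one unlabeled constituent per head; categories (NP, VP, ...) are not recoverable
      spans = []
      for head in range(len(words)):
          idxs = sorted(dominated(head, children))
          spans.append(" ".join(words[i] for i in idxs))
      return spans

  words = ["John", "saw", "a", "dog"]
  children = {1: [0, 3], 3: [2]}          # saw -> John, dog; dog -> a
  print(dg_to_constituents(words, children))
  # ['John', 'John saw a dog', 'a', 'a dog']

For a projective tree, every such index set is a contiguous substring of the sentence.
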
15
Learning DG over PSG
  • Dependency Parsing is more straightforward
  • Parsing can be reduced to labeling each token wi
    with its head wj
  • Direct encoding of predicate-argument structure
  • Fragments are directly interpretable
  • Dependency structure independent of word order
  • Suitable for free word order languages (like
    Indian languages)

16
Outline
  • Introduction
    • Phrase Structure Grammar
    • Dependency Grammar
    • Comparison and Conversion
  • Dependency Parsing
    • Formal definition
  • Parsing Algorithms
    • Introduction
    • Dynamic programming
    • Constraint satisfaction
    • Deterministic search

17
Dependency Tree
  • Formal definition
  • An input word sequence w1 ... wn
  • Dependency graph D = (W, E) where
  • W is the set of nodes, i.e. the word tokens in the
    input sequence
  • E is the set of unlabeled tree edges (wi, wj)
    with wi, wj ∈ W
  • (wi, wj) indicates an edge from wi (parent) to wj
    (child)
  • The task of mapping an input string to a dependency
    graph satisfying certain conditions is called
    dependency parsing

18
Well-formedness
  • A dependency graph is well-formed iff
  • Single head: each word has exactly one head.
  • Acyclic: the graph is acyclic.
  • Connected: the graph is a single tree containing
    all the words in the sentence.
  • Projective: if word A depends on word B, then all
    words between A and B are also subordinate to B
    (i.e. dominated by B).
  • (A small checker for these conditions is sketched
    below.)

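A small sketch that checks these four conditions on an unlabeled tree, with heads given as a list and 0 standing for the artificial root (my own illustration, not from the slides):

  # Sketch: check the four well-formedness conditions.
  # head[i] is the head of word i+1 (words are 1-based); head value 0 is the artificial root.

  def is_well_formed(head):
      n = len(head)
      if head.count(0) != 1:                 # connected: exactly one word attaches to the root
          return False
      # single head is guaranteed by the representation (one head entry per word);
      # acyclicity: following head links from every word must reach the root
      def reaches_root(j):
          seen = set()
          while j != 0:
              if j in seen:                  # cycle
                  return False
              seen.add(j)
              j = head[j - 1]
          return True
      if not all(reaches_root(i) for i in range(1, n + 1)):
          return False
      # projectivity: every word strictly between a head h and its dependent d
      # must be dominated by h
      def dominated_by(h, j):
          while True:
              if j == h:
                  return True
              if j == 0:
                  return False
              j = head[j - 1]
      for d in range(1, n + 1):
          h = head[d - 1]
          for between in range(min(h, d) + 1, max(h, d)):
              if not dominated_by(h, between):
                  return False
      return True

  # "Ram saw a dog yesterday": Ram->saw, saw->root, a->dog, dog->saw, yesterday->saw
  print(is_well_formed([2, 0, 4, 2, 2]))     # True (and projective)
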
19
Non-projective dependency tree
Example: "Ram saw a dog yesterday which was a Yorkshire Terrier"
(The relative clause depends on "dog", so its arc crosses the arc to "yesterday": crossing lines.)
English has very few non-projective cases.
20
Outline
  • Introduction
    • Phrase Structure Grammar
    • Dependency Grammar
    • Comparison and Conversion
  • Dependency Parsing
    • Formal definition
  • Parsing Algorithms
    • Introduction
    • Dynamic programming
    • Constraint satisfaction
    • Deterministic search

21-22
Dependency Parsing
  • Dependency based parsers can be broadly
    categorized into
  • Grammar driven approaches
  • Parsing done using grammars
  • Data driven approaches
  • Parsing by training on annotated/un-annotated
    data
  • These approaches are not mutually exclusive

23
Parsing Methods
  • Three main traditions
  • Dynamic programming
  • CYK, Eisner, McDonald
  • Constraint satisfaction
  • Maruyama, Foth et al., Duchier
  • Deterministic search
  • Covington, Yamada and Matsumoto, Nivre

24
Dynamic Programming
  • Basic idea: treat dependencies as constituents
  • Use, e.g., a CYK parser (with minor modifications)

25
Dependency Chart Parsing
  • Grammar is regarded as context-free, in which
    each node is lexicalized
  • Chart entries are subtrees, i.e., words with all
    their left and right dependents
  • Problem: different entries for different subtrees
    spanning a sequence of words with different heads
  • Time requirement: O(n⁵)

26
Generic Chart Parsing
Slide from Eisner, 1997
  • for each of the O(n²) substrings,
  • for each of O(n) ways of splitting it,
  • for each of S analyses of the first half,
  • for each of S analyses of the second half,
  • for each of c ways of combining them:
  • combine, and add the result to the chart if it is
    the best

Total: O(n³ S² c)
Figure: "cap spending at 300 million" is split into
two halves with S analyses each; combining gives
cS² analyses, of which we keep S.
27
Headed constituents ...
Slide from Eisner, 1997
  • ... have too many signatures.
  • How bad is Θ(n³ S² c)?
  • For unheaded constituents, S is constant: NP,
    VP, ... (similarly for dotted trees). So Θ(n³).
  • But when different heads ⇒ different signatures,
    the average substring has Θ(n) possible heads and
    S = Θ(n) possible signatures. So Θ(n⁵).

28
Dynamic Programming Approaches
  • Original version: Hays 1964 (grammar driven)
  • Link grammar: Sleator and Temperley 1991
    (grammar driven)
  • Bilexical grammar: Eisner 1996 (data driven)
  • Maximum spanning tree: McDonald 2006 (data
    driven)

29
Eisner 1996
  • Two novel aspects
  • Modified parsing algorithm
  • Probabilistic dependency parsing
  • Time requirement: O(n³) (see the sketch below)
  • Modification: instead of storing subtrees, store
    spans
  • Span: a substring such that no interior word links
    to any word outside the span
  • Idea: in a span, only the boundary words are
    active, i.e. still need a head or a child
  • One or both of the boundary words can be active

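The O(n³) idea is easiest to see in the modern arc-factored formulation; the sketch below (my own simplification, not Eisner's original probabilistic parser) finds the score of the best projective tree for a given matrix of arc scores:

  NEG = float("-inf")

  # Sketch: arc-factored projective decoding in O(n^3).
  # score[h][d] is the score of an arc from head h to dependent d; token 0 is the root.

  def eisner_best_score(score):
      n = len(score)
      # C_* = complete spans, I_* = incomplete spans; R = head at the left end, L = head at the right end
      C_R = [[NEG] * n for _ in range(n)]
      C_L = [[NEG] * n for _ in range(n)]
      I_R = [[NEG] * n for _ in range(n)]
      I_L = [[NEG] * n for _ in range(n)]
      for s in range(n):
          C_R[s][s] = C_L[s][s] = 0.0
      for length in range(1, n):
          for s in range(n - length):
              t = s + length
              # add an arc between the two end words of the span
              best = max(C_R[s][q] + C_L[q + 1][t] for q in range(s, t))
              I_R[s][t] = best + score[s][t]       # arc s -> t
              I_L[s][t] = best + score[t][s]       # arc t -> s
              # complete the span on either side
              C_R[s][t] = max(I_R[s][q] + C_R[q][t] for q in range(s + 1, t + 1))
              C_L[s][t] = max(C_L[s][q] + I_L[q][t] for q in range(s, t))
      return C_R[0][n - 1]                          # score of the best projective tree rooted at 0

  # toy arc scores for "_ROOT_ figures indicated"
  scores = [[NEG, 1.0, 5.0],
            [NEG, NEG, 1.0],
            [NEG, 6.0, NEG]]
  print(eisner_best_score(scores))                  # 11.0: arcs _ROOT_ -> indicated -> figures
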
30
Example
Sentence: Red figures on the screen indicated falling stocks (with an artificial _ROOT_ node)
31
Example
Sentence: Red figures on the screen indicated falling stocks (with an artificial _ROOT_ node)
  • Example spans: [Red figures], [indicated falling stocks]
32
Assembly of correct parse
Sentence: Red figures on the screen indicated falling stocks (with an artificial _ROOT_ node)
Start by combining adjacent words into minimal spans:
[Red figures], [figures on], [on the], ...
33
Assembly of correct parse
Combine spans which overlap in one word; this
word must be governed by a word in the left or
right span:
[on the] + [the screen] → [on the screen]
34
Assembly of correct parse
Combine spans which overlap in one word; this
word must be governed by a word in the left or
right span:
[figures on] + [on the screen] → [figures on the screen]
35
Assembly of correct parse
Combine spans which overlap in one word; this
word must be governed by a word in the left or
right span.
Invalid span: [Red figures on the screen]
36
Assembly of correct parse
Combine spans which overlap in one word; this
word must be governed by a word in the left or
right span:
[indicated falling] + [falling stocks] → [indicated falling stocks]
37
Eisner's Model
  • Recursive generation
  • Each word generates its actual dependents
  • Two Markov chains:
  • Left dependents
  • Right dependents

38
Eisner's Model
  • where
  • tw(i) is the ith tagged word
  • lc(i) and rc(i) are the left and right children of
    the ith word

where lc_j(i) is the jth left child of the ith
word and t(lc_{j-1}(i)) is the tag of the preceding
left child
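
The probability formulas on this slide and the previous one were images and did not survive the text extraction. A plausible reconstruction in the spirit of Eisner (1996), in which each head generates its left and right dependents as first-order Markov chains conditioned on the head and the previously generated sibling (an approximation, not a verbatim copy of the slide):

  P(\text{tree}) = \prod_i P\big(lc(i) \mid tw(i)\big) \cdot P\big(rc(i) \mid tw(i)\big)

  P\big(lc(i) \mid tw(i)\big) = \prod_j P\big(tw(lc_j(i)) \mid t(lc_{j-1}(i)),\, tw(i)\big)

and symmetrically for the right children rc(i).
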
39
McDonald's Maximum Spanning Trees
  • Score of a dependency tree = sum of the scores of
    its dependencies
  • Scores are independent of the other dependencies
  • If scores are available, parsing can be
    formulated as a maximum spanning tree problem
    (see the sketch below)
  • Two cases
  • Projective: use Eisner's parsing algorithm
  • Non-projective: use the Chu-Liu-Edmonds algorithm
    [Chu and Liu 1965, Edmonds 1967]
  • Uses online learning for determining the weight
    vector w

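A minimal sketch of the arc-factored scoring idea (my own toy illustration; toy_features and the weight vector are hypothetical stand-ins, and Chu-Liu-Edmonds cycle handling is only noted in a comment, not implemented):

  import numpy as np

  # Sketch: arc-factored scoring for MST parsing; score(i, j) = w . f(i, j) and the score of
  # a tree is the sum of its arc scores.

  def toy_features(words, i, j):
      # hypothetical stand-in for a real feature function over head i and dependent j
      return np.array([1.0 if j == i + 1 else 0.0,    # adjacency
                       1.0 / abs(i - j)])             # distance

  def arc_scores(words, w, f=toy_features):
      n = len(words)
      S = np.full((n, n), -np.inf)
      for i in range(n):
          for j in range(1, n):                       # token 0 is the root, never a dependent
              if i != j:
                  S[i, j] = w @ f(words, i, j)
      return S

  def greedy_heads(S):
      # Pick the best head for each word independently. If this creates a cycle, the
      # Chu-Liu-Edmonds algorithm contracts the cycle and repeats (not shown); for projective
      # decoding one runs Eisner's algorithm (slide 29) on S instead.
      return [int(np.argmax(S[:, j])) for j in range(1, S.shape[1])]

  words = ["_ROOT_", "Red", "figures", "indicated", "stocks"]
  S = arc_scores(words, w=np.array([2.0, 1.0]))
  print(greedy_heads(S))        # [0, 1, 2, 3]: a head index for each of the four words
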
40
Parsing Methods
  • Three main traditions
  • Dynamic programming
  • CYK, Eisner, McDonald
  • Constraint satisfaction
  • Maruyama, Foth et al., Duchier
  • Deterministic parsing
  • Covington, Yamada and Matsumoto, Nivre

41
Constraint Satisfaction
  • Uses Constraint Dependency Grammar
  • Grammar consists of a set of boolean constraints,
    i.e. logical formulas that describe well-formed
    trees
  • A constraint is a logical formula with variables
    that range over a set of predefined values
  • Parsing is defined as a constraint satisfaction
    problem
  • Constraint satisfaction removes values that
    contradict constraints

42
Constraint Satisfaction
  • Parsing is an eliminative process rather than a
    constructive one such as in CFG parsing
  • Constraint satisfaction in general is NP-complete
  • Parser design must ensure practical efficiency
  • Different approaches
  • Constraint propagation techniques which ensure
    local consistency [Maruyama 1990]
  • Weighted CDG [Foth et al. 2000, Menzel and
    Schroder 1998]

43
Maruyama's Constraint Propagation
  • Three steps (a toy sketch follows below)
  • 1) Form an initial constraint network using a core
    grammar
  • 2) Remove local inconsistencies
  • 3) If ambiguity remains, add new constraints and
    repeat step 2

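A toy sketch of the eliminative view (my own illustration, not Maruyama's actual procedure): the head of each word is a variable whose domain is the set of other positions, and constraints prune domain values.

  # Sketch: parsing as constraint satisfaction (toy version).
  # Variable: the head of each word; domain: all other positions (0 = artificial root).

  def initial_domains(n):
      return {i: {h for h in range(n + 1) if h != i} for i in range(1, n + 1)}

  def propagate(domains, tags):
      # toy, hypothetical constraints: a determiner's head lies to its right;
      # only a verb may attach to the root
      changed = True
      while changed:
          changed = False
          for i, dom in domains.items():
              keep = set()
              for h in dom:
                  if tags[i - 1] == "DET" and h < i:
                      continue
                  if h == 0 and tags[i - 1] != "VERB":
                      continue
                  keep.add(h)
              if keep != dom:
                  domains[i] = keep
                  changed = True
      return domains

  tags = ["DET", "NOUN", "VERB"]          # "the girl slept"
  print(propagate(initial_domains(3), tags))
  # {1: {2, 3}, 2: {1, 3}, 3: {0, 1, 2}}; the ambiguity that remains is resolved
  # by adding new constraints and repeating (step 3)
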
44
Weighted Constraint Parsing
  • A robust parser which uses soft constraints
  • Each constraint is assigned a weight between 0.0
    and 1.0
  • Weight 0.0: a hard constraint that can only be
    violated when no other parse is possible
  • Constraints are assigned manually (or estimated
    from a treebank)
  • Efficiency: uses a heuristic transformation-based
    constraint resolution method

45
Transformation-Based Constraint Resolution
  • Heuristic search
  • Very efficient
  • Idea: first construct an arbitrary dependency
    structure, then try to correct errors
  • Error correction by transformations (see the
    sketch below)
  • Selection of transformations based on constraints
    that cause conflicts
  • Anytime property: the parser maintains a complete
    analysis at any time, so it can be stopped at any
    time and still return a complete analysis

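A rough sketch of the repair loop (my own illustration; the constraint set and its weight are hypothetical):

  # Sketch: transformation-based resolution with soft constraints (toy version).
  # Start from an arbitrary analysis, then repeatedly re-attach the word blamed by a
  # violated constraint, as long as the total penalty decreases.

  def penalty(heads, constraints):
      # constraints: list of (weight, test); test(heads) returns the offending word index or None
      return sum(w for w, test in constraints if test(heads) is not None)

  def repair(heads, candidates, constraints, max_steps=50):
      heads = list(heads)
      for _ in range(max_steps):
          best_pen, best_heads = penalty(heads, constraints), None
          for w, test in constraints:
              culprit = test(heads)
              if culprit is None:
                  continue
              for h in candidates[culprit]:             # try alternative heads for the culprit
                  trial = heads[:culprit] + [h] + heads[culprit + 1:]
                  p = penalty(trial, constraints)
                  if p < best_pen:
                      best_pen, best_heads = p, trial
          if best_heads is None:                        # no transformation improves the analysis
              break
          heads = best_heads
      return heads                                      # anytime: a complete analysis exists at every step

  words = ["the", "girl", "slept"]
  # toy soft constraint (weight 0.9, hypothetical): a determiner should attach to the word on its right
  constraints = [(0.9, lambda heads: 0 if heads[0] != 1 else None)]
  candidates = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
  print(repair([2, 2, 1], candidates, constraints))     # [1, 2, 1]: "the" is re-attached to "girl"
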
46
Parsing Methods
  • Three main traditions
  • Dynamic programming
  • CYK, Eisner, McDonald
  • Constraint satisfaction
  • Maruyama, Foth et al., Duchier
  • Deterministic parsing
  • Covington, Yamada and Matsumoto, Nivre

47
Deterministic Parsing
  • Basic idea
  • Derive a single syntactic representation
    (dependency graph) through a deterministic
    sequence of elementary parsing actions
  • Sometimes combined with backtracking or repair
  • Motivation
  • Psycholinguistic modeling
  • Efficiency
  • Simplicity

48
Covington's Incremental Algorithm
  • Deterministic incremental parsing in O(n²) time
    by trying to link each new word to each preceding
    one [Covington 2001]
  • PARSE(x = (w1, . . . , wn))
  • 1. for i = 1 up to n
  • 2.   for j = i − 1 down to 1
  • 3.     LINK(wi, wj)
  • Different conditions, such as Single-Head and
    Projectivity, can be incorporated into the LINK
    operation (a sketch follows below).

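A minimal sketch of Covington's loop (my own illustration; the LINK decision is a hypothetical stand-in oracle, and the projectivity condition is omitted):

  # Sketch: Covington's O(n^2) incremental algorithm.
  # For each new word w_i, try to link it to every preceding word w_j (right to left).

  def covington_parse(words, link):
      # link(i, j, heads) returns "i-heads-j", "j-heads-i", or None (no arc)
      heads = [None] * len(words)              # heads[k] = index of the head of word k
      for i in range(1, len(words)):
          for j in range(i - 1, -1, -1):
              decision = link(i, j, heads)
              if decision == "j-heads-i" and heads[i] is None:    # single-head condition on w_i
                  heads[i] = j
              elif decision == "i-heads-j" and heads[j] is None:  # single-head condition on w_j
                  heads[j] = i
      return heads

  # toy oracle (hypothetical): determiners attach to a following noun, nouns to a following verb
  def toy_link(i, j, heads, tags=("DET", "NOUN", "VERB")):
      if tags[j] == "DET" and tags[i] == "NOUN":
          return "i-heads-j"
      if tags[j] == "NOUN" and tags[i] == "VERB":
          return "i-heads-j"
      return None

  print(covington_parse(["the", "girl", "slept"], toy_link))   # [1, 2, None]  (slept is the root)
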
49
Shift-Reduce Type Algorithms
  • Data structures:
  • Stack [. . . , wi]S of partially processed tokens
  • Queue [wj , . . .]Q of remaining input tokens
  • Parsing actions built from atomic actions:
  • Adding arcs (wi → wj , wi ← wj)
  • Stack and queue operations
  • Left-to-right parsing
  • Restricted to projective dependency graphs

50
Yamada's Algorithm
  • Three parsing actions:
  • Shift:  [. . .]S [wi , . . .]Q
           ⇒  [. . . , wi]S [. . .]Q
  • Left:   [. . . , wi , wj]S [. . .]Q
           ⇒  [. . . , wi]S [. . .]Q       wi → wj
  • Right:  [. . . , wi , wj]S [. . .]Q
           ⇒  [. . . , wj]S [. . .]Q       wi ← wj
  • Multiple passes over the input give time
    complexity O(n²)

51
Yamada and Matsumoto
  • Parsing in several rounds: deterministic
    bottom-up, O(n²)
  • Looks at pairs of words
  • Three actions: shift, left, right
  • Shift: shifts focus to the next word pair

52
Yamada and Matsumoto
  • Left: decides that the left word depends on the
    right one
  • Right: decides that the right word depends on the
    left word

53
Parsing Algorithm
  • Go through each pair of words
  • Decide which action to take
  • If a relation was detected in a pass, do another
    pass
  • E.g. "the little girl"
  • First pass: relation between "little" and "girl"
  • Second pass: relation between "the" and "girl"
  • The decision on the action depends on the word
    pair and its context (see the sketch below)

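A simplified sketch of this pass structure (my own illustration, with a hypothetical toy decision rule in place of the SVM classifier):

  # Sketch: multi-pass, pairwise deterministic parsing in the style of Yamada and Matsumoto.
  # decide(words, a, b) looks at the word pair (a, b) and returns "left" (left word depends
  # on the right one), "right" (right depends on left), or "shift".

  def parse_rounds(words, decide):
      heads = {}
      remaining = list(range(len(words)))      # words that still need a head
      changed = True
      while changed and len(remaining) > 1:
          changed = False
          i = 0
          while i < len(remaining) - 1:
              a, b = remaining[i], remaining[i + 1]
              action = decide(words, a, b)
              if action == "left":             # a depends on b
                  heads[a] = b
                  del remaining[i]
                  changed = True
              elif action == "right":          # b depends on a
                  heads[b] = a
                  del remaining[i + 1]
                  changed = True
              else:                            # shift: move on to the next pair
                  i += 1
      return heads

  # toy decision rule (hypothetical): determiners and adjectives depend on a following noun
  def toy_decide(words, a, b,
                 tags={"the": "DET", "little": "ADJ", "girl": "NOUN"}):
      if tags[words[a]] in ("DET", "ADJ") and tags[words[b]] == "NOUN":
          return "left"
      return "shift"

  print(parse_rounds(["the", "little", "girl"], toy_decide))
  # {1: 2, 0: 2}: the first pass attaches "little" to "girl", the second attaches "the" to "girl"
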
54
Nivre's Algorithm
  • Four parsing actions:
  • Shift:     [. . .]S [wi , . . .]Q
              ⇒  [. . . , wi]S [. . .]Q
  • Reduce:    [. . . , wi]S [. . .]Q           ∃wk : wk → wi
              ⇒  [. . .]S [. . .]Q
  • Left-Arc:  [. . . , wi]S [wj , . . .]Q      ¬∃wk : wk → wi
              ⇒  [. . .]S [wj , . . .]Q         wi ← wj
  • Right-Arc: [. . . , wi]S [wj , . . .]Q      ¬∃wk : wk → wj
              ⇒  [. . . , wi , wj]S [. . .]Q    wi → wj

55
Nivre's Algorithm
  • Characteristics:
  • Arc-eager processing of right-dependents
  • A single pass over the input gives a worst-case
    time complexity of O(2n), i.e. linear in sentence
    length

56-72
Example
Sentence: Red figures on the screen indicated falling stocks (with an artificial _ROOT_ node)
Slides 56-72 step through the arc-eager parse of this sentence, showing the stack S
and the queue Q after each transition.
Transition sequence: Shift, Left-arc, Shift, Right-arc, Shift, Left-arc, Right-arc,
Reduce, Reduce, Left-arc, Right-arc, Shift, Left-arc, Right-arc, Reduce, Reduce
(The same transition sequence is replayed in the code sketch below.)
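
A compact sketch of the arc-eager transition system (my own illustration; here the transition sequence is supplied by hand, whereas a real parser predicts each action with a classifier, see slide 73):

  # Sketch: Nivre's arc-eager transition system (word indices; 0 is the artificial _ROOT_).
  def arc_eager(n_tokens, actions):
      S, Q, arcs = [0], list(range(1, n_tokens)), []
      has_head = set()
      for act in actions:
          if act == "shift":
              S.append(Q.pop(0))
          elif act == "reduce":                    # stack top already has a head
              assert S[-1] in has_head
              S.pop()
          elif act == "left-arc":                  # next input word becomes the head of the stack top
              assert S[-1] not in has_head
              arcs.append((Q[0], S.pop()))         # arc (head, dependent)
              has_head.add(arcs[-1][1])
          elif act == "right-arc":                 # stack top becomes the head of the next input word
              arcs.append((S[-1], Q[0]))
              has_head.add(Q[0])
              S.append(Q.pop(0))
      return arcs

  # "_ROOT_ Red figures on the screen indicated falling stocks" (tokens 0..8), with the
  # transition sequence from the example slides:
  acts = ["shift", "left-arc", "shift", "right-arc", "shift", "left-arc", "right-arc",
          "reduce", "reduce", "left-arc", "right-arc", "shift", "left-arc", "right-arc",
          "reduce", "reduce"]
  print(arc_eager(9, acts))
  # [(2, 1), (2, 3), (5, 4), (3, 5), (6, 2), (0, 6), (8, 7), (6, 8)]
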
73
Classifier-Based Parsing
  • Data-driven deterministic parsing
  • Deterministic parsing requires an oracle.
  • An oracle can be approximated by a classifier.
  • A classifier can be trained using treebank data.
  • Learning algorithms
  • Support vector machines (SVM) [Kudo and Matsumoto
    2002, Yamada and Matsumoto 2003, Isozaki et al.
    2004, Cheng et al. 2004, Nivre et al. 2006]
  • Memory-based learning (MBL) [Nivre et al. 2004,
    Nivre and Scholz 2004]
  • Maximum entropy modeling (MaxEnt) [Cheng et al.
    2005]

74
Feature Models
  • Learning problem:
  • Approximate a function from parser states
    (represented by feature vectors) to parser actions,
    given a training set of gold-standard derivations
  • Typical features: tokens and POS tags of
  • the target words
  • the linear context (neighbors in S and Q)
  • the structural context (parents, children,
    siblings in G)
  • (a small feature-extraction sketch follows below)
  • Such history-based features cannot be used in
    dynamic programming algorithms

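A small sketch of a feature extractor over an arc-eager parser state (my own illustration; the feature templates are hypothetical, not those of any particular parser):

  # Sketch: features for predicting the next transition from a parser state.
  # S and Q hold word indices; words/tags are token and POS sequences; arcs is the partial graph G.
  def extract_features(S, Q, words, tags, arcs):
      feats = {}
      if S:
          top = S[-1]
          feats["s0.word"], feats["s0.tag"] = words[top], tags[top]
          deps = [d for (h, d) in arcs if h == top]      # structural context in G
          if deps:
              feats["s0.leftdep.tag"] = tags[min(deps)]
      if Q:
          feats["q0.word"], feats["q0.tag"] = words[Q[0]], tags[Q[0]]
          if len(Q) > 1:                                 # linear context in Q
              feats["q1.tag"] = tags[Q[1]]
      return feats

  # Such feature dicts are fed to the SVM / MBL / MaxEnt classifier that approximates the oracle.
  print(extract_features([0, 2], [3, 4],
                         ["_ROOT_", "Red", "figures", "on", "the"],
                         ["ROOT", "ADJ", "NOUN", "ADP", "DET"],
                         [(2, 1)]))
  # {'s0.word': 'figures', 's0.tag': 'NOUN', 's0.leftdep.tag': 'ADJ',
  #  'q0.word': 'on', 'q0.tag': 'ADP', 'q1.tag': 'DET'}
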
75
Summary
  • Provided an introduction to dependency parsing and
    to various dependency parsing algorithms
  • Read up on Nivre's and McDonald's tutorial on
    dependency parsing at ESSLLI 07