PARSING - PowerPoint PPT Presentation

Title: Slide 1 Author: srini Last modified by: srini Created Date: 10/29/2005 4:47:42 PM Document presentation format: On-screen Show Company: AT&T Labs Research – PowerPoint PPT presentation

Slides: 52
Provided by: Sri675
Transcript and Presenter's Notes

1
PARSING
2
Analyzing Linguistic Units
Unit | Task | Formal Mechanism | Resulting Representation
Morphology | Analyze words into morphemes | Context dependency rules; FST composition | Morphological structure
Phonology | Analyze words into phonemes | Context dependency rules; FST composition | Phonemic structure
Syntax | Analyze sentences for syntactic relations between words | Grammars (CFGs); PDA; top-down, bottom-up, Earley, CKY parsing | Parse tree, derivation tree
  • Why should we parse a sentence?
  • to detect relations among words
  • used to normalize surface syntactic variations.
  • invaluable for a number of NLP applications

3
Some Concepts
  • Grammar: A generative device that prescribes a
    set of valid strings.
  • Parser: A device that uncovers the sequence of
    grammar rules that might have generated the input
    sentence.
  • Input: grammar, sentence
  • Output: parse tree, derivation tree
  • Recognizer: A device that returns "yes" if the
    input string could be generated by the grammar.
  • Input: grammar, sentence
  • Output: boolean

4
Searching for a Parse
  • Grammar rewrite procedure encodes
  • all strings generated by the grammar, L(G)
  • all parse trees for each string s generated:
    T(G) = ∪_s T_s(G)
  • Given an input sentence I, the set of parse
    trees is T_I(G).
  • Parsing is searching for T_I(G) ⊆ T(G)
  • Ideally, the parser finds the appropriate parse for
    the sentence.

5
CFG for Fragment of English
S → NP VP
S → Aux NP VP
S → VP
NP → Det Nom
NP → PropN
Nom → N Nom
Nom → N
Nom → Nom PP
VP → V NP
VP → V
PP → Prep NP
N → book | flight | meal | money
V → book | include | prefer
Aux → does
Prep → from | to | on
PropN → Houston | TWA
Det → that | this | a
[Figure: parse tree for "Book that flight" (S → VP; VP → V NP; V → Book; NP → Det Nom; Det → that; Nom → N; N → flight), with the labels "Bottom-up Parsing" and "Top-down Parsing"]
6
Top-down/Bottom-up Parsing
Aspect | Top-down (recursive descent parser) | Bottom-up (shift-reduce parser)
Starts from | S (goal) | Words (input)
Algorithm (parallel) | a. Pick non-terminals. b. Pick rules from the grammar to expand the non-terminals. | a. Match a sequence of input symbols with the RHS of some rule. b. Replace the sequence by the LHS of the matching rule.
Termination | Success: when the leaves of a tree match the input. Failure: no more non-terminals to expand in any of the trees. | Success: when S is reached. Failure: no more rewrites possible.
Pros/Cons | Pro: goal-driven, starts with S. Con: constructs trees that may not match the input. | Pro: constrained by the input string. Con: constructs constituents that may not lead to the goal S.
  • Control strategy -- how to explore search space?
  • Pursue all parses in parallel, or backtrack, or
    ...?
  • Which rule to apply next?
  • Which node to expand next?
  • Look at how top-down and bottom-up parsing
    work on the board for "Book that flight"
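
The bottom-up (shift-reduce) strategy can be sketched as a search over (stack, remaining input) states. This is a minimal illustration, not the deck's own implementation: the names `RULES` and `shift_reduce_recognize` are hypothetical, the grammar is the toy English fragment written out as (LHS, RHS) pairs, and the control strategy chosen here is exhaustive depth-first search with a visited set.

```python
# Rules as (LHS, RHS) pairs for the toy English fragment.
RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nom")), ("NP", ("PropN",)),
    ("Nom", ("N", "Nom")), ("Nom", ("N",)), ("Nom", ("Nom", "PP")),
    ("VP", ("V", "NP")), ("VP", ("V",)),
    ("PP", ("Prep", "NP")),
    ("Det", ("that",)), ("Det", ("this",)), ("Det", ("a",)),
    ("N", ("book",)), ("N", ("flight",)), ("N", ("meal",)), ("N", ("money",)),
    ("V", ("book",)), ("V", ("include",)), ("V", ("prefer",)),
    ("Aux", ("does",)), ("Prep", ("from",)), ("Prep", ("to",)), ("Prep", ("on",)),
    ("PropN", ("Houston",)), ("PropN", ("TWA",)),
]


def shift_reduce_recognize(words, rules=RULES, start="S"):
    """Return True if some sequence of shifts and reductions reaches `start`."""
    agenda = [((), tuple(words))]       # stack of (parser stack, remaining input)
    seen = set()                        # states already explored
    while agenda:
        stack, rest = agenda.pop()
        if (stack, rest) in seen:
            continue
        seen.add((stack, rest))
        if stack == (start,) and not rest:
            return True                 # success: S reached, input consumed
        # Reduce: replace a stack suffix that matches some RHS by its LHS.
        for lhs, rhs in rules:
            k = len(rhs)
            if k <= len(stack) and stack[-k:] == rhs:
                agenda.append((stack[:-k] + (lhs,), rest))
        # Shift: move the next input word onto the stack.
        if rest:
            agenda.append((stack + (rest[0],), rest[1:]))
    return False                        # failure: no more rewrites possible
```

Because "book" is both an N and a V in this grammar, the search builds some constituents that never lead to S, which is the "Con" listed for bottom-up parsing.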

7
Top-down, Depth First, Left-to-Right parser
  • Systematic, incremental expansion of the search
    space.
  • In contrast to a parallel parser
  • Start State: (S, 0)
  • End State: (ε, n), where n is the length of the input to be
    parsed
  • Next State Rules
  • (w_{j+1} β, j) → (β, j+1)
  • (B β, j) → (γ β, j) if B → γ (note: B is the left-most
    non-terminal)
  • Agenda: A data structure to keep track of the
    states to be expanded.
  • Depth-first expansion if the Agenda is a stack.
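
The state rules above can be sketched as a small recognizer whose agenda is a Python list used as a stack. This is an illustrative sketch under stated assumptions: `GRAMMAR` and `top_down_recognize` are hypothetical names, and the grammar is the toy English fragment with the left-recursive rule Nom → Nom PP left out, since a depth-first top-down search would not terminate on it.

```python
# Toy grammar: non-terminal -> list of right-hand sides.
# Symbols that are not keys of the dict are treated as words (terminals).
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["that"], ["this"], ["a"]],
    "N": [["book"], ["flight"], ["meal"], ["money"]],
    "V": [["book"], ["include"], ["prefer"]],
    "Aux": [["does"]],
    "PropN": [["Houston"], ["TWA"]],
}


def top_down_recognize(words, grammar=GRAMMAR, start="S"):
    """Return True if `words` can be derived from `start`."""
    n = len(words)
    agenda = [((start,), 0)]            # start state (S, 0)
    while agenda:
        symbols, j = agenda.pop()       # LIFO pop gives depth-first expansion
        if not symbols:
            if j == n:                  # end state (empty symbol list, n)
                return True
            continue                    # ran out of symbols before the input ended
        first, rest = symbols[0], symbols[1:]
        if first in grammar:
            # Non-terminal: (B beta, j) -> (gamma beta, j) for each rule B -> gamma.
            # Push in reverse so the first rule in the grammar is tried first.
            for rhs in reversed(grammar[first]):
                agenda.append((tuple(rhs) + rest, j))
        elif j < n and words[j] == first:
            # Terminal matching the next word: (w beta, j) -> (beta, j+1).
            agenda.append((rest, j + 1))
    return False
```

Swapping the list for a `collections.deque` and popping from the left would turn the same loop into a breadth-first search, which corresponds to the parallel parser mentioned above.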

8
Fig 10.7
CFG
9
Left Corners
  • Can we help top-down parsers with some bottom-up
    information?
  • Unnecessary states are created if there are many B → γ
    rules.
  • If after successive expansions B ⇒* w δ and w
    does not match the input, then the series of
    expansions is wasted.
  • The leftmost symbol derivable from B needs to
    match the input.
  • Look ahead to the left-corner of the tree
  • B is a left-corner of A if A ⇒* B γ
  • Build a table with the left-corners of all
    non-terminals in the grammar and consult it before
    applying a rule
  • At a given point in state expansion (B β, j)
  • Pick the rule B → C γ if a left-corner of C matches
    the input w_{j+1}
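
One way to build the left-corner table is a fixed-point computation over the first symbols of each rule. The sketch below is an assumption-laden illustration: `left_corner_table` and `can_start` are hypothetical helper names, and the grammar is the toy English fragment restricted to its phrasal rules, so the left corners bottom out in part-of-speech categories.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"], ["Nom", "PP"]],
    "VP": [["V", "NP"], ["V"]],
    "PP": [["Prep", "NP"]],
}


def left_corner_table(grammar):
    """Map each non-terminal A to every symbol B that can begin a derivation from A."""
    # Direct left corners: the first symbol of every right-hand side.
    table = {a: {rhs[0] for rhs in rules if rhs} for a, rules in grammar.items()}
    changed = True
    while changed:                      # repeat until no set grows any more
        changed = False
        for corners in table.values():
            for b in list(corners):
                new = table.get(b, set()) - corners
                if new:
                    corners |= new      # left corners of a left corner are inherited
                    changed = True
    return table


def can_start(table, category, tag):
    """True if a constituent of type `category` may begin with part of speech `tag`."""
    return category == tag or tag in table.get(category, set())


TABLE = left_corner_table(GRAMMAR)
```

With this table, a top-down parser looking at the word "that" (a Det) can skip the rule S → Aux NP VP, because Det is not a left corner of Aux.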

10
Limitation of Top-down Parsing: Left Recursion
  • Depth-first search will never terminate if the
    grammar is left recursive (e.g. NP → NP PP)
  • Solutions
  • Rewrite the grammar to a weakly equivalent one
    which is not left-recursive
  • NP → NP PP
  • NP → Nom PP
  • NP → Nom
  • This may make rules unnatural
  • Fix depth of search explicitly
  • Other book-keeping needed in top-down parsing
  • Memoization for reusing previously parsed
    substrings
  • Packed representation for parse ambiguity

NP → Nom NP'
NP' → PP NP'
NP' → ε
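
The textbook rewrite for immediate left recursion turns A → A α | β into A → β A', A' → α A' | ε. The function below is a minimal sketch of that transformation; the name `remove_immediate_left_recursion` and the convention of naming the fresh non-terminal by appending a prime are assumptions made for illustration.

```python
def remove_immediate_left_recursion(nonterminal, rules):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon.

    `rules` is a list of right-hand sides (lists of symbols) for `nonterminal`.
    Returns a dict of rewritten rules; an empty list stands for epsilon.
    """
    # Right-hand sides that start with A itself, with the leading A stripped.
    recursive = [rhs[1:] for rhs in rules if rhs and rhs[0] == nonterminal]
    # All remaining right-hand sides.
    others = [rhs for rhs in rules if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: rules}     # nothing to rewrite
    fresh = nonterminal + "'"           # hypothetical name for the new symbol
    return {
        nonterminal: [rhs + [fresh] for rhs in others],
        fresh: [alpha + [fresh] for alpha in recursive] + [[]],
    }


NP_RULES = [["NP", "PP"], ["Nom", "PP"], ["Nom"]]
REWRITTEN = remove_immediate_left_recursion("NP", NP_RULES)
```

Applied mechanically to NP → NP PP | Nom PP | Nom, this yields NP → Nom PP NP' | Nom NP' and NP' → PP NP' | ε. The extra rule NP → Nom PP NP' is redundant, because Nom NP' already generates the same strings.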
11
Dynamic Programming for Parsing
  • Memoization
  • Create table of solutions to sub-problems (e.g.
    subtrees) as parse proceeds
  • Look up subtrees for each constituent rather than
    re-parsing
  • Since all parses implicitly stored, all available
    for later disambiguation
  • Examples: Cocke-Younger-Kasami (CYK) (1960),
    Graham-Harrison-Ruzzo (GHR) (1980) and Earley
    (1970) algorithms
  • Earley parser: an O(n^3) parser
  • Top-down parser with bottom-up information
  • State: (i, A → α · β, j)
  • j is the position in the string that has been
    parsed
  • i is the position in the string where A begins
  • Top-down prediction: S ⇒* w_1 … w_i A γ
  • Bottom-up completion: α w_{j+1} … w_n ⇒* w_{i+1} … w_n

12
Earley Parser
  • Data Structure: An (n+1)-cell array called the Chart
  • For each word position, the chart contains the set of
    states representing all partial parse trees
    generated to date.
  • E.g. chart[0] contains all partial parse trees
    generated at the beginning of the sentence
  • Chart entries represent three types of
    constituents
  • predicted constituents (top-down predictions)
  • in-progress constituents (we're in the midst of
    ...)
  • completed constituents (we've found ...)
  • Progress in the parse is represented by Dotted Rules
  • Position of the dot (·) indicates the type of constituent
  • 0 Book 1 that 2 flight 3
  • (0, S → · VP, 0) (predicting VP)
  • (1, NP → Det · Nom, 2) (finding NP)
  • (0, VP → V NP ·, 3) (found VP)

13
Earley Parser Parse Success
  • Final answer is found by looking at last entry in
    chart
  • If the entry resembles (0, S → α ·, n) then the input
    was parsed successfully
  • But note that chart will also contain a record
    of all possible parses of input string, given the
    grammar -- not just the successful one(s)
  • Why is this useful?

14
Earley Parsing Steps
  • Start State: (0, S' → · S, 0)
  • End State: (0, S → α ·, n), where n is the input size
  • Next State Rules
  • Scanner: read input
  • (i, A → α · w_{j+1} β, j) → (i, A → α w_{j+1} · β, j+1)
  • Predictor: add top-down predictions
  • (i, A → α · B β, j) → (j, B → · γ, j) if B → γ (note: B is the
    left-most non-terminal)
  • Completer: move the dot to the right when a new constituent
    is found
  • (i, B → α · A β, k) (k, A → γ ·, j) → (i, B → α A · β, j)
  • No backtracking and no states removed: keep a
    complete history of the parse
  • Why is this useful?
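
The three operations can be sketched as a compact recognizer. This is a minimal illustration, not a reference implementation: `GRAMMAR` and `earley_recognize` are hypothetical names, words are treated as terminals introduced by rules such as Det → that, and the sketch assumes a grammar without ε-rules. A tuple (i, lhs, rhs, dot) stored in `chart[j]` plays the role of the state (i, lhs → rhs[:dot] · rhs[dot:], j).

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"], ["PropN"]],
    "Nom": [["N", "Nom"], ["N"], ["Nom", "PP"]],
    "VP": [["V", "NP"], ["V"]],
    "PP": [["Prep", "NP"]],
    "Det": [["that"], ["this"], ["a"]],
    "N": [["book"], ["flight"], ["meal"], ["money"]],
    "V": [["book"], ["include"], ["prefer"]],
    "Aux": [["does"]],
    "Prep": [["from"], ["to"], ["on"]],
    "PropN": [["Houston"], ["TWA"]],
}


def earley_recognize(words, grammar, start="S"):
    """Return (accepted, chart) for `words` under `grammar`."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]          # n+1 cells, one per position

    def add(j, state):
        if state not in chart[j]:               # a state is never added twice
            chart[j].append(state)

    add(0, (0, "S'", (start,), 0))              # start state (0, S' -> . S, 0)
    for j in range(n + 1):
        for i, lhs, rhs, dot in chart[j]:       # chart[j] may grow while we loop
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:              # Predictor: expand the non-terminal
                    for gamma in grammar[nxt]:
                        add(j, (j, nxt, tuple(gamma), 0))
                elif j < n and words[j] == nxt: # Scanner: move the dot over the word
                    add(j + 1, (i, lhs, rhs, dot + 1))
            else:                               # Completer: advance waiting states
                for i2, lhs2, rhs2, dot2 in chart[i]:
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        add(j, (i2, lhs2, rhs2, dot2 + 1))
    return (0, "S'", (start,), 1) in chart[n], chart


ACCEPTED, CHART = earley_recognize("book that flight".split(), GRAMMAR)
```

The duplicate check in `add` is what lets the left-recursive rule Nom → Nom PP be predicted once per position instead of looping, and no state is ever removed, so the chart keeps the full history of the parse.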

15
Earley Parser Steps
Aspect | Scanner | Predictor | Completer
When does it apply? | When terminals are to the right of a dot, e.g. (0, VP → · V NP, 0) | When non-terminals are to the right of a dot, e.g. (0, S → · VP, 0) | When the dot reaches the end of a rule, e.g. (1, NP → Det Nom ·, 3)
Which chart cell is affected? | New states are added to the next cell | New states are added to the current cell | New states are added to the current cell
What goes into the chart cell? | Move the dot over the terminal: (0, VP → V · NP, 1) | One new state for each expansion of the non-terminal in the grammar: (0, VP → · V, 0), (0, VP → · V NP, 0) | One state for each rule waiting for the constituent, such as (0, VP → V · NP, 1) → (0, VP → V NP ·, 3)
16
Book that flight (Chart 0)
  • Seed chart with top-down predictions for S from
    grammar

17
CFG for Fragment of English
Det → that | this | a
N → book | flight | meal | money
V → book | include | prefer
Aux → does
Nom → N
Nom → N Nom
NP → PropN
VP → V
Nom → Nom PP
VP → V NP
PP → Prep NP
18
Chart[1]
V → book · is passed to the Completer, which finds 2
states in Chart[0] whose left corner is V and
adds them to Chart[1], moving the dots to the right
19
(No Transcript)
20
Retrieving the parses
  • Augment the Completer to add pointer to prior
    states it advances as a field in the current
    state
  • i.e. what states combined to arrive here?
  • Read the pointers back from the final state
  • What if the final cell does not have the final
    state? Error handling.
  • Is it a total loss? No...
  • Chart contains every constituent and combination
    of constituents possible for the input given the
    grammar
  • Useful for partial parsing or shallow parsing
    used in information extraction

21
Alternative Control Strategies
  • Change Earley top-down strategy to bottom-up or
    ...
  • Change to best-first strategy based on the
    probabilities of constituents
  • Compute and store probabilities of constituents
    in the chart as you parse
  • Then instead of expanding states in fixed order,
    allow probabilities to control order of expansion

22
Probabilistic and Lexicalized Parsing
23
Probabilistic CFGs
  • Weighted CFGs
  • Attach weights to rules of CFG
  • Compute weights of derivations
  • Use weights to pick preferred parses
  • Utility: pruning and ordering the search space,
    disambiguation, language model for ASR.
  • Parsing with weighted grammars (like weighted FA)
  • T* = argmax_T W(T, S)
  • Probabilistic CFGs are one form of weighted CFGs.

24
Probability Model
  • Rule Probability
  • Attach probabilities to grammar rules
  • Expansions for a given non-terminal sum to 1
  • R1: VP → V (.55)
  • R2: VP → V NP (.40)
  • R3: VP → V NP NP (.05)
  • Estimate the probabilities from annotated corpora:
    P(R1) = counts(R1) / counts(VP)
  • Derivation Probability
  • Derivation T = R1 … Rn
  • Probability of a derivation
  • Most probable parse
  • Probability of a sentence
  • Sum over all possible derivations for the
    sentence
  • Note the independence assumption: parse
    probability does not change based on where the
    rule is expanded.
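
Under the usual PCFG definitions, which are assumed here rather than taken from the slide, the rule estimate, the derivation probability, the most probable parse, and the sentence probability can be written as follows. The set T_S(G) denotes the parse trees of sentence S under grammar G.

```latex
\begin{align*}
P(R_1) &= \frac{\mathrm{count}(R_1)}{\mathrm{count}(\mathrm{VP})}
  && \text{(relative-frequency estimate)} \\
P(T) &= \prod_{i=1}^{n} P(R_i)
  && \text{(derivation } T = R_1 \dots R_n\text{)} \\
T^{*} &= \arg\max_{T \in T_S(G)} P(T)
  && \text{(most probable parse)} \\
P(S) &= \sum_{T \in T_S(G)} P(T)
  && \text{(sentence probability)}
\end{align*}
```

The product form is where the independence assumption enters: each rule contributes the same factor wherever it is used in the tree.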

25
Structural ambiguity
  • S → NP VP
  • VP → V NP
  • NP → NP PP
  • VP → VP PP
  • PP → P NP
  • NP → John | Mary | Denver
  • V → called
  • P → from

John called Mary from Denver
[Figure: parse tree for "John called Mary from Denver" in which the PP "from Denver" attaches to the NP "Mary"]
26
Cocke-Younger-Kasami Parser
  • Bottom-up parser with top-down filtering
  • Start State(s): (A, i, i+1) for each A → w_{i+1}
  • End State: (S, 0, n), where n is the input size
  • Next State Rules
  • (B, i, k) (C, k, j) → (A, i, j) if A → B C
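
These state rules map directly onto a chart indexed by spans. The following is a minimal sketch, assuming the small grammar used for the structural ambiguity example, already in binary form; `BINARY`, `LEXICON`, and `cky_chart` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Binary rules as (A, B, C) for A -> B C, plus a word -> categories lexicon.
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("NP", "NP", "PP"),
    ("VP", "VP", "PP"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "John": {"NP"}, "Mary": {"NP"}, "Denver": {"NP"},
    "called": {"V"}, "from": {"P"},
}


def cky_chart(words, binary=BINARY, lexicon=LEXICON):
    """Fill chart[i, j] with every category that spans words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    for i, word in enumerate(words):            # start states (A, i, i+1)
        chart[i, i + 1] |= lexicon.get(word, set())
    for width in range(2, n + 1):               # shorter spans before longer ones
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):           # every split point of the span
                for a, b, c in binary:
                    # (B, i, k) (C, k, j) -> (A, i, j) if A -> B C
                    if b in chart[i, k] and c in chart[k, j]:
                        chart[i, j].add(a)
    return chart


WORDS = "John called Mary from Denver".split()
CHART = cky_chart(WORDS)
ACCEPTED = "S" in CHART[0, len(WORDS)]          # end state (S, 0, n)
```

A recognizer only stores category sets, so the two attachments of "from Denver" collapse into the single VP entry for the span (1, 5). Recovering both parses would need back-pointers per entry.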

27
Example





John called Mary from Denver
28
Base Case: A → w
NP
P Denver
NP from
V Mary
NP called
John
29
Recursive Cases: A → B C
NP
P Denver
NP from
X V Mary
NP called
John
30
NP
P Denver
VP NP from
X V Mary
NP called
John
31
NP
X P Denver
VP NP from
X V Mary
NP called
John
32
PP NP
X P Denver
VP NP from
X V Mary
NP called
John
33
PP NP
X P Denver
S VP NP from
V Mary
NP called
John
34
PP NP
X X P Denver
S VP NP from
X V Mary
NP called
John
35
NP PP NP
X P Denver
S VP NP from
X V Mary
NP called
John
36
NP PP NP
X X X P Denver
S VP NP from
X V Mary
NP called
John
37
VP NP PP NP
X X X P Denver
S VP NP from
X V Mary
NP called
John
38
VP NP PP NP
X X X P Denver
S VP NP from
X V Mary
NP called
John
39
VP1 VP2 NP PP NP
X X X P Denver
S VP NP from
X V Mary
NP called
John
40
S VP1 VP2 NP PP NP
X X X P Denver
S VP NP from
X V Mary
NP called
John
41
S VP NP PP NP
X X X P Denver
S VP NP from
X V Mary
NP called
John
42
Probabilistic CKY
  • Assign probabilities to constituents as they are
    completed and placed in the table
  • Computing the probability
  • Since we are interested in the max P(S,0,n)
  • Use the max probability for each constituent
  • Maintain back-pointers to recover the parse.
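
A sketch of the probabilistic variant keeps, for every (category, i, j), the maximum probability found so far and a back-pointer to the split that produced it. Everything specific here is an assumption for illustration: the names `BINARY`, `LEXICAL`, `pcky`, and `build_tree` are hypothetical, and the rule probabilities are made up (chosen only so that the expansions of each non-terminal sum to 1), not estimated from any treebank.

```python
# Made-up rule probabilities for the "John called Mary from Denver" grammar.
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.2,
    ("PP", "P", "NP"): 1.0,
}
LEXICAL = {
    ("NP", "John"): 0.3, ("NP", "Mary"): 0.3, ("NP", "Denver"): 0.2,
    ("V", "called"): 1.0, ("P", "from"): 1.0,
}


def pcky(words, binary=BINARY, lexical=LEXICAL):
    """Return (best, back): max probabilities and back-pointers per (A, i, j)."""
    n = len(words)
    best, back = {}, {}
    for i, word in enumerate(words):            # base case: A -> w
        for (a, w), p in lexical.items():
            if w == word:
                best[a, i, i + 1] = p
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    if (b, i, k) in best and (c, k, j) in best:
                        cand = p * best[b, i, k] * best[c, k, j]
                        if cand > best.get((a, i, j), 0.0):
                            best[a, i, j] = cand        # keep only the max
                            back[a, i, j] = (k, b, c)   # remember how we got it
    return best, back


def build_tree(back, words, a, i, j):
    """Follow back-pointers from (a, i, j) down to the words."""
    if (a, i, j) not in back:                   # lexical entry: no split stored
        return (a, words[i])
    k, b, c = back[a, i, j]
    return (a, build_tree(back, words, b, i, k), build_tree(back, words, c, k, j))


WORDS = "John called Mary from Denver".split()
BEST, BACK = pcky(WORDS)
TREE = build_tree(BACK, WORDS, "S", 0, len(WORDS))
```

With these particular numbers the VP attachment of "from Denver" scores 0.3 × 0.21 × 0.2 = 0.0126 for the span (1, 5), against 0.7 × 1.0 × 0.012 = 0.0084 for the NP attachment, so the back-pointers recover the VP-attachment tree. Different probabilities would flip the choice.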

43
Problems with PCFGs
  • The probability model we're using is just based
    on the rules in the derivation.
  • Lexical insensitivity
  • Doesn't use the words in any real way
  • Structural disambiguation is lexically driven
  • PP attachment often depends on the verb, its
    object, and the preposition
  • I ate pickles with a fork.
  • I ate pickles with relish.
  • Context insensitivity of the derivation
  • Doesn't take into account where in the derivation
    a rule is used
  • Pronouns are more often subjects than objects
  • She hates Mary.
  • Mary hates her.
  • Solution: Lexicalization
  • Add lexical information to each rule

44
An example of lexical information: Heads
  • Make use of notion of the head of a phrase
  • Head of an NP is a noun
  • Head of a VP is the main verb
  • Head of a PP is its preposition
  • Each LHS of a rule in the PCFG has a lexical item
  • Each RHS non-terminal has a lexical item.
  • One of the lexical items is shared with the LHS.
  • If R is the number of binary branching rules in
    CFG, in lexicalized CFG O(2?R)
  • Unary rules O(?R)

45
Example (correct parse)
Attribute grammar
46
Example (less preferred)
47
Computing Lexicalized Rule Probabilities
  • We started with rule probabilities
  • VP → V NP PP: P(rule | VP)
  • E.g., count of this rule divided by the number of
    VPs in a treebank
  • Now we want lexicalized probabilities
  • VP(dumped) → V(dumped) NP(sacks) PP(in)
  • P(rule | VP ∧ dumped is the verb ∧ sacks is the
    head of the NP ∧ in is the head of the PP)
  • Not likely to have significant counts in any
    treebank

48
Another Example
  • Consider the VPs
  • Ate spaghetti with gusto
  • Ate spaghetti with marinara
  • Dependency is not between mother-child.

[Figure: two lexicalized trees. In "Ate spaghetti with gusto", Pp(with) attaches to Vp(ate). In "Ate spaghetti with marinara", Pp(with) attaches to Np(spag).]
49
Log-linear models for Parsing
  • Why restrict the conditioning to the elements
    of a rule?
  • Use even larger context
  • Word sequence, word types, sub-tree context etc.
  • In general, compute P(y | x), where each f_i(x, y) tests
    the properties of the context and λ_i is the weight
    of that feature.
  • Use these as scores in the CKY algorithm to find
    the best scoring parse.
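
In the standard conditional log-linear form, which is assumed here rather than recovered from the slide, the probability of an outcome y in context x is a normalized exponential of the weighted feature sum:

```latex
P(y \mid x) =
  \frac{\exp\left(\sum_{i} \lambda_i \, f_i(x, y)\right)}
       {\sum_{y'} \exp\left(\sum_{i} \lambda_i \, f_i(x, y')\right)}
```

The denominator sums over all candidate outcomes y' for the same context, so the scores of competing constituents are comparable when used inside CKY.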

50
Supertagging: Almost parsing
[Figure: tree fragments (supertags) assigned to the words of "Poachers now control the underground trade"]
51
Summary
  • Parsing context-free grammars
  • Top-down and Bottom-up parsers
  • Mixed approaches (CKY, Earley parsers)
  • Preferences over parses using probabilities
  • Parsing with PCFG and PCKY algorithms
  • Enriching the probability model
  • Lexicalization
  • Log-linear models for parsing