Title: PARSING WITH CONTEXT-FREE GRAMMARS
Thanks to Massimo Poesio
PARSING
- Parsing is the process of recognizing and assigning STRUCTURE
- Parsing a string with a CFG:
  - Finding a derivation of the string consistent with the grammar
  - The derivation gives us a PARSE TREE
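The idea that a derivation yields a parse tree can be made concrete with a small check. The following Python sketch assumes a toy grammar and lexicon (not the lecture's grammar) and hypothetical helper names `licensed`, `leaves` and `is_parse`; trees are nested tuples `(label, child, ...)`. Under these assumptions, a tree is a parse of a string if it is rooted in S, every node is licensed by a rule, and its leaves spell out the string.

```python
# Toy grammar and lexicon, assumed for illustration only.
GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("ProperNoun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
    "VP": [("Verb",), ("Verb", "NP")],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}

def is_preterminal(tree):
    # A preterminal node has exactly one child, and that child is a word.
    return len(tree) == 2 and isinstance(tree[1], str)

def licensed(tree):
    """True if every node of `tree` is sanctioned by GRAMMAR or LEXICON."""
    if is_preterminal(tree):
        label, word = tree
        return label in LEXICON.get(word, set())
    label, *children = tree
    rhs = tuple(child[0] for child in children)
    return rhs in GRAMMAR.get(label, []) and all(licensed(c) for c in children)

def leaves(tree):
    """The words the tree covers, left to right."""
    if is_preterminal(tree):
        return [tree[1]]
    return [word for child in tree[1:] for word in leaves(child)]

def is_parse(tree, words):
    """A parse is rooted in S, fully licensed, and yields exactly `words`."""
    return tree[0] == "S" and licensed(tree) and leaves(tree) == words

tree = ("S", ("VP", ("Verb", "book"),
              ("NP", ("Det", "that"), ("Nominal", ("Noun", "flight")))))
print(is_parse(tree, ["book", "that", "flight"]))  # True
```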
EXAMPLE (CF. LAST WEEK)
PARSING AS SEARCH
- Just as in the case of non-deterministic regular expressions, the main problem with parsing is the existence of CHOICE POINTS
- There is a need for a SEARCH STRATEGY determining the order in which alternatives are considered
TOP-DOWN AND BOTTOM-UP SEARCH STRATEGIES
- The search has to be guided by the INPUT and the GRAMMAR
- TOP-DOWN search: the parse tree has to be rooted in the start symbol S
  - EXPECTATION-DRIVEN parsing
- BOTTOM-UP search: the parse tree must be an analysis of the input
  - DATA-DRIVEN parsing
AN EXAMPLE OF TOP-DOWN SEARCH (IN PARALLEL)
AN EXAMPLE OF BOTTOM-UP SEARCH
NON-PARALLEL SEARCH
- If it's not possible to examine all alternatives in parallel, it's necessary to make further decisions:
  - Which node in the current search space to expand first (breadth-first or depth-first)
  - Which of the applicable grammar rules to expand first
  - Which leaf node in a parse tree to expand next (e.g., leftmost)
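The three decisions above can be seen in a small agenda-based top-down recognizer. This Python sketch rests on assumptions: the toy grammar and the function name `top_down_recognize` are hypothetical; a search node is the tuple of symbols still to be derived plus the position of the next input word; the leftmost symbol is always expanded; rules are tried in the order they are listed; and the only difference between depth-first and breadth-first search is whether the agenda is used as a stack or as a queue. It also assumes the grammar has no left recursion and no empty rules.

```python
from collections import deque

# Toy grammar and lexicon, assumed for illustration only.
GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("ProperNoun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
    "VP": [("Verb",), ("Verb", "NP")],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}

def top_down_recognize(words, strategy="depth-first"):
    """Top-down search over partial derivations.

    A search node is (pending, i): symbols still to derive, next word index.
    Returns (accepted, nodes_expanded).
    """
    agenda = deque([(("S",), 0)])
    expanded = 0
    while agenda:
        # Depth-first uses the agenda as a stack, breadth-first as a queue.
        pending, i = agenda.pop() if strategy == "depth-first" else agenda.popleft()
        expanded += 1
        if not pending:
            if i == len(words):
                return True, expanded
            continue  # derivation finished but input is left over: dead end
        first, rest = pending[0], pending[1:]  # always expand the leftmost leaf
        if first in GRAMMAR:
            # Choice point: one successor per applicable rule, in grammar order.
            successors = [(rhs + rest, i) for rhs in GRAMMAR[first]]
            if strategy == "depth-first":
                successors.reverse()  # so the first-listed rule is popped first
            agenda.extend(successors)
        elif i < len(words) and first in LEXICON.get(words[i], set()):
            agenda.append((rest, i + 1))  # preterminal matches the next word
    return False, expanded

print(top_down_recognize(["book", "that", "flight"], "depth-first")[0])    # True
print(top_down_recognize(["book", "that", "flight"], "breadth-first")[0])  # True
```

Both strategies accept the same strings; the second return value can be used to compare how many search nodes each one expands before succeeding.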
TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT
TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (II)
TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (III)
TOP-DOWN, DEPTH-FIRST, LEFT-TO-RIGHT (IV)
A T-D, D-F, L-R PARSER
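One possible realization of a top-down, depth-first, left-to-right parser is recursive descent with backtracking. In this Python sketch the toy grammar and the names `parse`, `parse_seq` and `first_parse` are assumptions for illustration; trees are nested tuples. Backtracking comes from Python generators: when a later sibling cannot be parsed, the loop resumes the generator for the earlier sibling and tries its next analysis.

```python
# Toy grammar and lexicon, assumed for illustration (no left recursion).
GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("ProperNoun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
    "VP": [("Verb",), ("Verb", "NP")],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}

def parse(symbol, words, i):
    """Yield (tree, j) for each way `symbol` can derive words[i:j]."""
    if symbol in GRAMMAR:
        for rhs in GRAMMAR[symbol]:  # top-down: expand the goal, rules in order
            for children, j in parse_seq(rhs, words, i):
                yield (symbol, *children), j
    elif i < len(words) and symbol in LEXICON.get(words[i], set()):
        yield (symbol, words[i]), i + 1  # preterminal matches the next word

def parse_seq(symbols, words, i):
    """Yield (children, j) for each way `symbols` derive words[i:j], left to right."""
    if not symbols:
        yield (), i
        return
    for first, k in parse(symbols[0], words, i):
        # Depth-first: finish the leftmost child before its right siblings.
        # If the siblings fail, the loop backtracks to the next analysis of `first`.
        for rest, j in parse_seq(symbols[1:], words, k):
            yield (first, *rest), j

def first_parse(words):
    """Return the first parse tree covering all of `words`, or None."""
    for tree, j in parse("S", words, 0):
        if j == len(words):
            return tree
    return None

print(first_parse(["book", "that", "flight"]))
# ('S', ('VP', ('Verb', 'book'), ('NP', ('Det', 'that'), ('Nominal', ('Noun', 'flight')))))
```

Analyses that cover only part of the input (for example S → VP → Verb over *book* alone) are rejected in `first_parse`, and the generators then move on to the next alternative. With a left-recursive rule such as NP → NP PP, `parse("NP", ...)` would call itself at the same position before consuming any input, so this sketch would recurse until Python raises RecursionError.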
TOP-DOWN vs BOTTOM-UP
- TOP-DOWN
  - Only searches among grammatical answers
  - BUT suggests hypotheses that may not be consistent with the data
  - Problem: left-recursion
- BOTTOM-UP
  - Only forms hypotheses consistent with the data
  - BUT may suggest hypotheses that make no sense globally
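The bottom-up behaviour can be illustrated with a naive data-driven recognizer. In this Python sketch (toy grammar and the name `bottom_up_recognize` are assumptions), a search state is a sequence of symbols; starting from the words, each step either replaces a word by one of its categories or replaces a substring that matches a rule's right-hand side by the rule's left-hand side. The search succeeds when the sequence is reduced to S alone. It is exhaustive and keeps no chart, so it is only meant for tiny inputs.

```python
# Toy grammar and lexicon, assumed for illustration only.
GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("ProperNoun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
    "VP": [("Verb",), ("Verb", "NP")],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}

def bottom_up_recognize(words):
    """Reduce `words` toward ("S",). Returns (accepted, distinct_states_seen)."""
    rules = [(lhs, rhs) for lhs, rhss in GRAMMAR.items() for rhs in rhss]
    start = tuple(words)
    agenda, seen = [start], {start}
    while agenda:
        state = agenda.pop()  # depth-first over sequences of symbols
        if state == ("S",):
            return True, len(seen)
        successors = []
        for k, sym in enumerate(state):
            # Lexical step: a word may be replaced by any of its categories.
            for cat in sorted(LEXICON.get(sym, ())):
                successors.append(state[:k] + (cat,) + state[k + 1:])
        for lhs, rhs in rules:
            # Reduction step: any substring equal to a rhs becomes the lhs.
            width = len(rhs)
            for k in range(len(state) - width + 1):
                if state[k:k + width] == rhs:
                    successors.append(state[:k] + (lhs,) + state[k + width:])
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                agenda.append(nxt)
    return False, len(seen)

print(bottom_up_recognize(["book", "that", "flight"])[0])  # True
```

Every sequence this search builds is consistent with the words, yet many, such as `('S', 'that', 'flight')` (from reducing *book* alone to a VP and then to S), can never be completed into a single S: hypotheses that make no sense globally.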
LEFT-RECURSION
- A LEFT-RECURSIVE grammar may cause a T-D, D-F, L-R parser to never return
- Examples of left-recursive rules:
  - NP → NP PP
  - S → S and S
- But also (indirect left-recursion):
  - NP → Det Nom
  - Det → NP 's
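Indirect left-recursion is easy to miss by eye. A sketch of one way to detect it: for each nonterminal A, follow the first symbols of its rules (and of theirs, and so on) and flag A if it can reach itself. The grammar below contains the left-recursive rules listed above; S → NP VP, PP → P NP, VP → V NP and Det → the are hypothetical additions to round out the fragment, and the function name `left_recursive` is likewise an assumption. The sketch assumes there are no empty (ε) rules, which could otherwise hide left-recursion behind a nullable first symbol.

```python
# Left-recursive rules from the list above, plus hypothetical filler rules.
GRAMMAR = {
    "S": [("S", "and", "S"), ("NP", "VP")],
    "NP": [("NP", "PP"), ("Det", "Nom")],
    "Det": [("NP", "'s"), ("the",)],
    "PP": [("P", "NP")],
    "VP": [("V", "NP")],
}

def left_recursive(grammar):
    """Return the nonterminals A such that A derives A ... in one or more steps.

    Assumes no empty rules, so only the first symbol of a rule can begin it.
    """
    corners = {a: {rhs[0] for rhs in rhss if rhs} for a, rhss in grammar.items()}
    found = set()
    for a in grammar:
        # Walk the left-corner graph from A; reaching A again means A is left-recursive.
        seen, stack = set(), list(corners[a])
        while stack:
            b = stack.pop()
            if b == a:
                found.add(a)
                break
            if b not in seen:
                seen.add(b)
                stack.extend(corners.get(b, ()))
    return found

print(sorted(left_recursive(GRAMMAR)))  # ['Det', 'NP', 'S']
```

Deleting NP → NP PP would not remove NP from the result, because NP → Det Nom and Det → NP 's still form a cycle along the left edge.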
THE PROBLEM WITH LEFT-RECURSION
LEFT-RECURSION: POOR SOLUTIONS
- Rewrite the grammar to a weakly equivalent one
  - Problem: we may not get the correct parse tree
- Limit the depth during search
  - Problem: the limit is arbitrary
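A standard rewrite for immediate left-recursion replaces A → A α | β with A → β A' and A' → α A' | ε. The Python sketch below applies it to NP → NP PP | Det Nominal; the function name is hypothetical, right-hand sides are tuples, `()` stands for ε, and the sketch assumes there is no rule of the form A → A.

```python
def remove_immediate_left_recursion(lhs, rules):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn (no bi starting with A) as
    A -> b1 A' | ... | bn A'  and  A' -> a1 A' | ... | am A' | epsilon."""
    new = lhs + "'"
    recursive = [rhs[1:] for rhs in rules if rhs and rhs[0] == lhs]  # the alphas
    others = [rhs for rhs in rules if not rhs or rhs[0] != lhs]      # the betas
    if not recursive:
        return {lhs: list(rules)}
    return {
        lhs: [beta + (new,) for beta in others],
        new: [alpha + (new,) for alpha in recursive] + [()],  # () is epsilon
    }

print(remove_immediate_left_recursion("NP", [("NP", "PP"), ("Det", "Nominal")]))
# {'NP': [('Det', 'Nominal', "NP'")], "NP'": [('PP', "NP'"), ()]}
```

The rewritten grammar accepts the same strings, but *a flight from Houston* is now analysed as NP → Det Nominal NP' with NP' → PP NP', so the PP no longer attaches to an NP covering *a flight* as it does under NP → NP PP. This is the "may not get the correct parse tree" problem; the rewrite also introduces an ε-rule.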
LEFT-CORNER PARSING
- A hybrid of top-down and bottom-up parsing
- Strategy: don't consider any expansion unless the current input can serve as the LEFT-CORNER of that expansion
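The left-corner filter can be precomputed as a table. In this Python sketch (toy grammar, lexicon and function names are assumptions), `left_corners` maps each nonterminal to every symbol that can begin one of its derivations, and `admissible` keeps only the expansions of a goal whose first symbol can start with a category of the current word.

```python
# Toy grammar and lexicon, assumed for illustration only.
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("ProperNoun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
    "VP": [("Verb",), ("Verb", "NP")],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "does": {"Aux"}}

def left_corners(grammar):
    """Map each nonterminal A to every symbol that can start a derivation from A."""
    table = {a: {rhs[0] for rhs in rhss} for a, rhss in grammar.items()}
    changed = True
    while changed:  # transitive closure by fixed-point iteration
        changed = False
        for a in table:
            for b in list(table[a]):
                extra = table.get(b, set()) - table[a]
                if extra:
                    table[a] |= extra
                    changed = True
    return table

def admissible(goal, word, grammar, lexicon, table):
    """Expansions of `goal` whose first symbol can have `word` as its left corner."""
    cats = lexicon.get(word, set())
    return [rhs for rhs in grammar[goal]
            if rhs[0] in cats or cats & table.get(rhs[0], set())]

TABLE = left_corners(GRAMMAR)
print(admissible("S", "book", GRAMMAR, LEXICON, TABLE))  # [('VP',)]
print(admissible("S", "does", GRAMMAR, LEXICON, TABLE))  # [('Aux', 'NP', 'VP')]
```

With *book* as the first word, S → NP VP and S → Aux NP VP are never tried, because neither a Noun nor a Verb can begin an NP or an Aux in this grammar.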
FURTHER PROBLEMS IN PARSING
- Ambiguity
  - Church and Patil (1982): the number of parses produced by attachment ambiguities grows like the Catalan numbers
  - C(2) = 2, C(3) = 5, C(4) = 14, C(5) = 42, C(6) = 132, C(7) = 429, C(8) = 1430
- Avoiding reparsing
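The Catalan numbers have the closed form C(n) = (2n choose n) / (n + 1). A short Python check of the first few values (the function name `catalan` is just for illustration):

```python
from math import comb

def catalan(n):
    """n-th Catalan number, C(n) = binomial(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

print([catalan(n) for n in range(2, 9)])  # [2, 5, 14, 42, 132, 429, 1430]
```

The ratio C(n+1)/C(n) approaches 4, so each additional attachment site roughly quadruples the number of parses, which is why enumerating them one by one quickly becomes impractical.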
COMMON STRUCTURAL AMBIGUITIES
- COORDINATION ambiguity
  - OLD (MEN AND WOMEN) vs (OLD MEN) AND WOMEN
- ATTACHMENT ambiguity
  - Gerundive VP attachment ambiguity
    - I saw the Eiffel Tower flying to Paris
  - PP attachment ambiguity
    - I shot an elephant in my pajamas
PP ATTACHMENT AMBIGUITY
AMBIGUITY SOLUTIONS
- Use a PROBABILISTIC GRAMMAR (not covered in this module)
- Use semantics
AVOID RECOMPUTING INVARIANTS
- Consider parsing the following NP with a top-down parser:
  - A flight from Indianapolis to Houston on TWA
- With the grammar rules:
  - NP → Det Nominal
  - NP → NP PP
  - NP → ProperNoun
INVARIANTS AND TOP-DOWN PARSING
THE EARLEY ALGORITHM
DYNAMIC PROGRAMMING
- A standard T-D parser would reanalyze A FLIGHT 4 times, always in the same way
- A DYNAMIC PROGRAMMING algorithm uses a table (the CHART) to avoid repeating work
- The Earley algorithm also:
  - Does not suffer from the left-recursion problem
  - Recognizes the input in O(n^3) time, even though the number of possible parses can grow exponentially
THE CHART
- The Earley algorithm uses a table (the CHART) of size N+1, where N is the length of the input
  - Table entries sit in the gaps between words
- Each entry in the chart is a list of:
  - Completed constituents
  - In-progress constituents
  - Predicted constituents
- All three types of objects are represented in the same way, as STATES
THE CHART: GRAPHICAL REPRESENTATION
STATES
- A state encodes two types of information:
  - How much of a certain rule has been encountered in the input
  - Which positions are covered
  - A → α • β, [X, Y]
- DOTTED RULES:
  - VP → V NP •
  - NP → Det • Nominal
  - S → • VP
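A dotted rule plus its span fits in a small immutable record. In this Python sketch the class and method names are hypothetical; `dot` counts how many right-hand-side symbols have been recognized, and `[start, end]` are chart positions (gaps between words). The spans below are illustrative positions for *book that flight*, chosen to show a predicted, an in-progress and a completed state; they are not taken from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """A dotted rule lhs -> rhs, with the dot after `dot` symbols, covering [start, end]."""
    lhs: str
    rhs: tuple
    dot: int    # number of rhs symbols already recognized
    start: int
    end: int

    def is_complete(self):
        # Dot at the far right: a completed constituent.
        return self.dot == len(self.rhs)

    def next_symbol(self):
        # The symbol this state is waiting for, or None if it is complete.
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)}, [{self.start},{self.end}]"

predicted = State("S", ("VP",), 0, 0, 0)
in_progress = State("NP", ("Det", "Nominal"), 1, 1, 2)
completed = State("VP", ("V", "NP"), 2, 0, 3)
for s in (predicted, in_progress, completed):
    print(s)
# S → • VP, [0,0]
# NP → Det • Nominal, [1,2]
# VP → V NP •, [0,3]
```

Making the record frozen (hence hashable) lets a chart entry test cheaply whether a state is already present before adding it.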
EXAMPLES
SUCCESS
- The parser has succeeded if entry N+1 of the chart (the last one) contains the state
  - S → α •, [0, N]
THE ALGORITHM
- The algorithm loops through the input without backtracking, at each step performing three operations:
  - PREDICTOR: add predictions to the chart
  - COMPLETER: move the dot to the right when the looked-for constituent is found
  - SCANNER: read in the next input word
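The three operations fit in a short recognizer. This Python sketch is hedged in several ways: the toy grammar is assumed; the dummy start state `GAMMA → • S` follows a common textbook convention; a state omits its end position because it equals the index of the chart entry that holds it; and the SCANNER here advances the waiting state directly over a matching part of speech instead of adding a separate POS → word • state. It assumes no empty (ε) rules, and it only recognizes: recovering trees would need back-pointers stored in each state.

```python
# Toy grammar and lexicon, assumed for illustration (no empty rules).
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("ProperNoun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
    "VP": [("Verb",), ("Verb", "NP")],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"},
           "does": {"Aux"}, "Houston": {"ProperNoun"}}
POS = {"Noun", "Verb", "Det", "Aux", "ProperNoun"}

def earley_recognize(words):
    """Return True if `words` can be derived from S.

    A state is (lhs, rhs, dot, start); chart[i] holds the states ending at
    position i, i.e. the gap before words[i] (chart[n] is the final gap).
    """
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def enqueue(state, i):
        if state not in chart[i]:  # never add the same state twice
            chart[i].append(state)

    enqueue(("GAMMA", ("S",), 0, 0), 0)  # dummy start state GAMMA -> . S
    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] can grow while we walk it
            lhs, rhs, dot, start = chart[i][j]
            j += 1
            if dot < len(rhs) and rhs[dot] not in POS:
                # PREDICTOR: expect every expansion of the next nonterminal at i.
                for expansion in GRAMMAR[rhs[dot]]:
                    enqueue((rhs[dot], expansion, 0, i), i)
            elif dot < len(rhs):
                # SCANNER: if the next word has the expected POS, move past it.
                if i < n and rhs[dot] in LEXICON.get(words[i], set()):
                    enqueue((lhs, rhs, dot + 1, start), i + 1)
            else:
                # COMPLETER: advance the states in chart[start] waiting for lhs.
                for lhs2, rhs2, dot2, start2 in chart[start]:
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        enqueue((lhs2, rhs2, dot2 + 1, start2), i)
    # SUCCESS: the last chart entry holds a complete S spanning [0, n].
    return any(lhs == "S" and dot == len(rhs) and start == 0
               for lhs, rhs, dot, start in chart[n])

print(earley_recognize(["book", "that", "flight"]))  # True
print(earley_recognize(["that", "book"]))            # False
```

Because each state is added to an entry at most once and the loop only moves forward through the chart, no work is redone and nothing is backtracked over; predicting NP at a position a second time is simply a no-op, which is also why left-recursive rules do not make it loop.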
THE ALGORITHM: CENTRAL LOOP
EARLEY ALGORITHM: THE THREE OPERATORS
EXAMPLE, AGAIN
EXAMPLE: BOOK THAT FLIGHT
EXAMPLE: BOOK THAT FLIGHT (II)
EXAMPLE: BOOK THAT FLIGHT (III)
EXAMPLE: BOOK THAT FLIGHT (IV)
READINGS
- Jurafsky and Martin, Chapter 10, Sections 10.1-10.4