Title: Parsing with PCFG
1. Parsing with PCFG
- Ling 571
- Fei Xia
- Week 3, 10/11-10/13/05
2. Outline
- Misc
- CYK algorithm
- Converting CFG into CNF
- PCFG
- Lexicalized PCFG
3. Misc
- Quiz 1: 15 pts, due 10/13
- Hw2: 10 pts, due 10/13
- ling580i_au05_at_u, ling580e_au05_at_u
- Treehouse weekly meeting
- Time: every Wed 2:30-3:30pm; tomorrow is the 1st meeting
- Location: EE1 025 (Campus map 12-N, south of MGH)
- Mailing list: cl-announce_at_u
- Others
- Pongo policies
- Machines: LLC, Parrington, Treehouse
- Linux commands: ssh, sftp, ...
- Catalyst tools: ESubmit, EPost, ...
4. CYK algorithm
5. Parsing algorithms
- Top-down
- Bottom-up
- Top-down with bottom-up filtering
- Earley algorithm
- CYK algorithm
- ....
6. CYK algorithm
- Cocke-Younger-Kasami algorithm (a.k.a. the CKY algorithm)
- Requires the CFG to be in Chomsky Normal Form (CNF).
- Bottom-up chart parsing algorithm using DP.
- Fills in a two-dimensional array: C[i][j] contains all the possible syntactic interpretations of the substring w_i ... w_j.
- Complexity: O(N^3 |G|) for a sentence of N words and a grammar with |G| rules.
7. Chomsky normal form (CNF)
- Definition of CNF
- A → B C
- A → a
- S → ε
- A, B, C are non-terminals; a is a terminal.
- S is the start symbol; B and C are not.
- For every CFG, there is a CFG in CNF that is weakly equivalent.
8. CYK algorithm
- For every rule A → w_i, set Chart[i][i][A] = true
- For span = 2 to N
- for begin = 1 to N - span + 1
- end = begin + span - 1
- for m = begin to end - 1
- for all non-terminals A, B, C
- if Chart[begin][m][B] = true and Chart[m+1][end][C] = true and A → B C is a rule in the grammar
- then set Chart[begin][end][A] = true
9. CYK algorithm (another way)
- For every rule A → w_i, add it to Cell[i][i]
- For span = 2 to N
- for begin = 1 to N - span + 1
- end = begin + span - 1
- for m = begin to end - 1
- for all non-terminals A, B, C
- if Cell[begin][m] contains B → ..., and Cell[m+1][end] contains C → ..., and A → B C is a rule in the grammar
- then add A → B C to Cell[begin][end] and remember m
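The loops above can be turned into a small recognizer. A minimal sketch in Python, using the example grammar from the following slides; the dict-based grammar encoding (which assumes each rhs pair has a single lhs) is mine, not the slides':

```python
from collections import defaultdict

# Grammar from the example slides, already in CNF:
# binary rules keyed by rhs pair, and lexical rules per word.
binary = {("V", "NP"): "VP", ("VP", "PP"): "VP",
          ("Det", "N"): "NP", ("NP", "PP"): "NP", ("P", "NP"): "PP"}
lexical = {"book": {"V", "N"}, "flight": {"N"}, "cards": {"N"},
           "that": {"Det"}, "the": {"Det"}, "with": {"P"}}

def cyk(words):
    """Return the chart: cell[(begin, end)] holds the non-terminals
    covering words[begin..end] (1-based, inclusive)."""
    n = len(words)
    cell = defaultdict(set)
    for i, w in enumerate(words, start=1):   # A -> w_i goes into Cell[i][i]
        cell[(i, i)] |= lexical.get(w, set())
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):      # split point
                for B in cell[(begin, m)]:
                    for C in cell[(m + 1, end)]:
                        if (B, C) in binary:  # A -> B C in the grammar
                            cell[(begin, end)].add(binary[(B, C)])
    return cell

chart = cyk("book that flight".split())
print(chart[(1, 3)])   # {'VP'}
```

The chart entry for the whole span (1, 3) contains VP, matching the worked example on the next slides.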
10. An example
- Rules
- VP → V NP      V → book
- VP → VP PP     N → book/flight/cards
- NP → Det N     Det → that/the
- NP → NP PP     P → with
- PP → P NP
11. Parse "book that flight" with chart C1[begin][end]

         begin=1            begin=2           begin=3
end=3    VP→V NP (m=1)      NP→Det N (m=2)    N→flight
end=2    ----               Det→that
end=1    N→book, V→book
12. Parse "book that flight" with chart C2[begin][span]

          begin=1            begin=2           begin=3
span=3    VP→V NP (m=1)
span=2    ----               NP→Det N (m=2)
span=1    N→book, V→book     Det→that          N→flight
13. Data structures for the chart
14. Summary of the CYK algorithm
- Bottom-up, using DP
- Requires the CFG to be in CNF
- A very efficient algorithm
- Easy to extend
15. Converting CFG into CNF
16. Chomsky normal form (CNF)
- Definition of CNF
- A → B C,
- A → a,
- S → ε
- Where
- A, B, C are non-terminals, a is a terminal,
- S is the start symbol, and B, C are not start symbols.
- For every CFG, there is a CFG in CNF that is weakly equivalent.
17. Converting CFG to CNF
- (1) Add a new symbol S0, and a rule S0 → S
- (so the start symbol will not appear on the rhs of any rule)
- (2) Eliminate ε-rules
- for each rule B → ε, remove it
- for each rule A → α B β, add A → α β
- unless A → ε has been previously eliminated.
18. Conversion (cont)
- (3) Remove unit rules
- for each pair of rules A → B and B → γ, add A → γ
- unless the latter rule was previously removed.
- (4) Replace each rule A → X1 X2 ... Xk where k > 2
- with A → X1 A1, A1 → X2 A2, ..., A_{k-2} → X_{k-1} X_k
- replace any terminal a on the rhs of a binary rule with a new symbol U_a
- and add a new rule U_a → a
19. An example
20. Adding S0 → S
21. Removing ε-rules
Remove B → ε
Remove A → ε
22. Removing unit rules
23. Removing unit rules (cont)
24. Converting remaining rules
25. Summary of CFG parsing
- Simple top-down and bottom-up parsing generate useless trees.
- Top-down with bottom-up filtering has three problems.
- Solution: use DP
- Earley algorithm
- CYK algorithm
26. Probabilistic CFG (PCFG)
27. PCFG
- PCFG is an extension of CFG.
- A PCFG is a 5-tuple (N, T, P, S, Pr), where Pr is a function assigning a probability to each rule in P:
- Pr: P → [0, 1]
- Given a non-terminal A, the probabilities of its rules sum to one: Σ_α Pr(A → α) = 1
28. A PCFG
- S → NP VP       0.8     N → Mary      0.01
- S → Aux NP VP   0.15    N → book      0.02
- S → VP          0.05
- VP → V          0.35    V → bought    0.02
- VP → V NP       0.45
- VP → VP PP      0.20    Det → a       0.04
- NP → N          0.8
- NP → Det N      0.2
- ...
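The sum-to-one constraint can be checked mechanically. A small sketch over the fully listed rules above (the lexical rules are elided with "..." on the slide, so only the phrasal non-terminals are checked):

```python
from collections import defaultdict

# The fully listed phrasal rules from the slide.
rules = [("S", "NP VP", 0.8), ("S", "Aux NP VP", 0.15), ("S", "VP", 0.05),
         ("VP", "V", 0.35), ("VP", "V NP", 0.45), ("VP", "VP PP", 0.20),
         ("NP", "N", 0.8), ("NP", "Det N", 0.2)]

# For each non-terminal A, sum Pr(A -> alpha) over its rules.
totals = defaultdict(float)
for lhs, rhs, p in rules:
    totals[lhs] += p

for lhs, total in sorted(totals.items()):
    print(lhs, total)   # each should be 1.0 (up to float rounding)
```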
29. Using probabilities
- To estimate the prob of a sentence and its parse trees.
- Useful in disambiguation.
- The prob of a tree T: if n is a node in T and r(n) is the rule used to expand n in T, then
- P(T) = Π_{n in T} Pr(r(n))
30. Computing P(T)
- S → NP VP       0.8     N → Mary      0.01
- S → Aux NP VP   0.15    N → book      0.02
- S → VP          0.05
- VP → V          0.35    V → bought    0.02
- VP → V NP       0.45
- VP → VP PP      0.20    Det → a       0.04
- NP → N          0.8
- NP → Det N      0.2
- The sentence is "Mary bought a book."
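P(T) is just the product of the rule probabilities over the nodes of the tree. A sketch for the sentence above; the tree shape is my assumption (the slide's tree image is lost), namely (S (NP (N Mary)) (VP (V bought) (NP (Det a) (N book)))):

```python
from math import prod

# Rule probabilities from the slide's grammar.
probs = {("S", "NP VP"): 0.8, ("NP", "N"): 0.8, ("N", "Mary"): 0.01,
         ("VP", "V NP"): 0.45, ("V", "bought"): 0.02,
         ("NP", "Det N"): 0.2, ("Det", "a"): 0.04, ("N", "book"): 0.02}

# The rule used to expand each node of the assumed parse tree.
tree_rules = [("S", "NP VP"), ("NP", "N"), ("N", "Mary"),
              ("VP", "V NP"), ("V", "bought"),
              ("NP", "Det N"), ("Det", "a"), ("N", "book")]

p_tree = prod(probs[r] for r in tree_rules)   # P(T) = product over nodes
print(p_tree)   # ≈ 9.216e-09
```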
31. The most likely tree
- P(T, S) = P(T) P(S|T) = P(T), since P(S|T) = 1
- T is a parse tree, S is a sentence
- The best parse tree for a sentence S:
- T* = argmax_T P(T|S) = argmax_T P(T, S) = argmax_T P(T)
32. Finding the most likely tree
- Given a PCFG and a sentence S, how do we find the best parse tree for S?
- One algorithm: CYK
33. CYK algorithm for CFG
- For every rule A → w_i, set Chart[i][i][A] = true
- For span = 2 to N
- for begin = 1 to N - span + 1
- end = begin + span - 1
- for m = begin to end - 1
- for all non-terminals A, B, C
- if Chart[begin][m][B] = true and Chart[m+1][end][C] = true and A → B C is a rule in the grammar
- then set Chart[begin][end][A] = true
34. CYK algorithm for CFG (another implementation)
- For every rule A → w_i, add it to Cell[i][i]
- For span = 2 to N
- for begin = 1 to N - span + 1
- end = begin + span - 1
- for m = begin to end - 1
- for all non-terminals A, B, C
- if Cell[begin][m] contains B, Cell[m+1][end] contains C, and A → B C is a rule in the grammar
- then add A → B C to Cell[begin][end] and remember m
35. Variables for CFG and PCFG
- CFG: whether there is a parse tree whose root is A and which covers w_begin ... w_end
- PCFG: the prob of the most likely parse tree whose root is A and which covers w_begin ... w_end
36. CYK algorithm for PCFG
- For every rule A → w_i, set Prob[i][i][A] = Pr(A → w_i)
- For span = 2 to N
- for begin = 1 to N - span + 1
- end = begin + span - 1
- for m = begin to end - 1
- for all non-terminals A, B, C
- val = Pr(A → B C) x Prob[begin][m][B] x Prob[m+1][end][C]
- if val > Prob[begin][end][A]
- then set Prob[begin][end][A] = val and remember m
37. A CFG
- Rules
- VP → V NP      V → book
- VP → VP PP     N → book/flight/cards
- NP → Det N     Det → that/the
- NP → NP PP     P → with
- PP → P NP
38. Parse "book that flight"

         begin=1            begin=2           begin=3
end=3    VP→V NP (m=1)      NP→Det N (m=2)    N→flight
end=2    ----               Det→that
end=1    N→book, V→book
39. A PCFG
- Rules
- VP → V NP   0.4     V → book     0.001
- VP → VP PP  0.2     N → book     0.01
- NP → Det N  0.3     Det → that   0.1
- NP → NP PP  0.2     P → with     0.2
- PP → P NP   1.0     N → flight   0.02
40. Parse "book that flight"

         begin=1                     begin=2                   begin=3
end=3    VP→V NP (m=1) 2.4e-7        NP→Det N (m=2) 6e-4       N→flight 0.02
end=2    ----                        Det→that 0.1
end=1    N→book 0.01, V→book 0.001
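The probabilistic CYK loop can be sketched directly from the chart above, using the PCFG of the preceding slide; the list/dict encoding of the grammar is my own:

```python
from collections import defaultdict

# The slide's PCFG: binary rules (A, B, C, prob) and lexical rules.
binary = [("VP", "V", "NP", 0.4), ("VP", "VP", "PP", 0.2),
          ("NP", "Det", "N", 0.3), ("NP", "NP", "PP", 0.2),
          ("PP", "P", "NP", 1.0)]
lexical = {("V", "book"): 0.001, ("N", "book"): 0.01, ("Det", "that"): 0.1,
           ("P", "with"): 0.2, ("N", "flight"): 0.02}

def pcyk(words):
    """prob[(begin, end)][A] = prob of the most likely tree rooted at A
    covering words[begin..end]; back[(begin, end)][A] = (m, B, C)."""
    n = len(words)
    prob = defaultdict(lambda: defaultdict(float))
    back = defaultdict(dict)
    for i, w in enumerate(words, start=1):
        for (A, word), p in lexical.items():
            if word == w:
                prob[(i, i)][A] = p
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for A, B, C, p in binary:
                    val = p * prob[(begin, m)][B] * prob[(m + 1, end)][C]
                    if val > prob[(begin, end)][A]:
                        prob[(begin, end)][A] = val
                        back[(begin, end)][A] = (m, B, C)   # remember m
    return prob, back

prob, back = pcyk("book that flight".split())
print(prob[(2, 3)]["NP"])   # ≈ 6e-4   (0.3 x 0.1 x 0.02)
print(prob[(1, 3)]["VP"])   # ≈ 2.4e-7 (0.4 x 0.001 x 6e-4)
```

The two printed values reproduce the chart entries 6e-4 and 2.4e-7 above.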
41. N-best parse trees
- Best parse tree: the single tree with the highest prob
- N-best parse trees: the N trees with the highest probs
42. CYK algorithm for N-best
- For every rule A → w_i, initialize the array Prob[i][i][A] with Pr(A → w_i)
- For span = 2 to N
- for begin = 1 to N - span + 1
- end = begin + span - 1
- for m = begin to end - 1
- for all non-terminals A, B, C
- for each prob1 in Prob[begin][m][B] and prob2 in Prob[m+1][end][C]
- val = Pr(A → B C) x prob1 x prob2
- if val > one of the probs in Prob[begin][end][A]
- then remove the last element in Prob[begin][end][A] and insert val into the array (keeping it sorted),
- and remove the last element in Back[begin][end][A] and insert the back-pointer (m, B, C)
43. PCFG for Language Modeling (LM)
- N-gram LM
- Pr(S) ≈ Π_i Pr(w_i | w_{i-n+1} ... w_{i-1})
- Syntax-based LM
- Pr(S) = Σ_T P(T, S)
44. Calculating Pr(S)
- Parsing: the prob of the most likely parse tree, max_T P(T, S)
- LM: the sum over all parse trees, Σ_T P(T, S)
45. CYK for finding the most likely parse tree
- For every rule A → w_i, set Prob[i][i][A] = Pr(A → w_i)
- For span = 2 to N
- for begin = 1 to N - span + 1
- end = begin + span - 1
- for m = begin to end - 1
- for all non-terminals A, B, C
- val = Pr(A → B C) x Prob[begin][m][B] x Prob[m+1][end][C]
- if val > Prob[begin][end][A]
- then set Prob[begin][end][A] = val and remember m
46. CYK for calculating LM
- For every rule A → w_i, set Prob[i][i][A] = Pr(A → w_i)
- For span = 2 to N
- for begin = 1 to N - span + 1
- end = begin + span - 1
- for m = begin to end - 1
- for all non-terminals A, B, C
- Prob[begin][end][A] += Pr(A → B C) x Prob[begin][m][B] x Prob[m+1][end][C]
- (sum instead of max; no back-pointers needed)
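Replacing the max update of slide 45 with a running sum yields the sum over all parse trees, which is what the LM needs. A sketch with the same PCFG as the earlier example; note that "book that flight" happens to have only one parse under this grammar, so here the sum equals the max:

```python
from collections import defaultdict

# Same PCFG as in the earlier example slides.
binary = [("VP", "V", "NP", 0.4), ("VP", "VP", "PP", 0.2),
          ("NP", "Det", "N", 0.3), ("NP", "NP", "PP", 0.2),
          ("PP", "P", "NP", 1.0)]
lexical = {("V", "book"): 0.001, ("N", "book"): 0.01, ("Det", "that"): 0.1,
           ("P", "with"): 0.2, ("N", "flight"): 0.02}

def inside(words):
    """prob[(begin, end)][A] = sum of the probs of ALL trees rooted at A
    covering words[begin..end]; no back-pointers are kept."""
    n = len(words)
    prob = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words, start=1):
        for (A, word), p in lexical.items():
            if word == w:
                prob[(i, i)][A] = p
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):
                for A, B, C, p in binary:
                    # sum instead of max: accumulate over split points and rules
                    prob[(begin, end)][A] += p * prob[(begin, m)][B] * prob[(m + 1, end)][C]
    return prob

sums = inside("book that flight".split())
print(sums[(1, 3)]["VP"])   # ≈ 2.4e-7; equals the max here (only one parse)
```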
47. CYK algorithm

Task                     Chart entry                       Back-pointer
One parse tree           boolean                           tuple
All parse trees          boolean                           list of tuples
Most likely parse tree   real number (the max prob)        tuple
N-best parse trees       list of real numbers              list of tuples
LM for sentence          real number (the sum of probs)    not needed
48. Learning PCFG Probabilities
- Given a treebank (i.e., a set of trees), use MLE:
- Pr(A → α) = Count(A → α) / Count(A)
- Without treebanks → the inside-outside algorithm
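The MLE estimate is a simple count ratio. A sketch with hypothetical rule counts standing in for a real treebank:

```python
from collections import Counter

# A toy "treebank": the multiset of rules read off a set of trees
# (hypothetical counts, for illustration only).
rule_counts = Counter({("NP", "Det N"): 30,
                       ("NP", "NP PP"): 10,
                       ("NP", "N"): 60})

# Count(A) = total count of rules with lhs A.
lhs_counts = Counter()
for (lhs, rhs), c in rule_counts.items():
    lhs_counts[lhs] += c

def mle(lhs, rhs):
    """MLE: Pr(A -> alpha) = Count(A -> alpha) / Count(A)."""
    return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

print(mle("NP", "Det N"))   # 30 / 100 = 0.3
```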
49. Q&A
50. Problems of PCFG
- Lack of sensitivity to structural dependency
- Lack of sensitivity to lexical dependency
51. Structural Dependency
- Each PCFG rule is assumed to be independent of other rules.
- Observation: sometimes the choice of how a node expands depends on the location of the node in the parse tree.
- e.g., NP → Pron depends on whether the NP is a subject or an object
52. Lexical Dependency
- Given P(NP → NP PP) > P(VP → VP PP),
- should a PP always be attached to an NP?
- Verbs such as "send"
- Preps such as "of", "into"
53. Solution to the problems
- Structural dependency
- Lexical dependency
- → Other more sophisticated models.
54. Lexicalized PCFG
55. Head and head child
- Each syntactic constituent is associated with a lexical head.
- Each context-free rule has a head child:
- VP → V NP      (head child: V)
- NP → Det N     (head child: N)
- VP → VP PP     (head child: VP)
- NP → NP PP     (head child: NP)
- VP → to VP     (head child: VP)
- VP → aux VP    (head child: VP)
56. Head propagation
- The lexical head propagates from the head child to its parent.
- An example: "Mary bought a book in the store."
57. Lexicalized PCFG
- Lexicalized rules
- VP(bought) → V(bought) NP          0.01
- (VP → V NP with head word "bought")
- VP(bought) → V(bought) NP(book)    1.5e-7
- (VP → V NP with head word "bought" and argument head word "book")
58. Finding the head in a parse tree
- Head propagation table: simple rules to find the head child
- An example:
- (VP, left, V/VP/Aux)
- (PP, left, P)
- (NP, right, N)
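The table above can drive a small recursive head finder. A sketch under two assumptions of mine: trees are nested (label, word) / (label, children) tuples, and earlier tags in a table entry take priority over later ones (real head tables vary on this point):

```python
# Head propagation table from the slide: parent -> (direction, candidate tags).
head_table = {"VP": ("left", ["V", "VP", "Aux"]),
              "PP": ("left", ["P"]),
              "NP": ("right", ["N"])}

def head_word(tree):
    """tree is (label, word) for a preterminal, (label, [children]) otherwise.
    Scan the children from the given direction for the first candidate tag,
    then recurse into the head child."""
    label, body = tree
    if isinstance(body, str):            # preterminal: the head is the word
        return body
    direction, candidates = head_table[label]
    children = body if direction == "left" else list(reversed(body))
    for cand in candidates:              # assumed: candidate order is priority
        for child in children:
            if child[0] == cand:
                return head_word(child)
    return head_word(children[0])        # fallback: first child from that side

# "bought a book": the head propagates V -> VP.
vp = ("VP", [("V", "bought"),
             ("NP", [("Det", "a"), ("N", "book")])])
print(head_word(vp))   # bought
```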
59. Simplified Model using Lexicalized PCFG
- PCFG: P(r(n) | n)
- Lexicalized PCFG: P(r(n) | n, head(n))
- P(VP → VBD NP PP | VP, dumped)
- P(VP → VBD NP PP | VP, slept)
- Parsers that use lexicalized rules:
- Collins parser