1
Parsing with PCFG
  • Ling 571
  • Fei Xia
  • Week 3 10/11-10/13/05

2
Outline
  • Misc
  • CYK algorithm
  • Converting CFG into CNF
  • PCFG
  • Lexicalized PCFG

3
Misc
  • Quiz 1: 15 pts, due 10/13
  • Hw2: 10 pts, due 10/13
  • ling580i_au05_at_u, ling580e_au05_at_u
  • Treehouse weekly meeting
  • Time: every Wed 2:30-3:30pm; tomorrow is the 1st
    meeting
  • Location: EE1 025 (Campus map 12-N, south of
    MGH)
  • Mailing list: cl-announce_at_u
  • Others
  • Pongo policies
  • Machines: LLC, Parrington, Treehouse
  • Linux commands: ssh, sftp, ...
  • Catalyst tools: ESubmit, EPost, ...

4
CYK algorithm
5
Parsing algorithms
  • Top-down
  • Bottom-up
  • Top-down with bottom-up filtering
  • Earley algorithm
  • CYK algorithm
  • ....

6
CYK algorithm
  • Cocke-Younger-Kasami algorithm (a.k.a. CKY
    algorithm)
  • Requires the CFG to be in Chomsky Normal Form (CNF).
  • Bottom-up chart parsing algorithm using DP.
  • Fill in a two-dimensional array: C[i][j] contains
    all the possible syntactic interpretations of the
    substring w_i ... w_j
  • Complexity: O(N^3 |G|), where N is the sentence
    length and |G| the grammar size

7
Chomsky normal form (CNF)
  • Definition of CNF:
  • A → B C
  • A → a
  • S → ε
  • A, B, C are non-terminals; a is a terminal.
  • S is the start symbol; B and C are not.
  • For every CFG, there is a CFG in CNF that is
    weakly equivalent.

8
CYK algorithm
  • For every rule A → w_i, set C[i][i][A] = true
  • For span = 2 to N
  • for begin = 1 to N - span + 1
  • end = begin + span - 1
  • for m = begin to end - 1
  • for all non-terminals A, B, C
  • If C[begin][m][B] = true and C[m+1][end][C] = true
    and A → B C is a rule in the grammar
  • then set C[begin][end][A] = true

9
CYK algorithm (another way)
  • For every rule A → w_i, add it to Cell[i][i]
  • For span = 2 to N
  • for begin = 1 to N - span + 1
  • end = begin + span - 1
  • for m = begin to end - 1
  • for all non-terminals A, B, C
  • If Cell[begin][m] contains B → ...
    and
  • Cell[m+1][end] contains C → ...
    and
  • A → B C is a rule in the grammar
  • then add A → B C to Cell[begin][end]
    and
  • remember m
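The loop above can be sketched in Python on the toy grammar of the next slide ("book that flight"). This is a minimal illustration, not the lecture's reference implementation; the dictionary encoding of the rules is an assumption.

```python
# Minimal CYK recognizer sketch. chart[begin][end] holds the set of
# non-terminals that cover words[begin..end] (0-based, inclusive).

LEXICAL = {            # A -> w rules
    "book":   {"N", "V"},
    "flight": {"N"},
    "cards":  {"N"},
    "that":   {"Det"},
    "the":    {"Det"},
    "with":   {"P"},
}
BINARY = {             # A -> B C rules, keyed by (B, C)
    ("V", "NP"):  "VP",
    ("VP", "PP"): "VP",
    ("Det", "N"): "NP",
    ("NP", "PP"): "NP",
    ("P", "NP"):  "PP",
}

def cyk(words):
    n = len(words)
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):             # for every rule A -> w_i
        chart[i][i] = set(LEXICAL.get(w, set()))
    for span in range(2, n + 1):              # for span = 2 to N
        for begin in range(n - span + 1):     # for begin = 1 to N-span+1
            end = begin + span - 1            # end = begin + span - 1
            for m in range(begin, end):       # for m = begin to end-1
                for (B, C), A in BINARY.items():
                    if B in chart[begin][m] and C in chart[m + 1][end]:
                        chart[begin][end].add(A)
    return chart

chart = cyk(["book", "that", "flight"])
print(chart[0][2])   # contains 'VP'
```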

10
An example
  • Rules
  • VP → V NP       V → book
  • VP → VP PP      N → book/flight/cards
  • NP → Det N      Det → that/the
  • NP → NP PP      P → with
  • PP → P NP

11
Parse "book that flight": chart C1[begin][end]

          begin=1           begin=2           begin=3
end=3     VP→V NP (m=1)     NP→Det N (m=2)    N→flight
end=2     ----              Det→that
end=1     N→book, V→book
12
Parse "book that flight": chart C2[begin][span]

          begin=1           begin=2           begin=3
span=3    VP→V NP (m=1)
span=2    ----              NP→Det N (m=2)
span=1    N→book, V→book    Det→that          N→flight
13
Data structures for the chart
  • (1)
  • (2)
  • (3)
  • (4)

14
Summary of CYK algorithm
  • Bottom-up using DP
  • Requires the CFG to be in CNF
  • A very efficient algorithm
  • Easy to extend

15
Converting CFG into CNF
16
Chomsky normal form (CNF)
  • Definition of CNF:
  • A → B C,
  • A → a,
  • S → ε
  • Where
  • A, B, C are non-terminals, a is a terminal,
  • S is the start symbol, and B, C are not start
    symbols.
  • For every CFG, there is a CFG in CNF that is
    weakly equivalent.

17
Converting CFG to CNF
  • (1) Add a new symbol S0, and a rule S0 → S
  • (so the start symbol will not appear on the
    rhs of any rule)
  • (2) Eliminate ε-rules:
  • for each rule A → ε, and
  • for each rule B → α A β, add B → α β,
  • unless the resulting rule B → α β has been
    previously eliminated.

18
Conversion (cont)
  • (3) Remove unit rules:
  • for each pair of rules A → B and B → γ,
    add A → γ,
  • unless the latter rule was previously
    removed.
  • (4) Replace a rule A → X1 X2 ... Xk,
    where k > 2,
  • with A → X1 A1, A1 → X2 A2, ..., A_{k-2} → X_{k-1} Xk
  • replace any terminal a in a long rule with a new
    symbol Ua
  • and add a new rule Ua → a
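Step (4) can be sketched in Python. The fresh-symbol naming scheme A_1, A_2, ... is an assumption for illustration.

```python
# Binarize a long rule A -> X1 X2 ... Xk (k > 2) into a chain of
# binary rules: A -> X1 A_1, A_1 -> X2 A_2, ..., A_{k-2} -> X_{k-1} Xk.

def binarize(lhs, rhs):
    """Return a list of (lhs, rhs) rules, each with at most 2 rhs symbols."""
    rules = []
    cur = lhs
    for i in range(len(rhs) - 2):
        new_sym = f"{lhs}_{i + 1}"           # fresh non-terminal A_i
        rules.append((cur, [rhs[i], new_sym]))
        cur = new_sym
    rules.append((cur, list(rhs[-2:])))      # last rule keeps the final pair
    return rules

print(binarize("S", ["A", "B", "C", "D"]))
# [('S', ['A', 'S_1']), ('S_1', ['B', 'S_2']), ('S_2', ['C', 'D'])]
```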

19
An example
20
Adding a new start symbol
21
Removing ε-rules
Remove B → ε
Remove A → ε
22
Removing unit rules
  • Remove
  • Remove

23
Removing unit rules (cont)
  • Remove
  • Remove

24
Converting remaining rules
25
Summary of CFG parsing
  • Simple top-down and bottom-up parsing generate
    useless trees.
  • Top-down with bottom-up filtering has three
    problems.
  • Solution: use DP
  • Earley algorithm
  • CYK algorithm

26
Probabilistic CFG (PCFG)
27
PCFG
  • PCFG is an extension of CFG.
  • A PCFG is a 5-tuple (N, T, P, S, Pr), where Pr is
    a function assigning a probability to each rule in
    P: Pr: P → [0, 1]
  • Given a non-terminal A, the probabilities of its
    rules sum to one: Σ_{A→α in P} Pr(A → α) = 1

28
A PCFG
  • S → NP VP 0.8         N → Mary 0.01
  • S → Aux NP VP 0.15    N → book 0.02
  • S → VP 0.05
  • VP → V 0.35           V → bought 0.02
  • VP → V NP 0.45
  • VP → VP PP 0.20       Det → a 0.04
  • NP → N 0.8
  • NP → Det N 0.2
  • ...
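The probabilities of all rules with the same left-hand side must sum to 1. A quick Python check on the S, VP, and NP rules above; the lexical rules (N, V, Det) are elided on the slide, so they are not checked here.

```python
# Rule tables for the fully listed non-terminals on the slide.
RULES = {
    "S":  [("NP VP", 0.8), ("Aux NP VP", 0.15), ("VP", 0.05)],
    "VP": [("V", 0.35), ("V NP", 0.45), ("VP PP", 0.20)],
    "NP": [("N", 0.8), ("Det N", 0.2)],
}

# For each left-hand side, the rule probabilities must sum to 1.
for lhs, rules in RULES.items():
    total = sum(p for _, p in rules)
    assert abs(total - 1.0) < 1e-9, lhs
print("all rule distributions sum to 1")
```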

29
Using probabilities
  • To estimate the prob of a sentence and its parse
    trees.
  • Useful in disambiguation.
  • The prob of a tree: P(T) = Π_{n in T} Pr(r(n)), where
    n is a node in T and r(n) is the
    rule used to expand n in T.

30
Computing P(T)
  • S → NP VP 0.8         N → Mary 0.01
  • S → Aux NP VP 0.15    N → book 0.02
  • S → VP 0.05
  • VP → V 0.35           V → bought 0.02
  • VP → V NP 0.45
  • VP → VP PP 0.20       Det → a 0.04
  • NP → N 0.8
  • NP → Det N 0.2
  • The sentence is "Mary bought a book."
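The product over the tree's rules can be worked out in Python for "Mary bought a book", using the probabilities above. The (label, children...) tuple encoding of trees is an assumption for illustration.

```python
# Rule probabilities, keyed by (lhs, rhs) from the slide.
PROB = {
    ("S", ("NP", "VP")): 0.8,
    ("NP", ("N",)): 0.8,     ("NP", ("Det", "N")): 0.2,
    ("VP", ("V", "NP")): 0.45,
    ("N", ("Mary",)): 0.01,  ("N", ("book",)): 0.02,
    ("V", ("bought",)): 0.02,
    ("Det", ("a",)): 0.04,
}

def p_tree(tree):
    """P(T) = product of Pr(r(n)) over all internal nodes n of the tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = PROB[(label, rhs)]
    for c in children:
        if not isinstance(c, str):       # recurse into non-terminal children
            p *= p_tree(c)
    return p

T = ("S",
     ("NP", ("N", "Mary")),
     ("VP", ("V", "bought"),
            ("NP", ("Det", "a"), ("N", "book"))))
print(p_tree(T))   # 0.8*0.8*0.01*0.45*0.02*0.2*0.04*0.02 ≈ 9.216e-9
```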

31
The most likely tree
  • P(T, S) = P(T) P(S|T) = P(T)
  • (T is a parse tree, S is a sentence; P(S|T) = 1
    because T's yield is S)
  • The best parse tree for a sentence S:
    T* = argmax_T P(T|S) = argmax_T P(T, S) = argmax_T P(T)

32
Find the most likely tree
  • Given a PCFG and a sentence, how to find the best
    parse tree for S?
  • One algorithm: CYK

33
CYK algorithm for CFG
  • For every rule A → w_i, set C[i][i][A] = true
  • For span = 2 to N
  • for begin = 1 to N - span + 1
  • end = begin + span - 1
  • for m = begin to end - 1
  • for all non-terminals A, B, C
  • If C[begin][m][B] = true and C[m+1][end][C] = true
    and A → B C is a rule in the grammar
  • then set C[begin][end][A] = true

34
CYK algorithm for CFG (another implementation)
  • For every rule A → w_i, add it to Cell[i][i]
  • For span = 2 to N
  • for begin = 1 to N - span + 1
  • end = begin + span - 1
  • for m = begin to end - 1
  • for all non-terminals A, B, C
  • if Cell[begin][m] contains B → ..., Cell[m+1][end]
    contains C → ..., and A → B C is a rule in the grammar
  • then add A → B C to Cell[begin][end]

35
Variables for CFG and PCFG
  • CFG: C[begin][end][A] is true iff there is a parse
    tree whose root is A and which covers w_begin ... w_end
  • PCFG: P[begin][end][A] is the prob of the most likely
    parse tree whose root is A and which covers
    w_begin ... w_end

36
CYK algorithm for PCFG
  • For every rule A → w_i, set P[i][i][A] = Pr(A → w_i)
  • For span = 2 to N
  • for begin = 1 to N - span + 1
  • end = begin + span - 1
  • for m = begin to end - 1
  • for all non-terminals A, B, C
  • if Pr(A → B C) × P[begin][m][B] × P[m+1][end][C]
    > P[begin][end][A]
  • then set P[begin][end][A] to that value and
    remember m (and B, C)
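The loop above (max instead of a boolean OR) can be sketched in Python with the toy PCFG of the "A PCFG" slide below; the dictionary encoding of the rules is an assumption for illustration.

```python
# Viterbi CYK sketch: P[begin][end][A] holds the probability of the most
# likely subtree rooted in A covering words[begin..end] (0-based).

LEXICAL = {  # Pr(A -> w)
    ("V", "book"): 0.001, ("N", "book"): 0.01,
    ("Det", "that"): 0.1, ("N", "flight"): 0.02, ("P", "with"): 0.2,
}
BINARY = {   # (B, C) -> (A, Pr(A -> B C))
    ("V", "NP"): ("VP", 0.4), ("VP", "PP"): ("VP", 0.2),
    ("Det", "N"): ("NP", 0.3), ("NP", "PP"): ("NP", 0.2),
    ("P", "NP"): ("PP", 1.0),
}

def viterbi_cyk(words):
    n = len(words)
    P = [[{} for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):                 # A -> w_i rules
        for (A, word), pr in LEXICAL.items():
            if word == w:
                P[i][i][A] = pr
    for span in range(2, n + 1):
        for begin in range(n - span + 1):
            end = begin + span - 1
            for m in range(begin, end):
                for (B, C), (A, pr) in BINARY.items():
                    if B in P[begin][m] and C in P[m + 1][end]:
                        cand = pr * P[begin][m][B] * P[m + 1][end][C]
                        if cand > P[begin][end].get(A, 0.0):
                            P[begin][end][A] = cand   # keep the max
    return P

P = viterbi_cyk(["book", "that", "flight"])
print(P[0][2]["VP"])   # 0.4 * 0.001 * (0.3 * 0.1 * 0.02) ≈ 2.4e-07
```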

37
A CFG
  • Rules
  • VP → V NP       V → book
  • VP → VP PP      N → book/flight/cards
  • NP → Det N      Det → that/the
  • NP → NP PP      P → with
  • PP → P NP

38
Parse "book that flight"

          begin=1           begin=2           begin=3
end=3     VP→V NP (m=1)     NP→Det N (m=2)    N→flight
end=2     ----              Det→that
end=1     N→book, V→book
39
A PCFG
  • Rules
  • VP → V NP 0.4       V → book 0.001
  • VP → VP PP 0.2      N → book 0.01
  • NP → Det N 0.3      Det → that 0.1
  • NP → NP PP 0.2      P → with 0.2
  • PP → P NP 1.0       N → flight 0.02

40
Parse "book that flight"

          begin=1                  begin=2                begin=3
end=3     VP→V NP (m=1) 2.4e-7     NP→Det N (m=2) 6e-4    N→flight 0.02
end=2     ----                     Det→that 0.1
end=1     N→book 0.01, V→book 0.001
41
N-best parse trees
  • Best parse tree: T* = argmax_T P(T)
  • N-best parse trees: the N parse trees with the
    highest P(T)

42
CYK algorithm for N-best
  • For every rule A → w_i, insert Pr(A → w_i) into
    P[i][i][A]
  • For span = 2 to N
  • for begin = 1 to N - span + 1
  • end = begin + span - 1
  • for m = begin to end - 1
  • for all non-terminals A, B, C
  • for each p_B in P[begin][m][B] and p_C in
    P[m+1][end][C]:
  • val = Pr(A → B C) × p_B × p_C
  • if val > one of the probs in P[begin][end][A]
    (a sorted array of at most N probs)
  • then remove the last element
    in P[begin][end][A]
  • and insert val into
    the array, keeping it sorted;
  • remove the last
    element in B[begin][end][A] and
    insert the backpointer (m, B, C) at the matching position
43
PCFG for Language Modeling (LM)
  • N-gram LM
  • Syntax-based LM

44
Calculating Pr(S)
  • Parsing: the prob of the most likely parse tree,
    max_T P(T, S)
  • LM: the sum over all parse trees, Pr(S) = Σ_T P(T, S)

45
CYK for finding the most likely parse tree
  • For every rule A → w_i, set P[i][i][A] = Pr(A → w_i)
  • For span = 2 to N
  • for begin = 1 to N - span + 1
  • end = begin + span - 1
  • for m = begin to end - 1
  • for all non-terminals A, B, C
  • if Pr(A → B C) × P[begin][m][B] × P[m+1][end][C]
    > P[begin][end][A]
  • then set P[begin][end][A] to that value and
    remember m (and B, C)

46
CYK for calculating LM
  • For every rule A → w_i, P[i][i][A] += Pr(A → w_i)
  • For span = 2 to N
  • for begin = 1 to N - span + 1
  • end = begin + span - 1
  • for m = begin to end - 1
  • for all non-terminals A, B, C
  • P[begin][end][A] += Pr(A → B C) × P[begin][m][B]
    × P[m+1][end][C]
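Replacing max with sum gives the total probability of all subtrees (the inside probability). A sketch reusing the toy PCFG from the Viterbi slide; "book that flight" has only one parse, so here the sum equals the max.

```python
# Inside-probability CYK sketch: P[begin][end][A] accumulates the sum of
# probabilities over ALL subtrees rooted in A covering words[begin..end].

LEXICAL = {  # Pr(A -> w)
    ("V", "book"): 0.001, ("N", "book"): 0.01,
    ("Det", "that"): 0.1, ("N", "flight"): 0.02, ("P", "with"): 0.2,
}
BINARY = {   # (B, C) -> (A, Pr(A -> B C))
    ("V", "NP"): ("VP", 0.4), ("VP", "PP"): ("VP", 0.2),
    ("Det", "N"): ("NP", 0.3), ("NP", "PP"): ("NP", 0.2),
    ("P", "NP"): ("PP", 1.0),
}

def inside_cyk(words):
    n = len(words)
    P = [[{} for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        for (A, word), pr in LEXICAL.items():
            if word == w:
                P[i][i][A] = P[i][i].get(A, 0.0) + pr
    for span in range(2, n + 1):
        for begin in range(n - span + 1):
            end = begin + span - 1
            for m in range(begin, end):
                for (B, C), (A, pr) in BINARY.items():
                    if B in P[begin][m] and C in P[m + 1][end]:
                        P[begin][end][A] = (P[begin][end].get(A, 0.0)
                            + pr * P[begin][m][B] * P[m + 1][end][C])
    return P

P = inside_cyk(["book", "that", "flight"])
print(P[0][2]["VP"])   # single parse, so the sum equals the max: ≈ 2.4e-07
```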

47
CYK algorithm

Task                     Cell value                       Backpointer
One parse tree           boolean                          tuple
All parse trees          boolean                          list of tuples
Most likely parse tree   real number (the max prob)       tuple
N-best parse trees       list of real numbers             list of tuples
LM for sentence          real number (the sum of probs)   not needed
48
Learning PCFG Probabilities
  • Given a treebank (i.e., a set of trees), use MLE
  • Without treebanks → use the inside-outside algorithm

49
Q&A
  • PCFG
  • CYK algorithm

50
Problems of PCFG
  • Lack of sensitivity to structural dependency
  • Lack of sensitivity to lexical dependency

51
Structural Dependency
  • Each PCFG rule is assumed to be independent of
    other rules.
  • Observation: sometimes the choice of how a node
    expands depends on the location of the node
    in the parse tree.
  • Example: NP → Pron depends on whether the NP is a
    subject or an object.

52
Lexical Dependency
  • Given P(NP → NP PP) > P(VP → VP PP),
  • should a PP always be attached to an NP?
  • Verbs such as send
  • Preps such as of, into

53
Solution to the problems
  • Structural dependency
  • Lexical dependency
  • → Other, more sophisticated models.

54
Lexicalized PCFG
55
Head and head child
  • Each syntactic constituent is associated with a
    lexical head.
  • Each context-free rule has a head child
  • VP → V NP
  • NP → Det N
  • VP → VP PP
  • NP → NP PP
  • VP → to VP
  • VP → aux VP

56
Head propagation
  • Lexical head propagates from head child to its
    parent.
  • An example: Mary bought a book in the store.

57
Lexicalized PCFG
  • Lexicalized rules:
  • VP(bought) → V(bought) NP 0.01
  • (the rule VP → V NP with head word "bought",
    argument head unspecified)
  • VP(bought) → V(bought) NP(book) 1.5e-7
  • (the rule VP → V NP with head word "bought"
    and argument head "book")

58
Finding head in a parse tree
  • Head propagation table: simple rules to find the head
    child
  • An example:
  • (VP, left, V/VP/Aux)
  • (PP, left, P)
  • (NP, right, N)
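Such a table can be applied recursively to find a constituent's head word: scan the children from the given direction for the first category that matches, in priority order. A minimal sketch, assuming a (label, children...) tuple encoding of trees.

```python
# Head-propagation table from the slide: parent -> (direction, priorities).
HEAD_TABLE = {
    "VP": ("left", ["V", "VP", "Aux"]),
    "PP": ("left", ["P"]),
    "NP": ("right", ["N"]),
}

def head_word(tree):
    label, *children = tree
    if all(isinstance(c, str) for c in children):   # preterminal, e.g. N -> book
        return children[0]
    direction, cats = HEAD_TABLE[label]
    kids = children if direction == "left" else list(reversed(children))
    for cat in cats:                 # try categories in priority order
        for child in kids:
            if child[0] == cat:
                return head_word(child)   # propagate head from head child
    return head_word(kids[0])             # fallback: edge child

T = ("VP", ("V", "bought"), ("NP", ("Det", "a"), ("N", "book")))
print(head_word(T))   # bought
```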

59
Simplified Model using Lexicalized PCFG
  • PCFG: P(r(n) | n)
  • Lexicalized PCFG: P(r(n) | n, head(n))
  • P(VP → VBD NP PP | VP, dumped)
  • P(VP → VBD NP PP | VP, slept)
  • Parsers that use lexicalized rules:
  • Collins parser