CPSC 503 Computational Linguistics

1
CPSC 503 Computational Linguistics
  • Parsing
  • Lecture 12
  • Giuseppe Carenini

2
Today 27/2
  • Top-down (TD)
  • Bottom-up (BU)
  • Comparing TD and BU
  • TD depth-first left-to-right
  • Adding BU Filtering
  • The Earley Algorithm

3
Parsing with CFGs
  • Valid parse trees
  • Sequence of words

I prefer a morning flight
Parser
CFG
  • Assign valid trees: a valid tree covers all and only the
    elements of the input and has an S at the top

4
Parsing as Search
  • The CFG defines the search space of possible parse trees
  • S -> NP VP
  • S -> Aux NP VP
  • NP -> Det Noun
  • VP -> Verb
  • Det -> a
  • Noun -> flight
  • Verb -> left
  • Aux -> do | does
  • Parsing: find all trees that cover all and only
    the words in the input

5
Constraints on Search
  • Sequence of words
  • Valid parse trees

I prefer a morning flight
Parser
CFG (search space)
  • Search Strategies
  • Top-down or goal-directed
  • Bottom-up or data-directed

6
Top-Down Parsing
  • Since we're trying to find trees rooted with an S
    (Sentence), start with the rules that give us an
    S.
  • Then work your way down from there to the words.

7
Next step: Top-Down Space
  • When POS categories are reached, reject trees
    whose leaves fail to match all words in the input

8
Bottom-Up Parsing
  • Of course, we also want trees that cover the
    input words. So start with trees that link up
    with the words in the right way.
  • Then work your way up from there.
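The bottom-up idea can be sketched as a naive search that starts from the part-of-speech tags of the words and repeatedly reduces any matching right-hand side to its left-hand side. This is only an illustrative sketch using the toy grammar from the "Parsing as Search" slide; the function name `bottom_up_recognize` and the agenda-based search are assumptions, not something taken from the lecture.

```python
from itertools import product

# Toy grammar from the "Parsing as Search" slide, as (lhs, rhs) pairs.
RULES = [
    ("S", ("NP", "VP")),
    ("S", ("Aux", "NP", "VP")),
    ("NP", ("Det", "Noun")),
    ("VP", ("Verb",)),
]
LEXICON = {
    "a": {"Det"}, "flight": {"Noun"}, "left": {"Verb"},
    "do": {"Aux"}, "does": {"Aux"},
}

def bottom_up_recognize(words, rules=RULES, lexicon=LEXICON, root="S"):
    """Return True if the words can be reduced, bottom-up, to a single root."""
    # One starting sequence per combination of part-of-speech tags.
    agenda = [tuple(tags)
              for tags in product(*(sorted(lexicon.get(w, ())) for w in words))]
    seen = set(agenda)
    while agenda:
        seq = agenda.pop()
        if seq == (root,):
            return True
        for lhs, rhs in rules:
            k = len(rhs)
            for i in range(len(seq) - k + 1):
                if seq[i:i + k] == rhs:
                    # Reduce: replace the matched right-hand side by its left-hand side.
                    new = seq[:i] + (lhs,) + seq[i + k:]
                    if new not in seen:
                        seen.add(new)
                        agenda.append(new)
    return False
```

Note that this search happily builds constituents such as a VP that can never be part of an S for the given input, which is exactly the weakness of pure bottom-up parsing.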

9
Two more steps: Bottom-Up Space
10
Top-Down vs. Bottom-Up
  • Top-down
  • Only searches for trees that can be answers
  • But suggests trees that are not consistent with
    the words
  • Bottom-up
  • Only forms trees consistent with the words
  • But suggests trees that make no sense globally

11
So Combine Them
  • Top-down control strategy to generate trees
  • Bottom-up to filter out inappropriate parses
  • Top-down Control strategy
  • Depth vs. Breadth first
  • Which node to try to expand next
  • Which grammar rule to use to expand a node
  • (left-most)
  • (textual order)
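A top-down, depth-first, left-to-right control strategy can be sketched with recursive generators: the left-most symbol is expanded first and rules are tried in textual order. The grammar below is a small assumed extension of the toy grammar (the `Nominal` rule and the lexicon entries are hypothetical, chosen so the sample sentence parses), and the sketch would loop forever on a left-recursive rule.

```python
# Assumed toy grammar; non-terminals map to right-hand sides in textual order.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}
LEXICON = {
    "does": {"Aux"}, "this": {"Det"}, "a": {"Det"},
    "flight": {"Noun"}, "meal": {"Noun"}, "include": {"Verb"},
}

def parse(symbol, words, start):
    """Yield every end position such that `symbol` derives words[start:end]."""
    if symbol not in GRAMMAR:
        # Part-of-speech category: it must match the next input word.
        if start < len(words) and symbol in LEXICON.get(words[start], ()):
            yield start + 1
        return
    for rhs in GRAMMAR[symbol]:  # rules tried in textual order
        yield from expand(rhs, words, start)

def expand(rhs, words, start):
    """Match the symbols of one right-hand side, left-most symbol first."""
    if not rhs:
        yield start
        return
    for mid in parse(rhs[0], words, start):
        yield from expand(rhs[1:], words, mid)

def recognize(sentence):
    words = sentence.lower().rstrip("?").split()
    return any(end == len(words) for end in parse("S", words, 0))
```

Because generators are lazy, Python's own backtracking performs the depth-first search: a failed expansion simply yields nothing and the next rule is tried.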

12
Top-Down, Depth-First, Left-to-Right Search
Sample sentence: Does this flight include a meal?
13
Example
Does this flight include a meal?
14
Example
Does this flight include a meal?
15
Example
Does this flight include a meal?
16
Adding Bottom-up Filtering
  • The following sequence was a waste of time
    because an NP cannot generate a parse tree
    starting with an AUX

17
Bottom-Up Filtering
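One common way to realise bottom-up filtering is a left-corner table: for each non-terminal, precompute which categories can begin it, and only expand a node if the next word's part of speech is among them. The sketch below is an assumption about how such a table could be built; the grammar and the names `left_corners` and `can_start` are hypothetical.

```python
# Assumed toy grammar; any symbol that is not a key is a part-of-speech category.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["ProperNoun"]],
    "Nominal": [["Noun"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}

def left_corners(grammar):
    """Map each non-terminal to every category that can start one of its derivations."""
    table = {lhs: set() for lhs in grammar}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for lhs, rules in grammar.items():
            for rhs in rules:
                first = rhs[0]
                new = {first} | table.get(first, set())
                if not new <= table[lhs]:
                    table[lhs] |= new
                    changed = True
    return table

def can_start(category, pos, table):
    """True if a word tagged `pos` can be the first word of `category`."""
    return pos == category or pos in table.get(category, set())

LEFT_CORNERS = left_corners(GRAMMAR)
```

With this table, a top-down parser facing the word "Does" (an Aux) can skip S -> NP VP immediately, since `can_start("NP", "Aux", LEFT_CORNERS)` is false.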
18
Problems with TD-BU-filtering
  • Left recursion
  • Ambiguity
  • Repeated Parsing
  • SOLUTION: Earley Algorithm
  • (once again dynamic programming!)

19
(1) Left-Recursion
  • These rules appear in most English grammars
  • S -> S and S
  • VP -> VP PP
  • NP -> NP PP
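To see why such rules trouble a top-down, left-to-right parser, the sketch below finds the directly left-recursive rules in a grammar and shows that expanding the left-most symbol with one of them keeps growing the tree without ever consuming a word. The non-recursive alternatives in the grammar and both function names are assumptions added for illustration.

```python
# The three left-recursive rules from the slide, plus assumed non-recursive alternatives.
GRAMMAR = {
    "S": [["S", "and", "S"], ["NP", "VP"]],
    "VP": [["VP", "PP"], ["Verb"]],
    "NP": [["NP", "PP"], ["Det", "Nom"]],
}

def left_recursive_rules(grammar):
    """Return (lhs, rhs) pairs whose right-hand side starts with the left-hand side."""
    return [(lhs, rhs)
            for lhs, rules in grammar.items()
            for rhs in rules
            if rhs and rhs[0] == lhs]

def expand_leftmost(symbols, rule, steps):
    """Apply `rule` to the left-most symbol `steps` times, as a naive parser would."""
    lhs, rhs = rule
    for _ in range(steps):
        if symbols[0] != lhs:
            break
        symbols = list(rhs) + symbols[1:]
    return symbols
```

For example, `expand_leftmost(["NP"], ("NP", ["NP", "PP"]), 3)` returns `["NP", "PP", "PP", "PP"]`: the left-most symbol is still NP, so a depth-first parser that keeps choosing this rule never reaches the input.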

20
(2) Ambiguity
  • I shot an elephant in my pajamas

21
(3) Repeated Work
  • Parsing is hard, and slow. It's wasteful to redo
    stuff over and over and over.
  • Consider an attempt to top-down parse the
    following as an NP
  • A flight from Indi to Houston on TWA

22
  • starts from ...
  • NP -> Det Nom
  • NP -> NP PP
  • Nom -> Noun
  • fails and backtracks
23
  • restarts from ...
  • NP -> Det Nom
  • NP -> NP PP
  • Nom -> Noun
  • fails and backtracks
24
  • restarts from ...
  • fails and backtracks ...
25
  • restarts from ...
  • Success!

26
  • But ...
  • [Figure: parts of the parse tree labelled 4, 3, 2, 1]

27
Dynamic Programming
  • Fills tables with solutions to subproblems

Parsing: sub-trees consistent with the input,
once discovered, are stored and can be reused
  • Does not fall prey to left-recursion
  • Stores ambiguous parses compactly
  • Does not do (avoidable) repeated work

28
Earley Parsing
  • Fills a table in a single sweep over the input
    words
  • Table is length N+1, where N is the number of words
  • Table entries represent
  • Completed constituents and their locations
  • In-progress constituents
  • Predicted constituents

29
States
  • The table-entries are called states and are
    represented with dotted-rules.
  • S -> • VP             A VP is predicted
  • NP -> Det • Nominal   An NP is in progress
  • VP -> V NP •          A VP has been found

30
States/Locations
  • Each state has a location indicating the portion
    of the input it applies to
  • S -> • VP [0,0]: a VP is predicted at the
    start of the sentence
  • NP -> Det • Nominal [1,2]: an NP is in progress;
    the Det goes from 1 to 2
  • VP -> V NP • [0,3]: a VP has been found
    starting at 0 and ending at 3
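A dotted rule with its location maps naturally onto a small immutable record. The following is a minimal sketch, assuming a hypothetical `State` class whose field names (`lhs`, `rhs`, `dot`, `start`, `end`) are not from the lecture; it reproduces the three example states above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    lhs: str      # left-hand side of the rule
    rhs: tuple    # right-hand side symbols
    dot: int      # how many right-hand-side symbols have been found so far
    start: int    # input position where the constituent begins
    end: int      # input position reached so far

    def is_complete(self):
        return self.dot == len(self.rhs)

    def next_symbol(self):
        """The symbol just after the dot, or None if the state is complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} -> {' '.join(symbols)} [{self.start},{self.end}]"

predicted = State("S", ("VP",), 0, 0, 0)
in_progress = State("NP", ("Det", "Nominal"), 1, 1, 2)
found = State("VP", ("V", "NP"), 2, 0, 3)
```

Making the record frozen (and therefore hashable) is a design choice that makes it cheap to check whether a state is already in the chart.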

31
Graphically
S -> • VP [0,0]
NP -> Det • Nominal [1,2]
VP -> V NP • [0,3]
32
Earley answer
  • As with most dynamic programming approaches, the
    answer is found by looking in the table in the
    right place.
  • In this case, the following state should be in
    the final column

S -> α • [0,N]
  • i.e., an S state that spans from 0 to N and
    is complete.

33
Earley processes
  • So sweep through the table from 0 to N
  • New predicted states are created
  • E.g., S -> • VP [0,0] => VP -> • Verb [0,0]
  • New incomplete states are created by advancing
    existing states as new constituents are
    discovered
  • E.g., VP -> • Verb NP .. => VP -> Verb • NP ..
  • New complete states are created in the same way.
  • E.g., VP -> Verb • NP .. => VP -> Verb NP • ..
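The three processes (predict, advance over a word, complete) fit in a compact recognizer. This is a simplified sketch rather than the lecture's exact formulation: states are plain tuples `(lhs, rhs, dot, start)` stored in the column where they end, the scanner advances a state directly over a matching part of speech, and the grammar and lexicon are assumed toy data.

```python
GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("NP", "PP"), ("ProperNoun",)],
    "Nominal": [("Noun",)],
    "VP": [("Verb",), ("Verb", "NP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {
    "book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"},
    "from": {"Prep"}, "houston": {"ProperNoun"},
}

def earley_recognize(words, grammar=GRAMMAR, lexicon=LEXICON, root="S"):
    n = len(words)
    chart = [[] for _ in range(n + 1)]  # N + 1 columns for N words

    def add(state, col):
        if state not in chart[col]:  # never add a state that is already there
            chart[col].append(state)

    for rhs in grammar[root]:
        add((root, rhs, 0, 0), 0)

    for i in range(n + 1):
        for lhs, rhs, dot, start in chart[i]:  # the column may grow as we iterate
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:
                    # Predictor: expect every expansion of the next non-terminal.
                    for prod in grammar[nxt]:
                        add((nxt, prod, 0, i), i)
                elif i < n and nxt in lexicon.get(words[i], ()):
                    # Scanner: the next word has the expected part of speech.
                    add((lhs, rhs, dot + 1, start), i + 1)
            else:
                # Completer: advance copies of the states that were waiting for lhs.
                for l2, r2, d2, s2 in chart[start]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, s2), i)

    return any(lhs == root and dot == len(rhs) and start == 0
               for lhs, rhs, dot, start in chart[n])
```

The left-recursive rule NP -> NP PP causes no trouble here because the duplicate check in `add` stops the predictor from re-adding a prediction it has already made in the same column.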

34
Example
Book that flight
  • We should find an S from 0 to 3 that is a
    completed state

35
Example
Book that flight
36
So far only a recognizer
  • To generate all parses
  • When old states waiting for the just-completed
    constituent are updated, add a pointer from
    each updated state to the completed one
  • Then simply read off all the backpointers from
    every complete S in the last column of the table

37
Earley and Left Recursion
  • So Earley solves the left-recursion problem
    without having to alter the grammar or
    artificially limit the search.
  • Never place a state into the chart that's already
    there
  • Copy states before advancing them

38
Earley and Left Recursion 1
  • S -> NP VP
  • NP -> NP PP
  • The first rule predicts
  • S -> • NP VP [0,0], which adds
  • NP -> • NP PP [0,0]
  • Prediction stops there, since adding any subsequent
    prediction would be fruitless

39
Earley and Left Recursion 2
  • When a state gets advanced make a copy and leave
    the original alone
  • Say we have NP -> • NP PP [0,0]
  • We find an NP from 0 to 2, so we create
  • NP -> NP • PP [0,2]
  • But we leave the original state as is

40
Dynamic Programming Approaches
  • Earley
  • Top-down, no filtering, no restriction on grammar
    form
  • CYK
  • Bottom-up, no filtering, grammars restricted to
    Chomsky-Normal Form (CNF)
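For comparison with Earley, a CYK recognizer can be sketched in a few lines once the grammar is in CNF. The tiny CNF grammar below (lexical entries in `UNARY`, two-symbol rules in `BINARY`) is an assumption made for this example, as is the function name `cyk_recognize`.

```python
# Assumed CNF grammar: words map to categories, category pairs map to parents.
UNARY = {
    "book": {"Verb", "Noun"},
    "that": {"Det"},
    "flight": {"Noun"},
}
BINARY = {
    ("Verb", "NP"): {"S", "VP"},
    ("Det", "Noun"): {"NP"},
}

def cyk_recognize(words, unary=UNARY, binary=BINARY, root="S"):
    n = len(words)
    # table[i][k] holds every category that spans words[i:k].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        table[i][i + 1] = set(unary.get(word, ()))
    for span in range(2, n + 1):           # shorter spans first (bottom-up)
        for i in range(n - span + 1):
            k = i + span
            for j in range(i + 1, k):      # every split point
                for (left, right), parents in binary.items():
                    if left in table[i][j] and right in table[j][k]:
                        table[i][k] |= parents
    return root in table[0][n]
```

Unlike Earley, every cell is filled regardless of whether the constituent could ever be part of an S, which reflects the "bottom-up, no filtering" characterisation above.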

41
Next Time
  • Read Chpt. 11 Features and Unification