Title: CS 904 Natural Language Processing: Probabilistic Parsing
1. CS 904 Natural Language Processing: Probabilistic Parsing
- L. Venkata Subramaniam
- March 21, 2002
2. Chunking and Grammar Induction
- Chunking: recognizing higher-level units of structure that allow us to compress our description of a sentence.
- Grammar induction: explaining the structure of the chunks found across different sentences.
- Parsing can be considered as implementing chunking.
3. Parsing for Disambiguation
- Probabilities for determining the sentence: choose the sequence of words from a word lattice with the highest probability (language model).
- Probabilities for speedier parsing: prune the search space of the parser.
- Probabilities for choosing between parses: choose the most likely among the many parses of the input sentence.
4. Treebanks
- A collection of example parses.
- A commonly used treebank is the Penn Treebank.
- The induction problem is now that of extracting
the grammatical knowledge that is implicit in the
example parses.
5. Parsing Models vs. Language Models
- Parsing is working out parse trees for a given sentence according to some grammar G.
- In probabilistic parsing, we rank the parses.
- Given a probabilistic parsing model, the job of the parser is to find the most probable parse of a sentence.
- We can also define a language model, which assigns a probability to all trees generated by a grammar.
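As a sketch of the distinction (notation assumed here, following standard usage): a parsing model estimates P(t | s) directly, and the parser returns the tree t that maximizes P(t | s); a language model assigns a joint probability P(t, s) to every tree, so the probability of a sentence is P(s) = sum over t of P(t, s), and the same most probable parse can be read off as the t that maximizes P(t, s).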
6. Weakening the Independence Assumptions of PCFGs
- In PCFGs we make a number of independence assumptions.
- Context: humans make wide use of context.
- Context of who we are talking to, where we are, and the prior context of the conversation.
- Prior discourse context.
- People find semantically intuitive readings for sentences.
- We need to incorporate these sources of information to build better parsers than PCFGs.
7. Weakening the Independence Assumptions (Cont.)
- Lexicalization: the PCFG independence assumptions do not take into consideration the particular words in the sentence.
- We need to include more information about the individual words when making decisions about the parse tree structure.
- Structural context: certain node types have locational preferences in the parse tree (for example, NP expansions differ between subject and object position).
8. Tree Probabilities and Derivational Probabilities
- In the PCFG case, the way we derive the tree (the order of rewriting) does not alter the tree probability.
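As a brief sketch of why (standard PCFG property, notation assumed): the probability of a tree t is the product of the probabilities of the rules used in it, P(t) = product over rules r in t of P(r), so any derivation order (leftmost, rightmost, or otherwise) multiplies exactly the same factors and yields the same value.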
9. Probabilistic Left-Corner Grammars
- Top-down parsing
- Tries to predict the child nodes given knowledge only of the parent node, e.g. a PCFG.
- Left-corner parsing
- Tries to predict the child nodes using the left corner and the goal category rather than just the parent.
- A combination of bottom-up and top-down parsing.
10. Probabilistic LCG (Cont.)
- P(C → lc c2 ... cn | lc, gc): the probability of expanding C as lc c2 ... cn, conditioned on the left corner lc and the goal category gc.
- For example: P(VP → VBD NP PP | VBD, S).
11. Phrase Structure Grammars and Dependency Grammars
- In a dependency grammar, one word is the head of a sentence, and all other words are either a dependent of that word, or else dependent on some other word which connects to the head word through a series of dependencies.
- Lexicalized: dependencies between words are captured directly.
- Gives a way of decomposing phrase structure rules.
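For example (an illustration not in the original slides): in "the old man ate fish", the head of the sentence is "ate"; "man" and "fish" are dependents of "ate", while "the" and "old" are dependents of "man" and connect to the head through it.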
12. Evaluation
- Exact match criterion: compare parser output with hand parses of the sentences; score 1 for an exact match and 0 for any mistake.
- Parseval measures: measures based on precision, recall, and crossing brackets (see the sketch below). Not very discriminating.
- Success in real tasks.
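A minimal sketch of the Parseval idea, assuming parses are represented as sets of (label, start, end) spans (this representation and the function name are illustrative, not a standard tool): precision is the fraction of proposed brackets found in the gold parse, recall is the fraction of gold brackets that were proposed, and a crossing occurs when a proposed span overlaps a gold span without either containing the other.

    def parseval(proposed, gold):
        # Toy Parseval sketch: spans are (label, start, end) tuples.
        matched = proposed & gold
        precision = len(matched) / len(proposed) if proposed else 0.0
        recall = len(matched) / len(gold) if gold else 0.0

        # A proposed span crosses a gold span if they overlap but
        # neither contains the other (labels are ignored here).
        def crosses(a, b):
            (_, s1, e1), (_, s2, e2) = a, b
            return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1

        crossing = sum(1 for p in proposed if any(crosses(p, g) for g in gold))
        return precision, recall, crossing

    # Example: the proposed NP (0, 3) crosses the gold VP (2, 5).
    gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)}
    prop = {("S", 0, 5), ("NP", 0, 3), ("VP", 2, 5)}
    print(parseval(prop, gold))  # (0.666..., 0.666..., 1)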
13. Equivalent Models
- Compare models in terms of what information is being used to condition the prediction of what.
- Improving the models by:
- Remembering more of the derivational history.
- Looking at a bigger context in the phrase structure tree.
- Enriching the vocabulary of the tree in deterministic ways.
14. Search Methods
- For certain classes of probabilistic grammars, efficient algorithms exist to find the highest probability parse in polynomial time.
- Viterbi algorithm: store the steps of a parse derivation in a tableau, extending only those partial parses that have the highest probability up to the current cell (see the sketch below).
- But for this, a one-to-one relationship between derivations and parses needs to exist.
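A minimal sketch of the Viterbi idea for a PCFG in Chomsky normal form (the grammar encoding and names below are assumptions for illustration): each tableau cell keeps, for every category, only the highest-probability way of deriving the corresponding span, which is what keeps the search polynomial.

    from collections import defaultdict

    def viterbi_cky(words, lexicon, rules):
        # Viterbi/CKY for a PCFG in Chomsky normal form (a toy sketch).
        # lexicon: word -> list of (tag, prob); rules: (B, C) -> list of
        # (A, prob) for binary rules A -> B C. best[(i, j)][X] is the
        # highest probability of deriving words[i:j] from category X.
        n = len(words)
        best = defaultdict(dict)
        for i, w in enumerate(words):
            for tag, p in lexicon.get(w, []):
                best[(i, i + 1)][tag] = max(p, best[(i, i + 1)].get(tag, 0.0))
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):  # split point between children
                    for B, pb in best[(i, k)].items():
                        for C, pc in best[(k, j)].items():
                            for A, pr in rules.get((B, C), []):
                                p = pr * pb * pc
                                if p > best[(i, j)].get(A, 0.0):
                                    best[(i, j)][A] = p
        # Score of the best parse only; backpointers would give the tree.
        return best[(0, n)].get("S", 0.0)

    # Toy grammar: S -> NP VP, NP -> DT NN, VP -> VBD NP
    rules = {("NP", "VP"): [("S", 1.0)],
             ("DT", "NN"): [("NP", 0.7)],
             ("VBD", "NP"): [("VP", 0.8)]}
    lexicon = {"the": [("DT", 1.0)], "dog": [("NN", 0.5)],
               "bit": [("VBD", 1.0)], "man": [("NN", 0.3)]}
    print(viterbi_cky("the dog bit the man".split(), lexicon, rules))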
15. Search Methods (Cont.)
- Finding the best parse becomes exponential if no one-to-one relationship exists between derivations and parses.
- The stack decoding algorithm:
- A uniform-cost search algorithm.
- Expands the least-cost node first.
- Beam search (see the sketch below):
- Only keep and extend the best partial results.
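A generic beam search sketch (the extend and is_complete callbacks are hypothetical stand-ins for a parser's step function): at each step only the beam_width most probable partial derivations survive, trading guaranteed optimality for speed.

    import heapq

    def beam_search(initial, extend, is_complete, beam_width=5):
        # Keep only the beam_width most probable partial derivations.
        # extend(d) yields (step_prob, successor) pairs; is_complete(d)
        # tests whether d is a full parse. A sketch, not a parser.
        beam = [(1.0, initial)]
        while beam:
            done = [(p, d) for p, d in beam if is_complete(d)]
            if done:
                return max(done, key=lambda x: x[0])
            successors = [(p * q, d2)
                          for p, d in beam
                          for q, d2 in extend(d)]
            beam = heapq.nlargest(beam_width, successors, key=lambda x: x[0])
        return None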
16. Search Methods (Cont.)
- A* search
- Uniform-cost search will expand all partial derivations out to a certain distance.
- A best-first search algorithm judges which derivation to expand based on how near to a complete solution it is.
- A* does both, by working out the probability of the steps already taken and optimistically estimating the probability of the derivational steps still left to take.
- Optimal and efficient.
17. Non-lexicalized Treebank Grammars
- Non-lexicalized parsers operate over word categories.
- Disadvantage: less information.
- Advantage: easier to build; issues of smoothing and efficiency are less severe.
18. PCFG Estimation from a Treebank (Charniak, 1996)
- Uses Penn Treebank POS and phrasal categories to induce a maximum likelihood PCFG,
- using the relative frequency of local trees as the estimates for the rules (sketched below),
- with no attempt at smoothing or collapsing of rules.
- Works surprisingly well: the majority of parsing decisions are mundane and can be handled well by an unlexicalized PCFG.
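A toy sketch of the relative-frequency estimation described above (the nested-tuple tree format and function name are assumptions for illustration): count each local tree and divide by the count of its parent category, with no smoothing.

    from collections import Counter

    def mle_pcfg(trees):
        # Relative-frequency PCFG estimates: P(rule) = count(rule) /
        # count(parent category). Trees are nested tuples
        # (label, child, ...), with leaf words as plain strings.
        rule_counts, parent_counts = Counter(), Counter()

        def walk(node):
            label, children = node[0], node[1:]
            if all(isinstance(c, str) for c in children):
                rhs = children                       # preterminal -> word
            else:
                rhs = tuple(c[0] for c in children)  # internal rule
                for c in children:
                    walk(c)
            rule_counts[(label, rhs)] += 1
            parent_counts[label] += 1

        for t in trees:
            walk(t)
        return {rule: n / parent_counts[rule[0]]
                for rule, n in rule_counts.items()}

    tree = ("S", ("NP", ("DT", "the"), ("NN", "dog")),
                 ("VP", ("VBD", "barked")))
    print(mle_pcfg([tree]))  # every rule occurs once here, so all 1.0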
19. Partially Unsupervised Learning (Pereira and Schabes, 1992)
- The parameter estimation space for realistic-sized PCFGs is very big.
- We try to encourage the probabilities into a good region of the parameter space.
- Begin with a Chomsky normal form grammar with a limited number of non-terminals and POS tags.
- Train on Penn Treebank sentences:
- Ignore the non-terminal labels, but use the treebank bracketing.
- Use a modified inside-outside algorithm, constrained to consider only parses that do not cross the Penn Treebank brackets.
20. Data Oriented Parsing
- Use whichever fragments of trees appear to be
useful.
21. Data Oriented Parsing (Cont.)
- Multiple fundamentally distinct derivations of a single tree.
- Parse using Monte Carlo simulation methods:
- The probability is estimated by taking random samples of derivations.
22. Lexicalized Grammars
- Include more information about the individual
words when making decisions about the parse tree
structure.
23. History Based Grammars (HBG)
- All prior parse decisions could influence the following parse decisions in the derivation (Black et al., 1993).
- Use decision trees to decide which features in the derivational history were important in determining the expansion of the current node.
- Consider only nodes on the path to the root.
24. Dependency Based Models (Collins, 1996)
- A lexicalized, dependency-grammar-like framework.
- baseNP units, other words, and the dependencies between them are captured.
- Dependencies are derived from purely categorical labels by working directly with the phrase structures from the Penn Treebank.
- Simpler and quickly computable, with good performance.