Title: CS 904 Natural Language Processing: Probabilistic Parsing
1. CS 904 Natural Language Processing: Probabilistic Parsing
- L. Venkata Subramaniam
- March 21, 2002
2. Chunking and Grammar Induction
- Chunking: recognizing higher-level units of structure that allow us to compress our description of a sentence.
- Grammar induction: explaining the structure of the chunks found across different sentences.
- Parsing can be considered as implementing chunking.
3. Parsing for Disambiguation
- Probabilities for determining the sentence: choose the sequence of words from a word lattice with the highest probability (language model).
- Probabilities for speedier parsing: prune the search space of the parser.
- Probabilities for choosing between parses: choose the most likely among the many parses of the input sentence.
4. Treebanks
- A collection of example parses.
- A commonly used treebank is the Penn Treebank.
- The induction problem is now that of extracting
the grammatical knowledge that is implicit in the
example parses.
5. Parsing Models vs. Language Models
- Parsing is working out parse trees for a given sentence according to some grammar G.
- In probabilistic parsing, we rank the parses.
- Given a probabilistic parsing model, the job of the parser is to find the most probable parse of a sentence.
- We can also define a language model, which assigns a probability to all trees generated by a grammar.
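As a sketch of the distinction (notation assumed here, following standard usage): a parsing model estimates P(t | s) directly, and the parser returns the tree t that maximizes P(t | s); a language model assigns a joint probability P(t, s) to every tree, so the probability of a sentence is P(s) = sum over t of P(t, s), and the same most probable parse can be read off as the t that maximizes P(t, s).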
6. Weakening the Independence Assumptions of PCFGs
- In PCFGs we make a number of independence assumptions.
- Context: humans make wide use of context.
- Context of who we are talking to, where we are, and the prior context of the conversation.
- Prior discourse context.
- People find semantically intuitive readings for sentences.
- We need to incorporate these sources of information to build better parsers than PCFGs.
7. Weakening the Independence Assumptions (Cont.)
- Lexicalization: the PCFG independence assumptions do not take into consideration the particular words in the sentence.
- We need to include more information about the individual words when making decisions about the parse tree structure.
- Structural context: certain node types have locational preferences in the parse tree (for example, NP expansions differ between subject and object position).
8. Tree Probabilities and Derivational Probabilities
- In the PCFG case, the way we derive the tree (the order of rewriting) does not alter the tree probability.
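As a brief sketch of why (standard PCFG property, notation assumed): the probability of a tree t is the product of the probabilities of the rules used in it, P(t) = product over rules r in t of P(r), so any derivation order (leftmost, rightmost, or otherwise) multiplies exactly the same factors and yields the same value.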
9. Probabilistic Left-Corner Grammars
- Top-down parsing
- Tries to predict the child nodes given knowledge only of the parent node, e.g. a PCFG.
- Left-corner parsing
- Tries to predict the child nodes using the left corner and the goal category rather than just the parent.
- A combination of bottom-up and top-down parsing.
10. Probabilistic LCG (Cont.)
- P(C → lc c2 ... cn | lc, gc): the probability of expanding C as lc c2 ... cn, conditioned on the left corner lc and the goal category gc.
- For example: P(VP → VBD NP PP | VBD, S).
11. Phrase Structure Grammars and Dependency Grammars
- In a dependency grammar, one word is the head of a sentence, and all other words are either a dependent of that word, or else dependent on some other word which connects to the head word through a series of dependencies.
- Lexicalized: dependencies between words are captured directly.
- Gives a way of decomposing phrase structure rules.
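For example (an illustration not in the original slides): in "the old man ate fish", the head of the sentence is "ate"; "man" and "fish" are dependents of "ate", while "the" and "old" are dependents of "man" and connect to the head through it.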
12. Evaluation
- Exact match criterion: compare parser output with hand parses of the sentences; score 1 for an exact match and 0 for any mistake.
- Parseval measures: measures based on precision, recall, and crossing brackets (see the sketch below). Not very discriminating.
- Success in real tasks.
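A minimal sketch of the Parseval idea, assuming parses are represented as sets of (label, start, end) spans (this representation and the function name are illustrative, not a standard tool): precision is the fraction of proposed brackets found in the gold parse, recall is the fraction of gold brackets that were proposed, and a crossing occurs when a proposed span overlaps a gold span without either containing the other.

    def parseval(proposed, gold):
        # Toy Parseval sketch: spans are (label, start, end) tuples.
        matched = proposed & gold
        precision = len(matched) / len(proposed) if proposed else 0.0
        recall = len(matched) / len(gold) if gold else 0.0

        # A proposed span crosses a gold span if they overlap but
        # neither contains the other (labels are ignored here).
        def crosses(a, b):
            (_, s1, e1), (_, s2, e2) = a, b
            return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1

        crossing = sum(1 for p in proposed if any(crosses(p, g) for g in gold))
        return precision, recall, crossing

    # Example: the proposed NP (0, 3) crosses the gold VP (2, 5).
    gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)}
    prop = {("S", 0, 5), ("NP", 0, 3), ("VP", 2, 5)}
    print(parseval(prop, gold))  # (0.666..., 0.666..., 1)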
13. Equivalent Models
- Compare models in terms of what information is being used to condition the prediction of what.
- Improving the models by:
- Remembering more of the derivational history.
- Looking at a bigger context in the phrase structure tree.
- Enriching the vocabulary of the tree in deterministic ways.
14. Search Methods
- For certain classes of probabilistic grammars, efficient algorithms exist to find the highest probability parse in polynomial time.
- Viterbi algorithm: store the steps of a parse derivation in a tableau, extending only those partial parses that have the highest probability up to the current cell (see the sketch below).
- But for this, a one-to-one relationship between derivations and parses needs to exist.
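A minimal sketch of the Viterbi idea for a PCFG in Chomsky normal form (the grammar encoding and names below are assumptions for illustration): each tableau cell keeps, for every category, only the highest-probability way of deriving the corresponding span, which is what keeps the search polynomial.

    from collections import defaultdict

    def viterbi_cky(words, lexicon, rules):
        # Viterbi/CKY for a PCFG in Chomsky normal form (a toy sketch).
        # lexicon: word -> list of (tag, prob); rules: (B, C) -> list of
        # (A, prob) for binary rules A -> B C. best[(i, j)][X] is the
        # highest probability of deriving words[i:j] from category X.
        n = len(words)
        best = defaultdict(dict)
        for i, w in enumerate(words):
            for tag, p in lexicon.get(w, []):
                best[(i, i + 1)][tag] = max(p, best[(i, i + 1)].get(tag, 0.0))
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):  # split point between children
                    for B, pb in best[(i, k)].items():
                        for C, pc in best[(k, j)].items():
                            for A, pr in rules.get((B, C), []):
                                p = pr * pb * pc
                                if p > best[(i, j)].get(A, 0.0):
                                    best[(i, j)][A] = p
        # Score of the best parse only; backpointers would give the tree.
        return best[(0, n)].get("S", 0.0)

    # Toy grammar: S -> NP VP, NP -> DT NN, VP -> VBD NP
    rules = {("NP", "VP"): [("S", 1.0)],
             ("DT", "NN"): [("NP", 0.7)],
             ("VBD", "NP"): [("VP", 0.8)]}
    lexicon = {"the": [("DT", 1.0)], "dog": [("NN", 0.5)],
               "bit": [("VBD", 1.0)], "man": [("NN", 0.3)]}
    print(viterbi_cky("the dog bit the man".split(), lexicon, rules))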
15. Search Methods (Cont.)
- Finding the best parse becomes exponential if no one-to-one relationship exists between derivations and parses.
- The stack decoding algorithm:
- A uniform-cost search algorithm.
- Expands the least-cost node first.
- Beam search (see the sketch below):
- Only keep and extend the best partial results.
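A generic beam search sketch (the extend and is_complete callbacks are hypothetical stand-ins for a parser's step function): at each step only the beam_width most probable partial derivations survive, trading guaranteed optimality for speed.

    import heapq

    def beam_search(initial, extend, is_complete, beam_width=5):
        # Keep only the beam_width most probable partial derivations.
        # extend(d) yields (step_prob, successor) pairs; is_complete(d)
        # tests whether d is a full parse. A sketch, not a parser.
        beam = [(1.0, initial)]
        while beam:
            done = [(p, d) for p, d in beam if is_complete(d)]
            if done:
                return max(done, key=lambda x: x[0])
            successors = [(p * q, d2)
                          for p, d in beam
                          for q, d2 in extend(d)]
            beam = heapq.nlargest(beam_width, successors, key=lambda x: x[0])
        return None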
16. Search Methods (Cont.)
- A* search
- Uniform-cost search will expand all partial derivations out to a certain distance.
- A best-first search algorithm judges which derivation to expand based on how near to a complete solution it is.
- A* does both, by working out the probability of the steps already taken and optimistically estimating the probability of the derivational steps still left to take.
- Optimal and efficient.
17. Non-lexicalized Treebank Grammars
- Non-lexicalized parsers operate over word categories.
- Disadvantage: less information.
- Advantage: easier to build; issues of smoothing and efficiency are less severe.
18. PCFG Estimation from a Treebank (Charniak, 1996)
- Uses Penn Treebank POS and phrasal categories to induce a maximum likelihood PCFG,
- using the relative frequency of local trees as the estimates for the rules (sketched below),
- with no attempt at smoothing or collapsing of rules.
- Works surprisingly well: the majority of parsing decisions are mundane and can be handled well by an unlexicalized PCFG.
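A toy sketch of the relative-frequency estimation described above (the nested-tuple tree format and function name are assumptions for illustration): count each local tree and divide by the count of its parent category, with no smoothing.

    from collections import Counter

    def mle_pcfg(trees):
        # Relative-frequency PCFG estimates: P(rule) = count(rule) /
        # count(parent category). Trees are nested tuples
        # (label, child, ...), with leaf words as plain strings.
        rule_counts, parent_counts = Counter(), Counter()

        def walk(node):
            label, children = node[0], node[1:]
            if all(isinstance(c, str) for c in children):
                rhs = children                       # preterminal -> word
            else:
                rhs = tuple(c[0] for c in children)  # internal rule
                for c in children:
                    walk(c)
            rule_counts[(label, rhs)] += 1
            parent_counts[label] += 1

        for t in trees:
            walk(t)
        return {rule: n / parent_counts[rule[0]]
                for rule, n in rule_counts.items()}

    tree = ("S", ("NP", ("DT", "the"), ("NN", "dog")),
                 ("VP", ("VBD", "barked")))
    print(mle_pcfg([tree]))  # every rule occurs once here, so all 1.0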
19. Partially Unsupervised Learning (Pereira and Schabes, 1992)
- The parameter estimation space for realistic-sized PCFGs is very big.
- We try to encourage the probabilities into a good region of the parameter space.
- Begin with a Chomsky normal form grammar with a limited number of non-terminals and POS tags.
- Train on Penn Treebank sentences:
- Ignore the non-terminal labels, but use the treebank bracketing.
- Use a modified inside-outside algorithm, constrained to consider only parses that do not cross the Penn Treebank brackets.
20. Data Oriented Parsing
- Use whichever fragments of trees appear to be
useful.
21. Data Oriented Parsing (Cont.)
- Multiple fundamentally distinct derivations of a single tree.
- Parse using Monte Carlo simulation methods:
- The probability is estimated by taking random samples of derivations.
22. Lexicalized Grammars
- Include more information about the individual
words when making decisions about the parse tree
structure.
23. History Based Grammars (HBG)
- All prior parse decisions could influence the following parse decisions in the derivation (Black et al., 1993).
- Use decision trees to decide which features in the derivational history were important in determining the expansion of the current node.
- Consider only nodes on the path to the root.
24. Dependency Based Models (Collins, 1996)
- A lexicalized, dependency-grammar-like framework.
- baseNP units, other words, and the dependencies between them are captured.
- Dependencies are derived from purely categorical labels by working directly with the phrase structures from the Penn Treebank.
- Simpler and quickly computable, with good performance.