Title: Ch 12. Probabilistic Parsing
Contents
- Introduction
- Some concepts
- Parsing for disambiguation
- Treebank
- Weakening the independence assumptions of PCFGs
- Tree probabilities and derivation probabilities
- Phrase structure grammars and dependency grammars
- Evaluation
- Equivalent models
Introduction
- The practice of parsing
  - Can be considered an implementation of the idea of chunking: recognizing higher-level units of structure that allow us to compress our description of a sentence.
- Grammar induction
  - One way to capture the regularity of chunks over different sentences is to learn the structure of the chunks.
  - But the structure found depends on the implicit inductive bias of the learning program.
  - We need to decide what structure we want the model to find before starting to build it, i.e., decide what we want to do with parsed sentences.
Introduction
- Goals of parsing
  - As a first step toward semantic representation
  - Detecting phrasal chunks for indexing in an IR system
  - As a language model
- For these goals
  - The overall goal is to produce a system that can place a provably useful structure over arbitrary sentences, i.e., to build a parser.
  - There is no need to insist that one begins with a tabula rasa.
Some concepts: Parsing for disambiguation
- Three ways to use probabilities in a parser
  - Probabilities for determining the sentence
    - When the actual input is uncertain, determine the most probably correct sentence (refer to Figure 12.1).
  - Probabilities for speedier parsing
    - To find the best parse more quickly.
  - Probabilities for choosing between parses
    - Choose the most likely parse among the many parses of the input string.
Treebank
- Pure grammar induction approaches
  - Tend not to produce the parse trees that people want.
- One approach to this problem
  - Give a learning tool some examples of the kinds of parse trees that are wanted.
  - A collection of such example parses is a treebank.
- Penn Treebank
  - Features (refer to Figure 12.1):
    - Straightforward Lisp-style notation via bracketing
    - Phrase structure is fairly flat
    - Makes some attempt to indicate grammatical and semantic functions
    - Uses empty nodes
Treebank
- Using a treebank
  - The induction problem becomes one of extracting the grammatical knowledge implicit in the example parses.
  - Determining a PCFG from a treebank: count the frequencies of local trees, and then normalize these to give probabilities (see the sketch below).
  - This is done rather than relying on a linguist to write the grammar.
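Below is a minimal Python sketch of reading a PCFG off a treebank by counting local trees and normalizing, as described above. The nested-tuple tree encoding and the toy example sentences are illustrative assumptions, not the Penn Treebank's own format.

```python
from collections import defaultdict

def estimate_pcfg(treebank):
    """Estimate PCFG rule probabilities by relative frequency.

    `treebank` is a list of parse trees, each a nested tuple
    (label, child1, child2, ...) with plain strings as terminal leaves.
    """
    rule_counts = defaultdict(int)   # (lhs, rhs) -> count
    lhs_counts = defaultdict(int)    # lhs -> count

    def collect(node):
        if isinstance(node, str):    # terminal leaf: no rule to count
            return
        lhs = node[0]
        rhs = tuple(c if isinstance(c, str) else c[0] for c in node[1:])
        rule_counts[(lhs, rhs)] += 1
        lhs_counts[lhs] += 1
        for child in node[1:]:
            collect(child)

    for tree in treebank:
        collect(tree)

    # P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

# Tiny illustrative treebank of two parses
treebank = [
    ("S", ("NP", "they"), ("VP", ("V", "fish"))),
    ("S", ("NP", "fish"), ("VP", ("V", "swim"))),
]
print(estimate_pcfg(treebank))  # e.g. P(S -> NP VP) = 1.0
```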
Parsing Model vs. Language Model
- The idea of parsing
  - Take a sentence s and work out the parse trees for it according to some grammar G.
- In probabilistic parsing
  - Place a ranking on the possible parses showing how likely each one is,
  - or perhaps just return the most likely parse of a sentence.
- Definition of a probabilistic parsing model
  - P(t | s, G), where Σ_t P(t | s, G) = 1
  - The best parse: t̂ = argmax_t P(t | s, G)
Parsing Model vs. Language Model
- Parsing model
  - It is a little odd to use probabilities conditioned on a particular sentence.
  - Generally, one needs to base probability estimates on some more general class of data.
- More usual approach
  - Define a language model, which assigns a probability to all trees generated by the grammar.
Parsing Model vs. Language Model
- Language model
  - The joint probability P(t, s | G):
    - P(t, s | G) = P(t | G) if yield(t) = s, and 0 otherwise.
  - Under such a model, P(t | G), abbreviated P(t), is the probability of a particular parse of a sentence according to the grammar G.
- Overall probability of the sentences of the language
  - Σ_{t: yield(t) ∈ L} P(t) = 1
  - P(s) = Σ_t P(s, t) = Σ_{t: yield(t) = s} P(t)
- So,
  - t̂ = argmax_t P(t | s) = argmax_t P(t, s) / P(s) = argmax_t P(t, s)
  - Hence a language model can be used as a parsing model for the purpose of choosing between parses (see the sketch below).
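A small sketch of the point above: because P(s) is constant across the candidate parses of one sentence, picking the tree with the highest joint probability P(t, s) under the language model also picks the tree with the highest conditional probability P(t | s). The candidate labels and numbers below are hypothetical.

```python
def best_parse(candidate_parses):
    """Choose the most likely parse of one sentence.

    `candidate_parses` maps each candidate tree (any hashable label here)
    to its joint probability P(t, s) under the language model.  Since
    P(t | s) = P(t, s) / P(s) and P(s) is the same for every candidate,
    maximizing the joint probability also maximizes the conditional one.
    """
    return max(candidate_parses, key=candidate_parses.get)

# Hypothetical joint probabilities for two parses of the same sentence
parses = {"attach-PP-to-VP": 3e-7, "attach-PP-to-NP": 1e-7}
print(best_parse(parses))  # -> "attach-PP-to-VP"
```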
Parsing Model vs. Language Model
- Collins' work
  - A language model provides a better foundation for modeling.
Weakening the independence assumptions of PCFGs
- Discourse context
  - The prior discourse context influences our interpretation of later sentences.
  - Many sources of information are incorporated in real time while people parse sentences.
- PCFG independence assumptions
  - They imply that none of these factors is relevant to the probability of a parse tree.
  - In fact, all of these sources of evidence are relevant to, and might be usable for, disambiguating probabilistic parses:
    - collocations: more local semantic disambiguation;
    - prior text: an indication of discourse context.
Lexicalization
- Two weaknesses of the independence assumptions
  - Lack of lexicalization
  - Probabilities not being dependent on structural context
- Lack of lexicalization
  - Refer to Table 12.2: verbs differ in their subcategorization frames.
  - This suggests the need to include more information about what the actual words are when making decisions about the structure of the parse tree.
Lexicalization
- Lack of lexicalization
  - The issue of choosing phrasal attachment positions.
  - The lexical content of phrases almost always provides enough information to decide the correct attachment site,
  - but the syntactic category of the phrase normally provides very little information.
- Lexicalized CFG
  - Refer to 12.8.
  - Each phrasal node is marked by its head word (see the sketch below).
  - An effective strategy, but it does not capture some dependencies between pairs of non-heads.
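A minimal sketch of marking each phrasal node with its head word by percolating heads up from the leaves. The head-child table and the example tree are toy assumptions, not a full set of Penn Treebank head-finding rules.

```python
# Toy table naming which child supplies the head of each phrase type.
HEAD_CHILD = {"S": "VP", "VP": "V", "NP": "N", "PP": "P"}

def lexicalize(node):
    """Return (label(headword), children...) for a tree (label, children...)."""
    label, *children = node
    if len(children) == 1 and isinstance(children[0], str):
        return ("%s(%s)" % (label, children[0]), children[0])  # preterminal
    lex_children = [lexicalize(c) for c in children]
    # Pick the head child named by the table (default: first child).
    head = next((lc for c, lc in zip(children, lex_children)
                 if c[0] == HEAD_CHILD.get(label)), lex_children[0])
    head_word = head[0][head[0].index("(") + 1:-1]
    return ("%s(%s)" % (label, head_word), *lex_children)

tree = ("S", ("NP", "workers"), ("VP", ("V", "dumped"), ("NP", "sacks")))
print(lexicalize(tree)[0])  # -> "S(dumped)"
```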
Probabilities dependent on structural context
- Another weakness
  - PCFGs are also deficient on purely structural grounds.
  - The assumption of structural context-freeness remains (a grammatical assumption).
  - Refer to Table 12.3 and Table 12.4.
  - It would take a more thorough corpus study to understand some of the other effects, e.g., why pronouns are so infrequent in the second object position.
Tree probabilities and derivation probabilities
- Parse tree
  - Can be thought of as a compact record of a branching process, conditioned solely on the label of each node.
- Derivation, derivation model
  - A sequence of top-down rewrites until one has a phrase marker all of whose leaf nodes are terminals.
  - Refer to Figure 12.3 and (12.11).
  - The probability of a given parse tree is defined in terms of the probabilities of its derivations.
  - To estimate the probability of a tree, refer to (12.12).
Tree probabilities and derivation probabilities
- Canonical derivation
  - In many cases, the extra complication is unnecessary.
  - In the PCFG case, the choice of derivational order makes no difference: the final probability of a tree is the same either way.
  - We can simplify things by choosing a unique derivation for each tree, the canonical derivation (e.g., the leftmost derivation).
  - Then P(t) = P(d), where d is the canonical derivation of t (see the sketch below).
  - Whether this simplification is possible depends on the nature of the probabilistic conditioning in the model.
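A minimal sketch of the identity P(t) = P(d): under a PCFG, the probability of a tree is the product of the probabilities of the rules used in its canonical (leftmost) derivation. The toy grammar and tree below are illustrative assumptions.

```python
def tree_prob(node, rule_probs):
    """P(t) for a tree (label, children...) with string leaves."""
    if isinstance(node, str):          # terminal: contributes no rule
        return 1.0
    lhs = node[0]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in node[1:])
    p = rule_probs[(lhs, rhs)]         # probability of this local rewrite
    for child in node[1:]:
        p *= tree_prob(child, rule_probs)
    return p

rule_probs = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("workers",)): 0.2,
    ("NP", ("sacks",)): 0.1,
    ("VP", ("V", "NP")): 0.5,
    ("V", ("dumped",)): 0.3,
}
tree = ("S", ("NP", "workers"), ("VP", ("V", "dumped"), ("NP", "sacks")))
print(tree_prob(tree, rule_probs))  # 1.0 * 0.2 * 0.5 * 0.3 * 0.1 = 0.003
```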
Tree probabilities and derivation probabilities
- Derivation model
  - Form equivalence classes of derivational histories via an equivalencing function, and estimate probabilities over these classes (history-based grammars, IBM).
  - This framework includes PCFGs as a special case.
Phrase structure grammars and dependency grammars
- Dependency grammar
  - Describes linguistic structure in terms of dependencies between words; such a framework is referred to as a dependency grammar.
- In a dependency grammar,
  - one word is the head of a sentence, and
  - all other words are either a dependent of that word, or else dependent on some other word which connects to the head word through a sequence of dependencies.
Phrase structure grammars and dependency grammars
- Lauer's work
  - The relation between phrase structure grammars and dependency grammars.
  - The task: disambiguating compound nouns such as "phrase structure model".
  - Refer to (12.23), the phrase structure analyses: corpus evidence (the collocational bond) compares "phrase structure" vs. "structure model" (the adjacency model).
  - Refer to (12.24), the dependency structures: the comparison is between "phrase model" vs. "phrase structure" (the dependency model).
  - The dependency model outperforms the adjacency model (see the sketch below).
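A minimal sketch of the two models for a three-word compound w1 w2 w3: the adjacency model compares the bond between w1-w2 with w2-w3, while the dependency model compares w1-w2 with w1-w3. The association scores below are hypothetical stand-ins for corpus-derived collocational bonds.

```python
def choose_bracketing(w1, w2, w3, assoc, model="dependency"):
    """Return 'left' for [[w1 w2] w3] or 'right' for [w1 [w2 w3]].

    adjacency model:  compare bond(w1, w2) with bond(w2, w3)
    dependency model: compare bond(w1, w2) with bond(w1, w3)
    """
    left_score = assoc.get((w1, w2), 0.0)
    if model == "adjacency":
        right_score = assoc.get((w2, w3), 0.0)
    else:  # dependency model
        right_score = assoc.get((w1, w3), 0.0)
    return "left" if left_score >= right_score else "right"

# Hypothetical bonds for "phrase structure model"
assoc = {("phrase", "structure"): 0.9,
         ("structure", "model"): 0.4,
         ("phrase", "model"): 0.1}
print(choose_bracketing("phrase", "structure", "model", assoc))  # 'left'
```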
Phrase structure grammars and dependency grammars
- Lauer's work
  - Compare (12.24) with the lexicalized PCFG model (12.25).
  - Under the lexicalized PCFG, the two analyses share all factors except the probabilities of the two nodes that differ, so deciding between the possibilities comes down to comparing those two probabilities.
  - This is exactly equivalent to comparing the bond between "phrase model" vs. "phrase structure".
- Isomorphisms
  - There are isomorphisms between various kinds of dependency grammars and corresponding types of phrase structure grammars.
Phrase structure grammars and dependency grammars
- Two key advantages of dependency grammars
  - Disambiguation decisions are made directly in terms of word dependencies, because dependency grammars work directly in terms of these dependencies.
  - They give one a way of decomposing phrase structure rules and estimating their probabilities.
- A problem with parsers induced from the Penn Treebank
  - Because its trees are very flat, there are many rare kinds of flat local trees with many children.
  - On unseen data, one will encounter yet other such trees that one has never seen before.
  - A PCFG tries to estimate the probability of a whole local subtree at once.
Phrase structure grammars and dependency grammars
- How a dependency grammar decomposes phrases
  - By estimating the probability of each head-dependent relationship separately (see the sketch below).
- Steps
  - Suppose we have never seen the local tree in Figure 12.5(a) before.
  - Instead of PCFG-style backoff, decompose the tree into dependencies as in (b).
  - Trees like (c) and (d) provide evidence for those dependencies.
  - This enables a reasonable estimate,
  - but it makes a further important independence assumption,
  - and we need some system to account for the relative ordering of the dependencies.
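A minimal sketch of the decomposition described above: instead of estimating a whole flat local tree at once, approximate its probability as a product of separate head-dependent probabilities. The dependency probabilities are illustrative, and the product deliberately ignores dependent ordering, which is the further independence assumption noted above.

```python
def local_tree_prob(head, dependents, dep_prob):
    """Approximate P(local tree) as the product of P(dependent | head)."""
    p = 1.0
    for d in dependents:
        p *= dep_prob.get((d, head), 1e-6)  # tiny floor for unseen pairs
    return p

# Illustrative head-dependent probabilities
dep_prob = {("NP", "VB"): 0.4, ("PP", "VB"): 0.2, ("ADVP", "VB"): 0.1}
# A flat VP local tree headed by VB with three dependents:
print(local_tree_prob("VB", ["NP", "PP", "ADVP"], dep_prob))  # 0.4*0.2*0.1
```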
Evaluation
- How to evaluate the success of a statistical parser
  - Cross entropy of the model, as developed for language models.
- But
  - Cross entropy (or perplexity) measures only the probabilistic weak equivalence of models, not the tree structure.
  - Probabilistically weakly equivalent grammars have the same cross entropy even if they are not strongly equivalent, so it cannot distinguish between them for this task.
Evaluation
- Parse evaluation
  - The ultimate goal is usually to build a larger system aimed at some task.
- Task-based evaluation
  - A better way to evaluate parsers is to embed them in such a larger system and to investigate the differences that the parsers make.
- Tree accuracy (exact match)
  - The strictest criterion: 1 point if the parser gets the tree completely right, otherwise 0.
  - It is sensible for the evaluation criterion to match what one's parser is maximizing.
Evaluation
- Parse evaluation
  - PARSEVAL measures
    - Originated in an attempt to compare the performance of non-statistical parsers.
    - Have usually been applied in statistical NLP work.
  - Basic measures (see the sketch below)
    - Precision, recall, labeled precision, labeled recall, crossing brackets, crossing accuracy.
    - Refer to Figures 12.6 and 12.7.
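A minimal sketch of unlabeled PARSEVAL bracketing precision, recall, and crossing brackets over sets of span brackets; the example spans are illustrative.

```python
def parseval(candidate, gold):
    """Return (precision, recall, crossing) for sets of (start, end) spans."""
    cand, gold = set(candidate), set(gold)
    correct = len(cand & gold)
    precision = correct / len(cand) if cand else 0.0
    recall = correct / len(gold) if gold else 0.0
    # A candidate bracket "crosses" a gold bracket if they overlap
    # without one containing the other.
    crossing = sum(
        any(c[0] < g[0] < c[1] < g[1] or g[0] < c[0] < g[1] < c[1]
            for g in gold)
        for c in cand)
    return precision, recall, crossing

gold_spans = [(0, 7), (0, 2), (2, 7), (3, 7)]
cand_spans = [(0, 7), (0, 2), (2, 7), (2, 5)]
print(parseval(cand_spans, gold_spans))  # (0.75, 0.75, 1)
```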
Evaluation
- Problems with PARSEVAL
  - The PARSEVAL measures are not very discriminating: Charniak (1996)'s vanilla PCFG, which ignores all lexical content, already does quite well on them.
  - Doing well on the PARSEVAL measures is quite easy given the structure of the Penn Treebank.
  - PARSEVAL measures success at the level of individual decisions, whereas in NLP getting a sequence of consecutive decisions right is more important and harder.
Evaluation
- Problems with the Penn Treebank
  - The trees are too flat.
  - A non-standard adjunction structure is given to post-noun-head modifiers.
  - For such structures the PARSEVAL measures seem too harsh.