1
CS 904: Natural Language Processing
Probabilistic Parsing
  • L. Venkata Subramaniam
  • March 21, 2002

2
Chunking and Grammar Induction
  • Chunking: recognizing higher-level units of
    structure that allow us to compress our
    description of a sentence.
  • Grammar induction: explaining the structure of
    the chunks found across different sentences.
  • Parsing can be viewed as implementing chunking.

3
Parsing for Disambiguation
  • Probabilities for determining the sentence:
    choose the sequence of words from a word lattice
    with the highest probability (language model).
  • Probabilities for speedier parsing: prune the
    search space of the parser.
  • Probabilities for choosing between parses: choose
    the most likely among the many parses of the
    input sentence.

4
Treebanks
  • A collection of example parses.
  • A commonly used treebank is the Penn Treebank.
  • The induction problem is now that of extracting
    the grammatical knowledge that is implicit in the
    example parses.

5
Parsing Models vs Language Models
  • Parsing is working out parse trees for a given
    sentence according to some grammar G.
  • In probabilistic parsing, we rank the parses.
  • Given a probabilistic parsing model, the job of
    the parser is to find the most probable parse of
    a sentence.
  • We can also define a language model, which assigns
    a probability to every tree generated by the
    grammar (the contrast is written out below).
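A compact way to state the contrast (standard notation, not taken from the slides): the parsing model ranks the candidate trees t for a fixed sentence s, while the language model sums the probability of every tree whose yield is s.

\hat{t} = \arg\max_{t} P(t \mid s, G)                 % parsing model: best parse of s
P(s) = \sum_{t \,:\, \mathrm{yield}(t) = s} P(t)      % language model: probability of s itself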

6
Weakening the independence assumptions of PCFGs
  • In PCFGs we make a number of independence
    assumptions.
  • Context: humans make wide use of context, such as
    who we are talking to, where we are, and the
    prior discourse context.
  • People find semantically intuitive readings for
    sentences.
  • We need to incorporate these sources of
    information to build better parsers than PCFGs.

7
Weakening the Independence Assumptions (Cont.)
  • Lexicalization: the PCFG independence assumptions
    do not take into consideration the particular
    words in the sentence.
  • We need to include more information about the
    individual words when making decisions about the
    parse tree structure.
  • Structural context: certain constituent types
    have locational preferences in the parse tree.

8
Tree Probabilities and Derivational Probabilities
  • In the PCFG case, the order in which we rewrite
    the nonterminals when deriving a tree does not
    alter the tree probability: the probability is
    simply the product of the probabilities of the
    rules used, as in the sketch below.
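A minimal sketch of that claim; the toy grammar, the rule probabilities, and the (label, [children]) tree encoding are made-up assumptions, not anything from the course.

# Hypothetical rule probabilities for a toy PCFG.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.4,
    ("VP", ("VBD",)): 0.7,
}

def tree_prob(tree):
    """tree = (label, [children]); a leaf is a plain word string."""
    if isinstance(tree, str):
        return 1.0
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB.get((label, rhs), 1.0)   # lexical rules treated as probability 1 here
    for child in children:
        p *= tree_prob(child)
    return p

# tree_prob(("S", [("NP", [("DT", ["the"]), ("NN", ["dog"])]),
#                  ("VP", [("VBD", ["barked"])])]))
# gives 1.0 * 0.4 * 0.7 = 0.28, whatever order the rules are applied in.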

9
Probabilistic Left-Corner Grammars
  • Top-down parsing
  • tries to predict the child nodes given knowledge
    only of the parent node, e.g. PCFG parsing.
  • Left-corner parsing
  • tries to predict the child nodes using the left
    corner and the goal category rather than just the
    parent.
  • A combination of bottom-up and top-down parsing.

10
Probabilistic LCG (Cont.)
  • P(C -> lc c2 ... cn | lc, gc): the expansion of a
    node C is conditioned on its left corner lc and
    its goal category gc, for example

P(VP -> VBD NP PP | VBD, S)
11
Phrase Structure Grammars and Dependency Grammars
  • In a dependency grammar, one word is the head of
    a sentence, and all other words are either a
    dependent of that word, or else dependent on some
    other word which connects to the head word
    through a series of dependencies.
  • Lexicalized: dependencies between words are
    captured directly.
  • Gives a way of decomposing phrase structure rules.

12
Evaluation
  • Exact match criterion: compare the parser's output
    with hand parses of the sentences; give 1 for an
    exact match and 0 for any mistake.
  • Parseval measures: measures based on precision,
    recall, and crossing brackets (sketched below).
    Not very discriminating.
  • Success in real tasks.
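A rough sketch of the Parseval-style measures, assuming brackets have already been extracted from the gold and test parses as (label, start, end) tuples; that tuple format is an illustrative choice, not a standard API.

def parseval(gold_brackets, test_brackets):
    """Labelled precision, recall, and a crossing-brackets count (sketch)."""
    gold, test = set(gold_brackets), set(test_brackets)
    correct = len(gold & test)
    precision = correct / len(test) if test else 0.0
    recall = correct / len(gold) if gold else 0.0

    def crosses(a, b):
        # Spans overlap without either containing the other.
        return (a[1] < b[1] < a[2] < b[2]) or (b[1] < a[1] < b[2] < a[2])

    crossing = sum(1 for t in test if any(crosses(t, g) for g in gold))
    return precision, recall, crossing

# parseval([("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)],
#          [("NP", 0, 2), ("NP", 3, 5), ("S", 0, 5)])
# returns (2/3, 2/3, 0) under this encoding.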

13
Equivalent Models
  • Compare models in terms of what information is
    being used to condition the prediction of what.
  • Improving the models by
  • remembering more of the derivational history,
  • looking at a bigger context in the phrase
    structure tree,
  • enriching the vocabulary of the tree in
    deterministic ways.

14
Search Methods
  • For certain classes of probabilistic grammars,
    efficient algorithms exist for finding the
    highest-probability parse in polynomial time.
  • Viterbi algorithm: store the partial parses in a
    tableau and, in each cell, extend only the
    highest-probability analysis found so far (a
    CKY-style sketch follows below).
  • But for this a one-to-one relationship between
    derivations and parses needs to exist.
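A minimal CKY-style Viterbi sketch for a PCFG in Chomsky normal form; the toy grammar, lexicon, and probabilities are invented, and backpointers are omitted, so this only returns the score of the best S over the whole sentence.

from collections import defaultdict

# Toy grammar: binary rules indexed by their right-hand side, plus a lexicon.
BINARY = {("NP", "VP"): [("S", 1.0)],
          ("DT", "NN"): [("NP", 0.4)],
          ("VBD", "NP"): [("VP", 0.7)]}
LEXICAL = {"the": [("DT", 1.0)], "dog": [("NN", 0.3)],
           "bit": [("VBD", 0.2)], "man": [("NN", 0.2)]}

def viterbi_parse(words):
    n = len(words)
    best = defaultdict(dict)                      # best[(i, j)][label] = max probability
    for i, w in enumerate(words):
        for label, p in LEXICAL.get(w, []):
            best[(i, i + 1)][label] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):             # split point
                for left, pl in best[(i, k)].items():
                    for right, pr in best[(k, j)].items():
                        for parent, p_rule in BINARY.get((left, right), []):
                            p = p_rule * pl * pr
                            if p > best[(i, j)].get(parent, 0.0):
                                best[(i, j)][parent] = p
    return best[(0, n)].get("S", 0.0)

# viterbi_parse("the dog bit the man".split()) -> probability of the best parse.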

15
Search Methods (Cont.)
  • Finding the best parse becomes exponential if no
    one-to-one relationship exists between the
    derivations and parses.
  • The stack decoding algorithm
  • a uniform-cost search algorithm,
  • expands the least-cost node first.
  • Beam search
  • keeps and extends only the best partial results
    (a minimal sketch follows below).
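A tiny sketch of the beam idea only; the expand and score functions are placeholders that a real parser would have to supply.

def beam_search(initial, expand, score, beam_width=10, max_steps=50):
    """Keep only the beam_width best-scoring partial derivations at each step."""
    beam = [initial]
    for _ in range(max_steps):
        candidates = [succ for item in beam for succ in expand(item)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]            # prune everything else
    return beam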

16
Search Methods (Cont.)
  • A* search
  • Uniform-cost search will expand all partial
    derivations up to a certain cost.
  • A best-first search algorithm judges which
    derivation to expand based on how near to a
    complete solution it is.
  • A* does both, by combining the probability of the
    steps already taken with an optimistic estimate of
    the probability of the derivational steps still to
    be taken (see the sketch below).
  • Optimal and efficient.
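A generic A* sketch, using negative log probabilities as costs so that "optimistic" means a heuristic that never overestimates the remaining cost; expand, heuristic, and is_goal are placeholders, not part of any particular parser.

import heapq

def a_star(start, expand, heuristic, is_goal):
    # Frontier entries are (f = g + h, g, tie-breaker, state); the tie-breaker
    # keeps heapq from ever comparing two states directly.
    frontier = [(heuristic(start), 0.0, 0, start)]
    counter = 1
    while frontier:
        f, g, _, state = heapq.heappop(frontier)
        if is_goal(state):
            return state, g                       # g is the best total cost found
        for succ, step_cost in expand(state):     # step_cost = -log P(step)
            g2 = g + step_cost
            heapq.heappush(frontier, (g2 + heuristic(succ), g2, counter, succ))
            counter += 1
    return None, float("inf")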

17
Non-lexicalized Treebank Grammars
  • Non-lexicalized parsers operate over word
    categories rather than individual words.
  • Disadvantage: less information.
  • Advantage: easier to build; issues of smoothing
    and efficiency are less severe.

18
PCFG Estimation from Treebank (Charniak, 1996)
  • Uses the Penn Treebank POS and phrasal categories
    to induce a maximum likelihood PCFG
  • by using the relative frequency of local trees as
    the estimates for the rules (see the sketch below),
  • with no attempt at smoothing or collapsing rules.
  • Works surprisingly well: the majority of parsing
    decisions are mundane and can be handled well by
    an unlexicalized PCFG.
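A sketch of relative-frequency estimation, reusing the (label, [children]) tree encoding assumed in the earlier sketches: count each local tree and divide by the count of its parent label.

from collections import Counter

def estimate_pcfg(treebank):
    """Maximum-likelihood rule probabilities by relative frequency (no smoothing)."""
    rule_counts, lhs_counts = Counter(), Counter()

    def collect(tree):
        if isinstance(tree, str):                 # a bare word: no rule to count here
            return
        label, children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        rule_counts[(label, rhs)] += 1
        lhs_counts[label] += 1
        for child in children:
            collect(child)

    for tree in treebank:
        collect(tree)
    return {rule: count / lhs_counts[rule[0]]
            for rule, count in rule_counts.items()}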

19
Partially Unsupervised Learning (Pereira and
Schabes, 1992)
  • The parameter estimation space for
    realistic-sized PCFGs is very big.
  • We try to encourage the probabilities into a good
    region in the parameter space.
  • Begin with a Chomsky normal form grammar with
    limited non-terminals and POS tags.
  • Train on Penn Treebank sentences,
  • ignoring the non-terminal labels but using the
    treebank bracketing.
  • Use a modified Inside-Outside algorithm
    constrained to consider only parses that do not
    cross the Penn Treebank brackets (a simple form of
    the constraint is sketched below).
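The bracketing constraint itself is easy to express as a span test; this sketch assumes both the candidate constituent and the treebank annotation are given as (start, end) index pairs.

def consistent(span, treebank_brackets):
    """True if the candidate span does not cross any annotated bracket."""
    i, j = span
    for a, b in treebank_brackets:
        if (a < i < b < j) or (i < a < j < b):    # partial overlap = a crossing
            return False
    return True

# The constrained re-estimation then simply skips any chart entry whose
# span fails this test.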

20
Data Oriented Parsing
  • Use whichever fragments of trees appear to be
    useful.

21
Data Oriented Parsing (Cont.)
  • There are multiple, fundamentally distinct
    derivations of a single tree.
  • Parse using Monte Carlo simulation methods:
  • the probability of a parse is estimated by taking
    random samples of derivations (see the sketch
    below).
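A sketch of the Monte Carlo flavour only: given some way of sampling a random derivation for the sentence (sample_derivation here is a placeholder, assumed to return a hashable tree representation such as a bracketed string), the tree produced most often approximates the most probable parse.

from collections import Counter

def most_probable_parse(sample_derivation, n_samples=1000):
    counts = Counter(sample_derivation() for _ in range(n_samples))
    tree, freq = counts.most_common(1)[0]
    return tree, freq / n_samples                 # tree and its estimated probability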

22
Lexicalized Grammars
  • Include more information about the individual
    words when making decisions about the parse tree
    structure.

23
History Based Grammars (HBG)
  • All prior parse decisions can influence the
    following parse decisions in the derivation
    (Black et al. 1993).
  • Use decision trees to decide which features of
    the derivational history are important in
    determining the expansion of the current node.
  • Consider only nodes on the path to the root.

24
Dependency Based Models (Collins, 1996)
  • A lexicalized, dependency-grammar-like framework.
  • baseNP units, the other words, and the
    dependencies between them are captured.
  • Dependencies are derived from purely categorical
    labels by working directly with the phrase
    structures of the Penn Treebank (a rough sketch of
    the idea follows below).
  • Simple and quickly computable, with good
    performance.
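A rough sketch of turning a phrase-structure tree into word-to-word dependencies with a head-child table, in the spirit of the framework above; the head table, the tree encoding, and the omission of baseNP chunking are all simplifying assumptions.

HEAD_CHILD = {"S": "VP", "VP": "VBD", "NP": "NN"}   # which child label supplies the head

def dependencies(tree, deps=None):
    """tree = (label, [children]); a preterminal has a single word string as its child."""
    if deps is None:
        deps = []
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return children[0], deps                    # preterminal: the word is the head
    child_heads = [dependencies(c, deps)[0] for c in children]
    want = HEAD_CHILD.get(label, children[0][0])
    head_idx = next((i for i, c in enumerate(children) if c[0] == want), 0)
    head = child_heads[head_idx]
    for i, w in enumerate(child_heads):
        if i != head_idx:
            deps.append((w, head))                  # (dependent word, head word)
    return head, deps

# dependencies(("S", [("NP", [("DT", ["the"]), ("NN", ["dog"])]),
#                     ("VP", [("VBD", ["barked"])])]))
# returns ("barked", [("the", "dog"), ("dog", "barked")]).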