Probabilistic Context Free Grammar - PowerPoint PPT Presentation

Title:

Probabilistic Context Free Grammar

Slides: 60
Provided by: csta3

Transcript and Presenter's Notes


1
Probabilistic Context Free Grammar
2
Language structure is not linear
  • The velocity of seismic waves rises to

3
Context free grammars: a reminder
  • A CFG G consists of:
  • A set of terminals w^k, k = 1, ..., V
  • A set of nonterminals N^i, i = 1, ..., n
  • A designated start symbol, N^1
  • A set of rules, N^i → p^j (where p^j is a sequence
    of terminals and nonterminals)

4
A very simple example
  • G's rewrite rules:
  • S → aSb
  • S → ab
  • Possible derivations:
  • S → aSb → aabb
  • S → aSb → aaSbb → aaabbb
  • In general, G generates the language a^n b^n (n ≥ 1)
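The derivations above can be reproduced mechanically. The following is a minimal sketch (the function name `derive` is illustrative, not from the slides) that applies S → aSb repeatedly and then finishes with S → ab:

```python
def derive(n: int) -> list[str]:
    """Return the derivation steps that produce a^n b^n (n >= 1)."""
    if n < 1:
        raise ValueError("this grammar only generates strings with n >= 1")
    current = "S"
    steps = [current]
    for _ in range(n - 1):
        current = current.replace("S", "aSb")  # apply S -> aSb
        steps.append(current)
    current = current.replace("S", "ab")       # finish with S -> ab
    steps.append(current)
    return steps


print(" -> ".join(derive(3)))  # S -> aSb -> aaSbb -> aaabbb
```

Each sentential form contains exactly one S, so `str.replace` rewrites a single symbol per step.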

5
Modeling natural language
  • G is given by the rewrite rules
  • S → NP VP
  • NP → the N | a N
  • N → man | boy | dog
  • VP → V NP
  • V → saw | heard | sensed | sniffed
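Because this grammar has no recursion, its language is finite and can be enumerated. A small sketch, with the grammar written as a Python dictionary (the names `GRAMMAR` and `expand` are illustrative):

```python
from itertools import product

# The grammar from the slide; each right-hand side is a tuple of symbols.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("the", "N"), ("a", "N")],
    "N": [("man",), ("boy",), ("dog",)],
    "VP": [("V", "NP")],
    "V": [("saw",), ("heard",), ("sensed",), ("sniffed",)],
}


def expand(symbol: str) -> list[list[str]]:
    """Return every word sequence derivable from `symbol`."""
    if symbol not in GRAMMAR:  # a terminal expands to itself
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
print(len(sentences))  # 6 noun phrases * 4 verbs * 6 noun phrases = 144
print(sentences[0])    # the man saw the man
```

This exhaustive expansion only terminates for non-recursive grammars; a recursive rule would need a depth bound.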

6
Recursion can be included
  • G is given by the rewrite rules
  • S → NP VP
  • NP → the N | a N
  • N → man CP | boy CP | dog CP
  • VP → V NP
  • V → saw | heard | sensed | sniffed
  • CP → that VP | ε

7
Probabilistic Context Free Grammars
  • A PCFG G consists of:
  • A set of terminals w^k, k = 1, ..., V
  • A set of nonterminals N^i, i = 1, ..., n
  • A designated start symbol, N^1
  • A set of rules, N^i → p^j (where p^j is a sequence
    of terminals and nonterminals)
  • A corresponding set of probabilities on rules

8
Example
9
astronomers saw stars with ears
  • P(t1) = 0.0009072

10
astronomers saw stars with ears
  • P(t2) = 0.0006804
  • P(w_15) = P(t1) + P(t2) = 0.0015876
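The probability of a parse tree is the product of the probabilities of the rules it uses, and the sentence probability is the sum over its parses. The rule probabilities from the "Example" slide did not survive in this transcript, so the values below are an assumption: they are the widely used textbook grammar for this sentence, chosen because they reproduce the numbers above. Which attachment corresponds to t1 is likewise assumed.

```python
from math import prod

# ASSUMED rule probabilities (not present in the transcript).
RULE_PROB = {
    "S -> NP VP": 1.0,
    "PP -> P NP": 1.0,
    "VP -> V NP": 0.7,
    "VP -> VP PP": 0.3,
    "NP -> NP PP": 0.4,
    "P -> with": 1.0,
    "V -> saw": 1.0,
    "NP -> astronomers": 0.1,
    "NP -> stars": 0.18,
    "NP -> ears": 0.18,
}

# Rules shared by both parses: one lexical rule per word.
LEXICAL = ["NP -> astronomers", "V -> saw", "NP -> stars", "P -> with", "NP -> ears"]
# t1 (assumed): "with ears" attaches to the noun phrase "stars".
T1 = LEXICAL + ["S -> NP VP", "VP -> V NP", "NP -> NP PP", "PP -> P NP"]
# t2 (assumed): "with ears" attaches to the verb phrase "saw stars".
T2 = LEXICAL + ["S -> NP VP", "VP -> VP PP", "VP -> V NP", "PP -> P NP"]


def tree_probability(rules: list[str]) -> float:
    """Multiply the probabilities of all rules used in one parse tree."""
    return prod(RULE_PROB[rule] for rule in rules)


p_t1 = tree_probability(T1)
p_t2 = tree_probability(T2)
print(f"P(t1)   = {p_t1:.7f}")
print(f"P(t2)   = {p_t2:.7f}")
print(f"P(w_15) = {p_t1 + p_t2:.7f}")
```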

11
Training PCFGs
  • Given a corpus, it is possible to estimate rule
    probabilities that maximize its likelihood
  • This is regarded as a form of grammar induction
  • However, the rules of the grammar must be given in advance

12
Questions for PCFGs
  • What is the probability of a sentence w_1n given a
    grammar G, P(w_1n | G)?
  • Calculated using dynamic programming
  • What is the most likely parse for a given
    sentence, argmax_t P(t | w_1n, G)?
  • Likewise
  • How can we choose rule probabilities for the
    grammar G that maximize the probability of a
    given corpus?
  • The inside-outside algorithm

13
Chomsky Normal Form
  • We will deal only with PCFGs in Chomsky Normal
    Form
  • That means that there are exactly two types of
    rules:
  • N^i → N^j N^k
  • N^i → w^j

14
Estimating string probability
  • Define inside probabilities
  • We would like to calculate
  • A dynamic programming algorithm
  • Base step

15
Estimating string probability
  • Induction step
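The induction formula is missing from the transcript. In the standard formulation, the inside probability of a span is built from shorter spans: beta_j(p, q) is the sum, over all rules N^j → N^r N^s and all split points d with p ≤ d < q, of P(N^j → N^r N^s) · beta_r(p, d) · beta_s(d+1, q). The sketch below implements the base and induction steps under that assumption. The grammar is the assumed textbook grammar for "astronomers saw stars with ears" (the transcript does not contain it), and positions are 0-based in the code.

```python
from collections import defaultdict

# ASSUMED grammar in Chomsky Normal Form (not present in the transcript).
BINARY = {  # (parent, left child, right child) -> probability
    ("S", "NP", "VP"): 1.0,
    ("PP", "P", "NP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.4,
}
LEXICAL = {  # (parent, word) -> probability
    ("P", "with"): 1.0,
    ("V", "saw"): 1.0,
    ("NP", "astronomers"): 0.1,
    ("NP", "ears"): 0.18,
    ("NP", "saw"): 0.04,
    ("NP", "stars"): 0.18,
    ("NP", "telescopes"): 0.1,
}


def inside_probability(words: list[str], start: str = "S") -> float:
    """Return P(words | grammar) by filling a table of inside probabilities."""
    n = len(words)
    beta = defaultdict(float)  # beta[(symbol, p, q)] for the span p..q inclusive

    # Base step: single-word spans come from lexical rules.
    for k, word in enumerate(words):
        for (parent, w), prob in LEXICAL.items():
            if w == word:
                beta[(parent, k, k)] += prob

    # Induction step: combine two adjacent shorter spans with a binary rule.
    for length in range(2, n + 1):
        for p in range(n - length + 1):
            q = p + length - 1
            for d in range(p, q):  # split point: left = p..d, right = d+1..q
                for (parent, left, right), prob in BINARY.items():
                    beta[(parent, p, q)] += (
                        prob * beta[(left, p, d)] * beta[(right, d + 1, q)]
                    )

    return beta[(start, 0, n - 1)]


sentence = "astronomers saw stars with ears".split()
print(f"{inside_probability(sentence):.7f}")
```

Because the table sums over every way of building each span, the result covers all parses of the sentence without enumerating the trees.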

16
Drawbacks of PCFGs
  • Do not factor in lexical co-occurrence
  • Rewrite rules must be pre-given according to
    human intuitions
  • The ATIS-CFG fiasco
  • The capacity of PCFG to determine the most likely
    parse is very limited
  • As grammars grow larger, they become increasingly
    ambiguous
  • The following sentences look the same to a PCFG,
    although they suggest different parses
  • I saw the boat with the telescope
  • I saw the man with the scar

17
PCFGs: some more drawbacks
  • Have some inappropriate biases
  • In general, the probability of a smaller tree
    will be larger than that of a larger one
  • Most frequent length for Wall Street Journal
    sentences is around 23 words
  • Training is slow and problematic
  • Converges to a local optimum
  • Non-terminals do not always resemble true
    syntactic classes

18
PCFGs and language models
  • Because they ignore lexical co-occurrence, PCFGs
    are not good as language models
  • However, some work has been done on combining
    PCFGs with n-gram models
  • PCFGs modeled long-range syntactic constraints
  • Performance generally improved

19
Is natural language a CFG?
  • There is an ongoing debate on whether English
    is context-free
  • There are some languages that can be shown to be
    more complex than CFGs
  • For example, Dutch

20
Dutch oddities
  • Dat Jan Marie Pieter Arabisch laat zien schrijven
  • THAT JAN MARIE PIETER ARABIC LET SEE WRITE
  • that Jan let Marie see Pieter write Arabic
  • However, from a purely syntactic viewpoint, this
    is just dat P^n V^n

21
Other languages
  • Bambara (a language of Mali) has non-CF features,
    in the form of A^n B^m C^n D^m
  • Swiss-German as well
  • However, CFGs seem to be a good approximation for
    most phenomena in most languages

22
Grammar Induction
  • With ADIOS
  • (Automatic DIstillation Of Structure)

23
Previous work
  • Probabilistic Context Free Grammars
  • Supervised induction methods
  • Little work on raw data
  • Mostly work on artificial CFGs
  • Clustering

24
Our goal
  • Given a corpus of raw text separated into
    sentences, we want to derive a specification of
    the underlying grammar
  • This means we want to be able to
  • Create new unseen grammatically correct sentences
  • Accept new unseen grammatically correct sentences
    and reject ungrammatical ones

25
What do we need to do?
  • G is given by the rewrite rules
  • S?NP VP
  • NP?the N a N
  • N?man boy dog
  • VP?V NP
  • V?saw heard sensed sniffed

26
ADIOS in outline
  • Composed of three main elements
  • A representational data structure
  • A segmentation criterion (MEX)
  • A generalization ability
  • We will consider each of these in turn

27
The Model: graph representation with words as
vertices and sentences as paths.
Is that a dog?
Is that a cat?
Where is the dog?
And is that a horse?
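A minimal sketch of this representation for the four example sentences. The BEGIN and END marker vertices, the lower-casing, and the dropping of punctuation are assumptions made for illustration. Each edge stores the indices of the sentences that traverse it, so parallel edges between the same two words stay distinguishable.

```python
import re
from collections import defaultdict

SENTENCES = [
    "Is that a dog?",
    "Is that a cat?",
    "Where is the dog?",
    "And is that a horse?",
]


def to_path(sentence: str) -> list[str]:
    """Turn a sentence into a path of vertices (lower-cased words plus markers)."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return ["BEGIN", *words, "END"]


paths = [to_path(s) for s in SENTENCES]

# (from_vertex, to_vertex) -> indices of the sentences that use this edge
edges = defaultdict(list)
for index, path in enumerate(paths):
    for u, v in zip(path, path[1:]):
        edges[(u, v)].append(index)

vertices = {word for path in paths for word in path}
print(len(vertices))           # 11 (9 distinct words plus BEGIN and END)
print(edges[("is", "that")])   # [0, 1, 3]
print(edges[("dog", "END")])   # [0, 2]
```

Shared sub-paths such as "is that a" show up as edges carried by several sentences, which is what makes them easy to spot on the graph.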
28
ADIOS in outline
  • Composed of three main elements
  • A representational data structure
  • A segmentation criterion (MEX)
  • A generalization ability

29
Toy problem: Alice in Wonderland
a l i c e w a s b e g i n n i n g t o g e t v e r
y t i r e d o f s i t t i n g b y h e r s i s t e
r o n t h e b a n k a n d o f h a v i n g n o t h
i n g t o d o o n c e o r t w i c e s h e h a d p
e e p e d i n t o t h e b o o k h e r s i s t e r
w a s r e a d i n g b u t i t h a d n o p i c t u
r e s o r c o n v e r s a t i o n s i n i t a n d
w h a t i s t h e u s e o f a b o o k t h o u g h
t a l i c e w i t h o u t p i c t u r e s o r c o
n v e r s a t i o n
30
Detecting significant patterns
  • Identifying patterns becomes easier on a graph
  • Sub-paths are automatically aligned

31
Motif EXtraction
32
The Markov Matrix
  • The top right triangle defines the P_L
    probabilities, the bottom left triangle the P_R
    probabilities
  • Matrix is path-dependent
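The slide does not give the definitions of P_R and P_L. One common way to define such path-dependent probabilities, assumed here rather than taken from the slides, is by counting sub-paths: P_R is the fraction of occurrences of a prefix of the search path that continue one vertex to the right, and P_L is the fraction of occurrences of a suffix that extend one vertex to the left. The helper names are illustrative.

```python
def count_subpath(paths, sub):
    """Count occurrences of the contiguous vertex sequence `sub` in all paths."""
    k = len(sub)
    return sum(
        1
        for path in paths
        for start in range(len(path) - k + 1)
        if path[start:start + k] == sub
    )


def p_right(paths, search, i, j):
    """Assumed P_R: P(e_j | e_i ... e_{j-1}) along the search path, for i < j."""
    return count_subpath(paths, search[i:j + 1]) / count_subpath(paths, search[i:j])


def p_left(paths, search, i, j):
    """Assumed P_L: P(e_i | e_{i+1} ... e_j) along the search path, for i < j."""
    return count_subpath(paths, search[i:j + 1]) / count_subpath(paths, search[i + 1:j + 1])


paths = [
    ["is", "that", "a", "dog"],
    ["is", "that", "a", "cat"],
    ["where", "is", "the", "dog"],
    ["and", "is", "that", "a", "horse"],
]
search = paths[0]  # the matrix is computed for one search path at a time

print(p_right(paths, search, 0, 1))  # 0.75: 3 of the 4 "is" are followed by "that"
print(p_right(paths, search, 0, 2))  # 1.0: every "is that" continues with "a"
print(p_left(paths, search, 2, 3))   # 0.5: 1 of the 2 "dog" is preceded by "a"
```

A sharp drop in these values along the search path (here after "is that a") is the kind of signal a segmentation criterion can use to mark a pattern boundary.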

33
(No Transcript)
34
Example of a probability matrix
35
Rewiring the graph
Once a pattern is identified as significant, the
sub-paths it subsumes are merged into a new
vertex and the graph is rewired accordingly.
Repeating this process leads to the formation of
complex, hierarchically structured patterns.
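A minimal sketch of the rewiring step on paths stored as lists of vertices. The function name `rewire` and the pattern labels P1 and P2 are illustrative, and the choice of which sub-paths count as significant is simply given by hand here.

```python
def rewire(paths, pattern, new_vertex):
    """Replace each occurrence of `pattern` in every path with `new_vertex`."""
    if not pattern:
        raise ValueError("pattern must contain at least one vertex")
    k = len(pattern)
    rewired = []
    for path in paths:
        out, i = [], 0
        while i < len(path):
            if path[i:i + k] == pattern:
                out.append(new_vertex)  # the sub-path collapses into one vertex
                i += k
            else:
                out.append(path[i])
                i += 1
        rewired.append(out)
    return rewired


paths = [
    ["is", "that", "a", "dog"],
    ["is", "that", "a", "cat"],
    ["where", "is", "the", "dog"],
    ["and", "is", "that", "a", "horse"],
]

step1 = rewire(paths, ["is", "that", "a"], "P1")
print(step1[0])  # ['P1', 'dog']
print(step1[3])  # ['and', 'P1', 'horse']

# A later pattern may contain an earlier one, which gives the hierarchy.
step2 = rewire(step1, ["P1", "dog"], "P2")
print(step2[0])  # ['P2']
```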
36
MEX at work
37
ALICE motifs
38
ADIOS in outline
  • Composed of three main elements
  • A representational data structure
  • A segmentation criterion (MEX)
  • A generalization ability

39
Generalization
40
Bootstrapping
41
Determining L
  • Involves a tradeoff
  • Larger L will demand more context sensitivity in
    the inference
  • Will hamper generalization
  • Smaller L will detect more patterns
  • But many might be spurious

42
The ADIOS algorithm
  • Initialization: load all data into a pseudograph
  • Until no more patterns are found:
  • For each path P:
  • Create generalized search paths from P
  • Detect significant patterns using MEX
  • If found, add best new pattern and equivalence
    classes and rewire the graph
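The control flow of this loop can be sketched as follows. This is not the ADIOS algorithm itself: the MEX criterion is replaced by a plain frequency threshold on adjacent vertex pairs (essentially greedy pair merging), and generalized search paths and equivalence classes are left out. Only the "detect, add pattern, rewire, repeat until nothing is found" skeleton is illustrated; all names are hypothetical.

```python
from collections import Counter


def best_pair(paths, min_count):
    """Stand-in for MEX: the most frequent adjacent pair, if frequent enough."""
    counts = Counter(pair for path in paths for pair in zip(path, path[1:]))
    if not counts:
        return None
    pair, count = max(counts.items(), key=lambda item: item[1])
    return pair if count >= min_count else None


def merge(path, pair, new_vertex):
    """Rewire one path: replace each occurrence of `pair` with `new_vertex`."""
    out, i = [], 0
    while i < len(path):
        if tuple(path[i:i + 2]) == pair:
            out.append(new_vertex)
            i += 2
        else:
            out.append(path[i])
            i += 1
    return out


def learn(paths, min_count=2):
    """Greedy loop: add the best pattern and rewire until none is found."""
    paths = [list(p) for p in paths]
    patterns = {}
    while True:
        pair = best_pair(paths, min_count)
        if pair is None:  # no more patterns are found
            break
        name = f"P{len(patterns) + 1}"
        patterns[name] = pair
        paths = [merge(p, pair, name) for p in paths]
    return paths, patterns


corpus = [
    ["is", "that", "a", "dog"],
    ["is", "that", "a", "cat"],
    ["where", "is", "the", "dog"],
    ["and", "is", "that", "a", "horse"],
]
final_paths, patterns = learn(corpus)
print(patterns)
print(final_paths)
```

Like the loop in the slide, this sketch is greedy: a pattern is never revisited once it has been added.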

43
The Model: the training process
44
(Figure: graph with numbered vertices; no other text survives)
45
(Figure: graph with numbered vertices; no other text survives)
46
Example
47
More Patterns
48
Evaluating performance
  • In principle, we would like to compare
    ADIOS-generated parse-trees with the true
    parse-trees for given sentences
  • Alas, the true parse-trees are subject to
    opinion
  • Some approaches do not even assume parse trees

49
Evaluating performance
  • Define:
  • Recall: the probability of ADIOS recognizing an
    unseen grammatical sentence
  • Precision: the proportion of ADIOS productions
    that are grammatical
  • Recall can be assessed by leaving out some of the
    training corpus
  • Precision is trickier
  • Unless we are learning a known CFG
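Both measures are simple proportions once an acceptor and a grammaticality judge are available. A sketch with hypothetical callables: `accepts` stands for the trained learner, and `is_grammatical` for a human judge or a known reference grammar. The toy data is invented for illustration.

```python
def recall(accepts, held_out):
    """Fraction of held-out grammatical sentences that the learner accepts."""
    return sum(1 for s in held_out if accepts(s)) / len(held_out)


def precision(generated, is_grammatical):
    """Fraction of learner-generated sentences that are judged grammatical."""
    return sum(1 for s in generated if is_grammatical(s)) / len(generated)


# Toy illustration with set membership standing in for learner and judge.
known_to_learner = {"is that a dog", "is that a cat"}
held_out = ["is that a dog", "where is the dog"]
r = recall(lambda s: s in known_to_learner, held_out)

generated = ["is that a cat", "that is a", "is that a dog", "a a a"]
judged_grammatical = {"is that a cat", "is that a dog"}
p = precision(generated, lambda s: s in judged_grammatical)

print(r, p)  # 0.5 0.5
```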

50
The ATIS experiments
  • ATIS-NL is a 13,043 sentence corpus of natural
    language
  • Transcribed phone calls to an airline reservation
    service
  • ADIOS was trained on 12,700 sentences of ATIS-NL
  • The remaining 343 sentences were used to assess
    recall
  • Precision was determined with the help of 8
    graduate students from Cornell University

51
The ATIS experiments
  • ADIOS performance scores
  • Recall: 40%
  • Precision: 70%
  • For comparison, ATIS-CFG reached
  • Recall: 45%
  • Precision: <1% (!)

52
ADIOS/ATIS-N comparison
53
An ADIOS drawback
  • ADIOS is inherently a heuristic and greedy
    algorithm
  • Once a pattern is created it remains forever, so
    errors accumulate
  • Sentence ordering affects outcome
  • Running ADIOS with different orderings gives
    patterns that cover different parts of the
    grammar

54
An ad-hoc solution
  • Train multiple learners on the corpus
  • Each on a different sentence ordering
  • Create a forest of learners
  • To create a new sentence
  • Pick one learner at random
  • Use it to produce a sentence
  • To check the grammaticality of a given sentence
  • If any learner accepts the sentence, declare it
    grammatical
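The two operations on the forest can be sketched as below. `ToyLearner` is a hypothetical stand-in that just memorizes a set of sentences; a real learner would be trained on one ordering of the corpus. Only the "pick one at random" and "any learner accepts" logic comes from the slide.

```python
import random


class ToyLearner:
    """Hypothetical stand-in for one trained learner."""

    def __init__(self, sentences):
        self.sentences = sorted(sentences)

    def generate(self, rng):
        return rng.choice(self.sentences)

    def accepts(self, sentence):
        return sentence in self.sentences


class Forest:
    """A collection of learners, each trained on a different sentence ordering."""

    def __init__(self, learners, seed=0):
        self.learners = list(learners)
        self.rng = random.Random(seed)

    def generate(self):
        # Pick one learner at random and let it produce the sentence.
        learner = self.rng.choice(self.learners)
        return learner.generate(self.rng)

    def accepts(self, sentence):
        # Grammatical if any single learner accepts the sentence.
        return any(learner.accepts(sentence) for learner in self.learners)


forest = Forest([
    ToyLearner({"is that a dog", "is that a cat"}),
    ToyLearner({"where is the dog"}),
])
print(forest.accepts("where is the dog"))  # True
print(forest.accepts("dog the is where"))  # False
```

Accepting on any single learner raises recall, but it also accepts whatever any one learner wrongly accepts, so precision can only stay the same or drop.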

55
The effects of context window width
56
Meta-analysis of ADIOS results
  • Define a pattern spectrum as the histogram of
    pattern types for an individual learner
  • A pattern type is determined by its contents
  • E.g. TT, TET, EE, PE
  • A single ADIOS learner was trained with each of 6
    translations of the Bible
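A pattern spectrum can be computed with a simple counter. The reading of the letters is an assumption here (T for a terminal, E for an equivalence class, P for a nested pattern), and the example patterns and the `KINDS` table are invented for illustration.

```python
from collections import Counter

# ASSUMED meaning of the type letters: T = terminal, E = equivalence class,
# P = nested pattern. The table maps each element to its kind.
KINDS = {
    "the": "T", "dog": "T", "is": "T", "that": "T",
    "E1": "E", "E2": "E",
    "P1": "P",
}


def pattern_type(pattern, kinds):
    """Describe a pattern by the kinds of its elements, e.g. 'TET'."""
    return "".join(kinds[element] for element in pattern)


def pattern_spectrum(patterns, kinds):
    """Histogram of pattern types for one learner."""
    return Counter(pattern_type(p, kinds) for p in patterns)


learner_patterns = [
    ["the", "dog"],
    ["is", "that"],
    ["the", "E1", "dog"],
    ["E1", "E2"],
    ["P1", "E1"],
]
spectrum = pattern_spectrum(learner_patterns, KINDS)
print(dict(spectrum))  # {'TT': 2, 'TET': 1, 'EE': 1, 'PE': 1}
```

Spectra from different learners can then be compared as vectors over the same set of pattern types.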

57
Pattern spectra
58
Language dendrogram
59
To be continued