Title: Probabilistic Context Free Grammar
1. Probabilistic Context Free Grammar
2. Language structure is not linear
- Example: "The velocity of seismic waves rises to ..."
- The singular verb "rises" agrees with the distant head noun "velocity", not with the adjacent plural "waves"; such long-distance dependencies are hard for purely linear models
3. Context free grammars: a reminder
- A CFG G consists of:
- A set of terminals w^k, k = 1, ..., V
- A set of nonterminals N^i, i = 1, ..., n
- A designated start symbol, N^1
- A set of rules, N^i → ζ^j (where ζ^j is a sequence of terminals and nonterminals)
4. A very simple example
- G's rewrite rules:
- S → aSb
- S → ab
- Possible derivations:
- S ⇒ aSb ⇒ aabb
- S ⇒ aSb ⇒ aaSbb ⇒ aaabbb
- In general, G creates the language a^n b^n
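A tiny illustrative sketch (mine, not the slides'): applying S → aSb repeatedly and finishing with S → ab yields exactly the strings a^n b^n.

```python
# Minimal sketch: derive strings of the toy grammar S -> aSb | ab,
# which generates the language {a^n b^n : n >= 1}.
def derive(n: int) -> str:
    """Apply S -> aSb (n - 1) times, then S -> ab once."""
    if n == 1:
        return "ab"                        # S -> ab
    return "a" + derive(n - 1) + "b"       # S -> aSb

if __name__ == "__main__":
    for n in range(1, 5):
        print(derive(n))                   # ab, aabb, aaabbb, aaaabbbb
```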
5. Modeling natural language
- G is given by the rewrite rules:
- S → NP VP
- NP → the N | a N
- N → man | boy | dog
- VP → V NP
- V → saw | heard | sensed | sniffed
6. Recursion can be included
- G is given by the rewrite rules:
- S → NP VP
- NP → the N | a N
- N → man CP | boy CP | dog CP
- VP → V NP
- V → saw | heard | sensed | sniffed
- CP → that VP | ε
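For concreteness, a minimal generation sketch (my own dictionary encoding, not anything given on the slides) for this recursive grammar; the empty right-hand side plays the role of ε, and the CP recursion is what allows arbitrarily deep embedding.

```python
import random

# Encoding of the recursive toy grammar above (the encoding is my assumption).
# An empty right-hand side stands for the empty expansion (CP -> ε).
RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"], ["a", "N"]],
    "N":  [["man", "CP"], ["boy", "CP"], ["dog", "CP"]],
    "VP": [["V", "NP"]],
    "V":  [["saw"], ["heard"], ["sensed"], ["sniffed"]],
    "CP": [["that", "VP"], []],
}

def generate(symbol="S"):
    """Expand a symbol by picking one of its right-hand sides at random."""
    if symbol not in RULES:                # terminal: emit the word itself
        return [symbol]
    return [w for s in random.choice(RULES[symbol]) for w in generate(s)]

if __name__ == "__main__":
    for _ in range(3):
        print(" ".join(generate()))        # e.g. "the dog saw a boy"
```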
7. Probabilistic Context Free Grammars
- A PCFG G consists of:
- A set of terminals w^k, k = 1, ..., V
- A set of nonterminals N^i, i = 1, ..., n
- A designated start symbol, N^1
- A set of rules, N^i → ζ^j (where ζ^j is a sequence of terminals and nonterminals)
- A corresponding set of probabilities on rules, such that for each i the probabilities of all rules rewriting N^i sum to one
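One possible encoding (with made-up probabilities, purely for illustration): the constraint a PCFG adds over a plain CFG is that the probabilities of all rules sharing a left-hand side form a distribution.

```python
# Minimal sketch; the rule probabilities below are invented for illustration.
PCFG = {
    "S":  {("NP", "VP"): 1.0},
    "NP": {("the", "N"): 0.6, ("a", "N"): 0.4},
    "N":  {("man",): 0.4, ("boy",): 0.3, ("dog",): 0.3},
    "VP": {("V", "NP"): 1.0},
    "V":  {("saw",): 0.4, ("heard",): 0.3, ("sensed",): 0.2, ("sniffed",): 0.1},
}

# For every nonterminal, its rule probabilities must sum to one.
for lhs, rules in PCFG.items():
    assert abs(sum(rules.values()) - 1.0) < 1e-9, lhs
```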
8. Example
9. astronomers saw stars with ears
10. astronomers saw stars with ears
- P(t2) = 0.0006804
- P(w_{1..5}) = P(t1) + P(t2) = 0.0015876
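The slide shows only P(t2) and the total; the probability of the other parse follows by subtraction, and agrees with the standard textbook version of this example:
- P(t1) = P(w_{1..5}) - P(t2) = 0.0015876 - 0.0006804 = 0.0009072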
11. Training PCFGs
- Given a corpus, it is possible to estimate rule probabilities that maximize its likelihood
- This is regarded as a form of grammar induction
- However, the rules of the grammar must be given in advance
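The slides do not spell out the estimator. If parse trees are available for the corpus, the maximum-likelihood estimate is simply the relative frequency of each rule; for raw text the inside-outside (EM) algorithm mentioned below is needed. A minimal counting sketch, with a made-up nested-tuple encoding of trees:

```python
from collections import Counter

# Hypothetical treebank: each tree is a nested tuple (label, child, ...),
# leaves are plain strings.  The encoding is my assumption, not the slides'.
trees = [
    ("S", ("NP", "the", ("N", "dog")),
          ("VP", ("V", "saw"), ("NP", "a", ("N", "boy")))),
]

rule_count, lhs_count = Counter(), Counter()

def count_rules(node):
    if isinstance(node, str):                        # terminal leaf
        return
    lhs, children = node[0], node[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rule_count[(lhs, rhs)] += 1                      # one use of the rule lhs -> rhs
    lhs_count[lhs] += 1
    for child in children:
        count_rules(child)

for tree in trees:
    count_rules(tree)

# Maximum-likelihood (relative-frequency) estimate of each rule probability.
prob = {rule: n / lhs_count[rule[0]] for rule, n in rule_count.items()}
print(prob[("NP", ("the", "N"))])                    # 0.5 in this toy treebank
```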
12. Questions for PCFGs
- What is the probability of a sentence w_{1..n} given a grammar G, i.e. P(w_{1..n} | G)?
- Calculated using dynamic programming
- What is the most likely parse for a given sentence, argmax_t P(t | w_{1..n}, G)?
- Likewise
- How can we choose rule probabilities for the grammar G that maximize the probability of a given corpus?
- The inside-outside algorithm
13. Chomsky Normal Form
- From here on we deal only with PCFGs in Chomsky Normal Form
- That means there are exactly two types of rules:
- N^i → N^j N^k
- N^i → w^j
14. Estimating string probability
- Define inside probabilities: β_j(p,q) = P(w_{p..q} | N^j spans positions p..q, G)
- We would like to calculate P(w_{1..n} | G) = β_1(1,n)
- A dynamic programming algorithm over spans
- Base step: β_j(k,k) = P(N^j → w_k)
15. Estimating string probability (continued)
- Induction step: β_j(p,q) = Σ_{r,s} Σ_{d=p..q-1} P(N^j → N^r N^s) β_r(p,d) β_s(d+1,q)
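The formulas and tables on these two slides did not survive the transcript, so here is a stand-in: a minimal Python sketch of the standard inside algorithm for a CNF PCFG. The grammar and probabilities are the usual textbook ones for the "astronomers saw stars with ears" example; they reproduce the 0.0015876 of slide 10.

```python
from collections import defaultdict

# Standard textbook grammar for this example (binary and lexical CNF rules).
binary = {("S", ("NP", "VP")): 1.0,
          ("NP", ("NP", "PP")): 0.4,
          ("PP", ("P", "NP")): 1.0,
          ("VP", ("V", "NP")): 0.7,
          ("VP", ("VP", "PP")): 0.3}
lexical = {("NP", "astronomers"): 0.1, ("NP", "ears"): 0.18,
           ("NP", "saw"): 0.04, ("NP", "stars"): 0.18,
           ("NP", "telescopes"): 0.1,
           ("V", "saw"): 1.0, ("P", "with"): 1.0}

def inside(words, start="S"):
    n = len(words)
    beta = defaultdict(float)                      # beta[(N, p, q)] = inside probability
    for k, w in enumerate(words):                  # base step: beta_j(k, k)
        for (lhs, word), prob in lexical.items():
            if word == w:
                beta[(lhs, k, k)] += prob
    for span in range(2, n + 1):                   # induction step over larger spans
        for p in range(0, n - span + 1):
            q = p + span - 1
            for (lhs, (B, C)), prob in binary.items():
                for d in range(p, q):
                    beta[(lhs, p, q)] += prob * beta[(B, p, d)] * beta[(C, d + 1, q)]
    return beta[(start, 0, n - 1)]                 # P(w_1..n | G)

print(inside("astronomers saw stars with ears".split()))   # ~0.0015876
```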
16. Drawbacks of PCFGs
- Do not factor in lexical co-occurrence
- Rewrite rules must be given in advance, according to human intuitions
- The ATIS-CFG fiasco
- The capacity of a PCFG to determine the most likely parse is very limited
- As grammars grow larger, they become increasingly ambiguous
- The following sentences look the same to a PCFG, although they suggest different parses:
- I saw the boat with the telescope
- I saw the man with the scar
17. PCFGs: some more drawbacks
- They have some inappropriate biases
- In general, the probability of a smaller tree will be larger than that of a larger one
- Yet the most frequent length for Wall Street Journal sentences is around 23 words
- Training is slow and problematic
- It converges only to a local optimum
- Non-terminals do not always resemble true syntactic classes
18. PCFGs and language models
- Because they ignore lexical co-occurrence, PCFGs are not good language models
- However, some work has been done on combining PCFGs with n-gram models
- In these hybrids, the PCFG modeled long-range syntactic constraints
- Performance generally improved
19. Is natural language a CFG?
- There is an ongoing debate on whether English is context-free
- Some languages can be shown to be more complex than CFGs can describe
- For example, Dutch
20. Dutch oddities
- Dat Jan Marie Pieter Arabisch laat zien schrijven
- THAT JAN MARIE PIETER ARABIC LET SEE WRITE
- "that Jan let Marie see Pieter write Arabic"
- However, from a purely syntactic viewpoint, this is just dat P^n V^n
21. Other languages
- Bambara (a language of Mali) has non-context-free features, of the form A^n B^m C^n D^m
- Swiss German as well
- However, CFGs seem to be a good approximation for most phenomena in most languages
22. Grammar Induction
- With ADIOS
- (Automatic DIstillation Of Structure)
23. Previous work
- Probabilistic Context Free Grammars
- Supervised induction methods
- Little work on raw data
- Mostly work on artificial CFGs
- Clustering
24. Our goal
- Given a corpus of raw text separated into sentences, we want to derive a specification of the underlying grammar
- This means we want to be able to:
- Create new unseen grammatically correct sentences
- Accept new unseen grammatically correct sentences and reject ungrammatical ones
25. What do we need to do?
- G is given by the rewrite rules:
- S → NP VP
- NP → the N | a N
- N → man | boy | dog
- VP → V NP
- V → saw | heard | sensed | sniffed
26. ADIOS in outline
- Composed of three main elements
- A representational data structure
- A segmentation criterion (MEX)
- A generalization ability
- We will consider each of these in turn
27. The Model: graph representation with words as vertices and sentences as paths
Is that a dog?
Is that a cat?
Where is the dog?
And is that a horse?
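A minimal sketch of how such a structure might be loaded (the BEGIN/END markers and the exact bookkeeping are my assumptions): words become vertices, and each sentence is stored as a path over them, so identical sub-paths across sentences line up automatically.

```python
from collections import defaultdict

sentences = ["is that a dog", "is that a cat",
             "where is the dog", "and is that a horse"]

paths = []                  # each sentence is stored as a path of vertices
edges = defaultdict(list)   # (u, v) -> indices of the paths using that edge

for i, s in enumerate(sentences):
    path = ["BEGIN"] + s.split() + ["END"]
    paths.append(path)
    for u, v in zip(path, path[1:]):
        edges[(u, v)].append(i)

print(edges[("is", "that")])    # paths 0, 1 and 3 share this edge
```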
28. ADIOS in outline
- Composed of three main elements
- A representational data structure
- A segmentation criterion (MEX)
- A generalization ability
29. Toy problem: Alice in Wonderland
a l i c e w a s b e g i n n i n g t o g e t v e r
y t i r e d o f s i t t i n g b y h e r s i s t e
r o n t h e b a n k a n d o f h a v i n g n o t h
i n g t o d o o n c e o r t w i c e s h e h a d p
e e p e d i n t o t h e b o o k h e r s i s t e r
w a s r e a d i n g b u t i t h a d n o p i c t u
r e s o r c o n v e r s a t i o n s i n i t a n d
w h a t i s t h e u s e o f a b o o k t h o u g h
t a l i c e w i t h o u t p i c t u r e s o r c o
n v e r s a t i o n
30. Detecting significant patterns
- Identifying patterns becomes easier on a graph
- Sub-paths are automatically aligned
31. Motif EXtraction (MEX)
32. The Markov Matrix
- The top-right triangle defines the P_L probabilities, the bottom-left triangle the P_R probabilities
- The matrix is path-dependent
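A sketch of how I read the two quantities (the exact formulation used by ADIOS may differ in detail): P_R is the probability that a sub-path extends one step to the right, and P_L that it extends one step to the left, both estimated from sub-path occurrence counts over all stored paths.

```python
# Occurrence-count sketch of the P_R / P_L quantities; an interpretation,
# not the authors' exact definition.
def occurrences(subpath, paths):
    """How many times `subpath` occurs as a contiguous sub-path."""
    k = len(subpath)
    return sum(1 for p in paths for t in range(len(p) - k + 1)
               if p[t:t + k] == subpath)

def p_right(path, i, j, paths):
    """P_R(e_i; e_j): chance that the sub-path e_i..e_{j-1} continues with e_j."""
    return occurrences(path[i:j + 1], paths) / occurrences(path[i:j], paths)

def p_left(path, i, j, paths):
    """P_L(e_j; e_i): chance that the sub-path e_{i+1}..e_j is preceded by e_i."""
    return occurrences(path[i:j + 1], paths) / occurrences(path[i + 1:j + 1], paths)

paths = [s.split() for s in ["is that a dog", "is that a cat",
                             "where is the dog", "and is that a horse"]]
print(p_right(paths[0], 0, 1, paths))   # P_R(is; that) = 3/4
```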
34. Example of a probability matrix
35. Rewiring the graph
Once a pattern is identified as significant, the sub-paths it subsumes are merged into a new vertex and the graph is rewired accordingly. Repeating this process leads to the formation of complex, hierarchically structured patterns.
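A minimal sketch of that rewiring step (my reading, not the authors' code): every occurrence of the significant sub-path is collapsed into a single new pattern vertex, so later passes treat the pattern as one unit.

```python
def rewire(paths, pattern, name):
    """Replace each occurrence of `pattern` (a list of vertices) by `name`."""
    k = len(pattern)
    new_paths = []
    for p in paths:
        out, t = [], 0
        while t < len(p):
            if p[t:t + k] == pattern:
                out.append(name)          # the merged pattern vertex
                t += k
            else:
                out.append(p[t])
                t += 1
        new_paths.append(out)
    return new_paths

paths = [["is", "that", "a", "dog"], ["and", "is", "that", "a", "horse"]]
print(rewire(paths, ["is", "that", "a"], "P1"))   # [['P1', 'dog'], ['and', 'P1', 'horse']]
```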
36. MEX at work
37. ALICE motifs
38. ADIOS in outline
- Composed of three main elements
- A representational data structure
- A segmentation criterion (MEX)
- A generalization ability
39. Generalization
40. Bootstrapping
41. Determining L
- Involves a tradeoff
- A larger L will demand more context sensitivity in the inference
- This will hamper generalization
- Smaller L will detect more patterns
- But many might be spurious
42. The ADIOS algorithm
- Initialization: load all data into a pseudograph
- Until no more patterns are found:
- For each path P:
- Create generalized search paths from P
- Detect significant patterns using MEX
- If found, add the best new pattern and its equivalence classes, and rewire the graph
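A very rough, hypothetical sketch of that loop; generalize() and mex_best_pattern() are placeholders for the bootstrapped search-path generalization and the MEX significance test, and rewire() could be the helper sketched after slide 35.

```python
def adios(paths, generalize, mex_best_pattern, rewire):
    """One reading of the outer loop; the three callables are placeholders."""
    patterns = []
    found_any = True
    while found_any:                                  # until no more patterns are found
        found_any = False
        for path in list(paths):
            candidates = generalize(path, paths)      # generalized search paths from P
            best = mex_best_pattern(candidates, paths)  # significant pattern via MEX
            if best is not None:
                subpath, name = best                  # best new pattern and its label
                patterns.append(best)
                paths = rewire(paths, subpath, name)  # rewire the graph
                found_any = True
    return patterns, paths
```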
43. The Model: the training process
44. (figure only: numbered graph vertices from the training-process illustration)
45. (figure only: numbered graph vertices from the training-process illustration)
46. Example
47. More Patterns
48. Evaluating performance
- In principle, we would like to compare ADIOS-generated parse trees with the true parse trees for given sentences
- Alas, the "true" parse trees are a matter of opinion
- Some approaches do not even assume parse trees
49. Evaluating performance (continued)
- Define:
- Recall: the probability of ADIOS recognizing an unseen grammatical sentence
- Precision: the proportion of ADIOS productions that are grammatical
- Recall can be assessed by holding out part of the training corpus
- Precision is trickier
- Unless we are learning a known CFG
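Stated as code (a trivial sketch; learner_accepts and is_grammatical stand in for the trained model and for a human or known-CFG judgment):

```python
def recall(learner_accepts, held_out_sentences):
    """Fraction of unseen grammatical sentences the learner accepts."""
    return sum(map(learner_accepts, held_out_sentences)) / len(held_out_sentences)

def precision(is_grammatical, generated_sentences):
    """Fraction of the learner's own productions judged grammatical."""
    return sum(map(is_grammatical, generated_sentences)) / len(generated_sentences)
```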
50. The ATIS experiments
- ATIS-NL is a 13,043-sentence corpus of natural language
- Transcribed phone calls to an airline reservation service
- ADIOS was trained on 12,700 sentences of ATIS-NL
- The remaining 343 sentences were used to assess recall
- Precision was determined with the help of 8 graduate students from Cornell University
51. The ATIS experiments (continued)
- ADIOS performance scores:
- Recall: 40%
- Precision: 70%
- For comparison, ATIS-CFG reached:
- Recall: 45%
- Precision: <1% (!)
52. ADIOS/ATIS-N comparison
53. An ADIOS drawback
- ADIOS is inherently a heuristic and greedy algorithm
- Once a pattern is created it remains forever, so errors accumulate
- Sentence ordering affects the outcome
- Running ADIOS with different orderings gives patterns that cover different parts of the grammar
54. An ad-hoc solution
- Train multiple learners on the corpus
- Each on a different sentence ordering
- Create a forest of learners
- To create a new sentence:
- Pick one learner at random
- Use it to produce the sentence
- To check the grammaticality of a given sentence:
- If any learner accepts the sentence, declare it grammatical
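A minimal sketch of the forest idea (train_adios and the learner objects with generate()/accepts() methods are placeholders, not given by the slides):

```python
import random

def train_forest(train_adios, corpus, n_learners=10, seed=0):
    """Train several learners, each on its own random ordering of the corpus."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_learners):
        ordering = corpus[:]
        rng.shuffle(ordering)
        forest.append(train_adios(ordering))
    return forest

def generate(forest, rng=random):
    return rng.choice(forest).generate()              # pick one learner at random

def is_grammatical(forest, sentence):
    return any(learner.accepts(sentence) for learner in forest)
```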
55. The effects of context window width
56. Meta-analysis of ADIOS results
- Define a pattern spectrum as the histogram of pattern types for an individual learner
- A pattern type is determined by its contents
- E.g. TT, TET, EE, PE (T = terminal, E = equivalence class, P = pattern)
- A single ADIOS learner was trained on each of 6 translations of the Bible
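A small sketch of how such a spectrum could be computed, assuming each pattern is stored as the list of its constituents and the T/E/P letters name their kinds (my reading of the abbreviations above):

```python
from collections import Counter

def pattern_type(children, kind_of):
    """A pattern whose children are terminal, class, terminal has type 'TET'."""
    return "".join(kind_of[c] for c in children)

def pattern_spectrum(patterns, kind_of):
    """Histogram of pattern types for one learner."""
    return Counter(pattern_type(children, kind_of) for children in patterns)

# Toy usage with made-up patterns:
kind_of = {"the": "T", "dog": "T", "E17": "E", "P5": "P"}
patterns = [["the", "dog"], ["the", "E17", "dog"], ["P5", "E17"]]
print(pattern_spectrum(patterns, kind_of))            # {'TT': 1, 'TET': 1, 'PE': 1}
```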
57. Pattern spectra
58. Language dendrogram
59. To be continued