Jason Eisner, Noah A. Smith - PowerPoint PPT Presentation

Slides: 39
Provided by: JasonEisn1
Learn more at: https://www.cs.jhu.edu
Transcript and Presenter's Notes

1
Competitive Grammar Writing
  • Jason Eisner (Johns Hopkins)
  • Noah A. Smith (Carnegie Mellon)

3
Tree structure
  • N = Noun
  • V = Verb
  • P = Preposition
  • D = Determiner
  • R = Adverb
  • NP = Noun phrase
  • VP = Verb phrase
  • PP = Prepositional phrase
  • S = Sentence

4
Generative Story: PCFG
  • Given a set of symbols (phrase types)
  • Start with S at the root
  • Each symbol randomly generates 2 child symbols,
    or 1 word
  • Our job (maybe): Learn these probabilities

p(NP VP | S)
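
The generative story above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy grammar, its probabilities, and the names `PCFG` and `sample` are hypothetical and do not come from the slides.

```python
import random

# Hypothetical toy PCFG (not from the slides). Each symbol maps to a list of
# (probability, right-hand side) pairs; a right-hand side is either
# 2 child symbols or 1 word, as in the generative story.
PCFG = {
    "S":  [(1.0, ("NP", "VP"))],
    "NP": [(0.5, ("D", "N")), (0.5, ("Arthur",))],
    "VP": [(1.0, ("V", "NP"))],
    "D":  [(1.0, ("the",))],
    "N":  [(1.0, ("castle",))],
    "V":  [(1.0, ("rides",))],
}

def sample(symbol, rng):
    """Expand `symbol` top-down and return the list of generated words."""
    if symbol not in PCFG:            # a word: nothing left to expand
        return [symbol]
    rules = PCFG[symbol]
    # Pick one rule for this symbol at random, according to its probability.
    rhs = rng.choices([r for _, r in rules],
                      weights=[p for p, _ in rules], k=1)[0]
    words = []
    for child in rhs:
        words.extend(sample(child, rng))
    return words

rng = random.Random(0)                # fixed seed so runs are repeatable
sentence = sample("S", rng)           # start with S at the root
print(" ".join(sentence))
```

Each call expands a symbol without looking at its surroundings, which is the context-freeness property: the words generated under an NP depend only on the NP itself.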
5
Context-Freeness of Model
  • In a PCFG, the string generated under NP doesn't
    depend on the context of the NP.
  • All NPs are interchangeable.

[Figure: parse tree; recoverable labels: S, PP, P; words: with, hates, peas, quite, violently]
7
Inside vs. Outside
  • This NP is good because the "inside" string looks
    like an NP
  • and because the "outside" context looks like it
    expects an NP.
  • These work together in global inference, and
    could help train each other during learning (cf.
    Cucerzan & Yarowsky 2002).

[Figure: parse tree with root S and an NP node]
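
One way to make "the inside string looks like an NP" concrete is the inside probability: the total probability that a symbol generates a given substring. The sketch below computes it bottom-up with a CKY-style chart. The toy grammar and the names `BINARY`, `LEXICAL`, and `inside` are assumptions for illustration and are not taken from the slides.

```python
from collections import defaultdict

# Hypothetical toy PCFG in the "2 children or 1 word" form.
BINARY = {   # (parent, left child, right child) -> probability
    ("S", "NP", "VP"): 1.0,
    ("NP", "D", "N"): 0.5,
    ("VP", "V", "NP"): 1.0,
}
LEXICAL = {  # (parent, word) -> probability
    ("NP", "Arthur"): 0.5,
    ("D", "the"): 1.0,
    ("N", "castle"): 1.0,
    ("V", "rides"): 1.0,
}

def inside(words):
    """Return chart where chart[i, j][X] = p(X generates words[i:j])."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(float))
    # Width-1 spans: a symbol rewrites directly as a word.
    for i, w in enumerate(words):
        for (parent, word), p in LEXICAL.items():
            if word == w:
                chart[i, i + 1][parent] += p
    # Wider spans: sum over rules and over split points k.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (parent, left, right), p in BINARY.items():
                    l = chart[i, k].get(left, 0.0)
                    r = chart[k, j].get(right, 0.0)
                    if l and r:
                        chart[i, j][parent] += p * l * r
    return chart

chart = inside("Arthur rides the castle".split())
print(chart[2, 4]["NP"])   # inside probability of "the castle" as an NP
print(chart[0, 4]["S"])    # probability of the whole sentence under S
```

The outside probability, which scores the context around the NP, is computed by a matching top-down pass and is not shown here.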
10
1. Welcome to the lab exercise!
  • Please form teams of 3 people
  • Programmers, get a linguist on your team
  • And vice-versa
  • Undergrads, get a grad student on your team
  • And vice-versa

11
2. Okay, team, please log in
  • The 3 of you should use adjacent workstations
  • Log in as individuals
  • Your secret team directory:
  • cd /03-turbulent-kiwi
  • You can all edit files there
  • Publicly readable & writeable
  • No one else knows the secret directory name

12
3. Now write a grammar of English
  • You have 2 hours.

13
3. Now write a grammar of English
Here's one to start with.
  • 1 S1 → NP VP .
  • 1 VP → VerbT NP
  • 20 NP → Det N
  • 1 NP → Proper
  • 20 N → Noun
  • 1 N → N PP
  • 1 PP → Prep NP
  • You have 2 hours.

14
3. Now write a grammar of English
Here's one to start with.
  • 1 S1 → NP VP .
  • 1 VP → VerbT NP
  • 20 NP → Det N
  • 1 NP → Proper
  • 20 N → Noun
  • 1 N → N PP
  • 1 PP → Prep NP
Plus initial terminal rules.
  • 1 Noun → castle
  • 1 Noun → king
  • 1 Proper → Arthur
  • 1 Proper → Guinevere
  • 1 Det → a
  • 1 Det → every
  • 1 VerbT → covers
  • 1 VerbT → rides
  • 1 Misc → that
  • 1 Misc → bloodier
  • 1 Misc → does
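
The number in front of each rule can be read as a relative weight. Assuming the weights are normalized over all rules that share a left-hand side (an assumption made here for illustration), the starter grammar expands NP as Det N with probability 20/21. The whitespace-separated text layout and the name `rule_probabilities` below are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# The starter grammar, written as "weight lhs rhs..." (hypothetical layout).
RULES = """
1 S1 NP VP .
1 VP VerbT NP
20 NP Det N
1 NP Proper
20 N Noun
1 N N PP
1 PP Prep NP
"""

def rule_probabilities(text):
    """Turn relative weights into probabilities, normalizing per left-hand side."""
    rules = []
    totals = defaultdict(int)
    for line in text.strip().splitlines():
        weight, lhs, *rhs = line.split()
        rules.append((int(weight), lhs, tuple(rhs)))
        totals[lhs] += int(weight)
    return {(lhs, rhs): Fraction(w, totals[lhs]) for w, lhs, rhs in rules}

probs = rule_probabilities(RULES)
print(probs[("NP", ("Det", "N"))])    # 20/21
print(probs[("NP", ("Proper",))])     # 1/21
```

Under this reading, raising a weight makes the rule more likely in sampled sentences and lowers the probability of its competitors with the same left-hand side.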

15
3. Now write a grammar of English
Here's one to start with.
  • 1 S1 → NP VP .
  • 1 VP → VerbT NP
  • 20 NP → Det N
  • 1 NP → Proper
  • 20 N → Noun
  • 1 N → N PP
  • 1 PP → Prep NP

[Figure: tree under construction; root S1]
16
3. Now write a grammar of English
Here's one to start with.
  • 1 S1 → NP VP .
  • 1 VP → VerbT NP
  • 20 NP → Det N
  • 1 NP → Proper
  • 20 N → Noun
  • 1 N → N PP
  • 1 PP → Prep NP

[Figure: tree under construction; S1 expanded to NP VP .]
19
4. Okay, go!
5. Evaluation procedure
  • We'll sample 20 random sentences from your PCFG.
  • Human judges will vote on whether each sentence
    is grammatical.
  • By the way, y'all will be the judges
    (double-blind).
  • You probably want to use the sampling script to
    keep testing your grammar along the way.

20
5. Evaluation procedure
  • We'll sample 20 random sentences from your PCFG.
  • Human judges will vote on whether each sentence
    is grammatical.

"OK, we're done! All our sentences are already
grammatical."
  • You're right: This only tests precision.
  • How about recall?
21
Development set
  • You might want your grammar to generate
  • Arthur is the king .
  • Arthur rides the horse near the castle .
  • riding to Camelot is hard .
  • do coconuts speak ?
  • what does Arthur ride ?
  • who does Arthur suggest she carry ?
  • why does England have a king ?
  • are they suggesting Arthur ride to Camelot ?
  • five strangers are at the Round Table .
  • Guinevere might have known .
  • Guinevere should be riding with Patsy .
  • it is Sir Lancelot who knows Zoot !
  • either Arthur knows or Patsy does .
  • neither Sir Lancelot nor Guinevere will speak of
    it .

We provide a file of 27 sample sentences illustrating
a range of grammatical phenomena:
questions, movement, (free) relatives,
clefts, agreement, subcat frames, conjunctions,
auxiliaries, gerunds, sentential subjects,
appositives
22
Development set
  • You might want your grammar to generate
  • the Holy Grail was covered by a yellow fruit .
  • Zoot might have been carried by a swallow .
  • Arthur rode to Camelot and drank from his chalice
    .
  • they migrate precisely because they know they
    will grow .
  • do not speak !
  • Arthur will have been riding for eight nights .
  • Arthur , sixty inches , is a tiny king .
  • Arthur knows Patsy , the trusty servant .
  • Arthur and Guinevere migrate frequently .
  • he knows what they are covering with that story .
  • Arthur suggested that the castle be carried .
  • the king drank to the castle that was his home .
  • when the king drinks , Patsy drinks .

23
5. Evaluation of recall (productivity!!)
What we could have done: Cross-entropy on a similar,
held-out test set
  • every coconut of his that the swallow dropped
    sounded like a horse .
24
5. Evaluation of recall (productivity!!)
What we could have done: Cross-entropy on a similar,
held-out test set
What we'll actually do, to heighten competition &
creativity: Test set comes from the participants!
You should try to generate sentences that your
opponents can't parse.
25
Initial terminal rules
  • 1 Noun → castle
  • 1 Noun → king
  • 1 Proper → Arthur
  • 1 Proper → Guinevere
  • 1 Det → a
  • 1 Det → every
  • 1 VerbT → covers
  • 1 VerbT → rides
  • 1 Misc → that
  • 1 Misc → bloodier
  • 1 Misc → does

The initial grammar sticks to 3rd-person singular
transitive present-tense forms. All
grammatical. But we provide 183 Misc words (not
accessible from initial grammar) that you're free
to work into your grammar
26
Initial terminal rules
  • 1 Misc → that
  • 1 Misc → bloodier
  • 1 Misc → does

The initial grammar sticks to 3rd-person singular
transitive present-tense forms. All
grammatical. But we provide 183 Misc words (not
accessible from initial grammar) that you're free
to work into your grammar:
pronouns (various cases), plurals, various verb
forms, non-transitive verbs, adjectives (various
forms), adverbs & negation, conjunctions &
punctuation, wh-words, ...
27
5. Evaluation of recall (productivity!!)
What we could have done (good for your class?):
Cross-entropy on a similar, held-out test set
What we actually did, to heighten competition &
creativity: Test set comes from the participants!
In Boggle, you get points for finding words that
your opponents don't find.
You should try to generate sentences that your
opponents can't parse.
28
5. Evaluation of recall (productivity!!)
What we could have done (good for your class?):
Cross-entropy on a similar, held-out test set
What we actually did, to heighten competition &
creativity: Test set comes from the participants!
We'll score your cross-entropy when you try to
parse the sentences that the other teams
generate. (Only the ones judged grammatical.)
You should try to generate sentences that your
opponents can't parse.
  • You probably want to use the parsing script to
    keep testing your grammar along the way.

29
5. Evaluation of recall (productivity!!)
What we could have done (you could too):
Cross-entropy on a similar, held-out test set
What we actually did, to heighten competition &
creativity: Test set comes from the participants!
We'll score your cross-entropy when you try to
parse the sentences that the other teams
generate. (Only the ones judged grammatical.)
What if my grammar can't parse one of the
test sentences?
So don't do that.
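
A rough sketch of the scoring idea follows. It assumes cross-entropy is measured in bits per word over the whole test set; the exact normalization used in the exercise is not stated on the slides, and the function name and toy probabilities are hypothetical. It also shows why an unparseable sentence is so costly: probability 0 makes the score infinite.

```python
import math

def cross_entropy_bits_per_word(sentences, sentence_prob):
    """Average negative log2 probability per word over a test set.

    `sentence_prob` maps a sentence string to its probability under
    the grammar being scored.
    """
    total_log2 = 0.0
    total_words = 0
    for s in sentences:
        p = sentence_prob(s)
        if p == 0.0:
            return math.inf       # one unparseable sentence ruins the score
        total_log2 += math.log2(p)
        total_words += len(s.split())
    return -total_log2 / total_words

# Hypothetical sentence probabilities under some team's grammar.
toy_probs = {"Arthur rides the castle .": 0.25, "Arthur rides Arthur .": 0.25}
lookup = lambda s: toy_probs.get(s, 0.0)

xent = cross_entropy_bits_per_word(list(toy_probs), lookup)
print(xent)                                                    # lower is better
print(cross_entropy_bits_per_word(["do coconuts speak ?"], lookup))
```

Lower values mean the grammar found the other teams' sentences less surprising.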
30
Use a backoff grammar: Bigram POS HMM
Initial backoff grammar
  • _Verb: i.e., something that starts with a Verb
  • _Misc: i.e., something that starts with a Misc
[Figure: backoff grammar diagram; remaining labels: Verb, Misc, . . .]
32
Use a backoff grammar: Bigram POS HMM
Mixture model: Choose these weights wisely!
Initial backoff grammar
Init. linguistic grammar
  • S1 → NP VP .
  • VP → VerbT NP
  • NP → Det N
  • NP → Proper
  • N → Noun
  • N → N PP
  • PP → Prep NP
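
The effect of the mixture weights can be illustrated with a small calculation. This sketch assumes a sentence's probability is a weighted sum of its probability under the linguistic grammar and under the backoff grammar; the function name and all numbers are hypothetical.

```python
import math

def mixture_prob(p_linguistic, p_backoff, weight_linguistic):
    """Probability of a sentence under a two-way mixture of grammars."""
    w = weight_linguistic
    return w * p_linguistic + (1.0 - w) * p_backoff

# Hypothetical case: the linguistic grammar cannot parse the sentence,
# but the backoff grammar gives it a small nonzero probability.
p = mixture_prob(p_linguistic=0.0, p_backoff=1e-6, weight_linguistic=0.99)
bits = -math.log2(p)
print(p, bits)      # finite cost thanks to the backoff grammar

# With no weight on the backoff grammar the sentence gets probability 0.
p_no_backoff = mixture_prob(p_linguistic=0.0, p_backoff=1e-6,
                            weight_linguistic=1.0)
print(p_no_backoff)
```

The trade-off is that weight given to the backoff grammar keeps the cross-entropy finite on unexpected sentences, but it also means more of the sentences sampled from the combined grammar come from the backoff part, which may hurt the grammaticality judgments.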

33
6. Discussion
  • What did you do? How?
  • Was CFG expressive enough?
  • How would you improve the formalism?
  • Would it work for other languages?
  • How should one pick the weights?
  • And how could you build a better backoff grammar?
  • Is grammaticality well-defined? How is it
    related to probability?
  • What if you had 36 person-months to do it right?
  • What other tools or data do you need?
  • What would the resulting grammar be good for?
  • What evaluation metrics are most important?

(e.g., features, gapping)
35
7. Winners announced
  • Of course, no one finishes their ambitious plans.
  • Alternative: Allow 2 weeks (see paper)

36
What did past teams do?
  • More fine-grained parts of speech
  • do-support for questions & negation
  • Movement using gapped categories
  • X-bar categories (following the initial grammar)
  • Singular/plural features
  • Pronoun case
  • Verb forms
  • Verb subcategorization, selectional restrictions
    (location)
  • Comparative vs. superlative adjectives
  • Appositives (must avoid double comma)
  • A bit of experimentation with weights
  • One successful attempt to game scoring system (ok
    with us!)

37
Why do we recommend this lesson?
  • Good opening activity
  • Introduces many topics: touchstone for later
    teaching
  • Grammaticality
      • Grammaticality judgments, formal grammars,
        parsers
  • Specific linguistic phenomena
      • Desperate need for features, morphology,
        gap-passing
  • Generative probability models: PCFGs and HMMs
      • Backoff, inside probability, random sampling, ...
  • Recovering latent variables: Parse trees and POS
    taggings
  • Evaluation (sort of)
      • Annotation, precision, recall, cross-entropy, ...
  • Manual parameter tuning
      • Why learning would be valuable, alongside expert
        knowledge

http://www.clsp.jhu.edu/grammar-writing
38
A final thought
  • The CS curriculum starts with programming
  • Accessible and hands-on
  • Necessary to motivate or understand much of CS
  • In CL, the equivalent is grammar writing
  • It was the traditional (pre-statistical)
    introduction
  • Our contributions: competitive game, statistics,
    finite-state backoff, reusable instructional
    materials
  • Much of CL work still centers around grammar
    formalisms
  • We design expressive formalisms for linguistic
    data
  • Solve linguistic problems within these formalisms
  • Enrich them with probabilities
  • Process them with algorithms
  • Learn them from data
  • Connect them to other modules in the pipeline