Title: Context-Free Grammars for English
1 Context-Free Grammars for English
2 What are grammars used for in CL?
- Grammar checking
- Natural Language Understanding
- (1) HP fired Carlini.
- (2) Carlini was fired by HP.
- (3) The most recent executive to be fired by HP was Carlini.
- (4) Carlini was the most recent executive to be fired by HP.
- (1)-(4) all convey the same basic fact, but you need to know the grammatical relationships among these sentences to capture that.
- This sort of information is used in Machine Translation, Question-Answering, Information Extraction, etc.
3 Grammars, Recognition and Parsing
4 Sentence Recognition vs. Sentence Parsing
- Recognizer
- ?- recognize([the,girl,waved]).
- yes
- ?- recognize([waved,girl,the]).
- no
- Parser
- ?- parse([the,girl,waved],Parse).
- Parse = [det,n,vi]
- yes
5 A Sentence Grammar vs. A Sentence Parser
- Grammar: a systematic description of the structures that underlie the sentences of a language
- Parser: a program for determining the structures for particular sentences
- capitalizes on the fact that an infinite number of sentences can be captured with a finite number of structures
6 Grammar vs. Recognizer and Parser
- A Grammar is a declarative description.
- It states the conditions for a string to be valid.
- A Recognizer is a procedural process.
- It determines whether a particular sentence is valid.
- A Parser is a procedural process.
- It determines what the structure of a particular sentence is.
7 The Chomsky Hierarchy
- Chomsky (1959) identified four classes of grammars in terms of their constraints
- Unrestricted phrase structure grammars (type 0)
- Context-sensitive grammars (type 1)
- Context-free grammars (type 2)
- Regular grammars (type 3)
- Higher-numbered types are more constrained and so can generate a smaller set of languages.
8 Why look at grammar types?
- More constrained grammars will be easier to write and compute with.
- Question here: What is the most constrained grammar that can be used to describe a natural language?
9 Automata theory, formal languages and formal grammars
10 We looked at Regular Grammars in Chapter 2
- Left regular grammars have rules of the form
- A → a (A is a non-terminal; a is a terminal)
- A → Ba
- A → ε (ε is the empty string)
- Right regular grammars have rules of the form
- A → a
- A → aB
- A → ε
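The rule shapes above can be read directly as state transitions. The following is a minimal Python sketch, assuming a made-up right regular grammar (S → a S | b B, B → b B | ε) that is not from the slides; the names RULES, EMPTY and accepts are illustrative only.

```python
# Hypothetical right regular grammar (an assumption, not from the slides):
#   S -> a S | b B
#   B -> b B | ε
# Each rule A -> aB is stored as (A, a) -> B; a rule A -> ε marks A as accepting.
RULES = {("S", "a"): "S", ("S", "b"): "B", ("B", "b"): "B"}
EMPTY = {"B"}  # non-terminals that have a rule A -> ε


def accepts(string, start="S"):
    """Return True if the grammar derives the string (here: a* followed by b+)."""
    state = start  # the current non-terminal plays the role of the FSA state
    for symbol in string:
        state = RULES.get((state, symbol))
        if state is None:  # no rule applies to this symbol
            return False
    return state in EMPTY
```

At every step the only thing remembered is the current non-terminal, which is why a regular grammar and a finite state automaton describe the same languages.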
11 Finite State Grammars
- Regular grammars are also called finite state grammars, which we saw used for morphological analysis.
- In a regular grammar, or FSA, the only information we need to know to generate or recognize a sentence is the state we are in.
- We do not need to know anything about what we have already traversed in order to finish the sentence.
12 A Finite State Grammar
[Figure: finite state network; arc labels det, adj, n, vi and Pn/pro]
13 One Configuration of a Finite State Automaton
- Pointer begins in an initial state.
- It moves through a finite set of states.
- As it moves it usually
- scans a new word
- transitions from one state to the next
- It ends in a pre-designated final state.
14 A Finite State Grammar-based Recognizer
- If the recognition procedure can find a
- det
- or adj
- or Pn
- or pro
- it can move from state 1 to
- state 2 (if det)
- state 3 (if adj)
- state 4 (if Pn or pro)
15 Representing the FS Transitions (a Prolog example)
- Grammar
- arc(1,det,2).
- arc(2,adj,3).
- arc(1,adj,3).
- arc(1,pro,4).
- arc(1,pn,4).
- initial(1).
- final(6).
- Lexicon
- word(the,det).
- word(girl,n).
- word(slept,vi).
- word(she,pro).
- word(saw,n).
- word(saw,vi).
- fstn wksht
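To see how a recognizer could walk these transitions, here is a minimal Python sketch that mirrors the arc and word facts. The slide lists only the arcs leaving states 1 and 2, so the arcs marked "assumption" below are invented to make final state 6 reachable; they are not part of the original grammar. The function name recognize is also illustrative.

```python
# Arcs from the slide, plus ASSUMED arcs (marked) so that final state 6 is reachable.
ARCS = {
    (1, "det"): 2,
    (2, "adj"): 3,
    (1, "adj"): 3,
    (1, "pro"): 4,
    (1, "pn"): 4,
    (2, "n"): 4,    # assumption, not on the slide
    (3, "adj"): 3,  # assumption, not on the slide
    (3, "n"): 4,    # assumption, not on the slide
    (4, "vi"): 6,   # assumption, not on the slide
}
INITIAL, FINAL = 1, 6

# Lexicon from the slide; "saw" is ambiguous between n and vi.
LEXICON = {
    "the": {"det"},
    "girl": {"n"},
    "slept": {"vi"},
    "she": {"pro"},
    "saw": {"n", "vi"},
}


def recognize(words):
    """Return True if the word list leads from the initial to the final state."""
    states = {INITIAL}  # a set of states, because a word may have several categories
    for word in words:
        states = {
            ARCS[(state, cat)]
            for state in states
            for cat in LEXICON.get(word, ())
            if (state, cat) in ARCS
        }
        if not states:  # no arc matched this word
            return False
    return FINAL in states
```

Under these assumed arcs, recognize(["the", "girl", "slept"]) succeeds and recognize(["slept", "girl", "the"]) fails. Note that the recognizer keeps only the current states and never looks back at what it has already traversed.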
16 Problems with FSA/Regular Grammars
- Cannot handle long-distance dependencies
- If S1 then S2
- Either S1 or S2
- The man who said that S1 is arriving tomorrow.
- Do not recognize phrasal constituency
- (Det N is an NP)
- Chomsky, 1957.
17 Context-free Grammars
18 CFG Rules allow center embedding
- S → a S b
- S → ε
- This grammar generates the language
- { a^n b^n | n ≥ 0 }
- which is not regular.
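A minimal Python sketch of a recognizer for this two-rule grammar follows; the function name derives_s is illustrative. Each recursive call strips one matching a ... b pair, so the depth of recursion acts as the counter that a finite state machine does not have.

```python
def derives_s(string):
    """Return True if the string can be derived from S -> a S b | ε."""
    if string == "":
        return True  # S -> ε
    if len(string) >= 2 and string[0] == "a" and string[-1] == "b":
        return derives_s(string[1:-1])  # S -> a S b
    return False
```

For example, derives_s("aabb") holds, while derives_s("aab") and derives_s("abab") do not.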
19 Context-Free Grammars
- Used extensively to describe both formal (programming) and natural languages
- Also referred to as Phrase Structure Grammars
- Presented with a BNF (Backus Naur Form) notation
- <S> ::= <a> <S> <b>
- <S> ::= ε
- BNF is also referred to as Backus Normal Form
20 A CFG allows multiple sub-networks within S
Here the sub-networks are NP and VP.
[Figure: S network with arcs labeled NP and VP]
21 Transitioning between S and NP
[Figure: S network from Start to End with arcs labeled NP and VP; NP sub-network including an Adj arc]
22 Transitioning between S and VP
[Figure: S network with NP and VP arcs; VP sub-network with arcs labeled Vt and NP, ending in state VP2]
23 Representing the network for "The girl slept."
[Figure: VP network with an arc labeled vi]
24 The Grammar for "The girl slept."
- Grammar
- S network
- initial(s,1).
- final(s,3).
- arc(s,1,np,2).
- arc(s,2,vp,3).
- NP network
- initial(np,1).
- final(np,3).
- arc(np,1,det,2).
- arc(np,2,n,3).
- VP network
- initial(vp,1).
- final(vp,2).
- arc(vp,1,vi,2).
- Lexicon
- word(the,det).
- word(girl,n).
- word(slept,vi).
- arc(s,1,np,2) reads: in the S network, there is an arc from state 1 to state 2 labeled np.
- wksht 1.1
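The three networks and the lexicon can be traversed with a short recursive procedure. The following is a minimal Python sketch, not the course's Prolog implementation; the names NETWORKS, traverse and recognize are illustrative. When an arc label names another network, the function calls itself on that network and resumes at the arc's destination state afterwards.

```python
# The S, NP and VP networks and the lexicon, transcribed from the Prolog facts.
NETWORKS = {
    "s": {"initial": 1, "final": 3, "arcs": [(1, "np", 2), (2, "vp", 3)]},
    "np": {"initial": 1, "final": 3, "arcs": [(1, "det", 2), (2, "n", 3)]},
    "vp": {"initial": 1, "final": 2, "arcs": [(1, "vi", 2)]},
}
LEXICON = {"the": "det", "girl": "n", "slept": "vi"}


def traverse(net, state, words, pos):
    """Yield each input position at which `net` can reach its final state,
    starting from `state` with words[pos:] still unread."""
    network = NETWORKS[net]
    if state == network["final"]:
        yield pos
    for src, label, dst in network["arcs"]:
        if src != state:
            continue
        if label in NETWORKS:
            # The arc names a sub-network: traverse it, then continue from dst.
            for mid in traverse(label, NETWORKS[label]["initial"], words, pos):
                yield from traverse(net, dst, words, mid)
        elif pos < len(words) and LEXICON.get(words[pos]) == label:
            # The arc names a word category: consume one word.
            yield from traverse(net, dst, words, pos + 1)


def recognize(words):
    """Return True if the S network consumes the whole word list."""
    start = NETWORKS["s"]["initial"]
    return any(end == len(words) for end in traverse("s", start, words, 0))
```

Here recognize(["the", "girl", "slept"]) succeeds, and a list such as ["the", "girl"] fails because the VP network cannot be traversed.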
25 Recursion is possible when arc labels refer to other networks
[Figure: NP and PP networks; arc labels Det, N, PP, jump, P and NP]
The fish [PP in [NP the pond [PP in [NP the mountains]]]]
wksht 1.2
26 CFG Processing
[Figure: S network with arcs labeled NP and VP]
When the parser finishes processing the NP, it pops back up to S2.
How does the parser know how to do this?
27 Keeping track of where you are
- A stack keeps track of where the processor should pop back to after successfully traversing a sub-network.
- When moving to a sub-network, the processor pushes the network node to return to onto the stack.
- When finished with the sub-network, the processor pops the network node from the stack.
28 Computer Simulation of this holding process
String           State  Stack  Comment
The girl slept   S1     ---    Push to NP-network
The girl slept   NP1    S2     Recognize Det
girl slept       NP2    S2     Recognize N
slept            NP3    S2     Pop to S-network
slept            S2     ---    Push to VP-network
slept            VP1    S3     Recognize V
---              VP2    S3     Pop to S-network
---              S3     ---    success
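The trace above can be reproduced with an explicit stack. This is a minimal Python sketch, assuming the S, NP and VP networks for "The girl slept." and assuming at most one arc leaves each state, so no backtracking is needed. The function name trace and the row layout are illustrative, and the comments use the lexicon's category names (det, n, vi).

```python
NETWORKS = {
    "s": {"initial": 1, "final": 3, "arcs": [(1, "np", 2), (2, "vp", 3)]},
    "np": {"initial": 1, "final": 3, "arcs": [(1, "det", 2), (2, "n", 3)]},
    "vp": {"initial": 1, "final": 2, "arcs": [(1, "vi", 2)]},
}
LEXICON = {"the": "det", "girl": "n", "slept": "vi"}


def trace(words):
    """Return (accepted, rows); each row is (string, state, stack, comment)."""
    words = [w.lower() for w in words]
    net, state, pos, stack, rows = "s", 1, 0, [], []

    def row(comment):
        # Record the current configuration, top of the stack first.
        remaining = " ".join(words[pos:]) or "---"
        held = " ".join(f"{n.upper()}{s}" for n, s in reversed(stack)) or "---"
        rows.append((remaining, f"{net.upper()}{state}", held, comment))

    while True:
        network = NETWORKS[net]
        if state == network["final"]:
            if not stack:  # back at the top level: succeed only if input is used up
                ok = pos == len(words)
                row("success" if ok else "failure")
                return ok, rows
            row(f"Pop to {stack[-1][0].upper()}-network")
            net, state = stack.pop()  # resume at the saved return node
            continue
        arcs = [(label, dst) for src, label, dst in network["arcs"] if src == state]
        if not arcs:
            row("failure")
            return False, rows
        label, dst = arcs[0]  # assumption: deterministic networks
        if label in NETWORKS:
            row(f"Push to {label.upper()}-network")
            stack.append((net, dst))  # the node to return to
            net, state = label, NETWORKS[label]["initial"]
        elif pos < len(words) and LEXICON.get(words[pos]) == label:
            row(f"Recognize {label}")
            pos, state = pos + 1, dst
        else:
            row("failure")
            return False, rows
```

Calling trace(["The", "girl", "slept"]) yields eight rows whose State and Stack columns match the table: the return node S2 is held while the NP network is traversed, and S3 is held during the VP network.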
29 Complications with CFGs
30 Agreement causes the grammar to expand considerably
- S → NPsg VPsg.
- S → NPpl VPpl.
- NPsg → detsg nsg.
- NPpl → detpl npl.
- VPsg → visg.
- VPpl → vipl.
- arc(np,2,n,3). becomes
- arc(np,2,nsg,3).
- arc(np,2,npl,3).
- arc(vp,1,vi,2). becomes
- arc(vp,1,visg,2).
- arc(vp,1,vipl,2).
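The growth can be made concrete by generating the agreement-specific rules from feature-free schemas. This is a minimal Python sketch; the names SCHEMAS, FEATURES and expand are illustrative, and the extra "person" feature used in the usage note is a hypothetical addition, not something the slides introduce.

```python
from itertools import product

# Feature-free rule schemas for the toy grammar.
SCHEMAS = [
    ("S", ["NP", "VP"]),
    ("NP", ["det", "n"]),
    ("VP", ["vi"]),
]
FEATURES = {"num": ["sg", "pl"]}


def expand(schemas, features):
    """Copy every schema once per combination of feature values."""
    rules = []
    for values in product(*features.values()):
        suffix = "".join(values)
        for lhs, rhs in schemas:
            # S itself carries no agreement suffix; every other category does.
            new_lhs = lhs if lhs == "S" else lhs + suffix
            rules.append((new_lhs, [sym + suffix for sym in rhs]))
    return rules
```

With number alone, expand(SCHEMAS, FEATURES) turns three schemas into the six rules listed above. Adding a second, hypothetical feature such as {"person": ["1", "2", "3"]} multiplies the count again, to eighteen.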
31 Subcategorization also causes expansion
- VP → vi.
- VP → vt np.
- VP → vl np.
- VP → vl adjp.
- VP → v2t np np.
- . . .
- . . .
- . . .
- arc(vp,1,vi,2).
- arc(vp,1,vt,2,np,3).
- arc(vp,1,vl,2,np,3).
- arc(vp,1,vl,2,adj,3).
- arc(vp,1,v2t,2,np,3,np,4).
- . . .
- . . .
- . . . COMLEX Manual
32 Auxiliaries
- Word order is not a problem
- might have been being bothered
- Selection is a problem with the same consequences as agreement and subcat
- have seen
- *have seeing