Title: Chapter 11: Parsing with Unification Grammars
1Chapter 11 Parsing with Unification Grammars
- Heshaam Faili
- hfaili_at_ece.ut.ac.ir
- University of Tehran
2Overview
- Feature Structures and Unification
- Unification-Based Grammars
- Chart Parsing with Unification-Based Grammars
- Type Hierarchies
3Feature structures
- We had a problem adding agreement to CFGs. What we needed were features, e.g., a way to say
  [number sg, person 3]
- A structure like this allows us to state properties, e.g., about a noun phrase
  [cat NP, number sg, person 3]
- Each feature (e.g., number) is paired with a value (e.g., sg)
- A bundle of feature-value pairs can be put into an attribute-value matrix (AVM)
4Feature paths
- Values can be atomic (e.g., sg, NP, or 3) or complex, and thus we can define feature paths
  [cat NP, agreement [number sg, person 3]]
- The value of the path <agreement number> is sg
- A grammar with only atomic feature values can be converted to a CFG.
  - e.g., the AVM on the previous slide → NP_{3,sg}
- However, when the values are complex, the grammar is more expressive than a CFG → it can represent more linguistic phenomena
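A feature path can be illustrated with a small sketch. The encoding below (nested Python dicts, atomic values as strings) and the helper name `get_path` are assumptions made for illustration, not part of the slides.

```python
from typing import Any, Sequence

# Hypothetical encoding: an AVM is a nested dict; atomic values are strings.
np_avm = {
    "cat": "NP",
    "agreement": {"number": "sg", "person": "3"},
}

def get_path(fs: Any, path: Sequence[str]) -> Any:
    """Follow a feature path; return its value, or None if the path is undefined."""
    value = fs
    for feature in path:
        if not isinstance(value, dict) or feature not in value:
            return None
        value = value[feature]
    return value

# The value of the path <agreement number> is sg.
print(get_path(np_avm, ["agreement", "number"]))
```

A path that runs through an atomic value, such as <cat number>, is simply undefined and yields None in this sketch.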
5An Example for FS
6Reentrancy (structure-sharing)
- Feature structures embedded in feature structures can share the same values
- That is, two features have the exact same value: they share precisely the same object as their value
  - we'll indicate this with a tag like [1]
  [cat S, head [agr [1][num sg, per 3], subj [agr [1]]]]
- In this example, the agreement features of both the matrix sentence and the embedded subject are identical
- This is referred to as reentrancy
7FS with shared value
8Feature structures as graphs
- Technically, feature structures are directed acyclic graphs (DAGs)
- So, the feature structure represented by the attribute-value matrix (AVM)
  [cat NP, agreement [number sg, person 3]]
- is really the graph
  [Figure: a root node with a CAT arc to np and an AGR arc to a second node, which has a NUM arc to sg and a PER arc to 3]
9Unification
- Unification (U) is a basic operation that merges two feature structures into a resultant feature structure (FS)
- The two feature structures must be compatible, i.e., have no values that conflict
- Identical FSs
  [number sg] U [number sg] = [number sg]
- Conflicting FSs
  [number sg] U [number pl] = Fail
- Merging with an unspecified FS
  [number sg] U [number []] = [number sg]
10Unification (cont.)
- Merging FSs with different features specified
  [number sg] U [person 3] = [number sg, person 3]
- More examples
  [cat NP] U [agreement [number sg]] = [cat NP, agreement [number sg]]
  [agr [num sg], subj [agr [num sg]]] U [subj [agr [num sg]]] = [agr [num sg], subj [agr [num sg]]]
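The examples above can be reproduced with a minimal sketch. It assumes a hypothetical encoding (nested dicts, atomic values as strings, `{}` for an unspecified value) and deliberately ignores reentrancy; the function name `unify` and the use of None for Fail are choices made for this illustration.

```python
from typing import Optional, Union

FS = Union[str, dict]  # atomic value or nested feature structure

def unify(f1: FS, f2: FS) -> Optional[FS]:
    """Return the unification of f1 and f2, or None on failure.

    Simplification: builds a new top-level structure and ignores reentrancy.
    """
    # An empty structure is unspecified and unifies with anything.
    if f1 == {}:
        return f2
    if f2 == {}:
        return f1
    # Atomic values must be identical.
    if isinstance(f1, str) or isinstance(f2, str):
        return f1 if f1 == f2 else None
    # Both are complex: merge feature by feature.
    result = dict(f1)
    for feature, value in f2.items():
        if feature in result:
            merged = unify(result[feature], value)
            if merged is None:
                return None
            result[feature] = merged
        else:
            result[feature] = value
    return result

print(unify({"number": "sg"}, {"person": "3"}))
print(unify({"number": "sg"}, {"number": "pl"}))
```

The inputs are never modified, which keeps the sketch simple but is not how a structure-sharing implementation would work.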
11Unification with Reentrancies
- Remember that structure-sharing means they are the same object
  [agr [1][num sg, per 3], subj [agr [1]]] U [subj [agr [per 3, num sg]]]
  = [agr [1][num sg, per 3], subj [agr [1]]]
- When unification takes place, shared values are copied over
  [agr [1], subj [agr [1]]] U [subj [agr [per 3, num sg]]]
  = [agr [1], subj [agr [1][per 3, num sg]]]
12Unification with Reentrancies (cont.)
- And remember that having similar values is not the same as structure-sharing
  [agr [num sg], subj [agr [num sg]]] U [subj [agr [per 3, num sg]]]
  = [agr [num sg], subj [agr [per 3, num sg]]]
- With structure-sharing, you have to make sure the values are compatible everywhere that structure-sharing is specified
  [agr [1][num sg, per 3], subj [agr [1]]] U [agr [num sg, per 3], subj [agr [num pl, per 3]]]
  = Fail
13Subsumption
- We can see that a more general feature structure (fewer values specified) subsumes a more specific feature structure
  - (1) [num sg]
  - (2) [per 3]
  - (3) [num sg, per 3]
- So, we have the following subsumption relations
  - (1) subsumes (3)
  - (2) subsumes (3)
  - (1) does not subsume (2), and (2) does not subsume (1)
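The three relations above can be checked with a short sketch. It assumes the same hypothetical nested-dict encoding (strings for atomic values) and ignores reentrancy, which a full subsumption test would also have to compare.

```python
def subsumes(general, specific) -> bool:
    """True if all information in `general` is also present in `specific`.

    Simplification: reentrancy is not taken into account.
    """
    if isinstance(general, str):
        return general == specific
    if not general:                      # the empty structure subsumes everything
        return True
    if not isinstance(specific, dict):   # complex vs. atomic
        return False
    return all(
        feature in specific and subsumes(value, specific[feature])
        for feature, value in general.items()
    )

fs1 = {"num": "sg"}
fs2 = {"per": "3"}
fs3 = {"num": "sg", "per": "3"}

print(subsumes(fs1, fs3), subsumes(fs2, fs3))
print(subsumes(fs1, fs2), subsumes(fs2, fs1))
```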
14Implementing Unification
- How do we implement a check on unification?
  - i.e., given feature structures F1 and F2, return F, the unification of F1 and F2
- Unification is a recursive operation
  - If a feature has an atomic value, see if the other FS has that feature with the same value
    - [F a] unifies with [], [F []], and [F a]
  - If a feature has a complex value, follow the paths to see if they're compatible and have the same values at the bottom
    - Does [F G1] unify with [F G2]? We have to inspect G1 and G2 to find out.
- To avoid cycles, we have to do an occur check to see if we've seen an FS before or not
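One way to implement the recursion with structure sharing is destructive unification over DAG nodes with forwarding pointers. The sketch below is an illustration under stated assumptions, not the algorithm from the untranscribed slides: the class `Node` and the helpers `atom`, `fs`, `deref`, and `to_dict` are hypothetical names, failure is signalled with None, and the occur check is omitted.

```python
class Node:
    """A DAG node: an atomic value, or a set of feature arcs to other nodes."""
    def __init__(self, value=None, arcs=None):
        self.value = value                            # e.g. "sg", or None
        self.arcs = arcs if arcs is not None else {}  # feature -> Node
        self.forward = None                           # set once merged into another node

def atom(value):
    return Node(value=value)

def fs(**arcs):
    return Node(arcs=dict(arcs))

def deref(node):
    """Follow forwarding pointers to the node that currently holds the content."""
    while node.forward is not None:
        node = node.forward
    return node

def unify(n1, n2):
    """Destructively unify two DAGs; return the merged node, or None on failure.

    On failure the inputs may be partially modified, so callers would
    normally unify copies. No occur check is performed in this sketch.
    """
    n1, n2 = deref(n1), deref(n2)
    if n1 is n2:
        return n1
    if n1.value is None and not n1.arcs:      # n1 is unspecified
        n1.forward = n2
        return n2
    if n2.value is None and not n2.arcs:      # n2 is unspecified
        n2.forward = n1
        return n1
    if n1.value is not None or n2.value is not None:
        if n1.value == n2.value:              # identical atomic values
            n1.forward = n2
            return n2
        return None                           # atomic clash, or atomic vs. complex
    n2.forward = n1                           # both complex: merge n2 into n1
    for feature, child in n2.arcs.items():
        if feature in n1.arcs:
            if unify(n1.arcs[feature], child) is None:
                return None
        else:
            n1.arcs[feature] = child
    return n1

def to_dict(node):
    """Read a DAG back out as nested dicts (loses sharing information)."""
    node = deref(node)
    if node.value is not None:
        return node.value
    return {feature: to_dict(child) for feature, child in node.arcs.items()}

# [agr [1], subj [agr [1]]] U [subj [agr [per 3, num sg]]]
tag1 = fs()
left = fs(agr=tag1, subj=fs(agr=tag1))
right = fs(subj=fs(agr=fs(per=atom("3"), num=atom("sg"))))
print(to_dict(unify(left, right)))
```

Because both paths lead to the same node, the information added under <subj agr> becomes visible under <agr> as well, which is the "copied over" behaviour described for reentrant structures.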
18Overview
- Feature Structures and Unification
- Unification-Based Grammars
- Chart Parsing with Unification-Based Grammars
- Type Hierarchies
19Grammars with Feature Structures
- CFG skeleton augmented with feature structure path equations, i.e., each category has a feature structure
  - CFG skeleton
    S → NP VP
  - Path equations
    <NP agreement> = <VP agreement>
- 1. There can be zero or more path equations for each rule skeleton → categories are no longer atomic
- 2. When a path equation references constituents, they can only be constituents from the CFG rule
  - e.g., <Det agreement> = <Nom agreement> is an illegal equation for the above rule! (But it would be fine for NP → Det Nom)
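The second restriction can be checked mechanically. The sketch below assumes a hypothetical `Rule` class in which a path is a tuple whose first element names a constituent, and the right-hand side of an equation is either another path or an atomic value.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

Path = Tuple[str, ...]

@dataclass
class Rule:
    """A CFG skeleton plus path equations (hypothetical representation)."""
    lhs: str
    rhs: List[str]
    equations: List[Tuple[Path, Union[Path, str]]] = field(default_factory=list)

    def illegal_equations(self):
        """Return the equations that mention a constituent not in this rule."""
        allowed = {self.lhs, *self.rhs}
        bad = []
        for left, right in self.equations:
            paths = [left] + ([right] if isinstance(right, tuple) else [])
            if any(path[0] not in allowed for path in paths):
                bad.append((left, right))
        return bad

agreement = (("NP", "agreement"), ("VP", "agreement"))
det_nom = (("Det", "agreement"), ("Nom", "agreement"))

print(Rule("S", ["NP", "VP"], [agreement]).illegal_equations())
print(Rule("S", ["NP", "VP"], [det_nom]).illegal_equations())
print(Rule("NP", ["Det", "Nom"], [det_nom]).illegal_equations())
```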
20Agreement in Feature-Based Grammars
- S → NP VP
  <S head> = <VP head>
  <NP head agr> = <VP head agr>
- VP → V NP
  <VP head> = <V head>
- NP → Det Nom(inal)
  <NP head> = <Nom head>
  <Det head agr> = <Nom head agr>
- Nom → Noun
  <Nom head> = <Noun head>
- Noun → flights
  <Noun head agr num> = pl
- Compare with the CFG case
  - S → 3sgNP 3sgVP
  - S → PluralNP PluralVP
  - 3sgVP → 3sgVerb
  - 3sgVP → 3sgVerb NP
  - 3sgVP → 3sgVerb NP PP
  - 3sgVP → 3sgVerb PP
  - etc.
21Percolating Agreement Features
- S → NP VP
  <NP head agr> = <VP head agr>
- VP → V NP
  <VP head> = <V head>
- NP → Det Nom
  <NP head> = <Nom head>
  <Det head agr> = <Nom head agr>
- Nom → Noun
  <Nom head> = <Noun head>
22Head features in the grammar
- An important concept shown in the previous rules is that heads of grammar rules share properties with their mothers
  - VP → V NP
    <VP head> = <V head>
- Knowing the head will tell you about the whole phrase
- This is important for many parsing techniques
23Sub-categorization
- We could specify subcategorization like so
  - VP → V
    <VP head subcat> = intrans
  - VP → V NP
    <VP head subcat> = trans
  - VP → V NP NP
    <VP head subcat> = ditrans
- But values like intrans do not correspond to anything that the rules actually look like
- To make SUBCAT better match the rules, we can make it a list of a verb's arguments, e.g., <NP, PP>
24Handling Subcategorization
- VP → V NP PP
  <VP head> = <V head>
  <VP head subcat> = <NP, PP>
- V → leaves
  <V head agr num> = sg
  <V head subcat> = <NP, PP>
- [Figure: tree with nodes VP, V, NP, and PP over the word leaves; annotations in the figure: [head [1] [subcat <[2], [3]>]], [cat [2]], [cat [3]], and [head [1] [agr [num sg], subcat <[cat np], [cat pp]>]]]
- There is also a longer, more formal way to specify lists
- <NP, PP> is equivalent to
  [FIRST NP, REST [FIRST PP, REST <>]]
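The FIRST/REST encoding of a subcategorization list can be sketched as a pair of conversion functions. The function names and the use of the string "<>" for the empty list are assumptions made for this illustration.

```python
EMPTY = "<>"  # hypothetical atomic marker for the empty list

def to_first_rest(items):
    """Encode a Python list as nested FIRST/REST feature structures."""
    encoded = EMPTY
    for item in reversed(items):
        encoded = {"FIRST": item, "REST": encoded}
    return encoded

def from_first_rest(encoded):
    """Decode a FIRST/REST structure back into a Python list."""
    items = []
    while encoded != EMPTY:
        items.append(encoded["FIRST"])
        encoded = encoded["REST"]
    return items

subcat = to_first_rest(["NP", "PP"])
print(subcat)
print(from_first_rest(subcat))
```

Encoding lists this way means a SUBCAT value is an ordinary feature structure, so it can be unified like any other value.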
25Subcategorization frames
- Subcategorization, or valency, or dependency, is a very important notion in capturing syntactic regularity, and there is a wide variety of arguments that a verb (or noun or adjective) can take.
- Some subcategorization frames for ask
  - He asked [Q "What was it like?"]
  - He asked [S[wh] what it was like]
  - He asked [NP her] [S[wh] what it was like]
  - He asked [VP[to] to see you]
  - He asked [NP her] [VP[to] to tell you]
  - He asked [NP a question]
  - He asked [NP her] [NP a question]
26Long-Distance Dependencies
- What is the earliest flight that you have _?
- TOP (fill gap)
  - S → WH-word Be-copula NP
    <NP gap> = <WH-word head>
- MIDDLE (pass gap)
  - NP → D Nom
    <NP gap> = <Nom gap>
  - Nom → Nom RelClause
    <Nom gap> = <RelClause gap>
  - RelClause → RelPro NP VP
    <RelClause gap> = <VP gap>
- BOTTOM (identify gap)
  - VP → V
    <VP gap> = <V subcat second>
27Overview
- Feature Structures and Unification
- Unification-Based Grammars
- Chart Parsing with Unification-Based Grammars
- Type Hierarchies
28Modifying a Chart Parser to handle Unification
- Our grammar still has a context-free backbone, so we could just parse a sentence with a CFG and use the features to filter out the ungrammatical sentences
- But by utilizing unification as we parse, we can eliminate parses that won't work in the end
  - e.g., we'll eliminate NPs that don't match in agreement features with their VPs as we parse, instead of ruling them out later
29Changes to the Chart Representation
- Each state will be extended to include the LHS DAG (which can get augmented as it goes along).
  - i.e., add a feature structure (in DAG form) to each state
- So, S → • NP VP, [0,0]
- becomes S → • NP VP, [0,0], DagS
- The predictor, scanner, and completer have to pass in the DAG, so all three operations have to be altered
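An extended chart state might be represented as follows. This is a sketch under assumptions: the class `State` and its field names are hypothetical, and the DAG is encoded as nested dicts in which a tag corresponds to one shared dict object.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

@dataclass
class State:
    """An Earley state extended with a feature structure (hypothetical fields)."""
    lhs: str
    rhs: Tuple[str, ...]
    dot: int      # position of the dot within rhs
    start: int
    end: int
    dag: dict = field(default_factory=dict)  # feature structure for the rule

    def next_category(self) -> Optional[str]:
        """Category expected after the dot, or None if the state is complete."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)}, [{self.start},{self.end}], Dag{self.lhs}"

# DAG for S → NP VP with <S head> = <VP head> and <NP head agr> = <VP head agr>.
agr = {}               # shared agreement value
head = {"agr": agr}    # shared head value
dag_s = {"S": {"head": head}, "NP": {"head": {"agr": agr}}, "VP": {"head": head}}

state = State("S", ("NP", "VP"), dot=0, start=0, end=0, dag=dag_s)
print(state)
```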
30Earley Chart Parser with Unification
31Predictor
- The predictor starts with the DAG from the context-free rule
  - S → NP VP
    <S head> = <VP head>
    <NP head agr> = <VP head agr>
- PREDICTOR
  - S → • NP VP, [0,0], DagS
  - where DagS is
    [S [head [1]], NP [head [agr [2]]], VP [head [1][agr [2]]]]
32Completer
- The completer combines two states (dotted rules) and unifies the two feature structures associated with them
- COMPLETER
  - When an NP is completed, DagS will get updated
  - S → NP • VP, [0,1], DagS
  - where DagS is now
    [S [head [1]], NP [definite yes, head [lex students, agr [2][num pl]]], VP [head [1][agr [2]]]]
33Predictor, Scanner, Completer
34Unify States
35Change to ENQUEUE
- The enqueue procedure should also be changed to use a subsumption test
  - Do not add a state to the chart if an equivalent or more general state is already there.
  - So, if Enqueue wants to add a singular determiner state at [x, y], and the chart already has a determiner state at [x, y] unspecified for number, then Enqueue will not add it.
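The modified enqueue can be sketched as follows. Everything here is an assumption made for illustration: states are plain tuples `(rule, dot, span, dag)`, DAGs are nested dicts without reentrancy, and `subsumes` is a simplified helper.

```python
def subsumes(general, specific) -> bool:
    """True if all information in `general` is also in `specific` (no reentrancy)."""
    if isinstance(general, str):
        return general == specific
    if not general:
        return True
    if not isinstance(specific, dict):
        return False
    return all(f in specific and subsumes(v, specific[f]) for f, v in general.items())

def enqueue(state, chart_entry) -> bool:
    """Add `state` unless an equivalent or more general state is already present."""
    rule, dot, span, dag = state
    for other_rule, other_dot, other_span, other_dag in chart_entry:
        same_skeleton = (other_rule, other_dot, other_span) == (rule, dot, span)
        if same_skeleton and subsumes(other_dag, dag):
            return False          # nothing new: do not add
    chart_entry.append(state)
    return True

# The chart already holds a determiner state unspecified for number.
chart_entry = [("NP → Det Nom", 0, (0, 0), {"Det": {"head": {"agr": {}}}})]
singular = ("NP → Det Nom", 0, (0, 0), {"Det": {"head": {"agr": {"num": "sg"}}}})

print(enqueue(singular, chart_entry))   # the more specific state is rejected
print(len(chart_entry))
```

Note that the test is one-directional in this sketch: a more general state arriving after a more specific one is still added.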
36Why a Subsumption Test?
- If we don't impose a subsumption restriction, enqueue will add two states at [x, y], one expecting to see a singular determiner, the other just a determiner.
- On seeing a singular determiner, the parser will advance the dot on both rules, creating two edges (since singular will unify with both singular and unspecified).
- As a result, we would get duplicate edges.
- If we impose the restriction, and we see either a singular or plural determiner, and we advance the dot, only one edge (singular or plural) gets created at [x, y].
37Overview
- Feature Structures and Unification
- Unification-Based Grammars
- Chart Parsing with Unification-Based Grammars
- Type Hierarchies
38Using Type Hierarchies
- Instead of simple feature structures, formalisms like Head-Driven Phrase Structure Grammar (HPSG) use typed feature structures
- Two problems right now
  - What prevents us right now from specifying the following?
    <number feminine>
  - How can we capture the fact that all values of NUMBER are the same sort of thing, i.e., make a generalization?
- Solution: use types
39Type Systems
- 1. Each feature structure is labeled by a type.
  [noun, CASE case]
- 2. Each type has appropriateness conditions specifying what features are appropriate for it.
  - noun → [CASE case]
  - verb → [VFORM vform]
- 3. Types are organized into a type hierarchy.
- 4. Unification is modified to allow two different types to unify.
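Appropriateness conditions can be pictured as a lookup table. The sketch below is a simplification under assumptions: the table `APPROPRIATE` and the function `check` are hypothetical, and values are compared by exact type name, without consulting a type hierarchy.

```python
# Hypothetical appropriateness table: type -> {feature: required value type}.
APPROPRIATE = {
    "noun": {"CASE": "case"},
    "verb": {"VFORM": "vform"},
}

def check(fs_type, features):
    """Return a list of appropriateness violations for a typed feature structure.

    `features` maps each feature to the type of its value. Subtypes are not
    considered here; that would require the type hierarchy.
    """
    allowed = APPROPRIATE.get(fs_type, {})
    errors = []
    for feature, value_type in features.items():
        if feature not in allowed:
            errors.append(f"{feature} is not appropriate for {fs_type}")
        elif value_type != allowed[feature]:
            errors.append(f"{feature} must have a value of type {allowed[feature]}")
    return errors

print(check("noun", {"CASE": "case"}))
print(check("noun", {"VFORM": "vform"}))
print(check("noun", {"CASE": "vform"}))
```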
40Simple Type Hierarchy
41Type Hierarchy
- So, if
  - CASE is appropriate for noun, and
  - the value of CASE is case, and
  - we have the following type hierarchy: case, with subtypes nom, acc, and dat
- Then, the following are possible feature structures
  [noun, CASE nom]   [noun, CASE acc]   [noun, CASE dat]
42Unification of types
- Now, when we unify feature structures, we have to unify types, too
  [CASE case] U [CASE nom] = [CASE nom]
  [CASE nom] U [CASE acc] = Fail
- Let's also assume that acc and dat have a common subtype, obj (obj sits below both acc and dat in the hierarchy)
- Then, we have the following unification
  [CASE acc] U [CASE dat] = [CASE obj]
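Type unification over this small hierarchy can be sketched as a search for the most general common subtype. The table `SUPERTYPES` and the function names are assumptions for illustration; the hierarchy itself is the one described above (case over nom, acc, dat, with obj below both acc and dat).

```python
# Each type lists its immediate supertypes (hypothetical encoding).
SUPERTYPES = {
    "case": [],
    "nom": ["case"],
    "acc": ["case"],
    "dat": ["case"],
    "obj": ["acc", "dat"],
}

def is_subtype(t, ancestor) -> bool:
    """True if t equals ancestor or lies below it in the hierarchy."""
    return t == ancestor or any(is_subtype(parent, ancestor) for parent in SUPERTYPES[t])

def unify_types(a, b):
    """Return the most general type that is a subtype of both a and b, or None."""
    common = [t for t in SUPERTYPES if is_subtype(t, a) and is_subtype(t, b)]
    for candidate in common:
        # The most general common subtype sits above every other common subtype.
        if all(is_subtype(t, candidate) for t in common):
            return candidate
    return None

print(unify_types("case", "nom"))
print(unify_types("nom", "acc"))
print(unify_types("acc", "dat"))
```

When two types have no common subtype, as with nom and acc, the result is None, which corresponds to Fail.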
43Practices