Title: CLIN Utrecht 99
1 Starting With Complex Primitives Pays Off: Complicate Locally, Simplify Globally
ARAVIND K. JOSHI
Department of Computer and Information Science and Institute for Research in Cognitive Science
2 Outline
- Introduction
- Towards CLSG
- Syntactic description
- Semantic composition
- Statistical processing
- Psycholinguistic properties
- Applications to other domains
- Discourse structure
- Folded structure of biomolecular sequences
- Summary
3 Introduction
- Formal systems to specify a grammar formalism
- Start with primitives (basic primitive structures or building blocks) as simple as possible and then introduce various operations for constructing more complex structures
- Such systems are string rewriting systems, requiring string adjacency of function and argument
- Alternatively, ...
4 Introduction: CLSG
- Start with complex (more complicated) primitives, which directly capture some crucial linguistic properties, and then introduce some general operations for composing them -- Complicate Locally, Simplify Globally (CLSG)
- CLSG systems are structure rewriting systems, requiring structure adjacency of function and argument
- The CLSG approach is characterized by localizing almost all complexity in the set of primitives, a key property
5 Introduction: CLSG localization of complexity
- Specification of the set of complex primitives becomes the main task of a linguistic theory
- CLSG pushes non-local dependencies to become local, i.e., they are present in the primitive structures to start with
6 CLSG
- The CLSG approach has led to several new insights into
- Syntactic description
- Semantic composition
- Language generation
- Statistical processing
- Psycholinguistic properties
- Discourse structure
7 Context-free Grammars
- The domain of locality is the one-level tree -- the primitive building blocks
CFG, G:
  S -> NP VP        NP -> DET N
  VP -> V NP        DET -> the
  VP -> VP ADV      N -> man / car
  V -> likes        ADV -> passionately
[Figure: the one-level trees (local domains) of G]
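To make the one-level domain of locality concrete, here is a small illustrative sketch (Python, not part of the original talk): each production of G is stored as a parent together with its immediate children and nothing more, so the verb likes and its subject NP never share a single local domain.

# Illustrative sketch: each CFG production is a one-level local tree --
# a parent label together with its immediate children, nothing more.
cfg = {
    "S":   [["NP", "VP"]],
    "VP":  [["V", "NP"], ["VP", "ADV"]],
    "NP":  [["DET", "N"]],
    "DET": [["the"]],
    "N":   [["man"], ["car"]],
    "V":   [["likes"]],
    "ADV": [["passionately"]],
}

# 'likes' occurs only in the local domain VP -> V NP; its subject NP is
# introduced by a different one-level tree (S -> NP VP).
for lhs, expansions in cfg.items():
    for rhs in expansions:
        print(lhs, "->", " ".join(rhs))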
8 Context-free Grammars
- The arguments of the predicate are not in the same local domain
- They can be brought together in the same domain by introducing a rule S -> NP V NP
- However, then the structure is lost
- Further, the local domains of a CFG are not necessarily lexicalized
- Domain of Locality and Lexicalization
9 Towards CLSG: Lexicalization
- Lexical item -> one or more elementary structures (trees, directed acyclic graphs), which are syntactically and semantically encapsulated
- Universal combining operations
- Grammar = Lexicon
10 Lexicalized Grammars
- Context-free grammar (CFG)
CFG, G:
  S -> NP VP            (non-lexical)
  VP -> V NP            (non-lexical)
  VP -> VP ADV          (non-lexical)
  NP -> Harry           (lexical)
  NP -> peanuts         (lexical)
  V -> likes            (lexical)
  ADV -> passionately   (lexical)
[Figure: the derived tree for "Harry likes peanuts passionately"]
11 Weak Lexicalization
- Greibach Normal Form (GNF): CFG rules are of the form A -> a B1 B2 ... Bn or A -> a (a a terminal, the Bi nonterminals)
- This lexicalization gives the same set of strings but not the same set of trees, i.e., not the same set of structural descriptions. Hence, it is a weak lexicalization.
12 Strong Lexicalization
- Same set of strings and same set of trees, or structural descriptions
- Tree substitution grammars (TSG)
- Increased domain of locality
- Substitution as the only combining operation
13 Substitution
[Figure: substitution -- a tree b rooted in X is substituted at a frontier node labeled X of tree a, yielding the derived tree g]
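Substitution can be sketched in a few lines of illustrative Python; the (label, children) tree encoding and the function below are exposition aids, not part of the talk.

# Minimal sketch of substitution, assuming trees are (label, children) pairs
# and a substitution node is a leaf labeled with a nonterminal.

def substitute(tree, label, arg):
    """Substitute tree `arg` at the leftmost leaf of `tree` labeled `label`."""
    def walk(node):
        lab, children = node
        if not children:
            return (arg, True) if lab == label else (node, False)
        new_children, done = [], False
        for child in children:
            if done:
                new_children.append(child)
            else:
                new_child, done = walk(child)
                new_children.append(new_child)
        return (lab, new_children), done
    result, _ = walk(tree)
    return result

# Example: plug the NP trees for Harry and peanuts into the likes tree.
likes = ("S", [("NP", []), ("VP", [("V", [("likes", [])]), ("NP", [])])])
harry = ("NP", [("Harry", [])])
peanuts = ("NP", [("peanuts", [])])
print(substitute(substitute(likes, "NP", harry), "NP", peanuts))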
14 Strong Lexicalization
- Tree substitution grammars (TSG)
CFG, G:
  S -> NP VP     NP -> Harry
  VP -> V NP     NP -> peanuts
                 V -> likes
[Figure: TSG G' with elementary trees a1 (the S tree anchored on likes, with NP substitution slots) and the NP trees for Harry and peanuts (a2, a3)]
15 Insufficiency of TSG
- Formal insufficiency of TSG
CFG, G:  S -> S S (non-lexical)   S -> a (lexical)
[Figure: a candidate TSG G' with lexicalized elementary trees a1, a2, a3 built from S and a]
16 Insufficiency of TSG
[Figure: the trees a1, a2, a3 of the TSG G' and a tree g of G that grows on both sides of the root; g cannot be assembled from a1, a2, a3 by substitution]
g grows on both sides of the root. G' can generate all strings of G but not all trees of G. CFGs cannot be lexicalized by TSGs, i.e., by substitution alone.
17 Adjoining
[Figure: adjoining -- an auxiliary tree b, with root and foot nodes labeled X, is adjoined at a node labeled X in tree a, yielding the derived tree g]
Tree b is adjoined to tree a at the node labeled X in the tree a.
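Adjoining can be sketched in the same illustrative encoding (foot nodes are marked with a trailing '*'); again this is an exposition aid under those assumptions, not a definitive implementation.

# Minimal sketch of adjoining: excise the subtree of `alpha` at a node
# labeled `label`, insert the auxiliary tree `beta` there, and hang the
# excised subtree at the foot node of `beta` (written label + "*").

def adjoin(alpha, label, beta):
    """Adjoin auxiliary tree `beta` at the leftmost `label` node of `alpha`."""
    def plug_foot(node, subtree):
        lab, children = node
        if lab == label + "*" and not children:   # the foot node
            return subtree
        return (lab, [plug_foot(c, subtree) for c in children])

    def walk(node):
        lab, children = node
        if lab == label:
            return plug_foot(beta, (lab, children)), True
        new_children, done = [], False
        for child in children:
            if done:
                new_children.append(child)
            else:
                child, done = walk(child)
                new_children.append(child)
        return (lab, new_children), done

    result, _ = walk(alpha)
    return result

# Example (as on the next slide): adjoining the auxiliary tree S(S* a) at the
# root of the initial tree S(a) yields S(S(a) a) -- the auxiliary tree wraps
# new material around the adjunction site.
alpha3 = ("S", [("a", [])])
beta = ("S", [("S*", []), ("a", [])])
print(adjoin(alpha3, "S", beta))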
18 With Adjoining
G:  S -> S S   S -> a
G' (with adjoining): an initial tree a3 (S dominating a) and auxiliary trees a1 and a2, each anchored on a
[Figure: the elementary trees a1, a2, a3 and the derived tree g]
Adjoining a2 to a3 at the S node (the root node), and then adjoining a1 to the S node of the derived tree, we obtain g.
CFGs can be lexicalized by LTAGs. Adjoining is crucial for lexicalization.
Adjoining arises out of lexicalization.
19 Lexicalized LTAG
- Finite set of elementary trees anchored on lexical items -- extended projections of lexical anchors, encapsulating syntactic and semantic dependencies
- Elementary trees: Initial and Auxiliary
- Operations: Substitution and Adjoining
- Derivation
- Derivation tree: how the elementary trees are put together
- Derived tree
20 Localization of Dependencies
- agreement: person, number, gender
- subcategorization: sleeps (null), eats (NP), gives (NP NP), thinks (S) -- see the sketch after this list
- filler-gap: who did John ask Bill to invite e
- word order: within and across clauses, as in scrambling and clitic movement
- function-argument: all arguments of the lexical anchor are localized
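One way to picture this localization is a lexicon keyed on anchors, each anchor carrying its own argument slots; a minimal illustrative sketch (the frame notation is hypothetical):

# Hypothetical sketch: the lexicon pairs each anchor with the argument slots
# its elementary trees provide, so subcategorization is stated locally.
lexicon = {
    "sleeps": [],             # no complement
    "eats":   ["NP"],         # transitive
    "gives":  ["NP", "NP"],   # ditransitive
    "thinks": ["S"],          # sentential complement
}

for anchor, frame in lexicon.items():
    print(anchor, ":", "subject NP +", " ".join(frame) if frame else "(null)")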
21 Localization of Dependencies
- word clusters (flexible idioms), the non-compositional aspect -- take a walk, give a cold shoulder to
- word co-occurrences
- lexical semantic aspects
- statistical dependencies among heads
- anaphoric dependencies
22 LTAG Examples
[Figure: two elementary trees anchored on likes -- a1, the transitive tree, and a2, the object-extraction tree (with a fronted NP and an empty object position e)]
Some other trees for likes: subject extraction, topicalization, subject relative, object relative, passive, etc.
23 LTAG: A derivation
[Figure: the elementary trees for "who does Bill think Harry likes" -- a2, the object-extraction tree for likes (with an empty object e); the auxiliary trees b1 (think) and b2 (does); and the NP trees a3 (who), a4 (Harry), a5 (Bill)]
24 LTAG: A Derivation
who does Bill think Harry likes
[Figure: the same elementary trees with the composition operations marked -- who, Harry, and Bill substitute at NP nodes; b1 (think) and b2 (does) adjoin]
25 LTAG: Derived Tree
who does Bill think Harry likes
[Figure: the derived tree for "who does Bill think Harry likes"]
26 LTAG: Derivation Tree
who does Bill think Harry likes
[Figure: the derivation tree -- a2 (likes) at the root, with a3 (who) and a4 (Harry) attached by substitution and b1 (think) by adjoining; a5 (Bill) substitutes into b1, and b2 (does) adjoins to b1]
Compositional semantics is defined on this derivation structure; it is related to dependency diagrams.
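The derivation tree can be read as a plain record of attachments. The sketch below (illustrative Python; tree addresses omitted) lists, for this sentence, each elementary tree, the tree it attaches to, and the operation used.

# Sketch of the derivation tree above as a list of attachments:
# (attaching elementary tree, tree it attaches to, operation).
derivation = [
    ("a3: who",   "a2: likes", "substitution"),
    ("a4: Harry", "a2: likes", "substitution"),
    ("b1: think", "a2: likes", "adjoining"),
    ("a5: Bill",  "b1: think", "substitution"),
    ("b2: does",  "b1: think", "adjoining"),
]

for child, parent, op in derivation:
    print(f"{child} --{op}--> {parent}")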
27 Topology of Elementary Trees: Nested Dependencies
The topology of the elementary trees a and b determines the nature of the dependencies described by the TAG grammar G.
[Figure: elementary trees in which the a and b elements sit on opposite sides of the spine; repeated adjoining yields strings of the form a a a b b b with nested dependencies]
28 Topology of Elementary Trees: Crossed dependencies
[Figure: elementary trees in which b sits one level below a and to the right of the spine]
The topology of the elementary trees a and b determines the kinds of dependencies that can be characterized: here b is one level below a and to the right of the spine.
29 Topology of Elementary Trees: Crossed dependencies
[Figure: the resulting derived trees -- the strings have the form a a b b with crossed dependencies, and the derived structure is linear]
30 Examples: Nested Dependencies
- Center embedding of relative clauses in English: The rat1 the cat2 chased2 ate1 the cheese
- Center embedding of complement clauses in German: Hans1 Peter2 Marie3 schwimmen3 lassen2 sah1 (Hans saw Peter make Marie swim)
31 Examples: Crossed Dependencies
- Center embedding of complement clauses in Dutch: Jan1 Piet2 Marie3 zag1 laten2 zwemmen3 (Jan saw Piet make Marie swim)
- It is possible to obtain a wide range of complex dependencies, i.e., complex combinations of nested and crossed dependencies. Such patterns arise in word order phenomena such as scrambling and clitic climbing, and also due to scope ambiguities.
32 LTAG: Some Important Properties
- Factoring recursion from the domain of dependencies (FRD) and extended domain of locality (EDL)
- All interesting properties of LTAG follow from FRD and EDL: mathematical, linguistic, and processing properties
- LTAGs belong to the class of so-called mildly context-sensitive grammars
- The automaton equivalent of TAG is the embedded pushdown automaton (EPDA)
33 Processing of crossed and nested dependencies
- Crossed dependencies (CD): Jan1 Piet2 Marie3 zag1 laten2 zwemmen3 (Jan saw Piet make Marie swim)
- Nested dependencies (ND): Hans1 Peter2 Marie3 schwimmen3 lassen2 sah1 (Hans saw Peter make Marie swim)
- CDs are easier to process (by about one-half) than NDs (Bach, Brown, and Marslen-Wilson, 1986)
- Principle of partial interpretation (PPI)
- The EPDA model correctly predicts the BBM results (Joshi, 1990)
34 Some Important Properties of LTAG
- Extended domain of locality (EDL)
- Localizing dependencies
- The elementary trees are the domains for specifying linguistic constraints
- Factoring recursion from the domain of dependencies (FRD)
- All interesting properties of LTAG follow from EDL and FRD: mathematical, linguistic, and processing properties
- Belongs to the class of mildly context-sensitive grammars
35 A different perspective on LTAG
- Treat the elementary trees associated with a lexical item as if they are super parts of speech (super-POS, or supertags)
- Local statistical techniques have been remarkably successful in disambiguating standard POS
- Apply these techniques to disambiguating supertags -- "almost parsing"
36 Supertag disambiguation -- supertagging
- Given a corpus parsed by an LTAG grammar,
- we have statistics of supertags -- unigram, bigram, trigram, etc.
- these statistics combine the lexical statistics as well as the statistics of the constructions in which the lexical items appear
37 Supertagging
[Figure: the lattice of candidate supertags (a1 ... a13, b1 ... b4) for the words of "the purchase price includes two ancillary companies"]
On average, a lexical item has about 8 to 10 supertags.
38 Supertagging
[Figure: the same supertag lattice for "the purchase price includes two ancillary companies", with the correct supertag for each word highlighted in green]
- Select the correct supertag for each word (shown in green in the figure)
- The correct supertag for a word is the supertag that corresponds to that word in the correct parse of the sentence
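As an illustration of how such statistics drive the choice, here is a toy Viterbi supertagger over made-up probabilities. It uses bigram transitions for brevity (a trigram model conditions on the two previous supertags in the same way); none of the numbers or tag names come from the WSJ experiments.

# Toy sketch: choose the supertag sequence maximizing
# P(tag_i | tag_{i-1}) * P(word_i | tag_i), decoded with Viterbi.
from collections import defaultdict
import math

emit = {                                   # P(word | supertag), made-up values
    ("the", "a_det"): 1.0,
    ("price", "a_noun"): 0.6, ("price", "a_nounmod"): 0.4,
    ("includes", "a_trans"): 1.0,
}
trans = defaultdict(lambda: 1e-6, {        # P(tag | previous tag), made-up values
    ("<s>", "a_det"): 0.9,
    ("a_det", "a_noun"): 0.7, ("a_det", "a_nounmod"): 0.3,
    ("a_noun", "a_trans"): 0.8, ("a_nounmod", "a_trans"): 0.2,
})

def supertag(words, tagset):
    best = {"<s>": (0.0, [])}              # tag -> (log probability, path)
    for w in words:
        new_best = {}
        for prev, (lp, path) in best.items():
            for t in tagset:
                p = emit.get((w, t), 1e-9) * trans[(prev, t)]
                cand = (lp + math.log(p), path + [t])
                if t not in new_best or cand[0] > new_best[t][0]:
                    new_best[t] = cand
        best = new_best
    return max(best.values())[1]

print(supertag(["the", "price", "includes"],
               ["a_det", "a_noun", "a_nounmod", "a_trans"]))
# -> ['a_det', 'a_noun', 'a_trans']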
39 Supertagging -- performance
- Performance of a trigram supertagger on the WSJ corpus; Srinivas (1997), Chen (2002)

  Model                 % correct   Words correctly supertagged   Training corpus    Test corpus
  Baseline              75.3        35,391                        --                 47,000 words
  Trigram supertagger   92.2        43,334                        1 million words    47,000 words
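A quick consistency check on the counts above (assuming the "words correctly supertagged" column is counted out of the 47,000-word test corpus):

# 35,391 and 43,334 correct words out of a 47,000-word test set
print(round(35391 / 47000 * 100, 1))   # 75.3 (baseline)
print(round(43334 / 47000 * 100, 1))   # 92.2 (trigram supertagger)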
40 Abstract character of supertagging
- Complex (richer) descriptions of the primitives (anchors)
- Contrary to the standard mathematical convention, in which descriptions of primitives are simple and complex descriptions are made from simple descriptions
- Associate with each primitive all the information associated with it
41 Complex descriptions of primitives
- Making the descriptions of primitives more complex
- increases the local ambiguity, i.e., there are more descriptions for each primitive
- however, these richer descriptions of primitives locally constrain each other
- analogy to a jigsaw puzzle -- the richer the description of each primitive, the better
42 Complex descriptions of primitives
- Making the descriptions of primitives more complex
- allows statistics to be computed over these complex descriptions
- these statistics are more meaningful
- local statistical computations over these complex descriptions lead to robust and efficient processing
43 Flexible Composition: Adjoining as Wrapping
[Figure: a tree a is split at a node labeled X into a1, the supertree of a at X, and a2, the subtree of a at X]
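The split can be sketched with the same illustrative tree encoding used earlier: cut a tree at an interior node labeled X and return the supertree (with the cut point marked X*) and the subtree rooted at X. An exposition aid, not part of the talk.

# Minimal sketch of the split: (supertree, subtree) at the leftmost interior
# node labeled `label`, using the (label, children) encoding.

def split_at(tree, label):
    lab, children = tree
    if lab == label and children:
        return (label + "*", []), (lab, children)
    new_children, subtree = [], None
    for child in children:
        if subtree is None:
            new_child, subtree = split_at(child, label)
            new_children.append(new_child)
        else:
            new_children.append(child)
    return (lab, new_children), subtree

# Example: split the likes tree at its VP node.
likes = ("S", [("NP", []), ("VP", [("V", [("likes", [])]), ("NP", [])])])
supertree, subtree = split_at(likes, "VP")
print(supertree)   # ('S', [('NP', []), ('VP*', [])])
print(subtree)     # ('VP', [('V', [('likes', [])]), ('NP', [])])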
44 Flexible Composition: Adjoining as Wrapping
[Figure: the two components a1 (supertree of a at X) and a2 (subtree of a at X) are attached to the root and foot of an auxiliary tree b, yielding g]
a is wrapped around b, i.e., the two components a1 and a2 are wrapped around b.
45 Flexible Composition: Wrapping as substitutions and adjoinings
[Figure: the object-extraction tree a for likes (with an NP(wh) node and an empty object e) composed with the auxiliary tree b for think by a substitution and an adjoining]
- We can also view this composition as a wrapped around b
- Non-directional composition
46 Adjoining as Wrapping: Wrapping as substitutions and adjoinings
[Figure: a is split into its components a1 and a2, which attach to the auxiliary tree b for think]
a1 and a2 are the two components of a: a1 is attached (adjoined) to the root node S of b, and a2 is attached (substituted) at the foot node S of b.
47 Multi-component LTAG (MC-LTAG)
- The components of a set are used together in one composition step, with the individual components being composed by either substitution or adjoining
- The representation can be used for both predicate-argument relationships and scope information (a sketch of such a set follows below)
- The two pieces of information are together before the single composition step
- However, after the composition there may be intervening material between the components
48 Tree-Local Multi-component LTAG (MC-LTAG)
- How can the components of an MC-LTAG compose while preserving the locality of LTAG?
- Tree-Local MC-LTAG: the components of a set compose only with an elementary tree or an elementary component -- flexible composition
- Tree-Local MC-LTAGs are weakly equivalent to LTAGs
- However, Tree-Local MC-LTAGs provide structural descriptions not obtainable by LTAGs -- increased strong generative power
49 Scope ambiguities: Example
(every student hates some course)
[Figure: the elementary trees -- the multi-component sets a1 = {a11, a12} for every and a2 = {a21, a22} for some (a11 and a21 are S scope components; a12 and a22 are the NP -> DET N trees anchored on every and some); a3, the hates tree with subject and object NP slots; and a4 (student), a5 (course)]
50 Derivation with scope information: Example
(every student hates some course)
[Figure: the same elementary trees as on the previous slide, with the attachments of the quantifier components to the hates tree indicated]
51 Derivation tree with scope information: Example
(every student hates some course)
[Figure: the derivation tree -- a3 (hates) at the root; the scope components a11 and a21 adjoin at its root (address 0); a12 (every) and a22 (some) substitute at the NP slots (addresses 1 and 2.2); a4 (student) and a5 (course) substitute into a12 and a22]
- a11 and a21 are both adjoined at the root of a3 (hates)
- They can be adjoined in either order
- a11 (the scope component of every) will outscope a21 (the scope component of some) if a11 is adjoined before a21
- Scope information is thus represented in the LTAG system itself
52 Competence/Performance Distinction: A New Twist
- For a property P of language, how does one decide whether P is a competence property or a performance property?
- The answer is not given a priori
- It depends on the formal devices (grammars and corresponding machines) available for describing language
53 Competence/Performance Distinction: A New Twist
- With MC-TAG and flexible composition, all word order patterns up to two levels of embedding can be described with the correct structural descriptions assigned, i.e., with the correct semantics
- Examples: center embedding of complement clauses, clitic movement, scope ambiguities, etc.
- Beyond two levels of embedding, although all word order patterns can be described, there is no guarantee that the correct semantics can be assigned to all strings
- No corresponding result is known so far for center embedding of relative clauses, as in English
54 Summary
- Complex primitive structures (building blocks)
- CLSG: Complicate Locally, Simplify Globally
- CLSG makes non-local dependencies become local, i.e., they are encapsulated in the primitive building blocks
- New insights into
- Syntactic description
- Semantic composition
- Statistical processing
- Psycholinguistic properties
- Applications to other domains