Title: Intensional Context-Free Grammar
1Intensional Context-Free Grammar
2Introduction
- Natural language processing (NLP) is the use of computers to understand human languages
- Generative grammar is the use of rules to construct sentences
- Intensional context-free grammar (ICFG) is a generative grammar born out of the intensional programming paradigm
- based on intensional logic
3Outline
- Natural Language Processing
- Linguistics
- Generative Grammars
- Transformational grammar
- Lexical-functional grammar
- Intensional Logic
- Intensional Programming
- Intensional Context-Free Grammar
- Future Work
5Natural Language Processing
- Subfield of Artificial Intelligence (AI)
- Typically seen as one part of knowledge representation
- Can also be seen as a combination of computer science and linguistics
- Computational Linguistics
- Applications
- English as a command language
- Database queries
- Translation systems
- Speech recognition systems
6Linguistics
- Human language divides into five levels
- Phonology
- How sounds are used in language
- Morphology
- Word formation
- Syntax
- Sentence formation
- Semantics
- Sentence meaning
- Pragmatics
- Use of language in context
7Syntax
- Generative grammars are mostly concerned with syntax
- Morphology and semantics to some extent as well
- Words can be grouped together into syntactic categories
- Classified by type of meaning, how they can be altered and where they can occur
- They fall into two types
- Lexical and non-lexical
8Lexical Categories
- Noun (N) typically names entities
- Can be inflected for number and possession
- Harry, boy, wheat, painting
- Verb (V) designates actions, sensations and states
- Can be inflected for tense
- Arrive, discuss, melt, hear
- Adjective (A) designates a property of a noun
- Good, tall, old
- Preposition (P) links nouns to other words
- To, in, on, through, at, by
- Adverb (Adv) names properties of verbs
- Silently, slowly, now
9Non-lexical Categories
- Determiner (D) specifies a N
- The, a, these, every, my
- Auxiliary (Aux) specifies a V
- Will, can, may, could
- Degree (Deg) specifies a P or an A
- Too, so, very, more, often
- Qualifier (Qual) modifies a N or V
- Always, perhaps, often, never
- Conjunction (Con) joins two or more categories of the same type
- And, but, or
11Generative Grammar
- The idea of grammar has been around for centuries in one form or another.
- It wasn't until the middle of the last century that the idea of formal grammar took hold.
- At that time Noam Chomsky and others had the revolutionary idea that some part of language learning in humans is innate.
- Languages are infinite
- Language acquisition is relatively simple
- Evidence that the brain is structured for language
- Sentences are constructed using rules
- They are not stored as a list
12Phrases and Sentences
- Sentences have a hierarchical design
- Words are grouped together into structural units
- Phrase structures or constituent structures
- Phrase structures are based on lexical categories
- Typically have a head, a specifier and a complement
- XP -> (Spec) X (Comp)
13Phrase Structure
- The head is the word around which the phrase is built (lexical)
- NP, VP, PP, AP, and AdvP
- Specifiers are words that specify the head
- D, Aux, Deg, Qual
- NP -> a dog
- VP -> will run
- AP -> quite certain
- PP -> almost in
- Complements are phrases themselves that provide information about the head
- NP -> the books [about the war]PP
- VP -> never eat [a hamburger]NP
14Example Grammar
- S -> NP VP
- NP -> D N
- VP -> V NP
- D -> the | a
- N -> dog | cat
- V -> chased | saw
[Figure: parse tree for "the dog saw the cat", built with S -> NP VP, NP -> D N, VP -> V NP, NP -> D N]
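The example grammar is small and non-recursive, so every sentence it generates can be enumerated. The following is a minimal Python sketch of that idea; the dictionary encoding and the names `GRAMMAR` and `expand` are illustrative choices, not part of the original slides.

```python
import itertools

# Example grammar from the slide: each nonterminal maps to its alternative
# right-hand sides. Any symbol that is not a key is treated as a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["D", "N"]],
    "VP": [["V", "NP"]],
    "D": [["the"], ["a"]],
    "N": [["dog"], ["cat"]],
    "V": [["chased"], ["saw"]],
}

def expand(symbol):
    """Yield every word sequence derivable from `symbol`."""
    if symbol not in GRAMMAR:  # terminal word
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        # Combine the expansions of each right-hand-side symbol, left to right.
        for parts in itertools.product(*(list(expand(s)) for s in rhs)):
            yield [word for part in parts for word in part]

sentences = [" ".join(words) for words in expand("S")]
print(len(sentences))
print("the dog saw the cat" in sentences)
```

Exhaustive enumeration only terminates because this grammar has no recursive rules; a grammar with recursion would need a depth bound.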
15Classification of Generative Grammars
- In his pursuit of a generative grammar for natural languages Chomsky defined different grammar types
- This created a hierarchy of computability based on complexity of computation
- Chomsky's Hierarchy
- Type 0: recursively enumerable (r.e.)
- Type 1: context-sensitive (CSG)
- Type 2: context-free (CFG)
- Type 3: regular (RG)
16Where do Natural Languages Fall?
- Regular grammars were quickly ruled out
- Chomsky contended that natural language could not be generated with a context-free grammar
- some context-sensitivity was necessary.
- Some mathematical proofs to this effect were given
- The first attempt at a grammar of natural language was the Transformational Grammar proposed by Chomsky
17Transformational Grammar (TG)
- There exist two types of structure, deep structure (d-structure) and surface structure (s-structure)
- The d-structure is generated by context-free rules and contains the bulk of the meaning of the sentence
- The surface structure is reached by a series of transformations which account for the morphology and word order of the sentence.
- The transformations alter the d-structures in ways that cannot be accounted for by CFG alone
18Example
- Consider the sentence (passive)
- (1) A book was given to the professor.
- In TG, (1) is the s-structure of the given sentence and is actually of the form
- (2) A book was given _ to the professor
- This is the result of the application of an NP-movement on the d-structure
- (3) _ give a book to the professor.
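The step from (3) to (2) can be sketched in a few lines of Python. This is only a toy under stated assumptions: the d-structure is flattened to a four-slot list (subject, verb, object, PP) rather than a tree, and the `PASSIVE_PAST` table standing in for the morphology is hypothetical.

```python
# Hypothetical morphology table: verb stem -> past passive form.
PASSIVE_PAST = {"give": "was given"}

def np_movement(d_structure):
    """Move the object NP into the empty subject slot, leaving a trace '_'."""
    subject, verb, obj, pp = d_structure
    if subject != "_":
        raise ValueError("subject position is already filled")
    return [obj, verb, "_", pp]

def spell_out(structure):
    """Apply the (toy) morphology and join the slots into a surface string."""
    subject, verb, trace, pp = structure
    return " ".join([subject, PASSIVE_PAST[verb], trace, pp])

d_structure = ["_", "give", "a book", "to the professor"]  # sentence (3)
s_structure = spell_out(np_movement(d_structure))          # sentence (2)
print(s_structure)
```

A real transformational grammar would state the movement over phrase-structure trees; the flat list is used here only to keep the transformation visible.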
19Example TG
- The production rules necessary to generate this d-structure are
- S → NP I VP
- NP → D N | e
- VP → V NP PP
- PP → P NP
- I → past
- D → a | the
- N → book | professor
- V → give
- P → to
20D-structure
21S-Structure
22Problems with TG
- Empty categories, non-word lexical entries, and d-structures aren't sentences
- TGs are weakly equivalent to Type 0 grammars: they generate the r.e. languages.
- anything that is computable can be modeled by a TG
- TG is too complex for use as a natural language generator.
- Transformational grammars vs. non-transformational grammars
- restrict the transformational grammar in some way
- look back at the context-free grammars
23Restricting TG
- Wasow suggests constraining the language of transformational grammar in such a way as to make it a context-sensitive grammar
- Savitch shows that, for each recursively enumerable language, a context-sensitive grammar can generate one just as complex.
- Thus, CSG is also a poor choice for a formal model of language syntax
- This led to a string of more restrictive TGs
- The Standard Theory, the Revised Extended Standard Theory, Realistic Transformational Grammar, Government and Binding Theory, and the Minimalist Program
24Back to CFGs
- Harman, Pullum and Gazdar suggest that previous arguments against the use of CFGs have been misleading and mathematically unsound
- examples are given of CFGs that generate subsets of English
- A number of papers then appeared with examples in different languages refuting the use of pure CFGs.
- Bresnan et al. show that certain languages, while possibly weakly equivalent to CFLs, are not strongly equivalent
- Others (Shieber, Higginbotham and Culy) give examples of languages that are neither weakly nor strongly equivalent to CFLs.
25Non-Transformational Grammars
- The large majority of the constructs of languages can be generated through the use of CFGs.
- The non-transformational grammars
- Lexicalization
- Subcategorization
- Examples
- Head-driven Phrase Structure Grammar
- Tree Adjoining Grammar
- Indexed Grammar
- Lexical-Functional Grammar
26Lexical-Functional Grammar (LFG)
- Lexical: LFG has a stronger lexicalization than TG.
- a lot of the work of the grammar is done at the word level
- Functional: the role of grammatical functions is prominent
- proposes a separate functional structure (f-structure)
- Grammar: a model of generative grammar that is an extension of CFG
27LFG
- There are three important structures in LFG
- c-structure (constituent structure, CFG)
- f-structure (functional structure)
- θ-structure (thematic roles)
- agent, theme, location
- Furthermore, there exist two types of mappings
- the f-description, which allows for the mapping between c-structure and f-structure
- the a-structure, which is a map between the f-structure and θ-structure.
28Example LFG
- Consider sentence (1) again
- A book was given to the professor
- Here are the rules, f-descriptions and lexical
entries
was        I   (↑TENSE) = PAST
a          D   (↑DEF) = -, (↑NUM) = SG
the        D   (↑DEF) = +
book       N   (↑PRED) = 'book', (↑NUM) = SG
professor  N   (↑PRED) = 'professor', (↑NUM) = SG
given      V   (↑PRED) = 'give<(↑SUBJ) (↑OBLgoal)>'
to         P   (↑PCASE) = OBLgoal
29c-Structure
30f-Structure
- Also called the attribute-value matrix (AVM)
- A nested matrix of grammatical functions (attributes) and their values
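An AVM maps naturally onto a nested dictionary, and building an f-structure amounts to merging the attribute-value pairs contributed by each word. The Python sketch below illustrates this for sentence (1). The `unify` helper is a simplified stand-in for LFG's constraint solving, and the attribute values are my reading of the lexical entries on the example slide, so both should be treated as assumptions.

```python
def unify(a, b):
    """Merge two AVMs; raise ValueError if an attribute has clashing values."""
    result = dict(a)
    for attr, value in b.items():
        if attr not in result:
            result[attr] = value
        elif isinstance(result[attr], dict) and isinstance(value, dict):
            result[attr] = unify(result[attr], value)  # merge nested AVMs
        elif result[attr] != value:
            raise ValueError(f"clash on {attr}: {result[attr]!r} vs {value!r}")
    return result

# Lexical contributions (assumed reading of the example entries).
A = {"DEF": "-", "NUM": "SG"}
BOOK = {"PRED": "book", "NUM": "SG"}
THE = {"DEF": "+"}
PROFESSOR = {"PRED": "professor", "NUM": "SG"}
WAS = {"TENSE": "PAST"}
GIVEN = {"PRED": "give<SUBJ, OBLgoal>"}
TO = {"PCASE": "OBLgoal"}

subj = unify(A, BOOK)                       # "a book"
obl = unify(TO, unify(THE, PROFESSOR))      # "to the professor"
f_structure = unify(unify(WAS, GIVEN), {"SUBJ": subj, "OBLgoal": obl})
print(f_structure)
```

The clash check is what makes the structure a constraint system rather than a plain record: combining a singular determiner with a plural noun, for instance, would raise an error instead of silently overwriting NUM.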
32Intensional Logic
- Intensional logic was motivated initially as a formal description of natural language meaning.
- Developed by Scott and Montague, who looked to formalize the semantics of language.
- Carnap applied it to assign meaning to sentences based on implicit contextual information
- The truth assignment of a sentence is dependent on the context or possible world
- Typically, this context is not stated explicitly but is implied by the world in which the statement is uttered
33Intension and Extension
- The statement itself is defined as the intension
- The interpretation of that statement in the given context is defined as the extension of the statement.
- The extensions of some intensional statement can depend on any number of contexts
- time, space, culture, audience, etc.
- As an example consider the expression
- (4) It is 12 degrees Celsius.
- The truth of this statement depends on at least two parameters
- the time and place in which it was uttered
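One way to make the distinction concrete is to treat an intension as a function from contexts to values, and an extension as the result of applying it to one context. The Python sketch below does this for statement (4); the `Context` type and the temperature readings are invented purely for illustration.

```python
from typing import NamedTuple

class Context(NamedTuple):
    """A possible world, reduced here to two dimensions."""
    time: str
    place: str

# Hypothetical temperature readings in degrees Celsius (made-up data).
READINGS = {
    Context("noon", "Victoria"): 12,
    Context("noon", "Montreal"): -5,
    Context("midnight", "Victoria"): 6,
}

def it_is_12_degrees(ctx: Context) -> bool:
    """Intension of statement (4): a function from contexts to truth values."""
    return READINGS[ctx] == 12

def extension(intension, ctx: Context):
    """The extension is the intension evaluated in one particular context."""
    return intension(ctx)

print(extension(it_is_12_degrees, Context("noon", "Victoria")))
print(extension(it_is_12_degrees, Context("noon", "Montreal")))
```

The same statement is true in one context and false in another, which is exactly why its truth value cannot be assigned without the implicit time and place.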
34More Intensional Logic
- Different branches of intensional logic
- modal logic, epistemic logic (knowledge and belief), deontic logic (obligation and permission), tense logic and conditional logic.
- It seems a natural fit that languages use intensional logic
- most contextual information goes unstated in our everyday life
- yet the meaning of statements is usually perfectly clear.
- Most languages even have context switching operators without actually calling them such
- yesterday and today for temporal switching
- there and here for spatial switching
- Since its inception though, intensional logic has been used more for other purposes
36Intensional Programming
- Intensional programming is a programming paradigm based on intensional logic
- The idea behind intensional programming is the use of context in any and all aspects of the language.
- Many examples of its use
- functional programming
- software version control
- web authoring
- scripting
37Functional Programming
- The first programming language developed based on the principles of intensional logic is called Lucid.
- The creator of this language, William Wadge, wanted a programming language that avoided knowledge of the internal make-up of the system on which it would run.
- It was seen that hiding such information from the programmer is similar to the hidden details about context that are inherent to natural language use
38Lucid
- Lucid is an example of a functional-intensional language
- consists of functions which operate on streams of data, which are the intensions.
- A Lucid intension, x, is a value which varies over time
- the one dimension (context) of the Lucid environment
- So, the extensions of x are the infinitely many values that x takes on, one at each time point t.
- The time points in Lucid are implied
- but there are three context-switching operators available to the programmer,
- first, next and fby.
- Lucid stores previously calculated results (extensions) in a warehouse, a technique known as warehousing.
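The three operators can be imitated in Python by modelling a stream as a function from a time point to a value. This is a loose sketch, not Lucid itself: the `Stream` class, the name `next_` (chosen to avoid shadowing Python's built-in `next`) and the use of `functools.lru_cache` as a stand-in for the warehouse are all assumptions made for illustration.

```python
from functools import lru_cache

class Stream:
    """An intension: a value that varies over the time dimension."""
    def __init__(self, at):
        # The cache plays the role of the warehouse: each extension
        # (the value at time t) is computed once and then reused.
        self._at = lru_cache(maxsize=None)(at)

    def __call__(self, t):
        return self._at(t)

def const(v):
    return Stream(lambda t: v)

def first(x):
    """first x: at every time point, the value of x at time 0."""
    return Stream(lambda t: x(0))

def next_(x):
    """next x: at time t, the value of x at time t + 1."""
    return Stream(lambda t: x(t + 1))

def fby(x, y):
    """x fby y: the first value of x, followed by the values of y."""
    return Stream(lambda t: x(0) if t == 0 else y(t - 1))

# The Lucid-style definition  n = 0 fby (n + 1)  yields the natural numbers.
n = fby(const(0), Stream(lambda t: n(t) + 1))

print([n(t) for t in range(5)])
print(next_(n)(0), first(n)(7))
```

Note that the program never mentions a time index in the definition of `n`; time only appears when an extension is demanded, which mirrors the implied time points described above.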
39Extensions of Lucid
- Intensional programming has since been applied to other programming paradigms, and Lucid has been extended in several ways.
- spreadsheet programming, databases, real-time systems, logic programming (Chronolog).
- Other extensions are discussed in more detail in this section
- Lucid is also extended to contain both a declarative and an imperative part; this extension is called Granular Lucid (GLU).
- Two important ideas that come out of the work on GLU
- multidimensional versioning
- versions as values
40Software Versioning
- One particularly interesting use of intensional programming
- version control tools in software development.
- In 1993, Plaice and Wadge applied intensional contexts to versioning
- different possible versions of software components as possible worlds.
- A complete system is formed by taking the most relevant version of each component.
41Version Algebra
- To do this they developed a version algebra,
- partially ordered by an operation called refinement, denoted by ⊑.
- V ⊑ W is read as W refines V
- means that version W of a particular component is an extension of version V
- The simplest version of any component is called the vanilla version and is denoted by ε
- Allows for joins (+) of versions and subversions (%)
42Best-fit Algorithm
- The complete version V of a system is found by selecting the appropriate component versions.
- best-fit algorithm: the most relevant version of each component is selected
- Thus, suppose we are looking for the Keirapplefast version of a component
- exists in the versions Keir, Keirapple, Keirfast, applefast, Mariaapple, fast, and the vanilla version.
- In this case the most relevant version would be the Keirapple version.
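The selection step can be sketched in Python under a simplifying assumption: a version is modelled as a set of labels, the vanilla version is the empty set, and W refines V when V's labels are a subset of W's. This covers joined versions only and ignores subversions, and the component labels below are hypothetical rather than taken from the slides.

```python
VANILLA = frozenset()

def version(*labels):
    """Build a version from its labels (assumed encoding: a set of labels)."""
    return frozenset(labels)

def refines(w, v):
    """True if version w refines version v (v's labels are all in w)."""
    return v <= w

def best_fit(requested, available):
    """Return the most relevant available versions for `requested`.

    A candidate is any available version that the requested version refines.
    The result keeps only the maximal candidates: those not strictly
    refined by another candidate.
    """
    candidates = [v for v in available if refines(requested, v)]
    return [v for v in candidates
            if not any(v < other for other in candidates)]

available = [
    VANILLA,
    version("graphics"),
    version("graphics", "mouse"),
    version("audio", "mouse"),
]
wanted = version("graphics", "mouse", "fast")
print(best_fit(wanted, available))
```

The function returns a list because a partial order can leave several incomparable candidates; a unique best fit exists exactly when the list has one element.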
43Web Authoring
- The version control system becomes important in the intensional programming world.
- One of the first uses of the version space phenomenon comes in the form of web authoring languages
- Successive web authoring programs
- Intensional Hypertext Markup Language (IHTML), IHTML 2 and IHTML 3, which all extend HTML with intensions.
- This is done by revising the version space system and best-fit algorithms to account for dimension (context) labels and the use of dimensions as values themselves.
44IHTML
- IHTML allows authors to define a whole indexed family of HTML variants using a single source file.
- The intension is the family of HTML pages while each individual page serves as an extension
- So, authors can provide multiple sources for the same page where each source is labeled with a different version.
- The version dimensions can be attributed to any of the markup elements of traditional HTML.
45Versioning in IHTML
- Explicit dimension identifiers are separated from the value of that dimension with the notation (:).
- So, for example we can have the version
- platformMacK68langFrenchcuisinechinese
- Another added element of IHTML is the use of transversion links
- links that are used to change the context of the current version using vmod
- Example, suppose your current context is language:english+background:blue
- <a href="page1" vmod="language:french">
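The effect of a transversion link can be sketched as an update of the current context by the dimensions named in vmod. The Python below is an illustration only: it assumes version expressions are written as `dimension:value` pairs joined by `+`, and the helper names are hypothetical, not part of IHTML.

```python
def parse_version(text):
    """Parse 'dim:value+dim:value' into a dict (notation is an assumption)."""
    if not text:
        return {}
    return dict(pair.split(":", 1) for pair in text.split("+"))

def format_version(context):
    """Inverse of parse_version, keeping the dimensions in their stored order."""
    return "+".join(f"{dim}:{value}" for dim, value in context.items())

def apply_vmod(current, vmod):
    """Return the context reached by following a transversion link.

    Dimensions named in vmod are overridden; all others are kept.
    """
    updated = parse_version(current)
    updated.update(parse_version(vmod))
    return format_version(updated)

print(apply_vmod("language:english+background:blue", "language:french"))
```

Following the link changes only the language dimension, so the reader stays in the same background context while the page switches to its French variant.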
46IHTML2
- Improves upon IHTML in a number of ways
- implementation on the server-side and overall efficiency of the system
- addition of dimensions as values
- change in the best-fit algorithm
- IHTML 2 drops the notion of a subversion
47IHTML3
- Swoboda extends IHTML 1 and 2 by letting dimension identifiers be nested to an arbitrary depth and by adding imperative structures
- ISE (an imperative scripting language)
- in the spirit of Perl but uses versioning for all of its identifiers, including files, variables, and functions
- allows the values of dimensions to be version expressions themselves
48Intensional Markup Language (IML)
- Wadge and schraefel added IML to ISE as a front end
- to get back to the simplicity of markup while still maintaining the power of ISE.
- IML uses Groff macros to extend HTML
- IML is translated into ISE which in turn is translated into HTML readable by your browser.
50Intensional Generative Grammar
- The one area that has seen little use of intensional programming is generative grammar.
- Considering the motivation for intensional logic in the first place, this seems remiss.
- But there has been some work in this vein
51Sentence Generators
- Two examples of sentence generation using the intensional paradigm
- In both cases, the generator is built on ISE and thus is web specific.
- Both use a small grammar and lexicon and construct French sentences by an informal method
- sentence construction here is not via generative grammar methods, that is, there are no CFG-like rules.
52Intensional Context-Free Grammar
- Although the intensional logic paradigm grew out of research in natural language semantics, not much work has been done in NLP with intensional programming.
- Intensional context-free grammar (ICFG) consists of two structures
- a constituent structure (c-structure), composed of a CFG with two types of production rules,
- tagged rules and context switching rules
- a warehouse structure (w-structure) which stores the possible world view as a version space.
53Future Work
- My intention is to formally define the grammar underlying the intensional programming paradigm
- denotational and operational semantics
- I also plan on developing a practical grammar for use in NLP,
- English sentence generator
- Might use AVMs from LFG as version space
- I may also develop an intensional Prolog interpreter to be used for this grammar
54THE END