Title: Introduction to Cognitive Science
1 Introduction to Cognitive Science
- Topic: Formal Grammars
- Generating and Parsing
- Lecturer: Dr Bodomo
2 Introduction
- In my previous lectures, we discussed how tacit
linguistic knowledge can be represented at the
various levels of phonology, morphology, syntax,
semantics, and pragmatics, and at their interfaces,
including morphophonology, morphosyntax, and the
syntax-semantics interrelationships.
- In this lecture, we shall look closely at how
these linguistic knowledge representations can be
formalised into an algorithm, that is, a computational
procedure for processing this linguistic
knowledge.
3 Keywords
- constituent structure rules
- initial symbol
- terminal symbol
- non-terminal symbol
- generative grammar
- formal grammar
4 Formal devices and notation
- The symbol → indicates that a node "is
rewritten as", "consists of", or "has the
constituents".
- This is used in rewrite rules of the type S → NP
VP
- a sentence, S, has the constituents noun phrase
(NP) and verb phrase (VP)
- Optionality in the grammar is expressed as {X,
Y}. This means apply either X or Y but not both.
5 Formal devices and notation (cont.)
- The symbol # is used to indicate a constituent
boundary
- e.g. #_ is word initial while _# is word final
- The notation X (Y) implies that X is obligatory
and may be followed by Y
- Initial symbol: the symbol from which a rewrite
rule begins (e.g. S)
- Terminal symbol: the end symbols from which no
constituent structure can be further developed
(N, V, Art). All others are non-terminal symbols
(e.g. NP, VP).
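The distinction between terminal and non-terminal symbols can be read off a set of rewrite rules mechanically. The following is a minimal sketch, assuming the three rules used later in this lecture (S → NP VP, VP → V NP, NP → Art N); the names `RULES`, `INITIAL` and `classify` are illustrative only.

```python
# Each rewrite rule "X → Y Z" is stored as (mother, [ordered daughters]).
# The rules are taken from the lecture's example grammar; the names are assumptions.
RULES = [
    ("S", ["NP", "VP"]),
    ("VP", ["V", "NP"]),
    ("NP", ["Art", "N"]),
]
INITIAL = "S"  # the initial symbol, from which rewriting begins


def classify(rules):
    """Split the symbols of a grammar into non-terminals and terminals."""
    # A non-terminal appears on the left-hand side of some rule.
    non_terminals = {mother for mother, _ in rules}
    # A terminal appears only on a right-hand side and is never rewritten.
    right_hand = {sym for _, daughters in rules for sym in daughters}
    return non_terminals, right_hand - non_terminals


non_terminals, terminals = classify(RULES)
print(sorted(non_terminals))  # ['NP', 'S', 'VP']
print(sorted(terminals))      # ['Art', 'N', 'V']
```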
6 Two main aspects of grammatical information
processing: generating and parsing sentences
- Before we begin, let us illustrate with a simple
grammar and lexicon, using the following
sentence:
- The students greeted the teacher.
7 The students greeted the teacher.
- Lexicon 1
- greeted: V, _NP
- students: N
- the: Art
- teacher: N
- Grammar
- S → NP VP
- VP → V NP
- NP → Art N
- This grammar can also generate (i.e. produce)
the following sentences:
- The teacher greeted the students
- The teacher scared the students
- The child ate an apple
- But you have to augment (i.e. enlarge) the lexicon
as follows:
- Lexicon 2
- an: Art
- greeted, scared, ate: V, _NP
- apple: N
- students: N
- child: N
- teacher: N
- the: Art
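To see that the three sentences really do fall out of the grammar once Lexicon 2 is in place, one can enumerate every word string the grammar produces. This is a rough sketch only: the dictionary layout and the function name `expand` are assumptions, and the subcategorization frame is not checked because every verb in Lexicon 2 has the same frame, _NP.

```python
from itertools import product

# Lexicon 2, grouped by word class (layout is an assumption for illustration).
LEXICON_2 = {
    "Art": ["the", "an"],
    "N": ["students", "teacher", "child", "apple"],
    "V": ["greeted", "scared", "ate"],
}
# The three phrase structure rules: mother -> ordered daughters.
RULES = {"S": ["NP", "VP"], "VP": ["V", "NP"], "NP": ["Art", "N"]}


def expand(symbol):
    """Return every word sequence that `symbol` can be rewritten as."""
    if symbol in LEXICON_2:
        # Terminal symbol: each word of that class is one possible expansion.
        return [[word] for word in LEXICON_2[symbol]]
    # Non-terminal: expand each daughter, then combine one choice per daughter.
    daughters = [expand(d) for d in RULES[symbol]]
    return [sum(combo, []) for combo in product(*daughters)]


sentences = {" ".join(words) for words in expand("S")}
for target in (
    "the teacher greeted the students",
    "the teacher scared the students",
    "the child ate an apple",
):
    print(target, "->", target in sentences)
```

All three checks print `True`. The same enumeration also contains strings such as "an students greeted the apple", since nothing in this small grammar rules them out.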
8 Sentence generation: the algorithm
- To produce a sentence we need three things:
- A set of phrase structure rules (as illustrated
above)
- A lexicon (as illustrated above), and
- A lexical insertion rule (as explained below)
- A lexical insertion rule is an instruction to
select the right word from a lexicon
- The following is an example of a lexical insertion rule
9 Lexical insertion rule
- For each terminal symbol of a phrase structure
rule, select a word from the lexicon that
satisfies the following conditions:
- It is a member of the class of the terminal symbol
(e.g. N, V)
- Its subcategorization frame matches that of the
terminal symbol (e.g. V, _NP).
- Attach this word as the daughter of this terminal symbol.
- The set of rules above constitutes what is known
as a sentence generator.
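The two conditions of the lexical insertion rule can be sketched as a simple filter over the lexicon. This is only an illustration: the representation of entries as (class, frame) pairs and the function names are assumptions, and the intransitive verb "slept" is a hypothetical entry added solely to show the frame check rejecting a word of the right class.

```python
# word -> (word class, subcategorization frame); None means no frame is listed.
LEXICON = {
    "greeted": ("V", "_NP"),
    "students": ("N", None),
    "the": ("Art", None),
    "teacher": ("N", None),
    "slept": ("V", "_"),  # hypothetical entry, not in the lecture's lexicon
}


def lexical_insertion(terminal, frame, lexicon=LEXICON):
    """Return the words whose class AND subcategorization frame both match."""
    return [
        word
        for word, (word_class, word_frame) in lexicon.items()
        if word_class == terminal and word_frame == frame
    ]


def attach(terminal, word):
    """Attach the word as the daughter of the terminal symbol."""
    return (terminal, word)


print(lexical_insertion("V", "_NP"))  # ['greeted']
print(lexical_insertion("N", None))   # ['students', 'teacher']
print(attach("V", lexical_insertion("V", "_NP")[0]))  # ('V', 'greeted')
```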
10
- The whole procedure of beginning with an initial
symbol, working through phrase structure
rules, and then adding the lexical items via lexical
insertion rules is driven by an algorithm, or a
set of instructions.
- Let us set out an algorithm for the generation
(production) of the sentence "The students
greeted the teacher", given a grammar and a lexicon as
follows:
11 The students greeted the teacher
- Lexicon 1
- greeted: V, _NP
- students: N
- the: Art
- teacher: N
- Grammar
- PS Rule 1: S → NP VP
- PS Rule 2: VP → V NP
- PS Rule 3: NP → Art N
- i. Start with the initial symbol, S.
- ii. For every non-terminal symbol, X, find a
phrase structure rule with X as the left-hand symbol
and others as the right-hand symbol(s), and
develop a partial tree with X as the mother and
the right-hand symbols as ordered daughters.
- iii. Apply rule ii until all branches end in
terminal symbols.
- iv. Apply the lexical insertion rule iteratively until every
terminal symbol is replaced by a lexical item.
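Steps i to iv can be sketched as a short recursive procedure. This is a minimal sketch under stated assumptions, not a definitive implementation: the intended words are supplied left to right so that the target sentence is produced, rules ii and iii are interleaved with rule iv rather than run in two separate passes, and the frame check is left out because the only verb frame in Lexicon 1 (_NP) matches the only VP rule. The names `generate` and `brackets` are illustrative.

```python
RULES = {"S": ["NP", "VP"], "VP": ["V", "NP"], "NP": ["Art", "N"]}
LEXICON_1 = {"greeted": "V", "students": "N", "the": "Art", "teacher": "N"}


def generate(symbol, words):
    """Build a tree top-down from `symbol`, consuming `words` left to right."""
    if symbol in RULES:
        # Rules ii and iii: rewrite the non-terminal as its ordered daughters.
        return (symbol, [generate(d, words) for d in RULES[symbol]])
    # Rule iv: lexical insertion for a terminal symbol.
    word = next(words)
    if LEXICON_1.get(word) != symbol:
        raise ValueError(f"{word!r} is not a member of class {symbol}")
    return (symbol, word)


def brackets(tree):
    """Render a tree as a labelled bracketing."""
    label, rest = tree
    if isinstance(rest, str):
        return f"[{label} {rest}]"
    return f"[{label} " + " ".join(brackets(t) for t in rest) + "]"


# Rule i: start with the initial symbol, S.
tree = generate("S", iter("the students greeted the teacher".split()))
print(brackets(tree))
# [S [NP [Art the] [N students]] [VP [V greeted] [NP [Art the] [N teacher]]]]
```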
12 Illustrating the algorithm
- Applying rules ii and iii, we get the structure
[S [NP Art N] [VP V [NP Art N]]].
13
- From the above we can see that we have started
from an initial symbol and have ended with
terminal symbols with lexical items as their
daughters. A sentence has thus been generated
(produced), telling us how this sentence is built
up.
- Now, let us see how we can begin with an existing
sentence and then break it down into its
component parts by applying rules.
14 Sentence parsing: the algorithm
- To parse a sentence means to analyse it into its
constituent parts by the systematic application
of lexical insertion rules and some phrase
structure rules. It is like the reverse process
of generation.
15 Some sentence parsing rules which constitute a
PARSER
- For a sentence, S
- i. Determine from the lexicon the word class of
every item and develop a partial tree for each
word where the word class label dominates the
word.
- ii. Find a PS rule of the type X → Y Z
where the right-hand symbols match some sequence
of categories in the structure so far, and
develop a partial tree with X as the mother and
the right-hand symbols as ordered daughters.
- iii. Continue rule ii until the root, S, is
reached and there are no unattached strings.
16 The man drank the tea.
- Lexicon 1
- drank: V, _NP
- man: N
- the: Art
- tea: N
- Grammar
- PS Rule 1: S → NP VP
- PS Rule 2: VP → V NP
- PS Rule 3: NP → Art N
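Parsing rules i to iii, applied to this lexicon and grammar, can be sketched as a small bottom-up procedure. This is a sketch under stated assumptions: it reduces greedily and never backtracks, which is enough for this three-rule grammar but would not be for grammars with competing rules, and the names `parse` and `brackets` are illustrative.

```python
LEXICON = {"drank": "V", "man": "N", "the": "Art", "tea": "N"}
RULES = [("S", ["NP", "VP"]), ("VP", ["V", "NP"]), ("NP", ["Art", "N"])]


def parse(sentence):
    """Return the tree for `sentence`, or None if no parse rooted in S is found."""
    # Rule i: one partial tree per word, with the word class dominating the word.
    forest = [(LEXICON[w], w) for w in sentence.lower().rstrip(".").split()]
    changed = True
    while changed:  # Rule iii: keep applying rule ii while anything can be reduced.
        changed = False
        for mother, daughters in RULES:
            n = len(daughters)
            for i in range(len(forest) - n + 1):
                # Rule ii: the right-hand symbols match a sequence of categories.
                if [tree[0] for tree in forest[i:i + n]] == daughters:
                    forest[i:i + n] = [(mother, forest[i:i + n])]
                    changed = True
                    break
            if changed:
                break
    # Success only if the root S is reached and nothing is left unattached.
    if len(forest) == 1 and forest[0][0] == "S":
        return forest[0]
    return None


def brackets(tree):
    """Render a tree as a labelled bracketing."""
    label, rest = tree
    if isinstance(rest, str):
        return f"[{label} {rest}]"
    return f"[{label} " + " ".join(brackets(t) for t in rest) + "]"


print(brackets(parse("The man drank the tea.")))
# [S [NP [Art the] [N man]] [VP [V drank] [NP [Art the] [N tea]]]]
```

The reductions happen in the order NP (the man), NP (the tea), VP (drank the tea), S, which mirrors the generation procedure run in reverse.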
18 Conclusion
- Parsing and generation of natural language data
is a very important area of linguistics,
especially in computer applications of natural
language, which have become an important aspect of
the computer or information processing industry.
- In the next lecture, we shall be looking at the
last topic of the linguistics segment, i.e. how
linguistic knowledge is acquired/learnt by
speakers of a language, from the point of view of
spoken language and from the point of view of literacy
(reading and writing).