Title: Chomsky Hierarchy
1 Chomsky Hierarchy
We don't need them all for programming languages (PL):
Lexical structure (Scanner): Regular expressions (type-3)
Syntactic structure (Parser): Context-free languages (type-2)
Context-sensitive languages (type-1)
Computable (formal) languages (type-0)
Type-3 ⊂ Type-2 ⊂ Type-1 ⊂ Type-0
The inclusions are all proper.
2 Syntax
The structure of a program is specified by a grammar.
- Scanner: analyzes lexical structure; output: token string
- Parser: analyzes phrase structure; output: parse tree (intermediate data structure)
3 Separate Grammars
- Lexical Structure
- Phrase Structure
<program> ::= <end-of-file> | <statement-block> <program>
<statement-block> ::= <declarator-list> <statement-list>
<declarator-list> ::= <declarator> | <declarator> , <declarator-list>
<declarator> ::= <id> | <id> = <exp>
...
<id> ::= <alphabet> | <alphabet><id-tail>
<id-tail> ::= <delimiter> | <alphabet> <id-tail> | <digit> <id-tail>
<delimiter> ::= <space> | <tab> | <end-of-line>
<alphabet> ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
<digit> ::= 0|1|2|3|4|5|6|7|8|9
...
4 Scanner simplifies Parser's Job
<E> ::= <E> + <T> | <T>
<T> ::= <T> * <F> | <F>
<F> ::= ( <E> ) | <id> | <v>
...
Lexical Structure
<id> ::= <alphabet> | <alphabet><id-tail>
<id-tail> ::= <delimiter> | <alphabet> <id-tail> | <digit> <id-tail>
<delimiter> ::= <space> | <tab> | <end-of-line>
<alphabet> ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
<digit> ::= 0|1|2|3|4|5|6|7|8|9
<v> ::= <digit> | <digit><v>
...
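Once the scanner has reduced identifiers and values to single tokens, the parser only has to deal with the three expression rules for E, T and F. The following is a minimal sketch of a recursive-descent recognizer for those rules, assuming the operators are + and * and that tokens arrive as the strings "id", "v", "(", ")", "+", "*". The function name parse_expression and the token spelling are illustrative choices, and the left-recursive rules are rewritten as loops, which is a common substitution rather than something stated on the slide.

```python
def parse_expression(tokens):
    """Return True if the token list is a valid <E>, False otherwise."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind!r} at token {pos}, found {peek()!r}")
        pos += 1

    def parse_e():
        # <E> ::= <E> + <T> | <T>   handled as   <T> { + <T> }
        parse_t()
        while peek() == "+":
            expect("+")
            parse_t()

    def parse_t():
        # <T> ::= <T> * <F> | <F>   handled as   <F> { * <F> }
        parse_f()
        while peek() == "*":
            expect("*")
            parse_f()

    def parse_f():
        # <F> ::= ( <E> ) | <id> | <v>
        if peek() == "(":
            expect("(")
            parse_e()
            expect(")")
        elif peek() in ("id", "v"):
            expect(peek())
        else:
            raise SyntaxError(f"unexpected token {peek()!r} at token {pos}")

    try:
        parse_e()
    except SyntaxError:
        return False
    # Every token must have been consumed.
    return pos == len(tokens)
```

For example, parse_expression(["id", "+", "v", "*", "(", "id", "+", "v", ")"]) accepts, while parse_expression(["id", "+"]) rejects. The parser never looks at individual characters, which is the simplification the scanner buys.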
5 To simplify Parser's Job
Program source: total = total + price * number * (1 - discount) * (1 + saletax)
Scanner
Token string: Id1 = Id1 + Id2 * Id3 * ( v1 - Id4 ) * ( v1 + Id5 )
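The scanner step on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: identifiers are lowercase letters followed by letters or digits, values are digit strings, the operators are = + - * ( ), and identifiers and values are numbered in order of first appearance. The names scan and TOKEN_PATTERN are hypothetical, and the delimiter handling of the slide's grammar is simplified to skipping whitespace.

```python
import re

# One token per match: an identifier, a numeric value, or a single operator.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<id>[a-z][a-z0-9]*)|(?P<v>[0-9]+)|(?P<op>[-+*=()]))"
)

def scan(source):
    """Turn program text into a token string plus symbol tables."""
    ids = {}      # identifier name -> index (assumed numbering scheme)
    values = {}   # literal text -> index
    tokens = []
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = TOKEN_PATTERN.match(source, pos)
        if match is None:
            raise ValueError(f"cannot scan input at position {pos}")
        pos = match.end()
        if match.lastgroup == "id":
            index = ids.setdefault(match.group("id"), len(ids) + 1)
            tokens.append(f"Id{index}")
        elif match.lastgroup == "v":
            index = values.setdefault(match.group("v"), len(values) + 1)
            tokens.append(f"v{index}")
        else:
            tokens.append(match.group("op"))
    return tokens, ids, values
```

Calling scan("total = total + price * number * (1 - discount) * (1 + saletax)") yields tokens in which both occurrences of total map to Id1 and both occurrences of 1 map to v1, so the parser sees token classes rather than spellings.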
6 Regular Expressions, Regular Grammars (BNF) (rules to specify)
Lexical Analysis
Scanners, Finite Automata, Finite State Machines (formalism to recognize)
7 Context-Free Grammars (CFG) (to specify)
Syntactic Analysis
Push-Down Automata (PDA) (to recognize)
Theoretically possible, but may not be practically doable for compilers
Unambiguous CFG (to specify PL)
Deterministic PDA (DPDA) (to parse programs)
8 Σ: a finite alphabet (set of symbols)
- ε, the empty string, is a regular expression
- S is a regular expression if S ∈ Σ
- If S is a regular expression, so is S^i for i ∈ N
- If S is a regular expression, so are S* and S+
- If R and S are two regular expressions, so is RS
- If R and S are two regular expressions, so is R|S
9 - Languages Specified by Regular Expressions
- Σ: a finite alphabet (set of symbols)
- ε: the empty string
- L(S): the set of sentences represented by regular expression S.
Let S and R be two regular expressions.
- L(∅) = ∅
- L(ε) = {ε}
- If S ∈ Σ, then L(S) = {S}
- L(S)L(R) = { xy | x ∈ L(S) and y ∈ L(R) }
- L(SR) = L(S)L(R)
- L(S^0) = {ε}
- For n ≥ 1, L(S^n) = L(S S^(n-1))
- L(S+) = L(S) ∪ L(S^2) ∪ L(S^3) ∪ ....
- L(S*) = L(S^0) ∪ L(S) ∪ L(S^2) ∪ L(S^3) ∪ ....
- L(S|R) = L(S) ∪ L(R)
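These definitions can be tried out directly on small finite languages represented as Python sets of strings. The sketch below is illustrative only: the function names are hypothetical, the empty string "" stands for ε, and because the star and plus languages are infinite in general, the unions are cut off at a caller-chosen bound max_n.

```python
def concat(lang_s, lang_r):
    """L(S)L(R): every x from L(S) followed by every y from L(R)."""
    return {x + y for x in lang_s for y in lang_r}

def power(lang_s, n):
    """L(S^n), with L(S^0) = {""} and L(S^n) = L(S)L(S^(n-1))."""
    result = {""}
    for _ in range(n):
        result = concat(lang_s, result)
    return result

def star_up_to(lang_s, max_n):
    """Finite part of L(S*): union of L(S^n) for n = 0 .. max_n."""
    result = set()
    for n in range(max_n + 1):
        result |= power(lang_s, n)
    return result

def plus_up_to(lang_s, max_n):
    """Finite part of L(S+): union of L(S^n) for n = 1 .. max_n."""
    result = set()
    for n in range(1, max_n + 1):
        result |= power(lang_s, n)
    return result

def alternative(lang_s, lang_r):
    """L(S|R): plain set union."""
    return lang_s | lang_r
```

For instance, concat({"a", "ab"}, {"b"}) gives {"ab", "abb"}, and star_up_to({"a"}, 3) gives {"", "a", "aa", "aaa"}. The only difference between the star and plus versions is whether the n = 0 term, which contributes the empty string, is included.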
10 Here we use regular expressions on the right-hand side of BNF
id ::= A(A|D)*
A ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
D ::= 0|1|2|3|4|5|6|7|8|9
L(id) = { a, ab, a234b, xyz, x5z9, li2, .... }
ab(a|ab)*(a|aa) is a regular expression
L = { abaa, abaaaabaaba, abaaaaa, abababaa, .... }
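The identifier definition, a letter followed by any number of letters or digits, translates almost symbol for symbol into a pattern for Python's re module. This is a small sketch rather than part of the slides: the character classes [a-z] and [0-9] stand in for A and D, and the names ID_PATTERN and is_id are made up for the example.

```python
import re

# id ::= A(A|D)*  with A = lowercase letter, D = decimal digit
ID_PATTERN = re.compile(r"[a-z][a-z0-9]*")

def is_id(text):
    """True if the whole string is a sentence of L(id)."""
    return ID_PATTERN.fullmatch(text) is not None

# The sample sentences listed for L(id) on the slide.
samples = ["a", "ab", "a234b", "xyz", "x5z9", "li2"]
all_samples_accepted = all(is_id(s) for s in samples)
```

Using fullmatch rather than match matters here: match would accept "a-b" because a prefix of it is an identifier, whereas membership in L(id) requires the whole string to fit the expression.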
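11 id ::= A(A|D)*
A ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
D ::= 0|1|2|3|4|5|6|7|8|9
[Figure: finite automaton for id with states q0, q'0, q1, q'1; transitions labeled A (a, b, ..., z) and A|D]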
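12 id ::= A(A|D)*
A ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
D ::= 0|1|2|3|4|5|6|7|8|9
[Figure: finite automaton for id with states q0 and q1; transitions labeled A (a, b, ..., z) and A|D]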
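A two-state automaton for identifiers can be written out as a short loop. The sketch below assumes the usual construction, which may or may not match the diagram exactly: q0 is the start state, a letter moves q0 to q1, letters and digits keep the machine in q1, and q1 is the only accepting state. The function name accepts_id is illustrative.

```python
import string

LETTERS = set(string.ascii_lowercase)   # the class A
DIGITS = set(string.digits)             # the class D

def accepts_id(text):
    """Run the assumed two-state DFA for id ::= A(A|D)* over text."""
    state = "q0"
    for ch in text:
        if state == "q0" and ch in LETTERS:
            state = "q1"            # first symbol must be a letter
        elif state == "q1" and (ch in LETTERS or ch in DIGITS):
            state = "q1"            # loop on letters and digits
        else:
            return False            # no transition defined: reject
    return state == "q1"
```

The machine needs no memory beyond the current state, which is why a scanner built this way runs in time linear in the input length.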
13 - DFA (Deterministic Finite Automata)
ab(a|ab)*(a|aa)
14 - DFA (Deterministic Finite Automata)
{ ω | (n_a(ω) - n_b(ω)) mod 3 = 1 }
n_a(ω): the number of a's in ω
n_b(ω): the number of b's in ω
aaabaa (5 - 1 = 4), bbabbab (2 - 5 = -3), bbabbaba (3 - 5 = -2)
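A DFA for this language only needs to remember the difference between the number of a's and b's modulo 3, so three states suffice. The following sketch assumes states named 0, 1, 2 for that remainder, with state 0 as the start and state 1 as the accepting state. The table DELTA and the function accepts are names chosen for the example.

```python
# State = (number of a's - number of b's) mod 3.
DELTA = {
    (0, "a"): 1, (1, "a"): 2, (2, "a"): 0,   # an 'a' adds one
    (0, "b"): 2, (1, "b"): 0, (2, "b"): 1,   # a 'b' subtracts one
}

def accepts(word):
    """True if (n_a - n_b) mod 3 == 1 for the given word over {a, b}."""
    state = 0
    for ch in word:
        if (state, ch) not in DELTA:
            raise ValueError(f"symbol {ch!r} is not in the alphabet")
        state = DELTA[(state, ch)]
    return state == 1
```

On the three sample strings, aaabaa ends in state 1 (difference 4), bbabbab ends in state 0 (difference -3), and bbabbaba ends in state 1 (difference -2), so the first and third are accepted and the second is rejected.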
15 - NFA (Nondeterministic Finite Automata)
aa* | aba*b*
Examples: aaaaa, ababb
16 aa* | aba*b*
Examples: aaaaa, ababb
17 aa* | aba*b*
S → aA
S → aX
A → ε
A → aA
X → bY
Y → aY
Y → B
B → ε
B → bB
A Regular Grammar
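One way to check that this regular grammar and the regular expression aa* | aba*b* describe the same language is to enumerate both up to a length bound and compare. The sketch below does that by brute force. It is an illustration, not a proof: the rule table RULES transcribes the nine productions with "" standing for ε, and the helper names generate and regex_language are hypothetical.

```python
import re
from itertools import product

# Right-linear rules; uppercase letters are nonterminals, "" is epsilon.
RULES = {
    "S": ["aA", "aX"],
    "A": ["", "aA"],
    "X": ["bY"],
    "Y": ["aY", "B"],
    "B": ["", "bB"],
}

def generate(max_len):
    """All terminal strings of length <= max_len derivable from S."""
    results = set()
    pending = ["S"]
    seen = {"S"}
    while pending:
        form = pending.pop()
        # In a right-linear grammar a nonterminal can only be the last symbol.
        if not form or form[-1] not in RULES:
            results.add(form)
            continue
        prefix, nonterminal = form[:-1], form[-1]
        for rhs in RULES[nonterminal]:
            new_form = prefix + rhs
            terminal_count = sum(1 for c in new_form if c not in RULES)
            if terminal_count <= max_len and new_form not in seen:
                seen.add(new_form)
                pending.append(new_form)
    return results

def regex_language(max_len):
    """All strings over {a, b} of length <= max_len matching aa*|aba*b*."""
    pattern = re.compile(r"aa*|aba*b*")
    words = ("".join(p) for n in range(max_len + 1)
             for p in product("ab", repeat=n))
    return {w for w in words if pattern.fullmatch(w)}
```

With a bound of 6, generate(6) == regex_language(6) holds, and both sets contain the sample strings aaaaa and ababb. Agreement up to a bound is only evidence, but it catches transcription mistakes in either description quickly.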
18 - DFA have no memory to count
{ a^n b^n | n ≥ 0 } is not regular
aaaaaaaaaaaaaaaabbbbbbbbbbbbbb
S → ε
S → aSb
Context Free Grammar
This is not a right-linear grammar. Is there one? Or is there a left-linear grammar for this language? No.
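The context-free grammar for this language, with the empty rule and the rule that wraps S in an a on the left and a b on the right, maps directly onto a recursive recognizer. The sketch below mirrors the two rules one to one. The function name derives is illustrative, and recursion is used for clarity rather than efficiency.

```python
def derives(word):
    """True if word can be derived from S with S -> epsilon | a S b."""
    if word == "":
        return True                      # S -> epsilon
    if len(word) >= 2 and word[0] == "a" and word[-1] == "b":
        return derives(word[1:-1])       # S -> a S b: peel one a and one b
    return False
```

Each recursive call matches one a against one b, which is exactly the unbounded counting that a finite automaton cannot do with a fixed number of states. For example, derives("aaabbb") is true while derives("aab") and derives("abab") are false.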