CSCI 435 Compiler Design

Provided by: OwenAst9

1
CSCI 435 Compiler Design
  • Week 1 Class 3
  • Ray Schneider

2
Today's Drill
  • Grammars
  • Closures
  • Outline Code, phew!
  • Conway paper from 1963 on J
  • On to Chapter 2

3
Grammars
  • context-free grammars (abbreviated CF)
  • the basic kind of grammar used in defining
    programming languages. "Context free" means that
    one can substitute for a non-terminal symbol
    without reference to the context.
  • also regular grammars, which describe the same
    languages as regular expressions, and
  • attribute grammars, which are context-free grammars
    extended with parameters and code

4
So what is a grammar?
  • a procedure for generating strings of symbols
    with defined properties
  • symbols are also called TOKENS of the language
  • strings of symbols are Program Texts
  • and the set of strings of symbols is the
    Programming Language
  • ex. BEGIN print ( "Hi!" ) END
  • a string with six Tokens
  • The strings are constructed in a structured
    fashion, and semantics can be attached to this
    structure
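To make the token count concrete, here is a minimal sketch that splits the example program text into its six tokens. The regular expression and the name `tokenize` are illustrative assumptions, not the lexer used in the course text.

```python
import re

# Hypothetical token pattern (an assumption for this sketch): a quoted
# string, an identifier or keyword, or a single punctuation character.
TOKEN = re.compile(r'"[^"]*"|[A-Za-z_]\w*|[^\s\w]')

def tokenize(text):
    """Split a program text into its tokens (the terminal symbols)."""
    return TOKEN.findall(text)

tokens = tokenize('BEGIN print ( "Hi!" ) END')
print(tokens)       # ['BEGIN', 'print', '(', '"Hi!"', ')', 'END']
print(len(tokens))  # 6
```

White space only separates tokens here; it is not a token itself, which is why the count is six.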

5
The form of a grammar
  • grammar: production rule(s) plus S, the Start
    Symbol
  • a production rule has 2 parts
  • left-hand side: name of the syntactic construct
  • right-hand side: possible forms
  • separated by an arrow (→)
  • ex. expression → '(' expression operator expression
    ')'
  • The left hand side is a NON TERMINAL symbol
  • the right hand side consists of combinations of
    NON TERMINALS and TERMINALS
  • together they make up the set of Grammar Symbols
    collectively called the MEMBERS of the Grammar, G

6
Some Conventions
  • Want to be able to infer the class of a symbol
    from its typographical form, so
  • NON TERMINALS are denoted by capital letters, ex.
    A, B, C and N
  • Terminals are denoted by lower-case letters near
    the end of the alphabet, ex. x, y, and z
  • Sequences of grammar symbols are denoted by Greek
    letters near the beginning of the alphabet, ex. α
    (alpha), β (beta), γ (gamma)
  • Lower-case letters near the beginning of the
    alphabet (a, b, c, etc.) stand for themselves as
    terminals
  • the empty sequence is denoted by ε (epsilon)

7
The Production Process
  • sentential form: the central data structure in
    the production process
  • syntactic structure: added to the sentential form
    as a tree with leaves of grammar symbols
  • production tree: combination of sentential form
    and syntactic structure
  • a string of terminals is produced by a grammar by
    applying production steps to a sentential form

8
derivation
Each production step finds a non-terminal N in
the leaves of the sentential form, finds a
production rule N → α with N as its left-hand side,
and replaces N in the sentential form with a tree
that has N as the root and the right-hand side of
the production rule, α, as the leaves.
ex. given N → α, βNγ can be replaced with βαγ
1. expression → '(' expression operator expression ')'
2. expression → '1'
3. operator → '+'
4. operator → '*'
Notation: R@P means production rule R applied at
position P
[Figure: derivation of the string (1*(1+1)) from the
start symbol, specifically a leftmost derivation]
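The production steps can be replayed in a few lines of Python. This is only a sketch: it assumes that rules 3 and 4 produce '+' and '*', it represents a sentential form as a flat list of symbols (the tree structure is left out), and the name `leftmost_step` is made up for the example.

```python
# Rules numbered 1 to 4; the operators '+' and '*' are an assumption.
RULES = {
    1: ("expression", ["(", "expression", "operator", "expression", ")"]),
    2: ("expression", ["1"]),
    3: ("operator", ["+"]),
    4: ("operator", ["*"]),
}
NON_TERMINALS = {"expression", "operator"}

def leftmost_step(form, rule_number):
    """Apply one production step at the leftmost non-terminal of `form`."""
    lhs, rhs = RULES[rule_number]
    position = next(i for i, sym in enumerate(form) if sym in NON_TERMINALS)
    if form[position] != lhs:
        raise ValueError(f"rule {rule_number} does not apply at position {position}")
    return form[:position] + rhs + form[position + 1:]

form = ["expression"]  # the start symbol
for rule_number in [1, 2, 4, 1, 2, 3, 2]:
    form = leftmost_step(form, rule_number)
    print(rule_number, " ".join(form))

print("".join(form))  # (1*(1+1))
```

Each printed line is one sentential form; the last one contains terminals only, so the derivation is complete.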
9
Parse Tree of our derivation
  • Recursion is necessary to the production process
  • We need to maintain the production TREE to find
    out the semantics of the program which is the
    task of the PARSER

10
Extended forms of grammars
  • non-terminal → zero or more grammar symbols
  • basic single grammar rule format
  • normally a richer notation is used

N → α
N → β
N → γ
simple notation: each alternative separately

can be combined as
N → α | β | γ
where these are the alternatives of N

Thus far the format is BNF, which is good for
expressing nesting and recursion but is not as
effective at expressing repetition and optionality.
11
Additional Notation: postfix operators
  • Extended BNF adds new forms
  • R+ is one or more R's, expresses repetition
  • R? is an occurrence of zero or one R, optionality
  • R* is an occurrence of zero or more R's, optional
    repetition
  • parentheses may be needed to group grammar
    symbols so that the operators can operate on more
    than one

12
Properties of Grammars
  • a non-terminal N is left-recursive if, starting
    with a sentential form N, we can produce another
    sentential form starting with N. ex.
  • expression → expression '+' factor | factor
  • a non-terminal N is nullable if, starting with N,
    we can produce an empty sentential form
  • a non-terminal is useless if it can never produce a
    string of terminal symbols
  • a grammar is ambiguous if it can produce two
    different production trees with the same leaves
    in the same order. The same text can then have
    different semantics, since the semantics are
    derived from the production tree.
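Two of these properties are easy to check mechanically. The sketch below computes the nullable non-terminals by repeating a pass until nothing changes, and flags direct left recursion only (indirect left recursion through other non-terminals would need more work). The toy grammar, including `opt_sign` and `signs`, is hypothetical.

```python
def nullable_non_terminals(rules):
    """Return the non-terminals that can produce the empty sentential form."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            # An empty right-hand side, or one made up only of nullable
            # symbols, makes the left-hand side nullable.
            if lhs not in nullable and all(sym in nullable for sym in rhs):
                nullable.add(lhs)
                changed = True
    return nullable

def directly_left_recursive(rules):
    """Return non-terminals with a rule whose right-hand side starts with them."""
    return {lhs for lhs, rhs in rules if rhs and rhs[0] == lhs}

# Hypothetical toy grammar as (left-hand side, right-hand side) pairs.
RULES = [
    ("expression", ["expression", "+", "factor"]),
    ("expression", ["factor"]),
    ("factor", ["1"]),
    ("opt_sign", []),
    ("opt_sign", ["-"]),
    ("signs", ["opt_sign", "opt_sign"]),
]
print(sorted(nullable_non_terminals(RULES)))   # ['opt_sign', 'signs']
print(sorted(directly_left_recursive(RULES)))  # ['expression']
```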

13
Formalism
  • 1. the basic unit of a grammar is the symbol
  • - symbols must be distinct, i.e. distinguishable
  • examples: N, x, procedure_body, tk
  • 2. the next element is the production rule
  • given 2 sets of symbols V1 and V2 (each a
    vocabulary), a production rule is a 2-tuple (a
    pair)
  • (N, α) such that N ∈ V1, α ∈ V2*
  • where X* means a sequence of 0 or more
    elements drawn from set X
  • the production rule is usually written
    N → α

14
Formalism 2
  • Now we can define a context-free grammar G as a
    4-tuple

G = (VN, VT, S, P) where
VN is the set of non-terminal symbols
VT is the set of terminal symbols
S is the start symbol
P is the set of production rules
The above is the context-free portion of the grammar.
Real acceptable grammars have to satisfy three
context conditions.
15
Formalism 3
  • Three Context Conditions

1. VN ∩ VT = ∅, i.e. the terminal symbol set and the
non-terminal symbol set have no symbols in common
2. S ∈ VN, i.e. the start symbol is a member
of the non-terminal symbol set
3. P ⊆ {(N, α) | N ∈ VN, α ∈ (VN ∪ VT)*}, i.e. a
production rule must have a left side drawn from the
non-terminal symbol set and a right side drawn
from the combination of the non-terminal and
terminal symbol sets. No other symbols allowed.
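The 4-tuple and its three context conditions translate directly into a small data structure plus a checker. This is a sketch under assumptions: the `Grammar` type, its field names and `context_condition_violations` are invented for the example, and the sample grammar assumes the operators '+' and '*'.

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    non_terminals: frozenset  # VN
    terminals: frozenset      # VT
    start: str                # S
    rules: tuple              # P, as (N, alpha) pairs; alpha is a tuple of symbols

def context_condition_violations(g):
    """Return a list describing which of the three context conditions fail."""
    problems = []
    if g.non_terminals & g.terminals:
        problems.append("1: VN and VT share symbols")
    if g.start not in g.non_terminals:
        problems.append("2: S is not in VN")
    symbols = g.non_terminals | g.terminals
    for lhs, rhs in g.rules:
        if lhs not in g.non_terminals or any(s not in symbols for s in rhs):
            problems.append(f"3: bad rule {lhs} -> {' '.join(rhs)}")
    return problems

good = Grammar(
    frozenset({"expression", "operator"}),
    frozenset({"(", ")", "1", "+", "*"}),
    "expression",
    (
        ("expression", ("(", "expression", "operator", "expression", ")")),
        ("expression", ("1",)),
        ("operator", ("+",)),
        ("operator", ("*",)),
    ),
)
bad = good._replace(start="1")  # a terminal cannot be the start symbol
print(context_condition_violations(good))  # []
print(context_condition_violations(bad))   # ['2: S is not in VN']
```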
16
The Language Generated by a Grammar
  • Sequences of symbols are called strings
  • A string in the grammar may be directly derived
    from another string, written α ⇒ β
  • α ⇒ β means β is directly derivable from α iff
    ∃ γ, δ1, δ2, N ∈ VN such that α = δ1Nδ2, β = δ1γδ2,
    (N, γ) ∈ P
  • In English that's something like: the string β can
    be produced from the string α iff α contains a
    non-terminal symbol and there is a production
    rule which allows β to be produced by
    substitution for the non-terminal.
  • this 'replacement' is called a production step
  • more generally α ⇒* β iff α = β or if β can be
    produced from α by a sequence of production steps

17
finally
  • a sentential form of a grammar G is defined as
    any α with S ⇒* α, i.e. all sentential forms must
    derive from the start symbol
  • a terminal production of a grammar G is defined
    as a sentential form composed only of terminal
    symbols: any α with S ⇒* α ∧ α ∈ VT*
  • The Language L(G) = {α | S ⇒* α ∧ α ∈ VT*}
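These definitions can be tried out by enumerating a finite slice of L(G). The sketch below assumes the four-rule expression grammar with operators '+' and '*'; `direct_derivations` implements one production step (α ⇒ β) and `terminal_productions` follows ⇒* breadth-first up to a length bound. Both names are made up for this example.

```python
from collections import deque

# Assumed grammar: rules as (N, gamma) pairs, gamma a tuple of symbols.
RULES = [
    ("expression", ("(", "expression", "operator", "expression", ")")),
    ("expression", ("1",)),
    ("operator", ("+",)),
    ("operator", ("*",)),
]
NON_TERMINALS = {"expression", "operator"}

def direct_derivations(alpha):
    """Yield every beta with alpha => beta (one production step anywhere)."""
    for i, symbol in enumerate(alpha):
        for lhs, gamma in RULES:
            if symbol == lhs:
                yield alpha[:i] + gamma + alpha[i + 1:]

def terminal_productions(start, max_len):
    """Collect the strings of L(G) with at most max_len symbols."""
    seen, queue, language = {(start,)}, deque([(start,)]), set()
    while queue:
        alpha = queue.popleft()
        if not any(s in NON_TERMINALS for s in alpha):
            language.add("".join(alpha))  # a terminal production
            continue
        for beta in direct_derivations(alpha):
            # No rule here shrinks a form, so over-long forms can be pruned.
            if len(beta) <= max_len and beta not in seen:
                seen.add(beta)
                queue.append(beta)
    return language

print(sorted(terminal_productions("expression", 5)))
# ['(1*1)', '(1+1)', '1']
```

The pruning step relies on every right-hand side having at least one symbol; a grammar with empty right-hand sides would need a different bound.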

18
Closure Algorithms
  • information improving algorithms often start by
    collecting information then apply rules to extend
    or draw conclusions from it.
  • It can be very misleading to consider algorithms
    by looking at their pieces in isolation
  • We will use the construction of the calling
    graph of a program as an example.

19
Calling Graph of a Program
  • Calling Graph is a directed graph (arrows) which
    has a node for each routine (procedure or
    function) in the program and an arrow from node A
    to node B denotes that A calls B either directly
    or indirectly.

void P() { Q(); S(); }
void Q() { R(); T(); }
void R() { P(); }
void S() { }
void T() { }

direct calling graph
20
Calling Chains change the picture
  • to complete the picture we apply the following
    rule
  • If there is an arrow from node A to node B and
    one from B to C, make sure there is an arrow from
    A to C. This transitivity axiom is written
  • A → B ∧ B → C ⇒ A → C
  • where → is read "calls directly or indirectly"
  • thus A → A indicates that "routine A is
    recursive"

21
General Form of a Closure Algorithm
  • Three Elements
  • Data Definitions: deriving from the nature of the
    problem
  • Initializations: one or more rules for
    initialization of the information from the
    specific problem to its representation
  • Inference Rules: one or more rules of the form "If
    I1, I2 then J" (i.e. if this information is
    present then I infer J)

22
Recursion Detection: A little more formally
  • Data definitions
  • Let G be a directed graph with one node for each
    routine. The information items are arrows in G.
  • An arrow from a node A to a node B means that
    routine A calls routine B directly or indirectly.
  • Initializations
  • If the body of a routine A contains a call
    to routine B, an arrow from A to B must be
    present.
  • Inference Rules
  • If there is an arrow from node A to node B
    and one from B to C, an arrow from A to C must be
    present.

Two things: 1) this doesn't specify what should
not be present (an additional rule is needed) and 2)
it doesn't guarantee that the algorithm will stop
23
Iteration to the rescue
  • Implement the closure algorithm through repeated
    bottom-up sweeps

SET the flag Something changed TO True;
WHILE Something changed:
    SET Something changed TO False;
    FOR EACH Node 1 IN Graph:
        FOR EACH Node 2 IN Descendants of Node 1:
            FOR EACH Node 3 IN Descendants of Node 2:
                IF there is no arrow from Node 1 to Node 3:
                    Add an arrow from Node 1 to Node 3;
                    SET Something changed TO True;

The algorithm is O(n³) per sweep and could be
as bad as O(n⁵) overall once the repetitions of the
WHILE loop are counted. The author's hand-waving
argument suggests that in practice it tends to run
in linear time.
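The sweep translates almost line for line into Python. This is a sketch, assuming the calling graph is stored as a dict that maps each routine to the set of routines it calls; `close_calling_graph` and `DIRECT` are names invented here, and the sample graph is the one for routines P, Q, R, S and T.

```python
def close_calling_graph(direct_calls):
    """Add arrows until every 'calls directly or indirectly' arrow is present."""
    arrows = {node: set(callees) for node, callees in direct_calls.items()}
    something_changed = True
    while something_changed:
        something_changed = False
        for node_1 in arrows:
            # Iterate over snapshots, since the sets grow during the sweep.
            for node_2 in list(arrows[node_1]):
                for node_3 in list(arrows[node_2]):
                    if node_3 not in arrows[node_1]:
                        arrows[node_1].add(node_3)
                        something_changed = True
    return arrows

# Direct calling graph: P calls Q and S, Q calls R and T, R calls P.
DIRECT = {"P": {"Q", "S"}, "Q": {"R", "T"}, "R": {"P"}, "S": set(), "T": set()}
closed = close_calling_graph(DIRECT)

# A routine is recursive when the closed graph has an arrow from it to itself.
recursive = sorted(node for node, callees in closed.items() if node in callees)
print(recursive)            # ['P', 'Q', 'R']
print(sorted(closed["P"]))  # ['P', 'Q', 'R', 'S', 'T']
```

The loop stops because each sweep either adds at least one arrow or leaves the flag False, and at most n² arrows can ever be added.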
24
The Outline Code (pseudocode)
  • Command lines end in ';'
  • Control lines end in ':'
  • Body of a control structure is indented, with its
    end obvious from the return to the former indentation
  • KEYWORDS all capitals
  • Identifiers generally start with a capital letter
  • Type Identifiers and field selectors start with
    lower case letters (denote classes)
  • Field selectors are marked by a 'dot',
    ex. Node .left; the selector is a postfix operator
  • // starts a comment line

25
Summary: End of Chapter 1
  • Compiler is a file conversion program of a very
    specialized nature
  • takes Source Language and produces Target
    Language and is written in Implementation
    Language
  • Usual form is a series of processes
  • lexical analysis, syntactic analysis,
    intermediate form, then code generation
  • Usual form of semantic representation is the AST
    with context and semantic annotations
  • Program Generators based on formalisms have
    allowed compiler generation to be increasingly
    automated.

26
Homework for Week 1
  • For the week do problems 1 through 21 at the end
    of the chapter (Chapter 1). Pace yourself and
    turn them in Monday.
  • Get the demo compiler running and run some
    examples starting with the example on page 18 and
    illustrated in figure 1.17
  • (2*((3*4)+9))

27
References
  • Text and figures: Modern Compiler Design