CSCI 435 Compiler Design

Provided by: OwenAst9
Transcript and Presenter's Notes

1
CSCI 435 Compiler Design
  • Week 6 Class 3
  • Section 4 to Section 4.1.2
  • (279-290)
  • Ray Schneider

2
Topics of the Day
  • Processing the intermediate code
  • Interpretation
  • Recursive Interpretation
  • Iterative Interpretation

3
Where we are ...
  • Now we have an annotated syntax tree, either
    actually in memory as a data structure (Broad
    Compiler) or implicitly available during parsing
    (Narrow Compiler).
  • The Annotated Syntax Tree bears traces of its
    origin (the language constructs and the like,
    represented by nodes and subtrees), despite the
    relative paradigm independence of the methods
    being used.
  • NOW THE NEXT STEP: Transforming the AST into
    Intermediate Code

4
Status of various modules in compiler construction
The AST is full of nodes reflecting the specific
semantic concepts of the source language.
Intermediate Code Generation reduces the set of
specific node types to a small set of general
concepts easily implemented on actual machines.

FIND and REWRITE: Intermediate Code Generation
finds the language-characteristic nodes and
subtrees in the AST and rewrites them into
subtrees that use only a small number of
features, each corresponding closely to a set of
machine instructions.

5
After FIND and REWRITE
  • The resulting tree should be called THE
    INTERMEDIATE CODE TREE but is usually still
    called the AST
  • Features of the Intermediate Code Tree are:
      • expressions, including assignments
      • routine calls, procedure headings, and return
        statements
      • conditional and unconditional jumps
  • IN ADDITION:
      • administrative features, e.g. memory allocation
        for global variables, activation record
        allocation, and module linkage information
  • The entire range of high-level concepts of the
    language is replaced by a few rather low-level
    concepts

6
Processing the Intermediate Code
  • Involves either ...
  • A Little Pre-processing followed by execution on
    an Interpreter, or
  • A lot of Pre-processing in the form of machine
    code generation followed by execution on hardware
  • Whatever the processing system ...
  • Writing the Run-Time and Library system is the
    majority of the work and is primarily just brute
    force coding.
  • We will begin by looking at Interpretation

7
Simplest way ...process AST using an ...
  • INTERPRETER
  • An Interpreter considers the nodes of the AST in
    the correct order and performs the prescribed
    actions required by the semantics of the language
  • NOTE: unlike compilation, interpretation
    requires the input data of the program
  • An Interpreter performs actions similar to those
    of a CPU, except that it works on AST nodes
    rather than Machine Instructions
  • A CPU, by contrast, takes Machine Instructions
    in the correct order and performs the actions
    required by the semantics of the machine; the
    semantics of the language have already been
    translated into those instructions
  • TWO KINDS OF INTERPRETERS:
      • RECURSIVE (works directly on the AST), and
      • ITERATIVE (works on a linearized version of
        the AST)

8
Simple Recursive Compiler from 1.2.8 fig 1.19
(21)
9
Recursive Interpretation
  • An interpreting routine is provided for each node
    type in the AST
  • Each such routine calls other similar routines
  • The meaning of a language construct is defined
    as a function of the meanings of its
    components
  • The Interpretation Starts by calling the
    interpretation routine for Program with the top
    node of the AST as a parameter
  • An important ingredient of a Recursive
    Interpreter is the UNIFORM SELF-IDENTIFYING DATA
    REPRESENTATION
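
The recursive scheme above can be sketched in C for a tiny expression language. This is only an illustrative sketch: the `Expr` layout (digit leaves 'D' and binary operations 'P' with '+' or '*') and all routine names are assumptions made for this example, not the textbook's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node layout: 'D' is a digit leaf, 'P' is a
   parenthesized binary operation with operator '+' or '*'. */
typedef struct Expr {
    char type;                       /* 'D' or 'P' */
    int value;                       /* used when type == 'D' */
    char oper;                       /* used when type == 'P' */
    const struct Expr *left, *right; /* used when type == 'P' */
} Expr;

/* One interpreting routine per node type; the routines call each other. */
static int interpret_expression(const Expr *e);

static int interpret_digit(const Expr *e) {
    return e->value;
}

static int interpret_binary(const Expr *e) {
    /* The meaning of the construct is a function of the
       meanings of its components. */
    int l = interpret_expression(e->left);
    int r = interpret_expression(e->right);
    return e->oper == '+' ? l + r : l * r;
}

static int interpret_expression(const Expr *e) {
    assert(e != NULL);
    return e->type == 'D' ? interpret_digit(e) : interpret_binary(e);
}

/* Interpretation starts here, with the top node of the AST. */
int interpret_program(const Expr *top) {
    return interpret_expression(top);
}
```

Calling `interpret_program` on the tree for (2 + (3 * 4)) walks the tree depth-first and yields 14. Status checking is left out here to keep the recursion visible.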

10
Uniform Self-Identifying Data Representation
  • The Interpreter has to manipulate data values
    whose types and sizes are not known when the
    Interpreter is written
  • Implementation requires a generic model:
    implementing values as variable-size records that
    specify
      • the type of the run-time value,
      • its size, and
      • the run-time value itself
  • A POINTER to such a record serves as the VALUE
    during Interpretation
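
A minimal C sketch of such a self-identifying record follows. The names `Value`, `TypeTag`, `new_value`, and `real_of` are hypothetical, and the set of type tags is an assumption for illustration; a real interpreter would point at a full type description rather than a small enum.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type tags; a real interpreter would more likely
   store a pointer to a type record. */
typedef enum { TYPE_INTEGER, TYPE_REAL, TYPE_BOOLEAN } TypeTag;

/* Variable-size record: type, size, then the value itself. */
typedef struct Value {
    TypeTag type;          /* type of the run-time value */
    size_t size;           /* size of the payload in bytes */
    unsigned char data[];  /* the run-time value (C99 flexible array) */
} Value;

/* Allocate a record and copy the payload into it.
   Returns NULL if allocation fails. */
Value *new_value(TypeTag type, const void *payload, size_t size) {
    Value *v = malloc(sizeof(Value) + size);
    if (v == NULL)
        return NULL;
    v->type = type;
    v->size = size;
    memcpy(v->data, payload, size);
    return v;
}

/* Extract a real number, checking the self-description first. */
double real_of(const Value *v) {
    double d;
    assert(v != NULL && v->type == TYPE_REAL && v->size == sizeof d);
    memcpy(&d, v->data, sizeof d);
    return d;
}
```

The interpreter passes `Value *` pointers around; every routine can inspect `type` and `size` before touching the payload, which is what makes run-time type checks possible.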

11
Example Complex Numbers
  • Two Parts of Data Representation
  • Actual Values, vary from entity to entity
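  • Type of Value, things in common

[Figure: a value of type complex_number, with the
labels "re", "im", and "real". One part of the
representation is specific to the given value of
type complex_number; the other part is common to
all values of type complex_number.]
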
12
Status Indicator another important feature
  • Used to direct the flow of control
  • Primary Component:
      • the Mode of Operation of the Interpreter, an
        enumeration value. Its usual value is
        something like "Normal Mode", indicating
        sequential flow of control; other values
        indicate Jumps, Exceptions, and Function
        Returns
  • Second Component:
      • a value in the wider sense, supplying more
        information about the Non-Sequential Flow of
        Control, e.g. the value for Return Mode, the
        Names and Values of Exceptions, the Label for
        Jump Mode, etc.
  • Status Indicator should contain file name and
    line number of text where status indicator was
    created and possibly other debugging information
  • Each interpreting routine checks the status
    indicator after each call to another routine to
    see how to carry on
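
The status indicator described above could be sketched in C as follows. All names (`Status`, `Mode`, `set_status`, the two statement kinds) are assumptions for this illustration; the point is only the pattern of checking the mode after every call to another interpreting routine.

```c
#include <assert.h>
#include <stddef.h>

/* Primary component: the mode of operation. */
typedef enum { NORMAL_MODE, JUMP_MODE, EXCEPTION_MODE, RETURN_MODE } Mode;

typedef struct {
    Mode mode;
    const char *name;  /* second component: label or exception name */
    int value;         /* second component: return or exception value */
    const char *file;  /* where the status was created (debugging) */
    int line;
} Status;

static Status status = { NORMAL_MODE, NULL, 0, __FILE__, 0 };

static void set_status(Mode mode, const char *name, int value,
                       const char *file, int line) {
    status.mode = mode;
    status.name = name;
    status.value = value;
    status.file = file;
    status.line = line;
}

/* Hypothetical statement kinds, just enough to show the control flow. */
typedef enum { STMT_NOP, STMT_RETURN } StmtKind;
typedef struct { StmtKind kind; int operand; } Stmt;

static int executed = 0;  /* counts elaborated statements */

static void elaborate_statement(const Stmt *s) {
    assert(s != NULL);
    executed++;
    if (s->kind == STMT_RETURN)
        set_status(RETURN_MODE, NULL, s->operand, __FILE__, __LINE__);
}

void elaborate_sequence(const Stmt *stmts, size_t count) {
    for (size_t i = 0; i < count; i++) {
        elaborate_statement(&stmts[i]);
        /* Check the status after each call to see how to carry on. */
        if (status.mode != NORMAL_MODE)
            return;
    }
}
```

In a sequence of a no-op, a return, and another no-op, the third statement is never elaborated: the return switches the mode, and the loop hands control back to its caller, which would make the same check in turn.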

13
Outline of a routine for recursive interpretation
of an if-statement
PROCEDURE Elaborate if statement (If node):
    SET Result TO Evaluate condition (If node .condition);
    IF Status .mode /= Normal mode: RETURN;
    IF Result .type /= Boolean:
        ERROR "Condition in if-statement is not of type Boolean";
        RETURN;
    IF Result .boolean .value = True:
        Elaborate statement (If node .then part);
    ELSE Result .boolean .value = False:
        // Check if there is an else-part at all:
        IF If node .else part /= No node:
            Elaborate statement (If node .else part);
        ELSE If node .else part = No node:
            SET Status .mode TO Normal mode;
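
The if-statement routine outlined above could look roughly like this in C. This is a sketch under stated assumptions: the `Node` and `Value` layouts, the constant-valued condition nodes, the counter-incrementing statement nodes, and the `ERROR_MODE` status value are all invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { NORMAL_MODE, ERROR_MODE } Mode;   /* ERROR_MODE is assumed */
typedef enum { TYPE_BOOLEAN, TYPE_INTEGER } Type;
typedef struct { Type type; bool boolean_value; int int_value; } Value;

typedef struct Node Node;
struct Node {
    Value constant;         /* condition nodes: a constant, for brevity */
    int *target;            /* statement nodes: a counter to increment */
    const Node *condition;  /* if nodes */
    const Node *then_part;  /* if nodes */
    const Node *else_part;  /* if nodes; NULL means "No node" */
};

static Mode status_mode = NORMAL_MODE;

/* Stand-ins for the real interpreting routines. */
static Value evaluate_condition(const Node *n) { return n->constant; }
static void elaborate_statement(const Node *n) { (*n->target)++; }

void elaborate_if_statement(const Node *if_node) {
    assert(if_node != NULL);
    Value result = evaluate_condition(if_node->condition);
    if (status_mode != NORMAL_MODE)
        return;
    if (result.type != TYPE_BOOLEAN) {
        fprintf(stderr,
                "Condition in if-statement is not of type Boolean\n");
        status_mode = ERROR_MODE;
        return;
    }
    if (result.boolean_value) {
        elaborate_statement(if_node->then_part);
    } else if (if_node->else_part != NULL) {
        elaborate_statement(if_node->else_part);
    }
    /* No else-part: nothing to elaborate, the mode stays normal. */
}
```

Note how the routine performs the type check at run time; doing such checks once, before interpretation starts, is one way to speed up a recursive interpreter.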
14
Typical Handling of the Symbol Table
  • Variables, named constants, and other named
    entities are handled by entering them in the
    Symbol Table. For example, the entry for a
    variable V of type T could be a record, say
    "Declarable", containing:
  • a pointer to the name V,
  • the file name and line number of its declaration
  • an indication of its kind (variable, constant,
    field selector, etc.)
  • a pointer to the type T
  • a pointer to newly allocated room for the value
    of V
  • a bit telling whether or not V has been
    initialized, if known
  • one or more scope- and stack- related pointers,
    depending on the language
  • other data as required (language dependent)
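
The fields listed above map naturally onto a C struct. The sketch below is an assumption-laden illustration: the field names, the `Kind` and `Type` definitions, the single `scope_next` link, and the `declare_variable` helper are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef enum { KIND_VARIABLE, KIND_CONSTANT, KIND_FIELD_SELECTOR } Kind;

/* Hypothetical, minimal type description. */
typedef struct Type { const char *name; size_t size; } Type;

typedef struct Declarable {
    const char *name;               /* pointer to the name V */
    const char *file;               /* file name of the declaration */
    int line;                       /* line number of the declaration */
    Kind kind;                      /* variable, constant, ... */
    const Type *type;               /* pointer to the type T */
    void *value;                    /* newly allocated room for the value */
    bool initialized;               /* has V been initialized? */
    struct Declarable *scope_next;  /* one scope-related pointer (assumed) */
} Declarable;

/* Create the symbol table entry for a variable.
   Returns NULL if allocation fails. */
Declarable *declare_variable(const char *name, const Type *type,
                             const char *file, int line,
                             Declarable *scope_next) {
    assert(name != NULL && type != NULL);
    Declarable *d = malloc(sizeof *d);
    if (d == NULL)
        return NULL;
    d->value = calloc(1, type->size);
    if (d->value == NULL) {
        free(d);
        return NULL;
    }
    d->name = name;
    d->file = file;
    d->line = line;
    d->kind = KIND_VARIABLE;
    d->type = type;
    d->initialized = false;
    d->scope_next = scope_next;
    return d;
}
```

The `initialized` bit lets the interpreter report a use of V before any assignment, together with the file name and line number of its declaration.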

15
Summary
  • Recursive Interpreter can generally be written
    quickly, so useful for rapid prototyping
  • Not the best architecture for heavy duty
    interpreting but good for debugging language
    concepts and features
  • Big Disadvantage: Very Slow, as much as 1000
    times slower than compiled code for the same
    language
  • This can be improved somewhat by doing as much
    static context checking as possible in the
    pre-interpretive phase (see Memoization pg.286)

16
Iterative Interpretation
  • Structure of an Iterative Interpreter is much
    closer to that of a CPU than a Recursive
    Interpreter is.
  • Consists of a flat loop over a case statement
    which contains a code segment for each node type
  • the code segment for a node type implements the
    semantics of that node type
  • It requires a fully annotated and threaded AST
    and maintains an ACTIVE NODE POINTER which points
    to the node being interpreted, i.e. the ACTIVE
    NODE
  • The interpreter runs the code for the Active Node
    which then points to another node, the successor
    node.

17
#include <stdio.h>      /* for printf() */
#include "parser.h"     /* for types AST_node and Expression */
#include "thread.h"     /* for Thread_AST() and Thread_start */
#include "stack.h"      /* for Push() and Pop() */
#include "backend.h"    /* for self check */

static AST_node *Active_node_pointer;

static void Interpret_iteratively(void) {
    while (Active_node_pointer != 0) {
        /* there is only one node type, Expression: */
        Expression *expr = Active_node_pointer;
        switch (expr->type) {
        case 'D':
            Push(expr->value);
            break;
        case 'P': {
            int e_left = Pop();
            int e_right = Pop();
            switch (expr->oper) {
            case '+': Push(e_left + e_right); break;
            case '*': Push(e_left * e_right); break;
            }
            break;
        }
        }
        Active_node_pointer = Active_node_pointer->successor;
    }
    printf("%d\n", Pop());    /* print the result */
}

void Process(AST_node *icode) {
    Thread_AST(icode);
    Active_node_pointer = Thread_start;
    Interpret_iteratively();
}

An iterative interpreter for the demo compiler of
Section 1.2: JUST A BIG SWITCH STATEMENT
18
the Iterative Interpreter 1
  • Data Structures resemble those inside a compiled
    program more than those in a Recursive
    Interpreter
  • e.g. an Array holding the global data; if the
    source language is stack oriented, the iterative
    interpreter maintains a stack.
  • Variables and Entities have an address which is
    generally an offset into a memory array
  • Symbol table is no longer relevant, but useful to
    generate better error messages

19
the Iterative Interpreter 2
  • An Iterative interpreter has more information
    about run-time events than a compiled program,
    but less than a recursive interpreter
  • One can make up for the lack of a symbol table in
    an iterative interpreter by using SHADOW MEMORY
    parallel to the memory arrays maintained by the
    interpreter. The Shadow Memory holds properties
    of the corresponding byte in memory, e.g. "is
    uninitialized", "is a non-first byte of a
    pointer", "belongs to a read-only array"; the
    different properties can be encoded as byte codes
  • Some Iterative Interpreters store the AST in a
    single array for several reasons
  • easier to write it to file
  • more compact representation
  • reusable without regenerating the AST

20
Three Forms of Storing an AST: a Graph
21
Storing an AST in an array or as
pseudo-instructions
[Figure: the same AST in two linear forms.
Array: condition, IF, statement 1, statement 2,
statement 3, statement 4.
Pseudo-instructions: condition, IF_FALSE,
statement 1, JUMP, statement 2, statement 3,
statement 4.]
22
AST Constructions and interpretation
  • Usually puts the successor of a node right after
    the node
  • may even omit the successor pointer altogether
    and just make it the default and only include
    pointers when the next node is NOT the successor
    node
  • Historically an Iterative Interpreter mimics a
    CPU working on a compiled program and the AST
    array mimics the compiled program
  • Iterative Interpreters are even easier to write
    than recursive interpreters, and much easier
    than compilers
  • The only serious deficiency is speed: even the
    best interpreter is typically 30 times slower
    than code from an optimizing compiler
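
The pseudo-instruction form with a default successor can be sketched as a small C loop over an array. The opcode set and the accumulator standing in for "statements" are assumptions made for this example; only `IF_FALSE` and `JUMP` carry an explicit successor, every other instruction falls through to the next array element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pseudo-instructions. */
typedef enum { OP_PUSH, OP_IF_FALSE, OP_JUMP, OP_ADD_TO_ACC, OP_HALT } Opcode;
typedef struct { Opcode op; int operand; } Instr;

#define STACK_SIZE 16

/* Run the instruction array and return the accumulator. */
int run(const Instr *code, size_t length) {
    int stack[STACK_SIZE];
    size_t sp = 0;
    int acc = 0;
    size_t pc = 0;  /* plays the role of the active node pointer */

    assert(code != NULL);
    while (pc < length) {
        const Instr *in = &code[pc];
        size_t next = pc + 1;  /* default successor: the next element */
        switch (in->op) {
        case OP_PUSH:
            assert(sp < STACK_SIZE);
            stack[sp++] = in->operand;
            break;
        case OP_IF_FALSE:      /* explicit successor if condition is false */
            assert(sp > 0);
            if (stack[--sp] == 0)
                next = (size_t)in->operand;
            break;
        case OP_JUMP:          /* explicit successor, always taken */
            next = (size_t)in->operand;
            break;
        case OP_ADD_TO_ACC:    /* stands in for an arbitrary statement */
            acc += in->operand;
            break;
        case OP_HALT:
            return acc;
        }
        pc = next;
    }
    return acc;
}
```

For example, the sequence PUSH cond, IF_FALSE 4, ADD 1, JUMP 5, ADD 10, ADD 100, HALT encodes an if-then-else followed by one more statement: with a true condition the accumulator ends at 101, with a false one at 110.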

23
Next time ...
  • Next time we'll start looking at Code Generation
  • We'll spend about two or three classes on it.

24
Homework for Week 8
  • Bison Familiarization
  • Read the entire 39 pages of "A Compact Guide To
    Lex and Yacc" // you can skim through it the
    first time
  • THEN concentrate first on getting the lex example
    on page 10 running
  • THEN after you have that running go on to
    Practice, Part 1 and strive to get the primitive
    calculator running (pages 14 through 17)
  • HINTS: the lex input on page 10 can be made to
    run by extending it with the line (cribbed from
    our text)
  • int yywrap(void) { return 1; } // at the end, and
    you have to put
  • #include <stdlib.h> // at the top of your lex.yy.c
    output. Then the code will add line numbers to a
    text file, reading the file name in from the
    command line and sending the output to stdout.

25
References
  • Text: Modern Compiler Design (figures)
  • Lex: A Lexical Analyzer Generator, by M.E. Lesk
    and E. Schmidt
  • Yacc: Yet Another Compiler-Compiler, by Stephen C.
    Johnson
  • see http://dinosaur.compilertools.net/yacc/index.html
    and http://dinosaur.compilertools.net/lex/index.html