Title: Compilers: Modern Compiler Design
1 Compilers: Modern Compiler Design
NCYU C. H. Wang
2 What's a Compiler?
- A compiler is a program that accepts as input a program text in a certain language and produces as output a program text in another language, while preserving the meaning of that text.
- The main reason why one wants such a translation is that one has hardware on which one can run the translated program.
3 Compiler Compilers
- To obtain the compiler, we run another compiler whose input consists of the compiler source text and which will produce executable code for it, as it would for any program source text.
- When the source language is also the implementation language and the source text to be compiled is actually a new version of the compiler itself, the process is called bootstrapping.
4 Compiler Compilers
Compiling and running a compiler
5 Magic Work
- The compiler can work its magic because of two factors:
- The input is in a language and consequently has a structure, which is described in the language manual.
- The semantics of the input is described in terms of, and is attached to, that structure.
6 Conceptual Structure
- Conceptual structure of a compiler
7 Phases of a compiler
From ASU88
8 Compiler and Interpreter
9 Interpreter
- An interpreter is normally written in a high-level language and will therefore run on most machine types (no object code is generated, which gives portability).
- Writing an interpreter is much less work than writing a back-end.
- Performing the actions straight from the semantic representation allows better error checking and reporting to be done.
- Increased security can be achieved by an interpreter.
10 Why study compiler construction
- Compiler construction is very successful.
- Compiler construction has a wide applicability.
- It can be applied to rapidly create read routines for HTML, PostScript, etc.
- Compilers contain generally useful algorithms.
11 Demo Compiler
- Structure of the demo compiler
12 General Translation (I)
From ASU88
13 General Translation (II)
From ASU88
14 Notations
- Parsing
- Parse Tree
- Syntax Analysis
- Abstract Syntax Tree (AST)
- Annotated Abstract Syntax Tree
- The annotations in a node are also called the attributes of that node.
- It is the task of the context handling module to annotate the AST.
15 Parsing
- Syntax trees are also called parse trees.
- Parsing is also called syntax analysis.
- Grammar
- expression → expression '+' term | expression '-' term | term
- term → term '*' factor | term '/' factor | factor
- factor → identifier | constant | '(' expression ')'
16 Parse Tree
- Parse tree of b*b - 4*a*c
17 Abstract Syntax Tree (AST)
18 Annotated AST
- Examples of annotations are type information and optimization information.
- The annotations in a node are also called the attributes of that node.
19 Annotated AST
20 Grammar for demo compiler
- Fully parenthesized expression
21 Lexical analysis for the demo compiler
- The tokens in our language are (, ), +, *, and digit.
22 Lexical analyzer
23 Syntax analysis for the demo compiler
- Recursive descent parsing
- Predictive recursive descent parsing
- LL(1)
- Look-ahead sets
24 A C template for a grammar rule
P → A1 A2 ... An | B1 B2 ... | ...
25 Context handling for the demo compiler
26 Code generation for the demo compiler
- A simple stack machine:
- PUSH n: pushes the integer n onto the stack
- ADD: replaces the topmost two elements by their sum
- MULT: replaces the topmost two elements by their product
- PRINT: pops the top element and prints its value
- Depth-first scan of the AST
27 Code generation results
- The expression
- (2*((3*4)+9))
- Outputs
- PUSH 2
- PUSH 3
- PUSH 4
- MULT
- PUSH 9
- ADD
- MULT
- PRINT
28 Interpretation for the demo compiler
- The code generator emits code to have the actions performed by a machine at a later time.
- The interpreter performs the actions right away.
29 The structure of a more realistic compiler
30 Run-time system
- Traditionally left outside compiler structure pictures.
- Some of the actions required by a running program will be of a general, language-dependent and/or machine-dependent housekeeping nature; examples are code for allocating arrays, manipulating stack frames, and finding the proper method during method invocation in an object-oriented language.
31 Short-cuts
- It is by no means always necessary to implement all modules of the back-end.
- A local assembler is almost always available.
- Generate C code from the intermediate code.
- C is sometimes called "the machine-independent assembler".
- Good to excellent code quality, but at the cost of increased compilation time.
32 Properties of a good compiler
- Generate correct code
- Conform completely to the language specification
- Be able to handle programs of essentially arbitrary size
- Compilation speed
- Compiler size
- User friendliness
- The speed and the size of the generated code
33 Portability and Retargetability
- Portability: the compiler itself can be made to run on another machine.
- Retargetability: the compiler can be made to generate code for another machine.
34 A short history of compiler construction (I)
- 1945-1960: code generation
- Assembly programming
- Higher-level languages and compilers were looked at with a mixture of suspicion and awe.
- The first FORTRAN compiler (Sheridan, 1959) optimized heavily and was far ahead of its time in that respect.
35 A short history of compiler construction (II)
- 1960-1975: parsing
- A proliferation of new programming languages
- Language designers began to believe that having a compiler for a new language quickly was more important than having one that generated very efficient code.
- This shifted the emphasis in compiler construction from back-ends to front-ends.
36 A short history of compiler construction (III)
- 1975-present: code generation and code optimization paradigms
- Professional compilers
- Reliable and efficient, both in use and in the generated code, and preferably with pleasant user interfaces
- More attention to the quality of the generated code
- New programming paradigms were developed
- Functional, logic, and distributed programming
37 Grammars
- Context-free grammars (CF grammars)
- The essential formalism for describing the structure of programs in a programming language
- The form of a grammar
- A grammar consists of a set of production rules and a start symbol. Each production rule defines a named syntactic construct.
- Example: expression → '(' expression operator expression ')'
- Left-hand side: expression; right-hand side: '(' expression operator expression ')'
38 Grammars
- Non-terminal symbol
- Occurs as the left-hand side of one or more production rules (denoted by capital letters, e.g. A, B, C, and N)
- Terminal symbol
- End point of the production process (denoted by lower-case letters, e.g. x, y, and z)
- Sequence of grammar symbols
- Denoted by Greek letters, e.g. α, β, and γ
- The empty sequence is denoted by ε
39 The production process
- Production Tree
- The syntactic structure can be added to the flat
interpretation of a sentential form as a tree
positioned above the sentential form so that
leaves of the tree are the grammar symbols.
40 Leftmost derivation (1)
- Production rules
- 1 expression → '(' expression operator expression ')'
- 2 expression → '1'
- 3 operator → '+'
- 4 operator → '*'
41 Leftmost derivation (2)
Leftmost derivation of the string (1*(1+1))
42 Leftmost derivation (3)
Parse tree of the derivation
43 Extended forms of grammars
- Backus-Naur Form (Backus Normal Form): BNF
- Extended BNF (EBNF)
- R+: the occurrence of one or more Rs
- R?: the occurrence of zero or one R
- R*: the occurrence of zero or more Rs
- Example
- parameter_list → ('IN' | 'OUT')? identifier (',' identifier)*
- a, b
- IN year, month, day
- OUT left, right
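44 Properties of grammars
- Left-recursion: a non-terminal N is left-recursive if, starting with a sentential form N, we can produce a sentential form that starts with N, e.g. expression → expression '+' factor | factor
- Right-recursion: a non-terminal N is right-recursive if, starting with a sentential form N, we can produce a sentential form that ends with N.
- Nullable: a non-terminal N is nullable if, starting with a sentential form N, we can produce an empty sentential form.
- Useless: a non-terminal N is useless if it can never produce a string of terminal symbols.
- Ambiguous: a grammar is ambiguous if it can produce two different production trees with the same leaves in the same order.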
45 Example (simple test)
- Page 50, Exercise 1.16
- S → A | B | C
- A → B | ε
- B → x | C y
- C → B C S
- Left-recursive? Right-recursive? Nullable? Useless?
- What language does the grammar produce?
- Is the grammar ambiguous?
46 Answer
- Left-recursive: B, C
- Right-recursive: S, C
- Nullable: S, A
- Useless: C
- Language: {x, ε}
- Yes, it is ambiguous:
- S ⇒ A ⇒ B ⇒ x and S ⇒ B ⇒ x
47 The grammar formalism
- A context-free grammar G is a 4-tuple (VN, VT, S, P):
- VN: a set of non-terminal symbols
- VT: a set of terminal symbols
- S: the start symbol
- P: a set of production rules
48 Closure algorithms
- A calling graph is a directed graph which has a node for each routine in the program and an arrow from node A to node B if routine A calls routine B directly or indirectly.
49 Calling graph
50 Transitive closure
- The relation "calls directly or indirectly" is the transitive closure of the relation "calls directly".
- Routine A is recursive if A calls A directly or indirectly.
51 Calling graph
- The resulting calling graph
52 Recursion detection
53 Iterative implementation of the closure algorithm
Time complexity O(n^3)