Title: Recap
1 Recap
2 Outline
- Subjects Studied
- Questions and Answers
3 Lexical Analysis (Scanning)
- Input: program text (file)
- Output: sequence of tokens
- Read input file
- Identify language keywords and standard identifiers
- Handle include files and macros
- Count line numbers
- Remove whitespace
- Report illegal symbols
- Produce symbol table
4 The Lexical Analysis Problem
- Given: a set of token descriptions and an input string
- Partition the string into tokens (class, value)
- Ambiguity resolution
- Prefer the longest matching token
- Between two matches of equal length, select the token description listed first
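The two ambiguity rules above can be sketched as a small hand-written scanner. This is only an illustration under stated assumptions: the three token descriptions (`if`, identifier, number), the `Token` struct, and the name `next_token` are hypothetical and do not come from the slides. Each rule reports how many characters it matches at the current position; the scanner keeps the longest match and, on a tie, the rule that appears first.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token classes, listed in rule-priority order. */
typedef enum { TOK_IF, TOK_IDENT, TOK_NUMBER, TOK_ERROR, TOK_EOF } TokenClass;

typedef struct {
    TokenClass cls;
    const char *start;  /* first character of the lexeme */
    size_t length;      /* number of characters matched */
} Token;

/* Each matcher returns the length of its match at s, or 0 for no match. */
static size_t match_if(const char *s) { return strncmp(s, "if", 2) == 0 ? 2 : 0; }

static size_t match_ident(const char *s) {
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

static size_t match_number(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

typedef size_t (*Matcher)(const char *);
static const Matcher rules[] = { match_if, match_ident, match_number };
static const TokenClass classes[] = { TOK_IF, TOK_IDENT, TOK_NUMBER };

/* Scans one token starting at *cursor and advances *cursor past it. */
Token next_token(const char **cursor) {
    assert(cursor != NULL && *cursor != NULL);
    const char *s = *cursor;
    while (isspace((unsigned char)*s)) s++;          /* remove whitespace */
    Token tok = { TOK_EOF, s, 0 };
    if (*s == '\0') { *cursor = s; return tok; }
    tok.cls = TOK_ERROR;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t len = rules[i](s);
        if (len > tok.length) {   /* strictly longer wins; ties keep the earlier rule */
            tok.cls = classes[i];
            tok.length = len;
        }
    }
    if (tok.cls == TOK_ERROR) tok.length = 1;        /* illegal symbol: skip one char */
    *cursor = s + tok.length;
    return tok;
}
```

With these rules, the input `if` is classified as `TOK_IF` (equal length, first rule wins), while `iffy` is classified as `TOK_IDENT` (longest match wins).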
5 Jlex
- Input: regular expressions and actions (Java code)
- Output: a scanner program that reads the input and applies an action whenever its regular expression is matched
6 Summary
- For most programming languages, lexical analyzers can easily be constructed automatically
- Exceptions: Fortran, PL/1
- Lex/Flex/Jlex are useful beyond compilers
7 Syntax Analysis (Parsing)
- Input: sequence of tokens
- Output: abstract syntax tree
- Report syntax errors (e.g., unbalanced parentheses)
- Create symbol table
- Create pretty-printed version of the program
- In some cases the tree need not be generated (one-pass compilers)
8 Pushdown Automaton
[Figure: pushdown automaton with an input tape, a control unit driven by a parser table, and a stack; symbols shown: u, t, w, V]
9 Efficient Parsers
- Pushdown automata
- Deterministic
- Report an error as soon as the input is not a prefix of a valid program
- Not usable for all context free grammars
[Figure labels: cup, ambiguity errors, parse tree]
10 Kinds of Parsers
- Top-Down (Predictive Parsing), LL
- Construct the parse tree in a top-down manner
- Find the leftmost derivation
- For every non-terminal and token, predict the next production
- Preorder tree traversal
- Bottom-Up, LR
- Construct the parse tree in a bottom-up manner
- Find the rightmost derivation in reverse order
- For every potential right-hand side and token, decide when a production is found
- Postorder tree traversal
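11 Top-Down Parsing
[Figure: top-down construction of a parse tree over the input tokens t1 t2]
12 Bottom-Up Parsing
[Figure: bottom-up construction of a parse tree over the input tokens t1 t2 t4 t5 t6 t7 t8]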
13 Example Grammar for Predictive LL Top-Down Parsing
expression → digit | '(' expression operator expression ')'
operator → '+' | '*'
digit → '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
15
static int Parse_Expression(Expression **expr_p) {
    Expression *expr = *expr_p = new_expression();
    /* try to parse a digit */
    if (Token.class == DIGIT) {
        expr->type = 'D';
        expr->value = Token.repr - '0';
        get_next_token();
        return 1;
    }
    /* try to parse a parenthesized expression */
    if (Token.class == '(') {
        expr->type = 'P';
        get_next_token();
        if (!Parse_Expression(&expr->left)) Error("missing expression");
        if (!Parse_Operator(&expr->oper)) Error("missing operator");
        /* right operand, required by the grammar */
        if (!Parse_Expression(&expr->right)) Error("missing expression");
        if (Token.class != ')') Error("missing )");
        get_next_token();
        return 1;
    }
    return 0;
}
16 Parsing Expressions
- Try every alternative production
- For P → A1 A2 … An | B1 B2 … Bm
- If A1 succeeds
- Call A2
- If A2 succeeds
- Call A3
- If A2 fails, report an error
- Otherwise try B1
- Recursive descent parsing
- Can be applied to certain grammars
- Generalization: LL(1) parsing
17
int P(...) {
    /* try to parse the alternative P → A1 A2 ... An */
    if (A1(...)) {
        if (!A2()) Error("Missing A2");
        if (!A3()) Error("Missing A3");
        ..
        if (!An()) Error("Missing An");
        return 1;
    }
    /* try to parse the alternative P → B1 B2 ... Bm */
    if (B1(...)) {
        if (!B2()) Error("Missing B2");
        if (!B3()) Error("Missing B3");
        ..
        if (!Bm()) Error("Missing Bm");
        return 1;
    }
    return 0;
}
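As a concrete instance of this schema, the following sketch parses and evaluates strings of the example grammar (an expression is a digit, or a parenthesized `expression operator expression`). It is a minimal illustration, not the course's implementation: the `Parser` struct, the error flag that stands in for `Error(...)`, and the names `parse_expression`, `parse_operator`, and `evaluate` are assumptions, and every token is taken to be a single character.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *input;  /* remaining input; one character per token */
    int error;          /* set when a required component is missing */
} Parser;

static int parse_expression(Parser *p, int *value);

/* operator -> '+' | '*' */
static int parse_operator(Parser *p, char *op) {
    if (*p->input == '+' || *p->input == '*') {
        *op = *p->input++;
        return 1;
    }
    return 0;
}

static int parse_expression(Parser *p, int *value) {
    /* alternative 1: expression -> digit */
    if (*p->input >= '0' && *p->input <= '9') {
        *value = *p->input++ - '0';
        return 1;
    }
    /* alternative 2: expression -> '(' expression operator expression ')' */
    if (*p->input == '(') {
        int left = 0, right = 0;
        char op = '+';
        p->input++;
        /* once the first symbol matched, every later failure is an error */
        if (!parse_expression(p, &left)) p->error = 1;
        if (!p->error && !parse_operator(p, &op)) p->error = 1;
        if (!p->error && !parse_expression(p, &right)) p->error = 1;
        if (!p->error && *p->input != ')') p->error = 1;
        if (p->error) return 0;
        p->input++;
        *value = (op == '+') ? left + right : left * right;
        return 1;
    }
    return 0;  /* no alternative applies */
}

/* Returns 1 and stores the value if the whole string is one expression. */
int evaluate(const char *text, int *value) {
    assert(text != NULL && value != NULL);
    Parser p = { text, 0 };
    return parse_expression(&p, value) && !p.error && *p.input == '\0';
}
```

For example, `evaluate("(2*(3+4))", &v)` succeeds with `v` set to 14, while `evaluate("(2+)", &v)` fails because the right operand is missing.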
18 Predictive Parser for Arithmetic Expressions
- 1 E → E + T
- 2 E → T
- 3 T → T * F
- 4 T → F
- 5 F → id
- 6 F → (E)
19 Bottom-Up Syntax Analysis
- Input
- A context free grammar
- A stream of tokens
- Output
- A syntax tree or error
- Method
- Construct the parse tree in a bottom-up manner
- Find the rightmost derivation (in reversed order)
- For every potential right-hand side and token, decide when a production is found
- Report an error as soon as the input is not a prefix of a valid program
20 Constructing LR(0) parsing table
- Add a production S' → S $
- Construct a finite automaton accepting valid stack symbols
- States are sets of items A → α • β
- The states of the automaton become the states of the parsing table
- Determine shift operations
- Determine goto operations
- Determine reduce operations
- Report an error when conflicts arise
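21
[Figure: LR(0) automaton for the grammar S → E $, E → T, E → E + T, T → i, T → ( E ). Item sets recoverable from the figure, with their item numbers:]
- 1: S → • E $, 4: E → • T, 6: E → • E + T, 10: T → • i, 12: T → • ( E )
- 2: S → E • $, 7: E → E • + T
- 5: E → T •
- 11: T → i •
- 13: T → ( • E ), 4: E → • T, 6: E → • E + T, 10: T → • i, 12: T → • ( E )
- 7: E → E + • T, 10: T → • i, 12: T → • ( E )
- 8: E → E + T •
- 15: T → ( E ) •
- Edge labels: E, T, i, (, )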
22 Parsing (i)
[Figure: the LR(0) automaton of the previous slide, used to trace the parse of the input (i)]
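The parse of `(i)` can be replayed with a small table-driven LR(0) recognizer for the grammar S → E $, E → T, E → E + T, T → i, T → ( E ). This is a sketch under assumptions: the state numbers 0 to 8 are this example's own numbering (not the item numbers on the slide), tokens are single characters, the end of the string plays the role of `$`, and `lr0_accepts` is a hypothetical name. A state containing a completed item reduces without looking at the input, which is what makes the parser LR(0).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { N_STATES = 9, MAX_STACK = 256 };

/* Productions: 0: S -> E $   1: E -> T   2: E -> E + T   3: T -> i   4: T -> ( E ) */
static const char lhs[] = { 'S', 'E', 'E', 'T', 'T' };
static const int rhs_len[] = { 2, 1, 3, 1, 3 };

/* For each state: the production to reduce by, or -1 for a shift state. */
static const int reduce_rule[N_STATES] = { -1, -1, 1, 3, -1, -1, -1, 2, 4 };

/* Combined shift/goto edges of the automaton; -1 means there is no edge. */
static int next_state(int state, char sym) {
    switch (state) {
    case 0: case 4:                 /* initial state and the state after '(' */
        if (sym == 'E') return state == 0 ? 1 : 6;
        if (sym == 'T') return 2;
        if (sym == 'i') return 3;
        if (sym == '(') return 4;
        break;
    case 1:                         /* S -> E . $,  E -> E . + T */
        if (sym == '+') return 5;
        break;
    case 5:                         /* E -> E + . T */
        if (sym == 'T') return 7;
        if (sym == 'i') return 3;
        if (sym == '(') return 4;
        break;
    case 6:                         /* T -> ( E . ),  E -> E . + T */
        if (sym == '+') return 5;
        if (sym == ')') return 8;
        break;
    }
    return -1;
}

/* Returns 1 if input (over the tokens i, +, (, )) is derivable from E. */
int lr0_accepts(const char *input) {
    assert(input != NULL);
    int stack[MAX_STACK];
    int top = 0;
    stack[0] = 0;
    for (;;) {
        int state = stack[top];
        int rule = reduce_rule[state];
        if (rule >= 0) {            /* reduce: pop the right-hand side, then goto */
            top -= rhs_len[rule];
            int target = next_state(stack[top], lhs[rule]);
            if (target < 0) return 0;
            stack[++top] = target;
            continue;
        }
        char tok = *input ? *input : '$';
        if (*input && !strchr("i()+", *input)) return 0;   /* illegal symbol */
        if (state == 1 && tok == '$') return 1;            /* accept */
        int target = next_state(state, tok);
        if (target < 0 || top + 1 >= MAX_STACK) return 0;  /* syntax error */
        stack[++top] = target;      /* shift */
        input++;
    }
}
```

On `(i)` the recognizer shifts `(` and `i`, reduces T → i and E → T, shifts `)`, reduces T → ( E ) and E → T, and accepts at the end of the input. An input such as `(i` is rejected as soon as the end of the input is seen in a state with no matching edge.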
23 Summary (Bottom-Up)
- LR is a powerful technique
- Generates efficient parsers
- Generation tools exist for LALR(1): Bison, yacc, CUP
- But some grammars need to be tuned
- Shift/Reduce conflicts
- Reduce/Reduce conflicts
- Efficiency of the generated parser
24 Summary (Parsing)
- Context free grammars provide a natural way to define the syntax of programming languages
- Ambiguity may be resolved
- Predictive parsing is natural
- Good error messages
- Natural error recovery
- But not expressive enough
- LR bottom-up parsing is more expressive
25 Abstract Syntax
- Intermediate program representation
- Defines a tree
- Preserves program hierarchy
- Generated by the parser
- Declared using an (ambiguous) context free grammar (relatively flat)
- Not meant for parsing
- Keywords and punctuation symbols are not stored (not relevant once the tree exists)
- Big programs can also be handled (possibly via virtual memory)
26 Semantic Analysis
- Requirements related to the context in which a construct occurs
- Examples
- Name resolution
- Scoping
- Type checking
- Escape
- Implemented via AST traversals
- Guides subsequent compiler phases
27 Abstract Interpretation / Static Analysis
- Automatically identify program properties
- No user-provided loop invariants
- Sound but incomplete methods
- But can be rather precise
- Non-standard interpretation of the program's operational semantics
- Applications
- Compiler optimization
- Code quality tools
- Identify potential bugs
- Prove the absence of runtime errors
- Partial correctness
28 Constant Propagation
[x ↦ ?, y ↦ ?, z ↦ ?]
z = 3
[x ↦ ?, y ↦ ?, z ↦ 3]
while (x > 0)
    [x ↦ ?, y ↦ ?, z ↦ 3]
    if (x == 1)
        [x ↦ 1, y ↦ ?, z ↦ 3]
        y = 7
        [x ↦ 1, y ↦ 7, z ↦ 3]
    else
        [x ↦ ?, y ↦ ?, z ↦ 3]
        y = z + 4
        [x ↦ ?, y ↦ 7, z ↦ 3]
    assert y == 7
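The annotations on this slide can be reproduced with a very small constant-propagation sketch. The modelling choices are assumptions of this example: an abstract value is either a known constant or unknown (a full lattice would also have a bottom element), the `assert` is taken to sit at the end of the loop body, the loop condition `x > 0` is not used to refine `x`, and all type and function names (`AbsVal`, `State`, `join`, `state_at_assert`) are hypothetical.

```c
#include <assert.h>

/* Abstract value: either a known constant or unknown ("?" on the slide). */
typedef struct { int is_const; int value; } AbsVal;
typedef struct { AbsVal x, y, z; } State;

static AbsVal unknown(void) { AbsVal v = { 0, 0 }; return v; }
static AbsVal constant(int c) { AbsVal v = { 1, c }; return v; }

static int same_val(AbsVal a, AbsVal b) {
    return a.is_const == b.is_const && (!a.is_const || a.value == b.value);
}

/* Join at a control-flow merge: keep a constant only if both paths agree. */
AbsVal join(AbsVal a, AbsVal b) {
    return (a.is_const && same_val(a, b)) ? a : unknown();
}

static State join_states(State s, State t) {
    State r;
    r.x = join(s.x, t.x);
    r.y = join(s.y, t.y);
    r.z = join(s.z, t.z);
    return r;
}

static int same_state(State s, State t) {
    return same_val(s.x, t.x) && same_val(s.y, t.y) && same_val(s.z, t.z);
}

/* Abstract addition: constant only when both operands are constants. */
static AbsVal abs_add(AbsVal a, AbsVal b) {
    return (a.is_const && b.is_const) ? constant(a.value + b.value) : unknown();
}

/* Loop body "if (x == 1) y = 7; else y = z + 4;": state reaching the assert. */
static State analyze_body(State in) {
    State then_state = in, else_state = in;
    then_state.x = constant(1);                  /* x == 1 holds on this branch */
    then_state.y = constant(7);                  /* y = 7 */
    else_state.y = abs_add(in.z, constant(4));   /* y = z + 4 */
    return join_states(then_state, else_state);
}

/* Returns the abstract state that reaches "assert y == 7". */
State state_at_assert(void) {
    State head;
    int rounds = 0;
    head.x = unknown();
    head.y = unknown();
    head.z = constant(3);                        /* z = 3 */
    for (;;) {
        /* The loop head merges the entry state with the state after one iteration. */
        State next = join_states(head, analyze_body(head));
        if (same_state(next, head)) break;       /* fixed point reached */
        head = next;
        rounds++;
        assert(rounds < 8);   /* values only move from constant to unknown */
    }
    return analyze_body(head);
}
```

Both branches leave `y` equal to 7 (directly, or as 3 + 4), so the join keeps `y ↦ 7` and the assertion is proved, while `x` is 1 on one path and unknown on the other and therefore stays unknown.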
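29
        /* c */
L0:     a = 0
        /* ac */
L1:     b = a + 1
        /* bc */
        c = c + b
        /* bc */
        a = b * 2
        /* ac */
        if c < N goto L1
        /* c */
        return c
[Figure: control flow graph with one node per statement: a = 0; b = a + 1; c = c + b; a = b * 2; c < N goto L1; return c]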
30-36
[Figures: the control flow graph of slide 29, annotated step by step with the set of variables live after each statement. One column per slide; ∅ is the empty set.]

Statement          30   31    32    33      34      35      36
a = 0              ∅    ∅     ∅     ∅       ∅       {c,a}   {c,a}
b = a + 1          ∅    ∅     ∅     ∅       {c,b}   {c,b}   {c,b}
c = c + b          ∅    ∅     ∅     {c,b}   {c,b}   {c,b}   {c,b}
a = b * 2          ∅    ∅     {c}   {c}     {c}     {c}     {c,a}
if c < N goto L1   ∅    {c}   {c}   {c}     {c}     {c}     {c,a}
return c           ∅    ∅     ∅     ∅       ∅       ∅       ∅
37 Summary: Iterative Procedure
- Analyze one procedure at a time
- More precise solutions exist
- Construct a control flow graph for the procedure
- Initialize the values at every node to the most optimistic value
- Iterate until convergence
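The procedure can be sketched for live-variable analysis on the six-statement example from the preceding slides. The encoding is an assumption of this sketch: variables are bits in an unsigned mask, `N` is treated as a constant and not tracked, the control flow graph is a fixed table, and the names `Node`, `cfg`, and `liveness` are hypothetical. The "most optimistic value" for liveness is the empty set, and the loop recomputes every node until no set changes.

```c
#include <assert.h>
#include <stddef.h>

enum { VAR_A = 1, VAR_B = 2, VAR_C = 4 };   /* one bit per variable */
enum { N_NODES = 6, MAX_SUCC = 2 };

typedef struct {
    unsigned use;          /* variables read by the statement */
    unsigned def;          /* variables written by the statement */
    int succ[MAX_SUCC];    /* successor node indices, -1 when unused */
} Node;

/* 0: a = 0   1: b = a + 1   2: c = c + b   3: a = b * 2
   4: if c < N goto 1   5: return c */
static const Node cfg[N_NODES] = {
    { 0,             VAR_A, {  1, -1 } },
    { VAR_A,         VAR_B, {  2, -1 } },
    { VAR_B | VAR_C, VAR_C, {  3, -1 } },
    { VAR_B,         VAR_A, {  4, -1 } },
    { VAR_C,         0,     {  1,  5 } },
    { VAR_C,         0,     { -1, -1 } },
};

/* Fills live_in[n] and live_out[n] for every node n of cfg. */
void liveness(unsigned live_in[N_NODES], unsigned live_out[N_NODES]) {
    assert(live_in != NULL && live_out != NULL);
    for (int n = 0; n < N_NODES; n++)
        live_in[n] = live_out[n] = 0;            /* most optimistic: nothing is live */
    int changed = 1;
    while (changed) {                            /* iterate until convergence */
        changed = 0;
        for (int n = N_NODES - 1; n >= 0; n--) { /* backward analysis: visit in reverse */
            unsigned out = 0;
            for (int k = 0; k < MAX_SUCC; k++)
                if (cfg[n].succ[k] >= 0)
                    out |= live_in[cfg[n].succ[k]];
            unsigned in = cfg[n].use | (out & ~cfg[n].def);
            if (in != live_in[n] || out != live_out[n])
                changed = 1;
            live_in[n] = in;
            live_out[n] = out;
        }
    }
}
```

At convergence the live-out sets are {a, c} after `a = 0`, {b, c} after `b = a + 1` and `c = c + b`, {a, c} after `a = b * 2` and after the conditional jump, and the empty set after `return c`.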
38 Basic Compiler Phases
39 Overall Structure
40 Techniques Studied
- Simple code generation
- Basic blocks
- Global register allocation
- Activation records
- Object Oriented
- Assembler/Linker/Loader
41 Heap Memory Management
- Part of the runtime system
- Utilities for dynamic memory allocation
- Utilities for automatic memory reclamation
- Garbage collection
42 Garbage Collection
- Techniques
- Mark and sweep
- Copying collection
- Reference counting
- Modes
- Generational
- Incremental vs. Stop the world