Title: Elaboration, or Semantic Analysis
1 Elaboration, or Semantic Analysis
- Compiler
- Baojian Hua
- bjhua_at_ustc.edu.cn
2 Front End
[Figure: front-end pipeline: source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> semantic analyzer -> IR]
3 Elaboration
- Also known as type-checking, or semantic analysis
  - context-sensitive analysis
- Checks the context-sensitive properties of programs (ASTs):
  - every variable is declared before use
  - every expression has a proper type
  - function calls conform to definitions
  - all other possible context-sensitive information (highly language-dependent)
4 Elaboration Example
    // Sample C code
    void f (int p)
    {
      x = 4;
      p (23);
      "hello" + "world";
    }

    int main ()
    {
      f () + 5;
      break;
    }
What errors can be detected here?
5 Conceptually
[Figure: the elaborator takes the AST and the language semantics as inputs and produces intermediate code]
6 Semantics
- Traditionally, semantics takes the form of a natural-language specification
  - e.g., for the + operator, both the left and right operands should be of integer type
  - refer to the various language specifications
- But recent research has revealed that semantics can also be addressed via math
  - rigorous and clean
7 Semantics
- Now let's turn to MacQueen's note
- How to implement these rules?
8 Language S
    // Let's make the SLP typed
    prog -> decs stm            // variable declarations followed by statements
    decs -> type ID; decs
          |
    type -> bool | int          // two types: bool and int
    stm  -> stm; stm
          | ID = exp
          | print (exp)         // print an integer value
          | printBool (exp)     // print a bool value
    exp  -> ID | NUM | exp+exp
          | exp&&exp            // both sub-expressions must be booleans
          | true | false
9 Symbol Tables
- In order to keep track of types and other information, we maintain a finite map from program symbols to information
  - symbols: variables, function names, etc.
- Such a mapping is called a symbol table, or sometimes an environment
  - Notation: $\{x_1 : b_1,\ x_2 : b_2,\ \ldots,\ x_n : b_n\}$
  - where each $b_i$ ($1 \le i \le n$) is called a binding
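10 Type System
- Next, we write the symbol table as $\Sigma$
  - $\Sigma = \mathit{ty}\ x_1,\ \mathit{ty}\ x_2,\ \mathit{ty}\ x_3,\ \ldots$
  - a list of (ty, var) tuples
  - may be empty
- Each rule takes the form of
\[ \frac{\Sigma \vdash P_1 : \mathit{ty}_1 \quad \cdots \quad \Sigma \vdash P_n : \mathit{ty}_n}{\Sigma \vdash C : \mathit{ty}} \]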
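11 Type System: exp
\[ \Sigma \vdash n : \mathtt{int} \]
\[ \frac{\mathit{ty}\ x \in \Sigma}{\Sigma \vdash x : \mathit{ty}} \]
\[ \Sigma \vdash \mathtt{true} : \mathtt{bool} \qquad \Sigma \vdash \mathtt{false} : \mathtt{bool} \]
\[ \frac{\Sigma \vdash e_1 : \mathtt{int} \quad \Sigma \vdash e_2 : \mathtt{int}}{\Sigma \vdash e_1 + e_2 : \mathtt{int}} \]
\[ \frac{\Sigma \vdash e_1 : \mathtt{bool} \quad \Sigma \vdash e_2 : \mathtt{bool}}{\Sigma \vdash e_1 \mathbin{\&\&} e_2 : \mathtt{bool}} \]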
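12 Type System: stm
\[ \frac{\Sigma \vdash x : \mathit{ty} \quad \Sigma \vdash e : \mathit{ty}}{\Sigma \vdash x = e : \mathtt{OK}} \]
\[ \frac{\Sigma \vdash e : \mathtt{int}}{\Sigma \vdash \mathtt{print}(e) : \mathtt{OK}} \]
\[ \frac{\Sigma \vdash e : \mathtt{bool}}{\Sigma \vdash \mathtt{printBool}(e) : \mathtt{OK}} \]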
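13 Type System: dec, prog
\[ \frac{\mathit{id} \notin \mathrm{dom}(\Sigma) \quad \Sigma, \mathit{type}\ \mathit{id} \vdash \mathit{decs} \Rightarrow \Sigma'}{\Sigma \vdash \mathit{type}\ \mathit{id};\ \mathit{decs} \Rightarrow \Sigma'} \]
\[ \Sigma \vdash \epsilon \Rightarrow \Sigma \]
\[ \frac{\emptyset \vdash \mathit{decs} \Rightarrow \Sigma \quad \Sigma \vdash \mathit{stm} : \mathtt{OK}}{\vdash \mathit{decs}\ \mathit{stm} : \mathtt{OK}} \]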
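14 Example
    // Is the following program well-typed?
    int x; int y; print (x+y)
Let $\Sigma = \mathtt{int}\ x,\ \mathtt{int}\ y$. The derivation, premises first:
- Declarations: $\Sigma \vdash \epsilon \Rightarrow \Sigma$, hence $\mathtt{int}\ x \vdash \mathtt{int}\ y;\ \epsilon \Rightarrow \Sigma$, hence $\emptyset \vdash \mathtt{int}\ x;\ \mathtt{int}\ y;\ \epsilon \Rightarrow \Sigma$
- Variables: $\mathtt{int}\ x \in \Sigma$ gives $\Sigma \vdash x : \mathtt{int}$, and $\mathtt{int}\ y \in \Sigma$ gives $\Sigma \vdash y : \mathtt{int}$
- Expression: $\Sigma \vdash x + y : \mathtt{int}$
- Statement: $\Sigma \vdash \mathtt{print}(x + y) : \mathtt{OK}$
- Program: $\vdash \mathtt{int}\ x;\ \mathtt{int}\ y;\ \mathtt{print}(x + y) : \mathtt{OK}$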
15 Elaboration of Expressions
\[ \Sigma \vdash n : \mathtt{int} \]
    type elab_exp (sigma, n)
    {
      return Int;
    }
16 Elaboration of Expressions
\[ \Sigma \vdash \mathtt{true} : \mathtt{bool} \]
    type elab_exp (sigma, true)
    {
      return Bool;
    }
17 Elaboration of Expressions
\[ \Sigma \vdash \mathtt{false} : \mathtt{bool} \]
    type elab_exp (sigma, false)
    {
      return Bool;
    }
18 Elaboration of Expressions
\[ \frac{\mathit{ty}\ x \in \Sigma}{\Sigma \vdash x : \mathit{ty}} \]
    type elab_exp (sigma, x)
    {
      type ty = Table_lookup (sigma, x);
      if (ty == NULL)
        error ("variable not declared");
      return ty;
    }
19 Elaboration of Expressions
    type elab_exp (sigma, e1+e2)
    {
      type t1 = elab_exp (sigma, e1);
      type t2 = elab_exp (sigma, e2);
      switch (t1, t2) {
        case (Int, Int): return Int;
        case (Int, _):   error ("e2 should be int");
        case (_, Int):   error ("e1 should be int");
        default:         error ("should both be int");
      }
    }
20 Elaboration of Expressions
\[ \frac{\Sigma \vdash e_1 : \mathtt{bool} \quad \Sigma \vdash e_2 : \mathtt{bool}}{\Sigma \vdash e_1 \mathbin{\&\&} e_2 : \mathtt{bool}} \]
    type elab_exp (sigma, e1&&e2)
    {
      type t1 = elab_exp (sigma, e1);
      type t2 = elab_exp (sigma, e2);
      switch (t1, t2) {
        case (Bool, Bool): return Bool;
        case (Bool, _):    error ("e2 should be bool");
        case (_, Bool):    error ("e1 should be bool");
        default:           error ("should both be bool");
      }
    }
21 Elaboration of Statements
\[ \frac{\Sigma \vdash x : \mathit{ty} \quad \Sigma \vdash e : \mathit{ty}}{\Sigma \vdash x = e : \mathtt{OK}} \]
    void elab_stm (sigma, x=e)
    {
      type t1 = elab_exp (sigma, x);
      type t2 = elab_exp (sigma, e);
      if (t1 != t2)
        error ("different types in assignment");
    }
22 Elaboration of Statements
\[ \frac{\Sigma \vdash e : \mathtt{int}}{\Sigma \vdash \mathtt{print}(e) : \mathtt{OK}} \]
    void elab_stm (sigma, print(e))
    {
      type ty = elab_exp (sigma, e);
      if (ty != Int)
        error ("type should be int");
    }
23 Elaboration of Statements
\[ \frac{\Sigma \vdash e : \mathtt{bool}}{\Sigma \vdash \mathtt{printBool}(e) : \mathtt{OK}} \]
    void elab_stm (sigma, printBool(e))
    {
      type ty = elab_exp (sigma, e);
      if (ty != Bool)
        error ("type should be bool");
    }
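24 Elaboration of Declarations
\[ \frac{\mathit{ID} \notin \mathrm{dom}(\Sigma) \quad \Sigma, \mathit{type}\ \mathit{ID} \vdash \mathit{decs} \Rightarrow \Sigma'}{\Sigma \vdash \mathit{type}\ \mathit{ID};\ \mathit{decs} \Rightarrow \Sigma'} \]
\[ \Sigma \vdash \epsilon \Rightarrow \Sigma \]
    Sigma elab_decs (sigma, decs)
    {
      if (decs is empty)
        return sigma;

      // decs = type ID; decs'
      if (ID \in sigma)
        error ("duplicated decl");
      new_sigma = enter_table (sigma, type ID);
      return elab_decs (new_sigma, decs');
    }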
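25 Elaboration of Programs
\[ \frac{\emptyset \vdash \mathit{decs} \Rightarrow \Sigma \quad \Sigma \vdash \mathit{stm} : \mathtt{OK}}{\vdash \mathit{decs}\ \mathit{stm} : \mathtt{OK}} \]
    void elab_prog (decs stm)
    {
      // start from the empty symbol table
      sigma = elab_decs (empty_table, decs);
      elab_stm (sigma, stm);
    }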
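The elaboration functions on the preceding slides can be collected into one small runnable sketch. This is only an illustration, not the course's reference implementation: the tuple encoding of the AST, the use of a Python dict for the symbol table, and the name `ElabError` are all assumptions made here.

```python
# Assumed AST encoding (not from the slides): nested tuples.
#   exp:  ("num", 3) | ("true",) | ("false",) | ("var", "x")
#         | ("add", e1, e2) | ("and", e1, e2)
#   stm:  ("assign", "x", e) | ("print", e) | ("printBool", e) | ("seq", s1, s2)
#   decs: a list of (type, name) pairs, e.g. [("int", "x")]


class ElabError(Exception):
    """Raised when a program violates one of the typing rules."""


def elab_exp(sigma, exp):
    """Return the type ("int" or "bool") of exp under symbol table sigma."""
    tag = exp[0]
    if tag == "num":
        return "int"
    if tag in ("true", "false"):
        return "bool"
    if tag == "var":
        name = exp[1]
        if name not in sigma:
            raise ElabError(f"variable {name} not declared")
        return sigma[name]
    if tag in ("add", "and"):
        want = "int" if tag == "add" else "bool"
        t1 = elab_exp(sigma, exp[1])
        t2 = elab_exp(sigma, exp[2])
        if t1 != want or t2 != want:
            raise ElabError(f"operands of {tag} should both be {want}")
        return want
    raise ElabError(f"unknown expression {tag}")


def elab_stm(sigma, stm):
    """Check that stm is well-typed (the OK judgment); raise otherwise."""
    tag = stm[0]
    if tag == "seq":
        elab_stm(sigma, stm[1])
        elab_stm(sigma, stm[2])
    elif tag == "assign":
        t1 = elab_exp(sigma, ("var", stm[1]))
        t2 = elab_exp(sigma, stm[2])
        if t1 != t2:
            raise ElabError("different types in assignment")
    elif tag == "print":
        if elab_exp(sigma, stm[1]) != "int":
            raise ElabError("type should be int")
    elif tag == "printBool":
        if elab_exp(sigma, stm[1]) != "bool":
            raise ElabError("type should be bool")
    else:
        raise ElabError(f"unknown statement {tag}")


def elab_decs(sigma, decs):
    """Extend sigma with each declaration, rejecting duplicates."""
    for ty, name in decs:
        if name in sigma:
            raise ElabError(f"duplicated declaration of {name}")
        sigma = {**sigma, name: ty}  # build a new table, keep the old one intact
    return sigma


def elab_prog(decs, stm):
    """Elaborate the declarations from the empty table, then the statement."""
    sigma = elab_decs({}, decs)
    elab_stm(sigma, stm)


# The program from slide 14: int x; int y; print (x+y)
elab_prog([("int", "x"), ("int", "y")],
          ("print", ("add", ("var", "x"), ("var", "y"))))
```

Unlike the pseudocode, this sketch reports a single message when an operand has the wrong type instead of saying which operand is at fault.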
26 Moral
- There may be other information associated with identifiers, not just types, say:
  - scope
  - storage class
  - access control info
  - ...
- All these details are handled by symbol tables ($\Sigma$)!
27 Implementation
- Must be efficient!
  - lots of variables, functions, etc.
- Two basic approaches:
  - Functional
    - the symbol table is implemented as a functional data structure (e.g., a red-black tree), with no table ever destroyed or modified
  - Imperative
    - a single table, modified for every binding added or removed
- This choice is largely independent of the implementation language
28 Functional Symbol Table
- Basic idea:
  - when implementing s2 = s1 + {x: t}
    - create a new table s2, instead of modifying s1
  - when deleting, restore the old table
- A good data structure for this is a BST or a red-black tree
29 BST Symbol Table
[Figure: BST symbol tables with the bindings c: int (shown twice), e: int, a: char, b: double]
30 Possible Functional Interface
    signature SYMBOL_TABLE =
    sig
      type 'a t
      type key
      val empty : 'a t
      val insert : 'a t * key * 'a -> 'a t
      val lookup : 'a t * key -> 'a option
    end
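To make the functional approach concrete, here is a minimal sketch of a persistent symbol table as an unbalanced binary search tree. A production version would rather use a balanced tree such as a red-black tree. The names `Node`, `EMPTY`, `insert` and `lookup` mirror the interface above but are otherwise assumptions of this sketch.

```python
from typing import Any, NamedTuple, Optional


class Node(NamedTuple):
    """An immutable BST node holding one binding."""
    key: str
    value: Any
    left: Optional["Node"]
    right: Optional["Node"]


EMPTY = None  # the empty table


def insert(table, key, value):
    """Return a new table with key bound to value; table itself is untouched."""
    if table is None:
        return Node(key, value, None, None)
    if key < table.key:
        return table._replace(left=insert(table.left, key, value))
    if key > table.key:
        return table._replace(right=insert(table.right, key, value))
    return table._replace(value=value)  # same key: the new binding shadows the old


def lookup(table, key):
    """Return the value bound to key, or None if there is no binding."""
    while table is not None:
        if key == table.key:
            return table.value
        table = table.left if key < table.key else table.right
    return None


# s1 = {c: int, a: char, e: int}; s2 = s1 + {b: double}
s1 = insert(insert(insert(EMPTY, "c", "int"), "a", "char"), "e", "int")
s2 = insert(s1, "b", "double")
```

Only the nodes on the path from the root to the new binding are copied. Here `s2` shares the whole subtree rooted at `e` with `s1`, and going back to the old environment simply means using `s1` again.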
31 Imperative Symbol Tables
- The imperative approach almost always involves the use of hash tables
- Need to delete entries to revert to the previous environment
  - made simpler because deletes follow a stack discipline
  - can maintain a stack of entered symbols, so that they can later be popped and removed from the hash table
32 Possible Imperative Interface
    signature SYMBOL_TABLE =
    sig
      type 'a t
      type key
      val insert : 'a t * key * 'a -> unit
      val lookup : 'a t * key -> 'a option
      val delete : 'a t * key -> unit
      val beginScope : unit -> unit
      val endScope : unit -> unit
    end
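A minimal sketch of the imperative approach, assuming a Python dict as the hash table and a list as the stack of entered symbols. The class name `SymbolTable`, the `_MARK` sentinel and the method names are illustrative choices, not a fixed API.

```python
_MARK = object()  # sentinel pushed on the undo stack at every scope entry


class SymbolTable:
    """One mutable hash table plus a stack of entered symbols."""

    def __init__(self):
        self._table = {}  # name -> list of bindings, innermost last
        self._undo = []   # names in insertion order, interleaved with _MARK

    def insert(self, key, value):
        self._table.setdefault(key, []).append(value)
        self._undo.append(key)

    def lookup(self, key):
        bindings = self._table.get(key)
        return bindings[-1] if bindings else None

    def begin_scope(self):
        self._undo.append(_MARK)

    def end_scope(self):
        # Pop and remove every binding entered since the matching begin_scope.
        while self._undo:
            key = self._undo.pop()
            if key is _MARK:
                return
            self._table[key].pop()


table = SymbolTable()
table.insert("x", "int")
table.insert("f", "int ()")
table.begin_scope()
table.insert("x", "bool")        # shadows the outer x (type chosen for illustration)
inner_x = table.lookup("x")
table.end_scope()                # the inner x is removed again
outer_x = table.lookup("x")
```

Because deletions follow the stack discipline, `end_scope` needs no separate `delete` call per symbol: it replays the undo stack down to the last marker.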
33 Implementation of Symbols
- For several reasons, it will be useful at some point to represent symbols as elements of a small, densely packed set of identities
  - fast comparisons (equality)
  - for dataflow analysis, we will want sets of variables and fast set operations
    - it will be critically important to use bit strings to represent the sets
    - for example, in your liveness analysis algorithm
- More on this later
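As a rough illustration of why dense identities help, the sketch below interns symbol names as small consecutive integers and represents sets of symbols as bit strings, using Python integers as the bit strings. The names `SymbolPool`, `bitset` and `members` are assumptions of this sketch, and the liveness equation at the end is the usual textbook form, used here only as an example.

```python
class SymbolPool:
    """Maps each distinct name to a small integer identity (0, 1, 2, ...)."""

    def __init__(self):
        self._ids = {}
        self._names = []

    def intern(self, name):
        ident = self._ids.get(name)
        if ident is None:
            ident = len(self._names)
            self._ids[name] = ident
            self._names.append(name)
        return ident

    def name(self, ident):
        return self._names[ident]


def bitset(idents):
    """Encode a collection of identities as an integer bit string."""
    bits = 0
    for ident in idents:
        bits |= 1 << ident
    return bits


def members(bits):
    """Decode a bit string back into a sorted list of identities."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


pool = SymbolPool()
x, y, z = (pool.intern(n) for n in ("x", "y", "z"))

# One liveness step: live_in = use | (live_out - def)
live_out = bitset([x, y, z])
defs = bitset([x])
uses = bitset([z])
live_in = uses | (live_out & ~defs)
live_in_names = [pool.name(i) for i in members(live_in)]
```

Equality of symbols becomes an integer comparison, and union, intersection and difference of variable sets become single bitwise operations.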
34 Scope
- How to handle lexical scope?
- Many choices:
  - One table
    - insert and remove bindings during elaboration, as we enter and leave a local scope
  - Stack of tables
    - insertion and removal always operate on the stack top
    - the dragon compiler makes use of this
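The stack-of-tables choice can be sketched with `collections.ChainMap` from the Python standard library, which keeps a list of dicts, searches them front to back, and writes only to the first one. The helper names `enter_scope` and `leave_scope` and the sample bindings are assumptions of this sketch.

```python
from collections import ChainMap


def enter_scope(env):
    """Push a fresh, empty table on top of the stack."""
    return env.new_child()


def leave_scope(env):
    """Pop the top table; the enclosing tables are left as they were."""
    return env.parents


global_env = ChainMap()
global_env["x"] = "int"
global_env["f"] = "int ()"

local_env = enter_scope(global_env)
local_env["x"] = "bool"          # inserted into the stack top only
after_env = leave_scope(local_env)
```

Lookups such as `local_env["x"]` find the innermost binding first, while `local_env["f"]` falls through to the enclosing table.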
35 One-table approach
    int x;              // s  = {x: int}
    int f ()            // s1 = s + {f: ...} = {x: int, f: ...}
    {
      if (4) {
        int x;          // s2 = s1 + {x: int} = {x, f, x}
        x = 6;
      }                 // back to s1
      else {
        int x;          // s4 = s1 + {x: int} = {x, f, x}
        x = 5;
      }                 // back to s1
      x = 8;
    }                   // s1
Shadowing is not commutative!
36 Name Space
    struct list
    {
      int x;
      struct list *list;
    } *list;

    void walk (struct list *list)
    {
    list:
      printf ("%d\n", list->x);
      if (list = list->list)
        goto list;
    }
37 Name Space
- It's trivial to handle name spaces
  - one symbol table for each name space
- Take C as an example
  - Several different name spaces:
    - labels
    - tags
    - variables
- So
38 Types
- The representation of types is highly language-dependent
- Some key considerations:
  - name vs. structural equivalence
  - mutually recursive type definitions
  - error handling
39 Name vs. Structural Equivalence
    struct A { int i; } x;
    struct B { int i; } y;
    x = y;
- In a language with structural equivalence, this program is legal
- But not in a language with name equivalence (e.g., C)
- For name equivalence, one can generate a unique symbol for each defined type
- For structural equivalence, one needs to recursively compare the types
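The two notions of equivalence can be contrasted in a short sketch. The type representation (`Prim`, `Struct`) is an assumption made here. The structural check also compares field names, which is one possible design, and it does not handle recursive types, which would need a set of already-visited pairs.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Prim:
    name: str  # e.g. "int"


@dataclass(frozen=True)
class Struct:
    tag: str                                # the unique symbol of the definition
    fields: Tuple[Tuple[str, "Type"], ...]  # (field name, field type) pairs


Type = Union[Prim, Struct]


def name_equal(t1, t2):
    """Name equivalence: two struct types are equal only if their tags match."""
    if isinstance(t1, Struct) and isinstance(t2, Struct):
        return t1.tag == t2.tag
    return t1 == t2


def struct_equal(t1, t2):
    """Structural equivalence: recursively compare the shapes, ignoring tags."""
    if isinstance(t1, Struct) and isinstance(t2, Struct):
        return len(t1.fields) == len(t2.fields) and all(
            n1 == n2 and struct_equal(f1, f2)
            for (n1, f1), (n2, f2) in zip(t1.fields, t2.fields)
        )
    return t1 == t2


# struct A { int i; } x;  struct B { int i; } y;  x = y;
type_a = Struct("A", (("i", Prim("int")),))
type_b = Struct("B", (("i", Prim("int")),))
legal_structurally = struct_equal(type_a, type_b)
legal_by_name = name_equal(type_a, type_b)
```

With these definitions the assignment `x = y` is accepted under structural equivalence and rejected under name equivalence.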
40 Mutually recursive type definitions
- To process recursive and mutually recursive type definitions, we need a placeholder
  - in ML, an option ref
  - in C, a pointer
  - in Java, a bind method (read Appel)
    struct A { int data; struct A *next; struct B *b; };
    struct B { ... };
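One way to picture the placeholder idea is a two-pass sketch: the first pass enters an empty placeholder for every struct tag, and the second pass fills in the bodies, which may by then refer to any tag. The `StructType` class, the input format, and the body given to `B` below are hypothetical and chosen only for illustration.

```python
class StructType:
    """A struct type whose body may be filled in after it is created."""

    def __init__(self, tag):
        self.tag = tag
        self.fields = None  # placeholder: body not yet known


def elab_struct_decls(decls):
    """decls maps a tag to a list of (field name, type name) pairs.

    A type name is either "int" or the tag of a struct, read as a pointer
    to that struct (an assumption of this sketch).
    """
    # Pass 1: one placeholder per tag, so every tag is known up front.
    table = {tag: StructType(tag) for tag in decls}
    # Pass 2: resolve the field types against the table and fill the bodies in.
    for tag, fields in decls.items():
        resolved = []
        for field_name, type_name in fields:
            if type_name == "int":
                resolved.append((field_name, "int"))
            elif type_name in table:
                resolved.append((field_name, table[type_name]))
            else:
                raise KeyError(f"unknown type {type_name}")
        table[tag].fields = resolved
    return table


types = elab_struct_decls({
    "A": [("data", "int"), ("next", "A"), ("b", "B")],
    "B": [("a", "A")],  # hypothetical body for struct B
})
```

After the second pass the field `next` of `A` points back to `A` itself, and `A` and `B` refer to each other through the same shared objects.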
41 Error Diagnostic
- To recover from errors, it is useful to have an "any" type
  - makes it possible to continue type-checking
  - in practice, use int or guess one
- Similarly, a "void" type can be used for expressions that return no value
- Source locations are annotated in the AST!
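A small sketch of how an "any" type lets the checker keep going after an error. The tuple-encoded expressions, the `ANY` constant and the error list are assumptions of this sketch, and only numbers, variables and `+` are covered.

```python
ANY = "any"  # type given to an expression whose real type could not be determined


def compatible(actual, expected):
    """ANY is accepted everywhere, so one error does not trigger follow-up errors."""
    return actual == expected or actual == ANY


def elab_exp(sigma, exp, errors):
    """Return the type of exp, appending messages to errors instead of stopping."""
    tag = exp[0]
    if tag == "num":
        return "int"
    if tag == "var":
        name = exp[1]
        if name not in sigma:
            errors.append(f"variable {name} not declared")
            return ANY
        return sigma[name]
    if tag == "add":
        for operand in exp[1:]:
            if not compatible(elab_exp(sigma, operand, errors), "int"):
                errors.append("operand of + should be int")
        return "int"  # guess int so that checking can continue
    raise ValueError(f"unknown expression {tag}")


# x + b, where x is undeclared and b is a bool
errors = []
result = elab_exp({"b": "bool"}, ("add", ("var", "x"), ("var", "b")), errors)
```

The undeclared `x` is reported once and then treated as `ANY`, so it causes no second message, while the independent error on `b` is still found in the same pass.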
42 Summary
- Elaboration checks the context-sensitive properties of programs
  - must take care of the semantics of source programs
  - and may translate them into lower-level forms
- Usually the biggest (most complex) part of a compiler!