Elaboration or: Semantic Analysis


1
Elaboration or: Semantic Analysis
  • Compiler
  • Baojian Hua
  • bjhua_at_ustc.edu.cn

2
Front End
source code → lexical analyzer → tokens → parser → abstract syntax tree → semantic analyzer → IR
3
Elaboration
  • Also known as type-checking, or semantic analysis
  • context-sensitive analysis
  • Checks the context-sensitive properties of
    programs (represented as ASTs)
  • every variable is declared before use
  • every expression has a proper type
  • function calls conform to definitions
  • all other possible context-sensitive info
    (highly language-dependent)

4
Elaboration Example
  // Sample C code
  void f (int p)
  {
    x = 4;
    p (23);
    "hello" + "world";
  }
  int main ()
  {
    f () + 5;
    break;
  }

What errors can be detected here?
5
Conceptually
AST → Elaborator → Intermediate Code (the elaborator is driven by the language semantics)
6
Semantics
  • Traditionally, semantics takes the form of a
    natural-language specification
  • e.g., for the + operator, both the left and
    right operands should be of integer type
  • refer to the various language specifications
  • But more recent research has shown that semantics
    can also be specified mathematically
  • rigorous and clean

7
Semantics
  • Now let's turn to MacQueen's note
  • How to implement these rules?

8
Language S
  // Let's make the SLP typed
  prog -> decs stm              // variable declarations followed by statements
  decs -> type ID; decs
        |                       // (empty)
  type -> bool | int            // two types: bool and int
  stm  -> stm; stm
        | ID := exp
        | print (exp)           // print an integer value
        | printBool (exp)       // print a bool value
  exp  -> ID | NUM | exp + exp
        | exp && exp            // both sub-expressions must be booleans
        | true | false
9
Symbol Tables
  • To keep track of types and other information,
    we maintain a finite map from program symbols
    to their info
  • symbols: variables, function names, etc.
  • Such a mapping is called a symbol table, or
    sometimes an environment
  • Notation: $\{x_1 \mapsto b_1, x_2 \mapsto b_2, \ldots, x_n \mapsto b_n\}$
  • where each $b_i$ ($1 \le i \le n$) is called a binding
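
As a minimal sketch of the finite map described above, the simplest representation is a linked list of bindings searched from the newest entry. The names `Type`, `Binding`, and `table_lookup` are hypothetical and chosen only for illustration; the slides do not prescribe a representation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type tags; TY_NONE means "no binding found". */
typedef enum { TY_NONE, TY_INT, TY_BOOL } Type;

/* One binding x -> ty. A table is a list of bindings, newest first. */
typedef struct Binding {
    const char *name;
    Type type;
    const struct Binding *next;
} Binding;

/* Look up a name; returns TY_NONE when the name is unbound. */
Type table_lookup(const Binding *table, const char *name) {
    for (const Binding *b = table; b != NULL; b = b->next)
        if (strcmp(b->name, name) == 0)
            return b->type;
    return TY_NONE;
}
```

A list makes lookup linear in the number of bindings, which is acceptable for a sketch but not for a real compiler.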

10
Type System
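  • Next, we write the symbol table as $\Sigma$
  • $\Sigma = ty_1\ x_1,\ ty_2\ x_2,\ ty_3\ x_3,\ \ldots$
  • a list of (ty, var) tuples
  • may be empty
  • Each rule takes the form of

$$\frac{\Sigma \vdash P_1 : ty_1 \quad \cdots \quad \Sigma \vdash P_n : ty_n}{\Sigma \vdash C : ty}$$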
11
Type System exp
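$$\frac{}{\Sigma \vdash n : \mathrm{int}} \qquad \frac{ty\ x \in \Sigma}{\Sigma \vdash x : ty}$$
$$\frac{}{\Sigma \vdash \mathrm{true} : \mathrm{bool}} \qquad \frac{}{\Sigma \vdash \mathrm{false} : \mathrm{bool}}$$
$$\frac{\Sigma \vdash e_1 : \mathrm{int} \quad \Sigma \vdash e_2 : \mathrm{int}}{\Sigma \vdash e_1 + e_2 : \mathrm{int}}$$
$$\frac{\Sigma \vdash e_1 : \mathrm{bool} \quad \Sigma \vdash e_2 : \mathrm{bool}}{\Sigma \vdash e_1 \mathbin{\&\&} e_2 : \mathrm{bool}}$$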
12
Type System stm
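$$\frac{\Sigma \vdash x : ty \quad \Sigma \vdash e : ty}{\Sigma \vdash x := e : \mathrm{OK}}$$
$$\frac{\Sigma \vdash e : \mathrm{int}}{\Sigma \vdash \mathrm{print}(e) : \mathrm{OK}}$$
$$\frac{\Sigma \vdash e : \mathrm{bool}}{\Sigma \vdash \mathrm{printBool}(e) : \mathrm{OK}}$$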
13
Type System dec, prog
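$$\frac{id \notin \mathrm{dom}(\Sigma) \quad \Sigma,\ type\ id \vdash decs : \Sigma'}{\Sigma \vdash type\ id;\ decs : \Sigma'} \qquad \frac{}{\Sigma \vdash \epsilon : \Sigma}$$
$$\frac{\emptyset \vdash decs : \Sigma \quad \Sigma \vdash stm : \mathrm{OK}}{\vdash decs\ stm : \mathrm{OK}}$$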
14
Example
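// Is the following program well-typed?
int x; int y; print (x+y)

Let $\Sigma = \mathrm{int}\ x,\ \mathrm{int}\ y$. The declarations elaborate to $\Sigma$:
$$\frac{\dfrac{\dfrac{}{\mathrm{int}\ x,\ \mathrm{int}\ y \vdash \epsilon : \Sigma}}{\mathrm{int}\ x \vdash \mathrm{int}\ y : \Sigma}}{\emptyset \vdash \mathrm{int}\ x;\ \mathrm{int}\ y : \Sigma}$$
The statement is well-typed under $\Sigma$:
$$\frac{\dfrac{\dfrac{\mathrm{int}\ x \in \Sigma}{\Sigma \vdash x : \mathrm{int}} \quad \dfrac{\mathrm{int}\ y \in \Sigma}{\Sigma \vdash y : \mathrm{int}}}{\Sigma \vdash x + y : \mathrm{int}}}{\Sigma \vdash \mathrm{print}(x+y) : \mathrm{OK}}$$
So the whole program is well-typed:
$$\frac{\emptyset \vdash \mathrm{int}\ x;\ \mathrm{int}\ y : \Sigma \quad \Sigma \vdash \mathrm{print}(x+y) : \mathrm{OK}}{\vdash \mathrm{int}\ x;\ \mathrm{int}\ y;\ \mathrm{print}(x+y) : \mathrm{OK}}$$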
15
Elaboration of Expressions
$\Sigma \vdash n : \mathrm{int}$
  type elab_exp (sigma, n) {
    return int;
  }

16
Elaboration of Expressions
$\Sigma \vdash \mathrm{true} : \mathrm{bool}$
  type elab_exp (sigma, true) {
    return bool;
  }

17
Elaboration of Expressions
$\Sigma \vdash \mathrm{false} : \mathrm{bool}$
  type elab_exp (sigma, false) {
    return bool;
  }

18
Elaboration of Expressions
$$\frac{ty\ x \in \Sigma}{\Sigma \vdash x : ty}$$
  type elab_exp (sigma, x) {
    type ty = Table_lookup (sigma, x);
    if (ty == NULL)
      error ("variable not declared");
    return ty;
  }

19
Elaboration of Expressions
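$$\frac{\Sigma \vdash e_1 : \mathrm{int} \quad \Sigma \vdash e_2 : \mathrm{int}}{\Sigma \vdash e_1 + e_2 : \mathrm{int}}$$
  type elab_exp (sigma, e1+e2) {
    type t1 = elab_exp (sigma, e1);
    type t2 = elab_exp (sigma, e2);
    switch (t1, t2) {
      case (Int, Int): return Int;
      case (Int, _):   error ("e2 should be int");
      case (_, Int):   error ("e1 should be int");
      default:         error ("should both be int");
    }
  }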

20
Elaboration of Expressions
$$\frac{\Sigma \vdash e_1 : \mathrm{bool} \quad \Sigma \vdash e_2 : \mathrm{bool}}{\Sigma \vdash e_1 \mathbin{\&\&} e_2 : \mathrm{bool}}$$
  type elab_exp (sigma, e1&&e2) {
    type t1 = elab_exp (sigma, e1);
    type t2 = elab_exp (sigma, e2);
    switch (t1, t2) {
      case (Bool, Bool): return Bool;
      case (Bool, _):    error ("e2 should be bool");
      case (_, Bool):    error ("e1 should be bool");
      default:           error ("should both be bool");
    }
  }
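
The expression cases on the preceding slides pattern-match on the shape of the expression, which plain C cannot do directly. A minimal sketch, assuming a tagged AST node (`Exp`), a list-based table (`Binding`), and a `TY_ERROR` result for failures, all of which are hypothetical names introduced here, could look like this:

```c
#include <stdio.h>
#include <string.h>

typedef enum { TY_ERROR, TY_INT, TY_BOOL } Type;
typedef enum { E_NUM, E_TRUE, E_FALSE, E_ID, E_ADD, E_AND } ExpKind;

typedef struct Exp {
    ExpKind kind;
    const char *id;                  /* used by E_ID */
    const struct Exp *left, *right;  /* used by E_ADD and E_AND */
} Exp;

typedef struct Binding {
    const char *name;
    Type type;
    const struct Binding *next;
} Binding;

static int error_count = 0;

/* Report an error and return the error type. */
static Type report(const char *msg) {
    fprintf(stderr, "type error: %s\n", msg);
    error_count++;
    return TY_ERROR;
}

static Type lookup(const Binding *sigma, const char *name) {
    for (; sigma != NULL; sigma = sigma->next)
        if (strcmp(sigma->name, name) == 0) return sigma->type;
    return TY_ERROR;
}

static Type elab_binary(const Binding *sigma, const Exp *e, Type want);

/* One case per typing rule for expressions. */
Type elab_exp(const Binding *sigma, const Exp *e) {
    switch (e->kind) {
    case E_NUM:   return TY_INT;
    case E_TRUE:
    case E_FALSE: return TY_BOOL;
    case E_ID: {
        Type ty = lookup(sigma, e->id);
        return ty == TY_ERROR ? report("variable not declared") : ty;
    }
    case E_ADD:   return elab_binary(sigma, e, TY_INT);
    case E_AND:   return elab_binary(sigma, e, TY_BOOL);
    }
    return report("unknown expression");
}

/* Both operands and the result of the operator have type `want`. */
static Type elab_binary(const Binding *sigma, const Exp *e, Type want) {
    Type t1 = elab_exp(sigma, e->left);
    Type t2 = elab_exp(sigma, e->right);
    if (t1 == want && t2 == want) return want;
    return report(want == TY_INT ? "operands should both be int"
                                 : "operands should both be bool");
}
```

Unlike the pseudocode, this sketch emits one combined message per operator instead of distinguishing which operand is wrong, and an operand that already failed is reported a second time by its parent.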

21
Elaboration of Statements
$$\frac{\Sigma \vdash x : ty \quad \Sigma \vdash e : ty}{\Sigma \vdash x := e : \mathrm{OK}}$$
  void elab_stm (sigma, x:=e) {
    type t1 = elab_exp (sigma, x);
    type t2 = elab_exp (sigma, e);
    if (t1 != t2)
      error ("different types in assignment");
  }

22
Elaboration of Statements
$$\frac{\Sigma \vdash e : \mathrm{int}}{\Sigma \vdash \mathrm{print}(e) : \mathrm{OK}}$$
  void elab_stm (sigma, print(e)) {
    type ty = elab_exp (sigma, e);
    if (ty != INT)
      error ("type should be INT");
  }

23
Elaboration of Statements
$$\frac{\Sigma \vdash e : \mathrm{bool}}{\Sigma \vdash \mathrm{printBool}(e) : \mathrm{OK}}$$
  void elab_stm (sigma, printBool(e)) {
    type ty = elab_exp (sigma, e);
    if (ty != BOOL)
      error ("type should be BOOL");
  }
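
The three statement rules can be folded into a single C function. The sketch below is only illustrative: it cuts expressions down to literals and variables to stay short, and the names `Stm`, `expect`, and `elab_id` are assumptions, not part of the slides.

```c
#include <stdio.h>
#include <string.h>

typedef enum { TY_ERROR, TY_INT, TY_BOOL } Type;

/* Expressions are cut down to literals and variables to keep the sketch short. */
typedef enum { E_NUM, E_TRUE, E_ID } ExpKind;
typedef struct { ExpKind kind; const char *id; } Exp;

typedef enum { S_ASSIGN, S_PRINT, S_PRINT_BOOL, S_SEQ } StmKind;
typedef struct Stm {
    StmKind kind;
    const char *id;                    /* target of S_ASSIGN */
    const Exp *exp;                    /* operand of assign, print, printBool */
    const struct Stm *first, *second;  /* children of S_SEQ */
} Stm;

typedef struct Binding {
    const char *name;
    Type type;
    const struct Binding *next;
} Binding;

static Type elab_id(const Binding *sigma, const char *name) {
    for (; sigma != NULL; sigma = sigma->next)
        if (strcmp(sigma->name, name) == 0) return sigma->type;
    fprintf(stderr, "variable %s not declared\n", name);
    return TY_ERROR;
}

static Type elab_exp(const Binding *sigma, const Exp *e) {
    switch (e->kind) {
    case E_NUM:  return TY_INT;
    case E_TRUE: return TY_BOOL;
    case E_ID:   return elab_id(sigma, e->id);
    }
    return TY_ERROR;
}

/* Returns 0 if e has exactly the type `want`, otherwise reports msg and returns 1. */
static int expect(const Binding *sigma, const Exp *e, Type want, const char *msg) {
    if (want != TY_ERROR && elab_exp(sigma, e) == want) return 0;
    fprintf(stderr, "%s\n", msg);
    return 1;
}

/* Returns the number of ill-typed statements found in s. */
int elab_stm(const Binding *sigma, const Stm *s) {
    switch (s->kind) {
    case S_ASSIGN:      /* x := e needs x and e to agree on one type */
        return expect(sigma, s->exp, elab_id(sigma, s->id),
                      "different types in assignment");
    case S_PRINT:
        return expect(sigma, s->exp, TY_INT, "type should be int");
    case S_PRINT_BOOL:
        return expect(sigma, s->exp, TY_BOOL, "type should be bool");
    case S_SEQ:
        return elab_stm(sigma, s->first) + elab_stm(sigma, s->second);
    }
    return 1;
}
```

Returning a count instead of aborting on the first error is a design choice of this sketch; it lets the checker keep going through a statement sequence.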

24
Elaboration of Declarations
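$$\frac{ID \notin \mathrm{dom}(\Sigma) \quad \Sigma,\ type\ ID \vdash decs : \Sigma'}{\Sigma \vdash type\ ID;\ decs : \Sigma'} \qquad \frac{}{\Sigma \vdash \epsilon : \Sigma}$$
  Sigma elab_decs (sigma, decs) {
    if (decs == empty)
      return sigma;
    // decs = type ID; decs'
    if (ID \in sigma)
      error ("duplicated decl");
    new_sigma = enter_table (sigma, type ID);
    return elab_decs (new_sigma, decs');
  }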
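
A possible C rendering of the declaration rules is sketched below. It assumes declarations arrive as a linked list (`Dec`) and that `enter_table` builds a new list node in front of the old table; both are hypothetical choices made for illustration. The sketch skips a duplicated declaration instead of stopping, and it never frees memory.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { TY_INT, TY_BOOL } Type;

typedef struct Dec {
    Type type;
    const char *id;
    const struct Dec *next;
} Dec;

typedef struct Binding {
    const char *name;
    Type type;
    const struct Binding *next;
} Binding;

/* Is name in dom(sigma)? */
static int in_domain(const Binding *sigma, const char *name) {
    for (; sigma != NULL; sigma = sigma->next)
        if (strcmp(sigma->name, name) == 0) return 1;
    return 0;
}

/* Returns a table that extends sigma by one binding; sigma is not modified. */
static const Binding *enter_table(const Binding *sigma, Type type, const char *name) {
    Binding *b = malloc(sizeof *b);
    if (b == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
    b->name = name;
    b->type = type;
    b->next = sigma;
    return b;
}

/* Sigma |- decs : Sigma'. Duplicates are reported, counted in *errors, and skipped. */
const Binding *elab_decs(const Binding *sigma, const Dec *decs, int *errors) {
    if (decs == NULL) return sigma;              /* empty declaration list */
    if (in_domain(sigma, decs->id)) {
        fprintf(stderr, "duplicated declaration of %s\n", decs->id);
        ++*errors;
        return elab_decs(sigma, decs->next, errors);
    }
    return elab_decs(enter_table(sigma, decs->type, decs->id), decs->next, errors);
}
```

A program is then elaborated by calling `elab_decs` with an empty table (`NULL` here) and passing the result to the statement checker.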

25
Elaboration of Programs
$$\frac{\emptyset \vdash decs : \Sigma \quad \Sigma \vdash stm : \mathrm{OK}}{\vdash decs\ stm : \mathrm{OK}}$$
  void elab_prog (decs stm) {
    sigma = elab_decs (empty_table, decs);
    elab_stm (sigma, stm);
  }

26
Moral
  • There may be other information associated with
    identifiers, not just types, say
  • Scope
  • Storage class
  • Access control info
  • All these details are handled by symbol tables
    ($\Sigma$)!

27
Implementation
  • Must be efficient!
  • lots of variables, functions, etc
  • Two basic approaches
  • Functional
  • symbol table is implemented as a functional data
    structure (e.g., red-black tree), with no tables
    ever destroyed or modified
  • Imperative
  • a single table, modified for every binding added
    or removed
  • This choice is largely independent of the
    implementation language

28
Functional Symbol Table
  • Basic idea
  • when implementing $\sigma_2 = \sigma_1 + \{x \mapsto t\}$
  • create a new table $\sigma_2$ instead of modifying $\sigma_1$
  • when deleting, revert to the old table
  • A good data structure for this is a BST or a
    red-black tree
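
To make the "create a new table instead of modifying the old one" idea concrete, here is a minimal sketch of a persistent, unbalanced BST that copies only the nodes on the search path and shares everything else. The names are hypothetical, balancing (as in a red-black tree) is left out, and nodes are never freed.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { TY_NONE, TY_INT, TY_BOOL } Type;

typedef struct Node {
    const char *key;
    Type type;
    const struct Node *left, *right;
} Node;

static const Node *node_new(const char *key, Type type,
                            const Node *left, const Node *right) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->key = key;
    n->type = type;
    n->left = left;
    n->right = right;
    return n;
}

/* Returns a new table; the old table t stays valid and unchanged.
   Only the nodes on the path from the root to the key are copied. */
const Node *table_insert(const Node *t, const char *key, Type type) {
    if (t == NULL) return node_new(key, type, NULL, NULL);
    int c = strcmp(key, t->key);
    if (c < 0)
        return node_new(t->key, t->type, table_insert(t->left, key, type), t->right);
    if (c > 0)
        return node_new(t->key, t->type, t->left, table_insert(t->right, key, type));
    return node_new(key, type, t->left, t->right);  /* rebinding shadows the old entry */
}

Type table_lookup(const Node *t, const char *key) {
    while (t != NULL) {
        int c = strcmp(key, t->key);
        if (c == 0) return t->type;
        t = c < 0 ? t->left : t->right;
    }
    return TY_NONE;
}
```

Leaving a scope then costs nothing: the elaborator simply goes back to using the pointer to the older table.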

29
BST Symbol Table
[Figure: BST symbol tables with nodes c : int (shown twice), e : int, a : char, b : double]
30
Possible Functional Interface
  signature SYMBOL_TABLE =
  sig
    type 'a t
    type key
    val empty : 'a t
    val insert : 'a t * key * 'a -> 'a t
    val lookup : 'a t * key -> 'a option
  end

31
Imperative Symbol Tables
  • The imperative approach almost always involves
    the use of hash tables
  • Need to delete entries to revert to previous
    environment
  • made simpler because deletes follow a stack
    discipline
  • can maintain a stack of entered symbols, so that
    they can be later popped and removed from the
    hash table
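
The stack discipline above can be sketched as follows. This is a toy version with fixed capacities and a single global table; `begin_scope`, `end_scope`, and the other names are hypothetical. Entries live in an array that doubles as the undo stack, so leaving a scope pops entries in reverse insertion order.

```c
#include <stddef.h>
#include <string.h>

enum { NBUCKETS = 64, MAXSYMS = 256 };
typedef enum { TY_NONE, TY_INT, TY_BOOL } Type;

typedef struct Entry {
    const char *name;
    Type type;
    struct Entry *next;
} Entry;

static Entry pool[MAXSYMS];        /* entry storage, also the undo stack */
static int top = 0;                /* number of live entries */
static Entry *buckets[NBUCKETS];
static int marks[MAXSYMS];         /* value of `top` at each begin_scope */
static int nmarks = 0;

static unsigned hash(const char *s) {
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Returns 0 on success, -1 when the fixed-size pool is full. */
int table_insert(const char *name, Type type) {
    if (top == MAXSYMS) return -1;
    Entry *e = &pool[top++];
    unsigned h = hash(name);
    e->name = name;
    e->type = type;
    e->next = buckets[h];          /* newest binding first, shadowing older ones */
    buckets[h] = e;
    return 0;
}

Type table_lookup(const char *name) {
    for (Entry *e = buckets[hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0) return e->type;
    return TY_NONE;
}

void begin_scope(void) {
    if (nmarks < MAXSYMS) marks[nmarks++] = top;
}

void end_scope(void) {
    if (nmarks == 0) return;
    int mark = marks[--nmarks];
    while (top > mark) {
        Entry *e = &pool[--top];
        /* The most recently inserted live entry is the head of its bucket. */
        buckets[hash(e->name)] = e->next;
    }
}
```

Because removals happen strictly in reverse order of insertion, no search is needed to unlink an entry.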

32
Possible Imperative Interface
  signature SYMBOL_TABLE =
  sig
    type 'a t
    type key
    val insert : 'a t * key * 'a -> unit
    val lookup : 'a t * key -> 'a option
    val delete : 'a t * key -> unit
    val beginScope : unit -> unit
    val endScope : unit -> unit
  end

33
Implementation of Symbols
  • For several reasons, it will be useful at some
    point to represent symbols as elements of a
    small, densely packed set of identities
  • fast comparisons (equality)
  • for dataflow analysis, we will want sets of
    variables and fast set operations
  • It will be critically important to use bit
    strings to represent the sets
  • For example, your liveness analysis algorithm
  • More on this later
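
A rough illustration of the two ideas on this slide: interning maps each distinct name to a small integer, and a set of symbols then fits in a machine word. Everything here is an assumption made for the sketch, including the 64-symbol limit and the linear-scan interning (a real implementation would likely hash).

```c
#include <stdint.h>
#include <string.h>

enum { MAX_SYMBOLS = 64 };         /* so that one uint64_t covers every symbol */

static const char *names[MAX_SYMBOLS];
static int nsymbols = 0;

/* Map a name to a small integer; equal names get equal numbers.
   Returns -1 when the table is full. */
int symbol_intern(const char *name) {
    for (int i = 0; i < nsymbols; i++)
        if (strcmp(names[i], name) == 0) return i;
    if (nsymbols == MAX_SYMBOLS) return -1;
    names[nsymbols] = name;
    return nsymbols++;
}

/* A set of symbols as a bit string: bit i is set iff symbol i is a member. */
typedef uint64_t SymSet;

SymSet set_add(SymSet s, int sym)    { return s | (UINT64_C(1) << sym); }
int    set_has(SymSet s, int sym)    { return (int)((s >> sym) & 1); }
SymSet set_union(SymSet a, SymSet b) { return a | b; }
SymSet set_minus(SymSet a, SymSet b) { return a & ~b; }
```

After interning, comparing two symbols is an integer comparison, and union or difference of variable sets is a single bitwise operation.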

34
Scope
  • How to handle lexical scope?
  • Many choices
  • One table: insert and remove bindings during
    elaboration, as we enter and leave a local
    scope
  • Stack of tables: insertion and removal always
    operate on the table at the top of the stack
  • the Dragon book compiler makes use of this

35
One-table approach
  int x;                 // σ = {x ↦ int}
  int f ()               // σ1 = σ + {f} = {x ↦ int, f}
  {
    if (4) {
      int x;             // σ2 = σ1 + {x ↦ int} = {x, f, x}
      x = 6;
    }                    // σ1
    else {
      int x;             // σ4 = σ1 + {x ↦ int} = {x, f, x}
      x = 5;
    }                    // σ1
    x = 8;
  }                      // σ1

Shadowing is not commutative!
36
Name Space
  struct list {
    int x;
    struct list *list;
  } *list;
  void walk (struct list *list)
  {
  list:
    printf ("%d\n", list->x);
    if (list = list->list)
      goto list;
  }

37
Name Space
  • It's trivial to handle name spaces
  • one symbol table for each name space
  • Take C as an example
  • Several different name spaces
  • labels
  • tags
  • variables
  • So

38
Types
  • The representation of types is highly
    language-dependent
  • Some key considerations
  • name vs. structural equivalence
  • mutually recursive type definitions
  • error handling

39
Name vs. Structural Equivalence
  struct A { int i; } x;
  struct B { int i; } y;
  x = y;
  • In a language with structural equivalence, this
    program is legal
  • But not in a language with name equivalence
    (e.g., C)
  • For name equivalence, can generate a unique
    symbol for each defined type
  • For structural equivalence, need to recursively
    compare the types
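
The two notions of equivalence can be contrasted in a few lines. In this sketch, which uses hypothetical names throughout, each struct declaration receives a unique `id` for name equivalence, while structural equivalence walks the field lists. Whether field names take part in structural equivalence depends on the language; here they are assumed to. The recursive comparison would not terminate on cyclic type graphs, which this sketch does not model.

```c
#include <stddef.h>
#include <string.h>

typedef enum { T_INT, T_BOOL, T_STRUCT } Kind;

typedef struct TypeRep TypeRep;

typedef struct Field {
    const char *name;
    const TypeRep *type;
    const struct Field *next;
} Field;

struct TypeRep {
    Kind kind;
    int id;                /* unique per declaration; used for name equivalence */
    const Field *fields;   /* only meaningful for T_STRUCT */
};

/* Name equivalence: two struct types are equal only if they come
   from the same declaration. */
int name_equal(const TypeRep *a, const TypeRep *b) {
    if (a->kind != T_STRUCT || b->kind != T_STRUCT) return a->kind == b->kind;
    return a->id == b->id;
}

/* Structural equivalence: compare the types field by field, recursively. */
int struct_equal(const TypeRep *a, const TypeRep *b) {
    if (a->kind != b->kind) return 0;
    if (a->kind != T_STRUCT) return 1;
    const Field *f = a->fields, *g = b->fields;
    for (; f != NULL && g != NULL; f = f->next, g = g->next)
        if (strcmp(f->name, g->name) != 0 || !struct_equal(f->type, g->type))
            return 0;
    return f == NULL && g == NULL;   /* same number of fields */
}
```

For `struct A` and `struct B` from the example, `struct_equal` accepts the assignment and `name_equal` rejects it.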

40
Mutually recursive type definitions
  • To process recursive and mutually recursive type
    definitions, need a placeholder
  • in ML, an option ref
  • in C, a pointer
  • in Java, bind method (read Appel)

  struct A {
    int data;
    struct A *next;
    struct B *b;
  };
  struct B {...};
41
Error Diagnostic
  • To recover from errors, it is useful to have an
    "any" type
  • makes it possible to continue type-checking after
    an error
  • In practice, use int or guess a type
  • Similarly, a "void" type can be used for
    expressions that return no value
  • Source locations are annotated in the AST!
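
One way to read the "any" type suggestion: an ill-typed expression gets type any, and any is accepted wherever a specific type is expected, so a single mistake produces a single message. The sketch below shows this for a binary operator only; `TY_ANY`, `compatible`, and `elab_binop` are hypothetical names, and returning any after an error (instead of guessing int) is a choice made here.

```c
typedef enum { TY_ANY, TY_INT, TY_BOOL } Type;

static int error_count = 0;

/* TY_ANY is compatible with every type, so an operand that already
   failed to type-check does not trigger a second report. */
static int compatible(Type got, Type want) {
    return got == TY_ANY || got == want;
}

/* Check one binary operator whose operands and result all have type `want`. */
Type elab_binop(Type t1, Type t2, Type want) {
    if (!compatible(t1, want) || !compatible(t2, want)) {
        error_count++;       /* report once, at the innermost bad operator */
        return TY_ANY;       /* enclosing expressions will accept this */
    }
    return want;
}
```

For example, checking `(1 + true) && false` reports the ill-typed addition once; the enclosing `&&` sees an operand of type any and stays silent.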

42
Summary
  • Elaboration checks the context-sensitive
    properties of programs
  • must take care of semantics of source programs
  • and may translate into more low-level forms
  • Usually the largest (most complex) part of a compiler!