Title: CS 320: Compiling Techniques
CS 320 Compiling Techniques
People
- David Walker (Professor)
  - 412 Computer Science Building
  - dpw_at_cs.princeton.edu
  - office hours after each class
- Dan Dantas (TA)
  - 417 Computer Science Building
  - ddantas_at_cs.princeton.edu
  - office hours Mondays 2-3 PM
Information
- Web site
  - www.cs.princeton.edu/courses/archive/spring04/cos320/index.htm
- Mailing list
Books
- Modern Compiler Implementation in ML
  - Andrew Appel
  - required
- Elements of ML Programming
  - Jeffrey D. Ullman
- also online references; see Web site
Assignment 0
- Write your name and other information on the sheet circulating
- Find, skim and bookmark the course web pages
- Subscribe to course e-mail list
- Begin assignment 1
  - Read chapter 1 of Appel
  - Figure out how to run and use SML
  - Due next Thursday 12
onward!
What is a compiler?
- A compiler is a program that translates a program written in a source language into an equivalent program in a target language
What is a compiler?
C program:
    while (i > 3) { a[i] = b[i]; i++; }
compiler does this
assembly program:
    mov eax, ebx
    add eax, 1
    cmp eax, 3
    jcc eax, edx
What is a compiler?
Java program:
    class foo { int bar; ... }
compiler does this
C program:
    struct foo { int bar; ... }
What is a compiler?
Java program:
    class foo { int bar; ... }
compiler does this
Java virtual machine program:
    ...
What is a compiler?
LaTeX program:
    \newcommand ....
compiler does this
TeX program:
    \sfd\sf\fadg
What is a compiler?
TeX program:
    \newcommand ....
compiler does this
PostScript program:
    \sfd\sf\fadg
What is a compiler?
- Other places
  - Web scripts are compiled into HTML
  - assembly language is compiled into machine language
  - hardware description language is compiled into a hardware circuit
  - ...
Compilers are complex
- front-end: text file to abstract syntax
  - lexing, parsing
- middle-end: abstract syntax to intermediate form (IR)
  - analysis, optimizations, data layout
- back-end: IR to machine code
  - code generation, register allocation
Course project
- front-end: Tiger Source Language
  - simple imperative language
- middle-end: Instruction Trees as intermediate form (IR)
  - type checking, data layout on the stack
- back-end: Code Generation
  - instruction selection algorithms, register allocation via graph coloring
Standard ML
- Standard ML is a domain-specific language for building compilers
- Support for
  - Complex data structures (abstract syntax, compiler intermediate forms)
  - Automatic memory management (garbage collection), like Java
  - Large projects with many modules
  - Advanced type system for error detection
Introduction to ML
- You will be responsible for learning ML on your own.
- Today I will cover some basics
- Resources
  - Jeffrey Ullman, "Elements of ML Programming"
  - Robert Harper's "an introduction to ML"
  - See course webpage for pointers and info about how to get the software
Intro to ML
- Highlights
  - Data Structures for compilers
    - Data type definitions
    - Pattern matching
  - Strongly-typed language
    - Every expression has a type
    - Certain errors cannot occur
    - Polymorphic types provide flexibility
  - Flexible Module System
    - Abstract Types
    - Higher-order modules (functors)
Intro to ML
- Interactive Language
  - Type in expressions
  - Evaluate and print type and result
  - Compiler as well
- High-level programming features
  - Data types
  - Pattern matching
  - Exceptions
  - Mutable data discouraged
Preliminaries
- Read Eval Print Loop
    - 3 + 2;
    > 5 : int
    - it + 7;
    > 12 : int
    - it - 3;
    > 9 : int
    - 4 + true;
    stdIn:17.1-17.9 Error: operator and operand don't agree [literal]
      operator domain: int * int
      operand:         int * bool
      in expression:
        4 + true
Preliminaries
- Read Eval Print Loop
    - 3 div 0;
    Failure: Div    (run-time error)
Basic Values
    - ();
    > () : unit            => like "void" in C (sort of)
                           => the uninteresting value/type
    - true;
    > true : bool
    - false;
    > false : bool
    - if it then 32 else 7;          "else" clause is always necessary
    > 7 : int
    - false andalso loop_Forever;
    > false : bool         "andalso", "orelse": short-circuit eval
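The short-circuit behaviour can be checked with a small sketch. The function loopForever is a hypothetical stand-in for the slide's loop_Forever, and the names a, b and c are made up for illustration. If andalso or orelse evaluated both operands, loading this file would never finish.

```sml
(* loopForever is a hypothetical stand-in for loop_Forever:
   a call to it never returns. *)
fun loopForever () : bool = loopForever ()

(* andalso skips its right operand when the left one is false. *)
val a = false andalso loopForever ()

(* orelse skips its right operand when the left one is true. *)
val b = true orelse loopForever ()

(* if is an expression, so both branches must have the same type. *)
val c = if a then 32 else 7
```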
Basic Values
- Integers
    - 3 + 2;
    > 5 : int
    - 3 + (if not true then 5 else 7);
    > 10 : int             No division between expressions and statements
- Strings
    - "Dave Walker";
    > "Dave Walker" : string
    - print "foo\n";
    foo
    > () : unit
- Reals
    - 3.14;
    > 3.14 : real
Using SML/NJ
- Interactive mode is a good way to start learning and to debug programs, but
- Type in a series of declarations into a ".sml" file
    - use "foo.sml";
    [opening foo.sml]
    ...
    list of declarations with their types
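A minimal sketch of what such a file might contain. The file contents and the names greeting, double and ten are hypothetical, chosen only to illustrate a "series of declarations".

```sml
(* foo.sml: a hypothetical file of top-level declarations *)
val greeting = "hello"

(* The annotations are optional; ML would infer int -> int here. *)
fun double (n : int) : int = n + n

val ten = double 5
```

Loading the file with use "foo.sml"; at the prompt should then list greeting, double and ten together with their types.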
Larger Projects
- SML/NJ has its own built-in interactive "make" (the Compilation Manager)
- Pros
  - It automatically does the dependency analysis for you
  - No crazy makefile syntax to learn
- Cons
  - May be more difficult to interact with other languages or tools
Compilation Manager
Files in the project directory: sources.cm, c.sml, b.sml, a.sig
sources.cm contains:
    Group is
      a.sig
      b.sml
      c.sml
At the prompt:
    sml                        (started from the shell)
    - OS.FileSys.chDir "/courses/510/a2";
    - CM.make();               looks for "sources.cm", analyzes dependencies
    [compiling ...]            compiles files in group
    [wrote ...]                saves binaries in ./CM/
    - CM.make myproj/()        specify directory
What is next?
- ML has a rich set of structured values
  - Tuples: (17, true, "stuff")
  - Records: {name = "Dave", ssn = 332177}
  - Lists: 3::4::5::nil or [3,4]@[5]
  - Datatypes
  - Functions
  - And more!
- Rather than list all the details, we will write a couple of programs
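As a quick taste before the programs, the tuple, record and list values listed above can be written out as follows. This is only a sketch: the variable names and the helper sum are hypothetical, not part of the course material.

```sml
(* A tuple: components may have different types. *)
val t = (17, true, "stuff")            (* int * bool * string *)
val second = #2 t                      (* selects the second component *)

(* A record: like a tuple, but the fields are named. *)
val r = {name = "Dave", ssn = 332177}
val who = #name r

(* Lists: :: puts an element on the front, @ appends two lists. *)
val xs = 3 :: 4 :: 5 :: nil
val ys = [3, 4] @ [5]
val same = (xs = ys)

(* Pattern matching takes structured values apart again. *)
fun sum nil = 0
  | sum (x :: rest) = x + sum rest
val total = sum xs
```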
An interpreter
- Interpreters are usually implemented as a series of transformers
    stream of characters
      --(lexing/parsing)--> abstract syntax
      --(evaluate)--> abstract value
      --(print)--> stream of characters
A little language (LL)
- An arithmetic expression e is
  - a boolean value
  - an if statement (if e1 then e2 else e3)
  - an integer
  - an add operation
  - a test for zero (isZero e)
LL abstract syntax in ML
datatype term =
    Bool of bool
  | If of term * term * term
  | Num of int
  | Add of term * term
  | IsZero of term
-- constructors are capitalized
-- constructors can take a single argument of a particular type
-- "term * term * term" is the type of a tuple; another eg: string * char
-- the vertical bar separates alternatives
LL abstract syntax in ML
Add (Num 2, Num 3) represents the expression "2 + 3"
       Add
      /   \
    Num   Num
     |     |
     2     3
LL abstract syntax in ML
If (Bool true, Num 0, Add (Num 2, Num 3)) represents "if true then 0 else 2 + 3"
             If
          /   |   \
      Bool   Num   Add
       |      |    /  \
     true     0  Num  Num
                  |    |
                  2    3
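To see that such a tree is an ordinary ML value, here is a small sketch that builds the term drawn above and walks over it. The function nodeCount is a hypothetical helper, not something from the course code; it counts constructor nodes only.

```sml
datatype term =
    Bool of bool
  | If of term * term * term
  | Num of int
  | Add of term * term
  | IsZero of term

(* The term from the picture: if true then 0 else 2 + 3 *)
val example = If (Bool true, Num 0, Add (Num 2, Num 3))

(* nodeCount (hypothetical) counts the constructor nodes of a term.
   One clause per constructor, so the match is exhaustive. *)
fun nodeCount (Bool _) = 1
  | nodeCount (Num _) = 1
  | nodeCount (If (t1, t2, t3)) = 1 + nodeCount t1 + nodeCount t2 + nodeCount t3
  | nodeCount (Add (t1, t2)) = 1 + nodeCount t1 + nodeCount t2
  | nodeCount (IsZero t) = 1 + nodeCount t

val n = nodeCount example
```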
Function declarations
fun isValue t =          (* isValue: function name; t: function parameter *)
  case t of
    Num n => true
  | Bool b => true
  | _ => false           (* default pattern matches anything *)
What is the type of the parameter t? Of the function?
fun isValue (t:term) : bool =
  case t of
    Num n => true
  | Bool b => true
  | _ => false
val isValue : term -> bool
ML does type inference => you need not annotate functions yourself (but it can be helpful)
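A small sketch of inference at work, outside the interpreter example. The names twice, inc and incAnnotated are hypothetical; the comments give the types one would expect the compiler to report.

```sml
(* No annotations: the type is inferred from how n is used. *)
fun inc n = n + 1                       (* int -> int *)

(* The same function with explicit, optional annotations. *)
fun incAnnotated (n : int) : int = n + 1

(* Nothing constrains the argument type, so the result is polymorphic. *)
fun twice f x = f (f x)                 (* ('a -> 'a) -> 'a -> 'a *)

val four = twice inc 2
val shout = twice (fn s => s ^ "!") "hi"
```

Annotations do not change what the function does; they mainly document intent and can make type errors show up closer to their cause.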
A type error
fun isValue t =
  case t of
    Num _ => 1
  | _ => false
ex.sml:22.3-24.15 Error: types of rules don't agree [literal]
  earlier rule(s): term -> int
  this rule: term -> bool
  in rule:
    _ => false
A type error
Actually, ML will give you several errors in a row:
ex.sml:22.3-25.15 Error: types of rules don't agree [literal]
  earlier rule(s): term -> int
  this rule: term -> bool
  in rule:
    Successor t2 => true
ex.sml:22.3-25.15 Error: types of rules don't agree [literal]
  earlier rule(s): term -> int
  this rule: term -> bool
  in rule:
    _ => false
A very subtle error
fun isValue t =
  case t of
    num => true
  | _ => false
The code above type checks. But when we test it, we find the function always returns true. What has gone wrong?
-- num is not capitalized (and has no argument)
-- ML treats it like a variable pattern (matches anything!)
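The difference between a variable pattern and a constructor pattern can be seen side by side in the sketch below. The function names isNumBuggy and isNum are hypothetical. The buggy version deliberately has a single rule, since a compiler may report a second rule that can never match as redundant.

```sml
datatype term =
    Bool of bool
  | If of term * term * term
  | Num of int
  | Add of term * term
  | IsZero of term

(* Buggy: "num" is lower case, so it is a fresh variable
   that matches every term. *)
fun isNumBuggy t =
  case t of
    num => true

(* Intended: the constructor pattern Num _ matches integers only. *)
fun isNum t =
  case t of
    Num _ => true
  | _ => false

val r1 = isNumBuggy (Bool false)   (* the variable pattern matched a Bool *)
val r2 = isNum (Bool false)
val r3 = isNum (Num 7)
```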
Exceptions
exception Error of string
fun debug s : unit = raise (Error s)
in SML interpreter:
    - debug "hello";
    uncaught exception Error
      raised at: ex.sml:15.28-15.35
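An exception does not have to reach the top level. The sketch below uses the standard handle expression to catch Error and read the string it carries. The wrapper safeDebug is a hypothetical name introduced only for this illustration.

```sml
exception Error of string

(* As on the slide: debug always raises Error. *)
fun debug s : unit = raise (Error s)

(* safeDebug (hypothetical) catches Error and returns the message
   carried by the exception instead of aborting. *)
fun safeDebug s =
  (debug s; "no error")
  handle Error msg => "caught: " ^ msg

val result = safeDebug "hello"
```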
Evaluator
fun isValue t = ...
exception NoRule
fun eval t =
  case t of
    (Bool _ | Num _) => t
  | If (t1,t2,t3) =>
      let val v = eval t1 in      (* let statement for remembering temporary results *)
        case v of
          Bool b => if b then (eval t2) else (eval t3)
        | _ => raise NoRule
      end
  ...
Evaluator
exception NoRule
fun eval1 t =
  case t of
    (Bool _ | Num _) => ...
  | ...
  | Add (t1,t2) =>
      case (eval1 t1, eval1 t2) of
        (Num n1, Num n2) => Num (n1 + n2)
      | (_,_) => raise NoRule
Finishing the Evaluator
fun eval1 t =
  case t of
    ...
  | ...
  | Add (t1,t2) => ...
  | IsZero t => ...
be sure your case is exhaustive
Finishing the Evaluator
fun eval1 t =
  case t of
    ...
  | ...
  | Add (t1,t2) => ...
What if we forgot a case?
ex.sml:25.2-35.12 Warning: match nonexhaustive
          (Bool _ | Zero) => ...
          If (t1,t2,t3) => ...
          Add (t1,t2) => ...
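For reference, here is one possible way to fill in the elided pieces so that the match is exhaustive. This is a sketch under assumptions, not the course's official solution: it assumes IsZero yields Bool true exactly when its argument evaluates to Num 0, and it writes Bool and Num as separate rules to stay within plain Standard ML. The nested case expressions are parenthesized so that they do not swallow the rules that follow them.

```sml
datatype term =
    Bool of bool
  | If of term * term * term
  | Num of int
  | Add of term * term
  | IsZero of term

exception NoRule

(* eval reduces a term to a value (a Bool or a Num) and raises NoRule
   when no rule applies, e.g. when adding a boolean. *)
fun eval t =
  case t of
    Bool _ => t
  | Num _ => t
  | If (t1, t2, t3) =>
      (case eval t1 of
         Bool b => if b then eval t2 else eval t3
       | _ => raise NoRule)
  | Add (t1, t2) =>
      (case (eval t1, eval t2) of
         (Num n1, Num n2) => Num (n1 + n2)
       | _ => raise NoRule)
  | IsZero t1 =>                      (* assumed semantics, see above *)
      (case eval t1 of
         Num n => Bool (n = 0)
       | _ => raise NoRule)

val v1 = eval (If (IsZero (Num 0), Add (Num 2, Num 3), Num 7))
val v2 = (eval (Add (Bool true, Num 1)); "no error") handle NoRule => "NoRule"
```

Because every constructor of term has a rule, the compiler has no missing case to warn about.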