Title: CS 4240
1CS 4240
(many slides from Bodik and August)
2About Me
- Prof. Nate Clark
- But just call me Nate
- I swear a lot
- 2nd year here at Ga. Tech. (be gentle)
- Hardware guy who likes compilers
- Dynamic program optimization
- Before this: grad student at UMich
- IBM, ARM, HP
- Before that: undergrad at UMich
3About Class
- http://www.cc.gatech.edu/ntclark/4240f08/
- "Give a man fire and he's warm for a day; set a man on fire and he's warm for the rest of his life"
- Background
4Planned topics
- Lexing/parsing
- Symbol tables
- Error detection/recovery
- Semantic analysis
- IR generation
- OO languages
- Some optimizations
5Classy Class Class
- Book
- Compilers: Principles, Techniques, and Tools, by Aho, Lam, Sethi, and Ullman
6Grades
- Project X
- 2 Exams Y
- Cheating
- "Winnowing: Local Algorithms for Document Fingerprinting" by Schleimer, Wilkerson, and Aiken
7What is a Compiler?
8The Stack
- Almost 100% of application code goes through a compiler
- 99.5% of the Linux OS kernel
9- the structure of a compiler
10Three execution environments
- Interpreters
- Scheme, Lisp, Perl, Python
- the more popular of these later got compilers
- Compilers
- C/C++ to machine code
- Java to Java bytecode
- Virtual machines
- interpreter, often aided by a just-in-time (JIT) compiler
- Java bytecode runs on a VM
11The Structure of a Compiler
- Scanning (Lexical Analysis)
- Parsing (Syntactic Analysis)
- Type checking (Semantic Analysis)
- Optimization
- Code Generation
- The first 3, at least, can be understood by
analogy to how humans comprehend English.
12Lexical Analysis
- Lexical analyzer divides program text into words or tokens
- if x == y then z = 1; else z = 2;
- Units:
- if, x, ==, y, then, z, =, 1, ;, else, z, =, 2, ;
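A minimal sketch of a lexical analyzer for the statement above, written in Python. The token categories (KEYWORD, IDENT, NUMBER, OP) and the function name tokenize are assumptions made for illustration, not part of the course material.

```python
import re

# Token patterns, tried in order. Keywords come before identifiers, and
# "==" comes before "=" so the two-character operator is not split.
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:if|then|else)\b"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("NUMBER",  r"\d+"),
    ("OP",      r"==|=|;"),
    ("SKIP",    r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

tokens = tokenize("if x == y then z = 1; else z = 2;")
print([lexeme for _, lexeme in tokens])
# ['if', 'x', '==', 'y', 'then', 'z', '=', '1', ';', 'else', 'z', '=', '2', ';']
```

The printed list matches the units on the slide: 14 tokens, with whitespace discarded.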
13Parsing
- Once words are understood, the next step is to understand sentence structure
- Parsing = Diagramming Sentences
- The diagram is a tree
14Diagramming a Sentence
15Parsing Programs
- Parsing program expressions is the same
- Consider:
- if x == y then z = 1; else z = 2;
- Diagrammed:
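One way to picture the "diagram" is as a nested tuple built by a small recursive-descent parser. The sketch below is in Python and assumes a tiny made-up grammar (one if-then-else, an == test, two assignments); the function name parse_if and the tuple layout are illustrative only.

```python
import re

def parse_if(text):
    """Parse 'if a == b then x = 1; else x = 2;' into a nested tuple tree."""
    tokens = re.findall(r"==|=|;|\w+", text)
    pos = 0

    def expect(expected=None):
        # Consume one token, optionally checking that it is the expected one.
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        if expected is not None and token != expected:
            raise SyntaxError(f"expected {expected!r}, got {token!r}")
        pos += 1
        return token

    def relation():
        left = expect()
        expect("==")
        right = expect()
        return ("==", left, right)

    def assignment():
        target = expect()
        expect("=")
        value = expect()
        expect(";")
        return ("=", target, value)

    expect("if")
    condition = relation()
    expect("then")
    then_branch = assignment()
    expect("else")
    else_branch = assignment()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ("if-then-else", condition, then_branch, else_branch)

print(parse_if("if x == y then z = 1; else z = 2;"))
# ('if-then-else', ('==', 'x', 'y'), ('=', 'z', '1'), ('=', 'z', '2'))
```

The root of the tree is the if-then-else node; its three children are the relation and the two assignments.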
16Semantic Analysis in English
- Example
- Jack said Jerry left his assignment at home.
- What does "his" refer to? Jack or Jerry?
- Even worse
- Jack said Jack left his assignment at home?
- How many Jacks are there?
- Which one left the assignment?
17Semantic Analysis I
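- Programming languages define strict rules to avoid such ambiguities
- This Java-style code prints 4: the inner definition is used
- {
-   int Jack = 3;
-   {
-     int Jack = 4;
-     System.out.print(Jack);
-   }
- }
- (strictly, Java itself rejects redeclaring a local variable in a nested block; C and C++ accept the equivalent code)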
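The "inner definition is used" rule is commonly implemented with a chain of symbol tables, one per block. The Python sketch below shows the idea; the class name Scope and its methods are assumptions for illustration, not a description of any particular compiler.

```python
class Scope:
    """One block's symbol table, linked to the enclosing block's table."""

    def __init__(self, parent=None):
        self.symbols = {}
        self.parent = parent

    def declare(self, name, value):
        self.symbols[name] = value

    def lookup(self, name):
        # Search the innermost scope first, then walk outward.
        scope = self
        while scope is not None:
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        raise NameError(f"{name} is not declared")

outer = Scope()
outer.declare("Jack", 3)
inner = Scope(parent=outer)
inner.declare("Jack", 4)

print(inner.lookup("Jack"))  # 4: the inner definition is found first
print(outer.lookup("Jack"))  # 3: the outer block never sees the inner one
```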
18Semantic Analysis II
- Compilers also perform checks to find bugs
- Example
- Jack left her homework at home.
- A "type mismatch" between "her" and "Jack"
- we know they are different people (presumably Jack is male)
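The programming-language analogue of the "her"/"Jack" mismatch is assigning a value of one type to a variable declared with another. A toy checker in Python, assuming only two types (int and String) and hypothetical helper names literal_type and check_assignments:

```python
def literal_type(literal):
    """Classify a literal as 'String' (quoted) or 'int' in this toy language."""
    return "String" if literal.startswith('"') else "int"

def check_assignments(declarations, assignments):
    """Return a list of error messages; an empty list means no mismatch."""
    errors = []
    for name, literal in assignments:
        if name not in declarations:
            errors.append(f"{name}: undeclared")
            continue
        declared = declarations[name]
        actual = literal_type(literal)
        if declared != actual:
            errors.append(f"{name}: declared {declared}, assigned {actual}")
    return errors

declarations = {"count": "int", "name": "String"}
assignments = [("count", "42"), ("name", '"Jack"'), ("count", '"her"')]
print(check_assignments(declarations, assignments))
# ['count: declared int, assigned String']
```

The check runs before the program does, which is the point of the slide: the bug is reported at compile time.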
19Code Generation
- A translation into another language
- Analogous to human translation
- Compilers for C, C++
- produce assembly code (typically)
- Code generators
- produce C or Java
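As a rough illustration of translation into another language, the Python sketch below turns an arithmetic expression into instructions for a made-up stack machine. The instruction names (LOAD, PUSH, ADD, SUB, MUL) are hypothetical; Python's own ast module is borrowed only to avoid writing a parser here.

```python
import ast

# Hypothetical stack-machine opcodes for the supported operators.
OPCODES = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}

def generate(expression):
    """Translate an arithmetic expression into stack-machine instructions."""
    code = []

    def emit(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPCODES:
            emit(node.left)    # operands first ...
            emit(node.right)
            code.append(OPCODES[type(node.op)])  # ... then the operator
        elif isinstance(node, ast.Name):
            code.append(f"LOAD {node.id}")
        elif isinstance(node, ast.Constant) and isinstance(node.value, int):
            code.append(f"PUSH {node.value}")
        else:
            raise ValueError(f"unsupported construct: {ast.dump(node)}")

    emit(ast.parse(expression, mode="eval").body)
    return code

print(generate("a + b * 2"))
# ['LOAD a', 'LOAD b', 'PUSH 2', 'MUL', 'ADD']
```

The multiplication is emitted before the addition because the tree already encodes operator precedence; the generator only walks it.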
20- New Languages will Keep Coming
21A survey: how many languages have you used?
22Be prepared to program in new languages
- Languages undergo constant change
- FORTRAN 1953
- ALGOL 60 1960
- C 1973
- C++ 1985
- Java 1995
- Evolution steps: about 12 years per big language
- are we overdue for the next big one?
- ... or is the future already here?
- are we in a major change in what programs
express, and how?
23Develop your own language
- Are you kidding? No. Guess who developed:
- PHP
- Ruby
- JavaScript
- Perl
- Done by smart hackers like you
- say these were done in a garage
- not in an ivory tower
24- Trends in Programming Languages
25Trends in programming languages
- the programming language and its compiler
- the programmer's primary tools
- you must know them inside out
- languages have been constantly evolving ...
- what are the forces driving the change?
- ... and will keep doing so
- to predict the future, let's examine the history
26ENIAC (1946, University of Pennsylvania)
ENIAC program for external ballistic equations
27Programming the ENIAC
28ENIAC (1946, U of Pennsylvania)
- programming done by
- rewiring the interconnections
- to set up desired formulas, etc
- Problem: what's the tedious part?
- programming = rewiring
- slow, error-prone
- Lesson:
- store the program in memory!
- birth of the von Neumann paradigm
29EDSAC (1947, Cambridge University)
- the first real computer
- large-scale, fully functional, stored-program electronic digital computer (by Maurice Wilkes)
- this was a von Neumann computer
- problem: Wilkes realized
- "a good part of the remainder of my life was going to be spent in finding errors in ... programs"
- solution: procedures (1951)
- reusable software was born
- procedure: the first (implemented) language construct
30Assembly: the language (UNIVAC I, 1950)
- Idea: translate mnemonic code (assembly) by hand
- by hand: they did not have a compiler yet
- write programs with mnemonic codes (add, sub), with symbolic labels,
- then assign addresses by hand
- Example of symbolic assembler
- clear-and-add a
- add b
- store c
- translate it by hand to something like this (understood by CPU)
- B100 A200 C300
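The hand translation above is mechanical, which is why it was soon automated. A minimal Python sketch, assuming the opcode letters B, A, C correspond to clear-and-add, add, store in that order and that a, b, c live at addresses 100, 200, 300 (both inferred from the slide's example, not from UNIVAC documentation):

```python
# Assumed mapping from mnemonic to one-letter opcode.
OPCODES = {"clear-and-add": "B", "add": "A", "store": "C"}

def assemble(lines, addresses):
    """Replace each mnemonic by its opcode and each label by its address."""
    words = []
    for line in lines:
        mnemonic, label = line.split()
        words.append(f"{OPCODES[mnemonic]}{addresses[label]}")
    return " ".join(words)

program = ["clear-and-add a", "add b", "store c"]
addresses = {"a": 100, "b": 200, "c": 300}  # assigned by hand in 1950
print(assemble(program, addresses))
# B100 A200 C300
```

The two dictionaries are the whole trick: one table for operations, one symbol table for labels.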
31Assembler: the compiler (Manchester, 1952)
- it was assembler nearly as we know it, called AutoCode
- a loop example, in MIPS, a modern-day assembly code
- loop: addi $t3, $t0, -8
-       addi $t4, $t0, -4
-       lw   $t1, theArray($t3)   # Gets the last
-       lw   $t2, theArray($t4)   # two elements
-       add  $t5, $t1, $t2        # Adds them together...
-       sw   $t5, theArray($t0)   # ...and stores result
-       addi $t0, $t0, 4          # Moves to next "element" of theArray
-       blt  $t0, 160, loop       # If not past the end of theArray, repeat
-       jr   $ra
32Assembly programming caught on, but
- Problem: software costs exceeded hardware costs!
- John Backus: Speedcoding
- An interpreter for a high-level language
- Ran 10-20 times slower than hand-written assembly
- way too slow
33FORTRAN I (1954-57)
- The first HLL compiler
- Produced code almost as good as hand-written
- Huge impact on computer science
- Modern compilers preserve its outlines
- FORTRAN (the language) still in use today
- By 1958, >50% of all software is in FORTRAN
- Cut development time dramatically
- 2 weeks -> 2 hrs
- that's more than 100-fold
34FORTRAN I (IBM, John Backus, 1954)
- Example: nested loops in FORTRAN
- a big improvement over assembler,
- but annoying artifacts of assembly remain
- labels and rather explicit jumps (CONTINUE)
- lexical columns: the statement must start in column 7
- The MIPS loop in FORTRAN
- DO 10 I = 2, 40
- A(I) = A(I-1) + A(I-2)
- 10 CONTINUE
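For comparison, the same loop in a present-day language needs no labels and no CONTINUE. A Python sketch; the function name fill and the seed values 0 and 1 are assumptions, since the slide does not say how the array starts.

```python
def fill(a, last=40):
    """Set a[i] = a[i-1] + a[i-2] for i = 2..last, as the FORTRAN loop does."""
    for i in range(2, last + 1):
        a[i] = a[i - 1] + a[i - 2]
    return a

a = [0] * 41   # indices 0..40
a[1] = 1       # assumed seeds: a[0] = 0, a[1] = 1
fill(a)
print(a[10])   # 55
```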
35Side note: designing a good language is hard
- A good language protects against bugs, but lessons take a while.
- An example often said to have caused the failure of a NASA planetary probe
- buggy line:
- DO 15 I = 1.100
- what was intended (a dot had replaced the comma):
- DO 15 I = 1,100
- because Fortran ignores spaces, the compiler read this as
- DO15I = 1.100
- which is an assignment into a variable DO15I, not a loop.
- This mistake is harder (if at all possible) to make with the modern lexical rules (white space not ignored) and loop syntax
- for (i = 1; i < 100; i++)
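To see why ignoring blanks is dangerous, the Python sketch below classifies a statement the way a blank-ignoring lexer would. It handles only the two statement shapes from the slide, written with an explicit "=" sign; the function name classify and the regular expressions are illustrative assumptions, not real FORTRAN lexical rules in full.

```python
import re

def classify(statement):
    """Classify a statement after deleting all blanks, as early FORTRAN did."""
    squeezed = statement.replace(" ", "")
    # A DO loop needs: DO <label> <variable> = <start> , <end>
    loop = re.fullmatch(r"DO(\d+)([A-Z][A-Z0-9]*)=(\d+),(\d+)", squeezed)
    if loop:
        _label, var, start, end = loop.groups()
        return ("loop", var, int(start), int(end))
    # Otherwise try: <variable> = <number>
    assign = re.fullmatch(r"([A-Z][A-Z0-9]*)=(\d+(?:\.\d+)?)", squeezed)
    if assign:
        return ("assignment", assign.group(1), float(assign.group(2)))
    raise SyntaxError(statement)

print(classify("DO 15 I = 1,100"))  # ('loop', 'I', 1, 100)
print(classify("DO 15 I = 1.100"))  # ('assignment', 'DO15I', 1.1)
```

One character changes the statement from a 100-iteration loop into a single assignment, and neither form is rejected.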
36Goto considered harmful
- L1: statement
- if expression goto L1
- statement
- Dijkstra says gotos are harmful
- use structured programming
- lose some performance, gain a lot of readability
- how do you rewrite the above code into structured
form?
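One possible answer: the backward goto is a loop whose body runs at least once (a do-while). Python has no do-while, so the sketch below uses while True with a break; the function and parameter names are made up for illustration.

```python
def run_structured(body, condition, after):
    """Structured form of:  L1: body; if condition goto L1; after."""
    while True:
        body()               # L1: statement
        if not condition():  # if expression goto L1
            break
    after()                  # statement

log = []
state = {"n": 0}

def body():
    state["n"] += 1
    log.append(f"body {state['n']}")

run_structured(body, lambda: state["n"] < 3, lambda: log.append("after"))
print(log)
# ['body 1', 'body 2', 'body 3', 'after']
```

In C or Java the same shape is written directly as do { statement } while (expression);.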
37Object-oriented programming (1970s)
- The need to express that more than one object supports draw()
- draw(2DElement p) {
-   switch (p.type) {
-     case SQUARE: // draw a square
-       break;
-     case CIRCLE: // draw a circle
-       break;
-   }
- }
- Problem
- unrelated code (drawing of SQUARE and CIRCLE) mixed in same procedure
- Solution
- Object-oriented programming with inheritance
38Object-oriented programming
- In Java, the same code has the desired separation
- class Circle extends 2DElement {
-   void draw() { <draw circle> }
- }
- // similar for Square
- the dispatch is now much simpler
- p.draw()
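A runnable version of the same idea, sketched in Python. The base class is called Element2D here because an identifier cannot start with a digit; returning a string instead of drawing is a simplification for the example.

```python
class Element2D:
    """Base class: every 2D element knows how to draw itself."""

    def draw(self):
        raise NotImplementedError

class Square(Element2D):
    def draw(self):
        return "square"   # stands in for the square-drawing code

class Circle(Element2D):
    def draw(self):
        return "circle"   # stands in for the circle-drawing code

def draw_all(elements):
    # No switch on a type tag: each object picks its own draw().
    return [p.draw() for p in elements]

print(draw_all([Square(), Circle(), Square()]))
# ['square', 'circle', 'square']
```

Adding a Triangle class requires no change to draw_all, whereas the switch version would need a new case.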
39Review of historic development
- wired interconnects -> stored program (von Neumann machines)
- lots of common bugs in repeated code -> procedures
- machine code -> symbolic assembly (compiled by hand)
- tedious compilation -> assembler (the assembly compiler)
- assembly -> FORTRAN I
- gotos -> structured programming
- hardcoded "OO" programming -> inheritance, virtual calls
- Do you see a trend?
40Where will languages go from here?
- The trend is towards higher-level abstractions
- express the algorithm concisely!
- which means hiding often repeated code fragments
- new language constructs hide more of these low-level details.
- Also, detect more bugs when the program is compiled
- with stricter type checking
41Why software developers need PL
- New languages will keep coming. But why do they?
- Understand them, choose the right one.
- Write code that writes code. When to write a codegen?
- Be the wizard, not the typist.
- Develop your own language. How do I do that?
- Are you kidding? No.
- Learn about compilers and interpreters.
- Programmer's main tools.
- One answer to these questions: (to) hide the plumbing.