Title: Credible Compilation With Pointers
1. Credible Compilation With Pointers
- Martin Rinard and Darko Marinov
- Laboratory for Computer Science
- Massachusetts Institute of Technology
2. Goal
- Compiler translates Source Code (C, Java) into Object Code
- Compiler also produces a Proof that Object Code Implements Source Code
- Proof Checker checks the proof and answers Yes or No
3. Proposed Approach
Source → Parser → Internal Representation → Analysis/Transformation (with Equivalence Proof) → Internal Representation → Analysis/Transformation (with Equivalence Proof) → Internal Representation → Simple Code Generator → Binary Code
4. Key Aspects
- Majority of compiler structured as a sequence of transformations on a standard intermediate format
- Compiler generates a proof for each transformation
- Separate verification frameworks for
- Parser
- Transformations
- Code generator
- Today: Proving Transformations Correct
5. Overview of Framework
- Compiler operates on compilation units
- procedure, loop nest, method
- Set of externally visible variables
- global variables, instance variables
- Correctness condition
- transformation preserves final values of externally visible variables
- original and transformed programs terminate under same conditions
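The correctness condition can be spot-checked (not proved) on concrete runs. The sketch below assumes, hypothetically, that a program is modeled as a Python function from an initial state to its final state, returning `None` when it does not terminate; `check_condition`, `original`, and `transformed` are illustrative names, and the two programs mirror the kind of example the talk uses later (a loop over `i` versus a loop over `g`).

```python
from typing import Callable, Dict, Iterable, Optional

State = Dict[str, int]
# Hypothetical model: a program maps an initial state to its final state,
# or to None if it did not terminate (e.g. it exhausted a step budget).
Program = Callable[[State], Optional[State]]


def check_condition(original: Program, transformed: Program,
                    inputs: Iterable[State], visible: Iterable[str]) -> bool:
    """Test (not prove) the correctness condition on a finite set of inputs."""
    visible = list(visible)
    for init in inputs:
        before, after = original(dict(init)), transformed(dict(init))
        # Both programs must terminate under the same conditions.
        if (before is None) != (after is None):
            return False
        # Final values of externally visible variables must agree.
        if before is not None and after is not None:
            if any(before[v] != after[v] for v in visible):
                return False
    return True


def original(s: State) -> Optional[State]:
    s["i"] = 0
    while True:
        s["i"] += 3
        s["g"] = 2 * s["i"]
        if not s["i"] < 24:
            return s


def transformed(s: State) -> Optional[State]:
    s["g"] = 0
    while True:
        s["g"] += 6
        if not s["g"] < 48:
            return s


print(check_condition(original, transformed, [{}], ["g"]))  # True
```

Only `g` is compared: `i` is not externally visible, so the transformed program is free to drop it.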
6. Example
Externally visible variable g
7. Structure of a Transformation
- Analysis to discover program properties
- reaching definitions
- available expressions
- Program transformation
- constant and copy propagation
- common subexpression elimination
- Correctness of transformation often depends on
correctness of analysis result
8. Two-Stage Proof Structure
- Prove analysis results correct
- Classic Floyd approach for proving program properties
- Use analysis results to prove simulation relations between original and transformed programs
- State equality of expressions at corresponding program points under certain execution conditions
9. Standard and Simulation Invariants
- Invariants for program analysis results
- $\langle c \rangle_p$: the condition $c$ is always true at the program point $p$
- Simulation invariants for transformations
- $\langle c_1, e_1 \rangle_{p_1} \triangleright \langle c_2, e_2 \rangle_{p_2}$
- For all executions of the transformed program that reach the program point $p_2$ with the condition $c_2$ true, there exists an execution of the original program that reaches $p_1$ with $c_1$ true such that $e_1 = e_2$
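The definition can be read operationally. The sketch below tests a simulation invariant against one recorded trace of each program, assuming (hypothetically) that program points are labels, conditions and expressions are Python callables over a state dictionary, and the programs are deterministic, so one recorded execution stands in for "an execution". `SimulationInvariant` and `holds_on` are illustrative names; a passing check is evidence, not the proof the framework requires.

```python
from collections.abc import Hashable
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Trace = List[Tuple[Hashable, State]]  # (program point, state on arrival)


@dataclass
class SimulationInvariant:
    """<c1, e1>_p1 |> <c2, e2>_p2; side 1 is the original, side 2 the transformed."""
    p1: Hashable
    c1: Callable[[State], bool]
    e1: Callable[[State], int]
    p2: Hashable
    c2: Callable[[State], bool]
    e2: Callable[[State], int]

    def holds_on(self, original: Trace, transformed: Trace) -> bool:
        # Values of e1 wherever the original reaches p1 with c1 true.
        witnesses = [self.e1(s) for p, s in original if p == self.p1 and self.c1(s)]
        # Every visit of the transformed program to p2 with c2 true must be
        # matched by some such visit with an equal value (e1 = e2).
        return all(self.e2(s) in witnesses
                   for p, s in transformed if p == self.p2 and self.c2(s))


# Hand-written trace prefixes for <true, 2*i>_2 |> <true, g>_b.
orig = [(1, {}), (2, {"i": 0}), (3, {"i": 3}), (4, {"i": 3, "g": 6}),
        (2, {"i": 3, "g": 6})]
trans = [("a", {}), ("b", {"g": 0}), ("c", {"g": 6}), ("b", {"g": 6})]
inv = SimulationInvariant(2, lambda s: True, lambda s: 2 * s["i"],
                          "b", lambda s: True, lambda s: s["g"])
print(inv.holds_on(orig, trans))  # True
```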
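10. Example
Original program (figure; the transformed program's nodes a, b, c, d are drawn alongside):
- 1: $i \leftarrow 0$
- 2: $i \leftarrow i + 3$
- 3: $g \leftarrow 2i$
- 4: branch on $i < 24$ (true: back to 2; false: to 5)
- 5: exit
Standard invariant: $\langle g = 2i \rangle_4$
Simulation invariants (an omitted condition means true):
- $\langle \mathit{true}, 2i \rangle_2 \triangleright \langle \mathit{true}, g \rangle_b$
- $\langle 2i \rangle_4 \triangleright \langle g \rangle_c$
- $\langle g \rangle_5 \triangleright \langle g \rangle_d$ (Correctness Condition)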
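To make the example concrete, the sketch below encodes both control flow graphs, runs them, and records the state on arrival at each node. The transformed program's statements are not legible in this copy of the slide; the encoding assumes, as the later proof slides suggest, that it is a: $g \leftarrow 0$, b: $g \leftarrow g + 6$, c: branch on $g < 48$ (true edge back to b), d: exit. The node-tuple format, `run`, and `values_at` are hypothetical. Running the programs only tests the invariants on one execution; it does not prove them.

```python
# Hypothetical encoding. Node kinds:
#   ("assign", var, expr, succ) | ("branch", cond, true_succ, false_succ) | ("exit",)
ORIGINAL = {
    1: ("assign", "i", "0", 2),
    2: ("assign", "i", "i + 3", 3),
    3: ("assign", "g", "2 * i", 4),
    4: ("branch", "i < 24", 2, 5),
    5: ("exit",),
}
# Assumed reconstruction of the transformed program (see lead-in).
TRANSFORMED = {
    "a": ("assign", "g", "0", "b"),
    "b": ("assign", "g", "g + 6", "c"),
    "c": ("branch", "g < 48", "b", "d"),
    "d": ("exit",),
}


def run(cfg, entry, max_steps=10_000):
    """Execute a CFG; return [(node, state on arrival at node), ...]."""
    state, node, trace = {}, entry, []
    for _ in range(max_steps):
        trace.append((node, dict(state)))
        kind, *rest = cfg[node]
        if kind == "exit":
            return trace
        if kind == "assign":
            var, expr, node = rest
            state[var] = eval(expr, {}, state)  # trusted, hand-written expressions only
        else:
            cond, if_true, if_false = rest
            node = if_true if eval(cond, {}, state) else if_false
    raise RuntimeError("step budget exhausted")


def values_at(trace, point, expr, cond="True"):
    """Values of expr at each visit to point where cond holds."""
    return [eval(expr, {}, s) for p, s in trace if p == point and eval(cond, {}, s)]


orig, trans = run(ORIGINAL, 1), run(TRANSFORMED, "a")
# Standard invariant <g = 2*i>_4.
assert all(values_at(orig, 4, "g == 2 * i"))
# Simulation invariants; in this example the paired visits line up one-to-one,
# so comparing value sequences is a (stronger) sufficient check.
assert values_at(orig, 2, "2 * i") == values_at(trans, "b", "g")
assert values_at(orig, 4, "2 * i") == values_at(trans, "c", "g")
# Correctness condition <g>_5 |> <g>_d.
assert values_at(orig, 5, "g") == values_at(trans, "d", "g") == [48]
```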
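11. Proving Standard Invariants
- Proof rules propagate invariants backwards through control flow graph
- Substitution at assignment statements
- Add branch condition at conditional branches
- Propagate along all edges at join points
Example: propagating $\langle g = 2i \rangle_4$ backwards through node 3 ($g \leftarrow 2i$) substitutes $2i$ for $g$ and yields $\langle 2i = 2i \rangle_3$ (node 4 is the test $i < 24$).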
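A minimal sketch of the two basic rules, assuming formulas are written as Python expressions (`==` for equality, `*` for multiplication) and manipulated with the standard `ast` module (Python 3.9+ for `ast.unparse`). `substitute` and `add_branch_condition` are hypothetical helpers; reading "add condition" as an implication is one standard interpretation, not necessarily the exact rule from the talk.

```python
import ast
import copy


def substitute(formula: str, var: str, expr: str) -> str:
    """Return formula[var := expr]: backward propagation over `var <- expr`."""
    replacement = ast.parse(expr, mode="eval").body

    class Subst(ast.NodeTransformer):
        def visit_Name(self, node: ast.Name) -> ast.AST:
            # Fresh copy per occurrence so the tree never shares nodes.
            return copy.deepcopy(replacement) if node.id == var else node

    tree = Subst().visit(ast.parse(formula, mode="eval"))
    return ast.unparse(tree.body)  # inserts parentheses where precedence needs them


def add_branch_condition(formula: str, cond: str) -> str:
    """Backward propagation over a branch edge that is taken when cond is true."""
    return f"not ({cond}) or ({formula})"


# Slide example: <g = 2*i>_4 pushed back over node 3 (g <- 2*i).
print(substitute("g == 2 * i", "g", "2 * i"))  # 2 * i == 2 * i
```

Working on syntax trees rather than raw strings matters once expressions nest: substituting `a + b` for `g` in `2 * g` must give `2 * (a + b)`, not `2 * a + b`. At a join point, the same invariant is pushed back along every incoming edge and each result must be proved.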
12. Proving Simulation Invariants
- Each proof rule propagates one of the two sides of an invariant $\langle c_1, e_1 \rangle_{p_1} \triangleright \langle c_2, e_2 \rangle_{p_2}$
- Right side propagated in transformed program
- Propagate along all edges at join points
- Left side propagated in original program
- Propagate along one edge at join points
- Can use other invariants (both standard and simulation) to prove propagated invariant
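13. Simulation Invariant Proof Example
Given $\langle 2i \rangle_2 \triangleright \langle g \rangle_b$, $\langle 2i \rangle_4 \triangleright \langle g \rangle_c$, $\langle g \rangle_5 \triangleright \langle g \rangle_d$ and $\langle g = 2i \rangle_4$
(Figure: original program nodes 1 to 5 alongside transformed program nodes a to d.)
Prove $\langle 2i \rangle_2 \triangleright \langle g \rangle_b$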
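14. Propagate Right Hand Side
- To prove
- $\langle 2i \rangle_2 \triangleright \langle g \rangle_b$
- Propagate RHS to a and c (the predecessors of b)
- Must prove
- $\langle 2i \rangle_2 \triangleright \langle 0 \rangle_a$
- $\langle 2i \rangle_2 \triangleright \langle g < 48,\ g \rangle_c$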
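The right-hand-side rule of slide 12 can be sketched as a function that pushes $\langle c_2, e_2 \rangle$ back along every edge entering a node of the transformed program. This assumes the same hypothetical node encoding and reconstructed transformed program (a: $g \leftarrow 0$, b: $g \leftarrow g + 6$, c: branch on $g < 48$, d: exit) as a sketch, not the talk's implementation; `propagate_rhs` and `conjoin` are illustrative names. Each returned triple `(node, c, e)` stands for a new obligation $\langle 2i \rangle_2 \triangleright \langle c, e \rangle_{node}$, with the left side unchanged.

```python
import ast
import copy

# Transformed program of the running example, as reconstructed (assumption).
TRANSFORMED = {
    "a": ("assign", "g", "0", "b"),
    "b": ("assign", "g", "g + 6", "c"),
    "c": ("branch", "g < 48", "b", "d"),
    "d": ("exit",),
}


def substitute(text: str, var: str, expr: str) -> str:
    """Return text[var := expr] (uses ast.unparse, Python 3.9+)."""
    replacement = ast.parse(expr, mode="eval").body

    class Subst(ast.NodeTransformer):
        def visit_Name(self, node):
            return copy.deepcopy(replacement) if node.id == var else node

    return ast.unparse(Subst().visit(ast.parse(text, mode="eval")).body)


def conjoin(cond: str, extra: str) -> str:
    return extra if cond == "True" else f"{cond} and {extra}"


def propagate_rhs(cfg, cond, expr, point):
    """Push the right side <cond, expr>_point back along EVERY incoming edge."""
    obligations = []
    for node, (kind, *rest) in cfg.items():
        if kind == "assign" and rest[2] == point:
            var, rhs, _ = rest
            obligations.append((node, substitute(cond, var, rhs),
                                substitute(expr, var, rhs)))
        elif kind == "branch":
            test, if_true, if_false = rest
            if if_true == point:
                obligations.append((node, conjoin(cond, test), expr))
            if if_false == point:
                obligations.append((node, conjoin(cond, f"not ({test})"), expr))
    return obligations


# Right side of <2*i>_2 |> <g>_b; the left side stays fixed.
print(propagate_rhs(TRANSFORMED, "True", "g", "b"))
# [('a', 'True', '0'), ('c', 'g < 48', 'g')]
```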
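15. Propagate Left Hand Side
- To prove $\langle 2i \rangle_2 \triangleright \langle 0 \rangle_a$, propagate LHS to 1: $\langle 2 \cdot 0 \rangle_1 \triangleright \langle 0 \rangle_a$
- To prove $\langle 2i \rangle_2 \triangleright \langle g < 48,\ g \rangle_c$, propagate LHS to 4: $\langle i < 24,\ 2i \rangle_4 \triangleright \langle g < 48,\ g \rangle_c$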
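The left-hand-side rule differs in one respect: at a join point the prover picks a single incoming edge in the original program. In the sketch below that choice is simply passed in as `pred` (1 for the obligation at a, 4 for the one at c, as on the slide). The node encoding and the names `propagate_lhs` and `conjoin` are hypothetical, carried over from the earlier sketches.

```python
import ast
import copy

# Original program of the running example (hypothetical encoding).
ORIGINAL = {
    1: ("assign", "i", "0", 2),
    2: ("assign", "i", "i + 3", 3),
    3: ("assign", "g", "2 * i", 4),
    4: ("branch", "i < 24", 2, 5),
    5: ("exit",),
}


def substitute(text: str, var: str, expr: str) -> str:
    """Return text[var := expr] (uses ast.unparse, Python 3.9+)."""
    replacement = ast.parse(expr, mode="eval").body

    class Subst(ast.NodeTransformer):
        def visit_Name(self, node):
            return copy.deepcopy(replacement) if node.id == var else node

    return ast.unparse(Subst().visit(ast.parse(text, mode="eval")).body)


def conjoin(cond: str, extra: str) -> str:
    return extra if cond == "True" else f"{cond} and {extra}"


def propagate_lhs(cfg, cond, expr, point, pred):
    """Push the left side <cond, expr>_point back along the ONE edge pred -> point."""
    kind, *rest = cfg[pred]
    if kind == "assign":
        var, rhs, succ = rest
        if succ == point:
            return (pred, substitute(cond, var, rhs), substitute(expr, var, rhs))
    elif kind == "branch":
        test, if_true, if_false = rest
        if point == if_true:
            return (pred, conjoin(cond, test), expr)
        if point == if_false:
            return (pred, conjoin(cond, f"not ({test})"), expr)
    raise ValueError(f"{pred} is not a predecessor of {point}")


print(propagate_lhs(ORIGINAL, "True", "2 * i", 2, 1))  # (1, 'True', '2 * 0')
print(propagate_lhs(ORIGINAL, "True", "2 * i", 2, 4))  # (4, 'i < 24', '2 * i')
```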
16. Use Other Invariants
- Want to prove $\langle i < 24,\ 2i \rangle_4 \triangleright \langle g < 48,\ g \rangle_c$
- But $\langle 2i \rangle_4 \triangleright \langle g \rangle_c$ is one of the given invariants
- So we can substitute $2i$ for $g$, and reduce our problem to proving
- $2i < 48$ implies $i < 24$
- $2i = 2i$
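The arithmetic behind this last step, written out as a short derivation. This is a sketch that assumes mathematical (unbounded) integers: with fixed-width wraparound, $2i < 48$ would not imply $i < 24$.

```latex
\begin{aligned}
&\text{Goal: } \langle i < 24,\; 2i \rangle_4 \triangleright \langle g < 48,\; g \rangle_c \\
&\text{Given: } \langle 2i \rangle_4 \triangleright \langle g \rangle_c,
  \text{ so } g = 2i \text{ in the paired states} \\
&\text{Substituting } g := 2i \text{ leaves two obligations:} \\
&\quad 2i < 48 \;\Rightarrow\; i < 24 \qquad (\text{divide both sides by } 2 > 0) \\
&\quad 2i = 2i \qquad (\text{reflexivity})
\end{aligned}
```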
17. Primary Advantages
- Open Compilation Framework
- Anyone can provide transformations
- No need to trust provider
- Buggier Compilers
- Incorrect compilation much less serious problem
- Compiler writers can focus on
- Aggressive optimizations
- Latest language developments
- NOT on correct compilation in all cases
18. Key Challenge
- Compiler must be able to control machine at very low level for efficiency
- Pointers
- Register Allocation
- Instruction Selection
- Condition Codes
19. Pointer Problem
- Cannot use simple expression substitution in presence of pointers
- Aliasing may cause substitution to produce incorrect result
- Solution
- Define substitution in the presence of a set of aliases
- Compiler uses pointer analysis to produce alias invariants at each program point
- Proof rules use alias invariants
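One way to picture substitution relative to an alias set: backward substitution over a store `*p <- e` splits into one proof case per possible target of `p`, so the proof covers both the aliased and the unaliased case. This is a minimal sketch under stated assumptions (formulas mention only plain variables, never dereferences; targets are named variables; the points-to set is sound, i.e. it contains every possible target), not the rules from the talk. `substitute_store` and the C-like case labels are hypothetical.

```python
import ast
import copy


def substitute(formula: str, var: str, expr: str) -> str:
    """Return formula[var := expr] (uses ast.unparse, Python 3.9+)."""
    replacement = ast.parse(expr, mode="eval").body

    class Subst(ast.NodeTransformer):
        def visit_Name(self, node):
            return copy.deepcopy(replacement) if node.id == var else node

    return ast.unparse(Subst().visit(ast.parse(formula, mode="eval")).body)


def substitute_store(formula: str, pointer: str, rhs: str, points_to: set) -> dict:
    """Backward substitution over the store `*pointer <- rhs`.

    points_to is the may-alias set for `pointer` reported by a pointer
    analysis at this program point. Returns one proof case per possible
    target: case label -> formula that must hold in that case.
    """
    return {f"{pointer} == &{target}": substitute(formula, target, rhs)
            for target in sorted(points_to)}


cases = substitute_store("x + y == 10", "p", "5", {"x", "z"})
for assumption, obligation in cases.items():
    print(assumption, "=>", obligation)
# p == &x => 5 + y == 10   (aliased with x: the store overwrites x)
# p == &z => x + y == 10   (no alias with x or y: formula unchanged)
```

Plain substitution would treat `*p` as unrelated to `x` and keep `x + y == 10` in every case, which is exactly the incorrect result aliasing can cause.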
20. Pointer Details
- Handling aliasing uncertainty
- Must prove result for both aliased and unaliased cases
- Flow-insensitive analyses
- Change semantics of source language slightly to make analysis results valid
- Provide derived rules specifically to support validation of flow-insensitive results
21. Low Level Details: Register Allocation, Instruction Selection, Condition Codes
22. Standard Compiler Structure
Source → Parser → Machine-Independent Representation (Machine-Independent Analyses and Transformations) → Lowering → Machine-Dependent Representation (Machine-Specific Analyses and Transformations) → Code Generation → Binary Code
23. Proposed Compiler Structure
Source → Parser → Single Standard Representation (Analyses and Transformations) → Lowering → Single Standard Representation (Analyses and Transformations) → Code Generation → Binary Code
24. Simple Code Generation
- Basic Approach
- Each node in control flow graph corresponds to single instruction in generated code
- Code generator can be very simple
- Issues
- Registers in machine-independent IR
- Implicitly set state (condition codes)
- Instruction selection
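A sketch of how simple a one-node-one-instruction code generator can be. Everything here is hypothetical: the node tuples, the restriction to expressions of the form `x` or `x op y` with space-separated tokens, and the pseudo-assembly mnemonics (`mov`, `add`, `blt`, `ret`), which belong to no real instruction set. To keep exactly one instruction per node, it requires each node's fall-through successor to be laid out immediately after it.

```python
# Hypothetical CFG encoding, listed in layout order:
#   (label, "assign", var, expr, succ) | (label, "branch", cond, true_succ, false_succ)
#   | (label, "exit")
OPCODES = {"+": "add", "-": "sub", "*": "mul"}


def emit(node) -> str:
    """Translate exactly one CFG node into one pseudo-assembly instruction."""
    label, kind, *rest = node
    if kind == "assign":
        var, expr, _ = rest
        tokens = expr.split()
        if len(tokens) == 1:
            return f"{label}: mov {var}, {tokens[0]}"
        left, op, right = tokens
        return f"{label}: {OPCODES[op]} {var}, {left}, {right}"
    if kind == "branch":
        cond, if_true, _ = rest
        left, op, right = cond.split()
        if op != "<":
            raise ValueError(f"unsupported comparison {op!r}")
        return f"{label}: blt {left}, {right}, {if_true}"
    return f"{label}: ret"


def generate(nodes) -> list:
    """One instruction per node; fall-through successors must be laid out next."""
    code = []
    for index, node in enumerate(nodes):
        label, kind, *rest = node
        following = nodes[index + 1][0] if index + 1 < len(nodes) else None
        if kind != "exit" and rest[-1] != following:
            raise ValueError(f"{label}: fall-through successor must come next")
        code.append(emit(node))
    return code


PROGRAM = [
    ("a", "assign", "g", "0", "b"),
    ("b", "assign", "g", "g + 6", "c"),
    ("c", "branch", "g < 48", "b", "d"),
    ("d", "exit"),
]
for line in generate(PROGRAM):
    print(line)
```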
25. Register Allocation
- Have a dedicated variable represent each register
- Not semantically distinct from other variables in intermediate representation
- But code generator for the specific machine understands representation
- Allocates dedicated variables in registers
- Can represent results of register allocation in an ostensibly machine-independent IR
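The split between IR semantics and the machine-specific view can be sketched as follows, assuming a hypothetical convention that IR variables `r0` to `r7` stand for machine registers. To the interpreter (and so to the proof rules) `r1` is an ordinary variable; only the code generator's `operand` function treats it specially. The AT&T-like operand syntax is illustrative only.

```python
# Hypothetical convention: IR variables r0..r7 denote machine registers.
REGISTER_VARIABLES = {f"r{k}" for k in range(8)}


def run(statements):
    """IR semantics: every name, including r1, is just a variable."""
    state = {}
    for var, expr in statements:
        state[var] = eval(expr, {}, state)  # trusted, hand-written expressions only
    return state


def operand(name: str) -> str:
    """Machine-specific view: register, literal, or memory operand."""
    if name in REGISTER_VARIABLES:
        return f"%{name}"
    if name.lstrip("-").isdigit():
        return f"${name}"
    return f"[{name}]"  # ordinary variable kept in memory


before = [("g", "0"), ("g", "g + 6")]
after = [("r1", "0"), ("r1", "r1 + 6"), ("g", "r1")]  # g's working copy allocated to r1
assert run(before)["g"] == run(after)["g"] == 6
print(operand("r1"), operand("g"), operand("48"))  # %r1 [g] $48
```

Because register allocation is expressed as an ordinary IR-to-IR rewrite (`before` to `after`), it can be validated with the same simulation invariants as any other transformation.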
26. Condition Codes
- Instruction may have multiple effects
- Write registers, set condition codes
- Solution: Macro Instructions
- Define macro instructions as a sequence of basic nodes in IR
- System automatically derives proof rules for macro instructions
- Code generator produces one instruction for each macro instruction
- Approach works for instruction selection
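One plausible way to derive a proof rule for a macro instruction is to compose the rules of its basic nodes: propagating a formula backwards over the macro applies each basic substitution in reverse order. The sketch assumes a hypothetical add instruction that also sets a condition-code variable `cc`, modeled here as holding the result; `add_setting_cc` and `derived_rule` are illustrative names, not the system's actual mechanism.

```python
import ast
import copy


def substitute(formula: str, var: str, expr: str) -> str:
    """Return formula[var := expr] (uses ast.unparse, Python 3.9+)."""
    replacement = ast.parse(expr, mode="eval").body

    class Subst(ast.NodeTransformer):
        def visit_Name(self, node):
            return copy.deepcopy(replacement) if node.id == var else node

    return ast.unparse(Subst().visit(ast.parse(formula, mode="eval")).body)


def add_setting_cc(dst: str, src: str, imm: str):
    """Hypothetical macro: dst <- src + imm, then cc <- dst (two basic nodes)."""
    return [(dst, f"{src} + {imm}"), ("cc", dst)]


def derived_rule(formula: str, macro) -> str:
    """Backward propagation over a macro: basic rules applied last node first."""
    for var, expr in reversed(macro):
        formula = substitute(formula, var, expr)
    return formula


macro = add_setting_cc("r1", "r1", "6")
print(derived_rule("cc < 48", macro))  # r1 + 6 < 48
```

The code generator would then emit a single machine instruction for the whole macro, while the proof reasons about its two explicit effects.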
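27. Limitation
- Proof rules are based on concepts of partial correctness
- Not designed to prove equivalences that depend on termination of loops
Example (figure): a loop ($g \leftarrow 0$; $g \leftarrow g + 6$; repeat while $g < 48$; exit) drawn beside the straight-line program ($g \leftarrow 48$; exit). Both leave $g = 48$, but showing that they are equivalent requires knowing that the loop terminates.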
28. Proved Transformations
- Constant Propagation and Folding
- Copy Propagation
- Dead Code Elimination
- Branch Movement
- Induction Variable Elimination
- Loop Unrolling
- Branch Elimination
29. Related Work
- Totally Correct Compilation
- Verifix
- Guttman, Ramsdell, Wand
- Synchronous Languages
- Cimatti, Giunchiglia, Pecchiari, Pietra, Profeta, Romano, Traverso, Yu
- Pnueli, Siegel, Singerman
- Proof-Carrying Code
- Necula, Lee
30. Conclusion
- Credible Compilation for Imperative Languages
- Basic Concepts
- Standard Invariants
- Simulation Invariants
- Proof rules for propagating invariants
- Support for low-level details
- Pointers
- Registers and Condition Codes
- Instruction Selection