Title: Proof-Carrying Code
1. Proof-Carrying Code
2. Programmable mobile devices
By 2003, one in five people will own a mobile
communications device. Nokia expects to sell 500M
Java-enabled phones in 2003. Most of these
devices will be power and memory limited.
3. Mobile/Wireless Devices
- In '97, 101M mobile phones vs 82M PCs (40% vs 14%).
- 95% of phones will be WAP enabled by '04.
- 64Mbits of RAM in 2002.
- Battery life a primary factor.
- Efficiency and bandwidth will still be precious.
4. Cheese and the Sum Total of Human Knowledge
5. The Code Safety Problem
6. Code Safety
[Diagram: Code, Trusted Host]
7. Approach 1: Trust the Code Producer
[Diagram: Code with signature (sig); keys PK1, PK2; Trusted 3rd Party; Trusted Host]
8. Approach 2: Baby-sit the Program
[Diagram: Code, Execution monitor, Trusted Host]
E.g., Software Fault Isolation [Wahbe & Lucco], Inline Reference Monitors [Schneider]
9. Approach 3: Java
[Diagram: Code, Verifier, Interp/JIT, Trusted Host]
10. Approach 4: Formal Verification
[Diagram: Code, Trusted Host]
But really really really hard and must be correct.
11. A Key Idea: Explicit Proofs
[Diagram: Code, Certifying Prover, Proof, Proof Checker, Trusted Host]
12. A Key Idea: Explicit Proofs
[Diagram: Code, Certifying Prover, Proof, Proof Checker]
13. Proof-Carrying Code
[Diagram: Code, Certifying Prover, Proof, Proof Checker]
14. But...
- ...How to generate the proofs?
- Proving theorems about real programs is hard.
- Most useful safety properties of low-level programs are undecidable.
- Symbolic theorem-proving systems are unfamiliar to programmers and hard to use even for experts.
15. The Role of Programming Languages
- Civilized programming languages can provide safety for free.
- Well-formed/well-typed ⇒ safe.
- Idea: Arrange for the compiler to explain why the target code it generates preserves the safety properties of the source program.
16. Certifying Compilers [Necula & Lee, PLDI'98]
- Intuition:
- Compiler knows why each translation step is semantics-preserving.
- So, have it generate a proof that safety is preserved.
- Small theorems about big programs.
- Don't try to verify the whole compiler, but only each output it generates.
17. Automation via Certifying Compilation
[Diagram: Certifying Compiler, Certifying Prover, Proof Checker]
18. Overview of the Necula/Lee Approach to PCC
19. High-Level Architecture
[Diagram: Code, Verification condition generator, Checker, Explanation, Agent, Safety policy, Host]
20. Reference Interpreters
- A reference interpreter (RI) is a standard interpreter extended with instrumentation to check the safety of each instruction before it is executed, and abort execution if anything unsafe is about to happen.
- In other words, an RI is capable only of safe execution.
21. Reference Interpreters, cont'd
- The reference interpreter is never actually implemented.
- The point will be to prove (by using the proof rules given in the safety policy) that execution of the code on the RI never aborts, and thus execution on the real hardware will be identical to execution on the RI.
22. Sample Reference Interpreter
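As a minimal sketch of the idea (not the interpreter shown in the original deck), a reference interpreter for a hypothetical four-instruction machine could look like the Python below. The instruction set, the `readable` and `writable` address sets, and the names `Unsafe`, `run_ri` and `COPY` are all assumptions made for illustration. The point is only that every instruction is checked before it executes, and execution aborts instead of doing anything unsafe.

```python
class Unsafe(Exception):
    """Raised just before an instruction that the policy forbids."""


def run_ri(program, regs, mem, readable, writable, max_steps=10_000):
    """Run `program` on a toy reference interpreter.

    program  : list of tuples, e.g. ("load", dst, base, offset)
    regs     : dict mapping register names to integers
    mem      : dict mapping addresses to integers
    readable : addresses the (hypothetical) policy allows reading
    writable : addresses the (hypothetical) policy allows writing
    """
    pc = 0
    for _ in range(max_steps):
        if pc == len(program):              # ran off the end: normal exit
            return regs, mem
        if not 0 <= pc < len(program):      # safety check on control flow
            raise Unsafe(f"jump to illegal address {pc}")
        op, *args = program[pc]
        if op == "load":                    # dst := mem[base + offset]
            dst, base, offset = args
            addr = regs[base] + offset
            if addr not in readable:        # safety check before the read
                raise Unsafe(f"read of address {addr}")
            regs[dst] = mem.get(addr, 0)
            pc += 1
        elif op == "store":                 # mem[base + offset] := src
            src, base, offset = args
            addr = regs[base] + offset
            if addr not in writable:        # safety check before the write
                raise Unsafe(f"write to address {addr}")
            mem[addr] = regs[src]
            pc += 1
        elif op == "addi":                  # dst := src + constant
            dst, src, imm = args
            regs[dst] = regs[src] + imm
            pc += 1
        elif op == "jlt":                   # if a < b then jump to target
            a, b, target = args
            pc = target if regs[a] < regs[b] else pc + 1
        else:
            raise Unsafe(f"unknown instruction {op!r}")
    raise RuntimeError("step budget exhausted")


# A word-by-word copy loop in the toy instruction set.
COPY = [
    ("load", "t", "src", 0),
    ("store", "t", "dst", 0),
    ("addi", "src", "src", 1),
    ("addi", "dst", "dst", 1),
    ("addi", "i", "i", 1),
    ("jlt", "i", "n", 0),
]
```

Running `COPY` with a count that stays inside the readable region terminates normally; asking it to copy one word too many makes the interpreter raise `Unsafe` before the out-of-bounds read happens.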
23. High-Level Architecture
[Diagram: Code, Verification condition generator, Checker, Explanation, Agent, Safety policy, Host]
24. The Safety Policy
- The RI can be viewed as defining a safety policy.
- RI language is a restriction of x86 assembly language.
- Must prove that a given program always makes progress on the RI.
- We introduce verification conditions (VCs), whose truth implies that the corresponding instruction has a defined execution on the RI.
25. Verification Conditions
- The point of the verification conditions, then, is to provide such progress theorems for each instruction in the program.
- In other words, a VC's validity says that the corresponding instruction has a defined execution in the s86 operational semantics.
26. The VCGen
- The verification condition generator (VCGen) examines each instruction.
- It essentially encodes the operational semantics of the language.
- It checks some simple properties.
- E.g., direct jumps go to legal addresses.
- It invokes the Checker when dangerous instructions are encountered.
27. The VCGen, cont'd
- Examples of dangerous instructions:
- memory operations
- procedure calls
- procedure returns
- For each such instruction, VCGen creates a verification condition (VC).
- A VC is a logical predicate whose truth implies the instruction is safe.
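The scan described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Necula/Lee VCGen: it handles only straight-line code, the toy instruction tuples are hypothetical, and the predicate name `safewr4` is assumed by analogy with the `saferd4` predicate used later in the deck. Registers hold symbolic expressions, and each memory operation emits one VC paired with the assumptions in force at that point.

```python
def vcgen(program, assumptions=()):
    """Scan straight-line code once and return a list of VCs.

    Each VC is a pair (assumptions, predicate). Expressions and
    predicates are nested tuples such as ("add", "src_0", 4).
    """
    regs = {}                    # register name -> symbolic expression
    hyps = tuple(assumptions)    # assumptions available to the Checker
    vcs = []

    def value(reg):
        # A register that was never assigned gets a symbolic initial value.
        return regs.get(reg, reg + "_0")

    for op, *args in program:
        if op == "mov":                      # dst := src
            dst, src = args
            regs[dst] = value(src)
        elif op == "addi":                   # dst := src + constant
            dst, src, imm = args
            regs[dst] = ("add", value(src), imm)
        elif op == "load":                   # dangerous: memory read
            dst, base, offset = args
            addr = ("add", value(base), offset)
            vcs.append((hyps, ("saferd4", addr)))
            regs[dst] = ("sel4", "rm", addr)
        elif op == "store":                  # dangerous: memory write
            src, base, offset = args
            addr = ("add", value(base), offset)
            # Updating the symbolic memory state is omitted in this sketch.
            vcs.append((hyps, ("safewr4", addr, value(src))))
        else:
            raise ValueError(f"unsupported instruction {op!r}")
    return vcs
```

For example, `vcgen([("mov", "ebx", "src"), ("load", "ecx", "ebx", 4)])` yields a single VC asking that `(saferd4 (add src_0 4))` hold; the `mov` itself is not dangerous and produces no VC.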
28. Examples of Safety Properties
- Memory safety.
- Which addresses are readable / writable when, and what values.
- Type safety.
- What values can be stored and used in operations.
- System call safety.
- Which system routines can be called and when.
29. Examples of Safety Policies, cont'd
- Action sequence safety.
- E.g., no network send after reading a file.
- Resource usage safety.
- E.g., instruction counts, stack limits, etc.
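The action-sequence example can be stated as a small state machine. The Python below is a sketch only: the action names `read_file` and `net_send` and the function name are hypothetical, and it checks a recorded trace at run time, whereas PCC would prove statically that no execution produces a violating trace. It does illustrate why this is a safety property: any violation is visible after a finite prefix of the trace.

```python
def first_violation(trace):
    """Return the index of the first forbidden action, or None if safe.

    Policy (from the slide): no network send after reading a file.
    """
    has_read = False
    for index, action in enumerate(trace):
        if action == "read_file":
            has_read = True
        elif action == "net_send" and has_read:
            return index
    return None
```

A trace that sends before it reads is accepted, while `["read_file", "net_send"]` is rejected at index 1.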
30. What Can't Be Enforced?
- Informally:
- Safety properties: Yes.
- No bad thing will happen.
- Liveness properties: Not yet.
- A good thing will eventually happen.
- Information-flow properties: ?
- Confidentiality will be preserved.
31. Example of type safety giving us VC validity?
32. Example Source Code
public class Bcopy {
    public static void bcopy(int[] src, int[] dst) {
        int l = src.length;
        int i = 0;
        for (i = 0; i < l; i++)
            dst[i] = src[i];
    }
}
33. Example Target Code
ANN_LOCALS(_bcopy__6arrays5BcopyAIAI, 3)
        .text
        .align 4
        .globl _bcopy__6arrays5BcopyAIAI
_bcopy__6arrays5BcopyAIAI:
        cmpl $0, 4(%esp)
        je L6
        movl 4(%esp), %ebx
        movl 4(%ebx), %ecx
        testl %ecx, %ecx
        jg L22
        ret
L22:
        xorl %edx, %edx
        cmpl $0, 8(%esp)
        je L6
        movl 8(%esp), %eax
        movl 4(%eax), %esi
L7:
        ANN_LOOP(INV = {(csubneq ebx 0),
                        (csubneq eax 0),
                        (csubb edx ecx),
                        (of rm mem)},
                 MODREG = (EDI,EDX,EFLAGS,FFLAGS,RM))
        cmpl %esi, %edx
        jae L13
        movl 8(%ebx, %edx, 4), %edi
        movl %edi, 8(%eax, %edx, 4)
        incl %edx
        cmpl %ecx, %edx
        jl L7
        ret
L13:
        call __Jv_ThrowBadArrayIndex
        ANN_UNREACHABLE
        nop
L6:
        call __Jv_ThrowNullPointer
        ANN_UNREACHABLE
        nop
34. Cut Points
- Each loop entry must be annotated as a cut point.
- VCGen requires this so that checking can be performed in a single scan of the code.
- As a convenience, the modified registers are also declared in the cut annotations.
35. Example Target Code
(Same target code as on slide 33.)
VCGen requires annotations in order to simplify
the process.
36. Example Source Code
(Same source code as on slide 32.)
37. The VCGen Process (1)
Code (left) and the assumptions and symbolic register values recorded by VCGen (right):
                                     A0 = (type src_1 (jarray jint))
                                     A1 = (type dst_1 (jarray jint))
                                     A2 = (type rm_1 mem)
_bcopy__6arrays5BcopyAIAI:
        cmpl $0, src
        je L6                        A3 = (csubneq src_1 0)
        movl src, %ebx               ebx := src_1
        movl 4(%ebx), %ecx           ecx := (sel4 rm_1 (add src_1 4))
        testl %ecx, %ecx
        jg L22                       A4 = (csubgt (sel4 rm_1 (add src_1 4)) 0)
        ret
L22:
        xorl %edx, %edx              edx := 0
        cmpl $0, dst
        je L6                        A5 = (csubneq dst_1 0)
        movl dst, %eax               eax := dst_1
        movl 4(%eax), %esi           esi := (sel4 rm_1 (add dst_1 4))
L7:     ANN_LOOP(INV ...
38. The VCGen Process (2)
L7:     ANN_LOOP(INV = {(csubneq ebx 0),     A3
                        (csubneq eax 0),     A5
                        (csubb edx ecx),     A6 = (csubb 0 (sel4 rm_1 (add src_1 4)))
                        (of rm mem)},
                 MODREG = (EDI, EDX,         edi := edi_1
                           EFLAGS, FFLAGS,   edx := edx_1
                           RM))              rm := rm_2
        cmpl %esi, %edx
        jae L13                              A7 = (csubb edx_1 (sel4 rm_2 (add dst_1 4)))
        movl 8(%ebx,%edx,4), %edi            !!Verify!! (saferd4 (add src_1 (add (imul edx_1 4) 8)))
        movl %edi, 8(%eax,%edx,4)
39. The Checker (1)
The checker is asked to verify that
    (saferd4 (add src_1 (add (imul edx_1 4) 8)))
under assumptions
    A0 = (type src_1 (jarray jint))
    A1 = (type dst_1 (jarray jint))
    A2 = (type rm_1 mem)
    A3 = (csubneq src_1 0)
    A4 = (csubgt (sel4 rm_1 (add src_1 4)) 0)
    A5 = (csubneq dst_1 0)
    A6 = (csubb 0 (sel4 rm_1 (add src_1 4)))
    A7 = (csubb edx_1 (sel4 rm_2 (add dst_1 4)))
The checker looks in the PCC for a proof of this VC.
40. The Checker (2)
In addition to the assumptions, the proof may use axioms and proof rules defined by the host, such as
    szint    : pf (size jint 4)
    rdArray4 : {M:exp} {A:exp} {T:exp} {OFF:exp}
               pf (type A (jarray T)) ->
               pf (type M mem) ->
               pf (nonnull A) ->
               pf (size T 4) ->
               pf (arridx OFF 4 (sel4 M (add A 4))) ->
               pf (saferd4 (add A OFF)).
41. Checker (3)
A proof for
    (saferd4 (add src_1 (add (imul edx_1 4) 8)))
in the Java specification looks like this (excerpt):
    (rdArray4 A0 A2 (sub0chk A3) szint (aidxi 4 (below1 A7)))
This proof can be easily validated via LF type checking.
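To make the checking step concrete, here is a minimal first-order sketch in Python. It is not LF type checking and not the actual checker. It knows only the two host declarations quoted on the previous slide, `szint` and `rdArray4`, and it treats the sub-proofs `(sub0chk A3)` and `(aidxi 4 (below1 A7))` as already-established facts under the hypothetical names `H_nonnull` and `H_idx`, since the rules behind them are not given in the deck. Propositions are nested tuples; the function and exception names are assumptions.

```python
class ProofError(Exception):
    """Raised when a proof term does not establish what it claims."""


# From the slide: szint : pf (size jint 4)
AXIOMS = {"szint": ("size", "jint", 4)}


def infer(term, hyps):
    """Return the proposition that `term` proves under assumptions `hyps`."""
    if isinstance(term, str):
        if term in AXIOMS:
            return AXIOMS[term]
        if term in hyps:
            return hyps[term]
        raise ProofError(f"unknown axiom or assumption {term!r}")
    rule, *subproofs = term
    if rule != "rdArray4" or len(subproofs) != 5:
        raise ProofError(f"unsupported rule application {term!r}")
    premises = [infer(p, hyps) for p in subproofs]
    p_type, p_mem, _p_nonnull, _p_size, p_idx = premises
    try:
        # Recover the implicit arguments M, A, T, OFF from the premises.
        _, A, (_, T) = p_type
        _, M, _ = p_mem
        _, OFF, _, _ = p_idx
    except (TypeError, ValueError):
        raise ProofError("premise has the wrong shape") from None
    expected = [
        ("type", A, ("jarray", T)),
        ("type", M, "mem"),
        ("nonnull", A),
        ("size", T, 4),
        ("arridx", OFF, 4, ("sel4", M, ("add", A, 4))),
    ]
    if premises != expected:
        raise ProofError("premises do not match rdArray4")
    return ("saferd4", ("add", A, OFF))


def check(goal, proof, hyps):
    """Accept only if `proof` establishes exactly `goal`."""
    return infer(proof, hyps) == goal


# The VC from the slides, with hypothetical stand-ins for two sub-proofs.
OFFSET = ("add", ("imul", "edx_1", 4), 8)
HYPS = {
    "A0": ("type", "src_1", ("jarray", "jint")),
    "A2": ("type", "rm_1", "mem"),
    "H_nonnull": ("nonnull", "src_1"),
    "H_idx": ("arridx", OFFSET, 4, ("sel4", "rm_1", ("add", "src_1", 4))),
}
GOAL = ("saferd4", ("add", "src_1", OFFSET))
PROOF = ("rdArray4", "A0", "A2", "H_nonnull", "szint", "H_idx")
```

`check(GOAL, PROOF, HYPS)` accepts this proof. The checker never searches for a proof; it only recomputes what the supplied term proves and compares that with the VC, which is why it can be small and fast.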
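42. VC Explosion
\[
a = b \Rightarrow (x = c \Rightarrow \mathrm{safe}_f(y,c) \wedge x \neq c \Rightarrow \mathrm{safe}_f(x,y))
\;\wedge\;
a \neq b \Rightarrow (a = x \Rightarrow \mathrm{safe}_f(y,x) \wedge a \neq x \Rightarrow \mathrm{safe}_f(a,y))
\]
Exponential growth in size of the VC is possible.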
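The growth is easy to reproduce. The Python sketch below builds the VC for n if/else diamonds in sequence using the standard weakest-precondition rule for conditionals, under simplifying assumptions: the condition names `c0`, `c1`, ... are hypothetical, and the substitution of assignments into each copy is left out because it does not change the node count. Each diamond needs the whole remaining VC on both arms, so the formula roughly doubles at every branch.

```python
def vc_for_diamonds(n, post):
    """VC for n sequential if/else diamonds followed by `post`."""
    vc = post
    for i in reversed(range(n)):
        cond = f"c{i}"
        # Both arms must establish everything that follows, so `vc`
        # appears twice in the new formula.
        vc = ("and", ("implies", cond, vc), ("implies", ("not", cond), vc))
    return vc


def size(formula):
    """Number of nodes in the formula, counting repeated subterms each time."""
    if isinstance(formula, tuple):
        return 1 + sum(size(part) for part in formula[1:])
    return 1


# Node counts for n = 0, 1, 2, 3 diamonds: 1, 8, 22, 50.
SIZES = [size(vc_for_diamonds(n, "safe")) for n in range(4)]
```

In this model the size satisfies s(n) = 2 s(n-1) + 6, so ten diamonds already give a formula of 7162 nodes.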
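43. VC Explosion
[Flowchart: branch on a = b, with a := x on one arm and c := x on the other; join point annotated INV = P(a,b,c,x); branch on a = c, with a := y on one arm and c := y on the other; then f(a,c)]
\[
(a = b \Rightarrow P(x,b,c,x) \wedge a \neq b \Rightarrow P(a,b,x,x))
\;\wedge\;
(\forall a, c.\ P(a,b,c,x) \Rightarrow a = c \Rightarrow \mathrm{safe}_f(y,c) \wedge a \neq c \Rightarrow \mathrm{safe}_f(a,y))
\]
Growth can usually be controlled by careful placement of just the right join-point invariants.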
44. Stack Slots
- Each procedure will want to use the stack for local storage.
- This raises a serious problem because a lot of information is lost by VCGen (such as the value) when data is stored into memory.
- We avoid this problem by assuming that procedures use up to 256 words of stack as registers.
45. Other Approaches to PCC
46. Typed Assembly Language [Morrisett, et al., '98]
- Use modern type theory to develop a static type system for machine code.
- Prove decidability of typechecking.
- Prove soundness of type system.
- Developing such a type system is very hard, but done only once.
47. TAL
fact: ALL rho. {r1:int, sp:{r1:int, sp:rho}::rho}
        jgz r1, positive
        mov r1, 1
        ret
positive:
        push r1          ; sp: int::{t1:int, sp:rho}::rho
        sub r1, r1, 1
        call fact[int::{r1:int, sp:rho}::rho]
        imul r1, r1, r2
        pop r2           ; sp: {r1:int, sp:rho}
        ret
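48. Eliminating VCGen
- We can eliminate VCGen by using the logic to encode a global invariant on states, Inv(S).
- Then, the proof must show:
- $\mathrm{Inv}(S_0)$
- $\forall S : \mathrm{State}.\ \mathrm{Inv}(S) \Rightarrow \mathrm{Inv}(\mathrm{Step}(S))$
- $\forall S : \mathrm{State}.\ \mathrm{Inv}(S) \Rightarrow \mathrm{SP}(S)$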
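The three obligations can be illustrated on a toy machine whose state space is small enough to enumerate. The Python below is a sketch under that assumption: the function name, the eight-state counter machine, and the example invariants are all hypothetical. A real PCC proof establishes the same three facts symbolically for all machine states instead of testing them one by one.

```python
def check_global_invariant(states, s0, step, inv, safe):
    """Check the three obligations by enumeration.

    Returns None if all hold, otherwise a message naming the failure.
    """
    if not inv(s0):                               # Inv(S0)
        return "Inv fails on the initial state"
    for s in states:
        if inv(s):
            if not inv(step(s)):                  # Inv(S) => Inv(Step(S))
                return f"Inv is not preserved by Step at state {s}"
            if not safe(s):                       # Inv(S) => SP(S)
                return f"Inv does not imply SP at state {s}"
    return None


# Toy machine: a counter modulo 8 that advances by 2; state 7 is unsafe.
STATES = range(8)


def step(s):
    return (s + 2) % 8


def safe(s):
    return s != 7


# "The counter is even" is inductive and implies safety.
GOOD = check_global_invariant(STATES, 0, step, lambda s: s % 2 == 0, safe)
# "The counter is not 7" is true of every reachable state but not inductive.
BAD = check_global_invariant(STATES, 0, step, lambda s: s != 7, safe)
```

Here `GOOD` is `None`, while `BAD` reports a failure at state 5, whose successor is 7. The second invariant holds on every reachable state yet is not preserved by `Step`, which is why the invariant has to be chosen with care.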
49. Foundational PCC
- Appel and Felty ['00] develop a semantic model of types, starting from the foundations of mathematical logic.
- This model is used to construct the global invariant.
- Hamid, Shao, et al. define the global invariant to be a syntactic well-formedness condition on machine states.
50. Temporal-logic PCC
- Bernard and Lee ['02] define the global invariant via a temporal-logic specification.
- A trusted generic program then interprets these specifications to extract verification conditions.