Title: CS711 Overview of PCC
1CS711 Overview of PCC
- Greg Morrisett
- Cornell University
- Thanks to G. Necula and P. Lee
2Papers for this Lecture
- G. Necula. Proof-Carrying Code. PoPL'97.
- G. Necula and P. Lee. Safe Kernel Extensions Without Run-Time Checking. OSDI'96.
- G. Necula and P. Lee. The Design and Implementation of a Certifying Compiler. PLDI'98, June 1998. pldi98.ps
- I also highly recommend Necula's PhD thesis (CMU).
3Ideally
[Diagram labels: trusted computing base; Security Policy; Your favorite language; verifier; System Binary; Low-Level IL; optimizer; machine code]
4Idea 1: Theorem Prover!
[Diagram labels: trusted computing base; NuPRL; Security Policy; Your favorite language; System Binary; Low-Level IL; optimizer; machine code]
5Unfortunately...
trusted computing base
NuPRL
6Observation
Finding a proof is hard, but verifying a proof
is easy.
7PCC
[Diagram labels: trusted computing base; verifier; Security Policy; optimizer; System Binary; machine code; prover; certified binary (code, proof, invariants)]
8Making Proof Rigorous
- Specify machine-code semantics and security policy using axiomatic semantics.
- {Pre} ld r2,r1(i) {Post}
- Given
- security policy (i.e., axiomatic semantics and associated logic for assertions)
- untrusted code
- annotated with invariant assertions
- it's possible to calculate a verification condition
- an assertion A such that
- if A is true then the code respects the policy.
9The Client
- The client takes its code and the policy
- constructs some loop invariants.
- constructs the verification condition A from the code, policy, and loop invariants.
- constructs a proof that A is true.
[Diagram: certified binary = code + invariants + proof]
10Verification
- The Verifier (4-6 pages of C code)
- takes code, loop invariants, and policy
- calculates the verification condition A.
- checks that the proof is a valid proof of A
- fails if some step doesn't follow from an axiom or inference rule
- fails if the proof is valid, but not a proof of A
[Diagram: certified binary = code + invariants + proof]
11Advantages of PCC
- In Principle
- Simple, small, and fast TCB.
- No external authentication or cryptography.
- No additional run-time checks.
- Tamper-proof.
- Precise and expressive specification of code
safety policies.
[Diagram labels: code; proof; invariants]
12An Experiment: Packet Filters
- Safety Policy
- given a packet, returns yes/no
- packet is read-only, small scratchpad
- no loops
- Compare
- Berkeley Packet Filter Interpreter
- Modula-3 (but turn off type-checking)
- Software Fault Isolation (sandboxing)
- PCC (hand-optimized, proved)
13Results
14Is PCC the answer?
- PCC seems to offer everything we need
- small, simple trusted computing base
- optimize all you want, any language, any security policy, etc.
- But how do we make it scale to real programs?
15Scaling Problem 1
- How to generate proofs?
- Manual construction is too painful for real programs.
- Interactive theorem provers are really only feasible for a relatively small fraction of the code.
- We need something that's fully automatic most of the time.
16One Approach
- Restrict the safety policy to type safety.
- Necessary for most policies anyway
- cannot execute code or access data for which you do not have a capability.
- type systems are a meta-policy that allows programmers to define fine-grained notions of capability and access.
- abstract types, interfaces, static scope, etc.
- Start with a well-typed, high-level program
- you have a proof for the high-level code
- preserve the proof as you compile
17Type-Preserving Compilation
Source code
binary
Type-checker
Optimizer
Code generator
Proof of type-safety
Proof of type-safety
18Touchstone (Necula)
- Compiles a type-safe subset of C to certified binaries for the DEC Alpha.
- Security policy is type-safety
- parameters of the right type to functions
- values of the right type in arrays, structs
- array indices in bounds
- Highly-optimizing
- competitive with GCC, DEC cc
- eliminates array bound checks when possible
19Touchstone Performance
In spite of the fact that C compilers do not
insert array bound checks, Touchstone
is competitive.
20Touchstone Compilation Time
- Geometric means
- compilation: 75%
- VC generation: 2%
- proving: 21%
- proof checking: 2%
21JVM vs. Touchstone
- JVM
- portable
-
- Touchstone
- extremely good performance
- extremely small TCB
- fast verification
22However...
- Touchstone's type system suits only one very simple language
- no abstract data types, objects, etc.
- no threads
- Proof size was an issue
- proofs were 1-3x the size of the code, just for a really simple notion of type-safety.
- but recent work by Necula shows that this can be compressed down to tiny overhead (e.g., 10%)
23Touchstone proof size
Touchstone's proof size relative to code
and invariant annotations.
24Summary thus far...
- Proof-carrying code is great in principle.
- It's the right general framework.
- For special-purpose applications, can't be beat.
- But for general-purpose extensions
- Need some way to get the proof automatically (limit policy to type-safety).
- Engineering proof size is an issue.
- Compiling high-level languages is an issue.
25Design Details
Server
Client
Safety policy
Certifying Compiler
Source
VC Generator
Logic
VC
Code
Theorem Prover
Proof Checker
Proof
Untrusted, Complex, Slow
Trusted, Simple, Fast
26Abstract Machine
- Instructions (from DEC Alpha)
- ADD/SUB rs, Op, rd (Op ::= n | r)
- LD rd, n(rs); ST rs, n(rd)
- BEQ/NE rs, n; RET
- INV(P)
- States (R, pc)
- R(r) is a 64-bit integer
- R(mem) is memory: Int64 -> Int64
- pc is current program counter
- Expressions
- e ::= n | r | e1 + e2 | e1 - e2 | sel(m,e)
- m ::= mem | upd(m,e1,e2)
27Semantics
- (R,pc) -> (R',pc')
- relative to fixed instruction sequence S
- Rewriting rules
- R' = R[rd := R(rt) + R(rs)], pc' = pc+1 if S(pc) = ADD rs,rt,rd
- R' = R[rd := sel(R(m), R(rs)+n)], pc' = pc+1 if S(pc) = LD rd,n(rs) and readable(R,rs,n)
- R' = R[m := upd(R(m), R(rd)+n, R(rs))], pc' = pc+1 if S(pc) = ST rs,n(rd) and writeable(R,rd,n)
- R' = R, pc' = pc+n+1 if S(pc) = BEQ rs,n and R(rs) = 0 (and pc+n+1 in 0..S.size-1)
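The rewriting rules above can be read as a one-step interpreter. The following is a minimal sketch, not Necula's implementation. It makes several assumptions that are not on the slide: instructions are Python tuples, memory is kept in a separate dict rather than in the pseudo-register m, `readable` and `writeable` are caller-supplied predicates on addresses, unset memory cells read as 0, and a BEQ whose register is non-zero falls through to pc+1 (the slide only shows the taken case).

```python
MASK = (1 << 64) - 1  # registers hold 64-bit integers


def step(S, R, mem, pc, readable, writeable):
    """Apply one rewriting rule; raise RuntimeError if the machine is stuck."""
    op, *args = S[pc]
    R, mem = dict(R), dict(mem)          # build R' without mutating R
    if op == "ADD":                      # ADD rs, rt, rd
        rs, rt, rd = args
        R[rd] = (R[rs] + R[rt]) & MASK
        return R, mem, pc + 1
    if op == "LD":                       # LD rd, n(rs)
        rd, n, rs = args
        addr = (R[rs] + n) & MASK
        if not readable(addr):
            raise RuntimeError(f"stuck at pc={pc}: {addr} is not readable")
        R[rd] = mem.get(addr, 0)         # assumption: unset cells read as 0
        return R, mem, pc + 1
    if op == "ST":                       # ST rs, n(rd)
        rs, n, rd = args
        addr = (R[rd] + n) & MASK
        if not writeable(addr):
            raise RuntimeError(f"stuck at pc={pc}: {addr} is not writeable")
        mem[addr] = R[rs]
        return R, mem, pc + 1
    if op == "BEQ":                      # BEQ rs, n
        rs, n = args
        target = pc + n + 1 if R[rs] == 0 else pc + 1
        if not 0 <= target < len(S):
            raise RuntimeError(f"stuck at pc={pc}: target {target} out of range")
        return R, mem, target
    raise RuntimeError(f"stuck at pc={pc}: no rule for {op}")


# Hypothetical four-instruction program, run until pc leaves the sequence.
program = [("LD", "r1", 8, "r0"), ("BEQ", "r1", 1),
           ("ST", "r1", 16, "r0"), ("ADD", "r1", "r1", "r2")]
R, mem, pc = {"r0": 100, "r1": 0, "r2": 0}, {108: 7}, 0
ok = lambda addr: 100 <= addr < 124      # the accessible address range
while pc < len(program):
    R, mem, pc = step(program, R, mem, pc, ok, ok)
```

A state in which no rule applies (for example, a load from an unreadable address) is exactly what the safety policy is meant to rule out; here it surfaces as an exception.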
28Predicates
- P ::= true | false | P1 /\ P2 | P1 => P2 | All x.P | e1 = e2 | e1 != e2 | e : T
- T ::= RO | RW
- quantifiers range over numbers and are meant to hold in every state.
- e : T is a predicate asserting that e has type T
- Example pre-condition
- r0 : RO /\ (r0+8) : RO /\ ((sel(m,r0) != 0) => (r0+8) : RW)
29Axioms and Proof Rules
- The usual ones for predicate logic
- Some rules for reasoning about 64-bit arithmetic values
- Rules for reasoning about memory
- sel(upd(m,e1,e2),e3) = e2 when e1 = e3
- sel(upd(m,e1,e2),e3) = sel(m,e3) when e1 != e3
- upd(upd(m,e1,e2),e3,e4) = upd(upd(m,e3,e4),e1,e2) when e1 != e3
- Note: aliasing strikes again!
- Rules for reasoning about types
- e : RW => e : RO
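The memory rules can be spot-checked on a concrete model. This is a small sketch that assumes memory is modelled as an immutable-style Python dict; `sel` and `upd` mirror the slide's names, and the sample addresses and values are made up for illustration.

```python
def upd(m, addr, val):
    """Functional update: a new memory equal to m except at addr."""
    new = dict(m)
    new[addr] = val
    return new


def sel(m, addr):
    """Read the cell at addr."""
    return m[addr]


m = {0: 10, 8: 20}

# Reading the address just written yields the written value (e1 = e3).
assert sel(upd(m, 8, 99), 8) == 99

# Reading any other address sees the old memory (e1 != e3).
assert sel(upd(m, 8, 99), 0) == sel(m, 0)

# Writes to distinct addresses commute (e1 != e3).
assert upd(upd(m, 0, 1), 8, 2) == upd(upd(m, 8, 2), 0, 1)

# Aliasing: if the two addresses are equal, the order of writes matters.
assert upd(upd(m, 8, 1), 8, 2) != upd(upd(m, 8, 2), 8, 1)
```

The last assertion is why the side conditions matter: a prover must establish that two address expressions differ before it may reorder or skip a write.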
30Notes on Axioms
- When you scale PCC up
- you still need a rich type system to specify interfaces (i.e., pre-conditions)
- you still have to prove the consistency and soundness of your axioms w.r.t. the machine
- i.e., you still have to write down a TAL and prove its soundness
- you'll tend to use the same type invariance tricks to ensure soundness
31Verification Conditions
- VC(i) =
- [rs+rt / rd] VC(i+1) if S(i) = ADD rs,rt,rd
- (rs+n) : RO /\ [sel(m,rs+n) / rd] VC(i+1) if S(i) = LD rd,n(rs)
- (rd+n) : RW /\ [upd(m,rd+n,rs) / m] VC(i+1) if S(i) = ST rs,n(rd)
32VC continued
- VC(i) = (rs = 0 => VC(i+n+1)) /\ (rs != 0 => VC(i+1)) when S(i) = BEQ rs,n
- PostCondition when S(i) = RET
- P when S(i) = INV(P)
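The VC rules translate almost line for line into a recursive function. The sketch below is illustrative only: the tuple encoding of instructions, expressions and predicates is an assumption of this example, and it omits the extra obligations that each INV node adds to the final safety predicate.

```python
def subst(t, var, e):
    """Replace every occurrence of register `var` in term t by expression e."""
    if t == var:
        return e
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(x, var, e) for x in t[1:])
    return t


def vc(S, i, post):
    """Verification condition for instruction i of S, following the VC rules."""
    op, *a = S[i]
    if op == "ADD":                      # ADD rs, rt, rd
        rs, rt, rd = a
        return subst(vc(S, i + 1, post), rd, ("+", rs, rt))
    if op == "LD":                       # LD rd, n(rs)
        rd, n, rs = a
        addr = ("+", rs, n)
        rest = subst(vc(S, i + 1, post), rd, ("sel", "m", addr))
        return ("and", ("type", addr, "RO"), rest)
    if op == "ST":                       # ST rs, n(rd)
        rs, n, rd = a
        addr = ("+", rd, n)
        rest = subst(vc(S, i + 1, post), "m", ("upd", "m", addr, rs))
        return ("and", ("type", addr, "RW"), rest)
    if op == "BEQ":                      # BEQ rs, n
        rs, n = a
        return ("and",
                ("imp", ("eq", rs, 0), vc(S, i + n + 1, post)),
                ("imp", ("ne", rs, 0), vc(S, i + 1, post)))
    if op == "RET":
        return post
    if op == "INV":                      # INV(P): the invariant cuts the recursion
        return a[0]
    raise ValueError(f"unknown instruction {op}")


# Hypothetical program: load a tag word, and store through r0+8 only if the tag is non-zero.
S = [("LD", "r1", 0, "r0"), ("BEQ", "r1", 1), ("ST", "r1", 8, "r0"), ("RET",)]
result = vc(S, 0, ("true",))
```

For this program the result demands that r0+0 be readable, and that r0+8 be writable only under the hypothesis that the loaded tag is non-zero, which has the same shape as the example pre-condition on the Predicates slide.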
33Notes on VCGen
- Computes the weakest pre-condition of the program if you start from the post-condition at the RET(s) and work back.
- Need to cut cycles (back-edges in CFG) with INV nodes or more properly.
- Note that INV isn't trusted: it's assumed for the continuation, but verified if you ever get to it.
- Accomplished by adding INV => VC(i+1) to the final safety predicate.
- Now all you need is a proof that the VC computed by VCGen is implied by the pre-condition.
34Example
- add r2,r2,5
- ld r1, r2(3)
- st r5, r1(1)
- true
35Example
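- add r2,r2,5
- ld r1, r2(3)
- { (r1+1) : RW /\ true }
- st r5, r1(1)
- { true }
36Example
- add r2,r2,5
- { (r2+3) : RO /\ (sel(m,r2+3)+1) : RW }
- ld r1, r2(3)
- { (r1+1) : RW /\ true }
- st r5, r1(1)
- { true }
37Example
- { (r2+5+3) : RO /\ (sel(m,r2+5+3)+1) : RW }
- add r2,r2,5
- { (r2+3) : RO /\ (sel(m,r2+3)+1) : RW }
- ld r1, r2(3)
- { (r1+1) : RW /\ true }
- st r5, r1(1)
- { true }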
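The backward pass of this example can be replayed mechanically. The sketch below assumes a tuple encoding of terms and a small pretty-printer (`show`), both invented for illustration. It reads the example's instructions destination-first, as the slides do (add r2,r2,5 means r2 := r2+5), and it keeps the trailing `/\ true` that the slides drop after the first step.

```python
def subst(t, var, e):
    """Replace every occurrence of register `var` in term t by expression e."""
    if t == var:
        return e
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(x, var, e) for x in t[1:])
    return t


def show(t):
    """Render a term in the notation used on the slides."""
    if not isinstance(t, tuple):
        return str(t)
    tag, *xs = t
    if tag == "+":
        return f"{show(xs[0])}+{show(xs[1])}"
    if tag == "sel":
        return f"sel({show(xs[0])},{show(xs[1])})"
    if tag == "upd":
        return f"upd({show(xs[0])},{show(xs[1])},{show(xs[2])})"
    if tag == "type":
        return f"({show(xs[0])}):{xs[1]}"
    if tag == "and":
        return f"{show(xs[0])} /\\ {show(xs[1])}"
    return tag                           # "true"


post = ("true",)

# st r5, r1(1): the address r1+1 must be writable; memory m is updated in the rest.
before_st = ("and", ("type", ("+", "r1", 1), "RW"),
             subst(post, "m", ("upd", "m", ("+", "r1", 1), "r5")))

# ld r1, r2(3): the address r2+3 must be readable; r1 becomes sel(m, r2+3).
before_ld = ("and", ("type", ("+", "r2", 3), "RO"),
             subst(before_st, "r1", ("sel", "m", ("+", "r2", 3))))

# add r2,r2,5: r2 becomes r2+5.
before_add = subst(before_ld, "r2", ("+", "r2", 5))

for label, p in [("before st ", before_st), ("before ld ", before_ld),
                 ("before add", before_add)]:
    print(label, "{", show(p), "}")
```

The store's effect on memory disappears here only because the post-condition is `true` and never mentions m.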
38Proof Representation
- Use a variant of LF to represent assertions and proofs.
- write down assertion language
- write down inference rules for the logic
- proof-checking becomes LF type-checking
- decouples the logic and assertion language from the verifier.
- of course, you still have to establish the soundness and consistency of the logic that you encode within LF.
- and some logics (e.g., linear or temporal or modal) do not encode so nicely into LF (see Twelf)
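To see why proof checking reduces to type checking, consider a toy checker for a fragment with only conjunction and implication. This is not LF (there are no dependent types or signatures), only a sketch of the same idea under that simplification: a proof term is checked by computing the formula it proves and comparing it with the goal. The rule names and the tuple encoding are assumptions of this example.

```python
def infer(ctx, proof):
    """Return the formula that `proof` proves under hypotheses ctx, or raise."""
    tag = proof[0]
    if tag == "hyp":                     # use a named hypothesis
        return ctx[proof[1]]
    if tag == "and_i":                   # from A and B conclude A /\ B
        return ("and", infer(ctx, proof[1]), infer(ctx, proof[2]))
    if tag == "and_e1":                  # from A /\ B conclude A
        f = infer(ctx, proof[1])
        if not (isinstance(f, tuple) and f[0] == "and"):
            raise TypeError("and_e1 needs a conjunction")
        return f[1]
    if tag == "imp_i":                   # assume A as x, prove B, conclude A => B
        _, x, a, body = proof
        return ("imp", a, infer({**ctx, x: a}, body))
    if tag == "imp_e":                   # from A => B and A conclude B
        f, arg = infer(ctx, proof[1]), infer(ctx, proof[2])
        if not (isinstance(f, tuple) and f[0] == "imp" and f[1] == arg):
            raise TypeError("imp_e needs a matching implication")
        return f[2]
    raise TypeError(f"unknown rule {tag}")


def check(proof, goal):
    """Accept only a well-formed proof whose conclusion is exactly `goal`."""
    try:
        return infer({}, proof) == goal
    except (TypeError, KeyError):
        return False


a_and_b = ("and", "A", "B")
proof = ("imp_i", "h", a_and_b, ("and_e1", ("hyp", "h")))

print(check(proof, ("imp", a_and_b, "A")))   # a proof of the goal
print(check(proof, ("imp", a_and_b, "B")))   # a valid proof, but of a different formula
print(check(("and_e1", ("hyp", "h")), "A"))  # an ill-formed proof: h is not in scope
```

The two failure modes correspond to the verifier's two rejection cases: a step that does not follow from a rule, and a valid proof of the wrong assertion.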
39Representing LF Proofs
- In practice LF proof objects are HUGE.
- Recent work on proof oracles compresses this down to almost nothing (PoPL 2001?)
- assume you can match the goal against the conclusions of the proof rules (e.g., 1st-order unification). If you can't match with this, then force the representation to contain more information.
- only some (small) subset of the rules will apply (say k of them).
- so you only need to spit out lg(k) bits to indicate which rule is actually used in the proof.
- the matching lets you then establish sub-goals that need to be proven.
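The lg(k)-bit idea can be sketched with a toy rule set. Everything here is an assumption made for illustration: goals are plain strings, rules are ground (matching is string equality rather than first-order unification), and the rule names are invented. The encoder walks a proof and emits only the index of the chosen rule among the k rules whose conclusion matches the current goal; the checker replays those bits.

```python
# (name, conclusion, premises); rule names and goals are hypothetical.
RULES = [
    ("and_i",    "e:RO /\\ e:RW", ["e:RO", "e:RW"]),
    ("ro_table", "e:RO",          ["e in table"]),   # no rule proves its premise
    ("ro_of_rw", "e:RO",          ["e:RW"]),         # the slides' e:RW => e:RO
    ("rw_hyp",   "e:RW",          []),
]


def candidates(goal):
    """The k rules whose conclusion matches the goal."""
    return [r for r in RULES if r[1] == goal]


def width(k):
    """Bits needed to pick one of k candidates: ceil(lg k)."""
    return (k - 1).bit_length()


def encode(goal, proof):
    """proof = (rule_name, [subproofs]); returns the oracle bits as a list."""
    name, subproofs = proof
    cands = candidates(goal)
    idx = [r[0] for r in cands].index(name)
    bits = [(idx >> b) & 1 for b in reversed(range(width(len(cands))))]
    for sub_goal, sub_proof in zip(cands[idx][2], subproofs):
        bits += encode(sub_goal, sub_proof)
    return bits


def check(goal, bits):
    """Replay the oracle; `bits` is an iterator that is consumed front to back."""
    cands = candidates(goal)
    if not cands:
        return False
    idx = 0
    for _ in range(width(len(cands))):
        bit = next(bits, None)
        if bit is None:                  # oracle ran out of bits
            return False
        idx = (idx << 1) | bit
    if idx >= len(cands):
        return False
    return all(check(p, bits) for p in cands[idx][2])


goal = "e:RO /\\ e:RW"
proof = ("and_i", [("ro_of_rw", [("rw_hyp", [])]), ("rw_hyp", [])])
oracle = encode(goal, proof)
print(oracle, check(goal, iter(oracle)))
```

In this toy instance a four-step proof needs a single bit, because only the goal e:RO has more than one candidate rule; goals with one candidate cost nothing.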
40Where PCC stands
- Cedilla has built a certifying compiler for Java.
- generates optimized x86 code
- but you can write your own code too!
- uses a Nelson-Oppen-style prover
- The proof checker is actually machine independent
- map object code up to a machine-independent IL (Secure Assembly Language)
- proofs are with respect to the SAL code
- retargeting the prover to another machine just involves writing a (correct) mapping from the machine code to SAL.
41Foundational PCC (Appel, Felty)
- Eliminate more trust from PCC
- logic encoded into LF
- implicit machine semantics
- Rather, encode things from the machine semantics up.
- you prove w.r.t. the semantics that {Pre} C {Post} is valid.
- Interesting observation
- to do any reasonable proof, you start introducing types or invariants that look suspiciously like TAL
- except that you have a semantic encoding as to what the TAL types mean w.r.t. the machine.