Title: Bit-level Types for High-level Reasoning
1 Bit-level Types for High-level Reasoning
- Ranjit Jhala, Rupak Majumdar
2 The Problem
mget(u32 p) {
  if ((p & 0x1) == 0)
    error("permission");
  pte = (p & 0xFFFFF000) >> 12;
  b = tab[pte] & 0xFFFFFFFC;
  o = p & 0xFFC;
  return m[(b + o) >> 2];
}
- Bit-level operators in low-level systems code
- Why?
  - Interact with hardware
  - Reduce memory footprint
- Inscrutable to humans, optimizers, verifiers
4 What's going on?
[Figure: bit-layout diagrams drawn over the mget code, built up across slides 4-8; the widths labeled are 32, 20, 12, 30, and 2 bits]
9 Q: How to infer complex information flow to understand, optimize, and verify code?
10 Plan
11 Our approach (1): Bit-level Types
- Bit-level types
  - Sequences of (name, size) pairs
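A minimal C sketch of a bit-level type as a sequence of (name, size) pairs. The layout used for p (idx, addr, wr, rd) follows the field names that appear on later slides; the 1-bit width of rd and the helper names `type_width` and `field_low_bit` are assumptions made for illustration, not the authors' implementation.

```c
#include <assert.h>
#include <stddef.h>

/* One field of a bit-level type: a (name, size) pair. */
typedef struct {
    const char *name;  /* field name, e.g. "idx" */
    unsigned size;     /* width in bits */
} Field;

/* Assumed layout of the 32-bit argument p of mget, listed from the most
 * significant field down to the least significant one. */
static const Field P_TYPE[] = {
    {"idx", 20}, {"addr", 10}, {"wr", 1}, {"rd", 1},
};
static const size_t P_TYPE_LEN = sizeof P_TYPE / sizeof P_TYPE[0];

/* Total width of a type: the sum of its field sizes. */
static unsigned type_width(const Field *t, size_t n) {
    unsigned w = 0;
    for (size_t i = 0; i < n; i++)
        w += t[i].size;
    return w;
}

/* Position of the lowest bit of field k (bit 0 is least significant). */
static unsigned field_low_bit(const Field *t, size_t n, size_t k) {
    assert(k < n);
    unsigned low = 0;
    for (size_t i = k + 1; i < n; i++)
        low += t[i].size;
    return low;
}
```

With this layout the four fields add up to 32 bits, idx starts at bit 12, and addr starts at bit 2, which matches the masks 0xFFFFF000 and 0xFFC in the running example.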
21 Our approach (2): Translation
Expressions → Records; Bit-ops → Field accesses
mget(p) {
  if ((p & 0x1) == 0)
    error("permission");
  pte = (p & 0xFFFFF000) >> 12;
  b = tab[pte] & 0xFFFFFFFC;
  o = p & 0xFFC;
  return m[(b + o) >> 2];
}
is translated to
mget(p) {
  if (p.rd == 0)
    error("permission");
  pte.idx = p.idx;
  b.addr = tab[pte.idx].addr;
  o.addr = p.addr;
  return m[b.addr + o.addr];
}
22 Our approach
- Bit-level types + translation: low-level operations eliminated
- Program can be understood, optimized, verified
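To make the translation concrete, here is a hedged C sketch that places the bit-level mget next to a field-based version and lets the two be compared on sample data. The struct `VAddr`, the helper `unpack`, the `ok` out-parameter standing in for error(), and the explicit `tab` and `m` parameters are all assumptions for illustration; like the slides, the sketch assumes b + o does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed record view of a 32-bit virtual address p. */
typedef struct {
    uint32_t idx;   /* bits 31..12: page-table index */
    uint32_t addr;  /* bits 11..2: word offset within the page */
    uint32_t wr;    /* bit 1 */
    uint32_t rd;    /* bit 0: read permission */
} VAddr;

/* Unpack p into its fields; the only place bit-operations remain. */
static VAddr unpack(uint32_t p) {
    VAddr v = { p >> 12, (p >> 2) & 0x3FFu, (p >> 1) & 0x1u, p & 0x1u };
    return v;
}

/* Bit-level version, following the slide. Sets *ok to 0 and returns 0
 * instead of calling error() when the read bit is clear. */
static uint32_t mget_bits(uint32_t p, const uint32_t *tab,
                          const uint32_t *m, int *ok) {
    *ok = (p & 0x1u) != 0;
    if (!*ok) return 0;
    uint32_t pte = (p & 0xFFFFF000u) >> 12;
    uint32_t b = tab[pte] & 0xFFFFFFFCu;
    uint32_t o = p & 0xFFCu;
    return m[(b + o) >> 2];
}

/* Field-based version: every mask and shift becomes a field access. */
static uint32_t mget_fields(uint32_t p, const uint32_t *tab,
                            const uint32_t *m, int *ok) {
    VAddr v = unpack(p);
    *ok = v.rd != 0;
    if (!*ok) return 0;
    uint32_t b_addr = tab[v.idx] >> 2;  /* tab[pte.idx].addr: upper 30 bits */
    uint32_t o_addr = v.addr;           /* p.addr */
    return m[b_addr + o_addr];
}
```

The two versions agree because b and o both have their low two bits cleared, so (b + o) >> 2 equals (b >> 2) + (o >> 2), which is the sum of the two addr fields.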
23 Plan
- Motivation
- Approach
- Bit-level types + Translation
- Key: Bit-level type Inference
- Experiences
- Related work
24 Constraint-based Type Inference
Remember these? "If Alice doubles her age, she would still be 10 years younger than Bob, who was born in 1952. How old are Alice and Bob?"
- Algorithm
  - 0. Variables for unknowns: Alice's age a, Bob's age b
  - 1. Generate constraints on vars: 2a = b - 10, b = 2006 - 1952
  - 2. Solve constraints: a = 22, b = 54
25 Constraint-based Type Inference
- Algorithm
  - 0. Variables for unknowns: bit-level types of all program expressions
  - 1. Generate constraints on vars
  - 2. Solve constraints
26 Plan
- Motivation
- Approach
- Bit-level types + Translation
- Key: Bit-level type Inference
  - Constraint Generation
  - Constraint Solving
- Experiences
- Related work
27 Constraint Generation
- Type variables
  - one for each expression
    - p : τ_p
    - p & 0x1 : τ_{p & 0x1}
    - pte : τ_pte
    - ...
28 Generating Zero Constraints
- Mask: p & 0xFFC
  - τ_{p & 0xFFC}[31:12] = 0^20
  - τ_{p & 0xFFC}[1:0] = 0^2
[Figure: bit layout of p & 0xFFC; bits 31..12 and bits 1..0 are zero]
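One way to read the mask rule: every maximal run of 0 bits in the mask constant yields one zero constraint on the masked expression. The C sketch below computes those runs; the type `ZeroRange` and the function `zero_ranges` are hypothetical names, and this is an illustration of the idea rather than the constraint generator described in the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A zero constraint: bits hi..lo (inclusive) are known to be 0. */
typedef struct {
    unsigned hi;
    unsigned lo;
} ZeroRange;

/* Collect the maximal runs of 0 bits in a 32-bit mask constant, scanning
 * from the most significant bit down. Stores at most cap runs in out and
 * returns the number of runs found. */
static size_t zero_ranges(uint32_t mask, ZeroRange *out, size_t cap) {
    size_t n = 0;
    int bit = 31;
    while (bit >= 0) {
        if ((mask >> bit) & 1u) {   /* a 1 bit: not part of any zero run */
            bit--;
            continue;
        }
        int hi = bit;               /* top of a run of zeros */
        while (bit >= 0 && !((mask >> bit) & 1u))
            bit--;                  /* walk down to the end of the run */
        if (n < cap) {
            out[n].hi = (unsigned)hi;
            out[n].lo = (unsigned)(bit + 1);
        }
        n++;
    }
    return n;
}
```

For the mask 0xFFC this yields the ranges [31:12] and [1:0], and for the mask 0x1 it yields the single range [31:1].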
29 Generating Zero Constraints
- Shift: e >> 12, where e is p & 0xFFFFF000
  - τ_{e >> 12}[31:20] = 0^12
[Figure: bit layout of e >> 12; bits 31..20 are zero]
30 Why are zeros special?
x = e
- Consider an assignment (value flows from e to x)
  - Should x and e have the same bit-level type?
[Figure: bit layouts of x and e and the subtyping relation between them; the symbols were lost in extraction]
- Common idiom
  - k-bit values are a special case of k'-bit values
  - Equality results in unnecessary breaks
- Zeros enable precise subtyping
31 Generating Inequality Constraints
[Figure: bit layouts for the mask p & 0xFFC (labels 0^20, 0^2, 11, 2) and for the shift e >> 12 (bits 31..12 of e line up with bits 19..0 of e >> 12)]
33 Generating Inequality Constraints
- Assignment: o = p & 0xFFC
  - τ_o ≤ τ_{p & 0xFFC}
  - that is, τ_o[31:0] ≤ τ_{p & 0xFFC}[31:0]
35 Constraint Solutions
- Solution is an assignment
  - A : type variables → bit-level types
  - A(τ)[i:j]: subsequence of A(τ) from bit i through j
- Examples
  - A(τ_p)[12:1] = ⟨addr,10⟩⟨wr,1⟩
  - A(τ_p)[31:2] = ⟨idx,20⟩⟨addr,10⟩
  - A(τ_p)[31:5] undefined
[Figure: bit layout of A(τ_p) with positions 31, 12, 5, 2, and 1 marked]
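The "defined" condition can be sketched in C as a check that both ends of a bit range fall on field boundaries. The sketch assumes the inclusive [hi:lo] convention used by the zero constraints (bit 0 least significant) and the layout idx,20 addr,10 wr,1 rd,1 for p; the function name `slice` and its out-parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { const char *name; unsigned size; } Field;

/* Finds the fields that exactly cover bits hi..lo (inclusive) of a type
 * listed from most to least significant field. On success stores the index
 * of the first covering field and the number of fields, and returns 1.
 * Returns 0 when hi or lo falls strictly inside a field (undefined). */
static int slice(const Field *t, size_t n, unsigned hi, unsigned lo,
                 size_t *first, size_t *count) {
    unsigned width = 0;
    for (size_t i = 0; i < n; i++)
        width += t[i].size;
    assert(lo <= hi && hi < width);

    unsigned top = width;   /* one past the highest bit of field i */
    int started = 0;
    *count = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned low = top - t[i].size;      /* lowest bit of field i */
        if (!started && top == hi + 1) {     /* hi is the top bit of field i */
            started = 1;
            *first = i;
        }
        if (started) {
            (*count)++;
            if (low == lo) return 1;         /* range ends on a boundary */
            if (low < lo) return 0;          /* lo lies inside this field */
        }
        top = low;
    }
    return 0;                                /* hi was not on a boundary */
}
```

Under these assumptions the range [31:2] resolves to the two fields idx and addr, while [31:5] is undefined because bit 5 lies inside addr.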
36 Constraint Solving Overview
- Solution is an assignment
  - A : type variables → bit-level types
  - A(τ)[i:j]: subsequence from bit i through j
- A satisfies
  - a zero constraint on τ[i:j]
    - if A(τ)[i:j] = 0^(i-j+1)
  - an inequality constraint τ[i:j] ≤ τ'[i':j']
    - if A(τ)[i:j] ≤ A(τ')[i':j']
  - In both cases, A(τ)[i:j] must be defined
37 Constraint Solving Algorithm
- Input: zero constraints z1,...,zm
  - inequality constraints c1,...,cn
- Output: assignment satisfying all constraints
  A0 = initial assignment satisfying zero constraints (details in paper)
  A = A0
  for i in 1..n: A = refine(A, ci)
  return A
- refine(A, ci) adjusts A such that
  - ci becomes satisfied
  - earlier constraints stay satisfied
  - built using Split, Unify
38 Refine: Split(A, τ, k)
- Example: A(τ) = ⟨p,32⟩
- A = Split(A, τ, 12)
  - Throughout A, substitute ⟨p,32⟩ → ⟨e,20⟩⟨f,12⟩, where e, f are fresh
  - Now A(τ) = ⟨e,20⟩⟨f,12⟩
39 Refine: Split(A, τ, k)
- Used to ensure A(τ)[i:j] is defined
- Example: ensure A(τ)[11:2] is defined
  - Initially A(τ) = ⟨p,32⟩
  - A = Split(A, τ, 12) gives A(τ) = ⟨e,20⟩⟨f,12⟩
  - A = Split(A, τ, 2) gives A(τ) = ⟨e,20⟩⟨g,10⟩⟨h,2⟩
  - Now A(τ)[11:2] = ⟨g,10⟩ is defined
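The splitting step can be sketched in C as inserting a field boundary at bit k. This is a simplification under stated assumptions: it rewrites a single type, whereas the slides substitute throughout the whole assignment A, and the names `Type`, `split`, and `fresh_name` (with generated names such as "f0") are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_FIELDS 32

typedef struct { char name[8]; unsigned size; } Field;
typedef struct { Field f[MAX_FIELDS]; size_t n; } Type;

static unsigned fresh_counter = 0;

/* Write a fresh field name ("f0", "f1", ...) into buf. */
static void fresh_name(char buf[8]) {
    snprintf(buf, 8, "f%u", fresh_counter++);
}

/* Ensure a field boundary between bit k and bit k-1 (fields are listed
 * from most to least significant). Returns 1 if a field was split into
 * two fresh fields, 0 if the boundary already existed. */
static int split(Type *t, unsigned k) {
    unsigned width = 0;
    for (size_t i = 0; i < t->n; i++)
        width += t->f[i].size;
    assert(k <= width);

    unsigned top = width;   /* one past the highest bit of field i */
    for (size_t i = 0; i < t->n; i++) {
        unsigned low = top - t->f[i].size;
        if (k == top || k == low)
            return 0;                        /* already a boundary */
        if (low < k && k < top) {
            assert(t->n < MAX_FIELDS);
            for (size_t j = t->n; j > i + 1; j--)
                t->f[j] = t->f[j - 1];       /* open a slot at i + 1 */
            t->n++;
            t->f[i].size = top - k;          /* upper part: bits top-1..k */
            t->f[i + 1].size = k - low;      /* lower part: bits k-1..low */
            fresh_name(t->f[i].name);
            fresh_name(t->f[i + 1].name);
            return 1;
        }
        top = low;
    }
    return 0;
}
```

Starting from a single 32-bit field, splitting at 12 and then at 2 leaves fields of 20, 10, and 2 bits, so bits 11..2 become a field of their own.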
40 Refine: Unify(A, p, q)
- Throughout A, substitute ⟨p,k⟩ → ⟨q,k⟩
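Unification can be sketched in C as a rename applied to every type in the assignment. The representation of the assignment as an array of `Type` values, the direction of the rename (p is replaced by q), and the function name `unify` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FIELDS 8

typedef struct { const char *name; unsigned size; } Field;
typedef struct { Field f[MAX_FIELDS]; size_t n; } Type;

/* Rename every field called p to q, across all types in the assignment.
 * Field sizes are left untouched. Returns the number of fields renamed. */
static size_t unify(Type *types, size_t ntypes, const char *p, const char *q) {
    size_t renamed = 0;
    for (size_t v = 0; v < ntypes; v++) {
        for (size_t i = 0; i < types[v].n; i++) {
            if (strcmp(types[v].f[i].name, p) == 0) {
                types[v].f[i].name = q;
                renamed++;
            }
        }
    }
    return renamed;
}
```

Because the rename is applied everywhere at once, two fields that shared a name before the call still share a name afterwards, which is the property the slides rely on to keep earlier constraints satisfied.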
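41 Refine(A, τ[31:12] ≤ τ'[19:0])
- A(τ')[19:0] is undefined at first
- A = Split(A, τ', 19+1)
- A = Unify(A, q, t)
[Figure: bit layouts of the two types before and after each step; labels include ⟨p,32⟩, ⟨r,12⟩, q, 10, and bit positions 31, 12, 19, 0]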
42 Constraint Solving
- Input: constraints
- Output: assignment satisfying all constraints
  A = A0
  for i in 1..n: A = refine(A, ci)
  return A
- Substitution (in Split, Unify)
  - ensures earlier constraints stay satisfied
  - most general solution found
- Efficiently implemented using graphs
44 Experiences
- Implemented bit-level type inference for C
  - pmap: a kernel virtual memory system
    - Implements the code for our running example
  - mondrian: a memory protection system
  - scull: a Linux device driver
  - (1-3 KLOC)
- Inference/Translation takes less than 1s
45 Mondrian [Witchel et al.]
- Bit packing for memory and permission bits
- 2600 lines of code, generated 775 constraints
- Translated to program without bit-operations
  - 18 different bit-packed structures
- 10 assertions provided by programmer
- After translation, assertions verified using BLAST
  - 6 safe: all require bit-level reasoning
    - Previously, verification was not possible
  - 4 false positives: imprecise modeling of arrays
46 Cop outs (i.e. Future Work)
- Truly binary bit-vector operations
  - x << y, x & y
  - Currently: value-flow analysis to infer constants flowing to y
  - Break into a switch statement
- Flow-sensitivity
  - Currently: SSA renaming
- Arithmetic overflow
  - Does a k-bit value spill over?
  - Currently: assume no overflow
- Path-sensitivity (value-dependent types)
  - Type of suffix depends on value of first field
  - e.g. instruction decoder for architecture simulator
    - Number/type of operands depends on opcode
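The "break into a switch statement" workaround for x << y can be sketched in C as follows. The constant set {2, 4, 12} is a made-up stand-in for whatever a value-flow analysis would infer for y, and the function name `shl_switch` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: value-flow analysis found that y is always 2, 4, or 12.
 * Each case shifts by a constant, so the constant-shift rules for
 * bit-level types apply to every branch separately. */
static uint32_t shl_switch(uint32_t x, unsigned y) {
    switch (y) {
    case 2:
        return x << 2;
    case 4:
        return x << 4;
    case 12:
        return x << 12;
    default:
        /* Unreachable if the inferred constant set is complete. */
        assert(!"y outside the inferred constant set");
        return 0;
    }
}
```

The rewrite leaves behaviour unchanged for the inferred values of y and turns a truly binary operation into a set of unary ones with known shift amounts.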
48 Related Work
- O'Callahan and Jackson, ICSE 97
  - Type inference
- Gupta et al., POPL 03, CC 02
  - Dataflow analyses for packing bit-sections
- Ramalingam et al., POPL 99
  - Aggregate structure inference for COBOL
49 Conclusions
- (Automatic) reasoning about bit-operations is hard
- Structure: bit-operations pack data into one word
- Structure inferred via bit-level type inference
- Structure exploited via translation to fields
- Precise, efficient reasoning about bit-operations
50 Thank you
51 Q: How to infer complex information flow to understand, optimize, and verify code?
- Previous approaches model bitwise ops by
  - Uninterpreted functions
    - Imprecise
  - Logical axioms
    - Inefficient
  - Bit-blasting terms into 32/64 bits
    - Lose high-level relationships
52 Refine
- Two basic operations: split, unify
- Split(A, τ, [i:j]) ensures A(τ)[i:j] is defined
- Example: Split(A, τ, [11:2]) with A(τ) = ⟨p,32⟩
  - In A, substitute ⟨p,32⟩ → ⟨e,20⟩⟨f,12⟩, where e, f are fresh
    - A(τ) = ⟨e,20⟩⟨f,12⟩
  - In A, substitute ⟨f,12⟩ → ⟨g,10⟩⟨h,2⟩, where g, h are fresh
    - A(τ) = ⟨e,20⟩⟨g,10⟩⟨h,2⟩
53 Generating Zero Constraints
- Mask: p & 0x1
  - All but the 1st bit are zero
  - τ_{p & 0x1}[31:1] = 0^31