Title: Near-Concrete Program Interpretation
1. Near-Concrete Program Interpretation
- Paritosh Shroff
- The Johns Hopkins University
- Doctoral Thesis Proposal
2. Theme of the Thesis
- Develop a unified framework for static program analysis
- Higher-order with mutable state
- Flow-sensitive
- Context-sensitive
- Path-sensitive
- Automatically verify complex program properties
- Safety of array index accesses
- Temporal safety properties
- many more
3. Motivation
- Multitude of static analyses exist
- New analysis for every new problem
- Mix-and-Match existing analyses
- Higher-order and flow-sensitive analysis
- Very few existing analyses are higher-order
- Miss Object-Oriented (Java) programs
- Type systems are higher-order
- Most not flow-sensitive
4. Flow-sensitivity
- Temporal order of data-flow is accounted for by static analysis
- x := 0
- x := 5
- 25/(!x)
[Figure: x points to a heap location; its content 0 is overwritten by 5, so 25/(!x) evaluates to 5]
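5. Flow-insensitivity
- Temporal order of data-flow is ignored by static analysis
- x := 0
- x := 5
- 25/(!x)
[Figure: x points to a heap location with content {0, 5}, so 25/(!x) is reported as an error (possible division by zero)]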
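The contrast between the two slides can be sketched in a few lines of Python. This is only an illustration, not the analysis proposed in the thesis: it reads the three statements as x := 0; x := 5; 25/(!x) (the assignment operators are an assumption, since they are missing from the extracted text), and the tuple encoding of statements and both function names are hypothetical.

```python
# Hypothetical encoding of the toy program: x := 0; x := 5; 25/(!x)
PROGRAM = [("assign", 0), ("assign", 5), ("divide", 25)]


def flow_sensitive(program):
    """Track the current content of x in statement order."""
    cell = set()
    for op, k in program:
        if op == "assign":
            cell = {k}  # destructive update: the old content is forgotten
        else:
            return "error" if 0 in cell else {k // v for v in cell}


def flow_insensitive(program):
    """Merge every value ever assigned to x, ignoring statement order."""
    cell = {k for op, k in program if op == "assign"}
    for op, k in program:
        if op == "divide":
            return "error" if 0 in cell else {k // v for v in cell}


print(flow_sensitive(PROGRAM))    # {5}
print(flow_insensitive(PROGRAM))  # error
```

The flow-insensitive version must assume x may still hold 0 at the division, which is why it reports an error that cannot occur at run time.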
6. Motivating Example (progmot)
let flag = ref false in
let fact = λ_fact n. if (n = 0) then
                       flag := true; 1
                     else
                       fact (n - 1) * n in
fact(x); !flag
Value Range Analysis: if x ≥ 0 then the range of n is [0, x]. If x < 0 then the range of n is [-∞, x].
Flow-sensitivity: if progmot terminates, it computes to exactly true.
Nontermination: if x < 0, progmot computes forever.
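For readers who want to run the motivating example, here is a rough Python rendering. It assumes the garbled program reads "let flag = ref false in let fact = ... in fact(x); !flag" with the test n = 0; the one-element list standing in for the ref cell and the `seen` list used to observe the values of n are additions for illustration only.

```python
def progmot(x):
    """Python rendering of progmot; returns (!flag, observed range of n)."""
    flag = [False]  # one-element list stands in for `ref false`
    seen = []       # instrumentation only, not part of progmot

    def fact(n):
        seen.append(n)
        if n == 0:
            flag[0] = True  # flag := true; 1
            return 1
        return fact(n - 1) * n

    fact(x)
    return flag[0], (min(seen), max(seen))


print(progmot(5))  # (True, (0, 5))
```

For x < 0 the recursion never reaches n = 0; in Python this surfaces as a RecursionError rather than as a program that runs forever.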
7. Approach
- Develop a fundamental technique for abstract execution of programs
- Nearly identical to the concrete execution
- Mostly isomorphic
- Minimal abstraction
- Guaranteed termination
Near-Concrete Interpretation (NCI)
8. Challenges for NCI
- Detect the sources of nontermination in concrete interpretation (CI)
- Plug each source with an approximation
- Sound
- Minimally lossy
- Retaining much of the semantics of CI
9. What are the sources of nontermination?
- All programs have a finite number of statements
- Nonterminating programs must loop over a subset of statements infinitely often
- Imperative languages: for, while loops etc.
- e.g. for (i = 0; i != 99; i = i + 2)
- Functional languages: recursion
- for, while loops are syntactic sugar for recursion
Fundamental source of nontermination: recursion
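The claim that loops are sugar for recursion can be illustrated with a small sketch. The loop below uses a terminating bound (i < limit) so that it can actually be run; the function names are hypothetical.

```python
def loop_iterative(limit):
    """for (i = 0; i < limit; i = i + 2): count the iterations."""
    count = 0
    i = 0
    while i < limit:
        count += 1
        i = i + 2
    return count


def loop_recursive(i, limit):
    """The same loop with the back edge written as a recursive call."""
    if not (i < limit):
        return 0
    return 1 + loop_recursive(i + 2, limit)


print(loop_iterative(99), loop_recursive(0, 99))  # 50 50
```

If the loop condition can never become false, the recursive version recurses forever in exactly the same way the loop would spin forever.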
10. Example: Factorial function
λ_fact n. if (n = 0) then
            1
          else
            fact (n - 1) * n
⇒
λ_fact n. let r = if (n = 0) then
                    1
                  else
                    let y = fact (n - 1) in y * n
          in r
A-normal form (each program point has an associated program variable)
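The same transformation can be shown in Python as a sketch: the second function names every intermediate result (y for the recursive call, r for the result), mirroring the A-normal form on the slide. Both function names are chosen here for illustration.

```python
def fact_direct(n):
    """Factorial with nested subexpressions."""
    return 1 if n == 0 else fact_direct(n - 1) * n


def fact_anf(n):
    """Factorial where every intermediate result is bound to a variable."""
    if n == 0:
        r = 1
    else:
        y = fact_anf(n - 1)  # let y = fact (n - 1)
        r = y * n            # ... in y * n
    return r                 # in r


print(fact_direct(5), fact_anf(5))  # 120 120
```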
11. Flowchart for CI of fact(n)
λ_fact n. let r = if (n = 0) then
                    1
                  else
                    let y = fact (n - 1) in y * n
          in r
12. CI of fact(5)
Stack   Environment
fact    n ↦ 5, y ↦ 24, r ↦ 120
fact    n ↦ 4, y ↦ 6, r ↦ 24
fact    n ↦ 3, y ↦ 2, r ↦ 6
fact    n ↦ 2, y ↦ 1, r ↦ 2
fact    n ↦ 1, y ↦ 1, r ↦ 1
fact    n ↦ 0, r ↦ 1
13. CI of fact(-5)
Stack   Environment
fact    n ↦ -5
fact    n ↦ -6
fact    n ↦ -7
fact    n ↦ -8
...
fact    n ↦ -∞
never returns (r never gets a concrete binding)
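The stack frames listed for fact(5) and fact(-5) can be reproduced with a small tracing interpreter. This is a sketch for illustration: `ci_fact`, the `frames` list and the `max_depth` cut-off are hypothetical names, and the cut-off exists only so that the diverging case can be observed without exhausting the real stack.

```python
def ci_fact(n, frames, max_depth=8):
    """Concrete run of fact that records one environment per stack frame."""
    env = {"n": n}
    frames.append(env)
    if len(frames) > max_depth:  # hypothetical cut-off for the demo
        raise RuntimeError("stack keeps growing")
    if n == 0:
        env["r"] = 1
    else:
        env["y"] = ci_fact(n - 1, frames, max_depth)
        env["r"] = env["y"] * n
    return env["r"]


frames = []
ci_fact(5, frames)
for env in frames:
    print("fact", env)
```

Calling `ci_fact(-5, [])` hits the cut-off, and none of the recorded frames ever receives a binding for r.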
14. Sources of Divergence in Recursive Computation
- Arbitrary size of the environment
- Solution: find a finite representation for encoding possibly infinite values
- Arbitrary size of the stack
- Solution: place an upper bound on the size of the stack, not any random bound but a principled one
15. NCI Environment (E)
- Set of symbolic mappings like n ↦ n - 1
- Rules of a context-free generative grammar
- n ↦ 5, n ↦ n - 1 correspond to the grammar n → 5 | n - 1
- 5, (5 - 1), (5 - 1 - 1), ...
- 5, 4, 3, ..., 0, -1, ..., -∞
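Reading the two mappings for n as grammar rules (a base rule producing 5 and a recursive rule producing n - 1), the generated values can be enumerated up to a chosen number of unfoldings. The helper below is a hypothetical illustration of that reading, not part of NCI itself.

```python
def generate(base, step, unfoldings):
    """Values derivable from n -> base | step(n) using the recursive
    rule at most `unfoldings` times."""
    values = [base]
    for _ in range(unfoldings):
        values.append(step(values[-1]))
    return values


print(generate(5, lambda n: n - 1, 7))  # [5, 4, 3, 2, 1, 0, -1, -2]
```

Two mappings thus stand for an unbounded set of values, which is what lets the environment stay finite.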
16. NCI Stack (S)
- Contains at most one instance of any function
- Maximum size: number of functions in the program
We need a way to short-circuit recursive calls, but in a sound manner
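17. Flowchart for NCI of fact(n): prune-rerun technique
[Figure: flowchart of NCI on fact(n) with three cycles: rec-call cycle, rec-return cycle and rerun cycle. The test n = 0? branches to 1 (true) and to fact (n - 1) followed by y * n (false). At the recursive call the mappings n ↦ n - 1 and y ↦ r are added to the environment E and the call is pruned. The true branch adds r ↦ 1, the false branch adds r ↦ y * n. The flowchart is rerun until E reaches a fixed point.]
Recursion in CI is converted to iteration in NCI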
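The prune-rerun idea can be approximated by a very small sketch that is hard-wired to fact. Everything here is an assumption made for illustration: the environment is a set of (variable, expression) string pairs, the "decision procedure" is a two-line test that only understands the rule n ↦ n - 1, and the function names are hypothetical. It is not the algorithm of the thesis, only a toy with the same shape: push once, prune the recursive call, and rerun until the environment stops growing.

```python
def may_be_zero(env, start):
    # Toy decision procedure: once n -> n - 1 is present,
    # n ranges over all integers <= start.
    return start >= 0 if ("n", "n - 1") in env else start == 0


def may_be_nonzero(env, start):
    return start != 0 or ("n", "n - 1") in env


def call_fact(env, stack, start):
    if "fact" in stack:  # prune: never a second instance on the stack
        return
    stack.append("fact")  # push
    if may_be_zero(env, start):
        env.add(("r", "1"))  # then-branch
    if may_be_nonzero(env, start):
        env.update({("n", "n - 1"), ("y", "r")})  # bindings of the call
        call_fact(env, stack, start)  # pruned immediately
        env.add(("r", "y * n"))  # else-branch result
    stack.pop()  # pop


def nci_fact(start):
    env, runs = {("n", str(start))}, 0
    while True:  # rerun until the environment reaches a fixed point
        before = set(env)
        call_fact(env, [], start)
        runs += 1
        if env == before:
            return env, runs


for start in (5, -5):
    nugget, runs = nci_fact(start)
    print(start, sorted(nugget), runs)
```

For start = 5 the result contains r ↦ 1, and for start = -5 it does not. The loop terminates because mappings are only ever added and only finitely many distinct mappings can be formed.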
18. NCI of fact(5)
[Figure: first runs of the NCI flowchart on fact(5). fact is pushed with n ↦ 5; the recursive call fact (n - 1) is pruned and the environment grows to {n ↦ 5, n ↦ n - 1, y ↦ r, r ↦ y * n}. On the rerun the decision procedure gives n ≤ 5, so the true branch of n = 0? is also taken.]
new mapping r ↦ 1 added: {n ↦ 5, n ↦ n - 1, y ↦ r, r ↦ y * n, r ↦ 1}
environment has not reached a fixed point: rerun
19. NCI of fact(5)
[Figure: final run of the NCI flowchart on fact(5) with environment {n ↦ 5, n ↦ n - 1, y ↦ r, r ↦ y * n, r ↦ 1}. The decision procedure gives n ≤ 5, both branches of n = 0? are taken, the recursive call is pruned, and no new mapping is added.]
environment has reached a fixed point: this environment is the nugget; pop
20. NCI of fact(-5)
[Figure: runs of the NCI flowchart on fact(-5). fact is pushed with n ↦ -5; the recursive call fact (n - 1) is pruned and the environment grows to {n ↦ -5, n ↦ n - 1, y ↦ r, r ↦ y * n}. On the rerun the decision procedure gives n ≤ -5, so only the false branch of n = 0? is taken and no new mapping is added.]
environment has reached a fixed point: this environment is the nugget; pop
21. The NCI Nugget
- Distilled essence of all the value flows in the program
- Program properties can be read off it
fact(5): {n ↦ 5, n ↦ n - 1, y ↦ r, r ↦ y * n, r ↦ 1}
fact(-5): {n ↦ -5, n ↦ n - 1, y ↦ r, r ↦ y * n}
22. Nugget of fact(-5)
{n ↦ -5, n ↦ n - 1, y ↦ r, r ↦ y * n}
- n ↦ -5, n ↦ n - 1 ⇒ n ≤ -5, or range of n is [-∞, -5]
- Precise range of n
- y ↦ r, r ↦ y * n ⇒ r ↦ r * n
- r does not have a base case
- r is not bound to any concrete value
- Implies fact(-5) does not terminate
NCI can detect nontermination when it can conclusively decide that the base case of recursive computation is unreachable
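The observation that r has no base case can be checked mechanically by asking which variables of a nugget can derive at least one concrete value. The sketch below assumes a simplified encoding in which each mapping is reduced to the set of variables its right-hand side mentions (empty for a constant); the encoding and the function name are hypothetical.

```python
def grounded_vars(nugget):
    """Variables that can derive at least one concrete value.

    nugget maps each variable to a list of right-hand sides; a right-hand
    side is the set of variables it mentions (empty set for a constant).
    """
    grounded = set()
    changed = True
    while changed:
        changed = False
        for var, alternatives in nugget.items():
            if var not in grounded and any(rhs <= grounded for rhs in alternatives):
                grounded.add(var)
                changed = True
    return grounded


# n -> -5 | n - 1,  y -> r,  r -> y * n
NUGGET_NEG5 = {"n": [set(), {"n"}], "y": [{"r"}], "r": [{"y", "n"}]}
# n -> 5 | n - 1,  y -> r,  r -> y * n | 1
NUGGET_POS5 = {"n": [set(), {"n"}], "y": [{"r"}], "r": [{"y", "n"}, set()]}

print(sorted(grounded_vars(NUGGET_NEG5)))  # ['n']
print(sorted(grounded_vars(NUGGET_POS5)))  # ['n', 'r', 'y']
```

In the fact(-5) nugget neither r nor y is grounded, matching the slide's reading that r is never bound to a concrete value.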
23. Nugget of fact(5)
{n ↦ 5, n ↦ n - 1, y ↦ r, r ↦ y * n, r ↦ 1}
- n ↦ 5, n ↦ n - 1 ⇒ n ≤ 5
- ⇒ range of n is [-∞, 5]
- Conservative approximation
- Precise range of n in fact(5) is [0, 5]
- [0, 5] ⊆ [-∞, 5]
- Distilled essence of all the value flows in the program
- Program properties can be read off it
24. Convergence of NCI
- E is strongly bounded
- Monotonically increasing
- Range is a subset of program subexpressions
- S is strongly bounded
- Reruns are triggered only by new mappings
NCI is a finite state system with no infinite
paths through it
25. Extensions to the core NCI
- Path-sensitivity (NCIP)
- Context-sensitivity (NCI?)
- Mutable state (NCIH)
26. Path-sensitivity (NCIP)
- n ↦ 5, n ↦ (n - 1)^(n ≠ 0) ⇒ 0 ≤ n ≤ 5
- n ↦ (n - 1)^(n ≠ 0) can be used as a generation rule only when n ≠ 0
- Range of n is [0, 5]
Tag mappings with branch conditions in force at their point of addition
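The effect of tagging the recursive rule with its branch condition can be sketched as guarded generation: the rule n - 1 is applied only while the guard holds. The sketch assumes the guard on the slide is n ≠ 0 (the operator is missing from the extracted text); the function name and the `limit` cut-off, which only matters for negative starting values, are hypothetical.

```python
def guarded_range(start, limit=1000):
    """Range of n generated by n -> start | (n - 1) when n != 0."""
    values, n = [start], start
    while n != 0 and len(values) < limit:  # rule usable only when n != 0
        n = n - 1
        values.append(n)
    return min(values), max(values)


print(guarded_range(5))  # (0, 5)
```

Without the guard the same two rules generate every integer below 5, which is the conservative range reported by the core analysis.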
27. Mutable state (NCIH)
- Abstract heap H with destructive updates
- Operations on H
- Flow-sensitive
- Mimic those of CI
- Prune-rerun technique extended to find
fixed-point of H in addition to E
28. Properties of NCI
- Soundness: if an expression has an NCI then its CI either computes to a value or computes forever.
- Termination: NCI terminates on all inputs.
- Runtime complexity: the runtime complexity of NCI is exponential in the size of higher-order programs, and polynomial in the size of first-order programs, modulo the complexity of the decision procedure.
29. Applications
- Value Range Analysis
- NCIP: range of n in fact(5) is [0, 5]
- Array bounds analysis
- Value range analysis on the index of the array
- Enforcement of Temporal Safety Properties
- Flow-sensitivity tracks temporal order accurately
- File management
- Files have to be opened before reading or writing
- Only opened files can be closed
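The file-management property is a small state machine over the order of operations. The sketch below checks it on a concrete list of events, purely to make the property precise; a static analysis would check the same rule against the event order it infers. The event names and the function are hypothetical, and reopening an already open file is not treated as a violation since the slide does not say.

```python
def check_file_protocol(events):
    """Return the index of the first violating event, or None if safe."""
    is_open = False
    for i, event in enumerate(events):
        if event == "open":
            is_open = True
        elif event in ("read", "write", "close"):
            if not is_open:
                return i  # used or closed a file that is not open
            if event == "close":
                is_open = False
    return None


print(check_file_protocol(["open", "read", "close"]))  # None
print(check_file_protocol(["open", "close", "read"]))  # 2
```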
30. Related Work
- Abstract Interpretation [Cousot77]
- Not higher-order
- Higher-order abstract interpretation [Cousot94, Rosendahl97]
- Not flow-sensitive
- ESP [Yang04], SLAM [Rajamani01], Shape analyses [Reps04]
- Not higher-order
- Array bounds [Pfenning98, Sarkar00]
- Programmer annotations and not higher-order, respectively
31. To be done for completion
- Empirical Evaluation
- Test on realistic programs
- Decision Procedures
- Develop heuristics to improve efficiency in practice
- Explore Potential Applications
- Information flow analysis
- Verify safety of legacy C code
- Memory leak detection
32. Acknowledgements
- Prof. Scott Smith
- Advisor
- Assist. Prof. Christian Skalka, University of Vermont
- Collaborator
- Prof. Harry Mairson, Brandeis University
- Discussions on runtime complexity of NCI
33. Thanks