Title: Symbolic Path Simulation in Path-Sensitive Dataflow Analysis
1. Symbolic Path Simulation in Path-Sensitive Dataflow Analysis
- Hari Hampapuram
- Jason Yue Yang
- Manuvir Das
Center for Software Excellence (CSE), Microsoft Corporation
2. Gist of Results
- Symbolic path simulation engine supporting:
  - Merge: for merge-based path-sensitive analysis
  - Function summaries: for scalable global analysis
  - Pointers
- Our main client is Windows
3. Infeasible Path ⇒ False Positive
    extern int a, b;
    void Process(int handle) {
        int x, y;
        if (a > 0) {
            CloseHandle(handle);
            x = 1;
        } else {
            x = 2;
        }
        if (b > 0)
            y = 1;
        else
            y = 2;
        if (x != 1)
            UseHandle(handle);
    }
[Figure: handle property state machine with states START, OPEN, CLOSE, and ERROR, and transitions labeled OpenHandle, CloseHandle, UseHandle, UseHandle]
5. Need for Merge
- The knob for the scalability vs. precision tradeoff
  - Always merge (traditional dataflow) → false errors
  - Always separate → exponential blow-up
- Driven by client analyses
6. Merge Criterion for ESP
- Selective merging based on property states
- Partition symbolic states into property states and everything else
- If the incoming paths differ in property states, track them separately; otherwise, merge them
11. Merge Criterion for ESP: Example
    extern int a, b;
    void Process(int handle) {
        int x, y;
        if (a > 0) {
            CloseHandle(handle);
            x = 1;
        } else {
            x = 2;
        }
        if (b > 0)
            y = 1;
        else
            y = 2;
        if (x != 1)
            UseHandle(handle);
    }
- Join after if (a > 0): property states change along the paths → do not merge
- Join after if (b > 0): property states are the same → merge
- The merged state still maintains the needed fact: if CloseHandle is called, the branch x != 1 should fail
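The merge decisions in this example can be sketched in a few lines of Python. This is a minimal illustration, not ESP's implementation: the names `merge_at_join` and `join_exec` are hypothetical, the handle is assumed to be OPEN on entry to Process, and an execution state is simplified to a dictionary of known constant values.

```python
def join_exec(e1, e2):
    """Join two execution states: keep only the facts on which both agree."""
    return {v: e1[v] for v in e1 if v in e2 and e1[v] == e2[v]}


def merge_at_join(states):
    """Apply the property-state merge criterion at a join point.

    `states` is a list of (property_state, exec_state) pairs. States with
    different property states stay separate; states with the same property
    state are merged by joining their execution states.
    """
    merged = {}
    for prop, exec_state in states:
        if prop in merged:
            merged[prop] = join_exec(merged[prop], exec_state)
        else:
            merged[prop] = dict(exec_state)
    return merged


# Join after `if (a > 0)`: one path closed the handle, the other did not.
after_first = merge_at_join([("CLOSE", {"x": 1}), ("OPEN", {"x": 2})])

# Each surviving state splits on `b > 0` and meets again at the next join.
incoming = []
for prop, exec_state in after_first.items():
    incoming.append((prop, {**exec_state, "y": 1}))
    incoming.append((prop, {**exec_state, "y": 2}))
after_second = merge_at_join(incoming)

# The CLOSE state still knows x == 1, so `x != 1` is refuted on that path.
close_path_reaches_use = after_second["CLOSE"].get("x") != 1
```

Four paths reach the second join, but only two states survive it: the value of y is dropped, while the correlation between the property state and x is kept.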
12. Need for Function Summaries
    extern int a, b;
    void Process(int handle) {
        int x, y;
        if (a > 0) {
            CloseHandle(handle);
            x = 1;
        } else {
            x = 2;
        }
        if (b > 0)
            y = Foo(b);
        else
            y = 2;
        if (x != 1)
            UseHandle(handle);
    }
- Partial transfer functions
- Computed on demand
- Enforced by into-binding and back-binding
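One way to picture a partial transfer function computed on demand is a cache keyed by the callee and the state bound into it. The sketch below is illustrative only: `SummaryTable`, `analyze_callee`, and `call` are hypothetical names, abstract values are plain strings, and the body of Foo (not given on the slide) is assumed to return its argument unchanged.

```python
class SummaryTable:
    """Partial transfer functions: defined only for entry states seen so far."""

    def __init__(self, analyze):
        self._analyze = analyze  # callback: (function, entry_state) -> exit_state
        self._cache = {}
        self.computed = 0        # number of times a callee was actually analyzed

    def apply(self, function, entry_state):
        key = (function, entry_state)
        if key not in self._cache:  # computed on demand, then reused
            self.computed += 1
            self._cache[key] = self._analyze(function, entry_state)
        return self._cache[key]


def analyze_callee(function, entry_state):
    # Stand-in for simulating the callee body. Assumption: the callee
    # returns the abstract value of its first argument unchanged.
    facts = dict(entry_state)
    return frozenset({("ret", facts["arg0"])})


def call(table, function, caller_state, actual, result_var):
    # Into-binding: map the caller's actual argument onto the callee's input.
    entry_state = frozenset({("arg0", caller_state[actual])})
    exit_state = dict(table.apply(function, entry_state))
    # Back-binding: map the callee's return value onto the caller's variable.
    new_state = dict(caller_state)
    new_state[result_var] = exit_state["ret"]
    return new_state


table = SummaryTable(analyze_callee)
state_a = call(table, "Foo", {"b": "positive", "x": 2}, "b", "y")
state_b = call(table, "Foo", {"b": "positive", "x": 1}, "b", "y")
```

Both calls bind the same entry state into Foo, so the second call reuses the summary even though the callers differ in x.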
13. Support for Language Features
- Pointers
- Field-based objects
- Operator expressions
- ...
14. Symbolic Simulator Architecture
[Figure: architecture diagram with the components Client Application (defect detection, core dump analysis, test generation, code review, ...), Semantic translator, Simulation Interface (SI), Simulation State Manager (SSM), and Theorem prover]
15. Semantic Domains
- Environment
  - ProgramSymbol → Loc
  - Managed by Simulation Interface
- Store
  - Loc → Val
  - Managed by Simulation State Manager
- Region-based model for symbolic store
  - region ∈ Loc
  - value ∈ Val
16. Simulation State Manager (SSM)
- Tracking symbolic simulation states to answer queries about path feasibility
- What should be tracked?
  - Mapping of store: region → value
  - Constraints on values
17. Regions
- Variable regions vs. deref regions
  - Important for pointer dereference
  - Important for supporting merge and binding
- Variable regions: R(p), R(q), R(x), R(y)
- Deref regions: R(*p), R(*q)
    void Process(int *p, int *q) {
        int x = *p;
        int y = *q;
        if (p != q)
            return;
        if (*p != *q) {
            // Not reachable
        }
    }
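The role of deref regions in this example can be illustrated with a small store in which a deref region is identified by the pointer's value rather than by the pointer variable. This is a simplified sketch under stated assumptions: the `Store` class and its methods are hypothetical, symbolic values are plain strings, and pointer equality is recorded by substitution rather than by the equivalence classes a real state manager would use.

```python
class Store:
    """Maps regions to symbolic values, creating fresh values lazily."""

    def __init__(self):
        self.values = {}
        self._counter = 0

    def _fresh(self, hint):
        self._counter += 1
        return f"{hint}_{self._counter}"

    def _read(self, region, hint):
        if region not in self.values:
            self.values[region] = self._fresh(hint)
        return self.values[region]

    def read_var(self, name):
        # Variable region R(name).
        return self._read(("var", name), name)

    def read_deref(self, name):
        # Deref region R(*name), keyed by the value currently held in R(name).
        pointer_value = self.read_var(name)
        return self._read(("deref", pointer_value), "*" + name)

    def assume_pointers_equal(self, a, b):
        # Record a == b by giving R(b) the same symbolic value as R(a).
        self.values[("var", b)] = self.read_var(a)


store = Store()
x = store.read_deref("p")  # int x = *p;
y = store.read_deref("q")  # int y = *q;

# Before anything is known about p and q, *p and *q are distinct symbolic
# values, so a branch on `*p != *q` could not be refuted.
may_differ_before = store.read_deref("p") != store.read_deref("q")

# Falling through `if (p != q) return;` means p == q.
store.assume_pointers_equal("p", "q")

# Now R(*p) and R(*q) are the same region, so `*p != *q` is infeasible.
may_differ_after = store.read_deref("p") != store.read_deref("q")
```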
18. Values
- Constant values (integers, floats, ...)
- Operator values (arithmetic, bitwise, relational)
- Symbolic values (general constraint variables)
- Region-initial values (constraint variables for initial values)
- Pointer values (for points-to relationship)
- Field-based values (for compound types)
19. Need for Region-Initial Values
- Important for function summaries
  - Pre-condition: simulation state at Entry node
  - Post-condition: simulation state at Exit node
- Input values vs. current values
- To support lazy initialization for input values
  - An input region gets region-initial values by default, unless it has been killed
  - Need to maintain a kill set
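Lazy initialization with a kill set can be sketched as follows. The class and method names are hypothetical, and a region-initial value is modeled as a tagged tuple. In this simplified sketch the kill set happens to coincide with the set of written regions; it is kept as a separate field to mirror the description above.

```python
class SummaryState:
    """Simulation state of a callee, relative to its Entry node."""

    def __init__(self):
        self.current = {}    # region -> current value, for written regions only
        self.killed = set()  # regions overwritten since Entry

    def initial(self, region):
        # Region-initial value: a constraint variable standing for whatever
        # the region held at Entry.
        return ("init", region)

    def read(self, region):
        if region in self.killed:
            return self.current[region]
        # Lazy initialization: nothing is stored until the region is written.
        return self.initial(region)

    def write(self, region, value):
        self.killed.add(region)
        self.current[region] = value


state = SummaryState()
before = state.read("g")     # input value of g
state.write("g", 5)
after = state.read("g")      # current value of g
untouched = state.read("h")  # h was never written, so still its input value
```

Because `initial("g")` stays available after the write, a post-condition at the Exit node can still be phrased in terms of the input value of g.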
20. Decision Procedures
- Current implementation
  - Equality (e.g. a == b): equivalence classes
  - Disequality (e.g. a != b): multi-maps between equivalence classes
  - Inequality (e.g. a < b): a graph (nodes are equivalence classes and edges are inequality relations)
- Can plug in other theorem provers if needed
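The three structures listed above can be combined into a small feasibility checker. The sketch below is an assumption-laden illustration rather than the tool's implementation: `Constraints` and its methods are hypothetical names, equivalence classes use a union-find, disequalities are kept as a set of pairs instead of a multi-map, and the inequality graph handles only strict `<`. Each `assume_*` method returns False when the new constraint contradicts what is already known, which corresponds to an infeasible path.

```python
class Constraints:
    def __init__(self):
        self.parent = {}    # union-find forest for equivalence classes
        self.diseq = set()  # pairs (a, b) meaning a != b
        self.less = set()   # edges (a, b) meaning a < b

    def find(self, v):
        """Return the representative of v's equivalence class."""
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def _reaches(self, src, dst):
        """True if a chain of < edges leads from class src to class dst."""
        seen, stack = {src}, [src]
        while stack:
            node = stack.pop()
            for lo, hi in self.less:
                if self.find(lo) == node:
                    target = self.find(hi)
                    if target == dst:
                        return True
                    if target not in seen:
                        seen.add(target)
                        stack.append(target)
        return False

    def known_different(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        for x, y in self.diseq:
            if {self.find(x), self.find(y)} == {ra, rb}:
                return True
        return self._reaches(ra, rb) or self._reaches(rb, ra)

    def assume_eq(self, a, b):
        if self.known_different(a, b):
            return False
        self.parent[self.find(a)] = self.find(b)
        return True

    def assume_ne(self, a, b):
        if self.find(a) == self.find(b):
            return False
        self.diseq.add((a, b))
        return True

    def assume_lt(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb or self._reaches(rb, ra):
            return False
        self.less.add((a, b))
        return True


c = Constraints()
ok_ab = c.assume_lt("a", "b")
ok_bc = c.assume_lt("b", "c")
eq_ac = c.assume_eq("a", "c")  # contradicts a < b < c
lt_ca = c.assume_lt("c", "a")  # would close a cycle of strict inequalities
```

Constraints are stored under their original names and mapped to class representatives at query time, so merging two classes does not require rewriting existing edges.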
21. Merge
- Moves symbolic states upward in the lattice
  - Fewer constraints on path feasibility after merge
- Maps the memory graphs and the associated constraints on values
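[Figure: JOIN of two memory graphs into a third. In each of the three graphs, region R1 holds 0xEFD0 and region R2 holds a symbolic value (numbered 1, 2, and 3 across the graphs), each constrained to be > 0]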
22. Example Client Analysis: ESP
- Path-sensitive, context-sensitive, inter-procedural defect detection tool for large C/C++ programs
23. Simulation Interface (SI)
- Fetching regions and values
- Assignments
  - E.g., x = 1
- Branches
  - E.g., a == b
- Procedure call (into-binding)
- Call back (back-binding)
24. Into-Binding
- Two approaches
  - Binding precise calling context into callee
    - Less demand on reasoning power to refute infeasible paths
    - More suitable for top-down analysis
  - Binding no constraints (TOP) into callee
    - More demand on reasoning power to refute infeasible paths
    - More suitable for bottom-up analysis
- Binding from caller Call node to callee Entry node
  - Bind parameters
  - Bind global variables
  - Bind constraints
25. Back-Binding
- Binding from callee Exit node to caller Return node
  - Bind the region-initial values of input regions
  - Bind values of output regions
  - Bind constraints
26. Experiences
- Security properties for a future version of Windows
  - Difficult to check with other tools
- Scalability
  - E.g., for all device drivers, found 500 errors in 20 hours
- Precision
  - E.g., for Windows kernel (216,000 LOC, 9755 functions)
                             Bugs   False Positives   Time (sec)
    With Path Simulation     2      0                 1098
    Without Path Simulation  2      12                1037
27. Summary
- Critical for improving precision
- Scalable enough for industrial programs
- Other client analyses
  - PSE
  - Iterative refinement for ESP
- Beneficial to have built-in support for merge, function summaries, and pointers
28. Thank You!
For more information, please visit http://www.microsoft.com/windows/cse/pa