Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability

1
Context-Sensitive, Interprocedural Dataflow
Analysis as CFL Reachability

Seth Hallem and Eric Watkins

2
Exhaustive Analysis Papers

Precise Interprocedural Dataflow Analysis via
Graph Reachability
Reps, Horwitz, Sagiv -- POPL 1995
Applies CFL reachability to context-sensitive,
interprocedural dataflow analysis
Program Analysis via Graph Reachability
Reps -- ILPS 1997
Describes two additional applications:
interprocedural program slicing and shape analysis

3
The Reduction to CFL Reachability

Question 1: What problems can we solve?
Question 2: How do we set up the problem?
Question 3: How do we solve the problem?
Question 4: What is the complexity of this
approach?
Running example: possibly uninitialized variables

4
What problems can we solve?

IFDS problems
A finite set of dataflow facts (D)
A mapping from edges in the CFG to dataflow
functions $2^D \to 2^D$
Each function $f$ is distributive with respect to the
meet operator: $f(a \sqcap b) = f(a) \sqcap f(b)$
Possibly uninitialized variables
Each program variable corresponds to a dataflow
fact. When that fact holds, the variable may be
uninitialized.
Transfer functions: a variable is possibly uninitialized
if it was just declared or if it is assigned an
expression containing possibly uninitialized variables.
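
The transfer functions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `declare`, `assign`, `powerset`, and `is_distributive` are hypothetical, facts are modeled as variable names, and the meet operator is taken to be set union (a "may" problem). The brute-force check confirms distributivity over every pair of subsets of a small D.

```python
from itertools import combinations

D = frozenset({"x", "y", "z"})

def declare(var):
    """Transfer function for a declaration: var becomes possibly uninitialized."""
    return lambda facts: frozenset(facts) | {var}

def assign(target, used):
    """Transfer function for `target = expr`; `used` holds the variables read by expr."""
    used = frozenset(used)
    def f(facts):
        facts = frozenset(facts)
        if facts & used:                 # expr reads a possibly uninitialized variable
            return facts | {target}
        return facts - {target}          # otherwise target is now initialized
    return f

def powerset(s):
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_distributive(f, domain):
    """Check f(a | b) == f(a) | f(b) for all subsets a, b (meet assumed to be union)."""
    subsets = powerset(domain)
    return all(f(a | b) == f(a) | f(b) for a in subsets for b in subsets)

print(is_distributive(declare("x"), D))
print(is_distributive(assign("y", {"x", "y"}), D))
```

Both functions only add or remove a single fact based on the presence of individual facts in the input, which is why they distribute over union.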

5
Simple Example
int z;
int main(void) {
    int x, y = 0;   /* {x, z} */
    y = y + x;      /* {x, y, z} */
    z = 0;          /* {x, y} */
}

D = {x, y, z}; the domain and range of the transfer
functions are the power set of D ($2^D$)

6
How do we set up and solve IFDS problems?

Inputs to the algorithm:
Exploded supergraph (next couple of slides)
Outputs from the algorithm:
Meet-over-all-realizable-paths solution
$MRP_n = \sqcap_{q \in RPaths(start_{main},\, n)} \; pf_q(\top)$
7
The Supergraph
8
Representation Relations

Each dataflow function $f$ is converted to a
representation relation, which is represented as
a graph with $2|D| + 2$ nodes:
$|D|$ input nodes, one for each dataflow fact, plus
the node Λ (or 0), which corresponds to the empty
set.
$|D|$ output nodes plus the node Λ
There is an edge from input node d1 to output
node d2 if $d_2 \in f(S)$ whenever $d_1 \in S$, and $d_2 \notin f(\emptyset)$
Input node Λ has an edge to output node Λ and to
every d2 with $d_2 \in f(\emptyset)$
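
As a rough sketch of how a representation relation could be built from a distributive function, the code below enumerates the edges following the Reps, Horwitz, Sagiv construction. The names `representation_relation` and `apply_relation` and the string used for the Λ node are assumptions made for illustration.

```python
LAMBDA = "Λ"  # the extra node that stands for the empty set

def representation_relation(f, D):
    """Return the edge set {(d1, d2)} representing a distributive f: 2^D -> 2^D."""
    empty_out = f(frozenset())
    edges = {(LAMBDA, LAMBDA)}
    # Facts generated regardless of the input hang off the Λ node.
    edges |= {(LAMBDA, d2) for d2 in empty_out}
    for d1 in D:
        # Facts that d1 itself causes, excluding those already generated from Λ.
        for d2 in f(frozenset({d1})) - empty_out:
            edges.add((d1, d2))
    return edges

def apply_relation(edges, facts):
    """Recover f(facts) by following edges from the given facts and from Λ."""
    sources = set(facts) | {LAMBDA}
    return frozenset(d2 for d1, d2 in edges if d1 in sources and d2 != LAMBDA)

# Example: declaring x makes x possibly uninitialized and leaves y untouched.
declare_x = lambda S: frozenset(S) | {"x"}
relation = representation_relation(declare_x, {"x", "y"})
print(sorted(relation))
```

Following the edges from a set of input facts (plus Λ) gives back the function's result, so the graph loses no information for distributive functions.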

9
More Representation Relations

(a) and (b) show representation relations for two
functions (nodes $s_{main}$ and n1)
(c) and (d) show two ways to compose these
relations
(d) illustrates the need for the Λ in each
relation
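
The slide's figures are not reproduced in this transcript, so the following is a stand-in example rather than the one shown in (a) through (d). It composes two hand-written relations over hypothetical facts a, b, c, once with the Λ edges and once with them stripped, to suggest why Λ is needed: without it, facts generated by a constant function vanish under composition.

```python
LAMBDA = "Λ"

# f1 = lambda S: {a}             (constant function: generates a, kills everything else)
R1 = {(LAMBDA, LAMBDA), (LAMBDA, "a")}
# f2 = lambda S: (S - {a}) | {b} (kills a, generates b, passes c through)
R2 = {(LAMBDA, LAMBDA), (LAMBDA, "b"), ("c", "c")}

def compose(r1, r2):
    """Relational composition: follow an edge of r1, then an edge of r2."""
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}

def apply_relation(rel, facts):
    sources = set(facts) | {LAMBDA}
    return {d2 for d1, d2 in rel if d1 in sources and d2 != LAMBDA}

def strip_lambda(rel):
    """Drop every edge that touches Λ, to see what breaks."""
    return {(a, b) for a, b in rel if LAMBDA not in (a, b)}

with_lambda = apply_relation(compose(R1, R2), {"c"})
without_lambda = apply_relation(compose(strip_lambda(R1), strip_lambda(R2)), {"c"})
print(with_lambda, without_lambda)
```

The true result of f2(f1({c})) is {b}; only the composition that keeps the Λ edges reproduces it.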

10
Exploding the Supergraph
11
CFL Reachability

We want to solve the dataflow problem with a
reachability query on the exploded supergraph.
Not all paths in the graph are valid, though: calls
must match returns.
Insight: context sensitivity = matching parentheses;
the language of matching parentheses is a CFL

12
Context-Sensitivity = CFL

Assign a unique index to each call site, and define a
CFL of matching calls and returns.
Suppose we have two call sites to function P(),
which we label i and k
$(_i \; (_k \; )_k \; )_i$ is a valid path
$(_i \; (_k \; )_k$ is a valid path (unmatched calls are allowed)
$(_i \; (_k \; )_i$ is not
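
A minimal sketch of the matching rule, assuming a path is given as a list of (kind, call-site) labels; the function name `is_valid_path` and the label encoding are illustrative only. A stack tracks pending calls: a return must match the most recent unreturned call, while calls left open at the end are fine because the path may stop inside a callee.

```python
def is_valid_path(labels):
    """labels: sequence of ("call", site) or ("ret", site) edge labels."""
    pending = []                      # call sites entered but not yet returned from
    for kind, site in labels:
        if kind == "call":
            pending.append(site)
        elif kind == "ret":
            # A return must match the innermost pending call.
            if not pending or pending.pop() != site:
                return False
    return True                       # leftover pending calls are allowed

print(is_valid_path([("call", "i"), ("call", "k"), ("ret", "k"), ("ret", "i")]))
print(is_valid_path([("call", "i"), ("call", "k"), ("ret", "k")]))
print(is_valid_path([("call", "i"), ("call", "k"), ("ret", "i")]))
```

The three calls correspond to the three paths listed above, in the same order.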

13
Reachability Algorithm

Dynamic programming is the key
Start at the entry point of the program. Follow
the edges in the exploded supergraph, recording which
dataflow facts we can reach.
At a procedure call, follow the call. To avoid
redoing any work, though, maintain a cache of
edges that summarize pieces of the
computation.
Summary edges record the results of an entire
procedure: they start at a call site and end at the
corresponding return site.
Path edges record the suffix of a valid path.
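
The worklist scheme described above might look roughly like the sketch below. It is a simplified reading of the tabulation idea, not the papers' exact pseudocode: the input encoding (`intra`, `call_flow`, `ret_flow`, `sites`) and all node names are assumptions chosen for brevity. The toy program calls a procedure p twice, once while x may be uninitialized and once after x = 0.

```python
from collections import defaultdict, deque

LAMBDA = "Λ"

def tabulate(start, intra, call_flow, ret_flow, sites):
    """Return the exploded nodes (n, d) reachable along realizable paths.

    intra[(n, d)]         -> intraprocedural successors (incl. call-to-return-site)
    call_flow[(c, d)]     -> exploded entry nodes of the callee
    ret_flow[(c, e, d)]   -> facts holding at c's return site when d holds at exit e
    sites[c]              -> (callee_exit, return_site) for call node c
    """
    exits = {exit_node for exit_node, _ in sites.values()}
    callers = defaultdict(set)                 # callee entry -> exploded call nodes
    for call_node, targets in call_flow.items():
        for target in targets:
            callers[target].add(call_node)
    path_edges, worklist = set(), deque()
    edges_into = defaultdict(set)              # node -> sources of path edges into it
    summary = defaultdict(set)                 # (c, d) -> {(return_site, d')}

    def propagate(src, dst):
        if (src, dst) not in path_edges:
            path_edges.add((src, dst))
            edges_into[dst].add(src)
            worklist.append((src, dst))

    propagate(start, start)
    while worklist:
        src, (n, d) = worklist.popleft()
        if n in sites:                         # call node: enter callee, reuse summaries
            for callee_entry in call_flow.get((n, d), ()):
                propagate(callee_entry, callee_entry)
            for dst in set(intra.get((n, d), ())) | summary[(n, d)]:
                propagate(src, dst)
        elif n in exits:                       # exit node: record new summary edges
            for c, d4 in callers[src]:
                exit_node, return_site = sites[c]
                if exit_node != n:
                    continue
                for d5 in ret_flow.get((c, n, d), ()):
                    if (return_site, d5) not in summary[(c, d4)]:
                        summary[(c, d4)].add((return_site, d5))
                        for caller_src in tuple(edges_into[(c, d4)]):
                            propagate(caller_src, (return_site, d5))
        else:                                  # ordinary node
            for dst in intra.get((n, d), ()):
                propagate(src, dst)
    return {dst for _, dst in path_edges}

intra = {
    ("s_main", LAMBDA): {("c1", LAMBDA), ("c1", "x")},  # x declared, not initialized
    ("c1", LAMBDA): {("r1", LAMBDA)},                   # call-to-return-site edge
    ("r1", LAMBDA): {("c2", LAMBDA)},                   # x = 0 kills the fact x
    ("c2", LAMBDA): {("r2", LAMBDA)},
    ("s_p", LAMBDA): {("e_p", LAMBDA)},                 # p leaves x alone
    ("s_p", "x"): {("e_p", "x")},
}
call_flow = {(c, d): {("s_p", d)} for c in ("c1", "c2") for d in (LAMBDA, "x")}
ret_flow = {(c, "e_p", d): {d} for c in ("c1", "c2") for d in (LAMBDA, "x")}
sites = {"c1": ("e_p", "r1"), "c2": ("e_p", "r2")}
reached = tabulate(("s_main", LAMBDA), intra, call_flow, ret_flow, sites)
print(("r1", "x") in reached, ("r2", "x") in reached)
```

In this toy input the fact x reaches the first return site but not the second, even though both calls share the same callee body; a plain graph search that ignored call/return matching would report x at both.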

14
Dynamic Programming Details
15
Complexity

Worst case for general CFL reachability is cubic
in the number of nodes in the graph
Can do better for dataflow analysis: $O(ED^3)$ for
any distributive problem, $O(Call \cdot D^3 + hED^2)$ for
h-sparse problems
Possibly uninitialized variables is 2-sparse when
aliasing is ignored: a variable's status as
initialized or uninitialized can only affect
itself and one other variable (if it is assigned
to that variable)

16
Other Applications

Interprocedural slicing
identify all pieces of a program relevant to a
particular statement
Shape Analysis
For any DAG data structure, determines a superset
of the possible shapes for that data structure.
Each dataflow fact corresponds to a single
possible shape.
Problem: there is an infinite number of shapes. The
solution is to define the shape at program point q in
terms of the shapes at previous program points.
The ILPS paper has an example of shape analysis of a
linked list.

17
The other papers

Demand Interprocedural Dataflow Analysis
Horwitz, Reps, Sagiv -- FSE 1995
Demand-driven Computation of Interprocedural
Data Flow
Duesterwald, Gupta, Soffa -- POPL 1995
Provide two possible frameworks for transforming
any IFDS analysis into a demand-driven analysis

18
Steps to Demand-driven analysis

Define problem in the IFDS framework
Reverse the flow functions, or reverse the flow
edges
Start with initial query <d, n>
Propagate the query backwards until solved

19
Reversing dataflow

In Duesterwald et al., the dataflow problem is
specified with flow functions
Reverse the functions
For CFL problems, the problem is represented as a
set of edges
Just reverse the edges
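
For the edge-based representation, reversal is mechanical. The sketch below assumes the exploded graph is stored as a dict from exploded node to successor set, and it handles only a single procedure (no call/return matching); `reverse_edges`, `reachable`, and the tiny graph are illustrative names and data, not taken from either paper.

```python
from collections import defaultdict

LAMBDA = "Λ"

def reverse_edges(edges):
    """Flip every edge src -> dst into dst -> src."""
    reversed_edges = defaultdict(set)
    for src, dsts in edges.items():
        for dst in dsts:
            reversed_edges[dst].add(src)
    return reversed_edges

def reachable(edges, start):
    """Plain graph search; returns every node reachable from start."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Exploded edges for: s: declare x;  n1: y = x;  n2: (end)
edges = {
    ("s", LAMBDA): {("n1", LAMBDA), ("n1", "x")},
    ("n1", LAMBDA): {("n2", LAMBDA)},
    ("n1", "x"): {("n2", "x"), ("n2", "y")},
}
entry = ("s", LAMBDA)
backward = reverse_edges(edges)

# Query <y, n2>: does a backward walk from (n2, y) get back to the entry?
print(entry in reachable(backward, ("n2", "y")))
```

A query on one node only explores the part of the graph that can influence that node, which is where the demand-driven savings come from.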

20
Example: CCP (copy constant propagation)

Notation
x: set of dataflow facts
$x_w$: dataflow fact for variable w
$f_n(x)_w$: transfer function for variable w at node n
w = c: set of dataflow facts in which the fact
for variable w equals c

21
Query Algorithm

The worklist holds the set of outstanding queries
While the worklist is not empty, remove a query
Propagate it backwards one node in the flowgraph
For a function call, create a backwards summary
for that function and apply it
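
The loop above can be sketched for a single procedure as follows. This illustration uses possibly uninitialized variables rather than CCP, omits the function-call case entirely, and invents its own statement encoding (`stmts`, `preds`) and function name (`may_be_uninitialized`). A query (v, n) asks whether v may be uninitialized just after node n.

```python
from collections import deque

def may_be_uninitialized(var, node, stmts, preds):
    """Answer the query <var, node> by propagating it backwards through the CFG."""
    worklist = deque([(var, node)])
    seen = {(var, node)}                   # cache of queries already raised
    while worklist:
        v, n = worklist.popleft()
        stmt = stmts[n]
        if stmt[0] == "decl" and stmt[1] == v:
            return True                    # early termination: the query is answered
        if stmt[0] == "assign" and stmt[1] == v:
            wanted = stmt[2]               # v depends on the variables read at n
        else:
            wanted = (v,)                  # n does not touch v: ask the same question
        for p in preds.get(n, ()):
            for u in wanted:
                if (u, p) not in seen:
                    seen.add((u, p))
                    worklist.append((u, p))
    return False                           # every backward path initialized var

# Straight-line program: decl z; decl x; y = 0; y = y + x; z = 0
stmts = {
    0: ("decl", "z"),
    1: ("decl", "x"),
    2: ("assign", "y", ()),
    3: ("assign", "y", ("y", "x")),
    4: ("assign", "z", ()),
}
preds = {1: [0], 2: [1], 3: [2], 4: [3]}

print(may_be_uninitialized("y", 3, stmts, preds))
print(may_be_uninitialized("z", 4, stmts, preds))
```

The `seen` set plays the role of a query cache, and returning on the first positive answer is the early-termination optimization; handling calls would additionally require the backwards procedure summaries mentioned above.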

22
Query Propagation

More notation
$r_p$: entry node for procedure p
m, n: normal nodes
$f_m$: reverse dataflow function for node m
$N_{call}$: all nodes that are call sites
call(m): the procedure called at node m
$f_{(r_p, e_p)}$: summary function for procedure p

23
Backwards edge propagation
24
Query Algorithm Efficiency

Optimizations: function summaries, early
termination, query result cache
In the worst case, it's the same as exhaustive
analysis
Some problems work better than others for
demand-driven analysis.
Depends how much information you need to answer
queries, or how many queries need to be made.

25
Conclusions

Demand-driven analysis is a powerful idea
Saves time and space, but in the worst case it's
no better than exhaustive analysis
Only works for distributive problems
Two approaches for demand-driven analysis are
equivalent

26
Discussion

Are these algorithms generally applicable?
Are they fast?
No evidence in the papers, but the answer is yes
(see ESP in a couple of weeks)
Why are they efficient (beyond the complexity
guarantee)?
Is it always cheap to compute the exploded
supergraph?
How can an imprecise alias analysis influence
this step and the overall performance of the
algorithm?