Title: Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability
1. Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability
- Seth Hallem and Eric Watkins
2. Exhaustive Analysis Papers
- Precise Interprocedural Dataflow Analysis via Graph Reachability - Reps, Horowitz, Sagiv -- POPL 1995
  - Applies CFL reachability to context-sensitive, interprocedural dataflow analysis
- Program Analysis via Graph Reachability - Reps -- ILP 1997
  - Describes two additional applications: interprocedural program slicing and shape analysis
3. The Reduction to CFL Reachability
- Question 1: What problems can we solve?
- Question 2: How do we set up the problem?
- Question 3: How do we solve the problem?
- Question 4: What is the complexity of this approach?
- Running example: possibly uninitialized variables
4. What problems can we solve?
- IFDS (interprocedural, finite, distributive, subset) problems
  - A finite set of dataflow facts, D
  - A mapping from edges of the CFG to functions $2^D \to 2^D$
  - Each function f is distributive over the meet operator $\sqcap$:
    - $f(a \sqcap b) = f(a) \sqcap f(b)$
- Possibly uninitialized variables
  - Each program variable corresponds to a dataflow fact. When that fact holds, the variable may be uninitialized.
  - Transfer functions: a variable is possibly uninitialized if it was just declared without an initializer, or if it is assigned an expression containing possibly uninitialized variables.
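The transfer functions above can be sketched in Python as follows, assuming a toy statement language with only declarations and assignments; the names `declare`, `assign`, `subsets`, and `is_distributive` are hypothetical. Because this is a may-analysis, the meet is set union, and the sketch checks distributivity by brute force over every pair of sets in $2^D$.

```python
from itertools import chain, combinations

# Hypothetical toy setting: one dataflow fact per variable; a fact in the set
# means "this variable may be uninitialized here".
D = frozenset({"x", "y", "z"})

def declare(var):
    """`int var;` with no initializer: var becomes possibly uninitialized."""
    return lambda facts: frozenset(facts) | {var}

def assign(lhs, rhs_vars):
    """`lhs = expr`, where rhs_vars are the variables read by expr."""
    def f(facts):
        facts = frozenset(facts)
        out = facts - {lhs}                  # kill: lhs receives a new value
        if facts & frozenset(rhs_vars):      # gen: expr reads a possibly uninitialized variable
            out = out | {lhs}
        return out
    return f

def subsets(s):
    """All members of the power set 2^s."""
    items = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def is_distributive(f, domain):
    """Check f(a ∪ b) == f(a) ∪ f(b) for every pair a, b in 2^domain."""
    all_sets = subsets(domain)
    return all(f(a | b) == f(a) | f(b) for a in all_sets for b in all_sets)

print(is_distributive(declare("x"), D), is_distributive(assign("y", ["x"]), D))
```

Brute force checks all $4^{|D|}$ pairs, so it only illustrates the property for tiny D; it is not how an analysis would establish distributivity.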
5. Simple Example
    int z;
    int main(void) {
        int x, y = 0;   /* {x, z} */
        y = x;          /* {x, y, z} */
        z = 0;          /* {x, y} */
    }
- $D = \{x, y, z\}$; the domain and range of every transfer function is the power set of D, $2^D$
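As a sanity check, the example can be replayed by treating each statement as a function on the set of possibly uninitialized variables. This is a hypothetical encoding (the names `program` and `run` are illustrative), and, like the slide, it treats the global z as uninitialized when declared.

```python
# Each statement is modeled as a function on the set of possibly
# uninitialized variables (hypothetical encoding of the example above).
program = [
    ("int z;",        lambda s: s | {"z"}),             # as on the slide, z starts uninitialized
    ("int x, y = 0;", lambda s: (s | {"x"}) - {"y"}),   # x has no initializer, y does
    ("y = x;",        lambda s: (s - {"y"}) | ({"y"} if "x" in s else set())),
    ("z = 0;",        lambda s: s - {"z"}),             # constant right-hand side
]

def run(program):
    """Apply the statements in order, recording the facts after each one."""
    facts = frozenset()
    trace = []
    for stmt, transfer in program:
        facts = frozenset(transfer(facts))
        trace.append((stmt, set(facts)))
    return trace

for stmt, facts in run(program):
    print(f"{stmt:<15} {sorted(facts)}")
```

The sets recorded after the last three statements are {x, z}, {x, y, z}, and {x, y}, matching the comments in the example.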
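6. How do we set up and solve IFDS problems?
- Inputs to the algorithm
  - The exploded supergraph (next few slides)
- Outputs from the algorithm
  - The meet-over-all-realizable-paths solution:
  - $\mathrm{MRP}_n = \bigsqcap_{q \in \mathrm{RPaths}(start_{main},\, n)} pf_q(\top)$
  - where $pf_q$ is the composition of the edge functions along path q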
7. The Supergraph
8. Representation Relations
- Each dataflow function f is converted to a representation relation $R_f$, drawn as a graph with $2|D| + 2$ nodes:
  - |D| input nodes, one for each dataflow fact, plus the node Λ (or 0), which corresponds to the empty set
  - |D| output nodes, plus the node Λ
- There is an edge from input node $d_1$ to output node $d_2$ if $d_2 \in f(\{d_1\})$ and $d_2 \notin f(\emptyset)$
- There is an edge from Λ to $d_2$ if $d_2 \in f(\emptyset)$, and an edge from Λ to Λ
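The edge rules above translate directly into code. A minimal sketch, assuming f is a Python function on frozensets and that Λ is represented by the hypothetical constant `LAMBDA`; `representation_relation`, `interpret`, and `copy_y_from_x` are illustrative names. `interpret` recovers f(S) by collecting every output node reachable from Λ or from a member of S, which is valid because f is distributive.

```python
from itertools import product

LAMBDA = "Λ"  # hypothetical name for the node that stands for the empty set

def representation_relation(f, D):
    """Edges of R_f for a distributive f: 2^D -> 2^D."""
    empty_image = f(frozenset())
    edges = {(LAMBDA, LAMBDA)}                        # Λ -> Λ is always present
    edges |= {(LAMBDA, d2) for d2 in empty_image}     # facts generated from nothing
    edges |= {(d1, d2) for d1, d2 in product(D, D)    # d1 alone yields d2, not already from Λ
              if d2 in f(frozenset({d1})) and d2 not in empty_image}
    return edges

def interpret(edges, S):
    """Recover f(S) as the output nodes reachable from Λ or from some d1 in S."""
    sources = set(S) | {LAMBDA}
    return {d2 for d1, d2 in edges if d1 in sources} - {LAMBDA}

def copy_y_from_x(S):
    """`y = x;` in the possibly-uninitialized-variables problem."""
    S = frozenset(S)
    return (S - {"y"}) | ({"y"} if "x" in S else frozenset())

R = representation_relation(copy_y_from_x, {"x", "y", "z"})
print(sorted(R))
print(interpret(R, {"x", "z"}) == copy_y_from_x({"x", "z"}))
```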
9. More Representation Relations
- (a) and (b) show representation relations for two functions (nodes $s_{main}$ and $n_1$)
- (c) and (d) show two ways to compose these relations
- (d) illustrates the need for the Λ node in each relation
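One way to see why the Λ → Λ edge matters when composing relations is a small sketch with two hand-written relations over D = {x, y}; these are hypothetical and not the ones in the figure. Composing the relation for `y = x;` with the one for `int x;` should give the relation for running them in that order, and dropping Λ → Λ from the first relation loses the fact x that the second statement generates from nothing.

```python
LAMBDA = "Λ"  # node standing for the empty set

def compose(r1, r2):
    """Relational composition: a -> c whenever a -> b in r1 and b -> c in r2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

# Hypothetical relations for the possibly-uninitialized-variables problem.
copy_y_from_x = {(LAMBDA, LAMBDA), ("x", "x"), ("x", "y")}   # `y = x;`
declare_x = {(LAMBDA, LAMBDA), (LAMBDA, "x"), ("y", "y")}    # `int x;`

with_lambda = compose(copy_y_from_x, declare_x)
without_lambda = compose(copy_y_from_x - {(LAMBDA, LAMBDA)}, declare_x)

print(sorted(with_lambda))     # Λ still reaches x: the fact generated by `int x;` survives
print(sorted(without_lambda))  # nothing leaves Λ: the generated fact is lost
```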
10. Exploding the Supergraph
11. CFL Reachability
- We want to solve the dataflow problem with a reachability query on the exploded supergraph G.
- Not all paths in G are valid, though: each call must be matched with its return.
- Insight: context-sensitivity corresponds to matching parentheses, and the language of matching parentheses is a CFL.
12. Context-Sensitivity as a CFL
- Assign a unique index to each call site and define a CFL of matching calls and returns.
- Suppose function P() has two call sites, labeled i and k:
  - (i (k )k )i is a valid path
  - (i (k )k is a valid path (a path may end inside a call that has not yet returned)
  - (i (k )i is not: )i cannot close (k
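Checking whether a sequence of call and return labels forms a valid path needs only a stack. A minimal sketch, assuming labels are encoded as hypothetical `("call", site)` / `("ret", site)` pairs and that paths start at the entry of main, so an unmatched return is rejected.

```python
def is_realizable(path):
    """True if every return matches the most recent unmatched call.

    Unmatched calls are allowed: a path may stop inside a procedure
    that has not returned yet.
    """
    stack = []
    for kind, site in path:
        if kind == "call":
            stack.append(site)
        elif kind == "ret":
            if not stack or stack[-1] != site:
                return False
            stack.pop()
        else:
            raise ValueError(f"unknown label kind: {kind!r}")
    return True

print(is_realizable([("call", "i"), ("call", "k"), ("ret", "k"), ("ret", "i")]))
print(is_realizable([("call", "i"), ("call", "k"), ("ret", "i")]))
```

The tabulation algorithm never enumerates paths this way; summary edges, which connect each call site only to its own return site, enforce the same matching implicitly.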
13. Reachability Algorithm
- Dynamic programming is the key.
- Start at the entry point of the program. Follow the edges in G, recording which dataflow facts we can reach.
- At a procedure call, follow the call. To avoid redoing work, though, maintain a cache of edges that summarize pieces of the computation:
  - Summary edges record the effect of an entire procedure: each starts at a call site and ends at the corresponding return site.
  - Path edges record the suffix of a valid path: each runs from the entry of the current procedure to the current node.
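The loop described above can be sketched in Python. This is a simplified sketch in the spirit of the Reps-Horowitz-Sagiv tabulation algorithm, not the paper's code: the graph encoding (`succ`, `calls`, flow functions as Python callables), the function `tabulate`, and the toy program are all hypothetical. When an exit node is reached, summary edges are built for every call site of that procedure, so the second call to P reuses work instead of re-analyzing P.

```python
from collections import defaultdict

ZERO = "0"  # special fact standing for the empty set (Λ / 0 on the earlier slides)

def tabulate(facts, start, exit_, proc_of, succ, calls, main="main"):
    """Summary-based tabulation over an exploded supergraph (simplified sketch).

    succ[n]  : intraprocedural edges [(m, flow)], flow(d) -> facts at m
    calls[c] : (callee, return_site, call_flow, ret_flow, local_flow)
    Path edge (d1, n, d2): <start of n's procedure, d1> reaches <n, d2>.
    summary[(c, d4)]: every d5 such that <c, d4> reaches <return site of c, d5>.
    """
    all_facts = set(facts) | {ZERO}
    path_edges, worklist = set(), []
    incoming = defaultdict(set)   # (n, d2) -> {d1 : (d1, n, d2) is a path edge}
    summary = defaultdict(set)

    def propagate(d1, n, d2):
        if (d1, n, d2) not in path_edges:
            path_edges.add((d1, n, d2))
            incoming[(n, d2)].add(d1)
            worklist.append((d1, n, d2))

    propagate(ZERO, start[main], ZERO)
    while worklist:
        d1, n, d2 = worklist.pop()
        if n in calls:                                   # call node
            callee, ret, call_flow, _, local_flow = calls[n]
            for d3 in call_flow(d2):                     # descend into the callee
                propagate(d3, start[callee], d3)
            for d3 in set(local_flow(d2)) | summary[(n, d2)]:
                propagate(d1, ret, d3)                   # jump over the call via summaries
        elif n == exit_[proc_of[n]]:                     # exit node: build summary edges
            p = proc_of[n]
            for c, (callee, ret, call_flow, ret_flow, _) in calls.items():
                if callee != p:
                    continue
                for d4 in all_facts:
                    if d1 not in call_flow(d4):
                        continue
                    for d5 in ret_flow(d2):
                        if d5 not in summary[(c, d4)]:
                            summary[(c, d4)].add(d5)
                            for d3 in list(incoming[(c, d4)]):
                                propagate(d3, ret, d5)
        else:                                            # ordinary node
            for m, flow in succ.get(n, []):
                for d3 in flow(d2):
                    propagate(d1, m, d3)

    # Every recorded path edge is reachable from <start_main, 0>, so the
    # facts holding at n are the targets of path edges ending at n.
    result = {n: set() for n in proc_of}
    for _, n, d2 in path_edges:
        if d2 != ZERO:
            result[n].add(d2)
    return result

# Hypothetical program over globals x, y (possibly uninitialized variables):
#   main: s_main --[int x, y;]--> c1: call P --> r1 --[x = 0]--> c2: call P --> r2 --> e_main
#   P:    s_P --> n2 --[y = x]--> e_P
def ident(d): return {d}
def declare_xy(d): return {ZERO, "x", "y"} if d == ZERO else {d}
def x_gets_0(d): return set() if d == "x" else {d}
def y_gets_x(d): return {"x", "y"} if d == "x" else (set() if d == "y" else {d})
def nothing(d): return set()   # globals only: every fact flows through the callee

proc_of = {**{n: "main" for n in ["s_main", "c1", "r1", "c2", "r2", "e_main"]},
           **{n: "P" for n in ["s_P", "n2", "e_P"]}}
succ = {"s_main": [("c1", declare_xy)], "r1": [("c2", x_gets_0)],
        "r2": [("e_main", ident)], "s_P": [("n2", ident)], "n2": [("e_P", y_gets_x)]}
calls = {"c1": ("P", "r1", ident, ident, nothing),
         "c2": ("P", "r2", ident, ident, nothing)}
solution = tabulate({"x", "y"}, {"main": "s_main", "P": "s_P"},
                    {"main": "e_main", "P": "e_P"}, proc_of, succ, calls)
print(solution["r1"], solution["r2"])  # x, y possibly uninitialized after c1; neither after c2
```

A context-insensitive analysis would merge both calls at `e_P` and report x and y at `r2` as well; the summary edges keep the two contexts apart.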
14. Dynamic Programming Details
15. Complexity
- The worst case for general CFL reachability is cubic in the number of nodes in the graph.
- Dataflow analysis can do better: $O(ED^3)$ for any distributive problem, and $O(\mathit{Call}\, D^3 + hED^2)$ for h-sparse problems (E: supergraph edges, Call: call sites, D: number of dataflow facts).
- Possibly uninitialized variables is 2-sparse when aliasing is ignored: a variable's status as initialized or uninitialized can affect only itself and at most one other variable (the one it is assigned to).
16. Other Applications
- Interprocedural slicing
  - Identifies all parts of a program relevant to a particular statement
- Shape analysis
  - For any DAG data structure, determines a superset of the possible shapes for that data structure.
  - Each dataflow fact corresponds to a single possible shape.
  - Problem: there are infinitely many shapes. Solution: define the shape at program point q in terms of the shapes at previous program points.
  - The ILP paper has an example of shape analysis of a linked list.
17. The Other Papers
- Demand Interprocedural Dataflow Analysis - Horowitz, Reps, Sagiv -- FSE 1995
- Demand-driven Computation of Interprocedural Data Flow - Duesterwald, Gupta, Soffa -- POPL 1995
- Together they provide two frameworks for transforming any IFDS analysis into a demand-driven analysis
18. Steps to Demand-Driven Analysis
- Define the problem in the IFDS framework
- Reverse the flow functions, or reverse the flow edges
- Start with an initial query <d, n>
- Propagate the query backwards until it is resolved
19. Reversing Dataflow
- In Duesterwald et al., the dataflow problem is specified with flow functions
  - Reverse the functions
- For CFL problems, the problem is represented as a set of edges
  - Just reverse the edges
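For the edge-based formulation, a demand query can be sketched as a backward search over the reversed exploded graph: fact d may hold at n exactly when the search from <n, d> reaches <s_main, 0>. This sketch ignores call/return matching, so it is only precise within a single procedure, and the names `reverse`, `demand_query`, and the hand-built edge list are hypothetical.

```python
from collections import defaultdict, deque

ZERO = "0"  # special fact standing for the empty set

def reverse(edges):
    """Reverse every exploded-graph edge: u -> v becomes v -> u."""
    rev = defaultdict(set)
    for u, v in edges:
        rev[v].add(u)
    return rev

def demand_query(edges, query, entry=("s_main", ZERO)):
    """May fact d hold at node n? Search backwards from query = (n, d).

    Intraprocedural sketch: call/return matching is ignored.
    """
    rev = reverse(edges)
    seen, queue = {query}, deque([query])
    while queue:
        node = queue.popleft()
        if node == entry:
            return True              # early termination: the query is answered
        for pred in rev[node]:
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return False

# Hypothetical exploded edges for: s_main --[int x; int y = 0;]--> n1 --[y = x]--> n2 --[x = 0]--> n3
edges = [
    (("s_main", ZERO), ("n1", ZERO)), (("s_main", ZERO), ("n1", "x")),
    (("n1", ZERO), ("n2", ZERO)), (("n1", "x"), ("n2", "x")), (("n1", "x"), ("n2", "y")),
    (("n2", ZERO), ("n3", ZERO)), (("n2", "y"), ("n3", "y")),
]
print(demand_query(edges, ("n3", "y")), demand_query(edges, ("n3", "x")))
```

Here y may be uninitialized at n3 (it copied x before x was assigned), while x may not; only the part of the graph that can reach the queried node is ever visited.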
20. Example: CCP (Copy Constant Propagation)
- Notation
  - x: the set of dataflow facts
  - $x_w$: the dataflow fact for variable w
  - $f_n(x)_w$: the transfer function for variable w at node n
  - $[w = c]$: the set of dataflow facts in which the fact for variable w equals c
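The notation can be made concrete with a small forward CCP sketch, where x is a dict from variable names to lattice values. The statement encoding, the constants `TOP`/`BOTTOM`, and the names `meet`, `transfer`, and `meet_env` are hypothetical; the lattice is the usual one for constant propagation (no information yet, a single constant, or not a constant).

```python
TOP, BOTTOM = "TOP", "BOTTOM"  # no information yet / definitely not a constant

def meet(a, b):
    """Meet of two CCP values for a single variable."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOTTOM

def transfer(node, x):
    """f_n(x) for one node, in a hypothetical statement encoding:
    ("const", w, c) for w := c, ("copy", w, v) for w := v,
    ("nonconst", w) for any other assignment to w, ("other",) otherwise."""
    out = dict(x)                  # f_n(x)_u = x_u for every u not assigned at n
    kind = node[0]
    if kind == "const":
        out[node[1]] = node[2]     # f_n(x)_w = c
    elif kind == "copy":
        out[node[1]] = x[node[2]]  # f_n(x)_w = x_v
    elif kind == "nonconst":
        out[node[1]] = BOTTOM
    return out

def meet_env(x1, x2):
    """Pointwise meet where two control-flow paths join."""
    return {w: meet(x1[w], x2[w]) for w in x1}

x0 = {"a": TOP, "b": TOP}
x1 = transfer(("const", "a", 1), x0)              # a := 1
then_branch = transfer(("copy", "b", "a"), x1)    # b := a
else_branch = transfer(("const", "b", 2), x1)     # b := 2
print(meet_env(then_branch, else_branch))         # a stays 1; b is BOTTOM at the join
```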
21. Query Algorithm
- A worklist holds the set of outstanding queries
- While it is not empty, remove a query and:
  - Propagate it backwards one node in the flowgraph
  - For a function call, create a backwards summary for that function and apply it
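The worklist loop above can be sketched for CCP as follows. This is a simplified, intraprocedural reading of the idea, not the papers' algorithm: there are no calls and therefore no backward summaries, and the statement encoding, the function `query_ccp`, and the example CFG are hypothetical. It does include early termination and a cache of raised queries.

```python
from collections import deque

def query_ccp(preds, stmts, entry, w, c, n):
    """Demand query: does w == c hold on entry to node n along every path?

    preds[n] lists the CFG predecessors of n; stmts[m] is the statement at m,
    encoded as ("const", w, c), ("copy", w, v), ("nonconst", w) or ("other",).
    A query (var, node) asks whether var == c on entry to node.
    """
    raised = {(w, n)}                # cache: each (variable, node) query is raised once
    worklist = deque(raised)
    while worklist:
        var, node = worklist.popleft()
        if node == entry:
            return False             # some path reaches entry without assigning var
        for m in preds[node]:
            stmt = stmts.get(m, ("other",))
            if stmt[0] != "other" and stmt[1] == var:   # m assigns var
                if stmt[0] == "const" and stmt[2] == c:
                    continue         # this path establishes var == c
                if stmt[0] != "copy":
                    return False     # early termination: another constant or a non-constant
                new_query = (stmt[2], m)   # var := v, so ask whether v == c before m
            else:
                new_query = (var, m)       # m leaves var alone: move the query up
            if new_query not in raised:    # a query already raised (e.g. around a loop) adds nothing
                raised.add(new_query)
                worklist.append(new_query)
    return True

# Hypothetical CFG: entry -> n1 (a := 1) -> n2 (branch) -> {n3 (b := a), n4 (b := 1)} -> n5
preds = {"n1": ["entry"], "n2": ["n1"], "n3": ["n2"], "n4": ["n2"], "n5": ["n3", "n4"]}
stmts = {"n1": ("const", "a", 1), "n3": ("copy", "b", "a"), "n4": ("const", "b", 1)}
print(query_ccp(preds, stmts, "entry", "b", 1, "n5"))
print(query_ccp(preds, stmts, "entry", "b", 2, "n5"))
```

The first query succeeds because both branches give b the value 1; the second stops as soon as it sees `b := 1` on one branch, without exploring the rest of the graph.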
22. Query Propagation
- More notation
  - $r_p$: entry node for procedure p
  - m, n: normal nodes
  - $f_m$: reverse dataflow function for node m
  - $N_{call}$: all nodes that are call sites
  - call(m): the procedure called at node m
  - $f(r_p, e_p)$: summary function for procedure p
23. Backwards Edge Propagation
24. Query Algorithm Efficiency
- Optimizations: function summaries, early termination, query result cache
- In the worst case, it costs the same as exhaustive analysis
- Some problems are better suited to demand-driven analysis than others
  - It depends on how much information is needed to answer queries, or how many queries need to be made
25. Conclusions
- Demand-driven analysis is a powerful idea
- It saves time and space, but in the worst case it is no better than exhaustive analysis
- It only works for distributive problems
- The two approaches to demand-driven analysis are equivalent
26. Discussion
- Are these algorithms generally applicable?
- Are they fast?
  - There is no evidence in the papers, but the answer is yes (see ESP in a couple of weeks)
- Why are they efficient (beyond the complexity guarantee)?
- Is it always cheap to compute the exploded supergraph?
  - How can an imprecise alias analysis influence this step and the overall performance of the algorithm?