Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability

1
Context-Sensitive, Interprocedural Dataflow
Analysis as CFL Reachability
  • Seth Hallem and Eric Watkins

2
Exhaustive Analysis Papers
  • Precise Interprocedural Dataflow Analysis via
    Graph Reachability
  • Reps, Horowitz, Sagiv -- POPL 1995
  • applies CFL reachability to context-sensitive,
    interprocedural dataflow analysis
  • Program Analysis via Graph Reachability
  • Reps -- ILPS 1997
  • describes two additional applications:
    interprocedural program slicing and shape analysis

3
The Reduction to CFL Reachability
  • Question 1: What problems can we solve?
  • Question 2: How do we set up the problem?
  • Question 3: How do we solve the problem?
  • Question 4: What is the complexity of this
    approach?
  • Running example: possibly uninitialized variables

4
What problems can we solve?
  • IFDS problems
  • Finite set of dataflow facts (D)
  • Mapping from edges in the CFG to dataflow
    functions $2^D \to 2^D$
  • Each function f is distributive wrt the meet operator
  • $f(a \sqcap b) = f(a) \sqcap f(b)$
  • Possibly uninitialized vars
  • Each program variable corresponds to a dataflow
    fact. When that fact holds, the variable may be
    uninitialized.
  • Transfer functions: a variable is possibly
    uninitialized if it was just declared without an
    initializer or if it is assigned an expression
    containing possibly uninitialized variables.
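
The distributivity requirement can be checked by brute force on a small fact set. The sketch below is illustrative only: it assumes the meet operator is set union (a "may" problem such as possibly uninitialized variables), and the helper names `assign`, `declare`, `needs_all`, and `is_distributive` are hypothetical, not taken from the papers.

```python
from itertools import combinations

D = ("x", "y", "z")  # one dataflow fact per program variable


def powerset(items):
    """All subsets of `items`, as frozensets."""
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def declare(var):
    """`int var;` : var becomes possibly uninitialized."""
    return lambda facts: frozenset(facts) | {var}


def assign(target, used=()):
    """`target = expr`, where `used` are the variables read by expr."""
    used = frozenset(used)

    def f(facts):
        facts = frozenset(facts)
        rest = facts - {target}
        # target is tainted if ANY operand may be uninitialized
        return rest | {target} if facts & used else rest
    return f


def needs_all(target, used):
    """Hypothetical contrast: taints target only if ALL operands are tainted."""
    used = frozenset(used)

    def g(facts):
        facts = frozenset(facts)
        rest = facts - {target}
        return rest | {target} if used <= facts else rest
    return g


def is_distributive(f, universe=D):
    """Check f(a ∪ b) == f(a) ∪ f(b) for every pair of subsets."""
    subsets = powerset(universe)
    return all(f(a | b) == f(a) | f(b) for a in subsets for b in subsets)
```

Under these assumptions, `is_distributive(assign("y", {"x"}))` holds, while `needs_all("y", {"x", "z"})` fails the check because the union of two inputs can trigger an effect that neither input triggers alone.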

5
Simple Example
    int z;
    int main(void) {
        int x, y = 0;   /* {x, z} */
        y = y + x;      /* {x, y, z} */
        z = 0;          /* {x, y} */
    }
  • $D = \{x, y, z\}$; the domain and range of each transfer
    function is the power set of D ($2^D$)
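
The annotated sets in the example can be reproduced by applying one transfer function per statement. This is a minimal sketch: the statement list is a reading of the slide's example (it assumes the middle statement is `y = y + x`; reading it as `y = x` yields the same sets), and `declare`, `assign`, and `analyze` are hypothetical helper names.

```python
def declare(var):
    """`int var;` with no initializer: var may be uninitialized."""
    return lambda facts: facts | {var}


def assign(target, used=()):
    """`target = expr`, where `used` are the variables read by expr."""
    used = frozenset(used)

    def f(facts):
        rest = facts - {target}
        return rest | {target} if facts & used else rest
    return f


# Assumed reading of the slide's program, one entry per statement.
PROGRAM = [
    ("int z;", declare("z")),
    ("int x;", declare("x")),
    ("int y = 0;", assign("y")),
    ("y = y + x;", assign("y", {"y", "x"})),
    ("z = 0;", assign("z")),
]


def analyze(statements, initial=frozenset()):
    """Apply the transfer functions in order; return the facts after each."""
    facts = initial
    trace = []
    for text, transfer in statements:
        facts = transfer(facts)
        trace.append((text, facts))
    return trace


if __name__ == "__main__":
    for text, facts in analyze(PROGRAM):
        print(f"{text:<12} {sorted(facts)}")
```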

6
How do we set up and solve IFDS problems?
  • Inputs to the algorithm
  • Exploded supergraph (next couple of slides)
  • Outputs from the algorithm
  • meet-over-all-realizable-paths solution
  • $MRP_n = \sqcap_{q \in RPaths(start_{main}, n)} pf_q(\top)$, where $pf_q$ is
    the composition of the dataflow functions along path q

7
The Supergraph
8
Representation Relations
  • Each dataflow function f is converted to a
    representation relation $R_f$, which is drawn as
    a graph with $2|D| + 2$ nodes
  • $|D|$ input nodes, one for each dataflow fact, plus
    the node Λ (or 0), which corresponds to the empty
    set.
  • $|D|$ output nodes plus the node Λ
  • There is an edge from input node d1 to output
    node d2 if $d2 \in f(\{d1\})$ and $d2 \notin f(\emptyset)$.
    There is an edge from Λ to d2 if $d2 \in f(\emptyset)$,
    and always an edge from Λ to Λ.
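
A representation relation can be built mechanically from a distributive function, and composing two relations then behaves like composing the functions. The sketch below assumes union as the meet and uses the string `"0"` for the extra node; `representation_relation`, `interpret`, and `compose` are hypothetical names chosen for illustration.

```python
from itertools import combinations

ZERO = "0"            # the extra node (written Λ or 0 on the slides)
D = ("x", "y", "z")


def representation_relation(f, facts=D):
    """Edges (input node, output node) encoding the distributive function f."""
    from_empty = f(frozenset())
    edges = {(ZERO, ZERO)}
    edges |= {(ZERO, d2) for d2 in from_empty}
    for d1 in facts:
        edges |= {(d1, d2) for d2 in f(frozenset({d1}))
                  if d2 not in from_empty}
    return edges


def interpret(edges, facts):
    """Recover f(facts): outputs reachable from facts plus the ZERO node."""
    sources = set(facts) | {ZERO}
    return frozenset(d2 for d1, d2 in edges if d1 in sources) - {ZERO}


def compose(first, second):
    """Relational composition: follow an edge of `first`, then one of `second`."""
    return {(a, c) for a, b in first for b2, c in second if b == b2}


def declare_x(facts):
    """`int x;` : x becomes possibly uninitialized."""
    return frozenset(facts) | {"x"}


def assign_y_from_x(facts):
    """`y = x;` : y inherits the status of x."""
    facts = frozenset(facts)
    rest = facts - {"y"}
    return rest | {"y"} if "x" in facts else rest


R_DECLARE = representation_relation(declare_x)
R_ASSIGN = representation_relation(assign_y_from_x)
R_BOTH = compose(R_DECLARE, R_ASSIGN)
```

In this sketch `R_BOTH` contains the edges `("0", "x")` and `("0", "y")`. Without the extra node there would be no source for them, and the facts generated from the empty set would be lost in the composition.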

9
More Representation Relations
  • (a) and (b) show representation relations for two
    functions (at nodes s_main and n1)
  • (c) and (d) show two ways to compose these
    relations
  • (d) illustrates the need for the Λ node in each
    relation

10
Exploding the Supergraph
11
CFL Reachability
  • Want to solve the dataflow problem with a
    reachability query on the exploded supergraph.
  • Not every path in the exploded supergraph is valid,
    though: calls must match returns.
  • Insight: context sensitivity = matching parentheses,
    and the language of matching parentheses is a CFL

12
Context-Sensitivity = CFL
  • Assign a unique index to each call site, and define a
    CFL of matching calls and returns.
  • Suppose we have two call sites to function P(),
    which we label i and k
  • $(_i\ (_k\ )_k\ )_i$ is a valid path
  • $(_i\ (_k\ )_k$ is a valid path (the call at i is still pending)
  • $(_i\ (_k\ )_i$ is not
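
The three paths above can be checked with a stack. This is a minimal sketch of the matching rule only, assuming a path is given as a list of `(kind, site)` labels where `kind` is `"("` for a call edge, `")"` for a return edge, and anything else for an ordinary edge; the function name `is_realizable` is hypothetical.

```python
def is_realizable(labels):
    """True if every return matches the most recent unmatched call.

    Calls that are still pending at the end of the path are allowed,
    because the path may stop inside a called procedure.
    """
    pending = []
    for kind, site in labels:
        if kind == "(":
            pending.append(site)
        elif kind == ")":
            if not pending or pending.pop() != site:
                return False
        # any other label is an intraprocedural edge and is ignored
    return True


MATCHED = [("(", "i"), ("(", "k"), (")", "k"), (")", "i")]
PENDING = [("(", "i"), ("(", "k"), (")", "k")]
MISMATCHED = [("(", "i"), ("(", "k"), (")", "i")]
```

Here `is_realizable(MATCHED)` and `is_realizable(PENDING)` are true, and `is_realizable(MISMATCHED)` is false because the return to site i arrives while the call at site k is still open.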

13
Reachability Algorithm
  • Dynamic programming is the key
  • Start at the entry point to the program. Follow
    the edges in G, recording what dataflow facts we
    can reach.
  • At a procedure call, follow the call. To avoid
    redoing any work, though, maintain a cache of
    edges that summarize pieces of the
    computation.
  • Summary edges record the effect of an entire
    procedure call: they start at a call site and end
    at the corresponding return site.
  • Path edges record the suffix of a valid path: they
    run from the start node of a procedure to a node
    inside that procedure.
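
The worklist idea can be sketched as follows. This is a simplified illustration in the spirit of the tabulation approach described above, not a transcription of the algorithm in the paper; the dictionary layout of the graph (`flow`, `call`, `ret`, and so on) and the tiny example program are assumptions made for the sketch.

```python
from collections import defaultdict, deque

ZERO = "0"  # the extra fact (Λ / 0)


def tabulate(g):
    """Exploded nodes (n, d) reachable from (main_start, 0) along paths
    whose calls and returns match."""
    callers = defaultdict(set)
    for call_node, proc in g["callee"].items():
        callers[proc].add(call_node)
    exit_proc = {e: p for p, e in g["exit"].items()}
    path = set()                # path edges: ((start_p, d1), (n, d2))
    summary = defaultdict(set)  # (call, d4) -> facts d5 at its return site
    work = deque()

    def propagate(edge):
        if edge not in path:
            path.add(edge)
            work.append(edge)

    entry = (g["main_start"], ZERO)
    propagate((entry, entry))
    while work:
        src, (n, d2) = work.popleft()
        if n in g["callee"]:                       # call node
            start = g["start"][g["callee"][n]]
            for d3 in g["call"].get((n, d2), ()):  # descend into the callee
                propagate(((start, d3), (start, d3)))
            for d3 in summary[(n, d2)]:            # reuse cached summaries
                propagate((src, (g["ret_site"][n], d3)))
        elif n in exit_proc:                       # exit node: build summaries
            d1 = src[1]
            for c in callers[exit_proc[n]]:
                for (c2, d4), starts in g["call"].items():
                    if c2 != c or d1 not in starts:
                        continue
                    for d5 in g["ret"].get((c, d2), ()):
                        if d5 in summary[(c, d4)]:
                            continue
                        summary[(c, d4)].add(d5)
                        for s, tgt in list(path):  # callers waiting at (c, d4)
                            if tgt == (c, d4):
                                propagate((s, (g["ret_site"][c], d5)))
            continue
        for tgt in g["flow"].get((n, d2), ()):     # normal / call-to-return
            propagate((src, tgt))
    return {tgt for _, tgt in path}


# Hypothetical program: main declares g, calls P, sets g = 0, calls P again.
# P passes every fact through unchanged.
GRAPH = {
    "main_start": "s_main",
    "start": {"main": "s_main", "P": "s_P"},
    "exit": {"main": "e_main", "P": "e_P"},
    "callee": {"c1": "P", "c2": "P"},
    "ret_site": {"c1": "r1", "c2": "r2"},
    "flow": {
        ("s_main", ZERO): {("c1", ZERO), ("c1", "g")},  # g declared
        ("c1", ZERO): {("r1", ZERO)},                    # call-to-return site
        ("r1", ZERO): {("c2", ZERO)},                    # g = 0 kills fact g
        ("c2", ZERO): {("r2", ZERO)},
        ("r2", ZERO): {("e_main", ZERO)},
        ("r2", "g"): {("e_main", "g")},
        ("s_P", ZERO): {("e_P", ZERO)},
        ("s_P", "g"): {("e_P", "g")},
    },
    "call": {(c, d): {d} for c in ("c1", "c2") for d in (ZERO, "g")},
    "ret": {(c, d): {d} for c in ("c1", "c2") for d in (ZERO, "g")},
}
```

In this example fact g reaches the first return site r1 but not the second one, r2. A plain reachability search that ignored call/return matching would carry g from the first call, through P, and out to r2.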

14
Dynamic Programming Details
15
Complexity
  • Worst case for general CFL reachability is cubic
    in the number of nodes in the graph
  • Can do better for dataflow analysis: $O(E D^3)$ for
    any distributive problem, $O(Call \cdot D^3 + h E D^2)$ for
    h-sparse problems
  • Possibly uninitialized variables is 2-sparse when
    aliasing is ignored: a variable's status as
    initialized or uninitialized can only affect
    itself and one other variable (if it is assigned
    to that variable)

16
Other Applications
  • Interprocedural slicing
  • identify all pieces of a program relevant to a
    particular statement
  • Shape Analysis
  • For any DAG data structure, determines a superset
    of the possible shapes for that data structure.
  • Each dataflow fact corresponds to a single
    possible shape.
  • Problem: infinite number of shapes. The solution is
    to define the shape at program point q in terms of
    the shape at previous program points.
  • The ILPS paper has an example of shape analysis of a
    linked list.

17
The other papers
  • Demand Interprocedural Dataflow Analysis
  • Horowitz, Reps, Sagiv -- FSE 1995
  • Demand-driven Computation of Interprocedural
    Data Flow
  • Duesterwald, Gupta, Soffa -- POPL 1995
  • Provide two possible frameworks for transforming
    any IFDS analysis into a demand-driven analysis

18
Steps to Demand-driven analysis
  • Define problem in the IFDS framework
  • Reverse the flow functions, or reverse the flow
    edges
  • Start with initial query <d, n>
  • Propagate the query backwards until solved

19
Reversing dataflow
  • In Duesterwald et al., the dataflow problem is
    specified with flow functions
  • Reverse the functions
  • For CFL problems, the problem is represented as a
    set of edges
  • Just reverse the edges

20
Example: CCP (copy constant propagation)
  • Notation
  • $x$: set of dataflow facts
  • $x_w$: dataflow fact for variable w
  • $f_n(x)_w$: transfer function for variable w at node n
  • $w = c$: set of dataflow facts, where the fact
    for variable w equals c
21
Query Algorithm
  • Worklist holds the set of outstanding queries
  • While not empty, remove a query
  • Propagate backwards one node in the flowgraph
  • For a function call, create a backwards summary
    for that function and apply that
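
For the edge-based formulation, a query can be answered by reversing the exploded edges and searching backwards from the queried fact. The sketch below is intraprocedural only (it omits the backwards function summaries mentioned above) and assumes the exploded graph is a dict from `(node, fact)` to a set of successors; `reverse_edges`, `query`, and the example `EDGES` are hypothetical.

```python
from collections import defaultdict, deque

ZERO = "0"  # the extra fact (Λ / 0)


def reverse_edges(edges):
    """Flip every exploded edge so queries can be propagated backwards."""
    rev = defaultdict(set)
    for src, targets in edges.items():
        for tgt in targets:
            rev[tgt].add(src)
    return rev


def query(edges, entry, node, fact):
    """Can `fact` hold at `node`? True if (entry, 0) is found going backwards."""
    rev = reverse_edges(edges)
    goal = (entry, ZERO)
    seen = {(node, fact)}
    work = deque(seen)           # worklist of outstanding queries
    while work:
        current = work.popleft()
        if current == goal:
            return True          # early termination: the query is answered
        for pred in rev.get(current, ()):
            if pred not in seen:
                seen.add(pred)
                work.append(pred)
    return False


# Assumed exploded graph for: int z; int x, y = 0;  y = y + x;  z = 0;
EDGES = {
    ("entry", ZERO): {("s1", ZERO), ("s1", "x"), ("s1", "z")},
    ("s1", ZERO): {("s2", ZERO)},
    ("s1", "x"): {("s2", "x"), ("s2", "y")},   # y = y + x taints y via x
    ("s1", "y"): {("s2", "y")},
    ("s1", "z"): {("s2", "z")},
    ("s2", ZERO): {("s3", ZERO)},
    ("s2", "x"): {("s3", "x")},
    ("s2", "y"): {("s3", "y")},                # z = 0 kills fact z
}
```

With this graph, `query(EDGES, "entry", "s3", "y")` succeeds and `query(EDGES, "entry", "s3", "z")` fails without visiting any node besides the queried one, which is the kind of saving a demand-driven analysis aims for.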

22
Query Propagation
  • More notation
  • $r_p$: entry node for procedure p
  • m, n: normal nodes
  • $f_m$: reverse dataflow function for node m
  • $N_{call}$: all nodes that are call sites
  • call(m): the procedure called at node m
  • $f(r_p, e_p)$: summary function for procedure p

23
Backwards edge propagation
24
Query Algorithm Efficiency
  • Optimizations: function summaries, early
    termination, query result cache
  • In the worst case, it's the same as exhaustive
    analysis
  • Some problems work better than others for
    demand-driven analysis.
  • Depends how much information you need to answer
    queries, or how many queries need to be made.

25
Conclusions
  • Demand-driven analysis is a powerful idea
  • Saves time and space, but in the worst case it's
    no better than exhaustive analysis
  • Only works for distributive problems
  • Two approaches for demand-driven analysis are
    equivalent

26
Discussion
  • Are these algorithms generally applicable?
  • Are they fast?
  • No evidence in the papers, but the answer is yes
    (see ESP in a couple of weeks)
  • Why are they efficient (beyond the complexity
    guarantee)?
  • Is it always cheap to compute the exploded
    supergraph?
  • How can an imprecise alias analysis influence
    this step and the overall performance of the
    algorithm?