Title: Interprocedural Program Analyses
1 Interprocedural Program Analyses
- David Heine, Vladimir Livshits, Brian Murphy
- Christopher Unkel, Hansel Wan
- Stanford University
- http://suif.stanford.edu/
2 Outline
- I. Data structures for program analysis
- II. Interprocedural analysis framework
- III. Interprocedural passes and parallelizer
- IV. Pointer alias analysis
3 I. Data structures: Lattice values
- Commonly used in data flow analysis
- bottom, top, meet operators
- Includes definitions of some common lattices, e.g. bitvectors, constants, intervals
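As an illustration of the bottom, top and meet operators, here is a minimal sketch of a flat constant lattice. The `ConstValue` type and the free function `meet` are hypothetical names chosen for this example; they are not the SUIF lattice classes.

```cpp
#include <cassert>

// Hypothetical flat constant lattice: Top above all constants, Bottom below.
struct ConstValue {
    enum Kind { Top, Const, Bottom };
    Kind kind;
    long value;  // meaningful only when kind == Const

    static ConstValue top() { return {Top, 0}; }
    static ConstValue bottom() { return {Bottom, 0}; }
    static ConstValue constant(long v) { return {Const, v}; }

    bool operator==(const ConstValue &o) const {
        return kind == o.kind && (kind != Const || value == o.value);
    }
};

// Meet: Top is the identity, equal values are kept,
// and anything else (including Bottom) falls to Bottom.
inline ConstValue meet(const ConstValue &a, const ConstValue &b) {
    if (a.kind == ConstValue::Top) return b;
    if (b.kind == ConstValue::Top) return a;
    if (a == b) return a;
    return ConstValue::bottom();
}
```

Two different constants meeting at a control-flow join produce Bottom, which is how a data flow analysis records "not a known constant".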
4 Graphs
- Common algorithms
- Iterated dominance frontier
- Strongly connected components
- Generates dot graph output
- Examples: control flow graphs and call graphs
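The dot output mentioned above can be pictured with a small sketch that prints an edge list in Graphviz dot syntax. The helper `to_dot` is a hypothetical name for this example and says nothing about how the SUIF graph classes are written.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: render a directed graph, given as an edge list of
// node names, in Graphviz dot syntax.
inline std::string to_dot(
    const std::string &name,
    const std::vector<std::pair<std::string, std::string>> &edges) {
    std::ostringstream out;
    out << "digraph " << name << " {\n";
    for (const auto &e : edges)
        out << "  \"" << e.first << "\" -> \"" << e.second << "\";\n";
    out << "}\n";
    return out.str();
}
```

For a call graph, each edge would run from a caller to a callee; the resulting text can be passed to the `dot` tool for layout.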
5 Region Graphs
- Capture the hierarchical program structure alongside the statements
- An interpretation of the statements without dismantling them
- Useful for elimination-style algorithms
- A region
- has one entry and possibly multiple exits
- may be a terminal region (straight-line control flow internally)
- or a composite region
- Flow between subregions is specified by
- a control flow graph (adjacency lists)
- a regular expression (path expression with composition, meet and Kleene star)
- Extensible with new nodes
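To show how a path expression drives an elimination-style analysis, the sketch below evaluates a "must write" set over an expression built from composition, meet and Kleene star. `PathExpr`, `leaf`, `compose_of`, `meet_of`, `star_of` and `must_write` are hypothetical names, and variable-name sets stand in for real lattice values.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

using VarSet = std::set<std::string>;

// Hypothetical path-expression node over subregions.
struct PathExpr {
    enum Op { Leaf, Compose, Meet, Star };
    Op op;
    VarSet writes;                       // Leaf only: variables the subregion must write
    std::shared_ptr<PathExpr> lhs, rhs;  // rhs is unused for Star
};
using PathPtr = std::shared_ptr<PathExpr>;

inline PathPtr make_node(PathExpr::Op op, VarSet w, PathPtr l, PathPtr r) {
    return std::make_shared<PathExpr>(
        PathExpr{op, std::move(w), std::move(l), std::move(r)});
}
inline PathPtr leaf(VarSet w) { return make_node(PathExpr::Leaf, std::move(w), nullptr, nullptr); }
inline PathPtr compose_of(PathPtr a, PathPtr b) { return make_node(PathExpr::Compose, {}, std::move(a), std::move(b)); }
inline PathPtr meet_of(PathPtr a, PathPtr b) { return make_node(PathExpr::Meet, {}, std::move(a), std::move(b)); }
inline PathPtr star_of(PathPtr a) { return make_node(PathExpr::Star, {}, std::move(a), nullptr); }

// Must-write summary of a whole path expression.
inline VarSet must_write(const PathPtr &p) {
    switch (p->op) {
    case PathExpr::Leaf:
        return p->writes;
    case PathExpr::Compose: {  // both parts execute: union
        VarSet r = must_write(p->lhs);
        VarSet s = must_write(p->rhs);
        r.insert(s.begin(), s.end());
        return r;
    }
    case PathExpr::Meet: {  // either branch may execute: intersection
        VarSet a = must_write(p->lhs), b = must_write(p->rhs), r;
        for (const auto &v : a)
            if (b.count(v)) r.insert(v);
        return r;
    }
    case PathExpr::Star:  // body may run zero times: nothing is guaranteed
        return {};
    }
    return {};
}
```

A statement followed by an if/else and then a loop corresponds to `compose_of(stmt, compose_of(meet_of(then, else), star_of(body)))`, and the summary falls out of one recursive walk with no iteration.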
6 Region Transformations
- Flattening regions
- Conversions between regular expressions and CFGs: RE -> CFG and CFG -> RE
- May involve some code cloning
7 II. Interprocedural Analysis
- Two important design choices in program analysis
- Across procedures
- No interprocedural analysis
- Interprocedural context-insensitive
- Interprocedural context-sensitive
- Within a procedure
- Flow-insensitive
- Flow-sensitive: interval/region based
- Flow-sensitive: iterative over flow graph
8 Efficient Context-Sensitive Analysis
- Bottom-up
- A region/interval: a procedure or a loop
- An edge: a call or code in an inner scope
- Summarize each region (with a transfer function)
- Find strongly connected components (SCCs)
- Bottom-up traversal of SCCs
- Iteration to find fixed point for recursive functions
- Top-down
- Top-down propagation of values
- Iteration to find fixed point for recursive functions (SCCs)
[Figure labels: call, inner loop, scc]
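The bottom-up phase can be sketched as follows, under simplifying assumptions: procedures are numbered, a summary is just a set of variable names, and a callee's summary is unioned into its callers without any parameter mapping. `CallGraph`, `sccs_bottom_up` and `summarize` are hypothetical names. Tarjan's algorithm is used because it emits strongly connected components callees first, which is the order the bottom-up traversal needs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: procedures are 0..n-1, callees[p] lists what p calls.
struct CallGraph {
    std::vector<std::vector<int>> callees;
};

// Tarjan's algorithm; SCCs come out callees-first (bottom-up order).
inline std::vector<std::vector<int>> sccs_bottom_up(const CallGraph &g) {
    const int n = static_cast<int>(g.callees.size());
    std::vector<int> index(n, -1), low(n, 0), stack;
    std::vector<bool> on_stack(n, false);
    std::vector<std::vector<int>> result;
    int next = 0;
    std::function<void(int)> visit = [&](int v) {
        index[v] = low[v] = next++;
        stack.push_back(v);
        on_stack[v] = true;
        for (int w : g.callees[v]) {
            if (index[w] < 0) {
                visit(w);
                low[v] = std::min(low[v], low[w]);
            } else if (on_stack[w]) {
                low[v] = std::min(low[v], index[w]);
            }
        }
        if (low[v] == index[v]) {  // v is the root of an SCC
            std::vector<int> scc;
            int w;
            do {
                w = stack.back();
                stack.pop_back();
                on_stack[w] = false;
                scc.push_back(w);
            } while (w != v);
            result.push_back(scc);
        }
    };
    for (int v = 0; v < n; ++v)
        if (index[v] < 0) visit(v);
    return result;
}

using VarSet = std::set<std::string>;

// Each summary is the local effect plus the callees' summaries,
// iterated to a fixed point inside each SCC (recursive procedures).
inline std::vector<VarSet> summarize(const CallGraph &g,
                                     const std::vector<VarSet> &local) {
    std::vector<VarSet> summary = local;
    for (const auto &scc : sccs_bottom_up(g)) {
        bool changed = true;
        while (changed) {
            changed = false;
            for (int p : scc)
                for (int q : g.callees[p]) {
                    if (q == p) continue;  // self-call adds nothing new
                    const std::size_t before = summary[p].size();
                    summary[p].insert(summary[q].begin(), summary[q].end());
                    changed = changed || summary[p].size() != before;
                }
        }
    }
    return summary;
}
```

Non-recursive procedures form singleton components and settle after one pass; only mutually recursive groups need repeated iteration.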
9 Interprocedural Framework Architecture
[Diagram: framework layers, top to bottom]
- Driver: bottom-up, top-down, linear traversal
- User-defined handlers / lattice values
- Compound handlers: procedure calls and returns, composite regions
- Data structures: call graphs, SCCs, lattice values, regions, control flow graphs
10 Interprocedural Framework Architecture
- Interprocedural analysis data structures
- e.g. call graphs, regions or intervals
- Handlers: orthogonal sets of handlers for different groups of constructs
- Primitives: user specifies analysis-specific semantics of primitives
- Compound: handles compound statements and calls
- User chooses between handlers of different styles
- e.g. no interprocedural analysis versus context-sensitive
- e.g. flow-insensitive vs. flow-sensitive
- All the handlers are registered in a visitor
- Driver
- Driver invoked by user's request for information (demand driven)
- Builds prepass data structures
- Invokes the right set of handlers in the right order (e.g. bottom-up traversal of call graph)
11 III. Interprocedural Passes
- Scalar analysis
- Mod/ref, reduction recognition: bottom-up, flow-insensitive
- Liveness for privatization: bottom-up and top-down, flow-sensitive
- Constraint propagation: top-down, flow-insensitive
- Array analysis
- Dependence analysis
- Privatization analysis
12 Region-Based Array Analysis
- Array sections are represented as sets of linear inequalities (Omega)
- Bottom-up and backward-flow analysis
- For each region compute 4 sections for each array accessed
- M: may have been written
- W: must have been written
- R: may have been read
- E (exposed-read): values read are defined before the region executes
- Dependence test
- ∃ iterations i, j (i ≠ j) s.t. M_i ∩ R_j ≠ ∅ ?
- Privatization test
- ∀ iteration i, E_i = ∅ ?
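The two tests can be sketched on a deliberately simplified representation: each section is an explicit set of element indices rather than a set of linear inequalities, and one record is kept per iteration. `Section`, `IterationSections`, `has_loop_carried_dependence` and `is_privatizable` are hypothetical names. The dependence test is read here as asking whether a write in one iteration can touch an element read in a different iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using Section = std::set<int>;  // simplification: explicit element indices

struct IterationSections {
    Section M;  // may have been written
    Section R;  // may have been read
    Section E;  // exposed reads
};

inline bool intersects(const Section &a, const Section &b) {
    for (int x : a)
        if (b.count(x)) return true;
    return false;
}

// Dependence test: distinct iterations i, j whose M_i and R_j overlap.
inline bool has_loop_carried_dependence(
    const std::vector<IterationSections> &its) {
    for (std::size_t i = 0; i < its.size(); ++i)
        for (std::size_t j = 0; j < its.size(); ++j)
            if (i != j && intersects(its[i].M, its[j].R)) return true;
    return false;
}

// Privatization test: no iteration has an exposed read.
inline bool is_privatizable(const std::vector<IterationSections> &its) {
    for (const auto &it : its)
        if (!it.E.empty()) return false;
    return true;
}
```

A temporary that every iteration writes before reading fails the dependence test but passes the privatization test, which is the case where privatizing the array makes the loop parallelizable.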
13 Example: ModRef Analysis
class ModRefProblem : public BUProblem {
public:
    ModRefProblem(SuifEnv *suif_env, PtrAnalysisType the_ptrAnalysisType);
    virtual void initialize();
    ...
};

ModRefProblem::ModRefProblem(SuifEnv *suif_env,
                             PtrAnalysisType the_ptrAnalysisType)
    : BUProblem(suif_env, "ModRef",
                new ModRefValue(), new ModRefValue(),
                new ModRefUserBUHandler(suif_env, the_ptrAnalysisType),
                new CallGraphIPBUHandler(suif_env),
                new FlowInsensitiveIntraBUHandler(suif_env)),
      ptrAnalysisType(the_ptrAnalysisType)
{
    initialize();
}
14 Lattice Values
class ModRefValue : public LatticeValue {
public:
    ModRefValue();
    ~ModRefValue();
    AbslocSetValue *get_mod() const { return modVars; }
    AbslocSetValue *get_ref() const { return refVars; }
    virtual void do_meet(const LatticeValue *other, bool *changed = NULL);
    virtual LatticeValue *top() const;
    virtual LatticeValue *id() const;
    virtual void do_compose(const LatticeValue *other, bool *changed = NULL);
    virtual void do_star(const VariableSymbol *idx, const Expression *lb,
                         const Expression *ub, bool *changed);
    virtual void do_widen(const LatticeValue *other, bool *changed);
    LatticeValue *clone() const;
    bool is_top() const;
    bool is_id() const;
    String to_string() const;
    ...
};
15 User-Defined Handler
class ModRefUserBUHandler : public UserBUHandler {
public:
    ModRefUserBUHandler(SuifEnv *suif_env, PtrAnalysisType ptrAnalysisType);
    virtual UNSHARED LatticeValue *handle_statement(BUProblem *problem, Statement *stmt);
    virtual LatticeValue *handle_simple_region(BUProblem *problem, SimpleRegion *region);
    virtual LatticeValue *handle_predicate_region(BUProblem *problem, PredicateRegion *region);
    virtual LatticeValue *handle_mwb_default_region(BUProblem *problem, MWBDefaultRegion *region);
    virtual LatticeValue *handle_eval_predicate_region(BUProblem *problem, EvalPredicateRegion *region);
    virtual LatticeValue *handle_undef_proc_region(BUProblem *problem, UndefProcRegion *region);
    ...
};
16 Most of the work is done here!
UNSHARED LatticeValue *
ModRefUserBUHandler::handle_statement(BUProblem *problem, Statement *stmt)
{
    ModRefValue *curr_value = new ModRefValue();
    // Every source variable of the statement is referenced.
    for (SemanticHelper::SrcVarIter iter(stmt); iter.is_valid(); iter.next())
        curr_value->add_ref(iter.current());
    if (is_kind_of<StoreVariableStatement>(stmt)) {
        StoreVariableStatement *s = to<StoreVariableStatement>(stmt);
        VarAbsLocation *dest =
            VarAbsLocation::create_var_absloc(s->get_destination());
        curr_value->get_mod()->add(dest);
    } else if (is_kind_of<StoreStatement>(stmt)) {  // *x = y
        StoreStatement *s = to<StoreStatement>(stmt);
        curr_value->get_mod()->do_join(
            new AbslocSetValue(query->get_absloc_set(s), true));
    }
    return curr_value;
}
17 Parallelizer
- Parallelizes a loop if
- there is no abnormal exit out of a loop
- all scalar variables are either
- read-only variables
- privatizable variables
- reduction variables
- all array variables
- either have no dependence
- or can be privatized
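The conditions above amount to a simple predicate over per-loop analysis results. The sketch below assumes the earlier passes have already classified each variable; `LoopFacts`, `ScalarClass`, `ArrayClass` and `can_parallelize` are hypothetical names, not the parallelizer's actual interface.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classifications produced by the scalar and array analyses.
enum class ScalarClass { ReadOnly, Privatizable, Reduction, Other };
enum class ArrayClass { NoDependence, Privatizable, Dependent };

struct LoopFacts {
    bool has_abnormal_exit;
    std::vector<ScalarClass> scalars;
    std::vector<ArrayClass> arrays;
};

// A loop qualifies only if every condition on the slide holds.
inline bool can_parallelize(const LoopFacts &loop) {
    if (loop.has_abnormal_exit) return false;
    for (ScalarClass s : loop.scalars)
        if (s == ScalarClass::Other) return false;
    for (ArrayClass a : loop.arrays)
        if (a == ArrayClass::Dependent) return false;
    return true;
}
```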
18 IV. Pointer Alias Analysis
- Steensgaard's pointer alias analysis
- Flow-insensitive and context-insensitive, type-inference based analysis
- Very efficient: near linear-time analysis
- Very inaccurate
- A good bootstrapping step for interprocedural C program analysis
- Enables the construction of a call graph with indirect function calls
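The following is a minimal sketch in the spirit of Steensgaard's unification-based analysis, not the SUIF implementation. It assumes four statement forms, names variables by string, and keeps one pointee class per equivalence class in a union-find structure. The class name `Steensgaard` and its member functions are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

class Steensgaard {
public:
    // p = &x
    void address_of(const std::string &p, const std::string &x) {
        join(pointee(var(p)), var(x));
    }
    // p = q
    void assign(const std::string &p, const std::string &q) {
        join(pointee(var(p)), pointee(var(q)));
    }
    // p = *q
    void load(const std::string &p, const std::string &q) {
        join(pointee(var(p)), pointee(pointee(var(q))));
    }
    // *p = q
    void store(const std::string &p, const std::string &q) {
        join(pointee(pointee(var(p))), pointee(var(q)));
    }
    // True if p and q may point into the same class of locations.
    bool may_alias(const std::string &p, const std::string &q) {
        return pointee(var(p)) == pointee(var(q));
    }

private:
    std::map<std::string, int> ids_;
    std::vector<int> parent_;  // union-find parent links
    std::vector<int> pts_;     // pointee node of each representative, or -1

    int fresh() {
        const int id = static_cast<int>(parent_.size());
        parent_.push_back(id);
        pts_.push_back(-1);
        return id;
    }
    int var(const std::string &name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        const int id = fresh();
        ids_[name] = id;
        return id;
    }
    int find(int n) {
        while (parent_[n] != n) {
            parent_[n] = parent_[parent_[n]];  // path halving
            n = parent_[n];
        }
        return n;
    }
    // Representative of the class n points to, created on demand.
    int pointee(int n) {
        const int rep = find(n);
        if (pts_[rep] < 0) {
            const int f = fresh();  // may reallocate, so assign afterwards
            pts_[rep] = f;
        }
        return find(pts_[rep]);
    }
    // Unify two classes and, recursively, what they point to.
    void join(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return;
        parent_[a] = b;
        const int pa = pts_[a], pb = pts_[b];
        if (pb < 0) pts_[b] = pa;
        else if (pa >= 0) join(pa, pb);
    }
};
```

Because every assignment unifies classes instead of adding a directed constraint, a single `r = q` can make unrelated pointers appear to alias, which illustrates both the speed and the imprecision noted on the slide.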
19 Context-Sensitive Pointer Analysis
- Implementation in SUIF 2 of the analysis described in "Scalable Context-Sensitive Flow Analysis Using Instantiation Constraints", Fahndrich, Rehof, Das (PLDI '00!)
- Context-sensitive, flow-insensitive flow analysis.
- Instantiation constraints represent caller-callee relationships.
- Handles function pointers smoothly, and is efficient.
- One application is pointer alias analysis.
- Implementation runs in three phases
- constraint generation
- constraint solution
- reachability analysis
- Implemented in SUIF in 6 weeks (as a first project in SUIF)
20 Demo of Two Visualization Tools
- From the implementation in SUIF, running on sizeable programs
- Progress of the analysis
- Resulting type graphs
21 Progress Visualization
- Simple X Windows progress monitor.
- One pixel for each node.
- Allocated in scan order as they are created.
- White: initially created; red: callee; green: caller; grey: merged node.
- Visualization results
- Constraint generation
- white nodes created: some functions and call sites.
- Constraint solution
- nodes merged together and many greyed out. Several passes of working down pointer chains ab, ab, ab.
- Red and green spread to formal and actual parameters.
- Some new nodes created for product types.
- Scattered merging as the algorithm deduces flow through functions
- Nearly 1,000,000 nodes created for gcc. 2.5 minutes CPU time on this laptop.
22 Result Visualization
- pointergraph compress.suif compress.ps
- ghostview compress.ps
- Resulting type graphs courtesy of Dot
- Pointees below pointers.
- Arguments below functions.
- Callers below callees.
- Nodes marked with variable names.
- Optional grouping by function (only for small programs).