Title: Data Flow Analysis ctd.
1 Data Flow Analysis ctd.
- Data Flow Problems
- Formal Data Flow Systems
2 Dead Code Elimination
- Code is useless if it can never be executed.
  - If we cannot reach a node from the start of the flowgraph, it is useless.
  - It won't be executed, but removing it reduces the size of the object code.
- Code is dead if it does not contribute to the results of the program in any way.
- Dead code may be inadvertently written, produced as a result of code modification (upgrades), or generated by the compiler during optimizations.
- The dead code elimination optimization attempts to recognize and eliminate dead code in a program.
- We look at a strategy for doing so in the next slides.
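The reachability test for useless code can be sketched as a simple graph traversal. This is a minimal sketch, assuming the flowgraph is given as a successor map; the function `reachable` and the node names are hypothetical and not taken from the slides.

```python
def reachable(succ, start):
    """Return the set of nodes reachable from start in the flowgraph.

    succ: dict mapping each node to a list of its successor nodes
          (an assumed representation of the flowgraph).
    """
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Hypothetical flowgraph: B2 has no path leading to it from the start node.
succ = {"start": ["B1"], "B1": ["exit"], "B2": ["exit"], "exit": []}
useless = set(succ) - reachable(succ, "start")
```

Nodes left in `useless` can never execute, so they can be removed without changing program behavior.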
3 Dead Code Elimination
- Goal: remove all statements that do not contribute to the output (or interact with the environment) in any way. We only consider WRITE statements in the following.
- Strategy: mark useful statements.
- Let STAT be the set of statements, OUTPUT be the set of WRITE statements in STAT, and let the sets UD(S, v) be available for all $S \in STAT$ and $v \in VAR$ with $v \in USE(S)$. UD(S, v) is the set of definitions of v that reach the use of v in S.
- Let KEEP be the set of useful statements.
- Set mark to FALSE for each statement.
- Initialize KEEP = OUTPUT.
This is a somewhat simplistic definition of dead code.
4 Dead Code Elimination
- while KEEP is not empty, select and remove an arbitrary statement S from KEEP
  - mark(S) = TRUE
  - for all v in USE(S) do
    - for all S' in UD(S, v) do
      - if mark(S') = FALSE, add S' to KEEP
    - end for
  - end for
- end while
Idea: work backwards from the data output and keep all calculations that contribute to computing it. Every statement still marked FALSE at the end is dead.
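The marking strategy can be sketched in Python as follows. This is a minimal sketch, assuming statements are identified by integers and that USE and UD are supplied as dictionaries; the function name `mark_useful` and the four-statement example program are hypothetical.

```python
from collections import deque


def mark_useful(statements, output, use, ud):
    """Return the set of statements marked useful.

    statements: iterable of statement ids (STAT)
    output:     ids of the WRITE statements (OUTPUT)
    use:        dict statement id -> set of variables it uses (USE)
    ud:         dict (statement id, variable) -> set of defining statement
                ids that reach that use (UD)
    """
    mark = {s: False for s in statements}
    keep = deque(output)  # worklist, initialised to OUTPUT
    while keep:
        s = keep.popleft()
        if mark[s]:
            continue  # already processed; it was queued more than once
        mark[s] = True
        for v in use.get(s, ()):
            for d in ud.get((s, v), ()):
                if not mark[d]:
                    keep.append(d)
    return {s for s, useful in mark.items() if useful}


# Hypothetical program:
#   1: a = 1
#   2: b = 2        (never used)
#   3: c = a + 1
#   4: WRITE c
use = {1: set(), 2: set(), 3: {"a"}, 4: {"c"}}
ud = {(3, "a"): {1}, (4, "c"): {3}}
useful = mark_useful([1, 2, 3, 4], [4], use, ud)
dead = {1, 2, 3, 4} - useful
```

Here statement 2 is never reached by the backward walk from the WRITE, so it ends up in `dead`.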
5 Live Variables Problem
- Recall: a variable v is live at the exit of a basic block n iff there is a path from n to some basic block n' such that there is an outward exposed use of v in n', and the path is definition free for v.
(Figure: a definition of v, followed along a path by a later use of v.)
- The live variables problem is to determine, for each node n in the flowgraph, the set of live variables at the exit of n.
6 Live Variables Problem
- If a variable is live, then its current value may be used subsequently.
- This supports register allocation: if a variable is not live, then its value does not need to be kept in a register.
- We require a single-exit flowgraph to solve this problem, since we need to track variable uses back to their definitions.
- So we start at the exit node.
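A backward iteration for live variables might look like the sketch below. It assumes the usual equations, which the slides do not spell out: the live set at the exit of n is the union of the live sets at the entry of its successors, and the live set at the entry of n is USE(n) together with whatever is live at the exit and not defined in n. The function name and the three-block flowgraph are hypothetical.

```python
def live_variables(succ, use, defs):
    """Compute the live variables at the exit of every node.

    succ: dict node -> list of successor nodes
    use:  dict node -> variables with an exposed use in the node
          (used before any definition inside the node)
    defs: dict node -> variables defined in the node
    """
    live_in = {n: set() for n in succ}
    live_out = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        for n in succ:
            out = set()
            for s in succ[n]:
                out |= live_in[s]  # join over all successors
            inn = use[n] | (out - defs[n])
            if out != live_out[n] or inn != live_in[n]:
                live_out[n], live_in[n] = out, inn
                changed = True
    return live_out


# Hypothetical flowgraph B1 -> B2 -> B3 (exit), with a loop edge B2 -> B2.
#   B1: i = 0; s = 0; n = 10
#   B2: i = i + 1; s = i * n
#   B3: WRITE s
succ = {"B1": ["B2"], "B2": ["B2", "B3"], "B3": []}
use = {"B1": set(), "B2": {"i", "n"}, "B3": {"s"}}
defs = {"B1": {"i", "s", "n"}, "B2": {"i", "s"}, "B3": set()}
live_out = live_variables(succ, use, defs)
```

In this example s is not live at the exit of B1, because B2 overwrites it before any use; the assignment to s in B1 is therefore a candidate for dead code elimination.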
7 Modeling Data Flow Problems
- We can save considerable programming effort by using a framework to solve all of the dataflow problems needed in a compiler.
- Dataflow frameworks require a representation of the program and a strategy for saving data flow information associated with this representation.
- We assume that a program is represented by the flowgraph (as well as the IR?).
- Since there is a flowgraph for each individual procedure, data flow information is gathered on a per-procedure basis.
8 Modeling Data Flow Problems
- Our general strategy for modeling data flow problems expects a flowgraph and the following:
  - specification of the data flow problem to be solved,
  - availability of the initial information required,
  - specification of the effect of individual basic blocks on the information, and
  - the effect of a join in the flowgraph (i.e. what happens when several paths in the flowgraph meet).
The idea: we gather data flow information systematically for all the statements in the procedure.
9 Modeling Data Flow Problems
- Given these, we can propagate information through the flowgraph.
(Figure: a flowgraph annotated with the labels "start info", "updated info", and "what info?".)
10 Example: Reaching Definitions Problem
$RD(n) = \{ S \mid DEF(S) \neq \emptyset \text{ and } S \text{ reaches } n \}$, a set of statements that define variables.
- Start info: begin with empty sets for each basic block.
- Effect of a basic block: if S1 is the set of reaching definitions that enters n, then any statement in S1 that is preserved in n will be a reaching definition for successor blocks.
- In addition, all outward exposed definitions in n will be reaching definitions for successor blocks.
(Figure: a basic block n entered by the set S1, with statements defining v and a.)
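The effect of a single basic block on reaching definitions can be sketched as below. This is a minimal sketch, assuming a definition is represented as a (label, variable) pair and a block as a list of such pairs in execution order; the function name `rd_transfer` and the statement labels are hypothetical.

```python
def rd_transfer(incoming, block):
    """Apply one basic block to a set of reaching definitions.

    incoming: set of (label, variable) definitions entering the block
    block:    list of (label, variable) definitions in execution order
    """
    defined_here = {var for _, var in block}
    # Incoming definitions survive only if the block does not redefine
    # their variable.
    preserved = {(lab, var) for lab, var in incoming
                 if var not in defined_here}
    # Only the last definition of each variable in the block is outward
    # exposed.
    last_def = {}
    for lab, var in block:
        last_def[var] = lab
    exposed = {(lab, var) for var, lab in last_def.items()}
    return preserved | exposed


# Hypothetical example: S1 (defines v) and S2 (defines a) enter a block
# that defines a twice.
s1 = {("S1", "v"), ("S2", "a")}
block = [("S3", "a"), ("S4", "a")]
out = rd_transfer(s1, block)
```

The definition of v passes through untouched, while only the final definition of a in the block leaves it.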
11 Example: Reaching Definitions Problem
$RD(n) = \{ S \mid DEF(S) \neq \emptyset \text{ and } S \text{ reaches } n \}$, a set of statements that define variables.
- Effect of a join: if two nodes n1 and n2 join at n, where S1 is the set of reaching definitions of n1 and S2 is the set of reaching definitions of n2, then the definitions in S1 and the definitions in S2 all reach n. So we form $S1 \cup S2$.
(Figure: two blocks, one defining a and one defining b, joining at n.)
12 Example: Reaching Definitions Problem
- Solution of the problem:
  - Propagate information through the flowgraph iteratively.
  - Terminate when no new information is created for any node in the graph.
  - This means we have found all definitions in the program that reach a given node.
  - The length of the longest cycle-free path from the start node to the end node is an upper bound on the number of iterations required.
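The iterative solution can be sketched as a loop that repeats until nothing changes. This is a minimal sketch, assuming each block's effect is summarised by a GEN set (its outward exposed definitions) and a KILL set (the definitions it overwrites); these names, the function name, and the diamond-shaped example flowgraph are assumptions, not part of the slides.

```python
def reaching_definitions(pred, gen, kill):
    """Return the definitions reaching the entry of each node.

    pred: dict node -> list of predecessor nodes
    gen:  dict node -> definitions outward exposed in the node
    kill: dict node -> definitions that the node overwrites
    """
    rd_in = {n: set() for n in pred}   # start info: empty sets
    rd_out = {n: set() for n in pred}
    changed = True
    while changed:
        changed = False
        for n in pred:
            new_in = set()
            for p in pred[n]:
                new_in |= rd_out[p]  # join: union over predecessors
            new_out = gen[n] | (new_in - kill[n])
            if new_in != rd_in[n] or new_out != rd_out[n]:
                rd_in[n], rd_out[n] = new_in, new_out
                changed = True
    return rd_in


# Hypothetical flowgraph: B1 branches to B2 and B3, which join at B4.
#   B1: d1: a = ...    B2: d2: b = ...    B3: d3: a = ...
pred = {"B1": [], "B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]}
gen = {"B1": {"d1"}, "B2": {"d2"}, "B3": {"d3"}, "B4": set()}
kill = {"B1": {"d3"}, "B2": set(), "B3": {"d1"}, "B4": set()}
rd = reaching_definitions(pred, gen, kill)
```

At the join B4, d1 arrives along the path through B2 and d3 along the path through B3, so both definitions of a reach B4.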
13 Monotone Data Flow System
- A uniform framework for modeling almost all data flow problems is a Monotone Data Flow System (MDS).
- It unifies and simplifies the implementation of a variety of data flow problems:
  - construct a monotone data flow system,
  - use an iterative algorithm to propagate information in the flowgraph.
- To use this framework we must define
  - functions that describe the effect of a basic block on the solution,
  - the effect of a join in the flowgraph.
14 Semi-Lattices
An MDS is based on a semi-lattice (actually, a bounded semi-lattice with 0 and 1 elements).
- A semi-lattice is a set L with a binary meet operation $\wedge$ such that for all $a, b, c \in L$:
  - $a \wedge a = a$ (idempotent)
  - $a \wedge b = b \wedge a$ (commutative)
  - $a \wedge (b \wedge c) = (a \wedge b) \wedge c$ (associative)
- A semi-lattice has a
  - zero element iff for some element 0, $a \wedge 0 = 0$ for all $a \in L$,
  - one element iff for some element 1, $a \wedge 1 = a$ for all $a \in L$.
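For a small finite carrier set, the semi-lattice axioms can be checked by brute force. The sketch below assumes the elements are the subsets of a two-element set with intersection as the meet; the helper names `is_semilattice` and `zero_and_one` are hypothetical.

```python
from itertools import combinations, product


def powerset(m):
    """All subsets of m, as frozensets."""
    items = sorted(m)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_semilattice(elements, meet):
    """Brute-force check of idempotence, commutativity, associativity."""
    if any(meet(a, a) != a for a in elements):
        return False
    if any(meet(a, b) != meet(b, a)
           for a, b in product(elements, repeat=2)):
        return False
    return all(meet(a, meet(b, c)) == meet(meet(a, b), c)
               for a, b, c in product(elements, repeat=3))


def zero_and_one(elements, meet):
    """Return (zero, one); an entry is None if no such element exists."""
    zero = next((z for z in elements
                 if all(meet(a, z) == z for a in elements)), None)
    one = next((o for o in elements
                if all(meet(a, o) == a for a in elements)), None)
    return zero, one


def intersect(a, b):
    return a & b


elems = powerset({"x", "y"})
ok = is_semilattice(elems, intersect)
zero, one = zero_and_one(elems, intersect)
# Set difference is not idempotent, so it is not a meet operation.
difference_ok = is_semilattice(elems, lambda a, b: a - b)
```

With intersection as the meet, the zero element is the empty set and the one element is the full set.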
15 Partial Order in a Semi-Lattice
- We may define a partial order on a semi-lattice as follows:
  - Given a semi-lattice $(L, \wedge)$ and arbitrary elements $a, b \in L$:
  - $a \le b \iff a \wedge b = a$
- $\le$ is a partial order on L.
- We use $>$ and $<$ in the usual way.
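That the relation defined by $a \wedge b = a$ is a partial order follows directly from the three semi-lattice axioms. A short derivation, written here as a sketch of the standard argument rather than taken from the slides:

```latex
\begin{align*}
\text{Reflexive:} \quad
  & a \wedge a = a \;\Rightarrow\; a \le a
  && \text{(idempotence)} \\
\text{Antisymmetric:} \quad
  & a \le b,\ b \le a \;\Rightarrow\; a = a \wedge b = b \wedge a = b
  && \text{(commutativity)} \\
\text{Transitive:} \quad
  & a \le b,\ b \le c \;\Rightarrow\;
    a \wedge c = (a \wedge b) \wedge c = a \wedge (b \wedge c)
    = a \wedge b = a \;\Rightarrow\; a \le c
  && \text{(associativity)}
\end{align*}
```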
16 Bounded Semi-Lattice
- Let $a_1, a_2, \dots, a_n$ be a sequence of elements from a semi-lattice L.
- This sequence is a chain iff $a_j > a_{j+1}$ for $1 \le j \le n-1$.
- A semi-lattice is bounded iff for every $a \in L$, there is some natural number $c_a$ such that the length of every chain beginning with a is at most $c_a$.
- Thus each chain is of finite length.
17 Bounded Semi-Lattices
- We may extend the meet operation to an arbitrary finite number of elements of a semi-lattice:
  - $\bigwedge_{j \le m} a_j = a_1 \wedge a_2 \wedge \dots \wedge a_m$
- We may further extend $\wedge$ to countably infinite sets.
- If L is bounded, the limit exists and is equal to the meet of a finite set of elements.
18 Bounded Semi-Lattices Example
- For a finite set M, $(P(M), \cap)$ is a bounded semi-lattice with a 0 element ($\emptyset$) and a 1 element ($M$). In this case, $\le$ is the set-theoretic relation $\subseteq$.
- For a finite set M, $(P(M), \cup)$ is a bounded semi-lattice with a 0 element ($M$) and a 1 element ($\emptyset$). In this case, $\le$ is the inverse set-theoretic relation $\subseteq^{-1}$, i.e. $\supseteq$.
19 Bounded Semi-Lattices Example
- With $\cap$ as the meet operation: $A \le B$ iff $A \cap B = A$, i.e. $A \subseteq B$.
- With $\cup$ as the meet operation: $A \le B$ iff $A \cup B = A$, i.e. $A \supseteq B$.
(Figure: Venn diagrams of sets A and B illustrating the two orderings.)
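The two orderings on a power set can be checked mechanically for a small set. This is a minimal sketch, assuming a two-element set of hypothetical definitions d1 and d2; the helper `leq` implements the ordering induced by a meet, and all names are illustrative.

```python
from itertools import combinations


def powerset(m):
    """All subsets of m, as frozensets."""
    items = sorted(m)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def leq(meet, a, b):
    """a <= b in the order induced by the meet: a meet b equals a."""
    return meet(a, b) == a


def intersect(a, b):
    return a & b


def union(a, b):
    return a | b


elems = powerset({"d1", "d2"})
# With intersection as the meet, the induced order is subset inclusion.
cap_is_subset = all(leq(intersect, a, b) == (a <= b)
                    for a in elems for b in elems)
# With union as the meet, it is the inverse relation, superset inclusion.
cup_is_superset = all(leq(union, a, b) == (a >= b)
                      for a in elems for b in elems)
```

Python's `<=` and `>=` on frozensets are the subset and superset tests, which makes the comparison direct.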
20 Bounded Semi-Lattices Example
- The natural numbers with the meet operation given by min also form a bounded semi-lattice.
- We can use the usual definition of $\le$, since $a \le b$ iff $\min(a, b) = a$.
- If we use max as the meet operation, then we would need to use the ordering $\ge$ instead.
21 Monotone Data Flow System
- A monotone data flow system is founded on a bounded semi-lattice $(L, \wedge)$ with a 0 and a 1 element.
- It has functions that model the effect of a basic block on data flow information.
- Most data flow problems operate on semi-lattices where sets (of statements, variables) are the elements, and the meet operation is the union or intersection of such sets.
- We use the properties of such semi-lattices to write algorithms with known behavior.
22 Effects of a Basic Block
- These are modeled by a function $f: L \to L$. We require such functions in a monotone data flow system to be monotonic.
- A total function $f: L \to L$ is monotonic if and only if
  - for all $a, b \in L$, $f(a \wedge b) \le f(a) \wedge f(b)$.
- Alternatively, a function is monotonic iff
  - for all $a, b \in L$, $a \le b \Rightarrow f(a) \le f(b)$.
23 Effects of a Basic Block
- The iterative algorithm will produce precise results if the functions are distributive. Otherwise, they may not be precise.
- A total function $f: L \to L$ is distributive if and only if
  - for all $a, b \in L$, $f(a \wedge b) = f(a) \wedge f(b)$.
If we know that our problem is distributive, the algorithm will find all the available data flow information. Otherwise, it might not.
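Both properties can be tested by brute force on a small lattice. The sketch below assumes the lattice of subsets of three hypothetical definitions with union as the meet, as in reaching definitions. `gen_kill` is an assumed transfer function of the form GEN together with (x minus KILL), and `saturate` is an artificial function invented here only to show a monotonic function that is not distributive.

```python
from itertools import combinations


def powerset(m):
    """All subsets of m, as frozensets."""
    items = sorted(m)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def union(a, b):
    return a | b


def leq(a, b):
    """Order induced by union as the meet: a <= b iff a | b == a."""
    return union(a, b) == a


def is_monotonic(f, elements):
    return all(leq(f(a), f(b))
               for a in elements for b in elements if leq(a, b))


def is_distributive(f, elements):
    return all(f(union(a, b)) == union(f(a), f(b))
               for a in elements for b in elements)


M = frozenset({"d1", "d2", "d3"})
elems = powerset(M)


def gen_kill(x):
    # Assumed block effect: generates d3 and kills d1.
    return frozenset({"d3"}) | (x - frozenset({"d1"}))


def saturate(x):
    # Artificial example: jumps to the full set once two elements are present.
    return M if len(x) >= 2 else x


gen_kill_props = (is_monotonic(gen_kill, elems),
                  is_distributive(gen_kill, elems))
saturate_props = (is_monotonic(saturate, elems),
                  is_distributive(saturate, elems))
```

Functions of the gen/kill form pass both checks, which is why the iterative algorithm is precise for problems such as reaching definitions.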
24 Fixpoint of Monotone Data Flow System
- Our plan is to repeatedly apply monotone functions to update the effect of a basic block upon data flow information associated with nodes.
- So we must be sure that this process terminates.
- This is guaranteed by the greatest fixpoint theorem.
- A fixpoint of a monotonic function $f: L \to L$ is a value $a \in L$ such that $f(a) = a$.
25 Greatest Fixpoint Theorem
- The theorem:
  - let L be a bounded semi-lattice with 0 and 1 elements, and
  - let $f: L \to L$ be a monotonic function.
  - Then there is a $t \ge 0$ such that $f^{t+1}(1) = f^t(1)$.
  - $f^t(1)$ is the greatest fixpoint of f.
- The proof: see next slide.
26 Greatest Fixpoint Theorem
- $1 \ge f(1)$, since $1 \ge x$ for all $x \in L$.
- Since f is monotonic, $f(1) \ge f(f(1))$.
- So $1 \ge f(1) \ge f(f(1)) \ge \dots$ is a chain which is bounded, and for some t, $f(f^t(1)) = f^t(1)$.
- Hence $f^t(1)$ is a fixpoint of f.
- Now assume a is an arbitrary fixpoint of f.
- Then $f(a) = a$.
- Since $a \le 1$, $f(a) \le f(1)$; we repeatedly apply the monotonicity definition: $a = f^t(a) \le f^t(1)$.
- Thus $f^t(1)$ is the greatest fixpoint of f.
27 Computing Greatest Fixpoint
- Given a bounded semi-lattice $(L, \wedge)$ with 0 and 1, and a monotonic function $f: L \to L$, the greatest fixpoint of f is computed as follows:
  - a = 1
  - while f(a) < a
    - a = f(a)
  - end while
  - fixpoint = a
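The loop translates almost directly into Python. This is a minimal sketch, assuming the lattice is given by its one element and its meet operation; the example function `drop_unsupported` and the `requires` table are hypothetical and chosen only so that the chain takes more than one step to settle.

```python
def greatest_fixpoint(f, one, meet):
    """Iterate a = f(a) from the one element while f(a) < a."""
    def strictly_less(x, y):
        # x < y in the order induced by the meet
        return x != y and meet(x, y) == x

    a = one
    nxt = f(a)
    while strictly_less(nxt, a):
        a, nxt = nxt, f(nxt)
    return a


# Hypothetical example on the subsets of {a, b, c, d} with intersection as
# the meet: an item survives only if everything it requires is present.
requires = {"a": set(), "b": {"a"}, "c": {"x"}, "d": {"c"}}


def drop_unsupported(items):
    return frozenset(e for e in items if requires[e] <= items)


result = greatest_fixpoint(drop_unsupported,
                           frozenset(requires),
                           lambda x, y: x & y)
```

Starting from the full set, the first application removes c (its requirement x is missing), the second removes d (which needed c), and the third changes nothing, so the loop stops with the greatest fixpoint.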