Title: Introduction to Abstract Interpretation
1 Introduction to Abstract Interpretation
- Andy King
- a.m.king_at_kent.ac.uk
- http://www.cs.kent.ac.uk/amk
2 Pointers to the literature
- SAS, POPL, ESOP, ICLP, ICFP, ...
- Useful review articles and books
  - Patrick and Radhia Cousot, Comparing the Galois connection and
    Widening/Narrowing approaches to Abstract Interpretation, PLILP,
    LNCS 631, 269-295, 1992. Available from LIX library.
  - Patrick and Radhia Cousot, Abstract interpretation and Application to
    Logic Programs, JLP, 13(2-3):103-179, 1992
  - Flemming Nielson, Hanne Riis Nielson and Chris Hankin, Principles of
    Program Analysis, Springer, 1999.
- Patrick has a database of abstract interpretation researchers and
  regularly writes tutorials, see [CC'02].
3 Applications of abstract interpretation
- Verification: can a concurrent program deadlock? Is termination assured?
- Parallelisation: are two or more tasks independent? What is the
  worst/best-case running time of a function?
- Transformation: can a definition be unfolded? Will unfolding terminate?
- Implementation: can an operation be specialised with knowledge of its
  (global) calling context?
- Applications and players are incredibly diverse
4 Casting out nines algorithm
- Which of the following multiplications is correct?
  - 2173 × 38 = 81574 or
  - 2173 × 38 = 82574
- Casting out nines is a checking technique that is really a form of
  abstract interpretation
  - Sum the digits in the multiplicand n1, multiplier n2 and the product n
    to obtain s1, s2 and s.
  - Divide s1, s2 and s by 9 to compute the remainders, that is,
    r1 = s1 mod 9, r2 = s2 mod 9 and r = s mod 9.
  - Calculate r' = (r1 × r2) mod 9
  - If r ≠ r' then the multiplication is incorrect
- The algorithm returns "incorrect" or "don't know"
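The checking steps above can be sketched in a few lines of Python. This is only an illustration: the function names `digit_sum` and `cast_out_nines` are made up for this sketch and do not come from the slides.

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(d) for d in str(n))


def cast_out_nines(n1: int, n2: int, n: int) -> str:
    """Check the claim n1 * n2 == n using remainders modulo 9 only."""
    r1 = digit_sum(n1) % 9
    r2 = digit_sum(n2) % 9
    r = digit_sum(n) % 9
    r_prime = (r1 * r2) % 9
    # A mismatch proves the claim wrong; a match proves nothing.
    return "incorrect" if r != r_prime else "don't know"


print(cast_out_nines(2173, 38, 81574))
print(cast_out_nines(2173, 38, 82574))
```

The check can only refute a product. Agreement of the remainders never establishes that the multiplication is right, which is why the second answer is "don't know" rather than "correct".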
5 Running the numbers for 2173 × 38 = 81574
- Compute r1 = (2 + 1 + 7 + 3) mod 9 = 4
- Compute r2 = (3 + 8) mod 9 = 2
- Calculate r = (8 + 1 + 5 + 7 + 4) mod 9 = 7
- Calculate r' = (r1 × r2) mod 9 = 8
- Check (r ≠ r'), which holds since 7 ≠ 8
- Deduce that 2173 × 38 = 81574 is incorrect
6 Abstract interpretation is a theory of relationships
- The computational domain for multiplication (concrete domain)
  - N, the set of non-negative integers
- The computational domain of remainders used in the checking algorithm
  (abstract domain)
  - R = {0, 1, ..., 8}
- Key question is what is the relationship between an element n ∈ N which
  is used in the real algorithm and its analog r ∈ R in the check
7 What is the relationship?
- When the multiplicand is n1 = 456, say, then the check uses
  r1 = (4 + 5 + 6) mod 9 = 6
- Observe that
  - 456 mod 9
  - = (4×100 + 56) mod 9
  - = (4×90 + 4×10 + 56) mod 9
  - = (4×10 + 56) mod 9
  - = ((4 + 5)×10 + 6) mod 9
  - = ((4 + 5)×9 + (4 + 5) + 6) mod 9
  - = (4 + 5 + 6) mod 9
- More generally, induction can show r1 = n1 mod 9 and r2 = n2 mod 9
8 Correctness is the preservation of relationships
- The check simulates the concrete multiplication and, in effect, is an
  abstract multiplication
  - Concrete multiplication is n = n1 × n2
  - Abstract multiplication is r = (r1 × r2) mod 9
  - Where r1 describes n1 and r2 describes n2
- For brevity, write r ∝ n iff r = n mod 9
- Then abstract multiplication preserves ∝ iff whenever r1 ∝ n1 and
  r2 ∝ n2 it follows that r ∝ n
9 Correctness argument
- Suppose r1 ∝ n1 and r2 ∝ n2
- If
  - n = n1 × n2 then
  - n mod 9 = (n1 × n2) mod 9 hence
  - n mod 9 = ((n1 mod 9) × (n2 mod 9)) mod 9 whence
  - n mod 9 = (r1 × r2) mod 9 = r therefore
  - r ∝ n
- Consequently if ¬(r ∝ n) then n ≠ n1 × n2
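The preservation property can also be tested exhaustively on a small range. The sketch below is a sanity check rather than a proof; the names `describes` and `abstract_mul` are illustrative only.

```python
def describes(r: int, n: int) -> bool:
    """The relation 'r describes n': r is the remainder of n modulo 9."""
    return r == n % 9


def abstract_mul(r1: int, r2: int) -> int:
    """Abstract multiplication on remainders."""
    return (r1 * r2) % 9


# Collect every pair for which the abstract product fails to describe
# the concrete product; the correctness argument says there are none.
violations = [
    (n1, n2)
    for n1 in range(200)
    for n2 in range(200)
    if not describes(abstract_mul(n1 % 9, n2 % 9), n1 * n2)
]
print(len(violations))
```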
10 Summary
- Formalise the relationship between the data
- Check that the relationship is preserved by the abstract analogues of the
  concrete operations
- The relational framework [Acta Informatica, 30(2):103-129, 1993] not only
  emphasises the theory of relations but is also very general
11 Numeric approximation and widening
- Abstract interpretation does not require an
abstract domain to be finite
12 Interval approximation
- Consider the following Pascal-like program
- SYNTOX [PLDI'90] inferred the invariants annotated at the numbered
  program points
- Invariants occur between consecutive lines in the program
- i ∈ [0,15] asserts 0 ≤ i ≤ 15 whereas i ∈ [0,0] means i = 0

    begin
      i := 0;
        (1) i ∈ [0,0]
      while (i < 16) do
        (2) i ∈ [0,15]
        i := i + 1
        (3) i ∈ [1,16]
    end
        (4) i ∈ [16,16]
13 Compilation versus (classic) interpretation
- Abstract compilation: compile the concrete program into an abstract
  program (equation system) and execute the abstract program
  - good separation of concerns that aids debugging
  - the particulars of the domain can be exploited to reorder operations,
    specialise operations, etc
- Abstract interpretation: run the concrete program but on-the-fly
  interpret its concrete operations as abstract operations
  - ideal for a generic framework (toolkit) which is parameterised by
    abstract domain plugins
14 Abstract domain that is used in interval analysis
- Domain of intervals includes
  - [l,u] where l ≤ u and l,u ∈ Z for bounded sets, e.g. [0,5] ∝ {0,1,4}
    since {0,1,4} ⊆ [0,5]
  - ⊥ to represent the empty set of numbers, that is, ⊥ ∝ ∅
  - [l,∞] for sets which are bounded below such as {l, l+2, l+4, ...}
  - [-∞,u] to represent sets which are bounded above such as
    {..., u-5, u-3, u}
15 Weakening intervals

    if ... then
      (1) i ∈ [0,2]
    else
      (2) i ∈ [3,5]
    endif
    (3) i ∈ [0,5]

- Join (path merge) is defined
  - Put d1 ⊔ d2 = d1 if d2 = ⊥
  -             = d2 else if d1 = ⊥
  -             = [min(l1,l2), max(u1,u2)] otherwise
  - whenever d1 = [l1,u1] and d2 = [l2,u2]
16 Strengthening intervals
- Meet is defined
  - Put d1 ⊓ d2 = ⊥ if (d1 = ⊥) ∨ (d2 = ⊥)
  -             = [max(l1,l2), min(u1,u2)] otherwise
  - whenever d1 = [l1,u1] and d2 = [l2,u2]

    (3) i ∈ [0,5]
    if (2 < i) then
      (4) i ∈ [3,5]
    else
      (5) i ∈ [0,2]
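Join and meet on intervals can be sketched as follows. The encoding is an assumption of this sketch, not something the slides prescribe: `None` stands for ⊥, an interval is a `(lower, upper)` pair, and unbounded ends use floating-point infinity. The sketch also normalises an empty overlap to ⊥ in `meet`, a case the definition above leaves implicit.

```python
from typing import Optional, Tuple

Interval = Optional[Tuple[float, float]]  # None plays the role of ⊥
INF = float("inf")


def join(d1: Interval, d2: Interval) -> Interval:
    """Path merge: the smallest interval containing both arguments."""
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    (l1, u1), (l2, u2) = d1, d2
    return (min(l1, l2), max(u1, u2))


def meet(d1: Interval, d2: Interval) -> Interval:
    """Strengthening: the overlap of both arguments."""
    if d1 is None or d2 is None:
        return None
    (l1, u1), (l2, u2) = d1, d2
    lo, hi = max(l1, l2), min(u1, u2)
    # Assumption beyond the slide: an empty overlap is normalised to ⊥.
    return (lo, hi) if lo <= hi else None


print(join((0, 2), (3, 5)))      # merge of the two branches at point (3)
print(meet((0, 5), (3, INF)))    # the branch where 2 < i, point (4)
print(meet((0, 5), (-INF, 2)))   # the else branch, point (5)
```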
17 Meet and join are the basic primitives for compilation
- I1 = [0,0] since program point (1) immediately follows the i := 0
- I2 = (I1 ⊔ I3) ⊓ [-∞,15] since
  - control from program points (1) and (3) flows into (2)
  - point (2) is reached only if i < 16 holds
- I3 = {n+1 | n ∈ I2} since (3) is only reachable from (2) via the
  increment
- I4 = (I1 ⊔ I3) ⊓ [16,∞] since
  - control from (1) and (3) flows into (4)
  - point (4) is reached only if ¬(i < 16) holds
18 Interval iteration

    I1   ⊥   [0,0]  [0,0]  [0,0]  [0,0]  [0,0]  [0,0]  [0,0]  ...
    I2   ⊥   ⊥      [0,0]  [0,0]  [0,1]  [0,1]  [0,2]  [0,2]  ...
    I3   ⊥   ⊥      ⊥      [1,1]  [1,1]  [1,2]  [1,2]  [1,3]  ...
    I4   ⊥   ⊥      ⊥      ⊥      ⊥      ⊥      ⊥      ⊥      ...

    I1   ...  [0,0]   [0,0]   [0,0]    [0,0]
    I2   ...  [0,15]  [0,15]  [0,15]   [0,15]
    I3   ...  [1,15]  [1,16]  [1,16]   [1,16]
    I4   ...  ⊥       ⊥       [16,16]  [16,16]
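The iteration tabulated above can be reproduced with a short script that solves the four interval equations for the incrementing loop. The encoding (`None` for ⊥, pairs for intervals) and the helper names are assumptions of this sketch.

```python
INF = float("inf")


def join(d1, d2):
    if d1 is None or d2 is None:
        return d2 if d1 is None else d1
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1, d2):
    if d1 is None or d2 is None:
        return None
    lo, hi = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (lo, hi) if lo <= hi else None


def shift(d, k):
    """Abstract counterpart of {n + k | n ∈ d}."""
    return None if d is None else (d[0] + k, d[1] + k)


def jacobi_step(state):
    """Compute every new interval from the previous vector only."""
    i1, i2, i3, i4 = state
    return (
        (0, 0),
        meet(join(i1, i3), (-INF, 15)),
        shift(i2, 1),
        meet(join(i1, i3), (16, INF)),
    )


state = (None, None, None, None)  # start from ⊥ everywhere
while True:
    new_state = jacobi_step(state)
    if new_state == state:  # nothing changed: a fixpoint is reached
        break
    state = new_state

print(state)
```

Each pass recomputes all four intervals from the old vector, so information needs two passes to travel once around the loop between I2 and I3, as in the table.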
19 Jacobi versus Gauss-Seidel iteration
- With Jacobi, the new vector ⟨I1',I2',I3',I4'⟩ of intervals is calculated
  from the old ⟨I1,I2,I3,I4⟩
- With Gauss-Seidel iteration
  - I1' is calculated from ⟨I1,I2,I3,I4⟩
  - I2' is calculated from ⟨I1',I2,I3,I4⟩
  - I3' is calculated from ⟨I1',I2',I3,I4⟩
  - I4' is calculated from ⟨I1',I2',I3',I4⟩

    I1   ⊥   [0,0]  [0,0]  [0,0]  ...  [0,0]   [0,0]    [0,0]
    I2   ⊥   [0,0]  [0,1]  [0,2]  ...  [0,14]  [0,15]   [0,15]
    I3   ⊥   [1,1]  [1,2]  [1,3]  ...  [1,15]  [1,16]   [1,16]
    I4   ⊥   ⊥      ⊥      ⊥      ...  ⊥       [16,16]  [16,16]
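A Gauss-Seidel sweep differs from a Jacobi step only in that each equation immediately sees the values computed earlier in the same sweep. The sketch below illustrates this for the incrementing loop; the encoding (`None` for ⊥) and all helper names are assumptions of the sketch.

```python
INF = float("inf")


def join(d1, d2):
    if d1 is None or d2 is None:
        return d2 if d1 is None else d1
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1, d2):
    if d1 is None or d2 is None:
        return None
    lo, hi = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (lo, hi) if lo <= hi else None


def shift(d, k):
    return None if d is None else (d[0] + k, d[1] + k)


def gauss_seidel_sweep(state):
    """Update I1..I4 in order, reusing values already updated in this sweep."""
    i1, i2, i3, i4 = state
    i1 = (0, 0)
    i2 = meet(join(i1, i3), (-INF, 15))  # sees the new I1, the old I3
    i3 = shift(i2, 1)                    # sees the new I2
    i4 = meet(join(i1, i3), (16, INF))   # sees the new I1 and I3
    return (i1, i2, i3, i4)


state = (None, None, None, None)
sweeps = 0  # number of sweeps that changed at least one interval
while True:
    new_state = gauss_seidel_sweep(state)
    if new_state == state:
        break
    state = new_state
    sweeps += 1

print(state, sweeps)
```

The upper bound of I2 grows by one per sweep here, which matches the columns of the table: the loop bound of 16 determines the number of sweeps.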
20 Gauss-Seidel versus chaotic iteration
- Observe that I4 might change if either I1 or I3 change, hence evaluate I4
  after I1 and I3 stabilise
- Suggests waiting until stability is achieved at one level before starting
  on the next
[Figure: dependency graph over I1, I2, I3 and I4, with I2 and I3 grouped
into one level]
21 Gauss-Seidel versus chaotic iteration
- Chaotic iteration can postpone evaluating Ii for a bounded number of
  iterations
  - I1 is calculated from ⟨I1,-,-,-⟩
  - I2 and I3 are calculated Gauss-Seidel style from ⟨I1,I2,I3,-⟩
  - I4 is calculated from ⟨I1,I2,I3,I4⟩
- Fast and (incremental) fixpoint solvers [TOPLAS, 22(2):187-223, 2000]
  apply chaotic iteration

    I1   ⊥   [0,0]  [0,0]  [0,0]  ...  [0,0]   [0,0]   [0,0]
    I2   ⊥   -      [0,0]  [0,1]  ...  [0,15]  [0,15]  [0,15]
    I3   ⊥   -      [1,1]  [1,2]  ...  [1,16]  [1,16]  [1,16]
    I4   ⊥   -      -      -      ...  -       -       [16,16]
22 Suppose i was decremented rather than incremented

    begin
      i := 0;
        (1) i ∈ [0,0]
      while (i < 16) do
        (2) i ∈ [-∞,0]
        i := i - 1
        (3) i ∈ [-∞,-1]
    end
        (4) i ∈ ⊥

- I1 = [0,0]
- I2 = (I1 ⊔ I3) ⊓ [-∞,15]
- I3 = {n-1 | n ∈ I2}
- I4 = (I1 ⊔ I3) ⊓ [16,∞]

    I1   ⊥   [0,0]  [0,0]  [0,0]    [0,0]    [0,0]    [0,0]
    I2   ⊥   -      -      [0,0]    [-1,0]   [-2,0]   ...
    I3   ⊥   -      -      [-1,-1]  [-2,-1]  [-3,-1]  ...
    I4   ⊥   -      -      -        -        -        -
23 Ascending chain condition
- A domain D is ACC iff it does not contain an infinite strictly increasing
  chain d1 < d2 < d3 < ... where d < d' iff d ⊑ d' and d ≠ d' (see below)
- The interval domain D is ordered by
  - ⊥ ⊑ d for all d ∈ D and
  - [l1,u1] ⊑ [l2,u2] iff l2 ≤ l1 ≤ u1 ≤ u2
  - and is not ACC since [0,0] < [-1,0] < [-2,0] < ...
[Figure: diagram of the interval ordering between ⊥ and ⊤ over the points
-4, ..., 4]
24Some very expressive relational domains are ACC
- The sub-expression elimination relies on
detecting duplicated expression evaluation - Karr Acta Informatica, 6, 133-151 noticed that
detecting an invariance such as - y (x/2) 6 was key to this optimisation
begin x (2 (z ?w)) - 2 y (z
7) ?w end
25 The affine domain
- The domain of affine equations over n variables is
  - D = {⟨A,B⟩ | A is an m×n dimensional matrix and B is an m dimensional
    column vector}
- D is ordered by
  - ⟨A1,B1⟩ ⊑ ⟨A2,B2⟩ iff (if A1x = B1 then A2x = B2)
26 An affine abstraction
- Consider ⟨A,B⟩ where

    A = [ 1  0   0 ]    B = [ 1 ]
        [ 0  1  -2 ]        [ 0 ]

- Consider x = ⟨x1,x2,x3⟩^T where Ax = B
  - Then x1 = 1
  - Then x2 - 2x3 = 0

    begin x1 := 1; x2 := 2*x3 end
27 Pre-orders versus posets
- A pre-order ⟨D, ⊑⟩ is a set D ordered by a binary relation ⊑ such that
  - d ⊑ d for all d ∈ D
  - If d1 ⊑ d2 and d2 ⊑ d3 then d1 ⊑ d3
- A poset is a pre-order ⟨D, ⊑⟩ such that
  - If d1 ⊑ d2 and d2 ⊑ d1 then d1 = d2
28 The affine domain is a pre-order (so it is not ACC)
- Observe ⟨A1,B1⟩ ⊑ ⟨A2,B2⟩ and ⟨A2,B2⟩ ⊑ ⟨A1,B1⟩ but ⟨A1,B1⟩ ≠ ⟨A2,B2⟩

         [ 1 0 0 ]        [ 1 ]         [ 2 0 0 ]        [ 2 ]
    A1 = [ 0 1 0 ]   B1 = [ 0 ]    A2 = [ 0 1 0 ]   B2 = [ 0 ]
         [ 0 0 1 ]        [ 0 ]         [ 0 0 1 ]        [ 0 ]

- To build a poset from a pre-order
  - define d ≡ d' iff d ⊑ d' and d' ⊑ d
  - define [d]≡ = {d' ∈ D | d ≡ d'} and D≡ = {[d]≡ | d ∈ D}
  - define [d]≡ ⊑ [d']≡ iff d ⊑ d'
- The poset ⟨D≡, ⊑⟩ is ACC since chain length is bounded by the number of
  variables n
29 Inducing termination for non-ACC (and huge ACC) domains
- Enforce convergence for intervals with a widening operator ∇: D×D → D
  - ⊥ ∇ d = d
  - d ∇ ⊥ = d
  - [l1,u1] ∇ [l2,u2] = [if l2 < l1 then -∞ else l1,
                         if u1 < u2 then ∞ else u1]
- Examples
  - [1,2] ∇ [1,2] = [1,2]
  - [1,2] ∇ [1,3] = [1,∞] but [1,3] ∇ [1,2] = [1,3]
- Safe since [li,ui] ⊑ ([l1,u1] ∇ [l2,u2]) for i ∈ {1,2}
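The widening operator translates almost literally into code. In this sketch `None` stands for ⊥ and floating-point infinity stands for ∞; both choices, and the name `widen`, are assumptions of the sketch.

```python
INF = float("inf")


def widen(d1, d2):
    """Interval widening; None stands for ⊥, pairs are (lower, upper)."""
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    (l1, u1), (l2, u2) = d1, d2
    lower = -INF if l2 < l1 else l1  # an unstable lower bound jumps to -∞
    upper = INF if u1 < u2 else u1   # an unstable upper bound jumps to ∞
    return (lower, upper)


print(widen((1, 2), (1, 2)))
print(widen((1, 2), (1, 3)))
print(widen((1, 3), (1, 2)))
```

Note the asymmetry shown by the last two calls: the first argument is the previous iterate and the second is the new one, so only growth relative to the first argument triggers extrapolation.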
30 Chaotic iteration with widening
- To terminate it is necessary to traverse each loop a finite number of
  times
- It is sufficient to pass through I2 or I3 a finite number of times
  [Bourdoncle, 1990]
- Thus widen at I3 since it is simpler
[Figure: dependency graph over I1, I2, I3 and I4]
31 Termination for the decrement
- I1 = [0,0]
- I2 = (I1 ⊔ I3) ⊓ [-∞,15]
- I3 = I3 ∇ {n-1 | n ∈ I2} (note the fix)
- I4 = (I1 ⊔ I3) ⊓ [16,∞]
- When I2 = [-1,0] and I3 = [-1,-1], then
  - I3 ∇ {n-1 | n ∈ I2} = [-1,-1] ∇ [-2,-1] = [-∞,-1]

    I1   ⊥   [0,0]  [0,0]  [0,0]    [0,0]    [0,0]    [0,0]    [0,0]
    I2   ⊥   -      -      [0,0]    [-1,0]   [-∞,0]   [-∞,0]   [-∞,0]
    I3   ⊥   -      -      [-1,-1]  [-∞,-1]  [-∞,-1]  [-∞,-1]  [-∞,-1]
    I4   ⊥   -      -      -        -        -        -        ⊥
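The effect of widening at I3 can be observed by solving the equations for the decrementing loop. The sketch below uses plain Gauss-Seidel sweeps rather than the exact evaluation order of the table, and its encoding (`None` for ⊥) and helper names are assumptions.

```python
INF = float("inf")


def join(d1, d2):
    if d1 is None or d2 is None:
        return d2 if d1 is None else d1
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1, d2):
    if d1 is None or d2 is None:
        return None
    lo, hi = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (lo, hi) if lo <= hi else None


def widen(d1, d2):
    if d1 is None or d2 is None:
        return d2 if d1 is None else d1
    return (-INF if d2[0] < d1[0] else d1[0],
            INF if d1[1] < d2[1] else d1[1])


def shift(d, k):
    return None if d is None else (d[0] + k, d[1] + k)


def sweep(state):
    i1, i2, i3, i4 = state
    i1 = (0, 0)
    i2 = meet(join(i1, i3), (-INF, 15))
    i3 = widen(i3, shift(i2, -1))  # widening is applied at I3 only
    i4 = meet(join(i1, i3), (16, INF))
    return (i1, i2, i3, i4)


state = (None, None, None, None)
sweeps = 0
while True:
    new_state = sweep(state)
    if new_state == state:
        break
    state = new_state
    sweeps += 1

print(state, sweeps)
```

Without the call to `widen` the lower bound of I3 would decrease by one on every sweep and the loop would never exit; with it, the bound jumps to -∞ as soon as it is seen to move.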
32 (Malicious) research challenge
- Read a survey paper to find an abstract domain that is ACC but has a
  maximal chain length of O(2^n)
- Construct a program with O(n) symbols that iterates through all O(2^n)
  abstractions
- Publish the program in IPL
33 Are numeric domains convex?
- A set S ⊆ R^n is convex iff for all x,y ∈ S it follows that
  {λx + (1-λ)y | 0 ≤ λ ≤ 1} ⊆ S
- The 2 leftmost sets in R^2 are convex but the 2 rightmost sets are not
- Intervals and affine systems are convex
34 Arithmetic congruences are not convex
- Elements of the arithmetic congruence (AC) domain take the form
  x + 2y = 1 (mod 3) which describes integral values of x and y
- More exactly, the AC domain consists of conjunctions of equations of the
  form
  - c1x1 + ... + cmxm = (c mod n) where ci,c ∈ Z and n ∈ N
- Incredibly AC is ACC [IJCM, 30, 165-190, 1989]
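A small check makes the non-convexity concrete for the congruence x + 2y = 1 (mod 3). The two end points and the intermediate point are chosen for this illustration and do not come from the slides.

```python
def satisfies(x: int, y: int) -> bool:
    """Membership in the set described by x + 2y = 1 (mod 3)."""
    return (x + 2 * y) % 3 == 1


p, q = (1, 0), (4, 0)
between = (2, 0)  # integral point on the line segment from p to q

print(satisfies(*p), satisfies(*q), satisfies(*between))
```

Both end points satisfy the congruence while a point on the segment joining them does not, so the described set cannot be convex.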
35 Research challenge
- Søndergaard [FSTTCS'95] introduced the concept of an immediate fixpoint
- Consider the following (groundness) dependency equations over the domain
  of Boolean functions ⟨Bool, ∨, ∧⟩
  - f1 = x ∧ (y ↔ z)
  - f2 = ∃t(∃x(∃z(u ↔ (t∧x) ∧ v ↔ (t∧z) ∧ f4)))
  - f3 = ∃u(∃v(x ↔ u ∧ z ↔ v ∧ f2))
  - f4 = f1 ∨ f3
- Where ∃x(f) = f[x ↦ true] ∨ f[x ↦ false] thus ∃x(x ↔ y) = true and
  ∃x(x ∧ y) = y
36 The alternative tactic
- The standard tactic is to apply iteration
- Søndergaard found that the system can be solved symbolically (like a
  quadratic)
- This would be very useful for infinite domains for improved precision and
  predictability

    f1   false  x ∧ (y↔z)  x ∧ (y↔z)  x ∧ (y↔z)  x ∧ (y↔z)
    f2   false  false      false      v ↔ (y∧u)  (u∧y) ↔ v
    f3   false  false      false      false      (x∧y) ↔ z
    f4   false  false      x ∧ (y↔z)  x ∧ (y↔z)  (x∧y) ↔ z
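The standard iterative tactic can be sketched by representing each Boolean function as the set of truth assignments that satisfy it. The reading of the equations used here (f1 = x ∧ (y ↔ z), f2 and f3 built from ↔ and ∧ under existential quantification, f4 = f1 ∨ f3) is an assumption, since the connectives did not survive in the slide text; it is the reading under which iteration reproduces the final column of the table.

```python
from itertools import product

VARS = ("t", "u", "v", "x", "y", "z")
MODELS = list(product((False, True), repeat=len(VARS)))


def fn(pred):
    """A Boolean function as the frozenset of assignments satisfying pred."""
    return frozenset(m for m in MODELS if pred(dict(zip(VARS, m))))


def exists(var, f):
    """∃var(f) = f[var ↦ true] ∨ f[var ↦ false]."""
    i = VARS.index(var)
    return frozenset(m[:i] + (b,) + m[i + 1:] for m in f for b in (False, True))


F1 = fn(lambda e: e["x"] and (e["y"] == e["z"]))
LINK2 = fn(lambda e: e["u"] == (e["t"] and e["x"])
           and e["v"] == (e["t"] and e["z"]))
LINK3 = fn(lambda e: e["x"] == e["u"] and e["z"] == e["v"])


def step(f1, f2, f3, f4):
    """One Jacobi step: conjunction is '&' and disjunction is '|' on sets."""
    new_f2 = exists("t", exists("x", exists("z", LINK2 & f4)))
    new_f3 = exists("u", exists("v", LINK3 & f2))
    return F1, new_f2, new_f3, f1 | f3


FALSE = frozenset()
state = (FALSE, FALSE, FALSE, FALSE)
while step(*state) != state:
    state = step(*state)

f1, f2, f3, f4 = state
print(f4 == fn(lambda e: (e["x"] and e["y"]) == e["z"]))
```

Comparing sets of satisfying assignments tests semantic equality, so two syntactically different formulae such as v ↔ (y∧u) and (u∧y) ↔ v count as the same function.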
37 Combining analyses
- Verifiers and optimisers are often multi-pass, built from several
  separate analyses
- Should the analyses be performed in parallel or in sequence?
- Analyses can interact to improve one another (problem is in the
  complexity of the interaction [Pratt])
38 Pruning combined domains
[Figure: program with the statements x = f(y, z), y = b and z = c,
annotated at points 1 to 5 with pairs from the combined domain]
39 Pruning combined domains
- Suppose that ∝1 ⊆ D1×C and ∝2 ⊆ D2×C, then how is D = D1×D2 interpreted?
- Then ⟨d1,d2⟩ ∝ c iff d1 ∝1 c ∧ d2 ∝2 c
- Ideally, many ⟨d1,d2⟩ ∈ D will be redundant, that is,
  ¬∃c ∈ C . d1 ∝1 c ∧ d2 ∝2 c
40 Time versus precision from [TOPLAS, 17(1):28-44, 1993]

                  Time                        Precision
                  Share   ASub   Share×ASub   Share   ASub   Share×ASub
    serialise      9290    839      1870        235     35       35
    init-subst      569   1250       829          5     72        5
    map-color      4600   1040      5760         76     74       73
    grammar         170    140       269         11     11       11
    browse        51860   1609     49580        196    104      104
    bid            1129   1000      1429         11      0        0
    deriv          2819   2630      3550          0      0        0
    rdtok          5670   4450      6389        185     48       48
    read           8790   8380     11069         11      1        1
    boyer         11040   3949      7709        242     93       93
    peephole      20760   7990     23029        386    310      310
    ann           93509  16789     53269       1935   1690     1690
41 The Galois framework
- Abstract interpretation is classically presented
in terms of Galois connections
42 Lattices: a prelude to Galois connections
- Suppose ⟨S, ⊑⟩ is a poset
- A mapping ⊔: S×S→S is a join (least upper bound) iff
  - a⊔b is an upper bound of a and b, that is, a ⊑ a⊔b and b ⊑ a⊔b for all
    a,b ∈ S
  - a⊔b is the least upper bound, that is, if c ∈ S is an upper bound of a
    and b, then a⊔b ⊑ c
- The definition of the meet ⊓: S×S→S (the greatest lower bound) is
  analogous
43 Complete lattices
- A lattice ⟨S, ⊑, ⊔, ⊓⟩ is a poset ⟨S, ⊑⟩ equipped with a join ⊔ and a
  meet ⊓
- The join concept can often be lifted to sets by defining ⊔: ℘(S)→S iff
  - t ⊑ (⊔T) for all T ⊆ S and for all t ∈ T
  - if t ⊑ s for all t ∈ T then (⊔T) ⊑ s
- If the meet can also be lifted analogously, then the lattice is complete
- A lattice that contains a finite number of elements is always complete
44 A lattice that is not complete
- A hyperplane in 2-d space is a line and in 3-d space is a plane
- A hyperplane in R^n is any space that can be defined by
  {x ∈ R^n | c1x1 + ... + cnxn = c} where c1,...,cn,c ∈ R
- A halfspace in R^n is any space that can be defined by
  {x ∈ R^n | c1x1 + ... + cnxn ≤ c}
- A polyhedron is the intersection of a finite number of half-spaces
45 Examples and non-examples in planar space
46 Join for polyhedra
- Join of polyhedra P1 and P2 in R^n coincides with (the topological
  closure of) the convex hull of P1 ∪ P2
47 The join of an infinite set of polyhedra
- Consider the following infinite chain of regular polyhedra
- The smallest space that contains all these polyhedra is a circle yet this
  is not polyhedral
48 Galois connection example (2 complete lattices)
- The concrete domain ⟨C,⊑C,⊔C,⊓C⟩ is ⟨℘(Z),⊆,∪,∩⟩
- The abstract domain ⟨A,⊑A,⊔A,⊓A⟩ where
  - A = {⊥,+,-,⊤}
  - ⊥ ⊑A a ⊑A ⊤ for all a ∈ A
  - join ⊔A and meet ⊓A are defined by

    ⊔A | ⊥  +  -  ⊤        ⊓A | ⊥  +  -  ⊤
    ---+-----------        ---+-----------
    ⊥  | ⊥  +  -  ⊤        ⊥  | ⊥  ⊥  ⊥  ⊥
    +  | +  +  ⊤  ⊤        +  | ⊥  +  ⊥  +
    -  | -  ⊤  -  ⊤        -  | ⊥  ⊥  -  -
    ⊤  | ⊤  ⊤  ⊤  ⊤        ⊤  | ⊥  +  -  ⊤
49 concretisation mapping
- The concretisation mapping γ: A→C is defined
  - γ(⊥) = ∅
  - γ(+) = {n ∈ Z | n > 0}
  - γ(-) = {n ∈ Z | n < 0}
  - γ(⊤) = Z
- Concretisation spells out how to interpret the symbols in the abstract
  domain
- Observe that γ(⊥) ⊆ γ(+) ⊆ γ(⊤) and more generally γ is required to be
  order-preserving
  - If a1 ⊑A a2 then γ(a1) ⊑C γ(a2)
50 an abstraction mapping
- Since {1,2} ⊆ γ(+) and {1,2} ⊆ γ(⊤) either + or ⊤ can represent {1,2}.
- Thus need a mechanism to map a set to the best abstract object that
  represents it
- The abstraction mapping α: C→A is defined
  - α(S) = ⊥ if S = ∅
  - α(S) = + else if n > 0 for all n ∈ S
  - α(S) = - else if n < 0 for all n ∈ S
  - α(S) = ⊤ otherwise
- Require α to be monotonic, that is, if c1 ⊑C c2 then α(c1) ⊑A α(c2)
51 α can be defined from γ (and vice versa)
- Observe α(S) = ⊓A{a ∈ A | S ⊆ γ(a)}
- As an example consider α({1,2})
  - {1,2} ⊆ γ(⊤)? yes
  - {1,2} ⊆ γ(+)? yes
  - {1,2} ⊆ γ(-)? no
  - {1,2} ⊆ γ(⊥)? no
- Therefore α({1,2}) = ⊓A{+, ⊤} = +
- Dually γ(a) = ∪{S ⊆ Z | α(S) ⊑A a}
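The construction of α from γ can be sketched for the sign domain. Because γ(+), γ(-) and γ(⊤) are infinite, the sketch keeps γ implicit as a membership test and only handles finite sets S; the helper names `in_gamma`, `leq` and `meet_all` are assumptions of the sketch.

```python
BOT, POS, NEG, TOP = "⊥", "+", "-", "⊤"
ELEMENTS = (BOT, POS, NEG, TOP)


def in_gamma(n, a):
    """Membership test n ∈ γ(a); γ is kept implicit since it can be infinite."""
    return {BOT: False, POS: n > 0, NEG: n < 0, TOP: True}[a]


def leq(a, b):
    """The abstract ordering: ⊥ ⊑ a ⊑ ⊤, and every element is below itself."""
    return a == b or a == BOT or b == TOP


def meet_all(candidates):
    """Greatest lower bound of a non-empty collection of abstract elements."""
    lower = [a for a in ELEMENTS if all(leq(a, c) for c in candidates)]
    return next(a for a in lower if all(leq(b, a) for b in lower))


def alpha(s):
    """α(S) = ⊓A{a ∈ A | S ⊆ γ(a)} for a finite set S of integers."""
    return meet_all([a for a in ELEMENTS if all(in_gamma(n, a) for n in s)])


print(alpha({1, 2}), alpha({-3}), alpha({-1, 1}), alpha(set()))
```

For S = {1,2} the candidates are + and ⊤, and their meet is +, which agrees with the worked example.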
52 α requires A to be complete (dually for γ and C)
- Since α(S) = ⊓A{a ∈ A | S ⊆ γ(a)}, meet needs to be defined over possibly
  infinite subsets of A
- Observe that α: ℘(R^2)→A cannot be defined for A = set of planar
  polyhedra
  - Consider c = {⟨x, y⟩ ∈ R^2 | x^2 + y^2 ≤ 1}
  - But ⊓A{a1, a2, a3, ...} is not defined
[Figure: the disc c enclosed by the polyhedra a1, a2 and a3]
53 ⟨A, α, C, γ⟩ is a Galois connection whenever
- ⟨A, ⊑A⟩ and ⟨C, ⊑C⟩ are complete lattices
- The mappings α: C→A and γ: A→C are monotonic, that is,
  - If c1 ⊑C c2 then α(c1) ⊑A α(c2)
  - If a1 ⊑A a2 then γ(a1) ⊑C γ(a2)
- The compositions γ∘α: C→C and α∘γ: A→A are extensive and reductive
  respectively, that is,
  - c ⊑C (γ∘α)(c) for all c ∈ C
  - (α∘γ)(a) ⊑A a for all a ∈ A
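These conditions can be checked mechanically for the sign domain if Z is replaced by a small finite window, which is an assumption made purely so that every subset can be enumerated. The names `UNIVERSE`, `leq`, `gamma` and `alpha` belong to this sketch.

```python
from itertools import combinations

BOT, POS, NEG, TOP = "⊥", "+", "-", "⊤"
ELEMENTS = (BOT, POS, NEG, TOP)
UNIVERSE = range(-2, 3)  # assumption: a finite window standing in for Z


def leq(a, b):
    return a == b or a == BOT or b == TOP


def gamma(a):
    keep = {BOT: lambda n: False, POS: lambda n: n > 0,
            NEG: lambda n: n < 0, TOP: lambda n: True}[a]
    return frozenset(n for n in UNIVERSE if keep(n))


def alpha(s):
    if not s:
        return BOT
    if all(n > 0 for n in s):
        return POS
    if all(n < 0 for n in s):
        return NEG
    return TOP


# Every subset of the window plays the role of a concrete element c.
SUBSETS = [frozenset(c) for k in range(len(UNIVERSE) + 1)
           for c in combinations(UNIVERSE, k)]

extensive = all(c <= gamma(alpha(c)) for c in SUBSETS)
reductive = all(leq(alpha(gamma(a)), a) for a in ELEMENTS)
alpha_monotonic = all(leq(alpha(c1), alpha(c2))
                      for c1 in SUBSETS for c2 in SUBSETS if c1 <= c2)
gamma_monotonic = all(gamma(a1) <= gamma(a2)
                      for a1 in ELEMENTS for a2 in ELEMENTS if leq(a1, a2))

print(extensive, reductive, alpha_monotonic, gamma_monotonic)
```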
54 c ⊑C (γ∘α)(c) is a statement about safe abstractions
- Write c' = (γ∘α)(c)
- If c < c' for some c ∈ C then working in the abstract setting has
  compromised precision
- If c' < c for some c ∈ C then working in the abstract setting has
  compromised correctness
- Bar (γ∘α)(c) <C c for every c ∈ C
- Thus stipulate c ⊑C (γ∘α)(c) for all c ∈ C to guarantee safety
[Figure: α maps c to a, and γ maps a to c']
55 (α∘γ)(a) ⊑A a is a statement about best abstractions
- Recall that γ(a) spells out what a ∈ A represents
- Thus a is one way to describe γ(a); ⊤ is another way to describe γ(a) but
  a is better since a ⊑A ⊤
- Desire α(γ(a)) to be the best way to describe γ(a)
- Therefore require α(γ(a)) ⊑A a
56 Collecting domains and semantics
- Observe that C is not that concrete: programs include operations such as
  *: Z×Z→Z
- C = ℘(Z) is a collecting domain, which is easier to abstract than Z since
  it is already a lattice
- To abstract *: Z×Z→Z, say, we synthesise a collecting version
  *C: ℘(Z)×℘(Z)→℘(Z) and then abstract that
- Put S1 *C S2 = {n1*n2 | n1 ∈ S1 and n2 ∈ S2}
57 Safety and optimality requirements
- The most precise (optimal) way to define *A: A×A→A is to define
  a1 *A a2 = α(γ(a1) *C γ(a2))
- Not practical since γ(a1) and γ(a2) are infinite
- Handcraft a computable *A: A×A→A that is no more precise than the optimal
  definition for all a1,a2 ∈ A
- Merely need to assert α(γ(a1) *C γ(a2)) ⊑A a1 *A a2 for all a1,a2 ∈ A for
  correctness
58 Abstract multiplication
- Consider α(γ(+) *C γ(+)) and + *A +
  - Recall γ(+) = {n ∈ Z | n > 0}, hence
    γ(+) *C γ(+) = {n1*n2 | n1 > 0 and n2 > 0} = {n | n > 0}
  - Hence α(γ(+) *C γ(+)) = + = + *A +
- Since α(γ(+) *C γ(+)) ⊑A + *A + safety follows for this case
- Since + *A + ⊑A α(γ(+) *C γ(+)) optimality follows for this case

    *A | ⊥  +  -  ⊤
    ---+-----------
    ⊥  | ⊥  ⊥  ⊥  ⊥
    +  | ⊥  +  -  ⊤
    -  | ⊥  -  +  ⊤
    ⊤  | ⊥  ⊤  ⊤  ⊤
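The safety requirement for the multiplication table can be spot-checked by comparing it against the collecting multiplication on a finite sample of integers. The sample stands in for Z, so this is evidence rather than a proof, and the function names are assumptions of the sketch.

```python
from itertools import product

BOT, POS, NEG, TOP = "⊥", "+", "-", "⊤"
ELEMENTS = (BOT, POS, NEG, TOP)
SAMPLE = range(-3, 4)  # assumption: a finite sample standing in for Z


def leq(a, b):
    return a == b or a == BOT or b == TOP


def in_gamma(n, a):
    return {BOT: False, POS: n > 0, NEG: n < 0, TOP: True}[a]


def alpha(s):
    if not s:
        return BOT
    if all(n > 0 for n in s):
        return POS
    if all(n < 0 for n in s):
        return NEG
    return TOP


def abstract_mul(a1, a2):
    """The handcrafted table: ⊥ absorbs, then ⊤ absorbs, then rule of signs."""
    if BOT in (a1, a2):
        return BOT
    if TOP in (a1, a2):
        return TOP
    return POS if a1 == a2 else NEG


def collecting_mul(a1, a2):
    """γ(a1) *C γ(a2), restricted to the finite sample."""
    return {n1 * n2 for n1 in SAMPLE for n2 in SAMPLE
            if in_gamma(n1, a1) and in_gamma(n2, a2)}


safe = all(leq(alpha(collecting_mul(a1, a2)), abstract_mul(a1, a2))
           for a1, a2 in product(ELEMENTS, repeat=2))

print(safe, abstract_mul(POS, POS), abstract_mul(NEG, NEG), abstract_mul(POS, NEG))
```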
59 Exotic applications of abstract interpretation
- Recovering programmer intentions for understanding undocumented or
  third-party code
- Verifying that a buffer overrun cannot occur, or pin-pointing where one
  might occur in a C program
- Inferring the environment in which a system of synchronising agents will
  not deadlock
- Lower-bound time-complexity analysis for granularity throttling
- Binding-time analysis for inferring off-line unfolding decisions which
  avoid code-bloat