Introduction to Abstract Interpretation

Introduction to Abstract Interpretation Andy King a.m.king_at_kent.ac.uk http://www.cs.kent.ac.uk/~amk – PowerPoint PPT presentation

1
Introduction to Abstract Interpretation

Andy King
a.m.king_at_kent.ac.uk
http://www.cs.kent.ac.uk/~amk

2
Pointers to the literature

SAS, POPL, ESOP, ICLP, ICFP, ...
Useful review articles and books
Patrick and Radhia Cousot, Comparing the Galois connection and Widening/Narrowing approaches to Abstract Interpretation, PLILP, LNCS 631, 269-295, 1992. Available from LIX library.
Patrick and Radhia Cousot, Abstract Interpretation and Application to Logic Programs, JLP, 13(2-3):103-179, 1992.
Flemming Nielson, Hanne Riis Nielson and Chris Hankin, Principles of Program Analysis, Springer, 1999.
Patrick has a database of abstract interpretation researchers and regularly writes tutorials, see CC02.

3
Applications of abstract interpretation

Verification: can a concurrent program deadlock? Is termination assured?
Parallelisation: are two or more tasks independent? What is the worst/best-case running time of a function?
Transformation: can a definition be unfolded? Will unfolding terminate?
Implementation: can an operation be specialised with knowledge of its (global) calling context?
Applications and players are incredibly diverse

4
Casting out nines algorithm

Which of the following multiplications are correct:
2173 × 38 = 81574 or
2173 × 38 = 82574?
Casting out nines is a checking technique that is really a form of abstract interpretation
Sum the digits in the multiplicand n1, multiplier n2 and the product n to obtain s1, s2 and s.
Divide s1, s2 and s by 9 to compute the remainders, that is, r1 = s1 mod 9, r2 = s2 mod 9 and r = s mod 9.
Calculate r' = (r1 × r2) mod 9
If r ≠ r' then the multiplication is incorrect
The algorithm returns "incorrect" or "don't know"

5
Running the numbers for 2173 × 38 = 81574

Compute r1 = (2+1+7+3) mod 9 = 4
Compute r2 = (3+8) mod 9 = 2
Calculate r = (8+1+5+7+4) mod 9 = 7
Calculate r' = (r1 × r2) mod 9 = 8
Check (r ≠ r'): 7 ≠ 8
Deduce that 2173 × 38 = 81574 is incorrect
6
Abstract interpretation is a theory of
relationships

The computational domain for multiplication (concrete domain)
N: the set of non-negative integers
The computational domain of remainders used in the checking algorithm (abstract domain)
R = {0, 1, ..., 8}
Key question is what is the relationship between an element n∈N which is used in the real algorithm and its analog r∈R in the check

7
What is the relationship?

When multiplicand is n1 = 456, say, then the check uses r1 = (4+5+6) mod 9 = 6
Observe that
456 mod 9
= (4×100 + 56) mod 9
= (4×90 + 4×10 + 56) mod 9
= (4×10 + 56) mod 9
= ((4+5)×10 + 6) mod 9
= ((4+5)×9 + (4+5) + 6) mod 9
= (4+5+6) mod 9
More generally, induction can show r1 = n1 mod 9 and r2 = n2 mod 9

8
Correctness is the preservation of relationships

The check simulates the concrete multiplication and, in effect, is an abstract multiplication
Concrete multiplication is n = n1 × n2
Abstract multiplication is r = (r1 × r2) mod 9
Where r1 describes n1 and r2 describes n2
For brevity, write r ∝ n iff r = n mod 9
Then abstract multiplication preserves ∝ iff whenever r1 ∝ n1 and r2 ∝ n2 it follows that r ∝ n

9
Correctness argument

Suppose r1 ∝ n1 and r2 ∝ n2, where r ∝ n iff r = n mod 9
If
n = n1 × n2 then
n mod 9 = (n1 × n2) mod 9 hence
n mod 9 = ((n1 mod 9) × (n2 mod 9)) mod 9 whence
n mod 9 = (r1 × r2) mod 9 = r therefore
r ∝ n
Consequently if ¬(r ∝ n) then n ≠ n1 × n2
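
The preservation property can also be spot-checked by brute force. The sketch below is not a proof, only an exhaustive test over a small range; the names `describes`, `abstract_mul` and `preserved` are hypothetical helpers introduced for this illustration.

```python
def describes(r: int, n: int) -> bool:
    """The relation between a remainder r and a number n: r = n mod 9."""
    return r == n % 9


def abstract_mul(r1: int, r2: int) -> int:
    """Abstract multiplication on remainders."""
    return (r1 * r2) % 9


def preserved(limit: int) -> bool:
    """Check that abstract_mul preserves the relation for all n1, n2 < limit."""
    for n1 in range(limit):
        for n2 in range(limit):
            r1, r2 = n1 % 9, n2 % 9  # r1 describes n1, r2 describes n2
            if not describes(abstract_mul(r1, r2), n1 * n2):
                return False
    return True


print(preserved(200))  # True
```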

10
Summary

Formalise the relationship between the data
Check that the relationship is preserved by the
abstract analogues of the concrete operations
The relational framework [Acta Informatica, 30(2):103-129, 1993] not only emphasises the theory of relations but is very general

11
Numeric approximation and widening

Abstract interpretation does not require an
abstract domain to be finite

12
Interval approximation

Consider the following Pascal-like program
SYNTOX [PLDI'90] inferred the invariants shown at program points (1) to (4)
Invariants occur between consecutive lines in the program
i∈[0,15] asserts 0≤i≤15 whereas i∈[0,0] means i=0

begin
  i := 0;               (1) i∈[0,0]
  while (i < 16) do     (2) i∈[0,15]
    i := i + 1          (3) i∈[1,16]
end                     (4) i∈[16,16]
13
Compilation versus (classic) interpretation

Abstract compilation: compile the concrete program into an abstract program (equation system) and execute the abstract program
good separation of concerns that aids debugging
the particulars of the domain can be exploited to reorder operations, specialise operations, etc
Abstract interpretation: run the concrete program but on-the-fly interpret its concrete operations as abstract operations
ideal for a generic framework (toolkit) which is parameterised by abstract domain plugins

14
Abstract domain that is used in interval analysis

Domain of intervals includes
[l,u] where l ≤ u and l,u ∈ Z for bounded sets, i.e. [0,5] describes {0,1,4} since {0,1,4} ⊆ [0,5]
⊥ to represent the empty set of numbers, that is, ⊥ describes ∅
[l,∞] for sets which are bounded below such as {l, l+2, l+4, ...}
[-∞,u] to represent sets which are bounded above such as {..., u-5, u-3, u}

15
Weakening intervals
if ... then
  (1) i∈[0,2]
else
  (2) i∈[3,5]
endif
(3) i∈[0,5]

Join (path merge) is defined
Put d1⊔d2 = d1 if d2 = ⊥
          = d2 else if d1 = ⊥
          = [min(l1,l2), max(u1,u2)] otherwise
whenever d1 = [l1,u1] and d2 = [l2,u2]

16
Strengthening intervals

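Meet is defined
Put d1⊓d2 = ⊥ if (d1 = ⊥) ∨ (d2 = ⊥)
          = [max(l1,l2), min(u1,u2)] otherwise
whenever d1 = [l1,u1] and d2 = [l2,u2]
(read the result as ⊥ when max(l1,l2) > min(u1,u2), since the two intervals are then disjoint)

(3) i∈[0,5]
if (2 < i) then
  (4) i∈[3,5]
else
  (5) i∈[0,2]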
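
The join and meet on intervals can be sketched in Python as follows. The representation is an assumption made for this example: `None` stands for ⊥, a pair `(l, u)` stands for [l,u], and `math.inf` plays the role of ∞. The meet also returns ⊥ when the two intervals do not overlap.

```python
import math
from typing import Optional, Tuple

# None stands for bottom; (l, u) stands for the interval [l, u].
Interval = Optional[Tuple[float, float]]


def join(d1: Interval, d2: Interval) -> Interval:
    """Weakening: the smallest interval that covers both arguments."""
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    (l1, u1), (l2, u2) = d1, d2
    return (min(l1, l2), max(u1, u2))


def meet(d1: Interval, d2: Interval) -> Interval:
    """Strengthening: the overlap of both arguments, or bottom if none."""
    if d1 is None or d2 is None:
        return None
    (l1, u1), (l2, u2) = d1, d2
    low, high = max(l1, l2), min(u1, u2)
    return (low, high) if low <= high else None


print(join((0, 2), (3, 5)))          # (0, 5)  path merge at point (3)
print(meet((0, 5), (3, math.inf)))   # (3, 5)  branch taken when 2 < i
print(meet((0, 5), (-math.inf, 2)))  # (0, 2)  branch taken otherwise
```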
17
Meet and join are the basic primitives for
compilation

I1 = [0,0] since program point (1) immediately follows the i := 0
I2 = (I1 ⊔ I3) ⊓ [-∞,15] since
control from program points (1) and (3) flows into (2)
point (2) is reached only if i < 16 holds
I3 = {n+1 | n ∈ I2} since (3) is only reachable from (2) via the increment
I4 = (I1 ⊔ I3) ⊓ [16,∞] since
control from (1) and (3) flows into (4)
point (4) is reached only if ¬(i < 16) holds

18
Interval iteration
I1  ⊥  [0,0]  [0,0]  [0,0]  [0,0]  [0,0]  [0,0]  [0,0]  ...  [0,0]   [0,0]   [0,0]    [0,0]
I2  ⊥  ⊥      [0,0]  [0,0]  [0,1]  [0,1]  [0,2]  [0,2]  ...  [0,15]  [0,15]  [0,15]   [0,15]
I3  ⊥  ⊥      ⊥      [1,1]  [1,1]  [1,2]  [1,2]  [1,3]  ...  [1,15]  [1,16]  [1,16]   [1,16]
I4  ⊥  ⊥      ⊥      ⊥      ⊥      ⊥      ⊥      ⊥      ...  ⊥       ⊥       [16,16]  [16,16]
19
Jacobi versus Gauss-Seidel iteration

With Jacobi, the new vector ⟨I1',I2',I3',I4'⟩ of intervals is calculated from the old ⟨I1,I2,I3,I4⟩
With Gauss-Seidel iteration
I1' is calculated from ⟨I1,I2,I3,I4⟩
I2' is calculated from ⟨I1',I2,I3,I4⟩
I3' is calculated from ⟨I1',I2',I3,I4⟩
I4' is calculated from ⟨I1',I2',I3',I4⟩

I1  ⊥  [0,0]  [0,0]  [0,0]  ...  [0,0]   [0,0]    [0,0]
I2  ⊥  [0,0]  [0,1]  [0,2]  ...  [0,14]  [0,15]   [0,15]
I3  ⊥  [1,1]  [1,2]  [1,3]  ...  [1,15]  [1,16]   [1,16]
I4  ⊥  ⊥      ⊥      ⊥      ...  ⊥       [16,16]  [16,16]
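
The two iteration strategies can be compared with a small Python sketch that solves the four interval equations for the incrementing loop. The encoding (`None` for ⊥, pairs for intervals, the `EQUATIONS` list and the helper names) is an assumption of this example rather than anything prescribed by the slides.

```python
import math

INF = math.inf


def join(d1, d2):
    """Least upper bound of two intervals; None stands for bottom."""
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1, d2):
    """Greatest lower bound; an empty overlap collapses to bottom."""
    if d1 is None or d2 is None:
        return None
    low, high = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (low, high) if low <= high else None


def shift(d, k):
    """Abstract version of adding the constant k."""
    return None if d is None else (d[0] + k, d[1] + k)


# The equations for I1..I4; each one reads the current vector I.
EQUATIONS = [
    lambda I: (0, 0),
    lambda I: meet(join(I[0], I[2]), (-INF, 15)),
    lambda I: shift(I[1], 1),
    lambda I: meet(join(I[0], I[2]), (16, INF)),
]


def jacobi(eqs):
    """Recompute every component from the old vector until nothing changes."""
    I, sweeps = [None] * len(eqs), 0
    while True:
        new = [f(I) for f in eqs]
        if new == I:
            return I, sweeps
        I, sweeps = new, sweeps + 1


def gauss_seidel(eqs):
    """Update components in place so later equations see the new values."""
    I, sweeps = [None] * len(eqs), 0
    while True:
        old = list(I)
        for k, f in enumerate(eqs):
            I[k] = f(I)
        if I == old:
            return I, sweeps
        sweeps += 1


jacobi_result, jacobi_sweeps = jacobi(EQUATIONS)
gs_result, gs_sweeps = gauss_seidel(EQUATIONS)
print(jacobi_result, jacobi_sweeps)
print(gs_result, gs_sweeps)
```

Both strategies reach the same fixpoint; Gauss-Seidel needs fewer sweeps because each equation immediately sees the values computed earlier in the same sweep.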
20
Gauss-Seidel versus chaotic iteration

Observe that I4 might change if either I1 or I3
change, hence evaluate I4 after I1 and I3
stabilise
This suggests waiting until stability is achieved at one level before starting on the next

[Figure: dependency graph over I1, I2, I3 and I4, with I2 and I3 grouped at the same level]
21
Gauss-Seidel versus chaotic iteration

Chaotic iteration can postpone evaluating Ii for a bounded number of iterations
I1' is calculated from ⟨I1,-,-,-⟩
I2' and I3' are calculated Gauss-Seidel style from ⟨I1',I2,I3,-⟩
I4' is calculated from ⟨I1',I2',I3',I4⟩
Fast and (incremental) fixpoint solvers [TOPLAS 22(2):187-223, 2000] apply chaotic iteration

I1  ⊥  [0,0]  [0,0]  [0,0]  ...  [0,0]   [0,0]   [0,0]
I2  ⊥  -      [0,0]  [0,1]  ...  [0,15]  [0,15]  [0,15]
I3  ⊥  -      [1,1]  [1,2]  ...  [1,16]  [1,16]  [1,16]
I4  ⊥  -      -      -      ...  -       -       [16,16]
22
Suppose i was decremented rather than incremented
begin
  i := 0;               (1) i∈[0,0]
  while (i < 16) do     (2) i∈[-∞,0]
    i := i - 1          (3) i∈[-∞,-1]
end                     (4) i∈⊥

I1 = [0,0]
I2 = (I1 ⊔ I3) ⊓ [-∞,15]
I3 = {n-1 | n ∈ I2}
I4 = (I1 ⊔ I3) ⊓ [16,∞]

I1  ⊥  [0,0]  [0,0]  [0,0]    [0,0]    [0,0]    [0,0]
I2  ⊥  -      -      [0,0]    [-1,0]   [-2,0]   ...
I3  ⊥  -      -      [-1,-1]  [-2,-1]  [-3,-1]  ...
I4  ⊥  -      -      -        -        -        -
23
Ascending chain condition

A domain D is ACC iff it does not contain an infinite strictly increasing chain d1 < d2 < d3 < ... where d < d' iff d ⊑ d' and d ≠ d' (see below)
The interval domain D is ordered by
⊥ ⊑ d for all d∈D and
[l1,u1] ⊑ [l2,u2] iff l2 ≤ l1 ≤ u1 ≤ u2
and is not ACC since [0,0] < [-1,0] < [-2,0] < ...

[Figure: Hasse diagram with ⊤ at the top, the points -4, -3, -2, -1, 0, 1, 2, 3, 4 in a row, and ⊥ at the bottom]
24
Some very expressive relational domains are ACC

Sub-expression elimination relies on detecting duplicated expression evaluation
Karr [Acta Informatica, 6, 133-151] noticed that detecting an invariance such as
y (x/2) 6 was key to this optimisation

begin x (2 (z ?w)) - 2 y (z
7) ?w end
25
The affine domain

The domain of affine equations over n variables is
D = {⟨A,B⟩ | A is an m×n dimensional matrix and B is an m dimensional column vector}
D is ordered by
⟨A1,B1⟩ ⊑ ⟨A2,B2⟩ iff (if A1x = B1 then A2x = B2)

26
An affine abstraction

Consider ⟨A,B⟩ where

A = | 1  0  0 |     B = | 1 |
    | 0  1 -2 |         | 0 |

Consider x = ⟨x1,x2,x3⟩^T where Ax = B
Then x1 = 1
Then x2 - 2x3 = 0

begin x1 := 1; x2 := 2*x3 end
27
Pre-orders versus posets

A pre-order ⟨D, ⊑⟩ is a set D ordered by a binary relation ⊑ such that
d ⊑ d for all d∈D
If d1⊑d2 and d2⊑d3 then d1⊑d3
A poset is a pre-order ⟨D, ⊑⟩ such that
If d1⊑d2 and d2⊑d1 then d1 = d2

28
The affine domain is a pre-order (so it is not
ACC)

Observe ⟨A1,B1⟩ ⊑ ⟨A2,B2⟩ and ⟨A2,B2⟩ ⊑ ⟨A1,B1⟩, yet ⟨A1,B1⟩ ≠ ⟨A2,B2⟩

A1 = | 1 0 0 |   B1 = | 1 |     A2 = | 2 0 0 |   B2 = | 2 |
     | 0 1 0 |        | 0 |          | 0 1 0 |        | 0 |
     | 0 0 1 |        | 0 |          | 0 0 1 |        | 0 |

To build a poset from a pre-order
define d ≡ d' iff d ⊑ d' and d' ⊑ d
define [d]≡ = {d'∈D | d ≡ d'} and D≡ = {[d]≡ | d∈D}
define [d]≡ ⊑ [d']≡ iff d ⊑ d'
The poset ⟨D≡, ⊑⟩ is ACC since chain length is bounded by the number of variables n
29
Inducing termination for non-ACC (and huge ACC)
domains

Enforce convergence for intervals with a widening operator ∇: D×D → D
⊥∇d = d
d∇⊥ = d
[l1,u1] ∇ [l2,u2] = [if l2 < l1 then -∞ else l1, if u1 < u2 then ∞ else u1]
Examples
[1,2]∇[1,2] = [1,2]
[1,2]∇[1,3] = [1,∞] but [1,3]∇[1,2] = [1,3]
Safe since [li,ui] ⊑ ([l1,u1]∇[l2,u2]) for i∈{1,2}
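
The widening operator translates almost line for line into Python. As elsewhere in these sketches, `None` is assumed to stand for ⊥ and `math.inf` for ∞; the function name `widen` is chosen for this example.

```python
import math


def widen(d1, d2):
    """Interval widening: any bound that moved outwards jumps to infinity."""
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    (l1, u1), (l2, u2) = d1, d2
    low = -math.inf if l2 < l1 else l1
    high = math.inf if u1 < u2 else u1
    return (low, high)


print(widen((1, 2), (1, 2)))  # (1, 2)
print(widen((1, 2), (1, 3)))  # (1, inf)
print(widen((1, 3), (1, 2)))  # (1, 3)
```

The last two calls show that widening is not symmetric: the first argument is the previous iterate and the second is the new one.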

30
Chaotic iteration with widening

To terminate it is necessary to traverse each loop a finite number of times
It is sufficient to pass through I2 or I3 a finite number of times [Bourdoncle, 1990]
Thus widen at I3 since it is simpler

[Figure: dependency graph over I1, I2, I3 and I4]
31
Termination for the decrement

I1 = [0,0]
I2 = (I1 ⊔ I3) ⊓ [-∞,15]
I3 = I3 ∇ {n-1 | n ∈ I2}   (note the fix)
I4 = (I1 ⊔ I3) ⊓ [16,∞]
When I2 = [-1,0] and I3 = [-1,-1], then
I3 ∇ {n-1 | n ∈ I2} = [-1,-1] ∇ [-2,-1] = [-∞,-1]

I1  ⊥  [0,0]  [0,0]  [0,0]    [0,0]    [0,0]    [0,0]    [0,0]
I2  ⊥  -      -      [0,0]    [-1,0]   [-∞,0]   [-∞,0]   [-∞,0]
I3  ⊥  -      -      [-1,-1]  [-∞,-1]  [-∞,-1]  [-∞,-1]  [-∞,-1]
I4  ⊥  -      -      -        -        -        -        ⊥
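
A compact Python sketch of this iteration follows. It solves the decrementing loop with widening applied at I3, using a simple Gauss-Seidel sweep rather than the exact evaluation order of the table; the representation (`None` for ⊥, pairs for intervals) and the helper names are assumptions of this example.

```python
import math

INF = math.inf


def join(d1, d2):
    """Least upper bound of two intervals; None stands for bottom."""
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1, d2):
    """Greatest lower bound; an empty overlap collapses to bottom."""
    if d1 is None or d2 is None:
        return None
    low, high = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (low, high) if low <= high else None


def widen(d1, d2):
    """Interval widening: unstable bounds jump to infinity."""
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    return (-INF if d2[0] < d1[0] else d1[0],
            INF if d1[1] < d2[1] else d1[1])


def shift(d, k):
    """Abstract version of adding the constant k."""
    return None if d is None else (d[0] + k, d[1] + k)


def solve_decrement():
    """Iterate the four equations, widening at I3, until stable."""
    I1 = I2 = I3 = I4 = None
    while True:
        old = (I1, I2, I3, I4)
        I1 = (0, 0)
        I2 = meet(join(I1, I3), (-INF, 15))
        I3 = widen(I3, shift(I2, -1))  # the widening point
        I4 = meet(join(I1, I3), (16, INF))
        if (I1, I2, I3, I4) == old:
            return old


result = solve_decrement()
print(result)  # ((0, 0), (-inf, 0), (-inf, -1), None)
```

Without the call to `widen` the loop would keep producing [-1,0], [-2,0], [-3,0] and so on, and would never return.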
32
(Malicious) research challenge

Read a survey paper to find an abstract domain that is ACC but has a maximal chain length of O(2^n)
Construct a program with O(n) symbols that iterates through all O(2^n) abstractions
Publish the program in IPL

33
Are numeric domains convex?

A set S⊆R^n is convex iff for all x,y∈S it follows that {λx + (1-λ)y | 0≤λ≤1} ⊆ S
The 2 leftmost sets in R^2 are convex but the 2 rightmost sets are not
Intervals and affine systems are convex

34
Arithmetic congruences are not convex

Elements of the arithmetic congruence (AC) domain take the form x + 2y = 1 (mod 3) which describes integral values of x and y
More exactly, the AC domain consists of conjunctions of equations of the form c1x1 + ... + cmxm = (c mod n) where ci,c∈Z and n∈N
Incredibly AC is ACC [IJCM, 30, 165-190, 1989]

35
Research challenge

Søndergaard [FSTTCS'95] introduced the concept of an immediate fixpoint
Consider the following (groundness) dependency equations over the domain of Boolean functions ⟨Bool, ⊨, ∨⟩
f1 = x ∧ (y ↔ z)
f2 = ∃t(∃x(∃z(u ↔ (t∧x) ∧ v ↔ (t∧z) ∧ f4)))
f3 = ∃u(∃v(x ↔ u ∧ z ↔ v ∧ f2))
f4 = f1 ∨ f3
Where ∃x(f) = f[x ↦ true] ∨ f[x ↦ false] thus ∃x(x↔y) = true and ∃x(x∧y) = y
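
Existential quantification as defined here, a disjunction of the two instantiations of x, can be sketched in Python with Boolean functions represented as callables over an environment dictionary. That representation, and the names `exists` and `equivalent`, are assumptions of this example; the two identities checked are standard facts about Boolean functions.

```python
from itertools import product


def exists(x, f):
    """Project x out of f: f with x set to True, or f with x set to False."""
    return lambda env: f({**env, x: True}) or f({**env, x: False})


def equivalent(f, g, variables):
    """Compare two Boolean functions on every assignment to the variables."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if f(env) != g(env):
            return False
    return True


def x_and_y(env):
    return env["x"] and env["y"]


def x_iff_y(env):
    return env["x"] == env["y"]


def just_y(env):
    return env["y"]


def always_true(env):
    return True


# Projecting x out of (x and y) leaves y.
print(equivalent(exists("x", x_and_y), just_y, ["x", "y"]))       # True
# Projecting x out of (x iff y) leaves true.
print(equivalent(exists("x", x_iff_y), always_true, ["x", "y"]))  # True
```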

36
The alternative tactic

The standard tactic is to apply iteration
Søndergaard found that the system can be solved
symbolically (like a quadratic)
This would be very useful for infinite domains
for improved precision and predictability

f1  false  x∧(y↔z)  x∧(y↔z)  x∧(y↔z)  x∧(y↔z)
f2  false  false    false    v↔(y∧u)  (u∧y)↔v
f3  false  false    false    false    (x∧y)↔z
f4  false  false    x∧(y↔z)  x∧(y↔z)  (x∧y)↔z
37
Combining analyses

Verifiers and optimisers are often multi-pass,
built from several separate analyses
Should the analyses be performed in parallel or
in sequence?
Analyses can interact to improve one another (problem is in the complexity of the interaction [Pratt])

38
Pruning combined domains
[Figure: program fragment with statements y = b, z = c and x = f(y, z), annotated with pairs of abstractions at points 1 to 5]
39
Pruning combined domains

Suppose that ∝1 ⊆ D1×C and ∝2 ⊆ D2×C, then how is D = D1×D2 interpreted?
Then ⟨d1,d2⟩ ∝ c iff d1 ∝1 c ∧ d2 ∝2 c
Ideally, many ⟨d1,d2⟩∈D will be redundant, that is, ¬∃c∈C . d1 ∝1 c ∧ d2 ∝2 c

40
Time versus precision from TOPLAS 17(1):28-44, 1993

             Time                          Precision
             Share   ASub    Share×ASub    Share  ASub  Share×ASub
serialise     9290     839     1870          235    35      35
init-subst     569    1250      829            5    72       5
map-color     4600    1040     5760           76    74      73
grammar        170     140      269           11    11      11
browse       51860    1609    49580          196   104     104
bid           1129    1000     1429           11     0       0
deriv         2819    2630     3550            0     0       0
rdtok         5670    4450     6389          185    48      48
read          8790    8380    11069           11     1       1
boyer        11040    3949     7709          242    93      93
peephole     20760    7990    23029          386   310     310
ann          93509   16789    53269         1935  1690    1690
41
The Galois framework

Abstract interpretation is classically presented
in terms of Galois connections

42
Lattices a prelude to Galois connections

Suppose ⟨S, ⊑⟩ is a poset
A mapping ⊔: S×S→S is a join (least upper bound) iff
a⊔b is an upper bound of a and b, that is, a ⊑ a⊔b and b ⊑ a⊔b for all a,b∈S
a⊔b is the least upper bound, that is, if c∈S is an upper bound of a and b, then a⊔b ⊑ c
The definition of the meet ⊓: S×S→S (the greatest lower bound) is analogous

43
Complete lattices

A lattice ⟨S, ⊑, ⊔, ⊓⟩ is a poset ⟨S, ⊑⟩ equipped with a join ⊔ and a meet ⊓
The join concept can often be lifted to sets by defining ⊔: ℘(S)→S iff
t ⊑ (⊔T) for all T⊆S and for all t∈T
if t ⊑ s for all t∈T then (⊔T) ⊑ s
If the meet can also be lifted analogously, then the lattice is complete
A lattice that contains a finite number of elements is always complete

44
A lattice that is not complete

A hyperplane in 2-d space is a line and in 3-d space is a plane
A hyperplane in R^n is any space that can be defined by {x∈R^n | c1x1 + ... + cnxn = c} where c1,...,cn,c∈R
A halfspace in R^n is any space that can be defined by {x∈R^n | c1x1 + ... + cnxn ≤ c}
A polyhedron is the intersection of a finite number of half-spaces

45
Examples and non-examples in planar space
46
Join for polyhedra

Join of polyhedra P1 and P2 in R^n coincides with (the topological closure of) the convex hull of P1∪P2

47
The join of an infinite set of polyhedra

Consider the following infinite chain of regular
polyhedra
The only space that contains all these polyhedra
is a circle yet this is not polyhedral

48
Galois connection example (2 complete lattices)

The concrete domain ⟨C,⊑C,⊔C,⊓C⟩ is ⟨℘(Z),⊆,∪,∩⟩
The abstract domain ⟨A,⊑A,⊔A,⊓A⟩ where
A = {⊥,+,-,⊤}
⊥ ⊑A a ⊑A ⊤ for all a∈A
join ⊔A and meet ⊓A are defined by

⊔A | ⊥  +  -  ⊤
 ⊥ | ⊥  +  -  ⊤
 + | +  +  ⊤  ⊤
 - | -  ⊤  -  ⊤
 ⊤ | ⊤  ⊤  ⊤  ⊤

⊓A | ⊥  +  -  ⊤
 ⊥ | ⊥  ⊥  ⊥  ⊥
 + | ⊥  +  ⊥  +
 - | ⊥  ⊥  -  -
 ⊤ | ⊥  +  -  ⊤
49
concretisation mapping

The concretisation mapping γ: A→C is defined
γ(⊥) = ∅
γ(+) = {n∈Z | n > 0}
γ(-) = {n∈Z | n < 0}
γ(⊤) = Z
Concretisation spells out how to interpret the symbols in the abstract domain
Observe that γ(⊥) ⊆ γ(+) ⊆ γ(⊤) and more generally γ is required to be order-preserving
If a1 ⊑A a2 then γ(a1) ⊑C γ(a2)

50
an abstraction mapping

Since {1,2} ⊆ γ(+) and {1,2} ⊆ γ(⊤) either + or ⊤ can represent {1,2}.
Thus need a mechanism to map a set to the best abstract object that represents it
The abstraction mapping α: C→A is defined
α(S) = ⊥ if S = ∅
α(S) = + else if n > 0 for all n∈S
α(S) = - else if n < 0 for all n∈S
α(S) = ⊤ otherwise
Require α to be monotonic, that is, if c1 ⊑C c2 then α(c1) ⊑A α(c2)
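
The sign domain and its abstraction mapping can be sketched in Python. Because the concretisation of + or - is an infinite set, this example represents it by a membership test `in_gamma` instead of building the set; the string constants and all function names are choices made for this illustration.

```python
BOT, POS, NEG, TOP = "bot", "+", "-", "top"


def alpha(s):
    """Best abstract element that describes the finite set of integers s."""
    if not s:
        return BOT
    if all(n > 0 for n in s):
        return POS
    if all(n < 0 for n in s):
        return NEG
    return TOP


def in_gamma(n, a):
    """Membership test for the concretisation of a."""
    if a == BOT:
        return False
    if a == POS:
        return n > 0
    if a == NEG:
        return n < 0
    return True  # top describes every integer


def leq(a1, a2):
    """The abstract ordering: bot below everything, top above everything."""
    return a1 == BOT or a2 == TOP or a1 == a2


print(alpha({1, 2}))    # +
print(alpha({-3, 4}))   # top
print(alpha(set()))     # bot

# Every element of a set is described by the abstraction of that set.
samples = [set(), {1, 2}, {-5}, {0}, {-3, 4}]
print(all(in_gamma(n, alpha(s)) for s in samples for n in s))  # True
```

Note that 0 is neither positive nor negative, so `alpha({0})` is ⊤ in this domain.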

51
α can be defined from γ (and vice versa)

Observe α(S) = ⊓A{a∈A | S ⊆ γ(a)}
As an example consider α({1,2})
{1,2} ⊆ γ(⊤) holds
{1,2} ⊆ γ(+) holds
{1,2} ⊆ γ(-) fails
{1,2} ⊆ γ(⊥) fails
Therefore α({1,2}) = ⊓A{+, ⊤} = +
Dually γ(a) = ∪{S⊆Z | α(S) ⊑A a}

52
α requires A to be complete (dually for γ and C)

Since α(S) = ⊓A{a∈A | S ⊆ γ(a)}, meet needs to be defined over possibly infinite subsets of A
Observe that α: ℘(R^2)→A cannot be defined for A = set of planar polyhedra
Consider c = {⟨x,y⟩∈R^2 | x^2 + y^2 ≤ 1}
But ⊓A{a1, a2, a3, ...} is not defined

[Figure: the disc c enclosed by the polyhedra a1, a2, a3]
53
⟨A, α, C, γ⟩ is Galois connection whenever

⟨A, ⊑A⟩ and ⟨C, ⊑C⟩ are complete lattices
The mappings α: C→A and γ: A→C are monotonic, that is,
If c1 ⊑C c2 then α(c1) ⊑A α(c2)
If a1 ⊑A a2 then γ(a1) ⊑C γ(a2)
The compositions γ∘α: C→C and α∘γ: A→A are extensive and reductive respectively, that is,
c ⊑C (γ∘α)(c) for all c∈C
(α∘γ)(a) ⊑A a for all a∈A

54
c ⊑C (γ∘α)(c) is a statement about safe abstractions

Write c' = (γ∘α)(c)
If c < c' for some c∈C then working in the abstract setting has compromised precision
If c' < c for some c∈C then working in the abstract setting has compromised correctness
Bar (γ∘α)(c) <C c for every c∈C
Thus stipulate c ⊑C (γ∘α)(c) for all c∈C to guarantee safety

[Figure: α maps c to a, and γ maps a back to c']
55
(α∘γ)(a) ⊑A a is a statement about best abstractions

Recall that γ(a) spells out what a∈A represents
Thus a is one way to describe γ(a); ⊤ is another way to describe γ(a) but a is better since a ⊑A ⊤
Desire α(γ(a)) to be the best way to describe γ(a)
Therefore require α(γ(a)) ⊑A a

56
Collecting domains and semantics

Observe that C is not that concrete: programs include operations such as *: Z×Z→Z
C = ℘(Z) is a collecting domain which is easier to abstract than Z since it is already a lattice
To abstract *: Z×Z→Z, say, we synthesise a collecting version *C: ℘(Z)×℘(Z)→℘(Z) and then abstract that
Put S1 *C S2 = {n1*n2 | n1∈S1 and n2∈S2}

57
Safety and optimality requirements

The most precise (optimal) way to define *A: A×A→A is to define a1 *A a2 = α(γ(a1) *C γ(a2))
Not practical since γ(a1) and γ(a2) are infinite
Handcraft a computable *A': A×A→A with a1 *A a2 ⊑A a1 *A' a2 for all a1,a2∈A
Merely need to assert α(γ(a1) *C γ(a2)) ⊑A a1 *A' a2 for all a1,a2∈A for correctness

58
Abstract multiplication

Consider α(γ(+) *C γ(+)) and + *A +
Recall γ(+) = {n∈Z | n > 0}, hence γ(+) *C γ(+) = {n1*n2 | n1 > 0 and n2 > 0} = {n | n > 0}
Hence α(γ(+) *C γ(+)) = + = + *A +

Since α(γ(+) *C γ(+)) ⊑A + *A + safety follows for this case
Since + *A + ⊑A α(γ(+) *C γ(+)) optimality follows for this case

*A | ⊥  +  -  ⊤
 ⊥ | ⊥  ⊥  ⊥  ⊥
 + | ⊥  +  -  ⊤
 - | ⊥  -  +  ⊤
 ⊤ | ⊥  ⊤  ⊤  ⊤
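
The safety requirement for abstract multiplication can be spot-checked on finite samples with a short Python sketch. It is a test over a handful of sets, not a proof; the string constants, the helper names and the choice of sample sets are all assumptions of this example, while `abs_mul` encodes the multiplication table on this slide.

```python
BOT, POS, NEG, TOP = "bot", "+", "-", "top"


def alpha(s):
    """Best sign that describes the finite set of integers s."""
    if not s:
        return BOT
    if all(n > 0 for n in s):
        return POS
    if all(n < 0 for n in s):
        return NEG
    return TOP


def leq(a1, a2):
    """The abstract ordering on signs."""
    return a1 == BOT or a2 == TOP or a1 == a2


def collecting_mul(s1, s2):
    """Collecting version of multiplication on sets of integers."""
    return {n1 * n2 for n1 in s1 for n2 in s2}


def abs_mul(a1, a2):
    """Abstract multiplication following the table for signs."""
    if a1 == BOT or a2 == BOT:
        return BOT
    if a1 == TOP or a2 == TOP:
        return TOP
    return POS if a1 == a2 else NEG


def safe_on(samples):
    """Check alpha(S1 * S2) is below abs_mul(alpha(S1), alpha(S2))."""
    return all(
        leq(alpha(collecting_mul(s1, s2)), abs_mul(alpha(s1), alpha(s2)))
        for s1 in samples
        for s2 in samples
    )


samples = [set(), {1, 2}, {-5}, {0}, {-3, 4}, {7}, {-1, -2}]
print(abs_mul(POS, POS))  # +
print(abs_mul(NEG, NEG))  # +
print(safe_on(samples))   # True
```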
59
Exotic applications of abstract interpretation

Recovering programmer intentions for
understanding undocumented or third-party code
Verifying that a buffer overrun cannot occur, or pin-pointing where one might occur in a C program
Inferring the environment in which a system of synchronising agents will not deadlock
Lower-bound time-complexity analysis for
granularity throttling
Binding-time analysis for inferring off-line
unfolding decisions which avoid code-bloat