Introduction to Abstract Interpretation - PowerPoint PPT Presentation

About This Presentation
Title:

Introduction to Abstract Interpretation

Description:

Introduction to Abstract Interpretation Andy King a.m.king_at_kent.ac.uk http://www.cs.kent.ac.uk/~amk – PowerPoint PPT presentation

Slides: 60
Provided by: Ulf59

Transcript and Presenter's Notes


1
Introduction to Abstract Interpretation
  • Andy King
  • a.m.king_at_kent.ac.uk
  • http://www.cs.kent.ac.uk/~amk

2
Pointers to the literature
  • SAS, POPL, ESOP, ICLP, ICFP, …
  • Useful review articles and books
  • Patrick and Radhia Cousot, Comparing the Galois
    connection and Widening/Narrowing approaches to
    Abstract Interpretation, PLILP, LNCS 631,
    269-295, 1992. Available from LIX library.
  • Patrick and Radhia Cousot, Abstract
    interpretation and Application to Logic Programs,
    JLP, 13(2-3):103-179, 1992.
  • Flemming Nielson, Hanne Riis Nielson and Chris
    Hankin, Principles of Program Analysis, Springer,
    1999.
  • Patrick has a database of abstract interpretation
    researchers and regularly writes tutorials; see
    CC02.

3
Applications of abstract interpretation
  • Verification: can a concurrent program deadlock?
    Is termination assured?
  • Parallelisation: are two or more tasks
    independent? What is the worst/best-case running
    time of a function?
  • Transformation: can a definition be unfolded?
    Will unfolding terminate?
  • Implementation: can an operation be specialised
    with knowledge of its (global) calling context?
  • Applications and players are incredibly diverse

4
Casting out nines algorithm
  • Which of the following multiplications is
    correct?
  • 2173 × 38 = 81574 or
  • 2173 × 38 = 82574
  • Casting out nines is a checking technique that is
    really a form of abstract interpretation
  • Sum the digits in the multiplicand n1, multiplier
    n2 and the product n to obtain s1, s2 and s.
  • Divide s1, s2 and s by 9 to compute the
    remainders, that is, r1 = s1 mod 9, r2 = s2 mod 9
    and r = s mod 9.
  • Calculate r′ = (r1 × r2) mod 9
  • If r ≠ r′ then the multiplication is incorrect
  • The algorithm returns "incorrect" or "don't know"

5
Running the numbers for 2173 × 38 = 81574
  • Compute r1 = (2+1+7+3) mod 9 = 4
  • Compute r2 = (3+8) mod 9 = 2
  • Calculate r = (8+1+5+7+4) mod 9 = 7
  • Calculate r′ = (r1 × r2) mod 9 = 8
  • Check (r ≠ r′)
  • Deduce that 2173 × 38 = 81574 is incorrect
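
The check can be sketched in a few lines of Python. This is only an illustration of the steps listed on the previous slide; the function names digit_sum_mod9 and cast_out_nines are made up for the example and do not come from the slides.

```python
def digit_sum_mod9(n: int) -> int:
    """Abstract a non-negative integer to its digit sum modulo 9."""
    return sum(int(d) for d in str(n)) % 9


def cast_out_nines(n1: int, n2: int, n: int) -> str:
    """Check the claim n1 * n2 == n; the answer is 'incorrect' or "don't know"."""
    r1, r2, r = digit_sum_mod9(n1), digit_sum_mod9(n2), digit_sum_mod9(n)
    r_prime = (r1 * r2) % 9  # abstract multiplication on remainders
    return "incorrect" if r != r_prime else "don't know"


print(cast_out_nines(2173, 38, 81574))  # remainders disagree
print(cast_out_nines(2173, 38, 82574))  # remainders agree, so no verdict
```

Note that the check never answers "correct": agreeing remainders only mean that the abstraction cannot tell the claimed product apart from the true one.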

6
Abstract interpretation is a theory of
relationships
  • The computational domain for multiplication
    (concrete domain)
  • N, the set of non-negative integers
  • The computational domain of remainders used in
    the checking algorithm (abstract domain)
  • R = {0, 1, …, 8}
  • Key question is what is the relationship between
    an element n ∈ N which is used in the real
    algorithm and its analogue r ∈ R in the check

7
What is the relationship?
  • When the multiplicand is n1 = 456, say, then the
    check uses r1 = (4+5+6) mod 9 = 6
  • Observe that
  • 456 mod 9
  • = (4×100 + 56) mod 9
  • = (4×90 + 4×10 + 56) mod 9
  • = (4×10 + 56) mod 9
  • = ((4 + 5)×10 + 6) mod 9
  • = ((4 + 5)×9 + (4 + 5) + 6) mod 9
  • = (4 + 5 + 6) mod 9
  • More generally, induction can show r1 = n1 mod 9
    and r2 = n2 mod 9

8
Correctness is the preservation of relationships
  • The check simulates the concrete multiplication
    and, in effect, is an abstract multiplication
  • Concrete multiplication is n = n1 × n2
  • Abstract multiplication is r = (r1 × r2) mod 9
  • Where r1 describes n1 and r2 describes n2
  • For brevity, write r ∝ n iff r = n mod 9
  • Then abstract multiplication preserves ∝ iff
    whenever r1 ∝ n1 and r2 ∝ n2 it follows that r ∝
    n

9
Correctness argument
  • Suppose r1 ∝ n1 and r2 ∝ n2
  • If
  • n = n1 × n2 then
  • n mod 9 = (n1 × n2) mod 9 hence
  • n mod 9 = ((n1 mod 9) × (n2 mod 9)) mod 9 whence
  • n mod 9 = (r1 × r2) mod 9 = r therefore
  • r ∝ n
  • Consequently if ¬(r ∝ n) then n ≠ n1 × n2
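
The preservation property argued above can also be spot-checked by brute force over a small range. The sketch below is not a proof and is not taken from the slides; describes, abstract_mul and preserved are illustrative names, and the bound passed to preserved is arbitrary.

```python
def describes(r: int, n: int) -> bool:
    """The description relation: r describes n iff r = n mod 9."""
    return r == n % 9


def abstract_mul(r1: int, r2: int) -> int:
    """Abstract multiplication on remainders."""
    return (r1 * r2) % 9


def preserved(limit: int) -> bool:
    """Check that abstract_mul preserves the relation for all n1, n2 < limit."""
    return all(
        describes(abstract_mul(n1 % 9, n2 % 9), n1 * n2)
        for n1 in range(limit)
        for n2 in range(limit)
    )


print(preserved(200))
```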

10
Summary
  • Formalise the relationship between the data
  • Check that the relationship is preserved by the
    abstract analogues of the concrete operations
  • The relational framework [Acta Informatica,
    30(2):103-129, 1993] not only emphasises the theory
    of relations but is very general

11
Numeric approximation and widening
  • Abstract interpretation does not require an
    abstract domain to be finite

12
Interval approximation
  • Consider the following Pascal-like program
  • SYNTOX [PLDI90] inferred the invariants scoped
    within {...}
  • Invariants occur between consecutive lines in the
    program
  • i ∈ [0,15] asserts 0 ≤ i ≤ 15 whereas i ∈ [0,0] means i = 0

begin
  i := 0;
  (1) {i ∈ [0,0]}
  while (i < 16) do
    (2) {i ∈ [0,15]}
    i := i + 1
    (3) {i ∈ [1,16]}
  end
  (4) {i ∈ [16,16]}
13
Compilation versus (classic) interpretation
  • Abstract compilation: compile the concrete
    program into an abstract program (equation
    system) and execute the abstract program
  • good separation of concerns that aids debugging
  • the particulars of the domain can be exploited to
    reorder operations, specialise operations, etc
  • Abstract interpretation: run the concrete
    program but on-the-fly interpret its concrete
    operations as abstract operations
  • ideal for a generic framework (toolkit) which is
    parameterised by abstract domain plugins

14
Abstract domain that is used in interval analysis
  • Domain of intervals includes
  • [l,u] where l ≤ u and l,u ∈ Z for bounded sets, i.e.
    [0,5] describes {0,1,4} since {0,1,4} ⊆ [0,5]
  • ⊥ to represent the empty set of numbers, that is,
    ⊥ describes ∅
  • [l,∞] for sets which are bounded below such as
    {l, l+2, l+4, …}
  • [-∞,u] to represent sets which are bounded above
    such as {…, u-5, u-3, u}

15
Weakening intervals
if ... then
  (1) {i ∈ [0,2]}
else
  (2) {i ∈ [3,5]}
endif
(3) {i ∈ [0,5]}
  • Join (path merge) is defined
  • Put d1 ⊔ d2 = d1 if d2 = ⊥
  • d2 else if d1 = ⊥
  • [min(l1,l2), max(u1,u2)]
    otherwise
  • whenever d1 = [l1,u1] and d2 = [l2,u2]

16
Strengthening intervals
  • Meet is defined
  • Put d1 ⊓ d2 = ⊥ if (d1 = ⊥) ∨ (d2 = ⊥)
  • [max(l1,l2), min(u1,u2)] otherwise
  • whenever d1 = [l1,u1] and d2 = [l2,u2]

(3) {i ∈ [0,5]}
if (2 < i) then
  (4) {i ∈ [3,5]}
else
  (5) {i ∈ [0,2]}
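
Join and meet on intervals are easy to prototype. The sketch below is one possible encoding, not the one used by any particular analyser: ⊥ is represented by None, an interval by a pair, and infinite bounds by float("inf"). One assumption goes beyond the slide: when max(l1,l2) exceeds min(u1,u2) the meet is normalised to ⊥, since the resulting pair would describe no numbers.

```python
INF = float("inf")  # stands for ∞; -INF stands for -∞


def join(d1, d2):
    """Least upper bound (path merge); None encodes ⊥."""
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    (l1, u1), (l2, u2) = d1, d2
    return (min(l1, l2), max(u1, u2))


def meet(d1, d2):
    """Greatest lower bound; an empty intersection is normalised to ⊥."""
    if d1 is None or d2 is None:
        return None
    (l1, u1), (l2, u2) = d1, d2
    l, u = max(l1, l2), min(u1, u2)
    return (l, u) if l <= u else None


print(join((0, 2), (3, 5)))      # merge of the two branches of the conditional
print(meet((0, 5), (3, INF)))    # then-branch of the test 2 < i
print(meet((0, 5), (-INF, 2)))   # else-branch of the test 2 < i
```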
17
Meet and join are the basic primitives for
compilation
  • I1 = [0,0] since program point (1) immediately
    follows the i := 0
  • I2 = (I1 ⊔ I3) ⊓ [-∞,15] since
  • control from program points (1) and (3) flows
    into (2)
  • point (2) is reached only if i < 16 holds
  • I3 = {n+1 | n ∈ I2} since (3) is only reachable
    from (2) via the increment
  • I4 = (I1 ⊔ I3) ⊓ [16,∞] since
  • control from (1) and (3) flows into (4)
  • point (4) is reached only if ¬(i < 16) holds

18
Interval iteration
I1  ⊥  [0,0]  [0,0]  [0,0]  [0,0]  [0,0]  [0,0]  [0,0]
I2  ⊥  ⊥      [0,0]  [0,0]  [0,1]  [0,1]  [0,2]  [0,2]
I3  ⊥  ⊥      ⊥      [1,1]  [1,1]  [1,2]  [1,2]  [1,3]
I4  ⊥  ⊥      ⊥      ⊥      ⊥      ⊥      ⊥      ⊥

I1  [0,0]   [0,0]   [0,0]    [0,0]
I2  [0,15]  [0,15]  [0,15]   [0,15]
I3  [1,15]  [1,16]  [1,16]   [1,16]
I4  ⊥       ⊥       [16,16]  [16,16]
19
Jacobi versus Gauss-Seidel iteration
  • With Jacobi, the new vector ⟨I1′,I2′,I3′,I4′⟩ of
    intervals is calculated from the old
    ⟨I1,I2,I3,I4⟩
  • With Gauss-Seidel iteration
  • I1′ is calculated from ⟨I1,I2,I3,I4⟩
  • I2′ is calculated from ⟨I1′,I2,I3,I4⟩
  • I3′ is calculated from ⟨I1′,I2′,I3,I4⟩
  • I4′ is calculated from ⟨I1′,I2′,I3′,I4⟩

I1  ⊥  [0,0]  [0,0]  [0,0]  …  [0,0]   [0,0]    [0,0]
I2  ⊥  [0,0]  [0,1]  [0,2]  …  [0,14]  [0,15]   [0,15]
I3  ⊥  [1,1]  [1,2]  [1,3]  …  [1,15]  [1,16]   [1,16]
I4  ⊥  ⊥      ⊥      ⊥      …  ⊥       [16,16]  [16,16]
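
The two iteration strategies can be compared on the equation system for the incrementing loop. The sketch below is an assumption-laden illustration rather than the implementation behind the slides: ⊥ is None, intervals are pairs, an empty meet is normalised to ⊥, and the names EQUATIONS, jacobi and gauss_seidel are invented for the example.

```python
INF = float("inf")


def join(d1, d2):
    if d1 is None or d2 is None:
        return d1 if d2 is None else d2
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1, d2):
    if d1 is None or d2 is None:
        return None
    l, u = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (l, u) if l <= u else None  # empty intersection becomes ⊥


def shift(d, k):
    """Abstract version of adding the constant k."""
    return None if d is None else (d[0] + k, d[1] + k)


# One right-hand side per program point; I[0]..I[3] hold I1..I4.
EQUATIONS = [
    lambda I: (0, 0),
    lambda I: meet(join(I[0], I[2]), (-INF, 15)),
    lambda I: shift(I[1], 1),
    lambda I: meet(join(I[0], I[2]), (16, INF)),
]


def jacobi():
    """Every new component is computed from the old vector."""
    I, steps = [None] * 4, 0
    while True:
        new = [f(I) for f in EQUATIONS]
        if new == I:
            return I, steps
        I, steps = new, steps + 1


def gauss_seidel():
    """Each component is updated in place and reused within the same sweep."""
    I, sweeps = [None] * 4, 0
    while True:
        old = list(I)
        for k, f in enumerate(EQUATIONS):
            I[k] = f(I)
        if I == old:
            return I, sweeps
        sweeps += 1


print(jacobi())
print(gauss_seidel())
```

Both strategies reach the same fixpoint; Gauss-Seidel needs fewer passes because each sweep already sees the values computed earlier in that sweep.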
20
Gauss-Seidel versus chaotic iteration
  • Observe that I4 might change if either I1 or I3
    change, hence evaluate I4 after I1 and I3
    stabilise
  • This suggests waiting until stability is achieved at
    one level before starting on the next

[Figure: dependency graphs over I1, I2, I3 and I4, with I2 and I3 grouped together]
21
Gauss-Seidel versus chaotic iteration
  • Chaotic iteration can postpone evaluating Ii for a
    bounded number of iterations
  • I1′ is calculated from ⟨I1,-,-,-⟩
  • I2′ and I3′ are calculated Gauss-Seidel style
    from ⟨I1′,I2,I3,-⟩
  • I4′ is calculated from ⟨I1′,I2′,I3′,I4⟩
  • Fast and (incremental) fixpoint solvers [TOPLAS
    22(2):187-223, 2000] apply chaotic iteration

I1  ⊥  [0,0]  [0,0]  [0,0]  …  [0,0]   [0,0]   [0,0]
I2  ⊥  -      [0,0]  [0,1]  …  [0,15]  [0,15]  [0,15]
I3  ⊥  -      [1,1]  [1,2]  …  [1,16]  [1,16]  [1,16]
I4  ⊥  -      -      -      …  -       -       [16,16]
22
Suppose i was decremented rather than incremented
begin
  i := 0;
  (1) {i ∈ [0,0]}
  while (i < 16) do
    (2) {i ∈ [-∞,0]}
    i := i - 1
    (3) {i ∈ [-∞,-1]}
  end
  (4) {i ∈ ⊥}
  • I1 = [0,0]
  • I2 = (I1 ⊔ I3) ⊓ [-∞,15]
  • I3 = {n-1 | n ∈ I2}
  • I4 = (I1 ⊔ I3) ⊓ [16,∞]

I1  ⊥  [0,0]  [0,0]  [0,0]    [0,0]    [0,0]    [0,0]
I2  ⊥  -      -      [0,0]    [-1,0]   [-2,0]   …
I3  ⊥  -      -      [-1,-1]  [-2,-1]  [-3,-1]  …
I4  ⊥  -      -      -        -        -        -
23
Ascending chain condition
  • A domain D is ACC iff it does not contain an
    infinite strictly increasing chain d1 < d2 < d3 < …
    where d < d′ iff d ⊑ d′ and d ≠ d′ (see below)
  • The interval domain D is ordered by
  • ⊥ ⊑ d for all d ∈ D and
  • [l1,u1] ⊑ [l2,u2] iff l2 ≤ l1 ≤ u1 ≤ u2
  • and is not ACC since [0,0] < [-1,0] < [-2,0] < …

[Figure: Hasse diagram with ⊤ at the top, the integers -4, -3, -2, -1, 0, 1, 2, 3, 4 in the middle row and ⊥ at the bottom]
24
Some very expressive relational domains are ACC
  • The sub-expression elimination relies on
    detecting duplicated expression evaluation
  • Karr [Acta Informatica, 6, 133-151] noticed that
    detecting an invariance such as
  • y = (x/2) + 6 was key to this optimisation

begin x (2 (z ?w)) - 2 y (z
7) ?w end
25
The affine domain
  • The domain of affine equations over n variables
    is
  • D = {⟨A,B⟩ | A is an m×n dimensional matrix and
  • B is an m dimensional column vector}
  • D is ordered by
  • ⟨A1,B1⟩ ⊑ ⟨A2,B2⟩ iff (if A1x = B1 then A2x = B2)

26
An affine abstraction
  • Consider ⟨A,B⟩ where

        A = | 1  0   0 |      B = | 1 |
            | 0  1  -2 |          | 0 |

  • Consider x = ⟨x1,x2,x3⟩^T where Ax = B
  • Then x1 = 1
  • Then x2 - 2x3 = 0

begin x1 := 1; x2 := 2*x3 end
27
Pre-orders versus posets
  • A pre-order ⟨D, ⊑⟩ is a set D ordered by a binary
    relation ⊑ such that
  • d ⊑ d for all d ∈ D
  • If d1 ⊑ d2 and d2 ⊑ d3 then d1 ⊑ d3
  • A poset is a pre-order ⟨D, ⊑⟩ such that
  • If d1 ⊑ d2 and d2 ⊑ d1 then d1 = d2

28
The affine domain is a pre-order (so it is not
ACC)
  • Observe ⟨A1,B1⟩ ⊑ ⟨A2,B2⟩ but also ⟨A2,B2⟩ ⊑ ⟨A1,B1⟩,
    even though ⟨A1,B1⟩ ≠ ⟨A2,B2⟩, where

        A1 = | 1 0 0 |   B1 = | 1 |   A2 = | 2 0 0 |   B2 = | 2 |
             | 0 1 0 |        | 0 |        | 0 1 0 |        | 0 |
             | 0 0 1 |        | 0 |        | 0 0 1 |        | 0 |

  • To build a poset from a pre-order
  • define d ≡ d′ iff d ⊑ d′ and d′ ⊑ d
  • define [d]≡ = {d′ ∈ D | d ≡ d′} and D≡ = {[d]≡ | d ∈ D}
  • define [d]≡ ⊑ [d′]≡ iff d ⊑ d′
  • The poset ⟨D≡, ⊑⟩ is ACC since chain length is
    bounded by the number of variables n
29
Inducing termination for non-ACC (and huge ACC)
domains
  • Enforce convergence for intervals with a widening
    operator ∇: D×D → D
  • ⊥ ∇ d = d
  • d ∇ ⊥ = d
  • [l1,u1] ∇ [l2,u2] = [if l2 < l1 then -∞ else l1,
  • if u1 < u2 then ∞ else u1]
  • Examples
  • [1,2] ∇ [1,2] = [1,2]
  • [1,2] ∇ [1,3] = [1,∞] but [1,3] ∇ [1,2] = [1,3]
  • Safe since [li,ui] ⊑ ([l1,u1] ∇ [l2,u2]) for i ∈ {1,2}
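
The widening above translates almost literally into code. In this sketch ⊥ is encoded as None and the infinite bounds as float("inf"); both encodings, and the name widen, are choices made for the example.

```python
INF = float("inf")


def widen(d1, d2):
    """Interval widening: any bound that grows is pushed straight to infinity."""
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    (l1, u1), (l2, u2) = d1, d2
    return (-INF if l2 < l1 else l1, INF if u1 < u2 else u1)


print(widen((1, 2), (1, 2)))  # nothing grows, result is unchanged
print(widen((1, 2), (1, 3)))  # upper bound grows, so it jumps to infinity
print(widen((1, 3), (1, 2)))  # not symmetric: the second argument shrank
```

The asymmetry in the last two calls is deliberate: the first argument is the previous iterate and the second is the newly computed value.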

30
Chaotic iteration with widening
  • To terminate it is necessary to traverse each
    loop a finite number of times
  • It is sufficient to pass through I2 or I3 a
    finite number of times [Bourdoncle, 1990]
  • Thus widen at I3 since it is simpler

[Figure: graph over the nodes I1, I2, I3 and I4]
31
Termination for the decrement
  • I1 = [0,0]
  • I2 = (I1 ⊔ I3) ⊓ [-∞,15]
  • I3 = I3 ∇ {n-1 | n ∈ I2} (note the fix)
  • I4 = (I1 ⊔ I3) ⊓ [16,∞]
  • When I2 = [-1,0] and I3 = [-1,-1], then
  • I3 ∇ {n-1 | n ∈ I2} = [-1,-1] ∇ [-2,-1] = [-∞,-1]

I1  ⊥  [0,0]  [0,0]  [0,0]    [0,0]    [0,0]    [0,0]    [0,0]
I2  ⊥  -      -      [0,0]    [-1,0]   [-∞,0]   [-∞,0]   [-∞,0]
I3  ⊥  -      -      [-1,-1]  [-∞,-1]  [-∞,-1]  [-∞,-1]  [-∞,-1]
I4  ⊥  -      -      -        -        -        -        ⊥
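
A small solver shows the widened system for the decrementing loop terminating. This is a sketch under assumptions: it sweeps all four equations Gauss-Seidel style instead of postponing I4 as the table does, it encodes ⊥ as None, and it normalises an empty meet to ⊥. The name solve_decrement is invented for the example.

```python
INF = float("inf")


def join(d1, d2):
    if d1 is None or d2 is None:
        return d1 if d2 is None else d2
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1, d2):
    if d1 is None or d2 is None:
        return None
    l, u = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (l, u) if l <= u else None  # empty intersection becomes ⊥


def widen(d1, d2):
    if d1 is None or d2 is None:
        return d1 if d2 is None else d2
    return (-INF if d2[0] < d1[0] else d1[0], INF if d1[1] < d2[1] else d1[1])


def shift(d, k):
    return None if d is None else (d[0] + k, d[1] + k)


def solve_decrement():
    """Iterate the equations, widening at I3, until nothing changes."""
    I1 = I2 = I3 = I4 = None
    while True:
        old = (I1, I2, I3, I4)
        I1 = (0, 0)
        I2 = meet(join(I1, I3), (-INF, 15))
        I3 = widen(I3, shift(I2, -1))  # the widening point
        I4 = meet(join(I1, I3), (16, INF))
        if (I1, I2, I3, I4) == old:
            return old


print(solve_decrement())
```

Replacing the widen call with a plain shift(I2, -1) would make the loop run forever, which is the non-termination the ascending chain discussion warns about.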
32
(Malicious) research challenge
  • Read a survey paper to find an abstract domain
    that is ACC but has a maximal chain length of
    O(2^n)
  • Construct a program with O(n) symbols that
    iterates through all O(2^n) abstractions
  • Publish the program in IPL

33
Are numeric domains convex?
  • A set S ⊆ R^n is convex iff for all x,y ∈ S it follows
    that {λx + (1-λ)y | 0 ≤ λ ≤ 1} ⊆ S
  • The 2 leftmost sets in R^2 are convex but the 2
    rightmost sets are not
  • Intervals and affine systems are convex

34
Arithmetic congruences are not convex
  • Elements of the arithmetic congruence (AC) domain
    take the form x + 2y = 1 (mod 3) which describes
    integral values of x and y
  • More exactly, the AC domain consists of
    conjunctions of equations of the form
  • c1x1 + … + cmxm = (c mod n) where ci,c ∈ Z and n ∈ N
  • Incredibly AC is ACC [IJCM, 30, 165-190, 1989]

35
Research challenge
  • Søndergaard [FSTTCS'95] introduced the concept of
    an immediate fixpoint
  • Consider the following (groundness) dependency
    equations over the domain of Boolean functions
    ⟨Bool, ∨, ∧⟩
  • f1 = x ∧ (y ↔ z)
  • f2 = ∃t(∃x(∃z(u ↔ (t∧x) ∧ v ↔ (t∧z) ∧ f4)))
  • f3 = ∃u(∃v(x ↔ u ∧ z ↔ v ∧ f2))
  • f4 = f1 ∨ f3
  • Where ∃x(f) = f[x ↦ true] ∨ f[x ↦ false], thus ∃x(x∨y) =
    true and ∃x(x∧y) = y
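
Existential quantification of a Boolean function can be prototyped directly from its definition as a disjunction of the two cofactors. The sketch below represents a Boolean function as a Python callable over an environment dictionary; that representation and the helper names exists and equivalent are assumptions made for the example, and the two functions being quantified are chosen here purely for illustration.

```python
from itertools import product


def exists(x, f):
    """Eliminate x from f: the disjunction of f with x true and f with x false."""
    return lambda env: f({**env, x: True}) or f({**env, x: False})


def equivalent(f, g, variables):
    """Compare two Boolean functions on every assignment to the given variables."""
    return all(
        f(env) == g(env)
        for values in product([False, True], repeat=len(variables))
        for env in [dict(zip(variables, values))]
    )


x_or_y = lambda env: env["x"] or env["y"]
x_and_y = lambda env: env["x"] and env["y"]

# Eliminating x from a disjunction leaves true; from a conjunction it leaves y.
print(equivalent(exists("x", x_or_y), lambda env: True, ["y"]))
print(equivalent(exists("x", x_and_y), lambda env: env["y"], ["y"]))
```

Truth-table comparison is exponential in the number of variables, so this only suits small examples; practical analysers typically use a canonical representation such as BDDs.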

36
The alternative tactic
  • The standard tactic is to apply iteration
  • Søndergaard found that the system can be solved
    symbolically (like a quadratic)
  • This would be very useful for infinite domains
    for improved precision and predictability

f1  false  x ∧ (y↔z)  x ∧ (y↔z)  x ∧ (y↔z)  x ∧ (y↔z)
f2  false  false      false      v ↔ (y∧u)  (u∧y) ↔ v
f3  false  false      false      false      (x∧y) ↔ z
f4  false  false      x ∧ (y↔z)  x ∧ (y↔z)  (x∧y) ↔ z
37
Combining analyses
  • Verifiers and optimisers are often multi-pass,
    built from several separate analyses
  • Should the analyses be performed in parallel or
    in sequence?
  • Analyses can interact to improve one another
    (the problem lies in the complexity of the
    interaction [Pratt])

38
Pruning combined domains
2 ?x, y?
y b
x f(y, z)
1 ?x, true?
4 ?x, y?z?
5 ?x,x, y, x, z,y, z, (x?(y?z))? (y?z)?
z c
3 ?x, z?
39
Pruning combined domains
  • Suppose that ∝1 ⊆ D1×C and ∝2 ⊆ D2×C, then how is
    D = D1×D2 interpreted?
  • Then ⟨d1,d2⟩ ∝ c iff d1 ∝1 c ∧ d2 ∝2 c
  • Ideally, many ⟨d1,d2⟩ ∈ D will be redundant, that
    is, ¬∃c ∈ C . d1 ∝1 c ∧ d2 ∝2 c

40
Time versus precision from TOPLAS
17(1):28-44, 1993
                      Time                        Precision
             Share   ASub  Share×ASub     Share   ASub  Share×ASub
serialise     9290    839        1870       235     35          35
init-subst     569   1250         829         5     72           5
map-color     4600   1040        5760        76     74          73
grammar        170    140         269        11     11          11
browse       51860   1609       49580       196    104         104
bid           1129   1000        1429        11      0           0
deriv         2819   2630        3550         0      0           0
rdtok         5670   4450        6389       185     48          48
read          8790   8380       11069        11      1           1
boyer        11040   3949        7709       242     93          93
peephole     20760   7990       23029       386    310         310
ann          93509  16789       53269      1935   1690        1690
41
The Galois framework
  • Abstract interpretation is classically presented
    in terms of Galois connections

42
Lattices: a prelude to Galois connections
  • Suppose ⟨S, ⊑⟩ is a poset
  • A mapping ⊔: S×S→S is a join (least upper bound)
    iff
  • a⊔b is an upper bound of a and b, that is, a ⊑ a⊔b
    and b ⊑ a⊔b for all a,b ∈ S
  • a⊔b is the least upper bound, that is, if c ∈ S is
    an upper bound of a and b, then a⊔b ⊑ c
  • The definition of the meet ⊓: S×S→S (the greatest
    lower bound) is analogous

43
Complete lattices
  • A lattice ⟨S, ⊑, ⊔, ⊓⟩ is a poset ⟨S, ⊑⟩ equipped
    with a join ⊔ and a meet ⊓
  • The join concept can often be lifted to sets by
    defining ⊔: ℘(S)→S iff
  • t ⊑ (⊔T) for all T ⊆ S and for all t ∈ T
  • if t ⊑ s for all t ∈ T then (⊔T) ⊑ s
  • If the meet can also be lifted analogously, then the
    lattice is complete
  • A lattice that contains a finite number of
    elements is always complete

44
A lattice that is not complete
  • A hyperplane in 2-d space is a line and in 3-d
    space is a plane
  • A hyperplane in R^n is any space that can be
    defined by {x ∈ R^n | c1x1 + … + cnxn = c} where
    c1,…,cn,c ∈ R
  • A halfspace in R^n is any space that can be
    defined by {x ∈ R^n | c1x1 + … + cnxn ≤ c}
  • A polyhedron is the intersection of a finite
    number of half-spaces

45
Examples and non-examples in planar space
46
Join for polyhedra
  • Join of polyhedra P1 and P2 in R^n coincides with
    (the topological closure of) the convex hull of
    P1 ∪ P2

47
The join of an infinite set of polyhedra
  • Consider the following infinite chain of regular
    polyhedra
  • The smallest space that contains all these
    polyhedra is a circle, yet this is not polyhedral

48
Galois connection example (2 complete lattices)
  • The concrete domain ⟨C,⊑C,⊔C,⊓C⟩ is ⟨℘(Z),⊆,∪,∩⟩
  • The abstract domain ⟨A,⊑A,⊔A,⊓A⟩ where
  • A = {⊥, +, -, ⊤}
  • ⊥ ⊑A a ⊑A ⊤ for all a ∈ A
  • join ⊔A and meet ⊓A are defined by

⊔A  ⊥  +  -  ⊤
⊥   ⊥  +  -  ⊤
+   +  +  ⊤  ⊤
-   -  ⊤  -  ⊤
⊤   ⊤  ⊤  ⊤  ⊤

⊓A  ⊥  +  -  ⊤
⊥   ⊥  ⊥  ⊥  ⊥
+   ⊥  +  ⊥  +
-   ⊥  ⊥  -  -
⊤   ⊥  +  -  ⊤
49
concretisation mapping
  • The concretisation mapping γ: A→C is defined
  • γ(⊥) = Ø
  • γ(+) = {n ∈ Z | n > 0}
  • γ(-) = {n ∈ Z | n < 0}
  • γ(⊤) = Z
  • Concretisation spells out how to interpret the
    symbols in the abstract domain
  • Observe that γ(⊥) ⊆ γ(+) ⊆ γ(⊤) and more generally γ
    is required to be order-preserving
  • If a1 ⊑A a2 then γ(a1) ⊑C γ(a2)

50
an abstraction mapping
  • Since {1,2} ⊆ γ(+) and {1,2} ⊆ γ(⊤) either + or ⊤ can
    represent {1,2}.
  • Thus need a mechanism to map a set to the best
    abstract object that represents it
  • The abstraction mapping α: C→A is defined
  • α(S) = ⊥ if S = Ø
  • α(S) = + else if n > 0 for all n ∈ S
  • α(S) = - else if n < 0 for all n ∈ S
  • α(S) = ⊤ otherwise
  • Require α to be monotonic, that is, if c1 ⊑C c2
    then α(c1) ⊑A α(c2)

51
α can be defined from γ (and vice versa)
  • Observe α(S) = ⊓A{a ∈ A | S ⊆ γ(a)}
  • As an example consider α({1,2})
  • {1,2} ⊆ γ(⊤) holds
  • {1,2} ⊆ γ(+) holds
  • {1,2} ⊆ γ(-) fails
  • {1,2} ⊆ γ(⊥) fails
  • Therefore α({1,2}) = ⊓A{+, ⊤} = +
  • Dually γ(a) = ∪{S ⊆ Z | α(S) ⊑A a}
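
Deriving the abstraction from the concretisation can be tried out on the four-element sign domain. The sketch below is illustrative only: since a concretisation may be an infinite set, it is modelled as a membership test (gamma_contains), the abstract elements are plain strings, and all names are invented for the example. It assumes the input set S is finite.

```python
BOT, POS, NEG, TOP = "⊥", "+", "-", "⊤"
A = [BOT, POS, NEG, TOP]


def leq(a, b):
    """The ordering on A: ⊥ is below everything and ⊤ is above everything."""
    return a == BOT or b == TOP or a == b


def gamma_contains(a, n):
    """Membership test for the concretisation of a."""
    return {BOT: False, POS: n > 0, NEG: n < 0, TOP: True}[a]


def meet_all(elems):
    """Greatest lower bound of a non-empty collection of abstract elements."""
    lower = [a for a in A if all(leq(a, e) for e in elems)]
    return next(a for a in lower if all(leq(b, a) for b in lower))


def alpha(S):
    """Best abstraction of S: the meet of every element whose concretisation covers S."""
    candidates = [a for a in A if all(gamma_contains(a, n) for n in S)]
    return meet_all(candidates)


print(alpha({1, 2}))
print(alpha({-1, 1}))
print(alpha(set()))
```

Because ⊤ always covers S, the candidate list is never empty, so the meet is always taken over at least one element.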

52
α requires A to be complete (dually for γ and C)
  • Since α(S) = ⊓A{a ∈ A | S ⊆ γ(a)}, meet needs to be
    defined over possibly infinite subsets of A
  • Observe that α: ℘(R^2)→A cannot be defined for A =
    set of planar polyhedra
  • Consider c = {⟨x, y⟩ ∈ R^2 | x^2 + y^2 ≤ 1}
  • But ⊓A{a1, a2, a3, …} is not defined

[Figure: the set c together with the polyhedra a1, a2 and a3]
53
⟨A, α, C, γ⟩ is a Galois connection whenever
  • ⟨A, ⊑A⟩ and ⟨C, ⊑C⟩ are complete lattices
  • The mappings α: C→A and γ: A→C are monotonic, that
    is,
  • If c1 ⊑C c2 then α(c1) ⊑A α(c2)
  • If a1 ⊑A a2 then γ(a1) ⊑C γ(a2)
  • The compositions γ∘α: C→C and α∘γ: A→A are
    extensive and reductive respectively, that is,
  • c ⊑C (γ∘α)(c) for all c ∈ C
  • (α∘γ)(a) ⊑A a for all a ∈ A

54
c ⊑C (γ∘α)(c) is a statement about safe
abstractions
  • If c <C c′ for some c ∈ C, where c′ = (γ∘α)(c),
    then working in the abstract setting has
    compromised precision
  • If c′ <C c for some c ∈ C then working in the
    abstract setting has compromised correctness
  • Bar (γ∘α)(c) <C c for every c ∈ C
  • Thus stipulate c ⊑C (γ∘α)(c) for all c ∈ C to
    guarantee safety

[Figure: α maps c to a and γ maps a back to c′]
55
(α∘γ)(a) ⊑A a is a statement about best
abstractions
  • Recall that γ(a) spells out what a ∈ A represents
  • Thus a is one way to describe γ(a); ⊤ is another
    way to describe γ(a), but a is better since a ⊑A ⊤
  • Desire α(γ(a)) to be the best way to describe
    γ(a)
  • Therefore require α(γ(a)) ⊑A a

56
Collecting domains and semantics
  • Observe that C is not that concrete: programs
    include operations such as *: Z×Z→Z
  • C = ℘(Z) is a collecting domain which is easier to
    abstract than Z since it is already a lattice
  • To abstract *: Z×Z→Z, say, we synthesise a
    collecting version *C: ℘(Z)×℘(Z)→℘(Z) and then
    abstract that
  • Put S1 *C S2 = {n1*n2 | n1 ∈ S1 and n2 ∈ S2}

57
Safety and optimality requirements
  • The most precise (optimal) way to define *A: A×A→A
    is to define a1 *A a2 = α(γ(a1) *C γ(a2))
  • Not practical since γ(a1) and γ(a2) can be infinite
  • Handcraft a computable *′A: A×A→A with a1 *A a2 ⊑A
    a1 *′A a2 for all a1,a2 ∈ A
  • Merely need to assert α(γ(a1) *C γ(a2)) ⊑A a1 *′A
    a2 for all a1,a2 ∈ A for correctness

58
Abstract multiplication
  • Consider α(γ(+) *C γ(+)) and + *A +
  • Recall γ(+) = {n ∈ Z | n > 0}, hence γ(+) *C γ(+) =
    {n1*n2 | n1 > 0 and n2 > 0} = {n | n > 0}
  • Hence α(γ(+) *C γ(+)) = + = + *A +
  • Since α(γ(+) *C γ(+)) ⊑A + *A + safety follows for
    this case
  • Since + *A + = α(γ(+) *C γ(+)) optimality follows
    for this case

*A  ⊥  +  -  ⊤
⊥   ⊥  ⊥  ⊥  ⊥
+   ⊥  +  -  ⊤
-   ⊥  -  +  ⊤
⊤   ⊥  ⊤  ⊤  ⊤
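
The safety requirement for abstract multiplication on signs can be spot-checked mechanically. Concretisations are infinite, so the sketch below only samples each one over a small range (the bound is arbitrary); it is evidence, not a proof. The string encodings and the names abs_mul, sample and safe_on_samples are assumptions made for the example.

```python
BOT, POS, NEG, TOP = "bot", "+", "-", "top"
A = [BOT, POS, NEG, TOP]


def leq(a, b):
    return a == BOT or b == TOP or a == b


def alpha(S):
    """Abstraction of a finite set of integers, following the case definition."""
    if not S:
        return BOT
    if all(n > 0 for n in S):
        return POS
    if all(n < 0 for n in S):
        return NEG
    return TOP


def sample(a, bound=5):
    """Finite sample of the concretisation of a (the real set may be infinite)."""
    inside = {BOT: lambda n: False, POS: lambda n: n > 0,
              NEG: lambda n: n < 0, TOP: lambda n: True}[a]
    return {n for n in range(-bound, bound + 1) if inside(n)}


def abs_mul(a1, a2):
    """Abstract multiplication, transcribed from the table."""
    if BOT in (a1, a2):
        return BOT
    if TOP in (a1, a2):
        return TOP
    return POS if a1 == a2 else NEG


def safe_on_samples():
    """Check that abstracting the sampled concrete products never exceeds abs_mul."""
    return all(
        leq(alpha({n1 * n2 for n1 in sample(a1) for n2 in sample(a2)}),
            abs_mul(a1, a2))
        for a1 in A
        for a2 in A
    )


print(safe_on_samples())
```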
59
Exotic applications of abstract interpretation
  • Recovering programmer intentions for
    understanding undocumented or third-party code
  • Verifying that a buffer overflow cannot occur, or
    pin-pointing where one might occur in a C program
  • Inferring the environment in which a system of
    synchronising agents will not deadlock
  • Lower-bound time-complexity analysis for
    granularity throttling
  • Binding-time analysis for inferring off-line
    unfolding decisions which avoid code-bloat