Title: Basic abstract interpretation theory
1 Basic abstract interpretation theory
2 The general idea
- a semantics
  - any definition style, from a denotational definition to a detailed interpreter
  - assigning meanings to programs on a suitable concrete domain (concrete computations domain)
- an abstract domain modeling some properties of concrete computations and forgetting about the remaining information (abstract computations domain)
- we derive an abstract semantics, which allows us to execute the program on the abstract domain to compute its abstract meaning, i.e., the modeled property
3 Concrete and Abstract Domains
- two complete partial orders
  - the partial orders reflect precision
  - smaller is better
- concrete domain (P(C), ⊆, ∅, C, ∪, ∩)
  - has the structure of a powerset
  - we will see later why
- abstract domain (A, ≤, bottom, top, lub, glb)
  - each abstract value is a description of a set of concrete values
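4 The Sign Abstract Domain
- concrete domain (P(Z), ⊆, ∅, Z, ∪, ∩)
  - sets of integers
- abstract domain (Sign, ≤, bot, top, lub, glb)
  - Sign = {bot, -, 0, +, 0-, 0+, top}, where 0- describes the non-positive integers and 0+ the non-negative integers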
5 Concretization
- concrete domain (P(C), ⊆, ∅, C, ∪, ∩)
- abstract domain (A, ≤, bottom, top, lub, glb)
- the meaning of abstract values is defined by a concretization function
  - γ : A → P(C)
  - ∀a ∈ A, γ(a) is the set of concrete computations described by a
  - that's why the concrete domain needs to be a powerset
- the concretization function must be monotonic
  - ∀a1,a2 ∈ A, a1 ≤ a2 implies γ(a1) ⊆ γ(a2)
  - concretization preserves relative precision
6 Abstraction
- concrete domain (P(C), ⊆, ∅, C, ∪, ∩)
- abstract domain (A, ≤, bottom, top, lub, glb)
- every element of P(C) should have a unique best (most precise) description in A
  - this is possible if and only if A is a Moore family
  - closed under glb
- in such a case, we can define an abstraction function
  - α : P(C) → A
  - ∀c ∈ P(C), α(c) is the best abstract description of c
- the abstraction function must be monotonic
  - ∀c1,c2 ∈ P(C), c1 ⊆ c2 implies α(c1) ≤ α(c2)
  - abstraction preserves relative precision
7 The example of Sign
- γSign(x) =
  - ∅, if x = bot
  - {y | y > 0}, if x = +
  - {y | y ≥ 0}, if x = 0+
  - {0}, if x = 0
  - {y | y ≤ 0}, if x = 0-
  - {y | y < 0}, if x = -
  - Z, if x = top
- αSign(y) = glb of
  - bot, if y = ∅
  - -, if y ⊆ {z | z < 0}
  - 0-, if y ⊆ {z | z ≤ 0}
  - 0, if y = {0}
  - 0+, if y ⊆ {z | z ≥ 0}
  - +, if y ⊆ {z | z > 0}
  - top, if y ⊆ Z
8 Galois connection
- (P(C), ⊆, ∅, C, ∪, ∩)
- (A, ≤, bottom, top, lub, glb)
- γ : A → P(C) (concretization)
- α : P(C) → A (abstraction)
- α, γ monotonic
- there may be loss of information (approximation) in describing an element of P(C) by an element of A
- Galois connection (insertion)
  - ∀c ∈ P(C). c ⊆ γ(α(c))
  - ∀a ∈ A. α(γ(a)) ≤ a (insertion: ∀a ∈ A. α(γ(a)) = a)
- α, γ mutually determine each other
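The claim that the two functions determine each other can be made explicit. For monotonic functions, the two conditions above are equivalent to the standard adjunction property in the first line below, from which each function can be recovered from the other (second and third lines). The symbols are those of the slide; writing glb for the meet of a set of abstract values is a notational assumption.

```latex
\[
\begin{aligned}
&\forall c \in P(C),\ \forall a \in A:\quad \alpha(c) \le a \iff c \subseteq \gamma(a) \\
&\alpha(c) = \mathrm{glb}\,\{\, a \in A \mid c \subseteq \gamma(a) \,\} \\
&\gamma(a) = \bigcup \{\, c \in P(C) \mid \alpha(c) \le a \,\}
\end{aligned}
\]
```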
9 Concrete semantics
- the concrete semantics is defined as the least (or greatest) fixpoint of a concrete semantic evaluation function F defined on the domain C
  - this does not necessarily mean that the semantic definition style is denotational!
- F is defined in terms of primitive semantic operations fi on C
- the abstract semantic evaluation function is obtained by replacing in F each concrete operation fi by a suitable abstract operation
- however, since the actual concrete domain is P(C), we need first to lift the concrete semantics lfp F to a collecting semantics defined on P(C)
10 Collecting semantics
- lifting lfp F to the powerset (to get the collecting semantics) is simply a conceptual operation
  - collecting semantics = {lfp F}
  - we don't need to define a brand new collecting semantic evaluation function on P(C)
  - we just need to reason in terms of liftings of all the primitive operations (and of F), while designing the abstract operations and establishing their properties
- in the following, by abuse of notation, we will use the same notation for the standard and the collecting (conceptually lifted) operations
11 Abstract operations: local correctness
- an abstract operator fi^α defined on A is locally correct wrt a concrete operator fi if
  - ∀x1,..,xn ∈ P(C). fi(x1,..,xn) ⊆ γ(fi^α(α(x1),..,α(xn)))
  - the concrete computation step is more precise than the concretization of the corresponding abstract computation step
- a very weak requirement, which is satisfied, for example, by an abstract operator which always computes the worst abstract value top
- the real issue in the design of abstract operations is therefore precision
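The weakness of local correctness can be seen on Sign. The sketch below tests the inclusion on a handful of sample sets for multiplication lifted to sets of integers. It is a sampling check, not a proof, and it assumes an encoding of Sign values as sets of elementary signs; `always_top` and `always_pos` are hypothetical operators introduced only for the comparison.

```python
from itertools import product

# Assumed encoding: a Sign element is the set of elementary signs it allows.
BOT, NEG, ZERO, POS = frozenset(), frozenset("-"), frozenset("0"), frozenset("+")
NONPOS, NONNEG, TOP = frozenset("-0"), frozenset("0+"), frozenset("-0+")
SIGN = [BOT, NEG, ZERO, POS, NONPOS, NONNEG, TOP]


def sign_of(n):
    return "-" if n < 0 else "0" if n == 0 else "+"


def alpha(c):
    """Best Sign description of a finite set of integers."""
    signs = frozenset(sign_of(n) for n in c)
    return min((a for a in SIGN if signs <= a), key=len)


def in_gamma(n, a):
    """Membership test for gamma(a)."""
    return sign_of(n) in a


def times(x, y):
    """Concrete multiplication lifted to sets of integers."""
    return {m * n for m in x for n in y}


def locally_correct(abs_op, samples):
    """Check times(x, y) within gamma(abs_op(alpha(x), alpha(y))) on the samples."""
    return all(in_gamma(n, abs_op(alpha(x), alpha(y)))
               for x, y in product(samples, repeat=2)
               for n in times(x, y))


def always_top(a, b):
    """Locally correct, but carries no information."""
    return TOP


def always_pos(a, b):
    """Deliberately wrong: claims every product is positive."""
    return POS


samples = [{0}, {2}, {-3}, {0, 5}, {-1, 1}]
```

On these samples `locally_correct(always_top, samples)` holds, while `locally_correct(always_pos, samples)` fails, for example on `{2}` and `{-3}`.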
12 Abstract operations: optimality and completeness
- correctness
  - ∀x1,..,xn ∈ P(C). fi(x1,..,xn) ⊆ γ(fi^α(α(x1),..,α(xn)))
- optimality
  - ∀y1,..,yn ∈ A. fi^α(y1,..,yn) = α(fi(γ(y1),..,γ(yn)))
  - the most precise abstract operator fi^α correct wrt fi
  - a theoretical bound and basis for the design, rather than an implementable definition
- completeness (exactness or absolute precision)
  - ∀x1,..,xn ∈ P(C). α(fi(x1,..,xn)) = fi^α(α(x1),..,α(xn))
  - no loss of information: the abstraction of the concrete computation step is exactly the same as the result of the corresponding abstract computation step
13 Abstract operations on Sign: TimesSign
14 Abstract operations on Sign: PlusSign
15 The Sign example
- Times and Plus are the usual operations lifted to P(Z)
- both TimesSign and PlusSign are optimal (hence correct)
- TimesSign is also complete (no approximation)
- PlusSign is necessarily incomplete
  - αSign(Times({2},{-3})) = TimesSign(αSign({2}),αSign({-3}))
  - αSign(Plus({2},{-3})) ≤ PlusSign(αSign({2}),αSign({-3}))
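The two closing relations can be checked by evaluating both sides, assuming the arguments denote the singleton sets {2} and {-3} and using the optimal operators stated above. For the product both sides coincide. For the sum the abstract result is strictly less precise, because a positive plus a negative integer can have any sign.

```latex
\[
\begin{aligned}
\alpha_{Sign}(Times(\{2\},\{-3\})) &= \alpha_{Sign}(\{-6\}) = {-} \\
Times_{Sign}(\alpha_{Sign}(\{2\}),\alpha_{Sign}(\{-3\})) &= Times_{Sign}(+,-) = {-} \\[4pt]
\alpha_{Sign}(Plus(\{2\},\{-3\})) &= \alpha_{Sign}(\{-1\}) = {-} \\
Plus_{Sign}(\alpha_{Sign}(\{2\}),\alpha_{Sign}(\{-3\})) &= Plus_{Sign}(+,-) = top
\end{aligned}
\]
```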
16 From local to global correctness
- the composition of locally correct abstract operations is locally correct wrt the composition of concrete operations
  - composition does not preserve optimality, i.e., the composition of optimal operators may be less precise than the optimal abstract version of the composition
- if we obtain F^α (abstract semantic evaluation function) by replacing in F every concrete semantic operation by a corresponding (locally correct) abstract operation, the local correctness property still holds
  - ∀x ∈ P(C). F(x) ⊆ γ(F^α(α(x)))
- local correctness implies global correctness, i.e., correctness of the abstract semantics wrt the concrete one
  - lfp F ⊆ γ(lfp F^α), gfp F ⊆ γ(gfp F^α)
  - α(lfp F) ≤ lfp F^α, α(gfp F) ≤ gfp F^α
- the abstraction of the concrete semantics is more precise than the abstract semantics
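A small worked case on Sign shows why composition does not preserve optimality. The operations succ (add one) and pred (subtract one), lifted to sets of integers, are hypothetical examples introduced here and do not appear in the slides. Their optimal abstract versions are obtained from the optimality formula.

```latex
\[
\begin{aligned}
succ^{\alpha}(0) &= \alpha(succ(\gamma(0))) = \alpha(\{1\}) = + \\
pred^{\alpha}(+) &= \alpha(pred(\gamma(+))) = \alpha(\{y \mid y \ge 0\}) = \text{0+} \\
(pred \circ succ)^{\alpha}(0) &= \alpha(pred(succ(\{0\}))) = \alpha(\{0\}) = 0
\end{aligned}
\]
```

Composing the two optimal operators yields 0+ on the abstract value 0, while the optimal abstraction of the composition yields 0. The loss happens at the intermediate value +, which no longer records that the concrete value was exactly 1.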
17 α(lfp F) ≤ lfp F^α: why compute lfp F^α?
- lfp F cannot, in general, be computed in finitely many steps
  - ω steps are in general required
- lfp F^α can be computed in finitely many steps, if the abstract domain is finite or at least noetherian
  - does not contain infinite increasing chains
- interesting for static program analysis, where the fixpoint computation must terminate
  - most program properties considered in static analysis are undecidable
  - we accept a loss of precision (safe approximation) in order to make the analysis feasible
18 Where does the approximation come from?
- incomplete abstract operators
- more execution paths in the abstract semantics
  - the abstract state has no information to allow deterministic choices
  - conditionals, pattern matching, etc.
  - the set of resulting abstract states is transformed into a single abstract state by an abstract lub operation
19 Approximation in abstract Sign computations
- concrete state [x = 3]
  - if x > 2 then y = 3 else y = -5
  - concrete state [x = 3, y = 3]
- abstract state [x = +]
  - if x > 2 then y = 3 else y = -5
  - the abstract guard can be both true and false
  - we need to abstractly execute both paths
  - the resulting abstract states are merged by performing a lub on Sign
  - abstract state [x = +, y = top]
20 Approximation in type analysis
- the following ML expression is not typed by ML's type inference algorithm, because it always performs a lub operation in the conditional
  - if true then 3 else true
  - even when the guard is valid or unsatisfiable in the abstract state
21 Applications of Abstract Interpretation
- comparative semantics
  - a technique to reason about semantics at different levels of abstraction
  - non-noetherian abstract domain
  - abstraction without approximation (completeness)
  - α(lfp F) = lfp F^α
- static analysis: effective computation of the abstract semantics
  - if the abstract domain is noetherian and the abstract operations are computationally feasible
- if the abstract domain is non-noetherian or if the fixpoint computation is too complex
  - use widening operators
  - which effectively compute an (upper) approximation of lfp F^α
  - one example later
22 The abstract interpretation framework
- (P(C), ⊆, ∅, C, ∪, ∩) (concrete domain)
- (A, ≤, bottom, top, lub, glb) (abstract domain)
- γ : A → P(C) monotonic (concretization function)
- α : P(C) → A monotonic (abstraction function)
- ∀x ∈ P(C). x ⊆ γ(α(x))
- ∀y ∈ A. α(γ(y)) ≤ y (Galois connection)
- for every fi with abstract version fi^α, ∀x1,..,xn ∈ P(C). fi(x1,..,xn) ⊆ γ(fi^α(α(x1),..,α(xn))) (local correctness)
- critical choices
  - the abstract domain to model the property
  - the (possibly optimal) correct abstract operations
23 Other approaches and extensions
- there exist weaker versions of abstract interpretation
  - without Galois connections (e.g., concretization function only)
  - based on approximation operators (widening, narrowing)
  - without explicit abstract domain (closure operators)
- the theory also provides several results on abstract domain design
  - how to combine domains
  - how to improve the precision of a domain
  - how to transform an abstract domain into a complete one
  - ...
- we will look at some of these results in the last lecture