Flow-Insensitive Points-to Analysis with Term and Set Constraints - PowerPoint PPT Presentation

About This Presentation
Title:

Flow-Insensitive Points-to Analysis with Term and Set Constraints

Description:

Foster claims these systems are nearly identical, and may actually be combined ... purpose constraint symbol (subset-iota) yields the two different algorithms. ... – PowerPoint PPT presentation

Number of Views:31
Avg rating:3.0/5.0
Slides: 28
Provided by: stude1299
Category:

less

Transcript and Presenter's Notes

Title: Flow-Insensitive Points-to Analysis with Term and Set Constraints


1
Flow-Insensitive Points-to Analysis with Term and
Set Constraints
  • Presentation by
  • Kaleem
  • Travis
  • Patrick

2
Two methods
  • Andersen vs Steensgaard
  • Foster claims these systems are nearly identical,
    and may actually be combined in their
    implementation.

3
  • Andersen
  • For an assignment e1 e2 anything in the
    points-to set for e2 must also be in the
    points-to set for e1.
  • Steensgaard
  • For an assignment e1 e2 the points-to set for
    e2 must be equal to the points-to set for e1.

4
Fosters Framework
  • Foster's type systems are designed using Term and
    Set constraints
  • Set constraints define inclusion relationships
    between types we use set constraints to describe
    Andersen's analysis.
  • Term constraints define equality relationships
    between types we use term equations to describe
    Steensgaard's analysis.

5
Whats so important about their similarity?
  • The main difference between the Steensgaard and
    Andersen is Steensgaard uses term constraints as
    opposed set constraints.Term constraints describe
    equality. Set constraints describe inclusion
  • By carefully defining our inference rules for
    both methods, the implementation is vastly
    simplified. This is because both methods will be
    combined into one set of inference rules. The
    difference in set constraints is minimal in the
    implementation.

6
Steensgaard
7
  • Const - Int S _ is a wildcard - a fresh,
    unconstrained variable

Var S variables are elevated to references for
simplicity
8
Addr S e points to e
  • Deref S if e is a reference to a then e is of
    type a

9
  • Asst S unifies the equivalence classes for the
    points-to sets of e1 and e2
  • In other words, if e1 is of type t1 and e2 is of
    type t2 then e1 e2 is of type t2
  • This is where Steensgaard uses his time-saving,
    conservative merging.

10
Andersen
11
  • Const - Int A assigns the empty set for
    integers.
  • Foster uses 0 instead of bottom
  • 0 stands for the least set

Var A lifts regular variables to a pointer type
for simplicity, as with Steensgaard. But we now
have to take into account covariance/contravarianc
e.
12
  • Addr A e points to e

Deref A e is an upper bound on the type of
whatever e points to. In other words, this is
nearly the inverse of Addr A.
13
  • Asst A illustrates the difference between
    Andersen and Steensgaard - in the assignment
    e1e2, e1 could potentially point to anything e2
    can, so the type of the expression is the type of
    e2

14
Constructor Signatures
  • The constructor signatures (section 3) merely
    describe a key difference between the two
    algorithms.
  • Set constraints describe Andersen's analysis.
  • Term constraints describe Steensgaard's analysis.
  • This difference must also be handled when
    combining both algorithms

15
Combining And/Ste
  • Foster combines the type languages for And and
    Ste by redefining their constructor signatures to
    yield a reference with two p fields and a tag
    field ref (pget, pset, t) (page 11)
  • For Andersen analysis, the Pget fields are
    covariant, the Pset fields are contravariant, and
    the t (tag) field is ignored.
  • For Steensgaard analysis, all the subfields are
    Term fields, and we can assure that PgetPset.

16
After redefining the signatures for constructors,
Foster combines AndCommon with SteCommon to
arrive at the final set of inference rules, named
Comb At this point, we no longer need to worry
about separate And and Ste inference rules.
CombCommon represents both at once. This vastly
simplifies the implementation of both algorithms.
17
How does Comb work?
  • The only difference between Comb and And/Ste is
    the use of the tag field t and the definition of
    a general-purpose symbol for the constraints.
  • First, the tag t is shown in SteCommon. It is
    used to identify equivalence classes. AndCommon
    deals with inclusion rather than equivalence, so
    Combs tag field is simply ignored when we wish
    to use it for Andersen-style results.

18
How does Comb work?
  • Second, changing the interpretation of the
    general-purpose constraint symbol (subset-iota)
    yields the two different algorithms.
  • If it is used as a subset constraint, the rules
    compute Andersen's analysis.
  • Steensgaard instead treats this constraint as
    conditional unification. Also, PgetPset, because
    the distinction is not used in SteCommon

19
Implementation
  • There are 3 major problems with using C for the
    implementation.

20
Problem 1
  • We must determine how library functions affect
    the points-to graph without looking at their
    source.
  • First, assume that most undefined functions have
    no effect on the analysis.
  • Second, for those functions that do have an
    effect (such as strcpy(char s1, char s2), we
    write a false stub of the function that provides
    enough information to the analysis to determine
    how the real function behaves.

21
Problem 2
  • Some functions can take a variable number of
    arguments.
  • For the most part, C implementations of varargs
    do not affect the points-to set.
  • But some implementations accomplish varargs by
    treating the first argument as a pointer to any
    subsequent arguments.
  • None of these algorithms handle this correctly.
    Foster manually modified the vararg functions to
    take a fixed number of arguments

22
Problem 3
  • When a multidimensional array is allocated, C
    actually uses a contiguous block of memory.
  • So if b is two-dimensional and a is
    one-dimensional, the statement
  • b (int) a
  • results in b00 being an alias to a0.
    Dealing with this added complexity involves
    determining the C types for each expression,
    adding more overhead to the existing algorithms.

23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
(No Transcript)
27
(No Transcript)
Write a Comment
User Comments (0)
About PowerShow.com