SATURN: An Overview - PowerPoint PPT Presentation

Slides: 33
Provided by: fmindi

1
SATURN: An Overview
  • Shrawan Kumar
  • Shrawan.kumar_at_tcs.com

2
Topics
  • What is SATURN?
  • SATURN Framework
  • How SATURN works, and its specification language
  • Examples of analyses written in SATURN
  • Discussion on Scalability and Precision

3
Motivating Example
    void my_func(int i, int j, int k, int b) {
        int a, x;
        a = i + j;
        if (b > a)
            if (a > 0)
                x = k / b;  // Is there a division by zero?
    }
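The answer is no: at the division the path guard is (b > a) AND (a > 0), so b > a > 0 and b cannot be zero. The sketch below checks this exhaustively, assuming 8-bit signed values (a simplification; Saturn would model the full width of int bit by bit) and treating the value of a as a free variable. It searches for an assignment that satisfies the guard together with b == 0, which is the kind of query Saturn hands to its SAT solver.

```python
from itertools import product
from typing import Callable, Optional, Tuple

LO, HI = -128, 127   # assumption: 8-bit signed values instead of full-width int

def guard_at_division(a: int, b: int) -> bool:
    """Path condition for reaching 'x = k / b': both if-conditions hold."""
    return b > a and a > 0

def find_witness(cond: Callable[[int, int], bool]) -> Optional[Tuple[int, int]]:
    """Exhaustively search for values of (a, b) that satisfy cond."""
    for a, b in product(range(LO, HI + 1), repeat=2):
        if cond(a, b):
            return (a, b)
    return None

# With the path guard: can b be zero at the division?
witness = find_witness(lambda a, b: guard_at_division(a, b) and b == 0)
# Without the guard (a path-insensitive view), b == 0 is trivially reachable.
unguarded = find_witness(lambda a, b: b == 0)
print(witness, unguarded)   # None (-128, 0)
```

Dropping the guard produces a spurious witness immediately, which is why path sensitivity matters for this example.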

4
What is SATURN?
  • SATisfiability-based failURe aNalysis
  • Combines static program analysis and model
    checking
  • It is an error detection framework, not a verification framework!
  • Intra-procedurally path-sensitive
  • Supports a summary-based approach to inter-procedural analysis
  • Stores all information in BDB (Berkeley DB) files
  • Has a very rich (but low-level) language for analysis specification
  • Uses the SAT solvers MiniSat and zChaff

5
What it offers
  • A model of the program
  • A rule-based specification language for expressing analyses
  • A facility to build first-order logic formulae over program variables
  • Satisfiability checking of these formulae
  • Retrieval of an assignment of values to variables that satisfies a formula
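The last two capabilities, checking satisfiability and retrieving a satisfying assignment, can be illustrated with a brute-force sketch over propositional formulae. Saturn delegates this work to MiniSat or zChaff; the enumeration below is only a stand-in, and representing a formula as a Python callable over a dict of variable values is an assumption made for the example.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence

Assignment = Dict[str, bool]
Formula = Callable[[Assignment], bool]

def solve(formula: Formula, variables: Sequence[str]) -> Optional[Assignment]:
    """Return a satisfying assignment, or None if the formula is unsatisfiable.

    Exhaustive enumeration is exponential in len(variables), unlike a real
    SAT solver, but it answers the same two questions."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if formula(env):
            return env
    return None

# (x OR y) AND (NOT x OR z) AND NOT z
f = lambda e: (e["x"] or e["y"]) and (not e["x"] or e["z"]) and not e["z"]
print(solve(f, ["x", "y", "z"]))                  # {'x': False, 'y': True, 'z': False}

# x AND NOT x has no model
print(solve(lambda e: e["x"] and not e["x"], ["x"]))  # None
```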

6
Base Program Model
  • Program IR, as a combination of AST and CFG
  • Basic information maintained at each program point:
  • A guard, as a first-order logic formula
  • The memory locations pointed to by each pointer variable
  • The value of every integral data item, in symbolic form

7
Program Representation Model
  • AST of the program
  • Containing information about each:
  • Function
  • Variable (local, global, parameter)
  • Struct and field
  • User-defined type
  • Expression
  • Statement
  • All structural information, e.g. parent-child relationships
  • Entry/exit points of each function
  • CFG of the program
  • Maintained for each function
  • Edges represent computation
  • Nodes represent program points
  • Relationship with the AST
  • A relationship is maintained between each AST element and the program points before and after it

8
Model representation building blocks
  • Integral variables are represented as n-bit signed or unsigned integers
  • n and signedness are determined from the variable's type
  • Every bit in the representation is modeled as a boolean expression
  • A pair of mappings, called the Environment, is associated with each program point:
  • VARS → VALUES
  • PTRS → 2^GLS
  • where GLS = GUARD-SET × 2^LOC-SET
  • A guard G associated with a program point P has the following meaning:
  • Control may reach P only if G holds
  • If a pointer Q maps to GLS1 and <G, LOCSET1> belongs to GLS1, then
  • Q may point to any of the locations in LOCSET1, provided G holds
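A small sketch of these building blocks, under assumptions of our own: each bit is a Python boolean function of an assignment to named atoms, an n-bit value is a list of such bits (most significant bit first, matching the bit strings in the example slides), and a pointer's entry is a list of (guard, location set) pairs. The names `const_bits`, `unknown_bits`, `decode` and `may_point_to` are hypothetical, not Saturn's API.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Assignment = Dict[str, bool]
Bit = Callable[[Assignment], bool]      # a boolean expression over named atoms
Guard = Callable[[Assignment], bool]

def const_bits(value: int, n: int) -> List[Bit]:
    """n-bit two's-complement encoding of a constant, MSB first."""
    raw = value & ((1 << n) - 1)
    return [(lambda _e, b=(raw >> i) & 1: bool(b)) for i in reversed(range(n))]

def unknown_bits(name: str, n: int) -> List[Bit]:
    """An unconstrained value: bit i is the free atom '<name>_<i>'."""
    return [(lambda e, k=f"{name}_{i}": e[k]) for i in reversed(range(n))]

def decode(bits: List[Bit], env: Assignment, signed: bool) -> int:
    """Evaluate the bits under env and read them back as an integer."""
    n = len(bits)
    u = 0
    for bit in bits:
        u = (u << 1) | int(bit(env))
    return u - (1 << n) if signed and u >> (n - 1) else u

# PTRS -> 2^GLS: a pointer maps to a set of (guard, location set) pairs.
GLS = List[Tuple[Guard, FrozenSet[str]]]

def may_point_to(entry: GLS, env: Assignment) -> FrozenSet[str]:
    """Locations the pointer may reference once env fixes the guard atoms."""
    result: FrozenSet[str] = frozenset()
    for guard, locs in entry:
        if guard(env):
            result |= locs
    return result

print("".join(str(int(b({}))) for b in const_bits(20, 8)))   # 00010100
print(decode(const_bits(-3, 8), {}, signed=True))             # -3
p_entry: GLS = [(lambda e: e["g"], frozenset({"i"})),
                (lambda e: not e["g"], frozenset({"k"}))]
print(sorted(may_point_to(p_entry, {"g": True})))             # ['i']
```

Signedness only matters when bits are read back as a number, which is why `decode` takes it as a flag derived from the variable's type.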

9
Example
    int main(void) {
        signed char i, *p;
        unsigned char j, k;
        // P1
        if (i < 10) {
            // P2
            j = 10; p = &i;
            // P3
        } else {
            // P4
            j = 20; p = &k;
            // P5
        }
        // P6
        k = j;
        // P7
    }

10
Example
    int main(void) {
        signed char i, *p;
        unsigned char j, k;
        // P1
        if (i < 10) {
            // P2
            i = 20; p = &i;
            // P3
        } else {
            // P4
            j = 20; p = &k;
            // P5
        }
        // P6
        k = j;
        // P7
    }

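[Figure: CFG of the example. From P1, edge (i < 10) leads to P2 and edge !(i < 10) to P4; P2 → P3 via i = 20 and p = &i; P4 → P5 via j = 20 and p = &k; P3 and P5 join at P6; P6 → P7 via k = j]
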
11
Example
  • Guards
  • P1: true
  • P2, P3: (i < 10)
  • P4, P5: !(i < 10)
  • P6, P7: true
  • Environment
  • P1, P2, P4: <i → U, j → U, k → U>
  •     <p → {}>
  • P3: <i → 00010100u, j → U, k → U>
  •     <p → {<true, i>}>
  • P5: <j → 00010100u, i → U, k → U>
  •     <p → {<true, k>}>
  • P6: <j → AAABABAAu, i → CCCDCDCCu, k → U>
  •     <p → {<(i < 10), i>, <!(i < 10), k>}>
  • P7: <j, k → AAABABAAu, i → CCCDCDCCu>
  •     <p → {<(i < 10), i>, <!(i < 10), k>}>
  • Where
  • U is an unknown bit of the variable's initial value
  • A is (i < 10) AND U
  • B is !(i < 10) OR U
  • C is !(i < 10) AND U
  • D is (i < 10) OR U
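The A/B and C/D entries come from merging the two incoming environments bit by bit: at the join P6, each bit becomes (G AND then_bit) OR (NOT G AND else_bit), where G is the branch condition (i < 10). The sketch below treats (i < 10) as a single atom `g` (a simplification: in Saturn it is itself a formula over the bits of i), gives every unknown bit its own atom, and names each merged bit by comparing truth tables. All helper names are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List

Assignment = Dict[str, bool]
Bit = Callable[[Assignment], bool]

def const_bits(value: int, n: int) -> List[Bit]:
    raw = value & ((1 << n) - 1)
    return [(lambda _e, b=(raw >> i) & 1: bool(b)) for i in reversed(range(n))]

def unknown_bits(name: str, n: int) -> List[Bit]:
    return [(lambda e, k=f"{name}_{i}": e[k]) for i in reversed(range(n))]

def merge(guard: Bit, then_bits: List[Bit], else_bits: List[Bit]) -> List[Bit]:
    """Bit-wise join: (guard AND t) OR (NOT guard AND e)."""
    return [(lambda env, t=t, e=e: (guard(env) and t(env)) or (not guard(env) and e(env)))
            for t, e in zip(then_bits, else_bits)]

# Reference truth tables for the letters used on the slide.
REFS = {"A": lambda g, u: g and u, "B": lambda g, u: (not g) or u,
        "C": lambda g, u: (not g) and u, "D": lambda g, u: g or u}

def label(bit: Bit, atom: str) -> str:
    combos = list(product([False, True], repeat=2))
    table = [bit({"g": gv, atom: uv}) for gv, uv in combos]
    for name, ref in REFS.items():
        if table == [ref(gv, uv) for gv, uv in combos]:
            return name
    return "?"

def pattern(bits: List[Bit], name: str) -> str:
    return "".join(label(bit, f"{name}_{len(bits) - 1 - pos}") for pos, bit in enumerate(bits))

g: Bit = lambda env: env["g"]                                  # stands for (i < 10)
j_at_p6 = merge(g, unknown_bits("j", 8), const_bits(20, 8))    # j: U on then, 20 on else
i_at_p6 = merge(g, const_bits(20, 8), unknown_bits("i", 8))    # i: 20 on then, U on else
print(pattern(j_at_p6, "j"))   # AAABABAA
print(pattern(i_at_p6, "i"))   # CCCDCDCC
```

Where 20 has a 0 bit the merge simplifies to A (or C), and where it has a 1 bit to B (or D), reproducing the environment at P6.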

12
Memory location Modeling
  • Every memory location is represented by a location trace
  • A location trace is built from
  • A root variable, which is one of:
  • Global variable
  • Local variable
  • Formal parameter
  • Return value of a function
  • Field accesses
  • Dereferences
  • There are operations to extract parts of a location trace and to compose traces
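A location trace can be pictured as a root followed by a sequence of field accesses and dereferences. The encoding below is hypothetical (the classes `Root`, `Field`, `Deref`, `Trace` and the C-like rendering are our assumptions, not Saturn's internal representation); it shows a trace being built up, truncated to a prefix, and composed with another trace's path, as one might do when mapping a callee's formal-parameter trace onto a caller's actual argument.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Root:
    kind: str   # "global", "local", "param" or "return"
    name: str

@dataclass(frozen=True)
class Field:
    name: str

@dataclass(frozen=True)
class Deref:
    pass

Step = Union[Field, Deref]

@dataclass(frozen=True)
class Trace:
    root: Root
    path: Tuple[Step, ...] = ()

    def field(self, name: str) -> "Trace":
        return Trace(self.root, self.path + (Field(name),))

    def deref(self) -> "Trace":
        return Trace(self.root, self.path + (Deref(),))

    def parent(self) -> "Trace":
        """Drop the last step, e.g. from *(*p).next back to (*p).next."""
        return Trace(self.root, self.path[:-1])

    def compose(self, suffix: "Trace") -> "Trace":
        """Append suffix's path to this trace, discarding suffix's root."""
        return Trace(self.root, self.path + suffix.path)

    def __str__(self) -> str:
        text = self.root.name
        for step in self.path:
            text = f"*{text}" if isinstance(step, Deref) else f"({text}).{step.name}"
        return text

p = Trace(Root("local", "p"))
next_loc = p.deref().field("next").deref()
print(next_loc)            # *(*p).next
print(next_loc.parent())   # (*p).next
```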

13
Information representation
  • As a set of facts, which are instances of parameterised predicates
  • Example
  • g_guard(G, P) is a parameterised predicate, where G is a guard and P is a program point
  • It is interpreted as "the guard at program point P is G"
  • For a given program there will be multiple instances of this predicate, one for each program point
  • A fact is stored for every such instance
  • Facts are stored in a Berkeley DB (BDB) database for efficient storage and retrieval
  • All information about the program model is stored as sets of such facts in one or more databases
  • Information from many built-in analyses is also stored this way
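In miniature, a fact base is a set of argument tuples per predicate, queried by pattern with some positions left free. The sketch below keeps everything in memory where Saturn uses Berkeley DB files; the `FactStore` class and the convention of `None` marking a free argument are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterator, Optional, Set, Tuple

Fact = Tuple[object, ...]

class FactStore:
    """Facts grouped by predicate name; each fact is a tuple of arguments."""

    def __init__(self) -> None:
        self._facts: Dict[str, Set[Fact]] = defaultdict(set)

    def add(self, pred: str, *args: object) -> bool:
        """Store a fact; return True if it was new."""
        before = len(self._facts[pred])
        self._facts[pred].add(args)
        return len(self._facts[pred]) > before

    def query(self, pred: str, *pattern: Optional[object]) -> Iterator[Fact]:
        """Yield facts matching the pattern; None marks a free argument."""
        for fact in self._facts.get(pred, ()):
            if len(fact) == len(pattern) and all(
                p is None or p == a for p, a in zip(pattern, fact)
            ):
                yield fact

db = FactStore()
# g_guard(G, P): the guard at program point P is G (one fact per program point).
db.add("g_guard", "true", "P1")
db.add("g_guard", "(i<10)", "P2")
db.add("g_guard", "(i<10)", "P3")
db.add("g_guard", "!(i<10)", "P4")
print(list(db.query("g_guard", None, "P3")))                        # [('(i<10)', 'P3')]
print(sorted(p for _, p in db.query("g_guard", "(i<10)", None)))    # ['P2', 'P3']
```

Adding a fact that already exists returns `False`; a rule engine can use that signal to detect when no new facts are being produced.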

14
Saturn Tool chain
[Figure: Saturn tool chain. Components: C program, C front end, IR database, analysis specs (CLP), CLP interpreter, constraint solvers, summary databases, summary/error reports]

15
Analysis specification
  • Analysis is performed over a database of facts
  • During analysis, more facts may be added to the database
  • Every analysis specification is a set of rules
  • Each rule is a list of goals goal1, goal2, ..., goaln, where the last goal must add some information to the database
  • A basic goal has the form pred_name(arg1, arg2, ..., argn)
  • Each argument may be bound to a value, or it may be a free variable
  • Rules are checked for success or failure
  • A rule is checked from left to right for as long as its goals keep succeeding
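Left-to-right checking with binding of free variables is essentially backtracking search, as in Prolog. A minimal sketch, assuming goals are tuples whose capitalised string arguments are variables (a convention borrowed from the examples in these slides, not CALYPSO's actual parser), `_` is an anonymous variable, and constants are written in lower case:

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Goal = Tuple[str, ...]                  # (predicate, arg1, ..., argn)
Facts = Dict[str, Set[Tuple[str, ...]]]
Binding = Dict[str, str]

def is_var(term: str) -> bool:
    return term == "_" or term[:1].isupper()

def match(args: Tuple[str, ...], fact: Tuple[str, ...], binding: Binding) -> Optional[Binding]:
    """Extend binding so that args equals fact, or return None."""
    if len(args) != len(fact):
        return None
    new = dict(binding)
    for term, value in zip(args, fact):
        if term == "_":
            continue
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None             # already bound to something else
        elif term != value:
            return None                 # constant mismatch
    return new

def solve(goals: List[Goal], facts: Facts, binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Evaluate goals left to right, backtracking over every matching fact."""
    binding = binding or {}
    if not goals:
        yield binding
        return
    pred, *args = goals[0]
    for fact in sorted(facts.get(pred, ())):
        extended = match(tuple(args), fact, binding)
        if extended is not None:
            yield from solve(goals[1:], facts, extended)

facts: Facts = {
    "guard": {("p2", "(i<10)"), ("p4", "!(i<10)")},
    "iset": {("p2", "p3", "a1"), ("p4", "p5", "a2")},
}
# iset(PE, PX, A), guard(PE, G): which assignment runs under which guard?
for b in solve([("iset", "PE", "PX", "A"), ("guard", "PE", "G")], facts):
    print(b["A"], b["G"])
```

Each binding produced by the first goal is carried into the second, and a failing goal simply makes the search move on to the next candidate fact.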

18
Example
  • predicate num(N:int).
  • +num(1), +num(2), +num(3), +num(4),
  • +num(5), +num(6), +num(7), +num(8),
  • +num(9), +num(10), +num(11), +num(12),
  • +num(13), +num(14), +num(15), +num(16),
  • +num(17), +num(18), +num(19), +num(20).
  • predicate multiple(A:int, B:int, C:int).
  • num(X), num(Y), num(Z), Z = X*Y, X \= 1, Y \= 1,
  • +multiple(Z, X, Y).
  • predicate square(P:int).
  • multiple(Z, X, Y), X = Y, +square(Z).
  • predicate prime(P:int).
  • num(X), X \= 1, ~multiple(X, _, _), +prime(X).
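These rules can be run by naive bottom-up evaluation: check every rule, add the derived facts, and repeat until a pass adds nothing. The Python below is a hand translation of the three rules, not a CALYPSO interpreter. Running the negated `multiple` goal only after `multiple` is complete (a separate stratum) is our assumption about how negation should be ordered; it is what makes the prime rule meaningful.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

DB = Dict[str, Set[tuple]]
Rule = Callable[[DB], Iterable[Tuple[str, tuple]]]

def run_to_fixpoint(rules: List[Rule], db: DB) -> int:
    """Check every rule repeatedly until no rule adds a new fact; return passes."""
    passes = 0
    while True:
        passes += 1
        added = False
        for rule in rules:
            for pred, fact in list(rule(db)):
                if fact not in db[pred]:
                    db[pred].add(fact)
                    added = True
        if not added:
            return passes

def multiple_rule(db: DB):
    # num(X), num(Y), num(Z), Z = X*Y, X \= 1, Y \= 1, +multiple(Z, X, Y).
    for (x,) in db["num"]:
        for (y,) in db["num"]:
            for (z,) in db["num"]:
                if z == x * y and x != 1 and y != 1:
                    yield "multiple", (z, x, y)

def square_rule(db: DB):
    # multiple(Z, X, Y), X = Y, +square(Z).
    for z, x, y in db["multiple"]:
        if x == y:
            yield "square", (z,)

def prime_rule(db: DB):
    # num(X), X \= 1, ~multiple(X, _, _), +prime(X).
    composite = {z for z, _, _ in db["multiple"]}
    for (x,) in db["num"]:
        if x != 1 and x not in composite:
            yield "prime", (x,)

db: DB = {"num": {(n,) for n in range(1, 21)},
          "multiple": set(), "square": set(), "prime": set()}
run_to_fixpoint([multiple_rule, square_rule], db)   # stratum 1
run_to_fixpoint([prime_rule], db)                   # stratum 2: negation on a complete relation
print(sorted(z for (z,) in db["square"]))   # [4, 9, 16]
print(sorted(x for (x,) in db["prime"]))    # [2, 3, 5, 7, 11, 13, 17, 19]
```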

19
Saturn Spec Language
  • Saturn provides a specification language, CALYPSO, for expressing analyses
  • It is rule-based and in some ways similar to Prolog
  • Most of the built-in analyses provided by Saturn are written in CALYPSO itself
  • Parameterised predicates are the basic unit of abstraction
  • Supported parameter types are:
  • Primitive types: boolean, int, float and string
  • list[T], a type representing a list of values of type T
  • The IR object types, available as built-in types
  • Bit vectors, program points and location traces, as some other built-in types
  • User-defined types may be added
  • They can be defined as enumerated types, aggregate types, or compositions of these with other primitive types

20
Predicate Fact
  • A predicate denotes the type of a fact
  • Declared as predicate pred_name(arg1:type1, arg2:type2, arg3:type3, ..., argn:typen).
  • Every predicate is given a meaning and is used consistently with that meaning
  • Example: predicate reaches(FN:string, P:pp, TR:t_trace, A:c_instr, G:g_guard).
  • In function FN, the definition of the variable with location trace TR, assigned through statement A, is in effect at program point P if G holds
  • A fact is an instance of a predicate
  • Many instances (facts) of the same predicate, with different argument values, may exist in the database

21
Goal
  • A goal is used to
  • Query the existence of matching facts for a predicate
  • Check whether a boolean expression (guard) is satisfiable
  • Add a new fact for a predicate
  • A goal either succeeds or fails
  • A basic goal has one of the forms
  • pred_name(arg1, arg2, arg3, ..., argn)
  • +pred_name(arg1, arg2, arg3, ..., argn)
  • Goals can be composed through negation, disjunction and conjunction to form new goals
  • Satisfaction of a basic goal
  • A goal used to add a fact always succeeds
  • A query goal succeeds by matching facts from the database, e.g. guard(P, G)
  • Free arguments are bound to the corresponding values of the matching fact stored in the DB
  • A satisfiability goal succeeds by invoking the constraint solver, e.g. guard_sat(G)

22
Rule
  • A rule consists of a goal
  • An analysis spec consists of multiple rules
  • Every rule is checked independently
  • Checking a rule means testing whether its goal succeeds or fails
  • A goal that is a conjunction of sub-goals is evaluated from left to right; it succeeds if all sub-goals succeed
  • A disjunction of sub-goals succeeds if any of the sub-goals succeeds
  • A rule is checked repeatedly, as long as new combinations of values for the free variables of its predicates are found
  • A set of rules is checked repeatedly until no more facts are added

23
Rule - example
  • predicate preaches(FN:string, P:pp, TR:t_trace, A:c_instr, G:g_guard).
  • predicate reaches(FN:string, P:pp, TR:t_trace, A:c_instr, G:g_guard).
  • cil_curfn(F), iset(PE, PX, ASN), guard(PE, GE),
  • cil_instr_set(ASN, LHS, _),
  • lval(PE, LHS, TR, GL),
  • and(GE, GL, FG),
  • guard_sat(FG),
  • +preaches(F, PX, TR, ASN, FG).
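One reading of this rule, left to right: for each assignment instruction ASN on the CFG edge from PE to PX of the current function F, take the guard GE at PE; `lval` appears to resolve the assignment's left-hand side LHS at PE to a location trace TR under a guard GL; if GE AND GL is satisfiable, the definition of TR by ASN is recorded as reaching PX under that combined guard. The Python rendering below runs that logic over made-up facts. Guards are Python predicates over two hypothetical atoms `c` and `q`, and `guard_sat` is brute-force enumeration standing in for the SAT solver.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Assignment = Dict[str, bool]
Guard = Callable[[Assignment], bool]
ATOMS = ["c", "q"]                      # hypothetical boolean atoms used by guards

def guard_sat(g: Guard) -> bool:
    """Stand-in for the SAT solver: does any assignment make g true?"""
    return any(g(dict(zip(ATOMS, vals))) for vals in product([False, True], repeat=len(ATOMS)))

def g_and(a: Guard, b: Guard) -> Guard:
    return lambda env: a(env) and b(env)

# Made-up facts for one function "f".
iset = [("p1", "p2", "asn1"), ("p3", "p4", "asn2")]        # iset(PE, PX, ASN)
guard: Dict[str, Guard] = {"p1": lambda e: e["c"],          # guard(PE, GE)
                           "p3": lambda e: not e["c"]}
instr_set = {"asn1": "*p", "asn2": "*p"}                     # cil_instr_set(ASN, LHS, _)
# lval(PE, LHS, TR, GL): at p1, *p denotes x when q holds and y otherwise.
lval: Dict[Tuple[str, str], List[Tuple[str, Guard]]] = {
    ("p1", "*p"): [("x", lambda e: e["q"]), ("y", lambda e: not e["q"])],
    ("p3", "*p"): [("x", lambda e: e["c"])],                 # contradicts GE at p3
}

preaches = []
for pe, px, asn in iset:
    ge = guard[pe]
    lhs = instr_set[asn]
    for tr, gl in lval[(pe, lhs)]:
        fg = g_and(ge, gl)                                   # and(GE, GL, FG)
        if guard_sat(fg):                                    # guard_sat(FG)
            preaches.append(("f", px, tr, asn, fg))          # +preaches(F, PX, TR, ASN, FG)

print([t[:4] for t in preaches])   # guards omitted from the printout
```

The second assignment produces no fact because its combined guard (NOT c) AND c is unsatisfiable, which is how the `guard_sat` goal prunes infeasible paths.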

24
Analysis control
  • Top-down or bottom-up traversal
  • How to handle loops
  • Three options
  • Keeping loops as they are
  • Analyses which work on an acyclic CFG will not work
  • Converting them into a conditional statement (no looping)
  • Unsafe, but fast to analyse
  • Converting them into a tail-recursive function
  • Safe, but some analyses may not terminate
  • Setting the priority of different analyses (to improve efficiency)
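Why the conditional encoding is unsafe can be seen on a toy loop. The sketch below is our own construction, not Saturn code: it computes the values a counter can hold after `while (c) x = x + 1;` starting from x = 0, once with the body executed at most once (the loop turned into an `if`) and once iterated to a fixpoint. A check for x == 3 is missed by the first. The artificial `bound` is what keeps the second computation finite; without it the fixpoint iteration would not terminate, which mirrors the caveat about the tail-recursive encoding.

```python
from typing import Callable, Set

def after_if(init: Set[int], body: Callable[[int], int]) -> Set[int]:
    """Loop replaced by 'if (c) body': zero or one iteration."""
    return init | {body(x) for x in init}

def after_loop(init: Set[int], body: Callable[[int], int], bound: int) -> Set[int]:
    """Zero or more iterations, iterated to a fixpoint (values capped at bound)."""
    states = set(init)
    while True:
        new = {body(x) for x in states if body(x) <= bound} - states
        if not new:
            return states
        states |= new

def inc(x: int) -> int:
    return x + 1

print(sorted(after_if({0}, inc)))              # [0, 1]
print(sorted(after_loop({0}, inc, bound=5)))   # [0, 1, 2, 3, 4, 5]
print(3 in after_if({0}, inc), 3 in after_loop({0}, inc, bound=5))   # False True
```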

25
Concept of session
  • The facts created are stored in databases identified by a session id
  • A session (database) may be partitioned by parameters coming from IR entities
  • Facts added to an inactive part of the current session are not available in the current analysis cycle
  • The analysis can therefore be staged, with the facts added in each stage going into a new database (session)
  • Facts in one session can be queried or added from another session's analysis by explicit qualification
  • When facts for a predicate are added but not queried in the same analysis, it is better to add them in a separate session
  • Useful for inter-procedural analysis and staged analysis

26
Example Analysis: Identifying Recursive Functions
  • Stage 1: record direct calls
  • import "/usr/local/clpa/analysis/base/cilbase.clp".
  • predicate calls(F:string, G:string).
  • analyze session_name("cil_body").
  • session callee_caller() containing [calls].
  • cil_curfn(F), dircall(_, CN),
  • +callee_caller()->calls(F, CN).
  • Stage 2: close the call relation and flag recursion
  • predicate calls(F:string, G:string).
  • predicate recursive(F:string).
  • session callee_caller() containing [calls].
  • analyze session_name("callee_caller").
  • calls(F, G), calls(G, H), +calls(F, H).
  • calls(F, F), +recursive(F).
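The second stage computes the transitive closure of the direct-call relation and flags every function that reaches itself. The same computation in plain Python, over a made-up call graph standing in for the stage-1 facts:

```python
from typing import Set, Tuple

def transitive_closure(edges: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Repeat calls(F,G), calls(G,H) => calls(F,H) until nothing new is added."""
    calls = set(edges)
    while True:
        new = {(f, h) for (f, g) in calls for (g2, h) in calls if g == g2} - calls
        if not new:
            return calls
        calls |= new

def recursive_functions(edges: Set[Tuple[str, str]]) -> Set[str]:
    """calls(F, F) => recursive(F)."""
    return {f for (f, g) in transitive_closure(edges) if f == g}

# Hypothetical stage-1 result: direct calls found by cil_curfn(F), dircall(_, CN).
direct = {("main", "parse"), ("parse", "expr"), ("expr", "term"),
          ("term", "expr"), ("main", "report"), ("fact", "fact")}
print(sorted(recursive_functions(direct)))   # ['expr', 'fact', 'term']
```

Mutual recursion (`expr` and `term`) is caught because the closure adds `calls(expr, expr)` and `calls(term, term)`; callers of recursive functions, such as `main` and `parse`, are not themselves flagged.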

27
Inter-procedural analysis
  • A suitable form of summary information is designed
  • A summary is computed for each function
  • At its exit/entry
  • At each call site
  • The callee's summary information is used to derive the appropriate facts in the caller
  • Summary information at call sites is used to derive initial information at function entry

28
Example: Reaching Definitions
  • Intra-procedural analysis
  • Compute the definitions reaching each point
  • While doing so, use summary information for function calls
  • Inter-procedural summary at function exit
  • Which definitions reach unconditionally
  • Which definitions reach conditionally
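A function-exit summary of this kind can be sketched as follows: each definition carries the guard under which it reaches the exit; those whose guard is valid (true on every path) reach unconditionally, and the rest conditionally. At a call site, the caller can then kill its own earlier definitions of a variable only when the callee defines that variable unconditionally. Guards are Python predicates over a fixed list of atoms, validity is checked by enumeration, and the data layout is our assumption rather than Saturn's summary format.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

Assignment = Dict[str, bool]
Guard = Callable[[Assignment], bool]
ATOMS = ["c"]                           # hypothetical branch atom inside the callee
Summary = Tuple[Set[Tuple[str, str]], Set[Tuple[str, str]]]

def valid(g: Guard) -> bool:
    """True if g holds under every assignment of the atoms."""
    return all(g(dict(zip(ATOMS, v))) for v in product([False, True], repeat=len(ATOMS)))

def summarize(exit_defs: List[Tuple[str, str, Guard]]) -> Summary:
    """Split (variable, def_id, guard) reaching the exit into unconditional/conditional."""
    unconditional = {(var, d) for var, d, g in exit_defs if valid(g)}
    conditional = {(var, d) for var, d, g in exit_defs if not valid(g)}
    return unconditional, conditional

def apply_at_call(caller_defs: Set[Tuple[str, str]], summary: Summary) -> Set[Tuple[str, str]]:
    """Definitions reaching the point just after the call in the caller."""
    unconditional, conditional = summary
    killed = {var for var, _ in unconditional}
    survivors = {(var, d) for var, d in caller_defs if var not in killed}
    return survivors | unconditional | conditional

# Callee: g = 1 on every path (d1); x = 2 only when c holds (d2).
callee_exit = [("g", "d1", lambda e: True), ("x", "d2", lambda e: e["c"])]
summary = summarize(callee_exit)
# The caller had its own definitions of g (d0) and x (d3) before the call.
after_call = apply_at_call({("g", "d0"), ("x", "d3")}, summary)
print(sorted(after_call))   # [('g', 'd1'), ('x', 'd2'), ('x', 'd3')]
```

The caller's `x` definition survives because the callee overwrites `x` only on some paths, whereas `g` is overwritten on all of them.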

29
Saturn: Scalability, Precision, Soundness
  • Intra-procedurally path-sensitive
  • Inter-procedural (summary-based)
  • Use of BDBs
  • Inter-procedural results may be less precise than intra-procedural ones

30
Scalability
  • Already tried on the Linux kernel, which is a few million lines of code
  • The memory leak checker took 4 hours on a code base of close to one million lines
  • Semaphore lock checking took 19 hours of analysis time on the Linux kernel (4.8 MLOC)
  • A limit can be set on the maximum time spent analysing a single function
  • This allows partial analysis of complex functions; it may be unsound, but it still produces some results
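A per-function time limit turns an exhaustive loop into an anytime one: work stops at the deadline and whatever has been found so far is reported, flagged as partial. The sketch below is hypothetical (the slides do not describe Saturn's actual mechanism or option names); `analyse_with_budget` and `fake_check` are invented for illustration, and the clock is injectable so the cut-off can be made deterministic.

```python
import time
from typing import Callable, Iterable, List, Tuple

def analyse_with_budget(
    paths: Iterable[str],
    check: Callable[[str], List[str]],
    budget_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], bool]:
    """Run `check` on each path until the budget is used up.

    Returns (reports, complete); complete is False when the deadline cut the
    analysis short, in which case reports may miss errors (unsound)."""
    deadline = clock() + budget_s
    reports: List[str] = []
    for path in paths:
        if clock() >= deadline:
            return reports, False
        reports.extend(check(path))
    return reports, True

def fake_check(path: str) -> List[str]:
    """Hypothetical checker: paths ending in '!' contain an error."""
    return [f"leak on {path}"] if path.endswith("!") else []

ticks = iter(range(100))                       # fake clock: +1 "second" per reading
reports, complete = analyse_with_budget(["p1", "p2!", "p3!", "p4"], fake_check,
                                        budget_s=2.5, clock=lambda: next(ticks))
print(reports, complete)   # ['leak on p2!'] False
```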

31
References
  • Yichen Xie and Alex Aiken, "Saturn: A Scalable Framework for Error Detection using Boolean Satisfiability"
  • http://saturn.stanford.edu

32
  • Thank You