Murali Krishna Ramanathan - PowerPoint PPT Presentation

1
Static Path-Aware Analysis of Program Invariants
  • Murali Krishna Ramanathan
  • Department of Computer Science
  • Purdue University
  • (joint work with Suresh Jagannathan and Ananth
    Grama)

2
Motivation
Undocumented Program
Expert Programmer
New Programmer
BUGS
Tester
3
Context
  • What is a program invariant?
  • Property that must hold across all program
    executions
  • What is a failure?
  • Program run does not satisfy an expected
    invariant
  • System crashes
  • Logical bugs
  • Performance bugs
  • What is a specification?
  • Documentation of intended program invariants
  • e.g., lock must be followed by unlock
  • Unavailable or imprecise

4
Issues
  • Deriving specifications
  • Where do we start?
  • Absence of formal documentation
  • Legacy code
  • Identifying the source of failures
  • How do we search?
  • Exponential number of execution paths to explore
  • Representing common information among paths

5
Specification Inference
  • Challenges
  • What to look for?
  • Both relevant and irrelevant information present
    in the program source
  • How to be robust in the presence of bugs?
  • Assumptions
  • Programs are mostly well tested but can have bugs
  • Transparent: no programmer annotations

6
Kinds of specifications
  • Control-flow preconditions
  • A call to fopen must always precede a call to
    fgets
  • Data-flow preconditions
  • The result of a call to socket must always be
    checked for error before a call to bind
  • Control-flow postconditions
  • A call to fopen is either followed by a call to
    fclose or error
  • Control-flow divergence preconditions
  • A call to read can be preceded either by a call
    to open or socket
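
As a minimal sketch of what code obeying these specifications looks like, the C function below follows the fopen/fgets/fclose protocol named above. The function name read_first_line and its return convention are hypothetical and chosen only for illustration. The error check on fp is the file-I/O analogue of the socket/bind data-flow precondition.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies the first line of `path` into `buf`.
 * Returns 0 on success and -1 on any failure. */
int read_first_line(const char *path, char *buf, int size)
{
    assert(path != NULL && buf != NULL && size > 0);

    /* Control-flow precondition: fopen precedes fgets. */
    FILE *fp = fopen(path, "r");

    /* Data-flow precondition: the result is checked before use.
     * Control-flow postcondition: fopen is followed by an error exit... */
    if (fp == NULL)
        return -1;

    int ok = fgets(buf, size, fp) != NULL;

    /* ...or by fclose. */
    fclose(fp);
    return ok ? 0 : -1;
}
```

A specification miner would see the same three facts at most call-sites of fgets and report a call-site that omits one of them as a deviation.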

7
Preconditions
fp = fopen(); if (fp != NULL) fgets(buf, SIZE, fp);
fp = fopen()
fp != NULL
fopen <- fgets
  • Predicate
  • Captures properties associated with variables and
    procedure calls
  • Preconditions for procedure
  • Composed of predicates that need to hold always
    before every call to a procedure

8
Types of predicates
fp = fopen(); if (fp != NULL) fgets(buf, SIZE, fp);
  • Data-flow
  • captures data flow properties associated with
    variables
  • fp is assigned the return value of fopen; fp is
    not null
  • Control-flow
  • define precedence properties among procedures
  • fgets is preceded by fopen

9
Control-flow preconditions (ICSE 07)
  • 181 RI_FKey_check(PG_FUNCTION_ARGS)
  • 182 {
  • 199 ri_CheckTrigger(...)
  • 210 pk_rel = heap_open(...)
  • 296 match_type = ri_DetermineMatchType(...)
  • 303 ri_BuildQueryKeyFull(...)
  • 437 }

ri_CheckTrigger: Check that RI trigger function was called in expected context
heap_open: Get the relation descriptors of the FK and PK tables
ri_DetermineMatchType: Convert the MATCH TYPE string into a switchable int
ri_BuildQueryKeyFull: Build up a new hashtable key for a prepared SPI Plan of a constraint trigger of MATCH FULL
10
Control-flow preconditions
  • 181 RI_FKey_check(PG_FUNCTION_ARGS)
  • 182 {
  • 199 ri_CheckTrigger(...)
  • 210 pk_rel = heap_open(...)
  • 212 if(TRIGGER_FIRED_BY_UPDATE(...))
  • ...
  • 218 else
  • ...
  • 231 if(!HeapTupleSatisfies(...))
  • ...
  • 296 match_type = ri_DetermineMatchType(...)
  • 298 if(match_type == RI_MATCH_TYPE_PARTIAL)
  • 299 ereport(...)
  • 303 ri_BuildQueryKeyFull(...)
  • 437 }

11
Control-flow preconditions
  • 181 RI_FKey_check(PG_FUNCTION_ARGS)
  • 182 {
  • 199 ri_CheckTrigger(...)
  • 210 pk_rel = heap_open(...)
  • 248 if (tgnargs == 4)
  • 249 {
  • 250 ri_BuildQueryKeyFull(...)
  • 294 }
  • 437 }

On this path ri_BuildQueryKeyFull is not preceded by
ri_DetermineMatchType, which leads to a potential crash
12
Static Specification Mining
  • To generate preconditions for a procedure
  • Generate predicates at each call-site of the
    procedure
  • Ideally common predicates across all the
    call-sites form the preconditions for the
    procedure
  • How to find common predicates?
  • Use mining techniques
  • Construct patterns built from alignments or
    permutations of predicate sets
  • Approximation: patterns appearing in programs
    denote preconditions

13
Approach
  • Analyze control-flow graph
  • Build precedence relation (a <- b)
  • A binary relation between procedures a and b
  • A call to b is always preceded by call to a
  • Necessitates an inter-procedural analysis
  • Relations can cross procedure boundaries
  • Convergence requires fixpoint calculation
  • Procedure signatures
  • Frequent subsequence mining
  • Mine the chains formed by precedence relations
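
The fixpoint step can be sketched as a standard forward "must" data-flow analysis over a single control-flow graph. This is a simplified, intraprocedural and path-insensitive illustration, not the path-aware inter-procedural analysis of the talk. The Cfg layout, the 32-procedure limit and the function names are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 64
#define NO_CALL   (-1)

/* Hypothetical CFG encoding: node 0 is the entry; call[n] is the id
 * (0..31) of the procedure called at node n, or NO_CALL. */
typedef struct {
    int n_nodes;
    int call[MAX_NODES];
    int n_edges;
    int from[4 * MAX_NODES];
    int to[4 * MAX_NODES];
} Cfg;

static uint32_t call_bit(int c)
{
    assert(c == NO_CALL || (c >= 0 && c < 32));
    return c == NO_CALL ? 0u : (uint32_t)1 << c;
}

/* before[n] receives the set of procedures called on EVERY path from
 * the entry to node n (node n itself excluded). Unreachable nodes keep
 * the initial "all procedures" value. */
void must_precede(const Cfg *g, uint32_t before[])
{
    uint32_t after[MAX_NODES];
    assert(g->n_nodes > 0 && g->n_nodes <= MAX_NODES);

    for (int n = 0; n < g->n_nodes; n++) {
        before[n] = (n == 0) ? 0u : ~0u;
        after[n] = ~0u;
    }

    /* Iterate to a fixpoint: sets only shrink, so this terminates. */
    for (int changed = 1; changed; ) {
        changed = 0;
        for (int n = 0; n < g->n_nodes; n++) {
            uint32_t in = (n == 0) ? 0u : ~0u;
            for (int e = 0; e < g->n_edges; e++)
                if (g->to[e] == n)
                    in &= after[g->from[e]];   /* intersect over predecessors */
            uint32_t out = in | call_bit(g->call[n]);
            if (in != before[n] || out != after[n]) {
                before[n] = in;
                after[n] = out;
                changed = 1;
            }
        }
    }
}

/* a <- b: every call to b is preceded by a call to a. */
int precedes(const Cfg *g, const uint32_t before[], int a, int b)
{
    for (int n = 0; n < g->n_nodes; n++)
        if (g->call[n] == b && !(before[n] & call_bit(a)))
            return 0;
    return 1;
}
```

On a diamond-shaped graph where q is called before the branch and r on only one arm, this reports q <- p but not r <- p.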

14
Path Exploration
Path-Sensitive Exploration: q <- p, q <- r <- p
Path-Insensitive Exploration: q, r <- p
Path-Aware Exploration: q <- p
[Figure: control-flow graph over nodes q, r and p]
15
Precedes relation
[Figure: two control-flow graphs over nodes q, r, t, p and exit; each is labeled q <- p]
16
Inter-procedural Analysis
h() { if (cond) lwrap(); else lwrap(); uwrap(); }
lwrap() { init(); }
uwrap() { access(); }
17
Procedure Signatures
[Figure: control-flow graphs with nodes entry, s, q, u, t, r, p and ret]
s <- t
s <- q <- p <- t
Procedure signature for s: q <- p
18
Mining sequences
  • Sequence mining
  • Input: set of sequences (I)
  • Output: sequences that occur frequently as
    subsequences in I
  • Use the Apriori-all algorithm (Agrawal and
    Srikant, Mining Sequential Patterns, ICDE 95)

19
Motivation for sequence mining
  • Control paths
  • a, b, c, e
  • g, a, d, c, e
  • a, c, e
  • a, c, d, e, f
  • e, f, d, a (faulty path: no call to a and c
    before e)
  • Invariant: a <- c <- e
  • Intersection of these paths
  • e is preceded by nothing
  • Use mining to overcome brittleness of path
    intersection
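
A minimal sketch of the support count behind this argument, assuming each procedure call is encoded as one character so that a control path is a C string. The function names are hypothetical, and this covers only the counting step, not the Apriori-all candidate generation.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if pat occurs in seq as a (not necessarily contiguous)
 * subsequence, 0 otherwise. */
int is_subsequence(const char *pat, const char *seq)
{
    assert(pat != NULL && seq != NULL);
    for (; *seq != '\0' && *pat != '\0'; seq++)
        if (*seq == *pat)
            pat++;
    return *pat == '\0';
}

/* Number of paths in paths[0..n) that contain pat as a subsequence. */
int support(const char *pat, const char *const paths[], size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++)
        count += is_subsequence(pat, paths[i]);
    return count;
}
```

With the five paths above encoded as "abce", "gadce", "ace", "acdef" and "efda", support("ace", ...) is 4, so the chain survives a 4/5 threshold even though the faulty path removes it from a strict intersection.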

20
Sequence Mining - Example
  • Input sequences (min frequency 4/5)
  • a, b, c, e
  • g, a, d, c, e
  • a, c, e
  • a, c, d, e, f
  • e, f, d, a

Maximal frequent subsequence: a, c, e
21
Data-flow preconditions (PLDI 07)
  • Challenges
  • Data-flow predicates may be aliased
  • No anchors for data-flow predicates

if (x > 0) f(x)
if (y > 0) f(y)
x = g(); h(x); if (x > 0) f(x)
22
Motivating Example
  • main()
  • for(ai = options.listen_addrs)
  • listen_sock = socket(ai->ai_family, ...)
  • if(listen_sock < 0) error()
  • if(num_listen_socks > 16) error()
  • if((ret = getnameinfo()))
  • if(setsockopt(listen_sock, ...) == -1) error()
  • if(bind(listen_sock, ai->ai_addr, ...) < 0)
  • In a call to bind, the first parameter is always
    assigned the return value of a call to socket and
    is checked for error

23
Generate Predicates
  • main()
  • for(ai = options.listen_addrs)
  • listen_sock = socket(ai->ai_family, ...)
  • if(listen_sock < 0) error()
  • if(num_listen_socks > 16) error()
  • if((ret = getnameinfo()))
  • if(setsockopt(listen_sock, ...) == -1) error()
  • if(bind(listen_sock, ai->ai_addr, ...) < 0)

listen_sock: return(socket), (>, 0), (param_1, setsockopt), (param_1, bind)
num_listen_socks: (<, 16)
ret: return(getnameinfo)
24
Another call-site
  • ssh_control_listener(void)
  • if((control_fd = socket(PF_UNIX, ...)) < 0) error()
  • old_umask = umask(0177)
  • if(bind(control_fd, (struct sockaddr *)&addr, ...))

control_fd: return(socket), (>, 0), (param_1, bind)
old_umask: return(umask)
25
Structural Similarity Problem
listen_sock: return(socket), (>, 0), (param_1, setsockopt), (param_1, bind)
num_listen_socks: (<, 16)
ret: return(getnameinfo)
control_fd: return(socket), (>, 0), (param_1, bind)
old_umask: return(umask)
  • How to group the attribute sets that need to be
    mined together?
  • Find maximal matching of attribute sets
  • NP-hard
  • Use approximations based on program structures

26
Approximations
  • Type
  • attribute sets divided based on type of variable
  • Parameter
  • Supplied as arguments to the same parameter for
    any given procedure
  • Result
  • Variables that are assigned the return values of
    the same function

27
Example revisited
listen_sock: return(socket), (>, 0), (param_1, setsockopt), (param_1, bind)
num_listen_socks: (<, 16)
ret: return(getnameinfo)
control_fd: return(socket), (>, 0), (param_1, bind)
old_umask: return(umask)
  • Variable names are not comparable
  • Use positional information
  • Different number of attributes
  • Interspersed with irrelevant operations

28
Is intersection robust?
sockfd: return(socket), (param_1, bind)
listen_sock: return(socket), (param_1, bind), (param_1, setsockopt), (>, 0)
control_fd: return(socket), (param_1, bind), (>, 0)
Precondition (intersection): return(socket), (param_1, bind); (>, 0) missing!
  • Same limitations as with control-flow
    preconditions
  • Adopt frequent itemset mining
  • Order of events is less critical
  • Aggregate collection of data-flow facts at
    call-sites
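
The contrast between intersection and frequent itemset mining can be sketched with attribute sets stored as bitmasks. The bit assignments and function names below are hypothetical. Only the support count is shown, not a full itemset miner.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding: one bit per data-flow fact at a call-site. */
enum {
    RET_SOCKET   = 1 << 0,  /* variable assigned return(socket)   */
    PARAM1_BIND  = 1 << 1,  /* passed as first parameter of bind  */
    PARAM1_SSOPT = 1 << 2,  /* passed as first parameter of setsockopt */
    ERR_CHECKED  = 1 << 3   /* compared against 0 before use      */
};

/* Strict intersection: facts present at every call-site. */
unsigned intersect_all(const unsigned sets[], size_t n)
{
    assert(n > 0);
    unsigned common = sets[0];
    for (size_t i = 1; i < n; i++)
        common &= sets[i];
    return common;
}

/* Number of call-sites whose attribute set contains all of `items`. */
size_t itemset_support(unsigned items, const unsigned sets[], size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if ((sets[i] & items) == items)
            count++;
    return count;
}
```

For three call-sites of which one omits the error check, intersect_all drops ERR_CHECKED entirely, while itemset_support reports that the set including ERR_CHECKED holds at two of three sites, so a confidence threshold can keep it and flag the third site.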

29
Locality
main() { fp = init_file(); fgets(buf, SIZE, fp); }
init_file() { fp = fopen(); if (fp != NULL) return fp; exit(-1); }

main() { fp = fopen(); if (fp != NULL) read_file(fp); }
read_file(FILE *fp) { fgets(buf, SIZE, fp); }
  • Interprocedural analysis to capture precondition
    crossing procedure boundaries

30
Example
[Figure: graph with nodes q, r, s and t, labeled with sets over p1 and p2; the legend distinguishes intraprocedural and interprocedural edges]
31
Experiments
  • Applied on open source C programs
  • Input to the implementation: control flow graphs
  • Control flow nodes varied from 16K to 958K
  • Roughly 2M LoC
  • Procedure count varied from 298 to 8568
  • Precondition predicates varied from 189 to 5963
  • Analysis time varied from 26s to 20m

32
Experimental Goals
  • Path awareness improves precision
  • Useful for bug detection
  • Generates salient documentation

33
Effectiveness of path awareness
  • Fewer protocols generated using our approach
  • Reduction not at the expense of increase in
    false negatives
  • Reduces false positives

34
Bug Detection: OpenSSH
  • Procedure prime_test in openssh-4.4p1
  • Testing difficult as it performs Miller-Rabin
    primality testing
  • Program crashes due to the absence of an error
    check
  • e.g., in BN_mod_word(p, ...), if p is null, the
    program crashes
  • Fixed in openssh-4.5p1
  • Error check not always necessary
  • e.g., in BN_is_prime(..., ctx, ...), ctx can either
    be null or pre-allocated

35
Bug detection
  • Case Study: Linux
  • Hardware Bug
  • Difficult to detect using traditional testing
    techniques
  • Platform dependent error
  • Transparently identified using our approach
  • Performance Bug
  • Cache lookup operation was absent
  • Not easily specified as a bug for testing
  • Deviation delays data write flushes
  • Difficult to identify using traditional testing
    techniques

36
Change in Confidence
  • Increase in confidence reduces the number of
    predicates

37
Related Work
  • Static techniques
  • Inferring Specifications from Within, Kremenek et
    al, OSDI 06
  • Bugs as deviant behavior, Engler et al, SOSP 01
  • Dynamic techniques
  • Strauss, Ammons et al, POPL 02
  • Daikon, Ernst et al, TSE 01
  • Our approach
  • Path-aware analysis
  • Generates preconditions
  • Predicates of arbitrary size
  • Annotation free

38
Future Work
  • Richer specifications
  • Post-conditions, divergence structures, ...
  • More sophisticated mining techniques
  • Graph mining, ...
  • Validating generated specifications
  • Integration with theorem prover
  • Specifications and concurrency
  • Atomicity violations

39
Other work
  • Dynamic analysis
  • Detecting cause of assertion failures (under
    review)
  • Static path profiles (under review)
  • Impact analysis (ASE 06)
  • Memory aliasing (FASE 06)
  • Test case prioritization (SAC 08)
  • Distributed Systems
  • Randomized leader election (Distributed Computing
    07)
  • Eliminating duplicates in P2P systems (TPDS 07)
  • Search in P2P systems (P2P 05)
  • Efficient tag detection in RFID systems (SECON 05)

40
(No Transcript)
41
Why not mine post-conditions?
fp = fopen(); if (fp == NULL) exit(-1); fclose();
  • Precedence protocol
  • A call to fclose is always preceded by a call to
    fopen
  • Successor protocol
  • A call to fopen is always succeeded by a call to
    fclose

42
Why is parameter tracing insufficient?
  • uldap_connection_find () //code fragment from
    httpd
  • if (APR_SUCCESS == apr_thread_mutex_trylock(l->lock))
  • compare_client_certs(st->client_certs,
    l->client_certs)
  • In a call to compare_client_certs, the return
    value of a call to apr_thread_mutex_trylock must
    be APR_SUCCESS.
  • Predicate for compare_client_certs includes
  • return value of apr_thread_mutex_trylock() is
    APR_SUCCESS

43
Predicate size distribution
  • Majority of predicates have size less than 3