Title: Murali Krishna Ramanathan
1 Static Path-Aware Analysis of Program Invariants
- Murali Krishna Ramanathan
- Department of Computer Science
- Purdue University
- (joint work with Suresh Jagannathan and Ananth Grama)
2 Motivation
[Diagram: undocumented program; expert programmer; new programmer; bugs; tester]
3 Context
- What is a program invariant?
- A property that must hold across all program executions
- What is a failure?
- A program run does not satisfy an expected invariant
- System crashes
- Logical bugs
- Performance bugs
- What is a specification?
- Documentation of intended program invariants
- e.g., lock must be followed by unlock
- Unavailable or imprecise
4 Issues
- Deriving specifications
- Where do we start?
- Absence of formal documentation
- Legacy code
- Identifying the source of failures
- How do we search?
- Exponential number of execution paths to explore
- Representing common information among paths
5 Specification Inference
- Challenges
- What to look for?
- Both relevant and irrelevant information are present in the program source
- How to be robust in the presence of bugs?
- Assumptions
- Programs are mostly well tested but can have bugs
- Transparent: no programmer annotations
6 Kinds of specifications
- Control-flow preconditions
- A call to fopen must always precede a call to fgets
- Data-flow preconditions
- The result of a call to socket must always be checked for error before a call to bind
- Control-flow postconditions
- A call to fopen is either followed by a call to fclose or by an error
- Control-flow divergence preconditions
- A call to read can be preceded either by a call to open or by a call to socket
7 Preconditions
Example call-site:
    fp = fopen(...);
    if (fp != NULL)
        fgets(buf, SIZE, fp);
Predicates at the call to fgets: fp != NULL; fopen <- fgets
- Predicate
- Captures properties associated with variables and procedure calls
- Preconditions for a procedure
- Composed of predicates that must always hold before every call to the procedure
8 Types of predicates
    fp = fopen(...);
    if (fp != NULL)
        fgets(buf, SIZE, fp);
- Data-flow
- Captures data-flow properties associated with variables
- e.g., fp is assigned the return value of fopen; fp is not null
- Control-flow
- Defines precedence properties among procedures
- e.g., fgets is preceded by fopen
9 Control-flow preconditions (ICSE 07)
- 181 RI_FKey_check(PG_FUNCTION_ARGS)
- 182 {
- 199 ri_CheckTrigger(...)
- 210 pk_rel = heap_open(...)
- 296 match_type = ri_DetermineMatchType(...)
- 303 ri_BuildQueryKeyFull(...)
- 437 }
Annotations:
- ri_CheckTrigger: check that the RI trigger function was called in the expected context
- heap_open: get the relation descriptors of the FK and PK tables
- ri_DetermineMatchType: convert the MATCH TYPE string into a switchable int
- ri_BuildQueryKeyFull: build up a new hashtable key for a prepared SPI plan of a constraint trigger of MATCH FULL
10 Control-flow preconditions
- 181 RI_FKey_check(PG_FUNCTION_ARGS)
- 182 {
- 199 ri_CheckTrigger(...)
- 210 pk_rel = heap_open(...)
- 212 if (TRIGGER_FIRED_BY_UPDATE(...))
- ...
- 218 else
- ...
- 231 if (!HeapTupleSatisfies(...))
- ...
- 296 match_type = ri_DetermineMatchType(...)
- 298 if (match_type == RI_MATCH_TYPE_PARTIAL)
- 299 ereport(...)
- 303 ri_BuildQueryKeyFull(...)
- 437 }
11 Control-flow preconditions
- 181 RI_FKey_check(PG_FUNCTION_ARGS)
- 182 {
- 199 ri_CheckTrigger(...)
- 210 pk_rel = heap_open(...)
- 248 if (tgnargs == 4)
- 249 {
- 250 ri_BuildQueryKeyFull(...)
- 294 }
- 437 }
On this path, ri_BuildQueryKeyFull is not preceded by ri_DetermineMatchType, which can lead to a crash.
12 Static Specification Mining
- To generate preconditions for a procedure
- Generate predicates at each call-site of the procedure
- Ideally, the predicates common to all call-sites form the preconditions for the procedure
- How to find common predicates?
- Use mining techniques
- Construct patterns built from alignments or permutations of predicate sets
- Approximation: frequently recurring patterns in the program denote preconditions
13 Approach
- Analyze the control-flow graph
- Build the precedence relation (a <- b)
- A binary relation between procedures a and b
- A call to b is always preceded by a call to a
- Necessitates an inter-procedural analysis
- Relations can cross procedure boundaries
- Convergence requires a fixpoint computation
- Procedure signatures
- Frequent subsequence mining
- Mine the chains formed by precedence relations
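The precedence relation above needs a fixpoint. The C sketch below is our own illustration, not the authors' path-aware analysis: it runs a classic path-insensitive forward "must" analysis over one control-flow graph, where in[n] is the set of procedures called on every path from the entry to node n, and a <- b holds when a is in in[n] for every node n that calls b. The Node layout, the 32-procedure bitset limit, the example graph, and all names (must_called_before, precedes, example_cfg) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding: procedures are bit positions in a 32-bit mask,
 * and each CFG node calls at most one procedure (call == -1: no call). */
typedef unsigned int ProcSet;

#define MAX_PREDS 4

typedef struct {
    int call;              /* procedure id called at this node, or -1 */
    int npreds;            /* number of CFG predecessors */
    int preds[MAX_PREDS];  /* indices of predecessor nodes */
} Node;

enum { PROC_Q, PROC_P, PROC_R };

/* Example CFG: entry -> q -> { p | no call } -> r  (node 0 is the entry) */
static const Node example_cfg[5] = {
    { -1,     0, { 0 } },     /* 0: entry */
    { PROC_Q, 1, { 0 } },     /* 1: call q */
    { PROC_P, 1, { 1 } },     /* 2: call p */
    { -1,     1, { 1 } },     /* 3: branch without a call */
    { PROC_R, 2, { 2, 3 } },  /* 4: join point, call r */
};

/* Forward must-analysis: in[i] = procedures called on EVERY path from
 * node 0 to node i. Paths meet by intersection, so every non-entry node
 * starts at the full set and only shrinks until a fixpoint is reached. */
static void must_called_before(const Node *cfg, size_t n, ProcSet *in)
{
    for (size_t i = 0; i < n; i++)
        in[i] = (i == 0) ? 0u : ~0u;

    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 1; i < n; i++) {
            ProcSet meet = ~0u;
            for (int k = 0; k < cfg[i].npreds; k++) {
                int p = cfg[i].preds[k];
                ProcSet out = in[p];            /* calls before node p ... */
                if (cfg[p].call >= 0)
                    out |= 1u << cfg[p].call;   /* ... plus p's own call */
                meet &= out;
            }
            if (meet != in[i]) {
                in[i] = meet;
                changed = true;
            }
        }
    }
}

/* a <- b: every node that calls b is reached only through a call to a.
 * Vacuously true when b is never called. */
static bool precedes(const Node *cfg, size_t n, const ProcSet *in, int a, int b)
{
    for (size_t i = 0; i < n; i++)
        if (cfg[i].call == b && !(in[i] & (1u << a)))
            return false;
    return true;
}
```

On example_cfg, q <- p and q <- r hold, but p <- r does not, because the branch through node 3 reaches r without calling p. Because the meet discards which branch a fact came from, this kind of analysis is exactly what a path-aware exploration tries to improve on.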
14 Path Exploration
[Figure: the same control-flow graph, with calls to p, q and r, explored three ways]
- Path-sensitive exploration: q <- p, q <- r <- p
- Path-insensitive exploration: q, r <- p
- Path-aware exploration: q <- p
15 Precedes relation
[Figure: control-flow graph with calls to q, r, t and p, ending at exit]
- Derived relation: q <- p
16 Inter-procedural Analysis
    h() {
        if (cond) lwrap();
        else lwrap();
        uwrap();
    }
    lwrap() { init(); }
    uwrap() { access(); }
17 Procedure Signatures
[Figure: control-flow graph of procedure s, from entry to ret, with calls to q, u, t, r and p]
- Relations: s <- t; s <- q <- p <- t
- Procedure signature for s: q <- p
18 Mining sequences
- Sequence mining
- Input: a set of sequences (I)
- Output: sequences that occur frequently as subsequences in I
- Use the AprioriAll algorithm (Agrawal and Srikant, "Mining Sequential Patterns", ICDE 95)
19 Motivation for sequence mining
- Control paths
- a, b, c, e
- g, a, d, c, e
- a, c, e
- a, c, d, e, f
- e, f, d, a (faulty path: no call to a or c before e)
- Invariant: a <- c <- e
- Intersection of these paths
- e is preceded by nothing
- Use mining to overcome the brittleness of path intersection
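The brittleness of plain path intersection is easy to reproduce. This C sketch is our own illustration; the one-letter string encoding of paths, the 26-bit set representation, and the function names are hypothetical. It intersects, over all paths, the set of procedures called before the first call to a target procedure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical encoding: a control path is a string of one-letter
 * procedure names ("abce" = a, b, c, e); sets are 26-bit masks. */
typedef unsigned int ProcSet;

/* The five control paths listed above; the last one is faulty. */
static const char *const example_paths[5] = {
    "abce", "gadce", "ace", "acdef", "efda"
};

/* Procedures called before the first call to `target` on `path`.
 * Returns false if the path never calls `target`. */
static bool preceding_set(const char *path, char target, ProcSet *out)
{
    ProcSet seen = 0;
    for (const char *c = path; *c != '\0'; c++) {
        if (*c == target) {
            *out = seen;
            return true;
        }
        seen |= 1u << (*c - 'a');
    }
    return false;
}

/* Plain path intersection: procedures that precede `target` on every
 * path that calls it. A single faulty path can empty the result.
 * Returns ~0u (vacuously everything) if no path calls `target`. */
static ProcSet intersect_predecessors(const char *const *paths, size_t n, char target)
{
    ProcSet acc = ~0u;
    for (size_t i = 0; i < n; i++) {
        ProcSet s;
        if (preceding_set(paths[i], target, &s))
            acc &= s;
    }
    return acc;
}
```

Over all five paths the result for e is the empty set, matching "e is preceded by nothing"; over the first four paths alone it is {a, c}, which is what the invariant a <- c <- e needs.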
20 Sequence Mining: Example
- Input sequences (minimum frequency 4/5)
- a, b, c, e
- g, a, d, c, e
- a, c, e
- a, c, d, e, f
- e, f, d, a
- Maximal frequent sequence: a, c, e (a subsequence of 4 of the 5 input sequences)
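A minimal C sketch of frequent subsequence mining on this example follows. It is a simplified level-wise miner in the spirit of AprioriAll, not Agrawal and Srikant's algorithm itself: a pattern's support is the number of input sequences that contain it as a (not necessarily contiguous) subsequence, patterns grow one procedure at a time, and only maximal frequent patterns are reported. The string encoding, the MAX_LEN and MAX_FREQ caps, and the function names are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN  16   /* longest pattern tracked (assumed cap) */
#define MAX_FREQ 256  /* capacity for frequent patterns (assumed cap) */

/* The input sequences from the example above, one letter per procedure. */
static const char *const example_db[5] = {
    "abce", "gadce", "ace", "acdef", "efda"
};

/* True if `pat` occurs in `seq` as a (not necessarily contiguous) subsequence. */
static bool is_subsequence(const char *pat, const char *seq)
{
    while (*pat != '\0' && *seq != '\0') {
        if (*pat == *seq)
            pat++;
        seq++;
    }
    return *pat == '\0';
}

/* Number of input sequences that contain `pat`. */
static size_t support(const char *pat, const char *const *db, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (is_subsequence(pat, db[i]))
            count++;
    return count;
}

/* Level-wise mining. Any frequent pattern of length k+1 has a frequent
 * length-k prefix and a frequent last item (anti-monotonicity), so
 * extending level-k patterns by frequent items finds all of them.
 * Writes the maximal frequent patterns to `out`; returns their count. */
static size_t mine_maximal(const char *const *db, size_t n, size_t min_count,
                           char out[][MAX_LEN + 1], size_t out_cap)
{
    char freq[MAX_FREQ][MAX_LEN + 1];
    char items[26];
    size_t nfreq = 0, nitems = 0;

    for (char c = 'a'; c <= 'z'; c++) {            /* level 1 */
        char pat[2] = { c, '\0' };
        if (nfreq < MAX_FREQ && support(pat, db, n) >= min_count) {
            items[nitems++] = c;
            strcpy(freq[nfreq++], pat);
        }
    }
    for (size_t start = 0; start < nfreq; ) {      /* levels 2, 3, ... */
        size_t end = nfreq;
        for (size_t i = start; i < end; i++) {
            size_t len = strlen(freq[i]);
            if (len == MAX_LEN)
                continue;
            for (size_t j = 0; j < nitems && nfreq < MAX_FREQ; j++) {
                char cand[MAX_LEN + 1];
                memcpy(cand, freq[i], len);
                cand[len] = items[j];
                cand[len + 1] = '\0';
                if (support(cand, db, n) >= min_count)
                    strcpy(freq[nfreq++], cand);
            }
        }
        start = end;
    }
    size_t nout = 0;                                /* keep maximal ones */
    for (size_t i = 0; i < nfreq; i++) {
        bool maximal = true;
        for (size_t j = 0; j < nfreq && maximal; j++)
            if (strlen(freq[j]) > strlen(freq[i]) && is_subsequence(freq[i], freq[j]))
                maximal = false;
        if (maximal && nout < out_cap)
            strcpy(out[nout++], freq[i]);
    }
    return nout;
}
```

With a minimum count of 4 (frequency 4/5) the only maximal pattern is "ace", so the faulty path no longer hides the invariant. Raising the minimum count to 5 leaves only the unordered singletons "a" and "e", showing how a stricter threshold trades recall for confidence.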
21 Data-flow preconditions (PLDI 07)
- Challenges
- Data-flow predicates may be aliased
- No anchors for data-flow predicates
    if (x > 0) f(x);
    if (y > 0) f(y);
    x = g(); h(x); if (x > 0) f(x);
22 Motivating Example
    main()
        for (ai = options.listen_addrs; ...)
            listen_sock = socket(ai->ai_family, ...);
            if (listen_sock < 0) error();
            if (num_listen_socks >= 16) error();
            if ((ret = getnameinfo(...)))
            if (setsockopt(listen_sock, ...) == -1) error();
            if (bind(listen_sock, ai->ai_addr, ...) < 0)
            ...
- In a call to bind, the first parameter is always assigned the return value of a call to socket and is checked for error
23 Generate Predicates
    main()
        for (ai = options.listen_addrs; ...)
            listen_sock = socket(ai->ai_family, ...);
            if (listen_sock < 0) error();
            if (num_listen_socks >= 16) error();
            if ((ret = getnameinfo(...)))
            if (setsockopt(listen_sock, ...) == -1) error();
            if (bind(listen_sock, ai->ai_addr, ...) < 0)
            ...
Predicates at the call to bind:
- listen_sock: return(socket), (param_1, bind), (param_1, setsockopt), (>=, 0)
- num_listen_socks: (<, 16)
- ret: return(getnameinfo)
24 Another call-site
    ssh_control_listener(void)
        if ((control_fd = socket(PF_UNIX, ...)) < 0) error();
        old_umask = umask(0177);
        if (bind(control_fd, (struct sockaddr *)&addr, ...))
        ...
Predicates at the call to bind:
- control_fd: return(socket), (param_1, bind), (>=, 0)
- old_umask: return(umask)
25 Structural Similarity Problem
- main(): listen_sock: return(socket), (param_1, bind), (param_1, setsockopt), (>=, 0); num_listen_socks: (<, 16); ret: return(getnameinfo)
- ssh_control_listener(): control_fd: return(socket), (param_1, bind), (>=, 0); old_umask: return(umask)
- How to group the attribute sets that need to be mined together?
- Find a maximal matching of attribute sets
- NP-hard
- Use approximations based on program structure
26 Approximations
- Type
- Attribute sets divided based on the type of the variable
- Parameter
- Variables supplied as arguments to the same parameter of a given procedure
- Result
- Variables that are assigned the return values of the same function
27 Example revisited
- main(): listen_sock: return(socket), (param_1, bind), (param_1, setsockopt), (>=, 0); num_listen_socks: (<, 16); ret: return(getnameinfo)
- ssh_control_listener(): control_fd: return(socket), (param_1, bind), (>=, 0); old_umask: return(umask)
- Variable names are not comparable
- Use positional information
- Different number of attributes
- Interspersed with irrelevant operations
28 Is intersection robust?
- sockfd: return(socket), (param_1, bind), (>=, 0)
- listen_sock: return(socket), (param_1, bind), (param_1, setsockopt), (>=, 0)
- control_fd: return(socket), (param_1, bind)
- Precondition by intersection: return(socket), (param_1, bind); (>=, 0) is missing!
- Same limitations as with control-flow preconditions
- Adopt frequent itemset mining
- The order of events is less critical
- Aggregate the collection of data-flow facts at call-sites
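Frequent itemset mining over these call-sites can be sketched in C as follows. This is a brute-force illustration of our own, not the authors' implementation: each call-site's data-flow facts about the first argument of bind form a bitmask, an itemset is frequent if at least min_count call-sites contain all of it, and only maximal frequent itemsets are reported. Setting min_count to the number of call-sites reduces to plain intersection. The fact encoding, the example data, and the function names are hypothetical, and the enumeration is exponential in the number of facts.

```c
#include <stddef.h>

/* Hypothetical encoding: one bit per data-flow fact about the first
 * argument of bind at a call-site. */
enum {
    RET_SOCKET,          /* return(socket)               */
    PARAM1_BIND,         /* (param_1, bind)              */
    NONNEG_CHECK,        /* descriptor checked for error */
    PARAM1_SETSOCKOPT,   /* (param_1, setsockopt)        */
    NUM_FACTS
};

typedef unsigned int FactSet;   /* bit f set <=> fact f holds */

/* Three call-sites modeled on the slide; control_fd lacks the check. */
static const FactSet example_sites[3] = {
    /* sockfd      */ (1u << RET_SOCKET) | (1u << PARAM1_BIND) | (1u << NONNEG_CHECK),
    /* listen_sock */ (1u << RET_SOCKET) | (1u << PARAM1_BIND) | (1u << NONNEG_CHECK)
                      | (1u << PARAM1_SETSOCKOPT),
    /* control_fd  */ (1u << RET_SOCKET) | (1u << PARAM1_BIND),
};

/* Number of call-sites whose fact set contains every fact in `items`. */
static size_t itemset_support(FactSet items, const FactSet *sites, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if ((sites[i] & items) == items)
            count++;
    return count;
}

/* Enumerate all 2^NUM_FACTS itemsets, keep those contained in at least
 * min_count call-sites, and report the maximal ones. A frequent set is
 * maximal iff no one-fact extension is frequent (anti-monotonicity). */
static size_t maximal_frequent_itemsets(const FactSet *sites, size_t n, size_t min_count,
                                        FactSet *out, size_t out_cap)
{
    const FactSet universe = (1u << NUM_FACTS) - 1u;
    size_t nout = 0;
    for (FactSet s = 0; s <= universe; s++) {
        if (itemset_support(s, sites, n) < min_count)
            continue;
        int maximal = 1;
        for (int f = 0; f < NUM_FACTS && maximal; f++) {
            FactSet bigger = s | (1u << f);
            if (bigger != s && itemset_support(bigger, sites, n) >= min_count)
                maximal = 0;
        }
        if (maximal && nout < out_cap)
            out[nout++] = s;
    }
    return nout;
}
```

With min_count = 2 (two of three call-sites) the single maximal itemset keeps return(socket), (param_1, bind) and the error check, so one buggy call-site no longer erases the check; with min_count = 3 the result is only return(socket) and (param_1, bind), the brittle intersection.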
29 Locality
    main()      { fp = init_file(); fgets(buf, SIZE, fp); }
    init_file() { fp = fopen(...); if (fp != NULL) return fp; exit(-1); }

    main()      { fp = fopen(...); if (fp != NULL) read_file(fp); }
    read_file(FILE *fp) { fgets(buf, SIZE, fp); }
- Interprocedural analysis captures preconditions that cross procedure boundaries
30 Example
[Figure: nodes q, r, s and t annotated with predicate sets over p1 and p2, connected by intraprocedural and interprocedural edges]
31 Experiments
- Applied to open-source C programs
- Input to the implementation: control-flow graphs
- Control-flow nodes ranged from 16K to 958K
- Roughly 2M LoC
- Procedure count ranged from 298 to 8568
- Precondition predicates ranged from 189 to 5963
- Analysis time ranged from 26s to 20m
32 Experimental Goals
- Path awareness improves precision
- Useful for bug detection
- Generates salient documentation
33 Effectiveness of path awareness
- Our approach generates fewer protocols
- The reduction does not come at the expense of more false negatives
- Reduces false positives
34 Bug Detection: OpenSSH
- Procedure prime_test in openssh-4.4p1
- Testing is difficult because it performs Miller-Rabin primality testing
- The program crashes due to the absence of an error check
- e.g., in BN_mod_word(p, ...), if p is null, the program crashes
- Fixed in openssh-4.5p1
- An error check is not always necessary
- e.g., in BN_is_prime(..., ctx, ...), ctx can be either null or pre-allocated
35 Bug detection
- Case study: Linux
- Hardware bug
- Difficult to detect using traditional testing techniques
- Platform-dependent error
- Transparently identified using our approach
- Performance bug
- A cache lookup operation was absent
- Not easily specified as a bug for testing
- The deviation delays data write flushes
- Difficult to identify using traditional testing techniques
36 Change in Confidence
- Increasing the confidence reduces the number of predicates
37 Related Work
- Static techniques
- From Uncertainty to Belief: Inferring the Specification Within, Kremenek et al., OSDI 06
- Bugs as Deviant Behavior, Engler et al., SOSP 01
- Dynamic techniques
- Strauss, Ammons et al., POPL 02
- Daikon, Ernst et al., TSE 01
- Our approach
- Path-aware analysis
- Generates preconditions
- Predicates of arbitrary size
- Annotation-free
38 Future Work
- Richer specifications
- Post-conditions, divergence structures, ...
- More sophisticated mining techniques
- Graph mining, ...
- Validating generated specifications
- Integration with a theorem prover
- Specifications and concurrency
- Atomicity violations
39 Other work
- Dynamic analysis
- Detecting the cause of assertion failures (under review)
- Static path profiles (under review)
- Impact analysis (ASE 06)
- Memory aliasing (FASE 06)
- Test case prioritization (SAC 08)
- Distributed systems
- Randomized leader election (Distributed Computing 07)
- Eliminating duplicates in P2P systems (TPDS 07)
- Search in P2P systems (P2P 05)
- Efficient tag detection in RFID systems (SECON 05)
41 Why not mine post-conditions?
    fp = fopen(...);
    if (fp == NULL)
        exit(-1);
    fclose(...);
- Precedence protocol
- A call to fclose is always preceded by a call to fopen
- Successor protocol
- A call to fopen is always followed by a call to fclose
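The difference between the two protocols shows up as soon as both are checked over explicit paths. In this C sketch (our illustration; the path encoding, procedure ids, and function names are hypothetical), the error path that calls exit(-1) after fopen satisfies the precedence protocol for fclose but violates the successor protocol for fopen.

```c
#include <stdbool.h>
#include <stddef.h>

enum { PROC_FOPEN, PROC_FCLOSE, PROC_EXIT };   /* hypothetical procedure ids */

#define MAX_CALLS 8

typedef struct {
    size_t len;
    int calls[MAX_CALLS];   /* procedure ids in call order */
} Path;

/* The two paths through the fragment above. */
static const Path example_paths[2] = {
    { 2, { PROC_FOPEN, PROC_FCLOSE } },   /* fp != NULL */
    { 2, { PROC_FOPEN, PROC_EXIT } },     /* fp == NULL: exit(-1) */
};

/* Precedence protocol: every call to `later` has an earlier call to `first`. */
static bool always_preceded(const Path *paths, size_t n, int first, int later)
{
    for (size_t i = 0; i < n; i++) {
        bool seen = false;
        for (size_t j = 0; j < paths[i].len; j++) {
            if (paths[i].calls[j] == first)
                seen = true;
            else if (paths[i].calls[j] == later && !seen)
                return false;
        }
    }
    return true;
}

/* Successor protocol: every call to `first` has a later call to `then`. */
static bool always_followed(const Path *paths, size_t n, int first, int then)
{
    for (size_t i = 0; i < n; i++) {
        bool pending = false;   /* a call to `first` still awaits `then` */
        for (size_t j = 0; j < paths[i].len; j++) {
            if (paths[i].calls[j] == first)
                pending = true;
            else if (paths[i].calls[j] == then)
                pending = false;
        }
        if (pending)
            return false;
    }
    return true;
}
```

Both paths satisfy fopen <- fclose, but only the first satisfies "fopen is always followed by fclose". Accepting exit as an alternative would need a richer postcondition, such as the "fclose or error" form listed among the kinds of specifications.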
42 Why is parameter tracing insufficient?
    uldap_connection_find()   // code fragment from httpd
        if (APR_SUCCESS == apr_thread_mutex_trylock(l->lock))
            ...
            compare_client_certs(st->client_certs, l->client_certs)
            ...
- In a call to compare_client_certs, the return value of a call to apr_thread_mutex_trylock must be APR_SUCCESS
- The predicate for compare_client_certs includes:
- the return value of apr_thread_mutex_trylock() is APR_SUCCESS
43 Predicate size distribution
- The majority of predicates have a size of less than 3