Murali Krishna Ramanathan

Slides: 44

Provided by: mmur5

Learn more at: https://www.cs.purdue.edu

Transcript and Presenter's Notes

1
Static Path-Aware Analysis of Program Invariants

Murali Krishna Ramanathan
Department of Computer Science
Purdue University
(joint work with Suresh Jagannathan and Ananth Grama)

2
Motivation
Undocumented Program
Expert Programmer
New Programmer
BUGS
Tester
3
Context

What is a program invariant?
Property that must hold across all program
executions
What is a failure?
Program run does not satisfy an expected
invariant
System crashes
Logical bugs
Performance bugs
What is a specification?
Documentation of intended program invariants
e.g., lock must be followed by unlock
Unavailable or imprecise

4
Issues

Deriving specifications
Where do we start?
Absence of formal documentation
Legacy code
Identifying the source of failures
How do we search?
Exponential number of execution paths to explore
Representing common information among paths

5
Specification Inference

Challenges
What to look for?
Both relevant and irrelevant information present
in the program source
How to be robust in the presence of bugs?
Assumptions
Programs are mostly well tested but can have bugs
Transparent: no programmer annotations

6
Kinds of specifications

Control-flow preconditions
A call to fopen must always precede a call to
fgets
Data-flow preconditions
The result of a call to socket must always be
checked for error before a call to bind
Control-flow postconditions
A call to fopen is either followed by a call to
fclose or error
Control-flow divergence preconditions
A call to read can be preceded either by a call
to open or socket
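
To make the first three kinds concrete, here is a minimal sketch of a call site that satisfies them. The helper name read_first_line and its parameters are hypothetical and are not taken from the talk.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the first line of `path` into `buf`.
 * Returns 0 on success and -1 on failure. */
int read_first_line(const char *path, char *buf, int size)
{
    FILE *fp = fopen(path, "r");   /* control-flow precondition: fopen precedes fgets */
    if (fp == NULL)                /* data-flow precondition: result checked before use */
        return -1;                 /* postcondition: fopen is followed by an error exit ... */

    int ok = fgets(buf, size, fp) != NULL;
    fclose(fp);                    /* ... or by fclose */
    if (!ok)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';  /* drop the trailing newline, if any */
    return 0;
}
```

A specification miner would see many call sites shaped like this one and report the shared predicates; a call site that skips the NULL check would then stand out as a deviation.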

7
Preconditions
fp = fopen(); if (fp != NULL) fgets(buf, SIZE, fp);
fp = fopen()
fp != NULL
fopen <- fgets

Predicate
Captures properties associated with variables and
procedure calls
Preconditions for procedure
Composed of predicates that need to hold always
before every call to a procedure

8
Types of predicates
fp = fopen(); if (fp != NULL) fgets(buf, SIZE, fp);

Data-flow
captures data flow properties associated with
variables
fp is assigned the return of fopen, fp is not
null,

Control-flow
defines precedence properties among procedures
fgets is preceded by fopen

9
Control-flow preconditions (ICSE 07)

181 RI_FKey_check(PG_FUNCTION_ARGS)
182 {
199 ri_CheckTrigger(...)
210 pk_rel = heap_open(...)
296 match_type = ri_DetermineMatchType(...)
303 ri_BuildQueryKeyFull(...)
437 }

Check that RI trigger function was called in expected context
Get the relation descriptors of the FK and PK tables
Convert the MATCH TYPE string into a switchable int
Build up a new hashtable key for a prepared SPI plan of a constraint trigger of MATCH FULL
10
Control-flow preconditions

181 RI_FKey_check(PG_FUNCTION_ARGS)
182 {
199 ri_CheckTrigger(...)
210 pk_rel = heap_open(...)
212 if (TRIGGER_FIRED_BY_UPDATE(...))
...
218 else
...
231 if (!HeapTupleSatisfies(...))
...
296 match_type = ri_DetermineMatchType(...)
298 if (match_type == RI_MATCH_TYPE_PARTIAL)
299 ereport(...)
303 ri_BuildQueryKeyFull(...)
437 }

11
Control-flow preconditions

181 RI_FKey_check(PG_FUNCTION_ARGS)
182 {
199 ri_CheckTrigger(...)
210 pk_rel = heap_open(...)
248 if (tgnargs == 4)
249 {
250 ri_BuildQueryKeyFull(...)
294 }
437 }

ri_BuildQueryKeyFull is not preceded by ri_DetermineMatchType, which leads to a potential crash
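
The deviation can be shown on explicit call traces. The sketch below is a simplification: it checks the relation a <- b on one listed path at a time, whereas the talk describes a static analysis over the control-flow graph. The function name holds_on_path and the two trace arrays are assumptions made for illustration; the call names are the ones shown on the slides.

```c
#include <stddef.h>
#include <string.h>

/* Calls along the path that reaches ri_BuildQueryKeyFull at line 303. */
static const char *const path_checked[] = {
    "ri_CheckTrigger", "heap_open", "ri_DetermineMatchType", "ri_BuildQueryKeyFull"
};

/* Calls along the path that reaches ri_BuildQueryKeyFull at line 250. */
static const char *const path_deviant[] = {
    "ri_CheckTrigger", "heap_open", "ri_BuildQueryKeyFull"
};

/* Returns 1 if every call to `b` in `trace` has an earlier call to `a`
 * (the relation a <- b on this one path), and 0 otherwise. */
int holds_on_path(const char *const trace[], size_t n, const char *a, const char *b)
{
    int seen_a = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(trace[i], a) == 0)
            seen_a = 1;
        else if (strcmp(trace[i], b) == 0 && !seen_a)
            return 0;  /* b reached before any call to a */
    }
    return 1;
}
```

Calling holds_on_path(path_checked, 4, "ri_DetermineMatchType", "ri_BuildQueryKeyFull") returns 1, while the same query on path_deviant (length 3) returns 0.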
12
Static Specification Mining

To generate preconditions for a procedure
Generate predicates at each call-site of the
procedure
Ideally common predicates across all the
call-sites form the preconditions for the
procedure
How to find common predicates?
Use mining techniques
Construct patterns built from alignments or
permutations of predicate sets
Approximation: patterns appearing in programs denote preconditions

13
Approach

Analyze control-flow graph
Build precedence relation (a <- b)
A binary relation between procedures a and b
A call to b is always preceded by call to a
Necessitates an inter-procedural analysis
Relations can cross procedure boundaries
Convergence requires fixpoint calculation
Procedure signatures
Frequent subsequence mining
Mine the chains formed by precedence relations

14
Path Exploration
(Figure: control-flow graph with nodes q, r, and p)
Path-sensitive exploration: q <- p, q <- r <- p
Path-insensitive exploration: q, r <- p
Path-aware exploration: q <- p
15
Precedes relation
(Figure: control-flow graphs with nodes q, r, t, p, and exit; derived relation: q <- p)
16
Inter-procedural Analysis
h() { if (cond) lwrap(); else lwrap(); uwrap(); }
lwrap() { init(); }
uwrap() { access(); }
17
Procedure Signatures
(Figure: control-flow graph of procedure s from entry to ret, with nodes q, u, t, r, and p)
s <- t
s <- q <- p <- t
Procedure signature for s: q <- p
18
Mining sequences

Sequence mining
Input: set of sequences (I)
Output: sequences that occur frequently as subsequences in I
Use the Apriori-all algorithm (Agrawal and Srikant, Mining Sequential Patterns, ICDE 95)

19
Motivation for sequence mining

Control paths:
a, b, c, e
g, a, d, c, e
a, c, e
a, c, d, e, f
e, f, d, a (faulty path: no call to a and c before e)
Invariant: a <- c <- e
Intersection of these paths:
e is preceded by nothing
Use mining to overcome brittleness of path intersection

20
Sequence Mining - Example

Input sequences (min frequency 4/5)
a, b, c, e
g, a, d, c, e
a, c, e
a, c, d, e, f
e, f, d, a

Maximal frequent subsequence: a, c, e
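
The example can be checked with a small brute-force search. This is a sketch only: it stands in for the Apriori-all algorithm and does not implement it. Each procedure call is encoded as one character, and the function names and the array slide_paths are assumptions made for illustration.

```c
#include <string.h>

/* The five control paths from the slide, one character per call. */
static const char *const slide_paths[] = { "abce", "gadce", "ace", "acdef", "efda" };

/* Returns 1 if `pat` occurs in `seq` as a subsequence (order kept,
 * gaps allowed), and 0 otherwise. */
int is_subsequence(const char *pat, const char *seq)
{
    for (; *seq != '\0' && *pat != '\0'; seq++)
        if (*seq == *pat)
            pat++;
    return *pat == '\0';
}

/* Number of input sequences that contain `pat` as a subsequence. */
int support(const char *pat, const char *const seqs[], int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        count += is_subsequence(pat, seqs[i]);
    return count;
}

/* Tries every subsequence of every input (each input must be shorter
 * than 16 calls) and copies the longest one whose support reaches
 * min_support into `best`. Exponential in the sequence length. */
void longest_frequent(const char *const seqs[], int n, int min_support, char best[16])
{
    best[0] = '\0';
    for (int i = 0; i < n; i++) {
        int len = (int)strlen(seqs[i]);
        for (unsigned mask = 1; mask < (1u << len); mask++) {
            char cand[16];
            int k = 0;
            for (int j = 0; j < len; j++)
                if (mask & (1u << j))
                    cand[k++] = seqs[i][j];
            cand[k] = '\0';
            if (k > (int)strlen(best) && support(cand, seqs, n) >= min_support)
                strcpy(best, cand);
        }
    }
}
```

With a minimum support of 4 out of 5, longest_frequent(slide_paths, 5, 4, best) leaves "ace" in best, so the faulty fifth path no longer hides the pattern.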
21
Data-flow preconditions (PLDI 07)

Challenges
Data-flow predicates may be aliased
No anchors for data-flow predicates

if (x > 0) f(x);
if (y > 0) f(y);
x = g(); h(x); if (x > 0) f(x);
22
Motivating Example

main()
  for (ai = options.listen_addrs; ...)
    listen_sock = socket(ai->ai_family, ...);
    if (listen_sock < 0) error();
    if (num_listen_socks > 16) error();
    if ((ret = getnameinfo(...)))
    if (setsockopt(listen_sock, ...) == -1) error();
    if (bind(listen_sock, ai->ai_addr, ...) < 0)

In a call to bind, the first parameter is always
assigned the return value of a call to socket and
is checked for error

23
Generate Predicates

main()
  for (ai = options.listen_addrs; ...)
    listen_sock = socket(ai->ai_family, ...);
    if (listen_sock < 0) error();
    if (num_listen_socks > 16) error();
    if ((ret = getnameinfo(...)))
    if (setsockopt(listen_sock, ...) == -1) error();
    if (bind(listen_sock, ai->ai_addr, ...) < 0)

listen_sock: return(socket), (param_1, bind), (param_1, setsockopt), (>, 0)
num_listen_socks: (<, 16)
ret: return(getnameinfo)
24
Another call-site

ssh_control_listener(void)
  if ((control_fd = socket(PF_UNIX, ...)) < 0) error();
  old_umask = umask(0177);
  if (bind(control_fd, (struct sockaddr *)addr, ...))

control_fd: return(socket), (param_1, bind), (>, 0)
old_umask: return(umask)
25
Structural Similarity Problem
Call-site 1:
listen_sock: return(socket), (param_1, bind), (param_1, setsockopt), (>, 0)
num_listen_socks: (<, 16)
ret: return(getnameinfo)
Call-site 2:
control_fd: return(socket), (param_1, bind), (>, 0)
old_umask: return(umask)

How to group the attribute sets that need to be
mined together?
Find maximal matching of attribute sets
NP-hard
Use approximations based on program structures

26
Approximations

Type
attribute sets divided based on type of variable
Parameter
Supplied as arguments to the same parameter for
any given procedure
Result
Variables that are assigned the return values of
the same function

27
Example revisited
Call-site 1:
listen_sock: return(socket), (param_1, bind), (param_1, setsockopt), (>, 0)
num_listen_socks: (<, 16)
ret: return(getnameinfo)
Call-site 2:
control_fd: return(socket), (param_1, bind), (>, 0)
old_umask: return(umask)

Variable names are not comparable
Use positional information
Different number of attributes
Interspersed with irrelevant operations

28
Is intersection robust?
listen_sock: return(socket), (param_1, bind), (param_1, setsockopt), (>, 0)
control_fd: return(socket), (param_1, bind), (>, 0)
sockfd: return(socket), (param_1, bind) [(>, 0) missing!]
Precondition (by intersection): return(socket), (param_1, bind)

Same limitations as with control-flow
preconditions
Adopt frequent itemset mining
Order of events is less critical
Aggregate collection of data-flow facts at
call-sites
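
The contrast between intersection and frequent itemset mining can be sketched with bitmasks, one bit per data-flow fact. This is an illustrative brute force, not the mining algorithm used in the work; the enum names, the function names, and the 2-of-3 support threshold are assumptions. The three attribute sets are the ones attributed to listen_sock, control_fd, and sockfd on the slide.

```c
enum {
    RET_SOCKET        = 1 << 0,  /* return(socket)        */
    PARAM1_BIND       = 1 << 1,  /* (param_1, bind)       */
    PARAM1_SETSOCKOPT = 1 << 2,  /* (param_1, setsockopt) */
    CHECKED_GT_0      = 1 << 3   /* (>, 0)                */
};
#define N_ATTRS 4

/* Attribute sets at three call-sites of bind: listen_sock, control_fd, sockfd. */
static const unsigned call_sites[] = {
    RET_SOCKET | PARAM1_BIND | PARAM1_SETSOCKOPT | CHECKED_GT_0,
    RET_SOCKET | PARAM1_BIND | CHECKED_GT_0,
    RET_SOCKET | PARAM1_BIND  /* the error check is missing here */
};

/* Plain intersection: facts present at every call-site. */
unsigned intersect_all(const unsigned sets[], int n)
{
    unsigned common = ~0u;
    for (int i = 0; i < n; i++)
        common &= sets[i];
    return common;
}

/* Number of call-sites whose attribute set contains all of `items`. */
int itemset_support(unsigned items, const unsigned sets[], int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if ((sets[i] & items) == items)
            count++;
    return count;
}

static int count_bits(unsigned x)
{
    int c = 0;
    for (; x != 0; x >>= 1)
        c += (int)(x & 1u);
    return c;
}

/* Largest itemset whose support reaches min_support (brute force over
 * all 2^N_ATTRS - 1 non-empty candidates). */
unsigned largest_frequent(const unsigned sets[], int n, int min_support)
{
    unsigned best = 0;
    for (unsigned items = 1; items < (1u << N_ATTRS); items++)
        if (itemset_support(items, sets, n) >= min_support &&
            count_bits(items) > count_bits(best))
            best = items;
    return best;
}
```

Here intersect_all(call_sites, 3) drops the (>, 0) check because one call-site lacks it, while largest_frequent(call_sites, 3, 2) keeps it, which is what lets the unchecked call-site be flagged as a deviation.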

29
Locality
main() { fp = init_file(); fgets(buf, SIZE, fp); }
init_file() { fp = fopen(); if (fp != NULL) return fp; exit(-1); }

main() { fp = fopen(); if (fp != NULL) read_file(fp); }
read_file(FILE *fp) { fgets(buf, SIZE, fp); }

Interprocedural analysis to capture preconditions crossing procedure boundaries

30
Example
(Figure: graph with nodes labeled q, r, s, and t, annotated with predicate sets such as p1 and p1, p2. Legend: intraprocedural edge, interprocedural edge.)
31
Experiments

Applied on open source C programs
Input to the implementation: control-flow graphs
Control flow nodes varied from 16K to 958K
Roughly 2M LoC
Procedure count varied from 298 to 8568
Precondition predicates varied from 189 to 5963
Analysis time varied from 26s to 20m

32
Experimental Goals

Path awareness improves precision
Useful for bug detection
Generates salient documentation

33
Effectiveness of path awareness

Fewer protocols generated using our approach
Reduction not at the expense of increase in
false negatives
Reduces false positives

34
Bug Detection: OpenSSH

Procedure prime_test in openssh-4.4p1
Testing is difficult as it performs Miller-Rabin primality testing
Program crashes due to the absence of an error check
e.g., BN_mod_word(p, ...): if p is null, the program crashes
Fixed in openssh-4.5p1
Error check not always necessary
e.g., BN_is_prime(..., ctx, ...): ctx can either be null or pre-allocated

35
Bug detection

Case study: Linux
Hardware Bug
Difficult to detect using traditional testing
techniques
Platform dependent error
Transparently identified using our approach
Performance Bug
Cache lookup operation was absent
Not easily specified as a bug for testing
Deviation delays data write flushes
Difficult to identify using traditional testing
techniques

36
Change in Confidence

Increase in confidence reduces the number of
predicates

37
Related Work

Static techniques
Inferring Specifications from Within, Kremenek et
al, OSDI 06
Bugs as deviant behavior, Engler et al, SOSP 01
Dynamic techniques
Strauss, Ammons et al, POPL 02
Daikon, Ernst et al, TSE 01
Our approach
Path-aware analysis
Generates preconditions
Predicates of arbitrary size
Annotation free

38
Future Work

Richer specifications
Post-conditions, divergence structures, ...
More sophisticated mining techniques
Graph mining, ...
Validating generated specifications
Integration with theorem prover
Specifications and concurrency
Atomicity violations

39
Other work

Dynamic analysis
Detecting cause of assertion failures (under
review)
Static path profiles (under review)
Impact analysis (ASE 06)
Memory aliasing (FASE 06)
Test case prioritization (SAC 08)
Distributed Systems
Randomized leader election (Distributed Computing 07)
Eliminating duplicates in P2P systems (TPDS 07)
Search in P2P systems (P2P 05)
Efficient tag detection in RFID systems (SECON 05)

41
Why not mine post-conditions?
fp = fopen(); if (fp == NULL) exit(-1); fclose();

Precedence protocol
A call to fclose is always preceded by a call to
fopen

Successor protocol
A call to fopen is always succeeded by a call to
fclose

42
Why is parameter tracing insufficient?

uldap_connection_find() // code fragment from httpd
  if (APR_SUCCESS == apr_thread_mutex_trylock(l->lock))
    compare_client_certs(st->client_certs, l->client_certs)

In a call to compare_client_certs, the return
value of a call to apr_thread_mutex_trylock must
be APR_SUCCESS.
Predicate for compare_client_certs includes: return value of apr_thread_mutex_trylock() is APR_SUCCESS
