Title: Static Analysis of Memory Errors
1. Static Analysis of Memory Errors
Mooly Sagiv, Tel Aviv University
2. Project Goals
- Statically determine that data are used in a sound way
 -  No unexpected software behavior
 -  In C 
 - No undefined semantics (ANSI C) 
 - Prevent bad programming styles 
 - In Java 
 - Certain exceptions will never be raised 
 - Sound analysis 
 - Minimal false alarms
 
3. Sample Cleanness Problems
- C string-related errors
 - Unsafe calls to strcpy(), strcat() 
 - Out of bound references 
 - Pointer arithmetic 
 -  Java interface requirements for library usages 
 
4. String Manipulation Cleanness Checking
Nurit Dor, Greta Yorsh
http://www.cs.tau.ac.il/nurr
5. Are String Violations Common?
- FUZZ study (1995)
 - Random test programs on various systems
 - 9 different UNIX systems
 - 18-23% hang or crash
 - 80% are string-related errors
- CERT advisory
 - 50% of attacks are abuses of buffer overflows
 
6. Example: unsafe call to strcpy()
simple()
{
    char s[20];
    char *p;
    char t[10];
    strcpy(s, "Hello");
    p = s + 5;
    strcpy(p, " world!");
    strcpy(t, s);
}
7. Example: unsafe call to strcpy()
For simple() above, cleanness is always violated at strcpy(t, s): alloc(t) = 10, len(s) = 12
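
The numbers on this slide can be checked at run time. The following is a minimal sketch, not part of the presented tool: fits() and simple_checked() are hypothetical names introduced here, and the check simply compares the destination size with the source length before the final copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of CSSV): returns 1 when a buffer of
 * `alloc` bytes can hold `src` plus its terminating NUL. */
static int fits(size_t alloc, const char *src) {
    return alloc > strlen(src);
}

/* Variant of simple() that checks before the final copy.
 * Returns 0 on success, -1 if the copy into t would overflow. */
static int simple_checked(void) {
    char s[20];
    char *p;
    char t[10];

    strcpy(s, "Hello");        /* len(s) = 5, alloc(s) = 20: safe */
    p = s + 5;
    strcpy(p, " world!");      /* 8 bytes written into the 15 left: safe */
    assert(strlen(s) == 12);   /* "Hello world!" */

    if (!fits(sizeof t, s))    /* alloc(t) = 10 is not > len(s) = 12 */
        return -1;
    strcpy(t, s);
    return 0;
}
```

Calling simple_checked() takes the error path on every run, which is what "always violated" means here: the failure does not depend on any input.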
8. Example: unsafe pointer arithmetic
/* from web2c strpascal.c */
void null_terminate(char *s)
{
    while (*s != ' ')
        s++;
    *s = 0;
}
9. Example: unsafe pointer arithmetic
For null_terminate() above, cleanness is potentially violated: offset(s) = alloc(buff(s))
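
One way to see the condition is to carry the allocation size alongside the pointer. This is a sketch under that assumption: null_terminate_bounded() and its alloc parameter are hypothetical and do not appear in web2c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounded variant of null_terminate(): `alloc` is the number
 * of bytes available starting at s. Returns 1 if a blank was found and
 * replaced by NUL, 0 if the scan reached the end of the buffer first. */
static int null_terminate_bounded(char *s, size_t alloc) {
    size_t offset = 0;

    assert(s != NULL);
    while (offset < alloc && s[offset] != ' ')
        offset++;
    if (offset == alloc)   /* offset(s) = alloc: the original would overrun */
        return 0;
    s[offset] = 0;
    return 1;
}
```

The unbounded original is safe only when the caller guarantees that a blank occurs inside the buffer, which is the kind of precondition a procedure specification can state.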
10. Complicated Example
/* from web2c fixwrites.c */
#define BUFSIZ 1024
char buf[BUFSIZ];
char *insert_long(char *cp)
{
    char temp[BUFSIZ];
    for (i = 0; &buf[i] < cp; ++i)
        temp[i] = buf[i];
    strcpy(&temp[i], "(long)");
    strcpy(&temp[i + 6], cp);
}
[Figure: the buffer temp with "(long)" written into it]
11. Complicated Example
[Figure: the buffers buf and temp, with cp pointing into buf and "(long)" written into temp]
For insert_long() above, cleanness is potentially violated: 7 + offset(cp) >= BUFSIZ
12. Complicated Example
[Figure: the buffer temp with "(long)" written into it]
For insert_long() above, cleanness is potentially violated:
    offset(cp) + 7 + len(cp) >= BUFSIZ
    7 + offset(cp) < BUFSIZ
13. Vulnerable String Manipulation
- Pointers to buffers
  char *p = buffer; ... while ( ... ) p++;
- Standard string manipulation functions
 - strcpy(), strcat(), ...
- NULL termination
 - strncpy(), ...
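
The strncpy() item refers to a well-known pitfall: when the source is at least as long as the count, strncpy() writes no terminating NUL. A minimal sketch of a defensive wrapper follows; copy_terminated() is a hypothetical name introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (dst_alloc bytes), always leaving
 * dst NUL-terminated. Returns 1 if src fit completely, 0 if truncated. */
static int copy_terminated(char *dst, size_t dst_alloc, const char *src) {
    assert(dst_alloc > 0);
    /* strncpy writes no NUL when strlen(src) >= dst_alloc - 1 ... */
    strncpy(dst, src, dst_alloc - 1);
    /* ... so the last byte is terminated explicitly. */
    dst[dst_alloc - 1] = '\0';
    return strlen(src) < dst_alloc;
}
```

Without the explicit store to the last byte, a later strlen(dst) or strcpy() from dst could read past the end of the buffer.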
 
14. C Static String Verifier (CSSV) Objectives
- Modular analysis 
 - Procedure pre-condition/post-condition/mod 
 - Automatically generate procedure specification 
 - Handle full C 
 - Multi-level pointers 
 - Structures 
 -  Reduce complexity of transformation 
 - Linear in the number of variables
 
15. CSSV
[Diagram: CSSV stages. Labels: Pointer Analysis; Procedure's pointer info; Procedure name; C2IP; Integer Proc; Integer Analysis; Potential Error Messages]
16. Advantages of Procedure Specification
- Modular analysis 
 - Not all the code is available 
 - Enables more expensive analyses 
 -  User control of the verification 
 - Detect errors at point of logical error 
 - Improve the precision of the analysis 
 - Check additional properties 
 - Beyond ANSI-C 
 
17. Specification and Soundness
- All errors are detected
 - Violation of a procedure's precondition
  - Call
 - Violation of a procedure's postcondition
  - Return
 - Violation of a statement's precondition
  - a[i]
 
18. Specification: strcpy
- char *strcpy(char *dst, char *src)
 - requires ( string(src) && alloc(dst) > len(src) )
 - mod
 - ensures ( len(dst) == pre_at_len(src) && return == pre_at_dst )
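
The requires and ensures clauses can be read as checks around the call. The sketch below makes that reading concrete; it is not how CSSV works (CSSV checks the contract statically), and strcpy_contract() and its alloc_dst parameter are hypothetical, since C cannot recover alloc(dst) from a bare pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: alloc_dst stands in for alloc(dst).
 * Returns NULL when the precondition fails, otherwise dst. */
static char *strcpy_contract(char *dst, size_t alloc_dst, const char *src) {
    /* string(src) cannot be tested at run time; it is assumed here. */
    size_t pre_len_src = strlen(src);
    char *ret;

    /* requires: alloc(dst) > len(src) */
    if (!(alloc_dst > pre_len_src))
        return NULL;

    ret = strcpy(dst, src);

    /* ensures: len(dst) == pre_at_len(src) && return == pre_at_dst */
    assert(strlen(dst) == pre_len_src);
    assert(ret == dst);
    return ret;
}
```

A static verifier aims to prove the requires clause at every call site, so that neither the run-time test nor the extra size argument is needed.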
19. Specification: insert_long()
/* insert_long.c */
#include "insert_long.h"
char buf[BUFSIZ];
char *insert_long(char *cp)
{
    char temp[BUFSIZ];
    int i;
    for (i = 0; &buf[i] < cp; ++i)
        temp[i] = buf[i];
    strcpy(&temp[i], "(long)");
    strcpy(&temp[i + 6], cp);
    strcpy(buf, temp);
    return cp + 6;
}

char *insert_long(char *cp)
    requires ( string(cp) && buf <= cp < buf + BUFSIZ )
    mod cp.strlen
    ensures ( len(cp) == prelen(cp) + 6 && return_value == cp + 6 )
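
As an illustration of what the contract does and does not guarantee, here is a run-time checked variant. It is a sketch, not the web2c code: insert_long_checked() and INSERT_BUFSIZ are hypothetical names, the capacity guard is an extra assumption that the requires clause above does not state, and the pointer range test assumes a flat address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INSERT_BUFSIZ 1024            /* stands in for the slide's BUFSIZ */
static char buf[INSERT_BUFSIZ];

/* Returns cp + 6 on success, NULL if a contract or capacity check fails. */
static char *insert_long_checked(char *cp) {
    char temp[INSERT_BUFSIZ];
    uintptr_t lo = (uintptr_t)buf, at = (uintptr_t)cp;
    size_t offset, len, total;

    /* requires: buf <= cp < buf + BUFSIZ (compared as integers) */
    if (at < lo || at - lo >= INSERT_BUFSIZ)
        return NULL;
    offset = (size_t)(at - lo);

    /* requires: string(cp), i.e. a NUL occurs before the end of buf */
    if (memchr(cp, '\0', INSERT_BUFSIZ - offset) == NULL)
        return NULL;
    len = strlen(cp);

    /* extra guard (assumption of this sketch):
     * prefix + "(long)" + old text + NUL must fit in the buffer */
    total = offset + 6 + len + 1;
    if (total > INSERT_BUFSIZ)
        return NULL;

    memcpy(temp, buf, offset);
    strcpy(temp + offset, "(long)");
    strcpy(temp + offset + 6, cp);
    memcpy(buf, temp, total);   /* memcpy, so a NUL inside the prefix is harmless */

    /* ensures: len(cp) == prelen(cp) + 6 */
    assert(strlen(cp) == len + 6);
    return cp + 6;
}
```

With buf holding "int x" and cp pointing at the "x", the call leaves "int (long)x" in buf and returns a pointer to the "x". A buffer that is already full is rejected by the extra guard, which is exactly the case the requires clause alone does not exclude.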
21. CSSV
[Diagram: CSSV stages for a leaf procedure. Labels: Pointer Analysis; Procedure's pointer info; Leaf procedure; C2IP side effect; Mod; Integer proc]
22. CSSV
[Diagram: CSSV stages for a leaf procedure. Labels: Pointer Analysis; Procedure's pointer info; Leaf procedure; C2IP; Pre, Mod; Integer proc; Integer Analysis; Potential Error Messages]
23. C2IP
char *insert_long(char *cp)
{
    char temp[BUFSIZ];
    int i;
    require string(cp);
    for (i = 0; &buf[i] < cp; ++i)
        temp[i] = buf[i];
    strcpy(&temp[i], "(long)");
}
24. AWP
-  Approximate the Weakest Precondition 
 -  Backward integer analysis 
 -  Generates a precondition
 
25. AWP: insert_long()
-  Generate the following precondition 
 - string(cp) ?
 
- len(buf) ? offset(cp)  1017 
 
-  Not the weakest precondition 
 - string(cp) ? 
 - len(buf) ? 1017
 
26. Implementation
- Using
 - ASToolKit (Microsoft)
 - GOLF (Microsoft, Manuvir Das)
 - New Polka (IMAG, Bertrand Jeannet)
 -  Main steps 
 - Simplifier 
 - Pointer analysis 
 - C2IP 
 - Integer Analysis
 
27. Preliminary results (web2c)
- Up to four times faster than SAS'01
 
28. Preliminary results (EADS/RTC_Si)
29. The Canvas Project: Component ANnotation, Verification And Stuff
- J. Field
- D. Goyal
- G. Ramalingam

IBM Research
http://www.research.ibm.com/menage/canvas
30. The problem
- Class libraries and software components are supposed to
 - make building complex applications from "parts" easier
 - make a market for pre-packaged code...
- ...but in practice
 - programming with components is hard
  - inadequate documentation
  - lack of source code
  - increased API complexity (to allow for customization)
 - programmers often resort to iterative trial-and-error methods to get components to work in their application
31. Canvas Goals
- The component designers specify component conformance constraints
- Develop automated certification tools to determine whether the client satisfies the component's conformance constraints
- Focus on Java libraries and JavaBeans
 
32. Our Approach
- Specify component behavior in a Java-like language (EASL)
- Use TVLA for statically analyzing the Java heap
- Specialize the algorithm for the component
 
33. The Concurrent Modification Problem (PLDI'02, Berlin)
- Static analysis of Java programs manipulating Java 2 collections
- Inconsistent usages of iterators
 - An Iterator object i defined on a collection object c
 - No use of i may be preceded by an update to the contents of c, unless the update was also made via i
34.
class Make {
    private Worklist worklist;
    public static void main(String[] args) {
        Make m = new Make();
        m.initializeWorklist(args);
        m.processWorklist();
    }
    void initializeWorklist(String[] args) {
        ... worklist = new Worklist(); ...
        // add some items to worklist
    }
    void processWorklist() {
        Set s = worklist.unprocessedItems();
        for (Iterator i = s.iterator(); i.hasNext(); ) {
            Object item = i.next();
            if (...) processItem(item);
        }
    }
    void processItem(Object i) { ... doSubproblem(...); }
    void doSubproblem(...) {
        ... worklist.addItem(newitem); ...
    }
}

public class Worklist {
    Set s;
    public Worklist() { ... s = new HashSet(); ... }
    public void addItem(Object item) { s.add(item); }
    public Set unprocessedItems() {
        return s;
    }
}
 - return rev  
 
35. EASL Specification
class Version { }
class Collection {
    Version version;
    Collection() { version = new Version(); }
    boolean add(Object o) { version = new Version(); }
    Iterator iterator() { return new Iterator(this); }
}
class Iterator {
    Collection set;
    Version definingVersion;
    Iterator(Collection s) { definingVersion = s.version; set = s; }
    void remove() {
        requires (definingVersion == set.version);
        set.version = new Version();
        definingVersion = set.version;
    }
    Object next() {
        requires (definingVersion == set.version);
    }
}
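
The version-stamp idea in this specification is independent of Java. The following C analogue is only a sketch for illustration: an integer counter plays the role of a fresh Version object, and all struct and function names are hypothetical. It is not EASL and not the analysis itself, which establishes the requires clauses statically.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C analogue of the specification's version stamps. */
struct collection { unsigned long version; size_t size; };
struct iterator   { struct collection *set; unsigned long defining_version; };

static void collection_init(struct collection *c) {
    c->version = 0;
    c->size = 0;
}

/* add(): every update gives the collection a new version. */
static void collection_add(struct collection *c) {
    c->size++;
    c->version++;
}

/* iterator(): the iterator remembers the version it was created on. */
static struct iterator collection_iterator(struct collection *c) {
    struct iterator it;
    assert(c != NULL);
    it.set = c;
    it.defining_version = c->version;
    return it;
}

/* The requires clause shared by next() and remove(). */
static int iterator_valid(const struct iterator *it) {
    return it->defining_version == it->set->version;
}

/* remove(): needs a valid iterator, updates the collection, then
 * re-synchronises this iterator only. Returns 0 on a violation. */
static int iterator_remove(struct iterator *it) {
    if (!iterator_valid(it) || it->set->size == 0)
        return 0;
    it->set->size--;
    it->set->version++;
    it->defining_version = it->set->version;
    return 1;
}
```

After a removal through one iterator, that iterator stays valid while every other iterator on the same collection fails the check; an add() on the collection invalidates all of them. The sketch ignores counter wrap-around.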
36. Prototype
[Diagram: prototype architecture. Labels: Java; Soot; Jimple AST; J2TVP Translator; CFG and actions; EASL; Specialize; action definition; Three Value Logic Analyzer; Analysis result: potential cleanness violations]
37. Empirical Results
38. Conclusion
- Ambitious sound analyses 
 - Very few false alarms 
 - Scaling is an issue 
 - Use staged analyses 
 - Use modular analysis 
 - Use encapsulation