Title:
1. Tracking Pointers with Path and Context Sensitivity for Bug Detection in C Programs
CMSC 838Z Spring 2004
- V. Benjamin Livshits and Monica S. Lam
- presented by Mujtaba Ali
- (Based on a presentation given at ACM SIGSOFT FSE-11, September 2003)
2. Bugs are Bad
- Cost lots of money to fix after deployment
- Nastiest bugs: security violations
- Hard to discover the effects of malicious attacks
- Legal/liability issues
3. CVE Classification
[Figure: classification of 2002 security reports, with the categories this work would like to address highlighted]
- Study of security reports in 2002
- Source: SecurityFocus.com
4. Motivation
- Goal: report hard-to-find security violations
  - Errors spanning many functions and files
  - Reduce false positives
- Applications
  - Buffer overflows
  - Format string vulnerabilities
5. Examples
- Two security vulnerabilities:
  - Buffer overrun in gzip, a compression utility
  - Format string violation in muh, a network game
- Cause: unsafe use of user-supplied data
  - gzip copies the data into a statically sized buffer, which may result in an overrun
  - muh uses the data as the format argument of a call to vsnprintf, so a user can maliciously embed %n into the format string
6. Buffer Overrun in gzip
gzip.c:593
    while (optind < argc) {
        treat_file(argv[optind++]);
    }
gzip.c:716
    local void treat_file(char *iname) {
        ...
        if (get_istat(iname, &istat) != OK) return;
gzip.c:1009
    local int get_istat(char *iname, struct stat *sbuf) {
        ...
        strcpy(ifname, iname);
- Need a model of strcpy
gzip.c:233
    char ifname[MAX_PATH_LEN]; /* input file name */
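The slide notes that the analysis needs a model of strcpy. A minimal sketch of what such a summary might record, assuming a simple taint abstraction: the abs_buf type and model_strcpy function are hypothetical illustrations, not the tool's actual representation. The copy transfers the source's taint to the destination, and a tainted copy into a statically sized buffer is flagged.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical abstract value: may the data come from the user, and what
   is the static size of the buffer holding it (0 = not statically sized). */
typedef struct {
    bool tainted;
    size_t static_size;
} abs_buf;

/* Hypothetical summary of strcpy(dst, src): dst now carries src's taint.
   Returns true when tainted data lands in a statically sized buffer,
   i.e. a potential overrun that should be reported. */
bool model_strcpy(abs_buf *dst, const abs_buf *src) {
    dst->tainted = src->tainted;
    return src->tainted && dst->static_size > 0;
}
```

In the gzip trace, argv[optind] is tainted and ifname is a fixed-size array, so the model would report the strcpy at gzip.c:1009.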
7. Format String Violation in muh
muh.c:839
    s = ( char * )malloc( 1024 );
    while( fgets( s, 1023, messagelog ) ) {
        ...
        irc_notice( &c_client, status.nickname, s );
    }
    FREESTRING( s );
irc.c:263
    void irc_notice( con_type *con, char *nick, char *format, ... )
    {
        va_list va;
        char buffer[ BUFFERSIZE ];

        va_start( va, format );
        vsnprintf( buffer, BUFFERSIZE - 10, format, va );
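For contrast, the usual way to avoid this class of bug is to never pass user data as the format itself, but as an argument to a constant "%s" format. A minimal sketch of that idea (copy_user_line is a hypothetical helper; for muh the analogous change would presumably be calling irc_notice with "%s" and s, which is an assumption, not the project's actual fix):

```c
#include <stdio.h>
#include <string.h>

/* Copy a user-supplied line into buf without interpreting it as a format.
   Because the line is passed through "%s", sequences such as "%n" or "%x"
   inside the data are copied literally instead of being executed. */
int copy_user_line(char *buf, size_t size, const char *line) {
    return snprintf(buf, size, "%s", line);
}
```

With this shape, a string such as "hello %n %x" is stored verbatim rather than causing a write through %n.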
8. Easy Bugs are Boring
- Programs have security violations despite code reviews and years of use
- Common observation about hard errors:
  - Errors on interface boundaries: need to follow data flow between procedures
  - Errors along complicated control-flow paths: need to follow long definition-use chains
9. Need for Alias Analysis
- Both examples involve a complex flow of data
- Tracking data flow in modern languages requires alias analysis
- Steensgaard's or Andersen's analysis?
  - Flow- and context-insensitive
  - Fast, but imprecise: too many false positives
- But flow- and context-sensitive analyses do not scale
10. Tradeoff: Scalability vs. Precision
[Figure: analyses plotted by precision (low to high) against speed/scalability (slow and expensive to fast): 3-value logic, Wilson & Lam, Andersen, Steensgaard, and this analysis]
11. A Hybrid Analysis
- Maintain precision selectively
- Analyze precisely:
  - Local variables
  - Function parameters
  - Global variables
  - Their dereferences and fields
  - These are essentially access paths, e.g. p.next.data
- The rest: break into equivalence classes
  - Represent by abstract locations
  - Recursive data structures
  - Arrays
  - Locations accessed through pointer arithmetic
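12. Linearity
- Regular assignments result in strong updates
- Assignments to abstract memory locations result in weak updates
- Strong update: x = 1; x = 2; y = x; becomes $x_0 = 1$, $x_1 = 2$, $y_0 = x_1$, so x is 2
- Weak update (m is the abstract location for array A): A[i] = 1; A[j] = 2; b = A[k]; becomes $m_0 = 1$, $m_1$ = a merge of $m_0$ with the new value 2, $b_0 = m_1$, so b is either 1 or 2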
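The difference between the two kinds of update can be sketched with a toy abstract domain: the set of small integers a location may hold, stored as a bitmask. This is an illustrative assumption, not the paper's representation; it only shows why a strong update replaces the old value while a weak update must keep it.

```c
#include <stdint.h>

/* Toy abstract value: set of integers 0..31 a location may hold. */
typedef uint32_t valset;

static valset single(int v) { return (valset)1u << v; }

/* Strong update: x names exactly one location, so the new value
   replaces whatever was there before. */
valset strong_update(valset old, int v) { (void)old; return single(v); }

/* Weak update: m stands for many concrete locations (all elements of A),
   so a store to one of them cannot discard the old possibilities. */
valset weak_update(valset old, int v) { return old | single(v); }
```

Replaying the slide: two strong updates to x leave only {2}, while two weak updates to m leave {1, 2}, matching "b is either 1 or 2".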
13. Static Single Assignment
- Sparse representation of the program
  - Propagates facts about a definition only where they are needed
  - Definition-use relationships
- Each variable is defined only once
  - Give new names (subscripts) to definitions
  - Solves the flow-sensitivity problem
- But a definition may be used many times
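Renaming straight-line code into SSA form can be sketched in a few lines: each definition gets a fresh subscript, and each use takes the subscript of the most recent definition. The stmt type below is a hypothetical toy IR (single-letter variables, right-hand side either a constant or a variable), and the sketch ignores join points.

```c
#include <stdio.h>
#include <string.h>

/* Toy IR: "lhs = rhs" where rhs is an int constant or a variable a..z. */
typedef struct {
    char lhs;
    int rhs_is_var;
    char rhs_var;
    int rhs_const;
} stmt;

/* Writes one SSA line per statement into out[i]. Assumes every used
   variable was defined earlier in the sequence. */
void to_ssa(const stmt *s, int n, char out[][32]) {
    int version[26];
    for (int v = 0; v < 26; v++) version[v] = -1;
    for (int i = 0; i < n; i++) {
        char rhs[16];
        /* Rename the use first, so "x = x" reads the old version. */
        if (s[i].rhs_is_var)
            snprintf(rhs, sizeof rhs, "%c%d", s[i].rhs_var,
                     version[s[i].rhs_var - 'a']);
        else
            snprintf(rhs, sizeof rhs, "%d", s[i].rhs_const);
        /* Then give the definition a fresh subscript. */
        int d = ++version[s[i].lhs - 'a'];
        snprintf(out[i], 32, "%c%d = %s", s[i].lhs, d, rhs);
    }
}
```

Applied to x = 1; x = 2; y = x, this yields x0 = 1, x1 = 2, y0 = x1.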
14. SSA Join Points
- In standard SSA, use a $\phi$ function at joins
  - $d_3 = \phi(d_1, d_2)$
- In Gated SSA, use a $\gamma$ function at joins
  - $d_3 = \gamma(\langle P, d_1\rangle, \langle \neg P, d_2\rangle)$
  - Solves the path-sensitivity problem
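What the gate buys over a plain $\phi$ can be sketched as follows: each operand keeps the predicate under which it reaches the join, so on a given path exactly one definition is selected instead of "either". The gated struct, the callback encoding of predicates, and gamma_select are hypothetical names for illustration only.

```c
#include <stdbool.h>

/* One gated operand <P, d>: a predicate over the path's branch
   conditions and the id of the reaching definition (e.g. 1 for d1). */
typedef struct {
    bool (*pred)(const bool *conds);
    int def;
} gated;

/* Select the operand whose predicate holds on this path; -1 if none. */
int gamma_select(const gated *ops, int n, const bool *conds) {
    for (int i = 0; i < n; i++)
        if (ops[i].pred(conds)) return ops[i].def;
    return -1;
}

/* The predicates P and not-P over branch condition 0. */
bool pred_P(const bool *c)     { return c[0]; }
bool pred_not_P(const bool *c) { return !c[0]; }
```

For $d_3 = \gamma(\langle P, d_1\rangle, \langle \neg P, d_2\rangle)$, a path where P holds selects $d_1$ and any other path selects $d_2$.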
15. IPSSA Intraprocedurally
- Extension of Gated SSA
- Provides pointer resolution
  - Replace indirect pointer dereferences with direct accesses of potentially new temporary locations
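16. Example of Pointer Resolution
    int a = 0, b = 1;
    int c = 2, d = 3;
    if (Q)
        p = &a;
    else
        p = &b;
    c = *p;
    *p = d;
- $a_0 = 0$, $b_0 = 1$
- $c_0 = 2$, $d_0 = 3$
- $p_1 = \&a$
- $p_2 = \&b$
- $p_3 = \gamma(\langle Q, p_1\rangle, \langle \neg Q, p_2\rangle)$
- Load resolution:
  - $c_1 = \gamma(\langle Q, a_0\rangle, \langle \neg Q, b_0\rangle)$
- Store resolution:
  - $a_1 = \gamma(\langle Q, d_0\rangle, \langle \neg Q, a_0\rangle)$
  - $b_1 = \gamma(\langle Q, b_0\rangle, \langle \neg Q, d_0\rangle)$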
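One way to check the resolved definitions is to run the example concretely for both values of Q and compare against the gated definitions read as conditional selections. The names run_example, resolved, and example_state are illustrative, not part of the analysis.

```c
#include <stdbool.h>

typedef struct { int a, b, c; } example_state;

/* The example program, executed concretely. */
example_state run_example(bool Q) {
    int a = 0, b = 1;
    int c = 2, d = 3;
    int *p;
    if (Q)
        p = &a;
    else
        p = &b;
    c = *p;   /* load through p  */
    *p = d;   /* store through p */
    return (example_state){ a, b, c };
}

/* The resolved definitions a1, b1, c1, each gamma read as "Q ? x : y",
   with a0 = 0, b0 = 1, d0 = 3. */
example_state resolved(bool Q) {
    const int a0 = 0, b0 = 1, d0 = 3;
    return (example_state){ Q ? d0 : a0, Q ? b0 : d0, Q ? a0 : b0 };
}
```

For both Q = true and Q = false the two functions agree, which is exactly what path sensitivity preserves: the store overwrites a only on the Q path and b only on the other.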
17. Pointer Resolution Rules
- When resolving definition d, the next step depends on the right-hand side of d
- Expressed as conditional rewrite rules
- A few sample rules:
  - $d = \&x$: the result is $x$
  - $d = {?}()$: the result is $d$
  - $d = \gamma(\langle P_1, d_1\rangle, \ldots, \langle P_n, d_n\rangle)$: follow $d_1, \ldots, d_n$
- Refer to the paper for details
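The three sample rules can be sketched as a recursive resolver. This is a simplification under stated assumptions: the def struct is hypothetical, the second rule (whose symbol is illegible here) is modeled as an "opaque" definition that resolves to itself, and the definitions are assumed to form an acyclic graph.

```c
#include <stddef.h>
#include <string.h>

typedef enum { DEF_ADDR_OF, DEF_OPAQUE, DEF_GAMMA } def_kind;

typedef struct def {
    def_kind kind;
    const char *name;          /* the definition's own name, e.g. "p3" */
    const char *target;        /* DEF_ADDR_OF: the x in d = &x          */
    const struct def *ops[4];  /* DEF_GAMMA: d1..dn                     */
    int nops;
} def;

/* Append the locations d may point to into out[n..]; return new count.
     d = &x         -> x
     d opaque       -> d itself (a fresh temporary location)
     d = gamma(...) -> union of resolving d1..dn                        */
int resolve(const def *d, const char **out, int n) {
    switch (d->kind) {
    case DEF_ADDR_OF:
        out[n++] = d->target;
        break;
    case DEF_OPAQUE:
        out[n++] = d->name;
        break;
    case DEF_GAMMA:
        for (int i = 0; i < d->nops; i++)
            n = resolve(d->ops[i], out, n);
        break;
    }
    return n;
}
```

On the previous example, resolving $p_3$ follows both gated operands and yields the locations a and b.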
18. Interprocedural Example
- Data flow in and out of functions
  - Create links between formal and actual parameters
  - Reflect stores and assignments to globals made in the callee
    void f(int *p) { *p = 100; }
    int main() {
        int x = 0;
        int *q = &x;
    c:  f(q);
    }
- In f:
  - $p_0 = \iota(\langle c, q_0\rangle)$: formal-actual connection for call site c
  - $*p_1 = 100$
- In main:
  - $x_0 = 0$
  - $q_0 = \&x$
  - $x_1 = \rho(\langle f, 100\rangle)$: reflects the store inside f within main
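The example compiles and runs as ordinary C; the sketch below shows why the caller needs a new definition of x after the call. The comments map each line to the IPSSA definitions listed above; caller is a hypothetical stand-in for main so the snippet can be tested.

```c
/* Stores through its formal parameter. */
void f(int *p) {
    *p = 100;
}

int caller(void) {
    int x = 0;     /* x0 = 0                                        */
    int *q = &x;   /* q0 = &x                                       */
    f(q);          /* call site c: formal p linked to actual q0     */
    return x;      /* reads x1, the store inside f reflected here   */
}
```

Without reflecting the callee's store, a use of x after the call would wrongly be linked to $x_0 = 0$.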
19. Unsound Unaliasing Assumption
- A1: No aliased parameters
  - Assumption: locations accessible through different parameters are distinct
  - Justification: matches how good interfaces are written
  - Consequence: context-independent procedure summaries
- A2: No aliased abstract locations
  - Assumption: things pulled out of an abstract location are not aliased
  - Justification: holds in most usage cases
  - Consequence: give unique names when we get data from an abstract location
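A small sketch of where assumption A1 is unsound: a single caller-independent summary of set_both would say that the location reached through a ends up holding 1, which is true only when a and b point to distinct locations. set_both and the two callers are hypothetical examples, not from the paper's benchmarks.

```c
/* A callee with two pointer parameters. */
void set_both(int *a, int *b) {
    *a = 1;
    *b = 2;
}

/* Distinct arguments: A1 holds and the summary is correct. */
int call_distinct(void) { int x = 0, y = 0; set_both(&x, &y); return x; }

/* Aliased arguments: A1 is violated and *a actually ends as 2. */
int call_aliased(void)  { int x = 0; set_both(&x, &x); return x; }
```

The design bet is that calls like call_aliased are rare in well-written interfaces, so the precision gained from context-independent summaries outweighs the occasional missed case.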
20. Interprocedural Algorithm
- Process one SCC (strongly connected component) of the call graph at a time, bottom-up
- For each SCC:
  - Within each procedure:
    - Resolve all pointer operations (loads and stores)
    - Create links between formal and actual parameters
    - Reflect stores and assignments to globals at call sites
  - Iterate within the SCC until the representation stabilizes
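One standard way to obtain a bottom-up SCC order is Tarjan's algorithm, which finishes each SCC only after every SCC reachable from it, so with caller-to-callee edges the finishing order is callees first. The sketch below uses that technique on a toy adjacency-matrix call graph; MAXF, the globals, and the per-procedure callback in analyze are hypothetical, and the paper does not state which SCC algorithm it uses.

```c
#include <stdbool.h>

#define MAXF 16   /* hypothetical limit on procedures */

static bool calls[MAXF][MAXF];   /* calls[i][j]: procedure i calls j */
static int nfun;
static int idx[MAXF], low[MAXF], stk[MAXF], sp, counter;
static bool on_stk[MAXF];
static int scc_of[MAXF], nscc;   /* scc_of[f]: finishing order of f's SCC */

static void strongconnect(int v) {
    idx[v] = low[v] = counter++;
    stk[sp++] = v;
    on_stk[v] = true;
    for (int w = 0; w < nfun; w++) {
        if (!calls[v][w]) continue;
        if (idx[w] < 0) {
            strongconnect(w);
            if (low[w] < low[v]) low[v] = low[w];
        } else if (on_stk[w] && idx[w] < low[v]) {
            low[v] = idx[w];
        }
    }
    if (low[v] == idx[v]) {   /* v roots an SCC: pop its members */
        int w;
        do {
            w = stk[--sp];
            on_stk[w] = false;
            scc_of[w] = nscc;
        } while (w != v);
        nscc++;
    }
}

/* SCC numbers 0, 1, 2, ... come out callees-first (bottom-up). */
void bottom_up_order(int n) {
    nfun = n;
    sp = counter = nscc = 0;
    for (int v = 0; v < n; v++) { idx[v] = -1; on_stk[v] = false; }
    for (int v = 0; v < n; v++)
        if (idx[v] < 0) strongconnect(v);
}

/* Driver: analyze SCCs bottom-up; within an SCC, re-run the
   (hypothetical) per-procedure step until no procedure changes. */
void analyze(int n, bool (*analyze_proc)(int f)) {
    bottom_up_order(n);
    for (int s = 0; s < nscc; s++) {
        bool changed;
        do {
            changed = false;
            for (int f = 0; f < n; f++)
                if (scc_of[f] == s && analyze_proc(f)) changed = true;
        } while (changed);
    }
}
```

For a graph where main calls g, g and h are mutually recursive, and h calls k, the order is {k}, then {g, h}, then {main}.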
21. Framework
- Framework makes it easy to add new analyses
[Figure: program sources feed IPSSA construction, which produces interprocedural (IP) data flow info used by analyses for buffer overruns, format violations, and others, each producing error traces]
- Abstracts away many details; makes it easy to write tools
22. Application
- Start at roots: sources of user input, such as
  - argv elements
  - Input functions: fgets, gets, recv, getenv, etc.
- Follow the data flow chains provided by IPSSA: for every definition, IPSSA provides a list of its uses
- A sink is a potentially dangerous usage, such as
  - A buffer of a statically defined length
  - A format argument of vulnerable functions: printf, fprintf, snprintf, vsnprintf
- Report the bug and record the full path
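The source-to-sink search can be sketched as a breadth-first search over def-use edges that records each definition's predecessor, so the full path can be reported when a sink is reached. The adjacency arrays, MAXD, and find_tainted_path are hypothetical; the real tool walks IPSSA's use lists.

```c
#include <stdbool.h>

#define MAXD 32   /* hypothetical limit on definitions */

/* uses[d][k]: the k-th definition that uses d; is_sink[d]: d is a
   dangerous use (fixed-size buffer, format argument, ...). */
static int uses[MAXD][4], nuses[MAXD];
static bool is_sink[MAXD];

/* BFS from a source definition. On reaching a sink, writes the path
   source..sink into path and returns its length; 0 if none is found. */
int find_tainted_path(int source, int ndefs, int *path) {
    int parent[MAXD], queue[MAXD], head = 0, tail = 0;
    for (int d = 0; d < ndefs; d++) parent[d] = -2;   /* -2: unvisited */
    parent[source] = -1;
    queue[tail++] = source;
    while (head < tail) {
        int d = queue[head++];
        if (is_sink[d]) {
            int len = 0;
            for (int x = d; x != -1; x = parent[x]) path[len++] = x;
            for (int i = 0; i < len / 2; i++) {   /* reverse: source first */
                int t = path[i];
                path[i] = path[len - 1 - i];
                path[len - 1 - i] = t;
            }
            return len;
        }
        for (int k = 0; k < nuses[d]; k++) {
            int u = uses[d][k];
            if (parent[u] == -2) { parent[u] = d; queue[tail++] = u; }
        }
    }
    return 0;
}
```

Modeling the gzip trace as argv[optind] → iname in treat_file → iname in get_istat → strcpy into ifname, the search returns that four-step path.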
23. Experimental Setup
- Implementation with the SUIF2 compiler suite
- Pentium IV, 2 GHz, 2 GB of RAM, running Linux
24. Summary of Results
[Table: summary of results; callouts highlight many definitions and many procedures]
25. False Positive in pcre
- Copying tainted user data to a statically sized buffer may be unsafe
- Turns out to be safe in this case:
    sprintf(buffer, "%.512s", filename);   /* filename: tainted data */
- The %.512s precision limits the length of the copied data, and the buffer is big enough!
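The precision in "%.512s" is what makes the pcre call safe: at most 512 characters of the argument are written, plus the terminating '\0', regardless of input length. A minimal sketch, assuming a hypothetical 1024-byte buffer (the slide only says the real buffer is big enough):

```c
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 1024   /* hypothetical; any size >= 513 is enough */

/* Format a possibly very long, user-supplied filename as the pcre call
   does; returns the length actually written. */
size_t format_filename(char buffer[BUF_SIZE], const char *filename) {
    sprintf(buffer, "%.512s", filename);
    return strlen(buffer);
}
```

A 1999-character filename yields a 512-character result, so a checker that understood precision fields could suppress this warning.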
26. Related Work
- xgcc
  - Also unsound
  - Many more false positives
  - Even regular developers can specify new analyses
- Cqual
  - Interprocedural, sound analysis
  - Requires annotations
  - Flow-, context-, and path-insensitive
  - More false positives
27. The Good
- Unsoundness gives very few false positives
- Unsoundness allows efficient path- and context-sensitive analysis
- Tested on real code
- Good presentations to steal slides from
28. The Bad
- Cqual is much improved now
  - Very few false positives on these types of security vulnerabilities
- Let's see some more interesting vulnerabilities
- Does not detect buffer overflows due to array bounds violations
- Not tested with large programs
29. Singular Key Idea
- IPSSA coupled with a (slightly) unsound alias
analysis facilitates efficient detection of
hard-to-find security violations with very few
false positives