Title: An Overview of the Saturn Project
The Three-Way Trade-Off
- Precision
  - Modeling programs accurately enough to be useful
- Scalability
  - Saying anything at all about large programs
- Human Effort
  - How much work must the user do?
  - Either giving specifications, or interpreting results
Today's focus
Not so much about this . . .
Precision
Primary abstraction is done at function boundaries.
[Figure: each function f is modeled by a formula F_f; only the abstractions A(F_f), A(F_g), A(F_h) cross function boundaries]
Intraprocedural analysis with minimal abstraction.
Scalability
- Design constraint
  - SAT formula size ∝ function size
- Analyze one function at a time
- Parallel implementation
  - Server sends functions to clients to analyze
  - Typically use 50-100 cores to analyze Linux
Summaries
- Abstract at function boundaries
  - Compute a summary of each function's behavior
- Summaries should be small
  - Ideally linear in the size of the function's interface
- Summaries are our primary form of abstraction
  - Saturn delays abstraction to function boundaries
- Slogan: Analysis design is summary design!
Expressiveness
- Analyses written in Calypso
  - Logic programs
    - Express traversals of the program
    - E.g., backwards/forwards propagation
  - Constraints
    - For when we don't know the traversal order
- We have written 40,000 lines of Calypso code
Availability
- An open source project
  - BSD license
- All Calypso code available for published experiments
- saturn.stanford.edu
People
Isil Dillig
Suhabe Bugrara
Thomas Dillig
Peter Hawkins
Outline
- Saturn overview
- An example analysis
  - Intraprocedural
  - Interprocedural
- What else can you do?
  - Survey of results
Saturn Architecture
[Figure: Saturn architecture, taking a C Program as input]
Parsing and C Frontend
Source Code -> Build Interceptor -> Preprocessed Source Code -> CIL frontend -> Abstract Syntax Tree Databases
Calypso
- General purpose logic programming language
  - Pure
  - Prolog-like syntax
- Bottom-up evaluation
  - Magic sets transformation
- Also a (minor) moon of Saturn
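Bottom-up evaluation can be pictured as repeatedly applying every rule to the facts derived so far until nothing new appears. The Python sketch below (not Calypso, and not Saturn's interpreter) does this naively for a hypothetical two-rule reachability program over CFG edges; the predicate names `edge` and `reach` are illustrative only, and the magic sets transformation, which would restrict evaluation to facts relevant to a particular query, is not shown.

```python
from typing import Set, Tuple

Fact = Tuple[str, str]


def reachability(edges: Set[Fact]) -> Set[Fact]:
    """Naive bottom-up evaluation of the hypothetical rules
         reach(X, Y) :- edge(X, Y).
         reach(X, Z) :- reach(X, Y), edge(Y, Z).
    Each round applies both rules to all known facts; stop at the fixpoint."""
    reach: Set[Fact] = set()
    while True:
        derived = set(edges)          # first rule
        derived |= {(x, z)            # second rule: join reach with edge on Y
                    for (x, y) in reach
                    for (y2, z) in edges
                    if y == y2}
        if derived <= reach:
            return reach              # no new facts: fixpoint reached
        reach |= derived


cfg_edges = {("p0", "p1"), ("p1", "p2"), ("p2", "p1")}
print(sorted(reachability(cfg_edges)))
```

A production engine would typically use semi-naive evaluation, joining only the facts derived in the previous round; the naive loop re-derives everything but reaches the same fixpoint.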
Helpful Features
- Strong static type and mode checking
- Permanent data (sessions)
  - Stored as Berkeley DB databases
  - Sessions are just a named bundle of predicates
- Support for unit-at-a-time analysis
Extensible Interpreter
[Figure: the Logic Program Interpreter with plug-in packages: SAT Solver (sat predicate), LP Solver, DOT graph package, UI package]
Scalability
- Interpreter is not very efficient
  - OK, it's slow
- But can run distributed analyses
  - 50-100 CPUs
- Scalability is more important than raw speed
  - Can run intensive analyses of the entire Linux kernel (>6 MLOC) in a few hours.
Cluster Architecture
[Figure: a Master Node with its Databases, connected to Worker Node 1 through Worker Node 100, each with its own Calypso DB]
Job Scheduling
- Job: a function body
- Dynamically track dependencies between jobs
  - Rerun jobs if new dependencies found
  - Optimistic concurrency control
- Iterate to fixpoint for circular dependencies
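The scheduling policy can be sketched as a sequential worklist. In this hypothetical Python version, each job analyzes one function and reads callee summaries through a `lookup` callback; every read is recorded as a dependency, and when a summary changes, the jobs that read it are queued again, so mutually recursive functions iterate until nothing changes. Distribution across workers and optimistic concurrency control are not modeled, the names `schedule` and `reachable_callees` are made up for illustration, and termination assumes the per-function analysis is monotone over a finite domain.

```python
from collections import deque
from typing import Callable, Dict, FrozenSet, List, Set

Summary = FrozenSet[str]
Lookup = Callable[[str], Summary]


def schedule(program: Dict[str, List[str]],
             analyze: Callable[[str, Lookup], Summary]) -> Dict[str, Summary]:
    """Run one job per function body until every summary is stable."""
    summaries: Dict[str, Summary] = {f: frozenset() for f in program}
    readers: Dict[str, Set[str]] = {f: set() for f in program}  # summary -> jobs that read it
    queue = deque(program)
    queued = set(program)

    while queue:
        job = queue.popleft()
        queued.discard(job)

        def lookup(callee: str) -> Summary:
            # Dependency discovered while the job runs: remember who read what.
            readers.setdefault(callee, set()).add(job)
            return summaries.get(callee, frozenset())

        new_summary = analyze(job, lookup)
        if new_summary != summaries[job]:
            summaries[job] = new_summary
            for dependent in readers[job]:  # rerun every job that saw the old summary
                if dependent not in queued:
                    queue.append(dependent)
                    queued.add(dependent)
    return summaries


# Hypothetical call graph; helper and walk are mutually recursive.
program = {
    "main": ["helper"],
    "helper": ["lock", "walk"],
    "walk": ["walk", "helper"],
    "lock": [],
}


def reachable_callees(fn: str, lookup: Lookup) -> Summary:
    """Toy monotone summary: every function fn may (transitively) call."""
    result: Set[str] = set()
    for callee in program[fn]:
        result.add(callee)
        result |= lookup(callee)
    return frozenset(result)


result = schedule(program, reachable_callees)
print({f: sorted(s) for f, s in result.items()})
```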
Calypso Analyses
[Figure: Calypso analyses, with C Syntax Predicates and Constraint Solvers]
The Paradigmatic Locking Analysis
- Check that a thread does not
  - acquire the same lock twice
  - release the lock twice
- Otherwise the application may deadlock or crash.
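Specification
[Figure: finite state machine with states unlocked, locked, and error]
- lock: unlocked -> locked, locked -> error
- unlock: locked -> unlocked, unlocked -> error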
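The specification can be written down as a transition table. This Python sketch is illustrative only. It treats error as absorbing, which matches the example summary later in the talk (a function that locks and then unlocks maps locked to error), although the slide itself draws no edges out of error.

```python
from enum import Enum
from typing import Iterable


class LockState(Enum):
    LOCKED = "locked"
    UNLOCKED = "unlocked"
    ERROR = "error"


# (operation, current state) -> next state, as in the lock/unlock specification.
TRANSITIONS = {
    ("lock", LockState.UNLOCKED): LockState.LOCKED,
    ("lock", LockState.LOCKED): LockState.ERROR,
    ("unlock", LockState.LOCKED): LockState.UNLOCKED,
    ("unlock", LockState.UNLOCKED): LockState.ERROR,
}


def step(state: LockState, op: str) -> LockState:
    """Apply one lock or unlock; error is treated as absorbing (an assumption)."""
    if state is LockState.ERROR:
        return LockState.ERROR
    return TRANSITIONS[(op, state)]


def run(ops: Iterable[str], start: LockState = LockState.UNLOCKED) -> LockState:
    """Final state after applying a sequence of operations."""
    state = start
    for op in ops:
        state = step(state, op)
    return state


print(run(["lock", "unlock"]), run(["lock", "lock"]), run(["unlock"]))
```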
Basic Setup
- We assume
  - one locking function lock(l)
  - one unlocking function unlock(l).
- We analyze one function at a time
  - and produce a locking summary describing the FSM transitions associated with a given lock.
An Example Function Summary
f( . . ., lock L, . . .) {
    lock(L);
    . . .
    unlock(L);
}
L: unlocked -> unlocked, locked -> error
- Summaries are input state -> output state
  - The net effect of the function on the lock
- Summary size is independent of function size
  - Bounded by the square of the number of states
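A summary in this sense is just a set of (input state, output state) pairs. The Python sketch below computes it for a straight-line body by running the lock FSM from each possible entry state; it assumes that statements other than lock and unlock do not touch the lock and that error is absorbing. Starting only from locked and unlocked follows the initialization rules, and the result can never exceed |states|² = 9 pairs no matter how long the body is. Bodies with branches need the guards introduced on the following slides and are not handled here.

```python
from typing import List, Set, Tuple

STATES = ("locked", "unlocked", "error")
ENTRY_STATES = ("locked", "unlocked")  # states a lock can have at function entry


def step(state: str, stmt: str) -> str:
    """Lock FSM; error is absorbing and non-lock statements change nothing."""
    if state == "error":
        return "error"
    if stmt == "lock":
        return "error" if state == "locked" else "locked"
    if stmt == "unlock":
        return "error" if state == "unlocked" else "unlocked"
    return state


def summarize(body: List[str]) -> Set[Tuple[str, str]]:
    """Net effect of a straight-line body on one lock: {(S0, S1)}."""
    summary = set()
    for s0 in ENTRY_STATES:
        s = s0
        for stmt in body:
            s = step(s, stmt)
        summary.add((s0, s))
    return summary


f_body = ["lock", "x = y", "unlock"]
long_body = ["lock"] + ["x = y"] * 10_000 + ["unlock"]
print(sorted(summarize(f_body)))  # [('locked', 'error'), ('unlocked', 'unlocked')]
```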
Lock States
- type lockstate ::= locked | unlocked | error.
- Predicates to describe lock states on nodes and edges of the CFG:
  - predicate node_state(P:pp, L:t_trace, S:lockstate, G:g_guard).
  - predicate edge_state(P:pp, L:t_trace, S:lockstate, G:g_guard).
The Intraprocedural Analysis
1. Initialize lock states at function entry
2. Join operator
   - Combine edges to produce the successor's node_state
3. Transfer functions for every primitive
   - assignments
   - tests
   - function calls
Initializing a Lock
- Use a fresh boolean variable α
- Interpretation
  - α is true ⇒ L is locked
  - ¬α is true ⇒ L is unlocked
- Enforces that L cannot be both locked and unlocked simultaneously
Notation
P: (lock, state, guard)
At program point P, the lock is in the given state if guard is true.
Initialization Rules
node_state(P0,L,locked,LG) :-
    entry(P0),
    is_lock(L),
    fresh_variable(L, LG).
node_state(P0,L,unlocked,UG) :-
    entry(P0),
    node_state(P0,L,locked,LG),
    not(LG, UG).
[Figure: at the entry point P0 of f( . . ., lock L, . . .), the facts (L, locked, LG) and (L, unlocked, UG) hold]
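The effect of the two initialization rules can be checked with a small Python sketch. Guards are modeled as Python predicates over an assignment of boolean variables, and satisfiability is decided by brute-force enumeration, standing in for the SAT solver Saturn actually calls; the variable name `alpha_L` is hypothetical. Each of the locked and unlocked guards is satisfiable on its own, but their conjunction is not, which is the mutual exclusion the fresh variable is meant to enforce.

```python
from itertools import product
from typing import Callable, Dict, List

Env = Dict[str, bool]
Guard = Callable[[Env], bool]


def satisfiable(guard: Guard, variables: List[str]) -> bool:
    """Brute-force SAT: try every assignment (only viable for a few variables)."""
    return any(guard(dict(zip(variables, bits)))
               for bits in product([False, True], repeat=len(variables)))


def init_lock(fresh_var: str) -> Dict[str, Guard]:
    """Entry facts for one lock: locked under the fresh variable (LG),
    unlocked under its negation (UG = not LG)."""
    return {
        "locked": lambda env: env[fresh_var],
        "unlocked": lambda env: not env[fresh_var],
    }


entry = init_lock("alpha_L")  # hypothetical fresh variable for lock L


def both(env: Env) -> bool:
    """L locked and unlocked at the same time."""
    return entry["locked"](env) and entry["unlocked"](env)


print(satisfiable(entry["locked"], ["alpha_L"]),
      satisfiable(entry["unlocked"], ["alpha_L"]),
      satisfiable(both, ["alpha_L"]))  # True True False
```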
Joins
[Figure: incoming edges carrying (L, locked, F1) and (L, locked, F2) join into (L, locked, F1 ∨ F2)]
node_state(P,L,S,G) :- edge_state(P,L,S,_),
    \/edge_state(P,L,S,EG):or_all(EG,G).
Note: There is no abstraction in the join . . .
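The join rule keeps every state and ORs its guards across the incoming edges. In this Python sketch, where guards are modeled as predicates over boolean variables (an assumption of the sketch), one edge has lock L locked under c, and the other has it locked under ¬c ∧ d and unlocked under ¬c ∧ ¬d. The joined node has locked under c ∨ (¬c ∧ d) and unlocked under ¬c ∧ ¬d, both kept exactly.

```python
from typing import Callable, Dict, List

Env = Dict[str, bool]
Guard = Callable[[Env], bool]
EdgeFacts = Dict[str, Guard]  # lock state -> guard, for a single lock


def or_all(guards: List[Guard]) -> Guard:
    """Disjunction of all guards; false for an empty list."""
    return lambda env: any(g(env) for g in guards)


def join(incoming: List[EdgeFacts]) -> EdgeFacts:
    """node_state: for each state, OR the guards of every incoming edge_state.
    Guards are combined exactly; nothing is widened or dropped."""
    states = {s for edge in incoming for s in edge}
    return {s: or_all([edge[s] for edge in incoming if s in edge]) for s in states}


then_edge = {"locked": lambda env: env["c"]}
else_edge = {"locked": lambda env: not env["c"] and env["d"],
             "unlocked": lambda env: not env["c"] and not env["d"]}
node = join([then_edge, else_edge])

for c in (False, True):
    for d in (False, True):
        env = {"c": c, "d": d}
        print(c, d, node["locked"](env), node["unlocked"](env))
```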
The Intraprocedural Analysis
1. Initialize lock states at function entry
2. Join operator
   - Combine edges to produce the successor's node_state
3. Transfer functions for every primitive
   - assignments
   - function calls
   - etc.
Assignments
- Assignments do not affect lock state
edge_state(P1,L,S,G) :-
    assign(P0,P1,_),
    node_state(P0,L,S,G).
[Figure: the assignment X = E on the edge from P0 to P1 carries (L, S, G) at P0 unchanged to (L, S, G) at P1]
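Because assignments leave lock state alone, their transfer function is the identity on (lock, state, guard) facts. The Python sketch below shows only that case. The edge-kind tag `"assign"` and the helper names are hypothetical, and any other edge kind raises NotImplementedError because tests and calls are handled by other rules.

```python
from typing import Callable, Dict, List, Tuple

Env = Dict[str, bool]
Guard = Callable[[Env], bool]
Facts = Dict[Tuple[str, str], Guard]  # (lock, state) -> guard


def transfer(edge_kind: str, node_facts: Facts) -> Facts:
    """edge_state at the target of one CFG edge, from node_state at its source.
    Only assignments (X = E) are modeled: every fact is copied unchanged."""
    if edge_kind == "assign":
        return dict(node_facts)
    raise NotImplementedError(f"no transfer function sketched for {edge_kind!r} edges")


def along_path(edge_kinds: List[str], entry: Facts) -> Facts:
    """Propagate facts along a straight-line path (no joins needed)."""
    facts = entry
    for kind in edge_kinds:
        facts = transfer(kind, facts)
    return facts


def locked(env: Env) -> bool:
    return env["alpha_L"]


def unlocked(env: Env) -> bool:
    return not env["alpha_L"]


entry = {("L", "locked"): locked, ("L", "unlocked"): unlocked}
after = along_path(["assign", "assign"], entry)
print(after == entry)
```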
Interprocedural Analysis Basics
- Function summaries are the building blocks of interprocedural analysis.
- Generating a function summary requires
  - Predicates encoding relevant facts
  - A session to store these predicates.
Interprocedural Analysis Outline
1. Generating function summaries
2. Using function summaries
   - How do we retrieve the summary of a callee?
   - How do we map facts associated with a callee to the namespace of the currently analyzed function?
Summary Declaration
session sum_locking(FN:string) containing[lock_trans].
predicate lock_trans(L:t_trace, S0:lockstate, S1:lockstate).
Declares a persistent database sum_locking(function name) holding lock_trans facts.
Summary Generation: Primitives
- Summaries for lock and unlock
sum_locking("lock")->lock_trans(arg0,locked,error) :- .
sum_locking("lock")->lock_trans(arg0,unlocked,locked) :- .
sum_locking("unlock")->lock_trans(arg0,unlocked,error) :- .
sum_locking("unlock")->lock_trans(arg0,locked,unlocked) :- .
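The declaration and the four primitive facts amount to a table keyed by function name. Below is an in-memory Python stand-in; this is an assumption of the sketch, since Saturn keeps sessions in Berkeley DB. Returning an empty set for a function with no stored summary is also our choice, not something the slides specify, and `error_inputs` is a hypothetical helper.

```python
from typing import Dict, Set, Tuple

LockTrans = Tuple[str, str, str]  # lock_trans(L, S0, S1)

# sum_locking(FN) -> its lock_trans facts; arg0 is the callee's first argument.
sum_locking: Dict[str, Set[LockTrans]] = {
    "lock": {("arg0", "locked", "error"), ("arg0", "unlocked", "locked")},
    "unlock": {("arg0", "unlocked", "error"), ("arg0", "locked", "unlocked")},
}


def lock_trans(function: str) -> Set[LockTrans]:
    """Facts stored for a function; empty if no summary has been computed."""
    return sum_locking.get(function, set())


def error_inputs(function: str) -> Set[str]:
    """Entry states from which calling the function ends in error."""
    return {s0 for (_, s0, s1) in lock_trans(function) if s1 == "error"}


print(error_inputs("lock"), error_inputs("unlock"))  # {'locked'} {'unlocked'}
```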
Summary Generation: Other Functions
sum_locking(F)->lock_trans(L, S0, S1) :-
    current_function(F),
    entry(P0),
    node_state(P0, L, S0, G0),
    exit(P1),
    node_state(P1, L, S1, G1),
    and(G0, G1, G),
    guard_satisfiable(G).
[Figure: in F( . . ., lock L, . . .), (L, S0, G0) holds at entry point P0 and (L, S1, G1) at exit point P1; if SAT(G0 ∧ G1), then F's summary includes S0 -> S1]
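The summary rule pairs each entry fact with each exit fact and keeps the pair only if both guards can hold at once. The Python sketch uses brute-force satisfiability over predicate-style guards as a stand-in for guard_satisfiable. For the lock-then-unlock example, the exit guards are written by hand (under α the lock ends in error, under ¬α it ends unlocked) rather than computed by the intraprocedural analysis.

```python
from itertools import product
from typing import Callable, Dict, List, Set, Tuple

Env = Dict[str, bool]
Guard = Callable[[Env], bool]


def satisfiable(guard: Guard, variables: List[str]) -> bool:
    """Brute-force stand-in for guard_satisfiable."""
    return any(guard(dict(zip(variables, bits)))
               for bits in product([False, True], repeat=len(variables)))


def generate_summary(lock: str, entry: Dict[str, Guard], exit_: Dict[str, Guard],
                     variables: List[str]) -> Set[Tuple[str, str, str]]:
    """lock_trans(L, S0, S1) for every entry state S0 (guard G0) and exit state
    S1 (guard G1) such that G0 and G1 is satisfiable."""
    summary = set()
    for s0, g0 in entry.items():
        for s1, g1 in exit_.items():
            if satisfiable(lambda env: g0(env) and g1(env), variables):
                summary.add((lock, s0, s1))
    return summary


def alpha(env: Env) -> bool:
    return env["alpha"]


def not_alpha(env: Env) -> bool:
    return not env["alpha"]


# f(..., lock L, ...) { lock(L); ...; unlock(L); }
entry = {"locked": alpha, "unlocked": not_alpha}
exit_ = {"error": alpha, "unlocked": not_alpha}  # hand-written for this example
print(sorted(generate_summary("L", entry, exit_, ["alpha"])))
# [('L', 'locked', 'error'), ('L', 'unlocked', 'unlocked')]
```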
Summary Application Rule
call_transfer(I, L, S0, S1, G) :-
    direct_call(I, F),
    call(P0, _, I),
    sum_locking(F)->lock_trans(CL, S0, S1),
    instantiate(s_call{I}, P0, CL, L, G).
[Figure: in caller G( . . .), a call to F(. . .) at P0 with summary F: S0 -> S1 turns (S0, L, G) before the call into (S1, L, G) after it]
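Applying a callee summary at a call site moves each guard from S0 to S1 after renaming the callee's lock (arg0 in the primitive summaries) to the caller's argument, which is the job instantiate performs. The Python sketch below makes assumptions the slides do not state. The renaming is a plain dictionary from callee-side names to caller-side locks. Locks the call does not receive pass through unchanged. Facts about a passed lock whose state has no summary entry are dropped. Guards are modeled as predicates over boolean variables.

```python
from typing import Callable, Dict, Set, Tuple

Env = Dict[str, bool]
Guard = Callable[[Env], bool]
LockTrans = Tuple[str, str, str]      # callee summary fact: (callee lock, S0, S1)
Facts = Dict[Tuple[str, str], Guard]  # caller facts: (lock, state) -> guard


def apply_summary(summary: Set[LockTrans], binding: Dict[str, str], before: Facts) -> Facts:
    """Facts after the call: for each (CL, S0, S1), rename CL to the caller's
    lock L and give (L, S1) the guard that (L, S0) had before the call."""
    touched = set(binding.values())
    # Assumption: locks not passed to the callee are unaffected by the call.
    after: Facts = {key: g for key, g in before.items() if key[0] not in touched}
    for callee_lock, s0, s1 in summary:
        lock = binding.get(callee_lock)
        if lock is None or (lock, s0) not in before:
            continue  # facts whose S0 has no matching summary entry are dropped
        g = before[(lock, s0)]
        prev = after.get((lock, s1))
        # Several S0 may lead to the same S1: OR their guards together.
        after[(lock, s1)] = g if prev is None else (lambda env, a=prev, b=g: a(env) or b(env))
    return after


lock_summary = {("arg0", "locked", "error"), ("arg0", "unlocked", "locked")}
before = {
    ("L", "locked"): lambda env: env["alpha"],
    ("L", "unlocked"): lambda env: not env["alpha"],
    ("M", "unlocked"): lambda env: True,
}
after = apply_summary(lock_summary, {"arg0": "L"}, before)  # models the call lock(L)
print(sorted(after))  # [('L', 'error'), ('L', 'locked'), ('M', 'unlocked')]
```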
Applications
- Bug finding
- Verification
- Software Understanding
Saturn Bug Finding
- Early work
  - Locking
    - Scalable Error Detection using Boolean Satisfiability. POPL 2005
  - Memory leaks
    - Context- and Path-Sensitive Memory Leak Detection. FSE 2005
  - Scripting languages
    - Static Detection of Security Vulnerabilities in Scripting Languages. 15th USENIX Security Symposium, 2006
- Recent work
  - Inconsistency Checking
    - Static Error Detection Using Semantic Inconsistency Inference. PLDI 2007
Examples: Null pointer dereferences
Application      KLOC  Warnings  Bugs  False Alarms  FA Rate (%)
Openssl-0.9.8b    339        55    47             6        11.30
Samba-3.0.23b     516        68    46            19        29.20
Openssh-4.3p2     155         9     8             1        11.10
Pine-4.64         372       150   119            28        19.00
Mplayer-1.0pre8   762       119    89            28        23.90
Sendmail-8.13.8   365         9     8             1        11.10
Linux-2.6.17.1   6200       373   299            66        18.10
Total            8793       783   616           149        19.50
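The FA Rate column appears to be false alarms divided by bugs plus false alarms, expressed in percent. This is our reading of the numbers, not a definition given on the slide. It does not use the Warnings column, which exceeds Bugs + False Alarms for several rows. A quick Python check of that reading against every row of the table:

```python
# (bugs, false alarms, FA rate as printed), copied from the table above.
ROWS = {
    "Openssl-0.9.8b": (47, 6, 11.30),
    "Samba-3.0.23b": (46, 19, 29.20),
    "Openssh-4.3p2": (8, 1, 11.10),
    "Pine-4.64": (119, 28, 19.00),
    "Mplayer-1.0pre8": (89, 28, 23.90),
    "Sendmail-8.13.8": (8, 1, 11.10),
    "Linux-2.6.17.1": (299, 66, 18.10),
    "Total": (616, 149, 19.50),
}


def fa_rate(bugs: int, false_alarms: int) -> float:
    """False-alarm rate in percent among classified reports (assumed definition)."""
    return 100.0 * false_alarms / (bugs + false_alarms)


for name, (bugs, fas, printed) in ROWS.items():
    computed = fa_rate(bugs, fas)
    print(f"{name:16} {computed:6.2f} (table: {printed:.2f})")
    assert abs(computed - printed) < 0.05, name
```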
Lessons Learned
- Saturn-based tools improve bug-finding
  - Several times as many bugs as previous results
  - Lower false positive rate
- Why?
  - Sounder than previous bug finding tools
    - bit-level modeling, handling casts, aliasing, etc.
  - Precise
    - Fully intraprocedurally path-sensitive
    - Partially interprocedurally path-sensitive
Lessons Learned (Cont.)
- Design of function summary is key to scalability and precision
- Summary-based analysis only looks at the relevant parts of the heap for a given function
- Programmers write functions with simple interfaces
Saturn Verification
- Unchecked user pointer dereferences
  - Important OS security property
  - Also called probing or user/kernel pointers
- Precision requirements
  - Context-sensitive
  - Flow-sensitive
  - Field-sensitive
  - Intraprocedurally path-sensitive
Current Results for Linux-2.6.1
- 6.2 MLOC with 91,543 functions
- Verified 616 / 627 system call arguments
  - 98.2%
  - 11 false alarms
- Verified 851,686 / 852,092 dereferences
  - 99.95%
  - 406 false alarms
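The percentages and false-alarm counts follow directly from the verified and total counts. A one-line check in Python:

```python
def verified_pct(verified: int, total: int) -> float:
    """Share of items proven safe, in percent."""
    return 100.0 * verified / total


for label, verified, total in [("system call arguments", 616, 627),
                               ("dereferences", 851_686, 852_092)]:
    print(f"{label}: {verified_pct(verified, total):.2f}% verified, "
          f"{total - verified} false alarms")
```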
Preliminary Lessons Learned
- Bug finders can be sloppy: they ignore functions or points-to edges that inhibit scalability or precision
- Soundness is substantially more difficult than finding bugs
- Lightweight, sparsely placed annotations
  - Have programmers add some information
  - Makes verification tractable
  - Only 22 annotations needed for user pointer analysis
Saturn for Software Understanding
- A program analysis is a code search engine
- Generic question: Do programmers ever do X?
  - Write an analysis to find out
  - Run it on lots of code
  - Classify the results
  - Write a paper . . .
Examples
- Aliasing is used in very stylized ways, at least in C
  - Cursors into data structures
  - Parent/child pointers
  - And 7 other idioms
  - How is Aliasing Used in Systems Software? FSE 2006
- Do programmers take the address of function ptrs?
  - Answer: Almost never.
  - Allows simpler analysis of function pointers
Other Things We've Thought About
- Shape analysis
  - We notice the lack of shape information
- Interprocedural path-sensitivity
  - Needed for some common programming patterns
- Proving correctness of Saturn analyses
Related Work
- Lots
  - All bug finding and verification tools of the last 10 years
- Particularly, though
  - Systems using logic programming (bddbddb)
  - ESP
  - Metal
  - CQual
  - Blast
saturn.stanford.edu