1
Data Modeling for Program Analysis
  • Scott McPeak
  • OSQ Retreat

2
A Program Verifier
  • Verification assures that a program meets some
    specification, e.g. "no segfaults"
  • Full correctness vs. partial specs
  • This is undecidable, hence annotations

[Figure: the verifier takes a Program, a Specification, and Annotations; annotations supply useful facts and create new obligations.]
3
Verifier Architecture
[Figure: program, annotations, and specification (hardcoded) feed verification condition generation (semantics), which emits predicates (collectively imply program meets spec) to the theorem prover, which answers "proved" or "not proved".]
4
Verification Benefits
  • Potential for reducing costs of testing and
    debugging is enormous
  • Memory safety
  • Concurrency safety
  • Adherence to domain-specific protocols
  • Annotation appeal: capture "why" info
  • Could prove absence of certain security violations

5
Run Time is Too Late
  • Doesn't reduce testing cost
  • Run-time cost may be significant
  • Cumulative across different analyses
  • Recovery after run-time failure?
  • Delay between introduction of a bug and the
    discovery of its effect

6
Will Anyone Annotate?
  • Of course, if cost/benefit ratio is right
  • Benefits can be high (previous slide)
  • Abstraction is key to controlling cost
  • Can re-use "why" knowledge: libraries, etc.
  • Common tasks must be easy (e.g. array of non-null
    elements)
  • Module-wide defaults under user control

7
Development Model
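[Figure: code → compile → verifier → testing → ...; compile reports a type error, the verifier a failed proof, testing wrong behavior. A failed proof goes to a diagnosis assistant, which gives an explanation leading to a fix; wrong behavior goes to debugging, leading to a fix.]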
8
Data Modeling
  • Program analyzer must abstract application data
    (otherwise it's just executing!)
  • Model: family of mathematical objects, and axioms
    which relate them
  • Enormous design space, little guidance
  • Direct impact on success of analysis

9
Example: Strings
  • Initial model: two function symbols
  • size(addr): # of allocated bytes
  • strlen(addr): least index of a 0 byte
  • strcpy(d, s): pre: size(d) > strlen(s); post:
    strlen(d) = strlen(s)
  • strcat(d, s): pre: size(d) - strlen(d) >
    strlen(s); post: strlen(d) = pre(strlen(d) +
    strlen(s))
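
The two-symbol model above can be sketched in a few lines of Python. This is only an illustration, not the verifier's implementation: the names AbstractString, strcpy_model, and strcat_model are hypothetical, and the preconditions are read as "the destination has room for the characters plus the terminating 0 byte".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractString:
    """Abstract view of a buffer: only two facts are tracked (hypothetical type)."""
    size: int    # number of allocated bytes
    strlen: int  # least index of a 0 byte


def strcpy_model(d: AbstractString, s: AbstractString) -> AbstractString:
    # pre: the destination must hold the source characters plus the 0 byte.
    if not d.size > s.strlen:
        raise ValueError("strcpy precondition violated")
    # post: strlen(d) = strlen(s); the allocation size is unchanged.
    return AbstractString(d.size, s.strlen)


def strcat_model(d: AbstractString, s: AbstractString) -> AbstractString:
    # pre: the free space left in d must exceed strlen(s).
    if not d.size - d.strlen > s.strlen:
        raise ValueError("strcat precondition violated")
    # post: strlen(d) = pre(strlen(d) + strlen(s)).
    return AbstractString(d.size, d.strlen + s.strlen)
```

Note what the model cannot express: it knows how long a string is but nothing about which characters it holds, which is what motivates the richer models on the following slides.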

10
String as a Set
  • Add the predicate contains(addr, ch) → {T, F}
  • strcpy(d, s): post: ∀ ch. contains(s, ch) ⇔
    contains(d, ch)
  • strchr(s, ch) → r: post: contains(s, ch) ⇒ ∃ i. r =
    s+i ∧ ¬contains(s, ch) ⇒ r = NULL
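
A minimal sketch of the set model, assuming the abstraction of a concrete buffer is "the set of characters before the first 0 byte". The function names are hypothetical, and characters are represented as integer byte values.

```python
def abstract_as_set(data: bytes) -> frozenset:
    """Abstraction function: characters that occur before the first 0 byte."""
    end = data.index(0)
    return frozenset(data[:end])


def contains(chars: frozenset, ch: int) -> bool:
    """The predicate contains(addr, ch), evaluated on the abstract set."""
    return ch in chars


def strcpy_model(d_chars: frozenset, s_chars: frozenset) -> frozenset:
    # post: for every ch, contains(s, ch) <=> contains(d, ch),
    # so the destination's set simply becomes the source's set.
    return s_chars


def strchr_is_null(s_chars: frozenset, ch: int) -> bool:
    # post: if s does not contain ch, the result is NULL;
    # otherwise the result is s+i for some i (the set model cannot say which i).
    return not contains(s_chars, ch)
```

The set model answers "can strchr return NULL here?" but, having discarded positions, it cannot say where in the string the result points.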

11
String as a Sequence
  • Add another symbol "[]": addr[i] → ch
  • strcpy(d, s): post: ∀ i. d[i] = s[i]
  • strchr(s, ch) → r: post: (∃ i. s[i] = ch) ⇒ *r = ch ∧
    ¬(∃ i. s[i] = ch) ⇒ r = NULL
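
The sequence model can be sketched the same way. Here a string is a tuple of byte values indexed by position, the quantifier in the strcpy postcondition is restricted to indices up to and including the 0 byte (an assumption made for this sketch), and a returned pointer is represented by its offset into s, with None standing in for NULL.

```python
from typing import Optional, Tuple


def strcpy_model(s: Tuple[int, ...]) -> Tuple[int, ...]:
    """post: d[i] = s[i] for every i up to and including the 0 byte."""
    end = s.index(0)
    return tuple(s[: end + 1])


def strchr_model(s: Tuple[int, ...], ch: int) -> Optional[int]:
    """Return the least offset i with s[i] == ch, or None (NULL) if there is none."""
    end = s.index(0)
    for i in range(end + 1):  # the terminator itself can be searched for
        if s[i] == ch:
            return i
    return None
```

Unlike the set model, this one can establish that the character at the returned position equals ch, at the cost of reasoning about every index.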

12
Example: Integers
  • "int" is easy to model, right? Well...
  • Mathematical integers
  • Finite partition: <0, 0, 1, >1
  • 32-bit 2's complement with wraparound
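
The three candidate models give different answers for the same program value. A small sketch (function names are hypothetical) makes the difference concrete: Python's built-in int already behaves as a mathematical integer, so only the other two models need code.

```python
def wrap32(n: int) -> int:
    """32-bit two's complement with wraparound."""
    return (n + 2**31) % 2**32 - 2**31


def partition(n: int) -> str:
    """Finite partition of the integers into the classes <0, 0, 1, >1."""
    if n < 0:
        return "<0"
    if n == 0:
        return "0"
    if n == 1:
        return "1"
    return ">1"


INT_MAX = 2**31 - 1
# Mathematical integers: INT_MAX + 1 is simply a larger number.
# Wraparound model: INT_MAX + 1 becomes the most negative value.
# Partition model: both INT_MAX and INT_MAX + 1 are just ">1".
```

Which model is right depends on the property being checked: overflow bugs are invisible in the first and third models, while the second is far more expensive for a theorem prover.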

13
Example: Memory
[Figure: mem is indexed by toplevel obj addr; a struct is indexed by field offsets, an array by int indexes. Objects shown: "x", "a" (with field "a.g"), and a malloc(..) block.]
  • "x" = sel(mem0, addr_x)
  • "a.g[3]" = sel(sel(sel(mem0, addr_a), g), 3)
14
Pointers
  • Pointers are access paths
  • "&(a.g[3])" = sub(sub(sub(whole, a), g), 3)
  • Rules to read via pointers
  • Can also write, do pointer arithmetic, deeper
    indexing, e.g. "&(p->x)"
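
The hierarchical memory and access-path ideas can be sketched together. This is an illustration under assumptions, not the verifier's encoding: memory is a nested Python dict/list, sel is plain indexing, a pointer is a tuple of steps built with sub, and the helpers read, write, and ptr_add are hypothetical names.

```python
WHOLE = ()  # the empty access path, denoting all of memory


def sel(m, key):
    """Select one component (object, field, or array element) of m."""
    return m[key]


def sub(path, step):
    """Extend an access path by one step."""
    return path + (step,)


def read(mem, path):
    """Read via a pointer: apply sel once per step of the path."""
    obj = mem
    for step in path:
        obj = sel(obj, step)
    return obj


def write(mem, path, value):
    """Functional update: return a new memory; the old one is untouched."""
    if not path:
        return value
    head, rest = path[0], path[1:]
    new = dict(mem) if isinstance(mem, dict) else list(mem)
    new[head] = write(mem[head], rest, value)
    return new


def ptr_add(path, n):
    """Pointer arithmetic: shift the final (array index) step by n."""
    return path[:-1] + (path[-1] + n,)


mem0 = {"x": 8, "a": {"g": [10, 11, 12, 13]}}
p = sub(sub(sub(WHOLE, "a"), "g"), 3)  # models &(a.g[3])
```

Because write copies only the containers along the path, the old memory stays available, which is what lets a postcondition compare pre- and post-states.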

15
Data Structure Invariants
  • Classic approach: universal quantifier
  • ∀ a. type(a) = Foo ⇒ a->x = a->y + 1
  • Field admission predicate
  • Bar *p admission p != NULL
  • Object state field: "ok" vs. "not ok"
  • Change a field → state = "not ok"
  • Manually certify "ok"; precondition = invariant
  • ∀ a. type(a) = Foo ⇒ a->state = "ok"
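
The object-state approach can be sketched as follows. The class Foo and its invariant x == y + 1 are hypothetical examples chosen for this sketch; the point is only the protocol: any field write drops the object to "not ok", and certify restores "ok" after checking the invariant.

```python
class Foo:
    """Hypothetical object with invariant x == y + 1 and a state field."""

    def __init__(self, y: int):
        # Bypass __setattr__ so that construction starts in the "ok" state.
        self.__dict__.update(x=y + 1, y=y, state="ok")

    def __setattr__(self, name, value):
        # Changing any field puts the object into the "not ok" state.
        self.__dict__[name] = value
        self.__dict__["state"] = "not ok"

    def invariant(self) -> bool:
        return self.x == self.y + 1

    def certify(self) -> None:
        # Manually certify "ok": the precondition is the invariant itself.
        if not self.invariant():
            raise ValueError("invariant does not hold")
        self.__dict__["state"] = "ok"
```

With this protocol the global invariant becomes "every Foo is in state ok", and a function that temporarily breaks x == y + 1 only has to re-certify the objects it touched.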

16
Example: Change Sets
  • Globals: list of changed / list of unchanged
  • Not ideal... name sets of globals?
  • Hierarchical mem: changed object is easy
  • new = update(old, obj_addr, some_value)
  • But changed field (of many objects) is hard
  • Possible alternative: staged weakened
    invariants, which state what is still true rather than
    naming what has changed
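
A short sketch of why the two cases differ, assuming memory is a dict from object address to a dict of fields. The update function follows the slide; the changed helper is a hypothetical name for computing a change set.

```python
def update(old: dict, obj_addr, some_value) -> dict:
    """new = update(old, obj_addr, some_value): everything else is unchanged."""
    new = dict(old)
    new[obj_addr] = some_value
    return new


def changed(old: dict, new: dict) -> set:
    """The change set: addresses whose contents differ between old and new."""
    return {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}


old = {"a": {"x": 1, "y": 2}, "b": {"x": 3, "y": 4}}

# Changing one object is a single update; the change set is one address.
one_object = update(old, "a", {"x": 9, "y": 2})

# Changing field x of every object touches every address, even though
# each object's y field is still exactly what it was.
one_field = {addr: {**obj, "x": 0} for addr, obj in old.items()}
```

In the second case the object-level change set names everything, so it no longer tells the caller that y is preserved; stating what is still true is the alternative the slide proposes.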

17
Conclusions
  • Try to capture invariants implicitly, via
    representation choices
  • Be explicit about related entities:
    inDegree(n) = d vs. inDegree1(n, referrer)
  • Let user select among possible models, even to
    choose not to model certain fields
  • Try to think like a programmer