Analysis is necessary but far from sufficient - PowerPoint PPT Presentation

1
Analysis is necessary but far from sufficient
  • Jon Pincus
  • Reliability Group (PPRC)
  • Microsoft Research

2
  • Why are so few successful real-world development
    and testing tools influenced by program analysis
    research?

3
Outline
  • Provocation
  • Successful tools
  • Analysis in context
  • Implications for analysis
  • Conclusion

4
Success: a simple view
  • A tool is successful if people use it
  • Not if people think it's interesting but don't
    try it
  • Not if people try it but don't use it
  • Not if people buy it but don't use it
    (shelfware)

5
Some examples of success
  • Purify
  • BoundsChecker
  • PREfix (2.X and later)
  • Especially interesting because 1.0 was
    unsuccessful

6
Why do people use a tool? If
  • it helps them get their work done
  • more efficiently than they would otherwise
  • without making them look (or feel) bad.
  • Aside: look at organizational and personal goals.
  • See Alan Cooper's books, e.g. About Face

7
Value vs. Cost
  • Value: the quantified benefit from the tool
  • Cost: primarily time investment
  • Licensing cost is typically much smaller
  • (Value − Cost) must be
  • Positive
  • Positive fairly quickly
  • More positive than any alternatives
  • Value and cost are difficult to estimate
  • and others' estimates are often
    questionable
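The criteria above can be made concrete with a small sketch. This is illustrative only, not from the talk; all numbers and names are invented:

```python
# Hypothetical sketch of the (Value - Cost) adoption criterion:
# net value must be positive, positive fairly quickly, and more
# positive than every alternative. Units are engineer-hours.

def adoption_case(weekly_value_hrs, upfront_cost_hrs, weekly_cost_hrs,
                  horizon_weeks=12, patience_weeks=4):
    """Return (net_value, pays_off_quickly) over a planning horizon."""
    net = -upfront_cost_hrs
    payoff_week = None
    for week in range(1, horizon_weeks + 1):
        net += weekly_value_hrs - weekly_cost_hrs
        if payoff_week is None and net > 0:
            payoff_week = week
    quickly = payoff_week is not None and payoff_week <= patience_weeks
    return net, quickly

def worth_adopting(candidate, alternatives):
    """Positive, positive quickly, and better than every alternative."""
    net, quickly = adoption_case(*candidate)
    return (net > 0 and quickly
            and all(net > adoption_case(*alt)[0] for alt in alternatives))

# A tool saving 5 hrs/week for 2 hrs/week of effort, 10 hrs to set up,
# compared against doing nothing:
print(worth_adopting((5, 10, 2), [(0, 0, 0)]))  # True
```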

8
An example
  • Purify 1.0
  • Virtually zero initial cost on most code bases
  • trial license
  • easy to integrate
  • Immediate value
  • Companies then invested to increase the value
  • E.g., changing memory allocators to better match
    Purify's
  • (and buying lots of licenses)

9
Characteristics of successful tools
  • Successful tools almost always
  • address significant problems,
  • on real code bases,
  • give something for (almost) nothing,
  • and are easy to use.

10
Significant problems
  • Nobody fixes all the bugs.
  • What are the key ones?
  • Often based on most recent scars
  • Often based on development or business goals
  • Examples
  • Purify: memory leaks
  • BoundsChecker: bounds violations
  • Lint (back in K&R days): portability issues

11
Real code bases
  • Large code bases in nasty languages (e.g., C/C++)
  • 1M LOC is medium-sized; 10M LOC is large
  • Or, smaller code bases in different nasty
    languages
  • Perl, JScript, VBScript, HTML/DHTML, TCL/Tk, SQL
  • 5,000 LOC is medium; 50K is large

12
More reality
  • Most code bases involve multiple languages
  • Extensions and incompatibilities, e.g.
  • GCC/G++, MS C++, Sun C++
  • ECMAScript/JScript/JavaScript
  • HTML versions
  • People use all those nasty language features
  • (e.g., casts between pointers and ints, unions,
    bit fields, gotos, …)
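To make the point concrete, here is a hypothetical sketch (not from the slides) of a crude scan for such constructs; a tool whose analysis excludes them excludes the code people actually write. The patterns and names are invented:

```python
import re

# Hypothetical sketch: count "nasty" C constructs that a real-world
# analyzer must handle. Patterns are deliberately crude.
NASTY = {
    "int/pointer cast": re.compile(r"\(\s*(?:int|long|intptr_t)\s*\)"),
    "union": re.compile(r"\bunion\b"),
    "bit field": re.compile(r":\s*\d+\s*;"),
    "goto": re.compile(r"\bgoto\b"),
}

def nasty_report(c_source: str) -> dict:
    return {name: len(pat.findall(c_source)) for name, pat in NASTY.items()}

sample = """
union reg { struct { unsigned lo : 16; unsigned hi : 16; } parts; unsigned all; };
void f(char *ptr) {
    long addr = (long) ptr;      /* cast between pointer and int */
    if (!addr) goto done;
done:
    return;
}
"""
print(nasty_report(sample))
```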

13
Something for (almost) nothing
  • Engineering time is precious
  • Engineers are skeptical
  • so are unwilling to commit their valuable time
  • Don't even think about requiring significant
    up-front investment
  • code modifications
  • process changes

14
Examples: something for (almost) nothing
  • Purify for UNIX: just relink!
  • BoundsChecker: you don't even need to relink!!
  • PREfix 2.X: point your web browser to a URL!!!
  • A non-technology solution: we'll do it for you
  • Commercial variant: an initial benchmark for X
  • Preferably money back if it isn't useful
  • In many cases, money is cheaper than engineering
    time

15
Revolutionary tools
  • People may be willing to do up-front work to
  • Enable something previously impossible
  • Or provide order-of-magnitude improvements
  • BUT!
  • Still must be significant problem, real code base
  • Need compelling evidence of chance for success
  • Any examples?

16
Outline
  • What makes a tool successful?
  • Successful tools
  • Analysis in context
  • Implications for analysis
  • Conclusion

17
PREfix
  • Analyzes C/C++ source code
  • Identifies defects
  • GUI to aid understanding and prioritization
  • Viewing individual defects
  • Sorting/filtering sets of defects
  • Integrates smoothly into existing builds
  • Stores results in database

18
PREfix 2.X Architecture
[Architecture diagram: Source Code, C/C++ Parser, Model Database, Defect Database, Web Browser]
19
Counterintuitively
  • Actual analysis is only a small part of any
    program analysis tool.
  • In PREfix, …

20
3 key non-analysis issues
  • Parsing
  • Integration
  • Build process
  • Defect tracking system
  • SCM system
  • User interaction
  • Information presentation
  • Navigation
  • Control

21
Parsing
  • You can't parse better than anybody else
  • but you can parse worse
  • Complexities
  • Incompatibilities and extensions
  • Full language complexity
  • Language evolution
  • Solution: don't
  • Alternatives: GCC, EDG, …

22
Integration
  • A tool is useless if people can't use it
  • Implied: use it in their existing environment
  • Environment includes
  • Configuration management (SCM)
  • A build process (makefiles, scripts, …)
  • Policies
  • A defect tracking system
  • People have invested hugely in their environment
  • They probably won't change it just for one tool

23
User interaction
  • Engineers must be able to
  • Use the analysis results
  • Understanding individual defects
  • Prioritizing, sorting, and filtering sets of
    defects
  • Interact with other engineers
  • Influence the analysis
  • Current tools are at best okay here
  • Improvement is highly leveraged

24
Example: Noise
  • Noise: messages people don't care about
  • Noise can result from
  • Incorrect tool requirements
  • Integration issues
  • Usability issues (e.g., unclear messages)
  • Analysis inaccuracies

25
Dealing with noise
  • Improving analysis is usually not sufficient
  • May be vital; may not be required
  • Successful user interaction techniques
  • Filtering
  • History
  • Prioritization
  • Improving presentation, navigation
  • Providing more detail
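The filtering, history, and prioritization techniques above can be sketched roughly as follows; this is a hypothetical illustration with invented names and fields, not PREfix's actual mechanism:

```python
# Hypothetical sketch of noise handling: filter suppressed warning
# classes, drop defects already seen in a previous run (history), and
# sort the rest by priority.
from dataclasses import dataclass

@dataclass(frozen=True)
class Defect:
    file: str
    line: int
    kind: str
    priority: int  # lower = more urgent

def triage(defects, suppressed_kinds, previously_seen):
    fresh = [d for d in defects
             if d.kind not in suppressed_kinds
             and (d.file, d.line, d.kind) not in previously_seen]
    return sorted(fresh, key=lambda d: d.priority)

defects = [
    Defect("io.c", 42, "leak", 1),
    Defect("io.c", 90, "uninit", 0),
    Defect("ui.c", 7, "style", 3),
]
seen = {("io.c", 42, "leak")}  # reported (and kept or fixed) last run
print([d.kind for d in triage(defects, {"style"}, seen)])  # ['uninit']
```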

26
Outline
  • What makes a tool successful?
  • Characteristics of successful tools
  • Analysis in context
  • Implications for analysis
  • Conclusion

27
Characteristics of useful analyses
  • Scalable to large enough system
  • Typically implies incomplete, unsound,
    decomposable, and/or very simple
  • Accurate enough for the task at hand
  • Produce information usable by typical engineer
  • E.g., if there's a defect: where? How? Why?
  • Remember: half the engineers are below average
  • Handle full language complexity
  • (or degrade gracefully for unhandled constructs)
  • Handle partial programs

28
Analyses are not useful if
  • They don't apply to the tool's reality
  • "For a subset of C, excluding pointers and
    structs"
  • "We have tested our approach on programs up to
    several thousand lines of Scheme"
  • They assume up-front work for the end user
  • "Once the programmer modifies the code to include
    calls to the appropriate functions"
  • "The programmer simply inserts the annotations to
    be checked as conventional comments"

29
Different tradeoffs from compilers
  • Focus on information, not just results
  • Compilers don't have to explain what they did and
    why
  • Unsoundness is death for optimization but may
    be okay for other purposes
  • Intra-procedural analysis often not enough

30
Types of analyses
  • FCIA: flow- and context-insensitive analysis
  • FSA: flow-sensitive analysis
  • CSA: context-sensitive analysis
  • FCSA: flow- and context-sensitive analysis
  • PSA: path-sensitive analysis
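As a hypothetical illustration of the difference between the first two levels, a toy analysis of a variable's null-ness: the flow-insensitive view merges every assignment, while the flow-sensitive view keeps a state per program point. The code is invented, not from any real tool:

```python
# Toy sketch: compare flow-insensitive and flow-sensitive views of a
# variable over a straight-line statement list. Each statement is
# (var, value) where value is "null" or "nonnull".
stmts = [("p", "null"), ("p", "nonnull"), ("q", "null")]

def flow_insensitive(stmts):
    # One merged fact per variable: the union of every value it is
    # ever assigned, regardless of order.
    facts = {}
    for var, val in stmts:
        facts.setdefault(var, set()).add(val)
    return facts

def flow_sensitive(stmts):
    # One state per program point: the most recent assignment wins.
    state, trace = {}, []
    for var, val in stmts:
        state[var] = val
        trace.append(dict(state))
    return trace

print(sorted(flow_insensitive(stmts)["p"]))   # ['nonnull', 'null']: can't tell
print(flow_sensitive(stmts)[-1]["p"])         # 'nonnull': safe at this point
```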

31
Performance vs. Accuracy
32
Don't forget information!
33
Example analysis tradeoffs
  • PREfix: scalable, usable analysis results
  • Path-sensitive
  • Incomplete (limit of paths traversed)
  • Unsound (many approximations)
  • Major emphasis on summarization (models)
  • PREfast: fast, usable analysis results
  • Local analyses, using PREfix models
  • Flow-insensitive and flow-sensitive analyses
  • Far less complete than PREfix

34
Aside: Techniques for scalability
  • Decompose the problem
  • Use the existing structure (function, class,
    etc.)
  • Summarization, memoization
  • Caveat: make sure you don't lose key info!
  • Give up completeness and soundness
  • Use three-valued logic with a "don't know" state
  • Track approximations to limit the damage
  • Examine and re-examine tradeoffs!!!!
  • Optimize for significant special cases

35
Outline
  • What makes a tool successful?
  • Characteristics of successful tools
  • Analysis in context
  • Implications for analysis
  • Conclusion

36
Recap: successful tools
  • People use tools to accomplish their tasks
  • Successful tools must
  • address real problems,
  • on real code bases,
  • give something for (almost) nothing,
  • and be easy to use
  • Analysis is only one piece of a tool
  • Information is useless if it's not presented well

37
One person's opinion
  • Why are so few successful real-world development
    and testing tools influenced by program analysis
    research?
  • Several key areas are outside the traditional
    scope of program analysis research
  • User interaction
  • Visualization (of programs and analysis results)
  • Integration

38
One person's opinion (cont.)
  • Why are there so few successful real-world
    programming and testing tools based on academic
    research?
  • Program analysis research in general
  • Not directly focused on key problems
  • Not applicable to real world code bases
  • Makes unrealistic assumptions about up-front work

39
One tool developer's mindset
  • We have plenty of ideas already.
  • We can't even implement all our pet projects!
  • We are interested in new ideas but skeptical
  • The burden is on you to show relevance
  • Remember, analysis is only part of our problem
  • If we can't figure out how to present it, forget
    it

40
Making analysis influential
  • Show how the analysis addresses a significant
    problem
  • Synchronization, security, …
  • Convince us that it will work in our reality
  • Avoid the obvious problems discussed above
  • Demonstrate in our reality
  • (perhaps by using real-world code bases)
  • or persuade us that it will work

41
Some interesting questions
  • Which analyses are right for which problems?
  • How to get difficult analyses to scale well?
  • Are there soundness/completeness tradeoffs?
  • Are there opportunities to combine analyses?
  • Can we use a cheap flow-insensitive algorithm to
    focus a more expensive algorithm on juicy places?
  • Can we use expensive local path-sensitive
    algorithms to improve global flow-insensitive
    algorithms?
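The combination in the last two bullets can be sketched as a two-phase pipeline; everything here (function bodies, passes, names) is invented for illustration:

```python
# Invented sketch: a cheap flow-insensitive pass focuses an expensive one.

def cheap_pass(fn_body):
    # Flag any function that even mentions both allocation and a
    # return: a crude over-approximation of "might leak".
    return "malloc" in fn_body and "return" in fn_body

def expensive_pass(fn_body):
    # Placeholder for a costly path-sensitive analysis that would be
    # too expensive to run on every function; here, a stricter check.
    return "free" not in fn_body

functions = {
    "f": "p = malloc(n); if (!p) return; free(p); return;",
    "g": "p = malloc(n); if (err) return;",   # leaks on the error path
    "h": "x = y + 1;",
}

candidates = [name for name, body in functions.items() if cheap_pass(body)]
leaks = [name for name in candidates if expensive_pass(functions[name])]
print(candidates)  # ['f', 'g'] -- 'h' never reaches the expensive pass
print(leaks)       # ['g']
```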

42
Beyond analysis
  • Can visualization and user interaction for
    analysis tools become an interesting research
    area?
  • How can analysis be used to refine visualization
    and user interaction?

43
Questions?
44
Analysis is necessary but far from sufficient
  • Jon Pincus
  • Reliability Group (PPRC)
  • Microsoft Research