Title: Evaluating Static Analysis Tools
1Evaluating Static Analysis Tools
- Dr. Paul E. Black
- paul.black_at_nist.gov
- http://samate.nist.gov/
2Static and Dynamic Analysis Complement Each Other
- Static Analysis
- Examine code
- Handles unfinished code
- Can find backdoors, e.g., full access for user name JoshuaCaleb
- Potentially complete
- Dynamic Analysis
- Run code
- Code not needed, e.g., embedded systems
- Has few(er) assumptions
- Covers end-to-end or system tests
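The backdoor bullet in the Static Analysis list can be made concrete with a small sketch. Everything here is hypothetical (the `check_login` function, the `_PASSWORDS` store, the sample account); only the user name comes from the slide. The point is that the special branch is obvious to anyone, or any tool, that examines the code, while black-box testing would have to guess the magic name.

```python
import hmac

# Hypothetical credential store, for illustration only.
_PASSWORDS = {"alice": "correct horse"}


def check_login(user: str, password: str) -> bool:
    """Return True if the login should be accepted."""
    # Backdoor: one hard-coded user name gets access with any password.
    # Reading the code reveals this branch immediately; running the
    # code only reveals it if a test happens to try this exact name.
    if user == "JoshuaCaleb":
        return True

    expected = _PASSWORDS.get(user)
    if expected is None:
        return False
    # Constant-time comparison of the stored and supplied passwords.
    return hmac.compare_digest(expected.encode(), password.encode())
```

A dynamic test suite that only exercises known accounts would pass without ever touching the first branch.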
3Different Static Analyzers Are Used For Different Purposes
- To check intellectual property violation
- By developers to decide if anything needs to be fixed (and learn better practices)
- By auditors or reviewers to decide if it is good enough for use
4Dimensions of Static Analysis
- Analysis can look for general or application-specific properties
- Analysis can be on source code, byte code, or binary
- The level of rigor can vary from syntactic to fully formal.
[Figure: three axes. Properties: General (implicit) to Application (explicit). Code: Source, Byte code, Binary. Level of Rigor.]
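To illustrate the syntactic end of the rigor axis, here is a minimal sketch of a purely textual check on C source code. The rule list `RISKY_CALLS`, the `scan` function, and the sample snippet are assumptions made for illustration; they do not reproduce any particular tool.

```python
import re

# Hypothetical rule set: C library calls that a purely syntactic
# scanner might flag. Not taken from any real tool's rule base.
RISKY_CALLS = ("strcpy", "strcat", "sprintf", "gets")
_PATTERN = re.compile(r"\b(" + "|".join(RISKY_CALLS) + r")\s*\(")


def scan(source: str):
    """Yield (line_number, function_name) for each textual match."""
    for number, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            yield number, match.group(1)


SAMPLE = """\
char buf[8];
strcpy(buf, input);
/* strcpy(buf, "ok") in a comment is still matched */
snprintf(buf, sizeof buf, "%s", input);
"""

findings = list(scan(SAMPLE))
print(findings)
```

The match inside the comment on line 3 shows the limit of this level of rigor: the check sees text, not program meaning. More rigorous analyses parse the code and track data and control flow.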
5SATE 2008 Overview
- Static Analysis Tool Exposition (SATE) goals
- Enable empirical research based on large test sets
- Encourage improvement of tools
- Speed adoption of tools by objectively demonstrating their use on real software
- NOT to choose the best tool
- Co-funded by NIST and DHS, Natl Cyber Security Division
- Participants
- Aspect Security ASC
- Checkmarx CxSuite
- Flawfinder
- Fortify SCA
- Grammatech CodeSonar
- HP DevInspect
- SofCheck Inspector for Java
- UMD FindBugs
- Veracode SecurityReview
6SATE 2008 Events
- Telecons, etc. to come up with procedures and goals
- We chose 6 C and Java programs with security implications and gave them to tool makers (15 Feb)
- Tool makers ran tools and returned reports (29 Feb)
- We analyzed reports and (tried to) find ground truth (15 Apr)
- We expected a few thousand warnings; we got over 48,000.
- Critique and update rounds with some tool makers (13 May)
- Everyone shared observations at a workshop (12 June)
- We released our final report and all data 30 June 2009
- http://samate.nist.gov/index.php/SATE.html
7SATE 2008: There's No Such Thing as One Weakness
- Only 1/8 to 1/3 of weaknesses are simple.
- The notion breaks down when
- weakness classes are related and
- data or control flows are intermingled.
- Even location is nebulous.
8How Weakness Classes Relate
- Hierarchy
- Chains
- lang=%2e./%2e./%2e/etc/passwd%00
- Composites
- from "Chains and Composites", Steve Christey, MITRE, http://cwe.mitre.org/data/reports/chains_and_composites.html
9Intermingled Flow: 2 sources, 2 sinks, 4 paths. How many weakness sites?
free line 1503
free line 2644
use line 819
use line 808
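The counting question on this slide can be sketched in a few lines. The line numbers are the ones shown on the slide; the assumption (implied by "4 paths") is that each free can reach each use. The three counting conventions are illustrative, not an official scoring rule.

```python
from itertools import product

# Line numbers from the slide. Assumption: every free (source) can
# reach every use (sink), which gives the slide's four paths.
frees = [1503, 2644]  # sources
uses = [819, 808]     # sinks

paths = list(product(frees, uses))

# Three defensible answers to "how many weakness sites?"
counts = {
    "by source": len(frees),
    "by sink": len(uses),
    "by path": len(paths),
}

for convention, n in counts.items():
    print(f"{convention}: {n}")
```

Two tools that report the same underlying problem could therefore disagree on the count by a factor of two simply because they key their warnings differently.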
10Other Observations
- Tools can't catch everything: cleartext transmission, unimplemented features, improper access control, ...
- Tools catch real problems: XSS, buffer overflow, cross-site request forgery
- 13 of SANS Top 25 (21 with related CWEs)
- Tools reported some 200 different kinds of weaknesses
- Buffer errors still very frequent in C
- Many XSS errors in Java
- Raw report rates vary by 3x depending on code
- Tools are even more helpful when tuned
- Coding without security in mind leaves MANY weaknesses
11Current Source Code Security Analyzers Have Little Overlap
- Non-overlap: Hits reported by one tool and no others (84%)
- Overlap: Hits reported by more than one tool (16%)
[Pie chart, from MITRE: the overlap share is broken down into hits reported by 2 tools, 3 tools, 4 tools, and all 5 tools.]
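An overlap breakdown of this kind can be computed with simple set bookkeeping. The sketch below uses made-up tool names and findings; it also assumes that two tools report "the same hit" when file, line, and weakness class all match, which is a simplification given how nebulous location can be.

```python
from collections import Counter

# Hypothetical findings: tool name -> set of (file, line, weakness class).
reports = {
    "tool_a": {("a.c", 10, "CWE-120"), ("a.c", 42, "CWE-476")},
    "tool_b": {("a.c", 10, "CWE-120"), ("b.c", 7, "CWE-79")},
    "tool_c": {("c.c", 3, "CWE-416")},
}


def overlap_histogram(reports):
    """Map 'number of tools reporting a hit' to 'number of distinct hits'."""
    tools_per_hit = Counter()
    for hits in reports.values():
        # hits is a set, so each tool counts a given hit at most once.
        tools_per_hit.update(hits)
    return Counter(tools_per_hit.values())


histogram = overlap_histogram(reports)
total = sum(histogram.values())
non_overlap_share = histogram[1] / total

print(dict(histogram))
print(f"reported by exactly one tool: {non_overlap_share:.0%}")
```

With a looser matching rule (for example, same file and weakness class within a few lines), the overlap share would grow, so the matching rule should be stated alongside any such percentage.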
12Precision & Recall Scoring
- The Perfect Tool: Finds all flaws and finds only flaws
[Chart, from DoD. Labels: "Finds more flaws", "Finds mostly flaws", "Better", and a scale from "No True Positives" to "All True Positives".]
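The two qualities on this slide correspond to the standard definitions of recall ("finds all flaws") and precision ("finds only flaws"). A minimal sketch, using made-up flaw identifiers and assuming ground truth is known:

```python
def precision_recall(reported: set, actual: set):
    """Return (precision, recall) for one tool against known ground truth.

    precision = true positives / everything the tool reported
    recall    = true positives / every real flaw in the code
    """
    true_positives = len(reported & actual)
    # Convention chosen here: an empty denominator yields 0.0.
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical ground truth and tool output (flaw identifiers).
actual_flaws = {1, 2, 3, 4}
tool_output = {1, 2, 9}

p, r = precision_recall(tool_output, actual_flaws)
print(f"precision={p:.2f} recall={r:.2f}")

# The perfect tool reports exactly the real flaws.
print(precision_recall(actual_flaws, actual_flaws))
```

In practice the hard part is the ground truth itself, as the SATE experience of trying to establish it suggests.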
13Tool A
[Chart, from DoD, with a scale from "No True Positives" to "All True Positives". Flaw types plotted: Use after free; TOCTOU; Tainted data/Unvalidated user input; Memory leak; All flaw types; Uninitialized variable use; Null pointer dereference; Buffer overflow; Improper return value use.]
14Tool B
[Chart, from DoD, with a scale from "No True Positives" to "All True Positives". Flaw types plotted: Command injection; Tainted data/Unvalidated user input; Format string vulnerability; Improper return value use; Use after free; Buffer overflow; TOCTOU; All flaw types; Uninitialized variable use; Memory leak; Null pointer dereference.]
15Best Tool
[Chart, from DoD, with a scale from "No True Positives" to "All True Positives". Flaw types plotted: Format string vulnerability; Tainted data/Unvalidated user input; Command injection; Improper return value use; Buffer overflow; Null pointer dereference; Use after free; TOCTOU; Memory leak; Uninitialized variable use.]
16Tools Useful in Quality Plains
- Tools alone are not enough to achieve the highest peaks of quality.
- In the plains of typical quality, tools can help.
- If code is adrift in a sea of chaos, train developers.
[Photo: Tararua mountains and the Horowhenua region, New Zealand. Swazi Apparel Limited, www.swazi.co.nz, used with permission.]
17Tips on Tool Evaluation
- Start with many examples covering code complexities and weaknesses
- SAMATE Reference Dataset (SRD) http://samate.nist.gov/SRD
- Many cases from MIT: Lippmann, Zitser, Leek, Kratkiewicz
- Add some of your typical code.
- Look for
- Weakness types (CWEs) reported
- Code complexities handled
- Traces, explanations, and other analyst support
- Integration and machine-readable reports
- Ability to write rules and ignore known good code
- False alarm ratio (fp/tp) is a poor measure. Report density (r/kLoc) is probably better.
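The two measures in the last tip are easy to compute side by side. The sketch below uses invented numbers; the function names are illustrative. One plausible reason the ratio is a poor measure (an interpretation, not stated on the slide) is that it depends on how many real flaws the code contains: on clean code the true-positive count approaches zero and the ratio blows up regardless of tool quality.

```python
def false_alarm_ratio(false_positives: int, true_positives: int) -> float:
    """fp/tp: undefined when the tool finds no true positives."""
    if true_positives == 0:
        raise ValueError("fp/tp is undefined when tp == 0")
    return false_positives / true_positives


def report_density(reports: int, lines_of_code: int) -> float:
    """Reports per thousand lines of code (r/kLoc)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return reports / (lines_of_code / 1000)


# Hypothetical run: 40 reports on 20,000 lines, 10 of them confirmed.
ratio = false_alarm_ratio(false_positives=30, true_positives=10)
density = report_density(reports=40, lines_of_code=20_000)

print(f"false alarm ratio: {ratio:.1f}")
print(f"report density:    {density:.1f} per kLoc")
```

Report density needs no ground truth, so it can be computed directly from a raw tool report and a line count.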