Title: Automated Vulnerability Analysis: Leveraging Control Flow for Evolutionary Input Crafting
1 Automated Vulnerability Analysis: Leveraging Control Flow for Evolutionary Input Crafting
- Sherri Sparks, Shawn Embleton, Ryan Cunningham, and Cliff Zou
- School of Electrical Engineering and Computer Science, University of Central Florida
- December 2007
- ACSAC
2 Vulnerability Analysis
- Involves discovering the subset of a program's input space with which a malicious user can exploit logic errors to drive the program into an insecure state
- The complexity of modern software makes complete exploration of the program state space an intractable problem
3 Motivation
- Security researchers and attackers have often already located a potentially vulnerable location in a system (software or hardware)
  - C programs have well-known, potentially vulnerable API functions (e.g., strcpy())
  - A critical hardware component that handles user input
- Exploitability implies reachability
- To determine whether a potential vulnerability is exploitable, one must show that
  - It is reachable on a runtime execution path
  - It depends on, or can be influenced by, user-supplied input
- Testing: intelligent input generation to improve code coverage
4 An Input Crafting Problem
- What does the input have to look like to exercise the code path between the input node (recv) and the potentially vulnerable node (strcpy)?
[Figure: Control Flow Graph (CFG) in which parsing and validation logic lies on the path between recv and strcpy]
- Testing: intelligently generate inputs that can reach a code region for intensive testing
5 TFTP Control Flow Graph
6 Basic Idea of Our Approach
- Some inputs are better than others
  - They increase coverage by reaching previously unexplored areas of the CFG
  - They are on a path to a basic block where potentially vulnerable APIs are used
- Find new, improved inputs with a Genetic Algorithm (GA)
  - Mate the best of the inputs found so far to produce a new generation of inputs
- Propose a Dynamic Markov Model for measuring inputs
- Apply Grammatical Evolution to shrink the input search space
7 Short Review: Genetic Algorithms
- A stochastic optimization algorithm that mimics evolution
- Requires two things
  - A representation
    - What should a solution look like?
    - Binary string, ASCII string, integer
  - A fitness function
    - Tells how good or bad each solution is
8 Short Review: Genetic Algorithms
- It works like this
  1. Start with a population (set) of random solutions
  2. Find each solution's fitness
  3. Select solutions with high fitness values
  4. Generate new solutions through mutation and crossover on the selected solutions
  5. GOTO 2 (the next generation)
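The loop above can be sketched in a few lines of Python. This is a toy illustration, not the authors' fuzzer: the bit-string representation, the "count the 1 bits" fitness, and every parameter value and name (`evolve`, `keep`, `mutation_rate`) are assumptions chosen only to make the five steps concrete.

```python
import random

def evolve(fitness, length=16, pop_size=20, generations=50,
           mutation_rate=0.05, keep=0.5, seed=0):
    """Toy GA over bit strings; returns the best individual found."""
    rng = random.Random(seed)
    # Step 1: random initial population.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Steps 2-3: score every solution and keep the fittest fraction.
        population.sort(key=fitness, reverse=True)
        parents = population[:max(2, int(pop_size * keep))]
        # Step 4: refill the population by crossover and mutation.
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)            # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]                # bit-flip mutation
            children.append(child)
        # Step 5: the survivors plus their children form the next generation.
        population = parents + children
    return max(population, key=fitness)

# Hypothetical fitness: the number of 1 bits in the string.
best = evolve(fitness=sum)
print(best, sum(best))
```

Because the selected parents are carried over unchanged, the best fitness in the population never decreases from one generation to the next.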
9 Grammatical Evolution in Generating Inputs
- Efficiently reduces the search space
- Flexible in using partial knowledge of the input format (a user-specified context-free grammar)
- Not used in any previous approach
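Grammar (each column is one production choice):
        0      1      2
S  ->   sAs    xBx    m
A  ->   bBb    B
B  ->   aAa    C      AB
C  ->   c      d      e
Choice sequence: 1 0 0 1 1
Derivation: S -> xBx -> xaAax -> xabBbax -> xabCbax -> xabdbax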
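The derivation above can be reproduced with a short Python sketch. It expands the leftmost non-terminal at each step, using one number from the choice sequence per expansion. Taking the number modulo the count of available productions is the usual grammatical-evolution convention and is an assumption here, as are the names `GRAMMAR` and `derive`.

```python
# Grammar from the slide: upper-case letters are non-terminals.
GRAMMAR = {
    "S": ["sAs", "xBx", "m"],
    "A": ["bBb", "B"],
    "B": ["aAa", "C", "AB"],
    "C": ["c", "d", "e"],
}

def derive(choices, start="S", max_steps=100):
    """Return every sentential form produced while consuming `choices`."""
    sentence = start
    steps = [sentence]
    remaining = iter(choices)
    for _ in range(max_steps):
        # Locate the leftmost non-terminal, if any.
        idx = next((i for i, ch in enumerate(sentence) if ch in GRAMMAR), None)
        if idx is None:
            break                      # only terminals left: done
        choice = next(remaining, None)
        if choice is None:
            break                      # ran out of choices before finishing
        options = GRAMMAR[sentence[idx]]
        production = options[choice % len(options)]   # assumed wrap-around rule
        sentence = sentence[:idx] + production + sentence[idx + 1:]
        steps.append(sentence)
    return steps

print(" -> ".join(derive([1, 0, 0, 1, 1])))
# S -> xBx -> xaAax -> xabBbax -> xabCbax -> xabdbax
```

The GA then only has to search over short sequences of small integers, and every sequence that terminates yields an input that conforms to the grammar.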
10 Fitness Function: Dynamic Markov Model
- Treat the control flow graph as a Markov chain
- The probability on each conditional transition edge is updated during the search, based on previously tested inputs
- Edge transition probability is calculated by
[Figure: Control Flow Graph (CFG)]
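One way to maintain such edge probabilities is sketched below. It assumes an add-one smoothed relative-frequency estimate, (times the edge was taken + 1) / (times the source block was exited + number of outgoing edges), which is not necessarily the formula the authors used; the smoothing keeps unobserved edges at a small non-zero probability. The class name, method names, and the four-block example graph are hypothetical.

```python
from collections import defaultdict

class DynamicMarkovModel:
    """Edge-probability estimates over a CFG, updated from observed paths."""

    def __init__(self, successors):
        # successors: dict mapping each basic block to its successor blocks.
        self.successors = successors
        self.edge_counts = defaultdict(int)

    def record_path(self, path):
        """Count every edge traversed by one input's execution path."""
        for src, dst in zip(path, path[1:]):
            self.edge_counts[(src, dst)] += 1

    def probability(self, src, dst):
        """Assumed estimate: add-one smoothed relative frequency."""
        out = self.successors[src]
        total = sum(self.edge_counts[(src, s)] for s in out)
        return (self.edge_counts[(src, dst)] + 1) / (total + len(out))

# Hypothetical CFG with one conditional branch at block A.
model = DynamicMarkovModel({"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []})
for _ in range(3):
    model.record_path(["A", "B", "D"])
model.record_path(["A", "C", "D"])

print(model.probability("A", "B"))   # frequently taken branch
print(model.probability("A", "C"))   # rarely taken branch
```

Each new execution path shifts the estimates, so an edge that was rare early in the search becomes less rewarding once many inputs have taken it.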
11 Fitness of an Input
- Fitness of an input = inverse of the product of the transition probabilities of all edges along its execution path
- Larger fitness is better
  - Explores unobserved states
  - Explores rarely observed states
  - Increases coverage
- Better than previous methods
  - Explores less-observed states
  - Uses information from all previously searched paths
Example:
Execution path: A, C, E, D, G, M
Fitness = 1 / (0.75 x 0.9 x 0.5 x 0.67 x 0.8) ≈ 5.53
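The worked example above can be checked with a few lines of Python. The slide gives the five probabilities and the six-block path but not which probability belongs to which edge, so the in-order pairing below is an assumption (it does not affect the product). The function name `path_fitness` is hypothetical.

```python
import math

def path_fitness(path, edge_probability):
    """Inverse of the product of transition probabilities along `path`."""
    product = math.prod(edge_probability[(src, dst)]
                        for src, dst in zip(path, path[1:]))
    return 1.0 / product

# Probabilities from the slide, paired with edges in path order (assumed).
edge_probability = {
    ("A", "C"): 0.75,
    ("C", "E"): 0.9,
    ("E", "D"): 0.5,
    ("D", "G"): 0.67,
    ("G", "M"): 0.8,
}

fitness = path_fitness(["A", "C", "E", "D", "G", "M"], edge_probability)
print(round(fitness, 2))  # 5.53
```

A path through rarely taken edges has a small probability product and therefore a large fitness, which is what steers the search toward less-explored parts of the graph.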
12 Prototype: An Intelligent Fuzz Testing Tool (1)
- Fuzzers: black-box analysis tools that inject randomly generated inputs into a program and then monitor it for crashes
- Pros: simple, automated, test inputs a human tester would not think of
- Cons: non-intelligent, hard to achieve good code coverage
13 Prototype: An Intelligent Fuzz Testing Tool (2)
- We seek to provide the following desirable qualities (many existing tools lack one or more)
- Intelligence
  - The ability to learn something useful from the inputs that have been tried in the past and to use that knowledge to guide the selection of future inputs
- Targeted Code Coverage
  - The ability to focus testing on selected regions of interest in the code
- Targeted Execution Control
  - The ability to drive program execution through parse code to drill down to a specific node in the control flow graph (one suspected to contain a vulnerability)
- Source Code Independence
  - The ability to work on compiled binaries when source code is not available
- Extensibility and Configurability
  - The ability to fuzz multiple protocols with a single tool
14 Prototype: An Intelligent Fuzz Testing Tool (3)
- Implementation
  - Use the PAIMEI framework to build a prototype fuzz testing tool
    - PAIMEI is a reverse engineering framework
    - Written in the Python scripting language
    - Has been used by the security community to build various fuzzing, code coverage, and data flow tracking tools
  - Use the IDA Pro plugin SDK to construct the control flow graph
  - Successfully tested on a TFTP binary program
15 System Overview
- Extract the program control flow graph (CFG)
- Extract the focus subgraph (source, destination)
- Set breakpoints and register breakpoint handlers
- Initialize the set of random inputs
- Inject inputs one by one
- Record each input's execution path via the breakpoint handlers
- Update the dynamic Markov model parameters of the CFG
- Calculate fitness
- Select a fraction of the best inputs
- Build the new set of inputs via mutation and crossover
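The subgraph-extraction step in the list above can be sketched as keeping only the basic blocks that lie on some path from the source to the destination: those reachable from the source that can also reach the destination. This is an illustrative guess at what the step involves, assuming the CFG is already available as an adjacency dictionary; the function names and the block labels in the example are hypothetical.

```python
from collections import deque

def reachable(graph, start):
    """Breadth-first search: every node reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def focus_subgraph(cfg, source, destination):
    """Keep only blocks on some path from `source` to `destination`."""
    # Build the reversed graph so we can search backwards from the target.
    reverse = {}
    for node, succs in cfg.items():
        for succ in succs:
            reverse.setdefault(succ, []).append(node)
    keep = reachable(cfg, source) & reachable(reverse, destination)
    return {n: [s for s in cfg.get(n, ()) if s in keep] for n in keep}

# Hypothetical CFG: only recv -> parse -> check -> strcpy leads to the target.
cfg = {
    "recv": ["parse", "log"],
    "parse": ["check", "error"],
    "check": ["strcpy", "error"],
    "strcpy": ["exit"],
    "error": ["exit"],
    "log": ["exit"],
    "exit": [],
}
sub = focus_subgraph(cfg, "recv", "strcpy")
print(sorted(sub))
```

Breakpoints would then only need to be set on the blocks that survive this filtering, which limits the tracing overhead to the region of interest.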
16 Evaluation
- Target Application
  - We used the tftpd.exe Windows server program for our initial experiments and validation of our approach
- GA Parameters
  - Mutation rate: 90%
  - Crossover rate: 75%
  - Elitism
  - Selective breeding
  - Dynamic mutation
- Context-Free Grammar
  - Hex bytes: 0-255
  - Strings: netascii, octet, and mail
17 TFTP Control Flow Graph
18 Experiment 1: Targeted Execution Control
- Tested the ability of the GA fuzzer to drive execution through parse logic to 2 embedded, vulnerable strcpy() calls
- Compared against fuzzing with random input
- 1st strcpy() reached in
  - GA: 224 generations
  - Random: 2294 generations
- 2nd strcpy() reached in
  - GA: 224 generations
  - Random: 9106 generations
19 GA vs. Random Search
Random fuzzing ran for about 1 hour over 10,000 generations (and may still not reach the target node), while our approach took about 10 minutes to reach the target node
20 Experiment 2: Code Coverage Selectivity
- Tested the ability of our GA to achieve code coverage of the TFTP parser logic
- Compared against random input selection
- Better code coverage
  - Average over 3000 generations
    - GA: 84.81% coverage
    - Random: 49.54% coverage
  - Running the random approach for an additional 7000 generations only increased its coverage to 54.51%
- Achieves deeper code coverage more quickly
  - Able to leverage what it has learned from past inputs!
21 Experiment 3: CFG Penetration Depth
22 Experiment 4: Learning Input Formats
- Programs assume that input will comply with published standards
  - As a result, protocol parsing bugs abound!
- We tested the ability of our prototype to explore the boundaries of the TFTP packet parsing logic by having it attempt to learn a valid packet format
  - We set the destination node to the basic block corresponding to an accepted packet
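For reference, the target the GA is trying to approximate is simple to state. Under RFC 1350, a TFTP read or write request is a 2-byte big-endian opcode (1 = RRQ, 2 = WRQ), a filename, a zero byte, a transfer mode (netascii, octet, or mail), and a final zero byte. The Python sketch below builds and checks such a packet; the function names and the example filename are hypothetical, and the check reflects the RFC layout rather than whatever tftpd.exe's parser actually accepts.

```python
import struct

OPCODE_RRQ = 1   # read request
OPCODE_WRQ = 2   # write request
MODES = ("netascii", "octet", "mail")

def build_request(opcode, filename, mode):
    """Encode an RRQ/WRQ packet: opcode | filename | 0 | mode | 0."""
    return (struct.pack("!H", opcode)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")

def is_well_formed_request(packet):
    """Check a packet against the RFC 1350 request layout."""
    if len(packet) < 4:
        return False
    (opcode,) = struct.unpack("!H", packet[:2])
    if opcode not in (OPCODE_RRQ, OPCODE_WRQ):
        return False
    # Expect exactly: filename, mode, and an empty piece after the last zero.
    parts = packet[2:].split(b"\x00")
    if len(parts) != 3 or parts[2] != b"":
        return False
    filename, mode = parts[0], parts[1]
    # Mode names are case-insensitive in the RFC.
    return bool(filename) and mode.decode("ascii", "replace").lower() in MODES

packet = build_request(OPCODE_RRQ, "boot.cfg", "octet")
print(packet)                                    # b'\x00\x01boot.cfg\x00octet\x00'
print(is_well_formed_request(packet))            # True
print(is_well_formed_request(b"\x00\x09junk"))   # False
```

The grammar listed under the evaluation settings (hex bytes plus the three mode strings) gives the GA exactly these building blocks, leaving it to discover their order and the zero-byte separators.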
23 Evolving a TFTP Packet
24 Major Contributions
- Practical implementation
  - Finished initial prototype
  - Analysis on binary code
- Novelty in methodology
  - Dynamic Markov model as fitness function
  - Grammatical evolution for input generation
- Security focused
  - Previous related work focuses on software testing
- Targeted code coverage
  - Efficiently tests mission-critical or susceptible parts
25 Advantages of Our Approach
- We apply knowledge gained from past experience to drive our choice of future inputs
  - Well suited to parser code, which has a rich control flow structure for the GA to learn from
- Maximizes code coverage within specific portions of a program graph
- Minimal knowledge of input structure required
  - The GA can learn to approximate the input format during execution
- Once a target location has been reached, the algorithm continues to exploit weaknesses in the CFG to produce additional, different inputs capable of reaching it
26 Limitations
- Some parts of the CFG are difficult to extract statically
  - Thread creation
  - Call tables
- Dependent upon control flow graph structure
  - The program must have enough information embedded within its structure for the GA to learn from
  - Assumes a dependency between graph structure and user-supplied input (parser code is an example)
  - Not useful for programs that have a flat CFG structure
- Finding all paths has high complexity O() and takes a long time on large program graphs
- We can prove reachability by getting to a potentially vulnerable target state, but failure to get there does not mean the location is unreachable!
27 Conclusions
- Shows how genetic algorithms can be applied to the external input crafting process to maximize exploration of the program state space and intelligently drive a program into potentially vulnerable states
- Automated approach: treats the internal structure of each node in the CFG as a black box
- Needs testing on more complex programs
- Our work is theoretical and still at the prototype stage