Title: Automated Vulnerability Analysis: Leveraging Control Flow for Evolutionary Input Crafting


1
Automated Vulnerability Analysis Leveraging
Control Flow for Evolutionary Input Crafting
  • Sherri Sparks, Shawn Embleton,
  • Ryan Cunningham, and Cliff Zou
  • School of Electrical Engineering and Computer
    Science, University of Central Florida
  • December, 2007
  • ACSAC

2
Vulnerability Analysis
  • Involves discovering the subset of a program's
    input space with which a malicious user can
    exploit logic errors to drive the program into
    an insecure state
  • Complexity of modern software makes complete
    program state space exploration an intractable
    problem

3
Motivation
  • Often, security researchers/hackers have already
    analyzed and located a potentially vulnerable
    location in a system (software or hardware)
      • C programs have well-known, potentially
        vulnerable API functions (e.g., strcpy())
      • A critical hardware component that handles
        user input
  • Exploitability implies reachability
      • To determine whether a potential
        vulnerability is exploitable, one must prove
        that
          • It is reachable on the runtime execution
            path
          • It depends on, or can be influenced by,
            user-supplied input
  • Testing: intelligent input generation to improve
    code coverage

4
An Input Crafting Problem
  • What must the input look like to exercise the
    code path between the input node (recv) and the
    potentially vulnerable node (strcpy)?

[Figure: Control Flow Graph (CFG) from recv to
strcpy, with parsing/validation logic on the path
between them]
  • Testing: intelligently generate inputs that can
    reach a code region for intensive testing

5
TFTP Control Flow Graph
6
Basic Idea of Our Approach
  • Some inputs are better than others
      • They increase coverage by reaching previously
        unexplored areas of the CFG
      • They are on a path to a basic block where
        potentially vulnerable APIs are used
  • Find new, improved inputs with a genetic
    algorithm (GA)
      • Mate the best inputs found so far to
        generate the next generation of inputs
  • Propose a dynamic Markov model for measuring
    input fitness
  • Apply grammatical evolution to shrink the input
    search space

7
Short Review: Genetic Algorithms
  • A stochastic optimization algorithm that mimics
    evolution
  • Requires two things
      • A representation
          • What a solution should look like
          • Binary string, ASCII string, integer
      • A fitness function
          • Tells how good or bad each solution is

8
Short Review: Genetic Algorithms
  • It works like this:
      1. Start with a population (set) of random
         solutions
      2. Find each solution's fitness
      3. Select solutions with high fitness values
      4. Generate new solutions through mutation and
         crossover on the selected solutions
      5. GOTO 2 (the next generation)
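The five steps above can be sketched as a generic loop. This is a minimal Python sketch, not the authors' implementation: the function names, the population size, the fraction of parents kept, and the toy problem (evolving the string "octet") are all hypothetical choices made for illustration. Mutation and crossover are passed in as functions, so the same loop works for any representation.

```python
import random
from typing import Callable, List, Tuple

Solution = bytes


def genetic_search(
    random_solution: Callable[[random.Random], Solution],
    fitness: Callable[[Solution], float],
    mutate: Callable[[Solution, random.Random], Solution],
    crossover: Callable[[Solution, Solution, random.Random], Solution],
    population_size: int = 20,
    keep_fraction: float = 0.25,
    generations: int = 50,
    seed: int = 0,
) -> Tuple[Solution, List[float]]:
    """Generic GA loop; returns the best solution and the best fitness per generation."""
    rng = random.Random(seed)
    # Step 1: start with a population of random solutions.
    population = [random_solution(rng) for _ in range(population_size)]
    history: List[float] = []
    for _ in range(generations):
        # Step 2: find each solution's fitness (best first).
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        # Step 3: select the solutions with high fitness values.
        parents = ranked[: max(2, int(keep_fraction * population_size))]
        # Step 4: generate new solutions by crossover and mutation of the parents.
        children: List[Solution] = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        # Step 5: GOTO 2 with the next generation. Carrying the parents over
        # (elitism) means the best fitness can never decrease.
        population = parents + children
    return max(population, key=fitness), history


# Hypothetical toy problem: evolve the 5-byte string b"octet".
TARGET = b"octet"
ALPHABET = b"abcdefghijklmnopqrstuvwxyz"


def toy_random(rng: random.Random) -> bytes:
    return bytes(rng.choice(ALPHABET) for _ in range(len(TARGET)))


def toy_fitness(s: bytes) -> float:
    # Number of positions that already match the target.
    return sum(x == y for x, y in zip(s, TARGET))


def toy_mutate(s: bytes, rng: random.Random) -> bytes:
    # Replace one randomly chosen byte.
    i = rng.randrange(len(s))
    return s[:i] + bytes([rng.choice(ALPHABET)]) + s[i + 1:]


def toy_crossover(a: bytes, b: bytes, rng: random.Random) -> bytes:
    # Single-point crossover.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]
```

For example, `genetic_search(toy_random, toy_fitness, toy_mutate, toy_crossover)` returns the best string found plus a per-generation history of the best fitness; because the selected parents are carried into the next generation, that history never decreases.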

9
Grammatical Evolution in
Generating Inputs
  • Efficiently reduces the search space
  • Flexible in using partial knowledge of the
    input format (a user-specified context-free
    grammar)
  • Not used in any previous approach

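Grammar (production choices numbered 0, 1, 2):
  S → sAs | xBx | m
  A → bBb | B
  B → aAa | C | AB
  C → c | d | e

Genome 10011, each codon choosing the production
for the leftmost nonterminal:
  S ⇒ xBx ⇒ xaAax ⇒ xabBbax ⇒ xabCbax ⇒ xabdbax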
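The derivation above is the genotype-to-phenotype mapping of grammatical evolution: each codon of the genome selects a production for the leftmost nonterminal. Below is a minimal Python sketch, assuming the common GE conventions of taking each codon modulo the number of choices and wrapping around the genome when it runs out (neither convention is stated on the slide; for the genome 10011 they make no difference). The name `ge_map` is hypothetical.

```python
from typing import Dict, List, Sequence

# The example grammar: each nonterminal (uppercase) maps to its
# production choices, in the order numbered 0, 1, 2 on the slide.
GRAMMAR: Dict[str, List[str]] = {
    "S": ["sAs", "xBx", "m"],
    "A": ["bBb", "B"],
    "B": ["aAa", "C", "AB"],
    "C": ["c", "d", "e"],
}


def ge_map(genome: Sequence[int], grammar: Dict[str, List[str]] = GRAMMAR,
           start: str = "S", max_steps: int = 100) -> List[str]:
    """Map a non-empty genome to a derivation: each codon picks the
    production (codon mod number of choices) for the leftmost nonterminal."""
    sentence = start
    steps = [sentence]
    codon_index = 0
    for _ in range(max_steps):
        # Leftmost nonterminal; stop once only terminals remain.
        pos = next((i for i, ch in enumerate(sentence) if ch in grammar), None)
        if pos is None:
            break
        choices = grammar[sentence[pos]]
        # Reuse the genome from the start if it runs out ("wrapping").
        codon = genome[codon_index % len(genome)]
        codon_index += 1
        rule = choices[codon % len(choices)]
        sentence = sentence[:pos] + rule + sentence[pos + 1:]
        steps.append(sentence)
    # If max_steps is hit, the last sentence may still contain nonterminals
    # (the grammar is recursive, e.g. B -> AB and A -> B).
    return steps
```

With this grammar, `ge_map([int(c) for c in "10011"])` reproduces S ⇒ xBx ⇒ xaAax ⇒ xabBbax ⇒ xabCbax ⇒ xabdbax. The search space shrinks because the GA evolves short genomes of production choices instead of raw strings, and every completed derivation is a sentence the grammar allows.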
10
Fitness Function: Dynamic Markov Model
  • Treat the control flow graph as a Markov chain
  • The probability on each conditional transition
    edge is updated during the search, based on
    previously tested inputs
  • Edge transition probability is calculated by

[Figure: Control Flow Graph (CFG)]
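A minimal sketch of the dynamic Markov model in Python. The estimator is an assumption made for illustration: the probability of edge u → v is taken as the number of times that edge was traversed divided by the number of times any edge leaving u was traversed, counted over all inputs tested so far. The class and method names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Sequence, Tuple

Edge = Tuple[str, str]


class DynamicMarkovModel:
    """CFG edge probabilities re-estimated from every execution path seen so far.

    Assumed estimator: P(u -> v) = count(u -> v) / count(any edge leaving u).
    """

    def __init__(self) -> None:
        self.edge_counts: DefaultDict[Edge, int] = defaultdict(int)
        self.out_counts: DefaultDict[str, int] = defaultdict(int)

    def update(self, path: Sequence[str]) -> None:
        # Count every transition taken by one input's execution path.
        for u, v in zip(path, path[1:]):
            self.edge_counts[(u, v)] += 1
            self.out_counts[u] += 1

    def probability(self, u: str, v: str) -> float:
        total = self.out_counts.get(u, 0)
        if total == 0:
            # No departures from u observed yet: the estimate is undefined.
            raise KeyError(f"no observed transitions out of {u!r}")
        return self.edge_counts.get((u, v), 0) / total
```

Because the counts accumulate over the whole search, branches that most inputs take drift toward probability 1, while rarely taken branches keep small probabilities.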
11
Fitness of an Input
  • Fitness of an input: the inverse of the product
    of the transition probabilities of all edges
    along its execution path
  • Larger fitness is better
      • Explore unobserved states
      • Explore rarely observed states
      • Increase coverage
  • Better than previous methods
      • Explore less-observed states
      • Use information from all previously searched
        paths

Example: execution path A, C, E, D, G, M
Fitness = 1/(0.75 × 0.9 × 0.5 × 0.67 × 0.8) ≈ 5.53
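The fitness rule can be checked on this example with a short Python sketch. It assumes the five probabilities belong, in order, to the five edges of the path A, C, E, D, G, M; the function and variable names are hypothetical.

```python
import math
from typing import Mapping, Sequence, Tuple


def path_fitness(path: Sequence[str],
                 edge_prob: Mapping[Tuple[str, str], float]) -> float:
    """Fitness = 1 / (product of transition probabilities along the path)."""
    product = math.prod(edge_prob[(u, v)] for u, v in zip(path, path[1:]))
    return 1.0 / product


# The slide's example path; pairing each probability with an edge in path
# order is an assumption.
EXAMPLE_PATH = ["A", "C", "E", "D", "G", "M"]
EXAMPLE_PROBS = {
    ("A", "C"): 0.75,
    ("C", "E"): 0.9,
    ("E", "D"): 0.5,
    ("D", "G"): 0.67,
    ("G", "M"): 0.8,
}
```

`path_fitness(EXAMPLE_PATH, EXAMPLE_PROBS)` evaluates to about 5.528. A path through a rarely taken edge (small probability) gets a larger fitness, which is how the GA is pushed toward less-explored code. For long paths the product can underflow; summing -log p over the edges instead gives the same ranking, since the logarithm is monotonic.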
12
Prototype: An Intelligent Fuzz Testing Tool (1)
  • Fuzzers: black-box analysis tools that inject
    randomly generated inputs into a program and
    then monitor it for crashes
  • Pros: simple, automated, test unanticipated
    inputs
  • Cons: unintelligent; hard to achieve good code
    coverage
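To make the "Cons" concrete, here is a minimal black-box random fuzzer in Python. It is an illustrative sketch, not part of the authors' tool: a raised Python exception stands in for a crash (a real fuzzer would watch the target process, for example under a debugger), and `toy_parser` is a hypothetical target whose bug is guarded by a three-byte check.

```python
import random
from typing import Callable, List


def random_fuzz(target: Callable[[bytes], object], iterations: int,
                max_len: int = 32, seed: int = 0) -> List[bytes]:
    """Feed random byte strings to `target`; return every input that made it
    raise an exception (a stand-in for a crash)."""
    rng = random.Random(seed)
    crashers: List[bytes] = []
    for _ in range(iterations):
        length = rng.randrange(max_len + 1)
        data = bytes(rng.randrange(256) for _ in range(length))
        try:
            target(data)
        except Exception:
            crashers.append(data)
    return crashers


def toy_parser(packet: bytes) -> None:
    """Hypothetical target: the 'bug' sits behind a three-byte check."""
    if packet[:3] == b"\x00\x01A":
        raise ValueError("simulated overflow")
```

A uniformly random input passes a check on three specific leading bytes with probability at most 1/256^3 (about 6 in 100 million), so random fuzzing rarely gets past even shallow parsing logic; that is the coverage problem the GA approach targets.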

13
Prototype: An Intelligent Fuzz Testing Tool (2)
  • We seek to provide the following desirable
    qualities (many existing tools lack one or more):
      • Intelligence
          • The ability to learn something useful from
            the inputs tried in the past and use that
            knowledge to guide the selection of future
            inputs
      • Targeted Code Coverage
          • The ability to focus testing on selected
            regions of interest in the code
      • Targeted Execution Control
          • The ability to drive program execution
            through parsing code to drill down to a
            specific node in the control flow graph
            (one suspected to contain a vulnerability)
      • Source Code Independence
          • The ability to work on compiled binaries
            when source code is unavailable
      • Extensibility and Configurability
          • The ability to fuzz multiple protocols with
            a single tool

14
Prototype: An Intelligent Fuzz Testing Tool (3)
  • Implementation
      • Use the PAIMEI framework to build a prototype
        fuzz testing tool
          • PAIMEI is a reverse engineering framework
          • Written in Python
          • Has been used by the security community to
            build various fuzzing, code coverage, and
            data flow tracking tools
      • Use the IDA Pro plugin SDK to construct the
        control flow graph
      • Successfully tested on a TFTP server binary

15
System Overview
  • Extract the program control flow graph (CFG)
  • Extract the focus subgraph (source, destination)
  • Set breakpoints and register breakpoint handlers
  • Initialize the set of random inputs
  • Inject the inputs one by one
  • Record each input's execution path via the
    breakpoint handlers
  • Update the dynamic Markov model parameters of the
    CFG
  • Calculate fitness
  • Select a fraction of the best inputs
  • Build the new set of inputs via mutation and
    crossover
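The "extract the focus subgraph" step can be read as keeping only the basic blocks that lie on some path from the source node to the destination node; that reading is our assumption. Under it, the subgraph is the intersection of two graph searches, one forward from the source and one backward from the destination. A minimal Python sketch with a hypothetical CFG and hypothetical function names:

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # basic block -> successor blocks


def reachable(graph: Graph, start: str) -> Set[str]:
    """Breadth-first search: every node reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for succ in graph.get(node, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def reverse(graph: Graph) -> Graph:
    """Flip every edge so we can search backwards from the destination."""
    rev: Graph = {}
    for u, succs in graph.items():
        for v in succs:
            rev.setdefault(v, []).append(u)
    return rev


def focus_subgraph(graph: Graph, source: str, destination: str) -> Graph:
    """Keep the blocks that lie on at least one source -> destination path,
    i.e. reachable from source AND able to reach destination."""
    keep = reachable(graph, source) & reachable(reverse(graph), destination)
    return {u: [v for v in graph.get(u, []) if v in keep] for u in keep}


# Hypothetical CFG in the spirit of the recv -> strcpy example.
EXAMPLE_CFG: Graph = {
    "recv": ["parse", "log"],
    "parse": ["check_len", "error"],
    "check_len": ["strcpy", "error"],
    "strcpy": ["ret"],
    "error": ["ret"],
    "log": ["ret"],
    "ret": [],
}
```

Here `focus_subgraph(EXAMPLE_CFG, "recv", "strcpy")` keeps recv, parse, check_len, and strcpy and drops the error, log, and ret blocks, so breakpoints would only need to go on the blocks that matter for reaching the target.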

16
Evaluation
  • Target Application
      • We used the tftpd.exe Windows server program
        for our initial experiments and validation of
        our approach
  • GA Parameters
      • Mutation rate: 90%
      • Crossover rate: 75%
      • Elitism
      • Selective breeding
      • Dynamic mutation
  • Context-Free Grammar
      • Hex bytes: 0-255
      • Strings: netascii, octet, and mail
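A minimal Python sketch of how the listed GA parameters might fit together. The reading of the rates (90 and 75) as per-offspring and per-pair probabilities, the single-point crossover, the byte-level mutation over 0-255, and the "breed only from the fitter half" form of selective breeding are all assumptions; dynamic mutation (adjusting the rate during the run) is not modeled. Function names are hypothetical.

```python
import random
from typing import Callable, List

# Rates from the slide, read here as probabilities (our assumption):
MUTATION_RATE = 0.90   # chance that an offspring gets mutated
CROSSOVER_RATE = 0.75  # chance that two parents are actually recombined


def crossover(a: bytes, b: bytes, rng: random.Random) -> bytes:
    """Single-point crossover, applied with probability CROSSOVER_RATE."""
    shortest = min(len(a), len(b))
    if shortest < 2 or rng.random() >= CROSSOVER_RATE:
        return a  # no recombination: the child is a copy of the first parent
    cut = rng.randrange(1, shortest)
    return a[:cut] + b[cut:]


def mutate(s: bytes, rng: random.Random) -> bytes:
    """With probability MUTATION_RATE, overwrite one byte with a value in 0-255."""
    if not s or rng.random() >= MUTATION_RATE:
        return s
    i = rng.randrange(len(s))
    return s[:i] + bytes([rng.randrange(256)]) + s[i + 1:]


def next_generation(population: List[bytes], fitness: Callable[[bytes], float],
                    elite: int, rng: random.Random) -> List[bytes]:
    """Elitism plus a simple form of selective breeding (both assumed forms):
    the `elite` best inputs survive unchanged, and the rest of the new
    generation is bred only from the fitter half. Needs len(population) >= 2."""
    ranked = sorted(population, key=fitness, reverse=True)
    breeders = ranked[: max(2, len(ranked) // 2)]
    new = ranked[:elite]
    while len(new) < len(population):
        a, b = rng.sample(breeders, 2)
        new.append(mutate(crossover(a, b, rng), rng))
    return new
```

Keeping the elite unchanged guarantees the best input found so far is never lost, while the high mutation rate keeps the rest of the population exploring.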

17
TFTP Control Flow Graph
18
Experiment 1: Targeted Execution Control
  • Tested the ability of the GA fuzzer to drive
    execution through the parsing logic to two
    embedded, vulnerable strcpy() calls
  • Compared against fuzzing with random input
  • 1st strcpy() reached in
      • GA: 224 generations
      • Random: 2294 generations
  • 2nd strcpy() reached in
      • GA: 224 generations
      • Random: 9106 generations

19
GA vs. Random Search
Random fuzzing ran for about 1 hour over 10,000
generations (and may still not reach the target
node), while our approach reached the target node
in about 10 minutes
20
Experiment 2: Code Coverage Selectivity
  • Tested the ability of our GA to achieve code
    coverage of the TFTP parser logic
  • Compared against random input selection
  • Better code coverage
      • Average over 3000 generations
          • GA: 84.81% coverage
          • Random: 49.54% coverage
      • Running the random approach for an additional
        7000 generations only increased its coverage
        to 54.51%
  • Achieves deeper code coverage more quickly
      • Able to leverage what it has learned from
        past inputs!

21
Experiment 3: CFG Penetration Depth
22
Experiment 4: Learning Input Formats
  • Programs assume that input will comply with
    published standards
      • As a result, protocol parsing bugs abound!
  • We test the ability of our prototype to explore
    the boundaries of the TFTP packet parsing logic
    by having it learn a valid packet format
      • We set the destination node to the basic
        block corresponding to an accepted packet

23
Evolving a TFTP Packet
24
Major Contributions
  • Practical implementation
      • Finished initial prototype
      • Analysis of binary code
  • Novelty in methodology
      • Dynamic Markov model as the fitness function
      • Grammatical evolution for input generation
  • Security focused
      • Previous related work focuses on software
        testing
  • Targeted code coverage
      • Efficiently tests mission-critical or
        susceptible parts

25
Advantages of Our Approach
  • We apply knowledge gained from past experience to
    drive our choice of future inputs
  • Well suited to parser code, which has a rich
    control flow structure for the GA to learn from
  • Maximizes code coverage within specific portions
    of a program graph
  • Minimal knowledge of input structure required
      • The GA can learn to approximate the input
        format during execution
  • Once a target location has been reached, the
    algorithm continues to exploit weaknesses in the
    CFG to produce additional, different inputs
    capable of reaching it

26
Limitations
  • Difficult to extract some parts of the CFG
    statically
      • Thread creation
      • Call tables
  • Dependent upon control flow graph structure
      • The program must have enough information
        embedded in its structure for the GA to
        learn from
      • Assumes a dependency between graph structure
        and user-supplied input (parser code is an
        example)
      • Not useful for programs with a flat CFG
        structure
  • Finding all paths has high complexity and takes
    a long time on large program graphs
  • We can prove reachability by getting to a
    potentially vulnerable target state, but failure
    to get there does not mean the location is
    unreachable!

27
Conclusions
  • Shows how genetic algorithms can be applied to
    the external input crafting process to maximize
    exploration of the program state space and
    intelligently drive a program into potentially
    vulnerable states
  • Automated approach: treats the internal
    structure of each node in the CFG as a black box
  • Needs testing on more complex programs
  • Our work is still theoretical and at the
    prototype stage