1
Forensics Analysis of Toolkit-Generated Malicious
Programs
  • Department of Computer Science
  • Yasmine Kandissounon

2
The problem
  • In an attempt to create an undetectable virus,
    malware writers have devised and used many
    strategies, the most recent and effective being
    metamorphism. Metamorphism is a strategy that
    helps a virus hide its malicious behavior by
    changing its appearance at each generation.
  • Metamorphism has led to a profusion of viruses
    that anti-virus scanners cannot keep up with.
    The invention of virus generation kits made
    things worse, as they allow people with little
    or no programming skill to generate a
    metamorphic virus in no time [1]. Most malicious
    programs created by virus generation kits are
    able to avoid detection because the techniques
    used by anti-virus scanners are simply not
    effective enough against them.

3
Related Works
  • Metamorphic malware has challenged scholars and
    inspired serious research.
  • IBM researchers applied neural networks to
    detect boot-sector viruses [2]. They extracted
    short byte strings (called trigrams) from a set
    of training examples and used them as features
    for virus detection. According to IBM, this
    technique detected about 75% of known boot-sector
    viruses, but failed to recognize programs whose
    malicious code was obscured.
  • Chouchane et al. proposed a detection method
    using Instruction Frequency Vectors (IFVs) based
    on Markov chains [3]. They computed the IFV
    matrices of the "eve" (the original form) of a
    virus and of its variant after a number of
    generations to determine whether the variant was
    generated by a particular engine.
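The two kinds of features mentioned above can be illustrated with a minimal Python sketch. It only shows feature extraction, assuming a binary is available as raw bytes and a variant as a list of opcode mnemonics; the neural network of [2] and the Markov-chain analysis of [3] are not reproduced. The function names `byte_trigrams` and `instruction_frequency_vector` are hypothetical, and the frequency vector is a simplified stand-in, not the exact IFV definition used in [3].

```python
from collections import Counter


def byte_trigrams(data: bytes) -> Counter:
    """Count every 3-byte substring ("trigram") of a binary."""
    return Counter(data[i:i + 3] for i in range(len(data) - 2))


def instruction_frequency_vector(opcodes: list[str], vocabulary: list[str]) -> list[float]:
    """Relative frequency of each opcode in `vocabulary` (simplified IFV)."""
    counts = Counter(opcodes)
    total = len(opcodes) or 1  # avoid dividing by zero on an empty variant
    return [counts[op] / total for op in vocabulary]


if __name__ == "__main__":
    # Hypothetical bytes: x86 "push ebp; mov ebp, esp" repeated twice.
    print(byte_trigrams(b"\x55\x8b\xec\x55\x8b\xec").most_common(1))
    ops = ["call", "push", "add", "add", "sub", "jmp",
           "push", "push", "push", "add", "call"]
    print(instruction_frequency_vector(ops, ["add", "call", "jmp", "push", "sub"]))
```

Trigrams overlap, so a binary of length L yields L - 2 of them; in a real detector only a selected subset would be kept as features.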

4
Our solution
  • Eugene Spafford's analysis of software authorship
    inspired our solution. Spafford applied the idea
    behind forensic linguistics, which can accurately
    identify the author of an English text [4].
    Indeed, by combining software metrics with other
    features such as variable naming and code
    indentation, Spafford showed that a program could
    be attributed to a specific author. His technique
    is even easier to apply to virus generation kits,
    given that their signature is more consistent
    than that of human programmers. Our solution
    consists of using Markov chains to attribute the
    authorship of a virus to an engine.
  • Extracting and studying the opcodes of a number
    of variants from popular generation kits showed
    that an opcode is independent of the opcode two
    positions before it. Hence, Markov chains, in
    which each state depends only on the state
    immediately before it, can be applied to
    kit-generated viruses to obtain each engine's
    signature.
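One possible way to probe this independence on an opcode sequence is sketched below in Python. It is an assumption of this sketch, not the procedure used in the study: it compares the first-order estimate P(c | b) with the second-order estimate P(c | a, b) for every observed triple of consecutive opcodes (a, b, c). The function name `conditional_gap` is hypothetical.

```python
from collections import Counter


def conditional_gap(opcodes: list[str]) -> float:
    """Largest |P(c | b) - P(c | a, b)| over observed opcode triples (a, b, c).

    A gap near 0 suggests that the opcode two positions back (a) adds little
    information once the previous opcode (b) is known, i.e. that a
    first-order Markov chain is an adequate model.
    """
    pairs = Counter(zip(opcodes, opcodes[1:]))                 # (b, c) transitions
    pair_starts = Counter(opcodes[:-1])                        # b followed by something
    triples = Counter(zip(opcodes, opcodes[1:], opcodes[2:]))  # (a, b, c) windows
    triple_starts = Counter(zip(opcodes, opcodes[1:-1]))       # (a, b) followed by something
    gap = 0.0
    for (a, b, c), n_abc in triples.items():
        second_order = n_abc / triple_starts[(a, b)]
        first_order = pairs[(b, c)] / pair_starts[b]
        gap = max(gap, abs(second_order - first_order))
    return gap


if __name__ == "__main__":
    # A strictly alternating sequence is fully explained by the previous opcode.
    print(conditional_gap(["push", "add"] * 5))
```

On short sequences most triples occur only once, so the estimates are noisy and the gap is large; a meaningful check would run over the full opcode sequences of many variants.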

5
The culture
  • We decided to work with both the Next Generation
    Virus Construction Kit (NGVCK) and the Virus
    Creation Lab (VCL32). Mark Stamp from San Jose
    State University showed that the similarity among
    NGVCK variants is less than 2%, which makes NGVCK
    a highly metamorphic engine and thus relevant to
    our study [6]. Besides the fact that VCL32
    variants also present an interestingly low degree
    of similarity, VCL32 has inspired many other
    virus generation kits that strive to achieve the
    same metamorphic features.
  • Sun Microsystems' VirtualBox was used as our
    isolated platform.
  • We downloaded our kits from vx.netlux.org, which
    is an almost complete repository of known virus
    engines, constructors and simulators. The site
    also provides a library with some documentation
    for each virus.

6
  • NGVCK's graphical interface
  • VCL32's graphical interface

7
The work
  • Using these GUIs, we created 50 variants with
    each kit and extracted the opcodes of each
    variant using a small homemade Java program.
  • The next phase in finding a common signature for
    the variants of each kit consists of computing a
    Markov chain transition matrix for each variant
    and then averaging these matrices. The resulting
    average matrix serves as the signature for that
    kit's variants, one for NGVCK and one for VCL32.
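The averaging step can be sketched in Python as follows, assuming each variant's transition matrix has already been computed and is stored sparsely as a dictionary mapping an opcode to the probabilities of its successors. The representation, the function name `average_matrix`, and the choice to treat a transition missing from a variant as probability 0 are assumptions of this sketch, not details given in the presentation.

```python
from collections import defaultdict

# Hypothetical sparse representation: {opcode: {next_opcode: probability}}.
Matrix = dict[str, dict[str, float]]


def average_matrix(matrices: list[Matrix]) -> Matrix:
    """Entry-wise mean of several transition matrices.

    A transition absent from a variant counts as probability 0, so a row
    whose opcode is missing from some variants sums to less than 1.
    """
    totals: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for matrix in matrices:
        for src, row in matrix.items():
            for dst, p in row.items():
                totals[src][dst] += p
    n = len(matrices)
    return {src: {dst: s / n for dst, s in row.items()} for src, row in totals.items()}


if __name__ == "__main__":
    # Two hypothetical variant matrices (not data from the study).
    v1 = {"push": {"add": 0.5, "push": 0.5}, "add": {"call": 1.0}}
    v2 = {"push": {"add": 1.0}, "add": {"call": 1.0}}
    print(average_matrix([v1, v2]))
```

Treating missing entries as 0 keeps the signature a plain entry-wise mean; an alternative would be to average each row only over the variants that contain that opcode, which keeps every row summing to 1.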

8
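  • A Markov chain is a set of states linked by
    probabilities. Let $S = \{s_1, s_2, \ldots, s_n\}$
    be a set of states. If a process is in state
    $s_1$, it moves to state $s_2$ with probability
    $p_{12}$ (called a transition probability), and
    so on. In general, $p_{ij}$ is the probability of
    moving from $s_i$ to $s_j$ in one step, and the
    probability that a process with n states moves
    from $s_i$ to $s_j$ in two steps, passing through
    any intermediate state $s_k$, is
  • $p_{ij}^{(2)} = \sum_{k=1}^{n} p_{ik} \, p_{kj}$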
  • A transition matrix holds the transition
    probabilities of the Markov chain: the entry in
    row i and column j is $p_{ij}$. In our case, the
    states are the different opcodes.
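The sum over the intermediate state $k$ is exactly a matrix product, so the two-step probabilities $p_{ij}^{(2)}$ are the entries of $M^2$, where $M$ is the transition matrix. A minimal Python sketch makes this concrete; it uses a hypothetical two-state chain rather than opcode data, and the function name `two_step` is an illustrative choice.

```python
def two_step(matrix: list[list[float]]) -> list[list[float]]:
    """Return M squared: entry (i, j) is the sum over k of p_ik * p_kj."""
    n = len(matrix)
    return [
        [sum(matrix[i][k] * matrix[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical 2-state chain; each row sums to 1.
    m = [[0.9, 0.1],
         [0.5, 0.5]]
    for row in two_step(m):
        print([round(p, 2) for p in row])  # prints [0.86, 0.14] then [0.7, 0.3]
```

Each row of $M^2$ still sums to 1, since from any state the process must be somewhere after two steps.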

9
  • For each opcode, the transition probabilities are
    proportional to how often each opcode follows it.
    Thus, if an opcode $O_i$ is followed by another
    opcode n times in a variant, x of which by opcode
    $O_j$, then in our transition matrix the
    probability that state (here, opcode) $O_i$ is
    followed by state $O_j$ is $p_{ij} = x/n$.
  • As an example, let's compute the transition
    matrix of a simple program using opcodes that
    are common in our variants:

10
  • call
  • push
  • add
  • add
  • sub
  • jmp
  • push
  • push
  • push
  • add
  • call
  • This yields the following transition matrix M
    (note that the probabilities out of each state
    must sum to 1)
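The x/n rule applied to the example program can be reproduced with a minimal Python sketch. It assumes that n counts only the occurrences of an opcode that are followed by another opcode, so the final `call`, which has no successor, does not contribute; this is what makes every row sum to 1. The function name `transition_matrix` and the dictionary representation are hypothetical.

```python
from collections import Counter, defaultdict


def transition_matrix(opcodes: list[str]) -> dict[str, dict[str, float]]:
    """p(O_i -> O_j) = x / n, where x counts O_i followed by O_j and n counts
    all occurrences of O_i that are followed by some opcode."""
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(opcodes, opcodes[1:]):
        follows[current][nxt] += 1
    return {
        op: {nxt: x / sum(row.values()) for nxt, x in row.items()}
        for op, row in follows.items()
    }


if __name__ == "__main__":
    program = ["call", "push", "add", "add", "sub", "jmp",
               "push", "push", "push", "add", "call"]
    for op, row in transition_matrix(program).items():
        print(op, row)
```

Under this convention the sketch gives call → push with probability 1; push → add and push → push with 1/2 each; add → add, add → sub and add → call with 1/3 each; and sub → jmp and jmp → push with probability 1. These values may differ from M if the presentation handled the final `call` differently.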

11
Expected Impact
  • Our solution offers accuracy as well as space
    and time efficiency. Using Markov chains should
    help reduce the rate of false positives. We
    expect to define a reasonable threshold that
    separates malicious programs from benign ones
    without producing many false negatives.
  • In addition, storing a single signature for a
    whole set of metamorphic variants with a common
    origin is more space-efficient than storing a
    signature for each variant, as anti-virus
    companies appear to do.
  • Finally, our solution is time-efficient: the
    algorithm comparing our computed signature
    against a potentially malicious program runs in
    time linear in the size of the matrix.
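The presentation does not specify the comparison metric or the threshold, so the following Python sketch assumes one simple choice: the mean absolute difference between the signature and a candidate's transition matrix, taken over every entry present in either matrix, which runs in time linear in the number of stored entries. The names `matrix_distance` and `matches_kit` and the threshold value 0.2 are hypothetical.

```python
# Hypothetical sparse representation: {opcode: {next_opcode: probability}}.
Matrix = dict[str, dict[str, float]]


def matrix_distance(signature: Matrix, candidate: Matrix) -> float:
    """Mean absolute difference over every (opcode, next_opcode) entry
    present in either matrix; linear in the number of entries."""
    keys = {(a, b) for m in (signature, candidate) for a, row in m.items() for b in row}
    if not keys:
        return 0.0
    total = sum(
        abs(signature.get(a, {}).get(b, 0.0) - candidate.get(a, {}).get(b, 0.0))
        for a, b in keys
    )
    return total / len(keys)


def matches_kit(signature: Matrix, candidate: Matrix, threshold: float) -> bool:
    """Flag the candidate as kit-generated when it is close enough to the signature."""
    return matrix_distance(signature, candidate) <= threshold


if __name__ == "__main__":
    sig = {"push": {"add": 0.75, "push": 0.25}, "add": {"call": 1.0}}
    sample = {"push": {"add": 0.5, "push": 0.5}, "add": {"call": 1.0}}
    print(matches_kit(sig, sample, threshold=0.2))  # True: distance is about 0.17
```

Lowering the threshold rejects more benign programs but risks missing variants, which is the trade-off discussed under Limitations.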

12
Limitations
  • Although our solution seems very appealing, it
    also has some downsides.
  • One disadvantage stems from the very fact that
    the signature is an average matrix. Defining a
    threshold to accompany the average matrix may be
    tricky, as it will need to be accurate enough to
    avoid false negatives.
  • Also, because our culture is very limited (50
    variants each for NGVCK and VCL32), we will test
    the signature only on a small scale and can only
    assume that it works on a larger scale.

13
References
  • [1] http://packetstormsecurity.org/mag/40hex/40HEX-10/40HEX-10.001J
  • [2] http://www.research.ibm.com/antivirus/SciPapers/Tesauro/NeuralNets.html
  • [3] M. R. Chouchane, A. Walenstein, and A.
    Lakhotia. Using Markov Chains to Filter
    Machine-morphed Variants of Malicious Programs.
  • [4] Ivan Krsul and Eugene H. Spafford.
    Authorship Analysis: Identifying the Author of a
    Program.
  • [5] Peter Szor. Advanced Code Evolution
    Techniques and Computer Virus Generation Kits.
  • [6] Wing Wong and Mark Stamp. Hunting for
    Metamorphic Engines.
  • [7] www.vx.netlux.org