HUNTING FOR METAMORPHIC ENGINES

Learn more at: http://www.cs.sjsu.edu

1
HUNTING FOR METAMORPHIC ENGINES
  • Mark Stamp
  • Wing Wong
  • September 13, 2006

2
Outline
  • Metamorphic software
  • Both good and evil uses
  • Metamorphic virus construction kits
  • How effective are metamorphic engines?
  • How to compare two pieces of code?
  • Similarity within and between virus families
  • Similarity to non-viral code
  • Can we detect metamorphic viruses?
  • Commercial virus scanners
  • Hidden Markov models (HMMs)
  • Similarity index
  • Conclusion

3
PART I
  • Metamorphic Software

4
What is Metamorphic Software?
  • Software is metamorphic provided
  • All copies do the same thing
  • Internal structure of copies differs
  • Today almost all software is cloned
  • Good metamorphic software
  • Mitigate buffer overflow attacks
  • Bad metamorphic software
  • Avoid virus/worm signature detection

5
Metamorphic Software for Good?
  • Suppose program has a buffer overflow
  • If we clone the program
  • One attack breaks every copy
  • Break once, break everywhere (BOBE)
  • If instead, we have metamorphic copies
  • Each copy still has a buffer overflow
  • One attack does not work against every copy
  • BOBE-resistant
  • Analogous to genetic diversity in biology
  • A little metamorphism does a lot of good!

6
Metamorphic Software for Evil?
  • Cloned virus/worm can be detected
  • Common signature on every copy
  • Detect once, detect everywhere (DODE?)
  • If instead virus/worm is metamorphic
  • Each copy has different signature
  • Same detection does not work against every copy
  • Provides DODE-resistance
  • Analogous to genetic diversity in biology
  • But, effective use of metamorphism here is tricky!

7
Crypto Analogy
  • In information security, almost everything that
    consistently works is either
  • Crypto, or
  • Has a crypto analogy
  • Consider WWII ciphers
  • German Enigma
  • Broken by Polish and British cryptanalysts
  • Design was (mostly) known to cryptanalysts
  • Japanese Purple
  • Broken by American cryptanalysts
  • Design was (mostly) unknown to cryptanalysts

8
Crypto Analogy
  • Cryptanalysis: break a (known) cipher
  • Diagnosis: determine how an unknown cipher works
    (from ciphertext)
  • Which was the greater achievement, breaking
    Enigma or Purple?
  • Cryptanalysis of Enigma was harder
  • Diagnosis of Purple was harder
  • Can make a reasonable case for either

9
Crypto Analogy
  • What does this have to do with metamorphic
    software?
  • Suppose we (the good guys) generate metamorphic
    copies of our software
  • Bad guys can attack individual copies
  • Can bad guys attack all copies?
  • Bad guys can try to diagnose our metamorphic
    generator

10
Crypto Analogy
  • How to diagnose metamorphic generator (from
    exes)?
  • Reverse engineer many copies, look at
    differences, etc., etc.
  • Lots of work
  • Diagnosis problem is hard
  • If good guys can force bad guys to solve a
    diagnosis problem, the good guys win
  • Security by obscurity? Violates (spirit of)
    Kerckhoffs' Principle?
  • Yes, but still may be valuable in the real world

11
Crypto Analogy
  • What about case where bad guys write metamorphic
    code?
  • Metamorphic viruses, for example
  • Do good guys need to solve diagnosis problem?
  • If so, good guys are in trouble
  • Not if good guys only need to detect the
    metamorphic code (not diagnose)
  • Not claiming the good guys' job is easy
  • Just claiming that there is hope

12
Virus Evolution
  • Viruses first appeared in the 1980s
  • Fred Cohen
  • Viruses must avoid signature detection
  • Virus can alter its appearance
  • Techniques employed
  • encryption
  • polymorphic
  • metamorphic

13
Virus Evolution - Encryption
  • Virus consists of
  • decrypting module (decryptor)
  • encrypted virus body
  • Different encryption key
  • different virus body signature
  • Weakness
  • decryptor can be detected
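
A minimal sketch of the point above, assuming a single-byte XOR as the cipher (the slides do not say which cipher is used) and a harmless placeholder string standing in for the constant body. Different keys give different stored bytes, while the decrypting logic stays the same in every copy.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key; applying it twice restores the data."""
    return bytes(b ^ key for b in data)

body = b"PLACEHOLDER BODY"      # harmless stand-in for the constant body
copy_a = xor_bytes(body, 0x21)  # two copies, two assumed keys
copy_b = xor_bytes(body, 0x5C)

print(copy_a != copy_b)                 # stored bytes differ between copies
print(xor_bytes(copy_a, 0x21) == body)  # same routine recovers the same body
```

The routine `xor_bytes` is identical in every copy, which is why a signature on the decryptor still works.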

14
Virus Evolution - Polymorphism
  • Try to hide signature of decryptor
  • Can use code emulator to decrypt putative virus
    dynamically
  • Decrypted virus body is constant
  • Signature detection is possible

15
Virus Evolution - Metamorphism
  • Change virus body
  • Mutation techniques
  • permutation of subroutines
  • insertion of garbage/jump instructions
  • substitution of instructions
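
A toy sketch of two of the techniques above (garbage insertion and instruction substitution), assuming a made-up accumulator machine with `add`, `sub` and `nop` instructions. Nothing here comes from a real engine, and subroutine permutation is not shown. The point is only that every morphed copy computes the same result from a different instruction sequence.

```python
import random

def run(program, x=0):
    """Interpret a toy program of ("add", k), ("sub", k) and ("nop",) tuples."""
    for ins in program:
        if ins[0] == "add":
            x += ins[1]
        elif ins[0] == "sub":
            x -= ins[1]
        # "nop" leaves the accumulator unchanged
    return x

def morph(program, rng):
    """Return a functionally equivalent copy with a different structure."""
    out = []
    for ins in program:
        if rng.random() < 0.5:
            out.append(("nop",))            # garbage insertion
        if ins[0] == "add" and rng.random() < 0.5:
            out.append(("sub", -ins[1]))    # substitution: add k == sub -k
        else:
            out.append(ins)
    return out

base = [("add", 5), ("sub", 2), ("add", 10)]
rng = random.Random(1)                      # fixed seed for repeatable output
variants = [morph(base, rng) for _ in range(3)]
for v in variants:
    print(run(v), v)
```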

16
PART II
  • Virus Construction Kits

17
Virus Construction Kits - PS-MPC
  • According to Peter Szor
  • "PS-MPC (Phalcon/Skism Mass-Produced Code
    generator) uses a generator that effectively
    works as a code-morphing engine ... the viruses
    that PS-MPC generates are not only polymorphic,
    but their decryption routines and structures
    change in variants"

18
Virus Construction Kits - G2
  • From the documentation of G2 (Second Generation
    virus generator)
  • "different viruses may be generated from
    identical configuration files"

19
Virus Construction Kits - NGVCK
  • From the documentation for NGVCK (Next Generation
    Virus Creation Kit)
  • "all created viruses are completely different
    in structure and opcode ... impossible to catch
    all variants with one or more scanstrings ...
    nearly 100% variability of the entire code"
  • Oh, really?

20
PART III
  • How Effective Are Metamorphic Engines?

21
How We Compare Two Pieces of Code
22
Virus Families - Test Data
  • Four generators, 45 viruses
  • 20 viruses by NGVCK
  • 10 viruses by G2
  • 10 viruses by VCL32
  • 5 viruses by MPCGEN
  • 20 normal utility programs from the Cygwin bin
    directory

23
Similarity within Virus Families - Results
24
Similarity within Virus Families - Results
25
Similarity within Virus Families - Results
26
Similarity within Virus Families - Results
27
Similarity within Virus Families - Results
28
NGVCK Similarity to Virus Families
  • NGVCK versus other viruses
  • 0% similar to G2 and MPCGEN viruses
  • 0% to 5.5% similar to VCL32 viruses (43 out of 100
    comparisons have score > 0)
  • 0% to 1.2% similar to normal files (only 8 out of
    400 comparisons have score > 0)

29
NGVCK Metamorphism/Similarity
  • NGVCK
  • By far the highest degree of metamorphism of any
    kit tested
  • Virtually no similarity to other viruses or
    normal programs
  • Undetectable???

30
PART IV
  • Can Metamorphic Viruses Be Detected?

31
Commercial Virus Scanners
  • Tested three virus scanners
  • eTrust version 7.0.405
  • avast! antivirus version 4.7
  • AVG Anti-Virus version 7.1
  • Each scanned 37 files
  • 10 NGVCK viruses
  • 10 G2 viruses
  • 10 VCL32 viruses
  • 7 MPCGEN viruses

32
Commercial Virus Scanners
  • Results
  • eTrust and avast! detected 17 (G2 and MPCGEN)
  • AVG detected 27 viruses (G2, MPCGEN and VCL32)
  • none of NGVCK viruses detected by the scanners
    tested
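
The counts on these two slides can be checked with a few lines of arithmetic. The sketch below assumes that each scanner caught every file in the families listed for it, which is consistent with the totals of 17 and 27.

```python
scanned = {"NGVCK": 10, "G2": 10, "VCL32": 10, "MPCGEN": 7}
detected_families = {
    "eTrust": ["G2", "MPCGEN"],
    "avast!": ["G2", "MPCGEN"],
    "AVG": ["G2", "MPCGEN", "VCL32"],
}

total = sum(scanned.values())  # 37 files per scanner
hits = {name: sum(scanned[f] for f in fams)
        for name, fams in detected_families.items()}

for name, n in hits.items():
    print(f"{name}: {n}/{total} = {n / total:.1%}")
# eTrust: 17/37 = 45.9%
# avast!: 17/37 = 45.9%
# AVG: 27/37 = 73.0%
```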

33
Hidden Markov Models (HMMs)
  • state machines
  • transitions between states have fixed
    probabilities
  • each state has a probability distribution for
    observing a set of observation symbols
  • can train an HMM to represent a set of data (in
    the form of observation sequences)
  • states correspond to features of the input data
  • transition and observation probabilities
    correspond to statistical properties of features

34
HMM Example - the Occasionally Dishonest Casino
35
HMM Example - the Occasionally Dishonest Casino
  • 2 states: fair/loaded
  • The switch between dice is a Markov process
  • Outcomes of a roll have different probabilities
    in each state
  • If we can only see a sequence of rolls, the state
    sequence is hidden
  • want to understand the underlying Markov process
    from the observations
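
A small sketch of the casino as a generative model. All probabilities below are assumed for illustration, since the slides do not give them. An observer would see only `rolls`; `states` is the hidden part.

```python
import random

STATES = ("fair", "loaded")
# Assumed numbers, not taken from the slides.
TRANS = {"fair": {"fair": 0.95, "loaded": 0.05},
         "loaded": {"fair": 0.10, "loaded": 0.90}}
EMIT = {"fair": {face: 1 / 6 for face in range(1, 7)},
        "loaded": {**{face: 0.1 for face in range(1, 6)}, 6: 0.5}}

def sample(n, rng, state="fair"):
    """Generate n rolls; return (hidden state sequence, observed rolls)."""
    states, rolls = [], []
    for _ in range(n):
        states.append(state)
        faces = list(EMIT[state])
        weights = [EMIT[state][f] for f in faces]
        rolls.append(rng.choices(faces, weights=weights)[0])
        nxt = list(TRANS[state])
        state = rng.choices(nxt, weights=[TRANS[state][s] for s in nxt])[0]
    return states, rolls

states, rolls = sample(20, random.Random(0))
print(rolls)    # what the observer sees
print(states)   # what the observer does not see
```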

36
HMMs - the Three Problems
  • Find the likelihood of seeing an observation
    sequence O given a model λ, i.e. P(O | λ)
  • Find an optimal state sequence that could have
    generated a sequence O
  • Find the model parameters given a sequence O,
    i.e. find transition and observation
    probabilities that maximize the probability of
    observing O
  • There exist efficient algorithms to solve the
    three problems
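
As an illustration of Problem 1, here is a sketch of the forward algorithm with scaling, written in plain Python. The matrix names `A`, `B`, `pi` and the small two-state example are assumptions chosen for illustration, not values from the slides.

```python
import math

def log_likelihood(obs, A, B, pi):
    """Scaled forward algorithm: return log P(O | lambda), lambda = (A, B, pi).

    A[i][j] = P(state j at t+1 | state i at t)
    B[i][k] = P(symbol k | state i)
    pi[i]   = P(initial state i)
    obs     = list of symbol indices
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)              # zero here means O is impossible
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob

# Illustrative two-state, three-symbol model (assumed values).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]]
pi = [0.6, 0.4]
print(math.exp(log_likelihood([0, 1, 0, 2], A, B, pi)))  # about 0.0096296
```

Working in log space matters for long sequences such as opcode streams, where the raw probability underflows quickly.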

37
HMM Application - Determining the Properties of
English Text
  • Given a large quantity of written English text
  • Input a long sequence of observations consisting
    of 27 symbols (the 26 lower-case letters and the
    word space)
  • Train a model to find the most probable
    parameters (i.e., solve Problem 3)
  • Use trained model to score any unknown sequence
    of letters (and spaces) to determine whether it
    corresponds to English text (i.e., solve Problem
    1)

38
HMM Application - Initial and Final Observation
Probability Distributions
39
HMM Application - Results
  • Observation probabilities converged, each letter
    belongs to one of the two hidden states
  • The two states correspond to consonants and
    vowels
  • Note
  • no a priori assumption was made
  • HMM effectively recovered the statistically
    significant feature inherent in English

40
HMM Application - Results
  • Probabilities can be sensibly interpreted for up
    to n = 12 hidden states
  • Trained model could be used to detect English
    text, even if the text is disguised by, say, a
    simple substitution cipher or similar
    transformation

41
Virus Detection with HMMs
  • Use hidden Markov models (HMMs) to represent
    statistical properties of a set of metamorphic
    virus variants
  • Train the model on family of metamorphic viruses
  • Use trained model to determine whether a given
    program is similar to the viruses the HMM
    represents

42
Virus Detection with HMMs
  • A trained HMM
  • maximizes the probabilities of observing the
    training sequence
  • assigns high probabilities to sequences similar
    to the training sequence
  • represents the average behavior if trained on
    multiple sequences
  • represents an entire virus family, as opposed to
    individual viruses

43
Virus Detection with HMMs - Data
  • Data set
  • 200 NGVCK viruses (160 for training, 40 for
    testing)
  • Comparison set
  • 40 normal exes from Cygwin
  • 25 other non-family viruses (G2, MPCGEN and
    VCL32)
  • 25 HMM models generated and tested

44
Virus Detection with HMMs - Methodology
45
Virus Detection with HMMs - Results
46
Virus Detection with HMMs - Results
  • Detect some other viruses for free

47
Virus Detection with HMMs
  • Summary of experimental results
  • All normal programs distinguished
  • VCL32 viruses had scores close to NGVCK family
    viruses
  • With proper threshold, 17 HMM models had 100%
    detection rate and 10 models had 0% false
    positive rate
  • No significant difference in performance between
    HMMs with 3 or more hidden states
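
A sketch of what "with proper threshold" means in practice. It assumes each file gets one HMM score (for example, log-likelihood per opcode; the slides do not name the score), and the scores below are made up for illustration only.

```python
def rates(family_scores, other_scores, threshold):
    """Flag a file as a family member when its score is >= threshold.

    Returns (detection rate, false positive rate).
    """
    detected = sum(s >= threshold for s in family_scores)
    false_pos = sum(s >= threshold for s in other_scores)
    return detected / len(family_scores), false_pos / len(other_scores)

# Made-up scores, not experimental data.
family = [-2.1, -2.3, -2.0, -2.4]
others = [-4.8, -6.5, -2.9, -7.2]

print(rates(family, others, threshold=-2.6))  # (1.0, 0.0)
print(rates(family, others, threshold=-3.0))  # (1.0, 0.25)
```

A non-family file that scores close to the family (the -2.9 above) is what forces a trade-off between the two rates.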

48
Virus Detection with HMMs - Trained Models
  • Converged probabilities in HMM matrices may give
    insight into the features of the represented
    viruses
  • We observe
  • opcodes grouped into hidden states
  • most opcodes in one state only
  • What does this mean?
  • We are not sure

49
HMMs - The Trained Models
50
Detection via Similarity Index
  • Straightforward similarity index can be used as
    detector
  • To determine whether a program belongs to the
    NGVCK virus family, compare it to any randomly
    chosen NGVCK virus
  • NGVCK similarity to non-NGVCK code is small
  • Can use this fact to detect metamorphic NGVCK
    variants
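
A sketch of a similarity-based detector. The transcript does not define the similarity score, so the code substitutes a simple stand-in: the share of order-insensitive opcode triples that two opcode sequences have in common. The function names, the opcode lists and the `threshold` parameter are all hypothetical.

```python
def trigrams(opcodes):
    """Set of order-insensitive opcode triples from a sliding window of 3."""
    return {tuple(sorted(opcodes[i:i + 3])) for i in range(len(opcodes) - 2)}

def similarity(a, b):
    """Average fraction of each file's triples that also occur in the other."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    shared = ta & tb
    return (len(shared) / len(ta) + len(shared) / len(tb)) / 2

def looks_like_family(candidate, representative, threshold):
    """Flag candidate when it is similar enough to one known family member."""
    return similarity(candidate, representative) >= threshold

known = ["push", "mov", "sub", "call", "add", "pop", "ret"]
unrelated = ["xor", "inc", "jmp", "cmp", "jne", "dec", "ret"]
print(similarity(known, known))      # 1.0
print(similarity(known, unrelated))  # 0.0
```

Because similarity to non-family code is near zero, even a low threshold can separate the two groups in this setting.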

51
Detection via Similarity Index
52
Detection via Similarity Index
  • Experiment
  • compare 105 programs to one selected NGVCK virus
  • Results
  • 100% detection, 0% false positives
  • Does not depend on specific NGVCK virus selected

53
PART V
  • Conclusion

54
Conclusion
  • Metamorphic generators vary a lot
  • NGVCK has highest metamorphism (10% similarity on
    average)
  • Other generators far less effective (60%
    similarity on average)
  • Normal files 35% similar, on average
  • But, NGVCK viruses can be detected!
  • NGVCK viruses too different from other viruses
    and normal programs

55
Conclusion
  • NGVCK viruses not detected by commercial scanners
    we tested
  • Hidden Markov model (HMM) detects NGVCK (and
    other) viruses with high accuracy
  • NGVCK viruses also detectable by similarity index

56
Conclusion
  • All metamorphic viruses tested were detectable
    because
  • High similarity within family and/or
  • Too different from normal programs
  • Effective use of metamorphism by virus/worm
    requires
  • A high degree of metamorphism and similarity to
    other programs
  • This is not trivial!

57
Conclusion
  • How practical is our detection method?
  • We cheat in several ways
  • Use IDA to disassemble
  • Viruses not embedded in other code
  • Limited testing, small number of files, etc.
  • But results appear to be robust
  • If so, we can be sloppy (i.e., more efficient)
    and still get good results

58
The Bottom Line
  • Metamorphism for good
  • For example, buffer overflow mitigation
  • A little metamorphism does a lot of good
  • Metamorphism for evil
  • For example, try to evade virus/worm signature
    detection
  • Requires high degree of metamorphism and
    similarity to normal programs
  • Not impossible, but not easy

59
The Bottom Bottom Line
  • For metamorphic software, perhaps the inherent
    advantage lies with the good guys rather than the
    bad guys
  • All too often in information security, the
    advantage lies with the bad guys

60
References
  • X. Gao, Metamorphic software for buffer overflow
    mitigation, MS thesis, Dept. of CS, SJSU, 2005
  • P. Szor, The Art of Computer Virus Research and
    Defense, Addison-Wesley, 2005
  • M. Stamp, Information Security: Principles and
    Practice, Wiley InterScience, 2005
  • W. Wong, Analysis and detection of metamorphic
    computer viruses, MS thesis, Dept. of CS, SJSU,
    2006
  • W. Wong and M. Stamp, Hunting for metamorphic
    engines, to appear in Journal in Computer Virology

61
Appendix
  • Extra Materials

62
HMMs - Run Time of Training Process
  • 5 to 38 minutes, depending on number of states N.

63
HMMs - Run Time of Classifying Process
  • 0.008 to 0.4 milliseconds, depending on N and
    number of opcodes T.

64
AVG Anti-Virus Scanning Result