METAMORPHIC SOFTWARE FOR GOOD AND EVIL - PowerPoint PPT Presentation

Slides: 63
Provided by: Fio93
Learn more at: http://www.cs.sjsu.edu
Transcript and Presenter's Notes

1
METAMORPHIC SOFTWARE FOR GOOD AND EVIL
  • Wing Wong
  • Mark Stamp
  • November 20, 2006

2
Outline
  • Metamorphic software
  • What is it?
  • Good and evil uses
  • Metamorphic virus construction kits
  • How effective are metamorphic engines?
  • How to compare two pieces of code?
  • Similarity of viruses/normal code
  • Can we detect metamorphic viruses?
  • Commercial virus scanners
  • HMMs and similarity index
  • Conclusion

3
PART I
  • Metamorphic Software

4
What is Metamorphic Software?
  • Software is metamorphic provided
  • All copies do the same thing
  • Internal structure differs
  • Today almost all software is cloned
  • Good metamorphic software
  • Mitigate buffer overflow attacks
  • Bad metamorphic software
  • Avoid virus/worm signature detection

5
Metamorphic Software for Good?
  • Suppose program has a buffer overflow
  • If we clone the program
  • One attack breaks every copy
  • Break once, break everywhere (BOBE)
  • If instead, we have metamorphic copies
  • Each copy still has a buffer overflow
  • One attack does not work against every copy
  • BOBE-resistant
  • Analogous to genetic diversity in biology
  • A little metamorphism does a lot of good!

6
Metamorphic Software for Evil?
  • Cloned virus/worm can be detected
  • Common signature on every copy
  • Detect once, detect everywhere (DODE?)
  • If instead virus/worm is metamorphic
  • Each copy has different signature
  • Same detection may not work against every copy
  • Provides DODE-resistance?
  • Analogous to genetic diversity in biology
  • Effective use of metamorphism here is tricky!
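
As a rough illustration of "detect once, detect everywhere", the sketch below scans byte strings for one fixed signature. The byte values and the helper name contains_signature are made up for this example and do not come from the slides. Every cloned copy carries the same contiguous bytes, so a single pattern finds them all, while a copy whose bytes are rearranged no longer matches.

```python
def contains_signature(program: bytes, signature: bytes) -> bool:
    """Return True if the fixed byte signature occurs anywhere in the program."""
    return signature in program


# Hypothetical byte strings standing in for program images (not real code).
SIGNATURE = bytes.fromhex("deadbeef")
cloned_copies = [
    b"\x90\x90" + SIGNATURE + b"\x00",
    b"\x01" + SIGNATURE + b"\x02\x03",
]
# Same marker bytes, but interleaved with filler, so they are no longer contiguous.
morphed_copy = bytes.fromhex("de90adbe90ef")

clone_hits = [contains_signature(c, SIGNATURE) for c in cloned_copies]
morph_hit = contains_signature(morphed_copy, SIGNATURE)
```

Here clone_hits is True for both clones and morph_hit is False, which is the gap that the statistical detectors later in the deck try to close.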

7
Crypto Analogy
  • Consider WWII ciphers
  • German Enigma
  • Broken by Polish and British cryptanalysts
  • Design was (mostly) known to cryptanalysts
  • Japanese Purple
  • Broken by American cryptanalysts
  • Design was (mostly) unknown to cryptanalysts

8
Crypto Analogy
  • Cryptanalysis: break a (known) cipher
  • Diagnosis: determine how an unknown cipher works
    (from ciphertext)
  • Which was the greater achievement, breaking
    Enigma or Purple?
  • Cryptanalysis of Enigma was harder
  • Diagnosis of Purple was harder
  • Can make a reasonable case for either

9
Crypto Analogy
  • What does this have to do with metamorphic
    software?
  • Suppose the good guys generate metamorphic copies
    of software
  • Bad guys can attack individual copies
  • Can bad guys attack all copies?
  • If they can diagnose our metamorphic generator,
    maybe
  • But that's a diagnosis problem

10
Crypto Analogy
  • What about case where bad guys write metamorphic
    code?
  • Metamorphic viruses, for example
  • Do good guys need to solve diagnosis problem?
  • If so, good guys are in trouble
  • Not if good guys only need to detect the
    metamorphic code (not diagnose)
  • Not claiming the good guys' job is easy
  • Just claiming that there is hope

11
Virus Evolution
  • Viruses first appeared in the 1980s
  • Fred Cohen
  • Viruses must avoid signature detection
  • Virus can alter its appearance
  • Techniques employed
  • encryption
  • polymorphism
  • metamorphism

12
Virus Evolution - Encryption
  • Virus consists of
  • decrypting module (decryptor)
  • encrypted virus body
  • Different encryption key
  • different virus body signature
  • Weakness
  • decryptor can be detected

13
Virus Evolution - Polymorphism
  • Try to hide signature of decryptor
  • Can use code emulator to decrypt putative virus
    dynamically
  • Decrypted virus body is constant
  • Once (partially) decrypted, signature detection
    is possible

14
Virus Evolution - Metamorphism
  • Change virus body
  • Mutation techniques
  • permutation of subroutines
  • insertion of garbage/jump instructions
  • substitution of instructions
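
To make "same behavior, different internal structure" concrete, here is a minimal sketch on a made-up three-instruction accumulator language. The interpreter, the opcode names, and both programs are assumptions for illustration only and do not model any real instruction set or kit. The variant uses garbage insertion (nop) and instruction substitution (sub of a negative value in place of add), yet computes the same result.

```python
def run(program, x=0):
    """Interpret a toy accumulator program and return the final value."""
    for op, arg in program:
        if op == "add":
            x += arg
        elif op == "sub":
            x -= arg
        elif op == "nop":
            pass  # garbage instruction: no effect on the result
        else:
            raise ValueError(f"unknown opcode: {op}")
    return x


def opcodes(program):
    """Return only the opcode sequence, which is what a signature would see."""
    return [op for op, _ in program]


original = [("add", 5), ("add", 3)]
# Hypothetical morphed copy: nops inserted, "add 5" replaced by "sub -5".
variant = [("nop", 0), ("sub", -5), ("nop", 0), ("add", 3)]
```

Both programs evaluate to 8, but their opcode sequences differ, so a detector keyed to the exact opcode sequence of one copy would miss the other.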

15
PART II
  • Virus Construction Kits

16
Virus Construction Kits - PS-MPC
  • According to Peter Szor
  • "PS-MPC [Phalcon/Skism Mass-Produced Code
    generator] uses a generator that effectively
    works as a code-morphing engine ... the viruses
    that PS-MPC generates are not only polymorphic,
    but their decryption routines and structures
    change in variants"

17
Virus Construction Kits - G2
  • From the documentation of G2 (Second Generation
    virus generator)
  • different viruses may be generated from
    identical configuration files

18
Virus Construction Kits - NGVCK
  • From the documentation for NGVCK (Next Generation
    Virus Creation Kit)
  • "all created viruses are completely different
    in structure and opcode ... impossible to catch
    all variants with one or more scanstrings ...
    nearly 100% variability of the entire code"
  • Oh, really?

19
PART III
  • How Effective Are Metamorphic Engines?

20
How We Compare Two Pieces of Code
21
Virus Families - Test Data
  • Four generators, 45 viruses
  • 20 viruses by NGVCK
  • 10 viruses by G2
  • 10 viruses by VCL32
  • 5 viruses by MPCGEN
  • 20 normal utility programs from the Cygwin bin
    directory

22
Similarity within Virus Families - Results
23
Similarity within Virus Families - Results
24
Similarity within Virus Families - Results
25
Similarity within Virus Families - Results
26
Similarity within Virus Families - Results
27
NGVCK Similarity to Virus Families
  • NGVCK versus other viruses
  • 0% similar to G2 and MPCGEN viruses
  • 0% to 5.5% similar to VCL32 viruses (43 out of 100
    comparisons have score > 0)
  • 0% to 1.2% similar to normal files (only 8 out of
    400 comparisons have score > 0)

28
NGVCK Metamorphism/Similarity
  • NGVCK
  • By far the highest degree of metamorphism of any
    kit tested
  • Virtually no similarity to other viruses or
    normal programs
  • Undetectable???

29
PART IV
  • Can Metamorphic Viruses Be Detected?

30
Commercial Virus Scanners
  • Tested three virus scanners
  • eTrust version 7.0.405
  • avast! antivirus version 4.7
  • AVG Anti-Virus version 7.1
  • Each scanned 37 files
  • 10 NGVCK viruses
  • 10 G2 viruses
  • 10 VCL32 viruses
  • 7 MPCGEN viruses

31
Commercial Virus Scanners
  • Results
  • eTrust and avast! detected 17 (G2 and MPCGEN)
  • AVG detected 27 viruses (G2, MPCGEN and VCL32)
  • none of NGVCK viruses detected by the scanners
    tested

32
Virus Detection with HMMs
  • Use hidden Markov models (HMMs) to represent
    statistical properties of a set of metamorphic
    virus variants
  • Train the model on family of metamorphic viruses
  • Use trained model to determine whether a given
    program is similar to the viruses the HMM
    represents

33
Virus Detection with HMMs - Data
  • Data set
  • 200 NGVCK viruses (160 for training, 40 for
    testing)
  • Comparison set
  • 40 normal exes from Cygwin
  • 25 other non-family viruses (G2, MPCGEN and
    VCL32)
  • 25 HMM models generated and tested

34
Virus Detection with HMMs - Methodology
35
Virus Detection with HMMs - Results
36
Virus Detection with HMMs - Results
  • Detect some other viruses for free

37
Virus Detection with HMMs
  • Summary of experimental results
  • All normal programs distinguished
  • VCL32 viruses had scores close to NGVCK family
    viruses
  • With proper threshold, 17 HMM models had 100%
    detection rate and 10 models had 0% false
    positive rate
  • No significant difference in performance between
    HMMs with 3 or more hidden states
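
The role of the threshold can be sketched as follows. The scores below are invented for illustration (the slides do not list individual scores), and the function name rates is hypothetical. The assumption is that a higher score means "more like the virus family", and anything at or above the threshold is flagged.

```python
def rates(family_scores, normal_scores, threshold):
    """Flag score >= threshold as 'family virus'.

    Returns (detection_rate, false_positive_rate) as fractions in [0, 1].
    """
    detected = sum(s >= threshold for s in family_scores)
    false_pos = sum(s >= threshold for s in normal_scores)
    return detected / len(family_scores), false_pos / len(normal_scores)


# Hypothetical per-opcode log-likelihood scores (higher = closer to the model).
family = [-2.1, -2.4, -2.0, -2.6]
normal = [-9.5, -11.0, -8.7]

well_placed = rates(family, normal, threshold=-5.0)   # separates the two groups
too_strict = rates(family, normal, threshold=-2.3)    # misses some family members
```

With these made-up numbers, a threshold of -5.0 gives a 100% detection rate with 0% false positives, while -2.3 still has no false positives but detects only half of the family.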

38
Virus Detection with HMMs - Trained Models
  • Converged probabilities in HMM matrices may give
    insight into the features of the represented
    viruses
  • We observe
  • opcodes grouped into hidden states
  • most opcodes in one state only
  • What does this mean?
  • We are not sure

39
Detection via Similarity Index
  • Straightforward similarity index can be used as
    detector
  • To determine whether a program belongs to the
    NGVCK virus family, compare it to any randomly
    chosen NGVCK virus
  • NGVCK similarity to non-NGVCK code is small
  • Can use this fact to detect metamorphic NGVCK
    variants
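
A similarity-based detector of this kind might look like the sketch below. Note the substitution: the score here is a simple Jaccard overlap of opcode trigrams, used as a stand-in, and is not the similarity measure behind the reported results. The opcode sequences, function names, and the 0.2 threshold are all assumptions for illustration.

```python
def trigrams(opcodes):
    """Return the set of consecutive opcode triples in a sequence."""
    return {tuple(opcodes[i:i + 3]) for i in range(len(opcodes) - 2)}


def similarity(a, b):
    """Jaccard overlap of opcode trigrams, in [0, 1] (stand-in measure)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def looks_like_family(candidate, family_sample, threshold):
    """Flag the candidate if it is similar enough to one known family member."""
    return similarity(candidate, family_sample) >= threshold


# Hypothetical opcode sequences.
family_sample = ["push", "mov", "add", "call", "pop", "ret"]
candidate = ["push", "mov", "add", "call", "jmp", "ret"]
unrelated = ["xor", "inc", "cmp", "jne", "dec", "ret"]
```

In this toy data the candidate shares 2 of 6 distinct trigrams with the family sample (score 1/3) and is flagged at a threshold of 0.2, while the unrelated sequence scores 0 and is not.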

40
Detection via Similarity Index
41
Detection via Similarity Index
  • Experiment
  • compare 105 programs to one selected NGVCK virus
  • Results
  • 100% detection, 0% false positives
  • Does not depend on specific NGVCK virus selected

42
PART V
  • Conclusion

43
Conclusion
  • Metamorphic generators vary a lot
  • NGVCK has highest metamorphism (10% similarity on
    average)
  • Other generators far less effective (60%
    similarity on average)
  • Normal files 35% similar, on average
  • But, NGVCK viruses can be detected!
  • NGVCK viruses too different from other viruses
    and normal programs

44
Conclusion
  • NGVCK viruses not detected by commercial scanners
    we tested
  • Hidden Markov model (HMM) detects NGVCK (and
    other) viruses with high accuracy
  • NGVCK viruses also detectable by similarity index

45
Conclusion
  • All metamorphic viruses tested were detectable
    because
  • High similarity within family and/or
  • Too different from normal programs
  • Effective use of metamorphism by virus/worm
    requires
  • A high degree of metamorphism and similarity to
    other programs
  • This is not trivial!

46
The Bottom Line
  • Metamorphism for good
  • Buffer overflow mitigation, BOBE-resistance
  • A little metamorphism does a lot of good
  • Metamorphism for evil
  • For example, try to evade virus/worm signature
    detection
  • Requires high degree of metamorphism and
    similarity to normal programs
  • Not impossible, but not easy

47
The Bottom Bottom Line
  • All-too-often in security, the advantage lies
    with the bad guys
  • For metamorphic software, perhaps the inherent
    advantage lies with the good guys

48
References
  • X. Gao, Metamorphic software for buffer overflow
    mitigation, MS thesis, Dept. of CS, SJSU, 2005
  • P. Szor, The Art of Computer Virus Research and
    Defense, Addison-Wesley, 2005
  • M. Stamp, Information Security: Principles and
    Practice, Wiley InterScience, 2005
  • M. Stamp, Applied Cryptanalysis: Breaking Ciphers
    in the Real World, Wiley, 2007
  • W. Wong, Analysis and detection of metamorphic
    computer viruses, MS thesis, Dept. of CS, SJSU,
    2006
  • W. Wong and M. Stamp, Hunting for metamorphic
    engines, Journal in Computer Virology, Vol. 2,
    No. 3, 2006, pp. 211-229

49
Appendix
  • Bonus Material

50
Hidden Markov Models (HMMs)
  • state machines
  • transitions between states have fixed
    probabilities
  • each state has a probability distribution for
    observing a set of observation symbols
  • states correspond to features of the input data
  • transition and observation probabilities
    correspond to statistical properties of features
  • can train an HMM to represent a set of data (in
    the form of observation sequences)

51
HMM Example - the Occasionally Dishonest Casino
52
HMM Example - the Occasionally Dishonest Casino
  • 2 states: fair/loaded
  • The switch between dice is a Markov process
  • Outcomes of a roll have different probabilities
    in each state
  • If we can only see a sequence of rolls, the state
    sequence is hidden
  • want to understand the underlying Markov process
    from the observations
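
The casino can be written down as a small model and simulated, as in the sketch below. The specific probabilities (0.95/0.05 and 0.10/0.90 for switching, a loaded die that shows six half the time) are illustrative assumptions and are not given in the slides. The function simulate is a hypothetical helper.

```python
import random

STATES = ("fair", "loaded")

# Assumed switching probabilities between the two dice (the Markov process).
TRANSITION = {
    "fair": {"fair": 0.95, "loaded": 0.05},
    "loaded": {"fair": 0.10, "loaded": 0.90},
}

# Assumed roll probabilities per state: the loaded die favors six.
EMISSION = {
    "fair": {face: 1 / 6 for face in range(1, 7)},
    "loaded": {**{face: 0.1 for face in range(1, 6)}, 6: 0.5},
}


def simulate(n_rolls, seed=0):
    """Return (hidden_states, rolls) for n_rolls, starting with the fair die."""
    rng = random.Random(seed)
    state = "fair"
    states, rolls = [], []
    for _ in range(n_rolls):
        states.append(state)
        faces = list(EMISSION[state])
        weights = [EMISSION[state][f] for f in faces]
        rolls.append(rng.choices(faces, weights=weights)[0])
        nxt = list(TRANSITION[state])
        state = rng.choices(nxt, weights=[TRANSITION[state][s] for s in nxt])[0]
    return states, rolls
```

An observer sees only the rolls list. The states list is the hidden part that the HMM machinery tries to reason about.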

53
HMMs - the Three Problems
  • Find the likelihood of seeing an observation
    sequence O given a model λ, i.e., P(O | λ)
  • Find an optimal state sequence that could have
    generated a sequence O
  • Find the model parameters given a sequence O
  • There exist efficient algorithms to solve the
    three problems
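
For the first problem, the standard efficient method is the forward algorithm. The sketch below is a minimal scaled version, assuming the model is given as an initial distribution pi, a transition matrix A, and an observation matrix B, with observations encoded as integer indices. These names and the example numbers are assumptions, not taken from the slides.

```python
import math


def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: return log P(O | model) for model (pi, A, B).

    obs: list of observation indices; pi[i]: initial probability of state i;
    A[i][j]: probability of moving from state i to j; B[i][k]: probability
    of observing symbol k in state i.
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_prob += math.log(scale)
    return log_prob


# Hypothetical two-state model: state 0 = fair die, state 1 = loaded die.
pi = [0.5, 0.5]
A = [[0.95, 0.05], [0.10, 0.90]]
B = [[1 / 6] * 6, [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]]

two_sixes = math.exp(log_likelihood([5, 5], pi, A, B))  # index 5 = face six
```

Dividing the returned log-likelihood by the sequence length gives a per-symbol score, which makes sequences of different lengths comparable.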

54
HMM
55
HMM Application - Determining the Properties of English Text
  • Given a large quantity of written English text
  • Input a long sequence of observations consisting
    of 27 symbols (the 26 lower-case letters and the
    word space)
  • Train a model to find the most probable
    parameters (i.e., solve Problem 3)

56
HMM Application - Initial and Final Observation Probability Distributions
57
HMM Application - Results
  • Observation probabilities converged, each letter
    belongs to one of the two hidden states
  • The two states correspond to consonants and
    vowels
  • Can use trained model to score any unknown
    sequence of letters to determine whether it
    corresponds to English text (i.e., Problem 1)
  • Note
  • no a priori assumption was made
  • HMM effectively recovered the statistically
    significant feature inherent in English

58
HMM Application - Results
  • Probabilities can be sensibly interpreted for up
    to n = 12 hidden states
  • Trained model could be used to detect English
    text, even if the text is disguised by, say, a
    simple substitution cipher or similar
    transformation

59
HMMs - The Trained Models
60
HMMs - Run Time of Training Process
  • 5 to 38 minutes, depending on number of states N.

61
HMMs - Run Time of Classifying Process
  • 0.008 to 0.4 milliseconds, depending on N and
    number of opcodes T.
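
The dependence on N and T is what one would expect if classification uses the standard forward recursion, which is an assumption here since the slides do not name the scoring algorithm. In the usual notation (initial probabilities \pi_i, transition probabilities a_{ij}, observation probabilities b_j, none of which appear in the slides), the recursion is:

```latex
\alpha_1(i) = \pi_i \, b_i(O_1), \qquad
\alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \Big] \, b_j(O_{t+1}), \qquad
P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)
```

Each of the T steps updates N values, and each update sums over N predecessor states, so the work grows roughly as N^2 T.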

62
AVG Anti-Virus Scanning Result