Polygraph: Automatically Generating Signatures for Polymorphic Worms

Slides: 47
Provided by: jdne
Learn more at: http://www.cs.cmu.edu
1
Polygraph: Automatically Generating Signatures
for Polymorphic Worms
  • James Newsome, Brad Karp, and Dawn Song

Intel Research Pittsburgh
Carnegie Mellon University
2
Internet Worms
  • Definition: malicious code that propagates by
    exploiting software
  • No human interaction needed
  • Able to spread very quickly
  • Slammer scanned 90% of the Internet in 10 minutes

3
Proposed Defense Strategy
Worm Detected!
  • Honeycomb [Kreibich2003]
  • Autograph [Kim2004]
  • Earlybird [Singh2004]

4
Challenge: Polymorphic Worms
  • Polymorphic worms minimize invariant content
  • Encrypted payload
  • Obfuscated decryption routine
  • Polymorphic tools are already available: Clet, ADMmutate

Do good signatures for polymorphic worms
exist? Can we generate them automatically?
5
Good News: Still some invariant content
  • Protocol framing: needed to make the server go down
    the vulnerable code path
  • Overwritten return address: needed to redirect
    execution to worm code
  • Decryption routine: needed to decrypt main payload
  • BUT, code obfuscation can eliminate patterns here

6
Bad News: Previous Approaches Insufficient
  • Previous approaches use a common substring
  • Longest substring: HTTP/1.1
    (93% false positive rate)
  • Most specific substring: \xff\xbf
    (.008% false positive rate, 10 / 125,301)

7
What to do?
  • No one substring is specific enough
  • BUT, there are multiple substrings:
  • Protocol framing
  • Value used to overwrite return address
  • (Parts of poorly obfuscated code)
  • Our approach: combine the substrings

8
Outline
  • Substring-based signatures insufficient
  • Generating signatures
  • Perfect (noiseless) classifier case
  • Signature classes and algorithms
  • Evaluation
  • Imperfect classifier case
  • Clustering extensions
  • Evaluation
  • Attacking the system
  • Conclusion

9
Goals
  • Identify classes of signatures that can
  • Accurately describe polymorphic worms
  • Be used to filter a high speed network line
  • Be generated automatically and efficiently
  • Design and implement a system to automatically
    generate signatures of these classes

10
Polygraph Architecture
[Figure: a network tap feeds a flow classifier, which sorts flows into a suspicious flow pool and an innocuous flow pool; the signature generator reads both pools and outputs worm signatures]
12
Signature Class (I): Conjunction
  • Signature is a set of strings (tokens)
  • Flow matches signature iff it contains all tokens
    in the signature
  • O(n) time to match (n is flow length)
  • Generated signature:
    GET and HTTP/1.1 and \r\nHost and
    \r\nHost and \xff\xbf
  • .0024% false positive rate (3 / 125,301)
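The matching rule for a conjunction signature can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name, the sample flows, and the choice to treat a token listed twice as "must occur at least twice" are all assumptions. It also scans the flow once per token, so it does not reproduce the single-pass O(n) matcher the slide refers to.

```python
from collections import Counter

def matches_conjunction(flow: bytes, tokens: list[bytes]) -> bool:
    """True iff the flow contains every token, in any order.

    Assumption: a token listed twice must occur at least twice.
    """
    required = Counter(tokens)
    return all(flow.count(tok) >= n for tok, n in required.items())

# Tokens as listed on the slide; the flows below are made up for illustration.
SIGNATURE = [b"GET", b"HTTP/1.1", b"\r\nHost", b"\r\nHost", b"\xff\xbf"]

worm = b"GET /x HTTP/1.1\r\nHost: a\r\nHost: " + b"\x90" * 8 + b"\xff\xbf\r\n\r\n"
benign = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"

print(matches_conjunction(worm, SIGNATURE))    # True
print(matches_conjunction(benign, SIGNATURE))  # False
```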

[Figure: layout of the worm flow, with regions labeled NOP slide, Decryption Routine, Decryption Key, Encrypted Payload, and \xff\xbf]
13
Generating Conjunction Signatures
  • Use suffix tree to find the set of tokens that
    occur in every sample of the suspicious pool and
    are at least 2 bytes long
  • Generation time is linear in total byte size of
    suspicious pool
  • Based on a well-known string processing algorithm
    [Hui1992]
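To make the token-finding step concrete, here is a brute-force sketch in Python. It is not the linear-time suffix-tree algorithm the slide cites: it enumerates substrings of the shortest sample directly, which is far slower but easy to check by hand. The function name, the rule of keeping only maximal tokens, and the sample flows are assumptions made for illustration.

```python
def common_tokens(samples: list[bytes], min_len: int = 2) -> set[bytes]:
    """Substrings of at least min_len bytes that occur in every sample.

    Brute force; only maximal tokens are kept (a simplification).
    """
    base = min(samples, key=len)  # any common substring must occur in the shortest sample
    common = set()
    for i in range(len(base)):
        for j in range(i + min_len, len(base) + 1):
            candidate = base[i:j]
            if all(candidate in s for s in samples):
                common.add(candidate)
            else:
                break  # longer extensions starting at i cannot be common either
    # Drop tokens that are contained in a longer common token.
    return {t for t in common if not any(t != u and t in u for u in common)}

# Made-up suspicious pool: invariant framing plus varying filler bytes.
pool = [
    b"GET /aa HTTP/1.1\r\nXQ\xff\xbf",
    b"GET /zz9 HTTP/1.1\r\nMM\xff\xbf",
    b"GET /q HTTP/1.1\r\nZ\xff\xbf",
]
print(common_tokens(pool))
```

On this pool the surviving tokens are the request prefix, the protocol framing, and the two return-address bytes; the random filler contributes nothing.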

14
Signature Class (II): Token Subsequence
  • Signature is an ordered set of tokens
  • Flow matches iff it contains all the tokens in
    signature, in the given order
  • O(n) time to match (n is flow length)
  • Generated signature:
    GET.*HTTP/1.1.*\r\nHost.*\r\nHost.*\xff\xbf
  • .0008% false positive rate (1 / 125,301)
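Ordered matching can be sketched as a left-to-right scan in Python. The function name and the sample flows are assumptions; the token list follows the signature on the slide. Each token is searched for starting just past the previous match, so tokens may not overlap.

```python
def matches_subsequence(flow: bytes, tokens: list[bytes]) -> bool:
    """True iff the tokens occur in the flow in the given order, without overlap."""
    pos = 0
    for tok in tokens:
        idx = flow.find(tok, pos)  # leftmost occurrence at or after pos
        if idx < 0:
            return False
        pos = idx + len(tok)
    return True

SIGNATURE = [b"GET", b"HTTP/1.1", b"\r\nHost", b"\r\nHost", b"\xff\xbf"]

worm = b"GET /x HTTP/1.1\r\nHost: a\r\nHost: " + b"\x90" * 8 + b"\xff\xbf"
# Same tokens, but \xff\xbf appears before the Host headers instead of after them.
reordered = b"\xff\xbfGET /x HTTP/1.1\r\nHost: a\r\nHost: b"

print(matches_subsequence(worm, SIGNATURE))       # True
print(matches_subsequence(reordered, SIGNATURE))  # False
```

The second flow would satisfy a conjunction of the same tokens, which is why the ordered form can be the more specific of the two.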

15
Generating Token Subsequence Signatures
  • Use dynamic programming to find longest common
    token subsequence (lcseq) between 2 samples in
    O(n^2) time [SmithWaterman1981]
  • Find lcseq of first two samples
  • Iteratively find lcseq of intermediate result and
    next sample
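The fold over samples can be sketched with a plain byte-level longest-common-subsequence table in Python. This is a simplified stand-in, not the Smith-Waterman-style alignment the slide cites: it does not score contiguous runs more highly, so on real traffic it would tend to produce more fragmented output. The names `GAP`, `lcseq`, `tokens`, and `subsequence_signature`, and the sample flows, are assumptions.

```python
from functools import reduce

GAP = None  # marks a wildcard between tokens in an intermediate result

def lcseq(a: list, b: list) -> list:
    """Byte-level LCS of a and b, with GAP inserted between non-adjacent matches."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]; GAP never matches anything.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] is not GAP and a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    out, i, j, prev = [], 0, 0, None
    while i < n and j < m:
        if a[i] is not GAP and a[i] == b[j]:
            if prev is not None and prev != (i - 1, j - 1):
                out.append(GAP)  # the match is not contiguous in both inputs
            out.append(a[i])
            prev = (i, j)
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

def tokens(seq: list, min_len: int = 2) -> list[bytes]:
    """Split a result on GAP and keep runs of at least min_len bytes."""
    toks, cur = [], bytearray()
    for x in list(seq) + [GAP]:
        if x is GAP:
            if len(cur) >= min_len:
                toks.append(bytes(cur))
            cur = bytearray()
        else:
            cur.append(x)
    return toks

def subsequence_signature(samples: list[bytes]) -> list[bytes]:
    """lcseq of the first two samples, then of the running result and each next sample."""
    return tokens(reduce(lcseq, (list(s) for s in samples)))

samples = [
    b"GET /aaa HTTP/1.1\r\nHost: aaa\xff\xbf",
    b"GET /bbbb HTTP/1.1\r\nHost: bbbb\xff\xbf",
    b"GET /cc HTTP/1.1\r\nHost: cc\xff\xbf",
]
print(subsequence_signature(samples))
```

The filler bytes in these samples are deliberately disjoint, so only the invariant framing and the trailing two bytes survive the fold.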

16
Experiment: Signature Generation
  • How many worm samples do we need?
  • Too few samples → signature is too specific
    → false negatives
  • Experimental setup:
  • Using a 25-day port 80 trace from lab perimeter
  • Innocuous pool: first 5 days (45,111 streams)
  • Suspicious pool: generated using the Apache exploit
    described earlier, with non-invariant portions
    filled with random bytes
  • Signature evaluation:
  • False positives: last 10 days (125,301 streams)
  • False negatives: 1000 generated worm samples
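The two evaluation metrics can be written down as a small Python helper. This is only a sketch of the bookkeeping: the function name and the toy flows are assumptions, and `matches` stands for whichever signature matcher is being evaluated. The last lines check the arithmetic behind rates quoted in the deck, using the stream counts given there.

```python
def evaluate(matches, innocuous_flows, worm_flows):
    """Return (false positive %, false negative %) for a matcher function."""
    false_pos = sum(1 for f in innocuous_flows if matches(f))
    false_neg = sum(1 for f in worm_flows if not matches(f))
    return (100.0 * false_pos / len(innocuous_flows),
            100.0 * false_neg / len(worm_flows))

# Toy example: a single-substring "signature" and made-up flows.
def single_substring(flow: bytes) -> bool:
    return b"\xff\xbf" in flow

innocuous = [b"GET /", b"POST \xff\xbf", b"HEAD /", b"GET /a"]
worms = [b"GET \xff\xbf", b"GET \x00\x00"]
print(evaluate(single_substring, innocuous, worms))  # (25.0, 50.0)

# Rates quoted in the deck, as percentages of the 125,301-stream evaluation trace.
print(round(100 * 10 / 125301, 3))  # 0.008
print(round(100 * 3 / 125301, 4))   # 0.0024
print(round(100 * 1 / 125301, 4))   # 0.0008
```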

17
Signature Generation Results
Worm Samples    Conjunction            Subseq
2               100% FN                100% FN
3 to 100        0% FN, .0024% FP       0% FN, .0008% FP
18
Also Works for Binary Protocols
  • Created polymorphic version of BIND TSIG exploit
    used by Li0n Worm
  • Single substring signatures:
  • 2 bytes of return address: .001% false positives
  • 3 byte TSIG marker: .067% false positives
  • Conjunction: 0% false positives
  • Subsequence: 0% false positives
  • Evaluated using a 1 million request trace from a
    DNS server that serves a major university and
    several ccTLDs

20
Noise in Suspicious Flow Pool
  • What if classifier has false positives?
  • 3 worm samples:
    GET .* HTTP/1.1\r\n.*\r\nHost .*\r\nHost.*\xff\xbf
  • 3 worm samples + 1 legit GET request:
    GET .* HTTP/1.1\r\n.*\r\nHost
  • 3 worm samples + a non-HTTP request:
    .*

21
Our Approach: Hierarchical Clustering
  • Used for multiple sequence alignment in
    bioinformatics [Gusfield1997]
  • Initialization:
  • Each sample is a cluster
  • Each cluster has a signature matching all samples
    in that cluster
  • Greedily merge clusters:
  • Minimize false positive rate, using innocuous
    pool
  • Stop when any further merging results in
    significant false positives
  • Output the signature of each final cluster of
    sufficient size
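The greedy merge loop above can be sketched in Python under strong simplifications. Here each flow is represented as a pre-extracted set of tokens, a cluster's signature is the intersection of its members' token sets, and a flow matches when it contains every token in the signature. The function names, the `max_fp` and `min_size` parameters, and the toy pools are all assumptions.

```python
def fp_rate(signature: frozenset, innocuous: list[frozenset]) -> float:
    """Fraction of innocuous flows whose tokens include the whole signature."""
    return sum(signature <= flow for flow in innocuous) / len(innocuous)

def cluster(suspicious, innocuous, max_fp=0.1, min_size=2):
    # Initialization: every sample is its own cluster, with itself as signature.
    clusters = [([s], s) for s in suspicious]
    while len(clusters) > 1:
        # Find the pair whose merged signature has the lowest estimated FP rate.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sig = clusters[i][1] & clusters[j][1]
                fp = fp_rate(sig, innocuous)
                if best is None or fp < best[0]:
                    best = (fp, i, j, sig)
        fp, i, j, sig = best
        if fp > max_fp:
            break  # any further merge would cause significant false positives
        merged = (clusters[i][0] + clusters[j][0], sig)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    # Output signatures only for clusters of sufficient size.
    return [sig for members, sig in clusters if len(members) >= min_size]

GET, HTTP = b"GET", b"HTTP/1.1"
worm1 = frozenset({GET, HTTP, b"\xff\xbf", b"\xde\xad"})
worm2 = frozenset({GET, HTTP, b"\xff\xbf", b"\xde\xad"})
worm3 = frozenset({GET, HTTP, b"\xff\xbf"})
noise1 = frozenset({GET, HTTP, b"Accept"})
noise2 = frozenset({GET, HTTP, b"Cookie"})

suspicious_pool = [worm1, noise1, worm2, noise2, worm3]
innocuous_pool = [noise1, noise2, frozenset({GET, HTTP}), frozenset({b"POST", HTTP})]
print(cluster(suspicious_pool, innocuous_pool))
```

In this toy run the three worm samples end up in one cluster and the two noise samples stay unmerged, because any signature that covers them is reduced to tokens that most innocuous flows also contain.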

22
Hierarchical Clustering
Worm Sample 1
Innoc Sample 1
Worm Sample 2
Innoc Sample 2
Worm Sample 3
Common substrings: HTTP/1.1, GET. High false
positive rate!
24
Hierarchical Clustering
Worm Sample 1
Innoc Sample 1
Worm Sample 2
Innoc Sample 2
Worm Sample 3
Common substrings: HTTP/1.1, GET, \xff\xbf,
\xde\xad. Low false positive rate (but high false
negative rate)
25
Hierarchical Clustering
Worm Sample 1
Innoc Sample 1
Worm Sample 2
Innoc Sample 2
Worm Sample 3
HTTP/1.1, GET, \xff\xbf, \xde\xad
HTTP/1.1, GET, \xff\xbf
26
Clustering Evaluation (with noise)
  • Suspicious pool consists of
  • 5 polymorphic worm samples
  • Varying number of noise samples
  • Noise samples chosen uniformly at random from
    evaluation trace
  • Clustering uses innocuous pool to estimate false
    positive rate

27
Clustering Results
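(Each Fpos / Fneg pair is one generated signature; rates in percent)
Noise   Conjunction (Fpos / Fneg)               Subseq (Fpos / Fneg)
0%      .0024 / 0                               .0008 / 0
38%     .0024 / 0                               .0008 / 0
50%     .0024 / 0                               .0008 / 0
80%     .0024 / 0; .7470 / 100                  .0008 / 0; 1.109 / 100
90%     .0024 / 0; .3384 / 100; .4150 / 100     .0008 / 0; .6903 / 100; 1.716 / 100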
29
Overtraining Attacks
  • Conjunction and Subsequence can be tricked into
    overtraining
  • Red herring attack:
  • Include extra fixed tokens
  • Remove them over time
  • Result: have to keep generating new signatures
  • Coincidental pattern attack:
  • Create coincidental patterns given a small set
    of worm samples
  • Result: more samples needed to generate a
    low-false-negative signature (50)

30
Solution: Threshold matching
  • Signature classifies as worm if enough tokens are
    present
  • Implementation: Bayes signatures
  • Assign each token a score based on Bayes' law
  • Choose highest acceptable false positive rate
  • Choose threshold that gets at most that rate in
    innocuous training pool
  • Properties:
  • Signatures generated and matched in linear time
  • Not susceptible to overtraining attacks
  • Don't need clustering
  • You get the false positive rate you specify
  • Currently does not use ordering
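Choosing the threshold from a target false positive rate can be sketched as follows in Python. The function names, the rule that a flow matches when its score is strictly greater than the threshold, and the toy scores and flows are assumptions; the slides do not spell out these details.

```python
def flow_score(flow: bytes, scores: dict[bytes, float]) -> float:
    """Sum of the scores of all tokens present in the flow."""
    return sum(score for tok, score in scores.items() if tok in flow)

def choose_threshold(scores, innocuous, max_fp_rate):
    """Lowest observed score such that at most max_fp_rate of the innocuous
    pool scores strictly above it."""
    totals = sorted(flow_score(f, scores) for f in innocuous)
    allowed = min(int(max_fp_rate * len(totals)), len(totals) - 1)
    return totals[len(totals) - allowed - 1]

scores = {b"GET": 0.1, b"\xff\xbf": 3.0}           # made-up token scores
innocuous = [b"GET /", b"GET /a", b"POST /", b"GET \xff\xbf"]

t = choose_threshold(scores, innocuous, max_fp_rate=0.25)
flagged = [f for f in innocuous if flow_score(f, scores) > t]
print(t, len(flagged) / len(innocuous))  # 0.1 0.25
```

With a target of 25%, one of the four innocuous flows is allowed above the threshold; with a target of 0%, the threshold rises to the highest innocuous score and none are flagged.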

32
Remaining False Positives
  • Conjunction signature has 3 false positives
  • 1 of these also matched by subsequence signature
  • What is causing these?
  • Would it be so bad if 3 legitimate requests were
    filtered out every 10 days?

33
The Offending Request
  • GET /Download/GetPaper.php?paperId=XXX HTTP/1.1
  • Host: nsdi05.cs.washington.edu\r\n
  • POST /Author/UploadPaper.php HTTP/1.1\r\n
  • Host: nsdi05.cs.washington.edu\r\n
  • <binary data containing \xff\xbf>

34
Possible Fixes
  • Use protocol knowledge:
  • Match on request level instead of TCP flow level
  • Require \xff\xbf be part of Host header
  • Disadvantage: need protocol knowledge
  • Use distance between tokens:
  • Makes signatures more specific
  • Disadvantage: risks more overtraining attacks

35
Future Work
  • Defending against overtraining
  • Further reducing false positives
  • Could be reduced by learning more features (such
    as offsets)
  • But this increases risk of overtraining
  • Promising solution: semantic analysis
  • Automatically analyze how worm exploit works
  • Only use features that must be present
  • First steps in [Newsome05] (NDSS)
  • Currently extending this work
    (Brumley-Newsome-Song)

36
Conclusions
  • Key observation: content variability is limited
    by nature of the software vulnerability
  • Have shown that:
  • Accurate signatures can be automatically
    generated for polymorphic worms
  • Demonstrated low false positives with real
    exploits, on real traffic traces

37
Thanks!
  • Questions?
  • Contact: jnewsome_at_ece.cmu.edu

39

Coincidental Pattern Attack
  • Conjunction and Subsequence may overtrain
  • Coincidental pattern attack:
  • For each non-invariant byte, choose a or b
  • Result:
  • Suspicious pool has many substrings in common, of
    the form aabba, babba
  • Unseen worm samples will have many of these
    substrings, but not every one
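A tiny worked example shows why this causes false negatives. The Python below is an illustration only: the worm layout in `make_worm`, the helper names, and the specific filler strings are hypothetical. Two training samples whose filler is drawn from {a, b} happen to share the substring abba, so a signature that requires every shared substring would include it, yet a later sample need not contain it.

```python
def make_worm(filler: bytes) -> bytes:
    # Hypothetical layout: invariant framing around attacker-chosen filler.
    return b"GET /" + filler + b" HTTP/1.1\r\n\xff\xbf"

def shared_substrings(samples: list[bytes], length: int) -> set[bytes]:
    """Substrings of the given length that occur in every sample."""
    per_sample = [{s[i:i + length] for i in range(len(s) - length + 1)}
                  for s in samples]
    return set.intersection(*per_sample)

pool = [make_worm(b"aabba"), make_worm(b"babba")]

# Shared 4-byte substrings made up purely of filler bytes are coincidental.
coincidental = {t for t in shared_substrings(pool, 4) if set(t) <= set(b"ab")}
print(coincidental)  # {b'abba'}

unseen = make_worm(b"bbaab")
print(all(t in unseen for t in coincidental))  # False
```

The unseen sample still carries the true invariants, but it would be missed by any signature that also demands the coincidental token.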

40
Results with Coincidental Pattern Attack
[Figure: false negatives plotted against suspicious pool size]
41
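Results: Multiple Worms + Noise
(Each Fpos / Fneg pair is one generated signature; rates in percent)
Noise   Conjunction                             Subseq                                  Bayes
0%      .0024 / 0                               .0008 / 0                               .008 / 0
38%     .0024 / 0                               .0008 / 0                               .008 / 0
50%     .0024 / 0                               .0008 / 0                               .008 / 0
80%     .0024 / 0; .7470 / 100                  .0008 / 0; 1.109 / 100                  .008 / 0
90%     .0024 / 0; .3384 / 100; .4150 / 100     .0008 / 0; .6903 / 100; 1.716 / 100     10 / 100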
42
The Innocuous Pool
  • Used to determine:
  • How often tokens appear in legit traffic
  • Estimated signature false positive rates
  • Goals:
  • Representative of current traffic
  • Does not contain worm flows
  • Can be generated by:
  • Taking a relatively old trace
  • Filtering out known worms and exploits

43
Key Algorithm: Token Extraction
  • Need to identify useful tokens:
    substrings that occur in worm samples
  • Problem: find all substrings that occur in at least
    k out of n samples and are at least x bytes long
  • Can be solved in time linear in total length of
    samples using a suffix tree
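The problem statement can be pinned down with a brute-force Python sketch. It does not use a suffix tree and is nowhere near linear time; it simply counts, for each substring, how many samples contain it. The function name and the `max_len` cap (added to keep the enumeration small) are assumptions.

```python
from collections import Counter

def frequent_substrings(samples: list[bytes], k: int,
                        min_len: int = 2, max_len: int = 16) -> set[bytes]:
    """Substrings of min_len..max_len bytes that occur in at least k samples."""
    counts = Counter()
    for s in samples:
        # Use a set so each sample counts a given substring at most once.
        seen = {s[i:i + length]
                for length in range(min_len, max_len + 1)
                for i in range(len(s) - length + 1)}
        counts.update(seen)
    return {tok for tok, c in counts.items() if c >= k}

samples = [b"ab\xff\xbf", b"cd\xff\xbf", b"abzz"]
print(frequent_substrings(samples, k=2))
```

With k = 2 of n = 3, both `ab` and `\xff\xbf` qualify even though neither occurs in every sample, which is the property that lets token extraction tolerate noise in the suspicious pool.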

44
Signature Class (III): Bayes
  • Use a Bayes classifier
  • Presence of a token is a feature
  • Hence, each token has a score
  • Generated signature:
    (GET: .0035, Host: .0022, HTTP/1.1: .11,
    \xff\xbf: 3.15), Threshold = 1.99
  • .008% false positive rate (10 / 125,301)
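Matching against a signature of this form reduces to summing scores and comparing with the threshold. The Python sketch below uses the token scores and threshold shown on the slide; the function names, the sample flows, and the choice of a strict "greater than" comparison are assumptions.

```python
# Token scores and threshold as shown on the slide.
SCORES = {b"GET": 0.0035, b"Host": 0.0022, b"HTTP/1.1": 0.11, b"\xff\xbf": 3.15}
THRESHOLD = 1.99

def bayes_score(flow: bytes) -> float:
    """Sum the scores of the signature tokens present in the flow."""
    return sum(score for tok, score in SCORES.items() if tok in flow)

def matches_bayes(flow: bytes) -> bool:
    # Assumption: a flow matches when its score exceeds the threshold.
    return bayes_score(flow) > THRESHOLD

benign = b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"
worm = b"GET /x HTTP/1.1\r\nHost: a\r\nHost: " + b"\x90" * 8 + b"\xff\xbf"

print(matches_bayes(benign))  # False
print(matches_bayes(worm))    # True
```

With these scores, the three protocol-framing tokens together contribute about 0.12, so only a flow containing \xff\xbf can cross the threshold.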

45
Generating Bayes Signatures
  • Use suffix tree to find tokens that occur in a
    significant number of samples
  • Determine probabilities:
  • Pr(worm) = Pr(¬worm) = .5
  • Pr(substring | worm): use suspicious pool
  • Pr(substring | ¬worm): use innocuous pool
  • Set a certainty threshold c
  • Signature matches a flow if the Bayes formula
    identifies it as more than c likely to be a worm
  • Choose c that results in few (< 5) false
    positives in innocuous pool
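The probability estimates and the posterior can be sketched as a naive Bayes calculation in Python. This is one plausible reading of the slide, not a confirmed reconstruction of the authors' formula: the function names, the add-one smoothing, and the decision to use only tokens that are present in the flow are assumptions.

```python
def token_probabilities(tokens, suspicious, innocuous):
    """Estimate (Pr(token | worm), Pr(token | not worm)) from the two pools.

    Add-one smoothing (an assumption) avoids zero probabilities.
    """
    probs = {}
    for tok in tokens:
        p_worm = (sum(tok in f for f in suspicious) + 1) / (len(suspicious) + 2)
        p_innoc = (sum(tok in f for f in innocuous) + 1) / (len(innocuous) + 2)
        probs[tok] = (p_worm, p_innoc)
    return probs

def worm_posterior(flow: bytes, probs) -> float:
    """Pr(worm | tokens present in flow), with Pr(worm) = Pr(not worm) = .5."""
    like_worm = like_innoc = 0.5
    for tok, (p_worm, p_innoc) in probs.items():
        if tok in flow:
            like_worm *= p_worm
            like_innoc *= p_innoc
    return like_worm / (like_worm + like_innoc)

suspicious = [b"GET a \xff\xbf", b"GET b \xff\xbf", b"GET c \xff\xbf"]
innocuous = [b"GET /%d" % i for i in range(8)]
probs = token_probabilities([b"GET", b"\xff\xbf"], suspicious, innocuous)

c = 0.85  # hypothetical certainty threshold
print(worm_posterior(b"GET z \xff\xbf", probs) > c)  # True
print(worm_posterior(b"GET /home", probs) > c)       # False
```

In practice c would be tuned against the innocuous pool as the last bullet describes, rather than fixed by hand as in this toy run.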

46
Innocuous Pool Poisoning
  • Before releasing worm
  • Determine what signature of worm is
  • Flood Internet with innocuous requests that match
  • Eventually included in innocuous training pool
  • Release worm
  • Polygraph will
  • Generate signature for worm
  • See that it causes many false positives in
    innocuous pool
  • Reject signature
  • Solution: use a relatively old trace for innocuous pool
  • Drawback: hierarchical clustering generates more
    spurious signatures