Title: Automated Worm Fingerprinting
1. Automated Worm Fingerprinting
- Sumeet Singh, Cristian Estan, George Varghese, and Stefan Savage
- Manan Sanghi
2. The menace
3. Context
- Worm detection
  - Scan detection
  - Honeypots
  - Host-based behavioral detection
  - Payload-based ???
4. Context
- Characterization
  - A priori vulnerability signatures
    - Generally manual
  - Honeycomb
    - Host based
    - Longest common substrings
  - Autograph
    - Network-level automatic signature generation
5. Context
- Containment (Internet Quarantine)
  - Host quarantine (address blacklisting)
  - String matching (content filtering)
  - Connection throttling
6. Worm behavior
- Content invariance
  - Limited polymorphism, e.g., encryption
  - Key portions are invariant, e.g., the decryption routine
- Content prevalence
  - Invariant portions appear frequently
- Address dispersion
  - The number of distinct infected hosts grows over time
  - Reflected in many different source and destination addresses
7. Key Idea
- Detect unknown worms on the basis of
  - A common exploit sequence
  - A large number of unique sources and destinations
8. Content Sifting
- For each string w, maintain
  - prevalence(w): number of times it is found in the network traffic
  - sources(w): number of unique sources corresponding to it
  - destinations(w): number of unique destinations corresponding to it
- If the thresholds are exceeded, then block(w)
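The bookkeeping above can be sketched naively with exact dictionaries and sets. This is the memory-hungry baseline that the following slides replace with approximate structures. The class name `NaiveContentSifter`, the `(src, dst, payload)` call shape, the substring length `beta`, and the choice to count each string at most once per packet are assumptions for illustration; the default thresholds echo the parameters listed under Experience.

```python
from collections import defaultdict

class NaiveContentSifter:
    """Exact content sifting over fixed-length substrings (hypothetical baseline)."""

    def __init__(self, beta=40, prevalence_threshold=3, dispersion_threshold=30):
        self.beta = beta
        self.prevalence_threshold = prevalence_threshold
        self.dispersion_threshold = dispersion_threshold
        self.prevalence = defaultdict(int)     # w -> number of packets containing w
        self.sources = defaultdict(set)        # w -> distinct source addresses
        self.destinations = defaultdict(set)   # w -> distinct destination addresses
        self.blocked = set()

    def process(self, src, dst, payload: bytes) -> None:
        """Update the three counters for every beta-byte substring w of one packet."""
        seen_here = set()
        for i in range(len(payload) - self.beta + 1):
            w = payload[i:i + self.beta]
            if w in seen_here:
                continue  # assumption: count each string at most once per packet
            seen_here.add(w)
            self.prevalence[w] += 1
            self.sources[w].add(src)
            self.destinations[w].add(dst)
            if (self.prevalence[w] >= self.prevalence_threshold
                    and len(self.sources[w]) >= self.dispersion_threshold
                    and len(self.destinations[w]) >= self.dispersion_threshold):
                self.blocked.add(w)  # block(w)
```

Every distinct substring gets its own entry and two address sets, which is exactly why this does not scale to a gigabit link.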
9. Issues
- How can prevalence(w), sources(w), and destinations(w) be computed efficiently?
- Scalability
  - Low memory and CPU requirements
  - Real-time deployment over a gigabit-scale link
10. prevalence(w)
- w = entire packet
  - Use multi-stage filters (k-ary sketches?)
- w = substring of small fixed length $\beta$
  - Rabin fingerprints
  - Value sampling
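A multi-stage filter replaces the exact prevalence table with a few small counter arrays. Each stage hashes the key into its own array, and a key is reported as prevalent only if its counter reaches the threshold in every stage. The sketch below uses plain (non-conservative) updates; the class name, stage and bucket counts, and the SHA-256-based per-stage hashing are assumptions, not the paper's implementation.

```python
import hashlib

class MultiStageFilter:
    """Approximate prevalence test: several stages of counters, each with its own hash."""

    def __init__(self, stages=4, buckets=1024, threshold=3):
        self.buckets = buckets
        self.threshold = threshold
        self.counters = [[0] * buckets for _ in range(stages)]

    def _index(self, stage: int, key: bytes) -> int:
        # Prefixing the stage number gives each stage a different hash of the key.
        digest = hashlib.sha256(stage.to_bytes(2, "big") + key).digest()
        return int.from_bytes(digest[:8], "big") % self.buckets

    def add(self, key: bytes) -> bool:
        """Count one occurrence of key; return True once it looks prevalent."""
        positions = [self._index(s, key) for s in range(len(self.counters))]
        for stage, pos in enumerate(positions):
            self.counters[stage][pos] += 1
        # Collisions only inflate counters, so a truly prevalent key is never missed;
        # a false positive requires colliding with heavy keys in every stage at once.
        return min(self.counters[s][p] for s, p in enumerate(positions)) >= self.threshold
```

Only keys that pass such a filter are worth the cost of per-key address-dispersion state.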
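11. Value Sampling
- The problem: a packet of $s$ bytes contains $s - \beta + 1$ substrings of length $\beta$, too many to track
- Solution: sample
  - But random sampling is not good enough: different copies of the same worm would sample different substrings, so their counts would never accumulate
  - Trick: sample only those substrings whose fingerprint matches a certain pattern (e.g., its low-order bits are all zero)
  - Since Rabin fingerprints are randomly distributed, each substring is sampled with probability $f$, so an invariant string of $x$ bytes is tracked with probability
  - $P_{\text{track}}(x) = 1 - (1 - f)^{x - \beta + 1} \approx 1 - e^{-f(x - \beta + 1)}$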
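The fingerprinting and value-sampling steps can be sketched as follows. A true Rabin fingerprint treats the window as a polynomial over GF(2); this sketch substitutes a Rabin-Karp style polynomial hash modulo the Mersenne prime 2^61 - 1, which has the same rolling property (an O(1) update per byte). The names `rolling_fingerprints`, `value_sample`, and `p_track`, and the default of keeping fingerprints whose low 6 bits are zero (f = 1/64), are assumptions for illustration.

```python
import math

BASE = 257
MOD = (1 << 61) - 1  # Mersenne prime modulus (stand-in for a GF(2) Rabin polynomial)

def rolling_fingerprints(payload: bytes, beta: int):
    """Yield (offset, fingerprint) for each of the len(payload) - beta + 1 windows."""
    if len(payload) < beta:
        return
    top = pow(BASE, beta - 1, MOD)  # weight of the byte that leaves the window
    h = 0
    for byte in payload[:beta]:
        h = (h * BASE + byte) % MOD
    yield 0, h
    for i in range(beta, len(payload)):
        # Remove payload[i - beta], shift, and append payload[i] in O(1).
        h = ((h - payload[i - beta] * top) * BASE + payload[i]) % MOD
        yield i - beta + 1, h

def value_sample(payload: bytes, beta: int, mask_bits: int = 6):
    """Keep only windows whose fingerprint has its low mask_bits bits all zero.

    The decision depends only on the window's content, so every copy of the
    same invariant substring is sampled (or skipped) consistently.
    """
    mask = (1 << mask_bits) - 1
    return [(off, fp) for off, fp in rolling_fingerprints(payload, beta) if fp & mask == 0]

def p_track(x: int, beta: int, f: float) -> float:
    """Exponential approximation of the chance an x-byte invariant string is sampled."""
    if x < beta:
        return 0.0  # no complete window fits inside the invariant string
    return 1.0 - math.exp(-f * (x - beta + 1))
```

Sampling by value rather than at random is what lets two packets carrying the same worm body at different offsets contribute to the same counters.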
12. sources(w) and destinations(w)
- Address dispersion
  - Counting distinct elements, as opposed to repeated elements
  - A simple list or hash table is too expensive
- Key idea: bitmaps
- Trick: scaled bitmaps
13. Direct Bitmap
- Each content source is hashed into a bitmap and the corresponding bit is set; an alarm is raised when the number of bits set exceeds a threshold
- Drawback: a small bitmap saturates quickly, losing the estimate of each counter's actual value
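A direct bitmap can be sketched with the standard linear-counting estimate n ≈ b·ln(b/z) for a b-bit bitmap with z zero bits. The class name, the bitmap sizes, and the SHA-256 hash are assumptions. The estimate also makes the drawback concrete: as z approaches 0 the bitmap saturates and the count can no longer be estimated.

```python
import hashlib
import math

class DirectBitmap:
    """Direct bitmap: each address sets one of `bits` positions chosen by a hash."""

    def __init__(self, bits=64):
        self.bits = bits
        self.bitmap = 0  # a Python int used as a bit array

    def add(self, address: str) -> None:
        digest = hashlib.sha256(address.encode()).digest()
        self.bitmap |= 1 << (int.from_bytes(digest[:8], "big") % self.bits)

    def set_bits(self) -> int:
        return bin(self.bitmap).count("1")

    def alarm(self, bits_threshold: int) -> bool:
        """Raise an alarm once the number of set bits exceeds the threshold."""
        return self.set_bits() > bits_threshold

    def estimate(self) -> float:
        """Linear-counting estimate b * ln(b / z) of the distinct addresses seen."""
        zeros = self.bits - self.set_bits()
        if zeros == 0:
            return math.inf  # saturated: the actual count can no longer be estimated
        return self.bits * math.log(self.bits / zeros)
```

Repeated addresses set the same bit, so the bitmap counts distinct sources rather than packets.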
14. Scaled Bitmap
- Idea: subsample the range of the hash space
- How does it work?
  - Multiple bitmaps, each mapped to a progressively smaller portion of the hash space
  - Bitmaps are recycled when necessary
- Result: roughly 5 times less memory, while still estimating the actual address dispersion
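A simplified scaled bitmap can be sketched as follows. An address's hash picks a level l with probability 2^-(l+1) (the count of leading zero bits); only levels at or above `base` are kept; bitmap i records level base + i, and the last bitmap absorbs all deeper levels, so together the bitmaps hold a 2^-base sample of the addresses. When the first bitmap (the largest portion of the hash space) is half full, it is discarded and `base` advances. The class name, sizes, half-full recycling rule, and the estimator (2^base times the sum of per-bitmap linear-counting estimates) are assumptions; the paper's exact recycling and estimation rules may differ.

```python
import hashlib
import math

class ScaledBitmap:
    """Simplified scaled bitmap for estimating the number of distinct addresses."""

    def __init__(self, bits=64, num_bitmaps=4):
        self.bits = bits
        self.k = num_bitmaps
        self.base = 0                   # levels below base are no longer sampled
        self.maps = [0] * num_bitmaps   # maps[i] records level base + i (ints as bit arrays)

    @staticmethod
    def _hash(item: bytes):
        digest = hashlib.sha256(item).digest()
        # Two independent 64-bit values: one picks the level, one picks the bit.
        return int.from_bytes(digest[:8], "big"), int.from_bytes(digest[8:16], "big")

    @staticmethod
    def _popcount(x: int) -> int:
        return bin(x).count("1")

    def add(self, item: bytes) -> None:
        h_level, h_bit = self._hash(item)
        level = 64 - h_level.bit_length()  # leading zeros: P(level = l) = 2^-(l+1)
        if level < self.base:
            return  # falls in a portion of the hash space that is no longer sampled
        idx = min(level - self.base, self.k - 1)  # last bitmap absorbs deeper levels
        self.maps[idx] |= 1 << (h_bit % self.bits)
        # Recycle: when the bitmap covering the largest portion is half full,
        # discard it, shift the others down, and append an empty bitmap.
        while self._popcount(self.maps[0]) > self.bits // 2:
            self.maps = self.maps[1:] + [0]
            self.base += 1

    def estimate(self) -> float:
        """2^base times the sum of per-bitmap linear-counting estimates."""
        total = 0.0
        for m in self.maps:
            zeros = self.bits - self._popcount(m)
            total += self.bits * math.log(self.bits / max(zeros, 1))
        return total * 2 ** self.base
```

With four 64-bit bitmaps this structure keeps tracking well past the point where a single 64-bit direct bitmap would saturate, at the cost of sampling error that grows as `base` increases.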
15. Putting it together
16. Experience
- System design: sensors and aggregators
  - Sensors sift through traffic in configurable address-space zones of responsibility
  - An aggregator coordinates real-time updates from the sensors, coalesces related signatures, and so on
- Parameters
  - Content prevalence threshold: 3
  - Address dispersion threshold: 30
  - Garbage collection time: several hours
17. prevalence(w) threshold
18. Address Dispersion threshold
19. Garbage Collection threshold
20. Trace-based False Positives
21. Performance
- Processing time
- Memory consumption: 4 MB
22. Live Experience
- Detect known worms CodeRed,
- Detect new worms MyDoom, Sasser, Kibvu.B
23. Limitations and Extensions
- Variant content
- Network evasion
- Extension: dealing with slow worms
24. Comparison
Earlybird | Autograph
Infect the system with network data (real traces) | Infect the system with network data (real traces)
Rabin fingerprint | Rabin fingerprint
White-list/blacklist | White-list/blacklist
No prefiltering | Flow reassembly
Single sensor algorithmics, centralized aggregators | Distributed deployment, active cooperation between multiple sensors
On-line | Off-line
Overlapping, fixed-length chunks | Non-overlapping, variable-length chunks
Qinghua Zhang
25. Breather
26. Polygraph: Automatically Generating Signatures for Polymorphic Worms
- James Newsome, Brad Karp, Dawn Song
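27. The case for polymorphic worms
- A single substring is insufficient: a polymorphic worm's invariant bytes may be split across several short substrings, none of which alone is both
  - Sensitive: it should appear in every payload of the worm
  - Specific: it should be long enough not to appear in any non-worm payload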
28. Examples
29. Signature Classes
- Signature: a set of tokens
- Conjunction signatures
- Token-subsequence signatures
- Bayes signatures
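The three signature classes differ mainly in how a payload is matched against the token set. A minimal sketch, assuming tokens are byte strings and, for Bayes signatures, that each token already carries a score (for example a log-likelihood-ratio style weight) and the signature has a threshold; in Polygraph these come from training, here they are simply given. The function names are hypothetical.

```python
def matches_conjunction(tokens, payload: bytes) -> bool:
    """Conjunction signature: every token occurs somewhere, in any order."""
    return all(t in payload for t in tokens)

def matches_subsequence(tokens, payload: bytes) -> bool:
    """Token-subsequence signature: tokens occur in the given order, without overlap."""
    pos = 0
    for t in tokens:
        idx = payload.find(t, pos)  # earliest match keeps the most room for later tokens
        if idx < 0:
            return False
        pos = idx + len(t)
    return True

def matches_bayes(scored_tokens: dict, threshold: float, payload: bytes) -> bool:
    """Bayes signature: sum the scores of the tokens present; match at or above threshold."""
    score = sum(s for t, s in scored_tokens.items() if t in payload)
    return score >= threshold
```

Conjunctions ignore order, subsequences add ordering constraints, and Bayes signatures tolerate a missing token as long as the remaining evidence is strong enough.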
30. Problem Formulation
31. Algorithms
- Preprocessing
  - Extract tokens: distinct substrings of a minimum length l that occur in at least k samples in the suspicious pool
- Generating signatures
  - Conjunction signatures
  - Token-subsequence signatures
  - Bayes signatures
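The preprocessing step can be sketched by brute force: enumerate every distinct substring of length at least l in each sample, count how many samples contain it, and keep those found in at least k samples. Setting k to the pool size and discarding tokens contained in a longer token yields a simple conjunction signature. This naive enumeration is roughly cubic in sample length and stands in for a suffix-tree style algorithm a practical implementation would use; the function names and the pruning rule are assumptions.

```python
from collections import Counter

def extract_tokens(pool, l: int, k: int):
    """Distinct substrings of length >= l that occur in at least k samples."""
    coverage = Counter()
    for sample in pool:
        subs = {sample[i:j]
                for i in range(len(sample))
                for j in range(i + l, len(sample) + 1)}
        coverage.update(subs)  # a set, so each substring counts once per sample
    return {tok for tok, count in coverage.items() if count >= k}

def conjunction_signature(pool, l: int):
    """Tokens present in every sample, minus tokens contained in a longer token."""
    tokens = extract_tokens(pool, l, len(pool))
    return sorted(t for t in tokens if not any(t != u and t in u for u in tokens))
```

For example, three payloads that share only `GET /` and the bytes `\xff\xbf` at varying offsets yield exactly those two tokens when l = 2.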
32. Wrap Up
- Automated Worm Fingerprinting (OSDI 2004)
- Polygraph: Automatically Generating Signatures for Polymorphic Worms (IEEE Symposium on Security and Privacy 2005)
Manan Sanghi