Title: Finding Diversity in Remote Code Injection Exploits
1Finding Diversity in Remote Code Injection
Exploits
- Justin Ma, John Dunagan, Helen J. Wang,
- Stefan Savage, Geoffrey M. Voelker
- University of California, San Diego
- Microsoft Research
2Encountering new malware
Have I seen this before? How closely related is
it to what I have seen before?
3Practical considerations
New defense?
?
4Theoretical considerations
Evolutionary relationship?
?
?
5Grouping similar malware together
- Ultimately, construct malware families
- Anti-virus industry is active in this area
6Motivation
710 new families, 40,000 new variants
Family and variant defined in ad-hoc fashion
Is there a systematic way to determine the nature
of this diversity?
7Exploit diversity
MS RPC Request Exploit
Attacker
8Polymorphism
Encrypted
Attacker
9Behind the encryption
Attacker
10Differing constants
Different IP address
Attacker
11Functional differences
Waiting for a connection
Attacker
12Different code base
Calling tftp.exe
Attacker
13ISystemActivator vulnerability
How different are they?
1,561 exploit attempts
90 unique payloads
14Our goal
- Automatically construct phylogeny, or family tree
of exploits
15Outline for this talk
- On classifying shellcodes
- Steps for systematically studying shellcodes
- Trace collection
- Shellcode extraction
- Shellcode decryption
- Comparing samples
- Cluster analysis
- Post-hoc manual inspection to validate
- Look at the code!
16Why shellcodes?
- Our study focuses on exploits
- They are packaged with the exploit
- First foreign code that executes on a newly
infected machine
- Part of exploit with most leeway for variation
- Primary challenge: collecting and analyzing
shellcodes
17Remote code injection attacks
low
MS RPC Request Exploit
Vulnerable buffer
Flow of execution
Shellcode
Decrypted shellcode
Victim
high
Victim's stack memory
18Trace collection
- Studying 5 vulnerabilities
- Residential
- 2-day trace
- Windows XP SP2
- 29 unused DSL IP addresses
- 4,400 exploit samples
- Enterprise Trace
- 1 Hour
- Active responders
- 5x /24 subnets
- 1,500 exploit samples
19Shellcode extraction
- Shield (SIGCOMM '04)
- Framework for specifying network-based protocols
and vulnerabilities
- Extracts shellcodes from raw network packets
20Shellcode decryption
- Shellcode is encrypted
- Use shellcode's own decryption loop!
- Limited emulation
- Similar to generic decryption technique used for
viruses
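To illustrate why running the decryption loop recovers the payload, here is a minimal sketch. It assumes a repeating-key XOR decoder with a 4-byte key, which is only one plausible scheme; the approach on this slide emulates the sample's actual decoder instructions rather than hard-coding any scheme. The function name `xor_decode`, the key, and the payload bytes are all hypothetical.

```python
def xor_decode(encoded: bytes, key: bytes) -> bytes:
    """Undo a repeating-key XOR, the kind of work a decoder stub performs."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(encoded))


# Hypothetical payload and 4-byte key, for illustration only.
plain = b"\x90\x90\xeb\x10cmd.exe\x00"
key = b"\x12\x34\x56\x78"

encoded = xor_decode(plain, key)    # XOR is its own inverse, so this "encrypts"
decoded = xor_decode(encoded, key)  # ...and applying it again decrypts
```

Because the decoder ships inside the payload, an emulator that simply steps through it ends up with the plaintext in memory without needing to know the key in advance.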
21Comparing samples: Candidate metrics
- Edit distance
- Too specific: non-code portions of payload made
related exploits unnecessarily distant
- Structural distance
- Control flow graph over basic blocks
- Basic blocks summarized with a color/hash
- Too general: did not capture subtle instruction
variations between exploit families
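The weakness of plain edit distance can be seen in a small sketch. It uses the standard Levenshtein distance over raw payload bytes; the sample bytes are made up, and the trailing padding stands in for the non-code portions of a payload.

```python
def edit_distance(a: bytes, b: bytes) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x -> y
        prev = curr
    return prev[-1]


# Hypothetical samples: identical code bytes, different non-code padding.
code = b"\x31\xc0\x50\x68"
sample_a = code + b"AAAAAAAA"
sample_b = code + b"BBBBBBBB"

raw_distance = edit_distance(sample_a, sample_b)
```

Here `raw_distance` is 8 even though the code bytes are identical, which is the "unnecessarily distant" effect noted above.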
22Comparing samples: Final metric
- Exedit distance metric
- Edit distance over executed parts of shellcode
- Distinguishes code from data
- Maintains instruction-level details
Canonical string for shellcode
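A rough sketch of the idea follows. It assumes the emulator reports the set of payload offsets that were executed, builds the canonical string from those bytes in address order, and divides the edit distance by the longer string's length. The normalization, the helper names, and the sample bytes are assumptions for illustration, not details taken from the slides.

```python
from typing import Iterable


def edit_distance(a: bytes, b: bytes) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def canonical_string(payload: bytes, executed: Iterable[int]) -> bytes:
    """Keep only the bytes at executed offsets, in address order."""
    offsets = sorted(set(executed))
    return bytes(payload[i] for i in offsets if 0 <= i < len(payload))


def exedit_distance(p1: bytes, ex1: Iterable[int],
                    p2: bytes, ex2: Iterable[int]) -> float:
    """Edit distance over executed bytes, scaled to the range [0, 1]."""
    s1, s2 = canonical_string(p1, ex1), canonical_string(p2, ex2)
    longest = max(len(s1), len(s2))
    return edit_distance(s1, s2) / longest if longest else 0.0


# Hypothetical samples: only the first four bytes are ever executed.
sample_a = b"\x31\xc0\x50\x68" + b"AAAAAAAA"
sample_b = b"\x31\xc0\x50\x68" + b"BBBBBBBB"
sample_c = b"\x31\xc0\x50\x6a" + b"AAAAAAAA"   # one code byte differs

same_code = exedit_distance(sample_a, range(4), sample_b, range(4))
one_change = exedit_distance(sample_a, range(4), sample_c, range(4))
```

In this toy case `same_code` is 0.0 because the padding is ignored, while `one_change` is 0.25 because a single executed byte differs, so instruction-level detail is preserved.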
23Cluster analysis
- Need to group samples using the exedit distance
metric
- Agglomerative clustering
- Each iteration, merge closest pair of clusters
- Cluster distance = distance of furthest samples
between two clusters
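The procedure above can be sketched as follows. This is a naive complete-linkage implementation that stops merging once the closest pair of clusters is farther apart than a threshold; the stopping rule, the function names, and the toy numeric samples are assumptions for illustration, and `dist` would be the exedit distance in practice.

```python
from itertools import combinations


def complete_linkage(c1, c2, dist):
    """Cluster distance = distance between the two furthest members."""
    return max(dist(a, b) for a in c1 for b in c2)


def agglomerate(samples, dist, threshold):
    """Repeatedly merge the closest pair of clusters until none is
    within `threshold` of another."""
    clusters = [[s] for s in samples]
    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), complete_linkage(clusters[i], clusters[j], dist))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if d > threshold:
            break
        clusters[i].extend(clusters[j])  # i < j, so index i stays valid
        del clusters[j]
    return clusters


# Toy example: numbers stand in for samples, absolute difference for distance.
families = agglomerate([1, 2, 10, 11, 50], lambda a, b: abs(a - b), threshold=3)
```

With these toy inputs the result is `[[1, 2], [10, 11], [50]]`. Recording the merge order and merge distances, rather than only the final clusters, is what yields a tree over the samples.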
24Results
- Caught exploits for 5 vulnerabilities over traces
- Summary for residential trace
25ISystemActivator
10 clustering threshold
Need to manually verify this
6 families
26ISystemActivator
4-byte decoding key
Kernel-address loading function
Function-finding block
27ISystemActivator
4-byte decoding key
Kernel-address loading function
Function-finding block
28ISystemActivator
Longest payload
Many function blocks in middle of payload
29ISystemActivator
Command-line call to tftp.exe
30ISystemActivator
Different instructions in parts, otherwise very
similar
31ISystemActivator
Connect-back version
Bind version
32Conclusions
- Systematic method for classifying exploits
- Exploit collection
- Shellcode extraction and decryption
- Shellcode comparison using exedit distance
- Group exploits with clustering
- Similarity between samples in computed
phylogenies corresponded well with observed
differences
- Useful step toward automating malware
classification