Title: Automating Analysis of Large-Scale Botnet Probing Events
1Automating Analysis of Large-Scale Botnet Probing Events
- Zhichun Li, Anup Goyal, Yan Chen and Vern Paxson
- Lab for Internet and Security Technology (LIST)
Northwestern University - UC Berkeley / ICSI
2Motivation
[Figure: botnets spread across the IPv4 space probe an enterprise network, whose administrators ask: "Does this attack specifically target us?"]
Can we answer this question with only the limited information observed locally in the enterprise?
3Motivation
- Can we infer the probing strategy used by botnets?
- Can we infer whether a botnet probing attack specifically targets a certain network, or whether we are just part of a larger, indiscriminate attack?
- Can we extrapolate global botnet properties given limited local information?
4Agenda
- Motivation
- Basic framework
- Discover the botnet probing strategies
- Extrapolate global properties
- Evaluation
- Conclusions
5Botnet Probing Events
Big spikes in the number of probers are mainly caused by botnets
6System Framework
- See the paper for subtle system details.
7Agenda
- Motivation
- Basic framework
- Discover the botnet probing strategies
- Extrapolate global properties
- Evaluation
- Conclusions
8Discover the Botnet Probing Strategies
- Use statistical tests to understand probing strategies
- Leverage existing statistical tests
  - Monotonic trend checking: detect whether bots probe the IP space monotonically
  - Uniformity checking: detect whether bots scan the IP range uniformly
- Design our own
  - Hitlist (liveness) checking: detect whether they avoid the dark IP space
  - Dependency checking: do the bots scan independently, or are they coordinated?
9Design Space
10Hitlist Checking
- Configure the sensor to be half darknet and half honeynet
- Use as the metric the ratio (number of sources seen in the darknet) / (number of sources seen in the honeynet)
- Threshold: 0.5
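The check above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the direction of the comparison (a ratio below 0.5 means the bots avoid dark space) is an assumption, since the slide only states the threshold value.

```python
def hitlist_ratio(darknet_sources, honeynet_sources):
    """Ratio of distinct sources seen in the darknet half to those seen in the honeynet half."""
    dark = set(darknet_sources)
    honey = set(honeynet_sources)
    if not honey:
        return None  # ratio undefined when no source touched the honeynet half
    return len(dark) / len(honey)


def avoids_dark_space(darknet_sources, honeynet_sources, threshold=0.5):
    """Assumed decision rule: a ratio below the threshold suggests liveness-aware (hitlist) probing."""
    ratio = hitlist_ratio(darknet_sources, honeynet_sources)
    return ratio is not None and ratio < threshold


# One of three sources also probes the darknet half: ratio 1/3, below 0.5.
print(avoids_dark_space(["10.0.0.1"], ["10.0.0.1", "10.0.0.2", "10.0.0.3"]))
```

Counting distinct sources rather than packets keeps a single aggressive bot from dominating the ratio.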
11Agenda
- Motivation
- Basic framework
- Discover the botnet probing strategies
- Extrapolate global properties
  - Global scan scope, total number of bots, total number of scans, and total scan rate of each bot
- Evaluation
- Conclusions
12Extrapolate Global Properties: Basic Ideas and Validation
- Observe the packet fields that change with certain patterns in continuous probes.
  - IPID: a field in the IP header used for IP defragmentation
  - Ephemeral port number: the source port used by bots
  - Both increment by a fixed number per scan
- Validation
  - IPID continuity: all versions of Windows and MacOS
  - Ephemeral port number continuity: botnet source code study
    - Agobot, Phatbot, Spybot, SDbot, rxBot, etc.
  - Controlled experiments with NAT
13Estimate Global Scan Rate of Each Bot
- Count the IPID and ephemeral port changes
- Recover the overflow of the IPID and ephemeral port number
- Estimate the rate with linear regression when the correlation coefficient is > 0.99
- Counter overestimation: use the lesser of the two
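The steps on this slide can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the system's code: the function names are hypothetical, the counters are assumed to wrap at most once between consecutive locally observed probes, the IPID is taken to be a 16-bit counter, and the wrap range of the ephemeral port (`port_modulus`) is a parameter because it depends on the operating system.

```python
from math import sqrt


def unwrap(values, modulus=65536):
    """Undo wrap-around of a counter that increases modulo `modulus`."""
    out, offset = [], 0
    for i, v in enumerate(values):
        if i > 0 and v < values[i - 1]:
            offset += modulus  # counter overflowed since the previous sample
        out.append(v + offset)
    return out


def fit_rate(times, counters, min_r=0.99):
    """Least-squares slope (increments per second), or None if the fit is not linear enough."""
    n = len(times)
    mean_t, mean_c = sum(times) / n, sum(counters) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    syy = sum((c - mean_c) ** 2 for c in counters)
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(times, counters))
    if sxx == 0 or syy == 0:
        return None
    r = sxy / sqrt(sxx * syy)  # Pearson correlation coefficient
    return sxy / sxx if r > min_r else None


def estimate_scan_rate(times, ipids, ports, port_modulus=65536):
    """Use the smaller of the two estimates to counter overestimation."""
    rates = [fit_rate(times, unwrap(ipids)),
             fit_rate(times, unwrap(ports, port_modulus))]
    rates = [r for r in rates if r is not None]
    return min(rates) if rates else None


# Probes seen locally at t = 0, 10, 20, 30 s; the IPID wraps between the 2nd and 3rd.
print(estimate_scan_rate([0, 10, 20, 30],
                         [65000, 65500, 464, 964],
                         [1000, 1400, 1800, 2200]))
```

Taking the minimum reflects the slide's reasoning: other traffic from the same host can inflate either counter, so the smaller slope is the safer estimate of the scan rate.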
14Extrapolate Global Scan Scope
[Figure: botnets scanning the IPv4 space; bot_i is observed locally with n_i = 100 probes]
Total scans from bot_i = scan rate R_i × scan time T_i = 100 × 1000 = 100,000
Local/global ratio
Aggregating multiple bots
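A small worked sketch of this extrapolation follows. It is not taken from the paper: it assumes uniform scanning, reads the slide's example as n_i = 100 local probes out of R_i × T_i = 100,000 total scans, takes the local sensor to be ten /24 networks (2,560 addresses), and aggregates bots by pooling their counts, which is one plausible choice among several. The function name is hypothetical.

```python
def extrapolate_scope(local_size, bots):
    """Estimate the global scan scope in addresses.

    `bots` is a list of (n_i, R_i, T_i) tuples: probes observed locally,
    estimated global scan rate, and probing time window for each bot.
    Under uniform scanning, local/global ratio = sum(n_i) / sum(R_i * T_i),
    so global scope = local_size / ratio.
    """
    observed = sum(n for n, _, _ in bots)
    total = sum(rate * duration for _, rate, duration in bots)
    if observed == 0:
        return None  # nothing observed locally, ratio undefined
    return local_size * total / observed


# One bot: 100 probes seen locally, 100 scans/s for 1000 s, sensor of ten /24s.
print(extrapolate_scope(2560, [(100, 100, 1000)]))
```

For comparison, a /8 holds 16,777,216 addresses, so an estimate near that value would indicate a scope of about one /8.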
15Extrapolate Global Number of Bots
- Idea: similar to mark and recapture
- Assumption: all bots have the same global scan range
[Figure: of M bots in total, m1 are seen in one sample, m2 in another, and m12 in both]
M = m1 × m2 / m12
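The mark-and-recapture estimate can be written as a short Python function. This is a generic sketch of the estimator (M = m1 × m2 / m12), with a hypothetical function name; how the two samples are drawn from the local observations is not specified on the slide and is left to the caller.

```python
def estimate_total_bots(sample1, sample2):
    """Mark-and-recapture estimate of the population size: M = m1 * m2 / m12."""
    s1, s2 = set(sample1), set(sample2)
    overlap = len(s1 & s2)  # m12: bots seen in both samples
    if overlap == 0:
        return None  # no recaptures, the estimate is undefined
    return len(s1) * len(s2) / overlap


first = {"a", "b", "c", "d"}             # m1 = 4
second = {"c", "d", "e", "f", "g", "h"}  # m2 = 6, m12 = 2
print(estimate_total_bots(first, second))
```

The estimate becomes unstable when the overlap is small, so events with few bots seen in both samples deserve less trust.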
16Agenda
- Motivation
- Basic framework
- Discover the botnet probing strategies
- Extrapolate global properties
- Evaluation
- Conclusions
17Dataset
- Based on a honeynet of 10 /24 networks at a national lab (LBNL)
- 293 GB of packet traces over 24 months (2006-07)
- 203 botnet probing events observed in total
- Average number of observed bots per event: 980
- Mainly on SMB/WINRPC, VNC, Symantec, MSSQL, HTTP, Telnet
- Size of the system: 13,900 lines: Bro (6,000), Python (4,000), C (2,500), R (1,400)
18Property Checking Results
- More than 80% uniform scanning
- Validated the results through visualization and found them to be highly accurate.
19Extrapolation Results
- Most of the extrapolated global scopes are at /8 size, which means the botnets do not target the enterprise (LBNL).
- Validation based on DShield data
  - DShield: the largest Internet alert repository
  - Find the /8 prefixes in DShield with sufficient source (bot) overlap with the honeynet events
  - Due to the incompleteness of DShield data, 12 events were validated
  - Calculate the scan scope in each /8 based on the sensor coverage ratio.
20Extrapolation Validation
- Define scope factor as max(DShield/Honeynet, Honeynet/DShield)
- 75% within 1.35; all within 1.5
[Figure: CDF of the scope factor]
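The scope factor is symmetric, so over- and underestimation by the same ratio score the same. A minimal sketch, with a hypothetical function name and made-up scope values used only to show the arithmetic:

```python
def scope_factor(dshield_scope, honeynet_scope):
    """max(DShield/Honeynet, Honeynet/DShield); 1.0 means the two estimates agree exactly."""
    if dshield_scope <= 0 or honeynet_scope <= 0:
        raise ValueError("scopes must be positive")
    return max(dshield_scope / honeynet_scope, honeynet_scope / dshield_scope)


# Illustrative values only: the two calls give the same factor.
print(scope_factor(10_000_000, 13_500_000))
print(scope_factor(13_500_000, 10_000_000))
```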
21Conclusions
- Developed a set of statistical approaches to assess four properties of botnet probing strategies
- Designed approaches to extrapolate the global properties of a scan event based on a limited local view
- Through real-world validation based on DShield, we show that our scheme is promisingly accurate
22Backup
23Event size distribution
24Extrapolate the scope
[Figure: the scope estimate combines the probes observed locally, the local/global ratio, the estimated global probing rate, and the probing time window]
25Monotonic trend checking
- Goal: detect whether the bots probe the IP space monotonically
  - E.g., simple sequential probing
- Technique
  - Mann-Kendall trend test
  - Intuition: check whether the aggregated sign value, the sum of sign(A_{i+1} - A_i), falls outside the range that randomness can achieve.
- When most (> 80%) senders in an event follow a trend, we label the event as following a trend
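A textbook version of the Mann-Kendall test is sketched below. It is not the authors' code: it uses the standard all-pairs form of the statistic, ignores ties, relies on the normal approximation, and uses a critical value of 1.96 (two-sided 5%) purely as an illustrative assumption. The function names are hypothetical.

```python
from math import sqrt


def mann_kendall_z(seq):
    """Normalized Mann-Kendall statistic for one sender's sequence of probed addresses."""
    n = len(seq)
    # S: number of increasing pairs minus number of decreasing pairs.
    s = sum((b > a) - (b < a) for i, a in enumerate(seq) for b in seq[i + 1:])
    if s == 0:
        return 0.0
    var = n * (n - 1) * (2 * n + 5) / 18  # variance of S with no trend and no ties
    return (s - 1) / sqrt(var) if s > 0 else (s + 1) / sqrt(var)


def sender_follows_trend(seq, z_crit=1.96):
    """True if S lies outside the range that random ordering plausibly produces."""
    return abs(mann_kendall_z(seq)) > z_crit


def event_follows_trend(per_sender_seqs, min_fraction=0.8):
    """Label the event as monotonic when more than 80% of senders follow a trend."""
    flagged = sum(sender_follows_trend(s) for s in per_sender_seqs)
    return flagged > min_fraction * len(per_sender_seqs)


sequential = list(range(10))
scattered = [5, 1, 9, 3, 7, 2, 8, 4, 10, 6]
print(sender_follows_trend(sequential), sender_follows_trend(scattered))
```

With very short sequences the test cannot reach significance, so senders that probe only a handful of local addresses will never be flagged.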
26Uniformity Checking
- Goal: detect whether the botnet scans the IP range uniformly.
- Technique
  - Chi-square test
  - Intuition: put addresses into bins. The number of scans observed in each bin should be similar.
  - Significance level of 0.5
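The binning idea can be sketched with a plain chi-square statistic. This is an illustration under assumptions: the function name, the choice of 10 equal-width bins, and the critical value 16.919 (chi-square with 9 degrees of freedom at the 5% level) are not from the slides and serve only as an example.

```python
def chi_square_uniform(addresses, lo, hi, bins=10):
    """Chi-square statistic of probed addresses against a uniform spread over [lo, hi)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for a in addresses:
        idx = min(int((a - lo) / width), bins - 1)  # clamp guards against rounding at the top edge
        counts[idx] += 1
    expected = len(addresses) / bins  # uniform scanning puts the same count in every bin
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, counts


CRITICAL_DF9_5PCT = 16.919  # illustrative critical value for 10 bins

even = list(range(1000))                         # every address probed once
skewed = [5] * 900 + list(range(0, 1000, 10))    # most probes land in the first bin
print(chi_square_uniform(even, 0, 1000)[0] <= CRITICAL_DF9_5PCT)
print(chi_square_uniform(skewed, 0, 1000)[0] <= CRITICAL_DF9_5PCT)
```

A statistic at or below the critical value means uniformity is not rejected, which is the outcome the check treats as uniform scanning.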
27Dependency Checking
- Goal: are the bots trying to stay out of each other's way?
- Idea: count the number of addresses that receive zero scans and compare it with the confidence interval of the independent random case.
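One way to obtain that confidence interval is the classic occupancy calculation, sketched below. This is an assumption-laden illustration rather than the paper's procedure: it models each probe as an independent uniform draw over the monitored addresses, uses a normal approximation with z = 1.96, and all function names are hypothetical.

```python
from math import sqrt


def zero_scan_interval(num_addresses, num_probes, z=1.96):
    """Approximate interval for the number of unprobed addresses if bots scan independently."""
    n, k = num_addresses, num_probes
    p1 = (1 - 1 / n) ** k  # P(a given address receives no probe)
    p2 = (1 - 2 / n) ** k  # P(two given addresses both receive no probe)
    mean = n * p1
    var = max(n * p1 + n * (n - 1) * p2 - (n * p1) ** 2, 0.0)
    half = z * sqrt(var)
    return mean - half, mean + half


def looks_coordinated(zero_count, num_addresses, num_probes):
    """Fewer untouched addresses than independence allows suggests bots avoid overlapping."""
    low, _ = zero_scan_interval(num_addresses, num_probes)
    return zero_count < low


# 1000 probes over 1000 addresses: independent bots leave roughly 368 addresses untouched.
print(zero_scan_interval(1000, 1000))
print(looks_coordinated(0, 1000, 1000))
```

If coordinated bots split the range so that every address is probed exactly once, the zero count is 0, far below the interval.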