BotGraph: Large Scale Spamming Botnet Detection - PowerPoint PPT Presentation

About This Presentation

Title:

BotGraph: Large Scale Spamming Botnet Detection

Description:

Goal: stop aggressive account signup, limit outgoing spam. Challenges ... Severely limit attackers' spam throughput. Conclusions ... – PowerPoint PPT presentation

Slides: 28

Provided by: Jin1

Learn more at: https://users.cs.northwestern.edu

Transcript and Presenter's Notes

Title: BotGraph: Large Scale Spamming Botnet Detection

1
BotGraph: Large Scale Spamming Botnet Detection

Yao Zhao
Yinglian Xie, Fang Yu, Qifa Ke, Yuan Yu, Yan Chen and Eliot Gillum
EECS Department, Northwestern University
Microsoft Research Silicon Valley
Microsoft Corporation

2
Web-Account Abuse Attack
[Diagram labels: zombie (compromised host); spammer's server; user/password; CAPTCHA solver; sample CAPTCHA text "RDSXXTD3"]
3
Problems and Challenges

Detect web-account abuse with Hotmail logs
Input: user activity traces (signup, login, email-sending records)
Goal: stop aggressive account signup, limit outgoing spam
Challenges
Attack is stealthy: individual account detection is difficult
Attack is large scale: finding correlated activities
>500 million accounts
300GB-400GB data per month
Low false positive and false negative rates

4
The BotGraph System

A graph-based approach to attack detection
A large user-user graph to capture bot-account correlations
Identify 26M bot-accounts with a low false positive rate in two months
Efficient implementation using Dryad/DryadLINQ
Graph construction/analysis is not easily parallelizable
Hundreds of millions of nodes, hundreds of billions of edges
Process 200GB-300GB data in 1.5 hours with a 240-machine cluster
The first to provide a systematic solution to the new botnet-based web-account abuse attack

5
System Architecture
1. History-based algorithm to detect aggressive signups
EWMA-based change detection
(ID, IP, time)
Verification & prune
(ID, time, # of recipients)
2. Graph-based algorithm to find correlations
Verification & prune
Random-graph-based clustering
Graph generation
(ID, IP, time)
Login graph
Login data
3. Parallel algorithm on DryadLINQ clusters
6
Detect Aggressive Signups
[Figure: number of signup accounts per day, 1-Jul to 9-Jul, plotting the signup count against the EWMA prediction; annotations mark a large prediction error followed by a return to normal]

Simple and efficient
Detect 20 million malicious accounts in 2 months
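
The EWMA-based change detection named on this slide can be illustrated with a minimal sketch. It assumes a daily series of signup counts and flags a day when the observed count exceeds the exponentially weighted moving average prediction by a large margin. The function name `ewma_anomalies`, the smoothing factor `alpha`, and the `threshold` rule are illustrative assumptions, not the parameters used by BotGraph.

```python
def ewma_anomalies(counts, alpha=0.5, threshold=3.0):
    """Return indices of days whose count far exceeds the EWMA prediction.

    counts    -- daily signup counts (hypothetical input format)
    alpha     -- EWMA smoothing factor (assumed value)
    threshold -- flag a day when the prediction error exceeds
                 threshold * prediction (assumed rule)
    """
    flagged = []
    prediction = float(counts[0])  # seed the prediction with the first day
    for day, observed in enumerate(counts[1:], start=1):
        error = observed - prediction
        if error > threshold * max(prediction, 1.0):
            # Large prediction error: flag the day and leave the baseline
            # untouched so the burst does not inflate later predictions.
            flagged.append(day)
        else:
            # Normal day: fold the observation into the moving average.
            prediction = alpha * observed + (1 - alpha) * prediction
    return flagged


daily_signups = [2, 3, 2, 3, 20, 22, 3, 2, 3]
print(ewma_anomalies(daily_signups))
```

Skipping the update on flagged days is one possible design choice: it lets the prediction return to normal as soon as the burst ends, matching the "back to normal" annotation in the plot.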

7
System Architecture
1. History-based algorithm on signup detection
EWMA-based change detection
Verification & prune
(ID, IP, time)
(ID, time, # of recipients)
2. Graph-based algorithm on login detection
Verification & prune
Random-graph-based clustering
Graph generation
(ID, IP, time)
Login graph
Login data
3. Parallel algorithm on DryadLINQ clusters
8
Detect Stealthy Accounts by Graphs

Observation: bot-accounts work collaboratively
Normal Users
Share IP addresses in one AS with DHCP assignment
Bot-users

A user-user graph to model behavior similarities
9
Detect Stealthy Accounts by Graphs

Observation: bot-accounts work collaboratively
Normal Users
Share IP addresses in one AS with DHCP assignment
Bot-users
Likely to share different IPs across ASes

A user-user graph to model behavior similarities
10
User-user Graph
Node: Hotmail account
Edge weight: # of ASes of the shared IP addresses
Consider edges with weight > 1
Key observations
Bot-users form a giant connected component while normal users do not
Interpreted by random graph theory

[Figure: example user-user graph over User1 to User6, with edges labeled by the number of ASes shared (1 AS, 2 ASes, 3 ASes, 4 ASes, 5 ASes)]
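
The graph definition on this slide can be sketched as follows. The sketch assumes each login record already carries the AS number of its IP address (the slides list login data as (ID, IP, time), so the IP-to-AS lookup is an added assumption), and the function name `build_user_user_graph` is hypothetical. Two accounts get an edge whose weight is the number of distinct ASes covering the IP addresses they share, and only edges with weight > 1 are kept.

```python
from collections import defaultdict
from itertools import combinations


def build_user_user_graph(logins, min_weight=2):
    """Build weighted user-user edges from login records.

    logins     -- iterable of (user_id, ip, asn) tuples; the asn field is an
                  assumption (the slides list login data as (ID, IP, time))
    min_weight -- keep edges with at least this many shared ASes
                  (2 corresponds to "weight > 1" on the slide)
    Returns a dict mapping (user_a, user_b) to the edge weight.
    """
    users_by_ip = defaultdict(set)
    asn_of_ip = {}
    for user, ip, asn in logins:
        users_by_ip[ip].add(user)
        asn_of_ip[ip] = asn

    # For every pair of users seen on the same IP, record that IP's AS.
    shared_asns = defaultdict(set)
    for ip, users in users_by_ip.items():
        for u, v in combinations(sorted(users), 2):
            shared_asns[(u, v)].add(asn_of_ip[ip])

    # Edge weight = number of distinct ASes among the shared IPs.
    return {
        pair: len(asns)
        for pair, asns in shared_asns.items()
        if len(asns) >= min_weight
    }
```

Enumerating all user pairs per IP is quadratic in the number of users behind that IP, which hints at why the slides call graph construction hard to parallelize at Hotmail scale; this sketch is only meant for small inputs.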
11
Random Graph Theory

Random graph G(n,p)
n nodes; each pair of nodes has an edge with probability p, giving average degree d = (n-1) p
Theorem
If d < 1, then with high probability the largest component in the graph has size at most O(log n)
No large connected subgraph
If d > 1, then with high probability the graph contains a giant component with size on the order of n
Most nodes are in one connected subgraph
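
The threshold behavior in the theorem can be checked empirically with a small simulation. This is an illustrative sketch, not part of BotGraph: it samples G(n,p) with p = d / (n-1), merges endpoints with a union-find structure, and reports the size of the largest connected component. The function name `largest_component_size` and the `seed` parameter are assumptions made for the example.

```python
import random


def largest_component_size(n, d, seed=0):
    """Sample G(n, p) with average degree d and return its largest component size."""
    rng = random.Random(seed)
    p = d / (n - 1)  # from d = (n-1) p
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each pair of nodes gets an edge independently with probability p.
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv

    # Count nodes per component root.
    sizes = {}
    for x in range(n):
        root = find(x)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())
```

Calling `largest_component_size(1000, 0.5)` and `largest_component_size(1000, 3.0)` should show a small largest component in the first case and one covering most of the graph in the second, which is the contrast the slides use to separate normal users from bot-users.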

12
Graph-based Bot-user Detection

Step 1: detect giant connected components from the user-user graph
Step 2: hierarchical algorithm to identify the correct groupings
Different bot-user groups may be mixed
Easier validation with correct group statistics
Difficult to choose a fixed edge threshold
Step 3: prune normal-user groups
Due to national proxies, cell phone users, Facebook applications, etc.

14
Hierarchical Bot-Group Extraction
[Figure: hierarchical group extraction, with levels labeled T2, T3, T4]
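
One way to read the hierarchical extraction is sketched below, under the assumption that the labels T2, T3, T4 denote increasing edge-weight thresholds (the slide shows only the labels). The sketch finds connected components at one threshold, then recurses into each component with a stricter threshold, so groups that are mixed at a loose threshold can separate at a tighter one. The names `connected_components` and `extract_groups`, the `min_size` cutoff, and the rule for keeping a parent group are all hypothetical choices, not the authors' algorithm.

```python
from collections import defaultdict


def connected_components(nodes, edges, threshold):
    """Components of `nodes` using only edges with weight >= threshold."""
    adj = defaultdict(set)
    for (u, v), weight in edges.items():
        if weight >= threshold and u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            x = stack.pop()
            component.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        components.append(component)
    return components


def extract_groups(nodes, edges, threshold=2, max_threshold=4, min_size=2):
    """Recursively split components by raising the edge-weight threshold.

    Returns a list of (threshold, group) pairs. A group is kept at the
    last threshold where it still holds together (assumed rule).
    """
    groups = []
    for component in connected_components(nodes, edges, threshold):
        if len(component) < min_size:
            continue  # too small to count as a group
        if threshold >= max_threshold:
            groups.append((threshold, component))
            continue
        finer = extract_groups(component, edges, threshold + 1,
                               max_threshold, min_size)
        # If a stricter threshold dissolves the component entirely,
        # keep the component at the current threshold instead.
        groups.extend(finer if finer else [(threshold, component)])
    return groups
```

This avoids committing to a single fixed edge threshold, which the slides note is difficult to choose; deciding which level of the hierarchy gives the correct grouping would still need the group statistics the slides mention.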
15
System Architecture
1. History-based algorithm on signup detection
EWMA-based change detection
Verification & prune
(ID, IP, time)
(ID, time, # of recipients)
2. Graph-based algorithm on login detection
Verification & prune
Random-graph-based clustering
Graph generation
(ID, IP, time)
Login graph
Login data
3. Parallel algorithm on DryadLINQ clusters
16
Parallel Implementation on DryadLINQ