Title: Worm Origin Identification Using Random Moonwalks
1Worm Origin Identification Using Random Moonwalks
- Yinglian Xie, V. Sekar, D. A. Maltz, M. K. Reiter, Hui Zhang
- 2005 IEEE Symposium on Security and Privacy
Presented by Anup Goyal and Edward Merchant
2Outline
- Motivation/Introduction
- Problem Formulation
- The Random Moonwalk Algorithm
- Evaluation Methodology
- Analytical Model
- Real Trace Study
- Simulation Study
- Deployment and Future Work
4Motivation
- Little automated support exists for identifying the location from which an attack is launched.
- Knowledge of the origin supports law enforcement.
- Knowledge of the causal flows that advance the attack supports diagnosis of how network defenses were breached.
5Introduction
- We craft an algorithm that determines the origin of epidemic spreading attacks.
- Identify the patient zero of the epidemic
- Reconstruct the sequence of spreading
6Introduction (contd)
- Random moonwalk algorithm
- Finds the origin and propagation paths of a worm attack.
- Performs post-mortem analysis on the traffic records logged by the network.
- Depends on the assumption that worm propagation occurs in a tree-like structure.
8Problem Formulation
9Problem Formulation (contd)
- A directed host contact graph G = (V, E)
- V = H × T
- H is the set of all hosts in the network
- T is time
- Each directed edge represents a network flow between two end hosts at a certain time.
- A flow has a finite duration and involves the transfer of one or more packets.
- e = (u, v, ts, te), where u initiates the flow, v receives it, and ts and te are its start and end times
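As a rough illustration of the edge tuple above, the following sketch stores each flow as a small record and indexes flows by destination host. The names `Flow` and `incoming_by_host`, and the use of string host names and float timestamps, are assumptions made for this example and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One directed edge e = (u, v, ts, te) of the host contact graph."""
    u: str     # host that initiates the flow
    v: str     # host that receives the flow
    ts: float  # start time of the flow
    te: float  # end time of the flow

    def __post_init__(self):
        # A flow has a finite, non-negative duration.
        if self.te < self.ts:
            raise ValueError("a flow cannot end before it starts")


def incoming_by_host(flows):
    """Group flows by destination host, each group sorted by end time."""
    index = {}
    for f in flows:
        index.setdefault(f.v, []).append(f)
    for host_flows in index.values():
        host_flows.sort(key=lambda f: f.te)
    return index


# Hypothetical three-flow trace.
flows = [
    Flow("A", "B", 0.0, 1.0),
    Flow("B", "C", 2.0, 3.0),
    Flow("A", "C", 0.5, 1.5),
]
index = incoming_by_host(flows)
```

Indexing by destination is convenient because a backward walk always asks which flows arrived at a given host.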
10Problem Formulation (contd)
- normal edge
- The flow does not carry an infectious payload.
- attack edge
- The flow carries attack traffic, whether or not the flow is successful.
- causal edge
- The flow that actually infects its destination.
- Goal - Identify a set of edges that are from the top level of the causal tree.
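To make "top level of the causal tree" concrete, here is a minimal sketch that assigns a level to each causal edge when the ground truth is known (as in a labelled test trace). The function name `causal_levels`, the `(u, v, t)` edge format, and the rule for skipping malformed edges are assumptions for illustration only.

```python
def causal_levels(causal_edges, origin):
    """Map each causal edge (u, v, t) to its level in the causal tree.

    Edges leaving the origin are level 1, edges leaving a host infected
    at level 1 are level 2, and so on.
    """
    host_level = {origin: 0}
    levels = {}
    # Process in time order so a host's infector is always seen first.
    for u, v, t in sorted(causal_edges, key=lambda e: e[2]):
        if u not in host_level or v in host_level:
            continue  # source not yet infected, or target already infected
        host_level[v] = host_level[u] + 1
        levels[(u, v, t)] = host_level[v]
    return levels


# Hypothetical causal tree rooted at host "A".
edges = [("A", "B", 1), ("A", "C", 2), ("B", "D", 3), ("D", "E", 5)]
levels = causal_levels(edges, "A")
top_level = [e for e, lvl in levels.items() if lvl == 1]
```

In a real investigation the labels are unknown, so a computation like this is only useful for scoring how close the returned edges are to the root.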
12Random Moonwalk Algo.
- Infers causal relationships between flows by exploiting the global structure of worm attacks
- No use of attack content, attack packet size, or port numbers
- For an attack to progress, there has to be a communication link between the source of the attack and the compromised nodes
- These infection-causing communication flows form a causal tree, rooted at the source of the attack.
- Find the tree; its root is the source of the attack
- Find causal flows and attack flows
13Random Moonwalk Algo.
- Basic Algorithm
- Start from a randomly chosen flow and walk backward in time for at most a certain distance.
- At each node, choose only among the flows that arrived within a certain time limit
- Repeat this many times (W walks)
- Find the edges with the highest frequency (the top Z)
- Create a tree from these flows
- Most probably this is the causal tree, and its root is the source of the attack
14Random Moonwalk Algo. (contd)
- Sampling process controlled by three parameters
- W - the number of walks (samples) performed.
- D - maximum length of the path traversed.
- Δt - sampling window size, the maximum time allowed between two consecutive edges
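The sampling process can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `random_moonwalk`, the `Flow` tuple, the `seed` argument, and the exact window test (a predecessor must end before the current flow starts, and no more than `dt` earlier) are assumptions. `W`, `D`, and `dt` stand for the number of walks, the maximum path length, and the sampling window; `Z` is the number of top-frequency edges returned.

```python
import random
from collections import Counter, namedtuple

Flow = namedtuple("Flow", "u v ts te")


def random_moonwalk(flows, W, D, dt, Z, seed=0):
    """Return the Z most frequently visited flows over W backward walks."""
    rng = random.Random(seed)
    incoming = {}
    for f in flows:
        incoming.setdefault(f.v, []).append(f)

    counts = Counter()
    for _ in range(W):
        edge = rng.choice(flows)          # start from a random flow
        for _ in range(D):                # at most D edges per walk
            counts[edge] += 1
            # Flows that arrived at the current source host within dt
            # before the current flow started.
            candidates = [p for p in incoming.get(edge.u, [])
                          if edge.ts - dt <= p.te < edge.ts]
            if not candidates:
                break                     # no predecessor: walk ends
            edge = rng.choice(candidates)
    return counts.most_common(Z)


# Hypothetical trace: a small worm tree rooted at A plus client flows.
trace = [
    Flow("A", "B", 0.0, 1.0), Flow("B", "C", 2.0, 3.0),
    Flow("B", "D", 2.5, 3.5), Flow("C", "E", 4.0, 5.0),
    Flow("D", "F", 4.5, 5.5),
] + [Flow(f"client{i}", f"server{i}", 1.0, 2.0) for i in range(5)]

top = random_moonwalk(trace, W=1000, D=10, dt=2.0, Z=3)
```

In this toy trace every walk that touches a worm flow ends at the flow from A to B, so that flow collects the most visits, while the client flows are only counted when a walk happens to start on them.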
15Random Moonwalk Algo. (contd)
- Why does this algorithm work?
- To propagate, some time after infection, the worm creates new flows to other hosts.
- This forms a chain of flows from the source to the last victim
- Traverse this chain backward to find the source
- An infected host generally originates more flows than it receives.
- Most originators in the host contact graph are clients, so most normal edges have no predecessor within Δt.
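The last point can be checked on a trace with a small diagnostic like the one below, which measures the fraction of flows that have no predecessor inside the sampling window. The function name `fraction_without_predecessor`, the `(u, v, ts, te)` tuple layout, and the window test are assumptions for this sketch.

```python
def fraction_without_predecessor(flows, dt):
    """Fraction of flows (u, v, ts, te) whose source host received no
    flow ending within dt before the flow started."""
    if not flows:
        return 0.0
    arrivals = {}
    for u, v, ts, te in flows:
        arrivals.setdefault(v, []).append(te)

    no_pred = 0
    for u, v, ts, te in flows:
        if not any(ts - dt <= t < ts for t in arrivals.get(u, [])):
            no_pred += 1
    return no_pred / len(flows)


# Hypothetical trace: three client flows and a two-hop worm chain.
flows = [
    ("c1", "s", 0, 1), ("c2", "s", 0, 1), ("c3", "s", 5, 6),
    ("A", "B", 0, 1), ("B", "C", 2, 3),
]
frac = fraction_without_predecessor(flows, dt=2)
```

Here only the flow from B to C has a predecessor (the flow that reached B), so a backward walk that starts on a client flow stops immediately, while a walk that starts on the worm chain keeps moving toward A.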
17Outline
- Evaluation Methodology
- Analytical Model
- Assumptions
- Edge Probability Distribution
- False Positives and False Negatives
- Parameter Selection
- Real Trace Study
- Simulation Study
18Analytical Model (Assumptions)
- The host contact graph is known.
- |E| edges and |H| hosts
- Discretize time into units. Every flow has a
length of one unit and fits into one unit.
19Analytical Model (Probability)
20Analytical Model (FP & FN)
(42 malicious edges at k = 1.)
(Total 10^5 hosts.)
21Outline
- Evaluation Methodology
- Analytical Model
- Real Trace Study
- Detect the Existence of an Attack
- Identify Causal Edges & Initial Infected Hosts
- Reconstruct the Top Level Causal Tree
- Parameter Selection
- Performance
- Simulation Study
22Real Trace Study
- Background Traffic
- Traffic trace was collected over a 4-hour period at the backbone of a class-B university network.
- Collected intra-campus flows only (1.4 million), involving 8040 hosts
- Addition
- Add flow records to represent worm-like traffic with varying scanning rates
- Randomly select the vulnerable hosts.
23Real Trace Study (Existence)
24Real Trace Study (Identify)
(800 causal edges from 1.5 × 10^6 flows)
(The scanning rate of Trace-50 is lower than that of Trace-10.)
25Real Trace Study (Identify)
- Top-frequency sampled edges vs. actual initial edges
(800 causal edges in total; the initial 10% are the first 80 edges)
(The scanning rate of Trace-50 is lower than that of Trace-10.)
26Top 60, Trace-50, 10^4 walks
(Figure annotations: Original Attacker; Blaster Worm scan)
27Real Trace Study (Parameter)
28Real Trace Study (Performance)
- Random moonwalk
- Z = 100, 10^4 walks
- Heavy-hitter
- Find the 800 hosts with the largest number of flows in the trace, then randomly pick 100 flows
- Super-spreader
- Find the 800 hosts that contacted the largest number of destinations, then randomly pick 100 flows
- Oracle
- With zero false positive rate, randomly select 100 flows between infected hosts
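The two heuristic baselines can be sketched as below. The slide does not spell out how flows are drawn once the hosts are chosen; this sketch assumes flows are counted by their source host and that the random pick is made among flows whose two endpoints are both selected hosts. All function names and the `(source, destination)` flow format are hypothetical.

```python
import random
from collections import Counter, defaultdict


def heavy_hitters(flows, n_hosts):
    """Hosts that originate the largest number of flows."""
    counts = Counter(u for u, v in flows)
    return {host for host, _ in counts.most_common(n_hosts)}


def super_spreaders(flows, n_hosts):
    """Hosts that contact the largest number of distinct destinations."""
    dests = defaultdict(set)
    for u, v in flows:
        dests[u].add(v)
    ranked = sorted(dests, key=lambda h: len(dests[h]), reverse=True)
    return set(ranked[:n_hosts])


def sample_flows_between(flows, hosts, n_flows, seed=0):
    """Randomly pick up to n_flows flows with both endpoints in hosts."""
    rng = random.Random(seed)
    pool = [f for f in flows if f[0] in hosts and f[1] in hosts]
    return rng.sample(pool, min(n_flows, len(pool)))


# Hypothetical trace: A sends many flows to one host, C fans out widely.
flows = ([("A", "B")] * 5
         + [("C", "D"), ("C", "E"), ("C", "F")]
         + [("B", "A")] * 2)
hh = heavy_hitters(flows, 1)
ss = super_spreaders(flows, 1)
picked = sample_flows_between(flows, {"A", "B"}, 3)
```

The toy trace shows why the two baselines can disagree: A is the heaviest sender but talks to a single host, whereas C sends fewer flows but reaches more destinations.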
29Real Trace Study (Performance)
30Real Trace Study (Performance)
- Scanning Method
- Smart worm (always scan valid hosts), R?
- Scan with random addresses
(Legend: C = causal edges, A = attack edges; 100 and 500 denote Z = 100 and Z = 500.)
31Outline
- Evaluation Methodology
- Analytical Model
- Real Trace Study
- Simulation Study
32Simulation Study
- Simulate different background traffic
- Realistic host contact graphs tend to be much
sparser, meaning the chance of communication
between two arbitrary hosts is very low.
P.S. In the campus network, the accuracy is about 0.7.
34Deployment and Future Work
- This approach assumes the availability of complete data.
- The impact of missing data on performance
- The deployment of the algorithm
35Questions?