Mining Anomalies Using Traffic Feature Distributions

1
Mining Anomalies Using Traffic Feature
Distributions

Anukool Lakhina, Mark Crovella, Christophe Diot
ACM SIGCOMM, August 2005
Presented by Abhinay Kampasi
Based on the presentation on the authors' website

2
Motivation for Anomaly Detection

Is my customer being attacked?
Is someone probing my network?
Are there worms spreading?
A sudden traffic surge?
An equipment outage?
Something never seen before?

Anomalies present in network traffic are buried
like needles in a haystack!
3
Previous Work

Volume-based anomaly detection
Largely focused on:
Point solutions (not a general approach)
Rule-based classification (not unsupervised)
Data from single links (not network-wide)

A general, unsupervised method for reliably
detecting and classifying network anomalies is
needed
4
Feature Distributions

Anomalies can be detected and distinguished by
inspecting traffic features: SrcIP, SrcPort,
DstIP, DstPort

5
Feature Distribution Changes induced by Port Scan
Anomaly
6
Entropy

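Metric that captures the degree of dispersal or
concentration of a distribution
$H(X) = -\sum_{i=1}^{N} \frac{n_i}{S} \log_2 \frac{n_i}{S}$
where symbol $i$ occurs $n_i$ times in the sample and
$S = \sum_{i=1}^{N} n_i$ is the total number of observations
Value lies between 0 and $\log_2 N$
0 when the distribution is maximally concentrated
(all observations the same)
$\log_2 N$ when the distribution is maximally dispersed
(all observations distinct)
Entropy value is normalized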
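
A minimal Python sketch of the sample entropy described on this slide. The function name `sample_entropy` is hypothetical, and dividing by log2 of the number of distinct observed values is an assumption about how the normalization is done; the slide only states that the value is normalized.

```python
import math
from collections import Counter


def sample_entropy(observations):
    """Return (entropy_in_bits, normalized_entropy) for a list of feature values."""
    counts = Counter(observations)      # n_i for every distinct symbol i
    total = sum(counts.values())        # S, the total number of observations
    if total == 0:
        raise ValueError("need at least one observation")
    h = -sum((n / total) * math.log2(n / total) for n in counts.values())
    n_distinct = len(counts)            # N, the number of distinct symbols
    # Assumption: normalize by log2(N) so the result lies in [0, 1].
    h_norm = h / math.log2(n_distinct) if n_distinct > 1 else 0.0
    return h, h_norm
```

For a port scan, the destination ports seen from the scanner are nearly all distinct, so the DstPort entropy approaches its maximum, while the DstIP entropy drops toward 0 because a single target dominates.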

7
Applying Entropy to Port Scan Data
8
Methodology

Detect
Use multiway subspace method
Augments volume metrics, highly sensitive
Identify anomalies on multiple features and flows
Classify
Use clustering on anomaly features
Can do unsupervised classification

9
Network-Wide Traffic Data Collected

Collected 3 weeks of sampled NetFlow data in
5-minute bins from two backbone networks
Compute entropy on packet histograms for 4
traffic features: SrcIP, SrcPort, DstIP, DstPort
Two sources of bias: sampling and anonymization
of IP addresses

10
Multiway Subspace Method

Based on subspace method and principal component
analysis
Every point in subspace has normal and residual
components
H(t,p,k) denotes the entropy value at time t for
flow p, of traffic feature k

Unwrap the multiway matrix into one matrix
Apply subspace method on merged matrix
Detect anomalies by monitoring size of
residual vector for unusually high values
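
The steps on this slide can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, each feature block is scaled to unit energy so that no feature dominates, the number of principal components treated as "normal" is a free parameter, and an empirical quantile stands in for whatever statistical threshold is applied to the residual.

```python
import numpy as np


def unfold(H):
    """Unwrap a (time, flows, features) entropy array into a (time, flows*features) matrix."""
    blocks = []
    for k in range(H.shape[2]):
        block = H[:, :, k]                      # time x flows matrix for feature k
        energy = np.linalg.norm(block)          # Frobenius norm of the block
        # Assumption: scale each feature block to unit energy.
        blocks.append(block / energy if energy > 0 else block)
    return np.hstack(blocks)


def residual_spe(X, n_normal):
    """Squared size of the residual vector for every time bin."""
    Xc = X - X.mean(axis=0)                     # center each column
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    P = vt[:n_normal].T                         # columns span the normal subspace
    residual = Xc - Xc @ P @ P.T                # part not explained by the normal subspace
    return np.sum(residual ** 2, axis=1)


def detect_anomalies(H, n_normal=2, quantile=0.99):
    """Return (indices of flagged time bins, residual size per time bin)."""
    spe = residual_spe(unfold(H), n_normal)
    threshold = np.quantile(spe, quantile)      # assumption: empirical threshold
    return np.flatnonzero(spe > threshold), spe
```

Time bins whose residual size exceeds the threshold are reported as anomalous; the flows and features contributing most to the residual can then be inspected.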

11
How does entropy compare with volume-based
detection?

Does entropy allow detection of a larger set of
anomalies?
Are anomalies detected by entropy fundamentally
different from volume-based methods?
How precise is the entropy-based detection?

12
Comparison
Points that lie to the right of the vertical line
are volume-detected anomalies and points that lie
above the horizontal line are detected in entropy.
13
Manual Inspection
14
Detection Rate by Injecting Real Anomalies

Evaluation Methodology
Superimpose known anomaly traces into OD flows
Test sensitivity at varying anomaly intensities,
by thinning trace
Results are averaged over a sequence of experiments
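
A small Python sketch of the injection idea, assuming that thinning means keeping 1 out of every N packets of the anomaly trace and that a trace can be represented as a plain list of feature values (for example, destination ports). The names `thin` and `inject` are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Sample entropy (in bits) of a list of feature values."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def thin(packets, factor):
    """Keep 1 out of every `factor` packets to lower the anomaly intensity."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return packets[::factor]


def inject(background, anomaly, factor):
    """Superimpose a thinned anomaly trace onto background traffic."""
    return list(background) + thin(list(anomaly), factor)
```

Computing the entropy of `inject(background, anomaly, factor)` for increasing thinning factors shows how quickly the anomaly's effect on the feature distribution fades as its intensity is reduced.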

15
Classifying Anomalies by Clustering

Use unsupervised classification
Each anomaly is a point in 4-D space:
[H(SrcIP), H(SrcPort), H(DstIP), H(DstPort)]
Use Hierarchical Agglomerative Algorithm for
determining clusters
Minimizes intra-cluster variation and maximizes
inter-cluster variation
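
A minimal pure-Python sketch of hierarchical agglomerative clustering on 4-D anomaly points. The slides do not say which linkage rule is used, so merging the two clusters with the closest centroids is an assumption chosen for brevity; the function names are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))


def agglomerative(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be >= 1")
    clusters = [[p] for p in points]            # start with one cluster per anomaly
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge cluster j into cluster i
        del clusters[j]
    return clusters
```

Each input point would hold the four entropy values of one detected anomaly, so anomalies that shift the feature distributions in similar ways end up in the same cluster.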

16
Clustering Known Anomalies (2-D view)
Legend: Code Red scanning, single-source DoS
attack, multi-source DoS attack
17
3-D view of Abilene anomaly clusters

Used 2 different clustering algorithms
Results consistent
Heuristics identify about 10 clusters in dataset

18
Anomaly Clusters in Abilene data
19
Summary

Feature distributions as summarized by entropy
are promising for general anomaly diagnosis
Network-Wide Detection
Entropy significantly augments volume metrics
Highly sensitive: detection rates of 90%
possible, even when the anomaly is 1% of background
traffic
Anomaly Classification
Clusters are meaningful, and reveal new anomalies

20
Points to Ponder

The paper only discusses anomaly detection on
offline data. Can it be enhanced for online
anomaly detection?
We still need volume based detection because
feature distribution does not identify all
anomalies.
Can other fields in packet header be used for
anomaly detection?

21
Profiling Internet Backbone Traffic: Behavior
Models and Applications

Kuai Xu, Zhi-Li Zhang, Supratik Bhattacharyya
ACM SIGCOMM, August 2005
Presented by Abhinay Kampasi
Based on the presentation on the authors' website

22
Why profile traffic?

Changes in Internet traffic dynamics
increase in unwanted traffic
emergence of disruptive applications
new services on traditional ports
traditional service on non-standard ports
Existing tools
rely on ports for identifying or classifying
traffic
report volume-based heavy hitters
look for specific or known patterns
Need better techniques to discover behavior
patterns
help network operators secure and manage networks

23
Communication patterns

Underlying communication patterns of end hosts
Who are they talking to? How are ports used?
How many packets or bytes transferred?
Can communication patterns reveal interesting
behavior?

24
Methodology

Data pre-processing
aggregate packet streams into 5-tuple flows
group flows into clusters
Extract significant clusters
data reduction step using entropy
Classify cluster behavior based on
similarity/dissimilarity of communication
patterns
characterize using information theory
clusters classified into behavior classes
Interpret behavior classes
structural modeling for dominant activities

25
Data Preprocessing

Aggregate packet streams into 5-tuple flows
Group flows associated with same end hosts/ports
into clusters
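
The two preprocessing steps can be sketched in Python as follows. The `Packet` record, its field names, and the function names are hypothetical; the sketch assumes a cluster is the set of flows that share one value along a chosen dimension (source IP, destination IP, source port, or destination port).

```python
from collections import defaultdict, namedtuple

# Hypothetical packet record used only for this illustration.
Packet = namedtuple("Packet", "src_ip dst_ip src_port dst_port proto size")

DIMENSIONS = ("src_ip", "dst_ip", "src_port", "dst_port")


def aggregate_flows(packets):
    """Aggregate packets into 5-tuple flows with packet and byte counts."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.proto)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += p.size
    return dict(flows)


def group_clusters(flows, dimension):
    """Group flow keys that share the same value along one dimension."""
    idx = DIMENSIONS.index(dimension)
    clusters = defaultdict(list)
    for key in flows:
        clusters[key[idx]].append(key)
    return dict(clusters)
```

For example, `group_clusters(flows, "dst_ip")` collects all flows sent to the same destination host, and the remaining three dimensions of those flows describe how that host is being contacted.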

26
Extract Significant Clusters

Focus on significant clusters
Sufficiently large number of flows
Represent behavior of significant interest
Adaptive thresholding using entropy
A cluster is significant if it stands out from
the rest
Use entropy to quantify whether the rest looks
random
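
One way to turn this idea into a procedure is sketched below in Python. It is a hedged illustration rather than the paper's exact algorithm: the starting threshold `alpha0`, the stopping level `beta`, and the halving schedule are assumptions, as are the function names. Values whose share of the traffic is at least the current threshold are extracted, and the loop stops once the remaining values look close to uniformly random.

```python
import math
from collections import Counter


def relative_uncertainty(counts):
    """Entropy of the counts divided by the maximum possible entropy."""
    total = sum(counts)
    h_max = math.log2(min(total, len(counts))) if counts else 0.0
    if h_max == 0:
        return 0.0
    h = -sum((c / total) * math.log2(c / total) for c in counts)
    return h / h_max


def extract_significant(values, alpha0=0.02, beta=0.9):
    """Return the set of feature values that stand out from the rest."""
    counts = Counter(values)
    total = sum(counts.values())
    rest = dict(counts)                 # values not yet marked significant
    significant = set()
    alpha = alpha0
    # Stop once the rest looks random, or once fewer than two values remain.
    while len(rest) > 1 and relative_uncertainty(list(rest.values())) <= beta:
        picked = {v for v, n in rest.items() if n / total >= alpha}
        significant |= picked
        for v in picked:
            del rest[v]
        alpha /= 2                      # assumption: lower the threshold by half
    return significant
```

Because the threshold adapts to the data, a feature dimension with a few heavy values yields a handful of significant clusters, while a nearly uniform dimension yields none.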

27
Entropy based adaptive thresholding
28
Sample Results
Though the total number of distinct values along
a given dimension may not fluctuate very much,
the number of significant feature values
(clusters) may vary dramatically, due to changes
in the underlying feature value distributions.
29
Relative Uncertainty

Entropy: $H(X) = -\sum_{i} p(x_i) \log p(x_i)$
Maximum entropy: $H_{max}(X) = \log \min(m, N)$
Relative uncertainty of variable $X$:
$RU(X) = H(X) / H_{max}(X)$, $RU(X) \in [0, 1]$
$RU(X) = 0$: $X$ is deterministic
$RU(X) = 1$: $X$ is randomly distributed
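
A short Python sketch of relative uncertainty. It assumes that m is the number of observations and N the number of possible values of X; when `n_possible` is not given, the number of distinct observed values is used instead, which is an assumption of this sketch. The function name is hypothetical.

```python
import math
from collections import Counter


def relative_uncertainty(observations, n_possible=None):
    """RU(X) = H(X) / Hmax(X), with Hmax(X) = log2(min(m, N))."""
    counts = Counter(observations)
    m = sum(counts.values())                    # sample size
    if m == 0:
        raise ValueError("need at least one observation")
    n = n_possible if n_possible is not None else len(counts)
    h = -sum((c / m) * math.log2(c / m) for c in counts.values())
    h_max = math.log2(min(m, n))
    if h_max == 0:
        return 0.0                              # a single possible outcome is deterministic
    return min(1.0, max(0.0, h / h_max))        # clamp away floating-point drift
```

A cluster whose destination port is always 80 gives an RU of 0 on that dimension, while a cluster that touches every destination port equally often gives an RU of 1.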

30
Behavior Characterization
31
Behavior Classes
Summarize three feature distributions into 27
classes, [0, 0, 0] through [2, 2, 2]; for convenience,
BC0 to BC26
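
The labeling can be sketched in Python as below. The cut-off `eps = 0.2` that separates low, medium, and high relative uncertainty is an assumption, and the base-3 numbering is one natural encoding that maps [0, 0, 0] to BC0 and [2, 2, 2] to BC26; the function names are hypothetical.

```python
def ru_level(ru, eps=0.2):
    """Map a relative uncertainty in [0, 1] to 0 (low), 1 (medium) or 2 (high)."""
    if ru <= eps:
        return 0
    if ru >= 1 - eps:
        return 2
    return 1


def behavior_class(ru_x, ru_y, ru_z, eps=0.2):
    """Return the three levels and the behavior class id (0..26)."""
    levels = (ru_level(ru_x, eps), ru_level(ru_y, eps), ru_level(ru_z, eps))
    bc_id = levels[0] * 9 + levels[1] * 3 + levels[2]   # base-3 encoding
    return levels, bc_id
```

For a source-IP cluster, the three inputs would be the relative uncertainties of its three free dimensions, so hosts that scatter traffic in similar ways receive the same class label.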
32
Summary of behavior classes

Behavior classes classify clusters based on
communication patterns
Behavior classes have distinct temporal
properties
Popularity
Average Size
Membership Volatility
Clusters within the same behavior class have
similar structural models
Clusters have stable behavior over time

33
Dominant State Analysis

Each cluster has hundreds or thousands of flows.
An exhaustive approach is not practical
Need a compact summary
Dominant state analysis
Identify the dependency among the free dimensions
of a cluster
Dominant states of a cluster are subsets of
values that approximate the original data
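
A simplified Python illustration of the idea, not the procedure from the paper. It assumes the caller has already ordered the free dimensions (for example, from lowest to highest relative uncertainty), and it treats a value as dominant when its conditional share is at least `delta`; the threshold, the wildcard marker `"*"`, and the function name are all hypothetical.

```python
from collections import Counter


def dominant_states(flows, delta=0.2):
    """Summarize flows (tuples over the free dimensions) as (state, coverage) pairs."""
    if not flows:
        return []
    n_dims = len(flows[0])
    states = []

    def expand(prefix, subset, dim):
        if dim == n_dims:
            states.append((prefix, len(subset) / len(flows)))
            return
        counts = Counter(flow[dim] for flow in subset)
        dominant = [v for v, c in counts.items() if c / len(subset) >= delta]
        if not dominant:
            # No value dominates this dimension: record a wildcard and move on.
            expand(prefix + ("*",), subset, dim + 1)
            return
        for value in dominant:
            matching = [flow for flow in subset if flow[dim] == value]
            expand(prefix + (value,), matching, dim + 1)

    expand((), flows, 0)
    return states
```

For a web server's source-IP cluster with dimensions (SrcPort, DstPort, DstIP), the result would collapse thousands of flows into a single state such as `(80, "*", "*")` together with the fraction of flows it covers.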

34
General procedure for Dominant State Analysis
35
Applications of Profiles
36
Anomalous Behaviors

Clusters in rare behavior classes
Identified a web server under DDoS attack
Behavioral changes for clusters
Yahoo web server example
Unusual profiles for popular service ports

37
Conclusion

Developed a systematic methodology to
automatically discover and interpret
communication patterns
Used information-theoretical techniques to build
behavior models of end hosts and applications
Applied dominant state analysis to explain
traffic behavior
Identified typical behavior profiles as well as
rare and deviant behaviors

38
Future Work