Mining Anomalies Using Traffic Feature Distributions - PowerPoint PPT Presentation

1
Mining Anomalies Using Traffic Feature
Distributions
  • Anukool Lakhina, Mark Crovella, Christophe Diot
  • ACM SIGCOMM, August 2005
  • Presented by Abhinay Kampasi
  • Based on the presentation on the authors' website

2
Motivation for Anomaly Detection
  • Is my customer being attacked?
  • Is someone probing my network?
  • Are there worms spreading?
  • A sudden traffic surge?
  • An equipment outage?
  • Something never seen before?

Anomalies present in network traffic are buried
like needles in a haystack!
3
Previous Work
  • Volume-based anomaly detection, largely focused on:
      • Point solutions: not a general approach
      • Rule-based classification: not unsupervised
      • Data from single links: not network-wide

A general, unsupervised method for reliably
detecting and classifying network anomalies is
needed
4
Feature Distributions
  • Anomalies can be detected and distinguished by
    inspecting the distributions of the traffic
    features SrcIP, SrcPort, DstIP, and DstPort

5
Feature Distribution Changes Induced by a Port Scan Anomaly
6
Entropy
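  • Metric that captures the degree of dispersal or
    concentration of a distribution
  • Sample entropy: $H(X) = -\sum_{i=1}^{N} \frac{n_i}{S} \log_2 \frac{n_i}{S}$
      • where symbol i occurs $n_i$ times in the sample
      • S is the total number of observations
  • Value lies between 0 and $\log_2 N$, where N is the
    number of distinct symbols
      • 0 when the distribution is maximally concentrated
        (all observations the same)
      • $\log_2 N$ when the distribution is maximally
        dispersed (all observations distinct)
  • Entropy value is normalized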
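As a minimal sketch of the entropy definition above (not the authors' code), the function below computes the sample entropy of a list of observed feature values, for example the destination ports seen in one time bin. The function name is our own.

```python
import math
from collections import Counter


def sample_entropy(observations):
    """Return H(X) = -sum_i (n_i / S) * log2(n_i / S) for a sequence of symbols."""
    counts = Counter(observations)      # n_i for each distinct symbol i
    total = sum(counts.values())        # S, the total number of observations
    if total == 0:
        return 0.0
    h = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return h + 0.0                      # turn a possible -0.0 into 0.0


# Maximally concentrated: every packet goes to port 80, so H = 0.
print(sample_entropy([80, 80, 80, 80]))   # prints 0.0
# Maximally dispersed (port-scan-like): 4 distinct ports, so H = log2(4) = 2.
print(sample_entropy([21, 22, 23, 25]))   # prints 2.0
```

A port scan pushes the DstPort entropy toward the upper bound, while traffic dominated by one service pulls it toward 0.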

7
Applying Entropy to Port Scan Data
8
Methodology
  • Detect
      • Use the multiway subspace method
      • Augments volume metrics; highly sensitive
      • Identifies anomalies across multiple features
        and flows
  • Classify
      • Use clustering on anomaly features
      • Enables unsupervised classification

9
Network-Wide Traffic Data Collected
  • Collected 3 weeks of sampled NetFlow data in
    5-minute bins from two backbone networks
  • Compute entropy on packet histograms for 4
    traffic features: SrcIP, SrcPort, DstIP, DstPort
  • Two sources of bias: sampling and anonymization
    of IP addresses
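As a rough sketch of this preprocessing (not the authors' pipeline), the code below bins flow records into 5-minute intervals and computes the entropy of packet-weighted histograms for each of the four features. The record layout (dict keys `ts`, `packets`, `src_ip`, `src_port`, `dst_ip`, `dst_port`) and the function names are hypothetical; real input would be sampled NetFlow records, and the sampling and anonymization biases are ignored here.

```python
import math
from collections import Counter, defaultdict

BIN_SECONDS = 300  # 5-minute bins, as on the slide
FEATURES = ("src_ip", "src_port", "dst_ip", "dst_port")  # hypothetical record keys


def entropy(hist):
    """Sample entropy (base 2) of a histogram mapping value -> packet count."""
    total = sum(hist.values())
    return -sum((c / total) * math.log2(c / total) for c in hist.values() if c) + 0.0


def binned_feature_entropy(records):
    """Return {bin_index: {feature: entropy}} from packet-weighted histograms.

    records: iterable of dicts with 'ts' (seconds), 'packets', and the four feature keys.
    """
    hists = defaultdict(lambda: {f: Counter() for f in FEATURES})
    for r in records:
        b = int(r["ts"] // BIN_SECONDS)          # which 5-minute bin the record falls in
        for f in FEATURES:
            hists[b][f][r[f]] += r["packets"]    # weight each value by its packet count
    return {b: {f: entropy(h) for f, h in fh.items()} for b, fh in hists.items()}


records = [
    {"ts": 10, "packets": 5, "src_ip": "a", "src_port": 1000, "dst_ip": "x", "dst_port": 80},
    {"ts": 20, "packets": 5, "src_ip": "b", "src_port": 1001, "dst_ip": "x", "dst_port": 80},
    {"ts": 310, "packets": 3, "src_ip": "a", "src_port": 1000, "dst_ip": "y", "dst_port": 22},
]
out = binned_feature_entropy(records)
print(out[0])
```

Each time bin then yields one entropy value per feature, which is the raw material for the per-flow, per-feature matrix used by the detection step.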

10
Multiway Subspace Method
  • Based on subspace method and principal component
    analysis
  • Every traffic measurement is decomposed into a
    normal component and a residual component
  • H(t,p,k) denotes the entropy value at time t for
    OD flow p and traffic feature k
  • Unwrap the multiway matrix into a single matrix
  • Apply the subspace method to the merged matrix
  • Detect anomalies by monitoring the size of the
    residual vector for unusually large values
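A minimal NumPy sketch of the multiway subspace idea, under our own simplifying assumptions: the t × p × k tensor H(t,p,k) is unwrapped by placing the k per-feature t × p blocks side by side, each block is scaled to unit norm (an assumption), PCA is done via SVD, the top `n_normal` principal axes span the normal subspace, and each time bin is scored by the squared norm of its residual. Function names and `n_normal` are hypothetical; choosing the normal-subspace dimension and a statistical threshold on the residual (the subspace-method literature uses the Q-statistic) are left out.

```python
import numpy as np


def unwrap(H):
    """Unwrap a (t, p, k) entropy tensor into a t x (k*p) matrix.

    Each feature's t x p block is placed side by side; blocks are scaled to unit
    Frobenius norm (an assumption here) so no single feature dominates the PCA.
    """
    blocks = []
    for f in range(H.shape[2]):
        block = H[:, :, f]
        norm = np.linalg.norm(block)
        blocks.append(block / norm if norm > 0 else block)
    return np.hstack(blocks)


def residual_magnitude(X, n_normal):
    """Squared norm of each row's residual after removing the top n_normal principal axes."""
    Xc = X - X.mean(axis=0)                           # center each column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_normal].T                               # orthonormal basis of the normal subspace
    residual = Xc - Xc @ P @ P.T                      # component in the residual subspace
    return np.sum(residual ** 2, axis=1)


# Toy data: 200 time bins, 5 OD flows, 4 features sharing one daily-like trend.
rng = np.random.default_rng(0)
t, p, k = 200, 5, 4
trend = np.sin(np.linspace(0, 20, t))[:, None, None]
H = 1.0 + 0.5 * trend * np.ones((t, p, k)) + 0.01 * rng.standard_normal((t, p, k))
H[100, 2, 1] += 1.0                                   # inject an anomaly in one flow/feature
spe = residual_magnitude(unwrap(H), n_normal=1)
print(int(np.argmax(spe)))                            # bin with the largest residual
```

In the toy data every column shares one sinusoidal trend, so a single principal axis captures the normal behavior and the injected spike dominates the residual at time bin 100.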

11
How does entropy compare with volume-based
detection?
  • Does entropy allow detection of a larger set of
    anomalies?
  • Are anomalies detected by entropy fundamentally
    different from volume-based methods?
  • How precise is the entropy-based detection?

12
Comparison
Points that lie to the right of the vertical line
are anomalies detected by volume, and points that
lie above the horizontal line are anomalies detected
by entropy.
13
Manual Inspection
14
Detection Rate by Injecting Real Anomalies
  • Evaluation Methodology
  • Superimpose known anomaly traces onto OD flows
  • Test sensitivity at varying anomaly intensities
    by thinning the trace
  • Results are averaged over a sequence of experiments
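The injection experiment can be sketched as follows. Thinning is modeled here as keeping each packet of the known anomaly trace independently with probability `keep_fraction`, which is an assumption (the paper's exact thinning procedure may differ); the thinned packets are then added to a background histogram for one feature. Function names, dict keys, and the synthetic scan are hypothetical.

```python
import random
from collections import Counter


def thin_trace(packets, keep_fraction, rng):
    """Keep each packet of an anomaly trace independently with probability keep_fraction."""
    return [pkt for pkt in packets if rng.random() < keep_fraction]


def superimpose(background_hist, anomaly_packets, feature):
    """Return a new histogram: background counts plus the anomaly packets for one feature."""
    merged = Counter(background_hist)                 # copy, so the background is not mutated
    merged.update(pkt[feature] for pkt in anomaly_packets)
    return merged


rng = random.Random(42)
scan = [{"dst_port": port} for port in range(1, 1001)]   # a synthetic port scan
background = Counter({80: 5000, 443: 3000})
for frac in (1.0, 0.1, 0.01):                              # decreasing anomaly intensity
    merged = superimpose(background, thin_trace(scan, frac, rng), "dst_port")
    print(frac, sum(merged.values()))
```

Repeating this with different random seeds and averaging the detection outcome mirrors the "average over a sequence of experiments" on the slide.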

15
Classifying Anomalies by Clustering
  • Use unsupervised classification
  • Each anomaly is a point in 4-D entropy space:
    (H(SrcIP), H(SrcPort), H(DstIP), H(DstPort))
  • Use a hierarchical agglomerative algorithm to
    determine clusters
  • Minimizes intra-cluster variation and maximizes
    inter-cluster variation
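To make the clustering step concrete, the sketch below is a plain-Python hierarchical agglomerative clustering with Ward's criterion: each merge adds the least within-cluster sum of squares, which matches the goal of small intra-cluster and large inter-cluster variation. Ward linkage, the function names, and the toy 4-D points are our assumptions, not necessarily the paper's choices; for real data a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would be the practical choice.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    ca, cb = centroid(a), centroid(b)
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2


def agglomerative(points, n_clusters):
    """Bottom-up clustering: start from singletons and repeatedly merge the cheapest pair."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([points[m] for m in clusters[i]],
                                 [points[m] for m in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        merged = clusters.pop(j)          # j > i, so index i is unaffected
        clusters[i].extend(merged)
    return clusters


# Toy anomalies in (H(SrcIP), H(SrcPort), H(DstIP), H(DstPort)) space.
pts = [(0.1, 0.1, 2.0, 2.0), (0.2, 0.1, 2.1, 1.9), (0.1, 0.2, 1.9, 2.0),
       (3.0, 3.1, 0.1, 0.2), (3.1, 3.0, 0.2, 0.1), (2.9, 3.0, 0.1, 0.1)]
print(agglomerative(pts, 2))
```

The naive pairwise search is cubic in the number of anomalies, which is fine for a few hundred points but not for large datasets.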

16
Clustering Known Anomalies (2-D view)
Legend: Code Red scanning, single-source DoS attack,
multi-source DoS attack
17
3-D view of Abilene anomaly clusters
  • Used two different clustering algorithms
  • Results are consistent
  • Heuristics identify about 10 clusters in the dataset

18
Anomaly Clusters in Abilene data
19
Summary
  • Feature distributions, as summarized by entropy,
    are promising for general anomaly diagnosis
  • Network-wide detection
      • Entropy significantly augments volume metrics
      • Highly sensitive: detection rates of 90% are
        possible, even when the anomaly is 1% of
        background traffic
  • Anomaly classification
      • Clusters are meaningful and reveal new anomalies

20
Points to Ponder
  • The paper only discusses anomaly detection on
    offline data. Can it be extended to online
    anomaly detection?
  • We still need volume-based detection because
    feature distributions do not identify all
    anomalies.
  • Can other fields in the packet header be used for
    anomaly detection?

21
Profiling Internet Backbone Traffic: Behavior
Models and Applications
  • Kuai Xu, Zhi-Li Zhang, Supratik Bhattacharyya
  • ACM SIGCOMM, August 2005
  • Presented by Abhinay Kampasi
  • Based on the presentation on the authors' website

22
Why profile traffic?
  • Changes in Internet traffic dynamics
      • increase in unwanted traffic
      • emergence of disruptive applications
      • new services on traditional ports
      • traditional services on non-standard ports
  • Existing tools
      • rely on ports for identifying or classifying
        traffic
      • report volume-based heavy hitters
      • look for specific or known patterns
  • Need better techniques to discover behavior
    patterns
      • to help network operators secure and manage
        networks

23
Communication patterns
  • Underlying communication patterns of end hosts
  • Who are they talking to? How are ports used?
  • How many packets or bytes transferred?
  • Can communication patterns reveal interesting
    behavior?

24
Methodology
  • Data pre-processing
      • aggregate packet streams into 5-tuple flows
      • group flows into clusters
  • Extract significant clusters
      • data reduction step using entropy
  • Classify cluster behavior based on
    similarity/dissimilarity of communication
    patterns
      • characterize using information theory
      • clusters classified into behavior classes
  • Interpret behavior classes
      • structural modeling for dominant activities

25
Data Preprocessing
  • Aggregate packet streams into 5-tuple flows
  • Group flows associated with same end hosts/ports
    into clusters
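A minimal sketch of these two preprocessing steps, assuming packets arrive as dicts with hypothetical keys `src_ip`, `dst_ip`, `src_port`, `dst_port`, and `proto`. Flow timeouts are ignored, so every packet with the same 5-tuple lands in one flow; clusters are formed along one dimension at a time, for example all flows sharing a source IP. Function names are our own.

```python
from collections import Counter, defaultdict

# Order matches the first four fields of the 5-tuple built in to_flows().
DIMENSIONS = ("src_ip", "dst_ip", "src_port", "dst_port")


def to_flows(packets):
    """Aggregate packets into 5-tuple flows: (src_ip, dst_ip, src_port, dst_port, proto) -> packets."""
    return Counter((p["src_ip"], p["dst_ip"], p["src_port"], p["dst_port"], p["proto"])
                   for p in packets)


def cluster_flows(flows, dimension):
    """Group flows that share the same value in one dimension (e.g. one src_ip)."""
    index = DIMENSIONS.index(dimension)
    clusters = defaultdict(list)
    for five_tuple in flows:
        clusters[five_tuple[index]].append(five_tuple)
    return dict(clusters)


pkts = [
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.9", "src_port": 1234, "dst_port": 80, "proto": 6},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.9", "src_port": 1234, "dst_port": 80, "proto": 6},
    {"src_ip": "10.0.0.1", "dst_ip": "10.0.0.8", "src_port": 1235, "dst_port": 80, "proto": 6},
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.9", "src_port": 5555, "dst_port": 53, "proto": 17},
]
flows = to_flows(pkts)
by_src = cluster_flows(flows, "src_ip")
print({ip: len(fs) for ip, fs in by_src.items()})
```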

26
Extract Significant Clusters
  • Focus on significant clusters
  • Sufficiently large number of flows
  • Represent behavior of significant interest
  • Adaptive thresholding using entropy
  • A cluster is significant if it stands out from
    the rest
  • Use entropy to quantify whether the rest looks
    random

27
Entropy-Based Adaptive Thresholding
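The idea on slide 26 (mark the clusters that stand out, then use entropy to check whether the rest looks random) can be sketched as a loop over a decreasing threshold. This is a hypothetical illustration, not the paper's exact algorithm: the starting share `alpha`, the randomness cutoff `beta`, the `shrink` factor, and the function names are all assumptions. It uses the relative uncertainty from the Relative Uncertainty slide, H divided by log of min(m, N).

```python
import math


def relative_uncertainty(counts, n_possible):
    """RU = H / log2(min(m, N)); m = number of observations, N = number of possible values."""
    m = sum(counts)
    h_max = math.log2(min(m, n_possible)) if m else 0.0
    if h_max == 0:
        return 0.0
    h = -sum((c / m) * math.log2(c / m) for c in counts if c)
    return h / h_max


def significant_values(value_counts, n_possible, alpha=0.5, beta=0.9, shrink=0.5):
    """Hypothetical adaptive thresholding over {value: number of flows}.

    Values whose share of flows is >= alpha are marked significant; alpha is lowered
    (multiplied by shrink) until the remaining values look random (RU >= beta).
    """
    total = sum(value_counts.values())
    while True:
        significant = {v for v, c in value_counts.items() if c / total >= alpha}
        rest = [c for v, c in value_counts.items() if v not in significant]
        if not rest or relative_uncertainty(rest, n_possible) >= beta:
            return significant
        alpha *= shrink


# Two heavy source IPs plus 200 addresses seen once each (toy numbers).
counts = {"A": 500, "B": 300, **{f"v{i}": 1 for i in range(200)}}
print(significant_values(counts, n_possible=2 ** 32))
```

The loop always terminates: once `alpha` drops below the smallest share, every value is significant and the remainder is empty.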
28
Sample Results
Though the total number of distinct values along
a given dimension may not fluctuate very much,
the number of significant feature values
(clusters) may vary dramatically, due to changes
in the underlying feature value distributions.
29
Relative Uncertainty
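  • Entropy: $H(X) = -\sum_{i} p(x_i) \log p(x_i)$
  • Maximum entropy: $H_{max}(X) = \log \min(m, N)$,
    where m is the number of observations and N the
    number of possible values of X
  • Relative uncertainty of variable X:
    $RU(X) = H(X) / H_{max}(X)$, with $RU(X) \in [0, 1]$
  • $RU(X) = 0$: X is deterministic
  • $RU(X) = 1$: X is uniformly (maximally randomly)
    distributed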
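A small worked example of these definitions, with our own toy numbers rather than data from the paper: suppose a cluster has m = 4 flows whose values of X are a, a, b, c, and X has at least 4 possible values (N ≥ 4). Base-2 logarithms are used; the base cancels in the ratio, so RU does not depend on it.

```latex
$$
\begin{aligned}
p(a) &= \tfrac{2}{4} = 0.5, \qquad p(b) = p(c) = \tfrac{1}{4} = 0.25\\
H(X) &= -\left(0.5 \log_2 0.5 + 2 \cdot 0.25 \log_2 0.25\right) = 0.5 + 1 = 1.5\\
H_{max}(X) &= \log_2 \min(m, N) = \log_2 4 = 2\\
RU(X) &= \frac{H(X)}{H_{max}(X)} = \frac{1.5}{2} = 0.75
\end{aligned}
$$
```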

30
Behavior Characterization
31
Behavior Classes
Summarize the three feature distributions, each
quantized into one of three levels (0, 1, 2), into
27 classes, [0, 0, 0] through [2, 2, 2], labeled
BC0 to BC26 for convenience
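One way to realize the 27-class labeling, assuming each free dimension's relative uncertainty is quantized into three levels with hypothetical cutoffs 0.2 and 0.8, and that BC indices follow base-3 order so that [0, 0, 0] maps to BC0 and [2, 2, 2] to BC26 as on the slide. The cutoffs and function names are illustrative, not taken from the paper.

```python
def quantize_ru(ru, low=0.2, high=0.8):
    """Map a relative uncertainty in [0, 1] to 0 (low), 1 (medium), or 2 (high).

    The cutoffs low/high are hypothetical illustration values.
    """
    if ru <= low:
        return 0
    if ru >= high:
        return 2
    return 1


def behavior_class(ru_triple):
    """Label a cluster BC0..BC26 from the RU of its three free dimensions (base-3 index)."""
    a, b, c = (quantize_ru(r) for r in ru_triple)
    return f"BC{9 * a + 3 * b + c}"


print(behavior_class((0.1, 0.5, 0.95)))   # levels [0, 1, 2]
```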
32
Summary of behavior classes
  • Behavior classes classify clusters based on
    communication patterns
  • Behavior classes have distinct temporal
    properties
  • Popularity
  • Average Size
  • Membership Volatility
  • Clusters within the same behavior class have
    similar structural models
  • Clusters have stable behavior over time

33
Dominant State Analysis
  • Each cluster has hundreds or thousands of flows
      • An exhaustive approach is not practical
      • Need a compact summary
  • Dominant state analysis
      • Identify the dependency among the free
        dimensions of a cluster
      • Dominant states of a cluster are subsets of
        values that approximate the original data

34
General procedure for Dominant State Analysis
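A rough sketch of dominant state analysis under our own assumptions: the free dimensions are explored in a caller-chosen order, a value is dominant if it covers at least a fraction `delta` of the flows in the current subset, and `*` stands for "no dominant value at this step". The names, the threshold, and the ordering rule are hypothetical and only illustrate summarizing a cluster by a few value combinations.

```python
from collections import Counter

WILDCARD = "*"


def dominant_states(flows, free_dims, delta=0.2):
    """Hypothetical dominant state analysis for one cluster.

    flows: non-empty list of dicts; free_dims: free dimensions in exploration order.
    Returns tuples of (dimension, value) pairs describing the dominant states.
    """
    if not free_dims:
        return [()]
    dim, rest = free_dims[0], free_dims[1:]
    counts = Counter(f[dim] for f in flows)
    dominant = [v for v, n in counts.most_common() if n / len(flows) >= delta]
    if not dominant:
        # No single value is substantial: keep the whole subset and mark this dimension '*'.
        return [((dim, WILDCARD),) + tail for tail in dominant_states(flows, rest, delta)]
    states = []
    for value in dominant:
        subset = [f for f in flows if f[dim] == value]   # condition on the dominant value
        states.extend(((dim, value),) + tail for tail in dominant_states(subset, rest, delta))
    return states


# A srcIP cluster: mostly web traffic to many hosts, plus two flows to port 443.
cluster = ([{"dst_port": 80, "dst_ip": f"10.0.0.{i}"} for i in range(6)]
           + [{"dst_port": 443, "dst_ip": "10.0.1.1"}, {"dst_port": 443, "dst_ip": "10.0.1.2"}])
print(dominant_states(cluster, ["dst_port", "dst_ip"], delta=0.5))
```

With `delta=0.5` the cluster collapses to a single state, port 80 to arbitrary destinations, which reads like a scanning or web-crawling profile.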
35
Applications of Profiles
36
Anomalous Behaviors
  • Clusters in rare behavior classes
      • Identified a web server under DDoS attack
  • Behavioral changes for clusters
      • Yahoo web server example
  • Unusual profiles for popular service ports

37
Conclusion
  • Developed a systematic methodology to
    automatically discover and interpret
    communication patterns
  • Used information-theoretic techniques to build
    behavior models of end hosts and applications
  • Applied dominant state analysis to explain
    traffic behavior
  • Identified typical behavior profiles as well as
    rare and deviant behaviors

38
Future Work
  • Correlating behavior profiles across multiple
    links
  • Validate behavior profiles using additional
    features, e.g., packet payload
  • Integrate traffic profiling framework with a
    real-time monitoring system

39
Thank You
Questions / Comments?