Structural Analysis of Network Traffic Flows

1
Structural Analysis of Network Traffic Flows
  • A. Lakhina, K. Papagiannaki, M. Crovella, C. Diot,
    E.D. Kolaczyk and N. Taft
  • Presented by Guanghui He

2
Outline of the paper
  • Motivation and objective
  • Principal Component Analysis
  • Empirical studies
  • Conclusion

3
Motivation
  • Traditional Traffic Analysis
  • Focus on
  • Short stationary timescales
  • Traffic on a single link in isolation
  • Principal results
  • Scaling properties
  • Packet delays and losses
  • What ISPs Care About
  • Focus on
  • Long, nonstationary timescales
  • Traffic on all links simultaneously
  • Principal goals
  • Traffic engineering
  • Anomaly detection
  • Capacity planning

4
For Whole-network Traffic Analysis
  • Traffic Engineering: Y = AX
  • How to tune A? How does traffic move throughout
    the network?
  • Attack/Anomaly Detection
  • On which links is there unusual traffic?
  • Capacity planning
  • How much and where in network to upgrade?

5
Complicated Job
  • Measuring and modeling traffic on all links
    simultaneously is challenging.
  • Hundreds to thousands of links in a large IP
    backbone network
  • Even single link modeling is difficult
  • High-dimensional timeseries
  • Significant correlation structure
  • Is there a more fundamental representation?

6
One way out: OD flows
  • Link traffic arises from the superposition of
    Origin-Destination (OD) flows
  • Modeling OD flows instead of link traffic removes
    a significant source of correlation
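
To make the superposition concrete, here is a minimal sketch with a hypothetical two-link network and three OD flows. The routing matrix `A`, the link and flow names, and the traffic volumes are all invented for illustration; `A[i][j]` is 1 when OD flow j crosses link i.

```python
# Toy example: link loads as a superposition of OD flows (Y = A X).
# The topology and volumes below are invented for illustration only.

od_flows = ["a->b", "a->c", "b->c"]   # hypothetical OD pairs
links = ["a-b", "b-c"]                # hypothetical links

# A[i][j] = 1 if OD flow j is routed over link i, else 0.
# Here a->c is routed via b, so it crosses both links.
A = [
    [1, 1, 0],   # link a-b carries a->b and a->c
    [0, 1, 1],   # link b-c carries a->c and b->c
]

def link_loads(A, x):
    """Return y = A x: the traffic on each link for OD volumes x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

x = [10, 5, 7]            # OD flow volumes in one time interval
y = link_loads(A, x)
print(dict(zip(links, y)))
```

Both links carry the a->c flow, so their loads move together even when the OD flows themselves are unrelated. That shared component is the correlation that disappears when OD flows are modeled directly.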

7
Still too complicated
  • Each OD flow serves a different customer
    population
  • No two OD flows carry same traffic
  • Are they still correlated?
  • Even more OD flows than links
  • Makes Y = AX an ill-posed problem
  • How to extract meaning from this high dimensional
    structure in a systematic fashion?

8
Principal Component Analysis
  • Look for a low-dimensional representation
    preserving the most important features of data
  • Usually, a high-dimensional structure may be
    explainable in terms of a small number of
    independent variables
  • Commonly used tool: Principal Component Analysis
    (PCA)

9
Specific Questions
  • Are there low dimensional representations for a
    set of OD flows?
  • Do OD flows share common features?
  • What do the features look like?
  • Can we get a high-level understanding of a set of
    OD flows in terms of these features?

10
PCA (1)
  • For any given dataset, PCA finds a new coordinate
    system that maps maximum variability in the data
    to a minimum number of coordinates
  • New axes are called Principal Axes or Components

11
Properties of Principal Components
  • Let p be the number of OD flows and t denote the
    number of successive time intervals of interest.
    Then X is a t × p matrix representing the
    timeseries of all OD flows in a network
  • Each PC points in the direction of maximum
    (remaining) energy in the data

12
PCA on OD flows
  • The set of OD flows mapped onto a single PC is
    called an eigenflow. The matrix V of PCs is a new
    basis for X
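
A minimal sketch of this mapping, assuming NumPy and a synthetic t × p matrix (the data, the sizes, and the choice to center each column are assumptions made for illustration). It computes the principal components with an SVD and obtains the first eigenflow by projecting X onto the first PC and normalizing.

```python
import numpy as np

rng = np.random.default_rng(0)
t, p = 200, 8                       # time intervals, OD flows (toy sizes)

# Synthetic OD-flow matrix X (t x p): a shared periodic trend plus noise.
time = np.arange(t)
cycle = np.sin(2 * np.pi * time / 50)
X = np.outer(cycle, rng.uniform(1, 5, size=p)) + 0.1 * rng.standard_normal((t, p))

# Center each OD flow (column) before PCA; this is an assumption of the sketch.
Xc = X - X.mean(axis=0)

# SVD: Xc = U diag(s) Vt. The rows of Vt are the principal components,
# and the columns of U are the unit-norm eigenflows.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

v1 = Vt[0]                          # first principal component (length p)
u1 = Xc @ v1 / s[0]                 # first eigenflow (length t)

print("energy captured by first PC:", s[0] ** 2 / np.sum(s ** 2))
```

Each eigenflow is a timeseries of length t, so it can be plotted and inspected just like an OD flow.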

13
PCA on OD flows (2)
14
An example of Eigenflow and PC
15
Empirical studies
  • Find intrinsic dimensionality of OD flows
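
One common way to estimate intrinsic dimensionality is to count how many principal components are needed to capture most of the variance. The sketch below assumes NumPy, a synthetic 100-flow dataset built from three underlying trends, and a hypothetical 95% energy threshold; none of these values come from the presentation.

```python
import numpy as np

def components_needed(X, energy=0.95):
    """Return (k, frac): the smallest k whose top-k PCs reach `energy`,
    and the cumulative fraction of variance captured by each prefix."""
    Xc = X - X.mean(axis=0)                       # center each OD flow
    s = np.linalg.svd(Xc, compute_uv=False)       # singular values only
    frac = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(frac, energy)) + 1    # first prefix reaching the threshold
    return min(k, len(s)), frac

# Synthetic data: 100 OD flows driven by 3 shared trends plus small noise.
rng = np.random.default_rng(0)
t, p = 200, 100
time = np.arange(t)
T = np.column_stack([
    np.sin(2 * np.pi * time / 50),
    np.cos(2 * np.pi * time / 25),
    np.sin(2 * np.pi * time / 10),
])
W = rng.uniform(1, 5, size=(3, p))
X = T @ W + 0.01 * rng.standard_normal((t, p))

k, frac = components_needed(X)
print("components needed:", k)
```

On real OD-flow data the cumulative curve `frac` is usually more informative than a single threshold, since the choice of cutoff is somewhat arbitrary.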

18
Major types of eigenflows
23
Contribution of eigenflow type
24
Contribution to each OD flow
25
Summary: Specific questions
  • Are there low dimensional representations for a
    set of OD flows?
  • 5 or 6 eigenflows are sufficient for a good
    approximation of a set of 100 OD flows
  • Do OD flows share common features?
  • The common features across OD flows are
    eigenflows
  • What do the features look like?
  • Three types: deterministic (D), spike (S), or
    noise (N).
  • Can we get a high-level understanding of a set of
    OD flows in terms of these features?
  • High volume flows tend to be dominated by D type
  • Low volume flows tend to be dominated by N type
  • S type contributes across all OD flows

26
Possible applications
  • Traffic Matrix Estimation
  • Anomaly Detection
  • Traffic Forecasting
  • Traffic Engineering

27
Data Streaming Algorithms for Efficient and
Accurate Estimation of Flow Size Distribution
  • A. Kumar, M. Sung, J. Xu, J. Wang

28
Problem statement
  • Computing the distribution of flow sizes. Let the
    flow sizes range from 1 to z. The total number of
    flows is n, and $n_i$ is the number of flows with
    i packets. We need to find
    $\phi = \{\phi_1, \ldots, \phi_z\}$, where $\phi_i = n_i / n$.
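
As a small illustration of the target quantity, the sketch below computes the exact flow size distribution from a made-up list of per-flow packet counts. The variable names and the sample data are assumptions; in the streaming setting these per-flow counts are exactly what is not available and must be estimated.

```python
from collections import Counter

# Hypothetical per-flow packet counts for one measurement epoch.
flow_sizes = [1, 1, 1, 2, 2, 5]

n = len(flow_sizes)                 # total number of flows
n_i = Counter(flow_sizes)           # n_i[i]: number of flows with i packets
phi = {i: n_i[i] / n for i in sorted(n_i)}   # fraction of flows of each size

print(phi)
```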

29
The approach
  • Data streaming using a lossy data structure.

30
Solution Architecture
  • Measurement proceeds in epochs (e.g. 100s)
  • Maintain an array of counters in fast memory
    (SRAM)
  • For each packet, a counter is chosen via hashing
    and incremented.
  • No attempt to detect or resolve collisions.
  • Data collection is lossy (erroneous), but very
    fast.
  • At the end of the epoch, the counter array is
    paged to disk
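
The online stage can be sketched in a few lines. This is an illustrative software model only, not the hardware design: the hash function (BLAKE2b from the standard library), the string flow identifiers, and the function names are all assumptions.

```python
import hashlib

def counter_index(flow_id: str, m: int) -> int:
    """Map a flow identifier to one of m counters (the hash choice is an assumption)."""
    digest = hashlib.blake2b(flow_id.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % m

def run_epoch(packets, m):
    """Process one epoch: one hashed counter increment per packet.

    Collisions are neither detected nor resolved, so flows that hash to
    the same index simply add up in the same counter.
    """
    counters = [0] * m
    for flow_id in packets:
        counters[counter_index(flow_id, m)] += 1
    return counters

# Each packet is represented by the identifier of the flow it belongs to.
packets = ["f1", "f2", "f1", "f3", "f1", "f2"]
counters = run_epoch(packets, m=16)
print(counters)
```

The per-packet work is one hash and one increment, which is what makes the scheme fast enough for line-rate operation; all the statistical work is deferred to the offline stage.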

31
Offline estimation mechanisms
  • Ideally, if no collisions happened, the
    distribution could be estimated accurately. With
    real-world hash functions, collisions do occur.

32
Estimation module
  • The counter array is processed to obtain the
    Counter Value Distribution. $m_0$ is the number of
    counters with value 0, and $y_i$ is the number of
    counters with value i, i = 1, 2, ..., z.
  • Use Bayesian statistics to derive the following
    quantities
  • The total number of flows, n
  • The total number of flows with exactly 1 packet,
    $n_1$
  • The flow distribution, $\phi$

33
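Estimation of n and $n_1$
  • Let the total number of counters be m.
  • The number of flows hashing to any counter c is
    modeled by a Poisson random variable with
    parameter $\lambda = n / m$
  • There is a simple estimator for the total number
    of flows: $\hat{n} = m \ln(m / m_0)$, where $m_0$ is
    the number of counters with value 0
  • The result can be extended to derive an estimator
    of the number of flows of size 1:
    $\hat{n}_1 = y_1 e^{\hat{n}/m}$, where $y_1$ is the
    number of counters with value 1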
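
A quick simulation can check these two estimators. The sketch below assumes the forms $\hat{n} = m \ln(m / m_0)$ and $\hat{n}_1 = y_1 e^{\hat{n}/m}$, models hashing as uniform random placement, and uses made-up flow sizes; the function names are hypothetical.

```python
import math
import random
from collections import Counter

def estimate_n(m, m0):
    """Estimate the total number of flows from the number of empty counters."""
    return m * math.log(m / m0)

def estimate_n1(y1, n_hat, m):
    """Estimate the number of size-1 flows from the number of counters equal to 1."""
    return y1 * math.exp(n_hat / m)

# Simulated epoch: uniform random placement stands in for an ideal hash.
random.seed(1)
m = 10_000
sizes = [1] * 3000 + [2] * 1500 + [5] * 500     # true n = 5000, true n_1 = 3000
counters = [0] * m
for size in sizes:
    counters[random.randrange(m)] += size

hist = Counter(counters)            # counter value distribution
m0, y1 = hist[0], hist[1]
n_hat = estimate_n(m, m0)
n1_hat = estimate_n1(y1, n_hat, m)
print(round(n_hat), round(n1_hat))
```

Both estimators use only the counter value histogram, so they can be computed in a single pass over the array after the epoch ends.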

34
Why?
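  • Assume n flows have been inserted and the
    number of flows hashed to any counter is
    Poisson($n/m$). Then the expected number of
    counters not hit is $m_0 \approx m e^{-n/m}$, which
    gives $\hat{n} = m \ln(m / m_0)$
  • A counter holds exactly 1 packet only when a
    single flow of size 1 hashes to it alone, so the
    expected number of such counters is
    $y_1 \approx n_1 e^{-n/m}$, and we have
    $\hat{n}_1 = y_1 e^{\hat{n}/m}$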

35
Estimating the entire flow distribution
  • Begin with a guess of the flow distribution,
    $\phi^{ini}$.
  • Based on this guess, compute the various possible
    ways of splitting a particular counter value and
    the respective probabilities of such events.
  • Then a refined estimate of the flow distribution,
    $\phi^{new}$, can be computed.
  • Use $\phi^{new}$ as the guess in the next iteration.
  • Repeat until the estimate converges (EM).
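
The iteration can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: flows of each size are assumed to land in a counter as independent Poisson variables with rate $n \phi_i / m$, splits are limited to at most `max_parts` flows, the initial guess is the observed counter distribution, and a fixed 20 iterations replace a convergence test. All function names are hypothetical.

```python
import random
from collections import Counter
from math import factorial

def partitions(v, max_parts, max_size=None):
    """Yield the ways to split v into at most max_parts flow sizes
    (non-increasing tuples of positive integers that sum to v)."""
    if max_size is None:
        max_size = v
    if v == 0:
        yield ()
        return
    if max_parts == 0:
        return
    for first in range(min(v, max_size), 0, -1):
        for rest in partitions(v - first, max_parts - 1, first):
            yield (first,) + rest

def em_step(y, m, phi, n, max_parts=4):
    """One EM iteration. y maps a counter value to the number of counters
    holding it; phi maps a flow size to its current estimated fraction."""
    lam = {i: n * frac / m for i, frac in phi.items()}   # per-size Poisson rates
    expected = Counter()                                 # expected flows of each size
    for v, num_counters in y.items():
        weighted = []
        for split in partitions(v, max_parts):
            mult = Counter(split)                        # size -> number of flows
            w = 1.0
            for size, f in mult.items():
                w *= lam.get(size, 0.0) ** f / factorial(f)
            weighted.append((mult, w))
        total = sum(w for _, w in weighted)
        if total == 0:
            continue
        for mult, w in weighted:                         # posterior weight w / total
            for size, f in mult.items():
                expected[size] += num_counters * f * w / total
    n_new = sum(expected.values())
    return {i: c / n_new for i, c in expected.items()}, n_new

# Simulated epoch with made-up flow sizes; random placement stands in for hashing.
random.seed(7)
m = 2000
counters = [0] * m
for size in [1] * 600 + [2] * 300 + [3] * 100:
    counters[random.randrange(m)] += size

y = Counter(c for c in counters if c > 0)
nonzero = sum(y.values())
phi = {v: cnt / nonzero for v, cnt in y.items()}   # initial guess (assumption)
n_est = float(nonzero)
for _ in range(20):
    phi, n_est = em_step(y, m, phi, n_est)
print(round(n_est), {i: round(f, 3) for i, f in sorted(phi.items())})
```

Capping the number of flows per split keeps the enumeration small, at the cost of ignoring rare many-flow collisions.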

36
The algorithm
37
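Calculate $p(\beta \mid \phi, n, v)$
  • Let $\beta$ be the event that $f_1$ flows of size
    $s_1$, $f_2$ flows of size $s_2$, ..., $f_q$ flows
    of size $s_q$ collide into a slot, then
    $p(\beta \mid \phi, n) = e^{-n/m} \prod_{i=1}^{q} \lambda_{s_i}^{f_i} / f_i!$,
    where $\lambda_i = n \phi_i / m$,
  • and $p(\beta \mid \phi, n, v) = p(\beta \mid \phi, n) / \sum_{\alpha \in \Omega_v} p(\alpha \mid \phi, n)$,
    where $\Omega_v$ is the set of all collision events
    whose flow sizes sum to the counter value v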

38
Computational complexity
  • For counters with value larger than 300, ignore
    the cases involving the collision of 4 or more
    flows.
  • For counters with value between 50 and 300,
    ignore the cases involving the collision of 5 or
    more flows.
  • For all other counter values, ignore the cases
    involving the collision of 7 or more flows.
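
The effect of these cutoffs can be sketched by counting how many splits survive. The function names are hypothetical, and treating the boundary values 50 and 300 as part of the middle range is an assumption, since the slide does not say which side they fall on.

```python
def max_colliding_flows(counter_value):
    """Largest number of colliding flows still considered for a counter value."""
    if counter_value > 300:
        return 3      # ignore collisions of 4 or more flows
    if counter_value >= 50:
        return 4      # ignore collisions of 5 or more flows
    return 6          # ignore collisions of 7 or more flows

def count_splits(v, k):
    """Number of ways to split v into at most k positive flow sizes.

    Uses the classic identity that partitions into at most k parts are
    as numerous as partitions into parts of size at most k.
    """
    ways = [1] + [0] * v
    for part in range(1, k + 1):
        for s in range(part, v + 1):
            ways[s] += ways[s - part]
    return ways[v]

for v in (20, 100, 400):
    k = max_colliding_flows(v)
    print(v, k, count_splits(v, k), count_splits(v, v))
```

The last column is the untruncated number of splits, which grows far faster than the truncated count as the counter value increases.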

39
Evaluation
40
Evaluation: small flow sizes
41
Multi-resolution array of counters
  • The multi-resolution array of counters allows the
    scheme to operate for any value of n, with
    graceful degradation in accuracy for large numbers
    of flows.

42
Evaluation of MRAC
43
Conclusion
  • Data-streaming based solution for estimating
    flow-distribution
  • A lossy data structure combined with Bayesian
    statistics yields accurate estimates.
  • Estimation using EM algorithm