Title: Structural Analysis of Network Traffic Flows
1Structural Analysis of Network Traffic Flows
- Anukool Lakhina
- with Dina Papagiannaki, Mark Crovella, Christophe
Diot, Eric Kolaczyk, and Nina Taft
ACM SIGMETRICS 04
2Traditional Traffic Analysis
- Focus on
- Short stationary periods
- Traffic on a single link in isolation
- Principal results
- Scaling properties
- Models for single-link traffic
3Need for Whole-Network Traffic Analysis
- Traffic Engineering: How does traffic move throughout the network?
- Attack/Anomaly Detection: Which links show unusual traffic?
- Capacity Planning: How much and where in the network to upgrade?
4Origin-Destination Flows
[Figure: total traffic on the link; axes: traffic vs. time]
- All link traffic arises from the superposition of OD flows
- Traffic carried by OD flows is roughly independent
- A useful primitive for whole-network analysis
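The superposition idea can be illustrated with a small sketch. The routing matrix `A`, the toy topology, and the byte counts below are made-up assumptions for illustration, not data from the talk.

```python
import numpy as np

# Hypothetical toy topology: 3 OD flows routed over 2 links.
# A[i, j] = 1 if OD flow j traverses link i (an assumed routing matrix).
A = np.array([[1, 1, 0],
              [0, 1, 1]])

# X[t, j] = bytes carried by OD flow j in time bin t (made-up numbers).
X = np.array([[10.0, 5.0, 2.0],
              [12.0, 4.0, 3.0]])

# Each link carries the sum of the OD flows that cross it.
Y = X @ A.T  # Y[t, i] = total bytes on link i in time bin t
print(Y)
```

Link 0 carries flows 0 and 1, and link 1 carries flows 1 and 2, so each entry of `Y` is a sum of two OD flow values.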
5This is Complicated!
- Understanding traffic on all flows simultaneously is challenging
- Even single-flow traffic analysis is difficult
- 100s of OD flows in large IP networks
- High-dimensional, multivariate timeseries
6High Dimensionality: A General Strategy
- Look for good low-dimensional representations
- Often a high-dimensional structure can be captured via a small number of independent variables
- A commonly used technique: Principal Component Analysis (PCA)
7Our Work
- Measure complete sets of OD flow traffic from two backbone networks
- Use PCA to understand their structure
- Extract common features
- Characterize individual features
- Reconstruct as sum of features
- Describe potential applications of results
8Datasets
- Two networks
- Abilene: 11 PoPs, 121 OD flows
- Sprint-Europe: 13 PoPs, 169 OD flows
- Methodology
- Collect sampled traffic from every ingress link
- Use BGP tables to resolve egress points
- Week-long byte timeseries, at 10 minute bins
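The dataset sizes above follow from simple arithmetic, sketched here. The helper name `od_matrix_shape` is hypothetical, and it assumes one OD flow per ordered PoP pair including a PoP paired with itself, which is consistent with 121 = 11 x 11 and 169 = 13 x 13.

```python
def od_matrix_shape(num_pops, days=7, bin_minutes=10):
    """Shape (time bins, OD flows) of a traffic matrix timeseries.

    Assumes one OD flow per ordered PoP pair, self-pairs included.
    """
    bins = days * 24 * 60 // bin_minutes
    flows = num_pops ** 2
    return bins, flows

print(od_matrix_shape(11))  # Abilene
print(od_matrix_shape(13))  # Sprint-Europe
```

A week at 10-minute bins gives 1008 time bins per OD flow in both networks.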
9Example OD Flows
Some have visible structure, some less so
10Specific Questions of Structural Analysis
- Do low-dimensional representations for OD flows exist?
- Do OD flows share common features?
- What do these features look like?
- Can we get a high-level understanding of a set of OD flows in terms of these features?
11Principal Component Analysis
Coordinate transformation method
[Figure: original data]
12PCA on OD flows
- Each principal axis points in the direction of maximum (remaining) energy in the set of OD flows
- Principal axes are ordered by the amount of energy they capture
- Eigenflow: the set of OD flows mapped onto a principal axis; a common pattern
- Eigenflows are ordered from most common to least common pattern
- An OD flow is a weighted sum of eigenflows
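A minimal sketch of these ideas, assuming the OD flows are stored as columns of a (time bins x OD flows) matrix and that PCA is computed via an SVD of the column-centered data. The function name `eigenflows` and the synthetic traffic are illustrative assumptions, not the authors' code or data.

```python
import numpy as np

def eigenflows(X):
    """PCA via SVD on a (time bins x OD flows) matrix.

    Returns (U, s, Vt): columns of U are unit-norm eigenflows (timeseries),
    rows of Vt are principal axes, and s**2 is the energy per component.
    """
    Xc = X - X.mean(axis=0)  # center each OD flow (an assumption here)
    return np.linalg.svd(Xc, full_matrices=False)

# Synthetic stand-in for OD flow data: a shared daily pattern plus noise.
rng = np.random.default_rng(0)
t = np.arange(1008)                       # one week of 10-minute bins
daily = np.sin(2 * np.pi * t / 144)       # 144 bins = 24 hours
X = 100 * np.outer(daily, rng.uniform(1, 2, 20)) + rng.normal(size=(1008, 20))

U, s, Vt = eigenflows(X)

# Centered OD flow 0 as a weighted sum of eigenflows:
# sum over i of s[i] * Vt[i, 0] * U[:, i]
flow_0 = U @ (s * Vt[:, 0])
```

NumPy returns the singular values in descending order, so the eigenflows come out ordered from most to least energy.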
13Low Intrinsic Dimensionality of OD Flows
[Figure: energy captured by each principal component; axes: Energy Captured vs. Principal Component]
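The quantity plotted here can be sketched as the squared singular values normalized to sum to one. The function name `energy_fraction` and the synthetic data are assumptions for illustration; the figures in the talk come from the real Abilene and Sprint-Europe measurements.

```python
import numpy as np

def energy_fraction(X):
    """Fraction of total energy captured by each principal component."""
    Xc = X - X.mean(axis=0)                   # center each OD flow (assumed)
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    return s**2 / np.sum(s**2)

# Synthetic stand-in: 20 flows sharing one daily pattern, plus small noise.
rng = np.random.default_rng(0)
t = np.arange(1008)
daily = np.sin(2 * np.pi * t / 144)
X = 100 * np.outer(daily, rng.uniform(1, 2, 20)) + rng.normal(size=(1008, 20))

frac = energy_fraction(X)
print(frac[:3])
```

With a single shared pattern, nearly all of the energy lands in the first component, which is the kind of sharp drop-off that indicates low intrinsic dimensionality.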
14Approximating With Top 5 Eigenflows
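A top-k approximation of this kind can be sketched as a truncated SVD reconstruction. The function name `approximate`, the choice to add the per-flow mean back, and the synthetic three-pattern data are assumptions for illustration.

```python
import numpy as np

def approximate(X, k=5):
    """Rebuild every OD flow from only the top-k eigenflows."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean + (U[:, :k] * s[:k]) @ Vt[:k]

# Synthetic stand-in: 20 flows mixing three shared patterns, plus noise.
rng = np.random.default_rng(0)
t = np.arange(1008)
patterns = np.stack([np.sin(2 * np.pi * t / 144),
                     np.cos(2 * np.pi * t / 144),
                     np.sin(2 * np.pi * t / 1008)])
X = 100 * patterns.T @ rng.uniform(0, 1, (3, 20)) + rng.normal(size=(1008, 20))

def rel_err(k):
    return np.linalg.norm(X - approximate(X, k)) / np.linalg.norm(X)

err1, err5 = rel_err(1), rel_err(5)
print(err1, err5)
```

Because the synthetic flows are built from three patterns, five eigenflows leave only the noise unexplained, while one eigenflow leaves a visibly larger error.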
15Kinds of Eigenflows
- Deterministic (d-eigenflows): predictable (periodic) trends
- Spike (s-eigenflows): sudden, isolated spikes and drops
- Noise (n-eigenflows): roughly stationary and Gaussian
16D-eigenflows Have Periodicity
Power spectrum
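One way to check for periodicity is to look for the strongest peak in the power spectrum. The sketch below is an illustrative assumption, not the authors' procedure: the function name `dominant_period_bins` is hypothetical and the signal is synthetic.

```python
import numpy as np

def dominant_period_bins(u):
    """Period (in time bins) of the strongest non-DC spectral peak."""
    u = u - u.mean()
    power = np.abs(np.fft.rfft(u)) ** 2
    freqs = np.fft.rfftfreq(len(u))        # cycles per time bin
    k = int(np.argmax(power[1:])) + 1      # skip the zero-frequency term
    return 1.0 / freqs[k]

# Synthetic eigenflow: a 24-hour cycle at 10-minute bins, plus small noise.
rng = np.random.default_rng(0)
t = np.arange(1008)
u = np.sin(2 * np.pi * t / 144) + 0.1 * rng.normal(size=1008)

period = dominant_period_bins(u)
print(period * 10 / 60, "hours")
```

With 10-minute bins, a period of 144 bins corresponds to 24 hours.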
17S-eigenflows Have Spikes
5-sigma threshold
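A 5-sigma test can be sketched as follows, assuming the threshold is measured in standard deviations from the eigenflow's own mean. The function name `has_spike` and the test signals are illustrative assumptions.

```python
import numpy as np

def has_spike(u, n_sigma=5.0):
    """True if any value lies more than n_sigma standard deviations from the mean."""
    z = (u - u.mean()) / u.std()
    return bool(np.any(np.abs(z) > n_sigma))

t = np.arange(1008)
smooth = np.sin(2 * np.pi * t / 144)   # periodic, no spikes
spiky = smooth.copy()
spiky[500] += 20.0                     # one injected spike

print(has_spike(smooth), has_spike(spiky))
```

Note that a large spike also inflates the standard deviation it is compared against, so very short series may need a more robust scale estimate.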
18N-eigenflows Are Gaussian
qq-plot
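A qq-plot compares sorted sample values against standard normal quantiles; a numeric stand-in is the correlation between the two, which is close to 1 for Gaussian data. The function name `qq_correlation` and the plotting positions `(i - 0.5) / n` are assumptions chosen for this sketch.

```python
import numpy as np
from statistics import NormalDist

def qq_correlation(u):
    """Correlation between sorted standardized samples and normal quantiles."""
    n = len(u)
    z = (np.sort(u) - np.mean(u)) / np.std(u)
    probs = (np.arange(1, n + 1) - 0.5) / n
    theo = np.array([NormalDist().inv_cdf(p) for p in probs])
    return float(np.corrcoef(theo, z)[0, 1])

rng = np.random.default_rng(0)
gaussian = rng.normal(size=1008)       # noise-like series
spike = np.zeros(1008)
spike[500] = 1.0                       # a single isolated spike

r_gauss = qq_correlation(gaussian)
r_spike = qq_correlation(spike)
print(r_gauss, r_spike)
```

The Gaussian series falls almost on a straight line in the qq-plot, while the spike series does not.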
19Hundreds of OD Flows But Only Three Basic Patterns
20Which Eigenflows Are Most Significant?
d-eigenflows are most significant in both networks
s-eigenflows are next most important
n-eigenflows account for the rest
[Figure: D, S, and N eigenflows for each network, ordered from most common to least common]
21An OD Flow, Reconstructed
[Figure: an OD flow and its D-components, S-components, and N-components]
22Contribution to Each OD Flow (Sprint)
Largest OD flows: strong deterministic component
Smallest OD flows: primarily dominated by spikes
Regardless of size, n-eigenflows account for a fairly constant portion
23Contribution to Each OD Flow (Abilene)
Largest OD flows: strong deterministic component
Smallest OD flows: dominated by noise, but have diurnal trends also
Regardless of size, spikes account for a fairly constant portion
24Summary: Specific Questions
- Are there low-dimensional representations for a set of OD flows?
- 5-10 eigenflows are sufficient to describe 100+ OD flows
- Do OD flows share common features?
- The common features across OD flows are eigenflows
- What do the features look like?
- Each eigenflow can be categorized as D, S, or N
- Can we get a high-level understanding of a set of OD flows in terms of these features?
- Both networks: large flows are primarily diurnal
- Sprint: small flows are primarily spikes; noise is constant
- Abilene: small flows have N and D; spikes are constant
25New Approaches to Important Problems
- Anomaly Detection: Low-dimensional structure can be considered "normal" to identify anomalies (see our Sigcomm04 paper)
- Traffic Matrix Estimation: Low-dimensional structure is easier to estimate from link traffic
- Traffic Forecasting: Build forecasting models on d-eigenflows, and forecast all OD flows
- Traffic Engineering: Use D/S/N classification to identify "heavy hitters", and treat these differently
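The anomaly detection idea can be illustrated with a rough sketch: treat the top-k eigenflows as "normal" and look at what is left over in each time bin. This is an assumption-laden illustration, not the method of the Sigcomm04 paper; the function name `residual_energy`, the choice k=1, and the injected anomaly are all hypothetical.

```python
import numpy as np

def residual_energy(X, k):
    """Per-time-bin energy left after removing the top-k eigenflow structure."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    normal = (U[:, :k] * s[:k]) @ Vt[:k]   # low-dimensional "normal" traffic
    return np.sum((Xc - normal) ** 2, axis=1)

# Synthetic stand-in: a shared daily pattern plus noise, with one injected anomaly.
rng = np.random.default_rng(0)
t = np.arange(1008)
daily = np.sin(2 * np.pi * t / 144)
X = 100 * np.outer(daily, rng.uniform(1, 2, 20)) + rng.normal(size=(1008, 20))
X[400, 3] += 50.0                          # hypothetical anomaly in flow 3

r = residual_energy(X, k=1)
print(int(np.argmax(r)))
```

The time bin with the largest residual is the one holding the injected anomaly, even though the anomaly is small relative to the diurnal swing of the flows.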
26Final Thoughts
- OD flows are a useful primitive for whole-network traffic analysis
- PCA forms an effective basis for a Structural Analysis of OD flows
- Structural Analysis has many benefits
- provides insight into nature of OD flows
- allows feature-based decomposition of OD flows
- provides leverage on many important problems
27Thanks!
- Help with Sprint-Europe Data
- Bjorn Carlsson, Jeff Loughridge (SprintLink)
- Supratik Bhattacharyya, Richard Gass (ATL)
- Help with Abilene Data
- Mark Fullmer, Rick Summerhill (Internet2)
- Matthew Davy (Indiana University)