Title: Diagnosing Network Disruptions with Network-wide Analysis
1 Diagnosing Network Disruptions with Network-wide Analysis
- Yiyi Huang, Nick Feamster, Anukool Lakhina, Jim Xu
- College of Computing, Georgia Tech; Guavus, Inc.
2 Problem Overview
- Network disruptions happen frequently, sometimes leading to a disaster
- Abilene network: 282 disruptions from January 1, 2006 to June 30, 2006 (among 379 e-mails on the operational mailing list)
- How to help network operators?
Goal: quick detection and accurate identification of disruptions
3 Is Mining Traffic Data Enough?
No! Mining routing data for disruptions
[Figure: Atlanta users and Gatech, with locations Atlanta and Tokyo. Symptoms: 1. high packet loss rate; 2. requests sent repeatedly. Annotations: "Flash Crowd?", "Misconfiguration!!"]
4 Analyzing Routing Data
Previous work: grouping messages in a single stream and reporting loud events
5 Problems with Analyzing Single Routing Streams
- Size of disruptions varies
- Different events range from 10^2 to 10^6
- Same events vary across routers
- Many known disruptions are low volume
- 50% of disruptions have no more than 1000 messages in 10 minutes
- Not all severe events are large spikes
Hard to set thresholds!
6 Key Idea: Network-wide Analysis
Intuition: a single network event can cause routing changes at many routers across the network
[Figure: routers A, B, C, D, E and a destination (Dest). Labels: "Small number of routes shifting", "Correlated routing changes", "Large amount of traffic shifting"]
Correlation across routers is caused by the structure and configuration of the network
7 Overview of Approach
Detection: Network-wide BGP routing updates (BGP routing update streams) -> Multivariate Timeseries Analysis -> Possible Disruptions
Identification: Network-wide Router Configurations -> Static Configuration Analysis -> Network Model; Possible Disruptions + Network Model -> Hybrid Static/Dynamic Analysis -> Likely Scenarios
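As a rough illustration of what "multivariate timeseries" could mean here, the sketch below bins BGP update streams into a matrix with one row per time bin and one column per router. The input format (pairs of Unix timestamp and router name), the function name `build_update_matrix`, and the 600-second bin width are all assumptions made for this example, not details taken from the slides.

```python
from collections import Counter

def build_update_matrix(updates, routers, bin_seconds=600):
    """Count updates per (time bin, router).

    updates: iterable of (unix_timestamp, router_name) pairs (hypothetical format).
    routers: column order for the output matrix.
    bin_seconds: bin width; 600 s is an assumed value, not one from the slides.
    Returns a list of rows, one per consecutive time bin.
    """
    counts = Counter((int(ts) // bin_seconds, r) for ts, r in updates)
    if not counts:
        return []
    first = min(b for b, _ in counts)
    last = max(b for b, _ in counts)
    # Counter returns 0 for (bin, router) pairs that never occurred.
    return [[counts[(b, r)] for r in routers] for b in range(first, last + 1)]

# Toy input: five updates seen at two routers.
updates = [(0, "A"), (10, "A"), (599, "B"), (600, "A"), (1800, "B")]
matrix = build_update_matrix(updates, ["A", "B"])
print(matrix)
```

Each row of the resulting matrix is one network-wide observation, which is the shape a multivariate method expects.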
8 Detection: Illustration
In general, anomalous updates result in a large value of
[Figure axes: Updates on Router 1, Updates on Router 2]
9 Detection: Subspace Method
10 Detection: Approach
Inspecting the size of the vector in the abnormal subspace using squared prediction error
[Figure axes: Updates on Router 1, Updates on Router 2; labeled value: 0.001]
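A minimal sketch of the squared prediction error (SPE) idea, assuming the common PCA-based form of the subspace method: the top `k` principal directions span the normal subspace, and the SPE of a time bin is the squared length of what is left after projecting onto it. The function name `spe_scores`, the choice of `k`, and the synthetic two-router data are assumptions for illustration; the slides do not give the exact formulation or a threshold.

```python
import numpy as np

def spe_scores(X, k):
    """Squared prediction error of each row of X outside the top-k principal subspace.

    X: (time bins, routers) matrix of update counts (assumed layout).
    k: number of principal directions treated as "normal" (assumed parameter).
    """
    Xc = X - X.mean(axis=0)                 # center each router's timeseries
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                            # (routers, k) basis of the normal subspace
    residual = Xc - Xc @ P @ P.T            # component in the abnormal subspace
    return np.sum(residual ** 2, axis=1)    # one SPE value per time bin

# Synthetic data: two routers that normally move together, plus one bin
# (index 120) that departs from that pattern. This is only there to exercise the code.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = base @ np.array([[1.0, 1.0]]) + 0.01 * rng.normal(size=(200, 2))
X[120] += np.array([3.0, -3.0])

spe = spe_scores(X, k=1)
print(int(np.argmax(spe)))
```

In practice a threshold on the SPE decides which bins are reported; how to set it is left open in this sketch.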
11 Evaluation on Abilene Network
- Data: BGP updates from Abilene Network
- Validation: Abilene operational mailing list
- Ground truth
- Time period: January 1, 2006 to June 30, 2006
12 Detection Results
- High detection rate: 100% of node and link disruptions, 60% of peer disruptions
- Acceptable false positives
[Figure: events reported in the email list vs. events detected by the subspace method. Labels: Instability / Unavailability (118 events), Maintenance. Values shown: 75, 43, 535, 125]
13 Overview of Approach
Detection: Network-wide BGP routing updates (BGP routing update streams) -> Multivariate Timeseries Analysis -> Possible Disruptions
Identification: Network-wide Router Configurations -> Static Configuration Analysis -> Network Model; Possible Disruptions + Network Model -> Hybrid Static/Dynamic Analysis -> Likely Scenario
14 Identification: Concept
[Figure: AS 1 with router R1 and AS 2 with router R2; label: "Route advertisement"]
15 Identification: Implementation
- Bootstrapping phase: constructs the routing table for each router
- Runtime tracking phase
[Decision tree]
Number of iBGP next-hops decreases at all routers? Y: Node. N: next question.
Number of iBGP next-hops decreases at some routers? Y: Link. N: next question.
Number of eBGP next-hops decreases at some routers? Y: Peer. N: External.
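One reading of the decision tree on this slide, written as a small function. The branch order (node, then link, then peer, then external) is inferred from the slide's layout, and the data layout (a mapping from router name to next-hop count, before and after the event) and the name `classify_disruption` are assumptions for illustration only.

```python
def classify_disruption(ibgp_before, ibgp_after, ebgp_before, ebgp_after):
    """Classify a disruption from changes in next-hop counts.

    Each argument maps router name -> number of next-hops (hypothetical layout).
    Returns one of "node", "link", "peer", "external".
    """
    ibgp_drop = [ibgp_after[r] < ibgp_before[r] for r in ibgp_before]
    ebgp_drop = [ebgp_after[r] < ebgp_before[r] for r in ebgp_before]
    if ibgp_drop and all(ibgp_drop):   # iBGP next-hops decrease at all routers
        return "node"
    if any(ibgp_drop):                 # ... at some routers only
        return "link"
    if any(ebgp_drop):                 # eBGP next-hops decrease at some routers
        return "peer"
    return "external"

# Toy example: three routers, iBGP next-hop count drops at only one of them.
before = {"R1": 4, "R2": 4, "R3": 4}
after = {"R1": 3, "R2": 4, "R3": 4}
print(classify_disruption(before, after, before, before))
```

The before-counts would come from the routing tables built in the bootstrapping phase, and the after-counts from runtime tracking.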
16 Identification Results
- Of the disruptions the subspace method detected, we correctly identified:
- 100% of node disruptions
- 74% of link disruptions
- 93% of peer disruptions
- Possible reasons for identification failure:
- Routing table is built from the bootstrapping phase
- Withdrawals take time to propagate to all routers
17 Summary
- Network-wide analysis of routing updates
- Size of disruptions varies by several orders of magnitude
- Many severe disruptions are low volume
- Accurate detection and identification
- Potentially fast
- Next step: online implementation
- New direction on exploiting routing information using network-wide analysis
18 Thank you