Diagnosing Network Disruptions with Networkwide Analysis - PowerPoint PPT Presentation

About This Presentation
Title:

Diagnosing Network Disruptions with Networkwide Analysis

Description:

'Ground truth' Time period: January 1, 2006 to June 30, 2006. Seattle. 6,766,986. Sunnyvale ... Destination Next-hop AS Path. 130.207.0.0/16. R1. 1..dest. R1 ... – PowerPoint PPT presentation


Transcript and Presenter's Notes

Title: Diagnosing Network Disruptions with Networkwide Analysis


1
Diagnosing Network Disruptions with Network-wide
Analysis
  • Yiyi Huang, Nick Feamster,
  • Anukool Lakhina, Jim Xu

College of Computing, Georgia Tech; Guavus, Inc.
2
Problem Overview
  • Network disruptions happen frequently, sometimes
    leading to a disaster
  • Abilene network: 282 disruptions from January 1,
    2006 to June 30, 2006 (among 379 e-mails on the
    operational mailing list)
  • How to help network operators?

Goal: quick detection and accurate
identification of disruptions
3
Is Mining Traffic Data enough?
No! Mine routing data for disruptions as well.
[Figure labels: Misconfiguration!!; Atlanta users: 1. high packet loss rate, 2. repeatedly resending requests; Gatech; Flash crowd?; Atlanta; Tokyo]
4
Analyzing Routing Data
Previous work: grouping messages in a single
stream and reporting loud events
5
Problems with Analyzing Single Routing Streams
  • Size of disruptions varies
  • Different events range from 10^2 to 10^6
  • Same events vary across routers
  • Many known disruptions are low volume
  • 50 disruptions have no more than 1000 messages
    in 10 minutes
  • Not all severe events are large spikes

Hard to set thresholds!
6
Key Idea: Network-wide Analysis
Intuition: A single network event can cause
routing changes at many routers across the
network
[Figure: routers A, B, C, D, E and a destination (Dest). Labels: small number of routes shifting; correlated routing changes; large amount of traffic shifting]
Correlation across routers is caused by the
structure and configuration of the network
7
Overview of Approach
Detection
  • Input: network-wide BGP routing updates (BGP routing update streams)
  • Method: multivariate timeseries analysis
  • Output: possible disruptions
Identification
  • Input: network-wide router configurations
  • Static configuration analysis produces the network model
  • Hybrid static/dynamic analysis produces the likely scenarios
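The multivariate timeseries used for detection can be pictured as a matrix of update counts, with one row per time bin and one column per router. The following is a minimal sketch of that idea, not the authors' implementation: the `(timestamp, router)` input format, the function name `build_update_matrix`, and the 10-minute bin width are all assumptions made for illustration.

```python
from collections import Counter

def build_update_matrix(updates, routers, start, end, bin_seconds=600):
    """Count BGP updates per router per time bin.

    updates: iterable of (timestamp_seconds, router_name) pairs
             (a hypothetical input format, not taken from the slides).
    Returns one row per time bin, each row holding one count per router.
    """
    n_bins = (end - start + bin_seconds - 1) // bin_seconds
    counts = Counter()
    for ts, router in updates:
        if start <= ts < end:
            counts[((ts - start) // bin_seconds, router)] += 1
    # Counter returns 0 for (bin, router) pairs that saw no updates.
    return [[counts[(b, r)] for r in routers] for b in range(n_bins)]

# Toy input: five updates seen at two routers over 30 minutes.
updates = [(0, "R1"), (30, "R1"), (45, "R2"), (610, "R2"), (1250, "R1")]
matrix = build_update_matrix(updates, ["R1", "R2"], start=0, end=1800)
```

Each row of `matrix` is then one observation of network-wide routing activity, which is the form a multivariate method needs.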
8
Detection Illustration
  • Approach: subspace method

In general, anomalous updates result in a large
value of [symbol missing in source]
[Figure axes: updates on Router 1, updates on Router 2]
9
Detection: Subspace Method
10
Detection Approach
Inspecting size of vector in abnormal subspace
using squared prediction error
[Figure axes: updates on Router 1, updates on Router 2; the value 0.001 appears twice as a label]
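As a rough illustration of the squared prediction error (SPE) idea, the sketch below fits a one-dimensional "normal" subspace to per-router update counts and measures how much of a new observation falls outside it. It is a simplified stand-in under stated assumptions, not the authors' code: the normal subspace is taken to be the single leading principal direction, found by power iteration to avoid external dependencies, and the function names and toy data are hypothetical. How the detection threshold is chosen is not specified here.

```python
import math

def top_component(rows, iters=200):
    """Return column means and the leading principal direction (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    # Assumes the start vector is not orthogonal to the leading direction.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return means, v

def spe(row, means, v):
    """Squared length of the part of `row` that lies outside the normal subspace."""
    c = [x - m for x, m in zip(row, means)]
    proj = sum(a * b for a, b in zip(c, v))
    residual = [a - proj * b for a, b in zip(c, v)]
    return sum(x * x for x in residual)

# Toy history: update counts at Router 1 and Router 2 usually move together.
history = [(10, 10), (20, 21), (30, 29), (40, 41), (50, 49)]
means, direction = top_component(history)

normal_spe = spe((35, 36), means, direction)     # follows the usual pattern
anomalous_spe = spe((60, 20), means, direction)  # routers disagree sharply
```

An observation that breaks the usual correlation between routers yields a much larger SPE than one that follows it, even when its total update volume is unremarkable.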
11
Evaluation on Abilene Network
  • Data: BGP updates from the Abilene network
  • Validation: Abilene operational mailing list
  • 'Ground truth'
  • Time period: January 1, 2006 to June 30, 2006

12
Detection Results
  • High detection rate
  • 100% of node and link disruptions, 60% of peer
    disruptions
  • Acceptable false positives

[Figure: events reported in the email list versus events detected by the subspace method. Labels: Instability, Unavailability, 118 events, Maintenance; counts shown: 75, 43, 535, 125]
13
Overview of Approach
Detection
  • Input: network-wide BGP routing updates (BGP routing update streams)
  • Method: multivariate timeseries analysis
  • Output: possible disruptions
Identification
  • Input: network-wide router configurations
  • Static configuration analysis produces the network model
  • Hybrid static/dynamic analysis produces the likely scenario
14
Identification Concept
[Figure labels: AS 1, AS 2, routers R1 and R2, route advertisement]
15
Identification Implementation
  • Bootstrapping phase constructs the routing table
    for each router
  • Runtime tracking phase

Decision tree:
  • Number of iBGP next-hops decreases at all routers? Yes: Node
  • If not, number of iBGP next-hops decreases at some routers? Yes: Link
  • If not, number of eBGP next-hops decreases at some routers? Yes: Peer
  • If not: External
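One reading of the decision tree on this slide is: if the number of iBGP next-hops decreases at all routers, the disruption is a node failure; if it decreases at only some routers, a link failure; if only the number of eBGP next-hops decreases at some routers, a peer disruption; otherwise the event is external. The sketch below encodes that reading. The function name and the dictionary input format (router name mapped to next-hop count, before and after the event) are assumptions for illustration and do not come from the slides.

```python
def classify_disruption(ibgp_before, ibgp_after, ebgp_before, ebgp_after):
    """Classify an event from per-router next-hop counts.

    Each argument maps router name -> number of next-hops
    (a hypothetical input format).
    """
    ibgp_drop = [ibgp_after[r] < ibgp_before[r] for r in ibgp_before]
    ebgp_drop = [ebgp_after[r] < ebgp_before[r] for r in ebgp_before]
    if ibgp_drop and all(ibgp_drop):
        return "node"      # iBGP next-hops decreased at all routers
    if any(ibgp_drop):
        return "link"      # iBGP next-hops decreased at some routers
    if any(ebgp_drop):
        return "peer"      # eBGP next-hops decreased at some routers
    return "external"      # no next-hop decrease observed

# Toy example with three routers.
ibgp = {"A": 2, "B": 2, "C": 2}
ebgp = {"A": 1, "B": 1, "C": 1}

node_case = classify_disruption(ibgp, {"A": 1, "B": 1, "C": 1}, ebgp, ebgp)
link_case = classify_disruption(ibgp, {"A": 1, "B": 2, "C": 2}, ebgp, ebgp)
peer_case = classify_disruption(ibgp, ibgp, ebgp, {"A": 0, "B": 1, "C": 1})
external_case = classify_disruption(ibgp, ibgp, ebgp, ebgp)
```

The checks run in the same order as the tree, so a node failure is reported before the weaker "some routers" conditions are considered.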
16
Identification Results
  • Of the disruptions the subspace method detected,
    we correctly identified
  • 100% of node disruptions
  • 74% of link disruptions
  • 93% of peer disruptions
  • Possible reasons for identification failure
  • The routing table is built during the bootstrapping phase
  • Withdrawals take time to propagate across all routers

17
Summary
  • Network-wide Analysis of routing updates
  • Size of disruptions varies by several orders of
    magnitude
  • Many severe disruptions are low volume
  • Accurate detection and identification
  • Potentially fast
  • Next step: online implementation
  • New direction on exploiting routing information
    using network-wide analysis

18
Thank you
  • Questions?