Title: Traffic-aware Inter-Domain Routing for Improved Internet Routing Stability
1. Traffic-aware Inter-Domain Routing for Improved Internet Routing Stability
- Zhenhai Duan
- Florida State University
2. Outline
- Introduction and Background
- Motivation and Intuition
- Traffic-Aware Inter-Domain Routing (TIDR)
- Performance Studies
- Summary
3. Introduction and Background
- The Internet consists of a large number of network domains
- Or Autonomous Systems (ASes)
- Currently about 26K
- Exchange network prefix reachability information using BGP
- In a system this big, things happen all the time
- Fiber cuts, equipment outages, operator errors
- Direct consequence on the routing system
- Large number of BGP updates exchanged between ASes
- Re-computing/propagating best routes
- Events may be propagated through the entire Internet
- Effects on user-perceived network performance
- Long network delay, packet loss, even loss of network connectivity
4. Introduction and Background
- Implicit design assumption in BGP
- Failure events are of the same importance to all users
- No explicit mechanisms to localize failures in BGP
- Internet global reachability implies global propagation of failure
- Is this valid?
- A user (AS) in the US may not be interested in a failure in an Asian country
- Design of BGP failed to recognize two Internet properties
- Internet access non-uniformity
- Prevalence of transient failures
5. Motivation and Intuition
- Internet access non-uniformity
- ARPANET (1970, Kleinrock and Naylor)
- Top 12.6% responsible for 90% of traffic
- NSFNET (1980, Rekhter and Chinoy)
- Top 10% responsible for 85% of traffic
- Fang and Peterson (1999), and Rexford (2002)
- Non-uniform distribution nature of Internet traffic
- Model on network value (IEEE Spectrum, 2006)
- Zipf's law
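As a rough illustration of how skewed a Zipf-like distribution is, the sketch below computes the share of total traffic carried by the most popular destinations. The function name `top_share`, the exponent `s = 1.0`, and the population of 10,000 destinations are assumptions chosen for illustration; they are not values taken from the studies listed on this slide.

```python
def top_share(n: int, fraction: float, s: float = 1.0) -> float:
    """Share of total traffic carried by the top `fraction` of n destinations.

    Assumes (hypothetically) that the destination of rank r receives
    traffic proportional to 1 / r**s, i.e. a Zipf-like popularity law.
    """
    if n <= 0 or not 0.0 <= fraction <= 1.0:
        raise ValueError("need n > 0 and 0 <= fraction <= 1")
    weights = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    top_k = round(n * fraction)  # number of destinations in the top group
    return sum(weights[:top_k]) / sum(weights)


if __name__ == "__main__":
    for frac in (0.01, 0.10, 0.20):
        share = top_share(10_000, frac)
        print(f"top {frac:.0%} of destinations carry {share:.1%} of traffic")
```

Under these assumptions a small fraction of destinations carries most of the traffic, which is the intuition the slide relies on.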
6. Internet Access Non-Uniformity
- FSU Study
- Study whether Internet access locality holds from the viewpoint of an edge network
- Bidirectional data traffic collected at the border router at FSU for 16 days
7. FSU Data Traffic on Other Days
8. BGP Updates (RouteViews Project)
Most of the updates are from the rest of the prefixes
Only a few updates are related to the top prefixes at FSU
9. Motivation and Intuition
- Prevalence of transient failures
- Sprint backbone measurement (2002)
- BGP misconfigurations
- 50% of misconfigurations lasted less than 10 minutes
- 50% < 1 minute
- 80% < 10 minutes
- 90% < 20 minutes
The majority of network failures are transient
10. Motivation and Intuition
TIDR
Prevalence of Transient Failure: The majority of network failures on the Internet are transient
Internet Access Non-Uniformity: Users (networks) normally communicate with a small set of other network domains
11. Traffic-aware Inter-Domain Routing (TIDR)
- Prefixes classified as either significant or insignificant
- At AS v, with respect to neighbor n
- Treat propagation of significant and insignificant prefixes differently
- Propagating BGP updates of significant prefixes with high priority
- Aggressively slow down propagation of BGP updates of insignificant prefixes
- Localizing the effect of transient failures on insignificant prefixes
- Hold propagation of transient failures if a valid alternative route exists
- BGP withdrawals always propagated
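The classification and differentiated propagation described above can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the slides do not state how the significance threshold is chosen, so the coverage rule in `significant_prefixes`, the default of 0.9, and the names `significant_prefixes` and `propagation_action` are all assumptions.

```python
from typing import Dict, Set


def significant_prefixes(traffic_bytes: Dict[str, int], coverage: float = 0.9) -> Set[str]:
    """Smallest set of top prefixes whose traffic reaches `coverage` of the total.

    `traffic_bytes` maps a prefix to the traffic volume that AS v exchanges
    for it via neighbor n. The coverage rule is an assumption for illustration.
    """
    total = sum(traffic_bytes.values())
    if total == 0:
        return set()
    chosen: Set[str] = set()
    running = 0
    ranked = sorted(traffic_bytes.items(), key=lambda kv: kv[1], reverse=True)
    for prefix, volume in ranked:
        if running >= coverage * total:
            break
        chosen.add(prefix)
        running += volume
    return chosen


def propagation_action(prefix: str, is_withdrawal: bool,
                       has_valid_alternative: bool, significant: Set[str]) -> str:
    """Return "propagate" or "hold" for an update about `prefix`."""
    if is_withdrawal:
        return "propagate"   # withdrawals are always propagated
    if prefix in significant:
        return "propagate"   # significant prefixes get high priority
    if has_valid_alternative:
        return "hold"        # localize a possibly transient failure
    return "propagate"       # nothing to fall back on, so do not delay
```

For example, with volumes of 900, 60, and 40 bytes for three prefixes and `coverage=0.9`, only the first prefix is classified as significant; an announcement for either of the other two is held when a valid alternative route exists.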
12. TIDR Timers
[Figure: timeline at an AS showing the TIDR timer (10 min.) and the MRAI timer (15/30 sec.) relative to failure recovery]
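One way to picture the role of the long TIDR timer is a table of held updates that are released only if the failure outlasts the timer. The sketch below is a simplified assumption about how such a hold could be tracked; the class name `HeldUpdates` and its methods are hypothetical, and time is passed in explicitly so the logic can run inside a simulator.

```python
from typing import Dict, List

TIDR_TIMER = 600.0  # seconds (10 minutes)
MRAI_TIMER = 30.0   # seconds, shown only for comparison


class HeldUpdates:
    """Updates for insignificant prefixes waiting out the TIDR timer."""

    def __init__(self, hold_time: float = TIDR_TIMER) -> None:
        self.hold_time = hold_time
        self._expiry: Dict[str, float] = {}

    def hold(self, prefix: str, now: float) -> None:
        # Keep the earliest expiry if the prefix is already being held.
        self._expiry.setdefault(prefix, now + self.hold_time)

    def recovered(self, prefix: str) -> bool:
        """The original route came back: drop the held update.

        Returns True if an update was pending for this prefix.
        """
        return self._expiry.pop(prefix, None) is not None

    def due(self, now: float) -> List[str]:
        """Prefixes whose timer has expired; their updates should now be sent."""
        ready = sorted(p for p, t in self._expiry.items() if t <= now)
        for p in ready:
            del self._expiry[p]
        return ready
```

If the failure recovers before the timer expires, `recovered` removes the entry and no update leaves the node, which is how a transient failure stays local in this sketch.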
13. TIDR Design
- How to avoid traffic black-holes?
- If the alternative route held by the timer is invalid, the node becomes a black hole that drops all the packets it receives
- Utilizing Root Cause Information (RCI)
- Similar to EPIC and RCN
- Flush out all local invalid alternative routes
- The alternative route chosen can be guaranteed to be valid
- How to avoid slow propagation of long-term failures of insignificant prefixes?
- Every node will hold propagation of the BGP update, if not designed carefully
- Only one node will apply the TIDR timer to insignificant prefixes
- Nodes neighboring the failure
- First node to have a valid alternative route
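The idea of flushing invalid alternatives with root cause information can be sketched as a filter over AS paths. This is an illustration under stated assumptions: the root cause is modeled as a single failed inter-AS link, routes are plain tuples of AS numbers, and choosing the shortest surviving path is a simplification of BGP route selection. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

ASPath = Tuple[int, ...]
Link = Tuple[int, int]


def traverses(path: ASPath, failed_link: Link) -> bool:
    """True if `path` crosses `failed_link` in either direction."""
    a, b = failed_link
    return any({x, y} == {a, b} for x, y in zip(path, path[1:]))


def flush_invalid(alternatives: List[ASPath], failed_link: Link) -> List[ASPath]:
    """Drop every alternative route that depends on the failed link."""
    return [p for p in alternatives if not traverses(p, failed_link)]


def pick_valid_alternative(alternatives: List[ASPath], failed_link: Link) -> Optional[ASPath]:
    """Shortest alternative that avoids the failed link, or None if none survives."""
    valid = flush_invalid(alternatives, failed_link)
    return min(valid, key=len) if valid else None
```

When `pick_valid_alternative` returns `None`, the node has no route it can safely hold on, so delaying the update would risk the black-hole case described above.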
14. TIDR Algorithm
15. Performance Studies
- Used the simBGP simulator
- With both clique and Waxman random network topologies
- Simulated both link fail-down and fail-over events
- Only the dummy node announces prefixes
- 20% to be significant, 80% to be insignificant
- Link failure
- 20% to be long-term, 80% to be transient
- Settings
- Link delay chosen randomly from 0.01 to 0.1 seconds
- Processing delay chosen randomly from 0.001 to 0.01 seconds
- MRAI timer 30 seconds
- TIDR timer 10 minutes
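The settings above could be drawn for a simulation run roughly as follows. This sketch does not use the simBGP API; the `Scenario` dataclass and `sample_scenario` function are hypothetical, the delays are assumed to be uniformly distributed over the stated ranges, and a single delay value per scenario is a simplification.

```python
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    link_delay: float          # seconds
    processing_delay: float    # seconds
    prefix_significant: bool   # 20% significant, 80% insignificant
    failure_long_term: bool    # 20% long-term, 80% transient
    mrai_timer: float = 30.0   # seconds
    tidr_timer: float = 600.0  # seconds (10 minutes)


def sample_scenario(rng: random.Random) -> Scenario:
    """Draw one set of parameters; uniform sampling is an assumption."""
    return Scenario(
        link_delay=rng.uniform(0.01, 0.1),
        processing_delay=rng.uniform(0.001, 0.01),
        prefix_significant=rng.random() < 0.20,
        failure_long_term=rng.random() < 0.20,
    )


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    print(sample_scenario(rng))
```

Passing in a seeded `random.Random` keeps runs repeatable, which makes it easier to compare plain BGP and TIDR on the same sequence of failures.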
16. Fail-down Events
17. Fail-over Events
18. Summary and On-going Work
- TIDR: Traffic-aware Inter-Domain Routing
- Capitalizing on two important properties
- Internet access non-uniformity
- Prevalence of transient failures
- Differentiated BGP update propagation for significant and insignificant prefixes
- Propagating updates of significant prefixes with higher priority
- Aggressively slow down propagation of updates of insignificant prefixes
- Performed simulation studies
- Outperforms BGP and other existing enhancements