Title: On Understanding of Transient Interdomain Routing Failures
1On Understanding of Transient Interdomain Routing
Failures
- Feng Wang, Lixin Gao, Jia Wang, and Jian Qiu
Department of Electrical and Computer Engineering
University of Massachusetts, Amherst MA 01002
AT&T Labs-Research, 180 Park Ave, Florham Park, NJ 07869
2Outline
- What are transient routing failures?
- When can transient routing failures occur?
- How long can transient routing failures last?
- Measurement results
3Internet Routing
- Autonomous systems (ASes)
- Internet Service Providers (ISPs)
- Companies
- Universities
- Intradomain Routing Protocols
- Static Routing, OSPF, IS-IS
- Interdomain Routing Protocol
- Border Gateway Protocol (BGP)
4Long Convergence Delay
- Long convergence delay (Labovitz et al., TON 2001)
- Bringing a route back
- (Tup) < shortest path length × MRAI
- Disconnecting a route
- (Tdown) < longest path length × MRAI
- Fail-over rerouting from Path A to Path B
- During the time for discovering Path B, routers
might experience transient routing failures,
i.e., no route is available
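As a rough illustration of the convergence bounds above, the sketch below reads each bound as a path length multiplied by the MRAI timer value. The function name `convergence_bounds` and its parameters are hypothetical, and the 30-second default is the eBGP MRAI default; it is plain arithmetic, not a model of BGP itself.

```python
def convergence_bounds(shortest_path_len, longest_path_len, mrai_seconds=30.0):
    """Return (t_up_bound, t_down_bound) in seconds.

    Assumption: each bound is read as (path length in AS hops) x MRAI.
    """
    if shortest_path_len < 0 or longest_path_len < shortest_path_len:
        raise ValueError("expected 0 <= shortest_path_len <= longest_path_len")
    t_up_bound = shortest_path_len * mrai_seconds    # bringing a route back
    t_down_bound = longest_path_len * mrai_seconds   # disconnecting a route
    return t_up_bound, t_down_bound


# Hypothetical topology: shortest path of 3 AS hops, longest of 7.
t_up, t_down = convergence_bounds(3, 7)
print(f"Tup bound: {t_up} s, Tdown bound: {t_down} s")
```

The gap between the two bounds suggests why withdrawing a route can take much longer to converge than restoring one.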
5An Example of Transient Routing Failure
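[Figure: example topology with AS0, AS1, AS2, AS3 and destination d. BGP routing tables hold paths (1 2 0), (1 0), (2 0), (2 1 0); BGP updates shown are withdrawals W (2 0) and announcements A (1 0); traffic on the data plane loses reachability during fail-over.]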
6Our Contributions
- Identify transient routing failures
- Sufficient conditions
- Bound transient routing failure duration
7Outline
- What are transient routing failures?
- When can transient routing failures occur?
- How long can transient routing failures last?
- Measurement results
8When Can Transient Routing Failures Occur?
- Two sufficient conditions under which a node must experience a transient routing failure (a transient routing failure for sure).
- One sufficient condition under which a node may experience a transient routing failure (a potential transient routing failure).
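[Figure: example topology with nodes 0, 1, 2, and 3. Routing-table paths shown: (1 0), (3 1 0), (2 1 0), (2 0); withdrawal messages are marked w.]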
9When Can Transient Routing Failures Occur? (contd.)
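[Figure: example topology with nodes 0, 1, 2, and 3. Routing-table paths shown: (3 1 0), (3 2 0), (1 0), (2 1 0), (2 0); withdrawal messages are marked w and an announcement is marked A.]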
10Outline
- What are transient routing failures?
- When can transient routing failures occur?
- How long can transient routing failures last?
- Measurement results
11How Long Can Transient Routing Failures Last?
[Figure: nodes 0, 1, 2 and destination d. Routing-table paths shown: (1 2 0), (1 0), (2 1 0); withdrawals W (2 0) and announcements A (1 0) are delayed by MRAI timers.]
12MRAI Timers
- Minimum Route Advertisement Interval (MRAI) timer
- Minimum amount of time that must elapse between routing updates
- Applied to BGP announcements or withdrawals
- Default MRAI value
- eBGP session: 30 seconds
- iBGP session: 5 seconds
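The rate-limiting effect of the MRAI timer can be sketched as follows. This is a minimal illustration, not a router implementation: the class `MraiPeer` and its methods are hypothetical, time is passed in explicitly instead of read from a clock, the timer is kept per peer (not per destination), and it is assumed to apply to both announcements and withdrawals.

```python
class MraiPeer:
    """Toy model of one BGP session whose outgoing updates obey an MRAI timer."""

    def __init__(self, mrai_seconds):
        self.mrai = mrai_seconds
        self.last_sent = None   # time the last update was sent, if any
        self.pending = None     # most recent update still held back
        self.sent = []          # log of (time, update) actually sent

    def submit(self, now, update):
        """Offer an update; a newer update replaces one still waiting."""
        self.pending = update
        self.flush(now)

    def flush(self, now):
        """Send the pending update if the MRAI interval has elapsed."""
        if self.pending is None:
            return
        if self.last_sent is None or now - self.last_sent >= self.mrai:
            self.sent.append((now, self.pending))
            self.last_sent = now
            self.pending = None


peer = MraiPeer(mrai_seconds=30)   # eBGP default
peer.submit(0, "A 1 0")            # sent at once: no timer running
peer.submit(5, "W 1 0")            # held back by the timer
peer.submit(12, "A 2 0")           # replaces the held update
peer.flush(30)                     # timer expires, latest update goes out
print(peer.sent)
```

In this sketch the neighbor hears nothing between t=0 and t=30, which is the kind of delay that lets a transient failure persist.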
13Upper Bound for Transient Routing Failure Duration
- Transient routing failure duration ≤ min(d_u?, d_?u) × MRAI
[Figure: nodes u, v, and destination 0, with the distances d_u? and d_?u marked.]
14Transient Failures in a Typical BGP System
- A typical BGP system means that every router in the system applies common routing policies.
- Routing policies are guided by commercial relationships between ASes.
- Customer-to-provider
- Peer-to-peer
- Common routing policies
- Import policies follow the prefer-customer routing policy.
- Export policies follow the no-valley routing policy.
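The two common policies above can be sketched in a few lines. This is a simplified illustration with hypothetical names (`best_route`, `may_export`): each route is tagged with the relationship of the neighbor it was learned from, and the shorter-AS-path tie-break is an assumption of this sketch.

```python
# Lower value means more preferred under the prefer-customer import policy.
PREFERENCE = {"customer": 0, "peer": 1, "provider": 2}


def best_route(routes):
    """Pick the best route from a list of (learned_from, as_path) pairs.

    Prefer customer routes over peer routes over provider routes;
    break ties by shorter AS path (an assumption of this sketch).
    """
    return min(routes, key=lambda r: (PREFERENCE[r[0]], len(r[1])))


def may_export(learned_from, export_to):
    """No-valley export rule.

    Routes learned from a customer may be exported to any neighbor;
    routes learned from a peer or provider may go only to customers.
    """
    return learned_from == "customer" or export_to == "customer"


candidates = [("provider", [2, 0]), ("customer", [3, 1, 0])]
print(best_route(candidates))            # the customer route wins despite being longer
print(may_export("peer", "provider"))    # False: would create a valley
print(may_export("customer", "peer"))    # True
```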
15Occurrence of Transient Failures in a Typical BGP System
- In a typical BGP system, transient failures are prevalent.
- Tier-1 ASes can experience transient routing failures, where alternate routes come from their edge routers.
- Non-tier-1 ASes can experience transient routing failures, where alternate routes are obtained from other ASes.
16Outline
- What are transient routing failures?
- When can transient routing failures occur?
- How long can transient routing failures last?
- Measurement results
17Measuring Transient Failures within a tier-1 AS
- BGP updates, BGP tables, and router configuration files were collected during July 2004
[Figure: Cumulative distribution of transient failure duration]
[Figure: Percentage of transient failures among all routing failures that last less than 30 seconds]
18Measuring Transient Failures (contd.)
- Transient failures in tier-2 ASes using Oregon
RouteViews BGP updates (July 2004)
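One simple way to look for candidate failures in a stream of BGP updates is to pair each withdrawal of a prefix with the next announcement of that prefix. The sketch below is a simplification and not the measurement methodology used here: the function `find_outages`, the event tuple layout, the example prefixes, and the 30-second cutoff in the usage lines are all assumptions for illustration.

```python
def find_outages(events):
    """Pair withdrawals with the next announcement of the same prefix.

    events: iterable of (timestamp, prefix, kind), sorted by timestamp,
            where kind is "A" (announcement) or "W" (withdrawal).
    Returns a list of (prefix, start_time, duration).
    """
    withdrawn_at = {}
    outages = []
    for timestamp, prefix, kind in events:
        if kind == "W":
            # Keep the first withdrawal if several arrive in a row.
            withdrawn_at.setdefault(prefix, timestamp)
        elif kind == "A":
            start = withdrawn_at.pop(prefix, None)
            if start is not None:
                outages.append((prefix, start, timestamp - start))
    return outages


# Hypothetical update stream (times in seconds, documentation prefixes).
updates = [
    (0, "192.0.2.0/24", "A"),
    (100, "192.0.2.0/24", "W"),
    (112, "192.0.2.0/24", "A"),
    (200, "198.51.100.0/24", "W"),
    (205, "198.51.100.0/24", "W"),
    (260, "198.51.100.0/24", "A"),
]
outages = find_outages(updates)
short = [o for o in outages if o[2] < 30]   # cutoff chosen for illustration
print(outages)
print(short)
```

A withdrawal seen at a single vantage point does not by itself show that the destination stayed reachable elsewhere, so results from a sketch like this would need to be cross-checked against other data.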
19Popularity of Prefixes Experiencing Transient
Failures
- We aggregate the Netflow data collected in the tier-1 AS during the week of 1/2/2005 to 1/8/2005
- Transient routing failures can affect both popular and unpopular prefixes
[Figure: Fraction of transient routing failures]
20Conclusions
- Transient routing failures are prevalent in the Internet and can last for a significant period of time.
- The majority of transient failures occur under commonly applied routing policies.
- Both popular and unpopular prefixes can experience transient failures.
21Thanks