Title: Understanding the Large-Scale Dynamics of Internet Routing Protocols - The Global Internet: Measurement
1 Understanding the Large-Scale Dynamics of Internet Routing Protocols
The Global Internet: Measurement, Modeling and Analysis
Leiden, the Netherlands, September 2000
Craig Labovitz, Roger Wattenhofer, Srinivasan Venkatachary
Microsoft Research
labovit, rogerwar, cheenu_at_microsoft.com
Abha Ahuja, Farnam Jahanian, Abhijit Bose
University of Michigan
ahuja, farnam, abose_at_umich.edu
2 Motivation
- Large-scale, distributed protocols/systems often exhibit unexpected behaviors in deployment
- Self-synchronization
- BGP/TCP pathologies
- Global TCP flow synchronization
- Modeling routing behaviors is critical for improved end-to-end performance and reliability
- Can we model Internet routing dynamics?
- What are the properties of fault propagation (and recovery) of Internet paths?
3 Conventional Routing Wisdom (IETF, IAB, Books, ISPs, etc.)
- Internet routing is robust under faults
- Supports path re-routing and restoral on the order of seconds
- BGP has good convergence properties
- Does not exhibit looping/bouncing problems of RIP
- Internet fail-over will improve with faster routers and faster links
- More redundant connections (multi-homing) to Internet will always improve site fault-tolerance
- Internet topology/diameter of all paths small (< 3)
4 In This Talk
- We will show that most of the conventional wisdom about routing convergence is not accurate
- Measurement of BGP failures
- Measurement of BGP dynamics following failures
- Analysis/intuition behind delayed BGP routing convergence
- Impact of policy and topology on BGP convergence
5 Basic Methodology
- Deploy probe machines at IXPs around the world
- Write home-brewed Unix software tools to collect (and later, inject) BGP (and OSPF/ISIS/RIP) routing information from lots of commercial providers
6 Internet BGP Update Volume
- Withdraws in millions until 2/1998 due to withdraw looping/Cisco bug. Dramatic drop after IOS release
- Announcements growing after 6/98 due to MED policy and convergence?
7 Open Question
- After a fault in a path to a multi-homed site, how long does it take for the majority of Internet routers to fail-over to the secondary path?
- Routing table convergence (backbone routers reach steady-state) after a fault
- End-to-end paths stable (normal levels of loss and latency)
[Figure: Customer connected via BGP to a Primary ISP and via BGP to a Backup ISP]
8 Experiments
- Inject BGP faults (announcements/withdraws) of varied prefix and ASPath lengths into topologically and geographically diverse ISP peering sessions
- Monitor impact of faults through 1) recordings of default-free BGP peering sessions with 20 tier1/tier2 ISPs and 2) active ICMP measurements (512 byte/second to 100 random web sites)
- Wait two years (and 250,000 faults)
9 Fault Scenarios
- Tup -- A new route is advertised
- Tdown -- A route is withdrawn (i.e. single-homed failure)
- Tshort -- Advertise a shorter/better ASPath (i.e. primary path repaired)
- Tlong -- Advertise a longer/worse ASPath (i.e. primary path fails)
10 Major Convergence Results
- Routing convergence requires an order of magnitude longer than expected (10s of minutes)
- Routes converge more quickly following Tup/Repair than Tdown/Failure events (bad news travels more slowly)
- Curiously, withdrawals (Tdown) generate several times as many announcements as announcements (Tup) do
11 Example of BGP Convergence
TIME      BGP Message/Event
10:40:30  Route Fails/Withdrawn by AS2129
10:41:08  2117 announce 5696 2129
10:41:32  2117 announce 1 5696 2129
10:41:50  2117 announce 2041 3508 3508 4540 7037 1239 5696 2129
10:42:17  2117 announce 1 2041 3508 3508 4540 7037 1239 5696 2129
10:43:05  2117 announce 2041 3508 3508 4540 7037 1239 6113 5696 2129
10:43:35  2117 announce 1 2041 3508 3508 4540 7037 1239 6113 5696 2129
10:43:59  2117 sends withdraw
- BGP log of updates from AS2117 for route via AS2129
- One BGP withdrawal triggers 6 announcements and one withdrawal from 2117
- Increasing ASPath length until final withdraw
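The log on this slide can be summarized mechanically. The sketch below is illustrative only: the tuple layout and the function names are assumptions, and the timestamps are read as HHMMSS wall-clock times. It computes the time from the initial failure to the final withdraw, the number of announcements, and the ASPath length of each announcement.

```python
from datetime import datetime

# Update log transcribed from the slide. Timestamps are assumed to be HHMMSS.
# Each entry: (time, kind, ASPath as announced by AS2117).
LOG = [
    ("104030", "fail", []),  # route fails / withdrawn by AS2129
    ("104108", "announce", [5696, 2129]),
    ("104132", "announce", [1, 5696, 2129]),
    ("104150", "announce", [2041, 3508, 3508, 4540, 7037, 1239, 5696, 2129]),
    ("104217", "announce", [1, 2041, 3508, 3508, 4540, 7037, 1239, 5696, 2129]),
    ("104305", "announce", [2041, 3508, 3508, 4540, 7037, 1239, 6113, 5696, 2129]),
    ("104335", "announce", [1, 2041, 3508, 3508, 4540, 7037, 1239, 6113, 5696, 2129]),
    ("104359", "withdraw", []),
]


def seconds_between(start, end):
    """Elapsed seconds between two same-day timestamps (colons optional)."""
    fmt = "%H%M%S"
    t0 = datetime.strptime(start.replace(":", ""), fmt)
    t1 = datetime.strptime(end.replace(":", ""), fmt)
    return int((t1 - t0).total_seconds())


def summarize(log):
    """Summarize one fault: convergence time, announcement count, path lengths."""
    announced = [path for _, kind, path in log if kind == "announce"]
    return {
        "convergence_seconds": seconds_between(log[0][0], log[-1][0]),
        "announcements": len(announced),
        "path_lengths": [len(path) for path in announced],
    }


print(summarize(LOG))
```

For this log the summary reports 209 seconds from failure to final withdraw, with ASPath lengths that never decrease between announcements.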
12 CDF of BGP Routing Table Convergence Times
[Figure: CDF curves labeled New Route, Long->Short Fail-over, Short->Long Fail-over, Failure]
- Less than half of Tdown events converge within two minutes
- Tup/Tshort and Tdown/Tlong form equivalence classes
- Long tailed distribution (up to 15 minutes)
13 Impact of Delayed Convergence
- Why do we care about routing table convergence? It deleteriously impacts end-to-end Internet paths
- ICMP experiment results
- Loss of connectivity, packet loss, latency, and packet re-ordering for an average of 3-5 minutes after a fault
- Why? Routers drop packets for which they do not have a valid next hop. Also problems with cache flushing in some older routers.
14 End-to-End Impact Failover
- ICMP loss to 100 randomly chosen web sites with VIF source address of our probe
- Tlong/Tshort exhibit similar relationship as before
15 Delayed Convergence Background
- Well known that distance vector protocols exhibit poor convergence behaviors
- Counting to infinity, looping, bouncing problem
- RIP redefines infinity and adds split-horizon, poison reverse, etc.
- Still, slow convergence (N^3) and not scalable
- BGP advertises ASPaths instead of distance
- ASPath solves counting to infinity and RIP looping problem, but
- BGP can still explore invalid paths during convergence (i.e. the bouncing problem)
16 Problems with Distance Vector Protocols: Counting to Infinity
[Figure: diagram labeled A, B, R, R 5, R 7]
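Counting to infinity can be reproduced with two routers that trust each other's stale metrics. The sketch below is a minimal illustration, not a reconstruction of the figure: it assumes router A had a direct route to destination R (metric 1), router B reached R through A (metric 2), the A-R link has just failed, and neither router uses split horizon. The value 16 is RIP's conventional "unreachable" metric; the function name is hypothetical.

```python
INFINITY = 16  # RIP's conventional "unreachable" metric


def count_to_infinity(cost_a=1, cost_b=2, infinity=INFINITY):
    """Return (A metric, B metric) after each update once A's link to R fails.

    Assumes no split horizon: each router believes the other's last
    advertisement and adds one hop to it.
    """
    history = []
    while cost_a < infinity or cost_b < infinity:
        # A has lost its direct route and falls back to B's stale route.
        cost_a = min(cost_b + 1, infinity)
        history.append((cost_a, cost_b))
        # B still routes via A, whose metric has just grown.
        cost_b = min(cost_a + 1, infinity)
        history.append((cost_a, cost_b))
    return history


for step, (a, b) in enumerate(count_to_infinity(), start=1):
    print(f"step {step:2d}: A={a:2d} B={b:2d}")
```

Each exchange raises the metrics by one hop until both routers reach the infinity value, which is why RIP caps infinity at a small number.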
17 BGP Convergence Example
18 N > 4?
[Figure: AS topology; each AS is labeled with its ASPath to AS237]
AS6453: 6453 1239 5696 237
AS2497: 2497 5696 237
AS6113: 6113 2914 237
AS6461: 6461 5696 237
AS1239: 1239 5696 237
AS5696: 5696 237
AS2914: 2914 237
AS237: 237
AS701: 701 6461 5696 237
AS5000: 5000 237
AS1: 1 5696 237
AS1673: 1673 5696 237
19 Intuition for Delayed BGP Convergence
- There exists a possible ordering of messages such that BGP will explore ALL possible ASPaths of ALL possible lengths
- BGP is O(N!), where N is the number of default-free BGP speakers in a complete graph with default policy
- Although seemingly very different protocols, BGP and RIP share very similar convergence behaviors. Major difference:
- RIP explores metrics (1..N)
- BGP ASPath provides multiple ways to represent metric (path) of length N, or (N-1)!
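To make the (N-1)! count concrete, the sketch below enumerates every loop-free ASPath that ends at one destination AS in a complete graph. It is an illustration under assumptions: ASes are numbered 0..N-1, AS 0 is the destination, and the function name is hypothetical. It shows only how many candidate paths exist, not the order in which BGP would explore them.

```python
from itertools import permutations
from math import factorial


def aspaths_to_destination(n, dest=0):
    """Yield every loop-free ASPath ending at `dest` in a complete graph of n ASes."""
    others = [asn for asn in range(n) if asn != dest]
    for hops in range(1, len(others) + 1):
        # Any ordered selection of distinct ASes is a valid path in a complete graph.
        for prefix in permutations(others, hops):
            yield prefix + (dest,)


n = 4
paths = list(aspaths_to_destination(n))
full_length = [p for p in paths if len(p) == n]

print(len(paths))        # all loop-free paths to the destination
print(len(full_length))  # paths that visit every AS: (n-1)!
```

With N = 4 there are 15 loop-free paths in total, 6 of which have length N, matching (N-1)! = 3!. A distance-vector metric of N collapses all of those into a single value.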
20 BGP and RIP
- Both exhibit routing table loops
- Both learn invalid state from neighbors (based on incomplete knowledge) and propagate invalid state information to neighbors
- Both employ hold-downs
- RIP 30 second timer
- BGP MinRouteAdver
- Adds synchronization in best case
21 Lower Bound on BGP
- If we assume optimal ordering of messages, what is the best we can expect from BGP?
- In practice, BGP timers (MinRouteAdver) provide synchronization and limit possible orderings of messages
- MinRouteAdver timer specifies interval between successive updates sent to a peer for a given prefix
- Useful for bundling updates together
- According to RFC, MinRouteAdver applies only to announcements
- But, interaction of MinRouteAdver and vendor ASPath loop detection implementation introduces artificial delay
22 MinRouteAdver
- Minimum interval between successive updates sent to a peer for a given prefix
- Allow for greater efficiency/packing of updates
- Rate throttle
- Applied only to announcements (at least according to BGP RFC)
- Applied on (prefix destination, peer) basis, but implemented on (peer) basis
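The throttle described on this slide can be sketched as a small per-peer timer. This is a simplified illustration, not any vendor's implementation: the class and method names are hypothetical, the 30-second interval is the value used elsewhere in this talk, timer jitter is ignored, and the timer is kept per peer (as the slide says implementations do) with withdrawals bypassing it (as the slide says the RFC intends).

```python
MIN_ROUTE_ADVER = 30.0  # seconds; value taken from the talk, jitter ignored


class PeerThrottle:
    """Per-peer MinRouteAdver sketch: announcements wait, withdrawals do not."""

    def __init__(self, interval=MIN_ROUTE_ADVER):
        self.interval = interval
        self.last_sent = {}  # peer -> time the last announcement was sent
        self.pending = {}    # peer -> {prefix: aspath} held back by the timer

    def submit(self, now, peer, prefix, aspath=None):
        """Return the updates sent immediately; aspath=None means withdraw."""
        if aspath is None:
            # A withdrawal supersedes any held announcement and is not throttled.
            self.pending.get(peer, {}).pop(prefix, None)
            return [(peer, prefix, None)]
        last = self.last_sent.get(peer)
        if last is None or now - last >= self.interval:
            self.last_sent[peer] = now
            return [(peer, prefix, aspath)]
        # Timer still running: keep only the newest path for this prefix.
        self.pending.setdefault(peer, {})[prefix] = aspath
        return []

    def flush(self, now, peer):
        """Send held announcements once the peer's timer has expired."""
        last = self.last_sent.get(peer)
        held = self.pending.get(peer, {})
        if not held or (last is not None and now - last < self.interval):
            return []
        self.last_sent[peer] = now
        self.pending[peer] = {}
        return [(peer, prefix, path) for prefix, path in held.items()]


throttle = PeerThrottle()
print(throttle.submit(0, "peer", "192.0.2.0/24", [1, 2]))        # sent at once
print(throttle.submit(5, "peer", "192.0.2.0/24", [3, 1, 2]))     # held
print(throttle.submit(10, "peer", "192.0.2.0/24", [4, 3, 1, 2])) # replaces held path
print(throttle.flush(30, "peer"))                                # newest path only
```

Only the newest held path is sent when the timer expires, which is the bundling benefit the slide mentions; the cost is that each round of new information waits up to one interval.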
23 MinRouteAdver
- 30(N-3) delay due to creation of mutual dependencies. Provide proof that N-3 rounds necessarily created during bounded BGP MinRouteAdver convergence
- Rounds due to:
- Ambiguity in the BGP RFC and lack of sender-side loop detection
- Inclusion of BGP withdrawals with MinRouteAdver (in violation of RFC)
24 MinRouteAdver Rounds
- Implementation of MinRouteAdver timer and receiver-side loop detection timer leads to 30 second rounds: O(n-3) * 30 seconds time complexity
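As a worked instance of the round count, the sketch below evaluates (n-3) rounds of 30 seconds for a few values of n. It treats the stated complexity as an exact figure purely for illustration; the function name and the choice of sample sizes are assumptions.

```python
def minrouteadver_delay(n, round_seconds=30):
    """Delay in seconds for (n - 3) MinRouteAdver rounds among n BGP speakers."""
    if n < 3:
        raise ValueError("the (n - 3) round count is only meaningful for n >= 3")
    return round_seconds * (n - 3)


for n in (3, 5, 7, 10):
    print(n, minrouteadver_delay(n))
```

Under these assumptions seven speakers already imply two minutes of delay, and the delay grows linearly with each additional speaker.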
25 Impact of Policy and Topology
- In practice, Internet is not a complete graph and ASes maintain complex routing policies
- Given ISP policies and an Internet topology for a route, can we estimate the time required for convergence?
- Most analysis of Internet topology is based on steady-state or low frequency snapshots
- How does steady-state topology compare to set of all possible paths?
26 Comparing ISP Convergence Latencies
- CDF of faults injected into three Mae-West providers and observed at Japanese ISP
- Significant variations between providers
27 Observed Fault Injection Topologies
[Figure: topology diagram labeled ISP 4 and MAE-WEST]
- In steady-state, topologies between ISP1, ISP2, ISP3 are similar: all are direct BGP peers of ISP4. Does not explain variation
- Most studies report steady-state diameter of the Internet relatively small (< 3 AS)
28 Factors Impacting BGP Propagation
- Each AS router adds between 0-45 MinRouteAdver Delay
- IBGP
- MinRouteAdver race conditions
29 ISP1-ISP4 Paths During Failure
[Figure: paths between ISP 1 (R1) and ISP 4, showing Steady State and FAULT]
- Only one back up path (length 3)
30 ISP2-ISP4 Paths During Failure
[Figure: paths between ISP 2 (R2) and ISP 4, showing Steady State and FAULT]
31 ISP3-ISP4 Paths During Failure
[Figure: paths between ISP 3 (R3) and ISP 4, showing Steady State and FAULT]
32 Relationship Between Backup Paths and Convergence
- Convergence related to length of longest possible backup ASPath between two nodes
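The longest possible backup path between two nodes can be found on a small topology by exhaustive search. The sketch below is a hypothetical illustration: the graph, the node names, and the function name are invented for the example and are not the measured ISP topologies. It removes the failed steady-state link and returns the longest loop-free path that remains.

```python
def longest_backup_path(graph, src, dst, failed_edge=None):
    """Return the longest loop-free path from src to dst, skipping failed_edge.

    Exhaustive depth-first search; only practical for small graphs.
    """
    best = []

    def dfs(node, path):
        nonlocal best
        if node == dst:
            if len(path) > len(best):
                best = list(path)
            return
        for nxt in graph[node]:
            if nxt in path:
                continue  # ASPath loop detection: never revisit a node
            if failed_edge and {node, nxt} == set(failed_edge):
                continue  # the failed link is unusable
            path.append(nxt)
            dfs(nxt, path)
            path.pop()

    dfs(src, [src])
    return best


# Hypothetical topology: src and dst peer directly; x and y provide backups.
GRAPH = {
    "src": ["dst", "x"],
    "x": ["src", "dst", "y"],
    "y": ["x", "dst"],
    "dst": ["src", "x", "y"],
}

print(longest_backup_path(GRAPH, "src", "dst", failed_edge=("src", "dst")))
```

In this example the shortest backup is src-x-dst, but the longest is src-x-y-dst, and it is the longer one that the slide relates to convergence.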
33 Conclusion and Next Steps
- Internet does not possess effective inter-domain fail-over (15 minutes is a long time for a phone call)
- Majority of BGP convergence delay due to vendor implementation decisions of MinRouteAdver and loop detection
- In practice, Internet is not a complete graph and same degree of message re-ordering unlikely. Our current work:
- What is the impact of ISP policy and topology on BGP convergence?
- Can we improve BGP convergence times?
34 MTTF of Backbone Networks
- Informally: How long before a network is unreachable?
- Majority of Internet routes unreachable within 30 days
35 Mean Time to Fail-Over
- How long before traffic is re-routed?
- Majority of Internet routes which possess backup paths fail-over every 3 days
36 Internet Route Repair
- How long before a network is reachable again?
- Long-tailed distribution with plateau at 30 minutes. Why this plateau?
37 Simulation Results