Title: The Impact of Policy and Topology on Internet Routing Convergence NANOG 20 October 23, 2000
1 The Impact of Policy and Topology on Internet Routing Convergence
NANOG 20, October 23, 2000
Abha Ahuja InterNap ahuja_at_umich.edu
Craig Labovitz Microsoft Research labovit_at_microsoft.com
In collaboration with Roger Wattenhofer,
Srinivasan Venkatachary, Madan Musuvathi
2 Background
- In NANOG 19, we showed BGP exhibits poor convergence behavior
- Measured convergence times of up to 20 minutes for BGP path changes/failures
- Factorial (N!) theoretic upper bound on BGP convergence complexity (explore all paths of all possible lengths)
- Open question: In practice, what topological and policy factors impact convergence delay?
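To give a feel for the factorial bound, the sketch below counts the loop-free paths between two fixed nodes in a full mesh of n nodes. The full-mesh assumption and the function name `simple_paths_full_mesh` are illustrative choices, not something taken from the measurements.

```python
from math import factorial


def simple_paths_full_mesh(n: int) -> int:
    """Count loop-free paths between two fixed nodes in a full mesh of n nodes.

    A path may pass through any ordered selection of k of the other
    n - 2 nodes, for k = 0 .. n - 2, so the count is the sum of
    (n - 2)! / (n - 2 - k)! over all k.
    """
    if n < 2:
        raise ValueError("need at least a source and a destination")
    m = n - 2  # nodes available as intermediate hops
    return sum(factorial(m) // factorial(m - k) for k in range(m + 1))


if __name__ == "__main__":
    for n in (4, 5, 10, 15):
        print(n, simple_paths_full_mesh(n))
```

The count grows roughly as e * (n - 2)!, which is why exploring all paths of all possible lengths is impractical even for small meshes.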
3 This Talk
- Goal: Understand BGP convergence behavior under real topologies/policies
- Given a physical topology and ISP policies, can we estimate the time required for convergence?
- Do convergence behaviors of ISPs differ?
- How does steady-state topology compare to paths explored during failure?
- Can we change policies/topology to improve BGP convergence times?
4 Experiments
- Analyzed secondary paths between 20 source/destination AS pairs
- Inject and monitor BGP faults
- Survey providers to determine policies behind paths
- To provide intuition, we will focus on faults injected into three ISPs at Mae-West
- Observed faults via fourth ISP (in Japan)
- Three ISPs roughly map onto tier1, tier2, tier3 providers
- Results from these three ISPs are representative of all data
5 Comparing ISP Convergence Latencies
- CDF of faults injected into three Mae-West providers and observed at Japanese ISP
- Significant variations between providers
- Not related to geography
6 Observed Fault Injection Topologies
[Figure labels: ISP 4, MAE-WEST]
- In steady state, the topologies between ISP1, ISP2, and ISP3 are similar: all are direct BGP peers of ISP4. This does not explain the variation on the previous slide.
7 Factors Impacting BGP Propagation
- Topology and policy impact graph (usually DAG)
- Each AS router adds between 0 and 45 seconds of MinRouteAdver delay
- iBGP/Route Reflector
- MinRouteAdver and path race conditions affect which routes are chosen as backup routes
[Figure labels: iBGP; D, C, B, A]
8 ISP1-ISP4 Paths During Failure
[Figure labels: ISP 4, Steady State, FAULT, R1, ISP 1]
- Only one backup path (length 3)
9 ISP2-ISP4 Paths During Failure
[Figure labels: ISP 4, Steady State, FAULT, R2, ISP 2]
10 ISP3-ISP4 Paths During Failure
[Figure labels: ISP 4, Steady State, FAULT, R3, ISP 3]
11 Why the Different Levels of Complexity?
- Provider relationship taxonomy
- Transit relationships
- customer/provider
- customer sends their customer routes
- provider sends default-free routing info (or default)
- Peer relationships
- Bilateral exchange of customer routes
- Back-up transit
- peer relationship becomes transit relationship based on failure
- These relationships constrain topology (no N! states) and determine number of possible backup paths
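The transit and peer rules above can be sketched as a simple export filter. This is a minimal illustration under assumptions: the names `Rel` and `should_export` are hypothetical, back-up transit is not modeled, and real router policy is expressed in vendor configuration rather than in code like this.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship of a neighbor, seen from the local AS."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


def should_export(learned_from: Rel, send_to: Rel) -> bool:
    """Decide whether a route learned from one neighbor is announced to another.

    Customer routes are announced to everyone (providers and peers
    receive customer routes). Routes learned from peers or providers
    are announced only to customers (customers receive full routing
    information from their provider).
    """
    if learned_from is Rel.CUSTOMER:
        return True
    return send_to is Rel.CUSTOMER


if __name__ == "__main__":
    for src in Rel:
        for dst in Rel:
            print(f"{src.value:>8} -> {dst.value:<8} {should_export(src, dst)}")
```

Under these rules a route never travels from one peer or provider to another peer or provider, which is one way to see why the relationships prune the set of candidate backup paths.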
12 Convergence in the Real World
[Figure labels: nodes 1, 2, 3, 4, 5, X; edge types customer and peer]
Longest path: 3 4 5 2 1
Possible paths for node 3: 2 1 x, 4 2 1 x, (4 5 2 1 x)
Possible paths for node 4: 2 1 x, 3 2 1 x, 5 2 1 x
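The path lists on this slide can be reproduced by enumerating loop-free paths in a small graph. The edge list below is an assumption inferred from the listed paths (the original figure is not available), and the helper names are hypothetical; policy is ignored here, so every adjacency is usable.

```python
from collections import defaultdict

# Assumed undirected adjacencies, inferred from the paths on the slide.
EDGES = [("x", "1"), ("1", "2"), ("2", "3"), ("2", "4"),
         ("2", "5"), ("3", "4"), ("4", "5")]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of node pairs."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def simple_paths(adj, src, dst):
    """Return all loop-free paths from src to dst, shortest first."""
    found = []

    def walk(node, path):
        if node == dst:
            found.append(tuple(path))
            return
        for nxt in adj[node]:
            if nxt not in path:  # never revisit a node (no AS-path loops)
                walk(nxt, path + [nxt])

    walk(src, [src])
    return sorted(found, key=lambda p: (len(p), p))


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    for node in ("3", "4"):
        for path in simple_paths(adj, node, "x"):
            print(" ".join(path))
```

With this assumed edge list, node 3 and node 4 each have three candidate paths to x, and the longest one starts at node 3 and runs through 4, 5, 2, and 1.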
13 Convergence in the Real World
Hierarchy eliminates some states
[Figure labels: nodes 1, 2, 3, 4, 5, X; edge types customer and peer; "Tier 1?"]
Longest path: 3 4 5 2 1
Possible paths for node 3: 2 1 x, 4 5 2 1 x
Possible paths for node 4: 3 2 1 x, 5 2 1 x
14 Policy and Convergence
- Strict hierarchical relationships eliminate exploring some extra states
- Policy controls the number of possible paths to explore
- But it turns out the number of paths does not matter
15 Relationship Between Backup Paths and Convergence
[Figure: axis label "Longest Observed ASPath Between AS Pair"]
- Convergence is related to the length of the longest possible backup ASPath between two nodes
16 So, what does all of this mean for convergence time?
- Convergence time is related to the length of the longest path that needs to be explored
- Before fail-over, need to withdraw all alternative paths
- This is bounded, O(n), by the length of the longest alternative path in the system
- This longest path is related to policy
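As a rough back-of-envelope illustration, the per-AS MinRouteAdver delay of 0 to 45 seconds quoted earlier in the talk can be multiplied by the length of the longest backup path. Treating the delays as simply additive per hop is an assumption made for this sketch, and the function name is hypothetical; measured convergence times depend on timer state and race conditions.

```python
from typing import Tuple

# Per-AS MinRouteAdver delay range in seconds, as quoted in the talk.
MIN_HOP_DELAY_S = 0
MAX_HOP_DELAY_S = 45


def convergence_delay_range(longest_backup_path_len: int) -> Tuple[int, int]:
    """Return (best, worst) delay in seconds, assuming one MinRouteAdver
    delay per AS hop along the longest backup path."""
    if longest_backup_path_len < 0:
        raise ValueError("path length must be non-negative")
    return (longest_backup_path_len * MIN_HOP_DELAY_S,
            longest_backup_path_len * MAX_HOP_DELAY_S)


if __name__ == "__main__":
    # Example: a single backup path of length 3, as in the ISP1-ISP4 case.
    print(convergence_delay_range(3))
```

The estimate scales linearly with path length, which matches the O(n) bound stated on the slide.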
17 Towards Millisecond BGP Convergence
- Three possible solutions
- Entirely new protocol
- Turn off MinRouteAdver timer
- Tag BGP updates
- Provide hint so nodes can detect bogus state information
18 Further Information
C. Labovitz, R. Wattenhofer, A. Ahuja, S. Venkatachary, The Impact of Topology and Policy on Delayed Internet Routing Convergence. MSR Technical Report (number pending). June, 2000.
C. Labovitz, A. Ahuja, A. Bose, F. Jahanian, Internet Delayed Routing Convergence. To appear in Proceedings of ACM SIGCOMM. August, 2000.
Send email to ipma-support_at_merit.edu for more information or to participate in the policy survey.