Title: Routing Convergence Delay
1Routing Convergence Delay
- Prof. Gao
- ECE697J Spring 2005
- Advanced Computer Networks
2Project Proposal
- Content
- Motivation of project
- Proposed approach: analytical, experimental,
measurement, simulation, survey
- Team of at most 2 students
- Please feel free to talk with me
- Due date: Friday, March 11, 11pm
- Send by email to lgao_at_ecs.umass.edu
3Routing Convergence
- Many network changes
- Equipment failures, or new deployment
- Router configuration changes
- Planned maintenance on the network
- Control plane adapts to changes
- Detect the change
- Propagate routing messages
- Compute new routes
- Update the forwarding tables
4What could happen during routing convergence?
- Inconsistent routing state
- Asynchronous propagation of route changes
- Distributed route computation and FIB update
- Effect on data forwarding plane
- Drop packets
- Long packet delay
- Forwarding loops
- Out of order packets
- Reduce routing convergence delay!
- Worst-case convergence delay: infinite
- Impact on packet forwarding
- Key to supporting multi-service traffic, e.g., VoIP
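To illustrate how inconsistent routing state turns into a forwarding loop, here is a minimal sketch. It assumes a hypothetical three-router network in which A reaches C directly, B reaches C through A, and the A-C link fails. The router names, the FIB dictionaries, and the `trace` helper are illustrative assumptions, not part of the slides.

```python
def trace(fibs, src, dst):
    """Follow next hops from src toward dst and report what happens to a packet."""
    path = [src]
    node = src
    while node != dst:
        next_hop = fibs[node].get(dst)
        if next_hop is None:
            return path, "dropped"        # no route: packet is dropped
        if next_hop in path:
            return path + [next_hop], "loop"  # revisiting a router: forwarding loop
        path.append(next_hop)
        node = next_hop
    return path, "delivered"

# FIBs map destination -> next hop (hypothetical topology).
before = {"A": {"C": "C"}, "B": {"C": "A"}, "C": {}}
# Link A-C fails. A has already switched to B, but B still points at A.
during = {"A": {"C": "B"}, "B": {"C": "A"}, "C": {}}
# B has now updated its FIB too and uses its own link to C.
after = {"A": {"C": "B"}, "B": {"C": "C"}, "C": {}}

for name, fibs in [("before", before), ("during", during), ("after", after)]:
    print(name, trace(fibs, "A", "C"))
```

The loop exists only while the two FIBs disagree, which is why the asynchronous FIB updates listed above matter for the forwarding plane.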
5Convergence Delay
- Control plane convergence delay
- Time until the routing state has converged
- Forwarding plane convergence delay
- Time until the forwarding state has converged
6Intradomain routing protocol convergence delay
7Reduce Intradomain Rerouting Delay
- Replace intradomain routing with something else
- e.g., MPLS Fast Failure Recovery
- Figure out what's wrong with intradomain routing
and fix it
8Link-State Routing Protocol Convergence Delay
- Detect local topology changes (e.g. link up/down)
- Flood Link State Advertisements (LSAs)
- SPF Calculation
- each router calculates a single-source shortest
path tree
- Update Forwarding Information Base (FIB)
- each router uses the tree to build its FIB, which
governs packet forwarding
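The SPF calculation and FIB update steps can be sketched as follows. This is a minimal illustration using Dijkstra's algorithm with a binary heap, assuming a hypothetical four-router topology. The names `spf`, `build_fib`, and `without_link` are made up for this example and do not correspond to any router implementation.

```python
import heapq

def spf(graph, root):
    """Dijkstra from root. Returns {destination: (cost, first_hop)}."""
    dist = {root: 0}
    first_hop = {}
    done = set()
    heap = [(0, root, None)]
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if hop is not None:
            first_hop[node] = hop
        for nbr, weight in graph[node].items():
            new_cost = cost + weight
            if nbr not in done and new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                # Neighbors of the root are their own first hop.
                heapq.heappush(heap, (new_cost, nbr, nbr if node == root else hop))
    return {dest: (dist[dest], first_hop[dest]) for dest in first_hop}

def build_fib(graph, root):
    """FIB: destination -> next hop, derived from the shortest path tree."""
    return {dest: hop for dest, (_, hop) in spf(graph, root).items()}

def without_link(graph, u, v):
    """Copy of the topology with the u-v link removed (a link-down event)."""
    copy = {n: dict(nbrs) for n, nbrs in graph.items()}
    copy[u].pop(v, None)
    copy[v].pop(u, None)
    return copy

# Hypothetical topology with symmetric link costs.
topology = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1},
    "C": {"A": 5, "B": 1, "D": 1},
    "D": {"C": 1},
}

print(build_fib(topology, "A"))                          # all traffic via B
print(build_fib(without_link(topology, "A", "B"), "A"))  # all traffic via C
```

Rerunning `build_fib` after the link-down event is the from-scratch recomputation; every next hop at A changes, so every FIB entry has to be rewritten.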
9Router Model
[Figure: router model. A route processor (CPU) runs the OSPF process: topology view, LSA processing, LSA flooding, SPF calculation, and FIB update. The FIB drives forwarding on the interface cards, which are connected by a switching fabric.]
10Detecting a Link is Dead
- Periodic hello packets (hello_interval, default
is 10 sec, greater than 1 sec)
- Timeout if not received (dead_interval, 40 sec)
- Declare failure and flood the info to others
- Small values lead to faster detection, but also
- Higher bandwidth consumption for hellos
- False detection during congestion interval
- False detection if router CPU falls a little
behind
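A small sketch of the hello/dead-interval mechanism shows why the timers dominate detection time. It assumes hellos arrive exactly on schedule with no jitter or loss. The class `HelloMonitor` and the function `detection_delay_bounds` are hypothetical names for this example.

```python
class HelloMonitor:
    """Tracks one neighbor and declares it dead after dead_interval of silence."""

    def __init__(self, dead_interval):
        self.dead_interval = dead_interval
        self.last_hello = None

    def on_hello(self, now):
        # Every received hello restarts the dead timer.
        self.last_hello = now

    def is_dead(self, now):
        if self.last_hello is None:
            return False  # neighbor never seen, nothing to time out
        return now - self.last_hello >= self.dead_interval


def detection_delay_bounds(hello_interval, dead_interval):
    """(best, worst) delay between the failure and its detection.

    Best case: the link fails just before the next hello was due.
    Worst case: the link fails just after a hello was received.
    """
    return dead_interval - hello_interval, dead_interval


print(detection_delay_bounds(10, 40))  # defaults from the slide, in seconds

monitor = HelloMonitor(dead_interval=40)
monitor.on_hello(0)
monitor.on_hello(10)      # last hello before the link fails
print(monitor.is_dead(49), monitor.is_dead(50))
```

With the default values the failure goes unnoticed for 30 to 40 seconds, far longer than any of the later processing steps.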
11Knowing the Link is Dead: Interface Support
- Smart interface hardware
- Detects loss of connectivity at lower layer
- Interrupts the router CPU about the failure
- Common in Packet Over SONET technology
- Detect in less than 100 msec
- But
- Some media don't support it (e.g., Ethernet, ATM)
- so, you often need hello messages anyway
12LSA Propagation
- A link state packet is generated at the point of
detection, then flooded, unmodified, through the
network.
- It should propagate at near the speed of light
plus one store-and-forward delay per hop.
- So in theory LSP propagation should make a
negligible contribution to the re-route time.
- Theory doesn't often resemble reality...
13LSA Propagation Explanation
- Pacing LSA propagation to combine LSAs
- In some implementations, SPF calculation is done
before LSAs are flooded
- To prevent this,
- Spec might be amended to explicitly state that
LSP flooding is higher priority than SPF
calculation
- SPF computation time becomes more important
14Reducing the SPF Computational Overhead
- Important if SPF and LSA flooding run in series
- Good system
- Fast processor
- High-speed memory
- Good algorithms
- Traditional approach computes from scratch
- Improved from O(n^2) to O(n log n)
- Incremental algorithms compute only the changes
- O(log n) instead of O(n log n)
- Pre-computation
- Pre-compute effects of certain failure scenarios
- E.g., all single-link or single-router failures
15Updating the Forwarding Table
- Forwarding table
- Map destination prefix to outgoing link(s)
- Copy of table on each interface card
- Highly optimized for fast lookups
- Updating the forwarding table
- Computing the new forwarding table
- Making updates to the copy on each line card
- Important source of delay
- Sprint end-to-end study: around 1 second
- AT&T router-level study: 100 msec to 300 msec
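The prefix-to-link mapping can be sketched with Python's standard `ipaddress` module. This is only an illustration of longest-prefix matching. The `Fib` class, the prefixes, and the link names are hypothetical, and the linear scan stands in for the optimized lookup structures the slide refers to.

```python
import ipaddress

class Fib:
    """Toy forwarding table: destination prefix -> outgoing link."""

    def __init__(self):
        self._routes = {}

    def update(self, prefix, link):
        # Installing or changing one entry; a real router repeats this
        # on the table copy held by every interface card.
        self._routes[ipaddress.ip_network(prefix)] = link

    def lookup(self, address):
        """Return the link of the longest matching prefix, or None."""
        ip = ipaddress.ip_address(address)
        matches = [net for net in self._routes if ip in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self._routes[best]


fib = Fib()
fib.update("10.0.0.0/8", "link1")
fib.update("10.1.0.0/16", "link2")

print(fib.lookup("10.1.2.3"))   # matched by both prefixes, /16 wins
print(fib.lookup("10.2.0.1"))   # only the /8 matches
print(fib.lookup("192.0.2.1"))  # no route
```

After a routing change, each affected prefix needs an `update` call, so the update delay grows with the number of prefixes whose next hop changed.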
16Significance of Protocol Timers
- Hello and dead intervals
- Failure-detection delay vs. false diagnosis
- Pacing the link-state flooding
- Combining LSAs vs. longer convergence delay
- Some routers wait till after re-running Dijkstra!
- Delaying start of shortest-path computation
- Reducing computations vs. convergence delay
- Especially useful if failure affects multiple
links
17OSPF Task Delays (Cisco)
- LSA Processing
- 100-800 microseconds
- LSA flooding
- 30-40 milliseconds
- pacing timer is the determining factor
- SPF calculation
- 1-40 milliseconds
- O(n^2) behavior for full n x n mesh
- FIB update time
- 100-300 milliseconds
- no dependence on the size of the topology
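As a worked example, the task delays above can be added up to get a rough per-router budget. The sketch assumes the four tasks run back to back on one router and ignores multi-hop flooding. The detection times passed in (100 msec for interface-level detection, 40 sec for the default dead interval) come from the earlier slides. The names `TASK_DELAYS_MS` and `convergence_budget_ms` are hypothetical.

```python
# (low, high) delay per task in milliseconds, taken from the list above.
TASK_DELAYS_MS = {
    "lsa_processing": (0.1, 0.8),   # 100-800 microseconds
    "lsa_flooding": (30, 40),
    "spf_calculation": (1, 40),
    "fib_update": (100, 300),
}

def convergence_budget_ms(detection_ms, tasks=TASK_DELAYS_MS):
    """Sum detection time and per-task delays into a (low, high) range."""
    low = detection_ms + sum(lo for lo, _ in tasks.values())
    high = detection_ms + sum(hi for _, hi in tasks.values())
    return low, high

fast = convergence_budget_ms(detection_ms=100)      # interface-level detection
slow = convergence_budget_ms(detection_ms=40_000)   # default dead interval

print("interface detection: %.1f to %.1f ms" % fast)
print("hello timeout:       %.1f to %.1f ms" % slow)
```

Under these assumptions the post-detection work stays below about 0.4 seconds, so with default hello timers almost all of the delay is failure detection, and with fast detection the FIB update becomes the largest term.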
18Reduce the Effects of Convergence
- Long convergence delay is bad
- Transient problems with loss and delay
- Disruptive for VoIP and online gaming
- Solution 1: better implementation
- Interfaces that detect failures automatically
- Cranking down the values of the timers
- Faster CPUs and path-computation algorithms
- Avoid forwarding loops during convergence
- Solution 2: network design and operation
- Improve forwarding-plane convergence
- Improve convergence during maintenance
- ...
19Summary
- Reduce intradomain routing convergence delay
- Faster SPF computation
- Set Hello timers in milliseconds rather than seconds
- Low-level support for detecting link up and down
- Reduce impact of convergence delay
- Protocol improvement to avoid forwarding loops
- Improve network design
20Project Ideas
- Simulation study of data forwarding during IGP
route convergence
- Effective mechanisms against bad forwarding during
convergence
- Avoid forwarding loops
- Stress test of OSPF or IS-IS on timer setting
21Interdomain Routing Convergence Delay
22Control Plane Convergence Time
- BGP router detects the change
- Propagate the update messages to neighbors
- Announcement
- Withdrawal
- Until all routers reach a stable state and no
route update is propagated
23Link Failure Detection
- Detect at lower layer
- For some networks, cannot rely on lower layer
- Keepalive message timer: 1/3 of hold time
(default hold timer: 90 sec)
- If the hold timer expires before a keepalive
message is received, the BGP session is down
24Route Propagation
- How often can a router send route updates?
- Minimum Route Advertisement Interval (MRAI) timer
- Batch processing updates
- How many hops do route updates have to propagate?
- Depends on topology
- Depends on routing policy
25MRAI Timers
- Minimum Route Advertisement Interval (MRAI) timer
- Minimum amount of time that must elapse between
route updates
- Applied to BGP announcement or withdrawal
- Avoid router CPU overload: wait till route is
stable
- Default MRAI value
- eBGP session: 30 seconds
- iBGP session: 5 seconds
26Different Implementations of Timers
- Prefix based rate limiting timer
- Peer based rate limiting timer
- Current Cisco implementation
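The difference between the two timer styles can be sketched with a single rate limiter whose key is either the peer alone or the (peer, prefix) pair. This is a simplified illustration: a real BGP speaker would queue a suppressed update and send it when the timer expires, whereas this sketch only reports whether an update may go out now. The class `MraiLimiter`, the peer name, and the prefixes are hypothetical.

```python
class MraiLimiter:
    """Rate-limits updates with an MRAI timer kept per peer or per (peer, prefix)."""

    def __init__(self, mrai, per_prefix):
        self.mrai = mrai              # seconds between updates
        self.per_prefix = per_prefix  # True: prefix based, False: peer based
        self._last_sent = {}

    def _key(self, peer, prefix):
        return (peer, prefix) if self.per_prefix else peer

    def try_send(self, peer, prefix, now):
        """Return True if an update for prefix may be sent to peer at time now."""
        key = self._key(peer, prefix)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.mrai:
            return False              # timer still running: hold the update
        self._last_sent[key] = now    # send and restart the timer
        return True


peer_based = MraiLimiter(mrai=30, per_prefix=False)
print(peer_based.try_send("peer1", "10.0.0.0/8", now=0))     # sent
print(peer_based.try_send("peer1", "192.0.2.0/24", now=1))   # held: same peer

prefix_based = MraiLimiter(mrai=30, per_prefix=True)
print(prefix_based.try_send("peer1", "10.0.0.0/8", now=0))   # sent
print(prefix_based.try_send("peer1", "192.0.2.0/24", now=1)) # sent: other prefix
```

The peer-based timer needs less state, but an update for one prefix can delay an unrelated prefix on the same session.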
27Impact of Topology
- Route convergence time
- Assume BGP selects the shortest path as the best
path
- w: Minimum Route Advertisement Interval
- From down to up: Tup is O(d·w)
- where d is the length of the shortest path in the
network
- From up to down: Tdown is O(D·w)
- where D is the length of the longest loop-free
path in the network
- Failover convergence?
- Implications
- Good news spreads fast
- Bad news spreads very slowly
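A worked example makes the gap between the two bounds concrete. The sketch below assumes a hypothetical full mesh of four ASes, measures path length in hops for a single source and destination pair, and treats the O(·) bounds as if the constant were 1. The function names are made up for this illustration.

```python
from collections import deque

def shortest_hops(graph, src, dst):
    """Hop count of the shortest path from src to dst (breadth-first search)."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == dst:
            return hops
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, hops + 1))
    return None

def longest_loop_free_hops(graph, src, dst, visited=frozenset()):
    """Hop count of the longest loop-free path from src to dst (exhaustive search)."""
    if src == dst:
        return 0
    visited = visited | {src}
    best = None
    for nbr in graph[src]:
        if nbr in visited:
            continue
        sub = longest_loop_free_hops(graph, nbr, dst, visited)
        if sub is not None and (best is None or sub + 1 > best):
            best = sub + 1
    return best

# Hypothetical full mesh of four ASes; AS 0 originates the prefix.
mesh = {n: [m for m in range(4) if m != n] for n in range(4)}
w = 30  # default eBGP MRAI in seconds

d = shortest_hops(mesh, 3, 0)
D = longest_loop_free_hops(mesh, 3, 0)
print("Tup   ~", d * w, "seconds")
print("Tdown ~", D * w, "seconds")
```

Even in this tiny mesh the withdrawal bound is three times the announcement bound, and the gap widens as the mesh grows because D grows with the number of ASes while d stays at 1.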
28Impact of Routing Policy
- Route convergence time
- w: Minimum Route Advertisement Interval
- From down to up: Tup is O(d·w)
- where d is the length of the shortest policy
conforming path
- From up to down: Tdown is O(D·w)
- where D is the length of the longest policy
conforming path
- Failover convergence?
29Failover Convergence
- A link failure causes the route to change to
another path
- Can be longer than Tdown!
30An Example of Failover
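[Figure: failover example with AS0, AS1, AS2 and destination d. Legend: packet, BGP update, BGP routing table. Update labels: W 2 0 (withdrawal) and A 1 0 (announcement); routing table entries include paths 1 2 0, 1 0, 2 0 and 2 1 0.]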
31Worst Case Analysis of Number of Messages
- Assumptions
- Fully meshed topology of n nodes
- Export to all
- Withdraw d from node 0
- O((n-1)!) messages
- O((n-1)!) distinct paths to reach d
- (n-1) paths of length 1
- (n-1)(n-2) paths of length 2
- ...
- (n-1)! paths of length n-1
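The path counts can be checked by brute force for small n. The sketch below assumes a full mesh with nodes 0 to n-1, where node 0 originates d, and counts loop-free paths toward node 0 summed over all starting nodes, grouped by hop count. The function name is hypothetical.

```python
from itertools import permutations
from math import factorial

def count_paths_by_length(n):
    """Loop-free paths to node 0 in a full mesh of n nodes, keyed by hop count.

    A k-hop path ending at node 0 is an ordered choice of k distinct
    nodes from the other n-1 nodes, so each count is (n-1)(n-2)...(n-k).
    """
    others = range(1, n)
    return {k: sum(1 for _ in permutations(others, k)) for k in range(1, n)}

for n in (4, 5):
    counts = count_paths_by_length(n)
    print(n, counts, "longest-path count equals (n-1)!:",
          counts[n - 1] == factorial(n - 1))
```

Under this counting the longest loop-free paths have n-1 hops, and there are (n-1)! of them, which is where the factorial bound on messages comes from.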
32Data Collection
- Data collected: BGP routing messages
- Time period: over the course of 9 months starting
Jan 96
- Where: five of the major U.S. network exchange
points
- Tool: Unix-based route servers, Multithreaded
Routing Toolkit (MRTd)
33Routing Updates Observed
- For 45,000 prefixes and 1500 paths
- 3 to 6 million updates per day
34Transient Failures During Convergence
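[Figure: the failover example with AS0, AS1, AS2 and destination d, with a transient failure marked. Legend: packet, BGP update, BGP routing table. Update labels: W 2 0 (withdrawal) and A 1 0 (announcement); routing table entries include paths 1 2 0, 1 0, 2 0 and 2 1 0.]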
35Another Example of Transient Failure
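[Figure: second transient failure example with ASes 0, 1, 2, 3 and destination d. Routing table entries include paths 3 1 0, 3 2 0, 2 1 0, 2 0, 1 0 and 1 2 0; updates are labeled w (withdrawal) and A (announcement). Legend: peer-to-peer and provider-to-customer links.]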
36Failure Duration for Different Implementations of Timers
- MRAI
- Reset by announcements
- MRWI
- Reset by withdrawals
37Prefix Based Rate Limiting Timers
- MRAI enabled
- Failure duration ? ?
- ? link propagation delay router
processing delay
- MRAI and MRWI enabled
- Failure duration ? ? MRWI ? MRWI
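[Figure: update timeline at the viewpoint router, for nodes 0, 1, 2 and x with alternate path y. Routing table entries: 0 1 x, 0 y, 1 x, 1 0 y, 2 1 x and 2 1 0 y. Updates are labeled w and A; timer states shown: start the timer, waiting, reset timer.]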
38Peer Based Rate Limiting Timers
- MRAI and MRWI
- MRAI enabled
Failure duration ? (D??Nshort) MRAI
Router ? provides the alternate path; router v is
the first router to detect the failure.
Nshort: shortest path between ? and u
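[Figure: update timeline at the viewpoint router, for nodes ?, v, u and x with alternate path y. Routing table entries: ? v x, ? y, v x, v ? y, u v x and u ? y. Updates are labeled w and A; timer states shown: waiting, reset timer; the distance D?? is marked.]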
39What About Convergence Delay?
40Prefix Based Rate Limiting Timers
- MRAI enabled
- Convergence delay ? Dvx MRAI
- MRAI and MRWI enabled
- Convergence delay ? ? MRWI Dvx MRAI
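Dvx: length of the longest path between v and x that
goes through the failed link
[Figure: update timeline at the viewpoint router, for nodes 0, v, 2 and x with alternate path y. Routing table entries: 0 1 x, 0 y, 1 x, 1 0 y, 2 1 x and 2 1 0 y. Updates are labeled w and A; timer states shown: start the timer, waiting, reset timer.]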
41Peer Based Rate Limiting Timers
- MRAI and MRWI
- MRAI enabled
Convergence delay ? (D??DbestDvx) MRAI
Router ? provides the alternate path; router v is
the first router to detect the failure.
Dshort: shortest path length between ? and u
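Dvx: length of the longest path between v and x that
goes through the failed link
[Figure: update timeline at the viewpoint router, for nodes ?, v, u and x with alternate path y. Routing table entries: ? v x, ? y, v x, v ? y, u v x and u ? y. Updates are labeled w and A; timer states shown: waiting, reset timer; the distance D?? is marked.]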
42Minimize Convergence Delay
- Make MRAI as small as possible
- Number of messages processed
- CPU load correlates with number of updates
- http://www.pam2004.org/papers/155.pdf
- BGP message pass-through time is small
- http://www.pam2004.org/papers/170.pdf
- Improving BGP
- Find invalid paths
- Remove invalid paths from best path selection
process
- Topology and routing policy design
43Solve Problem at Higher Layer
- Overlay routing
- RON project at MIT
- Skype for VoIP
- Multipath routing for multihomed nodes
- Smart routing service
- Internap
- RouteScience
44Summary
- Convergence delay can be as long as 30 minutes
(15 minutes is a long time for VoIP)
- BGP convergence delay depends on MRAI
- Routing policy and topology impact
- Transient failure during convergence
- Implementation of MRAI
45Project Ideas
- Forwarding plane performance during BGP
convergence
- Sufficient condition for forwarding loops
- Fast failover BGP
- Timer implementation
- Reduce transient failure duration