Title: Delayed Internet Routing Convergence due to Flap Dampening
1 Delayed Internet Routing Convergence due to Flap Dampening
- Z. Morley Mao
- Ramesh Govindan, Randy Katz, George Varghese
- zmao_at_eecs.berkeley.edu
2 Slow Internet routing convergence
- BGP is a path-vector protocol
- Convergence can be O(n!)
- Multi-homed fail-over is linear in the longest backup path length
- Can take up to 15 minutes
- Why so slow?
- Protocol effects: path vector protocol
- Flap damping can delay convergence!
- Unexpected interference between two mechanisms of the routing protocol
- Study this interaction and propose a solution to eliminate it
3 What is route flap dampening?
- RFC 2439, widely deployed
- Goals
- Reduce router processing load caused by instability
- Prevent sustained routing oscillations
- Without sacrificing convergence times for well-behaved routes
- Parameters
- Penalty, half-life, suppress limit, reuse limit, maximum suppressed time
4 How does flap dampening work?
- RIPE-229 recommendation
- Don't damp until the fourth flap
- /24 or longer prefixes
- max outage = min outage = 60 min
- /22, /23 prefixes
- max outage = 45 min, min outage = 30 min
- Other prefixes
- max outage = 30 min, min outage = 10 min
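The parameters named on the previous slide (penalty, half-life, suppress limit, reuse limit, maximum suppressed time) interact roughly as sketched below. This is a minimal illustration in the spirit of RFC 2439, not any vendor's implementation: the class name `DampingState`, the per-flap penalty of 1000 and all numeric defaults are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class DampingState:
    """Per-route damping state. All default values are illustrative assumptions."""
    half_life: float = 900.0           # seconds for the penalty to halve
    suppress_limit: float = 2000.0     # suppress the route above this penalty
    reuse_limit: float = 750.0         # reuse the route below this penalty
    max_suppress_time: float = 3600.0  # upper bound on suppression, in seconds
    penalty_per_flap: float = 1000.0   # assumed fixed increment per flap
    penalty: float = 0.0
    last_update: float = 0.0
    suppressed: bool = False

    def _decay(self, now: float) -> None:
        # Exponential decay: the penalty halves every half_life seconds.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now

    def record_flap(self, now: float) -> bool:
        """Add one flap at time `now`; return True if the route is suppressed."""
        self._decay(now)
        # Cap the penalty so it decays to reuse_limit within max_suppress_time.
        ceiling = self.reuse_limit * 2 ** (self.max_suppress_time / self.half_life)
        self.penalty = min(self.penalty + self.penalty_per_flap, ceiling)
        if self.penalty > self.suppress_limit:
            self.suppressed = True
        return self.suppressed

    def is_suppressed(self, now: float) -> bool:
        """Decay the penalty to `now` and release the route below reuse_limit."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False
        return self.suppressed


state = DampingState()
for _ in range(3):
    state.record_flap(0.0)          # three flaps in quick succession
print(state.is_suppressed(900.0))   # one half-life later
print(state.is_suppressed(2700.0))  # three half-lives later
```

With these assumed values, a route that flaps three times in quick succession stays suppressed for more than two half-lives, which is the mechanism behind the outage durations listed above.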
5 Route withdraw convergence process
Assuming node 1 has a route to a destination, it withdraws the route.
Notation: X->Y,Z W is a withdrawal sent from X to Y and Z; X->Y,Z A abc is an announcement carrying path a b c.

Stage (msg processed)   Msg queued
0                       1->2,3,4 W
1 (1->2 W)              1->3,4 W, 2->3,4 A 241
2 (1->3 W)              1->4 W, 2->3,4 A 241, 3->2,4 A 341
3 (1->4 W)              2->3,4 A 241, 3->2,4 A 341, 4->2,3 A 431
4 (4->2 A 431)          2->3,4 A 241, 3->2,4 A 341, 4->3 A 431
5 (4->3 A 431)          2->3,4 A 241, 3->2,4 A 341
6 (3->2 A 341)          2->3,4 A 241, 3->4 A 341
7 (3->4 A 341)          2->3,4 A 241
8 (2->3 A 241)          2->4 A 241
9 (2->4 A 241)          (none)

MinRouteAdver timer expires: 4->2,3 W, 3->2,4 A 3241, 2->3,4 A 2431 (remaining stages omitted)

Note: In responding to the withdrawal from 1, node 3 sends out 3 messages: 3->2,4 A 341, 3->2,4 A 3241, 3->2,4 W
6 Interaction between flap damping and convergence
Example topology
- Assume a node 5 is attached to node 3, and after node 1 withdraws the route, it announces the route again
- Node 5 can suppress the route from node 3!
- A single flap is multiplied by 3, triggering route suppression
- Convergence is further delayed!
[Figure: example topology with nodes 1, 2, 3, 4 and 5]
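The amplification effect can be checked with a few lines of arithmetic. The sketch below accumulates a decaying penalty at node 5 for the updates it receives from node 3. The function name `penalty_after`, the 30-second spacing between updates, the 15-minute half-life, the suppress limit of 2000 and the uniform penalty of 1000 per update are all assumptions for illustration; real implementations may weight withdrawals, announcements and attribute changes differently.

```python
def penalty_after(update_times, half_life=900.0, per_update=1000.0):
    """Penalty right after the last update in `update_times` (sorted, seconds).

    Every update adds `per_update`; between updates the penalty decays
    exponentially with the given half-life. All values are assumed.
    """
    penalty = 0.0
    last = None
    for t in update_times:
        if last is not None:
            penalty *= 0.5 ** ((t - last) / half_life)
        penalty += per_update
        last = t
    return penalty


SUPPRESS_LIMIT = 2000.0  # assumed threshold

# Node 5 attached directly to the origin would see one withdrawal.
direct = penalty_after([0.0])

# Node 5 attached to node 3 sees A 341, A 3241 and W, here assumed 30 s apart.
via_node_3 = penalty_after([0.0, 30.0, 60.0])

print(direct > SUPPRESS_LIMIT)      # a single update stays under the limit
print(via_node_3 > SUPPRESS_LIMIT)  # the amplified burst crosses it
```

Under these assumptions one origin flap seen through node 3 already exceeds the suppress limit, so the re-announcement from node 1 is ignored by node 5 until the penalty decays below the reuse limit.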
7 Data analysis
- Is the toy topology realistic?
- Exchange points often have clique topologies
- There are usually multiple backup paths
- Evidence found in data analysis of real BGP updates
- Example (from RIPE)
BGP4MP|1009757425|A|202.12.29.64|4608|199.5.187.0/24|4608 1221 4637 701|IGP|202.12.29.64|0|0||NAG||
BGP4MP|1009757478|A|202.12.29.64|4608|199.5.187.0/24|4608 1221 4637 1 701|IGP|202.12.29.64|0|0||NAG||
BGP4MP|1009757505|A|202.12.29.64|4608|199.5.187.0/24|4608 1221 4637 7176 1 701|IGP|202.12.29.64|0|0||NAG||
BGP4MP|1009757531|W|202.12.29.64|4608|199.5.187.0/24
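The pattern in the example above (progressively longer AS paths followed by a withdrawal) can be extracted mechanically. The sketch below assumes the records are pipe-delimited with the field order type, timestamp, A/W, peer address, peer AS, prefix, AS path; the records are copied from the slide with the trailing attribute fields trimmed, and `parse_update` is a hypothetical helper, not part of any existing tool.

```python
# Records from the slide, trimmed after the origin field (assumed layout).
RECORDS = [
    "BGP4MP|1009757425|A|202.12.29.64|4608|199.5.187.0/24|4608 1221 4637 701|IGP",
    "BGP4MP|1009757478|A|202.12.29.64|4608|199.5.187.0/24|4608 1221 4637 1 701|IGP",
    "BGP4MP|1009757505|A|202.12.29.64|4608|199.5.187.0/24|4608 1221 4637 7176 1 701|IGP",
    "BGP4MP|1009757531|W|202.12.29.64|4608|199.5.187.0/24",
]


def parse_update(line):
    """Split one pipe-delimited record into a dict (hypothetical helper)."""
    fields = line.split("|")
    update = {
        "time": int(fields[1]),     # Unix timestamp
        "kind": fields[2],          # "A" = announcement, "W" = withdrawal
        "peer": fields[3],
        "peer_as": int(fields[4]),
        "prefix": fields[5],
        "as_path": None,            # withdrawals carry no AS path
    }
    if update["kind"] == "A":
        update["as_path"] = [int(asn) for asn in fields[6].split()]
    return update


updates = [parse_update(record) for record in RECORDS]
path_lengths = [len(u["as_path"]) for u in updates if u["kind"] == "A"]
burst_seconds = updates[-1]["time"] - updates[0]["time"]

print(path_lengths)   # AS path length of each announcement, in order
print(burst_seconds)  # time from the first announcement to the withdrawal
```

A peer applying flap damping to this prefix would see four updates within the computed interval, even though they may all stem from a single withdrawal at the origin.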
8 Simulations/Analysis
- Simulation using SSFnet
- Topologies
- Toy topologies, e.g., cliques
- Real AS graphs with commercial relationships
- Analysis
- Impact of flap damping on convergence
- Properties of topologies to trigger this effect
- Effect of policies
- Provider selection and connectivity decisions
9 Proposed solution
- Redefine what counts as a flap
- Currently any route change is considered a flap
- New definition
- A flap must change the direction of the route's degree of preference (dop) value relative to the previous flap
- Keep two additional bits (about the dop comparison)
- 00 undefined, 01 equal, 10 better, 11 worse
- Properties of convergence flaps
- Increasing AS path lengths
- Route value keeps increasing
- The solution is currently being evaluated using trace-driven simulation
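One possible reading of the new flap definition is sketched below. It is an interpretation, not the authors' implementation: AS path length stands in for the dop value (shorter is better), a withdrawal is modelled as infinite length, and the last definite direction is retained across "equal" comparisons. The function names `compare` and `count_direction_flaps` are hypothetical.

```python
import math

# The two extra bits from the slide.
UNDEFINED, EQUAL, BETTER, WORSE = 0b00, 0b01, 0b10, 0b11


def compare(prev_len, new_len):
    """Encode how the new route compares with the previous one."""
    if prev_len is None:
        return UNDEFINED
    if new_len == prev_len:
        return EQUAL
    return BETTER if new_len < prev_len else WORSE


def count_direction_flaps(path_lengths):
    """Count only route changes that reverse the direction of preference."""
    flaps = 0
    prev_len = None
    prev_bits = UNDEFINED
    for cur_len in path_lengths:
        bits = compare(prev_len, cur_len)
        if {prev_bits, bits} == {BETTER, WORSE}:
            flaps += 1  # better -> worse or worse -> better
        if bits in (BETTER, WORSE):
            prev_bits = bits  # assumed: keep the last definite direction
        prev_len = cur_len
    return flaps


# Withdrawal convergence: paths get longer, then the route is withdrawn.
convergence = [2, 3, 4, math.inf]
# The same sequence followed by a re-announcement of the original route.
with_reannounce = convergence + [2]

print(count_direction_flaps(convergence))      # monotonically worse
print(count_direction_flaps(with_reannounce))  # one reversal at the end
```

Under this reading, the updates generated while a withdrawal converges all move in the same direction and are not counted, whereas a route that alternates between better and worse still accumulates flaps.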
10 Conclusion/Future work
- Route flap damping can interfere with BGP route convergence
- Trades off convergence for stability
- Interesting thought exercises
- Tradeoffs between convergence and stability
- Flap Damping
- How to infer the causes of flaps
- How to prevent damping legitimate updates
- Challenges
- Internet topology is less hierarchical
- Multi-homing is growing