Internet Routing (COS 598A) Today: Interdomain Routing Convergence

Provided by: albertgr
1
Internet Routing (COS 598A)
Today: Interdomain Routing Convergence
  • Jennifer Rexford
  • http://www.cs.princeton.edu/~jrex/teaching/spring2005
  • Tuesdays/Thursdays 11:00am-12:20pm

2
Outline
  • BGP convergence
  • Causes of routing changes
  • Detecting session failures
  • BGP path exploration
  • Route-flap damping
  • Damping persistent flapping
  • Interaction with path exploration
  • Stability of popular destinations
  • Are things really all that bad?
  • Reducing convergence delay
  • Avoiding complete path exploration
  • Why this is harder than it looks

3
Causes of BGP Routing Changes
  • Topology changes
  • Equipment going up or down
  • Deployment of new routers or sessions
  • BGP session failures
  • Due to equipment failures, maintenance, etc.
  • Or, due to congestion on the physical path
  • Changes in routing policy
  • Reconfiguration of preferences
  • Reconfiguration of route filters
  • Persistent protocol oscillation
  • More on this next week!

4
BGP Session Operation
  • Establish a BGP session between AS1 and AS2 on TCP port 179
  • Exchange all active routes
  • While the connection is alive, exchange route UPDATE messages (incremental updates)
5
BGP Session Failure
  • BGP runs over TCP
  • BGP only sends updates when changes occur
  • TCP doesn't detect lost connectivity on its own
  • Detecting a failure
  • Keep-alive: 60 seconds
  • Hold timer: 180 seconds
  • Reacting to a failure
  • Discard all routes learned from the neighbor
  • Send new updates for any routes that change

[Figure: BGP session between AS1 and AS2]
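The failure-detection rule above can be sketched in a few lines. This is a minimal illustration, not a BGP implementation: the Session class, its fields, and the example prefix are hypothetical, and only the 60-second and 180-second values come from the slide.

```python
from dataclasses import dataclass

KEEPALIVE_INTERVAL = 60   # seconds between keep-alives (value from the slide)
HOLD_TIME = 180           # seconds of silence before the session is dead (from the slide)


@dataclass
class Session:
    """Hypothetical bookkeeping for one BGP neighbor."""
    last_heard: float   # time the last KEEPALIVE or UPDATE arrived
    routes: dict        # prefix -> AS path learned from this neighbor

    def on_message(self, now: float) -> None:
        # Any KEEPALIVE or UPDATE restarts the hold timer.
        self.last_heard = now

    def check(self, now: float) -> list:
        """Return the prefixes whose routes are lost if the hold timer expired."""
        if now - self.last_heard < HOLD_TIME:
            return []
        lost = sorted(self.routes)
        self.routes.clear()   # discard all routes learned from the neighbor
        return lost


session = Session(last_heard=0.0, routes={"192.0.2.0/24": (2, 0)})
session.on_message(now=float(KEEPALIVE_INTERVAL))   # keep-alive arrives on schedule
print(session.check(now=200.0))   # 140 s of silence: session still up
print(session.check(now=240.0))   # 180 s of silence: routes are discarded
```

The prefixes returned by check are the ones for which the router would then have to pick a new best route and send updates.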
6
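Routing Change: Before and After
[Figure: ASes 1, 2, and 3 routing to destination 0.
Before: AS 1 uses (1,0), AS 2 uses (2,0), AS 3 uses (3,1,0).
After: AS 1 uses (1,2,0), AS 2 uses (2,0), AS 3 uses (3,2,0).]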
7
Routing Change: Path Exploration
  • AS 1
  • Delete the route (1,0)
  • Switch to next route (1,2,0)
  • Send route (1,2,0) to AS 3
  • AS 3
  • Sees (1,2,0) replace (1,0)
  • Compares to route (2,0)
  • Switches to using AS 2

[Figure: after the change, AS 1 uses (1,2,0), AS 2 uses (2,0), and AS 3 uses (3,2,0) to reach destination 0.]
8
Routing Change: Path Exploration
  • Initial situation
  • Destination 0 is alive
  • All ASes use direct path
  • When destination dies
  • All ASes lose direct path
  • All switch to longer paths
  • Eventually withdrawn
  • E.g., AS 2
  • (2,0) → (2,1,0)
  • (2,1,0) → (2,3,0)
  • (2,3,0) → (2,1,3,0)
  • (2,1,3,0) → null

[Figure: ASes 1, 2, and 3 connected to each other and to destination 0, each labeled with its candidate paths.
AS 1: (1,0) (1,2,0) (1,3,0)
AS 2: (2,0) (2,1,0) (2,3,0) (2,1,3,0)
AS 3: (3,0) (3,1,0) (3,2,0)]
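A minimal sketch of this path exploration, under assumptions that are not on the slide: all ASes process their neighbors' latest advertisements in synchronous rounds, prefer the shortest path, and break ties by lowest neighbor number. The function name simulate is hypothetical. Real BGP is asynchronous, so the exact sequence each AS walks through depends on message timing and can differ from the AS 2 example on the slide; the pattern of stepping through longer paths before withdrawing is the same.

```python
def simulate(nodes, dest):
    """Synchronous path-vector rounds on a clique after `dest` dies."""
    # Before the failure every AS uses, and has advertised, its direct path.
    best = {n: (n, dest) for n in nodes}
    history = {n: [best[n]] for n in nodes}
    # rib_in[n][m] is the path most recently advertised to n by neighbor m.
    # The direct route from `dest` is omitted because the destination is gone.
    rib_in = {n: {m: best[m] for m in nodes if m != n} for n in nodes}

    changed = True
    while changed:
        changed = False
        new_best = {}
        for n in nodes:
            # Loop detection: ignore any advertised path that already contains n.
            usable = [p for p in rib_in[n].values() if p and n not in p]
            # Assumed policy: shortest path, ties broken by lowest neighbor.
            choice = min(usable, key=lambda p: (len(p), p)) if usable else None
            new_best[n] = (n,) + choice if choice else None
        for n in nodes:
            if new_best[n] != best[n]:
                changed = True
                history[n].append(new_best[n])
                for m in nodes:
                    if m != n:
                        rib_in[m][n] = new_best[n]   # advertise (or withdraw)
        best = new_best
    return history


for asn, paths in simulate([1, 2, 3], dest=0).items():
    print(asn, paths)
```

Each list ends in None: every AS eventually withdraws the route, but only after trying alternates that were already invalid.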
9
Convergence Overhead and Delay
  • Path exploration is expensive
  • Large number of possible paths
  • Might have to explore (nearly) all of them
  • Minimum Route Advertisement Interval
  • Minimum time between advertisement of routes for
    a given destination to a given neighbor
  • Rate limit on BGP update messages
  • and allows combining multiple messages in one
  • Typical value of 30 seconds
  • Convergence delay
  • (30 seconds) × (# of paths)

10
Four Kinds of BGP Routing Changes
  • Destination becomes reachable
  • Switch from no path to a new path
  • Better path becomes available
  • Switch from old path to new, better path
  • Best path becomes unavailable
  • Switch from old path to new, worse path
  • Destination becomes unreachable
  • Switch from old path to no path at all

[Annotation: arrow alongside the list, from lower delay (top) to higher delay (bottom)]
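The four cases can be told apart by comparing the old and new best paths. The sketch below assumes that a shorter AS path is the better one, which is a simplification (real BGP applies policy before path length); the function name classify is hypothetical.

```python
def classify(old, new):
    """Label a routing change. Paths are AS-path tuples, or None for no route.

    Assumption for illustration only: a shorter AS path is the better path.
    """
    if old is None and new is None:
        return "no change"
    if old is None:
        return "destination becomes reachable"
    if new is None:
        return "destination becomes unreachable"
    if len(new) < len(old):
        return "better path becomes available"
    if len(new) > len(old):
        return "best path becomes unavailable"
    return "switch between equally good paths"


print(classify(None, (1, 0)))
print(classify((1, 2, 0), (1, 0)))
print(classify((1, 0), (1, 2, 0)))
print(classify((1, 2, 0), None))
```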
11
Questions About Convergence Delay
  • Reduce the MRAI timer?
  • High message overhead on the router?
  • Delays from overloading the CPU?
  • What is the right value?
  • Dependence on topology?
  • Worst case: n!
  • Fully-connected graph (i.e., a clique)
  • No filtering of advertisements
  • Shortest-path routing
  • Destination dies completely
  • Typical case?
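To get a feel for the factorial growth, the sketch below enumerates the loop-free paths one AS could try in a clique of n ASes and multiplies by 30 seconds per path. It counts the candidate paths of a single AS, which is not the exact worst-case bound on the slide, and the function name candidate_paths is hypothetical.

```python
from itertools import permutations


def candidate_paths(source, others, dest=0):
    """All loop-free AS paths from `source` to `dest` via any ordering of `others`."""
    paths = []
    for k in range(len(others) + 1):
        for middle in permutations(others, k):
            paths.append((source, *middle, dest))
    return paths


MRAI_SECONDS = 30   # typical MRAI value

for n in range(2, 8):
    others = list(range(2, n + 1))   # the other n-1 ASes in the clique
    count = len(candidate_paths(1, others))
    print(f"n={n}: {count} candidate paths, roughly {MRAI_SECONDS * count} s to try them all")
```

Even a handful of fully meshed ASes gives each one far more alternates than it could explore quickly at one advertisement per 30 seconds.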

12
Route Flap Damping
13
Persistent Routing Changes
  • Causes
  • Link with intermittent connectivity
  • Congestion causing repeated session resets
  • Persistent oscillation due to policy conflicts
  • Effects
  • Lots of BGP update messages
  • Disruptions to data traffic
  • High overhead on routers
  • Solution
  • Suppress paths that go up/down repeatedly
  • to avoid updates and prefer stable paths

14
Route Flap Damping
  • BGP-speaking router
  • One or more BGP neighbors
  • Keep an RIB-in per neighbor
  • Select single best route per destination prefix
  • Route-flap damping
  • Penalty counter per (peer, prefix) pair
  • Increment penalty when peer changes route
  • Decrease penalty over time when route is stable
  • Designed and deployed in the mid-1990s
  • Widely viewed as helping improve stability

15
Example: Why Damping is Good
  • Consider AS 3
  • Path 1: (3,1,0)
  • Path 2: (3,2,0)
  • If link (1,0) fails
  • AS 3 switches routes
  • If link (1,0) restores
  • AS 3 switches routes
  • If this happens a lot
  • Better for AS 3 to stick with (3,2,0)

[Figure: AS 3 connects to ASes 1 and 2, which reach destination 0 over links (1,0) and (2,0).]
16
Damping Penalty Function
[Figure: penalty plotted against time, with horizontal lines marking the suppression threshold and the lower reuse threshold.]
17
Configurable Damping Parameters
  • Penalty for a routing change
  • May vary with the type of update message
  • Advertisement vs. withdraw? Attributes change?
  • Decaying in absence of a change
  • Exponent in the exponential decay
  • Suppression threshold
  • Trigger for damping the route
  • Determines how many updates are tolerated
  • Reuse threshold
  • Trigger for considering the route again
  • Determines how long the route is not usable
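These parameters fit together as in the sketch below: one penalty counter per (peer, prefix) pair, incremented on each change and decayed exponentially in between. The class name and every numeric value (penalty of 1000, thresholds of 2000 and 750, 15-minute half-life) are assumptions chosen for illustration, not values taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class DampingState:
    """Hypothetical damping state for one (peer, prefix) pair."""
    penalty_per_change: float = 1000.0   # assumed, same for every update type
    suppress_threshold: float = 2000.0   # assumed
    reuse_threshold: float = 750.0       # assumed
    half_life: float = 900.0             # assumed, in seconds
    penalty: float = 0.0
    last_update: float = 0.0
    suppressed: bool = False

    def _decay(self, now: float) -> None:
        # Exponential decay: the penalty halves every `half_life` seconds.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now

    def on_change(self, now: float) -> bool:
        """Record a routing change; return True if the route is now suppressed."""
        self._decay(now)
        self.penalty += self.penalty_per_change
        if self.penalty > self.suppress_threshold:
            self.suppressed = True
        return self.suppressed

    def usable(self, now: float) -> bool:
        """Return True if the route may be considered in route selection."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_threshold:
            self.suppressed = False
        return not self.suppressed


state = DampingState()
for t in (0.0, 1.0, 2.0):
    print(t, state.on_change(t), round(state.penalty))
print(state.usable(900.0), state.usable(4000.0))
```

The gap between the two thresholds is what keeps a route suppressed long after the flapping stops: the penalty has to decay all the way down to the reuse threshold, not just back under the suppression threshold.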

18
Best Common Practices for Damping
  • Different parameters for different prefixes
  • More aggressive with small address blocks
  • Disable damping on certain prefixes (e.g.,
    corresponding to the DNS root servers)
  • Avoid suppressing stable routes
  • Tolerate at least four routing changes
  • Suppress unstable routes for quite a while
  • Values ranging from 10 minutes to 1 hour
  • Values of 30 minutes are not uncommon

19
Interaction with Path Exploration
  • BGP routing convergence
  • Explore one or more alternate paths
  • Number of alternate paths may be quite high
  • Time between steps is small (e.g., 30 seconds)
  • Triggering route-flap damping
  • Increasing penalty with each step
  • Only small amount of decay between steps
  • Convergence may trigger route flap damping
  • Convergence may involve more than 4 changes
  • Routing change may trigger lost connectivity!!!
  • Confirmed by recent active measurement studies
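A rough calculation shows why convergence alone can trip the damping threshold. The sketch counts how many evenly spaced routing changes it takes to exceed the suppression threshold. All parameter values (penalty of 1000, threshold of 4000, 15-minute half-life) are illustrative assumptions, and the function name is hypothetical; only the 30-second spacing comes from the slides.

```python
def changes_until_suppressed(step=30.0, penalty=1000.0, suppress=4000.0,
                             half_life=900.0, max_changes=100):
    """Return how many changes, `step` seconds apart, trigger suppression.

    Returns None if the threshold is never exceeded within `max_changes`.
    """
    level = 0.0
    decay = 0.5 ** (step / half_life)   # decay factor between two changes
    for count in range(1, max_changes + 1):
        level = level * decay + penalty
        if level > suppress:
            return count
    return None


# Path exploration: one change per MRAI interval, almost no decay in between.
print(changes_until_suppressed(step=30.0))
# Occasional changes an hour apart: the penalty decays away between them.
print(changes_until_suppressed(step=3600.0))
```

Under these assumed values, a single failure that makes a neighbor step through five alternate paths is enough to suppress the route, while changes spaced an hour apart never are.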

20
Effects of Damping are Confusing
  • AS 0 is a stable network
  • Link (1,3) fails a lot
  • AS 3 switches routes back and forth a lot
  • Sends new BGP updates to its customers
  • Suppose AS 3 does not apply route-flap damping
  • AS 3's customers
  • Eventually dampen route
  • Causes lost reachability to destination in AS 0
  • How can AS 0 diagnose this problem, and fix it?

[Figure: topology of ASes 0, 1, 2, and 3]
21
Open Questions
  • Want to suppress unstable routes
  • Otherwise, lots of update messages
  • and lots of transient disruptions
  • Yet, want to tolerate path exploration
  • Otherwise, you suppress stable routes
  • and black-hole otherwise reachable destinations
  • How to reconcile?
  • Better flap-damping parameters?
  • More information in update messages?
  • Something more gentle than suppression?

22
BGP Stability of Popular Destinations
http://www.cs.princeton.edu/~jrex/papers/imw02.pdf
23
BGP Routing and Traffic Popularity
  • A possible saving grace
  • Most BGP updates due to few prefixes
  • and, most traffic due to few prefixes
  • ... but, hopefully not the same prefixes
  • Popularity vs. BGP stability
  • Do popular prefixes have stable routes?
  • Yes, for 10 days at a stretch!
  • Does most traffic travel on stable routes?
  • A resounding yes!
  • Direct correlation of popularity and stability?
  • Well, no, not exactly

24
BGP Updates
  • BGP updates for March 2002
  • AT&T route reflector
  • RouteViews and RIPE-NCC
  • Data preprocessing
  • Filter duplicate BGP updates
  • Filter resets of monitor sessions
  • Removes 7-30% of updates
  • Grouping updates into events
  • Updates for the same prefix
  • Close together in time (45 sec)
  • Reduces sensitivity to timing

Confirmed: few prefixes are responsible for most events
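The grouping step can be sketched as follows, assuming "close together in time" means that consecutive updates for the same prefix arrive at most 45 seconds apart. The function name, the input format, and the example prefixes are hypothetical.

```python
def group_into_events(updates, gap=45.0):
    """Group (timestamp_seconds, prefix) updates into per-prefix events.

    An update joins the latest event for its prefix if it arrives within
    `gap` seconds of that event's last update; otherwise it starts a new event.
    Returns a list of (prefix, [timestamps]) in order of event start.
    """
    events = []
    latest = {}   # prefix -> index in `events` of its most recent event
    for ts, prefix in sorted(updates):
        idx = latest.get(prefix)
        if idx is not None and ts - events[idx][1][-1] <= gap:
            events[idx][1].append(ts)
        else:
            latest[prefix] = len(events)
            events.append((prefix, [ts]))
    return events


updates = [(0, "192.0.2.0/24"), (30, "192.0.2.0/24"), (60, "192.0.2.0/24"),
           (200, "192.0.2.0/24"), (10, "198.51.100.0/24")]
for prefix, stamps in group_into_events(updates):
    print(prefix, stamps)
```

Counting events rather than raw updates means a single burst of path exploration is counted once, no matter how many updates it produces.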
25
Two Views of Prefix Popularity
  • AT&T traffic data
  • Netflow data on peering links
  • Aggregated to the prefix level
  • Outbound from AT&T customers
  • Inbound to AT&T customers
  • NetRatings Web sites
  • NetRatings top-25 list
  • Convert to site names
  • DNS to get IP addresses
  • Clustered into 33 prefixes

26
Traffic Volume vs. BGP Events (CDF)
50% of traffic: 0.1% of events (0.3% of prefixes)
27
Update Events/Day (CCDF, log-log plot)
Most popular prefixes had < 0.2 events/day and just 1 update/event
28
An Interpretation of the Results
  • Popular ⇒ stable
  • Well-managed
  • Few failures and fast recovery
  • Single-update events to alternate routes
  • Unstable ⇒ unpopular
  • Persistent flaps: hard to reach
  • Frequent flaps: poorly-managed sites
  • Unpopular does not imply unstable
  • Most prefixes are quite stable
  • Well-managed, simple configurations
  • Managed by upstream provider

29
Avoiding Path Exploration
30
Reducing Path Exploration By Tagging
  • When AS 1 sees (1,0) fail
  • Switches to (1,2,0)
  • Why not say "because the link (1,0) has failed"?
  • Allow ASes to discard all paths that use edge
    (1,0)
  • Should reduce exploration
  • E.g., AS 3 should not consider (3,2,1,0)
  • E.g., AS 2 should not consider (2,3,1,0)
  • Seems appealing, but...

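[Figure: ASes 1, 2, and 3 connected to each other and to destination 0, each labeled with its candidate paths.
AS 1: (1,0) (1,2,0) (1,3,0)
AS 2: (2,0) (2,1,0) (2,3,0)
AS 3: (3,0) (3,1,0) (3,2,0)]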
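The tagging idea can be sketched as a filter: once an update names the failed edge, an AS drops every candidate path that crosses it. This shows only the filtering step, under the assumption that links are undirected; the function names are hypothetical, and the timing and multi-router problems discussed on the following slides are ignored.

```python
def uses_edge(path, edge):
    """True if the AS path traverses the (undirected) edge."""
    return any({a, b} == set(edge) for a, b in zip(path, path[1:]))


def prune(candidates, failed_edge):
    """Keep only the candidate paths that avoid the failed edge."""
    return [p for p in candidates if not uses_edge(p, failed_edge)]


# Candidate paths AS 3 might otherwise explore one after another.
as3_candidates = [(3, 0), (3, 1, 0), (3, 2, 0), (3, 2, 1, 0), (3, 1, 2, 0)]
print(prune(as3_candidates, failed_edge=(1, 0)))
```

With the tag, AS 3 rules out (3,1,0) and (3,2,1,0) in one step instead of learning about each withdrawal separately.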
31
Problem 1: Timing of Information
  • How long should the ASes believe the info?
  • What if the link (1,0) comes back up?
  • What if the info about the failure is still
    propagating?
  • Do the ASes need to remember the old paths?
  • E.g., should AS 2 remember (2,3,1,0) in case it
    learns later that (1,0) has come back up?
  • BGP is an incremental protocol, so forgetting
    information may be risky unless you will get it
    back again
  • But, these issues are probably surmountable
  • with some attention to the details

32
Problem 2: AS With Multiple Routers/Links
  • BGP introduces abstraction
  • Treats each AS as a single node
  • Doesn't distinguish between links
  • Example: one link fails
  • Should AS 1 tell others?
  • Need to identify which link?
  • Does it introduce more updates?
  • Internal BGP details matter
  • Some AS 1 routers don't know about both paths
    through AS 0

[Figure: AS 1, AS 0, and destination d]
33
Internal BGP Convergence
Briefly, the border router has no route at all!
34
Questions
  • Can we reduce path exploration?
  • Hints in the BGP update messages
  • To avoid exploring a set of related paths
  • Handling the challenges
  • Timing details
  • Multiple routers and links per AS
  • without excessive overhead
  • Can we change the problem?
  • Server per AS that stores all candidate routes
  • Exchanging information about the root cause

35
Next Time: Protocol Divergence
  • Two papers
  • "The Stable Paths Problem and Interdomain Routing"
  • "Stable Interdomain Routing Without Global Coordination"
  • Review only the first paper
  • Summary
  • Why accept
  • Why reject
  • Future work
  • Optional: NANOG video on BGP Wedgies