Title: Investigating the Causes of Inter-Domain Routing Instability
1 Investigating the Causes of Inter-Domain Routing Instability
- Thesis Proposal
- BJ Premore
- March 1, 2000
- Thesis Committee
- David Nicol, Dartmouth College (Adviser)
- Javed Aslam, Dartmouth College
- Thomas Cormen, Dartmouth College
- Andy Ogielski, DIMACS, Rutgers University
2 Overview
- PART I Background
- Complexity of large internetworks
- Internetworks and routing
- Routing instability
- The Border Gateway Protocol
- Observed pathological behaviors of inter-domain routing
- PART II Hypotheses
- Modeling routing instability
- Suspected causes
- Coping strategies
- PART III Investigation
- Network experimentation techniques
- Feasibility of simulating instability
- Model requirements
- Model implementation
- Identifying and measuring instability
3 The Complexity of Large Internetworks
- Heterogeneity
- Hardware, protocols
- Protocol flavors and implementation variations
- Size
- Rare events aren't so rare
- Complexity of components
- Congestion avoidance in data transfer protocols
- Arbitrary routing protocol policies
4 Internetworks and Routing
- Router
- Forwards packets
- Forwarding
- Using a lookup table to forward packets
- Routing
- Building and maintaining forwarding tables
- Autonomous system (domain)
- Set of routers under single technical administration
- Two-level routing hierarchy
- Intra-domain
- Inter-domain
- Border Gateway Protocol (BGP)
- Inter-domain routing
- de facto standard in the Internet
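The forwarding step described above (using a lookup table to forward packets) can be sketched as follows. This is a minimal illustration that assumes longest-prefix matching, which the slide does not spell out; the prefixes and next-hop names are hypothetical.

```python
import ipaddress

# Hypothetical forwarding table: destination prefix -> next hop.
# Routing (not shown) is the process that builds and maintains this table.
FORWARDING_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "default-gateway",
    ipaddress.ip_network("10.0.0.0/8"): "router-a",
    ipaddress.ip_network("10.1.0.0/16"): "router-b",
}


def forward(dest, table=FORWARDING_TABLE):
    """Return the next hop for dest by longest-prefix match, or None."""
    addr = ipaddress.ip_address(dest)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    # Assumption of this sketch: the most specific (longest) prefix wins.
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]
```

For example, `forward("10.1.2.3")` selects the /16 entry even though the /8 and default entries also match.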
5 [Figure: the Dartmouth autonomous system (AS), with links to Middlebury, UMaine, and MIT]
6 [Figure: network diagram with nodes labeled Midd, UMaine, Dartmouth, and MIT]
7 Routing Instability
- The rapid change of network reachability and topology information
- (Labovitz, Malan and Jahanian, 1997)
- Effects
- Increased packet loss
- Delay of network convergence
- Increase in resource overhead
- Known causes
- Link and router failure
- New computers and networks
- Traffic congestion
- Poorly implemented protocols
- Timer synchronization
- Instability is not well understood
- Huge volume of inter-domain routing (IDR) traffic
8 Border Gateway Protocol (1 of 2)
- Algorithm
- 1. Learn neighbors
- 2. Share reachability information with neighbors
- 3. Continue sharing updated reachability information
- Message types
- Keep-alive
- Update
- Timers
- Determining (non-)existence of neighbors
- Managing flow of updates
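The role of timers in determining whether a neighbor still exists can be sketched as below. This is a simplified illustration, not a full BGP session state machine; the class name and the 90-second and 30-second values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class NeighborSession:
    """Tracks whether one neighbor is still considered reachable."""
    hold_time: float = 90.0           # assumed: seconds of silence tolerated
    keepalive_interval: float = 30.0  # assumed: how often we send Keep-alives
    last_heard: float = 0.0
    last_sent: float = 0.0

    def on_message(self, now: float) -> None:
        """Any Keep-alive or Update from the neighbor restarts the hold timer."""
        self.last_heard = now

    def neighbor_alive(self, now: float) -> bool:
        """The neighbor is presumed gone once the hold time passes in silence."""
        return (now - self.last_heard) < self.hold_time

    def keepalive_due(self, now: float) -> bool:
        """True when it is time to send our own Keep-alive."""
        return (now - self.last_sent) >= self.keepalive_interval

    def on_keepalive_sent(self, now: float) -> None:
        self.last_sent = now
```

A session that last heard from its neighbor at t = 10 treats it as alive until just before t = 100, then declares it gone.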
9 [Figure: network diagram with nodes labeled Midd, UMaine, Dartmouth, and MIT, annotated with BGP labels]
10 Border Gateway Protocol (2 of 2)
- Evaluating routes
- No global metrics
- Configurable policies
- Consistent within each AS
- Decision Process
- Phase 1: calculate degree of preference
- Phase 2: select routes for forwarding table
- Phase 3: select routes for dissemination (to neighbors)
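The three phases of the Decision Process can be sketched as follows. This is a simplified illustration under stated assumptions: the `Route` fields, the `preference` and `exportable` policy callbacks, and the shorter-AS-path tie-break are all choices made for this example, not details given on the slide.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Route:
    prefix: str               # destination, e.g. "10.0.0.0/8"
    as_path: Tuple[int, ...]  # ASes traversed to reach the destination
    learned_from: str         # neighbor that advertised the route


def decision_process(routes, neighbors, preference, exportable):
    """Return (forwarding table, routes to advertise per neighbor)."""
    # Phase 1: calculate degree of preference using local policy.
    scored = [(preference(r), r) for r in routes]

    # Phase 2: per destination, select the most preferred route.
    # Tie-break on shorter AS path (an assumption of this sketch).
    best = {}
    for pref, route in scored:
        rank = (pref, -len(route.as_path))
        if route.prefix not in best or rank > best[route.prefix][0]:
            best[route.prefix] = (rank, route)
    forwarding = {prefix: route for prefix, (_, route) in best.items()}

    # Phase 3: select routes for dissemination to each neighbor,
    # never echoing a route back to the neighbor it came from.
    advertise = {
        nbr: [r for r in forwarding.values()
              if r.learned_from != nbr and exportable(r, nbr)]
        for nbr in neighbors
    }
    return forwarding, advertise
```

Because `preference` is an arbitrary local function, two ASes can rank the same routes differently, which is what "no global metrics" means in practice.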
11 Observed Pathological Behaviors of Inter-Domain Routing (1 of 2)
- Rate of change of forwarding table info
- Watch recent changes for repeats
- Excessive updates
- Updates only required when reachability changes
- Route flapping
- Route to destination rapidly changes paths / availability
- Possible cause: local instability
12 Observed Pathological Behaviors of Inter-Domain Routing (2 of 2)
- Route oscillation
- Looks like flapping, but periodic
- Additional possible cause: routing policies
- Periodic message bursts
- Known cause: timer synchronization
- Prevention: route flap damping and timer jitter
- Useful for estimating instability levels
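Route flap damping, named above as a prevention measure, can be sketched as a penalty that grows with each flap and decays exponentially over time. This is a minimal sketch; the class name and all parameter values are illustrative assumptions, not figures from the slides.

```python
from dataclasses import dataclass


@dataclass
class DampingState:
    """Per-route damping state with an exponentially decaying penalty."""
    half_life: float = 900.0        # assumed: seconds for penalty to halve
    flap_penalty: float = 1000.0    # assumed: penalty added per flap
    suppress_limit: float = 2000.0  # assumed: suppress above this penalty
    reuse_limit: float = 750.0      # assumed: reuse once below this penalty
    penalty: float = 0.0
    last_update: float = 0.0
    suppressed: bool = False

    def _decay(self, now: float) -> None:
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now

    def record_flap(self, now: float) -> None:
        """Decay the old penalty, add a new one, and suppress if too high."""
        self._decay(now)
        self.penalty += self.flap_penalty
        if self.penalty > self.suppress_limit:
            self.suppressed = True

    def is_suppressed(self, now: float) -> bool:
        """A suppressed route is reused only after the penalty decays enough."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False
        return self.suppressed
```

With these values, three flaps in quick succession suppress the route, and it stays suppressed until the penalty has decayed below the reuse limit.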
13 Overview
- PART I Background
- Complexity of large internetworks
- Internetworks and routing
- Routing instability
- The Border Gateway Protocol
- Observed pathological behaviors of inter-domain routing
- PART II Hypotheses
- Modeling routing instability
- Suspected causes
- Coping strategies
- PART III Investigation
- Network experimentation techniques
- Feasibility of simulating instability
- Model requirements
- Model implementation
- Identifying and measuring instability
14 Modeling Routing Instability
15 Suspected Causes (1 of 3)
- 1. Poor BGP implementation choices
- Some have already led to problems (e.g. no jitter)
- 2. BGP misconfiguration
- IGP/BGP interaction is complex and lossy
- 3. Admissible oscillation
- Result of interaction of valid BGP policies
- (Varadhan, Govindan, and Estrin, 1996)
- 4. BGP timer synchronization
- Just how bad is it to be synchronized?
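Admissible oscillation (item 3 above) can be illustrated with a toy model in which every policy is individually valid but the combination never settles. This is a hypothetical configuration invented for this sketch, not one taken from the cited work: three ASes each prefer reaching the destination through their clockwise neighbor over their own direct link, and all re-select routes in synchronous rounds.

```python
DEST = 0
# Hypothetical policy: each AS prefers the two-hop path through this neighbor.
NEXT = {1: 2, 2: 3, 3: 1}


def permitted_paths(node):
    """Paths the AS will accept, most preferred first."""
    return [(node, NEXT[node], DEST), (node, DEST)]


def step(state):
    """One synchronous round: each AS re-selects from its neighbor's last choice."""
    new_state = {}
    for node in NEXT:
        available = {(node, DEST)}  # the direct link is always up
        neighbor_path = state[NEXT[node]]
        if neighbor_path is not None:
            available.add((node,) + neighbor_path)
        new_state[node] = next(
            (p for p in permitted_paths(node) if p in available), None)
    return new_state


def run(rounds):
    """Return the route selections after each of the given number of rounds."""
    state = {node: None for node in NEXT}
    history = []
    for _ in range(rounds):
        state = step(state)
        history.append(state)
    return history
```

In this model the ASes alternate forever between "everyone direct" and "everyone via neighbor", even though no link fails and no router is misconfigured.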
16 Suspected Causes (2 of 3)
- 5. Link and router failure
- To what degree do they contribute?
- Find out what expected stability level is
- 6. Intra-domain routing instability
- BGP policies depending on IGP metrics
- 7. Traffic congestion
- Can cause connections to break
- Can contribute to self-synchronization
- 8. Changing network usage rates
- Instability varies in proportion
- May only be because of associated congestion
17 Suspected Causes (3 of 3)
- 9. Other causes
- Look for general signs of instability, trace back
18 Coping Strategies (1 of 2)
- 1. Timer jittering adjustments
- Is current jittering good enough?
- Amount of randomness needed surprisingly high
- (Floyd and Jacobson, 1994)
- Periodic bursts still exist
- (Labovitz, Malan, and Jahanian, 1999)
- 2. Timers independent of events
- Decrease chance of synchronization
- Possible alternative to jittering
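Timer jittering can be sketched as scaling each timer interval by a random factor, so that routers that start in lockstep drift apart. This is a minimal sketch; the function names and the default range of 0.75 to 1.0 are assumptions made for this example, and the range is exposed as a parameter because the slide questions how much randomness is enough.

```python
import random


def jittered_interval(base, low=0.75, high=1.0, rng=random):
    """Scale a timer interval by a random factor drawn from [low, high]."""
    return base * rng.uniform(low, high)


def fire_times(base, count, low, high, rng):
    """Return the first `count` expiry times of a repeating jittered timer."""
    now, times = 0.0, []
    for _ in range(count):
        now += jittered_interval(base, low, high, rng)
        times.append(now)
    return times
```

With `low == high == 1.0` (no jitter), two routers with the same base interval fire at identical times indefinitely; with a nonzero range and independent random sources, their expiry times diverge.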
19 Coping Strategies (2 of 2)
- 3. Outgoing route flap damping
- Incoming damping prevents propagation, not origination
- Could prevent internal routing instability
- 4. Hierarchical network layout
- Internet is becoming less and less hierarchical
- Makes aggregation more effective
20 Overview
- PART I Background
- Complexity of large internetworks
- Internetworks and routing
- Routing instability
- The Border Gateway Protocol
- Observed pathological behaviors of inter-domain routing
- PART II Hypotheses
- Modeling routing instability
- Suspected causes
- Coping strategies
- PART III Investigation
- Network experimentation techniques
- Feasibility of simulating instability
- Model requirements
- Model implementation
- Identifying and measuring instability
21 Network Experimentation Techniques
- Using network testbeds
- TCP congestion avoidance (Jacobson and Karels, 1988)
- Packet drop strategies (Villamizar and Song, 1994)
- Gathering trace data for analysis
- Logging BGP messages (Chinoy, 1993; Labovitz, Malan, and Jahanian, 1997; Govindan and Reddy, 1997)
- Using simulation
- ns (LBNL), TeD (Perumalla), home-grown
- Advantages: detail, controllability, repeatability
22 Feasibility of Simulating Instability (1 of 3)
- Boils down to two factors
- Can we build a model with enough detail?
- Effort and careful planning
- SSFNet component repository
- Do we have powerful enough tools to simulate such a model in a reasonable amount of time?
- Parallelization introduced potential for big speed increases
- SSF
23 Feasibility of Simulating Instability (2 of 3)
- Scalable Simulation Framework (SSF)
- Generalized framework for parallel simulation
- Two primary implementations, in Java and C++
- JSSF
- Java
- Large network component repository (SSFNet)
- Many large, detailed models simulated
- Room for improvement in performance
- DaSSF
- C++
- Has simulated huge models
- Fewer detailed components implemented
- Performance is excellent
24 Feasibility of Simulating Instability (3 of 3)
- What is fast enough?
- DaSSF: 80,000 nodes at 1,000,000 packet events/sec on 14 processors
- Estimate: at max 100 ASes, 3 hours
- => 100,000 nodes, 100 billion packet events
- At 1 million packet events/sec => 1 day
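The run-time estimate above can be checked with simple arithmetic, using only the two figures given on the slide (100 billion packet events at 1 million packet events per second). The function name is an assumption of this sketch.

```python
def wall_clock_seconds(total_events, events_per_second):
    """Time needed to process a number of events at a fixed rate."""
    return total_events / events_per_second


seconds = wall_clock_seconds(100_000_000_000, 1_000_000)
hours = seconds / 3600
days = hours / 24
print(f"{seconds:.0f} s = {hours:.1f} h = {days:.2f} days")
```

The result is 100,000 seconds, or a little under 28 hours, which is slightly more than one day of wall-clock time.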
25 Model Requirements (1 of 3)
- General principle: include as much as possible
- 1. Large enough topology
- Extrapolation from small models may not be accurate
- 2. Representative topology
- Lack of a clean hierarchy has great effect on routing dynamics
- 3. Routers must implement standard congestion avoidance algorithms
- Congestion can greatly alter traffic dynamics and affect routing
26 Model Requirements (2 of 3)
- 4. BGP must be fully compliant and fully configurable
- Configurable => can vary routing policies (increase heterogeneity)
- 5. TCP must be fully compliant and fully configurable
- Intricacies are a prime suspect
- 6. Realistic traffic model
- (Willinger et al., 1993 and 1998)
- Yields realistic congestion
27 Model Requirements (3 of 3)
- 7. Realistic router and link behavior
- Router buffering and latency
- Link bandwidth and delay
- 8. Intra-domain routing protocol implementation
- Some configurations may actually affect inter-domain routing
- 9. Model must allow for heterogeneous configuration
- Network components of the same type with different characteristics
- 10. Model must imitate typical network usage fluctuations
- Instability is known to fluctuate in similar patterns
28 Model Implementation: Suspected Causes (1 of 2)
- 1. Poor BGP implementation choices
- 2. BGP misconfiguration
- 3. Admissible oscillation
- 4. BGP timer synchronization
- We can turn off jitter
29 Model Implementation: Suspected Causes (2 of 2)
- 5. Link and router failure
- 6. Intra-domain routing instability
- Modify OSPF to alternate between exit points
- 7. Traffic congestion
- Increase number of clients and/or connection rates
- 8. Changing network usage rates
- Modified traffic clients
30 Model Implementation: Coping Strategies
- 1. Timer jittering adjustments
- Modify already existing jitter algorithm
- 2. Timers independent of events
- Don't reset timers when message events arrive
- 3. Outgoing route flap damping
- Use same method as for incoming damping
- 4. Hierarchical network layout
- Just use DML
31 Identifying and Measuring Instability
- Showing existence
- Watch forwarding table changes
- Look for pathological behaviors
- Identifying the cause
- Trace back from pathological behaviors and congestion
- Choose thresholds for each behavior
- Repeat simulation, observe more closely
- Measuring
- Count forwarding table changes
- Count occurrences of pathological behaviors
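The measurement steps above (counting forwarding table changes and applying a threshold per behavior) can be sketched as follows. This is one possible approach, not the proposal's stated method; the event format, function names, and the sliding-window definition of flapping are assumptions made for this example.

```python
from collections import Counter, defaultdict


def count_changes(events):
    """Count forwarding table changes per destination prefix.

    events: iterable of (timestamp, prefix) pairs, one per table change.
    """
    return Counter(prefix for _, prefix in events)


def flapping_prefixes(events, window, threshold):
    """Return prefixes that changed at least `threshold` times
    within some span of `window` seconds."""
    times = defaultdict(list)
    for timestamp, prefix in events:
        times[prefix].append(timestamp)

    flagged = set()
    for prefix, stamps in times.items():
        stamps.sort()
        start = 0
        for end in range(len(stamps)):
            # Shrink the window from the left until it spans <= `window`.
            while stamps[end] - stamps[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                flagged.add(prefix)
                break
    return flagged
```

Choosing `window` and `threshold` is the "choose thresholds for each behavior" step: a prefix with the same number of changes spread over a long period is not flagged.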
32 Summary
- Routing instability is not well understood
- Hypotheses
- 1. We can model instability
- 2. Suspected causes
- 3. Coping strategies
- Investigation
- Simulation of detailed models
- Required model attributes
- Measuring instability
33 Time Line