Investigating the Causes of Inter-Domain Routing Instability


1
Investigating the Causes of Inter-Domain Routing Instability
  • Thesis Proposal
  • BJ Premore
  • March 1, 2000
  • Thesis Committee
  • David Nicol, Dartmouth College (Adviser)
  • Javed Aslam, Dartmouth College
  • Thomas Cormen, Dartmouth College
  • Andy Ogielski, DIMACS, Rutgers University

2
Overview
  • PART I: Background
  • Complexity of large internetworks
  • Internetworks and routing
  • Routing instability
  • The Border Gateway Protocol
  • Observed pathological behaviors of inter-domain
    routing
  • PART II: Hypotheses
  • Modeling routing instability
  • Suspected causes
  • Coping strategies
  • PART III: Investigation
  • Network experimentation techniques
  • Feasibility of simulating instability
  • Model requirements
  • Model implementation
  • Identifying and measuring instability

3
The Complexity of Large Internetworks
  • Heterogeneity
  • Hardware, protocols
  • Protocol flavors and implementation variations
  • Size
  • Rare events aren't so rare
  • Complexity of components
  • Congestion avoidance in data transfer protocols
  • Arbitrary routing protocol policies

4
Internetworks and Routing
  • Router
  • Forwards packets
  • Forwarding
  • Using a lookup table to forward packets
  • Routing
  • Building and maintaining forwarding tables
  • Autonomous system (domain)
  • Set of routers under single technical
    administration
  • Two-level routing hierarchy
  • Intra-domain
  • Inter-domain
  • Border Gateway Protocol (BGP)
  • Inter-domain routing
  • de facto standard in the Internet
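
The forwarding step described above (using a lookup table to forward packets) can be sketched as follows. This is a minimal illustration, assuming longest-prefix matching over IPv4 prefixes; the table contents and next-hop labels are hypothetical and do not come from the slides.

```python
import ipaddress

# Hypothetical forwarding table: destination prefix -> next-hop label.
FORWARDING_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream",     # default route
    ipaddress.ip_network("10.0.0.0/8"): "router-a",
    ipaddress.ip_network("10.1.0.0/16"): "router-b",
}


def forward(destination: str) -> str:
    """Return the next hop of the longest prefix that contains destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in FORWARDING_TABLE if addr in net]
    # The default route matches every address, so matches is never empty.
    best = max(matches, key=lambda net: net.prefixlen)
    return FORWARDING_TABLE[best]
```

Routing, in the slide's terms, is whatever process fills and updates `FORWARDING_TABLE`; forwarding only reads it.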

5
[Figure: Dartmouth shown as an autonomous system (AS), with links to Middlebury, UMaine, and MIT]

6
[Figure: network diagram of Midd, UMaine, Dartmouth, and MIT]

7
Routing Instability
  • "the rapid change of network reachability and
    topology information"
  • (Labovitz, Malan, and Jahanian, 1997)
  • Effects
  • Increased packet loss
  • Delay of network convergence
  • Increase in resource overhead
  • Known causes
  • Link and router failure
  • New computers and networks
  • Traffic congestion
  • Poorly implemented protocols
  • Timer synchronization
  • Instability is not well-understood
  • Huge volume of IDR traffic

8
Border Gateway Protocol (1 of 2)
  • Algorithm
  • 1. Learn neighbors
  • 2. Share reachability information with neighbors
  • 3. Continue sharing updated reachability
    information
  • Message types
  • Keep-alive
  • Update
  • Timers
  • Determining (non-)existence of neighbors
  • Managing flow of updates
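
The timer bullet on determining the (non-)existence of neighbors can be illustrated with a small sketch. It assumes a hold-timer style check in which any received message (keep-alive or update) counts as proof of life; the class name and the 90-second value in the usage note are illustrative choices, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class NeighborSession:
    """Tracks whether a neighbor has been heard from recently enough."""
    hold_time: float          # seconds of silence tolerated (illustrative)
    last_heard: float = 0.0   # time of the most recent message

    def on_message(self, now: float) -> None:
        # Keep-alive and update messages both refresh the session.
        self.last_heard = now

    def is_alive(self, now: float) -> bool:
        # The neighbor is presumed gone once the hold time has elapsed.
        return (now - self.last_heard) < self.hold_time
```

For example, a session created with `hold_time=90.0` that last heard a message at `t=10` is considered alive at `t=50` and gone at `t=100`.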

9
[Figure: BGP sessions among Midd, UMaine, Dartmouth, and MIT]

10
Border Gateway Protocol (2 of 2)
  • Evaluating routes
  • No global metrics
  • Configurable policies
  • Consistent within each AS
  • Decision Process
  • Phase 1: calculate degree of preference
  • Phase 2: select routes for forwarding table
  • Phase 3: select routes for dissemination (to
    neighbors)
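
The three phases of the decision process can be sketched roughly as below. This is a simplified model, not a compliant BGP implementation: the `Route` fields, the `preference` and `may_export` policy callbacks, and the tie-breaking rule (first route seen wins) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str                 # destination, e.g. "192.0.2.0/24"
    as_path: Tuple[int, ...]    # AS numbers, nearest first
    neighbor: str               # neighbor the route was learned from


def decision_process(
    routes: Iterable[Route],
    preference: Callable[[Route], int],
    may_export: Callable[[Route, str], bool],
    neighbors: Iterable[str],
) -> Tuple[Dict[str, Route], Dict[str, List[Route]]]:
    # Phase 1: calculate a degree of preference for every learned route.
    ranked = [(preference(route), route) for route in routes]

    # Phase 2: per destination, keep the most preferred route.
    best: Dict[str, Tuple[int, Route]] = {}
    for pref, route in ranked:
        if route.prefix not in best or pref > best[route.prefix][0]:
            best[route.prefix] = (pref, route)
    forwarding = {prefix: route for prefix, (_, route) in best.items()}

    # Phase 3: choose which selected routes to announce to each neighbor.
    announcements = {
        n: [r for r in forwarding.values() if may_export(r, n)]
        for n in neighbors
    }
    return forwarding, announcements
```

Because there are no global metrics, `preference` and `may_export` stand in for each AS's configurable policy. A policy such as `lambda r: -len(r.as_path)` prefers shorter AS paths, and `lambda r, n: r.neighbor != n` avoids announcing a route back to the neighbor it came from.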

11
Observed Pathological Behaviors of Inter-Domain Routing (1 of 2)
  • Rate of change of forwarding table info
  • Watch recent changes for repeats
  • Excessive updates
  • Updates only required when reachability changes
  • Route flapping
  • Route to destination rapidly changes paths /
    availability
  • Possible cause: local instability

12
Observed Pathological Behaviors of Inter-Domain Routing (2 of 2)
  • Route oscillation
  • Looks like flapping, but periodic
  • Additional possible cause: routing policies
  • Periodic message bursts
  • Known cause: timer synchronization
  • Prevention: route flap damping and timer jitter
  • Useful for estimating instability levels
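
Route flap damping, named above as a prevention technique, is commonly described as a per-route penalty that grows with each flap and decays exponentially over time. The sketch below follows that general idea; the class name and all numeric parameters (half-life, penalty per flap, suppress and reuse thresholds) are illustrative assumptions, not values from the slides.

```python
class FlapDamper:
    """Per-route penalty with exponential decay and suppress/reuse thresholds."""

    def __init__(self, half_life=900.0, penalty_per_flap=1000.0,
                 suppress_at=2000.0, reuse_below=750.0):
        self.half_life = half_life
        self.penalty_per_flap = penalty_per_flap
        self.suppress_at = suppress_at
        self.reuse_below = reuse_below
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Halve the penalty for every half_life seconds that have passed.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now

    def record_flap(self, now):
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty >= self.suppress_at:
            self.suppressed = True

    def is_suppressed(self, now):
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_below:
            self.suppressed = False
        return self.suppressed
```

With these example values, three flaps ten seconds apart push the penalty past the suppress threshold, and the route becomes usable again only after roughly two half-lives of quiet.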

13
Overview
  • PART I: Background
  • Complexity of large internetworks
  • Internetworks and routing
  • Routing instability
  • The Border Gateway Protocol
  • Observed pathological behaviors of inter-domain
    routing
  • PART II: Hypotheses
  • Modeling routing instability
  • Suspected causes
  • Coping strategies
  • PART III: Investigation
  • Network experimentation techniques
  • Feasibility of simulating instability
  • Model requirements
  • Model implementation
  • Identifying and measuring instability

14
Modeling Routing Instability
  • We can

15
Suspected Causes (1 of 3)
  • 1. Poor BGP implementation choices
  • Some have already led to problems (e.g. no
    jitter)
  • 2. BGP misconfiguration
  • IGP/BGP interaction is complex and lossy
  • 3. Admissible oscillation
  • Result of interaction of valid BGP policies
  • (Varadhan, Govindan, and Estrin, 1996)
  • 4. BGP timer synchronization
  • Just how bad is it to be synchronized?
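
Admissible oscillation (item 3) can be illustrated with a toy configuration. The sketch below is a hypothetical three-AS example constructed for this note, not taken from the cited paper: each AS may reach the destination directly or through its clockwise neighbor, prefers the neighbor route, and may use it only while that neighbor routes directly. Each policy is valid on its own, yet no combination of choices is stable.

```python
from itertools import product

NODES = (1, 2, 3)
NEXT = {1: 2, 2: 3, 3: 1}   # each AS's clockwise neighbor (hypothetical)


def best_choice(node, choices):
    """Pick the preferred permitted route given the other ASes' choices.

    'via' (through the clockwise neighbor) is preferred, but it is only
    available while that neighbor itself routes directly.
    """
    return "via" if choices[NEXT[node]] == "direct" else "direct"


def stable_states():
    """Brute-force search for assignments where no AS wants to change."""
    stable = []
    for combo in product(("direct", "via"), repeat=len(NODES)):
        choices = dict(zip(NODES, combo))
        if all(choices[n] == best_choice(n, choices) for n in NODES):
            stable.append(choices)
    return stable


def simulate(rounds):
    """Synchronous rounds: every AS re-evaluates at the same time."""
    choices = {n: "direct" for n in NODES}
    history = [tuple(choices[n] for n in NODES)]
    for _ in range(rounds):
        choices = {n: best_choice(n, choices) for n in NODES}
        history.append(tuple(choices[n] for n in NODES))
    return history
```

`stable_states()` finds no stable assignment, and `simulate` alternates between "all direct" and "all via" indefinitely. The synchronous update schedule is a simplification; real routers react at different times.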

16
Suspected Causes (2 of 3)
  • 5. Link and router failure
  • To what degree do they contribute?
  • Find out what the expected stability level is
  • 6. Intra-domain routing instability
  • BGP policies depending on IGP metrics
  • 7. Traffic congestion
  • Can cause connections to break
  • Can contribute to self-synchronization
  • 8. Changing network usage rates
  • Instability varies in proportion
  • May only be because of associated congestion

17
Suspected Causes (3 of 3)
  • 9. Other causes
  • Look for general signs of instability, trace back

18
Coping Strategies (1 of 2)
  • 1. Timer jittering adjustments
  • Is current jittering good enough?
  • Amount of randomness needed surprisingly high
  • (Floyd and Jacobson, 1994)
  • Periodic bursts still exist
  • (Labovitz, Malan, and Jahanian, 1999)
  • 2. Timers independent of events
  • Decrease chance of synchronization
  • Possible alternative to jittering
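
Timer jittering (strategy 1) amounts to scaling each timer interval by a random factor so that routers started together do not keep firing together. A minimal sketch follows; the function names are hypothetical, and the default range of 0.75 to 1.0 is an assumed setting used for illustration, which the slides do not specify.

```python
import random


def jittered_interval(base_seconds: float, rng: random.Random,
                      low: float = 0.75, high: float = 1.0) -> float:
    """Scale a timer interval by a factor drawn uniformly from [low, high]."""
    return base_seconds * rng.uniform(low, high)


def next_firing_times(base_seconds, routers, rng, jitter=True):
    """First firing time of one timer on each router, all started at t=0."""
    return [
        jittered_interval(base_seconds, rng) if jitter else base_seconds
        for _ in range(routers)
    ]
```

Without jitter every router fires at exactly `base_seconds`; with jitter the firing times spread across the chosen range. Whether that spread is wide enough to prevent synchronization is exactly the question this slide raises.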

19
Coping Strategies (2 of 2)
  • 3. Outgoing route flap damping
  • Incoming damping prevents propagation, not
    origination
  • Could prevent internal routing instability
  • 4. Hierarchical network layout
  • Internet is becoming less and less hierarchical
  • Makes aggregation more effective

20
Overview
  • PART I: Background
  • Complexity of large internetworks
  • Internetworks and routing
  • Routing instability
  • The Border Gateway Protocol
  • Observed pathological behaviors of inter-domain
    routing
  • PART II: Hypotheses
  • Modeling routing instability
  • Suspected causes
  • Coping strategies
  • PART III: Investigation
  • Network experimentation techniques
  • Feasibility of simulating instability
  • Model requirements
  • Model implementation
  • Identifying and measuring instability

21
Network Experimentation Techniques
  • Using network testbeds
  • TCP congestion avoidance (Jacobson and Karels,
    1988)
  • Packet drop strategies (Villamizar and Song,
    1994)
  • Gathering trace data for analysis
  • Logging BGP messages (Chinoy, 1993;
    Labovitz, Malan, and Jahanian, 1997;
    Govindan and Reddy, 1997)
  • Using simulation
  • ns (LBNL), TeD (Perumalla), home grown
  • Advantages: detail, controllability, repeatability

22
Feasibility of Simulating Instability (1 of 3)
  • Boils down to two factors
  • Can we build a model with enough detail?
  • Effort and careful planning
  • SSFNet component repository
  • Do we have powerful enough tools to simulate such
    a model in a reasonable amount of time?
  • Parallelization introduced potential for big
    speed increases
  • SSF

23
Feasibility of Simulating Instability (2 of 3)
  • Scalable Simulation Framework (SSF)
  • Generalized framework for parallel simulation
  • Two primary implementations, in Java and C++
  • JSSF
  • Java
  • Large network component repository (SSFNet)
  • Many large, detailed models simulated
  • Room for improvement in performance
  • DaSSF
  • C++
  • Has simulated huge models
  • Fewer detailed components implemented
  • Performance is excellent

24
Feasibility of Simulating Instability (3 of 3)
  • What is fast enough?
  • DaSSF: 80,000 nodes at 1,000,000 packet
    events/sec on 14 processors
  • Estimate, at max: 100 ASes, 3 hours
  • => 100,000 nodes, 100 billion packet events
  • At 1 million packet events/sec => 1 day
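
The run-time estimate on this slide is a single division, worked through below. The function name is hypothetical; the two input figures are the ones quoted on the slide.

```python
def simulation_wall_clock_seconds(packet_events: float,
                                  events_per_second: float) -> float:
    """Wall-clock time needed to process a given number of packet events."""
    return packet_events / events_per_second


# Figures from the slide: 100 billion events at 1 million events/sec.
seconds = simulation_wall_clock_seconds(100e9, 1e6)
hours = seconds / 3600
days = hours / 24
```

This gives 100,000 seconds, or roughly 28 hours, which is a little over one day of wall-clock time for the largest model considered.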

25
Model Requirements (1 of 3)
  • General principle: include as much as possible
  • 1. Large enough topology
  • Extrapolation from small models may not be
    accurate
  • 2. Representative topology
  • Lack of a clean hierarchy has great effect on
    routing dynamics
  • 3. Routers must implement standard congestion
    avoidance algorithms
  • Congestion can greatly alter traffic dynamics and
    affect routing

26
Model Requirements (2 of 3)
  • 4. BGP must be fully compliant and fully
    configurable
  • Configurable => can vary routing policies
    (increase heterogeneity)
  • 5. TCP must be fully compliant and fully
    configurable
  • Intricacies are a prime suspect
  • 6. Realistic traffic model
  • (Willinger et al., 1993 and 1998)
  • Yields realistic congestion

27
Model Requirements (3 of 3)
  • 7. Realistic router and link behavior
  • Router buffering and latency
  • Link bandwidth and delay
  • 8. Intra-domain routing protocol implementation
  • Some configurations may actually affect
    inter-domain routing
  • 9. Model must allow for heterogeneous
    configuration
  • Network components of the same type with
    different characteristics
  • 10. Model must imitate typical network usage
    fluctuations
  • Instability is known to fluctuate in similar
    patterns

28
Model Implementation: Suspected Causes (1 of 2)
  • 1. Poor BGP implementation choices
  • 2. BGP misconfiguration
  • 3. Admissible oscillation
  • 4. BGP timer synchronization
  • We can turn off jitter

29
Model Implementation: Suspected Causes (2 of 2)
  • 5. Link and router failure
  • 6. Intra-domain routing instability
  • Modify OSPF to alternate between exit points
  • 7. Traffic congestion
  • Increase number of clients and/or connection
    rates
  • 8. Changing network usage rates
  • Modified traffic clients

30
Model Implementation: Coping Strategies
  • 1. Timer jittering adjustments
  • Modify already existing jitter algorithm
  • 2. Timers independent of events
  • Don't reset timers when message events arrive
  • 3. Outgoing route flap damping
  • Use same method as for incoming damping
  • 4. Hierarchical network layout
  • Just use DML

31
Identifying and Measuring Instability
  • Showing existence
  • Watch forwarding table changes
  • Look for pathological behaviors
  • Identifying the cause
  • Trace back from pathological behaviors and
    congestion
  • Choose thresholds for each behavior
  • Repeat simulation, observe more closely
  • Measuring
  • Count forwarding table changes
  • Count occurrences of pathological behaviors
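
The measuring steps above (count forwarding table changes, watch for repeats, apply a threshold per behavior) could be sketched as follows. The log format `(time, prefix, next_hop)`, the function name, and the default `window` and `threshold` values are assumptions made for illustration; the slides do not define them.

```python
from collections import Counter, defaultdict, deque


def measure_instability(changes, window=4, threshold=3):
    """Summarize a log of forwarding-table changes.

    changes: iterable of (time, prefix, next_hop) tuples in time order.
    Returns per-prefix change counts and the sorted list of prefixes whose
    next hop returned to a recently used value at least `threshold` times.
    """
    change_counts = Counter()
    repeats = Counter()
    recent = defaultdict(lambda: deque(maxlen=window))

    for _time, prefix, next_hop in changes:
        change_counts[prefix] += 1
        # A change back to a recently used next hop is a sign of flapping.
        if next_hop in recent[prefix]:
            repeats[prefix] += 1
        recent[prefix].append(next_hop)

    flapping = sorted(p for p, n in repeats.items() if n >= threshold)
    return change_counts, flapping
```

A prefix that alternates between two next hops is flagged after a few cycles, while a prefix that changes once is only counted. Telling periodic oscillation apart from irregular flapping would additionally need the timestamps, which this sketch ignores.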

32
Summary
  • Routing instability is not well-understood
  • Hypotheses
  • 1. We can model instability
  • 2. Suspected causes
  • 3. Coping strategies
  • Investigation
  • Simulation of detailed models
  • Required model attributes
  • Measuring instability

33
Time Line