Title: Congestion Avoidance Control for OSPF Networks
1  Congestion Avoidance Control for OSPF Networks
   (draft-ash-manral-ospf-congestion-control-00.txt)
Anurag Maunder, Sanera Systems, amaunder_at_sanera.net
Jerry Ash, AT&T, gash_at_att.com
Gagan Choudhury, AT&T, choudhury_at_att.com
Vera Sapozhnikova, AT&T, sapozhnikova_at_att.com
Vishwas Manral, NetPlane Systems, vishwasm_at_netplane.com
Mostafa Hashem Sherif, AT&T, mhs_at_att.com
2  Outline (draft-ash-manral-ospf-congestion-control-00.txt)
- problem
  - concerns over scalability of IGP link-state protocols (e.g., OSPF)
  - much evidence that LS protocols cannot recover from large failures with widespread loss of topology database information
    - failure experience
    - vendor analysis
    - simulation modeling
- propose protocol mechanisms to address problem
  - throttle LSA updates/retransmissions
  - detect & notify congestion state
  - neighbor nodes throttle LSA updates/retransmissions
  - keep adjacencies up
  - database backup & resynchronization
- proprietary implementations of mechanisms have improved scalability/stability
  - need standard features for uniform implementation & interoperability
- issues discussed on list
3  Background & Motivation
- failure experience
  - LS routing protocols cannot recover from large flooding storms
  - triggered by a wide range of causes: network failures, bugs, operational errors, etc.
  - flooding storm overwhelms processors, causes database asynchrony, incorrect shortest path calculation, etc.
  - AT&T has experienced several very large LS protocol failures (4/13/1998, 7/2000, 2/20/2001, described in I-D)
- vendor analysis of LS protocol recovery from total network failure (loss of all database information in the specified scenario, 400 nodes, etc.)
  - recovery time estimates up to 5.5 hours
  - expectation is that vendor equipment recovery not adequate under large failure scenario
- network-wide event simulation model [choudhury]
  - medium to large flooding storms cause network to recover with difficulty and/or not recover at all
  - model validated -- results match actual network experience
4  Failure Experience: AT&T Frame Relay Network, 4/13/98
- cause & effect
  - administrative error coupled with a software bug
  - result was the loss of all topology database information
  - the link-state protocol then attempted to recover the database with the usual Hello & topology state updates (TSUs)
  - huge overload of control messages kept network down for a very long time
- several problems occurred to prevent the network from recovering properly (based on root-cause analysis)
  - very large number of TSUs sent to every node to process, causing general processor overload
  - route computation based on incomplete topology recovery: routes generated based on transient, asynchronous topology information, then in need of frequent re-computation
  - inadequate work queue management to allow processes to complete before more work is put into the process queue
  - inability to access node processors with network management commands due to lack of necessary priority for these messages
- worked with vendor to make protocol fixes to address problems
  - along the lines suggested in the I-D
5  Proposed Protocol Mechanisms: Throttle LSA Updates/Retransmissions
- detect node congestion by
  - length of internal work queues
  - high processor occupancy / long CPU busy times
- notify congestion state to other nodes
  - use TBD packet to convey congestion signal
- when a node detects congestion from a neighbor
  - progressively decrease flooding rate (see the sketch after this slide), e.g.
    - double LSA_RETRANSMIT_INTERVAL for low congestion
    - quadruple LSA_RETRANSMIT_INTERVAL for high congestion
- simulation analysis shows proposed mechanisms perform effectively [choudhury]
  - deals better with non-linear failure modes than statistical detection/notification methods
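
A minimal sketch of the throttling response described above, assuming a two-level congestion signal; the base interval, detection thresholds, and signal encoding are illustrative assumptions (the draft leaves the signalling packet TBD):

    from enum import Enum

    LSA_RETRANSMIT_INTERVAL = 5.0  # seconds; base value assumed for illustration

    class CongestionLevel(Enum):
        NONE = 0
        LOW = 1
        HIGH = 2

    def local_congestion(queue_len: int, cpu_busy: float) -> CongestionLevel:
        """Detect local congestion from work-queue length and CPU occupancy,
        the two indicators named on the slide; thresholds are hypothetical."""
        if queue_len > 1000 or cpu_busy > 0.95:
            return CongestionLevel.HIGH
        if queue_len > 200 or cpu_busy > 0.80:
            return CongestionLevel.LOW
        return CongestionLevel.NONE

    def retransmit_interval(neighbor_congestion: CongestionLevel) -> float:
        """Back off LSA retransmissions toward a congested neighbor:
        double the interval for low congestion, quadruple it for high."""
        if neighbor_congestion is CongestionLevel.HIGH:
            return 4 * LSA_RETRANSMIT_INTERVAL
        if neighbor_congestion is CongestionLevel.LOW:
            return 2 * LSA_RETRANSMIT_INTERVAL
        return LSA_RETRANSMIT_INTERVAL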
6  Issues Discussed on List
- is there a problem (need to prevent catastrophic network collapse)?
  - most seem to agree there is a problem
  - several have observed LSA storms & their ill effects
    - storms triggered by hardware failure, software bug, faulty operational practice, etc. -- many different events
    - sometimes network cannot recover
    - unacceptable to operators
  - vendors invited to analyze failure scenario given in draft
    - no response yet
- how to solve the problem?
  - better/smarter implementation/coding of protocol within current specification
    - e.g., never losing an adjacency solves problem
    - these are proprietary, single-vendor implementation extensions
  - standard protocol extensions
    - for uniform implementation
    - for multi-vendor interoperability
    - already demonstrated with proprietary, single-vendor implementations
7  Issues Discussed on List
- what protocol extensions?
  - not just a congestion-signaling message on the wire but also the response
  - need uniform response to congestion signal ("slow down by this much") to be effective
    - rather than implementation-dependent response
  - like helper router response to grace LSA from congested router in hitless restart
- how to evaluate effectiveness of proposals?
  - expert analysis based on experience
  - simulation
    - a couple of "academic, shaky simulation" comments
    - validated simulations used widely
      - for network design of routing features, NM features, congestion control, etc.
      - for many years
      - many large-scale network design examples (e.g., Dynamic Routing in Telecommunications Networks, McGraw-Hill)
  - white-box approach
    - implement text in the lab
  - expert analysis, simulation, white-box all useful
8  Issues Discussed at IETF-55: Routing Area Meeting & MPLS WG Meeting
- box builders' view
  - stop intruding into our box
  - design choices should be made by box builders
  - nothing wrong with current way of building boxes
- box users' view
  - still observe major failures
  - most agree there is a problem (from list discussion)
  - box-builder/vendor analysis shows unacceptable failure response (in draft)
  - box builders/vendors invited to analyze scenario in draft
  - box builders' approach doesn't work to prevent failures
  - boxes need a few critical, standard protocol mechanisms to address problem
  - have gotten vendors to make proprietary changes to fix problem
  - require standard protocol extensions
    - for uniform implementation
    - for multi-vendor interoperability
  - user requirements need to drive solution to problem
9  Conclusions
- problem
  - concerns over scalability of IGP link-state protocols
  - evidence that LS routing protocols (e.g., OSPF) currently cannot recover from large failures with widespread loss of topology database information
  - problem is flooding, database asynchrony, shortest path calculation, etc.
  - evidence based on failure experience, vendor analysis, simulation modeling
- propose protocol mechanisms to address problem, e.g.
  - throttle LSA updates/retransmissions
  - detect & notify congestion state
  - neighbor nodes throttle LSA updates/retransmissions
- simulation analysis shows effectiveness of proposed changes [choudhury]
- propose draft as an OSPF WG document
  - refine/evolve proposed protocol extensions
10  Backup Slides
11  Proposed Congestion Control Mechanisms
- throttle LSA updates/retransmissions
  - detect & notify congestion state
  - congested node signals other nodes to limit rate of LSA messages sent to it
  - neighbor nodes throttle LSA updates/retransmissions
    - automatically reduce rate under congestion
- keep adjacencies up
- database backup & resynchronization
  - topology database automatically recovered from loss based on local backup mechanisms
  - allows a node to recover gracefully from local faults on the node
- prioritized processing of Hello & LSA Ack messages (Choudhury draft)
12  Keep Adjacencies Up
- increase adjacency break interval under congestion
  - goal is to avoid breaking adjacencies by increasing the wait interval for non-receipt of Hello messages
  - if node detects congestion from a neighbor & no packet is received within NODE_DEAD_INTERVAL
    - wait an additional ADJACENCY_BREAK_INTERVAL before calling the adjacency down
- throttle setup of link adjacencies
  - define MAX_ADJACENCY_BUILD_COUNT: maximum number of adjacencies a node can bring up at one time
(a sketch of both rules follows this slide)
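
An illustrative sketch of the two rules above; NODE_DEAD_INTERVAL, ADJACENCY_BREAK_INTERVAL, and MAX_ADJACENCY_BUILD_COUNT are named on the slide, but the concrete values below are assumptions:

    NODE_DEAD_INTERVAL = 40.0         # seconds of Hello silence before normal teardown (assumed)
    ADJACENCY_BREAK_INTERVAL = 120.0  # extra grace period when the neighbor is congested (assumed)
    MAX_ADJACENCY_BUILD_COUNT = 4     # max adjacencies in bring-up at one time (assumed)

    def adjacency_is_down(hello_silence: float, neighbor_congested: bool) -> bool:
        """Declare the adjacency down only after the dead interval, extended by
        ADJACENCY_BREAK_INTERVAL when the neighbor has signalled congestion."""
        limit = NODE_DEAD_INTERVAL
        if neighbor_congested:
            limit += ADJACENCY_BREAK_INTERVAL
        return hello_silence > limit

    def may_start_adjacency(currently_building: int) -> bool:
        """Throttle adjacency setup: never bring up more than
        MAX_ADJACENCY_BUILD_COUNT adjacencies at the same time."""
        return currently_building < MAX_ADJACENCY_BUILD_COUNT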
13  Database Backup & Resynchronization
- database backup
  - node should provide a local, primary, nonvolatile memory backup [GR-472-CORE]
  - node should back up all non-self-originated LSAs, routing tables, states of interfaces
  - database should be backed up at least every 5 minutes
  - restoration of data should be completed within 5 minutes of initiation [GR-472-CORE]
  (a sketch of the backup schedule follows this slide)
- nodes signal neighbors when safe to perform resynchronization procedures
  - based on TBD packet format
- under resynchronization, node
  - should generate all its own LSAs
  - should receive only LSAs that have changed between the time it failed and the current time
  - should base its routing on the current database, derived as above
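
A rough sketch of the backup schedule under the requirements above; the pickle-based store, file path, and accessor callbacks are placeholders, not part of the draft:

    import pickle, time

    BACKUP_INTERVAL = 300  # seconds: "at least every 5 minutes"

    def backup_loop(get_lsdb, get_routing_table, get_interface_state,
                    router_id, path="/var/ospf/backup.bin"):
        """Periodically snapshot non-self-originated LSAs, routing tables,
        and interface state to local nonvolatile storage."""
        while True:
            snapshot = {
                # self-originated LSAs are excluded: the node regenerates
                # them itself after a restart
                "lsas": [l for l in get_lsdb()
                         if l["advertising_router"] != router_id],
                "routes": get_routing_table(),
                "interfaces": get_interface_state(),
                "timestamp": time.time(),
            }
            with open(path, "wb") as f:
                pickle.dump(snapshot, f)
            time.sleep(BACKUP_INTERVAL)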
14  Database Backup & Resynchronization
- database resynchronization
  - propose changes to receiving/transmitting database summary & LSA request packets when in Full state
    - node sends & receives database summary & LSA request packets as if performing database synchronization when the peer data structure is in the Negotiating, Exchanging, or Loading states
  - node informs neighbor when to use resync procedures
  - node supports resync on neighbor request by receiving/transmitting database summary & LSA request packets
(a sketch of the summary/request exchange follows this slide)
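
A hypothetical sketch of the summary/request exchange above, using plain dicts (LSA id mapped to sequence number) in place of real database summary and LSA request packets; the point is that a restarting node requests only the LSAs that changed while it was down, without the adjacency leaving Full state:

    def lsas_to_request(local_summary: dict, neighbor_summary: dict) -> list:
        """Given database summaries from both sides, return the LSA ids this
        node must request: those it lacks or holds with an older sequence
        number, i.e. only LSAs that changed between failure and now."""
        return [lsa_id for lsa_id, seq in neighbor_summary.items()
                if local_summary.get(lsa_id, -1) < seq]

    # Example: the restarting node recovered a(seq 3) and b(seq 7) from its
    # local backup; the neighbor summarizes a(3), b(9), c(1) -> request b, c.
    assert lsas_to_request({"a": 3, "b": 7},
                           {"a": 3, "b": 9, "c": 1}) == ["b", "c"]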
15  Failure Experience
- other failures have occurred with similar consequences
  - moderate TSU storm following ATM node upgrade, 7/2000
    - network recovered, with difficulty
  - large TSU storm in ATM network, 2/20/2001 [pappalardo1, pappalardo2]
    - manual procedures required to reduce TSU flooding & stabilize network
    - desirable to automate procedures for TSU flooding reduction under overload
    - worked with vendor to make protocol fixes to address problems
      - along the lines suggested in the I-D
  - other relevant LS-network failures have been reported [cholewka, jander]
- conclusions
  - LS protocols vulnerable to loss of database information, control overload to re-sync databases, other failure/overload scenarios
  - networks more vulnerable in absence of adequate protection mechanisms
  - generic problem of LS protocols
    - across a variety of implementations
    - across FR, ATM, IP-based technologies
16  Vendor Analysis
- vendors & service providers asked to analyze LS protocol recovery from total network failure (loss of all database information) in the specified scenario
- network scenario (a consistency check of these numbers follows this slide)
  - 400-node network
    - 100 backbone nodes
    - 3 edge nodes per backbone node (edge single-homed)
  - backbone nodes connected to a max of 10 backbone nodes
    - max node adjacency is 13
    - sparse network
  - 101 peer groups
    - 1 backbone peer group with 100 backbone nodes
    - 100 edge peer groups, each with 3 nodes, all homed on the backbone peer group
  - 1,000,000 addresses advertised
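
The headline scenario numbers follow directly from the per-node figures; a trivial check:

    backbone_nodes = 100
    edge_per_backbone = 3
    total_nodes = backbone_nodes * (1 + edge_per_backbone)  # 400-node network
    max_adjacency = 10 + edge_per_backbone                  # 10 backbone + 3 edge = 13
    peer_groups = 1 + backbone_nodes                        # 1 backbone + one edge group per backbone node = 101
    assert (total_nodes, max_adjacency, peer_groups) == (400, 13, 101)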
17  Vendor Analysis
- projected recovery times
  - Recovery Time Estimate A: 3.5 hours
  - Recovery Time Estimate B: 5-15 minutes
  - Recovery Time Estimate C: 5.5 hours
- expectation is that vendor equipment recovery not adequate under large failure scenario
18  Analysis & Modeling
- various studies published [atmf00-0249, maunder, choudhury]
- [choudhury] reports network-wide event simulation model
  - studies impact of a TSU storm
  - captures
    - node congestion
    - propagation delay between nodes
    - retransmissions if TSU not acknowledged within 5 seconds
    - link declared down if Hello delayed beyond node-dead interval (aka inactivity timer in PNNI, router-dead interval in OSPF)
    - link recovery following database synchronization
  - approximates real network behavior & processing times
- results show
  - dispersion -- number of control packets generated but not processed in at least one node
  - medium to large TSU storms cause network to recover with difficulty and/or not recover at all
  - results match actual network experience
(a toy illustration of the retransmission feedback follows this slide)
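
A toy illustration (not the published model) of the feedback the simulation captures: TSUs queue at a node with a fixed processing time, any TSU not acknowledged within 5 seconds is retransmitted and adds more work, and dispersion counts TSUs generated but not yet processed; the processing time and storm sizes are assumptions:

    TSU_RETRANSMIT_TIMEOUT = 5.0   # seconds, per the slide
    PROCESS_TIME = 0.05            # per-TSU processing time at a node (assumed)

    def storm(initial_tsus: int, max_rounds: int = 50) -> tuple:
        """Return (total TSUs processed, peak dispersion) for a storm of
        `initial_tsus` updates arriving at once at a single node."""
        pending = initial_tsus
        processed = 0
        peak_dispersion = pending
        for _ in range(max_rounds):
            if pending == 0:
                break
            # only the TSUs served within the ack timeout escape retransmission;
            # the rest are re-sent by their originators in the next round
            served_in_time = min(pending, int(TSU_RETRANSMIT_TIMEOUT / PROCESS_TIME))
            processed += pending
            pending -= served_in_time
            peak_dispersion = max(peak_dispersion, pending)
        return processed, peak_dispersion

    # A storm that fits within the ack timeout clears in one round; a larger
    # storm keeps spawning retransmissions, so total work grows non-linearly.
    print(storm(80))    # -> (80, 80): no retransmissions
    print(storm(500))   # -> (1500, 500): retransmissions triple the work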
19  Impact of TSU Storm on Network Stability