Protection and Restoration - PowerPoint PPT Presentation

Provided by: csd6

1
Protection and Restoration
  • Definitions
  • A major application for MPLS

2
The problem
  • Network resources will fail
  • Nodes and links
  • IGP will re-converge
  • But this may take some time
  • 10s of seconds
  • Fast convergence has a price
  • May make IGP more sensitive/unstable
  • I may have sensitive traffic that cannot afford
    interruptions
  • Voice, Consumer TV
  • Do something for the time until IGP re-converges

3
Terminology
  • Restoration
  • Bring traffic back to normal
  • Backup
  • Alternative resources to be used when there is a
    failure
  • Protection
  • Determine and allocate the backup resources
    before the failure
  • When there is a failure just activate them
  • Can be very fast
  • Repair
  • Determine, allocate and activate the backup
    resources after the failure
  • Will be slower

4
Failure Modes
  • Single vs. multiple link failures
  • If duration of link failure is short, can assume
    that there will be only a single link failure
  • Much harder to deal with multiple link failures
  • Node vs. link failures
  • Can assume that links will fail more frequently
    than nodes
  • Node failures are harder to handle

5
Backup resources
  • Can be multiple types
  • Links
  • Paths
  • Trees
  • Cycles
  • Whole topologies
  • In order to avoid network overload after a
    failure need to have some extra capacity for
    backup resources
  • Problem is how to engineer them so as not to make
    the network too expensive
  • Minimize the amount of backup capacity that is
    reserved

6
More jargon
  • 1+1
  • 1 working, 1 backup
  • Wastes a lot of bandwidth for the backups
  • 1:N
  • N working and 1 backup
  • Assume that only 1 working will fail
  • Then 1 backup is enough, saving bandwidth
  • Revertive
  • when the failure is fixed, revert to the primary
  • SRLG: Shared Risk Link Group
  • A set of network links that fails together
  • E.g. fibers that are in the same conduit
  • A bulldozer will cut all of them together
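As a tiny illustration (link and risk names are hypothetical), an SRLG can be modeled as a map from a shared risk to the set of links that ride on it, so that one physical event fails every member link at once:

```python
# Map each shared risk (e.g. a conduit) to the links that ride on it.
# All link and risk names here are hypothetical.
srlgs = {
    "conduit-7": {("A", "B"), ("A", "C")},   # two fibers in one conduit
    "conduit-9": {("B", "D")},
}

def links_lost(risk):
    """All links that fail together when one shared risk fails."""
    return srlgs.get(risk, set())

# A bulldozer cutting conduit-7 takes down both fibers at once.
failed = links_lost("conduit-7")
```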

7
Other issues
  • How to detect the failure fast
  • BFD is one general solution
  • There are medium specific solutions
  • OAM for ATM
  • Alarms for SONET
  • Preferable if they exist
  • Protocol mechanisms (RSVP HELLOs, OSPF HELLOs,
    etc)
  • How to activate the backup
  • I.e. how to make traffic use an alternate path, or
    a tree

8
Backbone failure analysis
  • Sprint backbone ca. March 2002
  • Link in class website
  • Monitor IS-IS traffic
  • Data only for link failures, not node failures
  • Failure Duration
  • 50% of failures last less than 1 min
  • 40% of failures last between 1 and 20 min
  • Maintenance
  • 50% of failures during maintenance windows
  • Mean time between failures (MTBF)
  • Mean time between failures varies a lot across
    links
  • good and bad links
  • 3 bad links account for 25% of the failures

9
More analysis
  • Unplanned failure breakdown
  • Shared link failures 30
  • Router related 16.5
  • Optical related 11.5
  • Individual link failures 70
  • Node failures less common that single link
    failures
  • About 16.5 of failures affect more than 1 link

10
Handling failures with IP
  • Easy case
  • ECMP, no need to do anything extra during failure
  • But it may not repair all failures
  • Coverage: what percentage of the possible
    failures can be repaired
  • In general activating backup resources is hard
    with IP
  • Packets will follow the IP route table/FIB
  • Forwarding is hop-by-hop
  • Even if I compute a backup link for a failure, I
    have no control over what will happen after the
    next hop
  • May have routing loops

11
IP protection
  • Backup next-hop
  • Each node computes a backup nexthop for each
    destination
  • so that I will not have routing loops
  • It may not have 100% coverage
  • For more general solutions I need tunneling
  • Must force packets to reach their destination
  • Without crossing the failed resource
  • Tunnel to the node after the failed link
  • Tunnel to an intermediate node
  • IP tunneling is an expensive operation
  • It is packet encapsulation
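The backup next-hop idea above can be sketched with the standard loop-free-alternate condition: a neighbor N of source S is a safe backup toward destination D when dist(N, D) < dist(N, S) + dist(S, D), i.e. N's own shortest path to D does not come back through S. A minimal sketch on a hypothetical topology (all node names invented):

```python
import heapq

def dist(graph, src):
    """Shortest-path distances from src (Dijkstra)."""
    d = {src: 0}
    pq = [(0, src)]
    while pq:
        cost, u = heapq.heappop(pq)
        if cost > d.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            if cost + w < d.get(v, float("inf")):
                d[v] = cost + w
                heapq.heappush(pq, (cost + w, v))
    return d

def loop_free_alternates(graph, s, primary, dest):
    """Neighbors of s (other than the primary next-hop) satisfying
    dist(N, dest) < dist(N, s) + dist(s, dest): such a neighbor will
    not loop packets back through s."""
    d_s = dist(graph, s)
    return [n for n in graph[s]
            if n != primary
            and dist(graph, n)[dest] < dist(graph, n)[s] + d_s[dest]]

graph = {
    "S": {"A": 1, "B": 1, "C": 1},
    "A": {"S": 1, "D": 1},
    "B": {"S": 1, "D": 1},
    "C": {"S": 1},          # stub neighbor: would loop back through S
    "D": {"A": 1, "B": 1},
}
# Primary next-hop from S to D is A; B qualifies as a backup, C does not.
alts = loop_free_alternates(graph, "S", "A", "D")
```

Note that C is correctly rejected: its only path to D runs back through S, which is exactly the routing loop the condition guards against.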

12
Not-Via addresses
  • Consider router A, with interfaces A1, A2, A3
  • A1 connects to interface B1 of router B,
  • A2 connects to interface C2 of router C
  • B1 has a second address B1-not-via-A
  • All routers compute paths to B1-not-via-A by
    removing router A from topology and running SPF
  • When router A fails, if C wants to reach B sends
    packets to address B1-not-via-A
  • Encapsulates the packets
  • 100% coverage
  • Can handle node and link failures
  • Still needs encapsulation
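The Not-Via computation can be sketched as plain SPF on a pruned topology: every router computes its route toward B1-not-via-A on the graph with router A removed. A minimal sketch (topology and the detour node X are hypothetical):

```python
import heapq

def spf(graph, src, removed=None):
    """Dijkstra, optionally with a set of routers removed from the topology."""
    removed = removed or set()
    d = {src: 0}
    pq = [(0, src)]
    while pq:
        cost, u = heapq.heappop(pq)
        if cost > d.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            if v in removed:
                continue  # prune the failed router
            if cost + w < d.get(v, float("inf")):
                d[v] = cost + w
                heapq.heappush(pq, (cost + w, v))
    return d

graph = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "X": 1},
    "C": {"A": 1, "X": 1},
    "X": {"B": 1, "C": 1},
}
normal = spf(graph, "C")                       # full topology
not_via_A = spf(graph, "C", removed={"A"})     # route to B1-not-via-A: A pruned
```

Here C still reaches B at cost 2 when A is removed, by detouring through X; A is unreachable in the pruned SPF, as intended.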

13
Multi-topology protection
  • New approach
  • Have multiple subsets of the topology
  • IGP protocols already support multi-topology
    routing
  • Switch to a different topology when there is a
    failure
  • By modifying the header of the packet
  • Or even using an MPLS label
  • Allows for more flexible routing of traffic after
    a failure
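The topology switch above can be sketched with a hypothetical two-topology FIB: the packet header (or an MPLS label) selects which FIB forwards the packet, so the ingress flips the topology ID after a failure:

```python
# One FIB per topology ID; destination and next-hop names are hypothetical.
fibs = {
    0: {"D": "A"},   # default topology: reach D via A
    1: {"D": "B"},   # backup topology: routed to avoid the S-A link
}

def forward(dest, topology_id):
    """Pick the next-hop from the FIB selected by the packet's topology ID."""
    return fibs[topology_id][dest]

# Normal operation marks packets with topology 0; after the S-A link
# fails, the ingress marks them (header bit or MPLS label) with topology 1.
nexthop_before = forward("D", 0)
nexthop_after = forward("D", 1)
```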

14
Using MPLS
  • MPLS can conveniently direct traffic where I want
  • Ideal for setting up backup resources
  • Mostly backup paths
  • Can be used to repair both IP and MPLS failures
    (I.e. LSP failure)
  • LSP protection can be
  • Path
  • Local

15
Path protection
  • For each LSP (primary) have a backup LSP
  • It is already established (with RSVP) but it is
    not carrying any traffic
  • Primary and backup LSPs should be link and node
    disjoint
  • When there is a failure the source of the LSP
    will start sending traffic to the backup
  • Source needs to be notified of the failure
  • May take some time to repair the traffic
  • Can work in both 1+1 and 1:N modes
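Computing a disjoint backup for path protection can be sketched as two SPF runs: find the primary path, then rerun SPF with the primary's intermediate nodes forbidden (node-disjointness also gives link-disjointness). A sketch on a hypothetical four-node topology:

```python
import heapq

def shortest_path(graph, src, dst, forbidden=frozenset()):
    """Dijkstra with path recovery; 'forbidden' nodes are pruned."""
    dist = {src: 0}
    prev = {}
    pq = [(0, src)]
    while pq:
        cost, u = heapq.heappop(pq)
        if u == dst:                    # first pop of dst is optimal
            path = [u]
            while u != src:
                u = prev[u]
                path.append(u)
            return path[::-1]
        if cost > dist.get(u, float("inf")):
            continue                    # stale queue entry
        for v, w in graph[u].items():
            if v in forbidden:
                continue
            if cost + w < dist.get(v, float("inf")):
                dist[v] = cost + w
                prev[v] = u
                heapq.heappush(pq, (cost + w, v))
    return None                         # no disjoint path exists

graph = {
    "S": {"A": 1, "B": 2},
    "A": {"S": 1, "D": 1},
    "B": {"S": 2, "D": 2},
    "D": {"A": 1, "B": 2},
}
primary = shortest_path(graph, "S", "D")
# Node-disjoint backup: forbid the primary's intermediate nodes.
backup = shortest_path(graph, "S", "D", forbidden=set(primary[1:-1]))
```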

16
Local protection
  • When a link or node fails the node upstream from
    the failure repairs the traffic
  • Traffic is put into a backup LSP that does not go
    over the failed resource
  • Backup LSP merges with the primary LSP
  • Repairing router does not send a PathErr upstream
  • Instead it notifies upstream nodes that it is
    repairing the failure
  • It is very fast
  • Can work in 1+1 and 1:N modes
  • Can be
  • Node
  • Bypass a failed node
  • Link
  • Bypass a failed link

17
Link local protection
  • The node upstream of the failed link initiates
    the protection
  • Point of local repair (PLR)
  • Backup LSP will merge back to the primary one
  • At the next-hop (Nhop) of the PLR
  • Can work in 1+1 and 1:N modes
  • Usually a single backup LSP protects multiple
    primary LSPs
  • Else scalability is not good

18
Node local protection
  • When a node fails, assume its links have failed
    too
  • The node upstream of the failed node initiates
    the protection
  • Point of local repair (PLR)
  • Backup LSP will merge back to the primary one
  • At the next-next-hop (NNHop) of the PLR
  • What label does the NNHop use for the primary
    LSP?
  • Need RSVP's help to find out
  • Will need multiple backup LSPs for each node
  • At least one for each NNHop
  • Can optionally configure more

19
Label stacking
  • Each time I send traffic into an LSP I push a
    label on the packets
  • Packets in the primary LSP already have a label
  • I create a label stack
  • Top label is popped by the router just before the
    merge point
  • A catch
  • At the merge point, packet arrives from an
    interface different than the expected one
  • Must have global (platform) label space
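The push/pop sequence can be sketched directly (label values are hypothetical): the PLR pushes the bypass label on top of the primary label, and the router just before the merge point pops it so the merge point sees the primary label it expects:

```python
# Label-stack sketch for local protection; label values are hypothetical.
packet = {"stack": [17]}          # 17: label of the primary LSP

def plr_repair(pkt, bypass_label):
    """PLR pushes the bypass label on top of the primary label."""
    pkt["stack"].append(bypass_label)

def penultimate_pop(pkt):
    """Router just before the merge point pops the top (bypass) label,
    leaving the primary label the merge point expects."""
    return pkt["stack"].pop()

plr_repair(packet, 99)            # stack is now [17, 99]
popped = penultimate_pop(packet)  # back to [17] at the merge point
```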

20
Need some RSVP support
  • If the LSP is protected do not send errors
    upstream/downstream when there is a failure
  • Instead notify upstream nodes that repair is in
    progress
  • During the failure, the PATH/RESV refreshes for
    the primary LSP must continue
  • Send them through the backup LSP
  • For node protection need to know the label the
    NNHop is using for the primary
  • Use the record label option for the LSP
  • All the labels used in all the hops are recorded
    in the RESV message

21
LSP protecting IP
  • Can use the above techniques to also protect IP
    traffic
  • If a link fails all the traffic that would go
    through the link is sent over the backup LSP
  • Similar for node failures
  • But in this case, do I know the NNHop for IP?
  • In general, if I have MPLS in my network all my
    traffic will be inside MPLS tunnels anyway

22
Observations
  • If node degree is d and I have N nodes then
  • I need at least O(Nd) tunnels for link protection
  • And at least O(Nd²) for node protection
  • Of course I cannot protect from failures of the
    ingress or egress node
  • The assumption is that failures will be short
    lived
  • Traffic may be unbalanced during the failure
  • Links can get overloaded
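The tunnel-count estimates above are simple arithmetic; for a hypothetical backbone with N = 100 routers of average degree d = 4:

```python
# Back-of-the-envelope tunnel counts; topology size is hypothetical.
N, d = 100, 4                    # routers, average node degree

link_protection = N * d          # roughly one bypass per directed link: O(Nd)
node_protection = N * d * d      # per link, one bypass per NNHop (~d): O(Nd^2)
```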

23
The resource allocation problem
  • How do I setup the backup tunnels so that
  • I do not overload any link after a failure
  • I minimize the amount of extra bandwidth that
    will need to be reserved for the backups
  • It is a form of traffic engineering (TE)
  • We will see more on TE later on
  • Has been studied a lot
  • In optical and telephone networks
  • And recently in MPLS type networks
  • Solutions can be
  • On-line (as the requests arrive)
  • Off-line

24
Example
  • Kodialam, Lakshman, 2001
  • Local link and node protection
  • Assume I know the b/w demands of all LSPs
  • Assume that only one link or node can fail at a
    time
  • Find a set of backup paths that minimizes the
    amount of bandwidth for both primary and backup
    LSPs
  • Backup LSPs can share bandwidth on some links
  • What do I know about the links?
  • How much bandwidth is used by each LSP
  • Complete but expensive to maintain
  • How much bandwidth is available
  • Almost zero information
  • How much bandwidth is used by backup LSPs
  • Little bit better than zero