Protection and Restoration - PowerPoint PPT Presentation

Provided by: csd6

1
Protection and Restoration
  • Definitions
  • A major application for MPLS

2
The problem
  • Network resources will fail
  • Nodes and links
  • IGP will re-converge
  • But this may take some time
  • 10s of seconds
  • Fast convergence has a price
  • May make IGP more sensitive/unstable
  • I may have sensitive traffic that cannot afford
    interruptions
  • Voice, Consumer TV
  • Do something for the time until IGP re-converges

3
Terminology
  • Restoration
  • Bring traffic back to normal
  • Backup
  • Alternative resources to be used when there is a
    failure
  • Protection
  • Determine and allocate the backup resources
    before the failure
  • When there is a failure just activate them
  • Can be very fast
  • Repair
  • Determine, allocate and activate the backup
    resources after the failure
  • Will be slower

4
Failure Modes
  • Single vs. multiple link failures
  • If duration of link failure is short, can assume
    that there will be only a single link failure
  • Much harder to deal with multiple link failures
  • Node vs. link failures
  • Can assume that links will fail more frequently
    than nodes
  • Node failures are harder to handle

5
Backup resources
  • Can be multiple types
  • Links
  • Paths
  • Trees
  • Cycles
  • Whole topologies
  • In order to avoid network overload after a
    failure need to have some extra capacity for
    backup resources
  • Problem is how to engineer them so as not to make
    the network too expensive
  • Minimize the amount of backup capacity that is
    reserved

6
More jargon
  • 1+1
  • 1 working, 1 backup
  • Wastes a lot of bandwidth for the backups
  • 1:N
  • N working and 1 backup
  • Assume that only 1 working will fail
  • Then 1 backup is enough, saving bandwidth
  • Revertive
  • when the failure is fixed, revert to the primary
  • SRLG: Shared Risk Link Group
  • A set of network links that fails together
  • E.g. fibers that are in the same conduit
  • A bulldozer will cut all of them together
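As a tiny illustration (link and risk names are hypothetical), an SRLG can be modeled as a map from a shared risk to the set of links that ride on it, so that one physical event fails every member link at once:

```python
# Map each shared risk (e.g. a conduit) to the links that ride on it.
# All link and risk names here are hypothetical.
srlgs = {
    "conduit-7": {("A", "B"), ("A", "C")},   # two fibers in one conduit
    "conduit-9": {("B", "D")},
}

def links_lost(risk):
    """All links that fail together when one shared risk fails."""
    return srlgs.get(risk, set())

# A bulldozer cutting conduit-7 takes down both fibers at once.
failed = links_lost("conduit-7")
```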

7
Other issues
  • How to detect the failure fast
  • BFD is one general solution
  • There are medium specific solutions
  • OAM for ATM
  • Alarms for SONET
  • Preferable if they exist
  • Protocol mechanisms (RSVP HELLOs, OSPF HELLOs,
    etc)
  • How to activate the backup
  • I.e. how to make traffic use an alternate path, or
    a tree

8
Backbone failure analysis
  • Sprint backbone ca. March 2002
  • Link in class website
  • Monitor IS-IS traffic
  • Data only for link failures, not node failures
  • Failure Duration
  • 50% of failures last less than 1 min
  • 40% of failures last between 1 and 20 min
  • Maintenance
  • 50% of failures during maintenance windows
  • Mean time between failures (MTBF)
  • Mean time between failures varies a lot across
    links
  • good and bad links
  • 3 bad links account for 25% of the failures

9
More analysis
  • Unplanned failure breakdown
  • Shared link failures 30
  • Router related 16.5
  • Optical related 11.5
  • Individual link failures 70
  • Node failures less common that single link
    failures
  • About 16.5 of failures affect more than 1 link

10
Handling failures with IP
  • Easy case
  • ECMP, no need to do anything extra during failure
  • But it may not repair all failures
  • Coverage: what percentage of the possible
    failures can be repaired
  • In general activating backup resources is hard
    with IP
  • Packets will follow the IP route table/FIB
  • Forwarding is hop-by-hop
  • Even if I compute a backup link for a failure, I
    have no control over what will happen after the
    next hop
  • May have routing loops

11
IP protection
  • Backup next-hop
  • Each node computes a backup nexthop for each
    destination
  • so that I will not have routing loops
  • It may not have 100% coverage
  • For more general solutions I need tunneling
  • Must force packets to reach their destination
  • Without crossing the failed resource
  • Tunnel to the node after the failed link
  • Tunnel to an intermediate node
  • IP tunneling is an expensive operation
  • It is packet encapsulation
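The backup next-hop idea above can be sketched with the standard loop-free-alternate condition: a neighbor N of source S is a safe backup toward destination D when dist(N, D) < dist(N, S) + dist(S, D), i.e. N's own shortest path to D does not come back through S. A minimal sketch on a hypothetical topology (all node names invented):

```python
import heapq

def dist(graph, src):
    """Shortest-path distances from src (Dijkstra)."""
    d = {src: 0}
    pq = [(0, src)]
    while pq:
        cost, u = heapq.heappop(pq)
        if cost > d.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            if cost + w < d.get(v, float("inf")):
                d[v] = cost + w
                heapq.heappush(pq, (cost + w, v))
    return d

def loop_free_alternates(graph, s, primary, dest):
    """Neighbors of s (other than the primary next-hop) satisfying
    dist(N, dest) < dist(N, s) + dist(s, dest): such a neighbor will
    not loop packets back through s."""
    d_s = dist(graph, s)
    return [n for n in graph[s]
            if n != primary
            and dist(graph, n)[dest] < dist(graph, n)[s] + d_s[dest]]

graph = {
    "S": {"A": 1, "B": 1, "C": 1},
    "A": {"S": 1, "D": 1},
    "B": {"S": 1, "D": 1},
    "C": {"S": 1},          # stub neighbor: would loop back through S
    "D": {"A": 1, "B": 1},
}
# Primary next-hop from S to D is A; B qualifies as a backup, C does not.
alts = loop_free_alternates(graph, "S", "A", "D")
```

Note that C is correctly rejected: its only path to D runs back through S, which is exactly the routing loop the condition guards against.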

12
Not-Via addresses
  • Consider router A, with interfaces A1, A2, A3
  • A1 connects to interface B1 of router B,
  • A2 connects to interface C2 of router C
  • B1 has a second address B1-not-via-A
  • All routers compute paths to B1-not-via-A by
    removing router A from topology and running SPF
  • When router A fails, if C wants to reach B sends
    packets to address B1-not-via-A
  • Encapsulates the packets
  • 100% coverage
  • Can handle node and link failures
  • Still needs encapsulation
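The Not-Via computation can be sketched as plain SPF on a pruned topology: every router computes its route toward B1-not-via-A on the graph with router A removed. A minimal sketch (topology and the detour node X are hypothetical):

```python
import heapq

def spf(graph, src, removed=None):
    """Dijkstra, optionally with a set of routers removed from the topology."""
    removed = removed or set()
    d = {src: 0}
    pq = [(0, src)]
    while pq:
        cost, u = heapq.heappop(pq)
        if cost > d.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u].items():
            if v in removed:
                continue  # prune the failed router
            if cost + w < d.get(v, float("inf")):
                d[v] = cost + w
                heapq.heappush(pq, (cost + w, v))
    return d

graph = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "X": 1},
    "C": {"A": 1, "X": 1},
    "X": {"B": 1, "C": 1},
}
normal = spf(graph, "C")                       # full topology
not_via_A = spf(graph, "C", removed={"A"})     # route to B1-not-via-A: A pruned
```

Here C still reaches B at cost 2 when A is removed, by detouring through X; A is unreachable in the pruned SPF, as intended.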

13
Multi-topology protection
  • New approach
  • Have multiple subsets of the topology
  • IGP protocols already support multi-topology
    routing
  • Switch to a different topology when there is a
    failure
  • By modifying the header of the packet
  • Or even using an MPLS label
  • Allows for more flexible routing of traffic after
    a failure
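The topology switch above can be sketched with a hypothetical two-topology FIB: the packet header (or an MPLS label) selects which FIB forwards the packet, so the ingress flips the topology ID after a failure:

```python
# One FIB per topology ID; destination and next-hop names are hypothetical.
fibs = {
    0: {"D": "A"},   # default topology: reach D via A
    1: {"D": "B"},   # backup topology: routed to avoid the S-A link
}

def forward(dest, topology_id):
    """Pick the next-hop from the FIB selected by the packet's topology ID."""
    return fibs[topology_id][dest]

# Normal operation marks packets with topology 0; after the S-A link
# fails, the ingress marks them (header bit or MPLS label) with topology 1.
nexthop_before = forward("D", 0)
nexthop_after = forward("D", 1)
```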

14
Using MPLS
  • MPLS can conveniently direct traffic where I want
  • Ideal for setting up backup resources
  • Mostly backup paths
  • Can be used to repair both IP and MPLS failures
    (I.e. LSP failure)
  • LSP protection can be
  • Path
  • Local

15
Path protection
  • For each LSP (primary) have a backup LSP
  • It is already established (with RSVP) but it is
    not carrying any traffic
  • Primary and backup LSPs should be link and node
    disjoint
  • When there is a failure the source of the LSP
    will start sending traffic to the backup
  • Source needs to be notified of the failure
  • May take some time to repair the traffic
  • Can work in both 1+1 and 1:N modes
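Computing a disjoint backup for path protection can be sketched as two SPF runs: find the primary path, then rerun SPF with the primary's intermediate nodes forbidden (node-disjointness also gives link-disjointness). A sketch on a hypothetical four-node topology:

```python
import heapq

def shortest_path(graph, src, dst, forbidden=frozenset()):
    """Dijkstra with path recovery; 'forbidden' nodes are pruned."""
    dist = {src: 0}
    prev = {}
    pq = [(0, src)]
    while pq:
        cost, u = heapq.heappop(pq)
        if u == dst:                    # first pop of dst is optimal
            path = [u]
            while u != src:
                u = prev[u]
                path.append(u)
            return path[::-1]
        if cost > dist.get(u, float("inf")):
            continue                    # stale queue entry
        for v, w in graph[u].items():
            if v in forbidden:
                continue
            if cost + w < dist.get(v, float("inf")):
                dist[v] = cost + w
                prev[v] = u
                heapq.heappush(pq, (cost + w, v))
    return None                         # no disjoint path exists

graph = {
    "S": {"A": 1, "B": 2},
    "A": {"S": 1, "D": 1},
    "B": {"S": 2, "D": 2},
    "D": {"A": 1, "B": 2},
}
primary = shortest_path(graph, "S", "D")
# Node-disjoint backup: forbid the primary's intermediate nodes.
backup = shortest_path(graph, "S", "D", forbidden=set(primary[1:-1]))
```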

16
Local protection
  • When a link or node fails the node upstream from
    the failure repairs the traffic
  • Traffic is put into a backup LSP that does not go
    over the failed resource
  • Backup LSP merges with the primary LSP
  • Repairing router does not send a PathErr upstream
  • Instead it notifies upstream nodes that it is
    repairing the failure
  • It is very fast
  • Can work in 1+1 and 1:N modes
  • Can be
  • Node
  • Bypass a failed node
  • Link
  • Bypass a failed link

17
Link local protection
  • The node upstream of the failed link initiates
    the protection
  • Point of local repair (PLR)
  • Backup LSP will merge back to the primary one
  • At the next-hop (Nhop) of the PLR
  • Can work in 1+1 and 1:N modes
  • Usually a single backup LSP protects multiple
    primary LSPs
  • Else scalability is not good

18
Node local protection
  • When a node fails, assume its links have failed
    too
  • The node upstream of the failed node initiates
    the protection
  • Point of local repair (PLR)
  • Backup LSP will merge back to the primary one
  • At the next-next-hop (NNHop) of the PLR
  • What label does the NNHop use for the primary
    LSP?
  • Need RSVP's help to find out
  • Will need multiple backup LSPs for each node
  • At least one for each NNHop
  • Can optionally configure more

19
Label stacking
  • Each time I send traffic into an LSP I push a
    label on the packets
  • Packets in the primary LSP already have a label
  • I create a label stack
  • Top label is popped by the router just before the
    merge point
  • A catch
  • At the merge point, packet arrives from an
    interface different than the expected one
  • Must have global (platform) label space
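The push/pop sequence can be sketched directly (label values are hypothetical): the PLR pushes the bypass label on top of the primary label, and the router just before the merge point pops it so the merge point sees the primary label it expects:

```python
# Label-stack sketch for local protection; label values are hypothetical.
packet = {"stack": [17]}          # 17: label of the primary LSP

def plr_repair(pkt, bypass_label):
    """PLR pushes the bypass label on top of the primary label."""
    pkt["stack"].append(bypass_label)

def penultimate_pop(pkt):
    """Router just before the merge point pops the top (bypass) label,
    leaving the primary label the merge point expects."""
    return pkt["stack"].pop()

plr_repair(packet, 99)            # stack is now [17, 99]
popped = penultimate_pop(packet)  # back to [17] at the merge point
```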

20
Need some RSVP support
  • If the LSP is protected do not send errors
    upstream/downstream when there is a failure
  • Instead notify upstream nodes that repair is in
    progress
  • During the failure, the PATH/RESV refreshes for
    the primary LSP must continue
  • Send them through the backup LSP
  • For node protection need to know the label the
    NNHop is using for the primary
  • Use the record label option for the LSP
  • All the labels used in all the hops are recorded
    in the RESV message

21
LSP protecting IP
  • Can use the above techniques to also protect IP
    traffic
  • If a link fails all the traffic that would go
    through the link is sent over the backup LSP
  • Similar for node failures
  • But in this case, do I know the NNHop for IP?
  • In general, if I have MPLS in my network all my
    traffic will be inside MPLS tunnels anyway

22
Observations
  • If node degree is d and I have N nodes then
  • I need at least O(Nd) tunnels for link protection
  • And at least O(Nd²) for node protection
  • Of course I cannot protect from failures of the
    ingress or egress node
  • The assumption is that failures will be short
    lived
  • Traffic may be unbalanced during the failure
  • Links can get overloaded
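The tunnel-count estimates above are simple arithmetic; for a hypothetical backbone with N = 100 routers of average degree d = 4:

```python
# Back-of-the-envelope tunnel counts; topology size is hypothetical.
N, d = 100, 4                    # routers, average node degree

link_protection = N * d          # roughly one bypass per directed link: O(Nd)
node_protection = N * d * d      # per link, one bypass per NNHop (~d): O(Nd^2)
```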

23
The resource allocation problem
  • How do I setup the backup tunnels so that
  • I do not overload any link after a failure
  • I minimize the amount of extra bandwidth that
    will need to be reserved for the backups
  • It is a form of traffic engineering (TE)
  • We will see more on TE later on
  • Has been studied a lot
  • In optical and telephone networks
  • And recently in MPLS type networks
  • Solutions can be
  • On-line (as the requests arrive)
  • Off-line

24
Example
  • Kodialam, Lakshman, 2001
  • Local link and node protection
  • Assume I know the b/w demands of all LSPs
  • Assume that only one link or node can fail at a
    time
  • Find a set of backup paths that minimizes the
    amount of bandwidth for both primary and backup
    LSPs
  • Backup LSPs can share bandwidth on some links
  • What do I know about the links?
  • How much bandwidth is used by each LSP
  • Complete but expensive to maintain
  • How much bandwidth is available
  • Almost zero information
  • How much bandwidth is used by backup LSPs
  • Little bit better than zero