Automatic Protection Switching presentation

Title: Automatic Protection Switching

1
Automatic Protection Switching

Yaakov (J) Stein
CTO
RAD Data Communications

Mar 2012
2

Course Outline
General protection switching principles
Examples of protection mechanisms
SONET/SDH
Ethernet linear protection
Ethernet ring protection
MPLS fast reroute
MPLS-TP APS

3
General principles
Definition
References
Traffic types
Network topologies
Triggers
Protection classes
Entities
Protection types
Signaling
4
Definition

Automatic Protection Switching (APS)
is a functionality of carrier-grade transport
networks
is often called resilience
since it enables service to quickly recover
from failures
is required to ensure high reliability and
availability
APS includes
detection of failures (signal fail or signal
degrade) on a working channel
switching traffic transmission to a protection
channel
selecting traffic reception from the protection
channel
(optionally) reverting back to the working
channel once failure is repaired
Automatic means that it uses (at most) control plane
protocols
no management layer or manual operations
needed

5
Some useful references

G.808.1 generic linear protection
G.808.2 generic ring protection (not yet
written)
G.841 and G.842 SDH
G.774.3/4/9/10 SDH protection management
G.870 and G.873.1 OTN
G.8031 Ethernet linear protection
G.8032 Ethernet ring protection
G.8131 T-MPLS APS
Y.1720 MPLS
I.630 ATM
M.495 analog signal protection
G.781 clock selection (can be used to protect
synchronization)
RFC 4090 MPLS Fast ReRoute
RFC 6372 MPLS-TP Survivability Framework
RFC 6378 MPLS-TP Linear Protection

6
Traffic types

In a network with APS capabilities, there are
three types of traffic
protected traffic
traffic that may be rapidly switched to
protection channel
at any time it may be on the working channel or
protection channel
Nonpreemptible Unprotected Traffic (NUT)
noncritical traffic that does not require
protection mechanism
not affected by protection mechanism
somewhat less expensive to customer
extra (preemptible) traffic
best effort background traffic that runs on
protection channel
preempted (blocked) when protection channel is
needed
very inexpensive to customer

7
Network topologies

APS can be defined for any topology with
redundant links
e.g., for tree topologies no protection is
possible
We will often discuss protection of individual
links
However, there are two topologies that are of
particular interest
rings
protection is natural for rings
although there are other reasons for using rings
as well
rings are so important that protection for other
topologies
is often called linear protection
dense meshes
for this topology multiple local bypasses can be
preconfigured
protection switching is similar to routing
change, but faster
often called Fast ReRoute (FRR)

8
Triggers

Protection switching is usually triggered by a
failure
although the operator may manually force a
protection switch
A failure is declared when a fault condition
persists long enough
for the ability to perform the required function
to be considered terminated
Failures are Signal Fail (SF) or Signal Degrade
(SD) (of various types)
and may be
detected by physical layer
indicated by signaling (e.g. AIS)
detected by OAM mechanisms
When there is no SF or SD, the state is called No
Request (NR)

9
Switching time (1)

SONET/SDH protection switching takes place in
under 50 ms
Regarding multiplex section shared protection
rings, G.841 states
The following network objectives apply
1) Switch time In a ring with no extra traffic,
all nodes in the idle state (no detected
failures,
no active automatic or external commands, and
receiving only Idle K-bytes), and with less
than 1200 km of fibre, the switch (ring and span)
completion time for a failure on a single
span shall be less than 50 ms. On rings under all
other conditions, the switch completion
time can exceed 50 ms (the specific interval is
under study) to allow time to remove extra
traffic, or to negotiate and accommodate
coexisting APS requests.
while for linear VC trail protection, it says
The following network objectives apply
1) Switch time The APS algorithm for LO/HO VC
trail protection shall operate as fast as
possible. A value of 50 ms has been proposed as a
target time. Concerns have been
expressed over this proposed target time when
many VCs are involved. This is for further
study. Protection switch completion time excludes
the detection time necessary to initiate the
protection switch, and the hold-off time.
There are similar statements in other clauses as
well

10
Switching time (2)

This 50 ms time has become the gold standard
and new protection schemes are expected to meet
this objective
However, studying the literature that led up to
SONET/SDH standards
shows that the objective was to attain the
minimum possible time
for the sum of
persistent (i.e. non-transient) failure detection
speed of light propagation
signaling protocol time
regaining sync alignment
and 50 ms was the minimum that was considered
practical !
Many modern standards have built in 50 ms
and much marketing literature boasts faster
than 50 ms
But there is really nothing special about 50 ms
50 ms gaps in voiced speech are noticeable,
but not fatal if infrequent
50 ms of data at high rates can not be stored and
later forwarded
timing circuits can withstand much more than 50
ms without clock
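
The sum of components listed above can be sketched numerically. This is only a rough illustration: the 5 microseconds per km figure is an assumed rule of thumb for propagation in fiber, and the detection, signaling and resync values in the usage line are invented placeholders, not numbers from G.841.

```python
# Rough switch-time budget sketch. Only the 1200 km ring length comes from
# the G.841 text quoted earlier; every other number is an assumption.
FIBER_DELAY_US_PER_KM = 5.0  # assumed rule of thumb for light in fiber


def propagation_ms(km):
    """One-way propagation delay over `km` of fiber, in milliseconds."""
    return km * FIBER_DELAY_US_PER_KM / 1000.0


def switch_budget_ms(detect_ms, km, signaling_ms, resync_ms):
    """Sum of the four components named on the slide."""
    return detect_ms + propagation_ms(km) + signaling_ms + resync_ms


# Hypothetical example values, for illustration only
total = switch_budget_ms(detect_ms=10, km=1200, signaling_ms=20, resync_ms=5)
```

With these placeholder inputs the propagation term alone uses 6 ms of the budget, which is why the ring length appears in the objective.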

11
Protection classes

It is useful to distinguish two different
protection classes
path protection (AKA trail protection, end-to-end
protection)
when a failure is detected on the end-to-end path
we switch to an alternative end-to-end path
the failure is usually detected by end-to-end OAM
local protection (AKA local restoration, SNC
protection, bypass, detour)
we protect individual network elements, links, or
groups of same
when such an entity fails
only that local entity is bypassed
the failure may be detected by link OAM or
physical layer means

12
APS entities (1)

The following entities are important in APS
working channel: channel used when no failure
exists
protection channel: channel used when a failure
exists
head-end: entity transmitting data to
working/protection channel
tail-end: entity receiving data from the
working/protection channel
Note: we will usually consider traffic to be
bidirectional
so that the head-end for one direction
is the tail-end for the opposite direction

13
APS entities (2)

Bridge: function at head-end that connects
traffic (including extra traffic) to the working
and protection channels
Selector: function at tail-end that extracts
traffic (perhaps extra traffic) from the working
or protection channel
APS signaling channel: channel used to
communicate between head-end and tail-end for APS
purposes
Trail termination: function responsible for
failure detection
including injection and extraction of OAM

working channel
tail-end (selector)
head-end (bridge)
protection channel
signaling channel
14
Revertive operation

Reversion means returning to use the working
channel
after the failure has been rectified
Protection mechanisms can be revertive or
nonrevertive
Revertive mechanisms may be preferable
when the working channel has better performance
(free BW, BER, delay)
when there are frequent switches (easier to
manage)
when there is extra traffic
but nonrevertive also has advantages
only one service disruption due to protection
switching
may be simpler to implement

15
Uni/bi-directional

We will usually consider bidirectional traffic
but even then the failures can be uni- or bi-
directional
and for unidirectional failures there can be uni-
or bi- directional switching

16
Uni- / bi- directional switching

Unidirectional switching may be advantageous
for 1+1 - faster and no signaling channel is
needed
no unnecessary service disruption for direction
without failure
higher chance of protection under multiple
failures
easier to implement for local protection
maintains extra traffic in direction without
failure
But bidirectional may be preferable
easier management since directions traverse same
network elements
does not disrupt delay balance between directions
may simplify repair since failed spans are unused

17
Protection types

We distinguish several different protection types
1+1
1:1
1:n
m:n
(1:1)^n
Each type has its applicability, advantages, and
disadvantages
and there are trade-offs between
simplicity
BW consumption
protection switch time
signaling requirements

18
1+1 protection

Simplest and fastest form of protection
but wasteful - only 50% of actual physical
capacity is used
Head-end bridge always sends data on both
channels
Tail-end selector chooses channel to use (based
on BER, dLOS, etc.)
For unidirectional 1+1 switching there is no need
for APS signaling
If non-revertive
there is no distinction between working and
protection channels

19
1:1 protection

Head-end bridge usually sends data on working
channel
When failure detected it starts sending data over
protection channel
and tail-end needs to select the protection
channel
When not in use, protection channel can be used
for extra traffic
However, since failure is detected by tail-end,
APS signaling is needed
Protection channel should have OAM running to
ensure its functionality

20
1:n protection

One protection channel is allocated for n working
channels
Can protect only one working channel at a time
but improbable that more than 1 working channel
will simultaneously fail
Only 1/(n+1) of total capacity is reserved for
protection

21
m:n protection

To enable protection of more than 1 channel
m protection channels are allocated for n working
channels (m < n)
m simultaneous failures can be protected
Less protection capacity dedicated than for n
times 1:1
When failure detected,
1 of the m protection channels needs to be
assigned and signaled
High complexity but conserves resources
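
The capacity trade-off between the types so far can be put in one formula. The sketch below assumes all channels have the same rate (an assumption made here for simplicity); the function name is illustrative.

```python
from fractions import Fraction


def protection_fraction(m, n):
    """Share of total capacity reserved for protection when m protection
    channels guard n working channels (assumes equal-rate channels)."""
    if m < 1 or n < 1:
        raise ValueError("m and n must be positive")
    return Fraction(m, m + n)


half = protection_fraction(1, 1)      # 1+1 or 1:1: half the capacity
one_to_n = protection_fraction(1, 4)  # 1:n with n = 4: 1/(n+1)
m_to_n = protection_fraction(2, 6)    # m:n with m = 2, n = 6
```

For comparison, n times 1:1 would dedicate n protection channels to n working channels, i.e. always one half.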

22
(1:1)^n protection

This is like n times 1:1 but the n protection
channels share bandwidth
Only 1 failed working channel can be protected
This is different from 1:n since
n protection channels are preconfigured
n working channels need not be of the same type
Protection bandwidth must be at least that of the
largest working channel

23
APS algorithm

We have seen that protection switching is a
tricky business
So it is not surprising that network elements
that support APS
run an APS algorithm
This algorithm inputs
configuration (protection type, revertive?,
available channels, ...)
failure indications (NR, SF, SD)
operator commands
APS signaling (more on that soon)
and makes switching decisions
The algorithm maintains state information for
head-end and tail-end
APS algorithms are detailed in standards documents

24
Priority

Not every failure event / operator command
results in a protection switch
For example
in 1:n protection the protection channel may
already be in use !
Conflicts are resolved by assigning priorities to
events/commands
When an event is detected or a command received
the APS algorithm will not act
if an event/command of equal or higher priority
is already in effect
True failure conditions usually have higher
priority than manual commands

25
Timers

Even failure events with priority are not acted
upon immediately
to do so would cause unnecessary switches after
transient defects
The APS algorithm may maintain several timers,
such as
Holdoff timers
the time between detection of a SF or SD event
and the APS algorithm acting upon this event
the algorithm usually used is called peek twice
i.e., the condition is checked again after the
timer expires
Wait To Restore timer
for revertive switching, the time between
detection of the failure being cleared and the
APS algorithm acting upon this event
also used in SDH optimized bidirectional 1+1
(nonrevertive)
Guard timer
for rings: blockout time during which APS
messages are ignored (since they may be old and
outdated)
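
The peek twice hold-off behavior can be sketched as follows. This is a minimal illustration, not the algorithm text of any standard: `defect_at` is a hypothetical callable that reports whether the SF/SD condition is present at a given time.

```python
def peek_twice(defect_at, t_detect_ms, holdoff_ms):
    """Return the time (ms) at which APS acts, or None for a transient.

    The condition is looked at once on detection and once more when the
    hold-off timer expires; only a defect present at both peeks is acted on.
    """
    if not defect_at(t_detect_ms):
        return None
    t_check = t_detect_ms + holdoff_ms
    return t_check if defect_at(t_check) else None


# A 20 ms glitch is ignored; a persistent failure is acted on after hold-off
glitch = peek_twice(lambda t: 0 <= t < 20, t_detect_ms=0, holdoff_ms=100)
persistent = peek_twice(lambda t: t >= 0, t_detect_ms=0, holdoff_ms=100)
```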

26
APS signaling

In all types except unidirectional 1+1, some APS
signaling is needed
APS signaling is used to synchronize between
head-end and tail-end
It is critical that head-end and tail-end always
be in the same state
Example messages include
No Request (NR)
by tail-end to inform head-end of Signal Failure
(SF)
by head-end to confirm the event's priority
by head-end to report the particular protection
channel
by head-end to inform tail-end of Reverse
(bidirectional) Request (RR)
by tail-end after failure cleared to Wait To
Restore (WTR)
by tail-end after failure cleared to Do Not
Revert (DNR) for nonrevertive

27
APS signaling phases

When APS signaling is used, it needs to be as
rapid as possible
Depending on the scenario it may be
1-phase: tail→head (fastest)
tail-end informs head-end of failure
both ends uniquely know the protection channel to
be used
only for 1+1 and unidirectional-(1:1)^n
(including 1:1)
2-phase: 1) tail→head 2) head→tail
tail-end informs head-end of failure
head-end signals that it has switched to
protection channel
not for bidirectional-1:n or m:n
3-phase: 1) tail→head 2) head→tail 3) tail→head
(slowest)
works for all protection types (including m:n)

28
Examples of 1-phase

Example of when 1-phase signaling is possible is
1:1 or (1:1)^n
1. upon detection of failure the tail-end sends
SF to the head-end
and immediately changes its selector (blind
switch)
upon receipt the head-end changes the bridge
setting
(no priority is checked)
1-phase can also be used for bidirectional 1+1
1. upon detection of failure the tail-end sends
SF to the head-end
and immediately changes both its selector and
bridge
upon receipt the head-end changes its bridge and
selector

29
Example of 2-phase

2-phase is useful for unidirectional 1:n with
priority checking
1. upon detection of failure the tail-end sends
SF to the head-end
but does not change its selector
2. the head-end checks priority
sends confirmation to tail-end (with identity of
working channel)
the bridge setting is changed
3. the tail-end changes its selector

30
Example of 3-phase

3-phase signaling is imperative for bidirectional
1:n
1. upon detection of failure the tail-end sends
SF to the head-end
but does not change its selector
2. the head-end checks priority, and sends
confirmation to tail-end
head-end changes its bridge setting
and also sends a reverse request
3. the tail-end changes selector
checks priority and sends confirmation to
head-end
tail-end changes its bridge setting (as head-end
of opposite direction)
head-end receives confirmation and changes its
selector

31
For G.805 buffs

to add 1+1 trail protection to a trail - expand a
trail termination function
we use a special transport processing function -
the protection switch

the unprotected TTs report status to the
protection switch
32
SONET/SDH APS
33
SONET protection ?

SONET/SDH networks need to be highly reliable
(five nines)
Down-time should be minimal (less than 50 msec)
So systems must repair themselves (no time for
manual intervention)
Upon detection of a failure (dLOS, dLOF, high
BER)
the network must reroute traffic (protection
switching)
from working channel to protection channel
SDH APS is unidirectional
SDH APS may be revertive

34
SONET/SDH layers

Between regenerators there are sections
(regenerator sections)
Between ADMs there are lines (multiplex sections)
Between path terminations there are paths
Protection can be at OC-n level (different
physical fibers)
or at STM/VC level
or end-to-end path (trail protection)

35
Line APS
[Figure: frame of 90 columns by 9 rows, showing the TOH (3 rows plus 6 rows) and the Synchronous Payload Envelope]
TOH
A1 A2 J0
B1 E1 F1
D1 D2 D3
H1 H2 H3
B2 K1 K2
D4 D5 D6
D7 D8 D9
DA DB DC
S1 M0 E2

TOH consists of
3 rows of section overhead - frame sync, trace,
EOC,
6 rows of line overhead - pointers, SSM, FEBE,
and
Line APS signaling uses bytes K1 and K2

36
HO Path APS

POH is responsible for type, status, path
performance monitoring, VCAT, trace
HO Path APS signaling uses 4 MSBs of byte K3

37
LO Path APS

VC OH is responsible for
Timing, PM, REI,
LO Path APS signaling is
4 MSBs of byte K4

38
How does it work?

Head-end and tail-end NEs have bridges (muxes)
Head-end and tail-end NEs maintain bidirectional
signaling channel
Signaling is contained in K bytes of protection
channel
For line APS
K1: tail-end status and requests
K2: head-end status

39
Linear 1+1 protection

Can be at OC-n level (different physical fibers)
or at STM/VC level (SubNetwork Connection
Protection)
or end-to-end path (called trail protection)
Head-end bridge always sends data on both
channels
Tail-end chooses channel to use based on BER,
dLOS, etc.
No need for signaling
If non-revertive
there is no distinction between working and
protection channels

40
Linear 1:1 protection

Head-end bridge usually sends data on working
channel
When tail-end detects failure it signals (using
K1) to head-end
Head-end then starts sending data over protection
channel
When not in use
protection channel can be used for (discounted)
extra traffic
(pre-emptible unprotected traffic)
May be at any layer (but only OC-n level protects
against fiber cuts)

working channel
extra traffic
protection channel
41
Linear 1:N protection

In order to save BW
we allocate 1 protection channel for every N
working channels
N limited to 14
4 bits in K1 byte from tail-end to head-end
0 protection channel
1-14 working channels
15 extra traffic channel
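
The channel numbering above can be decoded with a few lines. The sketch assumes the channel number sits in the four least significant bits of K1 and the request code in the four most significant bits; the slide only states that 4 bits carry the channel, so the bit positions and function names are assumptions.

```python
def k1_channel(k1):
    """Map the 4-bit channel field of K1 to the meaning given on the slide."""
    channel = k1 & 0x0F  # assumed position: low nibble
    if channel == 0:
        return "protection channel"
    if channel == 15:
        return "extra traffic channel"
    return f"working channel {channel}"


def k1_request(k1):
    """Remaining 4 bits (assumed: high nibble), left undecoded here."""
    return (k1 >> 4) & 0x0F


example = k1_channel(0xC3)
```

Two of the sixteen values are taken by the protection and extra traffic channels, which is why N is limited to 14.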

42
Two fiber vs. Four-fiber rings

Ring based protection is popular in North America
(100K rings)
Full protection against physical fiber cuts
Simpler and less expensive than mesh topologies
Protection at line (multiplexed section) or path
layer
Four-fiber rings
fully redundant at OC level
can support bidirectional routing at line layer
Two-fiber rings
support unidirectional routing at line layer

2 fibers in opposite directions
43
Unidirectional vs. bidirectional

Unidirectional routing
working channel B-A same direction (e.g.
clockwise) as A-B
management simplicity: A-B and B-A can occupy
same timeslots
inefficient: waste of ring BW and excessive delay
in one direction
Bidirectional routing
A-B and B-A are opposite in direction
both using shortest route
spatial reuse: timeslots can be reused in other
sections

44
UPSR vs. BLSR (MS-SPRing)
Path switching Line switching
Two-fiber Four-fiber
Unidirectional Bidirectional
UPSR
BLSR

Of all the possible combinations, only a few are
in use
Unidirectional (routing) Path Switched Rings
protects tributaries
extension of 1+1 to ring topology
Bidirectional (routing) Line Switched Rings
(two-fiber and four-fiber versions)
called Multiplex Section Shared Protection Ring
in SDH
simultaneously protects all tributaries in STM
extension of 1:1 to ring topology

45
UPSR

Working channel is in one direction
protection channel in the opposite direction
All path traffic is added in both directions
(1+1)
decision as to which to use is made at drop point
(no signaling)
Normally non-revertive, so effectively two
diversity paths
Good match for access networks
1 access resilient ring
less expensive than fiber pair per customer
Inefficient for core networks
no spatial reuse
every signal in every span
in both directions
node needs to continuously monitor
every tributary to be dropped

46
BLSR

Switch at line level: less monitoring
When failure detected tail-end NE signals
head-end NE
Works for unidirectional/bidirectional fiber
cuts, and NE failures
Two-fiber version
half of OC-N capacity devoted to protection
only half capacity available for traffic
Four-fiber version
full redundant OC-N devoted to protection
twice as many NEs as compared to two-fiber

Example recovery from unidirectional fiber cut
47
Ethernet linear APS

STP
LAG
G.8031

48
STP

The original Spanning Tree Protocol automatically
removed loops
from arbitrary networks (with loops)
However, its convergence was very slow (about a
minute)
STP can not be used as a protection mechanism
since its reconvergence time is very long
due to a cumbersome protocol
and long holdoff timer settings
An evolutionary update called Rapid STP 802.1w
was incorporated into 802.1D-2004 clause 17
that converges in about the same time as STP
but can reconverge after a topology change in
less than 1 second
RSTP can be used to detect failures and
reconverge
and thus can be used as a primitive protection
mechanism
However, the switching time will be many tens of
ms to 100s of ms

49
Use of LAG

Ethernet link aggregation (AKA bonding,
Ethernet trunk, inverse mux, NIC teaming)
enables bonding several ports together as single
uplink
Defined by 802.3ad task force and folded into
802.3-2000 as clause 43
Binding of ports to Link Aggregation Groups
(LAGs) distributed via
Link Aggregation Control Protocol (LACP)
LACP uses slow protocol frames (up to 5 per
second)
Links may be dynamically added/removed from LAG
and LACP continuously monitors to detect if
changes needed
Upon link failure LAG delivers traffic at a
reduced rate
Thus LAG can be used as a primitive protection
mechanism
When used this way it is called worker/standby or
NN mode
The restoration time will be on the order of 1
second

50
G.8031

Q9 of SG15 in the ITU-T is responsible for
protection switching
In 2006 it produced G.8031 Linear Ethernet
Protection Switching
G.8031 uses standard Ethernet formats, but is
incompatible with STP
The standard addresses
point-to-point VLAN connections
SNC (local) protection class
1+1 and 1:1 protection types
unidirectional and bidirectional switching for
1+1
bidirectional switching for 1:1
revertive and nonrevertive modes
1-phase signaling protocol
G.8031 uses Y.1731 OAM CCM messages in order to
detect failures
G.8031 defines a new OAM opcode (39) for APS
signaling messages
Switching times should be under 50 ms (only
holdoff timers when groups)

51
G.8031 signaling

The APS signaling message looks like this
regular APS messages are sent 1 per 5 seconds
after change 3 messages are sent at max rate (300
per sec)
where
req/state identifies the message (NR, SF, WTR,
SD, forced switch, etc)
prot. type identifies the protection type (1+1,
1:1, uni/bidirectional, etc.)
requested and bridged signal identify incoming /
outgoing traffic
since only 1+1 and 1:1 are supported, they are either null or
traffic (all other values reserved)

52
G.8031 1:1 revertive operation

In the normal (NR) state
head-end and tail-end exchange CCM (at 300 per
second rate)
on both working and protection channels
head-end and tail-end exchange NR APS messages
on the protection channel (every 5 seconds)
When a failure appears in the working channel
tail-end fails to receive 3 consecutive CCM messages on
working channel
tail-end enters SF state
tail-end sends 3 SF messages at 300 per second on
the APS channel
tail-end switches selector (bi-d and bridge) to
the protection channel
head-end (receiving SF) switches bridge (bi-d and
selector) to protection channel
tail-end continues sending SF messages every 5
seconds
head-end sends NR messages but with
bridged=normal
When the failure is cleared
tail-end leaves SF state and enters WTR state
(typically 5 minutes, 5..12 min)
tail-end sends WTR message to head-end (in
nonrevertive - DNR message)
tail-end sends WTR every 5 seconds
when WTR expires both sides enter NR state

53
Ethernet ring APS

G.8032
RPR
CLEER

54
Ethernet rings ?

Ethernet has become carrier grade
deterministic connection-oriented forwarding
OAM
synchronization
The only thing missing to completely replace SDH
is ring protection
However, Ethernet and ring architectures don't go
together
Ethernet has no TTL, so looped traffic will loop
forever
STP builds trees out of any architecture: no
loops allowed
There are two ways to make an Ethernet ring
open loop
cut the ring by blocking some link
when protection is required - block the failed
link
closed loop
disable STP (but avoid infinite loops in some way
!)
when protection is required - steer and/or wrap
traffic

55
Ethernet ring protocols

Open loop methods
G.8032 (ERPS)
rSTP (ex 802.1w)
RFER (RAD)
ERP (NSN)
RRST (based on RSTP)
REP (Cisco)
RRSTP (Alcatel)
RRPP (Huawei)
EAPS (Extreme, RFC 3619)
EPSR (Allied Telesis)
PSR (Overture)
Closed loop methods
RPR (IEEE 802.17)
CLEER and NERT (RAD)

56
G.8032

Q9 of SG15 produced G.8032 between 2006 and 2008
G.8032 is similar to G.8031
strives for 50 ms protection (< 1200 km, < 16
nodes)
but here this number is deceiving as MAC table is
flushed
standard Ethernet format but incompatible with
STP
uses Y.1731 CCM for failure detection
employs Y.1731 extension for R-APS signaling
(opcode=40)
R-APS message format similar to APS of G.8031
(but between every 2 nodes and to MAC address
01-19-A7-00-00-01)
revertive and nonrevertive operation defined
However, G.8032 is more complex due to
requirement to avoid loop creation under any
circumstances
need to localize failures
need to maintain consistency between all nodes on
ring
existence of a special node (RPL owner)

57
RPL

G.8032v1 defines the Ring Protection Link (RPL)
as the link to be blocked (to avoid closing the
loop) in NR state
One of the 2 nodes connected to the RPL
is designated the RPL owner
Unlike RFER
there is only one RPL owner
the RPL and owner are designated before setup
operation is usually revertive
All ring nodes are simultaneously in 1 of 2 modes
idle or protecting
in idle mode the RPL is blocked
in protecting mode the failed link is blocked and
RPL is unblocked
in revertive operation
once the failure is cleared the blocked link is
unblocked
and the RPL is blocked again

58
G.8032 revertive operation

In the idle state
adjacent nodes exchange CCM at 300 per second
rate (including over RPL)
exchange NR RB (RPL Blocked) messages in
dedicated VLAN every 5 seconds (but not over RPL)
R-APS messages are never forwarded
When a failure appears between 2 nodes
node(s) missing CCM messages peek twice with
holdoff time
node(s) block failed link and flush MAC table
node(s) send SF message (3 times at max rate, then
every 5 sec)
node receiving SF message will check priority and
unblock any blocked link
node receiving SF message will send SF message to
its other neighbor
in stable protecting state SF messages over every
unblocked link
When the failure is cleared
node(s) detect CCM and start guard timer (blocks
acting on R-APS messages)
node(s) send NR messages to neighbors (3 times at
max rate, then every 5 sec)
RPL owner receiving NR starts WTR timer
when WTR expires RPL owner blocks RPL, flushes
table, and sends NR RB
node receiving NR RB flushes table, unblocks any
blocked ports, sends NR RB
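
The blocking rule behind this sequence can be checked with a toy ring model. The sketch ignores R-APS messaging, timers and MAC flushing entirely; the link numbering (link i joins node i and node i+1) and the function names are assumptions made here.

```python
def blocked_links(rpl, failed=None):
    """Idle: the RPL is blocked. Protecting: the failed link is blocked
    and the RPL is unblocked."""
    return {rpl} if failed is None else {failed}


def reachable(n, blocked, start=0):
    """Nodes reachable from `start` over unblocked links of an n-node ring."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        hops = ((node, (node + 1) % n), ((node - 1) % n, (node - 1) % n))
        for link, neighbor in hops:
            if link not in blocked and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def loop_free_and_connected(n, blocked):
    """n-1 active links that reach all n nodes form a tree: no loop."""
    return n - len(blocked) == n - 1 and len(reachable(n, blocked)) == n


idle_ok = loop_free_and_connected(6, blocked_links(rpl=5))
protecting_ok = loop_free_and_connected(6, blocked_links(rpl=5, failed=2))
```

In both the idle and the protecting state exactly one link is out of service, so every node stays reachable and no loop is closed.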

59
G.8032-2010

After coming out with G.8032 in 2008 (G.8032v1)
the ITU came out with G.8032-2010 (G.8032v2) in
2010
This new version is not backwards-compatible with
v1
but a v2 node must support v1 as well (but then
operation is according to v1)
Major differences
2 designated nodes: RPL owner node and RPL
neighbor node
and for optional flush-optimization
next neighbor node
significant changes to
state machine
priority logic
commands (forced/manual/clear) and protocol
new Wait To Block timer
supports more general topologies (sub-rings)
ladders (For Further Study in v1)
multi-ring
ring topology discovery
virtual channel based on VLAN or MAC address

ring
subring
subring
ladder
60
RPR 802.17

Resilient Packet Rings
are compatible with standard Ethernet, but
different frame format
are robust (lossless, < 50 ms protection, OAM)
are fair (based on client throttling)
support QoS (3 classes A, B, C)
are efficient (full spatial reuse)
are plug and play (automatic station
autodiscovery)
extend use of existing fiber rings
counter-rotating add/drop ringlets, running
SONET/SDH (any rate, PoS, GFP or LAPS) or
packetPHY (1 or 10 Gb/s ETH PHY)
developed by 802.17 WG
based on Cisco's Spatial Reuse Protocol (RFC
2892)

ringlet selection
61
Basic RPR queuing
traffic going around ring
placed into internal buffer
in dual-transit queue mode placed into 1 of 2 buffers according to service class
sent according to fairness
traffic for local sink
placed in output buffer according to service class

traffic from local source
sent according to fairness
first sent to ringlet selection

Primary/Secondary Transit Queue
62
RPR service classes

RPR defines 3 main classes
class A real time (low latency/FDV)
class B near real time (bounded predictable
latency/FDV)
class C best effort

class  use      info rate               D/FDV      FE
A0     RT       reserved                low        No
A1     RT       allocated, reclaimable  low        No
B-CIR  near RT  allocated, reclaimable  bounded    No
B-EIR  near RT  opportunistic           unbounded  Yes
C      BE       opportunistic           unbounded  Yes
63
RPR Class use

A0 ring BW is reserved: not reclaimed even if no
traffic
in dual-transit queue mode
class A frames from the ring are queued in PTQ
class B, C in STQ
priority for egress
frames in PTQ
local class A frames
local class B (when no frames in PTQ)
frames in STQ
local class C (when no PTQ, STQ, local A or B)
Notes
class A have minimal delay
class B have higher priority than STQ transit
frames, so bounded delay/FDV
classes B and C share STQ, so once in ring have
similar delay
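
The egress priority order above can be sketched as a strict-priority pick. This deliberately leaves out the fairness algorithm and any queue thresholds, so it is only an illustration of the ordering; the queue names are invented labels.

```python
# Order taken from the slide: PTQ, local A, local B, STQ, local C
EGRESS_ORDER = ["PTQ", "local_A", "local_B", "STQ", "local_C"]


def next_egress(queues):
    """Pop the next frame to send, as (queue name, frame), or None if idle."""
    for name in EGRESS_ORDER:
        if queues.get(name):
            return name, queues[name].pop(0)
    return None


queues = {"PTQ": [], "local_A": [], "local_B": ["b1"],
          "STQ": ["s1"], "local_C": ["c1"]}
order = [next_egress(queues) for _ in range(4)]
```

Local class C is served only when everything else is empty, which matches its best effort role.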

64
RPR - protection

rings give inherent protection against single
point of failure
RPR specifies 2 mechanisms
steering
wrapping (optional)
(implementations may also do wrapping then
steering)

steering info
wrap
65
NERT and CLEER

New Ethernet Ring Technology / Closed Loop
Encapsulated Ethernet Ring
Similar to RPR but uses real Ethernet format
NERT and CLEER distinguish between
ring nodes
switches connected to ring nodes
Traffic in ring is MAC-in-MAC encapsulated
External MACs are of ring node
Internal MACs are original
Unexpected external MACs discarded
External MACs learned as in 802.1ah
Ring nodes forward according to table
NERT floods, CLEER never floods
Protection switch only involves changing table
so service restoration is fast

66
MPLS fast reroute

IP FRR
RFC 4090

67
IP FRR

True protection mechanisms do not exist for
connectionless IP
In practice, routing protocols discover breaks
and recalculate routes
but this usually takes a long time
Link-state IGPs detect link-down state using
hellos
for OSPF - typically every 10 sec, and detection
after 40 sec
and then Dijkstra algorithm avoids the failed
link
BFD can be used to speed up the detection
However,
the information still has to be propagated
further (seconds?)
and FIBs updated (100s of ms)
Various IP Fast ReRoute (IP FRR) mechanisms have
been proposed
but true protection is best done at the MPLS
level

68
MPLS fast reroute

RSVP-TE enables MPLS traffic engineering by fine
control over placement
specifies explicit path using information
gathered from IGP
resources may be reserved at LSRs along the way
RFC 4090 defines extensions to RSVP-TE Fast
ReRoute (FRR)
LSRs along the path preconfigure local bypasses
(detours)
Upon detection of failure by
BFD (specified in microseconds, typically 10s of
ms) or
RSVP hellos (RFC default is 5 ms) or
RESV / PATH messages (driven by IGP)
upstream LSR simply enables the detour
Since this is a local action, it should be fast
RFC 4090 only discusses adding FRR to RSVP-TE
network
but its use with LDP is possible if there is a
single label generator

not discussed in RFC 4090
69
PLRs and MPs

The fundamental entities in MPLS FRR are
Point of Local Repair (PLR)
Merge Point (MP)
A PLR is the LSR before the failed element (link
or node)
All LSRs except the egress LER can be PLRs
The PLR is solely responsible for the FRR (no
explicit APS signaling)
During path setup, potential PLRs create detours
towards the egress LER
A MP is the LSR where the detour rejoins the LSP
All LSRs except the ingress LER can be MPs

70
Methods

RFC 4090 defines two different protection methods
Usually one or the other is employed in a given
network
One-to-one backup
each LSP protected separately
detour LSP created for each LSP at each potential
PLR
no labels pushed
Facility backup
backup tunnel for multiple LSPs
bypass tunnel created at each potential PLR
uses label stacking

71
NHOP and NNHOP

MPLS FRR can bypass a failed link or a failed
node
In order to bypass a single failed link
we need an alternative path to the next hop
(NHOP)
In order to bypass a single failed node, we need
an alternative path to the next next hop (NNHOP)
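
The difference between the two bypasses can be sketched as a path search on a small graph. The topology and node names are invented, and a plain breadth-first search stands in for whatever constrained path computation a real LSR would use.

```python
from collections import deque


def bypass(graph, src, dst, avoid_link=None, avoid_node=None):
    """Shortest hop-count path from src to dst that skips one link or node."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in graph[u]:
            if v == avoid_node or v in prev:
                continue
            if avoid_link and {u, v} == set(avoid_link):
                continue
            prev[v] = u
            queue.append(v)
    return None


# Hypothetical topology: protected LSP runs A-B-C-D, with E and F as spares
graph = {"A": ["B", "E", "F"], "B": ["A", "C", "F"], "C": ["B", "D", "E"],
         "D": ["C"], "E": ["A", "C"], "F": ["A", "B"]}
nhop = bypass(graph, "A", "B", avoid_link=("A", "B"))  # link A-B fails
nnhop = bypass(graph, "A", "C", avoid_node="B")        # node B fails
```

Here A acts as the PLR in both cases; the merge point is B for the NHOP bypass and C for the NNHOP bypass.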

72
MPLS TP APS

RFC 6372 (MPLS-TP Survivability Framework)
RFC 6378 (MPLS-TP Linear Protection)
draft-ietf-mpls-tp-ring-protection

73
MPLS-TP resilience

Since it strives to be a carrier-grade transport
network
TP has strong protection switching requirements
APS has been almost as contentious an issue as OAM
and indeed the arguments are inter-related
RFC 6372 gives a general framework
and differentiates between
linear
shared-mesh and
ring protection

74
Linear protection

from RFC 6378 (ex draft-ietf-mpls-tp-linear-protection)
1+1, 1:1, 1:n and uni/bidi are supported
APS signaling protocol (for all modes except 1+1
uni)
is single-phase
and called the Protection State Coordination
protocol
PSC messages are sent over the protection
channel
APS messages are sent over the GACh with a
single channel type
message functions identified by a request field
6 states: normal, protecting due to failure,
admin protecting, WTR, protection path
unavailable, DNR
when revertive, a WTR timer is used

75
PSC message format

Request: NR, SF, SD, manual switch, forced
switch, lockout, WTR, DNR
PT: Protection Type - uni 1+1, bidi 1+1, bidi
1:1/1:n
R: Revertive
FPath: which path has fault
Path: which data path is on protection channel

76
PSC control logic states

Normal state - no trigger events reported
Unavailable state - protection path is
unavailable
Protecting failure state
traffic is being transported on the protection
path
Protecting administrative state
operator issued command switching traffic to
protection path
Wait-to-Restore state - recovering from working
path SF/SD
WTR timer not up
Do-not-Revert state - recovered from a protecting
state
but operator has configured DNR

77
PSC local requests

In order from highest to lowest priority
1. Clear (operator command)
2. Lockout of protection (operator command)
3. Forced Switch (operator command)
4. Signal Fail on protection (OAM / control-plane
/ server indication)
5. Signal Fail on working (OAM / control-plane /
server indication)
6. Signal Degrade on working (OAM / control-plane
/ server indication)
7. Clear Signal Fail/Degrade (OAM / control-plane
/ server indication)
8. Manual Switch (operator command)
9. WTR Expires (WTR timer)
10. No Request (default)
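
The priority list above, combined with the earlier rule that a request is ignored when one of equal or higher priority is already in effect, can be sketched as follows. The helper names are illustrative; only the ordering comes from the slide.

```python
# Highest priority first, exactly as listed above
PSC_PRIORITY = [
    "Clear", "Lockout of protection", "Forced Switch",
    "Signal Fail on protection", "Signal Fail on working",
    "Signal Degrade on working", "Clear Signal Fail/Degrade",
    "Manual Switch", "WTR Expires", "No Request",
]
RANK = {name: i for i, name in enumerate(PSC_PRIORITY)}


def highest_priority(requests):
    """Pick the winning local request; No Request when nothing is pending."""
    return min(requests, key=RANK.__getitem__, default="No Request")


def should_act(new, current):
    """Act only if the new request strictly outranks the one in effect."""
    return RANK[new] < RANK[current]


winner = highest_priority(["Manual Switch", "Signal Fail on working"])
```

A Manual Switch issued while Signal Fail on working is in effect is therefore ignored, while a Forced Switch overrides it.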

78
Linear protection ITU style
from draft-zulr-mpls-tp-linear-protection-switching
Similar to previous, but uses Y.1731/G.8031
format (no surprise!)
79
Ring protection
once again there were two drafts, both supporting
p2p and p2mp, wrapping and steering, link/node failures
draft-ietf-mpls-tp-ring-protection (not yet RFC)
Between any 2 LSRs we can define a Sub-Path Maintenance Entity
So between 2 LSRs on a ring there are 2 SPMEs
we define 1 as the working channel and 1 as the protection channel
Now we re-use the linear protection mechanisms, including the PSC protocol
draft-helvoort-mpls-tp-ring-protection-switching
Both counter-rotating rings carry working and protection traffic
The bandwidth on each ring is divided
X BW is dedicated to working traffic and Y dedicated to protection traffic
The protection bandwidth of one ring is used to protect the other ring
Each node should have information about the sequence of ring nodes
MPLS-TP Ring Protection Switching is G.8032-like, but forwards non-NR msgs
