Title: Automatic Protection Switching
1Automatic Protection Switching
- Yaakov (J) Stein
- CTO
- RAD Data Communications
Mar 2012
2- Course Outline
- General protection switching principles
- Examples of protection mechanisms
- SONET/SDH
- Ethernet linear protection
- Ethernet ring protection
- MPLS fast reroute
- MPLS-TP APS
3General principles
- Definition
- References
- Traffic types
- Network topologies
- Triggers
- Protection classes
- Entities
- Protection types
- Signaling
4Definition
- Automatic Protection Switching (APS)
- is a functionality of carrier-grade transport networks
- is often called resilience
- since it enables service to quickly recover from failures
- is required to ensure high reliability and availability
- APS includes
- detection of failures (signal fail or signal degrade) on a working channel
- switching traffic transmission to a protection channel
- selecting traffic reception from the protection channel
- (optionally) reverting back to the working channel once the failure is repaired
- Automatic means that it uses (at most) control plane protocols
- no management layer or manual operations needed
5Some useful references
- G.808.1 generic linear protection
- G.808.2 generic ring protection (not yet written)
- G.841 and G.842 SDH
- G.774.3/4/9/10 SDH protection management
- G.870 and G.873.1 OTN
- G.8031 Ethernet linear protection
- G.8032 Ethernet ring protection
- G.8131 T-MPLS APS
- Y.1720 MPLS
- I.630 ATM
- M.495 analog signal protection
- G.781 clock selection (can be used to protect synchronization)
- RFC 4090 MPLS Fast ReRoute
- RFC 6372 MPLS-TP Survivability Framework
- RFC 6378 MPLS-TP Linear Protection
6Traffic types
- In a network with APS capabilities, there are three types of traffic
- protected traffic
- traffic that may be rapidly switched to protection channel
- at any time it may be on the working channel or protection channel
- Nonpreemptible Unprotected Traffic (NUT)
- noncritical traffic that does not require protection mechanism
- not affected by protection mechanism
- somewhat less expensive to customer
- extra (preemptible) traffic
- best effort background traffic that runs on protection channel
- preempted (blocked) when protection channel is needed
- very inexpensive to customer
7Network topologies
- APS can be defined for any topology with redundant links
- e.g., for tree topologies (which have no redundant links) no protection is possible
- We will often discuss protection of individual links
- However, there are two topologies that are of particular interest
- rings
- protection is natural for rings
- although there are other reasons for using rings as well
- rings are so important that protection for other topologies
- is often called linear protection
- dense meshes
- for this topology multiple local bypasses can be preconfigured
- protection switching is similar to routing change, but faster
- often called Fast ReRoute (FRR)
8Triggers
- Protection switching is usually triggered by a failure
- although the operator may manually force a protection switch
- A failure is declared when a fault condition
- persists long enough
- for the ability to perform the required function
- to be considered terminated
- Failures are Signal Fail (SF) or Signal Degrade (SD) (of various types)
- and may be
- detected by physical layer
- indicated by signaling (e.g. AIS)
- detected by OAM mechanisms
- When there is no SF or SD, the state is called No Request (NR)
9Switching time (1)
- SONET/SDH protection switching takes place in under 50 ms
- Regarding multiplex section shared protection rings, G.841 states
- "The following network objectives apply: 1) Switch time: In a ring with no extra traffic, all nodes in the idle state (no detected failures, no active automatic or external commands, and receiving only Idle K-bytes), and with less than 1200 km of fibre, the switch (ring and span) completion time for a failure on a single span shall be less than 50 ms. On rings under all other conditions, the switch completion time can exceed 50 ms (the specific interval is under study) to allow time to remove extra traffic, or to negotiate and accommodate coexisting APS requests."
- while for linear VC trail protection, it says
- "The following network objectives apply: 1) Switch time: The APS algorithm for LO/HO VC trail protection shall operate as fast as possible. A value of 50 ms has been proposed as a target time. Concerns have been expressed over this proposed target time when many VCs are involved. This is for further study. Protection switch completion time excludes the detection time necessary to initiate the protection switch, and the hold-off time."
- There are similar statements in other clauses as well
10Switching time (2)
- This 50 ms time has become the gold standard
- and new protection schemes are expected to meet this objective
- However, studying the literature that led up to SONET/SDH standards
- shows that the objective was to attain the minimum possible time
- for the sum of
- persistent (i.e. non-transient) failure detection
- speed of light propagation
- signaling protocol time
- regaining sync alignment
- and 50 ms was the minimum that was considered practical!
- Many modern standards have built in 50 ms
- and much marketing literature boasts faster than 50 ms
- But there is really nothing special about 50 ms
- 50 ms gaps in voiced speech are noticeable,
- but not fatal if infrequent
- 50 ms of data at high rates cannot be stored and later forwarded
- timing circuits can withstand much more than 50 ms without clock
11Protection classes
- It is useful to distinguish two different protection classes
- path protection (AKA trail protection, end-to-end protection)
- when a failure is detected on the end-to-end path
- we switch to an alternative end-to-end path
- the failure is usually detected by end-to-end OAM
- local protection (AKA local restoration, SNC protection, bypass, detour)
- we protect individual network elements, links, or groups of same
- when such an entity fails
- only that local entity is bypassed
- the failure may be detected by link OAM or physical layer means
12APS entities (1)
- The following entities are important in APS
- working channel: channel used when no failure exists
- protection channel: channel used when a failure exists
- head-end: entity transmitting data to working/protection channel
- tail-end: entity receiving data from the working/protection channel
- Note: we will usually consider traffic to be bidirectional
- so that the head-end for one direction
- is the tail-end for the opposite direction
13APS entities (2)
- Bridge: function at head-end that connects traffic (including extra traffic) to the working and protection channels
- Selector: function at tail-end that extracts traffic (perhaps extra traffic) from the working or protection channel
- APS signaling channel: channel used to communicate between head-end and tail-end for APS purposes
- Trail termination: function responsible for failure detection
- including injection and extraction of OAM
[Figure: head-end (bridge) and tail-end (selector) connected by the working channel, protection channel, and signaling channel]
14Revertive operation
- Reversion means returning to use the working channel
- after the failure has been rectified
- Protection mechanisms can be revertive or nonrevertive
- Revertive mechanisms may be preferable
- when the working channel has better performance (free BW, BER, delay)
- when there are frequent switches (easier to manage)
- when there is extra traffic
- but nonrevertive also has advantages
- only one service disruption due to protection switching
- may be simpler to implement
15Uni/bi-directional
- We will usually consider bidirectional traffic
- but even then the failures can be uni- or bi-directional
- and for unidirectional failures there can be uni- or bi-directional switching
16Uni- / bi- directional switching
- Unidirectional switching may be advantageous
- for 1+1: faster and no signaling channel is needed
- no unnecessary service disruption for direction without failure
- higher chance of protection under multiple failures
- easier to implement for local protection
- maintains extra traffic in direction without failure
- But bidirectional may be preferable
- easier management since directions traverse same network elements
- does not disrupt delay balance between directions
- may simplify repair since failed spans are unused
17Protection types
- We distinguish several different protection types
- 1+1
- 1:1
- 1:n
- m:n
- (1:1)^n
- Each type has its applicability, advantages, and disadvantages
- and there are trade-offs between
- simplicity
- BW consumption
- protection switch time
- signaling requirements
18 1+1 protection
- Simplest and fastest form of protection
- but wasteful: only 50% of actual physical capacity is used
- Head-end bridge always sends data on both channels
- Tail-end selector chooses channel to use (based on BER, dLOS, etc.)
- For unidirectional 1+1 switching there is no need for APS signaling
- If non-revertive
- there is no distinction between working and protection channels
19 1:1 protection
- Head-end bridge usually sends data on working channel
- When failure detected it starts sending data over protection channel
- and tail-end needs to select the protection channel
- When not in use, protection channel can be used for extra traffic
- However, since failure is detected by tail-end, APS signaling is needed
- Protection channel should have OAM running to ensure its functionality
20 1:n protection
- One protection channel is allocated for n working channels
- Can only protect one working channel at a time
- but improbable that more than 1 working channel will simultaneously fail
- Only 1/(n+1) of total capacity is reserved for protection
21 m:n protection
- To enable protection of more than 1 channel
- m protection channels are allocated for n working channels (m < n)
- m simultaneous failures can be protected
- Less protection capacity dedicated than for n times 1:1
- When failure detected,
- 1 of the m protection channels needs to be assigned and signaled
- High complexity but conserves resources
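The capacity trade-off between the protection types can be checked with a little arithmetic. The sketch below is only an illustration: the function name `protection_fraction` is hypothetical, and it assumes all channels have equal capacity, so the reserved share is simply m/(m+n).

```python
from fractions import Fraction

def protection_fraction(m: int, n: int) -> Fraction:
    """Share of the total channels reserved for protection,
    given m protection channels and n working channels
    (assumes all channels have equal capacity)."""
    if m < 1 or n < 1:
        raise ValueError("need at least one protection and one working channel")
    return Fraction(m, m + n)

# 1+1 and 1:1 are the m = n = 1 case; 1:n is the m = 1 case.
for label, m, n in [("1+1 or 1:1", 1, 1), ("1:4", 1, 4), ("2:8", 2, 8)]:
    print(f"{label}: {protection_fraction(m, n)} of capacity reserved for protection")
```

Note that 1:4 and 2:8 reserve the same share of capacity, but only the latter can protect two simultaneous failures.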
22 (1:1)^n protection
- This is like n times 1:1 but the n protection channels share bandwidth
- Only 1 failed working channel can be protected
- This is different from 1:n since
- n protection channels are preconfigured
- n working channels need not be of the same type
- Protection bandwidth must be at least that of the largest working channel
23APS algorithm
- We have seen that protection switching is a tricky business
- So it is not surprising that network elements that support APS
- run an APS algorithm
- This algorithm inputs
- configuration (protection type, revertive?, available channels, ...)
- failure indications (NR, SF, SD)
- operator commands
- APS signaling (more on that soon)
- and makes switching decisions
- The algorithm maintains state information for head-end and tail-end
- APS algorithms are detailed in standards documents
24Priority
- Not every failure event / operator command results in a protection switch
- For example
- in 1:n protection the protection channel may already be in use!
- Conflicts are resolved by assigning priorities to events/commands
- When an event is detected or a command received
- the APS algorithm will not act
- if an event/command of equal or higher priority is already in effect
- True failure conditions usually have higher priority than manual commands
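The priority rule can be sketched as a small arbiter. This is a minimal illustration, not the priority table of any particular standard: the `Request` ordering and the `Arbiter` class are assumptions chosen only to respect the statement that failures usually outrank manual commands.

```python
from enum import IntEnum
from typing import Optional

class Request(IntEnum):
    # Higher value = higher priority. Illustrative ordering only.
    NO_REQUEST = 0
    MANUAL_SWITCH = 1
    SIGNAL_DEGRADE = 2
    SIGNAL_FAIL = 3

class Arbiter:
    """Tracks the request currently in effect on the protection channel."""

    def __init__(self) -> None:
        self.active = Request.NO_REQUEST
        self.channel: Optional[int] = None

    def submit(self, request: Request, channel: int) -> bool:
        """Return True if the new request is acted upon.

        A request is ignored when one of equal or higher
        priority is already in effect."""
        if request <= self.active:
            return False
        self.active, self.channel = request, channel
        return True

arb = Arbiter()
print(arb.submit(Request.MANUAL_SWITCH, 1))  # acted upon: nothing was in effect
print(arb.submit(Request.SIGNAL_FAIL, 2))    # preempts the manual switch
print(arb.submit(Request.SIGNAL_FAIL, 3))    # ignored: equal priority already in effect
```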
25Timers
- Even failure events with priority are not acted upon immediately
- to do so would cause unnecessary switches after transient defects
- The APS algorithm may maintain several timers, such as
- Holdoff timers
- the time between detection of a SF or SD event
- and the APS algorithm acting upon this event
- the algorithm usually used is called peek twice
- i.e., the condition is checked again after the timer expires
- Wait To Restore timer
- for revertive switching, the time between detection of the failure being cleared and the APS algorithm acting upon this event
- also used in SDH optimized bidirectional 1+1 (nonrevertive)
- Guard timer
- for rings: blockout time during which APS messages are ignored (since they may be old and outdated)
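The peek-twice hold-off rule can be sketched as follows. This is a simplified model under stated assumptions: the helper names `peek_twice` and `defect_window` are hypothetical, and the defect is represented as a function of time rather than read from real hardware.

```python
from typing import Callable

def peek_twice(is_failed: Callable[[float], bool],
               detected_at: float, holdoff: float) -> bool:
    """Act on a defect only if it is present at detection time
    and still present when the hold-off timer expires."""
    return is_failed(detected_at) and is_failed(detected_at + holdoff)

def defect_window(start: float, end: float) -> Callable[[float], bool]:
    """Model a defect that is present during [start, end) seconds."""
    return lambda t: start <= t < end

transient = defect_window(0.000, 0.020)          # clears after 20 ms
persistent = defect_window(0.000, float("inf"))  # never clears

print(peek_twice(transient, 0.0, 0.050))   # False: gone when the timer expires
print(peek_twice(persistent, 0.0, 0.050))  # True: still failed, so switch
```

A Wait To Restore timer applies the same idea in the other direction, delaying reversion until the working channel has stayed healthy for the configured time.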
26APS signaling
- In all types except unidirectional 1+1, some APS signaling is needed
- APS signaling is used to synchronize between head-end and tail-end
- It is critical that head-end and tail-end always be in the same state
- Example messages include
- No Request (NR)
- by tail-end to inform head-end of Signal Failure (SF)
- by head-end to confirm the event's priority
- by head-end to report the particular protection channel
- by head-end to inform tail-end of Reverse (bidirectional) Request (RR)
- by tail-end after failure cleared to Wait To Restore (WTR)
- by tail-end after failure cleared to Do Not Revert (DNR) for nonrevertive
27APS signaling phases
- When APS signaling is used, it needs to be as rapid as possible
- Depending on the scenario it may be
- 1-phase: tail→head (fastest)
- tail-end informs head-end of failure
- both ends uniquely know the protection channel to be used
- only for 1+1 and unidirectional (1:1)^n (including 1:1)
- 2-phase: 1) tail→head 2) head→tail
- tail-end informs head-end of failure
- head-end signals that it has switched to protection channel
- not for bidirectional 1:n or m:n
- 3-phase: 1) tail→head 2) head→tail 3) tail→head (slowest)
- works for all protection types (including m:n)
28Examples of 1-phase
- Example of when 1-phase signaling is possible is 1:1 or (1:1)^n
- 1. upon detection of failure the tail-end sends SF to the head-end
- and immediately changes its selector (blind switch)
- upon receipt the head-end changes the bridge setting
- (no priority is checked)
- 1-phase can also be used for bidirectional 1:1
- 1. upon detection of failure the tail-end sends SF to the head-end
- and immediately changes both its selector and bridge
- upon receipt the head-end changes its bridge and selector
29Example of 2-phase
- 2-phase is useful for unidirectional 1:n with priority checking
- 1. upon detection of failure the tail-end sends SF to the head-end
- but does not change its selector
- 2. the head-end checks priority
- sends confirmation to tail-end (with identity of working channel)
- the bridge setting is changed
- 3. the tail-end changes its selector
30Example of 3-phase
- 3-phase signaling is imperative for bidirectional 1:n
- 1. upon detection of failure the tail-end sends SF to the head-end
- but does not change its selector
- 2. the head-end checks priority, and sends confirmation to tail-end
- head-end changes its bridge setting
- and also sends a reverse request
- 3. the tail-end changes selector
- checks priority and sends confirmation to head-end
- tail-end changes its bridge setting (as head-end of opposite direction)
- head-end receives confirmation and changes its selector
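The 3-phase exchange above can be traced with a toy model. This is only a sketch: the `End` class and the message strings are hypothetical, priority checks are assumed to pass, and no real APS PDU format is implied.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class End:
    """One end of a bidirectional connection, with its bridge and selector."""
    name: str
    bridge: str = "working"
    selector: str = "working"

def three_phase_switch(tail: End, head: End) -> List[str]:
    """Walk through the three signaling phases and return the message log."""
    log = []
    # Phase 1 (tail -> head): report SF, but leave the selector alone.
    log.append("tail->head: SF")
    # Phase 2 (head -> tail): priority check (assumed to pass), bridge,
    # then confirm and send a reverse request.
    head.bridge = "protection"
    log.append("head->tail: confirm + reverse request")
    # Phase 3 (tail -> head): tail selects protection, bridges as the
    # head-end of the opposite direction, and confirms.
    tail.selector = "protection"
    tail.bridge = "protection"
    log.append("tail->head: confirm")
    # On receiving the confirmation the head-end moves its selector.
    head.selector = "protection"
    return log

tail, head = End("tail"), End("head")
for message in three_phase_switch(tail, head):
    print(message)
print(tail, head)
```

At no point does a selector move before the far-end bridge has been set, which is what distinguishes this from the blind switch of the 1-phase case.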
31For G.805 buffs
- to add 1+1 trail protection to a trail
- expand a trail termination function
- we use a special transport processing function: the protection switch
- the unprotected TTs report status to the protection switch
32SONET/SDH APS
33SONET protection ?
- SONET/SDH networks need to be highly reliable (five nines)
- Down-time should be minimal (less than 50 msec)
- So systems must repair themselves (no time for manual intervention)
- Upon detection of a failure (dLOS, dLOF, high BER)
- the network must reroute traffic (protection switching)
- from working channel to protection channel
- SDH APS is unidirectional
- SDH APS may be revertive
34SONET/SDH layers
- Between regenerators there are sections (regenerator sections)
- Between ADMs there are lines (multiplex sections)
- Between path terminations there are paths
- Protection can be at OC-n level (different physical fibers)
- or at STM/VC level
- or end-to-end path (trail protection)
35Line APS
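[Figure: SONET frame of 90 columns by 9 rows, TOH followed by the Synchronous Payload Envelope; the first 3 TOH rows are section overhead and the remaining 6 rows are line overhead]
TOH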
A1 A2 J0
B1 E1 F1
D1 D2 D3
H1 H2 H3
B2 K1 K2
D4 D5 D6
D7 D8 D9
DA DB DC
S1 M0 E2
- TOH consists of
- 3 rows of section overhead: frame sync, trace, EOC, ...
- 6 rows of line overhead: pointers, SSM, FEBE, and ...
- Line APS signaling uses bytes K1 and K2
36HO Path APS
- POH is responsible for type, status, path performance monitoring, VCAT, trace
- HO Path APS signaling uses 4 MSBs of byte K3
37LO Path APS
- VC OH is responsible for
- Timing, PM, REI,
- LO Path APS signaling is
- 4 MSBs of byte K4
38How does it work?
- Head-end and tail-end NEs have bridges (muxes)
- Head-end and tail-end NEs maintain bidirectional signaling channel
- Signaling is contained in K bytes of protection channel
- For line APS
- K1: tail-end status and requests
- K2: head-end status
39Linear 1+1 protection
- Can be at OC-n level (different physical fibers)
- or at STM/VC level (SubNetwork Connection Protection)
- or end-to-end path (called trail protection)
- Head-end bridge always sends data on both channels
- Tail-end chooses channel to use based on BER, dLOS, etc.
- No need for signaling
- If non-revertive
- there is no distinction between working and protection channels
40Linear 1:1 protection
- Head-end bridge usually sends data on working channel
- When tail-end detects failure it signals (using K1) to head-end
- Head-end then starts sending data over protection channel
- When not in use
- protection channel can be used for (discounted) extra traffic
- (pre-emptible unprotected traffic)
- May be at any layer (but only OC-n level protects against fiber cuts)
[Figure: working channel, and protection channel carrying extra traffic]
41Linear 1:N protection
- In order to save BW
- we allocate 1 protection channel for every N working channels
- N limited to 14
- 4 bits in K1 byte from tail-end to head-end
- 0: protection channel
- 1-14: working channels
- 15: extra traffic channel
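The limit of N = 14 follows directly from the 4-bit channel field: 16 values, minus one for the protection channel and one for extra traffic. A minimal decoding sketch follows; the helper names are hypothetical, and placing the channel number in the 4 least significant bits of K1 is an assumption made here for illustration.

```python
def k1_channel(k1: int) -> int:
    """Extract the 4-bit channel number from a K1 byte.

    Assumption: the channel number occupies the 4 least significant bits."""
    if not 0 <= k1 <= 0xFF:
        raise ValueError("K1 is a single byte")
    return k1 & 0x0F

def channel_role(channel: int) -> str:
    """Map a 4-bit channel number to its meaning in linear 1:N protection."""
    if not 0 <= channel <= 15:
        raise ValueError("channel number is 4 bits")
    if channel == 0:
        return "protection channel"
    if channel == 15:
        return "extra traffic channel"
    return f"working channel {channel}"

for k1 in (0xC0, 0xC3, 0x0F):
    print(f"K1=0x{k1:02X}: {channel_role(k1_channel(k1))}")
```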
42Two fiber vs. Four-fiber rings
- Ring based protection is popular in North America (100K rings)
- Full protection against physical fiber cuts
- Simpler and less expensive than mesh topologies
- Protection at line (multiplexed section) or path layer
- Four-fiber rings
- fully redundant at OC level
- can support bidirectional routing at line layer
- Two-fiber rings
- support unidirectional routing at line layer
2 fibers in opposite directions
43Unidirectional vs. bidirectional
- Unidirectional routing
- working channel B-A same direction (e.g. clockwise) as A-B
- management simplicity: A-B and B-A can occupy same timeslots
- Inefficient: waste in ring BW and excessive delay in one direction
- Bidirectional routing
- A-B and B-A are opposite in direction
- both using shortest route
- spatial reuse: timeslots can be reused in other sections
44UPSR vs. BLSR (MS-SPRing)
[Figure: ring options along three axes (path switching vs. line switching, two-fiber vs. four-fiber, unidirectional vs. bidirectional), with UPSR and BLSR marked]
- Of all the possible combinations, only a few are in use
- Unidirectional (routing) Path Switched Rings
- protects tributaries
- extension of 1+1 to ring topology
- Bidirectional (routing) Line Switched Rings (two-fiber and four-fiber versions)
- called Multiplex Section Shared Protection Ring in SDH
- simultaneously protects all tributaries in STM
- extension of 1:1 to ring topology
45UPSR
- Working channel is in one direction
- protection channel in the opposite direction
- All path traffic is added in both directions (1+1)
- decision as to which to use is made at drop point (no signaling)
- Normally non-revertive, so effectively two diversity paths
- Good match for access networks
- 1 access resilient ring
- less expensive than fiber pair per customer
- Inefficient for core networks
- no spatial reuse
- every signal in every span
- in both directions
- node needs to continuously monitor
- every tributary to be dropped
46BLSR
- Switch at line level: less monitoring
- When failure detected tail-end NE signals head-end NE
- Works for unidirectional/bidirectional fiber cuts, and NE failures
- Two-fiber version
- half of OC-N capacity devoted to protection
- only half capacity available for traffic
- Four-fiber version
- full redundant OC-N devoted to protection
- twice as many NEs as compared to two-fiber
Example recovery from unidirectional fiber cut
47Ethernet linear APS
48STP
- The original Spanning Tree Protocol automatically removed loops
- from arbitrary networks (with loops)
- However, its convergence was very slow (about a minute)
- STP cannot be used as a protection mechanism
- since its reconvergence time is very long
- due to a cumbersome protocol
- and long holdoff timer settings
- An evolutionary update called Rapid STP 802.1w
- was incorporated into 802.1D-2004 clause 17
- that converges in about the same time as STP
- but can reconverge after a topology change in less than 1 second
- RSTP can be used to detect failures and reconverge
- and thus can be used as a primitive protection mechanism
- However, the switching time will be many tens of ms to 100s of ms
49Use of LAG
- Ethernet link aggregation (AKA bonding, Ethernet trunk, inverse mux, NIC teaming)
- enables bonding several ports together as single uplink
- Defined by 802.3ad task force and folded into 802.3-2000 as clause 43
- Binding of ports to Link Aggregation Groups (LAGs) distributed via
- Link Aggregation Control Protocol (LACP)
- LACP uses slow protocol frames (up to 5 per second)
- Links may be dynamically added/removed from LAG
- and LACP continuously monitors to detect if changes needed
- Upon link failure LAG delivers traffic at a reduced rate
- Thus LAG can be used as a primitive protection mechanism
- When used this way it is called worker/standby or NN mode
- The restoration time will be on the order of 1 second
50G.8031
- Q9 of SG15 in the ITU-T is responsible for protection switching
- In 2006 it produced G.8031 Linear Ethernet Protection Switching
- G.8031 uses standard Ethernet formats, but is incompatible with STP
- The standard addresses
- point-to-point VLAN connections
- SNC (local) protection class
- 1+1 and 1:1 protection types
- unidirectional and bidirectional switching for 1+1
- bidirectional switching for 1:1
- revertive and nonrevertive modes
- 1-phase signaling protocol
- G.8031 uses Y.1731 OAM CCM messages in order to detect failures
- G.8031 defines a new OAM opcode (39) for APS signaling messages
- Switching times should be under 50 ms (only holdoff timers when groups)
51G.8031 signaling
- The APS signaling message carries req/state, prot. type, requested signal, and bridged signal fields
- regular APS messages are sent 1 per 5 seconds
- after a change 3 messages are sent at max rate (300 per sec)
- where
- req/state identifies the message (NR, SF, WTR, SD, forced switch, etc)
- prot. type identifies the protection type (1+1, 1:1, uni/bidirectional, etc.)
- requested and bridged signal identify incoming / outgoing traffic
- since only 1+1 and 1:1 are supported, they are either null or traffic (all other values reserved)
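The transmission pattern (a burst of 3 messages at the maximum rate, then one every 5 seconds) can be sketched as a schedule of send times. The function name `aps_tx_times` is hypothetical, and starting the 5-second period from the last burst message is an assumption of this sketch.

```python
from typing import List

def aps_tx_times(change_at: float, horizon: float, burst: int = 3,
                 burst_rate_hz: float = 300.0, period_s: float = 5.0) -> List[float]:
    """Send times (seconds) of APS messages following a state change.

    `burst` messages go out at `burst_rate_hz`, then one message
    every `period_s` seconds until `horizon`."""
    times = [change_at + i / burst_rate_hz for i in range(burst)]
    t = times[-1] + period_s  # assumption: period counted from the last burst message
    while t <= horizon:
        times.append(t)
        t += period_s
    return times

for t in aps_tx_times(change_at=0.0, horizon=12.0):
    print(f"{t * 1000:10.2f} ms")
```

The burst takes under 7 ms in total, so even if the first two messages are lost the far end still learns of the change well within a 50 ms budget.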
52G.8031 1:1 revertive operation
- In the normal (NR) state
- head-end and tail-end exchange CCM (at 300 per second rate)
- on both working and protection channels
- head-end and tail-end exchange NR APS messages
- on the protection channel (every 5 seconds)
- When a failure appears in the working channel
- tail-end misses 3 CCM messages on the working channel
- tail-end enters SF state
- tail-end sends 3 SF messages at 300 per second on the APS channel
- tail-end switches selector (and, if bidirectional, bridge) to the protection channel
- head-end (receiving SF) switches bridge (and, if bidirectional, selector) to protection channel
- tail-end continues sending SF messages every 5 seconds
- head-end sends NR messages but with bridged=normal
- When the failure is cleared
- tail-end leaves SF state and enters WTR state (typically 5 minutes, 5..12 min)
- tail-end sends WTR message to head-end (in nonrevertive mode, a DNR message)
- tail-end sends WTR every 5 seconds
- when WTR expires both sides enter NR state
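The revertive sequence above amounts to a small state machine at the tail-end. The sketch below is a simplification, not the G.8031 state table: the event names are hypothetical, and the transition from WTR back to SF on a new failure is an assumption.

```python
from enum import Enum

class State(Enum):
    NR = "No Request"
    SF = "Signal Fail"
    WTR = "Wait To Restore"

# (current state, event) -> next state; unlisted pairs leave the state unchanged.
TRANSITIONS = {
    (State.NR, "ccm_lost"): State.SF,       # 3 CCMs missed on the working channel
    (State.SF, "ccm_restored"): State.WTR,  # failure cleared, start the WTR timer
    (State.WTR, "ccm_lost"): State.SF,      # assumption: new failure during WTR
    (State.WTR, "wtr_expired"): State.NR,   # revert to the working channel
}

def step(state: State, event: str) -> State:
    return TRANSITIONS.get((state, event), state)

def selector(state: State) -> str:
    """Traffic stays on protection until WTR expires."""
    return "working" if state is State.NR else "protection"

state = State.NR
for event in ["ccm_lost", "ccm_restored", "wtr_expired"]:
    state = step(state, event)
    print(f"{event:13s} -> {state.name:3s} selector={selector(state)}")
```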
53Ethernet ring APS
54Ethernet rings ?
- Ethernet has become carrier grade
- deterministic connection-oriented forwarding
- OAM
- synchronization
- The only thing missing to completely replace SDH is ring protection
- However, Ethernet and ring architectures don't go together
- Ethernet has no TTL, so looped traffic will loop forever
- STP builds trees out of any architecture: no loops allowed
- There are two ways to make an Ethernet ring
- open loop
- cut the ring by blocking some link
- when protection is required: block the failed link
- closed loop
- disable STP (but avoid infinite loops in some way!)
- when protection is required: steer and/or wrap traffic
55Ethernet ring protocols
- Open loop methods
- G.8032 (ERPS)
- rSTP (ex 802.1w)
- RFER (RAD)
- ERP (NSN)
- RRST (based on RSTP)
- REP (Cisco)
- RRSTP (Alcatel)
- RRPP (Huawei)
- EAPS (Extreme, RFC 3619)
- EPSR (Allied Telesis)
- PSR (Overture)
- Closed loop methods
- RPR (IEEE 802.17)
- CLEER and NERT (RAD)
56G.8032
- Q9 of SG15 produced G.8032 between 2006 and 2008
- G.8032 is similar to G.8031
- strives for 50 ms protection (< 1200 km, < 16 nodes)
- but here this number is deceiving as MAC table is flushed
- standard Ethernet format but incompatible with STP
- uses Y.1731 CCM for failure detection
- employs Y.1731 extension for R-APS signaling (opcode=40)
- R-APS message format similar to APS of G.8031
- (but between every 2 nodes and to MAC address 01-19-A7-00-00-01)
- revertive and nonrevertive operation defined
- However, G.8032 is more complex due to
- requirement to avoid loop creation under any circumstances
- need to localize failures
- need to maintain consistency between all nodes on ring
- existence of a special node (RPL owner)
57RPL
- G.8032v1 defines the Ring Protection Link (RPL)
- as the link to be blocked (to avoid closing the loop) in NR state
- One of the 2 nodes connected to the RPL
- is designated the RPL owner
- Unlike RFER
- there is only one RPL owner
- the RPL and owner are designated before setup
- operation is usually revertive
- All ring nodes are simultaneously in 1 of 2 modes: idle or protecting
- in idle mode the RPL is blocked
- in protecting mode the failed link is blocked and RPL is unblocked
- in revertive operation
- once the failure is cleared the blocked link is unblocked
- and the RPL is blocked again
58G.8032 revertive operation
- In the idle state
- adjacent nodes exchange CCM at 300 per second rate (including over RPL)
- exchange NR RB (RPL Blocked) messages in dedicated VLAN every 5 seconds (but not over RPL)
- R-APS messages are never forwarded
- When a failure appears between 2 nodes
- node(s) missing CCM messages peek twice with holdoff time
- node(s) block failed link and flush MAC table
- node(s) send SF message (3 times at max rate, then every 5 sec)
- node receiving SF message will check priority and unblock any blocked link
- node receiving SF message will send SF message to its other neighbor
- in stable protecting state SF messages over every unblocked link
- When the failure is cleared
- node(s) detect CCM and start guard timer (blocks acting on R-APS messages)
- node(s) send NR messages to neighbors (3 times at max rate, then every 5 sec)
- RPL owner receiving NR starts WTR timer
- when WTR expires RPL owner blocks RPL, flushes table, and sends NR RB
- node receiving NR RB flushes table, unblocks any blocked ports, sends NR RB
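The essential invariant of the open-loop approach is that a healthy ring always has exactly one blocked link. The toy model below tracks only which links are blocked; the `Ring` class and its method names are hypothetical, and R-APS messaging, guard timers, and MAC flushing are deliberately left out.

```python
class Ring:
    """Ring of n links; link i joins node i and node (i + 1) % n."""

    def __init__(self, n_links: int, rpl: int) -> None:
        self.n = n_links
        self.rpl = rpl
        self.failed = set()
        self.blocked = {rpl}  # idle state: only the RPL is blocked

    def fail(self, link: int) -> None:
        """Protecting state: block the failed link, unblock everything else."""
        self.failed.add(link)
        self.blocked = set(self.failed)

    def clear(self, link: int) -> None:
        """Failure repaired; the link stays blocked until WTR expires."""
        self.failed.discard(link)

    def wtr_expired(self) -> None:
        """Revert: the RPL owner blocks the RPL, other blocks are removed."""
        if not self.failed:
            self.blocked = {self.rpl}

    def is_loop_free(self) -> bool:
        return len(self.blocked) >= 1

    def is_connected(self) -> bool:
        return len(self.blocked) <= 1

ring = Ring(n_links=6, rpl=0)
ring.fail(3)
print(ring.blocked, ring.is_loop_free(), ring.is_connected())
ring.clear(3)
ring.wtr_expired()
print(ring.blocked)
```

Keeping the repaired link blocked until the RPL is blocked again is what prevents a transient loop during reversion.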
59G.8032-2010
- After coming out with G.8032 in 2008 (G.8032v1)
- the ITU came out with G.8032-2010 (G.8032v2) in 2010
- This new version is not backwards-compatible with v1
- but a v2 node must support v1 as well (but then operation is according to v1)
- Major differences
- 2 designated nodes: RPL owner node and RPL neighbor node
- and for optional flush-optimization: next neighbor node
- significant changes to
- state machine
- priority logic
- commands (forced/manual/clear) and protocol
- new Wait To Block timer
- supports more general topologies (sub-rings)
- ladders (For Further Study in v1)
- multi-ring
- ring topology discovery
- virtual channel based on VLAN or MAC address
[Figure: ring with two subrings, and a ladder topology]
60RPR 802.17
- Resilient Packet Rings
- are compatible with standard Ethernet, but different frame format
- are robust (lossless, <50 ms protection, OAM)
- are fair (based on client throttling)
- support QoS (3 classes: A, B, C)
- are efficient (full spatial reuse)
- are plug and play (automatic station autodiscovery)
- extend use of existing fiber rings
- counter-rotating add/drop ringlets, running
- SONET/SDH (any rate, PoS, GFP or LAPS) or
- packetPHY (1 or 10 Gb/s ETH PHY)
- developed by 802.17 WG
- based on Cisco's Spatial Reuse Protocol (RFC 2892)
[Figure: ringlet selection]
61Basic RPR queuing
- traffic going around ring
- placed into internal buffer
- in dual-transit queue mode placed into 1 of 2 buffers (Primary/Secondary Transit Queue) according to service class
- sent according to fairness
- traffic for local sink
- placed in output buffer according to service class
- traffic from local source
- sent according to fairness
- first sent to ringlet selection
62RPR service classes
- RPR defines 3 main classes
- class A: real time (low latency/FDV)
- class B: near real time (bounded predictable latency/FDV)
- class C: best effort
class   use      info rate               D/FDV      FE
A0      RT       reserved                low        No
A1      RT       allocated, reclaimable  low        No
B-CIR   near RT  allocated, reclaimable  bounded    No
B-EIR   near RT  opportunistic           unbounded  Yes
C       BE       opportunistic           unbounded  Yes
63RPR Class use
- A0 ring BW is reserved: not reclaimed even if no traffic
- in dual-transit queue mode
- class A frames from the ring are queued in PTQ
- class B, C in STQ
- priority for egress
- frames in PTQ
- local class A frames
- local class B (when no frames in PTQ)
- frames in STQ
- local class C (when no PTQ, STQ, local A or B)
- Notes
- class A have minimal delay
- class B have higher priority than STQ transit frames, so bounded delay/FDV
- classes B and C share STQ, so once in ring have similar delay
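The egress priority order in dual-transit queue mode can be sketched as a strict-priority pick across five queues. This is a simplification: the queue names are hypothetical labels for the five sources listed above, and fairness and rate shaping are ignored.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

# Strict priority order for egress, highest first.
EGRESS_ORDER = ["PTQ", "local_A", "local_B", "STQ", "local_C"]

def next_frame(queues: Dict[str, Deque[str]]) -> Optional[Tuple[str, str]]:
    """Return (queue name, frame) for the next frame to send, or None if idle."""
    for name in EGRESS_ORDER:
        if queues[name]:
            return name, queues[name].popleft()
    return None

queues = {name: deque() for name in EGRESS_ORDER}
queues["STQ"].append("transit class B frame")
queues["local_B"].append("local class B frame")
queues["local_C"].append("local class C frame")

while (picked := next_frame(queues)) is not None:
    print(picked)
```

With the PTQ empty, the local class B frame leaves before the transit frame waiting in the STQ, which is why class B delay stays bounded.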
64RPR - protection
- rings give inherent protection against single point of failure
- RPR specifies 2 mechanisms
- steering
- wrapping (optional)
- (implementations may also do wrapping then steering)
[Figure: steering and wrap]
65NERT and CLEER
- New Ethernet Ring Technology / Closed Loop Encapsulated Ethernet Ring
- Similar to RPR but uses real Ethernet format
- NERT and CLEER distinguish between
- ring nodes
- switches connected to ring nodes
- Traffic in ring is MAC-in-MAC encapsulated
- External MACs are of ring node
- Internal MACs are original
- Unexpected external MACs discarded
- External MACs learned as in 1ah
- Ring nodes forward according to table
- NERT floods, CLEER never floods
- Protection switch only involves changing table
- so service restoration is fast
66MPLS fast reroute
67IP FRR
- True protection mechanisms do not exist for connectionless IP
- In practice, routing protocols discover breaks and recalculate routes
- but this usually takes a long time
- Link-state IGPs detect link-down state using hellos
- for OSPF: typically every 10 sec, and detection after 40 sec
- and then Dijkstra algorithm avoids the failed link
- BFD can be used to speed up the detection
- However,
- the information still has to be propagated further (seconds?)
- and FIBs updated (100s of ms)
- Various IP Fast ReRoute (IP FRR) mechanisms have been proposed
- but true protection is best done at the MPLS level
68MPLS fast reroute
- RSVP-TE enables MPLS traffic engineering by fine control over placement
- specifies explicit path using information gathered from IGP
- resources may be reserved at LSRs along the way
- RFC 4090 defines extensions to RSVP-TE for Fast ReRoute (FRR)
- LSRs along the path preconfigure local bypasses (detours)
- Upon detection of failure by
- BFD (specified in microseconds, typically 10s of ms) or
- RSVP hellos (RFC default is 5 ms) or
- RESV / PATH messages (driven by IGP)
- upstream LSR simply enables the detour
- Since this is a local action, it should be fast
- RFC 4090 only discusses adding FRR to RSVP-TE network
- but its use with LDP is possible if there is a single label generator
not discussed in RFC 4090
69PLRs and MPs
- The fundamental entities in MPLS FRR are
- Point of Local Repair (PLR)
- Merge Point (MP)
- A PLR is the LSR before the failed element (link or node)
- All LSRs except the egress LER can be PLRs
- The PLR is solely responsible for the FRR (no explicit APS signaling)
- During path setup, potential PLRs create detours towards the egress LER
- An MP is the LSR where the detour rejoins the LSP
- All LSRs except the ingress LER can be MPs
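The local nature of the repair can be sketched as follows. This is a toy model, not an LSR implementation: the `LfibEntry` and `Plr` classes, the interface names, and the label values are all hypothetical. It only illustrates that the detour is installed at path setup and that the PLR switches to it without signaling any other node.

```python
from dataclasses import dataclass


@dataclass
class LfibEntry:
    """One forwarding entry at a potential PLR (hypothetical structure)."""
    primary_if: str           # interface toward the next LSR on the protected LSP
    detour_if: str            # interface of the detour preconfigured at path setup
    use_detour: bool = False  # flipped locally when the primary fails

    def out_interface(self) -> str:
        return self.detour_if if self.use_detour else self.primary_if


class Plr:
    def __init__(self) -> None:
        self.lfib = {}  # incoming label -> LfibEntry

    def install(self, in_label: int, primary_if: str, detour_if: str) -> None:
        """Called at path setup: primary and detour are both known in advance."""
        self.lfib[in_label] = LfibEntry(primary_if, detour_if)

    def on_failure(self, failed_if: str) -> int:
        """Local repair: enable the detour for every LSP using the failed interface."""
        switched = 0
        for entry in self.lfib.values():
            if entry.primary_if == failed_if and not entry.use_detour:
                entry.use_detour = True
                switched += 1
        return switched


plr = Plr()
plr.install(100, primary_if="ge-0/0/1", detour_if="ge-0/0/2")
plr.install(101, primary_if="ge-0/0/3", detour_if="ge-0/0/2")
plr.on_failure("ge-0/0/1")
print(plr.lfib[100].out_interface(), plr.lfib[101].out_interface())
```

Only the LSP whose primary interface failed is moved; the other LSP is untouched.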
70Methods
- RFC 4090 defines two different protection methods
- Usually one or the other is employed in a given network
- One-to-one backup
- each LSP protected separately
- detour LSP created for each LSP at each potential PLR
- no labels pushed
- Facility backup
- backup tunnel for multiple LSPs
- bypass tunnel created at each potential PLR
- uses label stacking
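The difference in label handling between the two methods can be sketched as below. The label stack is modeled as a Python list with the top of stack at index 0; the function names and all label values are hypothetical.

```python
def one_to_one_backup(stack: list, detour_label: int) -> list:
    """PLR swaps the top label for the detour LSP's label; stack depth is unchanged."""
    return [detour_label] + stack[1:]


def facility_backup(stack: list, mp_label: int, bypass_label: int) -> list:
    """PLR swaps to the label the MP expects, then pushes the bypass tunnel label."""
    return [bypass_label, mp_label] + stack[1:]


incoming = [100]  # protected LSP arrives at the PLR with a single label

print(one_to_one_backup(incoming, detour_label=200))                # depth stays 1
print(facility_backup(incoming, mp_label=300, bypass_label=900))    # depth grows to 2
```

Because the bypass label is simply pushed on top, many protected LSPs can share one bypass tunnel, whereas one-to-one backup needs a separate detour LSP per protected LSP.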
71NHOP and NNHOP
- MPLS FRR can bypass a failed link or a failed node
- In order to bypass a single failed link
- we need an alternative path to the next hop (NHOP)
- In order to bypass a single failed node
- we need an alternative path to the next next hop (NNHOP)
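As a small illustration, the sketch below computes NHOP and NNHOP bypass paths on a toy topology using breadth-first search. The topology, node names, and helper functions are assumptions made for the example; a real LSR would use constrained path computation rather than plain BFS.

```python
from collections import deque


def shortest_path(graph, src, dst, banned_nodes=(), banned_links=()):
    """BFS shortest path that avoids the given nodes and (undirected) links."""
    banned = {frozenset(link) for link in banned_links}
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nbr in graph[node]:
            if nbr in seen or nbr in banned_nodes:
                continue
            if frozenset((node, nbr)) in banned:
                continue
            seen.add(nbr)
            queue.append(path + [nbr])
    return None  # no bypass exists


def nhop_bypass(graph, lsp, plr):
    """Link protection: reach the next hop without using the protected link."""
    nhop = lsp[lsp.index(plr) + 1]
    return shortest_path(graph, plr, nhop, banned_links=[(plr, nhop)])


def nnhop_bypass(graph, lsp, plr):
    """Node protection: reach the next next hop without crossing the next hop."""
    i = lsp.index(plr)
    nhop, nnhop = lsp[i + 1], lsp[i + 2]
    return shortest_path(graph, plr, nnhop, banned_nodes={nhop})


# Toy topology: protected LSP A-B-C-D, with E adjacent to A, B and C.
graph = {
    "A": ["B", "E"],
    "B": ["A", "C", "E"],
    "C": ["B", "D", "E"],
    "D": ["C"],
    "E": ["A", "B", "C"],
}
lsp = ["A", "B", "C", "D"]

print(nhop_bypass(graph, lsp, "A"))   # bypasses link A-B
print(nnhop_bypass(graph, lsp, "A"))  # bypasses node B
```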
72MPLS TP APS
- RFC 6372 (MPLS-TP Survivability Framework)
- RFC 6378 (MPLS-TP Linear Protection)
- draft-ietf-mpls-tp-ring-protection
73MPLS-TP resilience
- Since it strives to be a carrier-grade transport network
- TP has strong protection switching requirements
- APS has been almost as contentious an issue as OAM
- and indeed the arguments are inter-related
- RFC 6372 gives a general framework
- and differentiates between
- linear
- shared-mesh and
- ring protection
74Linear protection
- from RFC 6378 (ex draft-ietf-mpls-tp-linear-protection)
- 1+1, 1:1, 1:n and uni/bidi are supported
- APS signaling protocol (for all modes except 1+1 uni)
- is single-phase
- and called the Protection State Coordination (PSC) protocol
- PSC messages are sent over the protection channel
- APS messages are sent over the GACh with a single channel type
- message functions identified by a request field
- 6 states: normal, protecting due to failure, admin protecting, WTR, protection path unavailable, DNR
- when revertive, a WTR timer is used
75PSC message format
- Request: NR, SF, SD, manual switch, forced switch, lockout, WTR, DNR
- PT: Protection Type (uni 1+1, bidi 1+1, bidi 1:1/1:n)
- R: Revertive
- FPath: which path has the fault
- Path: which data path is on the protection channel
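The fields above can be packed into a fixed header along the following lines. This is a sketch under stated assumptions: the slide names the fields but not their sizes, so the bit widths used here (2-bit version, 4-bit Request, 2-bit PT, 1-bit R, 7 reserved bits, 8-bit FPath, 8-bit Path) are a recollection of RFC 6378 and should be checked against the RFC. The function names are hypothetical, and the Request and PT values are treated as opaque integers.

```python
import struct


def pack_psc(request: int, pt: int, revertive: bool, fpath: int, path: int,
             version: int = 1) -> bytes:
    """Pack the first four octets of a PSC message (assumed field widths)."""
    first = (version & 0x3) << 6 | (request & 0xF) << 2 | (pt & 0x3)
    second = 0x80 if revertive else 0x00  # R bit followed by 7 reserved bits
    return struct.pack("!BBBB", first, second, fpath & 0xFF, path & 0xFF)


def unpack_psc(data: bytes) -> dict:
    """Inverse of pack_psc, returning the fields as a dictionary."""
    first, second, fpath, path = struct.unpack("!BBBB", data[:4])
    return {
        "version": first >> 6,
        "request": (first >> 2) & 0xF,
        "pt": first & 0x3,
        "revertive": bool(second & 0x80),
        "fpath": fpath,
        "path": path,
    }


# Illustrative values only: the numeric codes are placeholders, not taken from the slide.
msg = pack_psc(request=10, pt=2, revertive=True, fpath=1, path=1)
print(msg.hex(), unpack_psc(msg))
```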
76PSC control logic states
- Normal state - no trigger events reported
- Unavailable state - protection path is unavailable
- Protecting failure state
- traffic is being transported on the protection path
- Protecting administrative state
- operator issued command switching traffic to protection path
- Wait-to-Restore state - recovering from working path SF/SD
- WTR timer has not yet expired
- Do-not-Revert state - recovered from a protecting state
- but operator has configured DNR
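A few of the transitions between these states can be sketched as a lookup table. This is a deliberately simplified subset driven by local events only; it ignores remote PSC messages and request priorities, so it is not the full RFC 6378 state machine. The event names and the `next_state` function are hypothetical.

```python
# Simplified subset of transitions; unlisted (state, event) pairs leave the state unchanged.
TRANSITIONS = {
    ("Normal", "sf_working"): "Protecting failure",
    ("Normal", "sf_protection"): "Unavailable",
    ("Normal", "forced_switch"): "Protecting administrative",
    ("Unavailable", "clear_sf_protection"): "Normal",
    ("Wait-to-Restore", "wtr_expires"): "Normal",
    ("Wait-to-Restore", "sf_working"): "Protecting failure",
}


def next_state(state: str, event: str, revertive: bool = True) -> str:
    """Return the next PSC state for a local event (simplified)."""
    if state == "Protecting failure" and event == "clear_sf_working":
        # Revertive operation waits for the WTR timer; otherwise stay on protection.
        return "Wait-to-Restore" if revertive else "Do-not-Revert"
    return TRANSITIONS.get((state, event), state)


state = "Normal"
for event in ["sf_working", "clear_sf_working", "wtr_expires"]:
    state = next_state(state, event)
    print(event, "->", state)
```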
77PSC local requests
- In order from highest to lowest priority
- 1. Clear (operator command)
- 2. Lockout of protection (operator command)
- 3. Forced Switch (operator command)
- 4. Signal Fail on protection (OAM / control-plane / server indication)
- 5. Signal Fail on working (OAM / control-plane / server indication)
- 6. Signal Degrade on working (OAM / control-plane / server indication)
- 7. Clear Signal Fail/Degrade (OAM / control-plane / server indication)
- 8. Manual Switch (operator command)
- 9. WTR Expires (WTR timer)
- 10. No Request (default)
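The priority ordering above can be applied with a simple minimum over the currently active requests. In this sketch the enum values are the ranks from the list (1 is highest priority), not protocol code points, and the names `LocalRequest` and `top_priority` are hypothetical.

```python
from enum import IntEnum


class LocalRequest(IntEnum):
    """Local requests ranked as in the list above; lower value = higher priority."""
    CLEAR = 1
    LOCKOUT_OF_PROTECTION = 2
    FORCED_SWITCH = 3
    SIGNAL_FAIL_PROTECTION = 4
    SIGNAL_FAIL_WORKING = 5
    SIGNAL_DEGRADE_WORKING = 6
    CLEAR_SIGNAL_FAIL_DEGRADE = 7
    MANUAL_SWITCH = 8
    WTR_EXPIRES = 9
    NO_REQUEST = 10


def top_priority(active) -> LocalRequest:
    """Pick the highest-priority active request; default to No Request."""
    return min(active, default=LocalRequest.NO_REQUEST)


active = [LocalRequest.MANUAL_SWITCH, LocalRequest.SIGNAL_FAIL_WORKING]
print(top_priority(active).name)  # Signal Fail on working outranks Manual Switch
```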
78Linear protection ITU style
- from draft-zulr-mpls-tp-linear-protection-switching
- Similar to previous, but uses Y.1731/G.8031 format (no surprise!)
79Ring protection
- once again there were two drafts, both supporting p2p and p2mp, wrapping and steering, link/node failures
- draft-ietf-mpls-tp-ring-protection (not yet RFC)
- Between any 2 LSRs we can define a Sub-Path Maintenance Entity (SPME)
- So between 2 LSRs on a ring there are 2 SPMEs
- we define 1 as the working channel and 1 as the protection channel
- Now we re-use the linear protection mechanisms, including the PSC protocol
- draft-helvoort-mpls-tp-ring-protection-switching
- Both counter-rotating rings carry working and protection traffic
- The bandwidth on each ring is divided
- X BW is dedicated to working traffic and Y dedicated to protection traffic
- The protection bandwidth of one ring is used to protect the other ring
- Each node should have information about the sequence of ring nodes
- MPLS-TP Ring Protection Switching is G.8032-like, but forwards non-NR msgs