Automatic Protection Switching



1
Automatic Protection Switching
  • Yaakov (J) Stein
  • CTO
  • RAD Data Communications

Mar 2012
2
  • Course Outline
  • General protection switching principles
  • Examples of protection mechanisms
  • SONET/SDH
  • Ethernet linear protection
  • Ethernet ring protection
  • MPLS fast reroute
  • MPLS-TP APS

3
General principles
  • Definition
  • References
  • Traffic types
  • Network topologies
  • Triggers
  • Protection classes
  • Entities
  • Protection types
  • Signaling
4
Definition
  • Automatic Protection Switching (APS)
  • is a functionality of carrier-grade transport
    networks
  • is often called resilience
  • since it enables service to quickly recover
    from failures
  • is required to ensure high reliability and
    availability
  • APS includes
  • detection of failures (signal fail or signal
    degrade) on a working channel
  • switching traffic transmission to a protection
    channel
  • selecting traffic reception from the protection
    channel
  • (optionally) reverting back to the working
    channel once failure is repaired
  • "Automatic" means that APS uses (at most) control plane
    protocols
  • no management layer or manual operations
    needed

5
Some useful references
  • G.808.1 generic linear protection
  • G.808.2 generic ring protection (not yet
    written)
  • G.841 and G.842 SDH
  • G.774.3/4/9/10 SDH protection management
  • G.870 and G.873.1 OTN
  • G.8031 Ethernet linear protection
  • G.8032 Ethernet ring protection
  • G.8131 T-MPLS APS
  • Y.1720 MPLS
  • I.630 ATM
  • M.495 analog signal protection
  • G.781 clock selection (can be used to protect
    synchronization)
  • RFC 4090 MPLS Fast ReRoute
  • RFC 6372 MPLS-TP Survivability Framework
  • RFC 6378 MPLS-TP Linear Protection

6
Traffic types
  • In a network with APS capabilities, there are
    three types of traffic
  • protected traffic
  • traffic that may be rapidly switched to
    protection channel
  • at any time it may be on the working channel or
    protection channel
  • Nonpreemptible Unprotected Traffic (NUT)
  • noncritical traffic that does not require
    protection mechanism
  • not affected by protection mechanism
  • somewhat less expensive to customer
  • extra (preemptible) traffic
  • best effort background traffic that runs on
    protection channel
  • preempted (blocked) when protection channel is
    needed
  • very inexpensive to customer

7
Network topologies
  • APS can be defined for any topology with
    redundant links
  • e.g., for tree topologies no protection is
    possible
  • We will often discuss protection of individual
    links
  • However, there are two topologies that are of
    particular interest
  • rings
  • protection is natural for rings
  • although there are other reasons for using rings
    as well
  • rings are so important that protection for other
    topologies
  • is often called linear protection
  • dense meshes
  • for this topology multiple local bypasses can be
    preconfigured
  • protection switching is similar to routing
    change, but faster
  • often called Fast ReRoute (FRR)

8
Triggers
  • Protection switching is usually triggered by a
    failure
  • although the operator may manually force a
    protection switch
  • A failure is declared when a fault condition
  • persists long enough
  • for the ability to perform the required function
  • to be considered terminated
  • Failures are Signal Fail (SF) or Signal Degrade
    (SD) (of various types)
  • and may be
  • detected by physical layer
  • indicated by signaling (e.g. AIS)
  • detected by OAM mechanisms
  • When there is no SF or SD, the state is called No
    Request (NR)

9
Switching time (1)
  • SONET/SDH protection switching takes place in
    under 50 ms
  • Regarding multiplex section shared protection
    rings, G.841 states
  • "The following network objectives apply:
    1) Switch time: In a ring with no extra traffic,
    all nodes in the idle state (no detected failures,
    no active automatic or external commands, and
    receiving only Idle K-bytes), and with less
    than 1200 km of fibre, the switch (ring and span)
    completion time for a failure on a single
    span shall be less than 50 ms. On rings under all
    other conditions, the switch completion
    time can exceed 50 ms (the specific interval is
    under study) to allow time to remove extra
    traffic, or to negotiate and accommodate
    coexisting APS requests."
  • while for linear VC trail protection, it says
  • "The following network objectives apply:
    1) Switch time: The APS algorithm for LO/HO VC
    trail protection shall operate as fast as
    possible. A value of 50 ms has been proposed as a
    target time. Concerns have been
    expressed over this proposed target time when
    many VCs are involved. This is for further
    study. Protection switch completion time excludes
    the detection time necessary to initiate the
    protection switch, and the hold-off time."
  • There are similar statements in other clauses as
    well

10
Switching time (2)
  • This 50 ms time has become the gold standard
  • and new protection schemes are expected to meet
    this objective
  • However, studying the literature that led up to
    SONET/SDH standards
  • shows that the objective was to attain the
    minimum possible time
  • for the sum of
  • persistent (i.e. non-transient) failure detection
  • speed of light propagation
  • signaling protocol time
  • regaining sync alignment
  • and 50 ms was the minimum that was considered
    practical !
  • Many modern standards have built in 50 ms
  • and much marketing literature boasts faster
    than 50 ms
  • But there is really nothing special about 50 ms
  • 50 ms gaps in voiced speech are noticeable,
  • but not fatal if infrequent
  • 50 ms of data at high rates can not be stored and
    later forwarded
  • timing circuits can withstand much more than 50
    ms without clock
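
To put the 50 ms figure in perspective, the following is a small back-of-the-envelope sketch. The fibre delay of about 5 µs/km and the 10 Gb/s line rate are assumptions chosen for illustration; neither value comes from the slides.

```python
# Back-of-the-envelope numbers for a 50 ms switching budget.
# Assumptions (not from the slides): ~5 us/km one-way delay in fibre,
# and a 10 Gb/s line rate chosen only as an example.

FIBRE_DELAY_US_PER_KM = 5.0


def propagation_ms(km: float) -> float:
    """One-way propagation delay over `km` of fibre, in milliseconds."""
    return km * FIBRE_DELAY_US_PER_KM / 1000.0


def bytes_in_gap(rate_bps: float, gap_ms: float) -> float:
    """Number of bytes that arrive during a gap of `gap_ms` at `rate_bps`."""
    return rate_bps * (gap_ms / 1000.0) / 8.0


if __name__ == "__main__":
    print(f"1200 km of fibre: {propagation_ms(1200):.1f} ms one way")
    print(f"50 ms at 10 Gb/s: {bytes_in_gap(10e9, 50) / 1e6:.1f} MB")
```

Under these assumptions, propagation over the 1200 km ring quoted from G.841 uses only a small part of the 50 ms budget, while buffering a 50 ms gap at high line rates would need tens of megabytes per link.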

11
Protection classes
  • It is useful to distinguish two different
    protection classes
  • path protection (AKA trail protection, end-to-end
    protection)
  • when a failure is detected on the end-to-end path
  • we switch to an alternative end-to-end path
  • the failure is usually detected by end-to-end OAM
  • local protection (AKA local restoration, SNC
    protection, bypass, detour)
  • we protect individual network elements, links, or
    groups of same
  • when such an entity fails
  • only that local entity is bypassed
  • the failure may be detected by link OAM or
    physical layer means

12
APS entities (1)
  • The following entities are important in APS
  • working channel: channel used when no failure
    exists
  • protection channel: channel used when a failure
    exists
  • head-end: entity transmitting data to
    working/protection channel
  • tail-end: entity receiving data from the
    working/protection channel
  • Note: we will usually consider traffic to be
    bidirectional
  • so that the head-end for one direction
  • is the tail-end for the opposite direction

13
APS entities (2)
  • Bridge: function at head-end that connects
    traffic (including extra traffic) to the working
    and protection channels
  • Selector: function at tail-end that extracts
    traffic (perhaps extra traffic) from the working
    or protection channel
  • APS signaling channel: channel used to
    communicate between head-end and tail-end for APS
    purposes
  • Trail termination: function responsible for
    failure detection
  • including injection and extraction of OAM

[Figure: head-end (bridge) and tail-end (selector) joined by the working channel, protection channel, and signaling channel]
14
Revertive operation
  • Reversion means returning to use the working
    channel
  • after the failure has been rectified
  • Protection mechanisms can be revertive or
    nonrevertive
  • Revertive mechanisms may be preferable
  • when the working channel has better performance
    (free BW, BER, delay)
  • when there are frequent switches (easier to
    manage)
  • when there is extra traffic
  • but nonrevertive also has advantages
  • only one service disruption due to protection
    switching
  • may be simpler to implement

15
Uni/bi-directional
  • We will usually consider bidirectional traffic
  • but even then the failures can be uni- or bi-
    directional
  • and for unidirectional failures there can be uni-
    or bi- directional switching

16
Uni- / bi- directional switching
  • Unidirectional switching may be advantageous
  • for 1+1: faster and no signaling channel is
    needed
  • no unnecessary service disruption for direction
    without failure
  • higher chance of protection under multiple
    failures
  • easier to implement for local protection
  • maintains extra traffic in direction without
    failure
  • But bidirectional may be preferable
  • easier management since directions traverse same
    network elements
  • does not disrupt delay balance between directions
  • may simplify repair since failed spans are unused

17
Protection types
  • We distinguish several different protection types
  • 1+1
  • 1:1
  • 1:n
  • m:n
  • (1:1)^n
  • Each type has its applicability, advantages, and
    disadvantages
  • and there are trade-offs between
  • simplicity
  • BW consumption
  • protection switch time
  • signaling requirements

18
1+1 protection
  • Simplest and fastest form of protection
  • but wasteful - only 50% of actual physical
    capacity is used
  • Head-end bridge always sends data on both
    channels
  • Tail-end selector chooses channel to use (based
    on BER, dLOS, etc.)
  • For unidirectional 1+1 switching there is no need
    for APS signaling
  • If non-revertive
  • there is no distinction between working and
    protection channels

19
1:1 protection
  • Head-end bridge usually sends data on working
    channel
  • When failure detected it starts sending data over
    protection channel
  • and tail-end needs to select the protection
    channel
  • When not in use, protection channel can be used
    for extra traffic
  • However, since failure is detected by tail-end,
    APS signaling is needed
  • Protection channel should have OAM running to
    ensure its functionality

20
1:n protection
  • One protection channel is allocated for n working
    channels
  • Can protect only one working channel at a time
  • but it is improbable that more than 1 working channel
    will fail simultaneously
  • Only 1/(n+1) of total capacity is reserved for
    protection

21
m:n protection
  • To enable protection of more than 1 channel
  • m protection channels are allocated for n working
    channels (m < n)
  • m simultaneous failures can be protected
  • Less protection capacity dedicated than for n
    times 1:1
  • When failure detected,
  • 1 of the m protection channels needs to be
    assigned and signaled
  • High complexity but conserves resources
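
The bandwidth trade-off between the protection types can be made concrete with a short calculation. This is a minimal sketch that assumes all channels have equal capacity; the function name `protection_fraction` is introduced here for illustration only.

```python
from fractions import Fraction


def protection_fraction(m: int, n: int) -> Fraction:
    """Fraction of total capacity reserved for protection when m
    protection channels guard n working channels (m:n).

    Assumes every channel has the same capacity."""
    if m < 0 or n <= 0:
        raise ValueError("need m >= 0 and n > 0")
    return Fraction(m, m + n)


if __name__ == "__main__":
    cases = [("1:1", 1, 1), ("1:4", 1, 4), ("2:8", 2, 8), ("4 x 1:1", 4, 4)]
    for label, m, n in cases:
        print(f"{label:8s} reserves {protection_fraction(m, n)} of capacity")
```

With m = 1 this reduces to the 1/(n+1) figure quoted for 1:n, and n separate 1:1 pairs always reserve half of the capacity.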

22
(1:1)^n protection
  • This is like n times 1:1 but the n protection
    channels share bandwidth
  • Only 1 failed working channel can be protected
  • This is different from 1:n since
  • n protection channels are preconfigured
  • n working channels need not be of the same type
  • Protection bandwidth must be at least that of the
    largest working channel

23
APS algorithm
  • We have seen that protection switching is a
    tricky business
  • So it is not surprising that network elements
    that support APS
  • run an APS algorithm
  • This algorithm inputs
  • configuration (protection type, revertive?,
    available channels, ...)
  • failure indications (NR, SF, SD)
  • operator commands
  • APS signaling (more on that soon)
  • and makes switching decisions
  • The algorithm maintains state information for
    head-end and tail-end
  • APS algorithms are detailed in standards documents

24
Priority
  • Not every failure event / operator command
    results in a protection switch
  • For example
  • in 1:n protection the protection channel may
    already be in use !
  • Conflicts are resolved by assigning priorities to
    events/commands
  • When an event is detected or a command received
  • the APS algorithm will not act
  • if an event/command of equal or higher priority
    is already in effect
  • True failure conditions usually have higher
    priority than manual commands
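
The priority rule above can be sketched as a small arbiter. The priority ordering used here (SF above SD above a manual command above NR) is an assumption made for illustration; the actual ordering is defined by each standard and is not given on the slide.

```python
from enum import IntEnum


class Request(IntEnum):
    """Illustrative priority order (higher value wins); assumed, not normative."""
    NR = 0             # No Request
    MANUAL_SWITCH = 1  # operator command
    SD = 2             # Signal Degrade
    SF = 3             # Signal Fail


class Arbiter:
    """Tracks the request currently in effect on one protection channel."""

    def __init__(self) -> None:
        self.active = Request.NR

    def offer(self, new: Request) -> bool:
        """Return True if `new` takes effect, False if it is ignored."""
        if new <= self.active:
            return False  # equal or higher priority already in effect
        self.active = new
        return True

    def clear(self) -> None:
        """Drop the active request, returning to No Request."""
        self.active = Request.NR
```

For example, once `offer(Request.SF)` has been accepted, a later `offer(Request.MANUAL_SWITCH)` or a second `offer(Request.SF)` is ignored until `clear()` is called.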

25
Timers
  • Even failure events with priority are not acted
    upon immediately
  • to do so would cause unnecessary switches after
    transient defects
  • The APS algorithm may maintain several timers,
    such as
  • Holdoff timers
  • the time between detection of a SF or SD event
  • and the APS algorithm acting upon this event
  • the algorithm usually used is called peek twice
  • i.e., the condition is checked again after the
    timer expires
  • Wait To Restore timer
  • for revertive switching, the time between
    detection of the failure being cleared and the
    APS algorithm acting upon this event
  • also used in SDH optimized bidirectional 1+1
    (nonrevertive)
  • Guard timer
  • for rings: blockout time during which APS
    messages are ignored (since they may be old and
    outdated)
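
The hold-off behaviour with "peek twice" can be sketched as a tick-driven timer. This is a simplified model, assuming the defect status is sampled once per tick; the class name `HoldOff` and the tick granularity are illustrative and not taken from any standard.

```python
class HoldOff:
    """Hold-off timer with a 'peek twice' check, driven by discrete ticks."""

    def __init__(self, holdoff_ticks: int) -> None:
        self.holdoff_ticks = holdoff_ticks
        self.remaining = None  # None means the timer is not running

    def tick(self, defect_present: bool) -> bool:
        """Advance one tick; return True when a failure should be declared."""
        if self.remaining is None:
            if defect_present:
                self.remaining = self.holdoff_ticks  # first peek: start timer
            return False
        self.remaining -= 1
        if self.remaining > 0:
            return False            # still waiting; samples are not examined
        self.remaining = None       # timer expired
        return defect_present       # second peek: is the defect still there?
```

A defect that is present when the timer starts and again when it expires triggers a switch; a transient that has cleared by expiry is ignored. In this sketch the caller is expected to stop polling once a failure has been declared.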

26
APS signaling
  • In all types except unidirectional 1+1, some APS
    signaling is needed
  • APS signaling is used to synchronize between
    head-end and tail-end
  • It is critical that head-end and tail-end always
    be in the same state
  • Example messages include
  • No Request (NR)
  • by tail-end to inform head-end of Signal Failure
    (SF)
  • by head-end to confirm the event's priority
  • by head-end to report the particular protection
    channel
  • by head-end to inform tail-end of Reverse
    (bidirectional) Request (RR)
  • by tail-end after failure cleared to Wait To
    Restore (WTR)
  • by tail-end after failure cleared to Do Not
    Revert (DNR) for nonrevertive

27
APS signaling phases
  • When APS signaling is used, it needs to be as
    rapid as possible
  • Depending on the scenario it may be
  • 1-phase: tail→head (fastest)
  • tail-end informs head-end of failure
  • both ends uniquely know the protection channel to
    be used
  • only for 1+1 and unidirectional (1:1)^n
    (including 1:1)
  • 2-phase: 1) tail→head 2) head→tail
  • tail-end informs head-end of failure
  • head-end signals that it has switched to
    protection channel
  • not for bidirectional 1:n or m:n
  • 3-phase: 1) tail→head 2) head→tail 3) tail→head
    (slowest)
  • works for all protection types (including m:n)

28
Examples of 1-phase
  • Example of when 1-phase signaling is possible is
    1:1 or (1:1)^n
  • 1. upon detection of failure the tail-end sends
    SF to the head-end
  • and immediately changes its selector (blind
    switch)
  • upon receipt the head-end changes the bridge
    setting
  • (no priority is checked)
  • 1-phase can also be used for bidirectional 1:1
  • 1. upon detection of failure the tail-end sends
    SF to the head-end
  • and immediately changes both its selector and
    bridge
  • upon receipt the head-end changes its bridge and
    selector

29
Example of 2-phase
  • 2-phase is useful for unidirectional 1:n with
    priority checking
  • 1. upon detection of failure the tail-end sends
    SF to the head-end
  • but does not change its selector
  • 2. the head-end checks priority
  • sends confirmation to tail-end (with identity of
    working channel)
  • the bridge setting is changed
  • 3. the tail-end changes its selector
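
The 2-phase exchange can be sketched with two cooperating objects. This is a toy model, assuming one shared protection channel and a "first request wins" priority check; the class and method names are hypothetical, and the direct method call stands in for the APS messages.

```python
class HeadEnd:
    """Head-end of a unidirectional 1:n group with one protection channel."""

    def __init__(self) -> None:
        self.bridged = None  # working channel currently bridged to protection

    def on_sf(self, channel: int) -> bool:
        """Phase 2: check priority, change the bridge, and confirm (True)."""
        if self.bridged is not None:
            return False     # protection channel already in use: refuse
        self.bridged = channel
        return True


class TailEnd:
    """Tail-end that waits for confirmation before moving its selector."""

    def __init__(self, head: HeadEnd) -> None:
        self.head = head
        self.selected_from_protection = set()

    def on_failure(self, channel: int) -> bool:
        """Phase 1: report SF; the selector changes only after confirmation."""
        confirmed = self.head.on_sf(channel)  # stands in for the APS messages
        if confirmed:
            self.selected_from_protection.add(channel)
        return confirmed
```

In this model a failure on working channel 3 is confirmed and switched, while a second failure on channel 5 is refused because the protection channel is already carrying channel 3.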

30
Example of 3-phase
  • 3-phase signaling is imperative for bidirectional
    1:n
  • 1. upon detection of failure the tail-end sends
    SF to the head-end
  • but does not change its selector
  • 2. the head-end checks priority, and sends
    confirmation to tail-end
  • head-end changes its bridge setting
  • and also sends a reverse request
  • 3. the tail-end changes selector
  • checks priority and sends confirmation to
    head-end
  • tail-end changes its bridge setting (as head-end
    of opposite direction)
  • head-end receives confirmation and changes its
    selector

31
For G.805 buffs
  • to add 1+1 trail protection to a trail - expand a
    trail termination function
  • we use a special transport processing function -
    the protection switch

the unprotected TTs report status to the
protection switch
32
SONET/SDH APS
33
SONET protection ?
  • SONET/SDH networks need to be highly reliable
    (five nines)
  • Down-time should be minimal (less than 50 msec)
  • So systems must repair themselves (no time for
    manual intervention)
  • Upon detection of a failure (dLOS, dLOF, high
    BER)
  • the network must reroute traffic (protection
    switching)
  • from working channel to protection channel
  • SDH APS is unidirectional
  • SDH APS may be revertive

34
SONET/SDH layers
  • Between regenerators there are sections
    (regenerator sections)
  • Between ADMs there are lines (multiplex sections)
  • Between path terminations there are paths
  • Protection can be at OC-n level (different
    physical fibers)
  • or at STM/VC level
  • or end-to-end path (trail protection)

35
Line APS
[Figure: frame of 9 rows by 90 columns; the TOH (3 rows of section overhead above 6 rows of line overhead) is followed by the Synchronous Payload Envelope]
TOH bytes:
A1 A2 J0
B1 E1 F1
D1 D2 D3
H1 H2 H3
B2 K1 K2
D4 D5 D6
D7 D8 D9
DA DB DC
S1 M0 E2
  • TOH consists of
  • 3 rows of section overhead - frame sync, trace,
    EOC,
  • 6 rows of line overhead - pointers, SSM, FEBE,
    and
  • Line APS signaling uses bytes K1 and K2

36
HO Path APS
  • POH is responsible for type, status, path
    performance monitoring, VCAT, trace
  • HO Path APS signaling uses 4 MSBs of byte K3

37
LO Path APS
  • VC OH is responsible for
  • Timing, PM, REI,
  • LO Path APS signaling is
  • 4 MSBs of byte K4

38
How does it work?
  • Head-end and tail-end NEs have bridges (muxes)
  • Head-end and tail-end NEs maintain bidirectional
    signaling channel
  • Signaling is contained in K bytes of protection
    channel
  • For line APS
  • K1: tail-end status and requests
  • K2: head-end status

39
Linear 1+1 protection
  • Can be at OC-n level (different physical fibers)
  • or at STM/VC level (SubNetwork Connection
    Protection)
  • or end-to-end path (called trail protection)
  • Head-end bridge always sends data on both
    channels
  • Tail-end chooses channel to use based on BER,
    dLOS, etc.
  • No need for signaling
  • If non-revertive
  • there is no distinction between working and
    protection channels

40
Linear 1:1 protection
  • Head-end bridge usually sends data on working
    channel
  • When tail-end detects failure it signals (using
    K1) to head-end
  • Head-end then starts sending data over protection
    channel
  • When not in use
  • protection channel can be used for (discounted)
    extra traffic
  • (pre-emptible unprotected traffic)
  • May be at any layer (but only OC-n level protects
    against fiber cuts)

working channel
extra traffic
protection channel
41
Linear 1:N protection
  • In order to save BW
  • we allocate 1 protection channel for every N
    working channels
  • N limited to 14
  • 4 bits in K1 byte from tail-end to head-end
  • 0: protection channel
  • 1-14: working channels
  • 15: extra traffic channel
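
The channel numbering above can be illustrated with a small helper for the K1 byte. This sketch assumes that the request code occupies the upper 4 bits of K1 and the channel number the lower 4 bits; request codes are treated as opaque values, and all function names are illustrative.

```python
from typing import Tuple


def pack_k1(request: int, channel: int) -> int:
    """Pack a 4-bit request code and a 4-bit channel number into one byte."""
    if not 0 <= request <= 15 or not 0 <= channel <= 15:
        raise ValueError("request and channel must each fit in 4 bits")
    return (request << 4) | channel


def unpack_k1(k1: int) -> Tuple[int, int]:
    """Split a K1 byte into (request code, channel number)."""
    return (k1 >> 4) & 0xF, k1 & 0xF


def channel_role(channel: int) -> str:
    """Interpret the channel number as listed on the slide."""
    if channel == 0:
        return "protection channel"
    if 1 <= channel <= 14:
        return f"working channel {channel}"
    if channel == 15:
        return "extra traffic channel"
    raise ValueError("channel number must fit in 4 bits")
```

Because two of the 16 possible values are taken by the protection and extra traffic channels, only 14 working channels can be addressed, which is where the limit on N comes from.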

42
Two-fiber vs. four-fiber rings
  • Ring based protection is popular in North America
    (100K rings)
  • Full protection against physical fiber cuts
  • Simpler and less expensive than mesh topologies
  • Protection at line (multiplex section) or path
    layer
  • Four-fiber rings
  • fully redundant at OC level
  • can support bidirectional routing at line layer
  • Two-fiber rings
  • support unidirectional routing at line layer

2 fibers in opposite directions
43
Unidirectional vs. bidirectional
  • Unidirectional routing
  • working channel B-A same direction (e.g.
    clockwise) as A-B
  • management simplicity: A-B and B-A can occupy
    same timeslots
  • inefficient: waste in ring BW and excessive delay
    in one direction
  • Bidirectional routing
  • A-B and B-A are opposite in direction
  • both using shortest route
  • spatial reuse: timeslots can be reused in other
    sections
44
UPSR vs. BLSR (MS-SPRing)
[Figure: classification of rings by path vs. line switching, two-fiber vs. four-fiber, and unidirectional vs. bidirectional routing, with UPSR and BLSR marked]
  • Of all the possible combinations, only a few are
    in use
  • Unidirectional (routing) Path Switched Rings
  • protects tributaries
  • extension of 1+1 to ring topology
  • Bidirectional (routing) Line Switched Rings
    (two-fiber and four-fiber versions)
  • called Multiplex Section Shared Protection Ring
    in SDH
  • simultaneously protects all tributaries in STM
  • extension of 1:1 to ring topology

45
UPSR
  • Working channel is in one direction
  • protection channel in the opposite direction
  • All path traffic is added in both directions
    (1+1)
  • decision as to which to use is made at drop point
    (no signaling)
  • Normally non-revertive, so effectively two
    diversity paths
  • Good match for access networks
  • 1 access resilient ring
  • less expensive than fiber pair per customer
  • Inefficient for core networks
  • no spatial reuse
  • every signal in every span
  • in both directions
  • node needs to continuously monitor
  • every tributary to be dropped

46
BLSR
  • Switch at line level: less monitoring
  • When failure detected tail-end NE signals
    head-end NE
  • Works for unidirectional/bidirectional fiber
    cuts, and NE failures
  • Two-fiber version
  • half of OC-N capacity devoted to protection
  • only half capacity available for traffic
  • Four-fiber version
  • full redundant OC-N devoted to protection
  • twice as many NEs as compared to two-fiber

Example recovery from unidirectional fiber cut
47
Ethernet linear APS
  • STP
  • LAG
  • G.8031

48
STP
  • The original Spanning Tree Protocol automatically
    removed loops
  • from arbitrary networks (with loops)
  • However, its convergence was very slow (about a
    minute)
  • STP can not be used as a protection mechanism
  • since its reconvergence time is very long
  • due to a cumbersome protocol
  • and long holdoff timer settings
  • An evolutionary update called Rapid STP 802.1w
  • was incorporated into 802.1D-2004 clause 17
  • that converges in about the same time as STP
  • but can reconverge after a topology change in
    less than 1 second
  • RSTP can be used to detect failures and
    reconverge
  • and thus can be used as a primitive protection
    mechanism
  • However, the switching time will be many tens of
    ms to 100s of ms

49
Use of LAG
  • Ethernet link aggregation (AKA bonding,
    Ethernet trunk, inverse mux, NIC teaming)
  • enables bonding several ports together as single
    uplink
  • Defined by 802.3ad task force and folded into
    802.3-2000 as clause 43
  • Binding of ports to Link Aggregation Groups
    (LAGs) distributed via
  • Link Aggregation Control Protocol (LACP)
  • LACP uses slow protocol frames (up to 5 per
    second)
  • Links may be dynamically added/removed from LAG
  • and LACP continuously monitors to detect if
    changes needed
  • Upon link failure LAG delivers traffic at a
    reduced rate
  • Thus LAG can be used as a primitive protection
    mechanism
  • When used this way it is called worker/standby or
    N+N mode
  • The restoration time will be on the order of 1
    second

50
G.8031
  • Q9 of SG15 in the ITU-T is responsible for
    protection switching
  • In 2006 it produced G.8031 Linear Ethernet
    Protection Switching
  • G.8031 uses standard Ethernet formats, but is
    incompatible with STP
  • The standard addresses
  • point-to-point VLAN connections
  • SNC (local) protection class
  • 1+1 and 1:1 protection types
  • unidirectional and bidirectional switching for
    1+1
  • bidirectional switching for 1:1
  • revertive and nonrevertive modes
  • 1-phase signaling protocol
  • G.8031 uses Y.1731 OAM CCM messages in order to
    detect failures
  • G.8031 defines a new OAM opcode (39) for APS
    signaling messages
  • Switching times should be under 50 ms (only
    holdoff timers when groups)

51
G.8031 signaling
  • The APS signaling message looks like this
  • regular APS messages are sent 1 per 5 seconds
  • after change 3 messages are sent at max rate (300
    per sec)
  • where
  • req/state identifies the message (NR, SF, WTR,
    SD, forced switch, etc)
  • prot. type identifies the protection type (1+1,
    1:1, uni/bidirectional, etc.)
  • requested and bridged signal identify incoming /
    outgoing traffic
  • since only 1+1 and 1:1 are supported, they are either null or
    traffic (all other values reserved)
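
The four fields named above can be packed into a few octets as in the following sketch. The layout (req/state and prot. type sharing the first octet, then one octet each for requested and bridged signal, then a reserved octet) and the two code points are assumptions made for illustration and should be checked against G.8031 before use.

```python
import struct

# Assumed code points for illustration only; verify against G.8031.
NR = 0b0000  # No Request
SF = 0b1011  # Signal Fail


def pack_aps(req_state: int, prot_type: int, requested: int, bridged: int) -> bytes:
    """Pack the four APS fields into 4 octets.

    Assumed layout: req/state and prot. type share the first octet
    (4 bits each), followed by requested signal, bridged signal, and
    one reserved octet set to zero.
    """
    if not 0 <= req_state <= 15 or not 0 <= prot_type <= 15:
        raise ValueError("req/state and prot. type are 4-bit fields")
    first = (req_state << 4) | prot_type
    return struct.pack("!BBBB", first, requested, bridged, 0)


def unpack_aps(data: bytes):
    """Inverse of pack_aps: returns (req_state, prot_type, requested, bridged)."""
    first, requested, bridged, _reserved = struct.unpack("!BBBB", data)
    return first >> 4, first & 0xF, requested, bridged
```

With only null and normal traffic defined, `requested` and `bridged` would take just two values each (0 and 1 in this sketch).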

52
G.8031 1:1 revertive operation
  • In the normal (NR) state
  • head-end and tail-end exchange CCM (at 300 per
    second rate)
  • on both working and protection channels
  • head-end and tail-end exchange NR APS messages
  • on the protection channel (every 5 seconds)
  • When a failure appears in the working channel
  • tail-end misses 3 consecutive CCM messages on
    working channel
  • tail-end enters SF state
  • tail-end sends 3 SF messages at 300 per second on
    the APS channel
  • tail-end switches selector (bi-d and bridge) to
    the protection channel
  • head-end (receiving SF) switches bridge (bi-d and
    selector) to protection channel
  • tail-end continues sending SF messages every 5
    seconds
  • head-end sends NR messages but with
    bridged=normal
  • When the failure is cleared
  • tail-end leaves SF state and enters WTR state
    (typically 5 minutes, 5..12 min)
  • tail-end sends WTR message to head-end (in
    nonrevertive - DNR message)
  • tail-end sends WTR every 5 seconds
  • when WTR expires both sides enter NR state

53
Ethernet ring APS
  • G.8032
  • RPR
  • CLEER

54
Ethernet rings ?
  • Ethernet has become carrier grade
  • deterministic connection-oriented forwarding
  • OAM
  • synchronization
  • The only thing missing to completely replace SDH
    is ring protection
  • However, Ethernet and ring architectures don't go
    together
  • Ethernet has no TTL, so looped traffic will loop
    forever
  • STP builds trees out of any architecture: no
    loops allowed
  • There are two ways to make an Ethernet ring
  • open loop
  • cut the ring by blocking some link
  • when protection is required - block the failed
    link
  • closed loop
  • disable STP (but avoid infinite loops in some way
    !)
  • when protection is required - steer and/or wrap
    traffic

55
Ethernet ring protocols
  • Open loop methods
  • G.8032 (ERPS)
  • rSTP (ex 802.1w)
  • RFER (RAD)
  • ERP (NSN)
  • RRST (based on RSTP)
  • REP (Cisco)
  • RRSTP (Alcatel)
  • RRPP (Huawei)
  • EAPS (Extreme, RFC 3619)
  • EPSR (Allied Telesis)
  • PSR (Overture)
  • Closed loop methods
  • RPR (IEEE 802.17)
  • CLEER and NERT (RAD)

56
G.8032
  • Q9 of SG15 produced G.8032 between 2006 and 2008
  • G.8032 is similar to G.8031
  • strives for 50 ms protection (< 1200 km, < 16
    nodes)
  • but here this number is deceiving as MAC table is
    flushed
  • standard Ethernet format but incompatible with
    STP
  • uses Y.1731 CCM for failure detection
  • employs Y.1731 extension for R-APS signaling
    (opcode 40)
  • R-APS message format similar to APS of G.8031
  • (but between every 2 nodes and to MAC address
    01-19-A7-00-00-01)
  • revertive and nonrevertive operation defined
  • However, G.8032 is more complex due to
  • requirement to avoid loop creation under any
    circumstances
  • need to localize failures
  • need to maintain consistency between all nodes on
    ring
  • existence of a special node (RPL owner)

57
RPL
  • G.8032v1 defines the Ring Protection Link (RPL)
  • as the link to be blocked (to avoid closing the
    loop) in NR state
  • One of the 2 nodes connected to the RPL
  • is designated the RPL owner
  • Unlike RFER
  • there is only one RPL owner
  • the RPL and owner are designated before setup
  • operation is usually revertive
  • All ring nodes are simultaneously in 1 of 2 modes:
    idle or protecting
  • in idle mode the RPL is blocked
  • in protecting mode the failed link is blocked and
    RPL is unblocked
  • in revertive operation
  • once the failure is cleared the blocked link is
    unblocked
  • and the RPL is blocked again

58
G.8032 revertive operation
  • In the idle state
  • adjacent nodes exchange CCM at 300 per second
    rate (including over RPL)
  • exchange NR RB (RPL Blocked) messages in
    dedicated VLAN every 5 seconds (but not over RPL)
  • R-APS messages are never forwarded
  • When a failure appears between 2 nodes
  • node(s) missing CCM messages peek twice with
    holdoff time
  • node(s) block failed link and flush MAC table
  • node(s) send SF message (3 times at max rate, then
    every 5 sec)
  • node receiving SF message will check priority and
    unblock any blocked link
  • node receiving SF message will send SF message to
    its other neighbor
  • in stable protecting state SF messages are sent over every
    unblocked link
  • When the failure is cleared
  • node(s) detect CCM and start guard timer (blocks
    acting on R-APS messages)
  • node(s) send NR messages to neighbors (3 times at
    max rate, then every 5 sec)
  • RPL owner receiving NR starts WTR timer
  • when WTR expires RPL owner blocks RPL, flushes
    table, and sends NR RB
  • node receiving NR RB flushes table, unblocks any
    blocked ports, sends NR RB

59
G.8032-2010
  • After coming out with G.8032 in 2008 (G.8032v1)
  • the ITU came out with G.8032-2010 (G.8032v2) in
    2010
  • This new version is not backwards-compatible with
    v1
  • but a v2 node must support v1 as well (but then
    operation is according to v1)
  • Major differences
  • 2 designated nodes: RPL owner node and RPL
    neighbor node
  • and for optional flush-optimization
    next neighbor node
  • significant changes to
  • state machine
  • priority logic
  • commands (forced/manual/clear) and protocol
  • new Wait To Block timer
  • supports more general topologies (sub-rings)
  • ladders (For Further Study in v1)
  • multi-ring
  • ring topology discovery
  • virtual channel based on VLAN or MAC address

[Figure: a ring with two subrings, and a ladder topology]
60
RPR 802.17
  • Resilient Packet Rings
  • are compatible with standard Ethernet, but
    different frame format
  • are robust (lossless, < 50 ms protection, OAM)
  • are fair (based on client throttling)
  • support QoS (3 classes: A, B, C)
  • are efficient (full spatial reuse)
  • are plug and play (automatic station
    autodiscovery)
  • extend use of existing fiber rings
  • counter-rotating add/drop ringlets, running
  • SONET/SDH (any rate, PoS, GFP or LAPS) or
  • packetPHY (1 or 10 Gb/s ETH PHY)
  • developed by 802.17 WG
  • based on Cisco's Spatial Reuse Protocol (RFC
    2892)

ringlet selection
61
Basic RPR queuing
  • traffic going around ring
  • placed into internal buffer
  • in dual-transit queue mode placed into 1
    of 2 buffers according to service class
  • sent according to fairness
  • traffic for local sink
  • placed in output buffer according to service class
  • traffic from local source
  • sent according to fairness
  • first sent to ringlet selection

Primary/Secondary Transit Queue
62
RPR service classes
  • RPR defines 3 main classes
  • class A: real time (low latency/FDV)
  • class B: near real time (bounded predictable
    latency/FDV)
  • class C: best effort

class   use       info rate                D/FDV      FE
A0      RT        reserved                 low        No
A1      RT        allocated, reclaimable   low        No
B-CIR   near RT   allocated, reclaimable   bounded    No
B-EIR   near RT   opportunistic            unbounded  Yes
C       BE        opportunistic            unbounded  Yes
63
RPR Class use
  • A0 ring BW is reserved: it is not reclaimed even if there is no
    traffic
  • in dual-transit queue mode
  • class A frames from the ring are queued in PTQ
  • class B, C in STQ
  • priority for egress
  • frames in PTQ
  • local class A frames
  • local class B (when no frames in PTQ)
  • frames in STQ
  • local class C (when no PTQ, STQ, local A or B)
  • Notes
  • class A frames have minimal delay
  • class B frames have higher priority than STQ
    transit frames, so bounded delay/FDV
  • classes B and C share the STQ, so once in the
    ring they have similar delay
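
The egress priority order above can be sketched as a strict-priority pick over five queues. This is a minimal sketch only: the queue and function names are illustrative, and real RPR stations also apply fairness and STQ fill thresholds, which are ignored here.

```python
from collections import deque

def next_frame(ptq, stq, local_a, local_b, local_c):
    """Pick the next frame to send, in the priority order listed on the slide."""
    if ptq:
        return ptq.popleft()      # 1. transit frames in the primary transit queue
    if local_a:
        return local_a.popleft()  # 2. locally sourced class A
    if local_b:
        return local_b.popleft()  # 3. local class B (PTQ is empty at this point)
    if stq:
        return stq.popleft()      # 4. transit frames in the secondary transit queue
    if local_c:
        return local_c.popleft()  # 5. local class C, only when everything else is empty
    return None                   # nothing to send
```

Calling next_frame repeatedly with one frame in each queue drains them in the order PTQ, local A, local B, STQ, local C.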

64
RPR - protection
  • rings give inherent protection against single
    point of failure
  • RPR specifies 2 mechanisms
  • steering
  • wrapping (optional)
  • (implementations may also do wrapping then
    steering)
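
The difference between the two mechanisms can be seen by listing the nodes a frame visits. This is a minimal sketch under stated assumptions: nodes are numbered 0..n-1 around the ring, the failure is a single span between two adjacent nodes, and in the wrapping case the destination is assumed to take the frame off the protection ringlet (a real implementation may carry it on to the far wrap point first). All function names are illustrative.

```python
def ring_path(n, src, dst, step):
    """Nodes visited from src to dst on an n-node ring; step is +1 or -1."""
    path = [src]
    while path[-1] != dst:
        path.append((path[-1] + step) % n)
    return path

def uses_span(path, a, b):
    """True if the path crosses the span between adjacent nodes a and b."""
    return any({u, v} == {a, b} for u, v in zip(path, path[1:]))

def steer(n, src, dst, fail_a, fail_b):
    """Steering: the source picks the ringlet that avoids the failed span."""
    for step in (+1, -1):
        path = ring_path(n, src, dst, step)
        if not uses_span(path, fail_a, fail_b):
            return path
    return None

def wrap(n, src, dst, fail_a, fail_b, step=+1):
    """Wrapping: the frame keeps its ringlet and is turned back next to the failure."""
    path = ring_path(n, src, dst, step)
    if not uses_span(path, fail_a, fail_b):
        return path                       # failure is not on this path
    out = [src]
    # travel on the original ringlet up to the node adjacent to the failure
    while {out[-1], (out[-1] + step) % n} != {fail_a, fail_b}:
        out.append((out[-1] + step) % n)
    # turn around and go the long way on the other ringlet
    back = ring_path(n, out[-1], dst, -step)
    return out + back[1:]
```

On a 6-node ring with the span between nodes 1 and 2 failed, steer(6, 0, 3, 1, 2) goes directly the other way round, while wrap(6, 0, 3, 1, 2) first travels to node 1 and then doubles back, which is why wrapping reacts locally but uses a longer path.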

steering info
wrap
65
NERT and CLEER
  • New Ethernet Ring Technology / Closed Loop
    Encapsulated Ethernet Ring
  • Similar to RPR but uses real Ethernet format
  • NERT and CLEER distinguish between
  • ring nodes
  • switches connected to ring nodes
  • Traffic in ring is MAC-in-MAC encapsulated
  • External MACs are of ring node
  • Internal MACs are original
  • Unexpected external MACs discarded
  • External MACs learned as in 1ah
  • Ring nodes forward according to table
  • NERT floods, CLEER never floods
  • Protection switch only involves changing table
  • so service restoration is fast

66
MPLS fast reroute
  • IP FRR
  • RFC 4090

67
IP FRR
  • True protection mechanisms do not exist for
    connectionless IP
  • In practice, routing protocols discover breaks
    and recalculate routes
  • but this usually takes a long time
  • Link-state IGPs detect link-down state using
    hellos
  • for OSPF - typically every 10 sec, and detection
    after 40 sec
  • and then Dijkstra algorithm avoids the failed
    link
  • BFD can be used to speed up the detection
  • However,
  • the information still has to be propagated
    further (seconds?)
  • and FIBs updated (100s of ms)
  • Various IP Fast ReRoute (IP FRR) mechanisms have
    been proposed
  • but true protection is best done at the MPLS
    level
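
The detection times quoted above are simply the hello interval multiplied by the number of missed hellos. A minimal sketch of that arithmetic: the OSPF figures are the ones on the slide, while the BFD interval and multiplier are assumed values chosen only for illustration.

```python
def detection_time(interval_s, missed):
    """Time to declare a neighbour down after `missed` consecutive lost hellos."""
    return interval_s * missed

ospf = detection_time(10.0, 4)   # hello every 10 s, declared dead after 40 s
bfd = detection_time(0.010, 3)   # assumed: 10 ms BFD interval, multiplier of 3
print(f"OSPF hellos: {ospf:.0f} s, BFD: {bfd * 1000:.0f} ms")
```

Detection is only the first term: propagation of the new topology and the FIB update still add to the total outage.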

68
MPLS fast reroute
  • RSVP-TE enables MPLS traffic engineering by fine
    control over placement
  • specifies explicit path using information
    gathered from IGP
  • resources may be reserved at LSRs along the way
  • RFC 4090 defines extensions to RSVP-TE Fast
    ReRoute (FRR)
  • LSRs along the path preconfigure local bypasses
    (detours)
  • Upon detection of failure by
  • BFD (specified in microseconds, typically 10s of
    ms) or
  • RSVP hellos (RFC default is 5 ms) or
  • RESV / PATH messages (driven by IGP)
  • upstream LSR simply enables the detour
  • Since this is a local action, it should be fast
  • RFC 4090 only discusses adding FRR to RSVP-TE
    network
  • but its use with LDP is possible if there is a
    single label generator

not discussed in RFC 4090
69
PLRs and MPs
  • The fundamental entities in MPLS FRR are
  • Point of Local Repair (PLR)
  • Merge Point (MP)
  • A PLR is the LSR before the failed element (link
    or node)
  • All LSRs except the egress LER can be PLRs
  • The PLR is solely responsible for the FRR (no
    explicit APS signaling)
  • During path setup, potential PLRs create detours
    towards the egress LER
  • An MP is the LSR where the detour rejoins the LSP
  • All LSRs except the ingress LER can be MPs

70
Methods
  • RFC 4090 defines two different protection methods
  • Usually one or the other is employed in a given
    network
  • One-to-one backup
  • each LSP protected separately
  • detour LSP created for each LSP at each potential
    PLR
  • no labels pushed
  • Facility backup
  • backup tunnel for multiple LSPs
  • bypass tunnel created at each potential PLR
  • uses label stacking
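
The two methods differ in what the PLR does to the label stack. This is a minimal sketch, assuming the stack is a Python list with the top label first; the function names and label values are hypothetical.

```python
def one_to_one(stack, detour_label):
    """One-to-one backup: swap the top label for the detour LSP's label.
    No label is pushed, so the stack depth is unchanged."""
    return [detour_label] + stack[1:]

def facility(stack, mp_label, bypass_label):
    """Facility backup: swap to the label the merge point expects,
    then push the shared bypass tunnel label on top (label stacking)."""
    return [bypass_label, mp_label] + stack[1:]
```

Because the bypass label is the same for every protected LSP, one bypass tunnel can serve many LSPs, whereas one-to-one backup needs a separate detour LSP for each.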

71
NHOP and NNHOP
  • MPLS FRR can bypass a failed link or a failed
    node
  • In order to bypass a single failed link
  • we need an alternative path to the next hop
    (NHOP)
  • In order to bypass a single failed node, we need
    an alternative path to the next next hop (NNHOP)
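
Finding an NHOP or NNHOP bypass amounts to a path search that excludes the protected link or node. This is a minimal sketch using breadth-first search on a hypothetical six-router topology; a real PLR would use constrained path computation over the TE database rather than plain hop count.

```python
from collections import deque

def bypass(adj, plr, target, bad_link=None, bad_node=None):
    """Shortest path plr -> target that avoids one link or one node (BFS)."""
    queue, seen = deque([[plr]]), {plr}
    while queue:
        path = queue.popleft()
        u = path[-1]
        if u == target:
            return path
        for v in adj[u]:
            if v in seen or v == bad_node:
                continue
            if bad_link and {u, v} == set(bad_link):
                continue
            seen.add(v)
            queue.append(path + [v])
    return None  # no bypass exists

# Hypothetical topology: the protected LSP runs A-B-C-D; E and F are side routers.
adj = {
    "A": ["B", "E"], "B": ["A", "C", "E", "F"], "C": ["B", "D", "F"],
    "D": ["C"], "E": ["A", "B", "F"], "F": ["B", "C", "E"],
}
nhop = bypass(adj, "B", "C", bad_link=("B", "C"))  # link protection at PLR B
nnhop = bypass(adj, "A", "C", bad_node="B")        # node protection at PLR A
```

Here the NHOP bypass ends at B's next hop C, while the NNHOP bypass starts one hop earlier at A and ends at A's next next hop, also C, skipping node B entirely.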

72
MPLS TP APS
  • RFC 6372 (MPLS-TP Survivability Framework)
  • RFC 6378 (MPLS-TP Linear Protection)
  • draft-ietf-mpls-tp-ring-protection

73
MPLS-TP resilience
  • Since it strives to be a carrier-grade transport
    network
  • TP has strong protection switching requirements
  • APS has been almost as contentious an issue as OAM
  • and indeed the arguments are inter-related
  • RFC 6372 gives a general framework
  • and differentiates between
  • linear
  • shared-mesh and
  • ring protection

74
Linear protection
  • from RFC 6378 (ex draft-ietf-mpls-tp-linear-protection)
  • 1+1, 1:1, 1:n and uni/bidi are supported
  • APS signaling protocol (for all modes except 1+1
    uni)
  • is single-phase
  • and called the Protection State Coordination
    protocol
  • PSC messages are sent over the protection
    channel
  • APS messages are sent over the GACh with a
    single channel type
  • message functions identified by a request field
  • 6 states: normal, protecting due to failure,
    admin protecting, WTR, protection path
    unavailable, DNR
  • when revertive, a WTR timer is used

75
PSC message format
  • Request: NR, SF, SD, manual switch, forced
    switch, lockout, WTR, DNR
  • PT (Protection Type): uni 1+1, bidi 1+1, bidi
    1:1/1:n
  • R: Revertive
  • FPath: which path has the fault
  • Path: which data path is on the protection
    channel
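
The fields above can be packed into a single 32-bit word. This is a minimal sketch, assuming the layout Ver(2 bits), Request(4), PT(2), R(1), Reserved(7), FPath(8), Path(8); the slide lists the fields but not their widths, so the layout should be checked against RFC 6378 before use. The function names are illustrative.

```python
import struct

# Assumed bit layout of the first word (verify against RFC 6378):
# Ver(2) | Request(4) | PT(2) | R(1) | Reserved(7) | FPath(8) | Path(8)
def pack_psc(ver, request, pt, revertive, fpath, path):
    """Build the first 4 bytes of a PSC message in network byte order."""
    word = ((ver << 30) | (request << 26) | (pt << 24)
            | (int(revertive) << 23) | (fpath << 8) | path)
    return struct.pack("!I", word)

def unpack_psc(data):
    """Split the first 4 bytes of a PSC message back into its fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "ver": word >> 30,
        "request": (word >> 26) & 0xF,
        "pt": (word >> 24) & 0x3,
        "revertive": bool((word >> 23) & 0x1),
        "fpath": (word >> 8) & 0xFF,
        "path": word & 0xFF,
    }
```

Round-tripping a message through pack_psc and unpack_psc should return the same field values, which is a quick check that the shifts and masks agree.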

76
PSC control logic states
  • Normal state - no trigger events reported
  • Unavailable state - protection path is
    unavailable
  • Protecting failure state
  • traffic is being transported on the protection
    path
  • Protecting administrative state
  • operator issued command switching traffic to
    protection path
  • Wait-to-Restore state - recovering from working
    path SF/SD
  • WTR timer has not yet expired
  • Do-not-Revert state - recovered from a protecting
    state
  • but operator has configured DNR

77
PSC local requests
  • In order from highest to lowest priority
  • 1. Clear (operator command)
  • 2. Lockout of protection (operator command)
  • 3. Forced Switch (operator command)
  • 4. Signal Fail on protection (OAM / control-plane
    / server indication)
  • 5. Signal Fail on working (OAM / control-plane /
    server indication)
  • 6. Signal Degrade on working (OAM / control-plane
    / server indication)
  • 7. Clear Signal Fail/Degrade (OAM / control-plane
    / server indication)
  • 8. Manual Switch (operator command)
  • 9. WTR Expires (WTR timer)
  • 10. No Request (default)
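
When several local requests are active at once, the control logic acts on the highest-priority one. This is a minimal sketch of that selection using the order listed above; the names PRIORITY, RANK and top_request are illustrative.

```python
# Local requests in the order given on the slide, highest priority first.
PRIORITY = [
    "Clear", "Lockout of protection", "Forced Switch",
    "Signal Fail on protection", "Signal Fail on working",
    "Signal Degrade on working", "Clear Signal Fail/Degrade",
    "Manual Switch", "WTR Expires", "No Request",
]
RANK = {name: i for i, name in enumerate(PRIORITY)}

def top_request(active):
    """Return the highest-priority request among those currently active."""
    return min(active, key=RANK.__getitem__, default="No Request")
```

For example, an operator's Manual Switch loses to a Signal Fail on working, so a real failure overrides a routine maintenance command.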

78
Linear protection ITU style
  • from draft-zulr-mpls-tp-linear-protection-switching
  • Similar to previous, but uses Y.1731/G.8031
    format (no surprise!)
79
Ring protection
  • once again there were two drafts, both supporting
    p2p and p2mp, wrapping and steering, link/node
    failures
  • draft-ietf-mpls-tp-ring-protection (not yet RFC)
  • Between any 2 LSRs we can define a Sub-Path
    Maintenance Entity
  • So between 2 LSRs on a ring there are 2 SPMEs
  • we define 1 as the working channel and 1 as the
    protection channel
  • Now we re-use the linear protection mechanisms,
    including the PSC protocol
  • draft-helvoort-mpls-tp-ring-protection-switching
  • Both counter-rotating rings carry working and
    protection traffic
  • The bandwidth on each ring is divided: X BW is
    dedicated to working traffic and Y is dedicated
    to protection traffic
  • The protection bandwidth of one ring is used to
    protect the other ring
  • Each node should have information about the
    sequence of ring nodes
  • MPLS-TP Ring Protection Switching is G.8032-like,
    but forwards non-NR msgs