Title: Failure Inferencing based Fast Rerouting for Handling Transient Link and Node Failures
1. Failure Inferencing based Fast Rerouting for Handling Transient Link and Node Failures
- Zifei Zhong
- University of South Carolina, Columbia
- Joint work with
- Srihari Nelakuditi, Junling Wang
- University of South Carolina, Columbia
- Sanghwan Lee, Yinzhe Yu
- University of Minnesota, Minneapolis
- Chen-Nee Chuah
- University of California, Davis
2. Outline
- Motivation
- Wired Network Failure Characteristics
- Service Availability Expectations
- Limitations of Current Routing Schemes
- Failure Inferencing based Fast Rerouting (FIFR)
- FIFRL for single link failures (IWQoS'03, Infocom'04)
- FIFRN for single link and node failures
- FIFRN Evaluation Summary
3. Failure Characteristics
- Study by Sprintlabs on failures in an IP backbone (Infocom'04)
- Failures are fairly common, occur almost every day
- Maintenance, faulty interfaces, router crashes, fiber cuts, misconfigurations
- Majority of the unplanned failures are transient
- 46% < 1 minute, 86% < 10 minutes
- Majority of them are single link/node failures
- 85% of unplanned failures affect a single link/router
- Focus: transient single link and node failures
4. Availability Expectations
- Increased expectation from the Internet
- Wish for a "five nines" IP network
- Disruption-sensitive applications
- Voice over IP
- Break > 60 ms in voice traffic is noticeable
- E-commerce
- Even small downtime affects business and reputation
- High capacity links
- Short outage, large impact
- Link down for 10 seconds -> 3 million packets lost
- Goal: high availability despite transient failures
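The figures on this slide can be sanity-checked with a little arithmetic. The sketch below assumes an OC-48 link (2488.32 Mbps line rate) running at full load and a 1000-byte average packet size; neither value is stated on the slide, so both are illustrative assumptions, as are the function names.

```python
# Back-of-the-envelope numbers for the availability slide.
# ASSUMPTIONS (not from the slides): OC-48 line rate, fully loaded link,
# 1000-byte average packet size.

MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability):
    """Yearly downtime budget, in minutes, for an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def packets_lost(link_bps, outage_seconds, avg_packet_bytes):
    """Packets a fully loaded link would have carried during an outage."""
    return link_bps * outage_seconds / (avg_packet_bytes * 8)


OC48_BPS = 2_488_320_000   # assumed link speed
AVG_PACKET_BYTES = 1000    # assumed average packet size

five_nines_budget = allowed_downtime_minutes(0.99999)
lost = packets_lost(OC48_BPS, 10, AVG_PACKET_BYTES)

print(f"five nines allows about {five_nines_budget:.2f} minutes of downtime per year")
print(f"a 10 s outage loses about {lost / 1e6:.1f} million packets")
```

Under these assumptions a 10-second outage costs roughly 3 million packets, while a five nines target leaves only about five minutes of total downtime per year.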
5. Current Schemes for Failure Resiliency
- Traditional link state routing protocols (OSPF/ISIS)
- React to failures with global rerouting
- Global link state updates and routing table recomputations
- Trade-off between stability and continuity
- Convergence delay to resume forwarding after a link failure
- MPLS based recovery
- Local rerouting along a preconfigured LSP
- Label stacking enables tunneling of affected LSPs through a protection LSP
- Requires a shift to label switching paradigm
- Label swapping instead of destination based forwarding
6. Failure Inferencing based Fast Rerouting
- Provides fast local loop-free rerouting
- Without explicit failure notification
- Prepares for failures
- Failure inferencing
- Infer failures based on a packet's incoming interface
- Interface-specific forwarding
- Next hop based on destination and incoming interface
- Local rerouting upon adjacent link/node failures
- Suppress link state update and trigger local rerouting
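Interface-specific forwarding can be pictured as a table keyed by the pair (incoming interface, destination) rather than by destination alone. The sketch below is a hypothetical fragment of node 1's table for destination 6. The two next hops come from the routes in the illustration slides (2 in the no-failure case, 4 under FIFRN when the packet comes back from node 2); the names `FORWARDING_AT_NODE_1` and `lookup`, and the use of `None` for locally originated packets, are invented for illustration.

```python
# Hypothetical fragment of node 1's interface-specific forwarding table.
# Key: (neighbor the packet arrived from, destination); None = originated locally.
# Value: next hop. Entries follow the routes shown in the illustration slides.
FORWARDING_AT_NODE_1 = {
    (None, 6): 2,  # no failure inferred: usual shortest path 1->2->5->6
    (2, 6): 4,     # packet came back from 2: infer a failure, go around via 4
}


def lookup(table, incoming, destination):
    """Return the precomputed next hop, or None if there is no entry."""
    return table.get((incoming, destination))


print(lookup(FORWARDING_AT_NODE_1, None, 6))
print(lookup(FORWARDING_AT_NODE_1, 2, 6))
```

The point of the design is that the failure is never signalled explicitly: the unusual arrival interface alone selects a different, precomputed next hop.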
7. Illustration: No Failure Scenario
Route (1 to 6): 1->2->5->6
8. Illustration: Local Rerouting without FIFR
Route (1 to 6): 1->2->1->2->... (loop!!)
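The loop on this slide can be reproduced in a few lines. This is a minimal sketch, not the authors' simulator: the 6-node topology and link weights are hypothetical (the slide figure is not available) and were chosen so that the no-failure route is 1->2->5->6. Each node recomputes shortest paths using only the failures on its own links, which is local rerouting without any failure inferencing.

```python
import heapq

# HYPOTHETICAL topology (undirected, symmetric weights); not taken from the slides.
EDGES = {(1, 2): 1, (2, 5): 1, (5, 6): 1, (1, 3): 2, (3, 5): 2, (1, 4): 3, (4, 6): 3}
INF = float("inf")


def neighbors(node, down=frozenset()):
    """Yield (neighbor, weight) for links of `node` that are not in `down`."""
    for (a, b), w in EDGES.items():
        if frozenset((a, b)) in down:
            continue
        if a == node:
            yield b, w
        elif b == node:
            yield a, w


def distances_to(dst, down=frozenset()):
    """Dijkstra from the destination; valid because weights are symmetric."""
    dist = {dst: 0}
    heap = [(0, dst)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue
        for nbr, w in neighbors(node, down):
            if d + w < dist.get(nbr, INF):
                dist[nbr] = d + w
                heapq.heappush(heap, (d + w, nbr))
    return dist


def next_hop(node, dst, failed):
    """Next hop chosen by `node`, which only sees failures on its own links."""
    known = frozenset(link for link in failed if node in link)
    dist = distances_to(dst, known)
    options = [(w + dist.get(nbr, INF), nbr) for nbr, w in neighbors(node, known)]
    cost, hop = min(options, default=(INF, None))
    return hop if cost < INF else None


def trace(src, dst, failed=frozenset(), max_hops=6):
    """Follow the packet hop by hop, giving up after `max_hops` hops."""
    path = [src]
    while path[-1] != dst and len(path) <= max_hops:
        hop = next_hop(path[-1], dst, failed)
        if hop is None:
            break
        path.append(hop)
    return path


print(trace(1, 6))                                   # no failure
print(trace(1, 6, failed={frozenset((2, 5))}))       # link 2-5 down
```

With link 2-5 down, node 2 sends the packet back to node 1, and node 1, unaware of the failure, sends it to node 2 again, so the trace never reaches node 6.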
9. Illustration: Local Rerouting with FIFRL
Route (1 to 6): 1->2->1->3->5->6
10. Illustration: FIFRL with Node Failures
Route (1 to 6): 1->2->1->3->1->... (loop!!)
11. Inferencing for Node Failures
- Infer node failures instead of link failures?
- Yes, but how?
- Can it still guarantee loop-freedom?
- Yes.
12. Illustration: Local Rerouting with FIFRN
Route (1 to 6): 1->2->1->4->6
13. Forwarding Table Computation
- Assumptions
- Links are bidirectional with equal weight
- At most a single node failure is suppressed
- Infer failed nodes from a packet's arrival at an interface
- Key nodes: set of nodes whose failure causes a packet to d to arrive at i from j
- A node u is included in the set of key nodes if
- with u, j is a next hop from i to d
- without u, edge j->i is along the shortest path from the upstream of u to d
- Avoid all key nodes in choosing the packet's next hop
- Next hops: set of next hops to d from i when the packet arrives at i from j
- Forwarding tables are pre-computed
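The key-node rule above can be sketched in code. This is one reading of the two conditions on the slide, not the authors' implementation: "upstream of u" is taken to mean a neighbor v of u whose normal next hop to d is u, shortest paths are assumed unique (ties are broken arbitrarily by Dijkstra), and the topology and weights are hypothetical. All function names are invented for illustration.

```python
import heapq

# HYPOTHETICAL topology (undirected, symmetric weights); not taken from the slides.
EDGES = {(1, 2): 1, (2, 5): 1, (5, 6): 1, (1, 3): 2, (3, 5): 2, (1, 4): 3, (4, 6): 3}


def build_graph(edges, removed=frozenset()):
    """Adjacency map of the topology with the `removed` nodes taken out."""
    graph = {}
    for (a, b), w in edges.items():
        if a in removed or b in removed:
            continue
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def shortest_path(graph, src, dst):
    """Dijkstra; returns the node list from src to dst, or None if unreachable."""
    if src not in graph or dst not in graph:
        return None
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue
        for nbr, w in graph[node].items():
            if d + w < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = d + w, node
                heapq.heappush(heap, (d + w, nbr))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def next_hop(graph, src, dst):
    path = shortest_path(graph, src, dst)
    return path[1] if path and len(path) > 1 else None


def key_nodes(edges, i, j, d):
    """Nodes whose failure would make a packet to d arrive at i from j."""
    full = build_graph(edges)
    if next_hop(full, i, d) != j:          # condition 1: with u, j is i's next hop
        return set()
    keys = set()
    for u in full:
        if u in (i, j, d):
            continue
        without_u = build_graph(edges, removed={u})
        for v in full[u]:                  # v is upstream of u if it forwards to u
            if next_hop(full, v, d) != u:
                continue
            path = shortest_path(without_u, v, d)
            if path and (j, i) in zip(path, path[1:]):   # condition 2
                keys.add(u)
                break
    return keys


def fifrn_next_hop(edges, i, j, d):
    """Next hop at i for a packet to d that arrived from j, avoiding key nodes."""
    return next_hop(build_graph(edges, removed=key_nodes(edges, i, j, d)), i, d)


print(key_nodes(EDGES, 1, 2, 6), fifrn_next_hop(EDGES, 1, 2, 6))
```

On this hypothetical topology, a packet for node 6 arriving at node 1 from node 2 yields node 5 as the only key node, so node 1 forwards to node 4, which agrees with the FIFRN route 1->2->1->4->6 in the illustration. Since the computation depends only on the topology, every (incoming interface, destination) entry can be filled in ahead of time.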
14. Illustration: Key Nodes Computation
15. Loop-freedom of FIFRN
- Forwarding around the failed node
- When no more than one node failure is suppressed, FIFRN can find a loop-free path to a destination if one such path exists
16. Performance Evaluation
- Stretch
- Stretch vs. nodes
Varying number of nodes with degree 6
17. Performance Evaluation
Varying node degree with 200 nodes
18. FIFRN Summary
- Fast reroute for IP networks without MPLS
- Without explicit failure notification
- Protection against any single node failure
- Can actually handle both single link and node failures
- Better stability and availability than OSPF
- Particularly when failures are frequent and transient
- Minimal changes to current network infrastructure
- Only need to replace SPF algorithm with FIFR algorithm
19. Questions, please!