Resilient Overlay Network - PowerPoint PPT Presentation

Provided by: zheng85

1
Resilient Overlay Network
  • David Andersen, Hari Balakrishnan,
  • Frans Kaashoek, Robert Morris
  • MIT Laboratory for Computer Science
  • http://nms.lcs.mit.edu/ron/
  • 18th ACM Symp. on Operating Systems Principles
    (SOSP) October 2001,
  • Banff, Canada.

2
Outline
  • Introduction
  • Design Goal
  • Design
  • Implementation
  • Evaluation
  • Discussion
  • Conclusion

3
Fault-tolerant networking
[Figure: example network with nodes A, B, C, D]
  • Packet switching and route around failures

4
The Internet
Mom-and-pop ISP
Really big ISP everyone's afraid of
Big ISP
Autonomous System (AS)
Peering
BGP4
Scalability via aggressive aggregation and information hiding
Commercial reality via peering and transit relationships
5
How Robust is Internet Routing?
  • Slow outage detection and recovery
  • Inability to detect badly performing paths
  • Inability to efficiently leverage redundant paths
  • Inability to perform application-specific routing
  • Inability to express sophisticated routing policy

6
Introducing RON
  • Resilient Overlay Networks (RONs)
  • Remedy for some of these problems
  • Rapid detection of, and recovery from, Internet
    path outages and performance degradation
  • Distributed application-layer overlay
  • Nodes cooperate to forward data for each other
  • Exploit redundancy in underlying Internet paths

7
Routing Using Overlays
  • Cooperating end-systems in different routing
    domains can conspire to do better than scalable
    wide-area protocols

Scalable BGP-based IP routing substrate
  • Types of failures
  • Outages: configuration/operational errors,
    backhoes, etc.
  • Performance failures: severe congestion,
    denial-of-service attacks, etc.

8
Design Goals
  • RON nodes can communicate with each other in the
    face of problems with underlying Internet paths
  • Each RON node obtains path metrics through
  • active probing experiments
  • passive observations
  • Nodes exchange information about the quality of
    the paths via a routing protocol
  • Nodes build forwarding tables based on path metrics
  • latency, packet loss rate, available throughput
  • A RON is designed to be limited in size

9
Design Goals (Cont.)
  • Integrate routing and path selection with
    distributed applications more tightly
  • The ability to incorporate application-specific
    notions of what network conditions constitute a
    fault.
  • The ability to consult application-specific
    metrics in selecting paths
  • A variety of uses
  • Provide a framework for the implementation of
    expressive routing policies
  • BGP-4 is incapable of expressing fine-grained
    policies aimed at users or hosts.
  • This lack of precision
  • reduces the set of paths available in the case of
    a failure
  • inhibits innovation in the use of carefully
    targeted policies

10
(No Transcript)
11
RON Design
Nodes in different routing domains (ASes)
Virtual link
RON library
Application-specific routing tables
Policy routing module
Performance Database
12
Software Architecture
application
application
Application-specific routing tables
Policy routing module
13
Software Architecture
  • RON client
  • Each program that communicates with the RON lib
    on a node
  • The overlay network is defined by a single group
    of clients
  • Service-specific routing metrics
  • Conduit
  • An API through which a RON client interacts with
    RON
  • send(pkt, dst, via_ron)
  • recv(pkt, via_ron)

14
Software Architecture
  • Forwarder object
  • Implement basic RON functionality
  • RON router
  • Implements a routing protocol
  • Application can choose which router to use
  • RON membership manager
  • Maintain the list of members of a RON

15
Routing and Path Selection
  • The entry node
  • Encapsulate packet with a RON packet header
  • Path selection
  • tags the packet's RON header with a flow ID
  • support multi-hop routing
  • tie a packet flow to a chosen path
  • The small size of a RON allows it to
  • maintain information for each virtual link
  • (i) latency, (ii) packet loss rate, (iii)
    throughput
  • select the path that best suits the RON client

16
Routing and Path Selection
17
Routing and Path Selection
  • Link-state dissemination
  • The default RON router uses a link-state routing
    protocol to disseminate topology information
    between routers
  • information is sent via the RON forwarding mesh
    itself
  • Thus, the RON routing protocol is itself a RON
    client

18
Routing and Path Selection
  • Path Evaluation and Selection
  • Every RON router implements outage detection
  • uses an active probing mechanism for this.
  • By default, every RON router implements three
    different routing metrics
  • latency-minimizer
  • loss-minimizer
  • TCP throughput-optimizer.

19
Routing and Path Selection
  • Latency-minimizer
  • For any link
  • For a RON path, the overall latency is the sum of
    the individual virtual link latencies
  • loss-minimizer
  • TCP throughput-optimizer
  • Select when estimated throughput improves by 2x
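
The latency-minimizer above can be sketched as follows. The per-link formula did not survive in this transcript, so the exponentially weighted moving average and the smoothing factor ALPHA = 0.9 are assumptions modeled on the description in the RON paper, not something these slides state. The function names are hypothetical.

```python
ALPHA = 0.9  # assumed smoothing factor; not stated on the slide


def update_latency(estimate, sample, alpha=ALPHA):
    """Fold one new RTT sample into a link's running latency estimate (EWMA)."""
    if estimate is None:  # the first sample seeds the estimate
        return sample
    return alpha * estimate + (1 - alpha) * sample


def path_latency(link_latencies):
    """Overall latency of a RON path: the sum of its virtual-link latencies."""
    return sum(link_latencies)
```

With a high smoothing factor, a single slow probe moves the estimate only slightly, so path choices should not flap on one outlier.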

20
Monitoring Virtual Links
  • Both sides get an RTT sample without requiring
    synchronized clocks
  • Parameters
  • PROBE_INTERVAL + random(0, 1/3 PROBE_INTERVAL)
  • PROBE_TIMEOUT
  • OUTAGE_THRESHOLD
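
A minimal sketch of how these three parameters might fit together. The numeric values are hypothetical. The probe spacing is read here as PROBE_INTERVAL plus a random jitter of up to one third of PROBE_INTERVAL. The rule that OUTAGE_THRESHOLD consecutive unanswered probes mark a link as down is an assumption, since the slide lists only the parameter names.

```python
import random

PROBE_INTERVAL = 12.0   # seconds between probes (hypothetical value)
PROBE_TIMEOUT = 3.0     # seconds to wait for a reply (hypothetical value)
OUTAGE_THRESHOLD = 4    # consecutive losses that mark an outage (hypothetical value)


def next_probe_delay(rng=random):
    """Delay before the next probe: PROBE_INTERVAL plus jitter in [0, PROBE_INTERVAL/3]."""
    return PROBE_INTERVAL + rng.uniform(0.0, PROBE_INTERVAL / 3.0)


class LinkMonitor:
    """Tracks consecutive unanswered probes on one virtual link."""

    def __init__(self):
        self.consecutive_losses = 0

    def record_probe(self, rtt):
        """rtt is the measured round-trip time in seconds, or None if no reply arrived."""
        if rtt is None or rtt > PROBE_TIMEOUT:
            self.consecutive_losses += 1
        else:
            self.consecutive_losses = 0

    def is_down(self):
        """True once OUTAGE_THRESHOLD probes in a row have gone unanswered."""
        return self.consecutive_losses >= OUTAGE_THRESHOLD
```

The jitter keeps nodes from probing in lockstep, and a single successful probe resets the loss counter.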

21
Routing and Path Selection
  • Performance database

22
Policy Routing
  • RON allows users or administrators to define the
    types of traffic allowed on particular network
    links.
  • RON separates policy routing into two components
  • classification
  • Entry node classifies packet with a policy tag
  • routing table formation.
  • Router computes a set of forwarding tables for
    each policy
  • Two ways of describing policies
  • Exclusive cliques
  • E.g., only students in CoC are allowed to use
    GT's connection to Internet2
  • General policies
  • BPF-like packet matcher, which returns a policy
  • A list of links that are denied by the policy
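
The two components above (classification, then routing table formation over the links a policy permits) can be sketched as below. The policy names, the toy matcher, and the link names are all hypothetical. A real matcher would be BPF-like, as the slide says.

```python
# Hypothetical policies: each policy tag maps to the set of virtual links it denies.
DENIED_LINKS = {
    "default": set(),
    "commercial": {("mit", "utah"), ("utah", "mit")},
}


def classify(packet):
    """Entry-node classification: return the policy tag for a packet (toy matcher)."""
    if packet.get("src_domain", "").endswith(".com"):
        return "commercial"
    return "default"


def allowed_links(all_links, policy_tag):
    """Input to routing table formation: the links this policy still permits."""
    denied = DENIED_LINKS[policy_tag]
    return [link for link in all_links if link not in denied]
```

A router would then compute one forwarding table per policy tag from the corresponding filtered link list.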

23
Data Forwarding
24
Data Forwarding
25
Bootstrap and Membership Management
  • RON provides two system membership managers
  • static membership mechanism
  • dynamic membership protocol
  • A new node joins through an existing member and
    uses this neighbor to broadcast its existence
    using a flooder
  • The main challenge in the dynamic membership
    protocol is to avoid confusing a path outage to a
    node with that node having left the RON
  • Each node periodically exchanges its peer list
    with others
  • 1-hour timeout
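
A sketch of the peer-list exchange with the 1-hour timeout. The class and method names are hypothetical, and the rule that any received peer list refreshes every peer it names is an assumption about how the exchange works.

```python
MEMBER_TIMEOUT = 3600.0  # seconds; the 1-hour timeout from the slide


class MembershipTable:
    """Remembers when each peer was last named in a peer-list exchange."""

    def __init__(self):
        self.last_heard = {}

    def merge_peer_list(self, peers, now):
        """Refresh every peer named in a received peer list."""
        for peer in peers:
            self.last_heard[peer] = now

    def live_members(self, now):
        """Drop peers not heard about within MEMBER_TIMEOUT; return the rest, sorted."""
        self.last_heard = {
            peer: seen
            for peer, seen in self.last_heard.items()
            if now - seen <= MEMBER_TIMEOUT
        }
        return sorted(self.last_heard)
```

Under this assumption, a node cut off by a direct path outage stays in the table as long as some other member keeps naming it, which is how an outage is kept distinct from a departure.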

26
Implementation
  • Resilient IP Forwarder

27
Generating Routing Tables
Single-hop indirection
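
Single-hop indirection can be sketched as choosing between the direct virtual link and the best path through one intermediate node. This is an illustration using latency as the metric. The function name and the nested-dictionary link table are hypothetical, not the RON implementation.

```python
def best_route(latency, src, dst):
    """Return (next_hop, total_latency) using the direct link or one intermediate.

    latency[a][b] is the estimated latency of virtual link a -> b. A missing
    entry means the link is currently unusable.
    """
    inf = float("inf")
    best_hop, best_cost = dst, latency.get(src, {}).get(dst, inf)
    for mid in latency:
        if mid in (src, dst):
            continue
        cost = latency.get(src, {}).get(mid, inf) + latency.get(mid, {}).get(dst, inf)
        if cost < best_cost:
            best_hop, best_cost = mid, cost
    return best_hop, best_cost
```

If no usable path exists, the function returns the destination with an infinite cost, which a caller can treat as unreachable.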
28
Evaluation
  • N(N-1) different paths in an N-site RON deployment
  • RON1: N = 12, 132 distinct paths
  • RON2: N = 16, 240 distinct paths
  • Raw measurement datasets
  • Probe packets
  • Throughput samples
  • Traceroute results
  • Note: Experiments done with a
    no-Internet2-for-commercial-use policy
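
The path counts for RON1 and RON2 follow from counting ordered (source, destination) pairs. A short check, with a hypothetical function name:

```python
def directed_paths(n_sites):
    """Number of ordered (source, destination) pairs among n_sites nodes."""
    return n_sites * (n_sites - 1)


ron1_paths = directed_paths(12)  # RON1 deployment size from the slide
ron2_paths = directed_paths(16)  # RON2 deployment size from the slide
```

The count is directional because the path from one site to another is measured separately from the reverse path.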

29
RON deployment (19 sites)
[Map labels: to vu.nl, lulea.se, ucl.uk; to kaist.kr, .ve]
Sites: .com (ca), .com (ca), dsl (or), cci (ut), aros (ut), utah.edu,
.com (tx), cmu (pa), dsl (nc), nyu, cornell, cable (ma), cisco (ma), mit,
vu.nl, lulea.se, ucl.uk, kaist.kr, univ-in-venezuela
30
AS view
31
Major Results
  • RON reduced outages by a factor of 5 to 10, and
    routed around all major outages
  • RON takes 18 s (average) to route around a
    failure, and can do so in the face of flooding
    attacks
  • RON successfully routed around bad throughput
    failures, doubling TCP throughput in 5% of all
    samples
  • In 5% of the samples, RON reduced the loss
    probability by 0.05 or more
  • Single-hop route indirection delivers the majority
    of RON's benefits

32
Evaluation: Overcoming Path Outages
[Figure: 30-min average loss rate with RON plotted against 30-min average loss rate on the Internet; 13,000 samples]
RON loss rate never more than 30%
33
An order-of-magnitude fewer failures
30-minute average loss rates
6,825 path hours represented here
12 path hours of essentially complete outage
76 path hours of TCP outage
RON routed around all of these!
One indirection hop provides almost all the benefit!
34
Evaluation: Overhead
  • 50 nodes allows recovery times between 12 and
    25 s
  • growth in total traffic the cost of
    fault tolerance

35
Evaluation: Handling Packet Floods
Flood attack
36
Evaluation: Loss Rate
37
Evaluation: Latency
38
Evaluation: Latency
39
Evaluation: TCP Throughput
40
Evaluation: Why does one hop work?
[Figure: source, target, and R RON nodes, with paths labeled Pi and Ps]
  • a single-intermediate RON path is optimal (for
    latency) given that the direct path is not
    optimal
  • either the direct path, or a single-hop
    intermediate path is the optimal path
  • if

41
Discussion
  • How RONs relate to routing policy
  • Possibility of policy misuse
  • Preventing misuse requires authentication and
    access control (AC)
  • For a small RON, these can be solved at the
    administrative level
  • Scalability
  • Active probes
  • Operation across NATs
  • Naming
  • Two hosts behind NATs
  • Application??

42
Conclusion
  • Improved availability of Internet communication
    paths using small overlays
  • Layered above scalable IP substrate
  • RON provides a set of libraries and programs to
    facilitate this application-specific routing
  • Experimental data suggest that the approach works
  • Over 10x improvement in availability
  • Outage detection and recovery in about 15 seconds
  • Able to route around certain denial-of-service
    attacks

43
  • Thank you!