1
An Overlay Infrastructure for Decentralized
Object Location and Routing
  • Ben Y. Zhao (ravenben@eecs.berkeley.edu)
  • University of California at Berkeley
  • Computer Science Division

2
Peer-based Distributed Computing
  • Cooperative approach to large-scale applications
  • peer-based: available resources scale w/ # of
    participants
  • better than client/server, where limited resources
    limit scalability
  • Large-scale, cooperative applications are coming
  • content distribution networks (e.g. FastForward)
  • large-scale backup / storage utilities
  • leverage peers' storage for higher resiliency /
    availability
  • cooperative web caching
  • application-level multicast
  • video on-demand, streaming movies

3
What Are the Technical Challenges?
  • File system: replicate files for
    resiliency/performance
  • how do you find nearby replicas?
  • how does this scale to millions of users?
    billions of files?

4
Node Membership Changes
  • Nodes join and leave the overlay, or fail
  • data or control state needs to know about
    available resources
  • node membership management is a necessity

5
A Fickle Internet
  • Internet disconnections are not rare
    (UMich TR 98, IMC 02)
  • TCP retransmission is not enough, need
    route-around
  • IP route repair takes too long: IS-IS ≈ 5s, BGP
    ≈ 3-15 mins
  • good end-to-end performance requires fast
    response to faults

6
An Infrastructure Approach
  • First generation of large-scale apps: vertical
    approach
  • Hard problems, difficult to get right
  • instead, solve common challenges once
  • build a single overlay infrastructure at the
    application layer

(diagram: the overlay sits above the application layer of the Internet protocol stack: application, presentation, session, transport, network, link, physical)
7
Personal Research Roadmap
(roadmap diagram: TSpaces, DOLR, modeling of non-stationary datasets; publications at SPAA 02 / TOCS, IPTPS 03, ICNP 03, JSAC 04)
8
Talk Outline
  • Motivation
  • Decentralized object location and routing
  • Resilient routing
  • Tapestry deployment performance
  • Wrap-up

9
What should this infrastructure look like?
Here is one appealing direction.
10
Structured Peer-to-Peer Overlays
  • Node IDs and keys from randomized namespace
    (SHA-1)
  • incremental routing towards destination ID
  • each node has a small set of outgoing routes, e.g.
    prefix routing (see the routing sketch below)
  • log(n) neighbors per node, log(n) hops
    between any node pair

(figure: a message to ABCD is routed through nodes A930, AB5F, ABC0, and ABCE, matching one more prefix digit at each hop)
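To make the prefix-routing bullets concrete, here is a minimal Java sketch (the class and field names are hypothetical, not the actual Tapestry routing code): each hop forwards a message to a neighbor that matches one more digit of the destination ID.

    import java.util.List;
    import java.util.Map;

    // Sketch of incremental prefix routing (hypothetical names, not the real Tapestry API).
    class PrefixRouter {
        final String localId;                            // this node's ID, e.g. "A930"
        // routeTable.get(i) maps the (i+1)-th digit to a neighbor sharing i prefix digits with us
        final List<Map<Character, String>> routeTable;

        PrefixRouter(String localId, List<Map<Character, String>> routeTable) {
            this.localId = localId;
            this.routeTable = routeTable;
        }

        // Next hop for a destination ID of the same length, or null if this node is the destination.
        String nextHop(String destId) {
            int shared = 0;                              // length of prefix shared with the destination
            while (shared < localId.length() && localId.charAt(shared) == destId.charAt(shared))
                shared++;
            if (shared == destId.length()) return null;  // IDs are identical: message has arrived
            return routeTable.get(shared).get(destId.charAt(shared));
        }
    }

With one routing level per digit, each node keeps on the order of log(n) neighbors and any destination is reached in about log(n) hops.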
11
Related Work
  • Unstructured Peer to Peer Approaches
  • Napster, Gnutella, KaZaa
  • probabilistic search (optimized for the hay, not
    the needle)
  • locality-agnostic routing (resulting in high
    network b/w costs)
  • Structured Peer to Peer Overlays
  • the first protocols (2001): Tapestry, Pastry,
    Chord, CAN
  • then: Kademlia, SkipNet, Viceroy, Symphony,
    Koorde, Ulysseus
  • distinction: how to choose your neighbors
  • Tapestry, Pastry: latency-optimized routing mesh
  • distinction: application interface
  • distributed hash table: put(key, data), data =
    get(key)
  • Tapestry: decentralized object location and
    routing (interface sketch below)
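The interface distinction above can be summarized in a small sketch; the method names follow the slide, while the Java interfaces themselves are hypothetical.

    // DHT: the overlay decides where data lives.
    interface DistributedHashTable {
        void put(byte[] key, byte[] data);
        byte[] get(byte[] key);
    }

    // DOLR: the application keeps its replicas; the overlay only keeps pointers to them.
    interface DecentralizedObjectLocation {
        void publish(byte[] objectId);                    // announce a locally stored replica
        void routeToObj(byte[] objectId, byte[] msg);     // deliver msg to some nearby replica
    }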

12
Defining the Requirements
  • efficient routing to nodes and data
  • low routing stretch (ratio of latency to
    shortest path distance)
  • flexible data location
  • applications want/need to control data placement
  • allows for application-specific performance
    optimizations
  • directory interface: publish(ObjID),
    routeToObj(ObjID, msg)
  • resilient and responsive to faults
  • more than just retransmission, route around
    failures
  • reduce negative impact (loss/jitter) on the
    application

13
Decentralized Object Location and Routing
(figure: object k is published with publish(k); clients locate it with routeobj(k) messages)
  • redirect data traffic using log(n) in-network
    redirection pointers (pointer sketch below)
  • average # of pointers/machine = log(n) × avg #
    files/machine
  • keys to performance
  • proximity-enabled routing mesh with routing
    convergence
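A rough sketch of the pointer mechanism, reusing the PrefixRouter sketch from earlier and a hypothetical send() helper: a publish walks from the replica holder toward the object's root and leaves a pointer at every hop, and a routeobj request diverts to the replica at the first pointer it meets.

    import java.util.HashMap;
    import java.util.Map;

    // Single-node view of DOLR publication and location (simplified sketch).
    class DolrNode {
        final Map<String, String> pointers = new HashMap<>();  // objectId -> node holding a replica
        final PrefixRouter router;                              // from the earlier routing sketch

        DolrNode(PrefixRouter router) { this.router = router; }

        // Invoked at each hop on the publish path from the replica holder toward the object's root.
        void onPublish(String objectId, String replicaHolder) {
            pointers.put(objectId, replicaHolder);              // leave a redirection pointer
            String next = router.nextHop(objectId);
            if (next != null) send(next, "PUBLISH", objectId, replicaHolder);
        }

        // Invoked at each hop on a routeObj path from a client toward the object's root.
        void onRouteToObj(String objectId, byte[] msg) {
            String holder = pointers.get(objectId);
            if (holder != null) { send(holder, "DELIVER", objectId, msg); return; }  // cut over to replica
            String next = router.nextHop(objectId);
            if (next != null) send(next, "ROUTE", objectId, msg);
        }

        void send(String node, String type, String objectId, Object payload) { /* overlay link send */ }
    }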

14
Why Proximity Routing?
  • Fewer/shorter IP hops: shorter e2e latency, less
    bandwidth/congestion, less likely to cross
    broken/lossy links

15
Performance Impact (Proximity)
  • Simulated Tapestry w/ and w/o proximity on 5000
    node transit-stub network
  • Measure pair-wise routing stretch between 200
    random nodes

16
DOLR vs. Distributed Hash Table
  • DHT: hash(content) → name → replica placement
  • modifications → replicating new version into DHT
  • DOLR: app places copy near requests, overlay
    routes msgs to it

17
Performance Impact (DOLR)
  • simulated Tapestry w/ DOLR and DHT interfaces on a
    5000 node transit-stub network
  • measure route-to-object latency from clients in 2
    stub networks
  • DHT: 5 object replicas; DOLR: 1 replica
    placed in each stub network

18
Talk Outline
  • Motivation
  • Decentralized object location and routing
  • Resilient and responsive routing
  • Tapestry deployment performance
  • Wrap-up

19
How do you get fast responses to faults?
Response time = fault detection + alternate-path
discovery + time to switch
20
Fast Response via Static Resiliency
  • Reducing fault-detection time
  • monitor paths to neighbors with periodic UDP
    probes
  • O(log(n)) neighbors: higher frequency w/ low
    bandwidth
  • exponentially weighted moving average for link
    quality estimation
  • avoid route flapping due to short term loss
    artifacts
  • loss rate: Ln = (1 - α) × Ln-1 + α × p (estimator
    sketch below)
  • Eliminate synchronous backup path discovery
  • actively maintain redundant paths, redirect
    traffic immediately
  • repair redundancy asynchronously
  • create and store backups at node insertion
  • restore redundancy via random pair-wise queries
    after failures
  • End result
  • fast detection + precomputed paths = increased
    responsiveness
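A minimal sketch of the loss estimator described by the formula above, with α as the filter constant and p the outcome of the latest UDP probe (the class name is hypothetical, not the Tapestry code):

    // EWMA link-quality estimator: Ln = (1 - alpha) * Ln-1 + alpha * p
    class LinkQuality {
        private final double alpha;       // filter constant, dampens short-term loss artifacts
        private double lossRate = 0.0;    // current estimate Ln

        LinkQuality(double alpha) { this.alpha = alpha; }

        // p = 0.0 if the periodic UDP probe was answered, 1.0 if it was lost
        void recordProbe(double p) { lossRate = (1 - alpha) * lossRate + alpha * p; }

        double lossRate() { return lossRate; }
        boolean usable(double maxLoss) { return lossRate <= maxLoss; }
    }

With a small α, a single lost probe barely moves the estimate, which is what avoids route flapping on short-term loss artifacts.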

21
Routing Policies
  • Use estimated overlay link quality to choose
    shortest usable link
  • Use shortest overlay link with minimal quality >
    T
  • Alternative policies
  • prioritize low loss over latency
  • use least lossy overlay link
  • use path w/ minimal cost function: cf = x ×
    latency + y × loss rate (policy sketch below)
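Both policies can be sketched over per-link latency and loss estimates; the Link type and method names below are hypothetical, and T, x, y are tunable constants.

    import java.util.List;

    // Sketch of next-hop selection policies over estimated overlay link quality.
    class RoutingPolicy {
        static class Link { String node; double latencyMs; double lossRate; }

        // Default policy: shortest overlay link whose estimated loss rate clears the threshold.
        static Link shortestUsable(List<Link> candidates, double maxLoss) {
            Link best = null;
            for (Link l : candidates)
                if (l.lossRate <= maxLoss && (best == null || l.latencyMs < best.latencyMs))
                    best = l;
            return best;                                   // null if no link is currently usable
        }

        // Alternative policy: minimize cf = x * latency + y * lossRate.
        static Link minCost(List<Link> candidates, double x, double y) {
            Link best = null;
            double bestCost = Double.POSITIVE_INFINITY;
            for (Link l : candidates) {
                double cf = x * l.latencyMs + y * l.lossRate;
                if (cf < bestCost) { bestCost = cf; best = l; }
            }
            return best;
        }
    }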

22
Talk Outline
  • Motivation
  • Decentralized object location and routing
  • Resilient and responsive routing
  • Tapestry deployment performance
  • Wrap-up

23
Tapestry, a DOLR Protocol
  • Routing based on incremental prefix matching
  • Latency-optimized routing mesh
  • nearest neighbor algorithm (HKRZ02)
  • supports massive failures and large group joins
  • Built-in redundant overlay links
  • 2 backup links maintained w/ each primary
  • Use objects as endpoints for rendezvous
  • nodes publish names to announce their presence
  • e.g. wireless proxy publishes nearby laptop's ID
  • e.g. multicast listeners publish multicast
    session name to self organize

24
Weaving a Tapestry
  • inserting node (0123) into network
  • route to own ID, find 012X nodes, fill last
    column
  • request backpointers to 01XX nodes
  • measure distance, add to rTable
  • prune to nearest K nodes
  • repeat steps 2-4 (insertion sketch below)

(diagram: new node 0123 joining the existing Tapestry)
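A high-level sketch of this insertion loop (the Overlay helper interface and the value of K are assumptions; the real protocol also handles concurrency and acknowledgments):

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    // Iterative nearest-neighbor fill during node insertion (sketch only).
    class NodeInsertion {
        static final int K = 3;   // keep the K closest candidates per level (value is an assumption)

        void insert(String newId, Overlay overlay) {
            // Step 1: route to our own ID and collect the nodes sharing the longest prefix (e.g. 012X).
            Set<String> candidates = overlay.routeToAndCollect(newId);
            for (int level = newId.length() - 1; level >= 0; level--) {
                // Step 2: ask current candidates for their backpointers at this prefix level (e.g. 01XX).
                Set<String> wider = new HashSet<>(candidates);
                for (String node : candidates)
                    wider.addAll(overlay.backpointers(node, level));
                // Steps 3-4: measure distance, keep only the nearest K, and fill the routing table.
                List<String> byDistance = new ArrayList<>(wider);
                byDistance.sort(Comparator.comparingDouble(overlay::ping));
                candidates = new HashSet<>(byDistance.subList(0, Math.min(K, byDistance.size())));
                overlay.fillRoutingTable(level, candidates);
            }
        }

        interface Overlay {
            Set<String> routeToAndCollect(String id);
            Set<String> backpointers(String node, int level);
            double ping(String node);
            void fillRoutingTable(int level, Set<String> nodes);
        }
    }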
25
Implementation Performance
  • Java implementation
  • 35000 lines in core Tapestry, 1500 downloads
  • Micro-benchmarks
  • per msg overhead ≈ 50 µs, most latency from byte
    copying
  • performance scales w/ CPU speedup
  • 5KB msgs on P-IV 2.4GHz: throughput ≈ 10,000
    msgs/sec
  • Routing stretch
  • route to node: < 2
  • route to objects/endpoints: < 3 (higher stretch
    for close-by objects)

26
Responsiveness to Faults (PlanetLab)
(plot: responsiveness to link failures on PlanetLab, shown for parameter values 0.2 and 0.4)
  • B/W grows slowly w/ network size N: N = 300 → 7
    KB/s/node, N = 10^6 → 20 KB/s
  • sim: if link failure rate < 10%, can route around
    90% of survivable failures

27
Stability Under Membership Changes
  • Routing operations on 40 node Tapestry cluster
  • Churn: nodes join/leave every 10 seconds, average
    lifetime 2 mins

28
Talk Outline
  • Motivation
  • Decentralized object location and routing
  • Resilient and responsive routing
  • Tapestry deployment performance
  • Wrap-up

29
Lessons and Takeaways
  • Consider system constraints in algorithm design
  • limited by finite resources (e.g. file
    descriptors, bandwidth)
  • simplicity wins over small performance gains
  • easier adoption and faster time to implementation
  • Wide-area state management (e.g. routing state)
  • reactive algorithm for best-effort, fast response
  • proactive periodic maintenance for correctness
  • Naïve event programming model is too low-level
  • much code complexity from managing stack state
  • important for protocols with asynchronous control
    algorithms
  • need explicit thread support for callbacks /
    stack management

30
Future Directions
  • Ongoing work to explore p2p application space
  • resilient anonymous routing, attack resiliency
  • Intelligent overlay construction
  • router-level listeners allow application queries
  • efficient meshes, fault-independent backup links,
    failure notification
  • Deploying and measuring a lightweight peer-based
    application
  • focus on usability and low overhead
  • p2p incentives, security, deployment meet the
    real world
  • A holistic approach to overlay security and
    control
  • p2p good for self-organization, not for security/
    management
  • decouple administration from normal operation
  • explicit domains / hierarchy for configuration,
    analysis, control

31
Thanks!
Questions, comments? ravenben@eecs.berkeley.edu
32
Impact of Correlated Events
(diagram: event handler)
  • correlated requests A → B → C → D
  • e.g. online continuous queries, sensor
    aggregation, p2p control layer, streaming data
    mining
  • web / application servers
  • independent requests
  • maximize individual throughput

33
Some Details
  • Simple fault detection techniques
  • periodically probe overlay links to neighbors
  • exponentially weighted moving average for link
    quality estimation
  • avoid route flapping due to short term loss
    artifacts
  • loss rate: Ln = (1 - α) × Ln-1 + α × p
  • p = instantaneous loss rate, α = filter constant
  • other techniques topics of open research
  • How do we get and repair the backup links?
  • each hop has flexible routing constraint
  • e.g. in prefix routing, 1st hop just requires 1
    fixed digit
  • backups always available until last hop to
    destination
  • create and store backups at node insertion
  • restore redundancy via random pair-wise queries
    after failures
  • e.g. to replace a 123X neighbor, talk to local 12XX
    neighbors (repair sketch below)
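A minimal sketch of that pair-wise repair step (the RoutingTable interface and the RPC stub are assumptions): to refill a hole at one prefix level, ask neighbors sharing one digit less for their entries at that level.

    import java.util.Collections;
    import java.util.List;
    import java.util.Set;

    // Asynchronous restoration of backup links via pair-wise queries (sketch only).
    class BackupRepair {
        static final int REDUNDANCY = 3;   // one primary plus 2 backups per entry (earlier slide)

        // Replace a failed entry at prefix level `level`, digit `digit` (e.g. a lost 123X neighbor).
        void repair(RoutingTable table, int level, char digit) {
            // Our level-(l-1) neighbors (e.g. the 12XX nodes) are likely to know other 123X nodes.
            for (String neighbor : table.entriesAtLevel(level - 1)) {
                for (String candidate : queryEntries(neighbor, level, digit))
                    table.add(level, digit, candidate);
                if (table.count(level, digit) >= REDUNDANCY) return;   // redundancy restored
            }
        }

        // Remote lookup on a neighbor (assumed RPC; returns its routing entries for level/digit).
        Set<String> queryEntries(String neighbor, int level, char digit) { return Collections.emptySet(); }

        interface RoutingTable {
            List<String> entriesAtLevel(int level);
            void add(int level, char digit, String node);
            int count(int level, char digit);
        }
    }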

34
Route Redundancy (Simulator)
  • Simulation of Tapestry, 2 backup paths per
    routing entry
  • 2 backups: low maintenance overhead, good
    resiliency

35
Another Perspective on Reachability
(chart: portions of all pair-wise paths, broken down as follows)
  • no failure-free path remains
  • a path exists, but neither IP nor FRLS can locate
    it
  • both IP and FRLS route successfully
  • FRLS finds a path where short-term IP routing
    fails
36
Single Node Software Architecture
37
Related Work
  • Unstructured Peer to Peer Applications
  • Napster, Gnutella, KaZaa
  • probabilistic search, difficult to scale,
    inefficient b/w
  • Structured Peer to Peer Overlays
  • Chord, CAN, Pastry, Kademlia, SkipNet, Viceroy,
    Symphony, Koorde, Coral, Ulysseus,
  • routing efficiency
  • application interface
  • Resilient routing
  • traffic redirection layers
  • Detour, Resilient Overlay Networks (RON),
    Internet Indirection Infrastructure (I3)
  • our goals: scalability, in-network traffic
    redirection

38
Node to Node Routing (PlanetLab)
Median: 31.5, 90th percentile: 135
  • Ratio of end-to-end latency to ping distance
    between nodes
  • All node pairs measured, placed into buckets

39
Object Location (PlanetLab)
90th percentile: 158
  • Ratio of end-to-end latency to client-object ping
    distance
  • Local-area stretch improved w/ additional
    location state

40
Micro-benchmark Results (LAN)
  • Per msg overhead ≈ 50 µs, latency dominated by
    byte copying
  • Performance scales with CPU speedup
  • For 5KB messages, throughput ≈ 10,000 msgs/sec

41
Traffic Tunneling
(diagram: legacy nodes A and B, each attached to a proxy, communicate across the structured peer-to-peer overlay; A and B are IP addresses, P(B) is the overlay ID of B's proxy)
  • Store mapping from end host IP to its proxy's
    overlay ID (registry sketch below)
  • Similar to approach in Internet Indirection
    Infrastructure (I3)
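A tiny sketch of the mapping step (hypothetical names), assuming the put/get-style DistributedHashTable interface from the earlier sketch rather than any specific Tapestry mechanism: B's proxy stores its overlay ID P(B) under a key derived from B's IP address, so another proxy can later look up where to tunnel traffic for B.

    import java.nio.charset.StandardCharsets;

    // Register and resolve legacy-node -> proxy mappings over the overlay (sketch only).
    class TunnelRegistry {
        private final DistributedHashTable dht;   // interface from the earlier sketch

        TunnelRegistry(DistributedHashTable dht) { this.dht = dht; }

        // Called by B's proxy: store its own overlay ID P(B) under B's IP address.
        void register(String legacyIp, byte[] proxyOverlayId) {
            dht.put(legacyIp.getBytes(StandardCharsets.UTF_8), proxyOverlayId);
        }

        // Called by A's proxy when A sends traffic to B: find the overlay ID to tunnel through.
        byte[] resolveProxy(String legacyIp) {
            return dht.get(legacyIp.getBytes(StandardCharsets.UTF_8));
        }
    }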

42
Constrained Multicast
  • Used only when all paths are below quality
    threshold
  • Send duplicate messages on multiple paths
  • Leverage route convergence
  • Assign unique message IDs
  • Mark duplicates
  • Keep moving window of IDs
  • Recognize and drop duplicates (filter sketch below)
  • Limitations
  • Assumes loss not from congestion
  • Ideal for local area routing

(diagram: constrained multicast example over nodes 2225, 2299, 2274, 2286, 2046, 2281, 2530, and 1111)
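A minimal sketch of the duplicate-suppression step from the list above (the class is hypothetical and the window size is left as a parameter):

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashSet;
    import java.util.Set;

    // Moving window of recently seen message IDs, used to recognize and drop duplicates.
    class DuplicateFilter {
        private final int windowSize;
        private final Deque<Long> window = new ArrayDeque<>();  // IDs in arrival order
        private final Set<Long> seen = new HashSet<>();

        DuplicateFilter(int windowSize) { this.windowSize = windowSize; }

        // Returns true if msgId was already seen inside the window, i.e. the copy should be dropped.
        boolean isDuplicate(long msgId) {
            if (seen.contains(msgId)) return true;
            window.addLast(msgId);
            seen.add(msgId);
            if (window.size() > windowSize) seen.remove(window.removeFirst());
            return false;
        }
    }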
43
Link Probing Bandwidth (PL)
  • Bandwidth increases logarithmically with overlay
    size
  • Medium sized routing overlays incur low probing
    bandwidth

44
Control Plane vs. Data Plane
  • impact varies with application domain
  • control plane
  • use overlay as a lookup service
  • minimize performance impact
  • requires more end-host intervention
  • example: Internet Indirection Infrastructure
  • do extra work to locate nearby server, amortize
    cost over time
  • data plane
  • leverage overlay for data traffic
  • efficient overlay routing is critical
  • build additional logic into overlay hops
  • examples: routing for resilience, anonymity
  • efficiency is always desirable; the question is
    who provides it