Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems
(Presentation transcript)
1
Pastry: Scalable, decentralized object location
and routing for large-scale peer-to-peer systems
  • Peter Druschel, Rice University
  • Antony Rowstron, Microsoft Research Cambridge, UK
  • Modified and presented by JinYoung You

2
Outline
  • Background
  • Pastry
  • Pastry proximity routing
  • Related Work
  • Conclusions

3
Background
  • Peer-to-peer systems
  • distribution
  • decentralized control
  • self-organization
  • symmetry (communication, node roles)

4
Common issues
  • Organize, maintain overlay network
  • node arrivals
  • node failures
  • Resource allocation/load balancing
  • Resource location
  • Network proximity routing
  • Idea: provide a generic p2p substrate

5
Architecture
[Diagram: layered architecture, top to bottom]
  • p2p application layer: event notification, network storage, ...
  • p2p substrate (self-organizing overlay network): Pastry
  • Internet: TCP/IP
6
Structured p2p overlays
  • One primitive
  • route(M, X): route message M to the live node
    with nodeID closest to key X
  • nodeIDs and keys are drawn from a large, sparse id
    space

7
Distributed Hash Tables (DHT)
[Diagram: keys (k1,v1) ... (k6,v6) mapped onto the nodes of the P2P overlay network]
Operations: insert(k,v), lookup(k)
  • p2p overlay maps keys to nodes
  • completely decentralized and self-organizing
  • robust, scalable
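To make the two operations concrete, here is a minimal Java sketch of a key-value layer built on an overlay that only exposes route(msg, key); the Overlay interface, the MD5-based key hashing, and the message encoding are illustrative assumptions, not part of the original system.

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    // Illustrative only: a key-value layer on top of an overlay route() primitive.
    interface Overlay {
        // deliver msg to the live node whose nodeID is closest to key
        void route(byte[] msg, BigInteger key);
    }

    class SimpleDht {
        private final Overlay overlay;

        SimpleDht(Overlay overlay) { this.overlay = overlay; }

        // insert(k, v): route the value toward the key's position in the id space;
        // the responsible node stores it on delivery
        void insert(String k, byte[] v) throws Exception {
            overlay.route(v, toId(k));
        }

        // lookup(k): route a request toward the same position; the responsible
        // node answers with its stored value
        void lookup(String k) throws Exception {
            overlay.route("LOOKUP".getBytes(StandardCharsets.UTF_8), toId(k));
        }

        // Hash an application key into a 128-bit id (MD5 is used here only
        // because it yields 128 bits; the real system's choice may differ)
        private static BigInteger toId(String k) throws Exception {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(k.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        }
    }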

8
Why structured p2p overlays?
  • Leverage pooled resources (storage, bandwidth,
    CPU)
  • Leverage resource diversity (geographic,
    ownership)
  • Leverage existing shared infrastructure
  • Scalability
  • Robustness
  • Self-organization

9
Outline
  • Background
  • Pastry
  • Pastry proximity routing
  • Related Work
  • Conclusions

10
Pastry
  • Generic p2p location and routing substrate
  • Self-organizing overlay network
  • Lookup/insert object in < log16 N routing steps
    (expected)
  • O(log N) per-node state
  • Network proximity routing

11
Pastry: Object distribution
  • Consistent hashing [Karger et al. '97]
  • 128-bit circular id space
  • nodeIDs (uniform random)
  • objIDs (uniform random)
  • Invariant: the node with the numerically closest nodeID
    maintains the object

[Diagram: circular id space from 0 to 2^128 - 1, with objIDs and nodeIDs placed on the ring]
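As a hedged illustration of "numerically closest" on the circular id space above, the helper below computes the distance between two ids with wrap-around at 2^128; it is a sketch, not code taken from Pastry.

    import java.math.BigInteger;

    // Illustrative: distance between two ids on a circular 128-bit id space.
    class IdSpace {
        static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(128); // 2^128

        static BigInteger circularDistance(BigInteger a, BigInteger b) {
            BigInteger d = a.subtract(b).abs();
            return d.min(RING_SIZE.subtract(d)); // take the shorter way around the ring
        }
    }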
12
Pastry: Object insertion/lookup
[Diagram: message with key X routed across the circular id space via Route(X)]
A message with key X is routed to the live node with nodeID
closest to X. Problem: a complete routing table per node is
not feasible.
13
Pastry: Routing
  • Tradeoff:
  • O(log N) routing table size
  • O(log N) message forwarding steps

14
Pastry: Routing table of node 65a1fcx
[Diagram: routing table rows 0-3 shown; log16 N rows populated in total, one row per length of shared nodeID prefix]
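A minimal sketch of the table shape this slide implies for base-16 digits: one row per length of shared nodeID prefix, one column per possible next digit. The class and field names are assumptions for illustration.

    // Illustrative routing table layout for 128-bit ids written as 32 hex digits.
    // Row r holds nodes whose IDs share the first r digits with the local node
    // but differ in digit r; the column index is that differing digit's value.
    // Only about log16 N rows are populated in practice.
    class RoutingTable {
        static final int ROWS = 32;   // 128 bits / 4 bits per hex digit
        static final int COLS = 16;   // one column per hex digit value

        // each slot holds a nodeID (and, in practice, its IP address) or null
        final String[][] entries = new String[ROWS][COLS];

        String get(int sharedPrefixLen, int nextDigit) {
            return entries[sharedPrefixLen][nextDigit];
        }

        void put(int sharedPrefixLen, int nextDigit, String nodeId) {
            entries[sharedPrefixLen][nextDigit] = nodeId;
        }
    }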
15
Pastry: Routing
[Diagram: example route for Route(d46a1c) starting at node 65a1fc and heading toward the node with the numerically closest nodeID; nodes shown: d13da3, d4213f, d462ba, d467c4, d471f1]
  • Properties:
  • log16 N steps
  • O(log N) state
16
Pastry: Leaf sets
  • Each node maintains IP addresses of the nodes
    with the L/2 numerically closest larger and
    smaller nodeIDs, respectively.
  • routing efficiency/robustness
  • fault detection (keep-alive)
  • application-specific local coordination

17
Pastry: Routing procedure

if (destination D is within range of our leaf set)
    forward to the numerically closest leaf set member
else
    let l = length of the prefix shared with D
    let d = value of the l-th digit in D's address
    if (R[l][d] exists)
        forward to R[l][d]
    else
        forward to a known node that
        (a) shares at least as long a prefix with D, and
        (b) is numerically closer to D than this node
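A hedged Java rendering of the routing step above. The leaf set interface and helper names are hypothetical; this is a sketch of the decision logic, not the authors' implementation.

    // Illustrative next-hop decision, following the pseudocode above.
    interface LeafSet {
        boolean covers(String key);                          // is key within the leaf set range?
        String numericallyClosest(String key);               // closest leaf set member to key
        String closerNodeWithPrefix(String key, int minLen); // rare-case fallback candidate
    }

    class Router {
        final LeafSet leafSet;
        final String[][] routingTable; // [sharedPrefixLen][hexDigit] -> nodeID or null
        final String localId;          // 32 hex digits

        Router(LeafSet ls, String[][] rt, String id) {
            leafSet = ls; routingTable = rt; localId = id;
        }

        String nextHop(String key) {
            if (leafSet.covers(key)) {
                return leafSet.numericallyClosest(key);  // destination is at most one hop away
            }
            int l = sharedPrefixLength(localId, key);
            String entry = routingTable[l][Character.digit(key.charAt(l), 16)];
            if (entry != null) {
                return entry;                            // shares at least l+1 digits with the key
            }
            // rare case: any known node sharing >= l digits that is numerically closer
            return leafSet.closerNodeWithPrefix(key, l);
        }

        static int sharedPrefixLength(String a, String b) {
            int i = 0;
            while (i < a.length() && i < b.length() && a.charAt(i) == b.charAt(i)) i++;
            return i;
        }
    }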
18
Pastry: Performance
  • Integrity of overlay / message delivery:
  • guaranteed unless L/2 nodes with adjacent nodeIDs
    fail simultaneously
  • Number of routing hops:
  • no failures: < log16 N expected, 128/b + 1 max
  • during failure recovery:
  • O(N) worst case, average case much better

19
Pastry: Self-organization
  • Initializing and maintaining routing tables and
    leaf sets
  • Node addition
  • Node departure (failure)

20
Pastry: Node addition
[Diagram: new node with nodeID d46a1c joins by asking nearby node 65a1fc to route a join message with key d46a1c; the message passes nodes d13da3, d4213f, d462ba, d467c4, d471f1]
21
Node departure (failure)
  • Leaf set members exchange keep-alive messages
  • Leaf set repair (eager): request the set from the
    farthest live node in the set
  • Routing table repair (lazy): get table entries from peers
    in the same row, then from higher rows

22
Pastry: Experimental results
  • Prototype
  • implemented in Java
  • emulated network
  • deployed testbed (currently 25 sites worldwide)

23
Pastry: Average number of hops
|L| = 16, 100k random queries
24
Pastry: Number of hops (100k nodes)
|L| = 16, 100k random queries
25
Pastry: Routing hops (failures)
|L| = 16, 100k random queries, 5k nodes, 500 failures
26
Outline
  • Background
  • Pastry
  • Pastry proximity routing
  • Related Work
  • Conclusions

27
Pastry: Proximity routing
  • Assumption: scalar proximity metric
  • e.g. ping delay, IP hops
  • a node can probe its distance to any other node
  • Proximity invariant:
  • each routing table entry refers to a node that is
    close to the local node (in the proximity space),
    among all nodes with the appropriate nodeID
    prefix (see the sketch below)
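A hedged sketch of what the proximity invariant means when a routing table slot is filled: among the nodes with the required prefix, keep the one that measures closest. The probe function stands in for a ping-delay or IP-hop measurement; nothing here is taken from the Pastry code.

    import java.util.List;
    import java.util.function.ToDoubleFunction;

    // Illustrative: pick the proximity-closest candidate for a routing table slot.
    // All candidates are assumed to already share the required nodeID prefix.
    class ProximityChoice {
        static String closest(List<String> candidates, ToDoubleFunction<String> probe) {
            String best = null;
            double bestDistance = Double.POSITIVE_INFINITY;
            for (String node : candidates) {
                double d = probe.applyAsDouble(node); // e.g. measured ping delay to node
                if (d < bestDistance) {
                    bestDistance = d;
                    best = node;
                }
            }
            return best;
        }
    }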

28
Pastry: Routes in proximity space
29
Pastry: Distance traveled
|L| = 16, 100k random queries, Euclidean proximity
space
30
Pastry: Locality properties
  • 1) The expected distance traveled by a message in the
    proximity space is within a small constant of the
    minimum
  • 2) Routes of messages sent by nearby nodes with the same
    key converge at a node near the source nodes
  • 3) Among the k nodes with nodeIDs closest to the
    key, the message is likely to reach the node closest to
    the source node first

31
Pastry: Node addition
32
Pastry: Delay
GATech topology, .5M hosts, 60K nodes, 20K random
messages
33
Pastry API
  • route(M, X): route message M to the node with nodeID
    numerically closest to X
  • deliver(M): deliver message M to the application
  • forwarding(M, X): message M is being forwarded
    towards key X
  • newLeaf(L): report a change in leaf set L to the
    application
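A minimal Java rendering of the four calls listed above; the names follow the slide, but the parameter types and the split into a substrate interface and an application upcall interface are assumptions.

    import java.math.BigInteger;

    // Call from the application down into the substrate.
    interface PastrySubstrate {
        void route(byte[] msg, BigInteger key);       // route msg to node with nodeID closest to key
    }

    // Upcalls from the substrate into the application built on top of it.
    interface PastryApplication {
        void deliver(byte[] msg);                     // msg has reached the responsible node
        void forwarding(byte[] msg, BigInteger key);  // msg is passing through this node toward key
        void newLeaf(String[] leafSet);               // the local leaf set has changed
    }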

34
Pastry: Security
  • Secure nodeID assignment
  • Secure node join protocols
  • Randomized routing
  • Byzantine fault-tolerant leaf set membership
    protocol

35
Outline
  • Background
  • Pastry
  • Pastry proximity routing
  • Related Work
  • Conclusions

36
Pastry: Related work
  • Chord [SIGCOMM'01]
  • CAN [SIGCOMM'01]
  • Tapestry [TR UCB/CSD-01-1141]
  • PAST [SOSP'01]
  • SCRIBE [NGC'01]

37
Outline
  • Background
  • Pastry
  • Pastry proximity routing
  • Related Work
  • Conclusions

38
Conclusions
  • Generic p2p overlay network
  • Scalable, fault resilient, self-organizing,
    secure
  • O(log N) routing steps (expected)
  • O(log N) routing table size
  • Network proximity routing
  • For more information:
  • http://www.cs.rice.edu/CS/Systems/Pastry

39
Thanks
  • Any Questions?

40
(No Transcript)
41
(No Transcript)
42
Outline
  • PAST
  • SCRIBE

43
PAST: Cooperative, archival file storage and
distribution
  • Layered on top of Pastry
  • Strong persistence
  • High availability
  • Scalability
  • Reduced cost (no backup)
  • Efficient use of pooled resources

44
PAST API
  • Insert: store replicas of a file at k diverse
    storage nodes
  • Lookup: retrieve the file from a nearby live storage
    node that holds a copy
  • Reclaim: free the storage associated with a file
  • Files are immutable
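A hedged Java rendering of the three operations above; the fileID type, return values, and parameter lists are assumptions made for illustration, not the real PAST API.

    import java.math.BigInteger;

    // Sketch of the PAST operations listed above; signatures are assumed.
    interface Past {
        BigInteger insert(String filename, byte[] content, int k); // store k replicas, return fileID
        byte[] lookup(BigInteger fileId);                          // fetch from a nearby live replica
        void reclaim(BigInteger fileId);                           // free the file's storage
    }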

45
PAST: File storage
[Diagram: Insert(fileID) routes the file to the k nodes with nodeIDs closest to fileID]
46
PAST: File storage
Storage invariant: file replicas are stored
on the k nodes with nodeIDs closest to fileID (k
is bounded by the leaf set size)
47
PAST: File retrieval
[Diagram: client C retrieves a file from one of its k replicas via Lookup(fileID)]
The file is located in log16 N steps (expected); the lookup
usually finds the replica nearest to client C
48
PAST: Exploiting Pastry
  • Random, uniformly distributed nodeIDs:
  • replicas stored on diverse nodes
  • Uniformly distributed fileIDs:
  • e.g. SHA-1(filename, public key, salt), as sketched below
  • approximate load balance
  • Pastry routes to the closest live nodeID:
  • availability, fault-tolerance
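The slide derives fileIDs with SHA-1 over the file name, the owner's public key, and a salt; below is a small sketch of that hash. The input encoding and concatenation order are assumptions, only the three inputs come from the slide.

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    // Illustrative fileID derivation: SHA-1 over (filename, public key, salt).
    class FileIds {
        static BigInteger fileId(String filename, byte[] publicKey, byte[] salt) throws Exception {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(filename.getBytes(StandardCharsets.UTF_8));
            sha1.update(publicKey);
            sha1.update(salt);
            return new BigInteger(1, sha1.digest()); // interpreted as a non-negative integer id
        }
    }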

49
PAST: Storage management
  • Maintain the storage invariant
  • Balance free space when global utilization is
    high:
  • statistical variation in assignment of files to
    nodes (fileID/nodeID)
  • file size variations
  • node storage capacity variations
  • Local coordination only (leaf sets)

50
Experimental setup
  • Web proxy traces from NLANR:
  • 18.7 GB total; file sizes: 10.5 KB mean, 1.4 KB median,
    0 min, 138 MB max
  • Filesystem:
  • 166.6 GB total; file sizes: 88 KB mean, 4.5 KB median,
    0 min, 2.7 GB max
  • 2250 PAST nodes (k = 5)
  • truncated normal distributions of node storage
    sizes, mean 27/270 MB

51
Need for storage management
  • No diversion (t_pri = 1, t_div = 0):
  • max utilization 60.8%
  • 51.1% of inserts failed
  • Replica/file diversion (t_pri = 0.1, t_div = 0.05):
  • max utilization > 98%
  • < 1% of inserts failed

52
PAST: File insertion failures
53
PAST: Caching
  • Nodes cache files in the unused portion of their
    allocated disk space
  • Files are cached on nodes along the route of lookup
    and insert messages (see the sketch below)
  • Goals:
  • maximize query throughput for popular documents
  • balance query load
  • improve client latency
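A hedged sketch of the caching rule described above: a node on the path of a lookup or insert keeps a copy if its unused disk space allows it. The capacity accounting and the absence of an eviction policy are simplifications for illustration.

    import java.math.BigInteger;
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative cache for files passing through this node on lookup/insert routes.
    class RouteCache {
        private final long capacityBytes;  // unused portion of the allocated disk space
        private long usedBytes = 0;
        private final Map<BigInteger, byte[]> cached = new HashMap<>();

        RouteCache(long capacityBytes) { this.capacityBytes = capacityBytes; }

        // Called while forwarding a lookup or insert message for fileId.
        void maybeCache(BigInteger fileId, byte[] content) {
            if (!cached.containsKey(fileId) && usedBytes + content.length <= capacityBytes) {
                cached.put(fileId, content);
                usedBytes += content.length;
            }
        }

        byte[] lookupLocal(BigInteger fileId) {
            return cached.get(fileId);  // serve popular files without forwarding further
        }
    }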

54
PAST: Caching
[Diagram: Lookup(fileID); copies of the file are cached on nodes along the lookup route]
55
PAST: Caching
56
PAST: Security
  • No read access control; users may encrypt content
    for privacy
  • File authenticity: file certificates
  • System integrity: nodeIDs and fileIDs are non-forgeable,
    sensitive messages are signed
  • Routing is randomized

57
PAST: Storage quotas
  • Balance storage supply and demand
  • user holds smartcard issued by brokers
  • hides user private key, usage quota
  • debits quota upon issuing file certificate
  • storage nodes hold smartcards
  • advertise supply quota
  • storage nodes subject to random audits within
    leaf sets

58
PAST: Related work
  • CFS [SOSP'01]
  • OceanStore [ASPLOS 2000]
  • FarSite [Sigmetrics 2000]

59
Outline
  • PAST
  • SCRIBE

60
SCRIBE: Large-scale, decentralized multicast
  • Infrastructure to support topic-based
    publish-subscribe applications
  • Scalable: large numbers of topics and subscribers,
    wide range of subscribers per topic
  • Efficient: low delay, low link stress, low node
    overhead

61
SCRIBE: Large-scale multicast
[Diagram: Publish(topicID) and Subscribe(topicID) messages routed toward the node with nodeID closest to topicID]
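A minimal sketch of the two calls shown in the diagram, layered on an overlay route() primitive; the topicID derivation (SHA-1 of the topic name) and the message encoding are assumptions made for illustration.

    import java.math.BigInteger;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    // Illustrative Scribe-style publish/subscribe on top of an overlay route() call.
    class ScribeClient {
        interface Overlay { void route(byte[] msg, BigInteger key); }

        private final Overlay overlay;

        ScribeClient(Overlay overlay) { this.overlay = overlay; }

        // topicID lives in the same id space as nodeIDs
        static BigInteger topicId(String topic) throws Exception {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(topic.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        }

        void subscribe(String topic) throws Exception {
            overlay.route("SUBSCRIBE".getBytes(StandardCharsets.UTF_8), topicId(topic));
        }

        void publish(String topic, byte[] event) throws Exception {
            overlay.route(event, topicId(topic));
        }
    }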
62
Scribe: Results
  • Simulation results
  • Comparison with IP multicast: delay, node stress
    and link stress
  • Experimental setup:
  • Georgia Tech transit-stub model
  • 100,000 nodes randomly selected out of .5M
  • Zipf-like subscription distribution, 1500 topics

63
Scribe: Topic popularity
gsize(r) = floor(N * r^-1.25 + 0.5), with N = 100,000 and
1,500 topics
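A small sketch that evaluates the group-size formula above for a given topic popularity rank; the helper class is illustrative only.

    // Group size for the topic with popularity rank r: floor(N * r^-1.25 + 0.5).
    class TopicPopularity {
        static long groupSize(int rank, long n) {
            return (long) Math.floor(n * Math.pow(rank, -1.25) + 0.5);
        }

        public static void main(String[] args) {
            System.out.println(groupSize(1, 100_000));    // most popular of the 1,500 topics
            System.out.println(groupSize(1500, 100_000)); // least popular of the 1,500 topics
        }
    }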
64
Scribe: Delay penalty
Relative delay penalty, average and maximum
65
Scribe: Node stress
66
Scribe: Link stress
One message published in each of the 1,500 topics
67
Scribe: Summary
  • Self-configuring P2P framework for topic-based
    publish-subscribe
  • Scribe achieves reasonable performance when
    compared to IP multicast
  • Scales to a large number of subscribers
  • Scales to a large number of topics
  • Good distribution of load
  • For more information:
  • http://www.cs.rice.edu/CS/Systems/Pastry

68
Status
  • Functional prototypes
  • Pastry [Middleware 2001]
  • PAST [HotOS-VIII, SOSP'01]
  • SCRIBE [NGC 2001, IEEE JSAC]
  • SplitStream [submitted]
  • Squirrel [PODC'02]
  • http://www.cs.rice.edu/CS/Systems/Pastry

69
Current Work
  • Security
  • secure routing/overlay maintenance/nodeID
    assignment
  • quota system
  • Keyword search capabilities
  • Support for mutable files in PAST
  • Anonymity/Anti-censorship
  • New applications
  • Free software releases