Large Scale Sharing

About This Presentation

Title:

Large Scale Sharing

Description:

Large Scale Sharing. Marco F. Duarte. COMP 520: Distributed Systems ... P2P sharing systems are very popular ... Features desired in a P2P sharing system ... – PowerPoint PPT presentation

Slides: 30

Provided by: marcof

Learn more at: https://www.cs.rice.edu

Transcript and Presenter's Notes

Title: Large Scale Sharing

1
Large Scale Sharing

Marco F. Duarte
COMP 520 Distributed Systems
September 19, 2004

2
Introduction

P2P sharing systems are very popular
In P2P, all nodes have identical capabilities and responsibilities
Popular approaches are partially centralized, do not scale well, or do not provide the desired anonymity
Scalability of these systems is critical
Need for decentralized, load-balancing architectures

3
Features desired in a P2P sharing system

Decentralized architecture: no single point of failure
Scalability: bandwidth and load balancing
Fault tolerance: content replication
Anonymity for users: posters, readers, storers
Resilience against DoS attacks

4
Freenet provides anonymity

No requester or provider information is implicit in communication
Presence of a file in a node does not imply authorship
Popular files are replicated to improve locality
Is not intended to provide permanent storage

5
Freenet Queries

Files receive FileIDs (160-bit SHA-1 hash of the file identifier)
Queries have pseudo-unique random identifiers (QueryIDs) and a hops-to-live count
Routing tables contain previously retrieved FileIDs and their locations
Queries are routed to the location with the closest FileID at each stage; loops are detected with the QueryID
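
The FileID and closest-key routing step above can be sketched as follows. This is a minimal illustration, not Freenet's actual code: the function names are hypothetical, and measuring "closeness" as the absolute numeric difference between FileIDs is an assumption made for simplicity.

```python
import hashlib
from typing import Dict, Optional, Set


def file_id(identifier: str) -> int:
    """160-bit FileID: SHA-1 hash of the file identifier, as an integer."""
    digest = hashlib.sha1(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def next_hop(routing_table: Dict[int, str], target: int,
             visited: Set[str]) -> Optional[str]:
    """Return the location whose known FileID is closest to the target.

    routing_table maps previously seen FileIDs to node locations.
    Locations already visited by this query are skipped.
    Closeness is absolute numeric difference (an assumption).
    """
    candidates = [(abs(fid - target), loc)
                  for fid, loc in routing_table.items()
                  if loc not in visited]
    return min(candidates)[1] if candidates else None
```

For example, with a routing table `{10: "A", 50: "B"}` and target 45, the query goes to `"B"`; if `"B"` has already been visited, it falls back to `"A"`.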

6
Freenet Queries: Lookups and Stores

On a successful lookup, copies of the file are stored at all nodes along the return path
A file record for a is added to their routing tables
Writes perform a lookup first and insert the file along the path if no match is found
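
A lookup with hops-to-live, loop detection, and caching along the return path might look like the sketch below. The `Node` class and its fields are hypothetical, closeness is again taken as numeric difference, and pointing the new routing record at the neighbor that answered is a simplifying assumption.

```python
class Node:
    """Toy Freenet-style node (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.store = {}    # FileID -> file contents held locally
        self.routes = {}   # FileID -> neighboring Node believed to hold it
        self.seen = set()  # QueryIDs already handled (loop detection)

    def lookup(self, query_id, fid, hops_to_live):
        # Drop the query if it looped back here or ran out of hops.
        if query_id in self.seen or hops_to_live <= 0:
            return None
        self.seen.add(query_id)
        if fid in self.store:
            return self.store[fid]
        # Try neighbors in order of how close their known FileID is to fid.
        for known in sorted(self.routes, key=lambda k: abs(k - fid)):
            neighbor = self.routes[known]
            data = neighbor.lookup(query_id, fid, hops_to_live - 1)
            if data is not None:
                self.store[fid] = data       # copy kept on the return path
                self.routes[fid] = neighbor  # new file record for fid
                return data
        return None
```

After a successful lookup, every node on the path holds a copy and a routing record, which is what makes later queries for the same FileID shorter.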

7
Freenet Properties

FileID-based clustering improves routing as usage increases
LRU-like capacity management: rarely used files are purged from the system
The random nature of FileIDs allows for diversity of information at nodes
Attempts to supplant existing files lead to propagation of the real file
Anonymity features:
File ownership is assumed randomly by other nodes
Minimal routing information is necessary at each hop
A hops-to-live count of 1 is updated randomly
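
The LRU-like capacity management mentioned above can be illustrated with a small store that evicts the least recently used file. This is a sketch only: the `LRUStore` name is hypothetical, and capacity is counted in files rather than bytes to keep the example short.

```python
from collections import OrderedDict


class LRUStore:
    """Toy per-node file store with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity    # maximum number of files (assumption)
        self.files = OrderedDict()  # FileID -> contents, oldest first

    def get(self, fid):
        if fid not in self.files:
            return None
        self.files.move_to_end(fid)  # mark as most recently used
        return self.files[fid]

    def put(self, fid, data):
        self.files[fid] = data
        self.files.move_to_end(fid)
        while len(self.files) > self.capacity:
            self.files.popitem(last=False)  # purge least recently used
```

With capacity 2, storing files 1 and 2, reading file 1, and then storing file 3 purges file 2, since it is the one used least recently.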

8
Freenet Problems

Files that are stored in the network may not be
found.
Freenet does not provide reliable storage
No notion of locality in routing
Simulations do not involve file insertion or node
discovery

9
PAST: Reliable Distributed Storage

Customizable file persistence
High availability and load balancing
Efficient routing and storage allocation
Uses FileIDs generated from hashes, as in Freenet
Uses owner credentials to verify the identity of authors
Interface: Insert, Lookup, Reclaim

10
PAST Architecture

FileID computed from a hash of the filename, the owner's public key, and a random salt
Each node receives a pseudorandom NodeID, independent of the node properties
Owner specifies the number k of replicas of a file to store in the system on insert
File is stored in the k nodes with NodeIDs closest to the FileID
Routing provided by Pastry
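
The FileID computation and replica placement described above can be sketched as follows. The function names, the plain concatenation of the hash inputs, and the small circular ID space used in the example are assumptions for illustration, not PAST's actual encoding.

```python
import hashlib


def past_file_id(filename: str, owner_public_key: bytes, salt: bytes) -> int:
    """SHA-1 over filename, owner's public key, and salt (160-bit integer)."""
    h = hashlib.sha1()
    h.update(filename.encode("utf-8"))
    h.update(owner_public_key)
    h.update(salt)
    return int.from_bytes(h.digest(), "big")


def circular_distance(a: int, b: int, bits: int) -> int:
    """Distance between two IDs on a ring of size 2**bits."""
    d = abs(a - b) % (1 << bits)
    return min(d, (1 << bits) - d)


def replica_nodes(node_ids, fid, k, bits):
    """The k NodeIDs numerically closest to fid on the ring."""
    return sorted(node_ids, key=lambda n: circular_distance(n, fid, bits))[:k]
```

In a toy 8-bit ID space with nodes 10, 20, 30, and 250, a file with ID 5 and k = 2 is placed on nodes 10 and 250, because 250 is only 11 steps away once the ring wraps around. Choosing a different salt yields a different FileID and therefore a different set of storing nodes.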

11
Pastry: Routing for P2P Networks

Paths with fewer than ⌈log_{2^b} N⌉ hops (N nodes, NodeID digits of b bits)
Delivery guaranteed unless ⌊l/2⌋ nodes with adjacent NodeIDs fail simultaneously
Flexible proximity metric
Each node contains:
Leaf set: l nodes with closest NodeIDs
Routing table: set of neighbors organized by NodeIDs
Neighborhood set: l closest nodes by the proximity metric
Each NodeID is paired with its network address
Direct routes to neighbors and the l closest NodeIDs

12
Pastry Example

Routing table is organized by similarity to the node's own NodeID
Neighborhood set is used for node addition and recovery
Queries are forwarded to a numerically closer node (by shared NodeID prefix and NodeID proximity)
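
The forwarding rule above can be sketched with the 4-digit, base-4 NodeIDs used in the figures. This is a simplified illustration: the function names are hypothetical, and a single list of known nodes stands in for the separate routing table and leaf set.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def distance(a: str, b: str, base: int = 4) -> int:
    """Numeric distance between two IDs written in the given base."""
    return abs(int(a, base) - int(b, base))


def next_hop(self_id: str, key: str, known, base: int = 4):
    """Pick the next node for key, or None if this node should keep it."""
    own_prefix = shared_prefix_len(self_id, key)
    own_dist = distance(self_id, key, base)
    # Prefer a node that shares a longer prefix with the key.
    longer = [n for n in known if shared_prefix_len(n, key) > own_prefix]
    if longer:
        return max(longer, key=lambda n: (shared_prefix_len(n, key),
                                          -distance(n, key, base)))
    # Otherwise fall back to a numerically closer node with an equal prefix.
    closer = [n for n in known
              if shared_prefix_len(n, key) >= own_prefix
              and distance(n, key, base) < own_dist]
    if closer:
        return min(closer, key=lambda n: distance(n, key, base))
    return None
```

Routing key 3133 from node 0231, which knows 3013, 1123, and 2300, goes to 3013 (one shared digit); from there a node that knows 3130 forwards to it (three shared digits), and 3130 keeps the message when it knows no closer node.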

13
Pastry Routing Table
[Figure: ring of NodeIDs 0231, 3321, 3133, 0302, 1033, 3013, 1123, 2300, 2121, 1202, 1311, 2031, with the Neighborhood Set and Leaf Set marked]
14
Pastry Routing Example
[Figure: ring of NodeIDs 0231, 3321, 3133, 0302, 1033, 3013, 1123, 2300, 2121, 1202, 1311, 2031 with a query for 3133 being routed; other nodes exist but are not shown]
15
Pastry Node Insertion Example
[Figure: ring of NodeIDs with new node 3130 being inserted; Leaf Set and Neighborhood Set marked]
16
Pastry Node Removal Example
[Figure: node removal on the ring; nodes shown: 3321, 3133, 3013]
17
PAST Insertions

fileID = Insert(name, owner-credentials, k, file)

[Figure: the Owner issues Insert File with FileID 3130; the file and its certificate are delivered to three nodes on the ring, so the file is inserted k times]
18
PAST Insertions

fileID = Insert(name, owner-credentials, k, file)

[Figure: k Store Receipts are returned across the ring to the Owner]
19
PAST Semantics

file = Lookup(fileID)
Routed to the node whose NodeID is closest to the FileID
The first of the k closest nodes found returns the file and credentials
Reclaim(fileID, owner-credentials)
Same semantics as Insert
Owner issues a Reclaim Certificate
Storing nodes issue a Reclaim Receipt
Changes in leaf sets trigger changes in replica locations
A new node creates pointers to the files it should contain; migration is gradual

20
Load Balancing in PAST: Replica Diversion
[Figure: nodes 3201 and 3130 with their leaf sets]
21
Load Balancing in PAST: File Diversion
[Figure: nodes 3201 and 3130 with their leaf sets]
Change the FileID by changing the salt
Policies for acceptance of replicas and diverted replicas, and for selection of the diverted replica node
Maximum ratio of file size to free space for insertion: t_pri, t_div
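
The acceptance thresholds can be read as a simple ratio test, sketched below. The function names are hypothetical, the default t_pri = 0.1 is an illustrative value (only t_div = 0.05 appears in these slides), and choosing the leaf-set node with the most free space as the diversion target is an assumption of this sketch.

```python
def accepts(file_size, free_space, threshold):
    """A node accepts a replica if file_size / free_space <= threshold."""
    if free_space <= 0:
        return False
    return file_size / free_space <= threshold


def place_replica(file_size, primary_free, leaf_free, t_pri=0.1, t_div=0.05):
    """Return 'primary', the name of a leaf-set node, or None.

    leaf_free maps leaf-set node names to their free space.
    None means replica diversion failed; the caller would then fall
    back to file diversion (new salt, hence a new FileID).
    """
    if accepts(file_size, primary_free, t_pri):
        return "primary"
    if leaf_free:
        # Replica diversion: try the leaf-set node with the most free space.
        best = max(leaf_free, key=leaf_free.get)
        if accepts(file_size, leaf_free[best], t_div):
            return best
    return None
```

Because t_div is stricter than t_pri in this sketch, a node is more reluctant to take a diverted replica than one it is directly responsible for, which keeps nearly full nodes from absorbing large files.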
22
Caching in PAST

Highly popular files might demand more replicas than specified
Files located far away need to be fetched only once and can then be served locally
Unused disk space is allocated as cache
Caching performance degrades gradually with increased utilization
Cache insertion policy is similar to the diversion policies

23
PAST Performance: t_pri comparison, t_div = 0.05
24
PAST Performance: t_pri comparison, t_div = 0.05
25
PAST Performance: Ratio of File Diversions
26
PAST Performance: Ratio of Replica Diversions
27
PAST Performance: Failed Insertions
28
PAST Performance: Cache Hits
29
Conclusions