Title: Large Scale Sharing
1. Large Scale Sharing
- Marco F. Duarte
- COMP 520 Distributed Systems
- September 19, 2004
2. Introduction
- P2P sharing systems are very popular
- In P2P, all nodes have identical capabilities and responsibilities
- Popular approaches are partially centralized, do not scale well, or do not provide the desired anonymity
- Scalability of these systems is critical
- Need for decentralized, load-balancing architectures
3. Features desired in a P2P sharing system
- Decentralized architecture: no single point of failure
- Scalability: bandwidth and load balancing
- Fault tolerance: content replication
- Anonymity for users: posters, readers, and storers
- Resilience against DoS attacks
4. Freenet provides anonymity
- No requester or provider information is implicit in communication
- Presence of a file at a node does not imply authorship
- Popular files are replicated to improve locality
- Does not intend to provide permanent storage
5. Freenet Queries
- Files receive FileIDs (160-bit SHA-1 hash of the file identifier)
- Queries carry pseudo-unique random identifiers (QueryIDs) and a hops-to-live count
- Each node's routing table records previously retrieved FileIDs and their locations
- At each stage, queries are routed to the location with the closest FileID; loops are detected with the QueryID
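The query routing on this slide can be sketched in Python as below. This is a minimal sketch under stated assumptions: the in-memory network of `Node` objects and the function names are hypothetical, "closest FileID" is taken to mean the smallest absolute numeric difference, and a failed branch backtracks to the next-closest routing entry.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


def file_id(identifier: str) -> int:
    """160-bit FileID: SHA-1 hash of the file identifier."""
    return int.from_bytes(hashlib.sha1(identifier.encode("utf-8")).digest(), "big")


@dataclass
class Node:  # hypothetical node model for illustration
    name: str
    store: dict = field(default_factory=dict)   # FileID -> file contents
    routes: dict = field(default_factory=dict)  # FileID -> neighbor believed to hold it
    seen: set = field(default_factory=set)      # QueryIDs this node has already handled


def query(network, node_name, fid, query_id, htl):
    """Depth-first lookup: try neighbors by closeness of their known FileIDs, backtrack on failure."""
    node = network[node_name]
    if query_id in node.seen:  # same QueryID seen before: a loop, so fail this branch
        return None
    node.seen.add(query_id)
    if fid in node.store:
        return node.store[fid]
    if htl <= 0:  # hops-to-live exhausted
        return None
    # Assumption: "closest" means smallest absolute difference between FileIDs.
    for known in sorted(node.routes, key=lambda k: abs(k - fid)):
        result = query(network, node.routes[known], fid, query_id, htl - 1)
        if result is not None:
            return result
    return None


# Demo: A knows a FileID close to the target via B; B points to D, which holds the file.
net = {n: Node(n) for n in "ABCD"}
target = file_id("text/report")
net["A"].routes = {target - 5: "B", target + 1000: "C"}
net["B"].routes = {target - 1: "D"}
net["C"].routes = {target: "A"}  # leads back to A; the QueryID check stops the loop
net["D"].store[target] = b"report contents"

found = query(net, "A", target, uuid.uuid4().hex, htl=10)
missing = query(net, "A", file_id("no/such/file"), uuid.uuid4().hex, htl=10)
```

Each query gets a fresh `uuid4` QueryID, so the per-node `seen` sets only block revisits within one query.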
6. Freenet Queries: Lookups and Stores
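- On a successful lookup, copies of the file are stored at all nodes along the return path
- A record of the file and its source is added to the routing tables along that path
- Writes first perform a lookup and, if no match is found, insert the file at every node along the path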
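The lookup and write behavior on this slide can be sketched as follows, assuming the routing path has already been found (for example by the closest-FileID routing of slide 5). The `Node` model and both function names are hypothetical. The collision branch also illustrates why an attempt to supplant an existing file ends up propagating the real one.

```python
from dataclasses import dataclass, field


@dataclass
class Node:  # hypothetical node model for illustration
    name: str
    store: dict = field(default_factory=dict)   # FileID -> cached file contents
    routes: dict = field(default_factory=dict)  # FileID -> neighbor the file came from


def deliver_along_path(path, fid, data):
    """Successful lookup: path[0] is the requester and path[-1] holds the file.
    The reply travels back, and every node it passes caches the file and
    records the neighbor it arrived from."""
    for i in range(len(path) - 2, -1, -1):
        path[i].store[fid] = data
        path[i].routes[fid] = path[i + 1].name


def insert_along_path(path, fid, data):
    """Write: probe the lookup path first. On a FileID collision the existing file
    is returned and propagated back instead; otherwise every node stores the new file."""
    for j, node in enumerate(path):
        if fid in node.store:
            existing = node.store[fid]
            deliver_along_path(path[: j + 1], fid, existing)
            return existing
    for node in path:
        node.store[fid] = data
    return None


a, b, c = Node("a"), Node("b"), Node("c")
c.store[7] = b"original"
deliver_along_path([a, b, c], 7, b"original")     # lookup from a finds the file at c
clash = insert_along_path([a, b], 7, b"forgery")  # attempt to supplant FileID 7
fresh = insert_along_path([a, b, c], 9, b"new file")
```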
7. Freenet Properties
- FileID-based clustering allows for improved routing as usage increases
- LRU-like capacity management: rarely used files are purged from the system
- The random nature of FileIDs allows for diversity of information at nodes
- Attempts to supplant existing files lead to propagation of the real file
- Anonymity features:
  - File ownership is assumed randomly by other nodes
  - Minimal routing information is necessary at each hop
  - A hops-to-live count of 1 is updated randomly
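The LRU-like capacity management can be illustrated with a small node datastore. This is a sketch assuming capacity is measured in bytes and that both requests and inserts count as uses; `LRUDatastore` is a hypothetical name, and the slide only describes Freenet's actual policy as LRU-like.

```python
from collections import OrderedDict


class LRUDatastore:
    """Fixed-capacity node store that evicts the least recently used files first."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.files = OrderedDict()  # FileID -> contents, least recently used first

    def get(self, fid):
        if fid not in self.files:
            return None
        self.files.move_to_end(fid)  # a request refreshes the file's position
        return self.files[fid]

    def put(self, fid, data: bytes) -> bool:
        if len(data) > self.capacity:
            return False
        if fid in self.files:
            self.used -= len(self.files.pop(fid))
        while self.used + len(data) > self.capacity:
            _, evicted = self.files.popitem(last=False)  # purge least recently used
            self.used -= len(evicted)
        self.files[fid] = data
        self.used += len(data)
        return True


store = LRUDatastore(capacity_bytes=10)
store.put(1, b"aaaa")
store.put(2, b"bbbb")
store.get(1)           # file 1 is requested again, so file 2 becomes least recently used
store.put(3, b"cccc")  # needs room: file 2 is purged
```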
8. Freenet Problems
- Files that are stored in the network may not be found
- Freenet does not provide reliable storage
- No notion of network locality in routing
- Simulations do not involve file insertion or node discovery
9. PAST: Reliable Distributed Storage
- Customizable file persistence
- High availability and load balancing
- Efficient routing and storage allocation
- Uses FileIDs generated from hashes, as in Freenet
- Uses owner credentials to verify the identity of authors
- Interface: Insert, Lookup, Reclaim
10. PAST Architecture
- FileID computed from a hash of the filename, the owner's public key, and a random salt
- Each node receives a pseudorandom NodeID, independent of the node's properties
- On insert, the owner specifies the number k of replicas of the file to store in the system
- The file is stored on the k nodes whose NodeIDs are closest to the FileID
- Routing is provided by Pastry
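FileID generation and replica placement can be sketched as below. Assumptions: the filename, public key, and salt are simply concatenated before hashing; NodeIDs are 128 bits and a FileID is compared on its 128 most significant bits; "closest" is measured on a circular ID space. All names are hypothetical.

```python
import hashlib
import os


def past_file_id(name: str, owner_public_key: bytes, salt: bytes) -> int:
    """160-bit FileID from a SHA-1 hash over filename, owner's public key, and salt.
    Assumption: the three fields are simply concatenated."""
    digest = hashlib.sha1(name.encode("utf-8") + owner_public_key + salt).digest()
    return int.from_bytes(digest, "big")


def replica_nodes(file_id: int, node_ids: list, k: int, id_bits: int = 128) -> list:
    """Return the k NodeIDs numerically closest to the FileID on a circular ID space.
    Assumption: the FileID is truncated to its id_bits most significant bits."""
    key = file_id >> (160 - id_bits)
    space = 1 << id_bits

    def circular_distance(node_id: int) -> int:
        d = abs(node_id - key)
        return min(d, space - d)

    return sorted(node_ids, key=circular_distance)[:k]


# Demo with random 128-bit NodeIDs (output varies from run to run)
nodes = [int.from_bytes(os.urandom(16), "big") for _ in range(50)]
fid = past_file_id("report.pdf", b"owner-public-key-bytes", os.urandom(20))
print(hex(fid), [hex(n) for n in replica_nodes(fid, nodes, k=3)])
```

Because the salt is part of the hash, a new salt yields an unrelated FileID and therefore a different set of k nodes.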
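11. Pastry: Routing for P2P Networks
- Under normal operation, paths of fewer than ⌈log_{2^b} N⌉ hops, for N nodes and NodeIDs written in base-2^b digits
- Delivery guaranteed unless l/2 nodes with adjacent NodeIDs fail simultaneously
- Flexible proximity metric
- Each node contains:
  - Leaf set: the l nodes with the numerically closest NodeIDs
  - Routing table: set of neighbors organized by shared NodeID prefix
  - Neighborhood set: the l closest nodes according to the proximity metric
- Each NodeID is paired with its network address
- Direct routes to neighbors and to the l closest NodeIDs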
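A per-hop routing decision in the spirit of these slides is sketched below. Assumptions: NodeIDs are 4-digit base-4 strings (b = 2), matching the example IDs in this deck; numeric distance is linear rather than circular; and the leaf sets and routing-table entries in the demo are hypothetical, not read off the figures.

```python
B = 2  # bits per digit, so NodeIDs are written in base 2**B = 4 (assumption)


def value(node_id: str) -> int:
    return int(node_id, 2 ** B)


def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def hop_bound(n_nodes: int, b: int = B) -> int:
    """ceil(log_{2^b} N), computed with integers to avoid floating-point rounding."""
    hops, reach = 0, 1
    while reach < n_nodes:
        reach *= 2 ** b
        hops += 1
    return hops


class PastryNode:
    def __init__(self, node_id, leaf_set, routing_table):
        self.id = node_id
        self.leaf_set = leaf_set            # NodeIDs numerically closest to this node
        self.routing_table = routing_table  # (row, digit) -> NodeID sharing `row` digits

    def next_hop(self, key: str) -> str:
        """Return the NodeID to forward to; returning self.id means 'deliver here'."""
        k = value(key)
        members = self.leaf_set + [self.id]
        # 1. Key within the leaf set's range: go straight to the numerically closest member.
        if min(map(value, members)) <= k <= max(map(value, members)):
            return min(members, key=lambda m: abs(value(m) - k))
        # 2. Routing-table entry that matches one more digit of the key.
        row = shared_prefix_len(self.id, key)
        entry = self.routing_table.get((row, key[row]))
        if entry is not None:
            return entry
        # 3. Fallback: a known node with as long a shared prefix that is numerically closer.
        known = self.leaf_set + list(self.routing_table.values())
        closer = [t for t in known
                  if shared_prefix_len(t, key) >= row
                  and abs(value(t) - k) < abs(value(self.id) - k)]
        return min(closer, key=lambda t: abs(value(t) - k)) if closer else self.id


def route(nodes, start, key):
    path = [start]
    for _ in range(len(nodes) + 1):
        nxt = nodes[path[-1]].next_hop(key)
        if nxt == path[-1]:
            return path
        path.append(nxt)
    raise RuntimeError("routing did not converge")


# Hypothetical routing state for three of the example NodeIDs
nodes = {
    "0231": PastryNode("0231", ["0302", "1033"],
                       {(0, "1"): "1123", (0, "2"): "2300", (0, "3"): "3013"}),
    "3013": PastryNode("3013", ["2300", "3133"], {(0, "0"): "0231", (1, "1"): "3133"}),
    "3133": PastryNode("3133", ["3013", "3321"], {(0, "0"): "0231"}),
}
print(route(nodes, "0231", "3133"))  # ['0231', '3013', '3133']
```

The first hop fixes the leading digit via the routing table; the second hop lands inside a leaf set and is delivered directly.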
12. Pastry Example
- Routing table organized by the length of the NodeID prefix shared with the local node
- Neighborhood set used for node addition and recovery
- Queries are forwarded to a closer node: one sharing a longer NodeID prefix with the key or, failing that, one whose NodeID is numerically closer
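13. Pastry Routing Table
[Figure: routing state of an example Pastry node in a network with 4-digit NodeIDs (0231, 0302, 1033, 1123, 1202, 1311, 2031, 2121, 2300, 3013, 3133, 3321); the Leaf Set and Neighborhood Set are marked.]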
14. Pastry Routing Example
[Figure: a query for key 3133 routed through the same example network. Other nodes exist but are not shown.]
15. Pastry Node Insertion Example
[Figure: a new node with NodeID 3130 joins the example network; its Leaf Set and Neighborhood Set are marked.]
16. Pastry Node Removal Example
[Figure: a node leaves the example network; NodeIDs 3321, 3133, and 3013 are shown.]
17. PAST Insertions
- fileID = Insert(name, owner-credentials, k, file)
[Figure: the owner sends an insert request for a file with FileID 3130; the file and its certificate are stored k times, on the nodes whose NodeIDs are closest to 3130.]
18. PAST Insertions
- fileID = Insert(name, owner-credentials, k, file)
[Figure: each of the k storing nodes returns a store receipt to the owner.]
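The insert flow on slides 17 and 18 can be sketched as below. Assumptions: the k target nodes are already known (in PAST they are reached through Pastry routing); the certificate carries only a FileID, content hash, k, and salt; and the cryptographic signatures on certificates and receipts are omitted. All class and function names, and the NodeID "3123", are hypothetical. A fuller insert would attempt replica diversion before reporting failure.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FileCertificate:  # signatures omitted in this sketch
    file_id: int
    content_hash: bytes
    k: int
    salt: bytes


@dataclass
class StoreReceipt:  # signatures omitted in this sketch
    node_id: str
    file_id: int


class StorageNode:
    def __init__(self, node_id: str, free_bytes: int):
        self.node_id = node_id
        self.free = free_bytes
        self.files = {}  # FileID -> (certificate, contents)

    def store(self, cert: FileCertificate, data: bytes):
        """Check the file against its certificate, store it, and issue a receipt."""
        if hashlib.sha1(data).digest() != cert.content_hash or len(data) > self.free:
            return None
        self.files[cert.file_id] = (cert, data)
        self.free -= len(data)
        return StoreReceipt(self.node_id, cert.file_id)


def insert(name, owner_public_key: bytes, k: int, data: bytes, closest_nodes, salt: bytes):
    """Send the file to the k closest nodes; succeed only with k store receipts."""
    file_id = int.from_bytes(
        hashlib.sha1(name.encode("utf-8") + owner_public_key + salt).digest(), "big")
    cert = FileCertificate(file_id, hashlib.sha1(data).digest(), k, salt)
    receipts = []
    for node in closest_nodes[:k]:
        receipt = node.store(cert, data)
        if receipt is not None:
            receipts.append(receipt)
    return file_id, receipts, len(receipts) == k


nodes = [StorageNode("3130", 1000), StorageNode("3133", 1000), StorageNode("3123", 4)]
fid, receipts, ok = insert("notes.txt", b"owner-key", 2, b"hello", nodes, salt=b"salt-1")
fid2, receipts2, ok2 = insert("big.bin", b"owner-key", 3, b"0123456789", nodes, salt=b"salt-2")
```

The second insert collects only two of three receipts because node "3123" lacks space, so the owner learns the insert did not reach k replicas.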
19. PAST Semantics
- file = Lookup(fileID)
  - Routed to the node whose NodeID is numerically closest to the FileID
  - The first of the k closest nodes reached returns the file and its credentials
- Reclaim(fileID, owner-credentials)
  - Same semantics as Insert
  - Owner issues a Reclaim Certificate
  - Storing nodes issue a Reclaim Receipt
- Changes in leaf sets trigger changes in replica locations
- A new node creates pointers to the files it should contain; migration is gradual
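Lookup and Reclaim can be sketched under simplifying assumptions: the route path toward the FileID is given, owner identity is a plain string compared for equality in place of verifying a Reclaim Certificate, and a reclaim receipt is just the storing node's NodeID. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    owner: str  # stands in for the owner credentials in the file certificate
    data: bytes


@dataclass
class PastNode:
    node_id: str
    replicas: dict = field(default_factory=dict)  # FileID -> Replica


def lookup(route_path, file_id):
    """Walk the route toward the FileID; the first node holding a replica answers."""
    for node in route_path:
        replica = node.replicas.get(file_id)
        if replica is not None:
            return node.node_id, replica.data
    return None


def reclaim(replica_holders, file_id, owner):
    """Each holder checks the request against the stored owner, frees the replica,
    and answers with a reclaim receipt (here just its NodeID)."""
    receipts = []
    for node in replica_holders:
        replica = node.replicas.get(file_id)
        if replica is not None and replica.owner == owner:
            del node.replicas[file_id]
            receipts.append(node.node_id)
    return receipts


n1, n2, n3 = PastNode("1311"), PastNode("3133"), PastNode("3130")
for holder in (n2, n3):
    holder.replicas[3130] = Replica(owner="alice", data=b"payload")
first_hit = lookup([n1, n2, n3], 3130)
rejected = reclaim([n2, n3], 3130, owner="mallory")
accepted = reclaim([n2, n3], 3130, owner="alice")
after = lookup([n1, n2, n3], 3130)
```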
20. Load Balancing in PAST: Replica Diversion
[Figure: replica diversion within a leaf set; NodeIDs 3201 and 3130 and their leaf sets are shown.]
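21. Load Balancing in PAST: File Diversion
[Figure: file diversion; NodeIDs 3201 and 3130 and their leaf sets are shown.]
- The FileID is changed by changing the salt
- Policies govern the acceptance of replicas and diverted replicas, and the selection of the node that holds a diverted replica
- Thresholds t_pri and t_div: the maximum ratio of file size to free space for accepting a primary or a diverted replica, respectively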
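The acceptance and diversion policies can be sketched as follows. Assumptions: a replica is rejected when file size divided by the node's free space exceeds the threshold (t_pri for a node among the k closest, t_div for a diversion target); the diversion target is the leaf-set node outside the k closest with the most free space; and `max_attempts` is a hypothetical retry limit. The t_div = 0.05 value matches the performance slides, while t_pri = 0.1 is only illustrative. Node names and IDs are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass
class Node:
    node_id: str
    free: int  # free storage in bytes


def accepts(node: Node, size: int, t: float) -> bool:
    """Reject a replica whose size exceeds fraction t of the node's free space."""
    return node.free > 0 and size / node.free <= t


def place_replica(size, primary, leaf_set, k_closest_ids, t_pri, t_div):
    """Return the node that will hold this replica, or None if diversion also fails."""
    if accepts(primary, size, t_pri):
        return primary
    # Replica diversion: choose the leaf-set node outside the k closest with the most free space.
    candidates = [n for n in leaf_set if n.node_id not in k_closest_ids]
    if not candidates:
        return None
    target = max(candidates, key=lambda n: n.free)
    return target if accepts(target, size, t_div) else None


def insert_with_file_diversion(try_insert, max_attempts=4):
    """File diversion: after a failed insert, retry under a new FileID from a fresh salt.
    Returns the index of the successful attempt, or None."""
    for attempt in range(max_attempts):
        if try_insert(os.urandom(20)):
            return attempt
    return None


leaf = [Node("3201", 400), Node("3123", 300), Node("3022", 150)]
k_closest = {"3130", "3201"}
kept = place_replica(4, Node("3130", 50), leaf, k_closest, t_pri=0.1, t_div=0.05)
diverted = place_replica(10, Node("3130", 50), leaf, k_closest, t_pri=0.1, t_div=0.05)
failed = place_replica(20, Node("3130", 50), leaf, k_closest, t_pri=0.1, t_div=0.05)
```

A stricter t_div than t_pri keeps diverted replicas small, so diversion does not consume the space a target needs for its own primary replicas.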
22. Caching in PAST
- Highly popular files might demand more replicas than specified
- Files located far away need to be fetched across the network only once; later requests are served locally
- Unused disk space is allocated as cache
- Caching performance degrades gradually with increased utilization
- The cache insertion policy is similar to the diversion policies
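Caching in unused disk space can be sketched as below. Assumptions: a file routed through the node is admitted only if it is smaller than a fraction c of the current cache size (mirroring the size-to-free-space test of the diversion policies); replacement uses GreedyDual-Size with a cost of 1 per file, a detail swapped in here rather than stated on the slide, which favors evicting large files; and `GDSCache` and its methods are hypothetical names.

```python
class GDSCache:
    """Cache in unused disk space with GreedyDual-Size replacement (cost 1 per file)."""

    def __init__(self, cache_size: int, c: float):
        self.cache_size = cache_size  # bytes of currently unused disk space
        self.c = c                    # admit files smaller than c * cache_size
        self.inflation = 0.0          # GreedyDual-Size "L" value
        self.entries = {}             # FileID -> (H value, size)
        self.used = 0

    def _evict_until(self, limit: int):
        while self.used > limit:
            victim = min(self.entries, key=lambda f: self.entries[f][0])
            self.inflation = self.entries[victim][0]
            self.used -= self.entries.pop(victim)[1]

    def offer(self, fid, size: int) -> bool:
        """Called for files routed through this node on insert or lookup."""
        if fid in self.entries or size >= self.c * self.cache_size:
            return False
        self._evict_until(self.cache_size - size)
        self.entries[fid] = (self.inflation + 1.0 / size, size)
        self.used += size
        return True

    def hit(self, fid) -> bool:
        if fid not in self.entries:
            return False
        _, size = self.entries[fid]
        self.entries[fid] = (self.inflation + 1.0 / size, size)  # refresh priority
        return True

    def shrink(self, new_size: int):
        """Replicas need disk space: the cache gives it up by evicting entries."""
        self.cache_size = new_size
        self._evict_until(new_size)


cache = GDSCache(cache_size=100, c=0.5)
cache.offer("too-big", 60)  # rejected: 60 >= 0.5 * 100
cache.offer(1, 40)
cache.offer(2, 10)
cache.offer(3, 30)
cache.offer(4, 30)          # needs room: evicts file 1, the largest (lowest H)
cache.shrink(45)            # new replicas claim disk space: evicts file 3
```

Shrinking the cache as replicas arrive is one way to realize the gradual degradation the slide describes: caching simply gets less space as utilization grows.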
23. PAST Performance: t_pri comparison, t_div = 0.05
24. PAST Performance: t_pri comparison, t_div = 0.05
25. PAST Performance: Ratio of File Diversions
26. PAST Performance: Ratio of Replica Diversions
27. PAST Performance: Failed Insertions
28. PAST Performance: Cache Hits
29. Conclusions
- Content-based routing improves the scalability of distributed storage systems
- Distributed systems need user authentication
- Caching is crucial for system performance
- Diversion allows for graceful performance degradation
- Still needed: file mutability and file search or indexing services