Large Scale Sharing - PowerPoint PPT Presentation

About This Presentation
Title:

Large Scale Sharing

Description:

Large Scale Sharing. Marco F. Duarte. COMP 520: Distributed Systems ... P2P sharing systems are very popular ... Features desired in a P2P sharing system ... – PowerPoint PPT presentation

Provided by: marcof
Learn more at: https://www.cs.rice.edu

Transcript and Presenter's Notes

Title: Large Scale Sharing


1
Large Scale Sharing
  • Marco F. Duarte
  • COMP 520 Distributed Systems
  • September 19, 2004

2
Introduction
  • P2P sharing systems are very popular
  • In P2P, all nodes have identical capabilities and
    responsibilities
  • Popular approaches are partially centralized, do
    not scale well, or do not provide desired
    anonymity
  • Scalability of systems critical
  • Need for decentralized, load-balancing
    architectures

3
Features desired in a P2P sharing system
  • Decentralized architecture: no single point of
    failure
  • Scalability: bandwidth and load balancing
  • Fault tolerance: content replication
  • Anonymity for users: posters, readers, storers
  • Resilience against DoS attacks

4
Freenet provides anonymity
  • No requester, provider information implicit in
    communication
  • Presence of a file in a node does not imply
    authorship
  • Popular files are replicated to improve locality
  • Does not intend to provide permanent storage

5
Freenet Queries
  • Files receive FileIDs (160-bit SHA-1 hash of
    file identifier)
  • Queries have pseudo-unique random identifiers
    (QueryIDs) and hops-to-live count.
  • Routing tables contain table of previously
    retrieved FileIDs and their locations
  • Queries are routed to the location with the
    closest FileID at each stage; loops are detected
    with the QueryID
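
The query steps above can be sketched roughly as follows. This is a minimal illustration, not Freenet's actual code: the function names (file_id, next_hop, forward) are hypothetical, the routing table is assumed to be a plain mapping from known FileIDs to node names, and "closest" is assumed to mean smallest numeric distance between FileIDs.

```python
import hashlib
from typing import Dict, Optional, Set


def file_id(identifier: str) -> int:
    """160-bit FileID: SHA-1 hash of the file identifier, as an integer."""
    digest = hashlib.sha1(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def next_hop(routing_table: Dict[int, str], target: int,
             tried: Set[str]) -> Optional[str]:
    """Pick the untried node whose known FileID is closest to the target.

    Assumption: closeness is the absolute numeric difference of FileIDs.
    """
    candidates = [(abs(fid - target), node)
                  for fid, node in routing_table.items()
                  if node not in tried]
    return min(candidates)[1] if candidates else None


def forward(query_id: int, target: int, hops_to_live: int,
            seen: Set[int], routing_table: Dict[int, str]) -> Optional[str]:
    """Return the node to forward to, or None if the query is dropped.

    A query is dropped when its QueryID was already seen (a loop) or when
    its hops-to-live count is exhausted.
    """
    if query_id in seen or hops_to_live <= 0:
        return None
    seen.add(query_id)
    return next_hop(routing_table, target, set())
```

A node that receives a failure from the chosen neighbor would add that neighbor to the tried set and call next_hop again, which is how the search backtracks to the next-closest entry.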

6
Freenet Queries: Lookups and Stores
  • Copies of the file are stored at all nodes along
    the return path
  • File record for a is added to routing tables
  • Writes perform a lookup, then insert the file
    along the path if no match is found

7
Freenet Properties
  • FileID-based clustering allows for improved
    routing as usage increases
  • LRU-like capacity management: rarely used files
    are purged from the system
  • Random nature of FileIDs allows for diversity of
    information at nodes
  • Attempts to supplant existing files will lead to
    real file propagation
  • Anonymity features:
      • File ownership assumed randomly by other nodes
      • Minimal routing information necessary at each
        hop
      • Hops-to-live count of 1 updated randomly
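
The LRU-like capacity management mentioned above can be illustrated with a small sketch. It assumes a per-node store bounded by a file count (a real node would more plausibly bound bytes), and the class name LRUStore is hypothetical.

```python
from collections import OrderedDict


class LRUStore:
    """Toy per-node datastore that evicts the least recently used file."""

    def __init__(self, capacity: int):
        self.capacity = capacity      # assumed limit: number of files
        self.files = OrderedDict()    # oldest entry first

    def get(self, fid):
        """Return the file for fid, marking it as recently used."""
        if fid not in self.files:
            return None
        self.files.move_to_end(fid)
        return self.files[fid]

    def put(self, fid, data):
        """Store a file and purge the least recently used ones if full."""
        self.files[fid] = data
        self.files.move_to_end(fid)
        while len(self.files) > self.capacity:
            self.files.popitem(last=False)
```

With this policy a file survives only as long as requests keep touching it, which matches the point that the system is not meant to provide permanent storage.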

8
Freenet Problems
  • Files that are stored in the network may not be
    found.
  • Freenet does not provide reliable storage
  • No notion of locality in routing
  • Simulations do not involve file insertion or node
    discovery

9
PAST: Reliable Distributed Storage
  • Customizable file persistence
  • High availability and load balancing
  • Efficient Routing and Storage Allocation
  • Uses FileIDs generated from hashes like in
    Freenet
  • Uses owner credentials to verify identity of
    authors
  • Interface: Insert, Lookup, Reclaim

10
PAST Architecture
  • FileID computed from hash of filename, owner's
    public key and a random salt.
  • Each node receives a pseudorandom NodeID,
    independent of the node properties.
  • Owner specifies number k of replicas of a file to
    store in the system on insert.
  • File is stored in the k nodes with NodeIDs
    closest to the FileID.
  • Routing provided by Pastry.
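
A rough sketch of the two placement rules above: deriving a FileID from the filename, the owner's public key and a salt, and choosing the k nodes whose NodeIDs are closest to it. Assumptions made here for illustration only: SHA-1 as the hash, FileIDs and NodeIDs living in the same circular ID space, and the helper names (past_file_id, ring_distance, replica_nodes).

```python
import hashlib
from typing import List


def past_file_id(filename: str, owner_public_key: bytes, salt: bytes) -> int:
    """FileID as a hash over filename, owner's public key and salt."""
    h = hashlib.sha1()                 # assumed hash function
    h.update(filename.encode("utf-8"))
    h.update(owner_public_key)
    h.update(salt)
    return int.from_bytes(h.digest(), "big")


def ring_distance(a: int, b: int, id_space: int) -> int:
    """Distance between two IDs on a circular ID space of the given size."""
    d = abs(a - b) % id_space
    return min(d, id_space - d)


def replica_nodes(fid: int, node_ids: List[int], k: int,
                  id_space: int) -> List[int]:
    """The k NodeIDs closest to the FileID, nearest first."""
    ordered = sorted(node_ids, key=lambda n: ring_distance(n, fid, id_space))
    return ordered[:k]
```

Because the salt feeds the hash, choosing a different salt gives the same file a different FileID and therefore a different set of storing nodes.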

11
Pastry: Routing for P2P Networks
  • Paths with fewer than ⌈log_{2^b} N⌉ hops
    (N nodes, NodeID digits in base 2^b)
  • Delivery guaranteed unless ⌊l/2⌋ or more nodes
    with adjacent NodeIDs fail simultaneously
  • Flexible proximity metric.
  • Each node contains:
      • Leaf set: l nodes with closest NodeIDs
      • Routing table: set of neighbors organized by
        NodeIDs
      • Neighborhood set: l closest nodes
  • Each NodeID is paired with its network address
  • Direct routes to neighbors and l closest NodeIDs

12
Pastry Example
  • Routing table organized by similarity to NodeID.
  • Neighborhood set used for node addition/recovery.
  • Queries are forwarded to a numerically closer
    node (by shared NodeID prefix, and NodeID
    proximity).
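
The forwarding rule above can be sketched as follows. This is a simplified illustration rather than the full Pastry algorithm: it treats everything a node knows (leaf set, routing table, neighborhood set) as one flat list, uses 4-digit base-4 NodeIDs like those in the examples, and the names shared_prefix_len, next_hop and route are hypothetical.

```python
from typing import Dict, List


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits two IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(current: str, key: str, known: List[str], base: int = 4) -> str:
    """Choose where to forward a message for key from the current node.

    Prefer a known node sharing a longer prefix with the key; otherwise
    one with an equally long prefix that is numerically closer; otherwise
    the current node keeps the message.
    """
    here = shared_prefix_len(current, key)

    def dist(node: str) -> int:
        return abs(int(node, base) - int(key, base))

    longer = [n for n in known if shared_prefix_len(n, key) > here]
    if longer:
        return min(longer, key=dist)
    closer = [n for n in known
              if shared_prefix_len(n, key) >= here and dist(n) < dist(current)]
    return min(closer, key=dist) if closer else current


def route(start: str, key: str, tables: Dict[str, List[str]],
          base: int = 4) -> List[str]:
    """Follow next_hop from start until no better node is known."""
    path = [start]
    while True:
        nxt = next_hop(path[-1], key, tables.get(path[-1], []), base)
        if nxt == path[-1]:
            return path
        path.append(nxt)
```

Each hop either lengthens the shared prefix or reduces the numeric distance to the key, so the loop in route always terminates.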

13
Pastry Routing Table
[Figure: diagram of NodeIDs 0231, 3321, 3133, 0302, 1033, 3013, 1123, 2300, 2121, 1202, 1311, 2031, with the Neighborhood Set and Leaf Set marked]

14
Pastry Routing Example
[Figure: the same NodeIDs with a query for 3133 being routed. Other nodes exist but are not shown.]

15
Pastry Node Insertion Example
[Figure: node 3130 joins the network of NodeIDs shown earlier; its Leaf Set and Neighborhood Set are marked]

16
Pastry Node Removal Example
[Figure: node removal, showing NodeIDs 3321, 3133, 3013]

17
PAST Insertions
  • fileID = Insert(name, owner-credentials, k, file)

[Figure: the owner inserts a file with FileID 3130; the file is inserted k times, and each storing node holds "3130: File, Certificate"]

18
PAST Insertions
  • fileID = Insert(name, owner-credentials, k, file)

[Figure: the storing nodes return k store receipts to the owner]

19
PAST Semantics
  • file = Lookup(fileID)
      • Routed toward the NodeID closest to FileID
      • First of k closest nodes found returns file,
        credentials
  • Reclaim(fileID, owner-credentials)
      • Same semantics as Insert
      • Owner issues Reclaim Certificate
      • Storing nodes issue Reclaim Receipt
  • Changes in leaf sets will trigger changes in
    replica locations
  • A new node creates pointers to files it should
    contain; migration is gradual
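
A toy, single-process model of the Insert, Lookup and Reclaim operations may make the semantics above easier to follow. It is only a sketch under strong assumptions: the class name ToyPast is hypothetical, owner credentials are reduced to a plain string compared for equality, receipts are simple tuples, and closeness is plain numeric distance with no routing or certificates.

```python
from typing import Dict, List, Optional, Tuple


class ToyPast:
    """In-memory stand-in for a set of PAST nodes, keyed by NodeID."""

    def __init__(self, node_ids: List[int]):
        self.stores: Dict[int, Dict[int, Tuple[str, bytes]]] = {
            n: {} for n in node_ids
        }

    def _closest(self, fid: int) -> List[int]:
        """All NodeIDs ordered by numeric distance to the FileID."""
        return sorted(self.stores, key=lambda n: abs(n - fid))

    def insert(self, fid: int, owner: str, k: int, data: bytes):
        """Store k replicas on the closest nodes; return k store receipts."""
        targets = self._closest(fid)[:k]
        for n in targets:
            self.stores[n][fid] = (owner, data)
        return [("store-receipt", n, fid) for n in targets]

    def lookup(self, fid: int) -> Optional[bytes]:
        """Return the file from the first close node that holds a replica."""
        for n in self._closest(fid):
            if fid in self.stores[n]:
                return self.stores[n][fid][1]
        return None

    def reclaim(self, fid: int, owner: str):
        """Release replicas held for this owner; return reclaim receipts."""
        receipts = []
        for n, files in self.stores.items():
            if fid in files and files[fid][0] == owner:
                del files[fid]
                receipts.append(("reclaim-receipt", n, fid))
        return receipts
```

Reclaim mirrors Insert here: the owner check stands in for verifying the reclaim certificate, and each storing node answers with its own receipt.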

20
Load Balancing in PAST: Replica Diversion
[Figure: leaf sets of 3201 and 3130]

21
Load Balancing in PAST: File Diversion
[Figure: leaf sets of 3201 and 3130]
  • Change the FileID by changing the salt
  • Policies for acceptance of replicas and diverted
    replicas, and selection of diverted replica node
  • Maximum ratio of file size to free space for
    insertion: t_pri (primary replicas), t_div
    (diverted replicas)

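
The acceptance policy can be sketched as a threshold test on the ratio of file size to free space, using the two thresholds named on the slide. The function names (accepts, place_replica) are hypothetical, and choosing the leaf-set node with the most free space as the diversion target is an assumption made for this sketch.

```python
from typing import Dict, Optional, Tuple


def accepts(file_size: int, free_space: int, threshold: float) -> bool:
    """A node accepts a replica if file_size / free_space <= threshold."""
    return free_space > 0 and file_size / free_space <= threshold


def place_replica(file_size: int, primary_free: int,
                  leaf_free: Dict[str, int],
                  t_pri: float, t_div: float) -> Tuple[str, Optional[str]]:
    """Decide where one replica goes.

    Returns ("primary", None) if the primary node accepts it under t_pri,
    ("diverted", node) if a leaf-set node accepts it under t_div, or
    ("rejected", None) if nobody does.
    """
    if accepts(file_size, primary_free, t_pri):
        return ("primary", None)
    # Assumption: try leaf-set nodes in order of decreasing free space.
    for node, free in sorted(leaf_free.items(), key=lambda kv: -kv[1]):
        if accepts(file_size, free, t_div):
            return ("diverted", node)
    return ("rejected", None)
```

A "rejected" result corresponds to the file diversion case: the owner would pick a new salt, obtain a new FileID, and retry the insert in a different part of the ID space.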
22
Caching in PAST
  • Highly popular files might demand more replicas
    than specified.
  • Files located far away only need to be fetched
    once locally
  • Unused disk space is allocated as cache.
  • Caching performance degrades gradually with
    increased utilization
  • Cache insertion policy similar to diversion
    policies.

23
PAST Performance: t_pri comparison, t_div = 0.05
24
PAST Performance: t_pri comparison, t_div = 0.05
25
PAST Performance: Ratio of File Diversions
26
PAST Performance: Ratio of Replica Diversions
27
PAST Performance: Failed Insertions
28
PAST Performance: Cache Hits
29
Conclusions
  • Content based routing improves scalability of
    distributed storage systems.
  • Need for user authentication in distributed
    systems.
  • Caching is crucial for system performance.
  • Diversion allows for graceful performance
    degradation.
  • Need file mutability, file search or indexing
    services