Title: PAST: A large-scale persistent peer-to-peer storage utility
1. PAST: A large-scale persistent peer-to-peer storage utility
- LECS Reading Group
- 10/23/2001
2. P2P in the Internet
- Napster: a peer-to-peer file sharing application
- allows Internet users to exchange files directly
- simple idea, hugely successful
- fastest growing Web application
- 50 million users in January 2001
- shut down in February 2001
- similar systems/startups followed in rapid succession: Gnutella, Scour, Freenet, Groove, Flycode, vTrails
3. Peer-to-peer computing
- Peer-to-peer systems
- distributed; nodes have identical capabilities and responsibilities; communication is symmetric
- Technical potential
- can harness huge amounts of resources
- user PCs: disk space, upstream bandwidth, CPU cycles
- without requiring expensive hardware, bandwidth, rack space
- completely distributed
- robust, less vulnerable to DoS attacks, harder to censor
Technical challenges: decentralized control, self-organization, adaptation and scalability!
4. Napster
(diagram: peer 128.1.2.3 registers (xyz.mp3, 128.1.2.3) with the central Napster server)
5. Napster
(diagram: a peer asks the central Napster server "xyz.mp3 ?" and receives the address 128.1.2.3)
6. Napster
(diagram: the requesting peer fetches xyz.mp3 directly from 128.1.2.3)
7. Gnutella
(diagram: overlay of peers with no central server)
8. Gnutella
(diagram: the query "xyz.mp3 ?" is forwarded from peer to peer)
9. Gnutella
(diagram: the query propagates through the overlay)
10. Gnutella
(diagram: xyz.mp3 is transferred directly from the peer that holds it)
11. Peer-to-peer file sharing
- Napster
- decentralized storage of actual content
- transfer content directly from one peer (client) to another
- centralized index and search
- simple, but O(N) state and a single point of failure
- Gnutella
- like a decentralized Napster
- distributed index and search
- robust, but worst case O(N) messages per lookup
Next-generation systems build on distributed indexing and lookup services
12. Large-scale storage management systems
- Distributed storage infrastructures
- PAST (Rice and Microsoft Research; routing substrate: Pastry)
- OceanStore (U.C. Berkeley; routing substrate: Tapestry)
- Publius (AT&T)
- Farsite (Microsoft Research)
- CFS (MIT; routing substrate: Chord)
- GRCD (UC Berkeley; builds on CAN)
- Goals
- continuous access to persistent information
- utility infrastructure that manages customer content
- resilience to DoS attacks, censorship, and other node failures
13. PAST
- Internet-based, peer-to-peer global storage utility
- Goals: strong persistence, high availability, scalability and security
- Overview
- PAST API for clients
- Pastry: peer-to-peer routing substrate
- storage management: store multiple replicas of files
- cache management: cache additional copies of popular files
14. PAST API for clients
- fileId = Insert(name, owner-credentials, k, file)
- stores the file at k distinct nodes in the PAST network
- fileId = SHA-1(name, owner-credentials, random number)
- file = Lookup(fileId)
- reliably retrieves a copy of the file, normally from a nearby node
- Reclaim(fileId, owner-credentials)
- reclaims the storage occupied by the k copies of the file identified by fileId
Archival storage and content distribution, not a general-purpose FS: no searching, directory lookup, or key distribution operations
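The fileId construction above can be sketched in a few lines. This is an illustrative Python sketch (PAST itself is implemented in Java), and the function name `make_file_id` is invented for the example:

```python
import hashlib
import os

def make_file_id(name, owner_credentials, salt=None):
    """Compute a 160-bit fileId as SHA-1 over the file name, the
    owner's credentials, and a random salt, mirroring
    fileId = SHA-1(name, owner-credentials, random number)."""
    if salt is None:
        salt = os.urandom(20)            # the "random number" component
    h = hashlib.sha1()
    h.update(name.encode())
    h.update(owner_credentials.encode())
    h.update(salt)
    return h.hexdigest()                 # 40 hex digits = 160 bits
```

The random salt makes repeated inserts of the same name land at different places in the id space, which is what file diversion (slide 32) relies on.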
15. PAST IDs
- File identifier: 160 bits; the 128 most significant bits form the keyId
- Node identifier: 128 bits
- Both are uniformly distributed
- Both lie in the same namespace
- How to map keyIds to nodeIds?
- Use Pastry
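The mapping Pastry provides can be illustrated as picking, for a given keyId, the k nodeIds numerically closest to it. A minimal Python sketch, assuming a circular 128-bit namespace and a hypothetical function name:

```python
def closest_nodes(key_id, node_ids, k):
    """Return the k nodeIds numerically closest to key_id in a
    circular 128-bit namespace (ties broken by nodeId value)."""
    ring = 1 << 128
    def circular_distance(n):
        d = abs(n - key_id)
        return min(d, ring - d)          # wrap around the ring
    return sorted(node_ids, key=lambda n: (circular_distance(n), n))[:k]
```

In the real system no node knows all nodeIds, of course; Pastry's routing (slide 25) converges on the same set without global knowledge.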
16. Pastry: peer-to-peer routing substrate
- Provides generic, scalable indexing, data location and routing for peer-to-peer applications
- Inspired by Plaxton's algorithm (used in web content distribution, e.g. Akamai) and Landmark hierarchy routing
- Goals
- efficiency
- scalability
- fault resilience
- self-organization (completely decentralized)
17.–21. Pastry: basic idea
(animated diagram: insert(K1,V1) is routed across the overlay to the node responsible for K1, which stores the pair (K1,V1); retrieve(K1) follows the same route to the value)
22. PAST/Pastry node id space
- 128 bits (max. 2^128 nodes)
- L levels, b = 128/L bits per level; a nodeId is a sequence of L base-2^b (b-bit) digits
- circular namespace, from 0 to 2^128 − 1
(diagram: node ids placed around a ring)
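The digit decomposition of a nodeId can be sketched directly from the definitions above; the function name is illustrative:

```python
def node_id_digits(node_id, b=2, bits=128):
    """Split a nodeId into L = bits/b base-2^b digits,
    most significant digit first (b = 2 gives base-4 digits)."""
    L = bits // b
    mask = (1 << b) - 1                  # selects one b-bit digit
    return [(node_id >> (b * (L - 1 - i))) & mask for i in range(L)]
```

These digits are exactly what the routing table on the next slide is indexed by: row n looks at digit n.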
23. State of a Pastry node
- Routing table
- entries consist of a nodeId and the IP address of that node
- ceil(log_{2^b} N) levels; each level corresponds to a row
- 2^b − 1 entries per level, i.e. columns per row
- each entry in row n corresponds to a node whose nodeId matches the local nodeId in the first n digits and differs in digit n+1
(example table for node 123: row 0 holds entries of the form X…, row 1 holds 1Y…, row 2 holds 12Z…, where X, Y, Z range over 0,…,2^b − 1, excluding the local node's own digit at that position)
24. State of a Pastry node
- Leaf set
- l nearby nodes based on proximity in nodeId space
- Neighborhood set
- l nearby nodes based on a network proximity metric
- not used for routing
- used during node addition/recovery
(example: 16-bit nodeId space, l = 8, b = 2; leaf-set entries shown around node 10233102)
25. Routing requests in Pastry
- Route(my-id, key-id, message)
- if key-id is in the range of my leaf set
- forward to the numerically closest node in the leaf set
- else
- forward to a node-id in the routing table such that node-id shares a longer prefix with key-id than my-id does
- else
- forward to a node-id that shares a prefix with key-id of the same length as my-id does, but is numerically closer
Routing takes O(log N) messages
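A single routing step from the pseudocode above can be sketched as follows. This is a simplified Python illustration with hypothetical names; it represents nodes as base-4 digit sequences (i.e. it assumes b = 2) and takes the leaf set and routing table as plain lists:

```python
def shared_prefix_len(a, b):
    """Number of leading digits two digit sequences share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(my_digits, key_digits, leaf_set, routing_table):
    """One Pastry routing step (sketch). leaf_set is a list of digit
    sequences; routing_table[row][col] is a digit sequence or None.
    Returns the node to forward to, or my_digits if this node is
    already the numerically closest."""
    def value(d):                        # numeric value, base 4 (b = 2)
        v = 0
        for digit in d:
            v = v * 4 + digit
        return v

    key = value(key_digits)
    # Case 1: key falls within the leaf set -> numerically closest node.
    candidates = leaf_set + [my_digits]
    lo = min(value(d) for d in candidates)
    hi = max(value(d) for d in candidates)
    if lo <= key <= hi:
        return min(candidates, key=lambda d: abs(value(d) - key))
    # Case 2: routing table entry sharing one more prefix digit.
    p = shared_prefix_len(my_digits, key_digits)
    entry = routing_table[p][key_digits[p]] if p < len(routing_table) else None
    if entry is not None:
        return entry
    # Case 3 (rare): any known node with an equally long prefix that is
    # numerically closer to the key than this node.
    known = leaf_set + [e for row in routing_table for e in row if e]
    closer = [d for d in known
              if shared_prefix_len(d, key_digits) >= p
              and abs(value(d) - key) < abs(value(my_digits) - key)]
    return min(closer, key=lambda d: abs(value(d) - key)) if closer else my_digits
```

Because each routing-table hop fixes at least one more digit of the key, the number of hops is bounded by the number of digits, giving the O(log N) claim.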
26. Node addition
- X: the joining node
- A: a node nearby X (network proximity)
- Z: the node with nodeId numerically closest to X
- Routing state of X
- leaf-set(X) = leaf-set(Z)
- neighborhood-set(X) = neighborhood-set(A)
- routing table of X, row i = routing table of Ni, row i, where Ni is the ith node encountered along the route from A to Z
- X notifies all nodes in leaf-set(X), which update their state
(diagram: A = 10 routes Lookup(216) via intermediate nodes 36 and 240 to Z = 210)
27. Node failures, recovery
- Rely on a soft-state protocol to deal with node failures
- Neighboring nodes in the nodeId space periodically
- exchange keepalive msgs
- nodes unresponsive for a period T are removed from leaf sets
- a recovering node contacts its last known leaf set, updates its own leaf set, and notifies the members of its presence
- Randomized routing to deal with malicious nodes that can cause repeated query failures
Pastry details buried in the Middleware 2001 paper
28. PAST storage management
- Goals
- high global storage utilization
- graceful degradation near maximal utilization
- Design goals
- local coordination among nodes
- fully integrate storage management with file insertion
- modest performance overheads
- Challenge
- balancing unused storage among nodes vs. the requirement to maintain copies of each file at the k nodes with nodeIds closest to the fileId
29. Storage load imbalance
- Causes
- storage capacity differences among individual PAST nodes
- high variance in the file size distribution
- statistical variation in fileId and nodeId assignments
- Impact
- not all of the k closest nodes can accommodate a file replica
- 3 solutions to deal with imbalances
30. (1) Per-node storage control
- No more than 2 orders of magnitude difference in storage capacity among individual nodes is assumed
- Advertised capacity controls admission of new nodes (compared to the average capacity)
- too large: split into multiple nodeIds
- too small: reject
31. (2) Replica diversion
- Necessary when a node A among the k closest (to the fileId) cannot accommodate the file copy locally
- GOAL: balance the unused storage space among the nodes in a leaf set
- Node A diverts the copy to a node B in its leaf set if
- B is not among the k closest
- B does not already have a diverted replica
- Replica diversion is controlled by 3 policies that avoid the performance penalty of unnecessary diversion
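The eligibility test for a diversion target can be sketched as follows. The "prefer the node with the most free space" tie-break is an illustrative choice, not stated on the slide, and the dict-based node representation is invented for the example:

```python
def choose_diversion_target(leaf_set, file_size, k_closest_ids):
    """Pick a leaf-set node B to hold a diverted replica (sketch).
    B must not be among the k nodes closest to the fileId, must not
    already hold a diverted replica, and must have enough free space.
    Among eligible nodes, prefer the one with the most free space."""
    eligible = [b for b in leaf_set
                if b['id'] not in k_closest_ids
                and not b['has_replica']
                and b['free'] >= file_size]
    return max(eligible, key=lambda b: b['free']) if eligible else None
```

If no leaf-set node qualifies, the insert fails at this level and file diversion (next slide) takes over.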
32. (3) File diversion
- Necessary when a file insert fails even with replica diversion
- GOAL: balance the unused storage space among different portions of the nodeId space in PAST
- the client generates a new fileId for the file and retries up to 3 times
- the application is notified after 4 successive file insert failures
- it can then retry with a smaller file size or a smaller k (number of replicas)
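The retry loop on this slide can be sketched as below; `try_insert` is a stand-in for the whole PAST insert path (including replica diversion), and the function name is invented:

```python
import hashlib
import os

def insert_with_diversion(name, credentials, k, file_bytes,
                          try_insert, max_retries=3):
    """File-diversion loop (sketch): on an insert failure, generate a
    fresh fileId by drawing a new random salt and retry, up to
    max_retries times; after that, report failure to the application.
    try_insert(file_id, k, file_bytes) -> bool stands in for the
    underlying PAST insert operation."""
    for attempt in range(1 + max_retries):      # 1 try + 3 retries
        salt = os.urandom(20)                   # new salt => new fileId
        h = hashlib.sha1(name.encode() + credentials.encode() + salt)
        file_id = h.hexdigest()
        if try_insert(file_id, k, file_bytes):
            return file_id
    return None  # caller may retry with a smaller file or smaller k
```

Because the fileId is re-randomized each time, each retry targets a different region of the nodeId space, which is exactly how file diversion spreads load.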
33. PAST cache management at nodes
- Why cache file copies?
- k replicas may not be enough for very popular files
- beneficial if there exists spatial locality among the clients of a particular file
- Goals
- minimize client access latencies (fetch distance, in terms of Pastry routing hops)
- maximize query throughput
- balance the query load in the system
34. Caching policies
- Insertion: a file routed through a node as part of an Insert or Lookup operation is cached if
- the file size is less than a fraction of the node's current cache size
- Replacement: GreedyDual-Size (GD-S) policy
- assign a weight H_d to each file d, inversely proportional to the file's size
- evict the file v with minimum weight H_v
- subtract H_v from the weights of all remaining cached files (enforces aging)
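The GD-S policy above can be sketched compactly. This sketch uses the standard inflation-value trick: instead of subtracting H_v from every remaining file on eviction, it raises a floor L by H_v, which is equivalent and O(1). The class name and per-file cost of 1 are illustrative choices:

```python
class GreedyDualSizeCache:
    """GreedyDual-Size replacement (sketch). Each cached file d gets
    weight H(d) = L + 1/size(d), i.e. inversely proportional to its
    size; on eviction the victim's weight becomes the new floor L,
    which ages all remaining files."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.L = 0.0                 # inflation value (aging floor)
        self.files = {}              # file_id -> (size, weight)

    def insert(self, file_id, size):
        if size > self.capacity:
            return False             # never cache oversized files
        while self.used + size > self.capacity:
            victim = min(self.files, key=lambda f: self.files[f][1])
            vsize, vweight = self.files.pop(victim)
            self.used -= vsize
            self.L = vweight         # same effect as subtracting H_v
        self.files[file_id] = (size, self.L + 1.0 / size)
        self.used += size
        return True

    def __contains__(self, file_id):
        return file_id in self.files
```

With cost fixed at 1, small files get high weights and survive longer, which matches the slide's bias toward caching many small popular files.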
35. Evaluation
- PAST implemented in Java
- network emulation within a single Java VM
- 2 workloads (based on NLANR traces) for file sizes
- 4 normal distributions of node storage sizes
36. Key results
- STORAGE
- replica and file diversion improved global storage utilization from 60.8% to 98% compared to no diversion
- insertion failures drop sharply
- caveat: storage capacities used in the experiments are 1000x below what might be expected in practice
- CACHING
- routing hops with caching are lower than without caching, even at 99% storage utilization
- caveat: median file sizes are very low; caching performance will likely degrade if this is higher
37. Questions
- Is Pastry really self-organizing?
- IP-multicast-based expanding ring search etc. is not viable
- getting the nearest network node externally for node joins/additions: how will you do this in practice?
- Is strong persistence overkill?
- makes the system needlessly complicated (especially w.r.t. replica maintenance and diversion policies)
- k < the number of replicas anyway
- How do caches purge copies of Reclaimed files?
- How to deal with arbitrarily large files?
- Isn't CFS's block-based storage scheme much better in this case?