Title: Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility
Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility
- Presented by Deniz Hastorun
Outline
- PAST motivation and goals
- PAST API
- Security
- Storage management
- File and replica diversion
- Replica management
- Caching
- Performance
- Conclusion
PAST
- Internet-based p2p global storage and content distribution utility
- Based on a self-organizing overlay network of storage nodes
- Cooperative routing, replication of files, and caching of popular files
- File storage rather than block storage
- Built on top of Pastry
Motivation and Goals
- Current p2p application research is directed more towards constructing applications and understanding the related issues
- Goals:
- Strong persistence, by providing persistent storage for replicated read-only files
- High availability, through replication and caching
- Scalability, by obtaining high storage utilization via local cooperation
- A secure system
- Not intended as a general-purpose file system: no searching, directory lookup, or key distribution operations
PAST Nodes
- The multitude and diversity of nodes in the Internet are exploited
- The collection of PAST nodes forms an overlay network
- Minimally, a PAST node is an access point
- Optionally, it contributes to storage and participates in the routing
- PAST permits nodes to jointly store and publish content exceeding the capacity or bandwidth of any single node
PAST API
- nodeID: a 128-bit node identifier in a circular namespace
- Computed as the SHA-1 hash of the node's public key, giving a quasi-random assignment of nodeIDs
- No correlation between a nodeID and the node's geographic location
- fileID: a 160-bit file identifier
- Computed as the SHA-1 hash of the file name, the owner's public key, and a salt
- Root node: the node whose nodeID is numerically closest to the fileID
- nodeIDs and fileIDs are uniformly distributed over the keyspace
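As a rough illustration of this ID scheme, the Python sketch below derives both identifiers with SHA-1; the function names and the exact byte concatenation are assumptions for illustration, not the paper's wire format.

    import hashlib

    def node_id(public_key: bytes) -> int:
        # nodeID: SHA-1 of the node's public key, using 128 bits of the digest
        digest = hashlib.sha1(public_key).digest()
        return int.from_bytes(digest[:16], "big")

    def file_id(filename: str, owner_public_key: bytes, salt: bytes) -> int:
        # fileID: 160-bit SHA-1 over the file name, owner's public key, and salt
        digest = hashlib.sha1(filename.encode() + owner_public_key + salt).digest()
        return int.from_bytes(digest, "big")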
PAST API (2)
- fileId = Insert(filename, credentials, k, file)
- Inserts k copies of the file into the network, or fails
- The fileId, computed from (filename, credentials, salt), is returned as the result
- Successful if store receipts are received from all k nodes
- file = Lookup(fileId)
- Returns a copy of the file if it exists and at least one of the k storing nodes is accessible
- Reclaim(fileId, credentials)
- Accepted only if requested by the file's owner
- Allows, but does not require, storage reclamation
- Afterwards, a lookup of the file is no longer guaranteed to succeed
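A minimal Python sketch of the three exported operations, assuming a hypothetical PastClient class; certificates, smartcard signing, and error handling are omitted.

    class PastClient:
        def insert(self, filename, credentials, k, data):
            """Store k replicas; returns the fileId, or raises on failure.
            Succeeds only if store receipts arrive from all k nodes."""
            ...

        def lookup(self, file_id):
            """Return a copy of the file if one of the k storing
            nodes is reachable, else None."""
            ...

        def reclaim(self, file_id, credentials):
            """Owner-only; permits (but does not force) reclamation.
            Afterwards lookup(file_id) is no longer guaranteed."""
            ...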
Pastry
- Routing: a message is routed to the live node with the nodeID numerically closest to the fileID in less than O(log_{2^b} N) hops (b = 4)
- A file is stored on k nodes, those whose nodeIDs are closest to the 128 most significant bits of the fileID
- Routing table: the nth row holds the IP addresses of 2^b - 1 nodes whose nodeIDs share the first n digits with the current node's but differ in the next digit
- Leaf set: the l/2 nodes with the next larger and the l/2 nodes with the next smaller nodeIDs relative to the current node
- Neighborhood set: the l nodes closest to the current node by the proximity metric (used in recovery updates)
- Each hop forwards the message to a node whose nodeID shares one more prefix digit with the destination
- Routing table entries are updated lazily
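The per-hop prefix match behind the O(log_{2^b} N) bound can be sketched as below; IDs are hex digit strings (b = 4), and the routing-table layout (a list of rows, each a dict keyed by the next digit) is a simplified assumption.

    def shared_prefix_len(a: str, b: str) -> int:
        # Number of leading hex digits two IDs have in common
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    def next_hop(routing_table, local_id: str, key: str):
        # Row p holds nodes sharing p digits with us; pick the entry whose
        # (p+1)th digit matches the key, i.e. one more digit in common
        p = shared_prefix_len(local_id, key)
        if p == len(key):
            return None  # we are already the destination
        return routing_table[p].get(key[p])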
PAST Operations
- File Insertion
- Lookup
- Reclaim
File Insertion
- The client computes a fileID and a storage certificate
- The certificate contains: the fileID, a SHA-1 hash of the content, k, the salt, the creation date, and optional file metadata (see the sketch below)
- k x filesize is debited from the client's storage quota
- The file and storage certificate are routed via Pastry towards the fileID
- The destination node verifies the integrity of the file, stores it, and asks the k-1 nodes with the next closest nodeIDs to store the file
- These k-1 nodes are in its leaf set (k - 1 < l)
- Upon acceptance, the nodes return an ack with k signed store receipts, or an appropriate error, to the client
- Files are immutable
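A hedged sketch of the client-side certificate and quota step; the dict field names are illustrative assumptions (in PAST the certificate would be signed by the client's smartcard).

    import hashlib, time

    def make_storage_certificate(file_id, content, k, salt, metadata=None):
        # Fields from the slide: fileID, SHA-1 of the content, k, salt,
        # creation date, optional metadata; signed by the smartcard in PAST
        return {
            "file_id": file_id,
            "content_hash": hashlib.sha1(content).hexdigest(),
            "replicas": k,
            "salt": salt,
            "created": time.time(),
            "metadata": metadata,
        }

    def debit_quota(quota: int, k: int, content: bytes) -> int:
        # k copies are charged against the owner's storage quota
        return quota - k * len(content)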
Lookup and Reclamation
- Lookup: the client sends a request message using the requested fileID as the destination
- Pastry ensures a replica is found, since the lookup is routed to the closest nodeID and replicas are stored on the k nodes with adjacent nodeIDs
- Reclamation is analogous to insertion
- The client generates a reclaim certificate
- The certificate is routed to the fileID via Pastry
- The storing nodes verify the certificate and issue reclaim receipts
- The client reclaims the credit against the user's storage quota
Security
- Smartcard-based security model
- Each PAST node and each user of the system holds a smartcard
- A private/public key pair is associated with each card
- Smartcards generate and verify certificates and maintain storage quotas
- They ensure the integrity of the nodeID and fileID assignments
- Store receipts prevent nodes from creating fewer than k replicas
- File certificates allow the integrity and authenticity of stored content to be verified
- File and reclaim certificates help enforce client storage quotas
- Data is not stored encrypted by PAST
Storage Management
- Goals: high global storage utilization and graceful degradation as maximum utilization is reached
- Rely on local coordination among nodes with adjacent nodeIDs
- Fully integrate file insertion with storage management
- Incur only modest performance overhead
Storage Management (2)
- Handles the case where the k closest nodes cannot store a replica
- Caused by storage imbalance among nodes, for these reasons:
- Statistical variation in the assignment of nodeIDs and fileIDs, so some nodes store more than others
- High variance in the inserted file size distribution
- Differences in the storage capacity of PAST nodes
- Assumption: capacities differ by no more than 2 orders of magnitude
Storage Management Techniques
- Replica diversion: used when one of the k closest nodes is overloaded
- Balances differences in the storage capacity and utilization of nodes within a leaf set
- Node A diverts a copy to a node B in its leaf set if B is not among the k closest and does not already hold a diverted replica
- A enters a pointer to the copy at B into its own table and issues a store receipt, as if it stored the replica itself
- A also installs a pointer on the (k+1)th closest node C (see the sketch after this list)
- If B fails, a replacement replica is created
- If C fails, A installs another pointer on the current (k+1)th node
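The pointer bookkeeping might look roughly like the Python sketch below; the node attributes (free_space, pointers, has_diverted) are hypothetical names, and failure handling is left out.

    def divert_replica(A, file_id, leaf_set, k_closest, node_k_plus_1):
        # Candidate B: in A's leaf set, not among the k closest nodes,
        # and not already holding a diverted replica of this file
        candidates = [n for n in leaf_set
                      if n not in k_closest and not n.has_diverted(file_id)]
        if not candidates:
            return None                        # fall back to file diversion
        B = max(candidates, key=lambda n: n.free_space)
        A.pointers[file_id] = B                # A keeps a pointer, issues receipt
        node_k_plus_1.pointers[file_id] = B    # (k+1)th closest node C, too
        return B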
Storage Management (3)
- File diversion
- Balances the remaining free storage space among different portions of the nodeID space
- Triggered by a negative ack during file insertion
- The client generates a new fileID using a different salt value and retries the insert operation
- This process is repeated up to 3 times
- If the insert still fails, it is aborted and an error is returned to the caller (sketched below)
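The retry loop can be sketched as follows, reusing the file_id helper from the earlier ID sketch; insert_at and credentials.public_key are assumed, illustrative names.

    import os

    def insert_with_file_diversion(client, filename, credentials, k, data):
        for _ in range(3):                     # at most 3 retries, per the slide
            salt = os.urandom(20)              # fresh salt -> new fileID, hence
            fid = file_id(filename, credentials.public_key, salt)  # new region
            if client.insert_at(fid, credentials, k, data):
                return fid
        raise IOError("insert failed after 3 attempts with different salts")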
Storage Policies
- How does a node choose to accept or reject a replica?
- It computes sizeof(file) / sizeof(free space)
- and compares the ratio to t_pri or t_div, depending on the node's role (primary store vs. diverted store), where t_pri > t_div
- A node accepts all but oversized files as long as its utilization is low
- This prevents unnecessary diversion
- Large files are discriminated against: as free space shrinks, the largest acceptable file size decreases (see the sketch below)
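A compact sketch of the threshold test under these rules; the numeric values are the ones used later in the experiments (t_pri = 0.1, t_div = 0.05) and are tunable.

    T_PRI, T_DIV = 0.1, 0.05   # primary threshold > diverted threshold

    def accepts(file_size: int, free_space: int, primary: bool) -> bool:
        # Reject when the file would consume too large a fraction of the
        # node's remaining space; diverted replicas face the stricter bound,
        # which discriminates against large files as free space shrinks.
        if free_space <= 0:
            return False
        return file_size / free_space < (T_PRI if primary else T_DIV)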
Maintaining Replicas
- Node joins and departures cause leaf set adjustments
- On a node failure, the failed node is removed from the leaf sets of l nodes, and the live node with the next closest nodeID is included
- On a node join, the joining node is included and one node is dropped from the affected leaf sets
- A newly responsible node must copy the files it now covers
- It either acquires a replica copy immediately,
- or installs a reference pointer to the old owner, with gradual migration of the copies
- Diverted replicas:
- The target of a diversion may move out of the leaf set
- The node storing the replica and the node referencing it may then no longer share a leaf set, so they must exchange keepalive messages themselves
- Such replicas should be gradually relocated to a node in the same leaf set as the referring node
Maintaining Replicas (2)
- A node failure may cause a storage shortage: no node in the leaf set can take over ownership of the failed node's replicas
- To maintain the storage invariant, the two most distant nodes in the leaf set are asked to locate storage in their own leaf sets
- This increases the search space to 2l nodes
- If no storage space is found, the operation fails
- The number of replicas then drops below k until space becomes available
Caching
- Goals: minimize client access latency, maximize query throughput, and balance the query load
- k replicas are maintained for high availability
- Pastry routes a client lookup request to the replica closest to the client
- For popular files, cached copies are stored to improve performance
- Cached copies are inserted at nodes along the route between the client and the fileID
- Insertion policy: cache a file only if its size is less than a fraction c of the node's current cache size
- Cache replacement is based on the GreedyDual-Size (GD-S) policy
- A weight H = cost(file) / size(file) is maintained per file
- Eviction: pick the file with the minimum weight H, then subtract that weight from the weights of all remaining files (sketched below)
- With cost(file) = 1, the cache hit rate is maximized
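A minimal GD-S sketch following the slide's description (weight H = cost/size, evict the minimum, subtract its weight from the rest); production implementations usually track a global inflation value instead of rescanning all entries, but the direct form mirrors the slide.

    class GDSCache:
        def __init__(self, capacity: int):
            self.capacity = capacity
            self.used = 0
            self.files = {}                    # file_id -> [weight H, size]

        def insert(self, file_id, size: int, cost: float = 1.0):
            if size > self.capacity:
                return                         # oversized files are not cached
            while self.used + size > self.capacity:
                # Evict the file with the minimum weight H ...
                victim = min(self.files, key=lambda f: self.files[f][0])
                h_min, v_size = self.files.pop(victim)
                self.used -= v_size
                # ... and subtract its weight from all remaining files
                for entry in self.files.values():
                    entry[0] -= h_min
            self.files[file_id] = [cost / size, size]
            self.used += size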
Experimental Results
- Web proxy and file system workloads were used to evaluate storage management and caching
- First set of experiments: no diversion (t_pri = 1, t_div = 0, abort after the first insert failure)
- 51.1% of file insertions failed
- Global storage utilization reached only 60.8%
- These results make obvious the need for storage management in a system like PAST
Experimental Results (2)
- With diversion: t_pri = 0.1, t_div = 0.05, l = 16 or 32
- With l = 16, utilization rises to >94%; with l = 32, to >98%
- A larger leaf set increases the scope for load balancing
- No further improvement beyond l = 32, while node arrival and departure costs increase
Experiments (3)
As t_pri is increased, fewer file insertions succeed but storage utilization is higher.
Experiments (4)
As t_div is increased, fewer file insertions succeed but storage utilization is higher.
Impact of File and Replica Diversion
- File diversion is negligible while storage utilization stays below 83%
- The number of diverted replicas remains small even at high utilization (about 10% at 80% utilization)
- The overhead imposed is moderate as long as utilization remains below 95%
Impact of File Size
- Up to 80% utilization, no file smaller than 0.5 MB is dropped (large files can find adequate resources)
Impact of Caching
- 8 combined NLANR traces were used
- Requests from clients in each trace are mapped to nearby PAST nodes
- With caching disabled, the number of routing hops stays constant up to 70% utilization, then begins to rise
- At low utilization, files are cached in the network close to where they are requested
- When the global cache hit ratio drops due to high utilization, the average number of routing hops increases
- Caching still performs better than no caching, even at 99% utilization
Conclusion
- The storage management ideas are effective, but more work is required to make the system operational
- Support for a directory service and key lookup is needed
- A third-party evaluation of the system is needed
- More experiments are needed to compare how PAST's caching performs against other systems
- Also needed: experiments on file retrieval and reclaim performance, the number of hops required for file insertion, the overhead due to diversion, overlay routing overhead, the effort to cache, etc.
- Possible improvements:
- Avoiding or reducing replica diversion: nodes could simply forward to the next node, or store statistics in the routing table
- A directory service to improve retrieval performance
- Centralized yet distributed storage quota management mechanisms; "master nodes" or distributed proxies are possible solutions