Storage Management and Caching in PAST, a large-scale, persistent peer-to-peer storage utility

Authors: Antony Rowstron (Microsoft Research), Peter Druschel (Rice University)

1
Storage Management and Caching in PAST, a
large-scale, persistent peer-to-peer storage
utility
  • Authors
  • Antony Rowstron (Microsoft Research)
  • Peter Druschel (Rice Univ)
  • Presented by
  • Rama Alebouyeh

2
Outline
  • Goals
  • PAST
  • Security
  • Storage management
  • Caching
  • Experimental results
  • Notes

3
Goals
  • Goals
  • Strong persistence by providing persistent
    storage for replicated read-only files
  • High availability through replication and caching
  • Scalability by obtaining high storage utilization
    via local cooperation
  • Security by using smart cards and store receipts
  • PAST is an archival storage and content
    distribution utility
  • PAST is not a replacement for traditional file
    systems, but it assumes that a traditional file
    system can be used as a local cache for PAST.

4
PAST overview
  • PAST is built on PASTRY
  • fileId: 160 bits
  • nodeId: 128 bits
  • fileIds and nodeIds are uniformly distributed in
    their respective domains.
  • fileId is computed as a secure hash (SHA-1) of
    the file's name, the owner's public key, and a
    salt.
  • A file is stored on the k PAST nodes whose nodeIds
    are numerically closest to the 128 most
    significant bits (msb) of the fileId.
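
The fileId computation and replica placement described on this slide can be sketched as follows. This is a minimal illustration, not the PAST implementation: the function names, the byte encoding, the concatenation order of the hash inputs, and the use of circular distance in the 128-bit nodeId space are all assumptions.

```python
import hashlib

NODE_ID_BITS = 128
NODE_ID_SPACE = 1 << NODE_ID_BITS


def compute_file_id(name: str, owner_public_key: bytes, salt: bytes) -> int:
    """Return a 160-bit fileId as an integer.

    Assumption: the three inputs are simply concatenated before hashing.
    """
    digest = hashlib.sha1(name.encode("utf-8") + owner_public_key + salt).digest()
    return int.from_bytes(digest, "big")


def replica_nodes(file_id: int, node_ids: list[int], k: int) -> list[int]:
    """Return the k nodeIds numerically closest to the 128 msb of file_id."""
    key = file_id >> (160 - NODE_ID_BITS)  # keep the 128 most significant bits

    def distance(node_id: int) -> int:
        # Assumption: distance is measured on a circular nodeId space.
        d = abs(node_id - key)
        return min(d, NODE_ID_SPACE - d)

    return sorted(node_ids, key=distance)[:k]
```

Because the salt is one of the hash inputs, choosing a different salt maps the same file to a different region of the nodeId space.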

5
PAST operations
  • fileId = Insert(name, owner-credentials, k, file)
  • k is the user-specified number of file replicas
  • k replicas are maintained over the lifetime of the
    file
  • file = Lookup(fileId)
  • The client must provide the fileId
  • The file is retrieved from the live node closest
    to the client
  • Reclaim(fileId, owner-credentials)
  • Does not guarantee deletion of all replicas
  • After a reclaim, Lookup is no longer guaranteed to
    return the file

6
PAST operations (2)
  • Insert
  • A file certificate is issued and signed with the
    owner's private key.
  • The file certificate contains the fileId, SHA-1 of
    the file content, k, salt, date, and file metadata.
  • The file and its associated certificate are routed
    to the node with the nodeId closest to the 128 msb
    of the fileId.
  • On success, a store receipt is sent back to
    the client; otherwise an error is reported
    to the client
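
As a rough illustration of the certificate fields listed above, the sketch below builds a certificate record and checks received content against the stored SHA-1 hash. The class and function names are hypothetical, and the signature with the owner's private key is deliberately left out, so this covers only the content-integrity part of the check.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileCertificate:
    # Fields follow the slide; the types are assumptions.
    file_id: int
    content_hash: bytes  # SHA-1 of the file content
    k: int               # requested number of replicas
    salt: bytes
    date: str
    metadata: dict = field(default_factory=dict)


def issue_certificate(file_id, content, k, salt, date, metadata=None):
    """Build an (unsigned) certificate for the given file content."""
    return FileCertificate(file_id, hashlib.sha1(content).digest(),
                           k, salt, date, metadata or {})


def content_matches(cert: FileCertificate, content: bytes) -> bool:
    """Check that received content hashes to the value in the certificate."""
    return hashlib.sha1(content).digest() == cert.content_hash
```

A storing node could run `content_matches` before issuing a store receipt, so that a file corrupted in transit is rejected.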

7
PAST operations (3)
  • Lookup
  • Sends a request message with fileId as the
    destination
  • As soon as the request reaches a node that has the
    file, that node returns the file and its
    certificate and stops forwarding the request.
  • Reclaim
  • Analogous to Insert
  • Client issues a reclaim certificate

8
PAST Security
  • PAST provides security by
  • Smart cards (node and user)
  • File and reclaim certificates
  • Store and reclaim receipts
  • Randomized PASTRY routing scheme
  • Routing table entries signed by associated nodes

9
Storage management
  • The goal is to achieve high global storage
    utilization and graceful degradation as the system
    approaches its maximum utilization.
  • The responsibilities of storage management are
    to
  • Balance the remaining free space among nodes as
    utilization approaches its maximum.
  • Maintain the invariant that copies of each file
    are kept by the k nodes with nodeIds closest
    to the fileId
  • It relies on local coordination among nodes

10
Replica diversion
  • If a node A cannot store a replica, it chooses
    a node B in its leaf set to hold the diverted
    replica
  • B must not be among the k closest nodes
  • B must not already hold a diverted replica of the
    file
  • A keeps a pointer to B in its table and issues a
    store receipt
  • A also enters a pointer on the (k+1)th closest
    node, C
  • If B fails, a replacement replica is created
  • If C fails, A installs another pointer on the
    current (k+1)th closest node
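
The choice of node B can be sketched as a filter over A's leaf set. The two exclusion rules come from the slide. Preferring the candidate with the most free space and requiring that the file simply fits are assumptions made for this illustration, and the `Node` type is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    node_id: int
    free_space: int
    diverted: set = field(default_factory=set)  # fileIds of diverted replicas held


def choose_diversion_target(file_id: int, file_size: int,
                            leaf_set: list, k_closest_ids: set) -> Optional[Node]:
    """Pick a node B from the leaf set to hold a diverted replica, or None."""
    candidates = [
        n for n in leaf_set
        if n.node_id not in k_closest_ids   # B is not among the k closest nodes
        and file_id not in n.diverted       # B holds no diverted replica of this file
        and n.free_space >= file_size       # assumption: the file must simply fit
    ]
    if not candidates:
        return None
    # Assumption: prefer the candidate with the most remaining free space.
    return max(candidates, key=lambda n: n.free_space)
```

When no candidate qualifies, the function returns `None`, which corresponds to the case where the insert is rejected and file diversion takes over.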

11
File diversion
  • The goal is to balance the remaining free storage
    space among different portions of the nodeId space
  • When a client receives a NACK in response to an
    Insert operation, it
  • Creates another fileId with a different salt
  • Retries the Insert operation
  • Retries up to three times
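
The client-side retry loop for file diversion might look like the sketch below. `try_insert` is a hypothetical callback that returns `True` when a store receipt comes back and `False` on a NACK. The random salt, its length, and the hash input order are assumptions.

```python
import hashlib
import os


def insert_with_file_diversion(name: str, owner_public_key: bytes, try_insert,
                               max_retries: int = 3, salt_bytes: int = 8):
    """Attempt an insert, re-salting the fileId after each NACK.

    Returns the fileId (raw SHA-1 digest) that was accepted, or None if the
    initial attempt and all retries were rejected.
    """
    for _ in range(1 + max_retries):          # one initial try plus the retries
        salt = os.urandom(salt_bytes)         # a new salt yields a new fileId
        file_id = hashlib.sha1(name.encode("utf-8") + owner_public_key + salt).digest()
        if try_insert(file_id):
            return file_id
    return None
```

Each new salt moves the file to a different part of the nodeId space, which is how a retry can reach nodes that still have free storage.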

12
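Storage management policy
  • File acceptance policy
  • Node N accepts file D if SD / FN < t
  • SD: size of file D
  • FN: free storage space on node N
  • tpri: threshold t for the k nodes closest to the
    fileId (primary replicas)
  • tdiv: threshold t for nodes that are not among
    the k closest (diverted replicas)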
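
The acceptance test is a one-line ratio check. The sketch below is an illustration with hypothetical names; treating a node with no free space as always rejecting is an assumption. The threshold values in the usage note are the ones reported on the experimental results slides.

```python
def accepts(file_size: float, free_space: float, threshold: float) -> bool:
    """Return True if a node with free_space accepts a file of file_size.

    threshold is tpri for a primary replica and tdiv for a diverted replica.
    """
    if free_space <= 0:
        return False  # assumption: a full node rejects everything
    return file_size / free_space < threshold
```

With tpri = 0.1 and tdiv = 0.05, a node with 100 units of free space accepts an 8-unit file as a primary replica but rejects it as a diverted replica, so diverted replicas face the stricter test.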

13
Maintaining replicas
  • Nodes track their neighbors through the periodic
    keep-alive messages exchanged within the PASTRY
    leaf set
  • When a node joins or comes back online, it first
    installs pointers to the existing replicas of the
    files it is responsible for and then gradually
    transfers the files
  • Nodes also exchange explicit keep-alive messages
    with the nodes that hold their replicas
  • At high utilization, a node may ask the two
    most distant nodes in its leaf set to locate a
    node in their own leaf sets that can store the
    file.
  • At high utilization, the number of replicas may
    drop below k

14
Caching
  • Goal is to minimize client access latency,
    maximize query throughput, and balance the query
    load in the system
  • The unused portion of the advertised storage is
    used as a cache
  • Cached files can be evicted at any time
  • A file is cached when it is routed through a node
    as part of a lookup or insert, provided that
  • its size is smaller than a fraction (c) of the
    node's current cache size
  • Cache replacement policy is GreedyDual-Size
    (GD-S)
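
GreedyDual-Size gives each cached file a weight H = cost / size and evicts the file with the smallest H. The sketch below is a minimal illustration rather than the PAST code: it assumes a cost of 1 for every file, treats `capacity` as the node's current cache size, and uses the common "inflation value" formulation instead of subtracting the evicted weight from every remaining entry.

```python
class GreedyDualSizeCache:
    def __init__(self, capacity: int, c: float = 1.0):
        if not 0 < c <= 1:
            raise ValueError("c must be in (0, 1]")
        self.capacity = capacity   # current cache size
        self.c = c                 # admission fraction from the caching policy
        self.used = 0
        self.inflation = 0.0       # weight of the most recently evicted file
        self.entries = {}          # file_id -> (size, H)

    def access(self, file_id, size: int) -> bool:
        """Record a file passing through the node. Returns True on a cache hit."""
        if file_id in self.entries:
            stored_size, _ = self.entries[file_id]
            # A hit refreshes the weight relative to the current inflation.
            self.entries[file_id] = (stored_size, self.inflation + 1.0 / stored_size)
            return True
        if size >= self.c * self.capacity:
            return False           # too large to be cached at all
        while self.used + size > self.capacity:
            victim = min(self.entries, key=lambda f: self.entries[f][1])
            victim_size, victim_h = self.entries.pop(victim)
            self.used -= victim_size
            self.inflation = victim_h
        self.entries[file_id] = (size, self.inflation + 1.0 / size)
        self.used += size
        return False
```

With a uniform cost, large files get small weights and are evicted first, which favors keeping many small files in the cache.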

15
Experimental Results
  • Two data sets: one from 8 web proxy logs, another
    from a file system
  • k = 5, b = 4 (PASTRY), N = 2250
  • First experiment with no diversion
  • tpri = 1, tdiv = 0: 51.1% of file insertions failed
  • Global storage utilization only 60.8%
  • The results demonstrate the need for storage
    management in a system like PAST

16
Experimental Results
  • tpri = 0.1, tdiv = 0.05, l = 16 or 32
  • l = 16: utilization > 94%
  • l = 32: utilization > 98%
  • Larger leaf set increases the scope for load
    balancing
  • Larger l increases cost of node
    arrivals/departures

17
Experimental Results
  • Varying tpri
  • The lower the value of tpri, the less likely it is
    that a large file can be stored on a node
  • Many small files can be stored, so the number
    of files stored increases as tpri decreases
  • Utilization drops because large files are
    rejected at low utilization levels

18
Experimental Results
  • Varying tdiv
  • As tdiv increases, fewer file insertions succeed,
    but storage utilization is higher

19
Impact of File and Replica Diversion
  • File diversion is negligible while storage
    utilization is below 83%
  • The number of diverted replicas remains small even
    at high utilization: 10% at 80% utilization

20
Impact of caching
21
Notes
  • Key lookup and directory search are needed
  • Immutable file property and lack of directory
    search limit the applications of PAST
  • The effect of file reclaim on performance is not
    measured