A Locality Preserving Decentralized File System

Transcript and Presenter's Notes

Title: A Locality Preserving Decentralized File System


1
A Locality Preserving Decentralized File System
  • Jeffrey Pang
  • Haifeng Yu
  • Phil Gibbons
  • Michael Kaminsky
  • Srini Seshan

2
Project Intro
  • Defragmenting the DHT data layout for:
  • Improved availability for entire tasks
  • Amortized data lookup latency

[Figure: current DHT data layout (random placement) vs. defragmented DHT data layout (sequential placement)]
  • Typical Task/Operation Sizes
  • 30-65% access >10 8KB blocks
  • 8-30% access >100 8KB blocks

3
Background
  • EXISTING DHT STORAGE SYSTEMS
  • Each server is responsible for a pseudo-random range of the ID space
  • Objects are given pseudo-random IDs (see the sketch below)

[Figure: example ID space — objects with IDs 324, 987, and 160 stored on servers responsible for the ranges 150-210, 211-400, 401-513, and 800-999]
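
To make the pseudo-random layout concrete, here is a minimal sketch (my own illustration, not the presenters' code) of how an object's SHA-1-derived ID selects the server whose range covers it:

    import hashlib
    from bisect import bisect_right

    def object_id(name: bytes) -> int:
        # Pseudo-random object ID: SHA-1 hash of the object's name (or content).
        return int.from_bytes(hashlib.sha1(name).digest(), "big")

    def responsible_server(obj_id: int, range_starts: list[int]) -> int:
        # range_starts holds each server's first key, sorted; server i owns
        # [range_starts[i], range_starts[i+1]).  IDs below the first start
        # wrap around to the last server, as on a ring.
        return (bisect_right(range_starts, obj_id) - 1) % len(range_starts)

    # Toy 1000-slot ID space loosely mirroring the ranges on the slide.
    starts = [150, 211, 401, 800]
    oid = object_id(b"/bill/docs/report.txt") % 1000
    print(oid, "-> server", responsible_server(oid, starts))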
4
Project Overview
  • Goal: produce a decentralized, read-mostly filesystem with the following properties:
  • Sequential layout of related data
  • Amortized lookup latency
  • Improved availability
  • Some Challenges
  • Load balancing
  • Download throughput
  • Project Focus
  • System design and implementation

[Figure: current DHT data layout (random placement) vs. defragmented DHT data layout (sequential placement)]
5
Overview
  • Background & Motivation
  • Preserving Object Locality
  • Dynamic Load Balancing
  • Results
  • Future Work

6
Preserving Object Locality
  • Motivation
  • Fate sharing: all objects in a single operation are more likely to be available at the same time
  • Effective caching/prefetching: servers I've contacted recently are more likely to have what I want next
  • Design options
  • Namespace locality (e.g., filesystem hierarchy)
  • Dynamic clustering (e.g., based on observed
    access patterns)

7
Is Namespace Locality Good Enough?
  • Initial trace evaluation
  • Workloads
  • HP block-level disk trace (1999)
  • Harvard research NFS trace (2003)
  • NLANR webcache trace (2003)
  • Setup
  • Order files alphabetically according to filepath
  • 10,000 data blocks/server
  • Calculate failure prob. of each operation
  • Node failure probability of 5%
  • 3 replicas
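
A minimal sketch of the kind of calculation this setup implies, assuming independent server failures and that an operation fails if any block it needs loses all of its replicas (the simulator used for the talk may model this differently):

    def block_failure_prob(p_node: float = 0.05, replicas: int = 3) -> float:
        # A block is unavailable only if all of its replica servers fail.
        return p_node ** replicas

    def operation_failure_prob(num_replica_groups: int,
                               p_node: float = 0.05,
                               replicas: int = 3) -> float:
        # An operation fails if any of the (independent) replica groups it
        # touches is unavailable; a defragmented layout touches fewer groups.
        p_block = block_failure_prob(p_node, replicas)
        return 1.0 - (1.0 - p_block) ** num_replica_groups

    # Example: an operation whose blocks span 20 replica groups vs. just 2.
    print(operation_failure_prob(20), operation_failure_prob(2))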

8
Estimated Availability Across Workloads
9
Encoding Object Names
[Figure: traditional DHT key encoding — the entire 160-bit key is the pseudo-random SHA-1 hash, key = SHA1(data)]
  • Leverage
  • Large key space (amortized cost over wide-area is
    minimal)
  • Workload properties (e.g., 99% of the time, directory depth < 12)
  • Corner cases
  • Depth or width overflow: use 1 bit to signal the overflow region and just use SHA1(filepath)
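
A rough sketch of a key encoder in this spirit; the field widths, the per-component hashing, and the helper names are my own assumptions, not the exact encoding from the talk:

    import hashlib

    KEY_BITS = 160
    OVERFLOW_BIT = 1 << (KEY_BITS - 1)      # top bit marks the overflow region

    def encode_key(userid: int, path_components: list[str], blockid: int,
                   depth_limit: int = 12, comp_bits: int = 10,
                   user_bits: int = 16, block_bits: int = 16) -> int:
        # Locality-preserving key: userid | per-directory fields | blockid.
        # Keys under the same user/directory share a prefix, so they land in
        # a contiguous region of the ID space.
        if len(path_components) > depth_limit:
            # Depth overflow: fall back to a flat SHA-1 of the whole path.
            h = hashlib.sha1("/".join(path_components).encode()).digest()
            return OVERFLOW_BIT | (int.from_bytes(h, "big") >> 1)

        key = userid % (1 << user_bits)
        for comp in path_components:
            # Squeeze each path component into a fixed-width field.
            comp_hash = int.from_bytes(hashlib.sha1(comp.encode()).digest(), "big")
            key = (key << comp_bits) | (comp_hash % (1 << comp_bits))
        # Pad unused depth, then append the block number so a file's blocks
        # get consecutive keys.
        key <<= comp_bits * (depth_limit - len(path_components))
        return (key << block_bits) | (blockid % (1 << block_bits))

    # Blocks of one file are adjacent; files in the same directory are nearby.
    print(encode_key(6, ["Docs"], 1), encode_key(6, ["Docs"], 2))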

10
Encoding Object Names
[Figure: example of the encoding — key fields (userid | path encode | blockid); Bill is userid 6 with directory Docs and block IDs bid 1 and bid 2, Bob is userid 7; the resulting keys fall into the adjacent server ranges 570-600, 601-660, and 661-700]
11
Dynamic Load Balancing
  • Motivation
  • The hash function is no longer uniform
  • Uniform ID assignment to nodes leads to load imbalance
  • Design options
  • Simple item balancing (MIT)
  • Mercury (CMU)

[Figure: storage load vs. node number — load balance with 1024 nodes using the Harvard trace]
12
Load Balancing Algorithm
  • Basic Idea
  • Contact a random node in the ring
  • If myLoad > delta × hisLoad (or vice versa), the lighter node changes its ID to move before the heavier node
  • The heavy node's load splits in two
  • Node loads are within a factor of 4 of each other in O(log n) steps
  • Mercury optimizations
  • Continuous sampling of load around the ring
  • Use the estimated load histogram to do informed probes
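
A simplified sketch of one balancing step in this spirit (the actual item-balancing and Mercury protocols differ in details; DELTA here just echoes the factor-of-4 bound on the slide):

    import random

    DELTA = 4.0   # imbalance threshold; loads converge to within a constant factor

    def balance_step(loads: list[float]) -> None:
        # One randomized step: a node probes a random peer; if one side is
        # more than DELTA times heavier, the lighter node conceptually changes
        # its ring ID to sit just before the heavier node and takes over half
        # of its items.  Only per-node load is modeled here (handing the light
        # node's old items to its former neighbor is omitted).
        i, j = random.sample(range(len(loads)), 2)
        light, heavy = (i, j) if loads[i] <= loads[j] else (j, i)
        if loads[heavy] > DELTA * loads[light]:
            loads[light] = loads[heavy] / 2     # the heavy node's load splits in two
            loads[heavy] = loads[heavy] / 2

    # A skewed initial load over 1024 nodes flattens out after many random steps.
    loads = [random.expovariate(1.0) for _ in range(1024)]
    for _ in range(20 * 1024):
        balance_step(loads)
    print(min(loads), max(loads))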

13
Handling Temporary Resource Constraints
  • Drastic storage distribution changes can cause
    frequent data movement
  • Node storage can be temporarily constrained (e.g., no more disk space)
  • Solution
  • Lazy data movement (sketched below)
  • The node responsible for a key keeps a pointer to the actual data blocks
  • Data blocks can be stored anywhere in the system

14
Handling Temporary Resource Constraints
[Figure: a WRITE stores the data block on a node with free space; the node responsible for the key keeps a pointer to it]
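
A toy sketch of that indirection (class and method names invented here): the node responsible for a key keeps only a pointer record, while the block itself lives on whichever node currently has room.

    class StorageNode:
        def __init__(self, capacity: int):
            self.capacity = capacity
            self.blocks: dict[int, bytes] = {}

        def has_space(self, n: int) -> bool:
            return sum(len(b) for b in self.blocks.values()) + n <= self.capacity

        def store(self, key: int, block: bytes) -> None:
            self.blocks[key] = block

        def fetch(self, key: int) -> bytes:
            return self.blocks[key]

    class KeyOwner:
        # The node responsible for a key stores a pointer, not the data itself.
        def __init__(self):
            self.pointers: dict[int, StorageNode] = {}

        def write(self, key: int, block: bytes, nodes: list[StorageNode]) -> None:
            # Lazy placement: put the block on any node with free space and
            # remember where it went; no bulk copying when ranges change hands.
            for node in nodes:
                if node.has_space(len(block)):
                    node.store(key, block)
                    self.pointers[key] = node
                    return
            raise RuntimeError("no node has free space")

        def read(self, key: int) -> bytes:
            return self.pointers[key].fetch(key)

A node whose key range grows or shrinks can thus answer lookups immediately and migrate the underlying blocks later, or not at all.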
15
Results
  • How much improvement in availability and lookup
    latency can we expect?
  • Setup
  • Trace-based simulation with Harvard trace
  • File blocks named using our encoding scheme
  • Same availability calculation as before
  • Clients keep open connections to the 1-100 most recently contacted data servers (see the sketch below)
  • 1024 servers
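
One plausible way to count how many lookups such connection caching avoids (purely illustrative; the talk's simulator may measure this differently): keep an LRU set of the most recently contacted servers and skip the DHT lookup whenever the next block's server is already in it.

    from collections import OrderedDict

    def count_lookups(server_sequence: list[int], cache_size: int) -> int:
        # server_sequence: the responsible server for each block access, in order.
        # A DHT lookup is needed only when that server is not among the
        # cache_size most recently contacted ones.
        recent = OrderedDict()
        lookups = 0
        for server in server_sequence:
            if server in recent:
                recent.move_to_end(server)          # refresh its LRU position
            else:
                lookups += 1
                recent[server] = None
                if len(recent) > cache_size:
                    recent.popitem(last=False)      # evict the least recent
        return lookups

    # Sequential layout keeps consecutive blocks on the same few servers,
    # so most accesses need no lookup: here only 3 of 6 do.
    print(count_lookups([1, 1, 1, 2, 2, 3], cache_size=2))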

16
Potential Reduction in Lookups
17
Potential Availability Improvement
[Chart series: Random (expected), Ordered (uniform), Optimal]
  • Our encoding has a nearly identical failure probability to the alphabetical ordering (differs by 0.0002)

18
Results
  • What is the overhead of load balancing?
  • Setup
  • Simulated load balancing with Harvard trace
  • 1024 servers
  • Each load balance step uses a histogram estimated
    with 4 random samples
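
A very rough sketch of that sampling step (the histogram construction and the probe rule here are my simplification, not Mercury's actual mechanism):

    import random

    def sampled_histogram(loads: list[float], k: int = 4, bins: int = 8) -> list[int]:
        # Estimate the load distribution from k uniformly sampled nodes.
        samples = random.sample(loads, k)
        top = max(samples) or 1.0
        hist = [0] * bins
        for s in samples:
            hist[min(int(s / top * bins), bins - 1)] += 1
        return hist

    def should_probe(my_load: float, loads: list[float],
                     k: int = 4, delta: float = 4.0) -> bool:
        # Informed probe decision: only start a balance step if my load looks
        # far out of line with the average estimated from the random samples.
        avg = sum(random.sample(loads, k)) / k
        return my_load > delta * avg or delta * my_load < avg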

19
Load Balance Over Time
20
Data Migration Overhead
21
Related Work
  • Namespace Locality
  • Cylinder group allocation [FFS]
  • Co-locating data/meta-data [C-FFS]
  • Isolating user data in clusters [Archipelago]
  • Namespace flattening in object-based storage [Self-*]
  • Load Balancing & Data Indirection
  • DHT Item Balancing [SkipNets, Mercury]
  • Data Indirection [Total Recall]

22
(Near) Future Work
  • Finish up the implementation
  • Currently finished:
  • Data block storage/retrieval, data indirection, some load balancing
  • Still requires:
  • Some debugging
  • Interfacing with an NFS loopback filesystem
  • Evaluation on real testbeds (Emulab and PlanetLab)
  • Targeting submission for NSDI 2006
  • Mid-October deadline