Boxwood: Distributed Data Structures as Storage Infrastructure


1
Boxwood: Distributed Data Structures as Storage Infrastructure
  • Lidong Zhou
  • Microsoft Research Silicon Valley
  • Team Members
  • Chandu Thekkath, Marc Najork, Nick Murphy

2
Trends in Storage Systems: Distribution, Virtualization, and Abstractions
  • The case for distributed storage architecture
  • Enables incremental expandability
  • Benefits performance and reliability
  • Virtualization facilitates management
  • Scalability, automatic reconfiguration, load
    balancing, and fault tolerance
  • Abstractions for managing complexity

3
Going Beyond Virtual Disks
  • Virtual disk provides a low-level abstraction
  • Complexity pushed to each client and duplicated
  • Limited intelligence due to lack of structural
    info.
  • Advantages of higher-level abstractions
  • Reduce client complexity and eliminate
    duplication
  • Exploit structural information for better load
    balancing, pre-fetching, and caching
  • There is no universal abstraction

4
Boxwood System Architecture
  • Supports multiple data abstractions
  • Space abstraction: segments named by UIDs
  • Structure abstractions (B-Link Tree, Hash Table,
    Skip List)

[Figure: structure abstractions (B-Link Tree, Hash Table) above the UID Space]

5
UID Space Abstraction
  • Segments in a flat UID space
  • UID names segments
  • Deallocated UIDs are never reused
  • Offloads address management from clients
  • Clients can provide hints (e.g., for
    co-location)
  • A virtualization layer
  • Distributed, fault tolerant, and incrementally
    expandable
  • Performs simple capacity and load balancing
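
The properties listed above can be illustrated with a toy, single-process sketch. This is not Boxwood's interface: the class name `UIDSpace`, its method names, and the `hint` parameter are all hypothetical, and distribution, fault tolerance, and balancing are left out entirely. It only shows segments named by UIDs in a flat space, with UIDs that are never reused after deallocation.

```python
import itertools


class UIDSpace:
    """Hypothetical in-memory stand-in for a flat UID space of segments."""

    def __init__(self):
        # A monotonically increasing counter: a deallocated UID is never handed out again.
        self._next_uid = itertools.count(1)
        self._segments = {}  # uid -> bytearray holding the segment's data

    def allocate(self, size, hint=None):
        # `hint` models a client-supplied placement hint (e.g. a UID to
        # co-locate with). This sketch accepts it but ignores it.
        uid = next(self._next_uid)
        self._segments[uid] = bytearray(size)
        return uid

    def deallocate(self, uid):
        # Raises KeyError if the UID is unknown or already deallocated.
        del self._segments[uid]

    def write(self, uid, offset, data):
        seg = self._segments[uid]
        if offset < 0 or offset + len(data) > len(seg):
            raise ValueError("write outside segment bounds")
        seg[offset:offset + len(data)] = data

    def read(self, uid, offset, length):
        seg = self._segments[uid]
        return bytes(seg[offset:offset + length])
```

A client in this sketch never chooses addresses itself: it asks for a segment, gets back a UID, and uses that UID for all later reads and writes.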

6
UID Space Design and Implementation
[Figure: Client, Space Clerk, Consensus (Paxos), UID Space, Disks]

7
Higher-Level Structure Abstractions
  • UID Space emphasizes universality and simplicity
  • Structure abstractions more attractive for
    sophisticated clients
  • Co-location, pre-fetching, and caching strategies
    supported inside Boxwood
  • Better abstractions for databases and file
    systems
  • First Boxwood structure abstraction: B-Link trees
  • Supports tree creation/destruction, insert,
    delete, lookup, and enumeration
  • A good abstraction with wide applicability
  • Enough complexity to expose fundamental issues

8
B-Link Tree Layer
  • Built on the UID space
  • Highly concurrent B-Link tree operations with
    distributed locking
  • Logging/recovery to ensure atomicity of B-Link
    tree operations
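
The slides do not spell out the tree algorithm, so the following is only a sketch of the general B-Link idea from the Lehman and Yao line of work: each node carries a high key and a link to its right sibling, and a lookup moves right whenever the key it wants lies beyond the node's high key. The `Node` layout and `lookup` function are assumptions made for illustration; locking, logging, and the UID space are omitted.

```python
import bisect
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    keys: list
    values: list = field(default_factory=list)    # used by leaves only
    children: list = field(default_factory=list)  # used by internal nodes only
    high_key: Optional[int] = None  # largest key this node may hold; None = unbounded
    right: Optional["Node"] = None  # link to the right sibling at the same level

    @property
    def is_leaf(self):
        return not self.children


def lookup(root, key):
    """Return the value stored under `key`, or None if it is absent."""
    node = root
    while True:
        # Move right: a split may have shifted our key range to a sibling
        # before the parent was updated.
        while node.high_key is not None and key > node.high_key:
            node = node.right
        if node.is_leaf:
            break
        # children[i] covers keys <= keys[i]; the last child covers the rest.
        node = node.children[bisect.bisect_left(node.keys, key)]
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    return None


# A half-finished split: leaf_a has handed keys 8 and 10 to leaf_b,
# but the parent still routes every key <= 10 to leaf_a.
leaf_c = Node(keys=[20, 30], values=["t", "u"])
leaf_b = Node(keys=[8, 10], values=["c", "d"], high_key=10, right=leaf_c)
leaf_a = Node(keys=[1, 5], values=["a", "b"], high_key=5, right=leaf_b)
root = Node(keys=[10], children=[leaf_a, leaf_c])
```

In this state `lookup(root, 8)` is routed to `leaf_a`, sees that 8 exceeds its high key, and follows the right link to `leaf_b`. That tolerance of stale parent pointers is what lets readers proceed while a split is in progress.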

[Figure: B-Link Tree modules, Distributed Locking, UID Space]

9
Current Status
  • UID space and B-Link tree modules
  • Distributed UID space with capacity balancing
  • B-Link tree algorithm with distributed locking
  • Logging/recovery for tolerating transient
    failures
  • Paxos consensus
  • On-going Work
  • Run-time verification of the B-Link tree
    algorithm (Shaz Qadeer and Serdar Tasiran)
  • A distributed file system on Boxwood abstractions

10
File System on Boxwood: A High-Level Design
  • B-Link tree abstraction is ideal for directories
    and meta-data
  • Files are implemented using both B-Link tree
    abstraction and UID space abstraction
  • B-Link trees for i-nodes: mapping from file
    offsets to (UID, offset)
  • UID space for actual user data store
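
The mapping from file offsets to (UID, offset) can be sketched with a sorted extent map. Here a pair of Python lists searched with `bisect` stands in for the B-Link tree, and the `FileMap` class, its method names, and the per-extent `length` field are assumptions for illustration only. Overlapping extents are not handled.

```python
import bisect


class FileMap:
    """Hypothetical per-file map: file offset -> (uid, offset within segment)."""

    def __init__(self):
        self._starts = []   # sorted file offsets at which extents begin
        self._extents = []  # parallel list of (uid, segment_offset, length)

    def add_extent(self, file_offset, uid, segment_offset, length):
        # Keep both lists ordered by starting file offset.
        i = bisect.bisect_left(self._starts, file_offset)
        self._starts.insert(i, file_offset)
        self._extents.insert(i, (uid, segment_offset, length))

    def resolve(self, file_offset):
        # Find the last extent that starts at or before file_offset.
        i = bisect.bisect_right(self._starts, file_offset) - 1
        if i < 0:
            return None
        uid, segment_offset, length = self._extents[i]
        delta = file_offset - self._starts[i]
        if delta >= length:
            return None  # offset falls in a hole between extents
        return (uid, segment_offset + delta)
```

With the map in a tree-like structure, the user data itself can live in UID-space segments, and a read at a given file offset becomes one ordered lookup followed by one segment read.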

11
File System on Boxwood: The Architecture
[Figure: B-Link Tree modules, Distributed Locking, UID Space]

12
Future Work
  • Finish module implementation
  • load balancing
  • automatic reconfiguration
  • chained de-clustered disk
  • More abstractions and clients to explore utility,
    generality, and flexibility of Boxwood

13
Related Work
  • Distributed Storage/Operating Systems
  • Virtual/Logical disks
  • File systems
  • Database systems
  • Scalable Distributed Data Structures
  • Linear Hash Table (LH) and its variants (Litwin,
    1980 to present)
  • Scalable distributed hash table (Gribble, et al.,
    2000)
  • Highly concurrent B-tree (Lehman and Yao, 1981;
    Sagiv, 1986)

14
Early Experience and Observations
  • Virtualized distributed storage infrastructure
    that exports good abstractions is promising
  • Multiple layers of abstractions are beneficial
  • Manages complexity
  • Caters to the needs of a wide variety of clients
  • Distributed file system on Boxwood is
    straightforward
  • Use of a matching high-level abstraction is ideal
  • A low-level abstraction offers more flexibility,
    but requires more bookkeeping
  • A good exercise to uncover fundamental principles
    in scalable/reliable distributed systems