The Zebra Striped Network File System


1
The Zebra Striped Network File System
  • Presentation by
  • Joseph Thompson

2
Purpose
  • Single file server architectures will not be able
    to support future throughput needs.
  • Need a striping technique that supports file
    writes of all sizes in an efficient and uniform
    manner.

3
Striping in Zebra
  • RAID
  • Per-File Striping in a Network File System
  • Log-Structured File Systems and Per-Client
    Striping

4
RAID-Problems
  • Small writes in RAID are about four times as
    expensive as they would be in a disk array
    without parity.
  • All the disks are attached to a single machine,
    so its memory and I/O system are performance
    bottlenecks.
  • Note: there is no reason there has to be a
    dedicated parity disk.
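
The factor of four for small writes comes from the read-modify-write cycle on parity. A minimal sketch, assuming XOR parity and an in-memory list standing in for the disks (the function names are illustrative, not from the presentation):

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(disks, parity_idx, data_idx, new_block):
    """Overwrite one data block in a parity stripe and return the I/O count.

    `disks` is a list of equal-length blocks, one per disk (an assumption
    made for illustration; a real array addresses blocks on each disk).
    """
    ios = 0
    old_data = disks[data_idx]
    ios += 1  # read old data
    old_parity = disks[parity_idx]
    ios += 1  # read old parity
    # New parity = old parity with the old data removed and the new data added.
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_block)
    disks[data_idx] = new_block
    ios += 1  # write new data
    disks[parity_idx] = new_parity
    ios += 1  # write new parity
    return ios
```

Without parity the same update would be a single write, which is where the four-to-one ratio comes from.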

5
Per-File Striping in a Network File System
  • Note: A collection of file data that spans the
    servers is called a stripe, and the portion of a
    stripe stored on a single server is called a
    stripe fragment.
  • Small files are difficult to handle efficiently
  • Inefficient parity management during updates

6
Log-Structured File Systems and Per-Client
Striping
  • Solution to per-file problems
  • Zebra applies techniques of Log-Structured File
    Systems and Per-Client Striping
  • Creates an append-only log for each client, which
    can then convert many small writes into one large
    write to a single stripe (the client is
    responsible for calculating parity).
  • Requires a File Manager to facilitate client
    interaction and keep record of file metadata such
    as file attributes, directory structures, etc.
  • Like all other LFSs, this solution also requires
    a stripe cleaner.
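
A minimal sketch of how a client might batch small writes into one stripe and compute its parity. The fragment size, the number of data servers, and the zero padding of a partly filled stripe are all assumptions chosen to keep the example small:

```python
from functools import reduce

FRAGMENT_SIZE = 8   # bytes per fragment (assumed; tiny for illustration)
DATA_SERVERS = 3    # data fragments per stripe (assumed)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def build_stripe(small_writes):
    """Pack many small writes into one stripe of data fragments plus parity."""
    log = b"".join(small_writes)          # append-only: writes are concatenated
    capacity = FRAGMENT_SIZE * DATA_SERVERS
    if len(log) > capacity:
        raise ValueError("writes do not fit in a single stripe")
    log = log.ljust(capacity, b"\0")      # pad the unused tail (a simplification)
    fragments = [log[i:i + FRAGMENT_SIZE]
                 for i in range(0, capacity, FRAGMENT_SIZE)]
    parity = reduce(xor_bytes, fragments)  # computed by the client
    return fragments, parity
```

Any one lost data fragment can be rebuilt by XOR-ing the parity fragment with the surviving data fragments.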

7
Zebra Components
  • Storage Servers
  • Clients
  • File Managers
  • Stripe Cleaners

8
Storage Servers
  • Storage server requirements
  • Store a fragment
  • Append to an existing fragment (used for periodic
    writes of a log)
  • Retrieve a fragment
  • Delete a fragment
  • Identify fragments (used to identify the end of
    client logs after crashes)
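
The five operations can be sketched as a small in-memory class. This is an illustration only: the `(client_id, sequence)` fragment identifier and the method signatures are assumptions, not Zebra's actual interface.

```python
class StorageServer:
    """In-memory stand-in for a storage server holding stripe fragments."""

    def __init__(self):
        self._fragments = {}  # (client_id, sequence) -> bytes

    def store(self, frag_id, data):
        """Store a new fragment; fragments are never overwritten in place."""
        if frag_id in self._fragments:
            raise KeyError(f"fragment {frag_id} already exists")
        self._fragments[frag_id] = bytes(data)

    def append(self, frag_id, data):
        """Append to an existing fragment (e.g. a periodic log flush)."""
        self._fragments[frag_id] += bytes(data)

    def retrieve(self, frag_id, offset=0, length=None):
        """Return all or part of a fragment."""
        data = self._fragments[frag_id]
        end = len(data) if length is None else offset + length
        return data[offset:end]

    def delete(self, frag_id):
        """Delete a fragment, for example after its stripe is cleaned."""
        del self._fragments[frag_id]

    def identify(self, client_id):
        """Return the newest fragment id for a client, or None if none exist."""
        seqs = [seq for (cid, seq) in self._fragments if cid == client_id]
        return (client_id, max(seqs)) if seqs else None
```

After a crash, `identify` is the call that would let a component find where a client's log ends.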

9
Clients
  • On read: the client must determine which stripe
    fragments store the desired data, retrieve the
    data from the storage servers, and return them to
    the application.
  • On write: the client appends the new data to its
    log by creating new stripes to hold the data,
    computing the parity of the stripes, and writing
    the stripes to the storage servers.
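
The read path can be sketched as address arithmetic. This assumes, purely for illustration, that a block pointer is a byte offset into the client's log and that every stripe has the same number of fixed-size data fragments; the presentation does not specify the pointer format.

```python
def locate(log_offset, fragment_size, data_fragments):
    """Map a log offset to (stripe number, fragment index, offset in fragment)."""
    stripe, within = divmod(log_offset, fragment_size * data_fragments)
    fragment, offset = divmod(within, fragment_size)
    return stripe, fragment, offset


def read(stripes, log_offset, length, fragment_size, data_fragments):
    """Gather `length` bytes starting at `log_offset`, crossing fragments.

    `stripes[s][f]` stands in for fetching fragment f of stripe s from
    the storage server that holds it.
    """
    out = bytearray()
    while length > 0:
        s, f, o = locate(log_offset, fragment_size, data_fragments)
        chunk = stripes[s][f][o:o + length]
        if not chunk:
            raise ValueError("read past the end of the log")
        out += chunk
        log_offset += len(chunk)
        length -= len(chunk)
    return bytes(out)
```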

10
File Managers
  • File Manager stores all of the information in the
    file system except for file data.
  • The client requests block pointers from the File
    Manager, and accesses the block data itself.
  • Performance of the File Manager is a concern
    because it is a centralized resource.
  • Solution: clients cache naming information from
    the File Manager so that they contact it less
    often.

11
Stripe Cleaners (first glance)
  • The only way to reuse free space in a stripe is
    to clean the stripe so that it contains no live
    data, then delete it.
  • Since the cleaner is a client itself, it just
    reads live data from stripes with the largest
    amounts of free fragments, appends the data to
    its own client log to be written to a new stripe,
    and then deletes the old stripes.
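
A minimal sketch of one cleaning pass, assuming the cleaner already knows which blocks are live. The data layout (a dict of stripes and a set of live block positions) is hypothetical and chosen only to show the copy-then-delete pattern:

```python
def clean(stripes, live, max_stripes=1):
    """Copy live blocks out of the emptiest stripes, then delete those stripes.

    stripes: dict mapping stripe id -> list of blocks (bytes)
    live:    set of (stripe id, block index) pairs still referenced by files
    Returns (cleaner_log, moves), where moves maps each old position to the
    block's index in the cleaner's own log.
    """
    def live_count(sid):
        return sum((sid, i) in live for i in range(len(stripes[sid])))

    # Stripes with the least live data give back the most space per byte copied.
    victims = sorted(stripes, key=live_count)[:max_stripes]
    cleaner_log, moves = [], {}
    for sid in victims:
        for i, block in enumerate(stripes[sid]):
            if (sid, i) in live:
                moves[(sid, i)] = len(cleaner_log)
                cleaner_log.append(block)  # appended like any client write
        del stripes[sid]                   # the old stripe can now be reused
    return cleaner_log, moves
```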

12
System Operations
  • Communication Deltas
  • Stripe Cleaning (additional details)
  • Adding Additional Storage Servers

13
Communication Deltas
  • Deltas provide a simple and reliable way for
    various system components to communicate changes
    to files.
  • A client's log also contains deltas.
  • Delta information: file ID, file version (time
    edited), block number, old block pointer, new
    block pointer.
  • Three types of deltas: update delta, cleaner
    delta, reject delta.
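
The delta fields listed above map naturally onto a small record type. In this sketch the field types, and the use of `None` for a missing old or new pointer, are assumptions:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DeltaKind(Enum):
    UPDATE = "update"    # a client wrote or deleted a block
    CLEANER = "cleaner"  # the stripe cleaner copied a block
    REJECT = "reject"    # the file manager refused a cleaner delta


@dataclass(frozen=True)
class Delta:
    kind: DeltaKind
    file_id: int
    file_version: int
    block_number: int
    old_pointer: Optional[int]  # None assumed to mean "block did not exist"
    new_pointer: Optional[int]  # None assumed to mean "block was deleted"


# Example: the first write of block 0 of file 7.
first_write = Delta(DeltaKind.UPDATE, file_id=7, file_version=1,
                    block_number=0, old_pointer=None, new_pointer=100)
```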

14
Stripe Cleaning (additional details)
  • Evaluating stripe space utilization
  • The cleaner must process the deltas in every
    client log (stripe) to keep a running count of
    free fragments.
  • The cleaner appends all of the deltas that refer
    to a given stripe to a special file for that
    stripe, called a Stripe Status File.
  • Conflicts between cleaning and file access
  • Stripe cleaner does not lock any files during
    cleaning. Only issues a special cleaner delta.
  • If a conflict occurs because an update takes
    place during cleaning, the file manager will
    notice the two conflicting deltas and make sure
    the final pointer for the block reflects the
    update delta.
  • The manager generates a reject delta that the
    cleaner uses to tell that the new block it
    created is unused.
  • (just to show how adding a stripe cleaner
    significantly adds complexity)
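
The conflict rule can be sketched as follows: an update delta always wins, and a cleaner delta is applied only if the block has not moved since the cleaner copied it. The function name, the tuple used for the reject delta, and the dict-based block map are illustrative assumptions:

```python
def apply_delta(block_map, kind, key, old_ptr, new_ptr):
    """Apply one delta to the file manager's block map.

    block_map maps (file id, block number) -> current block pointer.
    Returns None if the delta was applied, or a reject record otherwise.
    """
    if kind == "update":
        # A client update always determines the final pointer.
        block_map[key] = new_ptr
        return None
    if kind == "cleaner":
        if block_map.get(key) == old_ptr:
            # The block was not modified while it was being cleaned.
            block_map[key] = new_ptr
            return None
        # The block changed during cleaning, so the cleaner's copy is stale.
        return ("reject", key, old_ptr, new_ptr)
    raise ValueError(f"unknown delta kind: {kind}")
```

Either processing order gives the same result: if the cleaner delta arrives first it is applied and then overwritten by the update; if it arrives second it is rejected.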

15
Adding Additional Storage Servers
  • When a new storage server becomes available, all
    that must be done is to notify the clients, file
    manager, and stripe cleaner that each stripe
    group has one more fragment.

16
Restoring Consistency After Crashes
  • Two general issues upon crash: consistency and
    availability
  • Zebra uses a checkpoint and roll-forward method
    for restoring consistency.
  • Three new consistency problems
  • Stripes may become internally inconsistent (some
    of the data or parity is written, but not all of
    it)
  • Information written to stripes may become
    inconsistent with metadata
  • Stripe cleaner state becomes inconsistent with
    stripes

17
Stripes may become internally inconsistent
  • Zebra stores a simple checksum for each fragment.
  • On storage server reboot
  • Verifies checksums only for fragments written
    around the time of the crash (using deltas)
  • Discards incomplete fragments
  • Queries other storage servers to find out what
    new stripes were written while the server was
    down.
18
Information written to stripes may become
inconsistent with metadata
  • If a client crashes, the file manager must check
    that client's log to make sure the last part of
    the log was written successfully.
  • If the manager crashes, it has to start from its
    last checkpoint and roll forward through the rest
    of every client's log to apply the updates made
    since that checkpoint.
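
Checkpoint and roll-forward for the file manager can be sketched like this. The checkpoint layout (a block map plus a per-client log position) and the simplified `(key, new pointer)` log entries are assumptions; the sketch also ignores ordering between different clients' logs.

```python
def roll_forward(checkpoint, client_logs):
    """Rebuild file manager state from a checkpoint and the client logs.

    checkpoint:  {"block_map": {...}, "positions": {client id: log index}}
    client_logs: {client id: [(key, new_pointer), ...]} in log order
    Returns a new state; the checkpoint itself is left untouched.
    """
    block_map = dict(checkpoint["block_map"])
    positions = dict(checkpoint["positions"])
    for client, log in client_logs.items():
        start = positions.get(client, 0)    # skip what the checkpoint covers
        for key, new_pointer in log[start:]:
            block_map[key] = new_pointer    # replay each later delta
        positions[client] = len(log)
    return {"block_map": block_map, "positions": positions}
```

Only the log entries written after the checkpoint are replayed, which is what keeps recovery time bounded by the checkpoint interval rather than by the size of the logs.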

19
Stripe cleaner state becomes inconsistent with
stripes
  • The stripe cleaner also stores periodic checkpoints
    and on restart reads (and corrects) status files
    then starts collecting more utilization
    information from the point of its last checkpoint
    (roll-forward).

20
Performance overview

21
Performance overview cont.

22
Conclusion
  • Zebra provides higher throughput, availability,
    and scalability than previous file systems, at
    the cost of increased system complexity.
  • It's only step one: as we saw, xFS included and
    improved on Zebra's core functionality.