A Low-bandwidth Network File System

1
A Low-bandwidth Network File System
  • Athicha Muthitacharoen,
  • Benjie Chen,
  • David Mazières
  • ACM Symposium on Operating Systems Principles
  • (SOSP), 2001

2
Introduction
  • People often work over networks slower than LANs
    (WAN, cable modem, modem, ...)
  • To access remote data without a network file system:
  • Make and edit local copies of files -> risk of
    update conflicts
  • Remote login -> interactive applications are slow
    to respond to user input
  • Network file system:
  • Offers interfaces people already prefer on a LAN
  • Provides tight consistency
  • Tolerates network latency better than remote
    login sessions

3
Introduction
  • LBFS
  • Designed for low-bandwidth networks
  • Exploits cross-file similarities:
  • Between files
  • Between versions of the same file
  • Example:
  • Copy, concatenation, auto-save file, RCS, object
    files -> significant duplication!

4
Design
[Figure: client and server. The file "hello" is divided into blocks H, E, L, L, O. A cache holds an index of hashes H(A), H(H), H(L) pointing to the data blocks A, H, L.]

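A minimal sketch of the idea in this figure, assuming (this is a reading of the diagram, not something the slide states) that the receiving side keeps an index from hash to data block and that only blocks missing from that index need to cross the network. SHA-1 is used because the next slide names it as the hash function. The names `sha1`, `blocks_to_send`, and `cache` are hypothetical.

```python
import hashlib


def sha1(block: bytes) -> bytes:
    """Return the 20-byte SHA-1 digest of a block."""
    return hashlib.sha1(block).digest()


# Hypothetical receiver-side cache from the figure: hash -> data block.
cache = {sha1(block): block for block in (b"A", b"H", b"L")}


def blocks_to_send(blocks, index):
    """Return, in order, the blocks whose hashes are not yet in the index."""
    known = set(index)
    missing = []
    for block in blocks:
        digest = sha1(block)
        if digest not in known:
            missing.append(block)
            known.add(digest)  # a repeated block is only sent once
    return missing


# The file "hello" from the figure, one letter per block.
file_blocks = [b"H", b"E", b"L", b"L", b"O"]
to_send = blocks_to_send(file_blocks, cache)
print(to_send)
```

In this toy case the blocks H and L are already indexed, so only E and O would have to be transferred.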
5
Design
  • Choice of hash function: SHA-1
  • Output is 160 bits (20 bytes)
  • Extremely low collision probability
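As a quick check of the output size quoted above, the following sketch hashes an arbitrary byte string with Python's standard `hashlib` module. The input bytes are made up for illustration.

```python
import hashlib

# Hash an arbitrary (made-up) chunk of data with SHA-1.
digest = hashlib.sha1(b"example chunk contents").digest()

digest_bytes = len(digest)       # size of the digest in bytes
digest_bits = digest_bytes * 8   # size of the digest in bits
print(digest_bytes, digest_bits)
```

A chunk of any length is therefore identified by a fixed 20-byte value, which is what makes the hash usable as an index key.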

6
Design
  • Indexing (chunk decision) method, candidate 1:
  • Index the hashes of all aligned 8 KB blocks in
    a file

[Figure: a file divided into aligned 8K blocks A, B, C, indexed by H(A), H(B), H(C)]
  • Problem:
  • A single-byte insertion at the start of a large
    file shifts every block and changes all hash values


[Figure: blocks A, B, C against the 8K boundaries after the insertion]
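The problem with aligned blocks can be reproduced in a few lines. This is only a sketch: the test data is synthetic (built from SHA-1 output so that it is deterministic), and `aligned_block_hashes` is a hypothetical helper, not part of LBFS.

```python
import hashlib

BLOCK = 8192  # aligned 8 KB blocks, as on the slide


def aligned_block_hashes(data: bytes, block_size: int = BLOCK):
    """Hash each aligned, fixed-size block of data."""
    return [hashlib.sha1(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


# Synthetic 40,960-byte file (exactly five 8 KB blocks).
original = b"".join(hashlib.sha1(i.to_bytes(4, "big")).digest()
                    for i in range(2048))
modified = b"X" + original  # single-byte insertion at the start

old_hashes = set(aligned_block_hashes(original))
new_hashes = aligned_block_hashes(modified)
shared = sum(1 for h in new_hashes if h in old_hashes)
print(len(new_hashes), "blocks after the edit,", shared, "already indexed")
```

Because every block boundary now falls one byte later in the content, none of the new block hashes match the old index, even though all but one byte of the file is unchanged.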
7
Design
  • Indexing (chunk decision) method, candidate 2:
  • Index the hashes of all overlapping 8 KB blocks
    at all offsets

[Figure: hashes H(A), H(B), H(C) of overlapping 8K blocks]

  • Problem
  • Almost one index entry per byte
  • Every file modification might require thousands
    of index insertions
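The "almost one index entry per byte" claim follows from counting window positions. The sketch below compares the number of index entries under both candidates for a hypothetical 1 MB file; the function names are illustrative.

```python
def overlapping_block_count(file_size: int, block_size: int = 8192) -> int:
    """Number of block_size-byte windows when one starts at every offset."""
    return max(file_size - block_size + 1, 0)


def aligned_block_count(file_size: int, block_size: int = 8192) -> int:
    """Number of aligned blocks (the last one may be short)."""
    return -(-file_size // block_size)  # ceiling division


size = 1 << 20  # hypothetical 1 MB file
overlapping = overlapping_block_count(size)
aligned = aligned_block_count(size)
print(overlapping, aligned)
```

For this file size the overlapping scheme needs 1,040,385 entries against 128 for aligned blocks, which is why the index becomes impractically large.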

8
Design
  • Indexing method of LBFS:
  • Non-overlapping, variable-sized chunks
  • Chunk boundaries are determined by file contents
  • Dividing a file into chunks:
  • LBFS examines every overlapping 48-byte region
  • If the Rabin fingerprint of a region equals a
    pre-configured magic number, that region marks
    the end of a chunk

9
Design
  • Rabin fingerprint
  • Polynomial representation of the data modulo a
    pre-determined irreducible polynomial
  • RF for a sequence of bytes t1, t2, ..., tn of
    length n [formula not preserved in this transcript],
    where p and M are constant integers
  • Efficient to compute on a sliding window in a file
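The formula itself did not survive in this transcript, so the sketch below assumes a common integer form, RF(t1..tn) = (t1*p^(n-1) + t2*p^(n-2) + ... + tn) mod M, purely to illustrate why a sliding window is cheap to update. A true Rabin fingerprint uses polynomial arithmetic modulo an irreducible polynomial, as the first bullet says; the function names and the choice of p here are hypothetical.

```python
def fingerprint(window: bytes, p: int, M: int) -> int:
    """Direct evaluation of the assumed form (t1*p^(n-1) + ... + tn) mod M."""
    value = 0
    for byte in window:
        value = (value * p + byte) % M
    return value


def rolling_fingerprints(data: bytes, width: int, p: int, M: int):
    """Yield (end_offset, fingerprint) for every width-byte window.

    Each step removes the oldest byte and appends the newest one,
    so the cost per window is constant instead of O(width).
    """
    if len(data) < width:
        return
    top = pow(p, width - 1, M)  # weight of the byte leaving the window
    value = fingerprint(data[:width], p, M)
    yield width, value
    for i in range(width, len(data)):
        value = ((value - data[i - width] * top) * p + data[i]) % M
        yield i + 1, value


# Usage: fingerprints of all 48-byte windows of some made-up data.
sample = bytes((i * 37 + 11) % 256 for i in range(300))
results = list(rolling_fingerprints(sample, 48, 257, 1 << 13))
print(len(results), results[0])
```

The rolling update gives the same value as recomputing each window from scratch, which is the property that makes scanning every 48-byte region of a file affordable.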

10
Design
  • Chunk decision example
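  • If M is 2^13 and the magic number is 0:

[Figure: a 48-byte sliding window moves over the file. Windows A, B, C have RF(A) = 113, RF(B) = 54, RF(C) = 33, so they are not boundaries. Window X has RF(X) = 0, so the chunk ends at X.]
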
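The chunk decision can be sketched end to end as follows. The window width, M = 2^13, and the magic number 0 come from the slides; the multiplier `P`, the integer rolling hash standing in for a real Rabin fingerprint, and the synthetic test data are assumptions made for illustration.

```python
import hashlib

WINDOW = 48   # sliding window width from the slides
P = 257       # hypothetical multiplier, not from the slides
M = 1 << 13   # 2^13, as in the example
MAGIC = 0     # magic number from the example


def chunk_boundaries(data: bytes, window: int = WINDOW, p: int = P,
                     m: int = M, magic: int = MAGIC) -> list:
    """Return the end offset of every chunk in data."""
    boundaries = []
    if len(data) >= window:
        top = pow(p, window - 1, m)  # weight of the byte leaving the window
        value = 0
        for byte in data[:window]:
            value = (value * p + byte) % m
        for end in range(window, len(data) + 1):
            if end > window:
                # Slide one byte: drop data[end-window-1], add data[end-1].
                value = ((value - data[end - window - 1] * top) * p
                         + data[end - 1]) % m
            if value == magic:
                boundaries.append(end)
    if data and (not boundaries or boundaries[-1] != len(data)):
        boundaries.append(len(data))  # the last chunk ends at end of file
    return boundaries


def split_chunks(data: bytes) -> list:
    """Cut data at the content-defined boundaries."""
    chunks, start = [], 0
    for end in chunk_boundaries(data):
        chunks.append(data[start:end])
        start = end
    return chunks


# Synthetic, deterministic test data (about 256 KB) built from SHA-1 output.
original = b"".join(hashlib.sha1(i.to_bytes(4, "big")).digest()
                    for i in range(13108))
modified = b"X" + original  # single-byte insertion at the start

old_chunks = split_chunks(original)
new_chunks = split_chunks(modified)
old_set = set(old_chunks)
reused = sum(1 for chunk in new_chunks if chunk in old_set)
print(len(old_chunks), "chunks before,", len(new_chunks), "after,", reused, "reused")
```

Since boundaries depend only on the 48 bytes inside the window, the insertion can only disturb the first chunk; every later chunk is cut at the same content and hashes to the same value as before.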
11
Design
12
Design
  • Comparison between LBFS and rsync
  • rsync compares only two files with the same name
  • Does not consider commonality between:
  • foo.c / foo.c / foo.c
  • RCS temporary file
  • Object file and library made by ar
  • Not adequate for a file system

13
Design
  • Protocol

14
Evaluation
  • Amount of new data in a file or directory, given
    an older version

15
Evaluation
  • Distribution of chunk sizes in /usr/local
  • Ran mkdb (an LBFS utility) on the server's /usr/local
  • 354 MB of data in 10,702 files
  • Chunk DB consumed 4.7 MB of space (1.3% of the
    directory size) and took 9 minutes to construct
  • Mean chunk size is 8,570 bytes (close to the
    expected value of 8,240 bytes)
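The figures on this slide can be cross-checked with a little arithmetic. The sketch assumes the expected chunk size is the expected distance between boundaries (2^13 bytes, from the M used in the chunk decision example) plus the 48-byte window; the variable names are illustrative.

```python
WINDOW = 48            # sliding window width in bytes
EXPECTED_GAP = 2 ** 13  # one window in 2^13 is expected to hit the magic number

# Expected chunk size under the stated assumption.
expected_chunk = EXPECTED_GAP + WINDOW

# Chunk database overhead relative to the indexed data.
db_mb, data_mb = 4.7, 354
db_percent = db_mb / data_mb * 100

# How far the measured mean is from the expected value.
measured_mean = 8570
relative_gap = (measured_mean - expected_chunk) / expected_chunk * 100

print(expected_chunk, round(db_percent, 1), round(relative_gap, 1))
```

Under that assumption the expected size comes to 8,240 bytes, the database overhead to about 1.3% of the data, and the measured mean sits about 4% above the expected value.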

16
Evaluation
  • Normalized bandwidth of three workloads

17
Evaluation
  • Performance of gcc workload

18
Conclusion
  • LBFS is a network file system that saves
    bandwidth by taking advantage of commonality
    between files
  • Under common operations such as editing documents
    and compiling software, LBFS can consume an order
    of magnitude less bandwidth than traditional file
    systems