Title: Network File Systems II
1. Network File Systems II
- Frangipani: A Scalable Distributed File System
- A Low-Bandwidth Network File System
2. Why Network File Systems?
- Scalability
- support more users and data
- handle server failure gracefully
- Improved accessibility
- allow more users access
- extend conditions under which access is feasible
3. File System Requirements
- Coherence: consistent, predictable file state
- Efficiency: timely reads and writes
- Security: provide access control
- Recoverability: allow backup of the file system
4. Frangipani and LBFS
- Frangipani file system: transparent scalability
- easy administration at any scale
- takes advantage of parallelism for good performance
- Low-Bandwidth File System (LBFS): reduces bandwidth to increase performance
- takes advantage of duplicate file information
- uses caching and compression to limit data volume
5. Features of Frangipani
- Petal: shared virtual disk
- Frangipani provides naming and structure for Petal
- Lock system distributed across servers
- Leases manage connections with lower state requirements
- Backups generated from Petal snapshots using the recovery process
6. An Example Configuration
7. The Petal Virtual Disk
- Storage read/written in blocks
- Sparse 2^64-byte address space
- Physical storage allotted only on write
- Allows replication for high availability
- Read-only snapshot feature
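To make allocate-on-write concrete, here is a minimal Python sketch of a Petal-style sparse virtual disk; the class, block size, and in-memory dictionaries are illustrative stand-ins, not Petal's actual interface.

```python
class SparseVirtualDisk:
    BLOCK_SIZE = 64 * 1024  # assumed block granularity for this sketch

    def __init__(self):
        self.mapping = {}   # virtual block number -> physical block number
        self.storage = {}   # physical block -> data (stands in for real disks)
        self.next_physical = 0

    def write(self, vblock: int, data: bytes) -> None:
        # Physical storage is allotted only on the first write to a virtual block.
        if vblock not in self.mapping:
            self.mapping[vblock] = self.next_physical
            self.next_physical += 1
        self.storage[self.mapping[vblock]] = data

    def read(self, vblock: int) -> bytes:
        # Unwritten parts of the sparse 2**64 address space read as zeros.
        if vblock not in self.mapping:
            return bytes(self.BLOCK_SIZE)
        return self.storage[self.mapping[vblock]]
```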
8. Frangipani Disk Layout
- Region 1: disk configuration info (1 TB)
- Region 2: log space (1 TB), divided into 256 individual server logs
- Region 3: allocation bitmaps (3 TB), chunks owned by individual servers
9. More Frangipani Disk Layout
- Region 4: inodes (1 TB), 2^31 inodes of 512 bytes each
- Region 5: small data blocks (128 TB), 2^35 blocks at 4 KB each
- Region 6: large data blocks, 2^24 blocks of 1 TB each
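As a sanity check on the layout, the sketch below adds up the region sizes listed on these two slides; the boundaries are illustrative, and the remainder of the 2^64-byte space is what holds the large blocks.

```python
TB = 2**40

# Region sizes as listed on these slides; the real on-disk boundaries
# may differ, so treat the offsets as illustrative.
regions = [
    ("configuration info",   1 * TB),
    ("server logs",          1 * TB),
    ("allocation bitmaps",   3 * TB),
    ("inodes",               1 * TB),
    ("small data blocks",  128 * TB),
]

offset = 0
for name, size in regions:
    print(f"{name:20} starts at {offset // TB:>4} TB")
    offset += size

# The remainder of the 2**64-byte address space holds the large 1 TB blocks:
print(f"large data blocks    start at {offset // TB:>4} TB")
print(f"room for {(2**64 - offset) // TB} large blocks")  # just under 2**24
```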
10. Frangipani Server Logs
- Bounded at 128 KB, split across physical disks
- Circular buffer scheme: 25% reclaimed when full
- Uses sequence numbers to mark the wrap point
- 1000 to 1600 operations can be held in the log (entry size 80 to 128 bytes)
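The sketch below shows how sequence numbers reveal the wrap point during a recovery scan; the slot layout and names are invented for illustration.

```python
def find_log_head(slots):
    """slots: list of (sequence_number, record) pairs in physical order."""
    for i in range(1, len(slots)):
        if slots[i][0] < slots[i - 1][0]:
            return i  # slot i is the oldest record; slot i-1 is the newest
    return 0          # sequence numbers never wrapped within the buffer

# After wrapping, physical order no longer matches logical order:
slots = [(5, "op-e"), (6, "op-f"), (3, "op-c"), (4, "op-d")]
assert find_log_head(slots) == 2  # replay begins at sequence number 3
```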
11. Server Logging
- Write-ahead redo policy
- File metadata and file data updated on disk after the log write
- Unix update daemon handles disk writes every 30 seconds
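A minimal sketch of the write-ahead redo discipline, with invented names throughout: the redo record must be durable before the change it describes reaches the disk, and a periodic daemon pass flushes the deferred writes.

```python
import collections

stable_log = []              # stands in for the on-disk log region
stable_metadata = {}         # stands in for on-disk metadata
dirty = collections.deque()  # logged changes not yet written to disk

def metadata_update(key, value):
    stable_log.append((key, value))  # 1. force the redo record out first
    dirty.append((key, value))       # 2. only now may the change go to disk

def update_daemon_pass():
    # A Unix update daemon runs a pass like this roughly every 30 seconds.
    while dirty:
        key, value = dirty.popleft()
        stable_metadata[key] = value
```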
12. Lock Service
- Many-reader / single-writer sticky locks
- Asynchronous communication
- Lamport's Paxos algorithm replicates infrequently-changed data
- Heartbeat messages determine liveness
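A small sketch of heartbeat-based liveness tracking as a lock service might do it; the 30-second lease term and all names are assumptions for illustration.

```python
import time

LEASE_SECONDS = 30   # assumed lease term
last_heartbeat = {}  # server id -> time of most recent heartbeat

def heartbeat(server_id):
    last_heartbeat[server_id] = time.monotonic()

def is_live(server_id):
    beat = last_heartbeat.get(server_id)
    return beat is not None and time.monotonic() - beat < LEASE_SECONDS
```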
13. Locking: Avoiding Contention
- A single lockable data structure per disk sector eliminates false sharing
- Each file, directory, or symlink and its inode treated as a single lockable segment
- Locks for multiple segments acquired in a fixed global order to avoid deadlock (see the sketch below)
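The classic way to make multi-lock acquisition deadlock-free is to take locks in one global order, as sketched below; Python thread locks stand in for the distributed locks, and ordering by lock ID mirrors ordering by disk address.

```python
import threading

locks = {i: threading.Lock() for i in range(8)}  # lock ID -> lock

def acquire_all(lock_ids):
    held = []
    for lid in sorted(set(lock_ids)):  # a single global order prevents cycles
        locks[lid].acquire()
        held.append(lid)
    return held

def release_all(held):
    for lid in reversed(held):
        locks[lid].release()
```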
14. Crash Recovery
- Server crash detected through lapsed leases and lack of network response
- A recovery daemon takes ownership of the crashed server's log and locks
- Metadata sequence numbers prevent replay of already-applied updates
- No high-level semantic guarantees to users!
- A Petal snapshot can be used for entire-system recovery
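A sketch of how per-block sequence (version) numbers suppress replay during recovery; the record layout is invented for illustration.

```python
disk_version = {}  # block id -> version currently on disk
disk_data = {}

def redo(record):
    block, version, data = record
    if disk_version.get(block, 0) < version:  # apply only newer updates
        disk_data[block] = data
        disk_version[block] = version

# Replaying a log twice is harmless:
log = [("b1", 1, "old"), ("b1", 2, "new")]
for rec in log + log:
    redo(rec)
assert disk_data["b1"] == "new"
```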
15. Performance Benchmarks
16. Frangipani Conclusions
- Frangipani meets the goals set for it
- coherent access
- easy administration
- scalable performance (limit is network itself)
- good failure recovery
- Testing at a larger scale will be the true test of Frangipani
17. Introduction to LBFS
- Designed for efficient remote file access over low-bandwidth networks
- Exploits similarities between files and file versions
- Client maintains a large cache of working files
- Compression further reduces data volume
- Uses the NFS protocol for access control and access to existing file systems
18. Why Do We Need LBFS?
- Typical network file systems are designed for 10 Mbit/sec or better bandwidth
- Problems using a file system over a WAN:
- interactive programs that freeze
- batch commands that run several times slower
- less aggressive applications are starved
- some applications may not run at all!
19. Why LBFS? (continued)
- Downloading and editing files locally can lead to version conflicts
- Upstream bandwidth is still limited with broadband
- LBFS eliminates these problems while still preserving consistency
20. LBFS File Chunk Scheme
- To exploit commonality, files need to be broken into chunks
- Server and client keep an index of hashed chunks
- Server index has chunk hashes for the entire FS
- Client index has chunk hashes for working files
21. Chunk Creation Algorithm
- Need to handle shifting offsets while keeping the chunk index manageable
- Examine every overlapping 48-byte region of the file
- With probability 2^-13, consider a region to be a breakpoint, or file chunk end marker
22. Rabin Fingerprints
- Rabin fingerprints help find breakpoints
- Polynomial representation of data modulo an irreducible polynomial
- When the low 13 bits of a region's fingerprint equal a chosen value, that region is selected as a breakpoint
- Given random data, the expected chunk size is 2^13 = 8192 bytes (8 KB), plus the 48-byte breakpoint region
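The selection rule itself is simple, as the toy sketch below shows; zlib.crc32 stands in for a real Rabin fingerprint (which would be computed incrementally with polynomial arithmetic over GF(2)), and the magic value is arbitrary.

```python
import zlib

MASK = 0x1FFF   # low 13 bits -> a region matches with probability 2**-13
MAGIC = 0x0F17  # arbitrary fixed 13-bit target value

def is_breakpoint(window: bytes) -> bool:
    """window: one 48-byte region of the file."""
    return zlib.crc32(window) & MASK == MAGIC
```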
23. File Revisions With Breakpoints
- a. Original file
- b. Text insertion
- c. Insertion that includes a breakpoint
- d. Elimination of a breakpoint
24. Breakpoint Pathological Cases
- Data is usually not random! Worst-case scenarios:
- all 48-byte regions are breakpoints: the chunk index is the same size as the file
- no 48-byte regions are breakpoints: large chunks take extra time and memory for RPC
- Solution: define bounds (see the sketch below)
- min chunk size: 2 KB
- max chunk size: 64 KB
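Putting the pieces together, here is a sketch of a bounded chunking pass; again zlib.crc32 over each 48-byte window is a stand-in for a rolling Rabin fingerprint, and the magic value is an arbitrary choice both sides would have to share.

```python
import zlib

WINDOW, MASK, MAGIC = 48, 0x1FFF, 0x0F17
MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024

def chunk_boundaries(data: bytes):
    start = 0
    for i in range(WINDOW, len(data) + 1):
        length = i - start
        hit = zlib.crc32(data[i - WINDOW:i]) & MASK == MAGIC
        # The bounds defend against pathological, non-random data:
        if (hit and length >= MIN_CHUNK) or length >= MAX_CHUNK:
            yield (start, i)
            start = i
    if start < len(data):
        yield (start, len(data))  # final partial chunk
```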
25. The Chunk Database
- Each chunk indexed by the first 64 bits of its SHA-1 hash
- Keys index <file, offset, count> tuples, which must be updated when a chunk changes
- LBFS always recomputes the hash value before use
- hash collisions are detected
- the penalty for bad DB data is only a performance hit
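A sketch of the verify-before-use discipline, with an invented database shape: the key is the first 64 bits of the chunk's SHA-1 hash, and the hash is recomputed from the file data before the chunk is trusted.

```python
import hashlib

chunk_db = {}  # 64-bit key -> (path, offset, count); entries may be stale

def db_key(chunk: bytes) -> int:
    return int.from_bytes(hashlib.sha1(chunk).digest()[:8], "big")

def lookup_chunk(key: int):
    entry = chunk_db.get(key)
    if entry is None:
        return None
    path, offset, count = entry
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(count)
    # Recomputing catches hash collisions and stale entries alike, so bad
    # database data costs only a performance hit, never correctness.
    return data if db_key(data) == key else None
```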
26. Benefits Provided by NFS 3
- NFS 3 identifies files by opaque handles that persist through file renaming
- Handles access control for LBFS
- Allows LBFS to use the NFS protocol to access existing file systems
- Disadvantage: the i-number is not changed when a file is overwritten, so an extra copy is required
27. LBFS Protocol Enhancements
- Leases save permission checks and data validation for recently-accessed files
- Uses RPC, but with aggressive pipelining
- Gzip compression
28. Maintaining File Consistency
- Close-to-open consistency
- Client needs a whole-file cache
- Multiple processes on a single client are allowed write access to the same file simultaneously
- LBFS writes back to the file system on each close
- the last close overwrites previous changes
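A compact sketch of close-to-open consistency with whole-file write-back on close; all names are illustrative, and a dictionary stands in for the server's file system.

```python
server_files = {}  # path -> file contents on the server

class CachedFile:
    def __init__(self, path):
        self.path = path
        self.data = server_files.get(path, b"")  # open sees the latest close

    def write(self, data: bytes):
        self.data = data  # a local change only, invisible to other clients

    def close(self):
        server_files[self.path] = self.data  # whole file written back

f1, f2 = CachedFile("/doc"), CachedFile("/doc")
f1.write(b"first"); f2.write(b"second")
f1.close(); f2.close()
assert server_files["/doc"] == b"second"  # the last close wins
```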
29. Profile of a Read Request
30. Profile of a Write Request
31. Security: One Concern
- Through systematic use of the CONDWRITE RPC call, it is possible to determine whether a particular hashed chunk exists in the file system; its presence is given away by response-time variations
32. LBFS Server Implementation
- LBFS can run on top of another FS
- the server pretends to be an NFS client
- Server creates a .lbfs.trash dir at the root of every exported system
- stores temp files indefinitely, garbage-collecting a random file when full
33. LBFS Client Implementation
- Client uses the xfs device driver
- passes messages through a device node in /dev
- xfs tells LBFS when to transfer file contents to/from the server
- LBFS fetches files to the client cache and notifies the xfs driver of bindings between cache contents and open files
34. LBFS Performance Testing
- LBFS consumed far less bandwidth and allowed better application performance under test conditions
- Workloads tested were typical applications of MS Word, gcc, and ed
- CIFS, NFS, and AFS were tested (based on workload) for comparison
- Also tested a version with only leases and gzip
35. LBFS Conclusions
- In low-bandwidth networks, LBFS outperforms the traditional file systems tested
- similar consistency guarantees
- implemented as a transparent layer on top of an existing file system
- public-key cryptography provides security
- client caching distributes load and reduces network dependency
36. Last Word: Frangipani and LBFS
- Both Frangipani and LBFS meet file system and distributed system requirements, but they targeted different problems
- Frangipani achieved transparent scalability without performance loss
- LBFS achieved feasible performance over WANs as a transparent add-on to a traditional FS, using improved protocols and load sharing