Title: Bandwidth and latency optimizations
1. Bandwidth and latency optimizations
- Jinyang Li
- With Speculator slides from Ed Nightingale
2. What we've learnt so far
- Programming tools
- Consistency
- Fault tolerance
- Security
- Today: performance-boosting techniques
- Caching
- Leases
- Group commit
- Compression
- Speculative execution
3. Performance metrics
- Throughput
- Measures the achievable rate (ops/sec)
- Limited by the bottleneck resource
- Example: a 10 Mbps link allows at most ~150 ops/sec when each op writes an 8 KB block
- Increase throughput by using less of the bottleneck resource
- Latency
- Measures the time to complete a single client request
- Reduce the total latency of a sequence of operations by pipelining them
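The ~150 ops/sec figure is simple arithmetic. A minimal sketch, assuming the link is the only bottleneck and ignoring headers and RPC overhead (so a real system would do somewhat worse):

```python
# Upper bound on throughput when every operation must push a fixed number
# of bytes over a bottleneck link. Ignores headers and RPC overhead.

LINK_BPS = 10_000_000      # 10 Mbps, in bits per second
BLOCK_BYTES = 8 * 1024     # one 8 KB block per write operation

def max_ops_per_sec(link_bps: int, bytes_per_op: int) -> float:
    """Ops/sec if the link does nothing but carry bytes_per_op per operation."""
    return link_bps / (bytes_per_op * 8)

print(f"{max_ops_per_sec(LINK_BPS, BLOCK_BYTES):.1f} ops/sec")  # prints 152.6 ops/sec
```

Halving the bytes sent per op doubles the bound, which is the sense in which using less of the bottleneck resource raises throughput.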
4. Caching (in NFS)
- NFS clients cache file content and directory name mappings
- Caching saves network bandwidth and improves latency
5. Leases (not in NFS)
- Leases eliminate the latency of freshness checks, at the cost of keeping extra state at the server
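A toy sketch of the trade-off, assuming a single server, synchronized clocks and a fixed lease length; `LeaseServer` and `LeaseClient` are hypothetical names. Reads inside a lease cost no round trip, while the server must remember every lease it hands out and, in this version, refuses writes until outstanding leases expire (other lease schemes may instead revoke leases).

```python
from typing import Dict, Tuple

class LeaseServer:
    """Toy server; it must remember every outstanding lease (the extra state)."""
    def __init__(self, lease_secs: float) -> None:
        self.lease_secs = lease_secs
        self.files: Dict[str, bytes] = {}
        self.leases: Dict[str, float] = {}   # path -> latest expiry handed out

    def read_with_lease(self, path: str, now: float) -> Tuple[bytes, float]:
        expiry = now + self.lease_secs
        self.leases[path] = max(self.leases.get(path, 0.0), expiry)
        return self.files[path], expiry

    def write(self, path: str, data: bytes, now: float) -> bool:
        """Refuse (the caller retries later) while any lease on path is unexpired."""
        if now < self.leases.get(path, 0.0):
            return False
        self.files[path] = data
        return True

class LeaseClient:
    def __init__(self, server: LeaseServer) -> None:
        self.server = server
        self.cache: Dict[str, Tuple[bytes, float]] = {}   # path -> (data, lease expiry)
        self.round_trips = 0

    def read(self, path: str, now: float) -> bytes:
        cached = self.cache.get(path)
        if cached is not None and now < cached[1]:
            return cached[0]              # valid lease: no freshness check, no RTT
        self.round_trips += 1             # one RTT to fetch data and a new lease
        data, expiry = self.server.read_with_lease(path, now)
        self.cache[path] = (data, expiry)
        return data
```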
6. Group commit (in NFS)
- Group commit reduces the latency of a sequence of
writes
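A minimal sketch of why batching helps, counting disk flushes as a stand-in for latency; `ToyLog` and both commit functions are hypothetical. It assumes each flush costs about the same regardless of how many records it carries.

```python
from typing import List

class ToyLog:
    """Stand-in for a server's stable storage; counts the expensive flushes."""
    def __init__(self) -> None:
        self.records: List[bytes] = []
        self.flushes = 0

    def flush(self, batch: List[bytes]) -> None:
        self.records.extend(batch)
        self.flushes += 1   # one disk sync, whatever the batch size

def commit_one_by_one(log: ToyLog, writes: List[bytes]) -> None:
    for w in writes:
        log.flush([w])      # each write waits for its own sync

def group_commit(log: ToyLog, writes: List[bytes]) -> None:
    log.flush(list(writes))  # all pending writes share a single sync
```

Both strategies leave the same records on stable storage; only the number of syncs, and hence the waiting, differs.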
7. Two cool tricks
- Further optimization for bandwidth and latency is necessary in the wide area
- Wide-area network challenges
- Low bandwidth (10-100 Mbps)
- High latency (10-100 ms)
- Promising solutions
- Compression (LBFS)
- Speculative execution (Speculator)
8. Low Bandwidth File System
- Goal: avoid redundant data transfer between clients and the server
- Why isn't caching enough?
- A file with duplicate content → duplicate cache blocks
- Two files that share content → duplicate cache blocks
- A file that's modified → previous cache is useless
9. LBFS insight: name by content hash
- Traditional cache naming: (fh, offset)
- LBFS naming: SHA-1(cached block)
- Same contents have the same name
- Two identical files share cached blocks
- Cached blocks keep the same names despite file
changes
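A minimal sketch of naming cached blocks by content with Python's `hashlib.sha1`; `ContentCache` is a hypothetical name. Two files that share a block end up pointing at one cache entry, which (fh, offset) naming cannot achieve.

```python
import hashlib
from typing import Dict, List

class ContentCache:
    """Cache whose entries are named SHA-1(block) instead of (fh, offset)."""
    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        name = hashlib.sha1(block).hexdigest()
        self.chunks[name] = block   # identical content -> identical name -> one entry
        return name

    def put_file(self, blocks: List[bytes]) -> List[str]:
        """Cache every block of a file; returns the file's list of block names."""
        return [self.put(b) for b in blocks]
```

For example, caching `[b"header", b"body-1"]` and then `[b"header", b"body-2"]` stores three entries, not four.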
10. Naming granularity
- Name each file by its SHA-1 hash
- It's rare for two files to be exactly identical
- No cache reuse across file modifications
- Cut a file into 8 KB blocks, name each $[x \cdot 8\text{K}, (x+1) \cdot 8\text{K})$ range by hash
- If block boundaries misalign, two almost identical files could share no common block
- If block boundaries misalign, a new file could share no common block with its old version
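A quick experiment showing the misalignment problem, with 64-byte blocks standing in for 8 KB ones so it runs instantly (the block size is an assumption for illustration). Inserting one byte at the front shifts every fixed-size boundary, while appending at the end leaves every old block reusable.

```python
import hashlib
import random
from typing import List

BLOCK = 64  # stand-in for 8 KB

def fixed_block_names(data: bytes, size: int = BLOCK) -> List[str]:
    """Name each [x*size, (x+1)*size) range by its SHA-1 hash."""
    return [hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

rng = random.Random(1)
old = bytes(rng.randrange(256) for _ in range(4096))
prepended = b"X" + old      # one-byte insertion at the front
appended = old + b"tail"    # change at the end only

shared_front = set(fixed_block_names(old)) & set(fixed_block_names(prepended))
shared_end = set(fixed_block_names(old)) & set(fixed_block_names(appended))
print(len(shared_front), len(shared_end))
```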
11. Align boundaries across different files
- Idea: determine boundaries based on the actual content
- If two boundaries have the same 48-byte content, they probably correspond to the same position in a contiguous region of identical content
12. Align boundaries across different files
[Figure: two files sharing content; the same chunk hashes (ab9f..0a, 87e6b..f5) appear in both]
13. LBFS content-based chunking
- Examine every sliding window of 48 bytes
- Compute a 2-byte Rabin fingerprint f of each 48-byte window
- If the lower 13 bits of f equal a chosen value v, the window is a breakpoint
- 2 consecutive breakpoints define a chunk
- Average chunk size?
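A minimal sketch of content-defined chunking in the spirit of the slide: slide a 48-byte window, keep a rolling fingerprint, and cut wherever the lower 13 bits equal a fixed value. The modulus `Q` and breakpoint value `MAGIC` are arbitrary choices for illustration, and the fingerprint is the integer radix-256 form rather than a GF(2) polynomial Rabin fingerprint. Because a match occurs with probability about 1/2^13 at each position, chunks average roughly 2^13 bytes = 8 KB on random-looking data.

```python
import random
from typing import List

WINDOW = 48            # bytes per fingerprint window, as on the slide
BASE = 256             # the window is read as a radix-256 number
Q = (1 << 61) - 1      # assumed modulus (a Mersenne prime)
MASK = (1 << 13) - 1   # compare the lower 13 bits
MAGIC = 0x78           # assumed breakpoint value v

def chunk_boundaries(data: bytes) -> List[int]:
    """End offsets of content-defined chunks (the last one is always len(data))."""
    if len(data) < WINDOW:
        return [len(data)] if data else []
    top = pow(BASE, WINDOW - 1, Q)         # weight of the byte leaving the window
    f = 0
    for byte in data[:WINDOW]:
        f = (f * BASE + byte) % Q
    cuts: List[int] = []
    for end in range(WINDOW, len(data) + 1):
        if (f & MASK) == MAGIC:            # window data[end-48:end] is a breakpoint
            cuts.append(end)
        if end < len(data):                # slide the window one byte to the right
            f = ((f - data[end - WINDOW] * top) * BASE + data[end]) % Q
    if not cuts or cuts[-1] != len(data):
        cuts.append(len(data))
    return cuts

def chunks(data: bytes) -> List[bytes]:
    out, start = [], 0
    for end in chunk_boundaries(data):
        out.append(data[start:end])
        start = end
    return out

rng = random.Random(7)
data = bytes(rng.randrange(256) for _ in range(100_000))
shifted = b"a few new bytes at the front" + data   # same content, misaligned
old_chunks, new_chunks = chunks(data), chunks(shifted)
reused = set(old_chunks) & set(new_chunks)
print(f"{len(reused)} of {len(old_chunks)} chunks reusable")
```

Since a breakpoint depends only on the 48 bytes in the window, every breakpoint inside the shared content reappears in the shifted file, so only the chunk touching the new prefix differs.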
14. LBFS chunking
[Figure: the same content at different offsets in two files, with fingerprints f1-f4 in each]
- Two files with the same but misaligned content of x bytes
- How many fingerprints for each x-byte content? How many breakpoints? Are the breakpoints aligned?
15. Why Rabin fingerprints?
- Why not use the lower 13 bits of every 2-byte sliding window for breakpoints?
- Data is not random, resulting in extremely variable chunk sizes
- Rabin fingerprints compute a pseudo-random 2-byte value from 48 bytes of data
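16. Rabin fingerprint is fast
- Treat 48-byte data D as a 48-digit radix-256 number
- $f_{47}$ = fingerprint of $D_{0..47}$
- $f_{47} = (D_{47} + 256\,D_{46} + \cdots + 256^{46}\,D_1 + 256^{47}\,D_0) \bmod q$
- $f_{48}$ = fingerprint of $D_{1..48}$
- $f_{48} = ((f_{47} - D_0 \cdot 256^{47}) \cdot 256 + D_{48}) \bmod q$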
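The update rule can be checked numerically. A minimal sketch, assuming q = 65521 (the largest prime below 2^16, chosen here only so fingerprints fit in 2 bytes as on the slide): `direct` evaluates the 48-digit number by Horner's rule, and `roll` applies the $f_{48}$ formula, doing O(1) work per byte instead of O(48).

```python
import random
from typing import List

BASE = 256
WINDOW = 48
Q = 65521                          # assumed modulus
TOP = pow(BASE, WINDOW - 1, Q)     # 256^47 mod q, weight of the outgoing digit

def direct(window: bytes) -> int:
    """(D0*256^47 + D1*256^46 + ... + D47) mod q, via Horner's rule."""
    f = 0
    for digit in window:
        f = (f * BASE + digit) % Q
    return f

def roll(f_prev: int, outgoing: int, incoming: int) -> int:
    """Next fingerprint: ((f_prev - D_out * 256^47) * 256 + D_in) mod q."""
    return ((f_prev - outgoing * TOP) * BASE + incoming) % Q

def all_fingerprints_rolling(data: bytes) -> List[int]:
    """Fingerprints of every 48-byte window, computed incrementally."""
    f = direct(data[:WINDOW])
    out = [f]
    for i in range(len(data) - WINDOW):
        f = roll(f, data[i], data[i + WINDOW])
        out.append(f)
    return out

sample = bytes(random.Random(3).randrange(256) for _ in range(1000))
rolled = all_fingerprints_rolling(sample)
```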
17. LBFS reads
- File not in cache: client sends GETHASH; server replies with the file's chunk list (h1, size1, h2, size2, h3, size3)
- Client asks only for the chunks missing from its cache (here h1, h2): READ(h1,size1), READ(h2,size2)
- Client reconstructs the file as h1,h2,h3
- Fetching only the missing chunks saves bandwidth only by reusing common cached blocks across different files or different versions of the same file
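A toy, in-memory sketch of the read path; `ToyLbfsServer`, `ToyLbfsClient` and their methods are hypothetical stand-ins whose names mirror the slide's GETHASH and READ RPCs, not the real LBFS wire protocol. Reading a second version of a file fetches only the chunk that changed.

```python
import hashlib
from typing import Dict, List, Tuple

def sha1(b: bytes) -> str:
    return hashlib.sha1(b).hexdigest()

class ToyLbfsServer:
    """Stores each file as a list of chunks; method names mirror the slide's RPCs."""
    def __init__(self, files: Dict[str, List[bytes]]) -> None:
        self.files = files
        self.chunks = {sha1(c): c for cs in files.values() for c in cs}

    def gethash(self, path: str) -> List[Tuple[str, int]]:
        return [(sha1(c), len(c)) for c in self.files[path]]

    def read(self, h: str, size: int) -> bytes:
        return self.chunks[h][:size]

class ToyLbfsClient:
    def __init__(self, server: ToyLbfsServer) -> None:
        self.server = server
        self.cache: Dict[str, bytes] = {}     # chunk cache named by SHA-1
        self.bytes_fetched = 0

    def read_file(self, path: str) -> bytes:
        recipe = self.server.gethash(path)            # GETHASH
        for h, size in recipe:
            if h not in self.cache:                   # fetch only missing chunks
                self.cache[h] = self.server.read(h, size)   # READ(h, size)
                self.bytes_fetched += size
        return b"".join(self.cache[h] for h, _ in recipe)   # reconstruct
```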
18. LBFS writes
- Client sends MKTMPFILE(fd); server creates tmp file fd
- Client sends CONDWRITE(fd, h1,size1, h2,size2, h3,size3)
- Server replies with the missing chunks: HASHNOTFOUND(h1,h2)
- Client sends the missing chunks: TMPWRITE(fd, h1), TMPWRITE(fd, h2)
- Server constructs the tmp file from h1,h2,h3
- Client sends COMMITTMP(fd, target_fhandle); server copies the tmp file content to the target file
- Transferring only the missing chunks saves bandwidth if different files or different versions of the same file have pieces of identical content
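A matching toy sketch of the write path; `ToyLbfsWriteServer`, `write_file` and the method names are hypothetical, loosely mirroring MKTMPFILE, CONDWRITE, TMPWRITE and COMMITTMP (the real TMPWRITE also carries offsets, omitted here). Only chunks the server reports as missing are sent.

```python
import hashlib
import itertools
from typing import Dict, List, Tuple

def sha1(b: bytes) -> str:
    return hashlib.sha1(b).hexdigest()

class ToyLbfsWriteServer:
    def __init__(self) -> None:
        self.chunk_store: Dict[str, bytes] = {}
        self.files: Dict[str, bytes] = {}
        self.tmp: Dict[int, List[Tuple[str, int]]] = {}
        self._fds = itertools.count(1)

    def mktmpfile(self) -> int:
        fd = next(self._fds)
        self.tmp[fd] = []
        return fd

    def condwrite(self, fd: int, recipe: List[Tuple[str, int]]) -> List[str]:
        """Record the recipe; return hashes the server lacks (HASHNOTFOUND)."""
        self.tmp[fd] = list(recipe)
        return [h for h, _ in recipe if h not in self.chunk_store]

    def tmpwrite(self, fd: int, data: bytes) -> None:
        self.chunk_store[sha1(data)] = data

    def committmp(self, fd: int, target: str) -> None:
        recipe = self.tmp.pop(fd)
        self.files[target] = b"".join(self.chunk_store[h] for h, _ in recipe)

def write_file(server: ToyLbfsWriteServer, target: str, chunks: List[bytes]) -> int:
    """Write a file chunk-wise; returns the number of payload bytes sent."""
    fd = server.mktmpfile()
    by_hash = {sha1(c): c for c in chunks}
    missing = server.condwrite(fd, [(sha1(c), len(c)) for c in chunks])
    sent = 0
    for h in dict.fromkeys(missing):      # de-duplicate, keep order
        server.tmpwrite(fd, by_hash[h])
        sent += len(by_hash[h])
    server.committmp(fd, target)
    return sent
```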
19. LBFS evaluations
- In practice, there is a lot of content overlap among different files and different versions of the same file
- Save a Word document
- Recompile after a header change
- Different versions of a software package
- LBFS results in 1/10 the bandwidth use
20. Speculative Execution in a Distributed File System
- Nightingale et al.
- SOSP '05
21. How to reduce latency in a file system?
- What are potentially wasteful latencies?
- Freshness check
- Client issues GETATTR before reading from cache
- Incurs an extra RTT for read
- Why wasteful? Most GETATTRs confirm that the cached data is fresh
- Commit ordering
- Client waits for the commit of modification X to finish before starting modification Y
- No pipelining of modifications X and Y
- Why wasteful? Most commits succeed!
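22. Key Idea: Speculate on RPC responses
- Without speculation: the client sends an RPC request and blocks until the RPC response arrives
- With speculation:
1) Checkpoint the process, then send the RPC request
2) Speculate! Keep executing with the predicted response
3) When the RPC response arrives, was the prediction correct?
- Yes: discard the checkpoint
- No: restore the process from the checkpoint and re-execute
- Guarantees without blocking I/O!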
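A single-process toy sketch of the checkpoint, speculate, verify cycle, with `copy.deepcopy` standing in for a kernel checkpoint; `run_speculatively` and its parameters are hypothetical. It is only safe because the continuation touches process-local state (external output would have to wait until the speculation resolves), and for simplicity the response is awaited after the continuation runs, whereas a real system overlaps the two.

```python
import copy
from typing import Any, Callable, Dict

State = Dict[str, Any]

def run_speculatively(state: State,
                      predicted: Any,
                      continuation: Callable[[State, Any], None],
                      actual_response: Callable[[], Any]) -> State:
    checkpoint = copy.deepcopy(state)   # 1) checkpoint
    continuation(state, predicted)      # 2) speculate: keep running with the guess
    actual = actual_response()          # the RPC response arrives
    if actual == predicted:             # 3) correct?
        return state                    #    yes: discard the checkpoint
    continuation(checkpoint, actual)    #    no: restore and re-execute
    return checkpoint

def use_cache(st: State, fresh: bool) -> None:
    """Example continuation: what the process does after a GETATTR."""
    st["log"].append("read cache" if fresh else "refetch")
```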
23. Conditions for useful speculation
- Operations are highly predictable
- Checkpoints are cheaper than network I/O
- 52 µs for a small process
- Computers have resources to spare
- Need memory and CPU cycles for speculation
24. Implementing Speculation
1) System call
2) Create speculation: checkpoint the process
[Figure: process timeline; the checkpoint is taken when the speculation is created]
25. Speculation Success
1) System call
2) Create speculation: checkpoint the process
3) Commit speculation: discard the checkpoint
[Figure: process timeline with the checkpoint]
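26. Speculation Failure
1) System call
2) Create speculation: checkpoint the process
3) Fail speculation: restore the process from the checkpoint
[Figure: process timeline; on failure the process is replaced by the copy restored from the checkpoint]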
27. Ensuring Correctness
- Speculative processes hit barriers when they need to affect external state
- Cannot roll back an external output
- Three ways to ensure correct execution
- Block
- Buffer
- Propagate speculations (dependencies)
- Need to examine the syscall interface to decide how to handle each syscall
28. Handling system calls
- Block calls that externalize state
- Allow read-only calls (e.g. getpid)
- Allow calls that modify only task state (e.g. dup2)
- File system calls: need to dig deeper
- Mark file systems that support Speculator
- Examples:
- getpid: call sys_getpid()
- reboot: block until speculations are resolved
- mkdir: allow only if the file system supports Speculator
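A hedged sketch of the per-syscall decision as a lookup; the sets of call names are a tiny hypothetical sample, not Speculator's actual classification, and anything unrecognized defaults to blocking, the safe choice for calls that might externalize state.

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block until speculations resolve"

# Hypothetical, deliberately tiny classification for illustration only.
READ_ONLY = {"getpid"}            # no state changes at all
TASK_STATE_ONLY = {"dup2"}        # changes only the calling task's own state
FILE_SYSTEM = {"stat", "mkdir"}   # depends on the file system

def handle(syscall: str, speculative: bool, fs_supports_speculator: bool) -> Action:
    if not speculative:
        return Action.ALLOW               # nothing to roll back: run normally
    if syscall in READ_ONLY or syscall in TASK_STATE_ONLY:
        return Action.ALLOW               # task state is covered by the checkpoint
    if syscall in FILE_SYSTEM:
        return Action.ALLOW if fs_supports_speculator else Action.BLOCK
    return Action.BLOCK                   # e.g. reboot: might externalize state
```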
29. Output Commits
1) sys_stat: checkpoint, speculate that stat worked
2) sys_mkdir: checkpoint, speculate that mkdir worked
3) Commit speculation
[Figure: process timeline with one checkpoint per system call]
30. Multi-Process Speculation
- Processes often cooperate
- Example: make forks children to compile, link, etc.
- Would block if speculation were limited to one task
- Allow kernel objects to have speculative state
- Examples: inodes, signals, pipes, Unix sockets, etc.
- Propagate dependencies among objects
- Objects are rolled back to prior states when speculations fail
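A toy sketch of dependency propagation, assuming integer speculation ids and that speculations commit oldest-first; `KObj`, `propagate`, `fail` and `commit` are hypothetical names. An object records its value before each speculation it comes to depend on, so a failure can roll it back.

```python
from typing import Iterable, List, Set, Tuple

class KObj:
    """Toy kernel object (process, inode, pipe, ...) that may hold speculative state."""
    def __init__(self, name: str, value: object) -> None:
        self.name = name
        self.value = value
        # (speculation id, value before that speculation first touched this object)
        self.history: List[Tuple[int, object]] = []

    @property
    def deps(self) -> Set[int]:
        return {spec for spec, _ in self.history}

    def depend_on(self, specs: Set[int]) -> None:
        """Checkpoint this object for every speculation it newly depends on."""
        for spec in sorted(specs - self.deps):
            self.history.append((spec, self.value))

def propagate(src: KObj, dst: KObj) -> None:
    """Information flows src -> dst, so dst inherits src's speculations."""
    dst.depend_on(src.deps)

def fail(spec: int, objs: Iterable[KObj]) -> None:
    """Roll back every object that depends on the failed speculation."""
    for obj in objs:
        for i, (s, before) in enumerate(obj.history):
            if s == spec:
                obj.value = before
                del obj.history[i:]   # later speculative state is discarded too
                break

def commit(spec: int, objs: Iterable[KObj]) -> None:
    """Assumes oldest-first commits, so dropping the entry loses no needed undo data."""
    for obj in objs:
        obj.history = [(s, v) for s, v in obj.history if s != spec]

# Loosely following the slides: pid 8000 speculatively chowns inode 3456,
# then pid 8001 reads that inode.
proc = KObj("pid 8000", "running")
proc.depend_on({1})                 # speculation 1 created for the chown RPC
inode = KObj("inode 3456", "owner=root")
propagate(proc, inode)              # checkpoint the inode before modifying it
inode.value = "owner=alice"
reader = KObj("pid 8001", "running")
propagate(inode, reader)            # the reader now depends on speculation 1
```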
31. Multi-Process Speculation
[Figure: processes pid 8000 and pid 8001 and inode 3456, each with checkpoints; the speculations Chown-1 and Write-1 propagate among them]
32. Multi-Process Speculation
- What's handled
- DFS objects, RAMFS, Ext3, Pipes & FIFOs
- Unix Sockets, Signals, Fork & Exit
- What's not handled (i.e. block)
- System V IPC
- Multi-process write-shared memory
33. Example: NFSv3 Linux
[Figure: timeline for Client 1, Client 2 and the Server. Client 1 modifies B (Write, then Commit); Client 2 then opens B (Getattr)]
34. Example: SpecNFS
[Figure: the same scenario with SpecNFS. Client 1 modifies B and speculates while Write+Commit is in flight; Client 2 opens B twice, speculating on each Getattr]
35. Problem: Mutating Operations
1. Client 1: cat foo > bar
2. Client 2: cat bar
- bar depends on the speculative execution of cat foo
- If bar's state could be speculative, what does client 2 view in bar?
36. Solution: Mutating Operations
- Server determines speculation success/failure
- State at server is never speculative
- Clients send the server the hypotheses their speculation is based on
- List of speculations an operation depends on
- Server reports failed speculations
- Server performs in-order processing of messages
37. Server checks speculation status
[Figure: Client 1 runs cat foo > bar and sends Write+Commit to the server]
- Server checks whether foo indeed has version 1; if not, the speculation fails
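A toy sketch of the server side, assuming each message carries its speculation id, the (object, version) pairs the client assumed, and the earlier speculations it depends on; `SpecServer` and `Message` are hypothetical. The server handles messages strictly in order, applies a mutation only if every hypothesis holds, and records failures to report back, so its own state is never speculative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

@dataclass
class Message:
    spec_id: int
    hypotheses: List[Tuple[str, int]]                  # (object, version client assumed)
    depends_on: Set[int] = field(default_factory=set)  # earlier speculations relied on
    mutation: Optional[Tuple[str, bytes]] = None       # (object, new contents)

class SpecServer:
    def __init__(self) -> None:
        self.versions: Dict[str, int] = {}
        self.data: Dict[str, bytes] = {}
        self.failed: Set[int] = set()

    def put(self, obj: str, payload: bytes) -> None:
        """Non-speculative update; bumps the object's version."""
        self.data[obj] = payload
        self.versions[obj] = self.versions.get(obj, 0) + 1

    def process(self, msg: Message) -> bool:
        """Called once per message, in arrival order."""
        ok = not (msg.depends_on & self.failed) and all(
            self.versions.get(obj, 0) == v for obj, v in msg.hypotheses)
        if not ok:
            self.failed.add(msg.spec_id)   # reported back to the client
            return False
        if msg.mutation is not None:
            self.put(*msg.mutation)
        return True
```

For `cat foo > bar`, the client would send a message whose hypothesis is `("foo", 1)` and whose mutation writes bar; if another client has since bumped foo to version 2, the message is rejected.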
38. Group Commit
- Previously sequential ops are now concurrent
- Sync ops usually committed to disk
- Speculator makes group commit possible
[Figure: client/server timelines of write and commit operations, without and with Speculator]
39. Putting it Together: SpecNFS
- Apply Speculator to an existing file system
- Modified NFSv3 in Linux 2.4 kernel
- Same RPCs issued (but many now asynchronous)
- SpecNFS has the same consistency and safety as NFS
- Getattr, lookup, access speculate if data in cache
- Create, mkdir, commit, etc. always speculate
40. Putting it Together: BlueFS
- Design a new file system for Speculator
- Single copy semantics
- Synchronous I/O
- Each file, directory, etc. has a version number
- Incremented on each mutating op (e.g. on write)
- Checked prior to all operations
- Many ops speculate and check versions asynchronously
41. Apache Benchmark
- SpecNFS up to 14 times faster
42. Rollback cost is small
- Even with all files out of date, SpecNFS is up to 11x faster
43. What we've learnt today
- Traditional performance-boosting techniques
- Caching
- Group commit
- Leases
- Two new techniques
- Content-based hash and chunking
- Speculative execution