1
Bandwidth and latency optimizations
  • Jinyang Li
  • w/ speculator slides from Ed Nightingale

2
What we've learnt so far
  • Programming tools
  • Consistency
  • Fault tolerance
  • Security
  • Today: performance boosting techniques
  • Caching
  • Leases
  • Group commit
  • Compression
  • Speculative execution

3
Performance metrics
  • Throughput
  • Measures the achievable rate (ops/sec)
  • Limited by the bottleneck resource
  • 10 Mbps link: max ~150 ops/sec for writing 8 KB
    blocks
  • Increase tput by using less bottleneck resource
  • Latency
  • Measures the latency of a single client response
  • Reduce latency by pipelining multiple operations
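
The 150 ops/sec figure on this slide can be checked with a little arithmetic. The sketch below assumes the link is the only bottleneck and ignores protocol overhead; the function name is illustrative.

```python
LINK_BITS_PER_SEC = 10_000_000   # 10 Mbps link from the slide
BLOCK_BYTES = 8 * 1024           # 8 KB block

def max_ops_per_sec(link_bps: int, block_bytes: int) -> float:
    """Upper bound on ops/sec when the link is the only bottleneck."""
    return link_bps / (block_bytes * 8)

print(round(max_ops_per_sec(LINK_BITS_PER_SEC, BLOCK_BYTES)))  # 153
```

Sending fewer bytes per operation raises the bound proportionally, which is the sense in which using less of the bottleneck resource increases throughput.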

4
Caching (in NFS)
  • NFS clients cache file content and directory name
    mappings
  • Caching saves network bandwidth, improves latency

5
Leases (not in NFS)
  • Leases eliminate latency in freshness check, at
    the cost of keeping extra state at the server

6
Group commit (in NFS)
  • Group commit reduces the latency of a sequence of
    writes

7
Two cool tricks
  • Further optimization for b/w and latency is
    necessary for wide area
  • Wide area network challenges
  • Low bandwidth (10-100 Mbps)
  • High latency (10-100 ms)
  • Promising solutions
  • Compression (LBFS)
  • Speculative execution (Speculator)

8
Low Bandwidth File System
  • Goal: avoid redundant data transfer between
    clients and the server
  • Why isn't caching enough?
  • A file with duplicate content → duplicate cache
    blocks
  • Two files that share content → duplicate cache
    blocks
  • A file that's modified → previous cache is useless

9
LBFS insight: name by content hash
  • Traditional cache naming: (fh, offset)
  • LBFS naming: SHA-1(cached block)
  • Same contents have the same name
  • Two identical files share cached blocks
  • Cached blocks keep the same names despite file
    changes
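
A minimal sketch of naming cached blocks by content hash, assuming an in-memory dictionary as the cache. The `ContentCache` class is hypothetical and only illustrates the naming idea, not the LBFS implementation.

```python
import hashlib

class ContentCache:
    """Toy block cache keyed by the SHA-1 of the block contents."""

    def __init__(self):
        self.blocks = {}  # hex digest -> bytes

    def put(self, block: bytes) -> str:
        name = hashlib.sha1(block).hexdigest()
        self.blocks[name] = block   # identical content maps to the same entry
        return name

    def get(self, name: str):
        return self.blocks.get(name)

cache = ContentCache()
a = cache.put(b"shared header\n")   # block that appears in file A
b = cache.put(b"shared header\n")   # the same bytes appearing in file B
print(a == b, len(cache.blocks))    # True 1
```

With (fh, offset) naming the two blocks would occupy two cache entries; here they collapse into one.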

10
Naming granularity
  • Name each file by its SHA-1 hash
  • It's rare for two files to be exactly identical
  • No cache reuse across file modifications
  • Cut a file into 8 KB blocks, name each
    [x·8K, (x+1)·8K) range by hash
  • If block boundaries misalign, two almost
    identical files could share no common block
  • If block boundaries misalign, a new file could
    share no common block with its old version

11
Align boundaries across different files
  • Idea: determine boundary based on the actual
    content
  • If two boundaries have the same 48-byte content,
    they probably correspond to the same position in
    a contiguous region of identical content

12
Align boundaries across different files
[Figure: two files whose chunks carry the same hashes, ab9f..0a and 87e6b..f5]

13
LBFS content-based chunking
  • Examine every sliding window of 48 bytes
  • Compute a 2-byte Rabin fingerprint f of each 48-byte
    window
  • If the lower 13 bits of f equal a chosen value v, f
    corresponds to a breakpoint
  • 2 consecutive breakpoints define a chunk
  • Average chunk size?
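
The chunking rule can be sketched as follows. This is a simplified illustration, not the LBFS code: it uses a radix-256 rolling hash modulo an assumed prime `Q` in place of a true polynomial Rabin fingerprint, and the magic value `V` is arbitrary. If the low 13 bits behave uniformly, a breakpoint occurs at roughly one position in 2^13, so the average chunk is about 8 KB.

```python
import random

WINDOW = 48                      # bytes per sliding window (from the slide)
MASK = (1 << 13) - 1             # compare the low 13 bits of the fingerprint
Q = 65521                        # assumed modulus: a prime that fits in 2 bytes
V = 0x0078                       # assumed magic value v; any 13-bit constant works
TOP = pow(256, WINDOW - 1, Q)    # weight of the byte that leaves the window

def chunk(data: bytes) -> list:
    """Split data at content-defined breakpoints."""
    chunks, start, f = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            f = (f - data[i - WINDOW] * TOP) % Q   # drop the oldest byte
        f = (f * 256 + byte) % Q                   # shift in the new byte
        if i + 1 >= WINDOW and (f & MASK) == V:
            chunks.append(data[start:i + 1])       # breakpoint after byte i
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                # trailing partial chunk
    return chunks

rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(200_000))
new = b"xyz" + old                                 # same content, shifted by 3 bytes
shared = set(chunk(old)) & set(chunk(new))
print(len(chunk(old)), "chunks,", len(shared), "shared after the shift")
```

Because a breakpoint depends only on the 48 bytes in the window, every chunk after the first breakpoint reappears unchanged in the shifted file, which fixed 8 KB blocks cannot guarantee.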

14
LBFS chunking
[Figure: two files containing the same fingerprints f1, f2, f3, f4]
  • Two files with the same but misaligned content of
    x bytes
  • How many fingerprints for each x-byte content?
    How many breakpoints? Breakpoints aligned?

15
Why Rabin fingerprints?
  • Why not use the lower 13 bits of every 2-byte
    sliding window for breakpoints?
  • Data is not random, resulting in extremely
    variable chunk size
  • A Rabin fingerprint computes a pseudo-random 2-byte
    value out of 48 bytes of data

16
Rabin fingerprint is fast
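  • Treat 48-byte data D as a 48-digit radix-256
    number
  • $f_{47}$: fingerprint of $D_{0..47}$
  • $f_{47} = (D_{47} + 256\,D_{46} + \cdots + 256^{46} D_1 + 256^{47} D_0) \bmod q$
  • $f_{48}$: fingerprint of $D_{1..48}$
  • $f_{48} = ((f_{47} - D_0 \cdot 256^{47}) \cdot 256 + D_{48}) \bmod q$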
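
The incremental update can be checked against direct evaluation with a short sketch. The modulus `Q` is an assumed prime, and the function names are illustrative.

```python
Q = 65521      # assumed prime modulus q
BASE = 256     # radix: one digit per byte
W = 48         # window size in bytes

def fingerprint(window: bytes) -> int:
    """Direct evaluation: treat the window as a radix-256 number mod q."""
    f = 0
    for byte in window:
        f = (f * BASE + byte) % Q
    return f

def roll(f_prev: int, outgoing: int, incoming: int) -> int:
    """Incremental update: remove the top digit, shift, append a digit."""
    top = pow(BASE, W - 1, Q)
    return ((f_prev - outgoing * top) * BASE + incoming) % Q

data = bytes(range(100))
f47 = fingerprint(data[0:48])             # fingerprint of D[0..47]
f48 = roll(f47, data[0], data[48])        # fingerprint of D[1..48]
print(f48 == fingerprint(data[1:49]))     # True
```

Each step costs a constant number of arithmetic operations regardless of the window size, which is why sliding the window one byte at a time stays cheap.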

17
LBFS reads
  • Client sends GETHASH (file not in cache)
  • Server replies with (h1, size1, h2, size2, h3, size3)
  • Client asks for the missing chunks h1, h2
  • READ(h1, size1)
  • READ(h2, size2)
  • Client reconstructs the file as h1, h2, h3
  • Fetching missing chunks: only saves b/w by reusing
    common cached blocks across different files or
    different versions of the same file
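
The read exchange can be sketched with in-process objects standing in for the RPCs. The `Server` and `Client` classes and their method names are hypothetical, and files are assumed to be pre-split into chunks.

```python
import hashlib

def sha1(b: bytes) -> str:
    return hashlib.sha1(b).hexdigest()

class Server:
    def __init__(self, files):
        self.files = files   # path -> list of chunks (already split)
        self.chunks = {sha1(c): c for cs in files.values() for c in cs}

    def gethash(self, path):
        """GETHASH: return (hash, size) for every chunk of the file."""
        return [(sha1(c), len(c)) for c in self.files[path]]

    def read(self, h, size):
        """READ: return the chunk with the given hash."""
        return self.chunks[h]

class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}      # hash -> chunk bytes
        self.fetched = 0     # bytes pulled from the server

    def read_file(self, path):
        hashes = self.server.gethash(path)
        for h, size in hashes:
            if h not in self.cache:              # fetch only missing chunks
                self.cache[h] = self.server.read(h, size)
                self.fetched += size
        return b"".join(self.cache[h] for h, _ in hashes)

server = Server({"v1": [b"aaaa", b"bbbb", b"cccc"],
                 "v2": [b"aaaa", b"XXXX", b"cccc"]})
client = Client(server)
client.read_file("v1")
client.read_file("v2")
print(client.fetched)   # 16: three chunks for v1, one new chunk for v2
```
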
18
LBFS writes
  • Client: MKTMPFILE(fd)
  • Server: create tmp file fd
  • Client: CONDWRITE(fd, h1, size1, h2, size2, h3, size3)
  • Server: reply with missing chunks h1, h2
  • HASHNOTFOUND(h1, h2)
  • Client: TMPWRITE(fd, h1), TMPWRITE(fd, h2)
  • Server: construct tmp file from h1, h2, h3
  • Client: COMMITTMP(fd, target_fhandle)
  • Server: copy tmp file content to target file
  • Transferring missing chunks saves b/w if
    different files or different versions of the
    same file have pieces of identical content
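
A sketch of the write exchange, again with method calls standing in for RPCs. The `Server` class, `write_file` helper, and the shape of the arguments are assumptions made for illustration.

```python
import hashlib

def sha1(b: bytes) -> str:
    return hashlib.sha1(b).hexdigest()

class Server:
    def __init__(self):
        self.chunk_db = {}   # hash -> chunk bytes already known to the server
        self.tmp = {}        # fd -> ordered list of chunk hashes
        self.files = {}      # target name -> file contents

    def mktmpfile(self, fd):
        self.tmp[fd] = []

    def condwrite(self, fd, recipe):
        """Record the chunk list; return the hashes the server lacks."""
        self.tmp[fd] = [h for h, _ in recipe]
        return [h for h, _ in recipe if h not in self.chunk_db]  # HASHNOTFOUND

    def tmpwrite(self, fd, data):
        self.chunk_db[sha1(data)] = data

    def committmp(self, fd, target):
        self.files[target] = b"".join(self.chunk_db[h] for h in self.tmp.pop(fd))

def write_file(server, fd, target, chunks):
    """Return the number of payload bytes actually sent."""
    server.mktmpfile(fd)
    missing = set(server.condwrite(fd, [(sha1(c), len(c)) for c in chunks]))
    sent = 0
    for c in chunks:
        if sha1(c) in missing:
            server.tmpwrite(fd, c)
            missing.discard(sha1(c))   # send a repeated chunk only once
            sent += len(c)
    server.committmp(fd, target)
    return sent

server = Server()
first = write_file(server, 1, "a", [b"aaaa", b"bbbb", b"cccc"])
second = write_file(server, 2, "b", [b"aaaa", b"ZZZZ", b"cccc"])
print(first, second)   # 12 4
```
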
19
LBFS evaluations
  • In practice, there is a lot of content overlap
    among different files and different versions of
    the same file
  • Save a Word document
  • Recompile after a header change
  • Different versions of a software package
  • LBFS results in 1/10 the b/w use

20
Speculative Execution in a Distributed File System
  • Nightingale et al.
  • SOSP'05

21
How to reduce latency in FS?
  • What are potentially wasteful latencies?
  • Freshness check
  • Client issues GETATTR before reading from cache
  • Incurs an extra RTT for read
  • Why wasteful? Most GETATTRs confirm freshness ok
  • Commit ordering
  • Client waits for commit on modification X to
    finish before starting modification Y
  • No pipelining of modifications on X and Y
  • Why wasteful? Most commits succeed!

22
Key idea: speculate on RPC responses
  • Without speculation
  • Client sends RPC req, then blocks until RPC resp
  • With speculation
  • 1) Checkpoint, then send RPC req
  • 2) Speculate on the response and keep executing
  • 3) When RPC resp arrives: correct?
  • Yes: discard ckpt.
  • No: restore process and re-execute
  • Guarantees without blocking I/O!
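
The checkpoint, speculate, and verify steps can be sketched in user-level Python. This is only an illustration of the control flow: it runs sequentially, uses `copy.deepcopy` as a stand-in for a process checkpoint, and all names are hypothetical.

```python
import copy

def speculate(state, predicted, real_rpc, continuation):
    """Run `continuation` on a predicted RPC result; roll back if wrong."""
    checkpoint = copy.deepcopy(state)     # 1) checkpoint
    continuation(state, predicted)        # 2) speculate instead of blocking
    actual = real_rpc()                   # the real reply arrives later
    if actual == predicted:               # 3) correct?
        return state                      #    yes: discard the checkpoint
    state = checkpoint                    #    no: restore the saved state ...
    continuation(state, actual)           #    ... and re-execute
    return state

def use_attr(state, version):
    state["log"].append(f"read cache at version {version}")

ok = speculate({"log": []}, 7, lambda: 7, use_attr)
bad = speculate({"log": []}, 7, lambda: 8, use_attr)
print(ok["log"], bad["log"])
```

In the failed case the work done under the wrong prediction leaves no trace, because the state returned is rebuilt from the checkpoint.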

23
Conditions of useful speculation
  • Operations are highly predictable
  • Checkpoints are cheaper than network I/O
  • 52 µs for small process
  • Computers have resources to spare
  • Need memory and CPU cycles for speculation

24
Implementing Speculation
1) System call
2) Create speculation
[Figure: timeline showing the process and its checkpoint]

25
Speculation Success
1) System call
2) Create speculation
3) Commit speculation
[Figure: timeline showing the process and its checkpoint]

26
Speculation Failure
1) System call
2) Create speculation
3) Fail speculation
[Figure: timeline showing the checkpoint and two process instances]

27
Ensuring Correctness
  • Speculative processes hit barriers when they need
    to affect external state
  • Cannot roll back an external output
  • Three ways to ensure correct execution
  • Block
  • Buffer
  • Propagate speculations (dependencies)
  • Need to examine syscall interface to decide how
    to handle each syscall

28
Handling system calls
  • Block calls that externalize state
  • Allow read-only calls (e.g. getpid)
  • Allow calls that modify only task state (e.g.
    dup2)
  • File system calls -- need to dig deeper
  • Mark file systems that support Speculator

  • getpid: call sys_getpid()
  • reboot: block until specs resolved
  • mkdir: allow only if fs supports Speculator
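
The per-syscall decisions above can be sketched as a lookup table. The `Action` enum, the `decide` function, and the default of blocking unlisted calls are assumptions for illustration, not the Speculator kernel interface.

```python
from enum import Enum

class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block until speculations resolve"
    ALLOW_IF_FS_SUPPORTS = "allow only on a Speculator-aware file system"

POLICY = {
    "getpid": Action.ALLOW,                  # read-only call
    "dup2": Action.ALLOW,                    # modifies only task state
    "reboot": Action.BLOCK,                  # externalizes state
    "mkdir": Action.ALLOW_IF_FS_SUPPORTS,    # file system call
}

def decide(syscall: str, speculative: bool, fs_supports: bool = False) -> bool:
    """Return True if the call may proceed now, False if it must block."""
    if not speculative:
        return True                                  # nothing to protect
    action = POLICY.get(syscall, Action.BLOCK)       # assumed default: block
    if action is Action.ALLOW_IF_FS_SUPPORTS:
        return fs_supports
    return action is Action.ALLOW

print(decide("getpid", True), decide("reboot", True), decide("mkdir", True, True))
```
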
29
Output Commits
1) sys_stat
2) sys_mkdir
3) Commit speculation
[Figure: timeline of a process with one checkpoint per call and the outputs "stat worked" and "mkdir worked"]

30
Multi-Process Speculation
  • Processes often cooperate
  • Example: make forks children to compile, link,
    etc.
  • Would block if speculation limited to one task
  • Allow kernel objects to have speculative state
  • Examples: inodes, signals, pipes, Unix sockets,
    etc.
  • Propagate dependencies among objects
  • Objects rolled back to prior states when specs
    fail

31
Multi-Process Speculation
[Figure: checkpoints and the speculative operations Chown-1 and Write-1 shown across pid 8000, pid 8001, and inode 3456]

32
Multi-Process Speculation
  • What's handled
  • DFS objects, RAMFS, Ext3, Pipes & FIFOs
  • Unix Sockets, Signals, Fork & Exit
  • What's not handled (i.e. block)
  • System V IPC
  • Multi-process write-shared memory

33
Example: NFSv3 Linux
[Figure: message timeline among Client 1, Client 2, and Server]
  • Client 1: Modify B (Write, Commit)
  • Client 2: Open B (Getattr)

34
Example: SpecNFS
[Figure: message timeline among Client 1, Client 2, and Server]
  • Client 1: Modify B (Write+Commit), speculate
  • Client 2: Open B (Getattr), speculate
  • Client 2: Open B (Getattr), speculate

35
Problem: mutating operations
Client 1: 1. cat foo > bar
Client 2: 2. cat bar
  • bar depends on speculative execution of cat foo
  • If bar's state could be speculative, what does
    client 2 view in bar?

36
Solution: mutating operations
  • Server determines speculation success/failure
  • State at server is never speculative
  • Clients send the server the hypothesis each
    speculation is based on
  • List of speculations an operation depends on
  • Server reports failed speculations
  • Server performs in-order processing of messages

37
Server checks speculation status
[Figure: Client 1 runs "cat foo > bar" and sends Write+Commit to the Server]
  • Server checks whether foo indeed has version 1; if
    not, the speculation fails

38
Group Commit
  • Previously sequential ops now concurrent
  • Sync ops usually committed to disk
  • Speculator makes group commit possible

[Figure: two client/server timelines, each showing write and commit messages]

39
Putting it together: SpecNFS
  • Apply Speculator to an existing file system
  • Modified NFSv3 in Linux 2.4 kernel
  • Same RPCs issued (but many now asynchronous)
  • SpecNFS has same consistency, safety as NFS
  • Getattr, lookup, access speculate if data in
    cache
  • Create, mkdir, commit, etc. always speculate

40
Putting it together: BlueFS
  • Design a new file system for Speculator
  • Single copy semantics
  • Synchronous I/O
  • Each file, directory, etc. has version number
  • Incremented on each mutating op (e.g. on write)
  • Checked prior to all operations.
  • Many ops speculate and check version async

41
Apache Benchmark
  • SpecNFS up to 14 times faster

42
Rollback cost is small
  • All files out of date: SpecNFS up to 11x faster

43
What we've learnt today
  • Traditional Performance boosting techniques
  • Caching
  • Group commit
  • Leases
  • Two new techniques
  • Content-based hash and chunking
  • Speculative execution