Title: Exercising DHash with Distributed Backup
1 Exercising DHash with Distributed Backup
- Robert Morris
- Frans Kaashoek, David Karger, Russ Cox, Frank Dabek, Emil Sit, Josh Cates, Jacob Strauss, James Robertson
- http://pdos.lcs.mit.edu/chord
- MIT Laboratory for Computer Science
2 Backup Characteristics
- Full dump only once ever
- Then just incremental snapshots
- Dump reads a lot, to verify old blocks
- Data caching isn't very useful
- Mostly bulk data movement
- Archived snapshots are on-line
- So interactive read also matters
3 What do we deserve?
- Throughput
- Same as one TCP? N TCPs?
- Use window of RPCs to cover latency
- Lookup latency: very cheap w/ proximity
- Read latency (w/ k copies): max / 2k
- Write latency: max
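The "window of RPCs" bullet can be made concrete with a simple bound: with W RPCs outstanding, each carrying B bytes, over a round trip of r seconds, throughput is at most W * B / r. The sketch below is only illustrative; the function names, the 8192-byte block size, and the 100 ms round trip are assumptions, not figures from the talk.

```python
import math


def windowed_throughput(window, block_bytes, rtt_seconds):
    """Upper bound on bytes/second with `window` RPCs in flight per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window * block_bytes / rtt_seconds


def window_needed(target_bytes_per_second, block_bytes, rtt_seconds):
    """Smallest whole number of outstanding RPCs that can reach the target rate."""
    return math.ceil(target_bytes_per_second * rtt_seconds / block_bytes)


if __name__ == "__main__":
    # Assumed values for illustration only: 8192-byte blocks, 100 ms round trip.
    print(windowed_throughput(1, 8192, 0.1))
    print(window_needed(1_000_000, 8192, 0.1))
```

Under these assumed numbers, a single outstanding RPC is capped at roughly 82,000 bytes/second, which is why a window larger than one is needed to cover wide-area latency.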
4 Initial Performance
- Bulk read: 50 kbytes/second
- Bulk write: 10 kbytes/second
- 90 nodes, PlanetLab + RON
- 5 replicas of each block
5 What went wrong?
- Forced to talk to the predecessor
- Fetch from replica closest to predecessor
- Replicas are large, so puts were slow
- Unified congestion window
- Mimicked single TCP increase/decrease
- Worst-case timeouts, no fast retransmit
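For readers unfamiliar with the "single TCP increase/decrease" behavior, here is a minimal sketch of an additive-increase, multiplicative-decrease window over outstanding RPCs. It is a generic illustration of the technique, not the DHash code; the class name and parameters are hypothetical.

```python
class AimdWindow:
    """Generic AIMD window counting how many RPCs may be outstanding."""

    def __init__(self, initial=1.0, minimum=1.0):
        self.size = initial
        self.minimum = minimum

    def on_ack(self):
        # Additive increase: roughly one extra RPC per window's worth of acks.
        self.size += 1.0 / self.size

    def on_loss(self):
        # Multiplicative decrease: halve the window, but never below the minimum.
        self.size = max(self.minimum, self.size / 2.0)


if __name__ == "__main__":
    w = AimdWindow(initial=4.0)
    w.on_loss()
    w.on_ack()
    print(w.size)
```

Because one such window is shared across every destination, a loss or timeout on the path to a single slow node shrinks the send rate toward all nodes.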
6 Old Chord/DHash
[Figure: old Chord/DHash fetch in four steps (1-4). Labels: originator; predecessor, returns successor list; (key); replicas on successors]
7 Solutions
- 7-of-14 coding w/ IDA, not replication
- Maintain proximity to originator
- Avoid predecessor, or any particular node
- Fetch data from nodes close to originator
- Predict better RPC timeouts
- Key tool: synthetic coordinates
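One way to picture how coding and latency prediction fit together: with 7-of-14 coding, any 7 fragments rebuild a block, so the originator can ask the 7 holders with the lowest predicted latency and is done when the slowest of those replies. The sketch below assumes predicted latencies are already available as a dictionary; `plan_fetch`, `rpc_timeout_ms`, and the timeout rule (a multiple of the prediction with a floor) are hypothetical and not taken from the talk.

```python
def plan_fetch(predicted_ms, needed=7):
    """Pick the `needed` holders with the lowest predicted latency.

    predicted_ms maps node name -> predicted round-trip latency in ms.
    Returns (chosen node names, predicted completion time in ms).
    """
    if len(predicted_ms) < needed:
        raise ValueError("not enough fragment holders")
    ranked = sorted(predicted_ms, key=predicted_ms.get)
    chosen = ranked[:needed]
    # The block is usable once the slowest chosen fragment arrives.
    return chosen, predicted_ms[chosen[-1]]


def rpc_timeout_ms(predicted, slack=2.0, floor=50.0):
    """Hypothetical timeout rule: a multiple of the prediction, with a floor."""
    return max(floor, slack * predicted)


if __name__ == "__main__":
    # Made-up predictions for 14 holders: 10 ms, 20 ms, ..., 140 ms.
    holders = {f"n{i}": 10.0 * i for i in range(1, 15)}
    print(plan_fetch(holders))
    print(rpc_timeout_ms(100.0))
```

The design point is that no particular node (such as the predecessor) has to be contacted; any sufficiently close subset of holders will do.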
8 Vivaldi Synthetic Coordinates
- Each node estimates its own position
- Position: (x,y) synthetic coordinates
- x and y units are time (milliseconds)
- Distance predicts network latency
- Key point: predict w/o pinging first
- Like GNP, but no landmarks
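As a small illustration of "distance predicts network latency": if each node publishes an (x, y) position in milliseconds, the predicted latency between two nodes is the Euclidean distance between their positions, and a node can rank candidates it has never pinged. The helper names and sample coordinates below are assumptions for the sketch.

```python
import math


def predicted_latency_ms(a, b):
    """Euclidean distance between two (x, y) positions, read as milliseconds."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def closest(origin, candidates):
    """Name of the candidate with the smallest predicted latency to `origin`.

    candidates maps node name -> (x, y) position.
    """
    return min(candidates, key=lambda name: predicted_latency_ms(origin, candidates[name]))


if __name__ == "__main__":
    # Made-up positions for illustration.
    nodes = {"a": (3.0, 4.0), "b": (1.0, 1.0)}
    print(predicted_latency_ms((0.0, 0.0), nodes["a"]))
    print(closest((0.0, 0.0), nodes))
```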
9 Vivaldi synthetic coordinates
- Each node starts with a random incorrect position
[Figure: four nodes at initial positions (2,3), (1,2), (0,1), (3,0)]
10 Vivaldi synthetic coordinates
- Each node starts with a random incorrect position
- Each node pings a few other nodes to measure network latency (distance)
[Figure: labels A and B; measured latencies 2 ms, 1 ms, 1 ms, 2 ms]
11 Vivaldi synthetic coordinates
- Each node starts with a random incorrect position
- Each node pings a few other nodes to measure network latency (distance)
- Each node moves to cause measured distances to match coordinates
[Figure: measured latencies 2, 1, 1, 2]
12 Vivaldi synthetic coordinates
- Each node starts with a random incorrect position
- Each node pings a few other nodes to measure network latency (distance)
- Each node moves to cause measured distances to match coordinates
[Figure: measured latencies 2, 1, 1, 2]
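The "ping, then move" steps can be sketched as a simple spring-style relaxation: after measuring the latency to a neighbor, a node moves along the line between the two positions by a fraction of the error between measured and predicted distance. This is a simplified illustration of the idea, not the published Vivaldi algorithm; the fixed `step` size, the function names, and the sample values are assumptions.

```python
import math


def update(position, neighbor, measured_ms, step=0.25):
    """Move `position` so its distance to `neighbor` is nearer to `measured_ms`."""
    dx = position[0] - neighbor[0]
    dy = position[1] - neighbor[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        # Coincident points have no direction; pick an arbitrary one.
        ux, uy = 1.0, 0.0
    else:
        ux, uy = dx / dist, dy / dist
    # Positive error: the nodes are too close in coordinate space, so move away.
    error = measured_ms - dist
    return (position[0] + step * error * ux, position[1] + step * error * uy)


def relax(positions, latencies_ms, rounds=200, step=0.05):
    """Repeatedly apply `update` for every measured pair.

    positions maps name -> (x, y); latencies_ms maps (a, b) -> measured ms.
    """
    positions = dict(positions)
    for _ in range(rounds):
        for (a, b), measured in latencies_ms.items():
            positions[a] = update(positions[a], positions[b], measured, step)
            positions[b] = update(positions[b], positions[a], measured, step)
    return positions


if __name__ == "__main__":
    # Two nodes that start 1 unit apart but measure 2 ms between them.
    final = relax({"A": (0.0, 0.0), "B": (1.0, 0.0)}, {("A", "B"): 2.0})
    print(final)
```

In the two-node example the nodes drift apart until their coordinate distance approaches the measured 2 ms; with more nodes, the same local rule is applied pair by pair.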
13 Vivaldi in action
- Execution on 86 PlanetLab Internet hosts
- Each host only pings a few other random hosts
- Most hosts find useful coordinates after a few dozen pings
14 Vivaldi predicts latency well
15 New lookup/fetch
[Figure: nodes A-G, lookup steps 1-3, and C's successor list]
- API: lookup(k, m) returns first m successors of k
- Stops when successor list overlaps m
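To show what lookup(k, m) returns, here is a sketch that computes the first m successors of a key on an identifier ring using a global sorted list of node IDs. It illustrates only the result of the call; the real lookup is distributed and stops early based on successor lists, which this sketch does not model. The integer IDs are made up.

```python
import bisect


def lookup(node_ids, key, m):
    """Return the first m successors of `key` on the ring of `node_ids`.

    The successor of a key is the first node whose ID is >= the key,
    wrapping around to the smallest ID at the end of the ring.
    """
    ring = sorted(node_ids)
    if not ring:
        return []
    start = bisect.bisect_left(ring, key)
    count = min(m, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]


if __name__ == "__main__":
    ring = [10, 20, 30, 40]  # made-up node IDs
    print(lookup(ring, 25, 2))
    print(lookup(ring, 35, 3))
```

Returning m successors rather than one gives the originator several candidate nodes to choose from, so it can fetch from whichever are predicted to be closest.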
16 Vivaldi helps decrease DHash fetch times
- 77 PlanetLab + 18 RON nodes
- Fundamental limit: about 100 ms?
17 DHash throughput
[Figure: throughput (kbytes/second); labels: Berkeley, Mazu, NYU, Australia]
18 Conclusions
- Unclear what fair throughput is vs TCP
- Main latency costs
- Data is not likely to be close
- Last steps of lookup are expensive
- Need latency predictions to as-yet unknown nodes
- Synthetic coordinates are useful in many ways
http://pdos.lcs.mit.edu/chord
19 Vivaldi vs. network coordinates
- Vivaldi's coordinates match geography well
- over-sea distances shrink (faster than over-land)
- orientation of Australia and Europe is wrong