Title: A Case Study in Building Layered DHT Applications
- Yatin Chawathe, Sriram Ramabhadran, Sylvia Ratnasamy, Anthony LaMarca, Scott Shenker, Joseph Hellerstein
2. Building distributed applications
- Distributed systems are designed to be scalable, available, and robust
- What about simplicity of implementation and deployment?
- DHTs proposed as a simplifying building block
  - Simple hash-table API: put, get, remove
  - Scalable content-based routing, fault tolerance, and replication
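The three-call API above can be sketched as follows. This is a hypothetical stand-in backed by a plain dict, not OpenDHT's actual client interface; it only mimics the multiple-values-per-key semantics the slides rely on.

```python
# Toy stand-in for a DHT with the put/get/remove API (not OpenDHT's real client).

class DHT:
    """In-memory stand-in for a distributed hash table."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        # A real DHT routes the put to the node responsible for hash(key)
        # and replicates the value for fault tolerance.
        self._store.setdefault(key, []).append(value)

    def get(self, key):
        # Returns all values stored under the key.
        return list(self._store.get(key, []))

    def remove(self, key, value):
        # Removes one matching value under the key, if present.
        values = self._store.get(key, [])
        if value in values:
            values.remove(value)

dht = DHT()
dht.put("ap:00:11:22:33:44:55", (47.65, -122.30))
print(dht.get("ap:00:11:22:33:44:55"))  # [(47.65, -122.3)]
```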
3. Can DHTs help?
- Can we layer complex functionality on top of unmodified DHTs?
- Can we outsource the entire DHT operation to a third-party DHT service, e.g., OpenDHT?
- Existing DHT applications fall into two classes
  - Simple unmodified DHT for rendezvous or storage, e.g., i3, CFS, FOOD
  - Complex apps that modify the DHT for enhanced functionality, e.g., Mercury, CoralCDN
4. Outline
- Motivation
- A case study: Place Lab
- Range queries with Prefix Hash Trees
- Evaluation
- Conclusion
5. A Case Study: Place Lab
- Positioning service for location-enhanced apps
- Clients locate themselves by listening for known radio beacons (e.g., WiFi APs)
- Database of APs and their known locations
- Place Lab service computes maps of AP MAC address → (lat, lon)

[Figure: database tables mapping AP → (lat, lon) and (lat, lon) → list of APs]
6. Why Place Lab?
- Developed by a group of ubicomp researchers
  - Not experts in system design and management
- Centralized deployment since March 2004
  - Software downloaded by over 6,000 sites
- Concerns over organizational control → decentralize the service
- But want to avoid the implementation and deployment overhead of a distributed service
7. How DHTs can help Place Lab
[Figure: war-drivers submit neighborhood logs; DHT storage and routing; Place Lab servers compute AP locations; clients download local WiFi maps]

- Automatic content-based routing
  - Route logs by AP MAC address to the appropriate Place Lab server
- Robustness and availability
  - DHT managed entirely by third party
  - Provides automatic replication and failure recovery of database content
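The content-based routing idea can be sketched as hashing the AP's MAC address to form the DHT key, so every log entry for a given AP reaches the same Place Lab server no matter which war-driver submitted it. The key scheme below is illustrative, not the system's actual one.

```python
# Illustrative sketch: derive a DHT key from an AP's MAC address so that all
# logs for that AP route to the same responsible node/server.
import hashlib

def key_for_ap(mac: str) -> str:
    # SHA-1 of the (normalized) MAC determines which DHT node, and hence
    # which Place Lab server, is responsible for this access point.
    return hashlib.sha1(mac.lower().encode()).hexdigest()

# Two sightings of the same AP from different war-drivers map to one key.
k1 = key_for_ap("00:11:22:33:44:55")
k2 = key_for_ap("00:11:22:33:44:55")
assert k1 == k2
```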
8. Downloading WiFi Maps
- Clients perform geographic range queries
  - Download segments of the database, e.g., all access points in Philadelphia
- Can we perform this entirely on top of an unmodified third-party DHT?
- DHTs provide exact-match queries, not range queries
9. Supporting range queries
- Prefix Hash Trees (PHTs)
  - Index built entirely with the put, get, remove primitives
  - No changes to DHT topology or routing
- Binary tree structure
  - Node label is a binary prefix of the values stored under it
  - Nodes split when they get too big
- Nodes stored in the DHT with the node label as the key
  - Allows direct access to interior and leaf nodes
[Figure: example PHT over 4-bit keys. Root R has children R0 and R1; leaves such as R00, R010, R011, R10, R110, R111 store the keys 0 (0000), 3 (0011), 4 (0100), 5 (0101), 6 (0110), 8 (1000), 12 (1100), 13 (1101), 14 (1110), 15 (1111) under matching prefixes.]
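The split-when-full behavior can be sketched with a toy in-process dict standing in for the DHT and a block size B of 2. Names and structure here are illustrative, not the paper's implementation.

```python
# Minimal sketch of PHT leaf splitting. Each node is stored under its binary
# prefix label; a leaf holding more than B items splits into children
# label+'0' and label+'1' and redistributes its items.

B = 2  # toy block size

def insert(dht, label, key_bits, value):
    """Insert into the leaf whose label prefixes key_bits; split if full."""
    node = dht.setdefault(label, {"leaf": True, "items": []})
    if node["leaf"]:
        node["items"].append((key_bits, value))
        if len(node["items"]) > B:
            # Split: this node becomes interior; re-insert items one level down.
            items, node["items"], node["leaf"] = node["items"], [], False
            for bits, v in items:
                insert(dht, label + bits[len(label)], bits, v)
    else:
        # Interior node: descend using the next bit of the key.
        insert(dht, label + key_bits[len(label)], key_bits, value)

dht = {}
for bits in ["0000", "0011", "0110", "1000"]:
    insert(dht, "", bits, int(bits, 2))
# Keys sharing prefix "0" overflowed the root and were pushed to deeper labels.
print(sorted(k for k, n in dht.items() if n["leaf"]))  # ['00', '01', '1']
```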
10. PHT operations
- Lookup(K)
  - Find the leaf node whose label is a prefix of K
  - Binary search across K's bits: O(log log D), where D is the size of the key space
- Insert(K, V)
  - Lookup the leaf node for K
  - If it is full, split the node into two
  - Put value V into the leaf node
- Query(K1, K2)
  - Lookup the node for P, where P is the longest common prefix of K1 and K2
  - Traverse the subtree rooted at the node for P
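The binary search behind Lookup(K) can be sketched as below. It is hedged: it assumes each DHT get returns a node record with a "leaf" flag (one possible encoding), and it omits the parallelism and caching a real implementation would use.

```python
# Sketch of PHT Lookup(K): binary-search over prefix lengths of K, using
# only DHT get operations, to find the unique leaf whose label prefixes K.

def pht_lookup(dht_get, key_bits):
    """Return the label of the leaf whose label is a prefix of key_bits."""
    lo, hi = 0, len(key_bits)
    while lo <= hi:
        mid = (lo + hi) // 2
        node = dht_get(key_bits[:mid])
        if node is None:
            hi = mid - 1           # no node this deep: the leaf is shallower
        elif node["leaf"]:
            return key_bits[:mid]  # found the leaf
        else:
            lo = mid + 1           # interior node: the leaf is deeper
    return None

# Toy DHT contents: a root that has split into leaves "0" and "1".
store = {"": {"leaf": False}, "0": {"leaf": True}, "1": {"leaf": True}}
print(pht_lookup(store.get, "0110"))  # 0
```

Because the search is over the d = log D bit positions of K, it needs only O(log d) = O(log log D) DHT gets, matching the bound on the slide.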
11. 2-D geographic queries
- Convert lat/lon into a 1-D key
  - Use z-curve linearization: interleave lat/lon bits to create the z-curve key
- Linearized query results may not be contiguous
  - Start at the longest-common-prefix subtree
  - Visit child nodes only if they can contribute to the query result
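The bit interleaving can be sketched as follows, using toy 3-bit grid coordinates (the real system uses full-precision latitude and longitude):

```python
# Sketch of z-curve linearization: interleave the bits of latitude and
# longitude into a single 1-D key suitable for the PHT.

def z_key(lat: int, lon: int, bits: int = 3) -> str:
    out = []
    for i in reversed(range(bits)):       # most significant bit first
        out.append(str((lat >> i) & 1))   # latitude bit
        out.append(str((lon >> i) & 1))   # longitude bit
    return "".join(out)

# Nearby cells share long key prefixes, so a rectangular geographic query
# maps to a small set of PHT subtrees.
print(z_key(3, 5))  # lat=011, lon=101 -> 011011
```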
[Figure: z-curve over an 8x8 latitude/longitude grid (axes 0-7), with the subtree labeled P10 highlighted]
12. PHT Visualization
13. Ease of implementation and deployment
- 2,100 lines of code to hook Place Lab into the underlying DHT service
  - Compare with 14,000 lines for the DHT itself
- Runs entirely on top of the deployed OpenDHT service
- The DHT handles fault tolerance and robustness, and masks failures of Place Lab servers
14. Flexibility of DHT APIs
- Range queries use only the get operation
- Updates use a combination of put, get, and remove
- But:
  - Concurrent updates can cause inefficiencies
  - No support for concurrency in existing DHT APIs
- A test-and-set extension can benefit PHTs and a range of other applications
  - put_conditional: perform the put only if the value has not changed since the previous get
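The put_conditional idea can be sketched as a compare-and-swap on a single key. The interface below is hypothetical; the slide's point is precisely that existing DHT APIs lacked it.

```python
# Sketch of a test-and-set DHT extension: the put succeeds only if the
# stored value is unchanged since the caller's previous get.

class CASDHT:
    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def put_conditional(self, key, new_value, expected):
        """Write new_value only if the current value equals expected."""
        if self._store.get(key) == expected:
            self._store[key] = new_value
            return True
        return False  # a concurrent update won: the caller must re-get and retry

dht = CASDHT()
old = dht.get("R01")                              # read a PHT node (None: absent)
assert dht.put_conditional("R01", "v1", old)      # first writer wins
assert not dht.put_conditional("R01", "v2", old)  # stale writer must retry
```

This turns a lost-update race between two PHT writers into an explicit retry, avoiding the redundant splits that concurrent unconditional puts can cause.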
15. PHT insert performance
- Median insert latency is 1.45 sec
  - Without caching: 3.25 sec; with caching: 0.76 sec
16. PHT query performance
  Data size   Latency (sec)
  5k          2.13
  10k         2.76
  50k         3.18
  100k        3.75
- Queries take 2-4 seconds on average
- Varies with block size
  - Smaller (or very large) block sizes imply longer query times
17. Conclusion
- Concrete example of building complex applications on top of a vanilla DHT service
- The DHT provides ease of implementation and deployment
- Layering allows inheriting robustness, availability, and scalable routing from the DHT
- But sacrifices performance in return