Title: OpenDHT: A Public DHT Service
1 OpenDHT: A Public DHT Service
- Sean C. Rhea
- UC Berkeley
- June 2, 2005
Joint work with Brighten Godfrey, Brad Karp,
John Kubiatowicz, Sylvia Ratnasamy, Scott
Shenker, Ion Stoica, and Harlan Yu
2 Peer-to-Peer File Sharing
- Very simple insight
- Most computers are unused most of the time
- Idea: harness this spare capacity to
- Quickly download music files (Napster, Gnutella)
- Search for aliens (SETI@home)
- Make free long-distance phone calls (Skype)
- Question: how to find the desired resource(s)?
- Early approaches: scoped flooding
- Downsides: scalability, accuracy
3 A Better Search Facility: The Distributed Hash Table (DHT)
- Same interface as a programmatic hash table:
- put(key, value): stores value under key
- get(key): returns the value(s) stored under key
- But shared across many machines
- Implemented via an overlay network
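The put/get interface above can be illustrated with a minimal single-process sketch. The class name `LocalHashTable` is hypothetical and this is not the OpenDHT client API; it only mirrors the semantics on the slide, where one key may hold several values. A real DHT would spread the keys across many machines.

```python
from collections import defaultdict


class LocalHashTable:
    """Single-machine stand-in for the DHT interface (illustrative only)."""

    def __init__(self):
        # Each key maps to a list of values, since get() may return several.
        self._values = defaultdict(list)

    def put(self, key, value):
        """Store value under key, ignoring exact duplicates."""
        if value not in self._values[key]:
            self._values[key].append(value)

    def get(self, key):
        """Return a copy of the value(s) stored under key (empty if none)."""
        return list(self._values.get(key, []))


table = LocalHashTable()
table.put("k1", "v1")
table.put("k1", "v2")
print(table.get("k1"))       # ['v1', 'v2']
print(table.get("missing"))  # []
```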
4 A Better Search Facility: The Distributed Hash Table (DHT)
[Figure labels: put(k1,v1); stores k1,v1; get(k1)]
5 DHTs and File Sharing: DHT Stores Pointers to Files
[Figure label: pointer to file]
7 DHTs and Spam Detection: Detecting Similar Messages
[Figure sequence: copies of the same message, "I love you!", accumulate across builds]
12 More DHT Applications
- Distributed Storage Systems
- CFS, HiveCache, PAST, Pastiche
- OceanStore / Pond
- Content Distribution Networks / Web Caches
- Bslash, Coral, Squirrel
- Indexing / Naming Systems
- Chord-DNS, CoDoNS, DOA, SFR
- Internet Query Processors
- Catalogs, PIER
- Communication Systems
- Bayeux, i3, MCAN, SplitStream
13 Some Areas of DHT Research
- Better routing protocols
- One-hop, degree-optimal
- Load balancing
- Non-uniform key distributions
- Security
- Byzantine fault-tolerant routing
- Data redundancy and fault tolerance
- Replication, erasure-coding
- Stronger semantics
- Supporting read-modify-write
14 How Many DHTs Will There Be?
[Figure sequence labels: File Sharing; Company Machine Can't Share Files; Owns Stock in Spam Company; Redundant Link; Unshared Links]
17 Benefits of Sharing a DHT
- Amortizes costs across applications
- Maintenance bandwidth, connection state, etc.
- Facilitates bootstrapping of new applications
- Working infrastructure already in place
- Allows for statistical multiplexing of resources
- Takes advantage of spare storage and bandwidth
- Facilitates upgrading existing applications
- Share DHT between application versions
18 Challenges in Sharing a DHT
- Robustness
- Must be available 24/7
- Shared Interface Design
- Should be general, yet easy to use
- Resource Allocation
- Must protect against malicious/over-eager users
- Economics
- What incentives are there to provide resources?
20 The DHT as a Service
[Figure sequence labels: OpenDHT; OpenDHT Clients; What is this interface?]
25 The Traditional Interface: lookup
lookup(k)
On reaching the successor of k, the message is passed to an upcall
27 DHTs and Spam Detection: Detecting Similar Messages
[Figure labels: Upcall: "I've seen this message before!"; the message "I love you!" shown twice]
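The lookup-plus-upcall pattern used in the spam detection example can be sketched as follows. This is an illustration under assumptions, not code from any DHT implementation: `Node`, `Ring`, and `key_of` are hypothetical names, and the successor is computed directly from a sorted list, whereas a real overlay would route the message hop by hop.

```python
import bisect
import hashlib


def key_of(data):
    """Map bytes to a point on the identifier ring (SHA-1 digest as an int)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


class Node:
    def __init__(self, name):
        self.name = name
        self.ident = key_of(name.encode())
        self.seen = {}  # message key -> number of times this node saw it

    def upcall(self, key, message):
        """Application code run at the node responsible for key."""
        self.seen[key] = self.seen.get(key, 0) + 1
        return self.seen[key] > 1  # True means "I've seen this message before!"


class Ring:
    def __init__(self, nodes):
        self.nodes = sorted(nodes, key=lambda n: n.ident)
        self.idents = [n.ident for n in self.nodes]

    def successor(self, key):
        """First node whose identifier is >= key, wrapping around the ring."""
        i = bisect.bisect_left(self.idents, key)
        return self.nodes[i % len(self.nodes)]

    def lookup(self, key, message):
        """Deliver message to the successor of key and run its upcall."""
        return self.successor(key).upcall(key, message)


ring = Ring([Node(f"node{i}") for i in range(8)])
msg = b"I love you!"
print(ring.lookup(key_of(msg), msg))  # False: first sighting
print(ring.lookup(key_of(msg), msg))  # True: same node has seen it before
```

Because every copy of a message hashes to the same key, all copies reach the same node, which is what lets a single upcall notice the repetition.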
29 Upcall Challenges
- Distribution
- How do we get new upcall code to all nodes?
- Active networking experience is a warning
- Security
- How do we safely run untrusted clients' upcalls?
33 What about Put/Get?
- Works great for some applications
- File sharing, for example
- What about applications with upcalls?
- Our spam detection application, for example
- Idea: let application nodes run the upcalls
- Each node only runs upcalls for the applications that it is participating in
37 Upcall Example
[Figure sequence: File Sharing and Spam Detection applications each reach OpenDHT via put/get; a spam detection node asks "Who's handling hash(message)?" about the message "I love you!"]
DHT keeps track of which nodes support which upcalls via Recursive Distributed Rendezvous (ReDiR)
41 ReDiR
- Goal: implement two functions using put/get
- join(namespace, node)
- node lookup(namespace, identifier)
[Figure sequence: a three-level tree (L0, L1, L2) fills in as nodes A through E join. Final state: L0 holds A, D; L1 holds A, C and D, E; L2 holds A, B, then C, D, and E in separate cells.]
46 ReDiR
- Join cost
- Worst case: O(log n) puts and gets
- Average case: O(1) puts and gets
47 ReDiR
[Figure sequence: lookup(namespace, identifier) on the same tree, with H(A) through H(E) marked along the bottom. First build: L2 is labeled "successor". Second build: L2 is labeled "no successor" and L1 "successor". Third build: L2 and L1 are labeled "no successor" and L0 "successor".]
50 ReDiR
- Lookup cost
- Worst case: O(log n) gets
- Average case: O(1) gets
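The slides leave ReDiR's exact tree-walking rules to the figures, so what follows is only a simplified sketch of the general idea: a tree of intervals kept in a put/get store, with lookups starting at the bottom level and moving up until a successor is found. It is not the OpenDHT implementation. `LocalDHT`, `KEY_BITS`, `LEVELS`, `h`, and `tree_key` are all assumptions made for illustration. In particular, this version registers a joining node at every level, so it does not reproduce the O(1) average join cost quoted above.

```python
import hashlib

KEY_BITS = 16            # assumed small identifier space for the sketch
LEVELS = 3               # assumed tree depth: L0, L1, L2
SPACE = 1 << KEY_BITS


class LocalDHT:
    """In-memory stand-in for a put/get service (hypothetical)."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        self._table.setdefault(key, set()).add(value)

    def get(self, key):
        return set(self._table.get(key, ()))


def h(name):
    """Hash a name to an identifier in [0, SPACE)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest[:2], "big")


def tree_key(namespace, level, ident):
    """DHT key of the level-`level` interval that contains ident."""
    width = SPACE >> level  # binary tree: intervals halve at each level
    return (namespace, level, ident // width)


def join(dht, namespace, node):
    """Register node in its interval at every level (simplification)."""
    ident = h(node)
    for level in range(LEVELS - 1, -1, -1):
        dht.put(tree_key(namespace, level, ident), (ident, node))


def lookup(dht, namespace, ident):
    """Return the member whose hash is the successor of ident, or None."""
    for level in range(LEVELS - 1, -1, -1):
        members = dht.get(tree_key(namespace, level, ident))
        later = [m for m in members if m[0] >= ident]
        if later:                      # a successor exists in this interval
            return min(later)[1]
    root = dht.get(tree_key(namespace, 0, 0))
    return min(root)[1] if root else None   # wrap around the ring


dht = LocalDHT()
for name in ["A", "B", "C", "D", "E"]:
    join(dht, "spam-detection", name)
print(lookup(dht, "spam-detection", h("I love you!")))
```

Since every member of an interval is registered there, the first level that holds any member at or after the identifier already holds the true successor. The pruning that lets a join stop early is what this sketch leaves out.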
51 ReDiR Performance (On PlanetLab)
52 OpenDHT Design Summary
- OpenDHT is a common infrastructure for
- Storage of values, pointers, etc.
- Organizing clients that handle application upcalls
- Benefits
- Amortizes maintenance costs across applications
- Facilitates bootstrapping of new applications
- Allows for statistical multiplexing of resources
53 Impact
54 Future Work
- OpenDHT makes a great common substrate for
- Soft-state storage
- Naming and rendezvous
- Many P2P applications also need to
- Traverse NATs
- Redirect packets within the infrastructure (as in i3)
- Refresh puts while intermittently connected
- All of these can be implemented with upcalls
- Who provides the machines that run the upcalls?
55 Future Work
- We don't want to add upcalls to the core DHT
- Keep the main service simple, fast, and robust
- Can we build a separate upcall service?
- Some other set of machines organized with ReDiR
- Security: can only accept incoming connections, can't write to local storage, etc.
- This should be enough to implement
- NAT traversal, reput service
- Some (most?) packet redirection
- What about more expressive security policies?
56 For more information, see http://opendht.org/