Title: Distributed Hash-based Lookup for Peer-to-Peer Systems
1. Distributed Hash-based Lookup for Peer-to-Peer Systems
- Mohammed Junaid Azad (09305050)
- Gopal Krishnan (09305915)
- MTech 1, CSE
2. Agenda
- Peer-to-Peer Systems
- Initial Approaches to Peer-to-Peer Systems
- Their Limitations
- Distributed Hash Tables
- CAN (Content Addressable Network)
- Chord
3. Peer-to-Peer Systems
- Distributed and decentralized architecture
- No centralized server (unlike the client-server architecture)
- Any peer can act as a server
4. Napster
- P2P file-sharing system
- A central server stores the index of all files available on the network
- To retrieve a file, the central server is contacted to obtain the location of the desired file
- Not a completely decentralized system
- The central directory is not scalable
- Single point of failure
5. Gnutella
- P2P file-sharing system
- No central server to index the files available on the network
- The file-location process is decentralized as well
- Requests for files are flooded across the network
- No single point of failure
- Flooding on every request is not scalable
6. File Systems for P2P Systems
- The file system would store files and their metadata across nodes in the P2P network
- The nodes containing blocks of a file could be located using hash-based lookup
- The blocks would then be fetched from those nodes
7. Scalable File-Indexing Mechanism
- In any P2P system, the file transfer process is inherently scalable
- However, the indexing scheme that maps file names to locations is crucial for scalability
- Solution: the Distributed Hash Table
8. Distributed Hash Tables
- Traditional name and location services provide a direct mapping between keys and values
- What are examples of values? A value can be an address, a document, or an arbitrary data item
- Distributed hash tables such as CAN and Chord implement a distributed service for storing and retrieving key/value pairs (see the sketch below)
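Conceptually, a DHT offers the same put/get interface as an ordinary hash table; only the placement of entries is spread across nodes. A minimal single-node sketch of that interface, assuming SHA-1-derived identifiers (the DHTNode class and method names are illustrative, not taken from CAN or Chord):

```python
import hashlib

M = 160  # SHA-1 produces 160-bit identifiers

def chord_id(name: str) -> int:
    """Hash an arbitrary name into the m-bit identifier space."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** M)

class DHTNode:
    """Illustrative single node; a real DHT spreads the store across many nodes."""
    def __init__(self) -> None:
        self.store: dict[int, bytes] = {}

    def put(self, name: str, value: bytes) -> None:
        self.store[chord_id(name)] = value

    def get(self, name: str) -> bytes | None:
        return self.store.get(chord_id(name))

node = DHTNode()
node.put("song.mp3", b"10.1.2.3:8080")
print(node.get("song.mp3"))
```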
9. DNS vs. Chord/CAN
- DNS
- provides a host name to IP address mapping
- relies on a set of special root servers
- names reflect administrative boundaries
- is specialized to finding named hosts or services
- Chord
- can provide the same service: name = key, value = IP address
- requires no special servers
- imposes no naming structure
- can also be used to find data objects that are not tied to particular machines
10. Example Application Using Chord: Cooperative Mirroring
- The highest layer provides a file-like interface to the user, including user-friendly naming and authentication
- This file system maps its operations to lower-level block operations
- Block storage uses Chord to identify the node responsible for storing a block, and then talks to the block-storage server on that node
11. CAN
12. What is CAN?
- CAN is a distributed infrastructure that provides hash-table-like functionality
- CAN is composed of many individual nodes
- Each CAN node stores a chunk (zone) of the entire hash table
- A request for a particular key is routed through intermediate CAN nodes toward the node whose zone contains that key
- The design can be implemented at the application level (no kernel changes required)
13. Co-ordinate Space in CAN
14. Design of CAN
- Involves a virtual d-dimensional Cartesian co-ordinate space
- The co-ordinate space is completely logical
- Lookup keys are hashed into this space
- The co-ordinate space is partitioned into zones among all nodes in the system
- Every node in the system owns a distinct zone
- The assignment of zones to nodes forms an overlay network
15. Design of CAN (continued)
- To store (key, value) pairs, keys are mapped deterministically onto a point P in the co-ordinate space using a hash function (see the sketch below)
- The (key, value) pair is then stored at the node that owns the zone containing P
- To retrieve the entry for a key K, the same hash function is applied to map K to the point P
- The retrieval request is routed from the requesting node to the node owning the zone containing P
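A minimal sketch of that key-to-point mapping, assuming a unit d-dimensional space and one SHA-1-derived co-ordinate per dimension. The hashing scheme shown is illustrative; CAN only requires that the mapping be deterministic and roughly uniform.

```python
import hashlib

def key_to_point(key: str, d: int = 2) -> tuple[float, ...]:
    """Deterministically map a key to a point P in the unit d-dimensional space.

    One co-ordinate is derived per dimension by hashing the key together
    with the dimension index, then scaling into [0, 1).
    """
    coords = []
    for i in range(d):
        h = hashlib.sha1(f"{key}:{i}".encode()).digest()
        coords.append(int.from_bytes(h[:8], "big") / 2 ** 64)
    return tuple(coords)

# Both the store and the lookup compute the same point for the same key.
print(key_to_point("movie.mp4"))
print(key_to_point("movie.mp4") == key_to_point("movie.mp4"))  # True
```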
16. Routing in CAN
- Every CAN node holds the IP address and virtual co-ordinates of each of its neighbours
- Every message to be routed carries the destination co-ordinates
- Using its neighbours' co-ordinate set, a node forwards a message towards the neighbour whose co-ordinates are closest to the destination co-ordinates
- Progress: how much closer the message gets to the destination after being routed to one of the neighbours
17. Routing in CAN (continued)
- For a d-dimensional space partitioned into n equal zones, the routing path length is O(d · n^(1/d)) hops
- As the number of nodes increases, the routing path length grows as O(n^(1/d))
- Every node has 2d neighbours
- As the number of nodes increases, the per-node state does not change (see the routing sketch below)
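A minimal sketch of the greedy forwarding rule, assuming each node knows its neighbours' IP addresses and virtual co-ordinates (the data structures and the Euclidean distance metric are illustrative):

```python
import math

def distance(p, q):
    """Euclidean distance between two points in the co-ordinate space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def next_hop(my_coords, neighbours, dest):
    """Pick the neighbour whose co-ordinates are closest to the destination.

    `neighbours` maps a neighbour's IP address to its virtual co-ordinates.
    Returning None means no neighbour is closer than we are, i.e. the
    destination point falls in our own zone.
    """
    best_addr, best_dist = None, distance(my_coords, dest)
    for addr, coords in neighbours.items():
        d = distance(coords, dest)
        if d < best_dist:
            best_addr, best_dist = addr, d
    return best_addr

# A node at (0.2, 0.2) with two neighbours routes a message toward (0.9, 0.8).
print(next_hop((0.2, 0.2),
               {"10.0.0.5": (0.6, 0.2), "10.0.0.7": (0.2, 0.6)},
               (0.9, 0.8)))  # "10.0.0.5"
```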
18. Before a Node Joins CAN
19. After a Node Joins
20. Allocation of a New Node to a Zone
- First, the new node must find a node already in CAN (using bootstrap nodes)
- The new node randomly chooses a point P in the co-ordinate space
- It sends a JOIN request destined for point P via any existing CAN node
- The request is forwarded, using the CAN routing mechanism, to the node D owning the zone containing P
- D then splits its zone in half and assigns one half to the new node
- The new neighbour information is determined for both nodes
21. Failure of a Node
- Even if one of the neighbours fails, messages can be routed through other neighbours in that direction
- If a node leaves CAN, the zone it occupies is taken over by the remaining nodes
- If a node leaves voluntarily, it can hand over its database to some other node
- When a node simply becomes unreachable, the database of the failed node is lost
- CAN depends on the data sources to resubmit data in order to recover what was lost
22. Chord
23. Features
- Chord is a distributed hash table implementation
- Addresses a fundamental problem in P2P: efficient location of the node that stores a desired data item
- One operation: given a key, map it onto a node
- Data location is done by associating a key with each data item
- Adapts efficiently
- Dynamic, with frequent node arrivals and departures
- Automatically adjusts internal tables to ensure availability
- Uses consistent hashing
- Load balancing in assigning keys to nodes
- Little movement of keys when nodes join and leave
24. Features (continued)
- Efficient routing
- Distributed routing table
- Maintains information about only O(log N) nodes
- Resolves lookups via O(log N) messages
- Scalable
- Communication cost and the state maintained at each node scale logarithmically with the number of nodes
- Flexible naming
- A flat key space gives applications the flexibility to map their own names to Chord keys
- Decentralized
25. Some Terminology
- Key
- A hash key or its image under the hash function, depending on context
- An m-bit identifier, using SHA-1 as the base hash function
- Node
- An actual node or its identifier under the hash function
- m is chosen large enough that the probability of a hash collision is low
- Chord Ring
- The identifier circle that orders the 2^m identifiers
- Successor Node
- The first node whose identifier is equal to or follows key k in the identifier space
- Virtual Node
- Introduced to limit the bound on keys per node to roughly K/N (K keys, N nodes)
- Each real node runs O(log N) virtual nodes, each with its own identifier
26. Chord Ring
27. Consistent Hashing
- A consistent hash function is one whose mapping changes minimally when the set of nodes changes, so a total remapping of keys is not required
- Desirable properties
- With high probability, the hash function balances load
- Minimal disruption: only an O(1/N) fraction of the keys moves when a node joins or leaves
- Every node need not know about every other node; a small amount of routing information suffices
- m-bit identifier for each node and key
- Key k is assigned to its successor node (see the sketch below)
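A minimal sketch of that assignment rule, assuming m-bit SHA-1 identifiers and a sorted list of node identifiers (the helper names are illustrative):

```python
import bisect
import hashlib

M = 6  # small identifier space (2^6 = 64) so the example stays readable

def identifier(name: str) -> int:
    """m-bit identifier of a node name or key under SHA-1."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** M)

def successor(key_id: int, node_ids: list[int]) -> int:
    """First node whose identifier is equal to or follows key_id on the circle."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key_id)
    return ids[i % len(ids)]  # wrap around the identifier circle

nodes = [1, 8, 14, 32, 56]
print(successor(26, nodes))                       # 32
print(successor(60, nodes))                       # 1 (wraps around)
print(successor(identifier("movie.mp4"), nodes))  # whichever node the key hashes to
```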
28. Simple Key Location
29. Example
30. Scalable Key Location
- A very small amount of routing information suffices to implement consistent hashing in a distributed environment
- Each node need only be aware of its successor node on the circle
- Queries for a given identifier can be passed around the circle via these successor pointers (see the sketch below)
- This resolution scheme is correct, BUT inefficient: it may require traversing all N nodes!
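A minimal sketch of this simple, successor-pointer-only lookup, assuming an illustrative Node class that knows only its own identifier and its successor:

```python
def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # the interval wraps past 0

class Node:
    def __init__(self, node_id: int):
        self.id = node_id
        self.successor = self  # set properly once the ring is built

    def find_successor(self, key_id: int) -> "Node":
        """Walk the ring one successor pointer at a time: O(N) hops."""
        node = self
        while not in_half_open(key_id, node.id, node.successor.id):
            node = node.successor
        return node.successor

# Tiny ring of nodes 1, 8, 14; key 10 is owned by node 14.
a, b, c = Node(1), Node(8), Node(14)
a.successor, b.successor, c.successor = b, c, a
print(a.find_successor(10).id)  # 14
```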
31. Acceleration of Lookups
- Lookups are accelerated by maintaining additional routing information
- Each node maintains a routing table with (at most) m entries (where N ≤ 2^m), called the finger table
- The ith entry in the table at node n contains the identity of the first node, s, that succeeds n by at least 2^(i-1) on the identifier circle (clarification on the next slide)
- s = successor(n + 2^(i-1)) (all arithmetic is modulo 2^m)
- s is called the ith finger of node n, denoted n.finger[i].node (see the sketch below)
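A minimal sketch of how the finger entries are computed, using the 3-bit example shown on the next slide (the successor helper mirrors the consistent-hashing sketch above; all names are illustrative):

```python
import bisect

M = 3  # 3-bit identifier space, as in the classic Chord example

def successor(key_id: int, node_ids: list[int]) -> int:
    """First node identifier equal to or following key_id on the circle."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key_id)
    return ids[i % len(ids)]

def build_finger_table(n: int, node_ids: list[int], m: int = M) -> list[int]:
    """finger[i] = successor((n + 2^(i-1)) mod 2^m), for i = 1..m."""
    return [successor((n + 2 ** (i - 1)) % (2 ** m), node_ids)
            for i in range(1, m + 1)]

# Nodes 0, 1, 3 on the 3-bit ring: node 0's fingers are 1, 3, 0 (next slide).
print(build_finger_table(0, [0, 1, 3]))  # [1, 3, 0]
```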
32. Finger Tables (1)
- Example: a 3-bit identifier circle with nodes 0, 1, and 3
- Finger table of node 0:
- start: 1, 2, 4
- interval: [1,2), [2,4), [4,0)
- successor: 1, 3, 0
33. Finger Tables (2): Characteristics
- Each node stores information about only a small number of other nodes, and knows more about nodes closely following it than about nodes farther away
- A node's finger table generally does not contain enough information to determine the successor of an arbitrary key k
- Repeated queries to nodes that immediately precede the given key will eventually lead to the key's successor
34. Pseudo Code
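A hedged Python sketch of the scalable lookup that the pseudocode describes, assuming an illustrative Node class with a finger list. It follows the find_successor / closest_preceding_finger structure of the Chord paper but is not its exact code.

```python
def in_interval(x: int, a: int, b: int, inclusive_right: bool = False) -> bool:
    """True if x lies in the ring interval (a, b), or (a, b] when inclusive_right."""
    if a < b:
        return a < x < b or (inclusive_right and x == b)
    return x > a or x < b or (inclusive_right and x == b)  # wraps past 0

class Node:
    def __init__(self, node_id: int):
        self.id = node_id
        self.successor = self
        self.fingers: list["Node"] = []  # fingers[i] = successor(id + 2^i), i = 0..m-1

    def closest_preceding_finger(self, key_id: int) -> "Node":
        """Highest finger that lies strictly between this node and the key."""
        for finger in reversed(self.fingers):
            if in_interval(finger.id, self.id, key_id):
                return finger
        return self

    def find_successor(self, key_id: int) -> "Node":
        """Locate the node responsible for key_id in O(log N) hops."""
        node = self
        # walk backwards along fingers until key_id lies in (node, node.successor]
        while not in_interval(key_id, node.id, node.successor.id, inclusive_right=True):
            node = node.closest_preceding_finger(key_id)
        return node.successor

# 3-bit example ring with nodes 0, 1, 3 (fingers as on the finger-table slide).
n0, n1, n3 = Node(0), Node(1), Node(3)
n0.successor, n1.successor, n3.successor = n1, n3, n0
n0.fingers = [n1, n3, n0]   # successors of 1, 2, 4
n1.fingers = [n3, n3, n0]   # successors of 2, 3, 5
n3.fingers = [n0, n0, n0]   # successors of 4, 5, 7
print(n3.find_successor(2).id)  # 3
```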
35. Example
36. Node Joins with Finger Tables
- Example (m = 3): node 6 joins the ring of nodes 0, 1, and 3
- Finger table of node 0: start 1, 2, 4; interval [1,2), [2,4), [4,0); successor 1, 3, 6 (the third entry changes from 0 to 6)
- Finger table of node 3 (keys: 2): start 4, 5, 7; interval [4,5), [5,7), [7,3); successor 6, 6, 0 (the first two entries change from 0 to 6)
- Key 6, previously held by node 0, is transferred to the newly joined node 6
37. Node Departures with Finger Tables
- Example (m = 3): node 1 leaves the ring of nodes 0, 1, 3, and 6
- Finger table of node 0: start 1, 2, 4; interval [1,2), [2,4), [4,0); successor 3, 3, 6 (the first entry changes from 1 to 3)
- Finger table of node 6 (keys: 6): start 7, 0, 2; interval [7,0), [0,2), [2,6); successor 0, 0, 3
- Finger table of node 3 (keys: 2): start 4, 5, 7; interval [4,5), [5,7), [7,3); successor 6, 6, 0
- Key 1 is transferred from the departing node 1 to its successor, node 3
38. Sources of Inconsistency: Concurrent Operations and Failures
- A basic stabilization protocol is used to keep nodes' successor pointers up to date, which is sufficient to guarantee correctness of lookups
- Those successor pointers can then be used to verify and correct the finger table entries
- Every node runs stabilize periodically to learn about newly joined nodes
39. Pseudo Code
40. Pseudo Code (continued)
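A hedged Python sketch of the stabilize/notify logic that this pseudocode and the next slide describe, with an illustrative Node class holding only the successor and predecessor pointers (not the paper's exact code):

```python
def in_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the open ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b  # the interval wraps past 0

class Node:
    """Illustrative node holding only the pointers that stabilization maintains."""
    def __init__(self, node_id: int):
        self.id = node_id
        self.successor = self
        self.predecessor = None

    def stabilize(self) -> None:
        """Run periodically: adopt a newly joined node sitting between us and our successor."""
        x = self.successor.predecessor
        if x is not None and in_open(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate: "Node") -> None:
        """`candidate` believes it may be our predecessor."""
        if self.predecessor is None or in_open(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate
```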
41. Stabilization after Join
- n joins
- predecessor = nil
- n acquires n_s as its successor via some existing node n'
- n notifies n_s that it may be the new predecessor
- n_s acquires n as its predecessor
- n_p runs stabilize
- n_p asks n_s for its predecessor (now n)
- n_p acquires n as its successor
- n_p notifies n
- n acquires n_p as its predecessor
- all predecessor and successor pointers are now correct
- fingers still need to be fixed, but old fingers will still work
[Diagram: ring segment n_p -> n -> n_s; pred(n_s) changes from n_p to n, and succ(n_p) changes from n_s to n.]
42. Failure Recovery
- The key step in failure recovery is maintaining correct successor pointers
- To help achieve this, each node maintains a successor list of its r nearest successors on the ring
- If node n notices that its successor has failed, it replaces it with the first live entry in the list (see the sketch below)
- stabilize will correct finger table and successor-list entries that point to the failed node
- Performance is sensitive to the frequency of node joins and leaves relative to the frequency at which the stabilization protocol is invoked
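A minimal sketch of that failover step, assuming a hypothetical is_alive() liveness probe and a successor list kept in ring order:

```python
def first_live_successor(successor_list, is_alive):
    """Replace a failed successor with the first live entry in the list.

    `successor_list` holds the r nearest successors in ring order;
    `is_alive` is a liveness probe (e.g. a ping with a timeout).
    """
    for candidate in successor_list:
        if is_alive(candidate):
            return candidate
    raise RuntimeError("all r successors failed; the ring cannot be repaired locally")

# Example with a stubbed liveness check: nodes 8 and 14 are down, 21 is up.
alive = {21, 32, 38}
print(first_live_successor([8, 14, 21], lambda n: n in alive))  # 21
```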
43. Impact of Node Joins on Lookups: Correctness
- For a lookup issued before stabilization has finished:
- Case 1: all finger table entries involved in the lookup are reasonably current; the lookup then finds the correct successor in O(log N) steps
- Case 2: successor pointers are correct, but finger pointers are inaccurate; this yields correct lookups, but they may be slower
- Case 3: successor pointers are incorrect, or keys have not yet migrated to newly joined nodes; the lookup may fail, with the option of retrying after a short pause during which stabilization fixes the successor pointers
44. Impact of Node Joins on Lookups: Performance
- After stabilization: no effect other than increasing the value of N in O(log N)
- Before stabilization is complete:
- Finger table entries may be incorrect
- This does not significantly affect lookup speed, since the distance-halving property depends only on ID-space distance
- If the new nodes' IDs fall between the target's predecessor and the target, lookup speed is affected
- Lookups still take O(log N) time even if N new nodes join
45. Handling Failures
- Problem: what if a node does not know who its new successor is after its old successor fails?
- The new successor may lie in a gap in the finger table
- Chord would be stuck!
- Maintain a successor list of size r, containing the node's first r successors
- If the immediate successor does not respond, substitute the next entry in the successor list
- A modified version of the stabilize protocol maintains the successor list
- closest_preceding_node is modified to search not only the finger table but also the successor list for the closest preceding node
- If find_successor fails, retry after some time
- Voluntary node departures
- Transfer keys to the successor before departing
- Notify the predecessor p and successor s before leaving
46. Theorems
- Theorem IV.3: inconsistencies in successor pointers are transient
- If any sequence of join operations is executed interleaved with stabilizations, then at some time after the last join the successor pointers will form a cycle over all the nodes in the network
- Theorem IV.4: lookups take O(log N) time with high probability even if N nodes join a stable N-node network, once successor pointers are correct, even if finger pointers are not yet updated
- Theorem IV.6: if the network is initially stable, then even if every node fails with probability ½, the expected time to execute find_successor is O(log N)
47. Simulation
- Implements the iterative style (the other is the recursive style)
- The node resolving a lookup initiates all communication, unlike the recursive style, where intermediate nodes forward the request
- Optimizations
- During stabilization, a node updates its immediate successor and one other entry in the successor list or finger table
- Each entry out of k unique entries therefore gets refreshed once every k stabilization rounds
- The size of the successor list is 1
- A predecessor change is notified immediately to the old predecessor, without waiting for the next stabilization round
48. Parameters
- Mean delay of each packet is 50 ms
- Round-trip time is 500 ms
- Number of nodes is 10^4
- Number of keys varies from 10^4 to 10^6
49. Load Balance
- Tests the ability of consistent hashing to allocate keys to nodes evenly
- The number of keys per node exhibits large variations that increase linearly with the number of keys
- Associating keys with virtual nodes makes the number of keys per node more uniform and significantly improves load balance (see the sketch after this list)
- The asymptotic value of the query path length is not affected much
- The total identifier space covered remains the same on average
- The worst-case number of queries does not change
- Not much increase in the routing state maintained
- The asymptotic number of control messages is not affected
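A minimal sketch of the virtual-node idea behind the next two slides, assuming each physical host simply derives several virtual identifiers by hashing decorated names (the naming scheme and host addresses are illustrative):

```python
import hashlib
from collections import defaultdict

M = 160

def identifier(name: str) -> int:
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** M)

def virtual_ids(host: str, v: int) -> list[int]:
    """Each physical host runs v virtual nodes, one identifier per derived name."""
    return [identifier(f"{host}#vnode{i}") for i in range(v)]

# Keys are assigned to virtual identifiers; the owning host is whoever runs
# that virtual node, so each host ends up with a more uniform share of keys.
ring = {}
for host in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]:
    for vid in virtual_ids(host, v=8):
        ring[vid] = host

def successor_host(key_id: int) -> str:
    ids = sorted(ring)
    for vid in ids:
        if vid >= key_id:
            return ring[vid]
    return ring[ids[0]]  # wrap around the circle

load = defaultdict(int)
for k in range(1000):
    load[successor_host(identifier(f"key-{k}"))] += 1
print(dict(load))  # keys are spread more evenly across the three hosts
```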
50. In the Absence of Virtual Nodes
51. In the Presence of Virtual Nodes
52. Path Length
- The number of nodes that must be visited to resolve a query, measured as the query path length
- As per Theorem IV.2:
- The number of nodes that must be contacted to find a successor in an N-node network is O(log N)
- Observed results
- The mean query path length increases logarithmically with the number of nodes
- The observed average matches the expected average query path length
53. Path Length: Simulator Parameters
- A network with N = 2^k nodes
- Number of keys = 100 x 2^k
- k is varied from 3 to 14 and the path length is measured
54. Graph
55. Future Work
- Resilience against network partitions
- Detect and heal partitions
- Give every node a set of initial nodes
- Maintain a long-term memory of a random set of nodes
- This set is likely to include nodes from the other partition
- Handle threats to the availability of data
- Malicious participants could present an incorrect view of the data
- Periodic global consistency checks by each node
- Better efficiency
- O(log N) messages per lookup is too many for some applications
- Increase the number of fingers
56. References
- Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan. In Proc. ACM SIGCOMM 2001. Expanded version in IEEE/ACM Trans. Networking, 11(1), February 2003.
- A Scalable Content-Addressable Network. S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker. In Proc. ACM SIGCOMM 2001.
- Querying the Internet with PIER. Ryan Huebsch, Joseph M. Hellerstein, Nick Lanham, Boon Thau Loo, Scott Shenker, Ion Stoica. In Proc. VLDB 2003.
57. Thank You!