Title: p2p06
1. CAN, CHORD, BATON (extensions)
2. Structured P2P
- Additional issues
  - Fault tolerance, load balancing, network awareness, concurrency
  - Replication, caching
- Performance evaluation
- CHORD
- BATON
3. Summary: Design parameters and performance (CAN)
Only on replicated data
4. CHORD
- Additional issues discussed for CHORD
  - Fault tolerance
  - Concurrency
  - Replication
  - (Data) Load balancing
5. CHORD: Stabilization
We need to keep the finger tables consistent in the case of concurrent inserts.
Keep the successors of the nodes up-to-date; these successors can then be used to verify and correct the finger tables.
6. CHORD: Stabilization
- Thus:
  - Connect to the network by finding your successor
  - Periodically run stabilize (to fix successors), and less often run fix_fingers (to fix the finger tables)
What about similar problems in CAN? Setting performance aside, it generally suffices to keep the network connected.
7. CHORD: Stabilization
A lookup issued before stabilization completes:
- If all finger tables are reasonably current, an entry is found in O(logN) steps
- If successors are correct but finger tables are inaccurate, lookups are correct but possibly slow
- If successor pointers are incorrect, or keys have not moved yet, the lookup may fail and needs to be retried
8. CHORD: Stabilization
- Works under:
  - Concurrent joins
  - Lost and reordered messages
- May not work (?) when:
  - The system is split into multiple disjoint circles
  - A single cycle loops around the identifier space more than once
- Such states may be caused by failures, network partitioning, etc.; in general, this is left unclear
9. CHORD: Stabilization
Join
Upon joining, node n calls a known node n' and asks n' to locate n's successor.

n.join(n')
  predecessor = nil
  successor = n'.find_successor(n)

Note: the rest of the network does not know n yet; to achieve this, stabilize is run.
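A minimal Python sketch of join, under a simplified model (single process, no RPCs; the class, the in_interval helper, and all names are our own, not from the paper). find_successor here walks the ring linearly; CHORD itself resolves it in O(logN) hops via finger tables:

    def in_interval(x, a, b, inclusive_right=False):
        # Membership in the circular interval (a, b), optionally (a, b].
        if a < b:
            return a < x < b or (inclusive_right and x == b)
        return x > a or x < b or (inclusive_right and x == b)

    class Node:
        def __init__(self, ident):
            self.id = ident
            self.predecessor = None
            self.successor = self          # a lone node is its own successor

        def find_successor(self, ident):
            # Linear walk, for illustration only (CHORD uses fingers).
            node = self
            while not in_interval(ident, node.id, node.successor.id,
                                  inclusive_right=True):
                node = node.successor
            return node.successor

        def join(self, known):
            # Join via any known node: learn only our successor.
            self.predecessor = None
            self.successor = known.find_successor(self.id)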
10. CHORD: Stabilization
Stabilize
- Every node n runs stabilize periodically
- n asks its successor for its predecessor, say x
- n checks whether x should be its successor instead
  (that is, whether x has joined the network in between)
- This is how nodes find out about new nodes

n.stabilize()
  x = successor.predecessor
  if (x ∈ (n, successor))
    successor = x
  successor.notify(n)
11. CHORD: Stabilization
Notify
- successor(n) is also notified and checks whether it should make n its predecessor

n.notify(n')
  if (predecessor is nil or n' ∈ (predecessor, n))
    predecessor = n'
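Continuing the sketch from the join slide, stabilize and notify translate almost line-for-line into Python (again a simplified, failure-free model):

    # Methods for the Node class of the earlier sketch.
    def stabilize(self):
        # Adopt our successor's predecessor if it sits between us.
        x = self.successor.predecessor
        if x is not None and in_interval(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, other):
        # Adopt the notifying node as predecessor if it is closer.
        if self.predecessor is None or \
                in_interval(other.id, self.predecessor.id, self.id):
            self.predecessor = other

    Node.stabilize, Node.notify = stabilize, notify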
12. CHORD: Stabilization
Node n joins between np and ns:
- np.successor = ns (not yet updated)
- n.predecessor = nil, n.successor = ns
- ns.predecessor = np (not yet updated)
13. Example: n.stabilize
n runs stabilize, which calls ns.notify(n):
- ns.predecessor changes from np to n
- Still: np.successor = ns, n.predecessor = nil, n.successor = ns
14. Example: np.stabilize
np runs stabilize and sees ns.predecessor = n:
- np.successor changes from ns to n
- Still: n.predecessor = nil, n.successor = ns
15. Example: np.stabilize (continued)
np.stabilize ends by calling n.notify(np):
- n.predecessor changes from nil to np
- Final state: np.successor = n, n.predecessor = np, n.successor = ns, ns.predecessor = n
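The walkthrough of slides 12-15 can be replayed with the sketches above (identifiers 10, 20, 30 are arbitrary stand-ins for np, n, ns):

    np, ns = Node(10), Node(30)
    np.successor = np.predecessor = ns
    ns.successor = ns.predecessor = np     # a two-node ring

    n = Node(20)
    n.join(np)        # slide 12: n.successor = ns, n.predecessor = None
    n.stabilize()     # slide 13: ns.notify(n) sets ns.predecessor = n
    np.stabilize()    # slides 14-15: np.successor = n, then n.notify(np)
                      # sets n.predecessor = np
    assert np.successor is n and n.successor is ns and ns.predecessor is n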
16. CHORD Stabilization: Fix fingers
- Finger tables are not updated immediately
- Thus, lookups may be slow, but find_predecessor and find_successor still work
- Periodically run fix_fingers:
  - pick a random entry in the finger table and update it

n.fix_fingers()
  i = random index > 1 into finger[]
  finger[i].node = find_successor(finger[i].start)
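A sketch of fix_fingers in the same model; M (the identifier size in bits) and the dict-based finger table are our assumptions, while finger[i].start = n + 2^(i-1) follows the paper's definition:

    import random

    M = 6   # identifier space of 2**M ids; an assumption for the example

    def fix_fingers(self):
        # Refresh one random finger entry (1-based as in the paper; entry 1
        # is the successor and is already maintained by stabilize).
        i = random.randint(2, M)
        start = (self.id + 2 ** (i - 1)) % (2 ** M)
        self.finger[i] = self.find_successor(start)  # assumes self.finger = {}

    Node.fix_fingers = fix_fingers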
17. Stabilization
- Finger tables
  - As long as there is a finger in each interval, the distance-halving argument still holds
  - A lookup finds the predecessor of the target t, say p; then p finds the target (it is p's successor)
  - A problem arises only when a new node enters between p and t
  - Then we may need to scan these nodes linearly, which is fine if there are O(logN) such nodes (see the sketch below)
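For completeness, a sketch of the finger-based lookup this slide refers to: hop through fingers to find the predecessor of the target, then return its successor. Falling back to the plain successor when no finger qualifies is exactly the linear scan mentioned above:

    def closest_preceding_finger(self, ident):
        # Highest finger lying strictly between us and the target.
        for i in range(M, 1, -1):
            f = self.finger.get(i)
            if f is not None and in_interval(f.id, self.id, ident):
                return f
        return self.successor   # fall back to a one-step (linear) move

    def find_predecessor(self, ident):
        node = self
        while not in_interval(ident, node.id, node.successor.id,
                              inclusive_right=True):
            node = node.closest_preceding_finger(ident)
        return node

    Node.closest_preceding_finger = closest_preceding_finger
    Node.find_predecessor = find_predecessor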
18. Stabilization
Lookups eventually succeed.
Invariant: once a node can reach a node r via successor pointers, it always can.
Termination argument: assume two nodes n1 and n2 both think they have the same successor s. Both attempt to notify s. s will eventually choose the closer of the two, say n1, as its predecessor. The farther of the two, n2, will then learn from s of a better successor than s, namely n1. Thus, each round makes progress toward a better successor.
19. Failures
- When a node n fails:
  - Nodes that have n as a successor in their finger tables must be informed and must find the successor of n to replace it in their finger tables
  - Lookups in progress must continue
  - Correct successor pointers must be maintained
20. Failures
- Replication
  - Each node maintains a successor list of its r nearest successors
  - Upon failure, use the next successor in the list
  - Modify stabilize to fix the list (a sketch follows)
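A sketch of successor-list maintenance under the same simplified model (the list length R, the alive flag, and the method names are assumptions):

    R = 3   # list length; the theorem below wants r = Omega(logN)

    def update_successor_list(self):
        # Our list = our successor plus the first R-1 entries of its list;
        # meant to run as part of stabilize. Assumes every node initialises
        # self.successor_list = [].
        self.successor_list = [self.successor] + \
            self.successor.successor_list[:R - 1]

    def handle_successor_failure(self):
        # Replace a failed successor with the next live entry in the list.
        for cand in self.successor_list:
            if cand.alive:
                self.successor = cand
                return

    Node.update_successor_list = update_successor_list
    Node.handle_successor_failure = handle_successor_failure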
21. Failures
Other nodes may attempt to route requests through the failed node. They use alternate nodes found in the routing tables of preceding nodes or in the successor list.
22. Example: Failures (r = 3)
Ring with nodes 3, 5, 6, 9, 12, 14, 15; each node keeps its 3 nearest successors:
  node 3  -> 5, 6, 9
  node 5  -> 6, 9, 12
  node 6  -> 9, 12, 14
  node 9  -> 12, 14, 15
  node 12 -> 14, 15, 3
  node 14 -> 15, 3, 5
  node 15 -> 3, 5, 6
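The lists above can be reproduced in a couple of lines:

    ring = [3, 5, 6, 9, 12, 14, 15]        # node ids, sorted clockwise
    r = 3
    lists = {n: [ring[(i + k) % len(ring)] for k in range(1, r + 1)]
             for i, n in enumerate(ring)}
    assert lists[14] == [15, 3, 5] and lists[12] == [14, 15, 3]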
23. Failures
- Theorem: If we use a successor list of length r = Ω(logN) in an initially stable network, and then every node fails with probability 1/2, then:
  - with high probability, find_successor returns the closest living successor
  - the expected time to execute find_successor in the failed network is O(logN)
A lookup fails only if all r nodes in the successor list fail. Assuming independent failures, this happens with probability 2^-r; for r = log2(N), 2^-r = 1/N.
24. Replication
Store replicas of a key at the k nodes succeeding the key (sketch below). The successor list helps to keep the number of replicas per item known. Another approach: store one copy per region.
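A sketch of this replica placement under the earlier ring model (the per-node data dict is an assumption):

    K_REPLICAS = 3

    def replicate(self, key, value):
        # Place copies at the K_REPLICAS nodes succeeding the key; the
        # successor list makes these nodes cheap to track.
        node = self.find_successor(key)
        for _ in range(K_REPLICAS):
            node.data[key] = value     # assumes each node has: self.data = {}
            node = node.successor

    Node.replicate = replicate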
25. Load balance
K keys, N nodes; in the ideal allocation, each node gets K/N keys.
Experiment: 10^4 nodes, 10^5 to 10^6 keys in steps of 10^5.
Plotted: the mean and the 1st and 99th percentiles (the values below which 1% and 99% of the distribution lies) of the number of keys per node. There is a large variation, which increases with the number of keys.
26. Load balance
K keys, N nodes; ideal allocation: K/N keys per node.
Experiment: 10^4 nodes, 5 × 10^5 keys.
[Figure: probability density function of the number of keys per node]
27. Load balance
Node identifiers do not uniformly cover the entire identifier space.
Assume N keys and N nodes. If we divide the space into N equal-sized bins, we would like to see one key per node. However, the probability that a particular bin is empty is (1 - 1/N)^N, which for large N approaches e^-1 ≈ 0.368.
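A quick check of the empty-bin probability, both analytically and by throwing N keys into N bins at random:

    import math, random

    N = 10_000
    print((1 - 1 / N) ** N, math.exp(-1))    # both ~ 0.3679

    bins = [0] * N
    for _ in range(N):
        bins[random.randrange(N)] += 1
    print(bins.count(0) / N)                 # empirically ~ 0.368 as well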
28. Load balance: Virtual Nodes
- Introduce virtual nodes (see the sketch after this list)
  - Map multiple virtual nodes with unrelated identifiers to each real node
  - Each key is hashed to a virtual node, which is then mapped to an actual node
- Increase the number of virtual nodes from N to N logN
  - Worst-case path length O(log(N logN)), which is still O(logN)
  - Each actual node needs r times as much space for the finger tables of its virtual nodes: if r = logN, then log^2(N) entries; for N = 10^6, about 400 entries
  - The number of routing messages per node also increases
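A sketch of the virtual-node idea as consistent hashing: each physical node hosts v virtual identifiers, and a key belongs to the physical node owning the next virtual identifier clockwise (the hash and the naming scheme are illustrative):

    import bisect, hashlib

    def h(s, m=32):
        # Illustrative hash into a 2**m identifier space.
        return int(hashlib.sha1(s.encode()).hexdigest(), 16) % (2 ** m)

    def build_ring(physical_nodes, v):
        # v virtual ids per physical node, with unrelated identifiers.
        return sorted((h(f"{p}#{j}"), p)
                      for p in physical_nodes for j in range(v))

    def lookup(ring, key):
        ids = [ident for ident, _ in ring]
        i = bisect.bisect_right(ids, h(key)) % len(ring)
        return ring[i][1]      # the physical node responsible for the key

    ring = build_ring(["a", "b", "c", "d"], v=20)   # v ~ logN in CHORD
    print(lookup(ring, "some-key"))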
29. Load balance
30. Load balance
CAN:
- One virtual node (zone) -> many physical nodes: fewer virtual nodes -> shorter path length; physical network awareness
- Many virtual nodes -> one physical node: more virtual nodes -> longer path length; data load balance
31. Performance Evaluation
CAN: simulation; "knobs-on full" vs. "bare bones" configurations; uses network topology generators.
CHORD: simulation (lookups run in iterative style; note how this affects network proximity). Also a small distributed experiment that reports latency measurements: 10 sites in the USA, simulating more than 10 nodes by running more than one copy of CHORD at each site.
32. Metrics for System Performance
- Path length: overlay hops to route between two nodes in the CAN space
- Latency
  - End-to-end latency between two nodes
  - Per-hop latency: end-to-end latency divided by the path length
- Neighbor state
- Volume per node (indicative of data and query load)
- Load per node (lookup load)
33. Metrics for System Performance
- Routing fault tolerance
- Data fault tolerance
- Maintenance cost: cost of join/leave, replication, etc.
  - How? Measured either separately or as the overall network traffic
- Churn (dynamic behavior)
34. CHORD: Magic Constants
- Period of stabilize
- Period of fix_fingers
- Maximum number of virtual nodes, m
- Number of virtual nodes per physical node (O(logN))
- Size of the successor list (O(logN))
- Hash function
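The knobs above, collected as one configuration object; every value here is an assumption in the spirit of the paper, not prescribed by it:

    from dataclasses import dataclass

    @dataclass
    class ChordConfig:
        m: int = 160                       # identifier bits, max 2**m ids
        stabilize_period_s: float = 1.0    # how often stabilize runs
        fix_fingers_period_s: float = 5.0  # less often than stabilize
        virtual_nodes: int = 20            # ~ logN per physical node
        successor_list_len: int = 20       # r ~ logN
        hash_function: str = "sha1"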
35. Path Length
The average path length is actually ½ log2(N): a lookup follows, on average, half of the log2(N) bits.
Experiment with 2^12 nodes: ½ × 12 = 6 expected hops.
36. CHORD: more
- Simultaneous node failures
  - Randomly select a fraction p of the nodes to fail
  - The network stabilizes
  - The lookup failure rate is p (it could be worse if, say, the network partitioned)
- Lookups during stabilization
37. CHORD: future
- Detect malicious participants
  - e.g., nodes that take a false position in the ring to steal data
- Physical network awareness
38. BATON
- Additional issues
  - Fault tolerance
  - Other ways to restructure/balance the tree
  - (Workload) load balance
39. Failures
There is routing redundancy:
- Upon node departure or failure, the parent can reconstruct the entries
  - Assume node x fails; any detected failures of x are reported to its parent y
  - y regenerates the routing tables of x (Theorem 2)
- Messages can still be routed
  - Sideways (redundancy similar to CHORD)
  - Up-down (a node can find its parent through its neighbors)
40. AVL-like Restructuring
The network may be restructured using AVL-like rotations. No data movement is needed, but some routing tables need to be updated.
41. Load Balance
Each node keeps statistics about the number of queries or messages it receives, and adjusts its data range to equalize the workload with adjacent nodes. For leaves: find another, less loaded, leaf, say v; have v transfer its load to its parent, and make v join again as the overloaded node's child. Performance results (simulation) are reported; no surprises.
42. Replication - Beehive
- Proactive, model-driven replication
  - vs. passive (demand-driven) replication, such as caching objects along a lookup path
  - A hint for BATON
- Beehive
  - The average query path length is reduced by one when an object is proactively replicated at all nodes logically preceding its home node on all query paths (toy illustration below)
- BATON
  - Range queries
  - Many paths to data
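A toy illustration of the Beehive observation (the query paths here are made up; each ends at the object's home node): replicating the object at the node immediately preceding the home node on every query path ends each lookup one hop earlier:

    paths = [[1, 5, 9, 12], [3, 9, 12], [14, 12]]   # hypothetical paths

    replicas = {p[-2] for p in paths}    # one replica per penultimate node
    hops_before = [len(p) - 1 for p in paths]   # hops to the home node
    hops_after = [h - 1 for h in hops_before]   # lookups stop at a replica
    print(replicas, hops_before, hops_after)    # average drops by one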