Introduction to Structured Overlay Networks


Slides: 70

Provided by: sei114


Transcript and Presenter's Notes


1
Introduction to Structured Overlay Networks

Seif Haridi
KTH/SICS

2
Presentation Overview

Gentle introduction to Structured Overlay
Networks and Distributed Hash Tables
Chord algorithms and others

3
What's a Distributed Hash Table (DHT)?

An ordinary hash table, which is distributed
Every node provides a lookup operation
Provide the value associated with a key
Nodes keep routing pointers
If item not found, route to another node

11/20/2009
4
So what?
Time to find data is logarithmic
Size of routing tables is logarithmic
Example: log2(1000000) ≈ 20
EFFICIENT!
Store number of items proportional to number of nodes
Typically, with D items and n nodes:
Store D/n items per node
Move D/n items when nodes join/leave/fail
EFFICIENT!
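
A quick check of the arithmetic above, as a minimal Python sketch. The function names are illustrative and not from the slides, and the item count in the usage line is an assumed figure.

```python
import math

def routing_cost(num_nodes: int) -> int:
    """Approximate lookup hops / routing-table entries: ceil(log2(n))."""
    return math.ceil(math.log2(num_nodes))

def items_per_node(num_items: int, num_nodes: int) -> float:
    """Average number of items stored (and moved on join/leave/fail): D/n."""
    return num_items / num_nodes

print(routing_cost(1_000_000))                 # 20
print(items_per_node(10_000_000, 1_000_000))   # 10.0 (assumed D = 10 million)
```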

Self-management of routing info
Ensure routing information is up-to-date
Self-management of items
Ensure that data is always replicated and
available

Characteristic properties
Scalability
Number of nodes can be huge
Number of items can be huge
Self-manage in the presence of joins/leaves/failures
Routing information
Data items

5
Traditional Motivation (1/2)

Peer-to-Peer file sharing very popular
Napster
Completely centralized
Central server knows who has what
Judicial problems
Gnutella
Completely decentralized
Ask everyone you know to find data
Very inefficient

[Figure labels: central index, decentralized index]
6
Traditional Motivation (2/2)

Grand vision of DHTs
Provide efficient file sharing
Quote from Chord: "In particular, Chord can
help avoid single points of failure or control
that systems like Napster possess, and the lack
of scalability that systems like Gnutella display
because of their widespread use of broadcasts."
(Stoica et al. 2001)
Hidden assumptions
Millions of unreliable nodes
User can switch off computer any time
(leave = failure)
Extreme dynamism (nodes joining/leaving/failing)
Heterogeneity of computers and latencies
Untrusted nodes

7
Motivation: DHT overlay as communication
infrastructure

Internet communication
IP/port, TCP and UDP
Not suited for 21st century computing
Firewalls
NATs
Changing IP addresses

8
Name based communication

DHTs can overcome these
How?
Use the DHT
Map names to locations
Bypass firewalls and NATs by routing through
neighbors

9
Name based communication

What about group communication?
IP Multicast is not enabled on the Internet
Use the overlay to broadcast to all nodes
Create multiple groups, broadcast within each

10
What's it good for?

Let's look at 10 applications built using such
systems

11
Distributed Backup

Setup
Clients install the backup tool
Decide on amount of space to share
Choose files for backup
Regular backup
Data is encrypted
Stored in the directory

12
Distributed File System

Similar to AFS and NFS
Files stored in directory
What is new?
Application logic self-managed
Add/remove servers on the fly
Automatically handles failures
Automatically load-balances
No manual configuration needed

13
P2P Cache

A distributed cache
Every node in an org. runs a client
Want to browse a web page?
If it exists locally -> download it from a peer
Otherwise, fetch and cache
No central proxy needed

14
P2P Web Servers

Distributed Web Server
Pages stored in the directory
What is new?
Application logic self-managed
Automatically load-balances
Add/remove servers on the fly
Automatically handles failures

15
P2P SIP

Session Initiation Protocol
Used to initiate calls on the Internet
Is being standardized
Use the directory to find end-hosts
Improving Skype

16
Host Identity Payload (HIP)

Uses the directory to provide seamless mobility
Unlike Mobile IP
No home agent needed
Self-managing

17
PIER (databases)

A relational view of the directory
Use SQL to fetch data
Standard operations (projection, selection,
equi-join)

18
Summary

DHT is a useful data structure
Assumptions mentioned might not be true
Moderate amount of dynamism
Leave not same thing as failure
Dedicated servers
Nodes can be trusted
Less heterogeneity

19
Chord as Example of DHT
20
How to construct a DHT (Chord)?

Use a logical name space, called the identifier
space, consisting of identifiers {0, 1, 2, ..., N-1}
Identifier space is a logical ring modulo N
Every node picks a random identifier through a hash
function H
Example
Space N = 16: {0, ..., 15}
Five nodes a, b, c, d, e
a picks 6
b picks 5
c picks 0
d picks 11
e picks 2

21
Definition of Successor

The successor of an identifier is the
first node met going in clockwise direction
starting at the identifier
Example
succ(12) = 14
succ(15) = 2
succ(6) = 6
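
The definition of successor can be sketched in a few lines of Python. This is a minimal sketch that assumes the five-node ring from the construction example (identifiers 0, 2, 5, 6, 11 in a space of 16), which is not the node set behind the figure in the example above; the name succ follows the slides, everything else is illustrative.

```python
from bisect import bisect_left

def succ(identifier: int, nodes: list[int], space: int = 16) -> int:
    """First node met going clockwise, starting at the identifier itself."""
    ring = sorted(nodes)
    i = bisect_left(ring, identifier % space)
    return ring[i % len(ring)]      # index len(ring) wraps around to ring[0]

NODES = [6, 5, 0, 11, 2]            # identifiers picked by a, b, c, d, e
print(succ(6, NODES))    # 6  (a node is the successor of its own identifier)
print(succ(7, NODES))    # 11
print(succ(12, NODES))   # 0  (wraps around the ring)
```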

22
Where to store data (Chord)?

Use globally known hash function, H
Each item <key, value> gets
identifier H(key) = k
Store each item at its successor
Node n = succ(k) is responsible for item k
Example
H(Marina) = 12
H(Peter) = 2
H(Seif) = 9
H(Stefan) = 14

Store number of items proportional to number of nodes
Typically (on average), with D items and n nodes:
Store D/n items per node
Move D/n items when nodes join/leave/fail
EFFICIENT!
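
As a rough illustration of "store each item at its successor", the sketch below places the four example items on the five-node ring used elsewhere in the deck (0, 2, 5, 6, 11). The H(key) values are taken from the slide; the ring membership and the helper names are assumptions made for the example.

```python
from bisect import bisect_left

NODES = sorted([0, 2, 5, 6, 11])                 # assumed ring membership
ITEM_IDS = {"Marina": 12, "Peter": 2, "Seif": 9, "Stefan": 14}  # H(key)

def succ(identifier, ring=NODES):
    """First node at or after the identifier, wrapping around the ring."""
    i = bisect_left(ring, identifier)
    return ring[i % len(ring)]

def placement(item_ids):
    """Map each node to the keys it is responsible for."""
    stored = {n: [] for n in NODES}
    for key, ident in item_ids.items():
        stored[succ(ident)].append(key)
    return stored

print(placement(ITEM_IDS))
# {0: ['Marina', 'Stefan'], 2: ['Peter'], 5: [], 6: [], 11: ['Seif']}
```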
23
Where to point (Chord)?

Each node points to its successor
The successor of a node n is succ(n+1)
Known as a node's succ pointer
Each node points to its predecessor
First node met in anti-clockwise direction
starting at n-1
Known as a node's pred pointer
Example
0's successor is succ(1) = 2
2's successor is succ(3) = 5
5's successor is succ(6) = 6
6's successor is succ(7) = 11
11's successor is succ(12) = 0

24
DHT Lookup

To look up a key k
Calculate H(k)
Follow succ pointers until item k is found
Example
Lookup Seif at node 2
H(Seif) = 9
Traverse nodes
2, 5, 6, 11 (BINGO)
Return "Stockholm" to initiator
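
The traversal in this example can be reproduced with a small sketch. It assumes the succ pointers listed on the "Where to point" slide and walks them until it reaches the node responsible for the identifier; the helper names are illustrative.

```python
def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b], walking clockwise from a."""
    if a < b:
        return a < x <= b
    return x > a or x <= b          # the interval wraps past 0

SUCC = {0: 2, 2: 5, 5: 6, 6: 11, 11: 0}          # succ pointers from the slides
PRED = {s: n for n, s in SUCC.items()}           # derived pred pointers

def lookup_path(start, ident):
    """Follow succ pointers from start until the responsible node is reached."""
    path = [start]
    node = start
    while not in_half_open(ident, PRED[node], node):
        node = SUCC[node]
        path.append(node)
    return path

print(lookup_path(2, 9))    # [2, 5, 6, 11]
```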

25
DHT Lookup

(a, b] denotes the segment of the ring moving clockwise
from but not including a until and including b
n.foo(.) denotes an RPC of foo(.) to node n
n.bar denotes an RPC to fetch the value of the
variable bar in node n
We call the process of finding the successor of
an id a LOOKUP
// ask node n to find the successor of id
procedure n.findSuccessor(id)
  if predecessor ≠ nil ∧ id ∈ (predecessor, n] then
    return n
  else if id ∈ (n, successor] then
    return successor
  else // forward the query around the circle
    return successor.findSuccessor(id)

26
DHT Lookup and Update

// ask node n to find the successor of id
procedure n.put(id, value)
  s := findSuccessor(id)
  s.store(id, value)
procedure n.get(id)
  s := findSuccessor(id)
  return s.retrieve(id)
PUT and GET are nothing but lookups!!
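
A minimal in-process sketch of put and get on top of findSuccessor. RPCs are modelled as plain method calls, the ring is built statically (no joins or failures), and build_ring is a hypothetical helper that is not part of the slides.

```python
class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None
        self.items = {}

    @staticmethod
    def _in(x, a, b):
        """True if x is in the ring interval (a, b]."""
        if a < b:
            return a < x <= b
        return x > a or x <= b

    def find_successor(self, ident):
        if self.predecessor is not None and self._in(ident, self.predecessor.id, self.id):
            return self
        if self._in(ident, self.id, self.successor.id):
            return self.successor
        return self.successor.find_successor(ident)   # forward around the circle

    def put(self, ident, value):
        self.find_successor(ident).items[ident] = value

    def get(self, ident):
        return self.find_successor(ident).items.get(ident)

def build_ring(ids):
    """Hypothetical helper: wire succ/pred pointers for a static ring."""
    nodes = [Node(i) for i in sorted(ids)]
    for i, n in enumerate(nodes):
        n.successor = nodes[(i + 1) % len(nodes)]
        n.predecessor = nodes[i - 1]
    return {n.id: n for n in nodes}

ring = build_ring([0, 2, 5, 6, 11])
ring[2].put(9, "Stockholm")      # H(Seif) = 9, stored at node 11
print(ring[6].get(9))            # Stockholm
```

Any node can serve as the entry point: both calls resolve to node 11 before touching the data.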

27
Speeding up lookups

If only pointer to succ(n+1) is used
Worst case lookup time is N, for N nodes
Improving lookup time (finger/routing table)
Point to succ(n+1)
Point to succ(n+2)
Point to succ(n+4)
Point to succ(n+8)
...
Point to succ(n+2^(M-1))
Distance to the destination
is always halved

Time to find data is logarithmic
Size of routing tables is logarithmic
Example: log2(1000000) ≈ 20
EFFICIENT!
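
To make the finger pointers concrete, here is a small sketch that computes succ(n + 2^(i-1)) for i = 1..M. It assumes the five-node example ring with M = 4 and global knowledge of the membership, which a real node would not have; the function names are illustrative.

```python
from bisect import bisect_left

def succ(identifier, ring, space):
    """First node at or after the identifier, wrapping around the ring."""
    i = bisect_left(ring, identifier % space)
    return ring[i % len(ring)]

def finger_table(n, nodes, m):
    """Fingers of node n: succ(n + 2^(i-1)) for i = 1..m."""
    space = 2 ** m
    ring = sorted(nodes)
    return [succ(n + 2 ** (i - 1), ring, space) for i in range(1, m + 1)]

print(finger_table(0, [0, 2, 5, 6, 11], 4))    # [2, 2, 5, 11]
print(finger_table(11, [0, 2, 5, 6, 11], 4))   # [0, 0, 0, 5]
```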
28
Chord Routing (1/7)
[Figure: identifier ring 0..15 with a Get(15) request]

Routing table size M, where N = 2^M
Every node n knows successor(n + 2^(i-1)), for i = 1..M
Routing entries = log2(N)
log2(N) hops from any node to any other node
29
Chord Routing (2/7)
[Figure: identifier ring 0..15 with a Get(15) request in progress]

Routing table size M, where N = 2^M
Every node n knows successor(n + 2^(i-1)), for i = 1..M
Routing entries = log2(N)
log2(N) hops from any node to any other node
30
Chord Routing (3/7)
[Figure: identifier ring 0..15 with a Get(15) request in progress]

Routing table size M, where N = 2^M
Every node n knows successor(n + 2^(i-1)), for i = 1..M
Routing entries = log2(N)
log2(N) hops from any node to any other node
31
Chord Routing (4/7)
[Figure: identifier ring 0..15 with a Get(15) request]

From node 1, only 2 hops to node 0 where item 15
is stored
For an id space of 16 ids, the maximum is log2(16)
= 4 hops between any two nodes
In fact, if nodes are uniformly distributed, the
maximum is log2(# of nodes), i.e. log2(8) = 3 hops
between any two nodes
The average complexity is
½ log(nodes)
32
Chord Routing (5/7): Pseudo code
findSuccessor(.)

// ask node n to find the successor of id
procedure n.findSuccessor(id)
  if predecessor ≠ nil ∧ id ∈ (predecessor, n] then
    return n
  if id ∈ (n, successor] then
    return successor
  else
    n' := closestPrecedingNode(id)
    return n'.findSuccessor(id)
// search locally for the highest predecessor of id
procedure closestPrecedingNode(id)
  for i = m downto 1 do
    if finger[i] ∈ (n, id) then return finger[i]
  end
  return n
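
The pseudo code above can be exercised with a runnable sketch. It assumes a static five-node ring (0, 2, 5, 6, 11) with M = 4 and correct finger tables, replaces RPCs with dictionary lookups, and returns the path of nodes that handled the query so the hop count is visible. All names other than the two procedures are illustrative.

```python
from bisect import bisect_left

M = 4                 # identifier bits, so the space is 0..15
SPACE = 2 ** M

def between(x, a, b, inclusive_right):
    """True if x is in the ring interval (a, b) or (a, b], clockwise from a."""
    if x == b:
        return inclusive_right
    if a < b:
        return a < x < b
    return x > a or x < b          # interval wraps past 0 (or a == b)

class Node:
    def __init__(self, ident, ring):
        i = ring.index(ident)
        self.id = ident
        self.successor = ring[(i + 1) % len(ring)]
        self.predecessor = ring[i - 1]
        # finger[i] = succ(n + 2^(i-1)) for i = 1..M, stored 0-based
        self.finger = [
            ring[bisect_left(ring, (ident + 2 ** k) % SPACE) % len(ring)]
            for k in range(M)
        ]

def closest_preceding_node(node, ident):
    for f in reversed(node.finger):            # i = m downto 1
        if between(f, node.id, ident, inclusive_right=False):
            return f
    return node.id

def find_successor(nodes, start, ident):
    """Return (responsible node id, list of nodes that handled the query)."""
    path = [start]
    n = nodes[start]
    while True:
        if between(ident, n.predecessor, n.id, inclusive_right=True):
            return n.id, path
        if between(ident, n.id, n.successor, inclusive_right=True):
            return n.successor, path
        nxt = closest_preceding_node(n, ident)
        if nxt == n.id:                        # defensive: fall back to succ
            nxt = n.successor
        path.append(nxt)
        n = nodes[nxt]

RING = [0, 2, 5, 6, 11]
NODES = {i: Node(i, RING) for i in RING}
print(find_successor(NODES, 2, 9))   # (11, [2, 6])
```

Compared with following succ pointers only (2, 5, 6, 11), the finger at node 2 jumps straight to node 6, which already knows that 11 is responsible.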

33
Chord Discussion

We are basically done
But...
What about joins and failures/leaves?
Nodes come and go as they wish
What about data?
Should I lose my doc because some kid decided to
shut down his machine and he happened to store my
file?
What about storing addresses of files
instead of files?
What did we gain compared to Gnutella? Increased
guarantees and determinism?
So actually we just started...

34
Agenda

Handling successor pointers
Joins, Leaves
Scalability
Routing table: reducing the cost from O(N) to
O(log N)
Failures (for all the above)

35
Handling Successors: Ring maintenance

Everything depends on successor pointers, so we
had better have them right all the time!!
In Chord, in addition to the successor pointer,
every node has a predecessor pointer as well for
ring maintenance

36
Handling Dynamism

Periodic stabilization is used to make pointers
eventually correct
Try pointing succ to closest alive successor
Try pointing pred to closest alive predecessor

When receiving notify(p) at n
  if pred = nil or p is in (pred, n]
    set pred := p

Periodically at n
  v := succ.pred
  if v ≠ nil and v is in (n, succ]
    set succ := v
  send a notify(n) to succ

37
Handling joins

When n joins
Find n's successor with lookup(n)
Set succ to n's successor
Stabilization fixes the rest

[Figure: ring segment with nodes 11, 13, 15]

Periodically at n
  set v := succ.pred
  if v ≠ nil and v is in (n, succ]
    set succ := v
  send a notify(n) to succ

When receiving notify(p) at n
  if pred = nil or p is in (pred, n]
    set pred := p
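
A small simulation of the join above, under simplifying assumptions: a two-node ring (11 and 15), node 13 joining, no failures, and the lookup result (15) passed in directly instead of being computed. Method calls stand in for messages.

```python
def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

class Node:
    def __init__(self, ident):
        self.id = ident
        self.succ = self
        self.pred = None

    def join(self, successor):
        """Set succ to the node returned by lookup(n); pred stays nil."""
        self.succ = successor

    def stabilize(self):                       # "Periodically at n"
        v = self.succ.pred
        if v is not None and in_half_open(v.id, self.id, self.succ.id):
            self.succ = v
        self.succ.notify(self)

    def notify(self, p):                       # "When receiving notify(p) at n"
        if self.pred is None or in_half_open(p.id, self.pred.id, self.id):
            self.pred = p

n11, n13, n15 = Node(11), Node(13), Node(15)
n11.succ, n11.pred = n15, n15                  # existing ring: 11 -> 15 -> 11
n15.succ, n15.pred = n11, n11
n13.join(n15)                                  # lookup(13) would return 15

for _ in range(2):                             # two stabilization rounds
    for n in (n11, n13, n15):
        n.stabilize()

print([(n.id, n.succ.id, n.pred.id) for n in (n11, n13, n15)])
# [(11, 13, 15), (13, 15, 11), (15, 11, 13)]
```

The first round lets 15 learn about 13 through notify; the second lets 11 discover 13 through succ.pred, after which the ring is 11, 13, 15.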

S. Haridi, ID2210, Lecture 02
38
Handling Successors - Chord Algorithm
39
Handling Joins/Leaves for Fingers: Finger
Stabilization (1/5)

Periodically refresh finger table entries, and
store the index of the next finger to fix
This is also the initialization procedure for the
finger table (copy the finger table of succ, then
fix)
Local variable next, initially 0
procedure n.fixFingers()
  next := next + 1
  if next > m then next := 1
  finger[next] := findSuccessor(n ⊕ 2^(next-1))
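
A sketch of fixFingers in Python, assuming ⊕ means addition modulo 2^m. The find_successor used here is a brute-force stand-in on a static ring rather than a real DHT lookup, and FingerFixer is a hypothetical wrapper class.

```python
from bisect import bisect_left

M = 4
RING = [0, 2, 5, 6, 11]            # assumed static membership

def find_successor(ident):
    """Stand-in for the DHT lookup: brute-force succ() on a static ring."""
    i = bisect_left(RING, ident % 2 ** M)
    return RING[i % len(RING)]

class FingerFixer:
    def __init__(self, n):
        self.n = n
        self.next = 0                          # index of the last finger fixed
        self.finger = [None] * (M + 1)         # index 0 unused; fingers are 1..M

    def fix_fingers(self):
        """Refresh one finger per call, cycling through 1..M."""
        self.next += 1
        if self.next > M:
            self.next = 1
        start = (self.n + 2 ** (self.next - 1)) % 2 ** M
        self.finger[self.next] = find_successor(start)

node0 = FingerFixer(0)
for _ in range(M):                             # M periodic calls fill the table
    node0.fix_fingers()
print(node0.finger[1:])                        # [2, 2, 5, 11]
```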

40
Example: finger stabilization (2/5)

Current situation: succ(N48) is N60
Succ(N21.Finger[j].start) = Succ(53)
N21.Finger[j].node = N60

[Figure: ring with N21, N26, N32, N48, N53, N60; N21.Finger[j].start and N21.Finger[j].node marked]
41
Example: finger stabilization (3/5)

New node N56 joins and stabilizes successor
pointer
Finger j of node N21 is wrong
N21 eventually tries to fix finger j by looking up
53, which stops at N48; however, nothing
changes

[Figure: ring with N21, N26, N32, N48, N53, N56, N60; N21.Finger[j].start and N21.Finger[j].node marked]
42
Example: finger stabilization (4/5)

N48 will eventually stabilize its successor
This means the ring is correct now.

[Figure: ring with N21, N26, N32, N48, N53, N56, N60; N21.Finger[j].start and N21.Finger[j].node marked]
43
Example: finger stabilization (5/5)

When N21 tries to fix Finger j again, this time
the response from N48 will be correct and N21
corrects the finger

[Figure: ring with N21, N26, N32, N48, N53, N56, N60; N21.Finger[j].start and N21.Finger[j].node marked]
44
Agenda

Handling successor pointers
Joins, Leaves
Scalability
Routing table: reducing the cost from O(N) to
O(log N)
Failures (for all the above)
Handling data
Joins, Leaves

45
Handling Failures: Replication of Successors

Evidently the failure of one successor pointer
means total collapse
Solution: A node has a successor list of size
r containing the immediate r successors
How big should r be? log(nodes) or a large
constant should be ok
Enhance periodic stabilization to handle failures

46
Dealing with failures

Each node keeps a successor-list
Pointer to r closest successors
succ(n+1)
succ(succ(n+1)+1)
succ(succ(succ(n+1)+1)+1)
...
If successor fails
Replace with closest alive successor
If predecessor fails
Set pred to nil
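
The successor-list rule can be illustrated with a short sketch. It assumes the five-node example ring, r = 3, and a known set of failed nodes standing in for a failure detector; both helper names are illustrative.

```python
def successor_list(n, ring, r):
    """The r closest successors of node n on a sorted ring (r < len(ring))."""
    ring = sorted(ring)
    i = ring.index(n)
    return [ring[(i + k) % len(ring)] for k in range(1, r + 1)]

def closest_alive(succ_list, failed):
    """First entry of the successor list that has not failed, or None."""
    for s in succ_list:
        if s not in failed:
            return s
    return None

succs = successor_list(2, [0, 2, 5, 6, 11], r=3)
print(succs)                           # [5, 6, 11]
print(closest_alive(succs, {5}))       # 6
print(closest_alive(succs, {5, 6}))    # 11
```

If all r entries fail at once the node is cut off from the ring, which is why r is chosen around log(nodes) or as a large constant.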

47
Handling leaves

When n leaves
Just disappear (like failure)
When pred is detected as failed
Set pred to nil
When succ is detected as failed
Set succ to closest alive in successor list

[Figure: ring segment with nodes 11, 13, 15]

Periodically at n
  set v := succ.pred
  if v ≠ nil and v is in (n, succ]
    set succ := v
  send a notify(n) to succ

When receiving notify(p) at n
  if pred = nil or p is in (pred, n]
    set pred := p

48
Handling Failures- Ring (1/5)

Maintaining the ring
Each node maintains a successor list of length r
If a node's immediate successor fails, it uses
the second entry in its successor list
updateSuccessorList copies a successor list from
s, removing the last entry and prepending s
Join a Chord ring containing node n'
procedure n.join(n')
  predecessor := nil
  s := n'.findSuccessor(n)
  updateSuccessorList(s.successorList)
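
One way to read updateSuccessorList, as a minimal sketch: take s's list, drop its last entry, and put s in front. The function signature (passing s and r explicitly) is an assumption made so the example is self-contained.

```python
def update_successor_list(s, s_successor_list, r):
    """Copy s's successor list, remove the last entry, prepend s."""
    return ([s] + s_successor_list[:-1])[:r]

# Node 2 on the ring 0, 2, 5, 6, 11 with r = 3:
# its successor is 5, whose own list is [6, 11, 0].
print(update_successor_list(5, [6, 11, 0], r=3))   # [5, 6, 11]
```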

49
Handling Failures- Ring (2/5)

Check whether predecessor has failed (Failure
detector)
procedure n.checkPredecessor()
  if predecessor has failed then
    predecessor := nil

50
Handling Failures- Ring (3/5)

procedure n.stabilize()
  s := first alive node in successorList
  x := s.predecessor
  if x not nil and x ∈ (n, s) then s := x end
  updateSuccessorList(s.successorList)
  s.notify(n)
procedure n.notify(n')
  if predecessor = nil or n' ∈ (predecessor, n) then
    predecessor := n'

51
Failure: Ring (4/5) Example: Node failure (N26)

Initially

[Figure: nodes N21, N26, N32 with pointers suc(N21,1), suc(N21,2), suc(N26,1), pred(N32)]

After N21 performed stabilize(), before
N21.notify(N32)

[Figure: nodes N21, N26, N32 with pointers suc(N21,1), pred(N32)]
52
Failure: Ring (5/5) Example: Node failure (N26)

After N21 performed stabilize(), before
N21.notify(N32)
N21.notify(N32) has no effect

[Figure: nodes N21, N26, N32 with pointers suc(N21,1), pred(N32)]

After N32.checkPredecessor()

[Figure: nodes N21, N26, N32 with pointer suc(N21,1)]

Next N21.stabilize() fixes N32's predecessor
53
Failure: Lookups (1/5)

// ask node n to find the successor of id
procedure n.findSuccessor(id)
  if id ∈ (n, successor] then
    return successor
  else
    n' := closestPrecedingNode(id)
    return try
      n'.findSuccessor(id)
    catch failure of n' then
      mark n' in finger[.] as failed
      n.findSuccessor(id)
// search locally for the highest predecessor of id
procedure closestPrecedingNode(id)
  for i = m downto 1 do
    if finger[i].node is alive and
       finger[i] ∈ (n, id) then return finger[i]
  end
  return n

54
Some Chord Results: Load balancing of keys

For any set of N nodes and K keys, with high
probability
Each node is responsible for at most (1 + ε)K/N
keys
When an (N + 1)st node joins or leaves the
network, responsibility for O(K/N) keys changes
hands (and only to or from the joining or leaving
node)
ε is bounded by (at most) O(log N)

55
Some Chord Results: Load balancing of keys

ε is reduced to a small constant by running log N
virtual nodes (each with own identifier) on each
physical node.
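
The effect of virtual nodes can be explored with a small experiment. This is only a sketch under stated assumptions: SHA-1 truncated to 32 bits stands in for the hash function, the node and key labels are made up, and no particular outcome is claimed here; it simply counts keys per physical node so the spread can be compared with and without virtual nodes.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

def h(label: str, bits: int = 32) -> int:
    """Assumed hash: the first bits/8 bytes of SHA-1, read as an integer."""
    return int.from_bytes(hashlib.sha1(label.encode()).digest()[:bits // 8], "big")

def key_load(num_nodes, num_keys, vnodes_per_node):
    """Count keys per physical node when each runs several virtual nodes."""
    points = sorted(
        (h(f"node-{p}-v{v}"), p)
        for p in range(num_nodes)
        for v in range(vnodes_per_node)
    )
    ids = [ident for ident, _ in points]
    load = Counter({p: 0 for p in range(num_nodes)})
    for k in range(num_keys):
        i = bisect_left(ids, h(f"key-{k}")) % len(ids)   # successor on the ring
        load[points[i][1]] += 1
    return load

plain = key_load(num_nodes=16, num_keys=10_000, vnodes_per_node=1)
virtual = key_load(num_nodes=16, num_keys=10_000, vnodes_per_node=4)  # log2(16)
print(max(plain.values()), max(virtual.values()))   # compare the busiest node
```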

56
Some Chord Results: Lookup is logarithmic in
number of Nodes

With high probability, the number of nodes that
must be contacted to find a successor in an
N-node network is O(log N).
This is only if node and key identifiers are
random.

57
Some Chord Results: Successor List Failure

If we use a successor list of length r = Ω(log N)
in a network that is initially stable, and then
every node fails with probability 1/2, then with
high probability find successor returns the
closest living successor to the query key
Notice that this requires the nodes in the successor
list to be random

58
Variations of Chord

DKS
Chord#

59
DKS Routing

Generalization of Chord to provide arbitrary
arity
Provide log_k(n) hops per lookup
k being a configurable parameter
n being the number of nodes
Instead of only log2(n)
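
The effect of the arity k on the hop count can be checked with a few lines of Python. The function name is illustrative, and the loop uses integer arithmetic to avoid floating-point rounding in the logarithm.

```python
def levels(k: int, n: int) -> int:
    """Smallest L with k**L >= n, i.e. ceil(log_k(n)), for k >= 2."""
    count, reach = 0, 1
    while reach < n:
        reach *= k
        count += 1
    return count

print(levels(2, 64))             # 6  (Chord-style binary routing)
print(levels(4, 64))             # 3  (k = 4, N = 64 = 4^3)
print(levels(1000, 1_000_000))   # 2  (a 2-hop system, at the cost of large tables)
```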

60
Achieving log_k(n) lookup

Each node has log_k(N) = L levels, N = k^L
Each level contains k intervals
Example: k = 4, N = 64 (4^3), node 0

[Figure: identifier ring seen from node 0, with marks at 0, 4, 8, 12, 16, 32, 48]
61
Achieving log_k(n) lookup

Each node has log_k(N) = L levels, N = k^L
Each level contains k intervals
Example: k = 4, N = 64 (4^3), node 0

[Figure: identifier ring seen from node 0, with marks at 0, 4, 8, 12, 16, 32, 48]
62
Achieving log_k(n) lookup

Each node has log_k(N) = L levels, N = k^L
Each level contains k intervals
Example: k = 4, N = 64 (4^3), node 0

[Figure: identifier ring seen from node 0, with marks at 0, 4, 8, 12, 16, 32, 48]
63
Arity is Important

Maximum number of hops can be configured
Example, a 2-hop system

64
Chord#

The routing table has exponentially increasing
pointers on the ring (node space) and NOT the
identifier space (skip-list like structure)

65
Routing Table of Chord

Building the routing table

log2(N) pointers
exponentially spaced pointers

Chord
66
Chord# vs. Chord
Good for load balancing
67
Effect of virtual nodes
68
Stretch (proximity routing)

Stretch is the ratio between
the latency of a Chord lookup, from the time the
lookup is initiated to the time the result is
returned to the initiator, and
the latency of an optimal lookup using the underlying
network
Network lookup
is computed as the round-trip time between the
initiator and the server responsible for the
queried ID.

69
Stretch
