Title: Peer-To-Peer Systems
1Peer-To-Peer Systems
2Introduction
- Monolithic application
- Simple client-server
- Multi-tier client-server
- Request-response
- Pull/Push mode
- Tightly/loosely coupled
- Centralized/distributed systems
- Master-slave systems
- Peer-to-peer systems
- The concept of overlays
3Peer-to-peer systems
- Represents a paradigm for the construction of
distributed systems and applications in which
data and computational resources are contributed
by many hosts on the Internet.
- Key issues
- Placement of data objects among many hosts
- Provision for accessing data that assures
balanced workload and availability
- Low overhead
4P2P Systems
- Exploit existing naming, routing, data
replication, and security techniques to provide a
reliable resource-sharing layer over an
unreliable and untrusted collection of computers
and networks.
- They are more suitable for immutable data.
- Fully decentralized and self-organizing.
- Important issues are placement and delivery of
data.
5Three Generations of P2P
- First generation: the Napster music exchange server
- Second generation: file sharing with greater
scalability, anonymity, and fault tolerance
- Freenet, Gnutella, Kazaa, BitTorrent
- Third generation: characterized by the emergence of
middleware for application independence
- Pastry, Tapestry
6IP vs. overlay routing for peer-to-peer
applications
Each row compares IP routing (IP) with an
application-level routing overlay (Overlay).
Scale
- IP: IPv4 is limited to 2^32 addressable nodes. The
IPv6 name space is much more generous (2^128), but
addresses in both versions are hierarchically
structured and much of the space is pre-allocated
according to administrative requirements.
- Overlay: Peer-to-peer systems can address more
objects. The GUID name space is very large and flat
(>2^128), allowing it to be much more fully occupied.
Load balancing
- IP: Loads on routers are determined by network
topology and associated traffic patterns.
- Overlay: Object locations can be randomized and
hence traffic patterns are divorced from the network
topology.
Network dynamics (addition/deletion of objects/nodes)
- IP: IP routing tables are updated asynchronously on
a best-efforts basis with time constants on the
order of 1 hour.
- Overlay: Routing tables can be updated
synchronously or asynchronously with fractions of a
second delays.
Fault tolerance
- IP: Redundancy is designed into the IP network by
its managers, ensuring tolerance of a single router
or network connectivity failure. n-fold replication
is costly.
- Overlay: Routes and object references can be
replicated n-fold, ensuring tolerance of n failures
of nodes or connections.
Target identification
- IP: Each IP address maps to exactly one target
node.
- Overlay: Messages can be routed to the nearest
replica of a target object.
Security and anonymity
- IP: Addressing is only secure when all nodes are
trusted. Anonymity for the owners of addresses is
not achievable.
- Overlay: Security can be achieved even in
environments with limited trust. A limited degree of
anonymity can be provided.
7Curious to know how GUID is generated?
- Determine the values for the UTC-based timestamp
and clock sequence to be used in the UUID.
- For the purposes of this algorithm, consider the
timestamp to be a 60-bit unsigned integer and the
clock sequence to be a 14-bit unsigned integer.
Sequentially number the bits in a field, starting
with zero for the least significant bit.
- Set the time_low field equal to the least
significant 32 bits (bits zero through 31) of the
timestamp in the same order of significance.
- Set the time_mid field equal to bits 32 through 47
from the timestamp in the same order of
significance.
- Set the 12 least significant bits (bits zero
through 11) of the time_hi_and_version field equal
to bits 48 through 59 from the timestamp in the same
order of significance.
- Set the four most significant bits (bits 12
through 15) of the time_hi_and_version field to the
4-bit version number corresponding to the UUID
version being created.
- Set the clock_seq_low field to the eight least
significant bits (bits zero through 7) of the
clock sequence in the same order of significance.
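The field assignments described above amount to a handful of masks and shifts. The following is a minimal sketch, assuming the timestamp and clock sequence have already been obtained; the function name `pack_uuid_time_fields` and the sample values are hypothetical and chosen only for illustration.

```python
def pack_uuid_time_fields(timestamp: int, clock_seq: int, version: int = 1):
    """Split a 60-bit timestamp and 14-bit clock sequence into UUID fields.

    Hypothetical helper: it covers only the fields named in the steps above.
    """
    if not 0 <= timestamp < (1 << 60):
        raise ValueError("timestamp must fit in 60 bits")
    if not 0 <= clock_seq < (1 << 14):
        raise ValueError("clock sequence must fit in 14 bits")
    if not 0 <= version < (1 << 4):
        raise ValueError("version must fit in 4 bits")

    time_low = timestamp & 0xFFFFFFFF                # timestamp bits 0-31
    time_mid = (timestamp >> 32) & 0xFFFF            # timestamp bits 32-47
    time_hi = (timestamp >> 48) & 0x0FFF             # timestamp bits 48-59
    time_hi_and_version = (version << 12) | time_hi  # version in bits 12-15
    clock_seq_low = clock_seq & 0xFF                 # clock sequence bits 0-7
    return time_low, time_mid, time_hi_and_version, clock_seq_low


# Sample values are arbitrary, picked so each field is easy to read in hex.
fields = pack_uuid_time_fields(0x123456789ABCDEF, 0x2ABC)
print([hex(f) for f in fields])
```

Each hexadecimal digit of the sample timestamp covers four bits, so the split can be checked by eye against the bit ranges in the steps.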
8More on GUID
- Are usually a secure hash of attributes of the
resource, so you don't expect the state of the
resource to change.
- Hash of global clock, resource state, and IP
address
- http://www.famkruithof.net/uuid/uuidgen
- Remember that N-fold replication is possible, thus
many replicas of an object with a given GUID may be
present.
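Deriving a GUID by hashing resource attributes can be sketched as follows. This is an illustration under stated assumptions: SHA-1 is picked here only as one example of a secure hash with a 160-bit output, and the function name `make_guid` and the attribute values are hypothetical.

```python
import hashlib


def make_guid(*attributes: bytes) -> int:
    """Return a 160-bit GUID computed from the given resource attributes.

    Hypothetical helper; SHA-1 is an assumption, any secure hash would do.
    """
    hasher = hashlib.sha1()
    for attribute in attributes:
        # Length-prefix each attribute so ("ab", "c") and ("a", "bc") differ.
        hasher.update(len(attribute).to_bytes(4, "big"))
        hasher.update(attribute)
    return int.from_bytes(hasher.digest(), "big")


guid = make_guid(b"song.mp3", b"file contents")
print(f"{guid:040x}")
```

Because the GUID depends only on the attributes, every replica of the same immutable object gets the same GUID, while changing the object's state would produce a different one.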
9Overlay Routing
- It is different from IP routing, but is strongly
influenced by it.
- Routing tables may be updated synchronously or
asynchronously.
- N-fold replication affects the routing.
10Napster peer-to-peer file sharing with a
centralized, replicated index
11Napster (contd.)
- Napster demonstrated the feasibility of building
a useful large-scale service which depends wholly
on individual computers.
- Music files are not updated: the state of the
resource is stable
- No guarantees about availability: this fit well
for music files
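The idea of a centralized index over files held by peers can be sketched as below. This is not Napster's actual protocol: the class `CentralIndex`, its methods, and the peer addresses are hypothetical, and index replication is left out.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical in-memory index mapping file names to peer addresses."""

    def __init__(self):
        self._peers_by_file = defaultdict(set)

    def register(self, peer: str, filename: str) -> None:
        # A peer announces that it holds a copy of the file.
        self._peers_by_file[filename].add(peer)

    def unregister_peer(self, peer: str) -> None:
        # A departing peer is dropped from every entry.
        for peers in self._peers_by_file.values():
            peers.discard(peer)

    def lookup(self, filename: str) -> list:
        # The index returns only peer addresses; the file itself is
        # transferred directly between peers.
        return sorted(self._peers_by_file.get(filename, ()))


index = CentralIndex()
index.register("peer-a:6699", "song.mp3")
index.register("peer-b:6699", "song.mp3")
print(index.lookup("song.mp3"))
```

Only the lookup goes through the index in this sketch. A peer that leaves simply disappears from the results, which matches the lack of availability guarantees noted above.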
12Freenet and Gnutella
- Napster maintained a unified index of available
files (resources)
- Gnutella and Freenet used partitioned and
distributed indexes and algorithms specific to
each system
- File location problem: an NFS-like system requires
substantial configuration.
13Peer-to-Peer Middleware
- Functional requirements
- Enable clients to locate and communicate with any
individual resource
- Add and remove nodes
- Add and remove resources
- Simple API
- Non-functional requirements: global scalability,
load balancing, accommodating highly dynamic
host availability, trust, anonymity, deniability
- Active research issue: routing overlay
14Routing Overlay
- Routing overlay locates nodes and objects. It is a
middleware layer responsible for routing requests
from clients to the hosts that hold the object to
which the request is addressed.
- Main difference is that routing is implemented
in the application layer (in addition to the IP
routing at the network layer)
15Distribution of information in a routing overlay
(Figure: nodes A, B, C, and D, each shown with its
own routing knowledge. Legend: object, node.)
16Basic programming interface for a distributed
hash table (DHT) as implemented by the PAST API
over Pastry
put(GUID, data): The data is stored in replicas
at all nodes responsible for the object
identified by GUID.
remove(GUID): Deletes all references to GUID and
the associated data.
value = get(GUID): The data associated with GUID is
retrieved from one of the nodes responsible for it.
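The three operations can be illustrated with a toy, single-process model. This is a sketch only, not the PAST or Pastry implementation: the class `ToyDHT`, integer node IDs, and the rule that the `replicas` numerically closest nodes are responsible for a GUID are all assumptions made for the example.

```python
class ToyDHT:
    """Hypothetical in-memory model of put/get/remove over a set of nodes."""

    def __init__(self, node_ids, replicas=2):
        self.stores = {node_id: {} for node_id in node_ids}
        self.replicas = replicas

    def _responsible(self, guid):
        # Assumption: the nodes numerically closest to the GUID hold it.
        return sorted(self.stores, key=lambda n: abs(n - guid))[: self.replicas]

    def put(self, guid, data):
        # Store a replica at every responsible node.
        for node_id in self._responsible(guid):
            self.stores[node_id][guid] = data

    def get(self, guid):
        # Retrieve the data from one of the responsible nodes.
        for node_id in self._responsible(guid):
            if guid in self.stores[node_id]:
                return self.stores[node_id][guid]
        raise KeyError(guid)

    def remove(self, guid):
        # Delete all references to the GUID and its data.
        for store in self.stores.values():
            store.pop(guid, None)


dht = ToyDHT([10, 50, 90], replicas=2)
dht.put(42, b"hello")
print(dht.get(42))
```

In a real overlay the responsible nodes are found by routing a message towards the GUID instead of by sorting a global node list as this sketch does.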
17Basic programming interface for distributed
object location and routing (DOLR) as implemented
by Tapestry
publish(GUID): GUID can be computed from the
object (or some part of it, e.g. its name). This
function makes the node performing a publish
operation the host for the object corresponding
to GUID.
unpublish(GUID): Makes the object corresponding to
GUID inaccessible.
sendToObj(msg, GUID, [n]): Following the
object-oriented paradigm, an invocation message is
sent to an object in order to access it. This might
be a request to open a TCP connection for data
transfer or to return a message containing all or
part of the object's state. The final optional
parameter [n], if present, requests the delivery
of the same message to n replicas of the object.
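A toy model makes the contrast with a DHT visible: objects stay on the hosts that publish them, and the overlay only records where they are. This is a sketch, not Tapestry's implementation. The class `ToyDOLR`, the explicit `host` argument, and modelling each replica as a handler function are assumptions made for the example.

```python
class ToyDOLR:
    """Hypothetical in-memory model of publish/unpublish/sendToObj."""

    def __init__(self):
        # guid -> {host name: handler standing in for the hosted replica}
        self._replicas = {}

    def publish(self, guid, host, handler):
        # The publishing host becomes a host for the object.
        self._replicas.setdefault(guid, {})[host] = handler

    def unpublish(self, guid, host):
        # Assumption: unpublishing is per host; the object becomes
        # inaccessible once its last replica is unpublished.
        hosts = self._replicas.get(guid, {})
        hosts.pop(host, None)
        if not hosts:
            self._replicas.pop(guid, None)

    def send_to_obj(self, msg, guid, n=1):
        # Deliver msg to up to n replicas and collect their replies.
        hosts = self._replicas.get(guid, {})
        chosen = list(hosts.values())[:n]
        return [handler(msg) for handler in chosen]


dolr = ToyDOLR()
dolr.publish(7, "A", lambda msg: f"A got {msg}")
dolr.publish(7, "B", lambda msg: f"B got {msg}")
print(dolr.send_to_obj("read", 7, n=2))
```

Here replicas are picked in publication order; a real overlay would prefer replicas that are close in the network.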
18Pastry Tapestry Routing
- Prefix routing based on GUIDs
- Narrow search for next node along the route by
applying a binary mask that selects an increasing
number of hexadecimal digits from the destination
GUID after each hop.
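Prefix routing can be sketched as repeatedly forwarding to a known node whose GUID shares a longer prefix with the destination. This is a simplified illustration, not Pastry's or Tapestry's full algorithm: the functions, the four-digit GUIDs, and the per-node lists of known nodes in `tables` are hypothetical, and structures such as Pastry's leaf set are omitted.

```python
from typing import Dict, List, Optional


def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading hexadecimal digits that two GUIDs have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def next_hop(current: str, destination: str, known: List[str]) -> Optional[str]:
    """Pick a known node matching the destination in more digits than current."""
    best, best_len = None, shared_prefix_len(current, destination)
    for node in known:
        length = shared_prefix_len(node, destination)
        if length > best_len:
            best, best_len = node, length
    return best


def route(source: str, destination: str, tables: Dict[str, List[str]]) -> List[str]:
    """Follow next hops until the destination is reached or no progress is possible."""
    path = [source]
    while path[-1] != destination:
        hop = next_hop(path[-1], destination, tables.get(path[-1], []))
        if hop is None:
            break
        path.append(hop)
    return path


# Hypothetical routing knowledge: each node knows only a few others.
tables = {
    "65A1": ["D13D", "9F00"],
    "D13D": ["D4C1", "D100"],
    "D4C1": ["D462"],
    "D462": ["D46A"],
}
print(route("65A1", "D46A", tables))
```

Each hop matches at least one more digit of the destination than the previous node did, so the number of hops is bounded by the GUID length in digits.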
19Figure 10.7 First four rows of a Pastry routing
table
20Figure 10.8 Pastry routing example Based on
Rowstron and Druschel 2001
21Figure 10.9 Pastry's routing algorithm
22Figure 10.10 Tapestry routing From Zhao et al.
2004