Title: CS 162: P2P Networks
1. CS 162: P2P Networks
- Computer Science Division
- Department of Electrical Engineering and Computer Sciences
- University of California, Berkeley
- Berkeley, CA 94720-1776
2. Main Challenge
- Find where a particular file is stored
- Note: this problem is similar to finding a particular page in web caching (see last lecture; what are the differences?)
[Figure: peers holding files A-F; the query "E?" must locate the peer that stores file E]
3. Other Challenges
- Scale: up to hundreds of thousands or millions of machines
- Dynamicity: machines can come and go at any time
4. Napster
- Assumes a centralized index system that maps files (songs) to machines that are alive
- How to find a file (song):
- Query the index system -> it returns a machine that stores the required file
- Ideally this is the closest / least-loaded machine
- ftp the file from that machine
- Advantages
- Simplicity; easy to implement sophisticated search engines on top of the index system
- Disadvantages
- Robustness, scalability (?)
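To make the design concrete, here is a minimal sketch of such a centralized index (the class and method names are hypothetical; real Napster also handled keyword search, load, and liveness):

```python
# Hypothetical sketch of a Napster-style centralized index.
class CentralIndex:
    def __init__(self):
        self.locations = {}   # file name -> set of machines currently alive

    def register(self, machine, files):
        # a machine announces the files it stores when it comes online
        for f in files:
            self.locations.setdefault(f, set()).add(machine)

    def unregister(self, machine):
        # remove a machine from every entry when it leaves
        for holders in self.locations.values():
            holders.discard(machine)

    def query(self, file_name):
        # return any machine that stores the file; a real system would
        # prefer the closest / least-loaded one
        holders = self.locations.get(file_name)
        return next(iter(holders)) if holders else None
```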
5. Napster Example
[Figure: machines m1-m6 store files A-F; the central index maps A->m1, B->m2, C->m3, D->m4, E->m5, F->m6]
6. Gnutella
- Distribute the file location function
- Idea: broadcast the request
- How to find a file:
- Send the request to all neighbors
- Neighbors recursively multicast the request
- Eventually a machine that has the file receives the request, and it sends back the answer
- Advantages
- Totally decentralized, highly robust
- Disadvantages
- Not scalable: the entire network can be swamped with requests (to alleviate this problem, each request carries a TTL)
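A rough sketch of flooding with a TTL, assuming each node object exposes a `files` set and a `neighbors` list (both names are illustrative, not Gnutella's wire protocol):

```python
# Illustrative sketch of Gnutella-style flooding with a TTL.
def flood_query(node, file_name, ttl, seen):
    if file_name in node.files:
        return node                    # this machine has the file: answer flows back
    if ttl == 0 or node in seen:
        return None                    # stop: TTL expired or node already visited
    seen.add(node)
    for neighbor in node.neighbors:
        hit = flood_query(neighbor, file_name, ttl - 1, seen)
        if hit is not None:
            return hit                 # answer propagates back along the query path
    return None
```

Real Gnutella floods all neighbors concurrently and routes replies back along the reverse query path; the recursion above serializes this for brevity, but the TTL mechanism is the same.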
7. Gnutella Example
- Assume m1's neighbors are m2 and m3, and m3's neighbors are m4 and m5
[Figure: machines m1-m6 store files A-F; m1's query floods to m2 and m3, and recursively through m3 to m4 and m5]
8. Two-Level Hierarchy
- Current Gnutella implementation, KaZaa
- Leaf nodes are connected to a small number of ultrapeers (supernodes)
- Query
- A leaf sends the query to its ultrapeers
- If the ultrapeers don't know the answer, they flood the query to other ultrapeers
- More scalable: flooding happens only among ultrapeers (a sketch follows the figure)
[Figure: Oct 2003 crawl of Gnutella, showing the ultrapeer core with attached leaf nodes]
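One way to picture the two-tier query path, assuming hypothetical `Ultrapeer` objects that index their leaves' files:

```python
# Sketch: ultrapeers index their leaves' files, so flooding stays in the upper tier.
class Ultrapeer:
    def __init__(self):
        self.leaf_index = {}        # file name -> leaf node that stores it
        self.peer_ultrapeers = []   # links to other ultrapeers

    def on_leaf_join(self, leaf, files):
        for f in files:
            self.leaf_index[f] = leaf

    def query(self, file_name, ttl=4, seen=None):
        seen = seen or set()
        if file_name in self.leaf_index:
            return self.leaf_index[file_name]   # answered from the local leaf index
        if ttl == 0 or id(self) in seen:
            return None
        seen.add(id(self))
        for up in self.peer_ultrapeers:          # flood among ultrapeers only
            hit = up.query(file_name, ttl - 1, seen)
            if hit is not None:
                return hit
        return None
```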
9. Skype
- Peer-to-peer Internet telephony
- Two-level hierarchy like KaZaa
- Ultrapeers are used mainly to route traffic between NATed end-hosts (see next slide)
- Plus a login server to
- authenticate users
- ensure that names are unique across the network
[Figure: hosts A and B exchange login messages with the login server; data traffic flows between the peers]
(Note: this is the probable protocol; the Skype protocol is not published)
10. Detour: NAT (1/3)
- Network Address Translation
- Motivation: the address scarcity problem in IPv4
- Allows addresses to be allocated independently to hosts behind a NAT
- Two hosts behind two different NATs can have the same address
[Figure: two NAT boxes connect private hosts to the Internet; a host behind each NAT uses the same private address, 192.168.0.1, while the public addresses (64.36.12.64, 169.32.41.10, 128.2.12.30) are globally unique]
11. Detour: NAT (2/3)
- Main idea: use port numbers to multiplex/demultiplex connections of NATed end-hosts
- Map the (IPaddr, Port) of a NATed host to (IPaddrNAT, PortNAT)
[Figure: host 192.168.0.1 talks to 64.36.12.64 across the Internet through a NAT box with public address 169.32.41.10]
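A toy model of the (IPaddr, Port) -> (IPaddrNAT, PortNAT) mapping; the addresses, starting port, and class layout are all illustrative:

```python
# Sketch of a NAT translation table.
class NAT:
    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.mapping = {}         # (private_ip, private_port) -> public_port
        self.reverse = {}         # public_port -> (private_ip, private_port)
        self.next_port = 1024

    def outbound(self, private_ip, private_port):
        # allocate a public port the first time this internal endpoint sends
        key = (private_ip, private_port)
        if key not in self.mapping:
            self.mapping[key] = self.next_port
            self.reverse[self.next_port] = key
            self.next_port += 1
        # the packet leaves with the NAT's public address and the mapped port
        return self.public_ip, self.mapping[key]

    def inbound(self, public_port):
        # demultiplex replies back to the right internal host
        return self.reverse.get(public_port)

nat = NAT("169.32.41.10")
print(nat.outbound("192.168.0.1", 5000))   # ('169.32.41.10', 1024)
print(nat.inbound(1024))                   # ('192.168.0.1', 5000)
```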
12. Detour: NAT (3/3)
- Limitations
- (1) The number of machines behind a NAT is < 64,000. Why? (The port field is 16 bits, so one public address supports at most ~2^16 concurrent mappings)
- (2) A host outside the NAT cannot initiate a connection to a host behind the NAT
- Skype and other P2P systems use login servers and ultrapeers to solve limitation (2)
- How? (Hint: ultrapeers have globally unique (Internet-routable) IP addresses; a sketch follows)
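A plausible sketch of the workaround for limitation (2): this shows the generic rendezvous / connect-back idea, not Skype's actual (unpublished) protocol, and all names are hypothetical:

```python
# Hypothetical rendezvous through an ultrapeer with a public IP address.
class Rendezvous:
    def __init__(self):
        self.sessions = {}               # user name -> open connection from a NATed host

    def register(self, user, connection):
        # NATed hosts connect OUT to the ultrapeer, so this link traverses their NAT
        self.sessions[user] = connection

    def call(self, caller, callee):
        conn = self.sessions.get(callee)
        if conn is None:
            return False                 # callee not online
        # relay a "connect back to caller" request over the existing outbound link;
        # the callee then initiates the connection, which its NAT allows
        conn.send(("connect_back", caller))
        return True
```

The key point is that the NATed callee opened the connection to the ultrapeer itself, so its NAT already has a mapping for that link; the request to call back rides over that existing mapping.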
13. BitTorrent (1/2)
- Allows fast downloads even when sources have low connectivity
- How does it work?
- Split each file into pieces (~256 KB each), and each piece into sub-pieces (~16 KB each)
- The downloader fetches one piece at a time
- Within one piece, the downloader can fetch up to five sub-pieces in parallel
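A small sketch of the piece / sub-piece arithmetic, using the sizes quoted above (the function name and request format are invented for illustration):

```python
# Sketch of piece / sub-piece bookkeeping.
PIECE_SIZE = 256 * 1024        # ~256 KB pieces
SUBPIECE_SIZE = 16 * 1024      # ~16 KB sub-pieces

def subpiece_requests(file_size):
    """Yield (piece_index, offset, length) requests, one piece at a time."""
    for piece_start in range(0, file_size, PIECE_SIZE):
        piece_len = min(PIECE_SIZE, file_size - piece_start)
        for off in range(0, piece_len, SUBPIECE_SIZE):
            length = min(SUBPIECE_SIZE, piece_len - off)
            yield piece_start // PIECE_SIZE, off, length

# e.g. next(subpiece_requests(600_000)) -> (0, 0, 16384)
```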
14. BitTorrent (2/2)
- A download consists of three phases:
- Start: get a piece as soon as possible
- Select a random piece
- Middle: spread all pieces as soon as possible
- Select the rarest piece next (see the sketch below)
- End: avoid getting stuck with a slow source when downloading the last sub-pieces
- Request the same sub-piece from several peers in parallel
- Cancel the slowest downloads once a sub-piece has been received
(For details see http://bittorrent.com/bittorrentecon.pdf)
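A minimal sketch of rarest-piece-first selection for the middle phase; real clients keep far more state, but the counting idea is the same (randomizing among equally rare pieces, as done here, is standard practice):

```python
# Sketch of rarest-piece-first selection.
import random

def pick_piece(needed, peer_bitfields):
    """needed: set of piece indices we still lack.
    peer_bitfields: list of sets, each the pieces one peer advertises."""
    counts = {p: sum(p in bf for bf in peer_bitfields) for p in needed}
    available = {p: c for p, c in counts.items() if c > 0}
    if not available:
        return None                      # no peer has any piece we need
    rarest = min(available.values())
    # break ties randomly so downloaders don't all chase the same piece
    return random.choice([p for p, c in available.items() if c == rarest])
```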
15. Distributed Hash Tables
- Problem
- Given an ID, map it to a host
- Challenges
- Scalability: hundreds of thousands or millions of machines
- Instability
- Changes in routes, congestion, availability of machines
- Heterogeneity
- Latency: 1 ms to 1000 ms
- Bandwidth: 32 Kb/s to 100 Mb/s
- Nodes stay in the system from 10 s to a year
- Trust
- Selfish users
- Malicious users
16. Content Addressable Network (CAN)
- Associate with each node and item a unique ID in a d-dimensional space
- Properties
- Routing table size: O(d)
- Guarantees that a file is found in at most d * n^(1/d) steps, where n is the total number of nodes
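A quick numeric check of the d * n^(1/d) bound (the sample configurations are chosen arbitrarily):

```python
# Evaluate the CAN routing bound d * n**(1/d) for a few sample configurations.
for d, n in [(2, 64), (2, 10_000), (3, 10_000)]:
    print(f"d={d}, n={n}: at most {d * n ** (1 / d):.1f} steps")
# d=2, n=64: at most 16.0 steps
# d=2, n=10000: at most 200.0 steps
# d=3, n=10000: at most 64.6 steps
```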
17. CAN Example: Two-Dimensional Space
- The space is divided between nodes
- All nodes together cover the entire space
- Each node covers either a square or a rectangular area with aspect ratio 1:2 or 2:1
- Example
- Assume a space of size 8 x 8
- Node n1(1, 2) is the first node to join -> it covers the entire space
[Figure: 8 x 8 grid with n1 at (1, 2) owning the whole space]
18. CAN Example: Two-Dimensional Space
- Node n2(4, 2) joins -> the space is divided between n1 and n2
[Figure: 8 x 8 grid now split between n1 and n2]
19. CAN Example: Two-Dimensional Space
- Node n3(3, 5) joins -> the space is further divided
[Figure: 8 x 8 grid; n3 at (3, 5) takes part of the previously assigned space]
20. CAN Example: Two-Dimensional Space
- Nodes n4(5, 5) and n5(6, 6) join
[Figure: 8 x 8 grid with zones for n1, n2, n3, n4, n5]
21. CAN Example: Two-Dimensional Space
- Nodes: n1(1, 2), n2(4, 2), n3(3, 5), n4(5, 5), n5(6, 6)
- Items: f1(2, 3), f2(5, 1), f3(2, 1), f4(7, 5)
[Figure: 8 x 8 grid showing the five node zones and the four item coordinates]
22. CAN Example: Two-Dimensional Space
- Each item is stored by the node that owns the region of the space the item maps to
[Figure: 8 x 8 grid; each item fi is stored by the node whose zone contains fi's coordinates]
23. CAN Query Example
- Each node knows its neighbors in the d-dimensional space
- Forward the query to the neighbor closest to the query ID (a routing sketch follows the figure)
- Example: assume n1 queries f4
[Figure: 8 x 8 grid; the query for f4(7, 5) is forwarded hop by hop from n1 toward the node whose zone contains f4]
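A greedy-forwarding sketch matching the rule above; `zone`, `center`, and `neighbors` are hypothetical attributes of a node object:

```python
# Greedy CAN-style forwarding in 2-D: always move to the neighbor nearest the target.
def dist2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def route(node, target):
    """Walk from `node` to the node whose zone contains the `target` point."""
    while target not in node.zone:       # node.zone supports membership tests
        node = min(node.neighbors, key=lambda nb: dist2(nb.center, target))
    return node
```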
24. Chord
- Associate with each node and item a unique ID in a one-dimensional space
- Properties
- Routing table size: O(log N), where N is the total number of nodes
- Guarantees that a file is found in O(log N) steps
25. Data Structure
- Assume the identifier space is 0..2^m
- Each node maintains
- A finger table
- Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
- The predecessor node
- An item identified by id is stored on the successor node of id
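A sketch of how a finger table follows from the rule above, using plain lists in place of a real distributed protocol (the function name is invented for illustration):

```python
# Sketch of Chord finger-table construction for identifier space 0 .. 2**m - 1.
def finger_table(n, nodes, m):
    """nodes: sorted list of node IDs in the ring; returns m (start, succ) entries."""
    fingers = []
    for i in range(m):
        start = (n + 2 ** i) % 2 ** m
        # successor(start): first node at or after start, wrapping around the ring
        succ = min((x for x in nodes if x >= start), default=min(nodes))
        fingers.append((start, succ))
    return fingers

print(finger_table(1, [1], 3))   # [(2, 1), (3, 1), (5, 1)] -- a lone node points to itself
```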
26. Chord Example
- Assume an identifier space 0..8 (m = 3, IDs 0-7)
- Node n1(1) joins -> all entries in its finger table are initialized to itself
[Figure: ring 0-7; n1's successor table: i=0: id=2, succ=1; i=1: id=3, succ=1; i=2: id=5, succ=1]
27. Chord Example
- Node n2(2) joins
[Figure: ring 0-7; n1's table becomes i=0: id=2, succ=2; i=1: id=3, succ=1; i=2: id=5, succ=1; n2's table: i=0: id=3, succ=1; i=1: id=4, succ=1; i=2: id=6, succ=1]
28. Chord Example
- Nodes n3(0) and n4(6) join
[Figure: ring 0-7 with all four successor tables: n3(0): i=0: id=1, succ=1; i=1: id=2, succ=2; i=2: id=4, succ=6 / n1(1): i=0: id=2, succ=2; i=1: id=3, succ=6; i=2: id=5, succ=6 / n2(2): i=0: id=3, succ=6; i=1: id=4, succ=6; i=2: id=6, succ=6 / n4(6): i=0: id=7, succ=0; i=1: id=0, succ=0; i=2: id=2, succ=2]
29. Chord Example
- Nodes: n1(1), n2(2), n3(0), n4(6)
- Items: f1(7), f2(2)
[Figure: ring 0-7 with the four successor tables; item f1(7) is stored at n3(0), the successor of id 7, and item f2(2) at n2(2)]
30. Query
- Upon receiving a query for item id, a node
- checks whether it stores the item locally
- if not, forwards the query to the largest node in its successor table that does not exceed id (a lookup sketch follows the figure)
[Figure: n1(1) issues query(7); following the successor tables, the query is forwarded to n4(6) and answered by n3(0), which stores item f1(7)]
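Putting the forwarding rule together with the four-node example, a self-contained lookup sketch (the tables hold just the successor column; the helper names are invented):

```python
# Sketch of Chord-style lookup over the four-node example ring.
M = 3                                   # identifier space 0 .. 2**M - 1

def between(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    return x != a and (x - a) % 2 ** M <= (b - a) % 2 ** M

def lookup(node, item_id, fingers):
    """fingers: node id -> its successor-table entries [succ_0, succ_1, succ_2]."""
    while True:
        if item_id == node:
            return node                 # this node is the item's home
        succ = fingers[node][0]         # entry 0 is the immediate successor
        if between(item_id, node, succ):
            return succ                 # the item's successor stores it
        # otherwise jump to the farthest finger that does not overshoot item_id
        preceding = [f for f in fingers[node] if between(f, node, item_id)]
        node = max(preceding, key=lambda f: (f - node) % 2 ** M) if preceding else succ

# Nodes 0, 1, 2, 6 with the successor tables from the figure; n1 looks up item 7
tables = {0: [1, 2, 6], 1: [2, 6, 6], 2: [6, 6, 6], 6: [0, 0, 2]}
print(lookup(1, 7, tables))             # -> 0, the successor of id 7
```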
31. Discussion
- Queries can be implemented
- iteratively
- recursively
- Performance: routing in the overlay network can be more expensive than in the underlying network
- Because there is usually no correlation between node IDs and their locality, a query can repeatedly jump from Europe to North America, even though both the initiator and the node that stores the item are in Europe!
- Solutions: Tapestry takes care of this implicitly; CAN and Chord maintain multiple copies for each entry in their routing tables and choose the closest in terms of network distance
32. Discussion
- Robustness
- Maintain multiple copies associated with each entry in the routing tables
- Replicate an item on nodes with close IDs in the identifier space
- Security
- Can be built on top of CAN, Chord, Tapestry, and Pastry
33. Discussion
- The key challenge in building wide-area P2P systems is a scalable and robust location service
- Napster: centralized solution
- Guarantees correctness and supports approximate matching
- but is neither scalable nor robust
- Gnutella, KaZaa
- Support approximate queries; scalable and robust
- but don't guarantee correctness (i.e., they may fail to locate an existing file)
- Distributed Hash Tables
- Guarantee correctness; highly scalable and robust
- but approximate matching is difficult to implement