CS 162: P2P Networks

Transcript and Presenter's Notes
1
CS 162 P2P Networks
  • Computer Science Division
  • Department of Electrical Engineering and Computer
    Sciences
  • University of California, Berkeley
  • Berkeley, CA 94720-1776

2
Main Challenge
  • Find where a particular file is stored
  • Note: the problem is similar to finding a particular
    page in web caching (see last lecture; what are
    the differences?)

(Figure: machines storing files A–F; a query asks which machine stores file E.)
3
Other Challenges
  • Scale up to hundreds of thousands or millions of
    machines
  • Dynamicity: machines can come and go at any time

4
Napster
  • Assume a centralized index system that maps files
    (songs) to machines that are alive
  • How to find a file (song):
  • Query the index system → returns a machine that
    stores the required file (see the sketch below)
  • Ideally this is the closest/least-loaded machine
  • ftp the file
  • Advantages:
  • Simplicity; easy to implement sophisticated
    search engines on top of the index system
  • Disadvantages:
  • Robustness, scalability (?)
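
A minimal Python sketch of such a centralized index (class and method names are hypothetical, not Napster's actual protocol):

  # Hypothetical sketch of a Napster-style central index.
  class CentralIndex:
      def __init__(self):
          self.locations = {}   # file name -> set of machines storing it
          self.alive = set()    # machines currently connected

      def register(self, machine, files):
          self.alive.add(machine)
          for f in files:
              self.locations.setdefault(f, set()).add(machine)

      def unregister(self, machine):
          self.alive.discard(machine)

      def lookup(self, file_name):
          # Ideally pick the closest / least-loaded machine; here, any live one.
          candidates = self.locations.get(file_name, set()) & self.alive
          return next(iter(candidates), None)

  index = CentralIndex()
  index.register("m5", ["E"])
  print(index.lookup("E"))   # -> 'm5'; the client then fetches E directly from m5

Only the lookup goes through the central server; the file transfer itself happens directly between peers.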

5
Napster Example
(Figure: machines m1–m6 store files A–F; the central index maps A→m1, B→m2, C→m3, D→m4, E→m5, F→m6.)
6
Gnutella
  • Distribute the file location
  • Idea: broadcast the request
  • How to find a file:
  • Send the request to all neighbors
  • Neighbors recursively multicast the request
  • Eventually a machine that has the file receives
    the request, and it sends back the answer
  • Advantages:
  • Totally decentralized, highly robust
  • Disadvantages:
  • Not scalable: the entire network can be swamped
    with requests (to alleviate this problem, each
    request has a TTL; see the sketch below)
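
A toy sketch of TTL-limited flooding (structure and names are illustrative, not the actual Gnutella message format):

  # Toy sketch of Gnutella-style flooding with a TTL (not the real protocol).
  class Node:
      def __init__(self, files):
          self.files = set(files)
          self.neighbors = []

  def query(node, file_name, ttl, seen=None):
      seen = set() if seen is None else seen
      if ttl < 0 or node in seen:
          return None
      seen.add(node)
      if file_name in node.files:
          return node                      # the answer travels back along the query path
      for neighbor in node.neighbors:
          hit = query(neighbor, file_name, ttl - 1, seen)
          if hit is not None:
              return hit
      return None

The TTL bounds how far a request spreads, which is also why a file stored more than TTL hops away may never be found.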

7
Gnutella Example
  • Assume m1's neighbors are m2 and m3; m3's
    neighbors are m4 and m5

(Figure: the query floods from m1 to m2 and m3, then from m3 to m4 and m5; machines m1–m6 store files A–F as before.)
8
Two-Level Hierarchy
  • Current Gnutella implementation, KaZaa
  • Leaf nodes are connected to a small number of
    ultrapeers (supernodes)
  • Query:
  • A leaf sends the query to its ultrapeers
  • If the ultrapeers don't know the answer, they flood
    the query to other ultrapeers (see the sketch after the figure)
  • More scalable:
  • Flooding only among ultrapeers

(Figure: Oct 2003 crawl of Gnutella, showing ultrapeer nodes and leaf nodes.)
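
A rough sketch of the two-level query path (attribute names such as ultrapeers, leaves, and ultrapeer_neighbors are made up for illustration):

  # Sketch: leaves ask their ultrapeers; flooding happens only among ultrapeers.
  def leaf_query(leaf, file_name, ttl=4):
      for up in leaf.ultrapeers:
          hit = ultrapeer_query(up, file_name, ttl)
          if hit is not None:
              return hit
      return None

  def ultrapeer_query(up, file_name, ttl, seen=None):
      seen = set() if seen is None else seen
      if ttl < 0 or up in seen:
          return None
      seen.add(up)
      for leaf in up.leaves:               # an ultrapeer indexes its leaves' files
          if file_name in leaf.files:
              return leaf
      for peer in up.ultrapeer_neighbors:
          hit = ultrapeer_query(peer, file_name, ttl - 1, seen)
          if hit is not None:
              return hit
      return None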
9
Skype
  • Peer-to-peer Internet telephony
  • Two-level hierarchy like KaZaa
  • Ultrapeers are used mainly to route traffic between
    NATed end-hosts (see next slide)
  • plus a login server to:
  • authenticate users
  • ensure that names are unique across the network

(Figure: hosts A and B; control messages are exchanged with the login server, while data traffic flows between A and B. Note: this is the probable protocol; the Skype protocol is not published.)
10
Detour: NAT (1/3)
  • Network Address Translation
  • Motivation: the address scarcity problem in IPv4
  • Allows addresses to be allocated independently to
    hosts behind a NAT
  • Two hosts behind two different NATs can have the
    same address

(Figure: two NAT boxes, with public addresses 64.36.12.64 and 169.32.41.10, connect private hosts to the Internet; a host numbered 192.168.0.1 sits behind each NAT (the same address), alongside a host 192.168.0.2; 128.2.12.30 is a host on the public Internet.)
11
Detour: NAT (2/3)
  • Main idea: use port numbers to multiplex/demultiplex
    the connections of NATed end-hosts
  • Map (IPaddr, Port) of a NATed host to (IPaddrNAT,
    PortNAT) (see the sketch after the figure)

(Figure: host 192.168.0.1 behind a NAT box with public address 64.36.12.64, communicating across the Internet with 169.32.41.10.)
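
A minimal sketch of the translation table, assuming the NAT assigns a fresh public port to each (private IP, private port) pair:

  # Sketch of a NAT translation table (illustrative only).
  class NAT:
      def __init__(self, public_ip):
          self.public_ip = public_ip
          self.next_port = 10000
          self.out = {}    # (private IP, private port) -> public port
          self.back = {}   # public port -> (private IP, private port)

      def outgoing(self, priv_ip, priv_port):
          key = (priv_ip, priv_port)
          if key not in self.out:
              self.out[key] = self.next_port
              self.back[self.next_port] = key
              self.next_port += 1
          return self.public_ip, self.out[key]

      def incoming(self, public_port):
          # Succeeds only if an outgoing packet created the mapping first,
          # which is why outside hosts cannot initiate connections (next slide).
          return self.back.get(public_port)

  nat = NAT("64.36.12.64")
  print(nat.outgoing("192.168.0.1", 5000))   # ('64.36.12.64', 10000)
  print(nat.incoming(10000))                 # ('192.168.0.1', 5000)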
12
Detour: NAT (3/3)
  • Limitations:
  • (1) The number of machines behind a NAT is < 64,000. Why?
  • (2) A host outside the NAT cannot initiate a connection to
    a host behind the NAT
  • Skype and other P2P systems use:
  • Login servers and ultrapeers to solve limitation
    (2)
  • How? (Hint: ultrapeers have globally unique
    (Internet-routable) IP addresses)

13
BitTorrent (1/2)
  • Allows fast downloads even when sources have low
    connectivity
  • How does it work?
  • Split each file into pieces (~256 KB each), and
    each piece into sub-pieces (~16 KB each)
  • The downloader loads one piece at a time
  • Within one piece, the downloader can load up to five
    sub-pieces in parallel

14
BitTorrent (2/2)
  • Download consists of three phases (see the sketch below):
  • Start: get a piece as soon as possible
  • Select a random piece
  • Middle: spread all pieces as soon as possible
  • Select the rarest piece next
  • End: avoid getting stuck with a slow source when
    downloading the last sub-pieces
  • Request the same sub-piece in parallel from several sources
  • Cancel the slowest downloads once a sub-piece has
    been received

(For details, see http://bittorrent.com/bittorrentecon.pdf)
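
A rough sketch of the three selection policies (simplified; not BitTorrent's actual implementation):

  # Simplified piece selection for the three download phases.
  import random

  def pick_piece(have, availability, phase):
      # have: set of piece indices already downloaded
      # availability: piece index -> number of peers that hold it
      missing = [p for p in availability if p not in have]
      if not missing:
          return None
      if phase == "start":
          return random.choice(missing)               # get some piece quickly
      if phase == "middle":
          return min(missing, key=availability.get)   # rarest piece first
      # "end": the remaining sub-pieces are requested from several peers in
      # parallel, and the slower duplicate requests are cancelled.
      return missing[0]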
15
Distributed Hash Tables
  • Problem:
  • Given an ID, map it to a host
  • Challenges:
  • Scalability: hundreds of thousands or millions of
    machines
  • Instability:
  • Changes in routes, congestion, availability of
    machines
  • Heterogeneity:
  • Latency: 1 ms to 1000 ms
  • Bandwidth: 32 Kb/s to 100 Mb/s
  • Nodes stay in the system from 10 seconds to a year
  • Trust:
  • Selfish users
  • Malicious users

16
Content Addressable Network (CAN)
  • Associate to each node and item a unique ID in a
    d-dimensional space
  • Properties:
  • Routing table size: O(d)
  • Guarantees that a file is found in at most d·n^(1/d)
    steps, where n is the total number of nodes
    (see the sketch below for the ID-to-point mapping)
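
For instance, an item's ID can be obtained by hashing its key into a point of the d-dimensional space; a toy 2-D version (the hash function and space size are arbitrary choices here, not CAN's specification):

  # Toy mapping of a key to a point in a d-dimensional space of side `size`.
  import hashlib

  def key_to_point(key, d=2, size=8):
      digest = hashlib.sha1(key.encode()).digest()
      return tuple(digest[i] % size for i in range(d))

  print(key_to_point("f4"))   # some point in [0, 8) x [0, 8)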

17
CAN Example: Two-Dimensional Space
  • Space is divided between the nodes
  • Together, all nodes cover the entire space
  • Each node covers either a square or a rectangular
    area with side ratio 1:2 or 2:1
  • Example:
  • Assume the space is of size 8 × 8
  • Node n1:(1, 2) is the first node to join → it covers
    the entire space (see the join sketch after the figure)

(Figure: 8 × 8 coordinate space, entirely owned by n1.)
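
A rough sketch of the join operation used in this example: the new node picks a point, and the node that owns that point splits its zone in half. The split rule below (longer side first) is a simplification, not CAN's exact procedure.

  # Sketch of a 2-D CAN join: split the owning node's zone along its longer side.
  import random

  def join(zones, new_node, size=8):
      # zones: node -> (x0, y0, x1, y1), half-open rectangles covering the space
      px, py = random.uniform(0, size), random.uniform(0, size)
      owner, (x0, y0, x1, y1) = next(
          (n, z) for n, z in zones.items()
          if z[0] <= px < z[2] and z[1] <= py < z[3])
      if (x1 - x0) >= (y1 - y0):                  # split vertically
          mid = (x0 + x1) / 2
          zones[owner] = (x0, y0, mid, y1)
          zones[new_node] = (mid, y0, x1, y1)
      else:                                       # split horizontally
          mid = (y0 + y1) / 2
          zones[owner] = (x0, y0, x1, mid)
          zones[new_node] = (x0, mid, x1, y1)

  zones = {"n1": (0, 0, 8, 8)}
  join(zones, "n2")       # n1's zone is split between n1 and n2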
18
CAN Example: Two-Dimensional Space
  • Node n2:(4, 2) joins → the space is divided between
    n1 and n2

(Figure: the space is split; n1 owns the left half and n2 the right half.)
19
CAN Example: Two-Dimensional Space
  • Node n3:(3, 5) joins → the space is divided between
    n1 and n3

(Figure: n1's zone is split; n1 keeps the lower-left portion and n3 takes the upper-left portion.)
20
CAN Example: Two-Dimensional Space
  • Nodes n4:(5, 5) and n5:(6, 6) join

(Figure: n4 and n5 subdivide the upper-right region of the space.)
21
CAN Example: Two-Dimensional Space
  • Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
  • Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)

(Figure: nodes n1–n5 and items f1–f4 placed at their coordinates in the space.)
22
CAN Example: Two-Dimensional Space
  • Each item is stored by the node that owns the zone
    of the space the item maps to

(Figure: same layout; each item lies inside the zone of the node that stores it.)
23
CAN Query Example
  • Each node knows its neighbors in the d-dimensional space
  • Forward the query to the neighbor that is closest to
    the query ID
  • Example: assume n1 queries f4 (see the sketch after the figure)

(Figure: n1's query for f4 is forwarded greedily through neighboring zones until it reaches the node whose zone contains f4.)
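
A toy sketch of the greedy forwarding rule (real CAN routes using zone geometry; here, distance to zone centres stands in for that):

  # Toy greedy routing: move to whichever neighbor is closest to the target point.
  def route(start, point, centre, neighbors):
      # centre(n): centre of n's zone; neighbors(n): list of n's neighbors
      def dist(n):
          cx, cy = centre(n)
          return (cx - point[0]) ** 2 + (cy - point[1]) ** 2
      current = start
      while True:
          best = min(neighbors(current), key=dist, default=current)
          if dist(best) >= dist(current):
              return current      # no neighbor is closer: this node's zone holds the point
          current = best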
24
Chord
  • Associate to each node and item a unique ID in a
    one-dimensional space
  • Properties:
  • Routing table size: O(log N), where N is the
    total number of nodes
  • Guarantees that a file is found in O(log N) steps

25
Data Structure
  • Assume the identifier space is 0..2^m
  • Each node maintains:
  • A finger table
  • Entry i in the finger table of node n is the first
    node that succeeds or equals n + 2^i
  • Its predecessor node
  • An item identified by id is stored on the
    successor node of id (see the sketch below)
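
A small sketch of these definitions (identifiers are plain integers; `nodes` is a sorted list of node IDs):

  # Sketch: successor function and finger table in an identifier space 0..2**m - 1.
  def successor(ident, nodes, m):
      ident %= 2 ** m
      after = [x for x in nodes if x >= ident]
      return after[0] if after else nodes[0]       # wrap around the ring

  def finger_table(n, nodes, m):
      return [successor(n + 2 ** i, nodes, m) for i in range(m)]

  # Matches the next slide: m = 3 and node 1 alone in the ring.
  print(finger_table(1, [1], 3))   # [1, 1, 1] -- every entry points to node 1 itself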

26
Chord Example
  • Assume an identifier space 0..8
  • Node n1:(1) joins → all entries in its finger table
    are initialized to itself

(Figure: Chord ring with identifiers 0–7; n1's finger table: i=0: 1+2^0=2 → succ 1; i=1: 1+2^1=3 → succ 1; i=2: 1+2^2=5 → succ 1.)
27
Chord Example
  • Node n2:(3) joins

(Figure: ring with n1 and n2; each node's finger table is updated, with entries now pointing to the other node where appropriate.)
28
Chord Example
  • Nodes n3(0), n4(6) join

(Figure: ring with all four nodes; each node's finger table is shown next to it.)
29
Chord Examples
  • Nodes: n1(1), n2(3), n3(0), n4(6)
  • Items: f1(7), f2(2)

(Figure: ring with all four nodes and their finger tables; item 7 is stored at node 0 and item 2 at the successor of 2.)
30
Query
  • Upon receiving a query for item id, a node:
  • Checks whether it stores the item locally
  • If not, forwards the query to the largest node in
    its successor table that does not exceed id
    (see the sketch after the figure)

(Figure: the same ring; an example query(7) is forwarded along finger-table entries until it reaches node 0, which stores item 7.)
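
A toy sketch of the forwarding rule above (it ignores the modular wrap-around that a real Chord lookup handles):

  # Toy lookup: answer locally, else forward to the largest finger <= item id.
  def lookup(node, item_id, store, fingers, max_hops=32):
      # store[n]: set of item ids held by n; fingers[n]: n's finger table
      for _ in range(max_hops):
          if item_id in store[node]:
              return node
          below = [f for f in fingers[node] if f <= item_id]
          node = max(below) if below else max(fingers[node])
      return None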
31
Discussion
  • The query can be implemented:
  • Iteratively
  • Recursively
  • Performance: routing in the overlay network can
    be more expensive than in the underlying network
  • Because there is usually no correlation between
    node IDs and their locality, a query can
    repeatedly jump from Europe to North America,
    even though both the initiator and the node that stores
    the item are in Europe!
  • Solutions: Tapestry takes care of this
    implicitly; CAN and Chord maintain multiple
    copies of each entry in their routing tables and
    choose the closest in terms of network distance

32
Discussion
  • Robustness:
  • Maintain multiple copies associated with each entry
    in the routing tables
  • Replicate an item on nodes with close IDs in the
    identifier space
  • Security:
  • Can be built on top of CAN, Chord, Tapestry, and
    Pastry

33
Discussion
  • The key challenge of building wide-area P2P
    systems is a scalable and robust location service
  • Napster: centralized solution
  • Guarantees correctness and supports approximate
    matching
  • but is neither scalable nor robust
  • Gnutella, KaZaa:
  • Support approximate queries; scalable and
    robust
  • but don't guarantee correctness (i.e., they may
    fail to locate an existing file)
  • Distributed Hash Tables:
  • Guarantee correctness; highly scalable and
    robust
  • but difficult to implement approximate matching