1
Evolution of P2P Content Distribution
  • Pei Cao

2
Outline
  • History of P2P Content Distribution Architectures
  • Techniques to Improve Gnutella
  • Brief Overview of DHT
  • Techniques to Improve BitTorrent

3
History of P2P
  • Napster
  • Gnutella
  • KaZaa
  • Distributed Hash Tables
  • BitTorrent

4
Napster
  • Centralized directory
  • A central website holds a directory of the contents
    of all peers
  • Queries performed at the central directory
  • File transfer occurs between peers
  • Support arbitrary queries
  • Con: single point of failure

5
Gnutella
  • Decentralized, homogeneous peers
  • No central directory
  • Queries performed in a distributed fashion on
    peers via flooding
  • Support arbitrary queries
  • Very resilient against failure
  • Problem: doesn't scale

6
FastTrack/KaZaa
  • Distributed Two-Tier architecture
  • Supernodes keep content directory for regular
    nodes
  • Regular nodes do not participate in Query
    processing
  • Queries performed by Supernodes only
  • Support arbitrary queries
  • Con: supernode stability affects system
    performance

7
Distributed Hash Tables
  • Structured Distributed System
  • Structured: all nodes participate in a precise
    scheme to maintain certain invariants
  • Provide a directory service
  • Directory service
  • Routing
  • Extra work when nodes join and leave
  • Support key-based lookups only

8
BitTorrent
  • Distribution of very large files
  • Tracker connects peers to each other
  • Peers exchange file blocks with each other
  • Use tit-for-tat to discourage freeloading

9
Improving Gnutella

10
Gnutella-Style Systems
  • Advantages of Gnutella
  • Support more flexible queries
  • Typically, precise name search is a small
    portion of all queries
  • Simplicity
  • High resilience against node failures
  • Problem of Gnutella: scalability
  • Flooding: # of messages is O(N·E)

11
Flooding-Based Searches
[Figure: a query floods hop-by-hop across nodes 1-8; the same query reaches some nodes along multiple paths]
  • Duplication increases as TTL increases in
    flooding
  • Worst case: a node A is interrupted by
    N × q × degree(A) messages (sketched below)
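A minimal sketch (in Python, not from the slides) of TTL-limited flooding that also counts duplicate deliveries; `graph`, `has_object`, and `ttl` are illustrative names.

```python
from collections import deque

def flood_search(graph, start, has_object, ttl):
    """BFS-style flood over an adjacency-list dict; returns (found, messages_sent)."""
    messages = 0
    seen = {start}
    frontier = deque([(start, ttl)])
    while frontier:
        node, t = frontier.popleft()
        if has_object(node):
            return True, messages
        if t == 0:
            continue
        for neighbor in graph[node]:
            messages += 1             # every forwarded copy is a message,
            if neighbor not in seen:  # even if the neighbor has already seen it
                seen.add(neighbor)
                frontier.append((neighbor, t - 1))
    return False, messages
```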

12
Load on Individual Nodes
  • Why is a node interrupted?
  • To process a query
  • To route the query to other nodes
  • To process duplicated queries sent to it

13
Communication Complexity
  • Communication complexity determined by
  • Network topology
  • Distribution of object popularity
  • Distribution of replication density of objects

14
Network Topologies
  • Uniform Random Graph (Random)
  • Average and median node degree is 4
  • Power-Law Random Graph (PLRG)
  • max node degree 1746, median 1, average 4.46
  • Gnutella network snapshot (Gnutella)
  • Oct 2000 snapshot
  • max degree 136, median 2, average 5.5
  • Two-dimensional grid (Grid)

15
Modeling Methods
  • Object popularity distribution p_i
  • Uniform
  • Zipf-like
  • Object replication density distribution r_i
  • Uniform
  • Proportional: r_i ∝ p_i
  • Square-Root: r_i ∝ √p_i

16
Evaluation Metrics
  • Overhead: average # of messages per node per
    query
  • Probability of search success: Pr(success)
  • Delay: # of hops until success

17
Duplications in Various Network Topologies
18
Relationship between TTL and Search Successes
19
Problems with Simple TTL-Based Flooding
  • Hard to choose TTL
  • For objects that are widely present in the
    network, small TTLs suffice
  • For objects that are rare in the network, large
    TTLs are necessary
  • Number of query messages grow exponentially as
    TTL grows

20
Idea 1 Adaptively Adjust TTL
  • Expanding Ring
  • Multiple floods: start with TTL = 1, increment TTL
    by 2 each time until the search succeeds
  • Success varies by network topology
  • For Random, 30- to 70-fold reduction in
    message traffic
  • For Power-law and Gnutella graphs, only
    3- to 9-fold reduction
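A sketch of the expanding-ring idea described above; the `flood` argument stands in for any TTL-limited search, such as the flood_search sketch earlier, and `max_ttl` is an illustrative cutoff.

```python
def expanding_ring_search(graph, start, has_object, flood, max_ttl=9):
    """Repeat floods with growing TTL (1, 3, 5, ...) until the object is found."""
    total_messages, ttl = 0, 1
    while ttl <= max_ttl:
        found, msgs = flood(graph, start, has_object, ttl)
        total_messages += msgs
        if found:
            return True, total_messages
        ttl += 2
    return False, total_messages
```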

21
Limitations of Expanding Ring
22
Idea 2 Random Walk
  • Simple random walk
  • takes too long to find anything!
  • Multiple-walker random walk
  • N agents each walking T steps visit as
    many nodes as 1 agent walking N·T steps
  • When to terminate the search: check back with the
    query originator once every C steps
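A sketch of the multiple-walker random walk with periodic check-back; the walker count, step budget, and period C are illustrative parameters, not values from the slides.

```python
import random

def k_walker_search(graph, start, has_object, walkers=16, max_steps=1024, check_every=4):
    found = False                 # shared flag stands in for "check back with the originator"
    messages = 0
    positions = [start] * walkers
    for step in range(1, max_steps + 1):
        for i, node in enumerate(positions):
            if not graph[node]:
                continue
            nxt = random.choice(graph[node])
            messages += 1
            if has_object(nxt):
                found = True
            positions[i] = nxt
        if step % check_every == 0 and found:   # walkers poll the originator every C steps
            return True, messages
    return found, messages
```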

23
Search Traffic Comparison
24
Search Delay Comparison
25
Flexible Replication
  • In unstructured systems, search success is
    essentially about coverage: visiting enough nodes
    to probabilistically find the object, so
    replication density matters
  • Limited node storage: what's the optimal
    replication density distribution?
  • In Gnutella, only nodes that query an object store
    it, so r_i ∝ p_i
  • What if we have different replication strategies?

26
Optimal r_i Distribution
  • Goal: minimize Σ_i (p_i / r_i), subject to Σ_i r_i = R
  • Calculation
  • introduce Lagrange multiplier λ, find r_i and λ
    that minimize
  • Σ_i (p_i / r_i) + λ (Σ_i r_i − R)
  • λ − p_i / r_i² = 0 for all i
  • r_i ∝ √p_i
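The same derivation, restated as a worked equation in standard Lagrangian notation (a sketch of the step the bullets compress):

```latex
% Optimal replication density under a total-storage budget R
\begin{aligned}
&\text{minimize } \sum_i \frac{p_i}{r_i}
 \quad\text{subject to}\quad \sum_i r_i = R\\[2pt]
&L(r,\lambda) = \sum_i \frac{p_i}{r_i} + \lambda\Big(\sum_i r_i - R\Big)\\[2pt]
&\frac{\partial L}{\partial r_i} = -\frac{p_i}{r_i^{2}} + \lambda = 0
 \;\Longrightarrow\; r_i = \sqrt{p_i/\lambda} \;\propto\; \sqrt{p_i}
\end{aligned}
```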

27
Square-Root Distribution
  • General principle: to minimize Σ_i (p_i / r_i) under
    the constraint Σ_i r_i = R, make r_i proportional to
    the square root of p_i
  • Other application examples
  • Bandwidth allocation to minimize expected
    download times
  • Server load balancing to minimize expected
    request latency

28
Achieving Square-Root Distribution
  • Suggestions from some heuristics
  • Store an object at a number of nodes that is
    proportional to the number of nodes visited in
    order to find the object
  • Each node uses random replacement
  • Two implementations
  • Path replication: store the object along the path
    of a successful walk
  • Random replication: store the object randomly
    among the nodes visited by the agents
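A sketch of the two replication strategies just listed; `walk_path`, `visited_nodes`, and `store` are hypothetical helpers, not code from the study.

```python
import random

def path_replication(walk_path, obj, store):
    """Store the object on every node along the successful walk."""
    for node in walk_path:
        store(node, obj)

def random_replication(visited_nodes, num_copies, obj, store):
    """Store the same number of copies on nodes drawn uniformly at random
    from all nodes the walkers visited."""
    for node in random.sample(visited_nodes, min(num_copies, len(visited_nodes))):
        store(node, obj)
```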

29
Evaluation of Replication Methods
  • Metrics
  • Overall message traffic
  • Search delay
  • Dynamic simulation
  • Assume Zipf-like object query probability
  • 5 queries/sec Poisson arrival
  • Results are measured during 5000 sec - 9000 sec

30
Distribution of r_i
31
Total Search Message Comparison
  • Observation: path replication is slightly
    inferior to random replication

32
Search Delay Comparison
33
Summary
  • Multi-walker random walk scales much better than
    flooding
  • It won't scale as perfectly as a structured
    network, but current unstructured networks can be
    improved significantly
  • Square-root replication distribution is desirable
    and can be achieved via path replication

34
KaZaa
  • Use Supernodes
  • Regular Nodes : Supernodes = 100 : 1
  • Simple way to scale the system by a factor of 100

35
DHTs: A Brief Overview (Slides by Brad Karp)

36
What Is a DHT?
  • Single-node hash table
  • key = Hash(name)
  • put(key, value)
  • get(key) → value
  • How do I do this across millions of hosts on the
    Internet?
  • Distributed Hash Table

37
Distributed Hash Tables
  • Chord
  • CAN
  • Pastry
  • Tapestry
  • etc. etc.

38
The Problem
[Figure: a publisher issues Put(key=title, value=file data) and a client issues Get(key=title) to nodes N1-N6 connected over the Internet]
  • Key Placement
  • Routing to find key

39
Key Placement
  • Traditional hashing
  • Nodes numbered from 1 to N
  • Key is placed at node (hash(key) mod N)
  • Why traditional hashing has problems: when N changes
    (a node joins or leaves), nearly every key maps to a
    different node (see the sketch below)
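A small illustration (hypothetical, not from the slides) of why mod-N placement breaks when membership changes:

```python
import hashlib

def node_for(key, n_nodes):
    """Traditional placement: node index = hash(key) mod N."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % n_nodes

keys = [f"file-{i}" for i in range(10_000)]
moved = sum(node_for(k, 100) != node_for(k, 101) for k in keys)
print(f"{moved / len(keys):.0%} of keys move when N goes from 100 to 101")
# Typically ~99% of keys move; consistent hashing moves only ~1/N of them.
```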

40
Consistent Hashing: IDs
  • Key identifier = SHA-1(key)
  • Node identifier = SHA-1(IP address)
  • SHA-1 distributes both uniformly
  • How to map key IDs to node IDs?

41
Consistent Hashing: Placement
A key is stored at its successor: the node with the next
higher ID
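A minimal sketch of successor placement on the circular ID space, assuming a sorted list of SHA-1 node IDs; the bisect-based ring is illustrative, not from the slides.

```python
import bisect
import hashlib

def sha1_id(s):
    return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big")

def successor(node_ips, key):
    """Return the ID of the node that stores `key`: the first node ID
    clockwise from (i.e. >= ) the key ID, wrapping around the circle."""
    ring = sorted(sha1_id(ip) for ip in node_ips)
    k = sha1_id(key)
    i = bisect.bisect_left(ring, k)
    return ring[i % len(ring)]

nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # illustrative node addresses
print(hex(successor(nodes, "some-file.mp3")))
```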
42
Basic Lookup
43
Finger Table Allows log(N)-time Lookups
[Figure: node N80's fingers span ½, ¼, 1/8, ..., 1/128 of the ID space]
44
Finger i Points to Successor of n+2^i
[Figure: node N80's fingers again span ½ ... 1/128 of the ID space; the finger for 80 + 2^5 = 112 points to N120, the successor of ID 112]
45
Lookups Take O(log(N)) Hops
[Figure: Lookup(K19) issued at N32 is routed via finger pointers among nodes N5, N10, N20, N60, N80, N99, N110 and resolves to N20, the successor of K19]
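A sketch of finger-table construction and finger-based routing on a small 7-bit ring; the helper names and example node IDs are illustrative, and a real Chord node would perform these steps as messages to remote nodes rather than local calls.

```python
M = 7                                    # 7-bit circular ID space, as in the slides

def in_interval(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def build_fingers(node, ring_nodes):
    """fingers[i] = successor(node + 2**i)."""
    ring = sorted(ring_nodes)
    def succ(x):
        x %= 2 ** M
        for n in ring:
            if n >= x:
                return n
        return ring[0]
    return [succ(node + 2 ** i) for i in range(M)]

def lookup(key, node, fingers_of):
    """Forward to the closest preceding finger until the key falls between
    the current node and its successor; O(log N) hops in expectation."""
    hops = 0
    while True:
        fingers = fingers_of[node]
        succ = fingers[0]                     # finger 0 is the immediate successor
        if in_interval(key, node, succ):
            return succ, hops
        nxt = node
        for f in reversed(fingers):           # closest finger preceding the key
            if in_interval(f, node, key):
                nxt = f
                break
        if nxt == node:                       # no progress possible in this sketch
            return succ, hops
        node = nxt
        hops += 1

nodes = [5, 10, 20, 32, 60, 80, 99, 110]      # node IDs echoing the slide's example
fingers_of = {n: build_fingers(n, nodes) for n in nodes}
print(lookup(19, 32, fingers_of))             # resolves K19 to N20
```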
46
Joining: Linked-List Insert
1. Lookup(36)
[Figure: new node N36 joins between N25 and N40; N40 currently holds K30 and K38]
47
Join (2)
2. N36 sets its own successor pointer (to N40)
[Figure: N25, N36 → N40; N40 holds K30 and K38]
48
Join (3)
3. Copy keys 26..36 from N40 to N36
[Figure: K30 is copied to N36; N40 still holds K30 and K38]
49
Join (4)
4. Set N25's successor pointer (to N36)
[Figure: N25 → N36 → N40; N36 now holds a copy of K30]
Predecessor pointers allow linking in the new host.
Finger pointers are updated in the background.
Correct successors produce correct lookups.
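The four join steps above, sketched as a local sorted-ring insert; this is a simplification in which the ring is a plain dict, whereas real Chord performs step 1 as a network lookup and repairs pointers lazily.

```python
def in_range(x, a, b):
    """x in the circular interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def join(ring, new_id):
    ids = sorted(ring)
    succ_id = next((n for n in ids if n >= new_id), ids[0])       # 1. Lookup(new_id)
    pred_id = max((n for n in ids if n < new_id), default=ids[-1])
    ring[new_id] = {"succ": succ_id, "keys": set()}               # 2. set own successor
    moved = {k for k in ring[succ_id]["keys"] if in_range(k, pred_id, new_id)}
    ring[new_id]["keys"] |= moved                                 # 3. copy keys (pred, new]
    ring[succ_id]["keys"] -= moved
    ring[pred_id]["succ"] = new_id                                # 4. predecessor -> new node

# Mirroring the slide's example: N25 and N40 (holding K30, K38), then N36 joins.
ring = {25: {"succ": 40, "keys": set()}, 40: {"succ": 25, "keys": {30, 38}}}
join(ring, 36)
print(ring[36]["keys"], ring[40]["keys"])      # {30} {38}
```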
50
Chord Lookup Algorithm Properties
  • Interface: lookup(key) → IP address
  • Efficient: O(log N) messages per lookup
  • N is the total number of servers
  • Scalable: O(log N) state per node
  • Robust: survives massive failures
  • Simple to analyze

51
Many Many Variations of The Same Theme
  • Different ways to choose the fingers
  • Ways to make it more robust
  • Ways to make it more network efficient
  • etc. etc.

52
Improving BitTorrent

53
BitTorrent File Sharing Network
  • Goal: replicate K chunks of data among N nodes
  • Form neighbor connection graph
  • Neighbors exchange data

54
BitTorrent Neighbor Selection
[Figure: a new peer A downloads file.torrent, contacts the tracker, and is introduced to peers 1-5; the seed holds the whole file]
55
BitTorrent Piece Replication
[Figure: peer A and peers 1-3 exchange pieces with each other; the seed holds the whole file]
56
BitTorrent Piece Replication Algorithms
  • Tit-for-tat (choking/unchoking)
  • Each peer only uploads to 7 other peers at a time
  • 6 of these are chosen based on amount of data
    received from the neighbor in the last 20 seconds
  • The last one is chosen randomly, with a 75% bias
    toward newcomers
  • (Local) Rarest-first replication
  • When peer 3 unchokes peer A, A selects which
    piece to download
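A sketch of the two policies described on this slide; the data structures (`recv_rate`, `neighbor_pieces`, etc.) and parameter defaults are illustrative, not the actual client's internals.

```python
import random
from collections import Counter

def choose_unchoked(peers, recv_rate, newcomers, reciprocal=6):
    """Unchoke the `reciprocal` peers that sent us the most data in the
    last 20 seconds, plus one optimistic slot biased 75% toward newcomers."""
    by_rate = sorted(peers, key=lambda p: recv_rate.get(p, 0), reverse=True)
    unchoked = by_rate[:reciprocal]
    rest = [p for p in peers if p not in unchoked]
    if rest:
        new = [p for p in rest if p in newcomers]
        pool = new if new and random.random() < 0.75 else rest
        unchoked.append(random.choice(pool))
    return unchoked

def rarest_first(my_pieces, neighbor_pieces):
    """Pick a piece we lack that is rarest among our neighbors' holdings."""
    counts = Counter(p for pieces in neighbor_pieces.values() for p in pieces)
    candidates = [p for p in counts if p not in my_pieces]
    return min(candidates, key=lambda p: counts[p]) if candidates else None
```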

57
Performance of BitTorrent
  • Conclusion from modeling studies: BitTorrent is
    nearly optimal in idealized, homogeneous networks
  • Demonstrated by simulation studies
  • Confirmed by theoretical modeling studies
  • Intuition: in a random graph,
  • Prob(Peer A's content is a subset of Peer B's)
    50%

58
Lessons from BitTorrent
  • Often, randomized simple algorithms perform
    better than elaborately designed deterministic
    algorithms

59
Problems of BitTorrent
  • ISPs are unhappy
  • BitTorrent is notoriously difficult to traffic
    engineer
  • ISPs: different links have different monetary
    costs
  • BitTorrent
  • Peers are all equal
  • Choices made based on measured performance
  • No regard for underlying ISP topology or
    preferences

60
BitTorrent and ISPs Play Together?
  • Current state of affairs: a clumsy co-existence
  • ISPs throttle BitTorrent traffic along
    high-cost links
  • Users suffer
  • Can they be partners?
  • ISPs inform BitTorrent of their preferences
  • BitTorrent schedules traffic in ways that benefit
    both users and ISPs

61
Random Neighbor Selection
  • Existing studies all assume random neighbor
    selection
  • BitTorrent is no longer optimal if nodes in the same
    ISP only connect to each other
  • Random neighbor selection → high cross-ISP
    traffic
  • Q: Can we modify the neighbor selection scheme
    without affecting performance?

62
Biased Neighbor Selection
  • Idea: of N neighbors, choose N−k from peers in
    the same ISP, and choose k randomly from peers
    outside the ISP (see the sketch below)

[Figure: peers inside one ISP connect mostly to each other, with k connections to peers outside the ISP]
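A sketch of the selection rule, assuming the selector (e.g. the tracker) knows each peer's ISP; `peers_by_isp` and the parameter defaults are illustrative.

```python
import random

def biased_neighbors(my_isp, peers_by_isp, n=35, k=1):
    """Return up to n neighbors: n-k from our own ISP, k from outside it."""
    internal = list(peers_by_isp.get(my_isp, []))
    external = [p for isp, ps in peers_by_isp.items() if isp != my_isp for p in ps]
    chosen = random.sample(internal, min(n - k, len(internal)))
    chosen += random.sample(external, min(k, len(external)))
    return chosen
```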
63
Implementing Biased Neighbor Selection
  • By Tracker
  • Need ISP affiliations of peers
  • Peer to AS maps
  • Public IP address ranges from ISPs
  • Special X- HTTP header
  • By traffic shaping devices
  • Intercept peer → tracker messages and
    manipulate responses
  • No need to change tracker or client

64
Evaluation Methodology
  • Event-driven simulator
  • Use actual client and tracker code as much as
    possible
  • Calculate bandwidth contention, assume perfect
    fair-share from TCP
  • Network settings
  • 14 ISPs, each with 50 peers, 100Kb/s upload,
    1Mb/s download
  • Seed node, 400Kb/s upload
  • Optional university nodes (1Mb/s upload)
  • Optional ISP bottleneck to other ISPs

65
Limitation of Throttling
66
Throttling Cross-ISP Traffic
Redundancy: average # of times a data chunk
enters the ISP
67
Biased Neighbor Selection Download Times
68
Biased Neighbor Selection Cross-ISP Traffic
69
Importance of Rarest-First Replication
  • Random piece replication performs badly
  • Increases download time by 84% - 150%
  • Increases traffic redundancy from 3 to 14
  • Biased neighbors + Rarest-First → more uniform
    progress of peers

70
Biased Neighbor Selection Single-ISP Deployment
71
Presence of External High-Bandwidth Peers
  • Biased neighbor selection alone
  • Average download time same as regular BitTorrent
  • Cross-ISP traffic increases as the # of university
    peers increases
  • Result of tit-for-tat
  • Biased neighbor selection + Throttling
  • Download time only increases by 12%
  • Most neighbors do not cross the bottleneck
  • Traffic redundancy (i.e. cross-ISP traffic) same
    as the scenario without university peers

72
Comparison with Alternatives
  • Gateway peer: only one peer connects to the peers
    outside the ISP
  • Gateway peer must have high bandwidth
  • It is the seed for this ISP
  • Ends up benefiting peers in other ISPs
  • Caching
  • Can be combined with biased neighbor selection
  • Biased neighbor selection reduces the bandwidth
    needed from the cache by an order of magnitude

73
Summary
  • By choosing neighbors well, BitTorrent can
    achieve high peer performance without increasing
    ISP cost
  • Biased neighbor selection: choose the initial set of
    neighbors well
  • Can be combined with throttling and caching
  • → P2P and ISPs can collaborate!