1
Evolution of P2P Content Distribution
  • Pei Cao

2
Outline
  • History of P2P Content Distribution Architectures
  • Techniques to Improve Gnutella
  • Brief Overview of DHT
  • Techniques to Improve BitTorrent

3
History of P2P
  • Napster
  • Gnutella
  • KaZaa
  • Distributed Hash Tables
  • BitTorrent

4
Napster
  • Centralized directory
  • A central website holds a directory of the contents
    of all peers
  • Queries performed at the central directory
  • File transfer occurs between peers
  • Support arbitrary queries
  • Con: single point of failure

5
Gnutella
  • Decentralized, homogeneous peers
  • No central directory
  • Queries performed in a distributed fashion on
    peers via flooding
  • Support arbitrary queries
  • Very resilient against failure
  • Problem: doesn't scale

6
FastTrack/KaZaa
  • Distributed Two-Tier architecture
  • Supernodes keep content directory for regular
    nodes
  • Regular nodes do not participate in Query
    processing
  • Queries performed by Supernodes only
  • Support arbitrary queries
  • Con: supernode stability affects system
    performance

7
Distributed Hash Tables
  • Structured Distributed System
  • Structured: all nodes participate in a precise
    scheme to maintain certain invariants
  • Provide a directory service
  • Directory service
  • Routing
  • Extra work when nodes join and leave
  • Support key-based lookups only

8
BitTorrent
  • Distribution of very large files
  • Tracker connects peers to each other
  • Peers exchange file blocks with each other
  • Use tit-for-tat to discourage freeloading

9
Improving Gnutella

10
Gnutella-Style Systems
  • Advantages of Gnutella
  • Support more flexible queries
  • Typically, precise name search is a small
    portion of all queries
  • Simplicity
  • High resilience against node failures
  • Problem of Gnutella: scalability
  • Flooding: # of messages is O(N·E)

11
Flooding-Based Searches
[Figure: a query floods hop-by-hop across nodes 1-8; the same query reaches some nodes along multiple paths]
  • Duplication increases as TTL increases in
    flooding
  • Worst case: a node A is interrupted by
    N × q × degree(A) messages (sketched below)
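A minimal sketch (in Python, not from the slides) of TTL-limited flooding that also counts duplicate deliveries; `graph`, `has_object`, and `ttl` are illustrative names.

```python
from collections import deque

def flood_search(graph, start, has_object, ttl):
    """BFS-style flood over an adjacency-list dict; returns (found, messages_sent)."""
    messages = 0
    seen = {start}
    frontier = deque([(start, ttl)])
    while frontier:
        node, t = frontier.popleft()
        if has_object(node):
            return True, messages
        if t == 0:
            continue
        for neighbor in graph[node]:
            messages += 1             # every forwarded copy is a message,
            if neighbor not in seen:  # even if the neighbor has already seen it
                seen.add(neighbor)
                frontier.append((neighbor, t - 1))
    return False, messages
```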

12
Load on Individual Nodes
  • Why is a node interrupted?
  • To process a query
  • To route the query to other nodes
  • To process duplicated queries sent to it

13
Communication Complexity
  • Communication complexity determined by
  • Network topology
  • Distribution of object popularity
  • Distribution of replication density of objects

14
Network Topologies
  • Uniform Random Graph (Random)
  • Average and median node degree is 4
  • Power-Law Random Graph (PLRG)
  • max node degree 1746, median 1, average 4.46
  • Gnutella network snapshot (Gnutella)
  • Oct 2000 snapshot
  • max degree 136, median 2, average 5.5
  • Two-dimensional grid (Grid)

15
Modeling Methods
  • Object popularity distribution p_i
  • Uniform
  • Zipf-like
  • Object replication density distribution r_i
  • Uniform
  • Proportional: r_i ∝ p_i
  • Square-Root: r_i ∝ √p_i

16
Evaluation Metrics
  • Overhead: average # of messages per node per
    query
  • Probability of search success: Pr(success)
  • Delay: # of hops until success

17
Duplications in Various Network Topologies
18
Relationship between TTL and Search Successes
19
Problems with Simple TTL-Based Flooding
  • Hard to choose TTL
  • For objects that are widely present in the
    network, small TTLs suffice
  • For objects that are rare in the network, large
    TTLs are necessary
  • Number of query messages grow exponentially as
    TTL grows

20
Idea 1 Adaptively Adjust TTL
  • Expanding Ring
  • Multiple floods: start with TTL = 1, increment TTL
    by 2 each time until the search succeeds
  • Success varies by network topology
  • For Random, 30- to 70-fold reduction in
    message traffic
  • For Power-law and Gnutella graphs, only
    3- to 9-fold reduction
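A sketch of the expanding-ring idea described above; the `flood` argument stands in for any TTL-limited search, such as the flood_search sketch earlier, and `max_ttl` is an illustrative cutoff.

```python
def expanding_ring_search(graph, start, has_object, flood, max_ttl=9):
    """Repeat floods with growing TTL (1, 3, 5, ...) until the object is found."""
    total_messages, ttl = 0, 1
    while ttl <= max_ttl:
        found, msgs = flood(graph, start, has_object, ttl)
        total_messages += msgs
        if found:
            return True, total_messages
        ttl += 2
    return False, total_messages
```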

21
Limitations of Expanding Ring
22
Idea 2 Random Walk
  • Simple random walk
  • takes too long to find anything!
  • Multiple-walker random walk
  • N agents each walking T steps visit as
    many nodes as 1 agent walking N·T steps
  • When to terminate the search: check back with the
    query originator once every C steps
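A sketch of the multiple-walker random walk with periodic check-back; the walker count, step budget, and period C are illustrative parameters, not values from the slides.

```python
import random

def k_walker_search(graph, start, has_object, walkers=16, max_steps=1024, check_every=4):
    found = False                 # shared flag stands in for "check back with the originator"
    messages = 0
    positions = [start] * walkers
    for step in range(1, max_steps + 1):
        for i, node in enumerate(positions):
            if not graph[node]:
                continue
            nxt = random.choice(graph[node])
            messages += 1
            if has_object(nxt):
                found = True
            positions[i] = nxt
        if step % check_every == 0 and found:   # walkers poll the originator every C steps
            return True, messages
    return found, messages
```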

23
Search Traffic Comparison
24
Search Delay Comparison
25
Flexible Replication
  • In unstructured systems, search success is
    essentially about coverage: visiting enough nodes
    to probabilistically find the object, so
    replication density matters
  • Limited node storage: what's the optimal
    replication density distribution?
  • In Gnutella, only nodes that query an object store
    it, so r_i ∝ p_i
  • What if we have different replication strategies?

26
Optimal r_i Distribution
  • Goal: minimize Σ_i (p_i / r_i), subject to Σ_i r_i = R
  • Calculation
  • introduce Lagrange multiplier λ, find r_i and λ
    that minimize
  • Σ_i (p_i / r_i) + λ (Σ_i r_i − R)
  • λ − p_i / r_i² = 0 for all i
  • r_i ∝ √p_i
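The same derivation, restated as a worked equation in standard Lagrangian notation (a sketch of the step the bullets compress):

```latex
% Optimal replication density under a total-storage budget R
\begin{aligned}
&\text{minimize } \sum_i \frac{p_i}{r_i}
 \quad\text{subject to}\quad \sum_i r_i = R\\[2pt]
&L(r,\lambda) = \sum_i \frac{p_i}{r_i} + \lambda\Big(\sum_i r_i - R\Big)\\[2pt]
&\frac{\partial L}{\partial r_i} = -\frac{p_i}{r_i^{2}} + \lambda = 0
 \;\Longrightarrow\; r_i = \sqrt{p_i/\lambda} \;\propto\; \sqrt{p_i}
\end{aligned}
```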

27
Square-Root Distribution
  • General principle: to minimize Σ_i (p_i / r_i) under
    the constraint Σ_i r_i = R, make r_i proportional to
    the square root of p_i
  • Other application examples
  • Bandwidth allocation to minimize expected
    download times
  • Server load balancing to minimize expected
    request latency

28
Achieving Square-Root Distribution
  • Suggestions from some heuristics
  • Store an object at a number of nodes that is
    proportional to the number of nodes visited in
    order to find the object
  • Each node uses random replacement
  • Two implementations
  • Path replication: store the object along the path
    of a successful walk
  • Random replication: store the object randomly
    among the nodes visited by the agents
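A sketch of the two replication strategies just listed; `walk_path`, `visited_nodes`, and `store` are hypothetical helpers, not code from the study.

```python
import random

def path_replication(walk_path, obj, store):
    """Store the object on every node along the successful walk."""
    for node in walk_path:
        store(node, obj)

def random_replication(visited_nodes, num_copies, obj, store):
    """Store the same number of copies on nodes drawn uniformly at random
    from all nodes the walkers visited."""
    for node in random.sample(visited_nodes, min(num_copies, len(visited_nodes))):
        store(node, obj)
```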

29
Evaluation of Replication Methods
  • Metrics
  • Overall message traffic
  • Search delay
  • Dynamic simulation
  • Assume Zipf-like object query probability
  • 5 queries/sec Poisson arrival
  • Results are measured during 5000 sec - 9000 sec

30
Distribution of r_i
31
Total Search Message Comparison
  • Observation: path replication is slightly
    inferior to random replication

32
Search Delay Comparison
33
Summary
  • Multi-walker random walk scales much better than
    flooding
  • It won't scale as perfectly as a structured
    network, but current unstructured networks can be
    improved significantly
  • Square-root replication distribution is desirable
    and can be achieved via path replication

34
KaZaa
  • Use Supernodes
  • Regular Nodes : Supernodes = 100 : 1
  • Simple way to scale the system by a factor of 100

35
DHTs: A Brief Overview (Slides by Brad Karp)

36
What Is a DHT?
  • Single-node hash table
  • key = Hash(name)
  • put(key, value)
  • get(key) → value
  • How do I do this across millions of hosts on the
    Internet?
  • Distributed Hash Table

37
Distributed Hash Tables
  • Chord
  • CAN
  • Pastry
  • Tapestry
  • etc. etc.

38
The Problem
[Figure: a publisher issues Put(key=title, value=file data) and a client issues Get(key=title) to nodes N1-N6 connected over the Internet]
  • Key Placement
  • Routing to find key

39
Key Placement
  • Traditional hashing
  • Nodes numbered from 1 to N
  • Key is placed at node (hash(key) mod N)
  • Why traditional hashing has problems: when N changes
    (a node joins or leaves), nearly every key maps to a
    different node (see the sketch below)
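A small illustration (hypothetical, not from the slides) of why mod-N placement breaks when membership changes:

```python
import hashlib

def node_for(key, n_nodes):
    """Traditional placement: node index = hash(key) mod N."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % n_nodes

keys = [f"file-{i}" for i in range(10_000)]
moved = sum(node_for(k, 100) != node_for(k, 101) for k in keys)
print(f"{moved / len(keys):.0%} of keys move when N goes from 100 to 101")
# Typically ~99% of keys move; consistent hashing moves only ~1/N of them.
```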

40
Consistent Hashing: IDs
  • Key identifier = SHA-1(key)
  • Node identifier = SHA-1(IP address)
  • SHA-1 distributes both uniformly
  • How to map key IDs to node IDs?

41
Consistent Hashing: Placement
A key is stored at its successor: the node with the next
higher ID
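A minimal sketch of successor placement on the circular ID space, assuming a sorted list of SHA-1 node IDs; the bisect-based ring is illustrative, not from the slides.

```python
import bisect
import hashlib

def sha1_id(s):
    return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big")

def successor(node_ips, key):
    """Return the ID of the node that stores `key`: the first node ID
    clockwise from (i.e. >= ) the key ID, wrapping around the circle."""
    ring = sorted(sha1_id(ip) for ip in node_ips)
    k = sha1_id(key)
    i = bisect.bisect_left(ring, k)
    return ring[i % len(ring)]

nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]   # illustrative node addresses
print(hex(successor(nodes, "some-file.mp3")))
```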
42
Basic Lookup
43
Finger Table Allows log(N)-time Lookups
[Figure: node N80's fingers span ½, ¼, 1/8, ..., 1/128 of the ID space]
44
Finger i Points to Successor of n+2^i
[Figure: node N80's fingers again span ½ ... 1/128 of the ID space; the finger for 80 + 2^5 = 112 points to N120, the successor of ID 112]
45
Lookups Take O(log(N)) Hops
[Figure: Lookup(K19) issued at N32 is routed via finger pointers among nodes N5, N10, N20, N60, N80, N99, N110 and resolves to N20, the successor of K19]
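A sketch of finger-table construction and finger-based routing on a small 7-bit ring; the helper names and example node IDs are illustrative, and a real Chord node would perform these steps as messages to remote nodes rather than local calls.

```python
M = 7                                    # 7-bit circular ID space, as in the slides

def in_interval(x, a, b):
    """True if x lies in the circular interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def build_fingers(node, ring_nodes):
    """fingers[i] = successor(node + 2**i)."""
    ring = sorted(ring_nodes)
    def succ(x):
        x %= 2 ** M
        for n in ring:
            if n >= x:
                return n
        return ring[0]
    return [succ(node + 2 ** i) for i in range(M)]

def lookup(key, node, fingers_of):
    """Forward to the closest preceding finger until the key falls between
    the current node and its successor; O(log N) hops in expectation."""
    hops = 0
    while True:
        fingers = fingers_of[node]
        succ = fingers[0]                     # finger 0 is the immediate successor
        if in_interval(key, node, succ):
            return succ, hops
        nxt = node
        for f in reversed(fingers):           # closest finger preceding the key
            if in_interval(f, node, key):
                nxt = f
                break
        if nxt == node:                       # no progress possible in this sketch
            return succ, hops
        node = nxt
        hops += 1

nodes = [5, 10, 20, 32, 60, 80, 99, 110]      # node IDs echoing the slide's example
fingers_of = {n: build_fingers(n, nodes) for n in nodes}
print(lookup(19, 32, fingers_of))             # resolves K19 to N20
```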
46
Joining: Linked-List Insert
1. Lookup(36)
[Figure: new node N36 joins between N25 and N40; N40 currently holds K30 and K38]
47
Join (2)
2. N36 sets its own successor pointer (to N40)
[Figure: N25, N36 → N40; N40 holds K30 and K38]
48
Join (3)
3. Copy keys 26..36 from N40 to N36
[Figure: K30 is copied to N36; N40 still holds K30 and K38]
49
Join (4)
4. Set N25's successor pointer (to N36)
[Figure: N25 → N36 → N40; N36 now holds a copy of K30]
Predecessor pointers allow linking in the new host.
Finger pointers are updated in the background.
Correct successors produce correct lookups.
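The four join steps above, sketched as a local sorted-ring insert; this is a simplification in which the ring is a plain dict, whereas real Chord performs step 1 as a network lookup and repairs pointers lazily.

```python
def in_range(x, a, b):
    """x in the circular interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def join(ring, new_id):
    ids = sorted(ring)
    succ_id = next((n for n in ids if n >= new_id), ids[0])       # 1. Lookup(new_id)
    pred_id = max((n for n in ids if n < new_id), default=ids[-1])
    ring[new_id] = {"succ": succ_id, "keys": set()}               # 2. set own successor
    moved = {k for k in ring[succ_id]["keys"] if in_range(k, pred_id, new_id)}
    ring[new_id]["keys"] |= moved                                 # 3. copy keys (pred, new]
    ring[succ_id]["keys"] -= moved
    ring[pred_id]["succ"] = new_id                                # 4. predecessor -> new node

# Mirroring the slide's example: N25 and N40 (holding K30, K38), then N36 joins.
ring = {25: {"succ": 40, "keys": set()}, 40: {"succ": 25, "keys": {30, 38}}}
join(ring, 36)
print(ring[36]["keys"], ring[40]["keys"])      # {30} {38}
```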
50
Chord Lookup Algorithm Properties
  • Interface: lookup(key) → IP address
  • Efficient: O(log N) messages per lookup
  • N is the total number of servers
  • Scalable: O(log N) state per node
  • Robust: survives massive failures
  • Simple to analyze

51
Many Many Variations of The Same Theme
  • Different ways to choose the fingers
  • Ways to make it more robust
  • Ways to make it more network efficient
  • etc. etc.

52
Improving BitTorrent

53
BitTorrent File Sharing Network
  • Goal: replicate K chunks of data among N nodes
  • Form neighbor connection graph
  • Neighbors exchange data

54
BitTorrent Neighbor Selection
[Figure: a new peer A downloads file.torrent, contacts the tracker, and is introduced to peers 1-5; the seed holds the whole file]
55
BitTorrent Piece Replication
[Figure: peer A and peers 1-3 exchange pieces with each other; the seed holds the whole file]
56
BitTorrent Piece Replication Algorithms
  • Tit-for-tat (choking/unchoking)
  • Each peer only uploads to 7 other peers at a time
  • 6 of these are chosen based on amount of data
    received from the neighbor in the last 20 seconds
  • The last one is chosen randomly, with a 75% bias
    toward newcomers
  • (Local) Rarest-first replication
  • When peer 3 unchokes peer A, A selects which
    piece to download
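A sketch of the two policies described on this slide; the data structures (`recv_rate`, `neighbor_pieces`, etc.) and parameter defaults are illustrative, not the actual client's internals.

```python
import random
from collections import Counter

def choose_unchoked(peers, recv_rate, newcomers, reciprocal=6):
    """Unchoke the `reciprocal` peers that sent us the most data in the
    last 20 seconds, plus one optimistic slot biased 75% toward newcomers."""
    by_rate = sorted(peers, key=lambda p: recv_rate.get(p, 0), reverse=True)
    unchoked = by_rate[:reciprocal]
    rest = [p for p in peers if p not in unchoked]
    if rest:
        new = [p for p in rest if p in newcomers]
        pool = new if new and random.random() < 0.75 else rest
        unchoked.append(random.choice(pool))
    return unchoked

def rarest_first(my_pieces, neighbor_pieces):
    """Pick a piece we lack that is rarest among our neighbors' holdings."""
    counts = Counter(p for pieces in neighbor_pieces.values() for p in pieces)
    candidates = [p for p in counts if p not in my_pieces]
    return min(candidates, key=lambda p: counts[p]) if candidates else None
```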

57
Performance of BitTorrent
  • Conclusion from modeling studies: BitTorrent is
    nearly optimal in idealized, homogeneous networks
  • Demonstrated by simulation studies
  • Confirmed by theoretical modeling studies
  • Intuition: in a random graph,
  • Prob(Peer A's content is a subset of Peer B's)
    50%

58
Lessons from BitTorrent
  • Often, randomized simple algorithms perform
    better than elaborately designed deterministic
    algorithms

59
Problems of BitTorrent
  • ISPs are unhappy
  • BitTorrent is notoriously difficult to traffic
    engineer
  • ISPs: different links have different monetary
    costs
  • BitTorrent
  • Peers are all equal
  • Choices made based on measured performance
  • No regard for underlying ISP topology or
    preferences

60
BitTorrent and ISPs Play Together?
  • Current state of affairs: a clumsy co-existence
  • ISPs throttle BitTorrent traffic along
    high-cost links
  • Users suffer
  • Can they be partners?
  • ISPs inform BitTorrent of their preferences
  • BitTorrent schedules traffic in ways that benefit
    both users and ISPs

61
Random Neighbor Selection
  • Existing studies all assume random neighbor
    selection
  • BitTorrent is no longer optimal if nodes in the same
    ISP only connect to each other
  • Random neighbor selection → high cross-ISP
    traffic
  • Q: Can we modify the neighbor selection scheme
    without affecting performance?

62
Biased Neighbor Selection
  • Idea: of N neighbors, choose N−k from peers in
    the same ISP, and choose k randomly from peers
    outside the ISP (see the sketch below)

[Figure: peers inside one ISP connect mostly to each other, with k connections to peers outside the ISP]
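A sketch of the selection rule, assuming the selector (e.g. the tracker) knows each peer's ISP; `peers_by_isp` and the parameter defaults are illustrative.

```python
import random

def biased_neighbors(my_isp, peers_by_isp, n=35, k=1):
    """Return up to n neighbors: n-k from our own ISP, k from outside it."""
    internal = list(peers_by_isp.get(my_isp, []))
    external = [p for isp, ps in peers_by_isp.items() if isp != my_isp for p in ps]
    chosen = random.sample(internal, min(n - k, len(internal)))
    chosen += random.sample(external, min(k, len(external)))
    return chosen
```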
63
Implementing Biased Neighbor Selection
  • By Tracker
  • Need ISP affiliations of peers
  • Peer to AS maps
  • Public IP address ranges from ISPs
  • Special X- HTTP header
  • By traffic shaping devices
  • Intercept peer → tracker messages and
    manipulate responses
  • No need to change tracker or client

64
Evaluation Methodology
  • Event-driven simulator
  • Use actual client and tracker code as much as
    possible
  • Calculate bandwidth contention, assume perfect
    fair-share from TCP
  • Network settings
  • 14 ISPs, each with 50 peers, 100Kb/s upload,
    1Mb/s download
  • Seed node, 400Kb/s upload
  • Optional university nodes (1Mb/s upload)
  • Optional ISP bottleneck to other ISPs

65
Limitation of Throttling
66
Throttling Cross-ISP Traffic
Redundancy: average # of times a data chunk
enters the ISP
67
Biased Neighbor Selection Download Times
68
Biased Neighbor Selection Cross-ISP Traffic
69
Importance of Rarest-First Replication
  • Random piece replication performs badly
  • Increases download time by 84% - 150%
  • Increases traffic redundancy from 3 to 14
  • Biased neighbors + Rarest-First → more uniform
    progress of peers

70
Biased Neighbor Selection Single-ISP Deployment
71
Presence of External High-Bandwidth Peers
  • Biased neighbor selection alone
  • Average download time same as regular BitTorrent
  • Cross-ISP traffic increases as the # of university
    peers increases
  • Result of tit-for-tat
  • Biased neighbor selection + Throttling
  • Download time only increases by 12%
  • Most neighbors do not cross the bottleneck
  • Traffic redundancy (i.e. cross-ISP traffic) same
    as the scenario without university peers

72
Comparison with Alternatives
  • Gateway peer: only one peer connects to the peers
    outside the ISP
  • Gateway peer must have high bandwidth
  • It is the seed for this ISP
  • Ends up benefiting peers in other ISPs
  • Caching
  • Can be combined with biased neighbor selection
  • Biased neighbor selection reduces the bandwidth
    needed from the cache by an order of magnitude

73
Summary
  • By choosing neighbors well, BitTorrent can
    achieve high peer performance without increasing
    ISP cost
  • Biased neighbor selection: choose the initial set of
    neighbors well
  • Can be combined with throttling and caching
  • → P2P and ISPs can collaborate!