Title: Brief Overview of Academic Research on P2P
1. Brief Overview of Academic Research on P2P
2. Relevant Conferences
- IPTPS (International Workshop on Peer-to-Peer Systems)
- ICDCS (IEEE International Conference on Distributed Computing Systems)
- NSDI (USENIX Symposium on Networked Systems Design and Implementation)
- PODC (ACM Symposium on Principles of Distributed Computing)
- SIGCOMM
3. Areas of Research Focus
- Gnutella-inspired
  - The directory service problem
- BitTorrent-inspired
  - The file distribution problem
- P2P live streaming
- P2P and net neutrality
4. Gnutella-Inspired Research Studies
5. The Applications and the Problems
- Napster, Gnutella, KaZaA/FastTrack, Skype
- Look for a particular content/object and find which peer has it: the directory service problem
  - Challenge: how to offer a scalable directory service in a fully decentralized fashion
- Arrange a direct transfer from the peer: the punch-a-hole-in-the-firewall problem
6. Decentralized Directory Services
- Structured networks
  - DHT (Distributed Hash Tables)
    - Very active research area from 2001 to 2004
    - Limitation: lookup by keys only
  - Multi-Attribute DHT
    - Limited support for query-based lookup
- Unstructured networks
  - Various improvements to basic flooding-based schemes
7. What Is a DHT?
- Single-node hash table (see the sketch below)
  - key = Hash(name)
  - put(key, value)
  - get(key) -> value
- How do I do this across millions of hosts on the Internet?
  - Distributed Hash Table
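A minimal sketch (Python; the class and names are illustrative, not from the slides) of the single-node interface that a DHT then distributes across hosts: keys are SHA-1 hashes of names, and put/get operate on a local dictionary.

```python
import hashlib

class LocalHashTable:
    """Single-node version of the put/get interface that a DHT distributes."""
    def __init__(self):
        self.store = {}

    def key_for(self, name):
        # key = Hash(name); SHA-1 as in the consistent-hashing slides
        return hashlib.sha1(name.encode()).hexdigest()

    def put(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)

table = LocalHashTable()
k = table.key_for("song.mp3")
table.put(k, "file data")
print(table.get(k))  # -> "file data"
```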
8. Distributed Hash Tables
- Chord
- CAN
- Pastry
- Tapestry
- Symphony
- Koorde
- etc.
9. The Problem
Figure: nodes N1 through N6 connected across the Internet; a publisher issues Put(key=title, value=file data) and a client issues Get(key=title).
- Key placement
- Routing to find a key
10. Key Placement
- Traditional hashing
  - Nodes numbered from 1 to N
  - Key is placed at node hash(key) mod N
- Why traditional hashing has problems: when N changes (a node joins or leaves), almost every key maps to a different node (demonstrated in the sketch below)
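As a hedged illustration (not from the slides): with placement at hash(key) mod N, adding a single node renumbers almost every key, so nearly all objects would have to move.

```python
import hashlib

def node_for(key, n_nodes):
    # Traditional placement: node index = hash(key) mod N
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return h % n_nodes

keys = [f"object-{i}" for i in range(1000)]
before = {k: node_for(k, 100) for k in keys}
after = {k: node_for(k, 101) for k in keys}   # one node joins: N goes from 100 to 101
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved} of {len(keys)} keys map to a different node")  # typically ~99%
```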
11. Consistent Hashing: IDs
- Key identifier = SHA-1(key)
- Node identifier = SHA-1(IP address)
- SHA-1 distributes both uniformly
- How to map key IDs to node IDs?
12. Consistent Hashing: Placement
A key is stored at its successor: the node with the next higher ID.
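A minimal sketch of this placement rule (Python; truncated 32-bit IDs for readability, whereas Chord uses the full 160-bit SHA-1 space): node IDs come from hashing IP addresses, and a key goes to the first node ID at or after the key ID, wrapping around the circle.

```python
import bisect
import hashlib

def sha1_id(text, bits=32):
    # Identifier = SHA-1(text), truncated to `bits` bits for this example
    return int(hashlib.sha1(text.encode()).hexdigest(), 16) % (2 ** bits)

node_ids = sorted(sha1_id(ip) for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"])

def successor(key_id):
    # A key is stored at the first node whose ID is >= the key ID (wrap around)
    i = bisect.bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]

print(successor(sha1_id("song.mp3")))
```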
13. Basic Lookup
14. Finger Table Allows log(N)-Time Lookups
Figure: node N80's finger table entries point to the nodes ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the identifier circle.
15. Finger i Points to the Successor of n + 2^i
Figure: for node N80, the finger for 80 + 32 = 112 points to N120, the successor of 112; the other fingers again cover ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the circle.
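A sketch of how such a finger table could be built (illustrative; node IDs are plain integers in a 2^bits identifier space): entry i is the successor of n + 2^i.

```python
import bisect

def finger_table(n, node_ids, bits=32):
    """finger[i] = successor(n + 2^i) for i = 0 .. bits-1 (0-indexed)."""
    ring = sorted(node_ids)
    space = 2 ** bits

    def successor(x):
        i = bisect.bisect_left(ring, x % space)
        return ring[i % len(ring)]

    return [successor(n + 2 ** i) for i in range(bits)]
```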
16. Lookups Take O(log N) Hops
Figure: a lookup for key K19, issued at node N32, is forwarded across the ring (nodes N5, N10, N20, N60, N80, N99, N110), roughly halving the remaining distance each hop until it reaches N20, the successor of K19.
17. Chord Lookup Algorithm Properties
- Interface: lookup(key) -> IP address
- Efficient: O(log N) messages per lookup (see the routing sketch below)
  - N is the total number of servers
- Scalable: O(log N) state per node
- Robust: survives massive failures
- Simple to analyze
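A sketch of the routing loop behind these properties (Python; `fingers` maps each node ID to its finger table, an assumption about representation rather than the authors' code): each hop forwards to the closest finger preceding the key, so the remaining distance roughly halves per hop.

```python
def distance(a, b, space):
    # Clockwise distance from a to b on the identifier circle
    return (b - a) % space

def lookup(key_id, start, fingers, space):
    """Route toward key_id; returns (successor of key_id, number of hops)."""
    node, hops = start, 0
    while True:
        succ = fingers[node][0]  # finger 0 is the node's immediate successor
        if distance(node, key_id, space) <= distance(node, succ, space):
            return succ, hops    # key_id falls between node and its successor
        # Otherwise forward to the closest preceding finger
        node = max((f for f in fingers[node]
                    if distance(node, f, space) < distance(node, key_id, space)),
                   key=lambda f: distance(node, f, space))
        hops += 1
```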
18. Related Studies on DHTs
- Many variations of DHTs
  - Different ways to choose the fingers
  - Ways to make it more robust
  - Ways to make it more network efficient
- Studies of different DHTs
  - What happens when peers leave (churn)?
- Applications built using DHTs
  - Tracker-less BitTorrent
  - Beehive, a P2P-based DNS system
19. Directory Lookups: Unstructured Networks
- Example: Gnutella
- Support more flexible queries
  - Typically, precise name search is a small portion of all queries
- Simplicity
- High resilience against node failures
- Problem: scalability
  - Flooding: # of messages is O(N·E)
20. Flooding-Based Searches
Figure: a query floods from node 1 through nodes 2-8, with duplicate copies of the query arriving over redundant links.
- Duplication increases as TTL increases in flooding
- Worst case: a node A is interrupted by N · q · degree(A) messages
21. Problems with Simple TTL-Based Flooding
- Hard to choose the TTL
  - For objects that are widely present in the network, small TTLs suffice
  - For objects that are rare in the network, large TTLs are necessary
- The number of query messages grows exponentially as the TTL grows
22. Idea 1: Adaptively Adjust the TTL
- Expanding ring (see the sketch after this list)
  - Multiple floods: start with TTL = 1 and increase the TTL by 2 each time until the search succeeds
  - Success varies by network topology
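A minimal sketch of expanding-ring search (Python; the graph is assumed to be an adjacency dict and `has_object` a per-node predicate, both illustrative): repeat a TTL-limited flood, growing the TTL by 2 each round until the object is found.

```python
from collections import deque

def flood(graph, start, has_object, ttl):
    """TTL-limited breadth-first flood; returns a node holding the object, if any."""
    seen, frontier = {start}, deque([(start, ttl)])
    while frontier:
        node, t = frontier.popleft()
        if has_object(node):
            return node
        if t == 0:
            continue
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, t - 1))
    return None

def expanding_ring(graph, start, has_object, max_ttl=9):
    # Start with TTL = 1 and increase by 2 each round until the search succeeds
    for ttl in range(1, max_ttl + 1, 2):
        hit = flood(graph, start, has_object, ttl)
        if hit is not None:
            return hit, ttl
    return None, max_ttl
```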
23. Idea 2: Random Walk
- Simple random walk
  - Takes too long to find anything!
- Multiple-walker random walk (see the sketch after this list)
  - N agents, each walking T steps, visit as many nodes as 1 agent walking N·T steps
  - When to terminate the search: check back with the query originator once every C steps
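A sketch of the multiple-walker idea (Python; representation assumptions as in the flooding sketch): several walkers step independently and, every C steps, check with the originator so they can stop once any walker has succeeded.

```python
import random

def random_walk_search(graph, start, has_object, walkers=16, check_every=4, max_steps=10_000):
    """k-walker random walk with periodic check-back at the query originator."""
    positions = [start] * walkers
    found = None
    for step in range(1, max_steps + 1):
        for i, node in enumerate(positions):
            nxt = random.choice(graph[node])   # move each walker to a random neighbor
            positions[i] = nxt
            if has_object(nxt):
                found = nxt
        if step % check_every == 0 and found is not None:
            return found                       # originator tells the walkers to terminate
    return found
```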
24. Flexible Replication
- In unstructured systems, search success is essentially about coverage: visiting enough nodes to probabilistically find the object => replication density matters
- Limited node storage => what's the optimal replication density distribution?
- In Gnutella, only nodes that query an object store it => r_i ∝ p_i
- What if we have different replication strategies?
25. Optimal r_i Distribution
- Goal: minimize Σ_i (p_i / r_i), where Σ_i r_i = R
- Calculation (written out below)
  - Introduce a Lagrange multiplier λ and find the r_i and λ that minimize Σ_i (p_i / r_i) + λ (Σ_i r_i − R)
  - => λ − p_i / r_i² = 0 for all i
  - => r_i ∝ √p_i
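The same derivation written out (LaTeX), with λ the Lagrange multiplier:

```latex
% Minimize sum_i p_i/r_i subject to sum_i r_i = R
L(r,\lambda) = \sum_i \frac{p_i}{r_i} + \lambda\Big(\sum_i r_i - R\Big),
\qquad
\frac{\partial L}{\partial r_i} = -\frac{p_i}{r_i^{2}} + \lambda = 0
\;\Rightarrow\;
r_i = \sqrt{p_i/\lambda} \;\propto\; \sqrt{p_i}
```

The multiplier λ is then fixed by the storage budget Σ_i r_i = R.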
26. Square-Root Distribution
- General principle: to minimize Σ_i (p_i / r_i) under the constraint Σ_i r_i = R, make r_i proportional to the square root of p_i
- Other application examples
  - Bandwidth allocation to minimize expected download times
  - Server load balancing to minimize expected request latency
27. Achieving Square-Root Distribution
- Suggestions from some heuristics
  - Store an object at a number of nodes proportional to the number of nodes visited in order to find the object
  - Each node uses random replacement
- Two implementations (sketched after this list)
  - Path replication: store the object along the path of a successful walk
  - Random replication: store the object randomly among the nodes visited by the agents
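A rough sketch of the two strategies (Python; `store_with_random_replacement` is a hypothetical per-node cache method, and the walk bookkeeping is assumed rather than taken from the slides):

```python
import random

def replicate_after_walk(obj, path, visited, strategy="path"):
    """Create copies of `obj` after a successful walk.

    path:    nodes on the successful walker's route to the object.
    visited: all nodes seen by all walkers during the search.
    The copy count is proportional to the number of nodes visited to find
    the object, which together with random replacement pushes replication
    toward the square-root distribution."""
    if strategy == "path":
        targets = path                                   # path replication
    else:
        targets = random.sample(visited, min(len(path), len(visited)))  # random replication
    for node in targets:
        node.store_with_random_replacement(obj)          # hypothetical cache API
```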
28. KaZaA
- Uses supernodes
  - Ratio of regular nodes to supernodes is roughly 100 : 1
  - A simple way to scale the system by a factor of 100
29. BitTorrent-Inspired Research Studies
30. Modeling and Understanding BitTorrent
- Analysis based on modeling
  - Views it as a type of gossip algorithm
  - Usually does not model the tit-for-tat aspects
  - Assumes perfectly connected networks
  - Statistical modeling techniques
  - Mostly published in PODC or SIGMETRICS
- Simulation studies
  - Different assumptions about bottlenecks
  - Varying detail in the modeling of the data transfer
  - Published in ICDCS and SIGCOMM
31. Studies on the Effect of BitTorrent on ISPs
- Observation: P2P contributes to cross-ISP traffic
  - SIGCOMM 2006 publication on studies of Japanese backbone traffic
- Attempts to improve the network locality of BitTorrent-like applications
  - ICDCS 2006 publication
- Academic P2P file sharing systems
  - Bullet, Julia, etc.
32. Techniques to Alleviate the Last-Missing-Piece Problem
- Apply network coding to pieces exchanged between peers
  - Pablo Rodriguez Rodriguez, Microsoft Research (recently moved to Telefonica Research)
- Use a different piece-replication strategy
  - Dahlia Malkhi, Microsoft Research
  - "On Collaborative Content Distribution Using Multi-Message Gossip"
  - Associate an age with file segments
33. Network Coding
- Main feature
  - Allows intermediate nodes to encode packets
  - Makes optimal use of the available network resources
- Similar technique: erasure codes
  - Reconstruct the original content of size n from roughly any n symbols drawn from a large universe of encoded symbols
34. Network Coding in P2P: The Model
- Server
  - Divides the file into k blocks
  - Uploads blocks at random to different clients
- Clients (users)
  - Collaborate with each other to assemble the blocks and reconstruct the original file
  - Exchange information and data with only a small subset of others (neighbors)
  - Symmetric neighborhood and links
35. Network Coding in P2P
- Assume a node with blocks B_1, B_2, ..., B_k
- Pick random numbers C_1, C_2, ..., C_k
- Construct a new block
  - E = C_1·B_1 + C_2·B_2 + ... + C_k·B_k
- Send E and (C_1, C_2, ..., C_k) to a neighbor
- Decoding: collect enough linearly independent E's and solve the linear system
- If all nodes pick the vector C randomly, chances are high that after receiving k coded blocks a node can recover B_1 through B_k (see the sketch below)
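A self-contained sketch of this encode/decode cycle (Python; a small prime field is used for readability, whereas deployed systems typically work over GF(2^8)):

```python
import random

P = 257  # prime field for illustration; real systems typically use GF(2^8)

def encode(blocks):
    """Blocks are equal-length lists of symbols in [0, P).
    Returns (C, E) with E = C_1*B_1 + ... + C_k*B_k (mod P)."""
    k, n = len(blocks), len(blocks[0])
    coeffs = [random.randrange(P) for _ in range(k)]
    coded = [sum(c * b[j] for c, b in zip(coeffs, blocks)) % P for j in range(n)]
    return coeffs, coded

def decode(received):
    """Given k linearly independent (C, E) pairs, solve the linear system
    by Gaussian elimination to recover B_1 .. B_k."""
    k = len(received)
    rows = [list(c) + list(e) for c, e in received]      # augmented matrix [C | E]
    for col in range(k):
        pivot = next(r for r in range(col, k) if rows[r][col] % P)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)                 # modular inverse (Python 3.8+)
        rows[col] = [x * inv % P for x in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [row[k:] for row in rows]                     # recovered B_1 .. B_k
```

For example, with k = 3 blocks a node can call encode on the blocks it holds and forward the result; any 3 coded blocks with linearly independent coefficient vectors suffice to decode.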
36. P2P Live Streaming
37. Motivations
- Internet applications
  - PPLive, PPStream, etc.
- Challenge: QoS issues
  - Raw bandwidth constraints
  - Example: PPLive utilizes the significant bandwidth disparity between university nodes and residential nodes
  - Satisfying the demand of content publishers
38. P2P Live Streaming Can't Stand on Its Own
- P2P as a complement to IP multicast
  - Used where IP multicast isn't enabled
- P2P as a way to reduce server load
  - By sourcing parts of streams from peers, server load might be reduced by 10
- P2P as a way to reduce backbone bandwidth requirements
  - When core network bandwidth isn't sufficient
39. P2P and Net Neutrality
40. It's All TCP's Fault
- TCP: per-flow fairness
- Browsers
  - 2-4 TCP flows per web server
  - Contact a few web servers at a time
  - Short flows
- P2P applications
  - Much higher number of TCP connections
  - Many more endpoints
  - Long flows
41. When and How to Apply Traffic Shaping
- Current practice: application recognition
- Needs
  - An application-agnostic way to trigger traffic shaping
  - A clear statement to users on what happens