Brief Overview of Academic Research on P2P

1
Brief Overview of Academic Research on P2P
  • Pei Cao

2
Relevant Conferences
  • IPTPS (International Workshop on Peer-to-Peer
    Systems)
  • ICDCS (IEEE International Conference on Distributed
    Computing Systems)
  • NSDI (USENIX Symposium on Networked Systems Design
    and Implementation)
  • PODC (ACM Symposium on Principles of Distributed
    Computing)
  • SIGCOMM

3
Areas of Research Focus
  • Gnutella-Inspired: The Directory Service Problem
  • BitTorrent-Inspired: The File Distribution Problem
  • P2P Live Streaming
  • P2P and Net Neutrality

4
Gnutella-Inspired Research Studies

5
The Applications and The Problems
  • Napster, Gnutella, KaZaA/FastTrack, Skype
  • Look for a particular piece of content (object), and
    find which peer has it → the directory service
    problem
  • Challenge: how to offer a scalable directory
    service in a fully decentralized fashion
  • Arrange direct transfer from the peer → the
    "punch a hole in the firewall" problem

6
Decentralized Directory Services
  • Structured Networks
  • DHT (Distributed Hash Tables)
  • Very active research areas from 2001 to 2004
  • Limitation: lookup by keys only
  • Multi-Attribute DHT
  • Limited support for query-based lookup
  • Unstructured Networks
  • Various improvements to basic flooding-based
    schemes

7
What Is a DHT?
  • Single-node hash table
  • key = Hash(name)
  • put(key, value)
  • get(key) -> value
  • How do I do this across millions of hosts on the
    Internet?
  • Distributed Hash Table

8
Distributed Hash Tables
  • Chord
  • CAN
  • Pastry
  • Tapestry
  • Symphony
  • Koorde
  • etc.

9
The Problem
[Figure: a publisher issues Put(key = title, value = file data)
and a client issues Get(key = title) across nodes N1 to N6
on the Internet]
  • Key Placement
  • Routing to find key

10
Key Placement
  • Traditional hashing
  • Nodes numbered from 1 to N
  • Key is placed at node (hash(key) % N)
  • Why traditional hashing has problems: when a node
    joins or leaves, N changes and most keys map to a
    different node
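The cost of modulo placement can be measured directly. The following is a minimal Python sketch, assuming SHA-1 as the hash, 0-based node indices, and made-up key names; `node_for` and `moved_fraction` are illustrative helpers, not names from the slides.

```python
import hashlib

def node_for(key: str, n: int) -> int:
    """Traditional placement: node index = hash(key) mod N (0-based here)."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % n

def moved_fraction(keys, n_before: int, n_after: int) -> float:
    """Fraction of keys whose node changes when N changes."""
    moved = sum(1 for k in keys if node_for(k, n_before) != node_for(k, n_after))
    return moved / len(keys)

# Hypothetical workload: 10,000 keys, one node added to a 10-node system.
keys = [f"file-{i}" for i in range(10_000)]
frac = moved_fraction(keys, 10, 11)
print(f"{frac:.0%} of keys move when N goes from 10 to 11")
```

With a uniform hash, a key keeps its node only when hash mod 10 equals hash mod 11, which happens for about 1 key in 11, so roughly 10 out of 11 keys are expected to move.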

11
Consistent Hashing: IDs
  • Key identifier = SHA-1(key)
  • Node identifier = SHA-1(IP address)
  • SHA-1 distributes both uniformly
  • How to map key IDs to node IDs?

12
Consistent Hashing: Placement
A key is stored at its successor: the node with the
next-higher ID
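The successor rule can be sketched in a few lines of Python. This is an illustration under assumptions: the small integer ring and the IP addresses are made up, and `sha1_id` and `successor` are hypothetical helper names.

```python
import bisect
import hashlib

def sha1_id(text: str) -> int:
    """160-bit identifier for a key or a node address."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest(), "big")

def successor(sorted_ids, ident):
    """First node ID >= ident, wrapping around the ring."""
    i = bisect.bisect_left(sorted_ids, ident)
    return sorted_ids[i % len(sorted_ids)]

# Toy ring with small integer IDs.
ring = sorted([5, 10, 20, 32, 60, 80, 99, 110])
print(successor(ring, 19))    # 20
print(successor(ring, 115))   # 5 (wraps past the highest ID)

# Same rule with SHA-1 IDs (hypothetical node addresses and key).
nodes = {sha1_id(ip): ip for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]}
owner = nodes[successor(sorted(nodes), sha1_id("some-title"))]
print(owner)
```

Because only the keys between a departing or arriving node and its neighbor change owner, membership changes move a small share of the keys rather than nearly all of them.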
13
Basic Lookup
14
Finger Table Allows log(N)-time Lookups
[Figure: node N80's fingers span ½, ¼, 1/8, 1/16, 1/32,
1/64, and 1/128 of the ring]

15
Finger i Points to Successor of n + 2^i
[Figure: for node N80, the finger aimed at ID 112
(= 80 + 2^5) points to that ID's successor, N120]

16
Lookups Take O(log(N)) Hops
[Figure: Lookup(K19) on a ring of nodes N5, N10, N20, N32,
N60, N80, N99, and N110; K19 belongs to its successor, N20]

17
Chord Lookup Algorithm Properties
  • Interface: lookup(key) → IP address
  • Efficient: O(log N) messages per lookup
  • N is the total number of servers
  • Scalable: O(log N) state per node
  • Robust: survives massive failures
  • Simple to analyze
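A finger-table lookup in the style of Chord can be simulated on a small ring. The sketch below is not the Chord implementation: it assumes a 7-bit identifier space, a fixed node set, global knowledge of membership when building fingers, and no failures. `between`, `fingers`, and `lookup` are illustrative names.

```python
import bisect

M = 7                      # identifier bits, so the ring has 2**7 = 128 IDs
RING = 2 ** M
NODES = sorted([5, 10, 20, 32, 60, 80, 99, 110])

def successor(ident):
    """First node ID >= ident, wrapping around the ring."""
    i = bisect.bisect_left(NODES, ident % RING)
    return NODES[i % len(NODES)]

def fingers(n):
    """Finger i of node n points to successor(n + 2**i)."""
    return [successor(n + 2 ** i) for i in range(M)]

def between(x, a, b):
    """True if x lies strictly inside the clockwise interval (a, b)."""
    if a == b:             # interval covers the whole ring except a
        return x != a
    return 0 < (x - a) % RING < (b - a) % RING

def lookup(start, key):
    """Return (node responsible for key, number of hops taken)."""
    n, hops = start, 0
    while True:
        succ = successor(n + 1)
        # If key is in (n, succ], the immediate successor owns it.
        if key == succ or between(key, n, succ):
            return succ, hops + 1
        # Otherwise forward to the closest finger that precedes the key.
        n = next((f for f in reversed(fingers(n)) if between(f, n, key)), succ)
        hops += 1

print(lookup(80, 19))   # (20, 3): N80 -> N5 -> N10 -> N20
```

Each forwarding step roughly halves the remaining clockwise distance to the key, which is where the O(log N) hop count comes from.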

18
Related Studies on DHTs
  • Many variations of DHTs
  • Different ways to choose the fingers
  • Ways to make it more robust
  • Ways to make it more network efficient
  • Studies of different DHTs
  • What happens when peers leave (a.k.a. churn)?
  • Applications built using DHTs
  • Trackerless BitTorrent
  • Beehive: a P2P-based DNS system

19
Directory Lookups: Unstructured Networks
  • Example: Gnutella
  • Support more flexible queries
  • Typically, precise name search is a small
    portion of all queries
  • Simplicity
  • High resilience against node failures
  • Problem: scalability
  • Flooding: number of messages is O(N*E)

20
Flooding-Based Searches
[Figure: example network of nodes 1 to 8 used to illustrate
flooding]
  • Duplication increases as TTL increases in
    flooding
  • Worst case: a node A is interrupted by N * q *
    degree(A) messages

21
Problems with Simple TTL-Based Flooding
  • Hard to choose TTL
  • For objects that are widely present in the
    network, small TTLs suffice
  • For objects that are rare in the network, large
    TTLs are necessary
  • The number of query messages grows exponentially as
    TTL grows

22
Idea 1: Adaptively Adjust TTL
  • Expanding ring
  • Multiple floods: start with TTL = 1 and increment TTL
    by 2 each time until the search succeeds
  • Success varies by network topology
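The expanding-ring idea can be sketched as repeated bounded floods. This is a simplified Python model, not the Gnutella protocol: nodes forward to every neighbor (including the one the query came from), the graph is a made-up five-node path, and `max_ttl` is an assumed cutoff that the slides do not mention.

```python
from collections import deque

def flood(graph, source, ttl, has_object):
    """Flood a query up to `ttl` hops; return (found, messages sent)."""
    seen = {source}
    frontier = deque([(source, 0)])
    messages = 0
    found = has_object(source)
    while frontier:
        node, dist = frontier.popleft()
        if dist == ttl:
            continue                   # TTL exhausted, do not forward
        for nb in graph[node]:
            messages += 1              # every copy counts, duplicates included
            if nb not in seen:
                seen.add(nb)
                found = found or has_object(nb)
                frontier.append((nb, dist + 1))
    return found, messages

def expanding_ring(graph, source, has_object, max_ttl=9):
    """Start with TTL = 1 and add 2 per round until the search succeeds."""
    total = 0
    for ttl in range(1, max_ttl + 1, 2):
        found, messages = flood(graph, source, ttl, has_object)
        total += messages
        if found:
            return ttl, total
    return None, total

# Hypothetical path graph 0-1-2-3-4 with the object at node 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(expanding_ring(path, 0, lambda n: n == 3))   # (3, 6)
```

The total includes the messages of the failed TTL = 1 round, which is the price paid for not guessing a large TTL up front.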

23
Idea 2: Random Walk
  • Simple random walk
  • Takes too long to find anything!
  • Multiple-walker random walk
  • N agents each walking T steps visit as
    many nodes as 1 agent walking N*T steps
  • When to terminate the search: check back with the
    query originator once every C steps
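A multiple-walker search with periodic check-back might look like the sketch below. It is a toy model under assumptions: walkers move in lockstep, the topology is a made-up 20-node ring, and the parameter names (`walkers`, `check_every`, `max_steps`) are illustrative rather than taken from the slides.

```python
import random

def k_walker_search(graph, source, has_object, walkers=16, check_every=4,
                    max_steps=1000, rng=None):
    """Run `walkers` random walks in lockstep.

    Returns the step count at which the walkers stop, or None if the
    object was not found within max_steps.
    """
    if has_object(source):
        return 0
    rng = rng or random.Random()
    positions = [source] * walkers
    found = False
    for step in range(1, max_steps + 1):
        # Each walker moves to a uniformly random neighbor.
        positions = [rng.choice(graph[p]) for p in positions]
        found = found or any(has_object(p) for p in positions)
        # Walkers learn of success only when they check back with the
        # originator, once every `check_every` steps.
        if found and step % check_every == 0:
            return step
    return None

# Hypothetical 20-node ring with the object stored at node 10.
ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
steps = k_walker_search(ring, 0, lambda n: n == 10, rng=random.Random(0))
print(steps)
```

A larger `check_every` means fewer messages back to the originator but more wasted steps after the object has already been found.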

24
Flexible Replication
  • In unstructured systems, search success is
    essentially about coverage: visiting enough nodes
    to probabilistically find the object => replication
    density matters
  • Limited node storage => what's the optimal
    replication density distribution?
  • In Gnutella, only nodes that query an object store
    it => $r_i \propto p_i$
  • What if we have different replication strategies?

25
Optimal ri Distribution
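  • Goal: minimize $\sum_i p_i / r_i$, where $\sum_i r_i = R$
  • Calculation
  • Introduce a Lagrange multiplier $\lambda$; find $r_i$ and
    $\lambda$ that minimize
  • $\sum_i p_i / r_i + \lambda \left( \sum_i r_i - R \right)$
  • $\Rightarrow \lambda - p_i / r_i^2 = 0$ for all $i$
  • $\Rightarrow r_i \propto \sqrt{p_i}$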

26
Square-Root Distribution
  • General principle: to minimize $\sum_i p_i / r_i$ under
    the constraint $\sum_i r_i = R$, make $r_i$ proportional
    to the square root of $p_i$
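The square-root rule can be checked numerically on a toy instance. The sketch below assumes that p_i is the query probability of object i and r_i its replica count (the slides do not spell these out); the three probabilities and the budget R = 30 are made-up values, and fractional replica counts are allowed for simplicity.

```python
import math

def expected_cost(p, r):
    """Objective from the slides: sum of p_i / r_i."""
    return sum(pi / ri for pi, ri in zip(p, r))

def allocate(weights, budget):
    """Scale weights so the allocations sum to `budget`."""
    total = sum(weights)
    return [budget * w / total for w in weights]

p = [0.6, 0.3, 0.1]      # hypothetical query probabilities
R = 30                   # hypothetical total replica budget

uniform = allocate([1.0] * len(p), R)
proportional = allocate(p, R)
square_root = allocate([math.sqrt(x) for x in p], R)

for name, r in [("uniform", uniform), ("proportional", proportional),
                ("square-root", square_root)]:
    print(f"{name:12s} cost = {expected_cost(p, r):.4f}")
```

In this instance the uniform and proportional allocations both cost 0.1 (each reduces to the number of objects divided by R), while the square-root allocation comes out lower, at about 0.0895.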
  • Other application examples
  • Bandwidth allocation to minimize expected
    download times
  • Server load balancing to minimize expected
    request latency

27
Achieving Square-Root Distribution
  • Suggestions from some heuristics
  • Store an object at a number of nodes that is
    proportional to the number of nodes visited in
    order to find the object
  • Each node uses random replacement
  • Two implementations
  • Path replication: store the object along the path
    of a successful walk
  • Random replication: store the object randomly
    among nodes visited by the agents

28
KaZaA
  • Uses supernodes
  • Regular nodes : supernodes = 100 : 1
  • Simple way to scale the system by a factor of 100
29
BitTorrent-Inspired Research Studies

30
Modeling and Understanding BitTorrent
  • Analysis based on modeling
  • View it as a type of Gossip Algorithm
  • Usually do not model the Tit-for-Tat aspects
  • Assume perfectly connected networks
  • Statistical modeling techniques
  • Mostly published in PODC or SIGMETRICS
  • Simulation Studies
  • Different assumptions about bottlenecks
  • Varying details of the modeling of the data
    transfer
  • Published in ICDCS and SIGCOMM

31
Studies on Effect of BitTorrent on ISPs
  • Observation: P2P contributes to cross-ISP traffic
  • SIGCOMM 2006 publication studying Japanese
    backbone traffic
  • Attempts to improve network locality of
    BitTorrent-like applications
  • ICDCS 2006 publication
  • Academic P2P file sharing systems
  • Bullet, Julia, etc.

32
Techniques to Alleviate the "Last Missing Piece"
Problem
  • Apply Network Coding to pieces exchanged between
    peers
  • Pablo Rodriguez Rodriguez, Microsoft Research
    (recently moved to Telefonica Research)
  • Use a different piece-replication strategy
  • Dahlia Malkhi, Microsoft Research
  • "On Collaborative Content Distribution Using
    Multi-Message Gossip"
  • Associate age with file segments

33
Network Coding
  • Main Feature
  • Allowing intermediate nodes to encode packets
  • Making optimal use of the available network
    resources
  • Similar technique: erasure codes
  • Reconstructing the original content of size n
    from roughly any subset of n symbols drawn from a
    large universe of encoded symbols

34
Network Coding in P2P: The Model
  • Server
  • Dividing the file into k blocks
  • Uploading blocks at random to different clients
  • Clients (Users)
  • Collaborating with each other to assemble the
    blocks and reconstruct the original file
  • Exchanging information and data with only a small
    subset of others (neighbors)
  • Symmetric neighborhood and links

35
Network Coding in P2P
  • Assume a node with blocks $B_1, B_2, \dots, B_k$
  • Pick random numbers $C_1, C_2, \dots, C_k$
  • Construct a new block
  • $E = C_1 B_1 + C_2 B_2 + \dots + C_k B_k$
  • Send $E$ and $(C_1, C_2, \dots, C_k)$ to a neighbor
  • Decoding: collect enough linearly independent
    $E$'s and solve the linear system
  • If all nodes pick the vector $C$ randomly, chances are
    high that after receiving $k$ blocks, a node can recover
    $B_1$ through $B_k$
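The encode and decode steps can be sketched with random linear combinations over a small prime field. This is an illustration under assumptions: arithmetic is done modulo 257 for readability (practical systems typically work in GF(2^8) or GF(2^16)), the three two-symbol blocks are made up, and `encode` and `decode` are hypothetical helper names.

```python
import random

P = 257  # small prime field, chosen only to keep the example readable

def encode(blocks, rng):
    """Return (coefficients C, encoded block E = sum_i C_i * B_i mod P)."""
    coeffs = [rng.randrange(P) for _ in blocks]
    width = len(blocks[0])
    e = [sum(c * b[j] for c, b in zip(coeffs, blocks)) % P for j in range(width)]
    return coeffs, e

def decode(packets, k):
    """Gauss-Jordan elimination mod P on rows [C | E]; None if rank < k."""
    rows = [list(c) + list(e) for c, e in packets]
    for col in range(k):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col]), None)
        if pivot is None:
            return None                      # not enough independent packets yet
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], -1, P)     # modular inverse (Python 3.8+)
        rows[col] = [x * inv % P for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(x - f * y) % P for x, y in zip(rows[r], rows[col])]
    return [row[k:] for row in rows[:k]]

rng = random.Random(1)
blocks = [[72, 101], [108, 108], [111, 33]]   # k = 3 toy blocks, symbols < P
packets, decoded = [], None
while decoded is None:
    packets.append(encode(blocks, rng))
    if len(packets) >= len(blocks):
        decoded = decode(packets, len(blocks))
print(decoded == blocks, len(packets))
```

Any k linearly independent packets suffice, so a receiver does not need to hunt for one specific missing block; it only needs one more packet that is independent of those it already holds.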

36
P2P Live Streaming

37
Motivations
  • Internet Applications
  • PPLive, PPStream, etc.
  • Challenge: QoS issues
  • Raw bandwidth constraints
  • Example: PPLive exploits the significant
    bandwidth disparity between university nodes
    and residential nodes
  • Satisfying demand of content publishers

38
P2P Live Streaming Can't Stand on Its Own
  • P2P as a complement to IP-Multicast
  • Used where IP-Multicast isn't enabled
  • P2P as a way to reduce server load
  • By sourcing parts of streams from peers, server
    load might be reduced by 10%
  • P2P as a way to reduce backbone bandwidth
    requirements
  • When core network bandwidth isn't sufficient

39
P2P and Net-Neutrality

40
It's All TCP's Fault
  • TCP per-flow fairness
  • Browsers
  • 2-4 TCP flows per web server
  • Contact a few web servers at a time
  • Short flows
  • P2P applications
  • Much higher number of TCP connections
  • Many more endpoints
  • Long flows
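The effect of per-flow fairness can be shown with simple arithmetic. The sketch below assumes an idealized bottleneck where every long-lived flow gets an equal share; the flow counts and the 100 Mbps capacity are made-up numbers, and `per_app_share` is an illustrative helper.

```python
def per_app_share(flows_by_app, capacity_mbps):
    """Idealized per-flow fairness: every flow gets capacity / total_flows."""
    total = sum(flows_by_app.values())
    return {app: capacity_mbps * n / total for app, n in flows_by_app.items()}

# Hypothetical link shared by a browser (4 flows) and a P2P client (36 flows).
shares = per_app_share({"browser": 4, "p2p": 36}, capacity_mbps=100)
print(shares)   # {'browser': 10.0, 'p2p': 90.0}
```

Fairness holds per flow, not per user, so the application that opens more connections takes a proportionally larger share of the link.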

41
When and How to Apply Traffic Shaping
  • Current practice: application recognition
  • Needs
  • An application-agnostic way to trigger the
    traffic shaping
  • A clear statement to users on what happens