Title: P2P Tutorial Yang Guo and Christoph Neumann Corporate Research Thomson Inc.
2P2P Tutorial: Yang Guo and Christoph Neumann, Corporate Research, Thomson Inc.
3P2P Tutorial - Outline
- Part I Introduction and Overview
- Part II Popular P2P Applications
- Part III P2P Video-on-Demand
- Part IV Conclusions and Future of P2P
4Part I P2P Introduction and Overview
5P2P Introduction and Overview - Outline
- Part I
- History, motivation and evolution
- History Napster and beyond
- What is Peer-to-peer?
- Why Peer-to-peer?
- Brief P2P technologies overview
- Unstructured p2p-overlays
- Structured p2p-overlays
6History, motivation and evolution
P2P represented 65% of Internet traffic at the end of 2006
- 1999 Napster, first widely used p2p-application
7Napster, first widely used p2p-application
- The application
- A p2p application for the distribution of mp3 files
- Each user can contribute its own content
- How it works
- Central index server
- Maintains a list of all active peers and their available content
- Distributed storage and download
- Client nodes also act as file servers
- All downloaded content is shared
8History, motivation and evolution - Napster
(contd)
- Initial join
- Peers connect to Napster server
- Transmit current listing of shared files to server
[Figure: peers send a join message to the central index server]
9History, motivation and evolution - Napster
(contd)
- Content search
- A peer sends a song request to the Napster server
- The Napster server checks its song database and returns a list of matching peers
[Figure: 1) a peer sends a query to the central index server; 2) the server sends back an answer]
10History, motivation and evolution - Napster
(contd)
- File retrieval
- The requesting peer directly contacts the peer that has the file and downloads it
[Figure: steps 1) and 2) between the peers and the central index server]
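The three Napster steps above (join, content search, file retrieval) can be sketched as a toy in-memory index. This is only an illustration of the central-index idea, not Napster's actual protocol; the class name `CentralIndex`, its methods and the peer and file names are hypothetical.

```python
from collections import defaultdict


class CentralIndex:
    """Toy central index server: maps each shared file name to the peers offering it."""

    def __init__(self):
        self.files_by_peer = {}
        self.peers_by_file = defaultdict(set)

    def join(self, peer, shared_files):
        # Initial join: the peer transmits its current listing of shared files.
        self.files_by_peer[peer] = set(shared_files)
        for name in shared_files:
            self.peers_by_file[name].add(peer)

    def leave(self, peer):
        # Remove a departed peer from every listing it appeared in.
        for name in self.files_by_peer.pop(peer, set()):
            self.peers_by_file[name].discard(peer)

    def search(self, name):
        # Content search: return the peers that match the request.
        return sorted(self.peers_by_file.get(name, set()))


index = CentralIndex()
index.join("peer_a", ["song1.mp3", "song2.mp3"])
index.join("peer_b", ["song2.mp3"])
matches = index.search("song2.mp3")
# File retrieval would now happen directly between the requester and one of `matches`.
```

Only the lookup goes through the server; the download itself never does, which is why the server can stay small.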
11History, motivation and evolution - File Download
- Napster was the first simple but successful P2P application. Many others followed
- P2P File Download Protocols
- 1999 Napster
- 2000 Gnutella, eDonkey
- 2001 Kazaa
- 2002 eMule, BitTorrent
12History, motivation and evolution - Market share
Market share in 2004 (source: CacheLogic)
13P2P Introduction and Overview - Outline
- Part I
- History, motivation and evolution
- History Napster and beyond
- What is Peer-to-peer?
- Why Peer-to-peer?
- Brief P2P technologies overview
- Unstructured p2p-overlays
- Structured p2p-overlays
14Definition of Peer-to-peer (or P2P)
- A peer-to-peer (or P2P) computer network is a network that relies primarily on the computing power and bandwidth of the participants in the network rather than concentrating it in a relatively small number of servers.
- A pure peer-to-peer network does not have the notion of clients or servers, but only equal peer nodes that simultaneously function as both "clients" and "servers" to the other nodes on the network.
- This model of network arrangement differs from the client-server model where communication is usually to and from a central server.
Taken from the wikipedia free encyclopedia -
www.wikipedia.org
15It is a broad definition with lots of applications
- P2P-File download
- Napster, Gnutella, KaZaA, eDonkey, ...
- P2P-Communication
- VoIP, Skype, Messaging, ...
- P2P-Video-on-Demand
- P2P-Computation
- SETI@home
- P2P-Streaming
- PPLive, ESM, ...
- P2P-Gaming
16History, motivation and evolution - Applications
Application type
- P2P is not restricted to file download!
- P2P Protocols
- 1999 Napster, End System Multicast (ESM)
- 2000 Gnutella, eDonkey
- 2001 Kazaa
- 2002 eMule, BitTorrent
- 2003 Skype
- 2004 PPLive
- Today: TVKoo, TVAnts, PPStream, SopCast
- Next: Video-on-Demand, Gaming
[Figure legend: File Download, Streaming, Telephony, Video-on-Demand, Gaming]
17P2P Introduction and Overview - Outline
- Part I
- History, motivation and evolution
- History Napster and beyond
- What is Peer-to-peer?
- Why Peer-to-peer?
- Brief P2P technologies overview
- Unstructured p2p-overlays
- Structured p2p-overlays
18Why is P2P so successful?
- Scalable: it's all about sharing resources
- No need to provision servers or bandwidth
- Each user brings its own resources
- E.g. resistant to flash crowds
- Flash crowd: a crowd of users all arriving at the same time
19Why is P2P so successful? (contd)
- Cheap: no infrastructure needed
- Everybody can bring their own content (at no cost)
- Homemade content
- Ethnic content
- Illegal content
- But also legal content
-
- High availability: content is accessible most of the time
20P2P Introduction and Overview - Outline
- Part I
- History, motivation and evolution
- History Napster and beyond
- What is Peer-to-peer?
- Why Peer-to-peer?
- Brief P2P technologies overview
- Unstructured p2p-overlays
- Structured p2p-overlays
21P2P-Overlay
- Build a graph at the application layer, and forward packets at the application layer
- It is a virtual graph
- The underlying physical graph is transparent to the user
- Edges are TCP connections or simply an entry holding a neighboring node's IP address
- The graph has to be continuously maintained (e.g. check if nodes are still alive)
22P2P-Overlay (contd)
- It is a virtual graph
- The underlying physical graph is transparent to the user
- Edges are TCP connections or simply an entry holding a neighboring node's IP address
- The graph has to be continuously maintained (e.g. check if nodes are still alive)
23P2P-Overlay (contd)
[Figure: the overlay graph drawn above the underlay network, with the source node marked]
24The P2P enabling technologies
- Unstructured p2p-overlays
- Generally random overlay
- Used for content download, telephony, streaming
- Structured p2p-overlays
- Distributed Hash Tables (DHTs)
- Used for node localization, content download,
streaming
25P2P Introduction and Overview - Outline
- Part I
- History, motivation and evolution
- History Napster and beyond
- What is Peer-to-peer?
- Why Peer-to-peer?
- Brief P2P technologies overview
- Unstructured p2p-overlays
- Structured p2p-overlays
26Unstructured p2p-overlays
- Unstructured p2p-overlays do not really care how the overlay is constructed
- Peers are organized in a random graph topology
- E.g., a new node randomly chooses three existing nodes as neighbors
- Flat or hierarchical
- Build your p2p-service based on this graph
- Several proposals
- Gnutella
- KaZaA/FastTrack
- BitTorrent
27Unstructured p2p-overlays (contd)
- Unstructured p2p-overlays are just a framework; you can build many applications on top of them
- Unstructured p2p-overlays: pros and cons
- Pros
- Very flexible: copes with node churn
- Supports complex queries (unlike structured overlays)
- Cons
- Content search is difficult: there is a tradeoff between generated traffic (overhead) and the horizon of the partial view
- In this tutorial we detail the following applications
- Skype
- BitTorrent
28One Example of usage of unstructured overlays
- Typical problem in unstructured overlays: how to do content search and query?
- Flooding
- Limited scope: send only to a subset of your neighbors
- Time-To-Live: limit the number of hops per message
Example of flooding (similar to Gnutella)
Search Britney Spears
Found entry!
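A minimal sketch of TTL-limited flooding on a small overlay, to make the traffic-versus-horizon tradeoff concrete. The function `flood_search`, the five-node overlay and the single content holder are hypothetical; real Gnutella messages also carry identifiers used for duplicate detection, which the `visited` set stands in for here.

```python
def flood_search(graph, has_content, start, ttl):
    """Flood a query from `start`; return (peers holding the content, messages sent)."""
    visited = {start}
    frontier = [start]
    hits = [start] if has_content(start) else []
    messages = 0
    while frontier and ttl > 0:
        next_frontier = []
        for node in frontier:
            for neighbor in graph[node]:
                messages += 1            # every forwarded query costs one message
                if neighbor in visited:
                    continue             # a duplicate query is dropped
                visited.add(neighbor)
                if has_content(neighbor):
                    hits.append(neighbor)
                next_frontier.append(neighbor)
        frontier = next_frontier
        ttl -= 1                         # one hop of the Time-To-Live is used up
    return hits, messages


overlay = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
    "D": ["B", "C", "E"], "E": ["D"],
}
holders = {"E"}
near, cost_near = flood_search(overlay, holders.__contains__, "A", ttl=2)
far, cost_far = flood_search(overlay, holders.__contains__, "A", ttl=3)
```

With a TTL of 2 the query never reaches E; raising the TTL to 3 finds it but sends more messages.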
29P2P Introduction and Overview - Outline
- Part I
- History, motivation and evolution
- History Napster and beyond
- What is Peer-to-peer?
- Why Peer-to-peer?
- Brief P2P technologies overview
- Unstructured p2p-overlays
- Structured p2p-overlays
30Structured p2p-overlays
- Motivation
- Locate content efficiently
- Solution: DHT (Distributed Hash Table)
- Particular nodes hold particular keys
- Locate a key: route the search request to the particular node that holds the key
- Representative solutions
- Chord, CAN, Pastry/Tapestry, etc.
- Focus on Chord
31Challenges to Structured p2p-overlays
- Load balance
- spreading keys evenly over the nodes.
- Decentralization
- no node is more important than any other.
- Scalability
- Lookup must be efficient even with large systems
- Peer dynamics
- Nodes may come and go, may fail
- Ensure the node responsible for a key can always
be found.
32Chord Id and Consistent Hashing
- Assigns nodes and keys an m-bit identifier using a base hash function such as SHA-1
- Identifiers are ordered on an identifier circle
33Consistent Hashing (cont.)
- A key is stored at its successor: the node with the next higher ID
[Figure: circular 7-bit ID space with nodes N32, N90, N123 and keys K5, K20, K60, K101; the example inputs IP=198.10.10.1 and Key=LetItBe are hashed onto the circle]
34Consistent Hashing (cont.)
- For any set of N nodes and K keys, with high probability
- Each node is responsible for at most (1 + ε)K/N keys.
- When an (N + 1)st node joins or leaves the network, responsibility for O(K/N) keys changes hands (and only to or from the joining or leaving node).
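A small sketch of consistent hashing on the 7-bit identifier circle, using the node and key IDs from the figure. The helper names (`chord_id`, `successor`, `assign`) and the joining node 10 are assumptions made for illustration; only the use of SHA-1 and the successor rule come from the slides.

```python
import bisect
import hashlib

M = 7                # bits per identifier, as in the 7-bit ID space of the figure
RING = 2 ** M


def chord_id(name):
    """Hash an arbitrary string (IP address, key name) to an m-bit identifier."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING


def successor(node_ids, ident):
    """First node whose ID is >= ident, wrapping around the circle."""
    ordered = sorted(node_ids)
    i = bisect.bisect_left(ordered, ident % RING)
    return ordered[i % len(ordered)]


def assign(node_ids, key_ids):
    """Map every key to the node responsible for it."""
    return {k: successor(node_ids, k) for k in key_ids}


nodes = [32, 90, 123]
keys = [5, 20, 60, 101]
before = assign(nodes, keys)
after = assign(nodes + [10], keys)        # hypothetical node 10 joins
moved = {k for k in keys if before[k] != after[k]}
```

Only the keys that now fall to the joining node change owner; every other assignment is untouched.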
35Efficient Key Search
- Naive search time: O(N), search each node individually
- Search through routing: O(log N)
- Let m be the number of bits in the key/node identifiers
- Each node, n, maintains a routing table with (at most) m entries, called the finger table
- The i-th entry in the table at node n contains the identity of the first node, s, that succeeds n by at least 2^(i-1) on the identifier circle
36Efficient Key Search Finger Table
- Every node knows m other nodes in the ring
- Increase distance exponentially
37Efficient Key Search Finger Table
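- Finger i points to the successor of n + 2^i
[Figure: identifier ring with nodes N16, N80, N96, N112, N120; the fingers of N80 target 80 + 2^0, 80 + 2^1, 80 + 2^2, 80 + 2^3, 80 + 2^4, 80 + 2^5, 80 + 2^6]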
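The finger table of N80 can be computed directly from the rule on this slide (finger i is the successor of n + 2^i, with i starting at 0). This sketch assumes the ring contains exactly the five nodes visible in the figure and a 7-bit identifier space; the function names are hypothetical.

```python
import bisect

M = 7                # identifier bits (assumption: same 7-bit space as earlier slides)
RING = 2 ** M


def successor(ordered_nodes, ident):
    """First node at or after `ident` on the circle (ordered_nodes must be sorted)."""
    i = bisect.bisect_left(ordered_nodes, ident % RING)
    return ordered_nodes[i % len(ordered_nodes)]


def finger_table(ordered_nodes, n):
    """Finger i (i = 0 .. M-1) is the successor of n + 2^i on the ring."""
    return [successor(ordered_nodes, n + 2 ** i) for i in range(M)]


ring_nodes = [16, 80, 96, 112, 120]
fingers_80 = finger_table(ring_nodes, 80)
```

The first five fingers all land on N96, while the last two reach N112 and, after wrapping around the circle, N16. The table holds many nearby targets and a few far ones.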
38One Lookup Example
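[Figure: Chord ring with nodes N5, N10, N20, N32, N60, N80, N99, N110 and key K19; Lookup(K19) is forwarded along fingers annotated 32 + 2^6 = 96, 99 + 2^5 = 131 (131 - 128 = 3), and 5 + 2^2 = 9]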
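The lookup in this example can be replayed with a simplified Chord routing loop: each node forwards the query to the closest finger that precedes the key, until the key falls between a node and its successor. Starting the lookup at N32 is an assumption inferred from the finger annotations in the figure; the helper names are hypothetical and the sketch ignores failures and stabilization.

```python
import bisect

M = 7
RING = 2 ** M


def successor(ordered_nodes, ident):
    i = bisect.bisect_left(ordered_nodes, ident % RING)
    return ordered_nodes[i % len(ordered_nodes)]


def finger_table(ordered_nodes, n):
    return [successor(ordered_nodes, n + 2 ** i) for i in range(M)]


def in_open(x, a, b):
    """True if x lies strictly between a and b going clockwise on the ring."""
    return a < x < b if a < b else (x > a or x < b)


def in_half_open(x, a, b):
    """True if x lies in (a, b] going clockwise on the ring."""
    return a < x <= b if a < b else (x > a or x <= b)


def lookup(ordered_nodes, start, key):
    """Return the list of nodes visited, ending at the node responsible for key."""
    path = [start]
    n = start
    while True:
        succ = successor(ordered_nodes, n + 1)
        if in_half_open(key, n, succ):
            path.append(succ)            # the successor holds the key
            return path
        nxt = succ                       # fallback: plain successor hop
        for finger in reversed(finger_table(ordered_nodes, n)):
            if in_open(finger, n, key):  # closest finger preceding the key
                nxt = finger
                break
        n = nxt
        path.append(n)


ring_nodes = [5, 10, 20, 32, 60, 80, 99, 110]
path = lookup(ring_nodes, 32, 19)
```

Each hop at least halves the remaining clockwise distance to the key, which is where the O(log N) bound on the search comes from.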
39Example Applications for Structured P2P Network
- eMule: Kademlia
- Content search: keywords are hashed and results are stored on the responsible peer
- Windows XP p2p SDK
- Peer localization
- OverCite
- A distributed version of the online library CiteSeer
- Killer application still to be found!
40Structured P2P References
- Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan, "Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications", IEEE/ACM Transactions on Networking
- Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker, "A Scalable Content-Addressable Network", SIGCOMM 2001
- M. Castro, P. Druschel, A-M. Kermarrec and A. Rowstron, "One ring to rule them all: Service discovery and binding in structured peer-to-peer overlay networks", SIGOPS European Workshop, France, September 2002
41Discussion: Comparing Structured and Unstructured P2P Systems
42Part II Survey of popular P2P applications
43Survey of popular peer-to-peer applications -
Outline
- Part II
- P2P-File download
- BitTorrent
- P2P-Telephony
- Skype
44BitTorrent Measurement on SuprNova
[Figure: measurement panels for overall, videos, games and music]
45BitTorrent - Components
- In the initial version of BitTorrent, a torrent is composed of
- A single content
- The content is cut into pieces
- Pieces are cut into blocks, which are the transmission units between peers
- The protocol only accounts for transferred pieces: partially received pieces cannot be served by a peer
- A single central tracker
- The central tracker has
- the list of all peers participating (accessing or serving the file)
- the list of all pieces of the file, and their respective hash values
- One or more seeds
- Seeds have the entire file
- Many Leechers
- Leechers download the file
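To illustrate why a piece is the unit of accounting, the sketch below cuts a byte string into pieces and checks a received piece against its published hash. BitTorrent uses SHA-1 for piece hashes; the function names, the 256-byte piece size and the stand-in content are assumptions chosen to keep the example small.

```python
import hashlib


def split_pieces(content, piece_size):
    """Cut the content into fixed-size pieces (the last one may be shorter)."""
    return [content[i:i + piece_size] for i in range(0, len(content), piece_size)]


def piece_hashes(content, piece_size):
    """One SHA-1 digest per piece, published alongside the torrent."""
    return [hashlib.sha1(p).digest() for p in split_pieces(content, piece_size)]


def verify_piece(index, data, hashes):
    """A piece can be served to others only once it is complete and its hash matches."""
    return hashlib.sha1(data).digest() == hashes[index]


content = bytes(range(256)) * 4      # 1024 bytes of stand-in content
PIECE_SIZE = 256                     # illustrative; real torrents use much larger pieces
hashes = piece_hashes(content, PIECE_SIZE)
pieces = split_pieces(content, PIECE_SIZE)
```

A partially received piece cannot pass `verify_piece`, which matches the rule that only complete pieces are served.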
46BitTorrent Peer-set
- Peer-set
- The list of neighbors a peer is allowed to communicate with
- Peer-set construction
- Each peer (seed or leecher) contacts the tracker and gets a list of peers participating in the same session
- Typically 50 peers are chosen at random by the tracker for each peer
- The peer-set is augmented by peers connecting directly to you
- The peer-set size is limited to 80 peers
47BitTorrent - Algorithms
- Two components in the BitTorrent downloading algorithm
- Peer selection: determines from whom to download the piece
- Piece selection: determines which piece to download
48Tit for Tat
- Based on the English saying meaning "equivalent retaliation" ("tip for tap"), an agent using this strategy will respond in kind to a previous opponent's action.
- If the opponent previously was cooperative, the agent is cooperative. If not, the agent is not.
- This strategy is dependent on the following conditions that have allowed it to become the most prevalent strategy for the Prisoner's Dilemma
- 1. Unless provoked, the agent will always cooperate
- 2. If provoked, the agent will retaliate
- 3. The agent is quick to forgive
Taken from the wikipedia free encyclopedia -
www.wikipedia.org
49BitTorrent - Peer selection
- Choke Algorithm
- Choking is a temporary refusal to upload
- Each peer unchokes a fixed number of peers (default 4)
- 3 peers on a tit-for-tat basis
- 1 peer on an optimistic unchoke basis
50BitTorrent - Peer selection (contd)
- Tit-for-tat peer selection
- Select the 3 peers from which you downloaded the most and that are interested in your chunks
- Peer selection is done every 10 seconds, based on the download rates of the last 30 seconds.
51BitTorrent - Peer selection (contd)
- Optimistic unchoke peer selection
- Select one peer at random that is interested in your chunks, regardless of the current download rate from it
- Rotates every 30 seconds.
- Reason
- To discover currently unused connections that are better than the ones being used
- Corresponds to always cooperating on the first move in the prisoner's dilemma
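One round of the choke algorithm described on these slides can be sketched as follows: three slots go to the interested peers with the best download rates, and one optimistic slot goes to a random interested peer. The function `select_unchoked`, the peer names and the rates are hypothetical, and the 10-second and 30-second timers are left out.

```python
import random


def select_unchoked(download_rate, interested, rng, regular_slots=3):
    """Return (tit-for-tat peers, optimistic peer or None) for one choke round."""
    candidates = [p for p in download_rate if p in interested]
    # Tit-for-tat: the interested peers we downloaded from fastest.
    by_rate = sorted(candidates, key=lambda p: download_rate[p], reverse=True)
    regular = by_rate[:regular_slots]
    # Optimistic unchoke: one random interested peer, regardless of its rate.
    others = [p for p in candidates if p not in regular]
    optimistic = rng.choice(others) if others else None
    return regular, optimistic


# Download rates observed from each remote peer (arbitrary units).
rates = {"p1": 50.0, "p2": 10.0, "p3": 80.0, "p4": 0.0, "p5": 30.0, "p6": 95.0}
interested = {"p1", "p2", "p3", "p4", "p5"}      # p6 does not want our chunks
regular, optimistic = select_unchoked(rates, interested, random.Random(0))
```

Peer p6 has the best rate but is not interested, so it takes no slot. Peer p4 has uploaded nothing yet and can only be reached through the optimistic slot.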
52BitTorrent - Peer selection (contd)
- Anti-Snubbing
- When a remote peer has uploaded no data in 60 s, the local peer assumes that it has been snubbed
- In that case the local peer refuses to upload to it, except through the optimistic unchoke
53Properties of tit-for-tat Fairness
characterization
- From the perspective of one local peer
- Created 6 sets of 5 remote peers each
- First set (in black) contains the 5 peers with the highest contribution
- Last set (in white) represents the 25 to 30 best contributors
[Figure panels: small number of leechers in torrent; startup phase]
54BitTorrent
- Two components in the BitTorrent downloading algorithm
- Peer selection: determines from whom to download the piece
- Piece selection: determines which piece to download
55BitTorrent - Piece selection
- Random first piece
- Only applies if the leecher has downloaded fewer than 4 pieces (chunks)
- Choose the next piece to download at random
- Lets you quickly download your first pieces, so that you have pieces to reciprocate with in the choke algorithm
56BitTorrent - Piece selection (contd)
- Local rarest first policy
- Determine the pieces that are rarest among your peers and download those first
- Ensures that the most common pieces are left until the end of the download
- Rarest first also ensures that a large variety of pieces are downloaded from the seed
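The two piece selection policies can be combined in one small function: pick at random while fewer than 4 pieces are held, then switch to local rarest first. The function `next_piece`, the peer names and the piece sets are hypothetical; ties among equally rare pieces are broken at random, which is an assumption rather than something the slides specify.

```python
import random
from collections import Counter

RANDOM_FIRST_THRESHOLD = 4   # from the slide: random policy below 4 downloaded pieces


def next_piece(have, peer_pieces, rng):
    """have: set of piece indices we own; peer_pieces: {peer: set of piece indices}."""
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces - have)       # count copies of each missing piece
    if not counts:
        return None                        # nothing useful is available
    wanted = sorted(counts)
    if len(have) < RANDOM_FIRST_THRESHOLD:
        return rng.choice(wanted)          # random first piece
    rarest = min(counts.values())
    return rng.choice([p for p in wanted if counts[p] == rarest])  # local rarest first


peer_pieces = {"a": {0, 1, 2, 3, 4, 5}, "b": {0, 1, 2, 3, 4}, "c": {4, 6}}
rng = random.Random(1)
early = next_piece({0}, peer_pieces, rng)
steady = next_piece({0, 1, 2, 3}, peer_pieces, rng)
```

In the steady-state call, piece 4 is held by all three peers, so it is left for later while pieces 5 and 6 (one copy each) are the candidates.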
57Properties of local rarest first - Entropy
characterization
- From the perspective of one local peer
- a: the time the local peer is interested in the remote peer
- c: the time the remote peer is interested in the local peer
- b, d: the time the remote peer spent in the leecher's peer-set
[Figure panels: in startup phase; measurement not representative (only a small number of ratios were available)]
58BitTorrent - Summary
- Efficient file download thanks to simple incentive mechanisms
- Local rarest first
- High piece entropy
- Tit-for-tat
- Avoids free-riding
- Optimizes resource utilization
- Space for improvement?
- Steady state very stable and efficient
- Startup phase still unstable with some inefficiencies
- Is there an advantage to deploying BitTorrent on Set-Top-Boxes?
- Is BitTorrent adapted to mobile terminals/DTN networks? Possible usage of network coding?
59BitTorrent References
- Section inspired by
- "Rarest First and Choke Algorithms Are Enough", Arnaud Legout, G. Urvoy-Keller, P. Michiardi, IMC 2006.
- "The Bittorrent P2P File-sharing System: Measurements and Analysis", J.A. Pouwelse, P. Garbacki, D.H.J. Epema, H.J. Sips, IPTPS'05, February 2005.
- "Incentives Build Robustness in BitTorrent", Bram Cohen, First Workshop on Economics of Peer-to-peer Systems, June 2003.
60Survey of popular peer-to-peer applications -
Outline
- Part II
- P2P-File download
- BitTorrent
- P2P-Telephony
- Skype
61Skype Overlay
- Protocol not fully understood today
- Proprietary protocol
- Content and control messages are encrypted
- Protocol reuses concepts of the FastTrack overlay used by KaZaA
- Builds upon an unstructured overlay
- Combines
- distributed index servers
- a flat unstructured network between index servers
- Two tier hierarchy
- Super Nodes (SN)
- Ordinary Nodes (ON)
62Skype Overlay (contd)
- Super Nodes (SN)
- Connect to each other, building a flat unstructured overlay (similar to the Gnutella overlay)
- Ordinary Nodes (ON)
- Connect to Super Nodes that act as directory servers (similar to the index server in Napster)
- Skype login server
- Only central component
- Stores and verifies usernames and passwords
- Stores the buddy list
63Skype Overlay (contd)
[Figure: Skype overlay with the Skype login server, Super Nodes (SN) and Ordinary Nodes (ON); legend: message exchange during login for authentication, neighbor relationship]
64How is the overlay constructed? - Super Node Lists
- Each node keeps a host cache with a list of Super Node IP addresses
- Up to 200 entries
- Some Super Node IP addresses are hard-coded
- Super Nodes provided by Skype
- These lists are used to locate a node's Super Node at login
65How is the overlay constructed? - Login
- Contact login server and authenticate
- Advertise your presence to other peers
- Contact a Super Node
- Contact your buddies (through Super Node), and
notify your presence
66Super Nodes Index servers
- Super Nodes are index servers
- I.e. an index of locally connected Skype users (and their IP addresses)
- If a buddy is not found in the local index of a Super Node
- Spread the node search to neighboring Super Nodes
- Not clear how this is implemented
- Possibly floods the request, similar to Gnutella
67Super Nodes Relay nodes
- Super nodes also act as relay nodes
- Enables NAT traversals
- Avoid congested or faulty paths
68Super Nodes Relay nodes
- Alice would like to call Bob (or vice versa)
Alice
Bob
69Super Nodes Relay nodes
- Alice would like to call Bob (or vice versa)
[Figure: Alice and Bob with a Skype relay node between them; arrows labeled "Contact Relay Node" and "Call"]
70Super Node election
- When does an ordinary node become a super node?
- High bandwidth, public IP address, but details not clear
- Highly dynamic
- Super Node churn, short Super Node session time
[Figure panels: churn; session time]
71Super Node election
- A world map of Skype Super Nodes
72Skype - Summary
- VoIP has other requirements than file download
- Delay
- Jitter
- The Skype network seems to handle these constraints in spite of
- High node churn
- Protocol not fully understood
73Skype References
- Section inspired by
- "An Analysis of the Skype Peer-to-Peer Internet Telephony Protocol", S.A. Baset and H.G. Schulzrinne, Infocom 2006, April 2006.
- "An Experimental Study of the Skype Peer-to-Peer VoIP System", Saikat Guha, Neil Daswani, Ravi Jain, IPTPS'06, February 2006.
- "Characterizing and Detecting Skype-Relayed Traffic", K. Suh, D. R. Figueiredo, J. Kurose, D. Towsley, Infocom 2006, April 2006.
74Part III P2P Video-on-Demand
75Project Push-to-peer
- Goal
- Provide a Video-on-Demand service to Internet
gateways and Set-Top-Boxes
76Push-to-peer The architecture
[Figure: architecture with a control server, a video server, a DSLAM and the Internet gateways attached to it]
77Push-to-peer The PUSH-phase
- Push videos to Internet gateways
- No gateway has the full content
- Missing video chunks are available on the other
gateways under the same DSLAM
[Figure: control server, video content server, DSLAM and Internet gateways during the push phase]
78Push-to-peer The PULL-phase
- Watch a movie
- Pull missing content from neighboring gateways
[Figure: control server, video content server, DSLAM and Internet gateways during the pull phase]
79Why Push-to-peer?
- No ISP bandwidth consumption beyond the DSLAM
- Retains the advantages of a content server-based solution
- Under full control of the ISP
- Guaranteed content safety
- Short playback delays
- But at a lower cost
- More robust (a content server is a single point of failure)
- No need to provision content server uplink bandwidth
- Uses the Internet gateways' storage
80Push-to-peer Quick technical overview
- Assumptions
- A centralized control server is available
- Needed anyhow for billing
- Coordinates all gateways
- Knows where each content is located
- The video server is not used at all in pull phase
- It is owned by the content owner
- We don't have any guarantee with respect to the performance of that server
- Support of trick mode
81Push-to-peer Research challenge
- Ensure efficient resource pooling
- System equivalent to a single content server with
- Storage = sum of individual storage spaces
- Bandwidth = sum of individual uplink bandwidths
82Push-to-peer Content placement strategies
- Candidate strategies
- Coding
- Simplified peer selection
- Increased robustness
- Usage of Windows
- Support of trick mode
- Reduced startup delay
- Prefix
- Reduced startup delay
- Trade storage cost with upload bandwidth cost
83Content placement strategies data format
- Several formats considered
- Proposed solution
- Full striping to achieve maximum resource pooling
- Prefixes to reduce startup latency
- Encoding to increase flexibility (can recover content from any sufficiently large set of peers)
[Figure labels: file prefix, data window, window prefix, encoded data]
84Push-to-peer Load Balancing
- Pull data from the least loaded boxes
- Provides load balancing
- Load information
- Centralized: available at the control server
- can achieve perfect resource pooling
- or
- Decentralized: obtained by probing
- reduces server load
- handles fluctuating bandwidth
- resource pooling is not necessarily perfect
RHGs
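The centralized variant of this load balancing rule can be sketched as follows: with full load information, a requester pulls from the least loaded boxes that hold the data it needs. The function `pick_sources`, the box names and the load counts are hypothetical; in the decentralized variant the `load` values would come from probing and could be stale.

```python
import heapq


def pick_sources(load, holders, count):
    """Choose the `count` least loaded boxes among those holding the wanted data."""
    # Ties are broken by box name so the choice is deterministic.
    return heapq.nsmallest(count, holders, key=lambda box: (load[box], box))


# Number of streams each box is currently serving.
load = {"box1": 3, "box2": 0, "box3": 5, "box4": 1, "box5": 1}
holders = ["box1", "box3", "box4", "box5"]     # box2 does not store this data
sources = pick_sources(load, holders, 2)
for box in sources:
    load[box] += 1                             # each chosen box now serves one more stream
```

Updating `load` after each choice is what spreads later requests over other boxes.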
85Dimensioning analysis
- Models to predict startup delays for varying
- Rate of movie requests
- Memory
86Push-to-peer - Summary
- Benefit from the Internet gateways' properties
- Always-on
- Closed environment
- Storage (and possibly bandwidth) controlled by the ISP
- Communication between gateways takes into account network topology
- Achieving QoS guarantees is possible
- Applicability to other environments?
- Controlled environments
- In-Flight Entertainment
- On-the-road Entertainment (taxis, trains)
- Hotel rooms
- Ad hoc networks (phones, PDAs, ...)
87Part IV Conclusion and future of p2p
88P2P Attracting Attention from the Commercial World
- NBC Universal goes peer-to-peer worldmedia.com
- BitTorrent raised $8.75 million in venture capital
- Teamed with CacheLogic to work for BT
- Startups providing P2P live programs: PPLive, CoolStreaming
- BBC legal download platforms: iMP / Kontiki
- Allow users in the UK to download BBC TV and radio programs via a program guide for up to 7 days after broadcast
89P2P Attracting Attention from the Commercial World
- Microsoft is active
- Peer-to-Peer library
- Acquisition of Groove
- Avalanche
- RedCarpet
- P2P Windows update
- Google and Apple are not using P2P... Yet (?)
- they face mounting costs with video
- Google
- Google Video is online
- Bought YouTube
- Invested in Chinese p2p company Xunlei Network Technology
- Apple
- iTunes changed the world of music
- Will it change the world of video?
- iTV will be a digital media adapter with HDD
90Will P2P Go Beyond Desktop?
- Current device requirement
- CPU, memory, and disk space requirement
- Platforms supported
- Internet connection requirement
- Three categories of p2p application
- file downloading
- BitTorrent already on some Set-Top-Boxes and DSL routers
- Voice
- Skype mobile phones
- Video
- Not yet
91Will P2P Go Beyond Desktop? (Discussion)
- Mobile P2P?
- What benefits does p2p offer on mobile devices?
- ???
- What are potential issues?
- Power
- Connection speed
- ???
- P2P on set-top box?
- ???
- Other consumer electronic devices?
- ???
92Future of P2P - Ad-hoc P2P
- Opportunistically use all available technologies!
- Access knowledge and resources of devices you cross in the street
- Local P2P content search
- What is currently the best place to find a cab?
- What are the results of yesterday's soccer match?
GSM
93Future of P2P - Ad-hoc P2P (contd)
- Your requests or messages are stored and forwarded
- Enables p2p communication even if there is no direct path between two peers at a given moment in time
94Conclusions and Future of P2P
- More commercial P2P applications
- Battles between legal and illegal content sharing will continue
- More p2p used in commercial environments
- Reduce distribution cost and compete with illegal content
- Secure P2P
- Better performance
- More intelligent sharing
- More scalable
- Handle churn better
- Competing with other technology
- Supporting diversity: long-tail content
- YouTube
- Supporting community
- Relationship with ISPs
- Become ubiquitous application ??
95The End