Title: The BitTorrent content distribution system
1 The BitTorrent content distribution system
- Nikitas Liogkas
- CS219 P2P Systems
- University of California, Los Angeles
2 Motivation
- flash crowd (aka Slashdot) effect
- many clients, few servers
- Problem: servers cannot handle the load
- Solution: swarming
- clients download pieces of the file from each other
- has been proven to have good scaling and performance properties
3 Presentation outline
- Joining the system
- Encoding / metadata file
- Tracker protocol
- Peer wire protocol
- Piece selection
- Peer selection
- Client implementations
- Resources
4 Joining a torrent
[Figure labels: metadata file, join, peer list, data request]
- Peers are divided into:
- seeds: have the entire file
- leechers: still downloading
1. obtain the metadata file (out of band)
2. contact the tracker
3. obtain a peer list (contains seeds and leechers)
4. contact peers from that list for data
5 Exchanging data
[Figure: a peer announcing "I have!"]
- Verify pieces using hashes
- Download sub-pieces (blocks) in parallel
- Advertise received pieces to the entire peer list
- "interested": need pieces that a given peer has
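The two checks above (hash verification and the "interested" test) can be sketched as follows. This is a minimal illustration, not any client's actual code: the function names and the use of Python sets to represent piece ownership are assumptions made for this example.

```python
import hashlib


def verify_piece(data: bytes, expected_sha1: bytes) -> bool:
    """Return True if the downloaded piece matches its SHA-1 hash
    from the metadata file (hypothetical helper)."""
    return hashlib.sha1(data).digest() == expected_sha1


def is_interested(my_pieces: set, peer_pieces: set) -> bool:
    """We are 'interested' in a peer if it has at least one piece we lack.
    Pieces are modelled as sets of piece indices (an assumption)."""
    return bool(peer_pieces - my_pieces)
```

A piece that fails verification would be discarded and requested again; only verified pieces are advertised to other peers.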
6 Bencoding
- encoding format of all exchanged messages
- four types
- byte strings
- integers
- lists
- dictionaries (mapping keys to values)
- examples
- 4:spam represents the string spam
- i10e represents the integer 10
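The four types can be illustrated with a small encoder. This is a sketch under the usual bencoding rules (length-prefixed byte strings, `i...e` integers, `l...e` lists, `d...e` dictionaries with sorted keys); the function name `bencode` and the choice to accept Python `str` by encoding it as UTF-8 are assumptions of this example.

```python
def bencode(value):
    """Encode ints, byte strings, lists, and dicts as bencoded bytes."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        # Convenience for this sketch: treat text as UTF-8 bytes.
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and are emitted in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))
```

For instance, `bencode("spam")` yields `b"4:spam"` and `bencode(10)` yields `b"i10e"`, matching the two examples on the slide.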
7 Metadata file structure
- contains information necessary to contact the tracker and describes the files in the torrent
- announce URL of tracker
- file name
- file length
- piece length (typically 256 KB)
- SHA-1 hashes of pieces for verification
- and creation date, comment, creator, ...
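As a rough illustration of how the per-piece hashes relate to the piece length, the sketch below splits file contents into fixed-size pieces and hashes each one. The function name and the in-memory `bytes` input are assumptions for this example; a real tool would stream the file from disk.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # the typical piece length quoted above


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list:
    """Return one 20-byte SHA-1 digest per piece; the last piece may be short."""
    return [
        hashlib.sha1(data[offset:offset + piece_length]).digest()
        for offset in range(0, len(data), piece_length)
    ]
```

A file of two full pieces plus a few extra bytes therefore produces three hashes, the last one covering only the trailing bytes.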
8 Tracker protocol
- communicates with clients via HTTP/HTTPS
- client GET request
- info_hash: uniquely identifies the file
- peer_id: uniquely identifies the client
- client IP and port (typically 6881-6889)
- numwant: how many peers to return (defaults to 50)
- stats: bytes uploaded, downloaded, left
- tracker GET response
- interval: how often to contact the tracker
- list of peers, containing peer id, IP and port
- stats: complete, incomplete
- tracker-less mode: based on the Kademlia DHT
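The client GET request can be sketched as a URL with the parameters listed above. The function name, the tracker URL in the usage note, and the example peer id are hypothetical; the sketch also assumes the announce URL has no query string of its own.

```python
from urllib.parse import urlencode


def announce_url(tracker, info_hash, peer_id, port,
                 uploaded, downloaded, left, numwant=50):
    """Build a tracker announce URL (illustrative only).

    info_hash and peer_id are raw 20-byte strings; urlencode
    percent-escapes any bytes that are not URL-safe.
    """
    params = {
        "info_hash": info_hash,
        "peer_id": peer_id,
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
        "numwant": numwant,
    }
    # Assumption: the announce URL carries no existing "?query" part.
    return tracker + "?" + urlencode(params)
```

Calling `announce_url("http://tracker.example/announce", info_hash, peer_id, 6881, 0, 0, file_length)` would produce the first request a new leecher sends; the tracker's reply is a bencoded dictionary containing the interval and the peer list.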
10 Peer wire protocol
- implemented on top of TCP
- messages
- handshake (maybe with bitfield)
- keep-alive
- choke / unchoke
- interested / not interested
- have (advertisement of a newly-acquired piece)
- request / piece
- cancel (only used in endgame mode)
- port (used in tracker-less mode)
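A sketch of how these messages might be framed on the wire. The slide does not give the byte layout; the layout used here (a 4-byte big-endian length prefix, then a 1-byte message id, then the payload) and the numeric ids come from the public BitTorrent specification and should be checked against it. The helper names are made up for this example.

```python
import struct

# Message ids as listed in the public specification (not on the slide).
MESSAGE_IDS = {
    "choke": 0, "unchoke": 1, "interested": 2, "not interested": 3,
    "have": 4, "bitfield": 5, "request": 6, "piece": 7, "cancel": 8,
    "port": 9,
}


def encode_message(name, payload=b""):
    """Frame a peer wire message: <length prefix><message id><payload>."""
    if name == "keep-alive":
        return struct.pack(">I", 0)  # a bare zero length, no id
    return struct.pack(">IB", 1 + len(payload), MESSAGE_IDS[name]) + payload


def encode_have(piece_index):
    """Advertise a newly acquired piece."""
    return encode_message("have", struct.pack(">I", piece_index))


def encode_request(piece_index, begin, length):
    """Ask for one sub-piece (block) of a piece."""
    return encode_message("request", struct.pack(">III", piece_index, begin, length))
```

The handshake is a separate fixed-format message and is not covered by this framing.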
11 Piece selection
- when downloading starts: choose at random
- get complete pieces as quickly as possible
- obtain something to trade
- after we have 4 pieces: (local) rarest first
- achieves the fastest replication of rare pieces
- obtain something of value to trade
- get unique pieces from the seed
- endgame mode
- defense against the "last-block problem"
- send requests for missing sub-pieces to all peers in our peer list
- send cancel messages upon receipt of a sub-piece
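The random-first and local-rarest-first policies can be sketched as below. This is a simplified model, not a client's real picker: the function name, the set-based data structures, and the random tie-breaking among equally rare pieces are assumptions of this example, and endgame mode is left out.

```python
import random
from collections import Counter

RANDOM_FIRST_PIECES = 4  # threshold taken from the slide


def choose_piece(have, peer_pieces, rng=random):
    """Pick the next piece index to download, or None if nothing is useful.

    have: set of piece indices we already hold.
    peer_pieces: dict mapping each connected peer to the set of pieces it has.
    """
    # Count how many of our peers hold each piece we still need.
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces - have)
    if not availability:
        return None
    candidates = sorted(availability)
    if len(have) < RANDOM_FIRST_PIECES:
        return rng.choice(candidates)  # random first
    rarest = min(availability.values())
    # Local rarest first: "local" because only our own peers are counted.
    return rng.choice([p for p in candidates if availability[p] == rarest])
```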
12 Last-block problem
- at the end of the download, a peer may have trouble finding the missing pieces
- based on anecdotal evidence
- other proposals
- network coding [Gkantsidis et al., Infocom05]
- prefer to upload to peers with similar file completeness (unfair for the peers with most of the pieces) [Tian et al., Infocom06]
13 Last-block problem: a myth?
- is it a problem after all?
- figure from [Legout et al., INRIA-TR-2006], with permission
14 Peer selection - unchoking
- calculate data-receiving rates
- upload to (unchoke) the fastest
- rate calculation is performed periodically (a round typically occurs every 10 seconds)
- constant number of unchoking slots
- attempt to achieve Pareto efficiency
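One round of this rate-based selection might look like the sketch below. The function name and the default of 4 slots are assumptions for illustration; the slide only states that the number of slots is constant.

```python
def choose_unchoked(receive_rates, slots=4):
    """Return the peers to unchoke this round.

    receive_rates: dict mapping each interested peer to the rate at which
    we received data from it over the last measurement window.
    slots: number of regular unchoking slots (assumed value).
    """
    ranked = sorted(receive_rates, key=lambda peer: receive_rates[peer], reverse=True)
    return ranked[:slots]
```

Running this every round (every 10 seconds, per the slide) means a peer keeps its slot only while it remains among the fastest uploaders to us.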
15 Optimistic unchoking
- periodically select a peer at random and upload to it
- typically performed every 3 rounds (30 seconds)
- multi-purpose mechanism
- allow bootstrapping of new clients
- continuously look for the fastest partners
- keep the network connected: every peer has a non-zero chance of interacting with any other peer
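The rotation of the optimistic slot can be sketched as follows. The function name, the round-counter convention, and the uniform random choice among currently choked peers are assumptions of this example.

```python
import random

OPTIMISTIC_EVERY = 3  # rounds between rotations, as on the slide


def optimistic_unchoke(round_number, choked_peers, current, rng=random):
    """Return the peer holding the optimistic unchoke slot this round.

    choked_peers: set of peers that are currently choked.
    current: the peer holding the slot so far, or None.
    """
    rotate = current is None or round_number % OPTIMISTIC_EVERY == 0
    if not rotate or not choked_peers:
        return current
    # Uniform random pick: every choked peer has a non-zero chance.
    return rng.choice(sorted(choked_peers))
```

Because the choice ignores upload rates, a newly joined peer with nothing to trade can still receive its first pieces this way.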
16 Seed unchoking
- old algorithm
- unchoke the fastest downloaders
- problem: fastest peers may monopolize seeds
- new algorithm
- periodically sort all leechers according to their last unchoke time
- prefer the most recently unchoked leechers; on a tie, prefer the fastest
- (presumably) achieves equal spread of seed bandwidth
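The ordering used by the new seed algorithm, as described above, can be sketched as a two-key sort. The tuple layout and function name are assumptions for this example, and the sketch covers only the ordering step, not how many leechers are unchoked or how slots rotate over time.

```python
def seed_unchoke_order(leechers):
    """Order leechers for a seed's unchoke decision.

    leechers: list of (peer, last_unchoke_time, download_rate) tuples.
    Most recently unchoked first; ties broken by the higher download rate.
    """
    ranked = sorted(leechers, key=lambda entry: (-entry[1], -entry[2]))
    return [peer for peer, _, _ in ranked]
```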
17 Downloading only from seeds
[Figure labels: new list request, peer list]
- Repeatedly query the tracker for peer lists
- Distinguish the seeds, and receive data from them
- Violates the fairness model: may be harmful to honest peers
18 Evaluation in private torrents
[Figure: Download rates for all peers]
- Limit bandwidth of leechers 1 to 6, no limit on seed.
- Modest fairness violation (22% better rate) when selfish peer is fast
- Robustness does not suffer: most honest slower by
19 Evaluation with modified seed
[Figure: Download rates for all peers]
- Seed only unchokes one leecher at a time
- Considerable fairness violation: selfish peer faster by 155%
- Robustness suffers: honest peers slower by at least 32%
20 Rate- vs. volume-based selection
- Proponents of rate-based decision metrics: [Cohen, P2PECON03] and [INRIA TR2006]
- Proponents of volume-based metrics: [Bharambe et al., MSR-TR-2005], [Gkantsidis et al., Infocom05], [Jun et al., P2PECON05], and the eDonkey file-sharing system
- No clear winner yet!
21 Client implementations
- mainline: written in Python; right now, the only one employing the new seed unchoking algorithm
- Azureus: the most popular, written in Java; implements a special protocol between clients (e.g. peers can exchange peer lists)
- other popular clients: ABC, BitComet, BitLord, BitTornado, µTorrent, Opera browser
- various non-standard extensions
- retaliation mode: detect compromised/malicious peers
- anti-snubbing: ignore a peer who ignores us
- super seeding: masqueraded seed
22 Resources 1
- Basic BitTorrent mechanisms: [Cohen, P2PECON03]
- BitTorrent specification Wiki: http://wiki.theory.org/BitTorrentSpecification
- Measurement studies: [Izal et al., PAM04], [Pouwelse et al., Delft TR 2004 and IPTPS05], [Guo et al., IMC05], and [Legout et al., INRIA-TR-2006]
23 Resources 2
- Theoretical analysis and modeling: [Qiu et al., SIGCOMM04] and [Tian et al., Infocom06]
- Simulations: [Bharambe et al., MSR-TR-2005]
- Incentives and exploiting them: [Shneidman et al., PINS04], [Jun et al., P2PECON05], and [Liogkas et al., IPTPS06]
24 Conclusion and food for thought
- BitTorrent is fast and robust
- Yet, many parameters are arbitrarily set
- number of unchoking slots
- round duration
- size of pieces/sub-pieces
- What can we learn from BitTorrent for the design
of future P2P content distribution protocols?