Towards Efficient Distribution of High-volume Content - PowerPoint PPT Presentation

Provided by: EEC85
1
Towards Efficient Distribution of High-volume Content
  • Mukund Seshadri
  • (mukunds_at_cs.berkeley.edu)
  • Ph.D. Student, U.C. Berkeley
  • Thesis Advisor: Prof. Randy Katz

2
Introduction
  • Increasingly high volumes of content are carried
    on the Internet.
  • Major P2P networks in 2004 had 10 million users
    online simultaneously, sharing over 10,000,000 GB
    worth of data [2].
  • Typical usage:
  • Early 90s: Web pages
  • Y2K: Music (3 MB files)
  • Now: Movies, short videos, live broadcasts,
    software updates
  • Issue:
  • Higher volumes and larger client sets can stress
    the bandwidth capacity of traditional
    distribution methods.

[Figure: MLBtv screenshot: watch baseball online]
[2] CacheLogic Press Release, http://www.cachelogic.com/home/pages/news/pr040715.php
3
Introduction
  • Current approaches to distributing content:
  • 1/few Server(s) -> Many Clients
  • e.g. YouTube
  • Distributed Set of Caches
  • e.g. Akamai
  • Multicast Trees (Overlay or IP)
  • e.g. End-System Multicast [1]
  • P2P Networks
  • e.g. Gnutella, BitTorrent
  • Emphasis so far:
  • Uncoordinated distribution of files to
    individuals
  • Our focus:
  • Coordinated distribution of large files to large,
    well-understood communities.

[1] Chu et al., A Case for End-System Multicast, SIGMETRICS '00.
[2] Jannotti et al., Overcast, OSDI '00.
4
Our Motivating Scenario
1 Server
File: 100 MB
1000 clients, low churn rate, e.g., UCB students'
home PCs, or registered WinXP users (SP2 was 260 MB!)
Environment: server-client and client-client
bandwidth is the bottleneck
  • Target problem: minimize the time by which all
    clients have received the file,
  • i.e., the Completion Time.

5
Our Goals
  • Obtain the best possible algorithm for the
    target problem.
  • Establish performance of current known
    approaches.
  • Consider several different environments
  • Cooperative vs. non-cooperative clients
  • Homogeneous vs. heterogeneous
    clients/bandwidths
  • Different types of content/application: bulk
    downloads vs. streaming video

6
Talk Roadmap
  • Introduction
  • Background
  • Research Outline
  • Research Details - Analysis of the Cooperative
    Homogeneous Scenario
  • Research Details - Algorithms for Heterogeneous
    Scenarios
  • Research Snapshot: Other Scenarios
  • Future Work

7
Background
  • Types of Distribution Methods
  • 1/few Server(s) -> Many Clients
  • e.g. YouTube
  • Issue: need to provision source bandwidth
    proportional to the number of clients
  • Clients' upload capacity is completely unutilized
  • Distributed Set of Caches
  • e.g. Akamai
  • Issue: distributing content to the caches/edges
  • Multicast Tree-based methods
  • P2P distribution

8
Multicast Trees - Background
  • d-ary tree multicast [1,2]
  • Operation: the file (in parts) is sent by each
    internal tree node to its d children, which
    propagate it similarly.
  • Target: client reception rate, in-order delivery,
    low delay
  • Inefficiency: leaf nodes' bandwidths are unutilized
  • Network-layer vs. overlay implementation
  • e.g. ESM [1], Overcast [2]

[1] Chu et al., A Case for End-System Multicast, SIGMETRICS '00.
[2] Jannotti et al., Overcast, OSDI '00.
9
P2P methods
  • Napster, Gnutella, FastTrack
  • Distributed, scalable search
  • Download files from a few end-hosts.
  • Large-file, single-source download is not optimized
  • Splitstream and parallel trees [3,4]
  • Pro: utilizes leaf nodes' upload capacities
  • Con: growth of useful upload capacity is
    sub-optimal
  • Target: load balance, fairness
  • BitTorrent (bittorrent.com)
  • Useful for distributing large files to many
    people with low server bandwidths
  • Target: per-client download time, incentivizing
    cooperation

[3] Karger et al., Consistent Hashing and Random Trees, STOC '97.
[4] Castro et al., Splitstream, SOSP '03.
10
BitTorrent - Background
  • Tracker enables client rendezvous
  • Clients in a random overlay graph
  • Utilizes clients' upload capacity
  • Tit-for-tat: prioritize transmissions to
    neighbors by incoming bandwidth from them
  • Targets individual performance, incentives, and
    dynamicity, but performance/optimality in each
    dimension is not clear.
  • Completion Time has not been adequately
    researched

11
Background Summary and Issues
  • Summary of methods:
  • d-ary tree multicast [1,2]
  • Target: client reception rate, in-order delivery
  • Parallel trees [3,4]
  • Target: load balance, fairness
  • BitTorrent (bittorrent.com)
  • Target: per-client download time, incentivizing
    cooperation
  • Different methods -> different goals; none is
    targeted at reducing completion time.
  • Completion time of these methods not yet
    adequately understood.
  • What is the best, or optimal algorithm?

[1] Chu et al., A Case for End-System Multicast, SIGMETRICS '00.
[2] Jannotti et al., Overcast, OSDI '00.
[3] Karger et al., Consistent Hashing and Random Trees, STOC '97.
[4] Castro et al., Splitstream, SOSP '03.
12
Talk Roadmap
  • Introduction
  • Background
  • Research Outline
  • Research Details - Analysis of the Cooperative
    Homogeneous Scenario
  • Research Details - Algorithms for Heterogeneous
    Scenarios
  • Research Snapshot: Other Scenarios
  • Future Work

13
Research Outline
  • Approach:
  • Simple initial scenario: cooperative,
    homogeneous, static client set, bulk content.
  • Given some file size and client-set size,
  • find the best possible (optimal) algorithm
    that minimizes completion time from 1
    server.
  • Use theory if possible; else simulate.
  • Compare to other known methods, like BitTorrent.
  • Remove simplifying assumptions one by one.
  • Assume static clients throughout.

14
Contributions
  • Cooperative Homogeneous (Bulk Content) Scenario
  • Provably optimal algorithm
  • Investigated BitTorrent's completion time
  • Simulations
  • Heterogeneous scenario
  • Randomized heuristic-based algorithm
  • Non-cooperative Scenario
  • Credit-limited Barter scheme analysis and
    simulation
  • App-specific content delivery/ordering
  • Priority-based distribution heuristic.

15
Talk Roadmap
  • Introduction
  • Background
  • Research Outline
  • Research Details - Analysis of the Cooperative
    Homogeneous Scenario
  • Research Details - Algorithms for Heterogeneous
    Scenarios
  • Research Snapshot: Other Scenarios
  • Future Work

16
Cooperative Distribution - Model
Block size B: quantum of data transmission (a block
cannot be transmitted before it is fully received)
File F: k blocks B1, B2, ..., Bk
  • T(k,n): time taken for all clients to receive
    all blocks.
  • Time unit: Tick = B/U, the time for a node to
    upload one block.

Goal: find the lowest possible value of T(k,n) and
the algorithm that achieves this value.
17
Lower bound
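e.g., 1 block, 7 nodes: a binomial tree is optimal
[Figure: binomial tree rooted at Server S; block Bj reaches clients C1-C7, starting at Tick 1]
  • Observations
  • k blocks take at least k ticks to leave the
    server.
  • The last block takes at least another
    $\log_2 n - 1$ ticks to reach all nodes.

Lower bound: $T(k,n) \ge k + \log_2 n - 1$ (ticks)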
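The counting argument above can be checked with a short Python sketch. It assumes, as in the 8-node hypercube example (S-000 to C-111), that n counts every node including the server, and it rounds $\log_2 n$ up when n is not a power of two; both are our assumptions, and the function names are hypothetical.

```python
def ceil_log2(n: int) -> int:
    """ceil(log2 n) for n >= 1, using exact integer arithmetic."""
    return (n - 1).bit_length()


def lower_bound_ticks(k: int, n: int) -> int:
    """Lower bound on T(k, n) in ticks (1 tick = B/U).

    Assumption: n counts all nodes, server included.
    The server emits at most one block per tick, so the last of the k
    blocks leaves it no earlier than tick k; from then on the number of
    nodes holding that block can at most double per tick.
    """
    if k < 1 or n < 2:
        raise ValueError("need k >= 1 and n >= 2")
    return k + ceil_log2(n) - 1


def single_block_ticks(n: int) -> int:
    """Ticks for one block to reach n nodes along a binomial tree:
    every node holding the block forwards it to a new node each tick."""
    holders, ticks = 1, 0  # initially only the server holds the block
    while holders < n:
        holders = min(2 * holders, n)
        ticks += 1
    return ticks


print(single_block_ticks(8), lower_bound_ticks(1, 8))  # 3 3
```

For a single block the binomial tree meets the bound exactly, which is why it is optimal in the 7-client example.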
18
Hypercube Algorithm
  • Rule: round-robin transmission to neighbours
[Figure: ticks 1-3 on a 3-dimensional hypercube; Server S-000 holds B1, B2, B3 and clients C-001 to C-111 receive blocks]
19
Hypercube Algorithm
  • Rule: transmit the highest-numbered block
[Figure: tick 4 on the same hypercube (Server S-000, clients C-001 to C-111)]
20
Hypercube Algorithm
  • Completes in optimal time!
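The two rules on the preceding slides (round-robin over neighbours, transmit the highest-numbered block) can be simulated directly. This Python sketch is our reading of them, not the authors' code: n = 2^L nodes with node 0 as the server; at tick t every node is paired with its neighbour across dimension (t - 1) mod L; the server releases block t at tick t; and each node sends its partner the highest-numbered block it holds that the partner does not yet have. The function name is hypothetical.

```python
def hypercube_completion_ticks(k: int, L: int, max_ticks: int = 100_000) -> int:
    """Simulate the hypercube algorithm on n = 2**L nodes (node 0 = server).

    Assumed rules: at tick t each node exchanges with its neighbour across
    dimension (t - 1) mod L; the server releases block t at tick t (and keeps
    serving after tick k); every node sends its partner the highest-numbered
    block it holds that the partner lacks. Returns the tick at which every
    client holds all k blocks.
    """
    if k < 1 or L < 1:
        raise ValueError("need k >= 1 and L >= 1")
    n = 1 << L
    have = [set() for _ in range(n)]
    for t in range(1, max_ticks + 1):
        have[0] = set(range(1, min(t, k) + 1))  # blocks the server has released
        dim = (t - 1) % L                       # round-robin over dimensions
        sends = []
        for node in range(n):
            partner = node ^ (1 << dim)
            missing = have[node] - have[partner]
            if partner != 0 and missing:
                sends.append((partner, max(missing)))
        for partner, block in sends:  # simultaneous exchange within the tick
            have[partner].add(block)
        if all(len(have[c]) == k for c in range(1, n)):
            return t
    raise RuntimeError("did not complete within max_ticks")


for k, L in [(3, 2), (3, 3), (4, 3)]:
    print(k, L, hypercube_completion_ticks(k, L), k + L - 1)
```

Each line prints the simulated tick count next to $k + L - 1 = k + \log_2 n - 1$; for these small cases the two agree, matching the optimality claim.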

21
Arbitrary n
  • Use a hypercube of logical nodes
  • Logical node can have 1 or 2 physical nodes
  • Dimension of hypercube: $L = \lfloor \log_2 n \rfloor$
  • At most one block mismatch within a logical node
  • This finishes in $k + \log_2 n - 1$ ticks

Our optimal algorithm design is complete
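One simple way to form the logical nodes is sketched below in Python. The assignment (leftover physical nodes attached, in order, to the first logical nodes) is a hypothetical choice for illustration, not necessarily the one used in the thesis; it only shows that with $L = \lfloor \log_2 n \rfloor$ every logical node gets 1 or 2 physical nodes.

```python
def logical_node_groups(n: int) -> tuple[int, list[list[int]]]:
    """Map physical nodes 0..n-1 onto a 2**L-node logical hypercube."""
    if n < 1:
        raise ValueError("need at least one node")
    L = n.bit_length() - 1          # floor(log2 n)
    m = 1 << L                      # number of logical nodes
    groups = [[p] for p in range(m)]
    for extra in range(m, n):       # n - m < m, so each group gains at most one
        groups[extra - m].append(extra)
    return L, groups


L, groups = logical_node_groups(12)
print(L, [len(g) for g in groups])  # 3 [2, 2, 2, 2, 1, 1, 1, 1]
```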
22
Performance of Some Distribution Methods
  • Completion times T(k,n) for:
  • Server serves each client: $kn$
  • Linear pipeline: $k + n - 1$
  • Multicast tree of degree d: $d(k + \log_d n - 2)$
  • Splitstream with d parallel trees: $k + d \log_d n$

All of the above are sub-optimal; compare with
$k + \log_2 n - 1$ (ticks).
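Plugging numbers into these formulas makes the gap concrete. The Python below evaluates them as written on this slide (with the "+" signs that the transcript dropped restored, which is our reconstruction) for k = 100 blocks, n = 1024 and degree d = 4; the example values are our own.

```python
import math


def completion_ticks(k: int, n: int, d: int = 2) -> dict[str, float]:
    """Completion times (in ticks) of the methods listed on the slide."""
    log_d_n = math.log(n) / math.log(d)
    return {
        "server serves each client": k * n,
        "linear pipeline": k + n - 1,
        "multicast tree (degree d)": d * (k + log_d_n - 2),
        "Splitstream (d trees)": k + d * log_d_n,
        "lower bound": k + math.log2(n) - 1,
    }


for name, ticks in completion_ticks(k=100, n=1024, d=4).items():
    print(f"{name:>26}: {ticks:9.1f}")
```

With these values the tree needs 412 ticks and Splitstream 120, against a lower bound of 109.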
23
BitTorrent comparison
  • Asynchronous simulator modeling the client/client
    messages in the BitTorrent spec.
  • A small fixed number of neighbours is unchoked
  • Chosen in order of reverse data rate
    (tit-for-tat)
  • Decision revisited periodically (choke
    interval)
  • Ties broken by bandwidth to neighbour.
  • 1 neighbour unchoked optimistically
  • Stays unchoked for 3 choke intervals
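The unchoke policy that the simulator models can be sketched as follows. This is a simplified Python illustration under our own assumptions (per-neighbour dictionaries of measured incoming rate and outgoing bandwidth; hypothetical function and class names), not the BitTorrent reference client's code.

```python
import random


def regular_unchokes(rate_from: dict[str, float], bw_to: dict[str, float],
                     slots: int) -> list[str]:
    """Tit-for-tat: unchoke the `slots` neighbours sending to us fastest,
    breaking ties by our bandwidth to each neighbour."""
    ranked = sorted(rate_from, key=lambda nb: (-rate_from[nb], -bw_to[nb]))
    return ranked[:slots]


class OptimisticUnchoker:
    """Keeps one extra, randomly chosen neighbour unchoked for `hold` intervals."""

    def __init__(self, hold: int = 3, seed: int = 0):
        self.hold = hold
        self.rng = random.Random(seed)
        self.current = None
        self.remaining = 0

    def step(self, choked: list[str]):
        """Call once per choke interval with the currently choked neighbours."""
        if self.remaining == 0 or self.current not in choked:
            self.current = self.rng.choice(choked) if choked else None
            self.remaining = self.hold
        self.remaining -= 1
        return self.current


rates = {"a": 5.0, "b": 9.0, "c": 5.0, "d": 1.0}
bws = {"a": 2.0, "b": 1.0, "c": 7.0, "d": 3.0}
print(regular_unchokes(rates, bws, slots=2))  # ['b', 'c']
```

Neighbours a and c tie on incoming rate, so the tie goes to c, to which we have more bandwidth.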

24
BitTorrent Results - Snapshot
  • Assumed k blocks and n nodes (all arriving at
    time 0)
  • Varied k and n from 10 to 2000
  • Metric: completion time T (of all nodes)
  • Least-squares estimate:
    $T(k,n) \approx 2.2k + 47\log_2 n - 173$
  • With default parameters
  • This can be improved to
    $T(k,n) \approx 1.3k + 9.8\log_2 n - 9$
  • by tuning parameters: increasing the choke
    interval and decreasing the number of
    simultaneous uploads.

BitTorrent can be 2.2x worse than optimal (in
completion time). That factor can fall to 1.3x
by changing certain features (at the risk of
weakening the tit-for-tat scheme).
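To see how the fitted models compare with the lower bound $k + \log_2 n - 1$ for concrete sizes, the Python below simply evaluates them; the (k, n) pairs are arbitrary examples inside the simulated range, and the function names are hypothetical.

```python
import math


def optimal_ticks(k: int, n: int) -> float:
    return k + math.log2(n) - 1


def bt_default_fit(k: int, n: int) -> float:
    return 2.2 * k + 47 * math.log2(n) - 173


def bt_tuned_fit(k: int, n: int) -> float:
    return 1.3 * k + 9.8 * math.log2(n) - 9


for k, n in [(100, 1024), (2000, 1024)]:
    opt = optimal_ticks(k, n)
    print(f"k={k:5d} n={n}: default {bt_default_fit(k, n) / opt:.2f}x, "
          f"tuned {bt_tuned_fit(k, n) / opt:.2f}x")
```

This prints about 4.74x / 2.01x for k = 100 and 2.34x / 1.34x for k = 2000: because the log terms stay fixed, the ratios approach 2.2 and 1.3 as k grows.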
25
Talk Roadmap
  • Introduction
  • Background
  • Research Outline
  • Research Details - Analysis of the Cooperative
    Homogeneous Scenario
  • Research Details - Algorithms for Heterogeneous
    Scenarios
  • Research Snapshot: Other Scenarios
  • Future Work

26
Adapting to Heterogeneous Clients
  • Hypercube algorithm requires a synchronized
    communication pattern
  • Can extend to some simple, specific heterogeneous
    cases.
  • Not suited to the general heterogeneous scenario
  • Key operation: optimal mapping of nodes that need
    a block to nodes that have that block,
  • to ensure maximal utilization of client upload
    capacity
  • Can we do this mapping randomly?
  • Random overlay graph.
  • Neighbor selection: e.g., random
  • Block selection: e.g., rarest-first
  • Transmit 1 block
  • Notify neighbors of block reception

Repeat
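The loop above can be sketched as a small synchronous simulation in Python. The details are our assumptions, not the authors' simulator: the overlay is a ring plus random chords (the ring keeps it connected), each node uploads at most one block per tick, rarest-first counts copies among the sender's neighbours, and neighbours' block sets are read directly instead of being learned from notifications. Function names are hypothetical.

```python
import random


def random_overlay(n: int, extra_degree: int, rng: random.Random) -> dict[int, set[int]]:
    """Ring (keeps the graph connected) plus random chords; node 0 is the server."""
    nbrs = {v: {(v - 1) % n, (v + 1) % n} - {v} for v in range(n)}
    for v in range(n):
        for _ in range(extra_degree):
            u = rng.randrange(n)
            if u != v:
                nbrs[v].add(u)
                nbrs[u].add(v)
    return nbrs


def simulate_random_rarest_first(k: int, n: int, extra_degree: int = 3, seed: int = 0) -> int:
    """Synchronous ticks: every node uploads at most one block per tick."""
    rng = random.Random(seed)
    nbrs = random_overlay(n, extra_degree, rng)
    have = [set(range(k))] + [set() for _ in range(n - 1)]  # server holds all k blocks
    ticks = 0
    while any(len(have[v]) < k for v in range(1, n)):
        ticks += 1
        sends = []
        for v in range(n):
            # Neighbour selection: random among neighbours lacking a block v holds.
            needy = [u for u in nbrs[v] if have[v] - have[u]]
            if not needy:
                continue
            u = rng.choice(needy)
            # Block selection: rarest-first, counting copies among v's neighbours.
            block = min(have[v] - have[u],
                        key=lambda b: (sum(b in have[w] for w in nbrs[v]), b))
            sends.append((u, block))
        # Apply after all choices: blocks received now are forwardable next tick.
        for u, block in sends:
            have[u].add(block)
    return ticks


print(simulate_random_rarest_first(k=20, n=64, seed=1))
```

Any run must land between the lower bound $k + \lceil \log_2 n \rceil - 1$ and $k(n-1)$ ticks, since at least one new block reaches some client every tick.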
27
The Price of Randomization
  • Synchronous simulations of homogeneous clients
  • Metric: completion time T(k,n)
  • Constant B; T in ticks (B/U).
  • Overall range: k = 10-10000, n = 10-10000
  • Least-squares estimate:
    $T(k,n) \approx 1.01k + 4.4\log_2 n + 3.2$

The randomized algorithm is close to optimal when
$k \gg \log_2 n$. It reduces completion time by a
factor of 1.3-2.2 compared to BitTorrent (depending
on tuning).
28
Heterogeneous Conditions
  • Issues:
  • Client upload bandwidths can span a wide range.
  • Bandwidth can depend on the destination client
    too.
  • Nodes can consider neighbour bandwidths when
    selecting a neighbour to transmit to.
  • BitTorrent's selection policies are not targeted
    at a completely cooperative scenario.
  • Always selecting higher-bandwidth neighbours can
    lead to starvation of lower-bandwidth nodes
    (worse completion times)

[Figure: block B1 sent over a lower-bandwidth path]
29
Heterogeneous Case Heuristics
  • Proposal: HRand
  • Intuition:
  • Nodes have a queue of servable blocks
  • Make sure no queue becomes too short/empty.
  • Demand metric for node Y accounts for:
  • Blocks required by Y to serve its neighbours
  • The number of such neighbours
  • The bandwidth at which Y can serve those
    neighbours
  • Use a bandwidth threshold
  • The lowest-supply node is chosen to send a block to.
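The selection step can be sketched in Python with a deliberately simplified supply metric: the number of blocks a node holds that at least one of its neighbours still lacks (its queue of servable blocks). This is a hypothetical stand-in, not the HRand metric itself, which additionally weighs the number of neighbours served and the bandwidth to them; the function names are ours as well.

```python
def servable_supply(y, have, nbrs) -> int:
    """Hypothetical supply metric: how many blocks y holds that at least one
    of y's neighbours still lacks, i.e. the length of y's servable queue."""
    return sum(1 for b in have[y] if any(b not in have[z] for z in nbrs[y]))


def hrand_pick_receiver(x, have, nbrs, bw, threshold):
    """Choose whom x uploads to next: among neighbours that need a block from x
    and whose link bandwidth from x is at least `threshold`, take the one with
    the lowest supply (ties go to the higher-bandwidth link)."""
    eligible = [y for y in nbrs[x] if have[x] - have[y] and bw[x][y] >= threshold]
    if not eligible:
        return None
    return min(eligible, key=lambda y: (servable_supply(y, have, nbrs), -bw[x][y]))


# Tiny example: node 0 holds blocks {0, 1, 2}; node 1 already has two blocks to
# pass on to node 4, node 2 has nothing to serve, and link 0->3 is slow.
nbrs = {0: {1, 2, 3}, 1: {0, 4}, 2: {0, 5}, 3: {0}, 4: {1}, 5: {2}}
have = {0: {0, 1, 2}, 1: {0, 1}, 2: set(), 3: set(), 4: set(), 5: set()}
bw = {0: {1: 10, 2: 10, 3: 1}}
print(hrand_pick_receiver(0, have, nbrs, bw, threshold=5))  # 2
```

Node 2 is chosen because its servable queue is empty, so feeding it keeps its upload capacity from idling, which is the intuition stated on the slide.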

30
Simulation Methodology
  • Algorithms simulated:
  • RAND - random selection, no neighbour heuristic
  • HRAND - our heuristic; uses U(N)
  • GRAND - greedy neighbour selection; approximates
    BitTorrent, minus the choke mechanisms.
  • Client bandwidth distributions considered:
  • Homogeneous
  • 2-Level
  • Clustered

31
Results
  • Clustered model: clients are grouped into 10
    clusters; bandwidth within clusters is 10x the
    bandwidth across clusters
  • e.g., clusters can be topological or geographical.
  • GRAND actually performs worse than RAND in this
    scenario
  • HRAND outperforms GRAND by a factor of around
    1.6-2.1.
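The clustered and 2-level bandwidth models listed for the simulations can be generated along these lines. This Python sketch is ours: the round-robin cluster assignment, the base bandwidth of 1.0 and the random choice of fast clients are assumptions; only the 10 clusters, the 10x ratio and the 50% fast-client split come from the slides.

```python
import random


def clustered_bandwidth(n: int, clusters: int = 10, base: float = 1.0,
                        ratio: float = 10.0) -> list[list[float]]:
    """bw[u][v]: link bandwidth u -> v; intra-cluster links are `ratio` times faster."""
    cluster_of = [v % clusters for v in range(n)]  # hypothetical round-robin assignment
    return [[0.0 if u == v else (base * ratio if cluster_of[u] == cluster_of[v] else base)
             for v in range(n)] for u in range(n)]


def two_level_upload(n: int, fast_fraction: float = 0.5, base: float = 1.0,
                     ratio: float = 10.0, seed: int = 0) -> list[float]:
    """Per-client upload capacity: a random `fast_fraction` of clients get `ratio`x."""
    rng = random.Random(seed)
    fast = set(rng.sample(range(n), int(n * fast_fraction)))
    return [base * ratio if v in fast else base for v in range(n)]


bw = clustered_bandwidth(20)
print(bw[0][10], bw[0][1])  # 10.0 1.0
```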

32
Talk Roadmap
  • Introduction
  • Background
  • Research Outline
  • Research Details - Analysis of the Cooperative
    Homogeneous Scenario
  • Research Details - Algorithms for Heterogeneous
    Scenarios
  • Research Snapshot: Other Scenarios
  • Future Work

33
Non-Cooperative Clients
  • Proposal: an incentive scheme based on barter of
    data blocks
  • Credit-limited Barter
  • X uploads to Y only if the net number of blocks
    from X to Y is < S.
  • A degree limit is required to limit free blocks.
  • Advantages:
  • Strictly defined invariant relationship between
    peers
  • No timing parameter (which adversely affects
    performance in BitTorrent's case).
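The credit-limited barter rule can be captured by a per-pair ledger of net block flow. The Python class below is a hypothetical sketch of that invariant only (no networking, no block selection).

```python
from collections import defaultdict


class CreditLimitedBarter:
    """Tracks net block flow between peer pairs under a credit limit S."""

    def __init__(self, S: int):
        self.S = S
        self.net = defaultdict(int)  # net[(x, y)] = blocks x->y minus blocks y->x

    def may_upload(self, x, y) -> bool:
        return self.net[(x, y)] < self.S

    def record_upload(self, x, y) -> None:
        if not self.may_upload(x, y):
            raise ValueError(f"{x} has already given {y} {self.S} unreciprocated blocks")
        self.net[(x, y)] += 1
        self.net[(y, x)] -= 1


b = CreditLimitedBarter(S=2)
b.record_upload("X", "Y")
b.record_upload("X", "Y")
print(b.may_upload("X", "Y"))  # False: Y must reciprocate first
b.record_upload("Y", "X")
print(b.may_upload("X", "Y"))  # True
```

A fresh peer can therefore receive up to S blocks from each neighbour without giving anything back, which is why the number of free blocks grows with its degree and a degree limit is needed.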

34
Barter results (snapshot)
  • Approach:
  • Focus on completion time of the above algorithms
  • Analysis of specific cases; simulations for the
    general case.
  • Not in scope: analysis of the strength of the
    incentive scheme.

35
Customizing delivery
  • Different applications -> different requirements
    on data delivery
  • Download-in-order -> download the full movie,
    but start watching quickly.
  • Live video stream
  • Rewind and fast-forward semantics
  • Proposal: block priority-based distribution
  • Minimal change to the distribution algorithm

[Figure: layered design: App-specific Layer -> block priorities -> App-independent Distribution (HRAND/Hypercube)]
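One way the app-specific layer could express priorities is a sliding window over the playback position, with the distribution layer falling back to its usual block choice outside the window. The Python function below is a hypothetical sketch of that split (block indices double as playback order; the window size and rarity counts are inputs we assume), not the thesis's priority scheme.

```python
def pick_block(candidates, playback_pos, window, rarity):
    """Choose which block to send next.

    candidates: blocks the sender has and the receiver lacks.
    Blocks in [playback_pos, playback_pos + window) are urgent and are sent in
    playback order; otherwise fall back to rarest-first (lowest copy count).
    """
    if not candidates:
        return None
    urgent = [b for b in candidates if playback_pos <= b < playback_pos + window]
    if urgent:
        return min(urgent)
    return min(candidates, key=lambda b: (rarity[b], b))


print(pick_block({3, 7, 12}, playback_pos=5, window=4, rarity={3: 1, 7: 9, 12: 2}))  # 7
print(pick_block({3, 12}, playback_pos=5, window=4, rarity={3: 5, 12: 2}))  # 12
```

Only the block-selection step changes; neighbour selection and the rest of the distribution algorithm are untouched.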
36
Summary of Contributions
  • Hypercube Algorithm for optimal completion time
    in a homogeneous scenario.
  • For heterogeneous scenarios, we proposed a
    randomized heuristic-based algorithm (and
    evaluated it by simulations)
  • The above two algorithms are faster, simpler, and
    more general than related prior work: BitTorrent,
    Qiu et al. (SIGCOMM '04), Yang et al. (INFOCOM '04),
    Bar-Noy et al. (DAM '00), Splitstream
  • Established BitTorrent's completion time by
    simulations.
  • Adapted to the non-cooperative scenario, proposing
    fast barter-based schemes
  • Proposed a mechanism to enable
    application-specific customization of block
    ordering.

37
Future Work
  • Real-world experience
  • Implementation on PlanetLab
  • Impact of messaging overheads
  • Simulate real-world traces
  • Reliability and Dynamicity
  • Impact of network failure and node churn.
  • Explore distributed/replicated tracker-state
  • Consider resilient overlay routing to tracker
  • More Customized Applications
  • Emulate TIVO semantics.
  • Algorithms for cyclic barter
  • The hypercube satisfies cyclic barter, optimally.

38
Backup Slides follow
39
Optimal Algorithm/Proof
  • Binomial Pipeline ($n = 2^L$) [5]
  • Opening phase of L ticks
  • Nodes in L groups; group $G_i$ has $2^{L-i}$ nodes.
  • Middle phase:
  • Match and swap!
  • End: server keeps sending Bk

After tick k-1, Bk moves along a binomial tree
-> optimal!
[5] Yang et al., Service Capacity of Peer-to-Peer
Networks, INFOCOM '04; discusses a version of
this algorithm for n = power of 2.
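The group sizes account for every client. Reading n as the total node count, server included (our assumption, as in the S-000 to C-111 hypercube example), the geometric series gives:

```latex
\sum_{i=1}^{L} |G_i| \;=\; \sum_{i=1}^{L} 2^{L-i} \;=\; 2^{L-1} + \dots + 2 + 1 \;=\; 2^{L} - 1 \;=\; n - 1
```

so, together with the server, the L groups cover all $n = 2^L$ nodes. For L = 3 the groups have 4, 2 and 1 nodes.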
40
HRAND Results
  • 2-Level model: 50% of clients have 10x the
    bandwidth of the remaining clients
  • e.g., a Cable-Dialup mix or a Cable-CampusNetwork mix.
  • HRAND reduces the completion time by a factor of
    1.2-1.8 compared to GRAND and RAND.

41
BACKUP Barter Models (snapshot)
  • Strict Barter: lower bound $k + n/2$.
  • If download capacity > 2U, we have an algorithm
    with $T(k,n) = k + n - 1$.
  • High start-up cost -> high completion times
  • Relaxed Barter
  • X uploads to Y only if the net number of blocks
    from X to Y is < S.
  • But Y can get S × (degree) free blocks
  • So the server has to impose a degree limit
    (issuing tokens to allow peering)
  • Special-case analyses of relaxed barter indicate
    much lower completion times than strict barter
  • S = 2, n = power of 2: the Hypercube algorithm can
    be used.
  • S = 1: T(k,n) upper-bounded by $k + n - 2$.
  • Simulations for the general cases.

42
Barter Results (snapshot)
  • Random Block Selection requires high graph degree
  • Low (near-optimal) completion time can be
    achieved
  • Rarest-first block selection policy is necessary
    to maintain low degree.

43
Results (snapshot)
  • Evaluation Approach:
  • 2 candidate Upper Layers
  • IBULK: download-in-order
  • ISTREAM: fixed-rate CBR
  • Priority scheme: sliding window (HRAND)
  • Simulations
  • Comparison Algorithms:
  • Rarest-lowest block
  • GRW
  • Greedy Bandwidth-based Neighbor Selection
  • Rarest-First Block Selection
  • Priority Window
  • Metric: highest uninterrupted data rate, loss
    rate