Title: A Hybrid Architecture for Cost-Effective On-Demand Media Streaming
1- A Hybrid Architecture for Cost-Effective On-Demand Media Streaming
- Mohamed Hefeeda, Bharat Bhargava
- CS Dept, Purdue University
- Support: NSF, CERIAS
- FTDCS 2003
2Motivations
- Imagine a media server streaming movies to clients
- Target environments
- Distance learning, corporate streaming, ...
- Current approaches
- Unicast
- Centralized
- Proxy caches
- CDN (third-party)
- Multicast
- Network layer
- Application layer
- Proposed approach (Peer-to-Peer)
3Unicast Streaming Centralized
- Pros
- Easy to deploy, administer
- Cons
- Limited scalability
- Reliability concerns
- Load on the backbone
- High deployment cost ..
- Note
- A server with a T3 link (45 Mb/s) supports only up to 45 concurrent users at 1 Mb/s CBR!
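The 45-user figure is simply the link capacity divided by the per-client stream rate. A minimal sketch of that arithmetic follows; the function name is hypothetical, and it assumes every client receives its own constant-bit-rate unicast stream with no protocol overhead.

```python
def max_concurrent_streams(link_mbps: float, stream_mbps: float) -> int:
    """Number of constant-bit-rate unicast streams that fit on one link.

    Hypothetical helper: ignores protocol overhead and assumes the
    server's outbound link is the only bottleneck.
    """
    if link_mbps < 0 or stream_mbps <= 0:
        raise ValueError("link rate must be >= 0 and stream rate > 0")
    return int(link_mbps // stream_mbps)


# T3 link (45 Mb/s) serving 1 Mb/s CBR streams, as on the slide.
t3_clients = max_concurrent_streams(45, 1)
print(t3_clients)
```

Doubling the stream rate halves the number of clients the same link can serve, which is why a centralized server scales poorly.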
4Unicast streaming Proxy
- Pros
- Better performance
- Less load on backbone and server
- Prefix caching, selective caching, staging, ...
- Cons
- Proxy placement and management
- High deployment cost ..
5Unicast Streaming CDN
- Pros
- Good performance
- Suitable for web pages with moderate-size objects
- Cons
- Cost: CDN charges for every megabyte served!
- Not suitable for VoD service: movies are quite large (Gbytes)
- Note [Raczkowski02]
- Cost ranges from 0.25 to 2 cents/MByte
- For a one-hour stream to 1,000 clients, the content provider pays $264 to the CDN (at 0.5 cents/MByte)!
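The CDN bill is the delivered volume times the per-megabyte price. The slide does not state the stream rate, so the sketch below works backwards from the 264 figure (read here as dollars) to the per-client volume it implies. The function names are hypothetical, and the conversion to a bit rate assumes 1 MByte = 10^6 bytes.

```python
def cdn_cost_dollars(mbytes_per_client: float, clients: int,
                     cents_per_mbyte: float) -> float:
    """Total CDN charge in dollars for delivering one stream to every client."""
    return mbytes_per_client * clients * cents_per_mbyte / 100


def implied_mbytes_per_client(total_dollars: float, clients: int,
                              cents_per_mbyte: float) -> float:
    """Per-client volume (MBytes) implied by a total CDN charge."""
    return total_dollars * 100 / (clients * cents_per_mbyte)


# Figures from the slide: 264 total, 1,000 clients, 0.5 cents/MByte.
volume_mb = implied_mbytes_per_client(264, 1000, 0.5)

# Assumption: 1 MByte = 10^6 bytes, one hour = 3600 s.
implied_rate_kbps = volume_mb * 8 * 1000 / 3600

print(f"{volume_mb:.1f} MBytes per client, about {implied_rate_kbps:.0f} kb/s")
```

Under these assumptions the quoted cost corresponds to a fairly low-rate stream; a movie measured in gigabytes would cost proportionally more per viewer.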
6Multicast Streaming Network Layer
- Pros
- Efficient!
- Asynchronous client
- Patching, skyscraper, ...
- e.g., [Mahanti et al., ToN03]
- Cons
- Scalability of the routers
- Asynchronous → tune to multiple channels
- Require high inbound bandwidth (2R)
- Not widely deployed
[Figure: patch stream]
7Multicast Streaming Application Layer
- Pros
- Deployable
- Cons
- Assumes end systems can support an outbound bandwidth of several times the streaming rate
8P2P Approach Key Ideas
- Make use of underutilized peers' resources
- Make use of heterogeneity
- Multiple peers serve a requesting peer
- Network-aware peer organization
9P2P Approach (contd)
- Pros
- Cost effective
- Deployable (no new hardware, nor network support)
- No extra inbound bandwidth (R)
- Small load on supplying peers (< R)
- Less stress on backbone
- Challenges
- Achieve good quality
- Unreliable, limited-capacity peers
- Highly dynamic environment
10P2P Approach Entities
- Peer
- Level of cooperation (rate, bw, connections)
- Super peer
- Help in searching and dispersion
- Seed peer
- Initiate the streaming
- Stream
- Time-ordered sequence of packets
- Media file
- Divided into equal-size segments
- Super peers play special roles → hybrid system
11Hybrid System Issues
- Peers Organization
- Two-level peers clustering
- Join, leave, failure, overhead, super peer selection
- Operation
- Client protocol: manage multiple suppliers
- Switch suppliers
- Effect of switching on quality
- Cluster-Based Searching
- Find nearby suppliers
- Cluster-Based Dispersion
- Disseminate new media files
- Details are in the extended version of the paper
12Peers Organization Two-Level Clustering
- Based on BGP tables
- Oregon RouteViews
- Network cluster [KW 00]
- Peers sharing the same network prefix
- 128.10.0.0/16 (purdue), 128.2.0.0/16 (cmu)
- 128.10.3.60, 128.10.7.22, 128.2.10.1, 128.2.11.43
- AS-cluster
- All network clusters within an Autonomous System
[Figure: snapshot of a BGP routing table]
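Network clustering assigns each peer to the routing prefix that covers its address. The sketch below illustrates the idea with the two prefixes and four addresses from the slide, using Python's standard `ipaddress` module. The function names are hypothetical, the prefix list stands in for a real BGP table, and picking the longest matching prefix is an assumption borrowed from ordinary route lookup rather than something the slide states.

```python
import ipaddress
from collections import defaultdict

# Stand-in for prefixes extracted from a BGP table (values from the slide).
PREFIXES = [
    ipaddress.ip_network("128.10.0.0/16"),  # purdue
    ipaddress.ip_network("128.2.0.0/16"),   # cmu
]


def network_cluster(addr, prefixes):
    """Return the longest prefix covering addr, or None if nothing matches."""
    ip = ipaddress.ip_address(addr)
    matches = [p for p in prefixes if ip in p]
    return max(matches, key=lambda p: p.prefixlen) if matches else None


def group_peers(addrs, prefixes):
    """Group peer addresses by the network cluster they fall into."""
    clusters = defaultdict(list)
    for addr in addrs:
        clusters[network_cluster(addr, prefixes)].append(addr)
    return dict(clusters)


peers = ["128.10.3.60", "128.10.7.22", "128.2.10.1", "128.2.11.43"]
clusters = group_peers(peers, PREFIXES)
for prefix, members in clusters.items():
    print(prefix, members)
```

An AS-cluster would then be formed by merging all network clusters whose prefixes originate from the same Autonomous System, which requires the AS path information that this sketch leaves out.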
13Peers Organization Join
[Figure: bootstrap data structure]
14Client Protocol
- Building blocks of the protocol to be run by a requesting peer
- Three phases
- Availability check
- Search for all segments with the full rate
- Streaming
- Stream (and playback) segment by segment
- Caching
- Store enough copies of the media file in each cluster to serve all expected requests from that cluster
- Which segments to cache?
15Client Protocol (cont'd)
- Phase II: Streaming
- t_j = t_{j-1} + δ /* δ: time to stream a segment */
- For j = 1 to N do
- At time t_j, get segment s_j as follows:
- Connect to every peer P_x in P_j (in parallel) and
- Download from byte b_{x-1} to b_x - 1
Note: b_x = |s_j| R_x / R
Example: P1, P2, and P3 serve different pieces of the same segment to P4 at different rates
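In the streaming phase each supplier delivers a slice of the segment proportional to the rate it contributes, so all slices finish in the same time. The sketch below is one reading of that rule, not the authors' implementation: it treats |s_j| R_x / R as the size of supplier x's share and lays the shares out as consecutive byte ranges. The function name, the integer rates, and the choice to give rounding leftovers to the last supplier are all assumptions.

```python
def split_segment(segment_size: int, rates: list[int], full_rate: int):
    """Split one segment into consecutive (start, end) byte ranges.

    Each supplier's share is proportional to its rate; `end` is inclusive.
    Assumes the supplier rates add up exactly to the full streaming rate.
    """
    if segment_size <= 0 or not rates:
        raise ValueError("need a positive segment size and at least one supplier")
    if sum(rates) != full_rate:
        raise ValueError("supplier rates must add up to the full rate")

    ranges = []
    start = 0
    for i, rate in enumerate(rates):
        if i == len(rates) - 1:
            # Last supplier takes whatever is left, absorbing rounding.
            end = segment_size - 1
        else:
            end = start + segment_size * rate // full_rate - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Three suppliers contributing R/4, R/4, and R/2 of a 1,000,000-byte segment.
pieces = split_segment(1_000_000, [250, 250, 500], 1000)
print(pieces)
```

Because each piece is sized in proportion to its supplier's rate, the three parallel downloads complete at roughly the same moment, and the requesting peer receives the whole segment at the full rate R.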
16Dispersion (cont'd)
- Dispersion Algorithm (basic idea)
- /* Upon getting a request from Py to cache Ny segments */
- C ← getCluster(Py)
- Compute available (A) and required (D) capacities in cluster C
- If A < D
- Py caches Ny segments in a cluster-wide round-robin fashion (CWRR)
- All values are smoothed averages
- Average available capacity in C
- CWRR example (10-segment file)
- P1 caches 4 segments: 1, 2, 3, 4
- P2 then caches 7 segments: 5, 6, 7, 8, 9, 10, 1
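The cluster-wide round-robin assignment can be sketched as a single pointer per cluster that advances through the segment numbers and wraps around. This is a minimal illustration of the CWRR example only; the class and method names are hypothetical, and the capacity check (A < D) and the smoothed averages are left out.

```python
class ClusterRoundRobin:
    """Tracks, per cluster, which segment should be cached next."""

    def __init__(self, num_segments: int):
        if num_segments <= 0:
            raise ValueError("file must have at least one segment")
        self.num_segments = num_segments
        self.next_segment = 1  # segments are numbered from 1, as on the slide

    def assign(self, count: int) -> list[int]:
        """Return the next `count` segments, wrapping after the last one."""
        assigned = []
        for _ in range(count):
            assigned.append(self.next_segment)
            self.next_segment = self.next_segment % self.num_segments + 1
        return assigned


cluster = ClusterRoundRobin(num_segments=10)
p1_segments = cluster.assign(4)  # P1 caches 4 segments
p2_segments = cluster.assign(7)  # P2 then caches 7 segments
print(p1_segments, p2_segments)
```

Keeping the pointer per cluster rather than per peer spreads copies evenly over the whole file, so a cluster accumulates complete copies instead of many copies of the first few segments.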
17Evaluation Through Simulation
- Client parameters
- Effect of switching on quality
- System Parameters
- Overall system capacity,
- Average waiting time,
- Load/Role on the seeding peer
- Scenarios: different client arrival patterns (constant rate, Poisson, flash crowd) and different levels of peer cooperation
- Performance of the dispersion algorithm
- Compare against random dispersion algorithm
18Simulation Topology
- Large, Hierarchical (more than 13,000 nodes)
- Ex.: 20 transit domains, 200 stub domains, 2,100 routers, and a total of 11,052 end hosts
- Used GT-ITM and ns-2
19Client Effect of Switching
- Initial buffering is needed to hide supplier switching
- Tradeoff: small segment size → small buffering but more overhead (encoding, decoding, processing)
20System Capacity and Load on Seed Peer
[Figures: system capacity; load on seeding peer]
- The role of the seeding peer is diminishing
- For 50% caching: after 5 hours, we have 100 concurrent clients (6.7 times the original capacity) and none of them is served by the seeding peer
21System Under Flash Crowd Arrivals
- Flash crowd: sudden increase in client arrivals
22System Under Flash Crowd (cont'd)
[Figures: system capacity; load on seeding peer]
- The role of the seeding peer is still just seeding
- During the peak, we have 400 concurrent clients (26.7 times the original capacity) and none of them is served by the seeding server (50% caching)
23Dispersion Algorithm
5% caching
- Avg. number of hops
- 8.05 hops (random), 6.82 hops (ours) → 15.3% savings
- For a domain with a 6-hop diameter
- Random: 23% of the traffic was kept inside the domain
- Cluster-based: 44% of the traffic was kept inside the domain
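The savings figure follows from the two average hop counts as a relative reduction against the random baseline. A short check of that arithmetic, with a hypothetical helper name:

```python
def relative_savings_percent(baseline: float, improved: float) -> float:
    """Percentage reduction of `improved` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100


# Average hop counts from the slide: random vs. cluster-based dispersion.
hop_savings = relative_savings_percent(8.05, 6.82)
print(f"{hop_savings:.1f}% fewer hops on average")
```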
24Conclusions
- A hybrid model for on-demand media streaming
- P2P streaming; powerful peers do more work
- Peers with special roles (super peers)
- Cost-effective, deployable
- Supports a large number of clients, including flash crowds
- Two-level peer clustering
- Network-conscious peer clustering
- Helps in keeping the traffic local
- Cluster-based dispersion pushes contents closer to clients (within the same domain) →
- Reduces the number of hops traversed by the stream and the load on the backbone
26P2P Systems Basic Definitions
Background
- Peers cooperate to achieve desired functions
- Cooperate: share resources (CPU, storage, bandwidth), participate in the protocols (routing, replication, ...)
- Functions: file-sharing, distributed computing, communications, ...
- Examples
- Gnutella, Napster, Freenet, OceanStore, CFS, CoopNet, SpreadIt, SETI@home, ...
- Well, aren't they just distributed systems?
- P2P distributed systems?
27P2P vs. Distributed Systems
- P2P distributed systems
- Ad-hoc nature
- Peers are not servers [Saroiu et al., MMCN02]
- Limited capacity and reliability
- Much more dynamism
- Scalability is a more serious issue (millions of nodes)
- Peers are self-interested (selfish!) entities
- 70% of Gnutella users share nothing [Adar and Huberman 00]
- All kinds of security concerns
- Privacy, anonymity, malicious peers, you name it!
28P2P Systems Rough Classification [Lv et al., ICS02; Yang et al., ICDCS02]
- Structured (or tightly controlled, DHT)
- Files are rigidly assigned to specific nodes
- Efficient search and guarantee of finding
- Lack of partial name and keyword queries
- Ex.: Chord [Stoica et al., SIGCOMM01], CAN [Ratnasamy et al., SIGCOMM01], Pastry [Rowstron and Druschel, Middleware01]
- Unstructured (or loosely controlled)
- Files can be anywhere
- Support of partial name and keyword queries
- Inefficient search (some heuristics exist) and no guarantee of finding
- Ex.: Gnutella
- Hybrid (P2P + centralized, super-peer notion)
- Napster, KaZaA
29File-sharing vs. Streaming
- File-sharing
- Download the entire file first, then use it
- Small files (few Mbytes) → short download time
- A file is stored by one peer → one connection
- No timing constraints
- Streaming
- Consume (playback) as you download
- Large files (few Gbytes) → long download time
- A file is stored by multiple peers → several connections
- Timing is crucial
30Related Work Streaming Approaches
- Distributed caches, e.g., [Chen and Tobagi, ToN01]
- Deploy caches all over the place
- Yes, increases the scalability
- Shifts the bottleneck from the server to caches!
- But, it also multiplies cost
- What to cache? And where to put caches?
- Multicast
- Mainly for live media broadcast
- Application level: Narada, NICE, Scattercast, Zigzag, ...
- IP level, e.g., [Dutta and Schulzrinne, ICC01]
- Widely deployed?
31Related Work Streaming Approaches
- P2P approaches
- SpreadIt [Deshpande et al., Stanford TR01]
- Live media
- Builds application-level multicast distribution tree over peers
- CoopNet [Padmanabhan et al., NOSSDAV02 and IPTPS02]
- Live media
- Builds application-level multicast distribution tree over peers
- On-demand
- Server redirects clients to other peers
- Assumes a peer can (or is willing to) support the full rate
- CoopNet does not address the issue of quickly disseminating the media file