Selected Techniques in Content Distribution Networks

1
Selected Techniques in Content Distribution
Networks
  • Pei Cao
  • Cisco Systems, Inc.

2
Enterprise WAN Today
[Diagram labels: Internet; Data Center; T1; Regional Hubs; 56 Kbps, 128 Kbps, DSL; Branch Offices]
3
Why Enterprise CDN (ECDN)
  • Overcome bandwidth limitations for video
    applications to branches
  • Distribute very-large files to branches
  • Cache and police web contents
  • Consolidate data storage
  • ...

4
Components of ECDN
[Diagram labels: Data Center; Branches; Edge Content Engine (CE); WAN; Content Distribution Manager (CDM); IOS Router with WCCP; Web Servers]
5
Content Delivery
[Diagram labels: Internet or WAN; Internet or Intranet Web Server; CDM Agent; Content Distribution Module; Filtering Module; RealNetwork Server; RealNetwork Proxy; HTTP Proxy Server; Windows Media Proxy Server; MPEG Streaming Server]
6
Content Distribution
[Diagram labels: Data Center; Branches; Edge Content Engine (CE); WAN; Content Distribution Manager (CDM); Web Servers]
7
Challenges in Building CDNs
  • Network interoperability
  • System scalability
  • Content engine performance
  • System usability

8
Outline
  • Protocol highlight: Content-based WCCP
  • Algorithm highlight: TPUT, a scalable top-k
    algorithm
  • Kernel mechanism highlight: Stream Engine

9
Request Interception
  • Web Cache Communication Protocol (WCCP) on port 80

[Diagram labels: Internet or WAN; TCP SYN; TCP SYN_ACK; ACK; Cache Miss]
10
WCCP Bypass
[Diagram labels: Internet or WAN; TCP SYN]
11
Dealing with Client Transparency
[Diagram labels: Internet or WAN; TCP SYN; Cache Miss; GET HTTP/1.1; 404 Not Found; HTTP/1.0 200 OK <META HTTP-EQUIV="REFRESH" ...]
12
Content-Based Interception
  • Problem: how to intercept all HTTP traffic from
    client browsers?
  • Possible solutions:
  • Send all traffic through the content engine (CE)
  • Issue: per-packet latency and CE throughput
  • Send traffic to the CE, but have the CE tell the
    router which flows to bypass
  • Issue: high overhead for short flows

13
Algorithm Highlight
  • Scalable Top-k Algorithm

14
Top-k Queries in CDNs
  • Example queries
  • List top 10 URLs accessed most often among all
    CEs
  • List top 10 domains that consume the most storage
    among all CEs
  • etc.

15
Definitions
  • A network of m nodes, connected to a central
    manager (CM)
  • Each node i has a list of pairs $(x, V_i(x))$,
    sorted by value in descending order
  • An object's sum:
    $V(x) = V_1(x) + V_2(x) + \dots + V_m(x)$
  • Problem: find the k objects with the highest sums
  • → A generic problem in distributed systems
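
As a point of reference, the problem statement can be sketched in Python by simply summing every list in one place. The node lists are the example lists used elsewhere in this talk; the function name `true_top_k` and the choice to treat an object missing from a node's list as having value 0 there are assumptions made for illustration.

```python
from collections import Counter

# Per-node lists of (object, value), sorted by value in descending order.
# Assumption: an object absent from a list has value 0 at that node.
NODES = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1), ("K", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]


def true_top_k(nodes, k):
    """Sum V_i(x) over all nodes and return the k objects with the highest sums."""
    sums = Counter()
    for node_list in nodes:
        for obj, value in node_list:
            sums[obj] += value
    # Sort by descending sum; break ties by object name so the result is deterministic.
    return sorted(sums.items(), key=lambda item: (-item[1], item[0]))[:k]


print(true_top_k(NODES, 2))  # [('F', 22), ('A', 20)]
```

This requires every node to ship its whole list, which is the cost the distributed algorithms in the rest of the talk try to avoid.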

16
Existing Methods
  • Naïve Algorithm
  • Each node sends the full list of objects and
    their values to the Central Manager
  • Threshold Algorithm (TA)
  • Proposed by multiple groups in the database
    research community

17
The Threshold Algorithm (TA)
  • Example: find the top 2 objects with the largest
    sums across three lists

Node 1: (A, 10) (C, 8) (E, 8) (F, 8) (B, 7) (D, 5) (J, 1) (K, 1) . . .
Node 2: (B, 10) (D, 9) (F, 8) (H, 6) (G, 5) (C, 1) (A, 1) . . .
Node 3: (C, 10) (A, 9) (G, 8) (J, 7) (F, 6) (D, 4) (B, 1) . . .
Central Manager (CM), threshold T and best sums after each row:
T = 30: V(A) = 20, V(C) = 19, V(B) = 18
T = 26: V(A) = 20, V(C) = 19, . . .
T = 24: V(F) = 22, V(A) = 20, . . .
T = 21: V(F) = 22, V(A) = 20, . . .
T = 18: V(F) = 22, V(A) = 20, . . .
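
The trace above can be reproduced with a minimal single-process sketch of TA. It is an illustration under stated assumptions, not the speaker's implementation: the function name `threshold_algorithm` is hypothetical, the truncated lists are treated as complete, and an object missing from a list counts as 0 at that node.

```python
NODES = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1), ("K", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]


def threshold_algorithm(nodes, k):
    """Return (top-k list, thresholds seen), reading one row of all lists per step."""
    lookup = [dict(node_list) for node_list in nodes]  # random-access view of each list
    totals = {}       # exact sums of every object seen so far
    thresholds = []   # T after each row, kept only to show the trace
    best = []
    depth = max(len(node_list) for node_list in nodes)
    for row in range(depth):
        # Sorted access: next entry of every list (value 0 once a list is exhausted).
        entries = [n[row] if row < len(n) else (None, 0) for n in nodes]
        for obj, _ in entries:
            if obj is not None and obj not in totals:
                # Random lookup: fetch this object's value from every node.
                totals[obj] = sum(table.get(obj, 0) for table in lookup)
        # Threshold T: sum of the values in the current row.
        t = sum(value for _, value in entries)
        thresholds.append(t)
        best = sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:k]
        # Stop once the kth best sum reaches T: no unseen object can exceed T.
        if len(best) == k and best[-1][1] >= t:
            break
    return best, thresholds


top, seen_thresholds = threshold_algorithm(NODES, 2)
print(top)              # [('F', 22), ('A', 20)]
print(seen_thresholds)  # [30, 26, 24, 21, 18]
```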
18
Adapting TA for Distributed Environments
  • Consists of multiple rounds, each with two round
    trips
  • Round trip 1 (sorted access): the CM asks for the
    next B objects on the lists, and the nodes respond
  • Round trip 2 (random lookup): the CM sends a list of
    object names to the nodes, and the nodes supply
    their values
  • B k
  • Issues:
  • Number of rounds is unpredictable
  • $O(m^2)$ network traffic on average

19
New Algorithm: Three-Phase Uniform Threshold
(TPUT)
  • Motivation: terminate in a fixed number of round
    trips regardless of input
  • Operates in three phases
  • Lower-bound estimation
  • Pruning
  • Final lookup

20
Partial Sums and Upper Bounds
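  • Partial sum: $PS(x) = \sum_{i=1}^{m} V'_i(x)$, where
    $V'_i(x) = V_i(x)$ if x has been reported by node i
    to the CM, and $V'_i(x) = 0$ otherwise
  • Upper bound: $U(x) = \sum_{i=1}^{m} U_i(x)$, where
    $U_i(x) = V_i(x)$ if x has been reported by node i
    to the CM, and $U_i(x) = T_i$ otherwise
  • $T_i$: node i sends all objects with values $\ge T_i$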
21
Examples
Node 1: (A, 10) (C, 8) (E, 8) (F, 8) (B, 7) (D, 5) (J, 1) . . .
Node 2: (B, 10) (D, 9) (F, 8) (H, 6) (G, 5) (C, 1) (A, 1) . . .
Node 3: (C, 10) (A, 9) (G, 8) (J, 7) (F, 6) (D, 4) (B, 1) . . .
CM:
$PS(A) = 10 + 0 + 9 = 19$, $U(A) = 10 + 9 + 9 = 28$
$PS(B) = 0 + 10 + 0 = 10$, $U(B) = 8 + 10 + 9 = 27$
For any object O, $PS(O) \le V(O) \le U(O)$
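
The two worked values can be checked with a short Python sketch. The thresholds $T_1 = 8$, $T_2 = 9$, $T_3 = 9$ are read off the slide's arithmetic ($U(B) = 8 + 10 + 9$ and $U(A) = 10 + 9 + 9$); the helper names, the rule that node i has reported exactly the objects with value at least $T_i$, and treating the truncated lists as complete are assumptions for illustration.

```python
LISTS = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]
THRESHOLDS = [8, 9, 9]  # assumed T_1, T_2, T_3, inferred from the slide's sums

# Assumption: node i has reported every object whose value is >= T_i.
REPORTS = [
    {obj: value for obj, value in node_list if value >= t}
    for node_list, t in zip(LISTS, THRESHOLDS)
]


def partial_sum(obj):
    """PS(x): reported values only; unreported nodes contribute 0."""
    return sum(report.get(obj, 0) for report in REPORTS)


def upper_bound(obj):
    """U(x): reported values, with T_i standing in for each unreported node."""
    return sum(report.get(obj, t) for report, t in zip(REPORTS, THRESHOLDS))


def true_sum(obj):
    """V(x): the exact sum, treating a missing entry as 0."""
    return sum(dict(node_list).get(obj, 0) for node_list in LISTS)


for name in ("A", "B"):
    print(name, partial_sum(name), true_sum(name), upper_bound(name))
```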
22
Steps in TPUT
  • Phase 1
  • Manager gets the top k objects from each node
  • Manager:
  • Calculate partial sums of all objects
  • Take the kth-largest partial sum $E_1$
    ($E_1 \le E$); set $t = E_1/m$
  • Phase 2
  • Manager gets all objects with value $\ge t$ from
    each node
  • Manager:
  • Calculate partial sums again; take the kth-largest
    partial sum $E_2$ ($E_1 \le E_2 \le E$)
  • Calculate upper bounds of all objects
  • $S$ = objects whose upper bounds are $\ge E_2$
  • Phase 3
  • Manager → Nodes: "here is S; send me all objects
    in S"
  • Nodes → Manager: "here they are"
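
The three phases can be sketched in a single process as follows. This is a minimal illustration rather than the speaker's implementation: the names `tput` and `kth_largest` are hypothetical, phase 2 is read as "all objects with value at least t", the candidate set as "upper bound at least E2", and an object missing from a list counts as 0 at that node.

```python
NODES = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]


def kth_largest(values, k):
    """Return the kth-largest value, or 0 if fewer than k values exist."""
    ranked = sorted(values, reverse=True)
    return ranked[k - 1] if len(ranked) >= k else 0


def tput(nodes, k):
    m = len(nodes)

    # Phase 1: each node reports its top k; the kth partial sum is the lower bound E1.
    reports = [dict(node_list[:k]) for node_list in nodes]
    objects = set().union(*reports)
    e1 = kth_largest([sum(r.get(o, 0) for r in reports) for o in objects], k)
    t = e1 / m

    # Phase 2: each node also reports every object with value >= t.
    for report, node_list in zip(reports, nodes):
        report.update((o, v) for o, v in node_list if v >= t)
    objects = set().union(*reports)
    partial = {o: sum(r.get(o, 0) for r in reports) for o in objects}
    e2 = kth_largest(partial.values(), k)
    # Upper bound: an unreported value at a node is below t, so t stands in for it.
    upper = {o: sum(r.get(o, t) for r in reports) for o in objects}
    candidates = sorted(o for o in objects if upper[o] >= e2)

    # Phase 3: look up exact values for the candidates and pick the top k.
    tables = [dict(node_list) for node_list in nodes]
    exact = {o: sum(table.get(o, 0) for table in tables) for o in candidates}
    top = sorted(exact.items(), key=lambda item: (-item[1], item[0]))[:k]
    return top, e1, e2, candidates


top, e1, e2, candidates = tput(NODES, 2)
print(top, e1, e2, candidates)
```

On these lists with k = 2, the sketch yields E1 = 18, t = 6, E2 = 19, and a candidate set of eight objects before the final lookup.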

23
Example
Node 1: (A, 10) (C, 8) (E, 8) (F, 8) (B, 7) (D, 5) (J, 1) . . .
Node 2: (B, 10) (D, 9) (F, 8) (H, 6) (G, 5) (C, 1) (A, 1) . . .
Node 3: (C, 10) (A, 9) (G, 8) (J, 7) (F, 6) (D, 4) (B, 1) . . .
CM: sums S(F) = 22, S(A) = 20, S(C) = 19. The top 2 objects are F and A.
24
Improving the Pruning Power
  • Set $t = (E_1/m) \cdot \alpha$, where $0 < \alpha < 1$

[Diagram labels: U(o); E2/m; t]
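
To see what a scaled threshold does on a small input, the sketch below runs only phases 1 and 2 of TPUT with $t = \alpha \cdot E_1/m$ and reports the candidate set together with the number of entries sent in phase 2. The function name `phase2_candidates`, the example lists, and the missing-means-zero convention are assumptions; the numbers describe this toy data only and say nothing about real traces.

```python
NODES = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]


def phase2_candidates(nodes, k, alpha):
    """Return (candidate set S, number of entries sent in phase 2)."""
    m = len(nodes)

    def kth_largest(values):
        ranked = sorted(values, reverse=True)
        return ranked[k - 1] if len(ranked) >= k else 0

    # Phase 1: top k from each node gives the lower bound E1.
    reports = [dict(node_list[:k]) for node_list in nodes]
    objects = set().union(*reports)
    e1 = kth_largest([sum(r.get(o, 0) for r in reports) for o in objects])
    t = alpha * e1 / m  # scaled threshold

    # Phase 2: every node sends all objects with value >= t.
    sent = 0
    for report, node_list in zip(reports, nodes):
        above = [(o, v) for o, v in node_list if v >= t]
        sent += len(above)
        report.update(above)
    objects = set().union(*reports)
    e2 = kth_largest([sum(r.get(o, 0) for r in reports) for o in objects])
    upper = {o: sum(r.get(o, t) for r in reports) for o in objects}
    return sorted(o for o in objects if upper[o] >= e2), sent


print(phase2_candidates(NODES, 2, alpha=1.0))
print(phase2_candidates(NODES, 2, alpha=0.5))
```

On this input, lowering alpha from 1.0 to 0.5 raises the phase 2 entries from 14 to 17 while shrinking the candidate set from eight objects to four, which illustrates the trade-off between phase 2 traffic and phase 3 lookups.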
25
Compression via Hashing
  • Problem: reducing traffic in phase 2
  • Solution: send hashed keys of object IDs
  • Nodes report $(\mathrm{hash}(o), V(o))$ to the CM
  • Hashed keys are short
  • If $\mathrm{hash}(o_1) = \mathrm{hash}(o_2)$, then
    $V = \max(V(o_1), V(o_2))$
  • Candidate set S is a set of hashed keys
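
A node-side sketch of this compression might look as follows. The hash function (truncated SHA-256), the key width, and the names `short_key` and `compress_report` are assumptions chosen for illustration; the slides do not say which hash is used.

```python
import hashlib


def short_key(object_id, bits=16):
    """Map an object ID to a short integer key (hypothetical hash choice)."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % (1 << bits)


def compress_report(entries, bits=16):
    """Turn (object_id, value) pairs into {hashed_key: value}.

    When two objects collide on the same key, keep the larger value, so the
    value reported for a key is never below that of any object mapped to it.
    """
    report = {}
    for object_id, value in entries:
        key = short_key(object_id, bits)
        report[key] = max(report.get(key, 0), value)
    return report


entries = [("url-a", 5), ("url-b", 7), ("url-c", 3)]
print(compress_report(entries))          # 16-bit keys
print(compress_report(entries, bits=1))  # 1-bit keys force collisions
```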

26
Evaluating TPUT Algorithm
  • Trace-driven simulation
  • Optimality analysis

27
Trace Data for Simulations
28
Results on Unicast-Bytes
[Figure panels: m = 10, m = 30, m = 64, m = 128, m = 203, m = 512]
29
Number of Objects Looked-Up
30
Results on Multicast-Bytes
[Figure panels: m = 10, m = 30, m = 64, m = 128, m = 203, m = 512]
31
Optimality Analysis
  • Main results:
  • TPUT is instance optimal for data sets with a
    log-log slope function $C(n)$
  • Zipf distribution: $C(n) = n$
  • Zipf distribution: opt-ratio $(m-1) \cdot 2m + km$
  • Setting $\alpha < 1$ reduces cost qualitatively
  • Zipf distribution: opt-ratio
    $(m-1) \cdot O(\sqrt{m}) + km/\alpha$

32
General Instance Optimality
  • Definition
  • An algorithm R is instance-optimal with
    optimality ratio $C_1$ if there exists $C_2$ such
    that, for any data series D and any algorithm A,
    $\mathrm{cost}(R, D) \le C_1 \cdot \mathrm{cost}(A, D) + C_2$
  • Cost is the amount of network traffic
  • TA is instance optimal with opt-ratio $O(m^2)$

33
Worst Cases for Fixed Number Round-Trip Algorithms
  • TPUT is not general instance optimal
  • Nor can any algorithm that terminates in a fixed
    number of round trips

Example: finding the object with the highest sum
Node 1: (A, 1) (C, 1) (X1, 0.6) (X2, 0.6) . . . (Xn, 0.6) (B, 0.5) . . .
Node 2: (B, 1) (D, 0.2) . . .
34
Log-Log Slope Function C(n)
  • $L(j)$ is the value at position j in a list sorted
    in descending order
  • The list satisfies log-log slope function $C(n)$
    if, for all $j \ge k$, $L(j \cdot C(n)) \le L(j)/n$
  • For a Zipf-like distribution $L(j) = 1/j^{\theta}$,
    $C(n) = n^{1/\theta}$
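
The Zipf case can be checked numerically. The sketch below reads the condition as $L(j \cdot C(n)) \le L(j)/n$ for all positions $j \ge k$ that fit in the list, uses a Zipf list with exponent 1 (so the claimed slope function is $C(n) = n$), and relies on exact fractions to avoid rounding. The function name `satisfies_slope` and this reading of the garbled condition are assumptions.

```python
from fractions import Fraction


def satisfies_slope(values, c, n, k=1):
    """Check L(j * C(n)) <= L(j) / n for every 1-indexed j >= k that fits in the list."""
    step = c(n)
    for j in range(k, len(values) // step + 1):
        if values[j * step - 1] > values[j - 1] / n:
            return False
    return True


# Zipf list with exponent 1: L(j) = 1/j for j = 1..1000.
zipf = [Fraction(1, j) for j in range(1, 1001)]

print(satisfies_slope(zipf, lambda n: n, n=4))      # C(n) = n holds (with equality)
print(satisfies_slope(zipf, lambda n: n - 1, n=4))  # a smaller step is not enough
```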

[Diagram labels: position 1; position j, with value L(j); position j·C(n), with value ≤ L(j)/n]
35
Properties of the Two Lower Bounds
  • Let E be the true bottom (the true kth-highest sum)
  • $E_1 \ge E/m$
  • $E_2 > E/2$
  • $E_2 \ge E_1$
  • $E_2 > E - E_1(m-1)/m$
  • $E_2 > (m/(2m-1))\,E$
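
The last bound follows from the two before it. A short derivation, assuming $E > 0$ and reading the properties above as $E_2 \ge E_1$ and $E_2 > E - E_1(m-1)/m$:

```latex
\begin{align*}
E_2 &> E - \frac{m-1}{m}\,E_1 \\
    &\ge E - \frac{m-1}{m}\,E_2 && \text{since } E_1 \le E_2 \\
\Longrightarrow\quad \frac{2m-1}{m}\,E_2 &> E \\
\Longrightarrow\quad E_2 &> \frac{m}{2m-1}\,E \;>\; \frac{E}{2}
  && \text{since } 2m > 2m-1
\end{align*}
```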

36
Restricted Instance Optimality of TPUT ($\alpha = 1$)
  • Assume D is a collection of m lists, all following
    log-log slope function $C(n)$; then for any
    algorithm A,
  • $\mathrm{cost}(\mathrm{TPUT}, D) \le \mathrm{cost}(A, D) \cdot ((m-1)\,C(2m) + C(m)\,k)$

37
Effect of $\alpha < 1$
  • Property:
  • If object x appears in n nodes in Phase 2 and
    $U(x) \ge E_2$, then its average value in those
    nodes satisfies $R(x) \ge E_2(1-\alpha)/n$
  • Let $l_i$ be the number of objects in S that appear
    in exactly i nodes in Phase 2; then
  • 1l1 2l2 3l3 mlm C(m (1a)/a)
    ?bi
  • For each i, l1 l2 li C( i (1
    a)/(1-a)) ?bi
  • Size of S is $l_1 + l_2 + \dots + l_m$
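
The first property on this slide can be derived in a few lines. The derivation assumes the scaled threshold $t = \alpha E_1/m$, that x is reported by $n \ge 1$ of the m nodes, and that $E_1 \le E_2$; $R(x)$ denotes the average of the n reported values.

```latex
\begin{align*}
U(x) &= n\,R(x) + (m-n)\,t \;\ge\; E_2, \qquad t = \alpha E_1/m \\
n\,R(x) &\ge E_2 - \frac{m-n}{m}\,\alpha E_1 \\
        &\ge E_2 - \alpha E_2
          && \text{since } \tfrac{m-n}{m} \le 1 \text{ and } E_1 \le E_2 \\
R(x) &\ge \frac{E_2\,(1-\alpha)}{n}
\end{align*}
```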

38
Effect of $\alpha < 1$ (Cont.)
  • Opt-ratio: $(m-1) \cdot C(d\beta) + mk/\alpha$, where d is
    given by
  • $d \cdot C(d\beta) - \sum_{i=1}^{d} C(i\beta) = C(m(1+\alpha)/\alpha)$
  • For Zipf distribution, TPUT with $\alpha < 1$ has
    opt-ratio $(m-1) \cdot c\sqrt{m} + mk/\alpha$

39
Top-k Query Calculation in CDNs
  • Number of objects small → naïve algorithm
  • Number of objects large → TPUT with $\alpha < 1$
  • Optimal $\alpha$ depends on the number of nodes
  • Limit the max number of objects sent in phase 2
  • TPUT extends to hierarchical networks easily

40
Kernel Mechanism Highlight
  • Stream Engine

41
Building High Performance Internet Streaming
Server
  • Basic characteristics of streaming protocols
  • Control channel (TCP)
  • Start/Stop, FF/Rew, Seek, Change bit rate
  • Data channel (UDP or TCP)
  • Paced sending of streaming data
  • What makes Linux inefficient
  • Data copies
  • Context switches

42
Observations on Per Stream Flow
43
Observations on Per Stream Flow
< 1% runtime, > 98% code
> 99% runtime, < 2% code
44
Where Stream Engine Fits
45
Streaming File and Data Packets
[Diagram labels: file header; indices; packet1, packet2, . . ., packetn; TCP header; ts; Sending time; SubBlock1; SubBlock2; Padding]
46
Stream Engine
  • In-kernel event driven module to deliver
    streaming data
  • Similar to sendfile() but has streaming logic
  • Method to assemble data packet
  • Timed send
  • Control channel monitoring

47
Stream Engine Interface
  • client_data_fd
  • client_control_fd
  • source_fd offset
  • packet_timing_and_assembly
  • Example 1: fixed_rate_fixed_block
  • Example 2: asf_packet_parse
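
As a user-space illustration only (Stream Engine itself is an in-kernel module, and the slides do not give the callback's actual signature), a fixed-rate, fixed-block timing function might be sketched like this. The function name mirrors Example 1; its parameters and the shape of what it yields are hypothetical.

```python
def fixed_rate_fixed_block(file_size, offset, block_size, bit_rate):
    """Yield (send_time_seconds, file_offset, length) for each packet payload.

    Blocks of block_size bytes are spaced so that the stream averages
    bit_rate bits per second, with the first block due at time 0.
    """
    seconds_per_block = block_size * 8 / bit_rate
    index = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # last block may be short
        yield (index * seconds_per_block, offset, length)
        offset += length
        index += 1


# 3500 bytes in 1000-byte blocks at 8000 bit/s: one block per second.
schedule = list(fixed_rate_fixed_block(file_size=3500, offset=0,
                                       block_size=1000, bit_rate=8000))
for send_time, file_offset, length in schedule:
    print(send_time, file_offset, length)
```

A sender built on such a schedule would wait until each send time before transmitting the corresponding block, which is the paced, timed send described on the preceding slides.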

48
Performance Comparison
Based on a PC with 1 Xeon 2.8 GHz CPU, 2 GB memory, 2 Gigabit
interface
49
Stream Engine Future
  • Put it in hardware
  • TCP-Offloading Engines
  • Special blades in Cat6K switches
  • To be used by a highly popular Internet radio
    station

50
Summary
  • Techniques
  • Content-based WCCP (patent pending)
  • TPUT as a top-k algorithm (submitted for
    publication)
  • Stream Engine (published in WCW 2003)
  • Open research questions