Title: Selected Techniques in Content Distribution Networks
1Selected Techniques in Content Distribution
Networks
- Pei Cao
- Cisco Systems, Inc.
2Enterprise WAN Today
Internet
Data Center
T1
Regional Hubs
. . .
56 kbps, 128 kbps, DSL
Branch Offices
. . .
. . .
3Why Enterprise CDN (ECDN)
- Overcome bandwidth limitations for video applications to branches
- Distribute very large files to branches
- Cache and police web contents
- Consolidate data storage
- ...
4Components of ECDN
Data Center
Branches
Edge Content Engine (CE)
WAN
. . . . . .
Content Distribution Manager (CDM)
IOS Router with WCCP
Web Servers
Edge CE
5Content Delivery
Internet Or WAN
Internet or Intranet Web Server
CDM Agent
Content Dist. Module
Filtering module
RealNetwork Server
RealNetwork Proxy
HTTP Proxy Server
Windows Media Proxy Server
MPEG streaming server
6Content Distribution
Data Center
Branches
Edge Content Engine (CE)
WAN
. . . . . .
Content Distribution Manager (CDM)
Web Servers
Edge CE
7Challenges in Building CDNs
- Network interoperability
- System scalability
- Content engine performance
- System usability
8Outline
- Protocol highlight
- Content-based WCCP
- Algorithm highlight
- TPUT Scalable Top-k Algorithm
- Kernel mechanism highlight
- Stream Engine
9Request Interception
- Web Cache Communication Protocol (WCCP) on port 80
Internet Or WAN
Cache Miss
ACK
TCP SYN
TCP SYN
ACK
TCP SYN_ACK
10WCCP Bypass
Internet Or WAN
TCP SYN
TCP SYN
TCP SYN
TCP SYN
11Dealing with Client Transparency
Internet Or WAN
TCP SYN
Cache Miss
404 Not Found
TCP SYN
TCP SYN
TCP SYN
GET HTTP/1.1
GET HTTP/1.1
HTTP/1.0 200 OK <META HTTP-EQUIV="REFRESH"
12Content-Based Interception
- Problem: how to intercept all HTTP traffic from client browsers?
- Possible solutions
- Send all traffic through content engine (CE)
- Issues with per-packet latency and CE throughput
- Send traffic to CE, but CE tells router which flows to bypass
- High overhead for short flows
13Algorithm Highlight
14Top-k Queries in CDNs
- Example queries
- List top 10 URLs accessed most often among all CEs
- List top 10 domains that consume the most storage among all CEs
- etc.
15Definitions
- A network of m nodes, connected to a central manager (CM)
- Each node i has a reverse-sorted list of $(x, V_i(x))$
- An object's sum: $V(x) = V_1(x) + V_2(x) + \dots + V_m(x)$
- Problem: find the k objects with highest sums
- → A generic problem in distributed systems
16Existing Methods
- Naïve Algorithm
- Each node sends the full list of objects and their values to the Central Manager
- Threshold Algorithm (TA)
- Proposed by multiple groups in the database research community
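As a baseline for the algorithms that follow, here is a minimal sketch of the naïve approach in Python. The function name `naive_top_k` and the in-memory `NODE_LISTS` are illustrative assumptions; the data is the three-node example used on the later slides, and objects beyond the truncated ". . ." are simply treated as absent.

```python
from collections import Counter

# Example lists from the slides (truncated at ". . ."; assumption: nothing else matters).
NODE_LISTS = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1), ("K", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]


def naive_top_k(node_lists, k):
    """Every node ships its full list; the manager sums and sorts."""
    totals = Counter()
    for node in node_lists:
        for obj, value in node:
            totals[obj] += value
    # Highest sum first; ties broken by object name for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


print(naive_top_k(NODE_LISTS, 2))
```

The traffic cost is the total length of all lists, which is what TA and TPUT try to avoid.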
17The Threshold Algorithm (TA)
- Example: find top 2 objects with max sums in three columns
Node 1: (A, 10) (C, 8) (E, 8) (F, 8) (B, 7) (D, 5) (J, 1) (K, 1) . . .
Node 2: (B, 10) (D, 9) (F, 8) (H, 6) (G, 5) (C, 1) (A, 1) . . .
Node 3: (C, 10) (A, 9) (G, 8) (J, 7) (F, 6) (D, 4) (B, 1) . . .
Central Manager (CM), one line per row scanned (T is the threshold):
T = 30; V(A) = 20, V(C) = 19, V(B) = 18
T = 26; V(A) = 20, V(C) = 19, ...
T = 24; V(F) = 22, V(A) = 20, ...
T = 21; V(F) = 22, V(A) = 20, ...
T = 18; V(F) = 22, V(A) = 20, ...
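The threshold sequence in this example can be reproduced with a small Python sketch of TA. This is a single-process illustration under stated assumptions, not the distributed implementation: `threshold_algorithm` is a hypothetical name, random lookups are plain dictionary reads, and objects missing from a truncated list count as 0.

```python
NODE_LISTS = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1), ("K", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]


def threshold_algorithm(node_lists, k):
    """Scan all lists row by row; stop once the kth best sum reaches T."""
    lookup = [dict(node) for node in node_lists]  # stands in for random lookups
    sums, thresholds, best = {}, [], []
    depth = max(len(node) for node in node_lists)
    for row in range(depth):
        t = 0
        for node in node_lists:
            if row >= len(node):
                continue
            obj, value = node[row]          # sorted access
            t += value                      # T = sum of values in this row
            if obj not in sums:             # random lookup of the full sum
                sums[obj] = sum(d.get(obj, 0) for d in lookup)
        thresholds.append(t)
        best = sorted(sums.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        if len(best) == k and best[-1][1] >= t:
            break
    return best, thresholds


top, thresholds = threshold_algorithm(NODE_LISTS, 2)
print(top, thresholds)
```

The loop stops at the fifth row, when the second-best sum (20) is no longer below the threshold (18); the number of rows needed depends entirely on the data.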
18Adapting TA for Distributed Environments
- Consists of multiple rounds, each round having two round trips
- Round-trip 1 (sorted access): CM asks for the next B objects on the lists and nodes respond
- Round-trip 2 (random lookup): CM sends a list of object names to nodes and nodes supply values
- B k
- Issues
- # of rounds unpredictable
- $O(m^2)$ network traffic on average
19New Algorithm: Three-Phase Uniform Threshold (TPUT)
- Motivation: terminate in a fixed number of round trips regardless of input
- Operates in three phases
- Lower-bound estimation
- Pruning
- Final lookup
20Partial Sums and Upper Bounds
- Partial sum: $PS(x) = \sum_i V'_i(x)$, where $V'_i(x) = V_i(x)$ if x has been reported by node i to CM, and $0$ otherwise
- Upper bound: $U(x) = \sum_i U_i(x)$, where $U_i(x) = V_i(x)$ if x has been reported by node i to CM, and $T_i$ otherwise
- $T_i$: node i sends all objects with values > $T_i$
21Examples
Node 1: (A, 10) (C, 8) (E, 8) (F, 8) (B, 7) (D, 5) (J, 1) . . .
Node 2: (B, 10) (D, 9) (F, 8) (H, 6) (G, 5) (C, 1) (A, 1) . . .
Node 3: (C, 10) (A, 9) (G, 8) (J, 7) (F, 6) (D, 4) (B, 1) . . .
CM:
PS(A) = 10 + 0 + 9 = 19; U(A) = 10 + 9 + 9 = 28
PS(B) = 0 + 10 + 0 = 10; U(B) = 8 + 10 + 9 = 27
For any object O, $PS(O) \le V(O) \le U(O)$
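The numbers in this example can be checked with a short Python sketch. It assumes each node has reported its top 2 objects and takes the per-node threshold to be the smallest value that node has reported, which is the reading that matches the slide's arithmetic; `phase1_bounds` is a hypothetical helper name.

```python
NODE_LISTS = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]


def phase1_bounds(node_lists, k):
    """Partial sums and upper bounds after each node reports its top k."""
    reported = [dict(node[:k]) for node in node_lists]
    # Assumption: T_i is the smallest value node i has reported so far.
    thresholds = [min(r.values()) for r in reported]
    objects = set().union(*reported)
    # Unreported values count as 0 in the partial sum...
    ps = {x: sum(r.get(x, 0) for r in reported) for x in objects}
    # ...and as T_i in the upper bound.
    ub = {x: sum(r.get(x, t) for r, t in zip(reported, thresholds)) for x in objects}
    return ps, ub


ps, ub = phase1_bounds(NODE_LISTS, 2)
print(ps["A"], ub["A"], ps["B"], ub["B"])
```

For every reported object the true sum lies between the two values, which is the invariant the pruning phase relies on.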
22Steps in TPUT
- Phase 1
- Manager gets top k objects from each node
- Manager:
- Calculate partial sums of all objects
- Take the kth partial sum $E_1$ ($E_1 \le E$, where E is the true kth highest sum); set $t = E_1/m$
- Phase 2
- Manager gets all objects with value $\ge t$ from each node
- Manager:
- Calculate partial sums again; take the kth partial sum $E_2$ ($E_1 \le E_2 \le E$)
- Calculate upper bounds of all objects
- S = objects whose upper bounds are $\ge E_2$
- Phase 3
- Manager → Nodes: here is S; send me all objects in S
- Nodes → Manager: here they are
23Example
Node 1: (A, 10) (C, 8) (E, 8) (F, 8) (B, 7) (D, 5) (J, 1) . . .
Node 2: (B, 10) (D, 9) (F, 8) (H, 6) (G, 5) (C, 1) (A, 1) . . .
Node 3: (C, 10) (A, 9) (G, 8) (J, 7) (F, 6) (D, 4) (B, 1) . . .
CM:
V(F) = 22, V(A) = 20, V(C) = 19. Top 2 objects are F and A.
24Improving the Pruning Power
- Set $t = (E_1/m) \cdot \alpha$, where $0 < \alpha < 1$
(Figure: diagram marking the levels U(o), E2/m, and t.)
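The three phases, together with the scaling factor on the threshold, can be sketched in Python as follows. This is a single-process illustration under stated assumptions rather than the authors' implementation: the function name `tput`, the `alpha` parameter name, and the returned diagnostics dictionary are hypothetical, lists are assumed sorted in descending order, and objects missing from a truncated list count as 0.

```python
NODE_LISTS = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]


def tput(node_lists, k, alpha=1.0):
    m = len(node_lists)
    full = [dict(node) for node in node_lists]  # node-local data for phase 3

    def partial_sums(reported):
        ps = {}
        for r in reported:
            for x, v in r.items():
                ps[x] = ps.get(x, 0) + v
        return ps, sorted(ps.values(), reverse=True)[k - 1]

    # Phase 1: top k from each node, kth partial sum E1, threshold t.
    reported = [dict(node[:k]) for node in node_lists]
    _, e1 = partial_sums(reported)
    t = alpha * e1 / m

    # Phase 2: every node sends all objects with value >= t.
    for r, node in zip(reported, node_lists):
        r.update((x, v) for x, v in node if v >= t)
    ps, e2 = partial_sums(reported)
    # Anything still unreported by a node is below t there.
    upper = {x: sum(r.get(x, t) for r in reported) for x in ps}
    candidates = {x for x, u in upper.items() if u >= e2}

    # Phase 3: exact sums for the candidate set only.
    exact = {x: sum(d.get(x, 0) for d in full) for x in candidates}
    top = sorted(exact.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return top, {"E1": e1, "t": t, "E2": e2, "S": candidates}


print(tput(NODE_LISTS, 2))
print(tput(NODE_LISTS, 2, alpha=0.5)[1]["S"])
```

On the example data the unscaled threshold leaves eight candidates for the final lookup, while halving the threshold leaves four; the price is that more objects are sent in phase 2.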
25Compression via Hashing
- Problem: reducing traffic in phase 2
- Solution: send hashed keys of object IDs
- Node reports to CM: (hash(o), V(o))
- Hashed keys are short
- If hash(o1) = hash(o2), then V = max(V(o1), V(o2))
- Candidate set S is a set of hashed keys
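A node-side sketch of the hashed report might look like the following. The helper names `hashed_key` and `hashed_report`, the use of SHA-1, and the `bits` parameter are assumptions made for illustration; the slides do not say which hash function or key width is used.

```python
import hashlib

NODE_1 = [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1)]


def hashed_key(obj, bits=16):
    """Short key for an object ID (hash function and width are assumptions)."""
    digest = hashlib.sha1(obj.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (1 << bits)


def hashed_report(node_list, threshold, bits=16):
    """Report (hash(o), V(o)) for values >= threshold; collisions keep the max."""
    report = {}
    for obj, value in node_list:
        if value < threshold:
            continue
        key = hashed_key(obj, bits)
        report[key] = max(report.get(key, 0), value)
    return report


# A 1-bit key forces collisions so the max rule is exercised.
print(hashed_report(NODE_1, threshold=6, bits=1))
```

With colliding keys the manager can no longer tell the objects apart, which is why the candidate set in this variant is a set of hashed keys rather than object IDs.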
26Evaluating TPUT Algorithm
- Trace-driven simulation
- Optimality analysis
27Trace Data for Simulations
28Results on Unicast-Bytes
(Figure panels: m = 10, m = 30, m = 64, m = 128, m = 203, m = 512)
29Number of Objects Looked-Up
30Results on Multicast-Bytes
(Figure panels: m = 10, m = 30, m = 64, m = 128, m = 203, m = 512)
31Optimality Analysis
- Main results
- TPUT is instance optimal for data sets with a log-log slope function C(n)
- Zipf distribution: $C(n) = n$
- Zipf distribution: opt-ratio $(m-1) \cdot 2m + km$
- Setting $\alpha < 1$ reduces cost qualitatively.
- Zipf distribution: opt-ratio $(m-1) \cdot O(\sqrt{m}) + km/\alpha$
32General Instance Optimality
- Definition
- An algorithm R is instance-optimal with optimality ratio $C_1$ if there exists $C_2$ such that, for any data series D and any algorithm A,
- $cost(R, D) \le C_1 \cdot cost(A, D) + C_2$
- cost is the amount of network traffic
- TA is instance optimal with opt-ratio $O(m^2)$
33Worst Cases for Fixed Number Round-Trip Algorithms
- TPUT is not general instance optimal
- Nor can any algorithm that terminates in a fixed
number of round trips
Finding the object with the highest sum:
Node 1: (A, 1) (C, 1) (X1, 0.6) (X2, 0.6) . . . (Xn, 0.6) (B, 0.5) . . .
Node 2: (B, 1) (D, 0.2) . . .
34Log-Log Slope Function C(n)
- L(j) is the value at position j in a reverse-sorted list
- The list satisfies log-log slope function C(n) if, for all $j \ge k$, $L(j \cdot C(n)) < L(j)/n$
- For a Zipf-like distribution $L(j) = 1/j^{\theta}$, $C(n) = n^{1/\theta}$.
(Figure: a reverse-sorted list with positions 1, j, and j·C(n) marked; the value at position j is L(j) and the value at position j·C(n) is < L(j)/n.)
35Properties of the Two Lower Bounds
- Let E be the true bottom (the kth highest true sum)
- $E_1 \ge E/m$
- $E_2 > E/2$
- $E_2 \ge E_1$
- $E_2 > E - E_1 (m-1)/m$
- $E_2 > (m/(2m-1)) \cdot E$
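The last bound, and with it the bound against E/2, follows from the two preceding properties. A short derivation, reading the garbled relations as the second lower bound being at least the first and strictly above the true bottom minus (m-1)/m times the first:

```latex
\begin{align*}
E_2 &> E - \frac{m-1}{m}\,E_1
     \;\ge\; E - \frac{m-1}{m}\,E_2
     && \text{(since } E_2 \ge E_1\text{)} \\
\Rightarrow\quad E_2\left(1 + \frac{m-1}{m}\right) &> E \\
\Rightarrow\quad E_2 &> \frac{m}{2m-1}\,E \;>\; \frac{E}{2}
     && \text{(since } \tfrac{m}{2m-1} > \tfrac{1}{2}\text{)}
\end{align*}
```

In the three-node example this gives a bound of 0.6 times 20, that is 12, and the observed second-phase value of 19 is comfortably above it.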
36Restricted Instance Optimality of TPUT ($\alpha = 1$)
- Assume D is a collection of m lists all following log-log slope function C(n); then for any algorithm A,
- $cost(TPUT, D) \le cost(A, D) \cdot ((m-1) \cdot C(2m) + C(m) \cdot k)$
37Effect of $\alpha < 1$
- Property
- If object x appears in n nodes in Phase 2 and $U(x) \ge E_2$, then its average value in those nodes $R(x) \ge E_2 (1-\alpha)/n$
- Let $l_i$ = the number of objects in S that appear in exactly i nodes in Phase 2, then
- $1 \cdot l_1 + 2 \cdot l_2 + 3 \cdot l_3 + \dots + m \cdot l_m \le C(m(1+\alpha)/\alpha) \sum b_i$
- For each i, $l_1 + l_2 + \dots + l_i \le C(i(1+\alpha)/(1-\alpha)) \sum b_i$
- Size of S is $l_1 + l_2 + \dots + l_m$
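38Effect of $\alpha < 1$ (Cont.)
- Opt-ratio: $(m-1) \cdot C(d\beta) + mk/\alpha$, where d is given by
- $d \cdot C(d\beta) - \sum_{i=1}^{d} C(i\beta) = C(m(1+\alpha)/\alpha)$
- For Zipf distribution, TPUT with $\alpha < 1$ has opt-ratio $(m-1) \cdot c\sqrt{m} + mk/\alpha$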
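The square-root term for the Zipf case can be checked with a short calculation. It assumes the condition on d is an equality between d·C(dβ) minus the sum of C(iβ) for i from 1 to d and C(m(1+α)/α), uses the Zipf slope function C(n) = n, and treats β as a constant, since the slides do not define it:

```latex
\begin{align*}
d\,C(d\beta) - \sum_{i=1}^{d} C(i\beta)
  &= \beta d^2 - \beta\,\frac{d(d+1)}{2}
   = \frac{\beta\,d(d-1)}{2} \\
\frac{\beta\,d(d-1)}{2} = C\!\left(\frac{m(1+\alpha)}{\alpha}\right)
  = \frac{m(1+\alpha)}{\alpha}
  &\;\Rightarrow\; d(d-1) = \frac{2m(1+\alpha)}{\alpha\beta}
  \;\Rightarrow\; d = O(\sqrt{m}) \\
(m-1)\,C(d\beta) = (m-1)\,\beta d
  &= (m-1)\,c\sqrt{m}
  \quad\text{for a constant } c \text{ depending on } \alpha,\ \beta
\end{align*}
```

Compared with the (m-1)·2m term for the unscaled threshold, the first term drops from quadratic in m to m times the square root of m.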
39Top-k Query Calculation in CDNs
- # of objects small → naïve algorithm
- # of objects large → TPUT with $\alpha < 1$
- Optimal $\alpha$ depends on # of nodes
- Limit max # of objects sent in phase 2
- TPUT extends to hierarchical networks easily
40Kernel Mechanism Highlight
41Building High Performance Internet Streaming
Server
- Basic characteristics of streaming protocols
- Control channel (TCP)
- Start/Stop, FF/Rew, Seek, Change bit rate
- Data channel (UDP or TCP)
- Paced sending of streaming data
- What makes Linux inefficient
- Data copies
- Context switches
42Observations on Per Stream Flow
43Observations on Per Stream Flow
< 1% runtime, > 98% code
> 99% runtime, < 2% code
44Where Stream Engine Fits
45Streaming File and Data Packets
packet1
packet2
packetn
indices
. . .
file header
ts
SubBlock1
SubBlock2
Padding
Sending time
TCP header
46Stream Engine
- In-kernel event-driven module to deliver streaming data
- Similar to sendfile() but has streaming logic
- Method to assemble data packet
- Timed send
- Control channel monitoring
47Stream Engine Interface
- client_data_fd
- client_control_fd
- source_fd offset
- packet_timing_and_assembly
- Example 1: fixed_rate_fixed_block
- Example 2: asf_packet_parse
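To make the idea of a packet timing and assembly method concrete, here is a user-space Python sketch of what a fixed-rate, fixed-block schedule could compute. It is not the Stream Engine's kernel interface: the generator signature, its parameters, and the (offset, length, send time) tuple are all assumptions chosen for illustration.

```python
def fixed_rate_fixed_block(file_size, block_size, bit_rate_bps, start_time=0.0):
    """Yield (offset, length, send_time) for each block of a stream.

    Hypothetical sketch: blocks are cut at fixed offsets from the source
    and spaced so that the stream averages bit_rate_bps.
    """
    interval = block_size * 8 / bit_rate_bps  # seconds between block sends
    offset, n = 0, 0
    while offset < file_size:
        length = min(block_size, file_size - offset)  # last block may be short
        yield offset, length, start_time + n * interval
        offset += length
        n += 1


# 2500-byte source, 1000-byte blocks, 8000 bit/s: one block per second.
for offset, length, when in fixed_rate_fixed_block(2500, 1000, 8000):
    print(offset, length, when)
```

A format-aware method such as the ASF example would instead read the send time and packet boundaries from the file's own packet headers.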
48Performance Comparison
Based on PC: 1 Xeon 2.8 GHz, 2 GB mem, 2 Gigabit interface
49Stream Engine Future
- Put it in hardware
- TCP-Offloading Engines
- Special blades in Cat6K switches
- To be used by a highly popular Internet radio
station
50Summary
- Techniques
- Content-based WCCP
- Patent pending
- TPUT as a top-k algorithm
- Submitted for publication
- Stream Engine
- Published in WCW2003
- Open research questions