Title: Efficient Top-k Queries in Large-Scale Networks
1Efficient Top-k Queries in Large-Scale Networks
- Pei Cao
- Cisco Systems, Inc.
- Consulting Faculty, Stanford University
2Motivation
- Enterprise content delivery networks (CDNs)
- CE: web cache and streaming media cache combined
- Number of branches: 50 to 2000
[Figure: a Central Manager in the Data Center linked to CEs in the Branch Offices over 56 kbps, 128 kbps, or DSL connections]
3Top-k Queries in CDNs
- Example queries
  - Across all CEs, which URLs are accessed most often?
  - Across all CEs, which domains consume the most storage?
  - Across all CEs, which cached objects produced the biggest bandwidth savings?
  - etc.
4Definitions
- A network of m nodes, connected to a central manager (CM)
- Each node i has a reverse-sorted list of pairs $(x, V_i(x))$
- An object's sum: $V(x) = V_1(x) + V_2(x) + \dots + V_m(x)$
- Problem: find the k objects with the highest sums
- Goal: answer this question with minimum network traffic
- ⇒ A generic problem in distributed systems
5Existing Methods
- Naïve Algorithm
  - Each node sends the full list of objects and their values to the Central Manager
- Threshold Algorithm (TA)
  - Proposed by multiple groups in the database research community
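A minimal sketch of the naïve baseline, for comparison with the algorithms that follow. The function name `naive_top_k` and the `LISTS` data are illustrative; the data is copied from the three-node example used later in the deck and is treated here as complete (the slides truncate each list with ". . .").

```python
from collections import Counter

# Three-node example lists from the deck, assumed complete for illustration.
LISTS = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1), ("K", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]


def naive_top_k(lists, k):
    """Every node ships its whole list; the manager sums and sorts."""
    totals = Counter()
    pairs_sent = 0
    for node_list in lists:
        for obj, value in node_list:
            totals[obj] += value      # V(x) = V_1(x) + ... + V_m(x)
            pairs_sent += 1           # one (object, value) pair on the wire
    return totals.most_common(k), pairs_sent


top, pairs_sent = naive_top_k(LISTS, 2)
print(top, pairs_sent)
```

The answer is always exact, but the traffic grows with the total length of all lists, which is the cost the later slides try to avoid.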
6The Threshold Algorithm (TA)
- Example: find the top 2 objects with maximum sums across three columns

Node 1: (A, 10) (C, 8) (E, 8) (F, 8) (B, 7) (D, 5) (J, 1) (K, 1) . . .
Node 2: (B, 10) (D, 9) (F, 8) (H, 6) (G, 5) (C, 1) (A, 1) . . .
Node 3: (C, 10) (A, 9) (G, 8) (J, 7) (F, 6) (D, 4) (B, 1) . . .

Central Manager (CM), threshold T and best sums after each row:
T = 30: V(A) = 20, V(C) = 19, V(B) = 18
T = 26: V(A) = 20, V(C) = 19, . . .
T = 24: V(F) = 22, V(A) = 20, . . .
T = 21: V(F) = 22, V(A) = 20, . . .
T = 18: V(F) = 22, V(A) = 20, . . .
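The walk-through above can be reproduced with a small centralized sketch of TA. This is an illustration under stated assumptions, not the deck's implementation: the name `threshold_algorithm` is hypothetical, the lists are treated as complete, and an object missing from a list counts as 0 there.

```python
import heapq

# Example lists from the slide, assumed complete for illustration.
LISTS = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1), ("K", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]


def threshold_algorithm(lists, k):
    """Scan all lists row by row; stop once the k-th best sum reaches T."""
    lookup = [dict(node_list) for node_list in lists]   # random-access view
    sums = {}                                           # exact V(x) of seen objects
    thresholds = []                                     # T after each row
    depth = max(len(node_list) for node_list in lists)
    for row in range(depth):
        t = 0
        for node_list in lists:
            if row >= len(node_list):
                continue                                # exhausted list adds 0
            obj, value = node_list[row]                 # sorted access
            t += value
            if obj not in sums:                         # random lookups on all nodes
                sums[obj] = sum(d.get(obj, 0) for d in lookup)
        thresholds.append(t)
        top = heapq.nlargest(k, sums.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= t:
            return top, thresholds
    return heapq.nlargest(k, sums.items(), key=lambda kv: kv[1]), thresholds


top, thresholds = threshold_algorithm(LISTS, 2)
print(top, thresholds)
```

On the example data the scan stops at the fifth row, where T = 18 falls below the second-best sum V(A) = 20.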
7Adapting TA for Distributed Environments
- Consists of multiple rounds, each round having two round trips
  - Round-trip 1 (sorted access): CM asks for the next B objects on the lists, and nodes respond
  - Round-trip 2 (random lookup): CM sends a list of object names to nodes, and nodes supply values
  - B = k
- Issues
  - # of rounds unpredictable
  - $O(m^2)$ network traffic
8New Algorithm: Three-Phase Uniform Threshold (TPUT)
- Motivation: terminate in a fixed number of round trips regardless of input
- Operates in three phases
  - Lower-bound estimation
  - Pruning
  - Final lookup
9Partial Sums and Upper Bounds
- Partial sum: $PS(x) = \sum_i V'_i(x)$, where $V'_i(x) = V_i(x)$ if x has been reported by node i to CM, and $0$ otherwise
- Upper bound: $U(x) = \sum_i U_i(x)$, where $U_i(x) = V_i(x)$ if x has been reported by node i to CM, and $T_i$ otherwise
- $T_i$: node i sends all objects with values $> T_i$
10Examples
Node 1: (A, 10) (C, 8) (E, 8) (F, 8) (B, 7) (D, 5) (J, 1) . . .
Node 2: (B, 10) (D, 9) (F, 8) (H, 6) (G, 5) (C, 1) (A, 1) . . .
Node 3: (C, 10) (A, 9) (G, 8) (J, 7) (F, 6) (D, 4) (B, 1) . . .

CM (each node has reported its top 2 objects):
PS(A) = 10 + 0 + 9 = 19, U(A) = 10 + 9 + 9 = 28
PS(B) = 0 + 10 + 0 = 10, U(B) = 8 + 10 + 9 = 27

For any object O, $PS(O) \le V(O) \le U(O)$
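The two bounds can be checked with a few lines of code. The sketch below assumes the state implied by the example numbers: each node has reported its top 2 objects, so the per-node thresholds are 8, 9, and 9. The names `REPORTED`, `THRESHOLDS`, and `bounds` are illustrative.

```python
# What each node has reported so far (its top 2 objects in the example).
REPORTED = [
    {"A": 10, "C": 8},   # Node 1
    {"B": 10, "D": 9},   # Node 2
    {"C": 10, "A": 9},   # Node 3
]
# T_i: the lowest value node i has reported; unreported values cannot exceed it.
THRESHOLDS = [8, 9, 9]


def bounds(obj, reported, thresholds):
    """Return (PS(obj), U(obj)) from the manager's current knowledge."""
    partial_sum = 0
    upper_bound = 0
    for node_report, t_i in zip(reported, thresholds):
        if obj in node_report:
            partial_sum += node_report[obj]   # known value counts in both bounds
            upper_bound += node_report[obj]
        else:
            upper_bound += t_i                # unknown value is at most T_i
    return partial_sum, upper_bound


print(bounds("A", REPORTED, THRESHOLDS))
print(bounds("B", REPORTED, THRESHOLDS))
```

The true sums V(A) = 20 and V(B) = 18 fall between the two bounds in both cases.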
11Steps in TPUT
- Phase 1
  - Manager → Nodes: start top-k query
  - Nodes → Manager: here are my top-k objects
  - Manager:
    - Calculate partial sums of all objects
    - Take the kth partial sum $E_1$ ($E_1 \le E$); set $t = E_1/m$
- Phase 2
  - Manager → Nodes: send me all objects with value $\ge t$
  - Nodes → Manager: here they are
  - Manager:
    - Calculate partial sums again; take the kth partial sum $E_2$ ($E_1 \le E_2 \le E$)
    - Calculate upper bounds of all objects
    - S = objects whose upper bounds are $\ge E_2$
- Phase 3
  - Manager → Nodes: here is S; send me all objects in S
  - Nodes → Manager: here they are
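The three phases can be sketched as a single function that simulates both the manager and the nodes. This is a simplified illustration, not the deck's implementation: `tput` and `kth_largest` are hypothetical names, the example lists are treated as complete, and Phase 3 fetches full sums by lookup where a real deployment would only request values not yet reported.

```python
import heapq

# Example lists from the deck, assumed complete for illustration.
LISTS = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1), ("K", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]


def kth_largest(values, k):
    top = heapq.nlargest(k, values)
    return top[-1] if len(top) == k else 0


def tput(lists, k):
    m = len(lists)
    # Phase 1: every node reports its top-k objects.
    partial = {}
    for node_list in lists:
        for obj, value in node_list[:k]:
            partial[obj] = partial.get(obj, 0) + value
    e1 = kth_largest(partial.values(), k)
    t = e1 / m
    # Phase 2: every node reports all objects with value >= t.
    reports = [
        {obj: v for pos, (obj, v) in enumerate(node_list) if pos < k or v >= t}
        for node_list in lists
    ]
    partial = {}
    for report in reports:
        for obj, v in report.items():
            partial[obj] = partial.get(obj, 0) + v
    e2 = kth_largest(partial.values(), k)
    candidates = set()
    for obj, ps in partial.items():
        missing = sum(1 for report in reports if obj not in report)
        if ps + missing * t >= e2:        # upper bound U(obj) >= E2
            candidates.add(obj)
    # Phase 3: look up exact sums for the candidate set S only.
    lookup = [dict(node_list) for node_list in lists]
    exact = {obj: sum(d.get(obj, 0) for d in lookup) for obj in candidates}
    top = heapq.nlargest(k, exact.items(), key=lambda kv: kv[1])
    return top, candidates, (e1, t, e2)


top, candidates, (e1, t, e2) = tput(LISTS, 2)
print(top, sorted(candidates), e1, t, e2)
```

On the example data with k = 2 this gives E1 = 18, t = 6, and E2 = 19, and only object H is pruned before the final lookup.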
12Example
Node 1: (A, 10) (C, 8) (E, 8) (F, 8) (B, 7) (D, 5) (J, 1) . . .
Node 2: (B, 10) (D, 9) (F, 8) (H, 6) (G, 5) (C, 1) (A, 1) . . .
Node 3: (C, 10) (A, 9) (G, 8) (J, 7) (F, 6) (D, 4) (B, 1) . . .

CM, final sums: S(F) = 22, S(A) = 20, S(C) = 19. Top 2 objects are F and A.
13Improving the Pruning Power
- Set $t = (E_1/m)\,\alpha$, where $0 < \alpha < 1$

[Figure: threshold t shown relative to $E_2/m$]
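A small experiment makes the trade-off concrete: a lower threshold sends more objects in Phase 2 but leaves fewer candidates for Phase 3. The sketch below is illustrative only; `phase2_tradeoff` is a hypothetical helper, and the example lists from the earlier slides are treated as complete.

```python
import heapq

# Example lists from the deck, assumed complete for illustration.
LISTS = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1), ("K", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]


def phase2_tradeoff(lists, k, alpha):
    """Return (objects known after Phase 2, size of candidate set S)."""
    m = len(lists)
    phase1 = {}
    for node_list in lists:
        for obj, value in node_list[:k]:
            phase1[obj] = phase1.get(obj, 0) + value
    e1 = heapq.nlargest(k, phase1.values())[-1]
    t = alpha * e1 / m                          # t = (E1 / m) * alpha
    reports = [
        {obj: v for pos, (obj, v) in enumerate(node_list) if pos < k or v >= t}
        for node_list in lists
    ]
    partial = {}
    for report in reports:
        for obj, v in report.items():
            partial[obj] = partial.get(obj, 0) + v
    e2 = heapq.nlargest(k, partial.values())[-1]
    candidates = [
        obj for obj, ps in partial.items()
        if ps + t * sum(1 for r in reports if obj not in r) >= e2
    ]
    return sum(len(r) for r in reports), len(candidates)


for alpha in (1.0, 0.5):
    print(alpha, phase2_tradeoff(LISTS, 2, alpha))
```

With k = 2, moving from α = 1 to α = 0.5 raises the Phase 2 report count from 14 to 17 pairs and halves the candidate set from 8 to 4 objects.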
14Compression via Hashing
- Problem: object IDs can be too long
- Solution: send hashed keys of object IDs
  - Node reports to CM: (hash(o), V(o))
  - If hash(o1) = hash(o2), then V = max(V(o1), V(o2))
  - Candidate set S is a set of hashed keys
  - Size of key: log(total # of objects in all nodes)
- Effect
  - Algorithm is still correct
  - However, might need an additional round trip
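A sketch of the node-side report under hashing. The slides do not name a hash function or a log base, so the use of BLAKE2b truncated to `bits` bits and of log base 2 are assumptions made for illustration, as are the helper names and the sample URLs.

```python
import hashlib
import math


def key_bits(total_objects):
    """Key size in bits, taking the slide's log as log base 2 (assumption)."""
    return max(1, math.ceil(math.log2(total_objects)))


def hashed_key(object_id, bits):
    """Map an object ID to a short integer key (hash choice is illustrative)."""
    digest = hashlib.blake2b(object_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def hashed_report(node_list, bits):
    """Build {hash(o): V(o)}; colliding objects keep the maximum value."""
    report = {}
    for object_id, value in node_list:
        key = hashed_key(object_id, bits)
        report[key] = max(report.get(key, value), value)
    return report


# Hypothetical node data with long object IDs.
NODE = [
    ("http://a.example/index.html", 10),
    ("http://b.example/video.rm", 8),
    ("http://c.example/logo.gif", 8),
    ("http://d.example/news.html", 7),
]
print(hashed_report(NODE, key_bits(1000)))
```

Since hashed keys can collide, the manager works with keys rather than object names until it resolves them, which is where the extra round trip mentioned above can come from.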
15Evaluating TPUT Algorithm
- Trace-driven simulation
- Optimality analysis
16Trace Data for Simulations
NLANR-10: daily web access from 10 NLANR proxies
WorldCup-30: 2-hr logs from 30 WorldCup web servers
DEC-64: split 1-day DEC proxy traces into 64 sub-traces by client IP
DEC-128: split 2-day DEC proxy traces into 128 sub-traces by client IP
NLANR-203: split NLANR traces into 203 sub-traces by client IP
Berkeley-512: split one-week UCB traces into 512 sub-traces by client IP
17Performance Metrics
- Communication costs
- Unicast-bytes
- Multicast-bytes
- Messages are all compressed by gzip
18Results on Unicast-Bytes
[Figure: unicast bytes, one panel per trace: m = 10, m = 30, m = 64, m = 128, m = 203, m = 512]
19Number of Objects Looked-Up
Trace          K=10 TA   K=10 TPUT/0.5   K=100 TA   K=100 TPUT/0.5
NLANR-10           166              18       1486              176
WorldCup-30         46              12        238              101
DEC-64            3164              31       9817              244
DEC-128           6928              28      26680              250
NLANR-203         5576              28      43954              238
Berkeley-512     47899              41     180550              132
20Results on Multicast-Bytes
[Figure: multicast bytes, one panel per trace: m = 10, m = 30, m = 64, m = 128, m = 203, m = 512]
21Optimality Analysis
- Main results
  - TPUT is instance optimal for data sets with a log-log slope function C(n)
    - Zipf distribution: $C(n) = n$
    - Zipf distribution: opt-ratio $(m-1) \cdot 2m + km$
  - Setting $\alpha < 1$ reduces cost qualitatively.
    - Zipf distribution: opt-ratio $(m-1) \cdot O(\sqrt{m}) + km/\alpha$
22General Instance Optimality
- Definition
  - An algorithm R is instance-optimal with optimality ratio $C_1$ if there exists $C_2$ such that, for any data series D and any algorithm A,
    $cost(R, D) \le C_1 \cdot cost(A, D) + C_2$
  - cost is the amount of network traffic
- TA is instance optimal with opt-ratio $O(m^2)$
23Worst Cases for Fixed Number Round-Trip Algorithms
- TPUT is not general instance optimal
- Nor can any algorithm that terminates in a fixed number of round trips regardless of input

Example: finding the object with the highest sum
Node 1: (A, 1) (C, 1) (X1, 0.6) (X2, 0.6) . . . (Xn, 0.6) (B, 0.5) . . .
Node 2: (B, 1) (D, 0.2) . . .
24Log-Log Slope Function
- $L(j)$ is the value at position j in a reverse-sorted list
- The list satisfies log-log slope function $C(n)$ if, for all $j \ge k$, $L(j \cdot C(n)) < L(j)/n$
- For a Zipf-like distribution $L(j) = 1/j^{\theta}$, $C(n) = n^{1/\theta}$.
[Figure: a reverse-sorted list from position 1 downward; the value at position j is $L(j)$, and the value at position $j \cdot C(n)$ is $< L(j)/n$]
25Properties of the Two Lower Bounds
- Let E be the true bottom
- $E_1 \ge E/m$
- $E_2 > E/2$
  - $E_2 \ge E_1$
  - $E_2 > E - E_1 (m-1)/m$
    - For any x, $V(x) - PS(x) < (m-1)\,t$ ⇒ $V(x) - PS(x) < (m-1)\,E_1/m$ ⇒ $E - E_2 < E_1 (m-1)/m$
  - $E_2 > (m/(2m-1))\,E$
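The last bound follows from the two inequalities listed before it. A short derivation, assuming $t = E_1/m$ as set in Phase 1 and $E > 0$:

```latex
\begin{align*}
E_2 &> E - \frac{m-1}{m}\,E_1 \\
    &\ge E - \frac{m-1}{m}\,E_2
      && \text{since } E_1 \le E_2 \\
\Longrightarrow\quad \frac{2m-1}{m}\,E_2 &> E \\
\Longrightarrow\quad E_2 &> \frac{m}{2m-1}\,E \;>\; \frac{E}{2}
      && \text{since } 2m > 2m-1 .
\end{align*}
```

So the Phase 2 lower bound is always more than half of the true kth sum, regardless of m.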
26Restricted Instance Optimality of TPUT ($\alpha = 1$)
- Assume D is a collection of m lists all following log-log slope function $C(n)$; then for any algorithm A,
  $cost(TPUT, D) \le cost(A, D) \cdot \big((m-1)\,C(2m) + C(m)\,k\big)$
- Proof: assume the optimal algorithm for D stops at position $b_i$ on list i; then $L(b_i) < E$
  - The number of objects in S from node i is $\le b_i \, C(2m)$
  - Each node sends $\le C(m)\,k$ objects in round-trip 2
27Effect of $\alpha < 1$
- Property
  - If object x appears in n nodes in Phase 2 and $U(x) \ge E_2$, then its average value in those nodes $R(x) \ge E_2 (1-\alpha)/n$
- Let $l_i$ be the number of objects in S that appear in exactly i nodes in Phase 2; then
  - $1 l_1 + 2 l_2 + 3 l_3 + \dots + m l_m \le C\big(m (1+\alpha)/\alpha\big) \sum_i b_i$
  - $l_1 + l_2 + \dots + l_i \le C\big(i (1+\alpha)/(1-\alpha)\big) \sum_i b_i$
- Size of S is $l_1 + l_2 + \dots + l_m$
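28Analysis of $\alpha < 1$
- What's the maximum $l_1 + l_2 + \dots + l_m$ under the following constraints?
  - $1 l_1 + 2 l_2 + 3 l_3 + \dots + m l_m \le C\big(m (1+\alpha)/\alpha\big)\,B$
  - $l_1 \le C(1\beta)\,B$
  - $l_1 + l_2 \le C(2\beta)\,B$
  - ...
  - $l_1 + l_2 + \dots + l_m \le C(m\beta)\,B$
  - where $\beta = (1+\alpha)/(1-\alpha)$, $B = \sum_i b_i$
- Solution: maximize $l_1, l_2, \dots, l_d$, and set $l_{d+1}, l_{d+2}, \dots, l_m$ to 0
  - $l_i = C(i\beta)\,B - C((i-1)\beta)\,B$
  - $d\,C(d\beta)\,B - \sum_{i=1}^{d-1} C(i\beta)\,B = C\big(m (1+\alpha)/\alpha\big)\,B$
  - Candidate set size $|S| = C(d\beta)\,B$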
29$\alpha$ for Zipf Distributions
- For Zipf distribution, where $C(n) = n$, the size of candidate set S is $c\sqrt{m}\,B$
- ⇒ Optimality ratio for TPUT with $\alpha < 1$ is $(m-1)\,c\sqrt{m} + (m/\alpha)\,k$
30TPUT for Hierarchical Networks
Phase 1: Lower-bound estimation
Phase 2: Selection by value / Pruning
Phase 3: Final lookup
[Figure: hierarchical network annotated with the candidate set S and the thresholds $t = (E/m)\,\alpha$ and $t = (E/mn)\,\alpha$]
31Summary and Future Work
- TPUT should be used for top-k queries in distributed networks
- TPUT is instance-optimal under the log-log slope function assumption
- Introducing $\alpha < 1$ improves performance significantly
- Future work
  - Evaluating TPUT for hierarchical and P2P networks
  - Distributed algorithms for other aggregate statistics
32Backup Slides
33Bandwidth Consumption of Threshold Algorithm
Trace     Raw Data   K=10 TA Unicast   K=10 TA Multicast   K=100 TA Unicast   K=100 TA Multicast
NL-10     26MB       56.3KB            25.9KB              318KB              132KB
WC-30     426KB      31KB              22KB                96KB               80KB
DEC-64    7.4MB      1.7MB             160KB               4.6MB              359KB
DEC-128   15MB       7.2MB             419KB               24.6MB             1.2MB
NL-203    44MB       22MB              1.2MB               143MB              4.2MB
UCB-512   78MB       423MB             16.1MB              1.47GB             31MB
34Bandwidth Consumption of TPUTHash
Trace     Raw Data   K=10 TPUT-H Unicast   K=10 TPUT-H Multicast   K=100 TPUT-H Unicast   K=100 TPUT-H Multicast
NL-10     26MB       8KB                   7KB                     52KB                   49KB
WC-30     426KB      44KB                  38KB                    99KB                   89KB
DEC-64    7.4MB      64KB                  59KB                    322KB                  300KB
DEC-128   15MB       161KB                 150KB                   870KB                  828KB
NL-203    44MB       154KB                 139KB                   764KB                  687KB
UCB-512   78MB       1.03MB                978KB                   15.8MB                 15.3MB
35Unicast-Bytes for Top-100 Objects
36Multicast-Bytes for Top-100 Objects
37Varying $\alpha$
38Fixed-Number Round Trip Algorithms
- Criteria by which a node decides to send objects
  - By position
  - By name
  - By value
- Any fixed-number round trip algorithm must include a "by value" operation
- Any algorithm that includes a "by value" operation won't be instance optimal
39TA Running over Networks
Node 1: (A, 10) (C, 8) (E, 8) (F, 8) (B, 7) (D, 5) (J, 1) . . .
Node 2: (B, 10) (D, 9) (F, 8) (H, 6) (G, 5) (C, 1) (A, 1) . . .
Node 3: (C, 10) (A, 9) (G, 8) (J, 7) (F, 6) (D, 4) (B, 1) . . .

CM:
Round 1: T = 26; looks up A, B, C, D ⇒ V(A) = 20, V(C) = 19; can't stop
Round 2: T = 21; looks up E, F, G, H, J ⇒ V(F) = 22, V(A) = 20; can't stop
Round 3: T = 10; stop
40TPUT
- Phase 3
  - Manager → Nodes: here is S; send me all objects in S
  - Nodes → Manager: here they are
  - Manager: calculate sums for objects in S; select the top k objects