Title: Efficient Top-k Queries in Large-Scale Networks
1Efficient Top-k Queries in Large-Scale Networks
- Pei Cao
- Cisco Systems, Inc.
- Consulting Faculty, Stanford University
2Motivation
- Enterprise content delivery networks (CDNs)
- CE: web cache and streaming media cache combined
- Number of branches: 50 to 2000
[Figure: a Central Manager in the data center, connected over 56 kbps, 128 kbps, or DSL links to CEs in the branch offices]
3Top-k Queries in CDNs
- Example queries
- Across all CEs, which URLs are accessed most often?
- Across all CEs, which domains consume the most storage?
- Across all CEs, which cached objects produced the biggest bandwidth savings?
- etc.
4Definitions
- A network of $m$ nodes, connected to a central manager (CM)
- Each node $i$ has a reverse-sorted list of $(x, V_i(x))$
- An object's sum:
- $V(x) = V_1(x) + V_2(x) + \cdots + V_m(x)$
- Problem: find the $k$ objects with the highest sums
- Goal: answer this question with minimum network traffic
- ⇒ A generic problem in distributed systems
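As a reference point for the definitions above, the following minimal sketch computes the sums $V(x)$ by brute force and picks the top $k$. The data are the visible entries of the three lists used in the deck's running example (an assumption: the elided tail of each list is treated as empty), and the names `NODES`, `brute_force_top_k`, and `pairs_shipped` are hypothetical.

```python
from collections import Counter

# Visible entries of the deck's three example lists (assumption: the
# elided tail of every list is ignored).
NODES = [
    {"A": 10, "C": 8, "E": 8, "F": 8, "B": 7, "D": 5, "J": 1, "K": 1},
    {"B": 10, "D": 9, "F": 8, "H": 6, "G": 5, "C": 1, "A": 1},
    {"C": 10, "A": 9, "G": 8, "J": 7, "F": 6, "D": 4, "B": 1},
]

def brute_force_top_k(nodes, k):
    """Sum V_i(x) over all nodes and return the k objects with the highest sums."""
    totals = Counter()
    for values in nodes:
        for obj, value in values.items():
            totals[obj] += value
    # Sort by descending sum; break ties by name so the result is deterministic.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]

def pairs_shipped(nodes):
    """Number of (object, value) pairs every node would have to send."""
    return sum(len(values) for values in nodes)

print(brute_force_top_k(NODES, 2))  # [('F', 22), ('A', 20)]
print(pairs_shipped(NODES))         # 22
```

Every pair crosses the network here, which is the cost the rest of the deck tries to avoid.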
5Existing Methods
- Each node sends the full list of objects and their values to the Central Manager
- Pro: simple to implement; works fine when the number of objects is small
- Con: when the number of objects is large, consumes too much network bandwidth
- Use the threshold algorithm (TA)
- Proposed by multiple groups in the database research community
6The Threshold Algorithm (TA)
- Example: find the top 2 objects with max sums in three columns
Node 1: (A, 10) (C, 8) (E, 8) (F, 8) (B, 7) (D, 5) (J, 1) (K, 1) . . .
Node 2: (B, 10) (D, 9) (F, 8) (H, 6) (G, 5) (C, 1) (A, 1) . . .
Node 3: (C, 10) (A, 9) (G, 8) (J, 7) (F, 6) (D, 4) (B, 1) . . .
Central Manager (CM), one line per row read:
T = 30: V(A) = 20, V(C) = 19, V(B) = 18
T = 26: V(A) = 20, V(C) = 19, ...
T = 24: V(F) = 22, V(A) = 20, ...
T = 21: V(F) = 22, V(A) = 20, ...
T = 18: V(F) = 22, V(A) = 20, ...
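The example above can be replayed with a small centralized sketch of TA. It assumes the stopping rule is "the k-th best exact sum is at least the threshold T", which is consistent with the thresholds shown on the slide. The names `LISTS` and `threshold_algorithm` are hypothetical, and only the visible list entries are used.

```python
import heapq

# Visible entries of the three example lists, in reverse-sorted order.
LISTS = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1), ("K", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]

def threshold_algorithm(lists, k):
    """Return (top_k, thresholds), reading one row of every list per step."""
    lookup = [dict(lst) for lst in lists]   # random access into each list
    totals = {}                             # exact sums of the objects seen so far
    thresholds = []
    best = []
    for row in range(min(len(lst) for lst in lists)):
        # Sorted access: take the next entry of every list and resolve the
        # exact sum of each newly seen object by random lookup.
        for lst in lists:
            obj = lst[row][0]
            if obj not in totals:
                totals[obj] = sum(d.get(obj, 0) for d in lookup)
        # Threshold T: sum of the values at the current row.
        t = sum(lst[row][1] for lst in lists)
        thresholds.append(t)
        best = heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])
        # Stop once the k-th best sum can no longer be beaten by unseen objects.
        if len(best) == k and best[-1][1] >= t:
            break
    return best, thresholds

top2, ts = threshold_algorithm(LISTS, 2)
print(top2)  # [('F', 22), ('A', 20)]
print(ts)    # [30, 26, 24, 21, 18]
```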
7Adapting TA for Distributed Environments
- Consists of multiple rounds
- Each round has two round trips
- Round trip 1, sorted access: CM asks for the next B objects on the lists, and nodes respond
- Round trip 2, random lookup: CM sends a list of object names to nodes, and nodes supply values
- B = k
8TA Running over Networks
Node 1: (A, 10) (C, 8) (E, 8) (F, 8) (B, 7) (D, 5) (J, 1) . . .
Node 2: (B, 10) (D, 9) (F, 8) (H, 6) (G, 5) (C, 1) (A, 1) . . .
Node 3: (C, 10) (A, 9) (G, 8) (J, 7) (F, 6) (D, 4) (B, 1) . . .
CM, one line per round:
T = 26: looks up A, B, C, D ⇒ V(A) = 20, V(C) = 19; can't stop
T = 21: looks up E, F, G, H, J ⇒ V(F) = 22, V(A) = 20; can't stop
T = 10: stop
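The rounds above can be reproduced with a sketch of the batched variant, where each round reads the next B = k entries from every list and then looks up all newly seen names. This is a simplified model (it resolves each new name against every node, and the names `LISTS` and `distributed_ta` are hypothetical); it is meant only to show how the threshold and the lookup list evolve per round.

```python
# Visible entries of the three example lists, in reverse-sorted order.
LISTS = [
    [("A", 10), ("C", 8), ("E", 8), ("F", 8), ("B", 7), ("D", 5), ("J", 1)],
    [("B", 10), ("D", 9), ("F", 8), ("H", 6), ("G", 5), ("C", 1), ("A", 1)],
    [("C", 10), ("A", 9), ("G", 8), ("J", 7), ("F", 6), ("D", 4), ("B", 1)],
]

def distributed_ta(lists, k, batch=None):
    """Return (top_k, log) where log holds (T, names looked up) per round."""
    batch = batch or k                      # the deck sets B = k
    lookup = [dict(lst) for lst in lists]
    totals, log, best, pos = {}, [], [], 0
    depth = min(len(lst) for lst in lists)
    while pos < depth:
        end = min(pos + batch, depth)
        # Round trip 1 (sorted access): each node returns its next `batch` entries.
        seen_now = {obj for lst in lists for obj, _ in lst[pos:end]}
        new_names = sorted(seen_now - set(totals))
        # Round trip 2 (random lookup): new names go to the nodes, values come back.
        for obj in new_names:
            totals[obj] = sum(d.get(obj, 0) for d in lookup)
        # Threshold: sum of the last value each node reported in this round.
        t = sum(lst[end - 1][1] for lst in lists)
        log.append((t, new_names))
        best = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        pos = end
        if len(best) == k and best[-1][1] >= t:
            break
    return best, log

top2, log = distributed_ta(LISTS, 2)
print(top2)  # [('F', 22), ('A', 20)]
print(log)   # [(26, ['A', 'B', 'C', 'D']), (21, ['E', 'F', 'G', 'H', 'J']), (10, [])]
```

Three rounds means six round trips for this input; a different input could need more, which is the concern raised next.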
9Problems with TA in Large Networks
- The number of round trips required varies with the input data
- High bandwidth consumption when the number of nodes is large
- In round trip 2, the list of random-lookup objects is the union of all objects sent by the m nodes in round trip 1
- In round trip 2, the list goes to all m nodes
10New Algorithm Two-Phase Uniform Threshold (TPUT)
- Motivation: the algorithm should terminate in a fixed (and small) number of round trips
- Operates in two phases
- Phase 1: get a lower-bound estimate of the bottom value in the top-k set (i.e. the true bottom, denoted as E)
- Phase 2: all nodes send objects whose sums are potentially higher than the lower bound; CM aggregates the info, refines the estimate, determines the candidates, and looks up the candidates in all nodes
11Partial Sums and Upper Bounds
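- Partial sum: $PS(x) = \sum_i V'_i(x)$
- Upper bound: $U(x) = \sum_i U_i(x)$
$$V'_i(x) = \begin{cases} V_i(x), & \text{if } x \text{ has been reported by node } i \text{ to CM} \\ 0, & \text{otherwise} \end{cases}$$
$$U_i(x) = \begin{cases} V_i(x), & \text{if } x \text{ has been reported by node } i \text{ to CM} \\ L_i, & \text{otherwise} \end{cases}$$
$L_i$ is the lowest value that node $i$ has reported to CM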
12Examples
Node 1: (A, 10) (C, 8) (E, 8) (F, 8) (B, 7) (D, 5) (J, 1) . . .
Node 2: (B, 10) (D, 9) (F, 8) (H, 6) (G, 5) (C, 1) (A, 1) . . .
Node 3: (C, 10) (A, 9) (G, 8) (J, 7) (F, 6) (D, 4) (B, 1) . . .
CM:
PS(A) = 10 + 0 + 9 = 19, U(A) = 10 + 9 + 9 = 28
PS(B) = 0 + 10 + 0 = 10, U(B) = 8 + 10 + 9 = 27
For any object O, PS(O) ≤ V(O) ≤ U(O)
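The two numbers in this example can be recomputed with a short sketch of the partial-sum and upper-bound definitions. It assumes each node has so far reported its two largest entries (which is what the arithmetic on the slide implies) and that every node has reported at least one entry; `REPORTS` and `partial_sum_and_upper_bound` are hypothetical names.

```python
# What each node has reported so far (assumption: its two largest entries).
REPORTS = [
    {"A": 10, "C": 8},   # node 1, lowest reported value L_1 = 8
    {"B": 10, "D": 9},   # node 2, lowest reported value L_2 = 9
    {"C": 10, "A": 9},   # node 3, lowest reported value L_3 = 9
]

def partial_sum_and_upper_bound(reports, obj):
    """Return (PS(obj), U(obj)) given the pairs each node has reported."""
    ps = ub = 0
    for reported in reports:
        if obj in reported:
            # Reported value counts fully in both PS and U.
            ps += reported[obj]
            ub += reported[obj]
        else:
            # Unreported: contributes 0 to PS and at most L_i to U.
            ub += min(reported.values())
    return ps, ub

print(partial_sum_and_upper_bound(REPORTS, "A"))  # (19, 28)
print(partial_sum_and_upper_bound(REPORTS, "B"))  # (10, 27)
```

The true sums are V(A) = 20 and V(B) = 18, so both lie between the partial sum and the upper bound.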
13Steps in TPUT
- Round trip 1
- Manager → Nodes: start top-k query
- Nodes → Manager: here are my top-k objects
- Manager
- Calculate partial sums of all objects and sort them
- Take the kth value, call it $E_1$; $E_1 \le E$
- Set $t = E_1/m$
- Round trip 2
- Manager → Nodes: send me all objects with value $\ge t$
- Nodes → Manager: here they are
- Manager
- Calculate partial sums of all objects and sort them; take the kth value, call it $E_2$; $E_1 \le E_2 \le E$
- For each object, calculate its upper bound; select those objects whose upper bounds are $\ge E_2$; call the set $S$
14TPUT
- Round trip 3
- Manager → Nodes: here is S; send me all objects in S
- Nodes → Manager: here they are
- Manager: calculate sums for objects in S; select the top k objects
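The three round trips can be put together in a minimal single-process sketch. Several choices here are assumptions rather than the author's implementation: nodes are modeled as dictionaries, the upper bound uses $t$ for a node that has not reported an object (any unreported value is below $t$), and an object is kept when its upper bound is at least $E_2$. The names `NODES` and `tput` are hypothetical, and the data are the visible entries of the deck's example lists.

```python
NODES = [
    {"A": 10, "C": 8, "E": 8, "F": 8, "B": 7, "D": 5, "J": 1},
    {"B": 10, "D": 9, "F": 8, "H": 6, "G": 5, "C": 1, "A": 1},
    {"C": 10, "A": 9, "G": 8, "J": 7, "F": 6, "D": 4, "B": 1},
]

def tput(nodes, k):
    """Return (top_k, trace) after three simulated round trips."""
    m = len(nodes)

    def kth_partial_sum(reports):
        ps = {}
        for rep in reports:
            for obj, v in rep.items():
                ps[obj] = ps.get(obj, 0) + v
        return ps, sorted(ps.values(), reverse=True)[k - 1]

    # Round trip 1: every node reports its k largest entries.
    top_k = [dict(sorted(n.items(), key=lambda kv: (-kv[1], kv[0]))[:k])
             for n in nodes]
    _, e1 = kth_partial_sum(top_k)
    t = e1 / m                                   # uniform threshold

    # Round trip 2: every node reports all entries with value >= t.
    above_t = [{o: v for o, v in n.items() if v >= t} for n in nodes]
    reports = [{**a, **b} for a, b in zip(top_k, above_t)]
    ps, e2 = kth_partial_sum(reports)

    # Upper bound: reported value where known, otherwise t.
    def upper(obj):
        return sum(rep.get(obj, t) for rep in reports)

    candidates = sorted(o for o in ps if upper(o) >= e2)

    # Round trip 3: fetch exact values of the candidates from every node.
    totals = {o: sum(n.get(o, 0) for n in nodes) for o in candidates}
    top = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return top, {"E1": e1, "t": t, "E2": e2, "S": candidates}

top2, trace = tput(NODES, 2)
print(top2)   # [('F', 22), ('A', 20)]
print(trace)  # E1 = 18, t = 6.0, E2 = 19, S = A, B, C, D, E, F, G, J
```

With the "at least $E_2$" rule this sketch keeps J, whose upper bound equals $E_2 = 19$; H (upper bound 18) is pruned, and the final answer is F and A.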
15Example
Node 1: (A, 10) (C, 8) (E, 8) (F, 8) (B, 7) (D, 5) (J, 1) . . .
Node 2: (B, 10) (D, 9) (F, 8) (H, 6) (G, 5) (C, 1) (A, 1) . . .
Node 3: (C, 10) (A, 9) (G, 8) (J, 7) (F, 6) (D, 4) (B, 1) . . .
CM:
Round trip 1: PS(A) = 19, PS(C) = 18 ⇒ E1 = 18, t = 6
Round trip 2: PS(F) = 22, PS(A) = 19 ⇒ E2 = 19; U(H) = 18, U(J) = 19 ⇒ H and J are out! S = (A, B, C, D, E, F, G)
Round trip 3: S(F) = 22, S(A) = 20, S(C) = 19; the top 2 objects are F and A.
16Improving the Pruning Power
- Observation: if $E_2 = E_1$, then no object can be pruned away
- Solution: set $t = (E_1/m)\,\alpha$, where $0 < \alpha < 1$
- Effect
- Increases traffic in round trip 2
- But shrinks the set of candidates, hence reduces traffic in round trip 3
- Optimal $\alpha$ depends on the data set
- Default $\alpha = 0.5$
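The trade-off described above can be observed on the deck's small example with the sketch below. It counts the pairs sent in round trip 2 and the size of the candidate set for a given alpha. The modeling choices are assumptions (upper bounds use $t$ for unreported values, objects are kept when the bound is at least $E_2$), and `NODES` and `tput_costs` are hypothetical names.

```python
NODES = [
    {"A": 10, "C": 8, "E": 8, "F": 8, "B": 7, "D": 5, "J": 1},
    {"B": 10, "D": 9, "F": 8, "H": 6, "G": 5, "C": 1, "A": 1},
    {"C": 10, "A": 9, "G": 8, "J": 7, "F": 6, "D": 4, "B": 1},
]

def tput_costs(nodes, k, alpha):
    """Return (pairs sent in round trip 2, size of the candidate set S)."""
    m = len(nodes)

    def kth_partial_sum(reports):
        ps = {}
        for rep in reports:
            for obj, v in rep.items():
                ps[obj] = ps.get(obj, 0) + v
        return ps, sorted(ps.values(), reverse=True)[k - 1]

    # Round trip 1: top-k entries per node give E1.
    top_k = [dict(sorted(n.items(), key=lambda kv: (-kv[1], kv[0]))[:k])
             for n in nodes]
    _, e1 = kth_partial_sum(top_k)
    t = alpha * e1 / m                      # scaled uniform threshold

    # Round trip 2: entries at or above the threshold.
    above_t = [{o: v for o, v in n.items() if v >= t} for n in nodes]
    reports = [{**a, **b} for a, b in zip(top_k, above_t)]
    ps, e2 = kth_partial_sum(reports)

    # Candidates: upper bound (reported value, else t) at least E2.
    candidates = [o for o in ps
                  if sum(rep.get(o, t) for rep in reports) >= e2]
    return sum(len(r) for r in above_t), len(candidates)

print(tput_costs(NODES, 2, alpha=1.0))  # (14, 8)
print(tput_costs(NODES, 2, alpha=0.5))  # (17, 4)
```

On this input, halving the threshold sends three more pairs in round trip 2 but halves the number of candidates that must be looked up in round trip 3.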
17Compression via Hashing
- Problem: object IDs can be too long
- Solution: send hashed keys of object IDs
- Node reports to CM: (hash(o), V(o))
- If hash(o1) = hash(o2), then V = max(V(o1), V(o2))
- Candidate set S is a set of hashed keys
- Size of key: log(# of objects in all nodes)
- Effect
- Maintains correctness of pruning by upper bounds
- However, might need an additional round trip
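A node-side sketch of the hashed report is shown below. The hash function (truncated SHA-256) and the names `short_key` and `hashed_report` are assumptions made for illustration; the slide does not say which hash is used. Colliding objects are merged by taking the maximum value, as stated above.

```python
import hashlib

def short_key(object_id: str, bits: int) -> int:
    """Map an object ID to a `bits`-bit key (assumed hash: truncated SHA-256)."""
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (1 << bits)

def hashed_report(values, bits):
    """Build {key: value} for one node, merging collisions with max()."""
    report = {}
    for obj, v in values.items():
        key = short_key(obj, bits)
        report[key] = max(report.get(key, v), v)
    return report

node1 = {"A": 10, "C": 8, "E": 8, "F": 8, "B": 7, "D": 5, "J": 1, "K": 1}
report = hashed_report(node1, bits=2)   # deliberately tiny keys to force collisions
print(len(node1), "objects ->", len(report), "keys")
```

Because a merged key carries the largest colliding value, the value reported for a key is never below the value of any object behind it, which is why upper-bound pruning stays safe. Resolving which object a surviving key refers to is what may cost the extra round trip.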
18Evaluating TPUT Algorithm
- Trace-driven simulation
- Optimality analysis
19Trace Data for Simulations
- NLANR-10: daily web accesses from 10 NLANR proxies
- Worldcup-30: 2-hour web accesses from 30 servers hosting 1997 WorldCup
- DEC-64: split one-day DEC traces into 64 sub-traces by client IP
- Simulating an enterprise with 64 branch offices
- DEC-128: split two-day DEC traces into 128 sub-traces by client IP
- Simulating an enterprise with 128 branch offices
- NLANR-208: split NLANR traces into 208 sub-proxy traces by client IP
- Simulating an enterprise CDN of 208 nodes
- Berkeley-512: split one-week UCB traces into 512 sub-traces
- Simulating 512 branch offices with 16 people per branch
20Performance Metrics
- Communication costs
- Messages are always compressed by gzip
- Unicast-bytes: assuming CM communicates with nodes via unicast
- Multicast-bytes: assuming CM broadcasts to nodes
21Results on Unicast-Bytes
22Results on Multicast-Bytes
23Optimality Analysis
- Main results
- TPUT is instance optimal for data sets following a log-log slope function.
- Zipf distribution is a special case.
- Zipf distribution: opt-ratio $(m-1)\cdot 2m + km$
- Setting $\alpha < 1$ reduces cost qualitatively.
- Zipf distribution: ratio $(m-1)\cdot O(\sqrt{m}) + k\cdot m/\alpha$
24General Instance Optimality
- Definition
- An algorithm $T$ is instance-optimal with optimality ratio $C_1$ if there exists $C_2$ such that, for any data series $D$ and any algorithm $A$,
- $\mathrm{cost}(T, D) \le C_1 \cdot \mathrm{cost}(A, D) + C_2$
- cost is the amount of network traffic
- Threshold Algorithm is instance optimal with opt-ratio $O(m^2)$
25Worst Cases for Fixed-Number Round Trip Algorithms
- TPUT is not general instance optimal
- Nor can any algorithm that terminates in a fixed
number of round trips regardless of input
Example: finding the object with the highest sum
Node 1: (A, 1) (C, 1) (X1, 0.6) (X2, 0.6) . . . (Xn, 0.6) (B, 0.5) . . .
Node 2: (B, 1) (D, 0.2) . . .
26Log-Log Slope Function
- $L(j)$ is the value at position $j$ in a reverse-sorted list
- The list satisfies log-log slope function $C(n)$ if, for all $j \ge k$, $L(j\,C(n)) < L(j)/n$
- For a Zipf-like distribution, $L(j) = 1/j^{\theta}$ and $C(n) = n^{1/\theta}$.
[Figure: a reverse-sorted list with positions 1, $j$, and $j\,C(n)$ marked; the value at position $j$ is $L(j)$, and the value at position $j\,C(n)$ is $< L(j)/n$]
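For the Zipf-like case, a one-line check shows where the slope function comes from. It writes $\theta$ for the Zipf exponent and treats list positions as continuous, both of which are assumptions of this sketch.

```latex
L\bigl(j\,C(n)\bigr)
  = \frac{1}{\bigl(j\,n^{1/\theta}\bigr)^{\theta}}
  = \frac{1}{j^{\theta}\,n}
  = \frac{L(j)}{n}
```

Moving from position $j$ to position $j\,C(n)$ therefore divides the value by $n$; in this idealized continuous form the bound is met with equality.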
27Properties of the Two Lower Bounds
- $E_1 \ge E/m$, where $E$ is the true bottom
- $E_2 > E/2$
- $E_2 \ge E_1$
- For any $x$, $V(x) - PS(x) < (m-1)\,t \;\Rightarrow\; V(x) - PS(x) < (m-1)\,E_1/m \;\Rightarrow\; E - E_2 < E_1(m-1)/m \;\Rightarrow\; E_2 > E - E_1(m-1)/m$
- Combined with $E_2 \ge E_1$: $E_2 > \bigl(m/(2m-1)\bigr)\,E$
- Consequently
- Since $L(k) \le E_1$ in every node, each node sends at most $k\,C(m)$ objects to the manager in round trip 2
- A candidate in round trip 3 has average value $R > E/(2m)$
28Restricted Instance Optimality of TPUT (α = 1)
- Assume $D$ is a collection of $m$ lists all following log-log slope function $C(n)$; then for any algorithm $A$,
- $\mathrm{cost}(\mathrm{TPUT}, D) \le \mathrm{cost}(A, D)\cdot\bigl((m-1)\,C(2m) + C(m)\,k\bigr)$
- Proof: assume the optimal algorithm for $D$ stops at position $b_i$ on list $i$; then $L(b_i) < E$ ⇒ the number of candidates in round trip 3 is at most $b_i\,C(2m)$ per list $i$
29Effect of α < 1
- Intuition
- If an object appears in few nodes and still makes the cut, then its average value must be high
- If an object has a small value and makes the cut, then it must appear in many nodes
- Let $l_i$ be the number of objects that appear in exactly $i$ nodes from round trip 2; then
- $1\,l_1 + 2\,l_2 + 3\,l_3 + \cdots + m\,l_m \le C\bigl(m(1+\alpha)/\alpha\bigr)\,\sum b_i$
- For each $i$, if an object appears in fewer than $i$ nodes and still makes the cut, then its average value $R \ge E_2(1-\alpha)/i$ ⇒ $l_1 + l_2 + \cdots + l_i \le C\bigl(i(1+\alpha)/(1-\alpha)\bigr)\,\sum b_i$
- Size of the candidate set is $l_1 + l_2 + \cdots + l_m$
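30Analysis of α < 1
- What's the maximum of $l_1 + l_2 + \cdots + l_m$ under the following constraints?
- $1\,l_1 + 2\,l_2 + 3\,l_3 + \cdots + m\,l_m \le C\bigl(m(1+\alpha)/\alpha\bigr)\,B$
- $l_1 \le C(1\cdot\beta)\,B$
- $l_1 + l_2 \le C(2\beta)\,B$
- ...
- $l_1 + l_2 + \cdots + l_m \le C(m\beta)\,B$
- where $\beta = (1+\alpha)/(1-\alpha)$, $B = \sum b_i$
- Solution: maximize $l_1, l_2, \ldots, l_d$, and set $l_{d+1}, l_{d+2}, \ldots, l_m$ to 0
- $l_i = C(i\beta)\,B - C((i-1)\beta)\,B$
- $d\,C(d\beta)\,B - \sum_i C(i\beta)\,B \le C\bigl(m(1+\alpha)/\alpha\bigr)\,B$
- Candidate set size: $|S| \le C(d\beta)\,B$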
31α for Zipf Distributions
- For a Zipf distribution, where $C(n) = n$, the size of the candidate set is $O(\sqrt{m})\,\sum b_i$
- ⇒ Optimality ratio for TPUT with $\alpha < 1$ is $(m-1)\,c\sqrt{m} + mk$
- Optimal $\alpha$ depends on $m$, but should be $> 1/3$; default 0.5
32Summary and Open Questions
- The TPUT algorithm works well for top-k queries in distributed networks
- Introducing $\alpha = 0.5$ improves performance significantly
- TPUT is instance-optimal under the log-log slope function assumption
- Easy to extend the algorithm to hierarchical networks
- Open question
- Is TPUT instance optimal compared with all fixed-round-trip algorithms over all data sets?
33Performance of Threshold Algorithm
Trace | Raw Data | K=10 TA Unicast | K=10 TA Multicast | K=100 TA Unicast | K=100 TA Unicast
NL-10 26MB
WC-30
DEC-64
DEC-128
NL-208
UCB-512
34Unicast-Bytes for Top-100 Objects
35Multicast-Bytes for Top-100 Objects
36Fixed-Number Round Trip Algorithms
- Criteria by which a node decides to send objects
- By position
- By name
- By value
- Any fixed-number round trip algorithm must include a by-value operation
- Any algorithm that includes a by-value operation won't be instance optimal
37Why Uniform Threshold?
Node 1: (A, 10) (C, 8)
Node 2: (B, 10) (D, 9)
Node 3: (C, 10) (A, 9)
CM: T = 8 + 9 + 9 = 26, E1 = 18 ⇒ could set a per-node threshold $t_i = L_i \cdot E_1 / T$
- Benefit of uniform threshold: $E_2 > E/2$, where $E$ is the true bottom
- $E_2 \ge E_1$
- $E_2 \ge E - \frac{m-1}{m}\,E_1$
- because $V(x) - PS(x) < (m-1)\,t$ for all $x$
- ⇒ $E_2 \ge \bigl(m/(2m-1)\bigr)\,E$
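To make the alternative concrete, the sketch below computes the per-node thresholds $t_i = L_i \cdot E_1 / T$ for the numbers on this slide and compares them with the uniform threshold $E_1/m$. The function names are hypothetical, and exact fractions are used only to keep the arithmetic free of rounding.

```python
from fractions import Fraction

def per_node_thresholds(lowest_reported, e1):
    """t_i = L_i * E1 / T, where T is the sum of the lowest reported values."""
    total = sum(lowest_reported)            # T on the slide
    return [Fraction(l * e1, total) for l in lowest_reported]

def uniform_threshold(m, e1):
    """t = E1 / m, the same for every node."""
    return Fraction(e1, m)

ts = per_node_thresholds([8, 9, 9], 18)
print([str(t) for t in ts])        # ['72/13', '81/13', '81/13']
print(sum(ts))                     # 18
print(uniform_threshold(3, 18))    # 6
```

Both schemes split the same budget $E_1$ across the nodes; the slide's argument for the uniform split is the guarantee $E_2 > E/2$ derived above.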