Title: Energy Efficient Exact kNN Search in Wireless Broadcast Environments
1. Energy Efficient Exact kNN Search in Wireless Broadcast Environments
- Bugra Gedik, Aameek Singh, Ling Liu
- College of Computing
- Georgia Institute of Technology
2. Motivation
- There is strong interest in location-based services (LBSs)
  - Popularity of mobile wireless communications
  - Emergence of positioning technologies like GPS
- LBSs provide mobile users with information that is specialized to their location, e.g.
  - the nearest gas station or the five closest restaurants
- LBSs are accessed through a common wireless channel, which connects the users to the service provider
  - Limited network bandwidth -> wireless broadcast
  - Power-constrained mobile devices -> energy-efficient search
- An interesting problem in this domain is
  - Investigation of indexing and searching mechanisms for energy-efficient querying of location-dependent data in wireless broadcast environments
3. Problem Definition
- kNN Search on the Air problem
  - Broadcasting location-dependent data together with a spatial index on the wireless medium, and searching this broadcast on mobile client devices in order to answer k nearest neighbor (kNN) queries in an energy-efficient way
- Traditional mechanisms do not work well
  - Access to the medium is sequential
- Methods based on approximate results [30]
- How about exact search?
4. Some Example Applications
- Commercial Setting
  - The server can broadcast locations of gas stations
  - A car driver can pose a query like "Give me the positions of the 3 nearest gas stations"
- Military Setting
  - The server can build a spatial index on positions of military units and broadcast it to the battlefield (possibly encrypted)
  - Then the mobile devices can tune in and process the broadcast to answer spatial queries
  - One such query can be posed by a tank as "Give me the positions and names of the 10 nearest friendly units"
- kNN search on low-dimensional spaces for sequential access mediums
5. Background: Air Indexing
- Air indexes
  - In wireless broadcast, it is crucial that energy is conserved on the mobile unit side when answering queries on the broadcast data.
  - To alleviate the vast energy consumption problem of searching un-indexed data, air indexes were introduced.
- Access time: latency measure
  - the time difference between the point at which a query is posed and the point at which the result of the query is fully computed
- Tune-in time: energy consumption measure
  - the total time during which the mobile unit was listening to data from the wireless medium (number of packets read)
- Air indexing strives to decrease tune-in time while keeping the increase in access latency due to the broadcast of extra index information minimal.
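The two measures can be made concrete with a minimal sketch. It assumes a broadcast made of equal-sized packets and a client that dozes between the packets it needs; the function name `access_and_tune_in` and the packet numbers are illustrative assumptions, not taken from the original work.

```python
def access_and_tune_in(query_slot, slots_read):
    """Return (access_time, tune_in_time), both measured in packets.

    query_slot: index of the packet on the air when the query is posed.
    slots_read: indices (>= query_slot) of the packets the client listens to;
                in between, the client is assumed to be in doze mode.
    """
    if not slots_read:
        return 0, 0
    access_time = max(slots_read) - query_slot + 1   # elapsed packets until done
    tune_in_time = len(set(slots_read))              # packets actually received
    return access_time, tune_in_time


# Hypothetical indexed broadcast: 3 index packets, then a jump to 2 data packets.
indexed = access_and_tune_in(query_slot=10, slots_read=[10, 11, 14, 40, 41])
# Hypothetical un-indexed broadcast: listen to every packet until the data ends.
unindexed = access_and_tune_in(query_slot=10, slots_read=list(range(10, 38)))
```

In this made-up scenario the index lengthens the access time slightly (32 versus 28 packets) but cuts the tune-in time from 28 packets to 5, which is the tradeoff the slide describes.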
6. Background: R-trees
- R-trees are spatial index structures widely used to index n-dimensional points or rectangles
- Practical for secondary storage, nodes correspond to disk blocks (packets in wireless broadcast)
- R-trees can be thought of as the multidimensional version of B-trees
7. Background: R-trees (kNN search)
- A branch and bound algorithm
  - Uses a heuristic to select branches to follow
  - The need for backtracking makes the algorithm ill suited for the wireless medium, because of its sequential access nature
- Metrics used in the algorithm
  - Given a point P and the mbr of a node N,
  - MINDIST(P,N) is the minimum distance from P to N's mbr
  - MAXDIST(P,N) is the maximum distance from P to N's mbr
  - MINMAXDIST(P,N) is the maximum possible minimum distance between P and the mbr of the closest object residing in N's mbr
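The three metrics can be sketched as follows, assuming Euclidean distance, point objects, and an mbr given by its lower and upper corners `lo` and `hi` (these parameter names are assumptions). MINMAXDIST follows the usual R-tree formulation: every face of a minimum bounding rectangle touches at least one object, so the nearest face along some dimension bounds the distance to the closest object.

```python
import math

def mindist(p, lo, hi):
    """Smallest distance from point p to the box [lo, hi] (0 if p is inside)."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(p, lo, hi)))

def maxdist(p, lo, hi):
    """Distance from p to the farthest corner of the box."""
    return math.sqrt(sum(max(abs(x - l), abs(x - h)) ** 2
                         for x, l, h in zip(p, lo, hi)))

def minmaxdist(p, lo, hi):
    """Upper bound on the distance from p to the nearest object in the box."""
    # Squared distance to the farther and to the nearer edge, per dimension.
    far = [max(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(p, lo, hi)]
    near = [min(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(p, lo, hi)]
    total_far = sum(far)
    # For each dimension k: take the nearer face along k and the farthest
    # point on that face; the smallest such value is the bound.
    return math.sqrt(min(total_far - far[k] + near[k] for k in range(len(p))))
```

For any point and box the values satisfy MINDIST <= MINMAXDIST <= MAXDIST, which is why MINDIST is used for pruning and the other two as upper bounds.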
8-11. Background: R-trees (kNN search)
- ItemQueue is a priority queue that stores nodes to explore. It is sorted on the MINDIST measure and does not have a predefined size.
- ResultQueue is a list of maximum size k that stores the best k nodes seen so far. It is sorted on the MINMAXDIST measure.
- kthdist: the MINMAXDIST value of the kth item in ResultQueue, infinity if fewer than k items are present. It is an upper bound on the distance of the kth nearest neighbor.
- Example trace (k = 2), one line per slide build
  - ItemQueue: (R0,0); ResultQueue: (empty); kthdist: ∞
  - ItemQueue: (R1,0) (R2,a); ResultQueue: (R2,b) (R1,c); kthdist: c
  - ItemQueue: (R4,d) (R3,f) (R5,h) (R2,a); ResultQueue: (R3,g) (R2,b); kthdist: b
  - ItemQueue: (R3,f) (R5,h) (R2,a); ResultQueue: (p13,d13) (R3,g); kthdist: g (p11, p12 are pruned since d11, d12 are both > b)
  - ItemQueue: (R5,h) (R2,a); ResultQueue: (p13,d13) (p1,d1); kthdist: d1
- Result: p13, p1
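The queue mechanics traced above can be sketched for an in-memory tree as follows. This is a minimal illustration under stated assumptions, not the authors' code: the dictionary-based nodes, the helpers `leaf` and `inner`, and the sample points are all hypothetical, and objects are assumed to be points.

```python
import heapq
import math

def mindist(q, box):
    lo, hi = box
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def minmaxdist(q, box):
    lo, hi = box
    far = [max(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(q, lo, hi)]
    near = [min(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(q, lo, hi)]
    return math.sqrt(min(sum(far) - f + n for f, n in zip(far, near)))

def leaf(points):
    """Leaf node: holds points and their minimum bounding rectangle."""
    box = (tuple(map(min, zip(*points))), tuple(map(max, zip(*points))))
    return {"box": box, "points": points, "children": []}

def inner(children):
    """Internal node: mbr computed from the corners of the children's mbrs."""
    corners = [c for ch in children for c in ch["box"]]
    box = (tuple(map(min, zip(*corners))), tuple(map(max, zip(*corners))))
    return {"box": box, "points": [], "children": children}

def knn(root, q, k):
    item_queue = [(0.0, 0, root)]   # min-heap on MINDIST
    result_queue = []               # (bound, seq, entry), sorted, at most k items
    seq = 1                         # tie-breaker so entries are never compared

    def kthdist():
        return result_queue[k - 1][0] if len(result_queue) >= k else math.inf

    def offer(bound, entry):
        nonlocal seq
        result_queue.append((bound, seq, entry))
        seq += 1
        result_queue.sort(key=lambda e: (e[0], e[1]))
        del result_queue[k:]

    while item_queue:
        d, _, node = heapq.heappop(item_queue)
        if d > kthdist():
            break                   # everything left in the heap is farther still
        # The node's own MINMAXDIST entry is superseded by what it contains.
        result_queue[:] = [e for e in result_queue if e[2] is not node]
        for p in node["points"]:
            offer(math.dist(q, p), p)
        for child in node["children"]:
            offer(minmaxdist(q, child["box"]), child)
        for child in node["children"]:
            md = mindist(q, child["box"])
            if md <= kthdist():
                heapq.heappush(item_queue, (md, seq, child))
                seq += 1
    return [e[2] for e in result_queue]

points = [(1, 1), (2, 3), (4, 1), (6, 5), (7, 2), (8, 8), (3, 7), (9, 4)]
root = inner([inner([leaf(points[0:2]), leaf(points[2:4])]),
              inner([leaf(points[4:6]), leaf(points[6:8])])])
result = knn(root, q=(5, 2), k=2)
```

Because the heap is ordered on MINDIST, the loop can stop as soon as the closest unexplored node is farther than kthdist; this early exit is exactly what sequential access to a broadcast takes away.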
12. Adapting the Algorithm for Wireless Medium (w-conv alg)
- It is not possible to sort ItemQueue on the MINDIST measure
  - MINDIST ordering of tree nodes is not consistent with their order of appearance in the broadcast
  - Reading a node from the medium based on the topmost item in the MINDIST-sorted ItemQueue may result in leaving behind other tree nodes that have entries in ItemQueue
  - As a result, the items in ItemQueue have to be sorted based on their appearance order on the medium
- We cannot immediately stop the search when the ResultQueue consists of objects only
  - It is not guaranteed that the rest of the items in ItemQueue cannot generate an object closer than the current kth result
  - Because the queue is no longer sorted on the MINDIST measure
  - As a result, the search halts when the ItemQueue becomes empty
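A minimal sketch of this adaptation follows, assuming the broadcast cycle is a list of index nodes in order of appearance and that a node whose MINDIST already exceeds kthdist is skipped without being read. The `broadcast` layout, the function name `w_conv`, and the sample data are hypothetical.

```python
import heapq
import math

def mindist(q, box):
    lo, hi = box
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def minmaxdist(q, box):
    lo, hi = box
    far = [max(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(q, lo, hi)]
    near = [min(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(q, lo, hi)]
    return math.sqrt(min(sum(far) - f + n for f, n in zip(far, near)))

# One broadcast cycle in order of appearance: (mbr, child positions, points).
# Child mbrs would really be stored inside the parent packet; looking them up
# via broadcast[c][0] stands in for that.
broadcast = [
    (((1, 1), (9, 8)), [1, 4], []),                 # 0: root
    (((1, 1), (6, 5)), [2, 3], []),                 # 1
    (((1, 1), (2, 3)), [], [(1, 1), (2, 3)]),       # 2: leaf
    (((4, 1), (6, 5)), [], [(4, 1), (6, 5)]),       # 3: leaf
    (((3, 2), (9, 8)), [5, 6], []),                 # 4
    (((7, 2), (8, 8)), [], [(7, 2), (8, 8)]),       # 5: leaf
    (((3, 4), (9, 7)), [], [(3, 7), (9, 4)]),       # 6: leaf
]

def w_conv(broadcast, q, k):
    item_queue = [0]        # min-heap of broadcast positions (root comes first)
    result_queue = []       # sorted (bound, entry) pairs, at most k of them
    tune_in = 0

    def kthdist():
        return result_queue[k - 1][0] if len(result_queue) >= k else math.inf

    def offer(bound, entry):
        result_queue.append((bound, entry))
        result_queue.sort()
        del result_queue[k:]

    while item_queue:                       # halt only when ItemQueue is empty
        pos = heapq.heappop(item_queue)     # next queued node to appear on the air
        box, children, points = broadcast[pos]
        if mindist(q, box) > kthdist():
            continue                        # bound tightened meanwhile: stay asleep
        tune_in += 1                        # the packet is actually read
        result_queue[:] = [e for e in result_queue if e[1] != ("node", pos)]
        for p in points:
            offer(math.dist(q, p), ("obj", p))
        for c in children:
            offer(minmaxdist(q, broadcast[c][0]), ("node", c))
        for c in children:
            if mindist(q, broadcast[c][0]) <= kthdist():
                heapq.heappush(item_queue, c)
    return [e[1][1] for e in result_queue], tune_in

neighbours, packets_read = w_conv(broadcast, q=(1, 2), k=2)
```

Ordering ItemQueue by broadcast position means the client never has to wait for the next cycle to read a queued node, at the price of visiting nodes in an order unrelated to their distance from the query point.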
13. Optimizing the Algorithm for Wireless Medium (w-opt alg)
- After the root node is processed
  - ItemQueue: u, v, w
  - ResultQueue: (v,5), (w,14)
  - kthdist = 14
- Node v cannot be pruned
  - In fact, in this example it is possible to prune it
- Knowledge
  - If the minimum fanout of the tree is f, then we know that there are at least f^{l-1} objects under node w's mbr
14. Optimizing the Algorithm for Wireless Medium (w-opt alg)
- Given this knowledge, at the time when we add w to ItemQueue
  - There is one object at most 5 away from the query point
  - There are at least f^{l-1} - 1 ≥ 1 objects at most 10 away from the query point
- Then kthdist = 10 after the root node is processed
  - Now v can be pruned
- New Rule (w-opt algorithm)
  - While adding a node N at level i to ResultQueue with its MINMAXDIST measure, we also insert f^{i-1} - 1 additional entries with the MAXDIST measure of node N. ResultQueue is sorted on the associated measures of the entries
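The effect of the rule on kthdist can be sketched in a few lines. The sketch assumes a node at level `level` guarantees at least `f ** (level - 1)` objects, as stated on the previous slide. The distances 5, 14 and 10 echo the slide's example, but the node names `a` and `b`, the assignment of distances to nodes, and the MAXDIST of 20 are assumptions made for illustration.

```python
import math

def kthdist(result_queue, k):
    """Upper bound on the kth nearest neighbour distance (inf if < k entries)."""
    return result_queue[k - 1][0] if len(result_queue) >= k else math.inf

def add_node(result_queue, k, name, level, f, minmax, maxd, w_opt=True):
    """Insert a node's bound(s) into ResultQueue and keep the k smallest."""
    result_queue.append((minmax, name))              # one object within MINMAXDIST
    if w_opt:
        guaranteed = f ** (level - 1)                # assumed lower bound on objects
        extra = min(guaranteed - 1, k)               # more than k entries is useless
        result_queue.extend([(maxd, name)] * extra)  # the rest lie within MAXDIST
    result_queue.sort()
    del result_queue[k:]

k, f = 2, 2
conv, opt = [], []
for queue, flag in ((conv, False), (opt, True)):
    add_node(queue, k, "a", level=2, f=f, minmax=5, maxd=10, w_opt=flag)
    add_node(queue, k, "b", level=2, f=f, minmax=14, maxd=20, w_opt=flag)

conv_bound = kthdist(conv, k)   # bound from MINMAXDIST entries only
opt_bound = kthdist(opt, k)     # bound tightened by the extra MAXDIST entry
```

With these numbers a third node whose MINDIST is, say, 12 survives under the MINMAXDIST-only bound of 14 but is pruned under the tightened bound of 10.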
15. Does Serialization Order Matter?
- The way a spatial index is organized on the broadcast medium affects the number of index nodes read by the kNN search, and hence the tune-in time
- Previous work on range and kNN search in broadcast environments [28, 10, 30, 15] used a depth-first search (dfs) order serialization of the tree
  - the conventional algorithm is based on dfs guided by a heuristic
Proof: in the paper.
16. Does Serialization Order Matter?
- Result circle: the circle formed around the query point using its distance from the kth nearest neighbor
- A wrong decision causes high cost in terms of tune-in time, as we are trapped in a branch
- BFS serialization does not share the same problem, but it may result in having a large ItemQueue size
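The two serialization orders can be sketched as follows, assuming the index is given as nested `(name, children)` tuples; the tree shape and node names are hypothetical.

```python
from collections import deque

def dfs_order(tree):
    """Depth-first (pre-order) broadcast order of node names."""
    name, children = tree
    order = [name]
    for child in children:
        order.extend(dfs_order(child))
    return order

def bfs_order(tree):
    """Breadth-first (level by level) broadcast order of node names."""
    order, queue = [], deque([tree])
    while queue:
        name, children = queue.popleft()
        order.append(name)
        queue.extend(children)
    return order

tree = ("N0", [("N1", [("N3", []), ("N4", [])]),
               ("N2", [("N5", []), ("N6", [])])])
dfs = dfs_order(tree)
bfs = bfs_order(tree)
```

In the dfs order the whole subtree of N1 goes by before N2 appears, so a client must commit to reading or skipping N1's descendants before it has read N2. In the bfs order both N1 and N2 are read before any of their children, which gives tighter bounds earlier but keeps more pending nodes in ItemQueue.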
17. Histograms for Better Pruning? (w-hist alg)
- A simple grid-like histogram can be used to obtain an upper bound on the size of the result circle of a kNN query
  - Given a query point P and any set of k objects, the distance between P and the object in the set furthest from P is at least the radius of the result circle
  - As a result, any circle centered at point P that covers a set of histogram cells such that the total number of objects located in these cells is at least k, covers the result circle.
- Pick the closest non-empty cells to the query point, such that the total number of objects contained in these cells is at least k
- Set r as the maximum of the MAXDISTs of all cells that are picked
- The circle centered at P with radius r is called the pruning circle (PC)
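The three steps above can be sketched as follows, assuming a 2D grid of square cells anchored at the origin and taking "closest" to mean smallest MINDIST to the query point. Both are assumptions, as are the function name `pruning_radius` and the sample histogram.

```python
import math

def pruning_radius(hist, cell, q, k):
    """Radius of a pruning circle around q derived from a grid histogram.

    hist[i][j] is the object count of the square cell covering
    [i*cell, (i+1)*cell] x [j*cell, (j+1)*cell].
    """
    def mindist(lo, hi):
        return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                             for x, l, h in zip(q, lo, hi)))

    def maxdist(lo, hi):
        return math.sqrt(sum(max(abs(x - l), abs(x - h)) ** 2
                             for x, l, h in zip(q, lo, hi)))

    # Non-empty cells as (lower corner, upper corner, count).
    cells = [((i * cell, j * cell), ((i + 1) * cell, (j + 1) * cell), n)
             for i, row in enumerate(hist) for j, n in enumerate(row) if n > 0]
    cells.sort(key=lambda c: mindist(c[0], c[1]))   # closest cells first
    covered, r = 0, 0.0
    for lo, hi, n in cells:
        covered += n
        r = max(r, maxdist(lo, hi))   # the circle must cover every picked cell
        if covered >= k:
            return r
    return math.inf                   # fewer than k objects in the histogram

hist = [[0, 2],
        [1, 0]]
r2 = pruning_radius(hist, cell=10, q=(3, 4), k=2)
r3 = pruning_radius(hist, cell=10, q=(3, 4), k=3)
```

Any selection of cells holding at least k objects yields a valid bound; picking the closest cells first is only a heuristic for keeping r small.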
18. Histograms for Proving Bounds? (w-alg)
- New Rule (w-hist algorithm)
  - When a new node is to be added to ItemQueue, it can be discarded if its mbr does not intersect with the pruning circle
- The histogram should not be too large, otherwise the gain from pruning cannot compensate for the cost of reading the histogram
  - In fact we have a formula for an upper bound on the tune-in time (for uniformly distributed data)
  - Histogram cell size cannot be taken smaller than the value that maximizes the above equation
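The discard test in the new rule reduces to comparing the MINDIST between the query point and the mbr with the pruning radius. A minimal sketch, with hypothetical names and sample values:

```python
def intersects_pruning_circle(q, r, lo, hi):
    """True if the box [lo, hi] intersects the circle of radius r around q."""
    # Squared MINDIST from q to the box; zero when q lies inside the box.
    d2 = sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi))
    return d2 <= r * r

q, r = (0, 0), 5
keep = intersects_pruning_circle(q, r, (3, 3), (6, 6))      # corner at distance sqrt(18)
discard = intersects_pruning_circle(q, r, (4, 4), (6, 6))   # corner at distance sqrt(32)
```

The second box overlaps the bounding square of the circle but not the circle itself, so a test on coordinate ranges alone would keep it unnecessarily.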
19. Histograms for Proving Bounds?
- Histograms help decrease the max. length of the ItemQueue through pruning
- They may not always decrease the tune-in time, especially when the distribution is skewed
- As a result, the tradeoffs require further study
20. Non-spatial Predicates
- Non-spatial attributes may need to be taken into account when answering queries, e.g. the type of an object
- We consider two techniques to answer kNN queries that may specify an optional type constraint
- The t-index Method
  - Use t separate indexes, each indexing only its associated type
  - Keep a lookup structure that has pointers for each type
  - Really efficient for queries with type constraints
  - Costly for other queries, which need to look up all indexes
  - To improve the performance of queries without type constraints, the order of the indexes can be selected such that an index corresponding to type i comes before the one corresponding to type j on the broadcast medium if n_i > n_j.
21. Non-spatial Predicates
- The t-hist/1-index Method
  - Use a single spatial index which indexes all objects
  - Include t histograms, each one built for a particular type
  - Keep a lookup structure that has pointers for each type
  - Let the leaf nodes of the tree also mark the type of each object
  - Queries with type constraints can be processed by only using the pruning circle derived from the associated histogram of the given type
  - Really efficient for queries without type constraints
  - To improve the performance of queries with type constraints, we can prune a node whose mbr does not intersect with a non-empty cell of the histogram of the associated type
  - We name this latter optimization the t-hist/1-index/hp method
- Note that it is only fair to compare them when the total index size (together with histograms) occupied by the two methods is the same
22. Tune-in Time and Queue Size
- tune-in time
  - with dfs organization
    - w-opt is 33% better than w-conv
  - with bfs organization
    - w-opt is 30% better than w-conv
  - bfs organization provides up to 55% improvement over dfs organization
- queue size
  - bfs organization has a larger memory requirement than dfs
  - w-opt decreases the memory requirement for both organizations

- tune-in time
  - after k = 4, w-opt starts providing significant improvement over w-conv
  - after k = 8, w-opt performs better than w-conv independent of the index serialization order
- queue size
  - the memory requirement of bfs organization grows fast with increasing k when w-conv is used and drops significantly when w-opt is used.
23. Scalability w.r.t. Number of Objects
- Similar trends are observed
  - w-opt search and bfs organization prevail over other configurations.
- However, comparing Figure 9 and Figure 7 reveals that the improvement in tune-in time increases with increasing k
- In fact one can prove that for NN search (k = 1) w-opt reduces to w-conv.
24. Merits and Demerits of Histograms
- Figure 10
  - a single-packet histogram improves the tune-in time
  - for this setup, histograms with larger sizes do not provide better tune-in times
  - the effect of using a histogram is more prominent with the dfs organization
  - with histograms the bfs and dfs organizations are effectively the same
- Figure 11
  - with skewed data the latter observation no longer holds: bfs organization with w-opt search outperforms all alternatives
  - this does not indicate that histograms are useless for skewed data sets
  - a small histogram significantly (more than 50%) decreases the queue size and increases the tune-in time only marginally (around 5%)
25. More on Histograms and Search with Non-spatial Predicates
- Figure 12
  - for larger numbers of objects it is better to use larger histograms
  - dfs organization is more sensitive to the increase in the number of objects
- Figure 15
  - t-index performs much better for queries with type constraints
  - the t-hist/1-index/hp method (powered by the additional histogram pruning) achieves significant improvement in tune-in time, especially for queries having constraints on infrequent types
- Figure 16
  - t-index performs poorly for queries without type constraints
  - Although not outperforming t-index for type-constrained queries, t-hist/1-index/hp strikes a good balance between the two types of queries
26. Conclusions
- The introduced w-opt search technique significantly decreases tune-in time, irrespective of how the index is organized (bfs or dfs)
- Organizing the index in bfs manner provides considerably better tune-in time but a higher memory requirement due to queue size
- Using histograms can improve tune-in time, although the improvement is marginal for bfs organization as opposed to dfs organization
- The use of histograms can be extended to support answering kNN queries with type constraints.
27. The End!