
1
Energy Efficient Exact kNN Search in Wireless
Broadcast Environments

Bugra Gedik, Aameek Singh, Ling Liu
College of Computing
Georgia Institute of Technology

2
Motivation

There is strong interest in location based
services (LBSs)
Popularity of mobile wireless communications
Emergence of positioning technologies like GPS
LBSs provide mobile users with information that
is specialized to their location, e.g.,
the nearest gas station or the five closest
restaurants
LBSs are accessed through a common wireless
channel, which connects the users to the service
provider
Limited network bandwidth -> Wireless
broadcast
Power constrained mobile devices -> Energy
efficient search
An interesting problem in this domain is,
Investigation of indexing and searching
mechanisms for energy efficient querying of
location dependent data in wireless broadcast
environments

3
Problem Definition

kNN Search on the Air problem
Broadcasting location dependent data together
with a spatial index on the wireless medium and
searching this broadcast on mobile client devices
in order to answer k nearest neighbor (kNN)
queries in an energy efficient way
Traditional mechanisms do not work well
Access to the medium is sequential
Methods based on approximate results [30]
How about exact search?

4
Some Example Applications

Commercial Setting
The server can broadcast locations of gas
stations
A car driver can pose a query like "Give me the
positions of the 3 nearest gas stations"
Military Setting
The server can build a spatial index on positions
of military units and broadcast it to the
battlefield (possibly encrypted)
Then the mobile devices can tune in and process
the broadcast to answer spatial queries
One such query can be posed by a tank as "Give
me the positions and names of the 10 nearest
friendly units"
kNN search on low dimensional spaces for
sequential access mediums

5
Background: Air Indexing

Air indexes
In wireless broadcast, it is crucial that energy
is conserved on the mobile unit side when
answering queries on the broadcasted data.
To alleviate the vast energy consumption problem
of searching un-indexed data, air indexes were
introduced.
Access Time: latency measure
the time difference between the point at which a
query is posed and the point at which the result of
the query is fully computed
Tune-in Time: energy consumption measure
the total time during which the mobile unit was
listening to data from the wireless medium (# of
packets read)
Air indexing strives to decrease tune-in time
while keeping the increase in access latency due
to the broadcast of extra index information
minimal.
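
To make the two measures concrete, here is a minimal sketch that counts both in units of packets. The function name `broadcast_cost`, the packet positions, and the three-packet index overhead in the usage lines are illustrative assumptions, not values from the presentation.

```python
def broadcast_cost(query_pos, packets_read):
    """Return (access_time, tune_in_time), both measured in packets.

    query_pos    -- position in the broadcast at which the query is posed
    packets_read -- positions of the packets the client actually listens to
                    (hypothetical model: the client sleeps in between)
    """
    if not packets_read:
        return 0, 0
    # Access time: from posing the query until the last needed packet is read.
    access_time = max(packets_read) - query_pos + 1
    # Tune-in time: only the packets the receiver was awake for.
    tune_in_time = len(packets_read)
    return access_time, tune_in_time


# Un-indexed broadcast: the client must listen to every packet up to the answer.
scan = broadcast_cost(0, list(range(58)))
# Indexed broadcast: two index packets, then sleep until the data packet,
# which arrives slightly later because the index takes up air time.
indexed = broadcast_cost(0, [0, 1, 60])
```

In this toy setting the index raises access time from 58 to 61 packets but cuts tune-in time from 58 packets to 3, which is the trade-off air indexing aims for.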

6
Background: R-trees

R-trees are spatial index structures widely used
to index n-dimensional points or rectangles
Practical for secondary storage, nodes correspond
to disk blocks (packets in wireless broadcast)
R-trees can be thought of as the multidimensional
version of B-trees

7
Background: R-trees (kNN search)

A branch and bound algorithm
Uses a heuristic to select branches to follow
The need for backtracking makes the algorithm ill
suited for wireless medium <- sequential access
nature
Metrics used in the algorithm
Given a point P and the mbr of a node N,
MINDIST(P,N) is the minimum distance from P to
N's mbr
MAXDIST(P,N) is the maximum distance from P to
N's mbr
MINMAXDIST(P,N) is the maximum possible minimum
distance between P and the mbr of the closest object
residing in N's mbr
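
The three metrics can be sketched as follows for axis-aligned rectangles. This is a minimal illustration that assumes Euclidean distance and an mbr given as two corner tuples `lo` and `hi`; the MINMAXDIST computation follows the commonly used per-dimension formulation (nearest face in one dimension, farthest corner in the others), which is an assumption about the definition the slides rely on.

```python
import math


def mindist(p, lo, hi):
    """Smallest possible distance from point p to the rectangle [lo, hi]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(p, lo, hi)))


def maxdist(p, lo, hi):
    """Largest possible distance from p to any point of the rectangle."""
    return math.sqrt(sum(max(abs(x - l), abs(x - h)) ** 2
                         for x, l, h in zip(p, lo, hi)))


def minmaxdist(p, lo, hi):
    """Upper bound on the distance from p to the closest object in the mbr.

    Every face of a minimum bounding rectangle touches some object, so for
    each dimension we take the nearer face and the farthest corner on it,
    then keep the smallest such distance.
    """
    far = [max(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(p, lo, hi)]
    near = [min(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(p, lo, hi)]
    total_far = sum(far)
    return math.sqrt(min(total_far - far[k] + near[k]
                         for k in range(len(p))))


# Query point at the origin, rectangle spanning (1,1) to (3,2).
p, lo, hi = (0.0, 0.0), (1.0, 1.0), (3.0, 2.0)
distances = (mindist(p, lo, hi), minmaxdist(p, lo, hi), maxdist(p, lo, hi))
```

For this rectangle the three values are sqrt(2), sqrt(5) and sqrt(13), illustrating that MINDIST <= MINMAXDIST <= MAXDIST.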

8
Background: R-trees (kNN search)

ItemQueue is a priority queue that stores nodes
to explore. It is sorted on the MINDIST measure
and does not have a predefined size.
ResultQueue is a list of maximum size k that
stores the best k nodes seen so far. It is sorted
on the MINMAXDIST measure.
kthdist: the MINMAXDIST value of the kth item in
ResultQueue, or infinity if fewer than k items are
present. It is an upper bound on the distance of
the kth nearest neighbor.

Search trace (k = 2), one line per step:
ItemQueue: (R0,0) | ResultQueue: empty | kthdist: infinity
ItemQueue: (R1,0) (R2,a) | ResultQueue: (R2,b) (R1,c) | kthdist: c
9
Background: R-trees (kNN search)

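Search trace (k = 2), one line per step:
ItemQueue: (R0,0) | ResultQueue: empty | kthdist: infinity
ItemQueue: (R1,0) (R2,a) | ResultQueue: (R2,b) (R1,c) | kthdist: c
ItemQueue: (R4,d) (R3,f) (R5,h) (R2,a) | ResultQueue: (R3,g) (R2,b) | kthdist: b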
10
Background: R-trees (kNN search)

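Search trace (k = 2), one line per step:
ItemQueue: (R0,0) | ResultQueue: empty | kthdist: infinity
ItemQueue: (R1,0) (R2,a) | ResultQueue: (R2,b) (R1,c) | kthdist: c
ItemQueue: (R4,d) (R3,f) (R5,h) (R2,a) | ResultQueue: (R3,g) (R2,b) | kthdist: b
ItemQueue: (R3,f) (R5,h) (R2,a) | ResultQueue: (p13,d13) (R3,g) | kthdist: g
p11, p12 are pruned since d11, d12 are both > b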
11
Background: R-trees (kNN search)

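Search trace (k = 2), one line per step:
ItemQueue: (R0,0) | ResultQueue: empty | kthdist: infinity
ItemQueue: (R1,0) (R2,a) | ResultQueue: (R2,b) (R1,c) | kthdist: c
ItemQueue: (R4,d) (R3,f) (R5,h) (R2,a) | ResultQueue: (R3,g) (R2,b) | kthdist: b
ItemQueue: (R3,f) (R5,h) (R2,a) | ResultQueue: (p13,d13) (R3,g) | kthdist: g
ItemQueue: (R5,h) (R2,a) | ResultQueue: (p13,d13) (p1,d1) | kthdist: d1
Result: p13, p1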
12
Adapting the Algorithm for Wireless Medium
(w-conv alg)

It is not possible to sort ItemQueue on the
MINDIST measure
MINDIST ordering of tree nodes is not consistent
with their order of appearance in the broadcast
Reading a node from the medium based on the
topmost item in the MINDIST sorted ItemQueue may
result in leaving behind other tree nodes that
have entries in ItemQueue
As a result, the items in ItemQueue have to be
sorted based on their appearance order on the
medium
We cannot immediately stop the search when the
ResultQueue consists of objects only
It is not guaranteed that the remaining items
in ItemQueue cannot generate an object closer
than the current k objects,
because the queue is no longer sorted on the MINDIST
measure
As a result, the search halts when the ItemQueue
becomes empty
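
A minimal sketch of the adapted search, under simplifying assumptions that are not in the slides: the broadcast is modeled as a Python list of nodes in on-air order, each node is a dict with an "mbr" plus either "children" (broadcast positions) or "points", child mbrs are looked up in the list as a stand-in for the mbrs a parent entry would carry, and ResultQueue tracks only objects with their actual distances (the MINMAXDIST entries for nodes are omitted for brevity).

```python
import heapq
import math


def mindist(p, mbr):
    lo, hi = mbr
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(p, lo, hi)))


def wconv_knn(broadcast, q, k):
    """Return (results, packets_read) for a kNN query at point q."""
    item_queue = [0]        # broadcast positions, popped in on-air order
    results = []            # sorted (distance, name) pairs, at most k
    packets_read = 0
    while item_queue:       # the search halts only when ItemQueue is empty
        pos = heapq.heappop(item_queue)
        node = broadcast[pos]
        kthdist = results[-1][0] if len(results) == k else math.inf
        if mindist(q, node["mbr"]) > kthdist:
            continue        # bound tightened since enqueueing: skip packet
        packets_read += 1   # tune in and read this node
        if "points" in node:
            for name, pt in node["points"]:
                results.append((math.dist(q, pt), name))
            results.sort()
            del results[k:]
        else:
            for child in node["children"]:
                if mindist(q, broadcast[child]["mbr"]) <= kthdist:
                    heapq.heappush(item_queue, child)
    return results, packets_read


# Tiny hypothetical broadcast: a root followed by two leaves.
broadcast = [
    {"mbr": ((1, 1), (7, 7)), "children": [1, 2]},
    {"mbr": ((1, 1), (2, 2)), "points": [("a", (1, 1)), ("b", (2, 2))]},
    {"mbr": ((5, 5), (7, 7)), "points": [("c", (5, 5)), ("d", (7, 7))]},
]
results, packets_read = wconv_knn(broadcast, (0, 0), 2)
```

Because the queue is keyed on broadcast position rather than MINDIST, the client never has to wait a full cycle for a node it skipped. In this example the second leaf is dropped when it reaches the front of the queue, so only two packets are read.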

13
Optimizing the Algorithm for Wireless Medium
(w-opt alg)

After the root node is processed
ItemQueue: u, v, w
ResultQueue: (v,5), (w,14)
kthdist = 14
With this kthdist, node v cannot be pruned
In fact, in this example it is possible to prune
it

Knowledge
If the minimum fanout of the tree is f, then we
know that there are at least f^(l-1) objects under
node w's mbr

14
Optimizing the Algorithm for Wireless Medium
(w-opt alg)

Given this knowledge, at the time when we add w
to ItemQueue
There is one object at most 5 away from the query
point
There are at least f^(l-1) - 1 >= 1 objects at most 10
away from the query point
Then kthdist = 10 after the root node is
processed
Now v can be pruned

New Rule (w-opt algorithm)
While adding a node, say node N at level i, with
its MINMAXDIST measure to ResultQueue, we also
insert f^i - 1 additional entries with the MAXDIST
measure of node N. ResultQueue is sorted on the
associated measures of the entries
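
The effect of the new rule on kthdist can be sketched as below. The entry layout (dicts with "minmaxdist", "maxdist" and "level"), the fanout f = 2, the level 1, and the MAXDIST of 20 for the second node are assumptions chosen only so that the numbers 5, 14 and 10 from the example reappear; the slides do not state which node carries which MAXDIST.

```python
import math


def kthdist_conv(entries, k):
    """Conventional bound: kth smallest MINMAXDIST among the entries."""
    d = sorted(e["minmaxdist"] for e in entries)
    return d[k - 1] if len(d) >= k else math.inf


def kthdist_opt(entries, k, f):
    """w-opt style bound: each node at level i also contributes
    f**i - 1 extra entries carrying its MAXDIST."""
    d = []
    for e in entries:
        d.append(e["minmaxdist"])
        # More than k copies can never matter, so cap the count at k.
        extra = min(f ** e["level"] - 1, k)
        d.extend([e["maxdist"]] * extra)
    d.sort()
    return d[k - 1] if len(d) >= k else math.inf


entries = [
    {"minmaxdist": 5, "maxdist": 10, "level": 1},
    {"minmaxdist": 14, "maxdist": 20, "level": 1},  # MAXDIST 20 is hypothetical
]
conv = kthdist_conv(entries, k=2)      # 14
opt = kthdist_opt(entries, k=2, f=2)   # 10
```

With the tighter bound of 10, any node whose MINDIST lies between 10 and 14 can be discarded before it is ever read from the broadcast.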

15
Does Serialization Order Matter?

The way a spatial index is organized on the
broadcast medium affects the
number of index nodes read by the kNN search, i.e., the tune-in
time
Previous work on range and kNN search in
broadcast environments [28, 10, 30, 15] used a
depth first search (dfs) order serialization of
the tree,
conventional algorithm is based on dfs guided by
a heuristic

Proof: In the paper.
16
Does Serialization Order Matter?

Result circle: The circle formed around the query
point using its distance from the kth nearest
neighbor
A wrong decision causes high cost in terms of
tune-in time, as we are trapped in a branch
BFS serialization does not share the same
problem, but it may result in having a large
ItemQueue size

17
Histograms for Better Pruning? (w-hist alg)

A simple grid-like histogram can be used to
obtain an upper bound on size of the result
circle of a kNN query
Given a query point P and any set of k objects,
the distance between P and the object in the
given set that is furthest from P is at least
the radius of the result circle
As a result, any circle centered at point P that
covers a set of histogram cells such that the total
number of objects located in these
histogram cells is at least k, covers the
result circle.

Pick the closest non-empty cells to the query
point, such that the total number of objects
contained in these cells is at least k
Set r as the maximum of MAXDISTs of all cells
that are picked
Circle centered at P with radius r is called
the pruning circle (PC)
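
The construction of the pruning circle can be sketched as follows. The histogram is assumed to be a dict mapping grid coordinates (i, j) to object counts over square cells of side `size` anchored at the origin, and "closest" is interpreted as smallest MINDIST from the query point; both are assumptions, since the slides do not fix these details.

```python
import math


def cell_bounds(cell, size):
    i, j = cell
    return (i * size, j * size), ((i + 1) * size, (j + 1) * size)


def mindist(p, lo, hi):
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(p, lo, hi)))


def maxdist(p, lo, hi):
    return math.sqrt(sum(max(abs(x - l), abs(x - h)) ** 2
                         for x, l, h in zip(p, lo, hi)))


def pruning_radius(hist, size, q, k):
    """Radius r of the pruning circle centered at q."""
    nonempty = [c for c, n in hist.items() if n > 0]
    nonempty.sort(key=lambda c: mindist(q, *cell_bounds(c, size)))
    picked, total = [], 0
    for c in nonempty:          # closest non-empty cells first
        picked.append(c)
        total += hist[c]
        if total >= k:          # enough objects are covered
            break
    if total < k:
        return math.inf         # fewer than k objects overall: no pruning
    return max(maxdist(q, *cell_bounds(c, size)) for c in picked)


hist = {(0, 0): 1, (1, 0): 0, (2, 2): 3}
r = pruning_radius(hist, 1.0, (0.5, 0.5), k=2)
```

Here the cell containing the query point holds only one object, so the cell (2, 2) is also picked and r becomes the distance to its far corner at (3, 3).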

18
Histograms for Proving Bounds? (w-hist alg)

New Rule (w-hist algorithm)
When a new node is to be added to ItemQueue, it
can be discarded if its mbr does not intersect
with the pruning circle
The histogram should not be too large, otherwise
the gain from pruning cannot compensate the cost
of reading the histogram
In fact we have a formula for an upper bound on
the tune-in time (for uniformly distributed
data)
Histogram cell size cannot be taken smaller than
the value that maximizes the above equation
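
The w-hist rule at the top of this slide reduces to a single distance test: an mbr intersects the pruning circle exactly when its MINDIST from the query point does not exceed the radius. A minimal sketch, with hypothetical names `keep_node`, `lo` and `hi`:

```python
import math


def mindist(p, lo, hi):
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(p, lo, hi)))


def keep_node(q, r, lo, hi):
    """True if the mbr [lo, hi] intersects the pruning circle (q, r),
    i.e. the node should still be added to ItemQueue."""
    return mindist(q, lo, hi) <= r


q, r = (0.0, 0.0), 3.0
near_node = keep_node(q, r, (1.0, 1.0), (2.0, 2.0))   # intersects the circle
far_node = keep_node(q, r, (4.0, 4.0), (5.0, 5.0))    # lies outside: discard
```

The test costs no extra packet reads once the histogram has been received, which is why the size of the histogram itself is the main cost to weigh.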

19
Histograms for Proving Bounds?

Histograms help decrease the max. length of the
ItemQueue through pruning
They may not always decrease the tune-in time,
especially when the distribution is skewed
As a result, the tradeoffs require further study

20
Non-spatial Predicates

Non-spatial attributes that may need to be taken
into account when answering queries, e.g., the type of
an object
We consider two techniques to answer kNN queries
that may specify an optional type constraint
The t-index Method
Use t separate indexes each indexing only its
associated type
Keep a lookup structure that has pointers for
each type
Really efficient for queries with type constraints
Costly for other queries, which need to look up all
indexes
To improve the performance of queries without type
constraints, the order of the indexes can be
selected such that an index corresponding to type
i comes before the one corresponding to j on the
broadcast medium if n_i > n_j.
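
The ordering rule can be sketched as a simple sort. This assumes n_i denotes the number of objects of type i, which the slide does not state explicitly; the type names and counts are made up for illustration.

```python
def broadcast_order(type_counts):
    """Order per-type indexes so that types with larger n_i come first.

    type_counts -- dict mapping a type name to n_i (assumed here to be the
                   number of objects of that type); ties break by name.
    """
    return sorted(type_counts, key=lambda t: (-type_counts[t], t))


# Hypothetical object counts per type.
order = broadcast_order({"gas": 120, "hotel": 40, "restaurant": 300})
```

For these counts the indexes would be broadcast in the order restaurant, gas, hotel.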

21
Non-spatial Predicates

The t-hist/1-index Method
Use a single spatial index which indexes all
objects
Include t histograms, each one built for a
particular type
Keep a lookup structure that has pointers for
each type
Let the leaf nodes of the tree also mark the type
of each object
Queries with type constraints can be processed by
only using the pruning circle derived from the
associated histogram of the given type
Really efficient for queries without type
constraints
To improve the performance of queries with type
constraints, we can prune a node whose mbr does
not intersect with a non-empty cell of the
histogram of the associated type
We name this latter optimization as
t-hist/1-index/hp method
Note that it is only fair to compare them when
the total index size (together with histograms)
occupied by the two methods is the same

22
Tune-in Time and Queue Size

tune-in time
with dfs organization
w-opt 33% better than w-conv
with bfs organization
w-opt 30% better than w-conv
bfs organization provides up to 55% improvement
over dfs organization
queue size
bfs organization has larger memory requirement
than dfs
w-opt decreases the memory requirement for both
organizations
tune-in time
after k = 4 w-opt starts providing significant
improvement over w-conv
after k = 8 w-opt performs better than w-conv
independent of the index serialization order
queue size
memory requirement of bfs organization grows fast
with increasing k when w-conv is used and
significantly drops when w-opt is used.

23
Scalability w.r.t. Number of Objects

Similar trends are observed
w-opt search and bfs organization prevailing over
other configurations.
However, comparing Figure 9 and Figure 7 reveals
that the improvement in tune-in time increases
with increasing k
In fact one can prove that for NN search (k = 1)
w-opt reduces to w-conv.

24
Merits and Demerits of Histograms

Figure 10
a single packet histogram improves the tune-in
time
for this setup, histograms with larger sizes do
not provide better tune-in times
the effect of using a histogram is more prominent
with the dfs organization
with histograms the bfs and dfs organizations are
effectively the same
Figure 11
with skewed data the latter observation no longer
holds: bfs organization with w-opt search
outperforms all alternatives
this does not indicate that histograms are
useless for skewed data sets
a small histogram significantly (more than 50%)
decreases the queue size but increases the tune-in
time marginally (around 5%)

25
More on Histograms and Search with Non-spatial
Predicates

Figure 12
for larger number of objects it is better to use
larger histograms
dfs organization is more sensitive to the
increase in the number of objects
Figure 15
t-index performs much better for queries with
type constraints
t-hist/1-index/hp method (powered by the
additional histogram pruning), achieves
significant improvement in tune-in time,
especially for queries having constraints on
infrequent types
Figure 16
t-index performs poorly for queries without type
constraints
Although not outperforming t-index for type
constrained queries, t-hist/1-index/hp strikes a
good balance between two types of queries