Title: Optimization of Spatial Joins on Mobile Devices
1. Optimization of Spatial Joins on Mobile Devices
N. Mamoulis (1), P. Kalnis (2), S. Bakiras (3), X. Li (2)
(1) Department of Computer Science and Information Systems, University of Hong Kong
(2) Department of Computer Science, National University of Singapore
(3) Department of Electrical and Electronic Engineering, University of Hong Kong
2. Motivation
- Users are equipped with a mobile device (e.g., a PDA)
- Ad-hoc spatial queries
- Combine data from remote servers
- Example: find hotels that are within 500 m of a seafood restaurant
- Servers do not collaborate with each other
- The query is executed on the mobile device
3. Cost
- Telecommunication companies typically charge by the volume of transferred data (e.g., GPRS) rather than by connection time.
- Goal: minimize the amount of transferred data.
4. Mediators?
[Figure: diagram with the labels Restaurants, Hotels, and Mediator]
- Services may only allow end-user connections (e.g., subscribers only)
- Access through mediators may be more expensive
- Requests are ad hoc; existing mediators may not support them
5. Solution
- Integrate statistics retrieval with the query processing phase
- Issue aggregate queries to estimate the data distribution
- Partition the space recursively to achieve sub-linear transfer cost
- Choose the physical operator independently for each partition
6. Related Work
- Hash-based methods (e.g., PBSM) require all data to be transferred
- R-tree-based methods (e.g., Tan et al., TKDE, 2000) require access to the internal index
- Mediators
  - HERMES: statistics from previous queries
  - DISCO, Garlic: statistics during initialization
  - Tukwila: optimizes parts of the execution tree
7. Operators
- WINDOW query: return all objects intersecting a window w
- COUNT query: return the number of objects intersecting w
- ε-RANGE query: return all objects within range ε from a point p
We do not have access to the internal indices!
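The three operators can be sketched as a small interface. This is only an illustration: the class name `SpatialServer`, the method names, and the in-memory point list are hypothetical stand-ins for a remote server, and objects are assumed to be 2-D points.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


class SpatialServer:
    """Hypothetical stand-in for a remote server that exposes only three queries."""

    def __init__(self, points: List[Point]) -> None:
        self._points = list(points)  # hidden from the client, like the server's index

    def window(self, w: Window) -> List[Point]:
        """WINDOW query: all objects intersecting window w."""
        xmin, ymin, xmax, ymax = w
        return [(x, y) for (x, y) in self._points
                if xmin <= x <= xmax and ymin <= y <= ymax]

    def count(self, w: Window) -> int:
        """COUNT query: only the number of objects intersecting w is returned."""
        return len(self.window(w))

    def eps_range(self, p: Point, eps: float) -> List[Point]:
        """epsilon-RANGE query: all objects within distance eps of point p."""
        px, py = p
        return [(x, y) for (x, y) in self._points
                if hypot(x - px, y - py) <= eps]
```

A COUNT answer is a single number, so it is far cheaper to transfer than the objects themselves; that is what makes it usable as a statistic.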
8. Hash-based spatial join
Each partition must fit in memory
9. Recursive evaluation
Retrieve statistics for each subpart
10. Nested loop spatial join
[Figure annotations: Recursive HBSJ 4 QRY 2 RCV 5 RCV; NLSJ 2 RCV 2 SND 2 RES]
11. Cost Model
- c1: download Rw objects from R and Sw objects from S and join them on the PDA
- c2: download Rw objects from R, send them as window queries to S and retrieve the results
- c4: repartition w, retrieve detailed statistics and apply the algorithm recursively
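The slides give no formulas for these costs, so the following is only a rough sketch of how they might be compared in bytes. Every constant and function name is an assumption made for illustration. c3 is not defined on the slide; it is assumed here to be c2 with the roles of R and S swapped. For c4 only the cost of the extra statistics is modelled, since the recursive part is not specified.

```python
# Illustrative byte sizes; all three values are assumptions, not from the slides.
OBJ_BYTES = 16     # one transferred object
QUERY_BYTES = 24   # one query message
COUNT_BYTES = 4    # one COUNT answer


def cost_hbsj(rw, sw):
    """c1: download both sides and join them on the device."""
    return (rw + sw) * OBJ_BYTES


def cost_nlsj(outer, matches_per_query):
    """c2 (or the assumed symmetric c3): download the outer objects, send each
    one back as a query to the other server, and download the matches."""
    download = outer * OBJ_BYTES
    queries = outer * QUERY_BYTES
    results = outer * matches_per_query * OBJ_BYTES
    return download + queries + results


def cost_statistics(grid):
    """Known part of c4: one COUNT query per grid cell on each of the two servers."""
    return grid * grid * 2 * (QUERY_BYTES + COUNT_BYTES)


def cheapest(rw, sw, matches_r_to_s, matches_s_to_r):
    """Return the label of the cheapest direct plan and all three costs."""
    costs = {
        "c1": cost_hbsj(rw, sw),
        "c2": cost_nlsj(rw, matches_r_to_s),
        "c3": cost_nlsj(sw, matches_s_to_r),
    }
    return min(costs, key=costs.get), costs
```

Under these assumed sizes, a window with 2 objects on one side and 1,000 on the other favours the nested-loop plan: `cheapest(2, 1000, 1, 0)` picks `"c2"` at 112 bytes against 16,032 bytes for c1.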
12. MobiJoin algorithm
- MobiJoin(w, Rw, Sw)
  - if Rw = 0 or Sw = 0 then return
  - compute c1, c2, c3, c4
  - cmin = min(c1, c2, c3, c4)
  - if cmin = c4 then
    - impose a regular grid over w
    - for each cell w' in w
      - retrieve Rw' and Sw'
      - MobiJoin(w', Rw', Sw')
  - else follow the action specified by cmin
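The recursion above can be sketched end to end against in-memory stand-ins for the two servers. This is a minimal sketch under several assumptions that are not in the slides: objects are points, the join is a distance join with threshold `e`, message sizes are the made-up constants `OBJ`, `QRY` and `CNT`, c3 mirrors c2, and c4 is guessed as the statistics cost plus half of the best direct cost. The `Server` class and the helpers `inside` and `grow` are hypothetical.

```python
from math import hypot, pi

OBJ, QRY, CNT = 16, 24, 4  # assumed sizes (bytes) of an object, a query, a COUNT answer


def inside(p, w):
    """Half-open window test, so each point falls in exactly one grid cell."""
    return w[0] <= p[0] < w[2] and w[1] <= p[1] < w[3]


def grow(w, e):
    """Window w extended by e on every side, to catch partners across cell borders."""
    return (w[0] - e, w[1] - e, w[2] + e, w[3] + e)


class Server:
    """Hypothetical in-memory stand-in for a remote server; tallies bytes moved."""

    def __init__(self, pts):
        self.pts, self.sent = list(pts), 0

    def count(self, w):
        self.sent += QRY + CNT
        return sum(inside(p, w) for p in self.pts)

    def window(self, w):
        hits = [p for p in self.pts if inside(p, w)]
        self.sent += QRY + OBJ * len(hits)
        return hits

    def eps_range(self, p, e):
        hits = [q for q in self.pts if hypot(q[0] - p[0], q[1] - p[1]) <= e]
        self.sent += QRY + OBJ * len(hits)
        return hits


def mobijoin(R, S, w, e, rw, sw, out, k=2, depth=0, max_depth=6):
    if rw == 0 or sw == 0:
        return  # nothing can join in this window: prune it
    # Assumed uniform estimate of the fraction of objects matched by one range query.
    sel = min(1.0, pi * e * e / ((w[2] - w[0] + 2 * e) * (w[3] - w[1] + 2 * e)))
    c1 = 2 * QRY + (rw + sw) * OBJ
    c2 = QRY + rw * OBJ + rw * (QRY + sel * sw * OBJ)
    c3 = QRY + sw * OBJ + sw * (QRY + sel * rw * OBJ)
    c4 = k * k * 2 * (QRY + CNT) + 0.5 * min(c1, c2, c3)  # placeholder guess
    if depth >= max_depth:
        c4 = float("inf")  # stop repartitioning at some depth
    best = min(c1, c2, c3, c4)
    if best == c4:
        dx, dy = (w[2] - w[0]) / k, (w[3] - w[1]) / k
        xs = [w[0] + i * dx for i in range(k)] + [w[2]]
        ys = [w[1] + j * dy for j in range(k)] + [w[3]]
        for i in range(k):
            for j in range(k):
                cell = (xs[i], ys[j], xs[i + 1], ys[j + 1])
                mobijoin(R, S, cell, e, R.count(cell), S.count(grow(cell, e)),
                         out, k, depth + 1, max_depth)
    elif best == c1:  # download both sides, join on the device
        s_objs = S.window(grow(w, e))
        for r in R.window(w):
            out.update((r, s) for s in s_objs
                       if hypot(r[0] - s[0], r[1] - s[1]) <= e)
    elif best == c2:  # R is the outer side
        for r in R.window(w):
            out.update((r, s) for s in S.eps_range(r, e))
    else:  # c3: S is the outer side; keep only partners that lie in w
        for s in S.window(grow(w, e)):
            out.update((r, s) for r in R.eps_range(s, e) if inside(r, w))
```

A run starts from the whole data space, for example `mobijoin(R, S, w, e, R.count(w), S.count(grow(w, e)), out)` with `out = set()`; afterwards `R.sent + S.sent` gives the bytes transferred under the assumed sizes. Because windows are half-open, points are expected to lie strictly below the upper bounds of the root window.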
13. Iceberg Spatial Semi-Join
- SELECT H.id
- FROM Hotels H, Restaurants R
- WHERE dist(H.location, R.location) ≤ ε
- GROUP BY H.id
- HAVING COUNT(*) ≥ m
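The meaning of this query can be illustrated with a brute-force evaluation over data already on the device. The sketch assumes the comparisons are "distance at most ε" and "at least m matches", and that locations are 2-D points; the function name and the input layout are hypothetical.

```python
from math import hypot


def iceberg_semi_join(hotels, restaurants, eps, m):
    """Return the ids of hotels with at least m restaurants within distance eps.

    hotels: dict mapping hotel id -> (x, y); restaurants: list of (x, y).
    """
    result = []
    for hid, (hx, hy) in hotels.items():
        # Count the restaurants that satisfy the distance predicate (WHERE clause).
        close = sum(1 for (rx, ry) in restaurants
                    if hypot(hx - rx, hy - ry) <= eps)
        # Keep the hotel only if the group is large enough (HAVING clause).
        if close >= m:
            result.append(hid)
    return result
```

Only hotel ids are returned, never the matching restaurants, which is what makes this a semi-join.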
14. Experimental setup
- Implementation
  - Server: Unix
  - Client: HP iPAQ PDA (WiFi network, 400MHz RISC CPU, 64MB RAM, Windows Pocket PC)
- Datasets
  - Synthetic: 1K-10K points, varying skew
  - Real: roads and railways of Germany
- Algorithms
  - NLSJ: only nested loop spatial join
  - HBSJ: only hash-based spatial join
15. Varying the distance threshold ε
PDA buffer 5
16. Varying the data skew
Uniform data => MobiJoin reduces to HBSJ
17. Varying the PDA's buffer size
[Figure panels: Packets; Bytes]
Large buffer => HBSJ fails to prune the empty areas
18. Iceberg queries
[Figure panels: Uniform data; Skewed data; Real dataset (35K) joined with a synthetic dataset (1K)]
19. Conclusions
- Distributed spatial joins on mobile devices
  - No mediator; non-collaborative servers; limited set of supported operators
- MobiJoin
  - Dynamically optimizes the entire process of statistics retrieval and query execution
  - Single ad-hoc query
- Future work
  - Support multi-way spatial joins
  - Improve the accuracy of the cost model