Title: Apresenta
FEDERAL UNIVERSITY OF RIO DE JANEIRO
Spatial Query Broker in a Grid Environment
Author: Wladimir S. Meyer
Advisors: Jano M. Souza, Milton R. Ramirez
Outline
- Motivation and Goal
- The Problem
- Related works
- The Proposal
- SQB Architecture
- Preliminary Tests
- Remarks
Motivation
- The dissemination of GIS, together with improvements in channel bandwidth, is increasing quickly, and the interactions between data producers and consumers are becoming more frequent, complex and dynamic.
- Some hot points in these relationships:
  - Huge amounts of data spread across many different geographic places
  - Complexity of spatial data
  - Demand for sophisticated services delivered over the web
  - The high price that shared resources may have in some federations (CPU time, storage space, ...)
  - Integration problems (many levels of heterogeneity)
Distributed spatial operations, and methods to improve their efficiency, play an important role in this context. Many works address spatial operations in a centralized context, but few in a distributed one. The Grid computing paradigm brings together many characteristics that can improve the execution of distributed spatial operations.
Goal
This work aims to improve the efficiency of distributed spatial joins by means of an architecture that allows non-specialized computers to be allocated to the execution of the operation, reducing the overall response time. The focus is on the spatial join because it is a very common operation in GIS and has a high processing cost. The architecture also offers conditions for experimenting with new algorithms (filter/refine, scheduler, ...).
The Problem
How can a spatial join be carried out over a pool of data providers that share a huge amount of spatial data, so that the response time stays below a limit set by some quality criterion? The data fragmentation may be spatial and/or thematic (i.e., a hybrid schema), and there are local spatial indexes on each dataset. This scenario could be illustrated by a pool of regional governmental agencies responsible for generating cartographic data, offering query services that run over their data by means of the internet.
Related Work
- Many important works in spatial query processing are related to the filter/refine strategy [5]. Some of them are mentioned below:
  - Multi-step processing of spatial joins, Brinkhoff et al. [6]
  - Raster signatures in spatial joins (4CRS), Zimbrão et al. [30]
  - Multi-steps with remote indexes (MR2), Ramirez and Souza [26]
- On the other hand, the execution of the query plan in a distributed context may emphasize parallelism as a way to reduce the overall response time:
  - MR2, Ramirez [26]
  - Grid Greedy Node, Porto et al. [25]
  - OGSA-DQP, Smith et al. [27]
- Some of these strategies need a scheduler module to guarantee an adequate load balance among the selected local SDBMSs.
The Proposal
In this work, the grid's ability to offer resources on demand is used to reduce the overall response time of distributed spatial join operations in databases. The parallelism in previous works involves only those nodes that store the spatial data mentioned in the query. Our proposal is to also involve generic computational resources in the most expensive step of the filter/refine strategy: the exact geometry processing.
Multi-step filter/refine strategy [6] (figure not available in this transcript)
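As a rough illustration of the two steps, the sketch below runs a filter step on minimum bounding rectangles (MBRs) and then a refine step on the exact geometry. It is only a sketch under stated assumptions: all function names are hypothetical, the nested-loop filter stands in for an index-based filter such as an R-tree traversal, and the exact test uses the separating axis theorem, which is valid only for convex polygons such as triangles. Polygons that merely touch are counted as intersecting.

```python
from itertools import product

def mbr(poly):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a polygon."""
    xs, ys = zip(*poly)
    return min(xs), min(ys), max(xs), max(ys)

def mbrs_intersect(a, b):
    """True when two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def edge_normals(poly):
    """Normals of the polygon edges, used as candidate separating axes."""
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        yield (y1 - y2, x2 - x1)

def convex_intersect(p, q):
    """Exact intersection test for two CONVEX polygons (assumption)."""
    for ax, ay in list(edge_normals(p)) + list(edge_normals(q)):
        proj_p = [ax * x + ay * y for x, y in p]
        proj_q = [ax * x + ay * y for x, y in q]
        if max(proj_p) < min(proj_q) or max(proj_q) < min(proj_p):
            return False  # a separating axis exists, so the polygons are disjoint
    return True

def filter_step(r, s):
    """Cheap step: keep the index pairs whose MBRs intersect (candidates)."""
    boxes_r = [mbr(p) for p in r]
    boxes_s = [mbr(p) for p in s]
    return [(i, j) for i, j in product(range(len(r)), range(len(s)))
            if mbrs_intersect(boxes_r[i], boxes_s[j])]

def refine_step(r, s, candidates):
    """Expensive step: exact geometry test on the candidate pairs only."""
    return [(i, j) for i, j in candidates if convex_intersect(r[i], s[j])]

R = [[(0, 0), (4, 0), (0, 4)]]
S = [[(3, 3), (5, 3), (3, 5)],       # MBR overlaps R[0], geometry does not
     [(1, 1), (2, 1), (1, 2)],       # lies inside R[0]
     [(10, 10), (11, 10), (10, 11)]] # far away, rejected by the filter
candidates = filter_step(R, S)
answers = refine_step(R, S, candidates)
```

Here the filter keeps two candidate pairs and the refine step confirms only one of them, which is the kind of exact-geometry work the proposal would hand to generic nodes.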
The Proposal
The following picture gives an overview of the context (picture not available in this transcript).
The Proposal
A specialized meta-scheduler, named Spatial Query Broker (SQB), is proposed to handle all spatial query processing, in a way similar to conventional Resource Brokers in grid environments.
Item                 SQB        OGSA-DQP   GridWay       WMS
Unit of work         Query      Query      Job           Job
App domain           Databases  Databases  Generic jobs  Generic jobs
Dynamic scheduling   Yes        No         Yes           No
Spatial queries?     Yes        No         -             -
Use generic nodes?   Yes        No         -             -
SQB Architecture
The SQB is composed of the following modules (diagram not available in this transcript).
SQB Architecture
Steps managed by the optimizer (diagram not available in this transcript).
SQB Architecture
- The Execution Monitor builds two queues to store the inconclusive pairs in order to deliver them to the CEs.
- One queue is shared among the faster CEs, while the other is shared among the slower ones.
- The total number of vertices is adopted as an indicator of the complexity of the processing.
- A throughput indicator is collected beforehand from the CEs and registered in the information server (MDS).
It isn't necessary to sort the pairs.
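The two-queue idea can be sketched as follows. This is not the SQB implementation: the function names, the use of the mean as the dividing threshold, the routing of the more complex pairs to the faster CEs, and the fallback to the other queue when a CE's own queue is empty are all assumptions made for illustration. Only the two queues, the vertex-count indicator and the throughput indicator come from the slide.

```python
from collections import deque
from statistics import mean

def build_queues(pairs, vertex_count):
    """Split inconclusive pairs into a 'heavy' and a 'light' queue.

    Assumption: a pair is heavy when its total number of vertices is above
    the mean over all pairs. The split needs no sorting of the pairs.
    """
    if not pairs:
        return deque(), deque()
    cost = {p: vertex_count[p[0]] + vertex_count[p[1]] for p in pairs}
    threshold = mean(cost.values())
    heavy = deque(p for p in pairs if cost[p] > threshold)
    light = deque(p for p in pairs if cost[p] <= threshold)
    return heavy, light

def classify_ces(throughput):
    """Split CEs into faster and slower groups around the mean throughput."""
    threshold = mean(throughput.values())
    fast = [ce for ce, t in throughput.items() if t >= threshold]
    slow = [ce for ce, t in throughput.items() if t < threshold]
    return fast, slow

def next_pair(ce, fast, heavy, light):
    """Give the next pair to a CE; fall back to the other queue if its own is empty."""
    own, other = (heavy, light) if ce in fast else (light, heavy)
    if own:
        return own.popleft()
    if other:
        return other.popleft()
    return None

# Hypothetical object IDs, vertex counts and throughput values.
pairs = [("r1", "s1"), ("r2", "s2"), ("r3", "s3")]
vertex_count = {"r1": 3, "s1": 3, "r2": 50, "s2": 70, "r3": 4, "s3": 4}
throughput = {"ce_a": 10.0, "ce_b": 2.0, "ce_c": 3.0}

heavy, light = build_queues(pairs, vertex_count)
fast, slow = classify_ces(throughput)
first_for_fast = next_pair("ce_a", fast, heavy, light)
first_for_slow = next_pair("ce_b", fast, heavy, light)
```

In this example the only heavy pair, ("r2", "s2"), goes to the single fast CE, while the slow CE receives a light pair.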
SQB Architecture
Simplified sequence diagram (diagram not available in this transcript).
Preliminary Tests
Although the prototype is still under construction, a few tests were run with synthetic spatial datasets consisting of polygons, in order to obtain some relative parameters to guide our work on spatial joins among polygons (overlap predicate). Spatial join operations were performed on servers holding both datasets, indexed with R-trees. The original datasets were partitioned into four and nine regular parts, and the response time (RT) was measured in each situation:
$$RT = T_{MSG} \cdot \#\text{messages} + T_{TX} \cdot \#\text{bytes} + T_{CPU} + T_{I/O}$$
- Objects that cross boundaries were replicated in the involved datasets (they weren't split).
- The tests were executed in three situations:
  - The whole query at once in a single SDBMS
  - The query over the same region broken into four parts and executed by four identical machines
  - The query over the same region broken into nine parts and executed by nine identical machines
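The partitioning with replication described in the bullets above could look roughly like the sketch below. It is an illustration, not the test harness that was used: the function names are hypothetical, a polygon is considered to cross a boundary when its bounding rectangle reaches more than one grid cell (a conservative approximation), and all polygons are assumed to lie inside the given extent. Because replicated objects can produce the same answer in more than one partition, the partial results are merged with duplicate removal.

```python
def mbr(poly):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a polygon."""
    xs, ys = zip(*poly)
    return min(xs), min(ys), max(xs), max(ys)

def cell_index(value, low, size, k):
    """Index of the grid cell containing a coordinate, clamped to [0, k - 1]."""
    return max(0, min(int((value - low) / size), k - 1))

def partition(polygons, extent, k):
    """Assign polygons to a k x k regular grid laid over extent.

    A polygon whose MBR reaches several cells is replicated in each of
    them (it is not split). Returns {(col, row): [polygon ids]}.
    """
    xmin, ymin, xmax, ymax = extent
    width, height = (xmax - xmin) / k, (ymax - ymin) / k
    cells = {(c, r): [] for c in range(k) for r in range(k)}
    for pid, poly in polygons.items():
        x0, y0, x1, y1 = mbr(poly)
        c0, c1 = cell_index(x0, xmin, width, k), cell_index(x1, xmin, width, k)
        r0, r1 = cell_index(y0, ymin, height, k), cell_index(y1, ymin, height, k)
        for c in range(c0, c1 + 1):
            for r in range(r0, r1 + 1):
                cells[(c, r)].append(pid)
    return cells

def merge_results(partial_results):
    """Union of the per-partition join results, without duplicates."""
    return sorted(set().union(*partial_results))

polygons = {
    "a": [(1, 1), (2, 1), (1, 2)],
    "b": [(4, 4), (6, 4), (4, 6)],  # reaches all four cells of a 2 x 2 grid
    "c": [(8, 8), (9, 8), (8, 9)],
}
cells = partition(polygons, extent=(0, 0, 10, 10), k=2)
merged = merge_results([[("a", "x")], [("a", "x"), ("b", "y")]])
```

With `k=2` the region is broken into four parts and with `k=3` into nine, matching the two distributed situations listed above.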
Preliminary Tests
Figure panels: Theme 1 and Theme 2 (images not available in this transcript).
Preliminary Tests
This operation is CPU bound, and the communication cost has a low impact on the final response time.
$$RT = T_{MSG} \cdot \#\text{messages} + T_{TX} \cdot \#\text{bytes} + T_{CPU} + T_{I/O} + T_{\text{remove replicas}}$$
Communication cost is based on a 256 kbps bandwidth.
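To make the communication term concrete, the response-time expression can be evaluated with a small helper such as the one below. This is a sketch under assumptions: the terms are read as a per-message cost times the number of messages and a per-byte cost times the number of bytes, 256 kbps is taken as 256,000 bits per second, and the function name, parameter names and sample values are hypothetical rather than measured.

```python
def response_time(n_messages, n_bytes, t_cpu, t_io,
                  t_remove_replicas=0.0, t_msg=0.0, bandwidth_bps=256_000):
    """Response time in seconds under the additive cost model.

    t_msg is the fixed cost per message in seconds; the per-byte transfer
    cost follows from the bandwidth (8 bits per byte).
    """
    transfer = n_bytes * 8 / bandwidth_bps
    return t_msg * n_messages + transfer + t_cpu + t_io + t_remove_replicas

# Moving 32,000 bytes over a 256 kbps link takes one second on its own.
transfer_only = response_time(n_messages=0, n_bytes=32_000, t_cpu=0.0, t_io=0.0)

# Illustrative values only: a CPU-bound run where communication is a small share.
total = response_time(n_messages=10, n_bytes=32_000, t_cpu=60.0, t_io=5.0,
                      t_remove_replicas=1.0, t_msg=0.1)
```

With these sample values the CPU term dominates, which is the situation the slide describes as CPU bound.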
Preliminary Tests
Chart axis: servers (1, 4, 9); chart not available in this transcript.
The processing cost and the communication cost tend to reach the same order of magnitude as the number of servers increases.
The superlinear speedup means, in this case, that the computational resources available on a single machine were insufficient to reach a good response time.
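For reference, the usual definition of speedup can be written as below, where $RT(n)$ denotes the response time measured with $n$ servers; this notation is introduced here for illustration and does not appear on the slides.

```latex
S(n) = \frac{RT(1)}{RT(n)}, \qquad \text{speedup is superlinear when } S(n) > n
```

For the nine-server run, for example, superlinear speedup would mean that the response time dropped to less than one ninth of the single-server time.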
Test conditions
- The preliminary tests were executed under the following conditions:
  - Spatial database: Secondo
  - Grid middleware: Globus GT4
  - Datasets: two datasets composed of 10060 triangles, indexed
  - Hardware: Sempron 2800, 1 GB RAM, 80 GB HD
  - OS: Fedora Linux
- The overall architecture is under construction and is based on web services (WSRF).
Remarks
- This work presents an architecture based on grid infrastructure, tailored to cover some needs of a distributed geographic information system.
- The focus was on offering a strategy to execute spatial queries over spatial databases managed by several organizations gathered in a federation.
- The filter/refine approach was adopted, trying to use the pre-existing spatial indexes on the datasets.
- A global ID structure must be proposed in order to:
  - Easily reduce the multiple processing of objects that cross boundaries after the filtering step (avoiding moving them unnecessarily to CEs)
  - Isolate the processing in SQB from local IDs, improving scalability
- As next steps:
  - Specify new cost models to help the optimizer and the scheduler take into account the dynamics of the environment
  - Research the scheduling process in order to improve the reliability of the architecture
  - Compare the response time of a join, executed over a benchmark dataset, with that of a join executed in similar distributed environments
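One way to picture the global ID remark is the sketch below. The slides do not define the ID structure, so the composite of a provider name and a local ID is purely an assumption, as are the class and function names. The point illustrated is that if a replicated object carries the same global ID in every partition, a candidate pair found more than once after the filtering step can be recognized and sent to a CE only once.

```python
from typing import NamedTuple

class GlobalId(NamedTuple):
    """Hypothetical global ID: data provider plus the provider's local object ID."""
    provider: str
    local_id: int

def unique_candidates(candidate_lists):
    """Merge per-partition candidate pairs, keeping each pair only once.

    The first occurrence of a pair is kept and the input order is preserved.
    """
    seen = set()
    unique = []
    for candidates in candidate_lists:
        for pair in candidates:
            if pair not in seen:
                seen.add(pair)
                unique.append(pair)
    return unique

a = GlobalId("agency_1", 17)
b = GlobalId("agency_2", 4)
c = GlobalId("agency_2", 9)

# The pair (a, b) was found in two partitions because a crosses their boundary.
to_refine = unique_candidates([[(a, b)], [(a, b), (a, c)]])
```

Because the pairs are compared by global ID only, this step never needs to look at how each local SDBMS numbers its own objects.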
References
1. Adzigogov, L., Soldatos, J., and Polymenakos, L. (2005). "EMPEROR: An OGSA Grid Meta-Scheduler based on Dynamic Resource." Journal of Grid Computing, 3, 19-37.
2. Afgan, E. (2004). "Role of the Resource Broker in the Grid." ACM, Huntsville, Alabama, USA.
3. Andretto, P., et al. (2004). "Practical approaches to Grid workload and resource management in the EGEE project."
4. Azevedo, L. G., Monteiro, R. S., Zimbrão, G., and Souza, J. M. (2004). "Approximate Spatial Query Processing Using Raster Signature."
5. Brinkhoff, T., Kriegel, H., and Seeger, B. (1993). "Efficient Processing of Spatial Joins Using R-Trees." In Proceedings of the 1993 ACM SIGMOD, Washington, DC.
6. Brinkhoff, T., Kriegel, H., and Schneider, R. (1994). "Multi-Step Processing of Spatial Joins." Washington, DC, USA, 237-246.
7. Buyya, R., and Venugopal, S. (2004). "The Gridbus Toolkit for Service Oriented Grid and Utility Computing: An Overview and Status Report."
8. Câmara, G., and Queiroz, G. (2002). "GeoBR: Intercâmbio Sintático e Semântico de Dados Espaciais."
9. Di, L., Chen, A., Yang, W., and Zhao, P. (2003). "The Integration of Grid Technology with OGC Web Services (OWS) in NWGISS for NASA EOS Data."
10. EGEE. (2006). "GLite - Installation and Configuration Guide v 3.0 (rev 2)." European Union.
11. Egenhofer, M. J., and Herring, J. R. (1994). "Categorizing Binary Topological Relations Between Regions, Lines and Point in Geographical Databases." NCGIA.
12. "Globus Toolkit 4." (2005). www.gridbus.org/escience/051205GlobusTutorialeScience.ppt, July/2006.
13. Foster, I., and Kesselman, C. (1999). "Computational grids." The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann.
14. Foster, I., Kesselman, C., and Tuecke, S. (2001). "The Anatomy of the Grid: Enabling Scalable Virtual Organizations." Lecture Notes in Computer Science, 2150.
15. Gustafson, J. L. (1990). "Fixed Time, Tiered Memory, and Superlinear Speedup."
16. GridWay Team. (2006). "GridWay 5 Documentation: User Guide." Madrid, Spain, Universidad Complutense de Madrid.
17. Güting, R. H., Behr, T., Almeida, V., Ding, Z., Hoffmann, F., and Spiekermann, M. (2004). "Secondo: An Extensible DBMS Architecture and Prototype." Hagen, Germany, Fernuniversität Hagen.
18. Hanssen, G. (2005). "The Filter/Refine Strategy: A Study on the Land-Use Resource Dataset in Norway."
19. Ilya, Z., Memon, A., Petropoulos, M., and Baru, C. (2003). "Online Querying of Heterogeneous Distributed Spatial Data on a Grid." Brno, Cz, 813-823.
20. Kang, M.-S., and Choy, Y.-C. (2002). "Deploying parallel spatial join algorithm for network environment." IEEE, 177-181.
21. Meyer, W. S., and Souza, J. M. (2006). "Overlapped Regions with Distributed Spatial Databases in a Grid Environment." Rio de Janeiro, Brazil.
22. Meyer, W. S., Souza, J. M., and Ramirez, M. R. (2005). "Secondo-grid: An Infrastructure to Study Spatial Databases in Computational Grids." Campos do Jordão, SP, Brazil.
23. Mondal, A., Goda, K., and Kitsuregawa, M. (2003). "Effective Load-Balancing via Migration and Replication in Spatial Grids." Lecture Notes in Computer Science, 2736, 202-211.
24. Özsu, M. T., and Valduriez, P. (2001). "Principles of Distributed Database Systems." Prentice-Hall.
25. Porto, F., Silva, V. F. V., Dutra, M. L., and Schulze, B. (2005). "An adaptive distributed query processing grid service." Trondheim, Norway.
26. Ramirez, M. R. (2001). "Spatial Distributed Query Processing." Rio de Janeiro, RJ, COPPE/UFRJ.
27. Smith, J., Gounaris, A., Watson, P., Paton, N. W., Fernandes, A. A. A., and Sakellariou, R. (2002). "Distributed Query Processing on the Grid."
28. "OGSA-DQP 3.1 User's Documentation." (2006). http://www.ogsadai.org.uk/documentation/ogsa-dqp_3.1/, July/2006.
29. Venugopal, S., Buyya, R., and Winton, L. (2004). "A Grid Service Broker for Scheduling Distributed Data-Oriented Applications on Global Grids."
30. Zimbrão, G., and Souza, J. M. (1998). "A Raster Approximation for the Processing of Spatial Joins." New York, USA, 558-569.
Thank You!