Title: OGSA-DQP
1 OGSA-DQP
- Steven Lynden
- University of Manchester
2 Introduction
- OGSA-DQP is a service-based distributed query processor
  - It evaluates queries over distributed data sources wrapped by OGSA-DAI
  - It is built using OGSA-DAI extensibility points
- People involved
  - University of Manchester
    - Tasos Gounaris, Steven Lynden, Alvaro Fernandes, Rizos Sakellariou, Norman Paton
  - University of Newcastle
    - Jim Smith, Arijit Mukherjee, Paul Watson
  - OGSA-DAI
- Prototype release 3.0 available from the OGSA-DAI website
3 OGSA-DQP high-level overview
- OGSA-DQP uses a middleware approach.
  - It can be seen as a mediator over OGSA-DAI wrappers.
  - Usability: use it as an OGSA-DAI data service.
- DQP is capable of planning, scheduling and executing distributed queries in parallel.
- Calls to analysis (Web) services can be declared within queries and invoked by DQP.
[Figure: a Query goes in to OGSA-DQP and Results come back; OGSA-DQP sits over two OGSA-DAI wrappers, each in front of a DBMS and its data.]
4 OGSA-DQP architecture
[Figure: an OGSA-DAI data service with the DQP activities installed. This is the OGSA-DQP service, the Grid Distributed Query Service (GDQS), AKA Coordinator. It is linked by "perform" to three Evaluators, AKA Grid Query Evaluation Service (GQES), each labelled QE.]
5 OGSA-DQP architecture
- DQP evaluator services
  - Are plain Web services
  - Implement the QueryEvaluation port type
    - evaluate: the input is a query plan partition, which is subsequently executed
    - receiveData: allows the evaluator to receive data from other evaluators
- OGSA-DAI extensions
  - DQP resource: a resource which encapsulates a distributed query infrastructure (DQP evaluator services, OGSA-DAI data services etc.). Implemented as a data resource accessor.
  - OQL query statement activity: enables the submission of a query in Object Query Language (OQL)
  - DQP factory activity: enables the creation and configuration of DQP resources.
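The two operations of the QueryEvaluation port type can be pictured with a small in-process sketch. This is not the OGSA-DQP implementation: the class name `EvaluatorSketch`, the use of a Python callable as a "plan partition", and the dict-shaped tuples are all assumptions made for illustration. Only the operation names evaluate and receiveData come from the slide.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List


class EvaluatorSketch:
    """Hypothetical in-process stand-in for a DQP evaluator service."""

    def __init__(self) -> None:
        # Tuples delivered by other evaluators wait here until evaluated.
        self._inbox: Deque[dict] = deque()

    def receive_data(self, tuples: Iterable[dict]) -> None:
        """Counterpart of receiveData: buffer tuples sent by another evaluator."""
        self._inbox.extend(tuples)

    def evaluate(
        self, partition: Callable[[Iterable[dict]], Iterable[dict]]
    ) -> List[dict]:
        """Counterpart of evaluate: run a plan partition over the buffered input.

        The partition is modelled as a plain callable (an assumption); the
        real service receives a query plan partition document.
        """
        buffered = list(self._inbox)
        self._inbox.clear()
        return list(partition(buffered))


# Usage: one evaluator filters tuples and forwards the output to another.
upstream = EvaluatorSketch()
downstream = EvaluatorSketch()
upstream.receive_data([{"proteinId": "P1"}, {"proteinId": "P2"}])
filtered = upstream.evaluate(
    lambda rows: (r for r in rows if r["proteinId"] == "P1")
)
downstream.receive_data(filtered)
```

In this sketch data moves only when the caller forwards it; in the architecture described on the slide, evaluators send data to one another directly.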
6 Example query
- Given two DBMSs and one analysis tool (i.e., a Web service)
  - goTerm: a table in a GO (Gene Ontology) database running as a remote MySQL DB, exposed by an OGSA-DAI data service
  - protein: a table in a protein sequence DB, exposed by an OGSA-DAI data service
  - Blast (sequence alignment scoring Web service)
- We want to obtain alignment scores for a sequence against proteins of a certain kind
- The user submits a single query referencing data stored at multiple sites.
- The author of the query need not be aware of how/where data is stored.
- Queries are written in Object Query Language (OQL)
  select p.proteinId, Blast(p.sequence)
  from protein p, goTerm t
  where t.termId = 'GO:0005942' and p.proteinId = t.proteinId
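To make the meaning of the example query concrete, the following minimal sketch evaluates the same selection, join and function call over two small in-memory lists. The rows are invented placeholders, and `blast_stub` is a hypothetical stand-in that simply returns the sequence length; neither reflects the real databases or the real Blast service.

```python
from typing import List, Tuple

# Illustrative only: tiny in-memory stand-ins for the two remote tables.
protein = [
    {"proteinId": "P1", "sequence": "MKTAYIAK"},
    {"proteinId": "P2", "sequence": "GAVLIPF"},
    {"proteinId": "P3", "sequence": "MSTNPKPQ"},
]
go_term = [
    {"termId": "GO:0005942", "proteinId": "P1"},
    {"termId": "GO:0005942", "proteinId": "P3"},
    {"termId": "GO:0000001", "proteinId": "P2"},
]


def blast_stub(sequence: str) -> int:
    """Hypothetical placeholder for the Blast Web service call."""
    return len(sequence)


def run_example_query(term_id: str) -> List[Tuple[str, int]]:
    """Nested-loop reading of the query: filter on termId, join on proteinId."""
    results = []
    for p in protein:
        for t in go_term:
            if t["termId"] == term_id and p["proteinId"] == t["proteinId"]:
                results.append((p["proteinId"], blast_stub(p["sequence"])))
    return results


rows = run_example_query("GO:0005942")
```

The nested loops show what the query asks for, not how DQP evaluates it; choosing an efficient distributed plan is the job of the optimiser described on the later slides.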
7 Client interaction with OGSA-DQP
- Two kinds of client/server interactions
  1. Configuration: the client sends a perform document requesting the service to create a DQP data service resource
  2. Query submission: the client sends a perform document requesting the service to execute an Object Query Language (OQL) query, using a DQP data service resource created in (1)
- The data service resource created in (1) encapsulates the distributed query infrastructure used to execute queries. It differs from the typical OGSA-DAI data service resources, e.g. a relational data service resource.
8 DQP configuration
<perform>
  <DQPFactory>
    Evaluator URLs
    OGSA-DAI data service resources
    Web service URLs
  </DQPFactory>
</perform>
[Figure: a perform document reaches the DQP factory activity on an OGSA-DAI data service; GetRP calls go to two other OGSA-DAI data services; the activity creates a DQP DSR. Result: resource ID of the created DSR.]
- Global schema of imported DBs and analysis services
- Set of evaluators that can be used
- Physical DB metadata (used to optimise queries)
9 DQP query evaluation
<perform>
  <OQLQueryStatement>
    <expression>OQL query</expression>
  </OQLQueryStatement>
</perform>
[Figure: the OQLQueryStatement activity, using the DQP DSR, sends perform requests to the Evaluators (QE). The Evaluators are shown alongside OGSA-DAI data services and an analysis service, with a transport link between Evaluators. Result: WebRowSet XML stream.]
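The query-submission perform document on this slide can be assembled programmatically. The sketch below uses Python's standard xml.etree.ElementTree and only the three element names shown on the slide (perform, OQLQueryStatement, expression). Real perform documents are likely to carry namespaces, activity names and output bindings that the slide does not show, so treat this as an outline rather than a valid request. The function name `build_query_perform` is made up for this example.

```python
import xml.etree.ElementTree as ET


def build_query_perform(oql: str) -> str:
    """Wrap an OQL string in perform/OQLQueryStatement/expression elements."""
    perform = ET.Element("perform")
    statement = ET.SubElement(perform, "OQLQueryStatement")
    expression = ET.SubElement(statement, "expression")
    # ElementTree escapes markup characters in the text when serialising.
    expression.text = oql
    return ET.tostring(perform, encoding="unicode")


query = (
    "select p.proteinId, Blast(p.sequence) "
    "from protein p, goTerm t "
    "where t.termId = 'GO:0005942' and p.proteinId = t.proteinId"
)
document = build_query_perform(query)
```

Building the document with an XML library rather than string concatenation keeps it well-formed even if the query text contains characters such as `<` or `&`.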
10 OQL Query Statement activity detail
[Figure: query -> parser -> single-node optimiser (logical optimiser, physical optimiser) -> multi-node optimiser (partitioner, scheduler) -> evaluators -> query results]
11 Logical optimisation
- Consider the query
  select p.proteinId, Blast(p.sequence)
  from protein p, goTerm t
  where t.termId = 'GO:0005942' and
  p.proteinId = t.proteinId
- Plan is expressed as a logical algebra
- Multiple equivalent plans are generated
reduce
  op_call (Blast)
    join (proteinId)
      reduce
        scan termId = 'GO:0005942' (goTerm)
      reduce
        scan (protein)
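A logical plan like the one on this slide is a tree of operators, and "multiple equivalent plans" can be produced by applying algebraic rewrite rules to that tree. The sketch below represents the plan as an immutable tree and applies one textbook rule, join commutativity, to derive a second equivalent plan. The `Op` class and the function names are assumptions for illustration; the slides do not say which rewrite rules OGSA-DQP applies.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Op:
    """One node of a logical plan (hypothetical representation)."""
    name: str
    detail: str = ""
    children: Tuple["Op", ...] = ()


def render(op: Op, depth: int = 0) -> List[str]:
    """Return the plan as indented lines, root first."""
    lines = ["  " * depth + f"{op.name} {op.detail}".strip()]
    for child in op.children:
        lines.extend(render(child, depth + 1))
    return lines


def commute_joins(op: Op) -> Op:
    """Return an equivalent plan with the inputs of every join swapped."""
    kids = tuple(commute_joins(c) for c in op.children)
    if op.name == "join":
        kids = kids[::-1]
    return Op(op.name, op.detail, kids)


plan = Op("reduce", "", (
    Op("op_call", "(Blast)", (
        Op("join", "(proteinId)", (
            Op("reduce", "", (Op("scan", "termId = 'GO:0005942' (goTerm)"),)),
            Op("reduce", "", (Op("scan", "(protein)"),)),
        )),
    )),
))
alternative = commute_joins(plan)
```

Printing `"\n".join(render(plan))` reproduces the indented tree form of the plan, and `alternative` differs only in the order of the join inputs.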
12 Physical optimisation
- Plan is expressed as a physical algebra
- Plan is chosen by cost-ranking of equivalent plans
reduce
  op_call (Blast)
    hash_join (proteinId)
      reduce
        table_scan termId = 'GO:0005942' (goTerm)
      reduce
        table_scan (protein)
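The physical plan replaces the logical join with hash_join. As a reminder of what that operator does, here is the textbook build-and-probe hash join over lists of dicts. It is a generic sketch, not OGSA-DQP's operator; the function signature and the sample rows are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def hash_join(
    build_rows: Iterable[dict], probe_rows: Iterable[dict], key: str
) -> Iterator[dict]:
    """Join two inputs on `key` using an in-memory hash table."""
    # Build phase: index one input (ideally the smaller one) by join key.
    table: Dict[object, List[dict]] = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)
    # Probe phase: stream the other input and emit merged matches.
    for probe in probe_rows:
        for match in table.get(probe[key], []):
            yield {**match, **probe}


# Invented sample rows standing in for the filtered goTerm and protein inputs.
go_rows = [{"termId": "GO:0005942", "proteinId": "P1"}]
protein_rows = [
    {"proteinId": "P1", "sequence": "MKT"},
    {"proteinId": "P2", "sequence": "GAV"},
]
joined = list(hash_join(go_rows, protein_rows, "proteinId"))
```

Building on the filtered goTerm input is a natural choice when the selection on termId leaves few rows, which is the kind of decision a cost-based ranking of plans would make.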
13 Query partitioning
- Plan is transformed into a parallel algebra (physical operators plus data exchange)
- Exchange operators are placed where data exchange must take place
reduce
  op_call (Blast)
    exchange
      hash_join (proteinId)
        exchange
          reduce
            table_scan termId = 'GO:0005942' (goTerm)
        exchange
          reduce
            table_scan (protein)
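The placement rule on this slide can be sketched as a tree rewrite: wherever a parent operator and its child run in different places, an exchange is inserted between them. In the sketch below each operator carries a `site` label, which is a hypothetical simplification (the site names are invented, and the slides do not describe how DQP decides where data exchange is needed).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PhysOp:
    """Physical operator tagged with a hypothetical execution site."""
    name: str
    site: str
    children: Tuple["PhysOp", ...] = ()


def place_exchanges(op: PhysOp) -> PhysOp:
    """Insert an exchange on every edge whose two ends are on different sites."""
    new_children = []
    for child in op.children:
        child = place_exchanges(child)
        if child.site != op.site:
            # The exchange is labelled with the consumer's site (arbitrary choice).
            child = PhysOp("exchange", op.site, (child,))
        new_children.append(child)
    return PhysOp(op.name, op.site, tuple(new_children))


def operator_names(op: PhysOp) -> List[str]:
    """Pre-order list of operator names, for inspecting the result."""
    out = [op.name]
    for child in op.children:
        out.extend(operator_names(child))
    return out


plan = PhysOp("reduce", "evalB", (
    PhysOp("op_call", "evalB", (
        PhysOp("hash_join", "evalA", (
            PhysOp("reduce", "goSite", (PhysOp("table_scan", "goSite"),)),
            PhysOp("reduce", "proteinSite", (PhysOp("table_scan", "proteinSite"),)),
        )),
    )),
))
parallel_plan = place_exchanges(plan)
```

With these invented site labels the rewrite yields three exchanges, in the same positions as in the plan on this slide.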
14 Query scheduling
- Allocate operators to evaluator nodes
[Figure: the operators of the example query allocated to evaluator nodes, labelled "partitioned parallelism" and "pipelined parallelism".]
  select p.proteinId, Blast(p.sequence)
  from protein p, goTerm t
  where t.termId = 'GO:0005942' and p.proteinId = t.proteinId
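The two forms of parallelism named on this slide can be illustrated in a few lines. Partitioned parallelism is shown as hash-partitioning tuples across several evaluators, so that each runs the same operator on a share of the data. Pipelined parallelism is shown with generators, where each tuple flows to the next operator as soon as it is produced. The function names, the CRC32 hash and the sample rows are assumptions for illustration; real evaluators run as separate services rather than in one process.

```python
import zlib
from typing import Callable, Iterable, Iterator, List


def partition_by_key(rows: Iterable[dict], key: str, n_evaluators: int) -> List[List[dict]]:
    """Partitioned parallelism: route each row to one evaluator by hashing its key."""
    buckets: List[List[dict]] = [[] for _ in range(n_evaluators)]
    for row in rows:
        # CRC32 gives a hash that is stable across runs, unlike built-in hash().
        index = zlib.crc32(str(row[key]).encode("utf-8")) % n_evaluators
        buckets[index].append(row)
    return buckets


def scan(rows: Iterable[dict]) -> Iterator[dict]:
    """Producer stage of a pipeline: yields rows one at a time."""
    for row in rows:
        yield row


def call_op(rows: Iterable[dict], fn: Callable[[str], int], field: str) -> Iterator[dict]:
    """Consumer stage: applies fn to each row as soon as it arrives."""
    for row in rows:
        yield {**row, "score": fn(row[field])}


sample = [
    {"proteinId": "P1", "sequence": "MKT"},
    {"proteinId": "P2", "sequence": "GAVL"},
    {"proteinId": "P1", "sequence": "MKT"},
]
buckets = partition_by_key(sample, "proteinId", 2)
scored = list(call_op(scan(sample), len, "sequence"))
```

Hashing on the join attribute guarantees that rows with equal proteinId land on the same evaluator, which is what allows a partitioned join to be computed independently on each node.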
15 Conclusion
- OGSA-DQP is a service-based distributed query processor that is
  - Exposed as a service
  - Implemented as an orchestration of services
- Benefits
  - Queries are executed in parallel
  - OGSA-DAI: OGSA-DQP can take advantage of the host of delivery options provided by OGSA-DAI
  - Web services can be invoked during query execution, merging data access with data analysis