Title: Query Processing in Connectivity-Challenged Environments
1 Query Processing in Connectivity-Challenged Environments
- Priyanka Puri
- Sharma Chakravarthy
- Gururaj Poornima
- Mohan Kumar
- Information Technology Laboratory
- Computer Science and Engineering Department
- The University of Texas at Arlington, Arlington, TX 76009
- Email: sharma_at_cse.uta.edu
- URL: http://itlab.uta.edu/sharma
2
- This effort is supported by AFRL under Contract Number FA8750-09-2-0199
- Sanjay Madria and Raytheon (Waseem Naqvi) are also involved in this project
3 Query Processing
- Has been addressed in the context of centralized DBMSs
- Has been addressed in the context of distributed DBMSs
- Cost-based plan generation is typically used
- So, is there anything more/new to do?
4 [Figure: UAV 1 through UAV 5, Ground Controller 1 through Ground Controller n]
5 [Figure: Ground Controller 1]
6 Currently
- Data is dumped into a central server and queried
- Bandwidth, QoS issues are not addressed
- No collaboration among nodes
- No continuous query processing, notification, fusion, context usage, or real-time or near-real-time support
7 Proposed long-term Architecture
[Figure: layered architecture; labels recovered from the diagram]
- Limited Resources, Mobility, Heterogeneity, Disconnections
- Network of computing nodes: Unmanned vehicles, Sensors, Robots, PCs, Servers, Ground Controlling devices
- Queries, Tasks, Requests, Continuous Queries, Publish/Subscribe
- SOA, Distributed Middleware: Task planning, Join computation, Composition, pub/sub, Context-aware Notification, Resource Management, Data management
- Context/Knowledge Base
- Fault Tolerance Services
- Local fusion/Materialization
- Publish/Subscribe Capability
- Query Capability
- Raw data / fused data / data from other nodes
8
9 MyObjects Table at each node
Attribute    Size
Timestamp    8 bytes
Node_id      4 bytes
Longitude    4 bytes
Latitude     4 bytes
Obj_type     8 chars
Obj_desc     Varchar (64)
Object_ptr   Pointer (8 bytes)
Total width: 100 bytes
Cardinality (number of tuples), selectivity, and the replication sites of the data are known (part of the metadata)
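As a small illustration of how the schema and metadata above feed a cost estimate, the sketch below computes the tuple width and the bytes shipped for a given cardinality. The field sizes are the ones listed for the MyObjects table; the function names and the "cardinality times width" cost model are assumptions made for illustration, not something the slides define.

```python
# Field sizes in bytes, as listed for the MyObjects table.
# "8 chars" and "Varchar (64)" are counted as 8 and 64 bytes, which is
# what the stated total width of 100 bytes implies.
MYOBJECTS_SCHEMA = {
    "Timestamp": 8,
    "Node_id": 4,
    "Longitude": 4,
    "Latitude": 4,
    "Obj_type": 8,
    "Obj_desc": 64,
    "Object_ptr": 8,
}


def tuple_width(schema, attributes=None):
    """Bytes per tuple for the full schema, or for a projection of it."""
    names = schema.keys() if attributes is None else attributes
    return sum(schema[name] for name in names)


def transfer_bytes(cardinality, schema, attributes=None):
    """Bytes shipped when moving `cardinality` tuples.

    Hypothetical cost model: cardinality multiplied by tuple width.
    """
    return cardinality * tuple_width(schema, attributes)


full_width = tuple_width(MYOBJECTS_SCHEMA)
latitude_only = tuple_width(MYOBJECTS_SCHEMA, ["Latitude"])
```

Projecting onto a single 4-byte attribute before a transfer shrinks each shipped tuple from 100 bytes to 4, which is why select and project are applied before data is moved.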
10 Query Plan Format
Operation 1 | Param | Operand1 | Operand1 Loc | Operand2 | Operand2 Loc | Result Name | Result Loc
Operation 2 | Param | Operand1 | Operand1 Loc | Operand2 | Operand2 Loc | Result Name | Result Loc
...
Operation n | Param | Operand1 | Operand1 Loc | Operand2 | Operand2 Loc | Result Name | Result Loc
11 Operations in Plan format
Operation  Param       Operand1  Operand1 Loc  Operand2  Operand2 Loc  Result Name  Result Loc
Select     A > 100     R1        1             Null      Null          R1           1
Project    A1, A3, A4  R1        1             Null      Null          R1           1
Move       Null        R1        1             Null      Null          R            2
Copy       Null        R1        1             Null      Null          R14          4
SemiJoin   A = C       R         2             R2        2             SR1          2
Join       B = D       R12       2             R2        2             JR1          2
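One way to hold a plan in this eight-field format is a small record type, sketched below. The class name `PlanStep`, its field names, and the `is_local` helper are hypothetical; only the eight columns and the sample rows come from the table above. The conditions are written as "A > 100" and "A = C", which is how the Select and SemiJoin parameters are read here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanStep:
    """One numbered operation of a query plan (hypothetical representation)."""
    operation: str
    param: Optional[str]
    operand1: str
    operand1_loc: int
    operand2: Optional[str]
    operand2_loc: Optional[int]
    result_name: str
    result_loc: int

    def is_local(self) -> bool:
        """True when all operands and the result live at the same site."""
        locs = {self.operand1_loc, self.result_loc}
        if self.operand2_loc is not None:
            locs.add(self.operand2_loc)
        return len(locs) == 1


# Three rows taken from the operations table; None stands for Null.
plan = [
    PlanStep("Select", "A > 100", "R1", 1, None, None, "R1", 1),
    PlanStep("Move", None, "R1", 1, None, None, "R", 2),
    PlanStep("SemiJoin", "A = C", "R", 2, "R2", 2, "SR1", 2),
]

# Steps that ship data between sites are the ones affected by connectivity.
network_steps = [step.operation for step in plan if not step.is_local()]
```

Keeping operand and result locations in every step makes it easy to see which steps depend on a link being available.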
12 Plan using Semijoin chains
- SELECT c1 R1
- MOVE R11 To Site2
- SELECT c2 R2
- SJ R11 R21 J1
- MOVE J1 To Site3
- SELECT c3 R3
- SJ J1 R31 J2
- MOVE J2 To Site2
- SJ J2 R21 J3
- MOVE J3 To Site1
- SJ J3 R11 J4
- COPY R To Site7 J
- Total Cost: 14720 + 32000 = 46720
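[Figure: semijoin chain across Sites 1, 2, and 3, with the result copied to Site 7. Base relations: R1 = 1000, R2 = 5000, R3 = 3000 tuples. After select/project: R11 = 800, R21 = 3000, R31 = 600. Semijoin results: J1 = 1200, J2 = 240, J3 = 1200, J4 = 320. Transferred attributes: lat; long; long,nodeid; lat,nodeid. Transfer costs: 3200, 4800, 1920, 4800, and 32000 for the final copy.]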
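The total for this plan can be checked with a few lines of arithmetic. In the sketch below, the tuple counts and the per-step costs are the figures shown on the slide. The bytes-per-tuple values are an assumption, back-derived by dividing each cost by the corresponding tuple count, and the pairing of each cost with a particular MOVE step follows the order of the steps. The final copy is assumed to ship the 320 tuples of J4 at the full 100-byte width.

```python
# (result moved, destination site, tuples moved, ASSUMED bytes per tuple)
# Tuple counts are from the slide; widths are back-derived from its costs.
MOVES = [
    ("R11", 2, 800, 4),
    ("J1", 3, 1200, 4),
    ("J2", 2, 240, 8),
    ("J3", 1, 1200, 4),
]
# Assumption: the final copy ships J4 (320 tuples) at the full tuple width.
FINAL_COPY = ("J4", 7, 320, 100)


def step_cost(step):
    """Transfer cost of one step: tuples moved times bytes per tuple."""
    _name, _site, tuples, width = step
    return tuples * width


move_costs = [step_cost(step) for step in MOVES]
semijoin_cost = sum(move_costs)
copy_cost = step_cost(FINAL_COPY)
total_cost = semijoin_cost + copy_cost
```

Under these assumptions the four moves account for 14720 and the final copy for 32000, matching the two terms of the stated total.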
13 Semi-join/join plan generation
- We are developing algorithms that generate the plan space and prune it to find the best (or a good) plan for each input query (expressed as a join query)
- It is a cost-based algorithm based on the System R and SDD approaches, extended to include connectivity and bandwidth issues
- The complexity of plan generation is k^n, where n is the number of joins and k is the number of alternatives for each join
- Assuming fewer than 5 joins in a query
- Integrate replication into the algorithm
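To make the size of the plan space concrete, the sketch below enumerates every combination of one alternative per join and picks the cheapest. It is a brute-force illustration only and does not show the pruning the project is developing. The alternative names and their costs are invented for the example.

```python
from itertools import product


def enumerate_plans(alternatives_per_join):
    """Yield every plan: one chosen alternative for each join."""
    return product(*alternatives_per_join)


def best_plan(alternatives_per_join, cost_of):
    """Exhaustively pick the cheapest plan (no pruning)."""
    return min(enumerate_plans(alternatives_per_join), key=cost_of)


# Hypothetical example: n = 3 joins, k = 3 alternatives for each join.
ALTERNATIVES = [["join_at_left", "join_at_right", "semijoin"]] * 3

# Hypothetical per-alternative costs, summed over the joins of a plan.
ALTERNATIVE_COST = {"join_at_left": 5, "join_at_right": 3, "semijoin": 2}


def plan_cost(plan):
    return sum(ALTERNATIVE_COST[choice] for choice in plan)


plans = list(enumerate_plans(ALTERNATIVES))
cheapest = best_plan(ALTERNATIVES, plan_cost)
```

With three alternatives per join, a query with four joins yields 3^4 = 81 candidate plans, so exhaustive enumeration stays small under the stated bound on joins.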
14 Plan Generation Alternatives
- A Query Plan (QP) is a numbered sequence of operations for executing a query
- A QP includes how data is moved as part of execution
- Plan generation alternatives
  - Static: plan generated once and executed in a distributed manner
  - Dynamic: plan generated incrementally at each node as the query progresses, using current connectivity information
  - Parallel: partial plans are executed in parallel
  - Interactive: get some estimate by asking nodes that have relevant data
15 Static plan
- The physical plan generated will have node information for data propagation
- This will be mapped to actual connectivity by the physical layer for execution
- It is possible that no connectivity exists by the time execution is performed for a generated query plan
- In that case, either a new plan can be generated (using the same algorithm, but using current metadata) or an alternative approach can be used to incrementally modify the plan
16 Dynamic plan
- Generate plan for the first join and defer the rest of the plan
- Join plans are generated one at a time
- Current connectivity information can be used
- Result size estimation will also be more accurate
- Query execution and (partial) plan generation are intertwined
- Does not increase the complexity of plan generation or plan execution (compared to static)
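The interleaving of planning and execution can be sketched as a loop that plans only the next join, using whatever links exist at that moment. Everything here is hypothetical: the function names, the "pick the reachable replica with the highest bandwidth" rule, and the sample link table are assumptions chosen to keep the example short, not the project's algorithm.

```python
def plan_next_join(current_site, candidate_sites, is_connected, bandwidth):
    """Pick the reachable candidate site with the highest bandwidth.

    Returns None when no candidate is reachable right now.
    """
    reachable = [s for s in candidate_sites if is_connected(current_site, s)]
    if not reachable:
        return None
    return max(reachable, key=lambda s: bandwidth(current_site, s))


def run_dynamic(start_site, joins, is_connected, bandwidth):
    """Plan and execute one join at a time.

    `joins` is a list of (relation, sites holding a copy of it).
    Returns the steps carried out as (relation, from_site, to_site).
    """
    site, executed = start_site, []
    for relation, candidates in joins:
        target = plan_next_join(site, candidates, is_connected, bandwidth)
        if target is None:
            break  # the rest of the plan stays deferred
        executed.append((relation, site, target))
        site = target
    return executed


# Hypothetical link table: (from_site, to_site) -> bandwidth.
LINKS = {(1, 2): 10, (2, 3): 5, (2, 4): 20}
JOINS = [("R2", [2]), ("R3", [3, 4]), ("R4", [9])]

steps = run_dynamic(
    1,
    JOINS,
    is_connected=lambda a, b: (a, b) in LINKS,
    bandwidth=lambda a, b: LINKS[(a, b)],
)
```

In this run the join with R3 goes to site 4 because that link has the higher bandwidth, and the join with R4 is deferred because no link to site 9 exists when it is planned.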
17 Parallel plan
- All local operations/computations (select, project, and even some joins) can be done in parallel
- Join plans are still generated one at a time
- Increases message/information exchange
- Current connectivity information can be used
- Result size estimation will also be more accurate
- Dealing with responses, plan generation, and execution may be slightly more complicated than in the previous cases
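The idea of running local operations at all sites at once can be illustrated on a single machine, with threads standing in for nodes. This is only an analogy under that assumption: the function names and sample data are hypothetical, and a real deployment would send requests to remote nodes rather than submit work to a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def local_select(rows, predicate):
    """Select applied at one site to that site's own data."""
    return [row for row in rows if predicate(row)]


def run_local_ops(site_data, predicate):
    """Run the local select for every site concurrently.

    Threads stand in for nodes here; results are keyed by site id.
    """
    with ThreadPoolExecutor() as pool:
        futures = {
            site: pool.submit(local_select, rows, predicate)
            for site, rows in site_data.items()
        }
        return {site: future.result() for site, future in futures.items()}


# Hypothetical data: each site holds values of attribute A.
SITE_DATA = {1: [50, 150, 200], 2: [10, 120], 3: [99]}

selected = run_local_ops(SITE_DATA, lambda a: a > 100)
result_sizes = {site: len(rows) for site, rows in selected.items()}
```

The actual result sizes returned by the sites can then replace estimated ones when the join plans are generated.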
18 Interactive plan
- When a query comes in, send out requests for local processing and get processing time and size information
- Use the above to generate partial plans
- Join plans are still generated using information obtained interactively
- Increases message/information exchange
- Current connectivity information can be used
- Result size estimation will also be more accurate
- Combines dynamic and parallel execution in an interactive manner
19 Replication Issues
- Algorithm for replication
  - Single-copy replication that minimizes the data transmission cost and maximizes the number of paths (to deal with connectivity)
- Algorithm for replication utilization
  - Given a replication, determine the utility of that replica in terms of query evaluation cost for a reasonable load
- Reconcile the above two to come up with a replication strategy that balances the competing tradeoffs
20 Thank You!