Optimization of Distributed Queries - PowerPoint PPT Presentation

1 / 41

About This Presentation

Title:

Optimization of Distributed Queries

Description:

Title: Data Mining Engineering Author: Peter Brezany Last modified by: brezany Created Date: 8/2/1995 10:08:02 AM Document presentation format: A4 Paper (210x297 mm) – PowerPoint PPT presentation

Number of Views:92

Avg rating:3.0/5.0

Slides: 42

Provided by: PeterB243

Category:

more less

Transcript and Presenter's Notes

Title: Optimization of Distributed Queries

1
Optimization of Distributed Queries

Univ.-Prof. Dr. Peter Brezany
Institut für Scientific Computing
Universität Wien
Tel. 4277 39425
Sprechstunde Di, 13.00-14.00
LV-Portal www.par.univie.ac.at/brezany/teach/gck
fk/300658.html

2
Layers of Query Processing
3
Query Optimization Process
4
Search Space

For a given query, the search space can be
defined as the set of equivalent operator trees,
that can be produced using transformation rules.
It is useful to concentrate on join trees,
operator trees whose operators are join or
Cartesian product.
This is because permutations of the join order
have the most important effect on performance of
relational queries.
Next example illustrates 3 equivalent join trees,
which are obtained by exploiting the
associativity of binary operators. Join tree (c)
which starts with a Cartesian product may have a
much higher cost than other join trees.

5
Search Space - Example
Example SELECT ENAME, RESP FROM EMP,
ASG, PROJ WHERE EMP.ENOASG.ENO AND ASG.PNOP
ROJ.PNO
6
Search Space Shape of the Join Tree
A linear tree at least one operand of each
operand node is a base relation. A bushy tree is
more general and may have operators whose both
operands are intermediate operators. In a
distributed environment, bushy trees are useful
in exhibiting parallelism.
7
Distributed Cost Model

An optimizers cost model includes
Cost functions to predict the cost of operators
Statistics and base data and formulas to evaluate
the sizes of intermediate results.
Cost Functions can be expressed with respect to
either the total time or the response time.The
total time is the sum of all time (cost)
components, the response time is the elapsed time
from the initiation to the completion of the
query.
Total_time TCPU insts TI/O I/Os TMSG
msgs TTR bytes
TCPU the time of a CPU instruction
TI/O - the time of a disk I/O
TMSG - the fixed time of initiating and receiving
a message
TTR - the time it takes to transmit a data unit
from one site to another
Costs are generally expressed in terms of time
units, which in turn, can be translated into
other units (e.g., dollars).

8
Cost Function (cont.)

When the response time of the query is the
objective function of the optimizer, parallel
local processing and parallel communications must
also be considered.
Response_time TCPU seq_insts TI/O
seq_I/Os TMSG seq_msgs TTR seq_bytes
Example

Site 1
x units
Site 3
Site 2
y units
Most early distributed DBMSs designed for wide
area networks have ignored the local processing
cost and concentrate on minimizing the
communication cost. Total_time 2 TMSG TTR
(x y) Respone_time max TMSG TTR x,
TMSG TTR y since the transfers can be done
in parallel.
9
Cost Function (cont.)

Minimizing response time is achieved by
increasing the degree of parallel execution.
This does not imply that the total time is also
minimized.
On contrary, it can increase the total time, for
example by having more parallel local processing
(often includes synchronization overhead) and
transmissions.
Minimizing the total time implies that the
utilization of the resources improves, thus
increasing the system throughput.
In practice, a compromise between the total and
response times is desired.

10
Database Statistics

The main factor affecting the performance is the
size of the intermediate relations that are
produced during the execution.
When a subsequent operation is located at a
different site, the intermediate relation must be
transmitted over the network. ? It is of prime
interest to estimate the size of the intermediate
results in order to minimize the size of data
transfers.
The estimation is based on statistical
information about the base relations and formulas
to predict the cardinalities of the results of
the relational operations. ? the more precise
statistics being the more costly.
For a relation R defined over the attributes A
A1, A2, ..., An and fragmented as R1, R2, ...,
Rr, the statistical data are the following

11
Database Statistics (cont.)
12
Database Statistics (cont.)
13
Database Statistics (cont.)
14
Database Statistics (cont.)
15
Centralized Query Optimization- INGRES Algorithm

Why reviewing centralized optimizations?
A distributed query is is translated into local
queries, each of which is processed in a
centralized way.
Distributed techniques are extensions of
centralized ones
Centralized optimization is a simpler problem
the minimization of communication costs makes
distributed query optimization more complex.
INGRES is a popular relational DB system and it
has a distributed version whose optimization
algorithms are extensions of the centralized
version.

16
INGRES Algorithm

It uses a dynamic query optimization algorithm
that recursively breaks-up a calculus query into
smaller pieces.
It combines calculus-algebra decomposition and
optimization.
A query is first decomposed into a subsequence of
queries having a unique relation in common. Then
each monorelation query is processed by a
one-variable query processor (OVQP).
The OVQP optimizes the access to a single
relation by selecting the best access method to
that relation (e.g., index, sequential scan).

17
INGRES Algorithm (cont.)

By q qi-1 ? qi we denote a query q decomposed
into 2 subqueries qi-1 and qi, where qi-1 is
executed first and its result is consumed by qi.
Given an n-relation query q, the INGRES query
processor decomposes q into n subqueries q1 ? q2
? ... ? qi. This decomposition uses two basic
techniques detachment and substitution.
Detachment is used first it breaks q into q
? q, based on a common relation that is the
result of q.
A more detailed explanation and examples
follow.

18
INGRES Algorithm - Detachment
If the query q is expressed in SQL is of the form
19
Database Example
20
INGRES Algorithm Detachment - Example
Running Example for INGRES To illustrate the
detachment technique, we apply it to the
following query Names of employees working
on the CAD/CAM project This query can be
expressed by the following query q1 on our
example engineering DB.
21
INGRES Algorithm - Substitution
22
INGRES Algorithm Substitution - Example
OVQP one-variable query processor
23
INGRES Algorithm (formalized)
24
JOIN ORDERING IN FRAGMENT QUERIES

Ordering joins is an important aspect of
centralized query optimization.
Join ordering in a distributed context is even
more important since joins between fragments may
increase the communication time.
2 basic approaches exist to order joins in
fragment queries
direct optimization of the ordering of joins
(e.g. in the Distributed INGRES algorithm).
replacement of joins by combination of semijoins
in order to minimize communication costs.

25
Join Ordering

Note To simplify notation, we use the term
relation to designate a fragment stored at a
particular site.
Lets the query is R ? S, where R and S are
relations stored at different sites and ? denotes
the join operator. The obvious choice is to send
the smaller relation to the site of the larger
one.

if size(R) lt size(S)
R
S
if size(R) gt size(S)
More interesting is the case where there are more
than 2 relations to join. The objective of the
join ordering algorithm is to transmit smaller
operands. The difficulty the join operations may
reduce or increase the size of intermediate
results ? estimating the size of joint results
is mandatory, but difficult.
26
Join Ordering - Example
Example Consider the following query expressed
in relat. alg. PROJ ?PNO
EMP ?ENO ASG whose join graph is below
Site 2
ASG
ENO
PNO
EMP
PROJ
Site 3
Site 1
This query can be executed in at least 5
different ways. We describe them by the programs
introduced in the next slide.
27
Join Ordering Example (cont.)
28
Semijoin Based Algorithms

The join of 2 relations R and S over attribute A,
stored at sites 1 and 2 respectively, can be
computed by replacing one or both operand
relations by a semijoin with the other relation,
using the following rules

Strategy 1
Strategy 1
29
Horizontal fragmentation
30
Vertical fragmentation
31
Hybrid (Mixed) Fragmentation
In most cases a simple horizontal or vertical
fragmentation will not be sufficient to satisfy
the requirements of user applications. In this
case a vertical fragmentation may be followed by
a horizontal one, or vice versa, producing a
tree-structured partitioning.
Hybrid fragmentation
Reconstruction of hybrid fragmentation
32
Hybrid Fragmentation (cont.)
Relation A
Fragment H1
Fragment H2V2
Fragment H2V1H1
Fragment H2V1H2
Fragment H2V1H3
33
Distributed INGRES Algorithm

Only horizontal fragmentation is handled.
General and broadcast (the same data unit can be
transmitted from one site to all the other sites
in a simple transfer) network topology
considered.
E.g., broadcasting is used to replicate fragments
and then to maximize the degree of parallelism.
The input is a query expressed in tuple
relational calculus and schema information the
network type the location and size of each
fragment.
The algorithm is executed by the site, called the
master site, where the query is initiated.
The algorithm called D-INGRES-QOA is given in
Algorithm 9.3. in the next slide.

34
Distributed INGRES Algorithm
MRQ_list
35
Distributed INGRES Algorithm (cont.)

All monorelation queries (e.g. selection and
projection) that can be detached are first
processed locally Step (1)
Reduction algorithm is applied to the original
query Step (2). (Reduction is a technique that
isolates all irreducible subqueries and
monorelation subqueries by detachment.)
Monorelation queries are ignored because they
have already been processed in step (1). Thus the
REDUCE procedure produces a sequence of
irreducible subqueries q1 ? q2 ? ... ? qn, with
at most one relation in common between two
consecutive subqueries. Our Running Example in
the previous slides, which illustrated the
detachment technique, also illustrates what the
REDUCE procedure would produce.
Based on the list of irreducible queries isolated
in step (2) and the size of each fragment, the
next subquery MRQ, which has at least 2
variables, is chosen at Step (3.1) and Steps
(3.2), (3.3), and 3.4 are applied to it.
Step 3.2 selects the best strategy to process the
query MRQ. This strategy is described by a list
of pairs (F, S), in which F is a fragment to
transfer to the processing site S.
Step 3.3 transfers all the fragments to their
processing sites.
Step 3.4 executes MRQ.
If there are remaining subqueries, the algorithm
goes back to step (3) and performs the next
iteration. Otherwise, the algorithm terminates.

36
Distributed INGRES Algorithm (cont.)

Optimization occurs in steps (3.1) and (3.2). The
algorithm has produced subqueries with several
components and their dependency order (similar to
one given by a relational algebra tree).
At step (3.1) a simple choice for the next
subquery is to take the next one having no
predecessor and involving the smaller fragments.
This minimize the size of the intermediate
results.
E.g., if a query q has the subqueries q1, q2, and
q3, with dependencies q1 ? q3, q2 ? q3, and if
the fragments referred to by q1 are smaller than
those referred to by q2, then q1 is selected.
The subquery selected must then be executed.
Since the relation involved in a subquery may be
stored at different sites and even fragmented,
the subquery may nevertheless be further
subdivided.

37
Distributed INGRES Algorithm (cont.) - Example
Assume that relations EMP, ASG, and PROJ of the
query of our Running Example are stored as
follows, where relation EMP is fragmented.
38
Distributed INGRES Algorithm (cont.)

At step (3.2), the next optimization problem is
to determine how to execute the subquery by
selecting the fragments that will be moved and
the sites where the processing will take place.
For an n-relation subquery, fragments from n-1
relations must be moved to the site(s) of
fragments of the remaining relation, say Rp, and
then replicated there. Also, the remaining
relation may be further partitioned into k
equalized fragments in order to increase
parallelism. This method is called
fragment-and-replicate and performs a
substitution of fragments rather than of tuples
as in centralized INGRES.
The selection of the remaining relation and of
the number of processing sites k on which it
should be partitioned is based on the objective
function and the topology of the network.
(Replication is cheaper in broadcast networks
than in point-to-point networks).

39
Distributed INGRES Algorithm (cont.) - Example
40
Architecture of Distributed DBMS Revisited
Detailed Model of the Distributed
Execution Monitor
Components of a Distributed DBMS
41
Architecture Revisited (cont.)

The Transaction Manager (TM) is responsible for
coordinating the execution of the DB operations
on behalf of an application.
The Scheduler (SC) is responsible for the
implementation of a specific concurrency control
algorithm for synchronizing access to the
database.
Each transaction originates at one site its
originating site. The execution of the database
operations of a transaction is coordinated by the
TM at that transactions originating site.
The TMs implement an interface for the
applications programs which consists of 5
commands
Begin_transaction. This is an indicator to the TM
that a new transaction is starting. The TM does
some bookkeeping, such as recording the
transactions name, the originating application,
etc.
Read. If the data item x is stored locally, ist
value is read and returned to the transaction.
Otherwise, the TM selects one copy of x and
requests ist copy to be returned.
Write. The TM coordinates the updating ofn xs
value at each site where it resides.
Commit. The TM coordinates the physical updating
of all databases that contain copies of each data
item for which a previous write was issued.
Abort. The TM makes sure that no effects of the
transaction are reflected in the DB.
In providing these services, a TM can
communicate with SCs and data processors at the
same or at different sites.