Title: Tutorial 2: Distributed Query Processing
1Tutorial 2 Distributed Query Processing
- INFS7907 Advanced Database SystemsSemester 1,
2004
2Overview
- Distribute database design
- Primary horizontal fragmentation
- Derived horizontal fragmentation
- Distribute query processing
- How does DDBMS process a query?
- Semi-join
3Q1- Primary horizontal fragmentation (Principle)
- What?
- Given a single table T, we can fragment it based
on Predicates (conditions) - Example Select from Student where
Department_idITEE - How? (Algorithms) Understand the process
- COM_MIN
- Get a set of single predicates which are complete
and minimal to the Application. - PHORIZONTAL
- Input A set of single predicates and table
- Output fragements
4Q1- Primary horizontal fragmentation (Question)
Consider S given below. It is the subject of
primary horizontal fragmentation.
Suppose that there is only 1 application that
accesses S. The application checks the salary
information and determines a raise accordingly.
Assume that employee records are managed in two
places, one handling the records of those with
salaries less than or equal to 30,000 and the
other handling the records of those who earn more
than 30,000. Therefore the query is issued at
two sites.
The simple predicates that would be used to
partition S are P1 SAL 30000 P2 SAL gt
30000 thus giving the initial set of simple
predicates Pr p1, p2. Question (a) How do
we check if Pr is complete and minimal? (b)
Apply the PHORIZONTAL algorithm.
5Q1- Primary horizontal fragmentation (Solution)
(a) Applying the COM_MIN algorithm can prove
that Pr is complete and minimal
(b) Applying the PHORIZONTAL algorithm minterm
predicates (every possible conjunction of simple
predicates) m1 (SAL 30000) ? (SAL gt
30000) m2 (SAL 30000) ? ?(SAL gt 30000) m3 ?
(SAL 30000) ? (SAL gt 30000) m4 ? (SAL
30000) ? ? (SAL gt 30000) Implications
(relationships between predicates) i1 (SAL
30000) ? ? (SAL gt 30000) i2 ? (SAL 30000) ?
(SAL gt 30000) i3 (SAL gt 30000) ? ? (SAL
30000) i4 ? (SAL gt 30000) ? (SAL
30000) Simplification m1 is contradictory
(according to i1), m4 is contradictory (due to
i2) Mm2, m3 P1, P2 Fragmentation Fs S1,
S2 according to M.
6Q2 Distributed query processing(Question)
A company has branches in Brisbane, Sydney and
Melbourne. Its company database consists of two
tables (global schema) branch(name, location,
manager) employee(name, office, salary,
home_address)
- The database is fragmented as the following
(local schema) - Brisbane site ?location Brisbane branch
(bb), ? name, office, home_address employee (e1) - Sydney site ?location Sydney branch (bs)
- Melbourne site ?location ? Brisbane and
location ? Sydney branch (bo), ? name, salary
employee (e2) - For the following query
- select e.name, salary
- from branch b, employee e
- where location Brisbane and e.name manager
- Give operator trees for the above query
- after query decomposition (i.e., on global
relations), - after localization, and
- after reduction.
7Q2 Distributed query processing(Principle
solution)
Localization Replace the global table by
fragments (fragments can be reconstructed).
Simplification The operator tree is simplified
based on the definition of fragments and the
structure of old tree.
Decomposition sql?RA represented by operator
tree Same as in centralized system
8Q3 - Semi-join (What)
- What is semi-join?
- r ?lt s?r(r ?? s)
- s ?ltr ?s(r ?? s)
- the schema of result table is always the same as
that of the left table - Is r ?lt s equal to s ?ltr ?
Semi-join can be regarded as a kind of selection!
Solutions a) r( A, B) 1 4
1 5
2 4
Let r(AB) and s(BCD) be two following
relations r( A, B) s(B C D) 1 4
4 5 0 1 5 4 7 8
2 4 5 0 1 2 6 5
2 1 3 7
9Q3 - Semi-join (Why How)
A simple approach ship r from site 1 to site
2 We prefer to only ship tuples that are useful
for the join. Semi-join are used to find those
tuples. Goal reduce the data transmission in the
networks
t1 ? B(s)
2 data unit
t2 r t1r s
6 data unit
Compute t2 s
c) Assume r is at site 1 and s is at site 2, and
a query r s has been issues at site 2. Give
steps for a query processing strategy using
semijoin, and check if the semijoin is a
beneficial option in this case (ignore local
processing cost).
step 1 site 2 sends t1 ? B(s) to site 1
step 2 site 1 sends t2 r t1 to site 2
step 3 site 2 returns t2 s to the
user.
r( A, B) s(B C D) 1 4 4
5 0 1 5 4 7 8 2 4
5 0 1 2 6 5 2 1
3 7
Data communication cost Semi-join 26 Ship r 10
Beneficial in this case but not always
10Q5 - Derived horizontal fragmentation(Principle)
- What
- Two tables
- owner and member
- owner referenced table
- member referencing tale
- Partition member based on the predicate of owner
- How
- SELECT member.all-attr FROM member, owner WHERE
member.join-attr owner.join-attr and mj - Or SELECT FROM member WHERE member.join-attr
IN (SELECT owner.join-attr FROM owner WHERE
mj)
member Student (sid, sname, dept_id) Dept(did,
dname, location) owner
Semi-join!
11Q5 - Derived horizontal fragmentation(Question
Solution)
- Write an SQL query to find the name and initial
diagnosis of patients in air-conditioned wards
Given the following database schema WARD
SELECT PNAME, INIT-DIAG FROM PATIENT, WARD WHERE
AIR-COND Y AND PATIENT.WARD WARD.NO
b) What is the minterm predicate for the primary
horizontal fragmentation of the WARD relation for
air-conditioned wards
PATIENT
P1 AIR-COND Y
c) How many fragments of WARD are generated from
applying the predicates P1 AIR-COND 'Y' P2
AIR-COND 'N'
2
12Q5 - Derived horizontal fragmentation(Question
Solution)
d) What is the relationship between WARD and
PATIENT, that is which is owner and which is
member
Given the following database schema WARD
WARD is the owner PATIENT is the member
e) What is the join predicate of the relations
WARD and PATIENT
WARD.NO PATIENT.WARD
f) Find the derived horizontal fragments of the
PATIENT relation on the basis of the predicates
used in (c)
PATIENT
Fragment 1 and 2 have the same schema as
PATIENT Please complete this question after
tutorial
13Q4 Semi-join (More exercises)
14Q6 Discussion on query execution strategy
15Thank you !!!