Title: ICS 214B: Transaction Processing and Distributed Data Management
1ICS 214B Transaction Processing and Distributed
Data Management
- Lecture 9 Fragmentation and Distributed Query
Processing - Professor Chen Li
2Which simple predicates should we use in Pr?
- Desired property of Pr
- - minimality
- - uniformity
3Return to example
- E(, NM, LOC, SAL,)
- Common queries
- Qa select Qb select
- from E from E
- where LOCSa where LOCSb
- and and ...
4Three choices
- (1) Pr F1 E
- (2) Pr LOCSa, LOCSb
- F2 ? locSa E, ? locSb E
- (3) Pr LOCSa, LOCSb, Sallt10
- F3 ?locSa ? sallt10 E, ?locSa ? sal?10 E,
- ?locSb ? sallt10E, ?locSb ? sal?10 E
5In other words
Qa Select loc Sa ...
LocSa ? sal lt 10
Qb Select loc Sb ...
LocSa ? sal ? 10
F3
F1
F2
LocSb ? sal lt 10
LocSb ? sal ? 10
6Informal definition
- Set of predicates Pr is uniform if
- for every Fi ? FPr, every t ? Fi has equal
probability of access by every major application. - Note FPr is fragmentation defined by minterm
predicates generated by Pr.
7Back to example
Qa Select loc Sa ...
LocSa ? sal lt 10
tuples here have higher probability of access
Qb Select loc Sb ...
LocSa ? sal ? 10
F1
LocSb ? sal lt 10
tuples here have lower probability of access
LocSb ? sal ? 10
so F1 is not good...
8Back to example
Qa Select loc Sa ...
LocSa ? sal lt 10
tuples here have same probability of access
Qb Select loc Sb ...
LocSa ? sal ? 10
F2
LocSb ? sal lt 10
so F2 is good...
so is F3 ...
LocSb ? sal ? 10
9Informal definition
- Set of predicates Pr is minimal if no
- Pr ? Pr is uniform
10Back to example
uniform?
- (1) Pr N
- (2) Pr LOCSa, LOCSb Y
- (3) Pr LOCSa, LOCSb, Sallt10 N
-
Pr(2) is a subset of Pr(3), so Pr(3) is not
minimal...
11Is Pr uniform and minimal a good thing?
- Not necessarily! But it does simplify allocation
problem...
12Derived horizontal fragmentation
- E(ENO, NAME, SAL, LOC)
-
- J(ENO, DESCRIPTION,)
E ? F E1, E2 by LOC
Common query Given an employee name, list
projects (s)he works in
13E2
E1
(at Sa)
(at Sb)
J
14E2
E1
(at Sa)
(at Sb)
J2
J1
15Derived horizontal fragmentation
- R, F F1, F2, ... Fn
- ?
- S, D D1, D2, Dn where Di S Fi
- Convention R is owner
- S is member
F could be primary or derived
16Checking completeness and disjointness of
derived fragmentation
Example Say J is
- ? But no 33 in E1 nor in E2!
This J tuple will not be in J1 nor
J2 ?Fragmentation not complete
17- To get completeness Need to enforce
- referential integrity constraint
- join attr() of member relation
- ?
- join attr() of owner relation
18Example
E2
E1
Fragmentation is not disjoint!
J
J1
J2
19- To get disjointness Join attribute() should be
key of owner relation
20Summary horizontal fragmentation
- Type primary, derived
- Properties completeness, disjointness
- Predicates minimal, uniform
21Vertical fragmentation
Example
E
E2
E1
22- RT ? R1T1 Ti ? T
-
- RnTn
- Just like normalization of relations
-
...
23Properties RT ? RiTi
all i
24- (2) Disjointness
- Ti ? Tj ? for all i,j i?j
- E(,LOC,SAL)
E1(,LOC)
E2(SAL)
25- (3) Reconstruction Lossless join
- Ri R
-
all i
? One way to achieve lossless join Repeat
key in all fragments, i.e., Key ? Ti for all i
26Hybrid Fragmentation
R
Horizontal
R2
R1
Vertical
R22
R12
R21
R11
27Hybrid Fragmentation -- Reconstruction
U
Horizontal
Vertical
R22
R12
R21
R11
28Allocation
- Example E(,NM,LOC,SAL) ?
- F1 ?locSa E F2 ?locSb E
- Qa select where locSa...
- Qb select where locSb
Where do F1,F2 go?
Site b
Site a
?
29Issues
- Where do queries originate?
- What is communication cost? and size of
answers, relations, - What is storage capacity, cost at sites? and size
of fragments? - What is processing power at sites?
- What is query processing strategy?
- How are joins done?
- Where are answers collected?
30Do we replicate fragments?
- Cost of updating copies?
- Writes and concurrency control?
- ...
31Optimization problem
- What is best placement of fragments and/or best
number of copies to - minimize query response time
- maximize throughput
- minimize some cost
- ...
- Subject to constraints?
- Available storage
- Available bandwidth, power,
- Keep 90 of response time below X
- ...
- Often, can use common sense
- Place fragments where they are most heavily
accessed
32Summary
- Horizontal and vertical fragmentation
- Designing good fragmentations and allocation
- Next
- Query processing in distributed databases
33Query
Query Plan
34Decomposition
- Same as in centralized system
- Normalization
- Eliminating redundancy
- Algebraic rewriting
35Normalization
- Convert from query language to relational algebra
36Example
- SELECT R.A, S.D
- FROM R, S
- WHERE (R.B1 and S.C2) and (R.A S.A)
37Eliminate redundancy
- E.g. in conditions
- (S.A1) ? (S.Agt5) ? False
- (S.Alt10) ? (S.Alt5) ? S.Alt5
38E.g. Common sub-expressions
-
- U U
-
- S ?cond ?cond T S ?cond T
- R R R
39Algebraic rewriting
- E.g. Push conditions down
- ?cond3
- ?cond
- ?cond1 ?cond2
- R S R S
40Query
41Localization steps
- (1) Start with query tree
- (2) Replace relations by fragments
- (3) Push ? up
- ?,? down
- (4) Simplify eliminate unnecessary operations
42Notation for fragment
- R cond
- fragment conditions its tuples
satisfy
43Example A
44- (2) ?E3
- ?
- R1 E lt 10 R2 E ? 10
45- (3) ?
- ?E3 ?E3
- R1 E lt 10 R2 E ? 10
46 47Rule 1
- ?C1R c2 ? ?C1R c1 ? c2
- R False ? Ø
A
B
48In example A
- ?E3R2 E?10 ? ?E3 R2 E3 ? E?10
- ? ?E3 R2 False
- ? Ø
49Example B
- (1) Acommon
- attribute
- R S
A
50R1 Alt5 R2 5 ? A ? 10 R3 Agt10
S1 Alt5 S2 A ? 5
51- (3) ?
- R1Alt5S1Alt5 R1Alt5S2A?5
R25?A?10S1Alt5 - R25?A?10S2A?5 R3Agt10S1Alt5
R3Agt10S2A?5
52- (4) ?
- R1Alt5S1Alt5 R25?A?10S2A?5
R3Agt10S2A?5
53Rule 2
- R C1 S C2 ?
- R S C1 ? C2 ? R.A S.A
54In step 4 of Example B
- R1 Alt5 S2 A ? 5
- ?R1 S2 R1.A lt 5 ? S2.A ? 5 ?
R1.A S2.A - ?R1 S2 False ? Ø
55Localization with derived fragmentation
- Example C
- (2)
- ? ?
- R1Alt10R2?10 S1KR.K S2KR.K
- ?R.Alt10 ?R.A?10
56- (3) ?
- R1S1 R1S2 R2S1 R2S2
57 58In step 3 of Example C
- R1Alt10 S2KR.K ? R.A?10
- ? R1 S2 R1.Alt10 ? S2.KR.K ?
R.A?10 ? R1.K S2.K - ? R1 S2False (K is key of R, R1)
- ? Ø
59Localization with vertical fragmentation
- Example D
- (1) ?A R1(K, A, B)
- R R2(K, C, D)
60- (2) ?A
- R1 R2
- (K, A, B) (K, C, D)
61- (3) ?A
- ?K,A ?K,A
- R1 R2
- (K, A, B) (K, C, D)
not really needed
62 63Rule 3
- Given vertical fragmentation of R
- Ri ?Ai (R), Ai ? A
- Then for any B ? A
- ?B (R) ?B Ri B ? Ai ? Ø
64Localization with hybrid fragmentation
- Example E
- R1 ?klt5 ?k,A R
- R2 ?k?5 ?k,A R
- R3 ?k,B R
65 66 67Distributed Query Processing
- Decomposition ?
- Localization ?
- Optimization ? next