Title: CS 347: Distributed Databases and Transaction Processing, Notes 03: Query Processing
2. Query Processing
- Decomposition
- Localization
- Optimization
3. Decomposition
- Same as in centralized system
- Normalization
- Eliminating redundancy
- Algebraic rewriting
4. Normalization
- Convert from a general query language to a standard form (e.g., relational algebra)
5. Example
- SELECT A, C
- FROM R, S
- WHERE (R.B = 1 AND S.D = 2) OR (R.C > 3 AND S.D = 2)
- Normalized condition: (R.B = 1 ∨ R.C > 3) ∧ S.D = 2
- Query tree: π_{A,C} over σ_{(R.B=1 ∨ R.C>3) ∧ S.D=2} over R × S
- Conjunctive normal form
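As a concrete illustration of this normalization step, here is a minimal sketch that converts the slide's WHERE clause to conjunctive normal form. The tuple encoding of conditions and the helper names `to_cnf` and `absorb` are illustrative assumptions, not course code.

```python
def to_cnf(expr):
    """Return CNF as a list of clauses (each clause a set of literals).

    expr is a string literal (an atomic predicate) or a tuple
    ('and', e1, e2) / ('or', e1, e2).
    """
    if isinstance(expr, str):
        return [{expr}]
    op, left, right = expr
    lcnf, rcnf = to_cnf(left), to_cnf(right)
    if op == 'and':
        return lcnf + rcnf                               # AND: concatenate clause lists
    return [lc | rc for lc in lcnf for rc in rcnf]       # OR: distribute over clauses

def absorb(clauses):
    """Drop any clause that is a strict superset of another (absorption)."""
    cs = [frozenset(c) for c in clauses]
    return {c for c in cs if not any(o < c for o in cs)}

# The slide's WHERE clause: (R.B=1 AND S.D=2) OR (R.C>3 AND S.D=2)
query = ('or', ('and', 'R.B=1', 'S.D=2'),
               ('and', 'R.C>3', 'S.D=2'))
cnf = absorb(to_cnf(query))
# cnf == {frozenset({'R.B=1', 'R.C>3'}), frozenset({'S.D=2'})},
# i.e. (R.B=1 OR R.C>3) AND S.D=2, matching the slide
```

Absorption is what collapses the four distributed clauses back down to the two the slide shows.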
6. Also: detect invalid expressions
- E.g., SELECT ... FROM R WHERE R.A = 3 is invalid if R does not have an A attribute
7. Eliminate redundancy
- E.g., in conditions:
- (S.A = 1) ∧ (S.A > 5) ≡ False
- (S.A < 10) ∧ (S.A < 5) ≡ S.A < 5
8. E.g., common sub-expressions
- [Figure: (σ_cond R ⋈ S) ∪ (σ_cond R ⋈ T) rewritten so that the common sub-expression σ_cond R is evaluated only once]
9. Algebraic rewriting
- E.g., push conditions down:
- σ_{cond1 ∧ cond2 ∧ cond3}(R ⋈ S) ⇒ σ_{cond3}(σ_{cond1} R ⋈ σ_{cond2} S)
10. After decomposition
- One or more algebraic query trees on relations
- Localization: replace relations by the corresponding fragments
11. Localization steps
- (1) Start with the query
- (2) Replace relations by fragments
- (3) Push ∪ up; push σ, π down (use CS245 rules)
- (4) Simplify: eliminate unnecessary operations
12. Notation for fragments
- R^[cond]: fragment of R whose tuples all satisfy cond
13. Example A
14. (2) Replace R by its fragments:
- σ_{E=3} over (R1^[E<10] ∪ R2^[E≥10])
15-16. (3) Push σ down:
- σ_{E=3}(R1^[E<10]) ∪ σ_{E=3}(R2^[E≥10])
17-18. Rule 1
- (A) σ_{C1}(R^[C2]) ≡ (σ_{C1} R)^[C1 ∧ C2]
- (B) R^[False] ≡ Ø
19. In Example A
- σ_{E=3}(R2^[E≥10]) ≡ (σ_{E=3} R2)^[E=3 ∧ E≥10]
- ≡ (σ_{E=3} R2)^[False]
- ≡ Ø
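The pruning in Example A can be mechanized: a selection on a fragment is replaced by Ø exactly when the selection condition and the fragment condition are jointly unsatisfiable. A minimal sketch, assuming single-attribute range predicates encoded as (op, value) pairs (an illustrative encoding, not the course's):

```python
def interval(cond):
    """Map a predicate on attribute E to a (lo, hi) interval.

    Strict vs. non-strict bounds are treated loosely, which is fine
    for the equality-vs-range tests in Example A.
    """
    op, v = cond
    if op == '=':
        return (v, v)
    if op == '<':
        return (float('-inf'), v)
    if op == '>=':
        return (v, float('inf'))

def satisfiable(c1, c2):
    """True iff some value can satisfy both range predicates."""
    lo1, hi1 = interval(c1)
    lo2, hi2 = interval(c2)
    return max(lo1, lo2) <= min(hi1, hi2)

# sigma_{E=3}(R2^[E>=10]): combined condition is False, fragment pruned
pruned = not satisfiable(('=', 3), ('>=', 10))
# sigma_{E=3}(R1^[E<10]): satisfiable, so this branch is kept
kept = satisfiable(('=', 3), ('<', 10))
```

Real localizers face arbitrary boolean combinations, where this check becomes a satisfiability test rather than a single interval intersection.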
20. Example B
- (1) R ⋈_A S, where A is the common attribute
21. Fragments
- R1^[A<5], R2^[5≤A≤10], R3^[A>10]
- S1^[A<5], S2^[A≥5]
22-23. (3) Push ⋈ through ∪: union of
- R1^[A<5] ⋈ S1^[A<5], R1^[A<5] ⋈ S2^[A≥5]
- R2^[5≤A≤10] ⋈ S1^[A<5], R2^[5≤A≤10] ⋈ S2^[A≥5]
- R3^[A>10] ⋈ S1^[A<5], R3^[A>10] ⋈ S2^[A≥5]
24. (4) Simplify: union of
- R1^[A<5] ⋈ S1^[A<5], R2^[5≤A≤10] ⋈ S2^[A≥5], R3^[A>10] ⋈ S2^[A≥5]
25. Rule 2
- R^[C1] ⋈_A S^[C2] ≡ (R ⋈_A S)^[C1 ∧ C2 ∧ R.A=S.A]
26. In step (4) of Example B
- R1^[A<5] ⋈ S2^[A≥5]
- ≡ (R1 ⋈ S2)^[R1.A<5 ∧ S2.A≥5 ∧ R1.A=S2.A]
- ≡ (R1 ⋈ S2)^[False] ≡ Ø
27. Localization with derived fragmentation
- Example C
- (2) (R1^[A<10] ∪ R2^[A≥10]) ⋈ (S1^[S.K=R.K ∧ R.A<10] ∪ S2^[S.K=R.K ∧ R.A≥10])
28. (3) Union of R1 ⋈ S1, R1 ⋈ S2, R2 ⋈ S1, R2 ⋈ S2
29. (4) Union of
- R1^[A<10] ⋈ S1^[S.K=R.K ∧ R.A<10]
- R2^[A≥10] ⋈ S2^[S.K=R.K ∧ R.A≥10]
30. In step (4) of Example C
- R1^[A<10] ⋈ S2^[S.K=R.K ∧ R.A≥10]
- ≡ (R1 ⋈ S2)^[R1.A<10 ∧ S2.K=R.K ∧ R.A≥10 ∧ R1.K=S2.K]
- ≡ (R1 ⋈ S2)^[False] (since K is the key of R and R1)
- ≡ Ø
31. (4) Result: (R1^[A<10] ⋈ S1) ∪ (R2^[A≥10] ⋈ S2)
- [Figure: simplified plan, with each R fragment joined only to its derived S fragment]
32. Localization with vertical fragmentation
- Example D
- (1) π_A(R), with R1(K, A, B), R2(K, C, D)
33. (2) π_A over R1(K, A, B) ⋈ R2(K, C, D)
34. (3) π_A over (π_{K,A} R1 ⋈ π_{K,A} R2)
- The join with R2 is not really needed: π_{K,A} R2 contributes only the key
35-36. Rule 3
- Given a vertical fragmentation of R: Ri = π_{Ai}(R), Ai ⊆ A
- Then for any B ⊆ A:
- π_B(R) = π_B( ⋈ { Ri | B ∩ Ai ≠ Ø } )
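Rule 3 can be sketched on Example D's fragments. This is a toy single-process simulation with dict-based tuples; the helper names are illustrative, not course code.

```python
def project(rows, attrs):
    """pi_attrs over a list of dict-tuples (duplicates not eliminated)."""
    return [{a: r[a] for a in attrs} for r in rows]

def join_on_key(r1, r2, key='K'):
    """Key-key join of two vertical fragments."""
    idx = {r[key]: r for r in r2}
    return [{**r, **idx[r[key]]} for r in r1 if r[key] in idx]

R = [{'K': 1, 'A': 'a', 'B': 10, 'C': 'x', 'D': 0},
     {'K': 2, 'A': 'b', 'B': 20, 'C': 'y', 'D': 1}]
R1 = project(R, ['K', 'A', 'B'])   # pi_{K,A,B}(R)
R2 = project(R, ['K', 'C', 'D'])   # pi_{K,C,D}(R)

# pi_A(R): A appears only in R1's attribute set, so Rule 3 keeps just R1
piA = project(R1, ['A'])
# pi_{A,C}(R): {A,C} intersects both fragments, so join before projecting
piAC = project(join_on_key(R1, R2), ['A', 'C'])
```

The point of the rule is the first case: fragments whose attribute sets miss B entirely never need to be shipped or joined.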
37. Localization with hybrid fragmentation
- Example E
- R1 = σ_{K<5}(π_{K,A} R)
- R2 = σ_{K≥5}(π_{K,A} R)
- R3 = π_{K,B} R
40. Summary: Query Processing
- Decomposition ✓
- Localization ✓
- Optimization
- Overview
- Tricks for joins and other operations
- Strategies for optimization
41. Optimization process
- Generate query plans P1, P2, ..., Pn
- Estimate the size of intermediate results
- Estimate the cost of each plan (time, ...): C1, C2, ..., Cn
- Pick the plan with minimum cost
42. Differences from centralized optimization
- New strategies for some operations (semi-join, range-partitioning sort, ...)
- Many ways to assign and schedule processors
43. Parallel/distributed sort
- Input: (a) relation R on a single site/disk
- (b) R fragmented/partitioned by the sort attribute
- (c) R fragmented/partitioned by another attribute
44. Output: (a) sorted R on a single site/disk
- (b) fragments/partitions sorted
- [Figure: example fragments F1, F2, F3, each internally sorted]
45. Basic sort
- R(K, ...), sort on K
- R fragmented on K with vector k0, k1, ..., kn
- [Figure: example fragments split at k0, k1]
46. Algorithm: each fragment is sorted independently
- If necessary, ship results
47. Same idea on different architectures
- Shared nothing: P1 sorts F1, P2 sorts F2; results shipped over the network
- Shared memory: processors P1, P2 sort fragments F1, F2 held in shared memory M
48. Range-partitioning sort
- R(K, ...), sort on K
- R located at one or more sites/disks, not fragmented on K
49. Algorithm
- (a) Range partition on K (vector k0, k1)
- (b) Basic sort
- [Figure: source fragments Ra, Rb are range-partitioned into R1, R2, R3; each Ri is sorted locally; the concatenation of the partitions is the result]
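The two-step algorithm above can be sketched as a toy single-process simulation, where "shipping" a tuple is modeled as appending it to the destination partition's list:

```python
import bisect

def range_partition_sort(fragments, vector):
    """(a) range-partition every source fragment on K using the
    splitter vector, then (b) sort each partition locally ("basic sort");
    concatenating the sorted partitions yields sorted R."""
    parts = [[] for _ in range(len(vector) + 1)]
    for frag in fragments:                              # (a) partition on K
        for key in frag:
            parts[bisect.bisect_right(vector, key)].append(key)
    for p in parts:                                     # (b) local sort
        p.sort()
    out = []
    for p in parts:                                     # ship/concatenate
        out.extend(p)
    return out

Ra, Rb = [31, 7, 10], [12, 52, 8, 4]
result = range_partition_sort([Ra, Rb], vector=[10, 30])
```

Note that, unlike basic sort, the source fragments here need not already be ranges of K; the partitioning step imposes that structure.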
50. Selecting a good partition vector
- [Figure: example key values at each site; a poorly chosen vector sends most tuples to one partition]
51. Example
- Each site sends to the coordinator:
- Min sort key
- Max sort key
- Number of tuples
- The coordinator computes the vector and distributes it to the sites
- (It also decides the number of sites for the local sorts)
52-53. Sample scenario
- Coordinator receives:
- SA: Min = 5, Max = 10, 10 tuples
- SB: Min = 7, Max = 17, 10 tuples
55. Choosing k0 (assuming keys uniformly distributed at each site)
- Want: expected tuples with key < k0 = (total tuples) / 2
- 2(k0 - 5) + (k0 - 7) = 10
- 3k0 = 10 + 10 + 7 = 27
- k0 = 9
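Under the same uniformity assumption, the coordinator's computation generalizes to any number of sites. The sketch below (hypothetical helper names, not course code) recovers the slide's k0 = 9 by binary search instead of solving the equation in closed form:

```python
def tuples_below(k, stats):
    """Expected number of tuples with key < k, assuming each site's
    keys are uniform between its min and max. stats holds
    (min key, max key, tuple count) per site."""
    total = 0.0
    for lo, hi, n in stats:
        if k <= lo:
            continue
        total += n * min(1.0, (k - lo) / (hi - lo))
    return total

def find_splitter(stats, target):
    """Binary-search a splitter k with tuples_below(k) ~= target."""
    lo = min(s[0] for s in stats)
    hi = max(s[1] for s in stats)
    for _ in range(60):                 # 60 halvings: far below any tolerance
        mid = (lo + hi) / 2
        if tuples_below(mid, stats) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# SA: min 5, max 10, 10 tuples; SB: min 7, max 17, 10 tuples
stats = [(5, 10, 10), (7, 17, 10)]
k0 = find_splitter(stats, target=10)    # half of the 20 tuples below k0
```

With more splitters, the same search is run once per target quantile (total/3, 2·total/3, ...).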
56. Variations
- Send more info to the coordinator:
- Partition vector for the local site
- E.g., SA: 3 | 3 | 3 tuples, with local vector 5, 6, 8, 10
- Histogram, e.g., over keys 5, 6, 7, 8, 9, 10
57. More than one round
- E.g.: (1) Sites send their range and number of tuples
- (2) Coordinator returns a preliminary vector V0
- (3) Sites tell the coordinator how many tuples fall in each V0 range
- (4) Coordinator computes the final vector Vf
58. Can you come up with a distributed algorithm?
59-60. Parallel external sort-merge
- Same as range-partitioning sort, except sort first
- [Figure: Ra, Rb are sorted locally, then range-partitioned by k0, k1 into R1, R2, R3; each Ri merges its incoming sorted runs, and the partitions arrive in order]
- Note: can use a merging network if available (e.g., Teradata)
61. Parallel/distributed join
- Input: relations R, S (each may or may not be partitioned)
- Output: R ⋈ S, result at one or more sites
62. Partitioned join (equi-join)
- [Figure: Ra, Rb and Sa, Sb are repartitioned by the same function f(A); each site joins its R and S partitions locally; the result is the union]
63. Notes
- The same partition function f is used for both R and S (applied to the join attribute)
- f can be range or hash partitioning
- The local join can be of any type (use any CS245 optimization)
- Various scheduling options, e.g.:
- (a) partition R; partition S; join
- (b) partition R; build local hash tables for R; partition S and join
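The notes above can be sketched in a toy single-process simulation: both relations are split with the same hash function on the join attribute, and each bucket pair is then joined locally with scheduling option (b), a build on the R side followed by probes from S.

```python
def partition(rel, nbuckets):
    """Split a relation on its first field (the join attribute A)."""
    buckets = [[] for _ in range(nbuckets)]
    for t in rel:
        buckets[hash(t[0]) % nbuckets].append(t)
    return buckets

def partitioned_join(R, S, nbuckets=4):
    Rb = partition(R, nbuckets)
    Sb = partition(S, nbuckets)
    out = []
    for rb, sb in zip(Rb, Sb):            # each bucket pair joined at "its" site
        built = {}                        # build a local hash table on R
        for a, b in rb:
            built.setdefault(a, []).append(b)
        for a, c in sb:                   # probe with the S partition
            for b in built.get(a, []):
                out.append((a, b, c))
    return out

R = [(2, 'a'), (10, 'b'), (25, 'c'), (30, 'd')]
S = [(3, 'x'), (10, 'y'), (15, 'z'), (25, 'w'), (32, 'x')]
result = sorted(partitioned_join(R, S))
```

Because matching tuples are guaranteed to hash to the same bucket, no bucket ever needs to look at another bucket's data, which is what makes the per-bucket joins independent.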
64. More notes
- We already know why partition-join works:
- (R1 ∪ R2 ∪ R3) ⋈ (S1 ∪ S2 ∪ S3) = (R1 ⋈ S1) ∪ (R2 ⋈ S2) ∪ (R3 ⋈ S3)
- Useful to give this type of join a name, because we may want to partition data to make partition-join possible
- (especially in a parallel DB system)
65. Even more notes
- Selecting a good partition function f is very important:
- Number of fragments
- Hash function
- Partition vector
66. Good partition vector
- Goal: the sizes of the Ri ⋈ Si are roughly the same (load balance)
- Can use a coordinator to select it
67. Asymmetric fragment-and-replicate join
- [Figure: R is partitioned by f across the sites; S is replicated in full to every site; each site joins locally; the result is the union]
68. Notes
- Can use any partition function f for R (even round robin)
- Can do any join, not just equi-join, e.g., R ⋈_{R.A < S.B} S
69. General fragment-and-replicate join
- Partition R with f1 into n fragments (here n = 3) and make copies of each fragment
- [Figure: Ra, Rb repartitioned into R1, R2, R3, each fragment copied]
70. S is partitioned in a similar fashion
- All n × m pairings of R and S fragments are joined: (R1, S1), (R1, S2), (R2, S1), ..., (R3, S2)
71. Notes
- Asymmetric FR join is a special case of general FR join
- Asymmetric FR may be good if S is small
- Works for non-equi-joins
72. Semi-join
- Goal: reduce communication traffic
- R ⋈ S ≡ (R ⋉ S) ⋈ S
- ≡ R ⋈ (S ⋉ R)
- ≡ (R ⋉ S) ⋈ (S ⋉ R)
73-75. Example: R ⋈ S
- R(A, B): (2, a), (10, b), (25, c), (30, d)
- S(A, C): (3, x), (10, y), (15, z), (25, w), (32, x)
- π_A(R) = {2, 10, 25, 30}; S ⋉ R = {(10, y), (25, w)}
76-77. Computing transmitted data in the example
- With semi-join R ⋈ (S ⋉ R):
- T = 4 A-values + 2 (A,C)-tuples (+ result)
- With join R ⋈ S:
- T = 4 (A,B)-tuples (+ result)
78. In general
- Say R is the smaller relation
- (R ⋉ S) ⋈ S is better than R ⋈ S if
- size(π_A(S)) + size(R ⋉ S) < size(R)
79. Similar comparisons hold for the other semi-join forms
- Remember: we are only taking transmission cost into account
80. Trick
- Encode π_A(S) (or π_A(R)) as a bit vector, one bit per possible key
- A bit is set iff that key appears in S
- E.g.: 0 0 1 1 0 1 0 0 0 0 1 0 1 0 0
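The bit-vector trick can be sketched on the running example. Instead of shipping π_A(S) itself, the site holding S ships one bit per possible key; the other site keeps only R-tuples whose key bit is set. The helper names are illustrative, not course code.

```python
def make_bitvector(keys, domain_size):
    """One bit per possible key value; bit k is set iff k is present."""
    bits = 0
    for k in keys:
        assert 0 <= k < domain_size
        bits |= 1 << k
    return bits

def filter_with_bitvector(R, bits):
    """R semijoin S: keep R-tuples whose join key's bit is set."""
    return [t for t in R if (bits >> t[0]) & 1]

R = [(2, 'a'), (10, 'b'), (25, 'c'), (30, 'd')]
S_keys = [3, 10, 15, 25, 32]
bits = make_bitvector(S_keys, domain_size=64)   # 64 possible keys = 8 bytes shipped
reduced = filter_with_bitvector(R, bits)        # only keys 10 and 25 survive
```

For large or non-integer key domains, one bit per hash value of the key gives a Bloom-filter-style variant: false positives (extra tuples shipped) are possible, but no matching tuple is ever dropped.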
81-83. Three-way joins with semi-joins
84. Many options!
- The number of semi-join options is exponential in the number of relations in the join
85. Privacy-preserving join
- Site 1 has R(A, B); Site 2 has S(A, C)
- Want to compute R ⋈ S
- Site 1 should NOT discover any S-info not in the join
- Site 2 should NOT discover any R-info not in the join
86. Semi-join does not work
- If Site 1 sends π_A(R) to Site 2, Site 2 learns all keys of R!
- π_A(R) = (a1, a2, a3, a4)
87. Fix: send hashed keys
- Site 1 hashes each value of A before sending
- Site 2 hashes (with the same function) its own A values to see which tuples match
- π_A(R) → (h(a1), h(a2), h(a3), h(a4))
- Site 2 sees it has h(a1), h(a3) → returns (a1, c1), (a3, c3)
88-89. What is the problem?
- Dictionary attack! Site 2 takes all possible keys a1, a2, a3, ... and checks whether h(a1), h(a2), h(a3), ... match what Site 1 sent
90. Adversary model
- Honest but curious:
- A dictionary attack is possible (the cheating is internal and can't be caught)
- Sending incorrect keys is not possible (the cheater could be caught)
91. One solution (Agrawal et al.)
- Use a commutative encryption function
- E_i(x): x encrypted using site i's private key
- E1(E2(x)) = E2(E1(x))
- Shorthand for the example: E1(x) is x', E2(x) is x'', E1(E2(x)) is x'''
92. Solution
- Site 1 sends E1 of its keys: (a1', a2', a3', a4')
- Site 2 re-encrypts them, (a1''', a2''', a3''', a4'''), and also sends E2 of its own keys (a1, a3, a5, a7)
- Site 1 computes E1 of those, i.e., (a1''', a3''', a5''', a7'''), and intersects with its own doubly encrypted keys
- The matching keys a1, a3 identify the joining tuples (a1, b1), (a3, b3)
93. Why does this solution work?
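One way to see it: with exponentiation modulo a shared prime, the two sites' encryptions commute, so both can compute the same doubly encrypted form of each key, and the intersection of those forms reveals exactly the common keys and nothing else. The sketch below assumes an SRA-style exponentiation scheme; Agrawal et al.'s actual construction may differ, and real deployments need careful parameter and group choices.

```python
p = 2**61 - 1                # a public prime (illustrative choice)
e1, e2 = 65537, 257          # each site's PRIVATE exponent (toy values)

def E(x, e):
    """Commutative 'encryption': E1(E2(x)) = x^(e1*e2) = E2(E1(x)) mod p."""
    return pow(x, e, p)

R_keys = [11, 22, 33, 44]    # Site 1's join keys (a1..a4)
S_keys = [11, 33, 55, 77]    # Site 2's join keys

# Site 1 -> Site 2: E1 of R's keys; Site 2 returns E2(E1(.)) of them and
# sends E2 of its own keys, which Site 1 then re-encrypts with E1.
double_R = {E(E(a, e1), e2) for a in R_keys}
double_S = {E(E(a, e2), e1) for a in S_keys}
common = double_R & double_S   # doubly encrypted forms of exactly {11, 33}
```

Neither site can undo the other's exponent, so non-matching keys stay hidden; a dictionary attack fails because hashing a guessed key is no longer enough, the attacker would also need the other site's private exponent.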
94. Other privacy-preserving operations?
- Inequality join (join condition an inequality, e.g., R.A < S.B)
- Similarity join (join condition an approximate match on A)
95. Other parallel operations
- Duplicate elimination:
- Sort first (in parallel), then eliminate duplicates in the result
- Or partition tuples (range or hash) and eliminate duplicates locally
- Aggregates:
- Partition by the grouping attributes; compute each aggregate locally
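The partition-based duplicate elimination above can be sketched in a few lines: hashing sends all copies of a value to the same site, so each site can deduplicate locally and the union of the local results is duplicate-free. This is a toy single-process simulation.

```python
def dedup_parallel(fragments, nsites=3):
    """Partition tuples by hash; each 'site' dedupes its partition locally."""
    sites = [set() for _ in range(nsites)]
    for frag in fragments:
        for v in frag:
            sites[hash(v) % nsites].add(v)   # all copies of v land together
    out = set()
    for s in sites:                          # union of local results
        out |= s
    return out

Ra, Rb = [1, 2, 2, 7], [2, 7, 9]
result = dedup_parallel([Ra, Rb])
```

The correctness argument is the same as for partition-join: duplicates of a value can never end up at two different sites.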
96-101. Example
- [Figure, shown in stages: fragments Ra, Rb; each site pre-aggregates its groups locally before shipping, so less data crosses the network]
102. Preview: MapReduce
- [Figure: data blocks A1, A2, A3, B1, B2, C1, C2 distributed across nodes, processed in the same partition-then-combine style]
103. Enhancements for aggregates
- Perform the aggregate during partitioning to reduce the data transmitted
- Does not work for all aggregate functions
- Which ones?
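The distinction the question is after: decomposable aggregates (SUM, COUNT, MIN, MAX) can be pre-aggregated per group at each source site, so only one partial row per group is shipped; AVG works via the pair (SUM, COUNT); holistic aggregates such as MEDIAN do not decompose this way. A minimal sketch for AVG (helper names are illustrative):

```python
def local_partials(fragment):
    """Per-group (sum, count), computed at the source before shipping."""
    partials = {}
    for group, value in fragment:
        s, c = partials.get(group, (0, 0))
        partials[group] = (s + value, c + 1)
    return partials

def merge_avg(all_partials):
    """Combine shipped partials and finish the AVG per group."""
    totals = {}
    for partials in all_partials:
        for g, (s, c) in partials.items():
            ts, tc = totals.get(g, (0, 0))
            totals[g] = (ts + s, tc + c)
    return {g: s / c for g, (s, c) in totals.items()}

Ra = [('x', 4), ('y', 10), ('x', 6)]
Rb = [('x', 2), ('y', 30)]
avgs = merge_avg([local_partials(Ra), local_partials(Rb)])
```

Note that shipping per-site averages alone would be wrong (averages of averages); it is the (sum, count) pair that makes AVG decomposable.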
104. Selection
- Range or hash partition: straightforward
- But what about indexes?
105. Indexing
- Can think of the partition vector (k0, k1) as the root of a distributed index
- [Figure: local indexes at Sites 1, 2, 3 beneath the vector]
106. Index on a non-partition attribute
- [Figure: the index itself is split by a vector (k0, k1), with entries pointing to tuples at other sites]
107. Notes
- If the index is not too big, it may be better to keep it whole and make copies...
- If updates are frequent, can partition the update work...
- (Question: how do we handle splits of B-tree pages?)
108. Extensible or linear hashing
- [Figure: hash function f directing tuples to buckets R1, R2, R3, with R4 added as the table grows]
109. How do we adapt these schemes?
- Where do we store the directory, the set of participants, ...?
- Which scheme is better for a distributed environment?
- Can we design a hashing scheme with no global knowledge (P2P)?
110. Summary: Query processing
- Decomposition and Localization ✓
- Optimization
- Overview ✓
- Tricks for joins, sort, ... ✓
- Tricks for inter-operation parallelism
- Strategies for optimization
111. Inter-operation parallelism
112. Pipelined parallelism
- [Figure: one site scans S and streams the tuples matching σ_c to Site 1, which probes R and produces join results as the tuples arrive]
113. Independent parallelism
- R ⋈ S ⋈ T ⋈ V
- (1) temp1 ← R ⋈ S at Site 1; temp2 ← T ⋈ V at Site 2 (in parallel)
- (2) result ← temp1 ⋈ temp2
114. Pipelining cannot be used in all cases
- E.g., hash join: the hash table must be fully built from the stream of R tuples before the stream of S tuples can probe it
115. Summary
- As we consider query plans for optimization, we must consider various tricks:
- for individual operations
- for scheduling operations