Title: Lecture 7: Query Execution: Efficient Join Processing
1. Lecture 7: Query Execution: Efficient Join Processing

Sept. 14, 2007, ChengXiang Zhai
Most slides are adapted from Jun Yang's and AnHai Doan's lectures
2. DBMS Architecture

[Architecture diagram: queries and transactions arrive from users, web forms, applications, and the DBA. Queries flow through the Query Parser, Query Rewriter, Query Optimizer, and Query Executor; transactions go to the Transaction Manager, with Logging & Recovery and a Lock Manager that maintains lock tables. Both paths use Files & Access Methods and the Buffer Manager (buffers in main memory), which sits on top of the Storage Manager and disk storage.]
3. Query Execution Plans

SELECT P.buyer, P.item
FROM   Purchase P, Person Q
WHERE  P.buyer = Q.name AND Q.city = 'urbana' AND Q.age < 20

[Plan tree: π_{buyer, item} over (Purchase ⋈_{buyer = name} σ_{city = 'urbana' ∧ age < 20}(Person))]

- Query plan
  - logical plan (declarative)
  - physical plan (procedural)
    - procedural implementation of each logical operator
    - scheduling of operations
4. Logical vs. Physical Operators

- Logical operators: what they do
  - e.g., union, selection, projection, join, grouping
- Physical operators: how they do it
  - e.g., nested-loop join, sort-merge join, hash join, index join
- Other differences and relations between them?
- How should we design the physical operators?
5. How Do We Combine Operations?

- The iterator model: each operation is implemented by 3 functions
  - Open: sets up the data structures and performs initializations
  - GetNext: returns the next tuple of the result
  - Close: ends the operation and cleans up the data structures
- Enables pipelining!
- Contrast with the data-driven (materialize) model
- Can be generalized to object-oriented DBs
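The iterator model above can be sketched as follows; this is a minimal illustration in Python, with class and method names (`TableScan`, `Select`, `get_next`) chosen for exposition rather than taken from any real DBMS.

```python
class TableScan:
    """Leaf operator: iterates over an in-memory list standing in for a table."""
    def __init__(self, rows):
        self.rows = rows

    def open(self):
        self.pos = 0

    def get_next(self):
        if self.pos >= len(self.rows):
            return None                     # end of stream
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        self.pos = None


class Select:
    """Selection: pulls tuples from its child one at a time and filters them.
    Because each get_next call produces at most one tuple, operators pipeline."""
    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def get_next(self):
        while True:
            row = self.child.get_next()
            if row is None or self.predicate(row):
                return row

    def close(self):
        self.child.close()


# Usage: evaluate sigma_{age < 20}(Person), pulling tuples one at a time.
plan = Select(TableScan([("Ann", 19), ("Bob", 25), ("Cid", 18)]),
              lambda r: r[1] < 20)
plan.open()
out = []
while (t := plan.get_next()) is not None:
    out.append(t)
plan.close()
```

Note that the consumer drives execution: nothing is materialized between operators, which is exactly what makes pipelining possible.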
6. Typical Physical Operators

- Table scan
- Sorting while scanning tables
- Various kinds of algorithms for implementing the relational operations
  - Set union, intersection, ...
  - Selection
  - Projection
  - Join

Join is the most important! (Why?)
7. Notation

- Relations: R, S
- Tuples: r, s
- Number of tuples: |R|, |S|
- Number of disk blocks: B(R), B(S)
- Number of memory blocks available: M
- Cost metrics
  - Number of I/Os
  - Memory requirement
8. Internal-Memory Join Solutions

- Nested-loop join: check every record in R against every record in S; cost: O(|R| |S|)
- Sort-merge join: sort R and S, followed by merging; cost: O(|S| log |S|) (if |R| < |S|)
- Hash join: build a hash table on R; for every record in S, probe the hash table; cost: O(|S|)
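The three in-memory strategies above can be sketched in a few lines each; this is an illustrative Python comparison (function names are my own), joining R and S on equality of the first tuple field.

```python
def nested_loop_join(R, S):
    # O(|R| * |S|): compare every pair
    return [(r, s) for r in R for s in S if r[0] == s[0]]


def sort_merge_join(R, S):
    # O(|R| log |R| + |S| log |S|): sort both, then merge
    R, S = sorted(R), sorted(S)
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][0] < S[j][0]:
            i += 1
        elif R[i][0] > S[j][0]:
            j += 1
        else:
            # output all S tuples whose key matches R[i]'s key
            k = j
            while k < len(S) and S[k][0] == R[i][0]:
                out.append((R[i], S[k]))
                k += 1
            i += 1
    return out


def hash_join(R, S):
    # build a hash table on R, then probe it with each S tuple: O(|R| + |S|)
    table = {}
    for r in R:
        table.setdefault(r[0], []).append(r)
    return [(r, s) for s in S for r in table.get(s[0], [])]


R = [(1, "a"), (2, "b"), (2, "c")]
S = [(2, "x"), (3, "y")]
```

All three produce the same join result; they differ only in how much work they do to find the matching pairs.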
9. External-Memory Solutions: Issues to Consider

- Disk / buffer manager
- Cost model (I/O-based)
- Indexing
- Sort / hash
10. Nested-Loop Join

R ⋈_p S

- For each block of R, and for each r in the block:
  - For each block of S, and for each s in the block:
    - Output rs if p evaluates to true over r and s
- R is called the outer table; S is called the inner table
- I/Os: B(R) + |R| · B(S)
- Memory requirement: 4 (double buffering)
- More reasonable: block-based nested-loop join
  - For each block of R, and for each block of S:
    - For each r in the R block, and for each s in the S block: ...
  - I/Os: B(R) + B(R) · B(S)
  - Memory requirement: same as before
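The block-based variant above can be sketched as follows; a minimal Python illustration in which a relation is modeled as a list of blocks (lists of tuples), joining on equality of the first field.

```python
def block_nested_loop_join(R_blocks, S_blocks):
    out = []
    for rb in R_blocks:              # each block of R is read once: B(R) I/Os
        for sb in S_blocks:          # S is scanned once per R block: B(R) * B(S) I/Os
            # all tuple-pair comparisons within a block pair happen in memory
            for r in rb:
                for s in sb:
                    if r[0] == s[0]:
                        out.append((r, s))
    return out


R_blocks = [[(1, "a"), (2, "b")], [(3, "c")]]
S_blocks = [[(2, "x")], [(3, "y"), (4, "z")]]
result = block_nested_loop_join(R_blocks, S_blocks)
```

The key saving over the tuple-based version is in the loop structure: tuple comparisons happen per pair of resident blocks, so S is scanned once per block of R rather than once per tuple of R.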
11. Index Nested-Loop Join

R ⋈_{R.A = S.B} S

- Idea: use the value of R.A to probe the index on S(B)
- For each block of R, and for each r in the block:
  - Use the index on S(B) to retrieve each s with s.B = r.A
  - Output rs
- I/Os: B(R) + |R| · (index lookup)
  - Typically, the cost of an index lookup is 2-4 I/Os
- Beats other join methods if |R| is not too big
  - Better to pick R to be the smaller relation
- Memory requirement: 2
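A sketch of the idea in Python, where a plain dict mapping key to matching tuples stands in for the index on S(B); in a real system this would be a B-tree or hash index on disk, and each `get` would cost the 2-4 I/Os mentioned above.

```python
def build_index(S):
    # stand-in for an index on S(B): key -> list of S tuples with that key
    index = {}
    for s in S:
        index.setdefault(s[0], []).append(s)
    return index


def index_nested_loop_join(R, index_on_S):
    out = []
    for r in R:                              # one pass over R: B(R) I/Os
        for s in index_on_S.get(r[0], []):   # index lookup instead of scanning S
            out.append((r, s))
    return out


S_index = build_index([(2, "x"), (3, "y"), (3, "z")])
result = index_nested_loop_join([(1, "a"), (3, "b")], S_index)
```

S itself is never scanned during the join; only the tuples that actually match some r.A are touched, which is why this wins when |R| is small.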
12. Improvements of Nested-Loop Join

- Stop early
  - If the key of the inner table is being matched
  - May save half of the I/Os (less for block-based)
- Make use of available memory
  - Stuff memory with as much of R as possible, stream S by, and join every S tuple with all R tuples in memory
  - I/Os: B(R) + ⌈B(R) / (M - 2)⌉ · B(S) (roughly B(R) · B(S) / M)
  - Memory requirement: M (as much as possible)
13. Zig-Zag Join Using Ordered Indexes

R ⋈_{R.A = S.B} S

- Idea: use the ordering provided by the indexes on R(A) and S(B) to eliminate the sorting step of sort-merge join
- Trick: use the larger key to probe the other index
  - Possibly skipping many keys that do not match
14. External Merge Sort

- Problem: sort R, but R does not fit in memory
- Pass 0: read M blocks of R at a time, sort them, and write out a level-0 run
  - There are ⌈B(R) / M⌉ level-0 sorted runs
- Pass i: merge (M - 1) level-(i-1) runs at a time, and write out a level-i run
  - (M - 1) memory blocks for input, 1 to buffer output
  - # of level-i runs = ⌈# of level-(i-1) runs / (M - 1)⌉
- Final pass produces 1 sorted run
15. Example of External Merge Sort

- Input: 1, 7, 4, 5, 2, 8, 9, 6, 3, 0
- Each block holds one number, and memory has 3 blocks
- Pass 0
  - 1, 7, 4 → 1, 4, 7
  - 5, 2, 8 → 2, 5, 8
  - 9, 6, 3 → 3, 6, 9
  - 0 → 0
- Pass 1
  - 1, 4, 7 + 2, 5, 8 → 1, 2, 4, 5, 7, 8
  - 3, 6, 9 + 0 → 0, 3, 6, 9
- Pass 2 (final)
  - 1, 2, 4, 5, 7, 8 + 0, 3, 6, 9 → 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
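The passes above can be sketched as a short Python program; here everything lives in memory for illustration, with each "block" holding one number and M = 3 blocks of memory, exactly as in the example.

```python
import heapq


def external_merge_sort(data, M=3):
    # Pass 0: sort M blocks at a time into level-0 runs
    runs = [sorted(data[i:i + M]) for i in range(0, len(data), M)]
    # Pass i: merge (M - 1) runs at a time until a single run remains
    # (the remaining memory block would buffer the output)
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + M - 1]))
                for i in range(0, len(runs), M - 1)]
    return runs[0]


result = external_merge_sort([1, 7, 4, 5, 2, 8, 9, 6, 3, 0])
```

With M = 3 this produces exactly the runs shown on the slide: four level-0 runs, two level-1 runs, and one final sorted run.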
16. Performance of External Merge Sort

- Number of passes: 1 + ⌈log_{M-1} ⌈B(R) / M⌉⌉
- I/Os: 2 · B(R) · (number of passes) - B(R)
  - Multiply by 2 · B(R): each pass reads the entire relation once and writes it once
  - Subtract B(R) for the final pass, which does not write its output
- Roughly, this is O(B(R) · log_M B(R))
- Memory requirement: M (as much as possible)
17. Some Tricks for Sorting

- Double buffering
  - Allocate an additional block for each run
  - Trade-off: smaller fan-in (more passes)
- Blocked I/O
  - Instead of reading/writing one disk block at a time, read/write a bunch (a cluster)
  - Trade-off: more sequential I/Os vs. smaller fan-in (more passes)
18. Sort-Merge Join

R ⋈_{R.A = S.B} S

- Sort R and S by their join attributes, and then merge
  - r, s = the first tuples in sorted R and S
  - Repeat until one of R and S is exhausted:
    - If r.A > s.B then s = next tuple in S
    - else if r.A < s.B then r = next tuple in R
    - else output all matching tuples, and r, s = next in R and S
- I/Os: sorting (2 · B(R) + 2 · B(S) to produce sorted runs) + merging (B(R) + B(S))
  - In most cases (e.g., join of key and foreign key)
  - Worst case is B(R) · B(S): everything joins
19. Optimization of Sort-Merge Join

- Idea: combine the join with the merge phase of merge sort
- Sort: produce sorted runs of size M for R and S
- Merge and join: merge the runs of R, merge the runs of S, and merge-join the result streams as they are generated!

[Diagram: sorted runs of R and S on disk are merged in memory, and the two merged streams are fed directly into the join.]
20. Performance of the Two-Pass SMJ

- I/Os: 3 · (B(R) + B(S))
- Memory requirement: to be able to merge in one pass, we should have enough memory to accommodate one block from each run
  - M > ⌈B(R) / M⌉ + ⌈B(S) / M⌉
  - M > sqrt(B(R) + B(S))
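The memory bound above is easy to check numerically; the block counts here (B(R) = 1000, B(S) = 540) are made-up illustrative numbers, not from the slides.

```python
import math

B_R, B_S = 1000, 540                         # illustrative block counts
M = math.ceil(math.sqrt(B_R + B_S))          # smallest M with M^2 >= B(R) + B(S)
runs = math.ceil(B_R / M) + math.ceil(B_S / M)   # sorted runs to merge in one pass
```

With these sizes, M = 40 memory blocks suffice: pass 0 produces 25 + 14 = 39 sorted runs, and one block per run fits within the 40 available blocks.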
21. Other Sort-Based Algorithms

- Union (set), difference, intersection
  - More or less like SMJ
- Duplicate elimination
  - External merge sort
  - Eliminate duplicates during the sort and merge
- GROUP BY and aggregation
  - External merge sort
  - Produce partial aggregate values in each run
  - Combine partial aggregate values during merge
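The GROUP BY idea above can be sketched as follows; an illustrative Python fragment (function names are my own) in which each sorted run yields partial per-key sums that are then combined during the merge.

```python
def partial_aggregate(run):
    # per-run partial aggregates: key -> partial SUM over that run
    acc = {}
    for key, val in run:
        acc[key] = acc.get(key, 0) + val
    return acc


def merge_aggregates(partials):
    # combine partial sums during the merge to get the final GROUP BY result
    total = {}
    for acc in partials:
        for key, subtotal in acc.items():
            total[key] = total.get(key, 0) + subtotal
    return total


runs = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
result = merge_aggregates(partial_aggregate(r) for r in runs)
```

This works because SUM (like COUNT, MIN, and MAX) is decomposable: partial results per run can be combined without revisiting the input tuples.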
22. Partitioned Hash Join

R ⋈_{R.A = S.B} S

- Main idea
  - Partition R and S by hashing their join attributes, and then consider corresponding partitions of R and S
  - If r.A and s.B get hashed to different partitions, they don't join
- Hash join vs. nested-loop join
  - Nested-loop join considers all slots
  - Hash join considers only those along the diagonal

[Diagram: a 5 × 5 grid of (R partition, S partition) pairs; hash join examines only the diagonal cells (R_j, S_j).]
23. GRACE Hash Join

- Partition R into M buckets so that each bucket fits in memory
- Partition S into M buckets
- For each bucket j:
  - For each record r in R_j: insert r into a hash table
  - For each record s in S_j: probe the hash table
- Note the asymmetry: how do we choose between R and S?
- I/Os: 3 · (B(R) + B(S))
- Memory: M > sqrt(min(B(R), B(S)))
- Improvement?

Reasonable when memory is small
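The two phases above can be sketched in Python; lists of tuples stand in for the on-disk partitions, and `npart` is the illustrative number of buckets.

```python
def grace_hash_join(R, S, npart=3):
    # Phase 1: partition both relations on a hash of the join key,
    # so matching tuples always land in corresponding partitions
    R_parts = [[] for _ in range(npart)]
    S_parts = [[] for _ in range(npart)]
    for r in R:
        R_parts[hash(r[0]) % npart].append(r)
    for s in S:
        S_parts[hash(s[0]) % npart].append(s)

    # Phase 2: join each pair of corresponding partitions (the "diagonal")
    out = []
    for Rj, Sj in zip(R_parts, S_parts):
        table = {}
        for r in Rj:                         # build an in-memory hash table on R_j
            table.setdefault(r[0], []).append(r)
        for s in Sj:                         # probe it with each tuple of S_j
            out.extend((r, s) for r in table.get(s[0], []))
    return out


result = grace_hash_join([(1, "a"), (2, "b")], [(2, "x"), (2, "y"), (5, "z")])
```

The asymmetry noted on the slide shows up in phase 2: the hash table is built on R's partitions, so R should be the relation whose partitions fit in memory, i.e., the smaller one.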
24. Hybrid Hash Join

- Hybrid of simple hash join and GRACE
- When partitioning R, keep the records of the first bucket in memory as a hash table
- When partitioning S, for records of the first bucket, probe the hash table directly
- Saving: no need to write R_1 and S_1 to disk or read them back into memory
- I/Os: (3 - 2 · M / B(R)) · (B(R) + B(S))
- Memory: M > sqrt(min(B(R), B(S)))

Works well for both large and small memory
25. Handling Partition Overflow

- Case 1, overflow on disk: an R partition is larger than the memory size (note: we don't care about the size of S partitions)
  - Solution: partition recursively
- Case 2, overflow in memory: the in-memory hash table of R becomes too large
  - Solution: revise the partitioning scheme and keep a smaller partition in memory
26. Memory Management Issues

- All real memory: how do we optimally allocate memory?
  - One process
  - Multiple processes
  - Out of memory?
- All virtual memory: non-optimal replacement (LRU)
- Hot set: virtual memory guided by each query's hot set of pages
27. Hash Join versus SMJ

(Assuming two-pass)

- I/Os: same
- Memory requirement: hash join's is lower
  - sqrt(min(B(R), B(S))) < sqrt(B(R) + B(S))
  - Hash join wins big when the two relations have very different sizes
- Other factors
  - Hash join performance depends on the quality of hashing
    - Might not get evenly sized buckets
  - SMJ can be adapted for inequality join predicates
  - SMJ wins if R and/or S are already sorted
  - SMJ wins if the result needs to be in sorted order
28. Duality of Sort and Hash

- Divide-and-conquer paradigm
  - Sorting: physical division, logical combination
  - Hashing: logical division, physical combination
- Handling very large inputs
  - Sorting: multi-level merge
  - Hashing: recursive partitioning
- I/O patterns: sequential reads/writes?
29. What You Should Know

- How the major join processing algorithms work (nested-loop join, sort-merge join, index-based join, and hash join)
- How to compute their I/O and memory overhead
- The relative advantages, disadvantages, and suitable scenarios for each method
30. Carry-Away Messages

- Old conclusions and assumptions regularly need to be re-examined as the world changes
  - System R abandoned hash join, but the availability of large main memories changed the story
- Observe changes in the world and question the relevant assumptions
  - In general, changes in the world bring opportunities for innovation, so be alert to any changes
- How have the needs for data management changed over time?
- Have the technologies for data management been tracking such changes?
- Can you predict what our future needs will be?