Title: Query Execution
1. Query Execution
- Zachary G. Ives
- University of Pennsylvania
- CIS 650 Implementing Data Management Systems
- January 24, 2005
Content on hashing and sorting courtesy of Ramakrishnan & Gehrke
2. Today's Trivia Question
3. Query Execution
- What are the goals?
- Logical vs. physical plans: what are the differences?
- Some considerations in building execution engines:
  - Efficiency: minimize copying, comparisons
  - Scheduling: make standard code-paths fast
  - Data layout: how to optimize cache behavior, buffer management, distributed execution, etc.
4. Execution System Architectures
- Central vs. distributed vs. parallel vs. mediator
- Data partitioning: vertical vs. horizontal
  - Monet model: binary relations
  - Distributed data placement
- One operation at a time: INGRES
- Pipelined:
  - Iterator-driven
  - Dataflow-driven
  - Hybrid approaches
5. Execution Strategy Issues
- Granularity & parallelism
- Pipelining vs. blocking
- Materialization
Example plan (tree):
  Join PressRel.Symbol = EastCoast.CoSymbol
    Join PressRel.Symbol = Clients.Symbol
      Scan PressRel
      Select Client = "Atkins"
        Scan Clients
    Project CoSymbol
      Scan EastCoast
6. Iterator-Based Query Execution
- Execution begins at the root
  - open, next, close
  - Propagate calls to children
  - May call multiple child nexts
- Synchronous pipelining:
  - Minimizes copies
  - Efficient scheduling & resource usage
- Can you think of alternatives and their benefits? (A minimal sketch of the iterator protocol follows the plan below.)
Example plan (tree):
  Join PressRel.Symbol = EastCoast.CoSymbol
    Join PressRel.Symbol = Clients.Symbol
      Scan PressRel
      Select Client = "Atkins"
        Scan Clients
    Project CoSymbol
      Scan EastCoast
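To make the open/next/close protocol concrete, here is a minimal Python sketch of two iterators and a pull-based driver; the class names and in-memory tables are illustrative, not from the lecture.

```python
class Scan:
    """Leaf iterator over an in-memory table (stands in for a disk scan)."""
    def __init__(self, table):
        self.table = table
    def open(self):
        self.pos = 0
    def next(self):
        if self.pos >= len(self.table):
            return None          # end of stream
        tup = self.table[self.pos]
        self.pos += 1
        return tup
    def close(self):
        pass

class Select:
    """Filters tuples from its child; calls propagate down the plan tree."""
    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate
    def open(self):
        self.child.open()
    def next(self):
        while (tup := self.child.next()) is not None:
            if self.predicate(tup):
                return tup       # pipeline one tuple up to the parent
        return None
    def close(self):
        self.child.close()

# Pull-based execution, driven synchronously from the root:
plan = Select(Scan([{"Client": "Atkins"}, {"Client": "Jones"}]),
              lambda t: t["Client"] == "Atkins")
plan.open()
while (t := plan.next()) is not None:
    print(t)
plan.close()
```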
7. The Simplest Method
- Iteration over tables:
  - Sequential scan
  - Nested loops join
- What's the cost? What tricks might we use to speed it up?
- Optimizations:
  - Double-buffering
    - Overlap I/O and computation
    - Prefetch a page into a shadow block while the CPU processes a different block
    - Requires a second buffer to prefetch into
    - Switch to that buffer when the CPU is finished with the alternate one
  - Alternate the direction of reads in file scan
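Below is a rough sketch of double-buffering in Python, using a background thread to stand in for asynchronous I/O; read_page and the 4 KB page size are assumptions for illustration.

```python
import threading

def read_page(f, page_no, page_size=4096):
    """Stand-in for a disk read of one page."""
    f.seek(page_no * page_size)
    return f.read(page_size)

def double_buffered_scan(f, num_pages, process):
    """Overlap I/O and computation: prefetch page i+1 into a shadow
    buffer while the CPU processes page i."""
    buffers = [None, None]
    buffers[0] = read_page(f, 0)
    for i in range(num_pages):
        shadow = (i + 1) % 2
        t = None
        if i + 1 < num_pages:
            # Kick off the prefetch of the next page into the shadow buffer.
            def prefetch(slot=shadow, page=i + 1):
                buffers[slot] = read_page(f, page)
            t = threading.Thread(target=prefetch)
            t.start()
        process(buffers[i % 2])   # CPU works on the current buffer
        if t:
            t.join()              # switch buffers once the prefetch lands
```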
8. Speeding Operations over Data
- Three general data organization techniques:
  - Indexing
    - Associative lookup & synopses
  - Sorting
  - Hashing
9. Indices
- GiST and B+ Trees
- Alternatives for storage:
  - <key, record>, <key, rid>, <key, rids>
- Clustered vs. unclustered
- Bitmapped index: one bit position for each value in the domain
  - Requires a domain with discrete values (not necessarily ordinal)
  - Booleans; enumerations; range-bounded integers
  - Low-update data
  - Efficient for AND/OR expressions between different predicates
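To see why bitmapped indices make AND/OR filters cheap, here is a toy Python sketch (all names illustrative): each distinct value keeps one bit per row, and predicates combine with single bitwise operations.

```python
class BitmapIndex:
    """Toy bitmap index: one Python int used as a bit-vector per value."""
    def __init__(self, column):
        self.bitmaps = {}
        for row, value in enumerate(column):
            self.bitmaps[value] = self.bitmaps.get(value, 0) | (1 << row)
    def eq(self, value):
        return self.bitmaps.get(value, 0)

status = BitmapIndex(["open", "closed", "open", "open"])
region = BitmapIndex(["east", "east", "west", "east"])

# AND / OR of predicates become single bitwise operations over the vectors:
matches = status.eq("open") & region.eq("east")
print([r for r in range(4) if matches >> r & 1])  # -> [0, 3]
```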
10. Usefulness of Indices
- Where are these structures most useful?
  - Sargable predicates
  - Covering indices
- In many cases, they only help with part of the story
  - Filter part of the answer set, but we still need further computation
  - e.g., AND or OR of two predicates
- General rule of thumb: an unclustered index is only useful if selectivity is < 10-20%
11. Sorting: External Binary Sort
- Divide and conquer: sort into subfiles and merge
- Each pass, we read & write every page
- If there are N pages in the file, we need ⌈log2(N)⌉ + 1 passes to sort the data, yielding a cost of 2N(⌈log2(N)⌉ + 1)
Example: external binary sort of a 7-page file (two records per page):

  Input file:           3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
  Pass 0 (1-page runs): 3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
  Pass 1 (2-page runs): 2,3 4,6 | 4,7 8,9 | 1,3 5,6 | 2
  Pass 2 (4-page runs): 2,3 4,4 6,7 8,9 | 1,2 3,5 6
  Pass 3 (8-page run):  1,2 2,3 3,4 4,5 6,6 7,8 9
12. General External Merge Sort
- How can we utilize more than 3 buffer pages?
- To sort a file with N pages using B buffer pages:
  - Pass 0: use B buffer pages; produce ⌈N / B⌉ sorted runs of B pages each
  - Passes 1, 2, etc.: merge B-1 runs at a time
[Figure: (B-1)-way merge with B main-memory buffers: B-1 input buffers stream runs in from disk, and one output buffer writes the merged run back to disk]
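A compact sketch of the algorithm, treating each "page" as a Python list and using heapq for the (B-1)-way merge; a real engine would operate on disk pages and buffer frames, so this is illustrative only.

```python
import heapq

def external_merge_sort(pages, B):
    """pages: list of lists (each inner list is one 'page' of records).
    B: number of buffer pages available."""
    # Pass 0: read B pages at a time, sort in memory -> ceil(N/B) runs.
    runs = []
    for i in range(0, len(pages), B):
        runs.append(sorted(r for page in pages[i:i + B] for r in page))
    # Passes 1, 2, ...: merge B-1 runs at a time until one run remains.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + (B - 1)]))
                for i in range(0, len(runs), B - 1)]
    return runs[0] if runs else []

# The 7-page example from the previous slide, with B = 3 buffers:
print(external_merge_sort([[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]], B=3))
```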
13. Cost of External Merge Sort
- Number of passes: 1 + ⌈log_(B-1) ⌈N / B⌉⌉
- Cost = 2N × (# of passes)
- With 5 buffer pages, to sort a 108-page file:
  - Pass 0: ⌈108 / 5⌉ = 22 sorted runs of 5 pages each (last run is only 3 pages)
  - Pass 1: ⌈22 / 4⌉ = 6 sorted runs of 20 pages each (final run only uses 8 pages)
  - Pass 2: ⌈6 / 4⌉ = 2 sorted runs, 80 pages and 28 pages
  - Pass 3: sorted file of 108 pages
  - Total: 4 passes × 2 × 108 = 864 page I/Os
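The same arithmetic as a tiny helper function, which reproduces the example above:

```python
from math import ceil, log

def sort_cost(N, B):
    """Total page I/Os for external merge sort of N pages with B buffers."""
    passes = 1 + ceil(log(ceil(N / B), B - 1))
    return 2 * N * passes

print(sort_cost(108, 5))  # 4 passes -> 864 page I/Os
```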
14. Applicability of Sort Techniques
- Join
- Intersection
- Aggregation
- Duplicate removal as an instance of aggregation
- XML nesting as an instance of aggregation
15. Merge Join
- Requires data sorted by the join attributes
- Merge and join the sorted files, reading sequentially a block at a time
  - Maintain two file pointers
    - While the tuple at R < the tuple at S, advance R (and vice versa)
    - While tuples match, output all possible pairings
  - Maintain a "last in sequence" pointer
- Preserves the sorted order of the outer relation
- Cost: b(R) + b(S), plus sort costs if necessary
  - In practice, approximately linear: 3(b(R) + b(S))
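A sketch of merge join over two lists sorted on the join key; the mark variable plays the role of the "last in sequence" pointer, so duplicate keys on both sides produce all pairings (illustrative code).

```python
def merge_join(R, S, key=lambda t: t[0]):
    """R, S: lists of tuples sorted on the join key. Yields joined pairs."""
    i = j = 0
    while i < len(R) and j < len(S):
        if key(R[i]) < key(S[j]):
            i += 1
        elif key(R[i]) > key(S[j]):
            j += 1
        else:
            # Tuples match: output all pairings for this key value,
            # remembering where the matching run in S started.
            mark, k = j, key(R[i])
            while i < len(R) and key(R[i]) == k:
                j = mark                      # rewind S to start of the run
                while j < len(S) and key(S[j]) == k:
                    yield R[i] + S[j]
                    j += 1
                i += 1

R = [(1, "a"), (2, "b"), (2, "c")]
S = [(2, "x"), (2, "y"), (3, "z")]
print(list(merge_join(R, S)))  # four pairings on key 2
```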
16. Hashing
- Several types of hashing:
  - Static hashing
  - Extendible hashing
  - Consistent hashing (used in P2P; we'll see it later)
17. Static Hashing
- Fixed number of buckets (and pages); overflow when necessary
- h(k) mod N = bucket to which the data entry with key k belongs
- Downside: long overflow chains
[Figure: key → h → h(key) mod N selects one of N primary bucket pages (0 … N-1), each with a chain of overflow pages]
18. Extendible Hashing
- If a bucket becomes full, split it in half
- Use a directory of pointers to buckets; double the directory, splitting just the bucket that overflowed
  - The directory is much smaller than the file, so doubling it is much cheaper
  - Only one page of data entries is split
- The trick lies in how the hash function is adjusted!
19. Example
- Directory is an array of size 4 (global depth 2); entries 00, 01, 10, 11 point to Buckets A, B, C, D:
  - Bucket A (local depth 2): 4, 12, 32, 16
  - Bucket B (local depth 2): 1, 5, 21, 13
  - Bucket C (local depth 2): 10
  - Bucket D (local depth 2): 15, 7, 19
- For r's bucket, take the last 'global depth' bits of h(r); we denote r by h(r)
  - If h(r) = 5 = binary 101, it is in the bucket pointed to by 01
- Insert: if the bucket is full, split it (allocate a new page, redistribute)
  - If necessary, double the directory. (As we will see, splitting a bucket does not always require doubling; we can tell by comparing global depth with local depth for the split bucket.)
20. Insert h(r) = 20 (Causes Doubling)
- Before: global depth 2; Bucket A (local depth 2) holds 32, 16, 4, 12
- 20 = binary 10100; its last 2 bits (00) point to Bucket A, which is full, so A is split
- After: the directory doubles to entries 000 … 111 (global depth 3):
  - Bucket A (local depth 3): 32, 16
  - Bucket A2 (local depth 3, 'split image' of Bucket A): 4, 12, 20
  - Buckets B, C, D are unchanged (local depth still 2)
21. Points to Note
- 20 = binary 10100
  - The last 2 bits (00) tell us r belongs in A or A2
  - The last 3 bits are needed to tell which
- Global depth of directory: max # of bits needed to tell which bucket an entry belongs to
- Local depth of a bucket: # of bits used to determine if an entry belongs to this bucket
- When does a bucket split cause directory doubling?
  - Before the insert, local depth of the bucket = global depth
  - The insert causes local depth to become > global depth; the directory is doubled by copying it over and fixing the pointer to the split-image page
  - (Use of least-significant bits enables efficient doubling via copying of the directory!)
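A condensed Python sketch that follows these rules; the bucket capacity of 4 and the identity hash h(k) = k are assumptions chosen to mirror the example above.

```python
class ExtendibleHash:
    """Toy extendible hash table over integer keys; h(k) = k."""
    def __init__(self, bucket_size=4):
        self.bucket_size = bucket_size
        self.global_depth = 1
        self.directory = [{"depth": 1, "keys": []}, {"depth": 1, "keys": []}]

    def _idx(self, key):
        return key & ((1 << self.global_depth) - 1)  # last global_depth bits

    def insert(self, key):
        bucket = self.directory[self._idx(key)]
        if len(bucket["keys"]) < self.bucket_size:
            bucket["keys"].append(key)
            return
        # Bucket full: if local depth == global depth, double the
        # directory by copying it over (least-significant-bit trick).
        if bucket["depth"] == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        # Split the overflowed bucket into itself and its split image.
        bucket["depth"] += 1
        bit = 1 << (bucket["depth"] - 1)
        image = {"depth": bucket["depth"], "keys": []}
        for i, b in enumerate(self.directory):
            if b is bucket and (i & bit):
                self.directory[i] = image    # fix pointers to the split image
        old = bucket["keys"]
        bucket["keys"] = [k for k in old if not (k & bit)]
        image["keys"] = [k for k in old if k & bit]
        self.insert(key)                     # retry; may split again

eh = ExtendibleHash()
for k in [4, 12, 32, 16, 20]:                # the slides' Bucket A scenario
    eh.insert(k)
print(eh.global_depth)                       # 3: inserting 20 caused doubling
```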
22. Comments on Extendible Hashing
- If the directory fits in memory, an equality search is answered with one disk access; else two
- The directory grows in spurts and, if the distribution of hash values is skewed, it can grow large
- Multiple entries with the same hash value cause problems!
- Delete:
  - If removal of a data entry makes a bucket empty, it can be merged with its split image
  - If each directory element points to the same bucket as its split image, we can halve the directory
23. Relevance of Hashing Techniques
- Hash indices use extendible hashing
- Uses of static hashing:
  - Aggregation
  - Intersection
  - Joins
24. Hash Join
- Read the entire inner relation into a hash table (join attributes as the key)
- For each tuple from the outer, look it up in the hash table & join
- Not fully pipelined
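A minimal in-memory hash join sketch (names illustrative); the build phase must finish before any output, which is why the operator is not fully pipelined:

```python
from collections import defaultdict

def hash_join(outer, inner, outer_key, inner_key):
    """Classic hash join: build on the inner relation, probe with the outer."""
    table = defaultdict(list)
    for s in inner:                        # build phase (blocks until done)
        table[inner_key(s)].append(s)
    for r in outer:                        # probe phase (pipelined)
        for s in table.get(outer_key(r), []):
            yield r + s

R = [(1, "a"), (2, "b")]
S = [(2, "x"), (2, "y")]
print(list(hash_join(R, S, lambda r: r[0], lambda s: s[0])))
```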
25. Running out of Memory
- Prevention: first partition the data by value into memory-sized groups
  - Partition both relations in the same way and write the partitions to files
  - Recursively join the partitions
- Resolution: similar, but done when the hash tables fill
  - Split the hash table into files along bucket boundaries
  - Partition the remaining data in the same way
  - Recursively join the partitions with a different hash function!
- Hybrid hash join: flush "lazily," a few buckets at a time
- Cost: < 3(b(R) + b(S))
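A sketch of the "prevention" strategy (Grace-style partitioning), with in-memory lists standing in for partition files; seeding the hash by recursion depth models the "different hash function" rule. All names are illustrative.

```python
def partition_join(R, S, key_r, key_s, num_parts=4, depth=0):
    """Partition both inputs the same way, then join matching partitions."""
    h = lambda k: hash((depth, k)) % num_parts  # different hash fn per level
    R_parts = [[] for _ in range(num_parts)]
    S_parts = [[] for _ in range(num_parts)]
    for r in R:
        R_parts[h(key_r(r))].append(r)          # real engine: write to files
    for s in S:
        S_parts[h(key_s(s))].append(s)
    out = []
    for Rp, Sp in zip(R_parts, S_parts):
        # A real engine would recurse with a new hash function if a
        # partition is still too large; here we assume each fits in memory.
        table = {}
        for s in Sp:
            table.setdefault(key_s(s), []).append(s)
        for r in Rp:
            for s in table.get(key_r(r), []):
                out.append(r + s)
    return out
```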
26. The Duality of Hash and Sort
- Different means of partitioning and merging data when comparisons are necessary:
  - Sorting breaks on a physical rule (memory size) and merges on a logical step (the merge)
  - Hashing breaks on a logical rule (hash value) and combines using a physical step (concatenation)
- When larger-than-memory sorting is necessary and multiple operators use the same key, we can make all the operators work on the same in-memory portion of the data at the same time
- Can we do this with hashing? Hash teams (Graefe)
28. What If I Want to Distribute Query Processing?
- Where do I put the data in the first place (or do I have a choice)?
- How do we get data from point A to point B?
- What about delays?
- What about binding patterns?
  - Looks kind of like an index join with a sargable predicate
29. Pipelined Hash Join: Useful for Joining Web Sources
- Two hash tables, one per input
- As a tuple comes in, add it to the appropriate side's table & join it with the opposite table
- Fully pipelined, adaptive to source data rates
- Can handle overflow as with hash join
- Needs more memory
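A sketch of the symmetric, pipelined hash join: tuples may arrive from either input in any order, and each arrival is inserted into its own table and probed against the opposite one (illustrative code).

```python
from collections import defaultdict

class PipelinedHashJoin:
    """Symmetric hash join: one hash table per input, insert-then-probe."""
    def __init__(self, key_left, key_right):
        self.keys = {"L": key_left, "R": key_right}
        self.tables = {"L": defaultdict(list), "R": defaultdict(list)}
    def arrive(self, side, tup):
        """Handle one tuple from side 'L' or 'R'; yield new join results."""
        other = "R" if side == "L" else "L"
        k = self.keys[side](tup)
        self.tables[side][k].append(tup)       # add to this side's table
        for match in self.tables[other][k]:    # join with the opposite table
            yield tup + match if side == "L" else match + tup

j = PipelinedHashJoin(lambda r: r[0], lambda s: s[0])
for side, tup in [("L", (2, "b")), ("R", (2, "x")), ("R", (1, "y")), ("L", (1, "a"))]:
    for result in j.arrive(side, tup):
        print(result)   # results stream out as soon as both halves arrive
```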
30. The Semi-Join / Dependent Join
- Take attributes from the left and feed them to the right source as an input/filter
- Important in data integration
- Simple method:
  - For each tuple from the left, send it to the right source; get data back, join
- More complex:
  - Hash "cache" of attributes & mappings
  - Don't send an attribute value already seen
  - Bloom joins (use bit-vectors to reduce traffic)

[Figure: Join A.x = B.y, with attribute x shipped from source A into source B]
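A sketch of the Bloom join idea: ship a small bit-vector summarizing the left side's join keys, so the right source returns only tuples that might match (false positives are re-checked at the join). The filter parameters here are illustrative.

```python
class BloomFilter:
    """Tiny Bloom filter: k hash functions over an m-bit vector."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0
    def _positions(self, key):
        return [hash((i, key)) % self.m for i in range(self.k)]
    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p
    def might_contain(self, key):
        return all(self.bits >> p & 1 for p in self._positions(key))

def bloom_semijoin(A, B, key_a, key_b):
    """Ship a filter over A's keys instead of A itself; B sends back only
    tuples whose key might match, greatly reducing network traffic."""
    bf = BloomFilter()
    for a in A:
        bf.add(key_a(a))
    return [b for b in B if bf.might_contain(key_b(b))]
```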
31. Wrap-Up
- Query execution is all about engineering for efficiency:
  - O(1) and O(lg n) algorithms wherever possible
  - Avoid looking at or copying data wherever possible
- Note that larger-than-memory operation is of paramount importance
  - Should that be so in today's world?
- As we've seen it so far, it's all about pipelining things through as fast as possible
- But we may also need to consider other axes:
  - Adaptivity/flexibility: we may sometimes need these
  - Information flow to the optimizer and the runtime system
32. Upcoming Readings
- For Wednesday:
  - Read the Chaudhuri survey as an overview
  - Read and review the Selinger et al. paper
- For Monday:
  - Read the EXODUS and Starburst papers
  - Write one review contrasting the two on the major issues