Title: Self-tuning Memory Management
1. Adaptive Query Processing Optimization
The survival of the fittest is the ageless law of nature, but the fittest are rarely the strong. The fittest are those endowed with the qualifications for adaptation, the ability to accept the inevitable and conform to the unavoidable, to harmonize with existing or changing conditions. (source unknown)
2. Memory Adaptive Operators
- Assign a fixed amount of memory to each operator inside queries
  - Characteristics of each operator (sort, hash join) are different
  - Operator requirements depend on the input data size
  - Need to know the application workload to estimate the memory needed at different points in time
- Give each operator an amount of memory based on a size estimate of its input data, e.g., sort 10 GB, use 40 MB; sort 100 GB, use 400 MB
  - The size estimates are made by the optimizer and can be inaccurate (off by orders of magnitude)
  - The memory usage is not constrained by the workload
- An operator adapts its memory usage to respond to the memory demand in the system
3. Challenge: Sort 1 GB of Data with 1 MB of RAM
[Figure: unsorted data on disk, a small set of main memory buffers, and sorted output written back to disk]
4. A Simpler Problem: Combine Sorted Files
[Figure: two sorted runs on disk are merged using one input buffer per run and an output buffer in main memory]
5. A Simpler Problem: Combine Sorted Files
[Figure: the merge continues; the smallest values from the input buffers move to the output buffer, which is written to disk as it fills]
6. What if there are many more runs?
[Figure: four sorted runs on disk, more runs than can each get an input buffer comfortably in main memory]
7. What if there are many more runs?
[Figure: merging all four runs produces the single sorted output 1 2 3 3 4 4 4 5 5 5 6 7 7 8 9 9]
8. What if there is more memory?
[Figure: the same four sorted runs on disk, now with more main memory buffers available]
9. What if there is more memory?
[Figure: with more buffers, more runs can be read and merged at the same time]
10. What if there is more memory?
[Figure: the merge completes in fewer passes when more buffers are available]
11. Multi-way Merge Sort
- Given k sorted files (runs), we can merge them into larger sorted runs, and eventually produce one single sorted file (see the merge sketch below)
- To sort a very large (unsorted) file, we can do it in 2 steps:
  - Generate sorted runs
  - Merge sorted runs (we already know how to do this)
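As an illustration of the merge step, here is a minimal Python sketch that merges k sorted runs using a heap holding the current head of each run. It is a simplification that keeps everything in memory (the standard library's heapq.merge does the same job); the example uses four runs matching the figures above.

```python
import heapq

def merge_runs(runs):
    """Merge k sorted runs (iterables) into one sorted stream.

    A heap holds the current head of each run, so producing each output
    element costs O(log k); this mirrors the one-buffer-per-run picture.
    """
    heap = []
    iters = [iter(r) for r in runs]
    for i, it in enumerate(iters):
        first = next(it, None)
        if first is not None:
            heapq.heappush(heap, (first, i))
    while heap:
        value, i = heapq.heappop(heap)
        yield value
        nxt = next(iters[i], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, i))

# Four small runs whose merged output matches the sequence on the earlier slide
print(list(merge_runs([[2, 3, 4, 6], [4, 7, 8, 9], [1, 4, 5, 7], [3, 5, 5, 9]])))
# [1, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 7, 8, 9, 9]
```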
12. How to generate sorted runs?
- Read as many records as possible into memory
- Perform an in-memory sort
- Write out the sorted records as a sorted run
- Repeat the process until all records in the unsorted file are read (a sketch follows below)
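A minimal sketch of run generation, assuming the memory budget is expressed as a number of records rather than buffer pages; the example input is the sequence used in the following figures.

```python
def generate_runs(records, memory_budget):
    """Phase 1: read up to memory_budget records, sort them in memory,
    and emit each sorted batch as one run (here simply a Python list)."""
    runs = []
    buffer = []
    for rec in records:
        buffer.append(rec)
        if len(buffer) == memory_budget:
            runs.append(sorted(buffer))
            buffer = []
    if buffer:                      # last, possibly shorter, run
        runs.append(sorted(buffer))
    return runs

runs = generate_runs([7, 2, 8, 3, 4, 4, 6, 5, 9, 5, 4, 1, 7, 3, 9, 5], memory_budget=8)
print(runs)   # [[2, 3, 4, 4, 5, 6, 7, 8], [1, 3, 4, 5, 5, 7, 9, 9]]
```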
13. How to generate sorted runs?
[Figure: the unsorted input 7 2 8 3 4 4 6 5 9 5 4 1 7 3 9 5 resides on disk; the main memory buffers are empty]
14. How to generate sorted runs?
[Figure: the first eight values (7 2 8 3 4 4 6 5) are read into the main memory buffers; 9 5 4 1 7 3 9 5 remain on disk]
15. How to generate sorted runs?
[Figure: the buffered values are sorted in memory into 2 3 4 4 5 6 7 8]
16. How to generate sorted runs?
[Figure: the sorted run 2 3 4 4 5 6 7 8 is written back to disk; 9 5 4 1 7 3 9 5 still awaits processing]
17. How to generate sorted runs?
[Figure: the process repeats, producing a second sorted run 1 3 4 5 5 7 9 9 on disk]
18. [Figure: Phase 1 turns the unsorted file into sorted runs; Phase 2 merges the sorted runs into one sorted file]
19. Multi-way Merge Sort
- To sort a file with N pages using B buffer pages:
  - Phase 1: use B buffer pages. Produce ⌈N / B⌉ sorted runs of B pages each.
    - 1 pass (read + write) over the file
  - Phase 2: merge B-1 runs each time
    - ⌈log_{B-1} ⌈N / B⌉⌉ passes
20. Cost of Multi-way Merge Sort
- Number of passes: 1 + ⌈log_{B-1} ⌈N / B⌉⌉
- Cost: 2N × (# of passes)
- E.g., with 5 buffer pages, to sort a 108-page file (see the small calculator below):
  - Phase 1 (pass 0): ⌈108 / 5⌉ = 22 sorted runs of 5 pages each (last run is only 3 pages)
  - Phase 2:
    - Pass 1: ⌈22 / 4⌉ = 6 sorted runs of 20 pages each (last run is only 8 pages)
    - Pass 2: 2 sorted runs, 80 pages and 28 pages
    - Pass 3: sorted file of 108 pages
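A small Python check of this arithmetic; the pass-counting loop mirrors the formula 1 + ⌈log_{B-1} ⌈N / B⌉⌉.

```python
import math

def external_sort_cost(n_pages: int, n_buffers: int):
    """Return (number of passes, total I/O cost) for multi-way merge sort:
    pass 0 creates ceil(N/B) runs, then each merge pass combines B-1 runs."""
    runs = math.ceil(n_pages / n_buffers)
    passes = 1                                    # pass 0 (run generation)
    while runs > 1:
        runs = math.ceil(runs / (n_buffers - 1))  # one merge pass
        passes += 1
    return passes, 2 * n_pages * passes           # each pass reads and writes every page

print(external_sort_cost(108, 5))   # (4, 864): 4 passes over the 108-page file
```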
21. Memory Adaptive External Sort
- In-memory sort phase
  - While there is more input and memory space:
    - Read data into a buffer
    - Sort the buffer
    - Check/adjust memory
- In-memory merge phase
  - If no more input and this is the first run:
    - Merge buffers to produce output and STOP
  - If no more memory or this is the last run:
    - Write the sorted run into a temp table
  - If there is more input:
    - Check/adjust memory
    - Go to the in-memory sort phase
- External merge phase
  - If there are multiple runs in the temp table:
    - Check/adjust memory
    - Merge them to produce the final results
  - For a multi-pass merge:
    - Check/adjust memory before each merge
(a hedged sketch of this outline follows below)
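The sketch below illustrates only the adaptation idea from this outline in Python: between batches, the sort asks a memory broker how many records it may buffer. The broker (grant_budget) is a hypothetical stand-in for the DBMS memory manager, runs are kept as in-memory lists rather than a temp table, and the external merge is done in a single pass.

```python
import heapq
import random

def grant_budget(current: int) -> int:
    """Hypothetical memory broker: returns how many records the sort may buffer
    right now (here it just fluctuates randomly to simulate memory demand)."""
    return max(2, current + random.choice([-2, 0, 2]))

def adaptive_external_sort(records):
    budget = grant_budget(8)
    runs, buffer = [], []
    for rec in records:
        buffer.append(rec)
        if len(buffer) >= budget:            # in-memory sort phase
            runs.append(sorted(buffer))
            buffer = []
            budget = grant_budget(budget)    # check/adjust memory between runs
    if buffer:
        runs.append(sorted(buffer))
    if len(runs) == 1:                       # first and only run: no external merge
        return runs[0]
    return list(heapq.merge(*runs))          # external merge phase (single pass)

print(adaptive_external_sort([7, 2, 8, 3, 4, 4, 6, 5, 9, 5, 4, 1, 7, 3, 9, 5]))
# the full input in sorted order
```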
22. In-memory Sort Phase
[Figure: input values are read into buffers and each buffer is sorted in place, e.g., 7 2 becomes 2 7 and 8 3 becomes 3 8]
23. In-memory Sort Phase
[Figure: all buffers now hold sorted pairs (2 7, 4 4, 5 6, 3 8); the remaining input stays on disk]
24. In-memory Merge Phase
[Figure: the sorted buffers are merged in memory into the run 2 3 4 4 5 6 7 8]
25. Hash-Join
- Partition both relations using hash function h: R tuples in partition i will only match S tuples in partition i.
- Read in a partition of R and hash it using a second hash function h2. Scan the matching partition of S, searching for matches.
(a sketch of this two-phase join follows below)
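A minimal in-memory Python sketch of the two phases (partition both inputs with h, then build and probe per partition); spilling partitions to disk and the separate hash function h2 are simplified away, with Python's built-in dict hashing standing in for h2.

```python
def hash_join(r, s, k=4):
    """Partition-based hash join on the first field of each tuple.
    Phase 1: partition both inputs with h. Phase 2: for each partition,
    build an in-memory hash table on the R side and probe it with S."""
    h = lambda key: hash(key) % k
    r_parts = [[] for _ in range(k)]
    s_parts = [[] for _ in range(k)]
    for t in r:
        r_parts[h(t[0])].append(t)
    for t in s:
        s_parts[h(t[0])].append(t)

    results = []
    for rp, sp in zip(r_parts, s_parts):
        table = {}                       # build: hash table on the R partition
        for t in rp:
            table.setdefault(t[0], []).append(t)
        for t in sp:                     # probe with the matching S partition
            for match in table.get(t[0], []):
                results.append((match, t))
    return results

print(hash_join([(1, "r1"), (2, "r2")], [(1, "s1"), (2, "s2"), (3, "s3")]))
# [((1, 'r1'), (1, 's1')), ((2, 'r2'), (2, 's2'))]
```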
26. Observations on Hash-Join
- # partitions k < B (why?), and B-1 > size of the largest partition to be held in memory. Assuming uniformly sized partitions, and maximizing k, we get:
  - k = B-1, and M/(B-1) < B-1, i.e., B-1 must be > √M
- If we build an in-memory hash table to speed up the matching of tuples, a little more memory is needed.
- In the partitioning phase, read and write both relations: 2(M+N) I/Os. In the matching phase, read both relations: M+N I/Os.
(see the small calculation below)
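A quick numerical check of these formulas; the page counts M and N are illustrative values, not from the slides.

```python
import math

M, N = 1000, 500        # pages of R and S (illustrative sizes)

partition_cost = 2 * (M + N)   # partitioning phase: read + write both relations
matching_cost = M + N          # matching phase: read both relations once more
print(partition_cost + matching_cost)   # 4500 I/Os in total, i.e., 3(M+N)

# B-1 > sqrt(M), so each of the B-1 partitions of R fits in the available buffers
print(math.ceil(math.sqrt(M)) + 1)      # roughly 33 buffer pages needed
```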
27. Memory Adaptive Hash Join
- Partitioning phase
  - Split R into k partitions (k << M) based on hash function h
    - Each partition has at least one page
    - Pages belonging to the same partition are chained up
    - If memory is not enough, flush out some partitions (keeping at least one page each)
  - Split S into k partitions using h
    - If a full partition of R is in memory, then S tuples can probe it immediately; otherwise they are just written to the output buffer
    - It is possible that some R partitions may still be flushed out at this time
    - Need at least k pages
- Joining phase
  - For the remaining partitions of R:
    - Read the R partition and build a hash table in memory
    - Read the corresponding S partition to probe it
    - If space is not enough, can further split into sub-partitions recursively, or use nested loops join (minimum requirement)
    - Can acquire more memory if available
(a simplified sketch of the partition-flushing idea follows below)
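The sketch below illustrates only the flushing policy of the partitioning phase, at tuple rather than page granularity; the memory budget, the choice of victim (the largest in-memory partition), and the in-memory data layout are simplifications for illustration, not the actual algorithm.

```python
from collections import defaultdict

def adaptive_partition(relation, k, memory_budget):
    """Partition tuples on their first field; when the number of buffered tuples
    exceeds the budget, flush most of the largest partition to 'disk',
    keeping at least one tuple (standing in for one page) per partition."""
    in_memory = defaultdict(list)
    on_disk = defaultdict(list)
    for t in relation:
        pid = hash(t[0]) % k
        in_memory[pid].append(t)
        if sum(len(p) for p in in_memory.values()) > memory_budget:
            victim = max(in_memory, key=lambda p: len(in_memory[p]))
            in_memory[victim], spilled = in_memory[victim][:1], in_memory[victim][1:]
            on_disk[victim].extend(spilled)
    return in_memory, on_disk

R = [(i % 5, f"r{i}") for i in range(12)]
mem, disk = adaptive_partition(R, k=4, memory_budget=6)
print(sum(len(p) for p in mem.values()), "tuples in memory,",
      sum(len(p) for p in disk.values()), "spilled")
```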
28. Query Optimization
- The query optimizer finds the best query evaluation plan
  - For single-table selection queries, it determines the best access plans
  - For two-way joins, it determines the join methods
  - For complex queries, it finds the cheapest plan
- What is the problem with the traditional optimizer?
  - Sometimes database optimizers choose query plans that are sub-optimal by orders of magnitude
    - Errors in estimation (statistics)
    - Assumptions in costing (independence of join attributes, uniform distribution)
29. Estimation Error
- select * from R, S where R.a = S.a and R.b > K1 and R.c > K2
[Figure: two candidate plans A and B; the actual selectivity of σ(R) differs greatly from the optimizer's estimate]
30. Single-Point Limitation
[Figure: plans A and B compared; the optimizer costs plans at a single point estimate]
31. Re-optimization Overview
- Compute the plan as normal
- Add check operators to the plan to:
  - check when the plan becomes sub-optimal
  - check for significant discrepancies between estimated and observed values
- Execute-and-react approach
  - Trigger re-optimization when checks fail
  - Else, proceed
32. Choosing Plans: Optimization vs. Re-Optimization
- The traditional optimizer chooses plan P
- The re-optimizer chooses the same plan P and adds checks
33. Re-optimization Limitations
- Re-optimization is expensive (could avoid it by using robust plans)
- May lose partial work
  - If we start on P1 and re-optimize to P2, we will repeat the scan on R
- P1 is risky! P2 is robust.
34. Re-optimization Limitations
- Limited information collected at run-time
  - Only detects when to re-optimize
  - E.g., detects that some selectivity is > 5%
- But future re-optimization needs more information
  - E.g., what is the true value of the selectivity?
  - May thrash without that knowledge
35. Proactive Re-Optimization in a Nutshell
- If the DBMS knows re-optimization can happen:
  - It should try to avoid it: it can (proactively) pick plans that are better than picking a plan and then (reactively) re-optimizing if something goes wrong.
  - It should plan for it: it can (proactively) collect statistics needed for future re-optimization steps for the same query.
36. Proactive Re-optimization Architecture
[Architecture figure: query, optimization, catalog, and execution components; step 2 is to use bounding boxes to pick robust or switchable plans]
37. Bounding Boxes: Representing Uncertainty
- The interval around an estimate is:
  - wide if the optimizer is uncertain about the estimate
  - narrow if the optimizer is certain about the estimate
- How is uncertainty measured?
  - From the way the statistic is estimated, e.g.:
    - Histogram → very certain
    - Multiplication of selectivities → uncertain
    - Default guess → very uncertain
    - Etc.
(a toy sketch of this mapping follows below)
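A toy Python sketch of this mapping; the bucket names mirror the slide, but the interval widths are illustrative assumptions, not values from any real optimizer.

```python
# Assumed relative interval widths per uncertainty bucket (illustrative only)
UNCERTAINTY_WIDTH = {
    "histogram": 0.1,                   # very certain -> narrow box
    "multiplied_selectivities": 0.5,    # uncertain -> wider box
    "default_guess": 1.0,               # very uncertain -> widest box
}

def bounding_box(estimate: float, source: str):
    """Return a (low, high) interval around a selectivity estimate,
    sized by how the estimate was obtained."""
    w = UNCERTAINTY_WIDTH[source]
    return max(0.0, estimate * (1 - w)), estimate * (1 + w)

print(bounding_box(0.05, "multiplied_selectivities"))   # approximately (0.025, 0.075)
```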
38. Bounding Boxes: Representing Uncertainty
[Figure: a 2-D bounding box around the estimated selectivities of σ(R) and S, with low/estimate/high values on each axis]
39. Bounding Boxes: Plan Costing and Pruning
- Costing: computes three costs per plan tree (a 2-dim bounding box using cardinality estimates from sub-plans)
- Pruning:
  - For each join subset and interesting order:
    - Finds 3 plans: BestLow, BestEst, and BestHigh
    - These are the plans with the lowest CLow, CEst, and CHigh, respectively
    - All others are pruned
(a pruning sketch follows after the figure)
[Figure: an HHJ plan costed at three points of the bounding box over σ(R) and S, yielding CLow, CEst, and CHigh]
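A compact sketch of the pruning rule, assuming each candidate plan already carries its three costs (the plan names and numbers are made up for the example):

```python
def prune(plans):
    """Keep only BestLow, BestEst, and BestHigh: the plans with the lowest
    cost at the low, estimated, and high corners of the bounding box."""
    keep = {
        min(plans, key=lambda p: p["c_low"])["name"],
        min(plans, key=lambda p: p["c_est"])["name"],
        min(plans, key=lambda p: p["c_high"])["name"],
    }
    return [p for p in plans if p["name"] in keep]

candidates = [
    {"name": "HHJ",  "c_low": 10, "c_est": 20, "c_high": 90},
    {"name": "INLJ", "c_low": 30, "c_est": 25, "c_high": 40},
    {"name": "SMJ",  "c_low": 50, "c_est": 60, "c_high": 70},
]
print([p["name"] for p in prune(candidates)])   # ['HHJ', 'INLJ']; SMJ is pruned
```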
40. Bounding Boxes: Optimization and Execution
- Bounding boxes define what the optimizer should plan for
- Goal of optimization:
  - Find a (set of) plan(s) that behaves well in the bounding box
- At run-time:
  - If the observed stats fall inside the bounding box, keep going
  - Else, re-optimize
41. Switchable Plans: Selecting Plans
- At the end of plan enumeration there are three seed plans, e.g., BestLow = P1, BestEst = P1, BestHigh = P2
- Four cases:
  - The seeds are the same plan
  - One of the seeds is robust
  - A switchable plan can be created from them
  - No single plan: not robust, not switchable
42. Optimal Plan
- 1 plan is optimal for all 3 points
- Choice is easy
43. Robust Plan
- 1 plan is, or is close to, optimal for all 3 points
- That plan can be safely chosen
44. Switchable Plan
- There is a plan with close-to-optimal cost at each point
- Additional requirements:
  - The decision can be deferred
  - Actual statistics must lie within the bounding box
  - It is possible to switch between the plans
45. Switchable Plans
- Switchable plan goals:
  - Pick a plan only after uncertainty is resolved
  - Do not lose or repeat work
- Plans are switchable if they:
  - Have a different root operator
  - Have the same sub-plan for the deep input of the root
  - Have the same base table as the other input
[Example figure: three switchable plans share the same sub-plan Hash1(Scan R, Scan S) as the deep input of the root; the root operators differ (Hash2 and Hash3 over a Scan of T, and an INLJ using an Index Seek on T)]
46. What if the Seeds are not Switchable?
- We may still be able to generate other seeds from those seeds such that they are robust and switchable
- Pick a seed, and derive a set of plans that are switchable with that seed
- If one of the resulting plans' cost is close to that of the other seeds, it can replace them as part of the switchable plans
47. Implementation of a Switchable Plan
[Figure: a switchable plan whose deep input is Hash1(Scan R, Scan S) and whose other input is T]
- Buffer tuples until a random sample of tuples is obtained
- Compute the estimate and pass it up to the switch operator
- The switch operator instantiates the correct root operator
- No work is lost or repeated
(a sketch of such a switch operator follows below)
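A hedged sketch of the buffering idea; the statistic, the plan constructors, and the rule for choosing between them are hypothetical simplifications, and a bounding-box violation is reduced to a message instead of real re-optimization.

```python
import itertools

def switch_operator(deep_input, sample_size, low, high, plan_if_low, plan_if_high):
    """Buffer the random-sample prefix of the deep input, estimate the statistic,
    pick the root operator, then replay the buffered tuples so no work is lost.
    Here the 'statistic' is simply the fraction of sampled tuples that match."""
    sample = list(itertools.islice(deep_input, sample_size))
    est = sum(1 for t in sample if t.get("match")) / max(1, len(sample))
    if not (low <= est <= high):
        print("outside bounding box: would trigger re-optimization")   # simplified
    root = plan_if_low if est <= (low + high) / 2 else plan_if_high
    return root(itertools.chain(sample, deep_input))                   # replay + rest

# Hypothetical root operators: each just tags the tuples it consumes
hash_join_root = lambda tuples: (("HJ", t) for t in tuples)
index_nlj_root = lambda tuples: (("INLJ", t) for t in tuples)

stream = iter([{"match": True}, {"match": False}] * 4)
out = switch_operator(stream, sample_size=3, low=0.2, high=0.8,
                      plan_if_low=index_nlj_root, plan_if_high=hash_join_root)
print(len(list(out)))   # all 8 tuples flow through the chosen root; none are lost
```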
48. Observing Statistics at Run-Time
- Goal 1: Detect when to re-optimize
  - Detect when we are outside the bounding box
- Goal 2: Must be cheap
  - To minimize the impact on execution
- Goal 3: Must be quick
  - To avoid spending time in the wrong plan
- Goal 4: Must be accurate
  - To avoid re-optimization thrashing
49. The Idea: Random Sample Prefix
- Prefix the output of operators with a random sample of the entire output
[Figure: normal output (a b c d e f g h i j) versus the same output with a random-sample prefix (h, a, e) emitted first]
- When the eos punctuation arrives, the system can:
  - Compute an estimate based on the random sample
  - Determine whether a bounding box violation happened
  - Use the revised estimate in future re-optimization
50. Implementing Random Sample Prefixes
- Assume random samples of the base relations
  - For each relation R, there is an R_sample
  - R_sample is a random sample of R
- Modified scan operator:
  - scan R_sample
  - emit eos
  - scan R, skipping tuples in R_sample
- Similarly for indexed scan
(a minimal sketch follows below)
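A minimal Python sketch of the modified scan operator; it assumes each tuple's first field is a unique row id so that sampled tuples can be skipped during the full scan.

```python
EOS = "eos"   # end-of-sample punctuation passed up the plan

def sampled_scan(r_sample, r):
    """Emit the pre-computed random sample of R first, then an eos punctuation,
    then scan R while skipping the tuples that were already in the sample."""
    sampled_rids = {t[0] for t in r_sample}
    yield from r_sample
    yield EOS
    for t in r:
        if t[0] not in sampled_rids:
            yield t

r = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
r_sample = [(3, "c"), (1, "a")]
print(list(sampled_scan(r_sample, r)))
# [(3, 'c'), (1, 'a'), 'eos', (2, 'b'), (4, 'd')]
```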
51. Implementing Random Sample Prefixes - Joins
- Modified BNLJ and INLJ operators:
  - Pass on the eos from the outer relation
  - Ignore the eos from the inner relation
  - This gives a true random sample of the join if the outer is the FK side
- Hash join:
[Figure: handling of eos punctuations in hash join. Read S into memory until the eos of S; read all of R, partition it, and probe the S sample; read the rest of S, partition it, probe R, and output tuples; then do the second pass as in a normal hash join]
52. Wrap-up Example: σ(A) ⋈ C ⋈ σ(O)
- Assume an error in the estimate of σ(A)!
53. Conclusion
- Runtime adaptation is needed to achieve improved performance
- Memory adaptive algorithms are needed
- Re-optimization has to be carefully exploited to avoid thrashing