Query Optimization - PowerPoint PPT Presentation

1 / 24

About This Presentation

Title:

Query Optimization

Description:

Big Cat Shere Khan 100 70 Big Cat Bageera 100 60 Bear Baloo 300 50 Big Cat Aslan 100 40 Operators Query Plan Query Result Query Query parser Query optimizer SELECT ... – PowerPoint PPT presentation

Number of Views:86

Avg rating:3.0/5.0

Slides: 25

Provided by: eben

Category:

more less

Transcript and Presenter's Notes

Title: Query Optimization

1
Query Optimization

RG, Chapter 15
Lecture 16

2
Administrivia

Homework 3 available today
Written exercise will be posted on class website
Due date Tuesday, March 20 by end of class
period
Homework 4 available later this week
Implement nested loops and hash join operators
for minibase
Due date April 5 (after Spring Break)
Midterm 2 is 3/22, 2 weeks from today
In class, covers lectures 10-17
Review will be held Tuesday 3/20 7-9 pm 306 Soda
Hall

3
Review
Now you are here

Query plans are a tree of operators that compute
the result of a query
Optimization is the process of picking the best
plan
Execution is the process of executing the plan

You were here

Operators are the building blocks for computing
results of queries
Sort
Project
Join
Filter
Access methods for files
...

4
Query Plans turning text into tuples
Query Result
Query
Shift
Name
SELECT A.aname, max(F.feedingshift) FROM Animals
A, Feeding F WHERE A.aid F.aid AND (A.species
'Big Cat' or A.species 'Bear') GROUP BY
A.aname HAVING COUNT() gt 1
Aslan 3
Bageera 3
Elsa 3
Shere Khan 3
Tigger 2
Operators
Query Plan
10 2 1 100 3
10 3 2 100 3
20 3 2 100 3
20 2 3 100 3
30 3 2 100 3

40 100 Aslan Big Cat
50 300 Baloo Bear
60 100 Bageera Big Cat
70 100 Shere Khan Big Cat
90 100 Dumbo Elephant

5
Operator Review

Access Path pulls tuples from tables
File scans
Index scans (clustered or unclustered)
Index-only scans
Select (or Filter) conditionally excludes tuples
Can be pushed/combined with Access Path
operator
Use indexes where possible and apply other
predicates on the result
Can also be applied at intermediate point in
query plan
Projection removes columns and duplicates
Column projection often done by operators
Duplicate elimination via Sort or Hash

6
Operator Review

Sort sorts tuples in a particular order
Simple merge sort
General external merge sort (with various
optimizations)
B tree traversal
Join combine tuples from 2 other operators
Page nested loops
Block nested loops
Index nested loops
Sort-merge join
Hash-join
Other operators for
Group By, Temping,

7
Query Optimization steps
SELECT A.aname, max(F.feedingshift) FROM Animals
A, Feeding F WHERE A.aid F.aid AND (A.species
'Big Cat' or A.species 'Bear') GROUP BY
A.aname HAVING COUNT() gt 1
Query parser
Block 3

Parse query from text to intermediate model
Traverse intermediate model and produce
alternate query plans
Query plan tree of relational operators
Optimizer keeps track of cost and properties of
plans
Pick the cheapest plan
Pass cheapest plan on to query execution engine
to execute and produce results of query

Block 2
Block 1
Query optimizer
Cost 200
Cost 500
Cost 150
To execution engine
8
Query Blocks Units of Optimization

Intermediate model is a set of query blocks
1 per SELECT/FROM/WHERE/GROUP BY/HAVING clause

SELECT A.aname, max(F.feedingshift) FROM Animals
A, Feeding F WHERE A.aid F.aid AND (A.species
'Big Cat' or A.species 'Bear') GROUP BY
A.aname HAVING COUNT() gt 1
Query Block

Subqueries produce nested query blocks
treated as calls to a subroutine, made once per
tuple produced by outer query block
sometimes subqueries can be rewritten to produce
cheaper plan

SELECT S.sname FROM Sailors S WHERE S.age IN
(SELECT MAX (S2.age) FROM Sailors
S2 GROUP BY S2.rating)
X
Outer Query Block
Rewritten Query Block
Nested Query Block
9
Query blocks are optimized 1 at a time

Convert block to relational algebra tree
Traverse tree and build plan bottom up
Pick best access method for each relation in FROM
clause
Applying predicates if possible
Consider all join trees
All ways to join relations in FROM clause 1-at-a
time
Consider multiple permutations and join methods
But not all! too many choices
Restrict to left-deep plans
Prune bad plans along the way

Query Block
10
Converting Query Blocks to Relational Algebra
Trees

SQL is relationally complete can express
everything in relational algebra

SELECT S.sname FROM Reserves R, Sailors S WHERE
R.sidS.sid AND R.bid100 AND S.ratinggt5
?(sname)?(bid100 ? rating gt 5) (Reserves ??
Sailors)
11
SQL extends Relational Algebra

SQL is more powerful than relational algebra
extend relational algebra to include aggregate
ops GROUP BY, HAVING
How is this query block expressed?
SELECT S.sname
FROM Sailors S
WHERE S.age IN (constant set from subquery)
And this query block?
SELECT MAX (S2.age)
FROM Sailors S2
GROUP BY S2.rating

?sname(s(age in set from subquery) Sailors)
s(age in set from subquery) Sailors
?Max(age)(GroupByRating(Sailors) )
GroupByRating(Sailors)
12
Why optimize?

Operators have implementation choices
Index scan? File scan? Nested loop join? Sort
merge?
Operators can also be applied in different order!

13
Motivating Example -- Schema used
Sailors (sid integer, sname string, rating
integer, age real) Reserves (sid integer, bid
integer, day dates, rname string)

As seen in previous lectures
Reserves
Each tuple is 40 bytes long, 100 tuples per
page, 1000 pages.
Assume there are 100 boats
Sailors
Each tuple is 50 bytes long, 80 tuples per page,
500 pages.
Assume there are 10 different ratings
Assume we have 5 pages in our buffer pool!

14
Motivating Example
SELECT S.sname FROM Reserves R, Sailors S WHERE
R.sidS.sid AND R.bid100 AND S.ratinggt5

Cost 5005001000 I/Os
Not the worst plan, but
Misses several opportunities
selections could have been pushed earlier,
indexes might have been helpful.
Goal of optimization To find more efficient
plans that compute the same answer.

Plan
1000
500
15
Selectivity calculation

Sailors 500 pages, 80 tuples per page, 10
ratings
Selectivity of S.rating gt 5?
½ -gt 50080/2 20,000 tuples
20,000/80 250 pages
Reserves 1000 pages, 100 tuples per page, 100
boats
Selectivity of R.bid 100?
1/100 -gt 1000100/100 1000 tuples
1000/100 10 pages

16
Alternative Plans Push Selects (No Indexes)
250
1000
500
1000
500
500,500 IOs
250,500 IOs
500 250 1000
17
Alternative Plans Push Selects (No Indexes)
250
10
250
1000
1000
500
250,500 IOs
500 250 1000
250,500 IOs
18
Alternative Plans Try different join order
10
swap
500
1000
6000 IOs
1000 10 500
250,500 IOs
19
Alternative Plans Push Selects and precompute
result (No Indexes)
10
250
10
250
500
1000
500
1000
6000 IOs
1000 500 250 (10 250)
4250 IOs
20
Alternative Plans Try different join order
10
250
10
250
swap
10
250
1000
500
500
1000
4250 IOs
500 1000 10 (250 10)
4010 IOs
21
Optimized query is 124x cheaper than the original!
4010 IOs
500,500 IOs
22
More Alternative Plans (No Indexes)

Main difference Sort Merge Join
With 5 buffers, cost of plan
Scan Reserves (1000) write temp T1 (10 pages,
if we have 100 boats,
uniform distribution) 1010.
Scan Sailors (500) write temp T2 (250 pages, if
have 10 ratings) 750.
Sort T1 (2210) sort T2 (24250) merge
(10250) 2300
Total 4060 page I/Os.
If use BNL join, join 104250, total cost
2770.
Can also push projections, but must be careful!
T1 has only sid, T2 only sid, sname
T1 fits in 3 pgs, cost of BNL under 250 pgs,
total lt 2000.