Temple University presentation

About This Presentation

Transcript and Presenter's Notes

Title: Temple University

1
Temple University CIS Dept. CIS661 Principles
of Data Management

V. Megalooikonomou
Query Optimization
(based on slides by C. Faloutsos at CMU)

2
General Overview - rel. model

Relational model - SQL
Functional Dependencies Normalization
Physical Design
Indexing
Query optimization
Transaction processing

3
Overview of a DBMS
casual user
Naïve user
DBA
DML parser
DDL parser
DML precomp.
trans. mgr
buffer mgr
4
Overview - detailed

Why q-opt?
Equivalence of expressions
Cost estimation
Cost of indices
Join strategies

5
Why Q-opt?

SQL declarative
good q-opt -gt big difference
eg., seq. Scan vs B-tree index, on P1,000 pages

6
Q-opt steps

bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alternative plans
estimate cost pick best

7
Q-opt - example
Canonical form
p
select name from STUDENT, TAKES where
c-idCIS661 and STUDENT.ssnTAKES.ssn
s
STUDENT
TAKES
8
Q-opt - example
Hash join merge join nested loops
Index seq. scan
9
Overview - detailed

Why q-opt?
Equivalence of expressions
Cost estimation
Cost of indices
Join strategies

10
Equivalence of expressions

A.k.a. syntactic q-opt
In short perform selections and projections
early
More details
see transformation rules in text

11
Equivalence of expressions

Q How to prove a transformation rule?
A use TRC, to show that LHS RHS, e.g.

12
Equivalence of expressions
13
Equivalence of expressions
14
Equivalence of expressions

Q how to disprove a rule??

15
Equivalence of expressions

Selections
perform them early
break a complex predicate, and push
simplify a complex predicate
(XY and Y3) -gt X3 and Y3

16
Equivalence of expressions

Projections
perform them early (but carefully)
Smaller tuples
Fewer tuples (if duplicates are eliminated)
project out all attributes except the ones
requested or required (e.g., joining attr.)

17
Equivalence of expressions

Joins
Commutative , associative
Q n-way join - how many diff. orderings?
Exhaustive enumeration too slow

18
Q-opt steps

bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
estimate cost pick best

19
Cost estimation

Eg., find ssns of students with an A in CIS661
(using seq. scanning)
How long will a query take?
CPU (but small cost decreasing tough to
estimate)
Disk (mainly, block transfers)
How many tuples will qualify?
(what statistics do we need to keep?)

20
Cost estimation

Statistics for each relation r we keep
nr tuples
Sr size of tuple in bytes

21
Cost estimation
Sr

Statistics for each relation r we keep
V(A,r) number of distinct values of attr. A
(recently, histograms, too)

1
2
3

nr
22
Derivable statistics

fr blocking factor max records/block (??
)
br blocks (?? )
SC(A,r) selection cardinality avg of records
with Agiven (?? )

23
Derivable statistics

fr blocking factor max records/block ( B/Sr
B block size in bytes)
br blocks ( nr / fr )

24
Derivable statistics

SC(A,r) selection cardinality avg of records
with Agiven ( nr / V(A,r) ) (assumes
uniformity...) eg 30,000 students, 10 colleges
how many students in CST?

25
Additional quantities we need

For index i
fi average fanout - degree (50-100)
HTi levels of index i (2-3)
log(entries)/log(fi)
LBi blocks at leaf level

HTi
26
Statistics

Where do we store them?
How often do we update them?

27
Q-opt steps

bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
selections sorting projections
joins
estimate cost pick best

28
Cost estimation plan generation

Selections eg.,
select
from TAKES
where grade A
Plans?

29
Cost estimation plan generation

Plans?
seq. scan
binary search
(if sorted consecutive)
index search
if an index exists

30
Cost estimation plan generation

seq. scan cost?
br (worst case)
br/2 (average, if we search for primary key)

31
Cost estimation plan generation

binary search cost?
if sorted and consecutive
log(br)
SC(A,r)/fr (blocks spanned by qualified tuples)
-1

32
Cost estimation plan generation

estimation of selection cardinalities SC(A,r)
non-trivial details later

33
Cost estimation plan generation

method3 index cost?
levels of index
blocks w/ qual. tuples

...
case1 primary key case2 sec. key clustering
index case3 sec. key non-clust. index
34
Cost estimation plan generation

method3 index cost?
levels of index
blocks w/ qual. tuples

..
case1 primary key cost HTi 1
35
Cost estimation plan generation
Sr

method3 index - cost?
levels of index
blocks w/ qual. tuples

1
fr
2
case2 sec. key clustering index OR prim.
index on non-key retrieve multiple records HTi
SC(A,r)/fr

br
36
Cost estimation plan generation

method3 index cost?
levels of index
blocks w/ qual. tuples

...
case3 sec. key non-clust. index HTi
SC(A,r) (actually, pessimistic...)
37
Cost estimation arithmetic examples

find accounts with branch-name Perryridge
account(branch-name, balance, ...)

38
Arithm. examples contd

n-account 10,000 tuples
f-account 20 tuples/block
V(balance, account) 500 distinct values
V(branch-name, account) 50 distinct values
for branch-index fanout fi 20

39
Arithm. examples

Q1 cost of seq. scan?
A1 500 disk accesses
Q2 assume a clustering index on branch-name
cost?

40
Cost estimation plan generation
Sr

method3 index cost?
levels of index
blocks w/ qual. tuples

1
fr
2
case2 sec. key clustering index HTi
SC(A,r)/fr

br
41
Arithm. examples

A2
HTi
SC(branch-name, account)/f-account
HTi 50 values, with index fanout 20 -gt HT2
levels (log(50)/log(20) 1)
SC(..) qualified records
nr/V(A,r) 10,000/50 200 tuples
SC/f spanning 200/20 blocks 10 blocks

42
Arithm. examples

A2 final answer 210 12 block accesses
(vs. 500 block accesses of seq. scan)
footnote in all fairness
seq. disk accesses 2msec or less
random disk accesses 10msec

43
Q-opt steps

bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alternative plans
selections sorting projections
joins
estimate cost pick best

44
Examples

(selection) find student record with ssn123

45
Reminder our Mini-U db
46
DML - nested subqueries

Drill find the ssn of the student with the
highest GPA

47
File organization

Eg., Student records how would you store them
on disk?

48
General Overview - rel. model

Relational model - SQL
Functional Dependencies Normalization
Physical Design
Indexing
Query optimization
Transaction processing

49
Q-opt steps

bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
selections (simple complex predicates)
sorting projections
joins
estimate cost pick best

50
Reminder statistics

for each relation r we keep
nr tuples
Sr size of tuple in bytes
V(A,r) number of distinct values of attr. A
fr blocking factor
br number of blocks
SC(A,r) selection cardinality (avg. of records
with Agiven)

51
Selections

we saw simple predicates (Aconstant eg.,
nameSmith)
how about more complex predicates, like
salary gt 10K
age 30 and job-codeanalyst
what is their selectivity?

52
Selections complex predicates

selectivity sel(P) of predicate P
fraction of tuples that qualify
sel(P) SC(P) nr

53
Selections complex predicates

eg., assume that V(grade, TAKES)5 distinct
values
simple predicate P Aconstant
sel(Aconstant) 1/V(A,r)
eg., sel(gradeB) 1/5
(what if V(A,r) is unknown??)

54
Selections complex predicates

range query sel( grade gt C)
sel(Agta) (Amax a) / (Amax Amin)

55
Selections - complex predicates

negation sel( grade ! C)
sel( not P) 1 sel(P)
(Observation selectivity probability)

P
56
Selections complex predicates

conjunction
sel( grade C and course CIS661)
sel(P1 and P2) sel(P1) sel(P2)
INDEPENDENCE ASSUMPTION

57
Selections complex predicates

disjunction
sel( grade C or course CIS661)
sel(P1 or P2) sel(P1) sel(P2) sel(P1 and
P2)
sel(P1) sel(P2) sel(P1)sel(P2)
INDEPENDENCE ASSUMPTION, again

58
Selections complex predicates

disjunction in general
sel(P1 or P2 or Pn)
1 - (1- sel(P1) ) (1 - sel(P2) ) (1 -
sel(Pn))

59
Selections summary

sel(Aconstant) 1/V(A,r)
sel( Agta) (Amax a) / (Amax Amin)
sel(not P) 1 sel(P)
sel(P1 and P2) sel(P1) sel(P2)
sel(P1 or P2) sel(P1) sel(P2)
sel(P1)sel(P2)
UNIFORMITY and INDEPENDENCE ASSUMPTIONS

60
Q-opt steps

bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
selections (simple complex predicates)
sorting projections
joins
estimate cost pick best

61
Sorting

Assume br blocks of rel. r, and
only M (ltbr) buffers in main memory
Q1 how to sort (external sorting)?
Q2 cost?

62
Sorting

Q1 how to sort (external sorting)?
A1
create sorted runs of size M
merge

63
Sorting

create sorted runs of size M (how many?)
merge them (how?)

M
...
...
64
Sorting

create sorted runs of size M
merge first M-1 runs into a sorted run of
(M-1) M, ...

M
..
...
...
65
Sorting

How many steps we need to do?
i, where M(M-1)i gt br
How many reads/writes per step? brbr
(each step reads every block once and writes it
once)

M
..
...
...
66
Sorting

In short, excluding the final write, we need
ceil(log(br/M) / log(M-1)) 2 br br

M
..
...
...
67
Q-opt steps

bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
selections (simple complex predicates)
sorting projections, aggregations
joins
estimate cost pick best

68
Projection - dupl. elimination

eg.,
select distinct c-id
from TAKES
How?
Pros and cons?

69
Set operations

eg.,
select from REGULAR-STUDENT
union
select from SPECIAL-STUDENT
How?
Pros and cons?

70
Aggregations

eg.,
select ssn, avg(grade)
from TAKES
How?

71
Q-opt steps

bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
selections sorting projections, aggregations
joins
2-way joins
n-way joins
estimate cost pick best

72
2-way joins

output size estimation r JOIN s
nr, ns tuples each
case1 cartesian product (R, S have no common
attribute)
of output tuples??

73
2-way joins

output size estimation r JOIN s
case2 r(A,B), s(A,C,D), A is cand. key for r
of output tuples??

ltns
r(A, ...)
s(A, ......)
nr
ns
74
2-way joins

output size estimation r JOIN s
case3 r(A,B), s(A,C,D), A is cand. key for
neither (is it possible??)
of output tuples??

r(A, ...)
s(A, ......)
nr
ns
75
2-way joins

of output tuples
nr ns/V(A,s) or ns nr/V(A,r)
(whichever is less)

76
Q-opt steps

bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
selections sorting projections, aggregations
joins
2-way joins - output size estimation algorithms
n-way joins
estimate cost pick best

77
2-way joins

algorithm(s) for r JOIN s?
nr, ns tuples each

78
2-way joins

Algorithm 0 (naive) nested loop (SLOW!)
for each tuple tr of r
for each tuple ts of s
print, if they match

79
2-way joins

Algorithm 0 why is it bad?
how many disk accesses (br and bs are the
number of blocks for r and s)?

nrbs br
80
2-way joins

Algorithm 1 Blocked nested-loop join
read in a block of r
read in a block of s
print matching tuples

cost br br bs
r(A, ...)
s(A, ......)
nr, br
ns records, bs blocks
81
2-way joins

Arithmetic example
nr 10,000 tuples, br 1,000 blocks
ns 1,000 tuples, bs 200 blocks

alg0 2,001,000 d.a. alg1 201,000 d.a.
r(A, ...)
s(A, ......)
10,000 1,000
1,000 records, 200 blocks
82
2-way joins

Observation1 Algo1 asymmetric
cost br br bs - reverse roles
cost bs bsbr
Best choice?

smallest relation in outer loop
r(A, ...)
s(A, ......)
nr, br
ns records, bs blocks
83
2-way joins

Observation2 NOT IN BOOK
what if we have k buffers available?

read in k-1 blocks of r read in a block
of s print matching tuples
r(A, ...)
s(A, ......)
nr, br
ns records, bs blocks
84
2-way joins

Cost?

br br/(k-1) bs
read in k-1 blocks of r read in a block
of s print matching tuples
what if brk-1? what if we assign k-1 blocks to
inner?)
r(A, ...)
s(A, ......)
nr, br
ns records, bs blocks
85
2-way joins

Observation3 can we get rid of the br term?
cost br br bs

A read the inner relation backwards half of the
times! Q cons?
r(A, ...)
s(A, ......)
nr, br
ns records, bs blocks
86
2-way joins

Other algorithm(s) for r JOIN s?
nr, ns tuples each

87
2-way joins - other algos

sort-merge
sort r sort s merge sorted versions
(good, if one or both are already sorted)

88
2-way joins - other algos

sort-merge - cost
2 br log(br) 2 bs log(bs) br bs
needs temporary space (for sorted versions)
gives output in sorted order

89
2-way joins - other algos

use an existing index, or even build one on the
fly
cost br nr c (c look-up cost)

r(A, ...)
nr
90
2-way joins - other algos

hash join
hash r into (0, 1, ..., max) buckets
hash s into buckets (same hash function)
join each pair of matching buckets

r(A, ...)
0
1
max
91
2-way joins - hash join details

how to join each pair of partitions Hr-i, Hs-i ?
A build another hash table for Hs-i, and probe
(look up) it with each tuple of Hr-i

Hr-0
Hs-0
r(A, ...)
0
1
max
92
2-way joins - hash join details

what if Hs-i is too large to fit in main-memory?
A recursive partitioning
more details (overflows, hybrid hash joins) in
book
cost of hash join? (under certain assumptions)
2(br bs) (br bs) 4 max

93
Q-opt steps

bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
selections sorting projections, aggregations
joins
2-way joins - output size estimation algorithms
n-way joins
estimate cost pick best

94
n-way joins

r1 JOIN r2 JOIN ... JOIN rn
typically, break problem into 2-way joins

95
Structure of query optimizers

System R
break query in query blocks
simple queries (ie., no joins) look at stats
n-way joins left-deep join trees ie., only one
intermediate result at a time
pros smaller search space pipelining
cons may miss optimal
2-way joins nested-loop and sort-merge

96
Structure of query optimizers

More heuristics by Oracle, Sybase and Starburst
(-gt DB2) in book
In general q-opt is very important for large
databases.
(explain select ltsql-statementgt gives plan)

97
Q-opt steps

bring query in internal form (eg., parse tree)
into canonical form (syntactic q-opt)
generate alt. plans
selections (simple complex predicates)
sorting projections
joins
estimate cost pick best

Write a Comment

User Comments (0)

About PowerShow.com

Temple University PowerPoint PPT Presentation