Query Execution Ch'15 - PowerPoint PPT Presentation

1 / 58

About This Presentation

Title:

Query Execution Ch'15

Description:

http://www.quiz-zone.co.uk/quizrounds/020317trueorfalse/questions.html ... Outer loop runs B(S)/(M-1) times, and each time need to read R: costs B(S)B(R)/(M-1) ... – PowerPoint PPT presentation

Number of Views:25

Avg rating:3.0/5.0

Slides: 59

Provided by: lawrence55

Category:

more less

Transcript and Presenter's Notes

Title: Query Execution Ch'15

1
Query ExecutionCh.15

CS411

1
2
P.Q.
3
True Or False?

An index on a file speeds up selections on the
search key field(s) ?

3
4
Which ?

Search key any subset of the fields of a
relation?
Search key a subset of is Relation's Primary
key(minimal set of fields that uniquely identify
a record in a relation)?

5
(Review) Bad hash functions?

int h(chars) return 0
int h(chars) return rnd()
int h(chars) return s0s1...
int
h(chars) return secure_cipher_key(s)

6
(Review) Bad hash functions?

int h(chars) return 0 SINGLE BUCKET
int h(chars) return rnd() CANT RE-FIND
int h(chars) return s0s1... BETTER but
poor distribution
int
h(chars) return encrypt_hash(s) GOOD
distribution but TOO EXPENSIVE

7
Linear or Extensible Hash or both?

In memory array size must be 2i
Does not use overflow blocks?
Uses a msb bit-flip trick?
Uses a subset of generated hash bit code?
Array entries refer to disk blocks
Splits bucket where new record is inserted?

8
Linear or Extensible Hash or both?

In memory array size must be 2i Extensible
Does not use overflow blocks? Extensible
Uses a msb bit-flip trick? Linear
Uses a subset of hash bits? Both
Array entries refer to disk blocks Both
Splits bucket where new record is inserted? Ext.

Click to add an outline

10
T/F

Most Eskimoes have fridges?
Thomas Edison, who invented the lightbulb, was
afraid of the dark?
King Charles I was executed 1649 after the
English Civil War led by Oliver Cromwell?
http//www.quiz-zone.co.uk/quizrounds/020317trueor
false/questions.html

11
All True

Most Eskimoes have fridges?
Thomas Edison, who invented the lightbulb, was
afraid of the dark?
King Charles I was executed 1649 after the
English Civil War led by Oliver Cromwell?
http//www.quiz-zone.co.uk/quizrounds/020317trueor
false/questions.html
http//www.historylearningsite.co.uk/CharlesI_exec
ution.htm

12
Query Execution
12
13
Outline (Formal)

Logical/physical operators
Cost parameters and sorting
One-pass algorithms
Nested-loop joins
Two-pass algorithms

13
14
Outline (InFormal)

Logical/physical operators
Which streets vs. postal delivery route
Cost parameters and sorting
"How long?!" "I only apples from small to large"
One-pass algorithms
"Hello Tuple... Goodbye tuple!"
Nested-loop joins
"Everybody plays everybody else"
Two-pass algorithms
"Bite of chunks at a time" When ONCE is not
enough

14
15
Query Execution
Query or update
User/ Application
Query compiler
Query execution plan
Execution engine
Record, index requests
Index/record mgr.
Page commands
Buffer manager
Read/write pages
Storage manager
storage
15
16
Logical v.s. Physical Operators

Logical operators
what they do
e.g., union, selection, project, join, grouping
Physical operators
how they do it
e.g., nested loop join, sort-merge join, hash
join, index join

16
17
Query Execution Plans
buyer
?
Cityurbana
phonegt5430000

Query Plan
logical tree
implementation choice at every node
scheduling of operations.

Buyername
(Simple Nested Loops)
Person
Purchase
(Table scan)
(Index scan)
Some operators are from relational algebra, and
others (e.g., scan, group) are not.
17
18
How do We Combine Operations?

The iterator model. Each operation is implemented
by 3 functions
Open sets up the data structures and performs
initializations
GetNext returns the the next tuple of the
result.
Close ends the operations. Cleans up the data
structures.

18
19
How do We Combine Operations?

The iterator model. Enables pipelining!
Contrast with data-driven materialize model.
"pipelined evaluation" vs. "materialized
evaluation"

19
20
Cost Parameters

Cost parameters
M number of blocks that fit in main memory
B(R) number of blocks holding R
T(R) number of tuples in R
V(R,a) number of distinct values of the
attribute a
Estimating the cost
Important in optimization (next lecture)
Compute I/O cost only
We compute the cost to read the tables
We dont compute the cost to write the result
(because pipelining)

20
21
Example Sorting cost?

Two pass multi-way merge sort
Step 1
Read M blocks at a time, sort, write
Result have runs of length M on disk
Step 2
Merge M-1 at a time, write to disk
Result have runs of length M(M-1) M2
Cost nnnn B(R) ?

21
22
Example Sorting cost?

Two pass multi-way merge sort
Step 1
Read M blocks at a time, sort, write
Result have runs of length M on disk
Step 2
Merge M-1 at a time, write to disk
Result have runs of length M(M-1)?M2
Cost 3B(R), Assumption B(R) ? M2

22
23
SCAN

Read entire contents of a table
(Review Section 15.1.1-5 p.716)

24
SCAN Scanning Tables

-) The table is clustered (i.e. blocks consists
only of records from this table)
Table-scan if we know where the blocks are
Index-scan if we have a sparse index to find the
blocks
-(
The table is unclustered (e.g. its records are
placed on blocks with other tables)
May need one read for each record

24
25
Cost of the Scan Operator

Clustered relation
Table scan B(R)
Index scan B(R)
Unclustered relation
T(R)
what if we want to sort as well?

25
26
Cost of the Scan Operator

Clustered relation
Table scan B(R) to sort 3B(R)
Index scan B(R) to sort B(R) or 3B(R)
Unclustered relation
T(R) to sort T(R) 2B(R)

26
27
One pass algorithm
27
28
One pass algorithmexample operators?
28
29
One-pass Algorithms

Selection ?(R), projection ?(R)
Both are tuple-at-a-Time algorithms
Cost B(R)

29
30
One-pass Algorithms

Duplicate elimination ?(R)
Need to keep a dictionary in memory
balanced search tree
hash table
etc
Cost B(R)
Assumption B(?(R)) lt M

30
31
One-pass Algorithms

Grouping ?city, sum(price) (R)
Need to keep a dictionary in memory
Also store the sum(price) for each city
Cost B(R)
Assumption number of cities fits in memory

31
32
One-pass Algorithms

Binary operations R n S, R U S, R S
Assumption min(B(R), B(S)) lt M
Scan one table first, then the next, eliminate
duplicates
Cost B(R)B(S)

32
33
Nested loop join (15.3)
33
34
Nested Loop Joins

Tuple-based nested loop R S
for each tuple r in R do
for each tuple s in S do
if r and s join then output (r,s)
Cost T(R) T(S), sometimes T(R) B(S)

34
35
Nested Loop Joins

Block-based Nested Loop Join
for each (M-1) blocks bs of S do
for each block br of R do
for each tuple s in bs do
for each tuple r in br do
if r and s join then
output(r,s)

35
36
Nested Loop Joins
R S
Join Result
Hash table for block of S (k lt B-1 pages)
. . .
. . .
Input buffer for R
Output buffer
36
37
Nested Loop Joins

Block-based Nested Loop Join
Cost
Read S once cost B(S)
Outer loop runs B(S)/(M-1) times, and each time
need to read R costs B(S)B(R)/(M-1)
Total cost B(S) B(S)B(R)/(M-1)
Notice it is better to iterate over the smaller
relation first

37
38
Two pass algorithm
38
39
Two-Pass Algorithms Based on Sorting

Duplicate elimination ?(R)
Simple idea sort first, then eliminate
duplicates
Step 1 sort runs of size M, write
Cost 2B(R)
Step 2 merge M-1 runs, but include each tuple
only once
Cost B(R)
Total cost 3B(R), Assumption B(R) lt M2

39
40
Two-Pass Algorithms Based on Sorting

Grouping ?city, sum(price) (R)
Same as before sort, then compute the sum(price)
for each group
As before compute sum(price) during the merge
phase.
Total cost 3B(R)
Assumption B(R) lt M2

40
41
Two-Pass Algorithms Based on Sorting

Binary operations R n S, R U S, R S
Idea sort R, sort S, then do the right thing
A closer look
Step 1 split R into runs of size M, then split S
into runs of size M. Cost 2B(R) 2B(S)
Step 2 merge M/2 runs from R merge M/2 runs
from S ouput a tuple on a case by cases basis
Total cost 3B(R)3B(S)
Assumption B(R)B(S)lt M2

41
42
Two-Pass Algorithms Based on Sorting

Join R S
Start by sorting both R and S on the join
attribute
Cost 4B(R)4B(S) (because need to write to
disk)
Read both relations in sorted order, match tuples
Cost B(R)B(S)
Difficulty many tuples in R may match many in S
If at least one set of tuples fits in M, we are
OK
Otherwise need nested loop, higher cost
Total cost 5B(R)5B(S)
Assumption B(R) lt M2, B(S) lt M2

42
43
Two-Pass Algorithms Based on Sorting

Join R S
If the number of tuples in R matching those in S
is small (or vice versa) we can compute the join
during the merge phase
Total cost 3B(R)3B(S)
Assumption B(R) B(S) lt M2

43
44
Two Pass Algorithms Based on Hashing