Query Execution Ch'15 - PowerPoint PPT Presentation

1 / 58
About This Presentation
Title:

Query Execution Ch'15

Description:

http://www.quiz-zone.co.uk/quizrounds/020317trueorfalse/questions.html ... Outer loop runs B(S)/(M-1) times, and each time need to read R: costs B(S)B(R)/(M-1) ... – PowerPoint PPT presentation

Number of Views:25
Avg rating:3.0/5.0
Slides: 59
Provided by: lawrence55
Category:

less

Transcript and Presenter's Notes

Title: Query Execution Ch'15


1
Query ExecutionCh.15
  • CS411

1
2
P.Q.
3
True Or False?
  • An index on a file speeds up selections on the
    search key field(s) ?

3
4
Which ?
  • Search key any subset of the fields of a
    relation?
  • Search key a subset of is Relation's Primary
    key(minimal set of fields that uniquely identify
    a record in a relation)?

5
(Review) Bad hash functions?
  • int h(chars) return 0
  • int h(chars) return rnd()
  • int h(chars) return s0s1...
  • int
  • h(chars) return secure_cipher_key(s)

6
(Review) Bad hash functions?
  • int h(chars) return 0 SINGLE BUCKET
  • int h(chars) return rnd() CANT RE-FIND
  • int h(chars) return s0s1... BETTER but
    poor distribution
  • int
  • h(chars) return encrypt_hash(s) GOOD
    distribution but TOO EXPENSIVE

7
Linear or Extensible Hash or both?
  • In memory array size must be 2i
  • Does not use overflow blocks?
  • Uses a msb bit-flip trick?
  • Uses a subset of generated hash bit code?
  • Array entries refer to disk blocks
  • Splits bucket where new record is inserted?

8
Linear or Extensible Hash or both?
  • In memory array size must be 2i Extensible
  • Does not use overflow blocks? Extensible
  • Uses a msb bit-flip trick? Linear
  • Uses a subset of hash bits? Both
  • Array entries refer to disk blocks Both
  • Splits bucket where new record is inserted? Ext.

9
  • Click to add an outline

10
T/F
  • Most Eskimoes have fridges?
  • Thomas Edison, who invented the lightbulb, was
    afraid of the dark?
  • King Charles I was executed 1649 after the
    English Civil War led by Oliver Cromwell?
  • http//www.quiz-zone.co.uk/quizrounds/020317trueor
    false/questions.html

11
All True
  • Most Eskimoes have fridges?
  • Thomas Edison, who invented the lightbulb, was
    afraid of the dark?
  • King Charles I was executed 1649 after the
    English Civil War led by Oliver Cromwell?
  • http//www.quiz-zone.co.uk/quizrounds/020317trueor
    false/questions.html
  • http//www.historylearningsite.co.uk/CharlesI_exec
    ution.htm

12
Query Execution
12
13
Outline (Formal)
  • Logical/physical operators
  • Cost parameters and sorting
  • One-pass algorithms
  • Nested-loop joins
  • Two-pass algorithms

13
14
Outline (InFormal)
  • Logical/physical operators
  • Which streets vs. postal delivery route
  • Cost parameters and sorting
  • "How long?!" "I only apples from small to large"
  • One-pass algorithms
  • "Hello Tuple... Goodbye tuple!"
  • Nested-loop joins
  • "Everybody plays everybody else"
  • Two-pass algorithms
  • "Bite of chunks at a time" When ONCE is not
    enough

14
15
Query Execution
Query or update
User/ Application
Query compiler
Query execution plan
Execution engine
Record, index requests
Index/record mgr.
Page commands
Buffer manager
Read/write pages
Storage manager
storage
15
16
Logical v.s. Physical Operators
  • Logical operators
  • what they do
  • e.g., union, selection, project, join, grouping
  • Physical operators
  • how they do it
  • e.g., nested loop join, sort-merge join, hash
    join, index join

16
17
Query Execution Plans
buyer
?
Cityurbana
phonegt5430000
  • Query Plan
  • logical tree
  • implementation choice at every node
  • scheduling of operations.

Buyername
(Simple Nested Loops)
Person
Purchase
(Table scan)
(Index scan)
Some operators are from relational algebra, and
others (e.g., scan, group) are not.
17
18
How do We Combine Operations?
  • The iterator model. Each operation is implemented
    by 3 functions
  • Open sets up the data structures and performs
    initializations
  • GetNext returns the the next tuple of the
    result.
  • Close ends the operations. Cleans up the data
    structures.

18
19
How do We Combine Operations?
  • The iterator model. Enables pipelining!
  • Contrast with data-driven materialize model.
  • "pipelined evaluation" vs. "materialized
    evaluation"

19
20
Cost Parameters
  • Cost parameters
  • M number of blocks that fit in main memory
  • B(R) number of blocks holding R
  • T(R) number of tuples in R
  • V(R,a) number of distinct values of the
    attribute a
  • Estimating the cost
  • Important in optimization (next lecture)
  • Compute I/O cost only
  • We compute the cost to read the tables
  • We dont compute the cost to write the result
    (because pipelining)

20
21
Example Sorting cost?
  • Two pass multi-way merge sort
  • Step 1
  • Read M blocks at a time, sort, write
  • Result have runs of length M on disk
  • Step 2
  • Merge M-1 at a time, write to disk
  • Result have runs of length M(M-1) M2
  • Cost nnnn B(R) ?

21
22
Example Sorting cost?
  • Two pass multi-way merge sort
  • Step 1
  • Read M blocks at a time, sort, write
  • Result have runs of length M on disk
  • Step 2
  • Merge M-1 at a time, write to disk
  • Result have runs of length M(M-1)?M2
  • Cost 3B(R), Assumption B(R) ? M2

22
23
SCAN
  • Read entire contents of a table
  • (Review Section 15.1.1-5 p.716)

24
SCAN Scanning Tables
  • -) The table is clustered (i.e. blocks consists
    only of records from this table)
  • Table-scan if we know where the blocks are
  • Index-scan if we have a sparse index to find the
    blocks
  • -(
  • The table is unclustered (e.g. its records are
    placed on blocks with other tables)
  • May need one read for each record

24
25
Cost of the Scan Operator
  • Clustered relation
  • Table scan B(R)
  • Index scan B(R)
  • Unclustered relation
  • T(R)
  • what if we want to sort as well?

25
26
Cost of the Scan Operator
  • Clustered relation
  • Table scan B(R) to sort 3B(R)
  • Index scan B(R) to sort B(R) or 3B(R)
  • Unclustered relation
  • T(R) to sort T(R) 2B(R)

26
27
One pass algorithm
27
28
One pass algorithmexample operators?
28
29
One-pass Algorithms
  • Selection ?(R), projection ?(R)
  • Both are tuple-at-a-Time algorithms
  • Cost B(R)

29
30
One-pass Algorithms
  • Duplicate elimination ?(R)
  • Need to keep a dictionary in memory
  • balanced search tree
  • hash table
  • etc
  • Cost B(R)
  • Assumption B(?(R)) lt M

30
31
One-pass Algorithms
  • Grouping ?city, sum(price) (R)
  • Need to keep a dictionary in memory
  • Also store the sum(price) for each city
  • Cost B(R)
  • Assumption number of cities fits in memory

31
32
One-pass Algorithms
  • Binary operations R n S, R U S, R S
  • Assumption min(B(R), B(S)) lt M
  • Scan one table first, then the next, eliminate
    duplicates
  • Cost B(R)B(S)

32
33
Nested loop join (15.3)
33
34
Nested Loop Joins
  • Tuple-based nested loop R S
  • for each tuple r in R do
  • for each tuple s in S do
  • if r and s join then output (r,s)
  • Cost T(R) T(S), sometimes T(R) B(S)

34
35
Nested Loop Joins
  • Block-based Nested Loop Join
  • for each (M-1) blocks bs of S do
  • for each block br of R do
  • for each tuple s in bs do
  • for each tuple r in br do
  • if r and s join then
    output(r,s)

35
36
Nested Loop Joins
R S
Join Result
Hash table for block of S (k lt B-1 pages)
. . .
. . .
Input buffer for R
Output buffer
36
37
Nested Loop Joins
  • Block-based Nested Loop Join
  • Cost
  • Read S once cost B(S)
  • Outer loop runs B(S)/(M-1) times, and each time
    need to read R costs B(S)B(R)/(M-1)
  • Total cost B(S) B(S)B(R)/(M-1)
  • Notice it is better to iterate over the smaller
    relation first

37
38
Two pass algorithm
38
39
Two-Pass Algorithms Based on Sorting
  • Duplicate elimination ?(R)
  • Simple idea sort first, then eliminate
    duplicates
  • Step 1 sort runs of size M, write
  • Cost 2B(R)
  • Step 2 merge M-1 runs, but include each tuple
    only once
  • Cost B(R)
  • Total cost 3B(R), Assumption B(R) lt M2

39
40
Two-Pass Algorithms Based on Sorting
  • Grouping ?city, sum(price) (R)
  • Same as before sort, then compute the sum(price)
    for each group
  • As before compute sum(price) during the merge
    phase.
  • Total cost 3B(R)
  • Assumption B(R) lt M2

40
41
Two-Pass Algorithms Based on Sorting
  • Binary operations R n S, R U S, R S
  • Idea sort R, sort S, then do the right thing
  • A closer look
  • Step 1 split R into runs of size M, then split S
    into runs of size M. Cost 2B(R) 2B(S)
  • Step 2 merge M/2 runs from R merge M/2 runs
    from S ouput a tuple on a case by cases basis
  • Total cost 3B(R)3B(S)
  • Assumption B(R)B(S)lt M2

41
42
Two-Pass Algorithms Based on Sorting
  • Join R S
  • Start by sorting both R and S on the join
    attribute
  • Cost 4B(R)4B(S) (because need to write to
    disk)
  • Read both relations in sorted order, match tuples
  • Cost B(R)B(S)
  • Difficulty many tuples in R may match many in S
  • If at least one set of tuples fits in M, we are
    OK
  • Otherwise need nested loop, higher cost
  • Total cost 5B(R)5B(S)
  • Assumption B(R) lt M2, B(S) lt M2

42
43
Two-Pass Algorithms Based on Sorting
  • Join R S
  • If the number of tuples in R matching those in S
    is small (or vice versa) we can compute the join
    during the merge phase
  • Total cost 3B(R)3B(S)
  • Assumption B(R) B(S) lt M2

43
44
Two Pass Algorithms Based on Hashing
  • Idea partition a relation R into buckets, on
    disk
  • Each bucket has size approx. B(R)/M
  • Does each bucket fit in main memory ?
  • Yes if B(R)/M lt M, i.e. B(R) lt M2

1
2
B(R)
44
45
Hash Based Algorithms for ?
  • Recall ?(R) ??duplicate elimination
  • Step 1. Partition R into buckets
  • Step 2. Apply ? to each bucket (may read in main
    memory)
  • Cost 3B(R)
  • AssumptionB(R) lt M2

45
46
Hash Based Algorithms for ?
  • Recall ?(R) ??grouping and aggregation
  • Step 1. Partition R into buckets
  • Step 2. Apply ? to each bucket (may read in main
    memory)
  • Cost 3B(R)
  • AssumptionB(R) lt M2

46
47
Hash-based Join
  • R S
  • Recall the main memory hash-based join
  • Scan S, build buckets in main memory
  • Then scan R and join

47
48
Partitioned Hash Join
  • R S
  • Step 1
  • Hash S into M buckets
  • send all buckets to disk
  • Step 2
  • Hash R into M buckets
  • Send all buckets to disk
  • Step 3
  • Join every pair of buckets

48
49
PartitionedHash-Join
  • Partition both relations using hash fn h R
    tuples in partition i will only match S tuples in
    partition i.

50
Partitioned Hash Join
  • Cost 3B(R) 3B(S)
  • Assumption min(B(R), B(S)) lt M2

50
51
Hybrid Hash Join Algorithm
  • When we have more memory B(S) ltlt M2
  • Partition S into k buckets
  • But keep first bucket S1 in memory, k-1 buckets
    to disk
  • Partition R into k buckets
  • First bucket R1 is joined immediately with S1
  • Other k-1 buckets go to disk
  • Finally, join k-1 pairs of buckets
  • (R2,S2), (R3,S3), , (Rk,Sk)

51
52
Hybrid Join Algorithm
  • How big should we choose k ?
  • Average bucket size for S is B(S)/k
  • Need to fit B(S)/k (k-1) blocks in memory
  • B(S)/k (k-1) lt M
  • k slightly smaller than B(S)/M

52
53
Hybrid Join Algorithm
  • How many I/Os ?
  • Recall cost of partitioned hash join
  • 3B(R) 3B(S)
  • Now we save 2 disk operations for one bucket
  • Recall there are k buckets
  • Hence we save 2/k(B(R) B(S))
  • Cost (3-2/k)(B(R) B(S))
    (3-2M/B(S))(B(R) B(S))

53
54
Indexed Based Algorithms
  • In a clustered index all tuples with the same
    value of the key are clustered on as few blocks
    as possible

54
55
Index Based Selection
  • Selection on equality ?av(R)
  • Clustered index on a cost B(R)/V(R,a)
  • Unclustered index on a cost T(R)/V(R,a)

55
56
Index Based Selection
  • Example B(R) 2000, T(R) 100,000, V(R, a)
    20, compute the cost of ?av(R)
  • Cost of table scan
  • If R is clustered B(R) 2000 I/Os
  • If R is unclustered T(R) 100,000 I/Os
  • Cost of index based selection
  • If index is clustered B(R)/V(R,a) 100
  • If index is unclustered T(R)/V(R,a) 5000
  • Notice when V(R,a) is small, then unclustered
    index is useless

56
57
Index Based Join
  • R S
  • Assume S has an index on the join attribute
  • Iterate over R, for each tuple fetch
    corresponding tuple(s) from S
  • Assume R is clustered. Cost
  • If index is clustered B(R) T(R)B(S)/V(S,a)
  • If index is unclustered B(R) T(R)T(S)/V(S,a)

57
58
Index Based Join
  • Assume both R and S have a sorted index (B tree)
    on the join attribute
  • Then perform a merge join (called zig-zag join)
  • Cost B(R) B(S)

58
Write a Comment
User Comments (0)
About PowerShow.com