Query Processing - PowerPoint PPT Presentation

1 / 39

About This Presentation

Title:

Query Processing

Description:

Query Processing – PowerPoint PPT presentation

Number of Views:84

Avg rating:3.0/5.0

Slides: 40

Provided by: huan152

Learn more at: https://engineering.unt.edu

Category:

more less

Transcript and Presenter's Notes

Title: Query Processing

1
Query Processing
2
Exercise

Compute depositor customer, with depositor as
the outer relation.
Customer has a secondary B-tree index on
customer-name
Blocking factor 20 keys
customer 400b/10,000t depositor 100b/5,000t
Merge join
log(400) log(100) 400 100 516

3
Hybrid Merge-join

If one relation is sorted, and the other has a
secondary B-tree index on the join attribute
Merge the sorted relation with the leaf entries
of the B-tree .
Sort the result on the addresses of the unsorted
relations tuples
Scan the unsorted relation in physical address
order and merge with previous result, to replace
addresses by the actual tuples
Sequential scan more efficient than random lookup

4
Hybrid Merge-join
r

s

5

Hash join
Hash function h, range 0 ? k
Buckets for R1 G0, G1, ... Gk
Buckets for R2 H0, H1, ... Hk

Algorithm (1) Hash R1 tuples into G buckets (2)
Hash R2 tuples into H buckets (3) For i 0 to k
do match tuples in Gi, Hi buckets
6
Simple example hash even/odd

R1 R2 Buckets
2 5 Even
4 4 R1 R2
3 12 Odd
5 3
8 13
9 8
11
14

2 4 8
4 12 8 14
3 5 9
5 3 13 11
7
Example Hash Join

R1, R2 contiguous (un-ordered)
? Use 100 buckets
? Read R1, hash, write buckets
R1 ?

100
...
...
10 blocks
8

-gt Same for R2
-gt Read one R1 bucket build memory hash table
-gt Read corresponding R2 bucket hash probe
R1

Relation R1 is called the build input and
R2 is called the probe input.
R2
...
R1
...
memory
? Then repeat for all buckets
9
Cost
Bucketize Read R1 write Read R2
write Join Read R1, R2 Total cost 3 x
bR1bR2
Note this is an approximation since buckets
will vary in size and we have to round up to
blocks
10
Minimum memory requirements

Size of R1 bucket (x/k)
k number of memory buffers
x number of R1 blocks
So... (x/k) lt k
k gt ?x

Which relation should be the build relation?
11
Example of Cost of Hash-Join
customer depositor

Assume that memory size is 20 blocks
bdepositor 100 and bcustomer 400.
depositor is to be used as build input.
Partition it into five partitions, each of size
20 blocks. This partitioning can be done in one
pass.
Similarly, partition customer into five
partitions,each of size 80. This is also done in
one pass.
Therefore total cost 3(100 400) 1500 block
transfers
ignores cost of writing partially filled blocks

12
Complex Joins

Join with a conjunctive condition
r ?1? ? 2?... ? ? n s
Either use nested loops/block nested loops, or
Compute the result of one of the simpler joins r
?i s
final result comprises those tuples in the
intermediate result that satisfy the remaining
conditions
?1 ? . . . ? ?i 1 ? ?i 1 ? . . . ? ?n
Join with a disjunctive condition
r ?1 ? ?2 ?... ? ?n s
Either use nested loops/block nested loops, or
Compute as the union of the records in
individual joins r ? i s
(r ?1 s) ? (r ?2 s) ? . . . ? (r
?n s)

13
Other Operations

Duplicate elimination can be implemented via
hashing or sorting.
Projection is implemented by performing
projection on each tuple followed by duplicate
elimination.

14
Other Operations Aggregation

Aggregation
Sorting
Hashing

15
Other Operations Set Operations

Set operations (?, ? and ?)
can either use variant of merge-join after
sorting
or variant of hash-join.

16
Evaluation of Expressions

So far we have seen algorithms for individual
operations
Alternatives for evaluating an entire expression
tree
Materialization generate results of an
expression whose inputs are relations or are
already computed, materialize (store) it on disk.
Repeat.
Pipelining pass on tuples to parent operations
even as an operation is being executed

17
Query Processing