Title: Query Processing and Optimization
1Query Processing and Optimization
2Steps in the Process
- SQL query
- Scan into tokens
- Parse for syntactic correctness
- Validate attribute and relation names
- Develop intermediate representation query tree
- Query Tree
- Pass through the query optimizer
- Develop Execution Plan
- Execution Plan
- Generate Executable Code
- Executable Code
- Execute by the run-time database processor
- Return Query Results
3Developing the Query Tree
- Query is broken into blocks
- Blocks are translated into relational algebra
expression represented as a tree
4Developing Blocks
- Query block contains a select-from-where clause
- So, in a nested query, each level is a block
- Example
- select lname, fname from employee
- where salary gt select max(salary) from
employee - where dno 5
- Block 1
- select max(salary) from employee where dno 5
- Block 2
- Select lname, fname from employee where salary gt
C
5Translation
- Inner Block Fmax salary(sdno5(Employee))
- Outer Block plname, fname (ssalary gt
C(Employee)) - The query optimizer then chooses a plan for each
block. The example is relatively simple the
inner query is uncorrelated with the outer. It
can be evaluated once and then used as a constant
(c)
6Sorting
- Sorting is crucial for many SQL constructs
- ORDER BY
- Duplicate elimination
- Certain join operations
- UNION
- INTERSECTION
7Problem
- All processing is done in memory
- Internal buffer is smaller than the result of
many query operations. - Whats a DBMS to do?
8Answer
- File ExternalSort(File UnSorted)
-
- Bytes bufferMAXSIZE
- int i 0
- while (buffer is not full)
- read UnSorted into buffer
- while (!UnSorted.eof())
-
- sort the buffer
- write the buffer to disk as a small file called a
run and labeled run_i - while(buffer is not full)
- read Unsorted into buffer)
- i
-
- merge run_0 run_n into a single file,
Sorted - return Sorted
9Some Metrics
- Let
- b number of file blocks (e.g., 1024)
- nb size of the internal buffer (e.g., 5)
- nr number of runs ceil(b/nb) 205
- So, after the sort phase 205 files (or runs) are
stored on disk.
10Metrics Continued
- During the merge phase the sorted runs are merged
after one or more passes. - Degree of Merging (dm)
- The number of runs grouped together to be merged
in a single pass. In our exampleas will be
shown shortly-- dm 4. So, the total number of
runs is divided into groups of 4 for merging in
each pass. - In each pass
- One buffer block used for a block from each run
being merged - One buffer block used to contain one block of the
merged result
11Metrics Continued
- So the degree of merging is the smaller of nb 1
or nr - (That is if the number of runs is less than the
number of blocks available for each pass, then
the degree of merging and the number of runs is
the same) - In our example, dm 4 (i.e., 5 1)
- But how many passes do we need to make?
12Metrics Continued
- Runs are grouped in dm chunks on each pass.
Each of these chunks are merged into a single
run.
13- At the end of the first pass, we then have
ceil(nr/dm) new (but larger) runs to be sorted.
Call this r. After the second pass, we have
ceil(r/dm) new runs to be sorted. We continue
this process until we have a single sorted file.
So, - Number of passes ceil(logdm(nr))
- In our example, we have 205/4 52 runs at the
end of the first pass, 52/4 runs at the second,
13/4 at the end of the 3rd, and 1 run remaining
at the end of the four 4 passes - ceil(log4 (205)) 4
14Metrics Continued
- But how to transform this to something we care
aboutlike disk accesses(da). - The sort phase requires 2 b disk accesses, once
for each read block, and once for writing out
each block of each sorted run. - So da 2 b cost of merge
- But what is the cost of the merge?
15Metrics Continued
- Each pass works on every block, of which there
are b in total. These have to read and written. - So
- The cost of merge 2 b ceil(logdm(nr))
- So da cost of sort cost of merge
- da (2 b) (2 b ceil(logdm(nr)))
16Implementing Select
- S1 Linear Search of entire file
- S2 Binary Search of an ordered file
- S3 Primary index or hash key for equality
condition - S4 Primary index for range selections (gt, lt
etc.) - S5 Clustering index to retrieve multiple records
when the search condition is on a non-key
attribute - S6 B tree for either equality or range selections
17More Complex Selects
- S7 Conjunctive selection using S2 through S6
- S8 Conjunctive selection using composite index.
If two more attributes are involved in equality
conditions in the conjunctive condition and a
composite index exists on the combined fields, we
can use the index directly (example, suppose we
created an index on essn, pno of works_on. - S9 Conjunctive intersection using record
pointers. If we have secondary indexes and
record pointers on a collection of indexes, then
each index can be used to create a set of record
pointers. The intersection of this set is the
conjunction of the conditions.
18General Principles
- Recall Access Path a set of structures that
provide a mechanism for getting to data (e.g.,
hash tables, indexes, B-trees) - When a single condition satisfies the selection
(sdno5(Employee) - If an access path exists, use the corresponding
access method - Else do linear search
- The optimizer should choose the path that
retrieves the fewest records in the most
efficient way. It does this by estimating costs
of various approaches and chooses the least cost
method.
19Conjunction
- Query optimization is needed mostly on
conjunctive selects with multiple conditions
whenever more than one of the attributes involved
in the select have access paths. - The optimizer should choose the access path first
that retrieves the fewest records
20How Does it Choose?
- Definition selectivity (S)
- The ratio of the number of tuples that satisfy
the condition to the total tuples in the
relation. - 0 means no tuples satisfy the condition
- 1 means that all tuples satisfy the condition
21- The number of records (N) satisfying a selection
condition is estimated by - N r(R) S
- The smaller the N, the higher the desirability
using that condition first to retrieve records - Why?
22- Suppose we have this selection
- ssex F and salary 500,000 and dno
5(Employee) - Since the result is the intersection of all
selects it makes sense to retrieve on the least
likely condition first. If 0 tuples are
returned, we look no further. If only a few are
returned, we perform the successive selects on
the result
23- Problem exact selectivities (S) of all
conditions are usually not available. But
estimates of selectivity are kept in the DBMS
catalog - Example 1 equality condition based on a key
attribute S 1/r(R) - Example 2 equality condition on an attribute
with i distinct values S (r(R)/i)/r(R)
1/i if we assume that the values are evenly
distributed among the tuples.
24Disjunction
- Since the result is the union of all individual
select conditions, it matters little which we do
first
25Join
- Definitions
- Joins involving two relations are called two-way
joins. - Joins involving more than two relations are
called multi-way joins.
26Implementing Join 1
- J1 Nested Loop join
- For each tuple, r, in the outer loop, R, retrieve
every record s from the inner loop, S, and test
whether the two tuples satisfy the join
condition rA sB where A and B are
attributes drawn from relations R and S
respectively - Employee xdno dnumber Department
27Implementing Join 2
- J2 Single-loop join
- If an access structure exists for one of the two
join attributes, say attribute B of relation S,
retrieve each record, r, in relation R and then
use the access structure on S to retrieve all
tuples rA sB where A is an attribute of R
28Implementing Join 3
- J3 Sort-merge join
- If the tuples of R and S are physically ordered
by value of the join attributes A and B, join can
be done in most efficient way possible both
relations are scanned concurrently in order of
the join attributes, matching the tuples that
have the same values for A and B. If the files
are not ordered, they may be sorted first. - What happens when a secondary index exists on
each of the join attributes? - The indexes let us scan the records in order of
the join attributes - But the records themselves may be scattered all
over file blocksrecall 1 block per read. So
this method can be very inefficient
29Implementing Join 4
- J4 Hash Join (hash structure does not exist)
- The records of files R and S are both hashed to
the same hash file, using the same hash function
on the join attributes A of R and B of S as hash
keys. - Two phases
- Partitioning phase scan the file with fewer
records, say R, and hash the retrieved records to
hash file buckets. - Probing phase Scan S. Hash each of its records
to probe the appropriate hash bucket. The record
from S is combined with all matching records from
R that are in the bucket. - Problem Assumes that R fits entirely into memory
buckets.
30Performance J1 (nested loop)
- Decide which is to be the inner and which the
outer loop - Read as much as possible of the outer loop file
into available buffer - Read inner loop file a block at a time into
buffer - Probe outer loop buffer with records from inner
loop and put results in a single block buffer - Append results to a result file
- Continue this process until the action of step 4
has bee taken on all records
31Performance J1
- Employee xdno dnumber Department
- Let
- Nb 7, the number of buffer blocks
- rD 50, the number of records in Department
- bD 10, the number of blocks Department uses
- rE 6000, the number of records in Employee
- bE 2000, the number of blocks Employee uses
32Performance J1
- How many blocks are available for the outer loop
file? - Nb (1 block for inner loop 1 block for
matches) - So, Nb 2
- Suppose Employee is chosen for the outer loop.
- Let T number of times buffer is loaded with
blocks from outer file - T ceil(bE/(Nb 2)
- Let I number of blocks accessed for inner file
- I bD ceil(bE/(Nb 2)
- Let A total blocks accessed
- A bE (bD ceil(bE/(Nb 2)
- 2000 ceil(2000/5) 10
- 6000
33Performance J1Exchange inner and outer blocks
- A bD (bE ceil(bD/(Nb 2)
- 10 (2000 10/5)
- 4010
- Wrinkle filled buffer is written to disk and
reused - bresult, the number of blocks that satisfy the
query must be added in . Since its the same in
both cases, we have - Principle J1
- Use the file with fewer blocks as the outer-loop
file in the nested-loop join
34Performance J1
- But
- filled buffer is written to disk and reused
- bresult, the number of blocks that satisfy the
query must be added in . - Yet its the same in both cases, leading to
- Principle J1
- Use the file with fewer blocks as the outer-loop
file in the nested-loop join
35Performance J2(using an access structure to
retrieve the matching records)
- Definition Join Selection Factor
- Percentage of records in file A that will be
joined with records in file B - Consider
- Department xmgrssnssn Employee
- Recall
- rD 50, the number of records in Department
- rE 6000, the number of records in Employee
36Peformance J2
- Every department (50) has a manager
- But most employees (at least 5950) dont manage a
department - So every department record will be joined
- Join Selection Factor 100
- At most 50 employee records will be joined
- Join Selection Factor .8
37Performance J2
- Suppose multilevel secondary indexes exist on ssn
of employee and mgrssn of department. - Suppose further that the index levels are as
follows - xssn 4 (for the employee file)
- xmgrssn 2 (for the department file)
- We are using J2 (an access structure to retrieve
matching records)
38Performance J2
- Option 1
- Retrieve each employee record and use the index
on mgrssn to find the matching department record - Option 2
- Retrieve each department record and use the index
on ssn to find the matching employee
39Performance J2
- Metrics Option 1
- Accesses be (re (xmgrssn 1))
- 20,000
- Metrics Option 2
- Accesses bd (rd (xssn 1))
- 260
40Performance J2
- Principle
- Use in the outer loop either
- Smallest file
- File that has the highest join selection factor
41Performance J3(Sort-Merge Join)
- Case 1 Each file is already sorted
- A pass is made through each file
- Accesses bA bB bresult
- Case 2 One or both files must be sorted
- Add the cost of sorting each external file
- ceil(b log2b)
- (if we know the number of buffers available for
sorting, we can use the more accurate forumlas
developed earlierthe point here is thats its
expensive)
42Performance J3
- Consider
- Department xmgrssnssn Employee
- Recall
- bD 10, the number of blocks in Department
- bE 2000, the number of blocks in Employee
43Performance J3
- Case 1 Each file is already sorted
- A pass is made through each file
- Accesses 2010 bresult
- Case 2 Files must be sorted
- Accesses 2010 ceil(be log2be ) bd
ceil(log2bd ) bresult - 2010 2000 11 10
4 bresult - 24,050
bresult
44Performance J4(Hash Join)
- Employee xssn mgrssn Department
- Recall
- be 2000
- bd 10
- Technique
- Scan smaller file, hashing records into memory,
M, on the join attribute. - Scan larger file, hashing each record on the join
attribute and probing M - Write results to disk
- Total Disk Accesses be bd bresult
-
2010 bresult - Problem Assumes that all of the smaller file
will fit into the reserved memory buffer
45Performance J4Overcoming the Limitation
- Partition both Employee and Department into M
partitions E1, E2, , EM - D1, D2, , DM
- Corresponding pairs, Ex and Dx, have the property
that records from Ex need only be joined with
records from Dx - We ensure this property by scanning each file and
partitioning each using the same hash
functionrequires that we keep track of which
partitions records hashing to the same key are in.
46Performance J4Phase 1 Partitioning
- Requires M internal buffers of 1 block each, to
receive the partitioning results for each file - 1 buffer to accept pieces of the external files
as they are read in - As the internal partitions fill, they are
appended to a disk file that stores this
partition.
47Performance J4Phase 2 Probing
- Requires M iterations
- During iteration x, the two partitions, Ex and Dx
are joined - How many buffers needed the number of blocks in
the smaller of the two partitons plus two
additional buffers to receive Ex and Dx - The probing can either be done through a nested
loop or through an additional hash function
48Performance J4Cost
- Each record of each file is read once and written
back to disk during the partitioning phase - Each record is read once during the probing phase
- Each record of the result is written back to disk
- A 3 (be bd) bres 6030 bres
49Comparison
- J1 (nested loop) 4010 accesses
- J2 (loop with access structure) 260
- J3 (Sorted merge join) 2010
- J4 (Hash join) 6030
- Considerations
- Cost of maintaining the access structure (J2)
- If the operation is to be done frequently, it may
be worth it to sort the files (J3) - If the files are small enough to be brought into
memory J4 becomes more attractive. - Certainly the in-memory operation of hashing and
probing (J4) is faster than the sequential
reading required with J1
50Heuristics
- Rule Numero Uno Apply select and project
operations before applying join. Reason join is
expensive. Select and project reduce the size of
files that must be joined. - Data Structures and heuristic optimization
- Query Tree
- Represents a relational algebra expression
- Relational operations are internal nodes
- Input relations are leaf nodes
- Equivalent Query Tree
- Tree representing an equivalent relational
algebra expression to the one represented in the
query tree but is more efficient to execute - Set of rules that lets us transform a Query Tree
to an Equivalent Query Tree
51Query Trees
- English
- For every dept in Stafford, find the pnum, dnum,
mgrssn, lname, address, bdate - SQL
- select pnum, dnum, mgrssn, lname, address, bdate
- from employee, department, project
- where ssn mgrssn and dnumber and
plocation Stafford - RA
- ?pnumber, dnum, address, bdate (((splocation
Stafford(PROJECT) - xdnum dnumber(DEPARTMENT)))
- xmgrssnssn(EMPLOYEE))
52The Naïve Approach
- Take the cartesian product of Employee,
Department, Project - Select those tuples where
- p.dum d.dnumber and d.mgrssn e.ssn and
p.plocation Stafford - Project
- p.pnumber, p.dnum, e.lname, e.address,
e.bdate
53Obvious Problem
- Suppose the following record sizes in bytes
- Project 100 bytes
- Department 50 bytes
- Employee 150 bytes
- Suppose the following tuple numbers in each
relation - Project 100
- Department 20
- Employee 5000
- Then their cartesian product would contain 10
million tuples of record size 300 bytes each
54A Better General Approach
- Parse RA into subexpressions, innermost at the
bottom of the parse tree - Execute from the bottom up
55But we can apply obvious heurisitcs when
generating an equivalent tree
- Consider Find the last names of employees born
after 1957 who work on a project named Aquarius - SQl select lname from employee, works_on,
project where pname Aquarius and
pnumber pno and essn ssn
and bdate gt 1957-12-31
56Equivalent Trees 1
spnameaquarius and pnumber pno and essn
ssn and bdate gt 1957-12-31
x
project
x
employee
works_on
57Better Move select down the tree
spnmber pno
x
spname aquarius
sessn ssn
project
x
works_on
sBdategt1957-12-31
Employee
58Better YetApply more restrictive select first
sessn ssn
x
spnmber pno
spname aquarius
x
project
sBdategt1957-12-31
works_on
Employee
59Better StillSwitch employee and projectbecause
select on project produces 1 tuple
sessn ssn
x
spnmber pno
sBdategt1957-12-31
x
Employee
spname aquarius
works_on
project
60Better StillReplace CP with natural joins,
making the transitions loops, etc. easier
xessn ssn
xspnmber pno
sBdategt1957-12-31
spname aquarius
works_on
Employee
project
61FinallyReduce tuple size by applying projects
early
xessn ssn
pessn
pssn, lname
xspnmber pno
sBdategt1957-12-31
pessn, pno
ppnumber
spname aquarius
works_on
Employee
project
62An Algorithm(made possible by the list, p. 519)
- Break up select ops with conjunctive conditions
into a cascade of selects - Move selects as far down the tree as possible
(selects are commutative with other ops) - Push most restrictive selects down the tree.
- Combine cartesian products with a subsequent
select into a join - Push project operations down the tree (project is
commutative) - Decompose the tree into subtrees of operations
that can be executed by a single algorithm
63NowOnce we have the final query tree
- We can choose implementation methods for join,
select, and other ops as outlined earlier, paying
attention to these factors - Disk accesses
- Cost of storing intermediate files
- Cost of performing in memory computatons
- Cost of memory buffers
- Cost of shipping query results across a network
64Much of this is stored in the DBMS catalog
- We know
- Number of records and blocks in a file
- Blocking factor for a file
- All access methods and attributes for each file
- Number of levels for a multi-level index
- Number of distinct values for an attribute
- Fraction of records satisfying an equality
condition on an attribute (selectivity)
65Oracle Optimization
- Rule-Based
- Execution plans are based on heuristically ranked
operations - Example Access by physical address of a row
(rowid) is best and full table scan is worst - Cost-Based
- Optimizer examines alternative access paths
- Chooses the one where the query cost is lowest
where query cost is based on i/o and cpu time and
memory usage - Developer-based
- Developers may tell the optimizer things that
might be semantically but not technically known - Suppose we have a company with only 10 men and 90
women. If a secondary index exists it would be
used. Whereas if men and women were evenly
distributed, it might not be used.
66What Weve Done
- Intro to databases (chap. 1)
- Database concepts and architecture (chap. 2)
- Data Modeling with ER (chap. 3)
- Relational Data Model (chap. 5)
- Relational Algebra (6.-6.5)
- ER R mapping (chap. 7)
- SQL-99 (chapters 8 9)
- Functional Dependency and Normalization (chapters
10 11) - Disk Storage, file structures, hashing (chap. 13)
- Indexing (chap. 14)
- Optimization (15.1-15.7, 15.9)