CPSC 310 Database Systems - PowerPoint PPT Presentation

1 / 39
About This Presentation
Title:

CPSC 310 Database Systems

Description:

search the space of physical plans. choose order of operations. complete ... StarsIn(title,year,starName) Movie(title,year,len,inColor,studioName) and a view ... – PowerPoint PPT presentation

Number of Views:27
Avg rating:3.0/5.0
Slides: 40
Provided by: JeffU4
Category:

less

Transcript and Presenter's Notes

Title: CPSC 310 Database Systems


1
CPSC 310 Database Systems
  • Lecturer Anxiao (Andrew) Jiang
  • Lecture Nineteen Query Compilation

2
Query Compilation
  • Parsing
  • Logical Query Plan

3
SQL query
parse
parse tree
convert
answer
logical query plan
execute
apply laws
statistics
Pi
improved l.q.p
pick best
estimate result sizes
(P1,C1),(P2,C2)...
l.q.p. sizes
estimate costs
consider physical plans
P1,P2,..
4
Outline
  • Convert SQL query to a parse tree
  • Semantic checking attributes, relation names,
    types
  • Convert to a logical query plan (relational
    algebra expression)
  • deal with subqueries
  • Improve the logical query plan
  • use algebraic transformations
  • group together certain operators
  • evaluate logical plan based on estimated size of
    relations
  • Convert to a physical query plan
  • search the space of physical plans
  • choose order of operations
  • complete the physical query plan

5
Parsing
  • Goal is to convert a text string containing a
    query into a parse tree data structure
  • leaves form the text string (broken into lexical
    elements)
  • internal nodes are syntactic categories
  • Uses standard algorithmic techniques from
    compilers
  • given a grammar for the language (e.g., SQL),
    process the string and build the tree

6
Example SQL query
  • SELECT title
  • FROM StarsIn
  • WHERE starName IN (
  • SELECT name
  • FROM MovieStar
  • WHERE birthdate LIKE 1960
  • )
  • (Find the movies with stars born in 1960)
  • Assume we have a simplified grammar for SQL.

7
Example Parse Tree
ltQuerygt
ltSFWgt
SELECT ltSelListgt FROM ltFromListgt
WHERE ltConditiongt
ltAttributegt ltRelNamegt
ltTuplegt IN ltQuerygt
title StarsIn
ltAttributegt ( ltQuerygt )
starName ltSFWgt
8
The Preprocessor
  • replaces each reference to a view with a parse
    (sub)-tree that describes the view (i.e., a
    query)
  • does semantic checking
  • are relations and views mentioned in the schema?
  • are attributes mentioned in the current scope?
  • are attribute types correct?

9
Outline
  • Convert SQL query to a parse tree
  • Semantic checking attributes, relation names,
    types
  • Convert to a logical query plan (relational
    algebra expression)
  • deal with subqueries
  • Improve the logical query plan
  • use algebraic transformations
  • group together certain operators
  • evaluate logical plan based on estimated size of
    relations
  • Convert to a physical query plan
  • search the space of physical plans
  • choose order of operations
  • complete the physical query plan

10
Convert Parse Tree to Relational Algebra
  • Complete algorithm depends on specific grammar,
    which determines forms of the parse trees
  • Here give a flavor of the approach

11
Conversion
  • Suppose there are no subqueries.
  • SELECT att-list FROM rel-list WHERE cond
  • is converted into
  • PROJatt-list(SELECTcond(PRODUCT(rel-list))),
    or
  • ?att-list(?cond( X (rel-list)))

12
  • SELECT movieTitle
  • FROM StarsIn, MovieStar
  • WHERE starName name AND birthdate LIKE '1960'

ltQuerygt
ltSFWgt
SELECT ltSelListgt FROM ltFromListgt
WHERE ltConditiongt
ltAttributegt ltRelNamegt , ltFromListgt
AND ltConditiongt
movieTitle StarsIn ltRelNamegt
ltAttributegt LIKE ltPatterngt
MovieStar birthdate
'1960'
ltConditiongt
ltAttributegt ltAttributegt
starName name
13
Equivalent Algebraic Expression Tree
?movieTitle
?
starname name AND birthdate LIKE '1960'
X
StarsIn MovieStar
14
Handling Subqueries
  • Recall the (equivalent) query
  • SELECT title
  • FROM StarsIn
  • WHERE starName IN (
  • SELECT name
  • FROM MovieStar
  • WHERE birthdate LIKE 1960
  • )
  • Use an intermediate format called two-argument
    selection

15
Example Two-Argument Selection
?title
?
StarsIn ltconditiongt
lttuplegt IN ?name
ltattributegt ?birthdate LIKE 1960
starName MovieStar
16
Converting Two-Argument Selection
  • To continue the conversion, we need rules for
    replacing two-argument selection with a
    relational algebra expression
  • Different rules depending on the nature of the
    subquery
  • Here show example for IN operator and
    uncorrelated query (subquery computes a relation
    independent of the tuple being tested)

17
Rules for IN
?
?
C
R ltConditiongt
X
R ?
t IN S
S
C is the condition that equates attributes in t
with corresponding attributes in S
18
Example Logical Query Plan
?title
?starNamename
?
?
StarsIn
?name
?birthdate LIKE 1960
MovieStar
19
What if Subquery is Correlated?
  • Example is when subquery refers to the current
    tuple of the outer scope that is being tested
  • More complicated to deal with, since subquery
    cannot be translated in isolation
  • Need to incorporate external attributes in the
    translation
  • Some details are in textbook

20
Outline
  • Convert SQL query to a parse tree
  • Semantic checking attributes, relation names,
    types
  • Convert to a logical query plan (relational
    algebra expression)
  • deal with subqueries
  • Improve the logical query plan
  • use algebraic transformations
  • group together certain operators
  • evaluate logical plan based on estimated size of
    relations
  • Convert to a physical query plan
  • search the space of physical plans
  • choose order of operations
  • complete the physical query plan

21
Improving the Logical Query Plan
  • There are numerous algebraic laws concerning
    relational algebra operations
  • By applying them to a logical query plan
    judiciously, we can get an equivalent query plan
    that can be executed more efficiently
  • Next we'll survey some of these laws

22
Associative and Commutative Operations
  • product
  • natural join
  • set and bag union
  • set and bag intersection
  • associative (A op B) op C A op (B op C)
  • commutative A op B B op A

23
Laws Involving Selection
  • Selections usually reduce the size of the
    relation
  • Usually good to do selections early, i.e., "push
    them down the tree"
  • Also can be helpful to break up a complex
    selection into parts

24
Selection Splitting
  • ? C1 AND C2 (R) ? C1 ( ? C2 (R))
  • ? C1 OR C2 (R) (? C1 (R)) Uset (? C2 (R))
  • if R is a set
  • ? C1 ( ? C2 (R)) ? C2 ( ? C1 (R))

25
Selection and Binary Operators
  • Must push selection to both arguments
  • ? C (R U S) ? C (R) U ? C (S)
  • Must push to first arg, optional for 2nd
  • ? C (R - S) ? C (R) - S
  • ? C (R - S) ? C (R) - ? C (S)
  • Push to at least one arg with all attributes
    mentioned in C
  • product, natural join, theta join, intersection
  • e.g., ? C (R X S) ? C (R) X S, if R has all
    the atts in C

26
Pushing Selection Up the Tree
  • Suppose we have relations
  • StarsIn(title,year,starName)
  • Movie(title,year,len,inColor,studioName)
  • and a view
  • CREATE VIEW MoviesOf1996 AS
  • SELECT
  • FROM Movie
  • WHERE year 1996
  • and the query
  • SELECT starName, studioName
  • FROM MoviesOf1996 NATURAL JOIN StarsIn

27
The Straightforward Tree
??starName,studioName
??year1996 StarsIn
Movie
28
The Improved Logical Query Plan
29
Laws Involving Projections
  • Consider adding in additional projections
  • Adding a projection lower in the tree can improve
    performance, since often tuple size is reduced
  • Usually not as helpful as pushing selections down
  • If a projection is inserted in the tree, then
    none of the eliminated attributes can appear
    above this point in the tree
  • Ex ?L(R X S) ?L(?M(R) X ?N(S)), where M (resp.
    N) is all attributes of R (resp. S) that are used
    in L
  • Another example
  • ?L(R Ubag S) ?L(R) Ubag ?L(S)
  • But watch out for set union!

30
Push Projection Below Selection?
  • Rule ?L(?C(R)) ?L(?C(?M(R)))
  • where M is all attributes used by L or C
  • But is it a good idea?
  • SELECT starName FROM StarsIn WHERE movieYear
    1996

? starName
? starName
? movieYear1996
? movieYear1996
? starName,movieYear
StarsIn
StarsIn
31
Joins and Products
  • Recall from the definitions of relational
    algebra
  • R C S ?C(R X S) (theta join)
  • R S ?L(?C(R X S)) (natural join)
  • where C equates same-name attributes in R and S,
    and L includes all attributes of R and S dropping
    duplicates
  • To improve a logical query plan, replace a
    product followed by a selection with a join
  • Join algorithms are usually faster than doing
    product followed by selection

32
Duplicate Elimination
  • Moving ? down the tree is potentially beneficial
    as it can reduce the size of intermediate
    relations
  • Can be eliminated if argument has no duplicates
  • a relation with a primary key
  • a relation resulting from a grouping operator
  • Legal to push ? through product, join, selection,
    and bag intersection
  • Ex ?(R X S) ?(R) X ?(S)
  • Cannot push ? through bag union, bag difference
    or projection

33
Grouping and Aggregation
  • Since ? produces no duplicates
  • ?(?L(R)) ?L(R)
  • Get rid of useless attributes
  • ?L(R) ?L(?M(R))
  • where M contains all attributes in L
  • If L contains only MIN and MAX
  • ?L(R) ?L(?(R))

34
Example
  • Suppose we have the relations
  • MovieStar(name,addr,gender,birthdate)
  • StarsIn(title,year,starName)
  • and we want to find the youngest star to appear
    in a movie for each year
  • SELECT year, MAX(birthdate)
  • FROM MovieStar,StarsIn
  • WHERE name starName
  • GROUP BY year

35
Example cont'd
?year,MAX(birthdate)
?year,birthdate
?
  • year,
  • starName
  • birthdate,
  • name

36
Summary of LQP Improvements
  • Selections
  • push down tree as far as possible
  • if condition is an AND, split and push separately
  • sometimes need to push up before pushing down
  • Projections
  • can be pushed down
  • new ones can be added (but be careful)
  • Duplicate elimination
  • sometimes can be removed
  • Selection/product combinations
  • can sometimes be replaced with join

37
Outline
  • Convert SQL query to a parse tree
  • Semantic checking attributes, relation names,
    types
  • Convert to a logical query plan (relational
    algebra expression)
  • deal with subqueries
  • Improve the logical query plan
  • use algebraic transformations
  • group together certain operators
  • evaluate logical plan based on estimated size of
    relations
  • Convert to a physical query plan
  • search the space of physical plans
  • choose order of operations
  • complete the physical query plan

38
Grouping Assoc/Comm Operators
  • Group together adjacent joins, adjacent unions,
    and adjacent intersections as siblings in the
    tree
  • Sets up the logical QP for future optimization
    when physical QP is constructed determine best
    order for doing a sequence of joins (or unions or
    intersections)

U D E F
U
D
E
F
U
A B C
A
B
C
39
Evaluating Logical Query Plans
  • The transformations discussed so far intuitively
    seem like good ideas
  • But how can we evaluate them more scientifically?
  • Estimate size of relations, also helpful in
    evaluating physical query plans
  • Coming up next
Write a Comment
User Comments (0)
About PowerShow.com