Title: Relational Algebra and Optimization
1Relational Algebra and Optimization
- Slides from Jeff Ullman's and Hector
Garcia-Molinas lectures - http//www-db.stanford.edu/ullman/dscb/pslides/ps
lides.html and http//www-db.stanford.edu/ullman/
dbsi/win02/index.html
2Reading Materials
- Lecture Notes
- Database Systems Chapter 15 (read)
3Relational Algebra on Bags
- A bag (or multiset ) is like a set, but an
element may appear more than once. - Example 1,2,1,3 is a bag.
- Example 1,2,3 is also a bag that happens to be
a set.
4Why Bags?
- SQL, the most important query language for
relational databases, is actually a bag language. - Some operations, like projection, are much more
efficient on bags than sets.
5Operations on Bags
- Selection applies to each tuple, so its effect on
bags is like its effect on sets. - Projection also applies to each tuple, but as a
bag operator, we do not eliminate duplicates. - Products and joins are done on each pair of
tuples, so duplicates in bags have no effect on
how we operate.
6Example Bag Selection
R( A, B ) 1 2 5 6 1 2
7Example Bag Projection
R( A, B ) 1 2 5 6 1 2
8Example Bag Product
R( A, B ) S( B, C ) 1 2 3 4 5 6 7 8 1 2
9Example Bag Theta-Join
R( A, B ) S( B, C ) 1 2 3 4 5 6 7 8 1 2
10Bag Union
- An element appears in the union of two bags the
sum of the number of times it appears in each
bag. - Example 1,2,1 UNION 1,1,2,3,1
1,1,1,1,1,2,2,3
11Bag Intersection
- An element appears in the intersection of two
bags the minimum of the number of times it
appears in either. - Example 1,2,1,1 INTER 1,2,1,3 1,1,2.
12Bag Difference
- An element appears in the difference A B of
bags as many times as it appears in A, minus the
number of times it appears in B. - But never less than 0 times.
- Example 1,2,1,1 1,2,3 1,1.
13Beware Bag Laws ! Set Laws
- Some, but not all algebraic laws that hold for
sets also hold for bags. - Example the commutative law for union (R UNION
S S UNION R ) holds for bags. - Since addition is commutative, adding the number
of times x appears in R and S doesnt depend on
the order of R and S.
14Example of the Difference
- Set union is idempotent, meaning that S UNION S
S. - However, for bags, if x appears n times in S,
then it appears 2n times in S UNION S. - Thus S UNION S ! S in general.
15The Extended Algebra
- DELTA eliminate duplicates from bags.
- R1 DELTA(R2)
- TAU sort tuples.
- R1 TAUL (R2)
- Extended projection arithmetic, duplication of
columns. - R1 PROJAB,A,A (R)
- GAMMA grouping and aggregation SUM, AVG,
COUNT, MIN, and MAX - R1 GAMMAL (R2)
- Outerjoin avoids dangling tuples by padding
them with a special NULL symbol in the result.
16 17Query Processing
18Example
- Select B,D
- From R,S
- Where R.A c ? S.E 2 ? R.CS.C
19 R A B C S C D E a 1 10 10 x 2 b 1 20 2
0 y 2 c 2 10 30 z 2 d 2 35 40 x 1 e 3 45 50
y 3
20 How do we execute query?
-
- - Do Cartesian product
- - Select tuples
- - Do projection
One idea
21RXS R.A R.B R.C S.C S.D S.E a 1 10 10
x 2 a 1 10 20 y 2 . .
C 2 10 10 x 2 . .
22Relational Algebra - can be used to
describe plans...
- Ex Plan I
- ?B,D
-
- sR.Ac? S.E2 ? R.CS.C
- X
- R S
OR ?B,D sR.Ac? S.E2 ? R.C S.C (RXS)
23Another idea
Plan II
natural join
24 R S A B C s (R) s(S) C D
E a 1 10 A B C C D E 10
x 2 b 1 20 c 2 10 10 x 2 20 y
2 c 2 10 20 y 2 30 z 2 d 2
35 30 z 2 40 x 1 e 3 45
50 y 3
25Plan III
- Use R.A and S.C Indexes
- (1) Use R.A index to select R tuples with R.A
c - (2) For each R.C value found, use
S.Cindex to find matching tuples
(3) Eliminate S tuples S.E ? 2 (4) Join
matching R,S tuples, project B,D attributes
and place in result
26 R S A B C C D E a 1
10 10 x 2 b 1 20 20
y 2 c 2 10 30 z 2 d 2 35
40 x 1 e 3 45
50 y 3
A
C
I1
I2
27Overview of Query Optimization
28SQL query
parse
parse tree
convert
answer
logical query plan
execute
apply laws
statistics
Pi
improved l.q.p
pick best
estimate result sizes
(P1,C1),(P2,C2)...
l.q.p. sizes
estimate costs
consider physical plans
P1,P2,..
29Example SQL query
- SELECT title
- FROM StarsIn
- WHERE starName IN (
- SELECT name
- FROM MovieStar
- WHERE birthdate LIKE 1960
- )
- (Find the movies with stars born in 1960)
30Example Generating Relational Algebra
?title
?
StarsIn ltconditiongt
lttuplegt IN ?name
ltattributegt ?birthdate LIKE 1960
starName MovieStar
Fig. 7.15 An expression using a two-argument ?,
midway between a parse tree and relational
algebra
31Example Logical Query Plan
?title
?starNamename
?
StarsIn ?name
?birthdate LIKE 1960
MovieStar
Fig. 7.18 Applying the rule for IN conditions
32Example Improved Logical Query Plan
?title
Question Push project to StarsIn?
starNamename
StarsIn ?name
?birthdate LIKE 1960
MovieStar
Fig. 7.20 An improvement on fig. 7.18.
33Example Estimate Result Sizes
- Need expected size
- StarsIn
- MovieStar
P s
34Example Estimate costs
- L.Q.P
- P1 P2 . Pn
- C1 C2 . Cn
-
- Pick best!
35Query Optimization - In class order
- Relational algebra level
- Detailed query plan level
- Estimate Costs
- without indexes
- with indexes
- Generate and compare plans
36Relational algebra optimization
- Transformation rules
- (preserve equivalence)
- What are good transformations?
37Rules Natural joins cross products union
38Note
- Carry attribute names in results, so order is
not important - Can also write as trees, e.g.
-
- T R
R S S T
39Rules Natural joins cross products union
R S S R (R S) T R (S T)
- R x S S x R
- (R x S) x T R x (S x T)
- R U S S U R
- R U (S U T) (R U S) U T
40Rules Selects
sp1 sp2 (R) sp1 (R) U sp2 (R)
41Rules Project
Let X set of attributes Y set of
attributes XY X U Y pxy (R)
px py (R)
42Rules s combined
- Let p predicate with only R attribs
- q predicate with only S attribs
-
- sp (R S)
- sq (R S)
43Rules p,s combined
- Let x subset of R attributes
- z attributes in predicate P (subset of R
attributes) - pxsp (R)
sp px (R)
44Rules p, combined
Let x subset of R attributes y subset
of S attributes z intersection of R,S
attributes pxy (R S)
45Conventional wisdom - do projects early - do
project earlyNo transformation is always good.