Title: Relational Algebra
1Relational Algebra
2What is Relational Algebra?
- Defines operations (data retrieval) for
relational model - SQLs DML (Data Manipulation Language) has data
retrieval facilities, which are equivalent to
that of relational algebra. - SQL and relational algebra are not for complex
operations they support efficient, easy access
of large data sets.
3Basics
- Relational Algebra is defined on bags, rather
than relations (sets). - Bag or multiset allows duplicate values but
order is not significant. - We can write an expression using relational
algebra operators with parentheses - Need closure an operator on bag returns a bag.
- Relational algebra includes set operators, and
other operators specific to relational model.
4Set Operators
- Union, Intersection, Difference, cross product
- Union, Intersection and Difference are defined
only for union compatible relations. - Two relations are union compatible if they have
the same set of attributes and the types
(domains) of the attributes are the same. - Eg of two relations that are not union
compatible - Student (sNumber, sName)
- Course (cNumber, cName)
5Union ?
- Consider two bags R1 and R2 that are
union-compatible. Suppose a tuple t appears in R1
m times, and in R2 n times. Then in the union, t
appears m n times.
R1 ? R2
R1
R2
A B
1 2
1 2
1 2
3 4
3 4
5 6
A B
1 2
3 4
1 2
A B
1 2
3 4
5 6
6Intersection n
- Consider two bags R1 and R2 that are
union-compatible. Suppose a tuple t appears in R1
m times, and in R2 n times. Then in the
intersection, t appears min (m, n) times.
R1
R2
R1 n R2
A B
1 2
3 4
A B
1 2
3 4
1 2
A B
1 2
3 4
5 6
7Difference -
- Consider two bags R1 and R2 that are
union-compatible. Suppose a tuple t appears in R1
m times, and in R2 n times. Then in R1 R2, t
appears max (0, m - n) times.
R1
R2
R1 R2
A B
1 2
3 4
1 2
A B
1 2
3 4
5 6
A B
1 2
8Bag semantics vs Set semantics
- Union is idempotent for sets
- (R1 ? R2) ? R2 R1 ? R2
- Union is not idempotent for bags.
- Intersection is idempotent for sets and bags.
- Difference is idempotent for sets, but not for
bags. - For sets and bags, R1 ? R2 R1 (R1 R2).
9Cross Product (Cartesian Product) X
- Consider two bags R1 and R2. Suppose a tuple t1
appears in R1 m times, and a tuple t2 appears in
R2 n times. Then in R1 X R2, t1t2 appears mn
times.
R1 X R2
R1
R2
A R1.B R2.B C
1 2 2 3
1 2 2 3
1 2 4 5
1 2 4 5
1 2 4 5
1 2 4 5
A B
1 2
1 2
B C
2 3
4 5
4 5
10Basic Relational Operations
- Select, Project, Join
- Select denoted sC (R) selects the subset of
tuples of R that satisfies selection condition C.
C can be any boolean expression, its clauses can
be combined with AND, OR, NOT.
s(C 6) (R)
R
A B C
1 2 5
3 4 6
1 2 7
1 2 7
A B C
3 4 6
1 2 7
1 2 7
11Select
- Select is commutative sC2 (sC1 (R)) sC1 (sC2
(R)) - Select is idempotent sC (sC (R)) sC (R)
- We can combine multiple select conditions into
one condition. sC1 (sC2 ( sCn (R))) sC1 AND
C2 AND Cn (R)
12Project pA1, A2, , An (R)
- Consider relation (bag) R with set of attributes
AR. pA1, A2, , An (R), where A1, A2, , An ? AR
returns the tuples in R, but only with columns
A1, A2, , An.
pA, B (R)
R
A B C
1 2 5
3 4 6
1 2 7
1 2 8
A B
1 2
3 4
1 2
1 2
13Project Bag Semantics vs Set Semantics
- For bags, the cardinality of R cardinality of
pA1, A2, , An (R). - For sets, cardinality of R cardinality of
pA1,A2, , An (R). - For sets and bags
- project is not commutative
- project is idempotent
14Natural Join R ? S
- Consider relations (bags) R with attributes AR,
and S with attributes AS. Let A AR n AS. R ? S
can be defined as - pAR A, A, AS - A (sR.A1 S.A1 AND R.A2 S.A2
AND R.AnS.An (R X S)) - where A A1, A2, , An
- The above expression says select those tuples
in R X S that agree in values for each of the A
attributes, and project the resulting tuples such
that we have only one value for each A attribute.
15Natural Join example
R1
R2
R1 ? R2
A B
1 2
1 2
B C
2 3
4 5
4 5
A B C
1 2 3
1 2 3
16Theta Join R ?C S
- Theta Join is similar to cartesian product,
except that we can specify any condition C. It is
defined as - R ?C S (sC (R X S))
R1 ? R1.BltR2.BR2
R1
R2
A R1.B R2.B C
1 2 4 5
1 2 4 5
1 2 4 5
1 2 4 5
A B
1 2
1 2
B C
2 3
4 5
4 5
17Outer Join R ?o S
- Similar to natural join, however, if there is a
tuple in R, that has no matching tuple in S, or
a tuple in S that has no matching tuple in R,
then that tuple also appears, with null values
for attributes in S (or R).
R1 ?o R2
R1
R2
A B C D
1 2 3 10
1 2 3 11
4 5 6 null
7 8 9 null
null 6 7 12
A B C
1 2 3
4 5 6
7 8 9
B C D
2 3 10
2 3 11
6 7 12
18Left Outer Join R ?oLS
- Similar to natural join, however, if there is a
tuple in R, that has no matching tuple in S,
then that tuple also appears, with null values
for attributes in S (note a tuple in S that has
no matching tuple in R does not appear).
R1 ?oL R2
R1
R2
A B C D
1 2 3 10
1 2 3 11
4 5 6 null
7 8 9 null
A B C
1 2 3
4 5 6
7 8 9
B C D
2 3 10
2 3 11
6 7 12
19Right Outer Join R ?oRS
- Similar to natural join, however, if there is a
tuple in S, that has no matching tuple in R,
then that tuple also appears, with null values
for attributes in R (note a tuple in R that has
no matching tuple in S does not appear).
R1 ?oR R2
R1
R2
A B C D
1 2 3 10
1 2 3 11
null 6 7 12
A B C
1 2 3
4 5 6
7 8 9
B C D
2 3 10
2 3 11
6 7 12
20Renaming ?S(A1, A2, , An) (R)
- Rename relation R to S, attributes of R are
renamed to A1, A2, , An - ?S (R) renames relation R to S, keeping the
attributes same.
?S(X, C, D) (R2)
?S (R2)
R2
S
S
B C D
2 3 10
2 3 11
6 7 12
X C D
2 3 10
2 3 11
6 7 12
B C D
2 3 10
2 3 11
6 7 12
21Example Introducing new relations
Find the semijoin of 2 relations R, S. Semijoin
denoted R ? S is defined as the tuples in R, such
that for a tuple t1 in R, if there exists a tuple
t2 in S, and t1 and t2 agree in all attributes
common to R and S, then t1 appears in the result.
R1 R ? S R2 pAR (R1) R ? S R2 ? R
22Duplicate Elimination ? (R)
? (R)
R
A B
1 2
3 4
A B
1 2
3 4
1 2
1 2
23Extended Projection pL (R)
- Here L can be
- An attribute (just like simple projection)
- An expression x ? y, where x and y are names of
attributes, this renames attribute x to y. - An expression E ? z, where E is any expression
involving attributes, constants, and arithmetic
and string operators. This has an attribute
called z whose values are given by E.
pB?A, CD?X, C, D (R)
R
A X C D
2 13 3 10
2 14 3 11
6 19 7 12
B C D
2 3 10
2 3 11
6 7 12
24Aggregation operators
- MIN, MAX, COUNT, SUM, AVG
- AGGB (R) considers only non-null values of R.
SUMB (R)
COUNTB (R)
MINB (R)
R
SUMB (R)
9
MINB (R)
2
COUNTB (R)
3
A B
1 2
3 4
1 null
1 3
AVGB (R)
COUNT (R)
MAXB (R)
AVGB (R)
3
COUNT (R)
4
MAXB (R)
4
25Aggregation Operators
- MIN, MAX, SUM, AVG must be on any 1 attribute.
COUNT can be on any 1 attribute or COUNT (R) - An aggregation operator returns a bag, not a
single value ! But SQL allows treatment as a
single value.
sBMAXB (R) (R)
A B
3 4
26Grouping Operator ?GL, AL (R)
- ?GL, AL (R) groups all attributes in GL, and
performs the aggregation specified in AL.
? starName, MIN (year)?year, COUNT(title) ?num
(StarsIn)
starName year num
HF 77 3
KR 94 2
StarsIn
title year starName
SW1 77 HF
Matrix 99 KR
6D7N 93 HF
SW2 79 HF
Speed 94 KR
27Sorting Operator ?L (R)
- It sorts the tuples in R. If L is list A1, A2, ,
An, it first sorts by A1, then by A2, and so on. - Sort is used as a last operator in an expression.
?A,B (R)
R
A B C
1 2 5
3 1 6
1 2 7
1 3 8
A B C
1 2 5
1 2 7
1 3 8
3 1 6
28Relational Algebra Operators
- Set Operators Union, Intersection, Difference,
Cartesian Product - Select, Project
- Join Natural Join, Theta Join, (Left/Right)
Outer Join - Renaming, Duplicate Elimination
- Aggregation MIN, MAX, COUNT, SUM, AVG
- Grouping, Sorting