Title: RELATIONAL ALGEBRA
1RELATIONAL ALGEBRA
2FUNCTIONAL DEPENDENCIES (FD)
- A functional dependency (FD) is a constraint
between two sets of attributes. - A functional dependency denoted by X ? Y (read as
"Y is functionally dependent on X"). - The left-hand side of the FD is some times called
as the determinant and the right-hand side is
called dependent. - Example
- (1) SSN ? Name
- (2) PNumber ? PName, PLocation
- (3) SSN, PNumber ? Hours
3SupplierId ? City SupplierId, ProdId ? City
4FD Diagrams
FD1
FD2
FD1 SSN ? Name, BDate, Addr, DNumber FD2
DNumber ? DName, DMgrSSN
5Closure of a set of FDs
- Given some FDs, new FDs can often be inferred.
- The set of all FDs that are implied by a given
set F of FDs is called the closure of F and is
denoted by F . - Example
- Let F AB ? C, C ? B be a set of FDs
satisfied by R (A, B, C). Then, - F A ? A, B ? B, AB ? AC, AB ? BC, AB ?
ABC, etc.
6Armstrong's Axioms
- IR-1 Reflexivity If X ? Y, then X ? Y.
- (if X is a superset of Y or Y is a subset of X)
- IR-2 Augmentation If X ? Y, then XZ ? YZ.
- IR-3 Transitivity If X ? Y and Y ? Z,
- then X ? Z.
- IR-4 Decomposition If X ? YZ, then X ? Y
- and X ? Z.
- IR-5UnionIf X ? Y and X ? Z, then X? YZ.
- IR-6Pseudo-Transitivity
7Example
- Supposing we are given a relation R A, B, C, D,
E, F with a set of FDs as shown below - A ? BC, B ? E , CD ? EF
- Show that the FD AD ? F holds for R
- and is a member of the closure.
- (1) A ? BC CD ? EF Given
- (2) A ? C A ? B Decomposition of (1)
- (3) AD ? CD Augmentation of (2) by adding D
- (4) AD ? CD CD ? EF
- AD ? EF Transitivity of (3) and (1)
- (5) AD ? E AD ? F Decomposition of (5)
8Attribute closure, F
- To compute F , start with FDs in F Repeatedly
apply IR-1 to IR-6 until no new FD can be derived - Armstrong's Axioms do not produce any incorrect
FDs that are added to F . However, finding F
is too expensive the complexity grows
exponentially - The solution is to find the attribute closure of
X, denoted as X
9Algorithm to find X
- Algorithm Attribute_Closure()
-
- X X
- Repeat
- for each FD A ? B in F do
- if A ? X then X ? B
- // i. e. if A is in X , then add B to X
- until no change
- // until no more attributes are added to X
-
10Example
- Consider R (A, B, C) and a set of FDs
- F AB ? C, C ? B
- Using the Algorithm, we calculate the following
closure sets with respect to F - A A, (Reflexivity)
- B B, (Reflexivity)
- C C, B (C?B) because of FD-2
- AB ABC because of FD-1 add attribute
C - AC ACB because of AC ? AB (IR-2)
Augmentation attribute B - BC BC nothing can be added
- ABC ABC nothing can be added
11Decompositions
- The basic idea in decomposition is to split a
relation into smaller relation schemas. - We address the problem of redundancy to a large
extent. - We must ensure that when a relation is
decomposed, the integrity constraints are
maintained.
12Example
Branch Loan
13Lossy and Lossless Decomposition
- Lossless decomposition A relation R is said to
be a lossless decomposition into R1 and R2 iff.
the natural join of these two relations gives
back the original relation R. - Lossy decomposition The natural join of R1 and
R2 does not provide the original relation R, then
it is said to be a lossy decomposition. - Spurious tuples The effect of lossy
decomposition is that when R1 and R2 are joined,
some extra tuples will creep in. These extra
tuples are called as spurious tuples.
14Example
Rows shown with blue color background are called
spurious tuples.
15- Definition Let R be a relation schema. A
decomposition of R, denoted by D R1, R2, ...,
Rn, is a set of relation schemas such that R
R1 ? R2 ? ... ? Rn. - If R1, R2, ..., Rn is a decomposition of R and
r is an instance of R, then - r ? ?R1(r) ?R2(r) . . . ?Rn(r)
16Lossless Join Decomposition
- Theorem Let F be a set of FDs over R, and D
R1, R2 be a decomposition of R. D is a
lossless-join decomposition if and only if - R1 ? R2 ? R1 - R2 is in F or
- R1 ? R2 ? R2 - R1 is in F .
- The above FDs imply that the attributes common to
R1 and R2 must contain a key for either R1 or R2.
17Example
CollegeCode ? Rank CollegeCode ? City
CollegeRank
Lossy R1 CollegeCode, Rank R2 Rank, City
Lossless R1 CollegeCode, Rank R2
CollegeCode, City
18Candidate Keys
- Finding candidate keys
- Step-1 Draw the dependency graph of F. Each
vertex corresponds to an attribute. - Edges can be defined as follows
- A ? B becomes A B
- A ? BC becomes A B
- C
- AB ? C becomes A
- B C
- Step-2 Identify the set of vertices Vni that
have no incoming edges. - Step-3 Identify the set of vertices Voi that
have only incoming edges. - Step-4 A candidate key is a set of attributes
that - contains all attributes in Vni.
- contains no attributes in Voi.
- has no subset that is already a candidate key.
19Example - 1
- Consider R (A, B, C, G, H, I), and
- F A ? BC, CG ? HI, B ? H.
B
H
C
A
I
G
No Incoming Edges Vni A, G Only Incoming
Edges Voi H, I (AG) A, B, C, G, H, I,
AG is the only candidate key
20Example - 2
- Consider R (A, B, C, D, E, H), and
- F A ? B, AB ? E, BH ? C, C ? D, D ? A
A
B
C
D
E
H
Vni H and Voi E Candidate keys AH, BH,
CH, and DH
21Introduction
- What is relational algebra?
- Relational algebra is a collection of operations
that are used to manipulate the entire set of
relations. The output of any relational algebra
operation is always a relation. - What are the Operations?
- Set operations like union, intersection,
difference, and Cartesian product. - Selection, Projection, and Joining.
22Sample Relations
Employee
Department
Project
23The SELECT operation (?)
- The select operation retrieves a subset of tuples
in a relation that satisfy a selection condition.
- ?ltselection-conditiongt (Relation)
- Query-1 Find the employees whose salary is
greater than 10,000 rupees. - ?Salary gt 10000 (Employee)
- Query-2 Find the employees who work for
department 3 and whose salary is greater than
30,000 rupees. - Emp3 ? ?DNo 3 (Employee)
- Result ? ?Salary gt 30000 (Emp3)
- Or,
- Result ? ?DNo 3 and Salary gt 30000 (Employee)
24- One of the interesting properties of the
selection operation is that it is commutative.
Therefore, all the expressions shown below are
equivalent, - ?condition-1 (?condition-2 (R))
- ?(condition-1 AND condition-2) (R)
- ?condition-2 (?condition-1 (R))
25The Projection Operation (?)
- The projection operation is used to select only
few columns from a relation. - ?ltattributesgt (Relation)
- Query-5 List the name and salary of all the
employees. - ?Name, Salary (Employee)
- Query-6 Print the project name and their
locations. - ?PName, PLocation (Project)
- Query -7 Retrieve the Name and Salary of all
employees working for department 1. - Result ? ?Name, Salary (?DNo 1 (Employee))
26- Query-8 Find the name, address, and salary of
the employees who earn more than 25000 rupees. - ?Name, Salary, Addr (?Salary gt 25000 (Employee))
- Query-9 List the name and the location of the
projects not controlled by department 2. - ?PName, PLocation (?DNum ? 2) (Project))
27Union (?)
- The result of this operation, i.e. R ? S, is a
relation that includes all tuples that are either
in R or in S or in both. Duplicate tuples will
not appear in the output. - Union Compatibility Let R and S are two
relations with attributes (A1, A2, , An) and
(B1, B2, .., Bn) respectively. If R and S are to
be unioned, it should satisfy the following two
rules - Rule 1 The relations R and S must have the same
degree. That is, the number of attributes of R
and S must be same. - Rule 2 The domain of the ith attribute of R and
the domain of the ith attribute of S must be
same. - dom(Ai) dom(Bi), where, 1 ? i ? n.
28Example
- Query-10 Retrieve the SSN of all employees who
either work in department 3 or directly supervise
an employee who works in 3. - Emp_Dept3 ? ?SSN (?DNo 3 (Employee))
- Supervisors ? ?SuperSSN (?DNo 3 (Employee))
- Result ? Emp_Dept3 ? Supervisors
- Emp_Dept3 Supervisors
- 2222 4444
- 4444 null
- 5555
Result 2222 4444 5555 null
29Intersection (n)
- The expression R n S returns all tuples that
appear in both the relations R and S. - Query-12 Find the fist name and the last name of
people who are teachers as well as students.
Student Teacher
30Difference
- This operation, written as R S (set difference)
returns all tuples that are in R but not in S. - Query-13 Find the students who are not teachers.
- Result ? Student Teacher
- Susan, Joseph, Andy
- Query-14 Find the teachers who are not students.
- Result ? Teacher Student
- Nandagopal
31Cartesian Product ()
- The Cartesian product or cross-product is a
binary operation that is used to combine two
relations. Assuming R and S as relations with n
and m attributes respectively, the Cartesian
product, R S can be written as, - R (A1, A2, , An) S (B1, B2, , Bm)
- The result of the above set operation is,
- Q (A1, A2, , An, B1, B2, , Bm)
- Where,
- Degree (Q) n m
- count(Q) Number of tuples in R Number of
tuples in S.
32- Query-15 Find for each female employee, all the
names of her dependents. - Dependent
- ESSN DependentName Sex BDate Relationship
- 1111 Pradeep M 05-Jul-63 Brother
- 3333 Sonali F 15-Aug-85 Daughter
- 3333 Rahul M 01-Jan-80 Son
- 4444 Miruthula F 02-Aug-83 Spouse
- FemaleEmp ? ?SSN, Name (?Sex F (Employee))
- FemaleDeps ? FemaleEmp Dependent
- Result ? ?Name, DependentName (?SSN ESSN
(FemaleDeps))
33Join Operations
- Natural join ()
- Equijoin ( )
- Theta join (?)
- Outer join ( )
34Natural Join ()
- When we omit the condition during joining of two
relations, then it is called as natural join ().
When we use the word join, it invariably means
natural join. We shall formally define natural
join as - Let R and S are the two relations with the
attributes as shown below - R(X1, X2, .., Xm, Y1, Y2, , Yn) and
- S(Y1, Y2, , Yn , Z1, Z2, , ZP)
- Example
- Employee (SSN, Name, Addr, DNo)
- Department (DNo, DName)
35Theta Join (?)
- Let R and S are two relations. Consider an
attribute x in R, and an attribute y in S. The
theta join of these two relations can be written
as, - R S where x ? y
- where ? indicates a valid relational operator.
36Equijoin ( )
- The most widely used join operation is equijoin.
As discussed in the previous subsection, when ?
is , this type of ?-join (a special case) is
called equijoin. We shall use the notation to
denote equijoin. The general form for this kind
of join is, - R ltJoin_conditiongt S
- ltJoin_conditiongt must always have operator
37- Query-16 Retrieve the name and salary of the
manager of each department. - Managers ? Department MgrSSN SSN Employee
- Gnames ? ?Name, Salary (Managers)
- Query-17 Find which project(s) and location(s)
Pooja is working on. - EmpPooja ? ?Name 'Pooja' (Employee)
- PoojaPrj ? EmpPooja SSN ESSN WorksOn
- Result ? ? Name, PName, PLocation (PoojaPrj
WrksOn.PNo Project.PNumber Project)
38Outer Joins
- Dangling tuples in Join
- Usually, only a subset of tuples of each relation
will actually participate in a join, i.e. only
tuples that match with the joining attributes.
Tuples of a relation not participating in a join
are called dangling tuples. - How do we keep dangling tuples in the result of a
join? Use null values to indicate a no-join
situation. There are three types of outer joins - - left outer join
- - right outer join and
- - outer join (full outer join).
39- Left outer-join
- R1 R2 is similar to a natural join but
keep all dangling tuples of R1. - Right Outer-Join
- R1 R2 is similar to a natural join but
keep all dangling tuples of R2. - Outer Join (full outer-join)
- R1 R2 is similar to a natural join but
keep all dangling tuples of both R1 and R2. - The advantages of outer join is to take the
union of tuples from two relations that are not
union compatible.
40Example
Students Awards
Dangling tuples
Students Awards
41Division
- Consider two relations R and S. Assume that R has
only two attributes r1 and r2 and S has only one
attribute r2 with the same domain as in R. This
is to ensure that the degree of the numerator is
more than the degree of the denominator. Now we
shall define R / S as, - For each r1 value in R, consider the set of r2
values that appear in tuples of R with that r2
value. If this set contains S, the r1 is in the
result of R / S. - One possible restriction in this definition is
that every attribute in S should be in R. - Alternatively, we can define R / S as,
- For, R(a1, a2, ., an, b1, b2, ., bn) / S(b1,
b2, ., bn) and T ?a1, a2, ., an (R), return
the subset of T, say W, such that every tuple in
W S is in R. W is the largest subset of T,
such that, - (W S) ? R
42Example
Electives3
Electives1
Electives2
Options/Electives3
Options/Electives1
Options/Electives2
43GROUPING AND AGGREGATE FUNCTIONS
- We shall define Function formally using the
symbol ? (pronounced as script F ) as, - grouping_attributes ? aggregate_functions
(Relation) - The steps involved in the evaluation of this
function is, - Partition the relation into groups.
- Apply aggregate function to each group.
- Output group and aggregate values, one tuple per
group.
44- Query-18 Retrieve department number, number of
employees and their average salary. - Result (DNo, N, AvgSal) ? DNo ? Count(SSN),
Avg(Salary) (Employee) - What happens when grouping attribute is dropped?
- ? Count(SSN), Avg(Salary) (Employee)
45