Title: Lecture 13: Relational Decomposition and Relational Algebra
1Lecture 13Relational Decomposition and
Relational Algebra
2Summary of Previous Discussion
- FDs are given as part of the schema. You derived
keys from them. You can also specify keys
directly (theyre just another FD) - First, compute the keys and the FDs that hold on
the relation youre considering. - Then, look for violations of BCNF. There may be a
few. Choose one.
3Summary (continued)
- You may need to decompose the decomposed
relations - To decide that, you need to project the FDs on
the decomposed relations. - The end result may differ, depending on which
violation you chose to decompose by. Theyre all
correct. - Relations with 2 attributes are in BCNF.
4Summary (continued 2)
- If you have only a single FD representing a key,
then the relation is in BCNF. - But, you can still have another FD that forces
you to decompose. - This is all really pretty simple if you think
about it long enough.
5Boyce-Codd Normal Form
A simple condition for removing anomalies from
relations
A relation R is in BCNF if Whenever there
is a nontrivial dependency A1, ..., An ? B
in R , A1, ..., An is a key for R
In English (though a bit vague) Whenever a
set of attributes of R is determining another
attribute, should determine all the
attributes of R.
6Example Decomposition
Person(name, SSN, age, hairColor,
phoneNumber) SSN ? name, age, hairColor age ?
hairColor
Decompose in BCNF (in class) Step 1 find all
keys ssn, phoneNumber SSN, name, age,
hairColor SSN, phoneNumber Step 2 now decompose
7Other Example
- R(A,B,C,D) A B, B C
- Keys
- Violations of BCNF
8Correct Decompositions
- A decomposition is lossless if we can recover
- R(A,B,C)
- R1(A,B) R2(A,C)
- R(A,B,C) should be the same
as R(A,B,C)
Decompose
Recover
R is in general larger than R. Must ensure R
R
9Correct Decompositions
- Given R(A,B,C) s.t. A?B, the decomposition into
R1(A,B), R2(A,C) is lossless
103NF A Problem with BCNF
Unit Company
Product
FDs Unit ? Company Company, Product ?
Unit So, there is a BCNF violation, and we
decompose.
Unit Company
Unit ? Company
Unit Product
No FDs
11So Whats the Problem?
Unit Company
Unit Product
Galaga99 UW Galaga99
databases Bingo UW
Bingo databases
No problem so far. All local FDs are
satisfied. Lets put all the data back into a
single table again
Unit Company
Product
Galaga99 UW
databases Bingo UW
databases
Violates the dependency company, product -gt
unit!
12Solution 3rd Normal Form (3NF)
A simple condition for removing anomalies from
relations
A relation R is in 3rd normal form if Whenever
there is a nontrivial dependency A1, A2, ..., An
? Bfor R , then A1, A2, ..., An a super-key
for R, or B is part of a key.
13Relational Algebra
- Formalism for creating new relations from
existing ones - Its place in the big picture
Declartivequerylanguage
Algebra
Implementation
Relational algebraRelational bag algebra
SQL,relational calculus
14Relational Algebra
- Five operators
- Union ?
- Difference -
- Selection s
- Projection P
- Cartesian Product ?
- Derived or auxiliary operators
- Intersection, complement
- Joins (natural,equi-join, theta join, semi-join)
- Renaming r
151. Union and 2. Difference
- R1 ? R2
- Example
- ActiveEmployees ? RetiredEmployees
- R1 R2
- Example
- AllEmployees -- RetiredEmployees
16What about Intersection ?
- It is a derived operator
- R1 ? R2 R1 (R1 R2)
- Also expressed as a join (will see later)
- Example
- UnionizedEmployees ? RetiredEmployees
173. Selection
- Returns all tuples which satisfy a condition
- Notation sc(R)
- Examples
- sSalary gt 40000 (Employee)
- sname Smithh (Employee)
- The condition c can be , lt, ?, gt, ?, ltgt
18Find all employees with salary more than
40,000. s Salary gt 40000 (Employee)
194. Projection
- Eliminates columns, then removes duplicates
- Notation P A1,,An (R)
- Example project social-security number and
names - P SSN, Name (Employee)
- Output schema Answer(SSN, Name)
20P SSN, Name (Employee)
215. Cartesian Product
- Each tuple in R1 with each tuple in R2
- Notation R1 ? R2
- Example
- Employee ? Dependents
- Very rare in practice mainly used to express
joins
22(No Transcript)
23Relational Algebra
- Five operators
- Union ?
- Difference -
- Selection s
- Projection P
- Cartesian Product ?
- Derived or auxiliary operators
- Intersection, complement
- Joins (natural,equi-join, theta join, semi-join)
- Renaming r
24Renaming
- Changes the schema, not the instance
- Notation r B1,,Bn (R)
- Example
- rLastName, SocSocNo (Employee)
- Output schema Answer(LastName, SocSocNo)
25Renaming Example
Employee
Name
SSN
John
999999999
Tony
777777777
- LastName, SocSocNo (Employee)
LastName
SocSocNo
John
999999999
Tony
777777777
26Natural Join
- Notation R1 ? R2
- Meaning R1 ? R2 PA(sC(R1 ? R2))
- Where
- The selection sC checks equality of all common
attributes - The projection eliminates the duplicate common
attributes
27Natural Join Example
Employee
Name
SSN
John
999999999
Tony
777777777
Dependents
SSN
Dname
999999999
Emily
777777777
Joe
Name
SSN
Dname
John
999999999
Emily
Tony
777777777
Joe
28Natural Join
A B
X Y
X Z
Y Z
Z V
B C
Z U
V W
Z V
A B C
X Z U
X Z V
Y Z U
Y Z V
Z V W
29Natural Join
- Given the schemas R(A, B, C, D), S(A, C, E), what
is the schema of R ? S ? - Given R(A, B, C), S(D, E), what is R ? S ?
- Given R(A, B), S(A, B), what is R ? S ?
30Theta Join
- A join that involves a predicate
- R1 ? q R2 s q (R1 ? R2)
- Here q can be any condition
31Eq-join
- A theta join where q is an equality
- R1 ?AB R2 s AB (R1 ? R2)
- Example
- Employee ?SSNSSN Dependents
- Most useful join in practice
32Semijoin
- R ? S P A1,,An (R ? S)
- Where A1, , An are the attributes in R
- Example
- Employee ? Dependents
33Semijoins in Distributed Databases
- Semijoins are used in distributed databases
Dependents
Employee
SSN Dname Age
. . . . . .
SSN Name
. . . . . .
network
Employee ?ssnssn (s agegt71 (Dependents))
T P SSN s agegt71 (Dependents)
R Employee ? T
Answer R ? Dependents
34Complex RA Expressions
P name
P pid
P ssn
snamefred
snamegizmo
- Person Purchase Person
Product
35Operations on Bags
- A bag a set with repeated elements
- All operations need to be defined carefully on
bags - a,b,b,c?a,b,b,b,e,f,fa,a,b,b,b,b,b,c,e,f,f
- a,b,b,b,c,c b,c,c,c,d a,b,b,d
- sC(R) preserve the number of occurrences
- PA(R) no duplicate elimination
- Cartesian product, join no duplicate elimination
- Important ! Relational Engines work on bags, not
sets !
Reading assignment 5.3 5.4
36Finally RA has Limitations !
- Cannot compute transitive closure
- Find all direct and indirect relatives of Fred
- Cannot express in RA !!! Need to write C program
Name1 Name2 Relationship
Fred Mary Father
Mary Joe Cousin
Mary Bill Spouse
Nancy Lou Sister