Title: Normalization
1Normalization
2Why Normalization?
- To remove potential redundancy in design
- Redundancy causes several anomalies insert,
delete and update - Redundancy wastes storage, and often slows down
query processing
3Insert Anomaly
Student
sNumber sName pNumber pName
s1 Dave p1 MM
s2 Greg p2 ER
Question Could we insert any professor ? Note
We cannot insert a professor who has no students.
Insert Anomaly We are not able to insert valid
value/(s)
4Delete Anomaly
Student
sNumber sName pNumber pName
s1 Dave p1 MM
s2 Greg p2 ER
Question Can we delete a student and keep a
professor info ? Note We cannot delete a student
that is the only student of a professor.
Delete Anomaly We are not able to perform a
delete without losing some valid information.
Note In both cases, minimum cardinality of
Professor in the corresponding ER schema is 0
5Update Anomaly
Student
sNumber sName pNumber pName
s1 Dave p1 MM
s2 Greg p1 MM
Question Can we simply update a professors
name ? Note To update the name of a professor,
we have to update in multiple tuples.
Update Anomaly To update a value, we have to
update multiple rows.
Update anomalies are due to redundancy.
Note the maximum cardinality of Professor in the
corresponding ER schema is
6Normalization
- Need a method to find dependencies between
attributes - Functional dependencies
- Need a method to remove such harmful
dependencies, when they exist - Relational decomposition
- Break R (A,B,C,D) into R1 (A, B) and R2 (B, C, D)
7Keys Revisited
- A key for a relation R (a1, a2, , an) is a set
of attributes, K, that together uniquely
determine the values for all attributes of R. - A key is minimal no subset of K is a key.
- A superkey may not be minimal
- A prime attribute an attribute that is part of a
key
8Keys Example
Student
sNumber sName address
1 Dave 144FL
2 Greg 320FL
Primary Key ltsNumbergt Candidate key
ltsNamegt Some superkeys ltsNumber, addressgt,
ltsNamegt, ltsNumbergt, ltsNumber, sNamegt,
ltsNumber, sName, addressgt Prime Attribute
sNumber, sName
9Functional Dependencies (FDs)
Student
sNumber group address
1 DB 144FL
2 AI 320FL
Suppose we have the FD group? address That is,
there is a function from group to
address Meaning For any two rows in the
Student relation with the same value for group,
the value for address must be same.
10FD and Keys
Student
sNumber group address
1 DB 144FL
2 AI 320FL
Primary Key ltsNumbergt FD group ? address
- Questions
- Does a key implies functional dependencies?
Which ones ? - Does a functional dependency imply keys ? Which
ones ?
Observation Any key (primary or candidate) or
superkey of a relation R functionally determines
all attributes of R.
11Properties of FDs
- Consider A, B, C, Z are sets of attributes
- Reflexive (trivial FD) if A ? B, then A ? B
- Transitive if A ? B, and B ? C, then A ? C
- Augmentation if A ? B, then AZ ? BZ
- Union if A ? B, A ? C, then A ? BC
- Decomposition if A ? BC, then A ? B, A ? C
- Note Sound and complete inference rules for FDs
12Inferring FDs
- Suppose we have
- a relation R (A, B, C) and
- functional dependencies A ? B, B ? C, C ? A
- Questions
- What is a key for R?
- Should we split R into multiple relations?
- We can infer A ? ABC, B ? ABC, C ? ABC.
- Hence A, B, C are all keys.
13Reasoning About FDs
- An FD f is implied by a set of FDs F if f
holds whenever all FDs in F hold. - Closure of F, denoted by F, is the set of all
FDs that are implied by F. - Computing closure F of a set of FDs can be
expensive. - Size of closure is exponential in attrs!
14Reasoning About FDs
- But given question
- Is X ? Y in closure of a set of FDs
F? - Fortunately, computing just attribute closure is
sufficient (and linear time complexity) - Compute attribute closure of X, denoted X, wrt
F - Set of all attributes A such that X ? A is in F
- Check if Y is in X . If yes, then X ? Y in
F.
15Reasoning About FDs (Contd.)
- Does F A B, B C, C D E
imply A E? - Question
- i.e, is A E in the closure F ?
- Equivalent Question
- Is E in the attribute closure ?
16Algorithm for Inference of FDs
- Computing the closure of set of attributes A1,
A2, , An - Let X A1, A2, , An
- If there exists a FD B1, B2, , Bm ? C,
such that every Bi ? X, then X X ? C - Repeat step 2 until no more attributes can be
added. - A1, A2, , An X
17Inferring FDs Example 1
- Consider R (A, B, C, D, E)
- with FDs A ?? B, B ? C, CD ? E
- Does A ? E? (Is A ? E in F ?)
- Rephrase as Is E in A ?
- Let us compute A
- A A, B, C
- Therefore, A ? E is false
18Inferring FDs Example 2
- Given R (A, B, C), and
- FDs A ? B, B ? C, C ? A
- What are possible keys for R ?
- Compute the closure of attributes
- A A, B, C
- B A, B, C
- C A, B, C
- So keys for R are ltAgt, ltBgt, ltCgt
19Decomposing Relations
StudentProf
FDs pNumber ? pName
20Decomposition
- Decomposition
- Must be Lossless (no spurious tuples)
21Decomposition Lossless Join
StudentProf
sNumber sName pNumber pName
s1 Dave p1 MM
s1 Dave p2 MM
s2 Greg p1 MM
s2 Greg p2 MM
Spurious Tuples
22Normalization
- Once decided, what is the algorithm for
(lossless) decomposing?
23Normalization Step Decompose
- Consider relation R with set of attributes AR.
Consider a FD A ? B
(such that no other attribute in
(AR A B) is functionally determined by A). - If A is not a superkey for R,
we may decompose R as - Create R with attributes (AR B)
- Create R with attributes A ? B
- Key for R A
- Foreign key R (A) references R (A)
24Example Decomposition Revisited
StudentProf
sNumber sName pNumber pName
s1 Dave p1 MM
s2 Greg p2 MM
FDs pNumber ? pName
Student
Professor
sNumber sName pNumber
s1 Dave p1
s2 Greg p2
pNumber pName
p1 MM
p2 MM
FOREIGN KEY Student (PNum) references Professor
(PNum)
25Normalization
- How do I decide if I need to further decompose?
- Once decided, what is the algorithm for
decomposing?