Title: CS 453: Database Systems
1CS 453 Database Systems
Chapter 4 Relational Normalization
2Chapter outline
- Introduction
- Limitations of E-R Designs (Redundancy,
Anomalies) - Schema Refinement (Decomposition)
- Relational Normalization Theory
- Functional Dependencies (FD)
- Armstrongs Axioms for FD
- Derived inference rules
- Entailment, Closure, Equivalence
- Normal Forms (1NF, 2NF, 3NF, BCNF)
- Decomposition
3Limitations of E-R Designs
- Provides a set of guidelines, does not result in
a unique database schema - Does not provide a way of evaluating alternative
schemas - Normalization theory provides a mechanism for
analyzing and refining the schema produced by an
E-R design
4Problem Redundancy
- Dependencies between attributes cause redundancy
- Eg. All addresses in the same town have the same
zip code
5Problem Example
ER Model
Relational Model
6Problem Anomalies
- Redundancy leads to anomalies
- Update anomaly A change in Address must be made
in several places - Deletion anomaly Suppose a person gives up all
hobbies. Do we - Set Hobby attribute to null? No, since Hobby is
part of key - Delete the entire row? No, since we lose other
information in the row - Insertion anomaly Hobby value must be supplied
for any inserted row since Hobby is part of key
7Solution Decomposition
- Solution use two relations to store Person
information - Person1 (SSN, Name, Address)
- Hobbies (SSN, Hobby)
- The decomposition is more general people with
hobbies can now be described - No update anomalies
- Name and address stored once
- A hobby can be separately supplied or deleted
8Normalization Theory
- Result of E-R analysis need further refinement
- Appropriate decomposition can solve problems
- The underlying theory is referred to as
normalization theory and is based on functional
dependencies (and other kinds, like multi-valued
dependencies)
9Functional Dependence
- Existence dependence The existence of B depends
on A - Functional dependence Bs value depends on As
value - EmpName is functionally dependent on EmpNo
- Given the EmpNo, I can one and only one value of
EmpName - Constraints on the set of legal relation
instances - Require that the value for a certain set of
attributes determines uniquely the value for
another set of attributes. - Functional dependence is a generalization of the
notion of a key
10Functional Dependencies
- Definition A functional dependency (FD) on a
relation schema R is a constraint X ? Y, where X
and Y are subsets of attributes of R - An FD X ? Y is satisfied in an instance r of R if
for every pair of tuples, t1 and t2 if t1 and
t2 agree on all attributes in X then they must
agree on all attributes in Y - Key constraint is a special kind of functional
dependency all attributes of relation occur on
the right-hand side of the FD - SSN ? SSN, Name, Address
11Functional Dependencies
- Let R be a relation schema ? ? R, ? ? R
- The functional dependency ? ? ?
- Definition
- If two tuples agree on the attributes
- then they must also agree on the attributes
- Formally
R ( A, B, C, D, E ) ? A, B, C ? C, D
A1 , A2 , An
B1 , B2 , Bm
12Functional Dependencies
- holds on R if and only if for any legal relation
r(R), whenever any two tuples t1 and t2 of r
agree on the attributes ?, they also agree on the
attributes ?. That is, - t1? t2? ? t1? t2?
- True for all tuple pairs
- True for all instances
13Examples
- EmpID Name, Phone, Position
- Position Phone
- but Phone Position
14In General
- To check A ? B, erase all other columns
- check if the remaining relation is many-one
(called functional in mathematics)
15Typical Examples of FDs
Product name ? price, manufacturerPerson
ssn ? name, ageCompany name ?
stockprice, president
- loan-number ? amountloan-number ?
branch-nameloan-number ? customer-name
?
Another example reverse of the fds above
16Alternative Definitions of Keys
- K is a superkey for relation schema R if and only
if K ? R - This is the uniqueness property of key
- K is a candidate key for R if and only if
- K ? R, and
- there is no ? ? K, ? ? R ?make sure key is
shortest possible (minimality)
17Use of Functional Dependencies
- We use functional dependencies to
- test relations to see if they are legal under a
given set of functional dependencies. If a
relation r is legal under a set F of functional
dependencies, we say that r satisfies F. - Specify constraints on the set of legal
relations we say that F holds on R if all legal
relations on R satisfy the set of functional
dependencies F.
A specific instance of a relation schema may
satisfy a functional dependency even if the
functional dependency does not hold on all legal
instances. For example, a specific instance of
Loan-schema may, by chance, satisfy loan-number ?
customer-name.
18Armstrongs Axioms for FDs
- This is the syntactic way of computing/testing
the various properties of FDs - Reflexivity If Y ? X, then X ? Y (trivial FD)
- Name, Address ? Name
- Augmentation If X ? Y, then X Z ? YZ
- If Town ? Zip then Town, Name ? Zip, Name
- Transitivity If X ? Y and Y ? Z, then X ? Z
19Derived inference rules
- These additional rules are not essential their
soundness can be proved using Armstrongs Axioms - Union if X Y and X Z, then X YZ
- Decomposition if X YZ, then X Y and X Z
- Pseudo-transitivity if X Y and WY Z, then WX
Z - Exercise Prove rules Decomposition and
Pseudo-transitivity using A.A
20Entailment, Closure, Equivalence
- Definition If F is a set of FDs on schema R and
f is another FD on R, then F entails f if every
instance r of R that satisfies every FD in F also
satisfies f - Ex F A ? B, B? C and f is A ? C
- If Streetaddr ? Town and Town ? Zip then
Streetaddr ? Zip - Definition The closure of F, denoted F, is the
set of all FDs entailed by F - Definition F and G are equivalent if F entails G
and G entails F
21Closure of a Set of Functional Dependencies
- Given a set of functional dependencies F, there
are certain other functional dependencies that
are logically implied by F. - The set of all functional dependencies logically
implied by F is the closure of F. - We denote the closure of F by F.
- We can find all of F by applying Armstrongs
Axioms - if ? ? ?, then ? ? ? (reflexivity)
- if ? ? ?, then ?? ? ?? (augmentation)
- if ? ? ? and ?? ?, then ? ? ? (transitivity)
- these rules are sound and complete
22Derived inference rules for F
- We can further simplify computation of F by
using the following additional rules. - If ? ? ? holds and ? ? ? holds, then ? ? ?? holds
(union) - If ? ? ?? holds, then ? ? ? holds and ? ? ? holds
(decomposition) - If ? ? ? holds and ?? ? ? holds, then ?? ? ?
holds (pseudo-transitivity) - The above rules can be inferred from Armstrongs
axioms. - E.g., ? ? ?, ?? ? ? (given)
- ?? ? ?? (by augmentation)
- ?? ? ? (by transitivity)
23Exercise
- Given loan-no? amount
- Does loan-no, branch-name ? amount
- Why???
- It is not covered by any of the above axioms, so
we must derive it - loan-no, branch-name ? loan-no (reflexivity)
- loan-no? amount (given)
- loan-no, branch-name ? amount (transitivity)
24Examples of Armstrongs Axioms
- We can find all of F by applying Armstrongs
Axioms - if ? ? ?, then ? ? ? (reflexivity)loan-no ?
loan-no loan-no, amount ? loan-noloan-no,
amount ? amount - if ? ? ?, then ?? ? ?? (augmentation)loan-no ?
amount (given)loan-no, branch-name ? amount,
branch-name - if ? ? ? and ?? ?, then ? ? ? (transitivity)loan-
no ? branch-name (given) branch-name ?
branch-city (given)loan-no ? branch-city
25Example
- R (A, B, C, G, H, I)F A ? B A ? C CG
? H CG ? I B ? H - some members of F
- A ? H
- by transitivity from A ? B and B ? H
- AG ? I
- by augmenting A ? C with G, to get AG ? CG
and then transitivity with CG ? I - CG ? HI
- from CG ? H and CG ? I union rule can be
inferred from - definition of functional dependencies, or
- Augmentation of CG ? I to infer CG ? CGI,
augmentation ofCG ? H to infer CGI ? HI, and
then transitivity
26Generating F
F AB? C
AB? BCD A? D AB? BD
AB? BCDE AB? CDE D? E
BCD ? BCDE
union
decomp
aug
trans
Aug,ref
Thus, AB? BD, AB ? BCD, AB ? BCDE, and AB ? CDE
are all elements of F
27Attribute Closure
- Calculating attribute closure leads to a more
efficient way of checking entailment - The attribute closure of a set of attributes, X,
with respect to a set of functional dependencies,
F, (denoted XF) is the set of all attributes,
A, such that X ? A - X F1 is not necessarily the same as X F2 if F1
? F2 - Attribute closure and entailment
- Algorithm Given a set of FDs, F, then X ? Y if
and only if XF ? Y
28Closure of Attribute Sets
- Define the closure of ? under F (denoted by ?)
as the set of attributes that are functionally
determined by ? under F ? ? ? is in F ? ? ?
?Given loan-noIf loan-no ? amountthen amount
is part of loan-no I.e., loan-no
loan-no,amount, If loan-no ? branch-namethen
branch-name is part of loan-no I.e., loan-no
loan-no,amount, branch-name If loan-no ?
customer-name then continue .Else stop
? is a set of attributes
29Example Computing Attribute Closure XF
X XF A A, D, E AB
A, B, C, D, E (Hence AB is
a key) B B D D, E
F AB ? C A ? D D ? E AC
? B
Is AB ? E entailed by F? Yes Is D? C
entailed by F? No Result XF allows us
to determine FDs of the form X ? Y entailed by F
30Computation of Attribute Closure XF
closure X // since X ?
XF repeat old closure if there is an
FD Z ? V in F such that Z ?
closure and V ? closure then closure
closure ? V until old closure If T ?
closure then X ? T is entailed by F
31Example Computation of Attribute Closure
Problem Compute the attribute closure of AB with
respect to the set of FDs
AB ? C (a) A ? D (b)
D ? E (c) AC ? B (d)
Solution
Initially closure AB Using (a) closure
ABC Using (b) closure ABCD Using (c)
closure ABCDE
32Normal Forms
- Each normal form is a set of conditions on a
schema that guarantees certain properties
(relating to redundancy and update anomalies) - First normal form (1NF) is the same as the
definition of relational model (relations sets
of tuples each tuple sequence of atomic
values) - Second normal form (2NF) a research lab
accident has no practical or theoretical value
wont discuss - The two commonly used normal forms are third
normal form (3NF) and Boyce-Codd normal form
(BCNF)
33First Normal Form 1NF
- Atomicity is actually a property of how the
elements of the domain are used. - E.g. Strings would normally be considered
indivisible - Suppose that students are given roll numbers
which are strings of the form CS0012 or EE1127 - If the first two characters are extracted to find
the department, the domain of roll numbers is not
atomic - Doing so is a bad idea leads to encoding of
information in application program rather than in
the database
34First Normal Form 1NF
- A relation is said normalized (or in first normal
form 1NF), if and only if it possesses a key and
that every value of column is atomic - Example The relation TYPE_OF_BOOK following is
not 1NF - Possible normalization
35Second Normal Form 2NF
- A relation is said in second normal form (2NF),
if and only if it is 1NF and that every column
non key depends elementarily on the primary key - Example Let's consider the relation R(A, B, C,
D) with the following functional dependencies - A ? B A, D ? C B ? C
- The graph of the functional dependencies is the
next one - Transitive closing ?
- Key of the relation ?
- Relation in second normal form?
36Second Normal Form 2NF
The relation is not 2NF because B that doesn't
belong at the key depends on a part of the key
that is A, D
R1(A, B) A ? B R2(A, D, C) A, D ? C R3(B, C)
B ? C
Decomposition in 3 relations 2NF
37Third Normal Form 3NF
- A relation is said in third normal form (3NF), if
and only if it is 2NF and that none of its
attributes non keys depends on another attribute
non key - Example Let's consider the relation R(A, B, C,
D) with the following functional dependencies - A ? B B ? C C ? D
- The graph of the functional dependencies is the
next one - Transitive closing ?
- Key of the relation ?
- Relation in third normal form?
38Third Normal Form 3NF
The relation is not 3NF because C depends on an
attribute non key (D also)
R1(A, B) A ? B R2(B, C) B ? C R3(C, D) C ? D
Decomposition in 3 relations 3NF
39Boyce-Codd Normal Form BCNF
- Let's consider relation Wines (Raw, Country,
Region) with the supposes functional dependences
- Region ? Country
- (Raw, Country) ? Region
- This relation is well in third normal form
because no attribute non key doesn't depends on a
part of the key or on an attribute non key.
However, one finds numerous redundancies
40Boyce-Codd Normal Form BCNF
- In order to eliminate these redundancies, Boyces
and Codds introduced a normal form that carries
their names (Boyce Codd Normal Form / BCNF) - A relation is in BCNF if and only if the only
elementary functional dependencies are those in
which a key determines an attribute -
- Relation Wines will be able to be decomposed in
two relations - Raw (Raw, Region)
- Regions (Region, Country)
- The functional dependence (Raw, Country) ? Region
is lost but it can be recomposed by joint. -
41Boyce-Codd Normal Form BCNF
- Definition A relation schema R is in BCNF if for
every FD X ? Y associated with R either - Y ? X (i.e., the FD is trivial) or
- X is a superkey of R
- Example Person1(SSN, Name, Address)
- The only FD is SSN ? Name, Address
- Since SSN is a key, Person1 is in BCNF
- (non) BCNF Examples Person (SSN, Name, Address,
Hobby) - The FD SSN ? Name, Address does not satisfy
requirements of BCNF - since the key is (SSN, Hobby)
42Comparison of BCNF and 3NF
- A relational schema R is in 3NF if for every FD
X ? Y associated with R either - Y ? X (i.e., the FD is trivial) or
- X is a superkey of R or
- Every A ? Y is part of some key of R
- 3NF is weaker than BCNF (every schema that is in
BCNF is also in 3NF)
BCNF conditions
43Redundancy
- Suppose R has a FD A ? B. If an instance has 2
rows with same value in A, they must also have
same value in B (gt redundancy, if the A-value
repeats twice) - If A is a superkey, there cannot be two rows with
same value of A - Hence, BCNF eliminates redundancy
SSN ? Name, Address SSN Name
Address Hobby 1111 Joe 123 Main
stamps 1111 Joe 123 Main coins
redundancy
44Decompositions
- Goal Eliminate redundancy by decomposing a
relation into several relations in a higher
normal form - Decomposition must be lossless it must be
possible to reconstruct the original relation
from the relations in the decomposition - We will see why
45Decomposition
- Schema R (R, F)
- R is a set of attributes
- F is a set of functional dependencies over R
- The decomposition of schema R is a collection of
schemas Ri (Ri, Fi) where - R ?i Ri for all i (no new attributes)
- Fi is a set of functional dependences involving
only attributes of Ri - F entails Fi for all i (no new FDs)
- The decomposition of an instance, r, of R is a
set of relations ri ?Ri(r) for all i
46Example Decomposition
Schema (R, F) where R SSN, Name,
Address, Hobby F SSN? Name, Address
can be decomposed into R1 SSN, Name,
Address F1 SSN ? Name, Address and
R2 SSN, Hobby F2
47Relational Database Design Review
- Two approaches to DB design
- 1) Design ER model, then translate to relation
schemes - 2) Put every attribute together in one relation,
identify all the functional dependencies, and
then decompose into 3NF at least - The first approach is more popular, but
relational theory helps formalizing some concepts
such as key (what does it mean by A key uniquely
identifies the tuples?) - Identifying the FDs is part of the DB design
process it helps you understand the requirements
better
48Relational Normalization Summary
- Normalization theory provides a mechanism for
analyzing and refining the schema produced by an
E-R design. - Result of E-R analysis need further refinement
- Eliminate redundancy by decomposing a relation
into several relations in a higher normal form - The underlying theory is referred to as
normalization theory and is based on functional
dependencies - ? ? ?, if and only if, ? t1 and t2 in r(R) t1?
t2? ? t1? t2? - Armstrongs Axioms for FD
- Derived inference rules
- Entailment, Closure, Equivalence
- Each normal form is a set of conditions on a
schema that guarantees certain properties
(relating to redundancy and update anomalies) - The two commonly used normal forms are 3NF and
BCNF - Every schema that is in BCNF is also in 3NF