Title: CS 728 Advanced Database Systems Chapter 15
1CS 728 Advanced Database Systems Chapter 15
- Database Design Theory Normalization Algorithms
2Chapter Outline
- 0. Designing a Set of Relations
- 1. Properties of Relational Decompositions
- 2. Algorithms for Relational Database Schema
- 3. Multivalued Dependencies and 4th Normal Form
- 4. Join Dependencies and 5th Normal Form
- 5. Inclusion Dependencies
- 6. Other Dependencies and Normal Forms
3 Designing a Set of Relations (1)
- The 1st approach is (Chapter 14) a Top-Down
Design (Relational Design by Analysis) - 1. Designing a conceptual schema in a high-level
data model, such as the EER model - 2. Mapping the conceptual schema into a set of
relations using mapping procedures. - 3. Each of the relations is analyzed based on the
functional dependencies and assigned primary
keys, by applying the normalization procedure to
remove partial and transitive dependencies if any
remain.
4 Designing a Set of Relations (2)
- The 2nd approach is (Chapter 15) a Bottom-Up
Design (Relational Design by Synthesis -
???????) - 1. First constructs a minimal set of FDs
- Assumes that all possible functional dependencies
are known. - 2. a normalization algorithm is applied to
construct a target set of 3NF or BCNF relations. - start by one large relation schema, called the
universal relation, which includes all the
database attributes. - then repeatedly perform decomposition until it is
no longer feasible or no longer desirable, based
on the functional and other dependencies
specified by the database designer.
5 Designing a Set of Relations (3)
- Additional criteria may be needed to ensure the
set of relations in a relational database are
satisfactory. - Two desirable properties of decompositions
- The dependency preservation property and
- The lossless (or nonadditive) join property
6 Designing a Set of Relations (4)
- When we decompose a relation schema R with a set
of functional dependencies F into R1, R2, , Rn
we want - Dependency preservation
- Otherwise, checking updates for violation of
functional dependencies may require computing
joins, which is expensive. - Nonadditive (Lossless) join decomposition
- Otherwise decomposition would result in
information loss. - No redundancy
- The relations Ri preferably should be in either
Boyce-Codd Normal Form or 3NF.
7 Designing a Set of Relations (5)
- Example R (A, B, C) F A?B, B?C
- Can be decomposed in two different ways
- R1 (A, B), R2 (B, C)
- Lossless-join decomposition
- R1 ? R2 B and B ? BC
- Dependency preserving
- R1 (A, B), R2 (A, C)
- Lossless-join decomposition
- R1 ? R2 A and A ? AB
- Not dependency preserving
- cannot check B ? C without computing R1 R2
8Properties of Relational Decompositions (1)
- Relation Decomposition and Insufficiency of
Normal Forms - Universal Relation Schema
- a relation schema R A1, A2, , An that
includes all the attributes of the database. - Universal relation assumption
- every attribute name is unique.
- Decomposition
- The process of decomposing the universal relation
schema R into a set of relation schemas - D R1, R2, , Rm that will become the
relational database schema by using the
functional dependencies.
9Properties of Relational Decompositions (2)
- Relation Decomposition and Insufficiency of
Normal Forms - Attribute preservation condition
- Each attribute in R will appear in at least one
relation schema Ri in the decomposition so that
no attributes are lost. - Another goal of decomposition is to have each
individual relation Ri in the decomposition D be
in BCNF or 3NF. - Additional properties of decomposition are
needed to prevent from generating spurious
(???????) tuples.
10Properties of Relational Decompositions (3)
- Example of spurious tuples.
- Decomposition of R (A, B)
- R1 (A) R2 (B)
A
B
A
B
? ? ?
1 2 1
? ?
1 2
?B(r)
?A(r)
r
A
B
?A(r) ?B(r)
? ? ? ?
1 2 1 2
11Properties of Relational Decompositions (4)
- Dependency Preservation Property of a
Decomposition - It would be useful if each functional dependency
X ? Y specified in F either - appeared directly in one of the relation schemas
in the decomposition D or - could be inferred from the dependencies that
appear in some Ri . - Informally, this is the
- dependency preservation condition.
12Properties of Relational Decompositions (5)
- Dependency Preservation Property of a
Decomposition - We want to preserve the dependencies because each
dependency in F represents a constraint on the
database. - If one of the dependencies is not represented in
some individual relation of the decomposition, we
cannot enforce this constraint by dealing with an
individual relation instead, - we have to join two or more of the relations in
the decomposition and then check that the
functional dependency holds in the result of the
join operation. - This is clearly an inefficient and impractical
procedure.
13Properties of Relational Decompositions (6)
- Dependency Preservation Property of a
Decomposition - It is not necessary that the exact dependencies
specified in F appear themselves in individual
relations of the decomposition D. - It is sufficient that the union of the
dependencies that hold on the individual
relations in D be equivalent to F. - We now define these concepts more formally.
14Properties of Relational Decompositions (7)
- Dependency Preservation Property of a
Decomposition - Definition
- Given a set of dependencies F on R, the
projection of F on Ri, denoted by ?Ri(F) where Ri
is a subset of R, is the set of dependencies X ?
Y in F such that the attributes in X ? Y are all
contained in Ri. - Hence, the projection of F on each relation
schema Ri in the decomposition D is the set of
functional dependencies in F, the closure of F,
such that all their left- and right-hand-side
attributes are in Ri.
15Properties of Relational Decompositions (8)
- Dependency Preservation Property of a
Decomposition - A decomposition D R1, R2, ..., Rm of R is
dependency-preserving with respect to F if the
union of the projections of F on each Ri in D is
equivalent to F that is, - ((?R1(F)) ? ? (?Rm(F))) F
- Claim 1 It is always possible to find a
dependency-preserving decomposition D with
respect to F such that each relation Ri in D is
in 3NF.
16Properties of Relational Decompositions (9)
- Consider a decomposition of R (R, F) into
- R1 (R1, F1) and R2 (R2, F2)
- How to compute the projections F1 and F2?
- Fi is the projection of FDs in F over Ri
- Example RABC and F A?B, B?C, C?A
- Let R1AB and R2BC
- Not enough to let F1 A?B and F2 B?C
- Consider FDs in F B?A and C?B
- So F1 A?B, B?A and F2 B?C, C?B
- Now F and F1 ? F2 are equivalent
17Properties of Relational Decompositions (10)
- Lossless (Non-additive) Join Property of a
Decomposition - This property ensures that no spurious tuples are
generated when a NATURAL JOIN operation is
applied to the relations in the decomposition. - Lossless join property a decomposition D R1,
R2, ..., Rm of R has the lossless (nonadditive)
join property with respect to the set of
dependencies F on R if, for every relation state
r of R that satisfies F, the following holds,
where is the natural join of all the relations
in D - (?R1(r), ..., ?Rm(r)) r
- r ? r1 r2 rm and
- r1 r2 rm ? r
18Properties of Relational Decompositions (11)
- Consider
- What happens if we decompose on
- (Id, Name, Address) and
- (C, Description, Grade)?
- Spurious tuples will be generated
19Properties of Relational Decompositions (12)
- Lossy Decomposition
- Problem Name is not a key
SSN Name Address SSN Name Name
Address 1111 Joe 1 Pine 1111 Joe
Joe 1 Pine 2222 Alice 2 Oak
2222 Alice Alice 2 Oak 3333 Alice
3 Pine 3333 Alice Alice 3 Pine
?
r1
r2
r
20Properties of Relational Decompositions (13)
- Lossless (Non-additive) Join Property of a
Decomposition - Note The word loss in lossless refers to loss of
information, not to loss of tuples. In fact, for
loss of information a better term is addition
of spurious information. - Algorithm 15.3 Testing for Lossless Join
Property - Input A universal relation R, a decomposition D
R1, R2, ..., Rm of R, and a set F of
functional dependencies.
21Properties of Relational Decompositions (14)
- 1. Create an initial matrix S with one row i for
each relation Ri in D, and one column j for each
attribute Aj in R. - 2. Set S(i, j) bij for all matrix entries.
- // each bij is a symbol associated with indices
(i, j) - 3. For each row i representing relation schema Ri
- For each column j representing attribute Aj
- if (relation Ri includes attribute Aj) then
S(i, j) aj - // each aj is a symbol associated with
index j
22Properties of Relational Decompositions (15)
- 4. Repeat until a complete loop execution results
in no changes to S - for each functional dependency X ?Y in F
- for all rows in S which have the same symbols
in the - columns corresponding to attributes
in X - make the symbols in each column that
correspond - to an attribute in Y be the same
in all these rows as - follows
- If any of the rows has an a symbol
for the - column, set the other rows to that
same a - symbol in the column.
-
23Properties of Relational Decompositions (16)
- If no a symbol exists for the attribute in
any of the rows, choose one of the b symbols
that appear in one of the rows for the
attribute and set the other rows to that same
b symbol in the column - 5. If a row is made up entirely of a symbols,
then the decomposition has the lossless join
property otherwise it does not.
24Properties of Relational Decompositions (17)
- Lossless (non-additive) join test for n-ary
decompositions. - (a) Case 1 Decomposition of EMP_PROJ into
EMP_PROJ1 and EMP_LOCS fails test. - (b) A decomposition of EMP_PROJ that has the
lossless join property. - (c) Case 2 Decomposition of EMP_PROJ into EMP,
PROJECT, and WORKS_ON satisfies test.
25Properties of Relational Decompositions (18)
26Properties of Relational Decompositions (19)
27Properties of Relational Decompositions (20)
- Binary Decomposition
- decomposition of a relation R into two relations.
- Non-additive (Lossless) Join Test for Binary
decompositions (NJB) - A decomposition D R1, R2 of R has the
lossless join property with respect to a set of
functional dependencies F on R if and only if
either - The FD ((R1 n R2) ? (R1- R2)) is in F, or
- The FD ((R1 n R2) ? (R2 - R1)) is in F.
28Properties of Relational Decompositions (21)
- Intuition for Test for Losslessness
- Suppose R1 ? R2 ? R2. Then a row of r1 can
combine with exactly one row of r2 in the
natural join (since in r2 a particular set of
values for the shared attributes defines a unique
row), i.e., - R1 ? R2 is a superkey of R2
R1?R2 R1?R2 . a
a ... a b
. b c .
c r1 r2
29Properties of Relational Decompositions (22)
- Schema (R, F) where
- R SSN, Name, Address, Hobby
- F SSN ? Name, Address
- can be decomposed into
- R1 SSN, Name, Address
- F1 SSN ? Name, Address
- and
- R2 SSN, Hobby
- F2
- Since R1 ? R2 SSN and
- SSN ? R1- R2
- SSN ? Name, Address is in F, then
- the decomposition is lossless
30Properties of Relational Decompositions (23)
- Example WRT the FD set
- Id ? Name, Address
- C ? Description
- Id, C ? Grade
- Is
- (Id, Name, Address) and
- (Id, C, Description, Grade)
- a lossless decomposition?
31Properties of Relational Decompositions (24)
- A relation scheme
- Sname, Sadd, City, Zip, Item, Price
- The FD set
- Sname ? Sadd, City
- Sadd, City ? Zip
- Sname, Item ? Price
- Consider the decomposition
- Sname, Sadd, City, Zip and
- Sname, Item, Price
- Is it lossless?
- Is it dependency preserving?
- What if we replaced the first FD by
- Sname, Sadd ? City?
32Properties of Relational Decompositions (25)
- The scheme
- Student, Teacher, Subject
- The FD set
- Teacher ? Subject
- Student, Subject ? Teacher
- The decomposition
- Student, Teacher and
- Teacher, Subject
- Is it lossless?
- Is it dependency preserving?
33Properties of Relational Decompositions (26)
- Claim 2 (Preservation of non-additivity in
successive decompositions) - If a decomposition D R1, R2, ..., Rm of R has
the lossless (non-additive) join property with
respect to a set of functional dependencies F on
R, and - if a decomposition Di Q1, Q2, ..., Qk of Ri
has the lossless (non-additive) join property
with respect to the projection of F on Ri, - then the decomposition D2 R1, R2, ..., Ri-1,
Q1, Q2, ..., Qk, Ri1, ..., Rm of R has the
lossless (non-additive) join property with
respect to F.
34Algorithms for RDB Schema Design (1)
- Algorithm 15.4 Relational Synthesis into 3NF
with Dependency Preservation - Input A universal relation R and a set of
functional dependencies F on the attributes of R. - Find a minimal cover G for F
- For each left-hand-side X of a functional
dependency that appears in G, create a relation
schema in D with attributes X ? A1 ? A2 ...
? Ak, where X ? A1, X ? A2, ..., X ? Ak are
the only dependencies in G with X as
left-hand-side (X is the key of this relation) - Place any remaining attributes (that have not
been placed in any relation) in a single relation
schema to ensure the attribute preservation
property.
35Algorithms for RDB Schema Design (2)
- A set of FDs F is minimal if it satisfies the
following conditions - Every FD in F is of the form X?A, where A is a
single attribute, - We cannot remove any dependency from F and have a
set of dependencies that is equivalent to F. - For no X?A in F is F-X?A equivalent to F.
- We cannot replace any dependency X?A in F with a
dependency Y?A, where Y is a proper-subset of X
(Y subset-of X) and still have a set of
dependencies that is equivalent to F. - For no X?A in F and Y?X is F-X?A?Y?A
equivalent to F.
36Algorithms for RDB Schema Design (3)
- Examples
- A?C, A?B is a minimal cover for AB?C, A?B
- What about AB?C, B ? AB, D?BC?
- Every set of FDs has an equivalent minimal set
- There can be several equivalent minimal sets
- There is no simple algorithm for computing a
minimal set of FDs that is equivalent to a set F
of FDs - To synthesize a set of relations, we assume that
we start with a set of dependencies that is a
minimal set.
37Algorithms for RDB Schema Design (4)
- Two sets of FDs F and G are equivalent if
- Every FD in F can be inferred from G, and
- Every FD in G can be inferred from F
- Hence, F and G are equivalent if F G
- Definition (Covers)
- F covers G if every FD in G can be inferred from
F - (i.e., if G is subset-of F)
- F and G are equivalent if F covers G and G covers
F - There is an algorithm for checking equivalence of
sets of FDs
38Algorithms for RDB Schema Design (5)
- Algorithm 15.2 Finding a minimal cover G for F
- 1. Set G F.
- 2. Replace each FD X?A1, A2, ..., Ak in G by
the n functional dependencies X?A1, X?A2 , ,
X?Ak. - 3. For each FD X?A in G
- For each attribute B that is an element of X
- if ((G -X?A) ? (X-B)?A) is equivalent to G,
- then replace X?A with (X-B)?A in G.
- 4. For each remaining FD X?A in G,
- if (G-X?A) is equivalent to G, then remove X?A
from G.
39Algorithms for RDB Schema Design (6)
- Example
- A?B, ABCD?E, EF?GH, ACDF?EG
- Make RHS a single attribute
- A?B, ABCD?E, EF?G, EF?H, ACDF?E, ACDF?G
- Minimize LHS ACD?E instead of ABCD?E
- Eliminate redundant FDs
- Can ACDF?G be removed?
- Can ACDF?E be removed?
- Final answer A?B, ACD?E, EF?G, EF?H
40Algorithms for RDB Schema Design (7)
- Minimal Cover Exercise
- Compute the minimal cover of the following set of
functional dependencies - ABC ? DE, BD ? DE, E ? CF, EG ? F
- ABC ? D
- ABC ? E //
- BD ? D // reflexive
- BD ? E
- E ? C
- E ? F
- EG ? F // augmentation
- The minimal cover is
- ABC ? D, BD ? E, E ? C, E ? F
41Algorithms for RDB Schema Design (8)
- Example of Algorithm 15.4 (3NF Decomposition)
- Consider
- the relation R CSJDPQV
- FDs F C?CSJDPQV, SD?P, JP?C,J?S
- Find minimal cover
- C?J, C?D, C?Q, C?V, SD?P, JP?C, J?S
- New relations
- R1CJDQV, R2JPC,
- R3JS, R4SDP
-
42Algorithms for RDB Schema Design (9)
- Algorithm 15.5 Relational Decomposition into
BCNF with Lossless (non-additive) join property - Input A universal relation R and a set of
functional dependencies F on the attributes of R. - Set D R
- While there is a relation schema Q in D that is
not in BCNF - do
- choose a relation schema Q in D that is not in
BCNF - find a FD X?Y in Q that violates
BCNF - replace Q in D by two relation
schemas (Q-Y) (X?Y) -
43Algorithms for RDB Schema Design (10)
- Example of Algorithm 15.5 (BCNF Decomposition)
- R (branch-name, branch-city, assets,
customer-name, loan-number, amount) - F branch-name ? branch-city, assets
- loan-number ? branch-name, amount
- Key loan-number, customer-name
- Decomposition
- R1 (branch-name, branch-city, assets)
- R2 (branch-name, customer-name, loan-number,
amount) - R3 (loan-number, branch-name, amount)
- R4 (loan-number, customer-name)
- Final decomposition R1, R3, R4
44Algorithms for RDB Schema Design (11)
- Example of Algorithm 15.5 (BCNF Decomposition)
- R (A, B, C)F A ? B, B ? CKey A
- R is not in BCNF
- Decomposition R1 (A, B), R2 (B, C)
- R1 and R2 in BCNF
- Lossless-join decomposition
- Dependency preserving
45Algorithms for RDB Schema Design (12)
- Algorithm 15.6 Relational Synthesis into 3NF with
Dependency Preservation and Lossless
(Non-Additive) Join Property - Input A universal relation R and a set of
functional dependencies F on the attributes of R. - Find a minimal cover G for F (Use Algorithm
15.2). - For each left-hand-side X of a functional
dependency that appears in G, create a relation
schema in D with attributes X?A1?A2...?
Ak, where X?A1, X?A2, ..., X?Ak are the only
dependencies in G with X as left-hand-side (X is
the key of this relation). - If none of the relation schemas in D contains a
key of R, then create one more relation schema in
D that contains attributes that form a key of R. - Eliminate redundant relations from the resulting
set of relations. A relation T is considered
redundant if T is a projection of another
relation S.
46Algorithms for RDB Schema Design (13)
- Algorithm 15.2a Finding a Key K for R Given a set
F of Functional Dependencies - Input A universal relation R and a set of
functional dependencies F on the attributes of R. - Set K R
- For each attribute A in K
- compute (K - A) with respect to F
- If (K - A) contains all the attributes in R,
- then set K K - A
47Algorithms for RDB Schema Design (14)
- Discussion of Normalization Algorithms
- Problems
- The database designer must first specify all the
relevant functional dependencies among the
database attributes. - These algorithms are not deterministic in
general. - It is not always possible to find a decomposition
into relation schemas that preserves dependencies
and allows each relation schema in the
decomposition to be in BCNF (instead of 3NF).
48Algorithms for RDB Schema Design (15)
Table 15.1 Summary of some of the algorithms
discussed
49Multivalued Dependencies and 4th Normal Form (1)
- Beyond BCNF
- CustService (State, SalesPerson, Delivery)
Is this BCNF?
50Multivalued Dependencies and 4th Normal Form (2)
- Everything is in the key -- must be BCNF
- Still problems with duplication
- Multivalued Dependencies
51Multivalued Dependencies and 4th Normal Form (3)
- At least three attributes (A, B, C)
- A B and A C
- B and C are independent of each other (they
really shouldnt be in the same table)
52Multivalued Dependency and 4NF
- Multivalued dependency (MVD)
- Consequence of first normal form (1NF)
53Multivalued Dependency and 4NF
- Relations containing nontrivial MVDs
- All-key relations
- Fourth normal form (4NF)
- Violated when a relation has undesirable
multivalued dependencies
54Multivalued Dependencies and 4th Normal Form (4)
- Definition
- A multivalued dependency (MVD) X gtgt Y specified
on relation schema R, where X and Y are both
subsets of R, specifies the following constraint
on any relation state r of R - If two tuples t1 and t2 exist in r such that
t1X t2X, then two tuples t3 and t4 should
also exist in r with the following properties,
where we use Z to denote (R - (X ? Y)) - t3X t4X t1X t2X
- t3Y t1Y and t4Y t2Y.
- t3Z t2Z and t4Z t1Z.
- An MVD X gtgt Y in R is called a trivial MVD if
- (a) Y is a subset of X, or
- (b) X ? Y R
55Multivalued Dependencies and 4th Normal Form (5)
- 4th Normal Form
- BCNF with no multivalued dependencies
- Create separate tables for each separate
functional dependency
56Multivalued Dependencies and 4th Normal Form (6)
- (a) The EMP relation with two MVDs ENAME gtgt
PNAME and ENAME gtgt DNAME. (b) Decomposing the
EMP relation into two 4NF relations EMP_PROJECTS
and EMP_DEPENDENTS.
57Multivalued Dependencies and 4th Normal Form (7)
SalesForce (State, SalesPerson) Delivery
(State, Delivery)
58Multivalued Dependencies and 4th Normal Form (8)
- Inference Rules for Functional and Multivalued
Dependencies - IR1 (reflexive rule for FDs)
- If X ? Y, then X gt Y.
- IR2 (augmentation rule for FDs)
- X gt Y ?? XZ gt YZ.
- IR3 (transitive rule for FDs)
- X gt Y, Y gtZ ?? X gt Z.
- IR4 (complementation rule for MVDs)
- X gtgt Y ?? X gtgt (R (X ? Y)).
- IR5 (augmentation rule for MVDs)
- If X gtgt Y and W ? Z then WX gtgt YZ.
- IR6 (transitive rule for MVDs)
- X gtgt Y, Y gtgt Z ?? X gtgt (Z - Y).
- IR7 (replication rule for FD to MVD)
- X gt Y ?? X gtgt Y.
- IR8 (coalescence (???????) rule for FDs and
MVDs) - If X gtgt Y and there exists W with the properties
that (a) W ? Y is empty, (b) W gt Z, and (c) Y ?
Z, then X gt Z.
59Multivalued Dependencies and 4th Normal Form (9)
- A relation schema R is in 4NF with respect to a
set of dependencies F (that includes functional
dependencies and multivalued dependencies), at
least one of the following hold - X gtgt Y is trivial (i.e., Y ? X or X ? Y R)
- X is a superkey for schema R
- If a relation is in 4NF it is in BCNF
60Multivalued Dependencies and 4NF (10)
- Decomposing a relation state of EMP that is not
in 4NF. (a) EMP relation with additional tuples.
(b) Two corresponding 4NF relations EMP_PROJECTS
and EMP_DEPENDENTS.
61Multivalued Dependencies and 4NF (11)
- Algorithm 15.7 Relational decomposition into 4NF
relations with non-additive join property - Input A universal relation R and a set of
functional and multivalued dependencies F. - Set D R
- While there is a relation schema Q in D that is
not in 4NF do - choose a relation schema Q in D that is not in
4NF - find a nontrivial MVD X gtgt Y in Q that
violates 4NF - replace Q in D by two relation schemas (Q - Y)
and (X ? Y) -
62Multivalued Dependencies and 4NF (12)
- R (A, B, C, G, H, I)
- F A gtgt B, B gtgt HI, CG gtgt H
- R is not in 4NF since A gtgt B and A is not a
superkey for R - Decomposition
- R1 (A, B) (R1 is in 4NF)
- R2 (A, C, G, H, I) (R2 is not in 4NF)
- R3 (C, G, H) (R3 is in 4NF)
- R4 (A, C, G, I) (R4 is not in 4NF)
- Since A gtgt B and B gtgt HI, A gtgt HI, A gtgt I
- R5 (A, I) (R5 is in 4NF)
- R6 (A, C, G) (R6 is in 4NF)