CS 728 Advanced Database Systems Chapter 15

About This Presentation

Title:

CS 728 Advanced Database Systems Chapter 15

Description:

CS 728 Advanced Database Systems Chapter 15 Database Design Theory: Normalization Algorithms Chapter Outline 0. Designing a Set of Relations 1. Properties of ... – PowerPoint PPT presentation

Number of Views:376

Avg rating:3.0/5.0

Slides: 63

Provided by: justEduJo

Category:

more less

Transcript and Presenter's Notes

Title: CS 728 Advanced Database Systems Chapter 15

1
CS 728 Advanced Database Systems Chapter 15

Database Design Theory Normalization Algorithms

2
Chapter Outline

0. Designing a Set of Relations
1. Properties of Relational Decompositions
2. Algorithms for Relational Database Schema
3. Multivalued Dependencies and 4th Normal Form
4. Join Dependencies and 5th Normal Form
5. Inclusion Dependencies
6. Other Dependencies and Normal Forms

3
Designing a Set of Relations (1)

The 1st approach is (Chapter 14) a Top-Down
Design (Relational Design by Analysis)
1. Designing a conceptual schema in a high-level
data model, such as the EER model
2. Mapping the conceptual schema into a set of
relations using mapping procedures.
3. Each of the relations is analyzed based on the
functional dependencies and assigned primary
keys, by applying the normalization procedure to
remove partial and transitive dependencies if any
remain.

4
Designing a Set of Relations (2)

The 2nd approach is (Chapter 15) a Bottom-Up
Design (Relational Design by Synthesis -
???????)
1. First constructs a minimal set of FDs
Assumes that all possible functional dependencies
are known.
2. a normalization algorithm is applied to
construct a target set of 3NF or BCNF relations.
start by one large relation schema, called the
universal relation, which includes all the
database attributes.
then repeatedly perform decomposition until it is
no longer feasible or no longer desirable, based
on the functional and other dependencies
specified by the database designer.

5
Designing a Set of Relations (3)

Additional criteria may be needed to ensure the
set of relations in a relational database are
satisfactory.
Two desirable properties of decompositions
The dependency preservation property and
The lossless (or nonadditive) join property

6
Designing a Set of Relations (4)

When we decompose a relation schema R with a set
of functional dependencies F into R1, R2, , Rn
we want
Dependency preservation
Otherwise, checking updates for violation of
functional dependencies may require computing
joins, which is expensive.
Nonadditive (Lossless) join decomposition
Otherwise decomposition would result in
information loss.
No redundancy
The relations Ri preferably should be in either
Boyce-Codd Normal Form or 3NF.

7
Designing a Set of Relations (5)

Example R (A, B, C) F A?B, B?C
Can be decomposed in two different ways
R1 (A, B), R2 (B, C)
Lossless-join decomposition
R1 ? R2 B and B ? BC
Dependency preserving
R1 (A, B), R2 (A, C)
Lossless-join decomposition
R1 ? R2 A and A ? AB
Not dependency preserving
cannot check B ? C without computing R1 R2

8
Properties of Relational Decompositions (1)

Relation Decomposition and Insufficiency of
Normal Forms
Universal Relation Schema
a relation schema R A1, A2, , An that
includes all the attributes of the database.
Universal relation assumption
every attribute name is unique.
Decomposition
The process of decomposing the universal relation
schema R into a set of relation schemas
D R1, R2, , Rm that will become the
relational database schema by using the
functional dependencies.

9
Properties of Relational Decompositions (2)

Relation Decomposition and Insufficiency of
Normal Forms
Attribute preservation condition
Each attribute in R will appear in at least one
relation schema Ri in the decomposition so that
no attributes are lost.
Another goal of decomposition is to have each
individual relation Ri in the decomposition D be
in BCNF or 3NF.
Additional properties of decomposition are
needed to prevent from generating spurious
(???????) tuples.

10
Properties of Relational Decompositions (3)

Example of spurious tuples.
Decomposition of R (A, B)
R1 (A) R2 (B)

A
B
A
B
? ? ?
1 2 1
? ?
1 2
?B(r)
?A(r)
r
A
B
?A(r) ?B(r)
? ? ? ?
1 2 1 2
11
Properties of Relational Decompositions (4)

Dependency Preservation Property of a
Decomposition
It would be useful if each functional dependency
X ? Y specified in F either
appeared directly in one of the relation schemas
in the decomposition D or
could be inferred from the dependencies that
appear in some Ri .
Informally, this is the
dependency preservation condition.

12
Properties of Relational Decompositions (5)

Dependency Preservation Property of a
Decomposition
We want to preserve the dependencies because each
dependency in F represents a constraint on the
database.
If one of the dependencies is not represented in
some individual relation of the decomposition, we
cannot enforce this constraint by dealing with an
individual relation instead,
we have to join two or more of the relations in
the decomposition and then check that the
functional dependency holds in the result of the
join operation.
This is clearly an inefficient and impractical
procedure.

13
Properties of Relational Decompositions (6)

Dependency Preservation Property of a
Decomposition
It is not necessary that the exact dependencies
specified in F appear themselves in individual
relations of the decomposition D.
It is sufficient that the union of the
dependencies that hold on the individual
relations in D be equivalent to F.
We now define these concepts more formally.

14
Properties of Relational Decompositions (7)

Dependency Preservation Property of a
Decomposition
Definition
Given a set of dependencies F on R, the
projection of F on Ri, denoted by ?Ri(F) where Ri
is a subset of R, is the set of dependencies X ?
Y in F such that the attributes in X ? Y are all
contained in Ri.
Hence, the projection of F on each relation
schema Ri in the decomposition D is the set of
functional dependencies in F, the closure of F,
such that all their left- and right-hand-side
attributes are in Ri.

15
Properties of Relational Decompositions (8)

Dependency Preservation Property of a
Decomposition
A decomposition D R1, R2, ..., Rm of R is
dependency-preserving with respect to F if the
union of the projections of F on each Ri in D is
equivalent to F that is,
((?R1(F)) ? ? (?Rm(F))) F
Claim 1 It is always possible to find a
dependency-preserving decomposition D with
respect to F such that each relation Ri in D is
in 3NF.

16
Properties of Relational Decompositions (9)

Consider a decomposition of R (R, F) into
R1 (R1, F1) and R2 (R2, F2)
How to compute the projections F1 and F2?
Fi is the projection of FDs in F over Ri
Example RABC and F A?B, B?C, C?A
Let R1AB and R2BC
Not enough to let F1 A?B and F2 B?C
Consider FDs in F B?A and C?B
So F1 A?B, B?A and F2 B?C, C?B
Now F and F1 ? F2 are equivalent

17
Properties of Relational Decompositions (10)

Lossless (Non-additive) Join Property of a
Decomposition
This property ensures that no spurious tuples are
generated when a NATURAL JOIN operation is
applied to the relations in the decomposition.
Lossless join property a decomposition D R1,
R2, ..., Rm of R has the lossless (nonadditive)
join property with respect to the set of
dependencies F on R if, for every relation state
r of R that satisfies F, the following holds,
where is the natural join of all the relations
in D
(?R1(r), ..., ?Rm(r)) r
r ? r1 r2 rm and
r1 r2 rm ? r

18
Properties of Relational Decompositions (11)

Consider
What happens if we decompose on
(Id, Name, Address) and
(C, Description, Grade)?
Spurious tuples will be generated

19
Properties of Relational Decompositions (12)

Lossy Decomposition
Problem Name is not a key

SSN Name Address SSN Name Name
Address 1111 Joe 1 Pine 1111 Joe
Joe 1 Pine 2222 Alice 2 Oak
2222 Alice Alice 2 Oak 3333 Alice
3 Pine 3333 Alice Alice 3 Pine
?
r1
r2
r
20
Properties of Relational Decompositions (13)

Lossless (Non-additive) Join Property of a
Decomposition
Note The word loss in lossless refers to loss of
information, not to loss of tuples. In fact, for
loss of information a better term is addition
of spurious information.
Algorithm 15.3 Testing for Lossless Join
Property
Input A universal relation R, a decomposition D
R1, R2, ..., Rm of R, and a set F of
functional dependencies.

21
Properties of Relational Decompositions (14)

1. Create an initial matrix S with one row i for
each relation Ri in D, and one column j for each
attribute Aj in R.
2. Set S(i, j) bij for all matrix entries.
// each bij is a symbol associated with indices
(i, j)
3. For each row i representing relation schema Ri
For each column j representing attribute Aj
if (relation Ri includes attribute Aj) then
S(i, j) aj
// each aj is a symbol associated with
index j

22
Properties of Relational Decompositions (15)

4. Repeat until a complete loop execution results
in no changes to S
for each functional dependency X ?Y in F
for all rows in S which have the same symbols
in the
columns corresponding to attributes
in X
make the symbols in each column that
correspond
to an attribute in Y be the same
in all these rows as
follows
If any of the rows has an a symbol
for the
column, set the other rows to that
same a
symbol in the column.

23
Properties of Relational Decompositions (16)

If no a symbol exists for the attribute in
any of the rows, choose one of the b symbols
that appear in one of the rows for the
attribute and set the other rows to that same
b symbol in the column
5. If a row is made up entirely of a symbols,
then the decomposition has the lossless join
property otherwise it does not.

24
Properties of Relational Decompositions (17)

Lossless (non-additive) join test for n-ary
decompositions.
(a) Case 1 Decomposition of EMP_PROJ into
EMP_PROJ1 and EMP_LOCS fails test.
(b) A decomposition of EMP_PROJ that has the
lossless join property.
(c) Case 2 Decomposition of EMP_PROJ into EMP,
PROJECT, and WORKS_ON satisfies test.

25
Properties of Relational Decompositions (18)
26
Properties of Relational Decompositions (19)
27
Properties of Relational Decompositions (20)

Binary Decomposition
decomposition of a relation R into two relations.
Non-additive (Lossless) Join Test for Binary
decompositions (NJB)
A decomposition D R1, R2 of R has the
lossless join property with respect to a set of
functional dependencies F on R if and only if
either
The FD ((R1 n R2) ? (R1- R2)) is in F, or
The FD ((R1 n R2) ? (R2 - R1)) is in F.

28
Properties of Relational Decompositions (21)

Intuition for Test for Losslessness
Suppose R1 ? R2 ? R2. Then a row of r1 can
combine with exactly one row of r2 in the
natural join (since in r2 a particular set of
values for the shared attributes defines a unique
row), i.e.,
R1 ? R2 is a superkey of R2

R1?R2 R1?R2 . a
a ... a b
. b c .
c r1 r2
29
Properties of Relational Decompositions (22)

Schema (R, F) where
R SSN, Name, Address, Hobby
F SSN ? Name, Address
can be decomposed into
R1 SSN, Name, Address
F1 SSN ? Name, Address
and
R2 SSN, Hobby
F2
Since R1 ? R2 SSN and
SSN ? R1- R2
SSN ? Name, Address is in F, then
the decomposition is lossless

30
Properties of Relational Decompositions (23)

Example WRT the FD set
Id ? Name, Address
C ? Description
Id, C ? Grade
Is
(Id, Name, Address) and
(Id, C, Description, Grade)
a lossless decomposition?

31
Properties of Relational Decompositions (24)

A relation scheme
Sname, Sadd, City, Zip, Item, Price
The FD set
Sname ? Sadd, City
Sadd, City ? Zip
Sname, Item ? Price
Consider the decomposition
Sname, Sadd, City, Zip and
Sname, Item, Price
Is it lossless?
Is it dependency preserving?
What if we replaced the first FD by
Sname, Sadd ? City?

32
Properties of Relational Decompositions (25)

The scheme
Student, Teacher, Subject
The FD set
Teacher ? Subject
Student, Subject ? Teacher
The decomposition
Student, Teacher and
Teacher, Subject
Is it lossless?
Is it dependency preserving?

33
Properties of Relational Decompositions (26)

Claim 2 (Preservation of non-additivity in
successive decompositions)
If a decomposition D R1, R2, ..., Rm of R has
the lossless (non-additive) join property with
respect to a set of functional dependencies F on
R, and
if a decomposition Di Q1, Q2, ..., Qk of Ri
has the lossless (non-additive) join property
with respect to the projection of F on Ri,
then the decomposition D2 R1, R2, ..., Ri-1,
Q1, Q2, ..., Qk, Ri1, ..., Rm of R has the
lossless (non-additive) join property with
respect to F.

34
Algorithms for RDB Schema Design (1)

Algorithm 15.4 Relational Synthesis into 3NF
with Dependency Preservation
Input A universal relation R and a set of
functional dependencies F on the attributes of R.
Find a minimal cover G for F
For each left-hand-side X of a functional
dependency that appears in G, create a relation
schema in D with attributes X ? A1 ? A2 ...
? Ak, where X ? A1, X ? A2, ..., X ? Ak are
the only dependencies in G with X as
left-hand-side (X is the key of this relation)
Place any remaining attributes (that have not
been placed in any relation) in a single relation
schema to ensure the attribute preservation
property.

35
Algorithms for RDB Schema Design (2)

A set of FDs F is minimal if it satisfies the
following conditions
Every FD in F is of the form X?A, where A is a
single attribute,
We cannot remove any dependency from F and have a
set of dependencies that is equivalent to F.
For no X?A in F is F-X?A equivalent to F.
We cannot replace any dependency X?A in F with a
dependency Y?A, where Y is a proper-subset of X
(Y subset-of X) and still have a set of
dependencies that is equivalent to F.
For no X?A in F and Y?X is F-X?A?Y?A
equivalent to F.

36
Algorithms for RDB Schema Design (3)

Examples
A?C, A?B is a minimal cover for AB?C, A?B
What about AB?C, B ? AB, D?BC?
Every set of FDs has an equivalent minimal set
There can be several equivalent minimal sets
There is no simple algorithm for computing a
minimal set of FDs that is equivalent to a set F
of FDs
To synthesize a set of relations, we assume that
we start with a set of dependencies that is a
minimal set.

37
Algorithms for RDB Schema Design (4)

Two sets of FDs F and G are equivalent if
Every FD in F can be inferred from G, and
Every FD in G can be inferred from F
Hence, F and G are equivalent if F G
Definition (Covers)
F covers G if every FD in G can be inferred from
F
(i.e., if G is subset-of F)
F and G are equivalent if F covers G and G covers
F
There is an algorithm for checking equivalence of
sets of FDs

38
Algorithms for RDB Schema Design (5)

Algorithm 15.2 Finding a minimal cover G for F
1. Set G F.
2. Replace each FD X?A1, A2, ..., Ak in G by
the n functional dependencies X?A1, X?A2 , ,
X?Ak.
3. For each FD X?A in G
For each attribute B that is an element of X
if ((G -X?A) ? (X-B)?A) is equivalent to G,
then replace X?A with (X-B)?A in G.
4. For each remaining FD X?A in G,
if (G-X?A) is equivalent to G, then remove X?A
from G.

39
Algorithms for RDB Schema Design (6)

Example
A?B, ABCD?E, EF?GH, ACDF?EG
Make RHS a single attribute
A?B, ABCD?E, EF?G, EF?H, ACDF?E, ACDF?G
Minimize LHS ACD?E instead of ABCD?E
Eliminate redundant FDs
Can ACDF?G be removed?
Can ACDF?E be removed?
Final answer A?B, ACD?E, EF?G, EF?H

40
Algorithms for RDB Schema Design (7)

Minimal Cover Exercise
Compute the minimal cover of the following set of
functional dependencies
ABC ? DE, BD ? DE, E ? CF, EG ? F
ABC ? D
ABC ? E //
BD ? D // reflexive
BD ? E
E ? C
E ? F
EG ? F // augmentation
The minimal cover is
ABC ? D, BD ? E, E ? C, E ? F

41
Algorithms for RDB Schema Design (8)

Example of Algorithm 15.4 (3NF Decomposition)
Consider
the relation R CSJDPQV
FDs F C?CSJDPQV, SD?P, JP?C,J?S
Find minimal cover
C?J, C?D, C?Q, C?V, SD?P, JP?C, J?S
New relations
R1CJDQV, R2JPC,
R3JS, R4SDP

42
Algorithms for RDB Schema Design (9)

Algorithm 15.5 Relational Decomposition into
BCNF with Lossless (non-additive) join property
Input A universal relation R and a set of
functional dependencies F on the attributes of R.
Set D R
While there is a relation schema Q in D that is
not in BCNF
do
choose a relation schema Q in D that is not in
BCNF
find a FD X?Y in Q that violates
BCNF
replace Q in D by two relation
schemas (Q-Y) (X?Y)

43
Algorithms for RDB Schema Design (10)

Example of Algorithm 15.5 (BCNF Decomposition)
R (branch-name, branch-city, assets,
customer-name, loan-number, amount)
F branch-name ? branch-city, assets
loan-number ? branch-name, amount
Key loan-number, customer-name
Decomposition
R1 (branch-name, branch-city, assets)
R2 (branch-name, customer-name, loan-number,
amount)
R3 (loan-number, branch-name, amount)
R4 (loan-number, customer-name)
Final decomposition R1, R3, R4

44
Algorithms for RDB Schema Design (11)

Example of Algorithm 15.5 (BCNF Decomposition)
R (A, B, C)F A ? B, B ? CKey A
R is not in BCNF
Decomposition R1 (A, B), R2 (B, C)
R1 and R2 in BCNF
Lossless-join decomposition
Dependency preserving

45
Algorithms for RDB Schema Design (12)

Algorithm 15.6 Relational Synthesis into 3NF with
Dependency Preservation and Lossless
(Non-Additive) Join Property
Input A universal relation R and a set of
functional dependencies F on the attributes of R.
Find a minimal cover G for F (Use Algorithm
15.2).
For each left-hand-side X of a functional
dependency that appears in G, create a relation
schema in D with attributes X?A1?A2...?
Ak, where X?A1, X?A2, ..., X?Ak are the only
dependencies in G with X as left-hand-side (X is
the key of this relation).
If none of the relation schemas in D contains a
key of R, then create one more relation schema in
D that contains attributes that form a key of R.
Eliminate redundant relations from the resulting
set of relations. A relation T is considered
redundant if T is a projection of another
relation S.

46
Algorithms for RDB Schema Design (13)

Algorithm 15.2a Finding a Key K for R Given a set
F of Functional Dependencies
Input A universal relation R and a set of
functional dependencies F on the attributes of R.
Set K R
For each attribute A in K
compute (K - A) with respect to F
If (K - A) contains all the attributes in R,
then set K K - A

47
Algorithms for RDB Schema Design (14)

Discussion of Normalization Algorithms
Problems
The database designer must first specify all the
relevant functional dependencies among the
database attributes.
These algorithms are not deterministic in
general.
It is not always possible to find a decomposition
into relation schemas that preserves dependencies
and allows each relation schema in the
decomposition to be in BCNF (instead of 3NF).

48
Algorithms for RDB Schema Design (15)
Table 15.1 Summary of some of the algorithms
discussed
49
Multivalued Dependencies and 4th Normal Form (1)

Beyond BCNF
CustService (State, SalesPerson, Delivery)

Is this BCNF?
50
Multivalued Dependencies and 4th Normal Form (2)

Everything is in the key -- must be BCNF
Still problems with duplication
Multivalued Dependencies

51
Multivalued Dependencies and 4th Normal Form (3)

At least three attributes (A, B, C)
A B and A C
B and C are independent of each other (they
really shouldnt be in the same table)

52
Multivalued Dependency and 4NF

Multivalued dependency (MVD)
Consequence of first normal form (1NF)

53
Multivalued Dependency and 4NF

Relations containing nontrivial MVDs
All-key relations
Fourth normal form (4NF)
Violated when a relation has undesirable
multivalued dependencies

54
Multivalued Dependencies and 4th Normal Form (4)

Definition
A multivalued dependency (MVD) X gtgt Y specified
on relation schema R, where X and Y are both
subsets of R, specifies the following constraint
on any relation state r of R
If two tuples t1 and t2 exist in r such that
t1X t2X, then two tuples t3 and t4 should
also exist in r with the following properties,
where we use Z to denote (R - (X ? Y))
t3X t4X t1X t2X
t3Y t1Y and t4Y t2Y.
t3Z t2Z and t4Z t1Z.
An MVD X gtgt Y in R is called a trivial MVD if
(a) Y is a subset of X, or
(b) X ? Y R

55
Multivalued Dependencies and 4th Normal Form (5)

4th Normal Form
BCNF with no multivalued dependencies
Create separate tables for each separate
functional dependency

56
Multivalued Dependencies and 4th Normal Form (6)

(a) The EMP relation with two MVDs ENAME gtgt
PNAME and ENAME gtgt DNAME. (b) Decomposing the
EMP relation into two 4NF relations EMP_PROJECTS
and EMP_DEPENDENTS.

57
Multivalued Dependencies and 4th Normal Form (7)
SalesForce (State, SalesPerson) Delivery
(State, Delivery)
58
Multivalued Dependencies and 4th Normal Form (8)

Inference Rules for Functional and Multivalued
Dependencies
IR1 (reflexive rule for FDs)
If X ? Y, then X gt Y.
IR2 (augmentation rule for FDs)
X gt Y ?? XZ gt YZ.
IR3 (transitive rule for FDs)
X gt Y, Y gtZ ?? X gt Z.
IR4 (complementation rule for MVDs)
X gtgt Y ?? X gtgt (R (X ? Y)).
IR5 (augmentation rule for MVDs)
If X gtgt Y and W ? Z then WX gtgt YZ.
IR6 (transitive rule for MVDs)
X gtgt Y, Y gtgt Z ?? X gtgt (Z - Y).
IR7 (replication rule for FD to MVD)
X gt Y ?? X gtgt Y.
IR8 (coalescence (???????) rule for FDs and
MVDs)
If X gtgt Y and there exists W with the properties
that (a) W ? Y is empty, (b) W gt Z, and (c) Y ?
Z, then X gt Z.

59
Multivalued Dependencies and 4th Normal Form (9)

A relation schema R is in 4NF with respect to a
set of dependencies F (that includes functional
dependencies and multivalued dependencies), at
least one of the following hold
X gtgt Y is trivial (i.e., Y ? X or X ? Y R)
X is a superkey for schema R
If a relation is in 4NF it is in BCNF

60
Multivalued Dependencies and 4NF (10)

Decomposing a relation state of EMP that is not
in 4NF. (a) EMP relation with additional tuples.
(b) Two corresponding 4NF relations EMP_PROJECTS
and EMP_DEPENDENTS.

61
Multivalued Dependencies and 4NF (11)

Algorithm 15.7 Relational decomposition into 4NF
relations with non-additive join property
Input A universal relation R and a set of
functional and multivalued dependencies F.
Set D R
While there is a relation schema Q in D that is
not in 4NF do
choose a relation schema Q in D that is not in
4NF
find a nontrivial MVD X gtgt Y in Q that
violates 4NF
replace Q in D by two relation schemas (Q - Y)
and (X ? Y)

62
Multivalued Dependencies and 4NF (12)