Normalisation

About This Presentation

Title:

Normalisation

Description:

... of some relvar R, and R1 and R2 between them include all of the attributes of R, ... insert the fact that a supplier is located in a particular city until supplier ... – PowerPoint PPT presentation

Number of Views:35

Avg rating:3.0/5.0

Slides: 39

Provided by: liup

Category:

more less

Transcript and Presenter's Notes

Title: Normalisation

1
Normalisation

LIU Peng, Ph.D/Prof.
School of Information Management and Engineering
Shanghai University of Finance and Economics

2
Normalisation

What you will learn from this lecture
Why normalisation?
Decomposition of a relation
Normal Forms 1NF, 2NF and 3NF
Dependency Preservation
Boyce/Codd normal form
Methods for normalising relations
General procedure for normalisation

3
Why normalisation?

What is wrong with this design? --Redundancy
leads to several further problems
A good design principle is one fact in one
place
subject of normalization
a formalization of simple ideas like this one
practical application

Sample value for relvar STUDENT
4
Why normalisation?

What is wrong with this design? --Redundancy
leads to several further problems
A good design principle is one fact in one
place
subject of normalization
a formalization of simple ideas like this one
practical application

Sample value for relvar STUDENT
5
Normal forms

The process of normalization is build around the
concept of normal forms.
Many normal form have been defined (see Fig)

the database designer should generally aim for a
design involving relvars in 3NF, not ones that
are merely in 2NF or 1NF.
"more desirable" (in a sense to be explained)

6
Normalisation procedure

Codd introduce the idea of normalisation
procedure
Normalisation is a progressive process of
reducing a set of relations to a more desirable
form (i.e. containing less redundancy)
The method of normalisation is based on a
progression through five increasingly desirable
levels of forms (concerning some aspects of good
design)
Note the procedure is reversible
input ? output
output ? input
Reversible is important, it means
the normalization process is information-preservin
g

7
Normalisation procedure

One crucial aspect of that procedure
the concept of nonloss decomposition
the only decompositions we are interested in
The question of whether a given decomposition is
nonloss is intimately bound up with the concept
of function dependence.
We start with all our data in a relation
Decompose this relation into a set of
well-designed relations
Non-loss decomposition ensures that no
information represented in the original relation
is lost

8
Decomposition
Two decomposition here (a) S-1
S-2a

How can we ensure that no information represented
in the original relation is lost?
e.g. S

(b) S-1 S-2b
9
Decomposition

What exactly is it here that makes the first
decomposition nonloss and the other lossy?
observe
the process is really a process of projection
S-1,S-2a,S-2b are each projections of the
original relvar S (see the last slide)
the recomposition operator is join.
Case (a) S-1 S-2a ? S ( see the following
slides)
Case (b) S-1 S-2b ? S
"reversibility" means that the original relvar is
equal to the join of its projections.

/
10
Decomposition

Case (a)
S
S-1 S-2a

11
Decomposition

Case (b)
S
S-1 S-2b

12
Decomposition

the interesting question is this
If R1 and R2 are projections of some relvar R,
and R1 and R2 between them include all of the
attributes of R,
what conditions have to be satisfied in order to
guarantee that joining R1 and R2 back together
takes us back to the original R?
Returning to our example
relvar S satisfies the irreducible set of FDs
Name?Major, Sex,Major?School
it surely cannot be coincidence that
S is equal to the join of its projections on
Name, Sex, Major and Major, School
in all time, Case (a) is correct

13
Decomposition

Heaths theorem
Given R with sets of attributes A, B and C, then
if R satisfies the FD
A ? B
then R is equal to the join of its projections on
A, Band A, C.
This theorem confirms what we have already
observed
e.g. (take A as Major, B as School, C as
Name,Sex)
Given SName,Sex,Major,School, S is equal to the
join of
Name,Sex,Major and Major,School -- non-loss
decomposition
not equal to the join of Name,Sex, Major and
Sex,School

14
Observation on Decomposition
Aim remove redundancy, but preserve FDs The
aim can be represented graphically, i.e. FDs can
be represented by arrows in diagrams e.g.
SName,Major,School with FDs
Name?Major ,School Major ? School Aim
remove arrows which are not out of candidate
key Normalisation is a procedure for achieving
this aim
Name
Major
School
FD (functional dependency) diagram for
SName,Major,School
15
First Normal Form (1NF)
A relation is in 1NF if and only if all
underlying fields contain atomic (scalar) values
only.
A field contains multiple values
Course No.
Course Title
Student
Dept
C00061
DB 1
Peter Brown,
Jenny Wong,
Neil Smith
Accounts
Repeating groups
Course No.
Course Title
Student 1
Student 1 dept
Student 2
Student 2 dept
C00061
DB 1
Peter Brown
Accounts
Neil Smith
Accounts
C00023
Java
Peter Brown
Computer
Jenny Wong
IT
16
First Normal Form (1NF)
Another example here
17
First Normal Form (1NF)
18
Problems with 1NF
Primary Key (Supplier, Part)
Status
City
Part
Quantity
Supplier
20
London
P1
1600
S1
20
London
P2
2000
S1
S2
10
Paris
P2
500
S3
20
London
P3
3000
Insert - cannot insert the fact that a supplier
is located in a particular city until supplier
supplies at least one part Delete - delete the
value P3, and we also lose the information that
S3 is located in London Update - S1 appears in
the table more than once, and we must change all
of them ---Possibility of producing an
inconsistent result.
19
Dependencies between Attributes
By analysing a given relation, we can identify
some dependencies between its attributes. In
principle, we want to make attributes independent
of each other as far as possible so that updating
one attribute will have no impact on the others.
For example
Supplier
City
Quantity
Part
Status
20
Second Normal Form (2NF)
A relation is in 2NF if and only if it is 1NF and
every non-key field is irreducibly dependent on
the primary key.
Primary Key (Supplier, Part)
Status
City
Part
Quantity
Supplier
20
London
P1
1600
S1
20
London
P2
2000
S1
S2
10
Paris
P2
500
S3
20
London
P3
3000
Note that City is a non-key field and only
dependent on part of the primary key (i.e.
supplier). So this relation is not in 2NF.
21
Solution to Problems
The solution is to replace the relation by two
new relations (called One and Two)
One Primary Key (Supplier)
Two Primary Key (Supplier, Part)
Status
City
Supplier
Part
Quantity
Supplier
20
London
S1
P1
1600
S1
10
Paris
S2
P2
2000
S1
S3
20
London
S2
P2
500
S3
P3
3000
This revised structure overcomes all the problems
sketched earlier
22
Procedure to Get 2NF

Given a relation R as follows R(A, B, C, D)
with Primary Key (A, B) FD A ? D Replace R by
the two projections R1 and R2 as follows R1(A,
D) with Primary Key A R2(A, B, C) with Primary
Key (A, B) Foreign Key A references R1
23
Problems with 2NF
One Primary Key (Supplier)
Two Primary Key (Supplier, Part)
Status
City
Supplier
Part
Quantity
Supplier
20
London
S1
P1
1600
S1
10
Paris
S2
P2
2000
S1
S3
20
London
S2
P2
500
S3
P3
3000
This revised structure overcomes all the problems
sketched earlier, but Insert - cannot insert the
fact that a particular city has a status until a
supplier is actually located in
that city Delete - delete the value S2, and we
also lose the information that Paris has the
status value 10 Update - Status value 20 appears
in the table more than once
24
Third Normal Form (3NF)
A relation is in 3NF if and only if it is 2NF and
all non-key fields are mutually independent.
Two Primary Key (Supplier, Part)
One Primary Key (Supplier)
Part
Quantity
Supplier
Status
City
Supplier
P1
1600
S1
20
London
S1
P2
2000
S1
10
Paris
S2
S2
P2
500
S3
20
London
S3
P3
3000
Note that relation Two is in 3NF (with only one
non-key field), but relation One is not in 3NF
because Status is dependent on City (both are
non-key fields)
25
Converting 2NF into 3NF
We can now normalise relation One into two new
relations (called three and four), which both are
in 3NF.
Three Primary Key (Supplier)
Four Primary Key (City)
City
Supplier
Status
City
London
S1
20
London

Paris
S2

10
Paris
S3
London
Higher Normal Forms (e.g. 4NF and 5NF) do exist,
but they are mainly of interest in academic
societies rather than in the practical
applications of database design.
26
Procedure to Get 3NF

Given a relation R as follows R(A, B, C) with
Primary Key A FD B ? C Replace R by the two
projections R1 and R2 as follows R1(B, C) with
Primary Key B R2(A, B) with Primary Key A
Foreign Key B references R1
27
Dependency Preservation

During the reduction process, a given relvar can
be nonloss-decomposed in a variety of different
ways.(refer to Fig.11.11)

Two 3NF projections A SC(S,CITY)
CS(CITY,STATUS) B SC(S,CITY)
SS(S,STATUS) Decomposition B is less
satisfactory than decomposition A for a number of
reasons. See page 364.
Fig.11.11
28
Dependency Preservation

We expect that the projections are independent of
one another, in the following sense Update can
be made to either one without regard for the
other.
In the process of decomposing, no FDs should
span two relvars. enforcing these constraints
is very difficulty.
Projections R1 and R2 of relvar R are independent
if and only if
Every FD in R is a logical consequence of those
in R1 and R2, and
The common attributes of R1 and R2 form a
candidate key for at least one of the pair.

29
Dependency Preservation

Consider decompositions A and B as defined
earlier.
In A the two projections are independent
In B, by contrast, the two projections are not
independent
CITY ? STATUS cannot be deduced from the FDs for
those projectionsalthough it is true that their
common attribute, S, does constitute a candidate
key for both.
Note The third possibility, replacing SECOND by
its two projections on S,STATUS and
CITY,STATUS, is not a valid decomposition,
because it is not nonloss.
Exercise Prove this statement.

30
Boyce/Codd Normal Form
A more general form of 3NF in which relations
may have more than one candidate key No
reference to 1NF or 2NF, with a simpler
definition Boyce/Codd Normal Form (BCNF) a
relation is in BCNF if and only if every
non-trivial, left-irreducible FD has a candidate
key as its determinant In other words the only
arrows in the dependency diagram are arrows out
of candidate keys AND no other arrows Look at
exercise 11.3 in Date book the answer gives an
algorithm for reducing a 1NF relation into a set
of BCNF relations
31
3NF vs. BCNF
3NF and BCNF are equivalent if the combination of
the following conditions does not occur for a
relation that 1. had two (or more) candidate
keys, such that 2. the two keys were composite,
and 3. they overlapped (i.e. at least one
attribute in common)
S
CITY
e.g. S (S, SNAME, STATUS, CITY) CANDIDATE
KEY (S) CANDIDATE KEY (SNAME)
STATUS
SNAME
32
3NF But Not BCNF
e.g. SSP (S, SNAME, P, QTY) CANDIDATE
KEY (S, P) CANDIDATE KEY (SNAME, P)
S
SNAME
P
QTY
S1 S1 S1 S1
Smith Smith Smith Smith
P1 P2 P3 P4
300 200 400 200
S
P
QTY
SNAME
S, P ? SNAME But SNAME is not
irreducibly dependent on S, P because S ?
SNAME
33
More about Normal Forms

Are Normal Forms a wholly good thing?
Remove a number of serious problems resulting
from redundant information
Can be achieved by following a general procedure
Decomposition may lead to poor performance (too
many small tables)
Decomposition may make it easy to break semantic
constraints although constraints of referential
integrity could help

34
Procedure for Normalisation

1) take projections of 1NF relation to eliminate
reducible FDs -- giving a set of 2NF relations
2) take projections of these 2NF relations to
eliminate transitive FDs -- giving a set of 3NF
relations
3) take projections of these 3NF relations to
eliminate FDs in which the determinant is not a
candidate key -- giving a set of BCNF relations
4) take projections of these BCNF relations to
eliminate MVDs which are not also FDs -- giving a
set of 4NF relations
5) take projections of these 4NF relations to
eliminate JDs which are not implied by candidate
keys -- giving a set of 5NF relations

35
Exercises

For each of the following relation schemas and
sets of functional dependencies
R(A,B,C,D) with FDs AB? C, C? D, and D? A
R(A,B,C,D) with FDs B? C, and B? D
R(A,B,C,D) with FDs AB? C, BC? D, CD? A and AD?
B
R(A,B,C,D) with FDs A? B, B?C, C? D and D? A
R(A,B,C,D,E) with FDs AB? C, DE? C, and B? D
R(A,B,C,D,E) with FDs AB? C, C? D, D? B and D? E
Do the following
Indicate all the 2NF/3NF/BCNF violations
Decompose the relations, as necessary, into
collections of relations that are in 2NF/3NF/BCNF.

36
Solutions for Exercise 1

There are 14 nontrivial dependencies, including
the three given ones and 11 derived dependencies.
These are C?A, C ? D, D ? A, AB ? D, AB ? C, AC
? D, BC ? A, BC ? D, BD?A, BD?C, CD?A, ABC?D,
ABD?C, and BCD?A. We also learned that the three
keys were AB, BC, and BD. Thus, any dependency
above that does not have one of these pairs on
the left is a BCNF violation. These are C?A,
C?D, D?A, AC?D, and CD?A.
One choice is to decompose using C?D. That gives
us ABC and CD as decomposed relations. CD is
surely in BCNF, since any two-attribute relation
is. ABC is not in BCNF, since AB and BC are its
only keys, but C?A is a dependency that holds in
ABCD and therefore holds in ABC. We must further
decompose ABC into AC and BC. Thus, the three
relations of the decomposition are AC, BC, and
CD.
Since all attributes are in at least one key of
ABCD, that relation is already in 3NF, and no
decomposition is necessary.

37
Solutions for Exercise 2

The only key is AB. Thus, B?C and B?D are BCNF
violations. These are the only nontrivial BCNF
violations. The reason is that the only
nontrivial derived dependencies must have A and B
on the left, and therefore contain a key. One
possible BCNF decomposition is AB and BCD. AB is
the only key for AB, and B is the only key for
BCD.
Since there is only one key for ABCD, the 3NF
violations are the same, and so is the
decomposition.

38
Reading
Chapter 11 (Dates book)

Write a Comment

User Comments (0)