Normalisation - PowerPoint PPT Presentation

1 / 38
About This Presentation
Title:

Normalisation

Description:

... of some relvar R, and R1 and R2 between them include all of the attributes of R, ... insert the fact that a supplier is located in a particular city until supplier ... – PowerPoint PPT presentation

Number of Views:35
Avg rating:3.0/5.0
Slides: 39
Provided by: liup
Category:
Tags: and | city | normalisation | the

less

Transcript and Presenter's Notes

Title: Normalisation


1
Normalisation
  • LIU Peng, Ph.D/Prof.
  • School of Information Management and Engineering
  • Shanghai University of Finance and Economics

2
Normalisation
  • What you will learn from this lecture
  • Why normalisation?
  • Decomposition of a relation
  • Normal Forms 1NF, 2NF and 3NF
  • Dependency Preservation
  • Boyce/Codd normal form
  • Methods for normalising relations
  • General procedure for normalisation

3
Why normalisation?
  • What is wrong with this design? --Redundancy
  • leads to several further problems
  • A good design principle is one fact in one
    place
  • subject of normalization
  • a formalization of simple ideas like this one
  • practical application

Sample value for relvar STUDENT
4
Why normalisation?
  • What is wrong with this design? --Redundancy
  • leads to several further problems
  • A good design principle is one fact in one
    place
  • subject of normalization
  • a formalization of simple ideas like this one
  • practical application

Sample value for relvar STUDENT
5
Normal forms
  • The process of normalization is build around the
    concept of normal forms.
  • Many normal form have been defined (see Fig)
  • the database designer should generally aim for a
    design involving relvars in 3NF, not ones that
    are merely in 2NF or 1NF.
  • "more desirable" (in a sense to be explained)

6
Normalisation procedure
  • Codd introduce the idea of normalisation
    procedure
  • Normalisation is a progressive process of
    reducing a set of relations to a more desirable
    form (i.e. containing less redundancy)
  • The method of normalisation is based on a
    progression through five increasingly desirable
    levels of forms (concerning some aspects of good
    design)
  • Note the procedure is reversible
  • input ? output
  • output ? input
  • Reversible is important, it means
  • the normalization process is information-preservin
    g

7
Normalisation procedure
  • One crucial aspect of that procedure
  • the concept of nonloss decomposition
  • the only decompositions we are interested in
  • The question of whether a given decomposition is
    nonloss is intimately bound up with the concept
    of function dependence.
  • We start with all our data in a relation
  • Decompose this relation into a set of
    well-designed relations
  • Non-loss decomposition ensures that no
    information represented in the original relation
    is lost

8
Decomposition
Two decomposition here (a) S-1
S-2a
  • How can we ensure that no information represented
    in the original relation is lost?
  • e.g. S

(b) S-1 S-2b
9
Decomposition
  • What exactly is it here that makes the first
    decomposition nonloss and the other lossy?
  • observe
  • the process is really a process of projection
  • S-1,S-2a,S-2b are each projections of the
    original relvar S (see the last slide)
  • the recomposition operator is join.
  • Case (a) S-1 S-2a ? S ( see the following
    slides)
  • Case (b) S-1 S-2b ? S
  • "reversibility" means that the original relvar is
    equal to the join of its projections.

/
10
Decomposition
  • Case (a)
  • S
    S-1 S-2a

11
Decomposition
  • Case (b)
  • S
    S-1 S-2b

12
Decomposition
  • the interesting question is this
  • If R1 and R2 are projections of some relvar R,
    and R1 and R2 between them include all of the
    attributes of R,
  • what conditions have to be satisfied in order to
    guarantee that joining R1 and R2 back together
    takes us back to the original R?
  • Returning to our example
  • relvar S satisfies the irreducible set of FDs
  • Name?Major, Sex,Major?School
  • it surely cannot be coincidence that
  • S is equal to the join of its projections on
    Name, Sex, Major and Major, School
  • in all time, Case (a) is correct

13
Decomposition
  • Heaths theorem
  • Given R with sets of attributes A, B and C, then
    if R satisfies the FD
  • A ? B
  • then R is equal to the join of its projections on
    A, Band A, C.
  • This theorem confirms what we have already
    observed
  • e.g. (take A as Major, B as School, C as
    Name,Sex)
  • Given SName,Sex,Major,School, S is equal to the
    join of
  • Name,Sex,Major and Major,School -- non-loss
    decomposition
  • not equal to the join of Name,Sex, Major and
    Sex,School

14
Observation on Decomposition
Aim remove redundancy, but preserve FDs The
aim can be represented graphically, i.e. FDs can
be represented by arrows in diagrams e.g.
SName,Major,School with FDs
Name?Major ,School Major ? School Aim
remove arrows which are not out of candidate
key Normalisation is a procedure for achieving
this aim
Name
Major
School
FD (functional dependency) diagram for
SName,Major,School
15
First Normal Form (1NF)
A relation is in 1NF if and only if all
underlying fields contain atomic (scalar) values
only.
A field contains multiple values
Course No.
Course Title
Student
Dept
C00061
DB 1
Peter Brown,
Jenny Wong,
Neil Smith
Accounts
Repeating groups
Course No.
Course Title
Student 1
Student 1 dept
Student 2
Student 2 dept
C00061
DB 1
Peter Brown
Accounts
Neil Smith
Accounts
C00023
Java
Peter Brown
Computer
Jenny Wong
IT
16
First Normal Form (1NF)
Another example here
17
First Normal Form (1NF)
18
Problems with 1NF
Primary Key (Supplier, Part)
Status
City
Part
Quantity
Supplier
20
London
P1
1600
S1
20
London
P2
2000
S1
S2
10
Paris
P2
500
S3
20
London
P3
3000
Insert - cannot insert the fact that a supplier
is located in a particular city until supplier
supplies at least one part Delete - delete the
value P3, and we also lose the information that
S3 is located in London Update - S1 appears in
the table more than once, and we must change all
of them ---Possibility of producing an
inconsistent result.
19
Dependencies between Attributes
By analysing a given relation, we can identify
some dependencies between its attributes. In
principle, we want to make attributes independent
of each other as far as possible so that updating
one attribute will have no impact on the others.
For example
Supplier
City
Quantity
Part
Status
20
Second Normal Form (2NF)
A relation is in 2NF if and only if it is 1NF and
every non-key field is irreducibly dependent on
the primary key.
Primary Key (Supplier, Part)
Status
City
Part
Quantity
Supplier
20
London
P1
1600
S1
20
London
P2
2000
S1
S2
10
Paris
P2
500
S3
20
London
P3
3000
Note that City is a non-key field and only
dependent on part of the primary key (i.e.
supplier). So this relation is not in 2NF.
21
Solution to Problems
The solution is to replace the relation by two
new relations (called One and Two)
One Primary Key (Supplier)
Two Primary Key (Supplier, Part)
Status
City
Supplier
Part
Quantity
Supplier
20
London
S1
P1
1600
S1
10
Paris
S2
P2
2000
S1
S3
20
London
S2
P2
500
S3
P3
3000
This revised structure overcomes all the problems
sketched earlier
22
Procedure to Get 2NF

Given a relation R as follows R(A, B, C, D)
with Primary Key (A, B) FD A ? D Replace R by
the two projections R1 and R2 as follows R1(A,
D) with Primary Key A R2(A, B, C) with Primary
Key (A, B) Foreign Key A references R1
23
Problems with 2NF
One Primary Key (Supplier)
Two Primary Key (Supplier, Part)
Status
City
Supplier
Part
Quantity
Supplier
20
London
S1
P1
1600
S1
10
Paris
S2
P2
2000
S1
S3
20
London
S2
P2
500
S3
P3
3000
This revised structure overcomes all the problems
sketched earlier, but Insert - cannot insert the
fact that a particular city has a status until a
supplier is actually located in
that city Delete - delete the value S2, and we
also lose the information that Paris has the
status value 10 Update - Status value 20 appears
in the table more than once
24
Third Normal Form (3NF)
A relation is in 3NF if and only if it is 2NF and
all non-key fields are mutually independent.
Two Primary Key (Supplier, Part)
One Primary Key (Supplier)
Part
Quantity
Supplier
Status
City
Supplier
P1
1600
S1
20
London
S1
P2
2000
S1
10
Paris
S2
S2
P2
500
S3
20
London
S3
P3
3000
Note that relation Two is in 3NF (with only one
non-key field), but relation One is not in 3NF
because Status is dependent on City (both are
non-key fields)
25
Converting 2NF into 3NF
We can now normalise relation One into two new
relations (called three and four), which both are
in 3NF.
Three Primary Key (Supplier)
Four Primary Key (City)
City
Supplier
Status
City
London
S1
20
London

Paris
S2

10
Paris
S3
London
Higher Normal Forms (e.g. 4NF and 5NF) do exist,
but they are mainly of interest in academic
societies rather than in the practical
applications of database design.
26
Procedure to Get 3NF

Given a relation R as follows R(A, B, C) with
Primary Key A FD B ? C Replace R by the two
projections R1 and R2 as follows R1(B, C) with
Primary Key B R2(A, B) with Primary Key A
Foreign Key B references R1
27
Dependency Preservation
  • During the reduction process, a given relvar can
    be nonloss-decomposed in a variety of different
    ways.(refer to Fig.11.11)

Two 3NF projections A SC(S,CITY)
CS(CITY,STATUS) B SC(S,CITY)
SS(S,STATUS) Decomposition B is less
satisfactory than decomposition A for a number of
reasons. See page 364.
Fig.11.11
28
Dependency Preservation
  • We expect that the projections are independent of
    one another, in the following sense Update can
    be made to either one without regard for the
    other.
  • In the process of decomposing, no FDs should
    span two relvars. enforcing these constraints
    is very difficulty.
  • Projections R1 and R2 of relvar R are independent
    if and only if
  • Every FD in R is a logical consequence of those
    in R1 and R2, and
  • The common attributes of R1 and R2 form a
    candidate key for at least one of the pair.

29
Dependency Preservation
  • Consider decompositions A and B as defined
    earlier.
  • In A the two projections are independent
  • In B, by contrast, the two projections are not
    independent
  • CITY ? STATUS cannot be deduced from the FDs for
    those projectionsalthough it is true that their
    common attribute, S, does constitute a candidate
    key for both.
  • Note The third possibility, replacing SECOND by
    its two projections on S,STATUS and
    CITY,STATUS, is not a valid decomposition,
    because it is not nonloss.
  • Exercise Prove this statement.

30
Boyce/Codd Normal Form
A more general form of 3NF in which relations
may have more than one candidate key No
reference to 1NF or 2NF, with a simpler
definition Boyce/Codd Normal Form (BCNF) a
relation is in BCNF if and only if every
non-trivial, left-irreducible FD has a candidate
key as its determinant In other words the only
arrows in the dependency diagram are arrows out
of candidate keys AND no other arrows Look at
exercise 11.3 in Date book the answer gives an
algorithm for reducing a 1NF relation into a set
of BCNF relations
31
3NF vs. BCNF
3NF and BCNF are equivalent if the combination of
the following conditions does not occur for a
relation that 1. had two (or more) candidate
keys, such that 2. the two keys were composite,
and 3. they overlapped (i.e. at least one
attribute in common)
S
CITY
e.g. S (S, SNAME, STATUS, CITY) CANDIDATE
KEY (S) CANDIDATE KEY (SNAME)
STATUS
SNAME
32
3NF But Not BCNF
e.g. SSP (S, SNAME, P, QTY) CANDIDATE
KEY (S, P) CANDIDATE KEY (SNAME, P)
S
SNAME
P
QTY
S1 S1 S1 S1
Smith Smith Smith Smith
P1 P2 P3 P4
300 200 400 200
S
P
QTY
SNAME
S, P ? SNAME But SNAME is not
irreducibly dependent on S, P because S ?
SNAME
33
More about Normal Forms
  • Are Normal Forms a wholly good thing?
  • Remove a number of serious problems resulting
    from redundant information
  • Can be achieved by following a general procedure
  • Decomposition may lead to poor performance (too
    many small tables)
  • Decomposition may make it easy to break semantic
    constraints although constraints of referential
    integrity could help

34
Procedure for Normalisation
  • 1) take projections of 1NF relation to eliminate
    reducible FDs -- giving a set of 2NF relations
  • 2) take projections of these 2NF relations to
    eliminate transitive FDs -- giving a set of 3NF
    relations
  • 3) take projections of these 3NF relations to
    eliminate FDs in which the determinant is not a
    candidate key -- giving a set of BCNF relations
  • 4) take projections of these BCNF relations to
    eliminate MVDs which are not also FDs -- giving a
    set of 4NF relations
  • 5) take projections of these 4NF relations to
    eliminate JDs which are not implied by candidate
    keys -- giving a set of 5NF relations

35
Exercises
  • For each of the following relation schemas and
    sets of functional dependencies
  • R(A,B,C,D) with FDs AB? C, C? D, and D? A
  • R(A,B,C,D) with FDs B? C, and B? D
  • R(A,B,C,D) with FDs AB? C, BC? D, CD? A and AD?
    B
  • R(A,B,C,D) with FDs A? B, B?C, C? D and D? A
  • R(A,B,C,D,E) with FDs AB? C, DE? C, and B? D
  • R(A,B,C,D,E) with FDs AB? C, C? D, D? B and D? E
  • Do the following
  • Indicate all the 2NF/3NF/BCNF violations
  • Decompose the relations, as necessary, into
    collections of relations that are in 2NF/3NF/BCNF.

36
Solutions for Exercise 1
  • There are 14 nontrivial dependencies, including
    the three given ones and 11 derived dependencies.
    These are C?A, C ? D, D ? A, AB ? D, AB ? C, AC
    ? D, BC ? A, BC ? D, BD?A, BD?C, CD?A, ABC?D,
    ABD?C, and BCD?A. We also learned that the three
    keys were AB, BC, and BD. Thus, any dependency
    above that does not have one of these pairs on
    the left is a BCNF violation. These are C?A,
    C?D, D?A, AC?D, and CD?A.
  • One choice is to decompose using C?D. That gives
    us ABC and CD as decomposed relations. CD is
    surely in BCNF, since any two-attribute relation
    is. ABC is not in BCNF, since AB and BC are its
    only keys, but C?A is a dependency that holds in
    ABCD and therefore holds in ABC. We must further
    decompose ABC into AC and BC. Thus, the three
    relations of the decomposition are AC, BC, and
    CD.
  • Since all attributes are in at least one key of
    ABCD, that relation is already in 3NF, and no
    decomposition is necessary.

37
Solutions for Exercise 2
  • The only key is AB. Thus, B?C and B?D are BCNF
    violations. These are the only nontrivial BCNF
    violations. The reason is that the only
    nontrivial derived dependencies must have A and B
    on the left, and therefore contain a key. One
    possible BCNF decomposition is AB and BCD. AB is
    the only key for AB, and B is the only key for
    BCD.
  • Since there is only one key for ABCD, the 3NF
    violations are the same, and so is the
    decomposition.

38
Reading
Chapter 11 (Dates book)
Write a Comment
User Comments (0)
About PowerShow.com