Normalisation - PowerPoint PPT Presentation

1 / 37

About This Presentation

Title:

Normalisation

Description:

Term 2, 2004, Normalisation Marian Ursu, Department of Computing, Goldsmiths College ... [ Hint: you may use PHP code samples from pervious lab] ... – PowerPoint PPT presentation

Number of Views:60

Avg rating:3.0/5.0

Slides: 38

Provided by: marian5

Category:

more less

Transcript and Presenter's Notes

Title: Normalisation

1
Normalisation
5
2
Outline

Boyce-Codd Normal Form (BCNF)
normalisation
non-loss decomposition
Heaths theorem
normalisation process
semantic assumptions and FDs
CKs
decomposition
normalisation vs dependency preservation
a decomposition may yield to a better solution
than another one
either-or situations normalise or preserve FDs

3
1
4
2NF and 3NF equivalent definitions

2NF (in Connollys book)
a relation is in 2NF if and only if it is in 1NF
and all non-primary key attributes are
irreducibly dependent (or fully functional
dependent that is, the functional dependency is
left irreducible) on the primary key.
3NF (in Connollys book)
a relation is in 3NF if and only if it is in 2NF
and no non-primary key attribute A is
transitively dependent on the primary key X (that
is, if one has a FD X-gtY one cannot find a FD
Y-gtA that makes A to depend on X via Y).
These definitions can be generalised if one has
more than one CK in the table (see Connollys)

5
BCNF

a relation is in Boyce/Codd normal form (BCNF) if
and only if every non-trivial irreducible FD has
a candidate key as its determinant
informally
the determinant of each relevant FD is a CK

6
Example

(M_id, M_name, Type, Value)
M_id ? M_name
M_id ? Type
M_id ? Value
Type ? Value problematic FD since Type is
not key
not BCNF
(Type, Value)
(M_id, M_name, Type)
both in BCNF

7
BCNF

any relation can be non-loss decomposed into an
equivalent set of BCNF relations
BCNF ? 3NF ? 2NF ? 1NF
BCNF is still not guaranteed to be free of any
update anomalies

8
2
9
Normalisation

the process of transforming a relation with
redundancies into an equivalent set of
relations that have less redundancies
transformation ? projection
input one relation, say R
output many relations, say R1, , Rn
equivalent ? non-loss decomposition
R1 join R2 join Rn R
R1, , Rn should have normal forms higher than or
equal to that of R

10
Non-loss decomposition

semantic assumptions
exercise

11
Lossy decomposition

semantic assumptions
exercise

12
Heaths theorem

can be used as the basis for normalisation
theorem
suppose
R (A, B, C), where A, B and C are disjoint sets
of attributes
A?B
then
R (A, B) join (A, C)
state in English

13
Normalisation rules of thumb

take as basis for normalisation/Heaths theorem a
problematic FD A?B
maximise B when applying Heaths theorem, on the
basis of A?B
Example (stud_id, stud_name, module_id,
module_name, level, result)
FDsmodule_id ?module_name FDs are
replaced by what?
module_id ?level
module_id, stud_id ? result
stud_id ? stud_name
try to maintain a one-to-one correspondence with
real life entities

14
Normalisation

steps
semantic assumptions
FDs
CKs
decomposition

15
Simple example

(M_id, M_name, Type, Value)
M_id ? M_name
M_id ? Type
M_id ? Value
Type ? Value
not BCNF
Apply Heaths theorem for Type ? Value
(Type, Value)
(M_id, M_name, Type)
both relations are now in BCNF

16
3
17
Example (R)

(project, task, max-budget, duration,
payment-rate, contractor, contr-time)

FDs (project, task) ? max_budget, duration
(task, max_budget, duration) ? payment_rate
(project, task, contractor) ? contr_time
(project, task, max-budget, duration,
payment-rate, contractor, contr-time)
18
Example decomposition for R

Heaths theorem for R (the initial relation)
based on
task, max_budget, duration ? payment_rate
leads to
R1 (task, max_budget, duration, payment_rate)
R2 (project, task, max_budget, duration,
contractor, contr_time)
R1 is in BCNF
R2 is not in BCNF, due to
project, task ? max_budget, duration

19
Example decomposition for R2

Heaths theorem for R2, based on
project, task ? max_budget, duration
leads to
R21 (project, task, max_budget, duration)
R22 (project, task, contractor, contracted_time)
R21 is in BCNF
R22 is in BCNF

20
Example solution

(task, max_budget, duration, payment_rate)
(project, task, max_budget, duration)
(project, task, contractor, contracted_time)

21
Exercise

Consider the following table R with FDs.
Decompose it in BCNF.
(patient, drug_id, drug_name, admin, start, end,
dosage, special_diet)
1) drug_id ? drug_name
2) drug_id ? admin
3) patient, drug_id ? start, end, dosage
4) drug_id, dosage ? special_diet

22
4
23
Decomposition 2 or more solutions

in the normalisation process, it may be possible
that a certain (non-loss) decomposition yields to
a better solution than another one

24
Decomposition 2 solutions example

Modules(M_id, M_name, Type, Value)
FD Type ? Value M_id ?M_name, Type
solution 1
Modules_Descr(M_id, M_name, Type)
Type_Val(Type, Val)
solution 2
Modules_Descr(M_id, M_name, Type)
Module_Val(M_id, Val)
are they both non-loss? (find appropriate FDs)
is there one better than the other?

25
Solution 1 vs Solution 2

updates
u1 insert the fact that a 3 semester module is
worth 1.5cu
u2 modify 1 semester modules they are not worth
0.5cu any longer, they are 0.75cu
u3 change the type of a module but forget to
change its value
solution 2
u1 and u2 are impossible or difficult to perform
u3 is allowed
solution 1
u1 and u2 are straightforward
u3 is not allowed

26
Solution 1 vs Solution 2

solution 1 more expressive and better
certain facts cannot be expressed in solution 2
e.g. the value of a new type
in solution 2 Type ? Value is lost, so this
constraint must be enforced by the user by
procedural code

27
Independent projections (smaller tables)
M_name
M-id
Type
Value
Solution 1
Solution 2
one FD is lost Type ?Value Additional code using
two tables needs to be written to impose
FD, thus tables are dependent
all FD are preserved via Keys
28
5
29
Normalisation vs dependency preservation

there are cases when there is an either-or
situation regarding the normalisation and the
preserving of functional dependencies
either the relation is normalised and some FDs
are lost
or, some FDs are not lost (they are expressed in
the original relation), but the relation is not
in its higher normal form possible (BCNF)
in this case, no solution is better than the
other
other criteria will have to be considered to
judge better

30
Normalisation vs dependency preservation Example

a patient is treated by a single doctor for a
certain kind of disease
each doctor only treats one kind of disease
a doctor can treat more than one patient
What happens in all possible splits?
consider also (Patient, Disease, Doctor,
Treatment)
with Patient, Disease ? Treatment

31
All possible decompositions
Not acceptable lossy decomposition. Why?
Not acceptable lossy decomposition. Why?
Heaths theorem Doctor ? Disease You get BCNF
Non-loss decomposition
32
BCNF vs dependency preservation
and
do not enforce a FD existing in the original
specification, namely
e.g. a patient can be given two doctors that
treat the same disease (the system will not
disallow this) the constraint would have to be
maintained by additional procedural code
33
BCNF vs dependency preservation

not every FD is preserved through normalisation
In initial table (Patient, Doctor, Disease) not
BCNF
Doctor ? Disease could generate update anomalies
When table is decomposed in BCNF
(Patient, Doctor) (Doctor, Disease)
Doctor ? Disease is preserved
(Patient, Disease) ? Doctor is lost because you
can insert a patient with two doctors, both
treating same patients disease
Thus update (insert) anomalies
this latter FD would not have been preserved via
any decomposition

34

35
Conclusions

normal forms formalization of common sense
art ? engineering
possibility for automation yes, but DB designer
needs to assess result
BCNF
always achievable
not always completely free of update anomalies
because some FD might have been lost

36
Conclusions

Other normal forms based on other concepts
We study normalization up to BCNF
Sufficient for our purposes in database design
With normalization we finish the Database Logical
Design
Next Physical Design

37
Practical work

Assume the table tutors_modules
(lectid, lectname, job, address, modid, modname,
level)
Assumptions
A module can be taught by one ore more tutors
A tutor can teach one or more modules
lectid ? lectname, job, address
modid ? modname, level
Do the following
Normalize the table tutors_modules to BCNF
Connect to the DB server and create the tables
you obtain, and populate them with rows
Write a PHP interface for the normalized tables,
that would display the tutors names in a form.
When you select one tutor name from the form, all
the modules taught by him/her should be displayed
(module names and levels). Hint you may use PHP
code samples from pervious lab
If you finish, tackle Task 3 from the lab handout
for current week