Title: Normalisation
1Normalisation
5
2Outline
- Boyce-Codd Normal Form (BCNF)
- normalisation
- non-loss decomposition
- Heaths theorem
- normalisation process
- semantic assumptions and FDs
- CKs
- decomposition
- normalisation vs dependency preservation
- a decomposition may yield to a better solution
than another one - either-or situations normalise or preserve FDs
31
42NF and 3NF equivalent definitions
- 2NF (in Connollys book)
- a relation is in 2NF if and only if it is in 1NF
and all non-primary key attributes are
irreducibly dependent (or fully functional
dependent that is, the functional dependency is
left irreducible) on the primary key. - 3NF (in Connollys book)
- a relation is in 3NF if and only if it is in 2NF
and no non-primary key attribute A is
transitively dependent on the primary key X (that
is, if one has a FD X-gtY one cannot find a FD
Y-gtA that makes A to depend on X via Y). - These definitions can be generalised if one has
more than one CK in the table (see Connollys)
5BCNF
- a relation is in Boyce/Codd normal form (BCNF) if
and only if every non-trivial irreducible FD has
a candidate key as its determinant - informally
- the determinant of each relevant FD is a CK
6Example
- (M_id, M_name, Type, Value)
- M_id ? M_name
- M_id ? Type
- M_id ? Value
- Type ? Value problematic FD since Type is
not key - not BCNF
- (Type, Value)
- (M_id, M_name, Type)
- both in BCNF
7BCNF
- any relation can be non-loss decomposed into an
equivalent set of BCNF relations - BCNF ? 3NF ? 2NF ? 1NF
- BCNF is still not guaranteed to be free of any
update anomalies
82
9Normalisation
- the process of transforming a relation with
redundancies into an equivalent set of
relations that have less redundancies - transformation ? projection
- input one relation, say R
- output many relations, say R1, , Rn
- equivalent ? non-loss decomposition
- R1 join R2 join Rn R
- R1, , Rn should have normal forms higher than or
equal to that of R
10Non-loss decomposition
- semantic assumptions
- exercise
11Lossy decomposition
- semantic assumptions
- exercise
12Heaths theorem
- can be used as the basis for normalisation
- theorem
- suppose
- R (A, B, C), where A, B and C are disjoint sets
of attributes - A?B
- then
- R (A, B) join (A, C)
- state in English
13Normalisation rules of thumb
- take as basis for normalisation/Heaths theorem a
problematic FD A?B - maximise B when applying Heaths theorem, on the
basis of A?B - Example (stud_id, stud_name, module_id,
module_name, level, result) - FDsmodule_id ?module_name FDs are
replaced by what? - module_id ?level
- module_id, stud_id ? result
- stud_id ? stud_name
- try to maintain a one-to-one correspondence with
real life entities
14Normalisation
- steps
- semantic assumptions
- FDs
- CKs
- decomposition
15Simple example
- (M_id, M_name, Type, Value)
- M_id ? M_name
- M_id ? Type
- M_id ? Value
- Type ? Value
- not BCNF
- Apply Heaths theorem for Type ? Value
- (Type, Value)
- (M_id, M_name, Type)
- both relations are now in BCNF
163
17Example (R)
- (project, task, max-budget, duration,
payment-rate, contractor, contr-time)
FDs (project, task) ? max_budget, duration
(task, max_budget, duration) ? payment_rate
(project, task, contractor) ? contr_time
(project, task, max-budget, duration,
payment-rate, contractor, contr-time)
18Example decomposition for R
- Heaths theorem for R (the initial relation)
based on - task, max_budget, duration ? payment_rate
- leads to
- R1 (task, max_budget, duration, payment_rate)
- R2 (project, task, max_budget, duration,
contractor, contr_time) - R1 is in BCNF
- R2 is not in BCNF, due to
- project, task ? max_budget, duration
19Example decomposition for R2
- Heaths theorem for R2, based on
- project, task ? max_budget, duration
- leads to
- R21 (project, task, max_budget, duration)
- R22 (project, task, contractor, contracted_time)
- R21 is in BCNF
- R22 is in BCNF
20Example solution
- (task, max_budget, duration, payment_rate)
- (project, task, max_budget, duration)
- (project, task, contractor, contracted_time)
21Exercise
- Consider the following table R with FDs.
Decompose it in BCNF. - (patient, drug_id, drug_name, admin, start, end,
dosage, special_diet) - 1) drug_id ? drug_name
- 2) drug_id ? admin
- 3) patient, drug_id ? start, end, dosage
- 4) drug_id, dosage ? special_diet
224
23Decomposition 2 or more solutions
- in the normalisation process, it may be possible
that a certain (non-loss) decomposition yields to
a better solution than another one
24Decomposition 2 solutions example
- Modules(M_id, M_name, Type, Value)
- FD Type ? Value M_id ?M_name, Type
- solution 1
- Modules_Descr(M_id, M_name, Type)
- Type_Val(Type, Val)
- solution 2
- Modules_Descr(M_id, M_name, Type)
- Module_Val(M_id, Val)
- are they both non-loss? (find appropriate FDs)
- is there one better than the other?
25Solution 1 vs Solution 2
- updates
- u1 insert the fact that a 3 semester module is
worth 1.5cu - u2 modify 1 semester modules they are not worth
0.5cu any longer, they are 0.75cu - u3 change the type of a module but forget to
change its value - solution 2
- u1 and u2 are impossible or difficult to perform
- u3 is allowed
- solution 1
- u1 and u2 are straightforward
- u3 is not allowed
26Solution 1 vs Solution 2
- solution 1 more expressive and better
- certain facts cannot be expressed in solution 2
e.g. the value of a new type - in solution 2 Type ? Value is lost, so this
constraint must be enforced by the user by
procedural code
27Independent projections (smaller tables)
M_name
M-id
Type
Value
Solution 1
Solution 2
one FD is lost Type ?Value Additional code using
two tables needs to be written to impose
FD, thus tables are dependent
all FD are preserved via Keys
285
29Normalisation vs dependency preservation
- there are cases when there is an either-or
situation regarding the normalisation and the
preserving of functional dependencies - either the relation is normalised and some FDs
are lost - or, some FDs are not lost (they are expressed in
the original relation), but the relation is not
in its higher normal form possible (BCNF) - in this case, no solution is better than the
other - other criteria will have to be considered to
judge better
30Normalisation vs dependency preservation Example
- a patient is treated by a single doctor for a
certain kind of disease - each doctor only treats one kind of disease
- a doctor can treat more than one patient
- What happens in all possible splits?
- consider also (Patient, Disease, Doctor,
Treatment) - with Patient, Disease ? Treatment
31All possible decompositions
Not acceptable lossy decomposition. Why?
Not acceptable lossy decomposition. Why?
Heaths theorem Doctor ? Disease You get BCNF
Non-loss decomposition
32BCNF vs dependency preservation
and
do not enforce a FD existing in the original
specification, namely
e.g. a patient can be given two doctors that
treat the same disease (the system will not
disallow this) the constraint would have to be
maintained by additional procedural code
33BCNF vs dependency preservation
- not every FD is preserved through normalisation
- In initial table (Patient, Doctor, Disease) not
BCNF - Doctor ? Disease could generate update anomalies
- When table is decomposed in BCNF
- (Patient, Doctor) (Doctor, Disease)
- Doctor ? Disease is preserved
- (Patient, Disease) ? Doctor is lost because you
can insert a patient with two doctors, both
treating same patients disease - Thus update (insert) anomalies
- this latter FD would not have been preserved via
any decomposition
34 35Conclusions
- normal forms formalization of common sense
- art ? engineering
- possibility for automation yes, but DB designer
needs to assess result - BCNF
- always achievable
- not always completely free of update anomalies
- because some FD might have been lost
36Conclusions
- Other normal forms based on other concepts
- We study normalization up to BCNF
- Sufficient for our purposes in database design
- With normalization we finish the Database Logical
Design - Next Physical Design
37Practical work
- Assume the table tutors_modules
- (lectid, lectname, job, address, modid, modname,
level) - Assumptions
- A module can be taught by one ore more tutors
- A tutor can teach one or more modules
- lectid ? lectname, job, address
- modid ? modname, level
- Do the following
- Normalize the table tutors_modules to BCNF
- Connect to the DB server and create the tables
you obtain, and populate them with rows - Write a PHP interface for the normalized tables,
that would display the tutors names in a form.
When you select one tutor name from the form, all
the modules taught by him/her should be displayed
(module names and levels). Hint you may use PHP
code samples from pervious lab - If you finish, tackle Task 3 from the lab handout
for current week