PPT – Chapter 10. Database Design & Normalisation - Lecture Material PowerPoint presentation

About This Presentation

Title:

Chapter 10. Database Design & Normalisation - Lecture Material

Description:

Chapter 10. Database Design & Normalisation - Lecture Material This chapter covers database design and normalisation. 10-1. Introduction to Database Design – PowerPoint PPT presentation

Number of Views:122

Avg rating:3.0/5.0

Slides: 42

Provided by: computin68

Category:

more less

Transcript and Presenter's Notes

Title: Chapter 10. Database Design & Normalisation - Lecture Material

1

Chapter 10. Database Design Normalisation -
Lecture Material
This chapter covers database design and
normalisation.
10-1. Introduction to Database Design
10-2. 3NF, 2NF and 1NF
10-3. BCNF
10-3-1. BCNF Example 1
10-3-2. BCNF Example 2
10-3-3. BCNF Example 3
10-3-4. BCNF Example 4
10-4. 4NF
10-5. 5NF
10-6. Database Design
Sources Elmasri Navathe pp 391 - 445

10-1. Introduction to Database Design
An important part of database design is deciding
on a suitable logical structure or schema to
implement ... called database design.
Considering supplier parts example (S,P,SP) there
is a feeling of correctness.

3
Introduction to Database Design

Good database design means
Attributes with meaning
Limited redundant information
Limited NULL values
Disallow possibility of spurious tuples

10-1. Introduction to Database Design
Normalisation theory is a formalism of simple
ideas with a practical application in logical
database schema design.
Normalisation theory should allow us to recognise
relations with undesirable properties, tell us
what is "wrong" how to "correct" it.

10-1. Introduction to Database Design
Normalisation theory is built around normal forms
- each normal form has a set of satisfiable
criteria.
Normal forms exist in a hierarchy
1NF? 2NF? 3NF? BCNF? 4NF? PJ/NF (5NF)
Codd defined 1NF, 2NF, 3NF in 1972.
3NF had inadequacies so it was revised in 1974 by
Boyce/Codd (BCNF).
1977 Fagin defined 4NF, 1979 defined 5NF.
6NF,7NF ?... dependencies theory suggests there
may be higher NFs but not practicable in database
environment.

10-1. Introduction to Database Design
DB designers should aim for higher NFs but this
is not law - just recommended as normalisation
simply provides guidelines for database design.
There are often good reason for not using
normalisation theory.
In order to describe the various normal forms we
must first introduce some definitions

10-1. Introduction to Database Design
Functional DependencyGiven relation R, attribute
Y of R is functionally dependent on X of R, R.X
-gt R.Y, or R.X functionally determines R.Y ...
... iff each R.X value has associated with it
precisely one R.Y value, where X and/or Y may be
composite.
S.SNAME, S.STATUS and S.CITY are each
functionally dependent on S.S
If R.X is a candidate key or if R.X is the
primary key, then all R.Y must be functionally
dependent on R.X
In SP we have a composite primary key so
SP.(S,P) ? SP.QTY

10-1. Introduction to Database Design
There is no requirement in the definition of
functional dependence that R.X be a candidate
key, thus
R.X ? R.Y iff whenever 2 tuples of R.X are the
same then the corresponding R.Y values are also
the same.
R.Y is fully functionally dependent on R.X iff it
is functionally dependent on R.X and not fully
functionally dependent on any subset of R.X
S.(S,STATUS) ? S.CITY is true but not full
functional dependence as S.S -gt S.CITY
If R.X ? R.Y but not fully then R.X must be
composite

9
Normalisation Normal Forms

Define 3NF where we want to get to (1972)
Demonstrate how 3NF defn is broken and the
problems with this 1NF
Fix problems by defining the FD and breaking up
table 2NF
Problems with 2NF
Fix problems with 3NF

10
3NF, 2NF and 1NF

1NF
Nearly assumed Disallow multi-valued or
composite attributes
2NF
Every non-key attribute is fully functionally
dependent on the primary key
City ? Status

10-2. 3NF, 2NF and 1NF
Definition 1 3NFA relation R is in 3NF iff the
nonkey attributes of R are mutually independent
and fully dependent on the primary key of R
Nonkey in this sense means not part of the
primary key and mutually independent means none
of the attributes are functionally dependent on
any others.
P(P,PNAME,COLOUR,WEIGHT) is 3NF because we can
change nonkey attributes independently and all
are functionally dependent on P.

In order to show 2NF let us unite S and SP to
get
FIRST(S,STATUS,CITY,P,QTY)
We also introduce a new constraint such that
STATUS is functionally dependent on CITY, eg
London suppliers have status 10, alwaysParis
suppliers have status 20, alwaysMunich suppliers
have status 20, always etc ...
Primary key is (S,P) and the functional
dependency diagram is ...

A relation in 3NF has arrows out of primary key
only.
In FIRST, additional arrows cause trouble as the
nonkey attributes are not mutually independent
and not all attributes are dependent on the
primary key.
What are the difficulties with this relation
anyway ?
The problem with FIRST is that it stores
redundant information which can lead to update
anomalies as follows

INSERT Cannot insert the fact that a supplier
exists until that supplier actually makes a
shipment
DELETE Deleting the last tuple based on S,P
could lose the information that S3 is located in
CITY
UPDATE CITY values occur for each shipment thus
an update of CITY is unnecessarily expensive.

One solution is to replace FIRST by
SECOND(S,STATUS,CITY) and SP(S,P,QTY)
This yields the following FD diagram

This is appealing as follows INSERT can enter
the fact that S5 is in ATHENS without S5 actually
having to make a shipment DELETE can delete
shipment tuples and not lose location
information UPDATE information appears once only
thus updating is more efficient
16

10-2. 3NF, 2NF and 1NF
Definition of 2NFA relation R is in 2NF iff it
is in 1NF and every nonkey attribute is fully
dependent on the primary key.
A relation in 1NF and not in 2NF can be reduced
to a collection of 2NF relations by replacing by
an equivalent collection of projections.
Equivalence here means 1NF form can be obtained
from the 2NF forms by taking the natural joins so
the decomposition is non-loss and reversible.
What makes the 2NF form more desirable is that it
can represent information which is not
representable in the 1NF form
SECOND(S,STATUS,CITY) and SP(S,P,QTY)

10-2. 3NF, 2NF and 1NF
The relation SECOND still has some problems
however, it lacks mutual independence among the
nonkey attributes.

10-2. 3NF, 2NF and 1NF
The dependency of STATUS on S is fully
functional but is transitive via CITY and this
can lead to update anomalies as follows
INSERT cannot enter status of a supplier in Rome
as 50 until we have some supplier in Rome
DELETE if we delete the only SECOND tuple of a
city we destroy the status of that city
UPDATE STATUS of a CITY appears many times

10-2. 3NF, 2NF and 1NF
So we replace SECOND by
SC(S,CITY) and CS(CITY,STATUS)
This gives the following FD diagram
In SECOND, STATUS did not describe the entity
identified by the primary key (S).

10-2. 3NF, 2NF and 1NF
Definition 4 3NFRelation R is in 3NF iff it is
in 2NF and every nonkey attribute is
non-transitively dependent on the primary key.
A relation in 2NF and not in 3NF can be
(reversibly) non-loss reduced to an equivalent
collection of 3NF relations able to contain
information not representable in the 2NF form.

10-2. 3NF, 2NF and 1NF
Normalisation is a semantic notion, not to do
with the values that happen to exist so it is not
possible to look at the values in a table and say
it is in 3NF without knowing about dependencies
and meaning of data.

10-3. BCNF
3NF has the following inadequacies in that it
cannot handle cases of relations with
multiple candidate keys where
candidate keys are composite
candidate keys overlap
The above combination of events do not occur very
often in practice, but they are not contrived and
they do exist.
BCNF was defined to address the above and the
definition of BCNF is stronger than that of 3NF.

10-3. BCNF
A functional determinant is an attribute on which
some other attribute is fully functionally
dependent.
Definition of BCNFA relation R is in BCNF iff
every determinant is a candidate key ... not just
primary keys!
This is a simpler definition than 3NF with no
references to 1NF or 2NF or transitive
dependencies.

10-3. BCNF
Now for some confusion ...
Textbooks sometimes differ in their definitions
of 3NF and whether a relation in 3NF is also in
BCNF !
The precise and exact definitions do not assume
that each R has exactly one CK, the PK, as is
done in most textbooks, and those definitions are
as follows
2NF 1NF and each non-prime attribute is FFD on
each CK
3NF 2NF and none of the non-prime attributes
are transitively dependent on any CKs
Here non-prime is not part of any candidate key.
However many textbooks simplify the definitions
by assuming that each R has one CK which is the
PK !

10-3. BCNF
Any given relation can be non-loss decomposed
into an equivalent collection of BCNF relations.
FIRST,SECOND are not in BCNF
SP,SC,CS are in BCNF
Lets illustrate BCNF with a set of examples, some
in BCNF, some not.

10-3-1. BCNF Example 1
S(S, SNAME, STATUS, CITY)
S and SNAME are both candidate keys, i.e.
numbers and names of suppliers are both unique.
STATUS and CITY are mutually independent, with
the FD diagram ...
S is in BCNF as the only determinants are
candidate keys.
In S, candidate keys are atomic and thus
non-overlapping

10-3-2. BCNF Example 2
SSP(S, SNAME, P, QTY)
Candidate keys are (S, P) and (SNAME, P), say
1st is primary key with FD diagram ...
Not in BCNF as 2 determinants, S,SNAME are not
candidate keys so the table will contain
redundancies and have certain update anomalies.

10-3-3. BCNF Example 3
SJT(S, J, T)
Here student S is taught subject J by teacher T
with the following constraints
1. For each subject each student is taught by
only 1 teacher2. Each teacher teaches only 1
subject3. Each subject taught by several
teachers
This is a bit like secondary school, with the
following FD diagram ...

Here we have two overlapping candidate keys (S,
J) and (S, T) and SJT is in 3NF but it is not in
BCNF so we could get update anomalies caused by T
being a determinant but not a CK (Candidate Key).
Solution replace SJT by 2 projections
ST(S, T) and TJ(T, J)

10-3-4. BCNF Example 4
EXAM(S, J, P)
Here student S was examined in subject J and
achieved rank position P in the class with the
constraint that there are no ties for positions.
This yields the following FD diagram ....
Here we have composite and overlapping candidate
keys (S, J) and (J, P) but just because we have
such a situation does not mean we need to
normalise because EXAM is already in BCNF !

10-4. 4NF
Consider the following ...
course (C) taught by one of a set of teachers (T)
for each course there is a repeating set of
recommended textbooks (X)
for each course there may be any numbers of
teachers and any numbers of recommended texts
teachers and texts are independent
teachers can be associated with any number of
courses
This corresponds closely with a large secondary
school with DoE recommended textbooks, teachers
doubling up and many teachers.
We could "flatten" this information into a 1NF
relation called CTX.

10-4. 4NF
There are no FDs in data so no basis for
decomposition but there is still some redundancy
in CTX.
If (C1, T1, X1) and (C1, T2, X2) then there must
also be the following tuples present ... (C1, T1,
X2) and (C1, T2, X1) !
This is redundancy and thus we can have update
anomalies.
For example, if we add (Geography, Ryan, Holland)
and (Geography, Scott, Gaines) then we must also
add (Geography, Ryan, Gaines) and (Geography,
Scott, Holland).
Examining the criteria for normal forms however
we find CTX is (trivially) in BCNF as the 3
attributes make up the sole CK !

It would be desirable to decompose CTX into
CT(Course, Teacher) and CX(Course, Text)
Both of these are in BCNF as both are "all key".

So CTX would be represented as CV
CT
CX
34

10-4. 4NF
This decomposition is based on Fagin's
multi-valued dependencies (MVDs).
course -gt-gt teachercourse -gt-gt text
A course does not have a single corresponding
teacher, it has a well-defined set of teachers
and for a course c and text x the set of teachers
depends on the value of c, independent of x.
Definition of 4NFA relation R is in 4NF if it is
in BCNF and all MVDs are FDs.
CTX is not in 4NF CT, CX are in 4NF 4NF is
more desirable as it eliminates redundancies

35
4NF

For R with attributes A, B C (which may be
composite !)
R.A -gt-gt R.B
iff the set of R.B values match (Avalue, Cvalue)
in R and this depends on A, independent of C
R must have at least 3 attributes and for R(A, B,
C) then R.A -gt-gt R.B holds iff
R.A -gt-gt R.C also holds
MVDs always go in pairs
If R.A -gt-gt R.B R.C then R can be non-loss
decomposed into R1(A, B) and R2(A, C)

10-5. 5NF
Some relations cannot be non-loss decomposed by
projection into 2 relations but can be composed
by projection into 3 relations ... called
n-decomposable for n gt 2.
5NF is more theoretical than real ... anyway here
is an example
SPJ is a relation about suppliers, parts and
projects.
1. Smith supplies wrenches2. wrenches are used
in Block23. Smith supplies Block2
If 1,2 3 hold then Smith supplies wrenches to
Block2 also holds as true.

37
5NF

Normally this implication does not hold, but if
it does we call it a JOIN dependency and SPJ is a
JD over (SP,PJ,JS) and should be decomposed into
3 relations yielding three relations, all in 5NF.
SPJ is not in 5NF because it has a join
dependency but discovering such JDs is not easy
... this is because FDs and even MVDs have a
straightforward real-world interpretation whereas
a JD does not.
If R is in 5NF then it is also in 4NF.

10-6. Database Design
Database design is all about designing a schema
of tables which captures all information needs
from the portion of the real world being
modelled, in such a way that no unnecessary
redundancies are stored which could lead to
update anomalies.
Reasoning about the normal forms of tables in a
schema helps us determine if update anomalies can
occur in theory.
Database design is a give-and-take task, fluid,
revised continuously as users needs change and
the information being modelled changes.
A database design is never complete, it is always
evolving.
The task of database design is separate but
related to the task of the DBA.

10-6. Database Design
If the design of a database has commenced with
the construction of an E-R diagram, then this can
be used to determine a first version of the
relational schema, but only a first version.
Turning E-R entities into relational tables is
easy
Turning E-R 11 relationships into tables is also
easy by storing the PK of one entity as an FK
attribute of the other ... which one to embed in
the other affects performance of queries, choice
of database designer
Turning E-R 1N relationships into tables is done
by placing the PK of the relation representing
the parent entity as a FK in the relation
representing the child entity ... unlike 11, it
does matter which is FK embedded in the other.
Turning E-R MN relationships into tables is done
by creating an additional table, an intersection
relation, to represent the relationship itself
... i.e. decompose the MN into two 1N
relationships. The PK of the new relation is the
combination of PKs of its "parents".
Representing recursive relationships, which can
be 11, 1N or NM, is done by embedding key for
one in itself (11 and 1N) or creating an
additional table (NM) ... so the fact that it is
recursive is actually not important.

40
Database Design

Having gone through the effort of an E-R
modelling exercise and then the generation of a
first approximation at a database schema, this
first version may then be refined or
de-normalised.

10-6. Database Design
Given R in 1NF and FDs, MVDs and JDs, we
systematically reduce R to a collection of
smaller relations which are "more desirable", by
taking projections in order to eliminate
redundancy and the possibility of update
anomalies.
But these are guidelines only and don't always
have to be followed often we want to
de-normalise a database design.
NADDR(name, street, city, state, zip)
Name is PK and all others are FDs on name ...
assume also that zip -gt (city, state), i.e. city
names are unique within a state ...

NADDR is in 2NF and normalisation guidelines say
to decompose into nsz(name, street, zip) and
zcs(zip, city, state).
Both of these would be in 3NF but this is
impractical in practice.
The most regular use of the data is to get access
to all field values in one go and having to join
nsz and zcs for every address lookup is
inefficient ... so ... de-normalise.

Write a Comment

User Comments (0)