Databases I H.10H.11 Summary FDs and Normalisation presentation

About This Presentation

Transcript and Presenter's Notes

Title: Databases I H.10H.11 Summary FDs and Normalisation

1
Databases I (H.10/H.11)Summary FDs and
Normalisation
versie 2007
Steven KlusenerVrije Universiteit, Amsterdam
2
Topics

Functional dependencies (FDs)
Formal definition of an FD
Deriving FDs from others, or, constructing a
minimal cover
Showing counter examples
Candidate and primary keys, based on FDs
Normalisation 2 NF, 3 NF and BCNF
Decomposition and the lossless join property
A 3NF decomposition algorithm that satisfies the
lossless join property.

3
Functional Dependency (FD)

Given
A relation R with attributes A1, A2, , An
X ? A1, A2, , An and Y ? A1, A2, , An
Y depends functionally on X (notation X ? Y)iff
for each possible extension of R it holds that
?t1,t2 ? R (t1X t2X) ? (t1Y t2Y)
Note that if all tuples in a R have a different
value for X, then the FD X ? Y is trivially
satisfied

4
Minimal set of FDs

Certain functional dependencies can be derived
from others
For example
ENR ? BDATE
(ENR value E1 (John) determines the birth date
1964-08-28)
BDATE ? ZODIAC zodiac sterrenbeeld
(The birth date 1964-08-28 determines the zodiac
value Virgin)
Hence, each employee number determines the zodiac
sign of that employee, so we have ENR ? ZODIAC,
which follows from ENR ? BDATE and BDATE ? ZODIAC
(by transitivity as we will see later)

5
Derivability of FDs, closure

We can derive FDs from others using the
following inteference rules
Reflexive Rule the FD X ? Y is given, and hence
derivable,
for any Y ? X.
Augmentation Rule the FD XZ ? YZ is derivable
from
the FD X ? Y.
Transitive Rule the FD X ? Z is derivable from
the two FDs X ? Y and Y? Z
From these rules other interference rules can be
obtained, such as
Decomposition Rule the FD X ? Y is derivable
from
the FD X ? YZ.
Additive Rule the composed FD X ? YZ is
derivable from
the two FDs X ? Y and X ? Z.
From a set of FDs F we write F for the set of
all FDs that can be derived from F, F is called
the closure of F

6
Minimal set of FDs

A set F of functional dependencies is minimal
iff
Every FD in F is of the form X ? A, where A is an
attribute
We cannot make any left-hand-side smaller, so if
a rule XY ? A is simplified into X ? A then the
closure gets strictly smaller
We cannot remove FDs from F without losing FDs
in the closure of F
In general, the construction of a minimal cover
does not lead to unique one

7
Keys

Given
A relation R with attributes A1, A2, , An
A minimal set of FDs F
A set of attributes X ? A1, A2, , An
then
X is a (candidate) key of R iff for each i in
1,..,n
1) identification X ? Ai is derivable from F èn
2) minimalitity if Y?X ? Y?X then Y ? Ai is not
derivable from F

8
Some terminology w.r.t. keys

superkey (identification, but no minimality)
Like (ENR,NAME)
candidate key (a candidate, see previous
slide)
Like (ENR)
primary key (the selected candidate key)
alternate/alternative key (all remaining
candidate keys)

9
Categorizing FDs, based on the key

Given a relation R and a minimal set of FDs F
An attribute A is a prime attribute if A ? K, for
some candidate key K, otherwise it is a non-prime
attribute
If attr. A is a non-prime attribute, then X ? A
is a regular FD if X is a candidate key
is a partial FD if X ? K, for some candiate key K
is a transitive FD if all atributes in X are
non-prime
If attr. A is a prime attribute, then X ? A is a
prime FD

10
Normal forms

Given a relation R, a minimal set of FDs F
R is in
2NF, if there are no partial FDs
3NF, there are no partial and no transitive FDs
BCNF, if R is in 3NF, ànd if for every X ? A in F
it holds that X is a key

11
2 NF

2 NF no partial FDs are allowed, if they occur,
then the relation has to be decomposed.
EMP_PROJ(SSN,PNUMBER,HOURS,ENAME,PNAME,PLOC)
With FDs
SSN, PNUMBER ? HOURS
SSN ? ENAME (partial dependency)
PNUMBER ? PNAME, PLOC (partial dependency)
After decomposition
EMP_PROJ1 (SSN,PNUMBER, HOURS)
EMP (SSN, ENAME)
PROJ (PNUMBER, PNAME, PLOC)
Note
If the key contains only one attribute, the 2NF
property holds trivially
Be sure that you have lossless projection (to be
discussed later)

12
3 NF

3 NF no transitive FDs are allowed, if they
occur, then the relation has to be decomposed.
EMP_DEPT(SSN, ENAME, BDATE, DNUMBER, DNAME,
DMGRSSN)
With FDs
SSN ? ENAME, BDATE, DNUMBER
DNUMBER ? DNAME, DMGRSSN (transitive
dependency)
After decomposition
EMP (SSN, ENAME, BDATE, DNUMBER)
DEPT (DNUMBER, DNAME, DMGRSSN)
Rephrasing the 3 NF property for every non-prime
attribute A and FD X ? A in F, X must be a
candidate key

13
BCNF

For every FD X ? A, X must be a superkey
TEACH(STUDENT, COURSE, INSTRUCTOR)
With FDs
STUDENT, COURSE ? INSTRUCTOR
INSTRUCTOR ? COURSE (INSTRUCTOR is not a key)
Decomposition is not trivial
S-I-1(STUDENT, INSTRUCTOR) and S-C-1(STUDENT,
COURSE)
C-I-2(INSTRUCTOR, COURSE) and C-S-2(COURSE,
STUDENT)
I-C-3(INSTRUCTOR, COURSE) and I-S-3(INSTRUCTOR,
STUDENT)
Conclusion
FD1 is lost in all three cases (no subrelation
contains all three attributes)
I-C-3/I-S-3 is the best, because it is
non-additive (to be discussed later)

14
Dependency preservation during decomposition

During normalisation we decompose relations into
subrelations, until we arrive at the right level
However, we have to take care this decomposition
guarantees
Lossless joins we must be able to construct the
original relation from joining the subrelations
In other words, joining the subrelations may not
introduce spurious tuples

15
Example Natural Join (?)

EMP DEPT
E ENAME BDATE D D DNAME
BUDGET
E1 John 28-08-1964 D1 D1
engineering 500,000
E2 Joe 04-04-1968 D1 D2 sales
200,000
E3 Jack 03-09-1969 D1
E4 Will 21-03-1971 D2
E5 Bridget 22-01-1972 D2
EMP ? DEPT Join of EMP and DEPT, combine every
tuple from EMP and tuple of DEPT if their common
attributes (here D) have the same value
EMP ? DEPT
E ENAME BDATE D NAME BUDGET
E1 John 28-08-1964 D1 engineering 500,000
E2 Joe 04-04-1968 D1 engineering 500,000
E3 Jack 03-09-1969 D1 engineering 500,000
E4 Will 21-03-1971 D2 sales 200,000
E5 Bridget 22-01-1972 D2 sales 200,000

16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
Dependency preservation during decomposition

During normalisation we decompose relations into
subrelations, until we arrive at the right level
However, we have to take care this decomposition
guarantees
Lossless joins we must be able to construct the
original relation from joining the subrelations
In other words, joining the subrelations may not
introduce spurious tuples

22
Lossless-join, decomp. into 2 projections

D is a lossless join decomposition w.r.t. F iff
(R1 ? R2) ? R1 ? F, òr
(R1 ? R2) ? R2 ? F
R DPD_EMP E, DPD_N, REL, EMP_N, BDATE,
Dwith F E, DPD_N ? REL, E ?
EMP_N, E ? BDATE, E ? D
R1 DPD E, DPD_N, RELR2 EMP E,
EMP_N, BDATE, DR1 ? R2 E and E ?
E, EMP_N, BDATE, D (EMP)Hence this
decomposition of DPD_EMP is lossless w.r.t. F

23
Ex. lossless-join (with n projections) (1/5)

DPD_EMP_DPM (E, DPD_N, REL, EMP_N, BDATE,
D, DPM_N, BUDGET)
with F E, DPD_N ? REL, D ? DPM_N,
E ? EMP_N, D ? BUDGET, E ?
BDATE, DPM_N ? D, E ? D, DPM_N ?
BUDGET DPD (E, DPD_N, REL)EMP (E,
EMP_N, BDATE, D)DEPT (D, DPM_N, BUDGET)

24
Ex. lossless-join (with n projections) (2/5)

DPD_EMP_DPM (E, DPD_N, REL, EMP_N,
BDATE, D, DPM_N, BUDGET)
with F E, DPD_N ? REL, D ? DPM_N, E
? EMP_N, D ? BUDGET, E ? BDATE, DPM_N
? D, E ? D, DPM_N ? BUDGET
INITIAL MATRIX ai for each relation/row that
has a certain attribute (whit index i)
E DPD_N REL EMP_N BDATE D DPM_N BUDGET
DPD a1 a2 a3 b14 b15 b16 b17 b18
EMP a1 b22 b23 a4 a5 a6 b27 b28
DEPT b31 b32 b33 b34 b35 a6 a7
a8

25
Ex. lossless-join (with n projections) (3/5)

DPD_EMP_DPM (E, DPD_N, REL, EMP_N,
BDATE, D, DPM_N, BUDGET)
met F E, DPD_N ? REL, D ? DPM_N, E ?
EMP_N, D ? BUDGET, E ? BDATE, DPM_N ?
D, E ? D, DPM_N ? BUDGET
E DPD_N REL EMP_N BDATE D DPM_N BUDGET
DPD a1 a2 a3 b14 b15 b16 b17 b18
EMP a1 b22 b23 a4 a5 a6 b27 b28
DEPT b31 b32 b33 b34 b35 a6 a7
a8
N.B. After applying the FD E ? EMP_N, BDATE, D

26
Ex. lossless-join (with n projections) (4/5)

DPD_EMP_DPM (E, DPD_N, REL, EMP_N,
BDATE, D, DPM_N, BUDGET)
met F E, DPD_N ? REL, D ? DPM_N, E ?
EMP_N, D ? BUDGET, E ? BDATE, DPM_N ?
D, E ? D, DPM_N ? BUDGET
E DPD_N REL EMP_N BDATE D DPM_N BUDGET
DPD a1 a2 a3 a4 a5 a6 b17 b18
EMP a1 b22 b23 a4 a5 a6 b27 b28
DEPT b31 b32 b33 b34 b35 a6 a7
a8
N.B. After applying the FD E ? EMP_N, BDATE, D

27
Ex. lossless-join (with n projections) (5/5)

DPD_EMP_DPM (E, DPD_N, REL, EMP_N,
BDATE, D, DPM_N, BUDGET)
met F E, DPD_N ? REL, D ? DPM_N, E ?
EMP_N, D ? BUDGET, E ? BDATE, DPM_N ?
D, E ? D, DPM_N ? BUDGET
E DPD_N REL EMP_N BDATE D DPM_N BUDGET
DPD a1 a2 a3 a4 a5 a6 a7 a8
EMP a1 b22 b23 a4 a5 a6 a7 a8
DEPT b31 b32 b33 b34 b35 a6 a7
a8
N.B. After applying the FDs E ? EMP_N, BDATE,
D and D ?
DPM_N, BUDGET
Lossless-join eigenschap has been shown row DPD
contains only as!

28
Algorithm lossless-join

Does the decomposition R1, , Rk of R satisfy
the lossless-join property w.r.t. m.b.t. A set of
FDs F?
1) construct the initial matrix, consisting of
A row for each subrelation Ri
A column for each attribute of R
For each entry, thus for each row i and column j
Put ai in the entry, if row i has the attribute
of column j
Otherwise, put bij in this entry
2) apply each FD X ? Y ? F (see below) until
Either one row is of the form a1, , an,, and the
composition is lossless indeed
Or, applying the FDs again will not change the
matrix anymore, and the composition is not
lossless
Applying FD X ? YFor all rows that have the
same values for the attributes in X, make the
attributes in Y equal as well (first start with
the aj, otherwise try the bij)

29
3NF decomposition algoritme (lossless d.p.)

Given a relation scheme R and a set of FDs F
Construct a minimal cover of F, call it G
For each Xi ? Ai in G, construct a subrelation
scheme XiAi
If Xi Xj for two subrelation schemes (XiAi and
XiAj), merge them into XiAiAj.
Construct one relation scheme X, where X is the
prime key of R.
Check whether there are still remaing attributes
(which are not yet covered by the earlier steps),
put them in separate subrelation schemes
N.B. Steps 2 and 3 can also be combined into one
step.

30
Final remarks

With this material one must be capable to
recognize potential redundancies, and to avoid
them to a certain extent
Normalisation can lead to a large number of
small tables (i.e., tables with a small number of
attributes) which have to be combined with others
to obtain proper data. This may cost performance
and maintenance overhead.
Hence, it is up to the database designer to
decide whether these potential redundancies have
to be resolved or not.

Write a Comment

User Comments (0)

About PowerShow.com

Databases I H.10H.11 Summary FDs and Normalisation PowerPoint PPT Presentation