Databases I H.10H.11 Summary FDs and Normalisation PowerPoint PPT Presentation

presentation player overlay
1 / 30
About This Presentation
Transcript and Presenter's Notes

Title: Databases I H.10H.11 Summary FDs and Normalisation


1
Databases I (H.10/H.11)Summary FDs and
Normalisation
versie 2007
Steven KlusenerVrije Universiteit, Amsterdam
2
Topics
  • Functional dependencies (FDs)
  • Formal definition of an FD
  • Deriving FDs from others, or, constructing a
    minimal cover
  • Showing counter examples
  • Candidate and primary keys, based on FDs
  • Normalisation 2 NF, 3 NF and BCNF
  • Decomposition and the lossless join property
  • A 3NF decomposition algorithm that satisfies the
    lossless join property.

3
Functional Dependency (FD)
  • Given
  • A relation R with attributes A1, A2, , An
  • X ? A1, A2, , An and Y ? A1, A2, , An
  • Y depends functionally on X (notation X ? Y)iff
    for each possible extension of R it holds that
  • ?t1,t2 ? R (t1X t2X) ? (t1Y t2Y)
  • Note that if all tuples in a R have a different
    value for X, then the FD X ? Y is trivially
    satisfied

4
Minimal set of FDs
  • Certain functional dependencies can be derived
    from others
  • For example
  • ENR ? BDATE
  • (ENR value E1 (John) determines the birth date
    1964-08-28)
  • BDATE ? ZODIAC zodiac sterrenbeeld
  • (The birth date 1964-08-28 determines the zodiac
    value Virgin)
  • Hence, each employee number determines the zodiac
    sign of that employee, so we have ENR ? ZODIAC,
    which follows from ENR ? BDATE and BDATE ? ZODIAC
    (by transitivity as we will see later)

5
Derivability of FDs, closure
  • We can derive FDs from others using the
    following inteference rules
  • Reflexive Rule the FD X ? Y is given, and hence
    derivable,
  • for any Y ? X.
  • Augmentation Rule the FD XZ ? YZ is derivable
    from
  • the FD X ? Y.
  • Transitive Rule the FD X ? Z is derivable from
  • the two FDs X ? Y and Y? Z
  • From these rules other interference rules can be
    obtained, such as
  • Decomposition Rule the FD X ? Y is derivable
    from
  • the FD X ? YZ.
  • Additive Rule the composed FD X ? YZ is
    derivable from
  • the two FDs X ? Y and X ? Z.
  • From a set of FDs F we write F for the set of
    all FDs that can be derived from F, F is called
    the closure of F

6
Minimal set of FDs
  • A set F of functional dependencies is minimal
    iff
  • Every FD in F is of the form X ? A, where A is an
    attribute
  • We cannot make any left-hand-side smaller, so if
    a rule XY ? A is simplified into X ? A then the
    closure gets strictly smaller
  • We cannot remove FDs from F without losing FDs
    in the closure of F
  • In general, the construction of a minimal cover
    does not lead to unique one

7
Keys
  • Given
  • A relation R with attributes A1, A2, , An
  • A minimal set of FDs F
  • A set of attributes X ? A1, A2, , An
  • then
  • X is a (candidate) key of R iff for each i in
    1,..,n
  • 1) identification X ? Ai is derivable from F èn
  • 2) minimalitity if Y?X ? Y?X then Y ? Ai is not
    derivable from F

8
Some terminology w.r.t. keys
  • superkey (identification, but no minimality)
  • Like (ENR,NAME)
  • candidate key (a candidate, see previous
    slide)
  • Like (ENR)
  • primary key (the selected candidate key)
  • alternate/alternative key (all remaining
    candidate keys)

9
Categorizing FDs, based on the key
  • Given a relation R and a minimal set of FDs F
  • An attribute A is a prime attribute if A ? K, for
    some candidate key K, otherwise it is a non-prime
    attribute
  • If attr. A is a non-prime attribute, then X ? A
  • is a regular FD if X is a candidate key
  • is a partial FD if X ? K, for some candiate key K
  • is a transitive FD if all atributes in X are
    non-prime
  • If attr. A is a prime attribute, then X ? A is a
    prime FD

10
Normal forms
  • Given a relation R, a minimal set of FDs F
  • R is in
  • 2NF, if there are no partial FDs
  • 3NF, there are no partial and no transitive FDs
  • BCNF, if R is in 3NF, ànd if for every X ? A in F
    it holds that X is a key

11
2 NF
  • 2 NF no partial FDs are allowed, if they occur,
    then the relation has to be decomposed.
  • EMP_PROJ(SSN,PNUMBER,HOURS,ENAME,PNAME,PLOC)
  • With FDs
  • SSN, PNUMBER ? HOURS
  • SSN ? ENAME (partial dependency)
  • PNUMBER ? PNAME, PLOC (partial dependency)
  • After decomposition
  • EMP_PROJ1 (SSN,PNUMBER, HOURS)
  • EMP (SSN, ENAME)
  • PROJ (PNUMBER, PNAME, PLOC)
  • Note
  • If the key contains only one attribute, the 2NF
    property holds trivially
  • Be sure that you have lossless projection (to be
    discussed later)

12
3 NF
  • 3 NF no transitive FDs are allowed, if they
    occur, then the relation has to be decomposed.
  • EMP_DEPT(SSN, ENAME, BDATE, DNUMBER, DNAME,
    DMGRSSN)
  • With FDs
  • SSN ? ENAME, BDATE, DNUMBER
  • DNUMBER ? DNAME, DMGRSSN (transitive
    dependency)
  • After decomposition
  • EMP (SSN, ENAME, BDATE, DNUMBER)
  • DEPT (DNUMBER, DNAME, DMGRSSN)
  • Rephrasing the 3 NF property for every non-prime
    attribute A and FD X ? A in F, X must be a
    candidate key

13
BCNF
  • For every FD X ? A, X must be a superkey
  • TEACH(STUDENT, COURSE, INSTRUCTOR)
  • With FDs
  • STUDENT, COURSE ? INSTRUCTOR
  • INSTRUCTOR ? COURSE (INSTRUCTOR is not a key)
  • Decomposition is not trivial
  • S-I-1(STUDENT, INSTRUCTOR) and S-C-1(STUDENT,
    COURSE)
  • C-I-2(INSTRUCTOR, COURSE) and C-S-2(COURSE,
    STUDENT)
  • I-C-3(INSTRUCTOR, COURSE) and I-S-3(INSTRUCTOR,
    STUDENT)
  • Conclusion
  • FD1 is lost in all three cases (no subrelation
    contains all three attributes)
  • I-C-3/I-S-3 is the best, because it is
    non-additive (to be discussed later)

14
Dependency preservation during decomposition
  • During normalisation we decompose relations into
    subrelations, until we arrive at the right level
  • However, we have to take care this decomposition
    guarantees
  • Lossless joins we must be able to construct the
    original relation from joining the subrelations
  • In other words, joining the subrelations may not
    introduce spurious tuples

15
Example Natural Join (?)
  • EMP DEPT
  • E ENAME BDATE D D DNAME
    BUDGET
  • E1 John 28-08-1964 D1 D1
    engineering 500,000
  • E2 Joe 04-04-1968 D1 D2 sales
    200,000
  • E3 Jack 03-09-1969 D1
  • E4 Will 21-03-1971 D2
  • E5 Bridget 22-01-1972 D2
  • EMP ? DEPT Join of EMP and DEPT, combine every
    tuple from EMP and tuple of DEPT if their common
    attributes (here D) have the same value
  • EMP ? DEPT
  • E ENAME BDATE D NAME BUDGET
  • E1 John 28-08-1964 D1 engineering 500,000
  • E2 Joe 04-04-1968 D1 engineering 500,000
  • E3 Jack 03-09-1969 D1 engineering 500,000
  • E4 Will 21-03-1971 D2 sales 200,000
  • E5 Bridget 22-01-1972 D2 sales 200,000

16
(No Transcript)
17
(No Transcript)
18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
Dependency preservation during decomposition
  • During normalisation we decompose relations into
    subrelations, until we arrive at the right level
  • However, we have to take care this decomposition
    guarantees
  • Lossless joins we must be able to construct the
    original relation from joining the subrelations
  • In other words, joining the subrelations may not
    introduce spurious tuples

22
Lossless-join, decomp. into 2 projections
  • D is a lossless join decomposition w.r.t. F iff
  • (R1 ? R2) ? R1 ? F, òr
  • (R1 ? R2) ? R2 ? F
  • R DPD_EMP E, DPD_N, REL, EMP_N, BDATE,
    Dwith F E, DPD_N ? REL, E ?
    EMP_N, E ? BDATE, E ? D
    R1 DPD E, DPD_N, RELR2 EMP E,
    EMP_N, BDATE, DR1 ? R2 E and E ?
    E, EMP_N, BDATE, D (EMP)Hence this
    decomposition of DPD_EMP is lossless w.r.t. F

23
Ex. lossless-join (with n projections) (1/5)
  • DPD_EMP_DPM (E, DPD_N, REL, EMP_N, BDATE,
    D, DPM_N, BUDGET)
  • with F E, DPD_N ? REL, D ? DPM_N,
    E ? EMP_N, D ? BUDGET, E ?
    BDATE, DPM_N ? D, E ? D, DPM_N ?
    BUDGET DPD (E, DPD_N, REL)EMP (E,
    EMP_N, BDATE, D)DEPT (D, DPM_N, BUDGET)

24
Ex. lossless-join (with n projections) (2/5)
  • DPD_EMP_DPM (E, DPD_N, REL, EMP_N,
    BDATE, D, DPM_N, BUDGET)
  • with F E, DPD_N ? REL, D ? DPM_N, E
    ? EMP_N, D ? BUDGET, E ? BDATE, DPM_N
    ? D, E ? D, DPM_N ? BUDGET
  • INITIAL MATRIX ai for each relation/row that
    has a certain attribute (whit index i)
  • E DPD_N REL EMP_N BDATE D DPM_N BUDGET
  • DPD a1 a2 a3 b14 b15 b16 b17 b18
  • EMP a1 b22 b23 a4 a5 a6 b27 b28
  • DEPT b31 b32 b33 b34 b35 a6 a7
    a8

25
Ex. lossless-join (with n projections) (3/5)
  • DPD_EMP_DPM (E, DPD_N, REL, EMP_N,
    BDATE, D, DPM_N, BUDGET)
  • met F E, DPD_N ? REL, D ? DPM_N, E ?
    EMP_N, D ? BUDGET, E ? BDATE, DPM_N ?
    D, E ? D, DPM_N ? BUDGET
  • E DPD_N REL EMP_N BDATE D DPM_N BUDGET
  • DPD a1 a2 a3 b14 b15 b16 b17 b18
  • EMP a1 b22 b23 a4 a5 a6 b27 b28
  • DEPT b31 b32 b33 b34 b35 a6 a7
    a8
  • N.B. After applying the FD E ? EMP_N, BDATE, D

26
Ex. lossless-join (with n projections) (4/5)
  • DPD_EMP_DPM (E, DPD_N, REL, EMP_N,
    BDATE, D, DPM_N, BUDGET)
  • met F E, DPD_N ? REL, D ? DPM_N, E ?
    EMP_N, D ? BUDGET, E ? BDATE, DPM_N ?
    D, E ? D, DPM_N ? BUDGET
  • E DPD_N REL EMP_N BDATE D DPM_N BUDGET
  • DPD a1 a2 a3 a4 a5 a6 b17 b18
  • EMP a1 b22 b23 a4 a5 a6 b27 b28
  • DEPT b31 b32 b33 b34 b35 a6 a7
    a8
  • N.B. After applying the FD E ? EMP_N, BDATE, D

27
Ex. lossless-join (with n projections) (5/5)
  • DPD_EMP_DPM (E, DPD_N, REL, EMP_N,
    BDATE, D, DPM_N, BUDGET)
  • met F E, DPD_N ? REL, D ? DPM_N, E ?
    EMP_N, D ? BUDGET, E ? BDATE, DPM_N ?
    D, E ? D, DPM_N ? BUDGET
  • E DPD_N REL EMP_N BDATE D DPM_N BUDGET
  • DPD a1 a2 a3 a4 a5 a6 a7 a8
  • EMP a1 b22 b23 a4 a5 a6 a7 a8
  • DEPT b31 b32 b33 b34 b35 a6 a7
    a8
  • N.B. After applying the FDs E ? EMP_N, BDATE,
    D and D ?
    DPM_N, BUDGET
  • Lossless-join eigenschap has been shown row DPD
    contains only as!

28
Algorithm lossless-join
  • Does the decomposition R1, , Rk of R satisfy
    the lossless-join property w.r.t. m.b.t. A set of
    FDs F?
  • 1) construct the initial matrix, consisting of
  • A row for each subrelation Ri
  • A column for each attribute of R
  • For each entry, thus for each row i and column j
  • Put ai in the entry, if row i has the attribute
    of column j
  • Otherwise, put bij in this entry
  • 2) apply each FD X ? Y ? F (see below) until
  • Either one row is of the form a1, , an,, and the
    composition is lossless indeed
  • Or, applying the FDs again will not change the
    matrix anymore, and the composition is not
    lossless
  • Applying FD X ? YFor all rows that have the
    same values for the attributes in X, make the
    attributes in Y equal as well (first start with
    the aj, otherwise try the bij)

29
3NF decomposition algoritme (lossless d.p.)
  • Given a relation scheme R and a set of FDs F
  • Construct a minimal cover of F, call it G
  • For each Xi ? Ai in G, construct a subrelation
    scheme XiAi
  • If Xi Xj for two subrelation schemes (XiAi and
    XiAj), merge them into XiAiAj.
  • Construct one relation scheme X, where X is the
    prime key of R.
  • Check whether there are still remaing attributes
    (which are not yet covered by the earlier steps),
    put them in separate subrelation schemes
  • N.B. Steps 2 and 3 can also be combined into one
    step.

30
Final remarks
  • With this material one must be capable to
    recognize potential redundancies, and to avoid
    them to a certain extent
  • Normalisation can lead to a large number of
    small tables (i.e., tables with a small number of
    attributes) which have to be combined with others
    to obtain proper data. This may cost performance
    and maintenance overhead.
  • Hence, it is up to the database designer to
    decide whether these potential redundancies have
    to be resolved or not.
Write a Comment
User Comments (0)
About PowerShow.com