Relational Data Base Design in Practice - PowerPoint PPT Presentation

About This Presentation
Title:

Relational Data Base Design in Practice

Description:

A relational database scheme can be regarded as ... record the department selling perfume in Debenhams if it doesn't have a manager. ... – PowerPoint PPT presentation

Number of Views:42
Avg rating:3.0/5.0
Slides: 24
Provided by: meurig
Category:

less

Transcript and Presenter's Notes

Title: Relational Data Base Design in Practice


1
Relational Data Base Design in Practice
  • Normal Forms
  • More about Anomalies
  • Algorithms for Database Design

2
Database Design Concepts Review 1
  • A relational database scheme can be regarded as
  • defined by an abstract relation R(A1, A2, ...,
    An) where
  • A1, A2, ..., An are all the attributes of
    interest. This is
  • simplest possible relation scheme for the
    database.
  • In general, when setting up the database use a
  • decomposition of the relation scheme into
    subschemes.
  • Issues
  • what is the nature of this association?
  • does it matter what is associated in tables?
  • if so, what principles apply to the design of
    tables?

3
Database Design Concepts Review 2
  • Answer
  • the association of attributes should be guided by
    data dependency
  • failure to take account of dependency creates
    anomalies on modifying the database
  • anomalies can be largely eliminated through an
    appropriate choice of tables and normalisation
  • cf. an object-oriented approach, where attributes
    are grouped according to the objects that contain
    them

4
Database Design Concepts Review 3
  • The relational database design problem
  • Any relational database could in principle be
    organised as a single relation R(A1, A2, ..., An)
    where A1, A2, ..., An are all the attributes of
    interest.
  • Why not do it this way?
  • convenience easy to handle small relations
  • ... but also need to consider semantics of
    attributes
  • In practice, typically choose to decompose R into
    a set of smaller relations R1, R2, ..., Rk, where
    the set of attributes in R the set of
    attributes in R1, R2, ..., Rk.

5
Database Design Concepts Review 4
  • Relational database design theory serves to guide
    the choice of decomposition for a given relation
    scheme.
  • Use the data dependencies for this if no data
    dependencies, then no decomposition
  • Key concepts for studying decompositions
  • data dependency
  • lossless join
  • dependency preservation

6
Database Design Concepts Review 5
  • Anomalies illustrated by HVFC
  • Form of the SUPPLIER relation
  • SAIP (SNAME, SADDRESS, ITEM, PRICE)
  • leads to anomalies on update, deletion and
    insertion
  • update anomalies when supplier address is
    updated, must be updated in all tuples, else two
    addresses recorded for same supplier.
  • insertion anomalies can't record info on
    supplier unless he supplies part, whence ....
  • deletion anomalies delete all items supplied by
    one supplier and you lose supplier's address.

7
Database Design Concepts Review 6
  • Anomalies illustrated by HVFC (cont.)
  • Form of the SUPPLIER relation
  • SAIP (SNAME, SADDRESS, ITEM, PRICE)
  • leads to anomalies on update, deletion and
    insertion
  • A better design of the HVFC relation scheme will
  • resolve these problems. The decomposition
  • SA (SNAME, SADDRESS)
  • SIP (SNAME, ITEM, PRICE)
  • resolves the anomalies.

8
Normal Forms for Relational Schemes 0
  • Database design is essentially an informal
    activity.
  • It relies upon observation and analysis of
    external state.
  • Some patterns of dependency between attributes
    recur.
  • Normal forms (NFs) for relation schemes encode
    rules
  • for design that have been developed from
    experience.
  • They provide a basis for database design
    patterns.

9
Normal Forms for Relational Schemes 1
  • Boyce-Codd Normal Form (BCNF)
  • Use ltR,Fgt to denote relation scheme R(A1, A2,
    ..., An)
  • with set of dependencies generated by F
  • Definition ltR,Fgt is in BCNF if
  • for all sets of attributes X È A ? A1, A2,
    ..., An
  • XA holds in R and AÏX Þ X contains a key for R
  • Terminology
  • If X contains a key for R, call X is a superkey
    for R

10
Normal Forms for Relational Schemes 2
  • Examples relating to Boyce-Codd Normal Form
  • R is SCAIP(S, C, A , I, P) where S is SNAME,
  • C is CITY, A is AGENT, I is ITEM, P is PRICE
  • F is S C, C A, S I P
  • Not in BCNF, since C A, but C is not a superkey
  • SCAIP SIP SC CA is decomposition that
    is
  • lossless and dependency preserving such that
  • all the sub-schemes are in BCNF
  • Desirable kind of decomposition, but can't always
    achieve it .... need weaker NFs than BCNF

11
Normal Forms for Relational Schemes 3
  • Third Normal Form (3NF)
  • Relation scheme ltRº R(A1, A2, ..., An), Fgt is in
    3NF if
  • for all sets of attributes X È A ? A1, A2,
    ..., An
  • XA holds in R and AÏX
  • Þ either X is a superkey for R
  • or A is an attribute contained in a key for R
  • Definition If A is contained in a key for R then
    A is a prime attribute for R

12
Normal Forms for Relational Schemes 4
  • Example of Third Normal Form
  • Consider relation scheme STD, where S is local
    student id, T is tutor, D is department.
    Assuming that
  • "students in different departments can have same
    id"
  • have dependencies F T D, SD T
  • STD is not in BCNF T D, but T not superkey
  • STD is in 3NF D is a prime attribute in key SD

13
Normal Forms for Relational Schemes 5
  • Historical digression Second Normal Form
  • If ltR,Fgt is not in 3NF, then there is a
    functional dependency XA such that X is not a
    superkey and A is not a prime attribute
  • There are then two possibilities
  • either X is a proper subset of a key
  • or there is no key that contains X.
  • These two cases can be linked to types of anomaly.

14
Normal Forms for Relational Schemes 6
  • Historical digression Second Normal Form (cont)
  • If X is a proper subset of a key, then
  • X A is a partial dependency
  • Such dependency is linked with update anomalies
  • If X is not a proper subset of a key, then
  • X A is a transitive dependency
  • Such dependency is linked with insert/delete
    anomalies.

15
Normal Forms for Relational Schemes 7
  • Historical digression Second Normal Form (cont)
  • If X is not a proper subset of a key, then
  • X A is a transitive dependency
  • Use term transitive because if Y is a key, then
  • Y X A is a non-trivial chain of dependencies
  • X is not a proper subset of a key ...
  • \ X is not contained in Y,
  • A is not prime ...
  • \ A Ï Y,
  • X A is non-trivial,
  • \ A Ï X by hypothesis.

16
Normal Forms for Relational Schemes 8
  • Historical digression Second Normal Form (cont)
  • If X is not a proper subset of a key, then
  • X A is a transitive dependency
  • Definition ltR,Fgt is in second normal form (2NF)
    if
  • either R is in 3NF
  • or R is not in 3NF and has transitive
  • dependencies but no partial dependencies
  • partial dependencies ? update anomalies .
  • BUT
  • not all update anomalies ? partial dependencies

17
Normal Forms for Relational Schemes 9
  • Illustrative examples related to 2NF
  • Example 1. The SCA relation is in 2NF
  • C A is a transitive dependency in SCA.
  • C is not a subset of a key for SCA the only key
    is S
  • Example 2. The relation SAIP is not in 2NF
  • S A is a partial dependency in SAIP.
  • S is a subset of the key SI of SAIP

18
Normal Forms for Relational Schemes 10
  • Illustrative examples related to 2NF (cont.)
  • Example 3. Consider SIDM where S is store, I is
    item,
  • D is dept, M is manager with FDs SI D, SD
    M
  • The relation scheme SIDM is in 2NF but not in
    3NF
  • SD M, but SD isn't a superkey, and M isn't
    prime
  • Only key for SIDM is SI, so SD not contained in
    key
  • \ SD M is a transitive dependency in SIDM.

19
More about Anomalies 1
  • Two symptoms of anomalies
  • Different kinds of anomaly exhibit 2 different
    symptoms
  • redundancy
  • information loss inability to represent
    information
  • 1. redundancy
  • This leads to update anomalies
  • when an attribute of one tuple is updated, the
    same attribute must be updated in many places
  • E.g. in SAIP, have redundancy (s,a,i,p)
    (s,a,i',p')

20
More about Anomalies 2
  • Two symptoms of anomalies (cont.)
  • 2. loss of information / inability to represent
    info
  • leads to deletion/insertion anomaly
  • when a tuple is deleted essential information is
    incidentally lost no provision to store
    information in the absence of irrelevant
    additional details
  • E.g. In SIDM, can't record the department selling
    perfume in Debenhams if it doesn't have a
    manager.
  • In STD, no means to record dept of tutor unless
    the tutor has a student.

21
More about Anomalies 3
  • Commentary on the examples
  • Neither SAIP nor SIDM is in 3NF ...
  • In SAIP, S A is a partial dependency.
  • Choose a distinct pair of values (s, i) and (s,
    i') for the key SI (possible since S doesn't
    determine I). The two corresponding tuples will
    be distinct, but have the same value for
    attribute A.

22
More about Anomalies 4
  • Commentary on the examples (cont.)
  • Neither SAIP nor SIDM is in 3NF ...
  • In SIDM, SD M is a transitive dependency.
  • non-trivial chain of dependencies SI SD M.
    To store triple (s, i, d), need value of m
    determined by s and d. Without this can't record
    (s, i, d).
  • \ information loss deletion anomaly

23
More about Anomalies 5
  • Commentary on the examples (cont.)
  • Note that STD is in 3NF
  • In STD, the only key is SD. For SD T, SD is a
    superkey. For T D, though T is not a superkey,
    D is a prime attribute.
  • This shows that even relation schemes in 3NF can
    have certain types of update deletion/insertion
    anomaly.

24
More about Anomalies 6
  • Linking anomalies with partial / transitive
    dependencies
  • The examples informally illustrate a link between
    certain kinds of dependency and types of anomaly
  • dependency anomaly
  • partial redundancy / update
  • transitive deletion / insertion
  • Can interpret this link in more abstract terms .

25
More about Anomalies 7
  • Linking anomalies with partial / transitive
    dependencies
  • dependency anomaly
  • partial redundancy / update
  • transitive deletion / insertion
  • Suppose that X A is a partial dependency that
    arises from violation of 3NF. Then A isn't
    prime attribute, and X is a subset of a key Y.
  • X not a key Þ tuples that agree on X but
    disagree on some attribute B in Y. Have distinct
    tuples that must have the same value for
    attribute A, as X A
  • \ redundancy update anomaly on updating A.

26
More about Anomalies 8
  • Linking anomalies with partial / transitive
    dependencies
  • dependency anomaly
  • partial redundancy / update
  • transitive deletion / insertion
  • Suppose that X A is a transitive dependency
    that arises from violation of 3NF. non-trivial
    chain of dependencies Y X A, where Y is a
    key.
  • To record which values of attributes in X are
    associated with a given set of attributes in Y
    must know what value of A is associated with
    given values of attributes in X.
  • \ information loss deletion anomaly

27
More about Anomalies 9
  • Linking anomalies with partial / transitive
    dependencies
  • dependency anomaly
  • partial redundancy / update
  • transitive deletion / insertion
  • Note
  • where there is partial dependency will also have
    deletion / insertion anomalies e.g. SAIP - cant
    record supplier address without item and price
  • where there is transitive dependency may also
    have update anomalies e.g. SCA - updating agent
    for a city entails updating all records with
    suppliers in that city

28
Desirable properties of decompositions review 1
  • Lossless decompositions
  • A decomposition of the relation scheme R into
    subschemes R1, R2, ..., Rn is lossless if, given
    tuples r1, r2, ..., rn in R1, R2, ..., Rn
    respectively, such that ri and rj agree on all
    common attributes for all pairs of indices (i,j),
    the - uniquely defined - tuple derived by joining
    r1, r2, ..., rn is in R.
  • Terminology "lossless join" decomposition

29
Desirable properties of decompositions review 2
  • Dependency preserving decompositions
  • A decomposition of the relation scheme R into
    subschemes R1, R2, ..., Rn is dependency
    preserving if all the FDs within R can be derived
    from those within the relations R1, R2, ..., Rn.
  • If F is the set of dependencies defined on R,
    then the requirement is that the set G of
    dependencies that can be obtained as projections
    of dependencies in F onto R1, R2, ..., Rn
    together generate F.
  • Note carefully that it is not enough to check
    whether projections of dependencies in F onto R1,
    R2, ..., Rn together generate F.

30
Minimal cover review
  • Representing the set of dependencies (cont.)
  • Definition G is a minimal cover for F if
  • a) G F
  • b) every RHS in G is a single attribute
  • c) every LHS minimal subject to determining RHS
  • i.e. for no X Y Î G is there a proper subset
    Z of X such that G \ X Y È Z Y
    generates F
  • d) no proper subset of G also generates F.
  • Minimal cover isn't necessarily unique.

31
Lossless Join Decomposition review
  • Theorem
  • If r S, T is a decomposition of R, and F is
    the set of FDs for R, then r is a lossless join
    decomposition with respect to F if and only if
  • either T\S ? (S Ç T) or S\T ? (S Ç T).
  • Corollary to the theorem If R is a relation
    scheme, and X A is a functional dependency in
    R, where A is a an attribute, X is a set of
    attributes not containing A, and XA is a proper
    subset of R, then R1XA, R2R\A is a lossless
    join decomposition of R.

32
Some decomposition algorithms 1
  • Algorithm 1 Decomposing a relation scheme as a
    lossless join of BCNF subschemes
  • Suppose R is a relation scheme that is not in
    BCNF
  • Take X A, where X is not a superkey and A Ï X
  • Decompose R as S È T, where S AX and T R\A.
  • X not a superkey Þ S and T are proper subsets of
    R
  • where S Ç T X, and X A S\T
  • so decomposition of R as S È T is a lossless
    join.
  • \ R lossless join of arbitrarily small
    subschemes
  • Every scheme with at most 2 attributes is in BCNF
    ...

33
Some decomposition algorithms 2
  • Algorithm 1 Decomposing a relation scheme as a
    lossless join of BCNF subschemes (cont.)
  • R lossless join of arbitrarily small
    subschemes
  • Every scheme with at most 2 attributes is in BCNF
    ...
  • Theorem 1
  • Every relation scheme can be expressed as a
    lossless join of BCNF relation schemes.

34
Some decomposition algorithms 3
  • Example A lossless decomposition into BCNF
  • Consider the relation scheme CTHRSG, where
  • Ccourse, Tteacher, Rroom, Sstudent, Ggrade,
  • subject to the dependencies
  • C T course determines teacher
  • HR C hour and room determine the course
  • HT R hour and teacher determine the room
  • CS G course and student determine the grade
  • HS R hour and student determine the room

35
Some decomposition algorithms 4
  • Example A lossless decomposition into BCNF
    (cont.)
  • Consider the relation scheme CTHRSG with
    dependencies
  • C T, HR C, HT R, CS G, HS R
  • Using Algorithm 1, arrive at a decomposition into
    the BCNF subschemes CSG, CT, CHR, CHS NB CH
    R.
  • This decomposition doesn't preserve dependencies
  • HT R
  • is not consequence of the dependencies on the
    subschemes CSG, CT, CHR and CHS.
  • There is no decomposition into BCNF that is both
    lossless join and dependency preserving in
    general.

36
Some decomposition algorithms 5
  • Algorithm 2 Dependency preserving decomposition
    into 3NF subschemes
  • Given (R, F) and G a minimal cover for F.
  • If an attribute is not involved in any dependency
    in G,
  • it can form a relation scheme by itself.
  • Suppose R? is the set of attributes involved in
    G.
  • If a dependency in G involves all the attributes
    in R?, then R? is in 3NF,
  • else form relational sub-schemes Y1B1, Y2B2, ...,
    YnBn
  • for each of the dependencies
  • Y1B1, Y2B2, ..., YnBn in minimal cover G.

37
Some decomposition algorithms 6
  • Theorem 2
  • Algorithm 2 generates a dependency preserving
    decomposition into 3NF subschemes
  • Proof of Theorem 2
  • Projected dependencies involve a cover for F, so
    the decomposition is dependency-preserving.
  • Claim that each sub-scheme YB where Y B belongs
    to the minimal cover G for F is in 3NF. Must
    show
  • if XA?YB, where X A is non-trivial dependency,
    then
  • either X is a superkey of YB
  • or A is a prime attribute of YB.

38
Some decomposition algorithms 7
  • Proof of Theorem 2 (cont.)
  • if XA?YB, where X A is non-trivial
    dependency, then
  • either X is a superkey of YB
  • or A is a prime attribute of YB.
  • If A?B, then A Î Y. But Y is a key for YB, since
    Y B
  • and no proper subset of Y determines B as G is a
    minimal cover. Hence A ? B Þ A is prime.
  • If AB, then X is a subset of Y such that X B,
    and since G is a minimal cover, XY.
  • Hence AB Þ X is a superkey for YB.

39
Some decomposition algorithms 8
  • Algorithm 3 Dependency preserving, lossless join
    decomposition into 3NF subschemes
  • Apply Algorithm 2 to obtain a dependency-preservin
    g
  • decomposition s of R with dependencies F. Now
    let Z
  • be a key for R, and introduce Z as an additional
  • subscheme to derive the decomposition t s È
    Z.
  • Theorem 3
  • The decomposition t of R is both lossless join
    and dependency preserving into 3NF subschemes

40
Some decomposition algorithms 9
  • Sketch proof of Theorem 3
  • The decomposition t of R is both lossless join
    and dependency preserving into 3NF subschemes
  • to prove lossless join apply lossless join
    algorithm
  • Construct a table with rows ? Y1B1, Y2B2, ...,
    YnBn
  • where FDs Y1B1, Y2B2, ..., YnBn are a minimal
  • cover G, together with row ? Z where Z is a key
    for R.
  • Call these rows row1, row2, ..., rown, and rown1

41
Some decomposition algorithms 10
  • Sketch proof of Theorem 3 (cont)
  • Construct a table with rows ? Y1B1, Y2B2, ...,
    YnBn where
  • FDs Y1B1, Y2B2, ..., YnBn are a minimal
    cover G,
  • together with a row ? Z where Z is a key for R.
  • Call these rows row1, row2, ..., rown, and
    rown1
  • In rown1, have as in all columns associated
    with key Z
  • Suppose that theres a b in the kth column in
    rown1.
  • Z is a key, so there must be a way of inferring
    the value
  • at the location rown1k from the a values in
    rown1 using
  • the FDs in the minimal cover G in some sequence.
  • Tracing this sequence via matchings on the table
    will
  • then convert the entry at location rown1k to an
    a.

42
Some decomposition algorithms 12
  • Split R ABCDE into R1AB, R2BC, R3CDE,
    R4ACE, ZBDE
  • with the FDs A B, B C, DE C, CE A in
    minimal cover G

43
Some decomposition algorithms 13
  • Split R ABCDE into AB, BC, CDE, ACE, BDE where
  • AB, BC, DEC, CEA are a minimal cover (G)
  • and BDE is a key (Z)
  • Have a b in column 1
  • (k 3), can infer that
  • BDE A, since
  • BC and CEA
  • Can trace these inferences by matching rows 2 and
    5,
  • then matching rows 4 and 5 to replace b51 by a1
    etc.

44
Some decomposition algorithms 14
  • Split R ABCDE into R1AB, R2BC, R3CDE,
    R4ACE, ZBDE
  • with the FDs A B, B C, DE C, CE A in
    minimal cover G

lossless
Write a Comment
User Comments (0)
About PowerShow.com