Title: Relational Data Base Design in Practice
1Relational Data Base Design in Practice
- Normal Forms
- More about Anomalies
- Algorithms for Database Design
2Database Design Concepts Review 1
- A relational database scheme can be regarded as
- defined by an abstract relation R(A1, A2, ...,
An) where - A1, A2, ..., An are all the attributes of
interest. This is - simplest possible relation scheme for the
database. - In general, when setting up the database use a
- decomposition of the relation scheme into
subschemes. - Issues
- what is the nature of this association?
- does it matter what is associated in tables?
- if so, what principles apply to the design of
tables?
3Database Design Concepts Review 2
- Answer
- the association of attributes should be guided by
data dependency - failure to take account of dependency creates
anomalies on modifying the database - anomalies can be largely eliminated through an
appropriate choice of tables and normalisation - cf. an object-oriented approach, where attributes
are grouped according to the objects that contain
them
4Database Design Concepts Review 3
- The relational database design problem
- Any relational database could in principle be
organised as a single relation R(A1, A2, ..., An)
where A1, A2, ..., An are all the attributes of
interest. - Why not do it this way?
- convenience easy to handle small relations
- ... but also need to consider semantics of
attributes - In practice, typically choose to decompose R into
a set of smaller relations R1, R2, ..., Rk, where
the set of attributes in R the set of
attributes in R1, R2, ..., Rk.
5Database Design Concepts Review 4
- Relational database design theory serves to guide
the choice of decomposition for a given relation
scheme. - Use the data dependencies for this if no data
dependencies, then no decomposition - Key concepts for studying decompositions
- data dependency
- lossless join
- dependency preservation
6Database Design Concepts Review 5
- Anomalies illustrated by HVFC
- Form of the SUPPLIER relation
- SAIP (SNAME, SADDRESS, ITEM, PRICE)
- leads to anomalies on update, deletion and
insertion - update anomalies when supplier address is
updated, must be updated in all tuples, else two
addresses recorded for same supplier. - insertion anomalies can't record info on
supplier unless he supplies part, whence .... - deletion anomalies delete all items supplied by
one supplier and you lose supplier's address.
7Database Design Concepts Review 6
- Anomalies illustrated by HVFC (cont.)
- Form of the SUPPLIER relation
- SAIP (SNAME, SADDRESS, ITEM, PRICE)
- leads to anomalies on update, deletion and
insertion - A better design of the HVFC relation scheme will
- resolve these problems. The decomposition
- SA (SNAME, SADDRESS)
- SIP (SNAME, ITEM, PRICE)
- resolves the anomalies.
8Normal Forms for Relational Schemes 0
- Database design is essentially an informal
activity. - It relies upon observation and analysis of
external state. - Some patterns of dependency between attributes
recur. - Normal forms (NFs) for relation schemes encode
rules - for design that have been developed from
experience. - They provide a basis for database design
patterns.
9Normal Forms for Relational Schemes 1
- Boyce-Codd Normal Form (BCNF)
- Use ltR,Fgt to denote relation scheme R(A1, A2,
..., An) - with set of dependencies generated by F
- Definition ltR,Fgt is in BCNF if
- for all sets of attributes X È A ? A1, A2,
..., An - XA holds in R and AÏX Þ X contains a key for R
- Terminology
- If X contains a key for R, call X is a superkey
for R
10Normal Forms for Relational Schemes 2
- Examples relating to Boyce-Codd Normal Form
- R is SCAIP(S, C, A , I, P) where S is SNAME,
- C is CITY, A is AGENT, I is ITEM, P is PRICE
- F is S C, C A, S I P
- Not in BCNF, since C A, but C is not a superkey
- SCAIP SIP SC CA is decomposition that
is - lossless and dependency preserving such that
- all the sub-schemes are in BCNF
- Desirable kind of decomposition, but can't always
achieve it .... need weaker NFs than BCNF
11Normal Forms for Relational Schemes 3
- Third Normal Form (3NF)
- Relation scheme ltRº R(A1, A2, ..., An), Fgt is in
3NF if - for all sets of attributes X È A ? A1, A2,
..., An - XA holds in R and AÏX
- Þ either X is a superkey for R
- or A is an attribute contained in a key for R
- Definition If A is contained in a key for R then
A is a prime attribute for R
12Normal Forms for Relational Schemes 4
- Example of Third Normal Form
- Consider relation scheme STD, where S is local
student id, T is tutor, D is department.
Assuming that - "students in different departments can have same
id" - have dependencies F T D, SD T
- STD is not in BCNF T D, but T not superkey
- STD is in 3NF D is a prime attribute in key SD
13Normal Forms for Relational Schemes 5
- Historical digression Second Normal Form
- If ltR,Fgt is not in 3NF, then there is a
functional dependency XA such that X is not a
superkey and A is not a prime attribute - There are then two possibilities
- either X is a proper subset of a key
- or there is no key that contains X.
- These two cases can be linked to types of anomaly.
14Normal Forms for Relational Schemes 6
- Historical digression Second Normal Form (cont)
- If X is a proper subset of a key, then
- X A is a partial dependency
- Such dependency is linked with update anomalies
- If X is not a proper subset of a key, then
- X A is a transitive dependency
- Such dependency is linked with insert/delete
anomalies.
15Normal Forms for Relational Schemes 7
- Historical digression Second Normal Form (cont)
- If X is not a proper subset of a key, then
- X A is a transitive dependency
- Use term transitive because if Y is a key, then
- Y X A is a non-trivial chain of dependencies
- X is not a proper subset of a key ...
- \ X is not contained in Y,
- A is not prime ...
- \ A Ï Y,
- X A is non-trivial,
- \ A Ï X by hypothesis.
16Normal Forms for Relational Schemes 8
- Historical digression Second Normal Form (cont)
- If X is not a proper subset of a key, then
- X A is a transitive dependency
- Definition ltR,Fgt is in second normal form (2NF)
if - either R is in 3NF
- or R is not in 3NF and has transitive
- dependencies but no partial dependencies
- partial dependencies ? update anomalies .
- BUT
- not all update anomalies ? partial dependencies
17Normal Forms for Relational Schemes 9
- Illustrative examples related to 2NF
- Example 1. The SCA relation is in 2NF
- C A is a transitive dependency in SCA.
- C is not a subset of a key for SCA the only key
is S - Example 2. The relation SAIP is not in 2NF
- S A is a partial dependency in SAIP.
- S is a subset of the key SI of SAIP
18Normal Forms for Relational Schemes 10
- Illustrative examples related to 2NF (cont.)
- Example 3. Consider SIDM where S is store, I is
item, - D is dept, M is manager with FDs SI D, SD
M - The relation scheme SIDM is in 2NF but not in
3NF - SD M, but SD isn't a superkey, and M isn't
prime - Only key for SIDM is SI, so SD not contained in
key - \ SD M is a transitive dependency in SIDM.
19More about Anomalies 1
- Two symptoms of anomalies
- Different kinds of anomaly exhibit 2 different
symptoms - redundancy
- information loss inability to represent
information - 1. redundancy
- This leads to update anomalies
- when an attribute of one tuple is updated, the
same attribute must be updated in many places - E.g. in SAIP, have redundancy (s,a,i,p)
(s,a,i',p')
20More about Anomalies 2
- Two symptoms of anomalies (cont.)
- 2. loss of information / inability to represent
info - leads to deletion/insertion anomaly
- when a tuple is deleted essential information is
incidentally lost no provision to store
information in the absence of irrelevant
additional details - E.g. In SIDM, can't record the department selling
perfume in Debenhams if it doesn't have a
manager. - In STD, no means to record dept of tutor unless
the tutor has a student.
21More about Anomalies 3
- Commentary on the examples
- Neither SAIP nor SIDM is in 3NF ...
- In SAIP, S A is a partial dependency.
- Choose a distinct pair of values (s, i) and (s,
i') for the key SI (possible since S doesn't
determine I). The two corresponding tuples will
be distinct, but have the same value for
attribute A.
22More about Anomalies 4
- Commentary on the examples (cont.)
- Neither SAIP nor SIDM is in 3NF ...
- In SIDM, SD M is a transitive dependency.
- non-trivial chain of dependencies SI SD M.
To store triple (s, i, d), need value of m
determined by s and d. Without this can't record
(s, i, d). - \ information loss deletion anomaly
23More about Anomalies 5
- Commentary on the examples (cont.)
-
- Note that STD is in 3NF
- In STD, the only key is SD. For SD T, SD is a
superkey. For T D, though T is not a superkey,
D is a prime attribute. - This shows that even relation schemes in 3NF can
have certain types of update deletion/insertion
anomaly.
24More about Anomalies 6
- Linking anomalies with partial / transitive
dependencies - The examples informally illustrate a link between
certain kinds of dependency and types of anomaly - dependency anomaly
- partial redundancy / update
- transitive deletion / insertion
- Can interpret this link in more abstract terms .
25More about Anomalies 7
- Linking anomalies with partial / transitive
dependencies - dependency anomaly
- partial redundancy / update
- transitive deletion / insertion
- Suppose that X A is a partial dependency that
arises from violation of 3NF. Then A isn't
prime attribute, and X is a subset of a key Y. - X not a key Þ tuples that agree on X but
disagree on some attribute B in Y. Have distinct
tuples that must have the same value for
attribute A, as X A - \ redundancy update anomaly on updating A.
26More about Anomalies 8
- Linking anomalies with partial / transitive
dependencies - dependency anomaly
- partial redundancy / update
- transitive deletion / insertion
- Suppose that X A is a transitive dependency
that arises from violation of 3NF. non-trivial
chain of dependencies Y X A, where Y is a
key. - To record which values of attributes in X are
associated with a given set of attributes in Y
must know what value of A is associated with
given values of attributes in X. - \ information loss deletion anomaly
27More about Anomalies 9
- Linking anomalies with partial / transitive
dependencies - dependency anomaly
- partial redundancy / update
- transitive deletion / insertion
- Note
- where there is partial dependency will also have
deletion / insertion anomalies e.g. SAIP - cant
record supplier address without item and price - where there is transitive dependency may also
have update anomalies e.g. SCA - updating agent
for a city entails updating all records with
suppliers in that city
28Desirable properties of decompositions review 1
- Lossless decompositions
- A decomposition of the relation scheme R into
subschemes R1, R2, ..., Rn is lossless if, given
tuples r1, r2, ..., rn in R1, R2, ..., Rn
respectively, such that ri and rj agree on all
common attributes for all pairs of indices (i,j),
the - uniquely defined - tuple derived by joining
r1, r2, ..., rn is in R. - Terminology "lossless join" decomposition
29Desirable properties of decompositions review 2
- Dependency preserving decompositions
- A decomposition of the relation scheme R into
subschemes R1, R2, ..., Rn is dependency
preserving if all the FDs within R can be derived
from those within the relations R1, R2, ..., Rn. - If F is the set of dependencies defined on R,
then the requirement is that the set G of
dependencies that can be obtained as projections
of dependencies in F onto R1, R2, ..., Rn
together generate F. - Note carefully that it is not enough to check
whether projections of dependencies in F onto R1,
R2, ..., Rn together generate F.
30Minimal cover review
- Representing the set of dependencies (cont.)
- Definition G is a minimal cover for F if
- a) G F
- b) every RHS in G is a single attribute
- c) every LHS minimal subject to determining RHS
- i.e. for no X Y Î G is there a proper subset
Z of X such that G \ X Y È Z Y
generates F - d) no proper subset of G also generates F.
- Minimal cover isn't necessarily unique.
31Lossless Join Decomposition review
- Theorem
- If r S, T is a decomposition of R, and F is
the set of FDs for R, then r is a lossless join
decomposition with respect to F if and only if - either T\S ? (S Ç T) or S\T ? (S Ç T).
- Corollary to the theorem If R is a relation
scheme, and X A is a functional dependency in
R, where A is a an attribute, X is a set of
attributes not containing A, and XA is a proper
subset of R, then R1XA, R2R\A is a lossless
join decomposition of R.
32Some decomposition algorithms 1
- Algorithm 1 Decomposing a relation scheme as a
lossless join of BCNF subschemes - Suppose R is a relation scheme that is not in
BCNF - Take X A, where X is not a superkey and A Ï X
- Decompose R as S È T, where S AX and T R\A.
- X not a superkey Þ S and T are proper subsets of
R - where S Ç T X, and X A S\T
- so decomposition of R as S È T is a lossless
join. - \ R lossless join of arbitrarily small
subschemes - Every scheme with at most 2 attributes is in BCNF
...
33Some decomposition algorithms 2
- Algorithm 1 Decomposing a relation scheme as a
lossless join of BCNF subschemes (cont.) - R lossless join of arbitrarily small
subschemes - Every scheme with at most 2 attributes is in BCNF
... - Theorem 1
- Every relation scheme can be expressed as a
lossless join of BCNF relation schemes.
34Some decomposition algorithms 3
- Example A lossless decomposition into BCNF
- Consider the relation scheme CTHRSG, where
- Ccourse, Tteacher, Rroom, Sstudent, Ggrade,
- subject to the dependencies
- C T course determines teacher
- HR C hour and room determine the course
- HT R hour and teacher determine the room
- CS G course and student determine the grade
- HS R hour and student determine the room
35Some decomposition algorithms 4
- Example A lossless decomposition into BCNF
(cont.) - Consider the relation scheme CTHRSG with
dependencies - C T, HR C, HT R, CS G, HS R
- Using Algorithm 1, arrive at a decomposition into
the BCNF subschemes CSG, CT, CHR, CHS NB CH
R. - This decomposition doesn't preserve dependencies
- HT R
- is not consequence of the dependencies on the
subschemes CSG, CT, CHR and CHS. - There is no decomposition into BCNF that is both
lossless join and dependency preserving in
general.
36Some decomposition algorithms 5
- Algorithm 2 Dependency preserving decomposition
into 3NF subschemes - Given (R, F) and G a minimal cover for F.
- If an attribute is not involved in any dependency
in G, - it can form a relation scheme by itself.
- Suppose R? is the set of attributes involved in
G. - If a dependency in G involves all the attributes
in R?, then R? is in 3NF, - else form relational sub-schemes Y1B1, Y2B2, ...,
YnBn - for each of the dependencies
- Y1B1, Y2B2, ..., YnBn in minimal cover G.
37Some decomposition algorithms 6
- Theorem 2
- Algorithm 2 generates a dependency preserving
decomposition into 3NF subschemes - Proof of Theorem 2
- Projected dependencies involve a cover for F, so
the decomposition is dependency-preserving. - Claim that each sub-scheme YB where Y B belongs
to the minimal cover G for F is in 3NF. Must
show - if XA?YB, where X A is non-trivial dependency,
then - either X is a superkey of YB
- or A is a prime attribute of YB.
38Some decomposition algorithms 7
- Proof of Theorem 2 (cont.)
- if XA?YB, where X A is non-trivial
dependency, then - either X is a superkey of YB
- or A is a prime attribute of YB.
- If A?B, then A Î Y. But Y is a key for YB, since
Y B - and no proper subset of Y determines B as G is a
minimal cover. Hence A ? B Þ A is prime. - If AB, then X is a subset of Y such that X B,
and since G is a minimal cover, XY. - Hence AB Þ X is a superkey for YB.
39Some decomposition algorithms 8
- Algorithm 3 Dependency preserving, lossless join
decomposition into 3NF subschemes - Apply Algorithm 2 to obtain a dependency-preservin
g - decomposition s of R with dependencies F. Now
let Z - be a key for R, and introduce Z as an additional
- subscheme to derive the decomposition t s È
Z. - Theorem 3
- The decomposition t of R is both lossless join
and dependency preserving into 3NF subschemes
40Some decomposition algorithms 9
- Sketch proof of Theorem 3
- The decomposition t of R is both lossless join
and dependency preserving into 3NF subschemes - to prove lossless join apply lossless join
algorithm - Construct a table with rows ? Y1B1, Y2B2, ...,
YnBn - where FDs Y1B1, Y2B2, ..., YnBn are a minimal
- cover G, together with row ? Z where Z is a key
for R. - Call these rows row1, row2, ..., rown, and rown1
41Some decomposition algorithms 10
- Sketch proof of Theorem 3 (cont)
- Construct a table with rows ? Y1B1, Y2B2, ...,
YnBn where - FDs Y1B1, Y2B2, ..., YnBn are a minimal
cover G, - together with a row ? Z where Z is a key for R.
- Call these rows row1, row2, ..., rown, and
rown1 - In rown1, have as in all columns associated
with key Z - Suppose that theres a b in the kth column in
rown1. - Z is a key, so there must be a way of inferring
the value - at the location rown1k from the a values in
rown1 using - the FDs in the minimal cover G in some sequence.
- Tracing this sequence via matchings on the table
will - then convert the entry at location rown1k to an
a.
42Some decomposition algorithms 12
- Split R ABCDE into R1AB, R2BC, R3CDE,
R4ACE, ZBDE - with the FDs A B, B C, DE C, CE A in
minimal cover G
43Some decomposition algorithms 13
- Split R ABCDE into AB, BC, CDE, ACE, BDE where
- AB, BC, DEC, CEA are a minimal cover (G)
- and BDE is a key (Z)
- Have a b in column 1
- (k 3), can infer that
- BDE A, since
- BC and CEA
- Can trace these inferences by matching rows 2 and
5, - then matching rows 4 and 5 to replace b51 by a1
etc.
44Some decomposition algorithms 14
- Split R ABCDE into R1AB, R2BC, R3CDE,
R4ACE, ZBDE - with the FDs A B, B C, DE C, CE A in
minimal cover G
lossless