Relational Data Base Design in Practice

About This Presentation

Title:

Relational Data Base Design in Practice

Description:

A relational database scheme can be regarded as ... record the department selling perfume in Debenhams if it doesn't have a manager. ... – PowerPoint PPT presentation

Number of Views:42

Avg rating:3.0/5.0

Slides: 24

Provided by: meurig

Category:

more less

Transcript and Presenter's Notes

Title: Relational Data Base Design in Practice

1
Relational Data Base Design in Practice

Normal Forms
More about Anomalies
Algorithms for Database Design

2
Database Design Concepts Review 1

A relational database scheme can be regarded as
defined by an abstract relation R(A1, A2, ...,
An) where
A1, A2, ..., An are all the attributes of
interest. This is
simplest possible relation scheme for the
database.
In general, when setting up the database use a
decomposition of the relation scheme into
subschemes.
Issues
what is the nature of this association?
does it matter what is associated in tables?
if so, what principles apply to the design of
tables?

3
Database Design Concepts Review 2

Answer
the association of attributes should be guided by
data dependency
failure to take account of dependency creates
anomalies on modifying the database
anomalies can be largely eliminated through an
appropriate choice of tables and normalisation
cf. an object-oriented approach, where attributes
are grouped according to the objects that contain
them

4
Database Design Concepts Review 3

The relational database design problem
Any relational database could in principle be
organised as a single relation R(A1, A2, ..., An)
where A1, A2, ..., An are all the attributes of
interest.
Why not do it this way?
convenience easy to handle small relations
... but also need to consider semantics of
attributes
In practice, typically choose to decompose R into
a set of smaller relations R1, R2, ..., Rk, where
the set of attributes in R the set of
attributes in R1, R2, ..., Rk.

5
Database Design Concepts Review 4

Relational database design theory serves to guide
the choice of decomposition for a given relation
scheme.
Use the data dependencies for this if no data
dependencies, then no decomposition
Key concepts for studying decompositions
data dependency
lossless join
dependency preservation

6
Database Design Concepts Review 5

Anomalies illustrated by HVFC
Form of the SUPPLIER relation
SAIP (SNAME, SADDRESS, ITEM, PRICE)
leads to anomalies on update, deletion and
insertion
update anomalies when supplier address is
updated, must be updated in all tuples, else two
addresses recorded for same supplier.
insertion anomalies can't record info on
supplier unless he supplies part, whence ....
deletion anomalies delete all items supplied by
one supplier and you lose supplier's address.

7
Database Design Concepts Review 6

Anomalies illustrated by HVFC (cont.)
Form of the SUPPLIER relation
SAIP (SNAME, SADDRESS, ITEM, PRICE)
leads to anomalies on update, deletion and
insertion
A better design of the HVFC relation scheme will
resolve these problems. The decomposition
SA (SNAME, SADDRESS)
SIP (SNAME, ITEM, PRICE)
resolves the anomalies.

8
Normal Forms for Relational Schemes 0

Database design is essentially an informal
activity.
It relies upon observation and analysis of
external state.
Some patterns of dependency between attributes
recur.
Normal forms (NFs) for relation schemes encode
rules
for design that have been developed from
experience.
They provide a basis for database design
patterns.

9
Normal Forms for Relational Schemes 1

Boyce-Codd Normal Form (BCNF)
Use ltR,Fgt to denote relation scheme R(A1, A2,
..., An)
with set of dependencies generated by F
Definition ltR,Fgt is in BCNF if
for all sets of attributes X È A ? A1, A2,
..., An
XA holds in R and AÏX Þ X contains a key for R
Terminology
If X contains a key for R, call X is a superkey
for R

10
Normal Forms for Relational Schemes 2

Examples relating to Boyce-Codd Normal Form
R is SCAIP(S, C, A , I, P) where S is SNAME,
C is CITY, A is AGENT, I is ITEM, P is PRICE
F is S C, C A, S I P
Not in BCNF, since C A, but C is not a superkey
SCAIP SIP SC CA is decomposition that
is
lossless and dependency preserving such that
all the sub-schemes are in BCNF
Desirable kind of decomposition, but can't always
achieve it .... need weaker NFs than BCNF

11
Normal Forms for Relational Schemes 3

Third Normal Form (3NF)
Relation scheme ltRº R(A1, A2, ..., An), Fgt is in
3NF if
for all sets of attributes X È A ? A1, A2,
..., An
XA holds in R and AÏX
Þ either X is a superkey for R
or A is an attribute contained in a key for R
Definition If A is contained in a key for R then
A is a prime attribute for R

12
Normal Forms for Relational Schemes 4

Example of Third Normal Form
Consider relation scheme STD, where S is local
student id, T is tutor, D is department.
Assuming that
"students in different departments can have same
id"
have dependencies F T D, SD T
STD is not in BCNF T D, but T not superkey
STD is in 3NF D is a prime attribute in key SD

13
Normal Forms for Relational Schemes 5

Historical digression Second Normal Form
If ltR,Fgt is not in 3NF, then there is a
functional dependency XA such that X is not a
superkey and A is not a prime attribute
There are then two possibilities
either X is a proper subset of a key
or there is no key that contains X.
These two cases can be linked to types of anomaly.

14
Normal Forms for Relational Schemes 6

Historical digression Second Normal Form (cont)
If X is a proper subset of a key, then
X A is a partial dependency
Such dependency is linked with update anomalies
If X is not a proper subset of a key, then
X A is a transitive dependency
Such dependency is linked with insert/delete
anomalies.

15
Normal Forms for Relational Schemes 7

Historical digression Second Normal Form (cont)
If X is not a proper subset of a key, then
X A is a transitive dependency
Use term transitive because if Y is a key, then
Y X A is a non-trivial chain of dependencies
X is not a proper subset of a key ...
\ X is not contained in Y,
A is not prime ...
\ A Ï Y,
X A is non-trivial,
\ A Ï X by hypothesis.

16
Normal Forms for Relational Schemes 8

Historical digression Second Normal Form (cont)
If X is not a proper subset of a key, then
X A is a transitive dependency
Definition ltR,Fgt is in second normal form (2NF)
if
either R is in 3NF
or R is not in 3NF and has transitive
dependencies but no partial dependencies
partial dependencies ? update anomalies .
BUT
not all update anomalies ? partial dependencies

17
Normal Forms for Relational Schemes 9

Illustrative examples related to 2NF
Example 1. The SCA relation is in 2NF
C A is a transitive dependency in SCA.
C is not a subset of a key for SCA the only key
is S
Example 2. The relation SAIP is not in 2NF
S A is a partial dependency in SAIP.
S is a subset of the key SI of SAIP

18
Normal Forms for Relational Schemes 10

Illustrative examples related to 2NF (cont.)
Example 3. Consider SIDM where S is store, I is
item,
D is dept, M is manager with FDs SI D, SD
M
The relation scheme SIDM is in 2NF but not in
3NF
SD M, but SD isn't a superkey, and M isn't
prime
Only key for SIDM is SI, so SD not contained in
key
\ SD M is a transitive dependency in SIDM.

19
More about Anomalies 1

Two symptoms of anomalies
Different kinds of anomaly exhibit 2 different
symptoms
redundancy
information loss inability to represent
information
1. redundancy
This leads to update anomalies
when an attribute of one tuple is updated, the
same attribute must be updated in many places
E.g. in SAIP, have redundancy (s,a,i,p)
(s,a,i',p')

20
More about Anomalies 2

Two symptoms of anomalies (cont.)
2. loss of information / inability to represent
info
leads to deletion/insertion anomaly
when a tuple is deleted essential information is
incidentally lost no provision to store
information in the absence of irrelevant
additional details
E.g. In SIDM, can't record the department selling
perfume in Debenhams if it doesn't have a
manager.
In STD, no means to record dept of tutor unless
the tutor has a student.

21
More about Anomalies 3

Commentary on the examples
Neither SAIP nor SIDM is in 3NF ...
In SAIP, S A is a partial dependency.
Choose a distinct pair of values (s, i) and (s,
i') for the key SI (possible since S doesn't
determine I). The two corresponding tuples will
be distinct, but have the same value for
attribute A.

22
More about Anomalies 4

Commentary on the examples (cont.)
Neither SAIP nor SIDM is in 3NF ...
In SIDM, SD M is a transitive dependency.
non-trivial chain of dependencies SI SD M.
To store triple (s, i, d), need value of m
determined by s and d. Without this can't record
(s, i, d).
\ information loss deletion anomaly

23
More about Anomalies 5

Commentary on the examples (cont.)
Note that STD is in 3NF
In STD, the only key is SD. For SD T, SD is a
superkey. For T D, though T is not a superkey,
D is a prime attribute.
This shows that even relation schemes in 3NF can
have certain types of update deletion/insertion
anomaly.

24
More about Anomalies 6

Linking anomalies with partial / transitive
dependencies
The examples informally illustrate a link between
certain kinds of dependency and types of anomaly
dependency anomaly
partial redundancy / update
transitive deletion / insertion
Can interpret this link in more abstract terms .

25
More about Anomalies 7

Linking anomalies with partial / transitive
dependencies
dependency anomaly
partial redundancy / update
transitive deletion / insertion
Suppose that X A is a partial dependency that
arises from violation of 3NF. Then A isn't
prime attribute, and X is a subset of a key Y.
X not a key Þ tuples that agree on X but
disagree on some attribute B in Y. Have distinct
tuples that must have the same value for
attribute A, as X A
\ redundancy update anomaly on updating A.

26
More about Anomalies 8

Linking anomalies with partial / transitive
dependencies
dependency anomaly
partial redundancy / update
transitive deletion / insertion
Suppose that X A is a transitive dependency
that arises from violation of 3NF. non-trivial
chain of dependencies Y X A, where Y is a
key.
To record which values of attributes in X are
associated with a given set of attributes in Y
must know what value of A is associated with
given values of attributes in X.
\ information loss deletion anomaly

27
More about Anomalies 9

Linking anomalies with partial / transitive
dependencies
dependency anomaly
partial redundancy / update
transitive deletion / insertion
Note
where there is partial dependency will also have
deletion / insertion anomalies e.g. SAIP - cant
record supplier address without item and price
where there is transitive dependency may also
have update anomalies e.g. SCA - updating agent
for a city entails updating all records with
suppliers in that city

28
Desirable properties of decompositions review 1

Lossless decompositions
A decomposition of the relation scheme R into
subschemes R1, R2, ..., Rn is lossless if, given
tuples r1, r2, ..., rn in R1, R2, ..., Rn
respectively, such that ri and rj agree on all
common attributes for all pairs of indices (i,j),
the - uniquely defined - tuple derived by joining
r1, r2, ..., rn is in R.
Terminology "lossless join" decomposition

29
Desirable properties of decompositions review 2

Dependency preserving decompositions
A decomposition of the relation scheme R into
subschemes R1, R2, ..., Rn is dependency
preserving if all the FDs within R can be derived
from those within the relations R1, R2, ..., Rn.
If F is the set of dependencies defined on R,
then the requirement is that the set G of
dependencies that can be obtained as projections
of dependencies in F onto R1, R2, ..., Rn
together generate F.
Note carefully that it is not enough to check
whether projections of dependencies in F onto R1,
R2, ..., Rn together generate F.

30
Minimal cover review

Representing the set of dependencies (cont.)
Definition G is a minimal cover for F if
a) G F
b) every RHS in G is a single attribute
c) every LHS minimal subject to determining RHS
i.e. for no X Y Î G is there a proper subset
Z of X such that G \ X Y È Z Y
generates F
d) no proper subset of G also generates F.
Minimal cover isn't necessarily unique.

31
Lossless Join Decomposition review

Theorem
If r S, T is a decomposition of R, and F is
the set of FDs for R, then r is a lossless join
decomposition with respect to F if and only if
either T\S ? (S Ç T) or S\T ? (S Ç T).
Corollary to the theorem If R is a relation
scheme, and X A is a functional dependency in
R, where A is a an attribute, X is a set of
attributes not containing A, and XA is a proper
subset of R, then R1XA, R2R\A is a lossless
join decomposition of R.

32
Some decomposition algorithms 1

Algorithm 1 Decomposing a relation scheme as a
lossless join of BCNF subschemes
Suppose R is a relation scheme that is not in
BCNF
Take X A, where X is not a superkey and A Ï X
Decompose R as S È T, where S AX and T R\A.
X not a superkey Þ S and T are proper subsets of
R
where S Ç T X, and X A S\T
so decomposition of R as S È T is a lossless
join.
\ R lossless join of arbitrarily small
subschemes
Every scheme with at most 2 attributes is in BCNF
...

33
Some decomposition algorithms 2

Algorithm 1 Decomposing a relation scheme as a
lossless join of BCNF subschemes (cont.)
R lossless join of arbitrarily small
subschemes
Every scheme with at most 2 attributes is in BCNF
...
Theorem 1
Every relation scheme can be expressed as a
lossless join of BCNF relation schemes.

34
Some decomposition algorithms 3

Example A lossless decomposition into BCNF
Consider the relation scheme CTHRSG, where
Ccourse, Tteacher, Rroom, Sstudent, Ggrade,
subject to the dependencies
C T course determines teacher
HR C hour and room determine the course
HT R hour and teacher determine the room
CS G course and student determine the grade
HS R hour and student determine the room

35
Some decomposition algorithms 4

Example A lossless decomposition into BCNF
(cont.)
Consider the relation scheme CTHRSG with
dependencies
C T, HR C, HT R, CS G, HS R
Using Algorithm 1, arrive at a decomposition into
the BCNF subschemes CSG, CT, CHR, CHS NB CH
R.
This decomposition doesn't preserve dependencies
HT R
is not consequence of the dependencies on the
subschemes CSG, CT, CHR and CHS.
There is no decomposition into BCNF that is both
lossless join and dependency preserving in
general.

36
Some decomposition algorithms 5

Algorithm 2 Dependency preserving decomposition
into 3NF subschemes
Given (R, F) and G a minimal cover for F.
If an attribute is not involved in any dependency
in G,
it can form a relation scheme by itself.
Suppose R? is the set of attributes involved in
G.
If a dependency in G involves all the attributes
in R?, then R? is in 3NF,
else form relational sub-schemes Y1B1, Y2B2, ...,
YnBn
for each of the dependencies
Y1B1, Y2B2, ..., YnBn in minimal cover G.

37
Some decomposition algorithms 6

Theorem 2
Algorithm 2 generates a dependency preserving
decomposition into 3NF subschemes
Proof of Theorem 2
Projected dependencies involve a cover for F, so
the decomposition is dependency-preserving.
Claim that each sub-scheme YB where Y B belongs
to the minimal cover G for F is in 3NF. Must
show
if XA?YB, where X A is non-trivial dependency,
then
either X is a superkey of YB
or A is a prime attribute of YB.

38
Some decomposition algorithms 7

Proof of Theorem 2 (cont.)
if XA?YB, where X A is non-trivial
dependency, then
either X is a superkey of YB
or A is a prime attribute of YB.
If A?B, then A Î Y. But Y is a key for YB, since
Y B
and no proper subset of Y determines B as G is a
minimal cover. Hence A ? B Þ A is prime.
If AB, then X is a subset of Y such that X B,
and since G is a minimal cover, XY.
Hence AB Þ X is a superkey for YB.

39
Some decomposition algorithms 8

Algorithm 3 Dependency preserving, lossless join
decomposition into 3NF subschemes
Apply Algorithm 2 to obtain a dependency-preservin
g
decomposition s of R with dependencies F. Now
let Z
be a key for R, and introduce Z as an additional
subscheme to derive the decomposition t s È
Z.
Theorem 3
The decomposition t of R is both lossless join
and dependency preserving into 3NF subschemes

40
Some decomposition algorithms 9

Sketch proof of Theorem 3
The decomposition t of R is both lossless join
and dependency preserving into 3NF subschemes
to prove lossless join apply lossless join
algorithm
Construct a table with rows ? Y1B1, Y2B2, ...,
YnBn
where FDs Y1B1, Y2B2, ..., YnBn are a minimal
cover G, together with row ? Z where Z is a key
for R.
Call these rows row1, row2, ..., rown, and rown1

41
Some decomposition algorithms 10

Sketch proof of Theorem 3 (cont)
Construct a table with rows ? Y1B1, Y2B2, ...,
YnBn where
FDs Y1B1, Y2B2, ..., YnBn are a minimal
cover G,
together with a row ? Z where Z is a key for R.
Call these rows row1, row2, ..., rown, and
rown1
In rown1, have as in all columns associated
with key Z
Suppose that theres a b in the kth column in
rown1.
Z is a key, so there must be a way of inferring
the value
at the location rown1k from the a values in
rown1 using
the FDs in the minimal cover G in some sequence.
Tracing this sequence via matchings on the table
will
then convert the entry at location rown1k to an
a.

42
Some decomposition algorithms 12

Split R ABCDE into R1AB, R2BC, R3CDE,
R4ACE, ZBDE
with the FDs A B, B C, DE C, CE A in
minimal cover G

43
Some decomposition algorithms 13

Split R ABCDE into AB, BC, CDE, ACE, BDE where
AB, BC, DEC, CEA are a minimal cover (G)
and BDE is a key (Z)
Have a b in column 1
(k 3), can infer that
BDE A, since
BC and CEA
Can trace these inferences by matching rows 2 and
5,
then matching rows 4 and 5 to replace b51 by a1
etc.

44
Some decomposition algorithms 14

Split R ABCDE into R1AB, R2BC, R3CDE,
R4ACE, ZBDE
with the FDs A B, B C, DE C, CE A in
minimal cover G

lossless

Write a Comment

User Comments (0)

About PowerShow.com

Relational Data Base Design in Practice - PowerPoint PPT Presentation

Relational Data Base Design in Practice

A relational database scheme can be regarded as ... record the department selling perfume in Debenhams if it doesn't have a manager. ... – PowerPoint PPT presentation