Title: Design guidelines and Normalization
1Design guidelines and Normalization
- Informal design guidelines for Relation Schema
- Functional dependencies
- Normal forms
Dr. Reuven Bakalash
2Design Guidelines for Relation Schema
- Discussed four informal measures of quality of
relation schema design - Semantics of the attributes
- Reducing the redundant values in tuples
- Reducing the null values in tuples
- Disallowing the possibility of generating
spurious tuples
Dr. Reuven Bakalash
3Semantics of the Relation Attribute
- Semantics, specifies how to interpret the
attribute values. They should have a clear
meaning for themselves and how they relate to one
another (see the COMPANY schema) - Guideline 1
- Design a relation schema such that it is easy to
explain its meaning. If a relation schema
corresponds to one entity type or one
relationship type, the meaning tends to be clear. - So, do not combine attributes from entity type
and relationship type to a single relation.
Dr. Reuven Bakalash
4Semantics of the Relation Attribute
EMP_DEPT
(a)
EMP_PROJ
(b)
- (a) and (b) have clear semantics
- Both violate guideline 1. They mix attributes.
- (a) represents a single employee but includes
also information on department - (b) relates an employee to a project but also
include employee and project information - They can be used as views, but they cause
problems when used as base relations. -
Dr. Reuven Bakalash
5Information in tuples
- Minimize the storage space that the base
relations occupy
Compare the space used by two base relations
EMPLOYEE and DEPARTMENT in Fig. 14.2 with the
space of EMP_DEPT in Fig. 14.4 In 14.4 all
department information repeats for each employee.
Dr. Reuven Bakalash
6Information in tuples
- Update anomalies (insertion, deletion and
modification anomalies) - Insertion anomalies (examples based on EMP_DEPT,
Fig. 14.4).
To insert a new employee into EMP_DEPT we must to
enter the attributes of department correctly. In
14.2 we enter only dept. number in the new
tuple. It is difficult to enter a new department
that has no employees yet, since SSN is a primary
key.
If we delete the tuple representing the last
employee in a department, we lost the information
concerning that department from database.
Dr. Reuven Bakalash
7Information in tuples
In EMP_DEPT if we change, say, the manager of
dept 5, we must update the tuples of all
employees that work for this department.
- Guideline 2
- Design the base relations schemas so that no
insertion, deletion, or modification anomalies
are present in the relations. - If any anomalies are present, note them clearly
and make sure that the programs that update the
database will operate correctly.
Dr. Reuven Bakalash
8Information in tuples
- Nulls waste space and lead to problems of
understanding the meaning. Nulls can have
multiple interpretations, such as - The attribute does not apply to this tuple
- The attribute value is unknown
- The value is known but absent (not recorded yet)
Nulls also cause problems with COUNT and
SUM. Grouping many attributes in a fat relation
causes many nulls.
- Guideline 3
- As far as possible, avoid placing attributes in a
base relation whose values may frequently be
null. - If nulls are unavoidable, make sure that they
apply in exceptional cases only and do not apply
to a majority of tuples in a relation.
Dr. Reuven Bakalash
9Information in tuples
- Generation of spurious (false) tuples
Consider two tables EMP_LOCS and EMP_PROJ1 (Fig.
14.5) which will replace EMP_PROJ (Fig.
14.4b). This is a bad schema design because we
cant recover the original information from
EMP_PROJ. The result of NATURAL JOIN on EMP_LOCS
and EMP_PROJ1 is shown in Fig. 14.6. Spurious
tuples are generated! PLOCATION is neither a
primary key nor a foreign key.
- Guideline 4
- Design relation schemas so that they can be
JOINed with equality conditions on attributes
that are either primary keys or foreign keys in a
way that guarantees that no spurious tuples are
generated. Do not have relations that contain
matching attributes other than foreign_key-primary
_key combinations. - If such relations are unavoidable, do not join
them on such attributes, because the join may
produce spurious tuples.
Dr. Reuven Bakalash
10Functional Dependencies
- The following are functional dependencies
- State, Driver_license_number ? SSN
- SSN ? ENAME
- PNUMBER ? PNAME, PLOCATION
- SSN, PNUMBER ? HOURS
- A FD, denoted X ? Y, between two sets of
attributes X and Y that are subset of R specifies
a constraint on the possible tuples that can form
a relation state of R. - X functionally determines Y in R if and only if,
whenever two tuples of r(R) agree on their
X-value, they must necessarily agree on their
Y-value - If there cant be more than one tuple with a
given X-value in any relation instance r(R) -
that is, X is a candidate key of R this implies
that X ? Y for any subset of attributes Y of R
(because the key constraint doesnt allow that
two tuples will have the same value of X). - If X ? Y in R, this does not say whether or not Y
? X in R.
11Functional dependencies
- FD is a property of the relation schema
(intension) R, not of a particular legal relation
state (extension) r of R. Hence, an FD cannot be
inferred automatically from a given relation
extension r but must be defined by someone who
knows the semantic of the attributes of R.
TEXT ? COURSE ? Cannot confirm unless we know
that it is true for all possible states of
TEACH Single counterexample disproves a
functional dependancy. COURSE ? TEXT is ruled out
See Figure 14.3 for diagrammatic notation