Title: Functional Dependency and Normalization
1Functional Dependency and Normalization
- Informal design guidelines for relation schemas.
- Functional dependencies.
- Normal forms.
- Normalization.
2Informal Design Guidelines
- Semantics of relations and attributes.
- Guideline 1 Design a relation schema so that it
is easy to explain its meaning. (Fig. 14.1, 14.2) - Do not combine attributes from multiple
entity types and relationship types into single
relation. (Fig. 14.3) - Reducing redundant values in tuples saves storage
space and avoid update anomalies. (Fig. 14.4) - - Insertion anomalies.
- - Deletion anomalies.
- - Modification anomalies.
- Guideline 2 Design the base relation schemas so
that no insertion, deletion, or modification
anomalies occur.
3- Insert Anomalies
- Inserting a dept with no employee info null
values need to assign, which will create problems - Inconsistency problem with insertion of new tuple
- Deletion Anomalies
- If we delete last employee, dept info is
deleted. - -Modification anomalies if we change manager of
department 5, we must update all the tuples
4Figure 14.1 Simplified version of the COMPANY
relational database schema.
5Figure 14.2 Example relations for the schema of
Figure 14.1
6Figure 14.3 Two relation schemas and their
functional dependencies. Both suffer from update
anomalies. (a) The EMP_DEPT relation schema.(b)
The EMP_PROJ relation schema.
7Figure 14.4 Example relations for the schemas in
Figure 14.3 that result from applying NATURAL
JOIN to the relations in Figure 14.2.
8Figure 14.5 Alternative (bad) representation of
the EMP_PROJ relation. (a) Representing EMP_PROJ
of Figure 14.3(b) by two relation schemas
EMP_LOCS and EMP_PROJ1. (b) Result of projecting
the populated relation EMP_PROJ of Figure 14.4 on
the attributes of EMP_LOCS and EMP_PROJ1.
9Figure 14.5 (continued)
10Figure 14.6 Result of applying the NATURAL JOIN
operation to the tuples above dotted lines in
EMP_PROJ1 and EMP_LOCS, with generated spurious
tuples marked by an asterisk.
11Informal Design Guidelines
- Reducing the null values in tuples. e.g., if 10
of employees have offices, it is better to have a
separate relation, EMP_OFFICE, rather than an
attribute OFFICE_NUMBER in EMPLOYEE. - Guideline 3 Avoid placing attributes in a base
relation whose values are mostly null. - Disallowing spurious tuples.
- - Spurious tuples tuples that are not in
the original relation but - generated by natural join of decomposed
subrelations. - - Example decompose EMP_PROJ into EMP_LOCS
and - EMP_PROJ1. (Fig. 14.5)
- - natural join of EMP_LOCS and EMP_PROJ1
results in spurious - tuples. (Fig. 14.6)
- Guideline 4 Design relation schemas so that they
can be naturally JOINed on primary keys or
foreign keys in a away that guarantees no
spurious tuples are generated.
12Functional Dependencies
- A functional dependency, denoted by X ? Y,
between two sets of attributes X and Y (X and Y
are subsets of R) specifies a constraint on the
possible tuples that can form a relation instance
r of R for any two tuples t1 and t2 in r such
that t1X t2X, we must have t1Y t2Y. - If X ? Y, we say X functionally determines Y or Y
is functionally dependent on X. - We abbreviate functional dependency by FD. X is
called the left-hand side of the FD. Y is called
the right-hand side of the FD. - A functional dependency is a property of the
meaning or semantics of the attributes, I.e., a
property of the relation schema. They must hold
on all relation states (extensions) of R.
Relation extensions r(R) that satisfy the FD are
called legal extensions.
13Figure 14.7 The teach relation state with an
apparent functional dependency text ? COURSE.
However, COURSE ? TEXT is ruled out.
14Functional Dependencies (Cont.)
- Examples.
- 1. SSN ? ENAME
- 2. PNUMBER ? PNAME, PLOCATION
- 3. SSN, PNUMBER ? HOURS
- 4. Others?
- Diagrammatic notation for displaying FDs. (Fig.
14.3) - FD is property of the relation schema R, not of a
particular relation state/instance r(R). - FDs cannot be inferred from a given relation
extension r, but must be defined explicitly by
someone who knows the semantics of the attributes
of R. (Fig. 14.7)
15Figure 14.3 Two relation schemas and their
functional dependencies. Both suffer from update
anomalies. (a) The EMP_DEPT relation schema.(b)
The EMP_PROJ relation schema.
16Functional Dependencies (Cont)
- From the FDs
- F SSN ? ENAME, BDATE, ADDRESS,
DNUMBER, - DNUMBER ? DNAME, DMGRSSN
- we can infer the following FDs
- SSN ? ENAME, DMGRSSN,
- SSN ? SSN,
- DNUMBER ? DNAME
- A FD X ? Y is inferred from a set of
dependencies F specified on R if X ? Y holds in
every relation state r that is a legal extension
of R. - F X ? Y denotes X ? Y is inferred from F.
- The closure of F, denoted by F, is the set of
all FDs that can be inferred from F.
17Functional Dependencies (Cont.)
- Inference rules for FDs.
- Abbreviated notation XYZ ? UV for X, Y, Z ?
U, V - Reflective If Y ? X, then X ? Y
- Augmentation X ? Y XZ ? YZ
- Transitive X ? Y, Y ? Z X ? Z
- Decomposition (projective) X ? YZ X ? Y
- Union (additive) X ? Y, X ? Z X ? YZ
- Pseudotransitive X ? Y, WY ? Z WX ? Z
- The first three rules are sound and complete,
called Armstrong's inference rules.
18Functional Dependencies (Cont.)
- Closure of X under F, denoted by X, is the set
of all attributes that are functionally
determined by X under F. - Algorithms for determining X
- X X
- repeat
- oldX X
- for each FD Y ? Z in F do
- if Y ? X then X X ? Z
- until oldX X
- Example
- F SSN ? ENAME, PNUMBER ?PNAME,
PLOCATION, - SSN, PNUMBER ? HOURS
- SSN SSN, ENAME
- PNUMBER ?
- SSN, PNUMBER ?
19Functional Dependencies (Cont.)
- Equivalence of sets of FDs.
- E is covered by F if every FD in E is also in F,
i.e., every FD in E can be inferred from F. - E and F are equivalent if E F, i.e, E covers
F and F covers E. - F is minimal if
- - every dependency in F has a single
attribute for its right hand side - - we cannot remove any FD from F and still
have a set of FDs - equivalent to F
- - we cannot replace any FD X ? A in F with a
FD Y ? A where - Y ? X and still have a set of FDs
equivalent to F. - Minimal set a standard or canonical form with no
redundancies. - A minimal cover of F is a minimal set of
dependencies, Fmin, that is equivalent to F.
20Functional Dependencies (Cont.)
- Compute a minimal cover
- Algorithm 14.2 Find a minimal cover G for F.
- 1. G f
- 2. Replace each FD X ? A1, A2,, AK in G by
the k FDs X ? A1, X ? A2, - X ? AK
- 3. for each FD X ? A in G
- for each attribute B X
- if (X B) with-respect-to G
contains A - then replace X ? A with X
B ? A in G - 4. For each FD X ? A in G
- if X with-respect-to G-X ?
A contains A - then remove X ? A from G
- There is at least one minimal cover for any F,
maybe several.
21Normal Forms
- Superkey, candidate key or key, primary key.
- A FD X ? Y is a full functional dependency if
removal of any attribute from X means that the
dependency does not hold any more otherwise, it
is a partial functional dependency. - An attribute is prime if it is a member of any
key (Primary or candidate). - A relation R is in first normal form if domains
of attributes include only atomic values. (Fig.
14.8, 14.9) - A relation R is in second normal form if every
non-prime attribute A in R is not partially
dependent on any key of R. - Alternatively, R is in 2NF if every non-prime
attribute A in R is fully dependent on every key
of R. - Examples. (Fig. 14.10 a, b)
22Figure 14.8 Normalization into 1NF. (a)
Relational schema that is not in 1NF. (b) Example
relation instance. (c) 1NF relation with
redundancy.
23Figure 14.9 Normalizing nested relations into
1NF. (a) Schema of the EMP_PROJ relation with a
nested relation PROJS. (b) Example extension of
the EMP_PROJ relation showing nested relations
within each tuple.
24Figure 14.9 (continued) (c) Decomposing EMP_PROJ
into 1NF relations EMP_PROJ1 and EMP_PROJ2 by
propagating the primary key.
25Figure 14.10 The normalization process. (a)
Normalizing EMP_PROJ into 2NF relations. (b)
Normalizing EMP_DEPT into 3NF relations.
26Normal Forms
- A relation R is in third normal form if for every
FD X ? A that holds on R, either - - X is a superkey of R, or
- - A is a prime attribute of R.
- (Alternative Def . - No transitive dependencies
If there is a set of attributes Z that is neither
a candidate key nor a subset of any key (primary
or candidate) of R , X ? Z and Z ? Y holds. - SSN ? DMGRSSN is transitive as SSN ? Dnumber ?
DMGRSSN (Emp-dept) and dnumber is neither a key
nor a subset of key. - Example. (Fig. 14.10 c)
- A relation R is in Boyce-Codd normal form if for
every FD X ? A that holds on R, X is a superkey
of R. - Example. (Fig. 14.12)
- Increasing Order of restrictiveness 1NF, 2NF,
3NF, BCNF. For example, if a relation schema R is
in BCNF, it is in 3NF.
27Figure 14.11 Normalization to 2NF and 3NF. (a)
The lots relation schema and its functional
dependencies FD1 through FD4. (b) Decomposing
lots into the 2NF relations LOTS1 and LOTS2.
28Figure 14.11 (continued) (c) Decomposing LOTS1
into the 3NF relations LOTS1A and LOTS1B. (d)
Summary of normalization of lots.
29Figure 14.12 Boyce-Codd normal form. (a) BCNF
normalization with the dependency of FD2 being
lost in the decomposition. (b) A relation R in
3NF but not in BCNF.
30Figure 14.13 A relation TEACH that is in 3NF but
not in BCNF.
31Normalization
- Database design revisited. Top-down approach
conceptual design. A more purist way
decomposition. - Normalization a process in which unsatisfactory
relational schemas are decomposed into smaller
relation schemas that possess desirable
properties. - Starting with a single universal relation schema
R A1, A2,. An that includes all the attributes
of the database. - Decompose R into a set of relation schemas D
R1, R2, Rm using the FDs specified by the
database designers. D is called a decomposition
of R. - Guidelines for normalization normal forms,
attribute preservation, dependency preservation,
lossless join.
32Normalization (Cont.)
- Attribute preservation. No attributes are lost.
- m
- U Ri R
- i1
- Dependency preservation.
- (?F(R1) ? ?F(R2) ? . ? ?F(Rm)
) F - where ?F(R1) is the set of FDs, X ? Y , in F
such that - X ? Y ? Ri.
- A decomposition DR1, R2,., Rm of R has the
lossless join property with respect to the set of
dependencies F on R if, for every relation state
r of R that satisfies F, - (?ltR1gt(r),, ?ltRmgt(r)) r
- where ltRigt are the attributes in Ri.
33Normalization (Cont.)
- Decomposition into 3NF relation schemas
- Algorithm 15.1 Dependency-preserving and
lossless decomposition into 3NF relation schemas. -
- 1. Find a minimal cover G for F (Algorithm
14.2) - 2. For each left-hand side X of a FD in G
- create a relation schema X ?
A1 ? A2 ? Ak in D where X ? A1, X ? A2,., X ?
Ak are the only dependencies in G with X as
left-hand side - 3. Place any remaining (unplaced)
attributes in a single relation schema - 4. If none of the relation schemas
contains a key of R, create one more relation
schema that contains attributes that form a key
for R.
34Normalization (Cont.)
- Determine a key
- Algorithm 15.4a Find a key K for R.
- 1. K R
- 2. For each attribute A in K
- if (K A) with-respect-to F
contains A then remove A from K - Example. (Fig. 14.11)
- It is not always possible to find a decomposition
that preserves dependencies and in BCNF. (Fig.
14.12) - The lossless join decomposition is based on the
assumption that no null values are allowed for
the join attributes.