Title: Normalization
1 Normalization
2Database Design
- Give some body of data to be represented in a
database, how do we decide on a suitable logical
structure for that data? - We are concerned here with logical (or
conceptual) design only, not physical design - Database design is still very much an art, not a
science - Database design is not just a question of getting
the data structure right data integrity is a
(perhaps the) key ingredient too. - We will be concerned for the most part with what
might be termed application independent design.
3Database Normalization
- Database normalization is the process of removing
redundant data from your tables in to improve
storage efficiency, data integrity, and
scalability. - In the relational model, methods exist for
quantifying how efficient a database is. These
classifications are called normal forms (or NF),
and there are algorithms for converting a given
database between them. - Normalization generally involves splitting
existing tables into multiple ones, which must be
re-joined or linked each time a query is issued.
4History
- Edgar F. Codd first proposed the process of
normalization and what came to be known as the
1st normal form in his paper A Relational Model
of Data for Large Shared Data Banks Codd stated - There is, in fact, a very simple elimination
procedure which we shall call normalization.
Through decomposition nonsimple domains are
replaced by domains whose elements are atomic
(nondecomposable) values.
5Normalized Design Pros and Cons
Pros of Normalizing Cons of Normalizing
More efficient database structure. Better understanding of your data. More flexible database structure. Easier to maintain database structure. Few (if any) costly surprises down the road. Validates your common sense and intuition. Avoids redundant fields. Ensures that distinct tables exist when necessary. You can't start building the database before you know what the user needs.
6Functional Dependencies
- Informal definition
- A many-to-one relationship between one set of
attributes A and another set of attributes B in a
given relation R - i.e. for many values of the set A, there is only
one value in set B. - functional dependencies (FDs) tell us the meaning
of data - (e.g. every supplier is located in only one
city ) - FDs represent integrity constraints.
- FDs are checked by the database management system
(DBMS) at every update. - So we are interested in finding the smallest set
of FDs that capture the intended meaning of the
data.
7Definitions
- A more formal definition
- Given R, an instance of a relation, and X and Y,
arbitrary attribute subsets of R, then Y is
functionally dependent on X - X ? Y
- if and only if each X-value in R is associated
with precisely one Y-value in R
8Use of Functional Dependencies
- We use functional dependencies to
- test relations to see if they are legal under a
given set of functional dependencies. If a
relation r is legal under a set S of functional
dependencies, we say that r satisfies S. - specify constraints on the set of legal
relations we say that S holds on R if all legal
relations on R satisfy the set of functional
dependencies S. - Note A specific instance of a relation schema
may satisfy a functional dependency even if the
functional dependency does not hold on all legal
instances.
9 Dependencies Definitions
- Partial Dependency
- A partial dependency is a dependency where A is
functionally dependant on B ( A ? B), but there
is some attribute on A that can be removed from A
and yet the dependency stills holds. For instance
if the relation existed StaffNo, sName ?
branchNo Then you could say that for every
StaffNo, sName there is only one value of
branchNo, but since there is no relation between
branchNo and staffNo the relation is only partial.
10- Partial Dependency when an non-key attribute is
determined by a part, but not the whole, of a
COMPOSITE primary key.
Partial Dependency
11Dependencies Definitions
- Transitive Dependency In a transitive
dependancy is where A ? B and B ? C, therefore A
? C (provided that B ? A, and C ? A doesn't
exist).
Transitive Dependency
12Normal Forms Review
- Unnormalized There are multivalued attributes
or repeating groups - 1 NF No multivalued attributes or repeating
groups. - 2 NF 1 NF plus no partial dependencies
- 3 NF 2 NF plus no transitive dependencies
13Example 1 Determine NF
All attributes are directly or indirectly
determined by the primary key therefore, the
relation is at least in 1 NF
- ISBN ? Title
- ISBN ? Publisher
- Publisher ? Address
14Example 1 Determine NF
The relation is at least in 1NF. There is no
COMPOSITE primary key, therefore there cant be
partial dependencies. Therefore, the relation is
at least in 2NF
- ISBN ? Title
- ISBN ? Publisher
- Publisher ? Address
15Example 1 Determine NF
Publisher is a non-key attribute, and it
determines Address, another non-key attribute.
Therefore, there is a transitive dependency,
which means that the relation is NOT in 3 NF.
- ISBN ? Title
- ISBN ? Publisher
- Publisher ? Address
16Example 1 Determine NF
We know that the relation is at least in 2NF, and
it is not in 3 NF. Therefore, we conclude that
the relation is in 2NF.
- ISBN ? Title
- ISBN ? Publisher
- Publisher ? Address
17Example 1 Determine NF
- In your solution you will write the following
justification - No M/V attributes, therefore at least 1NF
- No partial dependencies, therefore at least 2NF
- There is a transitive dependency (Publisher ?
Address), therefore, not 3NF - Conclusion The relation is in 2NF
- ISBN ? Title
- ISBN ? Publisher
- Publisher ? Address
18Example 2 Determine NF
All attributes are directly or indirectly
determined by the primary key therefore, the
relation is at least in 1 NF
19Example 2 Determine NF
The relation is at least in 1NF. There is a
COMPOSITE Primary Key (PK) (Order_No,
Product_ID), therefore there can be partial
dependencies. Product_ID, which is a part of
PK, determines Description hence, there is a
partial dependency. Therefore, the relation is
not 2NF. No sense to check for transitive
dependencies!
20Example 2 Determine NF
We know that the relation is at least in 1NF, and
it is not in 2 NF. Therefore, we conclude that
the relation is in 1 NF.
21Example 2 Determine NF
In your solution you will write the following
justification 1) No M/V attributes, therefore at
least 1NF 2) There is a partial dependency
(Product_ID ? Description), therefore not in
2NF Conclusion The relation is in 1NF
22Example 3 Determine NF
Comp_ID and No are not determined by the primary
key therefore, the relation is NOT in 1 NF. No
sense in looking at partial or transitive
dependencies.
- Part_ID ? Description
- Part_ID ? Price
- Part_ID, Comp_ID ? No
23Example 3 Determine NF
- In your solution you will write the following
justification - There are M/V attributes therefore, not 1NF
- Conclusion The relation is not normalized.
- Part_ID ? Description
- Part_ID ? Price
- Part_ID, Comp_ID ? No
24Bringing a Relation to 1NF
25Bringing a Relation to 1NF
- Option 1 Make a determinant of the repeating
group (or the multivalued attribute) a part of
the primary key.
26Bringing a Relation to 1NF
- Option 2 Remove the entire repeating group from
the relation. Create another relation which would
contain all the attributes of the repeating
group, plus the primary key from the first
relation. In this new relation, the primary key
from the original relation and the determinant of
the repeating group will comprise a primary key.
27Bringing a Relation to 1NF
28Bringing a Relation to 2NF
29Bringing a Relation to 2NF
- Goal Remove Partial Dependencies
30Bringing a Relation to 2NF
- Remove attributes that are dependent from the
part but not the whole of the primary key from
the original relation. For each partial
dependency, create a new relation, with the
corresponding part of the primary key from the
original as the primary key.
31Bringing a Relation to 2NF
32Bringing a Relation to 3NF
- Goal Get rid of transitive dependencies.
Transitive Dependency
33Bringing a Relation to 3NF
- Remove the attributes, which are dependent on a
non-key attribute, from the original relation.
For each transitive dependency, create a new
relation with the non-key attribute which is a
determinant in the transitive dependency as a
primary key, and the dependent non-key attribute
as a dependent.
34Bringing a Relation to 3NF
35Boyce-Codd Normal Form (BCNF)
- Definition A relation is in Boyce-Codd Normal
Form (BCNF) if every determinant is a candidate
key, where as candidate key is a column in a
table which has the ability to become a primary
key.