Title: Normalization for better DB deign
1Normalization for better DB deign
2Normalization
- Normalization is a process for creating several
tables from an un-normalized relation (Table).
The purpose of normalization is to reduce data
redundancies and eliminate the data anomalies. - The techniques of normalization now provide a
foundation for logical data base design.
3Data anomalies (An undesirable consequence of a
data modification) occurs with poorly designed
Tables.
- Insertion anomaly facts (data) about two or more
different entities must be added to a single row
of a relation. - Deletion anomaly facts (data) about two or more
different entities are lost when a single row of
a relation is deleted. - Update anomaly
4Insertion anomaly The table below contains info
about customers and their agents. To add new
customers to this table, we must enter the
corresponding agent data. By doing so, the
potential for creating data inconsistency is
great!
5Modification anomaly Using The same table below,
if agent Leah Hahn married to John Doe
yesterday, we need to change her name to Leah
Doe. In this case, only three changes must be
made. In a large table, such change might occur
in thousands of records. By doing so, the
potential for creating data inconsistency is
great!
6Deletion anomaly Using The same table below, if
agent Leah Hahn decides to quit the job because
John is a millionaire, we must delete all
fields in 3 records. In this case, only three
changes must be made. In a large table, such
change might occur in thousands of records. By
doing so, the potential for creating data
inconsistency is great!
7Normalization You have seen various types of
data anomalies with poorly designed tables, lets
design a better table using normalization. .
- Normalization is done through the analysis of
functional dependencies between attributes (or
data items or fields). - The attribute B is functionally dependent on the
attribute A if each value in column A determines
one and only one value in column B.
8Steps in Normalization
- User views are identified.
- Each user view is converted to the form of an
unnormalized relation. Any repeating groups are
then removed from the unnormalized relation the
result is a relation in first normal form (1NF). - Any partial dependencies are removed from these
relations the result is a set of relations in
second normal form (2NF). - Any transitive dependencies are removed, creating
a set of relations in third normal form (3NF).
9Example 1
10User View
- The user view the data as if they were laid out
in tabular form, which is an unnormalized
relation (a relation that contains one or more
repeating groups). - Student_Activity REPORT is an example of
user-view (next slide).
11User-view of student activity SID is a unique
student ID, Club is the name of a club, Cost is
the cost of joining a club, and AmtPaid is the
amount the student has paid to that club.
12An un-normalized relation user view SID is a
unique student ID, Club is the name of a club,
Cost is the cost of joining a club, and AmtPaid
is the amount the student has paid to that club.
13 Remove repeating groups
Remove partial dependencies
Remove transitive dependencies
14Unnormalized Relations
- Student Activity REPORT is an example of an
unnormalized relation. - An unnormalized relation is a relation that
contains one or more repeating groups. - Repeating groups a group of several data
entries associated with one attribute. - Derives its name from the fact that a group of
multiple (related) entries can exist for any
single key attribute occurrence
15Conversion to First Normal Form
- A relational table must not contain repeating
groups. The user-view of student activity report
contains a repeating group, the 3rd row from the
bottom. - Repeating groups can be eliminated by adding the
appropriate entry in at least the primary key
column(s).
16Normalized Relations (1NF)
17- 1NF Definition
- The term first normal form (1NF) describes the
tabular format in which - There are no repeating groups in the table.
- (All attributes are dependent on the primary key)
- (All the key attributes are defined.)
- All relational tables satisfy the 1NF
requirements.
18From 1NF to 2NF (As we remove partial dependency
from 1NF table, the table becomes 2NF).
- To do so, we need to draw a functional dependency
diagram. - A ----? B (This notation is interpreted as
- The attribute B is functionally dependent on the
attribute A if each value in column A determines
one and only one value in column B. - The attribute A determines only one value in
Column B.
19Functional Dependency analysis
20Functional Dependency Diagram
21Normalized Tables (2NF and 3NF)Each table is 3NF
table, which means it has no partial dependency
and transitive dependency.
22Example 2
- Example company that manages building projects
- Charges its clients by billing hours spent on
each contract - Hourly billing rate is dependent on employees
position - Periodically, a report is generated that contains
information displayed in Table 5.1
23A Sample Report Layout
24A Table in the Report Format
25The Need for Normalization (continued)
- Structure of data set in Figure 5.1 does not
handle data very well - The table structure appears to work report is
generated with ease - Unfortunately, the report may yield different
results, depending on what data anomaly has
occurred
26Conversion to First Normal Form
- Repeating group
- Derives its name from the fact that a group of
multiple (related) entries can exist for any
single key attribute occurrence - Relational table must not contain repeating
groups - Normalizing the table structure will reduce these
data redundancies - Normalization is three-step procedure
27Step 1 Eliminate the Repeating Groups
- Present data in a tabular format, where each cell
has a single value and there are no repeating
groups - Eliminate repeating groups by eliminating nulls,
making sure that each repeating group attribute
contains an appropriate data value
28Data Organization First Normal Form
29Step 2 Identify the Primary Key
- Primary key must uniquely identify attribute
value - New key must be composed
30Step 3 Identify all Dependencies
- Dependencies can be depicted with the help of a
diagram - Dependency diagram
- Depicts all dependencies found within a given
table structure - Helpful in getting birds-eye view of all
relationships among a tables attributes - Use makes it much less likely that an important
dependency will be overlooked
31A Dependency Diagram First Normal Form (1NF)
32First Normal Form
- Tabular format in which
- All key attributes are defined
- There are no repeating groups in the table
- All attributes are dependent on primary key
- All relational tables satisfy 1NF requirements
- Some tables contain partial dependencies
- Dependencies based on only part of the primary
key - Sometimes used for performance reasons, but
should be used with caution - Still subject to data redundancies
33Conversion to Second Normal Form
- Relational database design can be improved by
converting the database into second normal form
(2NF) - Two steps
34Step 1 Identify All Key Components
- Write each key component on separate line, and
then write the original (composite) key on the
last line - Each component will become the key in a new table
35Step 2 Identify the Dependent Attributes
- Determine which attributes are dependent on which
other attributes - At this point, most anomalies have been
eliminated
36Second Normal Form (2NF) Conversion Results
37Second Normal Form
- Table is in second normal form (2NF) if
- It is in 1NF and
- It includes no partial dependencies
- No attribute is dependent on only a portion of
the primary key
38Conversion to Third Normal Form
- Data anomalies created are easily eliminated by
completing three steps
39Step 1 Identify Each New Determinant
- For every transitive dependency, write its
determinant as a PK (Primary Key) for a new table - Determinant
- Any attribute whose value determines other values
within a row
40Step 2 Identify the Dependent Attributes
- Identify the attributes dependent on each
determinant identified in Step 1 and identify the
dependency - Name the table to reflect its contents and
function
41Step 3 Remove the Dependent Attributes from
Transitive Dependencies
- Eliminate all dependent attributes in transitive
relationship(s) from each table that has such a
transitive relationship - Draw a new dependency diagram to show all tables
defined in Steps 13 - Check new tables and modified tables from Step 3
to make sure that each has a determinant and does
not contain inappropriate dependencies
42Third Normal Form (3NF) Conversion Results
43Third Normal Form
- A table is in third normal form (3NF) if
- It is in 2NF and
- It contains no transitive dependencies
44A transitive dependency occurs when one nonkey
attribute (such as Charge_Hour in Fig. 5.4) is
dependent on one or more nonkey attributes (such
as Job_Class).
PRIMARY KEY
A
B
- In this case, there is a transitive dependency
between the primary key and attributes A and B.
Transitive dependencies result in insertion,
deletion, and update anomalies, similar to those
for partial dependencies.
45Improving the Design
- Table structures are cleaned up to eliminate the
troublesome initial partial and transitive
dependencies - Normalization cannot, by itself, be relied on to
make good designs - It is valuable because its use helps eliminate
data redundancies
46The Completed Database
47The Completed Database (continued)
48Summary
- Normalization is a table design technique aimed
at minimizing data redundancies - First three normal forms (1NF, 2NF, and 3NF) are
most commonly encountered - 2NF is better than 1NF 3NF is better than 2NF
- For most business database design purposes, 3NF
is highest we need to go in the normalization
process