A Comprehensive Approach to Normalization and Denormalization - PowerPoint PPT Presentation

1 / 20
About This Presentation
Title:

A Comprehensive Approach to Normalization and Denormalization

Description:

Disallowed partial dependencies on any key in 2NF. Disallowed transitive dependencies on any key in 3NF. Simpler definition for 3NF: ... – PowerPoint PPT presentation

Number of Views:215
Avg rating:3.0/5.0
Slides: 21
Provided by: csS1
Category:

less

Transcript and Presenter's Notes

Title: A Comprehensive Approach to Normalization and Denormalization


1
A Comprehensive Approach to Normalization and
Denormalization
  • by
  • Abdul Rashid Ahmad

2
Normalization overview
  • E.F. Codd proposed three normal forms
  • The first, second, and third normal forms
  • 1NF, 2NF and 3NF are based on the functional
    dependencies among the attributes of a relation
  • Boyce-Codd normal form was proposed by Boyce and
    Codd and is known as a stronger definition of 3NF

3
Continued
  • In the normalization process, we start with a
    single universal relation schema (URS) which is
    a table with collection of attributes present in
    the set of functional dependencies
  • Then apply the normal form restrictions to
    develop the complete relational schema,
    (decompose the tables as they specify).

4
Normalization of data
  • based on set of functional dependencies (FD) and
    Prime Key(PK) you analyze the relation schemas
    to achieve two desirable properties
  • 1) Minimizing data redundancy
  • 2) Minimizing insertion, deletion and update
    anomalies
  • Therefore the ideal goal of DB design will be
    developing a fully normalized relational schema,
    preferably in BCNF.

5
Using normal forms
  • Decomposed the relation when a NF test fails
    starting from 1NF.
  • Disclaimers to use of normal forms
  • 1) Do not guarantee good database design.
  • 2) Sometimes highest normal form may not be the
    best design for performance reasons.

6
Normalization Approach
  • Normalizing the relation into 1NF
  • Consider this schema for departments
  • The FD of EMP_DEPT is
  • dnumber -gt dname, dmgrssn
  • Problems with this?
  • department location is multivalued.
  • Possible solution are
  • Make composite primary key dnumber, dlocation

dlocations dmgrssn dnumber dname
7
Continued
  • Problem solved ?
  • Make attributes for the locations if you know
    the possible number of locations.
  • Better solution?
  • Make a new table with dname and dlocation
  • This is the preferred solution

8
Normalizing the relation into 2NF
  • Based on the concept of a full functional
    dependency.
  • A FD X -gt Y is a full functional dependency if
    removal of any attribute from X means that the
    dependency does not hold any more.
  • A partial dependency can occur only if the
    determinate is composite.
  • Partial dependency, if there is some attribute A
    in X that can be removed from X and the
    dependency still holds.

9
Continued
  • A relation is in 2NF if every non-key attribute
    is fully functionally dependent on the prime
    attribute.
  • If a relation is not in 2NF, it can be normalized
    into a number of 2NF relations
  • This involves decomposing the table

10
EMP_PROJ
SSN PNUMBER hours ename pname plocation


  • Normalize the EMP_PROJ table
  • Steps
  • 1) Figure out the dependencies, noting the
    determinates that is any attribute(s) on which
    some other attribute(s) are dependent)
  • 2) Put each determinate in a table by itself.
  • 3) Include in each table the attributes that are
    dependent on that determinate

11
Normalizing the relation into 3NF EMP_DEPT
ename SSN bdate address dnumber dname dmgrssn
  • Analyze the EMP_DEPT relation
  • 1) In 1NF ?
  • 2) In 2NF ?
  • 3) Still a problem ?
  • 4)Identify the problem.
  • The problem with the EMP_DEPT table
  • DNAME depends on DNUM, not the prime
    key.(transitive dependency)
  • X-gt Y is a transitive dependency if there is a
    set of attributes Z of the relation and both X -gt
    Z and Z -gtY hold.
  • A relation is in 3NF if every non-key attribute
    is
  • In 2NF).
  • Non transitively dependent on the prime key.

12
Continued EMP_DEPT
ename SSN bdate address dnumber dname dmgrssn
  • We can normalize EMP_DEPT by decomposing it into
    two 3NF relations.
  • D R1, R2 where
  • R1 EMP (ename, SSN, bdate, address)
  • R2 DEPT(dnumber, dname, dmgrssn)
  • Intuitively, the two results represent
    independent entity facts.
  • To recover the original relation use natural join

13
Generalizing the definition of 2NF and 3NF
  • To assure we do not have update anomalies we need
    to extend this definition to include all
    candidate key rather than just PK.
  • Disallowed partial dependencies on any key in
    2NF.
  • Disallowed transitive dependencies on any key in
    3NF
  • Simpler definition for 3NF
  • Every nonprime attribute must be
  • Fully functionally dependent on every key
  • Non transitively dependent on every key
  •  

14
Boyce-Codd Normal Form
  • Simpler, yet stricter form of 3NF.
  • A relation is in BCNF if and only if every
    determinate is a candidate key.
  • Decomposed into an equivalent set of BCNF if
    relation is not in BCNF.
  • If relations have a composite candidate key, one
    of whose members are determined by a non prime
    attribute are in 3NF but not in BCNF.

15
Continued
property_ID county_name Lot area price tax_rate





  • Few cases where a database is in 3NF and not in
    BCNF.
  • Example
  • Define another FD on the Lots DB
  • All the lots in the DB come from only two
    counties
  • In one county, all lots are gt one acre
  • In the other county, all lots are lt one acre
  • The FD is area -gt county_name

16
Summary
  • In general, it is best to have a relational
    schema in BCNF.
  • If that is not possible, 3NF will do.
  • 2NF and 1NF are not considered good relation
    schema designs.
  • They allow too much data redundancy which leads
    to update anomalies.
  •  

17
Denormalization
  • Jovial Remark, Normalize until it hurts, and
    denormalize until it works.
  • Easier to query but reintroduce data
    redundancies.
  • Always improves data retrieval performance.
  • Example R EMPLOYEE
  • Employee of a larger corporation have access to a
    handful of mutual fund companies for their
    retirement investment

Emp Name Age Salary M_fund Fund_nm Fund_mgr
18
Continued
  • The FD M_fund -gt Fund_nm, Fund_mgr causing a
    violation of 3NF in R.
  • Decompose R to eliminate the 3ND violation
  • D R1, R2 where
  • R1 FUND (M_fund, Fund_nm, Fund_mgr)
  • R2 EMPLOYEE (Emp, Name, Age, Salary, M_fund)
  • Query to the normalized relation which is
    EMPLOYEE requiring retirement financial data lead
    to joining R1 and R2.
  • The (denormalized) R is a better option in such
    situation

19
Continued
  • However, if there are thousands of employees and
    just a handful of these retirement mutual funds.
  • Query requiring only mutual fund data will
    execute rather inefficiently in denormalized
    design.
  • In contrary, denormalization might not improve
    data retrieval performance in certain situation.

20
References
  • http//www.cs.sjsu.edu/faculty/lee/cs157/cs157a.ht
    ml
  • http//www.cs.montana.edu/defrance/classes/spring
    -2003/cs435/lectures/FD.html
Write a Comment
User Comments (0)
About PowerShow.com