Title: Normalization
1Normalization
- Also called loss-less decomposition
- Process of optimizing table structures to
eliminate redundancy and avoid anomalies and
problems with extensibility. - Supports the golden rule Each fact should be
stored in the database only once. - Does not provide the solution to all design
problems but provides a solid foundation.
2Normal Forms
- 1st Normal Form
- 2nd Normal Form
- 3rd Normal Form
- BCNF
- 4th Normal Form
- 5th Normal Form
- Domain-Key Normal Form
31st Normal Form
First Normal Form is violated if
- The relation has no identifiable primary key.
- Any attempt has been made to store a multi-valued
fact in a tuple.
41st NF - Example
Evaluate the design solutions on the next four
slides for
- Query-ability
- Join-ability
- Constrain-ability
- Extensibility (of Language Domain)
- Extensibility (of Schema)
51NF Example Schema 1 (correct)
Programs Table
Employees Table
EMPID
LANGUAGE
EMPID
LNAME
FNAME
DEPT
PHONE
SALARY
SEX
23
COBOL
23
Jones
Mark
ITR
555-1087
45000
M
23
JAVA
25
Smith
Sara
FINC
555-2222
55000
F
23
SQL
26
Billings
David
ACTG
555-4356
42000
M
31
SQL
31
Dance
Ivanna
ACTG
444-4887
60000
F
32
JAVA
32
Jones
Mary
ITR
555-8745
70000
F
32
SQL
35
Barker
Bob
ACTG
555-6565
44000
M
36
Woods
Robin
ITR
555-9812
90000
M
32
VB
37
Jones
Mary
FINC
555-1234
56000
F
32
COBOL
36
VB
36
SQL
36
JAVA
Languages Table
37
COBOL
NAME
FULLNAME
37
SQL
COBOL
COmmon Business Oriented Language
JAVA
JAVA
SQL
Structured Query Language
VB
Visual Basic
61NF Example Schema 2 (incorrect)
Employees Table
LANGUAGES
EMPID
LNAME
FNAME
DEPT
PHONE
SALARY
SEX
COBOL, JAVA, SQL
23
Jones
Mark
ITR
555-1087
45000
M
25
Smith
Sara
FINC
555-2222
55000
F
26
Billings
David
ACTG
555-4356
42000
M
SQL
31
Dance
Ivanna
ACTG
444-4887
60000
F
JAVA, SQL, VB, COBOL
32
Jones
Mary
ITR
555-8745
70000
F
35
Barker
Bob
ACTG
555-6565
44000
M
36
Woods
Robin
ITR
555-9812
90000
M
VB, SQL, JAVA
37
Jones
Mary
FINC
555-1234
56000
F
COBOL, SQL
Languages Table
NAME
FULLNAME
COBOL
COmmon Business Oriented Language
JAVA
JAVA
SQL
Structured Query Language
VB
Visual Basic
71NF Example Schema 3 (incorrect)
Employees Table
LANG1
EMPID
LNAME
FNAME
DEPT
PHONE
SALARY
SEX
LANG2
LANG3
LANG4
23
Jones
Mark
ITR
555-1087
45000
M
COBOL
SQL
JAVA
25
Smith
Sara
FINC
555-2222
55000
F
26
Billings
David
ACTG
555-4356
42000
M
31
Dance
Ivanna
ACTG
444-4887
60000
F
SQL
32
Jones
Mary
ITR
555-8745
70000
F
SQL
JAVA
VB
COBOL
35
Barker
Bob
ACTG
555-6565
44000
M
36
Woods
Robin
ITR
555-9812
90000
M
VB
SQL
JAVA
37
Jones
Mary
FINC
555-1234
56000
F
COBOL
SQL
Languages Table
NAME
FULLNAME
COBOL
COmmon Business Oriented Language
JAVA
JAVA
SQL
Structured Query Language
VB
Visual Basic
81NF Example Schema 4 (incorrect)
Employees Table
COBOL
EMPID
LNAME
FNAME
DEPT
PHONE
SALARY
SEX
JAVA
SQL
VB
23
Jones
Mark
ITR
555-1087
45000
M
T
T
T
F
25
Smith
Sara
FINC
555-2222
55000
F
F
F
F
F
26
Billings
David
ACTG
555-4356
42000
M
F
F
F
F
31
Dance
Ivanna
ACTG
444-4887
60000
F
F
F
T
F
32
Jones
Mary
ITR
555-8745
70000
F
T
T
T
T
35
Barker
Bob
ACTG
555-6565
44000
M
F
F
F
F
36
Woods
Robin
ITR
555-9812
90000
M
F
T
T
T
37
Jones
Mary
FINC
555-1234
56000
F
T
F
T
F
Languages Table
NAME
FULLNAME
COBOL
COmmon Business Oriented Language
JAVA
JAVA
SQL
Structured Query Language
VB
Visual Basic
92nd Normal Form
Second Normal Form is violated if
- First Normal Form is violated
- If there exists a non-key field(s) which is
functionally dependent on a partial key. - partial key non-key
102NF Example Raw Data
JE 1 02-JAN-2003 100 Cash 310
Smith-Capital (owner investment)
20,000
20,000
JE 2 03-JAN-2003 100 Cash 220 Notes
Payable (borrowed money)
30,000
30,000
JE 3 03-JAN-2003 120 Supplies 100 Cash
220 Notes Payable (purchased supplies)
5,000
1,000
4,000
112NF Example Violation
Transactions Table
JENO
LINENO
DESCRIPTION
ACCTNO
ACCTNAME
AMOUNT
DATE
1
1
Owner investment
100
Cash
20,000
02-JAN-2003
1
2
Owner investment
310
Smith-Capital
(20,000)
02-JAN-2003
2
1
Borrowed money
100
Cash
30,000
03-JAN-2003
2
2
Borrowed money
220
Notes Payable
(30,000)
03-JAN-2003
3
1
Purchased Supplies
120
Supplies
5,000
03-JAN-2003
3
2
Purchased Supplies
100
Cash
(1,000)
03-JAN-2003
3
3
Purchased Supplies
220
Notes Payable
(4,000)
03-JAN-2003
Is there a non-key field which is functional
dependent on a partial key?
122NF Example ViolationFDs that indicate
violation of 2NF
JENO
LINENO
DESCRIPTION
ACCTNO
ACCTNAME
AMOUNT
DATE
1
1
Owner investment
100
Cash
20,000
02-JAN-2003
1
2
Owner investment
310
Smith-Capital
(20,000)
02-JAN-2003
2
1
Borrowed money
100
Cash
30,000
03-JAN-2003
2
2
Borrowed money
220
Notes Payable
(30,000)
03-JAN-2003
3
1
Purchased Supplies
120
Supplies
5,000
03-JAN-2003
3
2
Purchased Supplies
100
Cash
(1,000)
03-JAN-2003
3
3
Purchased Supplies
220
Notes Payable
(4,000)
03-JAN-2003
132NF Example Corrected
Journal_Entry Table
JENO
DESCRIPTION
DATE
1
Owner investment
02-JAN-2003
2
Borrowed money
03-JAN-2003
3
Purchased Supplies
03-JAN-2003
Transactions Table
JENO
LINENO
ACCTNO
ACCTNAME
AMOUNT
1
1
100
Cash
20,000
1
2
310
Smith-Capital
(20,000)
2
1
100
Cash
30,000
2
2
220
Notes Payable
(30,000)
3
1
120
Supplies
5,000
3
2
100
Cash
(1,000)
3
3
220
Notes Payable
(4,000)
143rd Normal Form
Third Normal Form is violated if
- Second Normal Form is violated
- If there exists a non-key field(s) which is
functionally dependent on another non-key
field(s). - non-key non-key
Note A candidate key is not a non-key field.
153NF Example Violation
Journal_Entry Table
Are there any non-key fields which functional
determine another non-key field?
JENO
DESCRIPTION
DATE
1
Owner investment
02-JAN-2003
2
Borrowed money
03-JAN-2003
3
Purchased Supplies
03-JAN-2003
Transactions Table
JENO
LINENO
ACCTNO
ACCTNAME
AMOUNT
1
1
100
Cash
20,000
Are there any redundant facts?
1
2
310
Smith-Capital
(20,000)
2
1
100
Cash
30,000
2
2
220
Notes Payable
(30,000)
3
1
120
Supplies
5,000
3
2
100
Cash
(1,000)
3
3
220
Notes Payable
(4,000)
163NF Example ViolationFD that indicates
violation of 3NF
Journal_Entry Table
- Anomalies if not corrected
- update (if name of account 100 changes it must
be changed in multiple places risking
inconsistancy) - deletion (can't delete JE3 and its
transactions without losing information about
account 120) - insertion (can't set up a new account,
Jones-capital, for a new partner unless we first
have a transaction involving that account.
JENO
DESCRIPTION
DATE
1
Owner investment
02-JAN-2003
2
Borrowed money
03-JAN-2003
3
Purchased Supplies
03-JAN-2003
JENO
LINENO
ACCTNO
ACCTNAME
AMOUNT
1
1
100
Cash
20,000
1
2
310
Smith-Capital
(20,000)
2
1
100
Cash
30,000
2
2
220
Notes Payable
(30,000)
3
1
120
Supplies
5,000
3
2
100
Cash
(1,000)
3
3
220
Notes Payable
(4,000)
173NF Example Corrected
Journal_Entry Table
Accounts Table
JENO
DESCRIPTION
DATE
ACCTNO
ACCTNAME
1
Owner investment
02-JAN-2003
100
Cash
2
Borrowed money
03-JAN-2003
120
Supplies
3
Purchased Supplies
03-JAN-2003
220
Notes Payable
310
Smith-Capital
Transactions Table
JENO
LINENO
ACCTNO
AMOUNT
1
1
100
20,000
1
2
310
(20,000)
2
1
100
30,000
2
2
220
(30,000)
3
1
120
5,000
3
2
100
(1,000)
3
3
220
(4,000)
183NF Example CorrectedFinal Dependencies
JENO
DESCRIPTION
DATE
ACCTNO
ACCTNAME
1
Owner investment
02-JAN-2003
100
Cash
2
Borrowed money
03-JAN-2003
120
Supplies
3
Purchased Supplies
03-JAN-2003
220
Notes Payable
310
Smith-Capital
JENO
LINENO
ACCTNO
AMOUNT
1
1
100
20,000
All non-key fields are FD on the PK and only the
PK.
1
2
310
(20,000)
2
1
100
30,000
2
2
220
(30,000)
3
1
120
5,000
3
2
100
(1,000)
3
3
220
(4,000)
19BCNF Normal Form
Boyce-Codd Normal Form is violated if
- Third Normal Form is violated
- If there exists a partial key which is
functionally dependent on a non-key field(s). - non-key partial-key
20BCNF ExampleSemantics
- A student can have more than one major
- A student has a different advisor for each major.
- Each advisor advises for only one major.
21BCNF Example Violation
Student_Majors Table
SID
MAJOR
ADVISOR
1
PHYSICS
EINSTEIN
1
BIOLOGY
LIVINGSTON
2
PHYSICS
BOHR
2
COMPUTER SCIENCE
CODD
3
PHYSICS
EINSTEIN
4
BIOLOGY
LIVINGSTON
4
ACCOUNTING
PACIOLI
5
PHYSICS
EINSTEIN
6
PHYSICS
BOHR
6
BIOLOGY
DARWIN
7
COMPUTER SCIENCE
CODD
7
BIOLOGY
DARWIN
Does this relation violate third normal form? Are
there any redundant facts?
22BCNF Example ViolationFD that violates BCNF
SID
MAJOR
ADVISOR
It is important that you convince yourself that
major does not FD advisor.
1
PHYSICS
EINSTEIN
1
BIOLOGY
LIVINGSTON
2
PHYSICS
BOHR
2
COMPUTER SCIENCE
CODD
3
PHYSICS
EINSTEIN
4
BIOLOGY
LIVINGSTON
4
ACCOUNTING
PACIOLI
5
PHYSICS
EINSTEIN
6
PHYSICS
BOHR
6
BIOLOGY
DARWIN
7
COMPUTER SCIENCE
CODD
7
BIOLOGY
DARWIN
23BCNF Example Corrected
Advisors Table
MAJOR
ADVISOR
PHYSICS
BOHR
Student_Advisors Table
COMPUTER SCIENCE
CODD
SID
ADVISOR
BIOLOGY
DARWIN
1
EINSTEIN
PHYSICS
EINSTEIN
1
LIVINGSTON
BIOLOGY
LIVINGSTON
2
BOHR
ACCOUNTING
PACIOLI
2
CODD
3
EINSTEIN
Note that the if the original key,
counter-intuitively, in schema 1 had been
defined as SID ADVISOR this would have been a
2NF violation.
4
LIVINGSTON
4
PACIOLI
5
EINSTEIN
6
BOHR
6
DARWIN
7
CODD
7
DARWIN
244th Normal Form
4th Normal Form is violated if
- Boyce Codd Normal Form is violated
- If there exists a partial key which has multiple
independent multi-valued functional dependencies
to other partial keys. - partial-key1 partial-key2
- partial-key3
254NF Example Violation
Instruments_Languages
Name
Language
Instrument
Fred
French
Piano
Fred
Italian
Flute
Fred
Spanish
Flute
Jane
French
Piano
Jane
French
Oboe
Sam
French
Piano
Sam
Spanish
Oboe
Sam
Spanish
Flute
264NF Example Violation
Name
Language
Instrument
Fred
French
Piano
Fred
Italian
Flute
Fred
Spanish
Flute
Jane
French
Piano
Jane
French
Oboe
Sam
French
Piano
Sam
Spanish
Oboe
Sam
Spanish
Flute
Does this relation violate 1st, 2nd, 3rd, or
BCNF? Are there any redundant facts?
274NF Example Correction
LanguagesSpoken
InstrumentsPlayed
Name
Language
Name
Instrument
Fred
French
Fred
Piano
Fred
Italian
Fred
Flute
Fred
Spanish
Jane
Piano
Jane
French
Jane
Oboe
Sam
French
Sam
Piano
Sam
Spanish
Sam
Oboe
Sam
Flute