Title: Normalization
1Normalization
- Anomalies
- Boyce-Codd Normal Form
- 3rd Normal Form
2Anomalies
- Goal of relational schema design is to avoid
anomalies and redundancy. - Update anomaly one occurrence of a fact is
changed, but not all occurrences. - Deletion anomaly valid fact is lost when a
tuple is deleted.
3Example of Bad Design
Drivers(name, addr, carsLiked, manf,
favcar) name addr carsLiked manf favcar Janeway
Voyager Mustang Ford Carolla Janeway ??? Carolla
Toyota ??? Spock Enterprise Mustang ??? Mustang
Data is redundant, because each of the ???s can
be figured out by using the FDs name -gt addr
favcar and carsLiked -gt manf.
4This Bad Design AlsoExhibits Anomalies
name addr carsLiked manf favcar Janeway Voyage
r Mustang Ford Carolla Janeway Voyager Carolla Toy
ota Carolla Spock Enterprise Mustang Ford Mustang
- Update anomaly if Janeway is transferred to
Intrepid, - will we remember to change each of her tuples?
- Deletion anomaly If nobody likes Mustang, we
lose track - of the fact that Ford manufactures Mustang.
5Boyce-Codd Normal Form
- We say a relation R is in BCNF if whenever X
-gtA is a nontrivial FD that holds in R, X is a
superkey. - Remember nontrivial means A is not a member of
set X. - Remember, a superkey is any superset of a key
(not necessarily a proper superset).
6Example
- Drivers(name, addr, carsLiked, manf, favcar)
- FDs name-gtaddr favcar, carsLiked-gtmanf
- Only key is name, carsLiked.
- In each FD, the left side is not a superkey.
- Any one of these FDs shows Drivers is not in
BCNF
7Another Example
- cars(name, manf, manfAddr)
- FDs name-gtmanf, manf-gtmanfAddr
- Only key is name .
- name-gtmanf does not violate BCNF, but
manf-gtmanfAddr does.
8Decomposition into BCNF
- Given relation R with FDs F.
- Look among the given FDs for a BCNF violation X
-gtB. - If any FD following from F violates BCNF, then
there will surely be an FD in F itself that
violates BCNF. - Compute X .
- Not all attributes, or else X is a superkey.
9Decompose R Using X -gt B
- Replace R by relations with schemas
- R1 X .
- R2 R (X X ).
- Project given FDs F onto the two new relations.
10Decomposition Picture
R1
R-X
X
X -X
R2
R
11Example
- Drivers(name, addr, carsLiked, manf, favcar)
- F name-gtaddr, name -gt favcar, carsLiked-gtmanf
- Pick BCNF violation name-gtaddr.
- Close the left side name name, addr,
favcar. - Decomposed relations
- Drivers1(name, addr, favcar)
- Drivers2(name, carsLiked, manf)
12Example, Continued
- We are not done we need to check Drivers1 and
Drivers2 for BCNF. - Projecting FDs is easy here.
- For Drivers1(name, addr, favcar), relevant FDs
are name-gtaddr and name-gtfavcar. - Thus, name is the only key and Drivers1 is in
BCNF.
13Example, Continued
- For Drivers2(name, carsLiked, manf), the only FD
is carsLiked-gtmanf, and the only key is name,
carsLiked. - Violation of BCNF.
- carsLiked carsLiked, manf, so we decompose
Drivers2 into - Drivers3(carsLiked, manf)
- Drivers4(name, carsLiked)
14Example, Concluded
- The resulting decomposition of Drivers
- Drivers1(name, addr, favcar)
- Drivers3(carsLiked, manf)
- Drivers4(name, carsLiked)
- Notice Drivers1 tells us about Drivers,
Drivers3 tells us about cars, and Drivers4
tells us the relationship between Drivers and the
cars they like.
15Third Normal Form - Motivation
- There is one structure of FDs that causes
trouble when we decompose. - AB -gtC and C -gtB.
- Example A street address, B city, C
zip code. - There are two keys, A,B and A,C .
- C -gtB is a BCNF violation, so we must decompose
into AC, BC.
16We Cannot Enforce FDs
- The problem is that if we use AC and BC as our
database schema, we cannot enforce the FD AB -gtC
by checking FDs in these decomposed relations. - Example with A street, B city, and C zip on
the next slide.
17An Unenforceable FD
street zip 545 Tech Sq. 02138 545 Tech
Sq. 02139
city zip Cambridge 02138 Cambridge 02139
Although no FDs were violated in the decomposed
relations, FD street city -gt zip is violated by
the database as a whole.
183NF Lets Us Avoid This Problem
- 3rd Normal Form (3NF) modifies the BCNF condition
so we do not have to decompose in this problem
situation. - An attribute is prime if it is a member of any
key. - X -gtA violates 3NF if and only if X is not a
superkey, and also A is not prime.
19Example
- In our problem situation with FDs AB -gtC
and C -gtB, we have keys AB and AC. - Thus A, B, and C are each prime.
- Although C -gtB violates BCNF, it does not
violate 3NF.
20What 3NF and BCNF Give You
- There are two important properties of a
decomposition - Recovery it should be possible to project the
original relations onto the decomposed schema,
and then reconstruct the original. - Dependency Preservation it should be possible
to check in the projected relations whether all
the given FDs are satisfied.
213NF and BCNF, Continued
- We can get (1) with a BCNF decomposition.
- Explanation needs to wait for relational algebra.
- We can get both (1) and (2) with a 3NF
decomposition. - But we cant always get (1) and (2) with a BCNF
decomposition. - street-city-zip is an example.
22Multivalued Dependencies
23Definition of MVD
- A multivalued dependency (MVD) on R, X -gt-gtY ,
says that if two tuples of R agree on all the
attributes of X, then their components in Y may
be swapped, and the result will be two tuples
that are also in the relation. - i.e., for each value of X, the values of Y are
independent of the values of R-X-Y.
24Example
- Drivers(name, addr, phones, carsLiked)
- A Drivers phones are independent of the cars
they like. - name-gt-gtphones and name -gt-gtcarsLiked.
- Thus, each of a Drivers phones appears with each
of the cars they like in all combinations. - This repetition is unlike FD redundancy.
- name-gtaddr is the only FD.
25Tuples Implied by name-gt-gtphones
If we have tuples
name addr phones carsLiked sue a p1
b1 sue a p2 b2
26Picture of MVD X -gt-gtY
X Y others equal exchange
27MVD Rules
- Every FD is an MVD (promotion ).
- If X -gtY, then swapping Y s between two tuples
that agree on X doesnt change the tuples. - Therefore, the new tuples are surely in the
relation, and we know X -gt-gtY. - Complementation If X -gt-gtY, and Z is all the
other attributes, then X -gt-gtZ.
28Splitting Doesnt Hold
- Like FDs, we cannot generally split the left
side of an MVD. - But unlike FDs, we cannot split the right side
either --- sometimes you have to leave several
attributes on the right side.
29Example
- Drivers(name, areaCode, phone, carsLiked, manf)
- A Driver can have several phones, with the number
divided between areaCode and phone (last 7
digits). - A Driver can like several cars, each with its own
manufacturer.
30Example, Continued
- Since the areaCode-phone combinations for a
Driver are independent of the carsLiked-manf
combinations, we expect that the following MVDs
hold - name -gt-gt areaCode phone
- name -gt-gt carsLiked manf
31Example Data
Here is possible data satisfying these
MVDs name areaCode phone carsLiked manf Sue 65
0 555-1111 Mustang Ford Sue 650 555-1111 Corvett
e G.M. Sue 415 555-9999 Mustang Ford Sue 415 555
-9999 Corvette G.M.
But we cannot swap area codes or phones by
themselves. That is, neither name-gt-gtareaCode nor
name-gt-gtphone holds for this relation.
32Fourth Normal Form
- The redundancy that comes from MVDs is not
removable by putting the database schema in BCNF. - There is a stronger normal form, called 4NF, that
(intuitively) treats MVDs as FDs when it comes
to decomposition, but not when determining keys
of the relation.
334NF Definition
- A relation R is in 4NF if whenever X
-gt-gtY is a nontrivial MVD, then X is a
superkey. - Nontrivial MVD means that
- Y is not a subset of X, and
- X and Y are not, together, all the attributes.
- Note that the definition of superkey still
depends on FDs only.
34BCNF Versus 4NF
- Remember that every FD X -gtY is also an MVD, X
-gt-gtY. - Thus, if R is in 4NF, it is certainly in BCNF.
- Because any BCNF violation is a 4NF violation
(after conversion to an MVD). - But R could be in BCNF and not 4NF, because
MVDs are invisible to BCNF.
35Decomposition and 4NF
- If X -gt-gtY is a 4NF violation for relation R, we
can decompose R using the same technique as for
BCNF. - XY is one of the decomposed relations.
- All but Y X is the other.
36Example
- Drivers(name, addr, phones, carsLiked)
- FD name -gt addr
- MVDs name -gt-gt phones
- name -gt-gt carsLiked
- Key is name, phones, carsLiked.
- All dependencies violate 4NF.
37Example, Continued
- Decompose using name -gt addr
- Drivers1(name, addr)
- In 4NF only dependency is name -gt addr.
- Drivers2(name, phones, carsLiked)
- Not in 4NF. MVDs name -gt-gt phones and name -gt-gt
carsLiked apply. No FDs, so all three
attributes form the key.
38Example Decompose Drivers2
- Either MVD name -gt-gt phones or name -gt-gt
carsLiked tells us to decompose to - Drivers3(name, phones)
- Drivers4(name, carsLiked)