Normalization - PowerPoint PPT Presentation

About This Presentation
Title:

Normalization

Description:

Example: A = street address, B = city, C = zip code. There are two keys, {A,B } and {A,C } ... But we cannot swap area codes or phones by themselves. ... – PowerPoint PPT presentation

Number of Views:38
Avg rating:3.0/5.0
Slides: 39
Provided by: jeff467
Learn more at: http://sandbox.mc.edu
Category:

less

Transcript and Presenter's Notes

Title: Normalization


1
Normalization
  • Anomalies
  • Boyce-Codd Normal Form
  • 3rd Normal Form

2
Anomalies
  • Goal of relational schema design is to avoid
    anomalies and redundancy.
  • Update anomaly one occurrence of a fact is
    changed, but not all occurrences.
  • Deletion anomaly valid fact is lost when a
    tuple is deleted.

3
Example of Bad Design
Drivers(name, addr, carsLiked, manf,
favcar) name addr carsLiked manf favcar Janeway
Voyager Mustang Ford Carolla Janeway ??? Carolla
Toyota ??? Spock Enterprise Mustang ??? Mustang
Data is redundant, because each of the ???s can
be figured out by using the FDs name -gt addr
favcar and carsLiked -gt manf.
4
This Bad Design AlsoExhibits Anomalies
name addr carsLiked manf favcar Janeway Voyage
r Mustang Ford Carolla Janeway Voyager Carolla Toy
ota Carolla Spock Enterprise Mustang Ford Mustang
  • Update anomaly if Janeway is transferred to
    Intrepid,
  • will we remember to change each of her tuples?
  • Deletion anomaly If nobody likes Mustang, we
    lose track
  • of the fact that Ford manufactures Mustang.

5
Boyce-Codd Normal Form
  • We say a relation R is in BCNF if whenever X
    -gtA is a nontrivial FD that holds in R, X is a
    superkey.
  • Remember nontrivial means A is not a member of
    set X.
  • Remember, a superkey is any superset of a key
    (not necessarily a proper superset).

6
Example
  • Drivers(name, addr, carsLiked, manf, favcar)
  • FDs name-gtaddr favcar, carsLiked-gtmanf
  • Only key is name, carsLiked.
  • In each FD, the left side is not a superkey.
  • Any one of these FDs shows Drivers is not in
    BCNF

7
Another Example
  • cars(name, manf, manfAddr)
  • FDs name-gtmanf, manf-gtmanfAddr
  • Only key is name .
  • name-gtmanf does not violate BCNF, but
    manf-gtmanfAddr does.

8
Decomposition into BCNF
  • Given relation R with FDs F.
  • Look among the given FDs for a BCNF violation X
    -gtB.
  • If any FD following from F violates BCNF, then
    there will surely be an FD in F itself that
    violates BCNF.
  • Compute X .
  • Not all attributes, or else X is a superkey.

9
Decompose R Using X -gt B
  • Replace R by relations with schemas
  • R1 X .
  • R2 R (X X ).
  • Project given FDs F onto the two new relations.

10
Decomposition Picture
R1
R-X
X
X -X
R2
R
11
Example
  • Drivers(name, addr, carsLiked, manf, favcar)
  • F name-gtaddr, name -gt favcar, carsLiked-gtmanf
  • Pick BCNF violation name-gtaddr.
  • Close the left side name name, addr,
    favcar.
  • Decomposed relations
  • Drivers1(name, addr, favcar)
  • Drivers2(name, carsLiked, manf)

12
Example, Continued
  • We are not done we need to check Drivers1 and
    Drivers2 for BCNF.
  • Projecting FDs is easy here.
  • For Drivers1(name, addr, favcar), relevant FDs
    are name-gtaddr and name-gtfavcar.
  • Thus, name is the only key and Drivers1 is in
    BCNF.

13
Example, Continued
  • For Drivers2(name, carsLiked, manf), the only FD
    is carsLiked-gtmanf, and the only key is name,
    carsLiked.
  • Violation of BCNF.
  • carsLiked carsLiked, manf, so we decompose
    Drivers2 into
  • Drivers3(carsLiked, manf)
  • Drivers4(name, carsLiked)

14
Example, Concluded
  • The resulting decomposition of Drivers
  • Drivers1(name, addr, favcar)
  • Drivers3(carsLiked, manf)
  • Drivers4(name, carsLiked)
  • Notice Drivers1 tells us about Drivers,
    Drivers3 tells us about cars, and Drivers4
    tells us the relationship between Drivers and the
    cars they like.

15
Third Normal Form - Motivation
  • There is one structure of FDs that causes
    trouble when we decompose.
  • AB -gtC and C -gtB.
  • Example A street address, B city, C
    zip code.
  • There are two keys, A,B and A,C .
  • C -gtB is a BCNF violation, so we must decompose
    into AC, BC.

16
We Cannot Enforce FDs
  • The problem is that if we use AC and BC as our
    database schema, we cannot enforce the FD AB -gtC
    by checking FDs in these decomposed relations.
  • Example with A street, B city, and C zip on
    the next slide.

17
An Unenforceable FD
street zip 545 Tech Sq. 02138 545 Tech
Sq. 02139
city zip Cambridge 02138 Cambridge 02139
Although no FDs were violated in the decomposed
relations, FD street city -gt zip is violated by
the database as a whole.
18
3NF Lets Us Avoid This Problem
  • 3rd Normal Form (3NF) modifies the BCNF condition
    so we do not have to decompose in this problem
    situation.
  • An attribute is prime if it is a member of any
    key.
  • X -gtA violates 3NF if and only if X is not a
    superkey, and also A is not prime.

19
Example
  • In our problem situation with FDs AB -gtC
    and C -gtB, we have keys AB and AC.
  • Thus A, B, and C are each prime.
  • Although C -gtB violates BCNF, it does not
    violate 3NF.

20
What 3NF and BCNF Give You
  • There are two important properties of a
    decomposition
  • Recovery it should be possible to project the
    original relations onto the decomposed schema,
    and then reconstruct the original.
  • Dependency Preservation it should be possible
    to check in the projected relations whether all
    the given FDs are satisfied.

21
3NF and BCNF, Continued
  • We can get (1) with a BCNF decomposition.
  • Explanation needs to wait for relational algebra.
  • We can get both (1) and (2) with a 3NF
    decomposition.
  • But we cant always get (1) and (2) with a BCNF
    decomposition.
  • street-city-zip is an example.

22
Multivalued Dependencies
  • Fourth Normal Form

23
Definition of MVD
  • A multivalued dependency (MVD) on R, X -gt-gtY ,
    says that if two tuples of R agree on all the
    attributes of X, then their components in Y may
    be swapped, and the result will be two tuples
    that are also in the relation.
  • i.e., for each value of X, the values of Y are
    independent of the values of R-X-Y.

24
Example
  • Drivers(name, addr, phones, carsLiked)
  • A Drivers phones are independent of the cars
    they like.
  • name-gt-gtphones and name -gt-gtcarsLiked.
  • Thus, each of a Drivers phones appears with each
    of the cars they like in all combinations.
  • This repetition is unlike FD redundancy.
  • name-gtaddr is the only FD.

25
Tuples Implied by name-gt-gtphones
If we have tuples
name addr phones carsLiked sue a p1
b1 sue a p2 b2
26
Picture of MVD X -gt-gtY
X Y others equal exchange
27
MVD Rules
  • Every FD is an MVD (promotion ).
  • If X -gtY, then swapping Y s between two tuples
    that agree on X doesnt change the tuples.
  • Therefore, the new tuples are surely in the
    relation, and we know X -gt-gtY.
  • Complementation If X -gt-gtY, and Z is all the
    other attributes, then X -gt-gtZ.

28
Splitting Doesnt Hold
  • Like FDs, we cannot generally split the left
    side of an MVD.
  • But unlike FDs, we cannot split the right side
    either --- sometimes you have to leave several
    attributes on the right side.

29
Example
  • Drivers(name, areaCode, phone, carsLiked, manf)
  • A Driver can have several phones, with the number
    divided between areaCode and phone (last 7
    digits).
  • A Driver can like several cars, each with its own
    manufacturer.

30
Example, Continued
  • Since the areaCode-phone combinations for a
    Driver are independent of the carsLiked-manf
    combinations, we expect that the following MVDs
    hold
  • name -gt-gt areaCode phone
  • name -gt-gt carsLiked manf

31
Example Data
Here is possible data satisfying these
MVDs name areaCode phone carsLiked manf Sue 65
0 555-1111 Mustang Ford Sue 650 555-1111 Corvett
e G.M. Sue 415 555-9999 Mustang Ford Sue 415 555
-9999 Corvette G.M.
But we cannot swap area codes or phones by
themselves. That is, neither name-gt-gtareaCode nor
name-gt-gtphone holds for this relation.
32
Fourth Normal Form
  • The redundancy that comes from MVDs is not
    removable by putting the database schema in BCNF.
  • There is a stronger normal form, called 4NF, that
    (intuitively) treats MVDs as FDs when it comes
    to decomposition, but not when determining keys
    of the relation.

33
4NF Definition
  • A relation R is in 4NF if whenever X
    -gt-gtY is a nontrivial MVD, then X is a
    superkey.
  • Nontrivial MVD means that
  • Y is not a subset of X, and
  • X and Y are not, together, all the attributes.
  • Note that the definition of superkey still
    depends on FDs only.

34
BCNF Versus 4NF
  • Remember that every FD X -gtY is also an MVD, X
    -gt-gtY.
  • Thus, if R is in 4NF, it is certainly in BCNF.
  • Because any BCNF violation is a 4NF violation
    (after conversion to an MVD).
  • But R could be in BCNF and not 4NF, because
    MVDs are invisible to BCNF.

35
Decomposition and 4NF
  • If X -gt-gtY is a 4NF violation for relation R, we
    can decompose R using the same technique as for
    BCNF.
  • XY is one of the decomposed relations.
  • All but Y X is the other.

36
Example
  • Drivers(name, addr, phones, carsLiked)
  • FD name -gt addr
  • MVDs name -gt-gt phones
  • name -gt-gt carsLiked
  • Key is name, phones, carsLiked.
  • All dependencies violate 4NF.

37
Example, Continued
  • Decompose using name -gt addr
  • Drivers1(name, addr)
  • In 4NF only dependency is name -gt addr.
  • Drivers2(name, phones, carsLiked)
  • Not in 4NF. MVDs name -gt-gt phones and name -gt-gt
    carsLiked apply. No FDs, so all three
    attributes form the key.

38
Example Decompose Drivers2
  • Either MVD name -gt-gt phones or name -gt-gt
    carsLiked tells us to decompose to
  • Drivers3(name, phones)
  • Drivers4(name, carsLiked)
Write a Comment
User Comments (0)
About PowerShow.com