Normalisation - PowerPoint PPT Presentation

1 / 24
About This Presentation
Title:

Normalisation

Description:

now different rates for squash courts - suspect business rule is same rate for ... orginal table: select price from facility where facilityno=6. normalised tables: ... – PowerPoint PPT presentation

Number of Views:43
Avg rating:3.0/5.0
Slides: 25
Provided by: wall163
Category:

less

Transcript and Presenter's Notes

Title: Normalisation


1
Normalisation
  • Normalisation is a process of simplifying
    relations to remove update anomalies
  • Informally, this reduces data redundancy in the
    database

2
Facilities Table
3
Potential anomalies
  • UPDATE the price per session of facility 1
  • now different rates for squash courts - suspect
    business rule is same rate for all facilities for
    a sport
  • ADD a new tennis court
  • must also ensure the same rate is used as for
    other tennis courts
  • DELETE facility 11
  • lose the rate per session for skittles

4
Normalisation
  • Two tables
  • FacilityType (ftypeno, name, price)
  • Facility (facilityno, ftypeno, facilityunit)

5
Two better tables
Facility
FacilityType
6
Robustness v Speed
  • No update anomalies - check
  • but may be SLOWER
  • orginal table
  • select price from facility where facilityno6
  • normalised tables
  • select price from facility inner join
    facilitytype
  • on facility.ftypenofacilityType.ftypeno
  • where faciltyno 6

7
Normalisation
  • Informally we feel that the second solution is
    better than the first.
  • Normalisation formalises this design process.

8
Normalisation Process
  • start for existing factbase. Repeated data may
    suggest redundancy
  • analyse the network of functional dependencies
    - depends on the semantics of the UoD
  • decompose tables to form simpler tables using the
    guidelines of 2nd and 3rd Normal Forms

9
1st Normal form - 1NF
  • the data must be in the form of valid relations -
    i.e. with uniquely named columns, each column
    containing only atomic (indivisible) values
  • each row unique - identify the primary key

10
Functional Dependency ??
  • A Function in mathematics is a relation between
    two domains A,B such that
  • for every a in A there is exactly one
    corresponding b in B
  • e.g. A and B both real numbers
  • square(x) is a function
  • but
  • sqrt(x) is not a function
  • sqrt(4) (-2,2) since -2-2 4 as well as 224
  • this is an example of a non-functional dependency

11
Project - Employee data
  • RAW Date in 1NF

12
In the base data table
  • Project number to project name is a function
  • P -gt PNAME
  • P DETERMINES PNAME
  • PNAME is functionally dependent on P
  • project number to employee number is not
  • P ? E
  • I.e. P and E are independent
  • what other dependencies are there?

13
All functional dependencies
  • P -gt PNAME
  • E -gt ENAME
  • E -gt RCAT
  • RCAT -gt RATE
  • (P,E) -gt START

14
Fully dependent ??
  • Consider mathematical functions from pairs of
    numbers to a number
  • (x,y) -gt xy
  • changing either x or y will change the result
  • but
  • (x,y) -gt 2x
  • result is independent of y
  • so result not FULLY dependent on x and y

15
2nd Normal Form
  • Non-key fields are fully dependent on the primary
    key
  • primary key is (P,E)
  • START depends on (P,E)
  • but
  • PNAME depends only on P
  • ENAME, RCAT and RATE depends only on E

16
So decompose table to three tables
  • Project(P,PNAME)
  • Employee(E,RCAT,RATE)
  • ProjectEmployee
  • (P,E,START)

17
3rd Normal Form 3NF
  • A relation R is in 3NF if and only if
  • the non-key fields are fully dependent on the
    primary key i.e is 2NF and
  • the non-key fields are mutually independent
  • e.g R(A,B,C,D)
  • then
  • C fully dependent on A,B
  • D fully dependent on A,B
  • C and D independent

18
Checking for 3NF
  • Project
  • PNAME fully dependent on P
  • ProjectEmployee
  • START fully dependent on (P,E)
  • Employee
  • Employee(E,ENAME,RCAT,RATE)
  • ENAME,RCAT,RATE fully dependent on E
  • RCAT -gt RATE so RATE and RCAT not independent

19
Decompose Employee
  • Employee(E,ENAME,RCAT)
  • RateCategory(RCAT,RATE)
  • now in 3NF

20
Mnemonic
  • the key, the whole key and nothing but the key
  • 2NF
  • the key non-key fields depend on the key
  • the whole key non-key fields depend fully on
    the key
  • 3NF
  • nothing but the key non-key fields are not
    dependent on each other

21
Further improvements
  • Boyce/Codd 3NF - stronger than 3NF
  • 4NF and 5NF - minor improvements only
  • but sometimes we have to reverse this process -
    aggregate tables for efficiency
  • DE-NORMALISATION

22
Dependency network
  • Could we design the tables directly from the
    network of dependencies?
  • P -gt PNAME
  • E -gt ENAME
  • E -gt RCAT
  • RCAT -gt RATE
  • (P,E) -gt START

START
PNAME
P
ENAME
E
RCAT
RATE
23
Trade-off
  • This is an example of a core problem in database
    (indeed every) design problem
  • You cant be best at everything
  • You have to decide what is more important
  • avoiding update anomalies and data duplication,
  • or speed
  • depends on the usage
  • volume of updates versus volume of queries

24
Next week
  • SQL, ER, Normalisation Revision Lecture
  • Tutorial -ER and Normalisation
  • How can we explain the changes we made to the
    Facilities table as Normalisation?
Write a Comment
User Comments (0)
About PowerShow.com