Title: Normalization
1Normalization
- Data inconsistency, incorrectness and
incompleteness is (partially) caused by
redundancy. - Normalization is the process of eliminating
redundancy as much as possible, without losing
data.
2Tools of Normalization
- Normalization is achieved by
- lossless decomposition
- functional dependency
3Functional dependency
- A functional dependency is a 1-n relation of a
set of attributes within a relation to another
set of attributes in the same relation. - Example anr ? last_name
4Functional dependency
Airport is functionally dependent on IATA. For
each airport, exactly one IATA exists. Notation
IATA ? Airport
5Case Description
- Airline Reservation System (assignment 1999)
- Flights are operated by Carriers
- flight number, origin, destination, ETD, ETA
- destination is coded in IATA codes
- Passengers make a Reservation
- last name, initials, passport, nationality
- booked, standby, checked in, boarded
6UNF
7UNF (continued)
8First Normal Form
- A relation is in first normal form if, and only
if all domains contain atomic values only
- Problems with current tables
- Carrier contains both ID and full name
- Departure and Arrival contains both date and time
- Solution
- Split up columns
-
91 NF
- Anomalies
- Flights without reservations do not exist
- Carriers without flights do not exist
10Second Normal Form
- A relation is in second normal form if, and only
if the relation is in first normal form and if
all non key-attributes are fully functionally
dependent on the primary key.
- Full functional dependency A ? B is a functional
dependency on B of A, so that B is not dependant
on a part of A. - Example
- anr last_name ? address,
111NF Dependency Diagram
122NF Violations
- FlightNumber is a determinant for Destination,
Origin, CarrierID and CarrierName - Passport is a determinant for Initials, LastName
and Nationality. - FlightNumber and DepartureDate together determine
DepartureTime, ArrivalDate and ArrivalTime. - FlightNumber, Passport and DepartureDate together
determine Status.
13Splitting up the tables...
Anomalies Carriers without flights cannot be
stored in the database
14Third Normal Form
- A relation is in third normal form if, and only
if the relation is in second normal form and if
all non key-attributes are non-transitively
dependent on the primary key.
A transitive dependency is a situation where two
functional dependencies are related in the sense
that A ? B and B ? C In other words C is
transitively dependent on A
152NF Dependency Diagram
3NF violation FlightNo ? CarrierId CarrierId ?
CarrierName
16Splitting up the tables...
17Boyce-Codd Normal Form
Relations in 3NF can still contain redundancy
when 1. A relation has two or more candidate
keys, and 2. Candidate keys are composite keys,
and 3. Candidate keys overlap. Boyce-Codd Normal
form A relation is in BCNF if all determinants
are candidate keys. Usually, a relation that is
in 3NF is in BCNF too.
18Conclusion Normalization
- 1NF a relation is in 1NF iff all cells contain
atomic values only. - 2NF a relation is in 2NF iff it is in 1NF and
all non-key attributes are fully functionally
dependent on the primary key - 3NF a relation is in 3NF iff it is in 2NF and
all non-key attributes are non-transitively
dependent on the primary key - BCNF a relation is in BCNF iff all determinants
are candidate keys. - Tools of normalization are functional dependency
and - lossless decomposition.
19Normalisation assignment 2000
- Case Study Digital Library
- Database components
- Document Records
- Thesaurus
20(No Transcript)
21Thesaurus
- A thesaurus consists of Terms and Relationships
- Example
- information retrieval - narrower term - text
retrieval - TERM Relationship TERM
- A document is classified into a (set of)
thesaurus entries.
22Linking records with thesauri
- Example
- Conceptualising information spaces in federated
digital libraries (Papazoglou M, Hoppenbrouwers
J) 1999 SIGMOD Record - has keywords meta data semantics information
retrieval query optimisation federated data
base electronic library
23UNF digital library
Bibliographic Records Table
Thesaurus Table
NOT A GOOD IDEA!