Title: Normalization, Robertss Rules and Introduction to Data Modeling
1Normalization, Robertss Rules and Introduction
to Data Modeling
- CSCI 242
- January 30, 2007
2Agenda
- Robertss Rules
- Normalization
- Robertss Rules and Normalization
3Why Are We Talking About This?
- To design a database, we choose a set of entities
that models a problem - We will store data in tables corresponding to our
entity choices - The names of the entity types become embedded in
our programs - Changing later on is expensive, so we want a
stable model of the problem
4Midterm Question
- The first question on the midterm will deal with
normal forms. It will deal with the relationship
between normal forms and Robertss Rules. - This one question will count more than any other
question on the exam. - The homework assignment on normalization and
Robertss Rules looks a lot like Question 1 on
the midterm.
5Syntax and Semantics
- Syntax deals with the structure and form of a
statement or language - Semantics deals with the meaning that is conveyed
by a statement or language
6Question
- Is normalization a syntactic or a semantic
construct? - That is, does it deal with the form of
information, or is it involved with meaning?
7Intentional vs. Extensional Data
- Extensional datathe data that is actually
present - Intentional dataall the data that is allowed to
be present - Question does normalization deal with
intentional or extensional data?
8Entity and Entity Type
- An entity is something that we record information
about in the database - An entity type is a set of similar things that we
store information about - An entity instance is one example of some entity
type. - Usually we dont say entity instance and entity
type when context makes the meaning clear we
just say entity.
9Summary of Terminology
10Fact
- A value of an attribute in a row conveys one fact
about an entity instance - An attribute is a fact stating that This entity
instance has the value ltvaluegt - Consider emp(empno,ename,job,deptno)
- Each value of ename in a row states that This
persons name is ltvaluegt. - Each row of this table can be viewed as a
collection of four facts
11Robertss Rules
- Robertss Rules are a set of plain English rules
that, if followed during database design, result
in a highly normalized database design. - We will explore the relationship of Robertss
Rules to normalization, and vice versa.
12Normalization
- A set of formal rules that are intended to be a
definition of a properly-structured database - A normal form generally deals with and removes
certain anomalous behavior from the use of a
relation that is normalized.
13Examples of Anomalies
- Insert anomalies
- If we want to enter information about a new
entity in the database we need to enter
information about some other entity first - Delete anomalies
- In order to delete information about an entity we
must delete information about another entity - Update anomalies
- In order to change the value of a single fact we
may have to change many stored values in the
database
14Robertss Rules
15Rule 1
- Each relation describes exactly one entity type.
- A relation models a distinct entity type, and
each tuple of the relation models an instance of
that entity. - The relation models an entity by storing its
attributes. The attributes that identify it are
called candidate keys the other attributes are
non-key.
16Entities and Attributes
- Employee
- Department
- Job
- Server
- IP Address
- Salary
- SSN
- Order
- Invoice
- Item Number
- Location
- Property ID Number
- Building
- City
Which of these are entity types and which are
attributes?
17Rule 2
- Each fact is represented only once in the
database. - A tuple (aka row) is a collection of facts about
an entity instance, one fact per column.
18Duplicate Representation?
19Rule 3
- Each tuple can reside in only one relation.
- A relation is a model of an entity type, not a
station on a factory assembly line. - Instead of moving a tuple from relation to
relation, add an attribute that characterizes
status.
20Rule 4
- Database design must be insensitive to
cardinality of the problem. - Its easyand very riskyto presume that the
cardinality of various entity types and subtypes
will remain the same.
21Rule 4 Examples
- Company car
- College degree
- Telephone number
- Home address
- Business address
- Email address
22Example of Robertss Rules
- EMP ( EMPNO, ENAME, DEPTNO, DNAME)
- DEPT (DEPTNO, DNAME, DLOC)
- This relation violates the following Robertss
rules - Rule 1. The EMP table describes employee as well
as department - Rule 2. In the EMP table, if we have the same
DEPTNO in multiple rows, DNAME will be
represented multiple times.
23Another Example
- EMP (ENAME, DEGREE1, DEGREE2, DEGREE3)
- This schema violates the following Robertss rule
Rule 4. The design assumes every employee has a
maximum of 3 degrees. If an employee has 4
degrees, then the database needs to be
restructured by adding DEGREE4 in the EMP table. - Rule 4 deals with an aspect of data independence.
It can be stated informally as - "Grow down, not across"
24Normalization
25Basic Concepts
- Entity Type a class of an object that we record
information about. Aka relation, table - Attribute a characteristic of an entity. Aka
column. - Entity Instance a single occurrence of an entity
type. Aka tuple, row
26Candidate Keys
- Candidate key a set of attributes Ai, Aj,Ak
that is a candidate key has two (time-invariant)
properties - Uniqueness no two tuples have the same value
for the candidate key - 2. Minimality if any Ai is discarded from the
candidate key, then the uniqueness property is
lost. It is the smallest set of attributes that
identifies a row. - How many candidate keys can a table have?
27Primary Key
- One of the candidate keys is selected to be the
primary identifier of rows. It is called the
primary key. - The selection is usually made based on the
usefulness of the attribute that is the primary
key.
28Functional Dependence
- R.X?R.Y or R.X FD R.Y
- Given a relation R, attribute Y of R is
functionally dependent on attribute X of R iff
each X-value in R has associated with it
precisely one Y-value in R (at any one time) - In other words, for each value of X in table R,
there is one and only one value of Y. A given X
value must always occur with the same Y value.
29Functional Dependence Examples
Does X?Y? Does Y?X?
30Anomalies
- Update anomalies If one copy of repeated data
is updated, inconsistency is created unless all
copies are similarly updated. - Insert anomalies It may not be possible to
store some information unless some other
information is stored as well. - Delete anomalies It may not be possible to
delete some information without losing some other
information as well.
31Full Functional Dependence
- Y is fully functionally dependent on X iff X?Y
and no subset of X determines Y. - That is, X is the smallest collection of columns
that determines Y.
32Normalization
33First Normal Form
34First Normal Form
- A relation is said to be in first normal form iff
every attribute of every tuple is atomic.
351NF Example
Question Is this relation in 1NF?
Question Does this relation show any anomalies?
36Second Normal Form
37Second Normal Form
- A relation is said to be in second normal form
iff it is in first normal form and every
attribute is fully functionally dependent on the
primary key.
382NF Example
Does this relation follow Robertss Rules?
Do you see any anomalies?
SID
City
Status
SNAME
392NF and RR
- What is the relationship between 2NF and
Robertss Rules? - If Rule 1 is met, is the relation in 2NF?
- What about Rule 2?
40Third Normal Form
41Third Normal Form
- A relation is said to be in third normal form iff
it is in second normal form and there are no
transitive dependencies.
42How Do We Convert To 3NF?
SID
City
SNAME
Status
City
433NF and RR
- If a relation is in 3NF, what about rules 1 and 2?
44Fourth Normal Form
45Multi-Valued Dependency
- R.X is said to multi-value determine R.Y if there
is a set of values for Y that must appear in any
relation where R.X appears. - For example, if a course has two textbooks, then
there will be an MVD between the course number
and the names of the books.
46Fourth Normal Form
- A relation is said to be in fourth normal form
iff it is in third normal form and it does not
have more than one multi-valued dependency.
47Example of 4NF
SPORT-INSTRUMENT
Instrument
MVD
SID
MVD
Sport
48Converting to 4NF
SPORT
INSTRUMENT
Instrument
MVD
SID
SID
Sport
MVD
49Boyce-Codd Normal Form
50Boyce-Codd Normal Form
- A relation is said to be in Boyce-Codd normal
form iff every determinant is a key. - BCNF deals with problems that can be caused by
overlapping candidate keys.
51Example of BCNF
How does this relation comply with Rule 1 and
Rule 2?
Are there any anomalies?
S
SNAME
QTY
P
52Converting to BCNF
S
SNAME
S
QTY
P
53Fifth Normal Form
54Join Dependency
- A relation R satisfies join dependency (R1, R2,
..., Rn) if and only if R is equal to the join of
R1, R2, ..., Rn where Ri are subsets of the set
of attributes of R.
555NF
- A relation R is in 5NF (or project-join normal
form, PJNF) if for all join dependencies at least
one of the following holds - (a) (R1, R2, ..., Rn) is a trivial
join-dependency (that is, one of Ri is R) - (b) Every Ri is a candidate key for R.
56Example of 5NF
- Dept Subject Student
- Comp. Sc. CP1000 John Smith
- Comp. Sc. CP2000 John Smith
- Comp. Sc. CP3000 Arun Kumar
- Mathematics MA1000 Reena Rani
- Chemistry CH2000 Raymond Chew
- Physics PH1000 Albert Garcia
- This relation states that Comp. Sc. offers
subjects CP1000, CP2000 and CP3000 which are
taken by a variety of students. No student takes
all the subjects and no subject has all students
enrolled in it. Therefore, all three attributes
are needed to represent the information.
57Is The Relation in 5NF?
- The relation cannot be decomposed into two
relations - (dept, subject), and (dept, student)
- without losing some important information.
- However, the relation can be decomposed into the
following three relations - (dept, subject), (dept, student) and (subject,
student) - and it can be shown that this decomposition is
lossless. So the relation is not in 5NF.
58Data Modeling
- When we design a relational database, we search
for a set of entity types that will model the
problem of interest - If we choose a robust data model, it will last a
long time without major changes, even though the
programs that use it may change - Now that we have some idea what a good data model
is, we will talk about how to design one.