Normalization, Robertss Rules and Introduction to Data Modeling - PowerPoint PPT Presentation

1 / 69
About This Presentation
Title:

Normalization, Robertss Rules and Introduction to Data Modeling

Description:

... xml.rels ppt/s/_rels/20.xml.rels ppt/s/_rels/19.xml. ... app.xml docProps/core.xml ppt/tags/tag3.xml ppt/tags/tag1.xml ppt/tags/tag4.xml ... – PowerPoint PPT presentation

Number of Views:63
Avg rating:3.0/5.0
Slides: 70
Provided by: csci5
Category:

less

Transcript and Presenter's Notes

Title: Normalization, Robertss Rules and Introduction to Data Modeling


1
Normalization, Robertss Rules and Introduction
to Data Modeling
  • CSCI 242
  • February 4, 2009

2
Agenda
  • Robertss Rules
  • Normalization
  • Robertss Rules and Normalization

3
Why Are We Talking About This?
  • To design a database, we choose a set of entities
    that models a problem
  • We will store data in tables corresponding to our
    entity choices
  • The names of the entity types become embedded in
    our programs
  • Changing later on is expensive, so we want a
    stable model of the problem

4
Midterm Question
  • The first question on the midterm will deal with
    normal forms. It will deal with the relationship
    between normal forms and Robertss Rules.
  • This one question will count more than any other
    question on the exam.
  • The homework assignment on normalization and
    Robertss Rules looks a lot like Question 1 on
    the midterm.

5
Syntax and Semantics
  • Syntax deals with the structure and form of a
    statement or language
  • Semantics deals with the meaning that is conveyed
    by a statement or language

6
Question
  • Is normalization a syntactic or a semantic
    construct?
  • That is, does it deal with the form of
    information, or is it involved with meaning?

7
Intentional vs. Extensional Data
  • Extensional datathe data that is actually
    present
  • Intentional dataall the data that is allowed to
    be present
  • Question does normalization deal with
    intentional or extensional data?

8
Entity and Entity Type
  • An entity is something that we record information
    about in the database
  • An entity type is a set of similar things that we
    store information about
  • An entity instance is one example of some entity
    type.
  • Usually we dont say entity instance and entity
    type when context makes the meaning clear we
    just say entity.

9
Summary of Terminology
10
Fact
  • A value of an attribute in a row conveys one fact
    about an entity instance
  • An attribute is a fact stating that This entity
    instance has the value ltvaluegt
  • Consider emp(empno,ename,job,deptno)
  • Each value of ename in a row states that This
    persons name is ltvaluegt.
  • Each row of this table can be viewed as a
    collection of four facts

11
Robertss Rules
  • Robertss Rules are a set of plain English rules
    that, if followed during database design, result
    in a highly normalized database design.
  • We will explore the relationship of Robertss
    Rules to normalization, and vice versa.

12
Normalization
  • A set of formal rules that are intended to be a
    definition of a properly-structured database
  • A normal form generally deals with and removes
    certain anomalous behavior from the use of a
    relation that is normalized.

13
Examples of Anomalies
  • Insert anomalies
  • If we want to enter information about a new
    entity in the database we need to enter
    information about some other entity first
  • Delete anomalies
  • In order to delete information about an entity we
    must delete information about another entity
  • Update anomalies
  • In order to change the value of a single fact we
    may have to change many stored values in the
    database

14
Robertss Rules
15
Rule 1
  • Each relation describes exactly one entity type.
  • A relation models a distinct entity type, and
    each tuple of the relation models an instance of
    that entity.
  • The relation models an entity by storing its
    attributes. The attributes that identify it are
    called candidate keys the other attributes are
    non-key.

16
Example of Rule 1
  • These are OK
  • Emp(empno,ename,job,sal,deptno)
  • Dept(deptno,dname,loc)
  • Not OK
  • Emp(empno,ename,job,sal,deptno,loc)
  • Dept(deptno,dname,loc,loc_sales_tax)

17
Rule 2
  • Each fact is represented only once in the
    database.
  • A tuple (aka row) is a collection of facts about
    an entity instance, one fact per column. A row
    with 5 columns has five facts about an instance
    in each row.

18
Facts
  • Emp(empno,ename,job,sal,deptno)
  • Each row consists of five facts about an employee
  • Each of these facts must be represented just once
    in the database

19
Duplicate Representation?
20
Rule 3
  • Each tuple can reside in only one relation.
  • A relation is a model of an entity type, not a
    station on a factory assembly line.
  • Instead of moving a tuple from relation to
    relation, add an attribute that characterizes
    status.

21
Rule 3 Example
  • Modeling cars on an assembly line that has three
    stages
  • Chassisbuild(VIN,date_built,date_ord,dealer,make,m
    odel)
  • IntFinish(VIN,date_built,date_ord,dealer,make,mode
    l)
  • FinalAssmbl(VIN,date_built,date_ord,dealer,make,mo
    del)
  • Instead should be one table
  • Vehicle(VIN,date_built,date_ord,dealer,make,model,
    prod_stage)
  • Note the use of one additional attribute to
    indicate the stage of production

22
Rule 4
  • Database design must be insensitive to
    cardinality of the problem, when the cardinality
    is greater than one.
  • Its easyand very riskyto presume that the
    cardinality of various entity types and subtypes
    will remain the same.

23
Rule 4 Examples
  • Company car
  • College degree
  • Telephone number
  • Home address
  • Business address
  • Email address
  • How many of these does your entity have? How
    many may they have some day?

24
Rule 4 Restated
  • Grow down, not across
  • That is, when a value can repeat, make it into
    its own entity type, with a separate row for each
    repetition
  • So education becomes its table, one row for
    each value

25
Example of Robertss Rules
  • EMP ( EMPNO, ENAME, DEPTNO, DNAME)
  • DEPT (DEPTNO, DNAME, DLOC)
  • This relation violates the following Robertss
    rules
  • Rule 1. The EMP table describes employee as well
    as department
  • Rule 2. In the EMP table, if we have the same
    DEPTNO in multiple rows, DNAME will be
    represented multiple times.

26
Another Example
  • EMP (ENAME, DEGREE1, DEGREE2, DEGREE3)
  • This schema violates the following Robertss rule
    Rule 4. The design assumes every employee has a
    maximum of 3 degrees. If an employee has 4
    degrees, then the database needs to be
    restructured by adding DEGREE4 in the EMP table.
  • Rule 4 deals with an aspect of data independence.
    It can be stated informally as
  • "Grow down, not across"

27
Normalization
28
Basic Concepts
  • Entity Type a class of an object that we record
    information about. Aka relation, table
  • Attribute a characteristic of an entity. Aka
    column.
  • Entity Instance a single occurrence of an entity
    type. Aka tuple, row

29
Candidate Keys
  •  Candidate key a set of attributes Ai, Aj,Ak
    that is a candidate key has two (time-invariant)
    properties
  • Uniqueness no two tuples have the same value
    for the candidate key
  • 2. Minimality if any Ai is discarded from the
    candidate key, then the uniqueness property is
    lost. It is the smallest set of attributes that
    identifies a row.
  • How many candidate keys can a table have?

30
Primary Key
  • One of the candidate keys is selected to be the
    primary identifier of rows. It is called the
    primary key.
  • The selection is usually made based on the
    usefulness of the attribute that is the primary
    key.

31
Functional Dependence
  • R.X?R.Y    or    R.X   FD    R.Y
  • Given a relation R, attribute Y of R is
    functionally dependent on attribute X of R iff
    each X-value in R has associated with it
    precisely one Y-value in R (at any one time)
  • In other words, for each value of X in table R,
    there is one and only one value of Y. A given X
    value must always occur with the same Y value.

32
Functional Dependence Examples
Does X?Y? Does Y?X?
33
Anomalies
  • Update anomalies If one copy of repeated data
    is updated, inconsistency is created unless all
    copies are similarly updated.
  • Insert anomalies It may not be possible to
    store some information unless some other
    information is stored as well.
  • Delete anomalies It may not be possible to
    delete some information without losing some other
    information as well.

34
Full Functional Dependence
  • Y is fully functionally dependent on X iff X?Y
    and no subset of X determines Y.
  • That is, X is the smallest collection of columns
    that determines Y.

35
Normalization
36
First Normal Form
37
First Normal Form
  • A relation is said to be in first normal form iff
    every attribute of every tuple is atomic.

38
1NF Example
Question Is this relation in 1NF?
Question Does this relation show any anomalies?
39
Second Normal Form
40
Second Normal Form
  • A relation is said to be in second normal form
    iff it is in first normal form and every
    attribute is fully functionally dependent on the
    primary key.

41
2NF Example
Does this relation follow Robertss Rules?
Do you see any anomalies?
SID
City
Status
SNAME
42
2NF and RR
  • What is the relationship between 2NF and
    Robertss Rules?
  • If Rule 1 is met, is the relation in 2NF?
  • What about Rule 2?

43
Third Normal Form
44
Third Normal Form
  • A relation is said to be in third normal form iff
    it is in second normal form and there are no
    transitive dependencies.

45
How Do We Convert To 3NF?
SID
City
SNAME
Status
City
46
3NF and RR
  • If a relation is in 3NF, what about rules 1 and 2?

47
Fourth Normal Form
48
Multi-Valued Dependency
  • R.X is said to multi-value determine R.Y if there
    is a set of values for Y that must appear in any
    relation where R.X appears.
  • For example, if a course has two textbooks, then
    there will be an MVD between the course number
    and the names of the books.

49
Fourth Normal Form
  • A relation is said to be in fourth normal form
    iff it is in third normal form and it does not
    have more than one multi-valued dependency.

50
Example of 4NF
  • Is this relation in 4NF?

SPORT-INSTRUMENT
Instrument
MVD
SID
MVD
Sport
51
Converting to 4NF
SPORT
INSTRUMENT
Instrument
MVD
SID
SID
Sport
MVD
52
Boyce-Codd Normal Form
53
Boyce-Codd Normal Form
  • A relation is said to be in Boyce-Codd normal
    form iff every determinant is a key.
  • BCNF deals with problems that can be caused by
    overlapping candidate keys.

54
Example of BCNF
How does this relation comply with Rule 1 and
Rule 2?
Are there any anomalies?
S
SNAME
QTY
P
55
Converting to BCNF
S
SNAME
S
QTY
P
56
Fifth Normal Form
57
Join Dependency
  • A relation R satisfies join dependency (R1, R2,
    ..., Rn) if and only if R is equal to the join of
    R1, R2, ..., Rn where Ri are subsets of the set
    of attributes of R.

58
5NF
  • A relation R is in 5NF (or project-join normal
    form, PJNF) if for all join dependencies at least
    one of the following holds
  • (a) (R1, R2, ..., Rn) is a trivial
    join-dependency (that is, one of Ri is R)
  • (b) Every Ri is a candidate key for R.

59
Example of 5NF
  • Dept Subject Student
  • Comp. Sc. CP1000 John Smith
  • Comp. Sc. CP2000 John Smith
  • Comp. Sc. CP3000 Arun Kumar
  • Mathematics MA1000 Reena Rani
  • Chemistry CH2000 Raymond Chew
  • Physics PH1000 Albert Garcia
  • This relation states that Comp. Sc. offers
    subjects CP1000, CP2000 and CP3000 which are
    taken by a variety of students. No student takes
    all the subjects and no subject has all students
    enrolled in it. Therefore, all three attributes
    are needed to represent the information.

60
Is The Relation in 5NF?
  • The relation cannot be decomposed into two
    relations
  • (dept, subject), and (dept, student)
  • without losing some important information.
  • However, the relation can be decomposed into the
    following three relations
  • (dept, subject), (dept, student) and (subject,
    student)
  • and it can be shown that this decomposition is
    lossless. So the relation is not in 5NF.

61
Data Modeling
  • When we design a relational database, we search
    for a set of entity types that will model the
    problem of interest
  • If we choose a robust data model, it will last a
    long time without major changes, even though the
    programs that use it may change
  • Now that we have some idea what a good data model
    is, we will talk about how to design one.

62
Natural vs. Generated Keys
63
Definitions
  • Natural keya key that occurs in the data, that
    serves as a unique identifier for rows
  • Generated keya key that is generated in order to
    make rows unique

64
Generated Keys
  • In some organizations, programmers have the idea
    that generated keys should be included in every
    relationthis is a misunderstanding of the
    relational approach.
  • Tuples are naturally unique because each one
    corresponds to a distinct instance of an entity
    type in the real world. You dont have to add
    anything to make them unique.
  • If your database design is correct, you have
    unique rows even if you havent added a key to
    each row.

65
When to Use Generated Keys
  • Sometimes you have a compound key that is very
    long
  • Lots of queries will be complicated and tables
    with foreign keys will copy most of another table
  • In this case, a generated key can be used
  • As a guideline, the only time to use a generated
    key is when the natural key has at least three
    columns

66
Class Project
67
Potential Class Project
  • A social network for arts and cultural
    organizations
  • It will operate across the Atlanticand be called
    Across Atlantic
  • The site will use cultural exchangesarts and
    musicto break down barriers between cultures
  • Cultural organizations can register, and so can
    their members
  • Language translation will be provided

68
The Project
  • The project has some interesting database design
    problems
  • We can allow people to enroll to a limited degree
    without review
  • Manual review will be required for full
    membership
  • We can use translucent database techniques to
    protect private information
  • We can have blogs translated into several
    languages automatically

69
Thank you!
Write a Comment
User Comments (0)
About PowerShow.com