Teknik Informatika Fakultas Teknik

About This Presentation

Title:

Teknik Informatika Fakultas Teknik

Description:

Relational Database Schema resulting from an ER-Relational Mapping ... Project Y2K for 23 hours. WORKS-ON [SSN, Proj, Hours] Informatika Unitomo, 31 Agust '03 ... – PowerPoint PPT presentation

Number of Views:1118

Avg rating:3.0/5.0

Slides: 88

Provided by: rahm4

Category:

more less

Transcript and Presenter's Notes

Title: Teknik Informatika Fakultas Teknik

1
Module 5FunctionalDependencies
andNormalization

Teknik Informatika Fakultas Teknik
Universitas Dr. Soetomo Surabaya

2
Module 5 - Motivation

Relational Database Schema resulting from an
ER-Relational Mapping exhibits good design
characteristics
How can we measure the goodness or quality of
the design ?
Use formal theories to distinguish good relation
schemas from bad
Functional dependency information
Normalization techniques

3
Module 5 - Contents

Functional Dependencies and Normalization
Informal Design Guidelines
Functional Dependency
Normalization
Properties of Decomposition

4
Informal Design Guidelines

Informal measures of relational database
schema quality and design guidelines
Semantics of the attributes
Reducing the redundant values in tuples
Reducing the null values in tuples
Disallowing spurious tuples
Apply to schemas of base relations
(files that will be physically stored)

5
Semantics of the attributes

Grouping attributes to construct a relation
implies that a particular meaning or semantics
is associated with the attributes
Employee Smith, with unique id (SSN) 123, who
lives
at 23 Queens Street, earns salary 34K and
works in
Department Payroll
EMPLOYEE SSN, Name, Address, Salary, Dept
The employee with unique id (SSN) 123 Works-on
Project Y2K for 23 hours
WORKS-ON SSN, Proj, Hours

6
Guideline 1

Design each relation so that it is easy to
explain its meaning
Do not combine attributes from multiple entity
types and relationship types into a single
relation
Employee Smith, with unique id (SSN) 123,
who lives at 23 Queens Street, earns salary
34K and
works in Department Payroll that is managed
by
Wyeth and is located in Perth
EMPLOYEE SSN, Name, Address, Salary, Dept,
Manager, Location

7
Redundant values in tuples

One design goal is to minimize the storage space
that base relations occupy
The way the grouping of attributes into relation
schemas is done, has a significant effect on
storage space
In addition, an incorrect grouping may cause
update anomalies which may result in inconsistent
data or even loss of data

8
Example with redundant values

Consider relation BigEmp,
which groups tuples from
two distinct entity types

9
Update Anomalies

Insertion Anomalies
Problems with inserting new tuples
Deletion Anomalies
Problems with deletion of tuples - may result
in loss of information
Modification Anomalies
Problems with modifying values of attributes
may result in inconsistent data

10
Insertion Anomaly 1

Insertion of new employee requires correct
department name and location details or null
values if the employee not assigned to any
department yet

11
Insertion Anomaly 2

Insertion of new department that has no employees
yet is not possible since EMP (primary key of
BigEmp) can not be null

12
Deletion Anomalies

Deletion of all employees in Department D4
results in loss of departments name and location

13
Modification Anomalies

If we change the location of the Shipping
department, many tuples must be modified to keep
the database consistent

14
Guideline 2

Design the base relation schema so that no
insertion, deletion, or modification anomalies
occur in the relations
If any do occur, ensure that all applications
that access the database update the relations in
such a way as to not compromise the integrity of
the database

15
Null Values in Tuples

Tuples without data for certain attributes are
said to contain null values
Example NULL value for dept in EMPLOYEE
relation (123,Jones, 12 Queen St, NULL, 34K)
When occurring in large numbers, null values
represent a waste of physical disk space
More importantly, null values can introduce
problems with the semantics of relations, and in
specification of various operations such as join,
aggregation

16
Semantics of Null Values

Nulls can have multiple interpretations
Attribute does not apply to this relation
Attribute value for this tuple is unknown
The value is known but not yet recorded yet
Having the same representation (NULL) for all
possibilities above compromises the different
meanings they may have

17
Guideline 3

As far as possible, avoid placing attributes in a
base relation whose values may be null
If nulls are unavoidable, make sure that they
apply in exceptional cases only and that they do
not apply to a majority of tuples in the relation

18
Disallowing spurious tuples

The bifurcation, or general decomposition, of a
large relation into a set of smaller ones can
result in numerous problems
One problem involves the situation where
information in the original relation is lost due
to the decomposition
Ironically, the problem becomes apparent when
the
smaller relations are joined back together
and the
result is larger than the original relation,
i.e. contains
spurious tuples

19
Example Decomposition
20
Join with Spurious Tuples
21
Guideline 4

Design the relation schemas so that they can be
(relationally) joined with equality conditions on
attributes that are either primary keys or
foreign keys in a way that guarantees that no
spurious tuples are generated
Do not have relations that contain matching
attributes other than foreign key - primary
key
combinations
If such relations are unavoidable, do not join
them on such attributes, because the join may
contain spurious tuples

22
Module 5 - Contents

Informal Design Guidelines
This section presented informal
guidelines for good relational design
The following sections present formal
concepts and applied theory
Functional Dependency
Normalization
Properties of Decomposition

23
Functional Dependency

Single most important concept in relational
schema design
To understand the concept, we will ...
introduce, motivate and formally define the
term functional dependency
describe basic inference rules for functional
dependencies
use these inference rules to derive others and
to compute closures

24
What is Functional Dependency

A function dependency (FD) is a constraint
denoted X ? Y between two sets of attributes X
and Y from a relational schema R.
The FD specifies a restriction on the possible
tuples that can form a relation instance r of R

25
The FD Constraint - Informally

Example A relation
DEPARTMENT
(DNO, DNAME, DLOC)
can have following FDs
FD1 DNO -gt DLOC
FD2 DNO -gt DNAME

Functional dependency X -gt Y holds if and only if
whenever two tuples agree on their X-value, they
must necessarily agree on their Y value

26
The FD Constraint - Formally

The FD constraint is that for any two tuples t1
and t2 in the relation instance r that have
t1X t2X
we must also have
t1Y t2Y

This means that the values
of the Y component of a tuple depend on, or are
determined by, the X component
The values of the X component of a tuple uniquely
(or functionally) determine the values of the Y
component

If X -gt Y holds, then Y is functionally dependent
on X X is termed the left-hand-side (LHS)
of the FD Y is termed the right-hand-side
(RHS) of the FD
27
What determines FDs

FD is a property of the meaning or semantics of
the attributes
Humans use their understanding of these semantics
to create and specify each FD, they cannot be
accurately obtained from viewing a particular
relation instance
However, if a FD does not hold in even one
instance then the FD cannot hold on the schema

28
Relation at Time T1

Can you determine if the FD STATE -gt CODE holds
true in the database from viewing the instance
above?

29
Relation at Time T2

If the existence of the Smith tuple is legal,
then it implies that the proposed FD
STATE -gt CODE is invalid !

30
Application of FDs

The main use of FDs is to further describe (if
the semantics of the application warrant) a
relation schema R by specifying constraints on
its attributes that must hold at all times
Relation instances r(R), that satisfy the FD
constraints are called legal extensions, legal
instances or legal relation states

31
Inferred FDs

Designer specifies obvious FDs, however numerous
other FDs can be inferred
Let F be a set of FDs on R
F Pnum -gt Pname, Plocation, Dnum,
Dnum -gt Dname, MgrSSN, MgrStartDate
FDs that can be inferred are
Pnum -gt MgrStartDate
Dnum -gt MgrSSN
Pnum -gt Pnum

32
Notation for Inferred FDs

The notation F X -gt Y denotes that the
FD X -gt Y is inferred from the set of
functional dependencies F
Example
F Pnum -gt MgrStartDate,
Dnum -gt MgrSSN,
Pnum -gt Pnum
where F Pnum -gt Pname, Plocation, Dnum,
Dnum -gt Dname, MgrSSN, MgrStartDate

33
Compact Notation for FDs

Pnum -gt PName, PLocation, Dnum
Dnum -gt DnaMe, MGrSSN, MgrStartDate
ESSN, DependenT_Name -gt BDate
Compact Notation
P -gt NLD
D -gt MGS
ET -gt B

34
Inference Rules

It is practically impossible to specify all
possible FDs that may hold
To infer additional FDs from a set of valid FDs
we need a system of inference rules
There are 6 inference rules IR1 - IR6
IR1-IR3 are referred to as Armstrongs
Inference Rules
The set of all dependencies that can be
inferred from a given F is called the
closure of F, denoted by F

35
IR1 Reflexive Rule

If X ? Y then X -gt Y
A set of attributes always determines itself or
any of its subsets
If ESSN, Dependent_Name ? ESSN then
ESSN, Dependent_Name -gt ESSN holds

36
IR2 Augmentation Rule

X -gt Y XZ ? YZ
Adding the same set of attributes to both the
LHS RHS of a FD results in another
valid FD
If SSN -gt Ename then
SSN, Address -gt Ename, Address
holds

37
IR3 Transitive Rule

X -gt Y, Y -gt Z X -gt Z
FDs are transitive
If SSN -gt Dno
and Dno -gt Dlocation
then SSN -gt Dlocation holds

38
Armstongs Inference Rules

The rules IR1-IR3 are sound and complete.
Sound Given a set of FDs F holding on a relation
schema R, any FD that we can infer from F by
using IR1-IR3 holds in every legal relation
instance
Complete Repeatedly applying IR1-IR3 generates
all possible FDs that can be inferred from F

39
IR4 Decomposition Rule

X -gt YZ X -gt Y
We can remove attributes from the RHS of a
dependency, and decompose the FD
If SSN -gt Ename, Dno
then SSN -gt Ename and SSN -gt Dno
both hold

40
Proof of Decomposition Rule

We can use IR1-IR3 to prove IR4
1 X -gt YZ Given
2 YZ -gt Y Using IR1 plus YZ ? Y
3 X -gt Y Using IR3 on results of 1 2

41
IR5 Union Rule

X -gt Y, X -gt Z X -gt YZ
We can union attributes from the RHS of a
dependency, and combine a set of FDs
into a single FD (reverse of IR4)
If SSN -gt Ename and SSN -gt Dno
then SSN -gt Ename, Dno
holds

42
Proof of Union Rule

We can use IR1-IR3 to prove IR5
1 X -gt Y Given
2 X -gt Z Given
3 XX -gt XY Using IR2 on 1 (adding X)
4 X -gt XY Since XXX (not using IRI-R3)
5 XY -gt YZ Using IR2 on 2 (adding Y)
6 X -gt YZ Using IR3 on 4 5

43
IR6 Pseudotransitive Rule

X -gt Y, WY -gt Z WX -gt Z
Represents a variant of IR3
If SSN -gt MgrSSN and
MgrSSN, Dependent_Name -gt
Relationship
then SSN, Dependent_Name -gt Relationship

44
Proof of Pseudotransitive Rule

We can use IR1-IR3 to prove IR6
1 X -gt Y Given
2 WY -gt Z Given
3 WX -gt WYUsing IR2 on 1 (adding W)
4 WX -gt Z Using IR3 on 3 2

45
Use of Inference Rules

Step 1 Database designers first specify the set
of FDs F
that can easily be determined from
semantics
Step 2 Armstrongs rules are used to infer
additional FDs
that also hold
Determine set of attributes X that appear on a
FDs LHS
Use Armstrongs rules to determine attributes
determined by X (called X)
X is the closure of X under the set F

46
Computing the Closure (X)

Algorithm to compute the closure of X under F
X X
repeat
old X X
for each FD Y -gt Z in F do
if Y ? X then X X? Z
until (old X X )

47
Example Closure Computation

X Pnumber
F Pnumber -gt Pname, Plocation, Dnum
Dnum -gt Dname, MgrSSN, MgrStartDate
Step 1 X Pnumber
Step 2 X Pnumber, Pname, Plocation, Dnum
Step 3 X Pnumber, Pname, Plocation, Dnum,
Dname, MgrSSN,
MgrStartDate

48
Module 5 - Contents

Informal Design Guidelines
Functional Dependency
Functional dependencies are constraints forced
on all
database instances
The designer specifies some intuitive FDs but
others
can be inferred using rules
The closure of a set of FDs F is the set of
all FDs F that
can be inferred from F, where as the
closure of a attribute
X under F is the set of attributes X
that are functionally
determined by X
Normalization
Properties of Decomposition

49
Normalization

Normalization is a process that aims at achieving
better designed relational database schemas
through the use of semantic information given by
Functional Dependencies
Primary Keys

50
The Normalization Process

The normalization process takes a relational
schema through a series of tests to certify
whether it satisfies certain conditions
The schemas that satisfy certain condition are
said to be in a given Normal Form.
Unsatisfactory schemas are decomposed by
breaking up their attributes into smaller
relations that possess desirable properties
(e.g., no anomalies)

51
Review of Key Concepts

Superkey - A set of attributes such that no two
tuples have the same values for these attributes
Key - A Minimal Superkey, called a Candidate
Key if more than one
Primary key - A selected candidate key
Secondary key - Remaining candidate keys
Prime Attribute - An attribute that is a member
of
any candidate key
Non-prime attribute An attribute that is not a
member of any candidate key

52
First Normal Form (1NF)

A relation schema is in 1NF if domains of
attributes
include only atomic (simple, indivisible) values
and the value of an attribute is a single value
from the domain of that attribute
1NF disallows
having a set of values, a tuple of values, or
a
combination of both as an attribute value for
a
single tuple
relations within relations and relations as
attributes of tuples
Considered part of the formal definition of
relation

53
Non-1NF Relation
Non atomic values
54
Relations in 1NF
Problem with this design Redundancy (Jones, 123)
55
Relations in 1NF

Problem with this design
Too many Null values
What if an order has gtmax (4) items ?

56
Second Normal Form (2NF)
A relation schema is in 2NF if it is in 1NF, and
every non-prime attribute is fully functionally
dependent on the primary key

A FD X -gt Y is termed partial
if some attribute can be
removed from X and the
dependency still holds
Formally, for some attribute
A? X, (X-A) -gt Y is true

A FD X -gt Y is termed full if
removal of any attribute
from X means that the FD
no longer holds
Formally, for any attribute
A ? X, (X-A) -gt Y is false

57
Non 2NF Relation

Relation DEP
DEP SSN, Dep_Name, Dep_Bdate,
Emp_Bdate
Functional Dependencies in DEP
FD1 SSN, Dep_Name -gt Dep_Bdate
FD2 SSN -gt Emp_Bdate

58
Relation in 2NF

2NF relations can be produced by creating a
decomposition where non-prime attributes are
associated only with the part of the primary key
on which they are fully dependent
Relations after decomposition of DEP
EMP SSN, Emp_Bdate
NEW-DEP SSN,Dep_Name, Dep_Bdate

59
Third Normal Form (3NF)

A relation schema is in 3NF if it is in 2NF,
and no non-prime attribute is transitively
dependent on the primary key
A FD X -gt Y is termed transitive if there is a
set of attributes Z that is not a subset of any
key and both X -gt Z and Z -gt Y hold

60
Non 3NF Relation

Relation EMP
EMPName, State, Code

Functional Dependencies in PERSON FD1 NAME -gt
STATE, CODE FD2 CODE -gt STATE Transitive
Dependency !
61
Relation in 3NF

3NF relations can be produced from 2NF relations
by creating a decomposition where no transitive
dependencies exist

Relations after decomposition of EMP EMP-CODE
Name, Code STATE-CODE State, Code
62
Generalization of 2NF and 3NF

2NF and 3NF respectively require that a schema
should have neither partial nor transitive
dependencies on the primary key to avoid update
anomalies
However the dependencies could exist for
other candidate keys !
General definitions for 2NF and 3NF ...

63
General Definition of 2NF

A relation schema R is in 2NF if it is in 1NF,
and every non-prime attribute is fully
functionally dependent on every key of R
Property PID, Shire, Lot, Size, Tax_Rate,
Price
FD1 PID -gt Shire, Lot, Size, Tax_Rate, Price
FD2 Shire, Lot -gt PID, Size, Tax_Rate, Price
FD3 Shire -gt Tax_Rate
FD4 Size -gt Price
What are the candidate keys of Property ?

64
Generalized 2NF Example

Property PID, Shire, Lot, Size, Tax_Rate,
Price
Candidate keys are PID Shire, Lot
Is there a partial dependency on any key ?
FD1 PID -gt Shire, Lot, Size, Tax_Rate, Price
FD2 Shire, Lot -gt PID, Size, Tax_Rate , Price
FD3 Shire -gt Tax_Rate
If so, the relation is not in 2NF
2NF Relations after decomposition
Property PID, Shire, Lot, Size, Price
ShireRates Shire, Tax_Rate

65
General Definition of 3NF

A relation schema R is in3NF if for every FD
X-gtA that holds on R, either
a) X is a superkey of R
OR
b) A is a prime attribute of R
This definition can be applied directly to test
for 3NF without having to check for 2NF first

66
Generalised 3NF Example

Property PID, Shire, Lot, Size, Price
Primary key PID, Candidate key Shire, Lot
FD1 PID -gt Shire, Lot, Size, Price
FD2 Shire, Lot -gt PID, Size, Price
FD4 Size -gt Price
Does any FD fail the 3NF test?
If so, the relation is not in 3NF
3NF Relations after decomposition
Property PID, Shire, Lot, Size
Cost Size, Price

67
Alternate Definition of 3NF

A relation schema R is in 3NF if every nonprime
attribute of R is
a) Fully functionally dependent on every key of
R
AND
b) Non-transitively dependent on every key of R

68
Boyce-Codd NF (BCNF)

A relation schema R is in BCNF if for every
FD X-gtA that holds on R, X is a superkey of R
A stricter normal form than 3NF
In practice, most 3NF relational schemas are
also in BCNF
BCNF is better than 3NF unless other
properties (to be discussed) are lost

69
BCNF Example

Property PID, Shire, Lot, Size
FD1 PID -gt Shire, Lot, Size
FD2 Shire, Lot -gt PID, Size
FD5 Size -gt Shire
Candidate keys are PID Shire, Lot
Size is not a super key
But, Shire is a prime attribute
Property is in 3NF but not in BCNF (due to FD5)

70
BCNF Example (cont)

BCNF Relations after decomposition
PropertyPID, Size, Lot
ShireLotSize Size, Shire
where
FD1 PID -gt Shire, Lot, Size
FD2 Shire, Lot -gt PID, Size
FD5 Size -gt Shire
Property and ShireLotSize are in BCNF

71
Limitations of Normalization

Isolation
Normal forms, when considered in isolation for
individual relations, do not guarantee a good
design
Normalization must also consider additional
properties
that the relational schemas, taken
together, must
possess (loss less join and dependency
preservation)
Performance
Ideally, relation schemas should be at least
in 3NF.
However, for performance reasons, some
designers
leave relations in lower forms

72
Module 5 - Contents

Informal Design Guidelines
Functional Dependency
Normalization
Properties of Decomposition
Dependency Preservation Property
Loss less (non additive) Join Property

73
Properties of Decompositions

Normalization must also consider
additional properties that the relational
schemas, taken together, must possess
Dependency Preservation Property
Loss less (non additive) Join Property

74
Dependency Preservation

Informally, each FD in F must either appear
directly on one of the relation schema in the
decomposition or can be inferred from the
dependencies that appear in some
relation schema
FDs are user-specified constraints that
capture
data semantics. The DBMS should enforce
them
efficiently
The relational database schema must exhibit
the dependency preservation property
for
this to occur

75
Formal Definition

A decomposition D R1,R2,,Rm of R is
dependency preserving with respect to a set of
functional dependencies F on R if the union of
projections of F on each Ri in D is equivalent to
F, that is
(p R1(F) ? p R2(F) ?p Rm(F) ) F
where the projection of F on Ri, denoted by
pRi(F) is the set of dependencies X -gt Y in F,
such that the attributes X ? Y are contained in
Ri.

76
Decomposition with Lost FD

Property PID, Shire, Lot, Size decomposed to
Property PID, Size, Lot
ShireLotSize Size, Shire
FD1 PID -gt Shire, Lot, Size
FD2 Shire, Lot -gt PID, Size
FD3 Size -gt Shire
Property FP PID -gt Lot, Size
ShireLotSize FS Size -gt Shire
(FP ? FS) FD1,FD3
What about FD2?

77
Dep. Preserving Decomposition

EMP EMP, STATE, CODE decomposed to
Emp-Code(EMP,CODE)
State-Code(STATE,CODE)
FD1 EMP -gt STATE
FD2 EMP -gt CODE
FD3 CODE -gt STATE
Emp-Code FE EMP -gt CODE
State-Code FS CODE -gt STATE
(FE?FS)FD2,FD3, (FE?FS)FD1,FD2,FD3

78
Decomposition with Lost FD

EMP EMP, STATE, CODE decomposed to
Emp-Code(EMP,STATE)
State-Code(STATE,CODE)
FD1 EMP -gt STATE
FD2 EMP -gt CODE
FD3 CODE -gt STATE
Emp-Code FE EMP -gt STATE
State-Code FS CODE -gt STATE
(FE?FS)FD2,FD3, (FE?FS)FD1,FD3 F2?

79
Loss less (non-additive)Join

Loss less refers to the loss of information,
not to loss of tuples
A decomposition which is not loss less,
may result in spurious tuples after a join
is applied to the decomposed relations

80
Formal Definition

A decomposition D R1,R2,,Rm of R is
lossless with respect to a set of functional
dependencies F on R if for every state r of
R that satisfies F, the following holds
p R1(r) p R2(r) p Rm(r) r
where p R1(r) is the projection of r on Ri
and is a join

81
Decomposition with Loss

EMP EMP, STATE, CODE decomposed to
Emp-Code(EMP,STATE)
State-Code(STATE,CODE)
When joined together, it will result in spurious
tuples
Information regarding the CODE for the EMP is lost

82
Join with Spurious Tuples
83
Relational Database Design

Design by Analysis
(Top Down Strategy)
1. Start from attribute
groupings as obtained
from conceptual modeling
and mapping activities
2. Analyze and decompose
until desirable properties
are met
(Covered in Modules 4-5)
Practical and widely used
approach

Design by Synthesis
(Bottom Up Strategy)
1. Start from individual attributes and FDs
2. Synthesize into a 3NF (or higher) relations
(Not Covered)
All possible FDs must be identified in advance.
Even if a few FDs are missing, the schema can be
suboptimal
Results are dependent on the order in which the
synthesis algorithm receives the
FDs(nondeterministic)

84
From UoD to Relational Schema
85
Module 5 - Review
Functional Dependency concepts and Normalization
techniques are formal theories to measure
thegoodness or quality of a relational schema
design resulting from ER-Relational mapping

Functional Dependencies
represent data semantics.
Some FDs can be specified by
database designers, others
can be inferred. These FDs
can be used to enforce
constraints on data that should
hold on all instances of a given
schema

Normalization techniques use FD and primary key
information to restructure (decompose) relation
schemas, such that the resulting relations
possess desirable properties and are said to be
in a given normal form. A higher NF is generally
better, however other factors such as performance
and decomposition need to be considered

86
Recommended Readings