Functional Dependency and 2NF

About This Presentation

Title:

Functional Dependency and 2NF

Description:

Primarily a tool to validate and improve a logical design so that it ... Price: degraded query, display, reporting. 11/27/09. 3. Functional Dependency and Keys ... – PowerPoint PPT presentation

Number of Views:83

Avg rating:3.0/5.0

Slides: 68

Provided by: kevi59

Category:

more less

Transcript and Presenter's Notes

Title: Functional Dependency and 2NF

1
Functional Dependency and 2NF
CS157B
Lecture 8

Prof. Sin-Min Lee
Department of Computer Science
San Jose State University

2
Data Normalization

Primarily a tool to validate and improve a
logical design so that it satisfies certain
constraints that avoid unnecessary duplication of
data.
The process of decomposing relations with
anomalies to produce smaller, well-structured
relations.
Primary Objective Reduce Redundancy,Reduce
nulls,
Improve modify activities
insert,
update,
delete,
but not read
Price degraded query, display, reporting

3
Functional Dependency and Keys

Functional Dependency The value of one attribute
(the determinant) determines the value of another
attribute.
Candidate Key Each non-key field is functionally
dependent on every candidate key.

4
Functional dependency

a constraint between two attributes (columns) or
two sets of columns
A ? B if for every valid instance of A, that
value of A uniquely determines the value of B
or A ?B if there exists at most one value of B
for every value of A

5
(No Transcript)
6
Functional Dependency Graph
R
X Y Z

FDs defined over two sets of attributes X, Y
Ì R
Notation X à Y reads as X determines Y
If X à Y, then all tuples that agree on X must
also agree on Y

1 2 3 2 4 5 1 2 4 1 2 7 2 4 8 3 7 9
7
Functional Dependencies (example)
X Y Z
X Y Z
1 2 3 2 4 5 1 2 4 1 2 7 2 4 8 3 7 9
8
candidate key

a candidate key must satisfy
unique identification.
implies that each nonkey attribute is
functionally dependent on the key (for not(A ? B)
to be true, A must occur more than once (with a
different B), or A must map to more than one B in
a given row)
nonredundancy
no attribute in the key can be deleted and still
be unique
minimal set of columns (Simsion)

9
keys and dependencies
EMPLOYEE1 (Emp_ID, Name, Dept_Name, Salary)
determinant
functional dependency
10
EMPLOYEE2 (Emp_ID, Course_Title, Name,
Dept_Name, Salary, Date_Completed)
not fully functionally dependant on the primary
key
11
(No Transcript)
12
(No Transcript)
13
Normal Forms provide database designers with

A formal framework for analyzing relation schemas
based on their keys and on the functional
dependencies among their attributes.
A series of tests that can be carried out on
individual relation schemas so that the
relational database can be normalized to any
degree.

14
Keys

The difference between a key and a superkey is
that a key has to be minimal.
Example
SSN is a key for EMPLOYEE, whereas SSN,
SSN,ENAME, SSN, ENAME, BDATE are all
superkeys.
superkeya superkey is a set of attributes S ?
RA1,A2,.An with the property that no two
tuples t1 and t2 in any relation state r of R
will have t1S t2S.
A key K is a superkey with the additional
property that removal of any attribute from K
will cause K not to be a superkey anymore.

15
Armstrongs Axioms

For computing the set of FDs that follow a given
FD, the
following rules called Armstrongs axioms are
useful
Reflexivity If B ? A, then A ? B
Augmentation If A ? B, then A ? C ? B ? C
Note also that if A ? B, then A ? C ? B for any
set of attributes C.
Transitivity If A ? B and B ? C then A ? C

16
Inference Rules for FDs
A1, A2, , An ? B1, B2, , Bm
Splitting rule and Combining rule
Is equivalent to
A1, A2, , An ? B1 A1, A2, , An ? B2 . . . .
. A1, A2, , An ? Bm
17
Inference Rules for FDs(continued)
Trivial Rule
A1, A2, , An ? Ai
where i 1, 2, ..., n
Why ?
18
Inference Rules for FDs(continued)
Transitive Closure Rule
A1, A2, , An ? B1, B2, , Bm
If
and
B1, B2, , Bm ? C1, C2, , Cp
A1, A2, , An ? C1, C2, , Cp
then
Why ?
19
(No Transcript)
20
Example (continued)
1. name ? color 2. category ? department 3.
color, category ? price
Start from the following FDs
Infer the following FDs
21
Another Rule
Augmentation
A1, A2, , An ? B
If
then
A1, A2, , An , C1, C2, , Cp ? B
Augmentation follows from trivial rules and
transitivityHow ?
22
Problem infer ALL FDs

Given a set of FDs, infer all possible FDs
How to proceed ?
Try all possible FDs, apply all 3 rules
E.g. R(A, B, C, D) how many FDs are possible ?
Drop trivial FDs, drop augmented FDs
Still way too many
Better use the Closure Algorithm (next)

23
Closure of a set of Attributes
Given a set of attributes A1, , An The
closure, A1, , An , is the set of attributes
Bs.t. A1, , An ? B
name ? color category ? department color,
category ? price
Example
Closures name name, color
name, category name, category, color,
department, price color color
24
Closure Algorithm
Start with XA1, , An. Repeat until X doesnt
change do if B1, , Bn ? C is a FD
and B1, , Bn are all in X then
add C to X.
Example
name ? color category ? department color,
category ? price
name, category name, category,
color, department, price
25
Example
A, B ? C A, D ? E B ? D A, F ? B
R(A,B,C,D,E,F)
Compute A,B X A, B,
Compute A, F X A, F,

26
Using Closure to Infer ALL FDs
Example
A, B ? CA, D ? B B ? D
Step 1 Compute X, for every X
A A, B BD, C C, D D AB ABCD,
AC AC, AD ABCD ABC ABD ACD ABCD
(no need to compute why ?) BCD BCD, ABCD
ABCD
Step 2 Enumerate all FDs X ? Y, s.t. Y ? X and
X?Y ?
AB ? CD, AD?BC, ABC ? D, ABD ? C, ACD ? B
27
Keys

If a relation schema has more than one minimal
key, each is called a candidate key.

28
Keys

one of the candidate keys is designated to be the
primary key.
Each relation schema must have a primary key.
For example, SSN is the only candidate key for
EMPLOYEE, so it is also the primary key.

29
R(A B C D E)

FD1. A -gt C
FD2. BC -gtD
FD3. E -gtAB
result A
By FD1. A -gt C A? result
result A, C
By FD2. BC -gt D BC? result
?result A, C
By FD3. E -gtAB E? result
?result A, C
? A A, C

30
What is the closure of E ?FD1. A -gt CFD2. BC
-gtDFD3. E -gtAB

result E
By FD1. A -gt C A ? result
result E
By FD2. BC -gt D BC? result
?result E
By FD3. E -gtAB E ? result
?result E,A, B
By FD1. A -gt C A ? result
? result E,A, B,C
By FD2. BC -gt D BC ? result
result E,A, B,C,D
Thus E A,B,C,D,E
E

Similarly B B
C C
D D
E E,A,B,C,D
E is a candidate key
Now, we see
AB ABCD AC AC AD
ACD
BC BCD BD BD CD
CD
ABC ABCD ABD ABCD BCD
BCD
ACD ACD

32
What is the largest normal form of this table?
R(A B C D E)
FD1. A -gtC FD2. BC -gtD FD3. E -gtAB Answer E
is the only candidate key of R The non-prime
attributes are A, B, C, D As FD!. A-gtC, we have
transitive dependency. Thus R(ABCD) is 2NF but
not 3NF
33
What is Normalization?

The purpose of normalization is to produce a
stable set of relations that is a faithful model
of the operations of the enterprise. By following
the principles of normalization, we can achieve a
design that is highly flexible, allowing the
model to be extended when needed to account for
new attributes, entity sets, and relationships.

34
Normal Forms

A relation is in specific normal form if it
satisfies the set of requirements or constraints
for that form. All of the normal forms are nested
in that each satisfies the constraints of the
previous one but is a "better" form because each
eliminates flaws found in the previous

35
(No Transcript)
36
Steps in Normalization
37
1NF

relation is in first normal form if it contains
no multivalued attributes
remove repeating groups to a new table as already
demonstrated, carrying the PK as a FK

38
First Normal Form ( 1NF )

the domains of attributes must include only
atomic(simple, indivisible) values and the value
of any attribute in a tuple must be a single
value from the domain of the attribute.

39
First Normal Form ( 1NF )

example
Department
DNAME DNUMBER DMGRSSN DLOCATIONS
research 5 333445555
Bellaire ,
Sugarland Houston
Administration 4 987654321
Stafford
Headquarters 1 888665555
Houston
the domain of DLOCATIONS contains atomic values,
but some tuples can have a set of these values.
In this case,
DNUMBER x-gtDLOCATIONS.
The domain of DLOCATIONS contains sets of values
and hence in non-atomic.

40
Our Example in 1NF
PROJ_NUM
PROJ_NAME
EMP_NUM
EMP_NAME
JOB_CLASS
CHG_HOUR
HOURS

Key (PROJ_NUM, EMP_NUM)
Given PROJ_NUM
PROJ_NAME is determined
Given EMP_NUM
EMP_NAME, JOB_CLASS, and CHG_HOUR are determined

41
2NF

a relation is in second normal form if it is in
first normal form AND every nonkey attribute is
fully functionally dependant on the primary key
i.e. remove partial functional dependencies, so
no nonkey attribute depends on just part of the
key

42
EMPLOYEE2 (Emp_ID, Course_Title, Name,
Dept_Name, Salary, Date_Completed)
not fully functionally dependant on the primary
key
43
Second Normal Form ( 2NF )

it is based on the concept of full functional
dependency.
A functional dependency X?Y is a full functional
dependency , for any attribute A ? X, X - A
? Y.

44
Second Normal Form

A relation is in second normal form (2NF) if and
only if it is in first normal form and all the
nonkey attributes are fully functionally
dependent on the key.

45
Second Normal Form

A table is in second normal form (2NF) if
It is in 1NF
It includes no partial dependencies. No
attribute is dependent on only a portion of the
primary key.

46
2NF

a relation is in 2NF if it is in 1NF and any one
of these is true
the PK consists of only 1 attribute
all attributes are part of the PK (no nonkey
attributes)
every nonkey attribute is functionally dependant
on the whole PK

47
2NF (Example)
A B C D
2 Candidate Keys
R with keyAB is NOT 2NF
R with keyAC is NOT 2NF
48
Second Normal Form ( 2NF )
fd1
fd2
fd3

SSN, PNUMBER?HOURS is a fully dependency
(neither SSN?HOURS nor PNUMBER?HOURS holds).

49
Second Normal Form ( 2NF )
EMP_PROJ
fd1
fd2
fd3
2NF NORMALIZATION
EP2
EP3
EP1
fd2
fd1
fd3

The functional dependencies fd1,fd2,fd3 lead to
the decomposition of EMP_PROJ into the three
relation schemas EP1,EP2,EP3, each of which is in
2NF.

50
(No Transcript)
51
1NF ?2NF

EMPLOYEE2 (Emp_ID, Course_Title, Name,
Dept_Name, Salary, Date_Completed)
?
EMPLOYEE1 (Emp_ID, Name, Dept_Name, Salary)
and
EMP_COURSE (Emp_ID, Course_Title, Date_Completed)
EMPLOYEE1 satisfies condition1
EMP_COURSE satisfies condition3

52
Converting to 2NF

To convert from 1NF to 2NF, list each key
component and then the key itself.
Each component will become the key in a new table.

53
Our Example in 2NF
Table Name PROJECT
PROJ_NUM
PROJ_NAME
Table Name EMPLOYEE
CHG_HOUR
EMP_NUM
EMP_NAME
JOB_CLASS
Table Name ASSIGN
HOURS
PROJ_NUM
EMP_NUM
54
Problems with 2NF

Transitive Dependency
An attribute that is dependent on a non-prime
attribute exhibits transitive dependency.
Still leads to data anomalies.

CHG_HOUR
EMP_NUM
EMP_NAME
JOB_CLASS
Our example contains the transitive
dependency JOB_CLASS -----gt CHG_HOUR
55
Second Normal Form

Second normal form
Let R be a relation, and let F be the set of
governing FDs. An attribute belongs to R is
prime if a key of R contains A. In other words,
A is prime in R if there exists KltR such that
(1) K-gtR,
(2) for all B belongs to K, (K-B)-gtR not
belongs to F, and
(3) A belongs to K

56
(No Transcript)
57
General Definitions of Second Normal Form

A relation schema R is in second normal form
(2NF) if every nonprime attribute A in R is fully
functionally dependent on every key of R.

What is a Decomposition?
Let R be a relation schema. A set of relation
schemas R1, R2,, Rn is a decomposition of R if
R R1 U R2 UU Rn
That is, R1, R2,, Rn is a decomposition of R
for I1,2,,n, each Ri is a subset of R, and
every attribute in R appears in at least one Ri.

Normalization Using Functional Dependencies
Desirable properties of Decomposition
1. Lossless-Join Decomposition
Let R be a relation schema, and let F be a set of
functional dependencies on R. Let R1 and R2 form
a decomposition of R. This decomposition is a
lossless-join decomposition of R if at least one
of the following functional dependencies are in
F
R1 ? R2 ? R1
R1 ? R2 ? R2

2. Dependency Preservation
When an update is made to the database, the
system should be able to check if it satisfies
all the given functional dependencies. If we want
to check updates efficiently, we should design
relational-database schemas that allow update
validation without the computation of joins.
To decide whether joins must be computed we need
to determine what functional dependencies may be
tested by checking each relation individually.

Cont.
Let F be a set of functional dependencies on a
schema R, and Let R1, R2,, Rn be a decomposition
of R. The restriction of F to Ri is the set Fi of
all functional dependencies in F that include
only attributes of Ri.
Let F F1 U F2 U U Fn. F is a set of
functional dependencies on schema R, in general,
F ? F. However, it may be F F. If the
latter is true, then every dependency in F is
logically implied by F, and if we verify that F
is satisfied, we have verified that F is
satisfied. We say that a decomposition having the
property F F is a dependency preserving
decomposition.

Algorithm to test dependency preservation
compute F
for each schema Ri in D do
begin
Fi the restriction of F to Ri
end
F0
for each restriction Fi do
begin
FF U Fi
end
compute F
if(F F) then return (true)
else return (false)
Note since the first step, computation of F
takes exponential time, it is often easier not to
apply the algorithm.

63
Boyce-Codd Normal Form

A relation schema R is in BCNF with respect to a
set F of functional dependencies
if for all functional dependencies in F of the
form a b, where a Í R and b Í R, at least one
of the following holds.
a Í R is a trivial functional dependency ( b Í a
)
a is a superkey for schema R.

64
Cont.

A database design is in BCNF if each member of
the set of relation schemas that constitutes the
design is in BCNF.
To determine whether these schemas are in BCNF,
we need to determine what functional dependencies
apply to them.
Note examples are available in text P225-226

65
BCNF Decomposition Algorithm
Result R done false compute F while(
not done ) do if( there is a schema Ri in result
that is not in BCNF ) then begin let a b
be a nontrivial functional dependency that holds
on Ri such that a Ri is not in F, and a Ç
b 0 result ( result -Ri ) È ( Ri - B ) È
( a , b ) end else done true
66
Cont.

Not every BCNF decomposition is
dependency preserving
We can not always satisfy all three design
goals
1. BCNF
2. Lossless join
3. Dependency preservation

67
Cont.
Example Banker-schema ( branch-name,
customer-name, banker-name ) This banker-schema
indicates that a customer has "personal banker"
in a particular branch. The set F of functional
dependencies that we require to hold on the
banker-schema is banker-name
branch-name branch-name customer-name banker
name Banker-schema is not in BCNF because
banker-name is not a superkey

Write a Comment

User Comments (0)