Midterm 3 Revision 1 - PowerPoint PPT Presentation

1 / 61

About This Presentation

Title:

Midterm 3 Revision 1

Description:

Note the very different meanings of the attribute salary' Note ... for all rows in S which have the same symbols in the columns corresponding to attributes in X ... – PowerPoint PPT presentation

Number of Views:18

Avg rating:3.0/5.0

Slides: 62

Provided by: Lee144

Learn more at: http://www.cs.sjsu.edu

Category:

more less

Transcript and Presenter's Notes

Title: Midterm 3 Revision 1

1
Midterm 3 Revision 1
CS157A Lecture 19

Prof. Sin-Min Lee
Department of Computer Science

2
Revision

For each of the E-R problems, develop a
relational schema
Identify the primary keys, foreign keys and
attribute domains
Comment on the data quality of each of the
relational schema that you have developed

3
Normalization

A process to design good relations
Not a requirement of the relational model or any
other model
Attempts to reflect the semantics of the data.
This meaning is determined by the organization.
What does salary mean? Profits? Stock Price?
Relations can be in 1 NF, 2 NF, 3 NF, Boyce-Codd
NF, 4 NF, and 5 NF
In this revision, we will concern ourselves with
1 NF, 2 NF, 3 NF and 4NF.

4
First Normal Form

All attributes must be single-valued
Each row can assume only one value for a given
attribute
Consider employees working for a given department
at a given salary level
EMPLOYEE(EMPNO, department, salary)
Now consider employees who works for a given
department and receive a raise every year. We
are interested in maintaining data about each
employees yearly salary.
EMPLOYEE(EMPNO, YEAR, department, salary)
Note the very different meanings of the attribute
salary
Note the change in primary key - why?

5
Second Normal Form
6
(No Transcript)
7
Third Normal Form

Consider the following A large urban university
offers several types of scholarships. Lets
assume that there are four types and we have a
predefined code for each. The type of
scholarship that a student is eligible for is
based on their major. An analyst sets up the
following database
(SSNO, name, major, GPA, scholarship type)
What problems are we likely to encounter? Why?
Note that scholarship type is not dependent on
(determined by) SSNO. Rather, it is determined
by major, which is determined by SSNO.
This represents a transitive dependency.

8
Third Normal Form

Decompose the relation into two relations
Student(SSNO, name, major, GPA)
Scholarship_Eligibility(Major, Scholarship_Type)
The 3 NF rule
Once you are in 2 NF, examine if there are any
transitive dependencies
Engage in lossless decomposition so as to remove
these transitive dependencies
You are now in 3 NF!

9
Review

Given R(A, B, C) and FA-gtB
Is R in 2NF?
Is R in 3NF?
Is R in BCNF

10
Review

Given R(A, B, C) and FA-gtB, B-gtC
Is R in 2NF?
Is R in 3NF?
Is R in BCNF

11
Review

Given R(A, B, C) and FA-gtB, B-gtC
Is R in 2NF?
Is R in 3NF?
Is R in BCNF

12
Lossless Join Property

A decomposition D R1, R2, ..., Rm of R has
the lossless join property with respect to the
set of dependencies F on R if for every relation
instance r of R that satisfies F, the following
holds
(?ltR1gt(r), ... , ?ltRmgt(r)) r

13
(No Transcript)
14
Testing for the lossless join property

Consider R(A, B, C, D, E) and
FAB ? C, B ? D
DR1(A, B, C), R2(B,D)
Is D a lossless decomposition?

15
Lossless Join Property

Consider R(A, B, C, D) and F AB ?C
Let D1R1(A, B, C), R2(C, D)
Let r be one of the legal instance
A B C D
----------------------------
a1 b1 c1 d1
a2 b1 c1 d2
a2 b2 c2 d2

16
Lossless Join Property

?ltR1gt ?ltR2gt
A B C C D
------------------- -----------
a1 b1 c1 c1 d1
a2 b1 c1 c1 d2
a2 b2 c2 c2 d2

17
Lossless Join Property

?ltR1gt ?ltR2gt
A B C D
--------------------------------
a1 b1 c1 d1
a1 b1 c1 d2 lt------- Spurious tuple
a2 b1 c1 d1 lt-------- Spurious tuple
a2 b1 c1 d2
a2 b2 c2 d2
R1 and R2 are not a lossless decomposition.

18
Lossless Join Property

Informally, a decomposition is lossless if no
spurious tuples appear when the relations in the
decomposition are JOINed.
Thus, decomposition D1 is not lossless

19
Testing for the lossless join property

Step1 create a matrix S with one row i for each
relation Ri in the decomposition D, and one
column j for each attribute Aj in R
Step 2 set S(i,j)bij for all matrix entries

20
Testing for the lossless join property

Step 3.
for each row i representing relation schema Ri
for each column j representing attribute Aj
if Ri includes attribute Aj
then set S(i,j)aj

21
Testing for the lossless join property

Step 4.
repeat the following until a loop execution
results
in no changes to S
for each functional dependency X ? Y in F
for all rows in S which have the same
symbols in the columns corresponding to
attributes in X

22
Testing for the lossless join property

Step 4 continued .
make the symbols in each column that correspond
to an attribute in Y be the same in all these
rows as follows
if any of the rows has an "a" symbol for the
column, set the other rows to the same "a" symbol
in the column
if no "a" symbol exists for the attribute in any
of the rows choose one of the "b" symbols that
appear in one of the rows for the attribute and
set the other rows to the "b" symbols in the
column

23
Testing for the lossless join property

Step 5
If a row is made up entirely of "a" symbols,
then the decomposition has the lossless join
property-otherwise, it does not

24
Testing for the lossless join property

Consider R(A,B,C) and FA?B, B ? C
D R1(A,B), R2(B,C)
A B C A B C
---------------------- ----------------------
R1 a1 a2 b13 gt a1 a2 a3
R2 b21 a2 a3 b21 a2 a3
Thus, D is a lossless decomposition.

25
(No Transcript)
26
Review

Given R(A, B, C) and
FFD1,FD2,FD3,,FDk
Is R in 2NF?
Is R in 3NF?
Is R in BCNF
which is in 3NF but not in BCNF.

27
(No Transcript)
28
(No Transcript)
29
(No Transcript)
30
(No Transcript)
31
Overview

It is possible to decompose any relational schema
into a set of relational schemas with the
following properties
1) Attribute Preserving
2) FDs preserving
3) Lossless Join

32
(No Transcript)
33
(No Transcript)
34
BCNF normalisation
35
(No Transcript)
36
The Decomposition Algorithm

Input A relational schema R and a set of FDs F.
Output A set of relational schemas R1, R2, ...,
Rm

37
The Decomposition Algorithm

Step 1. Find a minimal cover G for F
Step 2. For each left-hand side X that appears in
G, create a relation schema with attributes
X ? A1 ? A2, ... , ? Am
where X ? A1, X ? A2, ... , X ? Am are all
dependencies in G with X as left-hand side.
Step 3. If none of the relation schemas contains
a key of R, create one more relation schema that
contains attributes that form a key for R.

38
Example

Consider R(A, B, C, D, E) and
FAB ? C, A ? BE, C ?E
Step 1. minimal cover
FminA?C, A?B, C?E
Step 2. R1(A,B,C), R2(C,E)
Step 3. Key A,D
we have R3(A,D)
Final Result
R1(A, B,C), R2(C,E), and R3(A,D)

39
BCNF Decomposition

Property LJ1
A decomposition DR1, R2 of R has the lossless
join property with respect to a set of functional
dependencies F on R if and only if either
the FD ((R1 ? R2) ? (R1 - R2)) is in F, or
the FD ((R1 ? R2) ? (R2 - R1)) is in F

40
BCNF Decomposition

Property LJ2
If a decomposition DR1, R2, ..., Rm of R has
the lossless join property with respect to a set
of functional dependencies F on R, and if a
decomposition D1Q1, Q2, ... ,Qk of Ri has the
lossless join property with respect to the
projection of F on Ri, then the decomposition
D2R1, R2, ... Ri-1, Q1, Q2, ..., Qk, Ri1,
..., Rm of R has the lossless join property with
respect to F

41
BCNF Decomposition - Algorithm

1. Set D ? R
2. While there is a relation schema Q in D that
is not in BCNF do
begin
choose a relation schema Q in D that is not in
BCNF
find a functional dependency X ? Y in Q that
violates BCNF
replace Q in D by two schemas
(Q-Y) and (X ? Y)
end

42
BCNF Decomposition - Example

Consider R(A,B,C,D) and
FA ? B, B ? C, D ? B
Decompose R into BCNF relations.
Step 1.DR(A,B,C,D)
Step 2. Loop 1.R is not in BCNF because A ? B and
A is not a superkey
decompose R into R1(A, C, D), and R2(A, B)
Loop 2. R1 is not in BCNF because A ? C and A
is not a superkey
decompose R1 into R11(A, D) and R12(A, C)
ResultDR11(A,D), R12(A,C), R2(A,B)

43
(No Transcript)
44
Questions

Is the decomposition always unique?
Is a decomposition still useful if it does not
preserve the lossless property?
Is a decomposition still useful if it does not
preserve the dependency preservation condition?
Is a decomposition still useful if it does not
preserve the attribute preservation condition?

45
Overview

Given a relation schema R(A1, A2, ... , An).
If R is not in the third normal form (3NF), we
wan to decompose it into a set of relation schema
D R1, R2, ... ,Rm , where each Ri is in 3NF,
such that the following conditions are held
Attribute preservation condition.
Dependency preservation condition
Lossless join condition

46
(No Transcript)
47
Attribute Preservation Condition

Attribute preservation condition states that the
union of attributes of Ri equal to the set of
attributes of R.
For example Given R(A, B, C, D) and the
decomposition
D1 R1(A,B), R2(B,C) and R3(A,C,D). D1
satisfies the attribute preservation condition.

48
Attribute Preservation Condition

Given R(A, B, C, D) and the decomposition
D2R1(A, B), R2(B,C), R3(A, C),
The attribute preservation condition is violated
because D is missing (not preserved in the
decomposition).

49
Dependency Preservation Condition

We say that a decomposition DR1, R2, ... , Rm
of R is dependency preserving with respect to F
if the union of the projections of F on each Ri
in D is equivalent to F. That is
((?F(R1) ? ... ? ?F(Rm)) F
Given a set of dependencies F on R, the
projection of F on Ri, denoted by ?F(Ri) where Ri
is a subset of R, is the set of dependencies X ?
Y in F such that the attributes in X ? Y are all
contained in Ri.

50
Dependency Preservation Condition

Given R(A, B, C, D) and F A ? B, B ? C, C ?
D
Let D1R1(A,B), R2(B,C), R3(C,D)
?F(R1)A ? B
?F(R2)B ? C
?F(R3)C ? D
FDs are preserved.

51
Dependency Preservation Condition

Given R(A, B, C, D) and F A ? B, B ? C, C ?
D
Let D2R1(A,B, R2(B,C), R3(A, D), then FDs
are not preserved.

52
Dependency Preservation Condition

Given R(A, B, C, D) and F A ? B, B ? C, C ?
D
Let D2R1(A,B, R2(B,C), R3(A, D), then FDs
are not preserved.

53
Dependency Preservation Condition

We want to preserve the dependencies because each
dependency in F represents a constraint on a
database.

54
Dependency Preservation Condition

If one of the dependencies is not represented by
the dependencies on some individual relation Ri
of the decomposition, we will not be able to
enforce this constraint by looking only at an
individual relation, instead, to enforce the
constraint, we will have to join two or more of
the relations in the decomposition and then check
that functional dependency hold in the result of
the join operation. This is very inefficient.

55
(No Transcript)
56
Picture of MVD X -gt-gtY
X Y others equal exchange
57
MVD Rules

Every FD is an MVD.
If X -gtY, then swapping Y s between two tuples
that agree on X doesnt change the tuples.
Therefore, the new tuples are surely in the
relation, and we know X -gt-gtY.
Complementation If X -gt-gtY, and Z is all the
other attributes, then X -gt-gtZ.

58
Fourth Normal Form

The redundancy that comes from MVDs is not
removable by putting the database schema in BCNF.
There is a stronger normal form, called 4NF, that
(intuitively) treats MVDs as FDs when it comes
to decomposition, but not when determining keys
of the relation.

59
4NF Definition

A relation R is in 4NF if whenever X
-gt-gtY is a nontrivial MVD, then X is a
superkey.
Nontrivial means that
Y is not a subset of X, and
X and Y are not, together, all the attributes.
Note that the definition of superkey still
depends on FDs only.

60
BCNF Versus 4NF