Title: Midterm 3 Revision 1
1Midterm 3 Revision 1
CS157A Lecture 19
- Prof. Sin-Min Lee
- Department of Computer Science
2Revision
- For each of the E-R problems, develop a
relational schema - Identify the primary keys, foreign keys and
attribute domains - Comment on the data quality of each of the
relational schema that you have developed
3Normalization
- A process to design good relations
- Not a requirement of the relational model or any
other model - Attempts to reflect the semantics of the data.
This meaning is determined by the organization. - What does salary mean? Profits? Stock Price?
- Relations can be in 1 NF, 2 NF, 3 NF, Boyce-Codd
NF, 4 NF, and 5 NF - In this revision, we will concern ourselves with
1 NF, 2 NF, 3 NF and 4NF.
4First Normal Form
- All attributes must be single-valued
- Each row can assume only one value for a given
attribute - Consider employees working for a given department
at a given salary level - EMPLOYEE(EMPNO, department, salary)
- Now consider employees who works for a given
department and receive a raise every year. We
are interested in maintaining data about each
employees yearly salary. - EMPLOYEE(EMPNO, YEAR, department, salary)
- Note the very different meanings of the attribute
salary - Note the change in primary key - why?
5Second Normal Form
6(No Transcript)
7Third Normal Form
- Consider the following A large urban university
offers several types of scholarships. Lets
assume that there are four types and we have a
predefined code for each. The type of
scholarship that a student is eligible for is
based on their major. An analyst sets up the
following database - (SSNO, name, major, GPA, scholarship type)
- What problems are we likely to encounter? Why?
- Note that scholarship type is not dependent on
(determined by) SSNO. Rather, it is determined
by major, which is determined by SSNO. - This represents a transitive dependency.
8Third Normal Form
- Decompose the relation into two relations
- Student(SSNO, name, major, GPA)
- Scholarship_Eligibility(Major, Scholarship_Type)
- The 3 NF rule
- Once you are in 2 NF, examine if there are any
transitive dependencies - Engage in lossless decomposition so as to remove
these transitive dependencies - You are now in 3 NF!
9Review
- Given R(A, B, C) and FA-gtB
- Is R in 2NF?
- Is R in 3NF?
- Is R in BCNF
10Review
- Given R(A, B, C) and FA-gtB, B-gtC
- Is R in 2NF?
- Is R in 3NF?
- Is R in BCNF
11Review
- Given R(A, B, C) and FA-gtB, B-gtC
- Is R in 2NF?
- Is R in 3NF?
- Is R in BCNF
12Lossless Join Property
- A decomposition D R1, R2, ..., Rm of R has
the lossless join property with respect to the
set of dependencies F on R if for every relation
instance r of R that satisfies F, the following
holds - (?ltR1gt(r), ... , ?ltRmgt(r)) r
13(No Transcript)
14 Testing for the lossless join property
- Consider R(A, B, C, D, E) and
- FAB ? C, B ? D
- DR1(A, B, C), R2(B,D)
- Is D a lossless decomposition?
15Lossless Join Property
- Consider R(A, B, C, D) and F AB ?C
- Let D1R1(A, B, C), R2(C, D)
- Let r be one of the legal instance
- A B C D
- ----------------------------
- a1 b1 c1 d1
- a2 b1 c1 d2
- a2 b2 c2 d2
16Lossless Join Property
- ?ltR1gt ?ltR2gt
- A B C C D
- ------------------- -----------
- a1 b1 c1 c1 d1
- a2 b1 c1 c1 d2
- a2 b2 c2 c2 d2
17Lossless Join Property
- ?ltR1gt ?ltR2gt
- A B C D
- --------------------------------
- a1 b1 c1 d1
- a1 b1 c1 d2 lt------- Spurious tuple
- a2 b1 c1 d1 lt-------- Spurious tuple
- a2 b1 c1 d2
- a2 b2 c2 d2
- R1 and R2 are not a lossless decomposition.
18Lossless Join Property
- Informally, a decomposition is lossless if no
spurious tuples appear when the relations in the
decomposition are JOINed. - Thus, decomposition D1 is not lossless
19 Testing for the lossless join property
- Step1 create a matrix S with one row i for each
relation Ri in the decomposition D, and one
column j for each attribute Aj in R - Step 2 set S(i,j)bij for all matrix entries
20 Testing for the lossless join property
- Step 3.
- for each row i representing relation schema Ri
- for each column j representing attribute Aj
- if Ri includes attribute Aj
- then set S(i,j)aj
21 Testing for the lossless join property
- Step 4.
- repeat the following until a loop execution
results - in no changes to S
- for each functional dependency X ? Y in F
- for all rows in S which have the same
symbols in the columns corresponding to
attributes in X
22 Testing for the lossless join property
- Step 4 continued .
- make the symbols in each column that correspond
- to an attribute in Y be the same in all these
rows as follows - if any of the rows has an "a" symbol for the
column, set the other rows to the same "a" symbol
in the column - if no "a" symbol exists for the attribute in any
of the rows choose one of the "b" symbols that
appear in one of the rows for the attribute and
set the other rows to the "b" symbols in the
column
23 Testing for the lossless join property
- Step 5
- If a row is made up entirely of "a" symbols,
then the decomposition has the lossless join
property-otherwise, it does not -
24 Testing for the lossless join property
- Consider R(A,B,C) and FA?B, B ? C
- D R1(A,B), R2(B,C)
- A B C A B C
- ---------------------- ----------------------
- R1 a1 a2 b13 gt a1 a2 a3
- R2 b21 a2 a3 b21 a2 a3
-
- Thus, D is a lossless decomposition.
25(No Transcript)
26Review
- Given R(A, B, C) and
- FFD1,FD2,FD3,,FDk
- Is R in 2NF?
- Is R in 3NF?
- Is R in BCNF
- which is in 3NF but not in BCNF.
27(No Transcript)
28(No Transcript)
29(No Transcript)
30(No Transcript)
31Overview
- It is possible to decompose any relational schema
into a set of relational schemas with the
following properties - 1) Attribute Preserving
- 2) FDs preserving
- 3) Lossless Join
32(No Transcript)
33(No Transcript)
34 BCNF normalisation
35(No Transcript)
36The Decomposition Algorithm
- Input A relational schema R and a set of FDs F.
- Output A set of relational schemas R1, R2, ...,
Rm
37The Decomposition Algorithm
- Step 1. Find a minimal cover G for F
- Step 2. For each left-hand side X that appears in
G, create a relation schema with attributes - X ? A1 ? A2, ... , ? Am
- where X ? A1, X ? A2, ... , X ? Am are all
dependencies in G with X as left-hand side. - Step 3. If none of the relation schemas contains
a key of R, create one more relation schema that
contains attributes that form a key for R. -
38Example
- Consider R(A, B, C, D, E) and
- FAB ? C, A ? BE, C ?E
- Step 1. minimal cover
- FminA?C, A?B, C?E
- Step 2. R1(A,B,C), R2(C,E)
- Step 3. Key A,D
- we have R3(A,D)
- Final Result
- R1(A, B,C), R2(C,E), and R3(A,D)
39BCNF Decomposition
- Property LJ1
- A decomposition DR1, R2 of R has the lossless
join property with respect to a set of functional
dependencies F on R if and only if either - the FD ((R1 ? R2) ? (R1 - R2)) is in F, or
- the FD ((R1 ? R2) ? (R2 - R1)) is in F
40BCNF Decomposition
- Property LJ2
- If a decomposition DR1, R2, ..., Rm of R has
the lossless join property with respect to a set
of functional dependencies F on R, and if a
decomposition D1Q1, Q2, ... ,Qk of Ri has the
lossless join property with respect to the
projection of F on Ri, then the decomposition - D2R1, R2, ... Ri-1, Q1, Q2, ..., Qk, Ri1,
..., Rm of R has the lossless join property with
respect to F
41BCNF Decomposition - Algorithm
- 1. Set D ? R
- 2. While there is a relation schema Q in D that
is not in BCNF do - begin
- choose a relation schema Q in D that is not in
BCNF - find a functional dependency X ? Y in Q that
violates BCNF - replace Q in D by two schemas
- (Q-Y) and (X ? Y)
- end
-
42BCNF Decomposition - Example
- Consider R(A,B,C,D) and
- FA ? B, B ? C, D ? B
- Decompose R into BCNF relations.
- Step 1.DR(A,B,C,D)
- Step 2. Loop 1.R is not in BCNF because A ? B and
A is not a superkey - decompose R into R1(A, C, D), and R2(A, B)
- Loop 2. R1 is not in BCNF because A ? C and A
is not a superkey - decompose R1 into R11(A, D) and R12(A, C)
- ResultDR11(A,D), R12(A,C), R2(A,B)
-
43(No Transcript)
44Questions
- Is the decomposition always unique?
- Is a decomposition still useful if it does not
preserve the lossless property? - Is a decomposition still useful if it does not
preserve the dependency preservation condition? - Is a decomposition still useful if it does not
preserve the attribute preservation condition?
45Overview
- Given a relation schema R(A1, A2, ... , An).
- If R is not in the third normal form (3NF), we
wan to decompose it into a set of relation schema
D R1, R2, ... ,Rm , where each Ri is in 3NF,
such that the following conditions are held - Attribute preservation condition.
- Dependency preservation condition
- Lossless join condition
46(No Transcript)
47Attribute Preservation Condition
- Attribute preservation condition states that the
union of attributes of Ri equal to the set of
attributes of R. - For example Given R(A, B, C, D) and the
decomposition - D1 R1(A,B), R2(B,C) and R3(A,C,D). D1
satisfies the attribute preservation condition.
48Attribute Preservation Condition
- Given R(A, B, C, D) and the decomposition
- D2R1(A, B), R2(B,C), R3(A, C),
- The attribute preservation condition is violated
because D is missing (not preserved in the
decomposition).
49Dependency Preservation Condition
- We say that a decomposition DR1, R2, ... , Rm
of R is dependency preserving with respect to F
if the union of the projections of F on each Ri
in D is equivalent to F. That is - ((?F(R1) ? ... ? ?F(Rm)) F
- Given a set of dependencies F on R, the
projection of F on Ri, denoted by ?F(Ri) where Ri
is a subset of R, is the set of dependencies X ?
Y in F such that the attributes in X ? Y are all
contained in Ri.
50Dependency Preservation Condition
- Given R(A, B, C, D) and F A ? B, B ? C, C ?
D - Let D1R1(A,B), R2(B,C), R3(C,D)
- ?F(R1)A ? B
- ?F(R2)B ? C
- ?F(R3)C ? D
- FDs are preserved.
51Dependency Preservation Condition
- Given R(A, B, C, D) and F A ? B, B ? C, C ?
D - Let D2R1(A,B, R2(B,C), R3(A, D), then FDs
are not preserved. -
52Dependency Preservation Condition
- Given R(A, B, C, D) and F A ? B, B ? C, C ?
D - Let D2R1(A,B, R2(B,C), R3(A, D), then FDs
are not preserved. -
53Dependency Preservation Condition
- We want to preserve the dependencies because each
dependency in F represents a constraint on a
database.
54Dependency Preservation Condition
- If one of the dependencies is not represented by
the dependencies on some individual relation Ri
of the decomposition, we will not be able to
enforce this constraint by looking only at an
individual relation, instead, to enforce the
constraint, we will have to join two or more of
the relations in the decomposition and then check
that functional dependency hold in the result of
the join operation. This is very inefficient.
55(No Transcript)
56Picture of MVD X -gt-gtY
X Y others equal exchange
57MVD Rules
- Every FD is an MVD.
- If X -gtY, then swapping Y s between two tuples
that agree on X doesnt change the tuples. - Therefore, the new tuples are surely in the
relation, and we know X -gt-gtY. - Complementation If X -gt-gtY, and Z is all the
other attributes, then X -gt-gtZ.
58Fourth Normal Form
- The redundancy that comes from MVDs is not
removable by putting the database schema in BCNF. - There is a stronger normal form, called 4NF, that
(intuitively) treats MVDs as FDs when it comes
to decomposition, but not when determining keys
of the relation.
594NF Definition
- A relation R is in 4NF if whenever X
-gt-gtY is a nontrivial MVD, then X is a
superkey. - Nontrivial means that
- Y is not a subset of X, and
- X and Y are not, together, all the attributes.
- Note that the definition of superkey still
depends on FDs only.
60BCNF Versus 4NF
- Remember that every FD X -gtY is also an MVD, X
-gt-gtY. - Thus, if R is in 4NF, it is certainly in BCNF.
- Because any BCNF violation is a 4NF violation.
- But R could be in BCNF and not 4NF, because
MVDs are invisible to BCNF.
61(No Transcript)