Title: Functional Dependencies FDs and
1Lecture 23
- Functional Dependencies (FDs) and
- Normalization
2Holiday season offers
- Exam 2 final seem repetitive?
- Exam 1 13, Exam 213, Final15
- Option 1 No Final
- Exam 2 will include SQL NFs QP/QO will be on
last day 12/13 (projects will be demoed on 12/11
and due on 12/14 by 1159 PM) - 20.5 each (if you improve 18 exam1 and 23
exam2)
3Holiday season offers
- Option 2 Final optional
- Final will include ERD/EERD mapping SQL NFs
QP/QO - Option 3 No Exam 2
- Final will include ERD/EERD mapping SQL NFs
QP/QO - 15 exam 1 and 26 final exam
4Review
- What is an FD?
- Inference rules?
- Why helpful?
- What are they?
- Prove augmentation X ? Y ? XZ ? YZ
5Closure of a Set of FDs
- The closure of a set F of FDs is F which is the
set of all FDs that can be inferred from F - F SSN ? ENAME, BDATE, ADDRESS DNUMBER,
DNUMBER ? DNAME, DMGSSN - SSN ? DNAME, DMGSSN
- SSN ? SSN
- DNUMBER ? DNAME
- Most of the times it is IMPOSSIBLE to list all
FDs for a given situation - F is impossible to find sometimes
6Closure of a Set of Attributes
- Test if a certain FD is in F
- Entailed by F
- Closure of a set of attributes X with respect to
F is the set X of all attributes that are
functionally determined by X - I.e. Y such that X ? Y is true in F
- X can be calculated by repeatedly applying the
inferences using the FDs in F - READ Algorithm 10.1 p.353 in book
7Closure of a Set of Attributes
- Algorithm to compute X
- XX
- Repeat
- oldXX
- For each FD Y? Z in F do
- If X ? Y then XX U Z
- Until (X oldX) //i.e. until no change
occurs - Using the attribute closure we can determine if
any FD is in F (i.e. entailed by F)
8Example - Computing Attribute Closure
X XF A A, D, E B
B D D, E AB A,
B, C, D, E
F AB ? C A ? D D ? E AC
? B
Is AB ? E entailed by F? Is D? C entailed by
F? What is a possible key for R?
9In-class Exercise
- F
- SSN ? EName
- PNumber ? PName, PLocation
- SSN, PNumber?Hours
-
- Find SSN, PNumber, SSN, PNumber
- Key?
10Equivalence of Sets of FDs
- For two sets of FDs F and G, F is said to cover G
if every FD in G can be inferred from F - i.e., if G ? F
- Two sets of FDs F and G are equivalent if
- every FD in F can be inferred from G, and
- G covers F
- every FD in G can be inferred from F
- F covers G
- F and G are equivalent if F G
- There is an algorithm for checking equivalence of
sets of FDs
11Equivalence of Sets of FDs
12Minimal Sets of FDs
- A set of FDs F is minimal if it satisfies the
following - (1) Every dependency in F has a single attribute
for its RHS - Minimal RHS
- (2) We cannot replace any dependency X ? A in F
with a dependency Y ? A, where Y is a proper
subset of X ( Y ? X) and still have a set of
dependencies that is equivalent to F - Minimal LHS
- (3) We cannot remove any dependency from F and
have a set of dependencies that is equivalent to
F - All FDs are necessary
- A minimal cover of a set of FDs E is a minimal
set of FDs that equivalent to E - Algorithm 10.2 p.354 in book
13Normalization of Relations
- Normalization The process of decomposing
unsatisfactory "bad" relations into smaller
relations - By breaking up their attributes
- Main purpose to achieve some desirable properties
- Minimizing redundancy or update anomalies
- A normal form (NF) is a condition that indicates
the degree to which a relation has been
normalized - The normal form of a relation refers to the
highest normal form that a relation meets
14Normalization of Relations
- 1NF, 2NF, 3NF, BCNF based on keys and FDs of a
relation schema - 4NF based on keys, multi-valued dependencies
MVDs5NF based on keys, join dependencies JDs
(Chapter 11) - On their own, NFs DO NOT guarantee good design
(in theoretical terms)! - Additional properties may be needed
- Lossless join (a must)
- No spurious tuple generation problem
- Dependency preservation (an advantage)
- After decomposition, each FD is represented some
relation
15Practical Use of Normal Forms
- Even though higher (NFs means better
- Database designers need not normalize to the
highest possible normal form - Usually up to 3NF, BCNF or 4NF
- De-normalization the process of storing the join
of higher normal form relations as a base
relationwhich is in a lower normal form - For performance reasons
16Review
- A superkey of a relation schema R A1, A2,
...., An is a set of attributes S ? R with the
property that no two tuples t1 and t2 in any
legal relation state r of R will have t1S
t2S - A key/candidate key K is a superkey with the
additional property that removal of any attribute
from K will cause K not to be a superkey any more
(i.e. minimal) - If a relation schema has more than one key
- One of the candidate keys is arbitrarily
designated to be the primary key, and the others
are called secondary keys - A prime attribute must be a member of some
candidate key - A non-prime attribute is not a prime
attributethat is, it is not a member of any
candidate key
171NF First Normal Form
- Disallows
- Multi-valued attributes
- E.g. DEPARTMENT (DNumber, .,DLOCATIONS)
- Composite attributes
- E.g. ADDRESS (Street, City, State, Zip)
- Nested relations/compound attributes
combinations of the above - An attribute PROJECT for every EMPLOYEE listing
the project number and name - EMPLOYEE (SSN, ..,PROJECT (PNUMBER, PNAME))
- (Defn.) Every attribute must be dependant on
every key - The only attributes allowed in 1NF are single
atomic (or indivisible) values - Otherwise, attributes cant depend on or cant be
determined by the key - Considered to be part of the definition of
relation
18L.H.S of FD
R.H.S of FD
Better design?
19Solutions
- DLOCATIONS multi-valued attribute not
dependent on DNUMBER (in b) - 3 Solutions
- (1) Previous slide
- Suffers from redundancy
- (2) If maximum number is known (e.g. 3) then
replace DLOCATIONS by 3 attributes Loc 1, Loc2,
Loc3 - Lot of nulls
- Difficulty in querying Find all departments in
St Cloud - (3) Remove DLOCATIONS from this relations and
create a new relation LOCATION (DNUMBER,
DLOCATION) - Best theoretical solution
- Project off bad attributes and store in a new
relation with the key
20(No Transcript)
212NF Second Normal Form
- Full functional dependency - a Full FD or FFD Y ?
Z is an FD where removal of any attribute from Y
means the FD does not hold any more - SSN, PNUMBER ? HOURS is a full FD since neither
SSN ? HOURS nor PNUMBER ? HOURS hold - SSN, PNUMBER ? ENAME is not a full FD (it is
called a partial dependency) since SSN ? ENAME
also holds (an FFD) - (Defn.) A relation schema R is in second normal
form (2NF) - If it in 1NF and
- If every non-prime attribute A in R is fully
functionally dependent on every key of R (i.e. No
partial dependencies allowed for nonprime
attributes) - R can be decomposed into 2NF relations via the
process of 2NF normalization
22Non-prime attributes are only associated with the
part of the key on which they are fully
functional dependent
23Assume COUNTRY_NAME, LOT is another candidate
key