Title: Functional Dependencies FDs and
1Lecture 24 (11/10/05)
- Functional Dependencies (FDs) and
- Normalization
2Assignment
- Will post up today after 600 PM
3Inference Rules for FDs
- Given a set of FDs F, we can infer additional FDs
that hold whenever the FDs in F hold - Armstrong's inference rules
- IR1. (Reflexive) If Y ? X, then X ? Y
- (Generates trivial FDs)
- E.g. SSN, ENAME ? ENAME
- IR2. (Augmentation) If X ? Y, then XZ ? YZ (Note
that XZ stands for X U Z) - E.g. SSN ? ENAME, then SSN, PNUMBER ? ENAME,
PNUMBER - IR3. (Transitive) If X ? Y and Y ? Z, then X ? Z
- E.g. SSN ? DNO and DNO ? DNAME then SSN ? DNAME
4Inference Rules for FDs
- IR1, IR2, IR3 form a sound and complete set of
inference rules - Sound ? Any rule inferred using IR1, IR2 or IR3 a
valid FD - Complete ? All possible FDs can be generated
using them - Some additional inference rules that are useful
- IR4. (Decomposition) If X ? YZ, then X ? Y and X
? Z - IR5. (Union) If X ? Y and X ? Z, then X ? YZ
- IR6. (Pseudo-transitivity) If X ? Y and WY ? Z,
then WX ? Z - Can be deduced from IR1, IR2, and IR3
(completeness property)
5Proofs - SKIP
- IR1. (Reflexive) If Y ? X, then X ? Y
- For any two tuples t1 and t2 with t1X t2X
then t1Y t2Y because Y ? X - IR2. (Augmentation) If X ? Y, then XZ ? YZ
- Proof by contradiction
- If for two tuples t1 and t2 we have
- (1) t1X t2X
- (2) t1Y t2Y
- (3) t1XZ t2XZ
- (4) t1YZ ? t2YZ
- Cant be true since from (1) and (3) we have (5)
t1Z t2Z and from (2) and (5) we have t1YZ
t2YZ which contradicts (4)
6Proofs - SKIP
- IR3. (Transitive) If X ? Y and Y ? Z, then X ? Z
- For any two tuples t1 and t2 with t1X t2X
then t1Y t2Y which implies that t1Z
t2Z hence X ? Z holds - IR4. (Decomposition) If X ? YZ, then X ? Y and X
? Z - X ? YZ
- YZ ? Y (Using IR1)
- X ? Y (Using IR3)
- Similary for X ? Z
7Proofs - SKIP
- IR5. (Union) If X ? Y and X ? Z, then X ? YZ
- X ? Y
- X ? Z
- X ? XY (Using IR1)
- XY ? YZ (Using IR2)
- X ? YZ (Using IR3)
- IR6. (Psuedotransitivity) If X ? Y and WY ? Z,
then WX ? Z - X ? Y
- WY ? Z
- WX ? WY (Using IR2)
- WX ? Z (Using IR3)
8Closure of a Set of FDs
- The closure of a set F of FDs is F which is the
set of all FDs that can be inferred from F - F SSN ? ENAME, BDATE, ADDRESS DNUMBER,
DNUMBER ? DNAME, DMGSSN - SSN ? DNAME, DMGSSN
- SSN ? SSN
- DNUMBER ? DNAME
- Most of the times it is IMPOSSIBLE to list all
FDs for a given situation - F is impossible to find sometimes
9Closure of a Set of Attributes
- Test if a certain FD is in F
- Entailed by F
- Closure of a set of attributes X with respect to
F is the set X of all attributes that are
functionally determined by X - I.e. Y such that X ? Y is true in F
- X can be calculated by repeatedly applying the
inferences using the FDs in F - READ Algorithm 10.1 p.310 in book
10Closure of a Set of Attributes
- Algorithm to compute X
- XX
- Repeat
- oldXX
- For each FD Y? Z in F do
- If X ? Y then XX U Z
- Until (X oldX) //i.e. until no change
occurs - Using the attribute closure we can determine if
any FD is in F (i.e. entailed by F)
11Example - Computing Attribute Closure
X XF A A, D, E B
B D D, E AB A,
B, C, D, E
F AB ? C A ? D D ? E AC
? B
Is AB ? E entailed by F? Is D? C entailed by
F? What is a possible key for R?
12In-class Exercise
- F
- SSN ? EName
- PNumber ? PName, PLocation
- SSN, PNumber?Hours
-
- Find SSN, PNumber, SSN, PNumber
- Key?
13Equivalence of Sets of FDs
- For two sets of FDs F and G, F is said to cover G
if every FD in G can be inferred from F - i.e., if G ? F
- Two sets of FDs F and G are equivalent if
- every FD in F can be inferred from G, and
- G covers F
- every FD in G can be inferred from F
- F covers G
- F and G are equivalent if F G
- There is an algorithm for checking equivalence of
sets of FDs
14Equivalence of Sets of FDs - SKIP
15Minimal Sets of FDs
- A set of FDs F is minimal if it satisfies the
following - (1) Every dependency in F has a single attribute
for its RHS - Minimal RHS
- (2) We cannot replace any dependency X ? A in F
with a dependency Y ? A, where Y proper subset of
X ( Y ? X) and still have a set of dependencies
that is equivalent to F - Minimal LHS
- (3) We cannot remove any dependency from F and
have a set of dependencies that is equivalent to
F - All FDs are necessary
- A minimal cover of a set of FDs E is a minimal
set of FDs that equivalent to E - Algorithm 10.2 p.311 in book (SKIP)
16Normalization of Relations
- Normalization The process of decomposing
unsatisfactory "bad" relations into smaller
relations - By breaking up their attributes
- Main purpose to achieve some desirable properties
- Minimizing redundancy or update anomalies
- A normal form (NF) is a condition that indicates
the degree to which a relation has been
normalized - The normal form of a relation refers to the
highest normal form that a relation meets
17Normalization of Relations
- 1NF, 2NF, 3NF, BCNF based on keys and FDs of a
relation schema - 4NF based on keys, multi-valued dependencies
MVDs5NF based on keys, join dependencies JDs
(Chapter 11) - On their own, NFs DO NOT guarantee good design
(in theoretical terms)! - Additional properties may be needed
- Lossless join (a must)
- No spurious tuple generation problem
- Dependency preservation (an advantage)
- After decomposition, each FD is represented some
relation
18Practical Use of Normal Forms
- Even though higher NFs means better
- Database designers need not normalize to the
highest possible normal form - Usually up to 3NF, BCNF or 4NF
- De-normalization the process of storing the join
of higher normal form relations as a base
relationwhich is in a lower normal form - For performance reasons
19Revision
- A superkey of a relation schema R A1, A2,
...., An is a set of attributes S ? R with the
property that no two tuples t1 and t2 in any
legal relation state r of R will have t1S
t2S - A key/candidate key K is a superkey with the
additional property that removal of any attribute
from K will cause K not to be a superkey any more
(i.e. minimal) - If a relation schema has more than one key
- One of the candidate keys is arbitrarily
designated to be the primary key, and the others
are called secondary keys - A prime attribute must be a member of some
candidate key - A non-prime attribute is not a prime
attributethat is, it is not a member of any
candidate key
201NF First Normal Form
- Disallows
- Multi-valued attributes
- E.g. DEPARTMENT (DNumber, .,DLOCATIONS)
- Composite attributes
- E.g. ADDRESS (Street, City, State, Zip)
- Nested relations/compound attributes
combinations of the above - An attribute PROJECT for every EMPLOYEE listing
the project number and name - EMPLOYEE (SSN, ..,PROJECT (PNUMBER, PNAME))
- (Defn.) Every attribute must be dependant on
every key - The only attributes allowed in 1NF are single
atomic (or indivisible) values - Otherwise, attributes cant depend on or cant be
determined by the key - Considered to be part of the definition of
relation
21L.H.S of FD
R.H.S of FD
Better design?
22Solutions
- DLOCATIONS multi-valued attribute not
dependent on DNUMBER (in b) - 3 Solutions
- (1) Previous slide
- Suffers from redundancy
- (2) If maximum number is known (e.g. 3) then
replace DLOCATIONS by 3 attributes Loc 1, Loc2,
Loc3 - Lot of nulls
- Difficulty in querying Find all departments in
St Cloud - (3) Remove DLOCATIONS from this relations and
create a new relation LOCATION (DNUMBER,
DLOCATION) - Best theoretical solution
- Project off bad attributes and store in a new
relation with the key
23(No Transcript)
242NF Second Normal Form
- Full functional dependency - a Full FD or FFD Y ?
Z is an FD where removal of any attribute from Y
means the FD does not hold any more - SSN, PNUMBER ? HOURS is a full FD since neither
SSN ? HOURS nor PNUMBER ? HOURS hold - SSN, PNUMBER ? ENAME is not a full FD (it is
called a partial dependency) since SSN ? ENAME
also holds (an FFD) - (Defn.) A relation schema R is in second normal
form (2NF) - If it in 1NF and
- If every non-prime attribute A in R is fully
functionally dependent on every key of R (i.e. No
partial dependencies allowed for nonprime
attributes) - R can be decomposed into 2NF relations via the
process of 2NF normalization
25Non-prime attributes are only associated with the
part of the key on which they are fully
functional dependent
26Assume COUNTRY_NAME, LOT is another candidate
key
273NF Third Normal Form
- Transitive functional dependency - a FD X?Z that
can be derived from two FDs X?Y and Y?Z - SSN ? DMGRSSN is a transitive FD since
- SSN ? DNUMBER
- DNUMBER ? DMGRSSN
- SSN ? ENAME is non-transitive since there is no
set of attributes X where SSN?X and X?ENAME