Title: C20'0046: Database Management Systems Lecture
1C20.0046 Database Management SystemsLecture 7
- Matthew P. Johnson
- Stern School of Business, NYU
- Spring, 2005
2Agenda
- Last time FDs
- This time
- Combining FDs
- Anomalies
- Normalization
- Next time Relational Algebra
- Now review
3Closure Algorithm
Start with XA1, , An. Repeat if
B1, , Bn ? C is a FD and B1, ,
Bn are all in X then add C to X. until X
didnt change
Example
name ? color category ? department color,
category ? price
name, category name, category,
color, department, price
4Example
In class
A, B ? C A, D ? E B ? D A, F ? B
R(A,B,C,D,E,F)
Compute A,B X A, B,
Compute A, F X A, F,
5More definitions
- Given FDs versus Derived FDs
- Given FDs stated initially for the relation
- Derived FDs those that are inferred using the
rules or through closure - Basis A set of FDs from which we can infer all
other FDs for the relation - Minimal basis a minimal set of FDs (Cant
discard any of them) - No proper subset of the minimal basis can derive
the complete set of FDs
6Problem find all FDs
- Given a relation instance and set of given FDs
- Find all FDs satisfied by that instance
- Useful if we dont get enough information from
our users need to reverse engineer a data
instance - Q How long does this take?
- A Some time for each subset of atts
- Q How many subsets?
- powerset
- ? exponential time in worst-case
- But can often be smarter
7Using Closure to Infer ALL FDs
A, B ? CA, D ? B B ? D
Example
Compute X, for every set X (AB is shorthand for
A,B)
A A, B BD, C C, D D AB ABCD,
AC AC, AD ABCD, BC BC, BD BD, CD
CD ABC ABD ACD ABCD (no need to
computewhy?) BCD BCD, ABCD ABCD
8Example
- R(A,B,C)
- Given FDs AB?C, BC?A
- Q What are the FDs?
- Closure of singleton sets
- Closure of doubletons
- Q What are the keys?
- Q What are the minimal FD bases?
9Computing Keys
- Rephrasing of definition of key
- X is a key if
- X all attributes
- X is minimal
- Note there can be many minimal keys !
- Example R(A,B,C), AB?C, BC?AMinimal keys AB
and BC
10Examples of Keys
- Product(name, price, category, color)
- name, category ? price
- category ? color
- FDs are
- Keys are name, category
- Enrollment(student, address, course, room, time)
- student ? address
- room, time ? course
- student, course ? room, time
- FDs are
- Keys are
11Next topic Anomalies
- Identify anomalies in existing schema
- How to decompose a relation
- Boyce-Codd Normal Form (BCNF)
- Recovering information from a decomposition
- Third Normal Form
12Types of anomalies
- Redundancy
- Repeat info unnecessarily in several tuples
- Update anomalies
- Change info in one tuple but not in another
- Deletion anomalies
- Delete some values lose other values too
- Insert anomalies
- Inserting row means having to insert other,
separate info / null-ing it out
13Example of anomalies
SSN ? Name, Mailing-address
SSN ? Phone
- Redundancy name, maddress
- Update anomaly Bill moves
- Delete anom. Bill doesnt pay bills, lose phones
? lose Bill! - Insert anom cant insert someone without a
(non-null) phone - Underlying cause SSN-phone is many-many
- Effect partial dependency ssn ? name, maddress,
- Whereas key ssn,phone
14Decomposition by projection
- Soln replace anomalous R with projections of R
onto two subsets of attributes - Projection an operation in Relational Algebra
- Corresponds to SELECT command in SQL
- Projecting R onto attributes (A1,,An) means
removing all other attributes - Result of projection is another relation
- Yields tuples whose fields are A1,,An
- Resulting duplicates ignored
15Projection for decomposition
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1 projection of R on A1, ..., An, B1, ..., Bm
R2 projection of R on A1, ..., An, C1, ..., Cp
A1, ..., An ? B1, ..., Bm ? C1, ..., Cp all
attributes, usually disjoint sets R1 and R2 may
(/not) be reassembled to produce original R
16Decomposition example
Break the relation into two
- The anomalies are gone
- No more redundant data
- Easy to for Bill to move
- Okay for Bill to lose all phones
17Thus high-level strategy
Conceptual Model
18Using FDs to produce good schemas
- Start with set of relations
- Define FDs (and keys) for them based on real
world - Transform your relations to normal form
(normalize them) - Do this using decomposition
- Intuitively, good design means
- No anomalies
- Can reconstruct all (and only the) original
information
19Decomposition terminology
- Projection eliminating certain attributes from
relation - Decomposition separating a relation into two by
projection - Join (re)assembling two relations
- Whenever a row from R1 and a row from R2 have the
same value for some atts A, join together to form
a row of R3 - If exactly the original rows are reproduced by
joining the relations, then the decomposition was
lossless - We join on the attributes R1 and R2 have in
common (As) - If it cant, the decomposition was lossy
20Lossless Decompositions
Lossless Decompositions
- A decomposition is lossless if we can recover
- R(A,B,C)
- R1(A,B) R2(A,C)
- R(A,B,C) should be the same
as R(A,B,C)
Decompose
Recover
R is in general larger than R. Must ensure R
R
21Lossless decomposition
- Sometimes the same set of data is reproduced
- (Word, 100) (Word, WP) ? (Word, 100, WP)
- (Oracle, 1000) (Oracle, DB) ? (Oracle, 1000,
DB) - (Access, 100) (Access, DB) ? (Access, 100, DB)
22Lossy decomposition
- Sometimes its not
- (Word, WP) (100, WP) ? (Word, 100, WP)
- (Oracle, DB) (1000, DB) ? (Oracle, 1000, DB)
- (Oracle, DB) (100, DB) ? (Oracle, 100, DB)
- (Access, DB) (1000, DB) ? (Access, 1000, DB)
- (Access, DB) (100, DB) ? (Access, 100, DB)
Whatswrong?
23Ensuring lossless decomposition
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)
If A1, ..., An ? B1, ..., Bm or A1, ..., An ?
C1, ..., Cp Then the decomposition is lossless
Note dont need both
- Examples
- name ? price, so first decomposition was lossless
- category ? name and category ? price, and so
second decomposition was lossy
24Quick lossless/lossy example
- At a glance can we decompose into R1(Y,X),
R2(Y,Z)? - At a glance can we decompose into R1(X,Y),
R2(X,Z)?
25Next topic Normal Forms
- First Normal Form all attributes are atomic
- As opposed to set-valued
- Assumed all along
- Second Normal Form (2NF)
- Third Normal Form (3NF)
- Boyce Codd Normal Form (BCNF)
- Fourth Normal Form (4NF)
- Fifth Normal Form (5NF)
26Most important BCNF
A simple condition for removing anomalies from
relations
A relation R is in BCNF if If As ? Bs is a
non-trivial dependency in R , then As is a
superkey for R
I.e. The left side must always contain a
key I.e If a set of attributes determines other
attributes, it must determine all the attributes
- C Ted Codd, IBM researcher, inventor of
relational model, 1970 - B Ray Boyce, IBM researcher, helped develop SQL
in the 1970s
27BCNF decomposition algorithm
Repeat choose A1, , Am ? B1, , Bn that
violates the BNCF condition split R into
R1(A1, , Am, B1, , Bn) and R2(A1, , Am,
others) continue with both R1 and R2Until
no more violations
//Heuristic choose Bs as large as possible
28Boyce-Codd Normal Form
- Name/phone example is not BCNF
- ssn,phone is key
- FD ssn ? name,mailing-address holds
- Violates BCNF ssn is not a superkey
- Its decomposition is BCNF
- Only superkeys ? anything else