C20'0046: Database Management Systems Lecture - PowerPoint PPT Presentation

1 / 28
About This Presentation
Title:

C20'0046: Database Management Systems Lecture

Description:

AB = ABCD, AC = AC, AD = ABCD, BC = BC, BD = BD, CD = CD ... Q: What are the keys? Q: What are the ... Define FDs (and keys) for them based on real world ... – PowerPoint PPT presentation

Number of Views:63
Avg rating:3.0/5.0
Slides: 29
Provided by: pagesSt
Category:

less

Transcript and Presenter's Notes

Title: C20'0046: Database Management Systems Lecture


1
C20.0046 Database Management SystemsLecture 7
  • Matthew P. Johnson
  • Stern School of Business, NYU
  • Spring, 2005

2
Agenda
  • Last time FDs
  • This time
  • Combining FDs
  • Anomalies
  • Normalization
  • Next time Relational Algebra
  • Now review

3
Closure Algorithm
Start with XA1, , An. Repeat if
B1, , Bn ? C is a FD and B1, ,
Bn are all in X then add C to X. until X
didnt change
Example
name ? color category ? department color,
category ? price
name, category name, category,
color, department, price
4
Example
In class
A, B ? C A, D ? E B ? D A, F ? B
R(A,B,C,D,E,F)
Compute A,B X A, B,
Compute A, F X A, F,

5
More definitions
  • Given FDs versus Derived FDs
  • Given FDs stated initially for the relation
  • Derived FDs those that are inferred using the
    rules or through closure
  • Basis A set of FDs from which we can infer all
    other FDs for the relation
  • Minimal basis a minimal set of FDs (Cant
    discard any of them)
  • No proper subset of the minimal basis can derive
    the complete set of FDs

6
Problem find all FDs
  • Given a relation instance and set of given FDs
  • Find all FDs satisfied by that instance
  • Useful if we dont get enough information from
    our users need to reverse engineer a data
    instance
  • Q How long does this take?
  • A Some time for each subset of atts
  • Q How many subsets?
  • powerset
  • ? exponential time in worst-case
  • But can often be smarter

7
Using Closure to Infer ALL FDs
A, B ? CA, D ? B B ? D
Example
Compute X, for every set X (AB is shorthand for
A,B)
A A, B BD, C C, D D AB ABCD,
AC AC, AD ABCD, BC BC, BD BD, CD
CD ABC ABD ACD ABCD (no need to
computewhy?) BCD BCD, ABCD ABCD
8
Example
  • R(A,B,C)
  • Given FDs AB?C, BC?A
  • Q What are the FDs?
  • Closure of singleton sets
  • Closure of doubletons
  • Q What are the keys?
  • Q What are the minimal FD bases?

9
Computing Keys
  • Rephrasing of definition of key
  • X is a key if
  • X all attributes
  • X is minimal
  • Note there can be many minimal keys !
  • Example R(A,B,C), AB?C, BC?AMinimal keys AB
    and BC

10
Examples of Keys
  • Product(name, price, category, color)
  • name, category ? price
  • category ? color
  • FDs are
  • Keys are name, category
  • Enrollment(student, address, course, room, time)
  • student ? address
  • room, time ? course
  • student, course ? room, time
  • FDs are
  • Keys are

11
Next topic Anomalies
  • Identify anomalies in existing schema
  • How to decompose a relation
  • Boyce-Codd Normal Form (BCNF)
  • Recovering information from a decomposition
  • Third Normal Form

12
Types of anomalies
  • Redundancy
  • Repeat info unnecessarily in several tuples
  • Update anomalies
  • Change info in one tuple but not in another
  • Deletion anomalies
  • Delete some values lose other values too
  • Insert anomalies
  • Inserting row means having to insert other,
    separate info / null-ing it out

13
Example of anomalies
SSN ? Name, Mailing-address
SSN ? Phone
  • Redundancy name, maddress
  • Update anomaly Bill moves
  • Delete anom. Bill doesnt pay bills, lose phones
    ? lose Bill!
  • Insert anom cant insert someone without a
    (non-null) phone
  • Underlying cause SSN-phone is many-many
  • Effect partial dependency ssn ? name, maddress,
  • Whereas key ssn,phone

14
Decomposition by projection
  • Soln replace anomalous R with projections of R
    onto two subsets of attributes
  • Projection an operation in Relational Algebra
  • Corresponds to SELECT command in SQL
  • Projecting R onto attributes (A1,,An) means
    removing all other attributes
  • Result of projection is another relation
  • Yields tuples whose fields are A1,,An
  • Resulting duplicates ignored

15
Projection for decomposition
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1 projection of R on A1, ..., An, B1, ..., Bm
R2 projection of R on A1, ..., An, C1, ..., Cp
A1, ..., An ? B1, ..., Bm ? C1, ..., Cp all
attributes, usually disjoint sets R1 and R2 may
(/not) be reassembled to produce original R
16
Decomposition example
Break the relation into two
  • The anomalies are gone
  • No more redundant data
  • Easy to for Bill to move
  • Okay for Bill to lose all phones

17
Thus high-level strategy
Conceptual Model
18
Using FDs to produce good schemas
  • Start with set of relations
  • Define FDs (and keys) for them based on real
    world
  • Transform your relations to normal form
    (normalize them)
  • Do this using decomposition
  • Intuitively, good design means
  • No anomalies
  • Can reconstruct all (and only the) original
    information

19
Decomposition terminology
  • Projection eliminating certain attributes from
    relation
  • Decomposition separating a relation into two by
    projection
  • Join (re)assembling two relations
  • Whenever a row from R1 and a row from R2 have the
    same value for some atts A, join together to form
    a row of R3
  • If exactly the original rows are reproduced by
    joining the relations, then the decomposition was
    lossless
  • We join on the attributes R1 and R2 have in
    common (As)
  • If it cant, the decomposition was lossy

20
Lossless Decompositions
Lossless Decompositions
  • A decomposition is lossless if we can recover
  • R(A,B,C)
  • R1(A,B) R2(A,C)
  • R(A,B,C) should be the same
    as R(A,B,C)

Decompose
Recover
R is in general larger than R. Must ensure R
R
21
Lossless decomposition
  • Sometimes the same set of data is reproduced
  • (Word, 100) (Word, WP) ? (Word, 100, WP)
  • (Oracle, 1000) (Oracle, DB) ? (Oracle, 1000,
    DB)
  • (Access, 100) (Access, DB) ? (Access, 100, DB)

22
Lossy decomposition
  • Sometimes its not
  • (Word, WP) (100, WP) ? (Word, 100, WP)
  • (Oracle, DB) (1000, DB) ? (Oracle, 1000, DB)
  • (Oracle, DB) (100, DB) ? (Oracle, 100, DB)
  • (Access, DB) (1000, DB) ? (Access, 1000, DB)
  • (Access, DB) (100, DB) ? (Access, 100, DB)

Whatswrong?
23
Ensuring lossless decomposition
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm)
R2(A1, ..., An, C1, ..., Cp)
If A1, ..., An ? B1, ..., Bm or A1, ..., An ?
C1, ..., Cp Then the decomposition is lossless
Note dont need both
  • Examples
  • name ? price, so first decomposition was lossless
  • category ? name and category ? price, and so
    second decomposition was lossy

24
Quick lossless/lossy example
  • At a glance can we decompose into R1(Y,X),
    R2(Y,Z)?
  • At a glance can we decompose into R1(X,Y),
    R2(X,Z)?

25
Next topic Normal Forms
  • First Normal Form all attributes are atomic
  • As opposed to set-valued
  • Assumed all along
  • Second Normal Form (2NF)
  • Third Normal Form (3NF)
  • Boyce Codd Normal Form (BCNF)
  • Fourth Normal Form (4NF)
  • Fifth Normal Form (5NF)

26
Most important BCNF
A simple condition for removing anomalies from
relations
A relation R is in BCNF if If As ? Bs is a
non-trivial dependency in R , then As is a
superkey for R
I.e. The left side must always contain a
key I.e If a set of attributes determines other
attributes, it must determine all the attributes
  • C Ted Codd, IBM researcher, inventor of
    relational model, 1970
  • B Ray Boyce, IBM researcher, helped develop SQL
    in the 1970s

27
BCNF decomposition algorithm
Repeat choose A1, , Am ? B1, , Bn that
violates the BNCF condition split R into
R1(A1, , Am, B1, , Bn) and R2(A1, , Am,
others) continue with both R1 and R2Until
no more violations
//Heuristic choose Bs as large as possible
28
Boyce-Codd Normal Form
  • Name/phone example is not BCNF
  • ssn,phone is key
  • FD ssn ? name,mailing-address holds
  • Violates BCNF ssn is not a superkey
  • Its decomposition is BCNF
  • Only superkeys ? anything else
Write a Comment
User Comments (0)
About PowerShow.com