Title: Schema Normalization, Concluded
1Schema Normalization, Concluded
- Zachary G. Ives
- University of Pennsylvania
- CIS 550 Database Information Systems
- October 11, 2005
Some slide content courtesy of Susan Davidson
Raghu Ramakrishnan
2Announcements
- Decide on 3-person project groups by 1 week from
Thursday (10/20) - Homework 2 answers posted on Web
- Homework 3 due Thursday
- No class next Tuesday (Fall Break)
- Midterm Thursday 10/20
3Not All Designs are Equally Good
- Why is this a poor schema design?
- And why is this one better?
Stuff(sid, name, serno, subj, cid, exp-grade)
Student(sid, name) Course(serno,
cid) Subject(cid, subj) Takes(sid, serno,
exp-grade)
4Functional DependenciesDescribe Key-Like
Relationships
- A key is a set of attributes where
- If keys match, then the tuples match
- A functional dependency (FD) is a generalization
- If an attribute set determines another, written X
! Ythen if two tuples agree on attribute set X,
they must agree on Xsid ! name - What other FDs are there in this data?
- FDs are independent of our schema design choice
5Formal Definition of FDs
- Def. Given a relation schema R and subsets X, Y
of R - An instance r of R satisfies FD X ? Y if, for
any two tuples t1, t2 2 r, t1X t2X
implies t1Y t2Y - For an FD to hold for schema R, it must hold for
every possible instance of r - (Can a DBMS verify this? Can we determine this
by looking at an instance?)
6General Thoughts on Good Schemas
- We want all attributes in every tuple to be
determined by the tuples key attributes, i.e.
part of a superkey (for key X ? Y, a superkey is
a non-minimal X) - What does this say about redundancy?
- But
- What about tuples that dont have keys (other
than the entire value)? - What about the fact that every attribute
determines itself?
7Armstrongs Axioms Inferring FDs
- Some FDs exist due to others can compute using
Armstrongs axioms - Reflexivity If Y ? X then X ? Y (trivial
dependencies) - name, sid ? name
- Augmentation If X ? Y then XW ? YW
- serno ? subj so serno, exp-grade ? subj,
exp-grade - Transitivity If X ? Y and Y ? Z then X ? Z
- serno ? cid and cid ? subj
- so serno ? subj
-
8Armstrongs Axioms Lead to
- Union If X ? Y and X ? Z then X ? YZ
- Pseudotransitivity If X ? Y and WY ? Z
then XW ? Z - Decomposition If X ? Y and Z ? Y then
X ? Z - Lets prove a few of these from Armstrongs Axioms
9Closure of a Set of FDs
- Defn. Let F be a set of FDs. Its closure, F,
is the set of all FDs - X ? Y X ? Y is derivable from F by
Armstrongs Axioms - Which of the following are in the closure of our
Student-Course FDs? - name ? name
- cid ? subj
- serno ? subj
- cid, sid ? subj
- cid ? sid
10Attribute Closures Is SomethingDependent on X?
- Defn. The closure of an attribute set X, X, is
- X ? Y X ? Y ? F
- This answers the question is Y determined
(transitively) by X? compute X by - Does sid, serno ? subj, exp-grade?
closure X repeat until no change if there
is an FD U ? V in F such that U is
in closure then add V to closure
11Equivalence of FD sets
- Defn. Two sets of FDs, F and G, are equivalent
if their closures are equivalent, F G - e.g., these two sets are equivalent
- XY ? Z, X ? Y and
- X ? Z, X ? Y
- F contains a huge number of FDs (exponential
in the size of the schema) - Would like to have smallest representative FD
set
12Minimal Cover
we expresseach FD insimplest form
- Defn. A FD set F is minimal if
- 1. Every FD in F is of the form X ? A, where A
is a single attribute - 2. For no X ? A in F is F X ? A equivalent
to F - 3. For no X ? A in F and Z ? X is
- F X ? A ? Z ? A equivalent to F
- Defn. F is a minimum cover for G if F is minimal
and is equivalent to G. - e.g.,
- X ? Z, X ? Y is a minimal cover for
- XY ? Z, X ? Z, X ? Y
in a sense, each FD is essential to the cover
13More on Closures
- If F is a set of FDs and X ? Y ? F then for
some attribute A ? Y, X ? A ? F - Proof by counterexample.
- Assume otherwise and let Y A1,..., An Since
we assume X ? A1, ..., X ? An are in F - then X ? A1 ... An is in F by union rule,
- hence, X ? Y is in F which is a contradiction
14Why Armstrongs Axioms?
- Why are Armstrongs axioms (or an equivalent rule
set) appropriate for FDs? They are - Consistent any relation satisfying FDs in F
will satisfy those in F - Complete if an FD X ? Y cannot be derived by
Armstrongs axioms from F, then there exists some
relational instance satisfying F but not X ? Y - In other words, Armstrongs axioms derive all the
FDs that should hold - What is the goal of using these axioms?
15Decomposition
- Consider our original bad attribute set
- We could decompose it into
- But this decomposition loses information about
the relationship between students and courses.
Why?
Stuff(sid, name, serno, subj, cid, exp-grade)
Student(sid, name) Course(serno,
cid) Subject(cid, subj)
16Lossless Join Decomposition
- R1, Rk is a lossless join decomposition of R
w.r.t. an FD set F if for every instance r of R
that satisfies F, - ÕR1(r) ? ... ? ÕRk(r) r
- Consider
- What if we decompose on (sid, name) and (serno,
subj, cid, exp-grade)?
sid name serno subj cid exp-grade
1 Sam 570103 AI 570 B
23 Nitin 550103 DB 550 A
17Testing for Lossless Join
- R1, R2 is a lossless join decomposition of R with
respect to F iff at least one of the following
dependencies is in F - (R1 ? R2) ? R1 R2
- (R1 ? R2) ? R2 R1
- So for the FD set
- sid ? name
- serno ? cid, exp-grade
- cid ? subj
- Is (sid, name) and (serno, subj, cid, exp-grade)
a lossless decomposition?
18Dependency Preservation
- Ensures we can check whether a FD X ? Y is
violated during DB updates, without using a join - FZ, the projection of FD set F onto attribute set
Z, is - X ? Y X ? Y ? F , X ? Y Í Z
- i.e., it is those FDs only applicable to Zs
attributes - A decomposition R1, , Rk is dependency
preserving if F (FR1 ?...? FRk) (note we
need an extra closure!) - We dont lose the ability to test the cover of
our FDs in a single table, just because we
decompose
19Example 1
- For Schema R(sid, name, serno, cid, subj,
exp-grade) and FD set - sid ? name serno ? cid
- cid ? subj sid, serno ? exp-grade
- Is R1(sid, name) and R2(serno, subj, cid,
exp-grade) - A lossless decomposition?
- Is it dependency-preserving?
- How about R1(sid, name) and R2(sid, serno, subj,
cid, exp-grade)?
20Example 2
- Given schema R(name, street, city, st, zip, item,
price), - FD set name ? street, city street, city ? st
- street, city ? zip name, item ? price
- and decomposition
- R1(name, street, city, st, zip) and R2(name,
item, price) - Is it lossless?
- Is it dependency preserving?
- What if we replaced the first FD with name,
street ? city?
21A More Disturbing Example
- Given schema R(sid, fid, subj)
- and FD set fid ? subj sid, subj ? fid
- Consider the decomposition
- R1(sid, fid) and R2(fid, subj)
- Is it lossless?
- Is it dependency preserving?
- If it isnt, can you think of a decomposition
that is? Can you do this non-redundantly?
22Redundancy vs. FDs
- Ideally, we want a design s.t. for each
nontrivial dependency X ? Y, X is a superkey for
some relation schema in R - We just saw that this isnt always possible in a
non-redundant way - Thus we have two kinds of normal forms,
Boyce-Codd and Third Normal Form
23Two Important Normal Forms
- Boyce-Codd Normal Form (BCNF). For every
relation scheme R and for every X ? A that holds
over R, - either A ? X (it is trivial) ,or
- or X is a superkey for R
- Third Normal Form (3NF). For every relation
scheme R and for every X ? A that holds over R, - either A ? X (it is trivial), or
- X is a superkey for R, or
- A is a member of some key for R
24Normal Forms Compared
- BCNF is preferable, but sometimes in conflict
with the goal of dependency preservation - Its strictly stronger than 3NF
- Lets see algorithms to obtain
- A BCNF lossless join decomposition
(nondeterministic) - A 3NF lossless join, dependency preserving
decomposition
25BCNF Decomposition Algorithm(from Korth et al.
our book gives a recursive version)
result R compute F while there is a
relation schema Ri in result that isnt in
BCNF let A ? B be a nontrivial FD on Ri s.t.
A ? Ri is not in F and A and B are
disjoint result (result Ri) ? (Ri - B),
(A,B)
263NF Decomposition Algorithm
Let F be a minimal cover i0 for each FD A ? B
in F if none of the schemas Rj, 1? j ? i,
contains AB increment i
Ri (A, B) if no schema Rj, 1 ? j ? i
contains a candidate key for R
increment i Ri any candidate key
for R return (R1, , Ri)
Build dep.-preserving decomp.
Ensurelosslessdecomp.
27Summary of Normalization
- We can always decompose into 3NF and get
- Lossless join
- Dependency preservation
- But with BCNF we are only guaranteed lossless
joins - BCNF is stronger than 3NF every BCNF schema is
also in 3NF - The BCNF algorithm is nondeterministic, so there
is not a unique decomposition for a given schema R
28XML A Semi-Structured Data Model
29Why XML?
- XML is the confluence of several factors
- The Web needed a more declarative format for data
- Documents needed a mechanism for extended tags
- Database people needed a more flexible
interchange format - Lingua franca of data
- Its parsable even if we dont know what it
means! - Original expectation
- The whole web would go to XML instead of HTML
- Todays reality
- Not so But XML is used all over under the
covers
30Why DB People Like XML
- Can get data from all sorts of sources
- Allows us to touch data we dont own!
- This was actually a huge change in the DB
community - Interesting relationships with DB techniques
- Useful to do relational-style operations
- Leverages ideas from object-oriented,
semistructured data - Blends schema and data into one format
- Unlike relational model, where we need schema
first - But too little schema can be a drawback, too!
31XML Anatomy
Processing Instr.
- lt?xml version"1.0" encoding"ISO-8859-1" ?gt
- ltdblpgt
- ltmastersthesis mdate"2002-01-03"
key"ms/Brown92"gt - ltauthorgtKurt P. Brownlt/authorgt
- lttitlegtPRPL A Database Workload
Specification Languagelt/titlegt - ltyeargt1992lt/yeargt
- ltschoolgtUniv. of Wisconsin-Madisonlt/schoolgt
- lt/mastersthesisgt
- ltarticle mdate"2002-01-03" key"tr/dec/SRC1997-
018"gt - lteditorgtPaul R. McJoneslt/editorgt
- lttitlegtThe 1995 SQL Reunionlt/titlegt
- ltjournalgtDigital System Research Center
Reportlt/journalgt - ltvolumegtSRC1997-018lt/volumegt
- ltyeargt1997lt/yeargt
- lteegtdb/labs/dec/SRC1997-018.htmllt/eegt
- lteegthttp//www.mcjones.org/System_R/SQL_Reunio
n_95/lt/eegt - lt/articlegt
Open-tag
Element
Attribute
Close-tag
32Well-Formed XML
- A legal XML document fully parsable by an XML
parser - All open-tags have matching close-tags (unlike so
many HTML documents!), or a special - lttag/gt shortcut for empty tags (equivalent to
lttaggtlt/taggt - Attributes (which are unordered, in contrast to
elements) only appear once in an element - Theres a single root element
- XML is case-sensitive