Schema Normalization, Concluded - PowerPoint PPT Presentation

About This Presentation

Title:

Schema Normalization, Concluded

Description:

Homework 2 answers posted on Web. Homework 3 due Thursday. No class next Tuesday (Fall Break) ... This answers the question 'is Y determined (transitively) by X? ... – PowerPoint PPT presentation

Number of Views:32

Avg rating:3.0/5.0

Slides: 33

Provided by: zack4

Category:

more less

Transcript and Presenter's Notes

Title: Schema Normalization, Concluded

1
Schema Normalization, Concluded

Zachary G. Ives
University of Pennsylvania
CIS 550 Database Information Systems
October 11, 2005

Some slide content courtesy of Susan Davidson
Raghu Ramakrishnan
2
Announcements

Decide on 3-person project groups by 1 week from
Thursday (10/20)
Homework 2 answers posted on Web
Homework 3 due Thursday
No class next Tuesday (Fall Break)
Midterm Thursday 10/20

3
Not All Designs are Equally Good

Why is this a poor schema design?
And why is this one better?

Stuff(sid, name, serno, subj, cid, exp-grade)
Student(sid, name) Course(serno,
cid) Subject(cid, subj) Takes(sid, serno,
exp-grade)
4
Functional DependenciesDescribe Key-Like
Relationships

A key is a set of attributes where
If keys match, then the tuples match
A functional dependency (FD) is a generalization
If an attribute set determines another, written X
! Ythen if two tuples agree on attribute set X,
they must agree on Xsid ! name
What other FDs are there in this data?
FDs are independent of our schema design choice

5
Formal Definition of FDs

Def. Given a relation schema R and subsets X, Y
of R
An instance r of R satisfies FD X ? Y if, for
any two tuples t1, t2 2 r, t1X t2X
implies t1Y t2Y
For an FD to hold for schema R, it must hold for
every possible instance of r
(Can a DBMS verify this? Can we determine this
by looking at an instance?)

6
General Thoughts on Good Schemas

We want all attributes in every tuple to be
determined by the tuples key attributes, i.e.
part of a superkey (for key X ? Y, a superkey is
a non-minimal X)
What does this say about redundancy?
But
What about tuples that dont have keys (other
than the entire value)?
What about the fact that every attribute
determines itself?

7
Armstrongs Axioms Inferring FDs

Some FDs exist due to others can compute using
Armstrongs axioms
Reflexivity If Y ? X then X ? Y (trivial
dependencies)
name, sid ? name
Augmentation If X ? Y then XW ? YW
serno ? subj so serno, exp-grade ? subj,
exp-grade
Transitivity If X ? Y and Y ? Z then X ? Z
serno ? cid and cid ? subj
so serno ? subj

8
Armstrongs Axioms Lead to

Union If X ? Y and X ? Z then X ? YZ
Pseudotransitivity If X ? Y and WY ? Z
then XW ? Z
Decomposition If X ? Y and Z ? Y then
X ? Z
Lets prove a few of these from Armstrongs Axioms

9
Closure of a Set of FDs

Defn. Let F be a set of FDs. Its closure, F,
is the set of all FDs
X ? Y X ? Y is derivable from F by
Armstrongs Axioms
Which of the following are in the closure of our
Student-Course FDs?
name ? name
cid ? subj
serno ? subj
cid, sid ? subj
cid ? sid

10
Attribute Closures Is SomethingDependent on X?

Defn. The closure of an attribute set X, X, is
X ? Y X ? Y ? F
This answers the question is Y determined
(transitively) by X? compute X by
Does sid, serno ? subj, exp-grade?

closure X repeat until no change if there
is an FD U ? V in F such that U is
in closure then add V to closure
11
Equivalence of FD sets

Defn. Two sets of FDs, F and G, are equivalent
if their closures are equivalent, F G
e.g., these two sets are equivalent
XY ? Z, X ? Y and
X ? Z, X ? Y
F contains a huge number of FDs (exponential
in the size of the schema)
Would like to have smallest representative FD
set

12
Minimal Cover
we expresseach FD insimplest form

Defn. A FD set F is minimal if
1. Every FD in F is of the form X ? A, where A
is a single attribute
2. For no X ? A in F is F X ? A equivalent
to F
3. For no X ? A in F and Z ? X is
F X ? A ? Z ? A equivalent to F
Defn. F is a minimum cover for G if F is minimal
and is equivalent to G.
e.g.,
X ? Z, X ? Y is a minimal cover for
XY ? Z, X ? Z, X ? Y

in a sense, each FD is essential to the cover
13
More on Closures

If F is a set of FDs and X ? Y ? F then for
some attribute A ? Y, X ? A ? F
Proof by counterexample.
Assume otherwise and let Y A1,..., An Since
we assume X ? A1, ..., X ? An are in F
then X ? A1 ... An is in F by union rule,
hence, X ? Y is in F which is a contradiction

14
Why Armstrongs Axioms?

Why are Armstrongs axioms (or an equivalent rule
set) appropriate for FDs? They are
Consistent any relation satisfying FDs in F
will satisfy those in F
Complete if an FD X ? Y cannot be derived by
Armstrongs axioms from F, then there exists some
relational instance satisfying F but not X ? Y
In other words, Armstrongs axioms derive all the
FDs that should hold
What is the goal of using these axioms?

15
Decomposition

Consider our original bad attribute set
We could decompose it into
But this decomposition loses information about
the relationship between students and courses.
Why?

Stuff(sid, name, serno, subj, cid, exp-grade)
Student(sid, name) Course(serno,
cid) Subject(cid, subj)
16
Lossless Join Decomposition

R1, Rk is a lossless join decomposition of R
w.r.t. an FD set F if for every instance r of R
that satisfies F,
ÕR1(r) ? ... ? ÕRk(r) r
Consider
What if we decompose on (sid, name) and (serno,
subj, cid, exp-grade)?

sid name serno subj cid exp-grade
1 Sam 570103 AI 570 B
23 Nitin 550103 DB 550 A
17
Testing for Lossless Join

R1, R2 is a lossless join decomposition of R with
respect to F iff at least one of the following
dependencies is in F
(R1 ? R2) ? R1 R2
(R1 ? R2) ? R2 R1
So for the FD set
sid ? name
serno ? cid, exp-grade
cid ? subj
Is (sid, name) and (serno, subj, cid, exp-grade)
a lossless decomposition?

18
Dependency Preservation

Ensures we can check whether a FD X ? Y is
violated during DB updates, without using a join
FZ, the projection of FD set F onto attribute set
Z, is
X ? Y X ? Y ? F , X ? Y Í Z
i.e., it is those FDs only applicable to Zs
attributes
A decomposition R1, , Rk is dependency
preserving if F (FR1 ?...? FRk) (note we
need an extra closure!)
We dont lose the ability to test the cover of
our FDs in a single table, just because we
decompose

19
Example 1

For Schema R(sid, name, serno, cid, subj,
exp-grade) and FD set
sid ? name serno ? cid
cid ? subj sid, serno ? exp-grade
Is R1(sid, name) and R2(serno, subj, cid,
exp-grade)
A lossless decomposition?
Is it dependency-preserving?
How about R1(sid, name) and R2(sid, serno, subj,
cid, exp-grade)?

20
Example 2

Given schema R(name, street, city, st, zip, item,
price),
FD set name ? street, city street, city ? st
street, city ? zip name, item ? price
and decomposition
R1(name, street, city, st, zip) and R2(name,
item, price)
Is it lossless?
Is it dependency preserving?
What if we replaced the first FD with name,
street ? city?

21
A More Disturbing Example

Given schema R(sid, fid, subj)
and FD set fid ? subj sid, subj ? fid
Consider the decomposition
R1(sid, fid) and R2(fid, subj)
Is it lossless?
Is it dependency preserving?
If it isnt, can you think of a decomposition
that is? Can you do this non-redundantly?

22
Redundancy vs. FDs

Ideally, we want a design s.t. for each
nontrivial dependency X ? Y, X is a superkey for
some relation schema in R
We just saw that this isnt always possible in a
non-redundant way
Thus we have two kinds of normal forms,
Boyce-Codd and Third Normal Form

23
Two Important Normal Forms

Boyce-Codd Normal Form (BCNF). For every
relation scheme R and for every X ? A that holds
over R,
either A ? X (it is trivial) ,or
or X is a superkey for R
Third Normal Form (3NF). For every relation
scheme R and for every X ? A that holds over R,
either A ? X (it is trivial), or
X is a superkey for R, or
A is a member of some key for R

24
Normal Forms Compared

BCNF is preferable, but sometimes in conflict
with the goal of dependency preservation
Its strictly stronger than 3NF
Lets see algorithms to obtain
A BCNF lossless join decomposition
(nondeterministic)
A 3NF lossless join, dependency preserving
decomposition

25
BCNF Decomposition Algorithm(from Korth et al.
our book gives a recursive version)
result R compute F while there is a
relation schema Ri in result that isnt in
BCNF let A ? B be a nontrivial FD on Ri s.t.
A ? Ri is not in F and A and B are
disjoint result (result Ri) ? (Ri - B),
(A,B)
26
3NF Decomposition Algorithm
Let F be a minimal cover i0 for each FD A ? B
in F if none of the schemas Rj, 1? j ? i,
contains AB increment i
Ri (A, B) if no schema Rj, 1 ? j ? i
contains a candidate key for R
increment i Ri any candidate key
for R return (R1, , Ri)
Build dep.-preserving decomp.
Ensurelosslessdecomp.
27
Summary of Normalization

We can always decompose into 3NF and get
Lossless join
Dependency preservation
But with BCNF we are only guaranteed lossless
joins
BCNF is stronger than 3NF every BCNF schema is
also in 3NF
The BCNF algorithm is nondeterministic, so there
is not a unique decomposition for a given schema R

28
XML A Semi-Structured Data Model
29
Why XML?

XML is the confluence of several factors
The Web needed a more declarative format for data
Documents needed a mechanism for extended tags
Database people needed a more flexible
interchange format
Lingua franca of data
Its parsable even if we dont know what it
means!
Original expectation
The whole web would go to XML instead of HTML
Todays reality
Not so But XML is used all over under the
covers

30
Why DB People Like XML

Can get data from all sorts of sources
Allows us to touch data we dont own!
This was actually a huge change in the DB
community
Interesting relationships with DB techniques
Useful to do relational-style operations
Leverages ideas from object-oriented,
semistructured data
Blends schema and data into one format
Unlike relational model, where we need schema
first
But too little schema can be a drawback, too!

31
XML Anatomy
Processing Instr.

lt?xml version"1.0" encoding"ISO-8859-1" ?gt
ltdblpgt
ltmastersthesis mdate"2002-01-03"
key"ms/Brown92"gt
ltauthorgtKurt P. Brownlt/authorgt
lttitlegtPRPL A Database Workload
Specification Languagelt/titlegt
ltyeargt1992lt/yeargt
ltschoolgtUniv. of Wisconsin-Madisonlt/schoolgt
lt/mastersthesisgt
ltarticle mdate"2002-01-03" key"tr/dec/SRC1997-
018"gt
lteditorgtPaul R. McJoneslt/editorgt
lttitlegtThe 1995 SQL Reunionlt/titlegt
ltjournalgtDigital System Research Center
Reportlt/journalgt
ltvolumegtSRC1997-018lt/volumegt
ltyeargt1997lt/yeargt
lteegtdb/labs/dec/SRC1997-018.htmllt/eegt
lteegthttp//www.mcjones.org/System_R/SQL_Reunio
n_95/lt/eegt
lt/articlegt

Open-tag
Element
Attribute
Close-tag
32
Well-Formed XML

A legal XML document fully parsable by an XML
parser
All open-tags have matching close-tags (unlike so
many HTML documents!), or a special
lttag/gt shortcut for empty tags (equivalent to
lttaggtlt/taggt
Attributes (which are unordered, in contrast to
elements) only appear once in an element
Theres a single root element
XML is case-sensitive