Schema Normalization, Concluded - PowerPoint PPT Presentation

About This Presentation
Title:

Schema Normalization, Concluded

Description:

Homework 2 answers posted on Web. Homework 3 due Thursday. No class next Tuesday (Fall Break) ... This answers the question 'is Y determined (transitively) by X? ... – PowerPoint PPT presentation

Number of Views:32
Avg rating:3.0/5.0
Slides: 33
Provided by: zack4
Category:

less

Transcript and Presenter's Notes

Title: Schema Normalization, Concluded


1
Schema Normalization, Concluded
  • Zachary G. Ives
  • University of Pennsylvania
  • CIS 550 Database Information Systems
  • October 11, 2005

Some slide content courtesy of Susan Davidson
Raghu Ramakrishnan
2
Announcements
  • Decide on 3-person project groups by 1 week from
    Thursday (10/20)
  • Homework 2 answers posted on Web
  • Homework 3 due Thursday
  • No class next Tuesday (Fall Break)
  • Midterm Thursday 10/20

3
Not All Designs are Equally Good
  • Why is this a poor schema design?
  • And why is this one better?

Stuff(sid, name, serno, subj, cid, exp-grade)
Student(sid, name) Course(serno,
cid) Subject(cid, subj) Takes(sid, serno,
exp-grade)
4
Functional DependenciesDescribe Key-Like
Relationships
  • A key is a set of attributes where
  • If keys match, then the tuples match
  • A functional dependency (FD) is a generalization
  • If an attribute set determines another, written X
    ! Ythen if two tuples agree on attribute set X,
    they must agree on Xsid ! name
  • What other FDs are there in this data?
  • FDs are independent of our schema design choice

5
Formal Definition of FDs
  • Def. Given a relation schema R and subsets X, Y
    of R
  • An instance r of R satisfies FD X ? Y if, for
    any two tuples t1, t2 2 r, t1X t2X
    implies t1Y t2Y
  • For an FD to hold for schema R, it must hold for
    every possible instance of r
  • (Can a DBMS verify this? Can we determine this
    by looking at an instance?)

6
General Thoughts on Good Schemas
  • We want all attributes in every tuple to be
    determined by the tuples key attributes, i.e.
    part of a superkey (for key X ? Y, a superkey is
    a non-minimal X)
  • What does this say about redundancy?
  • But
  • What about tuples that dont have keys (other
    than the entire value)?
  • What about the fact that every attribute
    determines itself?

7
Armstrongs Axioms Inferring FDs
  • Some FDs exist due to others can compute using
    Armstrongs axioms
  • Reflexivity If Y ? X then X ? Y (trivial
    dependencies)
  • name, sid ? name
  • Augmentation If X ? Y then XW ? YW
  • serno ? subj so serno, exp-grade ? subj,
    exp-grade
  • Transitivity If X ? Y and Y ? Z then X ? Z
  • serno ? cid and cid ? subj
  • so serno ? subj

8
Armstrongs Axioms Lead to
  • Union If X ? Y and X ? Z then X ? YZ
  • Pseudotransitivity If X ? Y and WY ? Z
    then XW ? Z
  • Decomposition If X ? Y and Z ? Y then
    X ? Z
  • Lets prove a few of these from Armstrongs Axioms

9
Closure of a Set of FDs
  • Defn. Let F be a set of FDs. Its closure, F,
    is the set of all FDs
  • X ? Y X ? Y is derivable from F by
    Armstrongs Axioms
  • Which of the following are in the closure of our
    Student-Course FDs?
  • name ? name
  • cid ? subj
  • serno ? subj
  • cid, sid ? subj
  • cid ? sid

10
Attribute Closures Is SomethingDependent on X?
  • Defn. The closure of an attribute set X, X, is
  • X ? Y X ? Y ? F
  • This answers the question is Y determined
    (transitively) by X? compute X by
  • Does sid, serno ? subj, exp-grade?

closure X repeat until no change if there
is an FD U ? V in F such that U is
in closure then add V to closure
11
Equivalence of FD sets
  • Defn. Two sets of FDs, F and G, are equivalent
    if their closures are equivalent, F G
  • e.g., these two sets are equivalent
  • XY ? Z, X ? Y and
  • X ? Z, X ? Y
  • F contains a huge number of FDs (exponential
    in the size of the schema)
  • Would like to have smallest representative FD
    set

12
Minimal Cover
we expresseach FD insimplest form
  • Defn. A FD set F is minimal if
  • 1. Every FD in F is of the form X ? A, where A
    is a single attribute
  • 2. For no X ? A in F is F X ? A equivalent
    to F
  • 3. For no X ? A in F and Z ? X is
  • F X ? A ? Z ? A equivalent to F
  • Defn. F is a minimum cover for G if F is minimal
    and is equivalent to G.
  • e.g.,
  • X ? Z, X ? Y is a minimal cover for
  • XY ? Z, X ? Z, X ? Y

in a sense, each FD is essential to the cover
13
More on Closures
  • If F is a set of FDs and X ? Y ? F then for
    some attribute A ? Y, X ? A ? F
  • Proof by counterexample.
  • Assume otherwise and let Y A1,..., An Since
    we assume X ? A1, ..., X ? An are in F
  • then X ? A1 ... An is in F by union rule,
  • hence, X ? Y is in F which is a contradiction

14
Why Armstrongs Axioms?
  • Why are Armstrongs axioms (or an equivalent rule
    set) appropriate for FDs? They are
  • Consistent any relation satisfying FDs in F
    will satisfy those in F
  • Complete if an FD X ? Y cannot be derived by
    Armstrongs axioms from F, then there exists some
    relational instance satisfying F but not X ? Y
  • In other words, Armstrongs axioms derive all the
    FDs that should hold
  • What is the goal of using these axioms?

15
Decomposition
  • Consider our original bad attribute set
  • We could decompose it into
  • But this decomposition loses information about
    the relationship between students and courses.
    Why?

Stuff(sid, name, serno, subj, cid, exp-grade)
Student(sid, name) Course(serno,
cid) Subject(cid, subj)
16
Lossless Join Decomposition
  • R1, Rk is a lossless join decomposition of R
    w.r.t. an FD set F if for every instance r of R
    that satisfies F,
  • ÕR1(r) ? ... ? ÕRk(r) r
  • Consider
  • What if we decompose on (sid, name) and (serno,
    subj, cid, exp-grade)?

sid name serno subj cid exp-grade
1 Sam 570103 AI 570 B
23 Nitin 550103 DB 550 A
17
Testing for Lossless Join
  • R1, R2 is a lossless join decomposition of R with
    respect to F iff at least one of the following
    dependencies is in F
  • (R1 ? R2) ? R1 R2
  • (R1 ? R2) ? R2 R1
  • So for the FD set
  • sid ? name
  • serno ? cid, exp-grade
  • cid ? subj
  • Is (sid, name) and (serno, subj, cid, exp-grade)
    a lossless decomposition?

18
Dependency Preservation
  • Ensures we can check whether a FD X ? Y is
    violated during DB updates, without using a join
  • FZ, the projection of FD set F onto attribute set
    Z, is
  • X ? Y X ? Y ? F , X ? Y Í Z
  • i.e., it is those FDs only applicable to Zs
    attributes
  • A decomposition R1, , Rk is dependency
    preserving if F (FR1 ?...? FRk) (note we
    need an extra closure!)
  • We dont lose the ability to test the cover of
    our FDs in a single table, just because we
    decompose

19
Example 1
  • For Schema R(sid, name, serno, cid, subj,
    exp-grade) and FD set
  • sid ? name serno ? cid
  • cid ? subj sid, serno ? exp-grade
  • Is R1(sid, name) and R2(serno, subj, cid,
    exp-grade)
  • A lossless decomposition?
  • Is it dependency-preserving?
  • How about R1(sid, name) and R2(sid, serno, subj,
    cid, exp-grade)?

20
Example 2
  • Given schema R(name, street, city, st, zip, item,
    price),
  • FD set name ? street, city street, city ? st
  • street, city ? zip name, item ? price
  • and decomposition
  • R1(name, street, city, st, zip) and R2(name,
    item, price)
  • Is it lossless?
  • Is it dependency preserving?
  • What if we replaced the first FD with name,
    street ? city?

21
A More Disturbing Example
  • Given schema R(sid, fid, subj)
  • and FD set fid ? subj sid, subj ? fid
  • Consider the decomposition
  • R1(sid, fid) and R2(fid, subj)
  • Is it lossless?
  • Is it dependency preserving?
  • If it isnt, can you think of a decomposition
    that is? Can you do this non-redundantly?

22
Redundancy vs. FDs
  • Ideally, we want a design s.t. for each
    nontrivial dependency X ? Y, X is a superkey for
    some relation schema in R
  • We just saw that this isnt always possible in a
    non-redundant way
  • Thus we have two kinds of normal forms,
    Boyce-Codd and Third Normal Form

23
Two Important Normal Forms
  • Boyce-Codd Normal Form (BCNF). For every
    relation scheme R and for every X ? A that holds
    over R,
  • either A ? X (it is trivial) ,or
  • or X is a superkey for R
  • Third Normal Form (3NF). For every relation
    scheme R and for every X ? A that holds over R,
  • either A ? X (it is trivial), or
  • X is a superkey for R, or
  • A is a member of some key for R

24
Normal Forms Compared
  • BCNF is preferable, but sometimes in conflict
    with the goal of dependency preservation
  • Its strictly stronger than 3NF
  • Lets see algorithms to obtain
  • A BCNF lossless join decomposition
    (nondeterministic)
  • A 3NF lossless join, dependency preserving
    decomposition

25
BCNF Decomposition Algorithm(from Korth et al.
our book gives a recursive version)
result R compute F while there is a
relation schema Ri in result that isnt in
BCNF let A ? B be a nontrivial FD on Ri s.t.
A ? Ri is not in F and A and B are
disjoint result (result Ri) ? (Ri - B),
(A,B)
26
3NF Decomposition Algorithm
Let F be a minimal cover i0 for each FD A ? B
in F if none of the schemas Rj, 1? j ? i,
contains AB increment i
Ri (A, B) if no schema Rj, 1 ? j ? i
contains a candidate key for R
increment i Ri any candidate key
for R return (R1, , Ri)
Build dep.-preserving decomp.
Ensurelosslessdecomp.
27
Summary of Normalization
  • We can always decompose into 3NF and get
  • Lossless join
  • Dependency preservation
  • But with BCNF we are only guaranteed lossless
    joins
  • BCNF is stronger than 3NF every BCNF schema is
    also in 3NF
  • The BCNF algorithm is nondeterministic, so there
    is not a unique decomposition for a given schema R

28
XML A Semi-Structured Data Model
29
Why XML?
  • XML is the confluence of several factors
  • The Web needed a more declarative format for data
  • Documents needed a mechanism for extended tags
  • Database people needed a more flexible
    interchange format
  • Lingua franca of data
  • Its parsable even if we dont know what it
    means!
  • Original expectation
  • The whole web would go to XML instead of HTML
  • Todays reality
  • Not so But XML is used all over under the
    covers

30
Why DB People Like XML
  • Can get data from all sorts of sources
  • Allows us to touch data we dont own!
  • This was actually a huge change in the DB
    community
  • Interesting relationships with DB techniques
  • Useful to do relational-style operations
  • Leverages ideas from object-oriented,
    semistructured data
  • Blends schema and data into one format
  • Unlike relational model, where we need schema
    first
  • But too little schema can be a drawback, too!

31
XML Anatomy
Processing Instr.
  • lt?xml version"1.0" encoding"ISO-8859-1" ?gt
  • ltdblpgt
  • ltmastersthesis mdate"2002-01-03"
    key"ms/Brown92"gt
  •   ltauthorgtKurt P. Brownlt/authorgt
  •   lttitlegtPRPL A Database Workload
    Specification Languagelt/titlegt
  •   ltyeargt1992lt/yeargt
  •   ltschoolgtUniv. of Wisconsin-Madisonlt/schoolgt
  •   lt/mastersthesisgt
  • ltarticle mdate"2002-01-03" key"tr/dec/SRC1997-
    018"gt
  •   lteditorgtPaul R. McJoneslt/editorgt
  •   lttitlegtThe 1995 SQL Reunionlt/titlegt
  •   ltjournalgtDigital System Research Center
    Reportlt/journalgt
  •   ltvolumegtSRC1997-018lt/volumegt
  •   ltyeargt1997lt/yeargt
  •   lteegtdb/labs/dec/SRC1997-018.htmllt/eegt
  •   lteegthttp//www.mcjones.org/System_R/SQL_Reunio
    n_95/lt/eegt
  •   lt/articlegt

Open-tag
Element
Attribute
Close-tag
32
Well-Formed XML
  • A legal XML document fully parsable by an XML
    parser
  • All open-tags have matching close-tags (unlike so
    many HTML documents!), or a special
  • lttag/gt shortcut for empty tags (equivalent to
    lttaggtlt/taggt
  • Attributes (which are unordered, in contrast to
    elements) only appear once in an element
  • Theres a single root element
  • XML is case-sensitive
Write a Comment
User Comments (0)
About PowerShow.com