Schema Refinement and Normal Forms - PowerPoint PPT Presentation

1 / 37
About This Presentation
Title:

Schema Refinement and Normal Forms

Description:

Is the following decomposition lossless or lossy? ... Lossless-join, dependency-preserving decomposition of R into a collection of 3NF ... – PowerPoint PPT presentation

Number of Views:186
Avg rating:3.0/5.0
Slides: 38
Provided by: RaghuRamak204
Category:

less

Transcript and Presenter's Notes

Title: Schema Refinement and Normal Forms


1
Schema Refinement and Normal Forms
  • Chapter 19
  • Raghu Ramakrishnan and J. Gehrke
  • (second text book)
  • In Course Pick-up box tomorrow

2
Review
Avoiding the expense of global integrity
constraints e.g. lno ? bname preserved by
CREATE ASSERTION lno-bname CHECK ( NOT
EXIST (SELECT
FROM loan-info
l1, loan-info l2
WHERE l1.lno l2.lno AND
l1.bname ltgt l2.bname))
Expensive, requires a join for every insertion
Reducing the expense 1. Determine FD set for
loan-info, F 2. Find minimal set, G,
s.t. F G
3
Functional Dependencies (FDs)
  • A functional dependency X Y holds over
    relation R if, for every allowable instance r of
    R
  • t1 r, t2 r, (t1) (t2)
    implies (t1) (t2)
  • i.e., given two tuples in r, if the X values
    agree, then the Y values must also agree. (X and
    Y are sets of attributes.)
  • An FD is a statement about all allowable
    relations.
  • Must be identified based on semantics of
    application.
  • Given some allowable instance r1 of R, we can
    check if it violates some FD f, but we cannot
    tell if f holds over R!
  • K is a candidate key for R means that K R
  • However, K R does not require K to be
    minimal!

4
Reasoning About FDs
  • Given some FDs, we can usually infer additional
    FDs
  • ssn did, did lot implies ssn
    lot
  • An FD f is implied by a set of FDs F if f holds
    whenever all FDs in F hold.
  • closure of F is the set of all FDs that
    are implied by F.
  • Armstrongs Axioms (X, Y, Z are sets of
    attributes)
  • Reflexivity If X Y, then Y X
  • Augmentation If X Y, then XZ
    YZ for any Z
  • Transitivity If X Y and Y Z,
    then X Z
  • These are sound and complete inference rules for
    FDs!

5
Armstrongs Axioms
  • A. Fundamental Rules (W, X, Y, Z sets of
    attributes)
  • 1. Reflexivity
  • If Y X then X ? Y
  • 2. Augmentation
  • If X ? Y then WX ? WY
  • 3. Transitivity
  • If X? Y and Y ? Z then X?Z
  • B. Additional rules (can be proved from A)
  • 4. UNION If X ? Y and X ? Z then X
    ? YZ
  • 5. Decomposition If X ? YZ then X ? Y,
    X ?Z
  • 6. Pseudotransitivity If X ? Y and WY ?
    Z then WX ?Z

6
Functional Dependencies
A B ? C AB determines C two tuples
with the same values for A and B
will also have
the same value for C
7
Functional Dependencies (Cont.)
  • K is a superkey for relation schema R if and only
    if K ? R
  • K is a candidate key for R if and only if
  • K ? R, and
  • for no ? ? K, ? ? R
  • Functional dependencies allow us to express
    constraints that cannot be expressed using
    superkeys. Consider the schema
  • bor_loan (customer_id, loan_number, amount ).
  • We expect this functional dependency to hold
  • loan_number ? amount
  • but would not expect the following to hold
  • amount ? customer_name

8
Use of Functional Dependencies
  • We use functional dependencies to
  • test relations to see if they are legal under a
    given set of functional dependencies.
  • If a relation r is legal under a set F of
    functional dependencies, we say that r satisfies
    F.
  • specify constraints on the set of legal relations
  • We say that F holds on R if all legal relations
    on R satisfy the set of functional dependencies
    F.
  • Note A specific instance of a relation schema
    may satisfy a functional dependency even if the
    functional dependency does not hold on all legal
    instances.
  • For example, a specific instance of loan may, by
    chance, satisfy amount ?
    customer_name.

9
Functional Dependencies (Cont.)
  • A functional dependency is trivial if it is
    satisfied by all instances of a relation
  • Example
  • customer_name, loan_number ? customer_name
  • customer_name ? customer_name
  • In general, ? ? ? is trivial if ? ? ?

10
Another use of FDs Schema Design
Example
R
R Universal relation tuple meaning
Jones has a loan (L-17) for 1000 taken out at
the Downtown branch in Bkln which has
assets of 9M
Design fast queries (no need for
joins!) - redudancy
update anomalies examples?
deletion anomalies
11
The Evils of Redundancy
  • A first-cut design
  • Universal relation (or few relations)
  • Redundantly store some columns in multiple tables
  • Redundancy is at the root of several problems
    associated with relational schemas
  • redundant storage, performance (of updates,)
    suffers
  • insert/delete/update anomalies

12
Example Constraints on Entity Set
  • Consider relation obtained from Hourly_Emps
  • Hourly_Emps (ssn, name, lot, rating, hrly_wages,
    hrs_worked)
  • Notation We will denote this relation schema by
    listing the attributes SNLRWH
  • This is really the set of attributes
    S,N,L,R,W,H.
  • Sometimes, we will refer to all attributes of a
    relation by using the relation name. (e.g.,
    Hourly_Emps for SNLRWH)

13
Example (Contd.)
Hourly_Emps relation
  • No Fuctional Dependencies
  • Any instance is legal here (no constraints)
  • No redundany

14
Example Constraints on Entity Set
Hourly_Emps relation
  • Add Some FDs on Hourly_Emps
  • ssn is the key S ? SNLRWH
  • Values of SSN have to be unique in the relation
  • rating determines hrly_wages R ? W
  • For two rows that have same value of R, the rows
    also have same value for W
  • Above relation stores R and W values redundantly

15
Example (Contd.)
Hourly_Emps
  • Problems due to R W
  • Update anomaly Can we change W in
    just the 1st tuple of SNLRWH?
  • Insertion anomaly What if we want to insert an
    employee and dont know the hourly wage for his
    rating?
  • Deletion anomaly If we delete all employees with
    rating 5, we lose the information about the wage
    for rating 5!

Will 2 smaller tables be better?
16
Example (Contd.)
Will 2 smaller tables be better? Yes
Hourly_Emps2
Wages
Solution to avoid redundancy Decomposition to
smaller tables
But what criteria should the new smaller
tables satisfy so that you can stop
decomposition? What is a good design? ----
Normal Forms
17
Goals of Decomposition
1. Lossless Joins Want to be able to
reconstruct big (e.g. universal) relation by
joining smaller ones (using natural joins)
(i.e. R1 R2 R) 2. Dependency
preservation Want to minimize the cost of
global integrity constraints based on FDs
( i.e. avoid big joins in assertions) 3.
Redundancy Avoidance/Minimization
Avoid/minimize unnecessary data duplication (the
motivation for decomposition)
? Normal Forms
Why important? LJ information loss
DP efficiency (time) RA efficiency
(space), update anomalies
18
Decomposition
1. Decomposing the schema R (
bname, bcity, assets, cname, lno, amt)
R R1 U R2
R1 (cname, lno, amt)
R1 (bname, bcity, assets, cname)
2. Decomposing the instance
19
Dependency Goal 1 lossless joins
A bad decomposition

Problem join adds meaningless tuples
lossy join by adding noise, have lost
meaningful information
20
Dependency Goal 1 lossless joins
Is the following decomposition lossless or lossy?
Ans Lossless R R1 R2, it has
same 4 tuples as original R1 and R2 share the
lno which is the key to R2.
21
Ensuring Lossless Joins
  • A decomposition of R R R1 union R2
  • Is lossless iff
  • R1? R2 ? R1, or
  • R1 ? R2 ? R2
  • (i.e., intersecting attributes must for a
    superkey for one of the resulting smaller
    relations)
  • In the previous example, lno is the common
    attribute and lno is the key to second relation
    R2

22
More on Lossless Join
  • The decomposition of R into X and Y is
    lossless-join wrt F if and only if the closure
    of F contains
  • X Y X, or
  • X Y Y
  • In particular, the decomposition of R into R - V
    and UV is lossless-join if U V holds
    over R (i.e., U is key of second relation)

23
Decomposition Goal 2 Dependency preservation
Goal efficient integrity checks of FDs An
example w/ no DP R ( bname, bcity, assets,
cname, lno, amt) bname ? bcity
assets lno ? amt bname
Decomposition R R1 U R2 R1 (bname,
assets, cname, lno) R2 (lno, bcity,
amt) Lossless but not DP. Why?
Ans bname ?bcity assets crosses 2 tables
Goal 2 Attributes in a dependency should be in
a single relation
24
Decomposition Goal 3 Redundancy
Avoidance/Minimization
Redundancy for Bx , y and z
Example
(1) An FD that exists in the above relation is
B ? C (2) A superkey in the above relation is
A, (or any set containing A)
When do you have redundancy? Ans when
there is some FD, X?Y covered by a relation
and X is not a superkey
Criteria to determine if a design
avoids/minimizes redundancy ? Normal Forms
25
Normal Forms
  • Criteria to decide whether or not a design is
    good
  • Minimize/avoid the problems due to redundancy
  • Examples
  • First Normal Form
  • Second Normal Form
  • Third Normal Form
  • Boyce-Codd Normal Form
  • Most important
  • BCNF and 3NF

More restrictive
26
Boyce-Codd Normal Form (BCNF)
  • Reln R with FDs F is in BCNF if, for all X A
    in
  • A X (called a trivial FD), or
  • X is a super key for R (i.e., X contains a
    candidate key for R)
  • In other words, R is in BCNF if the only
    non-trivial FDs that hold over R are key
    constraints.
  • No dependency in R that can be predicted using
    FDs alone.
  • If we are shown two tuples that agree upon
    the X value, we cannot infer
    the A value in
    one tuple from the A value in the other.
  • If example relation is in BCNF, the 2 tuples
    must be identical
    (since X is a key).

27
Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a
set F of functional dependencies if for all
functional dependencies in F of the form
??? ? where ? ? R and ? ? R, at least
one of the following holds
  • ?? ? ? is trivial (i.e., ? ? ?)
  • ? is a superkey for R

Example schema not in BCNF bor_loan (
customer_id, loan_number, amount ) because
loan_number ? amount holds on bor_loan but
loan_number is not a superkey
28
Decomposing a Schema into BCNF
  • Suppose we have a schema R and a non-trivial
    dependency ???? causes a violation of BCNF.
  • We decompose R into
  • (??U ? )
  • ( R - ( ? - ? ) )
  • In our example that is not in BCNF,
  • bor_loan ( customer_id, loan_number,
    amount )
  • with FD loan_number ? amount
  • ? loan_number
  • ? amount
  • and bor_loan is replaced by a BCNF
  • (??U ? ) ( loan_number, amount )
  • ( R - ( ? - ? ) ) ( customer_id, loan_number )

29
BCNF and Dependency Preservation
  • Constraints, including functional dependencies,
    are costly to check in practice unless they
    pertain to only one relation
  • If it is sufficient to test only those
    dependencies on each individual relation of a
    decomposition in order to ensure that all
    functional dependencies hold, then that
    decomposition is dependency preserving.
  • Because it is not always possible to achieve both
    BCNF and dependency preservation, we consider a
    weaker normal form, known as third normal form.

30
Third Normal Form (3NF)
  • Reln R with FDs F is in 3NF if, for all X ? A in
  • A X (called a trivial FD), or
  • X contains a key for R, or
  • A is part of some key(i.e., candidate key) for R.
  • Minimality of a key is crucial in third condition
    above!
  • i.e., Can only have candidate keys
  • If R is in BCNF, obviously in 3NF.
  • If R is in 3NF, some redundancy is possible.
  • (why? Because (3) allows non-key based X-gtA
    dependencies)
  • It is a compromise, used when BCNF not
    achievable (e.g., no good decomp, or
    performance considerations).
  • Lossless-join, dependency-preserving
    decomposition of R into a collection of 3NF
    relations always possible.

31
3NF example
  • R (custid, empid, bname, type)
  • F empid ? bname
  • Other candidate keys custid, bname
  • Not in BCNF because empid is not a superkey
  • In 3NF because bname is in a candidate key

32
3NF Decomposition Algorithm
  • Let Fc be a canonical cover for Fi 0for
    each functional dependency ? ? ? in Fc do if
    none of the schemas Rj, 1 ? j ? i contains ? ?
    then begin i i 1 Ri ? ?
    endif none of the schemas Rj, 1 ? j ? i
    contains a candidate key for R then begin i
    i 1 Ri any candidate key for
    R end return (R1, R2, ..., Ri)

33
3NF Decomposition Algorithm (Cont.)
  • Above algorithm ensures
  • each relation schema Ri is in 3NF
  • decomposition is dependency preserving and
    lossless-join

34
3NF Decomposition An Example
  • Relation schema
  • cust_banker_branch (customer_id, employee_id,
    branch_name, type )
  • The functional dependencies for this relation
    schema are
  • customer_id, employee_id ? branch_name, type
  • employee_id ? branch_name
  • customer_id, branch_name ? employee_id
  • We first compute a canonical cover (or minimal
    cover) FC
  • branch_name is extraneous in the r.h.s. of the
    1st dependency
  • No other attribute is extraneous, so we get FC
  • customer_id, employee_id ? type
    employee_id ? branch_name customer_id,
    branch_name ? employee_id

35
3NF Decompsition Example (Cont.)
  • The for loop generates following 3NF schema
  • (customer_id, employee_id, type )
  • (employee_id, branch_name)
  • (customer_id, branch_name,
    employee_id)
  • Observe that (customer_id, employee_id, type )
    contains a candidate key of the original schema,
    so no further relation schema needs be added

36
Summary of Schema Refinement
  • If a relation is in BCNF, it is free of
    redundancies that can be detected using FDs.
    Thus, trying to ensure that all relations are in
    BCNF is a good heuristic.
  • If a relation is not in BCNF, we can try to
    decompose it into a collection of BCNF relations.
  • Must consider whether all FDs are preserved. If
    a lossless-join, dependency preserving
    decomposition into BCNF is not possible (or
    unsuitable, given typical queries), should
    consider decomposition into 3NF.
  • Decompositions should be carried out and/or
    re-examined while keeping performance
    requirements in mind.

37
Theory and practice
Performance tuning
  • Redundancy not the sole guide to decomposition
  • Workload matters too!!
  • nature of queries run
  • mix of updates, queries
  • .....

Workload (mix of queries, updates,) can
influence BCNF vs 3NF may further
decompose a BCNF into (4NF) may denormalize
(i.e., undo a decomposition or add new columns
? For optimizing query workloads
Materialized views,
Write a Comment
User Comments (0)
About PowerShow.com