Best Database Design: An Information-Theoretic Approach


1
Best Database Design: An Information-Theoretic Approach
  • Caroline John Peter

2
Outline
  • Introduction
  • Information Theory
  • Entropy
  • Computation of Entropy
  • Definition: Well-Designed Database
  • Relational Database
  • XML

3
Introduction
  • What is a good database design?
  • Examples: 3NF, BCNF

4
Information Theory
  • Need an approach based on intrinsic
    characteristics
  • Applicable to all data models
  • Independent of query/update issues

5
Entropy
  • Entropy measures the amount of information
    provided by a certain event
  • The information measure I(P) has several properties
  • 1)
  • 2) If an event has probability 1, the information
    it provides is 0: I(1) = 0
  • 3)

6
Computation of Entropy
  • It is a simple measure
  • Database schema: R(A, B, C) with FD A → B
  • Instance I
  • Domain: a ∈ {1, ..., 4}
  • Probability distribution: P(2) = 1,
    P(a) = 0 otherwise
  • Entropy = log 1 = 0
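
A minimal sketch of the computation on this slide, under stated assumptions. The slide's instance I is not in the transcript, so the two-row instances below are hypothetical, as are all function names. The measure is simplified: the value at one position is erased, every domain value that keeps A → B satisfied is treated as equally likely, and the entropy of that distribution is reported.

```python
import math
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, int, int]  # one tuple of R(A, B, C)


def satisfies_a_to_b(rows: Sequence[Row]) -> bool:
    """True if the FD A -> B holds: equal A values imply equal B values."""
    seen: Dict[int, int] = {}
    for a, b, _c in rows:
        if seen.setdefault(a, b) != b:
            return False
    return True


def position_distribution(rows: List[Row], row_idx: int, col_idx: int,
                          domain: Sequence[int]) -> Dict[int, float]:
    """Uniform distribution over the domain values that can fill the
    position (row_idx, col_idx) without violating A -> B."""
    allowed = []
    for v in domain:
        candidate = list(rows)
        new_row = list(candidate[row_idx])
        new_row[col_idx] = v
        candidate[row_idx] = tuple(new_row)
        if satisfies_a_to_b(candidate):
            allowed.append(v)
    return {v: (1 / len(allowed) if v in allowed else 0.0) for v in domain}


def entropy(dist: Dict[int, float]) -> float:
    """Shannon entropy in bits; zero-probability values contribute nothing."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)


domain = range(1, 5)  # the slide's domain 1..4

# Hypothetical instance in which A -> B forces the B value of the first row.
redundant = position_distribution([(1, 2, 3), (1, 2, 4)], 0, 1, domain)
print(redundant, entropy(redundant))  # P(2) = 1, entropy 0.0

# Hypothetical instance in which the FD does not constrain that position.
free = position_distribution([(1, 2, 3), (2, 2, 4)], 0, 1, domain)
print(entropy(free))  # log2(4) = 2.0
```

In the first instance the erased value can be recovered from the FD, so the position carries no information; in the second, all four values are possible and the entropy reaches its maximum for this domain.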

7
EXAMPLE
  • Entropy is higher for instances that do not
    satisfy dependencies and lower for instances
    that do.

8
Case a: No dependencies
  • Suppose attribute A1 takes values in [1, i],
    A2 in [1, j], and A3 in [1, k].

9
  • P can take any value from the domain [1, i], and V1
    can take any value from the domain [1, j]. If P = X1,
    V1 cannot be Y3, to avoid duplication.
  • Number of possibilities = (i × j) − 1

10
Case b: Dependencies hold
  • Here the MVD A1 →→ A2 | A3 holds

11
  • Since the MVD holds, P must be X6 and V1
    must be Z4.
  • Number of possibilities = 1 × 1 = 1
  • Entropy = 0

12
General Measure
  • Average amount of information gained (entropy)

13
  • Vector ā = (V1, X2, X2, X1, X1, X1, X1)
  • The set of substitutions for ā with range
    [1, i], for every a ∈ [1, j].

14
  • P(a | ā) = (number of substitutions to V1 when P = a) /
    (total number of substitutions to V1 over all a)
  • P(Y3 | ā) = 14/49
  • P(Y1 | ā) = 21/49
  • If P = Y3, V1 can take 7 values (2 × 7 = 14)
  • If P = Y1, V1 can take 7 values (3 × 7 = 21)
  • If P = Y2, V1 can take 7 values (2 × 7 = 14)
  • Total number of substitutions = 14 + 21 + 14 = 49
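
The arithmetic on this slide can be checked with a short sketch. It takes the substitution counts listed above (14, 21, and 14 out of 49) as given, turns them into the conditional distribution P(a | ā), and computes the entropy of that distribution in bits. The variable names are illustrative only.

```python
import math
from fractions import Fraction

# Substitution counts for each candidate value of P, as listed on the slide.
counts = {"Y1": 21, "Y2": 14, "Y3": 14}

total = sum(counts.values())  # 14 + 21 + 14 = 49

# P(a | ā) = substitutions when P = a, divided by all substitutions.
probs = {value: Fraction(n, total) for value, n in counts.items()}

# Shannon entropy of the distribution, in bits.
entropy_bits = sum(float(p) * math.log2(1 / float(p)) for p in probs.values())

for value, p in probs.items():
    print(value, p)
print("entropy (bits):", entropy_bits)
```

Because the three values are not equally likely, the entropy falls below the maximum of log2(3) bits for a three-valued position.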

15
Well Designed Database
  • Definition: A database specification (s, Σ) is well
    designed if for every I ∈ inst(s, Σ) and every
    p ∈ Pos(I), Inf_I(p | Σ) = 1

16
Relational Database
  • If Σ is a set of functional dependencies over a
    schema s, then (s, Σ) is well designed if and only
    if it is in BCNF.
  • If Σ is a set of FDs and MVDs over a schema s,
    then (s, Σ) is well designed if and only if it is
    in 4NF.
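
Since the slide ties the well-designed property to BCNF when only FDs are present, a BCNF test gives a practical check. The sketch below uses the standard attribute-closure test: every nontrivial FD must have a left-hand side whose closure covers the whole schema. The helper names are hypothetical, attribute names are assumed to be single characters, and MVDs (the 4NF case) are not handled.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)


def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from strings of single-character attribute names."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_bcnf(schema: str, fds: List[FD]) -> bool:
    """True if every nontrivial FD in fds has a superkey on its left."""
    all_attrs = frozenset(schema)
    for lhs, rhs in fds:
        if rhs <= lhs:  # trivial FD, always allowed
            continue
        if not all_attrs <= closure(lhs, fds):
            return False  # lhs is not a superkey: BCNF violation
    return True


# R(A, B, C) with A -> B, the schema used earlier in the slides.
print(is_bcnf("ABC", [fd("A", "B")]))   # False: A is not a key
# With A -> BC, A becomes a key and the violation disappears.
print(is_bcnf("ABC", [fd("A", "BC")]))  # True
```

Checking only the FDs that are given, rather than everything they imply, is sufficient when testing the whole schema for BCNF.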

17
XML Databases
  • An XML specification is a pair (D, Σ), where D is a
    document type definition (DTD). The set of positions
    in an XML tree T is denoted Pos(T)

18
Positions in an XML tree
19
XNF: XML Normal Form
  • An XML specification (D, Σ) is well
    designed if for every T ∈ inst(D, Σ) and every
    p ∈ Pos(T), Inf_T(p | Σ) = 1
  • If Σ is a set of functional dependencies over an
    XML specification (D, Σ), then (D, Σ) is
    well designed if it is in XNF.

20
XML Normal Form (XNF)
  • Creating new element type
  • Moving attributes

21
Creating new element type
22
(No Transcript)
23
Moving Attributes
24
(No Transcript)
25
XML document with TMVD
26
(No Transcript)
27
Price of dependency preservation
  • Price(NF) is the minimum amount of information
    content that NF loses to guarantee dependency
    preservation.
  • Known Results
  • Price(3NF) = 1/2
  • If NF is a dependency-preserving normal form,
    then Price(NF) ≥ 1/2.
  • 3NF achieves the smallest price one needs to pay
    to ensure dependency preservation.

28
Losing FDs in XNF decomposition

29
FDs that hold in the XML document
  • company.branch.clients.client.post → company.branch.clients.client.city
  • company.branch.clients.client.city, company.branch.type → company.branch.

30
A redundancy-free XML document
31
X3NF
  • (D, Σ) is in XML third normal form (X3NF) iff
    for every nontrivial FD S → p.@l ∈ (D, Σ)⁺, the FD
    S → p is also in (D, Σ)⁺, or p.@l is prime.

32
Remarks
  • A dependency-preserving normal form cannot
    guarantee an information content higher than 1/2.
  • Normal forms that eliminate all redundancies
    may lose dependencies.
  • XML takes advantage of the good properties of
    BCNF and 3NF.
