A Brief History of DB Theory - PowerPoint PPT Presentation

About This Presentation
Title:

A Brief History of DB Theory

Description:

A statement about which instances (finite sets of tuples) are 'legal' for ... clause rules w/o function symbols have decidable containment (Chandra and Merlin) ... – PowerPoint PPT presentation

Number of Views:26
Avg rating:3.0/5.0
Slides: 48
Provided by: Jeff382
Category:
Tags: brief | history | merlin | theory

less

Transcript and Presenter's Notes

Title: A Brief History of DB Theory


1
A Brief History of DB Theory
  • Functional dependencies --- Normalization
  • More dependencies multivalued, general
  • Universal relations
  • Acyclic hypergraphs
  • Logical query languages Datalog

2
Functional Dependencies
  • Setting A relation table with column headers
    (schema ) and a finite number of rows (tuples ).
  • A B C
  • 0 1 2
  • 0 3 4

3
FD X -gtY
  • A statement about which instances (finite sets
    of tuples) are legal for a given relation.
  • If two tuples agree in all the attributes of set
    X , then then must also agree in all attributes
    of set Y.
  • Common case X is the key of the relation, Y is
    the other attributes.

4
Issue 1 Inference
  • Armstrongs Axioms
  • X -gtY if Y is a subset of X
  • X -gtY implies XZ -gtYZ (XZ , etc. union)
  • X -gtY and YZ -gtW implies XZ -gtW
  • Closure test (Bernstein) given FDs and a
    possible consequent X -gtY , use the FDs to see
    what X determines, and then see if Y is
    contained therein.

5
Issue 2 Redundancy
  • Certain combinations of FDs lead to redundancy
    and other anomalies.
  • Example B -gtC is the only FD
  • A B C
  • 0 1 2
  • 3 1 ?
  • The FD lets us predict the value of ?.

6
Issue 3 Normalization
  • Eliminate redundancy by splitting schemas.
  • A FD X -gtY allows us to split schema XYZ into
    XY and XZ .
  • Without the FD, the decomposition would lack a
    lossless join the ability to reconstruct the
    original XYZ relation from the decomposed
    relations.

7
How I Met Moshe Learned Database Theory
  • Tsichritzis
  • Bernstein Beeri
  • Ullman Vardi

8
Why I Like Working With Vardi
  • Taste, selection of issues.
  • Best inventor of constructions.
  • Name comes after mine in alphabet.

9
Lossless Joins
  • When relations are decomposed, do the pieces
    allow reconstruction of the original?
  • Only way join the projected relations.
  • You always get back what you started with.
  • Bad case is when you get more.

10
Example
  • A B C
  • 0 1 2
  • 3 1 4
  • A B A B C B C
  • 0 1 0 1 4 1 2
  • 3 1 3 1 2 1 4
  • 0 1 2
  • 3 1 4

11
More Kinds of Dependencies
  • In essence, a dependency is any predicate that
    tells whether a given set of tuples is OK for a
    given schema.
  • The language chosen determines what constraints
    can be expressed and what we can decide about
    relations.
  • FDs are just one, simple example

12
Multivalued Dependencies
  • These occur when a relation tries to connect one
    class of objects to independent sets from two
    other classes..
  • Notation X -gt-gtY means
  • If two tuples agree on X , then we may swap the Y
    components and get two tuples that are also in
    the relation.

13
Example
  • EmpID -gt-gt Phone
  • EmpID Addr Phone Project
  • If 123 a1 p1 j1
  • 123 a2 p2 j2
  • Then 123 a1 p2 j1
  • 123 a2 p1 j2

14
MVD Origins
  • Independent work of Delobel, Fagin, Zaniolo
  • Inference of MVDs FDs Beeri, Fagin, and
    Howard.

15
Generalized Dependencies
  • Equality-generating dependencies
  • If tuples with this pattern of equal symbols
    appear in a relation instance, then certain
    symbols must also be equal.
  • Generalizes FDs.
  • Tuple-generating dependencies
  • If tuples with this pattern of equal symbols
    appear in a relation instance, then a certain
    tuple must also appear.
  • Generalizes MVDs.

16
Example EGD
  • FD A -gtB in schema ABC
  • A B C
  • a b1 c1 (Hypotheses)
  • a b2 c2
  • b1 b2 (Conclusion)

17
Example TGD
  • MVD A -gt-gtB in schema ABC
  • A B C
  • a b1 c1 (Hypotheses)
  • a b2 c2
  • a b1 c2 (Conclusion)

18
Full Versus Embedded GDs
  • Full GDs have a conclusion with only symbols
    that appear in the hypotheses.
  • Embedded GDs may have new symbols in the
    conclusion.
  • Existentially quantified.
  • Embedded EGD makes no sense, but embedded TGDs
    are quite interesting.

19
The Chase for Inferring GDs
  • Test whether a GD G follows from given GDs.
  • Start with R the hypotheses of G .
  • Apply GD H by mapping the hypotheses of H to
    some rows of R .
  • If so, infer the (mapped) conclusion of H ---
    equate two symbols of R or add a tuple to R .
  • If you eventually infer the conclusion of G ,
    then G follows, else it does not.

20
Example If A -gtB Then A -gt-gtB
  • R has tuples (a,b1,c1) and (a,b2,c2). Does it
    have (a,b1,c2)?
  • Apply A -gtB .
  • Its hypotheses are the same as the two tuples of
    R , so surely they map.
  • Lets us conclude b1b2.
  • Thus, since R has (a,b2,c2), It surely has
    (a,b1,c2).

21
Decision Properties of GDs
  • If all dependencies are full, no new symbols ever
    appear, so the chase must terminate.
  • Either we prove the desired conclusion, or R
    becomes a relation that provides a counterexample
    --- it satisfies all the given GDs, but not the
    one we were trying to infer.

22
Undecidability of Embedded GDs
  • First undecidability results involved the
    inference of untyped GDs (symbols may appear in
    several columns).
  • Beeri and Vardi
  • Chandra, Lewis, and Makoswky
  • Tough result is undecidability even when GDs are
    typed (one column per symbol).
  • Vardi
  • Gurevich and Lewis

23
History of GDs
  • Similar ideas were developed independently by
    several people
  • Yannakakis and Papadimitriou
  • Beeri and Vardi
  • Fagin
  • Sadri and Ullman

24
The Universal Relation
  • Idea attributes carry information regardless of
    how they are placed in relation schemas.
  • One of the cool things about Moshe Vardi is the
    collection he keeps of re-inventions of this
    universal relation concept.
  • Typically touted by some HCI type as a wonderful
    new interface to databases.

25
UR Query Systems
  • Query
  • UR
  • Stored
  • Relations

26
Example
  • Stored relations ES(Emp,Sal), ED(Emp,Dept),
    DM(Dept,Mgr)
  • Universal relation U(Emp,Sal, Dept,Mgr)
  • UR query
  • SELECT Mgr WHERE Empa
  • Translated to
  • SELECT Mgr FROM ED, DM WHERE Empa AND
    ED.DeptDM.Dept
  • U ! join(ES, ED, DM) empty ES proves it.

27
Theory of Universal Relations
  • Some were quite upset by the UR idea.
  • Objection the informal or ad-hoc way queries
    were translated from UR to stored relations.
  • Codd Neither the collection of all base
    relations nor the collection of all views should
    be cast by the DBMS in the form of a universal
    relation (in the Stanford University sense)
    Vardi, 1988.

28
Some Early Approaches to the UR
  • Selected joins of stored relations.
  • Representative instance pad stored relations
    with nulls in missing attributes then chase with
    given dependencies.
  • Contributions by Vassiliou, Honeyman, Mendelzon,
    Sagiv, Yannakakis.
  • Maximal objects union of joins within acyclic
    hypergraphs.
  • Maier, Ullman

29
Window Functions
  • Two stage process
  • 1. If query involves set of attributes X ,
    compute the window function X some relation
    with schema X derived from the UR.
  • 2. Apply the query to X .
  • Different UR definitions give different window
    functions, e.g., join-based, rep.-instance,
    maximal-object.
  • From Maier, Ullman, Vardi 1984.

30
Acyclic Hypergraphs
  • Several different applications led to the
    identification of a class of database schemas
    (collection of relation schemas) with useful
    properties.
  • Sensible connections among stored relations in
    maximal-object theory of UR.
  • Joins of many relations in time polynomial in the
    size of the input and output (Yannakakis).
  • Etc., etc.

31
Hypergraphs
  • Nodes (typically attributes)
  • (Hyper)edges sets of any number of nodes.
  • A B C D
  • E F

32
GYO Reduction
  • Graham, Yu-Ozsuyoglu, independently defined
    acyclic hypergraphs thusly
  • Two transformations
  • 1. Delete a node that is in only one hyperedge.
  • 2. Delete a hyperedge contained in another.
  • Hypergraph is acyclic iff the result is a single,
    empty edge.
  • Limit is unique for any hypergraph.

33
Example
  • GYO steps delete C delete D delete B delete
    F delete B,E delete A delete B delete E
  • A B C D
  • E F

34
History
  • Acyclic hypergraph idea from Bernstein and
    Goodman.
  • But the definition is rather different.
  • Hypergraph version from Fagin, Mendelzon, Beeri,
    Maier, others.
  • And dont forget the GYO folks.
  • Fagin defined several related-but-distinct
    concepts of acyclic.
  • This one is alpha-acyclic.

35
Logical Query Languages
  • Collections of Horn-clause rules without function
    symbols behave almost like SQL, but can do
    recursion.
  • Conjunctive queries single Horn-clause rules
    w/o function symbols have decidable containment
    (Chandra and Merlin).

36
Conjunctive Queries
  • Example path(X,Y) - edge(X,Z) edge(Z,Y)
  • Atoms in the body (hypothesis) refer to stored
    relations (EDB or Extensional Database).
  • Atom in head (conclusion) is the result.
  • Q1 is contained in Q2 if for every database D,
    Q1(D ) is a subset of Q2(D ).

37
Containment Mappings
  • Mapping from the variables of Q2 to the variables
    of Q1 that
  • 1. Turns the head of Q2 into the head of Q1.
  • 2. Turns every atom in the body of Q2 into some
    atom in the body of Q1.
  • Containment mapping exists iff Q1 is contained in
    Q2.

38
Example
  • Q1 answer(X,Y) - e(X,Y) e(Y,X)
  • Q2 answer(X,Y) - e(X,W) e(W,Z) e(Z,Y)

39
Datalog Programs
  • Collection of conjunctive queries.
  • Some predicates are EDB (stored).
  • Other predicates are IDB (intensional database
    defined by the CQs only).
  • One IDB predicate is the answer least
    fixedpoint of the CQs.

40
Example
  • path(X,Y) - edge(X,Y)
  • path(X,Y) - path(X,Z) edge(Z,Y)
  • Value of path is all pairs of nodes such that
    there is a path from the first to the second
    according to EDB predicate edge .

41
Optimization of Datalog Programs
  • Problem often the query asks for only a fraction
    of the answer.
  • e.g., find path(0,Y) for fixed node 0.
  • Bottom-up methods (seminaive) evaluate the
    whole least-fixedpoint, throw most away.
  • Top-down (goal seeking) can get stuck in
    left-recursion.

42
Linear-Recursion Methods
  • Recursion is linear if at most one IDB atom in
    the body.
  • Henschen-Naqvi.
  • Magic-sets for linear (Bancilhon, Maier, Sagiv,
    Ullman).
  • Left- and right-linear special cases (Naughton).
  • Conversion of nonlinear to linear (Ramakrishnan,
    Sagiv, Ullman, Vardi).

43
Magic Sets
  • Several similar techniques for rewriting or
    executing Datalog programs in a way that avoids
    generation of useless facts.
  • Rohmer, Lescoeur, and Kerisit (earliest
    exposition).
  • Beeri and Ramakrishnan (reordering of atoms).
  • Sacca and Zaniolo (simplification of rules).
  • Dietrich and Warren (tabulation).
  • Vielle (query-subquery).

44
Bounded Recursion
  • Sometimes a recursive Datalog program is
    equivalent to a nonrecursive program.
  • Sufficient conditions (Naughton, Ioannidis,
    Naughton and Sagiv).
  • Undecidability (Gaifman, Mairson, Sagiv, and
    Vardi).
  • Decidable cases (Vardi, Cosmodakis, Gaifman,
    Kanellakis, and Vardi)

45
Containment for Datalog
  • Undecidable if one Datalog program is contained
    in another (Shmueli).
  • NP-complete whether a CQ is contained in a
    Datalog program.
  • Triply exponential whether a Datalog program is
    contained in a CQ (Chaudhuri and Vardi).

46
What Is It All Good For?
  • Dependencies and normalization are now familiar
    to most CS graduates.
  • Difference between BCNF and 3NF.
  • Tradeoffs lossless joins versus maintaining
    dependencies.
  • Magic-sets, recursion used in IBMs DB/2.
  • Universal-relation interfaces despite Codd.

47
More What Is It All Good For?
  • Information integration . While logical query
    languages have not caught on, logic has been used
    to specify how legacy databases are combined into
    a uniform whole.
  • Tsimmis (Papakonstantinou, Vassalos).
  • Information Manifold (Levy).
  • Infomaster (Duschka, Genesereth).
Write a Comment
User Comments (0)
About PowerShow.com