Title: Extending and Inferring Functional Dependencies in Schema Transformation
1Extending and Inferring Functional Dependencies
in Schema Transformation
- Qi He Tok Wang Ling
- Dept. of Computer Science
- School of Computing
- National Univ. of Singapore
2Agenda
- Background
- Schematic discrepancy and schema transformation
- Qualified FD
- Inference rules of qualified FD
- Propagation rules of qualified FD during schema
transformation - Related work
- Conclusion
3Motivation
- Functional dependencies (FDs) are useful
- Ensure the integrity of data in insertion,
deletion and update - Check lossless join decomposition
- Normalize relations
- Optimize queries
4Issues on FDs in Individual DB
- Inference rules of FDs in fixed relations
- Derivation of dependencies on views from those on
original relations - How about in multidatabase systems?
5FDs in Multidatabase
- Multidatabase is distributed (i.e., data may be
divided and stored in several databases) and
heterogeneous (i.e., similar data may be
represented in quite different forms in component
databases) - The distribution and heterogeneity of
multidatabase require - Complex schema transformations rather than the
relational algebra to integrate database schemas - A flexible expression of FDs to represent
dependencies in component databases - Consequently, the expression and derivation of
FDs in multidatabase are more difficult than in
individual databases
6Agenda
- Background
- Schematic discrepancy and schema transformation
- Qualified FD
- Inference rules of qualified FD
- Propagation rules of qualified FD during schema
transformation - Related work
- Conclusion
7Schematic discrepancy
- A semantic heterogeneity among component schemas
- Schematic discrepancy occurs when one DBs
attribute values correspond to schema labels
(attribute names or relation names) in others - See next page for an example
8DB1 Supply DB1 Supply DB1 Supply DB1 Supply
product supplier month price
p1 s1 Jan 100
p1 s1 Feb 105
DB2 Supply DB2 Supply DB2 Supply DB2 Supply DB2 Supply DB2 Supply
product supplier Jan Feb Dec
p1 s1 100 105 97
p1 s2 99 107 110
fold(Supply,month,price)
unfold(Supply,month,price)
split(Supply,supplier)
unite(s1,,sn,supplier)
split(Supply,supplier)
unite(s1,,sn,supplier)
DB3 s1 DB3 s1 DB3 s1
product month price
p1 Jan 100
p1 Feb 105
s2
product month price
p1 Jan 99
p1 Feb 107
DB4 s1 DB4 s1 DB4 s1 DB4 s1 DB4 s1
product Jan Feb Dec
p1 100 105 97
s2
product Jan Feb Dec
p1 99 107 110
unfold(si,month,price)
fold(si,month,price) (i 1,2, )
- Supplier names modeled as attribute values in DB1
and DB2, but relation names in DB3 and DB4 - Months modeled as attribute values in DB1 and
DB3, but attribute names in DB2 and DB4
9Schematic Discrepant Transformation
- We call a transformation between schematic
discrepant schemas schematic discrepant
transformation - A schematic discrepant transformation can be
implemented by a sequence of restructuring
operators (unfold, fold, split and unite 5). - Unfold transforms attribute values to attribute
names, and fold does the converse. - Split transforms attribute values to relation
names, and unite does the converse.
5 L. V. S. Lakshmanan, F. Sadri, and S. N.
Subramanian. On efficiently implementing
schemaSQL on SQL database system. VLDB, 1999,
471-482.
10Agenda
- Background
- Schematic discrepancy and schema transformation
- Qualified FD
- Inference rules of qualified FD
- Propagation rules of qualified FD during schema
transformation - Related work
- Conclusion
11Qualified FD (E.g.)
Jan
product supplier price
Feb
product supplier price
- Constraint in the 1st quarter, the products
prices supplied by supplier s1 are uniquely
determined by the products. - That is, in the union R ?Mi?Jan, Feb,
Mar(?supplier s1Mi), the FD product?price
holds. - We represent this constraint as a qualified FD as
follows
the dependency holds in the union of the
relations Jan, Feb and Mar
the dependency holds when supplier is s1
Jan, Feb, Mar(product, supplierss1 ? price)
called qualification attribute
12Qualified FD
- FDs holding over a set of relations or a set of
horizontal partitions of relations - If a qualified FD only contains regular
attributes and holds in one relation, then it is
just a conventional FD. - Flexible to represent dependencies in distributed
and heterogeneous schemas
13Inference rules of qualified FDs
- Let F be a set of qualified FDs and f be a
qualified FD for the schemas of a set of
relations R . We say F implies f, if every
instance of R that satisfies F also satisfies f. - We define F to be set of qualified FDs implied
by F. - Inference rules will allow us to deduce all the
true qualified FDs for R , i.e., those in F
14Inference Rules
- Given X a mixed set of regular and qualification
attributes, Y and Z sets of regular attributes, A
an attribute, R a set of relations, R1 ? R, D1?
D? dom(A), we have - (A1) Partition on relation set. If R (X?Y) holds,
then R1 (X?Y) holds. - (A2) Composition on relation set. If Ri,
Rj(X?Y) holds for any Ri, Rj ? R, then R (X?Y)
holds. - (A3) Partition on qualification. If R (AsD, X?Y)
holds, then R (AsD1, X?Y) holds. - (A4) Composition on qualification. If R
(A?ai,aj, X?Y) holds for any ai, aj?D, then R
(A?D, X?Y) holds. - (A5) Single-valued qualification. If a ? dom(A),
then R (Asa?A) holds. - (A6) Assembly. If R (Asa, X?Y) holds for each
a ? D, then R (AsD, A, X?Y) holds.
15Inference Rules (cont.)
- (A7) Reflexivity. If Y ? X, then R (X?Y) holds.
- (A8) Augmentation. If R (X?Y) holds, then R (X,
Z?Y, Z). - (A9) Transitivity. If R (X?Y) and R (X1, Y?Z)
hold, where X1 is a set (possibly an empty set)
of some qualification attributes in X, then R
(X?Z) holds. - (A10) Dummy qualification. R (X?Y) iff R
(Asdom(A), X?Y) holds. - A useful rule derived from the above inference
rules - (A11) Disassembly. R (A, X?Y) holds iff R
(Asa, X?Y) holds for each a ? dom(A)
16Inference Rules
- Rule A5 and A7 give trivial dependencies
- Rule A7, A8 and A9 extend Armstrong's Axiom, the
inference rules for FDs - The rules are sound and complete to infer
qualified FDs in fixed relations
17Inference Rule of Disassembly (E.g.)
Supply Supply
product supplier month price price
Original FD product, supplier,
month?price equivalent to product,
supplierssi, month?price for each
si?dom(supplier) equivalent to product,
supplierssi, monthsmj?price for each
si?dom(supplier) and mj ?dom(month)
18Agenda
- Background
- Schematic discrepancy and schema transformation
- Qualified FD
- Inference rules of qualified FD
- Propagation rules of qualified FD during schema
transformation - Related work
- Conclusion
19Propagation Rules
- Let R be a set of original relations, and S be
a set of relations transformed from R by
applying some restructuring operator (i.e.,
unfold, fold, split or unite). - From a given set of qualified FDs F for R ,
propagation rules allow us to deduce the
qualified FDs for S. - We give the rules for split/unite and unfold/fold
in a pairwise way.
20Propagation Rules for split/unite
b1(A1, , An)
split(R, B)
where dom(B) b1, , bm
R(A1, , An, B)
unite(b1,, bm, B).
bm(A1, , An)
Let X be a mixed set of regular and qualification
attributes from A1, , An, Y ? A1, , An be a
set of regular attributes, and R ? b1, , bm be
a set of relation names. We have the following
rule (P1) R(BsR , X?Y) holds iff R (X?Y)
holds.
21Propagation Rules for split/unite
- The qualification on the values of attribute B in
a qualified FD becomes the qualification on the
relation set over which the inferred qualified FD
holds, as B values become relation names in the
transformed schemas. - unite is a qualified FD preserving
transformation, but split is not - The FD R(X ? B) will not be transformed to any
dependency in application of split
22Propagation Rules for unfold/fold
unfold(Ri, B, C) for each i 1, , k
R1(A1, , An, B, C)
S1(A1, , An, b1, , bm)
fold(Si, B, C) for each i 1, , k
Sk(A1, , An, b1, , bm)
Rk(A1, , An, B, C)
where dom(B) b1, , bm, and the values of
attributes bi are from dom(C)
Let X be a mixed set of regular and qualification
attributes from A1, , An, Y ? A1, , An be a
set of regular attributes. Let R ?R1, , Rk and
S ? S1, , Sk, such that the relations of S
are transformed from R. We have the following
rules (P2) R (Bsbi, X?C) holds iff S(X?bi)
holds. (P3) R (Bsbi, X, C?Y) holds iff S(X,
bi?Y) holds. (P4) R (X?Y) holds iff S(X?Y)
holds.
23Propagation Rules for unfold/fold
- The rules are based on a set of unfold/fold
operators because some qualified FDs may hold
over a set of relations which are transformed
together by unfold/fold operations. - Rule P2 and P3 indicate that the qualification on
the value of attribute B in a qualified FD
becomes the qualification on the attribute name
in the inferred qualified FD. - Both fold and unfold are not qualified FD
preserving transformations.
24Derive Qualified FDs in Schema Transformation
- Let R be a set of original relations, and S be
a set of relations transformed from R by
applying a sequence of restructuring operators - From a given set of qualified FDs F for R , we
deduce the qualified FDs for S, using both
inference rules and propagation rules
25Derive Qualified FDs in Schema Transformation
(E.g.)
fold(si,month,price) (i1,2,)
unite(s1,,sn,supplier)
DB3 s1 DB3 s1 DB3 s1
product month price
s2
product month price
DB4 s1 DB4 s1 DB4 s1 DB4 s1 DB4 s1
product Jan Feb Dec
s2
product Jan Feb Dec
DB1 Supply DB1 Supply DB1 Supply DB1 Supply
product supplier month price
Given qualified FDs in DB4 si(product ?
Jan,,Dec) for each si s1, s2,,sn After fold,
in DB3, we get (by propagation rule
P2) si(product, monthsmj ? price) for each si
s1, s2,,sn and mj Jan, , Dec then (by
disassembly rule A11) si(product, month ?
price) After unite, in DB1, we get (by
propagation rule P1) Supply(product, month,
supplierssi ? price) for each si s1,
s2,,sn then (by disassembly rule A11)
Supply(product, month, supplier ? price)
26Derive Qualified FDs in Schema Transformation
(E.g.)
- In the above example, the qualified FD on the
original relations are transformed to an
equivalent qualified FD on the transformed
relation. - Directly applying the inference rules and
propagation rules need the computation of
qualified FD closure, which take much time. - Instead, in the paper, we gave some derived rules
to propagate qualified FDs in schema
transformation.
27Related work (1)
- An extension to FDs in the database design world
are FDs partially holding in a relation, in the
sense that only some tuples, called exceptions,
break the dependencies. These dependencies
include weak FDs 7, afunctional dependencies
3 and partial FDs 2.
2 F. Berzal, J. C. Cubero, F. Cuenca, J. M.
Medina. Relational decomposition through partial
functional dependencies. Data Knowledge
Engineering 43(2), 2002, 207-234. 3 P. De Bra
and J. Paredaens. Conditional dependencies for
horizontal decompositions. ICALP, 1983. 7 Tok
Wang Ling. Extending classical functional
dependencies for physical database design
(lecture notes), 2001. http//www.comp.nus.edu.sg/
lingtw/cs4221/extended.fds.pdf
28Related work (1)
- Differences between weak FD (or some other
similar dependencies) and qualified FD - A weak FD predicates that some tuples (but dont
know which tuples) in a relation would violate
the dependency, while a qualified FD indicates
exactly what kind of tuples satisfy the
dependency. - Those dependencies are specified on individual
relations, while qualified FDs may hold over a
set of relations. - We are not aware of any axiomatization for weak
FD, afunctional dependency or partial FD.
29Related work (2)
- In individual databases, the issue of inferring
FDs on views from FDs on original relations has
been introduced in 1, 4. - They did not consider schematic discrepancy in
schema transformation - Our work complements existing work based on the
relational algebra
1 S. Abiteboul, R. Hull, and V. Vianu.
Foundations of Databases. Addison-Wesley, 1995,
173-187, 216-235. 4 G. Gottlob. Computing
covers for embedded functional dependencies.
SIGMOD, 1987.
30Related work (3)
- Some work 6, 8, 9 has been done on the
derivation of constraints in schema integration
based on ER or OO model. - They did not consider schematic discrepancy in
schema integration - They did not prove the completeness of their
methods, while we did.
6 Mong Li Lee and Tok Wang Ling. Resolving
constraint conflicts in the integration of ER
schemas. ER, 1997, 394-407. 8 M. P. Reddy, B.
E. Prasad, and A. Gupta. Formulating global
integrity constraints during derivation of global
schema. Data Knowledge Engineering, 1995,
241-268 9 M. W. W. Vermeer and P. M. G. Apers.
The role of integrity constraints in database
interoperation. VLDB, 1996, 425-435
31Conclusion
- Proposed qualified FD to represent dependencies
for distributed and heterogeneous schemas in a
multidatabase system - Found a set of inference rules and propagation
rules to derive qualified FDs in schematic
discrepant transformation - The rules are sound and complete to derive some
classes (not all) of qualified FDs in schema
transformation
32Reference
- 1 S. Abiteboul, R. Hull, and V. Vianu.
Foundations of Databases. Addison-Wesley, 1995,
173-187, 216-235. - 2 F. Berzal, J. C. Cubero, F. Cuenca, J. M.
Medina. Relational decomposition through partial
functional dependencies. Data Knowledge
Engineering 43(2), 2002, 207-234. - 3 P. De Bra and J. Paredaens. Conditional
dependencies for horizontal decompositions.
ICALP, 1983. - 4 G. Gottlob. Computing covers for embedded
functional dependencies. SIGMOD, 1987. - 5 L. V. S. Lakshmanan, F. Sadri, and S. N.
Subramanian. On efficiently implementing
schemaSQL on SQL database system. VLDB, 1999,
471-482. - 6 Mong Li Lee and Tok Wang Ling. Resolving
constraint conflicts in the integration of ER
schemas. ER, 1997, 394-407. - 7 Tok Wang Ling. Extending classical functional
dependencies for physical database design
(lecture notes), 2001. http//www.comp.nus.edu.sg/
lingtw/cs4221/extended.fds.pdf - 8 M. P. Reddy, B. E. Prasad, and A. Gupta.
Formulating global integrity constraints during
derivation of global schema. Data Knowledge
Engineering, 1995, 241-268 - 9 M. W. W. Vermeer and P. M. G. Apers. The role
of integrity constraints in database
interoperation. VLDB, 1996, 425-435