Henning Christiansen - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Optimizing integrity checking in data integration
systems with simplification techniques
2
Simplification
  • A technique for improving the efficiency of integrity
    checking in traditional databases
  • Specializes ICs for a specific update, assuming
    that the DB is consistent before the update
  • Suggested by J.-M. Nicolas, 1982
  • Never became part of standard (R)DBMSs
  • Elaborated recently by the present authors
  • Uncovered theoretical limitations of simp.
  • General and powerful methods developed
  • Typically an order of magnitude gained (or more)

3
This talk
  • Considering how simplification can be used for
    integrity checking and maintenance in data
    integration systems/integrated databases

4
Picture of a DI system with ICs
[Figure: autonomous sources S1, ..., Sn with local constraints IC1, ..., ICn and cross-source constraints; messages about updates performed are reported to a virtual global DB with ICglobal.]
Trusted: IC1,...,ICn, cross-s. c.
Desired: ICglobal
5
General definitions
  • Database D = ⟨F, Δ, Γ⟩
  • F: DB facts
  • Δ: trusted constraints, F ⊨ Δ (IC1,...,ICn,
    cross-s. c.)
  • Γ: constraints to be checked (unfolded version of
    global ICs)
  • Update: a set of literals (add, delete)
  • A ∈ U ⇒ A ∉ F; ¬A ∈ U ⇒ A ∈ F
  • Composition, negation and application of updates
  • U ∘ V, ¬U, F ∘ U, D^U = ⟨F ∘ U, Δ, Γ⟩
  • Props: F ∘ (U ∘ V) = (F ∘ U) ∘ V, F ∘ U ∘ ¬U = F
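A minimal sketch of the update operations on this slide, assuming facts are plain strings and an update is a set of (atom, add?) pairs. The names `applicable`, `apply_update` and `negate` are hypothetical, not taken from the authors' implementation. It illustrates that applying an update and then its negation gives back the original facts.

```python
from typing import FrozenSet, Tuple

Atom = str
Literal = Tuple[Atom, bool]  # (atom, True) = add the atom, (atom, False) = delete it


def applicable(facts: FrozenSet[Atom], update: FrozenSet[Literal]) -> bool:
    """Additions must be new to F; deletions must target facts already in F."""
    return all((atom not in facts) if add else (atom in facts)
               for atom, add in update)


def apply_update(facts: FrozenSet[Atom], update: FrozenSet[Literal]) -> FrozenSet[Atom]:
    """Application of an update: remove the deleted atoms, insert the added ones."""
    added = {atom for atom, add in update if add}
    deleted = {atom for atom, add in update if not add}
    return frozenset((facts - deleted) | added)


def negate(update: FrozenSet[Literal]) -> FrozenSet[Literal]:
    """Inverse update: every addition becomes a deletion and vice versa."""
    return frozenset((atom, not add) for atom, add in update)


# Illustrative data (assumed, not from the slides)
F = frozenset({"q(a)", "r(a)"})
U = frozenset({("p(a)", True), ("q(a)", False)})
F_after = apply_update(F, U)
```

Applying `negate(U)` to `F_after` restores `F`, which is the second property listed above.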

6
Simplification framework
Works for denial constraints, e.g. ← p(x) ∧ ¬∃y q(x,y)
  • Parameterized update patterns, i.e. do simp. at design time
  • Cases of recursion
  • Constraints over aggregate values
  • SQL type updates
Produces convincing results
Implemented, available on the web
  • After^U(IC): constraints which in any DB D
    evaluate to the same truth value as IC in D^U
  • Example: After^{p(a)}(← p(x) ∧ q(x)) equiv
  • ← (p(x) ∨ x=a) ∧ q(x)
  • Optimize_Δ(Γ): a best Σ so that
  • for any D with D ⊨ Δ, D ⊨ Γ iff D ⊨ Σ
  • See D. Martinenghi's PhD thesis for
  • analysis of what "best" means
  • a capable implementation of Optimize
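The After example above can be checked by brute force. The sketch below is an illustration only, assuming a two-element domain and unary relations p and q stored as Python sets. It compares the rewritten denial, evaluated in the old state, against the original denial evaluated after inserting p(a). All function names are hypothetical.

```python
from itertools import combinations

DOMAIN = ("a", "b")  # assumed tiny domain, enough to enumerate every state


def subsets(xs):
    """All subsets of xs, as Python sets."""
    return [set(c) for n in range(len(xs) + 1) for c in combinations(xs, n)]


def ic_holds(p, q):
    """Denial: no x with p(x) and q(x)."""
    return not any(x in p and x in q for x in DOMAIN)


def after_holds(p, q):
    """Rewritten denial: no x with (p(x) or x = a) and q(x)."""
    return not any((x in p or x == "a") and x in q for x in DOMAIN)


def equivalent_everywhere():
    """After, checked in the old state, must match the IC in the updated state."""
    for p in subsets(DOMAIN):
        for q in subsets(DOMAIN):
            if after_holds(p, q) != ic_holds(p | {"a"}, q):
                return False
    return True
```

The check covers every combination of p and q over the assumed domain, so it only supports the equivalence for this example and is not a general proof.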

7
Exercise 1: Simplification for efficient IC
enforcement in mono DB
  • Example
  • ← p(x) ∧ q(x)
  • U = {p(a)}
  • Simplified check: ← q(a)
  • Given DB D = ⟨F, Γ⟩ and update U:
  • if D ⊨ Optimize_Γ(After^U(Γ)) then
  • perform U
  • else
  • reject U

I.e., the ICs are optimized under the assumption that the
current DB is consistent; the test runs before the update, so
no "bad" update is ever executed! Tested on a wide range
of examples: it works.
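The check-then-update procedure of this exercise can be sketched as follows, assuming unary relations p and q held as Python sets. The function names are hypothetical, and the simplified check is hard-coded for this one example rather than produced by an Optimize procedure.

```python
def full_check(p, q):
    """The original denial: no value may be in both p and q."""
    return p.isdisjoint(q)


def simplified_check(q, a):
    """Simplified test for inserting p(a) into a consistent DB: q(a) must not hold."""
    return a not in q


def try_insert_p(p, q, a):
    """Run the test before the update; perform the update only if the test passes."""
    if simplified_check(q, a):
        return p | {a}, True
    return p, False


# Illustrative consistent state (assumed data)
p, q = {"b"}, {"c"}
p_ok, accepted = try_insert_p(p, q, "a")     # q(a) does not hold: update performed
p_same, performed = try_insert_p(p, q, "c")  # q(c) holds: update rejected
```

The simplified test looks up a single fact, whereas `full_check` compares the two relations in full; that difference is where the saving comes from.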
8
Exercise 2: Optimal IC check of DI system at
"integration time"
  • Given DI database D = ⟨F, Δ, Γ⟩
  • F: DB facts
  • Δ: trusted constraints, F ⊨ Δ (IC1,...,ICn,
    cross-s. c.)
  • Γ: constraints to be checked (unfolded version of
    global ICs)
  • D consistent iff F ⊨ Optimize_Δ(Γ)

Example: ICi = ← pi(x) ∧ qi(x),
i = 1, 2, global. Global p, q = union of local
ones. With Δ = IC1 ∪ IC2, simp. check is
← p1(x) ∧ q2(x), ← p2(x) ∧ q1(x). With Δ = IC1 ∪ IC2 ∪
"sources disjoint", simp. check is true, i.e.,
integration can't go wrong.
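The two-source example can be checked exhaustively on a small domain. The sketch below is an illustration, assuming each source holds unary relations as Python sets. It confirms that whenever both local ICs hold, the two cross checks agree with the full global check. Function names are hypothetical.

```python
from itertools import combinations, product

DOMAIN = ("a", "b")  # assumed tiny domain for exhaustive enumeration


def subsets(xs):
    """All subsets of xs, as Python sets."""
    return [set(c) for n in range(len(xs) + 1) for c in combinations(xs, n)]


def global_check(p1, q1, p2, q2):
    """Global denial over the unions p = p1 + p2 and q = q1 + q2."""
    return (p1 | p2).isdisjoint(q1 | q2)


def simplified_check(p1, q1, p2, q2):
    """Only the cross-source combinations remain to be tested."""
    return p1.isdisjoint(q2) and p2.isdisjoint(q1)


def agree_under_trusted_ics():
    """Compare both checks on every state in which IC1 and IC2 hold."""
    for p1, q1, p2, q2 in product(subsets(DOMAIN), repeat=4):
        if not (p1.isdisjoint(q1) and p2.isdisjoint(q2)):
            continue  # a trusted constraint is violated; outside the assumption
        if global_check(p1, q1, p2, q2) != simplified_check(p1, q1, p2, q2):
            return False
    return True
```

If the sources are additionally known to share no values, both cross checks hold trivially, which matches the "simp. check is true" case.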
9
Maintain consistent view of DI system using
correction table (preliminary work; no
practical experience)
[Figure: autonomous sources S1, ..., Sn with local constraints IC1, ..., ICn and cross-source constraints; virtual corrections to the sources feed a virtual global DB with ICglobal.]
Task: Maintain a correction table so that ICglobal
holds => provide a consistent global view
NB: Embury et al., 2001, have made an extensive study
of CTs, but without simplification
10
Correction table, CT
  • ← p(x) ∧ q(x), ← r(x) ∧ ¬q(x)
  • F = {p(a), q(a), r(a)}
  • Repairing instance ← p(a) ∧ q(a) by
  • CT = {... ¬q(a) ...}
  • creates another failing instance ← r(a) ∧ ¬q(a)

Known result; easy to find examples
  • Def.: A CT for a database D = ⟨F, Δ, Γ⟩ is an update
    R such that D^R ⊨ Δ ∪ Γ.
  • R is minimal if no proper subset of R is a CT.
  • Informally: a CT is a virtual update which, if
    executed, would restore consistency
  • Problem statement: how to produce a CT and how to
    maintain it incrementally when updates are
    reported from the sources

Intuitively: simp. removes all traces of the
update, so we also need to consider CTs that
undo part of the update (no time for an example; more
later)
  • Problems
  • Exponentially many (minimal) CTs
  • Correcting one problem may cause another
  • Generating CTs from simplified checks only does
    not give us all relevant CTs
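For the three-fact example on this slide, the minimal CTs can be found by brute force. This is a sketch for illustration only, assuming a CT is represented by the set of atoms whose membership it flips. The enumeration is exponential and is not the method proposed in the talk. Function names are hypothetical.

```python
from itertools import combinations

ATOMS = ("p(a)", "q(a)", "r(a)")


def consistent(facts):
    """Both denials for x = a: not (p and q), and not (r and not q)."""
    p, q, r = (atom in facts for atom in ATOMS)
    return not (p and q) and not (r and not q)


def flip(facts, atoms):
    """Virtual update: delete the listed atoms that are present, add the absent ones."""
    return facts ^ set(atoms)


def minimal_cts(facts):
    """Enumerate candidates by increasing size; skip supersets of CTs already found."""
    found = []
    for size in range(len(ATOMS) + 1):
        for atoms in combinations(ATOMS, size):
            candidate = set(atoms)
            if any(ct < candidate for ct in found):
                continue  # a proper subset is already a CT, so this one is not minimal
            if consistent(flip(facts, candidate)):
                found.append(candidate)
    return found


F = {"p(a)", "q(a)", "r(a)"}
cts = minimal_cts(F)
```

All three atoms are in F here, so every flip is a deletion. Deleting q(a) alone is rejected because it makes the second denial fail, which is the effect described in the first bullets.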

11
Relating CTs to simplification
  • Assume updated state D^U with R' being a CT for D.
  • Let Θ be a set of constraints with D^U ⊨ Θ.
  • Then R is a CT for D^U iff
  • D^U ⊨ Optimize_Θ(After^R(Δ ∪ Γ))
  • where Θ are all constraints we know hold in D^U
  • After?UoR'(?) ? U ? ? ? After?U (?)
  • ? After?UoR' (?)

12
Special case: consistently signed ICs
  • Def. consist. signed: no predicate occurs both as
    ← ...p(...)... and as ← ...¬p(...)...
  • Let Θ be as in previous slide
  • Σ = Optimize_Θ(Γ) .... (depends on old R' and
    U)
  • S = collect one literal from each instance
    σ ∈ Σ
  • with D^U ⊭ σ
  • Expected property:
  • Any minimal CT is a subset of ¬U ∪ ¬S

  • Example -->

13
Example
Notice: Σ evaluated, not Γ
  • Γ = ← p(x) ∧ q(x)
  • U = {p(a), p(b)}
  • Σ = ← q(a), ← q(b)
  • S = {q(a), q(b)}
  • ¬U ∪ ¬S =
  • {¬p(a), ¬p(b), ¬q(a), ¬q(b)}
  • Practical version: dialogue with data
    verification agent, e.g., human expert, "voting",
    rules of thumb (e.g., AGM postulates)
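The candidate set in this example can be computed mechanically. The sketch below is specific to this one constraint and assumes that q(a) and q(b) both hold in the updated database, which the failing instances on the slide imply. The function name and the string encoding of literals are hypothetical.

```python
def candidate_literals(inserted, q_facts):
    """Negated update literals plus one negated literal per failing simplified check.

    inserted: constants c for which p(c) was added by the update
    q_facts:  constants c for which q(c) holds in the updated database
    """
    update = {f"p({c})" for c in inserted}
    # The simplified denial for q(c) fails exactly when q(c) is a fact.
    failing = {f"q({c})" for c in inserted if c in q_facts}
    return {"not " + literal for literal in update | failing}


candidates = candidate_literals(["a", "b"], {"a", "b"})
```

Under the expected property for consistently signed ICs, any minimal CT would be drawn from `candidates`; choosing among them is left to the verification agent mentioned above.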

14
Maintenance of CTs, general case
  • I.e., dropping the consistently-signed requirement
  • We can suggest a similar algorithm, which requires
  • repeated integrity checks
  • repeated runtime application of the simp. procedure
  • For practical purposes: an engineering job ahead
  • keep track of signs and trace changes
  • partial evaluation, etc.
  • to generate a sort of decision tree with
    preproduced simp. checks

15
Conclusion
  • Simplification is a technique that can cut the cost of
    integrity checking by orders of magnitude
  • We have demonstrated
  • effective and general simp. methods are possible
  • simplification is relevant for DI systems
  • Future work
  • practical, large-scale implementations, both mono
    and DI (??)
  • allow value modifications in CT (à la J. Wijsen)
  • Further reading
  • Simplification, theory and methods: DM's PhD
    thesis 2005; HC & DM, Fund. Inf. 2006
  • Simp. and DI: HC & DM, FoIKS'04, LAAIC'06