Title: Efficient Query Rewriting for Inconsistent Databases
1Efficient Query Rewriting for Inconsistent
Databases
- Ariel D. Fuxman, Renée J. Miller
- Dept. of Computer Science
- University of Toronto
2Motivation
and well after it is created, we realize that
some constraint is missing
- Suppose we create a database
Institute ! City
3Semantics
- How do we query an inconsistent database?
- Repairs of the database ABC99.
- Certain answers, where the possible worlds are
the repairs of the database.
4Repairs
Repair 1
Inconsistent database
Repair 2
Institute ! City
5Consistent Query Answering
- qGet cities where there is some institute
Repair 1
q
Repair 2
q
consistentS(q,I)(San Jose)
q(I)(Markham), (Toronto), (San Jose)
6Computation
- Goal Efficient computation of consistent query
answers. - avoid the explicit construction of the repairs.
- query the inconsistent database directly.
- reuse existing database technology.
7Query Rewriting
q,?
First-Order
Rewriting Algorithm
Q
RDBMS
- t 2 Q(I) iff t 2 consistent?(q,I), for every
instance I
8Problems
- Retrieving consistent answers is a hard problem
- coNP-hard in general for conjunctive queries and
one f.d. per relation. CM02,CLR03 - First-order query rewriting algorithms have been
given for quantifier-free queries ABC99 - Quantifier-free queries do not allow projections.
9Our Goals
- Develop a first-order query rewriting algorithm.
- Characterize a class of queries for which the
algorithm is correct. - Main requirement the class must capture most
queries that arise in practice.
10Setting
- The database is inconsistent wrt at most one key
dependency (primary key) per relation of the
schema. - Convention the key attributes of a relation are
underlined in the query. - q9 in,ci. R1(in,ci) Æ R2(ci,canada)
11Query Rewriting Example 1
- q(in) R1(in,toronto)
- Get institutes in Toronto
- Q(in) R1(in,toronto) Æ
- (8 ci.R1(in,ci) ! citoronto)
Institute ! City
12Query Rewriting Example 2
- q(in)9 ci. R1(in,ci) Æ R2(ci,canada)
- Get institutes in Canada
R2
R1
Institute ! City
City ! Country
13Query Rewriting Example 2
- q(in)9 ci. R1(in,ci) Æ R2(ci,canada)
R2
R1
- Q(in)9 ci.R1(in,ci) Æ R2(ci,canada) Æ
- (8 ci.R1(in,ci) !(9 co.R2(ci,co) Æ
- (8 co.R2(ci,co) ! cocanada)))
14Query Rewriting Example 2
- q(in)9 ci. R1(in,ci) Æ R2(ci,canada)
R2
R1
- Q(in)9 ci.R1(in,ci) Æ R2(ci,canada) Æ
- (8 ci.R1(in,ci) !(9 co.R2(ci,co) Æ
- (8 co.R2(ci,co) ! cocanada)))
15SQL Rewriting
- Schemas R1(IN,CI) R2(CI,CO)
- SELECT r1.IN
- FROM R1 r1, R2 r2
- WHERE r1.CIr2.CI AND r2.COcanada
- AND NOT EXISTS
- (SELECT FROM R1 r1 WHERE r1.INr1.IN AND
- (NOT EXISTS (SELECT FROM R2 r2
- WHERE r1.CIr2.CI)
- OR (EXISTS (SELECT FROM R2 r2
- WHERE r1.CI r2.CI
- AND r2.CO ltgtcanada))))
16Class of queries
- For what class of conjunctive queries does the
algorithm work?
17Join graph
- q9 x,y,z,w R1(x,y) Æ R2(y,z) Æ R3(z,w) Æ R4(y,a)
R1
R2
R3
R4
Condition The join graph of the query is a tree.
18A practical class
- Condition join graph is a tree
- Allows key-to-key and non-key to key joins
- The query must be acyclic
- We capture many queries that arise in practice.
- 20 out of 22 queries in the TPC-H standard are
within the class.
19Future Work
- Theory
- Incompleteness inconsistency
- Find a dichotomy for the class of conjunctive
queries. - Find a rewriting for queries in more expressive
query languages. - Inclusion dependencies
- Practice
- Implement the rewriting for SQL.
- Experiment with the performance of the
translation - Provide optimization techniques
- Integrate the techniques into a data exchange
environment
20Bibliography
J. Chomicki and J. Marcinkowski. On the
Computational Complexity of Consistent Query
Answers. coRR cs.DB/0204010, 2002. M. Arenas, L.
Bertossi, and J. Chomicki. Consistent Query
Answers in Inconsistent Databases, Proc. ACM
PODS, 1999. Andrea Calì, Domenico Lembo,
Riccardo Rosati. On the decidability and
complexity of query answering over inconsistent
and incomplete databases, Proc. ACM PODS,
2003. Oliver Duschka, Michael Genesereth. Query
Planning with Disjunctive Sources. AAAI Workshop
on AI and Information Integration.
21Rewritable queries
- The problem of computing consistentS(q,I) is
not tractable for the following queries. - q9 x,z,y R(x,y,a) Æ R(z,y,b)
- q9 x,y,z R(x,z) Æ R(x,y) Æ R(z,y)
- Notice that both queries have joins between
non-key attributes.
22Rewritable queries
- The query we used in the example only has joins
between a non-key and a key attribute - q(in)9 ci R1(in,ci) Æ R2(ci,US)
- This is the most common situation in practice.
23A class of rewritable queries
- Theorem Let q be a conjunctive query containing
only joins between non-key and key attributes.
Then, RewriteConjunctive produces a first-order
rewriting Q of q. Furthermore,
RewriteConjunctive runs in polynomial time in the
size of q.
24Applications
- Data Integration
- Data Exchange (Clio,)
- Data Cleaning (dynamic data cleaning)
- Peer-to-peer data exchange
25Example A tractable query
Are there two employees with the same salary?
Inconsistent instance
Graph of the inconsistent instance
1000
John
Mary
2000
Anna
3000
Employee ! Salary
26Example A tractable query
)
1000
John
Mary
2000
Anna
3000
27Example A tractable query
)
1000
John
Mary
2000
Anna
3000
28Example A tractable query
(
1000
John
Mary
2000
Anna
3000
29Example A tractable query
(
1000
John
Mary
2000
Anna
3000
30Generalization
- Arbitrary conjunctive queries with inequalities
on one binary relation.
- Reduction to degree-constrained subgraph problem,
which can be solved by network-flow techniques.
The running time of the algorithm is O(nk2).
31Practical Considerations (I)
Conflicts are usually confined to a small portion
of the database
Robert
4000
Fred
5000
Paul
6000
7000
Peter
1000
John
2000
Mary
Anna
3000
32Practical Considerations (I)
Conflicts are usually confined to a small portion
of the database
1000
John
2000
Mary
Anna
3000
33Practical Considerations (II)
Reasonable assumption constant number of
conflicts per key.
Financial
Global Database
Employee ! Salary
Human Resources
Employee ! Salary
Employee ! Salary
34Practical Considerations (II)
- Assuming bounded number of conflicts per key,
- queries with some free variables are easier than
boolean queries! - Are there three employees with the same salary?
- Get all employees who have the same salary as two
other employees - Q can be solved in O(n2), as opposed to O(n3)
for Q
35Motivation (Data Exchange)
Source 1 Institute City
Target Institute City
Institute ! City
Institute ! City
Source 2 Institute City
Institute ! City
36Motivation (Data Exchange)
Source1
Target
Institute ! City
Source2
Institute ! City
Mapping Source1(in,ci) µ Target(in,ci) Source2(in,
ci) µ Target(in,ci)
Institute ! City
37Motivation (Data Exchange)
Source1
Target
Institute ! City
Source2
Institute ! City
Institute ! City
38Motivation (Data Exchange)
- Approaches in data exchange
- Say that there is no target database for the
given sources. - Construct a target database, disregarding the
target constraints. The database may be
inconsistent wrt the constraints.
Institute ! City
39Inconsistent Databases
- Both examples have in common that we end up with
a database that is inconsistent wrt a set of
constraints.
40Repairs
- Definition Let S be set of integrity
constraints, and I be database instance. We say
that a database instance I is a repair of I wrt
S iff the following conditions hold - I ² S, and
- D(I,I) is minimal under set inclusion in the
class of instances that satisfy S, that is, there
is no instance I' such that I' ² S and
D(I,I) ½ D(I,I)
41Consistent Query Answers
- Definition Let I be a database instance,
possibly not satisfying a set S of integrity
constraints. Let q be a query. We say that a
tuple t is a consistent answer wrt S, denoted t 2
consistentS(q,I), if for every repair I of I wrt
S, t 2 q(I).
42Query Rewriting An Example
- q(in)9 ci R1(in,ci) Æ R2(ci,US)
- Q(in)9 ci R1(in,ci)
R2
R1
in
43Query Rewriting An Example
- q(in)9 ci R1(in,ci) Æ R2(ci,US)
- Q(in)9 ci R1(in,ci) Æ
- (8 ciR1(in,ci) !
R2
R1
ci
in
44Query Rewriting An Example
- q(in)9 ci R1(in,ci) Æ R2(ci,US)
- Q(in)9 ci R1(in,ci) Æ
- (8 ciR1(in,ci) !(9 coR2(ci,co)
R2
R1
ci
in
45Query Rewriting An Example
R2
R1
ci
co
in
46Motivation (Data Cleaning)
- We could clean the database
but we might miss some answers. Is TO Lab an
IBM institute?
YES
NO
Institute ! City
47Motivation (Data Cleaning)
- Or we may not have enough information to do the
cleaning
Institute ! City
48Repairs
Repairs must differ minimally from the
inconsistent database
The following is not a repair
Inconsistent database
Institute ! City
49Consistent Query Answers
Repair 1
q
Repair 2
q
- consistentS(q,I)(TO Lab),(Almaden)
50Consistent Query Answering
- Based on the notion of certain answers.
- But the possible worlds are the repairs.
- Intuition
- Input Query q.
- Apply q to every repair, and obtain a query
result from each one. - We say that a tuple t is in the consistent answer
if it appears in every query answer.
51SQL Rewriting
- Schemas R1(IN,CI) R2(CI,CO)
- SELECT r1.IN
- FROM R1 r1 JOIN R2 r2 ON r1.CIr2.CI
- WHERE r2.CO canada AND
- NOT EXISTS
- (SELECT
- FROM R1 r1 JOIN R2 r2
- ON r1.CI r2.CI
- WHERE r1.INr1.IN AND r2.CO ltgtcanada)