Title: Schema mappings are logical assertions that
1SPIDER a Schema Mapping Debugger
Bogdan Alexe Laura Chiticariu Wang-Chiew
Tan University of California, Santa Cruz
Debugging a Data Exchange
Debugging Schema Mappings
Schema Mappings and Data Exchange
- Schema mappings are logical assertions that
- describe the correspondence between two
schemas - Higher-level, declarative programming constructs
- Hide implementation details, allow for
optimizations - Key elements in data exchange and data
integration systems - Data Exchange FKMP03
- Translate data conforming to a source schema S
into data conforming to a target schema T so that
the schema mapping M is satisfied
- The process of exploring, understanding and
refining a schema mapping through the use of
(test) data at the level of schema mappings
- Approach 1 At the level of the implementation
(unsatisfactory) - Specific to the exchange engine
- Specific to implementation language. E.g.,
XQuery, XSLT, etc - Commercial tools available Altova MapForce,
Stylus Studio, etc - Approach 2 At the level of the schema mapping
(desirable) - Currently, NO SUPPORT!!!
- Motivation for debugging at the level of the
schema mappings Uniformity in specifying and
debugging - Reduce programming effort by allowing a user to
specify and debug at the level of schema mappings
M
Source schema S
Target Schema T
SPIDER
Source instance I
Target instance J
M
routes
Source schema S
Target Schema T
tuple selection
tuple selection
Source instance I
Target instance J
- SPIDER is the first prototype
- debugger for schema mappings
XQuery/SQL/Java
Example
Main Idea Debugging Schema Mappings with Routes
Features
- Routes illustrate the relationship between source
and target data with the schema mapping - Declarative semantics, based on the logical
satisfaction of the dependencies - Independent of any implementation of the schema
mapping - Concept applies to any mapping-based data
exchange or data integration system
- Computing routes for selected source or target
data - Compute all routes
- Compute one route
- Compute alternative routes on demand
- Guided exploration of all routes
- Standard debugging features
- Breakpoints on dependencies
- Watch windows zoom into details about each step
in the routes - Schema-level exploration of routes
- Facilitates the understanding of schema mappings
directly at the level of source and target
schemas - Implementation details
- On top of the Clio data exchange system
- Supports relational and XML schema mappings
Source schema
Target schema
MANHATTAN CREDIT CardHolders cardNo
² limit ² ssn
² name ²
Dependents accNo ² ssn
² name ²
FARGO FINANCE Accounts ² accNo ² creditLine ²
accHolder Clients ² ssn ² name
D1
fk1
C1
D2
Debugging scenario 1
Debugging scenario 2
Source-to-target dependencies
Accounts
Accounts
D1 foreach s0 in MANHATTAN-CREDIT.CardHolders
exists t0 in FARGO-FINANCE.Accounts, t1 in
FARGO-FINANCE.Clients, where
t0.accHoldert1.ssn with s0.cardNo
t0.accNo and s0.ssn t0.accHolder and s0.name
t1.name
Unknown credit limit?
Unknown account number?
ID2
L2
A2
D2 foreach s0 in MANHATTAN-CREDIT.Dependents
exists t0 in FARGO-FINANCE.Clients with
s0.ssn t0.ssn and s0.name t0.name
A route for the Accounts tuple
Accounts
CardHolders
ID1
L1
123
D1
Target dependencies
Alice
ID1
15K
123
Clients
C1 foreach s0 in FARGO-FINANCE.Clients
exists t0 in FARGO-FINANCE.Accounts with s0.ssn
t0.accHolder
Alice
ID1
123 is not copied to the target as Bobs account
number
15K is not copied over to the target
Source instance I
Target instance J
D1 foreach s0 in MANHATTAN-CREDIT.CardHolders
exists t0 in FARGO-FINANCE.Accounts, t1 in
FARGO-FINANCE.Clients, where
t0.accHoldert1.ssn with s0.cardNo
t0.accNo and s0.ssn t0.accHolder and s0.name
t1.name and s0.limit t0.creditLine
CardHolders
Accounts
D2 foreach s0 in MANHATTAN-CREDIT.Dependents,
s1 in MANHATTAN-CREDIT.CardHold
ers where s0.accNos1.cardNo
exists t0 in FARGO-FINANCE.Clients, t1 in
FARGO-FINANCE.Accounts where t1.accHoldert0.ssn
with s0.ssn t0.ssn and s0.name t0.name
and s1.cardNo t1.accNo and
s1.limit t1.creditLine
Clients
Dependents
Solution for I under the schema mapping
Schema-level exploration of routes
Compute All Routes and Compute One Route
- Illustrate the schema mapping at the level of the
source and target schemas
Forest of routes for the Account tuple
Routes obtained from the forest
Accounts
MANHATTAN CREDIT CardHolders cardNo
² limit ² ssn
² name ²
Dependents accNo ² ssn
² name ²
FARGO FINANCE Accounts ² accNo ² creditLine ²
accHolder Clients ² ssn ² name
ID2
L2
A2
Clients
selected schema element
Bob
ID2
CardHolders
Accounts
D1
C1
Bob
ID2
7K
456
ID2
L2
A2
C1
D1
Accounts
fk1
C1
ID2
L3
456
D2
Clients
Bob
ID2
Dependents
Accounts
Clients
D1
D2
D2
C1
Bob
ID2
123
ID2
L2
A2
Bob
ID2
CardHolders
Dependents
Bob
ID2
7K
456
Bob
ID2
123
Towards a full-fledged debugger
- Compute all routes
- For each selected target tuple ts, consider every
possibility for witnessing t. Do not consider the
same tuple twice. - Complete, polynomial time algorithm
- The route forest is a polynomial representation
of all routes (possibly exponentially many) for
the selected tuples - Computation can be user-guided, or stopped with
breakpoints on dependencies
- Compute one route
- Non-exhaustive adapted compute all routes to
stop when one witness is found - Inference procedure to deduce all consequences
of a proven tuple and avoid recomputation of
branches - Complete, polynomial time algorithm
- Future work
- Extension to handle nested schema mappings
- Adapt the target instance with changes in the
schema mapping - Acknowledgements
- Daniel Pepper, UC Santa Cruz
- The Clio team in IBM Almaden Research Center