Debugging Schema Mappings with Routes - PowerPoint PPT Presentation

1 / 38
About This Presentation
Title:

Debugging Schema Mappings with Routes

Description:

(joint work with Wang-Chiew Tan) 2. SPIDER: A Schema Mapping Debugger. Today 14:00-15:30 ... A schema mapping is a logical assertion that describes the ... – PowerPoint PPT presentation

Number of Views:45
Avg rating:3.0/5.0
Slides: 39
Provided by: Lau9270
Category:

less

Transcript and Presenter's Notes

Title: Debugging Schema Mappings with Routes


1
Debugging Schema Mappings with Routes
  • Laura Chiticariu
  • UC Santa Cruz
  • (joint work with Wang-Chiew Tan)

2
SPIDER A Schema Mapping Debugger
Demo group B
Today 1400-1530 Thursday 1100-1230
3
Schema Mappings
  • A schema mapping is a logical assertion that
    describes the correspondence between two schemas
  • Key element in data exchange and data integration
    systems
  • Data Exchange FKMP05
  • Translate data conforming to a source schema S
    into data conforming to a target schema T so that
    the schema mapping M is satisfied

M
Schema S
Schema T
J
Target instance
4
Debugging a Data Exchange Today
M
Schema S
Schema T
XQuery/XSLT/Java
J
Target instance
  • Debugging at the (low) level of the
    implementation
  • Specific to the data exchange engine
  • Specific to the implementation language XQuery,
    SQL, etc
  • Debugging at the level of schema mappings
  • NO SUPPORT!!!

5
Debugging Schema Mappings
M
Schema S
Schema T
J
Target instance
  • Debugging schema mappings the process of
    exploring, understanding and refining a schema
    mapping through the use of (test) data at the
    level of schema mappings

6
Outline
  • Overview
  • Motivation
  • Debugging schema mappings with routes
  • Motivating example
  • What are routes?
  • Computing routes
  • Related work
  • Performance evaluation
  • Conclusions

7
Motivation
  • Schema mappings are good
  • Higher-level, declarative programming constructs
  • Hide implementation details, allow for
    optimization
  • Typically easier to understand vs.
    SQL/XSLT/XQuery/Java
  • Serve a similar goal as model management
    Bernstein03, MBHR05
  • Uniformity in specifying and debugging
  • Reduce programming effort by allowing a user to
    specify and debug at the level of schema mappings
  • Schema mappings are often generated by schema
    matching tools
  • Close to users intention, but may need further
    refinements
  • Hard to understand without the help of tools

8
Language for Schema Mappings
  • Tuple generating dependencies (tgds)
  • 8 x (?(x) ! 9 y ?(x,y))
  • Equality generating dependencies (egds)
  • 8 x (?(x) ! x1 x2)
  • Remarks
  • Widely used for relational schema mappings in
    data exchange and data integration
    Kolaitis05,Lenzerini02
  • TGDs generalize LAV, GAV and are equivalent to
    GLAV assertions in the terminology of data
    integration
  • Extended to handle XML data exchange PVMHF02

9
Relational Schema Mappings FKMP03
  • Schema mapping M (S, T, ?st?t)
  • S, T relational schemas with no relation symbols
    in common
  • Source-to-target dependencies ?st
  • Source-to-target tgds (s-t tgds) ?S(x) ! 9y
    ?T(x,y)
  • Target dependencies ?t
  • Target tgds ?T(x) ! 9y ?T(x,y)
  • Target egds ? T(x) ! x1 x2

?st
?t
Schema S
Schema T
J
Target instance
10
Example Schema Mapping
S
T
MANHATTAN CREDIT CardHolders cardNo
² limit ² ssn
² name ²
Dependents accNo ² ssn
² name ²
Source-to-target dependencies, ?st m1
CardHolders(cn,l,s,n) ! 9L
(Accounts(cn,L,s) Æ Clients(s,n)) m2
Dependents(an,s,n) ! Clients(s,n) Target
dependencies, ?t m3 Clients(s,n) ! 9A 9L
(Accounts(A,L,s))
FARGO FINANCE Accounts ² accNo ² creditLine ²
accHolder Clients ² ssn ² name
m1
fk1
m3
m2
Solution for I under the schema mapping
Target instance J
Source instance I
CardHolders
Accounts
Clients
Dependents
11
Example Debugging Scenario 1
Source instance I
Target instance J
CardHolders
Accounts
Clients
Dependents
Unknown credit limit?
A route for the Accounts tuple
Accounts
CardHolders
ID1
L1
123
m1
Alice
ID1
15K
123
Clients
Alice
ID1
15K is not copied over to the target
m1 CardHolders(cn,l,s,n) ! 9L (Accounts(cn,L,s)
Clients(s,n))
12
Example Debugging Scenario 1
Source instance I
Target instance J
CardHolders
Accounts
Clients
Dependents
Unknown credit limit?
A route for the Accounts tuple
Accounts
CardHolders
ID1
L1
123
m1
Alice
ID1
15K
123
Clients
Alice
ID1
15K is not copied over to the target
m1 CardHolders(cn,l,s,n) !
(Accounts(cn,l,s) Clients(s,n))
13
Example Debugging Scenario 2
Source instance I
Target instance J
CardHolders
Accounts
Clients
Dependents
Unknown account number?
123 is not copied over to the target as Bobs
account number
m2 Dependents(an,s,n) ! Clients(s,n)
14
Example Debugging Scenario 2
Source instance I
Target instance J
CardHolders
Accounts
Clients
Dependents
Unknown account number?
123 is not copied over to the target as Bobs
account number
m2 CardHolders(an,l,s,n) Dependents(an,s,n)
! Accounts(an,l,s)
Clients(s,n)
15
Debugging Schema Mappings with Routes
  • Main intuition routes describe the relationships
    between source and target data with the schema
    mapping
  • Definition Let
  • M be a schema mapping
  • I be a source instance
  • J be a solution for I under M and Js µ J
  • A route for Js with M and (I,J) is a finite
    non-empty sequence of satisfaction steps
  • (I,) ! (I,J1) ! !
    (I,Jn)
  • such that
  • Ji µ J, mi 2 ?st ?t, where 1 i n
  • Js µ Jn

mn, hn
m1, h1
m2, h2
16
Example of Satisfaction Step
Source instance I
Target instance J
CardHolders
Accounts
Clients
Dependents
Unknown credit limit?
Accounts
CardHolders
m1, h1
Clients
m1 CardHolders(cn, l, s, n) ! 9L (Accounts(cn,
L, s ) Clients(s, n ))
h1cn ! 123, l ! 15K, s ! ID1, n ! Alice, L !
L1
17
Compute all routes
  • The schema mapping M is fixed
  • Input source instance I, a solution J for I
    under M, a set of target tuples Js µ J
  • Output a forest representing all routes for Js
  • Algorithm idea
  • For each tuple t in Js, consider every possible ?
    2 ?st ?t and h for witnessing t
  • Do the same for all target tuples encountered
    during the process until tuples from the source
    instance are obtained

18
Compute all routes A simple example
T7(a)
  • ?st
  • ?1 S1(x) ! T1(x)
  • ?2 S2(x) ! T2(x) Æ T6(x)
  • ?t
  • ?3 T2(x) ! T3(x)
  • ?4 T3(x) ! T4(x)
  • ?5 T4(x) Æ T1(x) ! T5(x)
  • ?6 T4(x) Æ T6(x) ! T7(x)
  • ?7 T5(x) ! T3(x)
  • Source instance, I
  • S1(a), S2(a)
  • A solution, J
  • T1(a), , T7(a)

19
Compute all routes A simple example
T7(a)
  • ?st
  • ?1 S1(x) ! T1(x)
  • ?2 S2(x) ! T2(x) Æ T6(x)
  • ?t
  • ?3 T2(x) ! T3(x)
  • ?4 T3(x) ! T4(x)
  • ?5 T4(x) Æ T1(x) ! T5(x)
  • ?6 T4(x) Æ T6(x) ! T7(x)
  • ?7 T5(x) ! T3(x)
  • Source instance, I
  • S1(a), S2(a)
  • A solution, J
  • T1(a), , T7(a)

?6
T4(a) T6(a)
20
Compute all routes A simple example
T7(a)
  • ?st
  • ?1 S1(x) ! T1(x)
  • ?2 S2(x) ! T2(x) Æ T6(x)
  • ?t
  • ?3 T2(x) ! T3(x)
  • ?4 T3(x) ! T4(x)
  • ?5 T4(x) Æ T1(x) ! T5(x)
  • ?6 T4(x) Æ T6(x) ! T7(x)
  • ?7 T5(x) ! T3(x)
  • Source instance, I
  • S1(a), S2(a)
  • A solution, J
  • T1(a), , T7(a)

?6
T4(a) T6(a)
?4
T3(a)
21
Compute all routes A simple example
T7(a)
  • ?st
  • ?1 S1(x) ! T1(x)
  • ?2 S2(x) ! T2(x) Æ T6(x)
  • ?t
  • ?3 T2(x) ! T3(x)
  • ?4 T3(x) ! T4(x)
  • ?5 T4(x) Æ T1(x) ! T5(x)
  • ?6 T4(x) Æ T6(x) ! T7(x)
  • ?7 T5(x) ! T3(x)
  • Source instance, I
  • S1(a), S2(a)
  • A solution, J
  • T1(a), , T7(a)

?6
T4(a) T6(a)
?4
T3(a)
?7
T5(a)
22
Compute all routes A simple example
T7(a)
  • ?st
  • ?1 S1(x) ! T1(x)
  • ?2 S2(x) ! T2(x) Æ T6(x)
  • ?t
  • ?3 T2(x) ! T3(x)
  • ?4 T3(x) ! T4(x)
  • ?5 T4(x) Æ T1(x) ! T5(x)
  • ?6 T4(x) Æ T6(x) ! T7(x)
  • ?7 T5(x) ! T3(x)
  • Source instance, I
  • S1(a), S2(a)
  • A solution, J
  • T1(a), , T7(a)

?6
T4(a) T6(a)
?4
T3(a)
?7
T5(a)
23
Compute all routes A simple example
T7(a)
  • ?st
  • ?1 S1(x) ! T1(x)
  • ?2 S2(x) ! T2(x) Æ T6(x)
  • ?t
  • ?3 T2(x) ! T3(x)
  • ?4 T3(x) ! T4(x)
  • ?5 T4(x) Æ T1(x) ! T5(x)
  • ?6 T4(x) Æ T6(x) ! T7(x)
  • ?7 T5(x) ! T3(x)
  • Source instance, I
  • S1(a), S2(a)
  • A solution, J
  • T1(a), , T7(a)

?6
T4(a) T6(a)
?4
?8
T3(a)
S2(a)
?7
?3
T5(a)
T2(a)
?2
S2(a)
Route for T7(a) ?2, ?3, ?4, ?8, ?6
24
Properties of compute all routes
  • Completeness
  • Let F denote the route forest by our algorithm
    returned on Js. If R is a minimal route for Js,
    then it is represented in F.
  • Running time polynomial in the sizes of I, J and
    Js
  • Every branch of a tuple once explored, is never
    explored again
  • Polynomial number of branches for each tuple
    since M is fixed
  • Challenge
  • Exponentially many routes, but polynomial-size
    representation constructed in polynomial time

25
Compute one route
  • Our experimental results indicate that compute
    all routes can be expensive
  • Generate one route fast and alternative routes as
    needed?
  • Our solution adapt compute all routes to compute
    only one route
  • Non-exhaustive Stops when one witness is found.
    A witness that uses source tuples is preferred
  • Inference procedure to deduce all consequences
    of a proven tuple and avoid recomputation of
    branches
  • Key step for polynomial time analysis
  • Completeness If there is a route for Js, then
    our algorithm will produce a route for Js

26
Related work
  • Commercial data exchange systems
  • e.g., Altova MapForce, Stylus Studio
  • Use lower-level languages (e.g., XSLT, XQuery)
    to specify the exchange
  • Debugging is done at this low level
  • Source tuple centric
  • Data viewer YMHF01
  • Constructs an example source instance
    illustrative for the behavior of the schema
    mapping
  • Complementary to our approach
  • Works only for relational schema mappings

27
Related work
  • Computing routes for target data is related to
    computing provenance (aka lineage) of data

28
Empirical Evaluation
  • Implementation on top of the Clio data exchange
    system from IBM Almaden Research Center
  • Scalable push computation to the database
  • Handles relational and XML schema mappings
    PVMHF02
  • Testbed
  • Created relational and XML schema mappings based
    on the TPCH schema
  • Created schema mappings based on Mondial, DBLP
    and Amalgam schemas
  • Methodology - measured the influence of
  • The sizes of I, J and Js
  • The complexity of ?st ?t
  • i.e., the number of tgds and the number of atoms
    in each tgd
  • Setup P4 2.8GHz, 2Gb RAM, 256MB DB2 buffer pool
  • Our regret No benchmark to base our comparisons

29
ComputeOneRoute with Rel. schema
mappingInfluence of the Sizes of I and J
30
ComputeOneRoute with Rel. schema
mappingInfluence of the Complexity of ?st ?t
31
ComputeOneRoute vs. ComputeAllRoutes
32
Experimental results with Mondial, DBLP and
Amalgam
33
Experimental results with Mondial, DBLP and
Amalgam
  • Two DBLP schemas and datasets, both XML
  • DBLP1, DBLP2
  • First relational schema from Amalgam test suite

34
Experimental results with Mondial, DBLP and
Amalgam
  • Two DBLP schemas and datasets, both XML
  • DBLP1, DBLP2
  • First relational schema from Amalgam test suite
  • Two Mondial schemas and datasets
  • one relational (Mondial1), the other XML
    (Mondial2)
  • Designed ?st and used the foreign key constraints
    as ?t

35
Experimental results with Mondial, DBLP and
Amalgam
  • Compute one route under 3 seconds for 1-10
    randomly selected tuples
  • Compute all routes can take much longer
  • 18 seconds to construct the route forest for 10
    selected tuples in the target instance of Mondial
  • Compute one route took under 1 second

36
Conclusions
  • Debugging schema mappings with routes
  • Complete, polynomial time algorithms for
    computing routes
  • Extension for routes for selected source data
  • Routes have declarative semantics, based on the
    logical satisfaction of tgds
  • What we dont do illustrate data merging
  • Future work
  • Illustrate grouping semantics for nested schema
    mappings
  • Adapt target instance to changes in the schema
    mapping and data sources

37
SPIDER A Schema Mappings Debugger
Demo group B
Today 1400-1530 Thursday 1100-1230
  • Compute one/all routes
  • Alternative routes
  • Guided computation of routes
  • Standard debugging features
  • Breakpoints
  • Watch windows
  • Schema-level routes

38
Thank you!
Write a Comment
User Comments (0)
About PowerShow.com