WP 8: Assessment and Dissemination - PowerPoint PPT Presentation

Provided by: eit4
1
INFOMIX: Data Integration meets Nonmonotonic Deductive Databases
Thomas Eiter, Institute of Information Systems, Vienna University of Technology
2
Overview
  • Motivation
  • Information Integration Framework
  • Nonmonotonic Logic Programs
  • INFOMIX Architecture
  • Repair Programs
  • Focussing Techniques
  • Conclusion

3
Motivation
  • Data integration: increasing demand
  • a byproduct of the expansion of the Internet and the WWW
  • Highly complex problem
  • Current solutions in practice are pragmatic
  • Comprehensive, formal design methodologies and
    coherent tools for designs are missing
  • Towards information integration at human-level
    competence, utilizing reasoning capabilities

4
INFOMIX Objectives
  • Powerful information integration
  • Comprehensive information model
  • deal with incomplete and/or inconsistent
    information
  • Information integration algorithms
  • Usage of Computational Logic
  • Integration of results on data acquisition and
    transformation
  • Prototype system

5
Project Partners
  • University of Calabria: nonmonotonic LP,
    deductive DB (Leone, Greco, Ianni, ...)
  • University of Rome La Sapienza: data
    integration (Lenzerini, Cali, Lembo, Rosati,
    ...)
  • TU Wien: nonmonotonic LP, data acquisition and
    extraction (Eiter, Faber, Gottlob, Fink, ...)
  • RODAN Systems: database system implementation
    (Staniszkis, Nowicki, Kalka)

6
Data Integration System: Basic View
[Figure: the user query is posed to the global schema and the result is returned; a mapping links the global schema to Source 1 and Source 2]
7
Formal Information Integration Framework
  • Data integration system
  • $I = \langle G, M, S \rangle$
  • G: global schema
  • M: mapping assertions
  • S: source schema

8
Global Schema
  • $G = \langle R, \Sigma \rangle$
  • R: relational schema (set of relations)
  • $\Sigma$: set of constraints
  • Key constraints (KDs)
  • Inclusion dependencies (IDs)
  • Exclusion dependencies (EDs)
  • ...

9
Source Schema
  • $S = \langle R_S, \emptyset \rangle$
  • $R_S$: relational schema (set of relations)
  • No integrity constraints on sources
  • Different interpretations of the data retrieved
    from the sources w.r.t. the data satisfying the global
    schema (later)

10
Mapping Assertions
  • Link sources and global relations
  • $\langle q_S, q_G \rangle$
  • $q_S$: query over the sources $R_S$
  • $q_G$: query over the global relations $R$
  • Informally: the left-hand side corresponds to the right-hand side
  • Covers
  • GAV ($q_G$ is a relation $r$ in $R$)
  • LAV ($q_S$ is a relation $s$ in $S$)

11
Semantics
  • Given: instance D of the source schema S
  • Issue: instance $D_G$ of the global schema
    $G = \langle R, \Sigma \rangle$
  • $D_G$ must satisfy the constraints $\Sigma$
  • $D_G$ must comply with a mapping assumption (MA)
  • $sem(I,D) = \{ D_G \mid D_G \text{ complies with MA} \}$
  • Most important MAs: soundness, exactness

12
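Example: Sound Semantics
KDs: player[1], team[1], coach[1]; IDs: $team[3] \subseteq player[1]$; EDs: $player[1] \cap coach[1] = \emptyset$
[Figure: instances of the global relations team, player, coach and of the sources s1, s2, s3, s4]
player(X,Y,Z) :- s1(X,Y,Z,W)
team(X,Y,Z) :- s2(X,Y,Z)
team(X,Y,Z) :- s3(X,Y,Z)
coach(X,Y,Z) :- s4(X,Y,Z)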
13
User Queries
  • Important core: conjunctive queries (CQs)
  • q(x) :- r1(x1), r2(x2), ..., rk(xk)
  • Extensions: UCQs, Datalog w/o negation,
    recursion, built-ins
  • Semantics: certain answers
  • $ans_{sem}(q,I,D) = \{ c \mid c \in q^{D_G} \text{ for each } D_G \in sem(I,D) \}$
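The certain-answer definition can be illustrated with a small Python sketch. It assumes, unrealistically, that sem(I,D) is available as a finite, non-empty list of global databases (in general it can be infinite, which is why rewriting and logic programming are used instead); the function names, the dict-of-sets database layout and the data are hypothetical.

```python
def certain_answers(query, global_dbs):
    """Certain answers: the tuples `query` returns on every legal global database.

    Assumes sem(I,D) is given as a finite, non-empty list (a simplification).
    """
    if not global_dbs:
        # With sem(I,D) empty, every tuple would (trivially) be a certain answer.
        raise ValueError("sem(I,D) is empty: query answering becomes trivial")
    return set.intersection(*(set(query(db)) for db in global_dbs))


def q(db):
    """q(X) :- player(X,Y,Z)."""
    return {(x,) for (x, _y, _z) in db["player"]}


# Two hypothetical legal global databases that agree only on player 10.
DG1 = {"player": {(10, "Totti", "RM"), (9, "Beckham", "MU")}}
DG2 = {"player": {(10, "Totti", "RM"), (8, "Unknown", "MU")}}
print(certain_answers(q, [DG1, DG2]))  # {(10,)}
```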

14
Example
  • Query
  • q(X) :- player(X,Y,Z).
  • Result
  • $ans_{sound}(q,I,D) = \{10, 9, 8\}$

[Figure: instance of the global relation player]
15
Semantics for Inconsistency
  • Problem: $sem(I,D) = \emptyset$ is possible
  • Relax the choice of $D_G$ (loose vs. strict
    semantics)
  • $D_G$ must satisfy the global constraints
  • $D_G$ should comply as closely as possible with the
    mapping assumption
  • Select the best ($\le_D$-minimal) $D_G$ under an ordering $D_G \le_D D'_G$
  • GAV-sound: keep more of $q_S(D)$
  • $r(D_G) \cap q_S(D) \supseteq r(D'_G) \cap q_S(D)$
  • GAV-complete: add less outside $q_S(D)$
  • $r(D_G) \setminus q_S(D) \subseteq r(D'_G) \setminus q_S(D)$
  • GAV-exact: keep more of $q_S(D)$ and add less outside it
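A small Python sketch of these orderings, for a GAV system in which each global relation r is compared with its retrieved extension q_S(D) (stored in `ret`). It reads GAV-complete as penalizing tuples outside q_S(D), and assumes the candidates are finitely many and already satisfy the global constraints; `leq_sound`, `leq_complete`, `leq_exact`, `minimal` and the team tuples are hypothetical.

```python
def leq_sound(dg1, dg2, ret):
    """dg1 <=_D dg2 (sound): dg1 keeps at least the retrieved tuples that dg2 keeps."""
    return all((dg1[r] & ret[r]) >= (dg2[r] & ret[r]) for r in ret)


def leq_complete(dg1, dg2, ret):
    """dg1 <=_D dg2 (complete): dg1 has no more non-retrieved tuples than dg2."""
    return all((dg1[r] - ret[r]) <= (dg2[r] - ret[r]) for r in ret)


def leq_exact(dg1, dg2, ret):
    """Exact: both conditions at once."""
    return leq_sound(dg1, dg2, ret) and leq_complete(dg1, dg2, ret)


def minimal(candidates, ret, leq=leq_sound):
    """Keep the candidates that no other candidate strictly improves on."""
    def strictly_better(a, b):
        return leq(a, b, ret) and not leq(b, a, ret)
    return [c for c in candidates if not any(strictly_better(o, c) for o in candidates)]


# Hypothetical retrieved team relation violating key team[1] on "RM".
RET = {"team": {("RM", "Roma", 10), ("RM", "Real Madrid", 10), ("MU", "Man. Utd.", 8)}}
C1 = {"team": {("RM", "Roma", 10), ("MU", "Man. Utd.", 8)}}
C2 = {"team": {("RM", "Real Madrid", 10), ("MU", "Man. Utd.", 8)}}
C3 = {"team": {("MU", "Man. Utd.", 8)}}
print(minimal([C1, C2, C3], RET))  # C1 and C2 survive; C3 drops too much
```

C1 and C2 are incomparable, so both remain minimal, mirroring the two possibilities in the next example.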

16
Example: Sound Semantics
Additional tuple in s3: inconsistency (KD team[1] violated)
[Figure: instances of team, player, coach and of the sources s1 to s4, with the additional s3 tuple]
player(X,Y,Z) :- s1(X,Y,Z,W)
team(X,Y,Z) :- s2(X,Y,Z)
team(X,Y,Z) :- s3(X,Y,Z)
coach(X,Y,Z) :- s4(X,Y,Z)
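The following Python sketch materializes the retrieved database for this mapping and checks the example's KDs, ID and ED. The source tuples are invented for illustration (they are not the slide's data), and attribute positions are 0-based in the code, so team[3] is index 2. Under strictly-sound semantics the missing leader can be fixed by adding a player tuple, but the two "RM" team tuples cannot: adding tuples never removes a key violation, so sem(I,D) is empty for this data.

```python
from collections import defaultdict

# Hypothetical source instance D; all values are invented for illustration.
SOURCES = {
    "s1": {(10, "Totti", "RM", 27), (9, "Beckham", "MU", 28)},
    "s2": {("RM", "Roma", 10)},
    "s3": {("MU", "Man. Utd.", 8), ("RM", "Real Madrid", 10)},  # "RM" clashes with s2
    "s4": {(7, "Ferguson", "MU")},
}


def retrieve(src):
    """Materialize ret(I,D) by applying the GAV mapping rules."""
    return {
        "player": {(x, y, z) for (x, y, z, _w) in src["s1"]},  # player(X,Y,Z) :- s1(X,Y,Z,W)
        "team": src["s2"] | src["s3"],                          # team :- s2 ; team :- s3
        "coach": set(src["s4"]),                                # coach(X,Y,Z) :- s4(X,Y,Z)
    }


def key_violations(tuples):
    """KD on the first attribute: key values carried by more than one tuple."""
    groups = defaultdict(set)
    for t in tuples:
        groups[t[0]].add(t)
    return {k: ts for k, ts in groups.items() if len(ts) > 1}


def check(db):
    """Report violations of the example's KDs, ID and ED in a global database."""
    return {
        "KD": {r: key_violations(db[r]) for r in ("player", "team", "coach")},
        # ID team[3] ⊆ player[1]: team leaders with no matching player tuple
        "ID": {t[2] for t in db["team"]} - {p[0] for p in db["player"]},
        # ED player[1] ∩ coach[1] = ∅: codes used both as player and as coach
        "ED": {p[0] for p in db["player"]} & {c[0] for c in db["coach"]},
    }


print(check(retrieve(SOURCES)))
```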
17
Example / 2
Two possibilities for a $\le_D$-minimal $D_G$ (no extra
tuples)
[Figure: two candidate global databases 1) and 2), each with relations player, team, coach]
Query answer: $ans_{loosely\text{-}sound}(q,I,D) = \{10, 9, 8\}$
18
Complexity of Query Answering
  • Queries: UCQs
  • Mappings: GAV non-recursive Datalog, LAV CQs
  • Constraints and mapping assumptions interact
  • Data/combined complexity (lower bounds if
    decidable above PTIME)
  • NKC: non-key-conflicting IDs $r[A] \subseteq s[B]$: if
    s has key K, then $K \not\subset B$.
  • 1KC: 1-key-conflicting IDs $r[A] \subseteq s[B]$: if
    s has key K, then $K \subset B$ and $|B| = |K| + 1$.
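The two syntactic conditions can be written as a small classifier. The reading assumed here: an ID r[A] ⊆ s[B] is NKC if s has no key or its key K is not a proper subset of B, and 1KC if K is a proper subset of B with |B| = |K| + 1. Attribute sets are given as lists of positions; `classify_id` is a hypothetical helper.

```python
def classify_id(b_positions, key_positions=None):
    """Classify an ID r[A] ⊆ s[B] w.r.t. the key K of s (None: s has no key).

    NKC: s has no key, or K is not a proper subset of B.
    1KC: K is a proper subset of B and |B| = |K| + 1.
    Anything else is reported as "other".
    """
    if key_positions is None:
        return "NKC"
    b, k = set(b_positions), set(key_positions)
    if not k < b:  # K is not a proper subset of B
        return "NKC"
    if len(b) == len(k) + 1:
        return "1KC"
    return "other"


# ID team[3] ⊆ player[1] with key(player) = {1}: B = {1}, K = {1}
print(classify_id([1], [1]))  # NKC
```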

19
How to Evaluate Queries / Semantics?
  • Approach: use computational logic
  • Advantages
  • executable specification of semantics
  • obtain computational power needed
  • Desiderata
  • close to database processing
  • non-determinism (for global view)
  • efficiency

20
Basic Approach
  • Query rewriting
  • Query q(x) on global schema G is rewritten into a
    query q'(x) on source schema S
  • (perfect rewriting, data independent)
  • Feasibility depends on
  • mapping type (GAV/LAV) and language
  • semantics
  • type of constraints
  • input / output query language
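For GAV mappings without constraints, rewriting reduces to unfolding each global atom with the mapping rules that define it. The Python sketch below shows only this unfolding step, not a perfect rewriting under KDs, IDs or EDs; the term representation (upper-case strings are variables), the `unfold` helper and the renaming scheme are assumptions for illustration.

```python
from itertools import product


def is_var(term):
    """Datalog convention: variables start with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()


def unfold(query_body, mapping):
    """Unfold a CQ body over global relations into a union of CQ bodies over sources.

    mapping: global relation -> list of (head_vars, body_atoms); head vars are distinct.
    Variables that occur only in a mapping body are renamed apart with a suffix _i.
    """
    union = []
    for choice in product(*(mapping[pred] for pred, _ in query_body)):
        body = []
        for i, ((_pred, args), (head_vars, rule_body)) in enumerate(zip(query_body, choice)):
            subst = dict(zip(head_vars, args))  # head variable -> query term
            for src, src_args in rule_body:
                body.append((src, tuple(
                    subst.get(a, f"{a}_{i}") if is_var(a) else a for a in src_args)))
        union.append(body)
    return union


# GAV mapping of the running example.
MAPPING = {
    "player": [(("X", "Y", "Z"), [("s1", ("X", "Y", "Z", "W"))])],
    "team": [(("X", "Y", "Z"), [("s2", ("X", "Y", "Z"))]),
             (("X", "Y", "Z"), [("s3", ("X", "Y", "Z"))])],
    "coach": [(("X", "Y", "Z"), [("s4", ("X", "Y", "Z"))])],
}

# q(X) :- team(V,W,X) becomes q(X) :- s2(V,W,X) and q(X) :- s3(V,W,X)
print(unfold([("team", ("V", "W", "X"))], MAPPING))
```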

21
Some Results
  • For UCQs
  • GAV non-recursive Datalog(neg) mappings
  • perfect rewriting under
  • strictly-/loosely-sound semantics with KDs,
    IDs, EDs
  • output language: general Datalog(neg)
  • LAV CQ mappings
  • compilation of LAV into GAV, for strictly-sound
    semantics with IDs and EDs

22
Nonmonotonic Logic Programs
  • Disjunctive Datalog(neg) rules
  • h1(x1) v ... v hl(xl) :- b1(y1), ..., bm(ym), not
    c1(z1), ..., not cn(zn)
  • function-free atoms (constants allowed)
  • nonmonotonic negation (not)
  • Semantics
  • minimal model semantics (not-free programs)
  • stratified semantics (layered negation)
  • stable model semantics (Gelfond-Lifschitz; all
    programs)
  • Complexity and expressiveness
  • captures $\text{co-NP}^{NP}$ queries
  • $\text{co-NP}^{NP}$ / $\text{co-NEXP}^{NP}$ data / combined complexity

23
Nonmonotonic Logic Programs / 2
  • Non-determinism
  • Example: select one element from a set s.
  • in(X) :- s(X), not out(X).
  • out(X) v out(Y) :- s(X), s(Y), X <> Y.
  • Extensions
  • strong (classical) negation
  • weight constraints
  • aggregates, ...
  • Efficient implementations
  • DLV (TU Vienna, U Calabria), Smodels (TU
    Helsinki), ...
  • Important KR tools (e.g., for Answer Set
    Programming)
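To make the stable model semantics concrete, here is a brute-force Python sketch for small ground disjunctive programs: an interpretation M is stable iff it is a minimal model of the Gelfond-Lifschitz reduct P^M. It grounds the selection program above by hand and is exponential by design; systems such as DLV use far more refined algorithms. `rule`, `stable_models` and `select_one` are hypothetical names.

```python
from itertools import combinations


def rule(head, pos=(), neg=()):
    """Ground rule  h1 v ... v hl :- pos..., not neg...  stored as three frozensets."""
    return (frozenset(head), frozenset(pos), frozenset(neg))


def satisfies(rules, interp):
    """True iff interp is a model: every rule with a true body has a true head atom."""
    for head, pos, neg in rules:
        if pos <= interp and not (neg & interp) and not (head & interp):
            return False
    return True


def reduct(rules, interp):
    """Gelfond-Lifschitz reduct: delete rules with some 'not a' where a is in interp,
    then drop the remaining 'not' literals."""
    return [(head, pos, frozenset()) for head, pos, neg in rules if not (neg & interp)]


def subsets(atoms):
    items = sorted(atoms)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def stable_models(rules):
    """Brute force: M is stable iff M is a minimal model of the reduct P^M."""
    atoms = frozenset().union(*(h | p | n for h, p, n in rules))
    models = []
    for m in subsets(atoms):
        red = reduct(rules, m)
        if satisfies(red, m) and not any(satisfies(red, s) for s in subsets(m) if s != m):
            models.append(m)
    return models


def select_one(elements):
    """Ground the selection program for s = elements."""
    rules = [rule({f"s({e})"}) for e in elements]
    for e in elements:
        # in(X) :- s(X), not out(X).
        rules.append(rule({f"in({e})"}, pos={f"s({e})"}, neg={f"out({e})"}))
    for x in elements:
        for y in elements:
            if x != y:
                # out(X) v out(Y) :- s(X), s(Y), X <> Y.
                rules.append(rule({f"out({x})", f"out({y})"}, pos={f"s({x})", f"s({y})"}))
    return rules


for model in stable_models(select_one(["a", "b", "c"])):
    print(sorted(a for a in model if a.startswith("in(")))
```

Each stable model marks all but one element as out, so exactly one in(...) atom holds per model; the non-determinism lives in the choice of model.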

24
Challenges for Nonmonotonic LP
  • Interfacing standard relational DB
  • Scalability
  • Facilities for query answering (DLV and Smodels
    were conceived more as model generators)
  • Remark: historically, DLV set out as a deductive
    DB engine; DLV uses a lot of DDB
    technology

25
INFOMIX High Level Architecture
26
Information Service Layer
  • Define the data integration system $I = \langle G, M, S \rangle$
  • Store descriptions in Metadata Repository
  • Accept user queries
  • Visualize query results
  • INFOMIX Query Language (IQL)
  • subset of stratified Datalog(neg), depending on
    decidability

27
Data Acquisition and Transformation (DAT) Layer
  • Access raw data in different formats
    (relational, HTML, XML, OO)
  • Support data extraction from web pages (LiXto
    technology)
  • Wrappers
  • Code wrappers (API)
  • Query wrappers (e.g., SQL/ODBC)
  • Visual wrappers (LiXto, RODAN Data Extractor)
  • Logical source format: fragment of XML Schema
    (akin to complex values)

28
DAT: Relevant Data Formats in INFOMIX
[Figure: ISDF XML fragment]
29
Wrapper Design
Goal: relieve the designer from technical details
30
Integration Layer
  • Perform the data integration
  • Receive requests from Information Service Layer
  • Compute query rewritings
  • Evaluate query rewritings, interacting with DAT
    Layer
  • Approach
  • combine/couple DLV with standard relational
    engines
  • narrow the use of DLV to where it is needed (push
    work to relational engines as much as possible)

31
Repair Programs
  • GAV setting, loose semantics (typically loosely-exact
    semantics)
  • Repair semantics (Bertossi et al., Chomicki
    and Marcinkowski, Greco et al., ...)
  • each $\le_D$-minimal $D_G$ w.r.t. the retrieved database
    ret(I,D) (= materialized mappings) is a repair
  • $rep_I(D)$ is the set of all repairs
  • Query answering
  • $ans(q,I,D) = \{ c \mid q(c) \text{ holds w.r.t. each } R \in rep_I(D) \}$
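For the special case of key constraints only, with a deletion-only (loosely-exact) notion of repair, the repairs of a relation are its maximal consistent subsets, i.e., each keeps exactly one tuple per key value. The Python sketch below enumerates them and intersects query answers; the data and helper names are hypothetical, and general constraints (IDs, EDs) need the logic-program encoding instead.

```python
from collections import defaultdict
from itertools import product


def key_repairs(tuples, key_pos=(0,)):
    """Repairs of one relation w.r.t. one key when only deletions are allowed:
    each repair keeps exactly one tuple per key value."""
    groups = defaultdict(list)
    for t in sorted(tuples):
        groups[tuple(t[i] for i in key_pos)].append(t)
    return [set(pick) for pick in product(*groups.values())]


def consistent_answers(query, tuples, key_pos=(0,)):
    """ans(q,I,D): the answers that hold in every repair."""
    return set.intersection(*(set(query(r)) for r in key_repairs(tuples, key_pos)))


# Hypothetical retrieved relation team(code, name, leader) with key team[1].
TEAM = {("RM", "Roma", 10), ("RM", "Real Madrid", 10), ("MU", "Man. Utd.", 8)}


def leaders(rel):
    """q(Z) :- team(X,Y,Z)."""
    return {(z,) for (_x, _y, z) in rel}


def names(rel):
    """q(Y) :- team(X,Y,Z)."""
    return {(y,) for (_x, y, _z) in rel}


print(len(key_repairs(TEAM)))             # 2
print(consistent_answers(leaders, TEAM))  # both repairs agree on leaders 8 and 10
print(consistent_answers(names, TEAM))    # {('Man. Utd.',)}
```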

32
LP Specification for querying $I = \langle G, M, S \rangle$
  • Disjunctive Datalog(neg) program
  • $P_I(q) = P_M \cup P_S \cup P_q$
  • where
  • $P_M$ is a (stratified) Datalog(neg) program for
    retrieving the data from the sources
  • $P_S$ is a disjunctive Datalog(neg) program
    computing $rep_I(D)$ in its stable models
  • $P_q$ is a nonrecursive Datalog(neg) program
    encoding the query q(x) on top

33
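  • Hierarchical structure
  • $P_M > P_S > P_q$
  • $ret(I,D) \leftrightarrow SM(P_M \cup D)$
  • $rep_I(D) \leftrightarrow SM(P_S \cup ret(I,D))$
  • $ans(q,I,D) = \{ c \mid q(c) \in M \text{ for each } M \in SM(P_q \cup D_G),\ D_G \in rep_I(D) \}$
    $= \{ c \mid q(c) \in M \text{ for each } M \in SM(P_q \cup P_S \cup ret(I,D)) \}$
    $= \{ c \mid q(c) \in M \text{ for each } M \in SM(P_I(q) \cup D) \}$
  • Compile in non-key-conflicting IDs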

34
Example
  • Pq:
      q(X) :- player(X,Y,Z).
      q(X) :- team(V,W,X).    % from ID team[3] ⊆ player[1]
  • PM:
      player_D(X,Y,Z) :- s1(X,Y,W,Z).
      team_D(X,Y,Z) :- s2(X,Y,Z).
      team_D(X,Y,Z) :- s3(X,Y,Z).
      coach_D(X,Y,Z) :- s4(X,Y,Z).
  • PS:
      player(X,Y,Z) :- player_D(X,Y,Z), not -player(X,Y,Z).
      % key player[1]
      -player(X,Y,Z) v -player(X,V,W) :- player_D(X,Y,Z), player_D(X,V,W), Y <> V.
      -player(X,Y,Z) v -player(X,V,W) :- player_D(X,Y,Z), player_D(X,V,W), Z <> W.
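The key-repair rules follow a fixed pattern per relation, so they can be generated mechanically. This Python sketch emits DLV-style rules (strong negation `-`, disjunction `v`, inequality `<>`) for a key given as 1-based positions; the generator and its variable naming (X1..Xn for the first tuple, Yi for the non-key positions of the second) are assumptions, and for player with key {1} it yields the player rules above up to renaming of variables.

```python
def kd_repair_rules(rel, arity, key):
    """Emit DLV-style repair rules for key(rel) = key (1-based attribute positions)."""
    xs = [f"X{i}" for i in range(1, arity + 1)]
    # Second tuple: shares the key variables, fresh variables elsewhere.
    ys = [xs[i - 1] if i in key else f"Y{i}" for i in range(1, arity + 1)]
    t1, t2 = ",".join(xs), ",".join(ys)
    # Keep a retrieved tuple unless it is explicitly deleted.
    rules = [f"{rel}({t1}) :- {rel}_D({t1}), not -{rel}({t1})."]
    for i in range(1, arity + 1):
        if i not in key:
            # Two tuples agreeing on the key but differing at position i: delete one.
            rules.append(
                f"-{rel}({t1}) v -{rel}({t2}) :- "
                f"{rel}_D({t1}), {rel}_D({t2}), X{i} <> Y{i}."
            )
    return rules


print("\n".join(kd_repair_rules("player", 3, {1})))
```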

35
Example / 2
  • team(X,Y,Z) :- team_D(X,Y,Z), not -team(X,Y,Z).
    % key team[1]
    -team(X,Y,Z) v -team(X,V,W) :- team_D(X,Y,Z), team_D(X,V,W), Y <> V.
    -team(X,Y,Z) v -team(X,V,W) :- team_D(X,Y,Z), team_D(X,V,W), Z <> W.
  • coach(X,Y,Z) :- coach_D(X,Y,Z), not -coach(X,Y,Z).
    % key coach[1]
    -coach(X,Y,Z) v -coach(X,V,W) :- coach_D(X,Y,Z), coach_D(X,V,W), Y <> V.
    -coach(X,Y,Z) v -coach(X,V,W) :- coach_D(X,Y,Z), coach_D(X,V,W), Z <> W.
  • % ED player[1] ∩ coach[1] = ∅
    -player(X,Y,Z) v -coach(X,V,W) :- player_D(X,Y,Z), coach_D(X,V,W).

36
Query Optimization
  • Different repair encodings
  • e.g., use of unstratified negation instead of
    disjunction
  • → equivalence of logic program encodings
  • Focussing techniques
  • Relevance: prune useless rules in $P_I(q)$
  • Decomposition: localize inconsistency in
    ret(I,D)
  • Recombination: combine localized repairs to
    answer q

37
Decomposition
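  • Conflict set $C_{ret(I,D)}$ (via $\Sigma$), syntactic
    conflict closure $C^*_{ret(I,D)}$
  • affected part $A_{ret(I,D)}$ and safe part
    $S_{ret(I,D)}$ of ret(I,D)
  • works for
  • universal constraints
  • $\forall \vec{x}\, (A_1 \wedge \dots \wedge A_n \rightarrow B_1 \vee \dots \vee B_m \vee \phi_1 \vee \dots \vee \phi_k)$, $n + m > 0$,
  • $\phi_i$ built-in literals (=, <>, etc.)
  • similarity-compliant $\le_D$ ($R \,\Delta\, ret(I,D) \subset R' \,\Delta\, ret(I,D) \Rightarrow R <_D R'$)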
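For key constraints alone, the conflict set is simply the set of tuples that share a key value with a different tuple, and everything else is safe. The Python sketch below splits a retrieved database accordingly; treating the affected part as the conflict set is an assumption that fits this KD-only case, and in a DBMS the same grouping could be done with a GROUP BY on the key plus HAVING COUNT(*) > 1. Names and data are hypothetical.

```python
from collections import defaultdict


def decompose(db, keys):
    """Split ret(I,D) into conflict set C and safe part S, for KDs only.

    db: relation name -> set of tuples; keys: relation name -> 0-based key positions.
    Relations without a key cannot conflict and are entirely safe.
    """
    conflict, safe = {}, {}
    for rel, tuples in db.items():
        key_pos = keys.get(rel)
        groups = defaultdict(set)
        for t in tuples:
            groups[tuple(t[i] for i in key_pos) if key_pos else t].add(t)
        conflict[rel] = {t for g in groups.values() if len(g) > 1 for t in g}
        safe[rel] = tuples - conflict[rel]
    return conflict, safe


# Hypothetical retrieved database.
RET = {
    "team": {("RM", "Roma", 10), ("RM", "Real Madrid", 10), ("MU", "Man. Utd.", 8)},
    "player": {(10, "Totti", "RM"), (9, "Beckham", "MU")},
}
C, S = decompose(RET, {"team": (0,), "player": (0,)})
print(C["team"])  # the two "RM" tuples
print(S["team"])  # {('MU', 'Man. Utd.', 8)}
```

Only the conflicting part has to be handed to the repair program; the safe part can stay in the relational engine.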

38
Main Results
  • For each $R \in rep_I(D)$ there is some $R' \in
    rep(A_{ret(I,D)})$ such that
  • $R = (R' \cap C^*_{ret(I,D)}) \cup S_{ret(I,D)}$.
  • For each $R' \in rep(A_{ret(I,D)})$ there is such an $R$
    in $rep_I(D)$.
  • Computing $A_{ret(I,D)}$ and $C^*_{ret(I,D)}$ is expensive,
    while computing $C_{ret(I,D)}$ is efficient (use DB
    engines)
  • If $n > 0$, each $R' \in rep(A_{ret(I,D)})$ is included
    in $C^*_{ret(I,D)}$
  • If $n > 0$ and $m = 0$ (e.g., FDs, KDs, EDs), each $R'$ in
    $rep(A_{ret(I,D)})$ is included in $C_{ret(I,D)}$

39
DLV Developments
  • Coupling with DBMS
  • ODBC interface
  • Relevance for query answering
  • Magic set techniques
  • Non-ground query answering
  • internal marking of relations

40
Recombination
  • Answer query q(x) from localized repairs
  • $ans(q,I,D) = \{ c \mid q(c) \in M \text{ for each } M \in SM(P_q \cup (R \cap C^*_{ret(I,D)}) \cup S_{ret(I,D)}),\ R \in rep(A_{ret(I,D)}) \}$
  • Simplifies for special constraints ($n > 0$; $n > 0, m = 0$)
  • Practical method: repair compilation
  • store all repairs R in $rep(A_{ret(I,D)})$ in a
    relational DB
  • mark tuples of $A_{ret(I,D)}$ with a bitstring
    (membership in R)
  • rewrite q(x) to an SQL query on the marked ret(I,D)
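Repair compilation can be sketched in Python for one relation with one key: enumerate the repairs of the affected part, give each affected tuple a bitmask of the repairs containing it, and accept a projected value if the safe part yields it or the OR of its masks covers all repairs. This evaluates a single-atom projection query in Python rather than rewriting it to SQL; the names, the deletion-only repair notion and the data are assumptions for illustration.

```python
from collections import defaultdict
from itertools import product


def compile_repairs(affected, key_pos=(0,)):
    """Enumerate the repairs of the affected part (one key, deletions only) and mark
    every tuple with a bitmask: bit i is set iff the tuple belongs to repair i."""
    groups = defaultdict(list)
    for t in sorted(affected):
        groups[tuple(t[i] for i in key_pos)].append(t)
    repairs = [set(pick) for pick in product(*groups.values())]
    marks = defaultdict(int)
    for i, rep in enumerate(repairs):
        for t in rep:
            marks[t] |= 1 << i
    return dict(marks), len(repairs)


def certain_values(safe, marks, n_repairs, attr):
    """Answers of a projection query q(X) :- r(...) on attribute `attr`
    that hold in every repair."""
    all_repairs = (1 << n_repairs) - 1
    answers = {t[attr] for t in safe}  # safe tuples belong to every repair
    covered = defaultdict(int)
    for t, mask in marks.items():
        covered[t[attr]] |= mask  # repairs in which some tuple yields this value
    answers |= {v for v, mask in covered.items() if mask == all_repairs}
    return answers


# Hypothetical affected and safe parts of team(code, name, leader).
AFFECTED = {("RM", "Roma", 10), ("RM", "Real Madrid", 10)}
SAFE = {("MU", "Man. Utd.", 8)}
MARKS, N = compile_repairs(AFFECTED)
print(certain_values(SAFE, MARKS, N, 2))  # leaders 8 and 10
print(certain_values(SAFE, MARKS, N, 1))  # {'Man. Utd.'}
```

With k independent key conflicts of two tuples each there are 2^k repairs, so the bitstrings grow exponentially with the number of conflicts.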

41
Experiments
  • Experiments on synthetic data sets (football
    teams, graph 3-coloring) showed positive effects
  • Drastic improvement over naïve DLV evaluation
  • Still, the marking effort increases quickly with the
    number of conflicts (viable for few conflicts)
  • But: inspiration for internal marking of
    relations for non-ground query answering in DLV

42
INFOMIX Demo Scenario
  • University of Rome La Sapienza
  • information about students, courses, professors,
    exams ...
  • 3 legacy databases (MySQL), lots of web pages
  • Global schema: 15 relations,
    30 constraints (KDs, IDs, EDs)
  • 40 wrappers (query wrappers, visual wrappers)
  • 10 user queries
  • Experiments: talk by Gianluigi Greco

43
Conclusion
  • INFOMIX: powerful information integration
  • Dealing with inconsistent and incomplete sources
  • Hard problems to solve
  • Fruitful use of computational logic
  • Rich Data Acquisition and Transformation Layer
  • Prototype (under implementation, available soon)

44
Further Issues
  • Data cleaning: important aspect
  • Improve wrapper retrieval
  • Methodology for design / usage
  • Compile (parts of) the logic program to the DBMS: DLVDB
  • Challenge: information integration for
    semi-structured data

45
Publications
  • INFOMIX homepage
  • http://sv.mat.unical.it/infomix/
  • INFOMIX Reports
  • Papers in conferences and journals (PODS,
    ICDT, IJCAI, KR, ICLP, LPNMR, JELIA, ...)

46
Data Integration System
  • Provides a global, unified view of a set of
    heterogeneous, autonomous sources
  • A mapping specifies the relationship between the
    global view and the sources
  • Users pose queries to the global view of the
    data
  • The system computes the answers to the query by
    suitably accessing the sources

47
Example (non-disjunctive)
  • Pq:
      q(X) :- player(X,Y,Z).
      q(X) :- team(V,W,X).    % from ID team[3] ⊆ player[1]
  • PM:
      player_D(X,Y,Z) :- s1(X,Y,W,Z).
      team_D(X,Y,Z) :- s2(X,Y,Z).
      team_D(X,Y,Z) :- s3(X,Y,Z).
      coach_D(X,Y,Z) :- s4(X,Y,Z).
  • PS:
      player(X,Y,Z) :- player_D(X,Y,Z), not -player(X,Y,Z).
      % key player[1]
      -player(X,Y,Z) :- player(X,V,W), player_D(X,Y,Z), Y <> V.
      -player(X,Y,Z) :- player(X,V,W), player_D(X,Y,Z), Z <> W.

48
Example / 2
  • team(X,Y,Z) :- team_D(X,Y,Z), not -team(X,Y,Z).
    % key team[1]
    -team(X,Y,Z) :- team(X,V,W), team_D(X,Y,Z), Y <> V.
    -team(X,Y,Z) :- team(X,V,W), team_D(X,Y,Z), Z <> W.
  • coach(X,Y,Z) :- coach_D(X,Y,Z), not -coach(X,Y,Z).
    % key coach[1]
    -coach(X,Y,Z) :- coach(X,V,W), coach_D(X,Y,Z), Y <> V.
    -coach(X,Y,Z) :- coach(X,V,W), coach_D(X,Y,Z), Z <> W.
  • % ED player[1] ∩ coach[1] = ∅
    -player(X,Y,Z) :- player_D(X,Y,Z), coach(X,V,W).
    -coach(X,Y,Z) :- coach_D(X,Y,Z), player(X,V,W).
    % ED team[3] ∩ coach[1] = ∅
    -coach(X,Y,Z) :- coach_D(X,Y,Z), team(V,W,X).
    -team(X,Y,Z) :- team_D(X,Y,Z), coach(Z,V,W).