Answer Set Programming and Data Integration Systems

1 / 54
About This Presentation
Title:

Answer Set Programming and Data Integration Systems

Description:

... does it relate to host rock structures? Information. Integration. Geologic Map ... What is the cerebellar distribution of rat proteins with more than 70 ... – PowerPoint PPT presentation

Number of Views:53
Avg rating:3.0/5.0
Slides: 55
Provided by: Ste60

less

Transcript and Presenter's Notes

Title: Answer Set Programming and Data Integration Systems


1
Answer Set Programming and Data Integration
Systems
  • S. Costantini
  • Università degli Studi di LAquila
  • stefcost_at_di.univaq.it

2
A Traditional Database Architecture
Query (SQL)
Database Manager (DBMS) -Storage mgmt
-Query processing -View management
-(Transaction processing)
Database (relational)
Answer (relation)
3
An Online Shoppers Information Integration
Problem
El Cheapo Where can I get the cheapest copy
(including shipping cost) of Wittgensteins
Tractatus Logicus-Philosophicus within a week?
One-World Mediation
4
A Home Buyers Information Integration Problem
What houses for sale under 500k have at least 2
bathrooms, 2 bedrooms, a nearby school ranking
in the upper third, in a neighborhood with
below-average crime rate and diverse population?
Multiple-Worlds Mediation
5
A Geoscientists Information Integration Problem
What is the distribution and U/ Pb zircon ages of
A-type plutons in VA? How about their 3-D
geometry ? How does it relate to host rock
structures?
Complex Multiple-Worlds Mediation
6
A Neuroscientists Information Integration Problem
What is the cerebellar distribution of rat
proteins with more than 70 homology with human
NCS-1? Any structure specificity? How about other
rodents?
Complex Multiple-Worlds Mediation
7
What is a Data Integration System?
A system providing
  • Uniform (same query interface to all sources)
  • Access to (queries eventually updates too)
  • Multiple (we want many, but 2 is hard too)
  • Autonomous (DBA doesnt report to you)
  • Heterogeneous (data models are different)
  • Structured (or at least semi-structured)
  • Data Sources (not only databases).

8
Past
  • User must know which sites have relevant info
  • User must go to each one in turn
  • Slow Sequential access takes time
  • Confusing Each site has a different interface
  • User must manually integrate information

9
Perspective
  • Intelligent agents such that
  • User says what she wants
  • Agent decides how when to achieve it

10
References
  • Data Integration System Group at the Dipartimento
    di Informatica e Sistemistica, Univ. Of Roma La
    Sapienza (Prof. Maurizio Lenzerini)
  • INFOMIX information integration, database
    theory, computational logic and deductive
    databases, database implementation and
    optimization, and logic-based software agent
    technology.
  • University of Calabria (Italy), Vienna
    University of Technology (Austria), University of
    Rome La Sapienza (Italy), Rodan Systems
    (Poland).
  • Università degli Studi di Roma "Tor Vergata"
    Dipartimento di informatica, sistemi e
    produzione (Prof. Maria Teresa Pazienza)
  • Database Group of Microsoft Research (Phil
    Bernstein)
  • Answer Set Programming in Data Integration
    Systems, Leopoldo Bertossi, Loreto Bravo
    (Carleton University, Ottawa, Canada) Jan
    Chomichi (University of Buffalo, USA)

11
Integration vs. Distributed DBMS
  • No common schema
  • Sources with heterogeneous schemas
  • Semi-structured sources
  • Legacy Sources
  • Not necessarily relational
  • Access/process limitations
  • Autonomous sources
  • Uncontrolled source content overlap
  • Unpredictable run-time behavior
  • Makes query execution hard

12
Building a Data Integration System
  • Create a middleware mediator or data
    integration system over the sources
  • Can be virtual or a data wharehouse
  • Presents a uniform query interface and schema
  • Abstracts away multitude of sources consults
    them for relevant data

13
Data Warehouse Architecture
OLAP / Decision support/ Data mining
User queries
(Relational?) database (warehouse)
Data extraction, cleaning/ scrubbing
Data extraction programs
Data source
Data source
Data source
14
OLTP vs. OLAP
  • OLTP On-Line Transaction Processing
  • Many short transactions (queries updates)
  • Examples
  • Update account balance
  • Enroll in course
  • Add book to shopping cart
  • Queries touch small amounts of data (one record
    or a few records)
  • Updates are frequent
  • Concurrency is biggest performance concern
  • OLAP On-Line Analytical Processing
  • Long transactions, complex queries
  • Examples
  • Report total sales for each department in each
    month
  • Identify top-selling books
  • Count classes with fewer than 10 students
  • Queries touch large amounts of data
  • Updates are infrequent
  • Individual queries can require lots of resources

15
Data Warehouse
Enterprise Database
Customers
Orders
Transactions
Vendors
Etc
Etc
  • Data Miners
  • Farmers they know
  • Explorers - unpredictable

Copied, organized summarized
Data Warehouse
Data Mining
16
Architecture for Virtual Integration
  • Leave the data in the sources.
  • When a query comes in
  • Determine which sources are relevant to query.
  • Break query into sub-queries for each source.
  • Get answers from sources
  • Combine.
  • Data is fresh.
  • Challenge performance.

17
Virtual Integration Architecture
User queries
Mediated schema
Reformulator
Mediator
Optimizer
Data source catalog
Execution engine
wrapper
wrapper
wrapper
Data source
Data source
Data source
Sources can be relational, hierarchical
(IMS), structured files, web sites.
18
Mediated Integration
  • User poses queries in terms of global schema
  • Relationship between global schema and local,
    source schemas specied in the mediator, as source
    descriptions
  • Mediator is responsible of solving problems of
    data
  • redundancy
  • complementarity
  • inconsistency sources, independently, may be
    consistent, but together, possibly not. E.g.,
    Same ID card number may be assigned to different
    people in different sources

19
XML-Based Mediator Architecture
USER/Client
Query Q ( G (S1,..., Sk) )
Integrated Global (XML) View G
Integrated View Definition G(..)? S1(..)Sk(..)

MEDIATOR
(XML) Queries Results
(XML) View
(XML) View
(XML) View
wrappers implemented as web services
Wrapper
Wrapper
Wrapper
S1
S2
Sk
20
Wrapper Programs
  • Task
  • to communicate with the data sources and do
    format translations.
  • Built w.r.t. a specific source.
  • Can sit either at the source or mediator.
  • Often hard to build
  • (very little science).
  • Can be intelligent
  • perform source-specific optimizations.
  • Exploit ontologies to relate analogous domains

21
Example
ltbgt Introduction to DB lt/bgt ltigt Phil Bernstein
lt/igt ltigt Eric Newcomer lt/igt Addison Wesley,
1999 ltbookgt lttitlegt Introduction to DB
lt/titlegt ltauthorgt Phil Bernstein
lt/authorgt ltauthorgt Eric Newcomer
lt/authorgt ltpublishergt Addison Wesley
lt/publishergt ltyeargt 1999 lt/yeargt lt/bookgt
Transform
into
22
Semantic Mappings between Schemas
  • Source schemas XML DTDs

house
address
num-baths
contact-info
agent-name agent-phone
1-1 mapping
non 1-1 mapping
house
location contact
full-baths
half-baths
name phone
23
Accessing Sources via Wrappers
SELECT address, tel FROM Restaurant WHERE cuisine
chinese
Chinois, 2720 Main St, 310-777-9876 Peking Star,
1 Broad St, 213-999-7676 .....
24
Wrapper Generation
  • Approaches to automating the process
  • Exploit format information (structure, HTML etc.
    )
  • Template based approaches
  • Machine learning techniques
  • (e.g., shallow NLP where NLP Natural
    Language Processing)
  • LIXTO semi-automated wrapper generation
  • Elog prolog patterns
  • HTML ? operator ?Elog ? XML

25
Semantic Web
  • The semantic-web initiative attempts to
    automate schema mapping
  • Idea Allow pages to write logical axioms
    relating their vocabulary (tags) to other
    external tags
  • Support automatic inference of relations between
    source and mediator schema using these rules

26
Query Model in Virtual Integration
  • User formulates query in terms of his/her
    ontology on the mediated (or global) schema
  • System reformulates queries in terms of
    sub-queries for each source (local schema)
  • Structure of the query model should be more
    intuitive for the user

27
Reformulation Problem
  • Given
  • A query Q posed over the mediated schema
  • Descriptions of the data sources
  • Find
  • A query Q over the data source relations, such
    that
  • Q provides only correct answers to Q, and
  • Q provides all possible answers from to Q given
    the sources.

28
Deductive Databases
  • Tables viewed as predicates.
  • Ops on tables expressed as datalog rules
  • (Horn clauses, without function symbols)
  • Enames(Name) - Employe(Name, SSN) Projection
  • Wealthy-Employee(Name)-
  • Employee(Name,SSN), Salary(SSN,Money),Moneygt
    10

  • Selection
  • Ed(Name, Dname)-
  • Employee(Name, SSN), E_Dependents(SSN,
    Dname)

  • Join
  • ERelated(Name,Dname) - Ed(Name,Dname)
  • ERelated(Name,Dname) - Ed(Name,D1),
    ERelated(D1,D2)

  • Recursion

29
More datalog terminology
  • A datalog program is a set of datalog rules.
  • A program with one rule is a conjunctive query.
  • We distinguish EDB predicates and IDB predicates
  • EDBs are stored in the database,
  • appear only in rule bodies
  • IDBs are intensionally defined,
  • appear in both bodies and heads.

30
Approaches to Specifying Schema Descriptions
  • Global-as-View express the mediated schema
    relations as a set of views over the data source
    relations
  • Local-as-View express the source relations as
    views over the mediated schema.

31
GaV an example
32
Approaches for Mapping
33
Global-as-view vs. Local-as-view
  • Global-as-view approach
  • Each item in Global schema/ontology as a view
    (query) over source schemas/ontologies
  • query(G) query(f(S1, S2, , Sn))
  • Local-as-view approach
  • Each source as a view/query over global
    schema/ontology
  • query(G) query(f1-1 (S1), f2-1(S2), , fn-1
    (Sn))

34
GAV vs. LAV
  • Not modular
  • Addition of new sources changes the schema
    mapping
  • Can be awkward to write mediated schema without
    loss of information
  • Query reformulation easy
  • Often reduces to view unfolding (polynomial)
  • Can build hierarchies of mediated schemas
  • Best when
  • Few, stable, data sources
  • well-known to the mediator (e.g. corporate
    integration)
  • Modular--adding new sources is easy
  • Very flexible--power of the entire query language
    available to describe sources
  • Reformulation is hard
  • Involves answering queries only using views (can
    be intractable)
  • Best when
  • Many, relatively unknown data sources
  • possibility of addition/deletion of sources

35
Data Integration Global-as-View in Answer Set
Programming
  • S. Costantini
  • Università degli Studi di LAquila
  • stefcost_at_di.univaq.it

36
Computing Answers in Data Integration Systems
  • Answer set programming (ASP) gives a semantics to
    datalog (logic) programs with negation (and
    possibly disjunction in the heads)
  • ASP-based specification of a data integration
    system
  • ASP-based specification of extensions (plausible
    answers, repairs)

37
Computing Answers in Data Integration Systems
  • Methodology works for first-order queries (and
    Datalog extensions), and many ICs
  • LAV Bravo and Bertossi, IJCAI03
  • GAV Costantini and Formisano, ASP03
  • (experiments and a working program)

38
GaV in ASP
  • Instead of unfolding w.r.t. each query, helper
    model so as to get a general inference engine
  • Easily express integrity constraints
  • Cope with incomplete information (also resulting
    form adding new sources)

39
GaV an example
40
Front-end Global Schema through a Helper Model
  • person(X)- entity(1,X). meta-concepts naming
  • organization(X)- entity(2,X).
  • member(X,Y)- relat(1,X,Y).
  • student(X)- entity(3,X).
  • university(X)- entity(4,X).
  • enrolled(X,Y)- relat(2,X,Y).
  • age(X,Y)- attr(1,X,Y).
  • Constraints on student and enrolled
  • enrolledE(X)- student(X),university(Y),enrolled(X
    ,Y).
  • - not student(X),enrolled(X,Y).
  • - student(X),not enrolledE(X).

41
Back-end Mapping to the Source Schemas
  • entity(1,X)- s(1,X).
  • entity(2,X)- s(2,X).
  • entity(4,X)- s(5,X).
  • student is described implicitly as the domain
    of an unknown relationship
  • entity(3,X)- s(3,X,Y).
  • student is described implicitly as the domain
    of one of the sources of enrolled
  • entity(3,X)- s(4,X,Z).

42
Back-end Mapping to the Source Schemas
  • Relationships
  • relat(1,X,Y)- s(7,X,Z),s(8,Z,Y).
  • relat(2,X,Y)- s(4,X,Y).
  • relat(2,X,Y)-s(9,X,Y).
  • attr(1,X,Y)- s(3,X,Y).
  • attr(1,X,Y)- s(6,X,Y,Z).

43
Helper Model Relational Inference Engine
  • Meta-rules for is_a chaining
  • relat(M,X,Y)- schema(N,E1,E2),
    entity(E1,X), entity(E2,Y),
    is_a_r(N,M), relat(N,X,Y).
  • entity(M,X)- is_a_e(N,M), entity(N,X).

44
GaV an example
45
Dealing with Incomplete Information
  • Assume to add a new source for Enrolled, that in
    the back-end becomes, e.g.,
  • relat(2,X,Y)- s(4,X,Y).
  • relat(2,X,Y)- s(9,X,Y).
  • No connection to student but, one who is
    enrolled to a University should be a student

46
Dealing with Incomplete Information (cont. 1)
ASP constraints
  • Constraints on student and enrolled
  • enrolledE(X)- university(Y),enrolled(X,Y).
  • Student mandatory in Enrolled
  • - student(X),not enrolledE(X).
  • One if enrolled is a student
  • - not student(X),enrolled(X,Y).

47
Dealing with Incomplete Information (cont. 2)
Generating a Search Space
  • specifies possibly incomplete entities
  • ... student(X)- entity(3,X).
  • incomplete(3). Student
  • Any individual might occur in
  • an incomplete entity
  • entity(N,X)- incomplete(N), not noentity(N,X).
  • noentity(N,X)- incomplete(N), not entity(N,X).

48
Dealing with Incomplete Information (cont. 2)
Pruned Search space
  • One who is enrolled is a student
  • Instance of a well-known problem in GaV
  • Not solvable with simple unfolding
  • Simply (though not efficiently) solved in ASP

49
Dealing with Incomplete Information
50
Dealing with Incomplete Information (cont. 1)
51
Dealing with Incomplete Information (cont. 2)
52
Dealing with Incomplete Information (cont. 3)
53
Dealing with Incomplete Information (cont. 4)
54
Observations and Issues
55
The End, thank you for your attention!
Write a Comment
User Comments (0)