Title: Data Integration under the Schema Tuple Query Assumption
1 Data Integration under the Schema Tuple Query Assumption
- Michael Minock
- The University of Umeå, Sweden
2 Introduction
- Problem
  - Queries may be over information that is not (yet) covered by the data integration system
    - "List museums in Vienna or Bratislava holding paintings by Klimt or Picasso."
  - A purely extensional response misleads
- Solution
  - Give the available extension, but contextualize it with intensional descriptions of coverage
    - Certain: The following are all the museums in Vienna that hold paintings by Picasso.
    - Possible: The following museums in Vienna do not provide inventory records, so they may have paintings by Klimt.
    - Incomplete: There is no information for museums in Bratislava.
3 Approach
- LAV (Local as View) architecture
  - user queries and data source descriptions restricted to schema tuple queries in L (or Q)
  - currently sources must contain complete and correct views
  - broker mediates user query over sources and supplies a mixed extensional/intensional response
- Use algebraic properties of L (or Q) to derive
  - query plan (using cache)
  - logical descriptions of certain, uncertain and incomplete sets
- Exploit subsumption properties for
  - query simplification
  - natural language generation
4 The Schema Tuple Query Languages L (and Q)
- Assumptions
- L: Tuple relational queries of the form
- Q
- Properties
  - L and Q are decidable for satisfiability
  - Unlike L, Q is closed over negation
  - May calculate difference and intersection, and decide containment, equivalence and disjointness, for queries built using L and Q
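The link between the satisfiability property and the containment, equivalence and disjointness properties can be made explicit. The block below is a standard set of reductions, not taken from the slides; the symbols $q_1$ and $q_2$ are assumed names for two queries over the same relation, and the reductions rely on the language being closed under conjunction and negation, as stated above for Q.

```latex
\begin{align*}
q_1 \subseteq q_2 &\iff q_1 \wedge \neg q_2 \text{ is unsatisfiable} \\
q_1 \equiv q_2 &\iff q_1 \subseteq q_2 \text{ and } q_2 \subseteq q_1 \\
q_1 \cap q_2 = \emptyset &\iff q_1 \wedge q_2 \text{ is unsatisfiable}
\end{align*}
```

Under this reading, each of the three decision problems costs one or two satisfiability checks on a query built by intersection and difference.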
5 Example: Art museum domain
QUERY: List museums in Vienna or Bratislava holding paintings by Klimt or Picasso.
Schema:
Artist (id, name, country, DOB, DOD)
Museum (id, name, address, city, country)
Painting (id, title, year, artistId)
HasPainting (museumId, paintingId)
Data sources:
- Central European Museums
- MAK Inventory
- Picasso Locator
- Albertina Inventory
6 Example: Input Expressions
QUERY
(m Museum (IN m city ("Vienna" "Bratislava"))
  (∃ (y1 y2 y3) (HasPainting y1) (Painting y2) (Artist y3)
    (= m id y1 museumId) (= y1 paintingId y2 id)
    (= y2 artistId y3 id) (IN y3 name ("Klimt" "Picasso"))))
Picasso Locator
(h HasPainting
  (∃ (y1 y2) (Painting y1) (Artist y2)
    (= h paintingId y1 id) (= y1 artistId y2 id)
    (= y2 name "Picasso")))
Central European Museums
(m Museum (IN m city ("Vienna" "Prague" "Berlin")))
MAK Inventory
(h HasPainting
  (∃ (y1) (Museum y1)
    (= h museumId y1 id) (= y1 name "MAK") (= y1 city "Vienna")))
Albertina Inventory
(h HasPainting
  (∃ (y1) (Museum y1)
    (= h museumId y1 id) (= y1 name "Albertina") (= y1 city "Vienna")))
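The parenthesized expressions above are regular enough to be read by a small recursive-descent reader. The following is a minimal sketch, not STEP's actual parser: the function name `parse` is hypothetical, and the sketch assumes that the quantifier is written `∃` and equality `=`, which is our reading of symbols that did not survive in the slide text. Quoted strings are returned without their quotes, so strings and symbols are not distinguished in the result.

```python
import re

# Tokens: parentheses, double-quoted strings, or bare symbols such as IN, =, y1.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')

def parse(text):
    """Parse one parenthesized expression into nested Python lists."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume the ")"
            return items
        if tok == ")":
            raise ValueError("unexpected closing parenthesis")
        # Strip the quotes from string literals; leave symbols untouched.
        return tok[1:-1] if tok.startswith('"') else tok

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return tree

# The Picasso Locator source description from the slide.
picasso_locator = parse(
    '(h HasPainting (∃ (y1 y2) (Painting y1) (Artist y2) '
    '(= h paintingId y1 id) (= y1 artistId y2 id) (= y2 name "Picasso")))'
)
```

Here `picasso_locator[0]` is the free tuple variable, `picasso_locator[1]` the relation it ranges over, and the remaining elements are the conditions. The reader rejects unbalanced input, which is a cheap check to run on hand-written source descriptions.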
7 Example: Output Expressions
Certain
(m Museum (= m city "Vienna")
  (∃ (y1 y2 y3) (HasPainting y1) (Painting y2) (Artist y3)
    (= m id y1 museumId) (= y1 paintingId y2 id)
    (= y2 artistId y3 id) (= y3 name "Picasso")))
(m Museum (= m city "Vienna") (IN m name ("Albertina" "MAK"))
  (∃ (y1 y2 y3) (HasPainting y1) (Painting y2) (Artist y3)
    (= m id y1 museumId) (= y1 paintingId y2 id)
    (= y2 artistId y3 id) (= y3 name "Klimt")))
Uncertain
(m Museum (= m city "Vienna") (NOT_IN m name ("Albertina" "MAK"))
  (∃ (y1 y2 y3) (HasPainting y1) (Painting y2) (Artist y3)
    (= m id y1 museumId) (= y1 paintingId y2 id)
    (= y2 artistId y3 id) (= y3 name "Klimt")))
Incomplete
(m Museum (= m city "Bratislava")
  (∃ (y1 y2 y3) (HasPainting y1) (Painting y2) (Artist y3)
    (= m id y1 museumId) (= y1 paintingId y2 id)
    (= y2 artistId y3 id) (IN y3 name ("Klimt" "Picasso"))))
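To see why the query splits into these three groups, it can help to enumerate the cases by hand. The sketch below is a toy case analysis, not the algebraic differencing that the system performs on the expressions themselves. All names in it (`classify`, `partition`, `OTHER` and the three coverage constants) are hypothetical, and the coverage sets are simply read off the four source descriptions of the example.

```python
from itertools import product

# Coverage read off the example's source descriptions (assumed encoding).
MUSEUM_CITIES_COVERED = {"Vienna", "Prague", "Berlin"}            # Central European Museums
INVENTORY_MUSEUMS = {("MAK", "Vienna"), ("Albertina", "Vienna")}  # MAK / Albertina Inventory
LOCATOR_ARTISTS = {"Picasso"}                                     # Picasso Locator

OTHER = "<other>"  # stands for any museum name other than those listed


def classify(city, museum, artist):
    """Classify one (city, museum name, artist) case of the user query."""
    if city not in MUSEUM_CITIES_COVERED:
        return "incomplete"  # no source lists the museums of this city
    if artist in LOCATOR_ARTISTS or (museum, city) in INVENTORY_MUSEUMS:
        return "certain"     # some source fully covers the holdings in question
    return "uncertain"       # museum is known, but its holdings are not


def partition(cities, museums, artists):
    """Group every case of the query by its coverage class."""
    result = {"certain": [], "uncertain": [], "incomplete": []}
    for case in product(cities, museums, artists):
        result[classify(*case)].append(case)
    return result


parts = partition(["Vienna", "Bratislava"],
                  ["Albertina", "MAK", OTHER],
                  ["Klimt", "Picasso"])
```

With these inputs the only uncertain case is a Vienna museum other than Albertina or MAK holding Klimt, and every Bratislava case is incomplete, which matches the three groups of expressions above.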
8 Example: To Natural Language
QUERY: List museums in Vienna or Bratislava holding paintings by Klimt or Picasso.
Certain
- Museums in Vienna that have paintings by Picasso.
- Museums in Vienna named Albertina or MAK that have paintings by Klimt.
Uncertain
- Museums in Vienna not named Albertina or MAK that have paintings by Klimt.
Incomplete
- Museums in Bratislava that have paintings by Picasso or Klimt.
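The descriptions above follow a regular pattern, which a fixed template can reproduce for this one query shape. This is only an illustrative sketch: the function `describe` and its parameters are hypothetical, and a single hard-coded template stands in for the system's phrasal lexicon and subsumption-based generation.

```python
def describe(city, artists, named=None, not_named=None):
    """Render one museum-set description in the style of the slide."""
    phrase = f"Museums in {city}"
    if named:
        # Positive restriction on the museum name (the IN condition).
        phrase += " named " + " or ".join(named)
    if not_named:
        # Negative restriction on the museum name (the NOT_IN condition).
        phrase += " not named " + " or ".join(not_named)
    phrase += " that have paintings by " + " or ".join(artists) + "."
    return phrase


certain = describe("Vienna", ["Klimt"], named=["Albertina", "MAK"])
uncertain = describe("Vienna", ["Klimt"], not_named=["Albertina", "MAK"])
incomplete = describe("Bratislava", ["Picasso", "Klimt"])
```

A template like this breaks down as soon as the expression shape changes, which is presumably why the approach relies on a lexicon of phrases tied to query fragments instead.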
9 Pros and cons of L and Q
- Pros
  - May represent n-ary relations
  - Direct translation to SQL!
  - Some negation
  - General cyclic queries
    - "The artists without paintings in a museum in the country of their origin."
- Cons
  - No projection!
  - Certain quantifier prefixes prohibited
    - "The artists with paintings in all of the museums in the country of their origin."
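The claim about SQL translation can be illustrated with the art museum schema. The sketch below is a hand translation, not output of the system: the table definitions follow the example schema, the column types and all inserted rows are invented for illustration, and the names `USER_QUERY` and `CYCLIC_QUERY` are hypothetical. Note that both queries return whole tuples of one relation (`m.*`, `a.*`), which reflects the "no projection" restriction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artist (id INTEGER PRIMARY KEY, name TEXT, country TEXT, DOB TEXT, DOD TEXT);
CREATE TABLE Museum (id INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT, country TEXT);
CREATE TABLE Painting (id INTEGER PRIMARY KEY, title TEXT, year INTEGER, artistId INTEGER);
CREATE TABLE HasPainting (museumId INTEGER, paintingId INTEGER);
-- Toy rows: paintings, holdings and 'Example Gallery' are made up.
INSERT INTO Artist VALUES (1, 'Klimt', 'Austria', NULL, NULL),
                          (2, 'Picasso', 'Spain', NULL, NULL);
INSERT INTO Museum VALUES (1, 'Albertina', NULL, 'Vienna', 'Austria'),
                          (2, 'Example Gallery', NULL, 'Prague', 'Czech Republic');
INSERT INTO Painting VALUES (1, 'Sample A', NULL, 1), (2, 'Sample B', NULL, 2);
INSERT INTO HasPainting VALUES (1, 1), (2, 2);
""")

# Museums in Vienna or Bratislava holding paintings by Klimt or Picasso.
# The quantified variables y1, y2, y3 become an EXISTS subquery.
USER_QUERY = """
SELECT m.* FROM Museum AS m
WHERE m.city IN ('Vienna', 'Bratislava')
  AND EXISTS (
    SELECT 1 FROM HasPainting AS y1, Painting AS y2, Artist AS y3
    WHERE m.id = y1.museumId AND y1.paintingId = y2.id
      AND y2.artistId = y3.id AND y3.name IN ('Klimt', 'Picasso'))
"""

# Artists without paintings in a museum in the country of their origin.
# The join cycle closes on a.country; the negation becomes NOT EXISTS.
CYCLIC_QUERY = """
SELECT a.* FROM Artist AS a
WHERE NOT EXISTS (
    SELECT 1 FROM Painting AS y1, HasPainting AS y2, Museum AS y3
    WHERE y1.artistId = a.id AND y2.paintingId = y1.id
      AND y2.museumId = y3.id AND y3.country = a.country)
"""

museums = conn.execute(USER_QUERY).fetchall()
artists = conn.execute(CYCLIC_QUERY).fetchall()
```

On the toy rows, `museums` contains only the Vienna museum and `artists` contains only the artist whose sole painting hangs outside the artist's country of origin.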
10 Next STEP
- STEP 1.0 (Schema Tuple Expression Processor)
- Incomplete and/or incorrect source views
- Real applications
Figure (component labels): Datasource Descriptions, Phrasal Lexicon, Cache DB, Broker, NLG, Differencing Engine/Simplifier, L2DomainCalculus, SPASS theorem prover