Title: Global-as-View and Local-as-View for Information Integration
1 Global-as-View and Local-as-View for Information Integration
- CS652 Spring 2004
- Presenter: Yihong Ding
2 Common Integration Architecture
- Information Integration Systems
- Global-as-View (GAV) vs. Local-as-View (LAV)
- Query Reformulation
- Specification of Source Description
- Adding new sources
3 Query Reformulation
- Problem: rewrite a user query expressed in the mediated schema into a query expressed in the source schema
- Given a query Q in terms of the mediator schema relations, and descriptions of the information sources
- Find a query Q' that uses only the source relations, such that
  - Q' ⊆ Q, and
  - Q' provides all possible answers to Q given the sources
4 Solving Queries by Views
[Figure: mediator relations and source relations]
5 Query Rewriting Using Views
- Query containment: q' ⊆ q ⟺ ∀D, q'(D) ⊆ q(D)
- Query equivalence: q' ≡ q ⟺ q' ⊆ q and q ⊆ q'
- Given query q and view definitions V = {v1, ..., vn}
- q' is an equivalent rewriting of q using V if
  - q' refers only to views in V, and
  - q' ≡ q
- q' is a maximally-contained rewriting of q using V if
  - q' refers only to views in V,
  - q' ⊆ q, and
  - there is no rewriting q1 such that q' ⊆ q1 ⊆ q and q1 is not equivalent to q'
6 Computation Complexity
7 Complexity of Query Containment
- Conjunctive Queries (CQ) (NP-complete)
  - Q1: p(X,Z) :- a(X,Y), a(Y,Z)
  - Q2: p(X,Z) :- a(X,Y), a(V,Z)
- CQs with Negation (Π₂ᵖ-complete)
  - Q1: p(X,Z) :- a(X,Y), a(Y,Z), NOT a(X,Z)
- CQs with Arithmetic Comparison (Π₂ᵖ-complete)
  - Q1: p(X,Z) :- a(X,Y), a(Y,Z), X < Y
- Datalog Programs
  - p(A,C) :- a(A,B), b(B,C)
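Containment of plain conjunctive queries can be decided by searching for a containment mapping. The following is a minimal sketch of that test applied to Q1 and Q2 above; the tuple encoding of atoms and the function name `contained_in` are illustrative assumptions, and constants, negation, and comparisons are not handled.

```python
from itertools import product

# An atom is (predicate, (var, ...)); a query is (head_atom, [body_atoms]).
# Assumption: every argument is a variable (no constants).

def contained_in(q1, q2):
    """Return True if q1 is contained in q2, by brute-force search for a
    containment mapping from the variables of q2 to the variables of q1."""
    head1, body1 = q1
    head2, body2 = q2
    if head1[0] != head2[0] or len(head1[1]) != len(head2[1]):
        return False
    vars1 = sorted({v for _, args in [head1] + body1 for v in args})
    vars2 = sorted({v for _, args in [head2] + body2 for v in args})
    body1_set = set(body1)
    # Try every assignment of q2's variables to q1's variables
    # (exponential in the number of variables).
    for image in product(vars1, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        head_ok = tuple(h[v] for v in head2[1]) == head1[1]
        body_ok = all((p, tuple(h[v] for v in args)) in body1_set
                      for p, args in body2)
        if head_ok and body_ok:
            return True
    return False

Q1 = (("p", ("X", "Z")), [("a", ("X", "Y")), ("a", ("Y", "Z"))])
Q2 = (("p", ("X", "Z")), [("a", ("X", "Y")), ("a", ("V", "Z"))])

print(contained_in(Q1, Q2))  # True: map V to Y, keep X, Y, Z
print(contained_in(Q2, Q1))  # False: a(Y,Z) has no image in Q2
```

So Q1 ⊆ Q2 but not the reverse: Q2 does not require the two a-atoms to join on a shared variable.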
8 Specification of Source Description
- Views: resources used by the integrator to help answer queries
- GAV: each mediator relation is defined as a view over source relations
- LAV: each source relation is defined as a view over mediator relations
9 Information Integration Systems
- TSIMMIS
  - Stanford and IBM
  - Global-as-View (GAV)
  - Mediator relations defined as views of source relations
- Information Manifold (IM)
  - AT&T
  - Local-as-View (LAV)
  - Description logic
  - Source relations defined as views of mediator relations (a collection of global predicates)
10 TSIMMIS: GAV Solution
- The Stanford-IBM Manager of Multiple Information Sources (TSIMMIS)
- Offers
  - A flexible data model
  - A common query language
  - Other supporting tools
11 TSIMMIS Components
- OEM (Object-Exchange Model)
- LOREL (Lightweight Object REpository Language)
- MSL (Mediator Specification Language)
- Wrappers
12 TSIMMIS: OEM
- Object Exchange Model
- The data model for TSIMMIS
- Self-describing (labels carry all of the information that there is about an object)
- Flexible
- First order logic
13 TSIMMIS: OEM
[Figure: structure of an OEM object]
- OID: object identifier
- label: human understandable
- type: set or string
- value: a set or a string
14 TSIMMIS: OEM
[Figure: example OEM object tree]
- library (set)
  - book (set)
    - author (string): Aho
    - title (string): Compilers
15 TSIMMIS: OEM
First order predicate logic
[Figure: OEM object with OID 123, label author, type string, value Aho]
author(T, Aho)
This would return the object IDs of all objects with a label author and value Aho.
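The lookup described above can be sketched in Python. This is only an illustration of the idea, not TSIMMIS code: the `OemObject` record, the `store` list, and every OID other than 123 are assumptions made for the example.

```python
from typing import NamedTuple, Tuple, Union

class OemObject(NamedTuple):
    oid: int
    label: str
    type: str                             # "set" or "string"
    value: Union[str, Tuple[int, ...]]    # a string, or the OIDs of the members

# Hypothetical object store; only OID 123 / author / Aho comes from the slide.
store = [
    OemObject(100, "library", "set", (110,)),
    OemObject(110, "book", "set", (123, 124)),
    OemObject(123, "author", "string", "Aho"),
    OemObject(124, "title", "string", "Compilers"),
]

def lookup(label, value):
    """Return the OIDs of all objects with the given label and value."""
    return [o.oid for o in store if o.label == label and o.value == value]

print(lookup("author", "Aho"))  # [123]
```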
16 TSIMMIS: LOREL
- Lightweight Object REpository Language
- An OQL for OEM
- The end-user language for TSIMMIS
17 TSIMMIS: LOREL
select library.book.title from library where library.book.author = "Aho"
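To make the path expression concrete, here is a rough Python emulation of what this query asks for. It is a sketch under assumptions: OEM objects are encoded as (label, value) pairs, the helper names are invented for the example, and the second book is hypothetical sample data.

```python
# OEM objects as (label, value) pairs; a value is a string or a list of sub-objects.
library = ("library", [
    ("book", [("author", "Aho"), ("title", "Compilers")]),
    # Hypothetical second entry, added only to show filtering.
    ("book", [("author", "Knuth"), ("title", "The Art of Computer Programming")]),
])

def children(obj, label):
    """Sub-objects of obj that carry the given label (one path step)."""
    _, value = obj
    if isinstance(value, str):
        return []
    return [child for child in value if child[0] == label]

def titles_by_author(lib, author):
    """Emulates: select library.book.title where library.book.author = author."""
    result = []
    for book in children(lib, "book"):
        if any(a[1] == author for a in children(book, "author")):
            result.extend(t[1] for t in children(book, "title"))
    return result

print(titles_by_author(library, "Aho"))  # ['Compilers']
```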
18 TSIMMIS: LOREL
select R.A from R, S, T where R.A = S.A or R.A = T.A
- This would fail to return anything in SQL if either S or T were empty.
- Because of partial match semantics this does not fail in LOREL
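The SQL half of this claim can be checked with Python's built-in sqlite3 module. The table contents are made up for the demonstration, and the last few lines are only a rough Python emulation of the partial-match reading, not LOREL itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A INTEGER);
    CREATE TABLE S (A INTEGER);
    CREATE TABLE T (A INTEGER);
    INSERT INTO R VALUES (1), (2);
    INSERT INTO S VALUES (1);
""")  # T is deliberately left empty

sql_rows = conn.execute(
    "SELECT R.A FROM R, S, T WHERE R.A = S.A OR R.A = T.A"
).fetchall()
print(sql_rows)  # [] because the cross product R x S x T has no rows

# Rough emulation of the partial-match reading: keep an R value if it
# matches in S or in T, without requiring both relations to be non-empty.
r = [row[0] for row in conn.execute("SELECT A FROM R ORDER BY A")]
s = {row[0] for row in conn.execute("SELECT A FROM S")}
t = {row[0] for row in conn.execute("SELECT A FROM T")}
partial = [a for a in r if a in s or a in t]
print(partial)  # [1]
```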
19 TSIMMIS: MSL
- Mediator Specification Language
- Allows declarative specification of mediators
- Object oriented, logical query language
- Targeted to OEM
20 TSIMMIS: MSL
[Figure: a query passes through mediators and wrappers to the sources; example object library (set) > book (set) > author (string) Aho]
<booktitle X> :- <library <book <title X> <author Aho>>>@s1
21 TSIMMIS: Wrappers
- Wrappers are similar to database drivers
- Wrappers are written with MSL
[Figure: query, mediators, wrappers, sources]
22 TSIMMIS: Wrappers
MSL template // action //
<books X> :- <library X <book <title X> <author AU>>>@s1 // sprintf(lookup-query, "find author %s", AU) //
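The template/action pairing can be imitated loosely in Python: a pattern recognizes the incoming query and binds AU, and the action formats the native lookup command. This is an analogy only. The `TEMPLATES` table, the regular expression, and `translate` are hypothetical and do not reproduce the TSIMMIS wrapper toolkit.

```python
import re

# Hypothetical template table: (pattern over the incoming query, native command format).
TEMPLATES = [
    (re.compile(r"^<author (?P<AU>[^<>]+)>$"), "find author {AU}"),
]

def translate(query):
    """Return the native command for the first matching template, else None."""
    for pattern, action in TEMPLATES:
        m = pattern.match(query)
        if m:
            # Plays the role of the sprintf action: fill the bound variable in.
            return action.format(**m.groupdict())
    return None  # no template matches, so this wrapper cannot answer the query

print(translate("<author Aho>"))  # find author Aho
```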
23 TSIMMIS Summary
- End users need to specify their sources w.r.t. a mediator model (OEM in TSIMMIS)
- Query specification is standard (LOREL)
- Query rewriting is straightforward (MSL and wrappers)
- Adding a new source is not easy: it must be specified in the mediator model
24 Information Manifold
- Challenges for Information Integration
  - Interrelated data over multiple information sources
  - Large number of sources
  - Limited size of data in many of the sources
  - Widely varying details of interacting with each source
25 IM Architecture
26 World View
Classes
[Figure: class hierarchy over Product, Automobile, Motorcycle, Car, NewCar, UsedCar, CarForSale]
Virtual Relations
- Product(Model)
- Automobile(Model, Year, Category)
- Motorcycle(Model, Year)
- Car(Model, Year, Category)
- NewCar(Model, Year, Category)
- UsedCar(Model, Year, Category)
- CarForSale(Model, Year, Category, Price, SellerContact)
27 Source Descriptions
- For each source
  - Content Record
  - Capability Record
[Table: Web Sources for Automobile Application]
28 Content Records of Auto Sources
29 Capability Records of Auto Sources
30 Query Reformulation
- Contained rewriting instead of equivalent rewriting
  - Incomplete sources
  - Useful subset
- Utilizes Plan Generator to
  - Prune irrelevant sources
  - Split query into subgoals
  - Generate conjunctive query plans
  - Find executable ordering of subgoals
31 The Bucket Algorithm
- Given user query q and source descriptions {Vi}
- Find relevant sources (fill buckets)
  - For each relation g in query q
    - Find Vj that contains relation g
    - Check that constraints in Vj are compatible with q
- Combine source relations Vj from each bucket into a conjunctive query q' and check for containment (q' ⊆ q)
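The first two steps (filling buckets, then combining one entry per bucket) can be sketched as follows. This is a deliberately simplified illustration: subgoals are matched by relation name only, the constraint-compatibility and containment checks are left out, and the coverage sets assigned to V1, V2, and V5 are assumptions rather than the actual content records.

```python
from itertools import product

# Subgoals of a car-for-sale query, by relation name only (variables omitted).
query = ["CarForSale", "Category", "Year", "Model", "Price", "ProductReview"]

# Hypothetical source descriptions: source name -> mediator relations it covers.
sources = {
    "V1": {"CarForSale", "Category", "Year", "Model", "Price"},
    "V2": {"CarForSale", "Model", "Price"},
    "V5": {"ProductReview"},
}

def fill_buckets(query, sources):
    """One bucket per subgoal, holding every source that mentions that relation."""
    return {g: [v for v, rels in sorted(sources.items()) if g in rels]
            for g in query}

def candidate_plans(buckets):
    """Pick one source per bucket; each combination is a candidate rewriting
    that would still have to pass the containment check."""
    return list(product(*buckets.values()))

buckets = fill_buckets(query, sources)
plans = candidate_plans(buckets)
print(buckets["CarForSale"])  # ['V1', 'V2']
print(len(plans))             # 8 candidates = 2 * 1 * 1 * 2 * 2 * 1
print(plans[0])               # ('V1', 'V1', 'V1', 'V1', 'V1', 'V5')
```

The number of candidates is the product of the bucket sizes, which is why pruning irrelevant sources while filling the buckets matters.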
32 The Bucket Algorithm: Example
q(m,p,r) ← CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)
33 Step 1: Filling the Buckets
q(m,p,r) ← CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)
34 Step 2: Checking Containment
User Query: q(m,p,r) ← CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)
Result Query: q'(m,p,r) ← V1(c)(Category(c)=sportscar, Price(c), Model(c), Year(c), Year(c) ≥ 1992, Category(c)=sportscar), V5(m,y,r)(m=Model(c), y=Year(c), r, ).
Check: q' ⊆ q
35 Finding an Executable Ordering
[Figure: subgoals to be ordered: V1(c), V1(c,t), V1(c,y), V1(c,m), V1(c,p), V5(m,y,r)]
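One simple way to look for an executable ordering is a greedy pass: repeatedly pick any subgoal whose required inputs are already bound. The sketch below assumes hypothetical capability records (which variables each subgoal needs bound and which it binds); the real capability records are not reproduced in these slides.

```python
def executable_order(subgoals, bound=()):
    """subgoals: list of (name, required_inputs, outputs).
    Returns an executable ordering of the names, or None if there is none."""
    bound = set(bound)
    remaining = list(subgoals)
    order = []
    while remaining:
        # Any subgoal whose inputs are all bound may run next.
        ready = next((s for s in remaining if set(s[1]) <= bound), None)
        if ready is None:
            return None  # every remaining subgoal is waiting on an unbound input
        order.append(ready[0])
        bound |= set(ready[2])
        remaining.remove(ready)
    return order

# Hypothetical capabilities: V1 can enumerate cars; V5 needs m and y bound.
subgoals = [
    ("V5(m,y,r)", {"m", "y"}, {"r"}),
    ("V1(c,m)", {"c"}, {"m"}),
    ("V1(c,y)", {"c"}, {"y"}),
    ("V1(c)", set(), {"c"}),
]
print(executable_order(subgoals))
# ['V1(c)', 'V1(c,m)', 'V1(c,y)', 'V5(m,y,r)']
```

Because the set of bound variables only grows, the greedy choice never blocks a subgoal that some other ordering could have run.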
36 Advantages and Disadvantages
- GAV: TSIMMIS
  - Advantage
    - Query reformulation: rule unfolding
  - Disadvantage
    - Mediation description
    - Adding, removing, and modifying source descriptions
  - Better for static, centralized systems
- LAV: Information Manifold
  - Advantage: adding new sources
    - Mediator (global predicates, source descriptions)
    - Query processing
  - Disadvantage
    - Query reformulation (Bucket algorithm)
  - Better for dynamic, distributed systems
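To show why GAV reformulation reduces to rule unfolding, here is a minimal sketch. The mediator relation `Book`, the source relations `s1_titles` and `s1_authors`, and the tuple encoding are all hypothetical; the sketch also assumes each view head lists distinct variables.

```python
# Hypothetical GAV mapping: mediator relation -> (head variables, body over sources).
# Atoms are (relation, args).
gav = {
    "Book": (("T", "A"), [("s1_titles", ("I", "T")), ("s1_authors", ("I", "A"))]),
}

def unfold(query_body, gav):
    """Replace every mediator atom by the body of its GAV definition."""
    result, fresh = [], 0
    for rel, args in query_body:
        if rel not in gav:
            result.append((rel, args))
            continue
        head_vars, body = gav[rel]
        fresh += 1
        subst = dict(zip(head_vars, args))
        for brel, bargs in body:
            # Head variables take the query's arguments; the view's other
            # variables get fresh names so separate unfoldings do not clash.
            result.append((brel, tuple(subst.get(v, f"{v}_{fresh}") for v in bargs)))
    return result

q = [("Book", ("Title", "Aho"))]
print(unfold(q, gav))
# [('s1_titles', ('I_1', 'Title')), ('s1_authors', ('I_1', 'Aho'))]
```

No search is involved, which is the contrast with the bucket-style reformulation needed under LAV.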