Global-as-View and Local-as-View for Information Integration

1
Global-as-View and Local-as-View for Information Integration
  • CS652 Spring 2004
  • Presenter: Yihong Ding

2
Common Integration Architecture
  • Information Integration Systems
  • Global-as-view (Gav.) vs. Local-as-view (Lav.)
  • Query Reformulation
  • Specification of Source Description
  • Adding new sources

3
Query Reformulation
  • Problem: rewrite a user query expressed in the
    mediated schema into a query expressed in the
    source schema
  • Given: a query Q in terms of the mediator schema
    relations, and descriptions of the information
    sources
  • Find: a query Q' that uses only the source
    relations, such that
  • Q' ⊆ Q, and
  • Q' provides all possible answers to Q given the
    sources

4
Solving Queries by Views
Mediator Relations
Source Relations
5
Query Rewriting Using Views
  • Query Containment: q' ⊆ q ⇔ ∀D, q'(D) ⊆ q(D)
  • Query Equivalence: q' ≡ q ⇔ q' ⊆ q and q ⊆ q'
  • Given a query q and view definitions V = {v1, ..., vn}
  • q' is an Equivalent Rewriting of q using V if
  • q' refers only to views in V, and
  • q' ≡ q
  • q' is a Maximally-Contained Rewriting of q using
    V if
  • q' refers only to views in V, and
  • q' ⊆ q, and
  • There is no rewriting q1 such that q' ⊂ q1 and
    q1 ⊆ q
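
The containment definition quantifies over every database D, so a single instance can refute containment but never prove it. The following is a minimal Python sketch of that idea. The query encoding (a pair of head variables and a list of subgoals), the function name `evaluate`, and the sample database are all assumptions made for illustration, and constants in queries are not handled.

```python
from itertools import product

def evaluate(query, db):
    """Evaluate a conjunctive query on db by brute force.

    query = (head_vars, body), body = [(relation_name, (var, ...)), ...]
    db    = {relation_name: set of tuples}
    Every term is treated as a variable (no constants in this sketch).
    """
    head, body = query
    answers = set()
    # Try every combination of one stored tuple per subgoal.
    for rows in product(*(db.get(rel, set()) for rel, _ in body)):
        binding = {}
        consistent = True
        for (_, args), row in zip(body, rows):
            for var, val in zip(args, row):
                # A variable must take the same value everywhere it occurs.
                if binding.setdefault(var, val) != val:
                    consistent = False
                    break
            if not consistent:
                break
        if consistent:
            answers.add(tuple(binding[v] for v in head))
    return answers

# Illustrative queries: q1 joins two a-edges, q2 leaves them unconnected.
q1 = (("X", "Z"), [("a", ("X", "Y")), ("a", ("Y", "Z"))])
q2 = (("X", "Z"), [("a", ("X", "Y")), ("a", ("V", "Z"))])
db = {"a": {(1, 2), (3, 4)}}

print(evaluate(q1, db))  # no two edges chain together in db
print(evaluate(q2, db))  # every first column paired with every second column
```

On this instance q2 returns answers that q1 does not, so the instance is a counterexample to q2 ⊆ q1. It says nothing conclusive about q1 ⊆ q2, which needs an argument over all databases.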

6
Computation Complexity
7
Complexity of Query Containment
  • Conjunctive Queries (CQ) (NP-Complete)
  • Q1: p(X,Z) :- a(X,Y), a(Y,Z)
  • Q2: p(X,Z) :- a(X,Y), a(V,Z)
  • CQs With Negation (Π₂ᵖ-Complete)
  • Q1: p(X,Z) :- a(X,Y), a(Y,Z), NOT a(X,Z)
  • CQs With Arithmetic Comparison (Π₂ᵖ-Complete)
  • Q1: p(X,Z) :- a(X,Y), a(Y,Z), X < Y
  • Datalog Programs
  • p(A,C) :- a(A,B), b(B,C)
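
For plain conjunctive queries, containment can be decided by searching for a containment mapping: Q1 ⊆ Q2 holds exactly when Q2's variables can be mapped to Q1's variables so that Q2's head becomes Q1's head and each Q2 subgoal becomes a Q1 subgoal. The sketch below tries all mappings by brute force, which is exponential and in line with the NP-completeness noted on the slide. The query encoding and the function name `contained_in` are assumptions made for illustration; constants, negation, and comparisons are not handled.

```python
from itertools import product

def contained_in(q1, q2):
    """Return True if q1 ⊆ q2 for conjunctive queries without constants.

    Searches for a containment mapping h from q2's variables to q1's
    variables with h(head of q2) = head of q1 and h(subgoal) in q1's body
    for every subgoal of q2.
    """
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    vars1 = sorted({v for _, args in body1 for v in args} | set(head1))
    vars2 = sorted({v for _, args in body2 for v in args} | set(head2))
    subgoals1 = {(rel, tuple(args)) for rel, args in body1}
    # Enumerate every assignment of q1 variables to q2 variables.
    for image in product(vars1, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        if tuple(h[v] for v in head2) != tuple(head1):
            continue
        if all((rel, tuple(h[v] for v in args)) in subgoals1
               for rel, args in body2):
            return True
    return False

# The two conjunctive queries from the slide.
Q1 = (("X", "Z"), [("a", ("X", "Y")), ("a", ("Y", "Z"))])
Q2 = (("X", "Z"), [("a", ("X", "Y")), ("a", ("V", "Z"))])

print(contained_in(Q1, Q2))  # mapping X→X, Y→Y, V→Y, Z→Z exists
print(contained_in(Q2, Q1))  # a(Y,Z) has no image among Q2's subgoals
```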

8
Specification of Source Description
  • Views: resources used by the integrator to help
    answer queries
  • Gav: each mediator relation is defined as a view
    over the source relations
  • Lav: each source relation is defined as a view
    over the mediator relations
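
The difference between the two styles can be sketched in a few lines of Python. Everything here is hypothetical: the source names `s1_books` and `s2_books`, the mediator relation `Book`, and the helper `answer_gav` are invented for illustration and do not come from Tsimmis or Information Manifold.

```python
# Hypothetical source contents: (title, author) tuples.
sources = {
    "s1_books": {("Compilers", "Aho")},
    "s2_books": {("Databases", "Ullman")},
}

# Gav: the mediator relation is itself a view over the sources, so
# answering a query is just unfolding that definition.
gav_views = {
    "Book": lambda src: src["s1_books"] | src["s2_books"],
}

def answer_gav(relation, predicate, src):
    """Unfold the mediator relation into its source-level view, then filter."""
    return {row for row in gav_views[relation](src) if predicate(row)}

# Lav: each source is described as a view over mediator relations.
# Adding a source adds one entry and leaves the other entries untouched,
# but answering a query now requires rewriting it using these views.
lav_descriptions = {
    "s1_books": "s1_books(T, A) :- Book(T, A), A = 'Aho'",
    "s2_books": "s2_books(T, A) :- Book(T, A)",
}

print(answer_gav("Book", lambda row: row[1] == "Aho", sources))
```

In the Gav half, adding a third source means editing the definition of `Book`. In the Lav half it means adding one more description, at the cost of a harder reformulation step.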

9
Information Integration Systems
  • Tsimmis
  • Stanford and IBM
  • Global-as-View (Gav)
  • Mediator relations defined as views of source
    relations
  • Information Manifold (IM)
  • AT&T
  • Local-as-View (Lav)
  • Description logic
  • Source relations defined as views of mediator
    relations (a collection of global predicates)

10
TSIMMIS Gav Solution
  • The Stanford-IBM Manager of Multiple Information
    Sources (TSIMMIS)
  • Offers
  • A flexible data model
  • A common query language
  • Other supporting tools

11
TSIMMIS Components
  • OEM (Object-Exchange Model)
  • LOREL (Lightweight Object REpository Language)
  • MSL (Mediator Specification Language)
  • Wrappers

12
TSIMMIS OEM
  • Object Exchange Model
  • The data model for TSIMMIS
  • Self-describing (labels carry all of the
    information there is about an object)
  • Flexible
  • First order logic

13
TSIMMIS OEM
[Figure: the four fields of an OEM object: label (human understandable), type (set or string), value (a set or a string), OID (object identifier)]
14
TSIMMIS OEM
[Figure: OEM object tree: library (set) contains book (set), which contains author (string, Aho) and title (string, Compilers)]
15
TSIMMIS OEM
First order predicate logic
[Figure: OEM object with label author, type string, value Aho, OID 123]
author( T, Aho )
This would return the object IDs of all objects with a label author and value Aho.
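
A minimal Python sketch of this lookup, assuming an OEM object is modelled as a record with the four fields OID, label, type, and value. The class name `OEMObject`, the function `find`, and every OID other than 123 are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class OEMObject:
    oid: int
    label: str
    type: str                 # "set" or "string"
    value: Union[str, tuple]  # a string, or a tuple of child OEMObjects

def find(obj, label, value):
    """Return the OIDs of all objects at or below obj with this label and value."""
    hits = [obj.oid] if (obj.label == label and obj.value == value) else []
    if obj.type == "set":
        for child in obj.value:
            hits.extend(find(child, label, value))
    return hits

# The library example from the earlier slide; OIDs 121, 122, 124 are made up.
author = OEMObject(123, "author", "string", "Aho")
title = OEMObject(124, "title", "string", "Compilers")
book = OEMObject(122, "book", "set", (author, title))
library = OEMObject(121, "library", "set", (book,))

print(find(library, "author", "Aho"))  # plays the role of author( T, Aho )
```
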
16
TSIMMIS LOREL
  • Lightweight Object REpository Language
  • An OQL for OEM
  • The end-user language for TSIMMIS

17
TSIMMIS LOREL
  • Example

select library.book.title from library where library.book.author = "Aho"
18
TSIMMIS LOREL
  • Partial Match Semantics

select R.A from R, S, T where R.A = S.A or R.A = T.A
  • This would fail to return anything in SQL if
    either S or T were empty.
  • Because of partial match semantics this does not
    fail in LOREL
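
The SQL half of this claim can be checked with Python's built-in `sqlite3` module. The function `partial_match` below is only a plain-Python imitation of the behaviour the slide describes, not LOREL's actual evaluation strategy, and the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A INTEGER);
    CREATE TABLE S (A INTEGER);
    CREATE TABLE T (A INTEGER);
    INSERT INTO R VALUES (1), (2);
    INSERT INTO S VALUES (1);
    -- T is left empty on purpose
""")

# SQL forms the cross product R x S x T first; with T empty it is empty,
# so the WHERE clause never sees a row.
sql_rows = conn.execute(
    "SELECT R.A FROM R, S, T WHERE R.A = S.A OR R.A = T.A"
).fetchall()

def partial_match(r, s, t):
    """Keep each r-value found in s or in t; an empty s or t contributes nothing."""
    return [a for a in r if a in s or a in t]

lorel_like = partial_match([1, 2], {1}, set())

print(sql_rows)    # nothing survives the cross product
print(lorel_like)  # the match against S is still reported
```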

19
TSIMMIS MSL
  • Mediator Specification Language
  • Allows declarative specification of mediators
  • Object oriented, logical query language
  • Targeted to OEM

20
TSIMMIS MSL
[Figure: a query is sent to a layer of mediators, which sit above wrappers, which sit above the sources; the example OEM data is library (set), book (set), author (string, Aho)]
<booktitle X> :- <library <book <title X> <author Aho> > > @s1
21
TSIMMIS Wrappers
  • Wrappers are similar to database drivers
  • Wrappers are written with MSL

[Figure: Query → Mediators → Wrappers → Sources]
22
TSIMMIS Wrappers
  • Wrappers have the form

MSL template // action //
  • Example

<books X> :- <library X<book <title X> <author AU>> >@s1 // sprintf(lookup-query, "find author %s", AU) //
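
The template-plus-action idea can be imitated in a few lines of Python: a wrapper holds a list of patterns, and when an incoming query matches one, the attached action builds the source's native lookup string. This is only a loose analogy under stated assumptions. The table `TEMPLATES`, the function `translate`, and the tiny `author = ...` query syntax are hypothetical and are not MSL.

```python
import re

# Hypothetical wrapper table: (pattern over the incoming query, native-query format).
TEMPLATES = [
    (re.compile(r"^author\s*=\s*(?P<AU>\w+)$"), "find author {AU}"),
]

def translate(query):
    """Return the native lookup string for the first matching template, else None."""
    for pattern, action in TEMPLATES:
        match = pattern.match(query.strip())
        if match:
            # Fill the action's placeholders from the variables bound by the match.
            return action.format(**match.groupdict())
    return None

print(translate("author = Aho"))
print(translate("title = Compilers"))  # no template covers this query
```
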
23
TSIMMIS Summary
  • End users need to specify their sources w.r.t. a
    mediator model: OEM in TSIMMIS
  • Query specification is standard: LOREL
  • Query rewriting is straightforward: MSL and
    wrappers
  • Adding a new source is not easy: it must be
    specified in the mediator model

24
Information Manifold
  • Challenges for Information Integration
  • Interrelated data over multiple information
    sources
  • Large number of sources
  • Limited size of data in many of the sources
  • Widely varying details of interacting with each
    source

25
IM Architecture
26
World View
Classes
[Figure: class hierarchy over Product, Automobile, Motorcycle, Car, NewCar, UsedCar, CarForSale]
Virtual Relations
Product(Model)
Automobile(Model, Year, Category)
Motorcycle(Model, Year)
Car(Model, Year, Category)
NewCar(Model, Year, Category)
UsedCar(Model, Year, Category)
CarForSale(Model, Year, Category, Price, SellerContact)
27
Source Descriptions
  • For each source
  • Content Record
  • Capability Record

Web Sources for Automobile Application
28
Content Records of Auto Sources
29
Capability Records of Auto Sources
30
Query Reformulation
  • Contained rather than equivalent rewritings
  • Sources are incomplete
  • A contained rewriting still returns a useful subset
  • Uses the Plan Generator to
  • Prune irrelevant sources
  • Split query into subgoals
  • Generate conjunctive query plans
  • Find executable ordering of subgoals
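
The last step, finding an executable ordering, can be sketched as a greedy loop: a subgoal may run once every variable it needs as input is already bound, and running it binds its outputs. The function `executable_order`, the triple encoding of subgoals, and the assumption that V5 needs m and y as inputs are all hypothetical, since the capability records from the slides are not reproduced in this transcript.

```python
def executable_order(subgoals, bound=()):
    """Greedily order subgoals so that each one's inputs are bound before it runs.

    subgoals: list of (name, required_inputs, outputs)
    Returns a list of names, or None if no executable order exists.
    """
    bound = set(bound)
    remaining = list(subgoals)
    order = []
    while remaining:
        for subgoal in remaining:
            name, inputs, outputs = subgoal
            if set(inputs) <= bound:
                order.append(name)
                bound |= set(outputs)
                remaining.remove(subgoal)
                break
        else:
            return None  # nothing can run with the current bindings
    return order

# Assumed capabilities: V1 needs no inputs, V5 needs m and y bound.
plan = [
    ("V5", {"m", "y"}, {"r"}),
    ("V1", set(), {"c", "m", "y", "p"}),
]
print(executable_order(plan))
```

Because the set of bound variables only grows, the greedy choice never blocks a subgoal that could have run later, so the loop finds an ordering whenever one exists.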

31
The Bucket Algorithm
  • Given: user query q and source descriptions Vi
  • Find relevant sources (fill buckets)
  • For each relation g in query q
  • Find each Vj that contains relation g
  • Check that constraints in Vj are compatible with
    q
  • Combine source relations Vj from each bucket
    into a conjunctive query q' and check for
    containment (q' ⊆ q)
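
The two phases can be sketched in Python as follows. This is a simplified illustration, not the Information Manifold implementation: it matches sources to subgoals by relation name only and omits both the constraint-compatibility check and the final containment test. The function names and the contents of `descriptions` (including source V2) are hypothetical.

```python
from itertools import product

def fill_buckets(query_relations, source_descriptions):
    """One bucket per query relation, listing every source that mentions it."""
    return {
        g: [name for name, rels in source_descriptions.items() if g in rels]
        for g in query_relations
    }

def candidate_plans(buckets):
    """Every way of picking one source per bucket (Cartesian product)."""
    relations = list(buckets)
    return [
        dict(zip(relations, choice))
        for choice in product(*(buckets[g] for g in relations))
    ]

# Hypothetical source descriptions: which mediator relations each source covers.
descriptions = {
    "V1": {"CarForSale", "Category", "Year", "Model", "Price"},
    "V2": {"CarForSale", "Price"},
    "V5": {"ProductReview"},
}

buckets = fill_buckets(["CarForSale", "Price", "ProductReview"], descriptions)
plans = candidate_plans(buckets)
print(buckets)
print(len(plans))  # each candidate would still need the containment check
```

The Cartesian product is where the cost of Lav reformulation shows up: the number of candidates is the product of the bucket sizes, and each one must be tested for containment in q.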

32
The Bucket Algorithm Example
q(m,p,r) ← CarForSale(c), Category(c,sportscar),
Year(c,y), y ≥ 1992, Model(c,m), Price(c,p),
ProductReview(m,y,r)
33
1. Filling the Buckets
q(m,p,r) ← CarForSale(c), Category(c,sportscar),
Year(c,y), y ≥ 1992, Model(c,m), Price(c,p),
ProductReview(m,y,r)
34
2. Checking Containment
User Query: q(m,p,r) ← CarForSale(c), Category(c,sportscar),
Year(c,y), y ≥ 1992, Model(c,m), Price(c,p),
ProductReview(m,y,r)
Result Query: q'(m,p,r) ← V1(c)(Category(c) = sportscar, Price(c),
Model(c), Year(c), Year(c) ≥ 1992, Category(c) = sportscar),
V5(m,y,r)(m = Model(c), y = Year(c), r, ).
35
Finding an Executable Ordering
[Figure: executable ordering of the subgoals V1(c), V1(c,t), V1(c,y), V1(c,m), V1(c,p), V5(m,y,r)]
36
Advantages and Disadvantages
  • Gav: Tsimmis
  • Advantage
  • Query reformulation: rule unfolding
  • Disadvantage
  • Mediation description
  • Adding, removing, and modifying source
    descriptions
  • Better for static, centralized systems
  • Lav: Information Manifold
  • Advantage: adding new sources
  • Mediator (global predicates, source descriptions)
  • Query processing
  • Disadvantage
  • Query reformulation (Bucket algorithm)
  • Better for dynamic, distributed systems