Title: Querying Heterogeneous Information Sources Using Source Descriptions
1 Querying Heterogeneous Information Sources Using Source Descriptions
- Authors: Alon Y. Levy, Anand Rajaraman, Joann J. Ordille
- Presenter: Yihong Ding
2 Challenges for Information Integration
- Interrelated data spread over multiple information sources
- Large number of sources
- Limited amount of data in many of the sources
- Widely varying details of interacting with each source
3 Information Manifold (IM) Architecture
4 IM World View
Classes (class hierarchy figure): Product, Automobile, Car, Motorcycle, NewCar, UsedCar, CarForSale
Virtual Relations
- Product(Model)
- Automobile(Model, Year, Category)
- Motorcycle(Model, Year)
- Car(Model, Year, Category)
- NewCar(Model, Year, Category)
- UsedCar(Model, Year, Category)
- CarForSale(Model, Year, Category, Price, SellerContact)
5 Source Descriptions
- For each source
- Content Record
- Capability Record
Web Sources for Automobile Application
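A source description pairs a content record (what the source holds, stated over the world-view relations) with a capability record (how the source can be queried). The following is a minimal sketch of that pairing, not the system's actual data model. The field names (`relations`, `constraints`, `inputs`, `outputs`, `min_inputs`), the helper `can_answer`, and the example values for source `V1` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentRecord:
    """What a source contains, stated over world-view relations (assumed shape)."""
    relations: frozenset      # world-view relations the source covers
    constraints: tuple = ()   # (attribute, operator, value) triples


@dataclass(frozen=True)
class CapabilityRecord:
    """How a source can be queried (assumed shape)."""
    inputs: frozenset         # attributes that may be bound on input
    outputs: frozenset        # attributes the source can return
    min_inputs: int = 0       # how many inputs must be bound at least


@dataclass(frozen=True)
class SourceDescription:
    name: str
    content: ContentRecord
    capability: CapabilityRecord


def can_answer(source, bound, wanted):
    """True if enough inputs are bound and every wanted attribute is an output."""
    usable = source.capability.inputs & set(bound)
    enough_inputs = len(usable) >= source.capability.min_inputs
    return enough_inputs and set(wanted) <= source.capability.outputs


# Hypothetical description of a used-car source; the values are illustrative only.
v1 = SourceDescription(
    name="V1",
    content=ContentRecord(
        relations=frozenset({"CarForSale", "UsedCar"}),
        constraints=(("Year", ">=", 1990),),
    ),
    capability=CapabilityRecord(
        inputs=frozenset({"Model", "Category"}),
        outputs=frozenset({"Model", "Year", "Price", "SellerContact"}),
        min_inputs=1,
    ),
)

print(can_answer(v1, {"Model"}, {"Price"}))  # one input bound
print(can_answer(v1, set(), {"Price"}))      # no input bound
```

Keeping the two records separate mirrors the slide: the content record decides whether a source is relevant to a query, while the capability record decides whether and how it can be called.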
6 Content Records of Auto Sources
7 Capability Records of Auto Sources
8 Query Reformulation
- Containment instead of equivalence
  - Incomplete sources
  - Useful subset of answers
- Uses the Plan Generator to
  - Prune irrelevant sources
  - Split the query into subgoals
  - Generate conjunctive query plans
  - Find an executable ordering of subgoals
9 The Bucket Algorithm
- Given user query q, source descriptions Vi
- Find relevant sources (fill buckets)
  - For each relation g in query q
  - Find Vj that contains relation g
  - Check that constraints in Vj are compatible with q
- Combine source relations Vj from each bucket into a conjunctive query q' and check for containment (q' ⊆ q)
10 The Bucket Algorithm Example
q(m,p,r) ← CarForSale(c), Category(c,sportscar),
Year(c,y), y≥1992, Model(c,m), Price(c,p),
ProductReview(m,y,r)
11 1. Filling the Buckets
q(m,p,r) ← CarForSale(c),
Category(c,sportscar),
Year(c,y), y≥1992, Model(c,m),
Price(c,p),
ProductReview(m,y,r)
12 2. Checking Containment
User Query: q(m,p,r) ← CarForSale(c),
Category(c,sportscar),
Year(c,y), y≥1992, Model(c,m),
Price(c,p),
ProductReview(m,y,r)
Result Query: q'(m,p,r) ← V1(c)(Category(c)=sportscar,
Price(c),
Model(c), Year(c),
Year(c)≥1992,
Category(c)=sportscar),
V5(m,y,r)(m=Model(c), y=Year(c),
r).
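For conjunctive queries without comparison predicates, containment can be decided by searching for a containment mapping: q' ⊆ q holds exactly when the variables of q can be mapped to terms of q' so that the head of q maps to the head of q' and every body atom of q maps to a body atom of q'. The sketch below illustrates only that restricted case. It does not handle comparisons such as a year bound, and it assumes source relations have already been expanded into world-view relations. The atom encoding (tuples, with constants written in single quotes) and the function names are assumptions made for this example.

```python
def is_var(term):
    """Constants are written with a leading single quote, e.g. "'sportscar'"."""
    return not term.startswith("'")


def contained_in(q_sub, q_sup):
    """True if q_sub is contained in q_sup (comparison-free conjunctive queries).

    Each query is (head_atom, [body_atoms]); an atom is (predicate, arg1, ...).
    """
    head_sub, body_sub = q_sub
    head_sup, body_sup = q_sup

    def extend(src_atom, dst_atom, mapping):
        """Try to map src_atom onto dst_atom, extending the variable mapping."""
        if src_atom[0] != dst_atom[0] or len(src_atom) != len(dst_atom):
            return None
        new = dict(mapping)
        for s, d in zip(src_atom[1:], dst_atom[1:]):
            if is_var(s):
                if new.setdefault(s, d) != d:
                    return None  # variable already mapped to a different term
            elif s != d:
                return None      # a constant must map to itself
        return new

    def search(i, mapping):
        """Backtracking search over the body atoms of q_sup."""
        if i == len(body_sup):
            return True
        for atom in body_sub:
            ext = extend(body_sup[i], atom, mapping)
            if ext is not None and search(i + 1, ext):
                return True
        return False

    start = extend(head_sup, head_sub, {})
    return start is not None and search(0, start)


# Simplified illustrative queries over the world-view relations.
q = (("q", "m"), [("CarForSale", "c"), ("Model", "c", "m")])
q_prime = (
    ("q", "m"),
    [("CarForSale", "c"), ("Model", "c", "m"), ("Category", "c", "'sportscar'")],
)

print(contained_in(q_prime, q))  # q' adds a restriction, so q' ⊆ q
print(contained_in(q, q_prime))  # q lacks the Category restriction
```

The search is exponential in the worst case, which is acceptable for a sketch because query bodies in this setting are short.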
13 Finding an Executable Ordering
V1(c)
V1(c,t)
V1(c,y)
V1(c,m)
V1(c,p)
V5(m,y,r)
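An executable ordering places each subgoal after the subgoals that bind the inputs it requires. One way to sketch this is a greedy pass that repeatedly runs any subgoal whose required inputs are already bound. Because the set of bound variables only grows, a subgoal that is runnable stays runnable, so the greedy choice never blocks a valid ordering. The function name and the binding requirements assigned to `V1` and `V5` below are assumptions for illustration, not the capability records from the slides.

```python
def executable_ordering(subgoals, initially_bound=()):
    """Return an executable order of subgoal names, or None if none exists.

    Each subgoal is (name, needs, gives): the variables that must be bound
    before the call, and the variables the call binds.
    """
    bound = set(initially_bound)
    remaining = list(subgoals)
    order = []
    while remaining:
        for subgoal in remaining:
            name, needs, gives = subgoal
            if set(needs) <= bound:
                order.append(name)
                bound |= set(gives)
                remaining.remove(subgoal)
                break
        else:
            return None  # no remaining subgoal has all of its inputs bound
    return order


# Hypothetical binding requirements, for illustration only.
subgoals = [
    ("V5(m,y,r)", {"m", "y"}, {"r"}),        # assumed to need model and year
    ("V1(c)", set(), {"c", "m", "y", "p"}),  # assumed callable with no inputs
]

order = executable_ordering(subgoals)
print(order)
print(executable_ordering(subgoals[:1]))  # V5 alone has no way to bind m and y
```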
14 Experimental Results
- Query 1: Find titles and years of movies featuring Tom Hanks
- Query 2: Find titles and reviews of movies featuring Tom Hanks
- Query 3: Find telephone number(s) for Alaska Airlines
15 Conclusions
- Source descriptions as content record and capability record
- Bucket algorithm for query reformulation