Topics in distributed databases

1
Topics in distributed databases
  • Schema-directed XML integration
  • Robert Camilleri
  • Gerasimos Voultepsis

2
Contents
  • Problem statement
  • Project objective
  • Data extraction
  • Overview of AIG
  • AIG Implementation in Java
  • Identify two books from different sources that
    refer to the same book

3
Problem statement
  • Data exists in different formats and schemas
  • We need a way to integrate information from
    different data sources into a single predefined
    schema
  • E.g. Travel agencies need to integrate data from
    various sources (hotel bookings, ticket
    reservations, car rental etc.)

4
Project objective
  • Integrate information about books from two data
    sources and generate an XML view that conforms to
    a predefined recursive DTD

5
Project overview

6
Data Extraction
  • Retrieve data from Amazon via the Amazon
    web-service
  • Store it in a relational schema amazon_book(title,
    author, year, price, ISBN)
  • Retrieve data from DBLP via an XML file
  • Store it in a relational schema (the dblp_book
    relation queried on the later slides)

7
Overview of AIG
  • AIGs provide a schema-driven data integration
    method
  • They extend a DTD with semantic attributes and
    semantic rules

8
Attribute Integration Grammar
  • DTD
      • db -> book
      • book -> author, title, year, publisher, price,
        ISBN
      • author -> name, book
  • Semantic attributes
      • Inh(db) = ()
      • Inh(book) = (author, title, year, publisher,
        price, ISBN)
      • Inh(author) = (Inh(book).author, Inh(book).year)
      • Inh(title) = Inh(year) = Inh(publisher) =
        Inh(price) = Inh(ISBN) = Inh(name) = (val)

9
Attribute Integration Grammar
  • Semantic rules
  • db -> book
      • Inh(book).(author, title, year, publisher, price,
        ISBN) <- Q1
      • Q1: SELECT author, title, year, publisher,
                   NULL AS price, NULL AS ISBN
            FROM dblp_book
            UNION
            SELECT author, title, year, NULL, price, ISBN
            FROM amazon_book
  • book -> author, title, year, publisher, price,
    ISBN
      • Inh(author) = (Inh(book).author, Inh(book).year)
      • Inh(title).val = Inh(book).title
      • Inh(year).val = Inh(book).year
      • Inh(publisher).val = Inh(book).publisher
      • Inh(price).val = Inh(book).price
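
The effect of Q1 can be sketched in plain Java: rows from both sources are padded to one common shape, with null standing in for SQL NULL, and then merged with set semantics as UNION does. The class and method names below (IntegratedBook, Q1Sketch, fromDblp, fromAmazon) are hypothetical and not taken from the project; this is only an in-memory illustration under those assumptions, not the authors' code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** One row of Q1's result; a null field plays the role of SQL NULL. */
final class IntegratedBook {
    final String author, title, publisher, isbn;
    final int year;
    final Double price;

    IntegratedBook(String author, String title, int year,
                   String publisher, Double price, String isbn) {
        this.author = author; this.title = title; this.year = year;
        this.publisher = publisher; this.price = price; this.isbn = isbn;
    }

    /** First branch of Q1: DBLP rows have no price and no ISBN. */
    static IntegratedBook fromDblp(String author, String title, int year, String publisher) {
        return new IntegratedBook(author, title, year, publisher, null, null);
    }

    /** Second branch of Q1: Amazon rows have no publisher. */
    static IntegratedBook fromAmazon(String author, String title, int year,
                                     Double price, String isbn) {
        return new IntegratedBook(author, title, year, null, price, isbn);
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof IntegratedBook)) return false;
        IntegratedBook b = (IntegratedBook) o;
        return year == b.year
            && Objects.equals(author, b.author) && Objects.equals(title, b.title)
            && Objects.equals(publisher, b.publisher)
            && Objects.equals(price, b.price) && Objects.equals(isbn, b.isbn);
    }

    @Override public int hashCode() {
        return Objects.hash(author, title, year, publisher, price, isbn);
    }
}

final class Q1Sketch {
    /** UNION keeps each distinct row once; insertion order is preserved here. */
    static List<IntegratedBook> union(List<IntegratedBook> dblp, List<IntegratedBook> amazon) {
        Set<IntegratedBook> rows = new LinkedHashSet<>(dblp);
        rows.addAll(amazon);
        return new ArrayList<>(rows);
    }
}
```

Because the padded columns differ per source, the same book coming from DBLP and from Amazon yields two distinct rows here; UNION only removes rows that agree on every column.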

10
Attribute Integration Grammar
  • author -> name, book
      • Inh(name).val = Inh(author).author
      • Inh(book).author = Inh(author).author
      • Inh(book).(title, year, publisher, price, ISBN)
        <- Q2(Inh(author))
      • Q2: SELECT title, year, publisher,
                   NULL AS price, NULL AS ISBN
            FROM dblp_book
            WHERE author = Inh(author).author
              AND year < Inh(author).year
            UNION
            SELECT title, year, NULL, price, ISBN
            FROM amazon_book
            WHERE author = Inh(author).author
              AND year < Inh(author).year
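
Q2's selection condition can be sketched as a simple filter in Java. The names AuthorBook, Q2Sketch, earlierBooksBy and depth are hypothetical. The depth method is an added illustration, not part of the slides: it shows one consequence of the strict year comparison, namely that every nested book is older than its parent, so expanding the recursive author/book structure stops after finitely many levels.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal book tuple: only the fields that Q2's WHERE clause inspects, plus a title. */
final class AuthorBook {
    final String author, title;
    final int year;

    AuthorBook(String author, String title, int year) {
        this.author = author; this.title = title; this.year = year;
    }
}

final class Q2Sketch {
    /** Books with the given author and a strictly earlier year (Q2's WHERE clause). */
    static List<AuthorBook> earlierBooksBy(List<AuthorBook> all, String author, int year) {
        List<AuthorBook> out = new ArrayList<>();
        for (AuthorBook b : all) {
            if (b.author.equals(author) && b.year < year) {
                out.add(b);
            }
        }
        return out;
    }

    /** Number of nested book levels below a book by this author from this year. */
    static int depth(List<AuthorBook> all, String author, int year) {
        int best = 0;
        for (AuthorBook b : earlierBooksBy(all, author, year)) {
            // b.year < year, so each recursive call works on a strictly smaller year.
            best = Math.max(best, 1 + depth(all, b.author, b.year));
        }
        return best;
    }
}
```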

11
Implementation of AIG in Java
  • Build XML root element
  • Get book information and store in a book object.
  • For each book build book element.
  • For each author build an author element
  • Get all books written by author with year < X,
    where X is the year of the enclosing book
  • Build book element
  • Add rest of book information.
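
The steps above can be sketched with the standard Java DOM API. This is a minimal illustration under stated assumptions, not the project's code: AigBook and AigXmlBuilder are hypothetical names, the integrated tuples are assumed to be already loaded into a list, and NULL values are written as empty elements so that every book keeps all children listed in the DTD.

```java
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Integrated book tuple; publisher, price and isbn may be null. */
final class AigBook {
    final String author, title, publisher, isbn;
    final int year;
    final Double price;

    AigBook(String author, String title, int year,
            String publisher, Double price, String isbn) {
        this.author = author; this.title = title; this.year = year;
        this.publisher = publisher; this.price = price; this.isbn = isbn;
    }
}

final class AigXmlBuilder {
    /** Builds db -> book for every tuple, expanding author -> name, book recursively. */
    static Document build(List<AigBook> all) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no XML parser available", e);
        }
        Element db = doc.createElement("db");
        doc.appendChild(db);
        for (AigBook b : all) {
            db.appendChild(book(doc, all, b));
        }
        return doc;
    }

    private static Element book(Document doc, List<AigBook> all, AigBook b) {
        Element e = doc.createElement("book");
        e.appendChild(author(doc, all, b.author, b.year));
        e.appendChild(leaf(doc, "title", b.title));
        e.appendChild(leaf(doc, "year", String.valueOf(b.year)));
        e.appendChild(leaf(doc, "publisher", b.publisher));
        e.appendChild(leaf(doc, "price", b.price == null ? null : b.price.toString()));
        e.appendChild(leaf(doc, "ISBN", b.isbn));
        return e;
    }

    private static Element author(Document doc, List<AigBook> all, String name, int year) {
        Element e = doc.createElement("author");
        e.appendChild(leaf(doc, "name", name));
        for (AigBook b : all) {
            // Same author, strictly earlier year: the recursion ends when no such book exists.
            if (b.author.equals(name) && b.year < year) {
                e.appendChild(book(doc, all, b));
            }
        }
        return e;
    }

    private static Element leaf(Document doc, String tag, String value) {
        Element e = doc.createElement(tag);
        e.setTextContent(value == null ? "" : value);
        return e;
    }
}
```

The resulting Document could then be serialized with javax.xml.transform.Transformer if a textual XML view is needed.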

12
Identify two books from different sources that
refer to the same book
  • Ideally, we could identify two tuples referring
    to the same book by ISBN
  • The DBLP tuples carry no ISBN (Q1 fills it with
    NULL), so ISBN alone cannot match them

13
Use the notion of data similarity
  • Use a similarity function on all common
    attributes of the different sources, e.g. title,
    author and year.
  • Use domain knowledge: years should be equal,
    titles should be similar and contain common
    words, author names should be similar.
  • Score similarity between 0 and 1.
  • Use a threshold value: tuples scoring above it
    are considered to refer to the same book.
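
One way such a score could look is sketched below. The slides do not say which similarity function was used, so everything concrete here is an assumption: word-level Jaccard overlap for titles and author names, equal weights of 0.5, and a threshold of 0.7. BookSimilarity and its methods are hypothetical names.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class BookSimilarity {
    /** Hypothetical cut-off; a real value would be tuned on sample data. */
    static final double THRESHOLD = 0.7;

    /** Lower-cased words of s, split on anything that is not a letter or digit. */
    static Set<String> tokens(String s) {
        Set<String> out = new HashSet<>();
        for (String t : s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** Jaccard overlap of the word sets: |A and B| / |A or B|, in [0, 1]. */
    static double jaccard(String a, String b) {
        Set<String> x = tokens(a);
        Set<String> y = tokens(b);
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        if (union.isEmpty()) {
            return 0.0; // no words on either side: no evidence of similarity
        }
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        return (double) common.size() / union.size();
    }

    /** Domain rule from the slide: different years rule a match out entirely. */
    static double score(String title1, String author1, int year1,
                        String title2, String author2, int year2) {
        if (year1 != year2) {
            return 0.0;
        }
        return 0.5 * jaccard(title1, title2) + 0.5 * jaccard(author1, author2);
    }

    /** Tuples scoring above the threshold are treated as the same book. */
    static boolean sameBook(String title1, String author1, int year1,
                            String title2, String author2, int year2) {
        return score(title1, author1, year1, title2, author2, year2) > THRESHOLD;
    }
}
```

Word overlap ignores word order, so "Jeff Ullman" and "Ullman, Jeff" compare as equal; it does not handle initials or misspellings, for which an edit-distance measure would be a possible alternative.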

14
Thank you!