Data Integration in Digital Libraries: Approaches and Challenges - PowerPoint PPT Presentation



1
Data Integration in Digital Libraries: Approaches and Challenges
  • Dr. Ismail Khalil Ibrahim
  • ismail.khalil-ibrahim_at_scch.at
  • +43 7236 3343 852
  • www.scch.at
  • Bringing Digital Libraries together

2
Biography
Dr. Ismail Khalil Ibrahim is a senior software developer and AgenCom project
manager at the Software Competence Center Hagenberg, Austria. He worked at the
University of Technology, Baghdad, Iraq, from 1985 to 1990 as a lecturer; at
the Human Resources Training and Development Institute, Iraq, from 1990 to
1996 as head of the academic studies department; and at Gadjah Mada University
from 1996 to 2000 as a teaching and research assistant. His main research
interests lie in E-commerce and I-Commerce, database applications and
techniques for the Web, practical experience and applications in information
integration systems, logic programming for information integration, agents for
information retrieval and knowledge discovery, XML and semistructured data
management, information systems management and development, and information
technology impact and economic analysis. Ismail is a member of ACM, SIGMOD,
SIGKDD, and SIGecom; general secretary of the Indonesian Information Society
Initiative (IISI); a member of the Iraqi Engineers Association (IEA); an
overseas collaborator in the E-commerce Lab at the National University of
Singapore; and a member of the editorial board of the Colombian Journal of
Computing (Revista Colombiana de Computación). He chaired the organizing
committees of the 1st and 2nd International Workshops on Information
Integration and Web-based Applications & Services (IIWAS'99, IIWAS'00) in
Yogyakarta, Indonesia, and of the 3rd International Conference on Information
Integration and Web-based Applications & Services (IIWAS'2001) in Linz,
Austria. Ismail holds a B.Sc. in Electrical Engineering from the University of
Technology, Iraq (1985), and an M.Sc. and a Ph.D. in Computer Engineering and
Information Systems from Gadjah Mada University (1998 and 2001, respectively).
3
Outline
  • Data Integration
    • What is it?
    • What does a data integration system look like?
    • What are some data integration challenges?

4
What Is Data Integration?
  • Providing
    • uniform: sources are transparent to the user
    • access: query, and eventually updates
    • multiple: even two sources are a problem
    • autonomous: the integration does not affect the behavior
      of the sources
    • heterogeneous: different data models and schemas
    • structured: or at least semi-structured
    • information sources: not only databases
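To make "uniform access" over "heterogeneous" sources concrete, here is a
minimal Python sketch under stated assumptions: the two sources, their field
names (name/writer/published and Title/Author/Year), and the helpers
wrap_json_like, wrap_csv and uniform_query are all hypothetical. Each wrapper
translates one source's format into a single record type, so the user writes
one query no matter where the data lives. Updates are not modeled.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class BookRecord(NamedTuple):
    """The one uniform record type the user sees (hypothetical schema)."""
    title: str
    author: str
    year: int


def wrap_json_like(rows: Iterable[dict]) -> Iterator[BookRecord]:
    # Hypothetical source A: dictionaries with its own field names.
    for row in rows:
        yield BookRecord(row["name"], row["writer"], int(row["published"]))


def wrap_csv(text: str) -> Iterator[BookRecord]:
    # Hypothetical source B: CSV text with a header line.
    for row in csv.DictReader(io.StringIO(text)):
        yield BookRecord(row["Title"], row["Author"], int(row["Year"]))


def uniform_query(author: str, *sources: Iterable[BookRecord]) -> list:
    # One query over all wrapped sources; their formats stay hidden.
    return [rec for src in sources for rec in src if rec.author == author]


source_a = [{"name": "Report A", "writer": "Smith", "published": "1999"}]
source_b = "Title,Author,Year\nReport B,Smith,2000\nReport C,Jones,2001\n"
print(uniform_query("Smith", wrap_json_like(source_a), wrap_csv(source_b)))
```

Adding a third source in this design means writing one more wrapper; the
query function itself does not change.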

5
Example Scenario
6
Example Scenario (cont.)
Retrieve the titles and subjects of all the
technical reports written by Stephane Bressan
and published by MIT Press.
  • q1: query Amazon for (Title, "Stephane Bressan", Subject)
  • q2: query Books-A-Million for (ISBN, Title, "MIT Press")
  • Join the results on Title
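The three steps of this scenario can be sketched in Python. This is a minimal
illustration under assumptions: the result shapes AmazonHit and BamHit, the
helper names q1, q2 and join_results, and every sample row are hypothetical
placeholders; no real Amazon or Books-A-Million interface is modeled. Each
source is asked only the part of the query it can answer (its selection), and
the mediator then joins the two partial results on Title.

```python
from typing import NamedTuple


class AmazonHit(NamedTuple):      # hypothetical shape of q1's answers
    title: str
    author: str
    subject: str


class BamHit(NamedTuple):         # hypothetical shape of q2's answers
    isbn: str
    title: str
    publisher: str


def q1(amazon):
    # Selection pushed to the first source: author = "Stephane Bressan".
    return [h for h in amazon if h.author == "Stephane Bressan"]


def q2(bam):
    # Selection pushed to the second source: publisher = "MIT Press".
    return [h for h in bam if h.publisher == "MIT Press"]


def join_results(r1, r2):
    # Join on Title, keeping only Title and Subject; using a set collapses
    # duplicate matches (e.g. several ISBNs for the same title).
    titles = {h.title for h in r2}
    return sorted({(h.title, h.subject) for h in r1 if h.title in titles})


amazon = [AmazonHit("Report X", "Stephane Bressan", "Databases"),
          AmazonHit("Report Y", "Stephane Bressan", "Logic"),
          AmazonHit("Report Z", "Someone Else", "Agents")]
bam = [BamHit("isbn-1", "Report X", "MIT Press"),
       BamHit("isbn-2", "Report Y", "Other Press")]
print(join_results(q1(amazon), q2(bam)))
```

Because only Title and Subject are requested, the join reduces to keeping the
q1 answers whose title also appears in the q2 answers.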
7
So What Is the Problem?
  • Virtual vs. materialized architectures
  • Access: query only, or query and update?
    • updating is similar to updating through views
    • updates need distributed transactional services
  • Mediated schema: yes or no?
    • without a mediated schema, we lose the advantage of
      a uniform interface over the sources
    • a mediated schema requires schema integration
    • schema integration needs query transformation
    • query transformation needs query optimization

8
Additional Dimensions
  • How many sources are we accessing?
  • How autonomous are the sources?
  • How much knowledge do we have about the sources?
  • How structured are the data in the sources?
  • Requirements on responses
    • accuracy
    • completeness
    • machine-readable vs. human-readable
    • handling of inconsistencies
    • speed
  • Closed World Assumption (CWA) vs. Open World Assumption (OWA)
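The practical difference between the two assumptions shows up when the sources
do not report a fact. A minimal Python sketch, assuming a hypothetical holds
helper and made-up facts: under the CWA the sources are taken to be complete,
so an unreported fact is false; under the OWA they may be incomplete, so an
unreported fact is only unknown.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


def holds(fact, source_facts, closed_world):
    """Truth value of a ground fact given what the sources report."""
    if fact in source_facts:
        return Truth.TRUE
    # Unreported fact: false if the sources are assumed complete (CWA),
    # merely unknown if they may be incomplete (OWA).
    return Truth.FALSE if closed_world else Truth.UNKNOWN


reported = {("Book", "Report A", 1999)}      # hypothetical source content
missing = ("Book", "Report B", 2000)
print(holds(missing, reported, closed_world=True))
print(holds(missing, reported, closed_world=False))
```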

9
Related Technologies / Issues
  • Distributed databases
    • sources are homogeneous
    • data is distributed a priori
    • sources are not autonomous
    • similar to data integration at the query optimization
      and execution level
  • Information retrieval
    • keyword search
    • no semantics
  • Data mining: discovering properties and patterns
    in data

10
Current Applications
  • Intranets
    • enterprise data integration
    • web-site construction
  • World Wide Web
    • digital libraries
    • comparison shopping (Netbot, Junglee)
    • portals integrating data from multiple sources
    • XML integration
  • Science and Culture
    • medical genetics: integrating genomic data
    • astrophysics: monitoring events in the sky
    • environment: Puget Sound Regional Synthesis Model
    • culture: uniform access to all the cultural
      databases

11
Paradigms of Data Integration
[Diagram: taxonomy of data integration paradigms, with the labels Integration;
global defined from local; global independent of local; CWA; OWA;
global-schema-as-view; global-as-view-of-local; local-as-view-of-global;
Database Schema Integration; Data Warehousing; Mediation]
12
Paradigms of Data Integration II
  • Data Warehousing (materialization architecture)
    • data of interest is collected in a central place,
      and a web site is built on top of it
    • queries are applied to the data warehouse

Pro: easy to support queries and transactions
Con: hard to modify; the warehouse is not connected to
the providers of information, etc.
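The materialized approach can be sketched with Python's built-in sqlite3
module. This is a minimal illustration, assuming two hypothetical providers
that hand over (title, author, year) tuples and a hypothetical build_warehouse
helper: data is extracted once and loaded into a central table, queries run
against that copy, and a later change at a provider stays invisible until the
extraction is repeated.

```python
import sqlite3

# Hypothetical provider data: (title, author, year) tuples.
provider_1 = [("Report A", "Smith", 1999)]
provider_2 = [("Report B", "Jones", 2000)]


def build_warehouse(*providers):
    """Extract every provider's tuples and load them into one central table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE book (title TEXT, author TEXT, year INTEGER)")
    for rows in providers:
        con.executemany("INSERT INTO book VALUES (?, ?, ?)", rows)
    con.commit()
    return con


warehouse = build_warehouse(provider_1, provider_2)
# Queries run against the central copy, not against the providers.
print(warehouse.execute("SELECT title FROM book WHERE year >= 2000").fetchall())

# A provider changes after loading; the copy does not see the change
# until the extraction step is run again.
provider_1.append(("Report C", "Smith", 2001))
print(warehouse.execute("SELECT COUNT(*) FROM book").fetchone()[0])
```

Queries and transactions are easy because everything sits in one ordinary
database; the price is the refresh step.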
13
Data Warehousing Architecture
[Diagram, top to bottom: Application; Data Warehouse; Data Extraction]
14
Paradigms of Data Integration III
  • Information Mediation (virtual architecture)
    • data remains in the web sources
    • rules relate the external data to the internal
      application

Pro: data is not replicated, and is guaranteed to be
up to date
Con: query optimization and execution are more complex
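The virtual approach can be sketched as a mediator that stores no data and
reads its sources at query time. A minimal Python sketch, assuming a
hypothetical Mediator class in which each source is modeled as a function
returning its current (title, author, year) rows: since nothing is copied, a
tuple added to a source is visible to the very next query.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, str, int]  # (title, author, year); hypothetical layout


class Mediator:
    """Virtual integration: no data is copied; every query reads the sources."""

    def __init__(self) -> None:
        self.fetchers: List[Callable[[], Iterable[Row]]] = []

    def register(self, fetch: Callable[[], Iterable[Row]]) -> None:
        # A "source" here is just a function returning its current rows.
        self.fetchers.append(fetch)

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Fetch live data from each source at query time, then filter.
        return [row for fetch in self.fetchers
                for row in fetch() if predicate(row)]


source_1 = [("Report A", "Smith", 1999)]
mediator = Mediator()
mediator.register(lambda: source_1)

# A tuple added after registration is seen by the next query.
source_1.append(("Report C", "Smith", 2001))
print(mediator.query(lambda row: row[2] >= 2000))
```

The filter here runs inside the mediator after fetching; a real mediator would
try to push it to the sources, which is where the extra optimization and
execution complexity comes from.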
15
Mediation Architecture
[Diagram, top to bottom: Application; Global Data Model; Local Data Model]
16
Running Example
  • World Relations
    • Book(title, year, author, subject)
    • BookYear(title, year)
    • BookRev(title, author, review)
  • Source Relations
    • DB1(title, author, year)
    • DB2(title, author, year)
    • DB3(title, review)

17
Global As View (GAV)
  • Define a global schema of objects and write down
    rules to collect these objects
  • For each relation R in the mediated schema, we
    write a query over the sources' relations
    specifying how to obtain R's tuples from the
    sources (query unfolding)

Pro: traditional query processing applies
Con: requires the right sources to be available and
compliant
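GAV can be sketched on the running example. A minimal Python sketch under
assumptions: the source contents are made up, and the two mappings (BookYear
from DB1 and DB2; BookRev from DB1 joined with DB3 on title) are hypothetical
choices for illustration. Book is left undefined because no source supplies
subject. Since each mediated relation is just a function over the sources,
answering a user query means substituting those definitions (unfolding) and
evaluating an ordinary query.

```python
# Hypothetical contents of the running example's source relations.
DB1 = {("Report A", "Smith", 1999), ("Report B", "Jones", 2001)}  # title, author, year
DB2 = {("Report C", "Lee", 2002)}                                 # title, author, year
DB3 = {("Report B", "review 1")}                                  # title, review


# GAV: each mediated relation is a query over the sources.
# These two mappings are assumptions made for the illustration.
def book_year():
    # BookYear(t, y) :- DB1(t, a, y)
    # BookYear(t, y) :- DB2(t, a, y)
    return {(t, y) for (t, _a, y) in DB1 | DB2}


def book_rev():
    # BookRev(t, a, r) :- DB1(t, a, y), DB3(t, r)
    return {(t, a, r) for (t, a, _y) in DB1 for (t3, r) in DB3 if t == t3}


def reviews_since(min_year):
    # User query over the mediated schema:
    #   Q(t, r) :- BookYear(t, y), y >= min_year, BookRev(t, a, r)
    # Unfolding: each mediated relation is replaced by its definition,
    # leaving an ordinary query over DB1, DB2 and DB3.
    return {(t, r)
            for (t, y) in book_year() if y >= min_year
            for (t2, _a, r) in book_rev() if t == t2}


print(reviews_since(2000))
```

If DB3 were unavailable, book_rev would have nothing to read, which is the
"requires the right sources" drawback in miniature.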
18
Local As View (LAV)
  • For every information source S, we write a
    query over the relations in the mediated schema
    that describes which tuples are found in S (query
    folding, or answering queries using views)

Pro: may be able to answer a query from the
available partial information
Con: in general, may not be able to answer the query completely
Con: needs non-standard query processing techniques
Con: potentially high complexity
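LAV can be sketched for a deliberately restricted case. A minimal Python
sketch, assuming that each source of the running example is described as a
projection of one mediated relation (DB1 and DB2 over Book, DB3 over BookRev)
and that the query is a projection of one relation too; the helper
answer_using_views and all rows are hypothetical. A source is usable only if it
exports every requested attribute, and the answer is the union of the usable
sources. The general case with joins needs dedicated rewriting algorithms
(e.g. the bucket or MiniCon algorithms), which this sketch does not implement.

```python
from typing import NamedTuple, Set, Tuple


class SourceDescription(NamedTuple):
    """LAV description: the source holds a projection of one mediated relation."""
    name: str
    relation: str                # mediated relation the source is a view over
    attributes: Tuple[str, ...]  # attributes the source exports, in order
    rows: Set[tuple]             # hypothetical source contents


def answer_using_views(relation, wanted, sources):
    """Answer Q(wanted) :- relation(...) using only the source descriptions.

    Restricted case assumed here: the query and every view each project a
    single mediated relation (no joins, no constants).
    """
    usable, answer = [], set()
    for src in sources:
        if src.relation == relation and set(wanted) <= set(src.attributes):
            usable.append(src.name)
            positions = [src.attributes.index(a) for a in wanted]
            answer |= {tuple(row[i] for i in positions) for row in src.rows}
    return usable, answer


DB1 = SourceDescription("DB1", "Book", ("title", "author", "year"),
                        {("Report A", "Smith", 1999)})
DB2 = SourceDescription("DB2", "Book", ("title", "author", "year"),
                        {("Report C", "Lee", 2002)})
DB3 = SourceDescription("DB3", "BookRev", ("title", "review"),
                        {("Report B", "review 1")})
sources = [DB1, DB2, DB3]

print(answer_using_views("Book", ("title", "year"), sources))
# No source exports 'subject', so this query gets no answers at all.
print(answer_using_views("Book", ("title", "subject"), sources))
```

The two calls mirror the trade-off above: the first is answered from partial
sources, the second cannot be answered at all.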
19
Challenges
  • Complexity beyond traditional DBs: heterogeneous,
    autonomous, network-bound sources
  • Query reformulation is now well understood
    • map queries over mediated schemas to wrapped
      sources (heterogeneity)
  • Issues remain in query processing
    • few statistics (autonomous sources)
    • unanticipated delays and failures
      (network-bound sources)
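One way to live with unanticipated delays and failures is to give each source
a deadline and return whatever arrived in time, reporting the rest. A minimal
Python asyncio sketch of that policy, assuming a hypothetical slow_source
coroutine standing in for each remote wrapper; the sources, delays and the
0.1-second timeout are made up. asyncio.wait_for enforces the per-source
deadline, and asyncio.gather with return_exceptions=True keeps one slow or
failing source from aborting the whole query.

```python
import asyncio


async def slow_source(name, delay, rows, fail=False):
    """Hypothetical stand-in for a remote wrapped source."""
    await asyncio.sleep(delay)
    if fail:
        raise ConnectionError(f"{name} unavailable")
    return rows


async def query_sources(requests, timeout):
    """Query all sources in parallel and keep whatever arrives in time."""
    guarded = [asyncio.wait_for(req, timeout) for req in requests]
    results = await asyncio.gather(*guarded, return_exceptions=True)
    rows, problems = [], []
    for result in results:
        if isinstance(result, BaseException):
            problems.append(type(result).__name__)  # timeout or failure
        else:
            rows.extend(result)
    return rows, problems


def demo():
    requests = [
        slow_source("DB1", 0.01, [("Report A",)]),
        slow_source("DB2", 1.0, [("Report C",)]),   # too slow
        slow_source("DB3", 0.01, [], fail=True),     # fails
    ]
    return asyncio.run(query_sources(requests, timeout=0.1))


print(demo())
```

The answer is then partial, so the user should be told which sources were
missing; choosing a good timeout is hard precisely because autonomous sources
offer few statistics.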

20
Conclusions
  • Data integration handles many of the problems that arise in
    embedded systems applications:
    • many data sources
    • easy addition and deletion of sources
    • different source capabilities
    • dealing with network delays
    • ease of use for the user

21
Publications
  • Semantic Query Transformation for the Integration
    of Autonomous Information Sources (INAP99, Tokyo)
  • IKA: Unity in Heterogeneity (IIWAS99, Yogyakarta)
  • Information Retrieval Agents for the Intelligent
    Integration of Information Sources (MulNet 2000,
    Bandung)
  • A Multilingual Natural Language Interface for
    Mediating E-Commerce Product Catalogs (INAP2000,
    Tokyo)
  • Semantic Query Transformation for the Intelligent
    Integration of Information Sources over the Web
    (WIIW2001, Rio de Janeiro)
  • Rewriting Rules for Semantic Query Transformation
    in E-Commerce Applications (DS9, Hong Kong)
  • Data Integration in Digital Libraries: Challenges
    and Approaches (IndonesiaDL, Bandung)

22
Thank you for your attention!