Information Integration - PowerPoint PPT Presentation

Provided by: ralfra9

Transcript and Presenter's Notes

Title: Information Integration


1
Information Integration
Dagstuhl Workshop on Information Integration,
April/May 2002
  • (My Personal) Workshop Summary

2
Overview
  • Supported by
  • Focus Group Results (esp. Christoph Freytag,
    Rakesh Agrawal, Gerd Stumme)
  • Panels (we all)
  • Starting Point: A Look at the Original Seminar
    Proposal
  • The topics raised
  • The solutions given
  • Hot research topics
  • Closing Point: The (foreseeable) truth

3
Old Keywords
4
New Keywords
Information Integration
Application Integration
Business Integration
Data Integration
Process Integration
5
Information Integration Topic Description
  • II subsumes all technologies needed to provide
    for manipulation of information scattered over
    many data stores while supporting a single system
    image.
  • The data stores to be integrated are inherently
    heterogeneous in nature, owned by different
    organizations, and distributed over the whole
    world.
  • Data can be structured, semi-structured, or
    unstructured.
  • Data access can be based on standardized
    interfaces or via proprietary APIs.
  • II is expected to become a key technology in many
    application areas, such as
      • product data management
      • business process management
      • enterprise application integration
      • life science
      • entertainment.

6
Discussion Areas
  • How to get Access to the various data stores?
    Different technologies like SQL/MED wrappers,
    J2EE connectors, EAI adapters, and Web Services
    can be used for these purposes.
  • When should each of these technologies be used?
  • Can they be unified?
  • What are possible System Structures?
  • Which role will database systems, application
    servers, workflow systems, messaging systems,
    portal servers, etc. play? How do they relate and
    cooperate?
  • Does Web Database Technology suffice?
  • Can XML be used as the language for describing
    the integrated information base? How to capture
    navigational access based on hyper-linked HTML
    pages performed today in many application areas?
  • How to combine search and query functionality?
  • How is XML stored: sliced/diced, as a whole
    document in a file in the file system, or as a
    whole document combined with other documents in
    the file system? How do you index these effectively?
    How do you combine SQL and an XML-based query
    over the same data (i.e., XML query against SQL
    data and SQL against XML)? Is a pure XML database
    the way to go, or will an extended relational
    engine be the right solution?
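The last question above, combining SQL and XML-based queries over the same data, can be illustrated in both directions with a small sketch. It assumes Python's standard sqlite3 and xml.etree.ElementTree modules; the table, element names and rows are hypothetical, and ElementTree's limited XPath subset only stands in for a full XML query language. It takes no position on pure XML versus extended relational engines.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical relational store: a small products table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products (id, name, price) VALUES (?, ?, ?)",
    [(1, "bolt", 0.10), (2, "gear", 4.50), (3, "motor", 120.0)],
)

def sql_to_xml(conn: sqlite3.Connection) -> ET.Element:
    """Publish relational rows as an XML view: one <product> element per row."""
    root = ET.Element("products")
    for pid, name, price in conn.execute("SELECT id, name, price FROM products ORDER BY id"):
        item = ET.SubElement(root, "product", id=str(pid))
        ET.SubElement(item, "name").text = name
        ET.SubElement(item, "price").text = str(price)
    return root

# Direction 1: XML query against SQL data (path expressions over the published view).
view = sql_to_xml(conn)
gear_ids = [p.get("id") for p in view.findall("./product[name='gear']")]
expensive = [p.findtext("name") for p in view.findall("product")
             if float(p.findtext("price")) > 1.0]

# Direction 2: SQL against XML data (shred the XML into a table, then join).
ORDERS_XML = """<orders>
  <order product="2" qty="3"/>
  <order product="3" qty="1"/>
</orders>"""

def xml_to_sql(conn: sqlite3.Connection, xml_text: str) -> None:
    """Shred <order> elements into rows of a relational 'orders' table."""
    conn.execute("CREATE TABLE orders (product INTEGER, qty INTEGER)")
    rows = [(int(o.get("product")), int(o.get("qty")))
            for o in ET.fromstring(xml_text).iter("order")]
    conn.executemany("INSERT INTO orders (product, qty) VALUES (?, ?)", rows)

xml_to_sql(conn, ORDERS_XML)
total = conn.execute(
    "SELECT SUM(o.qty * p.price) FROM orders o JOIN products p ON p.id = o.product"
).fetchone()[0]
print(gear_ids, expensive, total)  # ['2'] ['gear', 'motor'] 133.5
```

Shredding is only one of the storage options the slide lists; keeping whole documents would move the query work into the XML side instead.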

7
Discussion Areas
  • How is information described? Which information
    qualities are needed? How can qualities be
    compared, assessed, measured, ...? Which metadata is
    relevant (schema, ontologies, ...)?
  • Which Federated Database Technologies can be
    used? What is a federated schema if structured
    and unstructured data are brought together?
    Which schema integration techniques, federated
    query and search technologies are applicable?
  • Not Discussed
  • Which Transaction Model is appropriate? Which
    transactional guarantees are needed? Which
    concurrency and recovery models are
    applicable?
  • Not Discussed
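Although the slide marks these areas as not discussed, the basic idea of a federated query, one statement spanning several stores, can be sketched with SQLite's ATTACH DATABASE, which lets a single connection join tables from separate databases through schema-qualified names. This is only a stand-in under a strong assumption: both stores are SQLite here, whereas the heterogeneous sources the seminar had in mind would need wrappers such as SQL/MED. Table names and data are hypothetical.

```python
import sqlite3

# Two separate "data stores", both in-memory SQLite databases for this sketch.
conn = sqlite3.connect(":memory:")                 # store A (schema "main"): sales
conn.execute("ATTACH DATABASE ':memory:' AS crm")  # store B (schema "crm"): customers

conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO main.orders VALUES (?, ?)", [(1, 10.0), (1, 5.0), (2, 7.5)])
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# One federated query: the engine resolves each schema-qualified table to its store.
rows = conn.execute(
    """SELECT c.name, SUM(o.amount)
       FROM main.orders AS o
       JOIN crm.customers AS c ON c.id = o.customer_id
       GROUP BY c.name
       ORDER BY c.name"""
).fetchall()
print(rows)  # [('Acme', 15.0), ('Globex', 7.5)]
```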

8
Some Results/Agreements ... Definitions
  • Data Integration
      • integration schema: an image presenting all
        relevant facts as one data source
      • generic functions for access and change
  • Integration is different from cooperation and
    interaction
  • Integration properties are impacted by
      • Experimental/Exploratory vs. Production
          • Exploratory
              • loosely coupled
              • fluid integration
              • Data centric
          • Production
              • Function integration
              • less flexible and often fixed
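The definition of data integration above (an integration schema presenting all relevant facts as one data source, plus generic functions for access and change) maps naturally onto a wrapper/mediator architecture. A minimal sketch follows; the Wrapper protocol, both wrapper classes, the field names and the routing of inserts by source name are hypothetical choices for illustration, not something the workshop specified.

```python
from typing import Callable, Dict, Iterator, List, Protocol

Record = Dict[str, str]

class Wrapper(Protocol):
    """Hypothetical wrapper interface: maps one source onto the integration schema."""
    def scan(self) -> Iterator[Record]: ...
    def insert(self, record: Record) -> None: ...

class CsvWrapper:
    """Wraps a text source made of 'id;title' lines."""
    def __init__(self, text: str) -> None:
        self.lines = [line.split(";") for line in text.strip().splitlines()]

    def scan(self) -> Iterator[Record]:
        for oid, title in self.lines:
            yield {"id": oid, "title": title, "source": "csv"}

    def insert(self, record: Record) -> None:
        self.lines.append([record["id"], record["title"]])

class KeyValueWrapper:
    """Wraps a key-value store whose native field is 'label' rather than 'title'."""
    def __init__(self, store: Dict[str, Dict[str, str]]) -> None:
        self.store = store

    def scan(self) -> Iterator[Record]:
        for oid, doc in self.store.items():
            yield {"id": oid, "title": doc["label"], "source": "kv"}

    def insert(self, record: Record) -> None:
        self.store[record["id"]] = {"label": record["title"]}

class Mediator:
    """One image over all sources, using the integration schema (id, title, source)."""
    def __init__(self, wrappers: Dict[str, Wrapper]) -> None:
        self.wrappers = wrappers

    def select(self, predicate: Callable[[Record], bool]) -> List[Record]:
        # Generic access: a single call is answered by every underlying store.
        return [r for w in self.wrappers.values() for r in w.scan() if predicate(r)]

    def insert(self, source: str, record: Record) -> None:
        # Generic change: routed to the wrapper that owns the named source.
        self.wrappers[source].insert(record)

m = Mediator({
    "csv": CsvWrapper("1;Integration\n2;Mining"),
    "kv": KeyValueWrapper({"k1": {"label": "Integration Patterns"}}),
})
hits = m.select(lambda r: "Integration" in r["title"])
m.insert("kv", {"id": "k2", "title": "Integration Theory"})
print(len(hits), len(m.select(lambda r: True)))  # 2 4
```

Routing each change to a single named source keeps the sketch simple; deciding where an update belongs when several sources overlap is exactly the kind of question the production-style integration on this slide has to settle.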

9
Some Results/Agreements ... Definitions
  • Structured object
      • <oid, <name, value>>
  • Unstructured object
      • <oid, word>
      • <oid, unknown/complex structure>
  • Semi-structured object
      • <oid, <name, value>, word>
  • <name, value> pairs may be
      • Given (e.g. author, title, etc.)
      • Extracted (e.g. Date, Zipcode, etc.)
      • Inferred (e.g. Topic)
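The tuple notation above can be read as a simple data structure. The sketch below models the three kinds of object with one Python dataclass and shows how "extracted" <name, value> pairs might be pulled from free text. The class name, the classification rule and the regular expressions (ISO dates, five-digit zip codes) are hypothetical; the <oid, unknown/complex structure> case and inferred pairs such as a topic (which would need a classifier) are not modeled.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class InfoObject:
    """Hypothetical model of <oid, <name, value>..., word...>."""
    oid: str
    pairs: List[Tuple[str, str]] = field(default_factory=list)  # <name, value> pairs
    words: List[str] = field(default_factory=list)              # unstructured text

    def kind(self) -> str:
        """Classify by which parts are present, following the slide's definitions."""
        if self.pairs and self.words:
            return "semi-structured"
        if self.pairs:
            return "structured"
        return "unstructured"

def extract_pairs(text: str) -> List[Tuple[str, str]]:
    """Extract <name, value> pairs from free text with simple illustrative patterns."""
    pairs = [("date", d) for d in re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)]
    pairs += [("zipcode", z) for z in re.findall(r"\b\d{5}\b", text)]
    return pairs

text = "Report filed 2002-04-29 in 12345 Sometown"
doc = InfoObject("o1", pairs=[("title", "Trip report")], words=text.split())  # given pair
doc.pairs += extract_pairs(text)                                              # extracted pairs
print(doc.kind(), doc.pairs)
# semi-structured [('title', 'Trip report'), ('date', '2002-04-29'), ('zipcode', '12345')]
```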

10
Some Results/Agreements ... Definitions
  • Metadata can be anything between natural language
    text and formalisms with formal semantics (e.g.
    ER models, (first order) logics, description
    logics, ontologies), including intermediate
    degrees of formality (e.g. XML, RDF)
  • For supporting II, we need more formal models
    which allow for machine manipulation
  • Ontologies are
      • data about metadata (schema for metadata)
  • Force people to use them!
  • (at least) 2 Secrets/Rumors
      • Late-night tutorial for thirsty people
      • (late) night tête-à-tête (maybe séparée)
        tutorial

11
Some Results/Agreements ... Web Services
  • Web Services are a new model for using the Web
  • "An interface that describes a collection of
    operations that are network accessible through
    standardized XML messaging."
  • transactions initiated automatically by a
    program, not necessarily using a browser
  • can be described, published, discovered, and
    invoked dynamically in a distributed computing
    environment
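To make "network accessible through standardized XML messaging" concrete, here is a sketch of the message layer only: a SOAP 1.1 envelope built and parsed with Python's xml.etree.ElementTree. The slide does not name SOAP; the operation getPrice, its productId parameter and the urn:example:catalog namespace are hypothetical, and transport (e.g. HTTP POST), service description (WSDL) and discovery (UDDI) are left out.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:catalog"                       # hypothetical service namespace
ET.register_namespace("soap", SOAP_ENV)

def build_request(operation: str, **params: str) -> bytes:
    """Client side: wrap one operation call and its parameters in a SOAP envelope."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(env)

def parse_request(message: bytes):
    """Server side: recover the operation name and its parameters from the message."""
    body = ET.fromstring(message).find(f"{{{SOAP_ENV}}}Body")
    call = body[0]                          # first child of Body is the operation element
    op = call.tag.split("}", 1)[1]          # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return op, params

msg = build_request("getPrice", productId="2")
print(parse_request(msg))  # ('getPrice', {'productId': '2'})
```

Because both sides agree only on the XML message format, the caller can be any program rather than a browser, which is the point the slide makes.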

12
Some Results/Agreements ...
  • Information Integration: the database reaches out
  • Unstructured Data Support
  • OLAP, Mining, rich Search
  • Federated database extensions
  • Metadata management
  • SQL and XML
  • Pure XML
  • ...
  • SQL and XML and NF2-like technology
  • DBMS should reflect some semantics of the
    applications
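"NF2-like technology" refers to non-first-normal-form (nested) relations, whose relation-valued attributes are close in spirit to XML's hierarchical structure. A minimal sketch of the two classic NF2 operators, nest and unnest, over lists of Python dicts follows; function names, attribute names and data are hypothetical, and nested values are kept as lists rather than true sets for simplicity.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List

Row = Dict[str, Any]

def nest(rows: List[Row], key: str, nested_name: str) -> List[Row]:
    """NF2 nest: group flat rows by `key`, collecting the other attributes into a nested relation."""
    result = []
    for k, group in groupby(sorted(rows, key=itemgetter(key)), key=itemgetter(key)):
        inner = [{a: v for a, v in r.items() if a != key} for r in group]
        result.append({key: k, nested_name: inner})
    return result

def unnest(nested: List[Row], nested_name: str) -> List[Row]:
    """NF2 unnest: flatten the nested relation back into first-normal-form rows."""
    return [
        {**{a: v for a, v in t.items() if a != nested_name}, **inner}
        for t in nested
        for inner in t[nested_name]
    ]

flat = [
    {"doc": "d1", "author": "A"},
    {"doc": "d2", "author": "C"},
    {"doc": "d1", "author": "B"},
]
nested = nest(flat, "doc", "authors")
print(nested)
# [{'doc': 'd1', 'authors': [{'author': 'A'}, {'author': 'B'}]}, {'doc': 'd2', 'authors': [{'author': 'C'}]}]
```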

13
Some Results: Confluence of Multiple Disciplines
14
... Other Results
  • More focus on Process Models and Process
    Specifications
  • DIFF operator
  • Formal theory for process models etc.
  • More focus on Semantics
  • Much more focus on Semantics
  • ...
  • Performance
  • Much more Performance
  • ...
  • Information systems should also project into the
    future ...

15
The (foreseeable) Truth
  • Heterogeneity is Fact
  • Integration is Fact
  • Make Life in Heterogeneity possible ... Easy!
  • Data Integration per se is not beneficial!
  • There has to be life beyond angle brackets!