Title: Information Integration
1 Information Integration
Dagstuhl Workshop on Information Integration, April/May 2002
- (My Personal) Workshop Summary
2 Overview
- Supported by
- Focus Group Results (esp. Christoph Freytag, Rakesh Agrawal, Gerd Stumme)
- Panels (we all)
- Starting Point: A Look at the Original Seminar Proposal
- The topics raised
- The solutions given
- Hot research topics
- Closing Point: The (foreseeable) truth
3 Old Keywords
4 New Keywords
Information Integration
Application Integration
Business Integration
Data Integration
Process Integration
5 Information Integration Topic Description
- II subsumes all technologies needed to provide for manipulation of information scattered over many data stores while supporting a single system image.
- The data stores to be integrated are inherently heterogeneous in nature, owned by different organizations, and distributed over the whole world.
- Data can be structured, semi-structured, or unstructured.
- Data access can be based on standardized interfaces or via proprietary APIs.
- II is expected to become a key technology in many application areas like
- product data management
- business process management
- enterprise application integration
- life science
- entertainment.
6 Discussion Areas
- How to get Access to the various data stores? Different technologies like SQL/MED wrappers, J2EE connectors, EAI adapters, and Web Services can be used for these purposes.
- When should each of these technologies be used?
- Can they be unified?
- What are possible System Structures?
- Which role will database systems, application servers, workflow systems, messaging systems, portal servers, etc. play? How do they relate and cooperate?
- Does Web Database Technology suffice?
- Can XML be used as the language for describing the integrated information base? How to capture the navigational access over hyper-linked HTML pages that is performed today in many application areas?
- How to combine search and query functionality?
- How is XML stored: sliced/diced, as a whole document in a file in the file system, or as a whole document combined with other documents in the file system? How do you index these effectively? How do you combine SQL and an XML-based query over the same data (i.e., XML query against SQL data and SQL against XML)? Is a pure XML database the way to go or will an extended relational engine be the right solution?
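The question of combining SQL and an XML-based query over the same data can be made concrete with a small sketch. It is only an illustration under stated assumptions, not a solution proposed at the workshop: the `book` and `xml_book` tables, the helper names `rows_to_xml` and `xml_to_rows`, and the sample rows are all hypothetical, and the "XML query" is the limited path-expression subset of Python's `xml.etree.ElementTree`, not a full XML query language.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical relational data: a tiny "book" table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO book (id, title, year) VALUES (?, ?, ?)",
    [(1, "Federated Databases", 1990), (2, "XML Storage", 2001)],
)


def rows_to_xml(connection):
    """Publish the SQL rows as an XML tree (an XML view over SQL data)."""
    root = ET.Element("books")
    query = "SELECT id, title, year FROM book ORDER BY id"
    for book_id, title, year in connection.execute(query):
        book = ET.SubElement(root, "book", {"id": str(book_id), "year": str(year)})
        ET.SubElement(book, "title").text = title
    return root


def xml_to_rows(root):
    """Shred an XML tree into tuples (a relational view over XML data)."""
    return [
        (int(b.get("id")), b.findtext("title"), int(b.get("year")))
        for b in root.iter("book")
    ]


# XML-style query against SQL data: a path expression with an attribute predicate.
xml_view = rows_to_xml(conn)
titles_2001 = [b.findtext("title") for b in xml_view.findall("./book[@year='2001']")]

# SQL query against XML data: shred a document into a table, then query with SQL.
doc = ET.fromstring(
    "<books><book id='3' year='2002'><title>Web Services</title></book></books>"
)
conn.execute("CREATE TABLE xml_book (id INTEGER, title TEXT, year INTEGER)")
conn.executemany("INSERT INTO xml_book VALUES (?, ?, ?)", xml_to_rows(doc))
recent = conn.execute(
    "SELECT title FROM book WHERE year > 2000 "
    "UNION ALL SELECT title FROM xml_book WHERE year > 2000 ORDER BY title"
).fetchall()

print(titles_2001)
print(recent)
```

Both directions here work by conversion (publishing rows as XML, shredding XML into rows). Whether an engine should convert like this or store and query XML natively is exactly the open question on the slide.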
7 Discussion Areas
- How is information described? Which information qualities are needed? How can qualities be compared, assessed, measured, ...? Which metadata is relevant (schema, ontologies, ...)?
- Which Federated Database Technologies can be used? What is a federated schema if structured and unstructured data are brought together? Which schema integration techniques, federated query and search technologies are applicable? (Not discussed)
- Which Transaction Model is appropriate? Which transactional guarantees are needed? Which concurrency models and recovery models are applicable? (Not discussed)
8 Some Results/Agreements ... Definitions
- Data Integration
- integration schema: an image presenting all relevant facts as one data source
- generic functions for access and change
- Integration is different from cooperation and interaction
- Integration properties are impacted by
- Experimental/Exploratory vs. Production
- Exploratory
- loosely coupled
- fluid integration
- Data centric
- Production
- Function integration
- less flexible and often fixed
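The definition of data integration above (one image presenting all relevant facts as a single data source, plus generic functions for access and change) can be sketched as a thin facade over two stores. This is a minimal sketch under assumptions: the `IntegratedView` class, the `customer` table, the dictionary standing in for a second store, and the sample names are all hypothetical and not taken from the workshop.

```python
import sqlite3


class IntegratedView:
    """One image over two hypothetical sources with generic access and change."""

    def __init__(self, sql_conn, kv_store):
        self._sql = sql_conn   # relational source with a customer(id, name) table
        self._kv = kv_store    # stand-in for a second, non-relational store

    def facts(self):
        """Return all facts as (id, name, source) rows, as if from one source."""
        rows = [
            (cid, name, "sql")
            for cid, name in self._sql.execute("SELECT id, name FROM customer")
        ]
        rows += [(key, value["name"], "kv") for key, value in self._kv.items()]
        return sorted(rows)

    def get(self, key):
        """Generic access: look a fact up without knowing where it lives."""
        for row in self.facts():
            if row[0] == key:
                return row
        return None

    def change(self, key, new_name):
        """Generic change: route the update to whichever source owns the key."""
        cursor = self._sql.execute(
            "UPDATE customer SET name = ? WHERE id = ?", (new_name, key)
        )
        if cursor.rowcount:
            return True
        if key in self._kv:
            self._kv[key]["name"] = new_name
            return True
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES ('c1', 'Ada')")
view = IntegratedView(conn, {"c2": {"name": "Grace"}})

print(view.facts())
view.change("c2", "Grace H.")
print(view.get("c2"))
```

The caller never names a source when reading or updating, which is the "single image" property. The fixed routing inside `change` leans toward the production style of integration listed above rather than the loosely coupled, exploratory one.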
9 Some Results/Agreements ... Definitions
- Structured object
- <oid, <name, value>>
- Unstructured object
- <oid, word>
- <oid, unknown/complex structure>
- Semi-structured object
- <oid, <name, value>, word>
- <name, value> pairs may be
- Given (e.g. author, title, etc.)
- Extracted (e.g. Date, Zipcode, etc.)
- Inferred (e.g. Topic)
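The three object kinds defined above can be written down as plain data types. This is a minimal sketch, assuming hypothetical class and field names, a five-digit zip code format, a hard-coded "inferred" topic in place of a real classifier, and a precedence rule (given over extracted over inferred) that the slides do not state.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StructuredObject:
    """<oid, <name, value>>: an identifier plus named values only."""
    oid: str
    pairs: Dict[str, str]


@dataclass
class UnstructuredObject:
    """<oid, word>: an identifier plus words, with no named values."""
    oid: str
    words: List[str]


@dataclass
class SemiStructuredObject:
    """<oid, <name, value>, word>: named values alongside free text."""
    oid: str
    words: List[str]
    given: Dict[str, str] = field(default_factory=dict)      # e.g. author, title
    extracted: Dict[str, str] = field(default_factory=dict)  # e.g. date, zip code
    inferred: Dict[str, str] = field(default_factory=dict)   # e.g. topic

    def pairs(self) -> Dict[str, str]:
        """Merge all pairs; assumed precedence: given > extracted > inferred."""
        merged: Dict[str, str] = {}
        for source in (self.inferred, self.extracted, self.given):
            merged.update(source)
        return merged


ZIP_PATTERN = re.compile(r"\d{5}")  # assumption: five-digit zip codes


def extract_zipcode(words: List[str]) -> Optional[str]:
    """Return the first word that looks like a zip code, if any."""
    for word in words:
        if ZIP_PATTERN.fullmatch(word):
            return word
    return None


text = "Deliver the report to 12345 Springfield"
doc = SemiStructuredObject(
    oid="doc-1",
    words=text.split(),
    given={"author": "A. Writer", "title": "Report"},
)
zipcode = extract_zipcode(doc.words)
if zipcode is not None:
    doc.extracted["zipcode"] = zipcode
doc.inferred["topic"] = "logistics"  # hard-coded stand-in for an inference step

print(doc.pairs())
```

Keeping given, extracted, and inferred pairs in separate dictionaries preserves where each value came from, which matters once information quality has to be assessed.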
10 Some Results/Agreements ... Definitions
- Metadata can be anything between natural language text and formalisms with formal semantics (e.g. ER models, (first order) logics, description logics, ontologies), including intermediate degrees of formality (e.g. XML, RDF)
- For supporting II, we need more formal models which allow for machine manipulation
- Ontologies are
- data about metadata (schema for metadata)
- Force people to use them!
- (at least) 2 Secrets/Rumors
- Late night tutorial for thirsty people
- (late) night tête-à-tête (maybe séparée) tutorial
11 Some Results/Agreements ... Web Services
- Web Services is a new model for using the Web
- "An interface that describes a collection of operations that are network accessible through standardized XML messaging."
- transactions initiated automatically by a program, not necessarily using a browser
- can be described, published, discovered, and invoked dynamically in a distributed computing environment
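The four verbs in the last point (described, published, discovered, invoked) can be illustrated with an in-process stand-in. This is only a sketch under assumptions: there is no network, no SOAP, WSDL, or UDDI involved; the `REGISTRY` dictionary, the message layout, the `catalog` service, and its `lookup_title` operation are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-process registry: service name -> {operation name -> callable}.
REGISTRY = {}


def publish(service, operations):
    """Publish a service by registering its operations."""
    REGISTRY[service] = dict(operations)


def describe(service):
    """Describe a service as an XML document listing its operations."""
    root = ET.Element("service", {"name": service})
    for operation in sorted(REGISTRY[service]):
        ET.SubElement(root, "operation", {"name": operation})
    return ET.tostring(root, encoding="unicode")


def discover(operation):
    """Discover which published services offer a given operation."""
    return sorted(name for name, ops in REGISTRY.items() if operation in ops)


def invoke(request_xml):
    """Invoke an operation from an XML request and return an XML response."""
    request = ET.fromstring(request_xml)
    func = REGISTRY[request.get("service")][request.get("operation")]
    args = {p.get("name"): p.text for p in request.findall("param")}
    response = ET.Element("response")
    response.text = str(func(**args))
    return ET.tostring(response, encoding="unicode")


# A program, not a browser, acts as the client.
TITLES = {"p1": "Widget"}  # hypothetical catalog data
publish("catalog", {"lookup_title": lambda product: TITLES.get(product, "unknown")})

providers = discover("lookup_title")
reply = invoke(
    '<request service="catalog" operation="lookup_title">'
    '<param name="product">p1</param></request>'
)
print(providers)
print(reply)
```

The client only ever sees XML documents (the description, the request, the response), which is what lets the caller and the service be implemented independently.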
12 Some Results/Agreements ...
- Information Integration: the database reaches out
- Unstructured Data Support
- OLAP, Mining, rich Search
- Federated database extensions
- Metadata management
- SQL and XML
- Pure XML
- ...
- SQL and XML and NF2-like technology
- DBMS should reflect some semantics of the
applications
13 Some Results: Confluence of Multiple Disciplines
14 ... Other Results
- More focus on Process Models and Process Specifications
- DIFF operator
- Formal theory for process models etc.
- More focus on Semantics
- Much more focus on Semantics
- ...
- Performance
- ...
- Information systems should also project into the
future ...
15 The (foreseeable) Truth
Integration is Fact
Make Life in Heterogeneity possible ... Easy!
Data Integration per se is not beneficial!
There has to be life beyond angular brackets!