Title: Tools for Integrating Heterogeneous Data Sources from a User Perspective
Jie Bao, Changhui Yan, Doina Caragea, Vasant Honavar
Artificial Intelligence Research Laboratory, Department of Computer Science, Iowa State University, Ames, IA 50011-1040, USA
baojie,yan330,dcaragea,honavar_at_cs.iastate.edu
[Figure labels: Data Source 1, Data Source 2]
- Data sources can be structurally and semantically heterogeneous.
  - Structural commitment (schema): how the data is organized.
  - Semantic commitment (content ontology): how the values in the data source are interpreted.
- Ontology: the formal representation of sharable knowledge about the data source.
  - E.g., can a Graduate be interpreted as a Student?
  - E.g., how are Globin and AlphaProtein related?
- User perspective: the user's view of the knowledge and data in the data sources. Mappings need to be specified between the data sources and the user perspective.
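The Graduate/Student question above can be read as a reachability test over is-a links in an ontology. The following is a minimal Python sketch under that reading. The `IS_A` table and the `is_a` helper are hypothetical names introduced for illustration and are not taken from INDUS; only the Graduate/Student pair comes from the example above.

```python
# Hypothetical is-a links (term -> set of direct parents); illustrative only.
IS_A = {
    "Graduate": {"Student"},
    "Undergraduate": {"Student"},
    "Student": {"Person"},
}


def is_a(term, ancestor, links=IS_A):
    """Return True if `term` equals `ancestor` or reaches it via is-a links."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue  # already expanded; also guards against cycles
        seen.add(current)
        stack.extend(links.get(current, ()))
    return False


print(is_a("Graduate", "Student"))  # True
print(is_a("Student", "Graduate"))  # False
```

Under this assumed hierarchy, a record about a Graduate can answer a query phrased in terms of Student, but not the other way around.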
Why Ontology
[Figure labels: Schema 1, Ontology O1; Schema 2, Ontology O2; User Perspective (User Schema, User Ontology)]
- An ontology-extended data source is a tuple <D, S, O>, where D is the actual data, S is the schema, and O is the associated ontology for the content of D.
- Schema: E{A1:t1, ..., An:tn}, where A1, ..., An are the attributes of E, drawn from the (datatype or abstract) type domains t1, ..., tn.
- Ontology: the definition of relations among subsets of Δ_D and Δ_I, where
  - the datatype domain Δ_D is the set of all datatypes, e.g. (Integer)^D < Δ_D, (String)^D < Δ_D (here < means subset);
  - AVH and DAG: an Attribute Value Hierarchy and a Directed Acyclic Graph are orderings (S, <) on a partially ordered set S, with (AVH)^D < Δ_D and (DAG)^D < Δ_D. Real-world examples: Gene Ontology (DAG), SCOP (AVH);
  - the abstract domain Δ_I is the set of all classes, e.g. Woman, Husband, Couple.
- Interoperation constraints: mapping constraints among ontologies, e.g. Equal, Into, Onto, Compatible, Incompatible.
- Conversion functions: value mappings for data types, e.g. C = (F - 32) × 5/9.
- All of these can be translated into Description Logic, and thus also into RDF/OWL.
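The tuple of data, schema, and ontology described above can be sketched as a small Python structure. This is an illustration under simplifying assumptions, not the INDUS implementation: the class name, the `conforms` method, the `Outlook` hierarchy, and the sample records are all hypothetical, and the ontology is reduced to one membership test per type name.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class OntologyExtendedDataSource:
    data: List[dict]                                # D: the actual records
    schema: Dict[str, str]                          # S: attribute name -> type name
    ontology: Dict[str, Callable[[object], bool]]   # O: type name -> membership test

    def conforms(self, record: dict) -> bool:
        """True if every schema attribute is present and its value lies in its type's domain."""
        return all(
            attr in record and self.ontology[type_name](record[attr])
            for attr, type_name in self.schema.items()
        )


# Hypothetical attribute value hierarchy (child -> parent) for a weather attribute.
OUTLOOK_AVH = {
    "Rain": "Precipitation",
    "Snow": "Precipitation",
    "Precipitation": "Outlook",
    "Sunny": "Outlook",
}
OUTLOOK_VALUES = set(OUTLOOK_AVH) | set(OUTLOOK_AVH.values())

source = OntologyExtendedDataSource(
    data=[{"Temp": 50, "Outlook": "Rain"}, {"Temp": 68, "Outlook": "Foggy"}],
    schema={"Temp": "Integer", "Outlook": "OutlookAVH"},
    ontology={
        "Integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
        "OutlookAVH": lambda v: v in OUTLOOK_VALUES,
    },
)

print([source.conforms(r) for r in source.data])  # [True, False]
```

The second record fails because "Foggy" is not a node of the assumed hierarchy. A fuller treatment would also keep the ordering of the hierarchy, so that values could be compared at different levels of abstraction.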
How to represent
[Figure labels: DAG, AVH, A, B]
We developed the INDUS Ontology-Extended Data Source Editor. It can define data types (such as AVHs), schemas, interoperation constraints, and conversion functions. More details are available at http://boole.cs.iastate.edu:9090/wikiont/DSEditor.html
Case Study
- Weather Data Sources
- Protein Structure Family: SCOP and CATH
[Figure captions: Mappings between two ontologies; A conversion function]
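For the weather case, a conversion function and a term mapping could be combined to translate a source record into the user's view. The sketch below uses the Fahrenheit-to-Celsius formula given earlier. Everything else is an assumption made for illustration: the `TERM_MAP` table (standing in for an Equal-style interoperation constraint), the attribute names, and the sample record.

```python
def fahrenheit_to_celsius(f):
    """Conversion function from the poster: C = (F - 32) * 5 / 9."""
    return (f - 32) * 5 / 9


# Hypothetical mapping from source-ontology terms to user-ontology terms.
TERM_MAP = {"Rainy": "Rain", "Clear": "Sunny"}


def to_user_view(record):
    """Translate one source record into the (assumed) user schema and ontology."""
    return {
        "TempC": fahrenheit_to_celsius(record["TempF"]),
        # Terms without a mapping are passed through unchanged.
        "Outlook": TERM_MAP.get(record["Outlook"], record["Outlook"]),
    }


print(to_user_view({"TempF": 212, "Outlook": "Rainy"}))
# {'TempC': 100.0, 'Outlook': 'Rain'}
```

Keeping the value conversion and the term mapping separate mirrors the poster's distinction between conversion functions for data types and interoperation constraints among ontologies.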
SOFG2004, Standards and Ontologies for Functional Genomics 2, University of Pennsylvania, Philadelphia, Pennsylvania, Oct 23-26, 2004