Title: Developing an Ontology-based Metadata Management System for Heterogeneous Clinical Databases
- By Quddus Chong
- Winter 2002
Outline
- Towards a clinical data warehouse
- Integrating heterogeneous data sources
- Clinical abstractions as Ontologies
- Managing database metadata
- The data mediator approach
- Using Protégé-2000
Towards a Clinical Data Warehouse
- Clinical data warehousing is the application of data warehousing concepts to allow clinical data about a large patient population to be analyzed for clinical quality management and medical research.
- In a data warehouse environment, data has the following properties:
  - Data is organized by subject, or domain-level concepts, rather than by function.
  - Data from various operational systems is integrated, by definition or by content.
  - Data is archived in non-volatile storage to allow temporal analysis.
  - Data is recorded with a temporal dimension (e.g. a timestamp).
  - Data is optimized for decision making (DSS) or analysis (OLAP).
Integrating Heterogeneous Data Sources
- The main challenge in integrating data from heterogeneous sources is resolving schema and data conflicts.
- Approaches to this problem include using a federated database architecture or providing a multi-database interface. These approaches are geared more towards providing query access to the data sources than towards supporting analysis.
- Types of data integration:
  - Physical integration: convert records from heterogeneous data sources into a common format (e.g. XML).
  - Logical integration: relate all data to a common process model (e.g. a medical service such as "diagnose patient" or "analyze outcomes").
  - Semantic integration: allow cross-referencing of data, and possibly inferencing over it, with regard to a common metadata standard or ontology (e.g. HL7 RIM, DAML+OIL).
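As a minimal sketch of physical integration, the Python below converts one flat-file record and one relational row into the same XML record format using only the standard library. The pipe-delimited layout, the source names, and all field names are assumptions made for illustration; they do not come from the project.

```python
import xml.etree.ElementTree as ET


def to_xml_record(source: str, values: dict) -> ET.Element:
    """Wrap one source record in a common <record> element."""
    record = ET.Element("record", {"source": source})
    for name, value in values.items():
        field = ET.SubElement(record, "field", {"name": name})
        field.text = str(value)
    return record


def from_flat_file(line: str, field_names: list) -> dict:
    """Split one pipe-delimited flat-file line into named fields."""
    return dict(zip(field_names, line.rstrip("\n").split("|")))


# Hypothetical flat-file record and relational row, each describing a patient.
flat_values = from_flat_file("1001|Smith|1950-03-14\n", ["id", "surname", "dob"])
row_values = {"patient_id": 2002, "last_name": "Jones", "birth_date": "1962-11-02"}

flat_xml = to_xml_record("lab_flat_file", flat_values)
row_xml = to_xml_record("clinic_rdbms", row_values)

print(ET.tostring(flat_xml, encoding="unicode"))
print(ET.tostring(row_xml, encoding="unicode"))
```

Both records now share one format, but their field names still differ (`id` versus `patient_id`). Reconciling those names and their meanings is what logical and semantic integration address.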
Clinical abstractions as Ontologies
- An ontology is an explicit specification of the conceptualization of a domain. Information models (such as the HL7 RIM) and standardized vocabularies (such as UMLS) can be part of an ontology. An ontology is a core component of a knowledge-based system.
- In the clinical research field, ontologies have been used in computerized guideline modeling. This allows the development of applications that provide recommendations (e.g. indications for the use of surgical procedures), identify deviations in practice, and offer screening services (e.g. evaluating patient eligibility).
- Benefits of using ontologies include:
  - Facilitating sharing between systems and reuse of knowledge
  - Aiding new knowledge acquisition
  - Improving the verification and validation of knowledge-based systems
Managing database metadata
- Metadata is the detailed description of the instance data: the format and characteristics of the populated instance data, and the instances and values, depending on the requirements/role of the metadata recipient.
- Metadata is used in locating information, interpreting information, and integrating/transforming data.
- Being able to maintain a well-organized and up-to-date collection of the organization's metadata is a great step towards improving overall data quality and usage. However, this task is complicated by the varying quality and formats of the metadata available (or not) from the heterogeneous data sources, and by the difficulty of updating existing metadata consistently.
- A common metadata architecture is essential to keeping data manageable.
The Data Mediator approach
- In this project, we will attempt to develop an extensible and adaptable architecture for integrating heterogeneous data sources into a data warehouse environment, using an ontology-based data mediator approach.
- The components of this architecture include:
  - Knowledge base: stores the ontology, which consists of:
    - The abstraction model: domain-level concepts
    - The database description model: metadata record of data sources
    - The mappings model: how data elements relate to attributes in the abstraction model
    - The transformations model: metadata of available methods to transform data elements from one data source to another
  - Data mediators: provide each data source with an interface to the warehouse and resolve data conflicts between any different representations; the necessary classes are generated from the ontology.
  - Data warehouse: provides access to integrated data for analysis and decision-making.
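One way the three components could fit together is sketched below in Python. This is an illustration under stated assumptions, not the project's design: the class names `KnowledgeBase`, `DataMediator`, and `DataWarehouse`, the source names, and every data element name are hypothetical, and only the abstraction and mappings models are represented.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Holds the ontology; only two of the four models are sketched here."""
    # Abstraction model: domain-level concept -> its attributes.
    abstractions: dict = field(default_factory=dict)
    # Mappings model: (source, data element) -> (concept, attribute).
    mappings: dict = field(default_factory=dict)


@dataclass
class DataWarehouse:
    """Stores integrated records, grouped by domain-level concept."""
    records: dict = field(default_factory=dict)

    def load(self, concept: str, record: dict) -> None:
        self.records.setdefault(concept, []).append(record)


class DataMediator:
    """Gives one data source an interface to the warehouse."""

    def __init__(self, source: str, kb: KnowledgeBase, warehouse: DataWarehouse):
        self.source = source
        self.kb = kb
        self.warehouse = warehouse

    def integrate(self, concept: str, source_record: dict) -> dict:
        """Rename source elements to abstraction attributes, then load.

        Elements with no mapping for this concept are dropped.
        """
        integrated = {}
        for element, value in source_record.items():
            mapped = self.kb.mappings.get((self.source, element))
            if mapped is not None and mapped[0] == concept:
                integrated[mapped[1]] = value
        self.warehouse.load(concept, integrated)
        return integrated


# Hypothetical knowledge base covering two sources with different element names.
kb = KnowledgeBase(
    abstractions={"Patient": ["identifier", "surname"]},
    mappings={
        ("source_db_1", "pat_id"): ("Patient", "identifier"),
        ("source_db_1", "lname"): ("Patient", "surname"),
        ("source_db_2", "PatientNo"): ("Patient", "identifier"),
        ("source_db_2", "FamilyName"): ("Patient", "surname"),
    },
)
warehouse = DataWarehouse()
DataMediator("source_db_1", kb, warehouse).integrate(
    "Patient", {"pat_id": "1001", "lname": "Smith"})
DataMediator("source_db_2", kb, warehouse).integrate(
    "Patient", {"PatientNo": "2002", "FamilyName": "Jones"})
print(warehouse.records["Patient"])
```

Because each mediator reads its mappings from the knowledge base rather than hard-coding them, adding a source in this sketch means adding mapping entries, not writing a new mediator class.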
Patient model (adapted from SMI Dharma model)
- The patient-data information model defines the classes and attributes of patient data for an Electronic Patient Record (EPR).
- The patient-data model consists of:
  - a Patient class, whose instances hold demographic information about specific patients
  - a Note_Entry class that describes qualitative observations about patients
  - a Numeric_Entry class that represents results of quantitative measurements
  - an Adverse_Event class that models adverse reactions to specific substances
  - a Condition class that represents medical conditions that persist over time
  - two intervention classes, Medication and Procedure, that model drugs and other medical procedures that have been recommended, authorized, or used.
- The defining characteristic of entities in the patient-data model is that they are assertions about demographic and clinical conditions of specific patients.
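The class list above could be sketched as Python dataclasses. The class names follow the slide; every attribute, and the shared `PatientAssertion` base class, is a hypothetical choice made for illustration, since the slide does not list the model's attributes.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Demographic information about one specific patient."""
    patient_id: str     # hypothetical attribute
    surname: str        # hypothetical attribute
    birth_date: str     # hypothetical attribute (ISO date string)


@dataclass
class PatientAssertion:
    """Hypothetical base class: each entry is an assertion about one patient."""
    patient: Patient
    recorded_at: str    # hypothetical timestamp attribute


@dataclass
class Note_Entry(PatientAssertion):
    observation: str    # qualitative observation


@dataclass
class Numeric_Entry(PatientAssertion):
    measurement: str    # what was measured
    value: float
    unit: str


@dataclass
class Adverse_Event(PatientAssertion):
    substance: str
    reaction: str


@dataclass
class Condition(PatientAssertion):
    name: str
    onset: str          # conditions persist over time, so record a start


@dataclass
class Medication(PatientAssertion):
    drug: str
    status: str         # "recommended", "authorized", or "used"


@dataclass
class Procedure(PatientAssertion):
    name: str
    status: str         # "recommended", "authorized", or "used"


# Illustrative instances only; the values are made up.
patient = Patient("1001", "Smith", "1950-03-14")
entries = [
    Numeric_Entry(patient, "2002-01-15T09:30", "systolic blood pressure", 142.0, "mmHg"),
    Medication(patient, "2002-01-15T09:45", "drug_x", "recommended"),
]
for entry in entries:
    print(type(entry).__name__, entry.patient.patient_id)
```

Giving every clinical class a reference to a `Patient` is one way to express the point that entities in this model are assertions about specific patients.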
Database metadata model (adapted from Critchlow et al.)
- The metadata model here contains the information needed for the data integration process.
- The database description model contains language-independent class definitions that closely mirror the physical layout of a source database. In our prototype model, the database description is simply a class containing a set of database entries. A model is provided for two distinct entry types: field entries (from flat-file data sources) and column entries (from relational data sources). Entries are essentially instances of the attribute class.
- Modeling the database metadata as an ontology provides flexibility when describing heterogeneous data sources. For instance, the model can easily be extended to describe native XML databases.
- How the models are used in data integration:
  - The source database attributes are mapped to the appropriate abstraction characteristic through mappings. When an abstraction defines multiple representations for the same characteristic attribute, transformation functions are defined to convert between them.
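The mapping and transformation step described above can be sketched as follows. The two entry types come from the slide; their attributes, the `Mapping` class, the `weight` characteristic with its pound and kilogram representations, and all entry names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FieldEntry:
    """One field in a flat-file data source."""
    name: str
    position: int       # hypothetical: 0-based position in the delimited line


@dataclass
class ColumnEntry:
    """One column in a relational data source."""
    name: str
    table: str
    sql_type: str       # hypothetical attribute


@dataclass
class Mapping:
    """Links a database entry to a characteristic and the representation it uses."""
    entry: object
    characteristic: str
    representation: str


KG_PER_POUND = 0.45359237  # exact by definition of the international pound

# Transformation functions, keyed by (characteristic, from, to).
TRANSFORMATIONS = {
    ("weight", "pounds", "kilograms"): lambda lb: round(lb * KG_PER_POUND, 1),
    ("weight", "kilograms", "pounds"): lambda kg: round(kg / KG_PER_POUND, 1),
}


def to_representation(value, mapping: Mapping, target: str):
    """Return value in the target representation, converting only if needed."""
    if mapping.representation == target:
        return value
    key = (mapping.characteristic, mapping.representation, target)
    return TRANSFORMATIONS[key](value)


# A flat-file source stores weight in pounds; a relational source uses kilograms.
weight_lb = Mapping(FieldEntry("WT_LB", 3), "weight", "pounds")
weight_kg = Mapping(ColumnEntry("weight_kg", "vitals", "DECIMAL(5,1)"), "weight", "kilograms")

print(to_representation(154.0, weight_lb, "kilograms"))
print(to_representation(70.0, weight_kg, "kilograms"))
```

Keeping the conversion functions in a table keyed by characteristic and representation means a new representation needs new table entries rather than changes to the lookup logic.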
A prototype architecture
[Figure: architecture diagram; only its labels and annotations are recoverable.]
- Components:
  - Ontology Server, holding Abstractions, Data Descriptions, Data Mappings, and Transformation Descriptions
  - Source db 1 (Relational DBMS, e.g. MySQL) with Mediator Interface 1
  - Source db 2 (Object-Relational DBMS, e.g. PostgreSQL) with Mediator Interface 2
  - Warehouse Mediator
  - Target db (Data Warehouse environment, e.g. SQL Server)
- Annotations:
  - Ontologies can be created and modified via the Protégé-2000 tool; the underlying format is RDF.
  - Possible use of JDBC metadata to obtain database descriptions.
  - Alternatively, a common metadata exchange standard such as XMI could be used.
  - The abstraction model in the ontology is extensible to any domain.
  - Possible use of XSLT to perform data transformations.
  - XML data binding could be used to generate APIs for data validation or transformation.
  - Key goal: develop the ontology server as a component, using EJB or .NET.
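The idea of obtaining database descriptions from the database's own metadata can be illustrated with a small Python sketch. It uses an in-memory SQLite database as a stand-in for a relational source and reads the column definitions from SQLite's catalog; in Java, the JDBC `DatabaseMetaData` interface plays the corresponding role. The table, its columns, and the shape of the returned entries are assumptions for illustration.

```python
import sqlite3


def describe_source(connection: sqlite3.Connection) -> list:
    """Build one column entry per column from the database's own catalog."""
    entries = []
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
        columns = connection.execute(f'PRAGMA table_info("{table}")')
        for _cid, name, sql_type, notnull, _default, pk in columns:
            entries.append({
                "table": table,
                "column": name,
                "type": sql_type,
                "nullable": not notnull,
                "primary_key": bool(pk),
            })
    return entries


# Hypothetical source schema, created in memory so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient ("
    "patient_id INTEGER PRIMARY KEY, surname TEXT NOT NULL, birth_date TEXT)"
)
description = describe_source(conn)
conn.close()

for entry in description:
    print(entry)
```

Entries produced this way could populate the column entries of a database description without anyone typing the schema in by hand, which also helps keep the metadata in step with the source.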
Using Protégé-2000
- Protégé-2000 is an experimental knowledge-acquisition tool, written in Java, that allows users to import, export and create their own ontologies.
- The tool itself is extensible: a programming development kit with instructions on creating plug-ins is available. Plug-in types include:
  - Tabs: a user interface between an ontology model in Protégé and another knowledge-based application.
  - Slot widgets: a user interface for viewing and acquiring slot values for new instances.
  - Backend plug-ins: specify the mechanism that Protégé-2000 will use to store the ontology.
Screenshot: Creating the classes and slots of an ontology
Screenshot: Viewing the newly created ontology model
References
- Pedersen T. B., Jensen C. S., "Research Issues in Clinical Data Warehousing", in Proceedings of the 10th International Conference on Scientific and Statistical Database Management, pp. 43-52, July 1998 (available online: http://citeseer.nj.nec.com/pedersen98research.html)
- Critchlow T., Ganesh M., Musick R., "Meta-Data Based Mediator Generation", in Proceedings of the 3rd IFCIS Conference on Cooperative Information Systems, August 1998 (available online: http://citeseer.nj.nec.com/critchlow98metadata.html)
- Tu S. et al., "A Flexible Approach to Guideline Modeling", AMIA Annual Symposium, 1999 (available online: http://smi-web.stanford.edu/pubs/SMI_Abstracts/SMI-1999-0793.html)