Title: COI: Brainstorming Document
1. COI Brainstorming Document
2. Feedback from
- Jyotishman Pathak
- Dan Russler
- Matt Moores
- Alan Ruttenberg
- Parsa Mirhaji
- Lee Feigenbaum
- Ronan Fox
- Rachel Richesson
- Susie Stephens
- Eric Prud'hommeaux
3. Goal
- To demonstrate the value of Semantic Web (SW) specifications in bridging the divide between clinical practice and clinical research
- Collaborative development of a proof of concept (POC) that demonstrates key value propositions of using SW specifications
- To get buy-in from a wide variety of stakeholders as a prelude to acceptance and adoption of Semantic Web specifications
- Get buy-in for the use case
- Get wide participation and involvement of the key stakeholders in various stages of analysis, design and development of the POC
- Re-use existing standards, terminologies, data and information models of existing communities to increase the probability of adoption
4. Methodology
- The group has been functioning in a consensus-driven manner: opinions are sought from all the stakeholders at each step, and decisions are taken based on the resulting consensus.
- A critical success factor was incorporating the views of the various communities at each step.
5. Decision 1: Use Case
- Use case development was led by Rachel Richesson
- A wide variety of use cases was investigated and discussed
- Patient Recruitment
- Adverse Event Detection
- Tracking a Patient Through a Clinical Trial
- Decision: Focus on Patient Recruitment
- Data was assumed to be in an EMR
6. Re-use of Existing Information Models
- How can we re-use EMR data for clinical research?
- HL7/RIM/DCM descriptions may be viewed as a format for clinical research data.
- Typically, clinical data in healthcare delivery systems and applications is represented in or transformed into this format.
- CDISC/SDTM descriptions may be viewed as a format for clinical research questions.
- Typically, clinical data in clinical trials systems and applications is represented in or transformed into this format?
- Can we ask questions in one format when the data is represented in another format?
- How can we implement functionality to map across these formats?
- The mapping module should be flexible enough to incorporate extensions to the formats
- The mapping module should be flexible enough to plug and play with multiple formats
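One way to picture the plug-and-play requirement is a registry that holds one translator per pair of formats, so a new format is added by registering a function rather than changing the module. The sketch below is only an illustration under that assumption: `MappingRegistry`, `sdtm_to_rim` and the single SYSBP/SystolicBP rename are hypothetical names, not part of any agreed design.

```python
from typing import Callable, Dict, Tuple

Record = Dict[str, object]
Translator = Callable[[Record], Record]


class MappingRegistry:
    """Hypothetical registry: one translator per (source, target) format pair."""

    def __init__(self) -> None:
        self._translators: Dict[Tuple[str, str], Translator] = {}

    def register(self, source: str, target: str, fn: Translator) -> None:
        # Plugging in a new format only means registering another function.
        self._translators[(source, target)] = fn

    def translate(self, source: str, target: str, record: Record) -> Record:
        try:
            fn = self._translators[(source, target)]
        except KeyError:
            raise LookupError(f"no mapping registered from {source} to {target}") from None
        return fn(record)


def sdtm_to_rim(record: Record) -> Record:
    # Illustrative rename table; a real mapping would be far richer.
    renames = {"SYSBP": "SystolicBP"}
    return {renames.get(key, key): value for key, value in record.items()}


registry = MappingRegistry()
registry.register("SDTM", "HL7/RIM", sdtm_to_rim)
```

Asking for an unregistered pair raises `LookupError`, which makes gaps in format coverage visible instead of silently passing data through.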
7. Decision 2: Information Models
- A wide variety of information models was considered
- HL7/RIM
- CDISC/SDTM
- Detailed Clinical Models from Intermountain Healthcare
- Galen
- POMR Ontology (Chimezie)
- Eligibility Criteria Ontology (Helen)
- Healthcare Delivery Encounter-based Meta Model (Parsa Mirhaji)
- Conclusions
- No one ontology/information model is likely to fit the bill
- Align as closely as possible to existing information model and terminology standards
- Identify gaps and inadequacies in addressing the use case at hand
- Provide feedback to standards groups: CDISC, HL7/RIM, BRIDG
- Decision: Use CDISC/SDTM, Detailed Clinical Models and HL7/RIM as seed ontologies to begin with
- Iteratively refine them as gaps and inadequacies are discovered
8. Demonstrate Re-use
- Re-use of data from the EMR for clinical research
- Re-use of existing vocabularies, e.g., NCI Thesaurus, Snomed, MedDRA
- Re-use of pre-existing information models, e.g., HL7/RIM/DCM, SDTM
- Identify and re-use software components that can be used to enable a wide range of use cases
- Patient Recruitment
- Adverse Drug Event Detection
- Tracking a Patient through a Clinical Trial
- Develop the POC based on an implementation of these re-usable components
- Components that implement mapping
- Components that implement data retrieval
- Components that implement wrappers/transformations
- Components that implement checking for eligibility criteria, adverse events and other clinical events of significance
9. Decision 3
- Decided to implement the POC on a real-world data set as opposed to a synthetically created data set. This raised the following issues:
- What would be an appropriate seed information model/ontology to describe healthcare data, based on the current state?
- What are the appropriate terminologies (e.g., Snomed, LOINC, RxNorm) that need to be considered to capture coded information in healthcare data, based on the current state?
- Parsa Mirhaji provided the data, and his feedback was crucial in identifying the appropriate seed model/terminology
10. Decision 4
- Based on discussions with W3C folks such as Ralph Swick, Karen Myers, Steve Bratt and Eric P.
- W3C is interested in working with external standards bodies such as HL7 and CDISC and expressing their content using Semantic Web specifications such as RDF and OWL (Steve Bratt at the Bio IT World Luncheon)
- Implication about W3C being content neutral
- Bron Kisler emphasized that since W3C is providing only the languages, a collaboration would be synergistic and would make sense
- Is it possible to develop a collaborative interest group with involvement of HL7, CDISC and others? (conversation with Ralph Swick)
11. One Proposed Solution Architecture
Protocol Specification Interface
Mapping Module
RDF Transformation Engine
Eligibility Checking Module
CTMS
EMR System
12. Decision 5
- Current State Assumptions
- Information models and vocabularies used in the clinical trials context are different from those used in the healthcare delivery context
- Emphasis on the mapping aspect
- Support plug and play of different information models and vocabularies
- Technology Choices
- SPARQL
- N3 rules
13. Mapping Module
- Critical component of the key goal of this effort, i.e., to gain acceptance from a wide variety of stakeholders in the healthcare and clinical trials space
- HL7/RIM/DCM: seek alignment with healthcare standards
- CDISC/SDTM: seek alignment with clinical trials standards
- Develop mappings across these two models
- Identify limitations and gaps across these models
- Scope
- Focus only on those data items that are required for patient recruitment
- Focus only on those data items that are related to diabetes and hypertension
- To be driven in some part by mock diabetes and hypertension records
14. Use Case Step Through
- The Clinical Trial Administrator uses the Protocol Specification Interface to specify the eligibility criteria. The data items are specified using elements from the SDTM model.
- The mapping module translates the data items to the appropriate HL7/RIM/DCM representation.
- Appropriate queries are made to the Mediator/Gateway module.
- The Mediator/Gateway module translates the query into the underlying database query language. The query is executed at the database and the results are sent to the mapping module.
- The mapping module retranslates the data into terms from the SDTM model.
- The Eligibility Checking Module checks which patients satisfy the eligibility criteria.
- The selected patients are returned to the Clinical Trial Administrator.
- Note: Some eligibility criteria may not be expressible using SPARQL queries and may require rules, etc.
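The step-through above can be sketched end to end in a few functions. This is a toy illustration only: it uses in-memory Python dictionaries in place of the RDF store, SPARQL and N3 rules named elsewhere in this deck, and every name in it (`SDTM_TO_RIM`, `MOCK_EMR`, `Criterion`, `recruit`, the patient ids and values) is a hypothetical stand-in rather than part of the proposed design.

```python
import operator
from dataclasses import dataclass

# Hypothetical SDTM-style code -> HL7/RIM-style element name (illustrative only).
SDTM_TO_RIM = {"SYSBP": "SystolicBP", "DIABP": "DiastolicBP"}
RIM_TO_SDTM = {rim: sdtm for sdtm, rim in SDTM_TO_RIM.items()}

# Mock EMR content keyed by patient id, stored under RIM-style names.
MOCK_EMR = {
    "patient-1": {"SystolicBP": 152, "DiastolicBP": 96},
    "patient-2": {"SystolicBP": 118, "DiastolicBP": 76},
}

COMPARATORS = {"gt": operator.gt, "lt": operator.lt}


@dataclass(frozen=True)
class Criterion:
    item: str     # data item, named with an SDTM-style code
    op: str       # "gt" or "lt"
    value: float


def map_to_rim(sdtm_items):
    """Mapping module: SDTM terms -> HL7/RIM terms."""
    return [SDTM_TO_RIM[item] for item in sdtm_items]


def gateway_query(rim_items):
    """Mediator/Gateway stand-in: fetch the requested items per patient."""
    return {
        pid: {name: record[name] for name in rim_items if name in record}
        for pid, record in MOCK_EMR.items()
    }


def map_to_sdtm(rows):
    """Mapping module: retranslate results into SDTM terms."""
    return {
        pid: {RIM_TO_SDTM[name]: value for name, value in record.items()}
        for pid, record in rows.items()
    }


def check_eligibility(rows, criteria):
    """Eligibility Checking Module: keep patients meeting every criterion."""
    def satisfies(record, c):
        return c.item in record and COMPARATORS[c.op](record[c.item], c.value)

    return sorted(
        pid for pid, record in rows.items()
        if all(satisfies(record, c) for c in criteria)
    )


def recruit(criteria):
    rim_items = map_to_rim(sorted({c.item for c in criteria}))
    rows = map_to_sdtm(gateway_query(rim_items))
    return check_eligibility(rows, criteria)
```

For example, `recruit([Criterion("SYSBP", "gt", 140)])` selects only the mock patient whose systolic value exceeds 140.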
15. Next Steps: Narrow Scope for Implementation
- Choose a protocol for implementation 8 (second one)
- Limit scope to Medications, Lab Tests and Vital Signs
- Develop Clinical Trials Ontology and Clinical Practice Ontology
- Iterative development
- Alignment with standards as closely as possible
- Implement RDF data store based on data requirements and mock patients
- Implement Mapping module using N3 rules
- Implement Eligibility checking module using SPARQL
- Try to demonstrate another use case for Adverse Drug Event Detection
16. Specification of Eligibility Criteria
- Assume we will use an ontology or rule-based tool to specify eligibility criteria
- Open to NLP/ontology-based approaches that translate free-text clinical protocol specifications into a structured form
- Examples
- Type 1 diabetes and/or history of ketoacidosis
- History of long-term therapy with insulin (> 30 days) within the last year
17. Eligibility Criteria Specification
- The functional requirements for this need to be identified and specified, e.g.,
- Temporal constraints
- Trends on clinical data and values
-
- Out of scope for POC.
- May want to see if the CT or HC communities have done some work on standards for specifying eligibility criteria.
- Out of scope for POC
18. Design Choice: Eligibility Criteria as a Layer around Data Items
- Data Items
- Problem: Type 2 Diabetes
- History of Problem: Ketoacidosis
- History of Therapy
- Name: Insulin
- Length: X days
- Time Period: Date1, Date2
- Eligibility Criteria
- Rule conditions
- Patient has Type 2 Diabetes
- Patient has History of Ketoacidosis
- Patient has History of Therapy
- Name: Insulin
- X > 30 days
- Time Period < 1 year
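The layering can be made concrete by keeping the data items as plain records and expressing each rule condition as a predicate over them. The sketch below is one possible reading, not the agreed design: the class and function names are hypothetical, "Time Period < 1 year" is assumed to mean that the therapy ended less than 365 days before a reference date, and each condition is reported separately because the slide does not say how the conditions combine.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Therapy:
    name: str
    start: date  # "Date1" on the slide
    end: date    # "Date2" on the slide

    @property
    def length_days(self) -> int:
        return (self.end - self.start).days


@dataclass(frozen=True)
class PatientData:
    """Data-item layer: facts only, no eligibility logic."""
    problems: FrozenSet[str]
    history_of_problems: FrozenSet[str]
    therapies: Tuple[Therapy, ...]


def build_conditions(as_of: date) -> Dict[str, Callable[[PatientData], bool]]:
    """Criteria layer: named rule conditions wrapped around the data items."""
    def insulin_history(p: PatientData) -> bool:
        # Assumption: "< 1 year" is measured from the therapy end date to as_of.
        return any(
            t.name == "Insulin"
            and t.length_days > 30
            and 0 <= (as_of - t.end).days < 365
            for t in p.therapies
        )

    return {
        "has Type 2 Diabetes": lambda p: "Type 2 Diabetes" in p.problems,
        "has History of Ketoacidosis": lambda p: "Ketoacidosis" in p.history_of_problems,
        "has History of Therapy": insulin_history,
    }


def evaluate(patient: PatientData, conditions) -> Dict[str, bool]:
    # One result per condition, leaving the and/or combination to the caller.
    return {label: check(patient) for label, check in conditions.items()}
```

Because the conditions only read from `PatientData`, the same records could be checked against a different protocol by building a different set of conditions.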
19. Mappings: Goal/Methodology
- Characterize the various data items required for patient recruitment (modulo scope) -> List of requirements on the data content tab
- http://spreadsheets.google.com/ccc?keypINNryLt_vyDiPyHj11WiDghlen_USpli1
- For each data item do the following
- Identify the RIM/DCM construct(s) that model that data item -> DCM column under Models
- Identify the SDTM construct(s) that model that data item -> SDTM column under Models
- Identify the terminologies that model some of the values required -> Terminology columns, including Snomed, MedDRA and NCI Thesaurus
- Identify the data types and values that characterize the values of some of the data items -> Data Types and Units columns, including those for RIM and SDTM
- We will be considering various constructs of HL7/RIM, Detailed Clinical Models and other models in conjunction
20. Consider a Data Item Example
- History of Therapy
- Name: Insulin
- Length of Therapy: 100 days
- StartDate: Date
- EndDate: Date
21. Mapping Methodology
- Identify Information Model Elements
- Therapy ->
- SubstanceAdministration (HL7/RIM)
- effectiveTime
- statusCode
- Medication (subclass of ManufacturedMaterial, HL7/RIM)
- Specific type of Participation called Consumable (HL7/RIM)
- Insulin ->
- Medication.Name (HL7/RIM)
- Identify Controlled Vocabularies
- Medication.Name -> Controlled Vocabulary: RxNorm (also known as Terminology Binding)
- Identify Data Types
- Dates and Times -> TS data type in HL7
- Identify Units
- Included in the definition of data types; taken from the UCUM standard
22. Mapping Methodology (Continued)
- Mappings between Information Model elements
- SystolicBP <-> VSTEST, VSTESTCD = SYSBP
- Mappings between controlled vocabularies
- SystolicBP <-> Some Snomed Concept
- SYSBP <-> Some NCI Thesaurus Concept
- Some Snomed Concept <-> Some NCI Thesaurus Concept
- Between Data Types and Units
- HL7 PQ <-> VSRESU
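The element and unit mappings listed above can be held as simple lookup tables and applied to one observation at a time. The following is a minimal sketch under several assumptions: the input dictionary shape, the `VSORRES` result field, the "Systolic Blood Pressure" label and the unit strings are illustrative choices and do not come from the slides, which name only VSTEST, VSTESTCD and VSRESU. Vocabulary mappings (Snomed, NCI Thesaurus) are left out because no concrete concepts are given.

```python
# Information-model mapping: RIM/DCM-style element -> SDTM-style test fields.
ELEMENT_MAP = {
    "SystolicBP": {"VSTESTCD": "SYSBP", "VSTEST": "Systolic Blood Pressure"},
}

# Unit mapping: UCUM code carried by an HL7 PQ value -> unit text on the SDTM side.
UNIT_MAP = {"mm[Hg]": "mmHg"}


def to_sdtm_vs(observation):
    """Translate one RIM-style observation into an SDTM-style vital-signs row.

    `observation` is a hypothetical dict with "element", "value" and "unit".
    """
    element = observation["element"]
    unit = observation["unit"]
    if element not in ELEMENT_MAP:
        raise KeyError(f"no information-model mapping for {element!r}")
    if unit not in UNIT_MAP:
        raise KeyError(f"no unit mapping for {unit!r}")

    row = dict(ELEMENT_MAP[element])      # copy so the table is not mutated
    row["VSORRES"] = observation["value"]  # assumed name for the result field
    row["VSRESU"] = UNIT_MAP[unit]
    return row
```

Raising on a missing entry, rather than passing the value through, is a deliberate choice here: it surfaces gaps in the mapping tables, which is one of the stated aims of the exercise.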
23. Design Choice: Leverage Existing Implementations and Systems
- SHER System
- Re-use the Reasoner to compute eligibility criteria
- Semantic DB System
- Re-use the NLP parser (if available) to parse the textual representation of the clinical trials criteria into structured queries, rules, etc.
24. Technical Eligibility Criteria in OWL
- Patient that (hasProblem some DiabetesType1 or hasHistory some Ketoacidosis) and hasTherapy some (Therapy that hasLength all int[> 30] and hasTimePeriod all int[< 365])
- Just for illustration purposes; needs thorough and detailed analysis to get it right.
25. Technical Design: Eligibility Criteria using Rules
- IF (the_patient.hasProblem = DiabetesType1
- OR the_patient.hasHistory = Ketoacidosis)
- AND the_patient.hasHistory.name = Insulin
- AND the_patient.hasHistory.length > 30
- AND the_patient.hasHistory.timePeriod < 365
- THEN
- the_patient is eligible for the clinical trial
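For comparison, the same rule can be written as a small Python predicate. This is a sketch only, not the N3 form the deck proposes: the dictionary layout and the field names `problems`, `history`, `therapies`, `length_days` and `time_period_days` are hypothetical, and the three therapy conditions are assumed to apply to the same therapy entry.

```python
def is_eligible(patient):
    """Evaluate the IF/THEN rule against one patient record (a plain dict)."""
    has_condition = (
        "DiabetesType1" in patient["problems"]
        or "Ketoacidosis" in patient["history"]
    )
    # Assumption: name, length and time period must hold for one and the
    # same therapy entry, not for three different entries.
    has_qualifying_therapy = any(
        therapy["name"] == "Insulin"
        and therapy["length_days"] > 30
        and therapy["time_period_days"] < 365
        for therapy in patient["therapies"]
    )
    return has_condition and has_qualifying_therapy


example_patient = {
    "problems": {"DiabetesType1"},
    "history": set(),
    "therapies": [{"name": "Insulin", "length_days": 100, "time_period_days": 200}],
}
```

Here `is_eligible(example_patient)` evaluates to `True`; shortening the therapy to 30 days or fewer makes it `False`.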
26. Mapping Design Issues
- Are mappings always 1-1?
- Is it always possible to get synonym mappings?
- What happens to these mappings when there are changes in the information models?
- Are these mappings enough to enable a bidirectional flow between EMR and clinical trials data?
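Some of these questions can be checked mechanically once a mapping is written down as a table. The sketch below assumes a mapping is represented as a dictionary from each source term to a set of target terms; the function names and that representation are illustrative assumptions. A mapping that is 1-1 in this sense can be inverted without loss, which is a precondition for bidirectional flow, and the `unmapped` helper lists terms left without a target after a model changes.

```python
from collections import defaultdict


def invert(mapping):
    """Return target -> set of sources for a source -> set of targets mapping."""
    inverse = defaultdict(set)
    for source, targets in mapping.items():
        for target in targets:
            inverse[target].add(source)
    return dict(inverse)


def is_one_to_one(mapping):
    """True if every source has exactly one target and every target one source."""
    forward_ok = all(len(targets) == 1 for targets in mapping.values())
    backward_ok = all(len(sources) == 1 for sources in invert(mapping).values())
    return forward_ok and backward_ok


def unmapped(terms, mapping):
    """Terms (e.g. from a revised information model) that have no target."""
    return sorted(term for term in terms if not mapping.get(term))
```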