Title: CTMS Metadata Project Challenges and Issues
1CTMS Metadata Project Challenges and Issues
- Hemant Shah M.D.
- City of Hope National Medical Center
- hshah_at_coh.org
2Original Metadata Project
- The Scope
- Create data elements for the CTMS workspace
- Approach
- Use available information models; if none is available, create one
- Manual extraction of data elements
- Manual mapping of data element components to vocabulary
3Need for Metadata Project Revision
- Models started evolving rapidly
- Tools were developed
- Processes were defined
4Revised Metadata Project Principles
- Create CDEs for focused areas
- Utilize the existing models
- Work within the ambit of the caAERS-BRIDG harmonization effort
- Enhance the areas in the models if required
- Follow the caCORE SDK process
- Review by the caBIG community
5Steps for Revised Metadata Project
- Step 1 Selection and Prioritization of Areas
- Step 2
- Step 2 A Selection of the Base Representation
- Step 2 B Creation of Metadata Appropriate Representation
- Step 3 caCORE SDK Process
- Step 4 VCDE Review Process
6Step 1 Selection and Prioritization of Areas
- Definition of Area
- A class or a set of closely related classes representing a facet or a feature of the domain
- Selection / Prioritization Criteria
- Classes that are or should be represented in both models (BRIDG and caAERS)
- Classes that are likely to be of interest to other sub-domains
- Classes that are more stable in the reference model
7Initial Selected Areas
- Organization
- Person
- Study Subject
- Product
- Location
- Device
- Material
- Product
- Drug
- Ingredient
- Participation
- Observation
- Clinical Study
- Protocol
- Procedure
- Treatment
8Modeling Effort Step 2 A Selection of the Base Representation
- The representation that is closest to being a metadata appropriate representation will be selected
- Additional criteria for Base representation
- Expressivity/Comprehensibility
- Flexibility
- Practicality
- Reusability
9Modeling Effort Step 2 B Creation of Metadata Appropriate Representation
- Definition of Metadata Appropriate Representation
- A UML static model that
- Meets the caCORE SDK requirements for the Semantic Connector and UML loader tools
- Will generate data elements that are
- Semantically appropriate and connected to concepts in NCI Thesaurus
- Compatible with the existing caBIG Data Standards already approved by VCDE
10Modeling Effort Step 2 B Additional Goals
- The representation should
- Be semantically and syntactically compatible with the reference models/larger domain models like HL7 RIM and BRIDG
- Be comprehended by the domain personnel
- Lead to data elements that are easily understood by the domain personnel
- Be suitable for being used/adapted by sub-domains
- Meet the caCORE SDK Code Generator expectations
- Retain the best features from both representations
11Stratagem - Examples
- Intermediate Representation (Buffer Zone)
- Inheritance for Reuse Vs. Composition for Reuse
- Façade design pattern
12Stratagems Intermediate Representation
(Diagram: Reference Model, Sub-Model)
13Stratagems Intermediate Representation
(Diagram: Reference Model, Buffer Model, Sub-Model)
14Stratagems Intermediate Representation
(Diagram: Reference Model, Buffer Zone, Sub-Model)
15Stratagems Intermediate Representation
(Diagram: Reference Model, Buffer Zone, Sub-Model)
16Stratagems Intermediate Representation Advantages
- A representation that is more implementation oriented but retains semantic links with the implementation independent model
- Delineates clearly the area where we contributed
- Creates a representation that the other sub-domain models/implementations can utilize if suitable
- The authors of the reference model may consider the intermediate representation when they revise the model
- If changes occur in either model, the linkage can still be maintained by making changes only in the intermediate representation
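The last advantage can be illustrated with a minimal sketch. Every name in it is hypothetical and is not taken from BRIDG, caAERS, or any actual sub-model. It only shows the idea that the sub-model depends on the buffer's names, so a rename in the reference model is absorbed by editing one mapping.

```python
# Hypothetical illustration of an intermediate (buffer) representation.
# None of these names come from the actual reference model or sub-models.

# Attribute names as the (hypothetical) reference model exposes them.
reference_record = {"subjectGivenName": "Ada", "subjectFamilyName": "Lovelace"}

# The buffer maps stable, implementation-oriented names to reference-model
# names. If the reference model renames an attribute, only this mapping changes.
buffer_mapping = {
    "first_name": "subjectGivenName",
    "last_name": "subjectFamilyName",
}


def read_via_buffer(record, mapping):
    """Return the record re-keyed with the buffer's stable names."""
    return {stable: record[ref_name] for stable, ref_name in mapping.items()}


def sub_model_display_name(buffered):
    """Sub-model code depends only on the buffer's names."""
    return f"{buffered['last_name']}, {buffered['first_name']}"


display = sub_model_display_name(read_via_buffer(reference_record, buffer_mapping))

# Simulate a change in the reference model: one attribute is renamed.
changed_record = {"givenName": "Ada", "subjectFamilyName": "Lovelace"}
updated_mapping = {**buffer_mapping, "first_name": "givenName"}
display_after_change = sub_model_display_name(
    read_via_buffer(changed_record, updated_mapping)
)
```

In this sketch `sub_model_display_name` is untouched by the rename; only `updated_mapping` differs from the original mapping.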
17Stratagems Reuse by Inheritance Vs. Reuse by Composition
Option 1 Reuse by Composition
- Problem
- Cannot get intuitive data elements like
- Subject First Name
- Subject Last Name
- Instead get names like
- Person Name First Name
- Person Name Last Name
18Stratagems Reuse by Inheritance Vs. Reuse by Composition
Option 2 No Reuse
- Problem
- Each class will have to have the same attributes repeated
19Stratagems Reuse by Inheritance Vs. Reuse by Composition
Option 3 Reuse by Inheritance
- Each class can have the same set of attributes without the need to repeat them, and the data elements generated will be intuitive, e.g., Subject First Name, Data Manager Last Name, etc.
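The naming difference between Option 1 and Option 3 can be sketched as follows. This is a simplified illustration only: the rule used here (class name followed by attribute name) is an assumption made for the example and is not the actual algorithm of the caCORE tooling, and the helper names `words` and `data_element_names` are hypothetical.

```python
import re
from dataclasses import dataclass, fields


def words(identifier):
    """Split a CamelCase or snake_case identifier into capitalized words."""
    parts = re.findall(r"[A-Z]?[a-z]+", identifier.replace("_", " "))
    return " ".join(part.capitalize() for part in parts)


def data_element_names(cls):
    """Name one data element per attribute, including inherited attributes."""
    return [f"{words(cls.__name__)} {words(f.name)}" for f in fields(cls)]


# Option 1, composition: the name attributes live on a separate class,
# so the generated names are prefixed with that class.
@dataclass
class PersonName:
    first_name: str
    last_name: str


# Option 3, inheritance: each subclass inherits the same attributes,
# so the generated names are prefixed with the subclass.
@dataclass
class Person:
    first_name: str
    last_name: str


@dataclass
class Subject(Person):
    pass


@dataclass
class DataManager(Person):
    pass


composition_names = data_element_names(PersonName)
inheritance_names = data_element_names(Subject) + data_element_names(DataManager)
```

Under these assumptions the composed class yields the "Person Name ..." style of names, while the subclasses yield names prefixed with "Subject" and "Data Manager" without repeating the attribute declarations.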
20Stratagems Façade design pattern
- Definition
- A Class or Object provides a single point of entry for services of a subsystem
- Aim is to hide a complex system of objects behind a single object
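A minimal sketch of the pattern, assuming a small subsystem of three related classes. The class and attribute names are hypothetical stand-ins and are not the actual HL7 RIM classes.

```python
# Hypothetical classes standing in for a subsystem of related model classes.
# They are NOT the actual HL7 RIM classes or attributes.
class EntityRecord:
    def __init__(self, name):
        self.name = name


class AddressRecord:
    def __init__(self, city):
        self.city = city


class RoleRecord:
    def __init__(self, role_code):
        self.role_code = role_code


class OrganizationFacade:
    """Single point of entry that hides the three records above."""

    def __init__(self, name, city, role_code):
        self._entity = EntityRecord(name)
        self._address = AddressRecord(city)
        self._role = RoleRecord(role_code)

    @property
    def name(self):
        return self._entity.name

    @property
    def city(self):
        return self._address.city

    @property
    def role_code(self):
        return self._role.role_code

    def summary(self):
        """Combine values from all hidden records into one string."""
        return f"{self.name} ({self.role_code}), {self.city}"


org = OrganizationFacade("Example Hospital", "Example City", "SITE")
org_summary = org.summary()
```

Callers see one flat `OrganizationFacade` with simple attributes, while the underlying records can stay structured the way the larger model requires.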
21Stratagems Façade design pattern
HL7 RIM Organization Related Classes
22Stratagems Façade design pattern
Transformed View of HL7 RIM Related Classes
23Stratagems Façade design pattern
24Stratagems Façade design pattern
25Step 3 caCORE SDK Process
(Flowchart with two lanes: COH and NCICB)
- 1 Generate XMI (1 Day)
- 2 Generate Semantic Report Using Semantic Connector
- 3, 4 Review and Annotate Semantic Report (shown twice; 1 Week, 3 Days)
- 5 Generate Annotated XMI (1 Day)
- 7, 6 Generate and load CDEs using UML Loader; Create submission Package (1 Day)
- 8 CDE curation in caDSR (1 Week)
26Step 4 VCDE Review Process
- The caBIG community and VCDE WS will be invited to comment at every stage
- The data elements will be submitted for the formal VCDE review process
27(No Transcript)
28Miscellaneous Issues
- Complex datatypes
- HL7 datatypes
- Units of Measure
- Use of LOINC
29Problems with the Standard Data Elements Approach
- Fragility
- Every time the model undergoes a change, the DEs have to be changed
- The applications have to deal with two kinds of changes: the model changes and the DE changes
- The domain experts spend their time deciding validity twice
- At the model level
- At the data standard level
- Potential of conflict at the two levels
- DEs are only a tunnel view of a larger picture
30Too Many Dependencies
(Diagram: Models, Application, Vocabulary/Ontology, Data Elements)
31Common Model Repository Approach
- Have a caMSR instead of caDSR
- The application developers only identify the attributes from the model and the extent of context information they want to include
- The repository should allow negotiation of models in a graphical manner
- The data elements would be generated on-the-fly by the client end components, from the models
- New Developer tools
- Closely integrated with Enterprise Architect
- With the current functionality of SIW
- With the ability to interact directly with the caMSR
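The on-the-fly generation idea can be sketched roughly as below. Everything here is hypothetical: the `MODEL` content, the `generate_data_elements` helper, and the concept identifiers are placeholders, not real NCI Thesaurus codes or an actual caMSR interface.

```python
# Hypothetical sketch: a shared model, with data elements derived on demand.
# The model content and concept identifiers are placeholders only.
MODEL = {
    "Subject": {
        "first_name": "CONCEPT-PLACEHOLDER-1",
        "last_name": "CONCEPT-PLACEHOLDER-2",
        "birth_date": "CONCEPT-PLACEHOLDER-3",
    },
    "Organization": {
        "name": "CONCEPT-PLACEHOLDER-4",
    },
}


def generate_data_elements(model, selection):
    """Build data elements on the fly for the attributes an application selects.

    `selection` maps a class name to the attribute names the application needs.
    """
    elements = []
    for class_name, attribute_names in selection.items():
        for attribute in attribute_names:
            elements.append(
                {
                    "name": f"{class_name} {attribute.replace('_', ' ').title()}",
                    "concept": model[class_name][attribute],
                }
            )
    return elements


# The application only names the attributes it needs from the model.
selected = {"Subject": ["first_name", "last_name"]}
elements = generate_data_elements(MODEL, selected)
element_names = [element["name"] for element in elements]
```

In this sketch the data elements are never stored separately from the model, so a model change is picked up the next time they are generated.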
32Common Model Repository Approach
(Diagram: Application, Vocabulary/Ontology)