Title: CDISC on EVS Sept05
1 Overview for NCI Enterprise Vocabulary Services (EVS) for the caBIG Integrative Cancer Research Workspace
11/09/05
Sherri de Coronado, MS, MBA; Frank Hartel, PhD; Margaret Haber, RN, OCN
Enterprise Vocabulary Services, National Cancer Institute
2 Outline
- Terminology goals and semantic integration
- NCI Enterprise Vocabulary Services
- NCI Thesaurus (NCIt)
- NCI Metathesaurus
- API Access through caCore
- Thesaurus and Metathesaurus Production Process
and Curation
3 Goal: Controlled Terminology for Semantic Interoperability
- Systems cannot exchange or use information if they use incompatible codes or tokens to signify meaning
- Terminology services provide tokens and codes
- Proper use of them assures consistent meaning across the enterprise
4 Supporting CORE Infrastructure
- Enterprise Vocabulary Services (EVS)
  - Core Semantics for caCORE and many other applications
  - Public access browsers
  - APIs
- cancer Data Standards Repository (caDSR)
  - ISO 11179 compliant metadata repository
  - Common Data Elements (CDEs) for multiple templates, such as Case Report Forms, linked to EVS terminology
- cancer Bioinformatics Infrastructure Objects (caBIO)
  - UML Models annotated with EVS concepts/terms, loadable into caDSR
  - Public access APIs
5 Binding Data, Metadata to Terminology - caCORE SDK
- UML Modeling Tool (provided by user)
  - Information model that will define data classes, attributes and relationships
- Semantic Connector
  - Annotates the UML model with ontology concepts, bridging the world of databases to that of structured semantics
- UML Loader
  - Loads model into the caDSR metadata registry
  - Model and associated semantics are available at runtime
- Code Generator
  - Model and a code template are inputs into generator
  - Creates the caCORE-like n-tier software system with Java and Web Services APIs
6 How EVS Is Used with caDSR
From D. Warzel
7 caGrid Data Description Infrastructure Includes EVS
- Objects defined in UML, converted into ISO/IEC 11179 Administered Components, in turn registered in the caDSR
- Object definitions draw from vocabulary registered in the EVS, and their relationships are thus semantically described
8 Enterprise Vocabulary Services
- Services and resources that address NCI's needs for controlled vocabulary: http://www.nci.nih.gov/EVS
- A collaboration between
  - NCI Office of Communications
    - Physician Data Query (PDQ), Cancer Information Service, the NCI web portal www.cancer.gov, and Drug dictionary
  - NCI Center for Bioinformatics
    - Bioinformatics Core Infrastructure (caCORE), including metadata repository (caDSR) and object models built using EVS terminology for core semantics
9 EVS Products
- Clinical, translational, and basic research terminologies have overlapping but specialized needs; EVS therefore helps to
  - Integrate different conceptual frameworks
  - Create terminological and taxonomic conventions across systems
- Vocabulary Products
  - NCI Thesaurus: an ontology-like DL terminology
  - NCI Metathesaurus: based on UMLS, maps vocabularies
  - External vocabularies maintained and served: MedDRA, HL7, NDF-RT, LOINC, etc.
10 In caBIG, NCI EVS Goal
- Support vCDE efforts to deploy clinical, translational, and basic research terminology
- Assure that widely used terminological and taxonomic conventions are followed
- Support Workspace terminology needs
- Help find and use relevant existing terminology
- Incorporate needed terminology in EVS
- Provide liaison between NCI and caBIG Workspaces
11 NCI Thesaurus (NCIt)
- Reference Terminology for NCI, Partners
- A Federal Standard Terminology
- Broad coverage of the cancer research and clinical domain including prevention and treatment trials
  - Neoplastic and other Diseases
  - Findings and Abnormalities
  - Anatomy, Tissues, Subcellular Structures
  - Agents, Drugs, Chemicals, Combo Chemo
  - Genes, Gene Products, Biological Processes
  - Animal Models: Mouse, other
  - Research techniques and management, apparatus, clinical and lab, radiology, imaging
12 NCI Thesaurus (2)
- Published Monthly
- Public domain, open content license
- Available on-line and by download (OWL, Ontylog XML, flat files)
- 48,000 Concepts hierarchically organized into trees
- Description-logic based
- Roles establish machine readable semantic relationships between Concepts, e.g.
  - Carcinoma Disease_Associated_with_Disease Lytic Bone Lesions
  - TP53 Gene_associated_with_Disease Breast Carcinoma
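The role examples on this slide can be read as (source concept, role, target concept) assertions. A minimal sketch of that reading, assuming a purely in-memory model: the class names `RoleAssertion` and `RoleGraphSketch` are hypothetical and are not part of the caCORE or EVS APIs, and only the two role statements quoted on the slide are used as data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of one role assertion; not the caCORE EVS API.
final class RoleAssertion {
    final String source;
    final String role;
    final String target;

    RoleAssertion(String source, String role, String target) {
        this.source = source;
        this.role = role;
        this.target = target;
    }
}

final class RoleGraphSketch {
    private final List<RoleAssertion> assertions = new ArrayList<>();

    void add(String source, String role, String target) {
        assertions.add(new RoleAssertion(source, role, target));
    }

    // Returns the targets reachable from a concept through one named role.
    List<String> targetsOf(String source, String role) {
        List<String> result = new ArrayList<>();
        for (RoleAssertion a : assertions) {
            if (a.source.equals(source) && a.role.equals(role)) {
                result.add(a.target);
            }
        }
        return result;
    }

    // Builds a graph holding only the two role examples quoted on the slide.
    static RoleGraphSketch slideExamples() {
        RoleGraphSketch g = new RoleGraphSketch();
        g.add("Carcinoma", "Disease_Associated_with_Disease", "Lytic Bone Lesions");
        g.add("TP53", "Gene_associated_with_Disease", "Breast Carcinoma");
        return g;
    }

    public static void main(String[] args) {
        RoleGraphSketch g = slideExamples();
        System.out.println(g.targetsOf("TP53", "Gene_associated_with_Disease"));
    }
}
```

Because each role is a named, typed link rather than free text, a program can ask "which diseases is this gene associated with?" without parsing definitions.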
13 NCI Thesaurus is Deployed
- http://nciterms.nci.nih.gov
- http://www.nci.nih.gov/EVS (full documentation)
- API: caCORE public access
- Fulfills NCI and collaborators' needs for controlled vocabulary
- Public domain, open content license
15 Disease, Disorder or Finding (multiple inheritance)
Disease, Disorder (7000 concepts)
Organized by Site or by Type
Finding (Cancer TNM Finding, Morphologic Finding,
Lesion, Clinical Course of Disease, etc.)
16 Concept History
Draw Graph
Roles
Termtypes and Source
17 Concept History: Stage_IV_T-Cell_Non-Hodgkin_s_Lymphoma (code C8668)
Merge or Split would result in reference code
18 Graph Selected Roles, distance n=2
19 Links to other sources
21 Listserv
Download
22 NCI Metathesaurus
- Filtered UMLS Metathesaurus extended with additional required vocabularies
- 930,000 concepts, 2,200,000 terms and phrases with definitions
- Mappings among over 50 vocabularies
- Extensive synonymy: over 40,000 terms for neoplasms mapped to 7,000 concepts
- Used as online dictionary and thesaurus, for mapping and document indexing
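The synonymy point above (many terms mapped to far fewer concepts) amounts to a many-to-one lookup from term strings to concept identifiers. A minimal sketch under stated assumptions: `SynonymIndexSketch` is a hypothetical class, the identifier `EXAMPLE-0001` is a placeholder rather than a real Metathesaurus CUI, and the three synonyms are illustrative rather than taken from the NCI Metathesaurus.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical many-to-one index from term strings to a concept identifier.
final class SynonymIndexSketch {
    private final Map<String, String> termToConcept = new HashMap<>();

    // Lower-cases, trims, and collapses whitespace so spelling variants collide.
    private static String normalize(String term) {
        return term.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    void addSynonym(String term, String conceptId) {
        termToConcept.put(normalize(term), conceptId);
    }

    Optional<String> conceptFor(String term) {
        return Optional.ofNullable(termToConcept.get(normalize(term)));
    }

    // Illustrative data only: placeholder identifier, made-up synonym set.
    static SynonymIndexSketch example() {
        SynonymIndexSketch index = new SynonymIndexSketch();
        index.addSynonym("Breast Carcinoma", "EXAMPLE-0001");
        index.addSynonym("Carcinoma of the Breast", "EXAMPLE-0001");
        index.addSynonym("Mammary Carcinoma", "EXAMPLE-0001");
        return index;
    }

    public static void main(String[] args) {
        SynonymIndexSketch index = example();
        System.out.println(index.conceptFor("  mammary   CARCINOMA "));
    }
}
```

Used this way, documents indexed under different wordings of the same idea resolve to one concept identifier, which is what makes the dictionary, mapping, and indexing uses possible.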
23 NCI Metathesaurus (2)
- Minor releases monthly, major releases twice a year. (Soon to be updated monthly with NCI Thesaurus)
- Provides a mapped overlap and partial inter-relation of current versions of NCI and partner required vocabularies, e.g. the ICDs, MedDRA, SNOMED, MeSH, HCPCS (procedures), LOINC (lab values), drug terminologies (VA NDF-RT, AOD, RxNORM, Multum, NCI Thesaurus drugs, etc.)
25 http://ncimeta.nci.nih.gov
NCI added sources in red
26 Advanced Search
Search by CUI or Code
Search by Source
27 EVS Products and Services Are Open
- NCI Thesaurus is Open Content: ftp://ftp1.nci.nih.gov/pub/cacore/EVS/ThesaurusTermsofUse.htm
- NCI Metathesaurus is Mostly Open Source
  - See Each Source's License: http://ncimeta.nci.nih.gov/MetaServlet/GenerateSourcesServlet
- NCI EVS Servers Are Freely Accessible
  - On the Web
  - Via API
- All Software Developed by NCI EVS is Public Open Source and Free for the Asking
http://nciterms.nci.nih.gov and http://ncimeta.nci.nih.gov
http://ncicb.nci.nih.gov/core/caBIO
http://ncicb.nci.nih.gov/core
28 API Access to EVS
Information on how to get concept information from either NCIt or NCI Meta is available in the caCORE 2.0 User Guide
29 Access to NCI Thesaurus via APIs
30 Notional Java Code
Assuming a property called FDA_Table, and that we want to find the table contents for the Dosage Form table, DRG-00201
- Gilberto's example code would go here
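The lookup this slide describes (find every concept whose FDA_Table property equals DRG-00201) might be sketched as below. This is not the presenters' example and does not call the caCORE EVS API: `ConceptSketch` and `FdaTableLookupSketch` are hypothetical classes, and the concept codes, names, and table membership are made-up placeholders. Only the property name `FDA_Table` and the table identifier `DRG-00201` come from the slide.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical concept record: a code, a preferred name, and name/value properties.
final class ConceptSketch {
    final String code;
    final String preferredName;
    final Map<String, String> properties = new HashMap<>();

    ConceptSketch(String code, String preferredName) {
        this.code = code;
        this.preferredName = preferredName;
    }

    ConceptSketch with(String property, String value) {
        properties.put(property, value);
        return this;
    }
}

final class FdaTableLookupSketch {
    // Made-up sample data; codes and table membership are placeholders, not NCIt content.
    static List<ConceptSketch> sampleConcepts() {
        List<ConceptSketch> concepts = new ArrayList<>();
        concepts.add(new ConceptSketch("EX-001", "Tablet").with("FDA_Table", "DRG-00201"));
        concepts.add(new ConceptSketch("EX-002", "Capsule").with("FDA_Table", "DRG-00201"));
        concepts.add(new ConceptSketch("EX-003", "Breast Carcinoma"));
        return concepts;
    }

    // Collects the preferred names of concepts tagged with the given FDA table id.
    static List<String> namesInTable(List<ConceptSketch> concepts, String tableId) {
        List<String> names = new ArrayList<>();
        for (ConceptSketch c : concepts) {
            if (tableId.equals(c.properties.get("FDA_Table"))) {
                names.add(c.preferredName);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesInTable(sampleConcepts(), "DRG-00201"));
    }
}
```

Against a live EVS server the filter would be expressed as a property search through the API described in the caCORE User Guide, rather than a loop over a local list.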
31 EVS Curation
- NCI Thesaurus
  - Domain editors
  - Two internal baselines per month
  - One baseline for publication monthly with history file
  - Placed on test server
  - Pre-Thesaurus as well
  - Editing in progress to fix errors, meet end user needs, support caDSR/UML model annotation
- NCI Metathesaurus
  - Moving to monthly updates with NCI Thesaurus in MEME
  - Insertion/updates of additional sources as needed, including UMLS and special sources like MedDRA
  - Matching and Merging/QC processes/production to Meta test server, promoted to production
32 Terminology Development Guidelines
- Develop a content model for a domain
- Leverage existing sources where appropriate
  - (VA NDF-RT, RxNorm, LOINC, etc.)
- Develop unique content where needed
  - (Cancer genes and diagnoses, drugs and therapies, molecular abnormalities, clinical trial standard terminology, etc.)
- Link to other information sources and standards using URLs where possible
  - (GO, Swissprot, drug formularies, trial protocols)
- Federate, merge or map with other standard terminology for semantic integration
33 EVS Collaborations
- Many Active Collaborations
  - Federal: FDA, VA, CDC, and various NIH Institutes such as NHLBI, NIDCR
  - Major Standards Organizations: HL7, CDISC, W3C, FHA
  - Cancer Centers and Cancer Cooperative Groups (caBIG, caGRID)
  - Numerous research collaborators such as the Microarray Gene Expression Data Society (MGED Ontology, FuGO)
34 Contacts
Sherri de Coronado, MS, MBA, NCI Center for Bioinformatics, decorons_at_mail.nih.gov
Frank Hartel, PhD, NCI Center for Bioinformatics, hartel_at_mail.nih.gov
Margaret Haber, RN, OCN, NCI Office of Communications, mhaber_at_mail.nih.gov