Title: Toward a Medical Semantic Web
- Gilberto Fragoso
- Enterprise Vocabulary Services
- NCI Center for Bioinformatics and Information Technology
- The Semantic Web meets the Deep Web (SWDW'08)
- July 23, 2008
Topics
- Background
- Mission
- NCI EVS main products
- NCIT and BGT production cycle in brief
- Tooling and Support Challenges
- Infrastructure
- Editing Support GUI (NCIEditTab)
- Classification Performance
- Explanation Facility
- Semantic MediaWiki
- BiomedGT
EVS Mission and Products
- Services and resources that address NCI's needs for controlled vocabulary. See http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview/vocabulary
- Clinical, translational, and basic research have overlapping but specialized terminology needs.
- EVS integrates different conceptual frameworks
- Creates terminological and taxonomic conventions across systems
- Provides a common terminology for annotation and coding
- Forms the semantic component of the Cancer Bioinformatics Grid (caGrid)
- Vocabulary Products
- NCI Thesaurus: an ontology-like terminology
- NCI Metathesaurus: maps vocabularies
- External vocabularies served: MedDRA, HL7, NDF-RT, LOINC, GO, Zebrafish
- BiomedGT (new)
NCI Thesaurus
- Reference terminology for NCI, caBIG, and partners
- A federal standard terminology in some areas
- Broad coverage of the cancer research and clinical domain, including prevention and treatment trials
- Neoplastic and other diseases
- Findings and abnormalities
- Anatomy, tissues, subcellular structures
- Agents, drugs, chemicals
- Genes, gene products, biological processes
- Animal models: mouse, other
- Research techniques and management, apparatus, clinical and lab, radiology, imagery
NCI Thesaurus
- Published monthly
- Public domain, open content license
- 70,000 concepts hierarchically organized into domains
- Description-logic based
- Concept history
- Available online and by download (OWL, Ontylog XML, flat files)
- Accessed through caCORE 3.2 (being deprecated) with an Apelon DTS backend, and through caCORE 4.0 and the LexBIG server
http://ncicb.nci.nih.gov/download/evsportal.jsp
Biomedical Grid Terminology (BiomedGT): New
- Goals
- Open, publicly accessible, collaboratively developed terminology for translational research
- Concept orientation
- DL based, supporting reasoning by end users
- Federated sub-ontologies
- Content maintained by experts in the relevant research communities
- Edited in Protégé; content can be added in multiple ways, one of which is a Semantic MediaWiki
NCIT Production Environment
(Diagram. Workflow Manager: Work Manager Client; DB schema (master baseline). Individual Editor: Workflow Client, Editing Application; DB schema (current NCI baseline). Other labeled elements: history, baseline, individual baseline, editing history, work lists, change sets, conflict detection and resolution, classification, history processing and validation, publishable history, test environment, release candidate, terminology server, migration to production.)
Classification is performed on the client.
BGT and NCIT in OWL
- Advantages of OWL
- W3C Recommendation for the Web
- Non-proprietary; semantics are published
- Disadvantages
- Nascent technology; some features needed for vocabulary development are not yet there
- Tool support for vocabulary development
- Editors
- Classifiers and classification services
Challenges
- Editing environment
- Collaboration with Stanford on Protégé/OWL: database backend, support for imports, client-server
- Dedicated GUI support (NCIEditTab)
- Classification challenges (with Clark Parsia)
- Performance of existing classifiers; runtime classification
- More expressive DL explanation facility
- Giving all editors access to a single classification run in a client-server environment
Current BGT Production Environment (and Future NCIT)
(Diagram. Workflow Manager: Prompt and classification, done on the server. Individual Editor: Editing Application. Other labeled elements: wiki collaborators (specific to BGT), editing history, baseline, history, conflict detection and resolution and classification, test environment, release candidate, publish, terminology server, migration to production.)
Classification is desirable from the client in client-server mode.
Dedicated GUI for Protégé
Reasoner Engineering
- Maturing the Pellet reasoner
- Case study: NCIt classification services
- Prior to initial work: non-terminating
- Improving resource efficiency: 9 hours
- Algorithmic optimizations: 5 minutes
- Incremental updates: seconds
Explaining NCIt
- Goal: improve the efficiency of editors by identifying problems and their causes
- Solution: automatic analysis services (explanation, debugging, repair)
- Based on mature, formal knowledge representation (KR)
- Explanations legible to editors, not just logicians
- Increase editor confidence in the toolchain
Explanation
Genesis of BiomedGT
NCI Thesaurus Evaluation, 2006-2007
- Goals
- Review and report on OBO criteria and relevant ISO standards for the federation of terminologies, semantic quality, and consistency
- Review content and structure for compliance
- Document examples of how compliance would be achieved
Genesis of BiomedGT
Among the recommendations:
- 1) Unravel the vocabulary
- Partition into
- Words and their definitions (Lexicon / Dictionary)
- Categorization and navigational nodes (Thesauri)
- Ontology
- Identify external resources, and
- Include the ability to reference general upper-level ontologies
- Named relationships with other external resources
- 2) Enable collaboration: create an environment where subject matter experts (SMEs) can collaborate and discuss
Unraveling the Vocabulary
Traditional Hierarchical System
(C. Chute, Mayo Clinic)
Unraveling the Vocabulary in BGT
(Diagram: owl:Thing; BFO; BioTop; BGT Thesaurus Nodes; BGT Word Nodes; BGT Ontology Nodes. Annotations: "Sparse trees, populated by the DL classifier"; "Shallow trees".)
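The slide notes that some trees are sparse and "populated by the DL classifier": instead of placing every concept by hand, a node is given a definition (necessary and sufficient conditions) and the classifier files matching concepts beneath it. A minimal Python 3.9+ sketch of that idea follows, under loudly stated assumptions: every definition is a plain conjunction of named concepts, so subsumption reduces to a subset test over ancestor sets. This is a toy structural-subsumption classifier, not Pellet's tableau algorithm, and it ignores roles (property restrictions) entirely. The functions `ancestors` and `classify` and all concept names are hypothetical.

```python
"""Toy classifier: files concepts under defined (necessary-and-sufficient) nodes.

Assumption: every definition is a plain conjunction of named concepts, so
subsumption reduces to a subset test over ancestor sets (no roles, no negation).
"""


def ancestors(concept: str, edges: dict[str, set[str]]) -> set[str]:
    """All concepts reachable from `concept` via subclass edges (transitive closure)."""
    seen: set[str] = set()
    stack = list(edges.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(edges.get(parent, ()))
    return seen


def classify(
    told: dict[str, set[str]], definitions: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Return the subclass links the classifier infers beyond the asserted ones.

    told:        primitive concept -> asserted superclasses (necessary conditions)
    definitions: defined concept   -> conjuncts of its definition (D == A and B ...)
    """
    # A defined concept D == A and B is, in particular, a subclass of A and of B.
    edges = {c: set(parents) for c, parents in told.items()}
    for defined, conjuncts in definitions.items():
        edges.setdefault(defined, set()).update(conjuncts)
    concepts = set(edges) | {p for parents in edges.values() for p in parents}

    inferred: dict[str, set[str]] = {}
    changed = True
    while changed:  # repeat until no new link is found (a fixpoint)
        changed = False
        for concept in sorted(concepts):
            known = ancestors(concept, edges)
            for defined, conjuncts in definitions.items():
                # concept falls under D iff it already entails every conjunct of D
                if defined != concept and defined not in known and conjuncts <= known:
                    edges.setdefault(concept, set()).add(defined)
                    inferred.setdefault(concept, set()).add(defined)
                    known = ancestors(concept, edges)
                    changed = True
    return inferred


if __name__ == "__main__":
    told = {
        "Lung_Adenocarcinoma": {"Adenocarcinoma", "Lung_Neoplasm"},
        "Adenocarcinoma": {"Carcinoma"},
    }
    definitions = {"Lung_Carcinoma": {"Carcinoma", "Lung_Neoplasm"}}
    print(classify(told, definitions))
```

Here the editor never asserts that `Lung_Adenocarcinoma` belongs under `Lung_Carcinoma`; the classifier infers it because the concept already entails both conjuncts of that definition. This is how a thinly asserted tree can end up densely populated after classification.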
Reuse of Resources
(Diagram: owl:Thing; BFO; BioTop; BGT Thesaurus Nodes; BGT Ontology Nodes.)
- 1) External namespaces, referenced (GO, ChEBI, JAX)
- 2) External namespaces, modeled (CTCAE, NPO?)
- 3) Internal namespaces, collaboratively modeled (NPO?)
Enable Collaboration
Semantic MediaWiki
- MediaWiki extension
- Focus is on capturing wiki content (particularly Wikipedia) in a formal, computational fashion
- Berlin: Category:City → Berlin rdf:type City
- City: Category:Geographical Feature → City rdfs:subClassOf Geographical_Feature
- Berlin: hasPopulation 80,000,000 → <property:HasPopulation rdf:datatype="xmls:double">80000000</property:HasPopulation>
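The Berlin examples show wiki annotations being mapped to RDF statements. A minimal Python 3.9+ sketch of that mapping follows, under simplifying assumptions: annotations arrive already parsed into (key, value) pairs (no wiki-markup parsing), a category on a category page becomes `rdfs:subClassOf` while a category on an ordinary page becomes `rdf:type`, and any numeric property value becomes a double-typed literal (written with the standard `xsd:` prefix). `Annotation`, `to_name`, and `to_triples` are hypothetical names, not Semantic MediaWiki's API; the real extension has its own datatype declarations and export format.

```python
"""Toy mapping from Semantic MediaWiki-style annotations to RDF-like triples."""

from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    key: str    # "Category" or a property name such as "hasPopulation"
    value: str  # the annotated value as written on the wiki page


def to_name(text: str) -> str:
    """Turn a wiki title into a resource name (spaces become underscores)."""
    return text.strip().replace(" ", "_")


def to_triples(page: str, annotations: list[Annotation]) -> list[tuple[str, str, str]]:
    """Translate one page's annotations into (subject, predicate, object) triples."""
    is_category_page = page.startswith("Category:")
    subject = to_name(page.removeprefix("Category:"))
    triples = []
    for ann in annotations:
        if ann.key == "Category":
            # On an ordinary page a category types the page; on a category page
            # it makes one category a subclass of another.
            predicate = "rdfs:subClassOf" if is_category_page else "rdf:type"
            triples.append((subject, predicate, to_name(ann.value)))
        else:
            number = ann.value.replace(",", "")  # drop thousands separators
            try:
                float(number)
                obj = f'"{number}"^^xsd:double'  # numeric value -> typed literal
            except ValueError:
                obj = f'"{ann.value}"'           # anything else -> plain literal
            triples.append((subject, f"property:{ann.key}", obj))
    return triples


if __name__ == "__main__":
    berlin = [Annotation("Category", "City"), Annotation("hasPopulation", "80,000,000")]
    for triple in to_triples("Berlin", berlin):
        print(*triple)
    for triple in to_triples("Category:City", [Annotation("Category", "Geographical Feature")]):
        print(*triple)
```

Treating a category differently depending on whether it sits on an ordinary page or a category page is what lets one wiki construct yield both instance typing and a class hierarchy.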
Category Drill-Down
Ajax-Based Search
Node Display
Workflow: Propose Changes
BiomedGT Collaboration Cycle
Using Semantic MediaWiki and NCI Protégé
Acknowledgements
- Classification and Explanation: Michael Smith, Michael Grove, Evren Sirin (Clark Parsia, LLC)
- Protégé Infrastructure: Timothy Redmond, Tania Tudorache (Stanford Medical Informatics)
- Editing Plug-in, Workflow Plug-in: Bob Dionne (Dionne Assoc.), David Yee (Northrop Grumman)
- Semantic MediaWiki: Harold Solbrig, Russ Hamm (Apelon, Inc.); Guoqian Jiang, Deepak Sharma, Sridar Dwarkanath (Mayo Clinic); Wilberto Garcia (Northrop Grumman)
- EVS Group: Sherri de Coronado, Frank Hartel, Larry Wright, Margaret Haber, Gilberto Fragoso (NCI CBIIT)