Title: Cornerstone I: Representing Knowledge
 1Cornerstone I Representing Knowledge
- From Data to Knowledge Through Concept-Oriented 
 Terminologies
- James J. Cimino
2The first step on the path to knowledge is 
getting things by their right names.
  3Overview
- What is data to knowledge? 
- Knowledge representation choices 
- Knowledge-based terminology efforts 
- Medical Entities Dictionary 
- Proof of concepts
4What is data to knowledge?
- Start with patient data in the medical record 
- Enhance knowledge by 
- gaining a better understanding of the patient 
- learning relevant knowledge 
- bringing smart systems to bear to apply knowledge 
- discovering new knowledge from health data
5Knowledge Representation
- Terminology for representing symbols 
- Format for arranging the symbols
6Knowledge Representation Choices
  7Guideline Implementation
- Starren and Xie, SCAMC, 1994 
- National Cholesterol Education Panel Guideline
8National Cholesterol Education Panel Guideline
Measure Cholesterol  Assess Risk Factors 
 9Guideline Implementation
- Starren and Xie, SCAMC, 1994 
- National Cholesterol Education Panel Guideline
- Three representations 
- PROLOG (first-order logic)
10NCEP Guideline in PROLOG
- rule_j(PID) :- 
-  check_lab(PID,hdl,HDL,_),!, 
-  HDL > 35, 
-  total_risk(PID,Risk),!, 
-  Risk < 2, 
-  check_lab(PID,cholesterol,C,_), 
-  C > 200, 
-  C < 239, 
-  print_rule_j.
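For readers unfamiliar with PROLOG, the same conditions can be sketched in Python. This is only an illustration of the rule's logic, not the authors' implementation; the `Patient` class, its field names, and the mg/dl units are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record layout; field names are not from the slides.
    hdl: float           # HDL cholesterol (assumed mg/dl)
    cholesterol: float   # total cholesterol (assumed mg/dl)
    risk_factors: int    # value the PROLOG rule calls Risk


def rule_j(p: Patient) -> bool:
    """Return True when every condition of the PROLOG rule holds."""
    return (
        p.hdl > 35
        and p.risk_factors < 2
        and 200 < p.cholesterol < 239  # strict bounds, as in the rule
    )


if __name__ == "__main__":
    print(rule_j(Patient(hdl=40, cholesterol=220, risk_factors=1)))
```

The strict inequalities mirror the slide; a production rule would need to decide how the boundary values 200 and 239 are handled.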
11Guideline Implementation
- Starren and Xie, SCAMC, 1994 
- National Cholesterol Education Panel Guideline 
- Three representations 
- PROLOG (first-order logic)
12NCEP Guideline in CLASSIC
- (CL-DEFINE-CONCEPT C-PATIENT 
-  (AND 
-  (ALL CHOL 
-  (AND INTEGER 
-  (MIN 200) (MAX 239))))) 
- (CL-DEFINE-CONCEPT G-PATIENT 
-  (AND C-PATIENT LOW-RISK-PATIENT 
-  (ALL HDL (AND INTEGER (MIN 35)))))
13Guideline Implementation
- Starren and Xie, SCAMC, 1994 
- National Cholesterol Education Panel Guideline 
- Three representations 
- PROLOG (first-order logic) 
- CLASSIC (frames)
14NCEP Guideline in CLIPS
- (defrule C2G2J "Rules to reach box J" 
-  ?f1 <- (calculated-patient (state c) 
-  (done no) (hdl ?hdl) (name ?name)) 
-  (test (> ?hdl 35)) 
-  => 
-  (printout t "Patient " ?name " needs treatment" crlf))
15Guideline Implementation
- Starren and Xie, SCAMC, 1994 
- National Cholesterol Education Panel Guideline 
- Three representations 
- PROLOG (first-order logic) 
- CLASSIC (frames) 
- CLIPS (production rules)
- All three representations proved adequate for 
 encoding the guideline
16Knowledge Representation Choices
  17Terminology Representation Choices
  18Frame-Based Representation
- Serum Glucose Test 
-  is-a Lab Test 
-  Measures Glucose 
-  Specimen Serum 
-  Units mg/dl
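A frame can be sketched as a dictionary of slots, with the is-a slot used to inherit values from a parent frame. The sketch below assumes single inheritance; the `Kind` slot on Lab Test is a hypothetical addition used only to show inheritance at work.

```python
# Frames as dictionaries of slots. "Kind" is a hypothetical slot,
# added only to demonstrate inheritance through is-a.
FRAMES = {
    "Lab Test": {"is-a": None, "Kind": "Diagnostic Procedure"},
    "Serum Glucose Test": {
        "is-a": "Lab Test",
        "Measures": "Glucose",
        "Specimen": "Serum",
        "Units": "mg/dl",
    },
}


def get_slot(name, slot):
    """Return a slot value, walking up is-a links when the frame lacks it."""
    current = name
    while current is not None:
        frame = FRAMES[current]
        if slot in frame:
            return frame[slot]
        current = frame.get("is-a")
    return None


if __name__ == "__main__":
    print(get_slot("Serum Glucose Test", "Units"))  # local slot
    print(get_slot("Serum Glucose Test", "Kind"))   # inherited slot
```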
19Terminology Representation Choices
  20Semantic Network Representation
Serum Glucose Test 
 21Terminology Representation Choices
- Frame-based 
- Semantic network
22Conceptual Graph Representation
- Serum Glucose Test - 
-  (is-a) -> Lab Test 
-  (measures) -> Glucose 
-  (specimen) -> Serum
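The same relationships can also be held as subject-relation-object triples, which is one simple way to store a semantic network or the arcs of a conceptual graph. The helper names below are illustrative, not part of any system described in the talk.

```python
# Each arc of the graph becomes one (subject, relation, object) triple.
TRIPLES = [
    ("Serum Glucose Test", "is-a", "Lab Test"),
    ("Serum Glucose Test", "measures", "Glucose"),
    ("Serum Glucose Test", "specimen", "Serum"),
]


def objects(subject, relation):
    """Follow a relation forward from a concept."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]


def subjects(relation, obj):
    """Follow a relation backward to find concepts pointing at obj."""
    return [s for s, r, o in TRIPLES if r == relation and o == obj]


if __name__ == "__main__":
    print(objects("Serum Glucose Test", "measures"))
    print(subjects("specimen", "Serum"))
```

Whichever notation is used (frame, network, or graph), the stored facts are the same three relationships.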
23Terminology Representation Choices
- Frame-based 
- Semantic network 
- Conceptual graphs
24Knowledge Representation Choices
- Guideline implementation 
- Terminologic knowledge
25Knowledge Representation
- Terminology for representing symbols 
- Format for arranging the symbols
- Terminology and format for representing 
 terminologic knowledge
26Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991
27Jochen Bernauer, SCAMC, 1991
- Conceptual graphs to model findings
28Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991
- Rector, Nolan and Glowinski, SCAMC, 1993
29Rector, Nolan and Glowinski, SCAMC, 1993
- GALEN project 
- conditions grammatically haveLocation bodyparts 
- fractures sensibly haveLocation bones 
- femurs sensiblyAndNecessarily haveDivision neck
30Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991 
- Rector, Nolan and Glowinski, SCAMC, 1993
- Campbell and Musen, SCAMC, 1993
31Campbell and Musen, SCAMC, 1993
- Conceptual graphs and SNOMED 
- Pain  Chest  Radiation to  Left  Arm
Pain -
 (located in) -> Chest (radiating to) -> 
Arm -> (with laterality) -> Left 
 32Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991 
- Rector, Nolan and Glowinski, SCAMC, 1993 
- Campbell and Musen, SCAMC, 1993
- Lindberg, Humphreys, McCray, Methods 1993
33Lindberg, Humphreys, McCray, Methods 1993 
- Unified Medical Language System
Concept
Lexical group 
String
String 
 34Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991 
- Rector, Nolan and Glowinski, SCAMC, 1993 
- Campbell and Musen, SCAMC, 1993 
- Lindberg, Humphreys, McCray, Methods 1993
- Rocha, Huff, et al., CBM, 1994
35Rocha, Huff, et al., CBM, 1994
- VOSER 
- A server architecture for managing terminologic 
 knowledge
36Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991 
- Rector, Nolan and Glowinski, SCAMC, 1993 
- Campbell and Musen, SCAMC, 1993 
- Lindberg, Humphreys, McCray, Methods 1993 
- Rocha, Huff, et al., CBM, 1994
- Campbell, Cohn, Chute, et al., SCAMC 1996
37Campbell, Cohn, Chute, et al., SCAMC 1996
- Convergent Medical Terminology 
- SNOMED/Kaiser/Mayo 
- Galapagos
38Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991 
- Rector, Nolan and Glowinski, SCAMC, 1993 
- Campbell and Musen, SCAMC, 1993 
- Lindberg, Humphreys, McCray, Methods 1993 
- Rocha, Huff, et al., CBM, 1994 
- Campbell, Cohn, Chute, et al., SCAMC 1996
- Brown, O'Neil and Price, Methods, 1997
39Brown, O'Neil and Price, Methods, 1997
- Read Codes 
- Representation with GALEN model
40Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991 
- Rector, Nolan and Glowinski, SCAMC, 1993 
- Campbell and Musen, SCAMC, 1993 
- Lindberg, Humphreys, McCray, Methods 1993 
- Rocha, Huff, et al., CBM, 1994 
- Campbell, Cohn, Chute, et al., SCAMC 1996 
- Brown, O'Neil and Price, Methods, 1997
- Spackman, Campbell, and Côté, SCAMC 1997
41Spackman, Campbell, and Côté, SCAMC 1997
- SNOMED RT (Reference Terminology) 
- Convergent Medical Terminology 
- Description Logic Format
42Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991 
- Rector, Nolan and Glowinski, SCAMC, 1993 
- Campbell and Musen, SCAMC, 1993 
- Lindberg, Humphreys, McCray, Methods 1993 
- Rocha, Huff, et al., CBM, 1994 
- Campbell, Cohn, Chute, et al., SCAMC 1996 
- Brown, O'Neil and Price, Methods, 1997 
- Spackman, Campbell, and Côté, SCAMC 1997
- Huff, Rocha, McDonald, et al., JAMIA 1998
43Huff, Rocha, McDonald, et al., JAMIA 1998
- Logical Observations, Identifiers, Names and Codes 
 (LOINC)
- 4764-5  GLUCOSE^3H POST 100 G GLUCOSE PO:SCNC:PT:SER/PLAS:QN
44Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991 
- Rector, Nolan and Glowinski, SCAMC, 1993 
- Campbell and Musen, SCAMC, 1993 
- Lindberg, Humphreys, McCray, Methods 1993 
- Rocha, Huff, et al., CBM, 1994 
- Campbell, Cohn, Chute, et al., SCAMC 1996 
- Brown, O'Neil and Price, Methods, 1997 
- Spackman, Campbell, and Côté, SCAMC 1997 
- Huff, Rocha, McDonald, et al., JAMIA 1998
- Pharmacy system knowledge base vendors
45Pharmacy System Knowledge Base Vendors
Country-Specific Packaged Product
Ingredient
Manufactured Components
Composite Trademark Drug 
 46Knowledge-Based Terminology Efforts
- Jochen Bernauer, SCAMC, 1991 
- Rector, Nolan and Glowinski, SCAMC, 1993 
- Campbell and Musen, SCAMC, 1993 
- Lindberg, Humphreys, McCray, Methods 1993 
- Rocha, Huff, et al., CBM, 1994 
- Campbell, Cohn, Chute, et al., SCAMC 1996 
- Brown, O'Neil and Price, Methods, 1997 
- Spackman, Campbell, and Côté, SCAMC 1997 
- Huff, Rocha, McDonald, et al., JAMIA 1998 
- Pharmacy system knowledge base vendors
47Medical Entities Dictionary (MED)
- New York Presbyterian Hospital 
- 60,000 concepts (procs, results, drugs, probs) 
- 208,242 synonyms 
- 84,677 hierarchical links 
- 113,906 semantic links 
- 238,040 other attributes 
- 66,404 translations (ICD9-CM, LOINC, MeSH, UMLS)
48Central Controlled Terminology 
 49MED Data Structures
  50MED Semantic Network
Medical Entity
Plasma Glucose 
 51MED Data Structures
  52MED MUMPS Global
- med(1600) <SERUM GLUCOSE MEASUREMENT> 
- med(1600,1) <C0202041> 
-  . . ,4) <32703,50000> 
-  . . ,5) <> 
-  . . ,6) <Serum Glucose Measurement> 
-  . . ,7) <> 
-  . . ,8) <1724> 
-  . . ,12) <GLUC> 
-  . . ,14) <169> 
-  . . ,16) <31987> 
-  . . ,17) <mg/dl> 
-  . . ,20) <C000006> 
-  . . ,23) <1178> 
-  . . ,50) <Serum Glucose> 
-  . . ,138) <40444,40445,40446,59165> 
-  . . ,156) <MCNC> 
-  . . ,161) <QN> 
53MED Data Structures
- Semantic network 
- MUMPS global
54MED DB2 Tables 
 55MED Data Structures
- Semantic network 
- MUMPS global 
- DB2
56MED UNIX Data Structure
- 1600SERUM GLUCOSE MEASUREMENT 1C020241432703
 45000012GLUC17mg/dl........
57MED Data Structures
- Semantic network 
- MUMPS global 
- DB2 
- UNIX
58Proof of Concepts
- Merging data and application knowledge
59Merging Data and Application Knowledge
- Class-based, reusable lab summaries
Chem20 Display
Serum Glucose Test
Fingerstick Glucose Test
Plasma Glucose Test 
 60DOP Summary 
 61WebCIS Summary 
 62Merging Data and Application Knowledge
- Class-based, reusable lab summaries
Chem20 Display
Serum Glucose Test
Fingerstick Glucose Test
Plasma Glucose Test
- Expert system for application maintenance
63Proof of Concepts
- Merging data and application knowledge
- Smarter retrievals from the record
64Smarter Retrievals from the Record
- Repository stores events and results 
- Clinical problems at a different level of 
 granularity
- Re-use knowledge to map from problems to clinical 
 data
- Produce problem-specific views of the medical 
 record
65Concept-oriented (Heart)
Radiology 2/28/96 Head CT
Lab 12/28/96 Sickle Cell Test
Admission 3/14/96 Stroke
Lab 1/1/99 Blood Type Test
Radiology 2/1/97 Knee X Ray
Admission 2/14/98 Angina
Discharge 1/15/99 CHF
Radiology 2/23/99 Chest X Ray
Lab 1/1/99 Cardiac Enzyme Test 
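A problem-specific view can be sketched as a filter over the repository's events, driven by terminology links from a problem to related concepts. The events below are the ones listed on the slide; which of them count as heart-related is an assumption made for illustration, as is the flat `RELATED_TO` lookup standing in for the terminology's semantic links.

```python
# (source, date, concept) events as listed on the slide.
EVENTS = [
    ("Radiology", "2/28/96", "Head CT"),
    ("Lab", "12/28/96", "Sickle Cell Test"),
    ("Admission", "3/14/96", "Stroke"),
    ("Lab", "1/1/99", "Blood Type Test"),
    ("Radiology", "2/1/97", "Knee X Ray"),
    ("Admission", "2/14/98", "Angina"),
    ("Discharge", "1/15/99", "CHF"),
    ("Radiology", "2/23/99", "Chest X Ray"),
    ("Lab", "1/1/99", "Cardiac Enzyme Test"),
]

# Assumed mapping from a problem area to related concepts.
RELATED_TO = {
    "Heart": {"Angina", "CHF", "Chest X Ray", "Cardiac Enzyme Test"},
}


def problem_view(problem):
    """Keep only the events whose concept is linked to the problem."""
    related = RELATED_TO.get(problem, set())
    return [event for event in EVENTS if event[2] in related]


if __name__ == "__main__":
    for source, date, concept in problem_view("Heart"):
        print(source, date, concept)
```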
 70Proof of Concepts
- Merging data and application knowledge 
- Smarter retrievals from the record
71Just-in-time Education
- Medline button 
- Infobuttons
 82Just-in-time Education
- Medline button 
- Infobuttons
 93Just-in-time Education
- Medline button 
- Infobuttons 
- Text-to-Web
94Proof of Concepts
- Merging data and application knowledge 
- Smarter retrievals from the record 
- Just-in-Time education
95Expert Systems
- Hripcsak, et al., Ann. Int. Med., 1995
96Hripcsak, et al., Ann. Int. Med., 1995
- Identify chest x-ray reports suspicious for 6 
 clinical conditions to trigger alerts
-  Method               Sens     Spec
- Laypersons            22-47    97-99
- Radiologists          73-98    96-99
- Internists            68-98    97-99
- Keyword               51-79    79-92
- NLP/MED/Rule-based    81       98
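As a reminder of what the two columns measure, sensitivity and specificity can be computed from confusion-matrix counts as follows. The counts in the example are made up for illustration and are not taken from the study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly positive reports that were flagged."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of truly negative reports that were left unflagged."""
    return true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Illustrative counts only: 100 positive and 100 negative reports.
    print(round(100 * sensitivity(81, 19)), "percent sensitivity")
    print(round(100 * specificity(98, 2)), "percent specificity")
```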
97Expert Systems
- Hripcsak, et al., Ann. Int. Med., 1995
- Clinical decision support system
98Clinical Decision Support System
- Data monitor runs rules against incoming reports 
- Tuberculosis cultures come back 4-8 weeks later 
- One day, hundreds of TB alerts came in
99What Happened to the Tuberculosis Alert?
?
Medical Logic Module
No Growth to Date
No Growth 
 100How We Outsmarted the Lab
?
Medical Logic Module
No Growth to Date
No Growth 
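The slides do not spell out the mechanism, so the following is a sketch under a stated assumption: the alert logic compared the culture result against a single string, the lab began reporting a different negative term, and the fix was to test membership in a terminology class instead. The set name and the positive result string are hypothetical.

```python
# Assumed members of a "negative culture result" class in the terminology.
NEGATIVE_CULTURE_RESULTS = {"No Growth", "No Growth to Date"}


def alert_by_string(result):
    """Brittle check: anything other than one exact string fires an alert."""
    return result != "No Growth"


def alert_by_class(result):
    """Class-based check: any member of the negative class is ignored."""
    return result not in NEGATIVE_CULTURE_RESULTS


if __name__ == "__main__":
    for result in ("No Growth", "No Growth to Date"):
        print(result, alert_by_string(result), alert_by_class(result))
```

Under this reading, a new negative term only needs to be added to the class in the terminology; the rule itself stays unchanged.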
 101Expert Systems
- Hripcsak, et al., Ann. Int. Med., 1995 
- Clinical decision support system
102DXplain Button
- Elhanan, et al., SCAMC 1997 
- Conversion of test results to clinical findings
Serum Cholesterol Test
 107Expert Systems
- Hripcsak, et al., Ann. Int. Med., 1995 
- Clinical decision support system 
- DXplain Button
108Proof of Concepts
- Merging data and application knowledge 
- Smarter retrievals from the record 
- Just-in-Time education 
- Expert systems
109Data Mining
- Wilcox and Hripcsak, SCAMC 1997
110Wilcox and Hripcsak, SCAMC 1997 
 111Data Mining
- Wilcox and Hripcsak, SCAMC 1997
- Wilcox and Hripcsak, SCAMC 1998
112Wilcox and Hripcsak, SCAMC 1998
- Compare traditional coding methods with NLP to 
 identify conditions in a set of patient records
 (x-ray reports)
-  Method               Sens     Spec
- Laypersons            36       86
- Expert-coded cases    27-37    95-98
- ICD-9-coded cases     12-29    86-90
- Physicians            85       98
- NLP/MED/Rule-based    81       98
113Data Mining
- Wilcox and Hripcsak, SCAMC 1997 
- Wilcox and Hripcsak, SCAMC 1998
114Proof of Concepts
- Merging data and application knowledge 
- Smarter retrievals from the record 
- Just-in-Time education 
- Expert systems 
- Data mining
- Database maintenance and use
115Database Maintenance and Use
- Tables, columns, events all modeled in the MED 
- Allows linkage of data model to controlled 
 terminology
- Terminologies can be reused 
- Impact of terminology changes on data model can 
 be tracked
116Proof of Concepts
- Merging data and application knowledge 
- Smarter retrievals from the record 
- Just-in-Time education 
- Expert systems 
- Data mining 
- Database maintenance and use
- Terminology maintenance and use
117Terminology Maintenance and Use
- Integrating terminologies from merging hospitals 
- Automated update of medication terminology 
- Detection of errors and inconsistencies
118Proof of Concepts
- Merging data and application knowledge 
- Smarter retrievals from the record 
- Just-in-Time education 
- Expert systems 
- Data mining 
- Database maintenance and use 
- Terminology maintenance and use
119Is it Worth the Trouble?
- Meed 
- noun 
- 1 archaic  an earned reward or wage 
- 2  a fitting return or recompense 
- Date before 12th century 
- Etymology from Old English 
-  MED
120Summary
- Putting knowledge in your terminology gets you 
- Better ways to get knowledge out of your EMR 
- Better ways to get knowledge out of resources 
- Better ways to use other knowledge bases 
- Better ways to use terminology 
- Better ways to manage applications 
- Better ways to manage data and terminology 
- Representation scheme is less important 
- Desiderata for controlled terminology
121Desiderata
- Desirable qualities for terminology
122Desiderata
- Desirable qualities for terminology
Go placidly amid the noise and haste, and 
remember what peace there may be in 
silence. I'd rather be sailing