1Introduction to Clinical Terminology and Classification
- AL Rector, OpenGALEN, CO-ODE, The Medical Informatics Group, U of Manchester
- www.cs.man.ac.uk/mig/galen
- www.opengalen.org
- www.co-ode.org
- oiled.man.ac.uk
- rector_at_cs.man.ac.uk
2Where we come from
Best Practice
3OpenGALEN Philosophy
- Terminology is software
- Terminology is the interface between people and machines
- Re-use is the key
- Patient-centred information
- Terminology must have a purpose
- Always ask "What's it for?"
- Not art for art's sake
- Terminology supports clinical applications - not vice versa
- Applications for someone to do something for somebody
- Keep the Horse before the Cart
- Always ask "How will we know if it works?" "How will we know if it fails?"
4OpenGALEN Key ideas
- Separation of kinds of knowledge
- Terminology, medical record and information system schemas
- Models of meaning / Models of Use
- Concepts, language, Coding, Indexing, Pragmatics
- Machine level, User level
- Knowledge is fractal!
- There will always be more detail to be added
- Therefore terminologies must be extensible
- Formal logical Support
- Too big and complicated to maintain by hand
- Extensibility requires rules
- Software needs logical rigour
5Axes for kinds of Knowledge
- Machine level
- Human Level
- Concepts
- Language
- Coding
- Indexing
- Pragmatics User Interface
- Terminology
- Medical Records/Information systems
- Decision Support rules
69) Interface of EHR, Messaging Decision Support
Significant Research Topic Now
7Uses of Terminology
- Clinical
- Epidemiology and quality assurance
- Reproducibility / Comparability
- Indexing
- Software
- Re-use !
- Integration and Messaging between systems
- Authoring and configuring systems
- Data capture and presentation (user interface)
- Indexing information and knowledge (meta-data,
The Web)
8An Old Problem
- "On those remote pages it is written that animals are divided into:
- a. those that belong to the Emperor
- b. embalmed ones
- c. those that are trained
- d. suckling pigs
- e. mermaids
- f. fabulous ones
- g. stray dogs
- h. those that are included in this classification
- i. those that tremble as if they were mad
- j. innumerable ones
- k. those drawn with a very fine camel's hair brush
- l. others
- m. those that have just broken a flower vase
- n. those that resemble flies from a distance"
From The Celestial Emporium of Benevolent Knowledge, Borges
9History: Origins of existing terminologies
- Epidemiology
- ICD - Farr in 1860s to ICD9 in 1979
- International reporting of morbidity/mortality
- ICPC - 1980s
- Clinically validated epidemiology in primary care
- Now expanded for use in Dutch GP software
- Librarianship
- MeSH - NLM from around 1900 - Index Medicus, Medline
- EMTree - from Elsevier in 1950s - EMBase
- Remuneration
- ICD9-CM (Clinical Modification) 1980
- 10 x larger than ICD aimed at US insurance
reimbursement
10Traditional Systems
- Built by people for interpretation by people (Coding clerks)
- Most knowledge implicit in rubrics
- Must understand medicine to use intelligently
- Not built for software
- On paper for use on paper
- Enumerated - top down all possibilities listed
- Serial - Single use - Single View
- Hierarchical Thesauri
- Traditional terminological techniques from librarianship
- Broader than / Narrower than (ISO 1087)
- no logical foundation
- Focused on terms
- Language and concepts mixed
- Synonyms, preferred terms, etc caused confusion
11History (2)
- Pathology indexing
- SNOMED 1970s to 1990 (SNOMED International)
- First faceted or combinatorial system
- Topography, morphology, aetiology, function
- Plus diseases cross referenced to ICD9
- Specialty Systems
- Mostly similar hierarchical systems
- ACR-NEMA/SDM - Radiology
- NANDA, ICNP - Nursing
12History (3)
- Early computer systems
- Read I (4 digit Read)
- Aimed at saving space on early computers
- 1-5 Mbyte / 10,000 patients
- Hierarchical modelled on ICD9
- Detailed signs and symptoms for primary care
- Purchased by UK government in 1990
- Single use
- Morbidity indexing
- Medical Entities Dictionary (MED)
- Jim Cimino
13History (4)
- Aspirations for electronic patient records (EPRs)
- Weed's Problem Oriented Medical Record
- Direct entry by health care professionals
- Aspirations for decision support
- Ted Shortliffe (MYCIN), Clem McDonald (Computer-based reminders), Perry Miller (Critiquing), ...
- Aspirations for re-use
- Patient centred information
- Needed common multi-use multi-purpose terminology
- None worked
14Motivations and Business Models
- Remuneration
- ICD9/10-CM in US for insurance and Medicare for diseases
- Current Procedural Terminology (CPT) for surgical procedures
- Public Health Reporting
- ICD9/10
- Clinical Recording
- Read 1-3, SNOMED-RT/CT
- ICPC - International Classification of Primary Care
- Indexing publications
- MeSH - Medical Subject Headings - Basis of indexing MedLine/PubMed
- EMTree - basis of indexing EMBASE
- Support for applications and decision support
- GALEN
15Summary of Changes at end of 1st Generation
- From terminologies for people to terminologies for machines
- From paper to software
- From single use to multiple re-use for patient centred systems
- From entry by coding clerks to direct entry by health care professionals
- From pre-defined reporting for statistics to reliable indexing for decision support
16Changes at end of first generation
- From models of USE to models of MEANING
- But tended to lose the model of use
- The goal of useful and usable systems lost
17Problems with First Generation Enumerated Systems in coping with these changes
18Problems (1)
- Scaling !!!
- More detail and more specialities required scaling up, but...
- The combinatorial explosion
- Example: Burns
- 100 sites x 3 depths → 404 codes
- 5 subsites/site x chemical or thermal → 7272
- x 3 extents x 3 durations → 116,352
- The Persian chessboard
- 2^64 ≈ 10^19
- 10^19 grains of rice ≈ 100 billion tonnes of rice
- 10^19 nanoseconds ≈ 300 years
- Read II grew from 20,000 to 250,000 terms in 100 staff-years
- still too small to be useful
- but too big to use
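The burn totals on this slide come out exactly if every axis is given one extra "unspecified" value. That reading is an assumption made for this sketch, not something the slide states. A minimal Python illustration of how an enumerated system multiplies while a compositional one only adds:

```python
# Assumption: each axis carries one extra "unspecified" value; this is
# inferred from the slide's totals and is not stated in the source.
def enumerated_codes(axis_sizes):
    """Number of codes an enumerated system has to list in advance."""
    total = 1
    for size in axis_sizes:
        total *= size + 1  # +1 for the assumed "unspecified" option
    return total

sites_depths = enumerated_codes([100, 3])
with_subsites_agents = enumerated_codes([100, 3, 5, 2])
with_extent_duration = enumerated_codes([100, 3, 5, 2, 3, 3])

# A compositional system only needs the primitives themselves.
primitives = sum([100, 3, 5, 2, 3, 3])

# The Persian chessboard: doubling once per square for 64 squares.
squares = 2 ** 64

print(sites_depths, with_subsites_agents, with_extent_duration, primitives)
print(f"2**64 = {squares:.2e}")
```

The first line printed is `404 7272 116352 116`: about 116 primitives stand behind more than 116,000 pre-listed codes.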
19Benefits
- Avoid the "Exploding Bicycle"
- From "phrase book" to "dictionary and grammar"
- Tame combinatorial explosions
- 1980 - ICD-9 (E826): 8
- 1990 - READ-2 (T30..): 81
- 1995 - READ-3: 87
- 1996 - ICD-10 (V10-19): 587
- V31.22 Occupant of three-wheeled motor vehicle injured in collision with pedal cycle, person on outside of vehicle, nontraffic accident, while working for income
- and meanwhile elsewhere in ICD-10
- W65.40 Drowning and submersion while in bath-tub, street and highway, while engaged in sports activity
- X35.44 Victim of volcanic eruption, street and highway, while resting, sleeping, eating or engaging in other vital activities
20Problems (2)
- Information implicit in the rubrics
- "Hypertension excluding pregnancy"
- Computers can't read!
- Invisible to software
- No explicit information except the hierarchy
- Minimal support for software
- No opportunity to use software to help
- Language and concepts confused
- Synonyms
- Preferred terms
- Homonyms
- Only simple look up and spelling correction
21Problems (3)
- Mixed Organisation
- Heart diseases in 13 of 19 chapters of ICD
- Tumours, infections, congenital abnormalities, toxic, ...
- Steroids in five chapters of standard drug classifications
- Anti-inflammatories, anti-asthmatics, ...
- Unreliable for indexing or Abstractions
- How to say something about all heart diseases?
- Fixed organisation
- Single hierarchy - Single use
- Where to put gout - arthritis or metabolic disease?
- Back and forth in each edition of ICD
- No re-use
22Problems 3b: Thesauri rather than Classifications
23Problems (4)
- Semantic identifiers
- Codes really paths - moving a concept meant changing its code
- 3 Cardiovascular disorders
- 3.4 Disorders of Artery...
- ...3.4.2 Disorders of coronary artery...
- 3.4.2.3 Coronary thrombosis
- Easy to process but...
- Reorganisation requires changing codes
- Codes cannot be permanent
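A small Python sketch of why path-style codes cannot be permanent. It contrasts the cardiovascular outline from this slide with an opaque identifier plus an explicit parent table. The identifiers `c1` to `c4` and the helper names are invented for illustration and do not come from any real coding scheme.

```python
# Path-style codes, taken from the outline on the slide.
path_codes = {
    "3": "Cardiovascular disorders",
    "3.4": "Disorders of artery",
    "3.4.2": "Disorders of coronary artery",
    "3.4.2.3": "Coronary thrombosis",
}

def parent_of(code):
    """With semantic identifiers the parent is read off the code itself."""
    head, _, _ = code.rpartition(".")
    return head or None

# Alternative: opaque identifiers (invented here) and a separate parent table.
names = {
    "c1": "Cardiovascular disorders",
    "c2": "Disorders of artery",
    "c3": "Disorders of coronary artery",
    "c4": "Coronary thrombosis",
}
parents = {"c2": "c1", "c3": "c2", "c4": "c3"}

def move(concept_id, new_parent):
    """Reorganise the hierarchy; the identifier itself never changes."""
    parents[concept_id] = new_parent

print(parent_of("3.4.2.3"))  # easy to process: the path is in the code
move("c4", "c1")             # reclassified, yet still "c4" in old records
print(names["c4"], "is now under", names[parents["c4"]])
```

Moving "Coronary thrombosis" in the path scheme would force a new code such as one under `3`, so data recorded under `3.4.2.3` would no longer match. In the second scheme only the parent table changes.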
24Problems (5)
- Maintenance
- 20 Years from ICD9 to ICD10
- 100 person-years from Read 1 to Read 3
- Mega francs/guilders/crowns/marks on European coding schemes
- Thousands of unpaid hours of committee time
- Impossible / meaningless decisions take longest
- You can search forever for something that is not there
- Multiple uses compete
- Must choose one use
- Most successful were clear about their purpose - ICD, ICPC, MeSH
- Codes change meaning with version changes
- Old data misleading!
25Problems (6)
- Version specific artefacts
- Not otherwise specified (NOS)
- Used to move a general concept down
- Not elsewhere classified (NEC)
- Catch all - Nowhere else in coding system, e.g. "Tumour not elsewhere classified" - dependent on version
- Other
- Catch all - Not listed below, e.g. "Other diseases of the cardiovascular system" - dependent on version
- Not used consistently
26Problem (7): Language is slippery - Two hands or Four?
27Language/Concepts are slippery
- Human cognition makes it look easy
- Logic fails to capture it
- Classification is easy until you try to do it
- Trying since Aristotle in the West and Ancient Chinese in the East
- Words/Concepts mean what a community decides they mean
- Does a chimpanzee have four hands?
- Is a prion alive?
- Is surgery on the ovary a kind of Endocrine surgery?
- Easier to agree on the concrete than the abstract
- Easy to agree on useful abstractions and generalisations
- Harder to agree on how to name them
28Problems (8)
- There is no re-use - there is no standard
- The grand challenge: A common controlled vocabulary for medicine
- But re-use requires multiple different views
- People's needs differ / People do and find different things
- By profession
- Doctors and specialties, nurses, physiotherapists, dentists
- By situation
- Inpatient, outpatient, primary care, community
- By task
- Diagnosis, management, prescribing,
- patient care, public health, quality assurance, management, planning
- By country and community
- US, UK, France, Germany, Japan, Korea, ...
29Summary of Problems: 1st Generation Enumerated Systems
- Enumerated Single Hierarchies
- List all possibilities in advance
- Cannot cope with fractal knowledge
- Most knowledge implicit
- Invisible to software
- Can't agree on common concepts and classification
- Unreliable for indexing
- Difficult to use for healthcare professionals
- No support for user interface
- Can't build and maintain big classifications
- Language and concepts don't translate easily to logic and software
30Cimino's Desiderata (1)
- Concept orientation
- Separate language (terms) and concepts (codes)
- Concept permanence
- Never re-use a code (retire it)
- Nonsemantic concept identifiers
- Separate the code from the path
- Polyhierarchy
- Allow one concept to be classified in multiple ways
- Gout can be both a metabolic disease and an arthritis
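The four desiderata on this slide can be pictured as a small data structure. The Python sketch below is only an illustration: the class names, the `X1` to `X3` identifiers and the methods are all hypothetical and do not describe any existing terminology server.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    concept_id: str                              # opaque: no meaning, no path
    terms: list                                  # language kept apart from the concept
    parents: set = field(default_factory=set)    # polyhierarchy: many parents allowed
    retired: bool = False                        # concept permanence: retire, never delete

class Terminology:
    def __init__(self):
        self.concepts = {}

    def add(self, concept_id, terms, parents=()):
        # A code that has ever been issued is never re-used, even if retired.
        if concept_id in self.concepts:
            raise ValueError(f"{concept_id} already issued; codes are never re-used")
        self.concepts[concept_id] = Concept(concept_id, list(terms), set(parents))

    def retire(self, concept_id):
        self.concepts[concept_id].retired = True

    def ancestors(self, concept_id):
        """All concepts reachable by following every parent link."""
        seen = set()
        stack = list(self.concepts[concept_id].parents)
        while stack:
            cid = stack.pop()
            if cid not in seen:
                seen.add(cid)
                stack.extend(self.concepts[cid].parents)
        return seen

t = Terminology()
t.add("X1", ["Metabolic disease"])
t.add("X2", ["Arthritis"])
t.add("X3", ["Gout"], parents=["X1", "X2"])
print(sorted(t.ancestors("X3")))
```

Here gout is found by a query for either parent, and reorganising the hierarchy only touches `parents`, never `concept_id`.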
31Cimino's Desiderata (2)
- Formal Definitions
- i.e. be compositional
- Reject "Not elsewhere classified"
- concept permanence and NEC
- Multiple granularities
- Organ, tissue, cellular, molecular
- Grades, types, classes of diseases
- Special clinical criteria
- Multiple consistent views
- Allow different organisations
- e.g. functional, anatomical, pathological
32Cimino's Desiderata (3)
- Represent context
- Family history, risk, source of information
- Evolve gracefully
- Allow controlled changes
- Recognise redundancy (equivalence)
- Carcinoma Lung ≡ Carcinoma of the lung
- How would we know?
- How could a machine know?
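One crude answer to "how could a machine know?" is lexical normalisation: lower-case the term, drop filler words and ignore word order. The Python sketch below assumes a tiny illustrative stop-word list. It is not the method used by any particular terminology, and it also shows where a purely lexical approach stops working.

```python
import re

# Assumed, illustrative stop-word list; a real system would need far more.
STOP_WORDS = {"of", "the", "a", "an", "in"}

def normalise(term):
    """Reduce a term to its set of content words, ignoring case and order."""
    tokens = re.findall(r"[a-z]+", term.lower())
    return frozenset(tok for tok in tokens if tok not in STOP_WORDS)

def same_concept(a, b):
    return normalise(a) == normalise(b)

print(same_concept("Carcinoma Lung", "Carcinoma of the lung"))  # word-order variant
print(same_concept("Lung cancer", "Carcinoma of the lung"))     # true synonym
```

The first comparison succeeds and the second fails. Recognising "Lung cancer" as the same concept needs knowledge about meaning, which is the argument for formal definitions rather than string matching.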
33Solution 0: You are worrying about the wrong problem
- International Classification of Primary Care (ICPC)
- Focus on repeatability and quality across languages for a small (<2000) number of codes
34Solution Generation 1Megaterm Crossmapping
UMLS
Decision support
Clinical Applications
Medical Records
Data entry
35Cross mapped and typed terminologies
vocabularies
36The UMLS Knowledge Sources
- Metathesaurus
- Cross mappings
- Language resources
- NORM stemming and term recognition
- UMLS Semantic Net
- 170 types attached to categorise concepts
- Disease, anatomical part, micro-organism, etc.
38Solution 1 Cross-mapping UMLS
- Unified Medical Language System (UMLS) from US National Library of Medicine
- De facto common registry for vocabularies
- Concept Unique Identifiers (CUIs) and Lexical Unique Identifiers (LUIs) are de facto the common nomenclature
- NB must use a CUI + LUI to get unique identification
- Licence terms
- Class I free for use
- Class III heavily restricted
- (Class II almost nonexistent)
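Cross-mapping through a shared concept identifier can be sketched in a few lines of Python. Every vocabulary name, code and identifier below is an invented placeholder. This is not real UMLS content and not the Metathesaurus file format; it only shows the idea of a common registry.

```python
# (vocabulary, code) -> shared concept identifier. All values are invented.
source_to_cui = {
    ("VOCAB_A", "A123"): "CUI_0001",
    ("VOCAB_B", "B-77"): "CUI_0001",
    ("VOCAB_B", "B-78"): "CUI_0002",
}

def cross_map(vocab, code, target_vocab):
    """Codes in target_vocab that share a concept identifier with (vocab, code)."""
    cui = source_to_cui.get((vocab, code))
    if cui is None:
        return []
    return sorted(
        c for (v, c), other in source_to_cui.items()
        if v == target_vocab and other == cui
    )

print(cross_map("VOCAB_A", "A123", "VOCAB_B"))
print(cross_map("VOCAB_A", "unknown", "VOCAB_B"))
```

The mapping can be no better than the source vocabularies: a code with no counterpart simply maps to nothing.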
39Solution 1 Cross-mapping UMLS
- An invaluable resource, but...
- No better than the vocabularies which are mapped
- Limited detail for patient care
- Unreliable for indexing or abstraction of knowledge
- Best for relating everything to MeSH for indexing literature
- Still limited by combinatorial explosion
- Still can't cope with fractal knowledge
- Not extensible - no help in building or extending terminologies
- No help in reorganising existing terminologies to re-use for new purposes
- Top down
- Information still implicit
- Minimal help with software
- No help with data capture, user interfaces
40Solution IIa: Build what you need as you need it
- LOINC (Logical Observation Identifiers Names and Codes) - dominant coding system for laboratory systems - http://www.loinc.org/
- Clinical LOINC contains increasing amounts of clinical references
- Fully Class I included in UMLS
- Closely linked to HL7 and HL7 vocabulary committee
42Build and Control what you need only
- HL7 Messaging standard
- Controls the codes that hold messages together
- Uses codes from elsewhere as payload
- See www.hl7.org
- (Possibly the world's worst web site)
- Some material members only
43Solutions Generations 2-3: Compositional Systems
- Beat the combinatorial explosion
- Build concepts out of pieces - Lego
- Dictionary and grammar rather than phrasebook
- But hard
44Solution Generation 1.5: Faceted
- Faceted systems: SNOMED International
- Inflammation + Lung + Infection + Pneumococcus → Pneumococcal pneumonia
- Limit combinatorial explosion, but
- Rigid - a limited number of axes / facets / chapters
- Each facet has the problems of a first generation enumerated system
- Much knowledge still implicit
- No way to know how identifiers relate
- No explicit relations, only
- No way to recognise redundancy / equivalence
- No help with data capture or user interface / No way to recognise nonsense
- Carcinoma + Hair + Donkey + Emotional → ????
- Still can't cope with fractal knowledge
- Limited extensibility - limited help with building, extending or reorganising
- Still Top Down
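The "no way to recognise nonsense" point can be made concrete with a toy faceted scheme in Python. The facet names and their contents below are illustrative assumptions built from the two examples on this slide; they are not real SNOMED axes or codes.

```python
# Illustrative facets only; the assignment of values to axes is assumed.
FACETS = {
    "morphology": {"Inflammation", "Carcinoma"},
    "site": {"Lung", "Hair"},
    "aetiology": {"Pneumococcus", "Donkey"},
    "function": {"Infection", "Emotional"},
}

def is_well_formed(expr):
    """The only check a purely faceted scheme supports:
    each value must appear on the axis it is used for."""
    return all(value in FACETS[facet] for facet, value in expr.items())

pneumonia = {"morphology": "Inflammation", "site": "Lung",
             "aetiology": "Pneumococcus", "function": "Infection"}
nonsense = {"morphology": "Carcinoma", "site": "Hair",
            "aetiology": "Donkey", "function": "Emotional"}

print(is_well_formed(pneumonia), is_well_formed(nonsense))
```

Both expressions pass. With no explicit relations between the facets, nothing in the scheme says that one combination is sensible and the other is not.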
45Generation 2: Enumerated Compositional
- Read III with qualifiers
- Inflammation: site lung, cause pneumococcus → Pneumococcal Pneumonia
- More semantics but
- Limited qualifiers - limited views - limited re-use
- Limited help with data capture - User interface difficult
- Much information still implicit - limited software support
- No way to recognise redundancy / equivalence / errors
- Organisation still mixed - indexing better but still unreliable
- Limited separation of language and concepts
- Still can't cope with fractal knowledge
- Limited extensibility - limited help with building and reorganising terminologies
- Top down
46Logic Based Ontologies: The basics
Primitive skeleton
Descriptions
Definitions
Reasoning
Validating
Thing
red partOf Heart
red partOf Heart
(feature pathological)
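The labels above (primitive skeleton, descriptions, definitions, reasoning) suggest a pipeline that can be sketched very roughly in Python. This is a toy structural subsumption check under simplifying assumptions: single inheritance in the skeleton, and fillers that are plain primitive names. The skeleton contents are illustrative, and this is not how GALEN or any description-logic reasoner is actually implemented.

```python
# Primitive skeleton: child -> parent (illustrative names only).
SKELETON = {"Heart": "Organ", "Organ": "Structure", "Structure": "Thing"}

def is_a(child, ancestor):
    """Walk up the primitive skeleton from child looking for ancestor."""
    while child is not None:
        if child == ancestor:
            return True
        child = SKELETON.get(child)
    return False

def describe(base, **relations):
    """A description: a primitive base plus (relation, filler) pairs."""
    return (base, frozenset(relations.items()))

def subsumes(general, specific):
    """True if every constraint in 'general' is met at least as narrowly by 'specific'."""
    g_base, g_rel = general
    s_base, s_rel = specific
    if not is_a(s_base, g_base):
        return False
    return all(
        any(r == g_r and is_a(f, g_f) for r, f in s_rel)
        for g_r, g_f in g_rel
    )

part_of_organ = describe("Structure", partOf="Organ")
part_of_heart = describe("Structure", partOf="Heart")

print(subsumes(part_of_organ, part_of_heart))  # the general one covers the specific one
print(subsumes(part_of_heart, part_of_organ))  # but not the other way round
```

The classification of "part of Heart" under "part of Organ" is computed from the definitions, not entered by hand.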
47CT Vocabulary
- Reference Terminology vs Interface Terminologies
- Reference terminology - enumerated hierarchy of formally defined terms
- Interface terminology - navigation structure for user interface
- Explicitly excluded from SNOMED-RT
- Terming, Coding, and Grouping
- Terming - finding the lexical string
- Coding - finding the correct unique code (concept)
- Grouping - putting codes into groupers for epidemiological or other purposes
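Terming, coding and grouping can be read as three separate lookups chained together. The Python sketch below uses invented strings, codes and groupers purely for illustration; none of them come from SNOMED or any other real terminology.

```python
# Invented lexicon, codes and groupers, for illustration only.
LEXICON = {"heart attack": "myocardial infarction", "mi": "myocardial infarction"}
CODES = {"myocardial infarction": "K001", "angina": "K002"}
GROUPERS = {"K001": "Ischaemic heart disease", "K002": "Ischaemic heart disease"}

def terming(text):
    """Find the lexical string: map what the user typed to a known term."""
    key = text.strip().lower()
    return LEXICON.get(key, key)

def coding(term):
    """Find the unique code (concept) for a term."""
    return CODES[term]

def grouping(code):
    """Put the code into a grouper for epidemiological or other purposes."""
    return GROUPERS[code]

term = terming("Heart attack")
code = coding(term)
print(term, code, grouping(code))
```

Keeping the three steps apart means the user-facing strings can change without touching the codes, and the groupers can change without touching either.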
48Generation 2.5: Pre-coordinated Formal Compositions
- SNOMED-CT
- Formal collaboration between College of American Pathologists (CAP/SNOMED) and NHS
- Formal logical model for classifying a fixed list of definitions
- Simple fixed ontology (7 links)
- Now officially adopted and probably available for both NHS and related academic uses
- GALEN derived terminologies
- UK Drug Ontology
- Procedure classifications
49Generation III
- Fully compositional post coordinated
- Not yet in use or fully available
- GALEN-like
- Will probably arrive with Semantic Web
50Other Key Resources
- Anatomy
- Digital Anatomist Foundational Model of Anatomy
- University of Washington (http://sig.biostr.washington.edu/projects/da/)
- Comprehensive model of STRUCTURAL anatomy
- Transformed into formal representation in Freiburg
- Feasibility rather than production
- Mouse
- The Edinburgh Mouse Atlas Project (http://genex.hgu.mrc.ac.uk/)
- Bioinformatics
- GO - The Gene Ontology
- MGED - Microarray Gene Expression Data
- OMIM - Online Mendelian Inheritance in Man
- Drugs
- Proprietary databases - First Databank, Micromed
- UK Drug Dictionary (UKCPRS)
- National Cancer Institute CaCore Ontologies
51Current Status (1)
- UMLS is the central coordinating force
- Any terminology needs links to CUIs and LUIs
- Many people using CLASS I licensed terms only
- Links to MeSH and PubMed
- ICD9/10-CM used for reporting of diseases for insurance and Medicare in the US
- ICD-10 used for official reporting in UK
- CPT and OPCS used for reporting of procedures in US and UK respectively
- SNOMED-CT purchased by US and mandated in UK
- As yet few convincing
52Current Status (2)
- ICPC widely used in primary care on continent, especially in the Netherlands
- LOINC used for lab systems; HL7 for messaging
- Variants of SNOMED used for pathology in many places
- Many specialist systems
- SNOMED-DICOM-Microglossary (SDM) for imaging
- Unrelated to SNOMED
- Several nursing systems
- A variety of open source resources appearing
53Current Status (3)
- Commercial world dominated by proprietary systems
- MedCin
- All based on Model of Use
54The Semantic Web and OWL
- Ontologies - fancy word for terminologies
- Means many things to many people
- W3C has produced a standard language for compositional logic based ontologies, OWL
- OIL + DAML → DAML+OIL → OWL
- See oiled.man.ac.uk
- See www.co-ode.org
- See http://www.w3.org/2001/sw/WebOnt/
- Rapid proliferation of open source tools and resources
- No longer a biomedical problem only
- Serious computer scientists finally involved