Title: Unified Medical Language System (UMLS)
- Yildiray Kabak, METU-SRDC
Outline
- Introduction
- Knowledge Sources
  - Metathesaurus
  - Semantic Network
  - SPECIALIST Lexicon
- UMLS Knowledge Source Server
- UMLS Applications
- EHR perspective
Introduction (Purpose)
- lack of a standard language in medicine
- main purpose is to build the required vocabulary
  - to facilitate the development of computer systems that behave as if they "understand" the meaning of the language of biomedicine and health
Introduction (Purpose)
- to help system developers build or enhance electronic information systems that create, process, retrieve, integrate, and/or aggregate biomedical and health data and information
Introduction
- The UMLS Knowledge Sources are multi-purpose
- not optimized for particular applications
- The associated UMLS software tools assist
developers in customizing or using the UMLS
Knowledge Sources for particular purposes
Introduction
- The UMLS consists of three Knowledge Sources:
  - Metathesaurus
    - concepts that include the various names representing the same meaning from different source vocabularies
  - Semantic Network
    - 135 semantic types, 54 semantic relations
  - SPECIALIST Lexicon
    - dictionary of biomedical terms and common words, lexical tools, and records used in natural language processing
Outline
- Introduction
- Knowledge Sources
  - Metathesaurus
    - Introduction
    - Concepts, Terms, Strings, Atoms
    - Relations
    - Metathesaurus Files
  - Semantic Network
  - SPECIALIST Lexicon
- UMLS Knowledge Source Server
- UMLS Applications
- EHR perspective
Introduction
- very large, concept-oriented database
- holds concepts, their various names, and the relationships among them
- links alternative names and views of the same concept together and identifies useful relations between different concepts
- 2004AA release: 1,020,866 concepts and 2.8 million terms
2002AC
- 870,000 concepts (e.g., Eye and Oculus are 1 concept)
- 1,756,000 terms (e.g., Eye, Eyes, and eye are 1 term)
- 2,083,103 strings/concept names (e.g., Eye, Eyes, and eye are 3 strings)
- 11,479,000 relationships between concepts
- 7 million relationships between concepts and English words
- 113 source vocabularies
- 15 different languages
Metathesaurus
- It is built from the electronic versions of many different thesauri, classifications, code sets, and lists of controlled terms used in:
  - patient care,
  - health services billing,
  - public health statistics,
  - indexing and cataloging biomedical literature,
  - and/or basic, clinical, and health services research.
Types of Metathesaurus sources
- Thesauri, e.g., MeSH
- Statistical Classifications, e.g., ICD-9, ICD-10, ICPC
- Billing Codes, e.g., CPT, CPT Spanish version, HCPCS
- Clinical Coding Systems, e.g., SNOMED, Read
- Nursing Vocabularies, e.g., NIC, NOC, OMAHA
- Alternative/Complementary Medicine, e.g., ALTLINK
- Drug Sources, e.g., Multum, Micromedex, VANDF
- Drug Regulatory, e.g., MedDRA
- Lists of controlled terms, e.g., COSTAR, HL7 values
Introduction
- All concepts are assigned to at least one semantic type
  - consistent categorization of all concepts at a relatively general level
- The Metathesaurus must be customized to be used effectively
Metathesaurus
- does not provide a structure
- provides a unified meaning for a concept from different vocabularies
- provides the data itself
Concerning message ontology
- Assume that all the concepts are defined in OWL
- [OWL listing not recoverable; it showed the CUI for ALLERGY]
Concerning message ontology
- An instance, for example: Anaphylaxis (CUI C0002792)
- Metathesaurus source vocabularies include terminologies designed for use in patient-record systems
Metathesaurus structure
- Concepts (CUI)
- Terms (LUI)
- Strings (SUI)
- Atoms (AUI)
- Relations
Concept
- A concept is a meaning
- A meaning can have many different names
- The Metathesaurus links all the names from all of the source vocabularies that mean the same thing
- Each concept (meaning) has a concept unique identifier (CUI)
Concept Names and String Identifiers
- Each string in the concept names has a string unique identifier (SUI)
- Any variation in character set, upper/lower case, or punctuation makes a separate string with a separate SUI
- The same string in different languages has a different SUI
Atoms
- Each and every occurrence of a string in each source vocabulary is an atom
- Every atom has an atom unique identifier (AUI)
- In other words, atoms are the entries in the source vocabularies
Terms
- All the variants of a string are grouped into a term
- A term is the group of all strings that are lexical variants of each other
- Each term has a lexical unique identifier (LUI)
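The four identifier levels can be illustrated with the "Eye" example from the 2002AC statistics. This is a minimal sketch: the `Atom` class, the source names `SRC_A`/`SRC_B`, and every identifier (`A1`, `S1`, `L1`, `C1`, ...) are made up for illustration and do not correspond to real Metathesaurus records.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    aui: str     # one entry in one source vocabulary
    sui: str     # the exact string (case and punctuation matter)
    lui: str     # the term: a group of lexical variants
    cui: str     # the concept: the meaning
    string: str
    source: str  # hypothetical source vocabulary name

# Hypothetical identifiers; real CUIs, LUIs, SUIs and AUIs differ.
ATOMS = [
    Atom("A1", "S1", "L1", "C1", "Eye", "SRC_A"),
    Atom("A2", "S1", "L1", "C1", "Eye", "SRC_B"),     # same string, other source: new atom only
    Atom("A3", "S2", "L1", "C1", "Eyes", "SRC_A"),    # plural: new string, same term
    Atom("A4", "S3", "L1", "C1", "eye", "SRC_B"),     # case variant: new string, same term
    Atom("A5", "S4", "L2", "C1", "Oculus", "SRC_A"),  # synonym: new term, same concept
]

def count_levels(atoms):
    """Count the distinct identifiers at each level of the hierarchy."""
    return {
        "concepts": len({a.cui for a in atoms}),
        "terms": len({a.lui for a in atoms}),
        "strings": len({a.sui for a in atoms}),
        "atoms": len({a.aui for a in atoms}),
    }

print(count_levels(ATOMS))  # {'concepts': 1, 'terms': 2, 'strings': 4, 'atoms': 5}
```

Eye, Eyes, and eye give 3 strings in 1 term; adding Oculus gives a second term, but everything still belongs to one concept.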
Metathesaurus
[Figure: CONCEPTs (CUIs) contain TERMs (LUIs), which contain STRINGs (SUIs)]
The Metathesaurus is organized by concept or meaning; its purpose is to link alternative names and views of the same concept together and to identify useful relationships between different concepts.
Metathesaurus relationships
- Apart from synonymy:
  - intra-source relationships between concepts from the same vocabulary
  - inter-source relationships between concepts in different vocabularies
Intra-source
- Hierarchical
  - immediate parent
  - immediate child
  - immediate sibling
- Broader (RB): has a meaning which includes that of the concept
- Narrower (RN): has a meaning which is included in that of the concept
- Statistical
  - if two concepts co-occurred as key topics within the same articles
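Two of the intra-source relation kinds above can be derived mechanically: sibling links from parent links, and statistical links from key-topic co-occurrence. A minimal sketch, assuming each child has a single parent and that articles are given as plain lists of key topics; the function names and all sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def immediate_siblings(parent_of):
    """Derive sibling pairs from immediate-parent links.

    parent_of maps each child concept to ONE immediate parent (a
    simplification: real source hierarchies can have several parents).
    """
    children = {}
    for child, parent in parent_of.items():
        children.setdefault(parent, []).append(child)
    pairs = set()
    for kids in children.values():
        # every unordered pair of children sharing a parent are siblings
        pairs.update(combinations(sorted(kids), 2))
    return pairs

def cooccurring_pairs(articles, min_count=1):
    """Count concept pairs that occur together as key topics of one article.

    Pairs seen at least min_count times are returned with their counts.
    """
    counts = Counter()
    for topics in articles:
        counts.update(combinations(sorted(set(topics)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}

parents = {"Eye": "Sense Organs", "Ear": "Sense Organs", "Heart": "Viscera"}
print(immediate_siblings(parents))  # {('Ear', 'Eye')}

articles = [["Asthma", "Allergy"], ["Allergy", "Asthma", "Eczema"], ["Asthma"]]
print(cooccurring_pairs(articles, min_count=2))  # {('Allergy', 'Asthma'): 2}
```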
Inter-source
- (Note: some of the relationships below may be statistical)
- Other related (RO): has a relationship other than synonymous, narrower, or broader
- Like (RL): the two concepts are similar or "alike"
- RQ: related and possibly synonymous
- SY: source-asserted synonymy
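Every relation label has an inverse, which matters when a program stores only one direction of a link. A minimal sketch of inverting relation triples: RB/RN, RO, RL, RQ, and SY are the labels from the slides, and PAR, CHD, and SIB are the UMLS labels for the parent/child/sibling links described earlier. The triple layout and the sample concepts are illustrative assumptions, not the layout of a Metathesaurus file.

```python
# Inverse of each REL label.
INVERSE_REL = {
    "RB": "RN", "RN": "RB",      # broader <-> narrower
    "PAR": "CHD", "CHD": "PAR",  # parent <-> child
    "SIB": "SIB",                # sibling is symmetric
    "RO": "RO", "RL": "RL", "RQ": "RQ", "SY": "SY",  # symmetric labels
}

def add_inverses(triples):
    """Close a set of (concept1, REL, concept2) triples under inversion.

    Read (x, "RB", y) as "y has a meaning which includes that of x";
    its inverse is (y, "RN", x).
    """
    closed = set(triples)
    for c1, rel, c2 in triples:
        closed.add((c2, INVERSE_REL[rel], c1))
    return closed

rels = {("Eye", "RB", "Sense Organs"), ("Allergy", "RQ", "Hypersensitivity")}
for triple in sorted(add_inverses(rels)):
    print(triple)
```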
Relationships
- In the 2002AC version:
  - 5M inter-source hierarchical relations
  - 6.5M statistical relations
Metathesaurus Files
- When installed, a set of files is created; that is, the Metathesaurus is just a set of files
- You are responsible for reading from those files
- You can also customize the files according to your needs
- For example, MRCONSO.RRF
  - fields: CUI, LAT, TS, LUI, STT, SUI, ISPREF, AUI, SAUI, SCUI, SDUI, SAB, TTY, CODE, STR, SRL, SUPPRESS, CVF
  - there is exactly one row for each atom
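Since the Metathesaurus is delivered as plain files, reading MRCONSO.RRF comes down to splitting pipe-delimited rows into the fields listed above. A minimal sketch, assuming each row ends with a trailing "|" as in RRF files; the two sample rows are made up (only the CUI C0002792 comes from the slides) and the helper names are hypothetical.

```python
from typing import Dict, Iterable, List

# Column order of MRCONSO.RRF as listed on the slide.
MRCONSO_FIELDS = ["CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI",
                  "SAUI", "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR",
                  "SRL", "SUPPRESS", "CVF"]

def parse_mrconso_line(line: str) -> Dict[str, str]:
    """Split one atom row into a field-name -> value dict."""
    line = line.rstrip("\r\n")
    if line.endswith("|"):
        line = line[:-1]  # drop the trailing delimiter so the last (possibly empty) field survives
    values = line.split("|")
    if len(values) != len(MRCONSO_FIELDS):
        raise ValueError(f"expected {len(MRCONSO_FIELDS)} fields, got {len(values)}")
    return dict(zip(MRCONSO_FIELDS, values))

def names_in_language(lines: Iterable[str], cui: str, lat: str = "ENG") -> List[str]:
    """All strings (one per atom) recorded for a concept in one language."""
    return [row["STR"] for row in map(parse_mrconso_line, lines)
            if row["CUI"] == cui and row["LAT"] == lat]

# Made-up rows with placeholder identifiers.
SAMPLE = [
    "C0002792|ENG|P|L0000001|PF|S0000001|Y|A0000001||X1|X1|SRC|PT|X1|Anaphylaxis|0|N||",
    "C0002792|FRE|P|L0000002|PF|S0000002|Y|A0000002||X2|X2|SRC|PT|X2|Anaphylaxie|0|N||",
]
print(names_in_language(SAMPLE, "C0002792"))  # ['Anaphylaxis']
```

On an installed release the same function can be applied to an open file handle (opened with `encoding="utf-8"`), since iterating a file yields one row per line.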
Semantic Network
- provides a consistent categorization of all concepts and the relations between the types
- Note that these relations are between types, not concepts
  - they are different from the relations in the Metathesaurus
Semantic Network
- Broad subject categories
- Represents the biomedical domain
- 2 main categories:
  - Entity
  - Event
- Semantic types are assigned to Metathesaurus concepts at the most specific level
UMLS Semantic Net
[Figure: semantic type hierarchy with the two roots Entity and Event]
Relations
- Primary link: isa, which establishes the hierarchy
- Five groups of relations other than isa:
  - physically related to
  - spatially related to
  - functionally related to
  - temporally related to
  - conceptually related to
- Inheritance is supported: relations are inherited through the isa hierarchy
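Inheritance means a relation stated once between two general types also applies to their subtypes. A minimal sketch on a toy network: the type names, the single "affects" relation, and the helper names are simplified illustrations, not the actual Semantic Network, and the sketch ignores the possibility of blocked (non-inherited) relations.

```python
# Toy isa hierarchy: child type -> parent type (illustrative only).
ISA = {
    "Substance": "Entity",
    "Drug": "Substance",
    "Pathologic Process": "Event",
    "Disease": "Pathologic Process",
}

# Relations stated between general types: (source type, relation, target type).
STATED = {("Substance", "affects", "Pathologic Process")}

def ancestors_or_self(sem_type):
    """The type followed by each of its isa ancestors, up to the root."""
    chain = [sem_type]
    while sem_type in ISA:
        sem_type = ISA[sem_type]
        chain.append(sem_type)
    return chain

def holds(source, relation, target):
    """True if the relation is stated for the pair or inherited from ancestors."""
    return any((s, relation, t) in STATED
               for s in ancestors_or_self(source)
               for t in ancestors_or_self(target))

print(holds("Drug", "affects", "Disease"))  # True (inherited)
print(holds("Disease", "affects", "Drug"))  # False
```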
Semantic Net: 54 Links
In addition: Semantic Groups
- 15 Semantic Groups
- Smaller set of categories (135 semantic types into 15 groups)
- Broader, coarser groupings
- Partition 99.5% of UMLS Metathesaurus concepts
- Used for:
  - word sense disambiguation
  - profiling and analyzing vocabularies
  - display in the Semantic Navigator
Semantic Groups
- Activities and Behavior
- Anatomy
- Chemicals & Drugs
- Concepts & Ideas
- Devices
- Disorders
- Genes & Molecular Sequences
- Geographic Areas
- Living Beings
- Objects
- Occupations
- Organizations
- Phenomena
- Physiology
- Procedures
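Because each semantic type belongs to one group, concepts can be bucketed by group through their semantic types. A minimal sketch, assuming the group mapping is distributed as pipe-delimited lines of the form group abbreviation | group name | TUI | type name; the four sample lines, the concept-to-type assignments, and the helper names are illustrative assumptions.

```python
# Assumed line layout: GROUP_ABBREV|Group Name|TUI|Semantic Type Name
SEMGROUPS_SAMPLE = """\
ANAT|Anatomy|T017|Anatomical Structure
CHEM|Chemicals & Drugs|T121|Pharmacologic Substance
DISO|Disorders|T047|Disease or Syndrome
PROC|Procedures|T061|Therapeutic or Preventive Procedure
"""

def load_type_to_group(text):
    """Map each semantic type identifier (TUI) to its group name."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _abbrev, group, tui, _type_name = line.split("|")
        mapping[tui] = group
    return mapping

def group_concepts(concept_types, type_to_group):
    """Bucket concepts by semantic group; concept_types maps a concept to its TUIs."""
    groups = {}
    for concept, tuis in concept_types.items():
        for group in {type_to_group[t] for t in tuis if t in type_to_group}:
            groups.setdefault(group, set()).add(concept)
    return groups

type_to_group = load_type_to_group(SEMGROUPS_SAMPLE)
print(group_concepts({"Asthma": ["T047"], "Aspirin": ["T121"]}, type_to_group))
```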
SPECIALIST Lexicon and Lexical Tools
- Lexicon: a database of syntactic, morphological, and orthographic information for commonly occurring English words and biomedical vocabulary, used for natural language processing
- Tools: assist in detecting and abstracting away from inflectional, case, and word order variations
Lexicon
- Many of the words and multi-word terms that appear in concept names also appear in the SPECIALIST Lexicon
- The lexical tools are used to generate the word, normalized word, and normalized string indexes to the Metathesaurus (connecting each word to all related strings, terms, and concepts)
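A word index connects each word to every string, term, and concept whose name contains it. A minimal sketch of such an inverted index, assuming lowercase alphanumeric tokenization (a simplification of what the lexical tools do); the identifiers and helper names are hypothetical.

```python
import re
from collections import defaultdict

def build_word_index(entries):
    """Map each lowercased word to the (CUI, LUI, SUI) triples whose string contains it.

    entries is an iterable of (cui, lui, sui, string) tuples.
    """
    index = defaultdict(set)
    for cui, lui, sui, string in entries:
        for word in re.findall(r"[a-z0-9]+", string.lower()):
            index[word].add((cui, lui, sui))
    return index

def concepts_for_word(index, word):
    """All concepts reachable from a word through the index."""
    return {cui for cui, _lui, _sui in index.get(word.lower(), set())}

# Hypothetical identifiers.
entries = [
    ("C1", "L1", "S1", "Eye"),
    ("C1", "L1", "S3", "eye"),
    ("C1", "L2", "S4", "Oculus"),
    ("C2", "L9", "S9", "Eye disease"),
]
idx = build_word_index(entries)
print(sorted(concepts_for_word(idx, "EYE")))  # ['C1', 'C2']
```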
Lexicon
- English words from:
  - 20,000 words from an initial test set of MEDLINE abstracts
  - 10,000 words from the American Heritage Dictionary frequency list
  - 2,000 words from Longman's Dictionary of Contemporary English
  - verbs and adjectives identified by heuristics
- Biomedical terms in the Metathesaurus
Example: Lexical Variant Generator
- 3 primary programs:
  - Normalizer (norm)
  - Word index generator (wordInd)
  - Lexical variant generator (LVG)
Normalization
- Abstracts away:
  - case
  - punctuation
  - word order
  - possessive forms
  - inflectional variation
- Generates the strings in the Metathesaurus normalized string index (MRXNS)
Example
- Hodgkin Disease
- HODGKINS DISEASE
- Hodgkin's Disease
- Disease, Hodgkin's
- HODGKIN'S DISEASE
- Hodgkin's disease
- Hodgkins Disease
- Hodgkin's disease NOS
- Hodgkin's disease, NOS
- Disease, Hodgkins
- Diseases, Hodgkins
- Hodgkins Diseases
- Hodgkins disease
- hodgkin's disease
- Disease Hodgkins
- Disease, Hodgkin
All of the above normalize to: disease hodgkin
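The normalization steps listed above can be approximated in a few lines. This is a toy stand-in for the norm program, not its actual algorithm: uninflection is reduced to stripping a plural "s", and dropping the token "NOS" is an assumption taken from the example list. The function names are hypothetical.

```python
import re

def uninflect(word):
    """Toy uninflection: strip a plural 's' (but not 'ss') from longer words."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def toy_norm(text):
    s = text.lower()                       # abstract away case
    s = re.sub(r"'s\b", "", s)             # abstract away possessive 's
    s = re.sub(r"[^\w\s]", " ", s)         # abstract away punctuation
    tokens = [t for t in s.split() if t != "nos"]
    tokens = [uninflect(t) for t in tokens]  # abstract away (toy) inflection
    return " ".join(sorted(tokens))        # abstract away word order

for variant in ["HODGKINS DISEASE", "Disease, Hodgkin's", "Hodgkin's disease, NOS", "Diseases, Hodgkins"]:
    print(toy_norm(variant))  # disease hodgkin
```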
Lexical Tools
- For example:
  - MMTx (MetaMap Transfer) is designed to map arbitrary terms to concept names or to discover concepts within free text
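To make "discovering concepts within free text" concrete, here is a deliberately naive greedy longest-match lookup against a phrase dictionary. It is a swapped-in illustration, not MetaMap's or MMTx's algorithm (which add variant generation, candidate scoring, and more); the lexicon and its identifiers are hypothetical.

```python
import re

# Hypothetical lexicon: tuple of lowercase words -> made-up concept id.
LEXICON = {
    ("hodgkin", "disease"): "X_HODGKIN_DISEASE",
    ("disease",): "X_DISEASE",
    ("fever",): "X_FEVER",
}

def spot_concepts(text, lexicon):
    """Scan left to right, always preferring the longest phrase found in the lexicon."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    max_len = max(len(key) for key in lexicon)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in lexicon:
                found.append((" ".join(key), lexicon[key]))
                i += n
                break
        else:  # no phrase starts at this token
            i += 1
    return found

print(spot_concepts("Patient with Hodgkin disease and fever.", LEXICON))
# [('hodgkin disease', 'X_HODGKIN_DISEASE'), ('fever', 'X_FEVER')]
```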
UMLS Knowledge Source Server
- Internet access to the Knowledge Sources through:
  - a browser
  - an API
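The Knowledge Source Server has since been succeeded by NLM's UMLS Terminology Services (UTS) REST API. A minimal sketch that only builds request URLs for that newer API, assuming the `/search/{version}` and `/content/{version}/CUI/{cui}` endpoints and the `apiKey` query parameter as NLM documents them; verify against the current documentation, and note that a UMLS license and API key are required. The helper names are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed base URL of the UTS REST API.
UTS_BASE = "https://uts-ws.nlm.nih.gov/rest"

def search_url(term, api_key, version="current"):
    """URL for a concept search by string."""
    return f"{UTS_BASE}/search/{version}?" + urlencode({"string": term, "apiKey": api_key})

def concept_url(cui, api_key, version="current"):
    """URL for the record of a single concept (CUI)."""
    return f"{UTS_BASE}/content/{version}/CUI/{quote(cui)}?" + urlencode({"apiKey": api_key})

def fetch_json(url, timeout=10):
    """Perform the GET request and decode the JSON body (needs network access and a valid key)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

print(search_url("bronzed disease", "YOUR_API_KEY"))
print(concept_url("C0002792", "YOUR_API_KEY"))
```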
Application Example
- The developers built their own database from the UMLS files
PubMed
- user query: "Bronzed disease"
- UMLS tools map it to "Addison's disease" (MeSH term)
- PubMed searches its MeSH-indexed database with that term
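The PubMed flow amounts to translating a user's phrase into the controlled MeSH heading that indexes the articles. A minimal sketch with hand-filled lookup tables standing in for the UMLS synonymy; the concept id `X_ADDISON`, the heading string "Addison Disease", and the function name are assumptions, while `[MeSH Terms]` is PubMed's search field tag for MeSH headings.

```python
# Hypothetical tables; in practice these would be filled from the Metathesaurus files.
ENTRY_TERM_TO_CONCEPT = {
    "bronzed disease": "X_ADDISON",
    "addison disease": "X_ADDISON",
    "addison's disease": "X_ADDISON",
}
CONCEPT_TO_MESH = {"X_ADDISON": "Addison Disease"}  # assumed MeSH heading string

def to_pubmed_query(user_text):
    """Map a user phrase to a MeSH-tagged PubMed query, or fall back to the raw text."""
    key = " ".join(user_text.lower().split())  # normalizes case and whitespace only
    concept = ENTRY_TERM_TO_CONCEPT.get(key)
    heading = CONCEPT_TO_MESH.get(concept)
    if heading is None:
        return user_text
    return f'"{heading}"[MeSH Terms]'

print(to_pubmed_query("Bronzed disease"))  # "Addison Disease"[MeSH Terms]
```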
Outline
- Introduction
- Knowledge Sources
  - Metathesaurus
  - Semantic Network
  - SPECIALIST Lexicon
- UMLS Knowledge Source Server
- UMLS Applications
- EHR perspective
  - GEHR
  - openEHR
GEHR
- You define the structure on your own
GEHR
- Wherever PLAIN_TEXT or TERM_TEXT appears in the GOM (GEHR Object Model), the expansion of a termset code may appear
- GEHR uses the UMLS CUI (Concept Unique Identifier) to specify concept codes for any attribute in the model
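Using CUIs as concept codes means a coded attribute carries both its display text and a CUI, which a system can at least check for well-formedness (a CUI is "C" followed by seven digits). A minimal sketch; `CodedText` is a hypothetical stand-in, not the GOM's TERM_TEXT definition, and the format check does not confirm that the CUI exists in the Metathesaurus.

```python
import re
from dataclasses import dataclass

CUI_PATTERN = re.compile(r"C\d{7}")  # "C" followed by seven digits

@dataclass(frozen=True)
class CodedText:
    """Hypothetical coded attribute: display text plus a UMLS CUI as the concept code."""
    value: str
    cui: str

    def __post_init__(self):
        # Rejects malformed codes only; existence must be checked against the Metathesaurus.
        if not CUI_PATTERN.fullmatch(self.cui):
            raise ValueError(f"not a well-formed CUI: {self.cui!r}")

reaction = CodedText("Anaphylaxis", "C0002792")
print(reaction)
```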
openEHR
openEHR (Data Package)
Future Work
- Investigate the API, for example:
  - find the concepts of a semantic type
  - find the concepts of a vocabulary
  - find the concepts of a semantic type that is functionally related to the semantic type of a given concept
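The three planned queries can be prototyped locally before investigating the API. A minimal sketch over small in-memory tables: the concept-to-type and concept-to-source assignments, the type-level relations, the `functionally_related_to` label, and the function names are all hypothetical stand-ins for data that would come from the Knowledge Sources.

```python
# Hypothetical tables (made-up rows).
CONCEPT_TYPES = {"Aspirin": {"Drug"}, "Asthma": {"Disease"}, "Eye": {"Body Part"}}
CONCEPT_SOURCES = {"Aspirin": {"SRC_A"}, "Asthma": {"SRC_A", "SRC_B"}, "Eye": {"SRC_B"}}
TYPE_RELATIONS = {
    ("Drug", "functionally_related_to", "Disease"),
    ("Disease", "spatially_related_to", "Body Part"),
}

def concepts_of_type(sem_type):
    """Query 1: concepts assigned a given semantic type."""
    return {c for c, types in CONCEPT_TYPES.items() if sem_type in types}

def concepts_of_vocabulary(source):
    """Query 2: concepts that have names in a given source vocabulary."""
    return {c for c, sources in CONCEPT_SOURCES.items() if source in sources}

def functionally_related_concepts(concept):
    """Query 3: concepts whose type is functionally related to a type of `concept`."""
    related_types = {t1 for t in CONCEPT_TYPES.get(concept, set())
                     for t1, rel, t2 in TYPE_RELATIONS
                     if rel == "functionally_related_to" and t2 == t}
    return {c for c, types in CONCEPT_TYPES.items() if types & related_types}

print(concepts_of_type("Disease"))              # {'Asthma'}
print(functionally_related_concepts("Asthma"))  # {'Aspirin'}
```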