Title: Computational Linguistics for Referent Tracking in Electronic Healthcare Records: a research agenda CogSCI Colloquium Oct 19, 2005
1Computational Linguistics for Referent Tracking in Electronic Healthcare Records: a research agenda
CogSCI Colloquium, Oct 19, 2005
- Dr. W. Ceusters
- European Centre for Ontological Research
- Saarland University, Saarbrücken - Germany
2Presentation overview
- ECOR and me
- The Electronic Health Record (EHR)
- Problems with terminologies and their use in the EHR
- Realist ontology
- Referent Tracking
- Opportunities for computational linguistics
3European Centre for Ontological Research
4ECOR's members and partners
External members
Local members
Partners
Status Oct 2, 2005
5Goals and objectives
- sustained and coordinated collaboration with institutions with a proven track record of excellence in ontological research and in the application of ontology to solve concrete problems.
- interdisciplinary approach based on philosophical rigour
- exchange of research personnel for short research visits
- participation in joint projects,
- joint supervision of doctoral research,
- joint production of software and authorship of research papers
- collaborate in seeking funding at national and international levels for ontology-related research and development activities
6Recently also in the US
7Short personal history
8The Electronic Health Record
9Current US GOV eHealth goals and strategies
- Goal 1: Inform Clinical Practice.
- S1. Provide incentives for EHR adoption.
- S2. Reduce risk of EHR investment.
- S3. Promote EHR diffusion in rural and underserved areas.
- Goal 2: Interconnect Clinicians.
- S1. Regional collaborations.
- S2. Develop a national health information network.
- S3. Coordinate federal health information systems.
- Goal 3: Personalize Care.
- S1. Encourage use of Personal Health Records.
- S2. Enhance informed consumer choice.
- S3. Promote use of telehealth systems.
- Goal 4: Improve Population Health.
- S1. Unify public health surveillance architectures.
- S2. Streamline quality and health status monitoring.
- S3. Accelerate research and dissemination of evidence.
US Department of Health and Human Services, July 21, 2004
10Electronic Health Record
- ISO/TS 18308:2003
- Electronic Health Record (EHR)
- A repository of information regarding the health of a subject of care, in computer processable form.
- EHR system
- the set of components that form the mechanism by which electronic health records are created, used, stored, and retrieved. It includes people, data, rules and procedures, processing and storage devices, and communication and support facilities.
- More common meaning of EHR system
- only the software being executed
11The Medical Informatics dogma
- To structure or NOT to be
- Fact: computers can only deal with a structured representation of reality
- structured data
- relational databases, spread sheets
- structured information
- XML simulates context
- structured knowledge
- rule-based knowledge systems
- Conclusion: a need for structured data entry (???)
12Example of data entry form
www.comchart.com
13Structured EHR data entry
- Current technical solutions
- Data entry forms
- provide the structure
- various paradigms
- Rigid, pre-fixed
- Adaptable to user-preferences, but fixed when used
- Dynamically adapting to entered data in context
- Terminologies, coding and classification systems
- provide the language to be used
- Exchange of information preserving meaning
- Statistics and epidemiology
14The International Classification of Diseases (WHO)
- ...
- Chapter II Neoplasms (C00-D48)
- Chapter III Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism (D50-D89)
- Excludes: auto-immune disease (systemic) NOS (M35.9)
- ....
- Nutritional anaemias (D50-D53)
- D50 Iron deficiency anaemia
- Includes: ...
- D50.0 Iron deficiency anaemia secondary to blood loss (chronic)
- Excludes: ...
- D50.1 ...
- D51 Vitamin B12 deficiency anaemia
- Haemolytic anaemias (D55-D59)
- ...
- Chapter IV ...
15The alphabetic index of ICD-9-CM
- hydrops 782.3
- abdominis 789.5
- amnii (complicating pregnancy) (see also Hydramnios) 657
- congenital - see Hydrops, fetalis
- fetal(is) or new-born 778.
- due to iso-immunisation 773.3
- not due to iso-immunisation 778.0
- meningeal NEC 331.4
- pericardium - see Pericarditis
16Snomed International (1995): number of records (V3.1)
- T Topography 12,385
- M Morphology 4,991
- F Function 16,352
- L Living Organisms 24,265
- C Drugs and Biological Products 14,075
- A Physical Agents, Forces and Activities 1,355
- D Disease/Diagnosis 28,623
- P Procedures 27,033
- S Social Context 433
- J Occupations 1,886
- G General Modifiers 1,176
- TOTAL RECORDS 132,641
17Snomed International (1995): knowledge in the codes
- leaflet posterior
- anatomic
- mitral
- cardiac valve
- cardiovascular
18Snomed International: multiple ways to express the same thing
- D5-46210 Acute appendicitis, NOS
- D5-46100 Appendicitis, NOS + G-A231 Acute
- M-41000 Acute inflammation, NOS + G-C006 In + T-59200 Appendix, NOS
- G-A231 Acute + M-40000 Inflammation, NOS + G-C006 In + T-59200 Appendix, NOS
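One way to see why these code combinations count as "the same thing" is to expand every pre-coordinated code into the codes it is built from and compare the results. The sketch below assumes a decomposition table read off this slide (it is not taken from SNOMED itself), and the function name `normalise` is made up for illustration.

```python
# Hypothetical decomposition table inferred from the slide: each pre-coordinated
# code is assumed to be equivalent to the combination of codes listed for it.
DEFINITIONS = {
    "D5-46210": ("D5-46100", "G-A231"),            # Acute appendicitis = Appendicitis + Acute
    "D5-46100": ("M-40000", "G-C006", "T-59200"),  # Appendicitis = Inflammation + In + Appendix
    "M-41000": ("G-A231", "M-40000"),              # Acute inflammation = Acute + Inflammation
}


def normalise(codes):
    """Expand codes recursively until only codes without a definition remain."""
    atoms = set()
    for code in codes:
        if code in DEFINITIONS:
            atoms |= normalise(DEFINITIONS[code])
        else:
            atoms.add(code)
    return frozenset(atoms)


# The four alternative expressions of acute appendicitis.
EXPRESSIONS = [
    ("D5-46210",),
    ("D5-46100", "G-A231"),
    ("M-41000", "G-C006", "T-59200"),
    ("G-A231", "M-40000", "G-C006", "T-59200"),
]

# If the expressions are equivalent, they collapse into a single normal form.
normal_forms = {normalise(expression) for expression in EXPRESSIONS}
print(len(normal_forms))
```

Without such a normalisation step, a database query for one of the four expressions silently misses records coded with the other three.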
19The search for internal formal consistency: medSORT-II (Evans & Hersh, 93)
- no pin-prick sensation in calf >
- <neuro-sensation-mx>
- <method> <pin-prick-test> pin-prick
- <locus> <body-region> calf
- <result> <eval-attr> <attr> sensation
- <value> absent
20UMLS: Unified Medical Language System (NLM)
- Tool for information retrieval consisting of 4 components
- Metathesaurus: contains information about biomedical concepts and how they are represented in diverse terminological systems.
- Semantic Network: contains information about concept categories and the permissible relationships among them
- Information Sources Map: contains both human-readable and machine-processable information about all kinds of biomedical terminological systems
- SPECIALIST Lexicon: English words with part-of-speech (POS) information
21UMLS Semantic Network
22Main problems
- Internal and external consistency of terminologies.
- What do the terms in a terminology stand for?
23Problems with terminologies (1)
24Problems with terminologies (2)
- ventricle used in 2 different meanings
25Problems with terminologies (3)
- Mixing of differentiae
- Ontological nonsense
26Problems with terminologies (4)
Incomplete classification
27Previous work
- Many of these deficiencies can be identified, corrected or prevented by doing the right sort of ontology using a proper tool.
- SNOMED-CT
- NCIT
- UMLS Semantic Network
- But this is NOT the topic of this presentation
28What's wrong with current use of terminologies (and) ontologies in the EHR?
29Current mainstream thinking
30The story of Jane Smith: an old case, well known in the literature ...
31July 4th, 1990: Jane goes shopping
32A visit to the hospital
- City Health Centre (City HC)
- Dr. Peters
- Dr. Longley
33Diagnosis: a severe spiral fracture of the femur
34City HC's representation formalism (for statements in records)
Categories: represent concepts and are analogous to classes in other formalisms.
Individuals: concrete instances of categories which persist in space and time.
Occurrences: specific occurrences of individuals, which must be situated in space and time. The most important group of occurrences are observations, i.e. agents' observations of individuals.
Rector AL, Nowlan WA, Kay S, Goble CA, Howkins TJ. A framework for modelling the electronic medical record. Methods Inf Med. 1993 Apr;32(2):109-19.
35A look at the database: use of SNOMED codes for unambiguous understanding
How many numerically different disorders are listed here?
How many different types of disorders are listed here?
How many disorders have patients 5572, 2309 and 298 each had thus far in their lifetime?
cause, not disorder
36Would it be easier if you could see the code labels?
Patient  Date        Code       Code label
5572     04/07/1990  79001      Essential hypertension
0939     24/12/1991  255174002  benign polyp of biliary tract
2309     21/03/1992  26442006   closed fracture of shaft of femur
0939     20/12/1998  255087006  malignant polyp of biliary tract
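The questions asked about this database can be tried out on the four rows that survive here. The sketch below is only an illustration: it counts what the codes allow one to count, namely distinct codes and statements per patient, and neither figure answers how many numerically different disorders there are.

```python
from collections import Counter

# The four rows visible on the slide: (patient, date, code, code label).
ROWS = [
    ("5572", "04/07/1990", "79001", "Essential hypertension"),
    ("0939", "24/12/1991", "255174002", "benign polyp of biliary tract"),
    ("2309", "21/03/1992", "26442006", "closed fracture of shaft of femur"),
    ("0939", "20/12/1998", "255087006", "malignant polyp of biliary tract"),
]

# Distinct codes tell us how many kinds of disorder are mentioned ...
distinct_codes = {code for _, _, code, _ in ROWS}

# ... and rows per patient tell us how many statements were recorded.
rows_per_patient = Counter(patient for patient, _, _, _ in ROWS)

print(len(distinct_codes))
print(rows_per_patient["0939"])
```

For patient 0939 the two rows may describe one polyp that turned malignant or two different polyps. Nothing in the codes or labels settles that, which is the gap referent tracking is meant to close.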
37A look at the problems ...
38Main problem areas for City HC's EHR
- Statements refer only very implicitly to the concrete entities about which they give information.
- Idiosyncrasies of concept-based terminologies
- tell us only that some instance of the class the codes refer to is referred to in the statement, but not precisely which instance.
- Are usually confused about classes and individuals.
- Country and Belgium.
- Mixing up the act of observation and the thing observed.
- Mixing up statements and the entities these statements refer to.
39Consequences
- Very difficult to
- Count the number of (numerically) different diseases
- Bad statistics on incidence, prevalence, ...
- Bad basis for health cost containment
- Relate (numerically same or different) causal factors to disorders
- Dangerous public places (specific work floors, swimming pools),
- dogs with rabies,
- HIV contaminated blood from donors,
- food from unhygienic source, ...
- Hampers prevention
- ...
40Proposed solution: Referent Tracking
- Purpose
- explicit reference to the concrete individual entities relevant to the accurate description of each patient's condition, therapies, outcomes, ...
- Method
- Introduce an Instance Unique Identifier (IUI) for each relevant individual (particular, instance).
- Distinguish between
- IUI assignment for instances that do exist
- IUI reservation for entities expected to come into existence in the future
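The distinction between assignment and reservation can be sketched as two states of a registry entry. This is a minimal illustration under assumptions: the class `IUIRegistry`, its method names, the `confirm` step and the example descriptions are all hypothetical, and a real referent tracking system would be a shared service and not an in-memory dictionary.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    ASSIGNED = "assigned"   # the particular exists
    RESERVED = "reserved"   # the particular is expected to come into existence


@dataclass
class IUIRegistry:
    """Hypothetical in-memory registry mapping each IUI to (description, status)."""
    entries: dict = field(default_factory=dict)

    def assign(self, description: str) -> str:
        return self._new(description, Status.ASSIGNED)

    def reserve(self, description: str) -> str:
        return self._new(description, Status.RESERVED)

    def confirm(self, iui: str) -> None:
        """Assumed extra step: a reservation becomes an assignment once the particular exists."""
        description, _ = self.entries[iui]
        self.entries[iui] = (description, Status.ASSIGNED)

    def _new(self, description: str, status: Status) -> str:
        iui = f"IUI-{uuid.uuid4()}"
        self.entries[iui] = (description, status)
        return iui


registry = IUIRegistry()
fracture = registry.assign("Jane Smith's fracture of the femur")
# Made-up example of a planned act that does not exist yet.
surgery = registry.reserve("planned surgical repair of that fracture")
```

Keeping the status next to the identifier lets later statements about a planned act be told apart from statements about something that has actually happened.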
41Ontology
- Ontology: the study of being as a science
- An ontology is a representation of some pre-existing domain of reality which
- (1) reflects the properties of the objects within its domain in such a way that there obtains a systematic correlation between reality and the representation itself,
- (2) is intelligible to a domain expert
- (3) is formalized in a way that allows it to support automatic information processing
- ontological (as adjective)
- Within an ontology.
- Derived by applying the methodology of ontology
- ...
42An ontological analysis
continuants
43Ontological recategorisation
Jane Smith's consultation with Dr. Peters at City HC on 4th July 1990
Dr. Peters' assessment of Jane Smith's fracture of femur at City HC on 4th July 1990
44Essentials of Referent Tracking
- Generation of universally unique identifiers
- deciding what particulars should receive an IUI
- finding out whether or not a particular has already been assigned an IUI (each particular should receive at most one IUI)
- using IUIs in the EHR, i.e. issues concerning the syntax and semantics of statements containing IUIs
- determining the truth values of statements in which IUIs are used
- correcting errors in the assignment of IUIs.
45IUI assignment
- an act carried out by the first cognitive agent feeling the need to acknowledge the existence of a particular it has information about by labelling it with a UUID.
- cognitive agent
- A person
- An organisation
- A device or software agent, e.g.
- Bank note printer,
- Image analysis software.
46Criteria for IUI assignment (1)
- The particular's existence must be determined
- Easy for persons in front of you, body parts, ...
- Easy for planned acts: they do not exist before the plan is executed!
- Only the plan exists and possibly the statements made about the future execution of the plan
- More difficult: subjective symptoms
- But the statements the patient makes about them do exist!
- However
- no need to know what the particular exactly is, i.e. which universal it instantiates
- No need to be able to point to it precisely
- One bee out of a particular swarm that stung the patient, one pain out of a series of pain attacks that made the patient worried
- But this is not a matter of choice: not any out of ...
47Criteria for IUI assignment (2)
- The particular's existence may not already have been determined as the existence of something else
- Morning star and evening star
- Himalaya
- Multiple sclerosis
- May not have already been assigned an IUI.
- It must be relevant to do so
- Personal decision, (scientific) community guideline, ...
- Possibilities offered by the EHR system
- If an IUI has been assigned by somebody, everybody else making statements about the particular should use it
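The rule that a particular gets at most one IUI implies a lookup before every assignment. The sketch below assumes a hypothetical `ReferentIndex` keyed on free-text descriptions. Deciding that two descriptions denote the same particular (the `same_as` argument) is left to the caller, because that judgement is exactly what cannot be automated trivially.

```python
import uuid


class ReferentIndex:
    """Hypothetical lookup from known descriptions to IUIs."""

    def __init__(self):
        self._iui_by_description = {}

    def find(self, description):
        """Return the IUI registered for this description, or None."""
        return self._iui_by_description.get(description.lower())

    def find_or_assign(self, description, same_as=None):
        """Reuse an existing IUI where possible; create a new one only as a last resort.

        `same_as` names a description already known to denote the same particular.
        """
        iui = self.find(description)
        if iui is None and same_as is not None:
            iui = self.find(same_as)
        if iui is None:
            iui = f"IUI-{uuid.uuid4()}"
        self._iui_by_description[description.lower()] = iui
        return iui


index = ReferentIndex()
morning = index.find_or_assign("Morning star")
# The caller knows that both names denote the same particular.
evening = index.find_or_assign("Evening star", same_as="Morning star")
```

Both names end up pointing at one IUI, so statements made under either name accumulate on the same particular.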
48Representation in the EHR
- Relevant particulars referred to using IUIs
- Relationships that obtain between particulars at time t expressed using relations from an ontology (type OBO)
- Statements describing for each particular, at time t
- Of what universal from an ontology it is an instance
- AND/OR (if one insists)
- By means of what concept from a concept-based system it can sensibly be described
particulars
49Pragmatics of IUIs in EHRs
- IUI assignment requires an additional effort
- In principle no difference in effort (or just a little bit more) compared to directly using codes from concept-based systems
- A search for concept-codes is replaced by a search for the appropriate IUI using exactly the same mechanisms
- Browsing
- Code-finder software
- Auto-coding software (CLEF NLP software, Andrea Setzer)
- With that IUI comes a wealth of already registered information
- If for the same patient different IUIs apply, the user must decide which one is under scrutiny, or whether it is again a new instance
- A transfer or reference mechanism makes the statements visible through the RTDB
50Advantage: better reality representation
IUI-003
51Other Advantages
- mapping as by-product of tracking
- Descriptions about the same particular using different ontologies/concept-based systems
- Quality control of ontologies and concept-based systems
- Systematically inconsistent descriptions within or across terminologies may indicate poor definition of the respective terms
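How mapping falls out of tracking can be sketched by grouping descriptions per IUI and counting which codes from different systems co-occur on the same particular. Everything in the data below (the IUIs, the system names `SystemA` and `SystemB`, and the codes) is made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical descriptions of particulars: (IUI, coding system, code).
DESCRIPTIONS = [
    ("IUI-003", "SystemA", "A-123"),
    ("IUI-003", "SystemB", "B-9"),
    ("IUI-007", "SystemA", "A-123"),
    ("IUI-007", "SystemB", "B-9"),
    ("IUI-011", "SystemA", "A-123"),
    ("IUI-011", "SystemB", "B-4"),
]


def candidate_mappings(descriptions):
    """Count how often two codes from different systems describe the same particular."""
    by_iui = defaultdict(set)
    for iui, system, code in descriptions:
        by_iui[iui].add((system, code))
    counts = defaultdict(int)
    for codes in by_iui.values():
        for a, b in combinations(sorted(codes), 2):
            if a[0] != b[0]:  # only pair codes that come from different systems
                counts[(a, b)] += 1
    return dict(counts)


mappings = candidate_mappings(DESCRIPTIONS)
```

Frequent pairs are candidate mappings between the two systems. Rare or conflicting pairs, such as the single `B-4` here, are the kind of inconsistency that may point to a poorly defined term.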
52How to make this practical for the text-based parts of an EHR?
- Referent tracking
- in the linguistic sense!
53The problem summarised
- natural language is the only medium that is able to communicate clinical information about individual patients without loss of necessary detail
- (virtual) structured data repositories are required to make subsequent analyses possible
- any transformation from free language to coding and classification systems results in information loss that is unacceptable for individual patient care, but on the other hand is a conditio sine qua non for population based studies
- today's graphical user interfaces can deal reasonably well with pick lists built around controlled vocabularies that fulfil a bridging function from free language towards coding and classification systems, but are incompatible with referent tracking
54The ultimate scenario
Ontology
continuant
disorder
person
CAG repeat
EHR
Juvenile HD
IUI-1 affects IUI-2
IUI-3 affects IUI-2
IUI-1 causes IUI-3
Referent Tracking Database
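Once statements of this kind sit in a referent tracking database, questions about particulars become simple lookups. The sketch below stores the three statements from this slide as triples. The helper names `subjects_of` and `caused_by` are hypothetical, and the slide does not say which IUI denotes which entity, so none is assumed.

```python
# The three statements shown on the slide, as (subject, relation, object) triples.
STATEMENTS = [
    ("IUI-1", "affects", "IUI-2"),
    ("IUI-3", "affects", "IUI-2"),
    ("IUI-1", "causes", "IUI-3"),
]


def subjects_of(relation, obj, statements=STATEMENTS):
    """Return every IUI that stands in `relation` to `obj`."""
    return sorted(s for s, r, o in statements if r == relation and o == obj)


def caused_by(iui, statements=STATEMENTS):
    """Follow `causes` links transitively, starting from `iui`."""
    found, frontier = set(), [iui]
    while frontier:
        current = frontier.pop()
        for s, r, o in statements:
            if s == current and r == "causes" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


print(subjects_of("affects", "IUI-2"))
print(caused_by("IUI-1"))
```

Because the triples name particulars and not concept codes, the answers concern this specific patient's entities and not diseases in general.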
55A case study
- Goals
- Demonstrate the application of referent tracking to a concrete patient story
- Make you familiar with the ontological analysis of what is involved
- Understand the actions an NLU algorithm has to perform when transforming (running) text into a series of IUI-assertions (information extraction)
- Create interest among the computational linguists amongst you to embark on joint projects with us.
56Jim Cimino's Woods Hole case
- Jane Smith is a 30 year old, Native American female who presents to the emergency room with the chief complaint of cough and chest pain.
The patient reports that she has had a productive cough for three days but that chest pain developed one hour ago. She gives a history of hypertension. She also reports that she was treated in the past for tuberculosis while she was pregnant. The patient reports an allergy to Bufferin.
Physical examination revealed a well-developed, well-nourished female in moderate respiratory distress. Vital signs showed a pulse of 90, a respiratory rate of 22, an oral temperature of 100.3, and a blood pressure of 150/100. Examination revealed rales and rhonchi in the left upper chest. Abdominal exam revealed a tender, palpable liver edge.
Labs: Chem7 (serum): Glucose 100 (70-105). Chem7 (plasma): Glucose 150 (75-110). CBC: Hgb 15 (12.0-15.8), Hct 45 (42.4-48.0), WBC 11,000 (3,540-9,060), Platelets 145,000 (165,000-415,000).
A fingerstick blood sugar was 80.
Urinalysis showed protein of 1 and glucose of 0.
A blood culture was positive for methicillin-resistant Staphylococcus aureus (MRSA).
57Case study continued ...
- ECG: Sinus Rhythm, 74 BPM, Axis -30 degrees, ST segment 2mm elevated and T-waves down in leads I, L, V5 and V6.
Chest X-ray: Left upper lobe infiltrate, left ventricular hypertrophy.
The patient's nurse reported that the patient seemed depressed about her condition. On questioning, the nurse found that the patient was caring for her elderly father and was concerned that she would no longer be able to manage caring for herself and him. The nurse asked the patient's physician to consider an antidepressant and a social work consult.
A medical student reviewing the case is concerned about the risk of MRSA in patients with pneumonia and a recent myocardial infarction. She decides to do a literature search.
58Step 1: identify the phrases referring to particulars
- Jane Smith is a 50 year old,
- Native American female who presents
- to the emergency room
- with the chief complaint
- of cough and chest pain.
59Step 2: identify to what particulars these phrases refer
60Compare with simple clinical coding in
juxtaposition
61Compare with the output of the perfect semantic
analyser we all would dream of
Compare with the output of the NAIVE !!! semantic
analyser we all would dream of
CS3-complaining
62What it (more or less) should be
chest-pain
CS3-complaining
Has-Saying
Has-referent
CS3-chest pain
Has-Saying
coughing
Has-referent
CS3-coughing
63Most important difference
Use of generic terms
Use of concrete particulars
64Step 3: are relevant and necessary particulars missing?
- Referred to
- Jane Smith
- Jane Smith's age
- Jane Smith's race
- Jane Smith's gender
- Jane Smith's showing up at ...
- The specific emergency room in the health facility
- Jane Smith's primarily complaining ...
- The temporal part ... coughs
- Jane Smith's chest
- Jane Smith's particular pain
- Missing
- The health facility
- The healthcare worker she consulted
- The particular coughs (under the condition she tells the objective truth)
- The underlying disorder (under whatever state of affairs)
65Step 4: IUI assignment
- Assumptions
- the RTS contains already
- IUI-1: Jane Smith
- Coi <IUIa, ta, CS3, IUI-1, woman, tr>
- IUI-1.1: Ri <IUIa, ta, depends-on, BFO, IUI-1.1, IUI-1, tr>
- Coi <IUIa, ta, CS1, IUI-1.1, age, tr>
- IUI-1.2: Coi <IUIa, ta, CS1, IUI-1.2, cherokee, tr>
- Ri <IUIa, ta, depends-on, BFO, IUI-1.2, IUI-1, tr>
- IUI-1.3: Coi <IUIa, ta, CS3, IUI-1.3, chest pain, tr>
- Ri <IUIa, ta, is-located-in, BFO, IUI-1.3, IUI-1, tr>
- All dates in the statements are 2 years earlier than now
- What to do with
- Jane Smith
- Jane Smith's race (CS1 native American)
- Jane Smith's gender (CS1 female)
- Jane Smith's chest pain (CS3 chest pain)
- Jane Smith's age (50)
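The assumed content of the RTS can be written down as typed tuples, which makes the "what to do with" question concrete: look for an existing IUI first and create a new one only if none fits. In the sketch below the field names and their glosses are one reading of the Coi and Ri tuples, not a definition taken from the slide, and `existing_iui` is a hypothetical helper. IUIa, ta and tr are left symbolic, as on the slide.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CoTuple:
    """Coi: a particular is described by a code from a concept-based system."""
    iui_a: str   # author of the assertion (assumed reading)
    t_a: str     # time of the assertion (assumed reading)
    cbs: str     # concept-based system, e.g. "CS3"
    iui_p: str   # the particular being described
    co: str      # concept code or label
    t_r: str     # time at which the description holds (assumed reading)


@dataclass(frozen=True)
class RTuple:
    """Ri: a relation from an ontology holds between particulars."""
    iui_a: str
    t_a: str
    r: str                        # relation, e.g. "depends-on"
    o: str                        # ontology the relation comes from, e.g. "BFO"
    particulars: Tuple[str, ...]
    t_r: str


RTS = [
    CoTuple("IUIa", "ta", "CS3", "IUI-1", "woman", "tr"),
    RTuple("IUIa", "ta", "depends-on", "BFO", ("IUI-1.1", "IUI-1"), "tr"),
    CoTuple("IUIa", "ta", "CS1", "IUI-1.1", "age", "tr"),
    CoTuple("IUIa", "ta", "CS1", "IUI-1.2", "cherokee", "tr"),
    RTuple("IUIa", "ta", "depends-on", "BFO", ("IUI-1.2", "IUI-1"), "tr"),
    CoTuple("IUIa", "ta", "CS3", "IUI-1.3", "chest pain", "tr"),
    RTuple("IUIa", "ta", "is-located-in", "BFO", ("IUI-1.3", "IUI-1"), "tr"),
]


def existing_iui(cbs, co, related_to):
    """Return a particular already described by (cbs, co) and related to `related_to`."""
    described = {
        t.iui_p for t in RTS
        if isinstance(t, CoTuple) and (t.cbs, t.co) == (cbs, co)
    }
    for t in RTS:
        if not isinstance(t, RTuple):
            continue
        if t.particulars[0] in described and related_to in t.particulars[1:]:
            return t.particulars[0]
    return None


print(existing_iui("CS3", "chest pain", "IUI-1"))
print(existing_iui("CS1", "female", "IUI-1"))
```

A hit is only a candidate. Since the stored statements are two years old, whether IUI-1.3 is the chest pain Jane Smith reports today remains a decision for the user. The lookup for "CS1 female" finds nothing because the RTS only holds "CS3 woman", which shows how differently coded descriptions hide the same information.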
66Conclusion
- Referent tracking can solve a number of problems in an elegant way.
- Existing (or emerging) technologies can be used for the implementation.
- Old technologies (cbs) can play an interesting role.
- A Big Brother feeling is to be expected, but is easy to counter with adequate measures.
- The proof of the pudding is in the eating
- A pilot is going to be set up
- Collaboration sought for dealing with NLU