Title: Mark Weiner, MD
1. Using Clinical and Administrative Information Systems to Support the Research Enterprise: The Good, the Bad and the Ugly
Institute for Translational Medicine and Therapeutics (ITMAT)
Center for Clinical Epidemiology and Biostatistics (CCEB)
Clinical Research Computing Unit (CRCU)
Office of Human Research (OHR)
University of Pennsylvania School of Medicine
Philadelphia, PA 19104-6021
2. In attempting to arrive at the truth, I have
applied everywhere for information, but in
scarcely an instance have I been able to obtain
hospital records fit for any purpose of
comparison. If they could be obtained, they
would enable us to decide many other questions
besides the one alluded to. They would show the
subscribers how their money was being spent, what
good was really being done with it, or whether
the money was not doing mischief rather than good.
3. Framing Questions
- Can data from clinical information systems be used
  - To enable comparisons of the relative effectiveness of competing therapies?
  - To evaluate risks of therapies after they reach the market?
  - To inform drug development towards new innovations where existing therapies are ineffective or risky?
  - To ensure earlier studies of effectiveness are still relevant given more recent drug developments?
  - To find interesting cohorts with characteristics that are especially relevant for genomic analysis?
4. The Database underlying our Research Enterprise
- Pennsylvania
- Integrated
- Clinical and
- Administrative
- Research
- Database
The PICARD System
5. Data Sources - Billing
- IDX: Professional charges for ambulatory and inpatient activity
  - 200 primary care and subspecialty practice sites
  - 1.5 million ambulatory visits/year (primary and subspecialty) among 449,000 patients
  - 602K visits/year to primary care practices among 222K patients
- SMS: Facility charges for admissions and ED encounters and hospital-based ambulatory procedures (ancillary tests, labs)
  - 36K admissions/year, 34K ED visits/year (HUP)
  - 13.5K admissions/year, 21.4K ED visits/year (PMC)
  - 25.5K admissions/year, 18K ED visits/year (PAH)
6. Data Sources - Clinical
- Cerner
  - Laboratory and pathology results, both inpatient and outpatient
  - 75 common chemistries, hematology and serology results, August 1997 - February 1999
  - Since February 1999: all labs with numerical results
  - Since 2001: Microbiology results
7. Data Volume
- 400 GB storage on an Oracle 9i server
- 1,843,922 patients (cumulative since 1997)
- 25,297,970 visits (ambulatory encounters, nursing home visits, inpatient consultations)
- 46,474,033 diagnoses assigned
- 153,097,826 labs
8. What is available?
- Ambulatory Data
  - Primary and Subspecialty Data: Jan 1997-Present
  - Patient information
    - Location, Gender, Race, Birthdate, Insurance carrier
  - Scheduling Information
    - When was the visit scheduled?
    - Status of visit (arrived, cancelled, no show)
  - Visit information
    - Date, Location, Physician, Diagnoses, Procedures with charges and reimbursements
9. What is available?
- Inpatient data
  - Patient information
  - Admission Detail: detail data since FY2000 for HUP, Presbyterian and Pennsylvania
  - Admission, DC dates, LOS, discharge disposition
  - DRG, Diagnoses
  - Major Procedures
  - Charges for minor procedures/room/ancillary services etc.
  - Medications
10. Data Sources - Electronic Health Record
- EPIC: Ambulatory Electronic Medical Record
  - In use at about 60 ambulatory care sites, 8 of which are primary care
- Patient counts
  - 184,000 patients with at least 1 visit (overall)
  - >100,000 patients seen within ambulatory care practices since 2001
  - 72,900 primary care patients since Jan 2001
  - 90,000 patients with at least 1 visit to any EPIC provider in past year
  - 55,000 patients with at least 1 visit to EPIC primary care practices in past year
11. What is available?
- Electronic Medical Record Data
  - Records patient history and physical exam as unstructured text
  - Linked to SQL Server database containing discrete components of EpicCare
    - Vital Signs
    - Medications Ordered
    - Social History (smoking, alcohol use)
    - Family History
    - Problem lists
12. Work in Progress
- Sunrise Clinical Manager: Order entry and nursing documentation from HUP and PMC inpatient settings
- Recent conversion to linkable systems
  - Cardiology Data: Cath, EKG, Stress tests, Echo
  - Pulmonary Function Tests
- Available, but need to draw discrete content from semi-structured reports
  - Pathology Data
  - Radiology results
  - Endoscopy/bronchoscopy results
13. How Accurate is PICARD?
- Accuracy = truth?
  - If PICARD says a patient has asthma, then the patient has asthma
- Accuracy = true representation of the source data?
  - If PICARD says a patient has asthma, then the source data for the patient includes a code for asthma or other diagnostic testing results consistent with asthma
14. How Accurate is PICARD?
- We have worked to make PICARD a true representation of the underlying data
- Physician-patient communication/misunderstandings
- Busy doctors don't code/document everything
- Idiosyncrasies of the clinical setting in which data is collected
- Ambiguity inherent in the practice of medicine
- Code creep
- Early codes before diagnosis of gallstones is confirmed may suggest simple abdominal pain
- URI/bronchitis/tracheitis/pharyngitis/sinusitis all have similar symptoms
Adapted and expanded from O'Malley KJ, Cook KF, Price MD, Wildes KR, Hurdle JF, Ashton CA. Measuring Diagnoses: ICD Code Accuracy. Health Services Research 2005; 40:1620-39.
15. Idiosyncrasies in Data
- Research using PICARD must account for all of the realities inherent in the underlying data
- Absence of evidence is not evidence of absence
  - Just because you don't see evidence of a disease doesn't mean the patient doesn't have the disease
- To find patients with a certain disease, you need to consider all the ways the disease may be represented in diagnosis codes and ancillary test results
- The first instance of a disease in the database is not necessarily the time the disease first appeared
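As a rough illustration of searching both diagnosis codes and ancillary test results, here is a minimal Python sketch. The record types, the field names, and the case definition (ICD-9 250.xx codes, or an HbA1c result at or above 6.5) are assumptions made for this example. They are not PICARD's actual schema or an endorsed definition.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Diagnosis:
    patient_id: str
    code: str          # billing diagnosis code, e.g. ICD-9
    coded_on: date


@dataclass(frozen=True)
class LabResult:
    patient_id: str
    test: str
    value: float
    resulted_on: date


# Hypothetical case definition, for illustration only.
CODE_PREFIXES = ("250",)      # ICD-9 250.xx (diabetes mellitus)
LAB_RULES = {"HBA1C": 6.5}    # flag results at or above this value


def find_cases(diagnoses, labs):
    """Return {patient_id: set of evidence types} for patients matching any rule."""
    evidence = {}
    for dx in diagnoses:
        if dx.code.startswith(CODE_PREFIXES):
            evidence.setdefault(dx.patient_id, set()).add("diagnosis_code")
    for lab in labs:
        threshold = LAB_RULES.get(lab.test)
        if threshold is not None and lab.value >= threshold:
            evidence.setdefault(lab.patient_id, set()).add("lab_result")
    return evidence
```

Patients missing from the result are not confirmed disease-free. They simply have no matching evidence in the data searched.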
16. Addressing the Idiosyncrasies
- Corrections for problems may increase the accuracy of the data, but make the analytical data set less generalizable
  - Requiring an echocardiogram to definitively rule in or rule out a diagnosis of CHF limits your cohort to people who were sick enough to require an echo, even among patients who turn out NOT to have CHF by echo
  - Finding incident cases requires limiting the cohort to people who were in the system for a while without evidence of the disease of interest before a code for the disease first appears
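The incident-case restriction above can be sketched as a disease-free lookback ("washout") window. This is a minimal Python sketch under stated assumptions. The function name, the input dictionaries, and the 365-day default are hypothetical, and a real definition would also need to check ancillary test results during the window.

```python
from datetime import date, timedelta


def incident_cases(first_seen, first_coded, washout_days=365):
    """Keep patients whose first disease code follows a long enough disease-free period.

    first_seen:  {patient_id: date of the patient's first encounter in the system}
    first_coded: {patient_id: date the disease code first appears}
    Returns {patient_id: date of first code} for patients observed for at least
    washout_days before that first code.
    """
    washout = timedelta(days=washout_days)
    incident = {}
    for patient_id, coded_on in first_coded.items():
        seen_on = first_seen.get(patient_id)
        if seen_on is not None and coded_on - seen_on >= washout:
            incident[patient_id] = coded_on
    return incident
```

A longer window gives more confidence that a case is truly new, but it drops every patient who entered the system recently, which is the loss of generalizability described above.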
17. The good
- Discrete data enables searching for reasonably objective clinical information that can refine coarse billing diagnoses.
  - Not all patients with Hypertension are equally hypertensive
  - Not all patients with Hypertension are treated as aggressively or require the same aggressive treatment
- More homogeneous cohort specification, or at least better ability to recognize and adjust for imbalance of clinical characteristics.
- Better assessment of differences in care, resource utilization and outcomes (clinical trial simulation)
- More contextual data leads to more rational attribution of cause and effect
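To make the hypertension example concrete, the following Python sketch splits patients who share one billing diagnosis into strata using discrete blood pressure readings. The function name, the input shape, and the cutoffs of 140 and 160 mmHg are assumptions chosen for illustration. They are not a clinical recommendation or part of PICARD.

```python
from collections import defaultdict
from statistics import mean


def stratify_by_systolic(readings, cutoffs=(140, 160)):
    """Group patients by mean systolic pressure.

    readings: iterable of (patient_id, systolic_mmHg) taken from vital signs
    cutoffs:  hypothetical (low, high) boundaries between strata
    Returns {patient_id: (mean_systolic, stratum)}.
    """
    by_patient = defaultdict(list)
    for patient_id, systolic in readings:
        by_patient[patient_id].append(systolic)

    low, high = cutoffs
    strata = {}
    for patient_id, values in by_patient.items():
        average = mean(values)
        if average < low:
            stratum = "lower"
        elif average < high:
            stratum = "middle"
        else:
            stratum = "higher"
        strata[patient_id] = (average, stratum)
    return strata
```

The strata can then be used either to specify a more homogeneous cohort or as an adjustment variable when comparing groups.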
18. The (currently) bad
- Still a great deal of data from which it is difficult to pull discrete concepts (EKGs, echos, Path??)
- Not all discrete concepts are as accurate as we'd like
- Not all discrete concepts mean what we think they mean!
- Even if discrete concepts were extractable, it is difficult to resolve pervasive conflicts in concepts within clinical charts for a single patient across different providers and notes.
- How to deal with uncaptured clinical care data from unaffiliated health care settings?
  - Corollary: How do you know if the first instance of a condition in the chart is truly an incident occurrence?
19. The (potentially) ugly
- Even if you were able to derive discrete concepts from unstructured text, you still need to account for the idiosyncrasies of clinical care as opposed to the research setting
  - Patients often seek medical care when they are not feeling well, rather than being seen at regular intervals per protocol
  - Diagnostic testing and interventions are usually provided for a specific reason, not in a randomized fashion
  - Doctors are busy and don't record every piece of information every time.
- How can we express/account for ambiguity?
- Appropriate use of this data for research (particularly clinical trials simulation) requires larger data sets than you think
20. Proposed solutions
- More robust data integration with semantic interoperability enabled by data standards and a truly Universal Identifier.
- This is easier said than done!
21. Obstacles for which purely automated solutions will be challenging
- Integrating more databases offers the promise of filling in gaps in the continuum of care,
  - But it also increases the likelihood of finding clinical conflicts in the data for an individual
- Semantic interoperability will enable different information technology systems to understand the true meaning of data being sent
  - But have you ever seen two DOCTORS agree upon the meaning of what they hear?
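One way to see how integration surfaces conflicts is to group each patient's facts by attribute and flag any attribute for which sources report different values. The Python sketch below assumes a hypothetical flat record shape of (patient_id, source, attribute, value). The source names in the usage are invented for illustration.

```python
from collections import defaultdict


def find_conflicts(records):
    """Flag attributes for which a patient's sources report more than one value.

    records: iterable of (patient_id, source, attribute, value)
    Returns {(patient_id, attribute): {value: sorted list of sources}}
    containing only the entries with at least two distinct values.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for patient_id, source, attribute, value in records:
        seen[(patient_id, attribute)][value].add(source)
    return {
        key: {value: sorted(sources) for value, sources in values.items()}
        for key, values in seen.items()
        if len(values) > 1
    }
```

Detecting a conflict this way is the easy part. Deciding which source is right still takes clinical judgment, which is why a purely automated solution is hard.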
22. Obstacles for which purely automated solutions will be challenging
- Standards will enable computer systems to share a common language to describe clinical concepts
  - But the precision inherent in these vocabularies often exceeds the precision of medicine
- The Universal Identifier problem for identifying individuals across systems will be solved with better algorithms
  - But then the state of the science will demand defined links between family members!
23. Issues in moving forward
- Is PICARD best characterized as
  - A database?
    - Does it have its own well defined data model?
  - An interface?
    - Can users interact with it on their own?
    - Is it self populating from other information resources?
  - A service?
    - Do we provide facilitated, as opposed to direct access?
- How do we link our clinical practice database to tissue and genomic databases that are supposedly anonymous?
- How do we work with departmental owners of data to achieve comfort in sharing information centrally?
24. Where we are today
25. Where we'd like to be
26. In Summary
- With their more comprehensive, longitudinal contents, Clinical Practice Databases like PICARD overcome several of the limitations of older, mostly administrative databases used in research
  - Can be used to help find patients for recruitment into clinical trials
  - Can provide data for clinical trials simulation
    - to extend the generalizability of known clinical trial results,
    - to confirm if older results are still valid,
    - to provide insight into a research question if a formal clinical trial would be prohibitively expensive or unethical to conduct
- Further work is needed to optimize semantic interoperability among component systems.