Open Source in Healthcare and Public Health Track - PowerPoint PPT Presentation

1
Open Source in Healthcare and Public Health Track
The Geo. Washington Univ. Open Source Conference
Open Source Confidentiality Methods
4:00 P.M., March 18, 2003
Jules J. Berman, Ph.D., M.D.
Program Director, Pathology Informatics
Cancer Diagnosis Program, DCTD, NCI, NIH
email: bermanj_at_mail.nih.gov
voice: 301-496-7147
2
Medical Informatics, as I see it
1. Acquisition of Data - 49% of my time
2. Organization of Data - 49% of my time
3. Analysis of Data - 2% of my time
3
1. Acquisition of Data - Getting people to share; working within HIPAA and Common Rule guidelines; meetings
2. Organization of Data - Standards, XML, metadata, self-describing architectures, more meetings, technical standards committee of API, Tissue Microarray Data Exchange Standard
3. Analysis of Data - ??? Almost irrelevant at this time. People think it's OK to publish without supporting data.
4
UFO Abductees
- Lots of them
- They often say about the same thing (independent confirmations)
- All walks of life
- Generally honest
- A minority are a little crazy
- One problem: no evidence
5
Researchers who don't publish their primary data
- Lots of them
- They often say about the same thing (independent confirmations)
- All walks of life
- Generally honest
- A minority are a little crazy
- One problem: no evidence
6
Data Sharing
- NIH Statement on Data Sharing: http://grants.nih.gov/grants/guide/notice-files/NOT-OD-03-032.html
- National Research Council Statement: http://books.nap.edu/books/0309088593/html/R1.html
- Comment Letter on NIH Data Sharing Proposal: http://www.aamc.org/advocacy/library/research/corres/2002/051102.htm
7
So what's stopping us from making incredibly large and useful medical databases?
- Human nature
- Researcher insecurities
- Lack of perceived incentives
- Non-existence of organized data
- Human subject protection issues
We need ways of de-identifying medical data.
8
Two U.S. regulations tell us how we can use medical records in research:
- Common Rule
- HIPAA Privacy Rule
Both work on the principle that medical research is good, and that it can be conducted without patient consent if you can come up with a way to avoid harming patients (no harm, no consent for harm). Typically, this is done by de-identifying records.
9
Legal importance of de-identification research
1. A scientific field created by HIPAA: HIPAA asks the community to come up with de-identification standards.
2. The Civil Rights Office will not be looking for misinterpretations. It will probably respond only to complaints; there is no pre-screening of methodology by the Civil Rights Office.
3. Published research methodology is sure to weigh in if a lawsuit ever occurs. To a certain extent, what's de-identified is what scientists promote and accept in published articles (Daubert, 1993).
10
Open Source techniques I've been publishing
1. One-way hash method, to be described (currently deprecated under HIPAA)
2. Concept-Match Medical Data Scrubbing (in press, Archives of Pathology)
3. Threshold Method (published, BMC Methods)
4. Zero-Check, a Zero-Knowledge HIPAA-compliant Protocol for Reconciling Patient Identities Across Institutions (answer to HIPAA attack on one-way hash methodology)
11
One-Way Hash Method for de-identifying
Allows you to get follow-up data on de-identified patients.
A one-way hash algorithm computes a fixed-length string from a character string of any length. It is computationally infeasible to determine the original character string by looking at the hash value alone. The algorithm always gives the same hash value for any given string. These properties are why one-way hashes are typically used as authenticators for secret messages.
Joe Smith replaced by one-way hash ekso583a2ldg
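The one-way hash step can be sketched in a few lines. This minimal version assumes Python's standard `hashlib` with SHA-256; the slides point to MD5 and SHA modules on CPAN instead. The function name `pseudonym` and its case and whitespace normalization are hypothetical choices made for illustration. The digest it produces will not resemble the illustrative `ekso583a2ldg` above.

```python
import hashlib


def pseudonym(name: str) -> str:
    """Return a fixed-length one-way hash (SHA-256 hex digest) of a name.

    Hypothetical normalization: lower-case the name and collapse runs of
    whitespace, so "Joe Smith" and " joe  smith " hash to the same value.
    """
    normalized = " ".join(name.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = pseudonym("Joe Smith")
    again = pseudonym(" joe  smith ")
    print(len(first))       # 64: the digest length does not depend on the input
    print(first == again)   # True: the same input always yields the same hash
```

The hash is deterministic, which is what makes follow-up linkage possible. The same property means anyone with a list of candidate names can hash each one and look for matches. A keyed hash (for example, Python's `hmac` module with a secret key) is a common mitigation.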
12
One-Way Hash Method for de-identifying
Allows you to get follow-up data on de-identified patients.
Joe Smith replaced by one-way hash ekso583a2ldg
Joe Smith comes back a year later, and his new record is de-identified with the one-way hash string ekso583a2ldg.
The two de-identified records are merged under the common one-way hash string, ekso583a2ldg.
HIPAA restricts the use of the one-way hash de-identification protocol.
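The follow-up merge can be shown with a minimal sketch, assuming SHA-256 as the one-way hash. Each record's name is replaced by its hash, and records that share a hash are grouped. The record layout (a `name` field, a `year` field, an added `patient_id` field) and the helper names are hypothetical.

```python
import hashlib
from collections import defaultdict


def pseudonym(name: str) -> str:
    """One-way hash (SHA-256 assumed) of a lower-cased, whitespace-normalized name."""
    return hashlib.sha256(" ".join(name.lower().split()).encode("utf-8")).hexdigest()


def deidentify(record: dict) -> dict:
    """Drop the identifying 'name' field and store its hash as 'patient_id'."""
    clean = {k: v for k, v in record.items() if k != "name"}
    clean["patient_id"] = pseudonym(record["name"])
    return clean


def merge_by_hash(records: list) -> dict:
    """Group de-identified records under their shared one-way hash."""
    merged = defaultdict(list)
    for rec in records:
        merged[rec["patient_id"]].append(rec)
    return dict(merged)


if __name__ == "__main__":
    first_visit = deidentify({"name": "Joe Smith", "year": 2002})
    next_visit = deidentify({"name": "Joe Smith", "year": 2003})
    groups = merge_by_hash([first_visit, next_visit])
    print(len(groups))                                        # 1
    print([r["year"] for r in next(iter(groups.values()))])   # [2002, 2003]
```

Nobody holding the merged records needs to see the name for the two visits to line up. The hash alone serves as the join key.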
13
Concept-Match algorithm for scrubbing text
1. Parse all input into sentences.
2. Parse each sentence into words.
3. Each "stop word" (high-frequency word) is preserved.
4. Intervening words and phrases are mapped to a standard nomenclature.
5. Each coded term is replaced by an alternate term that maps to the same code.
6. All other words are replaced by a blocking symbol (consisting of three asterisks).
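The six steps above can be sketched as follows. This is a toy version, not the published Perl packages. The stop-word list and the four-entry nomenclature are hypothetical stand-ins for a real high-frequency word list and for UMLS. The output format `(term=code)` is also an assumption, and sentence splitting is deliberately naive.

```python
import re

# Hypothetical stop-word list (step 3 keeps these words in clear text).
STOP_WORDS = {"a", "an", "and", "are", "as", "both", "in", "into", "is",
              "of", "that", "the", "this", "to", "was", "were"}

# Hypothetical toy nomenclature (phrase -> code). Two synonyms share
# C0002895 so that step 5 has an alternate term to substitute.
NOMENCLATURE = {
    "diagnosis": "C0348026",
    "sickle cell anemia": "C0002895",
    "herrick anemia": "C0002895",
    "today": "C0750526",
}


def alternate_term(phrase, code):
    """Step 5: choose a different term with the same code, if one exists."""
    synonyms = sorted(p for p, c in NOMENCLATURE.items() if c == code and p != phrase)
    return synonyms[0] if synonyms else phrase


def code_run(words):
    """Steps 4 and 6: longest-match phrases in a run of non-stop words."""
    out, i = [], 0
    while i < len(words):
        for end in range(len(words), i, -1):
            phrase = " ".join(words[i:end])
            if phrase in NOMENCLATURE:
                code = NOMENCLATURE[phrase]
                out.append(f"({alternate_term(phrase, code)}={code})")
                i = end
                break
        else:
            out.append("***")    # no match: replace the word with the blocking symbol
            i += 1
    return out


def scrub(text):
    """Return one scrubbed string per sentence."""
    results = []
    for sentence in re.split(r"(?<=[.?!])\s+", text.strip()):   # step 1
        words = re.findall(r"[a-z0-9']+", sentence.lower())      # step 2
        out, run = [], []
        for w in words:
            if w in STOP_WORDS:          # step 3: keep the stop word as-is
                out.extend(code_run(run))
                run = []
                out.append(w)
            else:
                run.append(w)
        out.extend(code_run(run))
        results.append(" ".join(out))
    return results


if __name__ == "__main__":
    for line in scrub("Diagnosis of sickle cell anemia. "
                      "The patient's social security number is 523845"):
        print(line)
    # (diagnosis=C0348026) of (herrick anemia=C0002895)
    # the *** *** *** *** is ***
```

The design is safe by default: a word survives only if it is a stop word or part of a known nomenclature phrase. Names, numbers and misspellings are therefore blocked without needing a list of identifiers.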
14
Examples from Hopkins Pathology Phrase list
Diagnosis of severe dysplasia > (Diagnosi C0348026) of (severe dysplasia C0334048)
Diagnosis of sickle > (Diagnosi C0348026) of
Diagnosis of sickle cell anemia > (Diagnosi C0348026) of (herrick anemia C0002895)
Diagnosis of simple hyperplasia > (Diagnosi C0348026) of (simple C0205352) (hypercellularity C0020507)
Diagnosis of sjogren > (Diagnosi C0348026) of (sjogren disease C0037230)
15
1. Dr. Atkinson killed his patient today. > (patient C0030705) (today C0750526)
2. Is this malpractice? > Is this
3. Senator garfield was admitted today into the psychiatric unit. > was (today C0750526) into the (psychiatric behavioral C0205487) (unit C0439148).
4. Snetor garfield was admitted today into the psyciatric unit. > was (today C0750526) into the (unit C0439148)
5. Dr. truelove's diagnosis is both incorrect and incompetent. > (diagnosi C0348026) is both and
6. The patient's social security number is 523845 > The is
16
Threshold algorithm
A familiar plot device.
17
they suggested that the manifestations were as severe in the mother as in the sons and that this suggested autosomal dominant inheritance.
Bob's Piece 1
684327ec3b2f020aa3099edb177d3794 > suggested autosomal dominant inheritance
3c188dace2e7977fd6333e4d8010e181 > mother
8c81b4aaf9c2009666d532da3b19d5f8 > manifestations
db277da2e82a4cb7e9b37c8b0c7f66f0 > suggested
e183376eb9cc9a301952c05b5e4e84e3 > sons
22cf107be97ab08b33a62db68b4a390d > severe
Bob's Piece 2
they db277da2e82a4cb7e9b37c8b0c7f66f0 that the 8c81b4aaf9c2009666d532da3b19d5f8 were as 22cf107be97ab08b33a62db68b4a390d in the 3c188dace2e7977fd6333e4d8010e181 as in the e183376eb9cc9a301952c05b5e4e84e3 and that this 684327ec3b2f020aa3099edb177d3794.
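Bob's split of a text into two pieces can be sketched as follows. This is a minimal version, assuming MD5 as the one-way hash (it produces 32 hex characters, like the codes above). It also assumes a hypothetical stop-word list, chosen only so that this sentence breaks into the same six phrases. The digests it prints are not claimed to match the ones on the slide, and `make_pieces` is an illustrative name, not the published script.

```python
import hashlib

# Hypothetical stop-word list; the published protocol uses its own list.
STOP_WORDS = {"they", "that", "the", "were", "as", "in", "and", "this"}


def md5_hex(phrase: str) -> str:
    """One-way hash of a phrase (MD5 hex digest assumed)."""
    return hashlib.md5(phrase.encode("utf-8")).hexdigest()


def make_pieces(text: str):
    """Split text into Piece 1 (hash -> phrase) and Piece 2 (ordered tokens)."""
    piece1, piece2, run = {}, [], []

    def flush():
        # A maximal run of non-stop words is treated as one phrase (concept).
        if run:
            phrase = " ".join(run)
            digest = md5_hex(phrase)
            piece1[digest] = phrase   # a recurring phrase collapses to one entry
            piece2.append(digest)     # Piece 2 keeps only the hash, in order
            run.clear()

    for word in text.rstrip(".").split():
        if word.lower() in STOP_WORDS:
            flush()
            piece2.append(word)       # stop words stay in clear text, in order
        else:
            run.append(word)
    flush()
    # Sort Piece 1 by digest so its entry order reveals nothing about where
    # phrases occurred (a plain dict would keep first-occurrence order).
    return dict(sorted(piece1.items())), piece2


if __name__ == "__main__":
    text = ("they suggested that the manifestations were as severe in the mother "
            "as in the sons and that this suggested autosomal dominant inheritance.")
    piece1, piece2 = make_pieces(text)
    for digest, phrase in piece1.items():
        print(digest, ">", phrase)
    print(" ".join(piece2))
```

Piece 1 holds six entries here, one per distinct phrase. Piece 2 holds the stop words and hashes in their original order.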
18
Piece 1 (the listing of phrases and their one-way hashes)
1. Contains no information on the frequency of occurrence of the phrases found in the original text (because recurring phrases map to the same hash code and appear as a single entry in Piece 1).
2. Contains no information that Alice can use to connect any patient to any particular patient record. Records do not exist as entities in Piece 1.
3. Contains no information on the order or locations of the phrases found in the original text.
4. Contains all the concepts found in the original text. Stop words are a popular method of parsing text into concepts.
5. Alice can transfer Piece 1 to a third party without violating HIPAA privacy rules or Common Rule human subject regulations (in the U.S.). For that matter, Alice can keep Piece 1 and add it to her database of Piece 1 files collected from all of her clients.
19
Properties of Piece 2
1. Contains no information that can be used to connect any patient to any particular patient record.
2. Contains nothing but hash values of phrases and stop words, in their correct order of occurrence in the original text.
3. Anyone obtaining Piece 1 and Piece 2 can reconstruct the original text.
4. The original text can be reconstructed from Piece 2 and any file into which Piece 1 has been merged. There is no necessity to preserve Piece 1 in its original form.
20
How the Threshold Algorithm works
Bob gives Piece 1 to Alice. Alice uses her software to transform or annotate each phrase from Piece 1. Alice sends the transformed Piece 1 to Bob, who uses his copy of Piece 2 to reconstruct the original file, now annotated with Alice's information.
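The round trip between Alice and Bob can be sketched as follows. This is a minimal version, assuming MD5 hashes and a short fragment of the example sentence. Alice's annotation software is replaced by a hypothetical lookup table. The function names `alice_annotate` and `bob_reconstruct` are illustrative, not taken from the published scripts.

```python
import hashlib


def md5_hex(phrase: str) -> str:
    """One-way hash of a phrase (MD5 assumed)."""
    return hashlib.md5(phrase.encode("utf-8")).hexdigest()


def alice_annotate(piece1: dict, lookup: dict) -> dict:
    """Alice's side: annotate every phrase in Piece 1. She never sees Piece 2."""
    return {digest: f"{phrase} [{lookup.get(phrase, 'unannotated')}]"
            for digest, phrase in piece1.items()}


def bob_reconstruct(annotated_piece1: dict, piece2: list) -> str:
    """Bob's side: replace each hash in Piece 2 with Alice's annotated phrase;
    stop words (anything that is not a known hash) pass through unchanged."""
    return " ".join(annotated_piece1.get(token, token) for token in piece2)


if __name__ == "__main__":
    # Bob's pieces for the fragment "the mother as in the sons".
    piece1 = {md5_hex(p): p for p in ("mother", "sons")}
    piece2 = ["the", md5_hex("mother"), "as", "in", "the", md5_hex("sons")]

    # Hypothetical stand-in for Alice's annotation software.
    lookup = {"mother": "family member", "sons": "family member"}

    annotated = alice_annotate(piece1, lookup)   # Alice returns this to Bob
    print(bob_reconstruct(annotated, piece2))
    # the mother [family member] as in the sons [family member]
```

Bob only needs Piece 2 and a file keyed by the same hashes. Alice's returned file can therefore be merged or transformed freely, as long as she keeps each hash attached to its phrase.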
21
Articles
- Popular one-way hash de-identification protocols are reviewed in: Berman JJ. Confidentiality for Medical Data Miners. Artificial Intelligence in Medicine. 26(1-2):25-36, 2002. http://65.222.228.150/jjb/jb_aim.pdf
- Berman JJ. Concept-Match Medical Data Scrubbing: How pathology datasets can be used in research. In press, Arch Pathol Lab Med (probably May or June 2003).
- Berman JJ. Threshold protocol for the exchange of confidential medical data. BMC Medical Research Methodology, 2002, 2:12. http://www.biomedcentral.com/bmcmedresmethodol/
22
Software (no warranties)
1. www.cpan.org - one-way hash algorithms (MD5 and SHA) in Perl
2. www.nlm.nih.gov/research/umls/ - download the Unified Medical Language System
3. http://65.222.228.150/jjb/goodcui.pl - Perl extractor script to produce an unencumbered subset of UMLS
4. http://65.222.228.150/jjb/parse.tar.gz - Perl sentence parsing, autocoding and Concept-Match class packages
5. http://65.222.228.150/jjb/thresh.tar.gz - gzipped Perl scripts for the threshold algorithm
Users should probably read the articles and have a working knowledge of Perl.
23
end