Clinical Natural Language Processing: Part I

1
Clinical Natural Language Processing Part I
  • Guergana K. Savova, PhD
  • Children's Hospital Boston and Harvard Medical
    School

2
Investigators (in alphabetical order)
  • Children's Hospital Boston and HMS (site PI:
    Guergana Savova)
  • MIT (site PI: Peter Szolovits)
  • MITRE Corporation (site PI: Lynette Hirschman)
  • Seattle Group Health (site PI: David Carrell)
  • SUNY Albany (site PI: Ozlem Uzuner)
  • University of California, San Diego (site PI:
    Wendy Chapman)
  • University of Colorado (site PI: Martha Palmer)
  • University of Pittsburgh (site PI: Henk Harkema)
  • University of Utah and Intermountain Healthcare
    (site PI: Peter Haug)

3
Special Acknowledgement
  • Our talented super software developers
  • Vinod Kaggal, lead
  • Dingcheng Li
  • Pei Chen
  • James Masanz

4
Overview
  • Part 1
  • Background and objectives of SHARP 4 cNLP project
  • Year 1 achievements
  • Clinical Text Analysis and Knowledge Extraction
    System (cTAKES)
  • Year 2 proposed projects
  • Graphical User Interface to cTAKES demo
  • Part 2
  • cTAKES demo

5
Aims
  • Information extraction (IE): transformation of
    unstructured text into structured representations
    and merging clinical data extracted from free
    text with structured data
  • Entity and Event discovery
  • Relation discovery
  • Normalization template: Clinical Element Model
    (CEM)
  • Overarching goal:
  • high-throughput phenotype extraction from
    clinical free text based on standards and the
    principles of interoperability
  • general-purpose clinical NLP tool with
    applications to the majority of all imaginable
    use cases

6
Processing Clinical Notes
A 43-year-old woman was diagnosed with type 2
diabetes mellitus by her family physician 3
months before this presentation. Her initial
blood glucose was 340 mg/dL. Glyburide 2.5 mg
once daily was prescribed. Since then,
self-monitoring of blood glucose (SMBG) showed
blood glucose levels of 250-270 mg/dL. She was
referred to an endocrinologist for further
evaluation. On examination, she was normotensive
and not acutely ill. Her body mass index (BMI)
was 18.7 kg/m2 following a recent 10 lb weight
loss. Her thyroid was symmetrically enlarged and
ankle reflexes absent. Her blood glucose was 272
mg/dL, and her hemoglobin A1c (HbA1c) was 10.3.
A lipid profile showed a total cholesterol of 261
mg/dL, triglyceride level of 321 mg/dL, HDL level
of 48 mg/dL, and an LDL of 150 mg/dL. Thyroid
function was normal. Urinalysis showed trace
ketones. She adhered to a regular exercise
program and vitamin regimen, smoked 2 packs of
cigarettes daily for the past 25 years, and
limited her alcohol intake to 1 drink daily. Her
mother's brother was diabetic.
7
Clinical Element Model
Disorder CEM
  • text: diabetes mellitus
  • code: 73211009
  • subject: patient
  • relative temporal context: 3 months ago
  • negation indicator: not negated
A 43-year-old woman was diagnosed with type 2
diabetes mellitus by her family physician 3
months before this presentation. Her initial
blood glucose was 340 mg/dL. Glyburide 2.5 mg
once daily was prescribed. Since then,
self-monitoring of blood glucose (SMBG) showed
blood glucose levels of 250-270 mg/dL. She was
referred to an endocrinologist for further
evaluation. On examination, she was normotensive
and not acutely ill. Her body mass index (BMI)
was 18.7 kg/m2 following a recent 10 lb weight
loss. Her thyroid was symmetrically enlarged and
ankle reflexes absent. Her blood glucose was 272
mg/dL, and her hemoglobin A1c (HbA1c) was 10.3.
A lipid profile showed a total cholesterol of 261
mg/dL, triglyceride level of 321 mg/dL, HDL level
of 48 mg/dL, and an LDL of 150 mg/dL. Thyroid
function was normal. Urinalysis showed trace
ketones. She adhered to a regular exercise
program and vitamin regimen, smoked 2 packs of
cigarettes daily for the past 25 years, and
limited her alcohol intake to 1 drink daily. Her
mother's brother was diabetic.
Medication CEM
  • text: Glyburide
  • code: 315989
  • subject: patient
  • frequency: once daily
  • negation indicator: not negated
  • strength: 2.5 mg
Tobacco Use CEM
  • text: smoking
  • code: 365981007
  • subject: patient
  • relative temporal context: 25 years
  • negation indicator: not negated
Disorder CEM
  • text: diabetes mellitus
  • code: 73211009
  • subject: family member
  • relative temporal context: (empty)
  • negation indicator: not negated
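
One way to picture the CEM instances on this slide as structured records is the minimal Python sketch below. The class name `CemInstance` and its field names are assumptions made for illustration and are not the actual CEM schema; only the values come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CemInstance:
    """Hypothetical flat record mirroring the fields shown on the slide."""
    cem_type: str                      # e.g. "Disorder", "Medication"
    text: str                          # span found in the note
    code: str                          # concept code as printed on the slide
    subject: str                       # "patient" or "family member"
    negated: bool = False              # negation indicator
    relative_temporal_context: Optional[str] = None
    frequency: Optional[str] = None    # only filled for medications
    strength: Optional[str] = None     # only filled for medications


cems = [
    CemInstance("Disorder", "diabetes mellitus", "73211009", "patient",
                relative_temporal_context="3 months ago"),
    CemInstance("Medication", "Glyburide", "315989", "patient",
                frequency="once daily", strength="2.5 mg"),
    CemInstance("Tobacco Use", "smoking", "365981007", "patient",
                relative_temporal_context="25 years"),
    CemInstance("Disorder", "diabetes mellitus", "73211009", "family member"),
]

for cem in cems:
    print(cem.cem_type, "|", cem.text, "|", cem.code, "|", cem.subject)
```

Keeping `subject` and `negated` on every record is what lets the two diabetes entries be told apart: same text and code, different subject.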
8
Comparative Effectiveness
Disorder CEM
  • text: diabetes mellitus
  • code: 73211009
  • subject: patient
  • relative temporal context: 3 months ago
  • negation indicator: not negated
Compare the effectiveness of different treatment
strategies (e.g., modifying target levels for
glucose, lipid, or blood pressure) in reducing
cardiovascular complications in newly diagnosed
adolescents and adults with type 2 diabetes.
Compare the effectiveness of traditional
behavioral interventions versus economic
incentives in motivating behavior changes (e.g.,
weight loss, smoking cessation, avoiding alcohol
and substance abuse) in children and adults.
Medication CEM
  • text: Glyburide
  • code: 315989
  • subject: patient
  • frequency: once daily
  • negation indicator: not negated
  • strength: 2.5 mg
Tobacco Use CEM
  • text: smoking
  • code: 365981007
  • subject: patient
  • relative temporal context: 25 years
  • negation indicator: not negated
Disorder CEM
  • text: diabetes mellitus
  • code: 73211009
  • subject: family member
  • relative temporal context: (empty)
  • negation indicator: not negated
9
Meaningful Use
Disorder CEM
  • text: diabetes mellitus
  • code: 73211009
  • subject: patient
  • relative temporal context: 3 months ago
  • negation indicator: not negated
  • Maintain problem list
  • Maintain active med list
  • Record smoking status
  • Provide clinical summaries for each office visit
  • Generate patient lists for specific conditions
  • Submit syndromic surveillance data

Medication CEM
  • text: Glyburide
  • code: 315989
  • subject: patient
  • frequency: once daily
  • negation indicator: not negated
  • strength: 2.5 mg
Tobacco Use CEM
  • text: smoking
  • code: 365981007
  • subject: patient
  • relative temporal context: 25 years
  • negation indicator: not negated
Disorder CEM
  • text: diabetes mellitus
  • code: 73211009
  • subject: family member
  • relative temporal context: (empty)
  • negation indicator: not negated
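
As a rough illustration of how CEM attributes could feed the Meaningful Use items on this slide, the sketch below derives a problem list, an active medication list, and a smoking status from plain dictionaries. The dictionary keys and function names are assumptions made for this example, and the rule that family-member and negated entries are excluded is an illustrative choice rather than something the slides specify.

```python
cems = [
    {"type": "Disorder", "text": "diabetes mellitus", "code": "73211009",
     "subject": "patient", "negated": False},
    {"type": "Medication", "text": "Glyburide", "code": "315989",
     "subject": "patient", "negated": False},
    {"type": "Tobacco Use", "text": "smoking", "code": "365981007",
     "subject": "patient", "negated": False},
    {"type": "Disorder", "text": "diabetes mellitus", "code": "73211009",
     "subject": "family member", "negated": False},
]


def asserted_for_patient(cem):
    """Keep only findings that are about the patient and not negated."""
    return cem["subject"] == "patient" and not cem["negated"]


def select(records, cem_type):
    """Return the text of every asserted patient CEM of the given type."""
    return [r["text"] for r in records
            if r["type"] == cem_type and asserted_for_patient(r)]


problem_list = select(cems, "Disorder")
active_meds = select(cems, "Medication")
smoker = bool(select(cems, "Tobacco Use"))

print(problem_list, active_meds, smoker)
```

Without the subject check, the uncle's diabetes would appear a second time on the patient's own problem list.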
10
Clinical Practice
Disorder CEM
  • text: diabetes mellitus
  • code: 73211009
  • subject: patient
  • relative temporal context: 3 months ago
  • negation indicator: not negated
  • Provide problem list and meds from the visit

Medication CEM
  • text: Glyburide
  • code: 315989
  • subject: patient
  • frequency: once daily
  • negation indicator: not negated
  • strength: 2.5 mg
11
Applications
  • Meaningful use of the EMR
  • Comparative effectiveness
  • Clinical investigation
  • Patient cohort identification
  • Phenotype extraction
  • Epidemiology
  • Clinical practice
  • ...

12
How does NLP fit?
  • Demo pipeline, v1
  • All medications in Mayo dataset extracted with
    cTAKES (NLP method)
  • Processed 360,452 notes for 10,000 patients
  • 3,442,000 CEMs were created
  • Processing time: 1.6 sec/doc
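
The figures on this slide imply a few derived quantities, worked out in the short sketch below. It assumes strictly serial processing at the quoted 1.6 seconds per document, which the slide does not state; any parallelism would shorten the wall-clock time.

```python
notes = 360_452          # notes processed (from the slide)
patients = 10_000        # patients covered (from the slide)
cems = 3_442_000         # CEMs created (from the slide)
seconds_per_doc = 1.6    # processing time per document (from the slide)

# Assumption: documents are processed one after another on a single worker.
total_seconds = notes * seconds_per_doc
total_hours = total_seconds / 3600
notes_per_patient = notes / patients
cems_per_note = cems / notes

print(f"serial processing time: {total_hours:.1f} hours")
print(f"notes per patient: {notes_per_patient:.1f}")
print(f"CEMs per note: {cems_per_note:.2f}")
```

Under that assumption the run takes roughly 160 hours, with about 36 notes per patient and between 9 and 10 CEMs per note.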

13
Year 1
14
Y1 Technical and Scientific Activities
  • Gold standard corpus development
  • corpus creation methodology
  • de-id and PHI surrogate generation tools
  • seed corpus generation (PAD, pneumonia, breast
    cancer)
  • annotation schema development based on CEM
    normalization target
  • annotation guidelines and pilot annotations
  • gold standard annotations are in progress
  • Type System for software development
  • Development of Evaluation workbench
  • Methods development
  • entity and event discovery
  • relation discovery

15
Y1 Software Deliverables (cTAKES modules)
[Timeline chart: July 2010 through June 2011; chart
contents not in transcript]
16
SHARP Security Roundtable for Cloud-Deployed cNLP
  • May 23-24, 2011
  • Participants: SHARP 1, SHARP 4, health care
    organizations, the Veterans Administration,
    industry, and other research institutions
  • Providing guidance to institutions seeking to use
    cloud technologies to support development and
    application of cNLP tools
  • A set of recommendations for the novel legal and
    governance issues regarding the proper
    stewardship and use of clinical data

17
SHARP Collaborations
  • SHARP 1
  • Around security in a cloud computing environment
  • SHARP 3 (SMaRT)
  • Around extraction of data from the clinical
    narrative
  • i2b2 database for data persistence?

18
Partnerships
  • NCBC-funded initiatives
  • Integrating Informatics and Biology to the
    Bedside (i2b2)
  • Integrating Data for Analysis, Anonymization and
    Sharing (iDASH)
  • Ontology Development and Information Extraction
    (ODIE)
  • Veterans Administration
  • R01s
  • Shared annotated lexical resource
  • Temporal relation discovery for the clinical
    domain
  • Multi-source integrated platform for answering
    clinical questions
  • University of York (UK), University of Trento
    (Italy), Brandeis University (USA)
  • eMERGE, PGRN (Pharmacogenomics Research Network)

19
clinical Text Analysis and Knowledge Extraction
System (cTAKES)
20
(No Transcript)
21
Overview
  • Goal
  • Phenotype extraction
  • Generic: to be used for a variety of retrievals
    and use cases
  • Expandable at the information model level and
    methods
  • Modular
  • Cutting-edge technologies: best methods
    combining existing practices and novel research
    with rapid technology transfer
  • Terminology agnostic: able to plug in any
    terminology
  • Best software practices (80M notes)
  • Stand-alone tool, easily pluggable within other
    platforms/toolsets
  • Apache v2.0 license
  • http://sourceforge.net/projects/ohnlp/
  • Commitment to both R and D in R&D

22
cTAKES Adoption
  • May, 2011
  • 2306 downloads
  • i2b2 NLP cell integration: relevance to CTSAs
  • eMERGE (SGH, NW)
  • PGRN (HMS, NW)
  • Extensions: Yale (YTEX), MITRE

Source: http://sourceforge.net/project/stats/?group_id=255545&ugn=ohnlp&type=&mode=alltime
23
cTAKES Technical Details
  • Open source
  • Apache v2.0 license
  • http://sourceforge.net/projects/ohnlp/
  • Java 1.5
  • Framework
  • IBM's Unstructured Information Management
    Architecture (UIMA) open source framework, Apache
    project
  • Methods
  • Natural Language Processing methods (NLP)
  • Based on standards and conventions to foster
    interoperability
  • Application
  • High-throughput system

24
cTAKES Components
  • Sentence boundary detection (OpenNLP technology)
  • Tokenization (rule-based)
  • Morphologic normalization (NLM's LVG)
  • POS tagging (OpenNLP technology)
  • Shallow parsing (OpenNLP technology)
  • Named Entity Recognition
  • Dictionary mapping (lookup algorithm)
  • Machine learning (MAWUI)
  • types: diseases/disorders, signs/symptoms,
    anatomical sites, procedures, medications
  • Negation and context identification (NegEx)
  • Dependency parser
  • Drug Profile module
  • Smoking status classifier
  • CEM normalization module
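
The component chain listed on this slide can be pictured with a deliberately tiny sketch: rule-based sentence splitting, a dictionary lookup for named entities, and a NegEx-style check for a negation trigger before the entity. This is not cTAKES code and it does not use OpenNLP or UIMA; the dictionary entries, the trigger list, the five-token window, and all function names are assumptions chosen for illustration. The sample text is adapted from the clinical note earlier in the deck, with "no ketones" substituted so that the negation step has something to detect.

```python
import re

# Toy dictionary: surface form -> semantic type (illustrative entries only).
DICTIONARY = {
    "diabetes mellitus": "disease/disorder",
    "ketones": "sign/symptom",
    "thyroid": "anatomical site",
    "glyburide": "medication",
}

# A few negation triggers in the spirit of NegEx (not the full NegEx list).
NEGATION_TRIGGERS = ("no", "not", "denies", "without")


def split_sentences(text):
    """Naive sentence boundary detection: split after ., ! or ? plus space."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_entities(sentence):
    """Dictionary lookup plus a look-back for a negation trigger."""
    lowered = sentence.lower()
    found = []
    for term, semantic_type in DICTIONARY.items():
        match = re.search(r"\b" + re.escape(term) + r"\b", lowered)
        if match is None:
            continue
        # Assumed window: the five alphabetic tokens before the entity.
        preceding = re.findall(r"[a-z]+", lowered[:match.start()])
        negated = any(tok in NEGATION_TRIGGERS for tok in preceding[-5:])
        found.append({"text": term, "type": semantic_type, "negated": negated})
    return found


# Adapted sample: the original note says "trace ketones".
note = ("She was diagnosed with type 2 diabetes mellitus. "
        "Glyburide 2.5 mg once daily was prescribed. "
        "Urinalysis showed no ketones.")

results = [entity for sentence in split_sentences(note)
           for entity in find_entities(sentence)]
for entity in results:
    print(entity)
```

Running negation detection per sentence keeps a trigger in one sentence from negating an entity in the next.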

25
Output Example: Drug Object
  • Tamoxifen 20 mg po daily started on March 1,
    2005.
  • Drug
  • Text: Tamoxifen
  • Associated code: C0351245
  • Strength: 20 mg
  • Start date: March 1, 2005
  • End date: null
  • Dosage: 1.0
  • Frequency: 1.0
  • Frequency unit: daily
  • Duration: null
  • Route: Enteral Oral
  • Form: null
  • Status: current
  • Change Status: no change
  • Certainty: null
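
To make the mapping from sentence to drug object concrete, here is a minimal regex-based sketch that fills a subset of the fields for the example sentence. It is not how the cTAKES Drug Profile module works; the pattern, the lookup tables, and the `DrugMention` class are assumptions for illustration, and fields such as the associated code, dosage, and status are left out because a single regular expression cannot derive them.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup tables covering only the example sentence.
ROUTES = {"po": "Enteral Oral"}            # abbreviation -> route label
FREQUENCIES = {"daily": (1.0, "daily")}    # word -> (times, unit)

PATTERN = re.compile(
    r"(?P<drug>[A-Z][a-z]+)\s+"
    r"(?P<strength>\d+(?:\.\d+)?\s*mg)\s+"
    r"(?P<route>\w+)\s+"
    r"(?P<freq>\w+)"
    r"(?:\s+started on\s+(?P<start>[A-Z][a-z]+ \d{1,2}, \d{4}))?"
)


@dataclass
class DrugMention:
    text: str
    strength: str
    route: Optional[str]
    frequency: Optional[float]
    frequency_unit: Optional[str]
    start_date: Optional[str]
    end_date: Optional[str] = None   # never stated in the example sentence


def parse_drug(sentence):
    """Return a DrugMention for the first match, or None if nothing matches."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    times, unit = FREQUENCIES.get(match["freq"].lower(), (None, None))
    return DrugMention(
        text=match["drug"],
        strength=match["strength"],
        route=ROUTES.get(match["route"].lower()),
        frequency=times,
        frequency_unit=unit,
        start_date=match["start"],
    )


mention = parse_drug("Tamoxifen 20 mg po daily started on March 1, 2005.")
print(mention)
```

Unknown routes or frequency words come back as `None`, matching the slide's use of null for attributes the text does not supply.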

26
Conversion to CEMs
27
Year 2 and Forward
28
The patient returns to the outpatient clinic
today for follow-up
[Diagram: timeline (Today: Oct 28, 2009; FUTURE);
event "return" with Agent "patient" and Loc "clinic"]
the patient will complete his thiotepa dose today,
and he will return tomorrow for the last dose
of his thiotepa. His donor completed stem-cell
collection yesterday
Courtesy of Martha Palmer
29
The patient returns to the outpatient clinic
today for follow-up the patient will complete his
thiotepa dose today
[Diagram: timeline (Today: Oct 28, 2009; FUTURE);
events "return" (Agent: patient; Loc: clinic) and
"complete" (Agent: patient; Theme: thiotepa dose)]
and he will return tomorrow for the last dose
of his thiotepa. His donor completed stem-cell
collection yesterday
Courtesy of Martha Palmer
30
The patient returns to the outpatient clinic
today for follow-up the patient will complete his
thiotepa dose today, and he will return tomorrow
for the last dose of his thiotepa.
[Diagram: timeline (Today: Oct 28, 2009; Tomorrow:
Oct 29, 2009; FUTURE); events "return" (Agent:
patient; Loc: clinic), "complete" (Agent: patient;
Theme: thiotepa dose), and a second "return"
(Agent: patient; Purpose: thiotepa last dose)]
His donor completed stem-cell collection yesterday
Courtesy of Martha Palmer
31
[Diagram: timeline (Yesterday: Oct 27, 2009; Today:
Oct 28, 2009; Tomorrow: Oct 29, 2009; FUTURE);
events from the previous slide plus "completed"
linking "donor" and "stem-cell collection" (labels:
Agent, Action); coreference: patient's donor]
The patient returns to the outpatient clinic
today for follow-up the patient will complete his
thiotepa dose today, and he will return
tomorrow for the last dose of his thiotepa. His
donor completed stem-cell collection yesterday
Courtesy of Martha Palmer
32
[Diagram: same events and roles as the previous
slide, now placed on a PAST / Today / FUTURE
timeline (Yesterday: Oct 27, 2009; Today: Oct 28,
2009; Tomorrow: Oct 29, 2009) and connected by
temporal relation labels TERMINATES and OVERLAP;
coreference: patient's donor]
The patient returns to the outpatient clinic
today for follow-up the patient will complete his
thiotepa dose today, and he will return
tomorrow for the last dose of his thiotepa. His
donor completed stem-cell collection yesterday
Courtesy of Martha Palmer
33
[Timeline, PAST to FUTURE]
  • Oct 27, 2009: Donor stem-cell collection completed
  • Oct 28, 2009: Patient return to clinic, thiotepa
    dose
  • Oct 29, 2009: Final thiotepa dose
The patient returns to the outpatient clinic
today for follow-up the patient will complete his
thiotepa dose today, and he will return
tomorrow for the last dose of his thiotepa.
His donor completed stem-cell collection
yesterday
Courtesy of Martha Palmer
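
The final timeline can be reproduced with a small sketch that anchors "yesterday", "today", and "tomorrow" to the document date and sorts the events. It covers only these three expressions and is not the temporal relation method used in the project; the offset table and the function name `resolve` are assumptions for illustration.

```python
from datetime import date, timedelta

# Offsets in days relative to the document creation date.
RELATIVE_DAYS = {"yesterday": -1, "today": 0, "tomorrow": 1}


def resolve(expression, document_date):
    """Map a relative temporal expression to a calendar date."""
    return document_date + timedelta(days=RELATIVE_DAYS[expression.lower()])


document_date = date(2009, 10, 28)   # "Today" on the slides

# (event description, temporal expression) pairs read off the slides.
events = [
    ("Patient return to clinic, thiotepa dose", "today"),
    ("Final thiotepa dose", "tomorrow"),
    ("Donor stem-cell collection completed", "yesterday"),
]

timeline = sorted((resolve(expr, document_date), label)
                  for label, expr in events)

for when, label in timeline:
    if when < document_date:
        tense = "PAST"
    elif when > document_date:
        tense = "FUTURE"
    else:
        tense = "TODAY"
    print(when.isoformat(), tense, label)
```

Anchoring to the document date rather than the processing date matters, since a note is usually processed long after it was written.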
34
Y2 Proposed Deliverables
  • Release of a library of de-identification tools
    (Sept, 2011)
  • MIST
  • MIT/SUNY
  • Evaluation workbench (Sept, 2011)
  • cTAKES Side Effects module (Aug, 2011)
  • Modules for relation extraction (Dec, 2011)
  • Semantic role labeler
  • Relation classifier
  • Integration of CLEAR-TK (University of Colorado)
  • End-to-end tool, v2 (cTAKES v2) (April, 2012)
  • NLP to populate CEMs for Diseases, Signs/Symptoms,
    Procedures, Labs, Anatomical sites
  • Integration of LexGrid/LexEVS services

35
Development Challenges and Opportunities
  • Open source strategy
  • Release early, release often
  • Test-driven development with continuous
    integration
  • All milestones measured by what we can get IRB
    and DUA approved and deployed with real or
    de-identified clinical data

36
Courtesy of David Carrell
37
Partnerships
  • Strengthen existing SHARP collaborations
  • Initiate collaborations with SHARP 2 around
    usability
  • SHARP 1 methods for data security in a
    cloud-deployed framework
  • i2b2: the glue between SHARP 3 and SHARP 4
  • Non-SHARP collaborations

38
Graphical User Interface (GUI) to cTAKES: a
Prototype
Pei Chen, Children's Hospital Boston
39
cTAKES as a Service
  • Objectives
  • Demo cTAKES prototype web application
  • Empower End Users to leverage cTAKES
  • Gather feedback for future cTAKES GUI
  • Potential system integrations with other
    applications (e.g., i2b2, ARC, Web Annotator)
  • Developed within i2b2 to integrate cTAKES in the
    i2b2 NLP cell

40
Live Demo
  • cTAKES Web Application

http://chipweb2.chip.org/cTakes_webservice_trunk/index.html
41
Single clinical note
42-52
(No Transcript)
53
Technologies
  • Middleware
  • Web Services
  • Java
  • Apache CXF
  • JSON
  • Front-End
  • Web GUI
  • ExtJS
  • JavaScript
  • Back-End
  • cTAKES
  • Java
  • UIMA

54
Deployment Considerations
  • Deployment Model
  • Security
  • Performance
  • Licensing (UMLS, Apache, GPL v.3)

55
Thoughts?