Christopher Rhodes - PowerPoint PPT Presentation

Provided by: cms3
Learn more at: http://cms.uhd.edu

1
Clinical Free Text Processing
  • Christopher Rhodes
  • UHD-REU
  • 17 July 2009

2
Clinical Free Text
  • Primary data about patients
  • As opposed to journal articles
  • Problems posed to Natural Language Processing
  • Documents need to be edited for confidentiality
    reasons, which takes time and money
  • These texts do not follow a strictly edited format,
    so they can contain various sub-language
    characteristics (fragmented sentences,
    abbreviations, doctor-dependent notes, etc.)

3
ICD-9-CM
  • "The International Classification of Diseases,
    9th Revision, Clinical Modification" (ICD-9-CM)

ICD-9-CM Examples
Condition                 ICD-9-CM codes
Asthma                    493.xx
Diabetes                  250.xx
Hyperlipidemia            272.0 - 272.4
Arthritis                 714.0 - 715.9
Hypertension              401.1 - 401.9
Ischemic heart disease    410.0 - 414.9x
Depression/dysthymia      296.2, 296.3, 296.82, 296.9, 300.4, 309.0, 309.1, 311
x refers to any possible number in that subset
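
The notation in the table can be checked mechanically. The following is a minimal Python sketch, not part of the original slides, that assumes each "x" stands for exactly one digit and that ranges are inclusive and numeric; the function names are hypothetical.

```python
import re


def matches_pattern(code: str, pattern: str) -> bool:
    """Return True if code fits pattern, reading each 'x' as exactly one digit."""
    regex = "".join(r"\d" if ch == "x" else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, code) is not None


def in_range(code: str, low: str, high: str) -> bool:
    """Return True if a numeric code lies between low and high, inclusive.

    Assumption: an 'x' in a bound is widened to 0 in the lower bound
    and to 9 in the upper bound.
    """
    lo = float(low.replace("x", "0"))
    hi = float(high.replace("x", "9"))
    return lo <= float(code) <= hi


if __name__ == "__main__":
    print(matches_pattern("493.90", "493.xx"))   # asthma pattern
    print(in_range("272.2", "272.0", "272.4"))   # hyperlipidemia range
    print(in_range("414.95", "410.0", "414.9x")) # ischemic heart disease range
```

This sketch only handles purely numeric codes; alphanumeric codes would need a different comparison.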
4
The Research
  • Automated system for assigning ICD-9-CM codes
    using Natural Language Processing
  • The types of files: radiology reports
  • Contain a majority ICD-9-CM code
  • Contain clinical history and impressions
  • Set of training and testing data

5
The Research
  • Testing File -> Parsed Testing File (miniPar)

6
Where We Are
  • Beginning State
  • Complete program ran with 50.9% accuracy
  • Merged training files by specific code
  • Possible advances that didn't work
  • Changing from the total summed score over all
    training sentences in a training document to the
    highest individual sentence score among those
    training sentences (i.e., ideally the best
    sentence match)
  • Manipulating the way the score/weight is
    calculated for the above method
  • Current state
  • Reverted to the original program's algorithm for
    finding the score
  • Normalized the merged training files (score /
    total sentences)
  • Complete program now runs with 60.5% accuracy
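
The three scoring variants on this slide (summed, best single sentence, and summed divided by total sentences) can be compared in a small Python sketch. The slides do not say how two sentences are scored against each other, so the word-overlap function below is only a stand-in, and the training data and function names are made up for illustration.

```python
def overlap(a: str, b: str) -> int:
    """Stand-in sentence score: count of distinct lowercase words shared."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def summed_score(test: str, training: list[str]) -> float:
    """Total score over all training sentences of one merged file."""
    return float(sum(overlap(test, s) for s in training))


def best_sentence_score(test: str, training: list[str]) -> float:
    """Highest individual sentence score (the variant that did not help)."""
    return float(max((overlap(test, s) for s in training), default=0))


def normalized_score(test: str, training: list[str]) -> float:
    """Summed score divided by the number of training sentences."""
    return summed_score(test, training) / len(training) if training else 0.0


def predict(test: str, merged: dict[str, list[str]], scorer) -> str:
    """Return the code whose merged training file scores highest."""
    return max(merged, key=lambda code: scorer(test, merged[code]))


# Toy merged training files keyed by code (illustrative data only).
MERGED = {
    "486": ["pneumonia in the right lower lobe"],
    "786.2": ["cough", "chronic cough", "cough and fever",
              "persistent cough", "dry cough"],
}

if __name__ == "__main__":
    sentence = "cough with right lower lobe pneumonia"
    print(predict(sentence, MERGED, summed_score))
    print(predict(sentence, MERGED, normalized_score))
```

In this toy data the larger merged file wins on the raw sum simply because it has more sentences, which is the bias that dividing by the sentence count is meant to remove.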

7
Future Hopes
  • Word Importance
  • Medical words should receive higher priority
    (score) than non-medical words that are matched
    between sentences. We will use the UMLS medical
    database for active word searching and comparing.
  • Negation
  • Search for words like "no", "hardly", "none",
    "doesn't", and "not" so that the program deals
    accurately with certain training documents
  • For instance, "No pneumonia" should not match a
    training document with the code for pneumonia,
    but it will if negation is not taken into
    consideration
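
Both planned features can be illustrated together in a short Python sketch. It is an assumption-laden illustration rather than the project's implementation: the small MEDICAL_TERMS set stands in for a real UMLS lookup, the three-word negation window is an arbitrary choice, and all names are hypothetical.

```python
import re

NEGATION_CUES = {"no", "hardly", "none", "doesn't", "not"}
MEDICAL_TERMS = {"pneumonia", "effusion", "cough"}  # toy stand-in for UMLS


def tokenize(sentence: str) -> list[str]:
    """Lowercase the sentence and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())


def split_by_negation(sentence: str, window: int = 3):
    """Return (affirmed, negated) word sets.

    Assumption: a cue negates the next `window` tokens.
    """
    tokens = tokenize(sentence)
    negated = set()
    for i, tok in enumerate(tokens):
        if tok in NEGATION_CUES:
            negated.update(tokens[i + 1:i + 1 + window])
    affirmed = set(tokens) - negated - NEGATION_CUES
    return affirmed, negated


def weighted_match(test: str, train: str, medical_weight: float = 2.0) -> float:
    """Score shared words, counting a word only if both sentences agree
    on whether it is negated; medical words get a higher weight."""
    t_aff, t_neg = split_by_negation(test)
    r_aff, r_neg = split_by_negation(train)
    shared = (t_aff & r_aff) | (t_neg & r_neg)
    return float(sum(medical_weight if w in MEDICAL_TERMS else 1.0
                     for w in shared))


if __name__ == "__main__":
    train = "pneumonia in the right lower lobe"
    print(weighted_match("No pneumonia", train))
    print(weighted_match("pneumonia in the left lobe", train))
```

With these assumptions, "No pneumonia" no longer scores against a sentence that affirms pneumonia, while an affirmed mention still matches and is weighted more heavily than the surrounding non-medical words.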

8
Goal
  • Our goal in adding the previous two features
    to our program is to reach about 80% accuracy or
    more.
  • The top accuracy for this type of program was
    submitted by Szeged, at 89.08%.

9
References
  • A Shared Task Involving Multi-label
    Classification of Clinical Free Text - John P.
    Pestian, Christopher Brew, Paweł Matykiewicz, DJ
    Hovermale, Neil Johnson, K. Bretonnel Cohen,
    Włodzisław Duch