caTIES Processing Text to Share Data and Tissue

1
caTIES Processing Text to Share Data and Tissue
  • Rebecca Crowley, Kevin Mitchell, and Wendy
    Chapman
  • Centers for Oncology and Pathology Informatics
  • Center for Biomedical Informatics
  • crowleyrs_at_upmc.edu
  • June 4, 2006

2
caTIES project goals
  • Support research (clinical, translational, basic)
    that uses human tissue or tissue related data by
    providing de-identified, concept-based access to
    surgical pathology report data associated with
    tissue slides and blocks.
  • caTIES v2.0 is now in place at four institutions:
    University of Pittsburgh, University of
    Pennsylvania, Thomas Jefferson University, and
    Washington University in St. Louis

3
Potential Uses
  • FINAL DIAGNOSIS:
  • PART 1: LYMPH NODES, LEFT PELVIC,
    LYMPHADENECTOMY
  • SIX BENIGN LYMPH NODES WITH NO
    EVIDENCE OF MALIGNANCY.
  • PART 2: LYMPH NODES, RIGHT PELVIC,
    LYMPHADENECTOMY
  • TWO BENIGN LYMPH NODES WITH NO
    EVIDENCE OF MALIGNANCY.
  • PART 3: PROSTATE AND SEMINAL VESICLES,
    RADICAL PROSTATECTOMY
  • INVASIVE MODERATELY DIFFERENTIATED PROSTATIC
    ADENOCARCINOMA WITH A COMBINED GLEASON SCORE OF
    3+3=6.
  • PATHOLOGIC STAGE: T2bN0MX
  • Case finding for paraffin materials
  • Automated annotation for caTISSUE CAE
  • Text-mining
  • Automation of clinical trials eligibility
    screening
  • Cancer Surveillance

4
The tissue bank is
  • Small (hundreds to thousands of specimens)
  • Annotated by human abstraction and manual data
    entry
  • New (less than 10 years old)
  • Focused
  • Mainly Frozen Tissue
  • Storage QA/QC
  • RNA studies

5
The paraffin (clinical) archive is
  • Large (Millions of Specimens)
  • Directly Documented by Pathology
  • Old (Decades)
  • Ubiquitous, General
  • DNA and Protein Studies
  • TMA
  • Rare and Discontinued Tumors
  • Tissue Across Generations
  • Fairly Intact

6
Problems to solve
  • Reduce unnecessary barriers and unnecessary work
  • Make entire corpus de-identified and searchable
  • Improve retrieval
  • Many ways to say the same thing (synonymy)
  • Pertinent negatives (negation)
  • Discourse inferences
  • Information Extraction
  • Temporality and aggregation
  • Create network of institutions sharing data and
    building collaborations
  • Interoperable with other systems

7
Long Term Project Goals
  • Develop software with value to individual cancer
    centers as well as caBIG matrixed researchers
  • Use caTIES as a platform for research - improve
    our ability to extract meaningful information
    from free-text pathology reports

8
Current caTIES functionality
  • 1. Secure communication between participating
    organizations, coordinated by Open Grid Services
    Architecture (OGSA) services for data transfer
    and de-identification
  • 2. Concept-coding of surgical pathology reports
    using a GATE-based NLP pipeline
  • 3. Databases for private and sharable information,
    accessible via OGSA Data Access and Integration
    (OGSA-DAI) services; OGSA-DAI is an extension of
    OGSA
  • 4. A graphical user interface for concept-based
    querying across the network, role-based
    administration of access, and protocol-based
    ordering of tissue samples
  • 5. A set of semantic annotations and services
    based on caCORE and caGrid, which foster semantic
    interoperability between caTIES and other caBIG
    applications

9
Problems to solve
  • Reduce unnecessary barriers and unnecessary work
  • Make entire corpus de-identified and searchable
  • Improve retrieval
  • Many ways to say the same thing (synonymy)
  • Pertinent negatives (negation)
  • Discourse inferences
  • Information Extraction
  • Temporality and aggregation
  • Create network of institutions sharing data and
    building collaborations
  • Interoperable with other systems

10
Before HIPAA

11
After HIPAA

12
What we hope will happen with caTIES

13
Problems to solve
  • Reduce unnecessary barriers and unnecessary work
  • Make entire corpus de-identified and searchable
  • Improve retrieval
  • Many ways to say the same thing (synonymy)
  • Pertinent negatives (negation)
  • Temporality and aggregation
  • Create network of institutions sharing data and
    building collaborations
  • Interoperable with other systems

15
Text Acquisition
  • Loads the Identified caTIES MySQL datastore
  • Minimize manual processes
  • Text acquisition service for bulk loads
  • Chunker for text reports
  • HL7 acquisition service at Penn
  • Cerner/CoPath caBIG Data Extractor allows
    scheduled batch input from Cerner CoPath
  • Would like to get other LIS Adapters

16
De-Identification Service
  • caTIES De-Identification service scrubs pathology
    report, creates de-identified identifiers, loads
    De-Identified caTIES datastore
  • caTIES de-identification service wraps the De-ID
    software (Saul, Cooper, et al.)
  • Safe Harbor method removes HIPAA-mandated
    identifiers
  • Creates tokens for names and preserves temporal
    relationships
  • De-ID works with adopters as each site comes
    on-line
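
As a rough illustration of the two steps "creates tokens for names" and "preserves temporal relationships", the Python sketch below replaces known names with stable tokens and moves every date in a report by one fixed offset, so the interval between events is unchanged. This is not the De-ID software that caTIES wraps. Date shifting, the function names, the token format, and the MM/DD/YYYY date pattern are all assumptions made for the example; the slides do not say how these steps are implemented.

```python
import hashlib
import re
from datetime import datetime, timedelta

# Assumed date layout for the example: MM/DD/YYYY.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def name_token(name, salt):
    """Return a stable, non-reversible token for a name (hypothetical format)."""
    digest = hashlib.sha256((salt + name.lower()).encode("utf-8")).hexdigest()
    return "**NAME[" + digest[:8] + "]"


def shift_dates(text, offset_days):
    """Move every date by the same number of days so intervals are preserved."""
    def _shift(match):
        month, day, year = (int(g) for g in match.groups())
        try:
            shifted = datetime(year, month, day) + timedelta(days=offset_days)
        except (ValueError, OverflowError):
            return match.group(0)  # not a real calendar date; leave untouched
        return shifted.strftime("%m/%d/%Y")
    return DATE_RE.sub(_shift, text)


def scrub(text, known_names, salt, offset_days):
    """Replace each known name with its token, then shift all dates."""
    for name in known_names:
        token = name_token(name, salt)
        text = re.sub(re.escape(name), lambda _m, t=token: t, text,
                      flags=re.IGNORECASE)
    return shift_dates(text, offset_days)


if __name__ == "__main__":
    report = "Jane Doe: biopsy 01/30/2005, resection 02/02/2005."
    print(scrub(report, ["Jane Doe"], salt="demo", offset_days=-10))
```

In this sketch the same name always yields the same token, so repeated mentions of one person remain linked within the de-identified text, and both dates move together, so the three-day gap between biopsy and resection survives.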

17
De-Identification Service

18
Preserving identifying relationship
  • De-ID service creates linkage through key pair on
    identified side of the system
  • Honest Broker view of data behind the firewall
    allows only HB to see identifiers in order to
    fill requests for tissue
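
The key-pair linkage and the Honest Broker view can be pictured with a small Python sketch. It assumes a lookup table held only on the identified side that pairs each identified report key with a random de-identified key, and it allows the reverse lookup only for a caller in the honest broker role. The class name, method names, and role string are hypothetical and are not taken from caTIES.

```python
import uuid


class LinkageTable:
    """Hypothetical key-pair store kept on the identified side of the firewall."""

    def __init__(self):
        self._identified_to_deid = {}
        self._deid_to_identified = {}

    def register(self, identified_id):
        """Return the de-identified key for a report, creating it on first use."""
        if identified_id not in self._identified_to_deid:
            deid_id = uuid.uuid4().hex  # random, carries no patient information
            self._identified_to_deid[identified_id] = deid_id
            self._deid_to_identified[deid_id] = identified_id
        return self._identified_to_deid[identified_id]

    def resolve(self, deid_id, role):
        """Map a de-identified key back to the identified key (honest broker only)."""
        if role != "honest_broker":
            raise PermissionError("only an honest broker may see identifiers")
        return self._deid_to_identified[deid_id]


if __name__ == "__main__":
    table = LinkageTable()
    key = table.register("ACCESSION-0001")  # made-up accession number
    print(key, "->", table.resolve(key, role="honest_broker"))
```

A researcher would only ever see the random key; the honest broker uses `resolve` to locate the physical block or slide when a tissue request arrives.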

19
Problems to solve
  • Reduce unnecessary barriers and unnecessary work
  • Make entire corpus de-identified and searchable
  • Improve retrieval
  • Many ways to say the same thing (synonymy)
  • Pertinent negatives (negation)
  • Temporality and aggregation
  • Create network of institutions sharing data and
    building collaborations
  • Interoperable with other systems

20
Pipeline Service

21
Current Processing Resources
  • (1) Resetter - clears the document and deletes
    existing annotations
  • (2) Tokeniser - tokenizes words, numbers,
    punctuation, and spaces
  • (3) Chunker - parses reports into sections,
    parts, sentences, and phrases
  • (4) Spell-checker - identifies erroneous
    spelling and suggests frequency-based corrections
  • (5) RegEx - annotates a pre-defined set of
    attribute-value pairs such as tumor grade and
    stage
  • (6) Vocabulary Concept Tagger - maps fragments
    of free text to the associated concepts from a
    controlled terminology
  • (7) Semantic-type Filter - removes concepts
    associated with unwanted semantic types
  • (8) NegEx - implements the NegEx negation detection
    algorithm to tag explicitly negated concepts
  • (9) Semantic-type Categorization - extracts
    organs, procedures, and diseases; infers topology of
    concepts as modifiers of other concepts
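
To make a few of these stages concrete, here is a toy Python sketch of three of them: regular-expression extraction of attribute-value pairs, dictionary-based concept tagging, and a much-simplified NegEx-style negation check. It is not the GATE pipeline and not the full NegEx algorithm. The vocabulary, the concept codes (X001 and so on), the trigger list, the five-token window, and the naive sentence splitter are all assumptions chosen to keep the example short. The sample text is adapted from the report on slide 3, with the Gleason score written as 3+3=6.

```python
import re

SENTENCE_RE = re.compile(r"[^.]+")  # naive splitter: breaks on every period
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")
STAGE_RE = re.compile(r"\b(p?T[0-4][a-c]?N[0-3X]M[01X])\b")
GLEASON_RE = re.compile(r"GLEASON SCORE OF\s+(\d)\s*\+\s*(\d)", re.IGNORECASE)

# Toy vocabulary: phrase -> made-up concept code (not real terminology codes).
VOCABULARY = {
    "adenocarcinoma": "X001",
    "malignancy": "X002",
    "lymph nodes": "X003",
}
NEGATION_TRIGGERS = ("no evidence of", "negative for", "without")
WINDOW = 5  # tokens after a trigger that are treated as negated


def extract_attributes(text):
    """RegEx stage: pull out stage and Gleason score as attribute-value pairs."""
    attributes = {}
    stage = STAGE_RE.search(text)
    if stage:
        attributes["stage"] = stage.group(1)
    gleason = GLEASON_RE.search(text)
    if gleason:
        attributes["gleason_score"] = int(gleason.group(1)) + int(gleason.group(2))
    return attributes


def _find(tokens, phrase_tokens):
    """Return every index at which phrase_tokens occurs inside tokens."""
    n = len(phrase_tokens)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase_tokens]


def tag_concepts(text):
    """Concept tagger plus negation check, applied sentence by sentence."""
    annotations = []
    for sentence in SENTENCE_RE.findall(text):
        tokens = [t.lower() for t in TOKEN_RE.findall(sentence)]
        trigger_ends = []
        for trigger in NEGATION_TRIGGERS:
            words = trigger.split()
            trigger_ends += [i + len(words) for i in _find(tokens, words)]
        for phrase, code in VOCABULARY.items():
            for start in _find(tokens, phrase.split()):
                # Negated if the concept starts within WINDOW tokens after a trigger.
                negated = any(0 <= start - end < WINDOW for end in trigger_ends)
                annotations.append({"concept": code, "text": phrase, "negated": negated})
    return annotations


if __name__ == "__main__":
    report = ("SIX BENIGN LYMPH NODES WITH NO EVIDENCE OF MALIGNANCY. "
              "INVASIVE PROSTATIC ADENOCARCINOMA WITH A COMBINED GLEASON "
              "SCORE OF 3+3=6. PATHOLOGIC STAGE T2bN0MX.")
    print(extract_attributes(report))
    for annotation in tag_concepts(report):
        print(annotation)
```

The point of the negation flag is retrieval quality: a search for malignancy should not return the lymph node sentence, where the concept is mentioned only to rule it out.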

22
Problems to solve
  • Reduce unnecessary barriers and unnecessary work
  • Make entire corpus de-identified and searchable
  • Improve retrieval
  • Many ways to say the same thing (synonymy)
  • Pertinent negatives (negation)
  • Temporality and aggregation
  • Create network of institutions sharing data and
    building collaborations
  • Interoperable with other systems

23
Technology
  • Decentralized data management and centralized
    order management
  • Grid services
  • Security

24
Decentralized Data, Centralized Ordering
  • Although all report-derived data are
    decentralized, we need a centralized way to
    manage orders
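
The split between decentralized data and centralized ordering can be sketched in a few lines of Python. Each site answers concept queries from its own store, and a single order manager records every tissue request. This is only a minimal sketch: the class names, site names, report keys, and concept codes are hypothetical, and real caTIES communication runs over grid services rather than in-process calls.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One institution's locally held, de-identified report index."""
    name: str
    reports: dict  # de-identified report key -> set of concept codes

    def search(self, required):
        """Return keys of local reports that carry every required concept."""
        return sorted(key for key, concepts in self.reports.items()
                      if required <= concepts)


@dataclass
class OrderManager:
    """The single central place where tissue orders are recorded."""
    orders: list = field(default_factory=list)

    def place(self, researcher, site, report_key):
        """Record an order and return its 1-based order number."""
        self.orders.append({"researcher": researcher, "site": site,
                            "report": report_key, "status": "pending"})
        return len(self.orders)


def federated_search(sites, required):
    """Ask every site in turn; only matching report keys leave a site."""
    return {site.name: site.search(required) for site in sites}


if __name__ == "__main__":
    sites = [
        Site("Site A", {"a1": {"X001", "X004"}, "a2": {"X002"}}),
        Site("Site B", {"b1": {"X001"}}),
    ]
    hits = federated_search(sites, {"X001"})
    manager = OrderManager()
    for site_name, keys in hits.items():
        for key in keys:
            manager.place("researcher-1", site_name, key)
    print(hits)
    print(manager.orders)
```

Keeping the order list in one place gives researchers and honest brokers a shared view of what has been requested, even though no site ever hands its report data to another.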

25
Grid services for sharing data
  • Globus toolkit
  • We are in some ways purposely creating two
    systems, one of which is isolated from the caGrid
  • Need for Virtual Organizations: not everyone is
    going to want to share their data with everyone
    else!

26
Security
  • Physical and network
  • Application-level
  • Grid-level
  • Processes and Policies

27
Processes and Policies
  • HIPAA
  • Institutional Review Boards
  • Security Compliance
  • Trust agreements
  • Honest Brokers
  • Materials transfer agreements for tissue
  • caTIES is a testbed for new security policy

28
Problems to solve
  • Reduce unnecessary barriers and unnecessary work
  • Make entire corpus de-identified and searchable
  • Improve retrieval
  • Many ways to say the same thing (synonymy)
  • Pertinent negatives (negation)
  • Temporality and aggregation
  • Create network of institutions sharing data and
    building collaborations
  • Interoperable with other systems

29
Definition
  • Semantic interoperability is the ability of
    information systems to exchange information on
    the basis of shared, pre-established and
    negotiated meanings of concepts, such that the
    input received can be machine processed

30
Acknowledgements
  • Kevin Mitchell
  • Girish Chavan
  • Adi Nemlekar
  • Linda Schmandt
  • Eugene Tseytlin
  • Dean Brown, Rick Nestler, Zeke Holland (CoPath) -
    Acquisition
  • Melissa Saul, Paul Hanbury, Greg Cooper - De-ID
  • Steve Merahn - De-ID
  • Wendy Chapman - NegEx
  • Mike Becich, John Gilbertson, Jim Harrison
  • University of Pennsylvania (Mike Feldman, Dave
    Fenstermacher, Tara McSherry)
  • Washington University and Thomas Jefferson
    University
  • GATE, UMLS Lexical Tools, NCI Enterprise
    Vocabulary Services
  • Funding from the Shared Pathology Informatics
    Network (NCI) and caBIG (NCI)