Title: Tissue Bank
1Tissue Bank Pathology Tools
- John Gilbertson MD
- The Centers for Pathology and Oncology Informatics
- University of Pittsburgh Medical Center
2Informatics at Pittsburgh
- The Centers for Pathology and Oncology Informatics
- 18 Faculty and 120 Staff in one integrated facility
- Support Clinical Systems at UPMC as well as Research Initiatives
- Most Faculty are members of the Department of Pathology
- Active part of the Pathology-Led University of Pittsburgh Health Science Tissue Banking System
3caBIG Pathology Team
- Michael Becich MD PhD
- John Gilbertson MD (Faculty Lead)
- Rajnish Gupta MS (Systems Architect)
- Bill Gross (Systems Manager)
- Sharon Winters (Director of Cancer Registry)
- Rajiv Dhir MD (Director of the Tissue Bank)
- Yimin Nie, Vicky Chu, Harpreet Singh, John
Milnes, Ashok Patel, Susan Urda
4Partners
15 Institutions use, and have helped develop, our tissue banking and pathology software
- Pennsylvania Cancer Alliance Bio-informatics Initiative
  - Fox Chase, U Penn, Thomas Jefferson, Wistar, Penn State
- Shared Pathology Informatics Network
  - Harvard, UCLA, Indiana (Regenstrief)
- Collaborative Prostate Cancer Tissue Resource
  - GWU, Howard Univ, Wisconsin, NYU, VA
- AHRQ Patient Safety Initiative in Pathology
  - Kaiser, Iowa, Henry Ford, West Penn
5Tissue Banking and Pathology
- Important both clinically and in research
- The Pathology 70/70 Rule
- High Quality Human Tissues are Central to Oncology Research
- High Quality Annotation Makes Banked Tissues Valuable
- A link between the clinical world and the research world
- A potential link between research systems and the large operational (clinical) systems that drive cancer centers
- Involves all patients and all specimens over long periods of time
- 5% of patients get involved in a clinical trial
- Paraffin blocks are part of the tissue bank
6UPMC Tissue Banking Informatics
- Components of a Tissue Bank Strategy
  - Universal Consent to bank tissue and aggregate data
  - Medical Center wide Inventory including barcoding, paraffin and imaging
  - Detailed, on-going Tissue Annotation through clinical systems in a warehouse architecture
  - Open Sharing of Data and Applications with Clients (Researchers) and Partners (Tissue Banks)
[Diagram: Clinical and Research Systems (AP-LIS, CP-LIS, Cancer Registry, Clinical Trials, Inventory); Tissue Annotation Data Set; Honest Brokers; De-identification; Organ Specific Query Engines (Prostate, Melanoma, Lung, Breast, Other)]
7UPMC Tissue Banking Informatics
Pathology
8UPMC Tissue Banking Informatics
Pathology
9UPMC Tissue Banking Informatics
Pathology
10UPMC Tissue Banking Informatics
Cancer Registry
11UPMC Tissue Banking Informatics
Pathology
12UPMC Tissue Banking Informatics
Inventory
13The Tissue Bank Space
- The IT and data management capabilities of tissue banks are highly variable
- Political and control issues are highly variable across institutions
- The scope of individual tissue banks is highly variable
- Appropriate consent to bank tissue remains a major issue at many cancer centers
- The value of banked tissue is often a function of its annotation.
- Annotation is a complex, expensive process.
- The nature of tissue annotation varies markedly between banks
- Tissue Banks do not share data well, either with researchers or with other banks
- There is no standard set of data elements used in tissue banking and sample annotation, nor are there consistent rules for how data elements are interpreted or managed. However, there are a number of candidates.
- Given these circumstances, where do we begin?
14A Plan?
- We propose a project with three related phases running in parallel depending on feedback from the adopter sites
- The basic systems to support best practices in tissue banking
  - Universal Consent, Sample Inventory, Manual Annotation using local data elements, Web Based Query and Display of Tissue Bank Data
  - Existing UPMC or Adopter production software will be used
- Common Data Elements and Application Definitions for Sharing of Data and Applications in caBIG
  - Applications based on a Meta-data Registry
  - Application to map Local Data Elements to CDE
  - Existing production software will have to be hardened and extended
- Automated Tissue Annotation
  - Direct annotation from clinical systems (AP-LIS, CP-LIS, Cancer Registry, WSI) and Major Research Labs
  - Software available at UPMC, but local expertise and implementation are necessary
  - Free text De-identification (production) and UMLS Concept Coding (beta) software is available
15System Design
To describe the system architecture, it is useful to strip off the clinical system integration
[Diagram: Clinical and Research Systems (AP-LIS, CP-LIS, Cancer Registry, Clinical Trials); Tissue Bank; Tissue Annotation Data Set; Honest Brokers; De-identification; Organ Specific Query Engines (Prostate, Melanoma, Lung, Breast, Other)]
16System Design
This basic set of data entry, storage and display systems is called the Organ Specific Database System (for historical reasons)
[Diagram: Basic Tissue Bank Systems (Consented Patients, Manual Annotation, Inventory); Tissue Annotation Data Set; Honest Brokers; De-identification; Organ Specific Query Engines (Prostate, Melanoma, Lung, Breast, Other)]
17System Design
- The OSD (Organ Specific Databases) is a multi-tiered Java application implemented in Oracle 9i on a Sun Solaris Unix server
- Web operations require Oracle Apache Services and HTTP Services running on the server
- Languages
  - Java, PL/SQL
- Tools
  - JClass, JBuilder, Oracle Tools, Toad
- CM
  - CVS
- Bug Tracking
  - Home Grown
18System Design
- Multi-tiered Application
  - Schema Layer - actual data and data relations. All data is stored in numbers and keys
  - Meta Data Layer - in which all data is defined in terms of data elements and groups of data elements. Data descriptions such as data attributes, display attributes, valid values, DB links, validation rules and documentation are supported in meta data. The meta data layer defines the application layer.
  - Procedures/Function Layer - a set of dynamic procedures/functions (in PL/SQL or Java) which control data transformation at the back end. The procedures accommodate changes in the meta data and immediately reflect the changes in the application layer
  - Application Layer (Form Builder) - a set of applications including meta-data dictionary builder and manager, user management, data entry, query, display, etc. Depending on the domain (breast, prostate, etc.) the appearance will be different. These differences are driven by the meta-data
[Diagram: layer stack - Data (Schema) Layer; Meta Data Layer; Procedure Layer; Applications (display, data entry, admin, query)]
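As a rough illustration of how a meta data layer can drive the application layer, the sketch below defines data elements (label, valid values, a simple validation rule) and lets a form validate records purely from those definitions. It is written in Python for brevity rather than in the Java or PL/SQL used by OSD, and every name in it (DataElement, Form, gleason_score, and so on) is a hypothetical stand-in, not part of the actual OSD dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """One entry in a hypothetical meta-data dictionary."""
    name: str
    label: str                  # display attribute
    valid_values: tuple = ()    # empty tuple means "any value is accepted"
    required: bool = False      # a very simple validation rule


@dataclass
class Form:
    """A data entry form defined purely by its data elements."""
    domain: str
    elements: list = field(default_factory=list)

    def validate(self, record: dict) -> list:
        """Return a list of readable problems (empty if the record is valid)."""
        problems = []
        for el in self.elements:
            value = record.get(el.name)
            if value is None:
                if el.required:
                    problems.append(f"{el.label}: value required")
                continue
            if el.valid_values and value not in el.valid_values:
                problems.append(f"{el.label}: {value!r} is not a valid value")
        return problems


# Changing these definitions changes the form's behavior without new code.
prostate_form = Form("prostate", [
    DataElement("gleason_score", "Gleason Score", tuple(range(2, 11)), required=True),
    DataElement("perineural_invasion", "Perineural Invasion", ("Yes", "No")),
])

print(prostate_form.validate({"gleason_score": 6, "perineural_invasion": "No"}))
print(prostate_form.validate({"perineural_invasion": "Maybe"}))
```

The point of the design is that a breast or lung form would reuse the same Form code with a different list of data elements.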
19Component Details
- We divide the OSD structure into several main areas
- Phase I: Basic Tissue Banking Functionality
  - Consented Patient List
  - Tissue Bank Inventory System
  - Manual Annotation
  - Case Display
  - Summary Displays
  - Query Engine and Display
- Phase II: Meta Data Management and Mapping
- Phase III: Data Extraction from Clinical Systems (Automated Annotation)
[Diagram: Consented Patient List, Tissue Bank Inventory and Manual Annotation System(s), all driven by the same meta-data dictionary; Database System Meta Data; De-identification; Honest Broker; Query Engines (Prostate, Lung, Breast)]
20Component Details
- Consented Patient List: A set of patients and identifiers (at UPMC we use Name, DOB, SSN and AP-LIS number) managed by the tissue bank and used to keep track of patients who have given universal consent for banking of unused clinical specimens and aggregation of clinical data for sample annotation.
- Inventory: Allows rapid accessioning of samples and tissue bank specific data. Barcoding, reporting and inventory management. Documents all transactions.
- Annotation: For each organ system, an administrator can use the OSD to define a set of data elements and relationships (including valid values, data entry rules and the way the elements are displayed on a form). These forms are then used for clinical annotation (demographics, exposures, progression, vital status, pathology, staging, tumor markers, etc.)
- Case Display: All data on a case (Patient or Accession) can be displayed on a form created through the meta data dictionary.
- Summary Displays: Aggregate or average data on an organ system can be generated in the procedure layer of OSD (see section System Design above). Unlike the creation of a form, this requires programming support as the procedures are written in Java or PL/SQL.
- Query Engine and Display: All data in a data set (usually an organ type) can be queried through a point-and-click interface in which cases are selected by selecting data elements and valid values (e.g. African American AND Age at Diagnosis 40-49 AND Gleason Score 7, 8 and 9). The resultant data set can be examined in a series of default and user defined views (e.g. Demographics, Progression, Prostatectomy Data, Inventory, etc.). Data can also be exported to Excel.
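The selection logic behind such a query can be sketched as follows: criteria on different data elements are combined with AND, while several valid values chosen for one element act as alternatives. This is only an illustrative Python sketch; the function name select_cases, the field names and the sample cases are all invented for the example and do not come from the OSD query engine.

```python
def select_cases(cases, criteria):
    """Return the cases that satisfy every criterion.

    criteria maps a data element name to the set of accepted values:
    AND across elements, OR among the values of a single element.
    """
    return [
        case for case in cases
        if all(case.get(element) in allowed for element, allowed in criteria.items())
    ]


# Invented sample cases, for illustration only.
cases = [
    {"case_id": "P-001", "race": "African American", "age_at_diagnosis": "40-49", "gleason_score": 7},
    {"case_id": "P-002", "race": "African American", "age_at_diagnosis": "50-59", "gleason_score": 8},
    {"case_id": "P-003", "race": "White", "age_at_diagnosis": "40-49", "gleason_score": 9},
    {"case_id": "P-004", "race": "African American", "age_at_diagnosis": "40-49", "gleason_score": 6},
]

# African American AND Age at Diagnosis 40-49 AND Gleason Score 7, 8 and 9
criteria = {
    "race": {"African American"},
    "age_at_diagnosis": {"40-49"},
    "gleason_score": {7, 8, 9},
}

matches = select_cases(cases, criteria)
print([case["case_id"] for case in matches])
```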
21Component Details
A set of tools that build and manipulate a common set of meta data drives all of the applications
22(No Transcript)
23Component Details
24Component Details
25Component Details
26Component Details
Data Entry
27Component Details
28Component Details
29Component Details
30Component Details
31Meta Data Management and Mapping: Plan Phase II
- Currently, all applications are driven through a meta-data dictionary
- This allows multiple compatible applications to be built and modified fairly easily
- Also allows externalization of meta data in applications
- Data extracted from clinical systems (e.g. AP LIS or Cancer Registry) needs to be mapped to the canonical data elements supported by the dictionary. This is now done through procedures.
[Diagram: Current Environment - Data; AP LIS, CP LIS, CRS; Translator (Mapping Application); Local Meta-data; Canonical Elements; Tissue Bank Application(s)]
32Sharing Data and Applications: Plan Phase II
- Goals of phase II of the project would be to
  - Formalize a beginning set of caBIG Common Data Elements on the basis of best existing elements (if possible)
  - Formalize a set of Domain Application Definitions that represent real things that can be used to build - and enforce standards on - compatible applications
  - Develop a mapping engine so that local elements can be mapped to the CDE
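A mapping engine of this kind can be pictured as a table that pairs each local data element with a common element and a value translation. The Python sketch below is a minimal illustration under that assumption; the local field names (GLSN_SUM, PT_RACE, DX_AGE), the target element names and the translate function are hypothetical and are not caBIG CDE or part of any existing translator.

```python
# Hypothetical mapping table: local field -> (common element, value converter).
LOCAL_TO_COMMON = {
    "GLSN_SUM": ("gleason_score", int),
    "PT_RACE": ("race", {"AA": "African American", "W": "White"}.get),
    "DX_AGE": ("age_at_diagnosis", int),
}


def translate(local_record):
    """Map one local record to common elements.

    Returns the translated record plus the list of local fields that have
    no mapping yet, so that gaps in the mapping table stay visible.
    """
    common, unmapped = {}, []
    for field_name, raw_value in local_record.items():
        rule = LOCAL_TO_COMMON.get(field_name)
        if rule is None:
            unmapped.append(field_name)
            continue
        element, convert = rule
        common[element] = convert(raw_value)
    return common, unmapped


record, missing = translate({"GLSN_SUM": "7", "PT_RACE": "AA", "ROOM": "12"})
print(record)
print(missing)
```

Keeping the table as data rather than code is what would let a registry, instead of a programmer, own the mappings.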
33Sharing Data and Applications: Plan Phase II
[Diagram: National level - EVS (definitions of basic concepts) and caDSR (definitions of real world things such as a Patient, Specimen or Tumor; objects or work flow; functions/steps) holding the caBIG CDE and caBIG Domain Application Definitions. Local level - caBIG CDE and caBIG Domain Application Definitions copied to the local site and used, through Translators, by Local Canonical Elements, Clinical Systems and the Tissue Bank Application(s)]
Applications built with the same definitions
should be compatible
34Meta Data Management and Mapping: Plan Phase II
[Diagram: Data; Meta Data Registry (Local Data Elements, caBIG CDE, caBIG Domain Application Definitions); Tissue Bank Application(s)]
35Automated Annotation: Plan Phase III
- Phase III Automated annotation is an important component of our tissue banking infrastructure
- Largely a warehousing effort and requires local expertise
- Issues include
  - Gateways, interfaces from operational systems
  - Identifying patients across systems
  - Aggregate, validate (and compress) data across systems
  - Map data to canonical elements
  - Load data into system
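Two of the issues listed for automated annotation, identifying patients across systems and aggregating their data, can be sketched in a few lines. The Python example below assumes, purely for illustration, that records can be matched on a normalized SSN plus date of birth; real record linkage is considerably harder. The function names, the record layout and the obviously synthetic identifiers are all hypothetical.

```python
from collections import defaultdict


def match_key(record):
    """Build a normalized patient key (assumption: SSN digits plus date of birth)."""
    ssn_digits = "".join(ch for ch in record["ssn"] if ch.isdigit())
    return (ssn_digits, record["dob"])


def aggregate(sources):
    """Group the data from several source systems under one key per patient."""
    patients = defaultdict(dict)
    for system, records in sources.items():
        for record in records:
            patients[match_key(record)].setdefault(system, []).append(record["data"])
    return dict(patients)


# Synthetic records; the SSN is formatted differently in each system.
sources = {
    "AP-LIS": [
        {"ssn": "000-00-0001", "dob": "1940-01-01", "data": {"gleason_score": 6}},
    ],
    "Cancer Registry": [
        {"ssn": "000000001", "dob": "1940-01-01", "data": {"stage": "T1c"}},
        {"ssn": "000-00-0002", "dob": "1951-05-05", "data": {"stage": "T2a"}},
    ],
}

merged = aggregate(sources)
for key, per_system in merged.items():
    print(key, per_system)
```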
36Automated Annotation: Plan Phase III
Sample de-identified pathology report:
PATIENT HISTORY: The patient is a AGE<in 60s>-year-old male with elevated PSA levels. OSS SLIDE-NUMBER 12/00, PLACE
PRE-OP DIAGNOSIS: Elevated PSA.
POST-OP DIAGNOSIS: Same.
PROCEDURE: Prostate biopsies.
1. Left apex.
2. Left body.
bjs
FINAL DIAGNOSIS:
PART 1: PROSTATE, LEFT APEX, NEEDLE BIOPSY (OSS SLIDE-NUMBER 12/00)
A. INVASIVE MODERATELY DIFFERENTIATED PROSTATIC ADENOCARCINOMA WITH A COMBINED GLEASON SCORE OF 3 + 3 = 6.
B. THE CARCINOMA INVOLVES ONE OUT TWO (1/2) CORE FRAGMENTS AND COMPRISES APPROXIMATELY 5% OF THE PROSTATE TISSUE EXAMINED.
C. NO EVIDENCE OF PERINEURAL INVASION IS SEEN.
PART 2: PROSTATE, LEFT BODY, NEEDLE BIOPSY (OSS SLIDE-NUMBER 12/00)
BENIGN PROSTATE TISSUE WITH NO EVIDENCE OF HIGH GRADE PROSTATIC NEITHER INTRAEPITHELIAL NEOPLASIA NOR CARCINOMA SEEN.
mb INITIALS<QQQ/QQQ>
COMMENT: All the foci of prostatic carcinoma found small in size and constitute less than 5% of the material submitted.
Mb NAME<VVV NAME<WWW Q. XXX>, M.D., Ph.D. Fellow/Chief Resident NAME<UUU Q. TTT>, M.D. NAME<SSS RRR QQQ PPP OOO VVV NAME<WWW Q. XXX>, M.D., Ph.D. DATE<6/25/00> 1150
OUTSIDE ACCESSION SLIDE-NUMBER 8 CONSULT SLIDES SLIDE-NUMBER 8 CONSULT BLOCKS OUTSIDE NAME<SSS> RECEIVED Y
CONSULT MATERIAL DESCRIPTION: Received for consultation from NAME<YYY Q. ZZZ>, D.O. are eight (8) consult slides labeled SLIDE-NUMBER and eight (8) consult blocks labeled SLIDE-NUMBER from PLACE, ADDRESS, PA along with an accompanying surgical pathology report. bjs
- Two special cases
- Free Text De-identification
  - UPMC software (not open source) de-identifies documents to HIPAA Safe Harbor
  - Studied and tested extensively
  - Approved by University of Pittsburgh IRB and UPMC Security Office
  - Takes ASCII text (HL-7) and uses a set of heuristics, dictionaries and thesauri
- Free Text UMLS Autocoding (SPIN)
  - UPMC and NCI beta software (open source)
  - Modular Java application with GATE
  - Input: pathology reports
  - Output: UMLS coded text in CHIRPS (NCI) schema
  - Handles negation
  - Being discussed by the CDE group
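To give a feel for what "heuristics and dictionaries" can mean in free text de-identification, here is a toy Python sketch that combines two pattern heuristics (dates, ages) with a small name dictionary. It is not the UPMC software and comes nowhere near HIPAA Safe Harbor on its own; the patterns, the two-entry name list and the replacement tags are assumptions chosen for illustration.

```python
import re

# Hypothetical dictionary; a real system would use large name and place lists.
NAME_DICTIONARY = {"Smith", "Jones"}

# Heuristic patterns (assumptions for this sketch).
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")
AGE_PATTERN = re.compile(r"\b(\d{1,3})-year-old\b")


def age_to_decade(match):
    """Replace an exact age with its decade, e.g. 64 -> 'in 60s'."""
    decade = int(match.group(1)) // 10 * 10
    return f"AGE<in {decade}s>-year-old"


def scrub(text):
    """Apply the pattern heuristics first, then the dictionary lookup."""
    text = DATE_PATTERN.sub("DATE", text)
    text = AGE_PATTERN.sub(age_to_decade, text)
    for name in NAME_DICTIONARY:
        text = re.sub(rf"\b{re.escape(name)}\b", "NAME", text)
    return text


print(scrub("The patient is a 64-year-old male seen by Dr. Smith on 6/25/00."))
```

A production scrubber also has to avoid removing clinically meaningful words that happen to look like names, which is where thesauri and extensive testing come in.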
37Imaging
- Just an idea...
- Whole Slide Imaging is a source system for tissue annotation (early beta)
- Slides are bar coded with a tissue bank (de-identified) number
- Imaging system associates the number with the image(s)
- Tissue Bank displays the number as an HTTP call to the image system
- Imaging system opens a viewer and displays the images
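The "number as an HTTP call" idea amounts to turning the de-identified tissue bank number into a link that the imaging system can resolve. A minimal Python sketch follows; the server address, the query parameter name and the sample number are hypothetical placeholders, since the slides do not describe the imaging system's actual interface.

```python
from urllib.parse import urlencode

# Hypothetical viewer endpoint of the imaging system.
IMAGE_SERVER = "http://imaging.example.org/viewer"


def viewer_url(tissue_bank_number):
    """Build the link the tissue bank application would show for a slide."""
    query = urlencode({"specimen": tissue_bank_number})
    return f"{IMAGE_SERVER}?{query}"


print(viewer_url("TB-000123"))
```

Because only the de-identified number crosses the boundary, the imaging system never needs patient identifiers.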
38Relevant Standards
39Size of Project Installed Base
- Phase One Software used at 15 institutions
beyond UPMC
40Does Other Software Exist?
- A variety of software is available to meet some tissue bank needs, and we do not expect that all sites will require all software available. However, there is no dominant player and there are no good (simple) solutions for all of the problems associated with tissue banking, especially the important and expensive issue of tissue annotation and standards.
- There are several open source projects that could be considered for parts of this project. In particular, the EDRN informatics group has developed a mapping tool that may be useful in this area.
- There are several CDE groups that may be useful in tissue banking, including the NAACCR data elements for clinical information and the CAP protocols (not real data elements). These should be available at most institutions. Furthermore, there are specific sets of data elements for specific tissue types such as the CPCTR and CBCTR elements.
- Finally, there are distinct but parallel efforts in tissue finding such as the NCI SPIN project, in which NLP is used to extract tissue and pathology data from de-identified pathology reports (see above).
41Points of Interoperability with other caBIG
systems
- Links between the Clinical Trials Management System and the Tissue Bank warehouse. The current OSD system receives limited data from the UPCI CTMS (which clinical trials a patient is (or was) on and his current status in each trial).
- Links between research projects and tissue banks should be investigated.
- UMLS vocabulary in all caBIG Common Data Elements. Even better, if there are useful caBIG data elements in the first year, the current architecture, in which the meta data dictionary (eventually a formal registry) enforces standards on applications, means that if we can define caBIG approved data objects we can enforce them in our applications.
- Close contact with the architecture and CDE groups
42What Resources are proposed to achieve caBIG
interoperability
- Essentially, all resources will be used to achieve caBIG interoperability
- Central to our plan is the mapping (translating) of local data elements (from local databases or from local clinical systems) to a set of caBIG meta data (data elements and application definitions). Tissue bank applications will run under a meta data registry that will enforce the use of caBIG meta-data in all tissue bank applications.
- Essentially the entire plan is to enforce caBIG standards on applications and on top of (or parallel to) local data standards.
- Finally, we plan to place UPMC personnel in adopter sites as needed during the caBIG initiative.
43A 12 month plan...
- The UPMC Tissue Bank and Pathology Tools plan has three phases that may run concurrently. Details have to be worked out with adopter sites.
- Phase I: Within the first three to nine months
  - We will evaluate the tissue bank informatics needs/interests at the adopter sites
  - If necessary we will provide existing software including
    - Universal Consent for Tissue Banking and Data Aggregation
    - Consented Patient List
    - Tissue Bank Inventory System
    - OSD Tissue Annotation System (manual)
    - OSD Query and Display System
  - Goal will be to create/assure a baseline functionality and solid working relationship between institutions
44A 12 month plan...
- Phase II: Within the first 9 - 12 months
  - We will develop a set of Inventory and Annotation CDE, a set of domain application definitions, a Meta data Registry similar to the existing meta data dictionary, and a mapping engine (linked to the Registry) that will map local data elements to the caBIG elements.
  - This will effectively involve hardening the existing dictionary so that it can support both CDE and Domain Application Definitions.
  - At the end of this phase, local data will be mapped to caBIG elements and data (in caBIG elements) can be queried and displayed through an OSD Query Engine built on caBIG elements. Goal is caBIG interoperability.
45A 12 month plan...
- Phase III: Within the first 9 - 12 months
  - We will share de-id and SPIN autocoding software so that adopter sites will be able to UMLS code archival pathology reports.
- In the next 12 months
  - We will work with adopter sites to develop mechanisms to pull clinical data directly from clinical systems such as AP LIS, CP LIS, Clinical Trials and Cancer Registry.