Title: Cancer Translational Research Informatics Platform (caTRIP)
Patrick McConnell, Duke Comprehensive Cancer Center
patrick.mcconnell_at_duke.edu
caTRIP: IRB and deidentification issues
- Local issues: building data-oriented systems
  - Duke required IRB approval even to build a data-oriented system based upon real PHI
  - We worked around this by leveraging people already on IRB protocols
- Global issues: data deidentification
  - Data is owned by different groups across the cancer center
  - Traditional deidentification: a data manager deidentifies an entire dataset, then throws away the key
  - Distributed deidentification: a trusted service provider (TSP) deidentifies discrete values
  - The traditional approach is not scalable; it requires a middle-man
  - IRB approval is required for the distributed approach because it deviates from traditional deidentification (at Duke)
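The slide does not say how the TSP turns a discrete value into a deidentified ID. One plausible mechanism, offered here only as an assumption, is keyed pseudonymization: the TSP applies HMAC-SHA256 with a secret that only it holds, so every data owner who submits the same MRN receives the same ID, yet no data owner can reverse the mapping. The class and method names below are hypothetical.

```python
import hashlib
import hmac


class TrustedServiceProvider:
    """Hypothetical TSP that maps discrete PHI values to stable pseudonyms.

    The secret key never leaves the TSP, so data owners cannot reverse the
    mapping, but the same input always yields the same pseudonym.
    """

    def __init__(self, secret_key: bytes) -> None:
        self._key = secret_key

    def deidentify(self, value: str) -> str:
        # HMAC-SHA256 keyed with the TSP secret, truncated for readability.
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:12].upper()


tsp = TrustedServiceProvider(secret_key=b"held-only-by-the-tsp")
# Two independent data owners submit the same MRN and get the same ID.
id_from_cae = tsp.deidentify("MRN3")
id_from_tissue = tsp.deidentify("MRN3")
print(id_from_cae == id_from_tissue)  # True
```

Unlike the traditional approach, the key is kept rather than thrown away, which is what lets independently deidentified datasets still be joined, and also why the TSP itself must be trusted. Truncating to 12 hex characters is only for readability; a real deployment would keep more bits to avoid collisions.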
caTRIP: Grid security
- Global issue: IRB submissions
  - Exposing PHI on the grid is a touchy subject
  - IRBs don't understand grid technology
  - Development teams won't necessarily understand all the intricacies of the caGrid security infrastructure
  - It would be very useful to have some boilerplate text and write-ups for IRB proposals
- Global issue: trusting the caGrid security infrastructure
  - The caGrid security infrastructure is very new
  - caGrid security is based upon some standard technologies and some newly developed technologies
  - There is mistrust about exposing PHI through caGrid security, even though it uses some of the same technologies as a standard web application
caTRIP: Cross-institutional clinical querying
- Global issue: the caBIG vision is a world-wide web of cancer data in the hands of oncologists and researchers
  - Cross-institutional clinical querying requires that data be shared
  - The scenario: Envision a patient who arrives at an oncologist's office for the first time with a known histologic diagnosis of breast cancer. She produces a pathology report that describes the cancer as invasive lobular, moderately differentiated. At this point the oncologist is comfortable explaining the various treatment options and their associated risks. Now imagine that the same patient has a strong family history of cancer (consider both breast and non-breast cancer histories as possible scenarios). How should the risks, and therefore the suggested treatments, change? Now add confounding data that is not well published in the current literature: the lobular cancer overexpresses the proto-oncogene Her2/neu, which almost never happens in lobular tumors. How would an oncologist even begin to calculate risks or appropriately suggest treatment options? Even the cohort of all patients at a single large tertiary care facility may not contain enough data to provide the statistical strength needed for a valid conclusion. This is a situation where a multi-institutional, caGrid-enabled network of clinical, pathology, and molecular data, in concert with a translational informatics system such as caTRIP, can give the oncologist the statistical power to make an informed decision about the best treatment options for his/her new patient.
  - No shortage of PHI, IP, IRB, and deidentification issues
caTRIP: Cross-institutional research querying
- Global issue: the caBIG vision is a world-wide web of cancer data in the hands of oncologists and researchers
  - Cross-institutional research querying requires that data be shared
  - The scenario: Take the seemingly simple question of whether there is a correlation between Nottingham score and Her2/neu status for patients diagnosed with lobular carcinoma. First, this query crosses two domains that typically live in disjoint data systems in a cancer center: clinical/pathology annotations and diagnosis histology. Second, a single cancer center may not have enough data for a statistically valid analysis; bringing in data from other institutions can increase statistical power immensely. Now take a related scenario in which the researcher also wants to investigate gene biomarkers that correlate with negative Her2/neu status and with survival. Pulling basic science data from multiple studies across different research sites puts at the researcher's fingertips a wealth of data that was previously impossible to assemble. Furthermore, when tissue banking systems are connected, the researcher can determine whether tissue samples are available for further research through other experimental methods, such as immunohistochemistry.
  - No shortage of PHI, IP, IRB, and deidentification issues
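The statistical-power argument can be made concrete with a 2x2 contingency table. The sketch below assumes that Nottingham score has been dichotomized into high and low and that each site returns only cell counts (both assumptions; the slide does not say how the question would be analyzed). `SiteCounts`, `pool`, and `chi_square` are hypothetical names, and the statistic is Pearson's chi-square computed with the standard library only.

```python
from dataclasses import dataclass


@dataclass
class SiteCounts:
    """Hypothetical per-site 2x2 table.

    Rows: high vs. low Nottingham score (dichotomized, an assumption).
    Columns: Her2/neu positive vs. negative.
    """
    high_pos: int
    high_neg: int
    low_pos: int
    low_neg: int


def pool(sites: list[SiteCounts]) -> SiteCounts:
    # Naive pooling: add each cell across institutions.
    return SiteCounts(
        high_pos=sum(s.high_pos for s in sites),
        high_neg=sum(s.high_neg for s in sites),
        low_pos=sum(s.low_pos for s in sites),
        low_neg=sum(s.low_neg for s in sites),
    )


def chi_square(t: SiteCounts) -> float:
    """Pearson chi-square statistic for a 2x2 table (1 degree of freedom)."""
    observed = [[t.high_pos, t.high_neg], [t.low_pos, t.low_neg]]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one patient")
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat
```

For the same proportions, doubling the number of patients doubles the statistic, which is why adding institutions helps. Naive pooling ignores differences between sites; when site populations differ, a stratified test such as Cochran-Mantel-Haenszel is the usual choice.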
Cancer Translational Research Informatics Platform (caTRIP)
- DSIC issues: backup slides
caTRIP: Distributed deidentification
[Figure: data owners submit PHI (e.g., MRN3) to the Trusted Service Provider; the deidentified CAE and caTissueCORE data sources both carry the resulting ID GHI789, which the Distributed Query Engine joins on.]
- Data owners submit PHI to be deidentified by the TSP
- The query engine can join on GHI789, which is a deidentified ID
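Because both deidentified sources carry the TSP-issued ID, a query engine can combine them without ever seeing the MRN. The sketch below is a minimal in-memory hash join, assuming each data service returns rows as dictionaries with a hypothetical `patient_id` field; the Distributed Query Engine's actual query model is not described on this slide. The row contents, including the second ID, are invented for illustration.

```python
from collections import defaultdict

Row = dict[str, str]


def join_on_id(left: list[Row], right: list[Row], key: str = "patient_id") -> list[Row]:
    """Inner hash join of two result sets on a shared deidentified key."""
    index: dict[str, list[Row]] = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    joined: list[Row] = []
    for row in left:
        # .get() avoids inserting empty lists for IDs with no match.
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Hypothetical deidentified rows returned by two data services.
cae_rows = [
    {"patient_id": "GHI789", "histology": "invasive lobular"},
    {"patient_id": "JKL012", "histology": "invasive ductal"},
]
tissue_rows = [{"patient_id": "GHI789", "specimen": "frozen tissue"}]

for record in join_on_id(cae_rows, tissue_rows):
    print(record)
```

Only the pseudonymous key crosses data sources; the PHI it was derived from stays with the data owners and the TSP.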
caTRIP in-depth: Architecture
[Figure: the GUI and Distributed Query Engine use Core Grid Services (IdPService, IndexService, GridGrouper) to authenticate, discover, and authorize, and query Domain Grid Services (CAE, caTissueCORE, CGEMS SNP, caTIES, TR). At Duke, these front local systems: caTIES, Tumor Registry (TR), caTissue CORE, CAE, caIntegrator, Illumina, MAW3, and a domain controller.]
Challenges in data sharing: Security
- Challenge
  - Security in a distributed grid environment is a difficult problem
  - The caGrid security infrastructure is still very new (caGrid 1.0 has been released)
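[Figure: authentication: user credentials are checked by the Duke Authentication Plugin (backed by Duke Domain Controller NT security) through the caGrid Authentication Service, which issues a SAML assertion; Dorian exchanges the assertion for a user grid certificate, and the Trust Fabric determines which credential signers are trusted. Authorization: a grid data service consults CSM and GridGrouper before accessing back-end data.]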
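caTRIP: caGrid Security
[Figure: the user authenticates with a local credential provider and receives a SAML assertion, which is exchanged for grid credentials. A service receiving those grid credentials asks: Should I trust the credential signer? Is the user a member of the required group? Is the user authorized?]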
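The three questions in the figure can be modeled as a decision chain. This is a simplified sketch, not caGrid code: the class and function names are hypothetical, plain strings stand in for X.509 certificates and SAML documents, `exchange_assertion` plays roughly the role the deck assigns to Dorian, `trusted_signers` stands in for the trust fabric, and `groups` stands in for GridGrouper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SamlAssertion:
    subject: str
    identity_provider: str  # the local credential provider that vouched for the user


@dataclass(frozen=True)
class GridCredential:
    subject: str
    signer: str  # who signed the grid credential


def exchange_assertion(assertion: SamlAssertion, trusted_idps: set[str],
                       signer: str) -> GridCredential:
    """Dorian-like step (simplified): accept a SAML assertion from a known
    identity provider and issue grid credentials signed by `signer`."""
    if assertion.identity_provider not in trusted_idps:
        raise PermissionError(f"unknown identity provider {assertion.identity_provider}")
    return GridCredential(subject=assertion.subject, signer=signer)


@dataclass
class ServiceAuthorizer:
    trusted_signers: set[str]          # stands in for the trust fabric
    groups: dict[str, set[str]]        # stands in for GridGrouper: group -> members
    allowed_groups: set[str] = field(default_factory=set)  # service policy

    def is_authorized(self, cred: GridCredential) -> bool:
        # "Should I trust the credential signer?"
        if cred.signer not in self.trusted_signers:
            return False
        # "Is member of?"
        member_of = {name for name, members in self.groups.items()
                     if cred.subject in members}
        # "Is authorized?"
        return bool(member_of & self.allowed_groups)


# Hypothetical identities, signers, and group names.
cred = exchange_assertion(SamlAssertion("alice", "duke-idp"),
                          trusted_idps={"duke-idp"}, signer="dorian-ca")
authz = ServiceAuthorizer(trusted_signers={"dorian-ca"},
                          groups={"catrip-users": {"alice"}},
                          allowed_groups={"catrip-users"})
print(authz.is_authorized(cred))  # True
```

Each check fails closed: an untrusted signer or a missing group membership denies access before any data is touched.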
caArray: Local versus hosted applications
- Global issue: locally hosting an application
  - For tools to be locally hosted, they need to meet the auditing, security, and other requirements of local IRBs
- Global issue: shared hosting of an application
  - Not all cancer centers have the resources to support a local deployment of caArray
  - NCI provides a shared instance of caArray for cancer researchers
  - There need to be specific workflows for keeping data private and secure versus public and openly accessible
  - caArray will be hosting genotyping data that can be considered PHI (array-based SNP data)
  - Do shared instances need to meet local IRB requirements?
    - If so, can the requirements from multiple institutions conflict?
    - How are such conflicts resolved?
- General reluctance to share intellectual capital
C3PR: Grid security (same issues as caTRIP)
- Global issue: IRB submissions
  - Exposing PHI on the grid is a touchy subject
  - IRBs don't understand grid technology
  - Development teams won't necessarily understand all the intricacies of the caGrid security infrastructure
  - It would be very useful to have some boilerplate text and write-ups for IRB proposals
- Global issue: trusting the caGrid security infrastructure
  - The caGrid security infrastructure is very new
  - caGrid security is based upon some standard technologies and some newly developed technologies
  - There is mistrust about exposing PHI through caGrid security, even though it uses some of the same technologies as a standard web application
- Global issue: the coordinating center hosts data and provides remote access
  - Do shared instances need to meet local IRB requirements?
    - If so, can the requirements from multiple institutions conflict?
    - How are such conflicts resolved?
C3PR: Grid security approach
- C3PR multi-site registration
- General approach
- These approaches are not ideal
  - Extra layer of complexity, non-grid solutions, another point of failure, etc.
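[Figure: multi-site registration between an affiliate site and the coordinating center, each running C3PR with a backend database. Registering a subject produces a registration message carrying PHI that crosses the firewall through an external non-grid interface; a grid service and an external interface are also shown.]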
caArray
- DSIC issues: additional slides
CTMSi background: data sharing
[Figure: data shared among lab results, participant registration, patient scheduling, adverse events, and the clinical trials DB.]
CTMSi: Data sharing issues
- Global issue: trust of caGrid security
  - Data needs to be exchanged between caBIG applications
  - Data is all behind the firewall (for now)
  - Is it OK to exchange PHI between systems that use the caGrid security infrastructure?
- What about cross-institutional studies?
  - Is it still OK to exchange PHI between systems that use the caGrid security infrastructure?
Clinical Trials Management Systems Interoperability Project (CTMSi)
- DSIC issues: backup slides
CTMSi: Mapping the clinical research domain
CTMSi: Enroll patient
CTMSi: Load lab data
CTMSi: Adverse-event-triggered modification
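CTMSi: Architectural overview
[Figure: messages flow through caXchange, an enterprise service bus used with caGrid: an inbound binding component receives each message, routing rules select its destination, and an outbound binding component delivers it. Authentication, trust, and authorization are provided by Dorian, GTS, and Grid Grouper.]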
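The routing step can be illustrated with a toy service bus. This sketch is not the caXchange API: `ServiceBus`, `Message`, and the message-type strings are hypothetical, and outbound binding components are modeled as plain callables registered per message type.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    message_type: str        # hypothetical type codes, e.g. "LAB_RESULT"
    payload: dict[str, str]


class ServiceBus:
    """Toy stand-in for an ESB: inbound messages are matched against
    routing rules and handed to outbound binding components."""

    def __init__(self) -> None:
        self._routes: dict[str, list[Callable[[Message], None]]] = {}

    def add_route(self, message_type: str, outbound: Callable[[Message], None]) -> None:
        # A routing rule: deliver messages of this type to this outbound component.
        self._routes.setdefault(message_type, []).append(outbound)

    def receive(self, message: Message) -> int:
        # Inbound binding: look up the routing rules for this message type.
        targets = self._routes.get(message.message_type, [])
        if not targets:
            raise LookupError(f"no route for {message.message_type}")
        for outbound in targets:
            outbound(message)
        return len(targets)


# Usage: route lab results to a hypothetical clinical trials DB adapter.
delivered: list[Message] = []
bus = ServiceBus()
bus.add_route("LAB_RESULT", delivered.append)
bus.receive(Message("LAB_RESULT", {"test": "hemoglobin"}))
```

In the architecture shown, authentication, trust, and authorization checks would run before a message reaches the routing rules; they are omitted here to keep the routing logic visible.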
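CTMSi: caGrid Security
[Figure: the user authenticates with a local credential provider and receives a SAML assertion, which is exchanged for grid credentials. A service receiving those grid credentials asks: Should I trust the credential signer? Is the user a member of the required group? Is the user authorized?]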