i2b2 Clinical Research Chart and Hive Architecture - PowerPoint PPT Presentation

Slides: 49
Provided by: shawnmu
Transcript and Presenter's Notes

Title: i2b2 Clinical Research Chart and Hive Architecture


1
i2b2 Clinical Research Chart and Hive Architecture
  • Henry Chueh
  • Shawn Murphy
  • Isaac Kohane, PI

2
Summary
  • Background
  • Intro to the Clinical Research Chart (CRC)
  • Hive / Cell Software Architecture
  • More details on establishing and using the CRC

3
Background
  • Clinical documentation is clinical
  • Lack of systematic approach for organizing
    clinical data for research
  • Ownership issues are unique
  • Consent issues are a challenge

4
Driving Biological Projects
  • Asthma
  • Hypertension
  • Huntington's Disease
  • Diabetes

5
Clinical Research Chart (CRC)
  • Organize and transform clinical data to maximize
    its utility for research
  • Develop an Application and Database framework to
    serve this goal
  • Establish an architecture that allows data from
    different studies done on this platform to be
    integrated

6
Design of Clinical Research Chart
CRC DB
HL7 MSH/736401.. PID1023231285..
Text files
XML <Patient1> <image>..
database
7
Design of Clinical Research Chart
Data pipeline/workflow application
Pheno/Genotype Database
CRC DB
HL7 MSH/736401.. PID1023231285..
Text files
XML <Patient1> <image>..
Visualization and Analysis of database contents
database
8
i2b2 Skeletal Data Flow
EDC Service
EDC applications
Shared data
Enterprise data source (RPDR)
Clinical Research Chart
i2b2 ETL workflow
Annotation Service
Study specific data
Annotation UI
Analytic workflow
Enterprise Systems: Registration, ADT, Labs, Reports, Clinical Notes, etc.
Local Systems: systems not gathered into Enterprise data warehouses
9
Overall Themes
  • Framework to allow development of application
    services in a maximally decoupled fashion.
  • Linux and Windows OS support
  • Java and C programming languages
  • Use Cases for construction of CRC come from
    Driving Biology Projects and experience with
    clients of Partners Research Patient Data Registry

10
Focus on Workflow
  • Necessary for both pre-CRC and post-CRC processes
  • Needed for scientific flexibility
  • Implies a consistent environment for data
    pipelining and flow control

11
i2b2 Hive
  • Formed as a collection of interoperable Cells, or
    services
  • Loosely coupled
  • Makes no assumptions about proximity
  • Connected by Web services
  • Activated by a workflow engine that forms basis
    of choreography among Cells for complex
    interactions
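As a minimal sketch of the Hive idea, the Python below models Cells as independent services that a small workflow runner calls in sequence. The names `Cell` and `run_workflow` and the two example cells are hypothetical and not part of i2b2; real Cells would be reached through Web services rather than in-process calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Message = Dict[str, str]


@dataclass
class Cell:
    """A hypothetical Cell: a named service that maps a message to a message."""
    name: str
    handler: Callable[[Message], Message]

    def call(self, message: Message) -> Message:
        # In a real Hive this would be a Web service request, not a local call.
        return self.handler(message)


def run_workflow(cells: List[Cell], message: Message) -> Message:
    """Choreograph the cells by passing each output to the next cell."""
    for cell in cells:
        message = cell.call(message)
    return message


def uppercase_text(msg: Message) -> Message:
    # Stand-in for a simple text format conversion.
    return {**msg, "text": msg["text"].upper()}


def count_words(msg: Message) -> Message:
    # Stand-in for a simple analysis step.
    return {**msg, "word_count": str(len(msg["text"].split()))}


result = run_workflow(
    [Cell("format", uppercase_text), Cell("count", count_words)],
    {"text": "patient denies smoking"},
)
```

Because each cell only sees a message, the runner makes no assumption about where a cell lives, which is the loose coupling the slide describes.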

12
Complex choreography
13
i2b2 Cell
  • Behaves as a functional service
  • Separates interactions conceptually into
    transactions and semantics
  • Focuses on facilitating transactions with simple
    semantics (e.g., datatype)
  • Leaves deep semantics to be defined by the
    services provided by a Cell
  • Does not restrict language implementation

14
Target layer for i2b2
Semantic Objects
i2b2 platform
Web Services
TCP/IP
15
Cell examples
  • Concept extraction from clinical narratives
  • Simple transformations, e.g., basic text format
    conversion
  • Complex encoding, e.g., encoding MIAME in MAGE
  • Microarray data normalization

16
Exposing Cells
  • Protocols layered on top of SOAP
  • At the WSDL level for integrators, i.e.,
    bioinformaticians and software engineers
  • At a functional level for investigators
  • i2b2 toolkits to allow integrators to expose
    controlled functionality to investigators
    (Automator)

17
Automator Approach
Extend Kepler workflow engine
informaticians
i2b2 Automator
investigators
18
Bird's-eye view
Investigator Portal
Workflow engine
CRC Repository
19
Current Implementation
  • Extending Kepler workflow engine for i2b2
  • Data model for CRC repository
  • Defining protocols necessary for interaction (in
    addition to SOAP)
  • Created Cell for concept extraction from
    narratives
  • Early designs for Automator toolkit

20
i2b2 Architecture Key Points
  • Leverage existing workflow standards and software
  • Use Web services as basic form of interaction
  • Assume unlimited choreography, but
  • Provide tools to distill complexity into basic
    automation for clinical investigators

21
SW Licensing and Distribution
  • Commit to Open Source software
  • Use GNU Lesser General Public License
  • Establish local i2b2 repository exposed through
    i2b2 website
  • Contribute to a more global NCBC SourceForge-style
    repository if it emerges (NIH Forge?)
  • Keep i2b2 protocols fully open

22
Interoperability across NCBC
  • Strongly consider Web services as basic protocol
    for generic shared interactions
  • Consider sharing datasets
  • Promote diversity of approach and use of shared
    software (don't impose uniformity)
  • Facilitate/promote NCBC Open Source project teams

23
Pre-CRC Data Pipeline/Workflow
  • Populating the Clinical Research Chart (CRC)

24
Pre-CRC Data Pipeline/Workflow
  • Use workflow framework to choreograph
    application services in specific sequences
  • Used to extract, transform, conform, and load
    data and metadata into the CRC

25
Pre-CRC Data Pipeline/Workflow
Services
Ontology
Consent/Tracking
Application Pool
Management
SOAP/HTTP interfaces
Output
Input
Data flowing
Local or through SOAP service
Custom Interfaces
A program
increasingly useful
26
Ontology Service
  • Manages mappings of terms to common vocabularies
  • Provides lists of acceptable (enumerated) values
    for various attribute and value slots.
  • Allows for management of hierarchies, groupings,
    and relationships between terms
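The three responsibilities listed here can be sketched in a few lines of Python. The concept codes and paths are borrowed from the smoking use case in this deck; the local term list, the allowed-value list, and the function names are hypothetical and only illustrate the shape such a service might take.

```python
# Concept codes and hierarchy paths taken from the smoking use case slide.
CONCEPTS = {
    "CT-A-SMK": r"AsthV1\DRptNLP\Tobacco Use\Smoker",
    "CT-A-NSK": r"AsthV1\DRptNLP\Tobacco Use\Non smoker",
}

# Hypothetical local terms mapped onto the common vocabulary.
LOCAL_TERMS = {
    "smoker": "CT-A-SMK",
    "current smoker": "CT-A-SMK",
    "never smoked": "CT-A-NSK",
}

# Hypothetical enumerated values for one attribute slot.
ALLOWED_VALUES = {"Sex_cd": {"Female", "Male"}}


def map_term(term):
    """Map a local term to a common concept code, or None if unknown."""
    return LOCAL_TERMS.get(term.strip().lower())


def is_allowed(slot, value):
    """Check a value against the enumerated list for its slot."""
    return value in ALLOWED_VALUES.get(slot, set())


def descendants(path_prefix):
    """Return the codes whose hierarchy path falls under the given prefix."""
    return sorted(c for c, p in CONCEPTS.items() if p.startswith(path_prefix))


tobacco_codes = descendants(r"AsthV1\DRptNLP\Tobacco Use")
```

Storing the hierarchy as a path string means a grouping query is a prefix match, so new terms can be added without changing the lookup code.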

27
Person Consent/Tracking Service
  • Provides mappings between patient/subject
    identifiers
  • Tracks patient/subject consent information
  • Allows identification of the patient/subject
    based upon fuzzy demographic matches
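A fuzzy demographic match could look roughly like the sketch below, which scores name similarity with the standard library's `difflib.SequenceMatcher` and adds an exact birth date check. The registry records, weights, threshold, and function names are all invented for illustration; the slides do not say which matching method the service uses.

```python
from difflib import SequenceMatcher

# Invented registry records, for illustration only.
REGISTRY = [
    {"subject_id": "S1", "name": "Jane Smith", "birth_date": "1950-02-01"},
    {"subject_id": "S2", "name": "John Doe", "birth_date": "1961-07-15"},
]


def similarity(a, b):
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(candidate, record):
    """Weighted score: name similarity plus an exact birth date match."""
    name_score = similarity(candidate["name"], record["name"])
    dob_score = 1.0 if candidate["birth_date"] == record["birth_date"] else 0.0
    return 0.6 * name_score + 0.4 * dob_score  # weights are assumptions


def find_subject(candidate, registry, threshold=0.85):
    """Return the best matching subject id, or None below the threshold."""
    best = max(registry, key=lambda r: match_score(candidate, r), default=None)
    if best is not None and match_score(candidate, best) >= threshold:
        return best["subject_id"]
    return None


hit = find_subject({"name": "Jane Smyth", "birth_date": "1950-02-01"}, REGISTRY)
miss = find_subject({"name": "Alice Brown", "birth_date": "1999-01-01"}, REGISTRY)
```

A production service would likely compare more fields and route borderline scores to manual review rather than rely on a single threshold.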

28
Application Pool (CVS) Service
  • Stores programs/scripts used in pipeline
  • Provides applications to be downloaded when
    needed
  • Manages versioning of software
  • Provides documentation

29
Management Service
  • Stores workflow execution plan
  • Starts and controls workflow execution
  • Schedules workflow execution
  • Monitors workflow execution and data locations
  • Controls permissions associated with workflow
    execution

30
Data Pipeline/Workflow Application: Use Case for Asthma Data
RPDR
CRC DB
AsthmaMart
Data retrieval
Language processing
Load Data into Mart
Data de-identification
Vocabulary matching
31
Data Pipeline/Workflow: Implementation
  • Define standard XML representation for workflow -
    MoML
  • Define standards for SOAP services and resource
    discovery
  • Adopt and extend open source workflow package
    (Kepler)
  • Prototypes by July timeframe
  • BIRN -> NAMIC and LONI collaboration
  • Can follow construction details at
    http://diagon/i2b2

32
Phenotype/Genotype Database
33
Phenotype/Genotype Database: Principles
  • Analytical database schema that does not need to
    change with new data types and concepts
  • Defined fundamental unit of data (atomic fact):
    observation
  • Defined metadata strategy
  • Various levels of de-identification (reviewed and
    approved by IRB)

34
Phenotype/Genotype Database: Architecture
(see preprint)
35
Phenotype/Genotype Database: Use Case
  • Smoking observations represented in database

Provider_id | Provider_path | Name_char
M0022303 | MGH\Neurology\M0022303 | M0022303

Concept_cd | Concept_path | Name_char
CT-A-SMK | AsthV1\DRptNLP\Tobacco Use\Smoker | Smoking
IC9-3051 | V2\Diagnosis\Mental Disorders (290-319)\Non-psychotic disorders (300-316)\(305) Nondependent abuse of drugs\(305-1) Tobacco use disorder\(305-11) Tobacco use disorder, co | Tobacco Use Disorder, continuous use
CT-A-NSK | AsthV1\DRptNLP\Tobacco Use\Non smoker | Never smoked

Patient_id_e | Concept_cd | Start_date | Provider_id | Confidence_num
Z234 | CT-A-SMK | 1/1/1997 | M0022303 | 3
Z234 | CT-A-SMK | 1/1/1998 | M0034125 | 9
Z234 | IC9-3051 | 1/1/2001 | M0022303 | 3
Z234 | CT-A-NSK | 1/1/2002 | M0034125 | 9

Patient_id_e | Birth_date | Sex_cd | Race_cd | Death_date
Z234 | 3/4/1924 | Female | Black | 4/5/2003
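To make the observation-centred layout concrete, the sketch below loads the smoking rows above into an in-memory SQLite database and reads back one patient's history joined to the concept names. The table names `concept` and `observation` are assumptions, since the slides give only column names, and the dates are rewritten in ISO form so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (concept_cd TEXT PRIMARY KEY, name_char TEXT);
CREATE TABLE observation (
    patient_id_e   TEXT,
    concept_cd     TEXT REFERENCES concept(concept_cd),
    start_date     TEXT,     -- ISO 8601 date
    provider_id    TEXT,
    confidence_num INTEGER
);
""")

# Rows copied from the slide; only the date format is changed.
conn.executemany("INSERT INTO concept VALUES (?, ?)", [
    ("CT-A-SMK", "Smoking"),
    ("IC9-3051", "Tobacco Use Disorder, continuous use"),
    ("CT-A-NSK", "Never smoked"),
])
conn.executemany("INSERT INTO observation VALUES (?, ?, ?, ?, ?)", [
    ("Z234", "CT-A-SMK", "1997-01-01", "M0022303", 3),
    ("Z234", "CT-A-SMK", "1998-01-01", "M0034125", 9),
    ("Z234", "IC9-3051", "2001-01-01", "M0022303", 3),
    ("Z234", "CT-A-NSK", "2002-01-01", "M0034125", 9),
])

# One patient's smoking history, oldest observation first.
rows = conn.execute("""
    SELECT o.start_date, c.name_char, o.confidence_num
    FROM observation AS o
    JOIN concept AS c ON c.concept_cd = o.concept_cd
    WHERE o.patient_id_e = ?
    ORDER BY o.start_date
""", ("Z234",)).fetchall()

for start_date, name, confidence in rows:
    print(start_date, name, confidence)
```

A new kind of observation only needs a new row in `concept` and new rows in `observation`, which is how a schema of this shape can stay fixed as data types are added.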
36
Phenotype/Genotype Database: Implementation
  • Asthma CRC DB primed with data from 90,000
    patients from Research Patient Data Registry
  • Serves as fundamental data structure for i2b2
    supported data Querying and Visualization
    Application Suite
  • CRC DBs able to fuse seamlessly together
  • Various levels of de-identification to be
    supported for data sharing and publication

37
Visualization and Analysis of CRC database
  • Post-CRC workflow

38
Visualization and Analysis: Principles
  • Supported application suite to query and view CRC
    database contents
  • Outside applications for analysis and viewing
    able to plug in to application suite
  • Pipeline/Workflow framework may be used for
    analysis and re-entry of derived data into CRC
    database

39
Visualization and Analysis: Architecture
  • Supported Applications, Querying and
    Visualization
  • Standard querying
  • Data exploration

40
Visualization and Analysis: Architecture
  • Supported Applications, ontology management
  • Ontology Management
  • Integrate (outside?) population analysis
    applications

41
Visualization and Analysis: Architecture
  • Supported applications have plug-in architecture
    for outside analytic tools
  • Standard web-link support with GET and POST
    oriented data transfer
  • Support transfer of specifically transformed data
    to outside applications
  • Complex analysis supported with workflow
    application
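The GET and POST oriented transfer mentioned here can be illustrated with Python's standard library. The endpoint `http://analysis.example.org/launch` and the parameter names are placeholders, not a real i2b2 plug-in interface, and the sketch only builds the link and the request without sending anything.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint for an outside analytic tool.
BASE_URL = "http://analysis.example.org/launch"


def build_get_link(base_url, params):
    """Encode the parameters into the query string of a web link."""
    return f"{base_url}?{urlencode(params)}"


def build_post_request(base_url, params):
    """Prepare a form-encoded POST request carrying the same parameters."""
    body = urlencode(params).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(base_url, data=body, headers=headers, method="POST")


params = {"patient_id_e": "Z234", "concept_cd": "CT-A-SMK"}
link = build_get_link(BASE_URL, params)
post = build_post_request(BASE_URL, params)
```

GET suits small selections that fit in a link, while POST carries larger transformed data sets in the request body.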

42
Visualization and Analysis: Architecture - Query
Launch
43
Visualization and Analysis: Architecture - Exploration
Launch
44
Visualization and Analysis: Architecture
Ontology mgmt
45
Visualization and Analysis: Use Case
46
Visualization and Analysis: Implementation of analysis tools
  • Workflow framework to accommodate external
    analytic applications

patient id 0000004
ProgID AA3.3
CRC DB
subject id 4
ProgID CA2.3
ProgID CN2.3
ProgID XN0.9
subject id 4
ProgID SN5.4
ProgID CX2.3
account 347
ProgID PN5.1
ProgID TH3.0
47
Final Assembly
Gene expression in APOE e4 Allele
person
concept
date
raw value
Z5937X
3/4
Outcomes calculated every week
Surgery
microarray (encrypted)
Alzheimer's
ER visit
Z5937X
3/4
Seizures
Trauma
Z5937X
3/4
ER visits
Gene-Chips
Z5937X
3/4
Clinic visits
Trauma
Seizure
Z5937X
4/6
Surgery
Gene-Chips
Z5956X
5/2
Multiple sclerosis
microarray (encrypted)
Seizure
Z5956X
5/2
Alzheimer's
Z5956X
5/2
Diabetes
Z5956X
5/2
CT Scan
Z5956X
3/9
Hemorrhage
Z5956X
3/9
Trauma
Z5956X
3/9
Thalamus
Z5956X
3/9
48
(No Transcript)