Title: From Placenta to the Brain: Image Informatics Challenges and Experiences
2 From Placenta to the Brain: Image Informatics
Challenges and Experiences
- Tony Pan
- Department of Biomedical Informatics
- The Ohio State University
3 Agenda
- Virtual Mouse Placenta
- BIRN Mouse Brain Distributed Processing
- GridPACS
4 Virtual Mouse Placenta
5 The biological problem
- What is the effect of genetic variation on the
structure and function of an organism?
- How do molecular changes translate into changes
at the organism level?
- Can we translate molecular changes found to be
associated with a disease into demonstrable
anatomic and physiologic changes?
6 Understand function of Rb gene
7 Compare phenotypes of normal vs Rb deficient mice
Alignment
Slides/Slices
Placenta
Visualization
Segmentation
8 Questions
- What are the mechanisms of fetal death in mutant
mice?
- What structural changes occur in the placenta?
- How different are the structural changes between
the wild and mutant types?
9 Computational Phenotyping Challenges
- Very large datasets
- Automated image analysis
- Three dimensional reconstruction
- Motion and deformation
- Integration of multiple data sources
- Data indexing and retrieval
10 Dataset Size: Systems Biology
- Future big science animal experiments on cancer,
heart disease, pathogen host response
- Basic small mouse is 3 cm^3
- 1 µm resolution: very roughly 10^13 bytes/mouse
- Molecular data (spatial location): multiply by
10^2
- Vary genetic composition, environmental
manipulation, systematic mechanisms for varying
genetic expression: multiply by 10^3
- Total: 10^18 bytes per big science animal
experiment
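The orders of magnitude on this slide can be checked with a short back-of-envelope sketch. It assumes 3 bytes per 1 µm voxel (for example 24-bit color), a figure the slide does not state; the constant and function names are illustrative only.

```python
# Back-of-envelope check of the data volumes quoted on the slide.
MICRONS_PER_CM = 10_000
BYTES_PER_VOXEL = 3  # ASSUMPTION: e.g. 24-bit color; not stated in the talk


def bytes_per_mouse(volume_cm3=3, bytes_per_voxel=BYTES_PER_VOXEL):
    """Storage for one mouse imaged with 1 µm^3 voxels."""
    voxels = volume_cm3 * MICRONS_PER_CM ** 3
    return voxels * bytes_per_voxel


def bytes_per_experiment(per_mouse, molecular_factor=10**2, variation_factor=10**3):
    """Scale one mouse by the two multipliers listed on the slide."""
    return per_mouse * molecular_factor * variation_factor


per_mouse = bytes_per_mouse()              # 9 * 10**12, i.e. roughly 10**13
total = bytes_per_experiment(10**13)       # 10**18
print(f"per mouse: {per_mouse:.1e} bytes, per experiment: {total:.0e} bytes")
```

Under that assumption a 3 cm^3 mouse comes to about 9 x 10^12 bytes, consistent with the "very roughly 10^13" on the slide.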
11 Now: Virtual Slides (roughly 25 GB/cm^2 tissue)
12 Structural Complexity in the Natural World
13 Structural complexity in the biological world
14 Data Complexity
- Biomedical informatics research involves a very
large number of heterogeneous types of data
- Descriptive metadata is complex and application
specific
- Joins frequently need to be carried out between
different types of data
- Image data and mass spec data account for 99% of
the storage requirements but maybe 5% of the
complexity
15 What will be done with Complex Grid Data? NCI
caBIG Program (Scenario from U Penn Cancer
Center)
- A researcher would like to study the error rate
in pathological diagnoses of solid tumor samples
and compare numerous molecular diagnostic
approaches to determine if the molecular
diagnostic approach can enhance the accuracy of
pathological diagnoses.
- Query
- I want all solid tumors, specifically for lung
cancer, that have a diagnosis based on tumor
pathology. Each diagnosis must have an image of
the tumor that allows for independent
verification of diagnoses. Each record retrieved
must also have either proteomics marker data or
microarray data (Affy or two-color) included so
that different molecular techniques can be
correlated to the tumor pathology. In addition, I
want all protein annotations for markers and
genes associated with the proteomics and
microarray data so I can perform meta-analyses.
16 Return to Placenta Problem
- What are the mechanisms of fetal death in mutant
mice?
- What structural changes occur in the placenta?
- How different are the structural changes between
the wild and mutant types?
17 Placental Architecture
decidua
giant trophoblasts
trilaminate layer
spongiotrophoblasts
(E13.5)
labyrinth trophoblasts
yolk sac
18 Placental defects in Rb-mutant mice
Good - Labyrinth neat, well-ordered; maternal
blood sinusoids and trophoblasts evenly dispersed
among fetal blood cells.
Bad - Trophoblasts grow wildly, clump together,
and disrupt the fetal and maternal cell layers
necessary for proper embryonic growth.
19 An Exercise in Systems Biology
- Goals: morphological changes from Rb gene
mutation
- Surface area between different cell layers
- Vascular Density
- Volume of labyrinth layer
- Qualitative insights from 3-D visualization
- .
20 Mouse Placenta Flowchart
Compare phenotypes of normal vs Rb deficient mice
Alignment
Slides/Slices
Placenta
Visualization
Segmentation
21 Mouse Placenta Flowchart
22 Tissue Segmentation: Labyrinth
23 Probabilistic Segmentation
- Preprocessing - color correction, sub-sampling
- Training set - manually selected ROIs (48) are
used as training set.
- Probabilistic classifier - Bayesian Maximum A
Posteriori estimator
- Local region is classified into one of three
fetal tissue types
- Features - Color histogram, gradient histogram,
red pixel count, nuclei size histogram, vacuole
size histogram (more than 500 features)
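As an illustration of maximum a posteriori classification of a local region from its feature vector, here is a minimal sketch. It assumes independent Gaussian likelihoods per feature and priors estimated from class frequencies in the training set. The talk does not say which likelihood model was used, so this is one plausible instance and not the authors' implementation. The two-feature toy data and the labels 0, 1, 2 are made up.

```python
import math
from collections import defaultdict


def fit_gaussian_map(samples, labels, var_floor=1e-6):
    """Estimate per-class log prior, feature means and variances from training ROIs."""
    by_class = defaultdict(list)
    for x, c in zip(samples, labels):
        by_class[c].append(x)
    model = {}
    for c, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The small floor keeps constant features from producing zero variance.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[c] = (math.log(n / len(samples)), means, variances)
    return model


def classify_map(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_posterior(params):
        log_prior, means, variances = params
        log_lik = sum(
            -0.5 * (math.log(2 * math.pi * var) + (xi - m) ** 2 / var)
            for xi, m, var in zip(x, means, variances)
        )
        return log_prior + log_lik

    return max(model, key=lambda c: log_posterior(model[c]))


# Toy training set: two made-up features per region, three tissue classes.
train_x = [[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.1], [10.0, 0.0], [10.1, 0.1]]
train_y = [0, 0, 1, 1, 2, 2]
model = fit_gaussian_map(train_x, train_y)
predicted = [classify_map(model, x) for x in [[0.05, 0.05], [5.05, 5.0], [9.9, 0.0]]]
print(predicted)
```

With several hundred features, as on the slide, working in log space as above avoids numerical underflow when the per-feature likelihoods are multiplied.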
24 Is It Good? Misclassifications?
25 Tissue Classification: Slide 586
26 Tissue Classification: Sensitivity vs. Specificity
- 10 images were tested
- Sensitivity and Specificity are calculated and
plotted
- Sensitivity = true positive / (true positive +
false negative)
- Specificity = true negative / (false positive +
true negative)
- Results
- K-Means performed the worst
- 2-point correlation function has highest
specificity
- Bayesian MAP has highest sensitivity
- Difference in sensitivity and specificity is
related to the features used in the
classification. A hybrid set of features from the
three methods would be beneficial.
         K-means        2-Point Correlation   Bayesian MAP
slide    sens   spec    sens   spec           sens   spec
377      0.61   0.80    0.72   0.98           0.90   0.75
586      0.68   0.79    0.73   0.98           0.91   0.74
826      0.59   0.83    0.69   0.99           0.87   0.74
1047     0.82   0.74    0.76   0.99           0.90   0.71
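The two definitions on this slide can be computed directly from per-pixel (or per-region) labels. The sketch below is a generic illustration with made-up labels, not the evaluation code behind the table; the function name and the `positive` parameter are assumptions.

```python
def sensitivity_specificity(truth, predicted, positive):
    """Return (sensitivity, specificity), treating `positive` as the class of interest."""
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    # Undefined ratios (no positives or no negatives in the truth) are reported as NaN.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (fp + tn) if (fp + tn) else float("nan")
    return sensitivity, specificity


# Made-up labels: 1 = labyrinth, 0 = everything else.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(truth, predicted, positive=1)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```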
27 An Exercise in Systems Biology
- Goals: morphological changes from Rb gene
mutation
- Surface area between different cell layers
- Vascular Density
- Volume of labyrinth layer
- Qualitative insights from 3-D visualization
- .
28 3D Fingers
Spongiotrophoblast
glycogen
labyrinth
29 Finger Presence
- Slides 576-625 (46)
- Rigid Registration using Mutual Information and a
2-Level Random walk optimizer
- Deformable Registration using piecewise linear
selection of control points
- 3D Reconstruction of the finger outlines in each
image
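To make the mutual-information criterion concrete, the sketch below estimates mutual information from a joint intensity histogram and uses it to pick the best alignment of two 1-D scan lines. It is a simplified stand-in. The exhaustive search over circular shifts replaces the 2-level random walk optimizer named on the slide, and real slides would be 2-D images under rigid transforms. All names and the toy signals are assumptions.

```python
import math
from collections import Counter


def mutual_information(a, b, bins=8, max_value=255):
    """Mutual information (in nats) of two equally long sequences of integer intensities."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")

    def bin_of(v):
        return min(int(v) * bins // (max_value + 1), bins - 1)

    pairs = [(bin_of(x), bin_of(y)) for x, y in zip(a, b)]
    n = len(pairs)
    joint = Counter(pairs)
    marg_a = Counter(i for i, _ in pairs)
    marg_b = Counter(j for _, j in pairs)
    # Sum of p(i,j) * log(p(i,j) / (p(i) * p(j))) over occupied histogram cells.
    return sum(
        (count / n) * math.log(count * n / (marg_a[i] * marg_b[j]))
        for (i, j), count in joint.items()
    )


def best_shift(fixed, moving, max_shift=3, bins=8):
    """Try circular shifts of `moving` and return the shift that maximizes MI with `fixed`."""
    def shifted(s):
        return moving[s:] + moving[:s]

    return max(
        range(-max_shift, max_shift + 1),
        key=lambda s: mutual_information(fixed, shifted(s), bins=bins),
    )


fixed = [0, 0, 0, 255, 255, 0, 0, 0]
moving = [0, 0, 0, 0, 0, 255, 255, 0]   # same profile, displaced by two samples
shift = best_shift(fixed, moving)
print(shift)
```

Mutual information only asks that intensities in the two images be statistically related, not identical, which is why it tolerates the staining and color differences between neighboring slides.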
30 Rb+/+
Rb-/-
32 Challenges
- Damaged / incomplete data
- Missing parts / slides (broken tissue, folded
tissue)
- Flipped slides
- Large variations Deformations
- Non-linear warping
- Different thickness
- Different color
- Morphological changes
- Blood sinus
- Maternal tissues
Flipped
Folded
Color Gradient
Broken
33 Algorithmic Challenges
- Output: Segmented Tissue Layers from Aligned
Slices
- Large data size
- High resolution image (1.0-5.0 GB / image)
- Large number of slides (2-5 mm)
- Rb mutant: 800 slides / Wild type: 1200 slides
- Too many features: any optimization has many
extrema
- No well defined boundaries
- Need robust algorithms: A Big Challenge
34 BIRN Mouse Brain Distributed Processing
35 Mouse Brain Phenotype Characterization: Confocal
Microscopy (joint work with NCMIR)
[Pipeline diagram]
Image file
correctional tasks: normalization, stitching, warping, declustering
preprocessing tasks: thresholding, tessellation, prefix sum generation
target task: querying
- Problem definition: how many pixels of a certain
color intensity exist within a rectilinear region
of interest?
- Implementation: the prefix sum solves the query
without scanning every pixel within the region of
interest
36 Solving aggregate queries involving Sum or Count
operations on spatial data
SELECT Add(Value(x,y)) FROM Image WHERE (x,y)
in POLYGON <(10,20),(300,400)>
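The prefix-sum idea can be sketched on a small in-memory image. After one pass that builds a 2-D prefix sum (summed-area table), any axis-aligned rectangular Sum or Count query is answered with four lookups. This sketch assumes inclusive corner coordinates and a single image held in memory. It ignores the tiling and distribution described in the talk, and the function names are illustrative.

```python
def threshold(image, lo, hi):
    """Return a 0/1 mask marking pixels whose intensity lies in [lo, hi]."""
    return [[1 if lo <= v <= hi else 0 for v in row] for row in image]


def build_prefix_sum(image):
    """P[y][x] holds the sum of image rows < y and columns < x (extra zero row/column)."""
    height, width = len(image), len(image[0])
    P = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += image[y][x]
            P[y + 1][x + 1] = P[y][x + 1] + row_sum
    return P


def region_sum(P, x0, y0, x1, y1):
    """Sum over the rectangle with inclusive corners (x0, y0) and (x1, y1)."""
    return P[y1 + 1][x1 + 1] - P[y0][x1 + 1] - P[y1 + 1][x0] + P[y0][x0]


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
sums = build_prefix_sum(image)
total = region_sum(sums, 1, 1, 2, 2)                 # 5 + 6 + 8 + 9

# Count query: how many pixels with intensity in [4, 8] fall inside a region?
counts = build_prefix_sum(threshold(image, 4, 8))
in_range = region_sum(counts, 0, 1, 1, 2)            # pixels 4, 5, 7, 8
print(total, in_range)
```

Thresholding first turns the "how many pixels of a certain intensity" question into a plain sum over a 0/1 mask, so the same table answers both Sum and Count queries.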
38 Algorithms Overview
- We develop distributed algorithms that function
across any number of compute nodes in a cluster
- When an algorithm begins, data is generally
distributed as well, across each compute node's
disk
- Algorithms work on an out-of-core basis by
pulling in one or a few tiles at a time, and
write their result out similarly, one tile at a
time to local disk
- Conversions at each end of the pipeline from
single-file formats (e.g. .IMG, .PPM) to and from
distributed storage
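The out-of-core pattern can be illustrated on a single node: each tile is read from local disk, transformed, and written back before the next tile is touched, so memory use is bounded by the tile size and not by the image size. The one-file-per-tile layout, the `.tile` extension, and the function names are assumptions for this sketch and do not describe the formats used in the project.

```python
import tempfile
from pathlib import Path


def process_tiles(in_dir, out_dir, transform, pattern="*.tile"):
    """Apply `transform` to each tile file in turn, holding one tile in memory at a time."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for tile_path in sorted(in_dir.glob(pattern)):
        data = tile_path.read_bytes()          # only this tile is resident
        out_path = out_dir / tile_path.name
        out_path.write_bytes(transform(data))  # result goes straight back to disk
        written.append(out_path)
    return written


def invert(tile_bytes):
    """Example per-tile operation: invert 8-bit intensities."""
    return bytes(255 - b for b in tile_bytes)


def demo():
    """Write two tiny tiles to a temporary directory and invert them tile by tile."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "in"), Path(tmp, "out")
        src.mkdir()
        (src / "tile_000.tile").write_bytes(bytes([0, 10, 255]))
        (src / "tile_001.tile").write_bytes(bytes([128]))
        outputs = process_tiles(src, dst, invert)
        return [list(p.read_bytes()) for p in outputs]


print(demo())
```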
39 Distributed Execution: DataCutter
- Pipe-and-filter metaphor of data processing
- Data is streamed from producer to consumer
filters
- Framework for task- and data-parallel
manipulation of large scientific data
- Transparent copies of filters
- Provide distributed computation and
application-specific storage access
- XML description of data and task flow
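The pipe-and-filter metaphor can be sketched with Python generators: each filter consumes a stream of tiles and yields results downstream, so data flows from producer to consumer one item at a time. This shows the general pattern only. It does not use or reproduce the DataCutter API, it runs in a single process without filter copies, and all names are hypothetical.

```python
from functools import partial


def read_tiles(tiles):
    """Producer filter: emit tiles one at a time."""
    for tile in tiles:
        yield tile


def threshold_filter(stream, cutoff):
    """Intermediate filter: turn each tile into a 0/1 mask."""
    for tile in stream:
        yield [1 if v >= cutoff else 0 for v in tile]


def count_filter(stream):
    """Consumer-side filter: reduce each mask to a pixel count."""
    for mask in stream:
        yield sum(mask)


def run_pipeline(source, *stages):
    """Chain the stages lazily, then drain the final stream."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    return list(stream)


tiles = [[10, 200], [250, 30, 255]]
counts = run_pipeline(
    read_tiles(tiles),
    partial(threshold_filter, cutoff=128),
    count_filter,
)
print(counts)
```

Because every stage is lazy, no stage sees more than one tile at a time, which is the property that lets a streaming framework place filters on different nodes.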
40 Every Corrective Phase Except Warping
41 Indexing Terabyte-scale images on OSC MSS (16
nodes, ext3)
42 GridPACS
43 Outline
- Motivation
- Use case
- Mobius Overview
- Mako Service
- Virtual Mako
- Grid PACS
- Questions
44 Motivation
- Integration of multi-institutional data sets
across modalities.
- Expose existing data resources with minimal
effort
- Provide methods for automatically creating
databases to model new datasets.
- Ability to execute distributed queries across all
exposed data resources.
- Provide methods for translating between data
types
- System should support any data type but promote
the convergence and standardization of similar
types.
45 Use Case
46 Mobius
- The Mobius project attempts to define and build a
set of services and protocols enabling the
management and integration of both data and
metadata.
- Mobius Core Services
- Global Model Exchange (GME)
- Data Storage and Retrieval (Mako)
- Data Integration and Translation (DTS)
- Mobius Extension Services
- Higher level query services, Adhoc federation
services, Metadata Transportation Services.
47 Mako
- Service framework that exposes data resources as
XML data services through a set of well-defined
interfaces based on the Mako protocol.
- Interfaces based on the GGF DAIS working group's
XML realization specification.
- Example Operations
- Insertion
- Retrieval
- XPath
- XUpdate
- Deletion
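To give a feel for these operations, here is a toy in-memory XML collection supporting insertion, retrieval, XPath query, and deletion. It is a hypothetical stand-in built on Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath. It does not implement the Mako protocol or XUpdate, and the class name, method names, and sample documents are invented for the example.

```python
import xml.etree.ElementTree as ET


class XmlCollection:
    """Toy in-memory XML document store (hypothetical; not the Mako API)."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, xml_text):
        """Parse and store a document, returning its identifier."""
        doc_id = self._next_id
        self._docs[doc_id] = ET.fromstring(xml_text)
        self._next_id += 1
        return doc_id

    def retrieve(self, doc_id):
        """Return the stored document serialized back to text."""
        return ET.tostring(self._docs[doc_id], encoding="unicode")

    def xpath(self, expression):
        """Return (doc_id, element) pairs matching an ElementTree-supported XPath subset."""
        return [
            (doc_id, element)
            for doc_id, root in self._docs.items()
            for element in root.findall(expression)
        ]

    def delete(self, doc_id):
        del self._docs[doc_id]


# Made-up sample records.
collection = XmlCollection()
first = collection.insert('<study><image modality="MR" slide="586"/></study>')
second = collection.insert('<study><image modality="CT" slide="377"/></study>')
hits = collection.xpath(".//image[@modality='MR']")
print([(doc_id, element.get("slide")) for doc_id, element in hits])
```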
48 Mako Architecture
- Abstract Communication Layer
- Configurable Protocol Handling
- Abstracts Mako Infrastructure from the underlying
data resource
- Protocol Handlers Specified at run time.
- Abstract Handlers are extended to expose a
particular data resource
- Handlers are easy to write and deploy.
49 Mako Current Support
- MakoDB
- In house XML database, optimized for supporting
specialized Mako features.
- XML Databases
- Handler implementation for the XMLDB API
- Tested using Xindice and eXist
- Relational Databases
- Handler implementation for exposing relational
databases using XBridge.
- Requires the creation of an XBridge Map file.
50 Mako Features
- Partial Retrieval
- Distributed Document Object Model (DOM)
- Binary Object Support
- Mako protocol supports attaching binary objects
to XML files.
- Data Referencing
51 Virtual Mako
- Simplifies client-side complexity of interfacing
with multiple Makos by presenting a single
virtualized interface to a collection of
federated Makos
- Acts as a data integration point for distributed
queries
- Pluggable algorithms for XML instance
ingestion/distribution
- Protocol request broadcast and response
aggregation
- Supports all services a standard Mako supports
- Maps a Virtual Collection to a number of remote
standard Collections
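Request broadcast and response aggregation can be sketched as a front end that forwards one query to every backing store in parallel and merges the answers. The classes below are hypothetical stand-ins that use in-process objects and a thread pool. They do not reflect the Virtual Mako implementation or its wire protocol, and the records are made up.

```python
from concurrent.futures import ThreadPoolExecutor


class ListStore:
    """Hypothetical stand-in for one remote data service holding a list of records."""

    def __init__(self, name, records):
        self.name = name
        self.records = list(records)

    def query(self, predicate):
        return [record for record in self.records if predicate(record)]


class VirtualStore:
    """Presents several stores as one: broadcasts a query and aggregates the results."""

    def __init__(self, stores):
        self.stores = list(stores)

    def query(self, predicate):
        workers = max(1, len(self.stores))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map preserves the order of self.stores in its results.
            partials = list(pool.map(lambda store: store.query(predicate), self.stores))
        return [
            (store.name, record)
            for store, part in zip(self.stores, partials)
            for record in part
        ]


site_a = ListStore("site-a", [{"slide": 377, "type": "mutant"}, {"slide": 586, "type": "wild"}])
site_b = ListStore("site-b", [{"slide": 826, "type": "mutant"}])
virtual = VirtualStore([site_a, site_b])
mutants = virtual.query(lambda record: record["type"] == "mutant")
print(mutants)
```

Tagging each result with the store it came from keeps provenance visible to the client even though the query was issued against a single virtual interface.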
52 Grid PACS
- Designed to address the storage, querying, and
processing requirements of large-scale image
databases in a grid-wide environment.
- Model-centric application; majority of backend
implemented by simply submitting schemas to a
number of Makos
- Enables modeling and execution of image
processing workflows
53 Grid PACS
- Relies heavily on the Mobius Infrastructure
- Data Referencing: metadata and chunks of data
distributed across grid via references
- Partial Retrieval: data retrieved on demand
- Distributed DOM: emulates local data environment
- VMako: query broadcast and aggregation
- Model-driven data storage: on-demand creation of
schema-based metadata and image storage
collections on Makos
54 The Future
55 3D Model of the Placenta: Architectural Framework
- Complete annotation of cells
- 4th Dimension of Time
- Genetic and Biochemical analysis
of purified cell populations
- - Genetic
- - Chromatin
- - Gene expression (mRNA, miRNA)
- - Proteomics
- - Signaling
- - Immunohistochemical
- Monitored data entry
- Educated Bioinformatics
- Interrogation/Experimental testing
56 Mammary Gland Microenvironment
57 3D Model of Breast Cancer Tumor progression
- Epithelial tumor cell
- Myoepithelial cell
- Stromal fibroblast
- Adipocyte
- Tissue macrophage
- T cell
- Endothelial cell
3 Representative Tumor Models of Breast cancer
58 Summary: Computational Phenotyping and High End
Computing
- Genes comprise (part of) life's source code
- Understanding biomedicine requires understanding
how genes and environment interact in space-time
- Molecules to man is a big quantitative leap!
59 NIH BISTI Center at Ohio State
Biomedical Research | Imaging Research | Computer Science
Ensure success of biventricular pacing | Semi-gated cardiac imagery analysis | Machine vision; on-demand large data analysis
Role of oncogenes in development | Multiple modality mouse placenta imaging, information synthesis, registration | Image analysis in ensembles of very high resolution 3-D imagery; interactivity
Mechanism of ischemic cardiac injury | Synthesis of multimodal imaging, genotype, gene expression, proteomic data | Grid data management and query; information integration involving multi-modal image, molecular data
60 Multiscale Laboratory Research Group
Ohio State University: Joel Saltz, Gagan Agrawal,
Umit Catalyurek, Dan Cowden, Mike Gray, Tahsin Kurc,
Shannon Hastings, Steve Langella, Scott Oster,
Tony Pan, DK Panda, Srini Parthasarathy,
P. Sadayappan, Sivaramakrishnan (K2), Michael Zhang
The Ohio Supercomputer Center: Stan Ahalt, Jason
Bryan, Dennis Sessanna, Don Stredney, Pete Wycoff
61 Microscopy Image Analysis
- Pathology
- Dr. Dan Cowden
- Human Cancer Genetics
- Pamela Wenzel
- Dr. Gustavo Leone
- Dr. Alain deBruin
- Biomedical Informatics
- Tony Pan
- Alexandra Gulacy
- Dr. Kun Huang
- Dr. Metin Gurcan
- Dr. Ashish Sharma
- Dr. Joel Saltz
- Computer Science and Engineering
- Kishore Mosaliganti
- Randall Ridgway
- Richard Sharp
62 Mobius Team
- David Ervin
- Daniel Hall
- Shannon Hastings
- Tahsin Kurc
- Stephen Langella
- Scott Oster
- Tony Pan
- Joel Saltz