In silico biology: computational path toward holistic understanding of living cells - PowerPoint PPT Presentation

Slides: 20
Provided by: Staf544
Learn more at: https://www.csm.ornl.gov
Transcript and Presenter's Notes

1
In silico biology: computational path toward
holistic understanding of living cells
Andrey A. Gorin, Computer Science and
Mathematics, Oak Ridge National Laboratory, agor_at_ornl.gov
2
Motivation: Predictive Biology
  • Developments in Biological Sciences
  • Experimental: From Reduction to Systems Science
  • Computational: From Validation to Prediction
  • Developments in Technologies
  • High Throughput Experimentation
  • High Performance Computing
  • Uniqueness of Biology
  • First Principles Approach is Impossible/Impractical
  • Enormous Multitude of Scales
  • Descriptive Models from Diverse Data
  • Large Uncertainties in Data

3
Scientific Drivers
  • Bio-remediation
  • Environmental restoration using microbial
    processes requires study of:
  • Microbial attachment to mineral environments
  • Uptake of contaminant ions by microbes and metal
    reduction at the microbial membrane
  • Conversion of toxic chemicals by microbial
    systems
  • Bio-energy
  • Development of integrated experimental and
    computational approaches for:
  • Feedstock optimization with the goal of better
    cellulose deconstruction by special bacterial
    systems
  • Understanding of microbial communities in single
    batch processes (harsh environments)
  • Enzyme or regulatory circuit design to increase
    desirable output

4
Our Research Directions
  • Mass-spectrometry-based proteomics: protein
    identification and quantification in complex
    biological samples
  • Structural models of protein complexes: docking
    from known components, fundamental principles of
    molecular recognition, prediction of protein
    complexes in novel genomes
  • Network models: reconstruction from protein
    interactions and other sources, simple simulation
    to demonstrate predictive capabilities

5
Exploring Protein Dimension of Bio Universe
Entire proteome is analyzed in a few hours
1 out of 10^5-10^11 must be selected as the correct
peptide
Mass spectrometry process
6
De novo Platform: Probability Profile Method
We made several advances in understanding
the mathematics of mass spectrometry. Taken together,
they lead to a conceptually novel platform.
Peak Assignment
m1 m2 m3 m4
510 VDDLSSLT 305
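The peak-assignment figure appears to show a sequence tag (VDDLSSLT) flanked by two masses (510 and 305). The sketch below illustrates only the generic de novo idea of reading residues off the gaps between consecutive peaks; it is not the Probability Profile Method itself. The function names, the 0.01 Da tolerance, and the reading of 510 as a starting mass offset are assumptions made for illustration. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da (standard values).
# Isoleucine is omitted because it is isobaric with leucine.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
}


def ladder(offset, tag):
    """Return the peak positions obtained by adding each residue of `tag` to `offset`."""
    peaks = [offset]
    for aa in tag:
        peaks.append(peaks[-1] + RESIDUE_MASS[aa])
    return peaks


def read_tag(peaks, tol=0.01):
    """Assign each gap between consecutive peaks to a residue, or '?' if no unique match."""
    tag = []
    for lo, hi in zip(peaks, peaks[1:]):
        gap = hi - lo
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol]
        tag.append(matches[0] if len(matches) == 1 else "?")
    return "".join(tag)


# Hypothetical ideal spectrum: the tag from the slide, starting at the assumed 510 Da offset.
peaks = ladder(510.0, "VDDLSSLT")
print(read_tag(peaks))  # VDDLSSLT
```

Real spectra contain noise peaks and missing fragments, so a practical method must score many competing assignments rather than read a single clean ladder.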
7
Results: Capabilities in Mass Spectrometry
Advances in the mathematical understanding and
dramatic acceleration of fundamental operations
lead to fundamentally new capabilities:
  • Output gains, especially in highly confident
    identifications. Our method gives several times
    more highly confident identifications.
  • Capability to detect unexpected phenomena in the
    samples. We have found novel biological phenomena
    in the legacy data and uncovered mistakes in the
    data sets regarded as benchmarks.

Deamidations (6 spectra): IHPFAQTQSLVYPFPGPIPN -> IHPFAQTQSLVYPFPGPIPD
Incorrect peptides (7 spectra): VIPAADLSQQISTAGTEASGTGNMK -> VIPAADLSEQISTAGTEASGTGNMK
Disulphide bond (1 spectrum): AAANFFSASCVPCADQSSFPK
De novo methods can improve even manually
verified benchmark data sets obtained by the
existing technology.
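The two peptide pairs listed on this slide differ by a single residue each (N versus D, and Q versus E). As a minimal sketch, the snippet below computes the monoisotopic mass difference for each pair from standard residue masses. The helper names are illustrative, and the calculation ignores charge state and any other modifications.

```python
# Monoisotopic residue masses in Da (standard values for the 20 amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide (free N- and C-termini)


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mass_shift(original, variant):
    """Mass difference (variant minus original) in Da."""
    return peptide_mass(variant) - peptide_mass(original)


# Peptide pairs copied from the slide.
pairs = [
    ("IHPFAQTQSLVYPFPGPIPN", "IHPFAQTQSLVYPFPGPIPD"),
    ("VIPAADLSQQISTAGTEASGTGNMK", "VIPAADLSEQISTAGTEASGTGNMK"),
]
shifts = [mass_shift(a, b) for a, b in pairs]
for (a, b), shift in zip(pairs, shifts):
    print(f"{a} -> {b}: {shift:+.4f} Da")
```

Both substitutions shift the peptide mass by just under 1 Da, which suggests why such cases are easy to miss when spectra are matched against a fixed database.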
8
Model-dependent Science Questions
Many fundamental questions in systems biology are
hampered by the lack of reliable predictive
models.
Network Models Reliant
Structural Models Reliant
  • What biochemical processes in a microbe are
    related to its traits (hydrogen or ethanol
    producer, ethanol resistant)?
  • How do bacteria degrade lignin or cellulose?
  • What are the mechanisms behind the conversion of
    toxic waste to nontoxic substances by bacteria?
  • What are biochemical or regulatory functions of
    the proteins that are shown to be important for
    hydrogen production?
  • What are the hot spots of cellulases that could
    impair their binding?
  • What are the components of hydrogen producing
    protein assemblies?

9
Predictive Model Building
10
Protein Complexes in Genomics:GTL
GTL is focused on protein interactions that make
life work
Express Bait Protein
Is the interaction real or an artifact?
What is the structure of the protein complex?
What is its function?
What is its dynamic mechanism?
Can we answer these questions at scale?
Exogenous / Endogenous
Pulldown
Mass Spectrometry Analysis
Putative Interacting Proteins at High Throughput
X-ray Diffraction
11
Modeling Protein Structures and Complexes
Combinatorial and optimization techniques are
applied in two areas: development of
knowledge-based potentials and analysis of
ultra-large structural sets.
Discovery of protein complexes
Geometry and bioinfo libraries
Shared memory indices
Protein folding
Parallel implementations
Ligand binding
Graph algorithms
12
Computational Algorithms in Structure Modeling
A multitude of combinatorial optimization problems
with different data access patterns.
Example: Ab Initio Prediction of Protein 3-D
Structure
13
Results: Protein Docking
  • Full implementation of Bayesian potential
    energy functions for protein docking
  • The size of the docking benchmarking set
    (1,200) is unprecedented in the field. High
    quality results were obtained for over 70% of all
    tested complexes (native in top 5), indicating
    that the method is very effective.
  • Successfully modeled protein complexes using the
    structures from other organisms
  • Docking calculations scale to 1000s of
    processors thanks to the surface patch approach to
    parallelization
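The slides do not specify the form of the knowledge-based and Bayesian docking potentials. As a hedged illustration of the general family, the sketch below builds a simple log-odds contact potential: each residue-type pair is scored by how much more often it is observed in contact than expected from residue frequencies alone. The function names, the pseudocount, and the toy contact list are all hypothetical and are not taken from the presentation.

```python
import math
from collections import Counter


def contact_potential(observed_contacts, pseudocount=1.0):
    """Log-odds score per residue-type pair: -ln(P_observed / P_expected)."""
    pair_counts = Counter(tuple(sorted(p)) for p in observed_contacts)
    residue_counts = Counter(aa for pair in observed_contacts for aa in pair)
    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    types = sorted(residue_counts)
    n_pair_types = len(types) * (len(types) + 1) // 2

    potential = {}
    for i, a in enumerate(types):
        for b in types[i:]:
            # Smoothed observed frequency of this pair among all contacts.
            p_obs = (pair_counts[(a, b)] + pseudocount) / (
                total_pairs + pseudocount * n_pair_types
            )
            # Expected frequency if residues paired at random.
            fa = residue_counts[a] / total_residues
            fb = residue_counts[b] / total_residues
            p_exp = fa * fb if a == b else 2 * fa * fb
            potential[(a, b)] = -math.log(p_obs / p_exp)
    return potential


def score_interface(contacts, potential):
    """Sum of pair scores over an interface; lower means more favorable."""
    return sum(potential[tuple(sorted(p))] for p in contacts)


# Made-up training contacts, for illustration only.
training = [("L", "V"), ("L", "L"), ("L", "V"), ("K", "E"), ("K", "E"), ("K", "K")]
potential = contact_potential(training)
print(score_interface([("K", "E"), ("L", "V")], potential))
```

In a docking run, a score of this kind would be evaluated for each candidate pose and the poses ranked by it; the "native in top 5" criterion then asks whether a near-native pose appears among the five best-scoring candidates.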

14
Future: Protein Docking
  • Further develop the Protein Interface Server (PINS)
    database (pins.ornl.gov).
  • Develop theory and computational implementation
    for docking potentials taking into account
    orientation of the contacting residues
    (Bayesian). Petascale implementations for docking
    platforms.
  • Improve mathematical methods for prediction
    validation.
  • Analyze and annotate PINS interfaces related to
    the metabolic functions involving
    carbon-processing pathways in cellulose-degrading
    bacteria. Construct predictions for the organisms
    important for Bioenergy Centers in cases where
    full or partial genome sequences are available.

15
Predictive Model Building
16
LDRD D07-014: Modeling Cellular Mechanisms for
Efficient Bioethanol Production through Petascale
Analysis of Biological Networks
Andrey Gorin, Nabeela Ahmad, Andrew Bordner,
Robert Day, Jessie Gu, Guruprasad Kora, Chongle
Pan, Byung-Hoon Park, Nagiza Samatova, Edward
Uberbacher, Cray Inc.
Oak Ridge National Laboratory, FY2007-FY2008
17
Future: Graph Algorithms for Networks
  • Analysis of graphs reflecting physical
    interactions in structures
  • Very wide set of problems, but with a uniform set
    of translation rules
  • Enormously large graphs
  • Directly connected to modeling petascale
    applications
  • Developing graph representations for real
    cellular subsystems.
  • Nodes: genes, proteins, DNA elements,
    metabolites.
  • Edges: translation, positive regulation,
    production of metabolite
  • Flexible representations
  • Integration of many tools to annotate 5' regions
  • Promoter alignment
  • Models to predict transcription patterns based on
    promoter models
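The node and edge types listed above map naturally onto a typed directed graph. The sketch below is one minimal way to represent such a cellular subsystem, not the representation used by the authors. The class names and the toy entities (geneA, proteinA, and so on) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # "gene", "protein", "dna_element", or "metabolite"


class CellGraph:
    """Directed graph whose edges carry a relation label."""

    def __init__(self):
        self.edges = defaultdict(list)  # source -> [(relation, target), ...]

    def add_edge(self, source, relation, target):
        self.edges[source].append((relation, target))

    def targets(self, source, relation):
        """Nodes reached from `source` through edges of one relation type."""
        return [t for r, t in self.edges.get(source, []) if r == relation]

    def downstream(self, start):
        """All nodes reachable from `start`, following edges of any type."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for _, target in self.edges.get(node, []):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


# Hypothetical toy subsystem: geneA's product activates geneB,
# whose product in turn produces a metabolite.
gene_a, prot_a = Node("geneA", "gene"), Node("proteinA", "protein")
gene_b, prot_b = Node("geneB", "gene"), Node("proteinB", "protein")
met_x = Node("metaboliteX", "metabolite")

g = CellGraph()
g.add_edge(gene_a, "translation", prot_a)
g.add_edge(prot_a, "positive_regulation", gene_b)
g.add_edge(gene_b, "translation", prot_b)
g.add_edge(prot_b, "production_of_metabolite", met_x)

print(sorted(n.name for n in g.downstream(gene_a)))
```

Keeping the relation as a label on each edge, rather than using a separate graph per relation, lets new edge types be added without changing the data structure.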

18
Future: Mining Networks for Bioenergy
  • Switchgrass paralogs/homologs have to be
    identified in collections of >250,000 partial
    gene sequences (ESTs) from rice, maize, and sorghum genomes
  • It is expected that 6000 Populus lines will be
    passed through a cell wall phenotyping pipeline
    (expression arrays, proteomics data, etc.)
  • Huge data sets are already available (e.g. Obayashi et al.
    (2007): 1,388 expression arrays)

(From Bioenergy Center proposal)
19
Future: Taking It to Petascale
  • Port to NLCF platforms
  • Scale Maximal Clique Enumeration from 100s to
    1000s of processors
  • Introduce similar implementations for other codes
    in pGraph and BioGraphE
  • Optimize performance on NLCF architectures
  • Deliver efficient MPI-based multi-core
    implementations
  • Minimize thread locking/unlocking overheads
  • Exploit data locality in job redistribution
    strategies
  • Minimize I/O overheads via buffering, data
    compression, hierarchical parallel I/O.
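For reference, the core of maximal clique enumeration can be sketched serially with the classic Bron-Kerbosch algorithm with pivoting. This is a textbook version, not the parallel implementation in pGraph or BioGraphE, and the example graph is made up.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph given as {node: set_of_neighbors}."""

    def expand(r, p, x):
        # r: current clique, p: candidates that can extend it, x: already-processed nodes.
        if not p and not x:
            yield set(r)
            return
        # Pivot on the node with the most neighbors in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Toy graph: a triangle a-b-c plus a pendant edge c-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cliques = sorted(sorted(c) for c in maximal_cliques(adj))
print(cliques)  # [['a', 'b', 'c'], ['c', 'd']]
```

Each top-level branch of the recursion explores an independent subproblem, which is one reason the algorithm lends itself to distribution across processors; balancing the very uneven branch sizes is then the main difficulty.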

20
Potential Areas of Joint Work
  • Mathematics
  • Statistics of pattern recognition problems
  • Combinatorial optimization methods beyond dynamic
    programming
  • Bayesian system with strong feature correlations
  • Computer science
  • Graph algorithms working for VERY large graphs
    (10^4-10^5 nodes) and across many graphs (10^3) in
    parallel
  • Flexible software systems for representing
    transcription regulation networks in
    bacterial cells (4 genes)
  • Petascale implementation strategies: MPI
    implementations, load balancing, etc.
  • Biological Applications
  • Novel applications in mass-spectrometry (e.g.
    gene finding in new genomes, special gene
    expression in stem cells, mass-spec of
    cross-linked systems)
  • Expression array analysis and metabolic pathway
    construction for bacteria involved in bioenergy
    processing