Parallel Algorithms and Distributed Systems for Computational Biophysics


Title: Parallel Algorithms and Distributed Systems for Computational Biophysics


1
Parallel Algorithms and Distributed Systems for
Computational Biophysics
Paul R. Brenner
Jesús A. Izaguirre, Advisor
  • Department of Computer Science and Engineering
  • University of Notre Dame

Dissertation Defense July 2007
2
Motivation
  • Discovery of biophysical mechanisms via
    simulation accelerates the understanding and
    treatment of disease.
  • Computational Biophysics
  • Atomic scale protein modeling
  • Sampling (Conformational)
  • Functional motion and kinetics
  • PIN1 WW Domain
  • Mechanisms tied to cancer, Huntington's disease,
    and Alzheimer's disease
  • Dynamics important to recognition specificity

PDB 1I8G Rendered by VMD
3
Challenge
  • Systematic analysis of 3N configuration space and
    6N phase space is intractable.
  • Method Limitations
  • Molecular Dynamics
  • Step size (fs)
  • Monte Carlo Methods
  • Trial move acceptance
  • Both
  • Computational complexity of non-bonded forces
  • Rough energy landscape localizes sampling

4
Contributions
  • Parallel Algorithms
  • Reduce mean time to discovery by mapping
    computation to multiple processors
  • Efficient and scalable Monte Carlo algorithm
  • Distributed Systems
  • Maximize utilization of computational
    infrastructure through collaborative computing
  • Simulation framework of autonomous heterogeneous
    resources for high throughput data generation
  • Simulation of the PIN1 WW
  • Revealed reaction coordinates important to the
    dynamics of this target protein (through
    application of the algorithms and distributed
    systems)

5
Parallel Algorithm Motivation
  • Exploit Computational Architectures to Reduce
    Mean Time to Discovery
  • Improve Efficiency of Scale
  • Accelerate/Improve Sampling
  • Ascertain functional conformations
  • Mine data set for reaction coordinate
    correlations
  • Obtain free energy estimates based on
    probability distributions
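
The last bullet, free energy estimates from probability distributions, can be illustrated with the standard Boltzmann inversion F(x) = -k_B T ln P(x). The sketch below is a minimal example, not the method used in the dissertation: the function name `free_energy_profile`, the histogram counts, and the choice of kcal/mol units are all assumptions made for illustration.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); unit choice is an assumption


def free_energy_profile(counts, temperature):
    """Convert histogram counts along a coordinate into relative free energies.

    Uses F = -k_B * T * ln(P), shifted so the most populated bin is zero.
    Empty bins get an infinite free energy because they were never sampled.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one sample")
    kt = K_B * temperature
    energies = [-kt * math.log(c / total) if c > 0 else math.inf for c in counts]
    lowest = min(energies)
    return [e - lowest for e in energies]


if __name__ == "__main__":
    # Hypothetical dihedral-angle histogram with three bins.
    for value in free_energy_profile([100, 50, 0], temperature=300.0):
        print(value)
```

Because the estimate depends only on relative bin populations, better sampling of low-population regions directly tightens the free energy estimate there.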

Blocked Alanine Dipeptide
6
Replica Exchange Method
  • Also known as parallel tempering
  • Multicanonical Monte Carlo sampling
  • Parallel simulations (replicas) over temperature
    range
  • Transfer high temperature conformations to target
  • Can be formalized as macro and micro states
  • Highly parallel but limits to scalability
  • Exchange inversely proportional to √N
  • Increases number of replicas
  • Increases average time for transfer to target

7
Replica Exchange Method
  • Sampling Criterion
  • Must maintain detailed balance
  • Metropolis exchange criterion
  • Traditional generation term
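
The Metropolis exchange criterion named above can be sketched with the textbook replica exchange acceptance rule, P = min(1, exp[(β_i − β_j)(E_i − E_j)]) with β = 1/(k_B T). This is the standard form, not a reconstruction of the equation that appeared on the slide; the function names and the kcal/mol unit choice are assumptions.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K); unit choice is an assumption


def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis probability of swapping the conformations of replicas i and j."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # A non-negative exponent means the swap is always accepted.
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(energy_i, temp_i, energy_j, temp_j, rng):
    """Return True if the swap is accepted for a uniform draw from rng."""
    return rng.random() < exchange_probability(energy_i, temp_i, energy_j, temp_j)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical potential energies (kcal/mol) at 300 K and 320 K.
    print(exchange_probability(-100.0, 300.0, -90.0, 320.0))
    print(attempt_exchange(-100.0, 300.0, -90.0, 320.0, rng))
```

Accepting with this probability keeps the joint Boltzmann distribution over all replicas stationary, which is the detailed balance requirement listed above.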

8
Efficient All Pairs Exchange
  • Novel exchange accelerates sampling
  • Improved efficiency estimate

Given
9
Theoretical
  • Conformational Transfer Analysis
  • Given a set Pacc of 20%
  • Calculate average number of steps for transfer
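
To make the idea of a transfer-step calculation concrete, the sketch below uses a deliberately simplified model that is an assumption, not the analysis from the presentation. A conformation sits on a ladder of K temperature levels, each exchange attempt pairs it with the level above or below with equal chance, and the swap is accepted with a fixed probability `p_acc` (0.20 is used in the example, reading the slide's Pacc value as 20 percent). The expected number of attempts to travel from the highest level to the target level then follows from a first-passage recurrence.

```python
def mean_steps_to_target(num_replicas, p_acc):
    """Expected exchange attempts to move from the top level to level 0.

    Model (an illustrative assumption): nearest-neighbor swaps only, each
    direction proposed with probability 1/2 and accepted with p_acc; a
    proposal above the top level is a no-op.
    """
    if num_replicas < 1 or not 0.0 < p_acc <= 1.0:
        raise ValueError("need num_replicas >= 1 and 0 < p_acc <= 1")
    q = p_acc / 2.0  # probability of actually moving one level down (or up)
    total = 0.0
    increment = 0.0
    # The expected time to descend one level grows by 1/q for each level
    # further from the reflecting top of the ladder.
    for _ in range(num_replicas - 1):
        increment += 1.0 / q
        total += increment
    return total


if __name__ == "__main__":
    for k in (4, 8, 16):
        print(k, mean_steps_to_target(k, 0.20))
```

In this model the result equals K(K − 1)/p_acc, so the transfer time grows roughly quadratically with the replica count, which is consistent with the scalability limit noted for the Replica Exchange Method.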

10
Experimental
  • PE and Ergodic Measure
  • Ramachandran

SNN RMSD 0.880
Baseline
APE RMSD 0.663
11
Experimental
  • Clustering
  • RMSD metric based on correlation of conformation
    geometry (specifically the protein backbone
    atoms)
  • Near neighbors calculated according to cutoff
    RMSD value. Iteratively remove largest remaining
    cluster.
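
The clustering procedure described above (count neighbors within a cutoff, then repeatedly remove the largest remaining cluster) can be sketched as follows. The function name `cluster_by_cutoff` is hypothetical, and the example uses distances between points on a line as a stand-in for the pairwise backbone RMSD matrix, which would have to be computed separately after structural alignment.

```python
def cluster_by_cutoff(dist, cutoff):
    """Greedy neighbor-count clustering on a symmetric distance matrix.

    Returns a list of (center_index, member_indices) tuples, largest
    remaining cluster first. Ties go to the lowest index.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        best_center, best_members = None, []
        for i in sorted(remaining):
            # Neighbors of i among the conformations not yet assigned.
            members = [j for j in sorted(remaining) if dist[i][j] <= cutoff]
            if len(members) > len(best_members):
                best_center, best_members = i, members
        clusters.append((best_center, best_members))
        remaining -= set(best_members)
    return clusters


if __name__ == "__main__":
    # Toy data: 1D coordinates standing in for conformations.
    points = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
    dist = [[abs(a - b) for b in points] for a in points]
    for center, members in cluster_by_cutoff(dist, cutoff=0.15):
        print(center, members)
```

The center of each cluster is the conformation with the most neighbors, which makes it a natural representative structure for later analysis.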

12
Distributed Systems Motivation
  • Collaborative computing for scientific discovery
  • Maximize utilization and share capital
    expenditures
  • Computational biochemistry
  • High throughput and high resolution trajectory
    generation

13
Distributed Systems Motivation
  • Collaborative computing for scientific discovery
  • Maximize utilization and shared capital
    expenditures
  • Computational biochemistry
  • High throughput and high resolution trajectory
    generation

Uniform Computation and Data Management with Scale
Data Sharing
14
The PINS Framework
  • Autonomous and heterogeneous resources
  • Condor matchmaking
  • GEMS hybrid database/filesystem

15
The Software Stacks
  • PINS
  • GEMS
  • Hybrid database/filesystem

16
Committor Probability Application
  • 500 independent simulations
  • 50,000 records
  • Over 1,000,000 output files
  • Performance

17
Tradeoffs and Challenges
  • Efficiency
  • Completion time is greater than computation time
  • Large variation in resource evictions
  • Scale limitations with a centralized GEMS server
  • Large variation in CPUs (utilization vs speed)
  • Checkpointing frequency
  • Computation efficiency is proportional
  • PINS/GEMS overhead inversely proportional
  • Additional scripting for fault tolerance
  • Automatic recovery is currently not a generic
    framework feature

18
WW Domain Simulation Motivation
  • NMR results from the Peng lab show WW domain
    dynamics play a significant role in recognition
    specificity
  • Use simulation to reproduce and more thoroughly
    identify functional dynamics
  • Correlation complicated by the disjoint nature of
    accessible observables

PIN1 WW µs-ms (a) and ps-ns (b) mobility from
NMR. Peng et al. 2007
19
WW Simulation Setup/Protocol
  • Initial conformations from Protein Data Bank
  • Unbound/APO: PDB ID 1i6c
  • Complex with Cdc25: PDB ID 1i8g
  • Explicitly solvated with TIP3 water molecules
  • Canonical ensemble
  • REM algorithm
  • 278K target temp
  • Periodic BC
  • Particle Mesh Ewald
  • 1fs time step

20
Backbone and ARG12 Mobility
  • STDev per dihedral vs NMR loop mobility
  • ARG12 Behavior
  • One dominant state A
  • Low population path to state C
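
A per-dihedral standard deviation needs some care because angles wrap around at ±180 degrees. The sketch below shows one common way to handle this, the circular standard deviation sqrt(−2 ln R), where R is the mean resultant length of the unit vectors. Whether the presented results used this definition is not stated on the slide, so treat it as an assumption; the function name is hypothetical.

```python
import math


def circular_std_degrees(angles_deg):
    """Circular standard deviation of dihedral angles, returned in degrees."""
    if not angles_deg:
        raise ValueError("need at least one angle")
    n = len(angles_deg)
    mean_cos = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    mean_sin = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    # Clamp to 1.0 to guard against floating-point overshoot.
    r = min(1.0, math.hypot(mean_cos, mean_sin))
    if r == 0.0:
        return math.inf  # angles are spread uniformly around the circle
    return math.degrees(math.sqrt(-2.0 * math.log(r)))


if __name__ == "__main__":
    # Two samples that straddle the +/-180 degree boundary.
    print(circular_std_degrees([179.0, -179.0]))
```

A naive standard deviation would report a spread of about 179 degrees for the two samples in the example, even though they are only 2 degrees apart on the circle.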

Jeff
Loop 1
Loop 2
S2
21
Cluster Analysis
  • Backbone clustering 1,000 conformations
  • Separation into 5 representative clusters
  • ARG12 dihedral angles for 5 central conformations
    fall within path from state A to C
  • Slated for chemical shift analysis

22
Hydrogen Bonding Analysis
  • SER11 active donor to SER13 and ARG16

23
Committor Probability Analysis
  • ARG12 Behavior
  • Major separation based on Hbond
  • 2D Ramachandran projection
    not sufficiently unique
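
The committor probability of a conformation is the fraction of trajectories started from it that reach the product state before the reactant state. The sketch below estimates it for a toy one-dimensional random walk; the walk, the state boundaries, and the function name are illustrative assumptions and have nothing to do with the molecular dynamics trajectories used for the WW domain.

```python
import random


def estimate_committor(start, trials, rng, left=0, right=10):
    """Fraction of unbiased random walks from `start` that hit `right` first.

    `left` plays the role of the reactant state and `right` the product state.
    """
    hits = 0
    for _ in range(trials):
        x = start
        while left < x < right:
            x += rng.choice((-1, 1))
        if x == right:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for start in (2, 5, 8):
        print(start, estimate_committor(start, 2000, rng))
```

A starting point with an estimate near 0.5 behaves like a transition state. In the protein case, each shooting point requires many independent simulations, which is why this analysis generates such a large number of runs and output files.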

24
Summary of Simulation Results
  • Backbone dihedral STDev concurs with loop residue
    mobility obtained from NMR experiments
  • Cluster analysis indicates correlation between
    macro structure and ARG12 behavior
  • Hydrogen bonding analysis indicates the SER11
    residue is important to loop dynamics
  • Committor probability calculations demonstrate
    that ARG12 behavior is correlated to the
    SER11-ARG16 Hbond

25
Summary of Contributions
  • Parallel Algorithms
  • New All Pairs Exchange REM algorithm
  • > 4-fold speedup in traversal for replica counts
    8
  • 100% sampling improvement in PE and Ergodic
    Metrics
  • Maintains detailed balance with no new
    parameters/heuristics
  • Distributed Systems
  • New PINS framework
  • High throughput data generation and analysis
  • Novel data access and sharing
  • Simulation of the PIN1 WW
  • Revealed components of the multivariate reaction
    coordinate affecting the PIN1 WW ARG12 dynamics.

26
Future Work
  • Parallel Algorithms
  • Multiple Switch All Pairs Exchange REM
  • At each exchange decision interval evaluate all
    pairs and allow for K/2 simultaneous exchanges
  • Distributed Systems
  • PINS at larger scale with a higher data/compute ratio
  • Multi-institutional grid
  • Simulations with implicit solvation and normal
    modes
  • Folding@Home
  • Utilization, comparative analysis, collaborative
    development
  • Grid Heating
  • Capture thermal output of computational
    resources. Transform cooling expenditures into
    facility heating benefit.

27
Future Work
  • Simulation of the PIN1 WW
  • Reaction coordinate correlation (RCC) tool
  • Data mine samples to identify observables
    correlated to target
  • Chemical shift analysis
  • Compare shift differences in primary
    conformations with NMR
  • Rate computation
  • Given multivariate reaction coordinate estimate
    rates
  • Reduced models to reach timescales of millisecond
    motion
  • SCPISM implicit solvation
  • Normal mode constrained/damped dynamics
  • REM with implicit solvation and/or normal modes

28
Publications
  • 3 Journal and 6 Conference papers
  • Journal
  • Brenner, P., Wozniak, J. M., Thain, D., Striegel,
    A., Peng, J. W., and Izaguirre, J. A., Biomolecular
    Committor Probability Calculation Enabled by
    Processing in Network Storage, Journal of
    Parallel Computing - Submitted as an Invited
    Paper, 2007
  • Brenner, P., Sweet, C. R., VonHandorf, D., and
    Izaguirre, J. A., Accelerating the Replica
    Exchange Method Through an Efficient All-pairs
    Exchange, Journal of Chemical Physics, 2007
  • Wozniak, J. M., Brenner, P., Thain, D., Striegel,
    A., and Izaguirre, J. A., Making the Best of a Bad
    Situation: Prioritized Storage Management in
    GEMS, Journal of Future Generation Computer
    Systems, 2007

29
Publications
  • Conference
  • Brenner, P., Wozniak, J. M., Thain, D., Striegel,
    A., and Izaguirre, J. A., Biomolecular Path Sampling
    Enabled by Processing in Network Storage, Sixth
    IEEE International Workshop on High-Performance
    Computational Biology, 2007
  • Wozniak, J. M., Brenner, P., Thain, D., Striegel,
    A., and Izaguirre, J., Applying Feedback Control to
    a Replica Management System, Proc. Southeastern
    Symposium on System Theory, 2006
  • Wozniak, J. M., Brenner, P., Thain, D., Striegel,
    A., and Izaguirre, J. A., Access Control for a
    Replica Management Database, Workshop on Storage
    Security and Survivability, 2006
  • Thain, D., Klous, S., Wozniak, J., Brenner, P.,
    Striegel, A., and Izaguirre, J., Separating
    Abstractions from Resources in a Tactical Storage
    System, Supercomputing, Oct 2005
  • Wozniak, J. M., Brenner, P., Thain, D., Striegel,
    A., and Izaguirre, J. A., Generosity and Gluttony in
    GEMS: Grid Enabled Molecular Simulations, Proc.
    of 14th IEEE International Symposium on High
    Performance Distributed Computing, 2005
  • Hampton, S., Brenner, P., Wenger, A., Chatterjee,
    S., and Izaguirre, J. A., Biomolecular Sampling
    Algorithms, Test Molecules, and Metrics, pp.
    103-123, Vol. 49, Lecture Notes in Computational
    Science and Engineering (LNCSE), Springer Verlag,
    2005

30
Acknowledgements
  • Advisor
  • Dr. Jesús A. Izaguirre
  • Committee
  • Dr. Peng, Dr. Striegel, and Dr. Thain
  • Collaborators
  • Distributed Systems: Mr. Wozniak
  • Biochemistry: Dr. Tao Peng, Mr. Namanja, Mr.
    Zintsmaster
  • Research Group
  • Dr. Sweet, Dr. Hampton, Dr. Huang, Mr. Cickovski,
  • Mr. Chatterjee, Mr. Morcos González
  • Funding
  • NSF Grant DBI-0450067
  • NSF Grant CCF-0135195

31
  • Questions

32
Chemical Shifts (Preliminary)
Loop Align
Backbone Align
33
Conformations for Clusters