Title: Parallel Algorithms and Distributed Systems for Computational Biophysics
1Parallel Algorithms and Distributed Systems for Computational Biophysics
Paul R. Brenner
Jesús A. Izaguirre, Advisor
- Department of Computer Science and Engineering
- University of Notre Dame
Dissertation Defense July 2007
2Motivation
- Discovery of biophysical mechanisms via
simulation accelerates the understanding and
treatment of disease.
- Computational Biophysics
- Atomic scale protein modeling
- Sampling (Conformational)
- Functional motion and kinetics
- PIN1 WW Domain
- Mechanisms tied to cancer, Huntington's, and Alzheimer's disease
- Dynamics important to recognition specificity
PDB 1I8G Rendered by VMD
3Challenge
- Systematic analysis of 3N configuration space and
6N phase space is intractable.
- Method Limitations
- Molecular Dynamics
- Step size (fs)
- Monte Carlo Methods
- Trial move acceptance
- Both
- Computational complexity of non-bonded forces
- Rough energy landscape localizes sampling
4Contributions
- Parallel Algorithms
- Reduce mean time to discovery by mapping computation to multiple processors
- Efficient and scalable Monte Carlo algorithm
- Distributed Systems
- Maximize utilization of computational infrastructure through collaborative computing
- Simulation framework of autonomous heterogeneous resources for high throughput data generation
- Simulation of the PIN1 WW
- Revealed reaction coordinates important to the
dynamics of this target protein (through
application of the algorithms and distributed
systems)
5Parallel Algorithm Motivation
- Exploit Computational Architectures to Reduce Mean Time to Discovery
- Improve Efficiency of Scale
- Accelerate/Improve Sampling
- Ascertain functional conformations
- Mine data set for reaction coordinate correlations
- Obtain free energy estimates based on probability distributions
Blocked Alanine Dipeptide
6Replica Exchange Method
- Also known as parallel tempering
- Multicanonical Monte Carlo sampling
- Parallel simulations (replicas) over temperature range
- Transfer high temperature conformations to target
- Can be formalized as macro and micro states
- Highly parallel but limits to scalability
- Exchange inversely proportional to √N
- Increases number of replicas
- Increases average time for transfer to target
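The slide does not state how the replica temperatures are chosen. As a minimal sketch, assuming geometric spacing (a common choice, not necessarily the one used in this work) and hypothetical values for the top temperature and replica count, a temperature ladder could be generated like this. The function name `geometric_ladder` is illustrative only.

```python
def geometric_ladder(t_min, t_max, n_replicas):
    """Return n_replicas temperatures spaced geometrically from t_min to t_max.

    Geometric spacing is an assumption made for this sketch.
    """
    if n_replicas < 2:
        raise ValueError("need at least two replicas")
    if not 0 < t_min < t_max:
        raise ValueError("require 0 < t_min < t_max")
    # Constant ratio between neighboring temperatures
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]


# Hypothetical example: 8 replicas between 278 K and 400 K
ladder = geometric_ladder(278.0, 400.0, 8)
```

Adding replicas narrows the gap between neighboring temperatures, which raises pairwise exchange acceptance but lengthens the path a conformation must travel to reach the target temperature.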
7Replica Exchange Method
- Sampling Criterion
- Must maintain detailed balance
- Metropolis exchange criterion
- Traditional generation term
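The Metropolis exchange criterion named above can be sketched as follows. This is the standard textbook form for swapping two replicas in temperature replica exchange, not code from this work; the function names, the energy units (kcal/mol), and the example values are assumptions.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def exchange_probability(e_i, e_j, t_i, t_j, k_b=K_B):
    """Metropolis acceptance probability for swapping replicas i and j.

    e_i, e_j: potential energies of the conformations currently held
    at temperatures t_i and t_j (assumed units: kcal/mol and K).
    """
    beta_i = 1.0 / (k_b * t_i)
    beta_j = 1.0 / (k_b * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    # min(1, exp(delta)), written to avoid overflow for large positive delta
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_exchange(e_i, e_j, t_i, t_j, rng=random):
    """Return True if the proposed swap is accepted."""
    return rng.random() < exchange_probability(e_i, e_j, t_i, t_j)
```

A swap that moves the lower-energy conformation to the lower temperature is always accepted; the reverse swap is accepted with a probability that decays exponentially in the energy difference.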
8Efficient All Pairs Exchange
- Novel exchange accelerates sampling
- Improved efficiency estimate
Given
9Theoretical
- Conformational Transfer Analysis
- Given a set Pacc 20
- Calculate average number of steps for transfer
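One way to picture the transfer analysis is as a random walk over temperature levels. The sketch below is a simplified model, not the analysis used in this work: it assumes nearest-neighbor moves only, and a single hypothetical probability `p_move` that a conformation steps to each adjacent level per exchange attempt.

```python
def mean_transfer_steps(n_replicas, p_move):
    """Expected exchange attempts to walk from the top replica to replica 0.

    Model (assumption): at every attempt the conformation moves one level
    down with probability p_move, one level up with probability p_move
    (not possible at the top level), and otherwise stays in place.
    """
    if n_replicas < 2:
        raise ValueError("need at least two replicas")
    if not 0.0 < p_move <= 0.5:
        raise ValueError("require 0 < p_move <= 0.5")
    # d is the expected extra time between adjacent levels, h[i] - h[i-1].
    # At the top level only a downward move is possible, so d = 1 / p_move.
    d = 1.0 / p_move
    total = d
    # Work downward: h[i] - h[i-1] = (1 + p_move * (h[i+1] - h[i])) / p_move
    for _ in range(n_replicas - 2):
        d = (1.0 + p_move * d) / p_move
        total += d
    return total
```

Under this model the expected transfer time grows roughly with the square of the replica count, which illustrates why adding replicas increases the average time for transfer to the target.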
10Experimental
- PE and Ergodic Measure
- Ramachandran
SNN RMSD 0.880
Baseline
APE RMSD 0.663
11Experimental
- Clustering
- RMSD metric based on correlation of conformation geometry (specifically the protein backbone atoms)
- Near neighbors calculated according to cutoff RMSD value. Iteratively remove largest remaining cluster.
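The clustering procedure described above can be sketched as follows. This is a minimal illustration that assumes a precomputed symmetric matrix of pairwise backbone RMSD values; the function name and the tie-breaking rule (lowest index wins) are assumptions.

```python
def cluster_by_cutoff(dist, cutoff):
    """Greedy neighbor-count clustering on a pairwise distance matrix.

    dist: square symmetric matrix (list of lists) of pairwise RMSD values.
    cutoff: non-negative RMSD threshold below which two conformations
    count as neighbors.
    Returns a list of (center_index, member_indices), largest cluster first.
    """
    remaining = set(range(len(dist)))
    clusters = []
    while remaining:
        best_center, best_members = None, []
        for i in sorted(remaining):
            # Neighbors of i among conformations not yet assigned
            members = [j for j in sorted(remaining) if dist[i][j] <= cutoff]
            if len(members) > len(best_members):
                best_center, best_members = i, members
        clusters.append((best_center, best_members))
        # Remove the largest remaining cluster, then repeat
        remaining -= set(best_members)
    return clusters
```

Each center is the conformation with the most neighbors at the time its cluster is removed, so it can serve as the representative structure for that cluster.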
12Distributed Systems Motivation
- Collaborative computing for scientific discovery
- Maximize utilization and share capital expenditures
- Computational biochemistry
- High throughput and high resolution trajectory
generation
13Distributed Systems Motivation
Uniform Computation and Data Management with Scale
Data Sharing
14The PINS Framework
- Autonomous and heterogeneous resources
- Condor matchmaking
- GEMS hybrid database/filesystem
15The Software Stacks
- GEMS
- Hybrid database/filesystem
16Committor Probability Application
- 500 independent simulations
- 50,000 records
- Over 1,000,000 output files
- Performance
17Tradeoffs and Challenges
- Efficiency
- Completion time is > computation time
- Large variation in resource evictions
- Scale limitations with a centralized GEMS server
- Large variation in CPUs (utilization vs speed)
- Checkpointing frequency
- Computation efficiency is proportional
- PINS/GEMS overhead inversely proportional
- Additional scripting for fault tolerance
- Automatic recovery is currently not a generic
framework feature
18WW Domain Simulation Motivation
- NMR results from the Peng lab show WW domain
dynamics play a significant role in recognition
specificity
- Use simulation to reproduce and more thoroughly identify functional dynamics
- Correlation complicated by the disjoint nature of
accessible observables
PIN1 WW µs-ms (a) and ps-ns (b) mobility from
NMR. Peng et al. 2007
19WW Simulation Setup/Protocol
- Initial conformations from Protein Data Bank
- Unbound/APO PDB ID 1i6c
- Complex with Cdc25 PDB ID 1i8g
- Explicitly solvated with TIP3 water molecules
- Canonical ensemble
- REM algorithm
- 278K target temp
- Periodic BC
- Particle Mesh Ewald
- 1fs time step
20Backbone and ARG12 Mobility
- STDev per dihedral vs NMR loop mobility
- ARG12 Behavior
- One dominant state A
- Low population path to state C
Jeff
Loop 1
Loop 2
S2
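Dihedral angles are periodic, so a standard deviation per dihedral needs care near the ±180° boundary. The slides do not say how the STDev was computed; the sketch below shows one common option, the circular standard deviation, and is an assumption rather than the method used here.

```python
import math


def circular_std_deg(angles_deg):
    """Circular standard deviation (in degrees) of a list of angles in degrees."""
    if not angles_deg:
        raise ValueError("need at least one angle")
    n = len(angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(a)) for a in angles_deg) / n
    # Mean resultant length, clamped to guard against rounding above 1
    r = min(1.0, math.hypot(c, s))
    if r == 0.0:
        return math.inf  # angles spread uniformly; no preferred direction
    return math.degrees(math.sqrt(-2.0 * math.log(r)))
```

For two samples at 170° and -170° this gives a spread of about 10°, whereas a naive linear standard deviation would report 170°.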
21Cluster Analysis
- Backbone clustering 1,000 conformations
- Separation into 5 representative clusters
- ARG12 dihedral angles for 5 central conformations
fall within path from state A to C
- Slated for chemical shift analysis
22Hydrogen Bonding Analysis
- SER11 active donor to SER13 and ARG16
23Committor Probability Analysis
- ARG12 Behavior
- Major separation based on Hbond
- 2D Ramachandran projection not sufficiently unique
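A committor probability is conventionally estimated by launching many independent trajectories from the same configuration and counting the fraction that reach the target state first. The sketch below assumes each trajectory has already been labeled with the state it committed to (for example "A" or "C" for the ARG12 states); the function name and labels are illustrative.

```python
import math


def committor_estimate(outcomes, target="C"):
    """Estimate the committor and its binomial standard error.

    outcomes: list of state labels, one per independent trajectory,
    naming the state each trajectory reached first.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one trajectory outcome")
    hits = sum(1 for o in outcomes if o == target)
    p = hits / n
    # Standard error of a binomial proportion
    stderr = math.sqrt(p * (1.0 - p) / n)
    return p, stderr
```

The standard error shrinks as one over the square root of the number of trajectories, which is why high throughput generation of many short independent simulations suits this calculation.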
24Summary of Simulation Results
- Backbone dihedral STDev concurs with loop residue mobility obtained from NMR experiments
- Cluster analysis indicates correlation between macro structure and ARG12 behavior
- Hydrogen bonding analysis indicates the SER11 residue is important to loop dynamics
- Committor probability calculations demonstrate that ARG12 behavior is correlated to the SER11-ARG16 Hbond
25Summary of Contributions
- Parallel Algorithms
- New All Pairs Exchange REM algorithm
- > 4 fold speedup in traversal for replica counts 8
- 100 sampling improvement in PE and Ergodic Metrics
- Maintains detailed balance with no new parameters/heuristics
- Distributed Systems
- New PINS framework
- High throughput data generation and analysis
- Novel data access and sharing
- Simulation of the PIN1 WW
- Revealed components of the multivariate reaction
coordinate affecting the PIN1 WW ARG12 dynamics.
26Future Work
- Parallel Algorithms
- Multiple Switch All Pairs Exchange REM
- At each exchange decision interval evaluate all pairs and allow for K/2 simultaneous exchanges
- Distributed Systems
- PINS on larger scale higher data/compute ratio
- Multi-institutional grid
- Simulations with implicit solvation and normal modes
- Folding@Home
- Utilization, comparative analysis, collaborative development
- Grid Heating
- Capture thermal output of computational resources. Transform cooling expenditures into facility heating benefit.
27Future Work
- Simulation of the PIN1 WW
- Reaction coordinate correlation (RCC) tool
- Data mine samples to identify observables correlated to target
- Chemical shift analysis
- Compare shift differences in primary conformations with NMR
- Rate computation
- Given multivariate reaction coordinate, estimate rates
- Reduced models to reach timescales of millisecond motion
- SCPISM implicit solvation
- Normal mode constrained/damped dynamics
- REM with implicit solvation and/or normal modes
28Publications
- 3 Journal and 6 Conference papers
- Journal
- Brenner, P., Wozniak, J. M., Thain, D., Striegel, A., Peng, J. W., and Izaguirre, J. A., Biomolecular Committor Probability Calculation Enabled by Processing in Network Storage, Journal of Parallel Computing, Submitted as an Invited Paper, 2007
- Brenner, P., Sweet, C. R., VonHandorf, D., and Izaguirre, J. A., Accelerating the Replica Exchange Method Through an Efficient All-pairs Exchange, Journal of Chemical Physics, 2007
- Wozniak, J. M., Brenner, P., Thain, D., Striegel, A., and Izaguirre, J. A., Making the Best of a Bad Situation: Prioritized Storage Management in GEMS, Journal of Future Generation Computer Systems, 2007
29Publications
- Conference
- Brenner, P., Wozniak, J. M., Thain, D., Striegel, A., and Izaguirre, J. A., Biomolecular Path Sampling Enabled by Processing in Network Storage, Sixth IEEE International Workshop on High-Performance Computational Biology, 2007
- Wozniak, J. M., Brenner, P., Thain, D., Striegel, A., and Izaguirre, J., Applying Feedback Control to a Replica Management System, Proc. Southeastern Symposium on System Theory, 2006
- Wozniak, J. M., Brenner, P., Thain, D., Striegel, A., and Izaguirre, J. A., Access Control for a Replica Management Database, Workshop on Storage Security and Survivability, 2006
- Thain, D., Klous, S., Wozniak, J., Brenner, P., Striegel, A., and Izaguirre, J., Separating Abstractions from Resources in a Tactical Storage System, Supercomputing, Oct 2005
- Wozniak, J. M., Brenner, P., Thain, D., Striegel, A., and Izaguirre, J. A., Generosity and Gluttony in GEMS: Grid Enabled Molecular Simulations, Proc. of 14th IEEE International Symposium on High Performance Distributed Computing, 2005
- Hampton, S., Brenner, P., Wenger, A., Chatterjee, S., and Izaguirre, J. A., Biomolecular Sampling: Algorithms, Test Molecules, and Metrics, pp. 103-123, Vol. 49, Lecture Notes in Computational Science and Engineering (LNCSE), Springer Verlag, 2005
30Acknowledgements
- Advisor
- Dr. Jesús A. Izaguirre
- Committee
- Dr. Peng, Dr. Striegel, and Dr. Thain
- Collaborators
- Distributed Systems: Mr. Wozniak
- Biochemistry: Dr. Tao Peng, Mr. Namanja, Mr. Zintsmaster
- Research Group
- Dr. Sweet, Dr. Hampton, Dr. Huang, Mr. Cickovski, Mr. Chatterjee, Mr. Morcos González
- Funding
- NSF Grant DBI-0450067
- NSF Grant CCF-0135195
32Chemical Shifts (Preliminary)
Loop Align
Backbone Align
33Conformations for Clusters