Title: Fault tolerant parallelization of time in molecular dynamics simulations
1Fault tolerant parallelization of time in
molecular dynamics simulations
- Ashok Srinivasan
- Computer Science
- Florida State University
- Namas Chandra
- Mechanical Engineering
- Florida State University
Aim: Long time scales on small physical systems, on massively parallel machines
Solution features: Latency and fault tolerance are related, through independence of tasks
2Outline
- Application
- Background Parareal scheme
- Time parallelization with guided simulations
- Parallelization of a model problem
- Parallelization of molecular dynamics simulations on a Carbon nanotube
- Conclusions and future work
3Applications
- Small physical systems for long time scales
- Class of applications considered
- $\mathrm{State}(T_i) = F(\mathrm{State}(T_{i-1}))$
- Inherently sequential
- Example
- Molecular dynamics simulations of Carbon nanotubes
- Time step size: $10^{-15}$ second
- After a million steps, we are still only in the nanosecond range
- Even that requires about a day of sequential computing time for around 3000 atoms
- Spatial parallelization will lead to too fine a granularity
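A minimal sketch of why this class of applications is inherently sequential, and of the time-scale arithmetic above. The names `advance` and `step` are hypothetical stand-ins for the MD integrator, not part of the original work.

```python
def advance(state, step, n_steps):
    """Apply State(T_i) = F(State(T_{i-1})) repeatedly.

    Each step needs the result of the previous one, so the loop
    cannot be split across processors without some prediction.
    """
    for _ in range(n_steps):
        state = step(state)
    return state


TIME_STEP_S = 1e-15   # time step size from the slide
N_STEPS = 1_000_000   # "a million steps"

# Total simulated physical time: 1e6 * 1e-15 s = 1e-9 s (one nanosecond)
simulated_time_s = TIME_STEP_S * N_STEPS
```

Here `advance(0, lambda s: s + 1, 5)` returns 5, which only illustrates the step-after-step dependency.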
4Background
- Parareal scheme (Baffico et al.)
- Based on an approximate-verify-correct sequence
- Notation
- r = Exact time / Approx. time
- P = # of procs
- k = # iterations to convergence
- Speedup
- $(1/k)\, P r / (P + r)$
- Ignoring communication cost
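A small sketch for evaluating the parareal speedup, reading the expression on this slide as (1/k) P r / (P + r). That reading is an assumption, since the operators are missing in the extracted text. The function name is hypothetical.

```python
def parareal_speedup(P, r, k):
    """Parareal speedup, ignoring communication cost.

    P: number of processors
    r: exact time / approximate time
    k: number of iterations to convergence
    Assumed form: (1/k) * P * r / (P + r).
    """
    return (P * r) / (k * (P + r))
```

Under this form the speedup stays below both P/k and r/k, so the efficiency (speedup divided by P) cannot exceed 1/k.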
5Limitations of the parareal approach
- Speedups obtained (ignoring communication)
- Toy MD problem with r = 1000
- Different time step size for approximation
- Speedup 130, efficiency 1.3%
- Value of r is also not realistic; adaptive sequential computation may be as effective
- Toy MD problem with different model for approximation
- Speedup 8, efficiency 25%
- Quantum control
- Speedup 14, efficiency 1.5%
- Limiting factors
- Sequential component
- Methods for approximation unlikely to bring major
improvements
6Time parallelization with guided simulations
- Based on a predict-verify approach
- Use results of old simulations to speed up the current simulation
- Relationship between different problem parameters often occurs in engineering
- Example: Temperature and time, stress and time
- Find a relationship and use it to predict the state at different times
- The relationship is determined automatically, and updated dynamically
7Guided simulations: Latency tolerance
- Notation
- s = Exact time / Prediction time
- P = # of procs
- l = Error rate
- Speedup
- $(1/l)\, P s / (1 + s)$
- Ignoring communication cost
- $\approx P/l$
- If prediction cost is relatively small
- Note: $l < k$
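A small sketch for evaluating the guided-simulation speedup, reading the expression on this slide as (1/l) P s / (1 + s). That reading is an assumption, since the operators are missing in the extracted text. The function name is hypothetical.

```python
def guided_speedup(P, s, l):
    """Guided-simulation speedup, ignoring communication cost.

    P: number of processors
    s: exact time / prediction time
    l: the slide's l parameter (appears as 1/l in the speedup)
    Assumed form: (1/l) * P * s / (1 + s).
    """
    return (P * s) / (l * (1 + s))
```

When s is large (prediction is cheap relative to the exact computation), s / (1 + s) approaches 1 and the value approaches P / l.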
8Fault tolerance
- In case of node failure, another processor fills in the missing time interval
- Other computations need not be discarded
- Efficiency close to 1
- For large P
- Excluding loss in efficiency from errors
- If communication cost is negligible
[Figure: Master with processors P1, P2, P3, P4 and time intervals t1, t2, t3, t4]
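The idea that a failed node's interval is simply recomputed elsewhere, while other results are kept, can be sketched as below. This is an illustrative single-process simulation, not the authors' implementation. The function `run_with_failures`, the round-robin assignment, and the `failed` set are all assumptions.

```python
def run_with_failures(intervals, workers, compute, failed):
    """Assign intervals to workers; re-run those lost to failed workers.

    intervals: list of time-interval ids
    workers:   list of worker names
    compute:   function(interval) -> result (stand-in for the exact run)
    failed:    set of worker names that fail (hypothetical failure model)
    Returns {interval: (worker, result)}.
    """
    live = [w for w in workers if w not in failed]
    if not live:
        raise RuntimeError("no live workers")

    results = {}
    pending = []
    # First round: round-robin assignment by the master.
    for i, interval in enumerate(intervals):
        worker = workers[i % len(workers)]
        if worker in failed:
            pending.append(interval)  # result lost, must be redone
        else:
            results[interval] = (worker, compute(interval))

    # Recovery: only the missing intervals are recomputed;
    # results from the other workers are kept as they are.
    for i, interval in enumerate(pending):
        worker = live[i % len(live)]
        results[interval] = (worker, compute(interval))
    return results
```

For example, with intervals t1 to t4 on P1 to P4 and P3 failing, only t3 is recomputed on a surviving processor.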
9Fault tolerance
[Figure: Master with processors P1, P2, P3, P4 and time intervals t2, t5, t6]
10Requirements for this technique
- Method for predicting a state
- Criterion for determining whether two states (predicted and actual) are similar
- A means of informing the master about node failure
11Parallelization of a model problem
- $x = x_0 + (x_0/L_0)^a L_0 v t$
- x = current position, L0 = initial length, v = velocity, t = time, x0 = position at time 0, a = a material property
- Experimental parameters
- L0 = 1, Δt = 0.01, 600 points
- Base: a = 1.5, v = 0.05
- Actual: a = 2.0, v = 0.0625
- $x_{pred} = x_{old} + (\Delta x_{old}/\Delta B_{old})\,\Delta B$
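A minimal sketch of the predictor applied to the model problem. It rests on two assumptions that the extracted slide text does not confirm: the model reads x = x0 + (x0/L0)^a L0 v t, and B denotes the position in the base simulation. The function names and the choice x0 = 0.5 are hypothetical.

```python
def position(x0, t, a, v, L0=1.0):
    """Model problem under the assumed reading x = x0 + (x0/L0)^a * L0 * v * t."""
    return x0 + (x0 / L0) ** a * L0 * v * t


def predict(x_old, dx_old, dB_old, dB):
    """x_pred = x_old + (dx_old / dB_old) * dB."""
    return x_old + (dx_old / dB_old) * dB


x0, dt = 0.5, 0.01  # x0 is an arbitrary sample point; dt is from the slide

# Base and actual trajectories at t = 0, dt, 2*dt (parameters from the slide)
base = [position(x0, n * dt, a=1.5, v=0.05) for n in range(3)]
actual = [position(x0, n * dt, a=2.0, v=0.0625) for n in range(3)]

# Predict the actual position at 2*dt from the previous interval's ratio
x_pred = predict(
    actual[1],
    dx_old=actual[1] - actual[0],
    dB_old=base[1] - base[0],
    dB=base[2] - base[1],
)
```

Under this reading both trajectories are linear in t, so `x_pred` matches `actual[2]` up to rounding.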
12Parallelization without node failure
13Speedup without node failure
- Speedup 99.9
- Efficiency 0.999
- Justification for ignoring prediction time and communication costs
- A similar MD computation would take 10 s/time interval (15 s on IBM SP3)
- A reduction on IBM SP3 takes 0.0005 s on 100 processors
- Prediction is at least three orders of magnitude smaller than MD computation
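The figures above can be checked with a few lines of arithmetic. The processor count of 100 is an assumption here, inferred from the speedup and efficiency values rather than stated for this experiment.

```python
P = 100                 # assumed processor count (inferred, not stated)
speedup = 99.9          # from the slide
efficiency = speedup / P            # about 0.999

md_time_s = 10.0        # MD computation per time interval, from the slide
reduction_time_s = 0.0005           # reduction on 100 processors, from the slide

# Communication is a tiny fraction of the computation per interval (5e-5)
comm_fraction = reduction_time_s / md_time_s
```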
14Speedup with node failure
15Parallelization of molecular dynamics simulations
on a Carbon nanotube
- Definition of equivalence of two states
- Atoms vibrate around their mean position
- So consider positions equivalent if the
difference is within the range of motion for the
temperature at which the simulation is being
performed
- Max displacement = 0.211
- Mean displacement = 0.0789
- σ = 0.0426
[Figure labels: Displacement (from mean); Mean position]
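One possible way to express this equivalence criterion in code is sketched below. The function name, the per-atom Euclidean distance test, and the use of a single scalar `tolerance` are assumptions. The slide only says the difference should be within the range of thermal motion.

```python
import math


def states_equivalent(predicted, actual, tolerance):
    """Return True if every atom's predicted position lies within
    `tolerance` of its actual position.

    predicted, actual: sequences of (x, y, z) tuples, same length
    tolerance: allowed distance, e.g. a value tied to the range of
               thermal motion (choice of value is an assumption)
    """
    if len(predicted) != len(actual):
        raise ValueError("states must have the same number of atoms")
    return all(
        math.dist(p, q) <= tolerance for p, q in zip(predicted, actual)
    )
```

For illustration, `states_equivalent([(0, 0, 0)], [(0.1, 0, 0)], 0.211)` is true, while a difference of 0.3 in one coordinate is not accepted at that tolerance.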
16Prediction
- Predictor
- Use predictor of a form similar to the model problem
- But the changes are computed in spherical coordinates
- Instead of using $x_{pred} = x_{old} + (\Delta x_{old}/\Delta B_{old})\,\Delta B$
- Use $x_{pred} = x_{old} + R\,\Delta B$
- where $R = b\,(\Delta x_{old}/\Delta B_{old}) + (1-b)\,R_{old}$
- This prevents R from becoming large due to a single small value of $\Delta B_{old}$ caused by random motion
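A scalar sketch of this smoothed ratio, assuming the update reads R = b (Δx_old/ΔB_old) + (1 - b) R_old. The spherical-coordinate handling is not modeled here, and the function names and the sample value b = 0.2 are illustrative assumptions.

```python
def update_ratio(r_old, dx_old, dB_old, b):
    """Blend the newest ratio with the previous one.

    Assumed form: R = b * (dx_old / dB_old) + (1 - b) * r_old.
    A small b limits how much one unusually small dB_old can inflate R.
    """
    return b * (dx_old / dB_old) + (1.0 - b) * r_old


def predict(x_old, ratio, dB):
    """x_pred = x_old + R * dB."""
    return x_old + ratio * dB


# An outlier: the raw ratio is 0.1 / 0.001 = 100, but with b = 0.2 and
# r_old = 1.0 the smoothed ratio is 0.2 * 100 + 0.8 * 1.0 = 20.8.
smoothed = update_ratio(r_old=1.0, dx_old=0.1, dB_old=0.001, b=0.2)
```

Setting b = 1 recovers the unsmoothed ratio, which is the form used in the model problem.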
17Experimental parameters
- Carbon nanotube with 1000 atoms
- Subjected to a pull out test
- Around 200 atoms in the beginning fixed
- Around 200 atoms at the end moved deterministically
- Time step size: 0.5 femtoseconds
- Time interval per processor: 1000 time steps
- Tersoff-Brenner potential for MD
- 300 K temperature
- f = 0.2
- Base simulation: v = 0.05 Å/1000 time steps
- Actual simulation: v = 0.0625 Å/1000 time steps
18Error and speedup without faults
[Figure: error; labeled value 0.1641 Å]
- Speedup on 10 processors: 9.5
- Good speedup on larger number of processors too
- Loss in efficiency only due to first few sets of iterations
19Maximum error
[Figure: maximum error; labeled value 0.422 Å]
20Limitations of the experiments
- They are simulations of a parallel implementation
- But large difference between computation and communication time suggests efficient implementation
- Positions alone do not define the state
- Velocities and Energy too are needed
- Velocities can be handled through standard techniques
- Pre-computed MD results were used to initialize the states
- Other types of experiments too should be performed
- Use of higher temperature, smaller time as base
21Conclusions and future work
- Conclusions
- Promises significant improvement in speedup and efficiency for long-time simulations, through latency tolerance and fault tolerance
- Future work
- Better predictor
- First predict mean positions, and perturb based on a probability distribution
- Reduce information needed for prediction
- Better definition of the equivalence of states
- Include velocity and energy
- Actual implementation on a parallel machine
- Etc ...