Fault tolerant parallelization of time in molecular dynamics simulations

Slides: 22

Provided by: asri9

Learn more at: http://www.cs.fsu.edu

Transcript and Presenter's Notes

Title: Fault tolerant parallelization of time in molecular dynamics simulations

1
Fault tolerant parallelization of time in
molecular dynamics simulations

Ashok Srinivasan
Computer Science
Florida State University

Namas Chandra
Mechanical Engineering
Florida State University

Aim: Long time scales on small physical systems,
on massively parallel machines
Solution features: Latency and fault tolerance are
related, through independence of tasks
2
Outline

Application
Background: Parareal scheme
Time parallelization with guided simulations
Parallelization of a model problem
Parallelization of molecular dynamics simulations on a Carbon nanotube
Conclusions and future work

3
Applications

Small physical systems for long time scales
Class of applications considered
State(T_i) = F(State(T_{i-1}))
Inherently sequential
Example
Molecular dynamics simulations of Carbon
nanotubes
Time step size 10^-15 second
After a million steps, we are still only in the
nanosecond range
Even that requires about a day of sequential
computing time for around 3000 atoms
Spatial parallelization will lead to too fine a
granularity
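
The sequential dependence and the time-scale arithmetic on this slide can be illustrated with a short sketch. The function names and the `step` callable are hypothetical, introduced only for illustration; the 1 fs step size follows the slide's 10^-15 second figure.

```python
FEMTOSECOND = 1e-15  # seconds


def run_sequential(step, state, n_steps):
    """Apply State(T_i) = F(State(T_{i-1})) n_steps times.

    Each step needs the result of the previous one, so the loop
    cannot be split across processors as written.
    """
    for _ in range(n_steps):
        state = step(state)
    return state


def simulated_time(n_steps, dt=FEMTOSECOND):
    """Physical time (in seconds) covered by n_steps steps of size dt."""
    return n_steps * dt


# A million steps of 1 fs each cover only about a nanosecond.
print(simulated_time(1_000_000))
```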

4
Background

Parareal scheme (Baffico et al.)
Based on an approximate-verify-correct sequence

Notation
r = Exact time / Approx. time
P = # of procs
k = # of iterations to convergence
Speedup
(1/k) Pr/(P + r)
Ignoring communication cost
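
A small sketch of the parareal speedup estimate, reading the slide's formula as (1/k) P r / (P + r). That reading follows from a simple cost model, stated here as an assumption: each iteration performs a sequential approximate sweep over all P intervals (cost P/r in units of one exact interval) plus one exact interval computed in parallel. The function name is hypothetical.

```python
def parareal_speedup(P, r, k):
    """Ideal parareal speedup (1/k) * P*r / (P + r), ignoring communication.

    P: number of processors (one time interval each)
    r: exact time / approximate time for one interval
    k: number of iterations to convergence
    """
    return (P * r) / (k * (P + r))


# Illustrative values only (not taken from the talk).
for P in (10, 100, 1000):
    print(P, parareal_speedup(P, r=1000, k=4))
```

Under this formula the speedup can never exceed r/k, however many processors are used, which is the sequential component the next slide refers to.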

5
Limitations of the parareal approach

Speedups obtained (ignoring communication)
Toy MD problem with r = 1000
Different time step size for approximation
Speedup 130, efficiency 1.3%
Value of r is also not realistic; adaptive
sequential computation may be as effective
Toy MD problem with different model for
approximation
Speedup 8, efficiency 25%
Quantum control
Speedup 14, efficiency 1.5%
Limiting factors
Sequential component
Methods for approximation unlikely to bring major
improvements

6
Time parallelization with guided simulations

Based on a predict-verify approach
Use results of old simulations to speed up the
current simulation
Relationship between different problem parameters
often occurs in engineering
Example: temperature and time, stress and time
Find a relationship and use it to predict the
state at different times
The relationship is determined automatically, and
updated dynamically

7
Guided simulations: Latency tolerance

Notation
s = Exact time / Prediction time
P = # of procs
l = Error rate
Speedup
(1/l) Ps/(1 + s)
Ignoring communication cost
≈ P/l
If prediction cost is relatively small
Note: l < k
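
The guided-simulation estimate can be sketched the same way, reading the slide's formula as (1/l) P s / (1 + s). The assumption behind that reading is that every processor makes its own prediction in parallel, so a round costs one exact interval plus one prediction (1 + 1/s in units of one exact interval). The function name is hypothetical, and l is treated as the factor that divides the speedup, as k does for parareal.

```python
def guided_speedup(P, s, l):
    """Ideal guided-simulation speedup (1/l) * P*s / (1 + s).

    P: number of processors
    s: exact time / prediction time for one interval
    l: the slide's error-rate factor (plays the role k plays in parareal)
    """
    return (P * s) / (l * (1 + s))


# When prediction is cheap (large s) the estimate approaches P / l.
print(guided_speedup(P=100, s=1e3, l=1.0))
print(guided_speedup(P=100, s=1e6, l=1.0))
```

Unlike the parareal estimate, this one is not capped by s, because prediction is not a sequential sweep over all intervals.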

8
Fault tolerance

In case of node failure, another processor fills
in the missing time interval
Other computations need not be discarded
Efficiency close to 1
For large P
Excluding loss in efficiency from errors
If communication cost is negligible

[Figure: Master and processors P1, P2, P3, P4, with time intervals t1, t2, t3, t4]
9
Fault tolerance

[Figure: Master and processors P1, P2, P3, P4, with time intervals t2, t5, t6]
10
Requirements for this technique

Method for predicting a state
Criterion for determining whether two states
(predicted and actual) are similar
A means of informing the master about node failure
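
The three requirements can be sketched as pluggable pieces of a master loop. This is a serial emulation under stated assumptions, not the authors' implementation: `guided_run`, `predict`, `advance`, `similar` and `fail_once` are hypothetical names, one processor per time interval is assumed, and a node failure is modeled simply as a result that never reaches the master.

```python
def guided_run(initial, n_intervals, predict, advance, similar, fail_once=()):
    """Emulate predict-verify time parallelization with node failures.

    predict(i)       -> predicted state at the start of interval i
    advance(s, i)    -> exact simulation of interval i starting from state s
    similar(a, b)    -> True if two states count as equivalent
    fail_once        -> intervals whose first attempt is lost (node failure)
    Returns (verified states, rounds used, intervals computed).
    """
    verified = [initial]      # verified[i] = accepted state at start of interval i
    pending = {}              # interval -> (start state used, end state computed)
    to_fail = set(fail_once)
    rounds = 0
    computed = 0
    while len(verified) <= n_intervals:
        rounds += 1
        first = len(verified) - 1
        for i in range(first, n_intervals):
            if i in pending:
                continue                  # earlier result is kept, not discarded
            if i in to_fail:
                to_fail.discard(i)        # master learns of the failure; retry next round
                continue
            start = verified[-1] if i == first else predict(i)
            pending[i] = (start, advance(start, i))
            computed += 1
        # The master accepts results in time order while predictions hold.
        for i in range(first, n_intervals):
            if i not in pending:
                break                     # missing interval: filled in next round
            start, end = pending[i]
            if not similar(start, verified[-1]):
                del pending[i]            # prediction error: redo from verified state
                break
            verified.append(end)
    return verified, rounds, computed


states, rounds, computed = guided_run(
    initial=0.0,
    n_intervals=4,
    predict=lambda i: float(i),           # toy predictor that happens to be exact
    advance=lambda s, i: s + 1.0,         # toy "simulation" of one interval
    similar=lambda a, b: abs(a - b) < 1e-9,
    fail_once={2},
)
print(states, rounds, computed)
```

In the toy run, the interval lost to a failure is recomputed in the next round while the result for the later interval is reused, so only one extra round is needed.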

11
Parallelization of a model problem

x = x0 + (x0/L0)^a L0 v t
x = current position, L0 = initial length, v =
velocity, t = time, x0 = position at time 0, a = a
material property
Experimental parameters
L0 = 1, Δt = 0.01, 600 points
Base: a = 1.5, v = 0.05
Actual: a = 2.0, v = 0.0625
x_pred = x_old + (Δx_old/ΔB_old) ΔB
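
The prediction rule on this slide, read as x_pred = x_old + (Δx_old/ΔB_old) ΔB, can be written as a one-line function. The transcript does not define B; the sketch treats it as whatever driving quantity the old and current simulations share, which is an assumption, and the function and argument names are hypothetical.

```python
def predict_position(x_old, dx_old, dB_old, dB):
    """x_pred = x_old + (dx_old / dB_old) * dB.

    x_old, dx_old, dB_old: position, change in position, and change in B
                           taken from the old (base) simulation
    dB:                    change in B in the current simulation
    B is not defined in the transcript; it is assumed to be a driving
    quantity common to both runs.
    """
    return x_old + (dx_old / dB_old) * dB


# Illustrative numbers only: the old run moved a point by 0.04 while B
# changed by 0.05; the current run changes B by 0.0625.
print(predict_position(x_old=1.0, dx_old=0.04, dB_old=0.05, dB=0.0625))
```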

12
Parallelization without node failure

P = 100, simulated results

13
Speedup without node failure

Speedup 99.9
Efficiency 0.999
Justification for ignoring prediction time and
communication costs
A similar MD computation would take 10 s/time
interval (15 s on IBM SP3)
A reduction on IBM SP3 takes 0.0005s on 100
processors
Prediction is at least three orders of magnitude
smaller than MD computation

P = 100, simulated results
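
The figures quoted on this slide can be checked with two small helpers. The function names are hypothetical; the numbers are the ones given above (speedup 99.9 on 100 processors, a 0.0005 s reduction against a 10 s MD interval).

```python
def efficiency(speedup, n_procs):
    """Parallel efficiency = speedup / number of processors."""
    return speedup / n_procs


def overhead_fraction(overhead_s, compute_s):
    """Overhead per interval as a fraction of the computation time."""
    return overhead_s / compute_s


print(efficiency(99.9, 100))            # efficiency on 100 processors
print(overhead_fraction(0.0005, 10.0))  # reduction cost relative to one MD interval
```

The reduction costs a few hundred-thousandths of one MD interval, which is why the slide treats communication as negligible.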

14
Speedup with node failure
15
Parallelization of molecular dynamics simulations
on a Carbon nanotube

Definition of equivalence of two states
Atoms vibrate around their mean position
So consider positions equivalent if the
difference is within the range of motion for the
temperature at which the simulation is being
performed

Max displacement 0.211
Mean displacement 0.0789
σ = 0.0426

[Figure axes: Displacement (from mean), Mean position]
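
One way to express this equivalence test in code is to compare the largest per-atom displacement between two states with a tolerance. This is a minimal sketch: the function names are hypothetical, states are assumed to be lists of (x, y, z) positions in the same atom order, and choosing the tolerance from the observed range of thermal motion is left to the caller.

```python
import math


def max_displacement(state_a, state_b):
    """Largest distance between corresponding atoms of two states."""
    return max(math.dist(p, q) for p, q in zip(state_a, state_b))


def states_equivalent(predicted, actual, tol):
    """True if every atom is within tol of its position in the other state."""
    return max_displacement(predicted, actual) <= tol


predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
actual = [(0.0, 0.0, 0.1), (1.0, 0.2, 0.0)]
# Tolerance here reuses the slide's maximum displacement as an illustration.
print(states_equivalent(predicted, actual, tol=0.211))
```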
16
Prediction

Predictor
Use predictor of a form similar to the model
problem
But the changes are computed in spherical
coordinates
Instead of using x_pred = x_old + (Δx_old/ΔB_old) ΔB
Use x_pred = x_old + R ΔB
where R = b (Δx_old/ΔB_old) + (1-b) R_old
This prevents R from becoming large due to a
single small value of DBold caused by random
motion
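
The smoothed ratio can be sketched as an exponentially weighted update, reading the slide's formulas as R = b (Δx_old/ΔB_old) + (1-b) R_old and x_pred = x_old + R ΔB. The sketch works on a single scalar coordinate and leaves out the conversion to spherical coordinates; the function names and the numbers are illustrative assumptions.

```python
def update_ratio(R_old, dx_old, dB_old, b):
    """R = b * (dx_old / dB_old) + (1 - b) * R_old."""
    return b * (dx_old / dB_old) + (1.0 - b) * R_old


def predict_smoothed(x_old, R, dB):
    """x_pred = x_old + R * dB."""
    return x_old + R * dB


# A single unusually small dB_old gives a raw ratio of about 50,
# but the smoothed ratio moves only part of the way from R_old = 1.
raw = 0.05 / 0.001
R = update_ratio(R_old=1.0, dx_old=0.05, dB_old=0.001, b=0.2)
print(raw, R)
```

Smoothing damps the effect of one small ΔB_old but does not remove it, and a ΔB_old of exactly zero would still need separate handling.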

17
Experimental parameters

Carbon nanotube with 1000 atoms
Subjected to a pull out test
Around 200 atoms in the beginning fixed
Around 200 atoms at the end moved
deterministically
Time step size 0.5 femtoseconds
Time interval per processor = 1000 time steps
Tersoff-Brenner potential for MD
300 K temperature
f = 0.2
Base simulation: v = 0.05 Å/1000 time steps
Actual simulation: v = 0.0625 Å/1000 time steps
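
For orientation, the parameters above can be converted to SI units. The sketch assumes the "A" in the velocity figures is the ångström (10^-10 m); the function names are hypothetical.

```python
ANGSTROM = 1e-10      # metres
FEMTOSECOND = 1e-15   # seconds


def interval_duration(n_steps=1000, dt_fs=0.5):
    """Physical time (seconds) covered by one processor's interval."""
    return n_steps * dt_fs * FEMTOSECOND


def pull_velocity(displacement_angstrom, n_steps=1000, dt_fs=0.5):
    """Pull-out velocity in m/s for a displacement applied per interval."""
    return displacement_angstrom * ANGSTROM / interval_duration(n_steps, dt_fs)


print(interval_duration())       # seconds per 1000-step interval
print(pull_velocity(0.05))       # base simulation, m/s
print(pull_velocity(0.0625))     # actual simulation, m/s
```

Under that assumption each processor covers 0.5 picoseconds of physical time, and the base and actual pull velocities come to roughly 10 m/s and 12.5 m/s.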

18
Error and speedup without faults
0.1641 Å

Speedup on 10 processors 9.5
Good speedup on larger number of processors too

Loss in efficiency only due to first few sets of
iterations
19
Maximum error
0.422 Å
20
Limitations of the experiments

They are simulations of a parallel implementation
But large difference between computation and
communication time suggests efficient
implementation
Positions alone do not define the state
Velocities and Energy too are needed
Velocities can be handled through standard
techniques
Pre-computed MD results were used to initialize
the states
Other types of experiments too should be
performed
Use of higher temperature smaller time as base

21
Conclusions and future work

Conclusions
The approach promises significant improvement in
speedup and efficiency for long-time simulations,
through latency tolerance and fault tolerance
Future work
Better predictor
First predict mean positions, and perturb based
on a probability distribution
Reduce information needed for prediction
Better definition of the equivalence of states
Include velocity and energy
Actual implementation on a parallel machine
Etc ...