Long-Time Molecular Dynamics Simulations in Nano-Mechanics through Parallelization of the Time Domain (PowerPoint presentation)


1
Long-Time Molecular Dynamics Simulations in
Nano-Mechanics through Parallelization of the
Time Domain

Ashok Srinivasan
Florida State University
http://www.cs.fsu.edu/asriniva

Aim: Simulate for long time spans
Solution features: Use data from prior simulations to parallelize the time domain
Acknowledgements: NSF, ORNL, NERSC, NCSA
Collaborators: Yanan Yu and Namas Chandra
2
Outline

Background
Limitations of Conventional Parallelization
Example Application: Carbon Nanotube Tensile Test
Small Time Step Size in Molecular Dynamics Simulations
Other Time Parallelization Approaches
Data-Driven Time Parallelization
Experimental Results
Scaled efficiently to 1000 processors, for a
problem where conventional parallelization scales
to just 2-3 processors
Conclusions

3
Background

Limitations of Conventional Parallelization
Example Application: Carbon Nanotube Tensile Test
Molecular Dynamics Simulations

4
Limitations of Conventional Parallelization

Conventional parallelization decomposes the state
space across processors
It is effective for a large state space
It is not effective when the computational effort arises from a large number of time steps, or when the granularity becomes very fine due to a large number of processors

5
Scalability of Conventional MD Parallelization

Results on IBM Blue Gene
Does not scale efficiently beyond 10 ms/iteration
To simulate 1 ms:
Time step of 1 fs → $10^{12}$ iterations → $10^{10}$ s ≈ 300 years
If we scaled to 10 µs per iteration:
about 4 months of computing time

Plot legend: NAMD, 327K-atom ATPase, PME (IPDPS 2006); NAMD, 92K-atom ApoA1, PME (IPDPS 2006); IBM Blue Matter, 43K-atom Rhodopsin (Tech Report 2005)
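The estimate above can be checked with a few lines of Python. This is a sketch: the constant and function names are illustrative, and it assumes a 365.25-day year and an average month of one twelfth of that.

```python
# Back-of-the-envelope cost of long MD runs (numbers taken from the slide).
FEMTOSECOND = 1e-15                        # time step, s
SECONDS_PER_YEAR = 365.25 * 24 * 3600      # assumption: Julian year
SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12  # assumption: average month


def wall_clock_seconds(simulated_time_s, time_step_s, seconds_per_iteration):
    """Wall-clock time needed to cover `simulated_time_s` of physical time."""
    iterations = simulated_time_s / time_step_s
    return iterations * seconds_per_iteration


if __name__ == "__main__":
    slow = wall_clock_seconds(1e-3, FEMTOSECOND, 10e-3)  # 10 ms per iteration
    fast = wall_clock_seconds(1e-3, FEMTOSECOND, 10e-6)  # 10 µs per iteration
    print(f"iterations for 1 ms: {1e-3 / FEMTOSECOND:.0e}")
    print(f"10 ms/iteration: {slow:.1e} s, about {slow / SECONDS_PER_YEAR:.0f} years")
    print(f"10 µs/iteration: {fast:.1e} s, about {fast / SECONDS_PER_MONTH:.1f} months")
```

Run as a script, it reports about 317 years at 10 ms/iteration and about 3.8 months at 10 µs/iteration, in line with the slide's rounded figures.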
6
Example Application Carbon Nanotube Tensile Test

Pull the CNT at a constant velocity
Determine stress-strain response and yield strain
(when CNT starts breaking) using MD
The response is strain-rate dependent

7
A Drawback of Molecular Dynamics

Molecular dynamics
In each time step, forces of atoms on each other
modeled using some potential
After force is computed, update positions
Repeat for desired number of time steps
Time step size is about $10^{-15}$ s, due to physical and numerical considerations
The desired time range is much larger
A million time steps are required to reach $10^{-9}$ s
Around a day of computing for a 3000-atom CNT
MD uses unrealistically large strain-rates
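The MD loop described on this slide (compute forces from a potential, update positions, repeat) can be sketched as follows. The slides do not name the potential or the integrator used for the CNT runs; this sketch substitutes a pairwise Lennard-Jones potential in reduced units (no cutoff, no periodic box) and the velocity Verlet integrator, so it illustrates the structure of a time step, not the production code.

```python
import numpy as np


def lennard_jones(pos, epsilon=1.0, sigma=1.0):
    """Forces and potential energy for all atom pairs (no cutoff, no periodic box)."""
    forces = np.zeros_like(pos)
    energy = 0.0
    for i in range(len(pos) - 1):
        rij = pos[i] - pos[i + 1:]              # vectors from atoms j > i to atom i
        r2 = np.sum(rij * rij, axis=1)
        s6 = (sigma * sigma / r2) ** 3          # (sigma/r)^6
        energy += np.sum(4.0 * epsilon * (s6 * s6 - s6))
        # (-dU/dr) / r times the separation vector gives the force on atom i
        fij = (24.0 * epsilon * (2.0 * s6 * s6 - s6) / r2)[:, None] * rij
        forces[i] += fij.sum(axis=0)
        forces[i + 1:] -= fij                   # Newton's third law
    return forces, energy


def velocity_verlet(pos, vel, mass, dt, n_steps, force_fn=lennard_jones):
    """Repeat for n_steps: update velocities and positions, recompute forces."""
    forces, _ = force_fn(pos)
    for _ in range(n_steps):
        vel = vel + 0.5 * dt * forces / mass    # half kick
        pos = pos + dt * vel                    # drift: update positions
        forces, _ = force_fn(pos)               # new forces from the potential
        vel = vel + 0.5 * dt * forces / mass    # second half kick
    return pos, vel


def total_energy(pos, vel, mass, force_fn=lennard_jones):
    """Kinetic plus potential energy (should stay nearly constant)."""
    _, potential = force_fn(pos)
    return 0.5 * mass * np.sum(vel * vel) + potential
```

The loop is inherently sequential in time: step n+1 needs the forces at step n, which is why conventional parallelization can only split the atoms within a step.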

8
Other Time Parallelization Approaches

Waveform relaxation
Repeatedly solve for the entire time domain
Parallelizes well but convergence can be slow
Several variants to improve convergence
Parareal approach
Features similar to ours and to waveform
relaxation
Precedes our approach
Not data-driven
Sequential phase for prediction
Not very effective in practice so far
Has much potential to be improved

9
Waveform Relaxation

Special case: Picard iterations
Example: $dy/dt = y,\ y(0) = 1$ becomes
$dy_{n+1}/dt = y_n(t),\ y_0(t) = 1$
In general,
$dy/dt = f(y, t),\ y(0) = y_0$ becomes
$dy_{n+1}/dt = g(y_n, y_{n+1}, t),\ y_0(t) = y_0$, where
$g(u, u, t) = f(u, t)$
$g(y_n, y_{n+1}, t) = f(y_n, t)$: Picard
$g(y_n, y_{n+1}, t) = f(y_{n+1}, t)$: converges in 1 iteration
Jacobi, Gauss-Seidel, and SOR versions of g are defined
Many improvements
Example: DIRM combines the above with reduced-order modeling

Plot legend: Exact, N = 1, N = 2, N = 3, N = 4
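A minimal Picard waveform-relaxation sketch for the example above, assuming the garbled example reads $dy/dt = y$, $y(0) = 1$ (exact solution $e^t$). Each iterate is a whole waveform over the time window; the first few iterates correspond to the curves N = 1 to N = 4 in the plot. The trapezoid quadrature and the function names are choices made for this illustration.

```python
import numpy as np


def cumulative_trapezoid(values, t):
    """Integral of `values` from t[0] up to every t[k] (trapezoid rule)."""
    increments = 0.5 * (values[1:] + values[:-1]) * np.diff(t)
    return np.concatenate(([0.0], np.cumsum(increments)))


def picard_waveform_relaxation(f, y0, t, n_iter):
    """Picard waveform relaxation: y_{n+1}(t) = y0 + integral_0^t f(y_n(s), s) ds.

    Each sweep updates the whole waveform on the time grid `t` at once. The integrand
    f(y_n(s), s) is known at every grid point, so evaluating it could be split across
    processors; the running sum is a prefix sum.
    """
    y = np.full_like(t, y0, dtype=float)   # initial waveform y_0(t) = y0
    iterates = [y]
    for _ in range(n_iter):
        y = y0 + cumulative_trapezoid(f(y, t), t)
        iterates.append(y)
    return iterates


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 1001)
    iterates = picard_waveform_relaxation(lambda y, s: y, 1.0, t, 4)
    for n, y in enumerate(iterates):
        print(n, y[-1], "exact:", np.exp(1.0))
```

For this linear example the Picard iterates are the Taylor polynomials of $e^t$, so each sweep adds one more correct term; over long time windows many sweeps are needed, which is the slow convergence noted for waveform relaxation.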
10
Parareal approach

Based on an approximate-verify-correct sequence
An example of shooting methods for
time-parallelization
Not shown to be effective in realistic situations

Figure labels: initial prediction, initial computed result, correction, second prediction
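The approximate-verify-correct sequence can be illustrated with the standard parareal update $U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k)$, where G is a cheap coarse propagator and F an expensive fine one. This sketch uses forward Euler for both (one step for G, many for F) purely for illustration; it is the textbook scheme, not the data-driven method of these slides.

```python
import numpy as np


def euler(f, y, t0, t1, n_steps):
    """Forward Euler from t0 to t1 with n_steps steps."""
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        y = y + h * f(y, t)
        t += h
    return y


def parareal(f, y0, T, n_intervals, n_iter, fine_steps=100):
    """Parareal on [0, T]: cheap coarse propagator G, expensive fine propagator F."""
    ts = np.linspace(0.0, T, n_intervals + 1)

    def coarse(y, n):
        return euler(f, y, ts[n], ts[n + 1], 1)

    def fine(y, n):
        return euler(f, y, ts[n], ts[n + 1], fine_steps)

    # Initial prediction: one sequential sweep of the coarse propagator.
    U = [y0]
    for n in range(n_intervals):
        U.append(coarse(U[n], n))
    for _ in range(n_iter):
        # Fine solves on different intervals are independent (the parallel part).
        F = [fine(U[n], n) for n in range(n_intervals)]
        G_old = [coarse(U[n], n) for n in range(n_intervals)]
        # Sequential correction: U_{n+1} = G(new U_n) + F(old U_n) - G(old U_n).
        new_U = [y0]
        for n in range(n_intervals):
            new_U.append(coarse(new_U[n], n) + F[n] - G_old[n])
        U = new_U
    return ts, np.array(U)
```

After as many iterations as there are intervals, parareal reproduces the sequential fine solution exactly, so a speedup requires convergence in far fewer iterations.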
11
Data-Driven Time Parallelization

Time Parallelization
Use Prior Data

12
Time Parallelization

Each processor simulates a different time interval
The initial state is obtained by prediction, except for processor 0
Verify whether the predicted end state is close to the one computed by MD
Prediction is based on dynamically determining a relationship between the current simulation and those in a database of prior results

If the time interval is sufficiently large, the communication overhead is small
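The predict-simulate-verify loop on this slide can be emulated serially to see how prediction quality controls the parallel work. This is a sketch under stated assumptions: `simulate`, `predict`, and `equivalent` are hypothetical caller-supplied functions, and when a check fails the run simply restarts from the last verified state, which is one plausible recovery rule; the slides do not describe the actual one.

```python
import math


def time_parallel_run(simulate, predict, x0, interval_ends, equivalent):
    """Serial emulation of predict-simulate-verify time parallelization.

    simulate(x, t0, t1): fine-scale (e.g. MD) integration of state x from t0 to t1
    predict(t):          predicted state at time t (e.g. from a database of prior runs)
    equivalent(a, b):    True if two states agree within normal fluctuations
    Returns the final state and the number of parallel rounds used.
    """
    times = [0.0] + list(interval_ends)
    start_state, start, rounds = x0, 0, 0
    while start < len(times) - 1:
        rounds += 1
        # The first active "processor" starts from a verified state; the others from predictions.
        starts = [start_state] + [predict(t) for t in times[start + 1:-1]]
        # These runs are independent, so each could execute on its own processor.
        ends = [simulate(x, times[start + i], times[start + i + 1])
                for i, x in enumerate(starts)]
        # Accept intervals while each computed end state matches the next predicted start.
        k = 0
        while k < len(ends) - 1 and equivalent(ends[k], starts[k + 1]):
            k += 1
        # Intervals start .. start+k began from verified states, so their results stand.
        start_state = ends[k]
        start += k + 1
    return start_state, rounds


if __name__ == "__main__":
    def decay(x, t0, t1):              # toy "fine-scale" model: dx/dt = -x
        return x * math.exp(-(t1 - t0))

    def close(a, b):
        return abs(a - b) < 1e-9

    ends = [float(k) for k in range(1, 9)]
    print(time_parallel_run(decay, lambda t: math.exp(-t), 1.0, ends, close))  # good predictor
    print(time_parallel_run(decay, lambda t: 0.0, 1.0, ends, close))           # useless predictor
```

With a perfect predictor all eight intervals are accepted in one round; with a useless predictor only one interval is accepted per round, which is no better than running sequentially.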
13
Problems with multiple time-scales

Fine-scale computations (such as MD) are more
accurate, but more time consuming
Many fine-scale details are unimportant, but some matter

A simple schematic of multiple time scales
14
Use Prior Data

Results for an identical simulation exist
Retrieve the results
Results for a slightly different parameter, with the same coarse-scale response, exist
Retrieve the results
Verify closeness, or pre-determine the acceptable parameter range
Current simulation behaves like different prior
ones at different times
Identify similar prior results, learn
relationship, verify prediction
Not similar to prior results
Try to identify coarse-scale behavior, apply
dynamic iterations to improve on predictions

15
Experimental Results

CNT tensile test
CNT identical to prior results, but different
strain-rate
1000-atom CNT, 300 K
Static and dynamic prediction
CNT identical to prior results, but different
strain-rate and temperature
CNT differs in size from prior result, and
simulated with a different strain-rate

16
Dimensionality Reduction

Movement of atoms in a 1000-atom CNT can be
considered the motion of a point in
3000-dimensional space
Find a lower dimensional subspace close to which
the points lie
We use proper orthogonal decomposition (POD)
Find a low dimensional affine subspace
Motion may, however, be complex in this subspace
Use results for different strain rates
Velocities: 10 m/s, 5 m/s, and 1 m/s
At five different time points
$[U, S, V] = \mathrm{svd}(\text{Shifted Data})$
$\text{Shifted Data} = U S V^T$
States of the CNT expressed as
$m + c_1 u_1 + c_2 u_2$

Figure labels: $u_1$, $u_2$, $m$
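The POD step can be sketched with NumPy's SVD. Assumptions: each state is flattened into a vector of length 3 × (number of atoms), "Shifted Data" is read as the snapshot matrix with the mean state m subtracted from every column, and the function names are illustrative.

```python
import numpy as np


def pod_basis(snapshots, n_modes=2):
    """POD of MD snapshots.

    snapshots: array of shape (n_dof, n_snapshots); each column is one flattened
    state (x1, y1, z1, x2, ...), so a 1000-atom CNT gives n_dof = 3000.
    Returns the mean state m, the first n_modes basis vectors, and all singular values.
    """
    m = snapshots.mean(axis=1)
    shifted = snapshots - m[:, None]                          # "Shifted Data"
    U, S, Vt = np.linalg.svd(shifted, full_matrices=False)    # shifted = U @ diag(S) @ Vt
    return m, U[:, :n_modes], S


def coefficients(x, m, basis):
    """Coordinates (c1, c2, ...) of state x in the affine subspace m + span(basis)."""
    return basis.T @ (x - m)


def reconstruct(c, m, basis):
    """State m + c1*u1 + c2*u2 + ..."""
    return m + basis @ c
```

With one snapshot for each of three pulling speeds at five time points, the snapshot matrix would be 3000 × 15; a sharp drop in the singular values S after the second one is what justifies keeping only $u_1$ and $u_2$.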
17
Basis Vectors from POD

CNT of 100 Å with 1000 atoms at 300 K

Plot: $u_1$ (blue) and $u_2$ (red) for z; $u_1$ (green) for x is not significant
Legend: Blue: z; Green, Red: x, y
18
Relate strain rate and time

Coefficients of $u_1$
Blue: 1 m/s
Red: 5 m/s
Green: 10 m/s
Dotted line: same strain
Suggests that behavior is similar at similar
strains
In general, clustering similar coefficients can
give parameter-time relationships
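Comparing runs at different pulling speeds at equal strain, as the dotted line suggests, needs a map between time and strain. A minimal sketch, assuming engineering strain with one end of the tube held fixed and the other pulled at constant speed v, so that strain = v t / L0; the slides do not state the boundary conditions, and the 100 Å length is taken from the POD slide only as an example.

```python
def engineering_strain(v, t, length0):
    """Strain after pulling for time t at constant speed v (one end held fixed: assumption)."""
    return v * t / length0


def time_at_strain(strain, v, length0):
    """Time needed to reach `strain` when pulling at constant speed v."""
    return strain * length0 / v


def equivalent_time(t, v_from, v_to):
    """Time at which a run pulled at v_to reaches the strain a run at v_from has at t."""
    return t * v_from / v_to


if __name__ == "__main__":
    L0 = 100e-10                       # 100 Å tube, in metres
    for v in (10.0, 5.0, 1.0):         # pulling speeds in m/s
        print(v, "m/s reaches 10% strain after", time_at_strain(0.10, v, L0), "s")
```

Under this assumption a run at 1 m/s reaches a given strain ten times later than a run at 10 m/s, which is the kind of parameter-time relationship the slide mentions.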

19
Prediction when v is the only parameter

Static Predictor
Independently predict change in each coordinate
Use precomputed results for 40 different time
points each for three different velocities
To predict for (t, v) not in the database:
Determine coefficients for nearby v at nearby strains
Fit a linear surface and interpolate/extrapolate to get coefficients $c_1$ and $c_2$ for (t, v)
Get the state as $m + c_1 u_1 + c_2 u_2$

Plot legend: Green: 10 m/s; Red: 5 m/s; Blue: 1 m/s; Magenta: 0.1 m/s; Black: 0.1 m/s through direct prediction

Dynamic Prediction
Correct the above coefficients, by determining
the error between the previously predicted and
computed states
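The static predictor can be sketched as a least-squares plane $c \approx k_0 + k_1 t + k_2 v$ fitted to nearby database entries. The neighbour rule here (the two closest velocities, and for each the two time points whose strain, taken as proportional to v·t, is closest to the target) is an assumption, since the slide says only "nearby v at nearby strains"; the same function would be applied separately to $c_1$ and $c_2$.

```python
import numpy as np


def fit_linear_surface(t, v, c):
    """Least-squares plane c ≈ k0 + k1*t + k2*v through the given points."""
    A = np.column_stack([np.ones_like(t), t, v])
    k, *_ = np.linalg.lstsq(A, c, rcond=None)
    return k


def static_predict(db_t, db_v, db_c, t, v, n_vel=2, n_per_vel=2):
    """Predict one POD coefficient at (t, v) from a database of prior runs.

    db_t, db_v, db_c: 1-D arrays of time, pulling speed, and coefficient value.
    Neighbours (an assumption): the n_vel closest speeds and, for each, the
    n_per_vel time points whose strain (taken as proportional to v*t) is closest.
    """
    speeds = np.unique(db_v)
    nearest = speeds[np.argsort(np.abs(speeds - v))[:n_vel]]
    rows = []
    for s in nearest:
        idx = np.flatnonzero(db_v == s)
        order = np.argsort(np.abs(db_t[idx] * s - t * v))[:n_per_vel]
        rows.extend(idx[order])
    rows = np.asarray(rows)
    k = fit_linear_surface(db_t[rows], db_v[rows], db_c[rows])
    return k[0] + k[1] * t + k[2] * v   # interpolates or extrapolates
```

A plane needs at least three non-collinear points, so two speeds with two time points each is the smallest neighbourhood that still over-determines the fit.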

20
Verification of prediction

Definition of equivalence of two states
Atoms vibrate around their mean position
Consider states equivalent if difference in
position, potential energy, and temperature are
within the normal range of fluctuations

Max displacement: 0.2 Å
Mean displacement: 0.08 Å
Potential energy fluctuation: 0.35
Temperature fluctuation: 12.5 K

Figure labels: displacement (from mean), mean position
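The equivalence test can be written as a single predicate. In this sketch the slide's fluctuation bounds are used as tolerances on the difference between two states (per-atom displacement in Å, temperature in K); that use is an assumption, and the potential-energy tolerance is passed through unchanged because the slide gives 0.35 without a unit.

```python
import numpy as np


def states_equivalent(pos_a, pos_b, pe_a, pe_b, temp_a, temp_b,
                      max_disp=0.2, mean_disp=0.08, pe_tol=0.35, temp_tol=12.5):
    """True if two MD states differ by no more than normal thermal fluctuations.

    pos_a, pos_b: (n_atoms, 3) positions in Å; temperatures in K.
    pe_tol: the slide gives 0.35 without a unit, so it must match the units of pe_a/pe_b.
    """
    disp = np.linalg.norm(pos_a - pos_b, axis=1)      # per-atom displacement, Å
    return bool(disp.max() <= max_disp
                and disp.mean() <= mean_disp
                and abs(pe_a - pe_b) <= pe_tol
                and abs(temp_a - temp_b) <= temp_tol)
```

Checking the maximum as well as the mean displacement catches a single atom that has drifted too far even when the rest of the tube agrees.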
21
Stress-strain response at 0.1 m/s

Blue: exact result
Green: direct prediction with interpolation/extrapolation
Points close to yield involve extrapolation in velocity and strain
Red: time-parallel results

22
Speedup

Red line: ideal speedup
Blue: v = 0.1 m/s
Green: a different predictor, v = 1 m/s, using v = 10 m/s
CNT with 1000 atoms
Xeon/Myrinet cluster

23
Temperature and velocity vary

Use 1000-atom CNT results
Temperatures 300K, 600K, 900K, 1200K
Velocities 1m/s, 5m/s, 10m/s
Dynamically choose the closest simulation for prediction

Speedup plot: 450 K, 2 m/s; linear speedup shown for reference
Stress-strain plot: Blue: exact, 450 K; Red: 200 processors
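A minimal sketch of "Dynamically choose closest simulation", assuming closeness is measured as the RMS distance between the current state and each prior run's state at the same strain; the metric and the dictionary layout keyed by (temperature, velocity) are hypothetical.

```python
import numpy as np


def closest_prior_run(current_state, prior_states):
    """Return the key of the prior run whose state is nearest to the current one.

    prior_states: hypothetical mapping (temperature_K, velocity_m_per_s) -> state array,
    each taken at the same strain as current_state. Distance: root-mean-square difference.
    """
    def rms(a, b):
        return float(np.sqrt(np.mean((a - b) ** 2)))

    return min(prior_states, key=lambda key: rms(current_state, prior_states[key]))
```

Because the choice is re-evaluated as the run proceeds, a 450 K, 2 m/s run could borrow from different prior runs at different times.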
24
CNTs of varying sizes

Use a 1000-atom CNT, 10 m/s, 300K result
Parallelize 1200, 1600, 2000-atom CNT runs
Observe that the dominant mode is approximately a
linear function of the initial z-coordinate
Normalize coordinates to be in [0, 1]
$z_{t+\Delta t} = z_t + \dot z_{t+\Delta t}\,\Delta t$; predict $\dot z$

Speedup plot: 2000, 1600, and 1200 atoms; linear speedup shown for reference
Stress-strain plot: Blue: exact, 2000 atoms, 1 m/s; Red: 200 processors
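The size-transfer idea can be sketched as follows, assuming one end of the tube is held fixed, so that the dominant axial velocity profile is a linear function of the normalized initial coordinate $\hat z \in [0, 1]$ and does not depend on tube length. The profile is fitted on the base tube and evaluated at the new tube's normalized coordinates; `scale` (for example the ratio of pulling speeds) and all function names are hypothetical.

```python
import numpy as np


def normalize(z0):
    """Map initial z-coordinates linearly onto [0, 1]."""
    return (z0 - z0.min()) / (z0.max() - z0.min())


def fit_velocity_profile(z0_base, zdot_base):
    """Fit zdot ≈ intercept + slope * z_hat on the base tube (z_hat = normalized initial z)."""
    slope, intercept = np.polyfit(normalize(z0_base), zdot_base, 1)
    return intercept, slope


def step_new_tube(z_t, z0_new, intercept, slope, dt, scale=1.0):
    """z_{t+dt} = z_t + zdot * dt, with zdot taken from the base profile (times `scale`)."""
    zdot = scale * (intercept + slope * normalize(z0_new))
    return z_t + zdot * dt
```

The linear fit also averages out thermal vibration in the base tube's per-atom velocities, leaving only the dominant mode.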

25
Predict change in coordinates

Express x in terms of basis functions
Example:
$x_{t+\Delta t} = a_{0,t+\Delta t} + a_{1,t+\Delta t}\, x_t$
$a_{0,t+\Delta t}$ and $a_{1,t+\Delta t}$ are unknown
Express the changes, y, for the base (old) simulation similarly, in terms of coefficients b, and perform a least-squares fit
Predict $a_{i,t+\Delta t}$ as $b_{i,t+\Delta t} + R_{t+\Delta t}$
$R_{t+\Delta t} = (1-\beta) R_t + \beta\,(a_{i,t} - b_{i,t})$
Intuitively, the difference between the base coefficient and the current coefficient is predicted as a weighted combination of previous differences
We use $\beta = 0.5$
Gives more weight to the latest results
Does not let random fluctuations affect the predictor too much
Velocity is estimated from the latest results known to be accurate
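The coefficient predictor can be sketched directly from the slide's recurrence for R. Assumptions: the correction R starts at zero before any verified step, coefficients are handled as arrays (one entry per coefficient $a_i$), and the class and function names are illustrative.

```python
import numpy as np


def fit_affine(x_prev, x_next):
    """Least-squares fit x_next ≈ a0 + a1 * x_prev over all atoms; returns [a0, a1]."""
    A = np.column_stack([np.ones_like(x_prev), x_prev])
    coef, *_ = np.linalg.lstsq(A, x_next, rcond=None)
    return coef


class CoefficientPredictor:
    """Predict a_{t+dt} = b_{t+dt} + R_{t+dt}, R_{t+dt} = (1-beta) R_t + beta (a_t - b_t)."""

    def __init__(self, beta=0.5):
        self.beta = beta
        self.R = 0.0            # assumption: no correction before any verified step

    def update(self, a_t, b_t):
        """Fold in the latest verified current-run (a) and base-run (b) coefficients."""
        diff = np.asarray(a_t, dtype=float) - np.asarray(b_t, dtype=float)
        self.R = (1.0 - self.beta) * self.R + self.beta * diff

    def predict(self, b_next):
        """Predicted current-run coefficients for the next step."""
        return np.asarray(b_next, dtype=float) + self.R
```

With β = 0.5 each new difference gets half the weight and older differences decay geometrically, which is the balance between responsiveness and noise rejection the slide describes.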

26
Conclusions

Data-driven time parallelization shows
significant improvement in speed, without
sacrificing accuracy significantly, in the CNT
tensile test
The 980-processor simulation attained a flop
rate of 420 Gflops
Its flops per atom rate of 420 Mflops/atom is
likely the largest flop per atom rate in
classical MD simulations
Scaled to 13.5 µs/iteration
References
See http://www.cs.fsu.edu/asriniva/research.html

27
Future Work