Title: Long-Time Molecular Dynamics Simulations in Nano-Mechanics through Parallelization of the Time Domain
1. Long-Time Molecular Dynamics Simulations in Nano-Mechanics through Parallelization of the Time Domain
- Ashok Srinivasan
- Florida State University
- http://www.cs.fsu.edu/asriniva
Aim: Simulate for long time spans
Solution features: Use data from prior simulations to parallelize the time domain
Acknowledgements: NSF, ORNL, NERSC, NCSA
Collaborators: Yanan Yu and Namas Chandra
2. Outline
- Background
  - Limitations of Conventional Parallelization
  - Example Application: Carbon Nanotube Tensile Test
  - Small Time Step Size in Molecular Dynamics Simulations
- Other Time Parallelization Approaches
- Data-Driven Time Parallelization
- Experimental Results
  - Scaled efficiently to 1000 processors, for a problem where conventional parallelization scales to just 2-3 processors
- Conclusions
3. Background
- Limitations of Conventional Parallelization
- Example Application: Carbon Nanotube Tensile Test
- Molecular Dynamics Simulations
4. Limitations of Conventional Parallelization
- Conventional parallelization decomposes the state space across processors
  - It is effective for a large state space
  - It is not effective when the computational effort arises from a large number of time steps
  - or when granularity becomes very fine due to a large number of processors
5. Scalability of Conventional MD Parallelization
- Results on IBM Blue Gene
  - Does not scale efficiently beyond 10 ms/iteration
- If we want to simulate to 1 ms
  - Time step of 1 fs => $10^{12}$ iterations => $10^{10}$ s ≈ 300 years
- If we scaled to 10 μs per iteration
  - 4 months of computing time
Plot legend: NAMD, 327K-atom ATPase, PME (IPDPS 2006); NAMD, 92K-atom ApoA1, PME (IPDPS 2006); IBM Blue Matter, 43K-atom Rhodopsin (Tech Report 2005)
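The arithmetic on this slide can be checked with a few lines. This is only a back-of-the-envelope sketch: the function name `wall_clock_seconds` and the 30-day month are illustrative assumptions, not part of the original talk.

```python
# Rough wall-clock estimate for simulating 1 ms of physical time with MD.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month

def wall_clock_seconds(simulated_time_s, time_step_s, seconds_per_iteration):
    """Wall-clock time = number of time steps * cost of one time step."""
    iterations = simulated_time_s / time_step_s
    return iterations * seconds_per_iteration

target = 1e-3   # simulate to 1 ms
dt = 1e-15      # 1 fs time step

slow = wall_clock_seconds(target, dt, 10e-3)  # 10 ms per iteration
fast = wall_clock_seconds(target, dt, 10e-6)  # 10 microseconds per iteration

print(f"iterations needed: {target / dt:.1e}")
print(f"at 10 ms/iteration: {slow / SECONDS_PER_YEAR:.0f} years")
print(f"at 10 us/iteration: {fast / SECONDS_PER_MONTH:.1f} months")
```

The two cases differ only in the per-iteration cost, which is why reducing the time per iteration, rather than adding atoms per processor, is the quantity that matters for long time spans.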
6. Example Application: Carbon Nanotube Tensile Test
- Pull the CNT at a constant velocity
- Determine stress-strain response and yield strain (when the CNT starts breaking) using MD
- The response is strain-rate dependent
7. A Drawback of Molecular Dynamics
- Molecular dynamics
  - In each time step, the forces of atoms on each other are modeled using some potential
  - After the forces are computed, update positions
  - Repeat for the desired number of time steps
- Time step size is about $10^{-15}$ seconds, due to physical and numerical considerations
  - Desired time range is much larger
  - A million time steps are required to reach $10^{-9}$ s
  - Around a day of computing for a 3000-atom CNT
- MD uses unrealistically large strain rates
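The force-then-update loop described on this slide can be sketched as follows. This is a toy illustration, not the CNT code from the talk: the harmonic-spring chain, the velocity Verlet integrator, and all names and parameter values are assumptions chosen to keep the example short.

```python
def spring_forces(x, k=1.0, rest=1.0):
    """Forces on a 1-D chain of atoms joined by harmonic springs (toy potential)."""
    f = [0.0] * len(x)
    for i in range(len(x) - 1):
        stretch = (x[i + 1] - x[i]) - rest
        f[i] += k * stretch        # pulled toward the neighbour if stretched
        f[i + 1] -= k * stretch    # equal and opposite force
    return f

def run_md(x, v, dt, n_steps, mass=1.0):
    """Velocity Verlet: compute forces, update positions, repeat."""
    x, v = list(x), list(v)
    f = spring_forces(x)
    for _ in range(n_steps):
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]  # half kick
        x = [xi + dt * vi for xi, vi in zip(x, v)]               # move atoms
        f = spring_forces(x)                                     # new forces
        v = [vi + 0.5 * dt * fi / mass for vi, fi in zip(v, f)]  # half kick
    return x, v

def total_energy(x, v, k=1.0, rest=1.0, mass=1.0):
    kinetic = sum(0.5 * mass * vi * vi for vi in v)
    potential = sum(0.5 * k * ((x[i + 1] - x[i]) - rest) ** 2
                    for i in range(len(x) - 1))
    return kinetic + potential

x0 = [0.0, 1.1, 2.0, 3.05]
v0 = [0.0, 0.0, 0.0, 0.0]
x1, v1 = run_md(x0, v0, dt=0.01, n_steps=1000)
print("energy drift:", abs(total_energy(x1, v1) - total_energy(x0, v0)))
```

Each step depends on the state produced by the previous one, which is why the number of steps, rather than the number of atoms, limits how far a small system can be simulated.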
8. Other Time Parallelization Approaches
- Waveform relaxation
  - Repeatedly solve for the entire time domain
  - Parallelizes well, but convergence can be slow
  - Several variants to improve convergence
- Parareal approach
  - Features similar to ours and to waveform relaxation
  - Precedes our approach
  - Not data-driven
  - Sequential phase for prediction
  - Not very effective in practice so far
  - Has much potential to be improved
9. Waveform Relaxation
- Special case: Picard iterations
  - Ex: $dy/dt = y$, $y(0) = 1$ becomes
  - $dy_{n+1}/dt = y_n(t)$, $y_0(t) = 1$
- In general
  - $dy/dt = f(y, t)$, $y(0) = y_0$ becomes
  - $dy_{n+1}/dt = g(y_n, y_{n+1}, t)$, $y_0(t) = y_0$
  - $g(u, u, t) = f(u, t)$
  - $g(y_n, y_{n+1}, t) = f(y_n, t)$: Picard
  - $g(y_n, y_{n+1}, t) = f(y_{n+1}, t)$: converges in 1 iteration
- Jacobi, Gauss-Seidel, and SOR versions of $g$ are defined
- Many improvements
  - Ex: DIRM combines the above with reduced order modeling
Plot legend: Exact, N = 1, N = 2, N = 3, N = 4
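The Picard example on this slide can be worked through directly. The sketch below assumes the iterates are stored as polynomial coefficient lists (an illustrative choice); for dy/dt = y with y(0) = 1, each iteration integrates the previous iterate and adds the initial value, so iterate N is the degree-N Taylor polynomial of exp(t).

```python
from math import exp, factorial

def picard_step(coeffs):
    """One Picard iteration for dy/dt = y, y(0) = 1:
    y_{n+1}(t) = 1 + integral from 0 to t of y_n(s) ds.
    Polynomials are coefficient lists, lowest degree first."""
    integral = [0.0] + [c / (k + 1) for k, c in enumerate(coeffs)]
    integral[0] = 1.0  # initial condition y(0) = 1
    return integral

def evaluate(coeffs, t):
    return sum(c * t ** k for k, c in enumerate(coeffs))

iterates = [[1.0]]  # y_0(t) = 1 on the whole time domain
for _ in range(4):
    iterates.append(picard_step(iterates[-1]))

for n, coeffs in enumerate(iterates):
    print(f"N = {n}: y_N(1) = {evaluate(coeffs, 1.0):.6f}")
print(f"exact:  y(1)   = {exp(1.0):.6f}")
```

Every iterate is defined over the entire time domain, which is what allows the time domain to be split across processors, and the slowly shrinking gap to the exact value illustrates why convergence can be slow.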
10. Parareal approach
- Based on an approximate-verify-correct sequence
- An example of shooting methods for time-parallelization
- Not shown to be effective in realistic situations
Figure labels: Initial prediction, Initial computed result, Correction, Second prediction
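A minimal sketch of the textbook parareal iteration may help make the approximate-verify-correct sequence concrete. The test problem dy/dt = -y, the forward Euler propagators, and all function names are assumptions for illustration; the talk does not specify them.

```python
def coarse(y, t0, t1, lam=-1.0):
    """Cheap propagator: a single forward Euler step over [t0, t1]."""
    return y + (t1 - t0) * lam * y

def fine(y, t0, t1, lam=-1.0, substeps=100):
    """Accurate propagator: many small forward Euler steps over [t0, t1]."""
    h = (t1 - t0) / substeps
    for _ in range(substeps):
        y = y + h * lam * y
    return y

def parareal(y0, t_end, n_intervals, n_iterations):
    ts = [t_end * i / n_intervals for i in range(n_intervals + 1)]
    # Initial prediction: sequential sweep with the coarse propagator.
    u = [y0]
    for i in range(n_intervals):
        u.append(coarse(u[i], ts[i], ts[i + 1]))
    for _ in range(n_iterations):
        # The fine runs are independent, so they could execute in parallel.
        f_old = [fine(u[i], ts[i], ts[i + 1]) for i in range(n_intervals)]
        g_old = [coarse(u[i], ts[i], ts[i + 1]) for i in range(n_intervals)]
        # Sequential correction sweep.
        new = [y0]
        for i in range(n_intervals):
            new.append(coarse(new[i], ts[i], ts[i + 1]) + f_old[i] - g_old[i])
        u = new
    return u

n = 8
ts = [2.0 * i / n for i in range(n + 1)]
reference = [1.0]  # purely sequential fine solution, for comparison
for i in range(n):
    reference.append(fine(reference[i], ts[i], ts[i + 1]))

def max_error(k):
    u = parareal(1.0, 2.0, n, k)
    return max(abs(a - b) for a, b in zip(u, reference))

for k in (0, 1, 2, n):
    print(f"iterations = {k}: max error vs sequential fine run = {max_error(k):.2e}")
```

The coarse sweep is the sequential phase mentioned earlier; speedup is only obtained when far fewer iterations than time intervals are needed.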
11. Data-Driven Time Parallelization
- Time Parallelization
- Use Prior Data
12. Time Parallelization
- Each processor simulates a different time interval
- Initial state is obtained by prediction, except for processor 0
- Verify that the prediction for the end state is close to that computed by MD
- Prediction is based on dynamically determining a relationship between the current simulation and those in a database of prior results
If the time interval is sufficiently large, then the communication overhead is small
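The predict, simulate, and verify steps listed above can be sketched on a toy problem. Everything here is a stand-in: the decay model replaces MD, the "database" is a single prior run at a slightly different rate, and the names `simulate`, `time_parallel_round`, and `predict` are hypothetical.

```python
import math

def simulate(state, t0, t1, rate, substeps=1000):
    """Stand-in for an expensive fine-scale run (toy model: dy/dt = -rate * y)."""
    h = (t1 - t0) / substeps
    for _ in range(substeps):
        state -= h * rate * state
    return state

def time_parallel_round(start_state, boundaries, predict, rate, tol):
    """Run every interval from a predicted start, then accept intervals in order
    until a predicted start disagrees with the state computed before it."""
    n = len(boundaries) - 1
    # Interval 0 starts from the known state; the others start from predictions.
    starts = [start_state] + [predict(boundaries[i]) for i in range(1, n)]
    # These n runs are independent, so each could go to its own processor.
    ends = [simulate(starts[i], boundaries[i], boundaries[i + 1], rate)
            for i in range(n)]
    accepted = [start_state, ends[0]]
    for i in range(1, n):
        # Interval i is trusted only if its predicted start matches the
        # state computed by interval i - 1, within the tolerance.
        if abs(starts[i] - ends[i - 1]) > tol:
            break
        accepted.append(ends[i])
    return accepted

# Hypothetical "database": a prior run at a slightly different rate.
prior_rate, current_rate = 1.0, 1.02
predict = lambda t: math.exp(-prior_rate * t)
boundaries = [0.25 * i for i in range(9)]

loose = time_parallel_round(1.0, boundaries, predict, current_rate, tol=0.005)
tight = time_parallel_round(1.0, boundaries, predict, current_rate, tol=0.001)
print(len(loose) - 1, "intervals accepted with tol = 0.005")
print(len(tight) - 1, "intervals accepted with tol = 0.001")
```

In this sketch a failed check simply stops acceptance; a fuller version would restart the remaining intervals from the last verified state. The tolerance plays the role of the equivalence criterion between predicted and computed states.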
13. Problems with multiple time-scales
- Fine-scale computations (such as MD) are more accurate, but more time consuming
- Many of the details at the finer scale are unimportant, but some are important
Figure: A simple schematic of multiple time scales
14. Use Prior Data
- Results for an identical simulation exist
  - Retrieve the results
- Results for a slightly different parameter, with the same coarse-scale response, exist
  - Retrieve the results
  - Verify closeness, or pre-determine the acceptable parameter range
- Current simulation behaves like different prior ones at different times
  - Identify similar prior results, learn the relationship, verify the prediction
- Not similar to prior results
  - Try to identify coarse-scale behavior, apply dynamic iterations to improve on predictions
15. Experimental Results
- CNT tensile test
  - CNT identical to prior results, but different strain rate
    - 1000-atom CNT, 300 K
    - Static and dynamic prediction
  - CNT identical to prior results, but different strain rate and temperature
  - CNT differs in size from prior result, and simulated with a different strain rate
16. Dimensionality Reduction
- Movement of atoms in a 1000-atom CNT can be considered the motion of a point in 3000-dimensional space
- Find a lower dimensional subspace close to which the points lie
- We use proper orthogonal decomposition (POD)
  - Find a low dimensional affine subspace
  - Motion may, however, be complex in this subspace
- Use results for different strain rates
  - Velocity = 10 m/s, 5 m/s, and 1 m/s
  - At five different time points
- $[U, S, V] = \mathrm{svd}(\text{Shifted Data})$
  - $\text{Shifted Data} = U S V^T$
- States of CNT expressed as
  - $m + c_1 u_1 + c_2 u_2$
Figure labels: $u_1$, $u_2$, $m$
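The SVD step can be sketched with NumPy. This example uses synthetic snapshots that lie exactly in a two-dimensional affine subspace, and it assumes that "Shifted Data" means the snapshots with their mean state m subtracted; the function names and array sizes are illustrative only.

```python
import numpy as np

def pod_basis(snapshots, n_modes=2):
    """snapshots: (n_coords, n_snapshots) array, one state per column.
    Returns the mean state m, the first n_modes basis vectors, and the
    singular values of the mean-shifted data."""
    m = snapshots.mean(axis=1)
    shifted = snapshots - m[:, None]          # assumption: shift by the mean
    U, S, Vt = np.linalg.svd(shifted, full_matrices=False)
    return m, U[:, :n_modes], S

def to_coeffs(state, m, basis):
    """Coefficients (c1, c2, ...) of a state in the POD basis."""
    return basis.T @ (state - m)

def from_coeffs(coeffs, m, basis):
    """Rebuild a state as m + c1*u1 + c2*u2 + ..."""
    return m + basis @ coeffs

# Synthetic data: 15 snapshots (e.g. 3 velocities x 5 time points) in 30-D.
rng = np.random.default_rng(0)
n_coords, n_snapshots = 30, 15
directions, _ = np.linalg.qr(rng.normal(size=(n_coords, 2)))
offset = rng.normal(size=n_coords)
snapshots = offset[:, None] + directions @ rng.normal(size=(2, n_snapshots))

m, basis, S = pod_basis(snapshots)
c = to_coeffs(snapshots[:, 0], m, basis)
rebuilt = from_coeffs(c, m, basis)
print("leading singular values:", np.round(S[:4], 6))
print("max reconstruction error:", np.abs(rebuilt - snapshots[:, 0]).max())
```

With real MD data the trailing singular values would not vanish; how quickly they decay indicates how many basis vectors are worth keeping.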
17. Basis Vectors from POD
- CNT of 100 Å with 1000 atoms at 300 K
Figure: $u_1$ (blue) and $u_2$ (red) for z; $u_1$ (green) for x is not significant
Figure: Blue is z; Green and Red are x and y
18. Relate strain rate and time
- Coefficients of $u_1$
  - Blue: 1 m/s
  - Red: 5 m/s
  - Green: 10 m/s
  - Dotted line: same strain
- Suggests that behavior is similar at similar strains
- In general, clustering similar coefficients can give parameter-time relationships
19. Prediction when v is the only parameter
- Static predictor
  - Independently predict the change in each coordinate
  - Use precomputed results for 40 different time points each for three different velocities
  - To predict for (t, v) not in the database
    - Determine coefficients for nearby v at nearby strains
    - Fit a linear surface and interpolate/extrapolate to get coefficients $c_1$ and $c_2$ for (t, v)
    - Get state as $m + c_1 u_1 + c_2 u_2$
- Dynamic prediction
  - Correct the above coefficients, by determining the error between the previously predicted and computed states
Plot legend: Green 10 m/s, Red 5 m/s, Blue 1 m/s, Magenta 0.1 m/s, Black 0.1 m/s through direct prediction
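The "fit a linear surface" step of the static predictor can be sketched as an ordinary least-squares plane fit. The data values below are made up, the choice of strain and velocity as the two fit variables is one reading of the slide, and the function names are hypothetical.

```python
import numpy as np

def fit_plane(strains, velocities, coeffs):
    """Least-squares fit of c ~ a0 + a1 * strain + a2 * velocity."""
    A = np.column_stack([np.ones(len(strains)), strains, velocities])
    params, *_ = np.linalg.lstsq(A, np.asarray(coeffs, dtype=float), rcond=None)
    return params

def predict_coeff(params, strain, velocity):
    """Evaluate the fitted plane; outside the data range this extrapolates."""
    return params[0] + params[1] * strain + params[2] * velocity

# Made-up database entries: coefficient c1 at nearby strains for three velocities.
strains = np.array([s for s in (0.04, 0.05, 0.06) for _ in range(3)])
velocities = np.array([1.0, 5.0, 10.0] * 3)
c1 = 2.0 + 3.0 * strains - 0.1 * velocities   # synthetic, exactly planar

params = fit_plane(strains, velocities, c1)
c1_pred = predict_coeff(params, strain=0.05, velocity=0.1)
print("fitted plane parameters:", np.round(params, 6))
print("predicted c1 at strain 0.05, v = 0.1 m/s:", round(float(c1_pred), 6))
```

Predicting at 0.1 m/s from data at 1, 5, and 10 m/s is an extrapolation in velocity, so the result should be checked against the computed state rather than trusted on its own.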
20. Verification of prediction
- Definition of equivalence of two states
  - Atoms vibrate around their mean position
  - Consider states equivalent if differences in position, potential energy, and temperature are within the normal range of fluctuations
    - Max displacement: 0.2 Å
    - Mean displacement: 0.08 Å
    - Potential energy fluctuation: 0.35
    - Temperature fluctuation: 12.5 K
Figure labels: Displacement (from mean), Mean position
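The equivalence test can be written as a small predicate. This is a sketch under assumptions: positions are in ångströms with atoms in the same order in both states, the potential-energy threshold is applied in whatever units the energies are supplied in (the slide gives no unit), and the function name is hypothetical.

```python
import math

def states_equivalent(pos_a, pos_b, pe_a, pe_b, temp_a, temp_b,
                      max_disp=0.2, mean_disp=0.08, pe_tol=0.35, temp_tol=12.5):
    """True if two states differ by no more than the normal fluctuation ranges.
    pos_a, pos_b: lists of (x, y, z) positions, same atom order in both."""
    disps = [math.dist(a, b) for a, b in zip(pos_a, pos_b)]
    return (max(disps) <= max_disp                 # largest single-atom shift
            and sum(disps) / len(disps) <= mean_disp
            and abs(pe_a - pe_b) <= pe_tol         # assumption: same units as inputs
            and abs(temp_a - temp_b) <= temp_tol)  # kelvin

base = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.8, 0.0, 0.0)]
near = [(0.05, 0.0, 0.0), (1.4, 0.05, 0.0), (2.8, 0.0, 0.05)]
far = [(0.3, 0.0, 0.0), (1.4, 0.0, 0.0), (2.8, 0.0, 0.0)]

print(states_equivalent(base, near, -10.0, -10.1, 300.0, 305.0))
print(states_equivalent(base, far, -10.0, -10.1, 300.0, 305.0))
```

All four criteria must hold at once, so a state with small average displacement can still be rejected because one atom has moved too far.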
21. Stress-strain response at 0.1 m/s
- Blue: Exact result
- Green: Direct prediction with interpolation / extrapolation
  - Points close to yield involve extrapolation in velocity and strain
- Red: Time parallel results
22. Speedup
- Red line: Ideal speedup
- Blue: v = 0.1 m/s
- Green: A different predictor
  - v = 1 m/s, using v = 10 m/s
- CNT with 1000 atoms
- Xeon/Myrinet cluster
23. Temperature and velocity vary
- Use 1000-atom CNT results
  - Temperatures: 300 K, 600 K, 900 K, 1200 K
  - Velocities: 1 m/s, 5 m/s, 10 m/s
- Dynamically choose closest simulation for prediction
Speedup plot legend: __ 450 K, 2 m/s; Linear
Stress-strain plot legend: Blue: Exact, 450 K; Red: 200 processors
24. CNTs of varying sizes
- Use a 1000-atom CNT, 10 m/s, 300 K result
  - Parallelize 1200, 1600, 2000-atom CNT runs
- Observe that the dominant mode is approximately a linear function of the initial z-coordinate
  - Normalize coordinates to be in [0, 1]
  - $z_{t+\Delta t} = z_t + \dot{z}_{t+\Delta t} \, \Delta t$, predict $\dot{z}$
- Speedup
- - 2000 atoms
- .- 1600 atoms
- __ 1200 atoms
- Linear
- Stress-strain
- Blue: Exact, 2000 atoms, 1 m/s
- Red: 200 processors
25. Predict change in coordinates
- Express x in terms of basis functions
  - Example
    - $x_{t+\Delta t} = a_{0,t+\Delta t} + a_{1,t+\Delta t} \, x_t$
    - $a_{0,t+\Delta t}$ and $a_{1,t+\Delta t}$ are unknown
  - Express changes, y, for the base (old) simulation similarly, in terms of coefficients b, and perform a least squares fit
- Predict $a_{i,t+\Delta t}$ as $b_{i,t+\Delta t} + R_{t+\Delta t}$
  - $R_{t+\Delta t} = (1-\beta) R_t + \beta (a_{i,t} - b_{i,t})$
  - Intuitively, the difference between the base coefficient and the current coefficient is predicted as a weighted combination of previous differences
  - We use $\beta = 0.5$
    - Gives more weight to latest results
    - Does not let random fluctuations affect the predictor too much
- Velocity estimated as latest accurate results known
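The correction term R can be sketched as a simple exponentially weighted update. The code assumes the reading R_new = (1 - beta) * R_old + beta * (a - b), with the next coefficient predicted as the base coefficient plus R; the data values and function names are made up for illustration.

```python
def update_correction(r_prev, a_prev, b_prev, beta=0.5):
    """Blend the previous correction with the latest observed difference
    between the current coefficient (a) and the base coefficient (b)."""
    return (1.0 - beta) * r_prev + beta * (a_prev - b_prev)

def predict_next(b_next, r_next):
    """Predicted current coefficient = base coefficient + learned correction."""
    return b_next + r_next

# Made-up base coefficients, and a current run offset from the base by 0.2.
base = [1.0, 1.1, 1.3, 1.2, 1.4, 1.5, 1.45, 1.6, 1.7, 1.65, 1.8]
actual = [b + 0.2 for b in base]

r = 0.0
predictions = []
for t in range(len(base) - 1):
    r = update_correction(r, actual[t], base[t])
    predictions.append(predict_next(base[t + 1], r))

errors = [abs(p - a) for p, a in zip(predictions, actual[1:])]
print("first prediction error:", round(errors[0], 6))
print("last prediction error: ", round(errors[-1], 6))
```

With a weight of 0.5 the most recent difference counts as much as all earlier ones combined, so the predictor adapts quickly while a single noisy difference is still damped by half.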
26. Conclusions
- Data-driven time parallelization shows significant improvement in speed, without sacrificing accuracy significantly, in the CNT tensile test
- The 980-processor simulation attained a flop rate of 420 Gflops
  - Its flops per atom rate of 420 Mflops/atom is likely the largest flop per atom rate in classical MD simulations
  - Scaled to 13.5 μs/iteration
- References
  - See http://www.cs.fsu.edu/asriniva/research.html
27. Future Work
- More complex problems
- Better prediction
- POD is good for representing data, but not necessarily for identifying patterns
- Use better dimensionality reduction / reduced order modeling techniques
- Satisfy detailed balance
- Better learning
- Better verification