1
Long-Time Molecular Dynamics Simulations in
Nano-Mechanics through Parallelization of the
Time Domain
  • Ashok Srinivasan
  • Florida State University
  • http://www.cs.fsu.edu/~asriniva

Aim: Simulate for long time spans
Solution features: Use data from prior simulations to parallelize the time domain
Acknowledgements: NSF, ORNL, NERSC, NCSA
Collaborators: Yanan Yu and Namas Chandra
2
Outline
  • Background
  • Limitations of Conventional Parallelization
  • Example Application: Carbon Nanotube Tensile Test
  • Small Time Step Size in Molecular Dynamics
    Simulations
  • Other Time Parallelization Approaches
  • Data-Driven Time Parallelization
  • Experimental Results
  • Scaled efficiently to 1000 processors, for a
    problem where conventional parallelization scales
    to just 2-3 processors
  • Conclusions

3
Background
  • Limitations of Conventional Parallelization
  • Example Application: Carbon Nanotube Tensile Test
  • Molecular Dynamics Simulations

4
Limitations of Conventional Parallelization
  • Conventional parallelization decomposes the state
    space across processors
  • It is effective when the state space is large
  • It is not effective when the computational effort arises from a large number of time steps,
  • or when the granularity becomes very fine due to a large number of processors

5
Scalability of Conventional MD Parallelization
  • Results on IBM Blue Gene
  • Does not scale efficiently beyond about 10 ms/iteration
  • If we want to simulate to a millisecond:
  • a time step of 1 fs ⇒
  • 10^12 iterations ⇒
  • 10^10 s ≈ 300 years
  • If we scaled to 10 µs per iteration:
  • about 4 months of computing time (the arithmetic is sketched after the references below)

NAMD, 327K-atom ATPase with PME (IPDPS 2006)
NAMD, 92K-atom ApoA1 with PME (IPDPS 2006)
IBM Blue Matter, 43K-atom Rhodopsin (Tech Report 2005)
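As a rough check of the numbers above, the following sketch (Python, with values taken from the slide) computes the wall-clock time implied by a 1 fs time step over a 1 ms simulated span for the two per-iteration costs mentioned.

    # Wall-clock estimate for a 1 ms simulation at 1 fs per MD step,
    # for two per-iteration costs (10 ms and 10 microseconds).
    simulated_span = 1e-3        # seconds of simulated time (1 ms)
    time_step = 1e-15            # seconds per MD step (1 fs)
    n_steps = simulated_span / time_step         # 1e12 iterations

    for cost in (10e-3, 10e-6):  # wall-clock seconds per iteration
        total = n_steps * cost
        print(f"{cost:g} s/iteration -> {total:.1e} s "
              f"~ {total / (365 * 24 * 3600):.2f} years")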
6
Example Application: Carbon Nanotube Tensile Test
  • Pull the CNT at a constant velocity
  • Determine stress-strain response and yield strain
    (when CNT starts breaking) using MD
  • Strain rate dependent

7
A Drawback of Molecular Dynamics
  • Molecular dynamics
  • In each time step, the forces of atoms on each other are modeled using some potential
  • After the forces are computed, the positions are updated
  • Repeat for the desired number of time steps (a minimal sketch of this loop follows the list)
  • The time step size is ~10^-15 s, due to physical and numerical considerations
  • The desired time range is much larger
  • A million time steps are required just to reach 10^-9 s
  • Around a day of computing for a 3000-atom CNT
  • MD therefore uses unrealistically large strain rates
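A minimal sketch of the loop just described, in Python. The harmonic potential and unit parameters below are placeholders, not the CNT potential actually used in the presentation.

    import numpy as np

    def forces(x, k=1.0):
        # Toy harmonic potential V = k x^2 / 2, so F = -k x (placeholder for
        # the real interatomic potential).
        return -k * x

    def md_run(x, v, dt=1e-3, n_steps=1000, mass=1.0):
        # Velocity Verlet: compute forces, update positions, repeat.
        f = forces(x)
        for _ in range(n_steps):
            x = x + v * dt + 0.5 * (f / mass) * dt ** 2
            f_new = forces(x)
            v = v + 0.5 * (f + f_new) * dt / mass
            f = f_new
        return x, v

    # 1000 atoms -> a point in 3000-dimensional space, as on a later slide.
    x0, v0 = np.random.randn(3000), np.zeros(3000)
    x_final, v_final = md_run(x0, v0)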

8
Other Time Parallelization Approaches
  • Waveform relaxation
  • Repeatedly solve for the entire time domain
  • Parallelizes well but convergence can be slow
  • Several variants to improve convergence
  • Parareal approach
  • Features similar to ours and to waveform
    relaxation
  • Precedes our approach
  • Not data-driven
  • Sequential phase for prediction
  • Not very effective in practice so far
  • Has much potential to be improved

9
Waveform Relaxation
  • Special case: Picard iterations
  • Ex: dy/dt = y, y(0) = 1 becomes
  • dy_{n+1}/dt = y_n(t), with y_0(t) = 1
  • In general:
  • dy/dt = f(y, t), y(0) = y_0 becomes
  • dy_{n+1}/dt = g(y_n, y_{n+1}, t), with y_0(t) = y_0, where
  • g(u, u, t) = f(u, t)
  • g(y_n, y_{n+1}, t) = f(y_n, t): Picard iteration
  • g(y_n, y_{n+1}, t) = f(y_{n+1}, t): converges in one iteration
  • Jacobi, Gauss-Seidel, and SOR versions of g have been defined
  • Many improvements exist
  • Ex: DIRM combines the above with reduced order modeling (a numerical sketch of the Picard iterates follows the figure note below)

[Figure: Picard iterates N = 1, 2, 3, 4 compared with the exact solution]
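A small numerical sketch of the Picard iterates for dy/dt = y, y(0) = 1 (the example above), using trapezoidal quadrature over the whole time domain; the four sweeps mirror the N = 1..4 curves in the figure.

    import numpy as np

    t = np.linspace(0.0, 1.0, 101)
    y = np.ones_like(t)                              # y_0(t) = 1
    for n in range(1, 5):
        # y_{n+1}(t) = y(0) + integral_0^t y_n(s) ds  (one Picard sweep over [0, 1])
        increments = 0.5 * (y[1:] + y[:-1]) * np.diff(t)
        y = 1.0 + np.concatenate(([0.0], np.cumsum(increments)))
        print(f"sweep {n}: max error vs exp(t) = {np.max(np.abs(y - np.exp(t))):.2e}")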
10
Parareal approach
  • Based on an approximate-verify-correct sequence
  • An example of shooting methods for
    time-parallelization
  • Not shown to be effective in realistic situations

[Schematic: initial prediction, initial computed result, correction, second prediction]
11
Data-Driven Time Parallelization
  • Time Parallelization
  • Use Prior Data

12
Time Parallelization
  • Each processor simulates a different time
    interval
  • The initial state of each interval is obtained by prediction, except for processor 0
  • Verify whether the predicted end state is close to that computed by MD
  • Prediction is based on dynamically determining a relationship between the current simulation and those in a database of prior results (a schematic sketch follows below)

If the time interval is sufficiently large, then the communication overhead is small
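A schematic sketch of the predict-verify loop described above (not the authors' code). The functions predict_state, md_simulate, and close_enough are hypothetical placeholders; in a real run, the md_simulate calls on the predicted interval starts would execute concurrently on different processors.

    def time_parallel_md(initial_state, n_intervals, predict_state, md_simulate,
                         close_enough):
        # Processor 0 starts from the true initial state; the others start from
        # states predicted using the database of prior results.
        starts = [initial_state] + [predict_state(p) for p in range(1, n_intervals)]
        ends = [md_simulate(s) for s in starts]      # independent -> run in parallel
        verified = initial_state
        for p in range(n_intervals):
            if p > 0 and not close_enough(starts[p], verified):
                # Prediction was not close to the MD-computed state: redo this
                # interval starting from the last verified state.
                ends[p] = md_simulate(verified)
            verified = ends[p]
        return verified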
13
Problems with multiple time-scales
  • Fine-scale computations (such as MD) are more
    accurate, but more time consuming
  • Much of the detail at the finer scale is unimportant, but some of it matters

A simple schematic of multiple time scales
14
Use Prior Data
  • Results for an identical simulation exist
  • Retrieve the results
  • Results for a slightly different parameter, with the same coarse-scale response, exist
  • Retrieve the results
  • Verify closeness, or pre-determine an acceptable parameter range
  • The current simulation behaves like different prior ones at different times
  • Identify similar prior results, learn the relationship, verify the prediction
  • Not similar to prior results
  • Try to identify coarse-scale behavior, apply dynamic iterations to improve on predictions

15
Experimental Results
  • CNT tensile test
  • CNT identical to prior results, but different
    strain-rate
  • 1000-atom CNT at 300 K
  • Static and dynamic prediction
  • CNT identical to prior results, but different
    strain-rate and temperature
  • CNT differs in size from prior result, and
    simulated with a different strain-rate

16
Dimensionality Reduction
  • Movement of atoms in a 1000-atom CNT can be
    considered the motion of a point in
    3000-dimensional space
  • Find a lower dimensional subspace close to which
    the points lie
  • We use proper orthogonal decomposition (POD)
  • Find a low dimensional affine subspace
  • Motion may, however, be complex in this subspace
  • Use results for different strain rates
  • Velocities of 10 m/s, 5 m/s, and 1 m/s
  • At five different time points
  • [U, S, V] = svd(ShiftedData)
  • ShiftedData = U S V^T
  • States of the CNT are expressed as
  • μ + c_1 u_1 + c_2 u_2 (an SVD sketch follows)

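A sketch of the decomposition step in Python/NumPy, with random placeholder snapshots standing in for the stored simulation states (three strain rates at five time points, as above).

    import numpy as np

    # Columns are stored CNT states (3000 coordinates each); subtract the mean
    # and take the SVD of the shifted data.
    snapshots = np.random.rand(3000, 15)            # placeholder prior results
    mu = snapshots.mean(axis=1)
    U, S, Vt = np.linalg.svd(snapshots - mu[:, None], full_matrices=False)
    u1, u2 = U[:, 0], U[:, 1]

    # Approximate any state m as mu + c1*u1 + c2*u2, with coefficients obtained
    # by projecting onto the two dominant modes.
    m = snapshots[:, 0]
    c1, c2 = u1 @ (m - mu), u2 @ (m - mu)
    approximation = mu + c1 * u1 + c2 * u2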
17
Basis Vectors from POD
  • CNT of 100 Å with 1000 atoms at 300 K

u_1 (blue) and u_2 (red) for the z coordinates; u_1 for x (green) is not significant
(Blue: z. Green, red: x, y.)
18
Relate strain rate and time
  • Coefficients of u_1
  • Blue: 1 m/s
  • Red: 5 m/s
  • Green: 10 m/s
  • Dotted lines: equal strain
  • Suggests that the behavior is similar at similar strains
  • In general, clustering similar coefficients can give parameter-time relationships

19
Prediction when v is the only parameter
  • Static Predictor
  • Independently predict the change in each coordinate
  • Use precomputed results for 40 different time points, each for three different velocities
  • To predict for a (t, v) not in the database:
  • Determine coefficients for nearby v at nearby strains
  • Fit a linear surface and interpolate/extrapolate to get the coefficients c_1 and c_2 for (t, v)
  • Get the state as μ + c_1 u_1 + c_2 u_2

[Plot legend: green 10 m/s, red 5 m/s, blue 1 m/s, magenta 0.1 m/s, black 0.1 m/s via direct prediction]
  • Dynamic Prediction
  • Correct the above coefficients by determining the error between the previously predicted and computed states (a sketch of both predictors follows)
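A sketch of the static predictor for a single POD coefficient, under the assumption that "fit a linear surface" means an ordinary least-squares plane in (t, v); the database values below are placeholders, not the actual CNT results.

    import numpy as np

    # Placeholder database entries: coefficient c1 at nearby strains (times)
    # and velocities surrounding the query point.
    db_t = np.array([0.9, 1.0, 1.1, 0.9, 1.0, 1.1])     # time points
    db_v = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])     # pulling velocities (m/s)
    db_c1 = np.array([0.20, 0.25, 0.30, 0.50, 0.60, 0.70])

    # Fit c1 ~ p0 + p1*t + p2*v and evaluate at the query (t, v).
    A = np.column_stack([np.ones_like(db_t), db_t, db_v])
    p, *_ = np.linalg.lstsq(A, db_c1, rcond=None)

    def predict_c1(t, v):
        return p @ np.array([1.0, t, v])

    c1_static = predict_c1(1.05, 0.1)   # extrapolation in v down to 0.1 m/s
    # Dynamic correction (sketch): shift the static value by the error seen
    # between the last computed state's coefficient and its static prediction.
    c1_last_computed = 0.24             # placeholder value from a verified interval
    c1_dynamic = c1_static + (c1_last_computed - predict_c1(1.0, 0.1))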

20
Verification of prediction
  • Definition of equivalence of two states
  • Atoms vibrate around their mean positions
  • Consider states equivalent if the differences in position, potential energy, and temperature are within the normal range of fluctuations (a sketch of this test follows below)
  • Max displacement: 0.2 Å
  • Mean displacement: 0.08 Å
  • Potential energy fluctuation: 0.35
  • Temperature fluctuation: 12.5 K

[Schematic: atom displacement about its mean position]
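A sketch of the equivalence test with the thresholds above; treating the potential-energy tolerance as a 0.35% relative difference is an assumption.

    import numpy as np

    def states_equivalent(pos_a, pos_b, pe_a, pe_b, temp_a, temp_b,
                          max_disp=0.2, mean_disp=0.08, pe_tol=0.0035, temp_tol=12.5):
        # Per-atom displacement between the two states, in Angstroms
        # (positions are (n_atoms, 3) arrays).
        d = np.linalg.norm(pos_a - pos_b, axis=1)
        return (d.max() <= max_disp and
                d.mean() <= mean_disp and
                abs(pe_a - pe_b) <= pe_tol * abs(pe_b) and
                abs(temp_a - temp_b) <= temp_tol)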
21
Stress-strain response at 0.1 m/s
  • Blue: exact result
  • Green: direct prediction with interpolation/extrapolation
  • Points close to yield involve extrapolation in both velocity and strain
  • Red: time-parallel results

22
Speedup
  • Red line: ideal speedup
  • Blue: v = 0.1 m/s
  • Green: a different predictor (v = 1 m/s, predicted using v = 10 m/s)
  • CNT with 1000 atoms
  • Xeon/Myrinet cluster

23
Temperature and velocity vary
  • Use 1000-atom CNT results
  • Temperatures: 300 K, 600 K, 900 K, 1200 K
  • Velocities: 1 m/s, 5 m/s, 10 m/s
  • Dynamically choose the closest prior simulation for prediction

[Plots: speedup for 450 K, 2 m/s (solid) vs. linear; stress-strain, blue: exact at 450 K, red: 200 processors]
24
CNTs of varying sizes
  • Use a 1000-atom CNT, 10 m/s, 300 K result
  • Parallelize 1200-, 1600-, and 2000-atom CNT runs
  • Observe that the dominant mode is approximately a linear function of the initial z coordinate
  • Normalize coordinates to be in [0, 1]
  • z_{t+Δt} = z_t + ż_{t+Δt} Δt; predict ż
  • Speedup (line styles): - 2000 atoms, .- 1600 atoms, __ 1200 atoms, vs. linear
  • Stress-strain: blue: exact, 2000 atoms at 1 m/s; red: 200 processors

25
Predict change in coordinates
  • Express x in terms of basis functions
  • Example:
  • x_{t+Δt} = a_{0,t+Δt} + a_{1,t+Δt} x_t
  • a_{0,t+Δt} and a_{1,t+Δt} are unknown
  • Express the changes, y, for the base (old) simulation similarly, in terms of coefficients b, and perform a least squares fit
  • Predict a_{i,t+Δt} as b_{i,t+Δt} + R_{t+Δt}
  • R_{t+Δt} = (1 - β) R_t + β (a_{i,t} - b_{i,t})
  • Intuitively, the difference between the base coefficient and the current coefficient is predicted as a weighted combination of the previous differences
  • We use β = 0.5
  • Gives more weight to the latest results
  • Does not let random fluctuations affect the predictor too much
  • Velocities are estimated from the latest accurately known results (a sketch of this predictor follows)
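A sketch of this predictor in Python/NumPy: b_0, b_1 come from a least-squares fit to the base simulation, and the residual R is updated with β = 0.5. Function and variable names are illustrative, not the authors' code.

    import numpy as np

    beta = 0.5

    def fit_base_coefficients(x_old, x_new):
        # Least-squares fit of x_new ~ b0 + b1 * x_old over all coordinates of
        # the base (prior) simulation.
        A = np.column_stack([np.ones_like(x_old), x_old])
        (b0, b1), *_ = np.linalg.lstsq(A, x_new, rcond=None)
        return np.array([b0, b1])

    def update_residual(R, a_t, b_t):
        # R_{t+dt} = (1 - beta) * R_t + beta * (a_t - b_t), per coefficient.
        return (1 - beta) * R + beta * (a_t - b_t)

    def predict_next(x_t, b_next, R):
        # a_i = b_i + R_i, then x_{t+dt} = a0 + a1 * x_t.
        a0, a1 = b_next + R
        return a0 + a1 * x_t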

26
Conclusions
  • Data-driven time parallelization shows
    significant improvement in speed, without
    sacrificing accuracy significantly, in the CNT
    tensile test
  • The 980-processor simulation attained a flop
    rate of 420 Gflops
  • Its 420 Mflops per atom is likely the largest per-atom flop rate achieved in classical MD simulations
  • Scaled to 13.5 µs/iteration
  • References
  • See http://www.cs.fsu.edu/~asriniva/research.html

27
Future Work
  • More complex problems
  • Better prediction
  • POD is good for representing data, but not
    necessarily for identifying patterns
  • Use better dimensionality reduction / reduced
    order modeling techniques
  • Satisfy detailed balance
  • Better learning
  • Better verification