Fault tolerant parallelization of time in molecular dynamics simulations - PowerPoint PPT Presentation

About This Presentation
Title:

Fault tolerant parallelization of time in molecular dynamics simulations


Slides: 22
Provided by: asri9
Learn more at: http://www.cs.fsu.edu


Transcript and Presenter's Notes



1
Fault tolerant parallelization of time in
molecular dynamics simulations
  • Ashok Srinivasan
  • Computer Science
  • Florida State University

  • Namas Chandra
  • Mechanical Engineering
  • Florida State University
Aim: Long time scales on small physical systems, on massively parallel machines
Solution features: Latency and fault tolerance are related through the independence of tasks
2
Outline
  • Application
  • Background: Parareal scheme
  • Time parallelization with guided simulations
  • Parallelization of a model problem
  • Parallelization of molecular dynamics simulations
    on a Carbon nanotube
  • Conclusions and future work

3
Applications
  • Small physical systems for long time scales
  • Class of applications considered
    • $\mathrm{State}(T_i) = F(\mathrm{State}(T_{i-1}))$
    • Inherently sequential
  • Example: molecular dynamics simulations of carbon nanotubes
    • Time step size: $10^{-15}$ s
    • After a million steps, we are still only in the nanosecond range
    • Even that requires about a day of sequential computing time for around 3000 atoms
  • Spatial parallelization will lead to too fine a granularity
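A minimal sketch of why this class of problems is inherently sequential: each state is a function of the previous one, so a plain time-stepping loop cannot be split across processors by time. The step function `F` below is a hypothetical stand-in (a unit-mass, unit-stiffness harmonic oscillator advanced with velocity Verlet), not the carbon-nanotube force field; the last lines redo the slide's time-scale arithmetic.

```python
from typing import Tuple

State = Tuple[float, float]  # (position, velocity) of one toy particle


def F(state: State, dt: float) -> State:
    """Hypothetical step function: one velocity-Verlet step of x'' = -x."""
    x, v = state
    a = -x
    x_new = x + v * dt + 0.5 * a * dt * dt
    a_new = -x_new
    v_new = v + 0.5 * (a + a_new) * dt
    return (x_new, v_new)


def run(state: State, dt: float, n_steps: int) -> State:
    """State(T_i) = F(State(T_{i-1})): step i cannot start before step i-1 ends."""
    for _ in range(n_steps):
        state = F(state, dt)
    return state


# Time-scale arithmetic from the slide: a million steps of 10^-15 s each.
DT_SECONDS = 1e-15
N_STEPS = 1_000_000
simulated_time = DT_SECONDS * N_STEPS
print(f"simulated time: {simulated_time:.1e} s")  # on the order of a nanosecond
```

The loop in `run` is the sequential bottleneck that time parallelization tries to break.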

4
Background
  • Parareal scheme (Baffico et al.)
    • Based on an approximate-verify-correct sequence
  • Notation
    • r: exact time / approximate time
    • P: number of processors
    • k: iterations to convergence
  • Speedup
    • $\frac{1}{k}\,\frac{P r}{P + r}$
    • Ignoring communication cost
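The speedup expression (1/k)·Pr/(P + r) can be reproduced from a simple cost model, which is an assumption here rather than something stated on the slide: in each of the k iterations, the approximate solver sweeps sequentially over all P intervals (cost P/r, in units of one exact interval) and then the P exact solves run in parallel (cost 1). Against a sequential cost of P, this gives P / (k(1 + P/r)). A minimal Python sketch, with illustrative function names:

```python
def parareal_speedup(P: int, r: float, k: int) -> float:
    """Parareal speedup (1/k) * P*r / (P + r), ignoring communication.

    P: number of processors, one time interval each
    r: exact-solver time / approximate-solver time for one interval
    k: number of iterations to convergence
    """
    if P < 1 or r <= 0 or k < 1:
        raise ValueError("need P >= 1, r > 0, k >= 1")
    return (1.0 / k) * P * r / (P + r)


def efficiency(speedup: float, P: int) -> float:
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup / P


# k = 5 is an arbitrary illustrative value.
for P in (10, 100, 1000, 10000):
    s = parareal_speedup(P, r=1000, k=5)
    print(f"P={P:6d}  speedup={s:8.1f}  efficiency={efficiency(s, P):.3f}")
```

Since Pr/(P + r) stays below min(P, r), the speedup is bounded by min(P, r)/k, so adding processors well beyond r mainly lowers efficiency.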

5
Limitations of the parareal approach
  • Speedups obtained (ignoring communication)
    • Toy MD problem with r = 1000
      • Different time step size for approximation
      • Speedup 130, efficiency 1.3%
      • The value of r is also not realistic; adaptive sequential computation may be as effective
    • Toy MD problem with a different model for approximation
      • Speedup 8, efficiency 25%
    • Quantum control
      • Speedup 14, efficiency 1.5%
  • Limiting factors
    • Sequential component
    • Methods for approximation are unlikely to bring major improvements

6
Time parallelization with guided simulations
  • Based on a predict-verify approach
  • Use results of old simulations to speed up the current simulation
    • Relationships between different problem parameters often occur in engineering
    • Example: temperature and time, stress and time
  • Find a relationship and use it to predict the state at different times
    • The relationship is determined automatically and updated dynamically

7
Guided simulations: Latency tolerance
  • Notation
    • s: exact time / prediction time
    • P: number of processors
    • l: error rate
  • Speedup
    • $\frac{1}{l}\,\frac{P s}{1 + s}$, ignoring communication cost
    • $\approx P/l$ if prediction cost is relatively small
  • Note: $l < k$
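A sketch of the guided-simulation speedup, reading the slide's expression as (1/l)·Ps/(1 + s). One cost model consistent with it (an assumption, not stated on the slide) is that each processor predicts its own starting state at 1/s the cost of an exact interval and then runs that interval, so the prediction cost does not grow with P; this is why the denominator is 1 + s rather than P + r as in parareal. The function name is illustrative.

```python
def guided_speedup(P: int, s: float, l: float) -> float:
    """Guided-simulation speedup (1/l) * P*s / (1 + s), ignoring communication.

    P: number of processors
    s: exact time / prediction time
    l: error rate (the factor dividing the ideal speedup)
    """
    if P < 1 or s <= 0 or l <= 0:
        raise ValueError("need P >= 1, s > 0, l > 0")
    return (1.0 / l) * P * s / (1.0 + s)


# As prediction becomes relatively cheap (s grows), the speedup approaches P / l.
P, l = 100, 1.0
for s in (10, 100, 1000, 10000):
    print(f"s={s:6d}  speedup={guided_speedup(P, s, l):.2f}  limit P/l={P / l:.0f}")
```

With P = 100, s = 1000 and l = 1 this evaluates to about 99.9 (efficiency 0.999), the same figures slide 13 reports; whether those numbers were obtained this way is not stated.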

8
Fault tolerance
  • In case of node failure, another processor fills in the missing time interval
  • Other computations need not be discarded
  • Efficiency close to 1
    • For large P
    • Excluding loss in efficiency from errors
    • If communication cost is negligible

[Figure: Master; time intervals t1, t2, t3, t4; processors P1, P2, P3, P4]
9
Fault tolerance

[Figure: Master; time intervals t2, t5, t6; processors P1, P2, P3, P4]
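A minimal, serial simulation of the fault-tolerance idea on these slides, under assumptions not taken from the slides: workers are named P1 to P4, a failed worker stays failed, the master learns of a failure when that worker's result does not arrive, and intervals are indexed from 0. Only the interval held by the failed worker is handed to another live worker; results already returned by other workers are kept. The function `master` is hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple


def master(n_intervals: int, workers: List[str], failed: Set[str],
           compute: Callable[[int], float]) -> Tuple[Dict[int, float], Dict[int, str]]:
    """Hand out time intervals round by round; reassign only the intervals of failed workers."""
    pending = deque(range(n_intervals))
    live = list(workers)
    results: Dict[int, float] = {}
    owner: Dict[int, str] = {}
    while pending:
        if not live:
            raise RuntimeError("all workers have failed")
        # One round: each live worker receives at most one pending interval.
        batch = [(w, pending.popleft()) for w in live[:len(pending)]]
        for w, interval in batch:
            if w in failed:
                live.remove(w)            # master is informed of the node failure
                pending.append(interval)  # only this interval is recomputed later
            else:
                results[interval] = compute(interval)
                owner[interval] = w
    return results, owner


# Usage: P3 fails, so its interval (index 2) is redone by another worker.
results, owner = master(4, ["P1", "P2", "P3", "P4"], failed={"P3"}, compute=float)
print(owner)  # {0: 'P1', 1: 'P2', 3: 'P4', 2: 'P1'}
```

A real master would also detect failures with timeouts and run workers concurrently; the round-based loop only illustrates which work is kept and which is redone.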
10
Requirements for this technique
  • Method for predicting a state
  • Criterion for determining whether two states
    (predicted and actual) are similar
  • A means of informing the master about node failure
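The first two requirements map onto callables in a predict-verify driver. The sketch below is a serial illustration under stated assumptions: `predict(i)` returns a predicted starting state for interval i, `simulate` runs one exact interval, and `similar` is the state-equivalence test; failure reporting is left out. All names are hypothetical, and this is one plausible way to organize the loop, not the authors' implementation.

```python
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")  # whatever represents a simulation state


def guided_run(initial: S, n_intervals: int,
               predict: Callable[[int], S],
               simulate: Callable[[S], S],
               similar: Callable[[S, S], bool]) -> Tuple[List[S], int]:
    """Predict-verify loop, run serially; each round's simulate() calls are independent."""
    accepted: List[S] = []  # end states of verified intervals, in time order
    start = initial         # exact start of the first unverified interval
    rounds = 0
    while len(accepted) < n_intervals:
        rounds += 1
        first = len(accepted)
        # Exact start for the first unverified interval, predicted starts for the rest.
        starts = [start] + [predict(i) for i in range(first + 1, n_intervals)]
        ends = [simulate(x) for x in starts]  # one processor per interval in practice
        accepted.append(ends[0])  # its start was exact, so it is always kept
        j = 1
        # Keep results while each predicted start matches the previous computed end.
        while j < len(starts) and similar(ends[j - 1], starts[j]):
            accepted.append(ends[j])
            j += 1
        start = accepted[-1]
    return accepted, rounds


# Toy usage: the exact start of interval i is i; the prediction for interval 3 is wrong.
ends, rounds = guided_run(0, 6,
                          predict=lambda i: 100 if i == 3 else i,
                          simulate=lambda x: x + 1,
                          similar=lambda a, b: abs(a - b) < 1e-9)
print(ends, rounds)  # [1, 2, 3, 4, 5, 6] 2
```

Intervals whose predicted start is judged equivalent are accepted as computed, so the quality of `similar` directly controls the accuracy of the final trajectory.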

11
Parallelization of a model problem
  • $x = x_0 + (x_0/L_0)^{a} L_0 v t$
    • $x$: current position; $L_0$: initial length; $v$: velocity; $t$: time; $x_0$: position at time 0; $a$: a material property
  • Experimental parameters
    • $L_0 = 1$, $\Delta t = 0.01$, 600 points
    • Base: $a = 1.5$, $v = 0.05$
    • Actual: $a = 2.0$, $v = 0.0625$
  • $x_{pred} = x_{old} + (\Delta x_{old}/\Delta B_{old})\,\Delta B$
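A sketch of the model problem and the linear predictor, taking the model as x = x0 + (x0/L0)^a L0 v t with the slide's parameters. The slide does not say what B is or how the 600 points are laid out; here B is read, hypothetically, as the displacement of the pulled end (x0 = L0), and the points are assumed evenly spaced on [0, L0]. The function names are illustrative.

```python
from typing import List


def positions(x0: List[float], t: float, a: float, v: float, L0: float = 1.0) -> List[float]:
    """Model problem: x = x0 + (x0/L0)**a * L0 * v * t for every material point x0."""
    return [p + (p / L0) ** a * L0 * v * t for p in x0]


def predict_linear(x_old: List[float], dx_old: List[float],
                   dB_old: float, dB: float) -> List[float]:
    """x_pred = x_old + (dx_old / dB_old) * dB, applied point by point."""
    return [x + (d / dB_old) * dB for x, d in zip(x_old, dx_old)]


# Slide parameters: L0 = 1, dt = 0.01, 600 points; base (a, v) = (1.5, 0.05),
# actual (a, v) = (2.0, 0.0625).
L0, DT, N = 1.0, 0.01, 600
x0 = [L0 * i / (N - 1) for i in range(N)]  # assumption: 600 evenly spaced points
t = 1.0

# Hypothetical reading of B: end displacement, so dB = L0 * v * dt per step.
# The base run supplies the sensitivity dx/dB over one step.
base_now, base_next = positions(x0, t, 1.5, 0.05), positions(x0, t + DT, 1.5, 0.05)
dx_old = [q - p for p, q in zip(base_now, base_next)]
dB_old = L0 * 0.05 * DT

# Predict the actual run one step ahead from its current state.
act_now, act_next = positions(x0, t, 2.0, 0.0625), positions(x0, t + DT, 2.0, 0.0625)
pred = predict_linear(act_now, dx_old, dB_old, dB=L0 * 0.0625 * DT)
err = [abs(p - q) for p, q in zip(pred, act_next)]
print(f"max prediction error over one step: {max(err):.2e}")
```

Under this reading the prediction is exact at both ends of the bar and slightly off in between, because the base run's exponent a differs from the actual one.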

12
Parallelization without node failure
  • P = 100, simulated results

13
Speedup without node failure
  • Speedup: 99.9
  • Efficiency: 0.999
  • Justification for ignoring prediction time and communication costs
    • A similar MD computation would take 10 s per time interval (15 s on IBM SP3)
    • A reduction on IBM SP3 takes 0.0005 s on 100 processors
    • Prediction time is at least three orders of magnitude smaller than MD computation time
  • P = 100, simulated results
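The figures on this slide can be plugged into a simple cost model to see why prediction and communication costs are negligible here. The model is an assumption for illustration only: each interval is charged one exact MD run, a prediction costing 1/s of it, and one reduction, with P = 100, l = 1 and s = 1000 (prediction three orders of magnitude cheaper).

```python
# Figures from the slide; the cost model itself is a hypothetical simplification.
md_time = 10.0        # seconds per time interval for a similar MD computation
reduce_time = 0.0005  # seconds for a reduction on IBM SP3 with 100 processors
P, l, s = 100, 1.0, 1000.0

per_interval = md_time * (1.0 + 1.0 / s) + reduce_time  # exact run + prediction + reduction
speedup = P * md_time / (l * per_interval)
print(f"speedup {speedup:.1f}, efficiency {speedup / P:.3f}")  # speedup 99.9, efficiency 0.999
```

The reduction adds only about 0.005% to each interval, so the prediction cost dominates the (small) overhead in this model.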

14
Speedup with node failure
15
Parallelization of molecular dynamics simulations
on a Carbon nanotube
  • Definition of equivalence of two states
    • Atoms vibrate around their mean position
    • So consider positions equivalent if the difference is within the range of motion for the temperature at which the simulation is being performed
    • Max displacement: 0.211
    • Mean displacement: 0.0789
    • Standard deviation ($\sigma$): 0.0426

[Figure labels: Displacement (from mean); Mean position]
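A minimal sketch of an equivalence test along these lines. The slide gives displacement statistics but not the exact rule, so this version is an assumption: two configurations are equivalent if every atom's predicted and actual positions are within a tolerance, here the slide's maximum displacement (units as on the slide). Other choices, such as a multiple of the standard deviation or a test on the mean, are equally plausible. Requires Python 3.8+ for `math.dist`.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def equivalent(pred: Sequence[Vec3], actual: Sequence[Vec3], tol: float) -> bool:
    """True if every atom's predicted and actual positions are within `tol` of each other."""
    if len(pred) != len(actual):
        raise ValueError("configurations have different numbers of atoms")
    return all(math.dist(p, q) <= tol for p, q in zip(pred, actual))


MAX_DISPLACEMENT = 0.211  # slide value, used here as the tolerance (an assumption)
print(equivalent([(0.0, 0.0, 0.0)], [(0.1, 0.1, 0.0)], MAX_DISPLACEMENT))
```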
16
Prediction
  • Predictor
    • Use a predictor of a form similar to that of the model problem
    • But the changes are computed in spherical coordinates
    • Instead of using $x_{pred} = x_{old} + (\Delta x_{old}/\Delta B_{old})\,\Delta B$
    • Use $x_{pred} = x_{old} + R\,\Delta B$
      • where $R = \beta\,(\Delta x_{old}/\Delta B_{old}) + (1-\beta)\,R_{old}$
    • This prevents R from becoming large due to a single small value of $\Delta B_{old}$ caused by random motion
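The smoothed ratio is an exponentially weighted average, which damps a single spike caused by a tiny ΔB_old. The sketch below applies the update to one scalar coordinate and does not reproduce the spherical-coordinate handling; the weight `beta` (the slide's b) is set to an arbitrary illustrative value, and the function names are hypothetical.

```python
def update_ratio(R_old: float, dx_old: float, dB_old: float, beta: float) -> float:
    """R = beta * (dx_old / dB_old) + (1 - beta) * R_old, i.e. exponential smoothing."""
    return beta * (dx_old / dB_old) + (1.0 - beta) * R_old


def predict(x_old: float, R: float, dB: float) -> float:
    """x_pred = x_old + R * dB for one coordinate."""
    return x_old + R * dB


beta = 0.2   # illustrative weight, not a value taken from the slides
R = 1.0      # smoothed ratio so far
# One step where random motion makes dB_old tiny: the raw ratio jumps to about 100.
raw = 0.01 / 0.0001
R = update_ratio(R, dx_old=0.01, dB_old=0.0001, beta=beta)
print(f"raw ratio {raw:.1f}, smoothed ratio {R:.1f}")
# Subsequent ordinary steps (raw ratio 1) pull R back toward 1.
for _ in range(10):
    R = update_ratio(R, dx_old=1.0, dB_old=1.0, beta=beta)
print(f"after 10 ordinary steps: {R:.2f}")
```

A smaller `beta` suppresses spikes more strongly but makes R slower to follow genuine changes in the response.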

17
Experimental parameters
  • Carbon nanotube with 1000 atoms
  • Subjected to a pull-out test
    • Around 200 atoms at the beginning are fixed
    • Around 200 atoms at the end are moved deterministically
  • Time step size: 0.5 femtoseconds
  • Time interval per processor: 1000 time steps
  • Tersoff-Brenner potential for MD
  • Temperature: 300 K
  • f = 0.2
  • Base simulation: v = 0.05 Å/1000 time steps
  • Actual simulation: v = 0.0625 Å/1000 time steps

18
Error and speedup without faults
[Figure annotation: 0.1641 Å]
  • Speedup on 10 processors: 9.5
  • Good speedup on larger numbers of processors too

Loss in efficiency is due only to the first few sets of iterations
19
Maximum error
0.422 Å
20
Limitations of the experiments
  • They are simulations of a parallel implementation
    • But the large difference between computation and communication time suggests that an efficient implementation is feasible
  • Positions alone do not define the state
    • Velocities and energy are also needed
    • Velocities can be handled through standard techniques
  • Pre-computed MD results were used to initialize the states
  • Other types of experiments should also be performed
    • Use of a higher temperature and a smaller time as the base

21
Conclusions and future work
  • Conclusions
    • Promises significant improvement in speedup and efficiency for long-time simulations, through latency and fault tolerance
  • Future work
    • Better predictor
      • First predict mean positions, then perturb them based on a probability distribution
      • Reduce the information needed for prediction
    • Better definition of the equivalence of states
      • Include velocity and energy
    • Actual implementation on a parallel machine
    • Etc.