MS 15: Data-Aware Parallel Computing

Transcript and Presenter's Notes


1
MS 15: Data-Aware Parallel Computing
  • Data-Driven Parallelization in Multi-Scale
    Applications
  • Ashok Srinivasan, Florida State University
  • Dynamic Data Driven Finite Element Modeling of
    Brain Shape Deformation During Neurosurgery
  • Amitava Majumdar, San Diego Supercomputer Center
  • Dynamic Computations in Large-Scale Graphs
  • David Bader, Georgia Tech
  • Tackling Obesity in Children
  • Radha Nandkumar, NCSA

www.cs.fsu.edu/asriniva/presentations/siampp06
2
Data-Driven Parallelization in Multi-Scale
Applications
  • Ashok Srinivasan
  • Computer Science, Florida State University
  • http://www.cs.fsu.edu/asriniva

Aim: Simulate for long time spans
Solution features: Use data from prior simulations to
parallelize the time domain
Acknowledgements: NSF, ORNL, NERSC, NCSA
Collaborators: Yanan Yu and Namas Chandra
3
Outline
  • Background
  • Limitations of Conventional Parallelization
  • Example Application: Carbon Nanotube Tensile Test
  • Small Time Step Size in Molecular Dynamics
    Simulations
  • Data-Driven Time Parallelization
  • Experimental Results
  • Scaled efficiently to 1000 processors, for a
    problem where conventional parallelization scales
    to just 2-3 processors
  • Other time parallelization approaches
  • Conclusions

4
Background
  • Limitations of Conventional Parallelization
  • Example Application: Carbon Nanotube Tensile Test
  • Molecular Dynamics Simulations
  • Problems with Multiple Time-Scales

5
Limitations of Conventional Parallelization
  • Conventional parallelization decomposes the state
    space across processors
  • It is effective for a large state space
  • It is not effective when computational effort
    arises from a large number of time steps
  • or when granularity becomes very fine due to a
    large number of processors

6
Example Application: Carbon Nanotube Tensile Test
  • Pull the CNT at a constant velocity
  • Determine stress-strain response and yield strain
    (when CNT starts breaking) using MD
  • Strain rate dependent

7
A Drawback of Molecular Dynamics
  • Molecular dynamics
  • In each time step, the forces of the atoms on each
    other are modeled using some potential
  • After the forces are computed, the positions are
    updated
  • Repeat for desired number of time steps
  • Time step size is about 10⁻¹⁵ seconds, due to physical
    and numerical considerations
  • Desired time range is much larger
  • A million time steps are required to reach 10⁻⁹ s
  • Around a day of computing for a 3000-atom CNT
  • MD uses unrealistically large strain-rates
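
A minimal sketch of the time-stepping loop described above, in Python, assuming a velocity-Verlet integrator and a placeholder force routine (the function names, toy potential, and array shapes are illustrative, not the actual code behind these results):

```python
import numpy as np

def compute_forces(positions):
    """Placeholder force model (a real CNT tensile test would use a bonded
    carbon potential); a toy restoring force keeps the loop runnable."""
    return -0.01 * positions

def md_run(positions, velocities, masses, dt=1e-15, n_steps=1_000_000):
    """Velocity-Verlet time stepping: one force evaluation per ~1 fs step,
    so about a million steps are needed to reach a nanosecond."""
    forces = compute_forces(positions)
    for _ in range(n_steps):
        velocities = velocities + 0.5 * dt * forces / masses[:, None]
        positions = positions + dt * velocities
        forces = compute_forces(positions)
        velocities = velocities + 0.5 * dt * forces / masses[:, None]
    return positions, velocities
```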

8
Problems with multiple time-scales
  • Fine-scale computations (such as MD) are more
    accurate, but more time consuming
  • Much of the detail at the finer scale is
    unimportant, but some of it matters

A simple schematic of multiple time scales
9
Data-Driven Time Parallelization
  • Time parallelization
  • Data Driven Prediction
  • Dimensionality Reduction
  • Relate Simulation Parameters
  • Static Prediction
  • Dynamic Prediction
  • Verification

10
Time Parallelization
  • Each processor simulates a different time
    interval
  • Initial state is obtained by prediction, except
    for processor 0
  • Verify whether the predicted end state is close to
    that computed by MD
  • Prediction is based on dynamically determining a
    relationship between the current simulation and
    those in a database of prior results

If the time interval is sufficiently large, then the
communication overhead is small
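
The scheme can be sketched as follows; the helper functions predict_state, md_simulate, and states_equivalent are hypothetical stand-ins, and the loop over intervals runs concurrently (one interval per processor) in the real implementation:

```python
def time_parallel_run(initial_state, n_intervals,
                      predict_state, md_simulate, states_equivalent):
    """Schematic of data-driven time parallelization: predict a start state
    for every interval, simulate the intervals independently, then accept
    interval i+1 only while its predicted start matches the MD end of
    interval i."""
    # Interval 0 starts from the exact state; later intervals from predictions.
    starts = [initial_state] + [predict_state(i) for i in range(1, n_intervals)]

    # Run each interval with MD (concurrently, one per processor, in practice).
    ends = [md_simulate(s) for s in starts]

    # Verification: find how far the predictions remain valid.
    valid_through = 0
    for i in range(n_intervals - 1):
        if states_equivalent(ends[i], starts[i + 1]):
            valid_through = i + 1
        else:
            break
    return ends, valid_through
```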
11
Dimensionality Reduction
  • Movement of atoms in a 1000-atom CNT can be
    considered the motion of a point in
    3000-dimensional space
  • Find a lower dimensional subspace close to which
    the points lie
  • We use principal orthogonal decomposition (POD)
  • Find a low dimensional affine subspace
  • Motion may, however, be complex in this subspace
  • Use results for different strain rates
  • Velocities: 10 m/s, 5 m/s, and 1 m/s
  • At five different time points
  • [U, S, V] = svd(shifted data)
  • Shifted data = U S Vᵀ
  • States of the CNT are expressed as
    μ + c1 u1 + c2 u2

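
A small sketch of this decomposition with NumPy, assuming the stored states are stacked as rows of a snapshot matrix (the variable names are illustrative):

```python
import numpy as np

def pod_basis(snapshots, k=2):
    """POD of stored CNT states: snapshots is an (n_snapshots, 3N) array,
    one flattened configuration per row."""
    mu = snapshots.mean(axis=0)          # mean state (the affine offset)
    shifted = snapshots - mu             # "shifted data"
    U, S, Vt = np.linalg.svd(shifted, full_matrices=False)
    basis = Vt[:k]                       # u1, u2, ... as rows
    coeffs = shifted @ basis.T           # c1, c2 for each snapshot
    return mu, basis, coeffs

# A stored state j is then approximated as mu + c1*u1 + c2*u2,
# i.e. mu + coeffs[j] @ basis.
```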
12
Basis Vectors from POD
  • CNT of 100 Å with 1000 atoms at 300 K

Figure: u1 (blue) and u2 (red) for the z coordinates;
u1 (green) for the x coordinates is not significant
(Blue: z; Green, Red: x, y)
13
Relate strain rate and time
  • Coefficients of u1
  • Blue: 1 m/s
  • Red: 5 m/s
  • Green: 10 m/s
  • Dotted line: same strain
  • Suggests that behavior is similar at similar
    strains
  • In general, clustering similar coefficients can
    give parameter-time relationships

14
Prediction when v is the only parameter
  • Direct Predictor
  • Independently predict change in each coordinate
  • Use precomputed results for 40 different time
    points each for three different velocities
  • To predict for a (t, v) not in the database
  • Determine coefficients for nearby v at nearby
    strains
  • Fit a linear surface and interpolate/extrapolate
    to get coefficients c1 and c2 for (t, v)
  • Get the state as μ + c1 u1 + c2 u2

Green: 10 m/s, Red: 5 m/s, Blue: 1 m/s, Magenta:
0.1 m/s, Black: 0.1 m/s through direct prediction
  • Dynamic Prediction
  • Correct the above coefficients, by determining
    the error between the previously predicted and
    computed states
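
A sketch of the direct predictor under the assumption that the "linear surface" is a least-squares plane in (t, v); the database arrays and the function name are illustrative:

```python
import numpy as np

def direct_predict(db_t, db_v, db_c, t, v):
    """Fit a plane c = p0 + p1*t + p2*v through database coefficients at
    nearby (t, v) points, then evaluate it at the requested (t, v).
    db_c has one column per POD mode (c1, c2)."""
    A = np.column_stack([np.ones_like(db_t), db_t, db_v])
    predicted = []
    for j in range(db_c.shape[1]):
        p, *_ = np.linalg.lstsq(A, db_c[:, j], rcond=None)  # least-squares plane
        predicted.append(p[0] + p[1] * t + p[2] * v)         # interpolate/extrapolate
    return np.array(predicted)   # [c1, c2]; the state is then mu + c1*u1 + c2*u2
```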

15
Verification of prediction
  • Definition of equivalence of two states
  • Atoms vibrate around their mean position
  • Consider states equivalent if the differences in
    position, potential energy, and temperature are
    within the normal range of fluctuations
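
A sketch of such an equivalence test, assuming each state carries mean positions, potential energy, and temperature, and that the fluctuation-based tolerances are supplied by the caller:

```python
import numpy as np

def states_equivalent(state_a, state_b, pos_tol, pe_tol, temp_tol):
    """Two states are treated as equivalent if their positions, potential
    energy, and temperature differ by less than the normal fluctuation
    ranges (pos_tol, pe_tol, temp_tol), estimated from the MD runs."""
    dpos = np.abs(state_a["positions"] - state_b["positions"]).max()
    dpe = abs(state_a["potential_energy"] - state_b["potential_energy"])
    dtemp = abs(state_a["temperature"] - state_b["temperature"])
    return dpos <= pos_tol and dpe <= pe_tol and dtemp <= temp_tol
```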

Figure: atoms vibrating about their mean positions
(displacement from the mean)
16
Experimental Results
  • Relate simulations with different strain rates
  • Use the above strategy directly
  • Relate simulations with different strain rates
    and different CNT sizes
  • Express basis vectors in a different functional
    form
  • Relate simulations with different temperatures
    and strain rates
  • Dynamically identify different simulations that
    are similar in current behavior

17
Stress-strain response at 0.1 m/s
  • Blue: exact result
  • Green: direct prediction with interpolation /
    extrapolation
  • Points close to yield involve extrapolation in
    velocity and strain
  • Red: time-parallel results

18
Speedup
  • Red line: ideal speedup
  • Blue: v = 0.1 m/s
  • Green: the next predictor,
    v = 1 m/s, using v = 10 m/s
  • CNT with 1000 atoms
  • Xeon/Myrinet cluster

19
CNTs of varying sizes
  • Use a 1000-atom CNT result
  • Parallelize 1200, 1600, 2000-atom CNT runs
  • Observe that the dominant mode is approximately a
    linear function of the initial z-coordinate
  • Normalize coordinates to be in [0, 1]
  • z(t+Δt) = z(t) + ż(t+Δt) Δt; predict ż
  • Speedup plot: '-' 2000 atoms, '.-' 1600 atoms,
    '__' 1200 atoms, with the linear (ideal) speedup
    shown for reference
  • Stress-strain plot: Blue = exact 2000-atom result,
    Red = 200 processors

20
Predict change in coordinates
  • Express x in terms of basis functions
  • Example
  • x(t+Δt) = a0(t+Δt) + a1(t+Δt) x(t)
  • a0(t+Δt) and a1(t+Δt) are unknown
  • Express changes, y, for the base (old) simulation
    similarly, in terms of coefficients b and perform
    least squares fit
  • Predict ai(t+Δt) as bi(t+Δt) + R(t+Δt)
  • R(t+Δt) = (1-β) R(t) + β (ai(t) - bi(t))
  • Intuitively, the difference between the current
    coefficient and the base coefficient is predicted
    as a weighted combination of the previous
    differences
  • We use β = 0.5
  • Gives more weight to latest results
  • Does not let random fluctuations affect the
    predictor too much
  • Velocities are estimated from the latest accurately
    known results
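
A sketch of this correction, writing the weight as beta (the "β = 0.5" above); the function and argument names are illustrative:

```python
def corrected_coefficient(b_next, R_prev, a_curr, b_curr, beta=0.5):
    """Dynamic correction of a predicted coefficient:
        R(t+dt) = (1 - beta) * R(t) + beta * (a_i(t) - b_i(t))
        a_i(t+dt) is predicted as b_i(t+dt) + R(t+dt)
    With beta = 0.5 the latest discrepancy between the current and base
    coefficients gets the most weight, while older discrepancies decay
    geometrically, so random fluctuations do not dominate the predictor."""
    R_next = (1.0 - beta) * R_prev + beta * (a_curr - b_curr)
    return b_next + R_next, R_next
```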

21
Temperature and velocity vary
  • Use 1000-atom CNT results
  • Temperatures: 300 K, 600 K, 900 K, 1200 K
  • Velocities: 1 m/s, 5 m/s, 10 m/s
  • Dynamically choose closest simulation for
    prediction
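
A sketch of one way to make that choice, assuming prior runs are keyed by (temperature, velocity) and compared through a Euclidean distance between recent coefficients (the actual selection criterion may differ):

```python
import numpy as np

def closest_prior_run(current_coeffs, database):
    """Choose the prior (temperature, velocity) run whose recent coefficient
    history is closest to the current run's; `database` maps (T, v) tuples
    to coefficient arrays."""
    window = len(current_coeffs)
    best_key, best_dist = None, float("inf")
    for key, coeffs in database.items():
        dist = np.linalg.norm(np.asarray(coeffs)[-window:] - current_coeffs)
        if dist < best_dist:
            best_key, best_dist = key, dist
    return best_key
```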

Speedup plot: '__' 450 K, 2 m/s, with the linear (ideal)
speedup. Stress-strain plot: Blue = exact 450 K result,
Red = 200 processors
22
Other time parallelization approaches
  • Waveform relaxation
  • Repeatedly solve for the entire time domain
  • Parallelizes well but convergence can be slow
  • Several variants to improve convergence
  • Parareal approach
  • Features similar to ours and to waveform
    relaxation
  • Precedes our approach
  • Not data-driven
  • Sequential phase for prediction
  • Not very effective in practice so far
  • Has much potential to be improved

23
Conclusions
  • Data-driven time parallelization shows
    significant improvement in speed, without
    sacrificing accuracy significantly
  • Direct prediction is very effective when
    applicable
  • The 980-processor simulation attained a flop
    rate of 420 Gflops
  • Its flops per atom rate of 420 Mflops/atom is
    likely the largest flop per atom rate in
    classical MD simulations

24
Future Work
  • More complex problems
  • Better prediction
  • POD is good for representing data, but not
    necessarily for identifying patterns
  • Use better dimensionality reduction / reduced
    order modeling techniques
  • Use experimental data for prediction
  • Better learning
  • Better verification
  • In CP8: Application of Dimensionality Reduction
    Techniques to Time Parallelization, Yanan Yu
  • Tomorrow, 2:30 - 3:00 pm