Transcript and Presenter's Notes

Title: Parallel Molecular Dynamics


1
Parallel Molecular Dynamics
  • Application Oriented
  • Computer Science Research
  • Laxmikant Kale
  • http://charm.cs.uiuc.edu

2
Outline
  • What is needed for HPC to succeed?
  • Parallelization of Molecular Dynamics
  • Aggressive Parallel decomposition
  • Load Balancing and performance
  • Multiparadigm programming
  • Collaborative Interdisciplinary Research
  • Comments and lessons

3
Contributors
  • PIs
  • Laxmikant Kale, Klaus Schulten, Robert Skeel
  • NAMD 1
  • Robert Brunner, Andrew Dalke, Attila Gursoy,
    Bill Humphrey, Mark Nelson
  • NAMD2
  • M. Bhandarkar, R. Brunner, A. Gursoy, J. Phillips,
    N. Krawetz, A. Shinozaki, K. Varadarajan, ...

4
Parallel Computing Research
  • Trends
  • application-centered CS research
  • isolated CS research
  • Drawbacks of both
  • Needed
  • Computer-science-centered, yet application-oriented
    research

5
Middle layers
(Diagram: a layered stack with Applications at the top, the
middle layers of languages, tools, and libraries in between,
and parallel machines at the bottom)
6
(No Transcript)
7
Molecular Dynamics
  • Collection of charged atoms, with bonds
  • Newtonian mechanics
  • At each time-step (sketched below)
  • Calculate forces on each atom
  • bonds
  • non-bonded: electrostatic and van der Waals
  • Calculate velocities and advance positions
  • 1 femtosecond time-step, millions needed!
  • Thousands of atoms (1,000 - 100,000)
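A minimal sketch of the time-step loop described above, assuming flat arrays of positions, velocities, forces, and masses; the names (Vec3, computeForces, timestep) are hypothetical and the force routine is a placeholder, not NAMD's actual code:

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { double x = 0, y = 0, z = 0; };

// Placeholder for the force evaluation: a real MD code computes bonded
// terms (bonds, angles, dihedrals) and non-bonded terms (electrostatic,
// van der Waals) here.
void computeForces(const std::vector<Vec3>& pos, std::vector<Vec3>& force) {
  for (auto& f : force) f = Vec3{};  // zero forces; real terms omitted
  (void)pos;
}

// One time step: forces -> velocities -> positions.  dt is on the order
// of 1 femtosecond, so millions of steps are needed per simulation.
void timestep(std::vector<Vec3>& pos, std::vector<Vec3>& vel,
              std::vector<Vec3>& force, const std::vector<double>& mass,
              double dt) {
  computeForces(pos, force);
  for (std::size_t i = 0; i < pos.size(); ++i) {
    vel[i].x += dt * force[i].x / mass[i];
    vel[i].y += dt * force[i].y / mass[i];
    vel[i].z += dt * force[i].z / mass[i];
    pos[i].x += dt * vel[i].x;
    pos[i].y += dt * vel[i].y;
    pos[i].z += dt * vel[i].z;
  }
}
```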

8
Further MD
  • Use of a cut-off radius to reduce work (see the
    pairwise loop below)
  • 8 - 14 Å
  • Faraway charges ignored!
  • 80-95% of the work is non-bonded force computation
  • Some simulations need faraway contributions
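A sketch of the cutoff idea in its simplest form, assuming point charges and showing only a Coulomb-like term in arbitrary units; the names (Atom, nonbondedEnergy) are hypothetical, and real codes avoid the O(N²) pair scan using the decompositions discussed on the following slides:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Atom { double x, y, z, charge; };

// Naive non-bonded loop with a cutoff: pairs farther apart than `cutoff`
// (typically 8-14 Angstroms) are skipped, i.e. faraway charges are ignored.
double nonbondedEnergy(const std::vector<Atom>& atoms, double cutoff) {
  const double cutoff2 = cutoff * cutoff;
  double energy = 0.0;
  for (std::size_t i = 0; i < atoms.size(); ++i) {
    for (std::size_t j = i + 1; j < atoms.size(); ++j) {
      const double dx = atoms[i].x - atoms[j].x;
      const double dy = atoms[i].y - atoms[j].y;
      const double dz = atoms[i].z - atoms[j].z;
      const double r2 = dx * dx + dy * dy + dz * dz;
      if (r2 > cutoff2) continue;  // outside the cutoff radius
      energy += atoms[i].charge * atoms[j].charge / std::sqrt(r2);
    }
  }
  return energy;
}
```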

9
Scalability
  • The program should scale up to use a large number
    of processors.
  • But what does that mean?
  • An individual simulation isn't truly scalable
  • Better definition of scalability
  • If I double the number of processors, I should
    be able to retain parallel efficiency by
    increasing the problem size

10
Isoefficiency
  • Quantify scalability
  • How much increase in problem size is needed to
    retain the same efficiency on a larger machine?
  • Efficiency = Seq. Time / (P × Parallel Time)
  • Parallel time = computation + communication + idle
    (written out below)
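Written out, with N the problem size (number of atoms) and P the number of processors:

```latex
E(N,P) = \frac{T_{\mathrm{seq}}(N)}{P \, T_{\mathrm{par}}(N,P)},
\qquad
T_{\mathrm{par}} = T_{\mathrm{comp}} + T_{\mathrm{comm}} + T_{\mathrm{idle}}
```

The isoefficiency function is the rate at which N must grow as P grows so that E(N, P) stays constant; the slower the required growth, the more scalable the method.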

11
Traditional Approaches
  • Replicated Data
  • All atom coordinates stored on each processor
  • Non-bonded Forces distributed evenly
  • Analysis: assume N atoms, P processors
  • Computation: O(N/P)
  • Communication: O(N log P)
  • Communication/Computation ratio: P log P (derived below)
  • Fraction of communication increases with number
    of processors, independent of problem size!

Not Scalable
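The ratio follows directly: each processor performs O(N/P) of the force work but exchanges O(N log P) data to keep every coordinate replicated, so

```latex
\frac{T_{\mathrm{comm}}}{T_{\mathrm{comp}}}
  = \frac{O(N \log P)}{O(N/P)}
  = O(P \log P)
```

Because the ratio grows with P and is independent of N, no increase in problem size can restore efficiency, hence the verdict above.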
12
Atom decomposition
  • Partition the Atoms array across processors
  • Nearby atoms may not be on the same processor
  • Communication: O(N) per processor
  • Communication/Computation ratio: O(P)

Not Scalable
13
Force Decomposition
  • Distribute force matrix to processors
  • Matrix is sparse, non-uniform
  • Each processor has one block
  • Communication: O(N/√P)
  • Ratio: O(√P) (derived below)
  • Better scalability (can use 100 processors)
  • Hwang, Saltz, et al
  • speedup of 6 on 32 PEs, 36 on 128 processors

Not Scalable
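With a cutoff the total force work is O(N), so per-processor computation is O(N/P), while each processor needs the coordinates of roughly N/√P atoms:

```latex
\frac{T_{\mathrm{comm}}}{T_{\mathrm{comp}}}
  = \frac{O(N/\sqrt{P})}{O(N/P)}
  = O(\sqrt{P})
```

The ratio still grows with P, only more slowly, which is why force decomposition helps up to on the order of a hundred processors yet is still marked not scalable.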
14
Spatial Decomposition
  • Allocate nearby atoms to the same processor
  • Three variations possible
  • Partitioning into P boxes, 1 per processor
  • Good scalability, but hard to implement
  • Partitioning into fixed-size boxes, each a little
    larger than the cutoff distance (sketched below)
  • Partitioning into smaller boxes
  • Communication: O(N/P)
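A minimal sketch of the fixed-size-box variant, assuming a cubic domain with a common origin per axis; the names (patchIndex, patchSize) are hypothetical. Choosing the box edge no smaller than the cutoff guarantees that an atom interacts only with atoms in its own patch or one of the 26 neighboring patches.

```cpp
#include <array>
#include <cmath>

// Map an atom's coordinates to the integer index of its patch (box).
// patchSize >= cutoff distance ensures neighbor-only interactions.
std::array<int, 3> patchIndex(double x, double y, double z,
                              double origin, double patchSize) {
  return { static_cast<int>(std::floor((x - origin) / patchSize)),
           static_cast<int>(std::floor((y - origin) / patchSize)),
           static_cast<int>(std::floor((z - origin) / patchSize)) };
}
```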

15
Spatial Decomposition in NAMD
  • NAMD 1 used spatial decomposition
  • Good theoretical isoefficiency, but load balancing
    problems for a fixed-size system
  • For midsize systems, got good speedups up to 16
    processors.
  • Use the symmetry of Newton's 3rd law to
    facilitate load balancing

16
Spatial Decomposition
17

Spatial Decomposition
18
FD + SD
  • Now, we have many more objects to load balance
  • Each diamond can be assigned to any processor
  • Number of diamonds (3D)
  • 14 × Number of Patches (one self-interaction per patch
    plus half of its 26 neighbor pairs: 1 + 13 = 14)

19
Bond Forces
  • Multiple types of forces
  • Bonds (2), Angles (3), Dihedrals (4), ...
  • Luckily, each involves atoms in neighboring
    patches only
  • Straightforward implementation
  • Send message to all neighbors,
  • receive forces from them
  • 26 × 2 messages per patch!

20
Bonded Forces
  • Assume one patch per processor

(Diagram: three neighboring patches labeled A, B, and C)
21
Implementation
  • Multiple objects per processor
  • Different types: patches, pairwise forces, bonded
    forces, ...
  • Each may have its data ready at different times
  • Need the ability to map and remap them
  • Need prioritized scheduling
  • Charm++ supports all of these

22
Charm++
  • Data-driven objects
  • Object groups
  • a global object with a representative on each PE
  • Asynchronous method invocation
  • Prioritized scheduling
  • Mature, robust, portable
  • http://charm.cs.uiuc.edu

23
Data-driven execution
(Diagram: per-processor schedulers, each driven by its own
message queue)
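A toy illustration of the picture above, not the actual Charm++ runtime or its API; the types (Message, Scheduler) are hypothetical. Each processor's scheduler repeatedly pulls the highest-priority message from its queue and invokes the corresponding object's method, so objects execute only when their data has arrived.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// A message carries a priority and a bound invocation
// (target object + entry method + arguments).
struct Message {
  int priority;                  // smaller value = scheduled earlier
  std::function<void()> invoke;
};

struct LaterFirst {
  bool operator()(const Message& a, const Message& b) const {
    return a.priority > b.priority;  // lowest priority value on top
  }
};

// One scheduler per processor, driven entirely by its message queue.
class Scheduler {
 public:
  void enqueue(Message m) { queue_.push(std::move(m)); }

  void run() {                   // process messages until none remain
    while (!queue_.empty()) {
      Message m = queue_.top();
      queue_.pop();
      m.invoke();                // data-driven: run the target's method
    }
  }

 private:
  std::priority_queue<Message, std::vector<Message>, LaterFirst> queue_;
};
```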
24
Load Balancing
  • Is a major challenge for this application
  • especially for a large number of processors
  • Unpredictable workloads
  • Each diamond (force object) and patch encapsulates a
    variable amount of work
  • Static estimates are inaccurate
  • Measurement-based load balancing
  • Loads vary only slowly across timesteps, so recent
    measurements predict future load well

25
Bipartite graph balancing
  • Background load
  • Patches and angle forces
  • Migratable load
  • Non-bonded forces
  • Bipartite communication graph
  • between migratable and non-migratable objects
  • Challenge
  • Balance Load while minimizing communication

26
Load balancing
  • Collect timing data for several cycles
  • Run a heuristic load balancer (one simple greedy
    variant is sketched below)
  • Several alternative strategies exist
  • Re-map and migrate objects accordingly
  • Registration mechanisms facilitate migration
  • Needs a separate talk!
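A sketch of one plausible greedy heuristic, assuming only measured per-object times are available; the names (greedyRemap, measuredTime) are hypothetical, this is not NAMD's actual balancer, and, unlike the bipartite-graph formulation on the previous slide, it ignores communication and the non-migratable background load.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Assign each migratable object to a processor: heaviest objects first,
// each placed on the currently least-loaded processor.
std::vector<int> greedyRemap(const std::vector<double>& measuredTime,
                             int numProcs) {
  std::vector<int> order(measuredTime.size());
  for (std::size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
  std::sort(order.begin(), order.end(),
            [&](int a, int b) { return measuredTime[a] > measuredTime[b]; });

  // Min-heap of (accumulated load, processor id).
  using Proc = std::pair<double, int>;
  std::priority_queue<Proc, std::vector<Proc>, std::greater<Proc>> procs;
  for (int p = 0; p < numProcs; ++p) procs.push({0.0, p});

  std::vector<int> assignment(measuredTime.size());
  for (int obj : order) {
    Proc least = procs.top();        // least-loaded processor so far
    procs.pop();
    assignment[obj] = least.second;  // migrate obj there
    least.first += measuredTime[obj];
    procs.push(least);
  }
  return assignment;                 // assignment[obj] = destination PE
}
```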

27
Before and After
28
Before and After
29
(No Transcript)
30
Performance: size of system
31
Performance: various machines
32
Speedup
33
Multi-paradigm programming
  • Long-range electrostatic interactions
  • Some simulations require this
  • Contributions of faraway atoms can be calculated
    infrequently
  • PVM-based library, DPMTA
  • developed at Duke by John Board et al.
  • Patch life cycle
  • Better expressed as a thread

34
Converse
  • Supports multi-paradigm programming
  • Provides portability
  • Makes it easy to implement RTS for new paradigms
  • Several languages/libraries
  • Charm++, threaded MPI, PVM, Java, md-perl, pc++,
    Nexus, Path, Cid, CC++, DP, Agents, ...

35
NAMD2 with Converse
36
NAMD2
  • In production use
  • Internally for about a year
  • Several simulations completed/published
  • Fastest MD program? We think so
  • Modifiable/extensible
  • Steered MD
  • Free energy calculations

37
Lessons for CSE
  • Technical lessons
  • Multiple-domain (patch) decomposition provides
    necessary flexibility
  • Data-driven objects and threads are a great combination
  • Measurement-based load balancing beats static estimates
  • Multi-paradigm parallel programming works!
  • Integrate independently developed libraries
  • Use appropriate paradigm for each component

38
Real Application?
  • Drawbacks
  • Need to spend effort on mundane details not
    germane to CS research
  • A production program complicates the code structure

39
Real Application for CS research?
  • Benefits
  • Subtle and complex research problems uncovered
    only with a real application
  • Satisfaction of a real, concrete contribution
  • With careful planning, you can truly enrich the
    middle layers
  • Bring back a rich variety of relevant CS problems
  • Apply to other domains: rockets? casting?

40
Collaboration lessons
  • Use conservative methods...
  • C++: fashionable vs. conservative
  • Aggressive methods where they matter
  • Account for differing priorities and objectives