Parallel Molecular Dynamics - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Parallel Molecular Dynamics


1
Parallel Molecular Dynamics
  • A case study
  • Programming for performance
  • Laxmikant Kale
  • http://charm.cs.uiuc.edu

2
Molecular Dynamics
  • Collection of charged atoms, with bonds
  • Newtonian mechanics
  • At each time-step
  • Calculate forces on each atom
  • bonds
  • non-bonded electrostatic and van der Waals
  • Calculate velocities and advance positions (see the sketch after this list)
  • 1 femtosecond time-step, millions needed!
  • Thousands of atoms (1,000 - 100,000)
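A minimal sketch of the per-timestep update described above, written in plain C++ for illustration (this is not NAMD code; the Atom/Vec3 types and the computeForces callback are assumptions). It uses a simple velocity-Verlet step, with the force evaluation standing in for the bonded and non-bonded computations on this slide.

    #include <vector>

    struct Vec3 { double x, y, z; };
    struct Atom { Vec3 pos, vel, force; double mass; };

    // One velocity-Verlet step: half-kick velocities, drift positions,
    // recompute forces at the new positions, then finish the velocity update.
    void timestep(std::vector<Atom>& atoms, double dt,
                  void (*computeForces)(std::vector<Atom>&)) {
        for (Atom& a : atoms) {
            a.vel.x += 0.5 * dt * a.force.x / a.mass;
            a.vel.y += 0.5 * dt * a.force.y / a.mass;
            a.vel.z += 0.5 * dt * a.force.z / a.mass;
            a.pos.x += dt * a.vel.x;
            a.pos.y += dt * a.vel.y;
            a.pos.z += dt * a.vel.z;
        }
        computeForces(atoms);                 // bonded + non-bonded forces
        for (Atom& a : atoms) {
            a.vel.x += 0.5 * dt * a.force.x / a.mass;
            a.vel.y += 0.5 * dt * a.force.y / a.mass;
            a.vel.z += 0.5 * dt * a.force.z / a.mass;
        }
    }

With a 1 femtosecond timestep, millions of such steps are needed to reach interesting timescales.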

3
Further MD
  • Use of a cut-off radius to reduce work (see the sketch after this list)
  • 8 - 14 Å
  • Faraway charges ignored!
  • 80-95% of the work is non-bonded force computation
  • Some simulations need faraway contributions
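A hedged sketch of the cutoff test in plain C++ (illustrative names, not NAMD code): only atom pairs within the cutoff radius contribute non-bonded forces, which is where 80-95% of the work concentrates. A production code would use cell or neighbor lists rather than this O(N^2) scan.

    #include <vector>

    struct Pos { double x, y, z; };

    // Count the atom pairs that fall within the cutoff radius; a real code
    // would evaluate electrostatic and van der Waals terms for each such pair.
    int pairsWithinCutoff(const std::vector<Pos>& p, double cutoff) {
        int count = 0;
        double c2 = cutoff * cutoff;
        for (size_t i = 0; i < p.size(); ++i)
            for (size_t j = i + 1; j < p.size(); ++j) {
                double dx = p[i].x - p[j].x;
                double dy = p[i].y - p[j].y;
                double dz = p[i].z - p[j].z;
                if (dx * dx + dy * dy + dz * dz <= c2)
                    ++count;
            }
        return count;
    }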

4
Scalability
  • The program should scale up to use a large number
    of processors.
  • But what does that mean?
  • An individual simulation isn't truly scalable
  • Better definition of scalability
  • If I double the number of processors, I should
    be able to retain parallel efficiency by
    increasing the problem size

5
Isoefficiency
  • Quantify scalability
  • How much increase in problem size is needed to
    retain the same efficiency on a larger machine?
  • Efficiency = Seq. Time / (P × Parallel Time)
  • Parallel time = computation + communication + idle
    (a small worked example follows this list)
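As a worked example with hypothetical numbers: if the sequential run takes 100 s and 16 processors finish in 8 s, efficiency = 100 / (16 × 8) ≈ 0.78. Isoefficiency then asks how much the problem size must grow so that, say, 32 processors still reach that same 0.78 efficiency.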

6
Traditional Approaches
  • Replicated Data
  • All atom coordinates stored on each processor
  • Non-bonded Forces distributed evenly
  • Analysis: assume N atoms, P processors
  • Computation: O(N/P)
  • Communication: O(N log P)
  • Communication/Computation ratio: O(P log P) (derived below)
  • Fraction of communication increases with number
    of processors, independent of problem size!
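The ratio follows directly from the two estimates above: communication / computation = O(N log P) / O(N/P) = O(P log P). Since N cancels, the communication fraction grows with P no matter how large the problem is, which is why this approach is not scalable.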

Not Scalable
7
Atom decomposition
  • Partition the Atoms array across processors
  • Nearby atoms may not be on the same processor
  • Communication: O(N) per processor
  • Communication/Computation ratio: O(P)

Not Scalable
8
Force Decomposition
  • Distribute force matrix to processors
  • Matrix is sparse and non-uniform
  • Each processor has one block
  • Communication: O(N/sqrt(P)) (see the note below)
  • Communication/Computation ratio: O(sqrt(P))
  • Better scalability (can use 100 processors)
  • Hwang, Saltz, et al.:
  • speedup of 6 on 32 PEs, 36 on 128 processors
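A brief justification, assuming the standard force-decomposition analysis (an assumption, not stated on the slide): splitting the N × N force matrix into sqrt(P) × sqrt(P) blocks means each processor needs the coordinates of roughly 2N/sqrt(P) atoms (its block's rows and columns), so communication is O(N/sqrt(P)); dividing by the O(N/P) computation per processor gives the O(sqrt(P)) ratio, which still grows with P.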

Not Scalable
9
Spatial Decomposition
  • Allocate close-by atoms to the same processor
  • Three variations possible:
  • Partitioning into P boxes, 1 per processor
  • Good scalability, but hard to implement
  • Partitioning into fixed-size boxes, each a little
    larger than the cutoff distance
  • Partitioning into smaller boxes
  • Communication: O(N/P) (see the box-assignment sketch below)
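A minimal sketch of the second variation in plain C++ (illustrative names, not NAMD's): atoms are binned into boxes whose edge is at least the cutoff distance, so every interaction partner of an atom lies in its own box or one of the 26 neighboring boxes.

    #include <array>
    #include <cmath>
    #include <map>
    #include <vector>

    struct Pos { double x, y, z; };

    // Map each atom index to the box (patch) containing it.
    std::map<std::array<int, 3>, std::vector<int>>
    assignToBoxes(const std::vector<Pos>& atoms, double boxEdge) {
        std::map<std::array<int, 3>, std::vector<int>> boxes;
        for (int i = 0; i < (int)atoms.size(); ++i) {
            std::array<int, 3> key = {
                (int)std::floor(atoms[i].x / boxEdge),
                (int)std::floor(atoms[i].y / boxEdge),
                (int)std::floor(atoms[i].z / boxEdge)};
            boxes[key].push_back(i);
        }
        return boxes;
    }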

10
Spatial Decomposition in NAMD
  • NAMD 1 used spatial decomposition
  • Good theoretical isoefficiency, but for a fixed
    size system, load balancing problems
  • For midsize systems, got good speedups up to 16
    processors.
  • Use the symmetry of Newton's 3rd law to
    facilitate load balancing

11
Spatial Decomposition
12

Spatial Decomposition
13
FD + SD
  • Now, we have many more objects to load balance
  • Each diamond can be assigned to any processor
  • Number of diamonds (3D) = 14 × number of patches
  • (each patch computes with itself and, by Newton's 3rd
    law, with only half of its 26 neighbors: 1 + 13 = 14)

14
Bond Forces
  • Multiple types of forces
  • Bonds (2), Angles (3), Dihedrals (4), ...
  • Luckily, each involves atoms in neighboring
    patches only
  • Straightforward implementation
  • Send message to all neighbors,
  • receive forces from them
  • 26 × 2 messages per patch! (see the neighbor-offset sketch below)
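For reference, a small illustrative snippet (not NAMD code) enumerating the 26 one-away neighbor offsets of a patch in 3D, which is where the 26 × 2 message count comes from.

    #include <array>
    #include <vector>

    // The 26 neighbor offsets of a patch in a 3D grid (27 cells minus the patch itself).
    std::vector<std::array<int, 3>> neighborOffsets() {
        std::vector<std::array<int, 3>> offsets;
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dz = -1; dz <= 1; ++dz)
                    if (dx != 0 || dy != 0 || dz != 0)
                        offsets.push_back({dx, dy, dz});
        return offsets;   // 26 entries
    }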

15
Bonded Forces
  • Assume one patch per processor

(Figure: bonded-force example involving A, B, and C across neighboring patches)
16
Implementation
  • Multiple Objects per processor
  • Different types: patches, pairwise forces, bonded
    forces, ...
  • Each may have its data ready at different times
  • Need ability to map and remap them
  • Need prioritized scheduling
  • Charm++ supports all of these

17
Charm++
  • Data Driven Objects
  • Object Groups
  • global object with a representative on each PE
  • Asynchronous method invocation
  • Prioritized scheduling
  • Mature, robust, portable
  • http://charm.cs.uiuc.edu

18
Data driven execution
(Figure: each processor runs a scheduler fed by its own message queue; a sketch of this loop follows)
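A minimal sketch of the data-driven execution model in plain C++ (Charm++'s actual runtime is far more elaborate; the Message type and schedulerLoop name are assumptions for illustration). Each processor's scheduler repeatedly picks the highest-priority pending message and invokes the corresponding method on the target object, so execution order is driven by data availability rather than a fixed program order.

    #include <functional>
    #include <queue>
    #include <vector>

    struct Message {
        int priority;                     // smaller value = runs earlier
        std::function<void()> invoke;     // bound target object + entry method + data
        bool operator<(const Message& o) const { return priority > o.priority; }
    };

    // One scheduler per processor: drain the message queue in priority order.
    void schedulerLoop(std::priority_queue<Message>& queue) {
        while (!queue.empty()) {
            Message m = queue.top();
            queue.pop();
            m.invoke();                   // the message triggers the method
        }
    }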
19
Load Balancing
  • Is a major challenge for this application
  • especially for a large number of processors
  • Unpredictable workloads
  • Each diamond (force object) and patch encapsulates
    a variable amount of work
  • Static estimates are inaccurate
  • Measurement-based load balancing (a sketch of one
    such heuristic follows this list)
  • Loads vary only very slowly across timesteps, so
    recent measurements are good predictors
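A hedged sketch of one possible measurement-based rebalancing heuristic in plain C++ (a simple greedy assignment; the balancers actually used here may differ): objects are ordered by their measured cost from recent cycles and each is placed on the currently least-loaded processor.

    #include <algorithm>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    // objs[k] = {measured cost, object id}, with ids in [0, objs.size()).
    // Returns mapping[objectId] = destination processor.
    std::vector<int> greedyRebalance(std::vector<std::pair<double, int>> objs,
                                     int numProcs) {
        std::sort(objs.rbegin(), objs.rend());           // heaviest objects first
        // Min-heap of (current load, processor id).
        std::priority_queue<std::pair<double, int>,
                            std::vector<std::pair<double, int>>,
                            std::greater<>> procs;
        for (int p = 0; p < numProcs; ++p) procs.push({0.0, p});

        std::vector<int> mapping(objs.size());
        for (const auto& [cost, id] : objs) {
            auto [load, p] = procs.top();                 // least-loaded processor
            procs.pop();
            mapping[id] = p;
            procs.push({load + cost, p});
        }
        return mapping;
    }

This ignores the communication term; slide 20's bipartite-graph formulation is what brings communication into the objective.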

20
Bipartite graph balancing
  • Background load
  • Patches and angle forces
  • Migratable load
  • Non-bonded forces
  • Bipartite communication graph
  • between migratable and non-migratable objects
  • Challenge
  • Balance load while minimizing communication

21
Load balancing
  • Collect timing data for several cycles
  • Run heuristic load balancer
  • Several alternative ones
  • Re-map and migrate objects accordingly
  • Registration mechanisms facilitate migration

22
(No Transcript)
23
Performance: size of system
24
Performance: various machines
25
Speedup