Parallel Molecular Dynamics - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Parallel Molecular Dynamics


1
Parallel Molecular Dynamics
  • A case study
  • Programming for performance
  • Laxmikant Kale
  • http://charm.cs.uiuc.edu

2
Molecular Dynamics
  • Collection of charged atoms, with bonds
  • Newtonian mechanics
  • At each time-step
  • Calculate forces on each atom
  • bonds
  • non-bonded electrostatic and van der Waals
  • Calculate velocities and advance positions (see the sketch after this list)
  • 1 femtosecond time-step, millions needed!
  • Thousands of atoms (1,000 - 100,000)
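A minimal sketch of the per-timestep update described above, written in plain C++ for illustration (this is not NAMD code; the Atom/Vec3 types and the computeForces callback are assumptions). It uses a simple velocity-Verlet step, with the force evaluation standing in for the bonded and non-bonded computations on this slide.

    #include <vector>

    struct Vec3 { double x, y, z; };
    struct Atom { Vec3 pos, vel, force; double mass; };

    // One velocity-Verlet step: half-kick velocities, drift positions,
    // recompute forces at the new positions, then finish the velocity update.
    void timestep(std::vector<Atom>& atoms, double dt,
                  void (*computeForces)(std::vector<Atom>&)) {
        for (Atom& a : atoms) {
            a.vel.x += 0.5 * dt * a.force.x / a.mass;
            a.vel.y += 0.5 * dt * a.force.y / a.mass;
            a.vel.z += 0.5 * dt * a.force.z / a.mass;
            a.pos.x += dt * a.vel.x;
            a.pos.y += dt * a.vel.y;
            a.pos.z += dt * a.vel.z;
        }
        computeForces(atoms);                 // bonded + non-bonded forces
        for (Atom& a : atoms) {
            a.vel.x += 0.5 * dt * a.force.x / a.mass;
            a.vel.y += 0.5 * dt * a.force.y / a.mass;
            a.vel.z += 0.5 * dt * a.force.z / a.mass;
        }
    }

With a 1 femtosecond timestep, millions of such steps are needed to reach interesting timescales.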

3
Further MD
  • Use of a cut-off radius to reduce work (see the sketch after this list)
  • 8 - 14 Å
  • Faraway charges ignored!
  • 80-95% of the work is non-bonded force computation
  • Some simulations need faraway contributions
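A hedged sketch of the cutoff test in plain C++ (illustrative names, not NAMD code): only atom pairs within the cutoff radius contribute non-bonded forces, which is where 80-95% of the work concentrates. A production code would use cell or neighbor lists rather than this O(N^2) scan.

    #include <vector>

    struct Pos { double x, y, z; };

    // Count the atom pairs that fall within the cutoff radius; a real code
    // would evaluate electrostatic and van der Waals terms for each such pair.
    int pairsWithinCutoff(const std::vector<Pos>& p, double cutoff) {
        int count = 0;
        double c2 = cutoff * cutoff;
        for (size_t i = 0; i < p.size(); ++i)
            for (size_t j = i + 1; j < p.size(); ++j) {
                double dx = p[i].x - p[j].x;
                double dy = p[i].y - p[j].y;
                double dz = p[i].z - p[j].z;
                if (dx * dx + dy * dy + dz * dz <= c2)
                    ++count;
            }
        return count;
    }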

4
Scalability
  • The program should scale up to use a large number
    of processors.
  • But what does that mean?
  • An individual simulation isn't truly scalable
  • Better definition of scalability
  • If I double the number of processors, I should
    be able to retain parallel efficiency by
    increasing the problem size

5
Isoefficiency
  • Quantify scalability
  • How much increase in problem size is needed to
    retain the same efficiency on a larger machine?
  • Efficiency = Seq. Time / (P × Parallel Time)
  • Parallel time = computation + communication + idle
    (a small worked example follows this list)
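As a worked example with hypothetical numbers: if the sequential run takes 100 s and 16 processors finish in 8 s, efficiency = 100 / (16 × 8) ≈ 0.78. Isoefficiency then asks how much the problem size must grow so that, say, 32 processors still reach that same 0.78 efficiency.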

6
Traditional Approaches
  • Replicated Data
  • All atom coordinates stored on each processor
  • Non-bonded Forces distributed evenly
  • Analysis: assume N atoms, P processors
  • Computation: O(N/P)
  • Communication: O(N log P)
  • Communication/Computation ratio: O(P log P) (derived below)
  • Fraction of communication increases with number
    of processors, independent of problem size!
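The ratio follows directly from the two estimates above: communication / computation = O(N log P) / O(N/P) = O(P log P). Since N cancels, the communication fraction grows with P no matter how large the problem is, which is why this approach is not scalable.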

Not Scalable
7
Atom decomposition
  • Partition the Atoms array across processors
  • Nearby atoms may not be on the same processor
  • Communication: O(N) per processor
  • Communication/Computation ratio: O(P)

Not Scalable
8
Force Decomposition
  • Distribute force matrix to processors
  • Matrix is sparse and non-uniform
  • Each processor has one block
  • Communication: O(N/sqrt(P)) (see the note below)
  • Communication/Computation ratio: O(sqrt(P))
  • Better scalability (can use 100 processors)
  • Hwang, Saltz, et al.:
  • speedup of 6 on 32 PEs, 36 on 128 processors
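A brief justification, assuming the standard force-decomposition analysis (an assumption, not stated on the slide): splitting the N × N force matrix into sqrt(P) × sqrt(P) blocks means each processor needs the coordinates of roughly 2N/sqrt(P) atoms (its block's rows and columns), so communication is O(N/sqrt(P)); dividing by the O(N/P) computation per processor gives the O(sqrt(P)) ratio, which still grows with P.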

Not Scalable
9
Spatial Decomposition
  • Allocate close-by atoms to the same processor
  • Three variations possible:
  • Partitioning into P boxes, 1 per processor
  • Good scalability, but hard to implement
  • Partitioning into fixed-size boxes, each a little
    larger than the cutoff distance
  • Partitioning into smaller boxes
  • Communication: O(N/P) (see the box-assignment sketch below)
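A minimal sketch of the second variation in plain C++ (illustrative names, not NAMD's): atoms are binned into boxes whose edge is at least the cutoff distance, so every interaction partner of an atom lies in its own box or one of the 26 neighboring boxes.

    #include <array>
    #include <cmath>
    #include <map>
    #include <vector>

    struct Pos { double x, y, z; };

    // Map each atom index to the box (patch) containing it.
    std::map<std::array<int, 3>, std::vector<int>>
    assignToBoxes(const std::vector<Pos>& atoms, double boxEdge) {
        std::map<std::array<int, 3>, std::vector<int>> boxes;
        for (int i = 0; i < (int)atoms.size(); ++i) {
            std::array<int, 3> key = {
                (int)std::floor(atoms[i].x / boxEdge),
                (int)std::floor(atoms[i].y / boxEdge),
                (int)std::floor(atoms[i].z / boxEdge)};
            boxes[key].push_back(i);
        }
        return boxes;
    }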

10
Spatial Decomposition in NAMD
  • NAMD 1 used spatial decomposition
  • Good theoretical isoefficiency, but for a fixed
    size system, load balancing problems
  • For midsize systems, got good speedups up to 16
    processors.
  • Use the symmetry of Newton's 3rd law to
    facilitate load balancing

11
Spatial Decomposition
12

Spatial Decomposition
13
FD + SD
  • Now, we have many more objects to load balance
  • Each diamond can be assigned to any processor
  • Number of diamonds (3D) = 14 × number of patches
  • (each patch computes with itself and, by Newton's 3rd
    law, with only half of its 26 neighbors: 1 + 13 = 14)

14
Bond Forces
  • Multiple types of forces
  • Bonds (2), Angles (3), Dihedrals (4), ...
  • Luckily, each involves atoms in neighboring
    patches only
  • Straightforward implementation
  • Send message to all neighbors,
  • receive forces from them
  • 26 × 2 messages per patch! (see the neighbor-offset sketch below)
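For reference, a small illustrative snippet (not NAMD code) enumerating the 26 one-away neighbor offsets of a patch in 3D, which is where the 26 × 2 message count comes from.

    #include <array>
    #include <vector>

    // The 26 neighbor offsets of a patch in a 3D grid (27 cells minus the patch itself).
    std::vector<std::array<int, 3>> neighborOffsets() {
        std::vector<std::array<int, 3>> offsets;
        for (int dx = -1; dx <= 1; ++dx)
            for (int dy = -1; dy <= 1; ++dy)
                for (int dz = -1; dz <= 1; ++dz)
                    if (dx != 0 || dy != 0 || dz != 0)
                        offsets.push_back({dx, dy, dz});
        return offsets;   // 26 entries
    }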

15
Bonded Forces
  • Assume one patch per processor

(Figure: bonded-force example involving A, B, and C across neighboring patches)
16
Implementation
  • Multiple Objects per processor
  • Different types: patches, pairwise forces, bonded
    forces, ...
  • Each may have its data ready at different times
  • Need ability to map and remap them
  • Need prioritized scheduling
  • Charm++ supports all of these

17
Charm++
  • Data Driven Objects
  • Object Groups
  • global object with a representative on each PE
  • Asynchronous method invocation
  • Prioritized scheduling
  • Mature, robust, portable
  • http://charm.cs.uiuc.edu

18
Data driven execution
(Figure: each processor runs a scheduler fed by its own message queue; a sketch of this loop follows)
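A minimal sketch of the data-driven execution model in plain C++ (Charm++'s actual runtime is far more elaborate; the Message type and schedulerLoop name are assumptions for illustration). Each processor's scheduler repeatedly picks the highest-priority pending message and invokes the corresponding method on the target object, so execution order is driven by data availability rather than a fixed program order.

    #include <functional>
    #include <queue>
    #include <vector>

    struct Message {
        int priority;                     // smaller value = runs earlier
        std::function<void()> invoke;     // bound target object + entry method + data
        bool operator<(const Message& o) const { return priority > o.priority; }
    };

    // One scheduler per processor: drain the message queue in priority order.
    void schedulerLoop(std::priority_queue<Message>& queue) {
        while (!queue.empty()) {
            Message m = queue.top();
            queue.pop();
            m.invoke();                   // the message triggers the method
        }
    }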
19
Load Balancing
  • Is a major challenge for this application
  • especially for a large number of processors
  • Unpredictable workloads
  • Each diamond (force object) and patch encapsulates
    a variable amount of work
  • Static estimates are inaccurate
  • Measurement-based load balancing (a sketch of one
    such heuristic follows this list)
  • Loads vary only very slowly across timesteps, so
    recent measurements are good predictors
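A hedged sketch of one possible measurement-based rebalancing heuristic in plain C++ (a simple greedy assignment; the balancers actually used here may differ): objects are ordered by their measured cost from recent cycles and each is placed on the currently least-loaded processor.

    #include <algorithm>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    // objs[k] = {measured cost, object id}, with ids in [0, objs.size()).
    // Returns mapping[objectId] = destination processor.
    std::vector<int> greedyRebalance(std::vector<std::pair<double, int>> objs,
                                     int numProcs) {
        std::sort(objs.rbegin(), objs.rend());           // heaviest objects first
        // Min-heap of (current load, processor id).
        std::priority_queue<std::pair<double, int>,
                            std::vector<std::pair<double, int>>,
                            std::greater<>> procs;
        for (int p = 0; p < numProcs; ++p) procs.push({0.0, p});

        std::vector<int> mapping(objs.size());
        for (const auto& [cost, id] : objs) {
            auto [load, p] = procs.top();                 // least-loaded processor
            procs.pop();
            mapping[id] = p;
            procs.push({load + cost, p});
        }
        return mapping;
    }

This ignores the communication term; slide 20's bipartite-graph formulation is what brings communication into the objective.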

20
Bipartite graph balancing
  • Background load
  • Patches and angle forces
  • Migratable load
  • Non-bonded forces
  • Bipartite communication graph
  • between migratable and non-migratable objects
  • Challenge
  • Balance load while minimizing communication

21
Load balancing
  • Collect timing data for several cycles
  • Run heuristic load balancer
  • Several alternative ones
  • Re-map and migrate objects accordingly
  • Registration mechanisms facilitate migration

22
(No Transcript)
23
Performance: size of system
24
Performance: various machines
25
Speedup