Transcript and Presenter's Notes

Title: Parallel Molecular Dynamics


1
Parallel Molecular Dynamics
  • Application Oriented
  • Computer Science Research
  • Laxmikant Kale
  • http://charm.cs.uiuc.edu

2
Outline
  • What is needed for HPC to succeed?
  • Parallelization of Molecular Dynamics
  • Aggressive Parallel decomposition
  • Load Balancing and performance
  • Multiparadigm programming
  • Collaborative Interdisciplinary Research
  • Comments and lessons

3
Contributors
  • PIs
  • Laxmikant Kale, Klaus Schulten, Robert Skeel
  • NAMD 1
  • Robert Brunner, Andrew Dalke, Attila Gursoy,
    Bill Humphrey, Mark Nelson
  • NAMD2
  • M. Bhandarkar, R. Brunner, A. Gursoy, J. Phillips,
    N. Krawetz, A. Shinozaki, K. Varadarajan, ...

4
Parallel Computing Research
  • Trends
  • application-centered CS research
  • isolated CS research
  • Drawbacks of both
  • Needed
  • Computer-science-centered, yet application-oriented
    research

5
Middle layers
(Diagram: a layered stack with Applications at the top, the
middle layers of languages, tools, and libraries in between,
and parallel machines at the bottom)
6
(No Transcript)
7
Molecular Dynamics
  • Collection of charged atoms, with bonds
  • Newtonian mechanics
  • At each time-step (sketched below)
  • Calculate forces on each atom
  • bonds
  • non-bonded: electrostatic and van der Waals
  • Calculate velocities and advance positions
  • 1 femtosecond time-step, millions needed!
  • Thousands of atoms (1,000 - 100,000)
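A minimal sketch of the time-step loop described above, assuming flat arrays of positions, velocities, forces, and masses; the names (Vec3, computeForces, timestep) are hypothetical and the force routine is a placeholder, not NAMD's actual code:

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { double x = 0, y = 0, z = 0; };

// Placeholder for the force evaluation: a real MD code computes bonded
// terms (bonds, angles, dihedrals) and non-bonded terms (electrostatic,
// van der Waals) here.
void computeForces(const std::vector<Vec3>& pos, std::vector<Vec3>& force) {
  for (auto& f : force) f = Vec3{};  // zero forces; real terms omitted
  (void)pos;
}

// One time step: forces -> velocities -> positions.  dt is on the order
// of 1 femtosecond, so millions of steps are needed per simulation.
void timestep(std::vector<Vec3>& pos, std::vector<Vec3>& vel,
              std::vector<Vec3>& force, const std::vector<double>& mass,
              double dt) {
  computeForces(pos, force);
  for (std::size_t i = 0; i < pos.size(); ++i) {
    vel[i].x += dt * force[i].x / mass[i];
    vel[i].y += dt * force[i].y / mass[i];
    vel[i].z += dt * force[i].z / mass[i];
    pos[i].x += dt * vel[i].x;
    pos[i].y += dt * vel[i].y;
    pos[i].z += dt * vel[i].z;
  }
}
```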

8
Further MD
  • Use of a cut-off radius to reduce work (see the
    pairwise loop below)
  • 8 - 14 Å
  • Faraway charges ignored!
  • 80-95% of the work is non-bonded force computation
  • Some simulations need faraway contributions
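A sketch of the cutoff idea in its simplest form, assuming point charges and showing only a Coulomb-like term in arbitrary units; the names (Atom, nonbondedEnergy) are hypothetical, and real codes avoid the O(N²) pair scan using the decompositions discussed on the following slides:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Atom { double x, y, z, charge; };

// Naive non-bonded loop with a cutoff: pairs farther apart than `cutoff`
// (typically 8-14 Angstroms) are skipped, i.e. faraway charges are ignored.
double nonbondedEnergy(const std::vector<Atom>& atoms, double cutoff) {
  const double cutoff2 = cutoff * cutoff;
  double energy = 0.0;
  for (std::size_t i = 0; i < atoms.size(); ++i) {
    for (std::size_t j = i + 1; j < atoms.size(); ++j) {
      const double dx = atoms[i].x - atoms[j].x;
      const double dy = atoms[i].y - atoms[j].y;
      const double dz = atoms[i].z - atoms[j].z;
      const double r2 = dx * dx + dy * dy + dz * dz;
      if (r2 > cutoff2) continue;  // outside the cutoff radius
      energy += atoms[i].charge * atoms[j].charge / std::sqrt(r2);
    }
  }
  return energy;
}
```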

9
Scalability
  • The program should scale up to use a large number
    of processors.
  • But what does that mean?
  • An individual simulation isn't truly scalable
  • Better definition of scalability
  • If I double the number of processors, I should
    be able to retain parallel efficiency by
    increasing the problem size

10
Isoefficiency
  • Quantify scalability
  • How much increase in problem size is needed to
    retain the same efficiency on a larger machine?
  • Efficiency = Seq. Time / (P × Parallel Time)
  • Parallel time = computation + communication + idle
    (written out below)
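Written out, with N the problem size (number of atoms) and P the number of processors:

```latex
E(N,P) = \frac{T_{\mathrm{seq}}(N)}{P \, T_{\mathrm{par}}(N,P)},
\qquad
T_{\mathrm{par}} = T_{\mathrm{comp}} + T_{\mathrm{comm}} + T_{\mathrm{idle}}
```

The isoefficiency function is the rate at which N must grow as P grows so that E(N, P) stays constant; the slower the required growth, the more scalable the method.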

11
Traditional Approaches
  • Replicated Data
  • All atom coordinates stored on each processor
  • Non-bonded Forces distributed evenly
  • Analysis: assume N atoms, P processors
  • Computation: O(N/P)
  • Communication: O(N log P)
  • Communication/Computation ratio: P log P (derived below)
  • Fraction of communication increases with number
    of processors, independent of problem size!

Not Scalable
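The ratio follows directly: each processor performs O(N/P) of the force work but exchanges O(N log P) data to keep every coordinate replicated, so

```latex
\frac{T_{\mathrm{comm}}}{T_{\mathrm{comp}}}
  = \frac{O(N \log P)}{O(N/P)}
  = O(P \log P)
```

Because the ratio grows with P and is independent of N, no increase in problem size can restore efficiency, hence the verdict above.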
12
Atom decomposition
  • Partition the Atoms array across processors
  • Nearby atoms may not be on the same processor
  • Communication: O(N) per processor
  • Communication/Computation ratio: O(P)

Not Scalable
13
Force Decomposition
  • Distribute force matrix to processors
  • Matrix is sparse, non-uniform
  • Each processor has one block
  • Communication: O(N/√P)
  • Ratio: O(√P) (derived below)
  • Better scalability (can use 100 processors)
  • Hwang, Saltz, et al
  • speedup of 6 on 32 PEs, 36 on 128 processors

Not Scalable
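With a cutoff the total force work is O(N), so per-processor computation is O(N/P), while each processor needs the coordinates of roughly N/√P atoms:

```latex
\frac{T_{\mathrm{comm}}}{T_{\mathrm{comp}}}
  = \frac{O(N/\sqrt{P})}{O(N/P)}
  = O(\sqrt{P})
```

The ratio still grows with P, only more slowly, which is why force decomposition helps up to on the order of a hundred processors yet is still marked not scalable.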
14
Spatial Decomposition
  • Allocate nearby atoms to the same processor
  • Three variations possible
  • Partitioning into P boxes, 1 per processor
  • Good scalability, but hard to implement
  • Partitioning into fixed-size boxes, each a little
    larger than the cutoff distance (sketched below)
  • Partitioning into smaller boxes
  • Communication: O(N/P)
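A minimal sketch of the fixed-size-box variant, assuming a cubic domain with a common origin per axis; the names (patchIndex, patchSize) are hypothetical. Choosing the box edge no smaller than the cutoff guarantees that an atom interacts only with atoms in its own patch or one of the 26 neighboring patches.

```cpp
#include <array>
#include <cmath>

// Map an atom's coordinates to the integer index of its patch (box).
// patchSize >= cutoff distance ensures neighbor-only interactions.
std::array<int, 3> patchIndex(double x, double y, double z,
                              double origin, double patchSize) {
  return { static_cast<int>(std::floor((x - origin) / patchSize)),
           static_cast<int>(std::floor((y - origin) / patchSize)),
           static_cast<int>(std::floor((z - origin) / patchSize)) };
}
```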

15
Spatial Decomposition in NAMD
  • NAMD 1 used spatial decomposition
  • Good theoretical isoefficiency, but load balancing
    problems for a fixed-size system
  • For midsize systems, got good speedups up to 16
    processors.
  • Use the symmetry of Newton's 3rd law to
    facilitate load balancing

16
Spatial Decomposition
17

Spatial Decomposition
18
FD + SD
  • Now, we have many more objects to load balance
  • Each diamond can be assigned to any processor
  • Number of diamonds (3D)
  • 14 × Number of Patches (one self-interaction per patch
    plus half of its 26 neighbor pairs: 1 + 13 = 14)

19
Bond Forces
  • Multiple types of forces
  • Bonds (2), Angles (3), Dihedrals (4), ...
  • Luckily, each involves atoms in neighboring
    patches only
  • Straightforward implementation
  • Send message to all neighbors,
  • receive forces from them
  • 26 × 2 messages per patch!

20
Bonded Forces
  • Assume one patch per processor

(Diagram: three neighboring patches labeled A, B, and C)
21
Implementation
  • Multiple objects per processor
  • Different types: patches, pairwise forces, bonded
    forces, ...
  • Each may have its data ready at different times
  • Need the ability to map and remap them
  • Need prioritized scheduling
  • Charm++ supports all of these

22
Charm++
  • Data-driven objects
  • Object groups
  • a global object with a representative on each PE
  • Asynchronous method invocation
  • Prioritized scheduling
  • Mature, robust, portable
  • http://charm.cs.uiuc.edu

23
Data-driven execution
(Diagram: per-processor schedulers, each driven by its own
message queue)
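A toy illustration of the picture above, not the actual Charm++ runtime or its API; the types (Message, Scheduler) are hypothetical. Each processor's scheduler repeatedly pulls the highest-priority message from its queue and invokes the corresponding object's method, so objects execute only when their data has arrived.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// A message carries a priority and a bound invocation
// (target object + entry method + arguments).
struct Message {
  int priority;                  // smaller value = scheduled earlier
  std::function<void()> invoke;
};

struct LaterFirst {
  bool operator()(const Message& a, const Message& b) const {
    return a.priority > b.priority;  // lowest priority value on top
  }
};

// One scheduler per processor, driven entirely by its message queue.
class Scheduler {
 public:
  void enqueue(Message m) { queue_.push(std::move(m)); }

  void run() {                   // process messages until none remain
    while (!queue_.empty()) {
      Message m = queue_.top();
      queue_.pop();
      m.invoke();                // data-driven: run the target's method
    }
  }

 private:
  std::priority_queue<Message, std::vector<Message>, LaterFirst> queue_;
};
```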
24
Load Balancing
  • Is a major challenge for this application
  • especially for a large number of processors
  • Unpredictable workloads
  • Each diamond (force object) and patch encapsulates a
    variable amount of work
  • Static estimates are inaccurate
  • Measurement-based load balancing
  • Loads vary only slowly across timesteps, so recent
    measurements predict future load well

25
Bipartite graph balancing
  • Background load
  • Patches and angle forces
  • Migratable load
  • Non-bonded forces
  • Bipartite communication graph
  • between migratable and non-migratable objects
  • Challenge
  • Balance Load while minimizing communication

26
Load balancing
  • Collect timing data for several cycles
  • Run a heuristic load balancer (one simple greedy
    variant is sketched below)
  • Several alternative strategies exist
  • Re-map and migrate objects accordingly
  • Registration mechanisms facilitate migration
  • Needs a separate talk!
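A sketch of one plausible greedy heuristic, assuming only measured per-object times are available; the names (greedyRemap, measuredTime) are hypothetical, this is not NAMD's actual balancer, and, unlike the bipartite-graph formulation on the previous slide, it ignores communication and the non-migratable background load.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Assign each migratable object to a processor: heaviest objects first,
// each placed on the currently least-loaded processor.
std::vector<int> greedyRemap(const std::vector<double>& measuredTime,
                             int numProcs) {
  std::vector<int> order(measuredTime.size());
  for (std::size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
  std::sort(order.begin(), order.end(),
            [&](int a, int b) { return measuredTime[a] > measuredTime[b]; });

  // Min-heap of (accumulated load, processor id).
  using Proc = std::pair<double, int>;
  std::priority_queue<Proc, std::vector<Proc>, std::greater<Proc>> procs;
  for (int p = 0; p < numProcs; ++p) procs.push({0.0, p});

  std::vector<int> assignment(measuredTime.size());
  for (int obj : order) {
    Proc least = procs.top();        // least-loaded processor so far
    procs.pop();
    assignment[obj] = least.second;  // migrate obj there
    least.first += measuredTime[obj];
    procs.push(least);
  }
  return assignment;                 // assignment[obj] = destination PE
}
```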

27
Before and After
28
Before and After
29
(No Transcript)
30
Performance: size of system
31
Performance: various machines
32
Speedup
33
Multi-paradigm programming
  • Long-range electrostatic interactions
  • Some simulations require this
  • Contributions of faraway atoms can be calculated
    infrequently
  • PVM-based library, DPMTA
  • developed at Duke by John Board et al.
  • Patch life cycle
  • Better expressed as a thread

34
Converse
  • Supports multi-paradigm programming
  • Provides portability
  • Makes it easy to implement RTS for new paradigms
  • Several languages/libraries
  • Charm++, threaded MPI, PVM, Java, md-perl, pc++,
    Nexus, Path, Cid, CC++, DP, Agents, ...

35
NAMD2 with Converse
36
NAMD2
  • In production use
  • Internally for about a year
  • Several simulations completed/published
  • Fastest MD program? We think so
  • Modifiable/extensible
  • Steered MD
  • Free energy calculations

37
Lessons for CSE
  • Technical lessons
  • Multiple-domain (patch) decomposition provides
    necessary flexibility
  • Data-driven objects and threads are a great combination
  • Measurement-based load balancing beats static estimates
  • Multi-paradigm parallel programming works!
  • Integrate independently developed libraries
  • Use appropriate paradigm for each component

38
Real Application?
  • Drawbacks
  • Need to spend effort on mundane details not
    germane to CS research
  • A production program complicates the code structure

39
Real Application for CS research?
  • Benefits
  • Subtle and complex research problems uncovered
    only with a real application
  • Satisfaction of a real, concrete contribution
  • With careful planning, you can truly enrich the
    middle layers
  • Bring back a rich variety of relevant CS problems
  • Apply to other domains: rockets? casting?

40
Collaboration lessons
  • Use conservative methods...
  • C++: fashionable vs. conservative
  • Aggressive methods where they matter
  • Account for differing priorities and objectives