Flexibility and Interoperability in a Parallel MD code presentation

Title: Flexibility and Interoperability in a Parallel MD code

1
Flexibility and Interoperability in a Parallel MD code

Robert Brunner,
Laxmikant Kale,
Jim Phillips
University of Illinois at Urbana-Champaign

2
Contributors

Principal investigators
Laxmikant Kale, Klaus Schulten, Robert Skeel
Development team
Milind Bhandarkar, Robert Brunner, Attila Gursoy,
Neal Krawetz, Ari Shinozaki, ...

3
Middle layers
Applications
Middle Layers: Languages, Tools, Libraries
Parallel Machines
4
(No Transcript)
5
Molecular Dynamics

Collection of charged atoms, with bonds
Newtonian mechanics
At each time-step
Calculate forces on each atom
bonds
non-bonded electrostatic and van der Waals
Calculate velocities and Advance positions
1 femtosecond time-step, millions needed!
Thousands of atoms (1,000 - 100,000)
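
The loop on this slide can be sketched in a few lines. This is only a toy illustration in one dimension, assuming a harmonic bond term and a bare Coulomb term with all constants set to 1. The function names (bond_forces, coulomb_forces, md_step) are hypothetical, the update is a plain semi-implicit Euler step rather than the integrator of any real MD code, and van der Waals forces are left out.

```python
def bond_forces(pos, bonds, k=1.0, r0=1.0):
    """Harmonic bond forces in 1D: each bond (i, j) is pulled toward rest length r0."""
    forces = [0.0] * len(pos)
    for i, j in bonds:
        d = pos[j] - pos[i]
        direction = 1.0 if d > 0 else -1.0
        stretch = abs(d) - r0
        forces[i] += k * stretch * direction
        forces[j] -= k * stretch * direction
    return forces


def coulomb_forces(pos, charges):
    """Non-bonded electrostatic forces in 1D (toy units, no cut-off)."""
    forces = [0.0] * len(pos)
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            d = pos[j] - pos[i]
            direction = 1.0 if d > 0 else -1.0
            magnitude = charges[i] * charges[j] / (d * d)
            # Like charges repel: atom i is pushed away from atom j.
            forces[i] -= magnitude * direction
            forces[j] += magnitude * direction
    return forces


def md_step(pos, vel, mass, charges, bonds, dt):
    """One time-step: compute forces, then update velocities and positions in place."""
    fb = bond_forces(pos, bonds)
    fn = coulomb_forces(pos, charges)
    for i in range(len(pos)):
        accel = (fb[i] + fn[i]) / mass[i]
        vel[i] += accel * dt
        pos[i] += vel[i] * dt
```

A real run repeats md_step millions of times, which is why the per-step force calculation dominates the cost.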

7
Further MD

Use of cut-off radius to reduce work
8 - 14 Å
Faraway charges ignored!
80-95% of the work is non-bonded force computation
Some simulations need faraway contributions
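
As a rough illustration of the cut-off idea, the sketch below keeps only the atom pairs that lie within a given radius. The function name pairs_within_cutoff is hypothetical, and the all-pairs distance check is used only for brevity; it is not how a production code finds neighbours.

```python
import math


def pairs_within_cutoff(positions, cutoff):
    """Return index pairs (i, j), i < j, whose distance is at most cutoff."""
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) <= cutoff:
                pairs.append((i, j))
    return pairs


# Three atoms on a line (coordinates in Å); only the first two are within 12 Å.
atoms = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
near = pairs_within_cutoff(atoms, 12.0)
```

Pairs outside the radius contribute nothing in this scheme, which is the "faraway charges ignored" trade-off named on the slide.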

8
NAMD Design Objectives

Performance
Scalability
To a small and large number of processors
small and large molecular systems
Modifiable and extensible design
Ability to incorporate new algorithms
Reusing new libraries without re-implementation
Experimenting with alternate strategies

9
Force Decomposition
Distribute force matrix to processors
Matrix is sparse, non-uniform
Each processor has one block
Communication: N/sqrt(P)
Ratio: sqrt(P)
Better scalability (can use 100 processors)
Hwang, Saltz, et al.: 6 on 32 PEs, 36 on 128 processors
Not scalable
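
The scaling figures above can be checked with a little arithmetic. The sketch below assumes that "Ratio" means communication divided by computation per processor, and that computation per processor scales as N/P; both are assumptions of this example, not statements from the slide. The function name is hypothetical.

```python
import math


def force_decomposition_costs(n_atoms, n_procs):
    """Per-processor cost estimates, up to constant factors."""
    communication = n_atoms / math.sqrt(n_procs)  # from the slide: N/sqrt(P)
    computation = n_atoms / n_procs               # assumption: work divides evenly
    return communication, computation, communication / computation


comm, comp, ratio = force_decomposition_costs(10_000, 100)
```

Under these assumptions the ratio grows as sqrt(P), so communication overtakes computation as processors are added.
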
10
Spatial Decomposition
11
Spatial decomposition modified
12
Implementation

Multiple Objects per processor
Different types: patches, pairwise forces, bonded forces
Each may have its data ready at different times
Need ability to map and remap them
Need prioritized scheduling
Charm++ supports all of these

13
Charm++

Data Driven Objects
Object Groups
global object with a representative on each PE
Asynchronous method invocation
Prioritized scheduling
Mature, robust, portable
http://charm.cs.uiuc.edu

14
Data driven execution
(Figure labels: Scheduler, Message Q; one pair per processor)
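
A minimal sketch of data driven execution with prioritized scheduling, for one processor: messages addressed to objects sit in a queue, and the scheduler repeatedly picks the most urgent one and invokes the named method. The class and method names are hypothetical and this is not the Charm runtime API. The sketch assumes that a smaller number means higher priority.

```python
import heapq
import itertools


class Scheduler:
    """One per processor: pops messages by priority and invokes the target method."""

    def __init__(self):
        self._queue = []
        self._order = itertools.count()  # tie-breaker keeps equal priorities in FIFO order

    def send(self, obj, method, *args, priority=0):
        """Asynchronous invocation: enqueue the call and return immediately."""
        heapq.heappush(self._queue, (priority, next(self._order), obj, method, args))

    def run(self):
        """Process messages until the queue is empty."""
        while self._queue:
            _, _, obj, method, args = heapq.heappop(self._queue)
            getattr(obj, method)(*args)


class Recorder:
    """Toy object that records the order in which its method is invoked."""

    def __init__(self):
        self.seen = []

    def note(self, text):
        self.seen.append(text)


sched = Scheduler()
rec = Recorder()
sched.send(rec, "note", "low priority", priority=5)
sched.send(rec, "note", "high priority", priority=1)
sched.run()
```

The sender never waits for the receiver; work runs whenever its message reaches the front of the queue.
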
15
Object oriented design

Two top level classes
Patches: cubes containing atoms
Computes: force calculation
Home patches and Proxy patches
Home patch sends coordinates to proxies, and
receives forces from them
Each compute interacts with local patches only
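
The home/proxy relationship can be sketched as follows. The class and method names are hypothetical, and direct method calls stand in for the messages that would cross processor boundaries in a parallel run.

```python
class ProxyPatch:
    """Local stand-in for a patch whose home is on another processor."""

    def __init__(self, home):
        self.home = home
        self.coords = []

    def receive_coords(self, coords):
        self.coords = list(coords)

    def return_forces(self, forces):
        # In a parallel run this would be a message back to the home processor.
        self.home.add_forces(forces)


class HomePatch:
    """Owns the atoms of one cube; sends coordinates out and accumulates forces."""

    def __init__(self, coords):
        self.coords = list(coords)
        self.forces = [0.0] * len(coords)
        self.proxies = []

    def make_proxy(self):
        proxy = ProxyPatch(self)
        self.proxies.append(proxy)
        return proxy

    def broadcast_coords(self):
        for proxy in self.proxies:
            proxy.receive_coords(self.coords)

    def add_forces(self, forces):
        for i, f in enumerate(forces):
            self.forces[i] += f


home = HomePatch([0.0, 1.0])
p1, p2 = home.make_proxy(), home.make_proxy()
home.broadcast_coords()
p1.return_forces([1.0, -1.0])  # contribution computed next to proxy 1
p2.return_forces([0.5, 0.5])   # contribution computed next to proxy 2
```

A compute object only ever touches the patch or proxy on its own processor, which is what keeps each compute local.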

16
Compute hierarchy

Many compute subclasses
Allow reuse of coordination code
Reuse of bookkeeping tasks
Easy to add new types of force objects
Example: steered molecular dynamics
Implementor focuses on the new force functionality
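
A sketch of how such a hierarchy might look: the base class carries the coordination and bookkeeping, and a subclass supplies only the force routine. All names are hypothetical, and the constant pull on one atom is a stand-in loosely inspired by steered MD, not the actual method.

```python
class Compute:
    """Base class: loops over patches and deposits forces (shared bookkeeping)."""

    def __init__(self, patches):
        self.patches = patches

    def do_work(self):
        for patch in self.patches:
            forces = self.compute_forces(patch["coords"])
            for i, f in enumerate(forces):
                patch["forces"][i] += f

    def compute_forces(self, coords):
        raise NotImplementedError("subclasses supply the force calculation")


class ConstantPullCompute(Compute):
    """New force type: the implementor writes only compute_forces."""

    def __init__(self, patches, atom_index, pull):
        super().__init__(patches)
        self.atom_index = atom_index
        self.pull = pull

    def compute_forces(self, coords):
        forces = [0.0] * len(coords)
        forces[self.atom_index] = self.pull
        return forces


patch = {"coords": [0.0, 1.0, 2.0], "forces": [0.0, 0.0, 0.0]}
ConstantPullCompute([patch], atom_index=1, pull=2.5).do_work()
```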

17
Multi-paradigm programming

Long-range electrostatic interactions
Some simulations require this feature
Contributions of faraway atoms can be computed
infrequently
PVM based library, DPMTA
Developed at Duke, by John Board, et al
Patch life cycle
better expressed as a thread
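
To illustrate why a patch life cycle reads naturally as a thread, the sketch below uses a Python generator as a stand-in for a user-level thread. The function name and log messages are hypothetical. The point is that the per-step sequence (send coordinates, wait for forces, integrate) is written as straight-line code that suspends at the wait.

```python
def patch_life_cycle(name, n_steps, log):
    """One patch's whole life as sequential code; 'yield' is the suspension point."""
    for step in range(n_steps):
        log.append(f"{name}: send coordinates, step {step}")
        forces = yield  # suspend until the forces for this step arrive
        log.append(f"{name}: integrate with {forces}, step {step}")


log = []
patch = patch_life_cycle("patch0", 2, log)
next(patch)            # run up to the first wait
patch.send([0.1])      # deliver forces: integrate step 0, then wait in step 1
try:
    patch.send([0.2])  # deliver forces: integrate step 1, then the thread ends
except StopIteration:
    pass
```

Without a thread, the same logic has to be split into separate handlers with the step state stored by hand between them.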

18
Converse

Supports multi-paradigm programming
Provides portability
Makes it easy to implement RTS for new paradigms
Several languages/libraries
Charm++, threaded MPI, PVM, Java, md-perl, pC++, Nexus, Path, Cid, CC++, ...

19
NAMD2 with Converse
20
Separation of concerns

Different developers, with different interests
and knowledge, can contribute effectively
Separation of communication and parallel logic
Threads to encapsulate life-cycle of patches
Adding new integrator, improving performance, new
MD ideas, can be performed modularly and
independently

21
Load balancing

Collect timing data for several cycles
Run heuristic load balancer
Several alternative ones
Re-map and migrate objects accordingly
Registration mechanisms facilitate migration
Needs a separate talk!
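
One possible heuristic, shown only as an illustration: a greedy pass that places the heaviest measured objects first, each on the currently least-loaded processor. The slides do not say which heuristics are used, so the strategy, the function name, and the sample timings below are all assumptions.

```python
import heapq


def greedy_balance(object_loads, n_procs):
    """Map objects to processors from measured loads; returns (assignment, per-proc load)."""
    heap = [(0.0, proc) for proc in range(n_procs)]  # (current load, processor id)
    assignment = {}
    by_weight = sorted(object_loads.items(), key=lambda kv: kv[1], reverse=True)
    for obj, load in by_weight:
        total, proc = heapq.heappop(heap)  # least-loaded processor so far
        assignment[obj] = proc
        heapq.heappush(heap, (total + load, proc))
    loads = [0.0] * n_procs
    for obj, proc in assignment.items():
        loads[proc] += object_loads[obj]
    return assignment, loads


# Hypothetical timings (seconds per cycle) collected for four objects.
measured = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}
assignment, loads = greedy_balance(measured, 2)
```

The resulting assignment is what the re-map and migrate step would then act on.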

22
Performance: size of system
23
Performance: various machines
24
Speedup
25
Conclusion

Multi-domain decomposition works well for dynamically evolving or irregular apps
When supported by data driven objects (Charm++), user level threads, callbacks
Multi-paradigm programming is effective!
Object oriented parallel programming promotes reuse, good performance
Measurement based load balancing
