Title: Flexibility and Interoperability in a Parallel MD code
1Flexibility and Interoperability in a Parallel MD code
- Robert Brunner,
- Laxmikant Kale,
- Jim Phillips
- University of Illinois at Urbana-Champaign
2Contributors
- Principal investigators
- Laxmikant Kale, Klaus Schulten, Robert Skeel
- Development team
- Milind Bhandarkar, Robert Brunner, Attila Gursoy, Neal Krawetz, Ari Shinozaki, ...
3Middle layers
Applications
Middle Layers: Languages, Tools, Libraries
Parallel Machines
5Molecular Dynamics
- Collection of charged atoms, with bonds
- Newtonian mechanics
- At each time-step
- Calculate forces on each atom
- bonds
- non-bonded: electrostatic and van der Waals
- Calculate velocities and advance positions
- 1 femtosecond time-step, millions needed!
- Thousands of atoms (1,000 - 100,000)
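The per-time-step loop (forces, then velocities, then positions) can be sketched as below. This is only an illustration, not NAMD's integrator: the velocity Verlet scheme, the one-dimensional coordinates, and the names `velocity_verlet_step` and `harmonic_forces` are assumptions chosen for brevity.

```python
def velocity_verlet_step(pos, vel, forces, masses, dt, compute_forces):
    """Advance all atoms by one time-step (1-D coordinates for simplicity)."""
    n = len(pos)
    # Half-kick: update velocities using the forces at the current positions.
    half_vel = [vel[i] + 0.5 * dt * forces[i] / masses[i] for i in range(n)]
    # Drift: advance positions with the half-updated velocities.
    new_pos = [pos[i] + dt * half_vel[i] for i in range(n)]
    # Recompute forces at the new positions, then apply the second half-kick.
    new_forces = compute_forces(new_pos)
    new_vel = [half_vel[i] + 0.5 * dt * new_forces[i] / masses[i] for i in range(n)]
    return new_pos, new_vel, new_forces


def harmonic_forces(pos, k=1.0):
    """Toy force field (an assumption): a spring ties each atom to the origin."""
    return [-k * x for x in pos]
```

A real run repeats this step millions of times, which is why the force calculation inside it dominates the cost.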
6Further MD
- Use of cut-off radius to reduce work
- 8 - 14 Å
- Faraway charges ignored!
- 80-95% of the work is non-bonded force computations
- Some simulations need faraway contributions
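The effect of a cut-off radius can be illustrated with a small sketch. It assumes positions are 3-D tuples in Å and uses a plain all-pairs loop for clarity; the function name `pairs_within_cutoff` is hypothetical, and production codes avoid the quadratic scan.

```python
import math


def pairs_within_cutoff(positions, cutoff):
    """Return index pairs (i, j), i < j, whose distance is within the cut-off."""
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            # Atoms farther apart than the cut-off are simply ignored.
            if math.dist(positions[i], positions[j]) <= cutoff:
                pairs.append((i, j))
    return pairs
```

Only the returned pairs contribute non-bonded forces; every other pair is skipped, which is where the saving comes from.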
7NAMD Design Objectives
- Performance
- Scalability
- To a small and large number of processors
- small and large molecular systems
- Modifiable and extensible design
- Ability to incorporate new algorithms
- Reusing new libraries without re-implementation
- Experimenting with alternate strategies
8Force Decomposition
- Distribute force matrix to processors
- Matrix is sparse, non-uniform
- Each processor has one block
- Communication: N/sqrt(P)
- Ratio: sqrt(P)
- Better scalability (can use 100 processors)
- Hwang, Saltz, et al.: 6 on 32 PEs, 36 on 128 processors
- Not scalable
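The scaling figures above can be checked with a few lines of arithmetic. This sketch reads "Ratio" as communication divided by computation per processor and assumes the per-processor computation is proportional to N/P; both readings, and the function name, are assumptions rather than statements from the slide.

```python
import math


def force_decomposition_costs(n_atoms, n_procs):
    """Return (communication, computation, ratio) per processor, in arbitrary units."""
    communication = n_atoms / math.sqrt(n_procs)  # N/sqrt(P), as on the slide
    computation = n_atoms / n_procs               # assumed: work divides evenly
    return communication, computation, communication / computation
```

Under these assumptions the ratio grows as sqrt(P) regardless of N, so adding processors steadily raises the relative cost of communication.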
9Spatial Decomposition
10Spatial decomposition modified
11Implementation
- Multiple Objects per processor
- Different types: patches, pairwise forces, bonded forces
- Each may have its data ready at different times
- Need ability to map and remap them
- Need prioritized scheduling
- Charm++ supports all of these
12Charm++
- Data Driven Objects
- Object Groups
- global object with a representative on each PE
- Asynchronous method invocation
- Prioritized scheduling
- Mature, robust, portable
- http://charm.cs.uiuc.edu
13Data driven execution
[Figure: two scheduler and message queue (Message Q) pairs]
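Data-driven execution with prioritized scheduling can be sketched as a scheduler that repeatedly picks a pending message from its queue and invokes the target object. This is a toy model in Python, not the Charm++ runtime or its API: the class names, the `send` signature, and the rule that a smaller number means higher priority are all assumptions.

```python
import heapq
import itertools


class Scheduler:
    """One scheduler per processor, draining a prioritized message queue."""

    def __init__(self):
        self._queue = []
        self._counter = itertools.count()  # tie-breaker: FIFO among equal priorities

    def send(self, priority, obj, method, *args):
        """Asynchronous method invocation: enqueue the message and return at once."""
        heapq.heappush(self._queue, (priority, next(self._counter), obj, method, args))

    def run(self):
        """Deliver messages until the queue is empty, lowest priority value first."""
        while self._queue:
            _, _, obj, method, args = heapq.heappop(self._queue)
            getattr(obj, method)(*args)


class Recorder:
    """Example target object that logs the messages it receives."""

    def __init__(self):
        self.log = []

    def receive(self, tag):
        self.log.append(tag)
```

Because an object runs only when a message for it is delivered, several objects on one processor can each proceed as soon as their own data is ready.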
14Object oriented design
- Two top level classes
- Patches: cubes containing atoms
- Computes: force calculation
- Home patches and Proxy patches
- Home patch sends coordinates to proxies, and receives forces from them
- Each compute interacts with local patches only
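The home/proxy relationship can be sketched as below. The sketch runs in a single process with 1-D coordinates, so "sending" is just a copy; the class layout, method names, and the toy spring force are assumptions for illustration, not NAMD's classes.

```python
class HomePatch:
    """Owns the atoms of one cube of space."""

    def __init__(self, coords):
        self.coords = list(coords)
        self.forces = [0.0] * len(self.coords)
        self.proxies = []

    def send_coordinates(self):
        # Each proxy gets its own copy of the current coordinates.
        for proxy in self.proxies:
            proxy.coords = list(self.coords)

    def receive_forces(self):
        # Sum the partial forces returned by every proxy.
        self.forces = [0.0] * len(self.coords)
        for proxy in self.proxies:
            for i, f in enumerate(proxy.forces):
                self.forces[i] += f


class ProxyPatch:
    """Stand-in for a home patch that lives on another processor."""

    def __init__(self, home):
        self.coords = []
        self.forces = []
        home.proxies.append(self)


class Compute:
    """Force calculation that touches only the patch local to it."""

    def __init__(self, patch):
        self.patch = patch

    def do_work(self):
        # Toy force (an assumption): a spring pulling each atom to the origin.
        self.patch.forces = [-x for x in self.patch.coords]
```

The compute never reaches across to the home patch; it sees only the proxy next to it, which is what keeps each compute's interactions local.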
15Compute hierarchy
- Many compute subclasses
- Allow reuse of coordination code
- reuse of bookkeeping tasks
- Easy to add new type of force object
- Example: steered molecular dynamics
- Implementer focuses on the new force functionality
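The reuse described above can be sketched with a base class that owns the coordination and bookkeeping, so that a new force type only overrides one method. `ComputeBase`, `patch_ready`, `do_force`, and the constant-pull `SteeringCompute` are hypothetical names and a toy force, not NAMD's actual class hierarchy.

```python
class ComputeBase:
    """Shared coordination: wait for all required patches, then compute."""

    def __init__(self, n_patches_needed):
        self.n_patches_needed = n_patches_needed
        self._arrived = []

    def patch_ready(self, coords):
        """Bookkeeping: collect inputs; run once every patch has reported."""
        self._arrived.append(coords)
        if len(self._arrived) < self.n_patches_needed:
            return None
        inputs, self._arrived = self._arrived, []
        return self.do_force(inputs)

    def do_force(self, inputs):
        raise NotImplementedError


class SteeringCompute(ComputeBase):
    """Hypothetical new force type: a constant pull on every atom."""

    def __init__(self, pull):
        super().__init__(n_patches_needed=1)
        self.pull = pull

    def do_force(self, inputs):
        return [[self.pull for _ in coords] for coords in inputs]
```

The subclass contains only the new force functionality; the waiting and resetting logic is inherited unchanged.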
16Multi-paradigm programming
- Long-range electrostatic interactions
- Some simulations require this
- Contributions of faraway atoms can be calculated infrequently
- PVM-based library, DPMTA, developed at Duke by John Board et al.
- Patch life cycle
- Better expressed as a thread
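Why a patch's life cycle reads more naturally as a thread can be shown with a small sketch. Here a Python generator stands in for a user-level thread: the loop is written as straight-line code and suspends wherever it must wait for forces. The function names, the 1-D coordinates, and the simple update rule are assumptions for illustration only.

```python
def patch_life_cycle(coords, vel, n_steps, dt):
    """One patch's life cycle as sequential code (assumes n_steps >= 1).

    Each `yield` hands the coordinates out and suspends until forces arrive,
    much as a thread would block on a receive.
    """
    for _ in range(n_steps):
        forces = yield list(coords)  # send coordinates, wait for forces
        vel = [v + dt * f for v, f in zip(vel, forces)]
        coords = [x + dt * v for x, v in zip(coords, vel)]
    return coords


def drive(patch, compute_forces):
    """Stand-in for the runtime: resume the patch whenever forces are ready."""
    coords = next(patch)
    while True:
        try:
            coords = patch.send(compute_forces(coords))
        except StopIteration as done:
            return done.value
```

Without threads, the same logic has to be split into separate handlers with the loop state carried between them by hand.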
17Converse
- Supports multi-paradigm programming
- Provides portability
- Makes it easy to implement RTS for new paradigms
- Several languages/libraries
- Charm++, threaded MPI, PVM, Java, md-perl, pc, Nexus, Path, Cid, CC, DP, Agents, ...
18Namd2 with Converse
19Separation of concerns
- Different developers, with different interests and knowledge, can contribute effectively
- Separation of communication and parallel logic
- Threads to encapsulate life-cycle of patches
- Adding new integrator, improving performance, new MD ideas, can be performed modularly and independently
20Load balancing
- Collect timing data for several cycles
- Run heuristic load balancer
- Several alternative ones
- Re-map and migrate objects accordingly
- Registration mechanisms facilitate migration
- Needs a separate talk!
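One possible heuristic for the "run heuristic load balancer" step is a greedy assignment: take the measured load of each object and place the heaviest remaining object on the currently least-loaded processor. The slides do not say which heuristics are used, so this greedy strategy and the name `greedy_rebalance` are assumptions.

```python
import heapq


def greedy_rebalance(object_loads, n_procs):
    """Map objects to processors, heaviest object first.

    object_loads: dict of object id -> measured time (e.g. averaged over cycles).
    Returns a dict of object id -> processor index.
    """
    heap = [(0.0, pe) for pe in range(n_procs)]  # (current load, processor)
    heapq.heapify(heap)
    mapping = {}
    for obj, load in sorted(object_loads.items(), key=lambda kv: -kv[1]):
        pe_load, pe = heapq.heappop(heap)  # least-loaded processor so far
        mapping[obj] = pe
        heapq.heappush(heap, (pe_load + load, pe))
    return mapping
```

The returned mapping is what the re-map and migrate step would then act on.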
21Performance: size of system
22Performance: various machines
23Speedup
24Conclusion
- Multi-domain decomposition works well for dynamically evolving, or irregular apps
- When supported by data driven objects (Charm++), user level threads, call backs
- Multi-paradigm programming is effective!
- Object oriented parallel programming
- promotes reuse,
- good performance
- Measurement based load balancing