Title: Flexibility and Interoperability in a Parallel MD code
1Flexibility and Interoperability in a Parallel MD code
- Robert Brunner,
- Laxmikant Kale,
- Jim Phillips
- University of Illinois at Urbana-Champaign
2Contributors
- Principal investigators
- Laxmikant Kale, Klaus Schulten, Robert Skeel
- Development team
- Milind Bhandarkar, Robert Brunner, Attila Gursoy,
Neal Krawetz, Ari Shinozaki, ...
3Middle layers
Applications
Middle Layers: Languages, Tools, Libraries
Parallel Machines
5Molecular Dynamics
- Collection of charged atoms, with bonds
- Newtonian mechanics
- At each time-step
- Calculate forces on each atom
- bonds
- non-bonded: electrostatic and van der Waals
- Calculate velocities and advance positions
- 1 femtosecond time-step, millions needed!
- Thousands of atoms (1,000 - 100,000)
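The per-time-step loop above can be sketched as follows. This is a minimal illustration, not NAMD code: it assumes a velocity-Verlet integrator and a harmonic bond term (the slides name neither), omits non-bonded forces, and uses hypothetical names (`bond_forces`, `md_step`, `k`, `r0`) with arbitrary units.

```python
import math

def bond_forces(pos, bonds, k=1.0, r0=1.0):
    """Harmonic bond forces; k and r0 are illustrative constants."""
    forces = [[0.0, 0.0, 0.0] for _ in pos]
    for i, j in bonds:
        d = [pos[j][a] - pos[i][a] for a in range(3)]
        r = math.sqrt(sum(c * c for c in d))
        scale = k * (r - r0) / r  # positive when the bond is stretched
        for a in range(3):
            forces[i][a] += scale * d[a]  # pulls i toward j when stretched
            forces[j][a] -= scale * d[a]  # equal and opposite force on j
    return forces

def md_step(pos, vel, mass, bonds, dt):
    """One velocity-Verlet time-step: forces, then positions, then velocities."""
    n = len(pos)
    f_old = bond_forces(pos, bonds)
    new_pos = [
        [pos[i][a] + dt * vel[i][a] + 0.5 * dt * dt * f_old[i][a] / mass[i]
         for a in range(3)]
        for i in range(n)
    ]
    f_new = bond_forces(new_pos, bonds)
    new_vel = [
        [vel[i][a] + 0.5 * dt * (f_old[i][a] + f_new[i][a]) / mass[i]
         for a in range(3)]
        for i in range(n)
    ]
    return new_pos, new_vel

# Two bonded atoms, initially stretched beyond r0 and at rest.
pos = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
vel = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
new_pos, new_vel = md_step(pos, vel, [1.0, 1.0], [(0, 1)], dt=0.01)
```

A real run repeats `md_step` millions of times, which is why the cost of the force calculation dominates.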
7Further MD
- Use of cut-off radius to reduce work
- 8 - 14 Å
- Faraway charges ignored!
- 80-95% of the work is non-bonded force computation
- Some simulations need faraway contributions
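The effect of a cut-off radius can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, the pair search is a naive O(N^2) scan rather than anything NAMD does, only a bare Coulomb term is shown, and physical constants and units are omitted.

```python
import math

def pairs_within_cutoff(pos, cutoff):
    """Return index pairs (i, j), i < j, whose distance is below the cut-off."""
    pairs = []
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            if math.dist(pos[i], pos[j]) < cutoff:
                pairs.append((i, j))
    return pairs

def coulomb_energy(pos, charge, cutoff):
    """Sum q_i * q_j / r over pairs inside the cut-off; faraway pairs are ignored."""
    return sum(
        charge[i] * charge[j] / math.dist(pos[i], pos[j])
        for i, j in pairs_within_cutoff(pos, cutoff)
    )

# Three charges on a line; with a cut-off of 12 only the first two interact.
pos = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
charge = [1.0, -1.0, 1.0]
near_pairs = pairs_within_cutoff(pos, 12.0)
energy = coulomb_energy(pos, charge, 12.0)
```

The third charge contributes nothing here, which is exactly the approximation that some simulations cannot accept.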
8NAMD Design Objectives
- Performance
- Scalability
- To small and large numbers of processors
- small and large molecular systems
- Modifiable and extensible design
- Ability to incorporate new algorithms
- Reusing new libraries without re-implementation
- Experimenting with alternate strategies
9Force Decomposition
- Distribute force matrix to processors
- Matrix is sparse, non-uniform
- Each processor has one block
- Communication: N/sqrt(P)
- Ratio: sqrt(P)
- Better scalability (can use 100 processors)
- Hwang, Saltz, et al.: 6 on 32 PEs, 36 on 128 processors
- Not scalable
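One way to read the communication and ratio figures above, assuming the ratio is communication to computation and that a cut-off makes total work grow roughly linearly in N (both are assumptions, not stated on the slide):

```latex
T_{\text{comp}} \propto \frac{N}{P}, \qquad
T_{\text{comm}} \propto \frac{N}{\sqrt{P}}, \qquad
\frac{T_{\text{comm}}}{T_{\text{comp}}} \propto
\frac{N/\sqrt{P}}{N/P} = \sqrt{P}
```

Under this reading the ratio grows with the processor count regardless of N, so communication eventually dominates.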
10Spatial Decomposition
11Spatial decomposition modified
12Implementation
- Multiple Objects per processor
- Different types: patches, pairwise forces, bonded forces
- Each may have its data ready at different times
- Need ability to map and remap them
- Need prioritized scheduling
- Charm++ supports all of these
13Charm++
- Data Driven Objects
- Object Groups
- global object with a representative on each PE
- Asynchronous method invocation
- Prioritized scheduling
- Mature, robust, portable
- http://charm.cs.uiuc.edu
14Data driven execution
[Figure: per-processor schedulers, each with its message queue]
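The scheduler and message queue pairing can be sketched as below. This is a single-process toy, not the Charm runtime or its API: the `Scheduler` and `Recorder` classes are hypothetical, and the convention that a smaller number means higher priority is an assumption.

```python
import heapq
import itertools

class Scheduler:
    """Toy per-processor scheduler: repeatedly picks the highest-priority
    pending message and invokes the target object's method with it."""

    def __init__(self):
        self._queue = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order

    def send(self, priority, obj, method, *args):
        # Asynchronous invocation: enqueue the message and return at once.
        heapq.heappush(self._queue, (priority, next(self._seq), obj, method, args))

    def run(self):
        # Data-driven execution: an object runs only when a message for it
        # is picked from the queue.
        while self._queue:
            _, _, obj, method, args = heapq.heappop(self._queue)
            getattr(obj, method)(*args)

class Recorder:
    """Hypothetical object that logs the messages it receives."""

    def __init__(self):
        self.log = []

    def receive(self, tag):
        self.log.append(tag)

sched = Scheduler()
rec = Recorder()
sched.send(5, rec, "receive", "low priority")
sched.send(1, rec, "receive", "high priority")
sched.send(3, rec, "receive", "medium priority")
sched.run()
```

Messages are delivered in priority order rather than send order, so urgent work (for example, forces that a remote processor is waiting on) can run first.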
15Object oriented design
- Two top level classes
- Patches: cubes containing atoms
- Computes: force calculation
- Home patches and Proxy patches
- Home patch sends coordinates to proxies, and receives forces from them
- Each compute interacts with local patches only
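The home patch and proxy patch relationship can be sketched as below. The class and method names are hypothetical, coordinates and forces are plain 1-D floats, and everything runs in one process; in the real code the proxies live on other processors and these calls would be messages.

```python
class ProxyPatch:
    """Stand-in for a home patch on another processor."""

    def __init__(self, home):
        self.home = home
        self.coords = None

    def receive_coords(self, coords):
        # Local computes read coordinates from here, never from the home patch.
        self.coords = list(coords)

    def return_forces(self, forces):
        # Forces computed locally are sent back to the owning home patch.
        self.home.deposit_forces(forces)

class HomePatch:
    """Owns the atoms of one cube of space."""

    def __init__(self, coords):
        self.coords = list(coords)
        self.forces = [0.0] * len(self.coords)
        self.proxies = []

    def add_proxy(self):
        proxy = ProxyPatch(self)
        self.proxies.append(proxy)
        return proxy

    def send_coords(self):
        # Start of a time-step: reset forces and publish coordinates.
        self.forces = [0.0] * len(self.coords)
        for proxy in self.proxies:
            proxy.receive_coords(self.coords)

    def deposit_forces(self, forces):
        self.forces = [a + b for a, b in zip(self.forces, forces)]

home = HomePatch([0.0, 1.0])
proxy_a = home.add_proxy()
proxy_b = home.add_proxy()
home.send_coords()
proxy_a.return_forces([1.0, -1.0])
proxy_b.return_forces([1.0, -1.0])
```

Because each compute talks only to a patch object on its own processor, the compute code does not need to know whether that patch is a home or a proxy.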
16Compute hierarchy
- Many compute subclasses
- Allow reuse of coordination code
- Reuse of bookkeeping tasks
- Easy to add new types of force objects
- Example: steered molecular dynamics
- Implementor focuses on the new force functionality
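The reuse argument above can be sketched as a base class that holds the coordination and bookkeeping, with subclasses supplying only the force rule. The names are hypothetical, and `ConstantPull` is a toy stand-in rather than the actual steered molecular dynamics force.

```python
class Compute:
    """Base class: waits until all of its patches have delivered
    coordinates, then runs the subclass-specific force calculation."""

    def __init__(self, n_patches):
        self.n_patches = n_patches
        self._arrived = {}

    def patch_ready(self, patch_id, coords):
        # Shared bookkeeping: record the data and fire when complete.
        self._arrived[patch_id] = coords
        if len(self._arrived) < self.n_patches:
            return None
        data, self._arrived = self._arrived, {}
        return self.do_work(data)

    def do_work(self, data):
        raise NotImplementedError

class ConstantPull(Compute):
    """Toy new force type: the same external force on every atom."""

    def __init__(self, n_patches, force):
        super().__init__(n_patches)
        self.force = force

    def do_work(self, data):
        # Only the new force functionality lives here.
        return {pid: [self.force] * len(coords) for pid, coords in data.items()}

pull = ConstantPull(n_patches=2, force=0.5)
first = pull.patch_ready(0, [0.0, 1.0])   # still waiting for patch 1
second = pull.patch_ready(1, [2.0])       # all data present, work runs
```

Adding another force type means writing another `do_work`; the waiting logic in `patch_ready` is inherited unchanged.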
17Multi-paradigm programming
- Long-range electrostatic interactions
- Some simulations require this feature
- Contributions of faraway atoms can be computed infrequently
- PVM-based library, DPMTA
- Developed at Duke, by John Board, et al
- Patch life cycle
- better expressed as a thread
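Why a patch life cycle reads naturally as a thread can be sketched with a Python generator standing in for a user-level thread. This is an analogy under stated assumptions, not how NAMD or Converse threads are written: the names are hypothetical, atoms are 1-D with unit mass, the integrator is a simple semi-implicit Euler update, and `n_steps` is assumed to be at least 1.

```python
def patch_life_cycle(coords, n_steps, dt=1.0):
    """One patch written as sequential code: publish coordinates, suspend
    until forces arrive, integrate, and repeat."""
    coords = list(coords)
    vel = [0.0] * len(coords)
    for _ in range(n_steps):
        forces = yield list(coords)          # suspend here until forces arrive
        vel = [v + dt * f for v, f in zip(vel, forces)]
        coords = [x + dt * v for x, v in zip(coords, vel)]
    return coords

def run_patch(patch, force_fn):
    """Hypothetical driver playing the role of the scheduler and computes."""
    coords = next(patch)                     # run until the first suspension
    while True:
        try:
            coords = patch.send(force_fn(coords))
        except StopIteration as done:
            return done.value                # final coordinates

final = run_patch(patch_life_cycle([0.0], n_steps=2), lambda c: [1.0] * len(c))
```

The loop in `patch_life_cycle` keeps its state in local variables across suspensions, which is the convenience a thread offers over splitting the same logic across separate message handlers.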
18Converse
- Supports multi-paradigm programming
- Provides portability
- Makes it easy to implement RTS for new paradigms
- Several languages/libraries
- Charm++, threaded MPI, PVM, Java, md-perl, pC++, Nexus, Path, Cid, CC++, ...
19Namd2 with Converse
20Separation of concerns
- Different developers, with different interests
and knowledge, can contribute effectively
- Separation of communication and parallel logic
- Threads to encapsulate life-cycle of patches
- Adding a new integrator, improving performance, and trying new
MD ideas can be performed modularly and
independently
21Load balancing
- Collect timing data for several cycles
- Run heuristic load balancer
- Several alternative ones
- Re-map and migrate objects accordingly
- Registration mechanisms facilitate migration
- Needs a separate talk!
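The measure, balance, and re-map cycle above can be sketched with a simple greedy heuristic. This is an illustrative stand-in, since the slides do not say which heuristics are used: the function name and the timing data are hypothetical, and migration cost and communication locality are ignored.

```python
import heapq

def greedy_balance(object_times, n_procs):
    """Assign objects, heaviest first, to the currently least-loaded processor.

    object_times maps object id to its measured time per cycle.
    Returns a dict mapping object id to processor index.
    """
    heap = [(0.0, p) for p in range(n_procs)]   # (current load, processor)
    heapq.heapify(heap)
    mapping = {}
    by_weight = sorted(object_times.items(), key=lambda kv: kv[1], reverse=True)
    for obj, t in by_weight:
        load, proc = heapq.heappop(heap)
        mapping[obj] = proc
        heapq.heappush(heap, (load + t, proc))
    return mapping

# Hypothetical timing data collected over several cycles.
times = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}
mapping = greedy_balance(times, n_procs=2)
loads = [sum(t for o, t in times.items() if mapping[o] == p) for p in range(2)]
```

The resulting mapping would then drive the re-map and migrate step; objects whose processor did not change stay where they are.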
22Performance: size of system
23Performance: various machines
24Speedup
25Conclusion
- Multi-domain decomposition works well for dynamically evolving or irregular apps
- When supported by data driven objects (Charm++), user level threads, callbacks
- Multi-paradigm programming is effective!
- Object oriented parallel programming
- promotes reuse,
- good performance
- Measurement based load balancing