1
Flexibility and Interoperability in a Parallel MD code
  • Robert Brunner,
  • Laxmikant Kale,
  • Jim Phillips
  • University of Illinois at Urbana-Champaign

2
Contributors
  • Principal investigators
  • Laxmikant Kale, Klaus Schulten, Robert Skeel
  • Development team
  • Milind Bhandarkar, Robert Brunner, Attila Gursoy,
    Neal Krawetz, Ari Shinozaki, ...

3
Middle layers
[Layered stack: Applications on top of the middle layers (languages, tools, libraries), which sit on parallel machines.]
4
(No Transcript)
5
Molecular Dynamics
  • Collection of charged atoms, with bonds
  • Newtonian mechanics
  • At each time-step
  • Calculate forces on each atom: bonds, non-bonded (electrostatic and van der Waals)
  • Calculate velocities and advance positions
  • 1 femtosecond time-step, millions needed!
  • Thousands of atoms (1,000 - 100,000)
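As a rough illustration of the per-step work listed above, here is a minimal velocity-Verlet-style sketch in C++; the data layout and names are hypothetical (not NAMD's actual classes), and the two force routines are left abstract (one is sketched under the next slide).

```cpp
#include <vector>

// Sketch only: hypothetical data layout and a velocity-Verlet-style step,
// not NAMD's actual integrator or classes.
struct Atom { double pos[3], vel[3], force[3], mass, charge; };

void computeBondedForces(std::vector<Atom>& atoms);     // bonds, angles, ...
void computeNonbondedForces(std::vector<Atom>& atoms);  // electrostatics + vdW

void timeStep(std::vector<Atom>& atoms, double dt) {
    // Half-kick with last step's forces, then drift positions
    for (auto& a : atoms)
        for (int d = 0; d < 3; ++d) {
            a.vel[d] += 0.5 * dt * a.force[d] / a.mass;
            a.pos[d] += dt * a.vel[d];
        }
    // Zero the accumulators and recompute forces at the new positions
    for (auto& a : atoms)
        for (int d = 0; d < 3; ++d) a.force[d] = 0.0;
    computeBondedForces(atoms);
    computeNonbondedForces(atoms);
    // Second half-kick with the new forces
    for (auto& a : atoms)
        for (int d = 0; d < 3; ++d)
            a.vel[d] += 0.5 * dt * a.force[d] / a.mass;
}
```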
6
Further MD
  • Use of cut-off radius (8 - 14 Å) to reduce work
  • Faraway charges ignored!
  • 80-95% of the work is non-bonded force computation
  • Some simulations need the faraway contributions
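A minimal sketch of how the cut-off reduces the non-bonded work, continuing the hypothetical structures from the previous sketch (a plain O(N^2) pair loop for brevity; real codes, NAMD included, use pair or cell lists so each atom only examines nearby candidates):

```cpp
// Naive cut-off loop for the non-bonded forces (illustrative only).
const double cutoff = 12.0;  // Angstroms; the slide's typical range is 8-14 A

void computeNonbondedForces(std::vector<Atom>& atoms) {
    const double cutoff2 = cutoff * cutoff;
    for (size_t i = 0; i < atoms.size(); ++i)
        for (size_t j = i + 1; j < atoms.size(); ++j) {
            double dr[3], r2 = 0.0;
            for (int d = 0; d < 3; ++d) {
                dr[d] = atoms[i].pos[d] - atoms[j].pos[d];
                r2 += dr[d] * dr[d];
            }
            if (r2 > cutoff2) continue;   // faraway charges ignored
            // electrostatic + van der Waals terms would be accumulated
            // into atoms[i].force and atoms[j].force here
        }
}
```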
7
NAMD Design Objectives
  • Performance
  • Scalability
  • to both small and large numbers of processors
  • to both small and large molecular systems
  • Modifiable and extensible design
  • Ability to incorporate new algorithms
  • Reusing new libraries without re-implementation
  • Experimenting with alternate strategies

8
Force Decomposition
  • Distribute the force matrix to processors
  • Matrix is sparse, non-uniform
  • Each processor has one block
  • Communication: N/sqrt(P); communication-to-computation ratio: sqrt(P)
  • Better scalability (can use 100+ processors)
  • Hwang, Saltz, et al.: 6% on 32 PEs, 36% on 128 processors
Not Scalable
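The reasoning behind these figures, reconstructed here rather than taken from the slide: each processor owns one block of the force matrix, so it needs coordinates for about 2N/sqrt(P) atoms, while (with a cut-off) its share of the work is roughly N/P, giving

```latex
\frac{\text{communication per processor}}{\text{computation per processor}}
  \;\sim\; \frac{N/\sqrt{P}}{N/P} \;=\; \sqrt{P}
```

so the relative cost of communication grows with the number of processors, which is the scalability limit the slide points to.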
9
Spatial Decomposition
10
Spatial decomposition modified
11
Implementation
  • Multiple Objects per processor
  • Different types: patches, pairwise forces, bonded
    forces, ...
  • Each may have its data ready at different times
  • Need ability to map and remap them
  • Need prioritized scheduling
  • Charm++ supports all of these

12
Charm++
  • Data Driven Objects
  • Object Groups
  • global object with a representative on each PE
  • Asynchronous method invocation
  • Prioritized scheduling
  • Mature, robust, portable
  • http://charm.cs.uiuc.edu
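A hedged illustration of the Charm++ model named above (data-driven objects, asynchronous entry methods). The module and class names are invented, and the CBase_ base class would be generated by the Charm++ translator from the .ci file, so this is a sketch rather than buildable standalone code.

```cpp
// patch.ci -- Charm++ interface file (illustrative names only)
//   module md {
//     array [1D] Patch {
//       entry Patch();
//       entry void depositForces(int n, double f[n]);  // async, marshalled
//     };
//   };

// patch.C -- methods run when the scheduler delivers a message to this object
#include <vector>

class Patch : public CBase_Patch {       // CBase_Patch is translator-generated
    std::vector<double> force;           // per-coordinate force accumulators
    int pendingDeposits;                 // computes yet to report this step
                                         // (reset at the start of each step)
public:
    Patch() : pendingDeposits(0) {}
    void depositForces(int n, double* f) {       // invoked asynchronously
        for (int i = 0; i < n; ++i) force[i] += f[i];
        if (--pendingDeposits == 0) integrate(); // all contributions arrived
    }
    void integrate();                    // advance this patch's atoms
};
```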

13
Data driven execution
[Diagram: each processor runs a scheduler that picks the next message from its message queue and invokes the corresponding object's method.]
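A bare-bones sketch of that data-driven loop in generic C++ (not the actual Converse/Charm++ scheduler): each message carries a bound method invocation, and whichever messages are ready determine what runs next.

```cpp
#include <queue>
#include <functional>

struct Message {
    int priority;                       // smaller value = more urgent
    std::function<void()> invoke;       // object + entry method + arguments
    bool operator<(const Message& o) const { return priority > o.priority; }
};

void schedulerLoop(std::priority_queue<Message>& q) {
    // A real scheduler would also poll the network and idle when empty.
    while (!q.empty()) {
        Message m = q.top();            // most urgent ready message
        q.pop();
        m.invoke();                     // may enqueue further messages
    }
}
```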
14
Object oriented design
  • Two top level classes
  • Patches: cubes of space containing atoms
  • Computes: force calculation objects
  • Home patches and Proxy patches
  • Home patch sends coordinates to proxies, and
    receives forces from them
  • Each compute interacts with local patches only
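A hedged sketch of this two-class design (illustrative names, not NAMD's actual interfaces): the home patch multicasts coordinates to its proxies, each compute reads coordinates only from patches on its own processor and deposits forces back, and the proxies return the accumulated forces to the home patch.

```cpp
#include <vector>

class PatchBase {
public:
    virtual const std::vector<double>& coordinates() const = 0;    // xyz triples
    virtual void depositForces(const std::vector<double>& f) = 0;  // accumulate
    virtual ~PatchBase() = default;
};

class HomePatch : public PatchBase {
    // Owns the atoms of one cube of space: multicasts coordinates to proxy
    // patches on other processors, sums returned forces, then integrates.
};

class ProxyPatch : public PatchBase {
    // Read-only copy of a remote patch's coordinates; collects the forces
    // produced by local computes and ships them back to the home patch.
};

class Compute {
public:
    // A compute only ever touches patches resident on its own processor.
    virtual void doWork(PatchBase& p1, PatchBase& p2) = 0;
    virtual ~Compute() = default;
};
```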

15
Compute hierarchy
  • Many compute subclasses
  • Allow reuse of coordination code
  • reuse of bookkeeping tasks
  • Easy to add new type of force object
  • Example: steered molecular dynamics
  • Implementer focuses on the new force functionality
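Extending the hierarchy for a new force type might look like the following hypothetical subclass, continuing the sketch above: the base class supplies the coordination and bookkeeping, and the implementer only supplies the force kernel. The steering-force details here are invented for illustration.

```cpp
// Hypothetical new force object: a spring pulling atoms toward a target
// point, loosely in the spirit of steered MD.  All details are illustrative.
class SteeredCompute : public Compute {
    double target[3];
    double springK;
public:
    SteeredCompute(const double t[3], double k) : springK(k) {
        for (int d = 0; d < 3; ++d) target[d] = t[d];
    }
    void doWork(PatchBase& p1, PatchBase& /*unused*/) override {
        const std::vector<double>& x = p1.coordinates();   // xyz triples
        std::vector<double> f(x.size(), 0.0);
        for (size_t i = 0; i + 2 < x.size(); i += 3)
            for (int d = 0; d < 3; ++d)
                f[i + d] = springK * (target[d] - x[i + d]);
        p1.depositForces(f);        // hand results back through the base API
    }
};
```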

16
Multi-paradigm programming
  • Long-range electrostatic interactions
  • Some simulations require these
  • Contributions of faraway atoms can be calculated infrequently
  • DPMTA: a PVM-based library developed at Duke by John Board et al.
  • Patch life cycle: better expressed as a thread
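The patch life-cycle point is that a sequential-looking loop, running in a thread that suspends while waiting for remote data, reads more naturally than the equivalent chain of callbacks. A generic sketch with invented method names (NAMD uses Converse user-level threads rather than OS threads):

```cpp
// Hypothetical per-step interface for a home patch (names invented):
struct HomePatchThreadView {
    void sendCoordinatesToProxies();  // start the step
    void awaitForces();               // suspend this thread until forces arrive
    void integrate();                 // advance velocities and positions
};

// The patch life cycle as straight-line code inside a (user-level) thread,
// instead of a chain of callbacks.
void patchLifeCycle(HomePatchThreadView& patch, int numSteps) {
    for (int step = 0; step < numSteps; ++step) {
        patch.sendCoordinatesToProxies();
        patch.awaitForces();
        patch.integrate();
    }
}
```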
17
Converse
  • Supports multi-paradigm programming
  • Provides portability
  • Makes it easy to implement run-time systems for new paradigms
  • Several languages/libraries: Charm++, threaded MPI, PVM, Java,
    md-perl, pc++, Nexus, Path, Cid, CC++, DP, Agents, ...
18
NAMD2 with Converse
19
Separation of concerns
  • Different developers, with different interests
    and knowledge, can contribute effectively
  • Separation of communication and parallel logic
  • Threads to encapsulate life-cycle of patches
  • Adding a new integrator, improving performance, or trying
    new MD ideas can be done modularly and independently

20
Load balancing
  • Collect timing data for several cycles
  • Run heuristic load balancer
  • Several alternative ones
  • Re-map and migrate objects accordingly
  • Registration mechanisms facilitate migration
  • Needs a separate talk!
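As a hedged illustration of the re-mapping step described above, here is one simple greedy heuristic over measured per-object times (generic C++, not Charm++'s actual load balancing framework):

```cpp
#include <vector>
#include <queue>
#include <algorithm>
#include <functional>
#include <utility>

// Greedy re-mapping: sort objects by measured time (heaviest first) and
// assign each to the currently least-loaded processor.  Migrating the
// objects themselves would then follow the new mapping.
std::vector<int> remap(const std::vector<double>& objectLoad, int numProcs) {
    std::vector<int> order(objectLoad.size());
    for (size_t i = 0; i < order.size(); ++i) order[i] = (int)i;
    std::sort(order.begin(), order.end(), [&](int a, int b) {
        return objectLoad[a] > objectLoad[b];
    });

    // Min-heap of (current load, processor id)
    using Proc = std::pair<double, int>;
    std::priority_queue<Proc, std::vector<Proc>, std::greater<Proc>> heap;
    for (int p = 0; p < numProcs; ++p) heap.push({0.0, p});

    std::vector<int> assignment(objectLoad.size());
    for (int obj : order) {
        auto [load, p] = heap.top();
        heap.pop();
        assignment[obj] = p;
        heap.push({load + objectLoad[obj], p});
    }
    return assignment;
}
```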

21
Performance: size of system
22
Performance: various machines
23
Speedup
24
Conclusion
  • Multi-domain decomposition works well for
    dynamically evolving or irregular applications
  • when supported by data-driven objects (Charm++),
    user-level threads, and callbacks
  • Multi-paradigm programming is effective!
  • Object-oriented parallel programming
  • promotes reuse
  • good performance
  • Measurement-based load balancing