1
Parallel Computing, MPI and FLASH (March 23, 2005)
2
What is Parallel Computing? And why is it useful?
  • Parallel computing is more than one CPU working
    together on one problem
  • It is useful when:
  • The problem is large and would take very long on
    one processor
  • The data are too big to fit in the memory of one
    processor
  • When to parallelize:
  • When the problem can be subdivided into relatively
    independent tasks
  • How much to parallelize:
  • As long as the speedup relative to a single
    processor remains of the order of the number of
    processors (see the note after this list)
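A rough rule of thumb behind the last point is Amdahl's law (added here for illustration; it is not on the slide): if a fraction p of the work parallelizes, the speedup on N processors is 1 / ((1 - p) + p/N).

    #include <stdio.h>

    /* Amdahl's law: estimated speedup for parallel fraction p on N processors.
       The numbers below are illustrative only. */
    static double amdahl(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    int main(void) {
        printf("p=0.95, N=64 -> speedup %.1f\n", amdahl(0.95, 64));  /* ~15x */
        printf("p=0.99, N=64 -> speedup %.1f\n", amdahl(0.99, 64));  /* ~39x */
        return 0;
    }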

3
Parallel paradigms
  • SIMD: Single Instruction, Multiple Data
  • Processors work in lock-step
  • MIMD: Multiple Instruction, Multiple Data
  • Processors do their own thing with occasional
    synchronization
  • Shared Memory
  • One-sided communication
  • Distributed Memory
  • Message passing
  • Loosely Coupled
  • When the process on each CPU is fairly
    self-contained and relatively independent of
    processes on other CPUs
  • Tightly Coupled
  • When CPUs need to communicate with each other
    frequently

4
How to Parallelize
  • Divide a problem into a set of mostly independent
    tasks
  • Partitioning a problem
  • Tasks get their own data
  • Localize a task
  • They operate on their own data for the most part
  • Try to make it self-contained
  • Occasionally
  • Data may be needed from other tasks
  • Inter-process communication
  • Synchronization may be required between tasks
  • Global operation
  • Map tasks to different processors
  • One processor may get more than one task
  • Task distribution should be well balanced

5
New Code Components
  • Initialization
  • Query parallel state
  • Identify process
  • Identify number of processes
  • Exchange data between processes
  • Local, Global
  • Synchronization
  • Barriers, Blocking Communication, Locks
  • Finalization

6
MPI
  • Message Passing Interface, the standard for the
    distributed-memory model of parallelism
  • MPI-2 supports one-sided communication, commonly
    associated with shared-memory operations
  • Works with communicators: collections of processes
    that can communicate with each other
  • MPI_COMM_WORLD is the default communicator
  • Supports both low-level communication operations
    and composite operations
  • Has blocking and non-blocking operations

7
Low-level Operations in MPI
  • MPI_Init
  • MPI_Comm_size
  • Find number of processors
  • MPI_Comm_rank
  • Find my processor number
  • MPI_Send/Recv
  • Communicate with other processors one at a time
  • MPI_Bcast
  • Global data transmission
  • MPI_Barrier
  • Synchronization
  • MPI_Finalize
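A minimal sketch tying the calls listed above together (a generic MPI example in C, not FLASH code):

    #include <mpi.h>
    #include <stdio.h>

    /* Every rank reports itself, rank 0 broadcasts a value,
       and a barrier synchronizes before finalizing. */
    int main(int argc, char **argv) {
        int rank, size, value = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* my process number   */

        if (rank == 0) value = 42;
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* global data transmission */

        printf("rank %d of %d received %d\n", rank, size, value);

        MPI_Barrier(MPI_COMM_WORLD);            /* synchronization */
        MPI_Finalize();
        return 0;
    }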

8
Advanced Constructs in MPI
  • Composite Operations
  • Gather/Scatter
  • Allreduce
  • Alltoall
  • Cartesian grid operations
  • Shift
  • Communicators
  • Creating subgroups of processors to operate on
  • User-defined Datatypes
  • I/O
  • Parallel file operations
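A small sketch of two of these constructs, MPI_Comm_split to create a subgroup communicator and MPI_Allreduce for a collective sum (illustrative C, not FLASH code):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* split processes into two subgroups (even ranks / odd ranks) */
        MPI_Comm subcomm;
        MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &subcomm);

        /* each subgroup sums its own ranks */
        int local = rank, sum = 0;
        MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, subcomm);
        printf("rank %d: sum over my subgroup = %d\n", rank, sum);

        MPI_Comm_free(&subcomm);
        MPI_Finalize();
        return 0;
    }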

9
Communicators
(Diagram: two communicators, COMM1 and COMM2, each grouping a subset of the processes.)
10
Communication Patterns
11
Communication Overheads
  • Latency vs. Bandwidth
  • Blocking vs. Non-Blocking
  • Overlap
  • Buffering and copy
  • Scale of communication
  • Nearest neighbor
  • Short range
  • Long range
  • Volume of data
  • Resource contention for links
  • Efficiency
  • Hardware, software, communication method
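A common first-order cost model (an addition here, not from the slide) is time = latency + message size / bandwidth, which is why many small messages are latency-bound while a few large messages are bandwidth-bound:

    #include <stdio.h>

    /* t = latency + bytes / bandwidth; the numbers are made-up illustrative
       values, not measurements of any particular machine. */
    int main(void) {
        double latency = 5.0e-6;    /* seconds per message */
        double bandwidth = 1.0e9;   /* bytes per second    */
        printf("8 B message:  %.2e s (latency dominates)\n",  latency + 8.0 / bandwidth);
        printf("8 MB message: %.2e s (bandwidth dominates)\n", latency + 8.0e6 / bandwidth);
        return 0;
    }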

12
Parallelism in FLASH
  • Short range communications
  • Nearest neighbor
  • Long range communications
  • Regridding
  • Other global operations
  • All-reduce operations on physical quantities
  • Specific to solvers
  • Multipole method
  • FFT based solvers

13
Domain Decomposition
(Diagram: the computational domain divided among processors P0, P1, P2 and P3.)
14
Border Cells / Ghost Points
  • When splitting up solnData, each processor needs
    data from other processors
  • Each processor needs a layer of guard cells from
    its neighbors
  • These must be updated every time step
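A minimal layout sketch, assuming a 1D slab with one guard cell on each side (illustrative C, not FLASH's actual solnData structure):

    /* N interior cells plus one ghost cell at each end; u[0] and u[N+1] hold
       copies of the neighboring processors' border cells and must be refreshed
       every time step before the stencil update. */
    #define N 8
    double u[N + 2];

    void update_interior(void) {
        double unew[N + 2];
        for (int i = 1; i <= N; i++)                 /* interior cells only  */
            unew[i] = 0.5 * (u[i - 1] + u[i + 1]);   /* needs both neighbors */
        for (int i = 1; i <= N; i++)
            u[i] = unew[i];
    }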

15
Border/Ghost Cells
Short Range communication
16
Two MPI Methods for Doing It
  • Method 1: Cartesian topology
  • MPI_Cart_create
  • Create the topology
  • MPE_Decomp1d
  • Domain decomposition on the topology
  • MPI_Cart_shift
  • Who's on the left/right?
  • MPI_Sendrecv
  • Fill ghost cells from the left
  • MPI_Sendrecv
  • Fill ghost cells from the right
  • Method 2: manual decomposition
  • MPI_Comm_rank
  • MPI_Comm_size
  • Manually decompose the grid over processors
  • Calculate left/right neighbors
  • MPI_Send/MPI_Recv
  • Carefully, to avoid deadlocks
  • (A sketch of the first method follows)
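A sketch of the first method, assuming the 1D slab with one guard cell per side from the earlier example (illustrative C; at non-periodic ends MPI_Cart_shift returns MPI_PROC_NULL, so those Sendrecv calls are no-ops):

    #include <mpi.h>

    /* Fill the left and right ghost cells of u (n interior cells, n+2 entries)
       using a 1D Cartesian topology and MPI_Sendrecv. */
    void exchange_ghosts(double *u, int n) {
        int dims[1] = {0}, periods[1] = {0}, nprocs, left, right;
        MPI_Comm cart;

        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Dims_create(nprocs, 1, dims);
        MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 1, &cart);
        MPI_Cart_shift(cart, 0, 1, &left, &right);   /* who's on the left/right? */

        /* send last interior cell right, receive left ghost cell */
        MPI_Sendrecv(&u[n], 1, MPI_DOUBLE, right, 0,
                     &u[0], 1, MPI_DOUBLE, left,  0, cart, MPI_STATUS_IGNORE);
        /* send first interior cell left, receive right ghost cell */
        MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  1,
                     &u[n + 1], 1, MPI_DOUBLE, right, 1, cart, MPI_STATUS_IGNORE);

        MPI_Comm_free(&cart);
    }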

17
Adaptive Grid Issues
  • Discretization not uniform
  • Simple left-right guard cell fills inadequate
  • Adjacent grid points may not be mapped to nearest
    neighbors in the processor topology
  • Redistribution of work necessary

18
Regridding
  • Change in number of cells/blocks
  • Some processors get more work than others
  • Load imbalance
  • Redistribute data to even out work on all
    processors
  • Long range communications
  • Large quantities of data moved

19
Regridding
20
Other parallel operations in FLASH
  • Global max/sum, etc. (Allreduce)
  • Physical quantities
  • In solvers
  • Performance monitoring
  • Alltoall
  • FFT-based solver on the uniform grid (UG)
  • User defined datatypes and file operations
  • Parallel I/O
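As one concrete example of such a reduction (an illustrative sketch, not FLASH source): every process computes its local timestep limit and an Allreduce with MPI_MIN produces the global timestep.

    #include <mpi.h>

    /* Global timestep: the minimum of every process's local limit. */
    double global_timestep(double dt_local) {
        double dt_global;
        MPI_Allreduce(&dt_local, &dt_global, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD);
        return dt_global;
    }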

21
A Little FLASH History
  • FLASH0
  • Paramesh2, Prometheus and EOS/Burn
  • FLASH1
  • Smoothing out the smash
  • First form of module architecture inheritance
  • FLASH2
  • Untangle modules from each other (Grid)
  • dBase
  • Concept of levels of users
  • FLASH3
  • Stricter interface control in the module architecture
  • Taming the database

22
What FLASH Provides
  • Physics
  • Hydrodynamics
  • PPM
  • MHD
  • Relativistic PPM
  • Nuclear Physics
  • Gravity
  • Cosmology
  • Particles
  • Infrastructure
  • Setup
  • AMR (PARAMESH)
  • Regular testing
  • Parallel I/O
  • HDF5, pnetCDF
  • Profiling
  • Runtime and post-processing visualization

23
FLASH Code Basics
  • An application code, composed of units/modules.
    Particular modules are set up together to run
    different physics problems.
  • Performance, Testing, Usability, Portability
  • Fortran, C, Python
  • More than 560,000 lines of code
  • 75% code, 25% comments
  • Very portable
  • Scales to 1000s of processors

24
Basic Computational Unit: the Block
  • The adaptive grid is composed of blocks
  • All blocks have the same dimensions but cover
    different fractions of the physical domain
  • Blocks at different levels of refinement have
    different grid spacing (a sketch follows)
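A hypothetical sketch of such a block (field names and sizes are assumptions for illustration, not FLASH's actual data structures):

    #define NXB 8     /* every block has the same cell dimensions */
    #define NYB 8

    typedef struct {
        int    level;                    /* refinement level                   */
        double xmin, xmax, ymin, ymax;   /* portion of the domain it covers    */
        double dx, dy;                   /* spacing halves at each finer level */
        double data[NYB + 2][NXB + 2];   /* cells plus one guard layer         */
    } Block;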

25
Structure of FLASH Modules (not exact!)
(Diagram of module interfaces:)
  • Hydro: init(), tstep(), hydro3d()
  • Gravity: init(), tstep(), grav3d()
  • Source_terms: init(), tstep(), src_terms()
  • Materials: eos3d(), eos1d(), eos()
26
What's a FLASH Module?
  • Modules are FLASH's basic architectural units
  • A module is a component of the FLASH code
    providing a particular functionality
  • Different combinations of modules are used for
    particular problem setups
  • E.g. driver, hydro, mesh, dBase, I/O
  • Inheritance is faked by use of the directory
    structure
  • Modules communicate through:
  • The Driver
  • The Variable Database

27
Abstract FLASH2 Module
  • 1. Meta-data (Configuration Info)
  • Interface with driver and setup
  • Variable/parameter registration
  • Variable attributes
  • Module requirements
  • 2. Interface Wrapper
  • Exchange with the variable database
  • Prep data for kernels
  • 3. Physics Kernel(s)
  • Single-patch, single-processor functions
  • Written in any language
  • Can be sub-classed

(Diagram: these three parts form a FLASH component; a FLASH application is a driver plus a collection of FLASH2 modules and the database.)
28
Module Implementations
  • FLASH2 Modules are directory trees
  • source/hydro/explicit/split/ppm
  • Each level might have source files
  • Source at a level is relevant for all
    directories/implementations below it
  • Preserves interfaces
  • Allows flexible implementations

29
Inheritance Through Directories: Hydro
  • Hydro: empty init, hydro and tstep are the
    defaults at the top of the directory tree
  • Hydro/Explicit: replaces tstep, introduces shock;
    no hydro implemented yet!
  • Hydro/Explicit/Split: hydro is implemented; uses
    the general explicit tstep and the general shock;
    replaces init
  • (The diagram also shows a sibling implementation,
    DeltaForm)
30
The Module Config File
  • Declares solution variables and fluxes
  • Declares runtime parameters
  • Sets their defaults
  • Lists required and exclusive modules
  • Config files are additive down the directory tree
    - no replacements (an illustrative sketch follows)
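A hypothetical sketch of the kind of declarations a Config file contains (keywords and names here are illustrative only; consult the FLASH documentation for the exact syntax):

    # solution variables and runtime parameters declared by this module
    VARIABLE  dens
    VARIABLE  velx
    PARAMETER gamma  REAL  1.6667
    PARAMETER cfl    REAL  0.8
    # modules this one requires
    REQUIRES  driver
    REQUIRES  materials/eos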

31
Setup: Building an Application
(Diagram: the Configuration Tool (Setup) assembles modules such as Mesh and Database into an application.)
32
FLASH Setup Implements Architecture
  • Python code links together the needed physics and
    tools for a problem
  • Collects everything in the object directory
  • Traverses modules to get implementations
  • Determines the solution data storage list
  • Creates the list of runtime parameters from the
    modules
  • Configures Makefiles properly

33
Pulling It All Together
  • Choose a problem setup
  • Run setup to configure that problem
  • Everything is placed in a new top-level directory,
    object
  • Make
  • Run
  • flash.par holds the runtime parameters
  • Defaults are already set by the selected modules
    (see the sketch below)
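A sketch of that workflow from the command line, assuming a problem setup named sedov (exact commands and executable names can differ between FLASH versions):

    ./setup sedov -auto    # configure the sedov problem setup
    cd object              # everything is collected here
    make                   # build the executable
    ./flash2               # run; runtime parameters are read from flash.par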

34
Setups
  • A basic problem setup contains:
  • Config file
  • Required physics modules
  • flash.par
  • Default runtime parameter configuration
  • init_block
  • Initial conditions for the problem, set block by
    block
  • Many other possible files
  • Driver, refinement algorithms, user-defined
    boundary conditions
  • Any files in the setup directory take precedence
    over the defaults

35
Provided Driver
  • Provided
  • Second order, state form, Strang split
  • New drivers
  • Put them in setups
  • Contributions welcome

flash.F90:
    Initialize()
    Loop over timesteps
        evolvePhysics()
        timestep()
        output()
        visualize()
    End loop
    Finalize()

evolve.F90 (evolvePhysics):
    set time step
    hydro, sourceTerms, cosmology, radiation,
    particles, gravity
    set time step
    (repeat physics)
    Mesh_updateGrid
36
FLASH Applications
  • Compressible reactive flow
  • Wide range of length and time scales
  • Many interacting physical processes
  • Only indirect validation possible for the
    astrophysics
  • Many people in collaboration

Flame-vortex interactions
Compressible turbulence
Shocked cylinder
Nova outbursts on white dwarfs
Intracluster interactions
Cellular detonations
White Dwarf deflagration
Helium burning on neutron stars
Rayleigh-Taylor instability
37
  • And that brings us to questions and discussion.