Title: The Component Architecture of Open MPI: Enabling Third-party Collective Algorithms
1. The Component Architecture of Open MPI: Enabling Third-party Collective Algorithms
- Aug 27, 2005
- Sogang University
- Distributed Computing Communication Laboratory
- Eunseok Kim
2. Outline
- Introduction
- Adding new algorithms to an MPI implementation
- Common Interface Approach
- Component-based Approach
- What is Open MPI?
- Design goals
- Architecture
- Collective components in Open MPI
- Example collective components
- Conclusion
3. Introduction
- Challenges
- Trend: increasing numbers of processors
- Clusters: a widely used architecture
- Scalability issues
- Process control
- Resource exhaustion
- Latency awareness and management
- Optimized collectives
- Fault tolerance
- Network transmission errors
- Process fault tolerance
- These challenges must be solved by developing new algorithms
- But this is hard to do
4. Adding new algorithms to an MPI implementation
- Common Interface Approach
- Component-based Approach
5. Common Interface Approach
- Use the MPI profiling layer
- Allows third-party libraries
- Without access to source code
- Automatically uses new routines without modification
- But allows only one version of an overloaded function
- Linker semantics
- Edit an existing MPI implementation
- Needs source code and a license
- Unmodified MPI applications
- But hard to modify
6. Common Interface Approach
- Create a new MPI implementation
- Complete control over the entire MPI implementation
- Ex) PACX-MPI enables use in a meta-computing environment
- But extremely hard to do
- Use an alternate function name
- The simplest way
- Ex) New_Barrier instead of MPI_Barrier
- Application must be
- Modified
- Can be solved by preprocessor macros
- Recompiled
7. Component-based Approach
- Component
- A set of top-level routines
- A component-based approach can solve many of the problems that arise in the common interface approaches
- Open MPI
8. What is Open MPI?
- Production-quality
- Easy to develop new algorithms
- Enables run-time composition of independent software
9. Design goals
- Full MPI-2 standard conformance
- High performance
- Fault tolerance (optional)
- Thread safety and concurrency (MPI_THREAD_MULTIPLE)
- Based on a component architecture
- Flexible run-time environment
- Portable
- Maintainable
- Production quality
- A single library supports all networks
- Support for multiple networks
10. MPI implementation overview
User application
MPI API
MPI implementation internals
11. Architecture of Open MPI
User application
MPI API
MPI Component Architecture (MCA)
12. Architecture of Open MPI
- MCA (MPI Component Architecture)
- Backbone component
- Provides management services
- Passes parameters
- Finds and invokes components
- Component frameworks
- Each major functional area has a corresponding back-end component framework
- Which manages modules
- Discover, load, use, and unload modules on demand
- Modules
- Self-contained software
13. Component frameworks
- Point-to-point Transport Layer (PTL)
- Allows the use of multiple networks
- Ex) TCP/IP, Myrinet, etc.
- Point-to-point Management Layer (PML)
- Message fragmentation, scheduling, and re-assembly services
- Collective Communication (COLL)
- Process Topology (TOPO)
- Collectives may benefit from topology-awareness
- Reduction Operations
- Parallel I/O
14. Advantages of the component architecture
- Multiple components within a single MPI process
- Ex) using several network device drivers via the PTL
- Provides a convenient way to use third-party software
- Provides a fine-grained, run-time, user-controlled component selection mechanism
15. Collective Components
- A component paired with a communicator
- Becomes a module
- Top-level MPI collective functions
- Reduced to thin wrappers
- Error checking of parameters
- One coll module
- Assigned to each communicator
- Ex) MPI_BCAST
- Simply checks the passed parameters
- Invokes the back-end broadcast function
16. Implementation models
- Layered over point-to-point
- Utilizes MPI point-to-point functions
- Ex) MPI_SEND, MPI_RECV, etc.
- Concentration on the core algorithm
- Alternate communication channels
- Ex) Myrinet, UDP multicast, etc.
- Hierarchical coll components
- basic component
- Basic implementation of all collective operations
- More complex model (bridge)
- Uses a hierarchy of coll modules
- Single, top-level MPI collective
- Allows each network to utilize its own optimized coll component
17. Example of a hierarchical coll component
18. Component / Module Lifecycle
- Component
- Open: per-process initialization
- Selection: per-scope, determines whether the component will be used
- Close: per-process finalization
- Module
- Initialization: if the component is selected
- Normal usage / checkpoint
- Finalization: per-scope cleanup
19. Coll component lifecycle
- Selection
- The coll framework queries each available coll component
- Considers factors such as the run-time environment or topology
- Chooses the best-suited one
- Initialization
- Receives the target communicator as a parameter
- After setup, a module with local state for the target communicator is returned
- Potential run-time optimization
- Pre-computation
20. Coll component lifecycle
- Checkpoint / restart
- For coll modules layered on top of point-to-point functionality
- The point-to-point modules perform it
- Optional
- Normal usage
- Invokes the module's collective routines
- When a collective function is invoked on the communicator
- Finalization
- Occurs when the communicator is destroyed
21. Component / Module Interfaces
- Emphasize simplicity
- Main groups of interface functions
- One-time (per process) initialization
- Ex) determine threading characteristics during MPI_INIT
- Per-scope query
- Per-scope initialization
- Normal usage and checkpoint / restart functionality
- Per-scope finalization
- One-time (per process) finalization
22. Example of the coll component interface
23. Example of the coll module interface
24. Example Components
- The basic component
- A full set of intra- and inter-communicator collectives
- The smp component
- Maximizes bandwidth conservation across multiple levels of network latency
- Ex) MagPIe (communication of uniprocessors across a WAN)
- Segments communicators into groups
- Communicates with other groups through representatives
25. Broadcast Scenario
26. Pseudocode for the Scenario
27. Conclusion
- Third-party researchers can develop and test new algorithms easily
- Through a standard component architecture
- A good architecture already exists.
- Then, shall we optimize collective operations?