Title: The Parallel Remote Method Invocation in
Multi-threaded Environment
Keming Zhang, Kostadin Damevski, Steve Parker
SCI Institute, University of Utah
PRMI Design
Introduction
MPI and Threads
One approach to building a parallel component
architecture is through Parallel Remote Method
Invocation (PRMI). PRMI is an extension of Java
Remote Method Invocation (RMI). A group of
processes (the parallel caller) can collectively
invoke an interface on another group of processes
(the parallel callee) through PRMI. All processes
share the same interface, but the arguments can
be distributed over all processes arbitrarily.
A PRMI design needs to specify how M callers
invoke N callees and how distributed data can be
transmitted efficiently between M callers and N
callees. In a multi-threaded environment, PRMI
design becomes more challenging because
synchronization becomes more complex and most MPI
implementations are not thread-safe. In this work,
RMI and PRMI are compared, the issues and
challenges of PRMI design are described, and
possible solutions and a PRMI design are
presented.
- Identifying a set of threads
- Each thread maintains a CID(PID) pair, where CID
identifies a set of caller (or callee) threads,
and PID identifies a PRMI.
- The application user signals the initial threads
to create a new unique initial CID and set
PID = CID + 1.
- When a PRMI is requested, set that PRMI's
CID = callers' PID and PID = CID + 1.
- When a PRMI returns, the caller updates its PID
to the callee's PID.
- One round trip PRMI
1. Callers calculate the argument redistribution
schedule and the invocation schedule. Callers send
their argument representations, the invocation
schedule, and the arguments to the callees.
2. Upon receiving all data from all callers, each
callee starts its method. After the method
completes, each callee calculates the
redistribution schedule for the output arguments
(including the return value). Then it sends the
output back to the relevant callers.
[Figure: thread sets labeled 0(1,6), 1(2,5,6),
2(3,4,5), 3(4), 4(5), 5(6), 6(7)]
RMI and PRMI
RMI is neat: all arguments are packaged and sent
once. All return data (including output
arguments) are packaged and sent once. A single
flag can indicate whether an invocation succeeded.
- A unique MPI communicator is created for each
PRMI thread set.
- An MPI lock is used to separate collective MPI
calls issued by different thread sets if the MPI
implementation is not thread-safe.
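The locking idea above can be sketched without a real MPI library. In this minimal sketch, `FakeMPI`, `locked_collective`, and `run_thread_sets` are hypothetical names introduced only for illustration: `FakeMPI` stands in for a non-thread-safe MPI implementation and records whether two collective calls ever overlap, and a single process-wide lock serializes the calls made by different thread sets.

```python
import threading
import time


class FakeMPI:
    """Hypothetical stand-in for a non-thread-safe MPI library.

    It only records whether two collective calls were ever in flight
    at the same time; it performs no real communication.
    """

    def __init__(self):
        self._in_flight = 0
        self._guard = threading.Lock()
        self.overlap_detected = False

    def collective(self, comm_id, payload):
        with self._guard:
            self._in_flight += 1
            if self._in_flight > 1:
                self.overlap_detected = True
        time.sleep(0.001)  # pretend to communicate
        with self._guard:
            self._in_flight -= 1
        return (comm_id, payload)


# One process-wide lock shared by all thread sets (an assumption of
# this sketch; the poster does not specify the lock's granularity).
MPI_LOCK = threading.Lock()


def locked_collective(mpi, comm_id, payload):
    """Issue a collective call while holding the MPI lock."""
    with MPI_LOCK:
        return mpi.collective(comm_id, payload)


def run_thread_sets(mpi, call, n_sets=4, calls_per_set=5):
    """Run one thread per thread set; each uses its own communicator id."""

    def worker(comm_id):
        for i in range(calls_per_set):
            call(mpi, comm_id, i)

    threads = [threading.Thread(target=worker, args=(c,)) for c in range(n_sets)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mpi.overlap_detected
```

With `locked_collective`, no two collective calls overlap, at the cost of serializing communication across thread sets. A thread-safe MPI implementation would not need the lock.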
PRMI data redistribution
Distributed data type: multi-dimensional array.
Both callers and callees see the distributed
array as a global array, and they use an array
representation scheme to describe which part of
the global array resides on which caller or
callee.
PRMI Design Issues
- How do M callers invoke N callees collectively?
- How are arguments distributed and redistributed?
- How can inconsistent invocation ordering be
resolved when multiple PRMIs are allowed?
- How can a non-thread-safe MPI be supported?

Conclusions
This work provides an approach to a parallel
Common Component Architecture. The approach is
based on Parallel Remote Method Invocation. PRMI
hides most of the parallelism and
synchronization, and thus provides a
conventional, convenient, and efficient way to
build high-performance applications.
[Figure: a global array distributed over the
caller group and the callee group, with a Caller
Array Representation and a Callee Array
Representation]
When the distributed array is passed between the
callers and the callees, the transmission schedule
is calculated from their array representations.
The array pieces are then sent directly from the
callers to the corresponding callees (or in the
reverse direction) in parallel, avoiding any
bottlenecks.
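A schedule of this kind can be sketched for the simplest case. The example below assumes a one-dimensional array with contiguous block distribution, which is a simplification of the multi-dimensional arrays described above. The names `block_representation` and `redistribution_schedule` are hypothetical and are not taken from the authors' implementation. Each representation is a list of half-open index ranges, one per process, and the schedule is the set of non-empty overlaps between caller ranges and callee ranges.

```python
def block_representation(n, parts):
    """Split global indices [0, n) into `parts` contiguous half-open blocks."""
    base, extra = divmod(n, parts)
    blocks, start = [], 0
    for rank in range(parts):
        size = base + (1 if rank < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def redistribution_schedule(caller_blocks, callee_blocks):
    """Return (caller_rank, callee_rank, lo, hi) for every overlapping range.

    Each entry means: caller `caller_rank` sends global indices [lo, hi)
    directly to callee `callee_rank`.
    """
    schedule = []
    for i, (c_lo, c_hi) in enumerate(caller_blocks):
        for j, (e_lo, e_hi) in enumerate(callee_blocks):
            lo, hi = max(c_lo, e_lo), min(c_hi, e_hi)
            if lo < hi:  # keep only non-empty overlaps
                schedule.append((i, j, lo, hi))
    return schedule


# Example: a 10-element global array, M = 3 callers, N = 2 callees.
callers = block_representation(10, 3)  # [(0, 4), (4, 7), (7, 10)]
callees = block_representation(10, 2)  # [(0, 5), (5, 10)]
schedule = redistribution_schedule(callers, callees)
# [(0, 0, 0, 4), (1, 0, 4, 5), (1, 1, 5, 7), (2, 1, 7, 10)]
```

Every caller can compute its own entries independently from the two representations, so no process has to gather or forward the whole array. The same interval-overlap test can be applied per dimension for multi-dimensional blocks.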
Parallel Proxy
A parallel proxy consists of a set of callee
URLs. It also stores the argument
representations of the callee side.
Efficient Array Redistribution
Invocation ordering
When multiple parallel invocations arrive at the
same callee simultaneously, the invocation
ordering may become inconsistent.
- A centralized server (e.g., the first node)
maintains the order.
- The ordering is not enforced when it is not
necessary.

Acknowledgments
This work was supported by the DOE Center for
Component Technology for Terascale Simulation
Software (CCTTSS) and by NSF (ACI 0113829), Data
Parallel Component Software.
kzhang_at_cs.utah.edu