Title: Grid Programming with Pop-C: Feedback from some practical experiments
1. Grid Programming with Pop-C: Feedback from some practical experiments
- Clovis Dongmo Jiogo
- Pierre Manneback
- CETIC, CoreGRID VI of RMS (WP6)
- Pierre Kuonen
- HES-SO, CoreGRID VI of PM (WP3) and RMS (WP6)
2. Grid Programming with Pop-C: Feedback from some practical experiments
- Inter-workpackage Research Activity (WP3-WP6)
- HES-SO: P. Kuonen (programming model)
- CETIC (Polytech Mons, Belgium): P. Manneback, C. Dongmo (resource management)
- I am external to WP3, so please take this into account!
3. Some returns from CoreGRID
- CETIC and Polytech-Mons are partners of the IP BEinGRID project (Business Experiments in GRID)
- IP, FP6, Call 5, starting June 1st, 2006
- CETIC as Business Experiments Coordinator
- Polytech-Mons as gridificator of an image rendering application
- Coordinator: ATOS Origin, Barcelona
4. Some achievements
- Joint paper Polytech-Mons / HES-SO
- Accepted to
- POOSC06
- HeteroPar06 (Cluster06, CoreGRID label)
- Intended CoreGRID TR
- Intended REP and fellowship
- Collaboration with INRIA Futurs about the YML framework (Grid workflow framework, S. Petiton)
5. Purpose of this work
- Test Pop-C for some scientific computations on Grids
- Use the programming model
- Evaluate performance
- Show assets and weaknesses
- Draw perspectives for future work
6. Agenda
- Overview of POP-C
- The experiment: sparse matrix/vector product
- Programming it in Pop-C
- Experimental results
- Assets and weaknesses of POP-C
7. POP-C
- Parallel object-oriented programming model developed at EPFL and HES-SO (Switzerland)
- Service oriented
- Resource allocation driven by object requirements
- Various invocation semantics
- Ask Pierre Kuonen for more details (or recall his talks!)
8. POP-C Programming Model
- Extension of the C++ language
- Data transmission via shared object
- Two levels of parallelism
- Inter-object parallelism
- Intra-object parallelism
- Transparent and dynamic object allocation guided by the object's resource needs
- Capacity to glue to Grid toolkits
9. Invocation semantics: interface side
- Two ways to call a method
- Synchronous
- Method returns when the execution is finished
- Same semantics as a sequential invocation
- Asynchronous
- Method returns immediately
- Allows parallelism, but no return value
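As a rough caller-side illustration (a sketch only, using a hypothetical Worker parallel class and the keywords introduced on the next slides):

    // Hypothetical parallel class, for illustration only
    parclass Worker {
    public:
        sync int Compute(int x);     // synchronous: the caller waits for the result
        async void Notify(int code); // asynchronous: the caller continues immediately
    };

    // Caller side
    Worker w;
    int r = w.Compute(42);  // blocks until Compute finishes, like a sequential call
    w.Notify(r);            // returns at once; may run in parallel, but no return value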
10. Method call semantics: definition
- 1 - An arriving concurrent call can be executed concurrently (time sharing) as soon as it arrives, except if mutex calls are pending or executing. In the latter case it is executed after completion of all previously arrived mutex calls.
- 2 - An arriving sequential call is executed after completion of all previously arrived sequential and mutex calls.
- 3 - An arriving mutex call is executed after completion of all previously arrived calls.
11. POP-C Syntax
- POP-C is an implementation of the parallel object model as an extension of C++ with six new keywords
- parclass: declares a parallel class
- Any instance (object) of a parallel class is a parallel object
- async: asynchronous method call
- sync: synchronous method call
- conc: concurrent method execution
- seq: sequential method execution
- mutex: mutex method execution
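Putting the six keywords together, a hypothetical declaration (a sketch only, not taken from the slides) could look like this, using the object-side/interface-side keyword order shown later for SparseMatrix:

    // Hypothetical parallel class combining object-side (conc/seq/mutex)
    // and interface-side (sync/async) semantics; illustration only.
    parclass Counter {
    public:
        conc  async void Add(int x);    // non-blocking call, may execute concurrently
        seq   async void Log(int x);    // non-blocking call, serialized with other seq/mutex calls
        mutex sync  int  Snapshot();    // blocking call, waits for all earlier calls to complete
    };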
12. POP-C architecture
- A multi-layer architecture
- POP-C services are implemented as parallel objects
- Easy to extend and customize through parallel object inheritance and polymorphism
13. POP-C system
[Layered architecture diagram: high-performance applications run on top of the POP-C programming language and compiler (which support parallel objects); below, the middleware for POP-C applications provides global services, application-scope services and communication services, on top of a heterogeneous environment]
14. Requirement-driven objects
- Each parallel object has a user-specified object description (OD)
- The OD describes the requirements of the parallel object
- The OD is used as a guideline for resource allocation and object migration
- The OD can be expressed in terms of
- Maximum computing power (e.g. MFlops)
- Communication bandwidth with its interface
- Memory needed
- The OD can be parameterized on each parallel object
- Two types: strict and non-strict OD
15. Object description example

    parclass Matrix {
        Matrix(int n) @{
            od.power(300, 100);
            od.memory(n * n * sizeof(double) / 1E6);
            od.protocol("socket http");
        };
        ...
    };

- The creation of an object of the Matrix parallel class requires
- A computing power of 300 MFlops, but 100 MFlops is acceptable
- A memory capacity of at least n*n*sizeof(double)/1E6 MBytes
- A socket or HTTP protocol for communication
16. Agenda
- Overview of POP-C
- The experiment: sparse matrix/vector product
- Programming it in Pop-C
- Experimental results
- Assets and weaknesses of POP-C
17. Why the sparse matrix/vector product?
- It is the computation kernel of iterative methods for linear or eigenvalue solvers
- Simple computation, but it has to be efficient!
18. Classical sparse storage format: CRS
The CRS data structure uses three vectors.
Example 5x5 matrix:

    11  0 14  0  0
     0 22  0  0  0
     0  0  0  0  0
     0 14  0  0 45
    15  0  0 45  0

CRS representation:

    Row_ptr: 0 2 3 3 5 7
    Col_ind: 0 2 1 1 4 0 3
    Mat_val: 11 14 22 14 45 15 45
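As a minimal, self-contained sketch (plain C++, independent of POP-C), the example matrix above can be stored in these three arrays and multiplied by a vector as follows:

    #include <cstdio>
    #include <vector>

    int main() {
        // CRS storage of the 5x5 example matrix above
        std::vector<int>    row_ptr = {0, 2, 3, 3, 5, 7};
        std::vector<int>    col_ind = {0, 2, 1, 1, 4, 0, 3};
        std::vector<double> mat_val = {11, 14, 22, 14, 45, 15, 45};

        std::vector<double> x(5, 1.0), y(5, 0.0);   // multiply by the all-ones vector
        for (int i = 0; i < 5; ++i)
            for (int j = row_ptr[i]; j < row_ptr[i + 1]; ++j)
                y[i] += mat_val[j] * x[col_ind[j]];

        for (double yi : y) std::printf("%g ", yi); // prints: 25 22 0 59 60
        std::printf("\n");
        return 0;
    }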
19. Sparse matrix/vector partitioning
[Figure: the sparse matrix is split into contiguous row blocks assigned to the resources R1, R2, R3]
The sparse matrix is partitioned according to the resource power.
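One simple way to obtain such a partitioning (a sketch only, not the balancing heuristic of the following slides; PartitionRows, row_ptr and power are hypothetical names) is to cut the rows so that each resource receives a share of the nonzeros proportional to its relative power:

    #include <vector>

    // Returns, for each resource, the index of its first row.
    // Assumes a CRS row_ptr of length n+1 and power[k] > 0 for every resource k.
    std::vector<int> PartitionRows(const std::vector<int>& row_ptr,
                                   const std::vector<double>& power) {
        int n = (int)row_ptr.size() - 1;
        double total_power = 0.0;
        for (double p : power) total_power += p;

        std::vector<int> first_row;
        double nz = row_ptr[n], target = 0.0;
        int row = 0;
        for (double p : power) {
            first_row.push_back(row);             // this resource starts here
            target += nz * p / total_power;       // cumulative nonzeros it should own
            while (row < n && row_ptr[row + 1] <= target) ++row;
        }
        return first_row;
    }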
20. Distribution model
Find a matrix partitioning which minimizes the total execution time.
[Figure: matrix A is split into row blocks A1-A4, each assigned to a different resource, with the execution time of each block shown]
21. Balancing Heuristic
- Load balancing
- Linear computing time
- Efficient: e << 1
Will be presented at the POOSC workshop, ECOOP, Nantes, July 2006.
22. Agenda
- Overview of POP-C
- The experiment: sparse matrix/vector product
- Programming it in Pop-C
- Experimental results
- Assets and weaknesses of POP-C
23. The parallel class SparseMatrix

    parclass SparseMatrix {
    public:
        SparseMatrix() @{ od.power(wanted, min); };
        seq async void Init([in, size=n+1] int *row_ptr,
                            [in, size=nz] int *col_ind,
                            [in, size=nz] double *mat_val,
                            int n, int nz);
        conc sync void MvMultiply([in, size=n] double *v_old, int n);
        mutex sync int GetResult([out, size=n] double *v_new, int n);
    private:
        int *row_ptr, *col_ind;
        double *mat_val;
        double *v;
        int n, nz;
    };
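A hypothetical master-side driver (a sketch only; np, blocks, first_row and the vectors v_old/v_new are assumed names, not from the slides) would create one SparseMatrix object per resource, initialize it with its row block, and then repeat the product:

    // Sketch of the master side, assuming the parclass above and
    // a pre-computed row-block partitioning of the CRS arrays.
    std::vector<SparseMatrix*> part(np);
    for (int k = 0; k < np; k++) {
        part[k] = new SparseMatrix();      // allocation site is chosen from the OD
        part[k]->Init(blocks[k].row_ptr, blocks[k].col_ind, blocks[k].mat_val,
                      blocks[k].n, blocks[k].nz);                  // seq async
    }
    for (int it = 0; it < 1000; it++) {    // 1000 products, as in the experiments
        for (int k = 0; k < np; k++)
            part[k]->MvMultiply(v_old, n);                         // conc sync
        for (int k = 0; k < np; k++)
            part[k]->GetResult(&v_new[first_row[k]], blocks[k].n); // mutex sync
        // v_new becomes v_old for the next product (copy omitted)
    }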
24. Implementation of a method
Methods are written in C++.

    void SparseMatrix::MvMultiply(double *vector, int n) {
        for (int i = 0; i < n; i++) {
            vect_res[i] = 0.0;                                // reset the local result
            for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++)
                vect_res[i] += mat_val[j] * vector[col_ind[j]];
        }
    }
25. Execution steps
[Figure: the data file is partitioned and distributed to the resources R1-R4; each object performs MvMultiply and the results are gathered (ComputeResult)]
26. Agenda
- Overview of POP-C
- The experiment: sparse matrix/vector product
- Programming it in Pop-C
- Experimental results
- Assets and weaknesses of POP-C
27. Platform for the experiment
- AMD Athlon
- 2 GHz, 256 MB RAM
- Fast Ethernet
- Sun Fire V20 cluster
- 10 dual-Opteron nodes
- 1.8 GHz, 1 GB RAM
- Gigabit Ethernet
28. Test matrices

    Matrix              Application               Size (n)   nz         nz/n
    (b) poisson3Db      Finite element modeling   85623      2374949    27.74
    (c) Stanford-web    Web crawling              281903     2382912    8.45
    (d) Stanford-w.b.   Web crawling              685230     8006115    11.68
29. Experimental results

    Matrix   Type      1 proc   2 proc   4 proc   8 proc   12 proc
    (b)      POP-C     108.2    62.8     31.4     22.9     22.7
    (b)      LAM/MPI   96.5     52.6     39.2     20.7     16.9
    (c)      POP-C     230.3    120.0    63.3     41.4     36.4
    (c)      LAM/MPI   215.6    111.6    73.8     43.2     33.6
    (d)      POP-C     267.7    112.4    80.5     49.2     48.4
    (d)      LAM/MPI   173.5    101.3    64.5     46.2     46.8
Execution time for 1000 products
30. Experimental results
[Figures: execution-time plots for matrix (b) and matrix (d)]
31. Agenda
- Overview of POP-C
- The experiment: sparse matrix/vector product
- Programming it in Pop-C
- Experimental results
- Assets and weaknesses of POP-C
32. POP-C Assets
- Simple parallel object-oriented model
- Interesting directives for task allocation (resource management)
- Interesting performance
- Ability to glue to Grid toolkits
- May interact with MPI parallel objects
33. Weaknesses of POP-C
- Requires coarse-grained computations
- Task assignment is not optimized
- Significant initialization time required for object creation
- Some important functionalities are not (yet?) supported
- Interaction with monitoring for dynamic task allocation
- Support for task migration and fault tolerance
34. Perspectives
- Improve the performance by coupling POP-C with MPI
- Propose a dynamic scheduler for task assignment
- Reduce object creation time
- Evaluate POP-C performance on a real Grid environment and on real applications
- Implement object migration