Title: GRID superscalar: a programming paradigm for GRID applications
1GRID superscalar: a programming paradigm for GRID applications
- CEPBA-IBM Research Institute
- Rosa M. Badia, Jesús Labarta, Josep M. Pérez, Raül Sirvent
2Outline
- Objective
- The essence
- User interface
- Automatic code generation
- Run-time features
- Programming experiences
- Ongoing work
- Conclusions
3Objective
- Ease the programming of GRID applications
- Basic idea
- ns → seconds/minutes/hours
5The essence
- Assembly language for the GRID
- Simple sequential programming, well defined operations and operands
- C/C++, Perl, ...
- Automatic run time parallelization
- Use architectural concepts from microprocessor design
- Instruction window (DAG), dependence analysis, scheduling, locality, renaming, forwarding, prediction, speculation, ...
6The essence
Input/output files
    for (int i = 0; i < MAXITER; i++) {
        newBWd = GenerateRandom();
        subst(referenceCFG, newBWd, newCFG);
        dimemas(newCFG, traceFile, DimemasOUT);
        post(newBWd, DimemasOUT, FinalOUT);
        if (i % 3 == 0) Display(FinalOUT);
    }
    fd = GS_Open(FinalOUT, R);
    printf("Results file:\n"); present(fd);
    GS_Close(fd);
7The essence
8The essence
[Figure: task dependence graph built from the loop above; node labels: Subst, DIMEMAS, EXTRACT, Display, GS_open]
10User interface
- Three components
- Main program
- Subroutines/functions
- Interface Definition Language (IDL) file
- Programming languages: C/C++, Perl
11User interface
- A typical sequential program
- Main program
    for (int i = 0; i < MAXITER; i++) {
        newBWd = GenerateRandom();
        subst(referenceCFG, newBWd, newCFG);
        dimemas(newCFG, traceFile, DimemasOUT);
        post(newBWd, DimemasOUT, FinalOUT);
        if (i % 3 == 0) Display(FinalOUT);
    }
    fd = GS_Open(FinalOUT, R);
    printf("Results file:\n"); present(fd);
    GS_Close(fd);
12User interface
- A typical sequential program
- Subroutines/functions
    void dimemas(in File newCFG, in File traceFile, out File DimemasOUT)
    {
        char command[500];
        putenv("DIMEMAS_HOME=/usr/local/cepba-tools");
        sprintf(command, "/usr/local/cepba-tools/bin/Dimemas -o %s %s",
                DimemasOUT, newCFG);
        GS_System(command);
    }
    void display(in File toplot)
    {
        char command[500];
        sprintf(command, "./display.sh %s", toplot);
        GS_System(command);
    }
13User interface
- GRID superscalar programming requirements
- Main program: open/close files with
- GS_FOpen, GS_Open, GS_FClose, GS_Close
- Currently required. Next versions will implement a version of the C library functions with GRID superscalar semantics
- Subroutines/functions
- Temporary files on local directory, or ensure uniqueness of name per subroutine invocation
- GS_System instead of system
- All input/output files required must be passed as arguments
14User interface
- Gridifying the sequential program
- CORBA-IDL like interface
- In/Out/InOut files
- Scalar values (in or out)
- The subroutines/functions listed in this file will be executed on a remote server in the Grid.
    interface MC {
        void subst(in File referenceCFG, in double newBW, out File newCFG);
        void dimemas(in File newCFG, in File traceFile, out File DimemasOUT);
        void post(in File newCFG, in File DimemasOUT, inout File FinalOUT);
        void display(in File toplot);
    };
16Automatic code generation
[Figure: gsstubgen takes app.idl as input; files shown: app.xml, app-worker.c, app.c, app-functions.c, app.h, app-stubs.c, app_constraints.cc, app_constraints_wrapper.cc, app_constraints.h]
17Sample stubs file
    #include <stdio.h>
    ...
    int gs_result;
    void Subst(file referenceCFG, double seed, file newCFG)
    {
        /* Marshalling/Demarshalling buffers */
        char *buff_seed;
        /* Allocate buffers */
        buff_seed = (char *)malloc(atoi(getenv("GS_GENLENGTH")) + 1);
        /* Parameter marshalling */
        sprintf(buff_seed, "%.20g", seed);
        Execute(SubstOp, 1, 1, 1, 0, referenceCFG, buff_seed, newCFG);
        /* Deallocate buffers */
        free(buff_seed);
    }
18Sample worker main file
    #include <stdio.h>
    ...
    int main(int argc, char **argv)
    {
        enum operationCode opCod = (enum operationCode)atoi(argv[2]);
        IniWorker(argc, argv);
        switch (opCod) {
            case SubstOp:
            {
                double seed;
                seed = strtod(argv[4], NULL);
                Subst(argv[3], seed, argv[5]);
            }
            break;
            ...
        }
        EndWorker(gs_result, argc, argv);
        return 0;
    }
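The stub and worker samples pass scalar parameters as text: the stub prints the double with "%.20g" and the worker reads it back with strtod. The following is a minimal sketch of that round trip in isolation. The helper names marshal_double and demarshal_double and the fixed GEN_LENGTH are assumptions made for illustration (the generated stub sizes its buffer from the GS_GENLENGTH environment variable instead).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed buffer size for this sketch only; the generated stub reads
   the real value from the GS_GENLENGTH environment variable. */
#define GEN_LENGTH 64

/* Stub side (hypothetical helper): encode a double as text.
   Returns 0 on success, -1 if the text does not fit in the buffer. */
int marshal_double(double value, char *buf, size_t len)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "%.20g", value);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/* Worker side (hypothetical helper): decode the text back into a double,
   as the sample worker does with strtod(argv[4], NULL). */
double demarshal_double(const char *text)
{
    assert(text != NULL);
    return strtod(text, NULL);
}
```

Twenty significant digits are more than the 17 needed to reproduce an IEEE 754 double exactly, so a value that is marshalled and then demarshalled this way compares equal to the original.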
19Sample constraints skeleton file
    #include "mcarlo_constraints.h"
    #include "user_provided_functions.h"
    string Subst_constraints(file referenceCFG, double seed, file newCFG)
    {
        string constraints = "";
        return constraints;
    }
    double Subst_cost(file referenceCFG, double seed, file newCFG)
    {
        return 1.0;
    }
    ...
20Sample constraints wrapper file (1)
    #include <stdio.h>
    ...
    typedef ClassAd (*constraints_wrapper)(char **_parameters);
    typedef double (*cost_wrapper)(char **_parameters);
    // Prototypes
    ClassAd Subst_constraints_wrapper(char **_parameters);
    double Subst_cost_wrapper(char **_parameters);
    ...
    // Function tables
    constraints_wrapper constraints_functions[4] = {
        Subst_constraints_wrapper,
        ...
    };
    cost_wrapper cost_functions[4] = {
        Subst_cost_wrapper,
        ...
    };
21Sample constraints wrapper file (2)
    ClassAd Subst_constraints_wrapper(char **_parameters)
    {
        char **_argp;
        // Generic buffers
        char *buff_referenceCFG; char *buff_seed;
        // Real parameters
        char *referenceCFG; double seed;
        // Read parameters
        _argp = _parameters;
        buff_referenceCFG = *(_argp++); buff_seed = *(_argp++);
        // Datatype conversion
        referenceCFG = buff_referenceCFG; seed = strtod(buff_seed, NULL);
        string _constraints = Subst_constraints(referenceCFG, seed);
        ClassAd _ad;
        ClassAdParser _parser;
        _ad.Insert("Requirements", _parser.ParseExpression(_constraints));
        // Free buffers
        return _ad;
    }
22Sample constraints wrapper file (3)
    double Subst_cost_wrapper(char **_parameters)
    {
        char **_argp;
        // Generic buffers
        char *buff_referenceCFG; char *buff_seed;
        // Real parameters
        char *referenceCFG; double seed;
        // Allocate buffers
        // Read parameters
        _argp = _parameters;
        buff_referenceCFG = *(_argp++);
        buff_seed = *(_argp++);
        // Datatype conversion
        referenceCFG = buff_referenceCFG;
        seed = strtod(buff_seed, NULL);
        double _cost = Subst_cost(referenceCFG, seed);
        return _cost;
    }
23Binary building
[Figure: building the client binary; labels: app.c, app-stubs.c, app_constraints.cc, app_constraints_wrapper.cc, ..., GRID superscalar runtime, GT2; GT2 services: gsiftp, gram]
24Calls sequence without GRID superscalar
[Figure: app.c calls app-functions.c directly, both on LocalHost]
25Calls sequence with GRID superscalar
[Figure: on LocalHost, app.c, app-stubs.c, app_constraints_wrapper.cc, app_constraints.cc, the GRID superscalar runtime and GT2; on RemoteHost, app-worker.c and app-functions.c]
27Run-time features
- Previous prototype over Condor and MW
- Current prototype over Globus 2.x, using the API
- File transfer, security, provided by Globus
- Run-time implemented primitives
- GS_on, GS_off
- Execute
- GS_Open, GS_Close, GS_FClose, GS_FOpen
- GS_Barrier
- Worker side: GS_System
28Run-time features
- Data dependence analysis
- Renaming
- File forwarding
- Shared disks management and file transfer policy
- Resource brokering
- Task scheduling
- Task submission
- End of task notification
- Results collection
- Explicit task synchronization
- File management primitives
- Checkpointing at task level
- Deployer
- Exception handling
29Data-dependence analysis
- Data dependence analysis
- Detects RaW, WaR, WaW dependencies based on file parameters
- Oriented to simulations, FET solvers, bioinformatics applications
- Main parameters are data files
- The tasks' Directed Acyclic Graph is built based on these dependencies
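To make the three dependence kinds concrete, here is a minimal sketch that classifies the dependence of a later task on an earlier one purely from their input and output file names. The task_t record, the DEP_* flags and classify_dependence are hypothetical names chosen for this example and are not part of the GRID superscalar run-time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FILES 4  /* illustrative limit for this sketch */

/* Hypothetical task record: only the file parameters matter here. */
typedef struct {
    const char *in[MAX_FILES];   /* files the task reads  */
    size_t n_in;
    const char *out[MAX_FILES];  /* files the task writes */
    size_t n_out;
} task_t;

enum { DEP_NONE = 0, DEP_RAW = 1, DEP_WAR = 2, DEP_WAW = 4 };

/* Returns 1 if `name` appears among the first n entries of `files`. */
static int contains(const char *const *files, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(files[i], name) == 0)
            return 1;
    return 0;
}

/* Bitmask of the dependencies of `later` on `earlier` (program order). */
int classify_dependence(const task_t *earlier, const task_t *later)
{
    assert(earlier != NULL && later != NULL);
    int deps = DEP_NONE;
    for (size_t i = 0; i < earlier->n_out; i++) {
        /* RaW: later reads what earlier writes. */
        if (contains(later->in, later->n_in, earlier->out[i]))
            deps |= DEP_RAW;
        /* WaW: both write the same file. */
        if (contains(later->out, later->n_out, earlier->out[i]))
            deps |= DEP_WAW;
    }
    for (size_t i = 0; i < earlier->n_in; i++)
        /* WaR: later overwrites what earlier reads. */
        if (contains(later->out, later->n_out, earlier->in[i]))
            deps |= DEP_WAR;
    return deps;
}
```

With the subst and dimemas calls of the sample main program, dimemas reads the newCFG file that subst writes, so the pair is classified as RaW; a second subst iteration writing newCFG again is WaW with the first subst and WaR with the first dimemas.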
30File-renaming
- WaW and WaR dependencies are avoidable with renaming
    while (!end_condition()) {
        T1 (…, …, f1);
        T2 (f1, …, …);
        T3 (…, …, …);
    }
[Figure: task instances T1_1, T1_2, …, T1_N, T2_1, T2_2, T3_1, T3_2; file f1 and its renamed versions f1_1, f1_2; WaR and WaW dependencies marked]
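One way to picture renaming is a table that hands every writer a fresh version of the file while readers keep seeing the latest one. The sketch below assumes exactly that. The f1_1, f1_2 naming follows the figure labels; the table layout and the functions name_for_read and name_for_write are hypothetical and do not describe the actual run-time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_TRACKED 16   /* illustrative limits for this sketch */
#define NAME_LEN 64

typedef struct {
    char base[NAME_LEN]; /* name used in the user's program */
    int version;         /* -1: not seen yet, 0: base name, k: base_k */
} tracked_file_t;

typedef struct {
    tracked_file_t files[MAX_TRACKED];
    int count;
} rename_table_t;

/* Find the entry for `base`, creating it on first use. */
static tracked_file_t *lookup(rename_table_t *t, const char *base)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->files[i].base, base) == 0)
            return &t->files[i];
    assert(t->count < MAX_TRACKED && strlen(base) < NAME_LEN);
    tracked_file_t *f = &t->files[t->count++];
    strcpy(f->base, base);
    f->version = -1;
    return f;
}

static void format_name(const tracked_file_t *f, char *out, size_t len)
{
    if (f->version <= 0)
        snprintf(out, len, "%s", f->base);
    else
        snprintf(out, len, "%s_%d", f->base, f->version);
}

/* A reader sees the most recent version; a file that was never
   written by a task is assumed to exist under its base name. */
void name_for_read(rename_table_t *t, const char *base, char *out, size_t len)
{
    tracked_file_t *f = lookup(t, base);
    if (f->version < 0)
        f->version = 0;
    format_name(f, out, len);
}

/* Every writer gets a fresh version, so it never overwrites data that
   earlier tasks still read (WaR) or have written (WaW). */
void name_for_write(rename_table_t *t, const char *base, char *out, size_t len)
{
    tracked_file_t *f = lookup(t, base);
    f->version++;
    format_name(f, out, len);
}
```

Starting from a zero-initialized rename_table_t, successive writers of f1 receive f1, f1_1 and f1_2, so only the RaW dependencies between a writer and the readers of its version remain.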
31File forwarding
[Figure: without forwarding, T1 writes f1 and T2 then reads it; with forwarding, T1 sends f1 to T2 by socket]
- File forwarding reduces the impact of RaW data dependencies
32File transfer policy
[Figure: client, server1 and server2, each with its own working directory; tasks T1 and T6; files f1, f4 and f7 transferred between the working directories]
33Shared working directories
[Figure: client, server1 and server2 with shared working directories; tasks T1 and T6; files f1, f4 and f7]
34Shared input disks
[Figure: input directories shared among client, server1 and server2]
35Disks configuration file
Shared directories:
    khafre.cepba.upc.es    SharedDisk0  /app/DB/input_data
    kandake0.cepba.upc.es  SharedDisk0  /usr/DB/inputs
    kandake1.cepba.upc.es  SharedDisk0  /usr/DB/inputs
Working directories:
    kandake0.cepba.upc.es  DiskLocal0   /home/ac/rsirvent/matmul-perl/worker_perl
    kandake1.cepba.upc.es  DiskLocal0   /home/ac/rsirvent/matmul-perl/worker_perl
    khafre.cepba.upc.es    DiskLocal1   /home/ac/rsirvent/matmul_worker/worker
36Resource Broker
- Resource brokering
- Currently not a main project goal
- Interface between run-time and broker
- A Condor resource ClassAd is built for each resource
- Broker configuration file
    Machine LimitOfJobs Queue WorkingDirectory Arch OpSys GFlops Mem NCPUs SoftNameList
    Workers:
    khafre.cepba.upc.es 3 none /home/ac/rsirvent/DEMOS/mcarlo i386 Linux 1.475 2587 4 Perl560 Dimemas23
    kadesh.cepba.upc.es 0 short /user1/uni/upc/ac/rsirvent/DEMOS/mcarlo powerpc AIX 1.5 8000 16 Perl560 Dimemas23
    Localhost:
    kandake.cepba.upc.es /home/ac/rsirvent/McarloClAds
37Resource selection (1)
- Cost and constraints are specified by the user, per IDL task
- Cost (time) of each task instance is estimated
    double Dimem_cost(file cfgFile, file traceFile)
    {
        double time;
        time = (GS_Filesize(traceFile)/1000000) * f(GS_GFlops());
        return(time);
    }
- A task ClassAd is built at run time for each task instance
    string Dimem_constraints(file cfgFile, file traceFile)
    {
        return "(member(\"Dimemas\", other.SoftNameList))";
    }
38Resource selection (2)
- Broker receives requests from the run-time
- ClassAd library used to match resource ClassAds with task ClassAds
- If more than one resource matches, selects the one which minimizes
- FT: File transfer time to resource r
- ET: Execution time of task t on resource r (using the user-provided cost function)
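The selection step can be sketched as a scan over the matching resources. The slide's formula did not survive extraction, so this example assumes the two terms are simply added, FT(r) + ET(t, r). The candidate_t record and select_resource are hypothetical names, and the times are taken as already estimated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one resource that matched the task ClassAd. */
typedef struct {
    const char *host;
    double transfer_time; /* FT(r): estimated file transfer time, seconds */
    double exec_time;     /* ET(t, r): estimate from the user's cost function */
} candidate_t;

/* Returns the index of the candidate with the smallest FT + ET,
   or -1 when no resource matched. Ties keep the earliest candidate.
   Assumption: the two estimates are combined by plain addition. */
int select_resource(const candidate_t *c, size_t n)
{
    assert(c != NULL || n == 0);
    int best = -1;
    double best_cost = 0.0;
    for (size_t i = 0; i < n; i++) {
        double cost = c[i].transfer_time + c[i].exec_time;
        if (best < 0 || cost < best_cost) {
            best = (int)i;
            best_cost = cost;
        }
    }
    return best;
}
```

Under this assumption a resource that already holds the input files (small FT) can win over a faster machine that would need a long transfer first.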
39Task scheduling
- Distributed between the Execute call, the callback function and the GS_Barrier call
- Possibilities
- The task can be submitted immediately after being created
- Task waiting for resource
- Task waiting for data dependency
- GS_Barrier primitive before ending the program, which waits for all tasks
40Task submission
- Task submitted for execution as soon as its data dependencies are resolved, if resources are available
- Composed of
- File transfer
- Task submission
- All specified in RSL
- Temporary directory created in the server working directory for each task
- Calls to Globus
- globus_gram_client_job_request
- globus_gram_client_callback_allow
- globus_poll_blocking
41End of task notification
- Asynchronous state-change callbacks monitoring system
- globus_gram_client_callback_allow()
- callback_func function
- Data structures are updated in the Execute function, the GRID superscalar primitives and GS_Barrier
42Results collection
- Collection of output parameters which are not files
- Partial barrier synchronization (task generation from main code cannot continue until this scalar result value is available)
- Socket and file mechanisms provided
43GS_Barrier
- Explicit task synchronization: GS_Barrier
- Inserted in the user main program when required
- Main program execution is blocked
- globus_poll_blocking() called
- Once all tasks are finished the program may resume
44File management primitives
- GRID superscalar file management API primitives
- GS_FOpen
- GS_FClose
- GS_Open
- GS_Close
- Mandatory for file management operations in main program
- Opening a file with write option
- Data dependence analysis
- Renaming is applied
- Opening a file with read option
- Partial barrier until the task that is generating that file as output file finishes
- Internally file management functions are handled as local tasks
- Task node inserted
- Data-dependence analysis
- Function locally executed
- Future work: offer a C library with GS semantics (source code with typical calls could be used)
45Task level checkpointing
- Inter-task checkpointing
- Recovers sequential consistency in the out-of-order execution of tasks
[Figure: successful execution; tasks 0 to 6 with states Completed, Running and Committed]
46Task level checkpointing
[Figure: failing execution; tasks 0 to 6 with states Finished correctly, Completed, Running, Committed and Failing; label: Cancel]
47Task level checkpointing
[Figure: restart execution; tasks 0 to 6 with states Finished correctly, Completed, Running, Committed and Failing; execution continues normally]
48Checkpointing
- On failure: from N versions of a file to one version (last committed version)
- Transparent to application developer
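A minimal sketch of the in-order commit idea behind task-level checkpointing: tasks may complete out of order, but they are committed only in sequential order, so after a failure the execution can restart from the first uncommitted task. The state names mirror the figure labels; commit_in_order and the array layout are assumptions made for illustration, not the run-time's data structures.

```c
#include <assert.h>
#include <stddef.h>

/* States mirror the labels on the checkpointing slides. */
typedef enum {
    TASK_RUNNING,
    TASK_COMPLETED,  /* finished, but an earlier task is still pending */
    TASK_COMMITTED,  /* finished, and so are all tasks before it */
    TASK_FAILED
} task_state_t;

/* Commits, in program order, every completed task starting at
   `first_uncommitted`, stopping at the first task that has not
   completed. Returns the new index of the first uncommitted task,
   which is where a restart would resume. */
size_t commit_in_order(task_state_t *states, size_t n, size_t first_uncommitted)
{
    assert(states != NULL && first_uncommitted <= n);
    size_t i = first_uncommitted;
    while (i < n && states[i] == TASK_COMPLETED)
        states[i++] = TASK_COMMITTED;
    return i;
}
```

A task that completes ahead of a still-running predecessor stays in the Completed state; it becomes Committed only once every task before it has been committed.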
49Deployer
- Java based GUI
- Allows workers specification: host details, libraries location
- Selection of Grid configuration
- Grid configuration checking process
- Aliveness of host (ping)
- Globus service is checked by submitting a simple test
- Sends a remote job that copies the code needed in the worker, and compiles it
- Automatic deployment
- Sends and compiles code in the remote workers and the master
- Configuration files generation
50Deployer (2)
51Exception handling
- GS_Speculative_End(func) / GS_Throw
    void Dimemas(char *cfgFile, char *traceFile, double goal, char *DimemasOUT)
    {
        ...
        putenv("DIMEMAS_HOME=/aplic/DIMEMAS");
        sprintf(aux, "/aplic/DIMEMAS/bin/Dimemas -o %s %s", DimemasOUT, cfgFile);
        gs_result = GS_System(aux);
        distance_to_goal = distance(get_time(DimemasOUT), goal);
        if (distance_to_goal < goal*0.1) {
            printf("Goal Reached!!! Throwing exception.\n");
            GS_Throw;
        }
    }

    while (j < MAX_ITERS) {
        getRanges(Lini, BWini, Lmin, Lmax, BWmin, BWmax);
        for (i = 0; i < ITERS; i++) {
            L[i] = gen_rand(Lmin, Lmax);
            BW[i] = gen_rand(BWmin, BWmax);
            Filter("nsend.cfg", L[i], BW[i], "tmp.cfg");
            Dimemas("tmp.cfg", "nsend_rec_nosm.trf", Elapsed_goal, "dim_out.txt");
            Extract("tmp.cfg", "dim_out.txt", "final_result.txt");
        }
        getNewIniRange("final_result.txt", Lini, BWini);
        j++;
    }
    GS_Speculative_End(my_func);
- my_func: function executed when an exception is thrown
52Exception handling (2)
- Any worker can call GS_Throw at any moment
- The task that raises the GS_Throw is the last valid task (all sequential tasks after it must be undone)
- The speculative part is considered to run from the task that throws the exception until the GS_Speculative_End (no need for a Begin clause)
- Possibility of calling a local function when the exception is detected.
53Putting it all together: involved files
User provided files
app-functions.c
app.idl
app.c
Files generated from IDL
app-stubs.c
app-worker.c
app.h
app_constraints_wrapper.cc
app_constraints.cc
app_constraints.h
Files generated by deployer
broker.cfg
diskmaps.cfg
55Programming experiences
- Performance modelling (Dimemas, Paramedir)
- Algorithm flexibility
- NAS Grid Benchmarks
- Improved component programs flexibility
- Reduced Grid level source code lines
- Bioinformatics application (production)
- Improved portability (Globus vs just LoadLeveler)
- Reduced Grid level source code lines
- Pblade solution for bioinformatics
56Programming experiences
- fastDNAml
- Computes the likelihood of various phylogenetic trees, starting with aligned DNA sequences from a number of species (Indiana University code)
- Sequential and MPI (grid-enabled) versions available
- Ported to GRID superscalar
- Lower pressure on communications than MPI
- Simpler code than MPI
[Figure: task graph with tree evaluation tasks separated by barriers]
57NAS Grid Benchmarks
58NAS Grid Benchmarks
- All of them implemented with GRID superscalar
- Run with classes S, W, A
- Results scale as expected
- When several servers are used, ASCII mode is required
59Programming experiences
- Performance analysis
- GRID superscalar run-time instrumented
- Paraver tracefiles from the client side
- Measures of task execution time in the servers
60Programming experiences
- Overhead of GRAM Job Manager polling interval
61Programming experiences
[Figure: tasks BT, MF, MG, MF, FT on hosts Kadesh and Khafre; remote file transfers marked]
63Ongoing work
- OGSA oriented resource broker, based on Globus Toolkit 3.x
- Bindings to Ninf-G2
- Binding to ssh/rsh/scp
- New language bindings (shell script)
- And more future work
- Bindings to other basic middlewares
- GAT, ...
- Enhancements in the run-time performance guided by the performance analysis
64Conclusions
- Presentation of the ideas of GRID superscalar
- There is a viable way to ease the programming of Grid applications
- GRID superscalar run-time enables
- Use of the resources in the Grid
- Exploiting the existing parallelism
65More information
- GRID superscalar home page
- http://people.ac.upc.es/rosab/index_gs.htm
- Rosa M. Badia, Jesús Labarta, Raül Sirvent, Josep M. Pérez, José M. Cela, Rogeli Grima, "Programming Grid Applications with GRID Superscalar", Journal of Grid Computing, Volume 1 (Number 2), 151-170 (2003).