GRID superscalar: a programming model for the Grid — presentation transcript
1
GRID superscalar: a programming model for the Grid
Doctoral Thesis, Computer Architecture Department,
Technical University of Catalonia
  • Raül Sirvent Pardell
  • Advisor: Rosa M. Badia Sala

2
Outline
  • Introduction
  • Programming interface
  • Runtime
  • Fault tolerance at the programming model level
  • Conclusions and future work

3
Outline
  • Introduction
  • 1.1 Motivation
  • 1.2 Related work
  • 1.3 Thesis objectives and contributions
  • Programming interface
  • Runtime
  • Fault tolerance at the programming model level
  • Conclusions and future work

4
1.1 Motivation
  • The Grid architecture layers

Applications
Grid Middleware (Job management, Data transfer,
Security, Information, QoS, ...)
Distributed Resources
5
1.1 Motivation
  • What middleware should I use?

6
1.1 Motivation
  • Programming tools: are they easy?

Grid AWARE
Grid UNAWARE
VS.
7
1.1 Motivation
  • Can I run my programs in parallel?

Explicit parallelism
Implicit parallelism
VS.
for (i = 0; i < MSIZE; i++)
  for (j = 0; j < MSIZE; j++)
    for (k = 0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j));

fork / join: drawing the task graph by hand means explicit parallelism

8
1.1 Motivation
  • The Grid: a massive, dynamic and heterogeneous
    environment prone to failures
  • Study different techniques to detect and overcome
    failures
  • Checkpoint
  • Retries
  • Replication

9
1.2 Related work
10
1.3 Thesis objectives and contributions
  • Objective: create a programming model for the
    Grid
  • Grid unaware
  • Implicit parallelism
  • Sequential programming
  • Allows the use of well-known imperative languages
  • Speed up applications
  • Include fault detection and recovery

11
1.3 Thesis objectives and contributions
  • Contribution: GRID superscalar
  • Programming interface
  • Runtime environment
  • Fault tolerance features

12
Outline
  • Introduction
  • Programming interface
  • 2.1 Design
  • 2.2 User interface
  • 2.3 Programming comparison
  • Runtime
  • Fault tolerance at the programming model level
  • Conclusions and future work

13
2.1 Design
  • Interface objectives
  • Grid unaware
  • Implicit parallelism
  • Sequential programming
  • Allows the use of well-known imperative languages

14
2.1 Design
  • Target applications
  • Algorithms which may be easily split into tasks
  • Branch and bound computations, divide and conquer
    algorithms, recursive algorithms, ...
  • Coarse-grained tasks
  • Independent tasks
  • Scientific workflows, optimization algorithms,
    parameter sweeps
  • Main parameters are FILES
  • External simulators, finite element solvers,
    BLAST, GAMESS

15
2.1 Design
  • Application architecture: the master-worker
    paradigm
  • The master-worker parallel paradigm fits our
    objectives
  • Main program: the master
  • Functions: the workers
  • Function: generic representation of a task
  • Glue to transform a sequential application into a
    master-worker application: stubs + skeletons
    (RMI, RPC, ...)
  • Stub: call to the runtime interface
  • Skeleton: binary which calls the user function

16
2.1 Design
void matmul(char *f1, char *f2, char *f3)
{
  getBlocks(f1, f2, f3, A, B, C);
  for (i = 0; i < A->rows; i++)
    for (j = 0; j < B->cols; j++)
      for (k = 0; k < A->cols; k++)
        C->data[i][j] += A->data[i][k] * B->data[k][j];
  putBlocks(f1, f2, f3, A, B, C);
}

for (i = 0; i < MSIZE; i++)
  for (j = 0; j < MSIZE; j++)
    for (k = 0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j));

Local scenario
17
2.1 Design
(figure: app.c runs on the master; app-functions.c is deployed on every
worker; master and workers communicate through the Grid middleware)
Master-Worker paradigm
18
2.1 Design
  • Intermediate language concept: assembler code
  • In GRIDSs:
  • The Execute generic interface
  • Instruction set is defined by the user
  • Single entry point to the runtime (a stub sketch
    is shown below)
  • Allows easy building of programming language
    bindings (Java, Perl, Shell Script)
  • Easier technology adoption

C, C++, ...  →  Assembler  →  Processor execution
C, C++, ...  →  Workflow   →  Grid execution
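
A minimal sketch of what a generated stub could look like, assuming a
single Execute() entry point that receives an operation code plus the
parameter counts and the parameters themselves; the name MatmulOp and the
exact argument order are assumptions, not taken from the slides:

/* Hypothetical generated stub for the matmul task declared in the IDL.
   Execute() is the assumed single entry point to the runtime. */
void matmul(char *f1, char *f2, char *f3)
{
    Execute(MatmulOp,
            2 /* in files */, 0 /* out files */,
            1 /* inout files */, 0 /* scalars */,
            f1, f2, f3);
}

Because every stub funnels into the same entry point, a binding for another
language only needs to expose that one call.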
19
2.2 User interface
  • Steps to program an application
  • Task definition
  • Identify those functions/programs in the
    application that are going to be executed in the
    computational Grid
  • All parameters must be passed in the header
    (remote execution)
  • Interface Definition Language (IDL)
  • For every task defined, identify which parameters
    are input/output files and which are input/output
    scalars
  • Programming API: master and worker
  • Write the main program and the tasks using the
    GRIDSs API

20
2.2 User interface
  • Interface Definition Language (IDL) file
  • CORBA-IDL like interface
  • in/out/inout files
  • in/out/inout scalar values
  • The functions listed in this file will be
    executed in the Grid

interface MATMUL {
  void matmul(in File f1, in File f2, inout File f3);
};
21
2.2 User interface
  • Programming API: master and worker (see the usage
    sketch after this list)

app.c
app-functions.c
  • Master side
  • GS_On
  • GS_Off
  • GS_FOpen/GS_FClose
  • GS_Open/GS_Close
  • GS_Barrier
  • GS_Speculative_End
  • Worker side
  • GS_System
  • gs_result
  • GS_Throw

22
2.2 User interface
  • Task constraints and cost specification (a function
    sketch follows the expressions below)
  • Constraints allow specifying the needs of a task
    (CPU, memory, architecture, software, ...)
  • Build an expression in a constraint function
    (evaluated for every machine)
  • Cost: estimated execution time of a task (in
    seconds)
  • Useful for scheduling
  • Calculate it in a cost function
  • GS_GFlops / GS_Filesize may be used
  • An external estimator can also be called

other.Mem > 1024
cost = operations / GS_GFlops();
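
A hedged sketch of how the two expressions above could be wrapped as
functions for the matmul task; the naming convention <task>_constraints /
<task>_cost and the BLOCK_SIZE constant are illustrative assumptions:

/* Hypothetical constraint and cost functions evaluated by the runtime for
   every candidate machine ("other" refers to the machine's capabilities). */
#define BLOCK_SIZE 512                         /* assumed block dimension */

char *matmul_constraints(char *f1, char *f2, char *f3)
{
    return "other.Mem > 1024";                 /* needs more than 1 GB of memory */
}

double matmul_cost(char *f1, char *f2, char *f3)
{
    /* floating-point operations of a block product, expressed in Gflop */
    double operations = 2.0 * BLOCK_SIZE * BLOCK_SIZE * BLOCK_SIZE / 1e9;
    return operations / GS_GFlops();           /* estimated seconds on this machine */
}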
23
2.3 Programming comparison
  • Globus vs GRIDSs

Grid-aware
int main()
{
  rsl = "(executable=/home/user/sim)(arguments=input1.txt output1.txt)
         (file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input1.txt /home/user/input1.txt))
         (file_stage_out=/home/user/output1.txt gsiftp://bscgrid01.bsc.es/path/output1.txt)
         (file_clean_up=/home/user/input1.txt /home/user/output1.txt)";
  globus_gram_client_job_request("bscgrid02.bsc.es", rsl, NULL, NULL);

  rsl = "(executable=/home/user/sim)(arguments=input2.txt output2.txt)
         (file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input2.txt /home/user/input2.txt))
         (file_stage_out=/home/user/output2.txt gsiftp://bscgrid01.bsc.es/path/output2.txt)
         (file_clean_up=/home/user/input2.txt /home/user/output2.txt)";
  globus_gram_client_job_request("bscgrid03.bsc.es", rsl, NULL, NULL);

  rsl = "(executable=/home/user/sim)(arguments=input3.txt output3.txt)
         (file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input3.txt /home/user/input3.txt))
         (file_stage_out=/home/user/output3.txt gsiftp://bscgrid01.bsc.es/path/output3.txt)
         (file_clean_up=/home/user/input3.txt /home/user/output3.txt)";
  globus_gram_client_job_request("bscgrid04.bsc.es", rsl, NULL, NULL);
}
Explicit parallelism
24
2.3 Programming comparison
  • Globus vs GRIDSs

void sim(File input, File output)
{
  command = "/home/user/sim " + input + ' ' + output;
  gs_result = GS_System(command);
}

int main()
{
  GS_On();
  sim("/path/input1.txt", "/path/output1.txt");
  sim("/path/input2.txt", "/path/output2.txt");
  sim("/path/input3.txt", "/path/output3.txt");
  GS_Off(0);
}
25
2.3 Programming comparison
  • DAGMan vs GRIDSs

Task graph: A → (B, C) → D

Explicit parallelism, no if/while clauses (DAGMan):

JOB A A.condor
JOB B B.condor
JOB C C.condor
JOB D D.condor
PARENT A CHILD B C
PARENT B C CHILD D

Implicit parallelism (GRIDSs):

int main()
{
  GS_On();
  task_A(f1, f2, f3);
  task_B(f2, f4);
  task_C(f3, f5);
  task_D(f4, f5, f6);
  GS_Off(0);
}
26
2.3 Programming comparison
  • Ninf-G vs GRIDSs

Grid-aware, explicit parallelism (Ninf-G):

int main()
{
  grpc_initialize("config_file");
  grpc_object_handle_init_np("A", &A_h, "class");
  grpc_object_handle_init_np("B", &B_h, "class");
  for (i = 0; i < 25; i++) {
    grpc_invoke_async_np(A_h, "foo", &sid, f_in[2*i],   f_out[2*i]);
    grpc_invoke_async_np(B_h, "foo", &sid, f_in[2*i+1], f_out[2*i+1]);
  }
  grpc_wait_all();
  grpc_object_handle_destruct_np(A_h);
  grpc_object_handle_destruct_np(B_h);
  grpc_finalize();
}

Implicit parallelism (GRIDSs):

int main()
{
  GS_On();
  for (i = 0; i < 50; i++)
    foo(f_in[i], f_out[i]);
  GS_Off(0);
}
27
2.3 Programming comparison
  • VDL vs GRIDSs

No if/while clauses (VDL):

DV trans1( a2=@{output:"tmp.0"}, a1=@{input:"filein.0"} );
DV trans2( a2=@{output:"fileout.0"}, a1=@{input:"tmp.0"} );
DV trans1( a2=@{output:"tmp.1"}, a1=@{input:"filein.1"} );
DV trans2( a2=@{output:"fileout.1"}, a1=@{input:"tmp.1"} );
...
DV trans1( a2=@{output:"tmp.999"}, a1=@{input:"filein.999"} );
DV trans2( a2=@{output:"fileout.999"}, a1=@{input:"tmp.999"} );

GRIDSs:

int main()
{
  GS_On();
  for (i = 0; i < 1000; i++) {
    tmp = "tmp." + i;
    filein = "filein." + i;
    fileout = "fileout." + i;
    trans1(tmp, filein);
    trans2(fileout, tmp);
  }
  GS_Off(0);
}
28
Outline
  • Introduction
  • Programming interface
  • Runtime
  • 3.1 Scientific contributions
  • 3.2 Developments
  • 3.3 Evaluation tests
  • Fault tolerance at the programming model level
  • Conclusions and future work

29
3.1 Scientific contributions
  • Runtime objectives
  • Extract implicit parallelism in sequential
    applications
  • Speed up execution using the Grid
  • Main requirement: Grid middleware
  • Job management
  • Data transfer
  • Security

30
3.1 Scientific contributions
  • Apply computer architecture knowledge to the Grid
    (superscalar processor)

ns → seconds/minutes/hours
31
3.1 Scientific contributions
  • Data dependence analysis allows parallelism (a
    dependence-check sketch follows the table below)

task1(..., f1); task2(f1, ...)   →  Read after Write
task1(f1, ...); task2(..., f1)   →  Write after Read
task1(..., f1); task2(..., f1)   →  Write after Write
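
A minimal sketch of the dependence classification above, assuming each file
parameter carries the in/out/inout direction declared in the IDL (the helper
names are illustrative, not the runtime's):

/* Classify the dependence a later task has on an earlier one when both
   touch the same file; only the declared directions are needed. */
typedef enum { DIR_IN, DIR_OUT, DIR_INOUT } dir_t;
typedef enum { NO_DEP, RAW, WAR, WAW } dep_t;

dep_t file_dependence(dir_t earlier, dir_t later)
{
    int earlier_writes = (earlier != DIR_IN);
    int later_writes   = (later   != DIR_IN);
    int later_reads    = (later   != DIR_OUT);

    if (earlier_writes && later_reads)  return RAW;  /* must wait            */
    if (earlier_writes && later_writes) return WAW;  /* removable by renaming */
    if (later_writes)                   return WAR;  /* removable by renaming */
    return NO_DEP;                                   /* both only read        */
}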
32
3.1 Scientific contributions
for (i = 0; i < MSIZE; i++)
  for (j = 0; j < MSIZE; j++)
    for (k = 0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j));

Tasks generated for i = 0, j = 0:
  matmul(A(0,0), B(0,0), C(0,0))   (k = 0)
  matmul(A(0,1), B(1,0), C(0,0))   (k = 1)
  matmul(A(0,2), B(2,0), C(0,0))   (k = 2)
... and an analogous chain updating C(0,1) is generated for i = 0, j = 1.
33
3.1 Scientific contributions
for (i = 0; i < MSIZE; i++)
  for (j = 0; j < MSIZE; j++)
    for (k = 0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j));

(figure: the unrolling continues for i = 0, j = 2; i = 1, j = 0; i = 1,
j = 1; i = 1, j = 2; ... Each (i, j) pair produces a chain of three
dependent matmul tasks, and chains updating different C blocks are
independent, so they can run in parallel.)
34
3.1 Scientific contributions
  • File renaming: increases parallelism (see the
    sketch after the table below)

task1(..., f1); task2(f1, ...)   →  Read after Write   (unavoidable)
task1(f1, ...); task2(..., f1)   →  Write after Read   (avoidable: task2 writes f1_NEW)
task1(..., f1); task2(..., f1)   →  Write after Write  (avoidable: task2 writes f1_NEW)
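
A small sketch of the renaming idea: when the only conflicts on a file are
Write-after-Read or Write-after-Write, the writing task gets a fresh version
name and does not have to wait (the version-numbering scheme is an
assumption for illustration):

#include <stdio.h>

/* Give the writing task a new version of the file so earlier readers can
   keep using the old one; later readers are mapped to the newest version. */
void rename_for_writer(const char *file, int current_version,
                       char *renamed, size_t len)
{
    snprintf(renamed, len, "%s_NEW_%d", file, current_version + 1);
}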
35
3.2 Developments
  • Basic functionality
  • Job submission (middleware usage)
  • Select sources for input files
  • Submit, monitor or cancel jobs
  • Results collection
  • API implementation
  • GS_On: read configuration file and environment
  • GS_Off: wait for tasks, clean up remote data, undo
    renaming
  • GS_(F)Open: create a local task
  • GS_(F)Close: notify end of local task
  • GS_Barrier: wait for all running tasks to finish
  • GS_System: translate paths
  • GS_Speculative_End: barrier until a throw; if a
    throw occurs, discard tasks from the throw to
    GS_Speculative_End
  • GS_Throw: uses gs_result to notify it

36
3.2 Developments
(figure: the runtime builds the task graph and submits ready tasks to the
middleware)
Task scheduling: Directed Acyclic Graph
37
3.2 Developments
  • Task scheduling: resource brokering
  • A resource broker is needed (but is not an
    objective of this thesis)
  • Grid configuration file
  • Information about hosts (hostname, limit of jobs,
    queue, working directory, quota, ...)
  • Initial set of machines (can be changed
    dynamically)

<?xml version="1.0" encoding="UTF-8"?>
<project isSimple="yes" masterBandwidth="100000"
  masterBuildScript="" masterInstallDir="/home/rsirvent/matmul-master"
  masterName="bscgrid01.bsc.es"
  masterSourceDir="/datos/GRID-S/GT4/doc/examples/matmul"
  name="matmul" workerBuildScript=""
  workerSourceDir="/datos/GRID-S/GT4/doc/examples/matmul">
  ...
  <workers>
    <worker Arch="x86" GFlops="5.985" LimitOfJobs="2" Mem="1024"
      NCPUs="2" NetKbps="100000" OpSys="Linux" Queue="none"
      Quota="0" deploymentStatus="deployed"
      installDir="/home/rsirvent/matmul-worker"
      name="bscgrid01.bsc.es">
38
3.2 Developments
  • Task scheduling: resource brokering
  • Scheduling policy: estimation of the total
    execution time of a single task (a selection
    sketch follows this list)
  • FileTransferTime: time to transfer the needed
    files to a resource (calculated from the hosts
    information and the location of the files)
  • Select the fastest source for each file
  • ExecutionTime: estimation of the task's run time
    on a resource. An interface function (can be
    calculated, or estimated by an external entity)
  • Select the fastest resource for execution
  • The smallest estimation is selected
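
A compact sketch of the selection just described: for every candidate
resource the estimate is FileTransferTime + ExecutionTime, and the smallest
one wins. The per-resource arrays are assumed to be computed beforehand from
the Grid configuration file and the task's cost function:

#include <float.h>

/* Return the index of the resource with the smallest estimated total time,
   or -1 if there are no candidates. */
int select_resource(const double *transfer_time, const double *exec_time, int n)
{
    int best = -1;
    double best_estimate = DBL_MAX;
    for (int i = 0; i < n; i++) {
        double estimate = transfer_time[i] + exec_time[i];
        if (estimate < best_estimate) {
            best_estimate = estimate;
            best = i;
        }
    }
    return best;
}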

39
3.2 Developments
  • Task scheduling: resource brokering
  • Match task constraints and machine capabilities
  • Implemented using the ClassAd library
  • Machine offers capabilities (from the Grid
    configuration file: memory, architecture, ...)
  • Task demands capabilities
  • Filter candidate machines for a particular task

Machine 1: SoftwareList = "BLAST, GAMESS"
Task:      Software = "BLAST"          → matches machine 1
Machine 2: SoftwareList = "GAMESS"     → does not match
40
3.2 Developments
(figure: tasks are scheduled where file f3 already resides, avoiding its
transfer through the middleware)
Task scheduling: file locality
41
3.2 Developments
  • Other file locality exploitation mechanisms
  • Shared input disks (NFS or replicated data)
  • Shared working directories (NFS)
  • Erasing unused versions of files (decreases disk
    usage)
  • Disk quota control (locality increases disk usage,
    and the quota may be lower than expected)

42
3.3 Evaluation
43
3.3 Evaluation
  • NAS Grid Benchmarks

HC (Helical Chain)
ED (Embarrassingly Distributed)
MB (Mixed Bag)
VP (Visualization Pipe)
44
3.3 Evaluation
  • Run with classes S, W, A (2 machines x 4 CPUs)
  • The VP benchmark must be analyzed in detail (it
    does not scale beyond 3 CPUs)

45
3.3 Evaluation
  • Performance analysis
  • GRID superscalar runtime instrumented
  • Paraver tracefiles from the client side
  • The lifecycle of all tasks has been studied in
    detail
  • Overhead of GRAM Job Manager polling interval

46
3.3 Evaluation
  • VP.S task assignment
  • 14.7% of the transfers when exploiting locality
  • VP is parallel, but its last part is executed
    sequentially

(figure: three BT → MF → MG → MF → FT task chains mapped onto the machines
Kadesh8 and Khafre; arrows between machines mark remote file transfers)
47
3.3 Evaluation
  • Conclusion: workflow and granularity are important
    to achieve speed-up

48
3.3 Evaluation
  • Two-dimensional potential energy hypersurface for
    acetone as a function of the ?1 and ?2 angles

49
3.3 Evaluation
  • Number of executed tasks: 1120
  • Each task takes between 45 and 65 minutes
  • Speed-up: 26.88 (32 CPUs), 49.17 (64 CPUs)
  • Long running test on a heterogeneous and
    transatlantic Grid (22 + 14 + 28 CPUs)
50
3.3 Evaluation
  • 15 million protein sequences have been compared
    using BLAST and GRID superscalar

(figure: genomes; 15 million proteins compared against 15 million proteins)
51
3.3 Evaluation
  • 100,000 tasks on 4,000 CPUs (~1,000 exclusive
    nodes)
  • Grid of 1,000 machines with very low latency
    between them
  • Stress test for the runtime
  • Saves the user from working with the queuing
    system directly
  • Saves the queuing system from handling a huge set
    of independent tasks

52
GRID superscalar programming interface and
runtime
  • Publications
  • Raül Sirvent, Josep M. Pérez, Rosa M. Badia, Jesús
    Labarta, "Automatic Grid workflow based on
    imperative programming languages", Concurrency and
    Computation: Practice and Experience, John Wiley &
    Sons, vol. 18, no. 10, pp. 1169-1186, 2006.
  • Rosa M. Badia, Raül Sirvent, Jesús Labarta, Josep
    M. Pérez, "Programming the GRID: An Imperative
    Language-based Approach", Engineering The Grid:
    Status and Perspective, Section 4, Chapter 12,
    American Scientific Publishers, January 2006.
  • Rosa M. Badia, Jesús Labarta, Raül Sirvent, Josep
    M. Pérez, José M. Cela and Rogeli Grima,
    "Programming Grid Applications with GRID
    Superscalar", Journal of Grid Computing, Volume 1,
    Issue 2, 2003.

53
GRID superscalar programming interface and
runtime
  • Work related to standards
  • R.M. Badia, D. Du, E. Huedo, A. Kokossis, I. M.
    Llorente, R. S. Montero, M. de Palol, R. Sirvent,
    and C. Vázquez, "Integration of GRID superscalar
    and GridWay Metascheduler with the DRMAA OGF
    Standard", Euro-Par, 2008.
  • Raül Sirvent, Andre Merzky, Rosa M. Badia, Thilo
    Kielmann, "GRID superscalar and SAGA forming a
    high-level and platform-independent Grid
    programming environment", CoreGRID Integration
    Workshop. Integrated Research in Grid Computing,
    Pisa (Italy), 2005.

54
Outline
  • Introduction
  • Programming interface
  • Runtime
  • Fault tolerance at the programming model level
  • 4.1 Checkpointing
  • 4.2 Retry mechanisms
  • 4.3 Task replication
  • Conclusions and future work

55
4.1 Checkpointing
  • Inter-task checkpointing (a sketch follows the
    figure below)
  • Recovers sequential consistency in the
    out-of-order execution of tasks
  • A single version of every file is saved
  • No need to save any runtime data structures
  • Drawback: some completed tasks may be lost
  • An application-level checkpoint can avoid this

(figure: task graph with tasks 0-6 and the checkpoint placed at task 3)
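
A minimal sketch of the inter-task checkpoint, assuming the runtime records
only the number of the last task completed in sequential order; the file
name .gs_checkpoint and the helper is_finished() are assumptions:

#include <stdio.h>

/* Highest task number N such that tasks 0..N have all completed. Tasks that
   finished out of order beyond N are not recorded and may be re-run. */
static long committed = -1;

extern int is_finished(long task_number);    /* assumed runtime query */

void checkpoint_on_task_end(long task_number)
{
    if (task_number != committed + 1)
        return;                              /* out-of-order completion: wait */
    committed = task_number;
    while (is_finished(committed + 1))       /* absorb earlier out-of-order ends */
        committed++;

    FILE *f = fopen(".gs_checkpoint", "w");  /* single, small checkpoint file */
    if (f) {
        fprintf(f, "%ld\n", committed);
        fclose(f);
    }
}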
56
4.1 Checkpointing
  • Conclusions
  • Low complexity in order to checkpoint a task
  • ~1% overhead introduced
  • Can deal with both application-level errors and
    Grid-level errors
  • Most important when an unrecoverable error appears
  • Transparent for end users

57
4.2 Retry mechanisms
(figure: a machine stops responding and its task C is resubmitted elsewhere)
Automatic drop of machines
58
4.2 Retry mechanisms
(figure: one task passes its soft timeout and still succeeds; another passes
both its soft and hard timeouts and is treated as a failure)
Soft and hard timeouts for tasks (a decision sketch follows below)
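
A small decision sketch for the two timeouts, under the assumption that the
soft timeout triggers a recovery action while the task keeps running and the
hard timeout declares the task failed:

/* Decide what to do with a running task given its elapsed time. */
typedef enum { KEEP_WAITING, ACT_BUT_KEEP, DECLARE_FAILED } timeout_action_t;

timeout_action_t check_timeouts(double elapsed_s, double soft_s, double hard_s)
{
    if (elapsed_s > hard_s)
        return DECLARE_FAILED;   /* resubmit the task somewhere else       */
    if (elapsed_s > soft_s)
        return ACT_BUT_KEEP;     /* e.g. probe the machine or replicate,   */
                                 /* but let the original task keep running */
    return KEEP_WAITING;
}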
59
4.2 Retry mechanisms
(figure: a failed middleware operation is retried until it succeeds)
Retry of operations
60
4.2 Retry mechanisms
  • Conclusions
  • Keep running despite failures
  • Dynamic decisions on when and where to resubmit
  • Detects performance degradations
  • No overhead when no failures are detected
  • Transparent for end users

61
4.3 Task replication
(figure: task graph with tasks 0-7; running tasks are replicated according
to the number of successors that depend on them)
Replicate running tasks depending on successors (a selection sketch follows
below)
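
A simple selection sketch for "depending on successors": among the running
tasks, replicate the one with the most direct successors waiting on it, if
any exceeds a threshold. The heuristic and the threshold are illustrative,
not necessarily the exact policy of the thesis:

/* succ_count[i]: number of tasks that directly depend on task i.
   running[i]:    non-zero if task i is currently running. */
int pick_task_to_replicate(const int *succ_count, const int *running,
                           int n, int min_successors)
{
    int best = -1;
    int best_succ = min_successors - 1;      /* must reach the threshold */
    for (int i = 0; i < n; i++) {
        if (running[i] && succ_count[i] > best_succ) {
            best_succ = succ_count[i];
            best = i;
        }
    }
    return best;                             /* -1: nothing worth replicating */
}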
62
4.3 Task replication
(figure: task graph with tasks 0-7; running tasks are replicated to speed up
the execution)
Replicate running tasks to speed up the execution
63
4.3 Task replication
  • Conclusions
  • Dynamic replication: application-level knowledge
    (the workflow) is used
  • Replication can deal with failures, hiding the
    retry overhead
  • Replication can speed up applications in
    heterogeneous Grids
  • Transparent for end users
  • Drawback: increased usage of resources

64
4. Fault tolerance features
  • Publications
  • Vasilis Dialinos, Rosa M. Badia, Raül Sirvent,
    Josep M. Pérez and Jesús Labarta, "Implementing
    Phylogenetic Inference with GRID superscalar",
    Cluster Computing and Grid 2005 (CCGRID 2005),
    Cardiff, UK, 2005.
  • Raül Sirvent, Rosa M. Badia and Jesús Labarta,
    "Graph-based task replication for workflow
    applications", Submitted, HPCC 2009.

65
Outline
  • Introduction
  • Programming interface
  • Runtime
  • Fault tolerance at the programming model level
  • Conclusions and future work

66
5. Conclusions and future work
  • Grid-unaware programming model
  • Transparent features for users, exploiting
    parallelism and failure treatment
  • Used in REAL systems and REAL applications
  • Some future research is already ONGOING (StarSs)

67
5. Conclusions and future work
  • Future work
  • Grid of supercomputers (Red Española de
    Supercomputación)
  • Higher scale tests (hundreds? thousands?)
  • More complex brokering
  • Resource discovery/monitoring
  • New scheduling policies based on the workflow
  • Automatic prediction of execution times
  • New policies for task replication
  • New architectures for StarSs