1
GRID superscalar: a programming model for the Grid
Doctoral Thesis, Computer Architecture Department
Technical University of Catalonia
  • Raül Sirvent Pardell
  • Advisor: Rosa M. Badia Sala

2
Outline
  1. Introduction
  2. Programming interface
  3. Runtime
  4. Fault tolerance at the programming model level
  5. Conclusions and future work

3
Outline
  • Introduction
  • 1.1 Motivation
  • 1.2 Related work
  • 1.3 Thesis objectives and contributions
  • Programming interface
  • Runtime
  • Fault tolerance at the programming model level
  • Conclusions and future work

4
1.1 Motivation
  • The Grid architecture layers

Applications
Grid Middleware (Job management, Data transfer,
Security, Information, QoS, ...)
Distributed Resources
5
1.1 Motivation
  • What middleware should I use?

6
1.1 Motivation
  • Programming tools: are they easy?

Grid AWARE
Grid UNAWARE
VS.
7
1.1 Motivation
  • Can I run my programs in parallel?

Explicit parallelism
Implicit parallelism
VS.
for (i = 0; i < MSIZE; i++)
  for (j = 0; j < MSIZE; j++)
    for (k = 0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j));

(Fork/join graph: drawing it by hand means explicit parallelism)

8
1.1 Motivation
  • The Grid: a massive, dynamic and heterogeneous
    environment prone to failures
  • Study different techniques to detect and overcome
    failures
  • Checkpoint
  • Retries
  • Replication

9
1.2 Related work
System    | Grid unaware | Implicit parallelism | Language
Triana    | No           | No                   | Graphical
Satin     | Yes          | No                   | Java
ProActive | Partial      | Partial              | Java
Pegasus   | Yes          | Partial              | VDL
Swift     | Yes          | Partial              | SwiftScript
10
1.3 Thesis objectives and contributions
  • Objective: create a programming model for the
    Grid
  • Grid unaware
  • Implicit parallelism
  • Sequential programming
  • Allows the use of well-known imperative languages
  • Speeds up applications
  • Includes fault detection and recovery

11
1.3 Thesis objectives and contributions
  • Contribution: GRID superscalar
  • Programming interface
  • Runtime environment
  • Fault tolerance features

12
Outline
  • Introduction
  • Programming interface
  • 2.1 Design
  • 2.2 User interface
  • 2.3 Programming comparison
  • Runtime
  • Fault tolerance at the programming model level
  • Conclusions and future work

13
2.1 Design
  • Interface objectives
  • Grid unaware
  • Implicit parallelism
  • Sequential programming
  • Allows the use of well-known imperative languages

14
2.1 Design
  • Target applications
  • Algorithms which may be easily split into tasks
  • Branch and bound computations, divide and conquer
    algorithms, recursive algorithms, ...
  • Coarse-grained tasks
  • Independent tasks
  • Scientific workflows, optimization algorithms,
    parameter sweeps
  • Main parameters: FILES
  • External simulators, finite element solvers,
    BLAST, GAMESS

15
2.1 Design
  • Application architecture: a master-worker
    paradigm
  • The master-worker parallel paradigm fits our
    objectives
  • Main program: the master
  • Functions: the workers
  • Function: generic representation of a task
  • Glue to transform a sequential application into a
    master-worker application: stubs and skeletons
    (as in RMI, RPC, ...)
  • Stub: call to the runtime interface
  • Skeleton: binary which calls the user function

16
2.1 Design
void matmul(char *f1, char *f2, char *f3)
{
  getBlocks(f1, f2, f3, A, B, C);
  for (i = 0; i < A->rows; i++)
    for (j = 0; j < B->cols; j++)
      for (k = 0; k < A->cols; k++)
        C->data[i][j] += A->data[i][k] * B->data[k][j];
  putBlocks(f1, f2, f3, A, B, C);
}

for (i = 0; i < MSIZE; i++)
  for (j = 0; j < MSIZE; j++)
    for (k = 0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j));

Local scenario
17
2.1 Design
(Figure: app.c runs as the master and copies of app-functions.c run as workers on the Grid resources, connected through the middleware. Master-Worker paradigm)
18
2.1 Design
  • Intermediate language concept: assembler code
  • In GRIDSs:
  • The Execute generic interface
  • The instruction set is defined by the user
  • Single entry point to the runtime
  • Allows easy building of programming language
    bindings (Java, Perl, Shell Script)
  • Easier technology adoption

(Analogy: C, C++ → assembler → processor execution, versus C, C++ → workflow → Grid execution)
19
2.2 User interface
  • Steps to program an application
  • Task definition
  • Identify those functions/programs in the
    application that are going to be executed in the
    computational Grid
  • All parameters must be passed in the header
    (remote execution)
  • Interface Definition Language (IDL)
  • For every task defined, identify which parameters
    are input/output files and which are input/output
    scalars
  • Programming API: master and worker
  • Write the main program and the tasks using the
    GRIDSs API

20
2.2 User interface
  • Interface Definition Language (IDL) file
  • CORBA-IDL like interface
  • in/out/inout files
  • in/out/inout scalar values
  • The functions listed in this file will be
    executed in the Grid

interface MATMUL {
  void matmul(in File f1, in File f2, inout File f3);
};
21
2.2 User interface
  • Programming API: master and worker

app.c
app-functions.c
  • Master side
  • GS_On
  • GS_Off
  • GS_FOpen/GS_FClose
  • GS_Open/GS_Close
  • GS_Barrier
  • GS_Speculative_End
  • Worker side
  • GS_System
  • gs_result
  • GS_Throw

22
2.2 User interface
  • Task constraints and cost specification
  • Constraints allow specifying the needs of a task
    (CPU, memory, architecture, software, ...)
  • Build an expression in a constraint function
    (evaluated for every machine)
  • Cost: estimated execution time of a task (in
    seconds)
  • Useful for scheduling
  • Calculate it in a cost function (see the sketch
    below)
  • GS_GFlops / GS_Filesize may be used
  • An external estimator can also be called

other.Mem > 1024
cost = operations / GS_GFlops();
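A minimal sketch of what the two functions could look like for the matmul task. The function names, the other.Mem / other.SoftwareList attributes and the way GS_Filesize is used are illustrative assumptions, not the exact generated GRIDSs interface:

/* Constraint function: evaluated for every candidate machine (ClassAd syntax). */
char *matmul_constraints(void)
{
    /* assumed demand: at least 1 GB of memory and BLAST installed */
    return "(other.Mem > 1024) && member(\"BLAST\", other.SoftwareList)";
}

/* Cost function: estimated execution time of the task, in seconds. */
double matmul_cost(char *f1, char *f2, char *f3)
{
    /* rough operation count derived from the input file sizes (assumption) */
    double operations = (double) GS_Filesize(f1) * GS_Filesize(f2) / 1.0e6;
    return operations / GS_GFlops();
}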
23
2.3 Programming comparison
  • Globus vs GRIDSs

Grid-aware
int main() {
  rsl = "&(executable=/home/user/sim)(arguments=input1.txt output1.txt)
         (file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input1.txt /home/user/input1.txt))
         (file_stage_out=/home/user/output1.txt gsiftp://bscgrid01.bsc.es/path/output1.txt)
         (file_clean_up=/home/user/input1.txt /home/user/output1.txt)";
  globus_gram_client_job_request(bscgrid02.bsc.es, rsl, NULL, NULL);

  rsl = "&(executable=/home/user/sim)(arguments=input2.txt output2.txt)
         (file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input2.txt /home/user/input2.txt))
         (file_stage_out=/home/user/output2.txt gsiftp://bscgrid01.bsc.es/path/output2.txt)
         (file_clean_up=/home/user/input2.txt /home/user/output2.txt)";
  globus_gram_client_job_request(bscgrid03.bsc.es, rsl, NULL, NULL);

  rsl = "&(executable=/home/user/sim)(arguments=input3.txt output3.txt)
         (file_stage_in=(gsiftp://bscgrid01.bsc.es/path/input3.txt /home/user/input3.txt))
         (file_stage_out=/home/user/output3.txt gsiftp://bscgrid01.bsc.es/path/output3.txt)
         (file_clean_up=/home/user/input3.txt /home/user/output3.txt)";
  globus_gram_client_job_request(bscgrid04.bsc.es, rsl, NULL, NULL);
}
Explicit parallelism
24
2.3 Programming comparison
  • Globus vs GRIDSs

void sim(File input, File output) {
  command = "/home/user/sim " + input + ' ' + output;
  gs_result = GS_System(command);
}

int main() {
  GS_On();
  sim("/path/input1.txt", "/path/output1.txt");
  sim("/path/input2.txt", "/path/output2.txt");
  sim("/path/input3.txt", "/path/output3.txt");
  GS_Off(0);
}
25
2.3 Programming comparison
  • DAGMan vs GRIDSs

(DAG: A → B, A → C; B → D, C → D)
Explicit parallelism
int main() {
  GS_On();
  task_A(f1, f2, f3);
  task_B(f2, f4);
  task_C(f3, f5);
  task_D(f4, f5, f6);
  GS_Off(0);
}
No if/while clauses
JOB A A.condor
JOB B B.condor
JOB C C.condor
JOB D D.condor
PARENT A CHILD B C
PARENT B C CHILD D
26
2.3 Programming comparison
  • Ninf-G vs GRIDSs

Grid-aware
int main() {
  grpc_initialize("config_file");
  grpc_object_handle_init_np("A", &A_h, "class");
  grpc_object_handle_init_np("B", &B_h, "class");
  for (i = 0; i < 25; i++) {
    grpc_invoke_async_np(A_h, "foo", &sid, f_in[2*i], f_out[2*i]);
    grpc_invoke_async_np(B_h, "foo", &sid, f_in[2*i+1], f_out[2*i+1]);
  }
  grpc_wait_all();
  grpc_object_handle_destruct_np(A_h);
  grpc_object_handle_destruct_np(B_h);
  grpc_finalize();
}
Explicit parallelism
int main() {
  GS_On();
  for (i = 0; i < 50; i++)
    foo(f_in[i], f_out[i]);
  GS_Off(0);
}
27
2.3 Programming comparison
  • VDL vs GRIDSs

No if/while clauses
DV trans1( a2=@{output:"tmp.0"}, a1=@{input:"filein.0"} );
DV trans2( a2=@{output:"fileout.0"}, a1=@{input:"tmp.0"} );
DV trans1( a2=@{output:"tmp.1"}, a1=@{input:"filein.1"} );
DV trans2( a2=@{output:"fileout.1"}, a1=@{input:"tmp.1"} );
...
DV trans1( a2=@{output:"tmp.999"}, a1=@{input:"filein.999"} );
DV trans2( a2=@{output:"fileout.999"}, a1=@{input:"tmp.999"} );
int main() {
  GS_On();
  for (i = 0; i < 1000; i++) {
    tmp = "tmp." + i;
    filein = "filein." + i;
    fileout = "fileout." + i;
    trans1(tmp, filein);
    trans2(fileout, tmp);
  }
  GS_Off(0);
}
28
Outline
  • Introduction
  • Programming interface
  • Runtime
  • 3.1 Scientific contributions
  • 3.2 Developments
  • 3.3 Evaluation tests
  • Fault tolerance at the programming model level
  • Conclusions and future work

29
3.1 Scientific contributions
  • Runtime objectives
  • Extract implicit parallelism in sequential
    applications
  • Speed up execution using the Grid
  • Main requirement: Grid middleware
  • Job management
  • Data transfer
  • Security

30
3.1 Scientific contributions
  • Apply computer architecture knowledge to the Grid
    (superscalar processor)

(In a superscalar processor instructions take ns; in the Grid, tasks take seconds/minutes/hours)
31
3.1 Scientific contributions
  • Data dependence analysis allows parallelism to be
    extracted

task1(..., f1)  →  task2(f1, ...)     Read after Write (RaW)
task1(f1, ...)  →  task2(..., f1)     Write after Read (WaR)
task1(..., f1)  →  task2(..., f1)     Write after Write (WaW)
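A schematic illustration (not the actual GRIDSs runtime code) of how the dependence caused by a file shared between an earlier task and a newly added task can be classified from the direction, in or out, of that file parameter:

/* Sketch only: classify the dependence on one shared file. */
typedef enum { NO_DEP, RAW, WAR, WAW } dep_t;

dep_t classify_dep(int prev_writes_file, int prev_reads_file,
                   int new_reads_file, int new_writes_file)
{
    if (prev_writes_file && new_reads_file)  return RAW;  /* true dependence */
    if (prev_writes_file && new_writes_file) return WAW;  /* name dependence */
    if (prev_reads_file  && new_writes_file) return WAR;  /* name dependence */
    return NO_DEP;
}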
32
3.1 Scientific contributions
for (i = 0; i < MSIZE; i++)
  for (j = 0; j < MSIZE; j++)
    for (k = 0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j));

Generated task instances:
i=0, j=0:  matmul(A(0,0), B(0,0), C(0,0))   (k=0)
           matmul(A(0,1), B(1,0), C(0,0))   (k=1)
           matmul(A(0,2), B(2,0), C(0,0))   (k=2)
i=0, j=1:  matmul(A(0,0), B(0,0), C(0,1))   (k=0)
           matmul(A(0,1), B(1,0), C(0,1))   (k=1)
           matmul(A(0,2), B(2,0), C(0,1))   (k=2)
...
33
3.1 Scientific contributions
for (i = 0; i < MSIZE; i++)
  for (j = 0; j < MSIZE; j++)
    for (k = 0; k < MSIZE; k++)
      matmul(A(i,k), B(k,j), C(i,j));

(Figure: the chains of matmul tasks for the remaining (i, j) pairs, i=0 j=2, i=1 j=0, i=1 j=1, i=1 j=2, ..., are mutually independent; within each chain the k = 0, 1, 2 tasks are ordered by the dependence on C(i,j))
34
3.1 Scientific contributions
  • File renaming: increases parallelism

task1(..., f1)  →  task2(f1, ...)     Read after Write   (unavoidable)
task1(f1, ...)  →  task2(..., f1)     Write after Read   (avoidable: rename to task2(..., f1_NEW))
task1(..., f1)  →  task2(..., f1)     Write after Write  (avoidable: rename to task2(..., f1_NEW))
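A small sketch of the renaming idea (illustrative names, not the real runtime code): when the only conflict on f1 is a WaR or WaW dependence, the new writer gets a fresh name and both tasks can run concurrently.

#include <stdio.h>

/* Return the name the new task should write to. Later accesses to f1
 * by younger tasks are redirected to the renamed version. */
const char *output_name(const char *f1, int war_or_waw_conflict)
{
    static char renamed[256];
    if (!war_or_waw_conflict)
        return f1;                                      /* keep the original name */
    snprintf(renamed, sizeof(renamed), "%s_NEW", f1);   /* fresh version of the file */
    return renamed;
}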
35
3.2 Developments
  • Basic functionality
  • Job submission (middleware usage)
  • Select sources for input files
  • Submit, monitor or cancel jobs
  • Results collection
  • API implementation (a usage sketch follows below)
  • GS_On: read configuration file and environment
  • GS_Off: wait for tasks, clean up remote data, undo
    renaming
  • GS_(F)Open: create a local task
  • GS_(F)Close: notify end of a local task
  • GS_Barrier: wait for all running tasks to finish
  • GS_System: translate paths
  • GS_Speculative_End: barrier until a throw; if a
    throw occurs, discard tasks from the throw to
    GS_Speculative_End
  • GS_Throw: use gs_result to notify it

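A hedged usage sketch of the master-side calls listed above; the header name, the GS_FOpen mode constant and the sim() task are assumptions made for illustration, not the exact generated interface:

#include "GS_master.h"               /* assumed header name for the generated master stubs */

int main()
{
    GS_On();                               /* read configuration file and environment */
    sim("input.txt", "partial.txt");       /* Grid task defined in the IDL (assumed)   */

    FILE *fp = GS_FOpen("partial.txt", R); /* local task: forces the file to be ready  */
    /* ... inspect intermediate results locally ... */
    GS_FClose(fp);                         /* notify end of the local task             */

    GS_Barrier();                          /* wait for all running tasks to finish     */
    GS_Off(0);                             /* cleanup remote data, undo renaming       */
    return 0;
}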
36
3.2 Developments
(Figure: the runtime keeps the tasks in a Directed Acyclic Graph and submits ready tasks through the middleware. Task scheduling: Directed Acyclic Graph)
37
3.2 Developments
  • Task scheduling: resource brokering
  • A resource broker is needed (but it is not an
    objective of this thesis)
  • Grid configuration file
  • Information about hosts (hostname, limit of jobs,
    queue, working directory, quota, ...)
  • Initial set of machines (can be changed
    dynamically)

<?xml version="1.0" encoding="UTF-8"?>
<project isSimple="yes" masterBandwidth="100000"
  masterBuildScript="" masterInstallDir="/home/rsirvent/matmul-master"
  masterName="bscgrid01.bsc.es"
  masterSourceDir="/datos/GRID-S/GT4/doc/examples/matmul"
  name="matmul" workerBuildScript=""
  workerSourceDir="/datos/GRID-S/GT4/doc/examples/matmul">
  ...
  <workers>
    <worker Arch="x86" GFlops="5.985" LimitOfJobs="2" Mem="1024"
      NCPUs="2" NetKbps="100000" OpSys="Linux" Queue="none" Quota="0"
      deploymentStatus="deployed" installDir="/home/rsirvent/matmul-worker"
      name="bscgrid01.bsc.es">
38
3.2 Developments
  • Task scheduling: resource brokering
  • Scheduling policy
  • Estimation of the total execution time of a single
    task on each candidate resource (see the sketch
    below)
  • FileTransferTime: time to transfer the needed files
    to a resource (calculated with the hosts
    information and the location of the files)
  • Select the fastest source for each file
  • ExecutionTime: estimation of the task's run time
    in a resource. An interface function (can be
    calculated, or estimated by an external entity)
  • Select the fastest resource for execution
  • The smallest estimation is selected

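A minimal sketch of the selection rule, assuming the per-machine transfer and execution estimates have already been computed:

/* Pick the machine with the smallest FileTransferTime + ExecutionTime. */
int select_resource(int n_machines,
                    const double file_transfer_time[],
                    const double execution_time[])
{
    int best = 0;
    for (int m = 1; m < n_machines; m++) {
        double t      = file_transfer_time[m]    + execution_time[m];
        double t_best = file_transfer_time[best] + execution_time[best];
        if (t < t_best)
            best = m;                 /* keep the smallest estimation */
    }
    return best;
}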
39
3.2 Developments
  • Task scheduling: resource brokering
  • Match task constraints against machine capabilities
  • Implemented using the ClassAd library
  • The machine offers capabilities (from the Grid
    configuration file: memory, architecture, ...)
  • The task demands capabilities
  • Filter candidate machines for a particular task

Machine 1 offers:  SoftwareList = { BLAST, GAMESS }   (matches)
Task demands:      Software = BLAST
Machine 2 offers:  SoftwareList = { GAMESS }          (filtered out)
40
3.2 Developments
(Figure: file f3 stays on the worker where it was produced and the consuming task is scheduled there. Task scheduling: file locality)
41
3.2 Developments
  • Other file-locality exploitation mechanisms
  • Shared input disks
  • NFS or replicated data
  • Shared working directories
  • NFS
  • Erasing unused versions of files (decreases disk
    usage)
  • Disk quota control (locality increases disk usage,
    and the quota may be lower than expected)

42
3.3 Evaluation
NAS Grid Benchmarks: representative benchmark; includes different types of workflows which emulate a wide range of Grid applications
Simple optimization example: representative of optimization algorithms; workflow with two-level synchronization
New product and process development: production application; workflow with parallel chains of computation
Potential energy hypersurface for acetone: massively parallel, long-running application
Protein comparison: production application; big computational challenge; massively parallel; high number of tasks
fastDNAml: well-known application in the context of MPI for Grids; workflow with synchronization steps
43
3.3 Evaluation
  • NAS Grid Benchmarks

(Workflow types: ED (Embarrassingly Distributed), HC (Helical Chain), MB (Mixed Bag), VP (Visualization Pipe))
44
3.3 Evaluation
  • Run with classes S, W, A (2 machines x 4 CPUs)
  • VP benchmark must be analyzed in detail (does not
    scale up to 3 CPUs)

45
3.3 Evaluation
  • Performance analysis
  • GRID superscalar runtime instrumented
  • Paraver tracefiles from the client side
  • The lifecycle of all tasks has been studied in
    detail
  • Overhead of GRAM Job Manager polling interval

46
3.3 Evaluation
  • VP.S task assignment
  • Only 14.7% of the transfers are remote when
    exploiting locality
  • VP is parallel, but its last part is executed
    sequentially

(Figure: three VP.S task chains BT → MF → MG → MF → FT mapped to the machines Kadesh8 and Khafre; arrows mark remote file transfers)
47
3.3 Evaluation
  • Conclusion: workflow and granularity are
    important to achieve speed-up

48
3.3 Evaluation
  • Two-dimensional potential energy hypersurface for
    acetone as a function of the ?1 and ?2 angles

49
3.3 Evaluation
  • Number of executed tasks: 1120
  • Each task takes between 45 and 65 minutes
  • Speed-up: 26.88 (32 CPUs), 49.17 (64 CPUs)
  • Long-running test on a heterogeneous,
    transatlantic Grid

(Figure: Grid sites contributing 22, 14 and 28 CPUs)
50
3.3 Evaluation
  • 15 million protein sequences have been compared
    using BLAST and GRID superscalar

(Figure: genomes providing 15 million proteins, compared against the same 15 million proteins)
51
3.3 Evaluation
  • 100,000 tasks on 4,000 CPUs (about 1,000 exclusive
    nodes)
  • Grid of 1,000 machines with very low latency
    between them
  • Stress test for the runtime
  • Frees the user from working with the queuing
    system directly
  • Saves the queuing system from handling a huge set
    of independent tasks

52
GRID superscalar programming interface and
runtime
  • Publications
  • Raül Sirvent, Josep M. Pérez, Rosa M. Badia,
    Jesús Labarta, "Automatic Grid workflow based on
    imperative programming languages", Concurrency
    and Computation: Practice and Experience, John
    Wiley & Sons, vol. 18, no. 10, pp. 1169-1186,
    2006.
  • Rosa M. Badia, Raul Sirvent, Jesus Labarta, Josep
    M. Perez, "Programming the GRID: An Imperative
    Language-based Approach", Engineering The Grid:
    Status and Perspective, Section 4, Chapter 12,
    American Scientific Publishers, January 2006.
  • Rosa M. Badia, Jesús Labarta, Raül Sirvent, Josep
    M. Pérez, José M. Cela and Rogeli Grima,
    "Programming Grid Applications with GRID
    Superscalar", Journal of Grid Computing, Volume
    1, Issue 2, 2003.

53
GRID superscalar programming interface and
runtime
  • Work related to standards
  • R.M. Badia, D. Du, E. Huedo, A. Kokossis, I. M.
    Llorente, R. S. Montero, M. de Palol, R. Sirvent,
    and C. Vázquez, "Integration of GRID superscalar
    and GridWay Metascheduler with the DRMAA OGF
    Standard", Euro-Par, 2008.
  • Raül Sirvent, Andre Merzky, Rosa M. Badia, Thilo
    Kielmann, "GRID superscalar and SAGA forming a
    high-level and platform-independent Grid
    programming environment", CoreGRID Integration
    Workshop. Integrated Research in Grid Computing,
    Pisa (Italy), 2005.

54
Outline
  • Introduction
  • Programming interface
  • Runtime
  • Fault tolerance at the programming model level
  • 4.1 Checkpointing
  • 4.2 Retry mechanisms
  • 4.3 Task replication
  • Conclusions and future work

55
4.1 Checkpointing
  • Inter-task checkpointing
  • Recovers sequential consistency in the
    out-of-order execution of tasks
  • A single version of every file is saved
  • No need to save any runtime data structures
  • Drawback: some completed tasks may be lost
  • An application-level checkpoint can avoid this

(Figure: task graph with tasks 0 to 6; the checkpoint marks the last task committed in sequential order)
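A schematic sketch of the inter-task checkpoint idea (not the actual implementation; the checkpoint file name is an assumption): tasks may finish out of order, but the checkpoint only advances over the longest prefix of tasks completed in sequential order, so a single counter is enough.

#include <stdio.h>

static int committed = 0;     /* tasks 0 .. committed-1 are covered by the checkpoint */

void task_finished(int task_id, int done[], int n_tasks)
{
    done[task_id] = 1;
    while (committed < n_tasks && done[committed])   /* advance over the completed prefix */
        committed++;

    FILE *f = fopen(".gs_checkpoint", "w");          /* assumed file name */
    if (f) {
        fprintf(f, "%d\n", committed);               /* one value; no runtime structures saved */
        fclose(f);
    }
}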
56
4.1 Checkpointing
  • Conclusions
  • Low complexity to checkpoint a task
  • 1% overhead introduced
  • Can deal with both application-level and
    Grid-level errors
  • Most important when an unrecoverable error
    appears
  • Transparent for end users

57
4.2 Retry mechanisms
(Figure: a machine that stops responding is dropped automatically and its task is resubmitted through the middleware to another machine. Automatic drop of machines)
58
4.2 Retry mechanisms
(Figure: soft and hard timeouts for tasks. On a soft timeout the machine is checked: on success the task keeps running, on failure it is resubmitted; on a hard timeout the task is resubmitted)
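A schematic sketch of this soft/hard timeout policy (names and the probing mechanism are illustrative assumptions):

typedef enum { KEEP_WAITING, RESUBMIT } action_t;

/* Soft timeout: probe the machine and keep waiting only if it still responds.
 * Hard timeout: give up and resubmit the task elsewhere. */
action_t timeout_action(double elapsed, double soft_timeout, double hard_timeout,
                        int machine_responds)
{
    if (elapsed > hard_timeout)
        return RESUBMIT;
    if (elapsed > soft_timeout && !machine_responds)
        return RESUBMIT;
    return KEEP_WAITING;
}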
59
4.2 Retry mechanisms
(Figure: retry of operations. Failed operations against the middleware are retried until they succeed)
60
4.2 Retry mechanisms
  • Conclusions
  • Keeps the application running despite failures
  • Dynamic decision of when and where to resubmit
  • Detects performance degradations
  • No overhead when no failures are detected
  • Transparent for end users

61
4.3 Task replication
(Figure: replicate running tasks depending on their successors in the task graph)
62
4.3 Task replication
(Figure: replicate running tasks to speed up the execution)
63
4.3 Task replication
  • Conclusions
  • Dynamic replication: application-level knowledge
    (the workflow) is used
  • Replication can deal with failures, hiding the
    retry overhead
  • Replication can speed up applications in
    heterogeneous Grids
  • Transparent for end users
  • Drawback: increased usage of resources

64
4. Fault tolerance features
  • Publications
  • Vasilis Dialinos, Rosa M. Badia, Raül Sirvent,
    Josep M. Pérez and Jesús Labarta, "Implementing
    Phylogenetic Inference with GRID superscalar",
    Cluster Computing and the Grid 2005 (CCGrid 2005),
    Cardiff, UK, 2005.
  • Raül Sirvent, Rosa M. Badia and Jesús Labarta,
    "Graph-based task replication for workflow
    applications", Submitted, HPCC 2009.

65
Outline
  1. Introduction
  2. Programming interface
  3. Runtime
  4. Fault tolerance at the programming model level
  5. Conclusions and future work

66
5. Conclusions and future work
  • Grid-unaware programming model
  • Features transparent to users: parallelism
    exploitation and failure treatment
  • Used in REAL systems and REAL applications
  • Some future research is already ONGOING (StarSs)

67
5. Conclusions and future work
  • Future work
  • Grid of supercomputers (Red Española de
    Supercomputación)
  • Higher scale tests (hundreds? thousands?)
  • More complex brokering
  • Resource discovery/monitoring
  • New scheduling policies based on the workflow
  • Automatic prediction of execution times
  • New policies for task replication
  • New architectures for StarSs