Title: Hybrid Preemptive Scheduling of MPI applications
1. Hybrid Preemptive Scheduling of MPI Applications
- Aurélien Bouteiller, Hinde Lilia Bouziane, Thomas Hérault, Pierre Lemarinier, Franck Cappello
- MPICH-V team
- INRIA Grand-Large
- LRI, University Paris South
2. Problem definition
- Context: clusters and Grids (made of clusters) shared by many users, so fewer resources are available than required at a given time. In this study: finite sets of MPI applications.
- Time sharing of parallel applications is attractive to increase fairness between users, compared to batch scheduling.
- It is very likely that several applications will reside in virtual memory at the same time, exceeding the total physical memory → out-of-core scheduling of parallel applications on clusters (scheduling parallel applications on a cluster under memory constraints).
- Most of the proposed approaches try to avoid this situation by limiting job admission based on memory requirements, which delays some jobs (unpredictably, if job execution times are not known).
- Issue: a novel (out-of-core) approach that avoids delaying some jobs?
- Constraint: no OS modification (no kernel patch).
3. Outline
- Introduction (related work)
- A Hybrid approach dedicated to out-of-core
- Evaluation
- Concluding remarks
4. Related work (1)
Scheduling parallel applications on distributed memory machines → a long history of research, still very active (5 papers in 2004 in main conferences: IPDPS, Cluster, SC, Grid, Europar)!
- Co-scheduling: all processes of each application are scheduled independently (no coordination).
- Gang scheduling: all processes of each application are executed simultaneously (coordination); sometimes called co-scheduling.
[Figure: time-sharing diagrams contrasting co-scheduling and gang scheduling]
7. Outline
- Introduction (related work)
- A Hybrid approach dedicated to out-of-core
- Evaluation
- Concluding remarks
8. Our approach (1/2): Hybrid
[Figure-only slide]
9. Our approach (2/2): Checkpointing
[Figure-only slide]
10. Implementation using the MPICH-V framework
The MPICH-V framework is a set of components; an MPICH-V protocol is a composition of a subset of these components.
[Figure: component layout on a node]
12. Coordinated checkpoint: 2 ways
[Figure: two ways of storing the checkpoint image of P1]
13. MPICH-V/CL protocol
Reference protocol for coordinated checkpointing.
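Since the slide only names the protocol, here is a minimal, illustrative Python sketch of a Chandy-Lamport-style coordinated checkpoint [Ch85], the idea behind MPICH-V/CL: a marker flushes each channel, and every process saves its state together with the in-flight messages logged ahead of the marker. All class and function names here are invented for the sketch, not the actual MPICH-V code.

from collections import deque

MARKER = object()  # channel-flushing marker broadcast by the scheduler

class Process:
    """One MPI process with a single logical incoming channel (toy model)."""
    def __init__(self, rank):
        self.rank = rank
        self.state = {"rank": rank, "iteration": 42}
        self.inbox = deque()   # messages in flight toward this process
        self.image = None      # last checkpoint image

    def checkpoint(self):
        # Log every message that was in flight ahead of the marker, so the
        # channel state is saved together with the process state.
        in_flight = []
        while self.inbox and self.inbox[0] is not MARKER:
            in_flight.append(self.inbox.popleft())
        if self.inbox:
            self.inbox.popleft()  # consume the marker itself
        # In MPICH-V/CL the daemon writes this image to the local disk.
        self.image = {"state": dict(self.state), "in_flight": in_flight}

def coordinated_checkpoint(processes):
    # The checkpoint scheduler pushes a marker on every channel, then
    # every process snapshots; the set of images forms a consistent cut.
    for p in processes:
        p.inbox.append(MARKER)
    for p in processes:
        p.checkpoint()

procs = [Process(r) for r in range(4)]
procs[1].inbox.extend(["msg from 0", "msg from 2"])  # in-flight traffic
coordinated_checkpoint(procs)
print(procs[1].image)  # state plus the two logged in-flight messages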
14. Implementation details
Dispatchers:
- Co-scheduling: several dispatchers (no master/checkpoint scheduler).
- Gang (and Hybrid): a Master Scheduler and several Checkpoint Schedulers.
- The Master Scheduler issues a checkpoint order to the Checkpoint Scheduler(s) of the running application(s).
- When receiving this order, a Checkpoint Scheduler launches a coordinated checkpoint: every running daemon computes the MPI process image and stores it on the local disk, then all daemons send a completion message to the Checkpoint Scheduler.
- All running daemons stop the MPI process and their own execution.
- The Master Scheduler selects the Checkpoint Scheduler(s) of other application(s) and sends a restart order. Every Checkpoint Scheduler receiving this order spawns new daemons restarting the MPI processes from local images.
A sketch of this gang-switch exchange follows below.
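The following Python sketch condenses the exchange just listed into a single loop: a master scheduler preempts the running gang via checkpoint, then restarts the next one from local images. The class and method names (MasterScheduler, CheckpointScheduler, checkpoint_and_stop) are invented placeholders, and the network messages and daemon acknowledgements are collapsed into plain calls.

import time

class CheckpointScheduler:
    """One per application; drives its daemons (invented placeholder)."""
    def __init__(self, app_name, n_daemons):
        self.app, self.n = app_name, n_daemons
        self.running = False

    def checkpoint_and_stop(self):
        # Coordinated checkpoint: every daemon writes the MPI process
        # image to its local disk and acks; then daemons and processes stop.
        acks = [f"{self.app}:daemon{i}" for i in range(self.n)]
        assert len(acks) == self.n  # wait for all completion messages
        self.running = False

    def restart(self):
        # Spawn new daemons restarting the MPI processes from local images.
        self.running = True

class MasterScheduler:
    """Issues checkpoint/restart orders to rotate gangs (invented placeholder)."""
    def __init__(self, apps, time_slice):
        self.apps, self.slice = apps, time_slice

    def run(self, n_slices):
        current = 0
        self.apps[current].restart()
        for _ in range(n_slices):
            time.sleep(self.slice)                    # let the gang compute
            self.apps[current].checkpoint_and_stop()  # preempt via checkpoint
            current = (current + 1) % len(self.apps)  # pick the next gang
            self.apps[current].restart()

# Two applications time-share the machine (tiny slice for the demo).
MasterScheduler([CheckpointScheduler("BT", 4),
                 CheckpointScheduler("CG", 4)],
                time_slice=0.01).run(n_slices=4)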
15. Outline
- Introduction (related work)
- A Hybrid approach dedicated to out-of-core
- Evaluation
- Concluding remarks
16. Methodology
- LRI cluster:
  - Athlon 1800
  - 1 GB memory
  - IDE ATA100 disk
  - Ethernet 100 Mb/s
  - Linux 2.4.2
- Benchmarks (MPI):
  - NAS BT (computation bound)
  - NAS CG (communication bound)
- Time measurement:
  - Homogeneous applications
  - Simultaneous launch (scripts)
  - Time is measured between the first launch and the last termination
  - Fairness is measured by the standard deviation of response times (see the sketch after this list)
- Gang scheduling time slice: 200 or 600 sec
- Gang scheduling also implemented by checkpointing (not OS signals)
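As a concrete reading of the two measurements above, here is a small Python sketch computing the reported time and the fairness metric; the launch and termination times are made-up sample values, not data from the experiments.

from statistics import pstdev

# Per-application launch and termination times (seconds; made-up values).
launches     = [0.0, 0.4, 0.9]
terminations = [610.2, 598.7, 605.1]

# Reported time: first launch to last termination.
total_time = max(terminations) - min(launches)

# Fairness: standard deviation of response times (lower = fairer).
response_times = [end - start for start, end in zip(launches, terminations)]
fairness = pstdev(response_times)

print(f"total: {total_time:.1f} s, fairness (std dev): {fairness:.1f} s")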
17Context switch overlap policy
In core
Near out-of-core
- Policies for NAS Bench. BT C- 25
- Overlapping policies do not provide substantial
- improvements for the in-core situation
- 2) They need 2x the memory capacity to stay
in-core. - the sequential policy is the best
- We used it for the other xps.
lt3
2X
2X
2X
1X
2X
2X
1X
17
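To make the 1X/2X trade-off concrete, here is a toy Python model of the two context-switch policies; the checkpoint/restart durations and memory size are invented parameters, not measurements from the slides.

def switch_time(ckpt_s, restart_s, overlap):
    # Sequential: checkpoint application A fully, then restart B.
    # Overlapping: restart B while A is still being checkpointed.
    return max(ckpt_s, restart_s) if overlap else ckpt_s + restart_s

def peak_memory(app_gb, overlap):
    # Overlapping keeps both application images resident at once (2X),
    # so it stays in-core only with twice the memory capacity.
    return 2 * app_gb if overlap else app_gb

for overlap in (False, True):
    print(f"overlap={overlap}: switch = {switch_time(30, 25, overlap)} s, "
          f"peak memory = {peak_memory(0.9, overlap)} GB")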
18. Co-scheduling vs. Gang scheduling (checkpoint based)
- Which scheduling strategy is the best for communication-bound and computation-bound applications?
- Co-scheduling is the best for in-core executions (but the advantage is small, due to checkpoint overhead and tiny communication/computation overlap).
- Gang scheduling outperforms co-scheduling for out-of-core (checkpoint-based) executions.
- → The memory constraint is managed by checkpointing, not by delaying jobs.
19. Checkpoint-based Gang vs. checkpoint-based Hybrid
[Figure-only slide]
20. Overhead comparison
- What is the performance degradation due to time sharing?
- Gang and Hybrid scheduling add no performance penalty to CG (and also no improvement).
- Gang scheduling adds a 10% performance penalty to BT.
- Hybrid scheduling improves the performance by almost 10%.
- The difference is mostly due to communication/computation overlap.
21. Co-scheduling fairness (Linux)
- How fair is co-scheduling for in-core and out-of-core executions?
- → Response time of 9 BT instances with modified memory sizes.
[Figure: page-miss statistics for 7 and 9 BT C-25 instances (out-of-core)]
22. Outline
- Introduction (related work)
- A Hybrid approach dedicated to out-of-core
- Evaluation
- Concluding remarks
23. Concluding remarks
- Checkpoint-based Gang scheduling outperforms Co-scheduling, and certainly classical (OS signal based) Gang scheduling, in out-of-core situations (thanks to better memory management).
- Compared to known approaches based on job admission control, the benefit of checkpointing is that it avoids delaying some jobs.
- Hybrid scheduling, combining the two approaches through checkpointing, outperforms Gang scheduling on BT (presumably thanks to overlapping communications and computations).
- More generally, Hybrid scheduling can take advantage of advanced co-scheduling approaches within a gang subset.
- Work in progress:
  - Test with other applications / benchmarks
  - Compare with traditional gang scheduling based on OS signals
  - Experiments with high-speed networks
  - Experiments on Hybrid scheduling with co-scheduling optimizations
24. Meet us at the INRIA booth (#2345)!
Mail contact: bouteiller_at_mpich-v.net
25. References
[Ag03] S. Agarwal, G. Choi, C. R. Das, A. B. Yoo, and S. Nagar. Coordinated Coscheduling in Time-Sharing Clusters through a Generic Framework. In Proceedings of the International Conference on Cluster Computing, December 2003.
[Ar98] A. C. Arpaci-Dusseau, D. E. Culler, and A. M. Mainwaring. Implicit Scheduling with Implicit Information in Distributed Systems. In Proceedings of the 1998 ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, pages 233-243, June 1998.
[Ba00] Anat Batat and Dror G. Feitelson. Gang Scheduling with Memory Considerations. In Proceedings of IPDPS 2000.
[Bo03] Aurélien Bouteiller, Pierre Lemarinier, Géraud Krawezik, and Franck Cappello. Coordinated Checkpoint versus Message Log for Fault Tolerant MPI. In IEEE International Conference on Cluster Computing (Cluster 2003), IEEE CS Press, December 2003.
[Ch85] K. M. Chandy and L. Lamport. Distributed Snapshots: Determining Global States of Distributed Systems. ACM Transactions on Computer Systems, 3(1):63-75, February 1985.
[Fe98] D. G. Feitelson and L. Rudolph. Metrics and Benchmarking for Parallel Job Scheduling. In Job Scheduling Strategies for Parallel Processing, LNCS vol. 1495, pages 1-24, Springer-Verlag, March 1998.
[Fr03] Eitan Frachtenberg, Dror G. Feitelson, Fabrizio Petrini, and Juan Fernandez. Flexible CoScheduling: Mitigating Load Imbalance and Improving Utilization of Heterogeneous Resources. IPDPS 2003.
[Ho98] Atsushi Hori, Hiroshi Tezuka, and Yutaka Ishikawa. Overhead Analysis of Preemptive Gang Scheduling. Lecture Notes in Computer Science, 1459:217-230, April 1998.
[Ky04] Kyung Dong Ryu, Nimish Pachapurkar, and Liana L. Fong. Adaptive Memory Paging for Efficient Gang Scheduling of Parallel Applications. In Proceedings of IPDPS 2004.
[Na99] S. Nagar, A. Banerjee, A. Sivasubramaniam, and C. R. Das. Alternatives to Coscheduling a Network of Workstations. Journal of Parallel and Distributed Computing, 59(2):302-327, November 1999.
[Ni02] Dimitrios S. Nikolopoulos and Constantine D. Polychronopoulos. Adaptive Scheduling under Memory Pressure on Multiprogrammed Clusters. CCGRID 2002.
[Sa04] Gyu Sang Choi, Jin-Ha Kim, Deniz Ersoz, Andy B. Yoo, and Chita R. Das. Coscheduling in Clusters: Is It a Viable Alternative? To appear in SC2004.
[Se99] S. Setia, M. S. Squillante, and V. K. Naik. The Impact of Job Memory Requirements on Gang-Scheduling Performance. ACM SIGMETRICS Performance Evaluation Review, 26(4):30-39, 1999.
[So98] P. G. Sobalvarro, S. Pakin, W. E. Weihl, and A. A. Chien. Dynamic Coscheduling on Workstation Clusters. In Proceedings of the IPPS Workshop on Job Scheduling Strategies for Parallel Processing, pages 231-256, March 1998.
[St04] Peter Strazdins and John Uhlmann. Local Scheduling Outperforms Gang Scheduling on a Beowulf Cluster. Technical report, Department of Computer Science, Australian National University, January 2004; to appear in Cluster 2004.
[Wi03] Yair Wiseman and Dror G. Feitelson. Paired Gang Scheduling. IEEE TPDS, June 2003.
27. Is the result for the in-core situation kernel dependent (Linux)?
Kernel 2.4.2 was used in our experiments. How does time-sharing efficiency evolve with Linux kernel maturation (from 2.4 to 2.6)?