1
Hybrid Preemptive Scheduling of MPI applications
  • Aurélien Bouteiller, Hinde Lilia Bouziane, Thomas Hérault,
    Pierre Lemarinier, Franck Cappello
  • MPICH-V team
  • INRIA Grand-Large
  • LRI, University Paris South

2
Problem definition
  • Context: Clusters and Grids (made of clusters) shared by many users
    (fewer available resources than required at a given time)
  • In this study: finite sets of MPI applications
  • Time sharing of parallel applications is attractive to increase
    fairness between users, compared to batch scheduling
  • It is very likely that several applications will reside in virtual
    memory at the same time, exceeding the total physical memory
  • → Out-of-core scheduling of parallel applications on clusters!
    (scheduling parallel applications on a cluster under memory
    constraint)
  • Most of the proposed approaches try to avoid this situation (by
    limiting job admission based on memory requirements, delaying some
    jobs unpredictably if the jobs' execution time is not known)
  • Issue: a novel (out-of-core) approach that avoids delaying jobs?
  • Constraint: no OS modification (no kernel patch)

3
Outline
  • Introduction (related work)
  • A Hybrid approach dedicated to out-of-core
  • Evaluation
  • Concluding remarks

4
Related work 1
Scheduling parallel applications on distributed memory machines:
a long history of research, still very active (5 papers in 2004 in
the main conferences IPDPS, Cluster, SC, Grid, Europar)!
Co-scheduling: all processes of each application are scheduled
independently (no coordination).
Gang scheduling: all processes of each application are executed
simultaneously (coordination); sometimes also called co-scheduling.
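The conclusions later contrast the checkpoint-based approach with classical, OS-signal-based gang scheduling. As a point of reference, here is a minimal sketch of one signal-based time slice on a single node, assuming the scheduler already knows the local PIDs of each application's processes (the function and parameter names are illustrative):

```c
#include <signal.h>
#include <unistd.h>

/* Illustrative only: one gang-scheduling time slice on one node.
 * Suspend every local process of the descheduled application, resume
 * every process of the scheduled one, then wait for one quantum.
 * A real gang scheduler coordinates this switch across all nodes. */
static void gang_time_slice(const pid_t *out, int n_out,
                            const pid_t *in, int n_in,
                            unsigned quantum_sec)
{
    for (int i = 0; i < n_out; i++)
        kill(out[i], SIGSTOP);  /* descheduled processes stay resident */
    for (int i = 0; i < n_in; i++)
        kill(in[i], SIGCONT);   /* scheduled processes run this quantum */
    sleep(quantum_sec);
}
```

Because SIGSTOP leaves the suspended processes' pages in virtual memory, several applications remain resident at once under memory pressure, which is exactly the out-of-core situation this work addresses with checkpointing instead.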
5
6
7
Outline
  • Introduction (related work)
  • A Hybrid approach dedicated to out-of-core
  • Evaluation
  • Concluding remarks

8
Our approach 1/2: Hybrid
9
Our approach 2/2: Checkpointing
10
Implementation using MPICH-V Framework
The MPICH-V framework is a set of components. An MPICH-V protocol is
a composition of a subset of these components.
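As an illustration of this composition idea only, a protocol can be pictured as a selection of component entry points; the types and names below are hypothetical, not the actual MPICH-V interfaces:

```c
/* Hypothetical sketch of "a protocol is a composition of a subset of
 * framework components"; these types and entry points are
 * illustrative, not the real MPICH-V interfaces. */
typedef struct {
    const char *name;
    int (*dispatcher_run)(void);            /* per-application dispatcher   */
    int (*daemon_run)(int rank);            /* per-node daemon wrapping MPI */
    int (*checkpoint_scheduler_run)(void);  /* NULL if the protocol has none */
} protocol_t;

/* Hypothetical component entry points. */
int cl_dispatcher_run(void);
int cl_daemon_run(int rank);
int cl_checkpoint_scheduler_run(void);

/* A coordinated-checkpoint protocol such as MPICH-V/CL composes a
 * dispatcher, per-node daemons, and a checkpoint scheduler. */
static const protocol_t protocol_cl = {
    .name                     = "CL",
    .dispatcher_run           = cl_dispatcher_run,
    .daemon_run               = cl_daemon_run,
    .checkpoint_scheduler_run = cl_checkpoint_scheduler_run,
};
```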
11
12
Coordinated Checkpoint: 2 ways
13
MPICH-V/CL protocol
The reference protocol for coordinated checkpointing, based on the
Chandy-Lamport algorithm [Ch85].
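MPICH-V/CL performs the coordination transparently below the MPI layer. As a much simpler, application-level illustration of the same idea, a blocking coordinated checkpoint can be written with two barriers around the image dump (the image path below is made up):

```c
#include <mpi.h>
#include <stdio.h>

/* Simplified application-level illustration of a coordinated
 * checkpoint: the first barrier ensures no application message is in
 * flight across the checkpoint line, every process then writes its
 * state to local disk, and the second barrier confirms all images
 * are complete. MPICH-V/CL obtains the same consistency without
 * blocking, by capturing in-transit messages below MPI. */
static int coordinated_checkpoint(const void *state, size_t len, int rank)
{
    char path[64];
    int ok = 0;

    MPI_Barrier(MPI_COMM_WORLD);                        /* no in-flight messages */

    snprintf(path, sizeof(path), "ckpt.%d.img", rank);  /* image on local disk */
    FILE *f = fopen(path, "wb");
    if (f) {
        ok = (fwrite(state, 1, len, f) == len);
        fclose(f);
    }

    MPI_Barrier(MPI_COMM_WORLD);                        /* all images written */
    return ok ? 0 : -1;
}
```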
14
Implementation details
  • Co-scheduling: several Dispatchers (no Master/Checkpoint Scheduler)
  • Gang and Hybrid: a Master Scheduler and several Checkpoint
    Schedulers
  • The Master Scheduler issues a checkpoint order to the Checkpoint
    Scheduler(s) of the running application(s)
  • When receiving this order, a Checkpoint Scheduler launches a
    coordinated checkpoint. Every running daemon computes the MPI
    process image and stores it on the local disk. All daemons send a
    completion message to the Checkpoint Scheduler.
  • All running daemons then stop the MPI processes and their own
    execution
  • The Master Scheduler selects the Checkpoint Scheduler(s) of the
    other application(s) and sends a restart order. Every Checkpoint
    Scheduler receiving this order spawns new daemons that restart the
    MPI processes from the local images (see the sketch below).
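Summarized from the Master Scheduler's point of view, the context switch looks roughly as follows; a minimal sketch in which the order codes and the helpers send_order / wait_completion are hypothetical stand-ins for the actual MPICH-V control messages:

```c
/* Hypothetical sketch of the context switch driven by the Master
 * Scheduler; each application is represented by the identifier of its
 * Checkpoint Scheduler. Order codes and helpers are illustrative,
 * not the real MPICH-V control messages. */
enum order { ORDER_CHECKPOINT, ORDER_RESTART };

void send_order(int ckpt_scheduler, enum order o);  /* hypothetical helper */
void wait_completion(int ckpt_scheduler);           /* hypothetical helper */

static void context_switch(int running_sched, int waiting_sched)
{
    /* 1. The running application checkpoints: every daemon writes its
     *    MPI process image to local disk, acknowledges completion,
     *    then stops the MPI process and its own execution.           */
    send_order(running_sched, ORDER_CHECKPOINT);
    wait_completion(running_sched);

    /* 2. The other application's Checkpoint Scheduler spawns new
     *    daemons that restart its processes from the local images.   */
    send_order(waiting_sched, ORDER_RESTART);
}
```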

15
Outline
  • Introduction (related work)
  • A Hybrid approach dedicated to out-of-core
  • Evaluation
  • Concluding remarks

16
Methodology
  • LRI cluster: Athlon 1800, 1 GB memory, IDE ATA100 disk,
    100 Mb/s Ethernet, Linux 2.4.2
  • Benchmarks (MPI): NAS BT (computation bound), NAS CG
    (communication bound)
  • Time measurement: homogeneous applications, simultaneous launch
    (scripts); time is measured between the first launch and the last
    termination
  • Fairness is measured by the standard deviation of the response
    times (see the sketch after this list)
  • Gang scheduling time slice: 200 or 600 seconds
  • Gang scheduling also implemented by checkpointing (not OS signals)
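Both measurements can be reproduced directly from the per-application response times; a small self-contained example (the sample values are made up):

```c
#include <math.h>
#include <stdio.h>

/* Metrics used in the evaluation: with simultaneous launches, the
 * total time is the span from the first launch to the last
 * termination (i.e. the largest response time), and fairness is the
 * standard deviation of the response times. Sample values are made up. */
int main(void)
{
    double response[] = { 1450.0, 1480.0, 1520.0, 1500.0 };  /* seconds */
    int n = sizeof(response) / sizeof(response[0]);

    double mean = 0.0, var = 0.0, last = 0.0;
    for (int i = 0; i < n; i++) {
        mean += response[i] / n;
        if (response[i] > last)
            last = response[i];                 /* last termination */
    }
    for (int i = 0; i < n; i++)
        var += (response[i] - mean) * (response[i] - mean) / n;

    printf("total time (first launch to last termination): %.1f s\n", last);
    printf("fairness (std dev of response times): %.1f s\n", sqrt(var));
    return 0;
}
```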

17
Context switch overlap policy
(Panels: in-core and near out-of-core)
  • Policies for the NAS benchmark BT C-25
  • 1) Overlapping policies do not provide substantial improvements
    for the in-core situation
  • 2) They need 2x the memory capacity to stay in-core
  • → The sequential policy is the best; we used it for the other
    experiments

18
Co-scheduling vs. Gang scheduling (checkpoint based)
  • Which scheduling strategy is the best for communication-bound and
    computation-bound applications?
  • Co-scheduling is the best for in-core executions (but the
    advantage is small, due to the checkpoint overhead and the tiny
    communication/computation overlap)
  • Gang scheduling outperforms co-scheduling for out-of-core
    executions (checkpoint based)
  • → The memory constraint is managed by checkpointing, not by
    delaying jobs

19
Checkpoint-based Gang scheduling vs. checkpoint-based Hybrid scheduling
20
Overhead comparison
  • What is the performance degradation due to time sharing?
  • Gang and Hybrid scheduling add no performance penalty to CG (and
    also no improvement)
  • Gang scheduling adds a 10% performance penalty to BT
  • Hybrid scheduling improves BT performance by almost 10%
  • The difference is mostly due to communication/computation overlap

21
Co-scheduling Fairness (Linux)
  • How fair is co-scheduling for in-core and out-of-core executions?
  • → Response time of BT 9 with modified memory sizes

(Figure: page miss statistics for 7 and 9 BT C-25, out-of-core)
22
Outline
  • Introduction (related work)
  • A Hybrid approach dedicated to out-of-core
  • Evaluation
  • Concluding remarks

23
Concluding remarks
  • Checkpoint-based Gang scheduling outperforms Co-scheduling, and
    certainly classical (OS-signal-based) Gang scheduling, in the
    out-of-core situation (thanks to better memory management)
  • Compared to known approaches based on job admission control, the
    benefit of checkpointing is that it avoids delaying jobs
  • Hybrid scheduling, combining the two approaches through
    checkpointing, outperforms Gang scheduling on BT (presumably
    thanks to overlapping communications and computations)
  • More generally, Hybrid scheduling can take advantage of advanced
    co-scheduling approaches within a gang subset
  • Work in progress:
  • Test with other applications / benchmarks
  • Compare with traditional gang scheduling based on OS signals
  • Experiments with high-speed networks
  • Experiments on Hybrid scheduling with co-scheduling optimizations

24
Meet us at the INRIA booth (2345)!
Mail contact: bouteiller@mpich-v.net
25
References
[Ag03] S. Agarwal, G. Choi, C. R. Das, A. B. Yoo, and S. Nagar. Co-ordinated Coscheduling in Time-Sharing Clusters through a Generic Framework. In Proceedings of the International Conference on Cluster Computing, December 2003.
[Ar98] A. C. Arpaci-Dusseau, D. E. Culler, and A. M. Mainwaring. Implicit Scheduling with Implicit Information in Distributed Systems. In Proceedings of the 1998 ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems, pages 233-243, June 1998.
[Ba00] Anat Batat and Dror G. Feitelson. Gang Scheduling with Memory Considerations. In Proceedings of IPDPS 2000.
[Bo03] Aurélien Bouteiller, Pierre Lemarinier, Géraud Krawezik, and Franck Cappello. Coordinated Checkpoint versus Message Log for Fault Tolerant MPI. In IEEE International Conference on Cluster Computing (Cluster 2003), IEEE CS Press, December 2003.
[Ch85] K. M. Chandy and L. Lamport. Distributed Snapshots: Determining Global States of Distributed Systems. ACM Transactions on Computer Systems, 3(1):63-75, February 1985.
[Fe98] D. G. Feitelson and L. Rudolph. Metrics and Benchmarking for Parallel Job Scheduling. In Job Scheduling Strategies for Parallel Processing, LNCS vol. 1495, pp. 1-24, Springer-Verlag, March 1998.
[Fr03] Eitan Frachtenberg, Dror G. Feitelson, Fabrizio Petrini, and Juan Fernandez. Flexible CoScheduling: Mitigating Load Imbalance and Improving Utilization of Heterogeneous Resources. IPDPS 2003.
[Ho98] Atsushi Hori, Hiroshi Tezuka, and Yutaka Ishikawa. Overhead Analysis of Preemptive Gang Scheduling. Lecture Notes in Computer Science, 1459:217-230, April 1998.
[Ky04] Kyung Dong Ryu, Nimish Pachapurkar, and Liana L. Fong. Adaptive Memory Paging for Efficient Gang Scheduling of Parallel Applications. In Proceedings of IPDPS 2004.
[Na99] S. Nagar, A. Banerjee, A. Sivasubramaniam, and C. R. Das. Alternatives to Coscheduling a Network of Workstations. Journal of Parallel and Distributed Computing, 59(2):302-327, November 1999.
[Ni02] Dimitrios S. Nikolopoulos and Constantine D. Polychronopoulos. Adaptive Scheduling under Memory Pressure on Multiprogrammed Clusters. CCGRID 2002.
[Sa04] Gyu Sang Choi, Jin-Ha Kim, Deniz Ersoz, Andy B. Yoo, and Chita R. Das. Coscheduling in Clusters: Is It a Viable Alternative? To appear in SC2004.
[Se99] S. Setia, M. S. Squillante, and V. K. Naik. The Impact of Job Memory Requirements on Gang-Scheduling Performance. ACM SIGMETRICS Performance Evaluation Review, 26(4):30-39, 1999.
[So98] P. G. Sobalvarro, S. Pakin, W. E. Weihl, and A. A. Chien. Dynamic Coscheduling on Workstation Clusters. In Proceedings of the IPPS Workshop on Job Scheduling Strategies for Parallel Processing, pages 231-256, March 1998.
[St04] Peter Strazdins and John Uhlmann. Local Scheduling Outperforms Gang Scheduling on a Beowulf Cluster. Technical report, Department of Computer Science, Australian National University, January 2004. To appear in Cluster 2004.
[Wi03] Yair Wiseman and Dror G. Feitelson. Paired Gang Scheduling. IEEE TPDS, June 2003.
26
27
Is the result for the in-core situation kernel dependent (Linux)?
Kernel 2.4.2 was used in our experiments. How does time-sharing
efficiency evolve with Linux kernel maturation (from 2.4 to 2.6)?