Linux checkpoint/restart - PowerPoint PPT Presentation

About This Presentation

Description:

An Overview of Berkeley Lab's Linux Checkpoint/Restart (BLCR). Paul Hargrove, with Jason Duell and Eric Roman. January 13th, 2004 (based on slides by Jason Duell).

Slides: 13
Provided by: crdLblGov
Learn more at: https://crd.lbl.gov

Transcript and Presenter's Notes



1
(No Transcript)
2
Linux checkpoint/restart
  • Outline
      • Project goals
      • System design
      • Extension interface
      • Current status
      • Future work

3
Uses of Checkpoint/Restart
  • Gang scheduling
      • No queue drain for maintenance or policy changes
      • Higher utilization and/or more flexible scheduling
  • Process migration
      • Save job if node failure is imminent
      • Pack jobs for optimal network performance
  • Periodic backup
      • Not our main focus
      • Application can always do it more efficiently
      • But may be useful for systems with long jobs, fast I/O, and/or high node failure rates
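The trade-off behind the last bullet can be made concrete with Young's first-order approximation for the checkpoint interval, a standard result that is not part of these slides. Here C is the time to write one checkpoint and M is the mean time between failures; restart cost is ignored and C is assumed to be much smaller than M. The numbers below are illustrative, not measurements of BLCR.

```latex
% Young's first-order approximation (assumes C << M, restart cost ignored)
T_{\mathrm{opt}} \approx \sqrt{2\,C\,M},
\qquad
\mathrm{overhead}(T) \approx \frac{C}{T} + \frac{T}{2M}
\;\Longrightarrow\;
\mathrm{overhead}(T_{\mathrm{opt}}) \approx \sqrt{\frac{2C}{M}}

% Example: C = 60 s, M = 86\,400 s (one failure per day on average)
T_{\mathrm{opt}} \approx \sqrt{2 \cdot 60 \cdot 86\,400}\,\mathrm{s} \approx 3220\,\mathrm{s} \approx 54\,\mathrm{min},
\qquad
\mathrm{overhead} \approx \sqrt{120 / 86\,400} \approx 3.7\%

% Same M with ten times slower I/O (C = 600 s)
T_{\mathrm{opt}} \approx 10\,180\,\mathrm{s} \approx 2.8\,\mathrm{h},
\qquad
\mathrm{overhead} \approx \sqrt{1200 / 86\,400} \approx 11.8\%
```

Under this model the overhead grows with the square root of C/M, so ten times slower I/O costs roughly 3.2 times more, which is why periodic backup pays off mainly with fast I/O, long jobs, or frequent failures.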

4
Implementation Strategies
  • Application-based checkpointing
      • Efficient: saves only the needed data, as each step completes
      • Good for fault tolerance, bad for preemption
      • Requires per-application effort by the programmer
  • Library-based checkpointing
      • Portable across operating systems
      • Transparent to the application (but may require relinking, etc.)
      • Can't (generally) restore all resources (e.g., process IDs)
      • Can't checkpoint shell scripts
  • Kernel-based checkpointing
      • Not portable, and harder to implement
      • Can save/restore all resources
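A minimal sketch of the application-based style from the first bullet group. The struct, N_CELLS, and do_step are hypothetical; the point is that the program itself decides which state matters and saves it only between steps. Writing to a temporary file and then renaming it keeps the previous checkpoint intact if the job dies mid-write.

```c
#include <stdio.h>

#define N_CELLS 4

/* Hypothetical application state: only what is needed to resume. */
struct app_state {
    long   step;           /* last fully completed step */
    double cells[N_CELLS]; /* solver data after that step */
};

/* Write the state to "<path>.tmp", then rename() it over <path>, so a
 * crash mid-write never destroys the previous good checkpoint.
 * Raw struct dump: only valid for the same binary and architecture.
 * Returns 0 on success, -1 on error. */
int save_checkpoint(const char *path, const struct app_state *s)
{
    char tmp[4096];
    int len = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (len < 0 || (size_t)len >= sizeof tmp)
        return -1;
    FILE *f = fopen(tmp, "wb");
    if (!f)
        return -1;
    size_t n = fwrite(s, sizeof *s, 1, f);
    if (fclose(f) != 0 || n != 1) {
        remove(tmp);
        return -1;
    }
    return rename(tmp, path) == 0 ? 0 : -1;
}

/* Returns 1 if a checkpoint was restored, 0 if none could be opened
 * (fresh start), -1 if the file is short or unreadable. */
int load_checkpoint(const char *path, struct app_state *s)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return 0;
    size_t n = fread(s, sizeof *s, 1, f);
    fclose(f);
    return n == 1 ? 1 : -1;
}

/* One hypothetical compute step. Checkpoints happen only after a step
 * completes, which suits fault tolerance but cannot preempt a job in
 * the middle of a step. */
void do_step(struct app_state *s)
{
    for (int i = 0; i < N_CELLS; i++)
        s->cells[i] += 1.0;
    s->step++;
}
```

On startup the program calls load_checkpoint(); a return of 1 means it resumes from the saved step instead of step 0.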

5
Design Goals
  • Target parallel scientific applications
      • MPI is a must
      • But allow support for other programs/models, too
      • Esoteric features (ptrace, Unix domain sockets) have lower implementation priority
  • Implement as a Linux kernel module
      • Lower barrier to adoption than a kernel patch
      • Allows upgrades and bug fixes without a reboot

6
Design Goals II
  • Provide a toolkit for distributed C/R
      • We provide single-node checkpoint/restart
      • We don't support distributed operating system features
          • No built-in support for TCP sockets, bproc namespaces, etc.
      • We provide hooks that allow parallel runtimes/libraries to implement distributed checkpoint/restart
          • So the MPI library needs to know about checkpointing, but user applications don't
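To illustrate the kind of work these hooks leave to the MPI library, here is a toy model of one common coordination step, assumed here rather than taken from BLCR or LAM/MPI: before each node takes its single-node checkpoint, ranks compare per-channel message counts and drain anything still in flight, since BLCR itself does not save TCP sockets. All names and the fixed rank count are hypothetical; a real library would exchange the counts over the network.

```c
#include <stdbool.h>

#define NRANKS 3

/* Hypothetical per-job message counters an MPI library might keep.
 * sent[i][j]:  messages rank i has sent to rank j.
 * recvd[j][i]: messages rank j has received from rank i. */
struct job_counts {
    long sent[NRANKS][NRANKS];
    long recvd[NRANKS][NRANKS];
};

/* Messages 'me' must still pull off the wire from 'peer'. */
long pending_from(const struct job_counts *c, int me, int peer)
{
    return c->sent[peer][me] - c->recvd[me][peer];
}

/* True when no channel has a message in flight, so every rank's
 * single-node checkpoint can be taken without losing network data. */
bool all_channels_drained(const struct job_counts *c)
{
    for (int me = 0; me < NRANKS; me++)
        for (int peer = 0; peer < NRANKS; peer++)
            if (pending_from(c, me, peer) != 0)
                return false;
    return true;
}

/* Drain step for one rank: receive (here, just count) whatever is still
 * outstanding from each peer. A real library would buffer these messages
 * so the application still sees them after continue or restart. */
void drain_rank(struct job_counts *c, int me)
{
    for (int peer = 0; peer < NRANKS; peer++)
        c->recvd[me][peer] += pending_from(c, me, peer);
}
```

The application never sees any of this, which is the division of labor the slide describes: the library coordinates, and BLCR checkpoints each node.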

7
Extension Interface
  • Callback functions
      • Registered at startup (or as needed)
      • Run at checkpoint time, then resume at restart/continue
      • Handle parallel coordination and/or unsupported objects
  • Two types of callbacks
      • Signal handler context
          • Runs with the same PID (LinuxThreads); no thread-safety needed
          • But the callback is limited to calling signal-safe functions (a small subset of POSIX)
      • Separate thread context
          • Can call any function
          • But code needs to be thread-safe, and runs with a separate PID (LinuxThreads)
  • Critical sections
      • Used to protect uncheckpointable sections of code
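The callback flow and critical sections above can be modeled in plain C. This is a hypothetical, single-threaded sketch, not BLCR's libcr API: names such as register_callback and checkpoint_here are invented for illustration, no process image is actually saved, and the signal-handler versus separate-thread distinction is not modeled. Each callback does its pre-checkpoint work, calls checkpoint_here(), and continues after it returns, learning whether this is a continue or a restart; a request that arrives inside a critical section is deferred until the outermost section ends.

```c
/* Hypothetical callback API modeled on the slide; NOT BLCR's libcr API. */
enum resume_kind { RESUME_CONTINUE = 0, RESUME_RESTART = 1 };
typedef int (*ckpt_callback)(void *arg);

#define MAX_CALLBACKS 8

static ckpt_callback callbacks[MAX_CALLBACKS];
static void *callback_args[MAX_CALLBACKS];
static int n_callbacks;
static int next_cb;          /* next callback to run in this checkpoint */
static int simulate_restart; /* what checkpoint_here() reports on return */
static int cs_depth;         /* nesting depth of critical sections */
static int ckpt_pending;     /* request arrived inside a critical section */
static int ckpts_taken;

/* Register a callback, typically at startup. Returns its index or -1. */
int register_callback(ckpt_callback fn, void *arg)
{
    if (n_callbacks == MAX_CALLBACKS)
        return -1;
    callbacks[n_callbacks] = fn;
    callback_args[n_callbacks] = arg;
    return n_callbacks++;
}

/* Each callback must call this exactly once, after its pre-checkpoint
 * work. It runs the remaining callbacks (nested), then "takes" the
 * checkpoint, then unwinds so each callback resumes where it left off. */
int checkpoint_here(void)
{
    if (next_cb < n_callbacks) {
        int i = next_cb++;
        callbacks[i](callback_args[i]);
    } else {
        ckpts_taken++; /* a real system would save the process image here */
    }
    return simulate_restart ? RESUME_RESTART : RESUME_CONTINUE;
}

static void run_checkpoint(void)
{
    next_cb = 0;
    checkpoint_here(); /* starts the callback chain */
}

/* Entry point for a checkpoint request (e.g. from a scheduler). */
void request_checkpoint(void)
{
    if (cs_depth > 0) {
        ckpt_pending = 1; /* defer until the critical section ends */
        return;
    }
    run_checkpoint();
}

void enter_cs(void) { cs_depth++; }

void leave_cs(void)
{
    if (--cs_depth == 0 && ckpt_pending) {
        ckpt_pending = 0;
        run_checkpoint();
    }
}

int checkpoints_taken(void) { return ckpts_taken; }
void set_simulated_restart(int yes) { simulate_restart = yes; }

/* Example callback: release an unsupported resource before the
 * checkpoint and reacquire it afterwards (here, just a flag). */
struct demo_resource { int open; int reopened_after_restart; };

int demo_callback(void *arg)
{
    struct demo_resource *r = arg;
    r->open = 0;                 /* pre-checkpoint: release it */
    int how = checkpoint_here(); /* returns on continue or restart */
    r->open = 1;                 /* post-checkpoint: reacquire it */
    if (how == RESUME_RESTART)
        r->reopened_after_restart = 1;
    return 0;
}
```

Nesting the callbacks inside one another means the last one registered is the first to resume, so resources are reacquired in the reverse order they were released.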

8
Current Status
  • Supports LAM/MPI jobs
      • Both TCP and Myrinet supported
      • Infrastructure in place for InfiniBand, Quadrics
      • Process migration currently requires restarting the whole job
  • Simple semantics for open files
      • Reopen and seek to the original position
      • Must be regular files (pipe support coming soon)
      • Files must exist in the same location on the filesystem
  • Single- and multi-threaded processes
  • Checkpoint of mpirun checkpoints the whole MPI job
  • Will support process groups and sessions in future
  • Restores original PID
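The "reopen and seek" semantics for regular files amount to recording a path, open flags, and offset per descriptor at checkpoint time and replaying them at restart. A user-space sketch using POSIX calls, assuming the caller knows each descriptor's path; the struct layout and function names are hypothetical, and this is not how the BLCR kernel module stores file state.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-descriptor record taken at checkpoint time. */
struct saved_fd {
    int   fd;        /* original descriptor number */
    int   flags;     /* access mode and status flags, from F_GETFL */
    off_t offset;    /* file position at checkpoint time */
    char  path[256]; /* must name the same regular file at restart */
};

/* Record fd's flags and current offset. Returns 0 on success. */
int save_fd(int fd, const char *path, struct saved_fd *out)
{
    int flags = fcntl(fd, F_GETFL);
    off_t off = lseek(fd, 0, SEEK_CUR);
    if (flags < 0 || off < 0 || strlen(path) >= sizeof out->path)
        return -1;
    out->fd = fd;
    out->flags = flags;
    out->offset = off;
    strcpy(out->path, path);
    return 0;
}

/* Reopen by path, seek to the saved offset, and move the result onto the
 * original descriptor number. F_GETFL never reports O_CREAT or O_TRUNC,
 * so reopening cannot truncate the file. Returns that number, or -1. */
int restore_fd(const struct saved_fd *s)
{
    int fd = open(s->path, s->flags);
    if (fd < 0)
        return -1;
    if (lseek(fd, s->offset, SEEK_SET) != s->offset) {
        close(fd);
        return -1;
    }
    if (fd != s->fd) {
        if (dup2(fd, s->fd) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    return s->fd;
}
```

If the file has been moved or deleted, open() fails and restore_fd() returns -1, which matches the requirement that files exist in the same location at restart.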

9
Current Status II
  • Works with a wide variety of 2.4 kernels
      • kernel.org versions 2.4.3 onwards
      • Red Hat 7.2 through 9
      • SuSE 7.2 through 9.0
      • autoconf feature probing, so support for custom-patched kernels is likely to be automatic
      • We'll maintain 2.4 support once 2.6 comes out
  • Supports both new and old pthreads
      • I.e., old LinuxThreads, plus new 2.6 pthreads (backported to 2.4 by Red Hat)
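Old LinuxThreads and the newer 2.6-style pthreads (NPTL) differ in a way that matters for checkpointing: under LinuxThreads each thread has its own PID, while under NPTL all threads share one. On glibc, a process can ask which library it is running with via confstr(_CS_GNU_LIBPTHREAD_VERSION). A small sketch; the enum and function names are hypothetical, and this runtime check is separate from the autoconf kernel probing mentioned above.

```c
#define _GNU_SOURCE
#include <string.h>
#include <unistd.h>

enum thread_lib { THREADS_UNKNOWN, THREADS_LINUXTHREADS, THREADS_NPTL };

/* Classify a glibc thread-library version string such as "NPTL 2.31"
 * or "linuxthreads-0.10". */
enum thread_lib classify_thread_lib(const char *version)
{
    if (strncmp(version, "NPTL", 4) == 0)
        return THREADS_NPTL;
    if (strncmp(version, "linuxthreads", 12) == 0)
        return THREADS_LINUXTHREADS;
    return THREADS_UNKNOWN;
}

/* Ask the C library which implementation this process uses. Falls back
 * to THREADS_UNKNOWN where the glibc-specific name is unavailable. */
enum thread_lib detect_thread_lib(void)
{
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    char buf[64];
    size_t n = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, sizeof buf);
    if (n > 0 && n <= sizeof buf)
        return classify_thread_lib(buf);
#endif
    return THREADS_UNKNOWN;
}
```

A checkpoint library that must restore per-thread PIDs could use a check like this to decide whether there is more than one PID to restore.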

10
Future Work
  • Support for sessions and process groups
      • Including pipes, mmaps, etc., shared within a group
      • Full restoration of the parent/child tree, with original PIDs
  • More semantics for files
      • Allow checksumming a file, with a restart error if it has changed
      • Allow saving the contents of a file (restore either clobbers it, or opens it anonymously)
      • Support files that are not open at checkpoint time, but are specified as being part of the checkpoint
  • Laundry list of other resources to support
      • See page 4 of the Design and Implementation paper
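The proposed checksum check could look roughly like this: hash the file when the checkpoint is taken, store the value with the checkpoint, and refuse to restart if the hash differs. This sketch swaps in 64-bit FNV-1a, a fast non-cryptographic hash chosen only for illustration; the slides do not say which checksum BLCR would use, and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a over a file's bytes. Detects accidental changes, not
 * deliberate tampering. Returns 0 on success, -1 on I/O error. */
int file_fnv1a64(const char *path, uint64_t *out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    uint64_t h = 0xcbf29ce484222325ULL; /* FNV offset basis */
    int c;
    while ((c = fgetc(f)) != EOF) {
        h ^= (uint64_t)(unsigned char)c;
        h *= 0x100000001b3ULL;          /* FNV prime */
    }
    int err = ferror(f);
    fclose(f);
    if (err)
        return -1;
    *out = h;
    return 0;
}

/* Restart-time check: 0 if unchanged, 1 if changed (restart should fail
 * with an error), -1 if the file cannot be read. */
int verify_unchanged(const char *path, uint64_t saved)
{
    uint64_t now;
    if (file_fnv1a64(path, &now) != 0)
        return -1;
    return now == saved ? 0 : 1;
}
```

At checkpoint time the value from file_fnv1a64() would be stored alongside the file's path and offset; at restart, a nonzero result from verify_unchanged() aborts the restart instead of silently resuming against different data.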

11
Future Work II
  • Integration with parallel job systems
      • Funded to work within the suite from the DOE Scalable Systems Software SciDAC; work is in progress
      • Possibility of OpenPBS, PBSPro support
      • Interested in others (LSF, SGE, SLURM, etc.)
  • More MPI implementations
      • MPICH2 support anticipated
      • Vendor support (Quadrics)?
      • LAM/MPI support for partial/live migration

12
Conclusion
  • http://ftg.lbl.gov/checkpoint
  • Papers (available from the website)
      • Design and Implementation of BLCR: high-level system design, including a description of the user API
      • Requirements for Linux Checkpoint/Restart: exhaustive list of Unix features we will support (or not)
      • A Survey of Checkpoint/Restart Implementations: focusing on open-source versions that run on Linux
      • The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing (implementation with LAM/MPI)