DMTCP: A New Linux Checkpointing Mechanism For Vanilla Universe Jobs


1
DMTCP: A New Linux Checkpointing Mechanism For
Vanilla Universe Jobs
2
Why DMTCP?
  • Why checkpoint at all?
  • Problems with Condor's Standard Universe:
    • Single process.
    • No pthreads.
    • No mmap() support.
    • Forced re-link to form a static executable.
  • DMTCP removes these restrictions!

3
What is DMTCP?
  • Distributed Multi-Threaded CheckPointing.
  • Works with Linux Kernel 2.6.9 and later.
  • Supports sequential and multi-threaded
    computations across single/multiple hosts.
  • Entirely in user space (no kernel modules or root
    privilege).
  • Transparent (no recompiling, no re-linking).
  • Written at Northeastern University and MIT; under
    active development for 4 years.
  • LGPL'd and freely available.
  • No Remote I/O.

4
Process Structure
Figure: DMTCP process structure. A coordinator is connected by network
sockets to Process 1 through Process N; inside each process, a DMTCP
checkpoint thread (CT) signals the user threads (T1, T2, ...) with SIGUSR2.

5
How Does It Work?
  • ./dmtcp_checkpoint a.out: runs a.out under DMTCP
    and starts the coordinator too.
  • ./dmtcp_command c: talks to the coordinator (c
    requests a checkpoint).
  • ./dmtcp_restart ckpt_a.out-.dmtcp: restarts from
    the checkpoint image.
  • The coordinator is a stateless synchronization
    server for the distributed checkpointing algorithm.
  • Checkpoint/restart time depends on memory size,
    disk write speed, and synchronization overhead.
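
The coordinator's job in the checkpoint protocol is essentially a barrier: it keeps no checkpoint data, it only waits until every process reports that a step is finished and then lets all of them continue. The C sketch below models that idea inside one program, assuming threads stand in for processes and a mutex plus condition variable stands in for the coordinator's socket messages. coordinator_t, coord_barrier, and run_phases are hypothetical names, not part of DMTCP.

```c
#include <pthread.h>

/* Stand-in for the coordinator: no application state, only a count
   of how many clients have reached the current barrier. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t released;
    int expected;          /* number of participating clients */
    int arrived;           /* arrivals at the current barrier */
    unsigned generation;   /* incremented each time a barrier opens */
} coordinator_t;

void coord_init(coordinator_t *c, int expected) {
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->released, NULL);
    c->expected = expected;
    c->arrived = 0;
    c->generation = 0;
}

/* Called by each client at the end of a phase; returns only after all
   expected clients have finished that phase. */
void coord_barrier(coordinator_t *c) {
    pthread_mutex_lock(&c->lock);
    unsigned gen = c->generation;
    if (++c->arrived == c->expected) {
        c->arrived = 0;
        c->generation++;
        pthread_cond_broadcast(&c->released);
    } else {
        while (gen == c->generation)   /* guards against spurious wakeups */
            pthread_cond_wait(&c->released, &c->lock);
    }
    pthread_mutex_unlock(&c->lock);
}

/* Demo: threads play the role of processes stepping through phases. */
#define NPHASES 3
#define MAXCLIENTS 16
static coordinator_t coord;
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static int nclients, finished[NPHASES], early_starts;

static void *client(void *arg) {
    (void)arg;
    for (int p = 0; p < NPHASES; p++) {
        pthread_mutex_lock(&stats_lock);
        if (p > 0 && finished[p - 1] != nclients)
            early_starts++;   /* began phase p before all finished p-1 */
        finished[p]++;
        pthread_mutex_unlock(&stats_lock);
        coord_barrier(&coord);
    }
    return NULL;
}

/* Returns the number of ordering violations (0 expected), or -1. */
int run_phases(int n) {
    pthread_t t[MAXCLIENTS];
    if (n < 1 || n > MAXCLIENTS) return -1;
    coord_init(&coord, n);
    nclients = n;
    early_starts = 0;
    for (int p = 0; p < NPHASES; p++) finished[p] = 0;
    for (int i = 0; i < n; i++) pthread_create(&t[i], NULL, client, NULL);
    for (int i = 0; i < n; i++) pthread_join(t[i], NULL);
    pthread_mutex_destroy(&coord.lock);
    pthread_cond_destroy(&coord.released);
    return early_starts;
}
```

Because the barrier only counts arrivals, losing it loses no checkpoint data, which is what makes a stateless coordinator attractive; the real protocol runs over sockets between processes rather than in shared memory.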

6
How Does It Work?
  • LD_PRELOAD: transparently preloads the checkpoint
    libraries, which install libc wrappers and
    checkpointing code.
  • SIGUSR2: sent internally by the checkpoint thread
    to the user threads.
  • Wrappers: only on less heavily used libc calls:
    • fork, exec, system, pipe, bind, listen,
      setsockopt, connect, accept, clone, close,
      ptsname, openlog, closelog, signal, sigaction,
      sigvec, sigblock, sigsetmask, sigprocmask,
      rt_sigprocmask, pthread_sigmask
  • As a result, the overhead is negligible.
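
The wrapper technique is ordinary LD_PRELOAD symbol interposition. The sketch below uses the common dlsym(RTLD_NEXT, ...) pattern to wrap pipe(); it illustrates the mechanism and is not DMTCP's own wrapper code, and the call counter (wrapped_pipe_calls) is a hypothetical stand-in for the bookkeeping a checkpointer would do. Normally the wrapper lives in a shared library loaded through LD_PRELOAD; here it is compiled into the program itself, which also works because a definition in the executable takes precedence over libc's. On glibc older than 2.34, link with -ldl.

```c
#define _GNU_SOURCE            /* for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical bookkeeping: how many pipes the wrapper has seen. */
static int pipe_calls = 0;

/* Same name and signature as libc's pipe(), so callers bind to this
   definition instead of the libc one. */
int pipe(int fds[2]) {
    static int (*real_pipe)(int *) = NULL;
    if (real_pipe == NULL) {
        /* RTLD_NEXT returns the next "pipe" after this object in the
           symbol search order, i.e. the real one in libc. */
        real_pipe = (int (*)(int *))dlsym(RTLD_NEXT, "pipe");
        if (real_pipe == NULL) {
            errno = ENOSYS;
            return -1;
        }
    }
    int rc = real_pipe(fds);
    if (rc == 0) {
        /* A checkpointer would record fds[0] and fds[1] here. */
        pipe_calls++;
    }
    return rc;
}

int wrapped_pipe_calls(void) { return pipe_calls; }
```

After the first lookup, each wrapped call costs one extra indirect call, which is why limiting wrappers to infrequent libc calls keeps the overhead small. The lazy lookup is not synchronized; a multi-threaded wrapper would rather resolve the symbol once at startup.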

7
How Does It Work?
  • Additional wrappers when process ID/thread ID
    virtualization is enabled:
    • getpid, getppid, gettid, tcgetpgrp, tcsetpgrp,
      getpgrp, setpgrp, getsid, setsid, kill, tkill,
      tgkill, wait, waitpid, waitid, wait3, wait4
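
With virtualization enabled, the application keeps seeing the pids it saw before the checkpoint, even though restarted processes receive new kernel pids. A minimal sketch of the translation these wrappers need, assuming a small fixed-size table; vpid_set, vpid_to_real, and real_to_vpid are hypothetical helpers, not DMTCP functions.

```c
#include <stddef.h>
#include <sys/types.h>

#define MAX_PIDS 64

/* Hypothetical translation table: the program only sees "virtual"
   pids, which stay fixed across restarts; the real kernel pids change
   every time a process is recreated. */
typedef struct { pid_t virt; pid_t real; } pid_mapping;

static pid_mapping pid_table[MAX_PIDS];
static size_t pid_count = 0;

/* Add or update a mapping, e.g. after fork() or after a restart.
   Returns 0 on success, -1 if the table is full. */
int vpid_set(pid_t virt, pid_t real) {
    for (size_t i = 0; i < pid_count; i++) {
        if (pid_table[i].virt == virt) {
            pid_table[i].real = real;
            return 0;
        }
    }
    if (pid_count == MAX_PIDS) return -1;
    pid_table[pid_count].virt = virt;
    pid_table[pid_count].real = real;
    pid_count++;
    return 0;
}

/* Arguments going into the kernel (kill, waitpid, ...): virtual -> real.
   Unknown pids pass through unchanged. */
pid_t vpid_to_real(pid_t virt) {
    for (size_t i = 0; i < pid_count; i++)
        if (pid_table[i].virt == virt) return pid_table[i].real;
    return virt;
}

/* Results coming back out (getpid, wait, ...): real -> virtual. */
pid_t real_to_vpid(pid_t real) {
    for (size_t i = 0; i < pid_count; i++)
        if (pid_table[i].real == real) return pid_table[i].virt;
    return real;
}
```

A kill() wrapper would pass its pid argument through vpid_to_real() before forwarding it, and a getpid() wrapper would return real_to_vpid() of the real pid. Special kill() arguments such as 0, -1, or negative process-group ids need separate handling that this sketch leaves out.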

8
How Does It Work?
  • Checkpoint images are compressed on the fly
    (the default).
  • Currently supports only programs dynamically
    linked against libc.so. Support for a static
    libc.a is feasible but not implemented.
  • Stays close to POSIX API standards.

9
A Checkpoint Under DMTCP
  • dmtcphijack.so and mtcp.so are present in the
    executable's memory.
  • Ask the coordinator process for a checkpoint via
    dmtcp_command.
  • Now what happens?

10
A Checkpoint Under DMTCP
  • Suspend user threads with SIGUSR2.
  • Elect shared file descriptor leaders.
  • Drain kernel buffers and do network handshake
    with peers.
  • Write checkpoint to disk.
  • Refill kernel buffers.
  • Resume user threads.
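
The suspend and resume steps can be illustrated with plain POSIX signals. In this sketch, which assumes a single checkpoint round and is not DMTCP's actual implementation, a checkpoint thread sends SIGUSR2 to each user thread with pthread_kill(); the handler parks the thread until the checkpoint thread sets a resume flag. The short sleep stands in for draining buffers and writing the image; checkpoint_demo and the other function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

/* Shared with the handler; only lock-free atomics are used there. */
static atomic_int parked = 0;      /* user threads waiting in the handler */
static atomic_int resume_ok = 0;   /* set once the checkpoint is written */

static void park_on_sigusr2(int sig) {
    (void)sig;
    atomic_fetch_add(&parked, 1);
    while (!atomic_load(&resume_ok)) {
        /* busy-wait: simple, but burns CPU while parked */
    }
    atomic_fetch_sub(&parked, 1);
}

int install_park_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = park_on_sigusr2;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR2, &sa, NULL);
}

/* Checkpoint thread: signal every user thread, wait until all are parked. */
int suspend_user_threads(const pthread_t *threads, int n) {
    atomic_store(&resume_ok, 0);
    for (int i = 0; i < n; i++)
        if (pthread_kill(threads[i], SIGUSR2) != 0) return -1;
    while (atomic_load(&parked) < n) { }
    return 0;
}

/* Checkpoint thread: release the parked threads and wait until they leave. */
void resume_user_threads(void) {
    atomic_store(&resume_ok, 1);
    while (atomic_load(&parked) > 0) { }
}

/* Demo user threads: do "work" until told to stop. */
static atomic_long work_done = 0;
static atomic_int stop_work = 0;

static void *user_thread(void *arg) {
    (void)arg;
    while (!atomic_load(&stop_work)) atomic_fetch_add(&work_done, 1);
    return NULL;
}

/* Returns 1 if no user work happened while the checkpoint was "written". */
int checkpoint_demo(int n) {
    pthread_t t[8];
    if (n < 1 || n > 8 || install_park_handler() != 0) return -1;
    atomic_store(&stop_work, 0);
    for (int i = 0; i < n; i++) pthread_create(&t[i], NULL, user_thread, NULL);
    if (suspend_user_threads(t, n) != 0) return -1;
    long before = atomic_load(&work_done);
    struct timespec write_time = {0, 20 * 1000 * 1000};  /* 20 ms stand-in */
    nanosleep(&write_time, NULL);
    long after = atomic_load(&work_done);
    resume_user_threads();
    atomic_store(&stop_work, 1);
    for (int i = 0; i < n; i++) pthread_join(t[i], NULL);
    return before == after;
}
```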

11
Where Is the Checkpoint?
  • In the cwd of the application:
    • A set of ckpt_<exec>_<id>.dmtcp files.
  • In the cwd of the coordinator:
    • A dmtcp_restart_script.sh file.
  • The dmtcp_restart_script.sh may need tweaking,
    depending upon circumstances.

12
A Restart Under DMTCP
  • The restart process is loaded into memory.
  • Reopen files and recreate ptys.
  • Recreate and reconnect sockets.
  • Fork into user processes.
  • Rearrange file descriptors to initial layout.
  • Restore memory and threads.
  • Refill kernel buffers.
  • Resume user threads.
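
The step that rearranges file descriptors comes down to moving each reopened descriptor back to the number the program used before the checkpoint. A minimal sketch with dup2(), assuming the target number is free; a full restart has to order the moves so that one restored descriptor never overwrites another that is still waiting to be moved. move_fd is a hypothetical helper.

```c
#include <fcntl.h>
#include <unistd.h>

/* Move an open descriptor to the number it had at checkpoint time.
   Returns target on success, -1 on error. */
int move_fd(int current, int target) {
    if (current == target) return target;
    int fd_flags = fcntl(current, F_GETFD);         /* e.g. FD_CLOEXEC */
    if (fd_flags == -1) return -1;
    if (dup2(current, target) == -1) return -1;     /* closes target if open */
    if (fcntl(target, F_SETFD, fd_flags) == -1) return -1;
    close(current);
    return target;
}
```

dup2() does not carry over the close-on-exec flag, so the sketch copies it explicitly with fcntl().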

13
Supported OS Features
  • Threads, mutexes/semaphores, fork, exec
  • Shared memory (via mmap), TCP/IP sockets, UNIX
    domain sockets, pipes, ptys, terminal modes,
    ownership of controlling terminals, signal
    handlers, open and/or shared fds, I/O (including
    the readline library), parent-child process
    relationships, process ID/thread ID
    virtualization, session and process group ids,
    and more
  • Trying to keep the implementation small!

14
Supported Applications
  • MPICH-2, OpenMPI, SciPy/iPython, Python
  • cmsRun, Perl, Ruby, PHP, GHCi (Glasgow Haskell
    Compiler), OCaml, Octave, Macaulay2, gnuplot,
    slsh (S-Lang scripts), MzScheme, GST (GNU
    Smalltalk virtual machine), tcsh, dash, csh,
    tclsh (tcl-based interpreter), SQLite.
  • And many others!

15
Planned Application Support
  • Bash, gcl (GNU Common Lisp), maxima (based on
    gcl), and the Sun JVM.
  • These programs use sbrk() for their own memory
    management, which triggers a bug in DMTCP.
  • A fix is planned and will go in soon.
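
For context, sbrk() moves the end of the process's data segment (the "program break"), and a runtime that manages memory this way carves allocations out of the newly grown region itself instead of calling malloc(). The sketch below is a minimal bump allocator of that kind; it only shows what such memory management looks like and makes no claim about the cause of the DMTCP bug. bump_alloc and heap_end are hypothetical names, and sbrk() is a legacy interface whose mixing with malloc() is generally discouraged.

```c
#define _DEFAULT_SOURCE        /* for sbrk() in glibc */
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Grow the heap by n bytes and hand the new region to the caller.
   Memory is never returned. */
void *bump_alloc(size_t n) {
    void *p = sbrk((intptr_t)n);
    return p == (void *)-1 ? NULL : p;
}

/* Current program break: everything the runtime allocated this way
   lies below it. */
void *heap_end(void) {
    return sbrk(0);
}
```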

16
Planned Application Support
  • Matlab
    • Running the binary directly, without graphics,
      works, but matlab uses bash, which needs the
      sbrk() fix.

17
Condor/DMTCP Integration
  • Experimental at this time.
  • Determining scalability, stability, and extent of
    weird edge cases of DMTCP mixed with Condor.
  • Completely outside of Condor source code.
  • A vanilla job called shim_dmtcp that wraps the
    user's job and its standard I/O files with DMTCP.
  • A submit description file that transfers the
    needed DMTCP files to the remote side and saves
    intermediate checkpoints.
  • No remote I/O!

18
Shim Script Execution
Figure: shim script process tree (condor_starter, shim_dmtcp, the job,
and the DMTCP coordinator).

19
Submit File Example
  • universe = vanilla
  • executable = shim_dmtcp
  • arguments = logfile stdinf stdoutf stderrf a.out arg0 arg1
  • should_transfer_files = YES
  • when_to_transfer_output = ON_EVICT_OR_EXIT
  • transfer_input_files = <dmtcp libraries and programs>, \
    a.out, stdinf, stdoutf, stderrf
  • environment = DMTCP_TMPDIR=./;JALIB_STDERR_PATH=/dev/null
  • kill_sig = 2
  • output = shim.$(Cluster).$(Process).out
  • error = shim.$(Cluster).$(Process).err
  • log = shim.log
  • queue
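
The arguments line fixes a positional layout for shim_dmtcp: a log file, the three standard-stream files, then the user program and its arguments. The real shim is a script; this C sketch only shows how that layout could be split up, and shim_args and parse_shim_args are hypothetical names.

```c
#include <stddef.h>

/* Assumed layout: logfile stdin stdout stderr program [args...] */
typedef struct {
    const char *logfile;
    const char *stdin_path;
    const char *stdout_path;
    const char *stderr_path;
    char **job_argv;   /* program and its arguments, NULL-terminated */
} shim_args;

/* Returns 0 on success, -1 if too few arguments were given. */
int parse_shim_args(int argc, char **argv, shim_args *out) {
    if (argc < 6) return -1;   /* argv[0] + 4 files + program */
    out->logfile = argv[1];
    out->stdin_path = argv[2];
    out->stdout_path = argv[3];
    out->stderr_path = argv[4];
    out->job_argv = &argv[5];  /* argv[argc] is NULL, so this stays terminated */
    return 0;
}
```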

20
Condor/DMTCP Integration
  • Early results:
    • It works with our test case and thousands of
      jobs.
  • Problems:
    • Checkpointing between Physical Address
      Extension (PAE) kernels and normal kernels is a
      challenge.
    • DMTCP's API needs some improvement.
    • Coordinator failure means job failure.
    • The shim script is clunky, e.g. no streaming I/O.
  • Next: integration into our stduniv test suite for
    full regression testing.

21
Future Condor Integration
  • Add WantCheckpoint = True and CheckpointMethod =
    DMTCP for a vanilla universe job.
  • Condor takes care of wrapping the job with DMTCP
    and transferring the needed DMTCP files: no shim
    script voodoo.
  • Condor should honor CheckpointPlatform for
    vanilla universe jobs in case of pool
    segmentation.
  • Parallel universe support with a single
    coordinator.
  • Doug Thain's Parrot for remote I/O.

22
Challenges
  • C/C++ runtime library compatibility issues.
    • Recompile DMTCP on the slot before job execution?
  • Dynamic library incompatibilities.
  • No checkpoint server.
    • Condor file transfer protocol enhancement?
  • Debugging methods and practices?

23
Further Reading
  • DMTCP: Transparent Checkpointing for Cluster
    Computation and the Desktop
    • http://arxiv.org/abs/cs/0701037
  • Source code
    • http://dmtcp.sourceforge.net

24
Questions?
  • DMTCP
    • http://dmtcp.sourceforge.net
    • Gene Cooperman, gene@ccs.neu.edu
  • Condor/DMTCP integration
    • Pete Keller, psilord@cs.wisc.edu
  • Ask me if you want to try the alpha version out!

25
Thank you