Scalable Systems Software Center Al Geist, coordinating P.I.

Learn more at: https://www.csm.ornl.gov

1
Scalable Systems Software Center
Al Geist, coordinating P.I.
  • Rusty Lusk
  • Mathematics and Computer Science Division
  • Argonne National Laboratory

2
Outline
  • The Scalable Systems Software Center
  • Goals
  • Participants
  • Scope
  • Structure
  • Issues
  • Status
  • Experiences
  • Lessons

3
Current State of Systems Software for Large-Scale Machines
  • Both proprietary and open-source systems
      • Machine-specific, PBS, LSF, POE, SLURM, COOAE
        (Collections Of Odds And Ends)
  • Many are monolithic resource management
    systems, combining multiple functions
      • Job queuing, scheduling, process management, node
        monitoring, job monitoring, accounting,
        configuration management, etc.
  • A few established separate components exist
      • Maui scheduler
      • QBank accounting system
  • Many home-grown, local pieces of software
  • Scalability is often a weak point

4
The Scalable Systems Software SciDAC Project
  • Research goal: develop a component-based
    architecture for systems software for scalable
    machines
  • Software goal: demonstrate this architecture
    with prototype open-source components
  • One powerful effect: forcing a rigorous (and
    aggressive) definition of what each component should
    do and what should be encapsulated in other
    components
  • http://www.scidac.org/ScalableSystems

5
Participants
  • Labs
      • ORNL, ANL, LBNL, PNNL, Ames, SNL, LANL
  • Universities
      • NCSA, SDSC, PSC, Clemson
  • Vendors
      • Unlimited Scale, IBM, Cray, Intel
  • Open to anyone who wants to participate

6
Project Concept
[Diagram: SSS component architecture, with these components:]
  • Meta Services: Meta Scheduler, Meta Monitor, Meta Manager
  • Access Control Security Manager (interacts with all components)
  • Node Configuration & Build Manager
  • System Monitor
  • Accounting
  • Scheduler
  • Resource Allocation Management
  • Process Manager
  • Queue Manager
  • User DB
  • Data Migration
  • High Performance Communication & I/O
  • Usage Reports
  • User Utilities
  • Checkpoint / Restart
  • File System
  • Testing & Validation
  • Application Environment
7
Structure of Project
  • Working groups
      • Node build, configuration, and global
        infrastructure
      • Job submission, queue management, scheduling, and
        accounting
      • Process management, system monitoring, and
        checkpointing
      • Validation and integration
  • Quarterly project meetings, weekly working group
    conference calls
  • Electronic notebooks for all working documents
  • www.scidac.org/ScalableSystems

8
SSS Project Issues
  • Put minimal constraints on component
    implementations
      • Ease merging of existing components into the SSS
        framework (e.g., the Maui scheduler)
      • Ease development of new components
      • Encourage multiple implementations from vendors
        and others
  • Define minimal global structure
      • Components need to find one another
      • Need a common communication method
      • Need a common data format at some level
      • Each component will compose messages that others
        will read and parse
      • Message-framing protocols
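The framing item is easy to underestimate: on a socket byte stream, a component must know where each XML message from another component ends before it can parse it. A minimal sketch, assuming a hypothetical 4-byte big-endian length prefix (the slides do not specify the actual SSS framing, and `frame`/`unframe` are illustrative names):

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical framing: 4-byte big-endian length header, then a UTF-8 XML body.
HEADER = struct.Struct(">I")


def frame(xml_text: str) -> bytes:
    """Prefix an XML message with its length in bytes."""
    body = xml_text.encode("utf-8")
    return HEADER.pack(len(body)) + body


def unframe(buffer: bytes) -> tuple[list[ET.Element], bytes]:
    """Split complete messages off the front of a receive buffer.

    Returns the parsed messages plus any trailing partial bytes, which the
    caller keeps until more data arrives from the socket.
    """
    messages = []
    while len(buffer) >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer)
        end = HEADER.size + length
        if len(buffer) < end:
            break  # body not fully received yet
        messages.append(ET.fromstring(buffer[HEADER.size:end]))
        buffer = buffer[end:]
    return messages, buffer


if __name__ == "__main__":
    stream = frame('<get-nodes state="idle"/>') + frame("<ping/>")
    # Simulate the second message arriving in two pieces.
    msgs, rest = unframe(stream[:-3])
    print([m.tag for m in msgs], len(rest))  # ['get-nodes'] 8
    msgs, rest = unframe(rest + stream[-3:])
    print([m.tag for m in msgs], len(rest))  # ['ping'] 0
```

A length prefix lets the receiver hand only complete messages to the XML parser without scanning for a closing tag; a delimiter-based scheme would also work but must guarantee the delimiter never appears inside a message.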

9
SSS Project Status: Global
  • Early decisions on inter-component communication
      • Lowest-level communication is over sockets (at
        least)
      • Message content will be XML, for which parsers
        are available in all languages
      • Did not reach consensus on transport protocol
        (HTTP, SOAP, BEEP, assorted home-grown),
        especially to cope with local security
        requirements, so multiple protocols are supported
  • Early implementation work on global issues
      • Service directory component defined and
        implemented
      • SSSlib library for inter-component communication
          • Handles interaction with the service directory
          • Hides details of transport protocols from
            component logic
          • Anyone can add protocols to the library
          • Bindings for C, C++, Java, Perl, and Python
      • Event manager for asynchronous communication
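The service directory and SSSlib split the work: the directory answers "where is component X and which protocol does it speak", and the library uses that answer so component logic never touches a transport. The in-memory sketch below illustrates that division under stated assumptions; `ServiceDirectory`, `ComponentClient`, `add_protocol`, and the loopback transport are hypothetical names, not SSSlib's real API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Location:
    host: str
    port: int
    protocol: str  # name of the transport this component speaks (illustrative)


class ServiceDirectory:
    """Hypothetical in-memory stand-in for the SSS service directory."""

    def __init__(self) -> None:
        self._entries: Dict[str, Location] = {}

    def register(self, component: str, location: Location) -> None:
        self._entries[component] = location

    def lookup(self, component: str) -> Location:
        return self._entries[component]


# A transport takes (location, serialized XML request) and returns reply text.
Transport = Callable[[Location, str], str]


class ComponentClient:
    """SSSlib-like layer (hypothetical): callers send XML elements by component
    name; directory lookup and protocol selection stay hidden."""

    def __init__(self, directory: ServiceDirectory) -> None:
        self._directory = directory
        self._transports: Dict[str, Transport] = {}

    def add_protocol(self, name: str, transport: Transport) -> None:
        # Mirrors "anyone can add protocols to the library".
        self._transports[name] = transport

    def send(self, component: str, message: ET.Element) -> ET.Element:
        location = self._directory.lookup(component)
        transport = self._transports[location.protocol]
        reply = transport(location, ET.tostring(message, encoding="unicode"))
        return ET.fromstring(reply)


if __name__ == "__main__":
    sd = ServiceDirectory()
    sd.register("queue-manager", Location("node0", 7000, "loopback"))
    client = ComponentClient(sd)
    # Fake transport for local testing: answers every request with <ack/>.
    client.add_protocol("loopback", lambda loc, text: f'<ack to="{loc.host}"/>')
    reply = client.send("queue-manager", ET.Element("job-status", id="42"))
    print(reply.tag, reply.get("to"))  # ack node0
```

Because the protocol name lives in the directory entry, two sites with different security requirements can run the same component code with different transports registered.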

10
SSS Project Status: Individual Component Prototypes
  • Precise XML interfaces not yet settled,
    pending experiments with component prototypes
  • Both new and existing components
      • Maui scheduler is an existing full-featured
        scheduler, with SSS communication added
      • QBank accounting system has added an SSS
        communication interface and is evolving into Gold
      • New Checkpoint Manager component being integrated
        now
          • System-initiated checkpoints of LAM/MPI jobs

11
SSS Project Status: More Individual Component Prototypes
  • New Build-and-Configuration Manager completed
      • Controls how nodes are configured and built
  • New Node State Manager
      • Manages nodes as they are installed,
        reconfigured, and added to the active pool
  • New Event Manager for asynchronous communication
    among components
      • Components can register for notification of
        events supplied by other components (useful in
        monitoring, fault tolerance)
  • New Queue Manager mediates among the user (job
    submitter), Job Scheduler, and Process Manager
  • Multiple monitoring components, both new and old
      • Data warehouse
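The Event Manager and Node State Manager fit together naturally: the node manager announces each lifecycle change, and any component that registered for that event type (a monitor, the scheduler) is called back. A minimal in-process publish/subscribe sketch; the state names, the `node-state` event type, and all class and method names are assumptions for illustration, not the SSS interfaces.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, dict], None]


class EventManager:
    """Minimal publish/subscribe sketch: components register interest in an
    event type and are called back when another component emits it."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def register(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def emit(self, event_type: str, payload: dict) -> int:
        """Deliver an event; returns how many subscribers were notified."""
        handlers = list(self._subscribers.get(event_type, []))
        for handler in handlers:
            handler(event_type, payload)
        return len(handlers)


class NodeStateManager:
    """Hypothetical node lifecycle; states and transitions are illustrative."""

    TRANSITIONS = {
        "installing": {"configured"},
        "configured": {"active"},
        "active": {"reconfiguring"},
        "reconfiguring": {"configured"},
    }

    def __init__(self, events: EventManager) -> None:
        self._events = events
        self.state: dict[str, str] = {}

    def add_node(self, node: str) -> None:
        self.state[node] = "installing"

    def transition(self, node: str, new_state: str) -> None:
        old = self.state[node]
        if new_state not in self.TRANSITIONS.get(old, set()):
            raise ValueError(f"{node}: illegal transition {old} -> {new_state}")
        self.state[node] = new_state
        # Publish the change so monitors or the scheduler can react.
        self._events.emit("node-state", {"node": node, "old": old, "new": new_state})


if __name__ == "__main__":
    em = EventManager()
    seen = []
    em.register("node-state", lambda kind, ev: seen.append((ev["node"], ev["new"])))
    nsm = NodeStateManager(em)
    nsm.add_node("node17")
    nsm.transition("node17", "configured")
    nsm.transition("node17", "active")
    print(seen)  # [('node17', 'configured'), ('node17', 'active')]
```

The node manager does not know who is listening, so a new monitor can be added by registering a handler, without changing the component that emits the events.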

12
SSS Project Status: Still More Individual Component Prototypes
  • New Process Manager component provides an SSS
    interface to the MPD scalable process manager
      • Speaks XML through SSSlib to other SSS components
      • Invokes MPD to implement the SSS process management
        specification
  • MPD itself is not an SSS component
      • Allows MPD development, especially with respect
        to supporting MPI and MPI-2, to proceed
        independently
  • SSS Process Manager abstract definitions have
    influenced the addition of MPD functionality beyond
    what is needed to implement mpiexec from the MPI-2
    standard
      • E.g., separate environment variables for separate
        processes
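The per-process environment feature can be pictured as a process-group request in which each block of ranks carries its own variables. The element and attribute names below (`create-process-group`, `process-spec`, `env`, `range`) and both helper functions are hypothetical stand-ins, not the SSS process management specification:

```python
import xml.etree.ElementTree as ET


def build_process_group(executable: str,
                        specs: list[tuple[range, dict[str, str]]]) -> ET.Element:
    """Build a hypothetical XML request for one process group, giving each
    block of ranks its own environment variables.

    specs: (ranks, env) pairs, e.g. (range(0, 1), {"ROLE": "master"}).
    """
    group = ET.Element("create-process-group")
    for ranks, env in specs:
        spec = ET.SubElement(group, "process-spec", {
            "exec": executable,
            "range": f"{ranks.start}-{ranks.stop - 1}",  # inclusive rank range
        })
        for name, value in sorted(env.items()):
            ET.SubElement(spec, "env", {"name": name, "value": value})
    return group


def env_for_rank(group: ET.Element, rank: int) -> dict[str, str]:
    """Resolve the environment a given rank would receive from the request."""
    for spec in group.iter("process-spec"):
        first, last = (int(x) for x in spec.get("range").split("-"))
        if first <= rank <= last:
            return {e.get("name"): e.get("value") for e in spec.iter("env")}
    raise KeyError(rank)


if __name__ == "__main__":
    req = build_process_group("./solver", [
        (range(0, 1), {"ROLE": "master"}),
        (range(1, 4), {"ROLE": "worker", "OMP_NUM_THREADS": "2"}),
    ])
    print(ET.tostring(req, encoding="unicode"))
    print(env_for_rank(req, 2))  # {'OMP_NUM_THREADS': '2', 'ROLE': 'worker'}
```

Plain mpiexec-style startup gives every process the same environment; carrying the variables per rank range in the request is one way a process manager could pass different settings to different processes.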

13
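Schematic of Process Management Component in Scalable Systems Software Context
[Diagram. SSS side: SSS components (NSM = Node State Manager, SD = service directory, Sched = Scheduler, EM = Event Manager, QM = Queue Manager) and the PM (Process Manager) exchange SSS XML; job input comes from simple scripts or hairy GUIs using SSS XML, or as an XML file in the QM's job submission language. Prototype MPD-based implementation side: PM, MPDs, application processes, and interactive launch via mpdrun or mpiexec (MPI Standard args). Other managers could go here instead.]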
14
Other Accomplishments
  • APItest is a component test framework, capable of
    conducting unit tests on components
      • Well-suited to a complicated network such as this
        one
      • Allows testing one component at a time instead of
        testing all at once
      • Used on SSS components this year
  • SSS-OSCAR is a public, open-source release of the
    current state of the component system, tested for
    compatibility (get it from the web page)
  • Tested a subset of components on a 5000-node cluster
    at NCSA
  • SSS component architecture put into production on
    clusters at ORNL, PNNL, and ANL (ANL story follows)

15
New Challenges on Chiba City
  • Medium-sized, middle-aged cluster at Argonne
      • Dedicated to computer science scalability
        research, not applications
      • Also used by friendly, hungry applications
  • New requirement: support research requiring
    specialized kernels and alternate operating
    systems, for OS scalability research
      • Want to schedule jobs that require node rebuilds
        (for new OSs, kernel module tests, virtual
        nodes, etc.) as part of normal job scheduling
  • Requires a major upgrade of Chiba City systems
    software

16
Chiba Commits to SSS
  • Fork in the road
      • Major overhaul of old, crufty Chiba systems
        software (OpenPBS + Maui scheduler + homegrown
        stuff), OR
      • Take a leap forward and bet on the all-new software
        architecture of SSS
  • Problems with the leaping approach
      • SSS interfaces not finalized
      • Some components don't yet use the library (they
        implement their own protocols in open code, not
        encapsulated in the library)
      • Some components not fully functional yet
  • Solutions to problems
      • Collect components that are adequately functional
        and integrated (PM, SD, EM, BCM)
      • Write stubs for other critical components
        (Sched, QM)
      • Do without some components (CKPT, monitors,
        accounting) until ready

17
Features of Adopted Solution
  • Stubs adequate, at least for the time being
      • Scheduler does FIFO + reservations + backfill,
        and is improving
      • QM implements a PBS compatibility mode (accepts
        user PBS scripts) as well as asking the Process
        Manager to start parallel jobs directly
  • Process Manager wraps MPD, as described above
      • Single ring of daemons runs as root, managing all
        jobs for all users
      • Daemons started by the Build-and-Config Manager at
        boot time
  • An MPI program called MPISH (MPI Shell) wraps
    user jobs for handling file staging and multiple
    job steps
  • Python implementation of most components
      • Each component is under 400 lines
  • Demonstrates feasibility of using the SSS component
    approach to systems software
      • Running normal Chiba job mix for over six months
        now
      • Only systems software on this machine
  • Moving forward on meeting new requirements for
    research support
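The slide says only that the stub scheduler does FIFO with reservations and backfill. One common way to combine the three is EASY-style backfilling, sketched below as a single scheduling pass. This is not the Chiba City scheduler's code; the `Job` fields, the `(nodes, end_time)` representation of running jobs, and the `schedule()` signature are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int
    walltime: float  # user-requested limit, used for planning


def schedule(now: float, total_nodes: int,
             running: list[tuple[int, float]], queue: list[Job]) -> list[str]:
    """One pass: FIFO, a reservation for the first job that does not fit,
    and backfill of later jobs that cannot delay that reservation.

    running: (nodes, expected_end_time) for jobs already executing.
    Returns names of queued jobs to start at time `now`, in start order.
    """
    running = list(running)  # local copy; jobs started below are appended
    free = total_nodes - sum(n for n, _ in running)
    started: list[str] = []
    pending = list(queue)

    # 1. Plain FIFO: start jobs from the head of the queue while they fit.
    while pending and pending[0].nodes <= free:
        job = pending.pop(0)
        free -= job.nodes
        running.append((job.nodes, now + job.walltime))
        started.append(job.name)
    if not pending:
        return started

    # 2. Reservation for the blocked head job: the earliest time ("shadow
    #    time") at which enough nodes will have been released.
    head = pending.pop(0)
    if head.nodes > total_nodes:
        raise ValueError(f"{head.name} requests more nodes than exist")
    avail = free
    shadow = now
    for nodes, end in sorted(running, key=lambda r: r[1]):
        if avail >= head.nodes:
            break
        avail += nodes
        shadow = end
    extra = avail - head.nodes  # nodes still idle at shadow time once head starts

    # 3. Backfill: a later job may start now if it fits in the free nodes and
    #    either finishes before the shadow time or uses only the extra nodes.
    for job in pending:
        if job.nodes > free:
            continue
        if now + job.walltime <= shadow:
            free -= job.nodes
            started.append(job.name)
        elif job.nodes <= extra:
            free -= job.nodes
            extra -= job.nodes
            started.append(job.name)
    return started


if __name__ == "__main__":
    queue = [Job("big", 8, 60), Job("short", 2, 50),
             Job("long-narrow", 2, 200), Job("late", 2, 30)]
    # 10-node machine; one running 6-node job expected to end at t=100.
    print(schedule(0, 10, [(6, 100)], queue))  # ['short', 'long-narrow']
```

Here "big" gets a reservation at t=100; "short" backfills because it ends before then, and "long-narrow" backfills because it only uses the 2 nodes "big" will not need, so neither delays the reserved job.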

18
Lessons Learned: This Approach Really Works!
  • Components can use one another's data
  • Functionality only needs to be implemented once
      • E.g., broadcast of messages
  • Components are more robust, since they focus on
    one task
  • Code volume shrinks because of less duplication
    of functionality
  • Easy to add new functionality
      • File staging
      • MPISH
  • Rich infrastructure on which to build new
    components
      • Communication, logging, location services
  • Need not be limited by existing subcomponents of
    existing systems
      • Can replace just the functionality needed (get to
        solve the problem you want to solve, without
        re-implementing everything)
      • E.g., having the queue manager accept requests for
        rebuilt nodes before starting jobs

19
Summary
  • The Scalable Systems Software SciDAC project is
    addressing the problem of systems software for
    terascale systems
      • Component architecture for systems software
      • Definitions of standard interfaces between
        components
      • An infrastructure to support component
        implementations within this framework
      • A set of component implementations, continuing to
        improve
  • Prototype software suite released
  • Experimental production use of the component
    architecture and some of the component
    implementations
  • Encourages development of sharable tools and
    solutions
  • Scalability testing under way