Pittsburgh Supercomputing Center XT3 Configuration - PowerPoint PPT Presentation

About This Presentation
Title:

Pittsburgh Supercomputing Center XT3 Configuration

Description:

Originally by J Ray Scott. Pittsburgh Supercomputing Center. ETF (Rachel) 512GB Main Memory. Storage Silos. 2 PB. DMF Archive Server. Visualization ... – PowerPoint PPT presentation

Slides: 20
Provided by: JRayS

Transcript and Presenter's Notes



1
Pittsburgh Supercomputing Center XT3 Configuration
  • John Urbanic
  • Introduction to the Cray XT3
  • Originally by J Ray Scott

2
Pittsburgh Supercomputing Center
  • ETF (Rachel) 512GB Main Memory
  • TCS (LeMieux) 6.0 TFlop
  • XT3 (BigBen)
  • Visualization Nodes NVidia Quadro4 980XGL
  • Storage Cache Nodes 100 TB
  • Storage Silos 2 PB
  • DMF Archive Server

3
Hardware Summary
  • 2,090 CPUs
  • AMD Opteron 2.4GHz
  • 2,068 Compute Nodes
  • 22 I/O nodes
  • Boot/Login Node
  • System Management Node
  • Login Node (3)
  • Storage Nodes

4
Software Summary
  • Catamount on Compute Nodes
  • Linux on I/O nodes
  • Job Scheduling through PBS
  • Torque/PSC now
  • yod integration
  • Console Log Support
  • Custom Scheduler
  • PBS Pro soon

5
File Systems
  • UFS-type home directories
  • /usr/users/Nlogin-name
  • Not high-performance
  • Lustre
  • /lustre
  • Accessible from all compute and I/O nodes
  • 200 TB RAID-Protected Storage
  • HOME and SCRATCH

6
Networking
  • ssh access to frontends (tg-login.bigben.psc.teragrid.org)
  • scp to file systems
  • PSC far command to archiver

7
System Overview
pbsyod
qsub
8
PBS Outline
  • Running A Job
  • Scheduling Policies
  • Batch Access
  • Interactive Access
  • Packing Jobs
  • Monitoring And Killing Jobs

9
Scheduling Policies
  • The Portable Batch System (PBS) controls all
    access to bigben's compute processors, for both
    batch and interactive jobs. PBS on bigben
    currently has two queues, "batch" and "debug".
    Interactive and batch jobs compete in these
    queues for scheduling, and the queues are
    controlled through two different modes during a
    24-hour day.
  • The "batch" queue is the default (it does not
    need to be named explicitly in a job
    submission) and is active during both Day Mode
    and Night Mode, discussed next.
  • The "debug" queue must be named explicitly in a
    job script with #PBS -q debug and is limited to
    32 cpus and 15 minutes of wall-clock time. PBS
    specifications are discussed below.
  • Day Mode: During the day, defined to be
    8am-8pm, 64 cpus are reserved for debugging jobs
    (jobs run from the "debug" queue). Jobs
    submitted to the "debug" queue may request no
    more than 32 cpus and 15 minutes of wall-clock
    time. Jobs submitted to the "batch" (default)
    queue may be any size up to the limit of the
    machine, but only jobs of 1024 cpus or less will
    be scheduled to start during Day Mode. "batch"
    jobs are limited to 6 wall-clock hours. Jobs in
    the "debug" and "batch" queues are ordered FIFO,
    and also in a way that keeps any one user from
    dominating usage and ensures fair turnaround.
    Jobs started during Day Mode must finish by 8pm,
    when the machine is rebooted.
  • Night Mode: During the night, defined to be
    8pm-8am (starting after the machine reboot),
    jobs of 2048 cpus or less are allowed to run and
    are limited to 6 wall-clock hours. Jobs are
    ordered largest to smallest, and in a way that
    keeps any one user from dominating usage. Jobs
    in the "debug" queue are not allowed to run
    during Night Mode.
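The Day Mode and Night Mode limits can be restated as a small eligibility check. This is a minimal Python sketch, not PSC's scheduler: the names JobRequest and can_start are hypothetical, start times are given as minutes after midnight, and it ignores job ordering, the per-user fairness rules, the 64-cpu debug reservation and how many cpus are actually free.

```python
from dataclasses import dataclass

# Limits taken from the slide text; all names here are hypothetical.
DAY_START = 8 * 60          # 8am, in minutes after midnight
DAY_END = 20 * 60           # 8pm, when the machine is rebooted
DEBUG_MAX_CPUS = 32
DEBUG_MAX_MINUTES = 15
DAY_BATCH_MAX_CPUS = 1024   # largest batch job allowed to start in Day Mode
NIGHT_MAX_CPUS = 2048
BATCH_MAX_MINUTES = 6 * 60


@dataclass
class JobRequest:
    queue: str              # "batch" or "debug"
    cpus: int
    walltime_minutes: int


def is_day_mode(minute_of_day: int) -> bool:
    return DAY_START <= minute_of_day < DAY_END


def can_start(job: JobRequest, minute_of_day: int) -> bool:
    """True if the stated limits let `job` start at `minute_of_day` (0-1439).

    Ignores job ordering, per-user fairness and how many cpus are free.
    """
    day = is_day_mode(minute_of_day)
    if job.queue == "debug":
        if not day:
            return False  # debug jobs do not run in Night Mode
        if job.cpus > DEBUG_MAX_CPUS or job.walltime_minutes > DEBUG_MAX_MINUTES:
            return False
    elif job.queue == "batch":
        if job.walltime_minutes > BATCH_MAX_MINUTES:
            return False
        if job.cpus > (DAY_BATCH_MAX_CPUS if day else NIGHT_MAX_CPUS):
            return False
    else:
        raise ValueError(f"unknown queue: {job.queue!r}")
    if day:
        # Anything started in Day Mode must finish by the 8pm reboot.
        return minute_of_day + job.walltime_minutes <= DAY_END
    return True
```

For example, can_start(JobRequest("batch", 1024, 360), 10 * 60) is True (the job ends at 4pm), while the same request at 3pm is False because it would run past the 8pm reboot.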

10
Scheduling Queues
11
Batch Access
  • You use the qsub command to submit a job script
    to PBS.
  • A PBS job script consists of PBS directives,
    comments and executable commands.
  • A sample job script is:
      #!/bin/csh
      #PBS -l size=4
      #PBS -l walltime=5:00
      #PBS -j oe
      set echo
      # move to my /scratch directory
      cd /scratch/myscratchdir
      # run my executable
      pbsyod ./hellompi

12
Batch Access (contd)
  • #PBS -l size=4
  • The first directive requests 4 processors.
  • #PBS -l walltime=5:00
  • The second directive requests 5 minutes of
    wallclock time. Specify the time in the format
    HH:MM:SS. At most two digits can be used for
    minutes and seconds. Do not use leading zeroes in
    your walltime specification.
  • #PBS -j oe
  • The final PBS directive combines your .o and .e
    output into one file, in this case your .o file.
    This will make your program easier to debug.
  • The remaining lines in the script are comments or
    command lines.
  • set echo
  • This command causes your batch output to display
    each command next to its corresponding output.
    This will make your program easier to debug. If
    you are using the Bourne shell or one of its
    descendants, use 'set -x' instead of 'set echo'.
  • Comment lines
  • The other lines in the sample script that begin
    with '#' are comment lines. The '#' for comments
    and PBS directives must begin in column one of
    your script file. The remaining lines in the
    sample script are executable commands.
  • pbsyod
  • The pbsyod command is used to launch your
    executable on your compute processors. Only
    programs executed with pbsyod are executed on
    your compute processors. All other commands are
    executed on the front end processor. Thus, you
    must use pbsyod to run your executable or it will
    run on the front end, where it will probably not
    work. If it does work it will degrade system
    performance.
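The walltime rules on this slide can be captured in a small validator. This is a minimal Python sketch; walltime_to_seconds is a hypothetical helper, and reading "no leading zeroes" as applying to the first field (so 5:00 is accepted but 05:00 is rejected) is an interpretation the slide does not spell out.

```python
def walltime_to_seconds(spec: str) -> int:
    """Convert a walltime such as "5:00" or "5:00:00" to seconds.

    Rules as interpreted here: two or three colon-separated numeric fields
    ([HH:]MM:SS), at most two digits for minutes and seconds, and no
    leading zero on the first field.
    """
    fields = spec.split(":")
    if len(fields) not in (2, 3) or not all(f.isdecimal() for f in fields):
        raise ValueError(f"expected [HH:]MM:SS, got {spec!r}")
    if len(fields[0]) > 1 and fields[0].startswith("0"):
        raise ValueError(f"leading zero in {spec!r}")
    *hours, minutes, seconds = fields
    if len(minutes) > 2 or len(seconds) > 2:
        raise ValueError(f"at most two digits for minutes and seconds: {spec!r}")
    h = int(hours[0]) if hours else 0
    return h * 3600 + int(minutes) * 60 + int(seconds)
```

For example, walltime_to_seconds("5:00") returns 300, the 5 minutes requested by the sample script.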

13
Batch Access (contd)
  • Within your batch script the variable
    PBS_O_WORKDIR is set to the directory from which
    you issued your qsub command. The variable
    PBS_O_SIZE is set to the number of processors you
    requested.
  • After you create your script you must make it
    executable with the chmod command:
  • chmod 755 myscript.job
  • Then you can submit it to PBS with the qsub
    command:
  • qsub myscript.job
  • Your batch output (your .o and .e files) is
    returned to the directory from which you issued
    the qsub command after your job finishes.
  • You can also specify PBS directives as
    command-line options to qsub. Thus, you could
    omit the PBS directives in the sample script
    above and submit the script with
  • qsub -l size=4 -l walltime=5:00 -j oe myscript.job
  • Command-line options override PBS directives
    included in your script.
  • The -M and -m options can be used to have the
    system send you email when your job undergoes
    specified state transitions.
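The override rule amounts to "read the script's #PBS directives first, apply the command line last, and let later values win". The Python sketch below is a hypothetical illustration of that rule, not how qsub is implemented; effective_options and _apply are made-up names, every option is assumed to take exactly one argument (true of -l, -j, -q, -M and -m, but not of flags such as -I), and -l resources are keyed by resource name.

```python
from __future__ import annotations

import shlex


def _apply(opts: dict[str, str], args: list[str]) -> None:
    """Fold option/argument pairs into `opts`; later values overwrite earlier ones.

    Assumes every option takes exactly one argument. -l values such as
    "size=4,walltime=5:00" are split and stored per resource name.
    """
    for flag, value in zip(args[0::2], args[1::2]):
        if flag == "-l":
            for item in value.split(","):
                name, _, val = item.partition("=")
                opts[name] = val
        else:
            opts[flag] = value


def effective_options(script_text: str, cli_args: list[str]) -> dict[str, str]:
    """Combine #PBS directives from a job script with qsub command-line options."""
    opts: dict[str, str] = {}
    for line in script_text.splitlines():
        # Directives count only when "#PBS" starts in column one.
        if line.startswith("#PBS"):
            parts = line.split(None, 1)
            if parts[0] == "#PBS" and len(parts) == 2:
                _apply(opts, shlex.split(parts[1]))
    # Applying the command line last lets it override the script.
    _apply(opts, cli_args)
    return opts
```

With the sample script's directives, passing ["-l", "walltime=10:00"] on the command line keeps size=4 and -j oe but replaces the 5:00 walltime with 10:00.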

14
Interactive Access
  • The command
  • qsub -I -l walltime=10:00 -l size=2 requests
    interactive access to 2 processors for 10
    minutes.
  • The system will respond with a message similar to
  • qsub: waiting for job 54.bigben.psc.edu to start
  • When your job starts you will receive the message
  • qsub: job 54.bigben.psc.edu ready and then you
    will see your shell prompt. At this point any
    commands you enter will be run as if you had
    entered them in a batch script.
  • Use the pbsyod command to send executables to the
    compute nodes.
  • Stdin, stdout, and stderr are all connected to
    your terminal.
  • When you are finished with your interactive
    session, type ^D (Control-D). The system will
    respond
  • qsub: job 54.bigben.psc.edu completed

15
Packing Jobs
  • You can pack several pbsyod commands into a
    single job and have each of them run on a
    distinct set of processors. This will allow you
    to increase the number of total processors your
    job asks for, which will become important once
    the scheduler is changed to favor large jobs.
  • For example, the job
      #!/bin/csh
      #PBS -l size=12
      #PBS -l walltime=30:00
      #PBS -j oe
      set echo
      cd /scratch/myscratchdir
      pbsyod -size 4 -base 0 ./mympi &
      pbsyod -size 4 -base 4 ./mympi &
      pbsyod -size 4 -base 8 ./mympi &
      wait
  • will launch three simultaneous executions, each on
    a distinct set of 4 processors.
  • The -size option to pbsyod indicates how many
    processors a pbsyod is to use. The default is to
    use all of your compute processors. The -base
    option indicates on which processor a pbsyod
    should begin executing, with your first processor
    having a base of 0. Thus, the first pbsyod above
    will begin executing on your first processor and
    use 4 processors, the second will run on the next
    4 processors starting with your fifth processor
    and the third pbsyod will run on your final 4
    processors. If you do not use the -base option
    all of your executions will run on top of each
    other on the same set of processors.
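The -size/-base bookkeeping extends to any number of equal blocks. This is a minimal Python sketch with a hypothetical helper, packed_lines, that emits pbsyod lines for a job script using only the -size and -base options described above. Each line is put in the background with csh's "&", and a final "wait" keeps the script (and so the PBS job) alive until every execution ends; assuming pbsyod stays in the foreground until its executable finishes, leaving out the "&" would make the executions run one after another.

```python
from __future__ import annotations


def packed_lines(total_size: int, block_size: int, executable: str) -> list[str]:
    """Return csh lines that run `executable` on disjoint blocks of processors.

    Block k uses processors k*block_size .. (k+1)*block_size - 1, so the
    -base values are 0, block_size, 2*block_size, ...
    """
    if block_size <= 0 or total_size % block_size != 0:
        raise ValueError("total_size must be a positive multiple of block_size")
    lines = [
        f"pbsyod -size {block_size} -base {base} {executable} &"
        for base in range(0, total_size, block_size)
    ]
    lines.append("wait")  # block until every backgrounded pbsyod has exited
    return lines
```

packed_lines(12, 4, "./mympi") produces the same -size/-base pairs as the example job above (bases 0, 4 and 8).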

16
Monitoring and Killing Jobs
  • The qstat -a command is used to display the
    status of the PBS queue. It includes running and
    queued jobs. For each job in the queue it shows
    the amount of walltime and number of processors
    requested. This information can be useful in
    predicting when your job might run. The -f option
    to qstat provides you with more extensive status
    information for a single job.
  • The shownids command, located in /usr/local/bin,
    shows you the status of all the compute
    processors on bigben. A nid is a node id or
    processor. The output of shownids shows the
    number of processors in certain types of states.
    Enabled processors are all processors available
    to PBS for scheduling. Allocated processors are
    those enabled processors that are currently
    running jobs. Free processors are those enabled
    processors that are currently free. You can use
    the output from shownids and qstat -a to
    determine when your jobs might start.
  • The qdel command is used to kill queued and
    running jobs.
  • qdel 54
  • The argument to qdel is the jobid of the job you
    want to kill. If you cannot kill a job, send
    email to remarks@psc.edu.
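One way to turn shownids and qstat -a output into a rough start-time estimate is to release processors as running jobs reach their requested walltime. This is a minimal Python sketch; earliest_start is hypothetical and takes numbers you would read off those two commands by hand (it does not parse their output). Running jobs may end early, so the estimate can err late; queued jobs ahead of yours may claim the freed processors first, so it can also err early.

```python
from typing import List, Optional, Tuple


def earliest_start(free: int, running: List[Tuple[int, int]], needed: int) -> Optional[int]:
    """Minutes until `needed` compute processors could be free at once.

    free:    the Free count reported by shownids
    running: (processors, minutes_remaining) for each running job, where
             minutes_remaining = requested walltime - elapsed time (qstat -a)
    needed:  processors your queued job requests

    Returns 0 if the job fits now, or None if it cannot fit even after
    every running job has ended.
    """
    if needed <= free:
        return 0
    available = free
    # Release processors in the order running jobs hit their walltime limit.
    for procs, remaining in sorted(running, key=lambda job: job[1]):
        available += procs
        if available >= needed:
            return remaining
    return None
```

For instance, with 10 free processors and running jobs of 16 processors (30 minutes left) and 32 processors (90 minutes left), a 40-processor request could start in at most about 90 minutes, ignoring other queued jobs.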

17
Workshop Scheduling
  • For the workshop, users should submit jobs to the
    "training" queue, either on the command line with
    qsub -q training
  • or in their job scripts as #PBS -q training
  • We all share 128 PEs in this queue, but the
    individual limits are 32 PEs and 30 minutes.
    You should normally be using a lot less than this.

18
Staying In Touch
  • remarks@psc.edu
  • xt3-users@psc.edu

19
Thank You!