Using Lewis and Clark - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Using Lewis and Clark


1
Using Lewis and Clark
William G. Spollen
Division of Information Technology / Research Support Computing
Thursday, Nov. 1, 2007
http://umbc.rnet.missouri.edu/
spollenw@missouri.edu
2
Outline
  • Background
  • Overview of High Performance Computing at the
    UMBC
  • Clark usage
  • Portable Batch System and qsub
  • Lewis usage
  • Load Sharing Facility and bsub

3
Why this workshop?
  • To encourage you to take advantage of resources
    available through the UMBC.
  • Running your jobs in parallel will save you time.
  • We may have tools to broaden your investigative
    reach.
  • To show how best to use the resources.

4
(No Transcript)
5
Please note that each user must have their own
account. Do not use your advisor's, a friend's,
etc.
6
Some Definitions
  • A process is a program in execution.
  • Serial processing uses only 1 cpu.
  • Parallel processing (multiprocessing) uses two or
    more cpus simultaneously.
  • A cluster is a collection of interconnected
    computers; each node has its own OS.
  • Threads allow a process to run on more than one
    cpu, but only if the cpus are all on the same
    computer (node).

7
Parallel Architectures
  • Distinguished by the kind of interconnection,
    both between processors, and between processors
    and memory
  • Shared memory
  • Network

(Diagram: a job running on architectures (A) and (B))
8
A High Performance Computing Infrastructure
A $2M federal earmark was made to the UM
Bioinformatics Consortium to obtain computers
with architectures matched to the research problems
in the UM system.
9
High Performance Computing Infrastructure Concept
(1) Clark - Modeling and Simulations: SGI Altix
3700 BX2, 128 GB shared memory, 64 cpus
(2) Lewis - General Purpose Computing: Dell
Linux Cluster with 128 nodes, 4 cpus per node
(3) York - Macromolecule Database Searches:
TimeLogic DeCypher - hardware/software for
streamlined searches
(4) 12 TB SGI TP9500 Infinite Storage Disk Array
(Fibre Channel connections to the compute systems)
(5) 50 TB EMC CLARiiON CX700 Networked Storage
(Infiniband / IBRIX Fusion connections to Lewis)
10
(1) SGI Altix 3700 BX2
  • 64 1.5 GHz Itanium2 processors
  • 128 GB NumaLink Symmetric Multi-Processor (SMP)
    Shared Memory
  • One OS image with 64 processors
  • Each processor has 28 ns access to all 128 GB RAM

clark.rnet.missouri.edu
11
(2) Dell 130-Node Dual-Core HPC Cluster
  • Woodcrest head node: 2 Dell dual-core 2950 2.66
    GHz cpus
  • Dell Xeon 2.8 GHz cluster admin node
  • 128 Dell PowerEdge 1850 Xeon EM64T 2.8 GHz
    compute nodes (512 processors)
  • 640 GB RAM (64 nodes @ 6 GB, 64 nodes @ 4 GB)
  • TopSpin Infiniband 2-Tier interconnect switch
  • Access to 50 TB disk storage

lewis.rnet.missouri.edu
12
(3) Sun/TimeLogic DeCypher
  • 4 Sun V240 servers (UltraSparc IIIi, 1.5 GHz, 4P,
    4GB)
  • 8 TimeLogic G4 DeCypher FPGA Engines
  • TimeLogic DeCypher Annotation Suite (BLAST, HMM,
    Smith-Waterman, etc.)
  • 50-1,000 times faster than clusters for some
    BLASTs

york.rnet.missouri.edu
13
(4) SGI TP9500 Infinite Storage Disk Array
  • SGI TP9500 Disk Array w/ dual 2 Gbit controllers,
    2 GB cache
  • 12 TB Fiber Channel disk array (6 drawers, 14 x 146
    GB disks/drawer, 2.044 TB/drawer)
  • 2 fiber connections each to the Altix, Dell, and
    Sun systems.

14
(5) EMC CLARiiON CX700 Disk Storage
  • 125 x 500 GB SATA drives
  • IB SAN support to Lewis
  • IBRIX software is used to manage the I/O to the
    disk storage for all Lewis nodes

15
Selected Software Installed
  • SAS
  • R
  • Matlab
  • Oracle
  • MySQL
  • PGenesis
  • M-Cells
  • Phred, Phrap, Consed
  • Locally developed code
  • More
  • Gaussian03
  • NAMD
  • AMBER
  • CHARMM
  • Octopus
  • NCBI Blast
  • WU Blast
  • MSA
  • ClustalW

16
Compilers
  • Linux (lewis, clark)
  • Intel (icc, ifort) - usually preferred; better
    optimized for the architecture than GNU.
  • GNU (gcc, g++, g77)
  • javac
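
As a quick illustration, a simple serial program might be compiled with either family of compilers along these lines (the file names and the -O2 flag are assumptions for the example):

    # Intel compilers (usually preferred on these systems)
    icc -O2 -o myProg myProg.c
    ifort -O2 -o myProg myProg.f90
    # GNU compilers
    gcc -O2 -o myProg myProg.c
    g77 -o myProg myProg.f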

17
Some Research Areas
  • Chemical structure prediction and property
    analysis with GAUSSIAN
  • Ab initio quantum-mechanical molecular dynamics
    with VASP
  • Simulation of large biomolecular systems with
    NAMD
  • Molecular simulations with CHARMM/AMBER
  • Statistics of microarray experiments with R

18
Clark: 128 GB SMP shared memory, one Linux OS with 64
processors (CPU1, CPU2, ... CPU63, CPU64)
Use the Portable Batch System (PBS)!!!
19
Using Clark PBS (Portable Batch System)
  • clark> qsub scriptfile
  • clark> cat scriptfile
  • #PBS -l cput=10:00:00,ncpus=8,mem=2gb
  • (note: -l is for the resource list)
  • ./myProgram
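
A minimal complete PBS script along these lines might look as follows; the shebang line, the cd into $PBS_O_WORKDIR, and the exact resource values are illustrative assumptions rather than site requirements:

    #!/bin/bash
    #PBS -l cput=10:00:00,ncpus=8,mem=2gb    # resource list: cpu time, number of cpus, memory
    cd $PBS_O_WORKDIR                        # assumed: start in the directory qsub was run from
    ./myProgram                              # the program to run

It would be submitted with qsub scriptfile, and PBS writes the job output to scriptfile.onnnn and scriptfile.ennnn as described on the next slide.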

20
Using Clark output
  • scriptfile.onnnn  (standard output stream; nnnn is
    the PBS job number)
  • scriptfile.ennnn  (error output stream)

21
Using Clark PBS example 1
  • clark> qsub runSAS
  • 6190.clark
  • clark> cat runSAS
  • #PBS -l cput=1:00:00,ncpus=1,mem=1gb
  • cd workingdir/
  • sas test
  • runSAS.o6190
  • runSAS.e6190
  • test.log
  • test.lst

22
Using Clark PBS example 2
As part of a script:
qsub -V -k n -j oe -o $PBS_path \
     -r n -z \
     -l ncpus=$PBS_ncpus \
     -l cput=$PBS_cput myprog
To learn more about qsub:
clark> man qsub
23
Using Clark qstat and queues
  • clark> qstat
  • Jobid  Name   User  Time Use  S  Queue
  • -----  -----  ----  --------  -  -----
  • 6422   qcid2  fjon  60:36:20  R  long
  • 6432   redo1  fjon         0  Q  long
  • 6434   wrky4  fjon  10:03:34  R  standard
  • 6487   job1   cdar  05:06:10  R  standard
  • 6488   job23  cdar  01:34:12  R  standard
  • 6489   jobh2  cdar         0  Q  standard
  • The long queue is for 100 h.
  • 1 or 2 jobs can run simultaneously.
  • Only one can be long.
  • Submit as many as you like.

24
Using Clark qstat -f and qdel
  • clark> qstat -f 6502
  • Job Id: 6502.clark
  •   Job_Name = Blast.rice
  •   Job_Owner = mid@clark.rnet.missouri.edu
  •   resources_used.cpupercent = 0
  •   resources_used.cput = 00:00:01
  •   resources_used.mem = 52048kb
  •   resources_used.ncpus = 8
  •   job_state = R
  •   ctime = Thu Apr 19 09:55:15 2007
  •   .......................................
  • clark> qdel 6502  (to kill a job)

25
Using Clark user limits
  • Maximum
  •   number of cpus: 16
  •   jobs running: 2
  •   jobs pending: no limit
  •   data storage: no limit (yet)

26
Lewis: 129 Linux OSs; one OS (on the head node)
coordinates the rest. Infiniband connects all nodes.
Head Node + 128 compute nodes (Node 1, Node 2, ... Node 127, Node 128)
50 TB EMC CLARiiON CX700 Networked Storage (FC / IBRIX)
Use the Load Sharing Facility (LSF)!!!
27
LSF ex 1: 1-processor program
  • lewis> bsub < myJob
  • lewis> cat myJob
  • #BSUB -J 1Pjob
  • #BSUB -oo 1Pjob.o%J
  • #BSUB -eo 1Pjob.e%J
  • ./myProg
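
For reference, a complete version of this single-processor script might look like the sketch below; the shebang line is an illustrative assumption:

    #!/bin/bash
    #BSUB -J 1Pjob           # job name
    #BSUB -oo 1Pjob.o%J      # overwrite the standard output file (%J = job id)
    #BSUB -eo 1Pjob.e%J      # overwrite the standard error file
    ./myProg                 # the serial program to run

It is submitted with bsub < myJob so that LSF reads the #BSUB lines embedded in the script.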

28
LSF ex 2: threaded / 1 node
  • lewis> cat myJob
  • #BSUB -J thrdjob
  • . . . . . . . . .
  • #BSUB -n 4
  • #BSUB -R "span[hosts=1]"
  • (-R is for a resource requirement)
  • ./myProg
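
A minimal complete threaded-job script might look like this; the OMP_NUM_THREADS line assumes an OpenMP-threaded program, which is an illustrative assumption and not something the slide specifies:

    #!/bin/bash
    #BSUB -J thrdjob
    #BSUB -oo thrdjob.o%J
    #BSUB -eo thrdjob.e%J
    #BSUB -n 4                    # request 4 cpus
    #BSUB -R "span[hosts=1]"      # keep all 4 cpus on one node so the threads share memory
    export OMP_NUM_THREADS=4      # assumed: tell an OpenMP program how many threads to start
    ./myProg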

29
LSF ex 3: parallel program - things to know
  • Lewis uses the Message Passing Interface (MPI) to
    communicate between nodes.
  • Parallel programs can run over either
  •   the Infiniband connection, or
  •   the TCP/IP network connection.

30
LSF ex 3: MPI with Infiniband
  • #BSUB -a mvapich
  • (-a is for specific application requirements; here,
    a program compiled for Infiniband)
  • #BSUB -J jobname
  • . . . . . . . . . . . . . . . . . . . .
  • Set the number of CPUs:
  • #BSUB -n 16
  • mpirun.lsf ./mpi_program
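
Put together, a complete Infiniband MPI submission script might look like this sketch; the job name, the output-file lines, and the choice of 16 cpus are illustrative assumptions:

    #!/bin/bash
    #BSUB -a mvapich             # application profile for a program compiled for Infiniband
    #BSUB -J mpijob              # assumed job name
    #BSUB -oo mpijob.o%J
    #BSUB -eo mpijob.e%J
    #BSUB -n 16                  # number of MPI processes
    mpirun.lsf ./mpi_program     # LSF-aware MPI launcher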

31
LSF ex 4: MPI, but pre-compiled for TCP/IP
  • #BSUB -a mpichp4
  • (note: program compiled for TCP/IP)
  • #BSUB -J jobname
  • . . . . . . . . . .
  • #BSUB -n 16
  • mpirun.lsf ./mpi_program

32
Using Lewis: compiling programs for MPI (Intel wrappers)
  • mpicc.i  - to compile C programs
  • mpiCC.i  - to compile C++ programs
  • mpif77.i - to compile FORTRAN 77 programs
  • mpif90.i - to compile FORTRAN 90 programs
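
As an illustration, an MPI source file might be compiled with these wrappers along the following lines (the file names are assumptions for the example):

    # C source
    mpicc.i -o mpi_program mpi_program.c
    # Fortran 90 source
    mpif90.i -o mpi_program mpi_program.f90

The resulting mpi_program is then launched through mpirun.lsf in a bsub script, as on the previous slides.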

33
Using Lewis: bjobs (-w)
  • lewis> bjobs
  • JOBID USER STAT QUEUE HOST  EXEC_HOST   JOB_NAME SUB_TIME
  • 14070 sqx1 RUN  norm  lewis 4*compute-2 myjob    Apr 18 1:32
  •   4*compute-22-28
  •   4*compute-20-11
  •   3*compute-22-29
  •   1*compute-20-30

lewis> bjobs -w
JOBID USER STAT QUEUE HOST EXEC_HOST JOB_NAME SUB_TIME
14070 sqx1 RUN norm lewis 4*compute-22-28:4*compute-22-30:4*compute-20-11:3*compute-22-29:1*compute-20-30 myjob Apr 18 1:32
34
Monitor job performance on Lewis
(go to a compute node)
lewis> lsrun -P -m compute-22-30 top

top - 10:35:46 up 65 days, 17:02, 1 user, load average: 4.00, 4.00, 4.00
Tasks: 149 total, 5 running, 144 sleeping, 0 stopped, 0 zombie
Cpu(s): 98.2% us, 1.8% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem:  4038084k total, 2421636k used, 1616448k free, 251500k buffers
Swap: 11261556k total, 26776k used, 11234780k free, 1069484k cached

  PID USER   PR NI VIRT RES  SHR  S %CPU %MEM     TIME+ COMMAND
14070 sqx1   25  0 247m 177m 9672 R 99.9  4.5   1509:01 BlastFull
13602 larry  25  0 318m 241m  11m R 99.7  6.1   6958:42 namd9
18608 moe    25  0 247m 177m 9668 R 99.7  4.5   1511:03 MyProg
13573 shemp  25  0 319m 243m  11m R 99.4  6.2   6871:15 namd9
 3055 root   16  0 153m  47m 1496 S  0.7  1.2 300:03.49 precept
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
(for help interpreting the output, run man top)
35
Using Lewis: short and normal queues
  • Short jobs have higher priority than normal jobs
    but will quit at 15 minutes.
  • In a script:
  • #BSUB -q short
  • Or, on submission:
  • lewis> bsub -q short

36
Using Lewis: lsrun for short test runs and
interactive tasks
  • lsrun command [argument]
  • lsrun -P command [argument]
  • e.g.,
  • lsrun -P vi myFile.txt

37
job arrays for multiple inputs
  • bsub -J "myArray[1-100]" \
  •      -o %J.output.%I \
  • (for command-line input)
  •      ./myProgram file.\$LSB_JOBINDEX
  • Where input files are numbered
  • file.1, file.2, ... file.100

38
job arrays for multiple inputs
  • bsub -J "myArray[1-100]" \
  •      -o %J.output.%I \
  • (for standard input)
  •      -i file.%I ./myProgram
  • Where input files are numbered
  • file.1, file.2, ... file.100

39
conditional jobs bsub -w
  • bsub -w "done(myArray[1-100])" \
  •      -J collate ./collateData
  • (conditions include: done, ended, exit,
    external, numdone, numended, numexit, numhold,
    numpend, numrun, numstart, post_done, post_err,
    started)
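
Taken together, the job-array and dependency options might be combined as in the sketch below; the file names and the 100-task array size are illustrative assumptions:

    # submit 100 array tasks; task N runs ./myProgram file.N and LSF writes its output to jobID.output.N
    bsub -J "myArray[1-100]" -o %J.output.%I ./myProgram file.\$LSB_JOBINDEX

    # submit a collation job that starts only after every array task has finished successfully
    bsub -w "done(myArray[1-100])" -J collate ./collateData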

40
gocomp for interactive sessions with GUI
  • gocomp
  • Logging on to interactive compute node
  • Last login: Wed Apr 18 13:54:46 2007 from
  • Platform OCS Compute Node
  • Platform OCS 4.1.1-1.0 (Cobblestone)
  • Profile built: 17:41 12-Apr-2007
  • Kickstarted: 17:58 12-Apr-2007
  • spollenw@compute-20-5
  • LSF does not schedule to this node. It is free
    for interactive jobs.
  • Some interactive programs
  • MATLAB
  • MapMan
  • Genesis

41
MATLAB Graphical Use: 1 cpu
  • An X-windowing system is needed for the MATLAB
    window to display. See the UMBC web site for
    instructions on downloading and installing the
    Cygwin X server.
  • After opening an X window, type:
  • lewis> gocomp
  • lewis> matlab
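
As an illustration, a session from a workstation already running an X server might look like this; the use of ssh -X for X11 forwarding and the placeholder user name are assumptions for the example, not instructions from the slide:

    # log in from the local machine with X11 forwarding enabled (assumed)
    ssh -X username@lewis.rnet.missouri.edu
    # move to the interactive compute node, then start MATLAB
    lewis> gocomp
    lewis> matlab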

42
MATLAB Multi CPU, non-Graphical Use
  • lewis> cat secondscript
  • #BSUB -J myjob
  • #BSUB -n 1
  • #BSUB -R "rusage[matlab=1:duration=1]"
  • #BSUB -oo myjob.o%J
  • #BSUB -eo myjob.e%J
  • matlab -nodisplay -r firstscript
  • lewis> bsub < secondscript

43
MATLAB Multi CPU, non-Graphical Use
  • lewis> cat firstscript.m
  • sched = findResource('scheduler',
    'configuration', 'lsf')
  • set(sched, 'configuration', 'lsf')
  • job = createJob(sched)
  • createTask(job, @sum, 1, {[1 1]})
  • createTask(job, @sum, 1, {[2 2]})
  • createTask(job, @sum, 1, {[3 3]})
  • submit(job)
  • waitForState(job, 'finished')
  • results = getAllOutputArguments(job)

44
Using Lewis: processor limits
  • Maximum number of
  •   cpus: 64
  •   jobs running: 64
  •   jobs pending: 200

Do not submit more than 264!
45
Using Lewis: memory limits
  • For large memory requirements (over 900 MB), use the
    resource specification string:
  • #BSUB -R "rusage[mem=nnnn]"
  • nnnn is in MB, and is per node.
  • Maximum available on any node: 5,700 MB
  • If a job spans multiple nodes, each node will
    have to have nnnn MB available.

46
Storage through Lewis
  • 50 TB EMC CLARiiON CX700 Networked Storage:
  •   5 TB for home directories (2.5 GB/user)
  •   15 TB for data directories (50 GB/user): uid@lewis ./data
  •   14 TB paid for by a grant and dedicated to that project
  •   13 TB for backup and future needs
47
Using Lewis: Storage Limits
  • 2.5 GB in home directory.
  • 50 GB - soft quota - under /data
  • 55 GB - hard quota - no further writing to files.
  • The EMC CLARiiON storage is not backed up. A
    deleted file cannot be retrieved.
  • The EMC is a RAID5 design, so if one disk fails
    the data are still available.
  • The data are viewable only by the user unless
    the user changes the permissions on their directory.
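
For example, standard commands can be used to check usage against the quota and to open up permissions; the /data/$USER path is an illustrative assumption:

    # see how much of the 50 GB soft quota is in use
    du -sh /data/$USER
    # let group members read a shared subdirectory (example only)
    chmod -R g+rX /data/$USER/shared_project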

48
Checkpointing on Lewis and Clark
  • Checkpointing is not supported on either system.
  • However, some programs, e.g., GAUSSIAN, come with
    their own checkpointing options which can be used.

49
Distributed and Parallel Computing with MATLAB
Thursday, November 8, 2007
Registration / Sign-in: 1:00 p.m.
Presentation: 1:30 p.m. to 3:30 p.m.
W1005 Lafferre Hall
www.mathworks.com/seminars/columbianov8
For more information contact Alyssa Winer,
alyssa.winer@mathworks.com, 508-647-4343
50
http://umbc.rnet.missouri.edu
Please note that each user must have their own
account. Do not use your advisor's, a friend's,
etc.