Using Lewis and Clark - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Using Lewis and Clark


1
Using Lewis and Clark
Bill Spollen
Division of Information Technology / Research Support Computing
Thursday, Sept. 24, 2009
http://umbc.rnet.missouri.edu/
spollenw@missouri.edu
2
Outline
  • Background
  • Overview of High Performance Computing at the
    UMBC
  • Clark usage
  • Portable Batch Submission (PBS) and qsub
  • Lewis usage
  • Load Sharing Facility (LSF) and bsub

3
Why this workshop?
  • To encourage you to take advantage of resources
    available through the UMBC.
  • Running your jobs in parallel will save you time.
  • We may have tools to broaden your investigative
    reach.
  • To show how best to use the resources.

4
http://umbc.rnet.missouri.edu
5
(No Transcript)
6
Some Definitions
  • A process is a program in execution.
  • Serial processing uses only 1 cpu.
  • Parallel processing (multiprocessing) uses two or
    more cpus simultaneously.
  • Threads allow a process to run on more than one
    cpu, but only if the cpus are all on the same
    computer or node.
  • A cluster is a collection of interconnected
    computers; each node has its own OS.

7
Parallel Architectures
  • Distinguished by the kind of interconnection,
    both between processors, and between processors
    and memory
  • Shared memory
  • Network

[Diagram: a job running on the two architectures, (A) and (B)]
8
A High Performance Computing Infrastructure
A $2 M Federal Earmark was made to the UM
Bioinformatics Consortium to obtain computers
with architectures to match the research problems
in the UM system.
9
High Performance Computing Infrastructure Concept
[Diagram: the five systems below, connected by Fibre Channel (FC) and Infiniband (IB) links, with IBRIX Fusion managing I/O to the networked storage]
(1) Clark - Modeling and Simulations: SGI Altix 3700 BX2, 128 GB shared memory, 64 cpus
(2) Lewis - General Purpose Computing: Dell Linux Cluster with 128 nodes, 4 cpus per node
(3) York - Macromolecule Database Searches: TimeLogic DeCypher - hardware/software for streamlined searches
(4) 12 TB SGI TP9500 Infinite Storage Disk Array
(5) 50 TB EMC CLARiiON CX700 Networked Storage
10
(1) SGI Altix 3700 BX2
  • 64 1.5 GHz Itanium2 processors
  • 128 GB NumaLink Symmetric Multi-Processor (SMP)
    Shared Memory
  • One OS image with 64 P
  • Each processor has 28 ns access to all 128 GB RAM

clark.rnet.missouri.edu
11
(2) Dell 130-Node Dual-Core HPC Cluster
  • Woodcrest head node: 2 Dell dual-core 2950 2.66 GHz cpus
  • Dell Xeon 2.8 GHz cluster admin node
  • 128 Dell PowerEdge 1850 Xeon EM64T 2.8 GHz compute nodes (512P)
  • 640 GB RAM (64 nodes @ 6 GB, 64 nodes @ 4 GB)
  • TopSpin Infiniband 2-tier interconnect switch
  • Access to 50 TB disk storage

lewis.rnet.missouri.edu
12
(3) Sun/TimeLogic DeCypher
  • 4 Sun V240 servers (UltraSparc IIIi, 1.5 GHz, 4P,
    4GB)
  • 8 TimeLogic G4 DeCypher FPGA Engines
  • TimeLogic DeCypher Annotation Suite (BLAST, HMM,
    Smith-Waterman, etc.)
  • 50-1,000 times faster than clusters for some
    BLASTs

york.rnet.missouri.edu
13
(4) SGI TP9500 Infinite Storage Disk Array
  • SGI TP9500 Disk Array w/ dual 2 Gbit controllers,
    2 GB cache
  • 12 TB Fiber Channel disk array (6 drawers x 14 x 146 GB disks/drawer = 2.044 TB/drawer)
  • 2 fiber connections each to the Altix, Dell, and Sun systems.

14
(5) EMC CLARiiON CX700 Disk Storage
  • 125 x 500 GB SATA drives
  • IB SAN support to Lewis
  • IBRIX software is used to manage the I/O to the disk storage for all Lewis nodes

15
Selected Software Installed
  • SAS
  • R
  • Matlab
  • Gaussian03
  • NAMD
  • AMBER
  • CHARMM
  • Octopus
  • Locally developed code
  • More
  • NCBI Blast
  • WU Blast
  • HMMER
  • ClustalW
  • NextGen sequencing tools
  • Phred, Phrap, Consed
  • Oracle
  • MySQL
  • PGenesis
  • M-Cells
  • Yours?

16
Compilers
  • Linux (lewis, clark)
  • Intel (icc, ifort) - preferred; better optimized for the architecture than GNU
  • GNU (gcc, g++, g77)
  • javac

17
Some Research Areas
  • Chemical structure prediction and property
    analysis with GAUSSIAN
  • Ab initio quantum-mechanical molecular dynamics
    with VASP
  • Simulation of large biomolecular systems with
    NAMD
  • Molecular simulations with CHARMM/AMBER
  • Statistics of microarray experiments with R

18
Clark: 128 GB SMP shared memory, 1 Linux OS with 64 processors (CPU1, CPU2, ... CPU63, CPU64)
Use the Portable Batch System (PBS)!!!
19
Using Clark PBS (Portable Batch System)
  • clark> qsub scriptfile
  • qsub submits a batch job to PBS. Submitting a PBS job specifies a task, requests resources, and sets job attributes.
  • clark> cat scriptfile
  • #PBS -l cput=100000,ncpus=8,mem=2gb
  • (note: -l is for the resource list; a fuller script sketch follows below)
  • ./myProgram

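Putting the pieces together, a minimal complete PBS script might look like the sketch below; the resource values, job name, and program name are illustrative and not taken from the slides.

  #!/bin/bash
  # resource list: cpu time (seconds), number of cpus, memory
  #PBS -l cput=100000,ncpus=8,mem=2gb
  # give the job a name (optional)
  #PBS -N myjob
  # run in the directory qsub was called from
  cd $PBS_O_WORKDIR
  ./myProgram

Submit it with clark> qsub scriptfile, then watch it with qstat (following slides).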
20
Using Clark output
  • scriptfile.onnnn  (the standard output stream)
  • scriptfile.ennnn  (the standard error stream)
  • (nnnn is the PBS job number)

21
Using Clark PBS example 1
  • clark> qsub runSAS
  • 6190.clark
  • clark> cat runSAS
  • #PBS -l cput=10000,ncpus=1,mem=1gb
  • cd workingdir/
  • sas test
  • Output files: runSAS.o6190, runSAS.e6190, test.log, test.lst

22
Using Clark PBS example 2
As part of a script:

qsub -V -k n -j oe -o $PBS_path \
     -r n -z \
     -l ncpus=$PBS_ncpus \
     -l cput=$PBS_cput myprog

To learn more about qsub: clark> man qsub
23
Using Clark qstat and queues
  • clark> qstat
  • Jobid  Name   User  Time Use  S  Queue
  • -----  -----  ----  --------  -  -----
  • 6422   qcid2  fjon  60:36:20  R  long
  • 6432   redo1  fjon  0         Q  long
  • 6434   wrky4  fjon  10:03:34  R  standard
  • 6487   job1   cdar  05:06:10  R  standard
  • 6488   job23  cdar  01:34:12  R  standard
  • 6489   jobh2  cdar  0         Q  standard
  • The long queue is for jobs > 100 h.
  • 1 or 2 jobs can run simultaneously; only one can be in the long queue.
  • Submit as many as you like.

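If a job belongs in the long queue, one way to request it explicitly is qsub's -q flag; this is a general PBS sketch based on the queue names shown above, not a statement of the local routing policy.

  clark> qsub -q long scriptfile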
24
Using Clark qstat -f
  • clark> qstat -f 6502
  • Job Id: 6502.clark
  • Job_Name = Blast.rice.
  • Job_Owner = mid@clark.rnet.missouri.edu
  • resources_used.cpupercent = 0
  • resources_used.cput = 00:00:01
  • resources_used.mem = 52048kb
  • resources_used.ncpus = 8
  • job_state = R
  • ctime = Thu Apr 19 09:55:15 2007
  • .......................................

25
Using Clark qdel
  • clark> qstat 6422
  • Jobid  Name   User  Time Use  S  Queue
  • -----  -----  ----  --------  -  -----
  • 6422   qcid2  fjon  60:36:20  R  long
  • clark> qdel 6422   (to kill a job)

26
Using Clark user limits
  • Maximum:
  • number of cpus: 16
  • jobs running: 2
  • jobs pending: no limit
  • data storage: no limit (yet)

27
Lewis: 129 Linux OSs; 1 OS (the head node) coordinates the rest. Infiniband connects all nodes.
Use the Load Sharing Facility (LSF)!!!
[Diagram: head node plus 128 compute nodes (Node 1, Node 2, ... Node 127, Node 128), connected by Infiniband (IB) and, via FC/IBRIX, to the 50 TB EMC CLARiiON CX700 networked storage]
28
LSF ex 1: 1-processor program
  • lewis> bsub ./myProg

29
LSF ex 1: 1-processor program
  • lewis> bsub < myJob
  • lewis> cat myJob
  • #BSUB -J 1Pjob
  • #BSUB -oo 1Pjob.o%J
  • #BSUB -eo 1Pjob.e%J
  • ./myProg

N.B. -oo and -eo write (and overwrite) the output and error files, which avoids filling your mailbox with job output.
30
Using Lewis bjobs
  • lewis> bsub < myJob
  • lewis> bjobs
  • JOBID  USER     STAT  QUEUE  HOST   EXEC_HOST     JOB_NAME  SUB_TIME
  • 14070  spollen  RUN   norm   lewis  compute-20-5  myjob     Sep 18 132

31
Using Lewis bjobs
  • lewis> bjobs
  • JOBID 14070
  • USER spollenw
  • STAT RUN
  • QUEUE norm
  • HOST lewis
  • EXEC_HOST compute-20-5
  • JOB_NAME myjob
  • SUB_TIME Sep 18 132

32
Using Lewis bjobs (-w)
  • Lewis> bjobs
  • JOBID  USER  STAT  QUEUE  HOST   EXEC_HOST     JOB_NAME  SUB_TIME
  • 14070  sqx1  RUN   norm   lewis  4*compute-2   myjob     Apr 18 132
  •                                 4*compute-22-
  •                                 4*compute-20-
  •                                 3*compute-22-
  •                                 1*compute-20-

  • Lewis> bjobs -w
  • JOBID  USER  STAT  QUEUE  HOST   EXEC_HOST  JOB_NAME  SUB_TIME
  • 14070  sqx1  RUN   norm   lewis  4*compute-22-28:4*compute-22-30:4*compute-20-11:3*compute-22-29:1*compute-20-30  myjob  Apr 18 132
33
monitor job performance on Lewis
(go to a compute node)
Lewis> lsrun -P -m compute-22-30 top

top - 10:35:46 up 65 days, 17:02, 1 user, load average: 4.00, 4.00, 4.00
Tasks: 149 total, 5 running, 144 sleeping, 0 stopped, 0 zombie
Cpu(s): 98.2% us, 1.8% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem:  4038084k total, 2421636k used, 1616448k free, 251500k buffers
Swap: 11261556k total, 26776k used, 11234780k free, 1069484k cached

  PID  USER   PR  NI  VIRT  RES   SHR   S  %CPU  %MEM     TIME  COMMAND
14070  sqx1   25   0  247m  177m  9672  R  99.9   4.5  1509:01  BlastFull
13602  larry  25   0  318m  241m  11m   R  99.7   6.1  6958:42  namd9
18608  moe    25   0  247m  177m  9668  R  99.7   4.5  1511:03  MyProg
13573  shemp  25   0  319m  243m  11m   R  99.4   6.2  6871:15  namd9
 3055  root   16   0  153m  47m   1496  S   0.7   1.2  30003.49 precept
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

(for help interpreting the output, run: man top)
34
Using Lewis lsrun for short test runs and
interactive tasks
  • > lsrun command argument
  • > lsrun -P command argument
  • e.g.,
  • > lsrun -P vi myFile.txt

35
Using Lewis bjobs -l 646190

spollenw@lewis> bjobs -l 646190

Job <646190>, Job Name <bust>, User <spollenw>, Project <default>, Status <RUN>,
Queue <multi>, Command <#BSUB -q multi; #BSUB -J bust; #BSUB -o bust.o%J;
#BSUB -e bust.e%J; #BSUB -n 4; #BSUB -R "span[hosts=1]"; nohup make -j 4 recursive>
Wed Sep 23 11:40:45: Submitted from host <lewis>, CWD </ifs/data/dnalab/Solexa/
090918_HWI-EAS313_42KW0AAXX/Data/C1-42_Firecrest1.4.0_22-09-2009_spollenw/
Bustard1.4.0_22-09-2009_spollenw>, Output File <bust.o%J>, Error File <bust.e%J>,
4 Processors Requested, Requested Resources <span[hosts=1]>
Wed Sep 23 11:40:49: Started on 4 Hosts/Processors <4*compute-20-32>, Execution
Home </home/spollenw>, Execution CWD </ifs/data/dnalab/Solexa/
090918_HWI-EAS313_42KW0AAXX/Data/C1-42_Firecrest1.4.0_22-09-2009_spollenw/
Bustard1.4.0_22-09-2009_spollenw>
Wed Sep 23 12:19:41: Resource usage collected. The CPU time used is 2637 seconds.
MEM: 352 Mbytes; SWAP: 865 Mbytes; NTHREAD: 13
PGID: 3112; PIDs: 3112 3113 3116 3117 3123 11957 11958 11961 11962 11965 11966 11969 11970

SCHEDULING PARAMETERS:
          r15s  r1m  r15m  ut  pg  io  ls  it  tmp  swp  mem
loadSched   -    -    -    -   -   -   -   -   -    -    -
loadStop    -    -    -    -   -   -   -   -   -    -    -
          gm_ports
loadSched    -
loadStop     -
36
LSF ex 2: threaded / 1 node
  • lewis> cat myJob
  • #BSUB -J thrdjob
  • . . . . . . . . .
  • #BSUB -n 4
  • #BSUB -R "span[hosts=1]"
  • (-R is for a resource requirement; here, all 4 cpus on one host. A complete script sketch follows below.)
  • ./myProg

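A filled-in version of this threaded-job script might look like the following sketch; the job name, output file names, and program are illustrative.

  # 4 cpus, all on one host, as a threaded program requires
  #BSUB -J thrdjob
  #BSUB -oo thrdjob.o%J
  #BSUB -eo thrdjob.e%J
  #BSUB -n 4
  #BSUB -R "span[hosts=1]"
  ./myProg

Submit it as before with lewis> bsub < myJob.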
37
LSF ex 3 parallel program -things to know
  • Lewis uses the Message Passing Interface (MPI) to
    communicate between nodes.
  • Parallel programs run on
  • Infiniband connection (preferred)
  • TCP/IP network connection

38
LSF ex 3: MPI with Infiniband
  • #BSUB -a mvapich
  • (-a is for specific application requirements; here, a program compiled for Infiniband)
  • #BSUB -J jobname
  • . . . . . . . . . . . . . . . . . . . .
  • # Set the number of CPUs
  • #BSUB -n 16
  • mpirun.lsf ./mpi_program
  • (a complete script sketch follows below)

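Filled in, such a submission script might look like this sketch; the job name, output files, and cpu count are illustrative.

  # MPI job over Infiniband: program compiled with the MVAPICH wrappers
  #BSUB -a mvapich
  #BSUB -J mpijob
  #BSUB -oo mpijob.o%J
  #BSUB -eo mpijob.e%J
  # number of MPI processes
  #BSUB -n 16
  mpirun.lsf ./mpi_program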
39
LSF ex 4: MPI, but pre-compiled for TCP/IP
  • #BSUB -a mpichp4
  • (note: program compiled for TCP/IP)
  • #BSUB -J jobname
  • . . . . . . . . . .
  • #BSUB -n 16
  • mpirun.lsf ./mpi_program

40
job arrays for multiple inputs
  • bsub -J "myArray[1-100]"
  •   -o %J.output.%I
  • (for command-line input)
  •   ./myProgram file.\$LSB_JOBINDEX
  • where input files are numbered
  • file.1, file.2, ..., file.100

41
job arrays for multiple inputs
  • bsub -J "myArray[1-100]"
  •   -o %J.output.%I
  • (for standard in)
  •   -i file.%I ./myProgram
  • where input files are numbered
  • file.1, file.2, ..., file.100
  • (a complete one-line sketch follows below)

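Put together on one line, the standard-input form might look like this sketch (file and program names are hypothetical):

  lewis> bsub -J "myArray[1-100]" -o %J.output.%I -i file.%I ./myProgram

Element N of the array reads file.N on its standard input, and LSF writes that element's output to <jobID>.output.N.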
42
conditional jobs: bsub -w
  • bsub -w "done(myArray[1-100])"
  •      -J collate ./collateData
  • (conditions include done, ended, exit,
    external, numdone, numended, numexit, numhold,
    numpend, numrun, numstart, post_done, post_err,
    started)

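Combined with the job array above, a two-step workflow might look like this sketch (program names are hypothetical); the collate job stays pending until every array element has finished successfully.

  lewis> bsub -J "myArray[1-100]" -o %J.output.%I -i file.%I ./myProgram
  lewis> bsub -w "done(myArray[1-100])" -J collate ./collateData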
43
gocomp for interactive sessions with GUI
  • spollenw@lewis> gocomp
  • Logging on to interactive compute node
  • spollenw@compute-20-5>

LSF does not schedule jobs to this node; it is kept free for interactive work.
Some programs used interactively: MATLAB, MapMan, Genesis
44
MATLAB
  • Batch job (32 licenses available) or interactive
    mode
  • Graphical or non-graphical

45
MATLAB Batch, Multi CPU, non-Graphical Use
  • lewis> cat firstscript
  • #BSUB -J myjob
  • #BSUB -n 1
  • #BSUB -R "rusage[matlab=1:duration=1]"
  • #BSUB -oo myjob.o%J
  • #BSUB -eo myjob.e%J
  • matlab -nodisplay -r MyMATLABscript
  • lewis> bsub < firstscript

N.B. Only one cpu is requested here; the additional cpus are requested from within the MATLAB script (next slide).
46
MATLAB Multi CPU, non-Graphical Use
  • lewis> cat MyMATLABscript.m
  • sched = findResource('scheduler', 'configuration', 'lsf')
  • set(sched, 'configuration', 'lsf')
  • set(sched, 'SubmitArguments', '-R "rusage[mdce=3]"')
  • job = createJob(sched)
  • createTask(job, @sum, 1, {[1 1]})
  • createTask(job, @sum, 1, {[2 2]})
  • createTask(job, @sum, 1, {[3 3]})
  • submit(job)
  • waitForState(job, 'finished')
  • results = getAllOutputArguments(job)

47
MATLAB Graphical Use 1 cpu
  • An X-windowing system is needed for the MATLAB window to display. See the UMBC web site for instructions on downloading and installing the Cygwin X server.
  • After opening an X window, type
  • lewis> gocomp
  • lewis> matlab

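From a desktop already running an X server (for example the Cygwin X server mentioned above), one common way to carry the X connection is SSH with X forwarding; this is a general sketch rather than a documented UMBC procedure, and the user name is a placeholder.

  desktop> ssh -X userid@lewis.rnet.missouri.edu
  lewis> gocomp
  compute-20-5> matlab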
48
Using Lewis short, normal, and multi queues
  • short jobs have higher priority than normal jobs but will quit at 15 minutes.
  • In a script: #BSUB -q short
  • Or: lewis> bsub -q short < scriptfile
  • More cpus for non-MPI jobs with the multi queue:
  • bsub -q multi < scriptfile

49
Using Lewis Intel compiling programs for MPI
  • mpicc.i - to compile C programs
  • mpiCC.i - to compile C++ programs
  • mpif77.i - to compile FORTRAN 77 programs
  • mpif90.i - to compile FORTRAN 90 programs
  • (a compile-and-submit sketch follows below)

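As a minimal sketch of the compile-and-run cycle (the source file and script names are hypothetical):

  lewis> mpicc.i -o mpi_program mpi_program.c
  lewis> bsub < myMPIjob

where myMPIjob is a submission script such as the one in LSF ex 3 above.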
50
Using Lewis processor limits
  • Maximum number of
  • cores in running jobs: 48
  • cores in pending jobs: 200

Do not submit jobs requiring more than 248 cores in total, or those in the PEND state will never progress!
51
Using Lewis memory limits
  • For large memory requirements (>900 MB), use the resource specification string
  • #BSUB -R "rusage[mem=nnnn]"
  • nnnn is in MB, and is per node.
  • Maximum available on any node: 5,700 MB
  • If a job spans multiple nodes, each node must have nnnn MB available.
  • (a short example follows below)

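For example, a minimal sketch of a single-cpu job asking for roughly 2 GB on its node (the value and program name are illustrative):

  #BSUB -J bigmem
  #BSUB -n 1
  # require 2000 MB of memory on the node
  #BSUB -R "rusage[mem=2000]"
  #BSUB -oo bigmem.o%J
  #BSUB -eo bigmem.e%J
  ./myLargeMemProg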
52
Storage through Lewis
5 TB for Home Directories (2.5 GB/user)
15 TB Data Directories (50 GB/user): uid@lewis:~/data
50 TB EMC CLARiiON CX700 Networked Storage
14 TB paid for by a grant and dedicated to that
project
13 TB for backup and future needs
53
Using Lewis Storage Limits
  • 2.5 GB in the home directory.
  • 50 GB - soft quota - under <userid>/data
  • 55 GB - hard quota - no further writing to files.
  • The EMC CLARiiON storage is not backed up; a deleted file cannot be retrieved.
  • The EMC is a RAID5 design, so if one disk fails the data are still available.
  • The data are viewable only by the user unless he/she changes the permissions on their directory (an example follows below).

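For instance, to let group members read a shared project area, the owner could relax the permissions; the directory name is hypothetical and this only illustrates the mechanism, not a recommendation.

  lewis> chmod -R g+rX ~/data/shared_project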
54
Checkpointing on Lewis and Clark
  • Checkpointing is not supported on either system.
  • However, some programs, e.g., GAUSSIAN, come with
    their own checkpointing options which can be used.

55
Questions?
  • Any questions about the high performance
    computing equipment and its use can be sent to
support@rnet.missouri.edu

56
http://umbc.rnet.missouri.edu