Title: Introduction to ITC Research Computing Support at U.Va., September 2003
1 Introduction to ITC Research Computing Support at U.Va.
September 2003
Presented by the ITC Research Computing Support Group: Kathy Gerber, Ed Hall, Katherine Holcomb, Tim F. Jost Tolson
- Overview of Research Hardware, Software, Support, and Storage: Tuesday, September 10
- Statistical Software: Thursday, September 12
- Mathematical Visualization Software: Tuesday, September 17, 3:30 PM
- High Performance Computing: Thursday, September 25, 3:30 PM
- What's New in Maple 9 by Kathy Gerber: Wednesday, October 22 at 3:30 PM
- What's New in Mathematica 5 by Ed Hall: Wednesday, November 12 at 3:30 PM
- Using Matlab Effectively by Ed Hall: Wednesday, October 9 at 3:30 PM
2 High Performance Computing
- Katherine Holcomb
- ITC Research Computing Support Group
- res-consult_at_virginia.edu
3 Topics
- The Unix Environment
- Program Development
- Basic Efficiency Guidelines
- Profiling and Timing
- High Performance Platforms
- Parallel Programming
4 The Unix Environment
- De facto environment for High Performance Computing (HPC)
- Portability across many hardware platforms
- Hardware independent networking
- Strong application programming interface
- www.itc.virginia.edu/research/unixbasics.html
5 Linux
- Open source Unix-workalike operating system
- Linux clusters: COTS (commodity off-the-shelf) supercomputers
- www.beowulf.org/
- http://lcic.org/computational.html
6 Unix Editors
- Vi and Vim
- Standard Unix screen-oriented editor
- www.thomer.com/vi/vi.html
- http://vim.sourceforge.net/
- Emacs
- Extensible, customizable, self-documenting
- www.lib.uchicago.edu/keith/tcl-course/emacs-tutorial.html
7 More Unix Editors - GUI
- nedit
- www.itc.virginia.edu/research/nedit.html
- Pico
- www.itd.umich.edu/itcsdocs/r1168/
- Jove
- www.itc.virginia.edu/desktop/unix/docs/u003.jove.html
8 Program Development
- General Programming Advice
- Compilers
- Makefiles
- Debugging
- Checkpointing
9 General Programming Advice
- ALWAYS document your code.
- Use DEBUG statements that can be removed by the preprocessor.
- When developing your code, compile across a number of architectures to ensure the code is portable and bug free.
- Be prepared to rewrite the code if necessary.
- Do not reinvent the wheel.
- Keep functions/subroutines short; each should perform one task.
- White space and blank lines are free.
10 More General Programming Advice
- Compile with full warnings on (e.g., if using gcc, use the -Wall option).
- Today's warnings are tomorrow's bugs/errors.
- Use descriptive names for variables (e.g., NumPeople rather than N).
- Read more than one book on the language you are using.
- Read other people's code.
- Use lint or ftnchek to check your code.
- If learning a language from a book, do some of the exercises.
11 General Programming References
- Programming Pearls by Bentley.
- The Pragmatic Programmer: From Journeyman to Master by Hunt et al.
- The Practice of Programming by Kernighan and Pike.
- Code Complete: A Practical Handbook of Software Construction by McConnell.
- www.itc.virginia.edu/research/petsc/docs/codemanagement.html
12 Compiled Languages
- Fortran 77/90/95
- C
- C++
- ITC Supported Unix Compilers
- www.itc.virginia.edu/research/compilers.html
- ITC Supported Linux Compilers
- www.itc.virginia.edu/research/pgi/
- www.itc.virginia.edu/research/intel/
13 Fortran
- Popular versions of Fortran used are Fortran 77, Fortran 90 and Fortran 95 (Fortran 2000 is under development).
- Easy to learn.
- Very efficient.
- Supports multi-dimensional arrays (Matrices).
- Many existing codes are written in Fortran.
- Many numerical libraries such as IMSL and NAG
- Supports Complex Numbers
14 Fortran References
- Fortran 90/95 for Scientists and Engineers by Stephen Chapman
- Fortran 90/95 Explained by Metcalf and Reid
- Introducing Fortran 95 by Chivers and Sleightholme
- Numerical Recipes in Fortran by Press et al.
- www.itc.virginia.edu/research/fortranprog.html
- comp.lang.fortran newsgroup
15 Fortran Advice
- Use Fortran 95; it has many advantages over Fortran 77.
- Avoid GO TO if possible.
- Use IMPLICIT NONE.
16 C
- Popular versions are K&R C and ANSI C; a new standard was formalized in 1999.
- Relatively easy to learn.
- Java and C++ are object-oriented descendants with similar syntax.
17 C References
- The C Programming Language by Kernighan and Ritchie
- C Programming: A Modern Approach by K. N. King
- comp.lang.c
- comp.lang.c.moderated
- http://www.eskimo.com/scs/cclass/cclass.html
- Lint: a C program checker
- http://www.pdc.kth.se/training/Tutor/Basics/lint/index-frame.html
18 C++
- ISO standard was ratified in 1998; however, different compilers support the standard to different levels, which can lead to portability problems.
- C++ is a large language which can take time to learn.
- Can be as efficient as Fortran, but it is difficult to achieve this level of performance.
- Supports object-oriented programming.
- Supports meta-programming with templates.
- Used widely outside academia.
- Supports complex numbers
19 C++ References
- Beginner
- Practical C++ Programming by Oualline
- Intermediate
- The C++ Programming Language by Stroustrup
- Advanced
- Effective C++ by Meyers
- More Effective C++ by Meyers
- Modern C++ Design by Alexandrescu
20 C++ Web References
- comp.lang.c++
- comp.lang.c++.moderated
- comp.std.c++
- http://www.research.att.com/bs/C.html
- http://www.zib.de/Visual/people/mueller/Course/Tutorial/tutorial.html
- http://www.cs.wustl.edu/schmidt/C/
21 C++ Toolkits
- Many toolkits have been written in C++, such as Blitz++ and POOMA.
- A good reference site for scientific programming tools in C++ is http://www.oonumerics.org/
22 Makefiles
- Automates compilation of programs
- Shortens compilation by keeping track of what has been changed/needs recompiling
- Simplifies inclusion of multiple compiler flags, e.g. for debugging and optimization
- www.itc.virginia.edu/research/make.html
23 Debugging Statements
- Fortran
- PARAMETER (DEBUG=1)
- if ( DEBUG.eq.1 ) then
- WRITE(2,*)
- endif
- C/C++
- #define DEBUG
- or
- cc -DDEBUG -c test.c
- #ifdef DEBUG
- printf()
- #endif
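The C/C++ pattern above can be wrapped in a macro so that debug output disappears entirely from production builds. This is only a sketch: the names DEBUG_PRINT and sum_to are hypothetical and do not come from the slides, and the variadic macro assumes a C99 compiler.

```c
#include <stdio.h>

/* DEBUG_PRINT is a hypothetical helper name, not from the slides.
   When DEBUG is defined (in the source or with cc -DDEBUG) it prints
   to stderr; otherwise the preprocessor removes the call entirely. */
#ifdef DEBUG
#define DEBUG_PRINT(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_PRINT(...) ((void)0)
#endif

/* Sum the integers 1..n, tracing each step only in debug builds. */
long sum_to(int n)
{
    long total = 0;
    for (int i = 1; i <= n; i++) {
        total += i;
        DEBUG_PRINT("i=%d total=%ld\n", i, total);
    }
    return total;
}
```

Compiling the file with -DDEBUG turns the trace on; compiling without it leaves the result of sum_to unchanged and emits no output.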
24 Debugging Software
- Must compile with debug flag (usually -g), which on many compilers disables optimization
- dbx or similar is found on most Unix systems
- www.itc.virginia.edu/research/debug.html
- gdb is the GNU version of dbx for gcc, g++ and g77
- TotalView for serial and parallel programs
- www.itc.virginia.edu/research/totalview/
- PGI PGDBG
- www.pgroup.com/tools/pgdbg.htm
25 Checkpointing
- Prevents losing computation results due to premature program termination resulting from a machine crash or CPU time limits.
- Periodically save program state (variables) to a file which can later be read back into the program so that computation can proceed from that state.
- Allows monitoring the progress of a running program
- Especially useful for parallel programs
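A minimal sketch of the save-and-resume idea in C follows. The State struct, the function names and the toy computation (a running sum of 1/k) are all hypothetical, chosen only to illustrate the pattern; a real code would save whatever variables define its own state.

```c
#include <stdio.h>

/* Hypothetical program state: an iteration counter and a running sum. */
typedef struct {
    int    step;
    double sum;
} State;

/* Write the state to `path`; returns 0 on success, -1 on failure. */
int save_checkpoint(const char *path, const State *s)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    size_t n = fwrite(s, sizeof *s, 1, f);
    if (fclose(f) != 0 || n != 1) return -1;
    return 0;
}

/* Read the state back; returns 0 on success, -1 if no usable file.
   The caller's state is only overwritten after a complete read. */
int load_checkpoint(const char *path, State *s)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;
    State tmp;
    size_t n = fread(&tmp, sizeof tmp, 1, f);
    fclose(f);
    if (n != 1) return -1;
    *s = tmp;
    return 0;
}

/* Accumulate 1/k up to k = last_step, saving every `interval` steps.
   If a checkpoint file exists, the loop resumes from the saved step. */
double run(const char *path, int last_step, int interval)
{
    State s = {0, 0.0};
    load_checkpoint(path, &s);   /* on failure s keeps its initial values */
    while (s.step < last_step) {
        s.step++;
        s.sum += 1.0 / s.step;
        if (s.step % interval == 0)
            save_checkpoint(path, &s);
    }
    return s.sum;
}
```

Writing the raw struct keeps the sketch short, but the file is then only readable on the same architecture and compiler; a portable code would write the fields in a defined format.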
26 Basic Efficiency Guidelines
- Select best algorithm.
- Use efficient libraries.
- Compiler optimizations.
- Code Optimization
27 Compiler Options for Producing the Fastest Executable
- Using optimization flags when compiling can greatly reduce the runtime of an executable.
- Each compiler has a different set of options for creating the fastest executable.
- Often the best compiler options can only be arrived at by empirical testing and timing of your code.
- A good reference for compiler flags that can be used with various architectures is the SPEC web site, www.spec.org.
- Read the compiler man pages.
28 Example of Compiler Flags Used on a Sun Ultra 10 Workstation
- Compiler: SUNWpro 4.2
- Flags: none
- Runtime: 23 min 22.4 sec
- Compiler: SUNWpro 5.0
- Flags: none
- Runtime: 14 min 21.0 sec
29 More Optimizations
- Compiler: SUNWpro 5.0
- Flags: -O3
- Runtime: 2 min 24.4 sec
- Compiler: SUNWpro 5.0
- Flags: -fast
- Runtime: 2 min 06.7 sec
30 Interprocedural Optimization
- Compiler: SUNWpro 5.0
- Flags: -fast -xcrossfile -xprofile
- Runtime: 1 min 57.3 sec
- Interprocedural analysis options vary by compiler
- Compiler: Intel ifc
- Flag: -ipo
31 Useful Compiler Options
- IBM AIX: -O3 -qstrict -qtune=p2sc -qhot -qarch=p2sc -qipa
- PGI: -fastsse (on Athlon/P4)
- Intel: -O3 -tpp7 (P4) -ipo (interprocedural)
- GNU: -O3 -ffast-math -funroll-loops
32 Optimizing Floating Point Operations
- Loop invariant conditionals
- DO I=1,K
- IF ( N.EQ.0 ) THEN
- A(I)=A(I)+B(I)*C
- ELSE
- A(I)=0
- ENDIF
- ENDDO
33 Optimizing Floating Point Operations
- Move loop invariant conditional outside loop
- IF ( N.EQ.0 ) THEN
- DO I=1,K
- A(I)=A(I)+B(I)*C
- ENDDO
- ELSE
- DO I=1,K
- A(I)=0
- ENDDO
- ENDIF
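The same transformation can be sketched in C. The function names are hypothetical, and the arithmetic in the loop body is an assumed reading of the Fortran fragment (its operators did not survive in the slide text). Some optimizing compilers perform this "loop unswitching" automatically, so the hand-written form matters most when the compiler does not.

```c
#include <stddef.h>

/* Conditional tested on every iteration although n never changes. */
void update_inside(double *a, const double *b, double c, int n, size_t k)
{
    for (size_t i = 0; i < k; i++) {
        if (n == 0)
            a[i] = a[i] + b[i] * c;
        else
            a[i] = 0.0;
    }
}

/* Same result with the test hoisted out: each loop body is branch-free. */
void update_hoisted(double *a, const double *b, double c, int n, size_t k)
{
    if (n == 0) {
        for (size_t i = 0; i < k; i++)
            a[i] = a[i] + b[i] * c;
    } else {
        for (size_t i = 0; i < k; i++)
            a[i] = 0.0;
    }
}
```

Both functions produce identical arrays; the hoisted version simply evaluates the condition once instead of k times.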
34 The Memory Bottleneck
- CPU: 2x every 2 years
- Memory: 2x every 6 years
- Optimize Memory Access
- Cache-based systems: currently most common
- Vector pipelining: making a comeback
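Taking the two doubling times on this slide at face value (an idealized assumption; real hardware trends are not this regular), the gap between processor and memory performance compounds with time t in years:

```latex
\[
\frac{\mathrm{CPU}(t)}{\mathrm{Memory}(t)}
  = \frac{2^{t/2}}{2^{t/6}}
  = 2^{t/3},
\qquad
t = 6 \;\Rightarrow\; \frac{2^{3}}{2^{1}} = 4 .
\]
```

Under that assumption the processor pulls ahead of memory by a factor of two every three years, which is why memory access, not arithmetic, tends to limit performance.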
35 Optimizing Memory Access
- Memory access is more of a performance bottleneck than processor speed
- Largest potential for performance improvement
- Access data so as to minimize out-of-cache memory use
36 Loop Ordering
- Fortran: column-major ordering
- DO J=1,N
- DO I=1,N
- A(I,J) = B(I,J) + C(I,J)*D
- ENDDO
- ENDDO
- C/C++: row-major ordering
- for (I=0; I<n; I++)
- for (J=0; J<n; J++)
- a[I][J] = b[I][J] + c[I][J]*D;
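A compilable C sketch of the two loop orders follows. The array size N and the function names are hypothetical, and the loop body is an assumed reading of the slide's expression. Both functions compute the same result; only the order in which memory is touched differs.

```c
#include <stddef.h>

#define N 128   /* hypothetical array size, chosen only for illustration */

/* Row-major friendly: the inner loop walks the last index, so
   consecutive iterations touch adjacent memory locations. */
void add_scaled_rowwise(double a[N][N], double b[N][N],
                        double c[N][N], double d)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            a[i][j] = b[i][j] + c[i][j] * d;
}

/* Same arithmetic with the loops swapped: the inner loop now strides
   N doubles at a time, which tends to use the cache poorly in C. */
void add_scaled_columnwise(double a[N][N], double b[N][N],
                           double c[N][N], double d)
{
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            a[i][j] = b[i][j] + c[i][j] * d;
}
```

How much slower the second version runs depends on the array size, the cache and the compiler, so it is worth timing both on the target machine rather than assuming a figure.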
37 Code Optimization References
- www.itc.virginia.edu/research/Optim.html
- www.npaci.edu/T3E/single_pe.html
- www.llnl.gov/computing/tutorials/workshops/workshop/optimization/
- Software Optimizations for High Performance Computing by Crawford and Wadleigh
- High Performance Computing by Kevin Dowd et al.
- Performance Optimization for Numerically Intensive Codes by Goedecker and Hoisie
38 Timing and Profiling Codes
- "Premature optimization is the root of all evil" (Donald Knuth)
- The 80-20 rule: codes generally spend 80% of their time executing 20% of their instructions
39 time Command
- Useful for calculating how long a code runs; provides both user and system time.
- /usr/bin/time test.x
- real 111.7
- user 109.7
- sys 0.5
40 Example: F95 cpu_time Subroutine
- Can be used to time sections of code
- real start,finish
- call cpu_time(start)
- ..
- ..
- call cpu_time(finish)
- write(*,*) finish-start
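For C codes, the standard library clock() function plays a similar role to cpu_time. The sketch below times a hypothetical workload (the functions harmonic and time_harmonic are illustrative names, not from the slides).

```c
#include <time.h>

/* Hypothetical workload: sum of 1/k for k = 1..n. */
double harmonic(long n)
{
    double s = 0.0;
    for (long k = 1; k <= n; k++)
        s += 1.0 / (double)k;
    return s;
}

/* Run harmonic(n), store its value in *result, and return the
   processor time it used in seconds, as measured by clock(). */
double time_harmonic(long n, double *result)
{
    clock_t start = clock();
    *result = harmonic(n);
    clock_t finish = clock();
    return (double)(finish - start) / CLOCKS_PER_SEC;
}
```

clock() reports processor time, not wall-clock time, and its resolution varies by system. The timed result should also be used afterwards (printed, for example), since an optimizing compiler may otherwise remove or move the work being measured.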
41 gprof
- Provides a very detailed breakdown of how much time is spent in each function of a code
- Compile and link with the -pg option (-p for Intel)
- f90 -pg -O -o test.x test.f
- Execute code in normal manner
- ./test.x
- Create profile with gprof
- gprof test.x > test.prof
- www.itc.virginia.edu/research/profile.html
42 Vendor Tools for Profiling and Timing
- Most vendors provide specialized tools for profiling and timing codes, which usually have a simple-to-use GUI.
- Sun Forte Workshop
- http://www.sun.com/forte/
- SGI Speedshop
- www.sgi.com/developers/devtools/tools/prodev.html
- PGI PGPROF
- www.pgroup.com/tools/pgprof.htm
43 HPC Hardware
- Single Processor
- Symmetric Multi-Processor
- Distributed Processing
44 Single Processor System
45 Shared Memory System
46 Distributed Memory System
47 ITC HPC Resources
- IBM SMP: 4 Power3 nodes with 12 GB total memory.
- http://www.itc.virginia.edu/research/ibm/smp/
- Unixlab Cluster: 54 SGI and Sun workstations.
- http://www.itc.virginia.edu/research/unixlab-account.html
48 ITC Linux Clusters: Aspen
- Consists of 48 dual-processor AMD K7 Athlon MP 1800+ (1.533 GHz) nodes.
- Each node has 1 GB RAM
- Gigabit Ethernet interconnect
- www.itc.virginia.edu/research/linux-clusters/aspen
49 ITC Linux Clusters: Birch
- Consists of 32 dual-processor Intel Xeon (Pentium 4) 2.4 GHz nodes.
- Each node has 2 GB RAM
- Gigabit Ethernet interconnect
- Low-latency Myrinet interconnect
- www.itc.virginia.edu/research/linux-clusters/birch
50 Portable Batch System
- Queueing system and resource manager
- Used on the Linux clusters and will be used on the IBM SMP system
- Job scripts are prepared on the frontend and submitted to the queue manager with the command qsub
- Aspen and Birch tutorials have examples
51 Parallel Programming
- Simultaneous use of multiple compute resources.
- Parallelism can be coarse or fine-grained.
- Saves wall-clock time, solves bigger problems
- Make sure the serial program is optimized before parallelizing it.
- www.llnl.gov/computing/tutorials/workshops/workshop/parallel_comp/
52 Message Passing Interface
- MPI can be used to program a distributed memory system, such as a Linux cluster, as well as SMP machines.
- MPI tends to be used more than PVM.
- MPI is an industry standard supported by most vendors.
- MPI versions run on most Unix systems and Windows 2000
53 MPI
- MPI is a library of functions that can be called by a user's code to pass information between processors.
- The MPI library consists of over 200 functions; in general, only a small subset of these is used in any code.
- MPI can be used with Fortran, C and C++.
- MPI can be used on a single processor system to emulate a parallel system; useful for developing and testing code.
54 MPI Books
- Parallel Programming with MPI by Peter Pacheco
- Using MPI: Portable Parallel Programming With the Message-Passing Interface by William Gropp et al.
- Using MPI-2: Advanced Features of the Message-Passing Interface by William Gropp et al.
- MPI: The Complete Reference, The MPI Core by Marc Snir
- MPI: The Complete Reference, The MPI-2 Extensions by William Gropp et al.
55 MPI Web References
- http://www-unix.mcs.anl.gov/mpi/mpich/index.html
- http://www.lam-mpi.org/
- http://www.mpi-forum.org/
- http://fawlty.cs.usfca.edu/mpi/
56 HPC Libraries
- Well tested and widely used; don't reinvent the wheel.
- If not already present, ITC can install them.
- Examples include ScaLAPACK, PETSc, LAPACK, etc.
- Examples: http://www.netlib.org
57 National Supercomputer Centers
- NPACI: National Partnership for Advanced Computational Infrastructure
- NCSA: National Center for Supercomputing Applications
- US researchers are eligible to apply for time on NCSA/NPACI machines.
- If the application is successful, the time is free.
- Small allocations of 10,000 hours are relatively easy to get.
58 Some Sample Machines
- BlueHorizon: 1152 IBM Power3 CPUs configured in 144 nodes with 8 CPUs and 4 GB per node, based at San Diego.
- http://www.npaci.edu/BlueHorizon/
- TeraScale Computing System: 2728 1 GHz Alpha CPUs configured in 682 nodes
- http://www.psc.edu/machines/tcs/
- Distributed Terascale Facility: a 13.6 teraflop Linux cluster
- http://www.npaci.edu/teragrid/index.html