NUG Training 10/3/2005

Provided by: FlavioR1


1
NUG Training 10/3/2005
  • Logistics
  • Morning only: coffee and snacks
  • Additional drinks ($0.50) are in the refrigerator in the small
    kitchen area; you can also easily go out to get coffee
    during the 15-minute breaks
  • Parking garage vouchers are at the reception desk on the
    second floor
  • Lunch: on your own, but can go out in groups

2
Today's Presentations
  • Jacquard Introduction
  • Jacquard Nodes and CPUs
  • High Speed Interconnect and MVAPICH
  • Compiling
  • Running Jobs
  • Software overview
  • Hands-on
  • Machine room tour

3
Overview of Jacquard
Richard Gerber, NERSC User Services
RAGerber@lbl.gov
NERSC Users Group, October 3, 2005, Oakland, CA
4
Presentation Overview
  • Cluster overview
  • Connecting
  • Nodes and processors
  • Node interconnect
  • Disks and file systems
  • Compilers
  • Operating system
  • Message passing interface
  • Batch system and queues
  • Benchmarks and application performance

5
Status
  • Status Update

Jacquard has been experiencing node failures. While this problem
is being worked on, we are making Jacquard available to users in a
degraded mode. About 200 computational nodes are available, along
with one login node and about half of the storage nodes that
support the GPFS file system. Expect lower than usual I/O
performance. Because we may still experience some instability,
users will not be charged until Jacquard is returned to full
production.
6
Introduction to Jacquard
  • Named in honor of inventor Joseph Marie Jacquard,
    whose loom was the first machine to use punch
    cards to control a sequence of operations.
  • Jacquard is a 640-CPU Opteron cluster running a
    Linux operating system.
  • Integrated, delivered, and supported by Linux
    Networx
  • Jacquard has 320 dual-processor nodes available
    for scientific calculations. (Not dual-core
    processors.)
  • The nodes are interconnected with a high-speed
    InfiniBand network.
  • Global shared file storage is provided by a GPFS
    file system.

7
Jacquard
  • http://www.nersc.gov/nusers/resources/jacquard/

8
Jacquard Characteristics
9
Jacquard's Role
  • Jacquard is meant for codes that do not scale well
    on Seaborg.
  • The hope is that it will relieve the Seaborg backlog.
  • Typical job expected to be in the concurrency
    range of 16-64 nodes.
  • Applications typically run at 4x Seaborg speed. Jobs
    that cannot scale to large parallel concurrency
    should benefit from faster CPUs.

10
Connecting to Jacquard
  • Interactive shell access is via SSH.
  • ssh -l login_name jacquard.nersc.gov
  • Four login nodes for compiling and launching
    parallel jobs. Parallel jobs do not run on login
    nodes.
  • Globus file transfer utilities can be used.
  • Outbound network services are open (e.g., ftp).
  • Use hsi for interfacing with HPSS mass storage.

11
Nodes and processors
  • Each Jacquard node has 2 processors that share 6
    GB of memory. OS/network/GPFS uses about 1 GB (?) of
    that.
  • Each processor is a 2.2 GHz AMD Opteron.
  • Processor theoretical peak: 4.4 GFlop/s
  • Opteron is an advanced 64-bit processor that is
    becoming widely used in HPC.
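
The per-processor peak can be scaled up to a node and to the whole machine with simple arithmetic. The sketch below uses only the figures quoted in the slides (2.2 GHz, 4.4 GFlop/s, 2 processors per node, 320 nodes); the flops-per-cycle value is inferred from those numbers rather than stated, and the totals are theoretical peaks, not measured performance.

```python
# Figures taken from the slides.
CLOCK_GHZ = 2.2             # Opteron clock rate
PEAK_GFLOPS_PER_CPU = 4.4   # quoted theoretical peak per processor
CPUS_PER_NODE = 2           # dual-processor nodes
NODES = 320                 # compute nodes

# Inferred, not stated on the slide: flops per clock cycle implied by the peak.
flops_per_cycle = PEAK_GFLOPS_PER_CPU / CLOCK_GHZ

# Scale the per-processor peak to one node and to all compute nodes.
node_peak_gflops = PEAK_GFLOPS_PER_CPU * CPUS_PER_NODE
system_peak_tflops = node_peak_gflops * NODES / 1000.0

print(f"flops per cycle (inferred): {flops_per_cycle:.1f}")
print(f"node peak:   {node_peak_gflops:.1f} GFlop/s")
print(f"system peak: {system_peak_tflops:.2f} TFlop/s")
```

Real applications reach only a fraction of these peaks, so the numbers are best read as upper bounds.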

12
Node Interconnect
  • Nodes are connected by a high-speed InfiniBand
    network.
  • Adapters and switches from Mellanox
  • Low latency: 7 µs vs. 25 µs on Seaborg
  • Bandwidth: 2x Seaborg
  • Fat tree topology
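
To see how the latency and bandwidth figures combine, here is a rough sketch using the common latency-plus-size-over-bandwidth estimate of message time. The latencies and the 2x ratio come from the slide; the slides give no absolute bandwidth, so the Seaborg bandwidth is a caller-supplied assumption and the value in the demo loop is only a placeholder.

```python
JACQUARD_LATENCY_S = 7e-6    # from the slide
SEABORG_LATENCY_S = 25e-6    # from the slide
BANDWIDTH_RATIO = 2.0        # Jacquard bandwidth relative to Seaborg, from the slide


def transfer_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """Simple estimate: fixed startup cost plus message size / bandwidth."""
    return latency_s + nbytes / bandwidth_bytes_per_s


def jacquard_speedup(nbytes, seaborg_bandwidth_bytes_per_s):
    """Ratio of Seaborg to Jacquard message time under the model above.

    seaborg_bandwidth_bytes_per_s is an assumption supplied by the caller;
    the slides state only the 2x ratio, not absolute bandwidths.
    """
    t_seaborg = transfer_time(nbytes, SEABORG_LATENCY_S,
                              seaborg_bandwidth_bytes_per_s)
    t_jacquard = transfer_time(nbytes, JACQUARD_LATENCY_S,
                               BANDWIDTH_RATIO * seaborg_bandwidth_bytes_per_s)
    return t_seaborg / t_jacquard


if __name__ == "__main__":
    HYPOTHETICAL_BW = 1e9  # placeholder only, NOT a measured Seaborg figure
    for size in (8, 1024, 1024 * 1024, 64 * 1024 * 1024):
        print(f"{size:>10d} bytes: {jacquard_speedup(size, HYPOTHETICAL_BW):.2f}x")
```

Under this model, small messages are dominated by latency (the ratio approaches 25/7), while large messages are dominated by bandwidth (the ratio approaches 2).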

13
Disks and file systems
  • Home, scratch, and project directories are in
    GPFS, a global file system from IBM.
  • The SCRATCH environment variable is defined to
    contain the path to a user's personal scratch space.
  • 30 TBytes total usable disk
  • 5 GByte space, 15,000 inode quota in HOME per
    user
  • 50 GByte space, 50,000 inode quota in SCRATCH
    per user
  • SCRATCH gives better performance, but may be
    purged if space is needed
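
As an illustration of using the SCRATCH variable from a script, the following minimal sketch builds a working directory under it. The subdirectory name "run001" and the fallback to the system temp directory are assumptions for illustration, not NERSC conventions.

```python
import os
import tempfile


def scratch_dir(subdir="run001"):
    """Return a working directory under $SCRATCH, creating it if needed.

    Falls back to the system temp directory when SCRATCH is not set
    (for example, when testing off the cluster). "run001" is a
    hypothetical name used only for illustration.
    """
    base = os.environ.get("SCRATCH") or tempfile.gettempdir()
    path = os.path.join(base, subdir)
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    print("writing large output under:", scratch_dir())
```

Because scratch space may be purged, anything written this way that needs to be kept should be copied elsewhere, for example to HPSS.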

14
Project directories
  • Project directories are coming (some are already
    here).
  • Designed to facilitate group sharing of code and
    data.
  • Can be repo- or arbitrary group-based
  • /home/projects/group: for sharing group code
  • /scratch/projects/group: for sharing group data and
    binaries
  • Quotas TBD

15
Compilers
  • High-performance Fortran/C/C++ compilers from
    Pathscale.
  • Fortran compiler: pathf90
  • C/C++ compilers: pathcc, pathCC
  • MPI compiler scripts use Pathscale compilers
    underneath and have all MPI -I, -L, -l options
    already defined
  • mpif90
  • mpicc
  • mpicxx

16
Operating system
  • Jacquard is running Novell SUSE Linux Enterprise
    Server 9
  • Has all the usual Linux tools and utilities
    (gcc, GNU utilities, etc.)
  • It was the first enterprise-ready Linux for
    Opteron.
  • Novell (indirectly) provides support and product
    lifetime assurances (5 yrs).

17
Message passing interface
  • MPI implementation is known as MVAPICH.
  • Based on MPICH from Argonne with additions and
    modifications from LBNL for InfiniBand. Developed
    and supported ultimately by Mellanox/Ohio State
    group.
  • Provides standard MPI and MPI/IO functionality.

18
Batch system
  • Batch scheduler is PBS Pro from Altair
  • Scripts not much different from LoadLeveler:
    #@ -> #PBS
  • Queues for interactive, debug, premium charge,
    regular charge, low charge.
  • Configured to run jobs using 1-128 nodes (1-256
    CPUs).
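
The node and CPU limits translate directly into how many nodes to request for a given number of MPI tasks. A small sketch of that bookkeeping follows; the function name and error messages are illustrative, and the limits are the ones quoted above.

```python
CPUS_PER_NODE = 2    # dual-processor nodes
MAX_NODES = 128      # largest job the batch system is configured to run


def nodes_needed(mpi_tasks, tasks_per_node=CPUS_PER_NODE):
    """Number of nodes to request for a given MPI task count."""
    if mpi_tasks < 1:
        raise ValueError("need at least one MPI task")
    if not 1 <= tasks_per_node <= CPUS_PER_NODE:
        raise ValueError("tasks_per_node must be 1 or 2")
    nodes = -(-mpi_tasks // tasks_per_node)  # ceiling division
    if nodes > MAX_NODES:
        raise ValueError(f"{nodes} nodes exceeds the {MAX_NODES}-node limit")
    return nodes


if __name__ == "__main__":
    for tasks in (1, 16, 64, 256):
        print(f"{tasks:>3d} MPI tasks -> {nodes_needed(tasks)} nodes")
```

Running one task per node (tasks_per_node=1) doubles the node count but gives each task the node's full memory.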

19
Performance and benchmarks
  • Applications run at about 4x Seaborg speed; some more,
    some less
  • NAS Parallel Benchmarks (64-way) are 3.5-7
    times Seaborg
  • Three applications the author has examined (-O3
    out of the box):
  • CAM 3.0 (climate): 3.5x Seaborg
  • GTC (fusion): 4.1x Seaborg
  • Paratec (materials): 2.9x Seaborg
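
The three application speedups can be summarized and used for rough planning. The sketch below takes the slide's numbers as given, computes their geometric mean, and converts a Seaborg run time into an estimated Jacquard run time. It assumes the same concurrency on both machines, and the 12-hour run is a made-up example.

```python
import math

# Speedups relative to Seaborg, as quoted on the slide.
SPEEDUPS = {"CAM 3.0": 3.5, "GTC": 4.1, "Paratec": 2.9}


def geometric_mean(values):
    """Geometric mean, the usual way to average ratios such as speedups."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def jacquard_hours(seaborg_hours, speedup):
    """Estimated Jacquard wall-clock time for a run that took seaborg_hours."""
    return seaborg_hours / speedup


gm = geometric_mean(SPEEDUPS.values())
print(f"geometric mean speedup: {gm:.2f}x")

# Hypothetical 12-hour Seaborg run, for illustration only.
for name, s in SPEEDUPS.items():
    print(f"{name}: 12.0 h on Seaborg -> about {jacquard_hours(12.0, s):.1f} h")
```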

20
User Experiences
  • Positives
  • Shorter wait in the queues
  • Linux: many codes already run under Linux
  • Good performance for 16-48 node jobs; some codes
    scale better than on Seaborg
  • Opteron is fast

21
User Experiences
  • Negatives
  • Fortran compiler is not common, so some porting
    issues.
  • Small disk quotas.
  • Unstable at times.
  • Job launch doesn't work well (can't pass ENV
    variables).
  • Charge factor.
  • Big endian I/O.
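
The slide does not elaborate on the big-endian I/O item. One plausible reading is that binary files written on a big-endian machine such as Seaborg have the opposite byte order from what the little-endian Opterons expect. Under that assumption, this sketch shows what happens when the byte order is ignored and how stating it explicitly avoids the problem.

```python
import struct
import sys

values = (1.0, 2.5, -3.0)

# Bytes as a big-endian machine would write three 64-bit floats.
big_endian_bytes = struct.pack(">3d", *values)

# Correct: state the byte order explicitly when reading.
decoded = struct.unpack(">3d", big_endian_bytes)

# Incorrect for this data: reading with little-endian order reinterprets
# the same bytes and yields different numbers.
misread = struct.unpack("<3d", big_endian_bytes)

print("host byte order:", sys.byteorder)
print("decoded:", decoded)
print("misread:", misread)
```

Fortran unformatted files add record markers on top of this, so real data files may need more than a byte swap.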

22
Today's Presentations
  • Jacquard Introduction
  • Jacquard Nodes and CPUs
  • High Speed Interconnect and MVAPICH
  • Compiling
  • Running Jobs
  • Software overview
  • Hands-on
  • Machine room tour

23
Hands On
  • We have a special queue "blah" with 64 nodes
    reserved.
  • You may work on your own code.
  • Try building and running test code
  • Copy to your directory and untar:
    /scratch/scratchdirs/ragerber/NUG.tar
  • 3 NPB parallel benchmarks: ft, mg, sp
  • Configure in config/make.def
  • make ft CLASS=C NPROCS=16
  • Sample PBS scripts in run/
  • Try new MPI version, opt levels, -g, IPM