Title: NPACI All Hands Meeting 2002 User Feedback Session
1. NPACI All Hands Meeting 2002: User Feedback Session
- Session Chair: Dr. Jay Boisseau, TACC
- Speakers: Dr. Kent Milfeld, TACC
- Dr. Bill Martin, U. Michigan
- Don Frederick, SDSC
- Friday, March 8, 2002
2. Organization of HPC Resources at TACC
Kent Milfeld, milfeld@tacc.utexas.edu
University of Texas at Austin
Texas Advanced Computing Center
3. TACC Resources
Allocatable Machines
- SV1, Vector Processing
- T3E, MPP (RISC) Processing
- April: Regatta (Power4)
Infrastructure
- Storage Archive (GigE machine room network)
- IA-32/64 Linux Cluster Computing (NPACI Rocks)
- Integrated Programming Environment (modules; looks and feels like an NPACI machine)
- Visualization Facilities (world class)
- Mini TeraGrid (12-mile OC-48 network experiment next month)
- Grid Team: Development, Support, and Portals
- NPACI/TACC Training and Consulting
4. NPACI-TACC Resources
SV1: 16 CPUs, 16 GB memory (Vector)
- Memory bandwidth
- Gather/scatter
- 300 MHz x 4 flop/CP = 1200 MFLOPS
- OpenMP, MPI (see the sketch after this list)
T3E (RISC)
- RISC with streams
- Cache
- 300 MHz x 2 flop/CP = 600 MFLOPS
- High-speed interconnect
- MPI
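The OpenMP and MPI bullets above are the programming models for these machines: OpenMP shares work among the CPUs of one node, MPI passes messages between nodes. The sketch below is a minimal, hypothetical C illustration of combining the two, not code from the slides; the compile command (an MPI wrapper plus an OpenMP flag such as -fopenmp or -qsmp=omp) depends on the system.

/* Hypothetical sketch: each MPI rank sums part of a vector with an
 * OpenMP-parallel loop, then the partial sums are combined with MPI. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double local = 0.0;
    /* OpenMP: threads within one SMP node share this loop. */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0;

    /* MPI: combine the per-rank partial sums across nodes. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("%d ranks x %d threads, total = %g\n",
               nprocs, omp_get_max_threads(), total);

    MPI_Finalize();
    return 0;
}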
5. TACC Regatta HPC: longhorn.tacc.utexas.edu
- 64 IBM Power4 1.3 GHz processors
- Arranged as 4 x 16-way SMPs (now)
- 32 GB memory/node (128 GB total)
- 1 TB disk
- 1/3 TFLOPS peak (see the arithmetic sketch below)
- Early Summer: interconnected by a high-speed switch (1-2 GB/sec point-to-point, theoretical)
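The 1/3 TFLOPS peak figure is just clock rate x floating-point operations per cycle x processor count. The short C check below assumes 4 flops per cycle per Power4 core (two fused multiply-add units); that per-core rate is my assumption, not something stated on the slide.

/* Back-of-the-envelope check of the "1/3 TFLOPS (peak)" figure above. */
#include <stdio.h>

int main(void)
{
    const double clock_hz      = 1.3e9; /* 1.3 GHz Power4 */
    const int    flops_per_cyc = 4;     /* assumption: 2 FMA units x 2 flops each */
    const int    cpus          = 64;

    double peak = clock_hz * flops_per_cyc * cpus; /* about 3.3e11 flop/s */
    printf("Peak: %.0f GFLOPS (~%.2f TFLOPS)\n", peak / 1e9, peak / 1e12);
    return 0;
}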
6. TACC Regatta HPC
7. Storage Robot: petabyte capacity
8. IBM p690 HPC Design Configuration
- 135 watts/die x 4: HOT!!!
9. TACC Visualization Lab
- SGI Onyx2
- 24 CPUs, 6 Infinite Reality 2 Graphics Pipelines
- 24 GB Memory, 1TB Disk
- Front and Rear Projection Systems
- 3x1 cylindrically-symmetric Power Wall
- 5x2 large-screen 16:9-panel Power Wall
10. Power4
11. TACC IA-32 System: 64 Compute Processors
- 32 compute nodes: 2-way SMPs, 1 GHz Pentium III (IBM x330), 1 GB memory/node, 18 GB local disk per node
- Filesystems: /home (20 GB), /work (20 GB), and /gpfs (3/4 TB) served by 2 GPFS nodes (IBM x340); see the MPI-IO sketch below
- Login node (x340), reached over 100Base-T (12.5 MB/sec)
- Networks: GigE switch (125 MB/sec, 32 lines) and Myrinet through an M3-SW16 switch (250 MB/sec, 32 lines)
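A shared parallel filesystem such as the /gpfs space above is commonly written through MPI-IO so that all ranks contribute to one file. The sketch below is a hypothetical minimal example: the path under /gpfs and the block size are my placeholders, not details from the slide.

/* Hypothetical sketch: each MPI rank writes its own contiguous block of a
 * shared file on the GPFS filesystem via MPI-IO. */
#include <mpi.h>

#define N 1024 /* doubles per rank (arbitrary) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buf[N];
    for (int i = 0; i < N; i++)
        buf[i] = rank + i * 1e-6; /* rank-specific test data */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/gpfs/scratch/example.dat", /* placeholder path */
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Rank r writes at offset r * N doubles, so blocks do not overlap. */
    MPI_Offset offset = (MPI_Offset)rank * N * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, N, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}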
12. TACC IA-64 System: 40 Compute Processors
- 20 compute nodes: 2-way SMPs, 800 MHz Intel Itanium (IBM x380), 2 GB memory/node, 32 GB local disk per node
- Filesystems: /home (23 GB), /work (140 GB), and /gpfs (size TBD) served by 2 GPFS nodes (IBM x340)
- Login node; Fast Ethernet (100Base-T, 12.5 MB/sec) and a GigE switch (125 MB/sec, 20 lines)
- Myrinet through an M3-SW16 switch (250 MB/sec, 20 lines); delivered bandwidth can be checked with a ping-pong test (see the sketch below)
- Available late spring 2002
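The GigE and Myrinet numbers in these diagrams are peak link rates; what MPI actually delivers between two nodes is usually measured with a ping-pong test like the hypothetical sketch below (the 1 MB message size and repetition count are arbitrary choices, not slide content).

/* Hypothetical sketch: measure point-to-point MPI bandwidth between two
 * nodes (e.g., over the Myrinet or GigE links above). Run with 2 ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int nbytes = 1 << 20; /* 1 MB message */
    const int reps   = 100;

    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = calloc(nbytes, 1);
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < reps; i++) {
        if (rank == 0) {
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }

    double dt = MPI_Wtime() - t0;
    if (rank == 0) /* two messages of nbytes per repetition */
        printf("Bandwidth: %.1f MB/sec\n", 2.0 * reps * nbytes / dt / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}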
13. User Feedback Session, AHM 2002
- Bill Martin
- Director, NPACI Midrange Site
- Director, Center for Advanced Computing
- University of Michigan
- March 8, 2002
14. Who we are ...
- Tom Hacker, head of Systems Support team
- Rod Mach
- Matt Britt
- Abhijit Bose, head of User Support team
- Randy Crawford
- David Woodcock
- Contributing faculty:
- Quentin Stout (EECS)
- John Volakis (EECS)
- Linda Abriola (Civil and Environmental Engineering)
15. The UM Mid-Range Site: operate and maintain HPC equipment
- 112 cpu SP2 (160 MHz) system, including a 64 cpu SP2 from SDSC; soon to be 176 nodes with the 64 cpu SP2 from SDSC (via Texas)
- 24 cpu (3 8-way nodes) Nighthawk (375 MHz) system; will add 4 interactive nodes soon
- Built and operated a 100 cpu (soon to be 128 cpu) Intel cluster (Pentium III) during the past year
- Operate mass store system (Timberwolf/Tivoli)
16. Systems support: local and distributed
- Three full-time staff
- Operate and self-maintain all IBM equipment
- Developed a joint job submission system with Texas and their SP2 for NPACI allocations
- Participate on the development team for SRB (ported to Tivoli)
- Use SRB for the Visible Human Project
17. User support and expert consultation
- Three full-time user support staff (2 PhDs, 1 MS)
- Assist in the NPACI 800 hotline (260 Remedy tickets in 2001) for all NPACI platforms, including data resources
- Work at the algorithm and numerical methods level
- Monte Carlo photon cancer treatment therapy (Y. Dewarja)
- Gene sequence alignment and optimization (R. Goldstein)
- Environmental remediation simulator (MISER) for EPA
- More demand than capacity for user support
- Absolutely critical for effective utilization of parallel systems; recall the quote by Charlie Catlett yesterday: "User support, user support, user support"
18. Workshops and Distance Training
- Developed several web-based modules for parallel computing
- Using the UM SP2 system
- Domain decomposition
- OpenMP
- Parallel Object-Oriented Programming
- Linux Clusters
- Parallel computing workshops (at Michigan)
- Fall NPACI Workshops (2x): 106 signed up, 87 attended
- Summer parallelization workshop: 42 attendees
19. Michigan and Texas collaboration has yielded an improved user interface
- Co-scheduling SP2 systems (one virtual SP2 system) with a single queue (LoadLeveler); enables load balancing between sites
- Shared file space (single AFS cell)
- Data-intensive computing infrastructure (SRB, AFS)
- Coordinated account management and accounting systems
- May be viewed as developing, testing, and deploying prototype Grid technologies in a production environment
20. New high-end cluster at Michigan ...
- 256 node AMD cluster (Athlon, 32-bit)
- 1.55 GHz, 1 GB/cpu, Myrinet 2000
- Assembled by Atipa; the first installment (100 CPUs) is now operational
- Partnering with other UM research groups to increase the size to > 500 cpu
- Will exceed 2 teraflops peak
- Allocatable NPACI resource: 2/3 of system
21. (No transcript)
22. AHM 02 - NPACI User Feedback Session
SDSC Current and Future Resources
Donald Frederick, Scientific Computing Department
858-534-5020, frederik@sdsc.edu
23. Current SDSC Resources - 2002
24. SDSC TeraGrid System: Future Resource 2003
- ANL: 1 TF, 0.25 TB memory, 25 TB disk
- Caltech: 0.5 TF, 0.4 TB memory, 86 TB disk
- NCSA: 6+2 TF, 4 TB memory, 240 TB disk
- SDSC: 4.1 TF, 2 TB memory, 225 TB SAN
- Chicago and LA DTF core switch/routers: Cisco 65xx Catalyst switch (256 Gb/s crossbar)
- External networks: vBNS, Abilene, CalREN, ESnet (OC-12 and OC-3 links)
- SDSC site detail: HPSS (300 TB), Myrinet, 1176-processor IBM SP Blue Horizon (1.7 TFLOPs), Sun server, 2 x Sun E10K
25. All TeraGrid Sites Have Focal Points
- SDSC: The TeraGrid Data Place
- Large-scale and high-performance data analysis/handling
- Every cluster node is directly attached to the SAN
26. Basic Cluster Components
- Systems: actual HW configuration not settled
- IA-64 McKinley-based IBM node is the candidate CPU
- 2-3 GB memory/CPU
- Connectivity
- Gigabit Ethernet in every node (multiple?)
- Myrinet network in every node (multiple?)
- Storage
- Local disk (> 73 GB)
- Access to large secondary and tertiary storage
- Primarily Open Source software stack
- Linux, cluster software, Grid software
- Proprietary where it makes sense (compilers, debugger, etc.)
27. TeraGrid Storage
- Storage in 4 flavors
- Local node: up to 91 GB/node (2 SCSI drives/node)
- Secondary storage at each site: 0.6 PB across the sites, locally accessible at each site
- Secondary storage from remote sites
- Metadata management requires serious effort
- Data location and replication with SRB
- Unique SDSC configuration with a dedicated Sun Starcat server
- Expected to be a major use of the WAN
- Tertiary storage at each site: locally accessible, needs to be integrated with TeraGrid