NPACI All Hands Meeting 2002 User Feedback Session

Transcript and Presenter's Notes
1
NPACI All Hands Meeting 2002: User Feedback Session
  • Session Chair: Dr. Jay Boisseau, TACC
  • Speakers: Dr. Kent Milfeld, TACC
  • Dr. Bill Martin, U. Michigan
  • Don Frederick, SDSC
  • Friday, March 8, 2002

2
Organization of HPC Resources at TACC
Kent Milfeld (milfeld_at_tacc.utexas.edu)
  • March 8, 2002

University of Texas at Austin
Texas Advanced Computing Center
3
TACC Resources
Allocatable Machines
  • SV1, Vector Processing
  • T3E, MPP (RISC) Processing
  • April: Regatta (Power4)

Infrastructure
  • Storage Archive (GigE machine room network)
  • IA-32/64 Linux Cluster Computing (NPACI Rocks)
  • Integrated Prog. Env. (modules, looks & feels like NPACI machine)
  • VISualization Facilities (world class)
  • Mini TeraGrid (12-mile OC-48 Network experiment
    next month)
  • Grid Team: Development, Support and Portals
  • NPACI/TACC Training & Consulting

4
NPACI-TACC Resources
  • SV1: 16 CPUs, 16 GB Memory (Vector)
  • Memory Bandwidth
  • Gather/Scatter
  • 300 MHz x 4 flops/CP = 1200 MFLOPS (see the sketch after this list)
  • OpenMP, MPI

  • T3E: 272 CPUs, 128 MB/node (RISC)
  • RISC with streams
  • Cache
  • 300 MHz x 2 flops/CP = 600 MFLOPS
  • High Speed Interconnect
  • MPI
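
The per-CPU peak figures above are simply clock rate multiplied by floating-point operations per clock period (CP). A minimal sketch of that arithmetic in Python, for illustration only; the helper function is ours, not part of any NPACI software, and the inputs are taken from the slide:

# Per-CPU theoretical peak = clock rate (MHz) x flops per clock period (CP).
def peak_mflops(clock_mhz, flops_per_cp):
    return clock_mhz * flops_per_cp

print(peak_mflops(300, 4))  # Cray SV1: 300 MHz x 4 flops/CP = 1200 MFLOPS
print(peak_mflops(300, 2))  # Cray T3E: 300 MHz x 2 flops/CP =  600 MFLOPS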

5
TACC Regatta HPC longhorn.tacc.utexas.edu
  • 64 IBM Power4 1.3 GHz Processors
  • Arranged as 4 x 16-way SMPs (now)
  • 32 GB Memory/Node (128 GB total)
  • 1 TB disk
  • 1/3 TFLOPS peak (worked through in the sketch below)
  • Early Summer: Interconnected by High Speed Switch (1-2 GB/sec point to point, theoretical)
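
The 1/3 TFLOPS peak follows from 64 processors at 1.3 GHz, assuming 4 flops per cycle per Power4 processor (two fused multiply-add units). That assumption is ours, but it is consistent with the figure quoted on the slide; a quick check in Python:

# Aggregate theoretical peak for the 64-processor Power4 Regatta described above.
# Assumes 4 flops/cycle per processor (2 FMA units x 2 flops each) -- an
# assumption consistent with the ~1/3 TFLOPS figure on the slide.
cpus = 64
clock_ghz = 1.3
flops_per_cycle = 4
peak_gflops = cpus * clock_ghz * flops_per_cycle
print(f"{peak_gflops:.1f} GFLOPS peak")  # 332.8 GFLOPS, i.e. roughly 1/3 TFLOPS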

6
TACC Regatta HPC
7
Storage Robot: Petabyte capacity
8
IBM p690 HPC Design Configuration
135 watts/die x 4 → HOT!!!
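
The exclamation follows from simple arithmetic: four Power4 dies at 135 watts each (the numbers are from the slide; grouping them as one p690 multi-chip module is our gloss). A trivial check:

# Power for 4 Power4 dies at 135 W each, as stated on the slide.
dies = 4
watts_per_die = 135
print(dies * watts_per_die, "watts")  # 540 watts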
9
TACC Visualization Lab
  • SGI Onyx2
  • 24 CPUs, 6 Infinite Reality 2 Graphics Pipelines
  • 24 GB Memory, 1TB Disk
  • Front and Rear Projection Systems
  • 3x1 cylindrically-symmetric Power Wall
  • 5x2 large-screen, 16:9 panel Power Wall

10
Power4
11
TACC IA-32 System: 64 Compute Processors (cluster diagram; aggregate figures worked out below)
  • 32 compute nodes (IBM x330): 2-way SMPs, 1 GHz Pentium III, 1 GB memory/node, 18 GB local disk per node
  • File systems: /work (20 GB), /home (20 GB), /gpfs (3/4 TB, served by 2 GPFS nodes on IBM x340s)
  • Login node reached over 100Base-T (12.5 MB/sec)
  • GigE (125 MB/sec) switch, 32 lines
  • Myrinet (250 MB/sec) through an M3-SW16 switch, 32 lines
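
Rolling up the per-node numbers in the diagram gives the cluster-wide totals. A small Python sketch, illustrative only; all inputs come from the slide:

# Aggregate figures for the TACC IA-32 cluster, from the per-node values above.
nodes = 32
cpus_per_node = 2
mem_gb_per_node = 1
disk_gb_per_node = 18
print(nodes * cpus_per_node, "compute processors")      # 64
print(nodes * mem_gb_per_node, "GB aggregate memory")   # 32
print(nodes * disk_gb_per_node, "GB local scratch")     # 576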
12
TACC IA-64 System: 40 Compute Processors (cluster diagram)
  • 20 compute nodes (IBM x380): 2-way SMPs, 800 MHz Intel Itanium, 2 GB memory/node, 32 GB local disk per node
  • File systems: /work (140 GB), /home (23 GB), /gpfs (size TBD, served by 2 GPFS nodes on IBM x340s)
  • Login node reached over 100Base-T Fast Ethernet (12.5 MB/sec)
  • GigE (125 MB/sec) switch, 20 lines
  • Myrinet (250 MB/sec) through an M3-SW16 switch, 20 lines
  • Available late spring 2002
13
User Feedback Session, AHM 2002
  • Bill Martin
  • Director, NPACI Midrange Site
  • Director, Center for Advanced Computing
  • University of Michigan
  • March 8, 2002

14
Who we are…
  • Tom Hacker, head of Systems Support team
  • Rod Mach
  • Matt Britt
  • Abhijit Bose, head of User Support team
  • Randy Crawford
  • David Woodcock
  • Contributing faculty
  • Quentin Stout (EECS)
  • John Volakis (EECS)
  • Linda Abriola (Civil and Environmental
    Engineering)

15
The UM Mid-Range Site: Operate and maintain HPC equipment
  • 112-CPU SP2 (160 MHz) system including a 64-CPU SP2 from SDSC; soon to be 176 nodes with the 64-CPU SP2 from SDSC (via Texas)
  • 24-CPU (3 x 8-way nodes) Nighthawk (375 MHz) system; will add 4 interactive nodes soon
  • Built and operated a 100-CPU (soon to be 128-CPU) Intel cluster (Pentium III) during the past year
  • Operate mass store system (Timberwolf/Tivoli)

16
Systems support: local and distributed
  • Three full time staff
  • Operate and self-maintain all IBM equipment
  • Developed a joint job submission system with Texas and their SP2 for NPACI allocations
  • Participate on development team for SRB (ported
    to Tivoli)
  • Use SRB for Visible Human Project

17
User support and expert consultation
  • Three full time user support staff (2 PhDs, 1 MS)
  • Assist with the NPACI 800 hotline (~260 Remedy tickets in 2001) for all NPACI platforms, including data resources
  • Work at algorithm and numerical methods level
  • Monte Carlo photon cancer treatment therapy (Y.
    Dewarja)
  • Gene sequence alignment and optimization (R.
    Goldstein)
  • Environmental remediation simulator (MISER) for
    EPA
  • More demand than capacity for user support
  • Absolutely critical for effective utilization of parallel systems; recall the quote by Charlie Catlett yesterday:
  • "User support, user support, user support"

18
Workshops and Distance Training
  • Developed several web-based modules for parallel
    computing
  • Using the UM SP2 system
  • Domain decomposition
  • OpenMP
  • Parallel Object-Oriented Programming
  • Linux Clusters
  • Parallel computing workshops (at Michigan)
  • Fall NPACI Workshops (2x): 106 signed up, 87 attended
  • Summer parallelization workshop: 42 attendees

19
Michigan and Texas collaboration has yielded
improved user interface
  • Co-scheduling SP2 systems (one virtual SP2 system) with a single queue (LoadLeveler) enables load balancing between sites
  • Shared file space (single AFS cell)
  • Data intensive computing infrastructure (SRB,
    AFS)
  • Coordinated account management and accounting
    systems
  • May be viewed as developing, testing, and
    deploying prototype Grid technologies in a
    production environment

20
New high-end cluster at Michigan
  • 256-node AMD cluster (Athlon, 32-bit)
  • 1.55 GHz, 1 GB/CPU, Myrinet 2000
  • Assembled by Atipa; the first installment (100 CPUs) is now operational
  • Partnering with other UM research groups to increase size to > 500 CPUs
  • Will exceed 2 teraflops peak
  • Allocable NPACI resource: 2/3 of system

21
(No Transcript)
22
AHM 02 - NPACI User Feedback Session
SDSC Current & Future Resources
Donald Frederick, Scientific Computing Department
858-534-5020, frederik_at_sdsc.edu
23
Current SDSC Resources - 2002
24
SDSC TeraGrid System: Future Resource (2003)
[Network diagram of the four DTF sites; OC line rates converted below]
  • SDSC: 4.1 TF, 2 TB memory, 225 TB SAN; HPSS (300 TB), Myrinet, 1176-processor IBM SP Blue Horizon (1.7 TFLOPs), Sun Server, 2 x Sun E10K
  • NCSA: 6.2 TF, 4 TB memory, 240 TB disk
  • ANL: 1 TF, 0.25 TB memory, 25 TB disk
  • Caltech: 0.5 TF, 0.4 TB memory, 86 TB disk
  • Chicago & LA DTF core switch/routers: Cisco 65xx Catalyst Switch (256 Gb/s crossbar)
  • External networks: vBNS, Abilene, CalREN, ESnet (OC-12 and OC-3 links)
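
For reference, the OC-n links in the diagram (and the OC-48 Mini TeraGrid link mentioned earlier) can be converted to line rates with the standard SONET relation OC-n = n x 51.84 Mb/s. This conversion is not on the slide; it is added here for context only:

# SONET line rates for the TeraGrid links above: OC-n carries n x 51.84 Mb/s.
# The 51.84 Mb/s base rate is the standard OC-1 rate, not taken from the slide.
OC1_MBPS = 51.84
for level in (3, 12, 48):
    print(f"OC-{level}: {level * OC1_MBPS:.2f} Mb/s")
# OC-3: 155.52 Mb/s, OC-12: 622.08 Mb/s, OC-48: 2488.32 Mb/s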
25
All TeraGrid Sites Have Focal Points
  • SDSC: The TeraGrid Data Place
  • Large-scale and high-performance data
    analysis/handling
  • Every Cluster Node is Directly Attached to SAN

26
Basic Cluster Components
  • Systems: actual HW configuration not settled
  • IA-64 McKinley-based IBM node is the candidate CPU
  • 2-3 GB Memory/CPU
  • Connectivity
  • Gigabit Ethernet in every node (multiple?)
  • Myrinet network in every node (multiple?)
  • Storage
  • Local Disk (> 73 GB)
  • Access to large secondary and tertiary storage
  • Primarily Open Source software stack
  • Linux, cluster software, Grid software
  • Proprietary where it makes sense (compilers,
    debugger, etc.)

27
TeraGrid Storage
  • Storage in 4 flavors
  • Local node: up to 91 GB/node (2 SCSI drives/node)
  • Secondary storage at each site: 0.6 PB across the sites, locally accessible at each site
  • Secondary storage from remote sites: metadata management requires serious effort
  • Data location and replication with SRB
  • Unique SDSC configuration with a dedicated Sun Starcat server
  • Expected to be a major use of the WAN
  • Tertiary storage at each site: locally accessible, needs to be integrated with the TeraGrid