Clusters to Supercomputers

Transcript and Presenter's Notes
1
Clusters to Supercomputers
Schenk's System Administration
April 2008
  • Matthew Woitaszek
  • University of Colorado, Boulder
  • NCAR Computer Science Section
  • mattheww@ucar.edu

Presented for Chris Schenk's CSCI 4113 Unix
System Administration
2
We're Hiring!
  • System Administrators
  • Software Developers
  • Web Technology Geeks
  • Nobody does server-side refresh anymore
  • Job positions at NCAR and CU
  • Full-time, part-time, and occasional

3
Outline
  • Motivation
  • My other computer is a
  • Parallel Computing
  • Processors
  • Networks
  • Storage
  • Software
  • Grid Computing
  • Software
  • Platforms

4
James Demmel's Reasons for HPC
  • Traditional scientific and engineering paradigm
  • Do theory or paper design
  • Perform experiments or build system
  • Replacing both by numerical experiments
  • Real phenomena are too complicated to model by
    hand
  • Real experiments are
  • too hard, e.g., build large wind tunnels
  • too expensive, e.g., build a throw-away passenger
    jet
  • too slow, e.g., wait for climate or galactic
    evolution
  • too dangerous, e.g., weapons, drug design

5
Time-Critical Simulations
Realtime computing: model and simulate to predict.
The forecast simulation has to be done before the
weather happens.
  • NCAR's time-critical HPC simulations
  • Mesoscale meteorology
  • Global climate
  • My favorite: traffic simulations
  • Require more than a single processor to complete
    in a reasonable amount of time

6
Performance: Vector vs. Parallel MM5 (1999)
J. Dorband, J. Kouatchou, J. Michalakes, and U.
Ranawake, Implementing MM5 on NASA Goddard Space
Flight Center computing systems: a performance
study, 1999.
7
Performance: POP 640x768 (2003)
POP on Xeon: memory bandwidth limit
M. Woitaszek, M. Oberg, and H. M. Tufo,
Comparing Linux Clusters for the Community
Climate Systems Model, 2003.
8
Realtime Computing by Parallelization
Minneapolis I-494 Highway Simulation (1997)
Traffic flow simulation: 15.5 miles, 17 exit
ramps, 19 entry ramps, Δt = 0.5 s, Δd = 100 ft,
T = 24 hr
9
Performance: Realtime Computing by Parallelization
  • Legacy Code and Hardware
  • Intel P133, single processor
  • 65.7 minutes (3942 seconds) simulation time
  • Massively Parallel Implementation
  • Cray T3E, 450 MHz Alpha 21164
  • 67.04 seconds with 1 PE (60x faster than the P133)
  • 6.26 seconds with 16 PEs (629x)
  • 2.39 seconds with 256 PEs (1649x)
  • Wait! If it takes about a minute on 1 PE, shouldn't
    it take 67 seconds / 16 ≈ 4.2 s on 16 PEs, or
    67 seconds / 256 ≈ 0.26 s on 256 PEs?

C. Johnston and A. Chronopoulos, The
parallelization of a highway traffic flow
simulation, 1999.
10
Speedup and Overhead
  • Amdahl's Law (see the formula below)
  • The part you don't optimize comes back to haunt
    you!
  • Speedup is limited by
  • Memory latency
  • Disk I/O bottlenecks
  • Network bandwidth and latency
  • Algorithm
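For reference, Amdahl's law in its standard form (a general statement of the law, not taken from the slides): if a fraction p of the work parallelizes perfectly across N processors and the rest is serial, then

    S(N) = \frac{1}{(1 - p) + p / N}, \qquad S(N) \to \frac{1}{1 - p} \text{ as } N \to \infty

Even a small serial fraction caps the achievable speedup, which is why the traffic simulation on 256 PEs ran about 1649x faster than the P133 but only about 28x faster than a single T3E PE.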

11
Performance: HOMME on BlueGene/L (2007)
G. Bhanot, J.M. Dennis, J. Edwards, W. Grabowski,
M. Gupta, K. Jordan, R.D. Loft, J. Sexton, A.
St-Cyr, S.J. Thomas, H.M. Tufo, T. Voran, R.
Walkup, and A.A. Wyszogrodski, "Early Experiences
with the 360TF IBM BlueGene/L Platform,"
International Journal of Computational Methods,
2006.
12
Performance: HOMME on BlueGene/L (2007)
13
Outline
  • Motivation
  • My other computer is a
  • Parallel Computing
  • Processors
  • Networks
  • Storage
  • Software
  • Grid Computing
  • Software
  • Platforms

14
Processors
Source: Gregory Pfister's In Search of Clusters
15
Parallel Architectures
A processor: a single CPU with its RAM.
16
Parallel Architectures
A cache-coherent non-uniform memory access (ccNUMA)
distributed shared memory cluster of chip-multiprocessor
(CMP) symmetric multiprocessors (SMPs).
[Diagram: two SMP nodes, each with two CPUs, per-CPU L1
caches, a shared Ln cache, and local RAM]
17
Scalable Parallel Architectures
64 racks in system
32 node cards per rack
32 chips per node card
Two chips per card
Two CPUs per chip
  • Emerging massively parallel architectures
  • IBM BlueGene/L: 65,536 chips (131,072 processors,
    two per chip)
  • Multi-core commodity architectures
  • AMD Opteron, now Intel

Source: IBM
18
Networks
  • Network types
  • Message passing (MPI)
  • File system
  • Job control
  • System monitoring
  • Technologies and Competitors
  • 1Gbps Ethernet and RDMA
  • 10Gbps Ethernet
  • Fixed Topology (3D Torus, Tree, Scali, etc.)
  • Switched (Infiniband, Myrinet)

19
Gigabit Ethernet Performance (2006)
[Chart: throughput comparison of RDMA, legacy, and motherboard NICs]
  • RDMA has the highest throughput (switched
    configuration): 110 MB/s RDMA, 66 MB/s legacy,
    45 MB/s motherboard

M. Oberg, H. M. Tufo, T. Voran, and M. Woitaszek,
Evaluation of RDMA Over Ethernet Technology for
Building Cost Effective Linux Clusters, May
2006.
20
RDMA for High-Performance Applications
  • Single network interface for all communications
  • RDMA for MPI, DAPL (Direct Access Programming
    Library), and Sockets Direct Protocol (SDP)
  • RDMA bypasses operating system kernel
  • Legacy interface for standard operating system
    TCP/IP

[Diagram: on each host, the user-space application reaches the
RDMA NIC either directly (RDMA path) or through the OS kernel
(legacy TCP/IP path)]
  • Zero-copy, interrupt-free RDMA for MPI
    applications

21
Interconnect Performance (2006)
Atoll benchmarking results vs. manufacturers' ratings
(note: 1.5 Mbps ≈ 192 KB/s and 1 Gbps = 125 MB/s)
22
10Gbps Ethernet Performance (2007)
  • Ethernet approaches 10Gbps (and can be trunked!)
  • InfiniBand (4x) reported at 8 Gbps sustainable

23
Storage
Archival Storage: Tape silo systems (3 PB)
Supercomputers and local working storage (1-100
TB per system)
Archive Management and disk cache controller
Visualization Systems and local working storage
Grid Gateway: GridFTP Servers
Shared Storage Cluster with shared file
system (100-500 TB)
24
Thousands of Disks
25
The Single Server Limitation
[Diagram: a single file server, first dedicated to one user, then
shared with others]
Aggregate bandwidth decreases with increasing
concurrent use!
26
Cluster File Systems Read Rate (2005)
27
Cluster File Systems Write Rate (2005)
28
Table of Administrator Pain and Agony
  • Original goal was to fit the file system into the
    existing environment
  • File system influences operating system stack
  • GPFS required a commercial OS and a specific
    kernel version
  • Lustre required a commercial OS and a specific
    kernel patch
  • TerraFS required a custom kernel

29
Bullet Points of Administrator Pain and Agony
  • Remain responsive even in failure conditions
  • Filesystem failure should not interrupt standard
    UNIX commands used by administrators
  • ls -la /mnt or df should not hang the console
    (see the check sketch below)
  • Zombies should respond to kill -s 9
  • Support clean normal and abnormal termination
  • Support both service start and shutdown commands
  • Provide an Emergency Stop feature
  • Cut losses and let the administrators fix things

Never hang the Linux reboot command
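To illustrate the "never hang the console" requirement, here is a minimal shell sketch an administrator might use to probe a parallel file system mount without risking a wedged session; it assumes GNU coreutils timeout is available, and the mount point /mnt/pfs is illustrative.

    # Probe a possibly-hung parallel file system mount with a time limit.
    # timeout kills df if the mount does not answer within 5 seconds,
    # so the check itself can never hang the console.
    if timeout 5 df -h /mnt/pfs > /dev/null 2>&1; then
        echo "/mnt/pfs is responsive"
    else
        echo "WARNING: /mnt/pfs is not responding" | logger -t fscheck
    fi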
30
Block-Based Access is Complicated!
[Figure: logical access size vs. filesystem block size; one file
shared for writes on four nodes with a cyclic mapping; logical
file view and physical block placement on two servers]
Consider the overhead of correlating blocks to
servers (Example: where is the first byte of the
red data stored?)
(Adapted from May, 2001, p. 79)
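To make the correlation overhead concrete, a minimal shell sketch of the arithmetic a client does for a cyclic block mapping; the offset, block size, and server count below are illustrative assumptions, not values from the figure.

    # For a file striped cyclically across NSERVERS servers in
    # BLOCKSIZE-byte blocks, find which server holds a given byte.
    OFFSET=1048576       # byte offset into the logical file (example)
    BLOCKSIZE=65536      # filesystem block size in bytes (example)
    NSERVERS=2           # number of storage servers (example)
    BLOCK=$(( OFFSET / BLOCKSIZE ))    # logical block number
    SERVER=$(( BLOCK % NSERVERS ))     # cyclic placement
    echo "byte $OFFSET is in block $BLOCK on server $SERVER"

Every request that is not aligned to the block size has to repeat this mapping for each block it touches, which is part of why block-based shared access is complicated.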
31
Blue Gene/L Single-Partition Performance (2008)
32
Blue Gene/L Storage Performance (2008)
33
Software
  • Parallel Execution
  • MPI
  • Job Control
  • Batch queues: PBS, Torque/Maui
  • Libraries
  • Optimized math routines
  • BLAS, LAPACK
  • The next slides show what we tell the users
    (a small build-and-run sketch follows)
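As a hedged sketch of how a user might build a program against these pieces before touching the queue system: the source file name is illustrative, and the exact BLAS/LAPACK link flags depend on the libraries installed at a given site.

    # Compile with the MPI wrapper and link against LAPACK/BLAS
    # (generic library names; site-specific optimized builds may differ).
    mpicc -O2 -o mymodel mymodel.c -llapack -lblas

    # Execution goes through the batch system rather than the head node;
    # see the PBS job script sketched with the next slide.
    qsub mymodel.pbs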

34
The Batch Queue System
  • Batch queues control access to compute nodes
    (a sample job script is sketched below)
  • Please don't ssh to a node and run programs
  • Please don't mpirun on the head node itself
  • People expect to have the whole node for
    performance runs!
  • Resource management
  • Flags and disables offline nodes (down or
    administrative)
  • Matches job requests to nodes
  • Reserves nodes, preventing oversubscription
  • Scheduling
  • Queue prioritization spreads CPUs among users
  • Queue limits prevent a single user from hogging
    the cluster
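For concreteness, a minimal Torque/PBS job script sketch of the kind submitted with qsub; the job name, program, and resource requests are illustrative, and friendlyq is the default queue described on the next slide.

    #!/bin/bash
    #PBS -N mymodel_test              # job name (illustrative)
    #PBS -q friendlyq                 # default queue (16 nodes, 24 hours)
    #PBS -l nodes=2:ppn=2             # 2 nodes, 2 processors per node
    #PBS -l walltime=01:00:00         # wall-clock limit

    cd $PBS_O_WORKDIR                 # PBS starts jobs in $HOME by default
    mpirun -np 4 -machinefile $PBS_NODEFILE ./mymodel

Submit it with qsub mymodel.pbs and watch it with qstat -a (slide 36).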

35
PBS Queues on CSC Systems
  • speedq: debugging queue for HPSC students,
    limited to 8 nodes, 10 minutes
  • friendlyq: default queue for friendly jobs,
    limited to 16 nodes, 24 hours
  • workq: queue for large and long-running jobs,
    no resource limit, only 1 running job
  • reservedq: queue for users with special projects
    approved by the people in charge
36
PBS Commands Queue Status
  • What jobs are running?
  • What jobs are waiting?

    matthew@hemisphere$ qstat -a
    hemisphere.cs.colorado.edu
                                                              Req'd  Req'd   Elap
    Job ID          Username Queue    Jobname    SessID NDS TSK Memory  Time S  Time
    --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
    320102.hemisphe jkihm    friendly WL_SIMP_17   7032   1   1 1024mb 24:00 R 09:52
    320103.hemisphe jkihm    friendly WL_SIMP_18   7078   1   1 1024mb 24:00 R 09:52
    320355.hemisphe jkihm    friendly WL_SIMP_17   4537   1   1 1024mb 24:00 R 08:18
    320388.hemisphe jkihm    friendly WL_SIMP_25     --   1   1 1024mb 24:00 Q    --
    320389.hemisphe jkihm    friendly WL_SIMP_25     --   1   1 1024mb 24:00 Q    --
    320390.hemisphe jkihm    friendly WL_SIMP_30     --   1   1 1024mb 24:00 Q    --
    321397.hemisphe barcelos workq    missile     21769  16  32     -- 01:12 R 00:04
37
Playing Nicely in the Cluster Sandbox
  • Security considerations
  • Don't share your account or your files (o+rw)
  • Don't put the current directory (.) in your path
  • Compute time considerations
  • Don't submit more than 10-60 jobs to PBS at a
    time
  • Don't submit from a shell script without a sleep
    1 statement (see the loop sketch below)
  • Storage space considerations
  • Keep large input and output sets in /quicksand,
    not /home
  • Don't keep large files around forever: compress
    or delete
  • Please store your personal media collections
    elsewhere

Don't use a password you have ever used anywhere
else!
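As an illustration of the sleep 1 rule, a minimal submission-loop sketch; the script names and job count are made up.

    # Submit a batch of jobs politely: pausing between qsub calls keeps
    # a tight loop from flooding the PBS server.
    for i in $(seq 1 20); do
        qsub run_case_${i}.pbs
        sleep 1
    done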
38
Outline
  • Motivation
  • My other computer is a
  • Parallel Computing
  • Processors
  • Networks
  • Storage
  • Software
  • Grid Computing
  • Software
  • Platforms

39
Sharing Computing and Data with Grids
  • Grids link computers together: more than just a
    network!
  • Networks connect computers
  • Grids allow distant computers to work on a single
    problem
  • Services look like web servers
  • HTTP for data transfer
  • XML Simple Object Access Protocol (SOAP) instead
    of HTML
  • Grid services
  • Metadata and Discovery Services (WS MDS)
  • Job execution (WS GRAM)
  • Data transfer (GridFTP; see the transfer sketch
    below)
  • Workflow management (that's what we do!)
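For a flavor of the grid tooling, a hedged example of a GridFTP transfer using the Globus Toolkit's globus-url-copy client; the host name and paths are hypothetical, and a valid proxy credential is assumed.

    # Create a short-lived proxy credential from the user's grid certificate.
    grid-proxy-init

    # Copy a dataset from a remote GridFTP server to local disk.
    globus-url-copy gsiftp://gridftp.example.edu/data/run42.nc \
        file:///home/user/run42.nc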

40
Grid-BGC Carbon Cycle Model
J. Cope, C. Hartsough, S. McCreary, P. Thornton,
H. M. Tufo, N. Wilhelmi, and M. Woitaszek,
Experiences from Simulating the Global Carbon
Cycle in a Grid Computing Environment, 2005.
41
TeraGrid: Extensible Terascale Facility
42
A National Research Priority
All figures are in millions of dollars.
  • 2000: $36M, Terascale Computing System (PSC)
  • 2001: $45M, Distributed Terascale Facility
    (NCSA, SDSC, ANL, CalTech)
  • 2002: $35M, Extensible Terascale Facility (PSC)
  • 2003: $150M, TeraGrid Extension ($10M Ops)
    (IU, Purdue, ORNL, TACC)
  • 2007: $65M, Track 2 Mid-Range HPC (ORNL, TACC,
    NCAR)
  • 2007: $208M, Track 1 Blue Waters Petascale
    (UIUC / NCSA)

http://www.nsf.gov/news/news_summ.jsp?cntn_id=109850
http://www.nsf.gov/news/news_summ.jsp?cntn_id=106875
43
A Few TeraGrid Resources
44
Challenges and Definitions
  • Power consumption
  • BlueVista: 276 kilowatts
  • Average U.S. home: 10.5 kilowatts
  • Physical space
  • What's the difference between a cluster and a
    supercomputer?
  • Price
  • Number of SMP processors in a compute node
  • Network used to connect nodes in the cluster

45
(No Transcript)
46
Cluster Administration
  • Parallel and distributed shells (see the usage
    sketch below)
  • pdsh
  • dsh
  • sudo pdsh -w node[001-027] /etc/init.d/sshd
    restart
  • Configuration file management
  • IBM CSM
  • xCAT
  • Automated operating system installation
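A small pdsh usage sketch to go with the sshd example above; dshbak ships with pdsh and is assumed to be installed, and the node range and file name are illustrative.

    # Run a command everywhere and collapse identical output, so 27
    # copies of the same line are reported only once.
    pdsh -w node[001-027] uptime | dshbak -c

    # Spot-check configuration drift: checksum a file on every node.
    pdsh -w node[001-027] md5sum /etc/ntp.conf | dshbak -c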

47
Cluster Security
  • The most important question
  • Centralized, inaccessible logging
  • Intrusion detection
  • Custom scripts (a sketch follows below)
  • Network monitoring: difficult at 10 Gbps
  • Desperate measures
  • Extreme firewalling (but don't depend on it!)
  • Virtual hosting for services
  • One-time passwords (RSA SecurID, CryptoCard)

How do you know if you've been compromised?
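In the spirit of the custom-scripts bullet, a minimal integrity-check sketch; the binary list and baseline location are illustrative, and the baseline is assumed to live on read-only or off-host media so an intruder cannot quietly regenerate it.

    # Compare checksums of security-critical binaries against a trusted
    # baseline; any difference is a strong hint of a compromise.
    md5sum /bin/login /usr/sbin/sshd /bin/ps /bin/ls > /tmp/current.md5
    diff /mnt/cdrom/baseline.md5 /tmp/current.md5 \
        || echo "WARNING: system binaries differ from baseline" | logger -t ids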
48
Questions?
Schenk's System Administration
April 2008
  • Matthew Woitaszek
  • mattheww@ucar.edu
  • Thanks to my CU and NCAR colleagues:
  • Jason Cope, John Dennis, Bobby House,
  • Rory Kelly, Dustin Leverman, Paul Marshal,
  • Michael Oberg, Henry Tufo, and Theron Voran