Technologies for the Future: CLUSTERS presentation

1
Technologies for the Future CLUSTERS

Anne C. Elster
Dept. of Computer and Information Science (IDI)
Norwegian Univ. of Science and Technology (NTNU)
Trondheim, Norway

NOTUR 2003
2
Clusters (Networks of PCs/Workstations)

Are they suitable for HPC?
Advantage
Cost-effective hardware, since they use COTS
(Commercial Off-The-Shelf) parts
BUT
Typically much slower processor interconnects
than traditional HPC systems
What about usability?

NTNU IDI's 40-node AMD 1.46 GHz cluster: 2 GB RAM,
40 GB disk, Fast Ethernet
3
Cluster Technologies: NOTUR Emerging Technology
project
Collaboration between NTNU and Univ. of Tromsø

Goal
Analyze the suitability of cluster technologies
for HPC by looking at some of the most interesting
NOTUR applications
The results will provide a foundation for
decisions regarding future HPC programs

4
Main Collaborators include

Anne C. Elster (IDI, NTNU) Project leader
Otto Anshus and Tore Larsen (CS, U of Tromsø)
Tor Johansen and staff (CC, U of Tromsø)
Torbjørn Hallgren (IDI, NTNU)
Einar Rønquist (IMF, NTNU)
Master's and Ph.D. students and postdocs at NTNU
and Univ. of Tromsø

5
General Issues to Consider

Why a cluster vs. a powerful desktop vs. large SMPs?
What are the total costs associated with clusters
(hardware, software, support, usability)?
32-bit vs. 64-bit architectures

6
Cluster Project ACTIVITIES

A.1 Profiling and Tuning Selected Applications
A.1.a/b Physics and Chemistry Codes
(Elster and students, Dept. of Computer Science,
NTNU)
A.1.2a Profiling and User-Analysis of Amber, Dalton
and Gaussian
(Tor Johansen and staff, Comp. Center, U of
Tromsø)
A.1.2b Optimization tool analysis of Dalton
(Anshus and PostDoc/student, Dept. of Comp. Sci.,
U of Tromsø)

7
Cluster Project ACTIVITIES continued

A.2 Execution Monitoring
(Anshus, Tore Larsen and students, CS, U of T)
A.3 Visualization servers, etc.
(Hallgren, Elster and students, CS, NTNU)
A.4 Impact of future numerical algorithms
(Rønquist and student, Dept. of Mathematics, NTNU)
A.5 Interface with NOTUR ET Grid Project
(Elster, Harald Simonsen and colleagues, staff and
students associated with the NOTUR ET Cluster and
Grid projects)

8
A.1.a/b Physics and Chemistry Codes (Elster and
students, Dept. of CS, NTNU)
Lessons Learned so far -- Paul Sacks work on a
Physics application (report available on the
Web)

FORTRAN problems
Different FORTRAN implementations have
non-standard add-ons (e.g. FORTRAN 90)
Leads to great difficulty in porting code to a
different platform with a different Fortran
compiler (e.g. by a different vendor)

9
A.1.a/b Physics and Chemistry Codes contin.

Performance of individual programs can vary
across machines
Åsmund Østvold wrote a project report on
porting PROTOMOL from an SMP with MPI one-sided
communication primitives (MPI put/get) to a
cluster (available on the WWW).
He also did an MS study with SCALI on various
MPI broadcast algorithms and benchmarking
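
To make "various MPI broadcast algorithms" concrete, here is a minimal sketch comparing two textbook schemes: a linear broadcast, where the root sends to every other rank in turn, and a binomial-tree broadcast, where the set of ranks holding the data doubles each round. These are illustrative choices, not necessarily the algorithms examined in the MS study, and the function names are hypothetical. No MPI library is used; the code only builds the communication schedule.

```python
def linear_broadcast_steps(num_procs):
    """Root sends to each other rank in turn: num_procs - 1 sequential sends."""
    return max(num_procs - 1, 0)


def binomial_broadcast_schedule(num_procs):
    """Return a list of rounds; each round is a list of (sender, receiver) pairs.

    Rank 0 is the root. In the round with offset `step`, every rank r < step
    already holds the data and sends it to rank r + step (if that rank exists),
    so the number of ranks holding the data doubles each round.
    """
    rounds = []
    step = 1
    while step < num_procs:
        pairs = [(r, r + step) for r in range(step) if r + step < num_procs]
        rounds.append(pairs)
        step *= 2
    return rounds
```

For a 40-node cluster this gives 39 sequential sends for the linear scheme against 6 rounds for the binomial tree, which is why the choice of broadcast algorithm matters more as per-message latency grows.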

10
A.1.a/b Physics and Chemistry Codes contin. 2

Ongoing work with Snorre Boasson and Jan Christian
Meyer on porting PIC code from Pthreads (SMP
primitives) to MPI.
Preliminary report will be available later this
week.
"Recent Trends in Cluster Computing", presented at
ParCo 2003 by Elster et al., includes hardware
trends and a survey of libraries and performance
tools.

11
A.1.2a Profiling and User-Analysis of Amber, Dalton
and Gaussian (Tor Johansen and staff, Comp.
Center, U of Tromsø)

Coordination work
Travel to NOTUR 2003
Porting and testing of Amber and Scali software

12
A.1.2b Optimization tool analysis of
Dalton (Anshus and PostDoc/students, CS, U of
Tromsø)

Performance measurements on DALTON
(Ytelsesmålinger gjort på DALTON)
A Report for the NOTUR Project Emerging
Technologies Cluster
Daniel Stødle, Otto J. Anshus, John Markus
Bjørndalen
Survey of optimizing techniques for parallel
programs running on computer clusters
Espen S. Johnsen, Otto J. Anshus, John Markus
Bjørndalen, Lars Ailo Bongo (September 29, 2003)

13
A.1.2b Optimization tool analysis of Dalton
(Anshus and PostDoc/student, IFI, U of Tromsø)
CONTINUED

RESULTS
Dalton scales fairly well: 25x speedup on 32
nodes
NOTE: Only without caching temp. With the cache,
only 3-5x speedup on 32 nodes!
Even though the 8-way cluster had no local disk
(only a network file system), the sequential
Dalton code was significantly faster.
This indicates that network bandwidth may not
be a problem if caching is used in the parallel
code.
Communication pattern: master-slave,
"bag-of-tasks" oriented program with little
communication and synchronization, and generally
good utilization of the slave nodes.
Master does relatively little work and is blocked
most of the time
Finally, checked whether the master node could be
a bottleneck, but could not detect differences in
execution time when the master was put on a slow
node vs. a fast node. NOTE: Only tested up to 32
nodes; using a larger number of nodes may limit
performance by overloading the master node.
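
The concern that a larger node count could overload the master can be illustrated with a small simulation of a bag-of-tasks run. This is a hedged sketch under simple assumptions, not a model of Dalton itself: a single master hands out tasks one at a time, each hand-out costs a fixed `dispatch_time`, and every task takes the same `task_time` on any worker. All names and parameters are hypothetical.

```python
import heapq


def simulate_bag_of_tasks(num_tasks, num_workers, task_time, dispatch_time):
    """Return the makespan when one master dispatches tasks sequentially."""
    # Min-heap of the times at which each worker next becomes idle.
    idle_at = [0.0] * num_workers
    heapq.heapify(idle_at)
    master_free = 0.0
    finish = 0.0
    for _ in range(num_tasks):
        worker_idle = heapq.heappop(idle_at)
        # A hand-out starts only when both the master and a worker are ready.
        start = max(master_free, worker_idle)
        master_free = start + dispatch_time
        done = master_free + task_time
        heapq.heappush(idle_at, done)
        finish = max(finish, done)
    return finish


def speedup(num_tasks, num_workers, task_time, dispatch_time):
    """Speedup relative to one worker running all tasks with no dispatch cost."""
    serial = num_tasks * task_time
    return serial / simulate_bag_of_tasks(
        num_tasks, num_workers, task_time, dispatch_time
    )
```

In this model the speedup can never exceed `task_time / dispatch_time`, however many workers are added. With 320 tasks, `task_time=1.0` and `dispatch_time=0.1`, going from 32 to 128 workers gives no improvement, because the master is already saturated.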

14
A.1.2b Optimization tool analysis of Dalton
(Anshus and PostDoc/student, IFI, U of Tromsø)
CONTINUED 2

Thanks to
Kenneth Ruud, Chemistry, UiT
Roy Dragseth, CC UiT, for support on the Itanium
at U of Tromsø.

15
A.2 Execution Monitoring (Anshus, Tore Larsen
and students, CS, U of T)

Survey of execution monitoring tools for
computer clusters
Espen S. Johnsen, Otto J. Anshus, John Markus
Bjørndalen, Lars Ailo Bongo, Sept 03
Performance Monitoring
Lars Ailo Bongo, Otto J. Anshus, John Markus
Bjørndalen

16
A.3 Visualization servers, etc. (Hallgren,
Elster and students, CS, NTNU)

Ongoing work with Torbjørn Vik
Preliminary report on survey of how clusters are
currently used in visualization
Two types of cluster usage
Off-line (non-real-time rendering). Often called
"rendering farms": many nodes that each
work on one frame of a larger animation.
Typically used in the film industry and other
areas where interactivity and/or real-time
rendering is not needed.
All larger 3D modelling programs, such as
Lightwave, 3DStudio and Maya, have functionality
for this.
On-line (real-time). Most interesting from a
technical viewpoint...

17
A.3 Visualization servers, etc. - Contin.

Clusters are used in interactive visualization
software to
increase performance,
enable larger data sets,
avoid limitations of local hardware.
Most visualization clusters work on the principle
that a user sits at a client machine that has
little capacity of its own. The cluster handles
all computation and sends only the finished
images to the client. The client machine also
receives input from the user and forwards it to
the cluster. Data sets for this kind of
visualization are often very large and, depending
on the situation, both polygon-based and
voxel-based rendering are used.
The main problem in making clusters usable for
interactive visualization programs is network
delay. This is addressed by reducing the time
spent transferring images between cluster and
client, either by
reducing the amount of data (compression methods)
or increasing network performance. Or both.
Parallelism within the cluster itself is based on
independence between different data. There may be
independence between different parts of the same
data set, or between different frames of a 4D
data set.
Load balancing often becomes a problem in such
settings and is an important research area. Which
load-balancing method is used is usually highly
context-dependent.
Cluster software for visualization still
lacking??
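
A rough calculation shows why image transfer between cluster and client dominates interactive use. The sketch below estimates the frame rate a link can sustain from frame size and bandwidth alone. The resolution, pixel depth and compression ratio in the usage note are assumptions chosen for illustration, and latency and protocol overhead are ignored, so the result is an upper bound.

```python
def frames_per_second(width, height, bytes_per_pixel, link_mbit_per_s,
                      compression_ratio=1.0):
    """Upper bound on frame rate when the link bandwidth is the only limit."""
    frame_bytes = width * height * bytes_per_pixel / compression_ratio
    link_bytes_per_s = link_mbit_per_s * 1_000_000 / 8
    return link_bytes_per_s / frame_bytes
```

For an uncompressed 1024x768 frame at 3 bytes per pixel over 100 Mbit/s Fast Ethernet, `frames_per_second(1024, 768, 3, 100)` gives roughly 5 frames per second. An assumed 10:1 compression ratio raises the bound tenfold, and a faster network raises it in proportion, which matches the two remedies named on the slide.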

18
A.4 Impact of future numerical algorithms
(Rønquist and student, Dept. of Mathematics, NTNU)

Rønquist's student Staff (now at Simulasenteret)
wrote a report based on his summer job
May add in experiences from Elster's group, fall
2003

19
A.5 Interface with NOTUR ET Grid
Project (Elster, Harald Simonsen and colleagues,
staff and students associated with the NOTUR ET
Cluster and Grid projects)

Test node established at NTNU
Andreas Botnen (USIT) and
Robin Holtet (IDI, now ITEA)
May use IDI's 30-40-node cluster in the test grid
Meetings
Between Elster's and Simonsen's groups
Robin Holtet and Elster's student Thorvald Natvig
to Linköping meeting this month.
Collaborations re. National GRID and EGEE
Student from NTNU and UiO at CERN

20
Main cluster issues

Global operations have a more severe impact on
cluster performance than on traditional
supercomputers, since communication between
processors takes relatively more of the total
execution time
SCALABILITY!!
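
The point about global operations can be sketched with a very simple cost model: compute time shrinks as work is divided over more processors, while each global operation (for example a reduction) costs a number of communication steps that grows with log2 of the processor count. The model and all parameter values in the usage note are assumptions for illustration, not measurements from the project.

```python
import math


def run_time(work_s, num_procs, num_global_ops, step_cost_s):
    """Return (compute_time, communication_time) in seconds under the model."""
    compute = work_s / num_procs
    if num_procs > 1:
        steps = math.ceil(math.log2(num_procs))
    else:
        steps = 0
    comm = num_global_ops * steps * step_cost_s
    return compute, comm


def comm_fraction(work_s, num_procs, num_global_ops, step_cost_s):
    """Fraction of the total run time spent in global operations."""
    compute, comm = run_time(work_s, num_procs, num_global_ops, step_cost_s)
    return comm / (compute + comm)
```

Take 100 s of work on 32 processors with 10,000 global operations. An assumed step cost of 100 microseconds (a commodity network) puts about 62% of the run time in communication; an assumed 5 microseconds (a tightly coupled interconnect) puts it at about 7%. The same code therefore scales very differently on the two machines.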

21
Lessons learned

Clusters generally have cheap hardware, but may
cause increased hidden costs regarding
More incompatible compilers, especially Fortran
90 (also C)
Some applications are non-trivial to port from a
shared-memory paradigm to a distributed-memory
paradigm
Some applications require high-bandwidth
interconnects which drive up costs (e.g. SGI
Altix)
Power and cooling costs (ref. Brian Vinter)
Stability, recovery
Overall costs and scalability should be further
studied

22
The Ideal Cluster -- Hardware

High-bandwidth network
Low-latency network
Low operating-system overhead (e.g. TCP slow
start)
Great floating-point performance
(64-bit processors or more?)
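
The remark about TCP slow start can be made concrete with an idealized sketch: the sender starts with a small congestion window and doubles it every round trip, so short messages pay several round trips before the link is fully used. The sketch assumes a loss-free path and an initial window of one segment for simplicity (real TCP stacks use other initial windows), and the function names are hypothetical.

```python
def slow_start_round_trips(num_segments, initial_window=1, max_window=None):
    """Round trips needed to send num_segments while the window doubles."""
    cwnd = initial_window
    sent = 0
    round_trips = 0
    while sent < num_segments:
        sent += cwnd
        round_trips += 1
        cwnd *= 2
        if max_window is not None:
            cwnd = min(cwnd, max_window)
    return round_trips


def fixed_window_round_trips(num_segments, window):
    """Round trips needed if the full window were available from the start."""
    return -(-num_segments // window)  # ceiling division
```

Under these assumptions a 64-segment message takes 7 round trips with slow start, against 1 round trip if a 64-segment window were open immediately. For the frequent small messages typical of parallel codes, this start-up cost is pure latency.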

23
The Ideal Cluster -- Software

Compiler that is
Portable
Optimizing
Do extra work to save communication
Self-tuning / load-balanced
Automatic selection of best algorithm
One-sided communication support?
Optimized middleware
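
"Automatic selection of best algorithm" can be sketched as empirical self-tuning: time each candidate implementation on a representative input on the target machine and keep the fastest. The code below is a minimal illustration of that idea, not a description of any particular compiler or library. The two summation routines and all names are hypothetical stand-ins for real algorithm variants.

```python
import time


def pick_fastest(candidates, sample_input, repeats=3):
    """Return the name of the candidate with the lowest best-of-N run time."""
    best_name, best_time = None, float("inf")
    for name, func in candidates.items():
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(sample_input)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


def sum_builtin(values):
    """Variant 1: rely on the built-in sum."""
    return sum(values)


def sum_loop(values):
    """Variant 2: explicit accumulation loop."""
    total = 0
    for v in values:
        total += v
    return total


CANDIDATES = {"builtin": sum_builtin, "loop": sum_loop}
```

Calling `pick_fastest(CANDIDATES, list(range(100000)))` returns the name of whichever variant ran fastest on the current machine. The winner can differ between platforms, which is the reason to select at install time or run time rather than hard-coding one choice.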

24
For more information

A dozen or more reports associated with this
project will be made available on the web at
http://www.idi.ntnu.no/elster
Email: elster_at_idi.ntnu.no
