NN Meeting, October 3 2003: Exploiting Terascale Supercomputers, Experiences from HPCx

1
NN Meeting, October 3 2003
Exploiting Terascale Supercomputers: Experiences from HPCx
  • David Henty
  • d.henty@epcc.ed.ac.uk

2
Overview
  • HPCx
    – the machine
    – the consortium
  • Usability issues
    – a brief summary
  • HPCx, EPCC and the Grid
    – current activities
  • DEISA
    – Distributed Infrastructure for Supercomputing
      Applications
    – an EC-funded pan-European Grid testbed proposal

3
What is HPCx?
  • Consortium of leading UK organisations committed
    to creating and managing the new UK HPC resource
    for the next 6 years

  • Multi-stage project to deliver a world-class
    academic computing resource, the largest in
    Europe, with ultimate peak performance of 22
    TFlop/s
  • £50M/£70M budget from EPSRC
  • Grid-enabled, a key component in the UK e-Science
    program

4
The HPCx Consortium Members
  • University of Edinburgh
  • Edinburgh Parallel Computing Centre
  • Central Laboratory of the Research Councils
    Daresbury Laboratory
  • IBM

5
University of Edinburgh
  • Lead contractor of the HPCx Consortium
  • International centre of academic excellence
  • One of the largest and most successful research
    universities in the UK
  • Partner in the National e-Science Centre (NeSC)

6
The HPCx Consortium
7
The HPCx Consortium: EPCC
8
  • Leading computer centre in Europe, bridging the
    gap between academia and industry
  • Self-funding, in existence for over 10 years
  • Provides both HPC and novel computing solutions
    to a wide range of problems and users
  • Long experience of providing national HPC
    services, including
    – Meiko Computing Surfaces
    – Thinking Machines CM200
    – Cray T3D/T3E (1994 to 2001)

9
Daresbury Laboratory
10
Daresbury Laboratory
  • A multi-disciplinary research lab with over 500
    people
  • Provides large-scale research facilities for both
    UK academic and industrial research communities
  • Daresbury hosts and maintains the hardware for
    the HPCx system

11
  • IBM will provide the technology for HPCx
  • Long-standing involvement in HPC, including the
    development of a number of ASCI machines and 5 of
    the top 10 machines in the 6/2002 TOP500 list
    – No. 2: ASCI White, Rmax 7.2 TFlop/s
    – No. 5: SP Power3 (3328 processors), Rmax 3.0 TFlop/s
    – No. 8: pSeries 690 (864 processors), Rmax 2.3 TFlop/s
  • IBM has the long-term technology road map
    essential to a 6-year project such as HPCx

12
HPCx Operational Phases
  • System will be commissioned in three main stages,
    with phase 1 covering 2002-2004
    – phase 1: December 2002, performance 3 TFlop/s Linpack
    – phase 2: June 2004, 6 TFlop/s
    – phase 3: June 2006, 12 TFlop/s
  • Focussed on capability jobs
    – using at least 50% of the CPU resource
    – target is for half of the jobs to be capability jobs

13
Usability Issues (i)
  • Note these are NOT specific to HPCx or IBM!
  • Batch systems
    – not deeply integrated with the OS
    – incompatibility between systems
    – lack of useful information to the user
  • Real-time limits
    – seem to be completely alien to UNIX
    – accounting and charging therefore done by hacks
      (see the sketch below)
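
To make the real-time-limits point concrete, here is a minimal C sketch (written for this transcript, not taken from the talk): standard UNIX accounting via getrusage() reports CPU time consumed, so a process that merely occupies its wall-clock allocation while sitting idle looks almost free, and wall-clock limits and charging have to be bolted on by the batch system.

    /* Illustrative only: UNIX accounting sees CPU time, not wall-clock time.
     * A job that sleeps on its allocation uses real (chargeable) time while
     * appearing to cost nothing. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct timeval t0, t1;
        struct rusage ru;

        gettimeofday(&t0, NULL);
        sleep(5);                        /* idle: uses wall-clock time only */
        gettimeofday(&t1, NULL);

        getrusage(RUSAGE_SELF, &ru);     /* what classic accounting reports */

        double wall = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        double cpu  = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6
                    + ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;

        printf("wall-clock %.2f s, CPU %.2f s\n", wall, cpu);  /* ~5 s vs ~0 s */
        return 0;
    }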

14
Usability (ii)
  • Operating Systems
    – written for multi-user, general-purpose systems
    – desktop users work with the OS
    – HPC users spend their whole lives fighting it
    – we liked the Cray T3D because it DIDN'T HAVE an OS!
    – modern OSs are far too relaxed and sloppy
    – e.g. runaway processes just run and run at 100% CPU
    – ... on almost all systems, a "Grim Reaper" must
      be run by hand!
    – e.g. I am running 128 processes
    – ... is it a single MPI job? multiple MPI jobs?
      mixed MPI/OpenMP?
    – spawn 100s of threads for tasks that aren't
      needed for HPC
  • I/O
    – MPI-IO is nice but I don't see people using it
      (see the sketch below)
    – usually develop bespoke solutions which don't
      port well
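
As an aside on the MPI-IO point, a minimal sketch of what the standard interface buys you: every rank writes its own block of a distributed array into one shared file with a collective call, instead of the usual bespoke one-file-per-process schemes. This is illustrative code written for this transcript, not an HPCx example; the file name and sizes are arbitrary.

    /* Minimal MPI-IO sketch: each rank writes NLOCAL doubles at its own
     * offset in a single shared file, using a collective write. */
    #include <mpi.h>

    #define NLOCAL 1000

    int main(int argc, char **argv)
    {
        int rank;
        double data[NLOCAL];
        MPI_File fh;
        MPI_Offset offset;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int i = 0; i < NLOCAL; i++)
            data[i] = rank;                          /* dummy local data */

        offset = (MPI_Offset) rank * NLOCAL * sizeof(double);

        MPI_File_open(MPI_COMM_WORLD, "output.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        MPI_File_write_at_all(fh, offset, data, NLOCAL, MPI_DOUBLE,
                              MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }

Because the write is collective and goes to a single file, the output format is independent of the number of processes, which is exactly what the hand-rolled one-file-per-process schemes fail to give you when porting.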

15
Usability (iii)
  • What about accounting?
  • Users have to buy CPU time (at least in the UK!)
    – and be charged for it
    – in a common currency
  • almost zero support for users or administrators
    to control resource allocation to projects,
    groups and users
    – can be tape, disk, CPU, memory, etc.
  • HPC centres have to develop their own software
    – we wrote an application from scratch for HPCx
      (see the sketch below)
  • If these things are hard on a parallel machine,
    just think how hard they will be on the Grid!
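
To give a flavour of the bookkeeping a centre ends up writing for itself, here is a hypothetical C sketch (the struct and function names are invented for illustration; this is not the HPCx accounting application): a per-project budget in CPU hours, debited as jobs complete, with the charge taken as processors times wall-clock hours.

    /* Hypothetical accounting sketch: charge each completed job against a
     * per-project allocation of CPU hours (processors x wall-clock hours). */
    #include <stdio.h>

    struct project {
        const char *code;          /* project or grant identifier */
        double budget_hours;       /* allocated CPU hours */
        double used_hours;         /* CPU hours charged so far */
    };

    /* Debit a finished job; returns 0 on success, -1 if the allocation
     * would be exceeded. */
    int charge_job(struct project *p, int ncpus, double wall_hours)
    {
        double cost = ncpus * wall_hours;
        if (p->used_hours + cost > p->budget_hours)
            return -1;
        p->used_hours += cost;
        return 0;
    }

    int main(void)
    {
        struct project demo = { "e01", 100000.0, 0.0 };

        if (charge_job(&demo, 1024, 6.0) == 0)       /* 1024 CPUs for 6 hours */
            printf("%s: %.0f of %.0f hours used\n",
                   demo.code, demo.used_hours, demo.budget_hours);
        return 0;
    }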

16
Grid Activities
  • HPCx
    – available over the Grid via Globus 2
    – issues due to back-end CPUs being on a private
      network
  • EPCC
    – part of the Globus Alliance along with Argonne,
      ISI and PDC
    – planning the direction of the Globus toolkit
    – many e-Science projects, collaborations with
      NeSC, etc.
  • DEISA
    – a 5-year project in the pipeline, under
      negotiation with the EC
    – 9 partners in 7 countries, requested budget
      around €14M

17
DEISA Vision
18
DEISA Overview
  • A bottom-up approach to an EU Grid
    – most of the sites have IBM hardware (a
      coincidence in time)
    – a US TeraGrid on the cheap, with little
      (initial) hardware
    – using the best available commodity software
  • Major focus is a shared file system
    – initially extending GPFS
    – also investigate other technologies (AFS, Avaki,
      ...)
  • EPCC's involvement
    – ensure HPCx is integrated
    – develop a Cosmology application demonstrator
    – develop OGSA middleware to enhance heterogeneity

19
Simulation by the OCCAM group