Irish Centre for High-End - PowerPoint PPT Presentation

Provided by: Gerard140


1
Irish Centre for High-End Computing: Bringing
capability computing to the Irish research
community
Andy Shearer, Director, ICHEC
2
Why ICHEC?
  • Ireland lags well behind the rest of Europe in
    terms of installed HEC capacity
  • See www.arcade-eu.info/academicsupercomputing/comparison.html
  • Ireland now has an installed capacity of about
    6000 Gflops/s
  • Ireland has about 15-20 systems above 64 Gflops/s
    sustained

3
Why ICHEC?
  • Ireland: fewer than 2500 processors
  • But now about 1500 Kflops/s per capita
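The per-capita figure is consistent with the installed capacity of about 6000 Gflops/s quoted on the previous slide. A minimal sketch of the arithmetic, assuming a population of roughly 4 million (an assumption for illustration, not a figure from the slides):

```python
def per_capita_kflops(installed_gflops, population):
    """Convert an installed capacity in Gflops/s to Kflops/s per person."""
    flops = installed_gflops * 1e9      # Gflops/s -> flops/s
    return flops / population / 1e3     # flops/s per person -> Kflops/s

INSTALLED_GFLOPS = 6000          # from the slides
ASSUMED_POPULATION = 4_000_000   # assumption, not from the slides

print(per_capita_kflops(INSTALLED_GFLOPS, ASSUMED_POPULATION))  # 1500.0
```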

4
High-End Computing in Ireland
DFT+U spin density. Michael Nolan (1), Dean C.
Sayle (2), Stephen C. Parker (3) and Graeme W.
Watson (1); (1) TCD, (2) Cranfield University, (3)
University of Bath
Marine Modelling Centre, NUI, Galway
G. Murphy et al., DIAS/CosmoGrid
Lloyd D. et al., Dept. of Biochemistry, TCD
5
Context and Motivation
  • Ireland's ability to perform internationally
    competitive research and to attract the best
    computational scientists is currently hindered
    due to a lack of high end computational resources
  • Ireland has the research demands to justify a
    large-scale national facility
  • The Irish Centre for High-End Computing (ICHEC)
  • SFI-funded (€2.6M), in collaboration with the
    PRTLI project CosmoGrid (€700k), plus links to
    the TCD/IITAC cluster. CosmoGrid: Grid-Enabled
    Computational Physics of Natural Phenomena.
  • Initially funded for one year, with proposal to
    extend to a minimum of three years

6
Participating Institutions
  • National University of Ireland, Galway
  • Dublin City University
  • Dublin Institute for Advanced Studies
  • National University of Ireland, Maynooth
  • Trinity College Dublin
  • Tyndall Institute
  • University College Cork
  • University College Dublin
  • HEAnet
7
ICHEC's objectives
  • Support world-class research programmes within
    Ireland
  • Undertake collaborative research programmes
  • Provide access to HEC facilities to students and
    researchers in Ireland
  • Provide HPC training
  • Increase the application of HEC and GRID
    technologies
  • Encourage and publicise activities conducive to
    establishing and sustaining a world-class HEC
    facility in Ireland
  • Foster collaboration with similar internationally
    recognised organisations

8
ICHEC's Roadmap
  • Phase I (2005): examine the challenges
    associated with the management and operations of
    a distributed national HEC centre
    → Hire key personnel, set up infrastructure and
    policies, gather user requirements
  • Phase II (2006): bring in the personnel required
    to support such a system and develop large-scale
    new science
    → Fully utilise resources, deploy points of
    presence, participate in community-led
    initiatives, explore prospects for technology
    transfer activities
  • Phase III (2007/8): ICHEC fully established as a
    national large-scale facility
9
ICHEC Hardware Resources
  • We have a diverse user group so we require
    heterogeneous resources.
  • Our tender looked for:
  • Shared Memory System (20%)
  • Cluster (60%)
  • Shared File system (20%)
  • We were also open to imaginative solutions - as
    long as they were within our budget
  • Acceptable responses from Bull, IBM,
    Clustervision, HP, Sun.
  • Other vendors did not read the tender documents
    and/or deliver on time!

And of course, if we did this again we would do it
differently, and so, I would hope, would the
manufacturers.
10
Shared memory system
  • Bull NovaScale NS6320
  • 32 Itanium 2 @ 1.5 GHz
  • 256 GB RAM
  • 2.1 TB storage
  • 192 GFlops/s peak
  • 166 GFlops/s LINPACK
  • Bull NovaScale NS4040
  • front-end functionalities
  • 4 Itanium 2 @ 1.5 GHz
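The peak figure for the shared-memory system can be reproduced from the processor count and clock rate. The sketch below assumes 4 floating-point operations per cycle per Itanium 2 processor (two fused multiply-add units); that factor is an assumption, not something stated on the slide.

```python
def peak_gflops(processors, clock_ghz, flops_per_cycle):
    """Theoretical peak: processors x clock (GHz) x flops per cycle."""
    return processors * clock_ghz * flops_per_cycle

# 32 processors at 1.5 GHz are from the slide; 4 flops/cycle is assumed.
bull_peak = peak_gflops(32, 1.5, 4)

# Ratio of the quoted LINPACK result to the theoretical peak.
linpack_gflops = 166
efficiency = linpack_gflops / bull_peak

print(f"peak = {bull_peak:.0f} GFlops/s, LINPACK efficiency = {efficiency:.1%}")
```

Under that assumption the computed peak matches the 192 GFlops/s quoted, and the LINPACK figure corresponds to roughly 86% of peak.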

11
Distributed memory cluster and filesystem
  • IBM Cluster 1350
  • 474 e326 servers
  • Dual Opterons @ 2.4 GHz each
  • 2.2 TB distributed memory
  • Mix of 410 nodes with 4 GB and 64 nodes with 8 GB
  • 20 TB storage
  • SAN: IBM DS4500 using GPFS
  • 4.55 TFlops/s peak, 2.8 TFlops/s
  • Will feature in the upper end of the Top500
  • 15 additional nodes
  • front-end functionalities and storage/cluster
    management
  • Interconnect: initially Gigabit Ethernet
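The cluster's aggregate figures follow from the node counts on this slide. A small sketch of the arithmetic, assuming 2 floating-point operations per cycle per Opteron processor (an assumption about this processor generation, not stated on the slide):

```python
from dataclasses import dataclass

@dataclass
class NodeGroup:
    count: int       # number of servers in this group
    memory_gb: int   # memory per server, in GB

# Node mix from the slide: 410 servers with 4 GB, 64 servers with 8 GB.
groups = [NodeGroup(410, 4), NodeGroup(64, 8)]

servers = sum(g.count for g in groups)
total_memory_gb = sum(g.count * g.memory_gb for g in groups)

PROCESSORS_PER_SERVER = 2   # "dual Opterons", from the slide
CLOCK_GHZ = 2.4             # from the slide
FLOPS_PER_CYCLE = 2         # assumption, not from the slide

peak_tflops = servers * PROCESSORS_PER_SERVER * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000

print(servers, total_memory_gb, round(peak_tflops, 2))
```

With these inputs the server count comes to 474, the memory to 2152 GB (about 2.2 TB) and the peak to about 4.55 TFlops/s, in line with the figures quoted.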

12
ICHEC Machine Room
  • Installation on schedule
  • Acceptance tests mainly completed
  • Transitional Service opened 1st September

13
Software
  • Establishing user requirements
  • discussions with user groups
  • Limited software budget
  • Prices for National Centres higher than for
    academic research groups
  • NB: not possible to purchase all packages on
    people's wish lists
  • We had pressure to purchase many packages with
    overlapping functionalities: need to rationalise
  • Preference given to packages with best
    scalability/performance
  • Will work with research communities to reduce
    this list

14
Software (cont.)
  • Currently shortlisted 30 packages
  • Current list
  • Operating systems: SUSE Linux Enterprise 9, Bull
    Linux (based on Red Hat Enterprise)
  • Programming environments: Intel compilers,
    Portland Group Cluster Development Kit, DDT
  • Pre-/Post-processing, data visualisation: IDL,
    Matlab, OpenDX, VTK
  • Libraries and utilities: FFTW, ScaLAPACK, PETSc,
    (Par)METIS

15
Software (cont.)
  • Engineering software: CFX, Abaqus, Marc, Patran
  • Computational chemistry / materials sciences /
    life sciences: Amber, (MPI-)BLAST, CHARMM, CPMD,
    Crystal, DL_POLY, GAMESS-UK, Gaussian 03,
    GROMACS, MOE, Molpro, NAMD, NWChem, SIESTA,
    Turbomole, VASP
  • Environmental sciences: POM
  • Expensive packages (Accelrys, etc.) not
    initially available
  • defer decision until Phase 2 budget approved

16
Scheduling Policies
  • Transitional Period, cluster
  • Day regime: weekdays, 10am-6pm
  • Night regime: weekdays, 6pm-10am
  • Week-end regime: Friday 6pm to Monday 10am
  • Tuning pending
  • Scalability studies of key applications
  • User feedback
  • Gaussian users expected to use the Production
    region / HiMem
  • Allow acceptable turn-around time on the Bull for
    FEM users
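The day, night and week-end regimes listed on this slide can be expressed as a simple lookup on a timestamp. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the week-end regime takes precedence over the night regime where the two overlap, and that each regime starts exactly on the hour given.

```python
from datetime import datetime

def scheduling_regime(ts: datetime) -> str:
    """Return 'day', 'night' or 'week-end' for a timestamp."""
    weekday = ts.weekday()   # Monday = 0 ... Sunday = 6
    hour = ts.hour

    # Week-end regime: Friday 6pm to Monday 10am (assumed to take precedence).
    if (weekday == 4 and hour >= 18) or weekday in (5, 6) or (weekday == 0 and hour < 10):
        return "week-end"
    # Day regime: weekdays, 10am to 6pm.
    if 10 <= hour < 18:
        return "day"
    # Everything else on a weekday falls in the night regime.
    return "night"

print(scheduling_regime(datetime(2005, 9, 21, 11, 0)))  # a Wednesday morning
print(scheduling_regime(datetime(2005, 9, 23, 19, 0)))  # a Friday evening
```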

17
Scheduling Policies
  • Transitional Period, shared-memory system
  • Day/night/week-end regimes same as cluster
  • Feedback from user community is important to help
    us tune policies for the start of the full
    National Service
  • Need for checkpoint/re-start capability for long
    runs
  • Fair share mechanism enforced by Maui
  • Mechanism to jump the queue for important
    deadlines
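To illustrate the general idea behind a fair-share mechanism, the sketch below ranks users by how far their recent, decayed usage falls short of a target share. It is not Maui's implementation or configuration; the function names, the decay factor and the usage numbers are all hypothetical.

```python
def decayed_usage(usage_per_window, decay):
    """Weight usage by age; usage_per_window[0] is the most recent window."""
    return sum(u * decay ** i for i, u in enumerate(usage_per_window))

def fair_share_order(history, targets, decay=0.5):
    """Order users so that those furthest below their target share come first.

    history: user -> list of CPU hours per window, most recent first
    targets: user -> target fraction of the machine
    """
    effective = {user: decayed_usage(h, decay) for user, h in history.items()}
    total = sum(effective.values()) or 1.0   # avoid division by zero
    deficit = {user: targets[user] - effective[user] / total for user in history}
    return sorted(deficit, key=deficit.get, reverse=True)

# Hypothetical usage history and equal target shares.
history = {"alice": [100, 50], "bob": [10, 200], "carol": [0, 0]}
targets = {user: 1 / 3 for user in history}
print(fair_share_order(history, targets))
```

In this example the user with no recent usage is ranked first, and older usage counts for less than recent usage.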

18
Supporting the research community
  • ICHEC will be set up as a distributed national
    centre
  • The Centre will be an equally important mix of
    Hardware and People
  • Large groups in Dublin, Galway and Cork
  • Points of presence in each of the participating
    institutions (Phase 2)
  • Support will be essential
  • Plans are to recruit applications scientists in
    scientific areas
  • engineering (FEM/CFD/LB)
  • life sciences (bioinformatics, drug design)
  • physics/chemistry (soft and hard condensed
    matter)
  • environmental sciences (geology/geophysics,
    oceanography, meteorology/climatology)
  • applied maths
  • HPC

19
Supporting the research community (cont.)
  • The face of the Centre will be the support
    scientists
  • Serve their research community, rather than
    providing free staff to individual groups
  • Code development (community-led initiative)
  • Make the best use of the infrastructure
    (parallelise/optimise)
  • Development of specialised material (course,
    documentation)
  • Technical support for Grand Challenge projects
  • Persistent communication channel with ICHEC
  • Key to our success will be a close and sustained
    collaboration with the research community

20
Training activities
  • Training is seen as a way of increasing
    performance and efficiency of the machines; our
    programme reflects the relative youth of
    Ireland's HEC activity.
  • ICHEC has developed courses - the first was
    delivered to UCD in October.
  • An introduction to High-Performance Computing
    (HPC)
  • What is HPC, HPC as a tool for furthering
    research, etc.
  • Overview of current hardware architectures
  • Overview of common programming paradigms
  • Decomposition strategies
  • An introduction to the ICHEC national service
  • Parallel programming with MPI: an introduction
  • Parallel programming with OpenMP: an introduction
  • Specialised material to be developed in 2006,
    e.g.
  • HPC for engineers, HPC in bioinformatics,
    parallel linear algebra, etc.
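As an illustration of the "decomposition strategies" topic in the course outline, here is a minimal sketch of a one-dimensional block decomposition, one common way to split work across processes. The function name and the example sizes are hypothetical and are not taken from the course material.

```python
def block_range(n_items, n_ranks, rank):
    """Return the half-open index range [start, end) owned by `rank`.

    Items are split as evenly as possible; the first `n_items % n_ranks`
    ranks each receive one extra item.
    """
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return start, start + size

# Example: 10 grid points shared among 4 processes.
for rank in range(4):
    print(rank, block_range(10, 4, rank))
```

In an MPI program each process would call such a function with its own rank to find the slice of the data it is responsible for.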

21
Now
  • Service started on 1st September (Bull) and
    21st September (IBM)
  • Have 60 approved projects plus 17 from CosmoGrid
  • 38% Physical Sciences
  • 22% Engineering
  • 25% Life Sciences
  • 7% Environmental Sciences
  • 7% Other (Maths, Computer Sciences, Humanities)
  • At least three times as many applications will be
    submitted in 3-6 months
  • Running at full capacity by the end of the year

22
Problems
  • Getting the community up to speed on
  • Batch submission
  • Moving from serial to parallel computing
  • Licences?
  • National software agreements
  • Firewalls and security
  • Having a myriad of security and firewall
    standards for HEAnet is insanity
  • DCU and TCD do not accept zip files but have
    different ways around this
  • Proxy server setup for ssh is different at each site
  • Grid access is a nightmare.
  • A common approach via HEAnet would help
  • The last mile problem
  • Our access is fine (redundant Gigabit), but the
    universities throttle this back

23
The future: enhanced hardware
  • Technology/Architecture?
  • Massive clusters bring their own problems of
    reliability
  • Large scale SMP vs tightly bound clusters of
    fat nodes
  • Novel architectures: IBM's Blue Gene, FPGAs,
    others
  • Software?
  • Massive clusters bring massive problems
  • Scalability
  • Fault tolerance: how to do a checkpoint/restart
    on 10000 nodes?
  • Licences?
  • How best to license software on a 10000-node
    cluster?
  • How to benchmark?
  • In 2006/7 we will be looking for original
    solutions to these problems
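The reliability and checkpoint/restart questions raised on this slide can be made concrete with a back-of-the-envelope estimate. The sketch below assumes independent node failures, so that system MTBF scales as node MTBF divided by node count, and uses Young's first-order approximation for the checkpoint interval. The node MTBF and checkpoint cost are hypothetical values, not figures from the slides.

```python
import math

def system_mtbf_hours(node_mtbf_hours, n_nodes):
    """Mean time between failures of the whole machine, assuming
    independent node failures at a constant rate."""
    return node_mtbf_hours / n_nodes

def young_interval_hours(checkpoint_cost_hours, mtbf_hours):
    """Young's approximation for the checkpoint interval: sqrt(2 * C * MTBF)."""
    return math.sqrt(2 * checkpoint_cost_hours * mtbf_hours)

NODE_MTBF_HOURS = 5 * 365 * 24   # hypothetical: one failure per node every 5 years
N_NODES = 10_000                 # the node count used on the slide
CHECKPOINT_COST_HOURS = 10 / 60  # hypothetical: 10 minutes to write a checkpoint

mtbf = system_mtbf_hours(NODE_MTBF_HOURS, N_NODES)
interval = young_interval_hours(CHECKPOINT_COST_HOURS, mtbf)
print(f"system MTBF = {mtbf:.2f} h, checkpoint every {interval:.2f} h")
```

Under these assumptions the machine as a whole fails every few hours, so a long run would spend a noticeable fraction of its time writing checkpoints.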