Title: Presented by: Prof Mark Baker
1Emerging Technologies and Ideas
- Presented by Prof Mark Baker
- ACET, University of Reading
- Tel: +44 118 378 8615
- E-mail: Mark.Baker_at_computer.org
- Web: http://acet.rdg.ac.uk/mab
2Overview
- Aims and Objectives of Workshop.
- Emerging Technologies
- Multi-core systems,
- Cloud-based systems,
- Virtualisation,
- Green computing.
- Embracing failure.
3Thanks
- I would like to thank everyone for attending, and Guy Robinson for helping me organise this event.
- Attendees:
- ECMWF,
- Met Office,
- Met Dept, University of Reading,
- ACET, University of Reading,
- NAG,
- Cray,
- IBM,
- Leeds,
- Oxford,
- Daresbury, STFC.
4Aims and Objectives of Workshop.
- The meteorological community has a very long history of developing and using large and complex climate and weather modelling codes.
- The codes used have been developed over many years and have required hundreds of person-years of effort to achieve the scientific quality that can be obtained today.
- The Fortran programs are based on legacy codes, and are sensitive to compilers, libraries, and the execution environment.
- Code formulation issues can create inertia against exploiting new computing technologies.
5Aims and Objectives of Workshop.
- We wish to explore the gap between emerging technologies and innovations and the ways these can be utilised by the meteorological community.
- In particular, we want to discuss and debate the codes that are used to model climate/weather, and the emerging technologies and innovations.
- Other issues relevant to the meteorological community, such as robustness and reliability, will emerge from the discussion of new emerging technologies.
6Schedule
- 10:30-11:00 - Introduction and Overview of Emerging Technologies and Ideas - MAB
- 11:00-11:20 - IFS program developments for future systems - ECMWF
- 11:20-11:50 - Using earth system models on UK academic computers: a study of generic performance issues - Met Dept.
- 11:50-12:00 - Exploring earth system model performance to anticipate change in emerging computer architectures - Met Dept.
- 12:00-12:10 - METAFOR, Climate Metadata for Climate Modelling Digital Repositories - Met Dept.
- 12:10-12:30 - If only we could forecast computer architectures as well as we can forecast the weather! - Met Office
- 12:30-13:00 - Bridging the Gap. We are on the brink of a new era if only...! - IBM
- 13:00-14:00 - Lunch
- 14:00-14:30 - What can HECToR do for me? - NAG
- 14:30-15:00 - Computational and Numerical Challenges in Climate and Environmental Modelling - ACET
- 15:00-15:10 - Advanced Scalable Algorithms for Advanced Architectures - ACET
- 15:10-15:25 - The HPC-NA Roadmap Activity - Looking into the Future of Numerical Advanced Computing - Oxford
- 15:25-16:00 - Panel Discussion Session
7Multi-core Systems
- The number, scale and type of multi-core systems available are changing the landscape of application development.
- Parallel computing used to be the preserve of a relatively small group of highly skilled programmers.
- The emergence of multi-core systems means that parallel programming is becoming a mainstream activity again!
- It is important to ensure that multi-core machines can be effectively and efficiently programmed.
- Also, there is an issue of how to program large clusters of multi-core processors where there is the need for intra- and inter-node communications.
- A key part of that strategy is the availability of high-level tools and libraries to assist both expert and novice parallel programmers to create successful programs.
8Some Issues
- Can expect 10s-100s of cores on processors in the coming years.
- Currently vendors seem to believe that multi-threaded programs will be OK for programming these systems.
- Also, vendors seem to think a shared memory programming model is fine!
- For some programs this may be OK, but coming from an HPC/parallel computing arena there are issues that threads do not address.
- The architectures of multi-core processors differ: they use different cache levels to share data.
- Consequently, will need different strategies for using different processor types.
- Need to create more efficient applications: rather than 20% efficiency, want 90% (part of the green computing revolution).
9Thoughts
- On large systems you will want to ensure data locality, so you need to have thread affinity.
- Want to effectively load-balance applications running on these systems.
- Need to go back and seriously think about concurrent programming: coarse- and fine-grain approaches!
- May want to synchronise threads (all or groups).
- Will want to optimise strategies for using cache; this may depend on 2nd or 3rd level cache!
- Want to overlap communication and computation too.
- Need to consider the limitations of the architecture when considering communications and access to the system bus.
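The point about synchronising threads (all or groups) can be illustrated with a barrier. The following is a minimal sketch using Python's standard `threading.Barrier`; the function name `run_phases` and the phase/log structure are hypothetical, chosen only for illustration, and are not taken from the talk.

```python
import threading

def run_phases(num_threads, num_phases):
    """Run num_phases steps on num_threads threads, with a barrier after each step.

    Returns a log of (phase, thread_id) pairs in the order they were recorded.
    """
    barrier = threading.Barrier(num_threads)  # all threads must arrive before any continues
    lock = threading.Lock()                   # protects the shared log
    log = []

    def worker(tid):
        for phase in range(num_phases):
            with lock:
                log.append((phase, tid))      # stand-in for the real work of this phase
            barrier.wait()                    # wait until every thread has finished the phase

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log

if __name__ == "__main__":
    for phase, tid in run_phases(num_threads=4, num_phases=3):
        print(f"phase {phase} done by thread {tid}")
```

Because no thread can start phase k+1 until all threads have reached the barrier for phase k, the recorded phases never go backwards. Synchronising only a group of threads would, under the same assumptions, use a separate barrier per group.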
10Thoughts
- Need to cope with different operating systems, ranging from UNIX to Microsoft Windows.
- May need to work with different programming paradigms: message passing (e.g. MPI), OpenMP, through to data-parallelism (e.g. UPC and Fortran), whichever best suits the application's needs!
- In the future will need to consider heterogeneous multi-core processors.
- Think about mixed precision programming: single and double precision.
- Lighter-weight threads.
- Need to schedule programs efficiently on large machines: do not want fragmented partitioning.
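To make the mixed precision bullet concrete, here is a small sketch that compares accumulating a sum entirely in single precision with keeping single-precision inputs but a double-precision accumulator. It emulates single precision with the standard `struct` module; the helper names (`to_single`, `sum_single`, `sum_mixed`) and the test data are assumptions made for this illustration only.

```python
import struct

def to_single(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_single(values):
    """Single precision throughout: round the running total after every addition."""
    total = 0.0
    for v in values:
        total = to_single(total + to_single(v))
    return total

def sum_mixed(values):
    """Mixed precision: single-precision inputs, double-precision accumulator."""
    total = 0.0
    for v in values:
        total += to_single(v)
    return total

values = [0.1] * 100_000
exact = 10_000.0
err_single = abs(sum_single(values) - exact)
err_mixed = abs(sum_mixed(values) - exact)

if __name__ == "__main__":
    print(f"single-precision accumulator error: {err_single:.6f}")
    print(f"double-precision accumulator error: {err_mixed:.6f}")
```

The inputs cost the same storage in both cases; only the accumulator differs. This is one common pattern of mixed precision, not the only one.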
11Multi-Core Some Offerings
- AMD: Opteron, Athlon 64, Turion 64, Barcelona
- ARM: MPCore (ARM9 and ARM11)
- Broadcom: SiByte
- Cradle Technologies: DSP processor
- Cavium Networks: Octeon (16 MIPS cores)
- IBM: Cell (with Sony and Toshiba), POWER4, 5, 6
- Intel: Core, Xeon
- Motorola: Freescale dual-core PowerPC
- Picochip: DSP devices (300 16-bit processor MIMD cores on one die)
- Parallax: Propeller (eight 32-bit cores)
- HP: PA-RISC
- Raza Microelectronics: XLR (eight MIPS cores)
- Stream Processors: Storm-1 (2 MIPS CPUs and DSP)
- Sun Microsystems: UltraSPARC IV
- IntellaSys: seaForth-24
12AMD and Intel Multi-core Processors
Figure: Quad-Core Clovertown block diagram (labels: Core 1 to Core 4, L2 caches, Front-Side Buses, Memory Controller, Northbridge)
- AMD: Independent L2 caches,
- Multiple memory modules,
- Communication over point-to-point HyperTransport channels.
- Intel: Shared L2 caches,
- Single memory,
- Communication over front-side buses.
13Cell/B.E. Architecture
- Power Processing Element (PPE)
- Eight Synergistic Processing Elements (SPEs)
- 4 SIMD ALUs
- DMA Engines
- 256 kB Local Storage (LS)
- System Memory: 25 GB/s
- Element Interconnect Bus (EIB): over 200 GB/s
14GPUs - GeForce 8800
- 16 threaded SMs, >128 FPUs, 367 GFLOPS, 768 MB DRAM, 86.4 GB/s memory bandwidth, 4 GB/s bandwidth to CPU
15Multi-core Issues to Solve
- Communications, synchronisation, resource management between/among cores.
- Debugging: connectivity and synchronisation.
- Distributed power management.
- Concurrent programming.
- OS virtualisation.
- Modelling and simulation.
- Load balancing.
- Algorithm partitioning.
- Performance analysis.
16Reason for the crisis
- Memory bandwidth is a major limitation with multi-core systems:
- All cores must share access to the same memory and therefore have to take turns to read from or write to it.
- Disk speed can throttle performance:
- If a disk can read only 40 MBytes/sec, then a program that needs to read a 400 MByte file will take a minimum of 10 seconds to open it, even if it has no computation to do.
- Software and scalability:
- Software today is frequently not written to take advantage of multiple cores efficiently.
- It is technically challenging to write efficient code that is free of bugs; a highly scalable program is difficult to build and requires a lot of expertise.
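The disk example above is simple arithmetic, and the sketch below works it through. It also shows how that fixed read time caps the speed-up available from extra cores. The second function rests on an assumption not stated on the slide: that the disk read is not parallelised, while the computation divides perfectly across cores. All function names are hypothetical.

```python
def min_read_seconds(file_mbytes, disk_mbytes_per_sec):
    """Lower bound on the time needed to read a file at a given disk bandwidth."""
    return file_mbytes / disk_mbytes_per_sec

def speedup_limit(compute_seconds, io_seconds, cores):
    """Speed-up on `cores` cores when only the compute part scales.

    Assumes the I/O time stays serial and the compute time divides evenly.
    """
    serial_time = compute_seconds + io_seconds
    parallel_time = compute_seconds / cores + io_seconds
    return serial_time / parallel_time

if __name__ == "__main__":
    io = min_read_seconds(400, 40)          # the slide's example: 400 MBytes at 40 MBytes/sec
    print(f"minimum read time: {io} s")
    for cores in (1, 9, 90):
        print(cores, "cores ->", round(speedup_limit(90.0, io, cores), 2), "x")
```

With 90 seconds of computation and the 10 second read, nine cores give a speed-up of 5 rather than 9 under these assumptions, and no number of cores can exceed a factor of 10.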
17There is MPI style messaging and ..
- OpenMP annotation or automatic parallelisation of existing software is a practical way to use those cores with existing code.
- As parallelism is typically not expressed precisely, one needs skill to get good performance.
- Writing in Fortran, C, C++, Java throws away information about parallelism.
- HPCS languages should be able to properly express parallelism, but we do not know how efficient and reliable compilers will be.
- High Performance Fortran failed as the language expressed only a subset of parallelism and compilers did not give predictable performance.
18There is MPI style messaging and ..
- PGAS (Partitioned Global Address Space) languages such as UPC, Co-array Fortran, Titanium, HPJava.
- One decomposes the application into parts and writes the code for each component, but uses some form of global index.
- Compiler generates synchronisation and messaging.
- The PGAS approach should work but has never been widely used, presumably because the compilers are not mature.
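The idea of decomposing the data into parts while still addressing it through a global index can be sketched as an index mapping. The code below is plain Python, not a PGAS language, and assumes a simple block distribution of a one-dimensional array; the function names `block_bounds` and `owner_of` are hypothetical. In a real PGAS language the compiler and runtime would perform this translation and generate the messaging.

```python
def block_bounds(n, parts, rank):
    """Half-open range [start, stop) of global indices owned by `rank`.

    The n elements are split into `parts` contiguous blocks; the first
    n % parts blocks receive one extra element.
    """
    base, extra = divmod(n, parts)
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return start, start + size

def owner_of(n, parts, global_index):
    """Translate a global index into (owning rank, local index)."""
    if not 0 <= global_index < n:
        raise IndexError("global index out of range")
    base, extra = divmod(n, parts)
    boundary = extra * (base + 1)            # first index owned by a smaller block
    if global_index < boundary:
        return global_index // (base + 1), global_index % (base + 1)
    offset = global_index - boundary
    return extra + offset // base, offset % base

if __name__ == "__main__":
    n, parts = 10, 4
    for rank in range(parts):
        print("rank", rank, "owns", block_bounds(n, parts, rank))
    print("global index 7 lives at", owner_of(n, parts, 7))
```

Each component's code can then be written against global indices, with the mapping deciding whether an access is local or needs communication.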
19What are Clouds?
- Clouds are Virtual Clusters.
- They may cross administrative domains or may just be a single cluster: the user cannot and does not want to know!
- Clouds support access to (lease of) computer instances.
- Instances accept data and job descriptions (code) and return results that are data and status flags.
- When does the Cloud concept work?
- Parameter searches, LHC-style data analysis, Facebook.
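The instance model described above (data and a job description go in; data and a status flag come back) suits parameter searches because each job is independent. The sketch below imitates that pattern locally, with a thread pool standing in for leased instances. It does not use any real cloud API; `run_job`, `parameter_search` and the dictionary layout are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def run_job(job):
    """Stand-in for one leased instance: run the job's code on its data.

    Returns the input data, the result, and a status flag.
    """
    try:
        value = job["code"](job["data"])
        return {"data": job["data"], "result": value, "status": "ok"}
    except Exception as exc:  # report the failure instead of crashing the whole search
        return {"data": job["data"], "result": None, "status": f"failed: {exc}"}

def parameter_search(code, parameters, instances=4):
    """Submit one independent job per parameter and collect results in input order."""
    jobs = [{"code": code, "data": p} for p in parameters]
    with ThreadPoolExecutor(max_workers=instances) as pool:
        return list(pool.map(run_job, jobs))

if __name__ == "__main__":
    for outcome in parameter_search(lambda x: 1.0 / x, [1, 2, 4, 0]):
        print(outcome)
```

Because the jobs never communicate with each other, the user does not need to know where each one ran, which is the property the slide highlights.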
20What makes a Cloud?
- Virtual Machines.
- VM Manager
- Scalability.
- File system Infrastructure.
- Remote access (portal).
- Cost?
- Fairly low cost for CPU/data movement.
- Security?
Key Parts of Cloud Definition
21Cloud Architecture
Source: S. Tai
22Some Commercial Cloud Offerings
- Problem: commercial offerings are proprietary and usually not open for cloud systems research and development.
23Cloud Computing Service Layers
Application focused:
- Services: complete business services such as PayPal, OpenID, OAuth, Google Maps, Alexa.
- Application: cloud-based software that eliminates the need for local installation, such as Google Apps, Microsoft Online.
- Development: software development platforms used to build custom cloud-based applications (PaaS and SaaS), such as SalesForce.
Infrastructure focused:
- Platform: cloud-based platforms, typically provided using virtualization, such as Amazon EC2, Sun Grid.
- Storage: data storage or cloud-based NAS, such as CTERA, iDisk, CloudNAS.
- Hosting: physical data centers such as those run by IBM, HP, NaviSite, etc.
24Technical Issues about Clouds
- Using VMs.
- No common standards: APIs often different on each system.
- Protocols: Web Services and RESTful.
- Proprietary workflow systems.
- Database systems.
- Not clear if MPI/OpenMP can be run!
- Multiple VMs on a processor.
- Code development, debugging and optimisation!
- Security.
- Pricing/costs
25Virtualization in General
- Virtual machines run in software that emulates computer hardware:
- Host machine: hardware running the virtual machine software,
- Host operating system: operating system running the virtual machine software,
- Hypervisor: slimmed-down host operating system that virtualises the physical hardware,
- Guest system: the operating system running inside the virtual machine.
- Examples of Virtual Machines:
- VMware,
- Microsoft Virtual PC and Microsoft Virtual Server,
- Parallels Workstation,
- Sun xVM,
- Kernel-based Virtual Machine (KVM),
- Xen (open source).
26Virtual Machines
- VM technology allows multiple virtual machines to
run on a single physical machine.
Figure: Applications running on guest OSs (Linux, NetBSD, Windows) above a Virtual Machine Monitor (VMM) / Hypervisor (Xen, VMware, UML, Denali, etc.), which runs on the hardware.
- Performance: paravirtualization (e.g. Xen) is very close to raw physical performance.
27Virtual Machines
- Multiple virtual machine instances on a single physical host:
- Fault tolerance,
- Isolated OS instances,
- Virtual servers.
- Use emulation to support different instruction set architectures such as Intel IA-32 and PowerPC.
- Support novel architectures.
- Support for high-level language virtual machines (Java).
28Virtualization in General
- Advantages of virtual machines
- Run operating systems where the physical hardware is unavailable.
- Easier to create new machines, backup machines, etc.
- Software testing using clean installs of operating systems and software.
- Emulate more machines than are physically available.
- Timeshare lightly loaded systems on one host.
- Debug problems (suspend and resume the problem machine).
- Easy migration of virtual machines (whether or not a shutdown is needed).
29Workload Consolidation pros/cons
- Pros
- Each application can run in a separate environment, delivering true isolation,
- Cost savings: power, space, cooling, hardware, software and management,
- Ability to run legacy applications in legacy OSs,
- Ability to run, through emulation, legacy applications on legacy HW.
- Cons:
- Disk and memory footprint increase due to multiple OSs,
- Performance penalty caused by resource sharing management.
- Workload consolidation provides the basis for most usages/benefits of virtualization.
30VMS
- VM Use
- Workload isolation,
- Workload migration,
- Workload migration for dynamic load balancing,
- Workload migration for disaster recovery,
- Control of VM resources,
- Legacy OSs
- Green computing,
31Green Computing
- Why?
- Computer energy use is often wasteful:
- Leaving the computer on when not in use (CPU and fan consume power, screen savers consume power).
- Pollution:
- Manufacturing techniques,
- Packaging,
- Disposal of computers and components.
- Toxicity:
- Toxic chemicals used in the manufacturing of computers and components, which can enter the food chain and water!
32Energy Use of PCs
- CPU uses 120 Watts.
- CRT uses 150 Watts.
- 8 hours of usage, 5 days a week = 562 kWh per year.
- If the computer is left on all the time without proper power saver modes, this can lead to 1,600 kWh per year.
- For a large institution, say a university of 40,000 students and faculty, the power bill for just computers can come to 1.5 million / year.
- Energy use comes from:
- Electrical current to run the CPU, motherboard, and memory,
- Running the fan and spinning the disk(s),
- Monitor (CRTs consume more power than any other computer component).
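The 562 figure on this slide can be reproduced with a short calculation, sketched below. It assumes the CPU (120 W) and CRT (150 W) draw their full power whenever the machine is on, and that usage runs for 52 weeks a year; the names `annual_kwh`, `PC_WATTS` and so on are hypothetical. Under the same assumptions a machine left on around the clock comes out higher than the slide's always-on figure, so that figure presumably rests on different assumptions that the slide does not state.

```python
def annual_kwh(watts, hours_per_year):
    """Energy in kilowatt-hours for a constant power draw over a number of hours."""
    return watts * hours_per_year / 1000.0

PC_WATTS = 120 + 150            # CPU plus CRT, as listed on the slide
OFFICE_HOURS = 8 * 5 * 52       # 8 hours a day, 5 days a week, 52 weeks (assumed)
ALWAYS_ON_HOURS = 24 * 365      # never switched off, full power assumed throughout

office_kwh = annual_kwh(PC_WATTS, OFFICE_HOURS)
always_on_kwh = annual_kwh(PC_WATTS, ALWAYS_ON_HOURS)

if __name__ == "__main__":
    print(f"office hours only: {office_kwh:.1f} kWh per year")
    print(f"always on:         {always_on_kwh:.1f} kWh per year")
```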
33Reducing Energy Consumption
- Turn off the computer when not in use, even if just for an hour.
- Turn off the monitor when not in use (as opposed to running a screen saver).
- Use power saver mode:
- In power saver mode, the top item is not necessary, but screen savers use as much electricity as any normal processing, and the screen saver is not necessary on a flat panel display.
- Use hardware/software with the Energy Star label:
- Energy Star is a seal of approval from the Energy Star programme of the US government (the EPA).
- Use LCDs instead of CRTs as they are more power efficient.
34Embracing Failure
- As more systems encompass ever-increasing numbers of components, even a small fault rate on individual processors will generate multiple faults across the components, stopping long-running applications in their tracks.
- Weather/Climate Modelling:
- Outlive Complexity
- Increasingly sophisticated models,
- Model coupling,
- Inter-disciplinary.
- Sustained Performance
- Increasingly complex algorithms,
- Increasingly diverse architectures,
- Increasingly demanding applications.
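The claim that a small per-component fault rate turns into frequent faults at scale follows from basic probability. The sketch below assumes that components fail independently and with the same probability during a run, which is a simplification; the function names and the example rate are hypothetical and not figures from the talk.

```python
def prob_any_failure(p_component, n_components):
    """Probability that at least one of n independent components fails during a run."""
    return 1.0 - (1.0 - p_component) ** n_components

def expected_failures(p_component, n_components):
    """Expected number of failed components during a run."""
    return p_component * n_components

if __name__ == "__main__":
    p = 0.001  # assumed chance that a single component fails during one long run
    for n in (1, 100, 1_000, 10_000):
        print(f"{n:>6} components: P(any failure) = {prob_any_failure(p, n):.4f}, "
              f"expected failures = {expected_failures(p, n):.1f}")
```

With these assumed numbers a single component almost never fails, yet a run across ten thousand components is almost certain to see at least one failure and should expect about ten.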
35Embracing Failure
- 1000s of cores,
- 2 Gbytes Memory per core,
- Caching problems,
- File systems,
- Networks,
- Operating Systems,
- Middleware (MPI/OpenMP),
- Applications algorithms.
36Key Challenges
- How do you know if there were any transient or permanent failures of the hardware or system software that invalidate the computation? (Detect)
- Can a middleware library implement the functionality of the computational workflow model, isolating the computational software from the transient and permanent failure management? (Isolate/Contain)
- In the case of permanent failures, how does one remap the computation to the remaining available resources? Does the application programmer have to do this? (Recover)
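As one possible reading of the "Recover" question, the sketch below shows a helper, of the kind a middleware layer might provide, that moves the work of permanently failed workers onto the survivors so the application code does not have to. It is only an illustration under simple assumptions (independent tasks, failures already detected); `assign`, `remap_after_failure` and the plan layout are hypothetical names, not an existing library.

```python
def assign(tasks, workers):
    """Initial round-robin assignment of tasks to workers."""
    plan = {w: [] for w in workers}
    for i, task in enumerate(tasks):
        plan[workers[i % len(workers)]].append(task)
    return plan

def remap_after_failure(plan, failed):
    """Return a new plan with the failed workers' tasks moved to the survivors.

    Each orphaned task goes to the currently least-loaded surviving worker.
    """
    survivors = [w for w in plan if w not in failed]
    if not survivors:
        raise RuntimeError("no resources left to remap onto")
    new_plan = {w: list(plan[w]) for w in survivors}
    orphaned = [t for w in plan if w in failed for t in plan[w]]
    for task in orphaned:
        target = min(survivors, key=lambda w: len(new_plan[w]))
        new_plan[target].append(task)
    return new_plan

if __name__ == "__main__":
    plan = assign(list(range(8)), ["a", "b", "c", "d"])
    print("before:", plan)
    print("after: ", remap_after_failure(plan, failed={"b"}))
```

Real codes with coupled domains would also need to move state and restart from a checkpoint; this sketch covers only the bookkeeping of who owns which piece of work.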
37Summary
- Multi-core systems: need to understand the underlying hardware, and be able to efficiently program clusters of multi-core processors.
- Cloud-based systems: becoming increasingly popular. Grid is dead!
- Flexible utility-based platform that has a range of interesting and innovative properties.
- Virtualisation: being used in Clouds, but also to provide management and control of machines.
- Green computing: becoming important, and a key feature of FP7.
- Embracing failure: machines are becoming bigger and more sophisticated, so failure is increasingly likely; must check out ways of creating reliable applications!