Title: 7a.1
Computational Grids
Computational Problems
- Problems that involve large amounts of computation and usually large amounts of data.
Demand for Computational Speed
- Continual demand for greater computational speed from a computer system than is currently possible.
- Areas requiring great computational speed include numerical modeling and simulation of scientific and engineering problems.
- Computations must be completed within a reasonable time period.
Grand Challenge Problems
- One that cannot be solved in a reasonable amount of time with today's computers. Obviously, an execution time of 10 years is always unreasonable.
- Examples:
- Modeling large DNA structures
- Global weather forecasting
- Modeling motion of astronomical bodies
Weather Forecasting
- The atmosphere is modeled by dividing it into 3-dimensional cells.
- The calculations for each cell are repeated many times to model the passage of time.
Global Weather Forecasting Example
- Suppose the global atmosphere is divided into cells of size 1 mile x 1 mile x 1 mile to a height of 10 miles - about 5 x 10^8 cells.
- Suppose each calculation requires 200 floating point operations. In one time step, 10^11 floating point operations are necessary.
- To forecast the weather over a 7-day period using 1-minute intervals, a computer operating at 1 Gflops (10^9 floating point operations/s) takes 10^6 seconds, or over 10 days.
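The arithmetic can be checked directly from the figures quoted on this slide; a minimal sketch in C (the figures are the slide's, the program itself is only illustrative):

```c
#include <stdio.h>

/* Back-of-the-envelope check of the slide's weather-forecasting figures. */
int main(void) {
    double cells = 5e8;             /* ~5 x 10^8 cells (1 mi x 1 mi x 1 mi, 10 mi high) */
    double flops_per_cell = 200.0;  /* floating point operations per cell per time step */
    double steps = 7.0 * 24 * 60;   /* 7 days at 1-minute intervals = 10,080 steps */
    double machine = 1e9;           /* 1 Gflops machine */

    double flops_per_step = cells * flops_per_cell;   /* ~10^11 */
    double total_flops = flops_per_step * steps;      /* ~10^15 */
    double seconds = total_flops / machine;           /* ~10^6 s */

    printf("flops per step: %.2e\n", flops_per_step);
    printf("total time    : %.2e s (%.1f days)\n", seconds, seconds / 86400.0);
    return 0;
}
```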
Modeling Motion of Astronomical Bodies
- Each body is attracted to each other body by gravitational forces. The movement of each body is predicted by calculating the total force on each body.
- With N bodies, there are N - 1 forces to calculate for each body, or roughly N^2 calculations in total.
- (N log2 N for an efficient approximate algorithm.)
- After determining the new positions of the bodies, the calculations are repeated.
- A galaxy might have, say, 10^11 stars.
- Even if each calculation were done in 1 ms (an extremely optimistic figure), it would take almost a year for one iteration using the N log2 N algorithm.
- 100 years for 100 iterations. Simulations typically require millions of iterations.
- Astrophysical N-body simulation by Scott Linssen (undergraduate UNC-Charlotte student).
High Performance Computing (HPC)
- Traditionally achieved by using multiple computers together - parallel computing.
- Simple idea! Using multiple computers (or processors) simultaneously should be able to solve the problem faster than a single computer can.
Using multiple computers or processors
- Key concept - dividing the problem into parts that can be computed simultaneously.
- Parallel programming - programming a computing platform consisting of more than one processor or computer.
- The concept is very old (about 50 years).
High Performance Computing
- Long history:
- Multiprocessor systems of various types (1950s onwards)
- Supercomputers (1960s-80s)
- Cluster computing (1990s)
- Grid computing (2000s)??
Maybe, but let's first look at how to achieve HPC.
Speedup Factor
- ts is the execution time on a single processor.
- tp is the execution time on a multiprocessor with p processors.
- S(p) gives the increase in speed obtained by using the multiprocessor (see the formula below).
- Use the best sequential algorithm for the single-processor time; the parallel algorithm is usually different.
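The formula itself appears to have been a figure in the original slide; in the slide's own notation it is presumably the standard definition:

```latex
% Speedup factor with p processors
S(p) = \frac{t_s}{t_p}
```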
Maximum Speedup
- Maximum speedup is usually p with p processors (linear speedup).
- It is possible to get superlinear speedup (greater than p), but there is usually a specific reason, such as:
- Extra memory in the multiprocessor system
- A non-deterministic algorithm
Maximum Speedup - Amdahl's Law
- The speedup factor is given by the equation below.
- This equation is known as Amdahl's law.
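The equation was shown as a figure in the original slide. Writing f for the serial (non-parallelizable) fraction of the computation, the usual form of Amdahl's law, consistent with the 1/f limit quoted on the next slide, is:

```latex
S(p) = \frac{t_s}{f\,t_s + (1 - f)\,t_s/p} = \frac{p}{1 + (p - 1)f},
\qquad \lim_{p \to \infty} S(p) = \frac{1}{f}
```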
Speedup against number of processors
- Even with an infinite number of processors, the maximum speedup is limited to 1/f.
- Example: With only 5% of the computation being serial, the maximum speedup is 20, irrespective of the number of processors.
Superlinear Speedup Example - Searching
- (a) Searching each sub-space sequentially
- (b) Searching each sub-space in parallel
- Question: What is the speedup now?
- The worst case for the sequential search is when the solution is found in the last sub-space searched. The parallel version then offers the greatest benefit, i.e. the speedup grows without bound as the parallel search time shrinks (see the sketch below).
- The least advantage for the parallel version is when the solution is found in the first sub-space of the sequential search, i.e. the speedup is 1.
- The actual speedup depends upon which sub-space holds the solution, but it could be extremely large.
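The two "i.e." equations were figures in the original slides. A plausible reconstruction, assuming each of the p sub-spaces takes ts/p to search sequentially and the solution is found a time Δt after its sub-space search begins, is:

```latex
% Solution in the last sub-space of the sequential search (greatest benefit):
S(p) = \frac{\frac{p-1}{p}\,t_s + \Delta t}{\Delta t} \;\longrightarrow\; \infty
\quad \text{as } \Delta t \to 0

% Solution in the first sub-space of the sequential search (least benefit):
S(p) = \frac{\Delta t}{\Delta t} = 1
```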
Types of Parallel Computers
- Two principal types:
- 1. A single computer containing multiple processors - main memory is shared, hence called a shared memory multiprocessor.
- 2. A multiple computer system.
Conventional Computer
- Consists of a processor executing a program stored in a (main) memory.
- Each main memory location is identified by its address within a single memory space.
Shared Memory Multiprocessor
- Extends the single-processor model - multiple processors connected to multiple memory modules.
- Each processor can access any memory module.
- Examples:
- Dual Pentiums
- Quad Pentiums
Programming Shared Memory Multiprocessors
- Threads - the programmer decomposes the program into parallel sequences (threads), each able to access variables declared outside the threads. Example: Pthreads.
- Use a sequential programming language with preprocessor compiler directives, constructs, or syntax to declare shared variables and specify parallelism. Examples: OpenMP (an industry standard), UPC (Unified Parallel C) - these need special compilers. (See the OpenMP sketch below.)
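As a concrete illustration of the compiler-directive approach, a minimal OpenMP sketch in C (not from the original slides; compile with a flag such as gcc -fopenmp):

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N], b[N], c[N];   /* shared arrays, visible to all threads */

    /* The loop iterations are divided among the threads of the team. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("ran with up to %d threads\n", omp_get_max_threads());
    return 0;
}
```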
- A parallel programming language with syntax to express parallelism; the compiler creates the executable code - not now common.
- Use a parallelizing compiler to convert regular sequential language programs into parallel executable code - also not now common.
Multiple Computers - Message-Passing Multicomputer
- Complete computers connected through an interconnection network.
Networked Computers as a Computing Platform
- Became a very attractive alternative to expensive supercomputers and parallel computer systems for high-performance computing in the 1990s.
- Several early projects. Notable examples:
- Berkeley NOW (network of workstations) project.
- NASA Beowulf project.
Key Hardware Advantages
- Very high performance workstations and PCs readily available at low cost.
- Latest processors can easily be incorporated into the system as they become available.
Programming Clusters
- Usually based upon explicit message passing.
- Common approach - a set of user-level libraries for message passing. Examples (see the MPI sketch below):
- Parallel Virtual Machine (PVM) - late 1980s. Became very popular in the mid 1990s.
- Message-Passing Interface (MPI) - standard defined in the 1990s and now dominant.
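A minimal MPI program in C, just to show the user-level library style of cluster programming (the MPI calls are standard; the program itself is illustrative):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);                 /* start MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    printf("process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```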
Beowulf Clusters
- The name given to a group of interconnected commodity computers designed to achieve high performance at low cost.
- Typically using commodity interconnects (high-speed Ethernet).
- Typically the Linux OS.
- The name Beowulf comes from the NASA Goddard Space Flight Center cluster project.
Cluster Interconnects
- Originally Fast Ethernet on low-cost clusters.
- Gigabit Ethernet - an easy upgrade path.
- More specialized/higher performance:
- Myrinet - 2.4 Gbits/sec - disadvantage: single vendor.
- InfiniBand - may become important, as InfiniBand interfaces may be integrated on next-generation PCs.
Dedicated cluster with a master node
WCU Department of Mathematics and CS leo I cluster (now dismantled)
Being replaced with Pentium IVs and Gigabit Ethernet.
Message-Passing Programming Using User-Level Message-Passing Libraries
- Two primary mechanisms are needed:
- 1. A method of creating separate processes for execution on different computers.
- 2. A method of sending and receiving messages.
Multiple Program, Multiple Data Model (MPMD)
Single Program, Multiple Data Model (SPMD)
- Different processes are merged into one program.
- Control statements select different parts for each processor to execute (see the sketch below).
- All executables are started together - static process creation.
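A minimal SPMD sketch using MPI: one executable in which control statements select the master or worker part by process rank (illustrative, not from the slides):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* master code */
        printf("master: coordinating workers\n");
    } else {
        /* worker code */
        printf("worker %d: computing my part\n", rank);
    }
    MPI_Finalize();
    return 0;
}
```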
Single Program, Multiple Data Model (SPMD)
Multiple Program, Multiple Data Model (MPMD)
- Separate programs for each processor.
- One processor executes the master process.
- Other processes are started from within the master process - dynamic process creation (see the sketch below).
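One way to express dynamic process creation is MPI-2's MPI_Comm_spawn (PVM's pvm_spawn() played a similar role historically); a hedged sketch in which the worker executable name is hypothetical:

```c
#include <mpi.h>

/* Illustrative only: "worker" is a hypothetical executable name. */
int main(int argc, char *argv[]) {
    MPI_Comm workers;
    MPI_Init(&argc, &argv);
    /* The master starts 4 copies of a separate worker program. */
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL, 0,
                   MPI_COMM_SELF, &workers, MPI_ERRCODES_IGNORE);
    MPI_Finalize();
    return 0;
}
```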
Multiple Program, Multiple Data Model (MPMD)
Point-to-Point Send and Receive Routines
Passing a message between processes using send() and recv() library calls.
Synchronous Message Passing
- Routines that return when the message transfer has completed.
- Synchronous send routine:
- Waits until the complete message can be accepted by the receiving process before sending the message.
- Synchronous receive routine:
- Waits until the message it is expecting arrives.
Synchronous send() and recv() Using a 3-Way Protocol
- Synchronous routines intrinsically perform two actions:
- They transfer data, and
- They synchronize processes.
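In MPI, the synchronous send mode is MPI_Ssend(), which does not complete until the matching receive has started; a small illustrative sketch (run with at least 2 processes):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, x = 42, y;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* Synchronous send: does not complete until the matching
           receive has started on process 1. */
        MPI_Ssend(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Waits until the expected message arrives. */
        MPI_Recv(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("received %d\n", y);
    }
    MPI_Finalize();
    return 0;
}
```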
Asynchronous Message Passing
- Routines that do not wait for their actions to complete before returning.
- More than one version exists, depending upon the semantics for returning.
- Usually require local storage for messages.
- They do not synchronize processes, allowing processes to move forward sooner. Must be used with care.
MPI Definitions of Blocking and Non-Blocking
- Blocking - routines return after their local actions complete, though the message transfer may not have been completed.
- Non-blocking - routines return immediately.
- It is assumed that the data storage is not modified by subsequent statements before it is used for the transfer; it is left to the programmer to ensure this (see the sketch below).
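A non-blocking MPI sketch illustrating the point about not modifying the send buffer too early (illustrative, not from the slides; run with at least 2 processes):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, x = 7, y;
    MPI_Request req;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Isend(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req); /* returns immediately */
        /* x must NOT be modified here, before the transfer is known to be done */
        MPI_Wait(&req, &status);   /* after this, the send buffer may be reused */
        x = 0;
    } else if (rank == 1) {
        MPI_Recv(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("received %d\n", y);
    }
    MPI_Finalize();
    return 0;
}
```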
How Message-Passing Routines Can Return Before the Transfer Has Completed
A message buffer is needed between the source and destination to hold the message.
Asynchronous (Blocking) Routines Changing to Synchronous Routines
- Buffers are only of finite length, and a point could be reached where the send routine is held up because all available buffer space is exhausted.
- Then the send routine will wait until storage becomes available again - i.e. the routine then behaves as a synchronous routine.
Message Tag
- Used to differentiate between different types of messages being sent.
- The message tag is carried within the message.
Message Tag Example
To send data x with message tag 5 from process 1 to destination process 2, and assign it to y (a sketch follows):
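The code on the original slide was a figure; in MPI the same exchange could be written roughly as follows (illustrative values; run with at least 3 processes so that ranks 1 and 2 exist):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, x = 123, y;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 1) {
        /* process 1 sends x to process 2 with message tag 5 */
        MPI_Send(&x, 1, MPI_INT, 2, 5, MPI_COMM_WORLD);
    } else if (rank == 2) {
        /* process 2 receives into y, accepting only tag 5 from process 1 */
        MPI_Recv(&y, 1, MPI_INT, 1, 5, MPI_COMM_WORLD, &status);
        printf("y = %d\n", y);
    }
    MPI_Finalize();
    return 0;
}
```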
Wild Card
- If message tag matching is not required, a wild card message tag is used.
- Then recv() will match with any send() (see the sketch below).
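In MPI the wild cards are MPI_ANY_SOURCE and MPI_ANY_TAG; a small illustrative sketch (run with at least 2 processes):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, x = 5, y;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 1) {
        MPI_Send(&x, 1, MPI_INT, 0, 99, MPI_COMM_WORLD);
    } else if (rank == 0) {
        /* Wild cards: accept a message from any source with any tag.
           The actual source and tag can be read from the status object. */
        MPI_Recv(&y, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        printf("got %d from process %d with tag %d\n",
               y, status.MPI_SOURCE, status.MPI_TAG);
    }
    MPI_Finalize();
    return 0;
}
```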
Collective Message-Passing Routines
- Routines that send message(s) to a group of processes or receive message(s) from a group of processes.
- Higher efficiency than separate point-to-point routines, although not absolutely necessary.
Broadcast
Sending the same message to all processes concerned with the problem (see the sketch below).
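In MPI, broadcast is MPI_Bcast(); a minimal illustrative sketch:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, data = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) data = 99;                 /* the root sets the value */
    /* After the call, every process has the root's value of data. */
    MPI_Bcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("process %d has data = %d\n", rank, data);
    MPI_Finalize();
    return 0;
}
```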
Scatter
Sending each element of an array in the root process to a separate process. The contents of the ith location of the array are sent to the ith process.
Gather
Having one process collect individual values from a set of processes.
Reduce
A gather operation combined with an arithmetic/logical operation. Example: values gathered and added together (a combined sketch follows).
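The corresponding MPI routines are MPI_Scatter(), MPI_Gather(), and MPI_Reduce(); a combined illustrative sketch, assuming exactly 4 processes (mpirun -np 4):

```c
#include <mpi.h>
#include <stdio.h>

#define P 4   /* assumed number of processes */

int main(int argc, char *argv[]) {
    int rank, root_array[P], piece, gathered[P], sum;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        for (int i = 0; i < P; i++) root_array[i] = i + 1;

    /* Scatter: element i of the root's array goes to process i. */
    MPI_Scatter(root_array, 1, MPI_INT, &piece, 1, MPI_INT, 0, MPI_COMM_WORLD);

    piece *= 2;   /* some local work on each process */

    /* Gather: the root collects one value from each process. */
    MPI_Gather(&piece, 1, MPI_INT, gathered, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Reduce: values gathered and added together at the root. */
    MPI_Reduce(&piece, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("sum of doubled values = %d\n", sum);
    MPI_Finalize();
    return 0;
}
```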
Grid Computing
- A grid is a form of multiple computer system.
- For solving computational problems, it could be viewed as the next step after cluster computing, with the same programming techniques used.
Why is this not necessarily true?
- Communication is VERY expensive - sending data across the network costs millions of cycles.
- Bandwidth is shared with other users.
Computational Strategies
- As a computing platform, a grid favors situations with the absolute minimum of communication between computers.
- The next class will look at these strategies and the details of MPI programming.