Parallel Computers

1
Parallel Computers
CS147 Lecture 18

Prof. Sin-Min Lee
Department of Computer Science

2
Uniprocessor Systems

Improve performance by
Allowing multiple, simultaneous memory accesses
- requires multiple address, data, and control buses
(one set for each simultaneous memory access)
- the memory chip has to be able to handle multiple
transfers simultaneously

3
Uniprocessor Systems

Multiport Memory
Has two sets of address, data, and control pins
to allow simultaneous data transfers to occur
CPU and DMA controller can transfer data
concurrently
A system with more than one CPU could handle
simultaneous requests from two different
processors

4
Uniprocessor Systems

Multiport Memory (cont.)

Can:
Multiport memory can handle two requests to read
data from the same location at the same time

Cannot:
Process two simultaneous requests to write data
to the same memory location
- or simultaneous requests to read from and write to
the same memory location

5
Multiprocessors
[Figure: CPU, memory, I/O port, and device attached to a common bus]
7
Multiprocessors

Systems designed to have 2 to 8 CPUs
The CPUs all share the other parts of the
computer
Memory
Disk
System Bus
etc
CPUs communicate via Memory and the System Bus

8
Multiprocessors

Each CPU shares memory, disks, etc.
Cheaper than clusters
Lower performance than clusters
Often used for
Small Servers
High-end Workstations

9
Multiprocessors

OS automatically shares work among available CPUs
On a workstation
One CPU can be running an engineering design
program
Another CPU can be doing complex graphics
formatting

10
Applications of Parallel Computers

Traditionally: government labs, numerically
intensive applications
Research institutions
Recent growth in industrial applications
236 of the top 500 supercomputers
Financial analysis, drug design and analysis, oil
exploration, aerospace and automotive

11
Multiprocessor Systems: Flynn's Classification

Single instruction multiple data (SIMD)

[Figure: SIMD organization. A single control unit with main memory drives several processor/memory pairs linked by a communications network]

Executes a single instruction on multiple data
values simultaneously using many processors
Since only one instruction is processed at any
given time, it is not necessary for each
processor to fetch and decode the instruction
This task is handled by a single control unit
that sends the control signals to each processor.
Example: array processor

12
Why Multiprocessors?

Microprocessors are now the fastest CPUs
Collecting several is much easier than redesigning one
Complexity of current microprocessors
Do we have enough ideas to sustain 1.5X/yr?
Can we deliver such complexity on schedule?
Slow (but steady) improvement in parallel
software (scientific apps, databases, OS)
Emergence of embedded and server markets driving
microprocessors in addition to desktops
Embedded: functional parallelism,
producer/consumer model
Server: figure of merit is tasks per hour vs.
latency

13
Parallel Processing Intro

Long-term goal of the field: scale the number of
processors to the size of the budget and the desired performance
Machines today: Sun Enterprise 10000 (8/00)
64 400 MHz UltraSPARC II CPUs, 64 GB SDRAM
memory, 868 18GB disks, tape
$4,720,800 total
64 CPUs 15%, 64 GB DRAM 11%, disks 55%, cabinet
16% ($10,800 per processor, or about 0.2% per
processor)
Minimal E10K: 1 CPU, 1 GB DRAM, 0 disks, tape,
about $286,700
$10,800 (4%) per CPU, plus $39,600 board/4 CPUs
(about 8%/CPU)
Machines today: Dell Workstation 220 (2/01)
866 MHz Intel Pentium III (in Minitower)
0.125 GB RDRAM memory, 1 10GB disk, 12X CD, 17"
monitor, nVIDIA GeForce 2 GTS, 32MB DDR graphics
card, 1yr service
About $1,600; for an extra processor, add $350 (about 20%)

14
Major MIMD Styles

Centralized shared memory ("Uniform Memory
Access" time or "Shared Memory Processor")
Decentralized memory (memory module with CPU)
Gets more memory bandwidth and lower memory latency
Drawback: longer communication latency
Drawback: software model more complex

16
Organization of Multiprocessor Systems

Three different ways to organize/classify
systems

Flynn's Classification

System Topologies

MIMD System Architectures

17
Multiprocessor Systems: Flynn's Classification

Flynn's Classification
Based on the flow of instructions and data
processing
A computer is classified by
- whether it processes a single instruction at a
time or multiple instructions simultaneously
- whether it operates on one or multiple data
sets

18
Multiprocessor Systems: Flynn's Classification

Four Categories of Flynn's Classification
SISD: Single instruction single data
SIMD: Single instruction multiple data
MISD: Multiple instruction single data
MIMD: Multiple instruction multiple data
The MISD classification is not practical to
implement.
In fact, no significant MISD computers have ever
been built.
It is included only for completeness.

19
Since the earliest days of computing, computer scientists
have been challenging computers with larger and
larger problems. Eventually, computer processors
were combined to work in parallel on the
same task. This is parallel processing.
Types of Parallel Processing
SISD: Single Instruction stream, Single Data stream
MISD: Multiple Instruction stream, Single Data stream
SIMD: Single Instruction stream, Multiple Data stream
MIMD: Multiple Instruction stream, Multiple Data stream
20
SISD
One piece of data at a time is sent to one processor.
Ex: To multiply one hundred numbers by the number
three, each number would be sent to the processor and
multiplied in turn, until all one hundred results were calculated.
21
MISD
A single stream of data is sent to many
processors, each performing a different operation on it.
[Figure: one data stream feeding several CPUs that each perform a search]
Ex: The same database records are streamed to several
different processors, each of which searches the
records for a different key.
22
SIMD
Multiple processors execute the same instruction
on separate data.
Ex: A SIMD machine with 100 processors could
multiply 100 numbers, each by the number three,
at the same time.
23
MIMD
Multiple processors execute different instructions
on separate data.
[Figure: four CPUs, each with its own data, performing multiply, search, add, and subtract respectively]
This is the most complex form of parallel
processing. It is used for complex simulations,
such as modeling the growth of cities.
24
The Granddaddy of Parallel Processing
MIMD
25
MIMD computers usually have a different program
running on every processor. This makes for a
very complex programming environment.
What's doing what, when?
What processor? Doing which task? At what time?
26
Memory latency
The time between issuing a memory fetch and
receiving the response.
Simply put, if execution proceeds before the
memory request responds, unexpected results will
occur. What values are being used? Not the
ones requested!
27
A similar problem can occur with instruction
executions themselves.
Synchronization: the need to enforce the ordering
of instruction executions according to their data
dependencies.
Instruction b must occur before instruction a.
28
Despite potential problems, MIMD can prove larger
than life.
MIMD Successes
IBM's Deep Blue computer beats a professional chess
player.
Some may not consider this to be a fair example,
because Deep Blue was built to beat Kasparov
alone. It knew his play style, so it could
counter his projected moves. Still, Deep Blue's
win marked a major victory for computing.
29
IBM's latest: a supercomputer that models nuclear
explosions.
IBM Poughkeepsie built the world's fastest
supercomputer for the U.S. Department of Energy.
Its job was to model nuclear explosions.
30
MIMD: it's the most complex, fastest, and most flexible
parallel paradigm. It has beaten a world-class chess
player at his own game. It models things that
few people understand. It is parallel processing
at its finest.
31
Multiprocessor Systems: Flynn's Classification

Single instruction single data (SISD)
Consists of a single CPU executing individual
instructions on individual data values

32
Multiprocessor Systems: Flynn's Classification

Multiple instruction multiple data (MIMD)
Executes different instructions simultaneously
Each processor must include its own control unit
The processors can be assigned to parts of the
same task or to completely separate tasks
Example: multiprocessors, multicomputers

33
Popular Flynn Categories

SISD (Single Instruction Single Data)
Uniprocessors
MISD (Multiple Instruction Single Data)
??? multiple processors on a single data stream
SIMD (Single Instruction Multiple Data)
Examples: Illiac-IV, CM-2
Simple programming model
Low overhead
Flexibility
All custom integrated circuits
(Phrase reused by Intel marketing for media
instructions, i.e., vector)
MIMD (Multiple Instruction Multiple Data)
Examples: Sun Enterprise 5000, Cray T3D, SGI
Origin
Flexible
Use off-the-shelf micros
MIMD is the current winner: concentrate on the major design
emphasis, < 128 processor MIMD machines

34
Multiprocessor Systems

System Topologies
The topology of a multiprocessor system refers to
the pattern of connections between its processors
Quantified by standard metrics:
Diameter: the maximum distance (number of links
traversed) between two processors in the computer system
Bandwidth: the capacity of a communications link
multiplied by the number of such links in
the system (best case)
Bisectional bandwidth: the total bandwidth of the
links connecting the two halves of the
processors, split so that the number of
links between the two halves is
minimized (worst case)

35
Multiprocessor Systems: System Topologies

Six Categories of System Topologies

Shared bus
Ring
Tree

Mesh
Hypercube
Completely Connected

36
(No Transcript)
37
Multiprocessor Systems: System Topologies

Shared bus
The simplest topology
Processors communicate with each other
exclusively via this bus
Can handle only one data transmission at a time
Can be easily expanded by connecting additional
processors to the shared bus, along with the
necessary bus arbitration circuitry

[Figure: processors (P) and memory modules (M) attached to a single shared bus, with global memory]
41
Multiprocessor Systems: System Topologies

Ring
Uses direct dedicated connections between
processors
Allows all communication links to be active
simultaneously
A piece of data may have to travel through
several processors to reach its final destination
All processors must have two communication links

[Figure: six processors (P) connected in a ring]
42
Multiprocessor Systems: System Topologies

Tree topology
Uses direct connections between processors
Each processor has at most three connections (one to
its parent and two to its children)
Its primary advantage is its relatively low
diameter
Example: DADO computer

[Figure: seven processors (P) arranged as a binary tree]
46
Multiprocessor Systems: System Topologies

Mesh topology
Every processor connects to the processors above,
below, left, and right
Left to right and top to bottom wraparound
connections may or may not be present

[Figure: nine processors (P) arranged in a 3 x 3 mesh]
49
Multiprocessor Systems: System Topologies

Hypercube
Multidimensional mesh
Has n processors, each with log2 n connections

52
Multiprocessor Systems: System Topologies

Completely Connected

Every processor has n-1
connections, one to each
of the other processors
The complexity of the
processors increases as
the system grows
Offers maximum
communication capabilities

53
Architecture Details

Computers → MPPs

World's simplest computer (processor/memory)
Standard computer (add cache, disk)
Network
54
A Supercomputer at $5.2 Million
Virginia Tech's 1,100-node Mac G5 supercomputer
55
The Virginia Polytechnic Institute and State
University has built a supercomputer composed of
a cluster of 1,100 dual-processor Macintosh G5
computers. Based on preliminary benchmarks, Big
Mac is capable of 8.1 teraflops (trillion
floating-point operations per second). The Mac
supercomputer is still being fine-tuned, and the
full extent of its computing power will not be
known until November. But the 8.1-teraflops
figure would make Big Mac the world's fourth
fastest supercomputer.
56
Big Mac's cost relative to similar machines is as
noteworthy as its performance. The Apple
supercomputer was constructed for just over US$5
million, and the cluster was assembled in about
four weeks. In contrast, the world's leading
supercomputers cost well over $100 million to
build and require several years to construct. The
Earth Simulator, which clocked in at 38.5
teraflops in 2002, reportedly cost up to $250
million.
57
October 28, 2003. Time: 7:30 pm to 9:00 pm. Location: Santa Clara Ballroom
Srinidhi Varadarajan, Ph.D.
Dr. Srinidhi Varadarajan is an Assistant Professor of Computer
Science at Virginia Tech. He was honored with the
NSF Career Award in 2002 for "Weaving a Code
Tapestry: A Compiler Directed Framework for
Scalable Network Emulation." He has focused his
research on building a distributed network
emulation system that can scale to emulate
hundreds of thousands of virtual nodes.
58
Parallel Computers

Two common types
Cluster
Multi-Processor

59
Cluster Computers
60
Clusters on the Rise: Using clusters of small
machines to build a supercomputer is not a new
concept. Another of the world's top machines,
housed at the Lawrence Livermore National
Laboratory, was constructed from 2,304 Xeon
processors. The machine was built by Utah-based
Linux Networx. Clustering technology has meant
that traditional big-iron leaders like Cray
(Nasdaq: CRAY) and IBM have new competition
from makers of smaller machines. Dell (Nasdaq:
DELL), among other companies, has sold
high-powered computing clusters to research
institutions.
61
Cluster Computers

Each computer in a cluster is a complete computer
by itself
CPU
Memory
Disk
etc
Computers communicate with each other via some
interconnection bus

62
Cluster Computers

Typically used where one computer does not have
enough capacity to do the expected work
Large Servers
Cheaper than building one GIANT computer

63
Although not new, supercomputing clustering
technology is still impressive. It works by
farming out chunks of data to individual
machines, and clustering works better for
some types of computing problems than others.
For example, a cluster would not be ideal to
compete against IBM's Deep Blue supercomputer in
a chess match; in this case, all the data must be
available to one processor at the same moment,
and the machine operates much the same way the
human brain handles tasks. However, a cluster
would be ideal for the processing of seismic data
for oil exploration, because that computing job
can be divided into many smaller tasks.
64
Cluster Computers

Need to break up work among the computers in the
cluster
Example: Microsoft.com search engine
6 computers running SQL Server
Each has a copy of the MS Knowledge Base
Search requests come to one computer
Sends request to one of the 6
Attempts to keep all 6 busy

65
The Virginia Tech Mac supercomputer should be
fully functional and in use by January 2004. It
will be used for research into nanoscale
electronics, quantum chemistry, computational
chemistry, aerodynamics, molecular statics,
computational acoustics and the molecular
modeling of proteins.
66
Specialized Processors

Vector Processors
Massively Parallel Computers

67
Vector Processors
for (i = 0; i < n; i++) array1[i] = array2[i] + array3[i];
This is an array (vector) operation
68
Vector Processors

Special instructions to operate on vectors
(arrays)
Vector instruction specifies
Starting addresses of all 3 arrays
Loop count
Saves For Loop overhead
Can more efficiently access memory
Also Known as SIMD Computers
Single Instruction Multiple Data

69
Vector Processors

Until the 1990s, the world's fastest
supercomputers were implemented as vector
processors
Now, Vector Processors are typically special
peripheral devices that can be installed on a
regular computer

70
Massively Parallel Computers