Title: Introduction to Parallel Processing
- Debbie Hui
- CS 147 Prof. Sin-Min Lee
- 7 / 11 / 2001
Parallel Processing
- Parallelism in Uniprocessor Systems
- Organization of Multiprocessor Systems
Parallelism in Uniprocessor Systems
- A computer achieves parallelism when it performs two or more unrelated tasks simultaneously
Uniprocessor Systems
- A uniprocessor system may incorporate parallelism using
  - an instruction pipeline
  - a fixed or reconfigurable arithmetic pipeline
  - I/O processors
  - vector arithmetic units
  - multiport memory
Uniprocessor Systems
- Instruction pipeline
  - Overlaps the fetching, decoding, and execution of instructions
  - Allows the CPU to complete, ideally, one instruction per clock cycle
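The following minimal Python sketch makes the per-cycle claim concrete by counting clock cycles with and without overlap. It assumes an idealized pipeline in which every stage takes one cycle and there are no stalls, hazards, or branches. The function names are hypothetical.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    # Without overlap, each instruction goes through every stage before the
    # next one starts, so each instruction costs n_stages cycles.
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    # With overlap, the first instruction needs n_stages cycles to fill the
    # pipeline; after that, one instruction completes every cycle
    # (idealized: no stalls, hazards, or branches).
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


# Hypothetical 3-stage pipeline (fetch, decode, execute) running 100 instructions.
print(cycles_unpipelined(100, 3))  # 300
print(cycles_pipelined(100, 3))    # 102
```

With 100 instructions, the pipelined count approaches one instruction per cycle once the pipeline is full. Real pipelines fall short of this whenever a stall occurs.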
Uniprocessor Systems
- Reconfigurable arithmetic pipeline
  - Better suited for general-purpose computing than a fixed arithmetic pipeline
  - Each stage has a multiplexer at its input
  - The control unit of the CPU sets the multiplexers' select signals to configure the pipeline
- Problem: although arithmetic pipelines can perform many iterations of the same operation in parallel, they cannot perform different operations simultaneously
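A small Python model can illustrate the role of the multiplexers. Everything in it is a hypothetical simplification: the `ReconfigurablePipeline` class, the `add`/`sub`/`mul`/`pass` choices, and the single constant operand per stage. It captures only how select signals decide what each stage computes. It does not model the cycle-by-cycle overlap of a real pipeline.

```python
from typing import Callable, Dict, List

# Operations each stage's (hypothetical) multiplexer can select between.
OPERATIONS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda x, k: x + k,
    "sub": lambda x, k: x - k,
    "mul": lambda x, k: x * k,
    "pass": lambda x, k: x,  # stage effectively bypassed
}


class ReconfigurablePipeline:
    def __init__(self, n_stages: int) -> None:
        # Select signal and constant operand for each stage; all stages pass by default.
        self.selects: List[str] = ["pass"] * n_stages
        self.operands: List[float] = [0.0] * n_stages

    def configure(self, stage: int, select: str, operand: float) -> None:
        # Plays the role of the control unit setting one multiplexer's select signal.
        if select not in OPERATIONS:
            raise ValueError(f"unknown operation: {select}")
        self.selects[stage] = select
        self.operands[stage] = operand

    def run(self, values: List[float]) -> List[float]:
        # Every value flows through the same configured sequence of stages,
        # so the pipeline repeats one fixed computation on a stream of data.
        results = []
        for x in values:
            for select, k in zip(self.selects, self.operands):
                x = OPERATIONS[select](x, k)
            results.append(x)
        return results


pipe = ReconfigurablePipeline(3)
pipe.configure(0, "mul", 2)  # stage 0: multiply by 2
pipe.configure(1, "add", 1)  # stage 1: add 1; stage 2 stays "pass"
print(pipe.run([1, 2, 3]))   # [3, 5, 7]
```

Every input goes through the same configured sequence, which is exactly the limitation noted above. To apply a different operation to some of the values, the control unit would first have to reconfigure the pipeline.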
Uniprocessor Systems
- Vectored arithmetic unit
  - Provides a solution to the reconfigurable arithmetic pipeline problem
  - Purpose: to perform different arithmetic operations in parallel
Uniprocessor Systems
- Vectored arithmetic unit (cont.)
  - Contains multiple functional units
    - Some perform addition, others subtraction, etc.
  - Input and output switches are needed to route the proper data to their proper destinations
    - The switches are set by the control unit
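Below is a rough Python model of one cycle of such a unit. The unit names and the `(unit, a, b)` switch-setting format are assumptions made for illustration. The loop models only how the switches route operands. It does not capture the fact that the hardware units operate concurrently.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Hypothetical functional units: each one performs a single kind of operation.
FUNCTIONAL_UNITS: Dict[str, Callable[[int, int], int]] = {
    "adder": operator.add,
    "subtractor": operator.sub,
    "multiplier": operator.mul,
}


def vector_unit_cycle(switch_settings: List[Tuple[str, int, int]]) -> List[int]:
    """One cycle of the unit.

    switch_settings stands in for what the control unit programs into the
    input switches: for each output slot, which functional unit receives
    which pair of operands. Results come back in the same order, as the
    output switches would route them.
    """
    used = set()
    results = []
    for unit_name, a, b in switch_settings:
        if unit_name in used:
            # Each functional unit can accept only one operand pair per cycle.
            raise ValueError(f"{unit_name} already busy this cycle")
        used.add(unit_name)
        results.append(FUNCTIONAL_UNITS[unit_name](a, b))
    return results


# Three different operations issued in the same cycle.
print(vector_unit_cycle([("adder", 2, 3), ("subtractor", 9, 4), ("multiplier", 6, 7)]))
# [5, 5, 42]
```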
Uniprocessor Systems
- Vectored arithmetic unit (cont.)
  - How do we get all that data to the vector arithmetic unit?
  - By transferring several data values simultaneously using
    - multiple buses
    - very wide data buses
Uniprocessor Systems
- Improving performance
  - Allow multiple, simultaneous memory accesses
    - Requires multiple sets of address, data, and control buses (one set for each simultaneous memory access)
    - The memory chip must be able to handle multiple transfers simultaneously
Uniprocessor Systems
- Multiport memory
  - Has two sets of address, data, and control pins to allow simultaneous data transfers to occur
  - The CPU and a DMA controller can transfer data concurrently
  - A system with more than one CPU could use it to handle simultaneous requests from two different processors
Uniprocessor Systems
- Can
  - Handle two requests to read data from the same location at the same time
- Cannot
  - Process two simultaneous requests to write data to the same memory location
  - Process simultaneous requests to read from and write to the same memory location
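The can/cannot rules above reduce to a single check. This Python sketch assumes a two-port memory and represents each request as just an operation and an address. The `Request` type and the `can_serve_together` function are hypothetical.

```python
from typing import NamedTuple


class Request(NamedTuple):
    op: str       # "read" or "write"
    address: int


def can_serve_together(port_a: Request, port_b: Request) -> bool:
    """Return True if a two-port memory can service both requests in the same cycle."""
    if port_a.address != port_b.address:
        # Requests to different locations never conflict.
        return True
    # Same location: only two reads are safe. A write paired with a write
    # (or with a read) would leave the stored or returned value ambiguous.
    return port_a.op == "read" and port_b.op == "read"


print(can_serve_together(Request("read", 0x10), Request("read", 0x10)))   # True
print(can_serve_together(Request("write", 0x10), Request("read", 0x10)))  # False
```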
Organization of Multiprocessor Systems
- Three different ways to organize/classify systems
  - Flynn's Classification
  - System Topologies
  - MIMD System Architectures
Multiprocessor Systems: Flynn's Classification
- Flynn's classification
  - Based on the flow of instructions and data processing
  - A computer is classified by
    - whether it processes a single instruction at a time or multiple instructions simultaneously
    - whether it operates on one or multiple data sets
Multiprocessor Systems: Flynn's Classification
- Four categories of Flynn's classification
  - SISD: Single instruction, single data
  - SIMD: Single instruction, multiple data
  - MISD: Multiple instruction, single data
  - MIMD: Multiple instruction, multiple data
- The MISD classification is not practical to implement
  - In fact, no significant MISD computers have ever been built
  - It is included only for completeness
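The classification rests on two yes/no questions: is there one instruction stream or several, and is there one data stream or several? It can therefore be written as a tiny lookup. This Python function is only an illustrative sketch, and the stream counts passed to it are hypothetical inputs.

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Map instruction/data stream counts to a Flynn category."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a computer has at least one instruction and one data stream")
    i = "S" if instruction_streams == 1 else "M"  # single or multiple instruction
    d = "S" if data_streams == 1 else "M"         # single or multiple data
    return f"{i}I{d}D"


print(flynn_class(1, 1))   # SISD: a conventional single CPU
print(flynn_class(1, 64))  # SIMD: e.g., an array processor
```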
Multiprocessor Systems: Flynn's Classification
- Single instruction, single data (SISD)
  - Consists of a single CPU executing individual instructions on individual data values
Multiprocessor Systems: Flynn's Classification
- Single instruction, multiple data (SIMD)
  - Executes a single instruction on multiple data values simultaneously using many processors
  - Since only one instruction is processed at any given time, it is not necessary for each processor to fetch and decode the instruction
  - This task is handled by a single control unit that sends the control signals to each processor
  - Example: array processor
[Figure: SIMD organization, with main memory and a single control unit driving several processor/memory pairs connected by a communications network]
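The division of labor between the control unit and the processors might be sketched like this in Python. The opcodes `ADD` and `MUL` and the list standing in for each processor's private memory are hypothetical. The point is that decoding happens once and the decoded operation is broadcast to every data value.

```python
from typing import List


def simd_step(opcode: str, operand: int, local_data: List[int]) -> List[int]:
    """One SIMD step: the control unit decodes `opcode` once and broadcasts it.

    Each element of `local_data` stands for the value held in one processor's
    private memory. Every processor applies the same operation to its own
    value; none of them fetches or decodes anything itself.
    """
    # Decoding happens once, in the single control unit.
    if opcode == "ADD":
        op = lambda x: x + operand
    elif opcode == "MUL":
        op = lambda x: x * operand
    else:
        raise ValueError(f"unsupported opcode: {opcode}")
    # Broadcast: all processors execute the decoded operation on their own data.
    return [op(x) for x in local_data]


print(simd_step("ADD", 10, [1, 2, 3]))  # [11, 12, 13]
```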
Multiprocessor Systems: Flynn's Classification
- Multiple instruction, multiple data (MIMD)
  - Executes different instructions simultaneously
  - Each processor must include its own control unit
  - The processors can be assigned to parts of the same task or to completely separate tasks
  - Examples: multiprocessors, multicomputers
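As a loose software analogy for MIMD, the Python sketch below runs two unrelated programs on two different data sets at the same time. Each worker thread plays the part of a processor with its own control unit. The task functions are hypothetical. In CPython the global interpreter lock prevents these threads from executing Python bytecode truly in parallel, so `ProcessPoolExecutor` would be the choice for real multicore execution.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_squares(values):
    # Instruction stream 1: arithmetic on one data set.
    return sum(v * v for v in values)


def count_words(text):
    # Instruction stream 2: a completely different task on a different data set.
    return len(text.split())


def run_mimd():
    # Each worker stands in for a processor running its own program on its own data.
    with ThreadPoolExecutor(max_workers=2) as pool:
        f1 = pool.submit(sum_squares, [1, 2, 3, 4])
        f2 = pool.submit(count_words, "multiple instruction multiple data")
        return f1.result(), f2.result()


print(run_mimd())  # (30, 4)
```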
Multiprocessor Systems: System Topologies
- System topologies
  - The topology of a multiprocessor system refers to the pattern of connections between its processors
  - Quantified by standard metrics
    - Diameter: the maximum distance (number of links on the shortest path) between any two processors in the computer system
    - Bandwidth: the capacity of a communications link multiplied by the number of such links in the system (best case)
    - Bisection bandwidth: the total bandwidth of the links connecting the two halves of the processors, split so that the number of links between the two halves is minimized (worst case)
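The three metrics can be computed directly from a topology's link structure. This Python sketch represents a topology as an adjacency map and uses breadth-first search for distances, so "distance" means the fewest links between two processors. It takes `link_bw` as the bandwidth of one link. The `ring` builder is just an example input. The bisection search is brute force, so it is only practical for small systems.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def ring(n: int) -> Graph:
    # Example topology (n >= 3): processor i links to i-1 and i+1 (mod n).
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def distances_from(g: Graph, src: int) -> Dict[int, int]:
    # Breadth-first search gives the shortest hop count to every reachable processor.
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(g: Graph) -> int:
    # Longest shortest path over all pairs (assumes the topology is connected).
    return max(max(distances_from(g, s).values()) for s in g)


def num_links(g: Graph) -> int:
    # Each undirected link appears in two adjacency sets.
    return sum(len(nbrs) for nbrs in g.values()) // 2


def bandwidth(g: Graph, link_bw: float) -> float:
    # Best case: every link carries data at the same time.
    return link_bw * num_links(g)


def bisection_bandwidth(g: Graph, link_bw: float) -> float:
    # Worst case: try every split into two equal halves and keep the split
    # crossed by the fewest links.
    nodes = sorted(g)
    best = None
    for half in combinations(nodes, len(nodes) // 2):
        a = set(half)
        crossing = sum(1 for u in a for v in g[u] if v not in a)
        best = crossing if best is None else min(best, crossing)
    return link_bw * best


g = ring(8)
print(diameter(g), bandwidth(g, 1.0), bisection_bandwidth(g, 1.0))  # 4 8.0 2.0
```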
Multiprocessor Systems: System Topologies
- Six categories of system topologies
  - Shared bus
  - Ring
  - Tree
  - Mesh
  - Hypercube
  - Completely connected
Multiprocessor Systems: System Topologies
- Shared bus
  - The simplest topology
  - Processors communicate with each other exclusively via this bus
  - Can handle only one data transmission at a time
  - Can be easily expanded by connecting additional processors to the shared bus, along with the necessary bus arbitration circuitry
[Figure: shared-bus topology, with processors (P), memories (M), and a global memory attached to one shared bus]
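The slides do not say which arbitration policy the bus uses. Assuming a round-robin policy (one common choice), here is a minimal Python sketch of an arbiter that grants the bus to one requesting processor per cycle. The class name is hypothetical.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """Grants the shared bus to one requesting processor per bus cycle."""

    def __init__(self, n_processors: int) -> None:
        self.n = n_processors
        self.last_granted = n_processors - 1  # so processor 0 is checked first

    def grant(self, requests: List[bool]) -> Optional[int]:
        # Search starting just after the last winner, so every requester
        # eventually gets a turn and no processor is starved.
        for offset in range(1, self.n + 1):
            p = (self.last_granted + offset) % self.n
            if requests[p]:
                self.last_granted = p
                return p
        return None  # no requests: the bus is idle this cycle


arb = RoundRobinArbiter(3)
print([arb.grant([True, True, True]) for _ in range(4)])  # [0, 1, 2, 0]
```

Because only one processor wins each cycle, the sketch also shows why a shared bus can carry only one data transmission at a time.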
Multiprocessor Systems: System Topologies
- Ring
  - Uses direct, dedicated connections between processors
  - Allows all communication links to be active simultaneously
  - A piece of data may have to travel through several processors to reach its final destination
  - Each processor has two communication links
[Figure: ring of six processors (P)]
Multiprocessor Systems: System Topologies
- Tree topology
  - Uses direct connections between processors
  - Each processor has at most three connections
  - Its primary advantage is its relatively low diameter
  - Example: DADO computer
[Figure: tree of processors (P)]
Multiprocessor Systems: System Topologies
- Mesh topology
  - Every processor connects to the processors above, below, left, and right
  - Left-to-right and top-to-bottom wraparound connections may or may not be present
[Figure: 3 × 3 mesh of nine processors (P)]
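The following Python sketch shows neighbor selection in a two-dimensional mesh, with and without wraparound. Processors are assumed to be addressed by (row, column), and the function name is hypothetical.

```python
from typing import List, Tuple


def mesh_neighbors(row: int, col: int, rows: int, cols: int,
                   wraparound: bool) -> List[Tuple[int, int]]:
    """Neighbors of processor (row, col), in the order above, below, left, right."""
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if wraparound:
            # Edge processors link to the opposite edge.
            result.append((r % rows, c % cols))
        elif 0 <= r < rows and 0 <= c < cols:
            # Without wraparound, edge and corner processors have fewer links.
            result.append((r, c))
    return result


print(mesh_neighbors(0, 0, 3, 3, wraparound=False))  # [(1, 0), (0, 1)]
print(mesh_neighbors(0, 0, 3, 3, wraparound=True))   # [(2, 0), (1, 0), (0, 2), (0, 1)]
```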
Multiprocessor Systems: System Topologies
- Hypercube
  - A multidimensional mesh
  - Has n processors (n a power of 2), each with log₂ n connections
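One common way to label hypercube processors (assumed here, not stated on the slide) is with the binary numbers 0 to n-1. Two processors are linked when their labels differ in exactly one bit. The Python sketch below uses that labeling to list neighbors and to count hops. The function names are hypothetical.

```python
from typing import List


def hypercube_neighbors(p: int, n: int) -> List[int]:
    """Neighbors of processor p in an n-processor hypercube (n a power of 2).

    Flipping each of the log2(n) label bits in turn gives one neighbor per
    dimension, so every processor has log2(n) connections.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    dims = n.bit_length() - 1  # log2(n)
    return [p ^ (1 << d) for d in range(dims)]


def hypercube_distance(a: int, b: int) -> int:
    # Hop count = number of differing label bits (Hamming distance); the
    # largest possible value, log2(n), is the hypercube's diameter.
    return bin(a ^ b).count("1")


print(hypercube_neighbors(0, 8))  # [1, 2, 4]
print(hypercube_distance(0, 7))   # 3 (labels 000 and 111)
```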
Multiprocessor Systems: System Topologies
- Completely connected topology
  - Every processor has n-1 connections, one to each of the other processors
  - The complexity of the processors increases as the system grows
  - Offers maximum communication capabilities
Multiprocessor Systems: System Topologies
[Table: comparison of system topologies, including meshes without and with wraparound; l = bandwidth of the bus, n = number of processors]
Multiprocessor Systems: MIMD System Architecture
- MIMD system architecture
  - The architecture of an MIMD system refers to its connections with respect to system memory
  - Two types
    - Multiprocessors
    - Multicomputers
Multiprocessor Systems: MIMD System Architecture
- Symmetric multiprocessor (SMP)
  - A computer system that has two or more processors with comparable capabilities
  - Four different types
    - Uniform memory access (UMA)
    - Nonuniform memory access (NUMA)
    - Cache coherent NUMA (CC-NUMA)
    - Cache only memory access (COMA)
Multiprocessor Systems: MIMD System Architecture
- Uniform memory access (UMA)
  - Gives all CPUs equal (uniform) access to all shared memory locations
  - Each processor may have its own cache memory, not directly accessible by the other processors
[Figure: UMA organization, with processors 1 through n connected through a communications mechanism to a shared memory]
Multiprocessor Systems: MIMD System Architecture
- Nonuniform memory access (NUMA)
  - Does not allow uniform access to all shared memory locations
  - It still allows all processors to access all shared memory locations; however, each processor can access the memory module closest to it faster than other modules
[Figure: NUMA organization, with processors 1 through n and memory modules 1 through n linked by a communications mechanism]
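A weighted average gives a rough estimate of the practical effect of nonuniform access. This Python sketch assumes a single local latency and a single remote latency. The numbers in the example are made up for illustration and are not measurements of any machine.

```python
def average_access_time(local_fraction: float, local_ns: float, remote_ns: float) -> float:
    """Expected memory access time for one processor in a NUMA system.

    local_fraction is the share of accesses served by the processor's own
    (closest) memory module; the rest go through the communications
    mechanism to another module. All inputs are illustrative parameters.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies: 100 ns local, 300 ns remote.
print(average_access_time(0.9, 100.0, 300.0))  # about 120 ns
print(average_access_time(0.5, 100.0, 300.0))  # 200.0
```

The model makes the design pressure visible: the more often a processor's data lives in its own module, the closer its average access time gets to the local latency.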
Multiprocessor Systems: MIMD System Architecture
- Cache coherent NUMA (CC-NUMA)
  - Similar to NUMA, except each processor includes cache memory
  - The cache can buffer data from memory modules that are not local to the processor, which can reduce the access time of the memory transfers
  - Creates a problem (cache coherence) when two or more caches hold the same piece of data and one of them modifies it
  - A solution to this problem is cache only memory access (COMA)
Multiprocessor Systems: MIMD System Architecture
- Cache only memory access (COMA)
  - Each processor's local memory is treated as a cache
  - When the processor requests data that is not in its cache (local memory), the system loads that data into local memory as part of the memory operation
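A toy Python model of this behavior follows. The `ComaSystem` class is hypothetical. It simplifies real COMA designs by moving a block to the requesting node instead of allowing multiple copies, which sidesteps coherence entirely.

```python
from typing import Dict, List


class ComaSystem:
    """Toy COMA model: each node's local memory acts as a cache for data.

    Data has no fixed home node. On a local miss, the block is found on
    whichever node holds it and migrated into the requester's local memory.
    """

    def __init__(self, n_nodes: int) -> None:
        self.local: List[Dict[int, int]] = [dict() for _ in range(n_nodes)]

    def read(self, node: int, address: int) -> int:
        mem = self.local[node]
        if address not in mem:
            # Miss: locate the block on another node and load it here
            # as part of this memory operation.
            for other in self.local:
                if address in other:
                    mem[address] = other.pop(address)
                    break
            else:
                raise KeyError(address)
        return mem[address]


coma = ComaSystem(2)
coma.local[1][42] = 7     # block 42 currently lives on node 1
print(coma.read(0, 42))   # 7, and the block now lives on node 0
```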
Multiprocessor Systems: MIMD System Architecture
- Multicomputer
  - An MIMD machine in which the processors are not all under the control of one operating system
  - Each processor or group of processors is under the control of a different operating system, or a different instantiation of the same operating system
  - Two different types
    - Network or cluster of workstations (NOW or COW)
    - Massively parallel processor (MPP)
Multiprocessor Systems: MIMD System Architecture
- Network of workstations (NOW) or cluster of workstations (COW)
  - More than a group of workstations on a local area network (LAN)
  - Has a master scheduler, which matches tasks to processors
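The slides do not describe how the master scheduler makes its decisions. As one illustrative possibility, the Python sketch below uses a longest-task-first greedy heuristic that always gives the next task to the least-loaded workstation. The function name, the task run-time estimates, and the policy itself are all assumptions.

```python
import heapq
from typing import Dict, List, Tuple


def schedule(tasks: Dict[str, float], workstations: List[str]) -> Dict[str, str]:
    """Hypothetical master scheduler: assign each task to a workstation.

    tasks maps a task name to its estimated run time. Longest tasks are
    placed first, each on the workstation with the least work so far.
    """
    # Min-heap of (assigned load, workstation name).
    loads: List[Tuple[float, str]] = [(0.0, w) for w in workstations]
    heapq.heapify(loads)
    assignment = {}
    for name, cost in sorted(tasks.items(), key=lambda kv: kv[1], reverse=True):
        load, ws = heapq.heappop(loads)
        assignment[name] = ws
        heapq.heappush(loads, (load + cost, ws))
    return assignment


print(schedule({"a": 5.0, "b": 3.0, "c": 2.0}, ["ws1", "ws2"]))
# {'a': 'ws1', 'b': 'ws2', 'c': 'ws2'}
```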
Multiprocessor Systems: MIMD System Architecture
- Massively parallel processor (MPP)
  - Consists of many self-contained nodes, each having a processor, memory, and hardware for implementing internal communications
  - The processors communicate with each other through this communications hardware, typically by passing messages rather than through one shared memory
  - Example: IBM's Blue Gene computer
Thank you!