Title: Parallelism in Uniprocessor System
Chapter 9
Parallelism in Uniprocessor System
- Reconfigurable arithmetic pipeline
- Better suited for general-purpose computing
- Example: B_i × C_i + D_i
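A reconfigurable pipeline can be set up so that one stage multiplies and the next adds, with a new element entering every clock cycle. The sketch below is a hypothetical Python model. It assumes the example computation is B_i × C_i + D_i on a two-stage (multiply, then add) configuration. The function name `pipeline_mul_add` is illustrative.

```python
def pipeline_mul_add(b, c, d):
    """Simulate a 2-stage pipeline configured as (multiply) -> (add).

    Stage 1 computes b[i] * c[i]; stage 2 adds d[i] to the product that
    stage 1 latched on the previous cycle. Returns (results, clock cycles).
    """
    n = len(b)
    results = []
    latch = None  # (index, product) held between stage 1 and stage 2
    cycles = 0
    i = 0
    while len(results) < n:
        cycles += 1
        # Stage 2 runs first so it consumes last cycle's latch before stage 1 overwrites it.
        if latch is not None:
            j, product = latch
            results.append(product + d[j])
            latch = None
        # Stage 1 accepts the next element, if any remain.
        if i < n:
            latch = (i, b[i] * c[i])
            i += 1
    return results, cycles
```

With n elements the pipeline finishes in n + 1 cycles (one cycle to fill, then one result per cycle), instead of the 2n cycles an unpipelined multiply-then-add unit would need.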
Parallelism in Uniprocessor System
- Vector arithmetic unit
- Perform different arithmetic operations in
parallel
Vector arithmetic unit
- How is data delivered to the vector arithmetic unit?
- Multiport memory
- Designed to allow multiple simultaneous memory accesses
Organization of Multiprocessor Systems
- Flynn's classification
- SISD: single instruction, single data
- SIMD: single instruction, multiple data
- MISD: multiple instruction, single data
- MIMD: multiple instruction, multiple data
A generic SIMD organization
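In a SIMD machine, one control unit fetches each instruction and broadcasts it to every processing element (PE), and each PE applies it to the data in its own local memory. The sketch below models only that broadcast; it leaves out the interconnection network. The class names and the two-instruction set (`ADD`, `MUL`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessingElement:
    # Each PE owns its local memory: named locations mapped to values.
    memory: dict = field(default_factory=dict)

    def execute(self, op, dst, src1, src2):
        a, b = self.memory[src1], self.memory[src2]
        if op == "ADD":
            self.memory[dst] = a + b
        elif op == "MUL":
            self.memory[dst] = a * b
        else:
            raise ValueError(f"unknown op {op}")


class ControlUnit:
    """Fetches one instruction at a time and broadcasts it to every PE."""

    def __init__(self, pes):
        self.pes = pes

    def run(self, program):
        for instr in program:
            for pe in self.pes:  # same instruction, different data in each PE
                pe.execute(*instr)
```

For example, four PEs holding x = 0, 1, 2, 3 and y = 10 that run `[("MUL", "z", "x", "y"), ("ADD", "z", "z", "x")]` end with z = 0, 11, 22, 33.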
System Topologies
- Topology
- Diameter
- Maximum distance between any two processors in the system, counted as the number of links on the shortest path between them
- Bandwidth
- The system's total bandwidth is the capacity of one communications link multiplied by the number of such links in the system
Topology
- (e) hypercube
- (f) completely connected
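The diameter and total-bandwidth definitions can be checked on concrete topologies. The sketch below builds a hypercube and a completely connected network as adjacency sets, finds the diameter with breadth-first search, and computes total bandwidth. It assumes every link has the same capacity. The function names and `link_capacity` are hypothetical.

```python
from collections import deque


def hypercube(dim):
    """Adjacency sets for a dim-dimensional hypercube (2**dim processors).
    Two processors are linked when their labels differ in exactly one bit."""
    n = 1 << dim
    return {v: {v ^ (1 << b) for b in range(dim)} for v in range(n)}


def completely_connected(n):
    """Every processor has a direct link to every other processor."""
    return {v: {u for u in range(n) if u != v} for v in range(n)}


def diameter(adj):
    """Largest shortest-path distance (in links) over all pairs, via BFS."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        best = max(best, max(dist.values()))
    return best


def total_bandwidth(adj, link_capacity):
    """Capacity of one link times the number of (undirected) links."""
    links = sum(len(nbrs) for nbrs in adj.values()) // 2
    return link_capacity * links
```

An 8-processor hypercube has diameter 3 and 12 links. A completely connected 8-processor system has diameter 1 but needs 28 links, which shows the usual trade-off between short distances and wiring cost.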
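MIMD System Architecture
- SMP (symmetric multiprocessor)
- UMA (uniform memory access)
- NUMA (nonuniform memory access)
- CC-NUMA (cache-coherent NUMA)
- COMA (cache-only memory architecture)
- NOW / COW (network of workstations / cluster of workstations)
- MPP (massively parallel processor)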
Uniform memory access architecture (UMA)
Nonuniform memory access architecture (NUMA)
access local mem. only
Communication in Multiprocessor Systems
- Fixed Connections
- Clustering
- Cluster bus
- Intercluster gateway
Communication in Multiprocessor Systems
- Reconfigurable connections
- Crossbar switch
- Drawback: size (an n × m crossbar needs n · m crosspoints)
- An n × m crossbar switch
Reconfigurable Connections
- Multistage interconnection networks (MINs)
- Routing algorithm
- By setting the switches to the correct states,
the MIN realizes the desired connections between
inputs and outputs
Nonblocking network
- Can realize any of the n! permutations (one-to-one connections of n inputs to n outputs)
- Strictly nonblocking
- A connection can be added or changed without disturbing any existing connections
- Rearrangeably nonblocking
- Any new connection can be realized, but the paths of existing connections may have to be rerouted to do so
- Clos network
- Beneš network
Clos network, originally designed for telephone switching systems
Clos network
- N = n × k inputs
- Three stages: k input switches (n × m), m middle switches (k × k), and k output switches (m × n)
- If m ≥ n, the network is rearrangeably nonblocking.
- If m ≥ 2n − 1, the network is strictly nonblocking.
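The two Clos conditions can be applied mechanically. The sketch below is hypothetical Python. It classifies a three-stage Clos network from n, m, and k, assuming the standard stage sizes (k switches of n × m, m of k × k, k of m × n), and it counts crosspoints so the result can be compared with a single N × N crossbar.

```python
def clos_properties(n, m, k):
    """Classify a three-stage Clos network with N = n * k inputs.

    Stage 1: k switches of size n x m; stage 2: m switches of size k x k;
    stage 3: k switches of size m x n.
    """
    if m >= 2 * n - 1:
        kind = "strictly nonblocking"
    elif m >= n:
        kind = "rearrangeably nonblocking"
    else:
        kind = "blocking"
    # Two outer stages of k (n x m) switches plus m middle (k x k) switches.
    crosspoints = 2 * k * (n * m) + m * k * k
    return {"inputs": n * k, "crosspoints": crosspoints, "class": kind}
```

For N = 8 with n = 2 and k = 4, m = 2 gives a rearrangeably nonblocking network with 64 crosspoints. m = 3 gives a strictly nonblocking one with 96. For an N this small, neither beats the 64 crosspoints of an 8 × 8 crossbar. The savings appear for larger N.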
Beneš network: derived from the Clos network by setting n = m = 2 and k = N/2, and recursively decomposing the two (N/2) × (N/2) middle switches
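The recursive construction fixes the size of a Beneš network. Each level adds N/2 input switches and N/2 output switches around two half-size networks, so S(N) = N + 2·S(N/2) with S(2) = 1. This works out to (N/2)(2 log₂N − 1) switches in 2 log₂N − 1 stages. Since m = n = 2 satisfies m ≥ n, the Clos condition makes the network rearrangeably nonblocking. A hypothetical Python sketch of the count:

```python
def benes_switches(N):
    """Number of 2x2 switches in an N x N Benes network (N a power of 2).

    Clos construction with n = m = 2, k = N/2: N/2 input switches, N/2 output
    switches, and two (N/2) x (N/2) middle networks built the same way.
    """
    if N < 2 or N & (N - 1):
        raise ValueError("N must be a power of 2, at least 2")
    if N == 2:
        return 1  # a single 2x2 switch
    return 2 * (N // 2) + 2 * benes_switches(N // 2)


def benes_stages(N):
    """Each recursion level adds an input and an output stage: 2*log2(N) - 1."""
    return 2 * (N.bit_length() - 1) - 1
```

An 8 × 8 Beneš network has 20 switches in 5 stages.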
Blocking networks
- (a) Omega network
- Hardware complexity O(n log n)
- 12 switches (for an 8 × 8 network)
- (b) Baseline network
- These two networks are isomorphic
Communication in Multiprocessor Systems
- Routing on multistage interconnection networks
- Looping algorithm
- Recursive method used to set the switches of a Beneš network
- Group theory is necessary!
Result of the looping algorithm: (a) after one iteration, (b) final results
(a) Successful and (b) unsuccessful routing on the Omega network
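Routing on an Omega network is often done with destination-tag routing. Before each stage the line label is perfect-shuffled (rotated left one bit). Each 2 × 2 switch then sends the signal to its upper output on a 0 and its lower output on a 1, using the destination's bits from most significant to least. The sketch below is a hypothetical Python model that routes requests in the order given. A request is blocked if a link it needs is already taken, which is the kind of conflict behind unsuccessful routing. The slides' own routing procedure may differ.

```python
def omega_switches(N):
    """An N x N Omega network has log2(N) stages of N/2 2x2 switches."""
    return (N // 2) * (N.bit_length() - 1)


def omega_route(requests, N):
    """Destination-tag routing on an N x N Omega network (N a power of 2).

    requests: dict mapping source -> destination, routed in insertion order.
    Returns (paths, blocked): paths maps each routed source to the link
    label it occupies after each stage; blocked lists sources that found a
    needed link already in use.
    """
    stages = N.bit_length() - 1          # log2(N)
    used = set()                         # (stage, link) pairs already taken
    paths, blocked = {}, []
    for src, dst in requests.items():
        pos, links = src, []
        for s in range(stages):
            # Perfect shuffle: rotate the log2(N)-bit label left by one.
            pos = ((pos << 1) | (pos >> (stages - 1))) & (N - 1)
            # Switch setting: destination bit (MSB first) picks upper (0) or lower (1).
            bit = (dst >> (stages - 1 - s)) & 1
            pos = (pos & ~1) | bit
            if (s, pos) in used:
                blocked.append(src)
                break
            links.append((s, pos))
        else:
            used.update(links)
            paths[src] = [p for _, p in links]
    return paths, blocked
```

For N = 8 this gives 12 switches. The identity permutation routes without conflict. Sources 0 and 4 enter the same first-stage switch, so sending 0 → 0 and 4 → 1 makes both want the upper output, and the second request is blocked.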
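Memory Organization in Multiprocessor Systems
- Shared Memory
- Interleaving
- High-order: the high-order address bits select the memory module
- Low-order: the low-order address bits select the module, so consecutive addresses fall in different modules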
- UMA system with shared memory
Address ranges using high-order and low-order interleaving
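The two interleaving schemes differ only in which address bits choose the module. The sketch below is a hypothetical Python model. It maps an address to a (module, offset within module) pair, assuming equal-sized modules.

```python
def high_order(address, num_modules, module_size):
    """High-order interleaving: the upper address bits select the module,
    so each module holds one contiguous block of addresses."""
    if not 0 <= address < num_modules * module_size:
        raise ValueError("address out of range")
    return address // module_size, address % module_size


def low_order(address, num_modules, module_size):
    """Low-order interleaving: the lower address bits select the module,
    so consecutive addresses land in consecutive modules."""
    if not 0 <= address < num_modules * module_size:
        raise ValueError("address out of range")
    return address % num_modules, address // num_modules
```

With 4 modules of 8 words each, module 1 holds addresses 8 to 15 under high-order interleaving but addresses 1, 5, 9, ..., 29 under low-order interleaving. The low-order layout lets a run of sequential accesses proceed in several modules at once.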
Cache Coherence
- Cache coherence problem
- Each processor has its own local cache, so several caches may hold copies of the same data, and a write by one processor can leave stale copies in the others
- Solutions
- Program compilation: the compiler marks shared data as non-cacheable
- Cache directory (cache controller)
- Snooping: each cache monitors memory activity on the bus
- MESI protocol
- States: Modified, Exclusive, Shared, Invalid
- Four memory access scenarios
- Read hit or read miss
- Write hit or write miss
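The four access scenarios drive the MESI state changes. The sketch below is a simplified, hypothetical Python model of one memory line shared by several snooping caches. It tracks states only, not data, and counts write-backs of modified copies.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


class MesiSystem:
    """MESI states of one memory line in each of n snooping caches."""

    def __init__(self, n):
        self.states = [State.I] * n
        self.writebacks = 0

    def read(self, p):
        if self.states[p] != State.I:
            return "read hit"            # M, E, and S can all be read locally
        holders = [q for q, st in enumerate(self.states) if q != p and st != State.I]
        for q in holders:                # snoopers see the bus read
            if self.states[q] == State.M:
                self.writebacks += 1     # dirty copy is written back first
            self.states[q] = State.S
        self.states[p] = State.S if holders else State.E
        return "read miss"

    def write(self, p):
        hit = self.states[p] != State.I
        for q, st in enumerate(self.states):
            if q != p and st != State.I:  # snoopers invalidate their copies
                if st == State.M:
                    self.writebacks += 1
                self.states[q] = State.I
        self.states[p] = State.M          # an E -> M write needs no bus traffic
        return "write hit" if hit else "write miss"
```

Here is a two-cache trace. Cache 0 reads (miss, E) and then cache 1 reads (miss, both S). Cache 0 writes (hit, M/I). Cache 1 then reads (miss), which forces a write-back and leaves both caches in S.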
Multiprocessor
- Load balancing
- Distributing tasks among processors to maximize processor utilization
- Graceful degradation of performance
- Ensure that the system continues to function, at reduced performance, when a processor fails
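- Data (true) dependency
- 1 and 2 (A is written by 1, then read by 2)
- Data anti-dependency
- 2 and 3 (A is read by 2, then overwritten by 3)
- Data output dependency
- 1 and 3 (A in 1 must be stored before A in 3)
1: A ← B op C   2: D ← A op E   3: A ← F op G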
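Only the variables each statement reads and writes matter for dependency analysis, so the sketch below ignores the operators. It uses the standard definitions: a true dependency is a write followed by a read, an anti-dependency is a read followed by a write, and an output dependency is two writes. It is a hypothetical Python helper that checks every ordered pair of statements and does not account for intervening writes.

```python
def dependencies(stmts):
    """stmts: list of (target, set_of_sources) in program order, numbered from 1.
    Returns (kind, i, j, variable) for every pair i < j that conflicts."""
    found = []
    for i, (w_i, r_i) in enumerate(stmts, start=1):
        for j, (w_j, r_j) in enumerate(stmts[i:], start=i + 1):
            if w_i in r_j:
                found.append(("true", i, j, w_i))    # read after write
            if w_j in r_i:
                found.append(("anti", i, j, w_j))    # write after read
            if w_i == w_j:
                found.append(("output", i, j, w_i))  # write after write
    return found
```

For the three statements on A, B, ..., G, the helper reports a true dependency between 1 and 2, an output dependency between 1 and 3, and an anti-dependency between 2 and 3. All three involve A.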
Parallel Algorithms
- Parallel bubble sort
- Code segment implementing the bubble sort on the n elements of array A
- The parallel algorithm
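One common way to parallelize bubble sort is odd-even transposition sort. The slides' parallel version may be organized differently. The phases alternate between comparing pairs (0,1), (2,3), ... and pairs (1,2), (3,4), .... Within a phase the pairs are disjoint, so each compare-exchange could run on its own processor at the same time, and n phases are enough to sort n elements. A hypothetical Python sketch, simulated sequentially:

```python
def odd_even_transposition_sort(values):
    """Parallel bubble sort in its odd-even transposition form.

    Every compare-exchange within one phase touches a disjoint pair, so on
    a parallel machine all of them could execute simultaneously.
    """
    a = list(values)
    n = len(a)
    for phase in range(n):
        start = phase % 2  # even phase: (0,1),(2,3)...; odd phase: (1,2),(3,4)...
        # The pairs are disjoint, so doing them one after another here gives
        # the same result as doing them all at once.
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

With n/2 processors, each phase takes one step, so the parallel time is O(n) compared with O(n²) for the sequential bubble sort.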
Trace of the (a) sequential and (b) parallel bubble sort algorithms
Parallel Algorithms
- Parallel matrix multiplication
- Code segment that performs the matrix multiplication
- The parallel code
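Each element C[i][j] of a matrix product depends only on row i of A and column j of B. That makes the elements independent tasks that separate processors can compute at the same time. The hypothetical Python sketch below hands one element to each task of a thread pool. Python threads illustrate the decomposition but do not give a real speedup for this arithmetic. The slides' parallel code may assign work differently, for example one row per processor.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_sequential(A, B):
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)] for i in range(n)]


def matmul_parallel(A, B, workers=4):
    """Each task computes one element C[i][j]. Tasks write to distinct
    elements, so they need no synchronization with each other."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]

    def element(ij):
        i, j = ij
        C[i][j] = sum(A[i][k] * B[k][j] for k in range(m))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces all tasks to finish and re-raises any exception.
        list(pool.map(element, [(i, j) for i in range(n) for j in range(p)]))
    return C
```

For example, multiplying [[1, 2], [3, 4]] by [[5, 6], [7, 8]] gives [[19, 22], [43, 50]] with either function.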
Trace of the (a) sequential and (b) parallel matrix multiplication algorithms
Alternative Parallel Architectures
- Non-von Neumann machines
- Dataflow computing: an instruction executes when its operands are available
- Dataflow graphs
- Single assignment rule
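Single assignment rule
- Avoids anti- and output dependencies (true dependencies remain and become the edges of the dataflow graph)
- No two statements write data to the same variable
- An operand of one statement cannot be written to by any later statement
- Original code segment:
- 1: A ← B op C
- 2: B ← A op D
- 3: C ← A op B
- 4: D ← C op B
- 5: A ← A op C
- The revised code segment and its dataflow graph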
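A code segment can be brought under the single assignment rule by renaming. Each target gets a fresh name, here the variable plus its statement number (the B2, C3, D4 naming that appears with the I-structure example), and every later read uses the newest name. The sketch below is a hypothetical Python helper. Operators are ignored because renaming depends only on which variables are read and written.

```python
def to_single_assignment(stmts):
    """stmts: list of (target, [operands]) in program order, numbered from 1.

    Each target becomes target + statement number; each operand becomes the
    latest renamed version of that variable, or stays as-is if it is an
    initial input. Afterwards no name is written twice.
    """
    latest = {}
    out = []
    for num, (target, operands) in enumerate(stmts, start=1):
        new_ops = [latest.get(v, v) for v in operands]  # read before renaming target
        new_target = f"{target}{num}"
        latest[target] = new_target
        out.append((new_target, new_ops))
    return out
```

Applied to the five statements, it yields A1 ← (B, C), B2 ← (A1, D), C3 ← (A1, B2), D4 ← (C3, B2), and A5 ← (A1, C3).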
Dataflow graph
- Token
- Data that traverses an edge
- Fire
- A vertex fires (executes its instruction) once all of its operand tokens have arrived
- Example: statement 1 has been executed
- Statement 2 is ready to fire
- Statements 3 and 5 each have one operand available
I-structure
- Dataflow vertices are usually stored as I-structures
- An I-structure includes the operation, its operands (or space to store each operand until it is ready), and a list of destinations for its result
- Dataflow graph of the previous example stored as I-structures
- statement 2, operand 1 B2, C3, D4
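Here is a model of the firing rule with one record per vertex, holding the operation, operand slots, and result destinations. It is a hypothetical Python sketch. It assumes every operation is addition and uses inputs B = 1, C = 2, D = 3, neither of which comes from the slides. In each wave, every vertex whose slots are all filled fires and sends a token to each destination slot.

```python
import operator
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class IStructure:
    """One dataflow vertex: operation, operand slots, result destinations."""
    op: Callable
    operands: List[Optional[int]]  # None marks a slot still waiting for a token
    destinations: List[Tuple[int, int]] = field(default_factory=list)  # (vertex, slot)
    fired: bool = False

    def ready(self):
        return not self.fired and all(v is not None for v in self.operands)


def run_dataflow(vertices):
    """Fire, wave by wave, every vertex whose operands have all arrived.
    Returns the waves (vertex ids firing together) and each vertex's result."""
    results, waves = {}, []
    while True:
        wave = [vid for vid, v in vertices.items() if v.ready()]
        if not wave:
            return waves, results
        waves.append(wave)
        for vid in wave:
            v = vertices[vid]
            v.fired = True
            results[vid] = v.op(*v.operands)
            for dst, slot in v.destinations:  # one token per outgoing edge
                vertices[dst].operands[slot] = results[vid]


def example_program(b=1, c=2, d=3):
    """Single-assignment program, '+' assumed for every operation:
    1: A1 <- B + C    2: B2 <- A1 + D    3: C3 <- A1 + B2
    4: D4 <- C3 + B2  5: A5 <- A1 + C3
    """
    add = operator.add
    return {
        1: IStructure(add, [b, c], [(2, 0), (3, 0), (5, 0)]),
        2: IStructure(add, [None, d], [(3, 1), (4, 1)]),
        3: IStructure(add, [None, None], [(4, 0), (5, 1)]),
        4: IStructure(add, [None, None]),
        5: IStructure(add, [None, None]),
    }
```

After the first wave (statement 1), statement 2 is ready, and statements 3 and 5 each hold one operand, matching the dataflow-graph example. The waves are [1], [2], [3], [4, 5].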
Architectures of Dataflow Systems
- Static architecture
- Allows only one token to reside on an edge at any given time
- Dynamic architecture
- Allows multiple tagged tokens on an edge
Systolic Arrays
- A systolic array incorporates several processing elements into a regular structure, such as a linear array or mesh
- A 2 × 2 systolic array to multiply two matrices
Trace of the four clock cycles of the multiplication algorithm on the systolic array
A 2 × 2 systolic array to multiply two matrices, with data values shown
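The sketch below models one common systolic arrangement, output-stationary. It is hypothetical Python, and the slides' figure may feed and skew the data differently. Processing element (i, j) accumulates C[i][j]. Row i of A enters from the left, delayed i cycles, and column j of B enters from the top, delayed j cycles. On every clock each element multiplies what it receives, adds the product to its sum, and passes a to the right and b downward. An n × n array needs 3n − 2 cycles, which is four cycles for the 2 × 2 case.

```python
def systolic_matmul(A, B):
    """Output-stationary n x n systolic array; PE(i, j) accumulates C[i][j].

    Returns (C, clock cycles used).
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    a_reg = [[0] * n for _ in range(n)]  # value each PE passes to the right
    b_reg = [[0] * n for _ in range(n)]  # value each PE passes downward
    cycles = 3 * n - 2
    for t in range(cycles):
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                # Left edge: skewed row i of A; otherwise the left neighbor's value.
                if j == 0:
                    k = t - i
                    a_in = A[i][k] if 0 <= k < n else 0
                else:
                    a_in = a_reg[i][j - 1]
                # Top edge: skewed column j of B; otherwise the upper neighbor's value.
                if i == 0:
                    k = t - j
                    b_in = B[k][j] if 0 <= k < n else 0
                else:
                    b_in = b_reg[i - 1][j]
                C[i][j] += a_in * b_in
                new_a[i][j], new_b[i][j] = a_in, b_in
        a_reg, b_reg = new_a, new_b  # all PEs latch their outputs on the clock edge
    return C, cycles
```

At cycle t, element (i, j) receives A[i][k] and B[k][j] with k = t − i − j, so the skew makes each pair of operands meet at the correct element.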
Neural Networks
- Neurons
- Named after the signal-conducting cells of the human nervous system
- Unlike traditional computers, which are programmed, neural networks are trained
- Weighting factor
- The output of each neuron is multiplied by its weighting factor
Neural Networks
- Threshold value
- If the sum of the weighted values is greater than or equal to the threshold value, the neuron outputs a logical value of 1; otherwise it outputs 0
- Neural systems are used in control systems and artificial intelligence applications
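A threshold neuron can be written in a few lines, and training means adjusting its weights and threshold from examples instead of programming them. The hypothetical Python sketch below pairs the neuron with the perceptron learning rule, one classic training method. The slides do not say which method they have in mind. The learning rate and starting values are assumptions.

```python
def neuron_output(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs reaches the threshold, else 0."""
    weighted = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted >= threshold else 0


def train_perceptron(samples, n_inputs, rate=1, max_epochs=100):
    """Perceptron rule: after each wrong output, move the weights toward the
    target and shift the threshold the opposite way. Stops after an epoch
    with no errors (guaranteed only for linearly separable data)."""
    weights = [0] * n_inputs
    threshold = 0
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in samples:
            error = target - neuron_output(inputs, weights, threshold)
            if error:
                errors += 1
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                threshold -= rate * error  # a higher threshold makes firing harder
        if errors == 0:
            break
    return weights, threshold
```

Training on the four input pairs of logical AND produces weights and a threshold for which the neuron outputs 1 only for the input (1, 1).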