Parallelism in Uniprocessor System

Slides: 45

Provided by: SCF8

Transcript and Presenter's Notes


1
Chapter 9
2
Parallelism in Uniprocessor System

Reconfigurable arithmetic pipeline
Better suited for general purpose computing
Bi × Ci + Di

3
Parallelism in Uniprocessor System

Vector arithmetic unit
Perform different arithmetic operations in
parallel

4
Vector arithmetic unit

How to get the data to the vectored arithmetic
unit?
Multiport Memory
Designed to allow multiple simultaneous memory
accesses

5
Organization of Multiprocessor System

Flynn's classification
SISD: single instruction, single data
SIMD: single instruction, multiple data
MISD: multiple instruction, single data
MIMD: multiple instruction, multiple data

6
A generic SIMD organization
7
System Topologies

Topology
Diameter
Maximum distance between two processors in the
computer system
Bandwidth
The system's total bandwidth is the capacity of a
communications link multiplied by the number of
such links in the system
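
The two metrics above can be checked on small examples. The following is a minimal sketch, assuming each topology is modeled as an undirected graph and every link has the same capacity; the function names (`ring`, `hypercube`, `completely_connected`, `diameter`, `total_bandwidth`) are illustrative and do not come from the slides.

```python
from collections import deque

def ring(n):
    # Each processor is linked to its two neighbours.
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def hypercube(dim):
    # Processors whose binary labels differ in exactly one bit are linked.
    return {i: {i ^ (1 << b) for b in range(dim)} for i in range(1 << dim)}

def completely_connected(n):
    # Every processor is linked to every other processor.
    return {i: set(range(n)) - {i} for i in range(n)}

def diameter(adj):
    """Longest shortest path between any two processors (BFS from each node)."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst

def total_bandwidth(adj, link_capacity):
    """Capacity of one link multiplied by the number of links."""
    links = sum(len(neighbours) for neighbours in adj.values()) // 2
    return links * link_capacity
```

For eight processors, `diameter(ring(8))`, `diameter(hypercube(3))` and `diameter(completely_connected(8))` can be compared directly to see how adding links shortens the longest path.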

8
Topology

(a)shared bus
(b)ring

9
Topology

(c) tree
(d) mesh

10
Topology

(e) hypercube
(f) completely connected

11
MIMD System Architecture

SMP
UMA
NUMA
CC-NUMA
COMA
NOW COW
MPP

12
Uniform memory access architecture (UMA)
13
Nonuniform memory access architecture (NUMA)
access local mem. only
14
Communication in multiprocessor System

Fixed Connections
Clustering
Cluster bus
Intercluster gateway

15
(No Transcript)
16
Communication in multiprocessor System

Reconfigurable Connection
Crossbar switch
Drawback: size
An n x m crossbar switch

17
Reconfigurable Connection

Multistage Interconnection Networks (MINs)
Routing algorithm
By setting the switches to the correct states,
the MIN realizes the desired connections between
inputs and outputs

18
Nonblocking network

It can realize any of the n! connections.
Strictly nonblocking
If a network can modify one connection without
changing any others
Rearrangeably nonblocking
If a network can realize a new connection, but
may have to reroute the path used to realize an
existing connection in order to do so.
Clos network
Beneš network

19
Clos network: originally designed for telephone
switching systems
20
Clos network

N = n × k inputs
Three stages
If m ≥ n, the network is rearrangeably
nonblocking.
If m ≥ 2n − 1, the network is strictly nonblocking.
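
The two conditions can be written as a small check. This is only a sketch, assuming the usual three-stage Clos parameters: n inputs per first-stage switch and m middle-stage switches. The function name `clos_class` is illustrative.

```python
def clos_class(n, m):
    """Classify a three-stage Clos network.

    n: inputs per first-stage switch (assumed meaning)
    m: number of middle-stage switches (assumed meaning)
    """
    if m >= 2 * n - 1:
        return "strictly nonblocking"
    if m >= n:
        return "rearrangeably nonblocking"
    return "blocking"
```

With n = m = 2, the setting used to derive the Beneš network, the function reports a rearrangeably nonblocking network.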

21
Beneš network: derived from the Clos network by
setting n = m = 2 and k = N/2, and recursively
decomposing the two (N/2) × (N/2) switches
22
Blocking network

(a)Omega network
Hardware complexity: O(n log n)
12 switches
(b)Baseline network
These two networks are isomorphic

23
Communication in multiprocessor System

Routing on multistage interconnection networks
Looping algorithm
Recursive method used to set the switches of a
Beneš network.
Group theory is necessary!

24
Result of the looping algorithm: (a) after one
iteration, (b) final results
25
(a) Successful and (b) unsuccessful routing on the
Omega network
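
Why routing on a blocking network can fail is easy to see in a small simulation. The sketch below assumes the common destination-tag scheme: each stage applies a perfect shuffle, and its 2 x 2 switch then picks the upper or lower output from one bit of the destination address, most significant bit first. The slides do not spell out the routing rule, so this scheme and the name `route_omega` are assumptions.

```python
def route_omega(pairs, n_bits):
    """Try to route all (source, destination) pairs at once.

    Returns True if no two messages ever need the same switch output,
    False as soon as a conflict (blocking) occurs.
    """
    size = 1 << n_bits
    positions = [src for src, _ in pairs]
    for stage in range(n_bits):
        used = set()
        for idx, (_, dst) in enumerate(pairs):
            p = positions[idx]
            # Perfect shuffle: rotate the n_bits-bit line number left by one.
            p = ((p << 1) | (p >> (n_bits - 1))) & (size - 1)
            # The switch output is chosen by the current destination bit.
            bit = (dst >> (n_bits - 1 - stage)) & 1
            p = (p & ~1) | bit
            if p in used:
                return False  # two messages want the same output line
            used.add(p)
            positions[idx] = p
    return True
```

On an 8 x 8 network (`n_bits=3`), routing every input to the output with the same number succeeds, while `route_omega([(0, 0), (4, 1)], 3)` fails: inputs 0 and 4 meet in the same first-stage switch and both need its upper output.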
26
Memory Organization in Multiprocessor System

Shared Memory
Interleaving
High-order
Low-order
UMA system with shared memory

27
Address ranges using high-order and low-order
interleaving

High-order
Low-order
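
The difference between the two schemes comes down to which address bits select the memory module. A minimal sketch, assuming equal-sized modules and word addressing; the function names are illustrative.

```python
def high_order(address, module_size):
    """High-order bits select the module, so consecutive
    addresses stay inside one module until it is full."""
    return address // module_size, address % module_size

def low_order(address, n_modules):
    """Low-order bits select the module, so consecutive
    addresses rotate across the modules."""
    return address % n_modules, address // n_modules

# Four modules of four words each: where do addresses 4..7 live?
high = [high_order(a, 4)[0] for a in range(4, 8)]
low = [low_order(a, 4)[0] for a in range(4, 8)]
```

Here `high` places all four addresses in module 1, while `low` spreads them over modules 0 to 3, which is why low-order interleaving lets consecutive accesses proceed in different modules.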

28
Cache Coherence

Cache coherence problem
Each processor has its local cache
Solution
Program compilation: mark shared data as
non-cacheable
Cache directory (cache controller)
Snooping: each cache monitors memory activity on
the bus
MESI protocol
Modified, Exclusive, Shared, Invalid
Four memory access scenarios
Read hit or miss
Write hit or miss
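
The four scenarios can be traced with a small state machine. This is a simplified sketch for a single memory line, assuming snooping caches on a shared bus; it tracks only the MESI state letters, not the data or write-back traffic, and the class name `MesiLine` is illustrative.

```python
class MesiLine:
    """MESI state of one memory line in each processor's cache."""

    def __init__(self, n_caches):
        self.state = ["I"] * n_caches

    def read(self, cpu):
        if self.state[cpu] != "I":
            return  # read hit: M, E and S are unchanged
        # Read miss: other caches holding the line snoop the request.
        holders = [i for i, s in enumerate(self.state)
                   if i != cpu and s != "I"]
        for i in holders:
            self.state[i] = "S"  # an M copy is written back first
        self.state[cpu] = "S" if holders else "E"

    def write(self, cpu):
        # Write hit or miss: every other copy is invalidated and
        # the writer ends up with the only, modified copy.
        for i in range(len(self.state)):
            if i != cpu:
                self.state[i] = "I"
        self.state[cpu] = "M"
```

A short trace with two caches: a read by processor 0 gives E/I, a read by processor 1 gives S/S, and a write by processor 1 gives I/M.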

29
Multiprocessor

Load balancing
Distributing tasks among processors to maximize
processor utilization
Graceful degradation of performance
Ensure that the system continues to function
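Data dependency
1 and 2 (A written by 1 is read by 2)
Data anti-dependency
2 and 3 (2 must read A before 3 overwrites it)
Data output dependency
1 and 3 (A in 1 stored before 3)

1: A ← B + C    2: D ← A + E    3: A ← F + G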
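
Dependencies among three statements of this shape can be found mechanically. The sketch below uses the usual definitions: flow (read after write), anti (write after read) and output (write after write). Each statement is reduced to the variable it writes and the set of variables it reads, so the arithmetic operator does not matter. The pairwise check ignores intervening writes, which is enough for a program this short; all names are illustrative.

```python
def dependencies(statements):
    """statements: list of (target, sources) in program order.

    Returns (kind, earlier, later, variable) tuples, numbering
    statements from 1.
    """
    found = []
    for i, (t1, s1) in enumerate(statements):
        for j in range(i + 1, len(statements)):
            t2, s2 = statements[j]
            if t1 in s2:
                found.append(("flow", i + 1, j + 1, t1))
            if t2 in s1:
                found.append(("anti", i + 1, j + 1, t2))
            if t1 == t2:
                found.append(("output", i + 1, j + 1, t1))
    return found

# Statement 1 writes A from B and C, statement 2 writes D from A and E,
# statement 3 writes A from F and G.
program = [("A", {"B", "C"}), ("D", {"A", "E"}), ("A", {"F", "G"})]
result = dependencies(program)
```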
30
Parallel Algorithms

Parallel bubble sort
The following code segment implements the
bubble sort on the n elements of array A
The parallel algorithm
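
The code segments themselves are not reproduced in this transcript. As a stand-in, the sketch below shows a sequential bubble sort next to odd-even transposition sort, one common way to parallelize it; this may differ from the slide's version. The parallel phases are simulated with an ordinary loop.

```python
def bubble_sort(a):
    """Sequential bubble sort: roughly n*n/2 compare-and-swap steps."""
    a = list(a)
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

def odd_even_sort(a):
    """Odd-even transposition sort: n phases.

    Within a phase the compared pairs are disjoint, so about n/2
    processors could handle them at the same time. Here each phase
    runs as a plain loop.
    """
    a = list(a)
    n = len(a)
    for phase in range(n):
        start = phase % 2  # even phases pair (0,1),(2,3)...; odd phases (1,2),(3,4)...
        for j in range(start, n - 1, 2):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a
```

With one processor per pair, the n phases give a running time proportional to n rather than n squared.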

31
Trace of the (a) sequential (b) parallel bubble
sort algorithms
32
Parallel Algorithms

Parallel Matrix Multiplication
the following code segment performs this matrix
multiplication
the parallel code
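
The code segments are not reproduced in this transcript. A minimal sketch of the idea, assuming one task per element of the result matrix: every element depends only on one row of the first matrix and one column of the second, so all elements can be computed independently. The thread pool only illustrates the task structure; in CPython it is not expected to give a real speedup for this kind of arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

def matmul(a, b):
    """Sequential triple loop."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c

def parallel_matmul(a, b):
    """One independent task per result element c[i][j]."""
    n, m, p = len(a), len(b), len(b[0])

    def cell(ij):
        i, j = ij
        return sum(a[i][k] * b[k][j] for k in range(m))

    cells = [(i, j) for i in range(n) for j in range(p)]
    with ThreadPoolExecutor() as pool:
        flat = list(pool.map(cell, cells))  # results come back in order
    return [flat[i * p:(i + 1) * p] for i in range(n)]
```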

33
Trace of the (a) sequential and (b) parallel
matrix multiplication algorithms
34
Alternative Parallel Architecture

Non-von Neumann machine
Dataflow computing: execution is driven by the availability of operands
Dataflow graphs
Single assignment rule

35
Single assignment rule

To avoid data dependency
No two statements write data to the same variable
An operand of one equation cannot be written to
by any later equation.
1: A ← B + C
2: B ← A + D
3: C ← A + B
4: D ← C + B
5: A ← A + C
The revised code segment and its dataflow graph

36
Dataflow graph

Token
Data that traverses an edge
Fire
A vertex is ready to execute its instruction
statement 1 has been executed
statement 2 is ready to fire
statements 3 and 5 each have one operand
available
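
Firing can be simulated in a few lines. The sketch below is an assumption-laden illustration: it uses a five-vertex graph with the same operand pattern as the five-statement example, renamed so that each result is written once (the names A1, B2, C3, D4, A5 are hypothetical), and it assumes every vertex adds its two operands.

```python
def run_dataflow(vertices, inputs):
    """vertices: {result_name: (operand1, operand2)}.

    A vertex fires once tokens for both operands exist. Vertices that
    are ready in the same step could fire in parallel.
    Returns the final tokens and the list of firing steps.
    """
    tokens = dict(inputs)
    pending = dict(vertices)
    steps = []
    while pending:
        ready = [name for name, ops in pending.items()
                 if all(op in tokens for op in ops)]
        if not ready:
            raise ValueError("no vertex can fire")
        for name in ready:
            x, y = pending.pop(name)
            tokens[name] = tokens[x] + tokens[y]  # assumed operation: addition
        steps.append(ready)
    return tokens, steps

graph = {
    "A1": ("B", "C"),
    "B2": ("A1", "D"),
    "C3": ("A1", "B2"),
    "D4": ("C3", "B2"),
    "A5": ("A1", "C3"),
}
tokens, steps = run_dataflow(graph, {"B": 1, "C": 2, "D": 3})
```

After the first step only `A1` has fired, `B2` has both operands, and `C3` and `A5` each hold one operand, which matches the situation described above.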

37
I-structure

Dataflow vertices are usually stored as
I-structures
An I-structure includes operation, its operands
(or space to store the operand when it is ready),
and a list of destinations for its result
Dataflow graph of previous graph with I-structure
statement 2, operand 1 B2, C3, D4

38
Architectures of Dataflow System

Static architecture
Allows only one token to reside on an edge at any
given time
Dynamic architecture
Allows multiple tagged tokens to reside on an edge

39
Systolic Arrays

A systolic array incorporates several processing
elements into a regular structure, such as a
linear array or mesh
A 2 x 2 systolic array to multiply two matrices
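
A systolic multiplication can be simulated cycle by cycle. The sketch below assumes one common arrangement, which may differ from the slide's figure: rows of the first matrix enter from the left, columns of the second enter from the top, each input stream is delayed by one cycle per row or column, and every processing element accumulates one element of the product. All names are illustrative.

```python
def systolic_matmul(a, b):
    """Multiply two n x n matrices on a simulated n x n systolic array."""
    n = len(a)
    acc = [[0] * n for _ in range(n)]    # running sum held in each element
    a_reg = [[0] * n for _ in range(n)]  # A value currently in each element
    b_reg = [[0] * n for _ in range(n)]  # B value currently in each element
    for t in range(3 * n - 2):           # four clock cycles when n == 2
        for i in range(n):
            # A values move one element to the right.
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i                     # row i starts i cycles late
            a_reg[i][0] = a[i][k] if 0 <= k < n else 0
        for j in range(n):
            # B values move one element down.
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j                     # column j starts j cycles late
            b_reg[0][j] = b[k][j] if 0 <= k < n else 0
        for i in range(n):
            for j in range(n):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc
```

The staggered inputs ensure that matching operands a[i][k] and b[k][j] arrive at element (i, j) in the same cycle.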

40
Trace of the four clock cycles of the
multiplication algorithm on the systolic array
41
Trace of the four clock cycles of the
multiplication algorithm on the systolic array
42
A 2 x 2 systolic array to multiply two matrices
with data values shown
43
Neural Networks

Neurons
Named after the conductors of signals in the
human nervous system
Unlike traditional computers, which are
programmed, neural networks are trained
Weighting factor
Output of each neuron is multiplied by its
weighting factor

44
Neural Networks

Threshold value
If the weighted value is greater than or equal to
the threshold value, the neuron outputs a logical
value of 1
Neural networks are being used in control systems
and artificial intelligence applications
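
The weighting and threshold rule can be written as a single function. A minimal sketch, assuming a neuron that sums its weighted inputs; the weights and threshold in the example are chosen by hand for illustration rather than learned by training.

```python
def neuron(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs reaches the threshold."""
    weighted = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted >= threshold else 0

# With both weights 1 and threshold 2, the neuron acts as a logical AND.
and_table = [neuron((x, y), (1, 1), 2) for x in (0, 1) for y in (0, 1)]
```

Lowering the threshold to 1 with the same weights turns the neuron into a logical OR.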
