Parallelism in Uniprocessor System - PowerPoint PPT Presentation

Provided by: SCF8

Title: Parallelism in Uniprocessor System


1
Chapter 9
2
Parallelism in Uniprocessor System
  • Reconfigurable arithmetic pipeline
  • Better suited for general purpose computing
  • Bi?CiDi

3
Parallelism in Uniprocessor System
  • Vector arithmetic unit
  • Perform different arithmetic operations in
    parallel

4
Vector arithmetic unit
  • How to get the data to the vector arithmetic
    unit?
  • Multiport Memory
  • Designed to allow multiple simultaneous memory
    accesses

5
Organization of Multiprocessor System
  • Flynn's classification
  • SISD: single instruction, single data
  • SIMD: single instruction, multiple data
  • MISD: multiple instruction, single data
  • MIMD: multiple instruction, multiple data

6
A generic SIMD organization
7
System Topologies
  • Topology
  • Diameter
  • Maximum distance between two processors in the
    computer system
  • Bandwidth
  • The system's total bandwidth is the capacity of a
    communications link multiplied by the number of
    such links in the system
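
The two metrics above can be computed directly from an adjacency list. The following is a minimal sketch, not part of the original slides: the helper names (diameter, total_bandwidth, ring, hypercube) are hypothetical, the graph is assumed to be connected, and every link is assumed to have the same capacity.

```python
from collections import deque


def diameter(adj):
    """Longest shortest-path length (in hops) between any two processors.

    Assumes `adj` maps each processor to its neighbours and is connected.
    """
    def eccentricity(src):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return max(dist.values())

    return max(eccentricity(s) for s in adj)


def total_bandwidth(adj, link_capacity):
    """Link capacity multiplied by the number of (undirected) links."""
    links = sum(len(neighbours) for neighbours in adj.values()) // 2
    return links * link_capacity


def ring(n):
    """Ring of n >= 3 processors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def hypercube(dim):
    """Hypercube with 2**dim processors; neighbours differ in one bit."""
    return {i: [i ^ (1 << b) for b in range(dim)] for i in range(1 << dim)}
```

For example, diameter(ring(8)) and diameter(hypercube(3)) compare two 8-processor topologies, and total_bandwidth(hypercube(3), 10) multiplies an assumed capacity of 10 units by the 12 links of a 3-dimensional hypercube.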

8
Topology
  • (a) shared bus
  • (b)ring

9
Topology
  • (c) tree
  • (d) mesh

10
Topology
  • (e) hypercube
  • (f) completely connected

11
MIMD System Architecture
  • SMP
  • UMA
  • NUMA
  • CC-NUMA
  • COMA
  • NOW, COW
  • MPP

12
Uniform memory access architecture (UMA)
13
Nonuniform memory access architecture (NUMA)
access local mem. only
14
Communication in multiprocessor System
  • Fixed Connections
  • Clustering
  • Cluster bus
  • Intercluster gateway

16
Communication in multiprocessor System
  • Reconfigurable Connection
  • Crossbar switch
  • Drawback: size
  • An n x m crossbar switch
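
The size drawback can be made concrete with a small model. This is an illustrative sketch only: the Crossbar class and its methods are hypothetical names, and the model assumes one crosspoint per input/output pair and at most one input driving each output.

```python
class Crossbar:
    """Toy model of an n x m crossbar switch (n inputs, m outputs)."""

    def __init__(self, n, m):
        self.n, self.m = n, m
        self.input_of = {}  # output port -> input port currently driving it

    def crosspoints(self):
        # One crosspoint per (input, output) pair: this product is the size cost.
        return self.n * self.m

    def connect(self, inp, out):
        """Close the crosspoint (inp, out); return False if `out` is busy."""
        if not (0 <= inp < self.n and 0 <= out < self.m):
            raise ValueError("port out of range")
        if out in self.input_of:
            return False
        self.input_of[out] = inp
        return True
```

Doubling both n and m quadruples crosspoints(), which is why larger systems turn to multistage networks.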

17
Reconfigurable Connection
  • Multistage Interconnection Networks (MINs)
  • Routing algorithm
  • By setting the switches to the correct states,
    the MIN realizes the desired connections between
    inputs and outputs

18
Nonblocking network
  • It can realize any of the n! connections.
  • Strictly nonblocking
  • If a network can modify one connection without
    changing any others
  • Rearrangeably nonblocking
  • If a network can realize a new connection, but
    may have to reroute the path used to realize an
    existing connection in order to do so.
  • Clos network
  • Beneš network

19
Clos network: originally designed for telephone
switching systems
20
Clos network
  • N = n × k inputs
  • Three stages
  • If m ≥ n, the network is rearrangeably
    nonblocking.
  • If m ≥ 2n − 1, the network is strictly nonblocking.
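
The two conditions above can be checked mechanically. The sketch below is an illustration under stated assumptions, not the slide author's code: it assumes a symmetric three-stage Clos network with k input switches of size n x m, m middle switches of size k x k, and k output switches of size m x n. The function name clos_properties is hypothetical.

```python
def clos_properties(n, m, k):
    """Return (inputs, classification, crosspoints) for a 3-stage Clos network.

    Assumes the symmetric layout: k switches of n x m, m switches of k x k,
    k switches of m x n.
    """
    inputs = n * k
    if m >= 2 * n - 1:
        kind = "strictly nonblocking"
    elif m >= n:
        kind = "rearrangeably nonblocking"
    else:
        kind = "blocking"
    crosspoints = 2 * k * n * m + m * k * k
    return inputs, kind, crosspoints
```

With n = 2 and k = 4 (8 inputs), m = 2 gives a rearrangeably nonblocking network and m = 3 gives a strictly nonblocking one, at the cost of more crosspoints.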

21
Beneš network: derived from the Clos network by
setting n = m = 2 and k = N/2, and recursively
decomposing the two (N/2) × (N/2) switches
22
Blocking network
  • (a) Omega network
  • Hardware complexity O(n log n)
  • 12 switches
  • (b) Baseline network
  • These two networks are isomorphic

23
Communication in multiprocessor System
  • Routing on multistage interconnection networks
  • Looping algorithm
  • Recursive method used to set the switches of a
    Beneš network.
  • Group theory is necessary!

24
Result of Looping Algorithm: (a) after one
iteration, (b) final results
25
(a) Successful and (b) unsuccessful routing on the
Omega network
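
Why some routings succeed and others fail on an Omega network can be sketched with destination-tag routing. The code below is a minimal illustration, not taken from the slides: it assumes n is a power of two, 2 x 2 switches, a perfect shuffle before every stage, and that each switch picks its upper or lower output from one bit of the destination address (most significant bit first). All function names are hypothetical.

```python
def omega_switch_count(n):
    """(n / 2) switches per stage times log2(n) stages, n a power of two."""
    stages = n.bit_length() - 1
    return (n // 2) * stages


def omega_path(src, dst, n):
    """Output line occupied after each stage when routing src -> dst."""
    stages = n.bit_length() - 1
    line, path = src, []
    for s in range(stages):
        bit = (dst >> (stages - 1 - s)) & 1   # destination bit, MSB first
        # Perfect shuffle (rotate left), then the switch sets the low bit.
        line = ((line << 1) & (n - 1)) | bit
        path.append(line)
    return path


def route_permutation(perm):
    """True if every src -> perm[src] connection can be set up at once."""
    n = len(perm)
    used = set()
    for src, dst in enumerate(perm):
        for stage, line in enumerate(omega_path(src, dst, n)):
            if (stage, line) in used:
                return False   # two connections need the same switch output
            used.add((stage, line))
    return True
```

For 8 inputs, omega_switch_count(8) gives the 12 switches listed on the blocking-network slide. The identity permutation routes without conflict, while a permutation that sends input 0 to output 0 and input 4 to output 1 blocks, because both connections need the same first-stage output.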
26
Memory Organization in Multiprocessor System
  • Shared Memory
  • Interleaving
  • High-order
  • Low-order
  • UMA system with shared memory

27
Address ranges using high-order and low-order
interleaving
  • High-order
  • Low-order
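
The difference between the two schemes comes down to which address bits select the memory module. A minimal sketch, assuming a power-of-two number of modules; the function names and the 4-bit address width in the usage note are illustrative choices, not values from the slides.

```python
def high_order(addr, addr_bits, module_bits):
    """High-order interleaving: the top bits select the module."""
    offset_bits = addr_bits - module_bits
    module = addr >> offset_bits
    offset = addr & ((1 << offset_bits) - 1)
    return module, offset


def low_order(addr, module_bits):
    """Low-order interleaving: the bottom bits select the module."""
    module = addr & ((1 << module_bits) - 1)
    offset = addr >> module_bits
    return module, offset
```

With 4-bit addresses and 4 modules, high-order interleaving places addresses 0 to 3 in module 0, whereas low-order interleaving places addresses 0, 4, 8 and 12 there, so consecutive addresses fall in different modules.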

28
Cache Coherence
  • Cache coherence problem
  • Each processor has its local cache
  • Solution
  • Program compilation: mark shared data as
    non-cacheable
  • Cache directory (cache controller)
  • Snooping: each cache monitors memory activity on
    the bus
  • MESI protocol
  • Modified, Exclusive, Shared, Invalid
  • Four memory access scenarios
  • Read hit or miss
  • Write hit or miss
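
The four access scenarios can be traced with a simplified state model. This sketch is an assumption-laden illustration rather than a full protocol: it tracks a single cache line, represents each cache's copy by one of the letters M, E, S, I, and ignores bus timing and write-back traffic. The function names read and write are hypothetical.

```python
def read(states, p):
    """Processor p reads the line; `states` holds one MESI letter per cache."""
    if states[p] != "I":
        return                      # read hit: no state change
    # Read miss: any cache holding the line supplies it and drops to Shared.
    holders = [q for q in range(len(states)) if q != p and states[q] != "I"]
    for q in holders:
        states[q] = "S"
    states[p] = "S" if holders else "E"


def write(states, p):
    """Processor p writes the line (hit or miss)."""
    # Snooping caches invalidate their copies; the writer ends up Modified.
    for q in range(len(states)):
        if q != p:
            states[q] = "I"
    states[p] = "M"
```

Starting from ["I", "I", "I"], a read by cache 0 yields Exclusive, a second read by cache 1 moves both to Shared, and a write by cache 1 leaves it Modified with the other copies Invalid.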

29
Multiprocessor
  • Load balancing
  • Distributing tasks among processors to maximize
    processor utilization
  • Graceful degradation of performance
  • Ensure that the system continues to function
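  • Data dependency
  • 1 and 2 (A written by 1 is read by 2)
  • Data anti-dependency
  • 2 and 3 (A read by 2 before 3 overwrites it)
  • Data output dependency
  • 1 and 3 (A in 1 stored before 3)

1: A ← B op C    2: D ← A op E    3: A ← F op G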
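
Dependencies between statements can be found by comparing what each one reads and writes. The following is a conservative pairwise sketch under the usual read-after-write, write-after-read and write-after-write definitions; it does not account for intervening redefinitions. The function name and the (destination, sources) encoding are assumptions made for illustration, and the operators are omitted because only variable names matter here.

```python
def dependencies(statements):
    """List (kind, earlier, later, variable) for each pair of statements.

    `statements` is a list of (destination, sources) tuples, numbered from 1.
    """
    found = []
    for i, (write_i, reads_i) in enumerate(statements):
        for j in range(i + 1, len(statements)):
            write_j, reads_j = statements[j]
            if write_i in reads_j:        # later statement reads earlier result
                found.append(("flow", i + 1, j + 1, write_i))
            if write_j in reads_i:        # later statement overwrites an operand
                found.append(("anti", i + 1, j + 1, write_j))
            if write_i == write_j:        # both statements write the same variable
                found.append(("output", i + 1, j + 1, write_i))
    return found
```

Applied to the three statements above, encoded as [("A", {"B", "C"}), ("D", {"A", "E"}), ("A", {"F", "G"})], the check reports one dependency of each kind, all on variable A.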
30
Parallel Algorithms
  • Parallel bubble sort
  • The following code segment implements the
    bubble sort on the n elements of array A
  • The parallel algorithm
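
The code segments referred to on this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of a sequential bubble sort next to odd-even transposition sort, a common parallel formulation of bubble sort. It is an assumption that the slide used this variant. The parallel phases are simulated sequentially; within a phase the compared pairs are disjoint, so each compare-exchange could run on its own processor.

```python
def bubble_sort(a):
    """Sequential bubble sort; returns a new sorted list."""
    a = list(a)
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a


def odd_even_transposition_sort(a):
    """Parallel-style bubble sort: n phases of independent compare-exchanges."""
    a = list(a)
    n = len(a)
    for phase in range(n):
        start = phase % 2              # even phases: (0,1),(2,3)...; odd: (1,2),(3,4)...
        for j in range(start, n - 1, 2):
            # Pairs in one phase do not overlap, so they are independent.
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a
```

The sequential version needs on the order of n² compare-exchange steps, while the transposition version needs n phases when each phase runs fully in parallel.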

31
Trace of the (a) sequential (b) parallel bubble
sort algorithms
32
Parallel Algorithms
  • Parallel Matrix Multiplication
  • the following code segment performs this matrix
    multiplication
  • the parallel code
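
The code segments for this slide are also missing from the transcript. The sketch below shows one plausible pairing, under the assumption that the parallel version assigns each element of the result to its own worker. It uses Python's standard concurrent.futures thread pool purely to express that independence; the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_sequential(a, b):
    """Triple-loop product of an n x m matrix and an m x p matrix."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_parallel(a, b, workers=4):
    """Each result element is an independent task handed to a worker."""
    n, m, p = len(a), len(b), len(b[0])

    def cell(index):
        i, j = index
        return sum(a[i][k] * b[k][j] for k in range(m))

    cells = [(i, j) for i in range(n) for j in range(p)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flat = list(pool.map(cell, cells))   # results come back in task order
    return [flat[i * p:(i + 1) * p] for i in range(n)]
```

No element of the product depends on another, which is what lets an n x n multiplication use up to n² processors at once.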

33
Trace of the (a) sequential and (b) parallel
matrix multiplication algorithms
34
Alternative Parallel Architecture
  • Non-von Neumann machine
  • Dataflow computing: execution is driven by the
    availability of operands
  • Dataflow graphs
  • Single assignment rule

35
Single assignment rule
  • To avoid data dependency
  • No two statements write data to the same variable
  • An operand of one equation cannot be written to
    by any later equation.
  • The revised code segment and its dataflow graph
  • 1: A ← B op C
  • 2: B ← A op D
  • 3: C ← A op B
  • 4: D ← C op B
  • 5: A ← A op C
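
The two conditions of the single assignment rule can be checked in one pass over a code segment. This is a minimal sketch with hypothetical names; statements are encoded as (destination, sources) tuples, and a statement that writes one of its own operands is treated as a violation, which is an interpretation rather than something the slide states.

```python
def single_assignment_violations(statements):
    """Return (statement number, variable, reason) for each rule violation."""
    violations = []
    written, read = set(), set()
    for number, (dest, sources) in enumerate(statements, start=1):
        read.update(sources)
        if dest in written:
            violations.append((number, dest, "written more than once"))
        elif dest in read:
            violations.append((number, dest, "overwrites an operand"))
        written.add(dest)
    return violations
```

For instance, a segment that assigns A twice is flagged at the second assignment, and renaming that second destination (to a hypothetical A2) clears the violation.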

36
Dataflow graph
  • Token
  • Data that traverses an edge
  • Fire
  • A vertex is ready to execute its instruction
  • statement 1 has been executed
  • statement 2 is ready to fire
  • statements 3 and 5 each have one operand
    available
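
Token movement and firing can be simulated by counting the operands that have arrived at each vertex. The sketch below is an illustration under assumptions: the edge list is one reading of the five-statement segment (each operand comes from the most recent statement that produced it, or is an initial input), every statement takes two operands, and only token counts are tracked, not values. All names are hypothetical.

```python
# Assumed edges: vertex -> vertices that consume its result.
DESTINATIONS = {1: [2, 3, 5], 2: [3, 4], 3: [4, 5], 4: [], 5: []}
NEEDED = 2  # assumption: every statement has two operands


def ready(tokens, fired):
    """Vertices whose operands have all arrived and that have not fired yet."""
    return sorted(v for v, count in tokens.items()
                  if count == NEEDED and v not in fired)


def fire(tokens, fired, vertex):
    """Execute a vertex: send one token along each outgoing edge."""
    fired.add(vertex)
    for dest in DESTINATIONS[vertex]:
        tokens[dest] += 1


def run(tokens):
    """Fire everything that is ready, step by step; return the firing order."""
    fired, order = set(), []
    while True:
        batch = ready(tokens, fired)
        if not batch:
            return order
        order.append(batch)
        for vertex in batch:
            fire(tokens, fired, vertex)
```

Starting with both operands of statement 1 and one operand of statement 2 available, firing statement 1 leaves statement 2 ready and statements 3 and 5 with one operand each, matching the state described above. Statements 4 and 5 end up ready in the same step, so they could execute in parallel.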

37
I-structure
  • Dataflow vertices are usually stored as
    I-structures
  • An I-structure includes the operation, its operands
    (or space to store each operand when it is ready),
    and a list of destinations for its result
  • Dataflow graph of the previous example with
    I-structures
  • statement 2, operand 1 B2, C3, D4
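
The three parts of an I-structure map naturally onto a small record type. The following is a sketch only: the field names, the (vertex, operand slot) encoding of destinations, and the add and multiply operations in the usage note are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class IStructure:
    operation: Callable[[float, float], float]
    # Operand slots stay None until the corresponding token arrives.
    operands: List[Optional[float]] = field(default_factory=lambda: [None, None])
    # Each destination is (vertex id, operand slot) for the result.
    destinations: List[Tuple[int, int]] = field(default_factory=list)

    def ready(self) -> bool:
        return all(value is not None for value in self.operands)


def fire(store: Dict[int, IStructure], vertex_id: int) -> float:
    """Execute a ready vertex and forward its result to every destination."""
    node = store[vertex_id]
    if not node.ready():
        raise ValueError("vertex is missing an operand")
    result = node.operation(*node.operands)
    for dest, slot in node.destinations:
        store[dest].operands[slot] = result
    return result
```

As a usage example, a vertex 1 that adds 2 and 3 and lists (2, 0) as its destination fills operand slot 0 of vertex 2 when it fires; if vertex 2 already holds its other operand, it becomes ready at that moment.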

38
Architectures of Dataflow System
  • Static architecture
  • Allows only one token to reside on an edge at any
    given time
  • Dynamic architecture
  • Allows multiple tagged tokens to reside on an edge

39
Systolic Arrays
  • A systolic array incorporates several processing
    elements into a regular structure, such as a
    linear array or mesh
  • A 2 x 2 systolic array to multiply two matrices
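
A systolic matrix multiplication can be simulated cycle by cycle. The sketch below assumes one common arrangement, which may differ from the figure on the slide: each processing element (PE) keeps one element of the result, rows of A enter from the left and columns of B from the top, each skewed by one cycle per row or column, and values move one PE per clock. The function name is hypothetical.

```python
def systolic_matmul(a, b):
    """Multiply two n x n matrices on a simulated n x n systolic array."""
    n = len(a)
    acc = [[0] * n for _ in range(n)]      # result element held by each PE
    a_reg = [[0] * n for _ in range(n)]    # A value currently at each PE
    b_reg = [[0] * n for _ in range(n)]    # B value currently at each PE
    for t in range(3 * n - 2):             # 4 clock cycles when n = 2
        for i in range(n):
            for j in range(n - 1, 0, -1):  # shift A values one PE to the right
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i                      # row i is delayed by i cycles
            a_reg[i][0] = a[i][k] if 0 <= k < n else 0
        for j in range(n):
            for i in range(n - 1, 0, -1):  # shift B values one PE downward
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j                      # column j is delayed by j cycles
            b_reg[0][j] = b[k][j] if 0 <= k < n else 0
        for i in range(n):
            for j in range(n):             # every PE multiplies and accumulates
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc
```

For a 2 x 2 array the loop runs for four clock cycles, after which each PE holds one element of the product.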

40
Trace of the four clock cycles of the
multiplication algorithm on the systolic array
41
Trace of the four clock cycles of the
multiplication algorithm on the systolic array
42
A 2 x 2 systolic array to multiply two matrices
with data values shown
43
Neural Networks
  • Neurons
  • Named after the conductors of signals in the
    human nervous system
  • Unlike traditional computers, which are
    programmed, neural networks are trained
  • Weighting factor
  • Output of each neuron is multiplied by its
    weighting factor

44
Neural Networks
  • Threshold value
  • If the weighted value is greater than or equal to
    the threshold value, the neuron outputs a logical
    value of 1
  • Neural systems are being used in control systems
    and artificial intelligence applications