Chapter 6 Multiprocessor System
Introduction
Each processor in a multiprocessor system can be executing a different instruction at any time.
The major advantages of an MIMD system
Reliability
High performance
The overhead involved with MIMD
Communication between processors
Synchronization of the work
Waste of processor time if any processor runs out of work to do
Processor scheduling
Introduction (continued)
Task
An entity to which a processor is assigned: a program, a function, or a procedure in execution
Process
Another word for a task
Processor (or processing element)
The hardware resource on which tasks are executed
Introduction (continued)
Thread
The sequence of tasks performed in succession by a given processor
The path of execution of a processor through a number of tasks.
Multiprocessors provide for the simultaneous presence of a number of threads of execution in an application.
Refer to Example 6.1 (degree of parallelism = 3)
R-to-C ratio
A measure of how much overhead is produced per unit of computation.
R: the run time of the task (computation time)
C: the communication overhead
This ratio signifies task granularity
A high R-to-C ratio implies that communication overhead is insignificant compared to computation time.
Task granularity
Task granularity
Coarse grain parallelism
High R-to-C ratio
Fine grain parallelism
Low R-to-C ratio
The general tendency in striving for maximum performance is to resort to the finest possible granularity, thereby providing for the highest degree of parallelism.
Maximum parallelism, however, also leads to maximum overhead, so a trade-off is required to reach an optimum level.
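As a purely hypothetical illustration (the numbers below are not taken from the text or from Example 6.1): a task with R = 1000 time units of computation and C = 10 time units of communication has an R-to-C ratio of 100, i.e. coarse-grain parallelism in which overhead is negligible; cutting the same work into tasks with R = 20 and C = 10 gives a ratio of 2, i.e. fine-grain parallelism in which a third of each task's elapsed time is communication overhead.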
6.1 MIMD Organization (Figure 6.2)
Two popular MIMD organizations
Shared memory (or tightly coupled) architecture
Message passing (or loosely coupled) architecture
Shared memory architecture
UMA (uniform memory access)
Rapid memory access
Memory contention
6.1 MIMD Organization (continued)
Message-passing architecture
Distributed memory MIMD system
NUMA (nonuniform memory access)
Heavy communication overhead for remote memory access
No memory contention problem
Other models
A mix of the two architectures
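The contrast between the two organizations can be sketched in code. The fragment below is illustrative only: it assumes POSIX threads, which the chapter does not prescribe, and the comments indicate how the same exchange would look in a message-passing system.

```c
/* A minimal sketch (assumption: POSIX threads; not from the text) of the
 * shared-memory, tightly coupled style: threads communicate simply by
 * reading and writing a variable in the single shared address space.
 * In a message-passing, loosely coupled system each node would have only
 * its own local memory, and the update below would instead be an explicit
 * send/receive over the interconnection network. */
#include <pthread.h>
#include <stdio.h>

static long shared_sum = 0;                        /* lives in the shared memory */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    long my_value = (long)arg;
    pthread_mutex_lock(&lock);                     /* access control for the shared data */
    shared_sum += my_value;                        /* direct access: no message needed */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)10L);
    pthread_create(&t2, NULL, worker, (void *)32L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared_sum = %ld\n", shared_sum);      /* prints 42 */
    return 0;
}
```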
6.2 Memory Organization
Two parameters of interest in MIMD memory system design
bandwidth
latency.
Memory bandwidth is increased and latency reduced in two ways:
by building the memory system with multiple independent memory modules (banked and interleaved memory architectures)
by reducing the memory access and cycle times
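A small sketch of the first idea, with an assumed module count (the module count and address width are not from the text): with low-order interleaving, consecutive addresses fall into different modules, so several accesses can be serviced at once and the effective bandwidth rises.

```c
/* Illustrative sketch only: low-order interleaving of addresses across
 * NUM_MODULES independent memory modules (banks). */
#include <stdio.h>

#define NUM_MODULES 8          /* assumed; a power of two keeps the mapping cheap */

static unsigned module_of(unsigned addr) { return addr % NUM_MODULES; } /* which bank */
static unsigned offset_in(unsigned addr) { return addr / NUM_MODULES; } /* word within the bank */

int main(void)
{
    /* consecutive addresses land in different modules, so they can be
       accessed simultaneously */
    for (unsigned addr = 0; addr < 16; addr++)
        printf("address %2u -> module %u, offset %u\n",
               addr, module_of(addr), offset_in(addr));
    return 0;
}
```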
Multi-port memories
Figure 6.3 (b)
Each memory module is a three-port memory device.
All three ports can be active simultaneously.
The only restriction is that only one port can write data into a given memory location at a time.
Cache incoherence
The problem wherein the value of a data item is not consistent throughout the memory system.
Write-through
A processor updates the cache and also the corresponding entry in the main memory.
Updating protocol
Invalidating protocol
Write-back
An updated cache-block is written back to the main memory just before that block is replaced in the cache.
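The two write policies can be sketched for a single cache line as below. This is a schematic illustration only: the structure and helper names are invented for the sketch, and memory_write stands in for a main-memory access.

```c
/* Schematic sketch (not from the text) of write-through vs. write-back
 * for one cache line. */
#include <stdbool.h>
#include <stdio.h>

typedef struct { unsigned tag; int data; bool valid; bool dirty; } CacheLine;

static void memory_write(unsigned addr, int value)
{
    printf("main memory[%u] <- %d\n", addr, value);
}

/* Write-through: every store updates the cache AND main memory; other
 * caches can then be kept coherent by updating or invalidating their copy. */
static void store_write_through(CacheLine *line, unsigned addr, int value)
{
    line->data = value;
    memory_write(addr, value);          /* memory is always up to date */
}

/* Write-back: stores update only the cache and set the dirty bit. */
static void store_write_back(CacheLine *line, int value)
{
    line->data = value;
    line->dirty = true;                 /* main memory is now stale */
}

/* Memory is updated only just before the block is replaced. */
static void evict(CacheLine *line, unsigned addr)
{
    if (line->valid && line->dirty)
        memory_write(addr, line->data); /* write back on replacement */
    line->valid = false;
    line->dirty = false;
}

int main(void)
{
    CacheLine a = { .tag = 0, .valid = true };
    store_write_through(&a, 100, 7);    /* memory written immediately */
    store_write_back(&a, 9);            /* cache only; dirty bit set */
    evict(&a, 100);                     /* stale memory updated now */
    return 0;
}
```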
6.2 Memory Organization (continued)
Cache coherence schemes
Do not use private caches (Figure 6.4)
Use private caches, but cache only non-shareable data items
Cache flushing
Shared data are allowed to be cached only when it is known that only one processor will be accessing the data
6.2 Memory Organization (continued)
Cache coherence schemes (continued)
Bus watching (or bus snooping) (Figure 6.5)
Bus watching schemes incorporate hardware into each processor's cache controller that monitors the shared bus for data LOAD and STORE operations.
Write-once
The first STORE causes a write-through to the main memory.
Ownership protocol
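A simplified state machine for the write-once idea is sketched below, purely as an illustration: the four states follow the usual textbook description of the protocol (Invalid, Valid, Reserved, Dirty), while the function and bus-helper names are made up for this sketch.

```c
/* Simplified, illustrative write-once state machine for one cached block. */
#include <stdio.h>

typedef enum { INVALID, VALID, RESERVED, DIRTY } LineState;

static void bus_write_through(void) { printf("bus: write word through to memory\n"); }
static void bus_invalidate(void)    { printf("bus: invalidate other cached copies\n"); }

/* Transition taken by the local cache controller when its processor
 * stores to a block it already holds. */
static LineState local_write(LineState s)
{
    switch (s) {
    case VALID:                      /* first STORE: write through once...   */
        bus_write_through();
        bus_invalidate();
        return RESERVED;             /* ...and remember that we did so       */
    case RESERVED:                   /* second STORE: keep the block local   */
    case DIRTY:                      /* later STOREs: this cache owns it     */
        return DIRTY;
    case INVALID:
    default:                         /* miss: the block would be fetched first */
        return s;
    }
}

/* A snooped STORE by another processor invalidates our copy. */
static LineState snooped_write(LineState s) { (void)s; return INVALID; }

int main(void)
{
    LineState s = VALID;
    s = local_write(s);              /* VALID    -> RESERVED (one write-through) */
    s = local_write(s);              /* RESERVED -> DIRTY    (no bus traffic)    */
    s = snooped_write(s);            /* DIRTY    -> INVALID                      */
    printf("final state = %d\n", s);
    return 0;
}
```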
6.3 Interconnection Network
Bus (Figure 6.6)
Bus window (Figure 6.7(a))
Fat tree (Figure 6.7 (b))
Loop or ring
token ring standard
Mesh
6.3 Interconnection Network (continued)
Hypercube
Routing is straightforward (see the routing sketch after this list).
The number of nodes must be increased in powers of two.
Crossbar
It offers multiple simultaneous communications but at a high hardware complexity.
Multistage switching networks
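As a small illustration of why hypercube routing is straightforward (this sketch is not from the text; node labels are the usual binary addresses): XOR the current and destination node numbers and correct one differing bit, i.e. cross one dimension, per hop.

```c
/* Minimal sketch of dimension-ordered (e-cube) routing in a hypercube. */
#include <stdio.h>

static void route(unsigned src, unsigned dst)
{
    unsigned node = src;
    printf("%u", node);
    while (node != dst) {
        unsigned diff = node ^ dst;          /* bits where the addresses differ */
        unsigned dim  = diff & (~diff + 1);  /* lowest differing dimension      */
        node ^= dim;                         /* cross one link of the hypercube */
        printf(" -> %u", node);
    }
    printf("\n");
}

int main(void)
{
    route(0, 5);   /* in a 3-cube: 000 -> 001 -> 101, one hop per differing bit */
    return 0;
}
```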
6.4 Operating System Considerations
The major functions of the multiprocessor operating system
Keeping track of the status of all the resources at all time
Assigning tasks to processors in a justifiable manner
Spawning and creating new processes such that they can be executed in parallel or independently of each other.
Collecting their individual results when all the spawned processes are completed and passing them on as required.
6.4 Operating System Considerations (continued)
Synchronization mechanisms
Processes in an MIMD system operate in a cooperative manner, and a sequence control mechanism is needed to ensure the ordering of operations.
Processes compete with each other to gain access to shared data items.
An access control mechanism is needed to maintain orderly access
6.4 Operating System Considerations (continued)
Synchronization mechanisms
The most primitive synchronization techniques (a sketch of test-and-set and fetch-and-add follows this slide)
Test-and-set
Semaphores
Barrier synchronization
Fetch-and-add
Heavy-weight process and Light-weight process
Scheduling
Static
Dynamic load balancing
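A brief sketch of the first and last primitives above, expressed with C11 atomics purely for illustration; on a real multiprocessor these are indivisible read-modify-write instructions on shared memory.

```c
/* Illustrative only: test-and-set spin lock and fetch-and-add via C11 atomics. */
#include <stdatomic.h>
#include <stdio.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;   /* test-and-set lock word   */
static atomic_int  counter = 0;               /* target of fetch-and-add  */

static void acquire(void)
{
    /* the atomic test-and-set both reads the flag and sets it in one
       indivisible step; spin until the old value was "clear" */
    while (atomic_flag_test_and_set(&lock))
        ;                                     /* busy-wait */
}

static void release(void)
{
    atomic_flag_clear(&lock);
}

int main(void)
{
    acquire();                                /* enter critical section            */
    int old = atomic_fetch_add(&counter, 1);  /* fetch-and-add: returns prior value */
    release();
    printf("old = %d, counter = %d\n", old, atomic_load(&counter));
    return 0;
}
```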
6.5 Programming (continued)
Four main structures of parallel programming (a threads-based sketch follows this list)
Parbegin / parend
Fork / join
Doall
Processes, tasks, procedures, and so on can be declared for parallel execution.
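A rough mapping of these constructs onto POSIX threads (an assumption; the text does not name a particular library): pthread_create plays the role of fork, pthread_join the role of join, and spawning one thread per loop iteration approximates a doall.

```c
/* Illustrative fork/join and doall-style loop using POSIX threads. */
#include <pthread.h>
#include <stdio.h>

#define N 4

static void *body(void *arg)
{
    long i = (long)arg;
    printf("iteration %ld runs in parallel\n", i);
    return NULL;
}

int main(void)
{
    pthread_t t[N];
    for (long i = 0; i < N; i++)           /* doall: spawn every iteration (fork) */
        pthread_create(&t[i], NULL, body, (void *)i);
    for (long i = 0; i < N; i++)           /* join: wait for all iterations       */
        pthread_join(t[i], NULL);
    return 0;
}
```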
6.6 Performance Evaluation and Scalability
Performance evaluation
Speed-up: S = Ts / Tp
Overhead: To = P·Tp - Ts, so Tp = (To + Ts)/P
Hence S = P·Ts / (To + Ts)
Efficiency: E = S/P = Ts/(Ts + To) = 1/(1 + To/Ts)
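As a hypothetical numerical check of these formulas (the values are invented): with Ts = 100, P = 4 and To = 25, Tp = (25 + 100)/4 = 31.25, so S = 100/31.25 = 3.2 and E = S/P = 0.8 = 1/(1 + 25/100).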
Scalability
Scalability: the ability to increase speedup as the number of processors increases.
A parallel system is scalable if its efficiency can be maintained at a fixed value by increasing the number of processors as the problem size increases.
Time-constrained scaling
Memory-constrained scaling
Isoefficiency function
E = 1/(1 + To/Ts)
so To/Ts = (1 - E)/E.
Hence Ts = E·To/(1 - E)
For a given value of E, E/(1 - E) is a constant, K.
Then Ts = K·To (the isoefficiency function)
A small isoefficiency function indicates that small increments in problem size are sufficient to maintain efficiency when P is increased.
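Continuing the hypothetical numbers used above: to hold E at 0.8, K = E/(1 - E) = 4, so the problem must grow such that Ts = 4·To; if adding processors doubles To from 25 to 50 time units, Ts must grow from 100 to 200 to keep the efficiency at 0.8.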
6.6 Performance Evaluation and Scalability (continued)
Performance models
The basic model
Each task is equal and takes R time units to be executed on a processor.
If two tasks on different processors wish to communicate with each other, they do so at a cost of C time units.
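An illustrative calculation with assumed numbers (these are not the figures used in the text's examples): 16 such tasks of R = 10 time units each take 160 units on one processor; spread over 4 processors with one inter-processor communication of C = 2 per task, the elapsed time is roughly (16/4) · (10 + 2) = 48 units, a speedup of about 3.3 rather than 4, limited by the R-to-C ratio of 5.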