Transcript and Presenter's Notes

Title: CIS 3718


1
Operating Systems
Chapter 11
CIS 3718
  • Chapter 11 Parallel Computational View
  • Machine Architecture Types
  • Fault Tolerance
  • Complexity Issues
  • Detecting Parallelism
  • Parallel O/S Organization

2
Introduction
Chapter 11
CIS 3718
This chapter is primarily concerned with parallel
processing concepts on machines with multiple
similar or identical processors sharing a common
memory. There are numerous computer projects
which involve highly parallel supercomputers.
These projects usually concern themselves with
expert systems, artificial intelligence,
strategic planning, and other computationally
intensive research. The goal of such projects is
to push computing power to its practical limits.
An example of a supercomputer which involves
parallelism is the CRAY supercomputer. Such
machines are usually designed to favor
CPU-intensive processes.
3
Architecture Classifications
Chapter 11
CIS 3718
SISD (Single Instruction, Single Data Stream): processes one instruction at a time, on data from one data stream at a time
SIMD (Single Instruction, Multiple Data Stream): an array processor; performs the same operation simultaneously on every array element
MISD (Multiple Instruction, Single Data Stream): a currently unused architecture
MIMD (Multiple Instruction, Multiple Data Stream): multiprocessors
4
Pipelining
Chapter 11
CIS 3718
Pipelining: a technique used in multiprocessing
which allows several different machine language
instructions to be active, in various stages of
execution, at the same time. A simple
implementation of pipelining is a work-ahead
system which fetches and starts decoding the next
(or next several) machine language instructions
while the current one is still executing.
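As a rough illustration of the work-ahead idea (a minimal Python sketch of my own, not anything from the slides; the instruction names are invented), the loop below keeps up to three instructions in flight at once, one per stage:

    # 3-stage fetch/decode/execute pipeline, simulated one cycle at a time
    def run_pipeline(instructions):
        stream = iter(instructions)
        fetched = decoded = executing = None
        cycle = 0
        while True:
            # Each cycle every stage hands its instruction to the next,
            # so up to three instructions are active simultaneously.
            executing, decoded, fetched = decoded, fetched, next(stream, None)
            if executing is None and decoded is None and fetched is None:
                break  # the pipeline has drained
            cycle += 1
            print(f"cycle {cycle}: fetch={fetched} decode={decoded} execute={executing}")

    run_pipeline(["LOAD R1", "ADD R1,R2", "STORE R1", "JMP L1"])

With four instructions the simulation takes six cycles, but from cycle 3 onward one instruction completes per cycle.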
5
Vector Processing
Chapter 11
CIS 3718
A vector process requires a special vector
instruction which specifies the operation to be
performed and a list of operands (vectors) on
which the instruction is to operate. Vector
processing requires pipelined hardware.
Example vector instruction: (w * x) + (y - z) + (a * b)
Operand vector: (3, 4, 8, 3, 7, 1)
The CRAY 1, for example, had 13 pipelines
operating in parallel which could perform simple
arithmetic functions.
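To make the idea concrete, here is a minimal sketch using NumPy (my assumption, not something the slides use; the second operand vector is invented for the demo). A single expression operates on whole vectors, the way one vector instruction streams operand pairs through a pipelined functional unit:

    import numpy as np

    u = np.array([3.0, 4.0, 8.0, 3.0, 7.0, 1.0])   # the slide's operand vector
    v = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # invented second operand
    print(u * v + 1.0)   # elementwise multiply then add, with no explicit loop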
6
Array Processing
Chapter 11
CIS 3718
Array processing is a SIMD (single instruction,
multiple data stream) architecture. The same
instruction is performed simultaneously on all
the elements of an array. Array processors have
been developed which can easily process arrays
with hundreds of thousands of elements
simultaneously.
Disadvantages:
  • no advantage for programs which use small
    arrays or no arrays at all
  • no advantage if only specific elements of the
    array need to be manipulated (cannot act on
    items selectively; see the sketch below)
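The second disadvantage can be seen in a short sketch (assuming NumPy as a stand-in for SIMD hardware): even a "selective" update has to be phrased as a mask over the whole array, so every element is still processed.

    import numpy as np

    arr = np.arange(8) * 2              # SIMD-style: one operation, all elements
    mask = arr > 8                      # selecting just some elements...
    arr = np.where(mask, arr + 100, arr)  # ...still touches every element
    print(arr)                          # [  0   2   4   6   8 110 112 114]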
7
Data Flow Computers
Chapter 11
CIS 3718
Data flow computers perform many operations in
parallel by breaking an instruction down into a
series of statements which can be executed
together (in parallel).
Example: (w * x) + (y * z) + (a * b)
(w * x), (y * z), and (a * b) would all be done
in parallel, producing intermediate results c, d,
and f; then (c + d) produces e; then (e + f)
completes the expression. This uses parallel
processing to reduce the number of sequential
steps needed.
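A minimal sketch of the same evaluation, with Python threads standing in for dataflow hardware (an analogy, not the actual machine): the three products run concurrently because none depends on another, and the additions wait only on their own operands.

    from concurrent.futures import ThreadPoolExecutor

    w, x, y, z, a, b = 3, 4, 8, 3, 7, 1
    with ThreadPoolExecutor() as pool:
        c = pool.submit(lambda: w * x)   # these three fire in parallel
        d = pool.submit(lambda: y * z)
        f = pool.submit(lambda: a * b)
        e = c.result() + d.result()      # (c + d)
        total = e + f.result()           # (e + f)
    print(total)   # 12 + 24 + 7 = 43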
8
Advantages of Multiprocessors
Chapter 11
CIS 3718
If one processor fails, the failure can be
detected and the operating system notified. Once
the operating system has determined that the
processor can no longer be assigned to a process,
the remaining processors assume and distribute
the load. Systems which incorporate such
techniques are called fault tolerant systems.
Utilizing massively parallel processors enables
increased computing power at lower cost by
combining many processors rather than using one
high-speed, high-cost processor.
9
Fault Tolerance
Chapter 11
CIS 3718
  • Systems should be able to withstand hardware
    failures, i.e., detect them, recover from them,
    and continue with normal processing.
  • Mission-critical systems must be able to continue
    without significant interruption and therefore
    must be designed as fault tolerant.
  • Fault tolerance is required in situations where
  • human intervention is not possible (space probe)
  • human intervention would be too slow to stop a
    disaster

10
Fault Tolerance
Chapter 11
CIS 3718
  • Techniques used to facilitate fault tolerance:
  • maintain duplicate of critical data or resources
    (optimally) in multiple physically separate
    locations
  • be able to run subsets of the hardware as
    effectively and efficiently as the entire set of
    hardware
  • any detected hardware problem should be able to
    be corrected without significantly affecting or
    halting the system
  • use idle processors to check for potential
    (possible future) problems and prevent them

11
Complexity Issues
Chapter 11
CIS 3718
Parallel processing is not a panacea. It allows
solvable problems to be completed in less time;
it does not permit the solving of extremely
complex problems whose size demands exponential
leaps in processing power. For example,
NP-complete (non-deterministic polynomial)
problems are problems whose computation time is
an exponential function of the size of the
problem. Parallel processors and the power they
bring are of little help in solving such problems
because the problems are intractable.
12
Complexity Issues
Chapter 11
CIS 3718
Intractable problems are problems which resist
attempts to direct, control, shape, improve, or
modify them for computational speed. Other
problems, although large, are not as forbidding:
where computation time is expressed as a
polynomial in the problem size, the problem can
benefit from parallel processing. Such problems
are called tractable problems.
13
Detecting Parallelism
Chapter 11
CIS 3718
Detecting parallelism has to do with who or what
determines that a problem is suitable for solving
with parallel processing.
Explicit parallelism: a programmer/analyst
determines which items, problems, etc. can be
processed in parallel and states this explicitly
in his/her code using the programming language's
parallel construct (parbegin ... parend).
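A rough Python analogue of the construct (threads stand in for parbegin/parend; task1 and task2 are invented placeholders):

    import threading

    def task1(): print("statement 1")
    def task2(): print("statement 2")

    # parbegin
    threads = [threading.Thread(target=t) for t in (task1, task2)]
    for t in threads:
        t.start()
    # parend: continue only after every parallel branch has finished
    for t in threads:
        t.join()
    print("after parend")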
14
Detecting Parallelism
Chapter 11
CIS 3718
  • Disadvantages of explicit parallelism
  • defeats the purpose of having an operating system
    because it is supposed to be our resource manager
  • time consuming for programmer/analyst
  • error prone
  • difficult to test and debug
  • difficult to maintain/modify
  • usually results in only the most obvious
    situations being coded for parallel processing

15
Detecting Parallelism
Chapter 11
CIS 3718
Implicit parallelism: operating system
algorithms, with the aid of special compilers and
computer hardware (special registers, pipelines,
etc.), determine which processes can run in
parallel. When compilers are used to detect
parallelism, two common techniques are used to
detect and identify parallel structures:
Loop Distribution
Tree Height Reduction
16
Loop Distribution
Chapter 11
CIS 3718
Loop distribution detects when the statements in
a loop body are such that they may be able to be
performed in parallel.
Example:
for x = 1 to 5
    a(x) = b(x) + c(x)
next x
Sequential processing causes 5 cycles through the
processor, one for each iteration of the loop.
Parallel processing would perform all iterations
of the loop simultaneously (as long as 5
processors were available). When the compiler
detects such loops, it automatically converts the
code into a section of parallel code.
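The transformation amounts to something like the following sketch, with a thread pool standing in for the five processors (the array values are invented):

    from concurrent.futures import ThreadPoolExecutor

    b = [10, 20, 30, 40, 50]
    c = [1, 2, 3, 4, 5]
    a = [0] * 5

    def iteration(x):
        a[x] = b[x] + c[x]   # the loop body for index x

    # all five independent iterations at once, instead of 5 sequential cycles
    with ThreadPoolExecutor(max_workers=5) as pool:
        pool.map(iteration, range(5))
    print(a)   # [11, 22, 33, 44, 55]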
17
Tree Height Reduction
Chapter 11
CIS 3718
This method is used to detect parallelism in
algebraic expressions and to produce object code
which indicates which operations may be performed
in parallel. It works like the data flow computer
example shown previously, except that here the
compiler is detecting and indicating the
parallelism. Tree height reduction takes
advantage of mathematical properties such as the
associative and commutative laws.
Associative law (addition/multiplication):
((a + b) + c) + d is the same as (a + b) + (c + d)
Commutative law (addition/multiplication):
a + b is the same as b + a
Recall the quadratic equation example from
Chapter 4; it uses a form of tree height
reduction.
18
Tree Height Reduction
Chapter 11
CIS 3718
Tree height reduction would determine that
processes S1, S2, S3, and S4 could be performed
in parallel.
x = (-b + (b^2 - 4*a*c)^0.5) / (2*a)
[Tree diagram: the evaluation tree for the
expression above, with nodes S1 through S9; S1-S4
form the bottom level and are independent of one
another.]
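The payoff of rebalancing can be shown with a small sketch (mine, not from the slides): using the associative law, a chain of n - 1 dependent additions becomes about log2(n) rounds of independent pair-sums, and a parallel machine could do each round in one step.

    def tree_reduce(values):
        level = list(values)
        while len(level) > 1:
            # every pair in a round is independent of the others
            level = [level[i] + level[i + 1] if i + 1 < len(level) else level[i]
                     for i in range(0, len(level), 2)]
        return level[0]

    print(tree_reduce([1, 2, 3, 4, 5, 6, 7, 8]))   # 36, in 3 rounds instead of 7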
19
Never Wait Rule
Chapter 11
CIS 3718
The Never Wait Rule states that it is always
better to give a processor a task than to let it
sit idle. For example, we might:
  • use an idle processor to calculate the various
    outcomes of short, non-complex conditionals, so
    that when a particular outcome eventually occurs
    it has already been calculated (calculate both
    the T and F branches of an IF statement before
    the IF is executed; see the sketch below)
  • use an idle processor to perform system-level
    diagnostics for use in fault tolerance
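A sketch of the speculative-IF idea (Python threads as stand-ins for idle processors; the branch bodies are invented and must be side-effect-free for this to be safe):

    from concurrent.futures import ThreadPoolExecutor

    def then_branch(n): return n * 2
    def else_branch(n): return n + 100

    n = 7
    with ThreadPoolExecutor(max_workers=2) as pool:
        t = pool.submit(then_branch, n)   # both outcomes start computing...
        e = pool.submit(else_branch, n)   # ...before the test is evaluated
        result = t.result() if n % 2 == 0 else e.result()
    print(result)   # 107; the losing branch's work is simply discarded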
20
Interconnection Methods
Chapter 11
CIS 3718
  • Various methods are used to connect multiple
    processors to storage.
  • These methods consider two or more equal
    processors having a shared access to a common
    main storage, shared access to I/O channels,
    etc., ALL under the control of ONE operating
    system. We will consider each of these methods
    briefly:
  • Shared Bus
  • Crossbar Switch Matrix
  • Hypercube
  • Multistage Networks

21
Shared Bus
Chapter 11
CIS 3718
A shared bus interconnection scheme has a single
path between all devices. Access control is via
a bus interface, often built into the individual
units, or via a separate interface connected
directly to the bus, to which the unit is then
attached by cables. (In the latter case, the bus
interface is often called an interface or adapter
card.)
[Diagram: CPU 1, CPU 2, and CPU 3 attached to one
shared bus along with the shared memory.]
22
Shared Bus
Chapter 11
CIS 3718
  • A shared bus operates using an addressing scheme
    for the devices and message passing techniques.
  • Disadvantages
  • only one transmission can be handled at a time
  • if the bus fails, the entire system fails
  • system speed is determined by the speed of the
    bus
  • bus contention can cause system performance
    problems

23
Crossbar-Switch Matrix
Chapter 11
CIS 3718
This method of interconnection provides a
directly wired path from any given device to
every other device. This is a good method for
performance, but the complexity and cost involved
in constructing such a hardware switch quickly
become prohibitive for more than just a few
devices.
[Diagram: CPU 1, CPU 2, and CPU 3, each with a
dedicated switched path to the shared memory.]
24
Hypercube
Chapter 11
CIS 3718
Hypercube is a multi-dimensional interconnection
scheme which is reasonably economical. A
hypercube is implemented as a series of
squares/cubes connected at their vertices.
[Diagram: CPU 1, CPU 2, CPU 3, and memory at the
vertices of a cube.]
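A standard property of the topology (a general fact about hypercubes, not stated on the slide) is that an n-dimensional hypercube has 2^n nodes, and each node links to the n nodes whose binary label differs from its own in exactly one bit:

    def hypercube_neighbors(node, dimensions):
        # flipping each bit of the label yields one neighbor per dimension
        return [node ^ (1 << bit) for bit in range(dimensions)]

    # a 3-D hypercube is an ordinary cube: 8 nodes, 3 links per node
    for node in range(8):
        print(node, "->", hypercube_neighbors(node, 3))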
25
Multistage Network
Chapter 11
CIS 3718
Multistage network connections use hubs. A unit
can communicate with any other unit, and the
complexity of the interconnection scheme is
reduced. A message may have to travel through
several hubs (be switched several times) to get
to its destination. System performance can be
affected by contention at the hubs when an
increased amount of switching is necessary.
26
Multistage Network
Chapter 11
CIS 3718
[Diagram: groups of CPUs attached to hubs; the
hubs connect to one another and to the shared
memory.]
27
Loosely/Tightly Coupled Systems
Chapter 11
CIS 3718
Loosely-coupled multiprocessing involves
connecting two or more independent computers AND
operating systems via a communication line. For
the most part, the systems function independently
of each other, with mutual file access and
minimal offloading of processes to the other
processor. Communication is by message passing or
RPC.
Tightly-coupled systems use a single shared
storage and a single operating system which is in
control of ALL the system's resources.
Communications are handled via shared memory.
The contention point in this scheme is the shared
memory; the contention is usually alleviated by
using a combining switch, which requires only a
single access for multiple references to the same
memory location.
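The two communication styles can be contrasted in a short sketch (Python's multiprocessing module as a stand-in for real multiprocessor hardware; nothing here is specific to any actual O/S):

    from multiprocessing import Process, Queue, Value, Lock

    def loose(q):             # loosely coupled: explicit message passing
        q.put("work result")

    def tight(shared, lock):  # tightly coupled: a shared storage location
        with lock:            # the lock plays the role of access arbitration
            shared.value += 1

    if __name__ == "__main__":
        q, shared, lock = Queue(), Value("i", 0), Lock()
        for target, args in ((loose, (q,)), (tight, (shared, lock))):
            p = Process(target=target, args=args)
            p.start()
            p.join()
        print(q.get(), shared.value)   # "work result" 1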
28
Multiprocessor O/S Organizations
Chapter 11
CIS 3718
  • There are various techniques used to structure
    operating systems which deal with
    multiprocessors. We will look at these three:
  • Master/Drone
  • Separate Executives
  • Symmetrical Organization

29
Master/Drone Organization
Chapter 11
CIS 3718
Master/Drone multiprocessor organization
designates ONE of the processors as the
controlling processor (master); all other
processors are designated as drones. The master
performs I/O AND computation (only the master has
access to the operating system). Drones perform
ONLY computation (user processes only) and must
call the master for I/O. If a drone fails, there
is fault tolerance; if the master fails, the
system fails.
30
Master/Drone Organization
Chapter 11
CIS 3718
[Diagram: the master processor connected to the
I/O channel and primary storage, controlling five
drone processors.]
31
Separate Executives
Chapter 11
CIS 3718
  • In this organization, each processor has its own
    operating system.
  • Once a process is assigned to a particular
    processor, it uses that processor until
    completion.
  • Process tables and other variables that are
    shared among all operating systems must be
    controlled using mutual exclusion algorithms.
  • Failure of a single processor is not catastrophic
    (fault tolerant).
  • Drawback
  • Processors do not cooperate on the execution of a
    single process. A longer process could take
    advantage of idle processors, but this system
    does not permit a given process to run on
    anything other than its original processor.

32
Separate Executives
Chapter 11
CIS 3718
[Diagram: three processors, each running its own
operating system (O/S 1, O/S 2, O/S 3), sharing
primary storage and an I/O channel.]
33
Symmetrical Organization
Chapter 11
CIS 3718
  • In this organization method all processors are
    identical and have identical access rights to
    storage and I/O units; however, there is only ONE
    operating system.
  • Most memory conflicts are resolved by the
    hardware.
  • Conflicts involving system-wide data are resolved
    by mutual exclusion software algorithms.
  • Very reliable: the failure of one processor does
    not bring the system down. The O/S marks the
    processor as unavailable and proceeds to work
    with the remaining processors (graceful
    degradation).
  • Different processors take turns at owning the
    operating system. The processor currently in
    possession of the operating system is called the
    executive processor. Only one processor may own
    the operating system at a time. Contention may
    occur while processors wait for the operating
    system resource.
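The "one owner at a time" rule is essentially a mutual exclusion lock around the operating system, as in this sketch (threads stand in for processors; purely illustrative):

    import threading

    os_lock = threading.Lock()   # the single shared operating system

    def processor(pid):
        with os_lock:            # contend to become the executive processor
            print(f"processor {pid} is the executive, running O/S code")
        # outside the lock the processor runs ordinary user processes

    workers = [threading.Thread(target=processor, args=(i,)) for i in range(4)]
    for w in workers: w.start()
    for w in workers: w.join()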

34
Symmetrical Organization
Chapter 11
CIS 3718
[Diagram: the single O/S in primary storage, with
the processor currently running it labeled the
executive processor, alongside the other
processors and the I/O channel.]
35
Case Study: Cm*
Chapter 11
CIS 3718
Cm* uses processor-storage pairs called computer
modules. Computer modules are grouped into
clusters. Modules pass messages via intra-cluster
buses (within a cluster); clusters pass messages
via inter-cluster buses (between clusters). The
cost (time/overhead) involved in a reference to
storage depends upon whether the data is in the
module's local storage, the cluster's local
storage, or the storage of a module in another
cluster. The success of this system depends upon
ensuring that the majority of references are
local, either to the module or to the cluster.
36
Computer Modules
Chapter 11
CIS 3718
[Diagram: four computer modules, each a processor
(P) paired with storage (S), joined by an
intra-cluster bus and a cluster controller to
form a cluster; clusters communicate over the
inter-cluster bus.]
37
Computer Modules
Chapter 11
CIS 3718
A process on P6 calling for storage S6 is a
local-module reference; calling for storage S4 is
a local-cluster reference; calling for storage S1
is a remote-cluster reference.
[Diagram: three clusters C1, C2, and C3, each
with a cluster controller (CC); C1 holds P1-P3
with S1-S3, C2 holds P4-P6 with S4-S6, and C3
holds P7-P9 with S7-S9.]
38
Butterfly Switch
Chapter 11
CIS 3718
The Butterfly Switch is a MIMD (multiple
instruction, multiple data stream)
interconnection system. Each processor may
execute independently on private or shared data.
Processors may access each other's data via
shared global memory. Each processor card
contains a complete computer system, including
real memory and a large virtual address space,
and a butterfly switch. Access to local and
remote memory is almost equally fast, due to a
memory-scattering technique (storage
interleaving) that reduces contention. Each card
has a process node controller (PNC) which uses a
non-blocking packet-switching technique to send
messages to, and receive messages from, other
switches.
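Storage interleaving itself is simple to picture: consecutive addresses are scattered across banks, typically by taking the address modulo the bank count, so a burst of nearby references spreads across all memories instead of queueing on one (a generic sketch; the bank count is invented):

    NUM_BANKS = 4   # illustrative figure, not the Butterfly's actual count

    def bank_for(address):
        return address % NUM_BANKS   # consecutive addresses hit different banks

    for addr in range(8):
        print(f"address {addr} -> bank {bank_for(addr)}")
    # addresses 0..7 land on banks 0 1 2 3 0 1 2 3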
39
The Connection Machine
Chapter 11
CIS 3718
The Connection Machine is a highly parallel
supercomputer used for data- and
calculation-intensive processing. It is a SIMD
(single instruction, multiple data stream) matrix
processor which can have up to 64,000 processors
and execute billions of instructions per second.
It uses four front-end computers with packet
switching for interprocess communication. A
separate processor can be assigned to every
element in an array, so all elements can be
processed simultaneously. The front-end machine
stores and executes the instructions, while the
Connection Machine stores and operates on the
data.
for x = 1 to 50000
    a(x) = a(x) + 5
next x
40
Operating Systems
Chapter 11
CIS 3718
1/18/01 CNN.com (Computerworld): IBM and a key
supercomputing research center today announced
plans to build a pair of high-performance Linux
clusters that will be built around Intel's
Itanium and Pentium processors and provide two
teraflops of computing power for use in
scientific applications. The Pentium-based
system is scheduled to be installed next month at
the National Center for Supercomputing
Applications (NCSA) at the University of Illinois
at Urbana-Champaign. A companion setup using the
upcoming 64-bit Itanium chip is scheduled to
follow next summer. Together the two clusters
will consist of almost 700 IBM servers running
Linux. Dan Reed, NCSA director, said the
machines will provide the processing power to
allow researchers to further analyze Albert
Einstein's theory of relativity and to conduct
other research, for example, simulating the
violent collision of black holes and the
gravitational waves they produce.
41
Operating Systems
Chapter 11
CIS 3718
1/18/01 CNN.com (Computerworld), continued:
"Anytime you have a new, faster machine, it opens
up things you can explore that simply weren't
feasible before," said Reed. "Yes, you could
solve these problems on your desktop, but you may
have to wait 10,000 years to get the answer."
The initial cluster will include 512 of IBM's
eServer x330 thin servers, each equipped with two
1GHz Pentium III processors and Red Hat Inc.'s
version of Linux. Plans call for the
Itanium-based system to be outfitted with 160
servers that will run TurboLinux Inc.'s version
of the open source operating system. The two
systems will be linked together using cluster
interconnect technology developed by Myricom Inc.
in Arcadia, CA. The computer maker hopes that
the work being done at NCSA will eventually lead
to Linux-based applications for corporate users.
"It is our intention to take this work forward
into commercial settings," said IBM's Dave
Gelardi, director of IBM's Deep Computing Linux
cluster group.
42
Operating Systems
Chapter 11
CIS 3718
3/30/01 CNN.com, Richard Stenger: A new, compact
supercomputer could revolutionize the technology
industry, processing information 1,000 times
faster than conventional computers. Conducting
billions of calculations at the same time, the
machines are also much faster and more versatile
than any supercomputers on the market, according
to NASA engineers using the so-called
hypercomputers. The new high-performance
computers, developed by Star Bridge Systems in
Midvale, Utah, replace traditional central
processing units with specialty circuit-board
chips that can reconfigure themselves hundreds or
even thousands of times a second. This unique
feature maximizes the use of the millions of
transistors (or gates) on the processors, unlike
traditional processors, which use only a fraction
of their silicon for most applications. Langley
Research Center announced this week an agreement
to use one of the computers, known as the HAL
(Hyper Algorithmic Logic)-15. The NASA facility
in Hampton, VA conducts research in aeronautics,
space technology and atmospheric sciences.
43
Operating Systems
Chapter 11
CIS 3718
3/30/01 CNN.com, continued: Other customers that
will use the HAL-15 are the San Diego
Supercomputer Center, the DOD and Hollywood film
companies. Supercomputing rivals possess amazing
capabilities, but often take up entire rooms,
require constant temperature control and use lots
of cables and wires. The HAL-15 needs no more
space than a standard desktop computer and no
more electricity than a hair dryer. Star Bridge
Systems declined to discuss the price of the
HAL-15, but Langley worked out a nice deal: "It
won't cost us a penny," said Langley spokesperson
Bill Uher.
44
Operating Systems
Chapter 11
CIS 3718
End of Chapter 11