Title: Mid3 Revision 2
CS147
Lecture 21
Parallel Architectures
MIMD Machines
Since the earliest days of computing, computer scientists
have been challenging computers with larger and
larger problems. Eventually, processors were
combined in parallel to work on the same task.
This is parallel processing.
Types of Parallel Processing
- SISD: Single Instruction stream, Single Data stream
- MISD: Multiple Instruction stream, Single Data stream
- SIMD: Single Instruction stream, Multiple Data stream
- MIMD: Multiple Instruction stream, Multiple Data stream
SISD
One piece of data is sent to one processor at a time.
Example: to multiply one hundred numbers by three, each
number is sent to the processor and multiplied in turn
until all one hundred results have been calculated.
MISD
One piece of data is broken up and sent to many
processors.
[Figure: one data set divided among several CPUs, each performing a search]
Example: a database is broken up into sections of
records that are sent to several different processors,
each of which searches its section for a specific
key.
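The database example can be sketched in Python: the records are split into sections and each section is searched concurrently by a worker thread standing in for a processor. This is a minimal sketch under stated assumptions: `parallel_search`, `search_section`, the section count, and the (key, payload) record layout are all hypothetical, and threads only illustrate the division of work rather than real separate processors.

```python
from concurrent.futures import ThreadPoolExecutor


def search_section(section, key):
    """One 'processor': return the records in its section whose key matches."""
    return [record for record in section if record[0] == key]


def parallel_search(records, key, num_sections=4):
    """Hypothetical helper: split records into sections and search them concurrently."""
    size = max(1, -(-len(records) // num_sections))  # ceiling division, at least 1
    sections = [records[i:i + size] for i in range(0, len(records), size)]
    if not sections:
        return []
    with ThreadPoolExecutor(max_workers=len(sections)) as pool:
        partial_results = pool.map(lambda section: search_section(section, key), sections)
        # pool.map yields results in section order, so record order is preserved.
        return [match for part in partial_results for match in part]


records = [(k % 10, f"record-{k}") for k in range(40)]
print(parallel_search(records, 7))
```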
SIMD
Multiple processors execute the same instruction
on separate data.
Example: a SIMD machine with 100 processors could
multiply 100 numbers, each by three,
at the same time.
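A minimal Python model contrasts the SISD and SIMD versions of the multiply-by-three example, assuming one "step" per instruction issue: SISD needs one step per number, while a SIMD machine with `lanes` processors applies the same multiply to a whole group of numbers in each step. The function names and the step-counting model are assumptions for illustration; Python itself computes the lanes one after another.

```python
def sisd_multiply(numbers, factor):
    """One processor handles one data item per step."""
    results, steps = [], 0
    for x in numbers:
        results.append(x * factor)
        steps += 1
    return results, steps


def simd_multiply(numbers, factor, lanes):
    """One instruction (multiply by factor) applied to up to `lanes` items per step."""
    results, steps = [], 0
    for start in range(0, len(numbers), lanes):
        chunk = numbers[start:start + lanes]
        results.extend(x * factor for x in chunk)  # same instruction, separate data
        steps += 1
    return results, steps


numbers = list(range(1, 101))
print(sisd_multiply(numbers, 3)[1])       # 100 steps
print(simd_multiply(numbers, 3, 100)[1])  # 1 step
```

With fewer lanes than numbers, the SIMD step count is the number of lane-sized groups, e.g. 4 steps for 32 lanes.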
MIMD
Multiple processors execute different instructions
on separate data.
[Figure: four CPUs, each with its own data, performing Multiply, Search, Add, and Subtract respectively]
This is the most complex form of parallel
processing. It is used for complex simulations
such as modeling the growth of cities.
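The four-CPU MIMD figure can be mimicked with Python threads, each running a different operation (its own instruction stream) on its own data. The operations and data values are hypothetical, and CPython threads only illustrate the MIMD structure; they do not give true parallel speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor

# Each "processor" gets its own instruction stream (function) and its own data.
tasks = {
    "multiply": (lambda data: data[0] * data[1], (6, 7)),
    "search": (lambda data: data[1] in data[0], ([3, 8, 15], 8)),
    "add": (lambda data: sum(data), (1, 2, 3)),
    "subtract": (lambda data: data[0] - data[1], (10, 4)),
}

with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
    futures = {name: pool.submit(func, data) for name, (func, data) in tasks.items()}
    results = {name: future.result() for name, future in futures.items()}

print(results)
```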
The Granddaddy of Parallel Processing
MIMD
MIMD computers usually have a different program
running on every processor. This makes for a
very complex programming environment.
What's doing what, and when?
Which processor? Doing which task? At what time?
Memory latency
The time between issuing a memory fetch and
receiving the response.
Simply put, if execution proceeds before the
memory request has responded, unexpected results will
occur. What values are being used? Not the
ones requested!
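A toy cycle-count model shows why a dependent instruction must wait for a load. It assumes a load issued at cycle 0 delivers its value after `latency` cycles and that the dependent add wants to run at cycle 1; `run_dependent_add` and its parameters are hypothetical and do not describe any real processor's pipeline.

```python
def run_dependent_add(latency, stall):
    """Model: r1 = load(mem); r2 = r1 + 1.  Returns (r2, cycle the add ran)."""
    memory_value = 41     # value the load is fetching
    r1 = 0                # stale register contents before the load completes
    ready_at = latency    # load issued at cycle 0 responds at this cycle
    cycle = 1             # the add would like to execute right after the load
    if stall:
        cycle = max(cycle, ready_at)  # wait until the memory response arrives
    if cycle >= ready_at:
        r1 = memory_value             # the loaded value is now visible
    return r1 + 1, cycle


print(run_dependent_add(latency=4, stall=False))  # (1, 1): used the stale r1
print(run_dependent_add(latency=4, stall=True))   # (42, 4): correct result
```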
A similar problem can occur with instruction
executions themselves.
Synchronization: the need to enforce the ordering
of instruction executions according to their data
dependencies.
For example, if instruction a uses a result produced by
instruction b, then instruction b must occur before instruction a.
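One way to enforce "b before a" between two threads is Python's `threading.Event`: the thread running instruction a waits until the thread running instruction b signals that its result is ready. This is a sketch; the variable names and the shared dictionary standing in for registers are illustrative assumptions.

```python
import threading

shared = {}
b_done = threading.Event()
order = []


def instruction_b():
    shared["x"] = 10        # b produces x
    order.append("b")
    b_done.set()            # signal that x is ready


def instruction_a():
    b_done.wait()           # a depends on x, so it waits for b
    shared["y"] = shared["x"] * 2
    order.append("a")


t_a = threading.Thread(target=instruction_a)
t_b = threading.Thread(target=instruction_b)
t_a.start()                 # a is started first but still runs after b
t_b.start()
t_a.join()
t_b.join()
print(order, shared["y"])   # ['b', 'a'] 20
```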
Despite these potential problems, MIMD has produced
some remarkable successes.
MIMD Successes
IBM Deep Blue: a computer beats a professional chess
player.
Some may not consider this a fair example,
because Deep Blue was built to beat Kasparov
alone. It knew his play style, so it could
counter his projected moves. Still, Deep Blue's
win marked a major victory for computing.
IBM's latest: a supercomputer that models nuclear
explosions.
IBM Poughkeepsie built the world's fastest
supercomputer (at the time) for the U.S. Department of Energy.
Its job was to model nuclear explosions.
MIMD: it's the most complex, fastest, and most flexible
parallel paradigm. It has beaten a world-class chess
player at his own game. It models things that
few people understand. It is parallel processing
at its finest.
Midterm Gate Problem
[Figure: sequential circuit; labels include input I, a D flip-flop, a T flip-flop, outputs Q¹ and Q, output Y, and Clock]
Start
[Figure: initial circuit state, with a signal table headed I, Q¹, Q, Y. Values as extracted, in slide order: 0 1 0 0 0 0 1 1 0 0]
Clock Cycle 1
[Figure: circuit state, with a signal table headed I, Q¹, Q, Y. Values as extracted, in slide order: 0 1 1 1 0 0 1 0 1 1]
Note: Q outputs are dependent on the state
of the inputs present on the previous cycle.
Clock Cycle 2
[Figure: circuit state, with a signal table headed I, Q¹, Q, Y. Values as extracted, in slide order: 0 1 1 1 0 0 1 0 1 1]
Clock Cycle 3
[Figure: circuit state, with a signal table headed I, Q¹, Q, Y. Values as extracted, in slide order: 1 0 1 1 1 1 0 0 1 1]
Clock Cycle 4
[Figure: circuit state, with a signal table headed I, Q¹, Q, Y. Values as extracted, in slide order: 1 0 1 1 1 1 0 0 1 1]
Clock Cycle 5
[Figure: circuit state, with a signal table headed I, Q¹, Q, Y. Values as extracted, in slide order: 0 1 0 1 0 0 1 1 0 0]
Clock Cycle 6
[Figure: circuit state, with a signal table headed I, Q¹, Q, Y. Values as extracted, in slide order: 0 1 1 1 1 0 1 0 0 0]
Clock Cycle 7
[Figure: circuit state, with a signal table headed I, Q¹, Q, Y. Values as extracted, in slide order: 0 1 0 1 0 0 1 1 0 0]
Clock Cycle 8
[Figure: circuit state, with a signal table headed I, Q¹, Q, Y. Values as extracted, in slide order: 0 1 1 1 0 0 1 0 1 1]
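The rule that Q outputs depend on the inputs present on the previous cycle is the defining behavior of edge-triggered flip-flops. The sketch below models a generic D flip-flop (Q takes the previous cycle's D) and a generic T flip-flop (Q toggles if the previous cycle's T was 1). It does not reproduce the midterm circuit's wiring, which is not recoverable from the slides; the function names and the default starting value of 0 are assumptions.

```python
def d_flip_flop(d_per_cycle, q_start=0):
    """Return Q during each cycle: Q(t) = D(t-1)."""
    q, trace = q_start, []
    for d in d_per_cycle:
        trace.append(q)   # value visible during this cycle
        q = d             # clock edge at end of cycle latches D
    return trace


def t_flip_flop(t_per_cycle, q_start=0):
    """Return Q during each cycle: Q(t) = Q(t-1) XOR T(t-1)."""
    q, trace = q_start, []
    for t in t_per_cycle:
        trace.append(q)
        q ^= t            # toggle when T was 1 on the previous cycle
    return trace


print(d_flip_flop([1, 0, 1, 1]))  # [0, 1, 0, 1]
print(t_flip_flop([1, 1, 0, 1]))  # [0, 1, 0, 0]
```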
Some commonly used components
- Decoders: n inputs, 2^n outputs.
  - The inputs are used to select which output is
    turned on. At any time exactly one output is on.
- Multiplexors: 2^n inputs, n selection bits, 1
  output.
  - The selection bits determine which input will
    become the output.
- Adder: 2n inputs, 2n outputs.
  - Computer arithmetic.
Multiplexer
- Selects binary information from one of many
  input lines and directs it to a single output
  line.
- Also known as a selector circuit.
- Selection is controlled by a particular set of
  input lines whose number depends on the number of data
  input lines.
- For a 2^n-to-1 multiplexer, there are 2^n data
  input lines and n selection lines whose bit
  combination determines which input is selected.
MUX
[Figure: MUX block with an Enable input, 2^n data inputs, n input-select lines, and one data output]
Remember the 2-to-4 Decoder?
[Figure: 2-to-4 decoder with inputs S1, S0 and outputs Sel(3), Sel(2), Sel(1), Sel(0)]
Mutually exclusive (only one output asserted at any
time)
4-to-1 MUX
[Figure: data-flow section with data inputs D3..D0 and output Dout; control section in which a 2-to-4 decoder converts S1S0 into Sel(3..0)]
4-to-1 MUX (Gate level)
[Figure: gate-level 4-to-1 MUX with its control section]
Three of these signal inputs will always be 0.
The other will depend on the data value selected.
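The control/data-flow split of the 4-to-1 MUX can be modeled in Python: a 2-to-4 decoder turns S1S0 into Sel(3..0), each Sel line is ANDed with its data input, and the four AND outputs are ORed. Because exactly one Sel line is 1, three of the AND outputs are always 0. The function names are illustrative, and the Enable input is omitted from this sketch.

```python
def decoder_2to4(s1, s0):
    """Return [Sel(0), Sel(1), Sel(2), Sel(3)]; exactly one is 1."""
    index = (s1 << 1) | s0
    return [1 if i == index else 0 for i in range(4)]


def mux_4to1(d, s1, s0):
    """d = [D0, D1, D2, D3]; returns Dout."""
    sel = decoder_2to4(s1, s0)                       # control section
    and_outputs = [sel[i] & d[i] for i in range(4)]  # three of these are always 0
    return and_outputs[0] | and_outputs[1] | and_outputs[2] | and_outputs[3]


data = [0, 1, 1, 0]
print([mux_4to1(data, s1, s0) for s1 in (0, 1) for s0 in (0, 1)])  # [0, 1, 1, 0]
```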
Multiplexer (cont.)
- Until now, we have examined single-bit data
  selected by a MUX. What if we want to select
  m-bit data/words? Combine MUX blocks in
  parallel with common select and enable signals.
- Example: construct a logic circuit that selects
  between 2 sets of 4-bit inputs (see next slide
  for solution).
Example: Quad 2-to-1 MUX
- Uses four 2-to-1 MUXs with common select (S) and
  enable (E).
- The select line chooses between the A_i's and the B_i's. The
  selected four-wire digital signal is sent to the
  Y_i's.
- The enable line turns the MUX on and off (E = 1 is on).
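A quad 2-to-1 MUX is four 1-bit 2-to-1 MUXs sharing select and enable. This sketch assumes S = 0 selects A and S = 1 selects B, that E = 1 enables the MUX as the slide states, and that a disabled MUX outputs all zeros; real parts may use other conventions (for example an active-low enable), and the function names are hypothetical.

```python
def mux_2to1(a, b, s):
    """One-bit 2-to-1 MUX: a when s == 0, b when s == 1."""
    return (a & (1 - s)) | (b & s)


def quad_mux_2to1(a_bits, b_bits, s, e):
    """Select between two 4-bit words bit by bit, with common S and E."""
    if not e:
        return [0, 0, 0, 0]  # assumption: disabled outputs are 0
    return [mux_2to1(a, b, s) for a, b in zip(a_bits, b_bits)]


A = [1, 0, 1, 1]
B = [0, 1, 1, 0]
print(quad_mux_2to1(A, B, s=0, e=1))  # [1, 0, 1, 1]
print(quad_mux_2to1(A, B, s=1, e=1))  # [0, 1, 1, 0]
```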
Implementing Boolean functions with Multiplexers
- Any Boolean function of n variables can be
  implemented using a 2^(n-1)-to-1 multiplexer. A MUX
  is basically a decoder with outputs ORed
  together, hence this isn't surprising.
- The SELECT signals generate the minterms of the
  function.
- The data inputs identify which minterms are to be
  combined with an OR.
Example
- $F(X,Y,Z) = \bar{X}\bar{Y}Z + \bar{X}Y\bar{Z} + XY\bar{Z} + XYZ = \sum m(1,2,6,7)$
- There are n = 3 inputs, thus we need a 2^2-to-1 (4-to-1) MUX.
- The first n-1 (2) inputs serve as the selection
  lines.
Efficient Method for implementing Boolean
functions
- For an n-variable function (e.g., f(A,B,C,D)):
- Need a 2^(n-1)-line MUX with n-1 select lines.
- Enumerate the function as a truth table with
  consistent ordering of variables (e.g., A,B,C,D).
- Attach the most significant n-1 variables to the
  n-1 select lines (e.g., A,B,C).
- Examine pairs of adjacent rows (only the least
  significant variable differs, e.g., D = 0 and D = 1).
- Determine whether the function output for the
  (A,B,C,0) and (A,B,C,1) combination is (0,0),
  (0,1), (1,0), or (1,1).
- Attach 0, $D$, $\bar{D}$, or 1 to the data input
  corresponding to (A,B,C), respectively.
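The pairing procedure can be automated. Given the minterm list of an n-variable function, the sketch checks each pair of rows (least significant variable = 0 and = 1) for every combination of the n-1 select variables and emits 0, the variable, its complement, or 1. The function name `mux_data_inputs` and the trailing-apostrophe notation for a complement (C' for $\bar{C}$) are assumptions for illustration.

```python
def mux_data_inputs(minterms, n, last_var):
    """Return the 2**(n-1) data-input assignments ('0', '1', var, or var')
    for a 2**(n-1)-to-1 MUX whose select lines carry the n-1 most
    significant variables."""
    ones = set(minterms)
    inputs = []
    for sel in range(2 ** (n - 1)):
        f0 = (sel << 1) in ones        # output when the last variable = 0
        f1 = ((sel << 1) | 1) in ones  # output when the last variable = 1
        if not f0 and not f1:
            inputs.append("0")
        elif not f0 and f1:
            inputs.append(last_var)
        elif f0 and not f1:
            inputs.append(last_var + "'")
        else:
            inputs.append("1")
    return inputs


print(mux_data_inputs([1, 3, 5, 6], 3, "C"))  # ['C', 'C', 'C', "C'"]
print(mux_data_inputs([1, 2, 6, 7], 3, "Z"))  # ['Z', "Z'", '0', '1']
```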
Another Example
- Consider $F(A,B,C) = \sum m(1,3,5,6)$. We can implement
  this function using a 4-to-1 MUX as follows.
- The index is ABC. Apply A and B to the S1 and S0
  selection inputs of the MUX (A is the most significant
  variable, and S1 is the most significant select input).
- Enumerate the function in a truth table.
MUX Example (cont.)
A B C F
0 0 0 0
0 0 1 1
0 1 0 0
0 1 1 1
1 0 0 0
1 0 1 1
1 1 0 1
1 1 1 0
When A = B = 0, $F = C$
When A = 0, B = 1, $F = C$
When A = 1, B = 0, $F = C$
When A = B = 1, $F = \bar{C}$
MUX implementation of $F(A,B,C) = \sum m(1,3,5,6)$
[Figure: 4-to-1 MUX with select inputs A and B, data inputs C, C, C, and $\bar{C}$, and output F]
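To check the implementation exhaustively, the sketch drives a 4-to-1 MUX with A on S1 and B on S0, feeds C, C, C, and $\bar{C}$ to the data inputs, and compares the output against the minterm list for all eight input combinations. The helper names `mux4` and `f_mux` are illustrative.

```python
def mux4(inputs, s1, s0):
    """4-to-1 MUX: S1 is the most significant select bit."""
    return inputs[(s1 << 1) | s0]


def f_mux(a, b, c):
    data = [c, c, c, 1 - c]       # data inputs I0..I3 = C, C, C, not C
    return mux4(data, a, b)       # A drives S1, B drives S0


minterms = {1, 3, 5, 6}
for index in range(8):
    a, b, c = (index >> 2) & 1, (index >> 1) & 1, index & 1
    expected = 1 if index in minterms else 0
    assert f_mux(a, b, c) == expected, (a, b, c)
print("MUX implementation matches F = sum of m(1,3,5,6)")
```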
1-input Decoder
[Figure: decoder with input I and outputs O0, O1]
Treat I as a 1-bit integer i. The ith output will
be turned on (Oi = 1) and the other one off.
1-input Decoder
[Figure: gate-level implementation with input I and outputs O0, O1]
2-input Decoder
[Figure: decoder with inputs I0, I1 and outputs O0 through O3]
Treat I0I1 as a 2-bit integer i. The ith output
will be turned on (Oi = 1), all the others off.
2-input Decoder
[Figure: gate-level implementation with inputs I1 and I0]
O0 = !I0 · !I1
O1 = !I0 · I1
O2 = I0 · !I1
O3 = I0 · I1
3-input Decoder
[Figure: decoder with inputs I0, I1, I2 and outputs O0 through O7]
3-Decoder Partial Implementation
[Figure: gates with inputs I2, I1, I0 producing outputs O0, O1, . . .]
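Each decoder output is the AND of all inputs in true or complemented form, as in the 2-input equations (O0 = !I0 · !I1 and so on). The sketch generalizes that to n inputs. It treats the first input in the list as the most significant bit, which matches the 2-input slide where O1 = !I0 · I1; the function name and this bit-ordering convention are assumptions.

```python
def decoder(inputs):
    """inputs: list of bits, most significant first. Returns 2**n outputs."""
    n = len(inputs)
    outputs = []
    for i in range(2 ** n):
        out = 1
        for pos, bit in enumerate(inputs):
            wanted = (i >> (n - 1 - pos)) & 1     # bit of i for this input
            literal = bit if wanted else 1 - bit  # true or complemented input
            out &= literal                        # AND all literals together
        outputs.append(out)
    return outputs


print(decoder([0, 1]))     # [0, 1, 0, 0]: I0 = 0, I1 = 1 turns on O1
print(decoder([1, 1, 0]))  # [0, 0, 0, 0, 0, 0, 1, 0]: O6 is on
```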
2-input Multiplexor
Inputs: I0 and I1. Selector: S. Output: O.
If S is 0, O = I0. If S is 1, O = I1.
[Figure: Mux with inputs I0, I1, selector S, and output O]
2-Mux Logic Design
O = (I0 · !S) + (I1 · S)
[Figure: gate-level 2-mux with inputs I1, I0 and selector S]
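The sum-of-products design can be checked exhaustively against the specification (S = 0 selects I0, S = 1 selects I1). The function name `mux2` is illustrative.

```python
def mux2(i0, i1, s):
    """O = (I0 AND NOT S) OR (I1 AND S)."""
    not_s = 1 - s
    return (i0 & not_s) | (i1 & s)


# Exhaustive check: S = 0 -> I0, S = 1 -> I1.
for i0 in (0, 1):
    for i1 in (0, 1):
        for s in (0, 1):
            assert mux2(i0, i1, s) == (i1 if s else i0)
print("2-input MUX logic matches the specification")
```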
4-input Multiplexor
Inputs: I0, I1, I2, I3. Selectors: S0, S1. Output: O.
[Figure: Mux with inputs I0 through I3, selectors S0 and S1, and output O]
S0 S1 O
0  0  I0
0  1  I1
1  0  I2
1  1  I3
One Possible 4-Mux
[Figure: a 2-decoder driven by S0 and S1 selects one of I0, I1, I2, I3 to drive O]
Adder
- We want to build a box that can add two 32-bit
  numbers.
- Assume 2's complement representation.
- We can start by building a 1-bit adder.
Addition
- We need to build a 1-bit adder:
  - compute the binary addition of 2 bits.
- We already know that the result is 2 bits.
A B O0 O1
0 0 0 0
0 1 0 1
1 0 0 1
1 1 1 0
This is addition!
One Implementation
O0 = A · B
O1 = (!A · B) + (A · !B)
[Figure: gate-level circuit with inputs A and B producing O0 and O1]
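The two equations form a 1-bit half adder: O0 (the carry) is A AND B, and O1 (the sum) is the sum-of-products form of A XOR B. The sketch evaluates them with 0/1 integers and reproduces the truth table; `half_adder` is an illustrative name.

```python
def half_adder(a, b):
    """Return (O0, O1) = (carry, sum) for two input bits."""
    carry = a & b                          # O0 = A . B
    total = ((1 - a) & b) | (a & (1 - b))  # O1 = (!A . B) + (A . !B)
    return carry, total


for a in (0, 1):
    for b in (0, 1):
        print(a, b, *half_adder(a, b))
```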
Binary addition and our adder
  1  1    (carries)
  01001
+ 01101
  -----
  10110
- What we really want is something that can be used
  to implement the binary addition algorithm.
- O0 is the carry.
- O1 is the sum.
What about the second column?
[Figure: the same addition, 01001 + 01101 = 10110, with its carries]
- We are adding 3 bits:
  - the new bit is the carry from the first column.
- The output is still 2 bits: a sum and a carry.
Truth Table for Addition
A B Carry In Carry Out Sum
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1
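The truth table describes a full adder: Sum is 1 when an odd number of the three inputs are 1, and Carry Out is 1 when at least two are. Chaining 32 of them, each carry out feeding the next column's carry in, gives one way to build the 32-bit adder from the earlier slide (a ripple-carry design, one of several possible). The sketch assumes Python integers are interpreted as 32-bit two's complement values; function names are illustrative.

```python
def full_adder(a, b, carry_in):
    """Return (carry_out, sum) for one column."""
    total = a ^ b ^ carry_in                               # odd number of 1s
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)  # at least two 1s
    return carry_out, total


def add32(x, y):
    """Ripple-carry addition of two 32-bit two's complement integers."""
    carry, result = 0, 0
    for i in range(32):
        a = (x >> i) & 1
        b = (y >> i) & 1
        carry, s = full_adder(a, b, carry)
        result |= s << i
    # The final carry is dropped; bit 31 is read as the sign bit.
    return result - (1 << 32) if result & (1 << 31) else result


print(add32(0b01001, 0b01101))  # 22 (binary 10110)
print(add32(-5, 3))             # -2
```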
Swapping
[Figure sequence (six frames) labeled Disk, Monitor, User Partition, User 1, and User 2: user programs are swapped between the user partition in memory and the disk]
Paging Request
Paging
Page Mapping Hardware
[Figure: the virtual address (P,D) is split into page number P and offset D; the page table maps P to frame F; the physical address in physical memory is (F,D)]
Page Mapping Hardware (example)
[Figure: virtual address 004006 is split into page 004 and offset 006; the page table maps page 4 to frame 5; the physical address (F,D) is 005006]
Page size: 1000. Number of possible virtual pages: 1000. Number of page frames: 8.
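The worked example splits the decimal virtual address 004006 into page 004 and offset 006 because the page size is 1000. The sketch performs that split and looks up the frame in a page table modeled as a Python dictionary; the dictionary, the `PageFault` exception, and the function name are assumptions for illustration, not how page-table hardware stores entries.

```python
PAGE_SIZE = 1000  # decimal page size from the example


class PageFault(Exception):
    """Raised when a virtual page is not mapped to any physical frame."""


def translate(virtual_address, page_table, page_size=PAGE_SIZE):
    """Map a virtual address to a physical address via a dict page table."""
    page, offset = divmod(virtual_address, page_size)  # split into (P, D)
    if page not in page_table:
        raise PageFault(f"page {page} is not resident")
    frame = page_table[page]                           # page table: P -> F
    return frame * page_size + offset                  # combine into (F, D)


page_table = {4: 5}
print(f"{translate(4006, page_table):06d}")  # 005006
```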
Page Fault
- Access to a virtual page that is not mapped into any
  physical page.
- A fault is triggered by hardware.
- Page fault handler (in the OS's VM subsystem):
  - Find if there is any free physical page available.
    - If not, evict some resident page to disk (swapping
      space).
  - Allocate a free physical page.
  - Load the faulted virtual page into the prepared
    physical page.
  - Modify the page table.
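The handler steps can be sketched as a small Python function. It assumes a fixed pool of frames, uses FIFO to choose the victim when no frame is free (the slide does not name a policy), and represents the page table and swap space as a dictionary and a set; all names are hypothetical, and the actual disk I/O is omitted.

```python
from collections import deque


def handle_page_fault(page, page_table, free_frames, resident, swap):
    """Hypothetical handler: map `page` into a frame, evicting if necessary."""
    if free_frames:                   # is any free physical page available?
        frame = free_frames.pop()
    else:                             # no: evict a resident page to swap space
        victim = resident.popleft()   # assumption: FIFO choice of victim
        frame = page_table.pop(victim)
        swap.add(victim)
    # Loading the faulted page's contents from disk is omitted in this sketch.
    page_table[page] = frame          # modify the page table
    resident.append(page)
    return frame


page_table, resident, swap = {}, deque(), set()
free_frames = [1, 0]                  # two physical frames
for p in (7, 8, 9):
    handle_page_fault(p, page_table, free_frames, resident, swap)
print(page_table, swap)               # {8: 1, 9: 0} {7}
```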
Placement Policy
- Determines where in real memory a process piece
  is to reside.
- Important in a segmentation system.
- With paging, or paging combined with segmentation,
  hardware performs address translation, so placement matters less.
Replacement Policy
- Placement Policy
- Which page is replaced?
- Page removed should be the page least likely to
  be referenced in the near future.
- Most policies predict future behavior on the
  basis of past behavior.
Replacement Policy
- Frame Locking
- If frame is locked, it may not be replaced
- Kernel of the operating system
- Control structures
- I/O buffers
- Associate a lock bit with each frame
Basic Replacement Algorithms
- Optimal policy
- Selects for replacement the page for which the
  time to the next reference is the longest.
- Impossible to have perfect knowledge of future
  events.
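Although no real system knows the future, the optimal policy can be simulated offline when the whole reference string is known, which makes it a useful benchmark. The sketch counts page faults, evicting the resident page whose next reference is farthest away (or never comes). The reference string and frame count in the usage example are illustrative assumptions.

```python
def optimal_faults(references, num_frames):
    """Count page faults under the optimal policy (needs the full future)."""
    frames, faults = [], 0
    for i, page in enumerate(references):
        if page in frames:
            continue
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)
            continue
        future = references[i + 1:]
        # Evict the page whose next reference is farthest away (or never).
        victim = max(frames, key=lambda p: future.index(p) if p in future else len(future))
        frames[frames.index(victim)] = page
    return faults


refs = [2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2]
print(optimal_faults(refs, 3))  # 6
```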
Basic Replacement Algorithms
- Least Recently Used (LRU)
- Replaces the page that has not been referenced
  for the longest time.
- By the principle of locality, this should be the
  page least likely to be referenced in the near
  future.
- Each page could be tagged with the time of last
  reference. This would require a great deal of
  overhead.
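A software simulation can avoid per-page timestamps by keeping resident pages in recency order. The sketch uses Python's `collections.OrderedDict`: a hit moves the page to the end, and eviction removes the page at the front. This models LRU's decisions only; it is not a description of how hardware tracks recency, and the reference string is illustrative.

```python
from collections import OrderedDict


def lru_faults(references, num_frames):
    """Count page faults under LRU, tracking recency with an OrderedDict."""
    frames, faults = OrderedDict(), 0
    for page in references:
        if page in frames:
            frames.move_to_end(page)    # most recently used goes to the end
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)  # evict the least recently used page
        frames[page] = None
    return faults


refs = [2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2]
print(lru_faults(refs, 3))  # 7
```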
Basic Replacement Algorithms
- First-in, first-out (FIFO)
- Treats page frames allocated to a process as a
  circular buffer.
- Pages are removed in round-robin style.
- Simplest replacement policy to implement.
- The page that has been in memory the longest is
  replaced.
- These pages may be needed again very soon.
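FIFO needs only a queue of resident pages: a hit does not change the order, and a fault evicts whichever page was loaded earliest. The sketch counts faults on the same illustrative reference string; the function name is an assumption.

```python
from collections import deque


def fifo_faults(references, num_frames):
    """Count page faults under FIFO: frames act as a circular buffer."""
    frames, faults = deque(), 0
    for page in references:
        if page in frames:
            continue             # a hit does not change the order
        faults += 1
        if len(frames) == num_frames:
            frames.popleft()     # evict the page resident the longest
        frames.append(page)
    return faults


refs = [2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2]
print(fifo_faults(refs, 3))  # 9
```

On this string FIFO evicts page 2 even though it is referenced again almost immediately, which is the weakness noted in the last bullet.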
Basic Replacement Algorithms
- Clock Policy
- Additional bit called a use bit.
- When a page is first loaded in memory, the use
  bit is set to 1.
- When the page is referenced, the use bit is set
  to 1.
- When it is time to replace a page, the first
  frame encountered with the use bit set to 0 is
  replaced.
- During the search for replacement, each use bit
  set to 1 is changed to 0.
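The clock policy can be simulated with a list of frames, a parallel list of use bits, and a hand that advances circularly. The sketch follows the rules above: a load or reference sets the use bit to 1, and the replacement search clears 1 bits until it finds a 0. Empty frames start with a use bit of 0, so they are filled first. Names and the reference string are illustrative.

```python
def clock_faults(references, num_frames):
    """Count page faults under the clock (use-bit) policy."""
    frames = [None] * num_frames  # page held by each frame
    use = [0] * num_frames        # use bit per frame
    hand, faults = 0, 0
    for page in references:
        if page in frames:
            use[frames.index(page)] = 1  # referenced: set use bit
            continue
        faults += 1
        # Advance until a frame with use bit 0 is found, clearing bits on the way.
        while use[hand] == 1:
            use[hand] = 0
            hand = (hand + 1) % num_frames
        frames[hand] = page
        use[hand] = 1                    # newly loaded: use bit set to 1
        hand = (hand + 1) % num_frames
    return faults


refs = [2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2]
print(clock_faults(refs, 3))  # 8
```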