CS1104


Slides: 41

Provided by: samarjitch1


Transcript and Presenter's Notes

1
CS1104 Computer Organization

PART 2 Computer Architecture
Lecture 7
Multicycle Control and Datapath

2
Single Cycle Implementation

Calculate the cycle time, assuming negligible delays except for:
memory (2 ns), ALU and adders (2 ns), register file access (1 ns)

3
Why is the single-cycle implementation not used?

Assume the following access times: memory (2 ns),
ALU and adders (2 ns), register file access (1 ns)
Fixed-length clock: the longest instruction is
lw, which requires 8 ns (2 + 1 + 2 + 2 + 1)
Load uses five functional units: instruction
memory, register file, ALU, data memory, and the
register file once again
Hence, the clock cycle is 8 ns
The clock cycle is determined by the longest path in
the machine (lw in this case)
However, several other instructions could fit
into a shorter clock cycle

4
Why is the single-cycle implementation not used?

R-type: instruction fetch, register access, ALU,
register access
Load: instruction fetch, register access, ALU,
memory access, register access
Store: instruction fetch, register access, ALU,
memory access
Branch: instruction fetch, register access, ALU
Jump: instruction fetch

Note the difference between Load and Jump. This
difference becomes even more significant if there
are floating-point instructions.
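
The path delay of each instruction class follows from the unit delays assumed earlier (memory 2 ns, ALU and adders 2 ns, register file 1 ns). A minimal Python sketch of that arithmetic; the names DELAY_NS, PATHS and path_delay_ns are illustrative and do not come from the slides:

```python
# Unit delays in ns, as assumed on the earlier slides.
DELAY_NS = {"mem": 2, "reg": 1, "alu": 2}

# Functional units used, in order, by each instruction class.
PATHS = {
    "R-type": ["mem", "reg", "alu", "reg"],
    "Load":   ["mem", "reg", "alu", "mem", "reg"],
    "Store":  ["mem", "reg", "alu", "mem"],
    "Branch": ["mem", "reg", "alu"],
    "Jump":   ["mem"],
}

def path_delay_ns(units):
    """Sum the delays of the functional units along one path."""
    return sum(DELAY_NS[u] for u in units)

latency = {name: path_delay_ns(units) for name, units in PATHS.items()}

# A single-cycle design must size its clock for the slowest class.
single_cycle_clock_ns = max(latency.values())
```

Under these assumptions Load needs 8 ns while Jump needs only 2 ns, yet every instruction is charged the full 8 ns clock.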
5
Multicycle implementation: Basics

In the previous slide, the execution of each
instruction was broken into several steps
In a multicycle implementation, each such step
executes in 1 clock cycle
Hence, different instructions require different
numbers of clock cycles
Advantages:
More efficient (short instructions finish in fewer cycles)
A functional unit can be used more than once per
instruction, as long as it is used in different
clock cycles (so less hardware is required)
But the design is more complex

6
Single-Cycle versus Multicycle

In a multicycle architecture
Single memory unit for both instruction and data
Single ALU, rather than one ALU and two adders
One or more registers added after each functional
unit to hold the output of that unit, until the
value is used in the next clock cycle

Multicycle architecture
Single cycle architecture
7
Multicycle implementation: Additional Registers

Instruction Register (IR), Memory Data Register (MDR),
registers A and B at the output of the register file, and
ALUOut (register at the output of the ALU)
At the end of each clock cycle, the data to be
used in subsequent clock cycles is stored in a
state element:
data to be used by subsequent instructions in a
later clock cycle is stored in a
programmer-visible state element such as the register file,
PC or memory
data used by the same instruction in a later
cycle is stored in one of the additional
registers

8
Multicycle implementation: Basics

Each clock cycle can accommodate at most one of
the following operations:
a memory access
a register file access (two reads or one write)
an ALU operation
Hence, any data produced by one of the above
three functional units must be saved into a
temporary register for use in a later cycle

9
Multicycle implementation: Additional Registers
[Figure: multicycle datapath with PC, a single memory (instruction or data), Instruction register, Memory data register, register file (Registers), registers A and B, ALU, and ALUOut]

All registers except the Instruction register
(IR) hold data only between a pair of adjacent
clock cycles (and hence do not need a write
control signal)
10
Multicycle implementation: Examples
ALU used to compute PC = PC + 4
The same ALU is also used for R-type
instructions, branch address computation, and
memory address computation for lw/sw
instructions
11
Multicycle Approach: Summary

Break up the instructions into steps; each step
takes a cycle
balance the amount of work to be done
restrict each cycle to use only one major
functional unit
At the end of a cycle
store values for use in later cycles (easiest
thing to do)
introduce additional internal registers
Notice we distinguish
processor state: programmer-visible registers
internal state: programmer-invisible registers
(like IR, MDR, A, B, and ALUOut)

12
Multicycle implementation: Steps

1. Instruction fetch
2. Instruction decode and register fetch
3. Execution, memory address computation or branch
completion
4. Memory access or R-type instruction completion
5. Memory read completion

Steps 1 and 2 are common to all instructions
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
13
Step 1: Instruction Fetch

Use the PC to get the instruction and put it in the
Instruction Register
Increment the PC by 4 and put the result back in
the PC
Can be described succinctly using RTL
("Register-Transfer Language"):
IR = Memory[PC]
PC = PC + 4
Can we figure out the values of the control signals?
What is the advantage of updating the PC now?

This step is common to all instructions
(obviously!)
14
Step 2: Instruction Decode and Register Fetch

Read registers rs and rt in case we need them
Compute the branch address in case the
instruction is a branch
The previous two actions are done optimistically (no
harm is done)
RTL:
A = Reg[IR[25-21]]
B = Reg[IR[20-16]]
ALUOut = PC + (sign-extend(IR[15-0]) << 2)
We aren't setting any control lines based on the
instruction type (we are busy "decoding" it in
our control logic)

This step is also common to all instructions
15
Step 3 (instruction dependent)

ALU is performing one of four functions, based on
instruction type
Memory reference: ALUOut = A + sign-extend(IR[15-0])
R-type: ALUOut = A op B
Branch: if (A == B) PC = ALUOut
Jump: PC = PC[31-28] || (IR[25-0] << 2)

16
Step 4 (R-type or memory-access)

Loads and stores access memory:
MDR = Memory[ALUOut] (load)
or
Memory[ALUOut] = B (store)
R-type instructions finish:
Reg[IR[15-11]] = ALUOut
The write actually takes place at the
end of the cycle, on the clock edge

17
Step 5: Write-back step

Memory read completion step:
Reg[IR[20-16]] = MDR
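
The five steps can be traced in software. The Python sketch below is a behavioural model under stated assumptions, not the hardware design from the lecture: memory is a dictionary keyed by byte address, only lw, sw, beq, j, add and sub are modelled, and the helper names (bits, sign_extend16, run_one) are made up for illustration. The opcode and funct values are the standard MIPS encodings.

```python
LW, SW, BEQ, J, RTYPE = 0x23, 0x2B, 0x04, 0x02, 0x00   # MIPS opcodes
ADD, SUB = 0x20, 0x22                                   # R-type funct codes

def bits(word, hi, lo):
    """Return the bit field word[hi-lo] as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def sign_extend16(value):
    """Interpret a 16-bit field as a signed integer."""
    return value - 0x10000 if value & 0x8000 else value

def run_one(mem, reg, pc):
    """Run one instruction; return (new PC, number of steps used)."""
    # Step 1: IR = Memory[PC]; PC = PC + 4
    ir = mem[pc]
    pc += 4
    # Step 2: read A and B, compute the branch target optimistically
    a = reg[bits(ir, 25, 21)]
    b = reg[bits(ir, 20, 16)]
    alu_out = pc + (sign_extend16(bits(ir, 15, 0)) << 2)
    op = bits(ir, 31, 26)
    # Step 3 onwards depends on the instruction class
    if op == J:
        return (pc & 0xF0000000) | (bits(ir, 25, 0) << 2), 3
    if op == BEQ:
        return (alu_out if a == b else pc), 3
    if op == RTYPE:
        funct = bits(ir, 5, 0)
        if funct == ADD:
            alu_out = a + b
        elif funct == SUB:
            alu_out = a - b
        else:
            raise ValueError("funct not modelled in this sketch")
        reg[bits(ir, 15, 11)] = alu_out & 0xFFFFFFFF     # Step 4
        return pc, 4
    if op in (LW, SW):
        alu_out = a + sign_extend16(bits(ir, 15, 0))     # Step 3
        if op == SW:
            mem[alu_out] = b                             # Step 4
            return pc, 4
        mdr = mem[alu_out]                               # Step 4
        reg[bits(ir, 20, 16)] = mdr                      # Step 5
        return pc, 5
    raise ValueError("opcode not modelled in this sketch")

# Usage: lw $8, 8($9); add $10, $8, $8; beq $8, $8, -3
reg = [0] * 32
reg[9] = 100
mem = {
    0: (LW << 26) | (9 << 21) | (8 << 16) | 8,
    4: (8 << 21) | (8 << 16) | (10 << 11) | ADD,
    8: (BEQ << 26) | (8 << 21) | (8 << 16) | 0xFFFD,
    108: 42,
}
pc, trace = 0, []
for _ in range(3):
    pc, steps = run_one(mem, reg, pc)
    trace.append(steps)
```

In this run the load takes five steps, the add four and the branch three, and the taken branch sends the PC back to address 0.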

18
Summary execution steps
Steps taken to execute any instruction class
19
Determining the values of the control signals
for each of Steps 1-5 (we will show only Step
1)
20
Step 1: Instruction Fetch Step

IR = Memory[PC]
PC = PC + 4

MemRead = 1
IorD = 0
IRWrite = 1
21
Step 1: Instruction Fetch Step

IR = Memory[PC]
PC = PC + 4

Increment PC by 4: ALUSrcA = 0, ALUSrcB = 01,
ALUOp = 00 (for ALU to ADD)
22
Step 1: Instruction Fetch Step

IR = Memory[PC]
PC = PC + 4

Store incremented instruction address back to PC:
PCSource = 00, PCWrite = 1
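
The Step 1 signal values from the last three slides can be collected and applied to a toy datapath model. This Python sketch is illustrative only. It assumes the usual multiplexer meanings (IorD = 0 selects the PC as the memory address, ALUSrcA = 0 selects the PC, ALUSrcB = 01 selects the constant 4, PCSource = 00 selects the ALU result) and models only the selections that Step 1 needs. The function name apply_cycle and the state dictionary are hypothetical.

```python
# Control signal values for Step 1, as listed on the slides.
FETCH_SIGNALS = {
    "MemRead": 1, "IorD": 0, "IRWrite": 1,      # IR = Memory[PC]
    "ALUSrcA": 0, "ALUSrcB": 0b01, "ALUOp": 0b00,  # ALU computes PC + 4
    "PCSource": 0b00, "PCWrite": 1,             # PC = ALU result
}

def apply_cycle(state, memory, sig):
    """Return the register state after one clock cycle (partial model)."""
    nxt = dict(state)
    # Multiplexers in front of the memory and the ALU.
    address = state["PC"] if sig["IorD"] == 0 else state["ALUOut"]
    alu_a = state["PC"] if sig["ALUSrcA"] == 0 else state["A"]
    alu_b = {0b00: state["B"], 0b01: 4}[sig["ALUSrcB"]]
    if sig["ALUOp"] != 0b00:
        raise NotImplementedError("only ALUOp = 00 (add) is modelled")
    alu_result = alu_a + alu_b
    # State elements that are written at the end of the cycle.
    if sig["MemRead"] and sig["IRWrite"]:
        nxt["IR"] = memory[address]
    if sig["PCWrite"]:
        nxt["PC"] = {0b00: alu_result, 0b01: state["ALUOut"]}[sig["PCSource"]]
    return nxt
```

Calling apply_cycle with FETCH_SIGNALS on a state whose PC is 8 loads IR from address 8 and leaves PC at 12, matching the two RTL statements for this step.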
23
Determining the values of the control
signals: Steps 2-5 are similar (please work
them out on your own)
24
Designing the Control Unit
25
Finite state machines (FSMs)

Finite state machines
a set of states
next state function (determined by current state
and the input)
output function (determined by current state and
possibly input)

26
Finite state machines (FSMs)

State is an abstraction
You may consider the state of an FSM to be a
variable or a function, or a collection of
variables or functions
If the output depends only on the current state,
then it is a Moore machine. If the output depends
on the state and the input, then it is a Mealy
machine

[Figure: state diagram of a two-state machine, one state labelled output = 0 and the other output = 1]
This machine has two states. How does the output
behave when the input is 1?
27
Moore machine

The output function depends only on the current
state
The next state function depends on the current
state and the input
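
A Moore machine can be written as two lookup tables, one for the next-state function and one for the output function. The Python sketch below uses a hypothetical two-state machine that changes state whenever the input is 1. The transitions are an assumption for illustration and are not taken from the slide's diagram.

```python
# Next-state function: depends on the current state and the input.
NEXT_STATE = {
    ("S0", 0): "S0", ("S0", 1): "S1",
    ("S1", 0): "S1", ("S1", 1): "S0",
}

# Output function: depends on the current state only (Moore machine).
OUTPUT = {"S0": 0, "S1": 1}

def run_moore(inputs, state="S0"):
    """Return the output seen in each cycle and the final state."""
    outputs = []
    for x in inputs:
        outputs.append(OUTPUT[state])      # output of the current state
        state = NEXT_STATE[(state, x)]     # state for the next cycle
    return outputs, state

outputs, final_state = run_moore([1, 1, 0, 1])
```

Because the output table is indexed by the state alone, an input change only becomes visible in the output one cycle later, after the state has been updated.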

28
Implementing the Control

The values of the control signals depend on:
what instruction is being executed
which step is being performed
Use the information we have accumulated (e.g., the
control signals for Step 1) to specify a finite
state machine (FSM)
specify the finite state machine graphically, or
use microprogramming
The implementation can be derived from the specification

29
FSM high level view
Start/reset
Instruction fetch, decode and register fetch
Memory access instructions
R-type instructions
Branch instruction
Jump instruction
30
FSM implementation of the control unit
31
FSM for memory reference instructions
32
FSMs for other instructions
Branch instruction
Jump instruction
R-type instructions
33
The Full FSM for the Control Unit
Obtained by simply joining the FSMs in the
previous slides
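
Joining the per-class FSMs amounts to one next-state function shared by all instructions. The Python sketch below derives its states from the five execution steps described earlier. The state names and the instruction-class strings are invented for illustration, since the transcript preserves the FSM only as figure titles.

```python
def next_state(state, instr_class):
    """Next-state function of the joined control FSM (sketch)."""
    if state == "fetch":
        return "decode"
    if state == "decode":
        # The opcode first matters after decode.
        return {
            "lw": "mem_addr", "sw": "mem_addr",
            "rtype": "rtype_exec",
            "beq": "branch_complete",
            "j": "jump_complete",
        }[instr_class]
    if state == "mem_addr":
        return "mem_read" if instr_class == "lw" else "mem_write"
    if state == "mem_read":
        return "write_back"
    if state == "rtype_exec":
        return "rtype_complete"
    # Every completion state returns to fetch for the next instruction.
    return "fetch"

def cycles(instr_class):
    """Count the states visited before the FSM returns to fetch."""
    state, count = "fetch", 0
    while True:
        count += 1
        state = next_state(state, instr_class)
        if state == "fetch":
            return count

cycle_counts = {c: cycles(c) for c in ["lw", "sw", "rtype", "beq", "j"]}
```

This sketch has ten distinct states, which fit in the 4 state bits used for the ROM sizing later in the lecture, and walking it reproduces the 3 to 5 cycles per instruction stated earlier.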
34
Finite State Machine for Control

Implementation

35
PLA (programmable logic array) Implementation
opcode
AND plane (computes minterms)
current state
datapath control
OR plane (computes sum terms)
next state
36
ROM Implementation

ROM: "Read Only Memory"
values of memory locations are fixed ahead of
time
A ROM can be used to implement a truth table
if the address is m bits, we can address 2^m
entries in the ROM
our outputs are the bits of data that the address
points to

address   data
0 0 0     0 0 1 1
0 0 1     1 1 0 0
0 1 0     1 1 0 0
0 1 1     1 0 0 0
1 0 0     0 0 0 0
1 0 1     0 0 0 1
1 1 0     0 1 1 0
1 1 1     0 1 1 1
(address: m = 3 bits; data: n = 4 bits)
m sets the "height" (2^m entries), and n is the "width"
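
A ROM used as a truth table is simply an array indexed by the address bits. The Python sketch below reads the bit pattern on this slide as eight 3-bit addresses (000 to 111, in order), each followed by 4 data bits. The names ROM, M_BITS, N_BITS and rom_read are illustrative.

```python
M_BITS = 3   # address width: 2**3 = 8 entries (the "height")
N_BITS = 4   # data width in bits (the "width")

# Entry i holds the data word stored at address i.
ROM = [
    0b0011,  # address 000
    0b1100,  # address 001
    0b1100,  # address 010
    0b1000,  # address 011
    0b0000,  # address 100
    0b0001,  # address 101
    0b0110,  # address 110
    0b0111,  # address 111
]

def rom_read(address):
    """Return the n-bit data word stored at an m-bit address."""
    if not 0 <= address < 2 ** M_BITS:
        raise ValueError("address does not fit in M_BITS bits")
    return ROM[address]
```

Each output bit is one column of the truth table, so this ROM implements four Boolean functions of the same three inputs.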
37
ROM Implementation

How many inputs are there? 6 bits for opcode, 4
bits for state = 10 address lines (i.e., 2^10 =
1024 different addresses)
How many outputs are there? 16 datapath-control
outputs, 4 state bits = 20 outputs
ROM is 2^10 x 20 = 20K bits (very large and a
rather unusual size)
Rather wasteful, since for lots of the entries,
the outputs are the same, i.e., the opcode is often
ignored

38
ROM Implementation

Cheaper implementation
Exploit the fact that the FSM is a Moore machine
=>
Control outputs only depend on current state and
not on other incoming control signals!
Next state depends on all inputs
Break up the table into two parts:
4 state bits tell you the 16 outputs: 2^4 x 16 bits of ROM
10 bits tell you the 4 next state bits: 2^10 x 4 bits of ROM
Total number of bits: 256 + 4096 = 4352, about
4.3K bits of ROM
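
The two ROM sizes can be checked with a few lines of arithmetic. A minimal Python sketch using the bit counts from the slides (6 opcode bits, 4 state bits, 16 datapath-control outputs); the variable names are illustrative:

```python
OPCODE_BITS = 6
STATE_BITS = 4
CONTROL_OUTPUTS = 16

# Single ROM: addressed by opcode and state, holds outputs and next state.
single_rom_bits = 2 ** (OPCODE_BITS + STATE_BITS) * (CONTROL_OUTPUTS + STATE_BITS)

# Split ROMs: control outputs depend on the state only (Moore machine),
# while the next state still depends on opcode and state.
output_rom_bits = 2 ** STATE_BITS * CONTROL_OUTPUTS
next_state_rom_bits = 2 ** (OPCODE_BITS + STATE_BITS) * STATE_BITS
split_rom_bits = output_rom_bits + next_state_rom_bits

single_rom_kbits = single_rom_bits / 1024
split_rom_kbits = split_rom_bits / 1024
```

The split design needs 4352 bits (4.25K, quoted as 4.3K on the slide) against 20480 bits (20K) for the single ROM.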

39
Other implementation options