Title: CS1104
1CS1104 Computer Organization
- PART 2 Computer Architecture
- Lecture 7
- Multicycle Control and Datapath
2Single Cycle Implementation
- Calculate the cycle time assuming negligible delays
except for memory (2 ns), ALU and adders (2 ns), and register file
access (1 ns)
3Why is a single-cycle implementation not used?
- Assume the following access times: memory (2 ns), ALU and adders (2 ns), register file access (1 ns)
- Fixed-length clock: the longest instruction is lw, which requires 8 ns
- Load uses five functional units: instruction memory, register file, ALU, data memory, and the register file once again (2 + 1 + 2 + 2 + 1 = 8 ns)
- Hence, the clock cycle is 8 ns
- The clock cycle is determined by the longest path in the machine (lw in this case)
- However, several other instructions could fit into a shorter clock cycle
4Why is a single-cycle implementation not used?
- R-type: instruction fetch, reg access, ALU, reg access
- Load: instruction fetch, reg access, ALU, mem access, reg access
- Store: instruction fetch, reg access, ALU, mem access
- Branch: instruction fetch, reg access, ALU
- Jump: instruction fetch
Note the difference between Load and Jump. This
difference becomes even more significant if there
are floating-point instructions.
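As a rough check of the numbers above, the following sketch adds up the unit delays along each instruction class's path. The delays and unit lists are taken from the slides; the names `DELAY_NS`, `PATHS` and `latency_ns` are made up for this illustration.

```python
# Delays from the slides, in ns; every other delay is assumed negligible.
DELAY_NS = {"mem": 2, "alu": 2, "reg": 1}

# Functional units used by each instruction class, in order of use.
PATHS = {
    "R-type": ["mem", "reg", "alu", "reg"],
    "Load":   ["mem", "reg", "alu", "mem", "reg"],
    "Store":  ["mem", "reg", "alu", "mem"],
    "Branch": ["mem", "reg", "alu"],
    "Jump":   ["mem"],
}


def latency_ns(units):
    """Sum the delays of the units an instruction passes through."""
    return sum(DELAY_NS[u] for u in units)


LATENCY = {name: latency_ns(units) for name, units in PATHS.items()}

# A single-cycle clock must be long enough for the slowest class.
CLOCK_NS = max(LATENCY.values())

for name, t in LATENCY.items():
    print(f"{name:7s} needs {t} ns and idles {CLOCK_NS - t} ns per cycle")
```

Under these assumptions a jump needs only 2 ns of an 8 ns cycle, which is the gap the note above points at.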
5Multicycle implementation Basics
- In the previous slide, the execution of each instruction was broken into several steps
- In a multicycle implementation, each such step executes in 1 clock cycle
- Hence, different instructions require a different number of clock cycles
- Advantages
- More efficient
- A functional unit can be used more than once per instruction, as long as it is used in different clock cycles (so less hardware is required)
- But the design is more complex
6Single-Cycle versus Multicycle
- In a multicycle architecture
- Single memory unit for both instruction and data
- Single ALU, rather than one ALU and two adders
- One or more registers added after each functional
unit to hold the output of that unit, until the
value is used in the next clock cycle
[Figure: datapaths of the multicycle architecture and the single cycle architecture]
7Multicycle implementation Additional Registers
- Instruction Register, Memory Data Register, Registers A and B (at the outputs of the Reg file) and ALUOut (reg at the output of the ALU)
- At the end of each clock cycle, the data to be used in subsequent clock cycles is stored in a state element
- Data to be used by subsequent instructions in a later clock cycle is stored in a programmer-visible state element like the reg file, PC or memory
- Data used by the same instruction in a later cycle is stored in one of the additional registers
8Multicycle implementation Basics
- Each clock cycle can accommodate at most one of the following operations
- a memory access
- a register file access (two reads or one write)
- an ALU operation
- Hence, any data produced by one of the above three functional units must be saved into a temporary register for use in a later cycle
9Multicycle implementation Additional Registers
[Figure: multicycle datapath showing the PC, Memory (address and data ports), Instruction register, Memory data register, Registers (register file), registers A and B, ALU, and ALUOut]
All registers except the Instruction register
(IR) hold data only between a pair of adjacent
clock cycles (and hence do not need a write
control signal)
10Multicycle implementation Examples
ALU used to compute PC = PC + 4
The same ALU is also used for R-type
instructions, branch address computation,
computing memory address in the case of lw/sw
instructions
11Multicycle Approach Summary
- Break up the instructions into steps, each step takes a cycle
- balance the amount of work to be done
- restrict each cycle to use only one major functional unit
- At the end of a cycle
- store values for use in later cycles (easiest thing to do)
- introduce additional internal registers
- Notice we distinguish
- processor state: programmer visible registers
- internal state: programmer invisible registers (like IR, MDR, A, B, and ALUOut)
12Multicycle implementation Steps
- Instruction fetch
- Instruction decode and register fetch
- Execution, memory address computation or branch completion
- Memory access or R-type instruction completion
- Memory read completion
The first two steps are common for all instructions
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
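To make the 3 to 5 cycle range concrete, here is a small sketch that records how many of the five steps each instruction class uses and computes an average cycles-per-instruction figure. The instruction mix is a hypothetical example chosen only for illustration, not measured data, and the names `STEPS_USED` and `MIX_PERCENT` are invented here.

```python
# Number of steps (= clock cycles) each class needs in the multicycle design:
# branches and jumps finish in step 3, R-type and stores in step 4,
# loads in step 5.
STEPS_USED = {"R-type": 4, "Load": 5, "Store": 4, "Branch": 3, "Jump": 3}

# HYPOTHETICAL instruction mix in percent (illustration only).
MIX_PERCENT = {"R-type": 49, "Load": 22, "Store": 11, "Branch": 16, "Jump": 2}
assert sum(MIX_PERCENT.values()) == 100

# Weighted average of cycles per instruction for this mix.
avg_cpi = sum(MIX_PERCENT[c] * STEPS_USED[c] for c in STEPS_USED) / 100

print(f"cycles per class: {STEPS_USED}")
print(f"average CPI for the assumed mix: {avg_cpi:.2f}")
```

With a different mix the average changes, but it always stays between 3 and 5 because every class uses between three and five steps.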
13Step 1 Instruction Fetch
- Use PC to get instruction and put it in the Instruction Register
- Increment the PC by 4 and put the result back in the PC
- Can be described succinctly using RTL ("Register-Transfer Language"):
    IR = Memory[PC]
    PC = PC + 4
- Can we figure out the values of the control signals?
- What is the advantage of updating the PC now?
This step is common for all instructions
(obviously!)
14Step 2 Instruction Decode and Register Fetch
- Read registers rs and rt in case we need them
- Compute the branch address in case the instruction is a branch
- Previous two actions are done optimistically (no harm is done)
- RTL:
    A = Reg[IR[25-21]]
    B = Reg[IR[20-16]]
    ALUOut = PC + (sign-extend(IR[15-0]) << 2)
- We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)
This step is also common for all instructions
15Step 3 (instruction dependent)
- ALU is performing one of four functions, based on instruction type
- Memory Reference: ALUOut = A + sign-extend(IR[15-0])
- R-type: ALUOut = A op B
- Branch: if (A == B) PC = ALUOut
- Jump: PC = PC[31-28] || (IR[25-0] << 2), where || denotes concatenation
16Step 4 (R-type or memory-access)
- Loads and stores access memory: MDR = Memory[ALUOut] or Memory[ALUOut] = B
- R-type instructions finish: Reg[IR[15-11]] = ALUOut
The write actually takes place at the end of the cycle on the clock edge
17Step 5 Write-back step
- Memory read completion step: Reg[IR[20-16]] = MDR
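The five steps can be traced in software. The sketch below is a simplified model, not the hardware design from the lecture: it follows the register transfers of Steps 1 to 5 for lw, sw, add, sub, beq and j, and returns how many cycles each instruction would take. The helper names (`bits`, `sign_extend`, `run_instruction`, `i_type`, `r_type`) and the tiny test program are assumptions made for this example; the opcode and funct values are the standard MIPS encodings.

```python
MASK32 = 0xFFFFFFFF
LW, SW, RTYPE, BEQ, J = 35, 43, 0, 4, 2   # MIPS opcodes
ADD, SUB = 32, 34                          # R-type funct codes


def bits(word, hi, lo):
    """Return bit field word[hi-lo] as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(imm16):
    """Interpret a 16-bit field as a signed integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_instruction(s):
    """Run one instruction through the steps; return the cycles used.

    s is a dict with "PC", "reg" (32 ints) and "mem" (byte address -> word).
    One memory holds both instructions and data, as in the multicycle design.
    """
    # Step 1: IR = Memory[PC]; PC = PC + 4
    IR = s["mem"][s["PC"]]
    s["PC"] = (s["PC"] + 4) & MASK32
    # Step 2: A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = branch target
    A = s["reg"][bits(IR, 25, 21)]
    B = s["reg"][bits(IR, 20, 16)]
    ALUOut = (s["PC"] + (sign_extend(bits(IR, 15, 0)) << 2)) & MASK32
    op = bits(IR, 31, 26)
    # Step 3: depends on the instruction class
    if op == J:
        s["PC"] = (s["PC"] & 0xF0000000) | (bits(IR, 25, 0) << 2)
        return 3
    if op == BEQ:
        if A == B:
            s["PC"] = ALUOut
        return 3
    if op == RTYPE:
        funct = bits(IR, 5, 0)
        if funct == ADD:
            ALUOut = (A + B) & MASK32
        elif funct == SUB:
            ALUOut = (A - B) & MASK32
        else:
            raise ValueError(f"unsupported funct {funct}")
        # Step 4: Reg[IR[15-11]] = ALUOut
        s["reg"][bits(IR, 15, 11)] = ALUOut
        return 4
    if op not in (LW, SW):
        raise ValueError(f"unsupported opcode {op}")
    ALUOut = (A + sign_extend(bits(IR, 15, 0))) & MASK32
    # Step 4: Memory[ALUOut] = B (store) or MDR = Memory[ALUOut] (load)
    if op == SW:
        s["mem"][ALUOut] = B
        return 4
    MDR = s["mem"][ALUOut]
    # Step 5: Reg[IR[20-16]] = MDR
    s["reg"][bits(IR, 20, 16)] = MDR
    return 5


def i_type(op, rs, rt, imm):
    """Encode an I-format instruction word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def r_type(rs, rt, rd, funct):
    """Encode an R-format instruction word (opcode 0, shamt 0)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


state = {
    "PC": 0,
    "reg": [0] * 32,
    "mem": {
        0: i_type(LW, 0, 8, 100),    # lw  $8, 100($0)
        4: r_type(8, 8, 9, ADD),     # add $9, $8, $8
        8: i_type(BEQ, 8, 8, 1),     # beq $8, $8, skip one word
        100: 21,                     # data word read by the lw
    },
}
cycles = [run_instruction(state) for _ in range(3)]
print(cycles, state["reg"][9], state["PC"])
```

The model collapses each instruction into one function call, so it shows which transfers happen in which step but not the clock-by-clock timing of the real datapath.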
18Summary execution steps
Steps taken to execute any instruction class
19Determining the values of the control signals
for each of Steps 1-5 (we will show only Step 1)
20Step 1 Instruction Fetch Step
MemRead = 1
IorD = 0
IRWrite = 1
21Step 1 Instruction Fetch Step
Increment PC by 4: ALUSrcA = 0, ALUSrcB = 01,
ALUOp = 00 (for ALU to ADD)
22Step 1 Instruction Fetch Step
Store incremented instruction address back to PC:
PCSource = 00, PCWrite = 1
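The Step 1 signal values can be collected in one place and applied to a toy model of the datapath. This is only a sketch: the mux input assignments in the comments (IorD = 0 selects the PC, ALUSrcA = 0 selects the PC, ALUSrcB = 01 selects the constant 4, PCSource = 00 selects the ALU result) are assumed to follow the usual textbook multicycle datapath, and the function `fetch_step` is invented for this illustration.

```python
# Control signal values for Step 1, as given on the slides.
FETCH_SIGNALS = {
    "MemRead": 1, "IorD": 0, "IRWrite": 1,    # read the instruction
    "ALUSrcA": 0, "ALUSrcB": 0b01, "ALUOp": 0b00,  # compute PC + 4
    "PCSource": 0b00, "PCWrite": 1,           # write it back to the PC
}


def fetch_step(state, memory, sig):
    """Apply one clock cycle of the given control signals to the state.

    Only the mux settings needed for instruction fetch are modelled.
    """
    # IorD mux (assumed): 0 -> address comes from PC, 1 -> from ALUOut.
    address = state["PC"] if sig["IorD"] == 0 else state["ALUOut"]
    new_ir = state["IR"]
    if sig["MemRead"] and sig["IRWrite"]:
        new_ir = memory[address]

    # ALUSrcA mux (assumed): 0 -> PC, 1 -> register A.
    operand_a = state["PC"] if sig["ALUSrcA"] == 0 else state["A"]
    # ALUSrcB mux (assumed): 00 -> register B, 01 -> constant 4.
    operand_b = {0b00: state["B"], 0b01: 4}[sig["ALUSrcB"]]
    if sig["ALUOp"] != 0b00:
        raise NotImplementedError("only ALUOp = 00 (add) is modelled")
    alu_result = operand_a + operand_b

    # State elements are updated together at the end of the cycle.
    new_pc = state["PC"]
    if sig["PCWrite"] and sig["PCSource"] == 0b00:
        new_pc = alu_result
    return {**state, "IR": new_ir, "PC": new_pc}


before = {"PC": 8, "IR": 0, "A": 0, "B": 0, "ALUOut": 0}
after = fetch_step(before, {8: 0x8C080064}, FETCH_SIGNALS)
print(hex(after["IR"]), after["PC"])
```

The memory read uses the old PC while the incremented value is written at the end of the cycle, which is why both transfers fit in the same step.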
23Determining the values of the control signals
Steps 2-5 are similar (please work
them out on your own)
24Designing the Control Unit
25Finite state machines (FSMs)
- Finite state machines
- a set of states
- next state function (determined by current state and the input)
- output function (determined by current state and possibly input)
26Finite state machines (FSMs)
- State is an abstraction
- You may consider the state of an FSM to be a variable or a function, or a collection of variables or functions
- If the output depends only on the current state, then it is a Moore machine. If the output depends on the state and the input, then it is a Mealy machine
[Figure: a two-state machine; one state produces output 0 and the other produces output 1]
This machine has two states. How does the output
behave when the input is 1?
27Moore machine
- The output function depends only on the current state
- The next state function depends on the current state and the input
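A Moore machine is easy to express as two lookup tables, one per function. The two-state machine in this sketch is hypothetical (input 1 toggles the state, input 0 holds it) and is not necessarily the machine drawn on the earlier slide; the names `OUTPUT`, `NEXT_STATE` and `run` are made up for the example.

```python
# Output function: depends only on the current state (Moore machine).
OUTPUT = {"S0": 0, "S1": 1}

# Next state function: depends on the current state and the input.
# HYPOTHETICAL behaviour: input 1 toggles the state, input 0 holds it.
NEXT_STATE = {
    ("S0", 0): "S0", ("S0", 1): "S1",
    ("S1", 0): "S1", ("S1", 1): "S0",
}


def run(inputs, state="S0"):
    """Feed a sequence of inputs; return the outputs seen and final state."""
    outputs = []
    for x in inputs:
        outputs.append(OUTPUT[state])      # output of the current state
        state = NEXT_STATE[(state, x)]     # then move to the next state
    return outputs, state


outputs, final_state = run([1, 1, 0, 1])
print(outputs, final_state)
```

In a Mealy machine the `OUTPUT` table would instead be keyed by the (state, input) pair.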
28Implementing the Control
- Value of control signals is dependent upon
- what instruction is being executed
- which step is being performed
- Use the information we have accumulated (e.g., control signals for Step 1) to specify a finite state machine (FSM)
- specify the finite state machine graphically, or
- use microprogramming
- Implementation can be derived from specification
29FSM high level view
Start/reset
Instruction fetch, decode and register fetch
Memory access instructions
R-type instructions
Branch instruction
Jump instruction
30FSM implementation of the control unit
31FSM for memory reference instructions
32FSMs for other instructions
Branch instruction
Jump instruction
R-type instructions
33The Full FSM for the Control Unit
Obtained by simply joining the FSMs in the
previous slides
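The joined FSM can be written down as a next-state function. The state numbering used here (0 = fetch, 1 = decode, 2 = memory address computation, 3 = memory read, 4 = write-back, 5 = memory write, 6 = R-type execute, 7 = R-type completion, 8 = branch completion, 9 = jump completion) is an assumption that follows the usual textbook figure; the slides' own diagrams are not reproduced in this text. The function names are invented for the sketch.

```python
LW, SW, RTYPE, BEQ, J = 35, 43, 0, 4, 2   # MIPS opcodes


def next_state(state, opcode):
    """Next-state function of the control FSM (assumed state numbering)."""
    if state == 0:                      # fetch -> decode
        return 1
    if state == 1:                      # decode: branch on instruction class
        return {LW: 2, SW: 2, RTYPE: 6, BEQ: 8, J: 9}[opcode]
    if state == 2:                      # address computed: read or write
        return {LW: 3, SW: 5}[opcode]
    if state == 3:                      # memory read -> write-back
        return 4
    if state == 6:                      # R-type execute -> completion
        return 7
    if state in (4, 5, 7, 8, 9):        # instruction done -> next fetch
        return 0
    raise ValueError(f"unknown state {state}")


def trace(opcode):
    """List the states one instruction passes through, starting at fetch."""
    states = [0]
    while True:
        nxt = next_state(states[-1], opcode)
        if nxt == 0:
            return states
        states.append(nxt)


for name, op in [("lw", LW), ("sw", SW), ("R-type", RTYPE),
                 ("beq", BEQ), ("j", J)]:
    print(f"{name:6s} {trace(op)}")
```

Each trace has between three and five states, matching the cycle counts of the five execution steps, and ten states fit comfortably in the 4 state bits used later for the ROM sizing.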
34Finite State Machine for Control
35PLA (programmable logic array) Implementation
[Figure: PLA. The inputs (opcode, current state) feed the AND plane (computes minterms); the OR plane (computes sum terms) drives the outputs (datapath control, next state)]
36ROM Implementation
- ROM = "Read Only Memory"
- values of memory locations are fixed ahead of time
- A ROM can be used to implement a truth table
- if the address is m bits, we can address 2^m entries in the ROM
- our outputs are the bits of data that the address points to
ROM contents (address is m = 3 bits, data is n = 4 bits):
address   data
0 0 0     0 0 1 1
0 0 1     1 1 0 0
0 1 0     1 1 0 0
0 1 1     1 0 0 0
1 0 0     0 0 0 0
1 0 1     0 0 0 1
1 1 0     0 1 1 0
1 1 1     0 1 1 1
m sets the "height" (2^m entries), and n is the "width"
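A ROM lookup is just indexing by address. The sketch below stores the bit pattern from the slide's figure, read as eight rows of three address bits followed by four data bits; that row split is an assumption about how the flattened figure was laid out. The names `ROM`, `rom_read` and `output_bit` are made up for the example.

```python
M_BITS = 3   # address width
N_BITS = 4   # data width

# Data word stored at each address 0b000 .. 0b111 (assumed row split).
ROM = [0b0011, 0b1100, 0b1100, 0b1000,
       0b0000, 0b0001, 0b0110, 0b0111]
assert len(ROM) == 2 ** M_BITS


def rom_read(address):
    """Return the n-bit data word the address points to."""
    return ROM[address]


def output_bit(address, column):
    """Return one output of the truth table; column 0 is the leftmost bit."""
    return (ROM[address] >> (N_BITS - 1 - column)) & 1


print(format(rom_read(0b101), "04b"))   # data word at address 101
print(output_bit(0b110, 1))             # second output for inputs 1 1 0
```

Each data column is one Boolean function of the address bits, so an m-input, n-output truth table maps directly onto a 2^m by n ROM.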
37ROM Implementation
- How many inputs are there? 6 bits for opcode, 4 bits for state = 10 address lines (i.e., 2^10 = 1024 different addresses)
- How many outputs are there? 16 datapath-control outputs, 4 state bits = 20 outputs
- ROM is 2^10 x 20 = 20K bits (very large and a rather unusual size)
- Rather wasteful, since for lots of the entries, the outputs are the same, i.e., opcode is often ignored
38ROM Implementation
- Cheaper implementation
- Exploit the fact that the FSM is a Moore machine =>
- Control outputs only depend on current state and not on other incoming control signals!
- Next state depends on all inputs
- Break up the table into two parts
- 4 state bits tell you the 16 outputs: 2^4 x 16 bits of ROM
- 10 bits tell you the 4 next state bits: 2^10 x 4 bits of ROM
- Total number of bits = about 4.3K bits of ROM
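The ROM sizes quoted above can be rechecked with a few lines of arithmetic. The bit widths come from the slides; the variable names are invented for this sketch, and K is taken to mean 1024.

```python
OPCODE_BITS = 6
STATE_BITS = 4
CONTROL_OUTPUTS = 16

ADDRESS_BITS = OPCODE_BITS + STATE_BITS          # 10 address lines

# One big ROM: every address stores control outputs and next state.
single_rom_bits = 2 ** ADDRESS_BITS * (CONTROL_OUTPUTS + STATE_BITS)

# Split ROMs: outputs indexed by state only (Moore machine),
# next state indexed by opcode and state.
output_rom_bits = 2 ** STATE_BITS * CONTROL_OUTPUTS
next_state_rom_bits = 2 ** ADDRESS_BITS * STATE_BITS
split_rom_bits = output_rom_bits + next_state_rom_bits

print(f"single ROM: {single_rom_bits} bits = {single_rom_bits / 1024:.2f}K")
print(f"split ROMs: {split_rom_bits} bits = {split_rom_bits / 1024:.2f}K")
```

The split design needs 4352 bits, which is 4.25K and is rounded to 4.3K on the slide, compared with 20480 bits for the single ROM.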
39Other implementation options
- Microprogramming
- Read Section 5.7 of the textbook (in the CD)
- This is not included in the syllabus
40Required Reading
- Textbook (3rd edition)
- Section 5.5 and Section 5.7 (optional)
- 2nd edition of the textbook
- Section 5.4 and Section 5.5 (optional)