Title: LECTURE 7: Multicycle CPU
1LECTURE 7 Multicycle CPU
EECS 318 CAD Computer Aided Design
Instructor: Francis G. Wolff, wolff_at_eecs.cwru.edu
Case Western Reserve University
2MIPS instructions
Class          Syntax               Meaning
ALU            alu  rd,rs,rt        rd = rs <alu> rt
ALUi           alui rt,rs,value     rt = rs <alu> value
Data Transfer  lw   rt,offset(rs)   rt = Mem[rs + offset]
               sw   rt,offset(rs)   Mem[rs + offset] = rt
Branch         beq  rs,rt,offset    pc = (rs == rt) ? (pc + 4 + offset) : (pc + 4)
Jump           j    address         pc = address
3MIPS fixed sized instruction formats
I-Format
ALUi           alui rt,rs,value
Data Transfer  lw   rt,offset(rs)
               sw   rt,offset(rs)
Branch         beq  rs,rt,offset
4Assembling Instructions
Suppose there are 32 registers, addu opcode = 001001, addi opcode = 001000.
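As a minimal sketch of what assembling means here, the snippet below packs an I-format instruction into a 32-bit word. It assumes the field positions used later in the lecture (opcode in IR[31-26], rs in IR[25-21], rt in IR[20-16], immediate in IR[15-0]); the function name encode_i_format is hypothetical and chosen only for illustration.

```python
def encode_i_format(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack opcode (6 bits), rs (5), rt (5) and a 16-bit immediate into one word."""
    assert 0 <= opcode < 64 and 0 <= rs < 32 and 0 <= rt < 32
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

ADDI = 0b001000  # addi opcode as quoted on the slide

# addi $t0, $0, 1   ($t0 is register 8, $0 is register 0)
word = encode_i_format(ADDI, rs=0, rt=8, imm=1)
print(f"{word:032b}")   # opcode | rs | rt | immediate
print(f"0x{word:08x}")
```

With 32 registers each register field needs 5 bits, which is why rs and rt are masked to the range 0..31.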
5MIPS instruction formats
Arithmetic          addi rt,rs,value
                    add  rd,rs,rt
Data Transfer       lw   rt,offset(rs)
                    sw   rt,offset(rs)
Conditional branch  beq  rs,rt,offset
Unconditional jump  j    address
6MIPS registers and conventions
Name      Number        Conventional usage
$0        0             Constant 0
$v0-$v1   2-3           Expression evaluation and function return
$a0-$a3   4-7           Arguments 1 to 4
$t0-$t9   8-15, 24-25   Temporary (not preserved across call)
$s0-$s7   16-23         Saved temporary (preserved across call)
$k0-$k1   26-27         Reserved for OS kernel
$gp       28            Pointer to global area
$sp       29            Stack pointer
$fp       30            Frame pointer
$ra       31            Return address (used by function call)
7C function to MIPS Assembly Language
int power_2(int y) { /* compute x = 2^y */
    register int x, i;
    x = 1; i = 0;
    while (i < y) { x = x * 2; i = i + 1; }
    return x;
}

Assembler .s                Comments
      addi $t0, $0, 1       # x = 1
      addu $t1, $0, $0      # i = 0
w1:   bge  $t1, $a0, w2     # while (i < y)   /* bge = greater or equal */
      addu $t0, $t0, $t0    # x = x * 2       /* same as x = x + x */
      addi $t1, $t1, 1      # i = i + 1
      beq  $0, $0, w1
w2:   addu $v0, $0, $t0     # return x
      jr   $ra              # jump on register (pc = ra)
8Power_2.s MIPS storage assignment
.text
Address      Numbered registers    Named registers
0x00400020   addi $8, $0, 1        addi $t0, $0, 1
0x00400024   addu $9, $0, $0       addu $t1, $0, $0
0x00400028   bge  $9, $4, 2        bge  $t1, $a0, w2
0x0040002c   addu $8, $8, $8       addu $t0, $t0, $t0
0x00400030   addi $9, $9, 1        addi $t1, $t1, 1
0x00400034   beq  $0, $0, -3       beq  $0, $0, w1
0x00400038   addu $2, $0, $8       addu $v0, $0, $t0
0x0040003c   jr   $31              jr   $ra
9Machine Language Single Stepping
Assume power_2(0) is called; then $a0 = 0 and $ra = 700018.
pc         $v0  $a0  $t0  $t1  $ra      next instruction
00400024   ?    0    1    ?    700018   addu $t1, $0, $0
00400028   ?    0    1    0    700018   bge  $t1, $a0, w2
00400038   ?    0    1    0    700018   addu $v0, $0, $t0
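The single stepping above can be reproduced with a small interpreter. This is only a sketch: it resolves the labels w1 and w2 directly to the addresses from the storage assignment instead of decoding branch offsets, and the function name run_power_2 is hypothetical.

```python
def run_power_2(y):
    """Single-step the power_2 listing; return (result, trace of (pc, mnemonic))."""
    base = 0x00400020
    program = [
        ("addi", "t0", "0", 1),      # x = 1
        ("addu", "t1", "0", "0"),    # i = 0
        ("bge",  "t1", "a0", "w2"),  # w1: leave the loop when i >= y
        ("addu", "t0", "t0", "t0"),  # x = x + x
        ("addi", "t1", "t1", 1),     # i = i + 1
        ("beq",  "0", "0", "w1"),    # always taken
        ("addu", "v0", "0", "t0"),   # w2: return value
        ("jr",   "ra", None, None),  # pc = ra
    ]
    labels = {"w1": base + 8, "w2": base + 24}
    regs = {"0": 0, "v0": None, "a0": y, "t0": None, "t1": None, "ra": 0x700018}
    pc, trace = base, []
    while base <= pc < base + 4 * len(program):
        op, a, b, c = program[(pc - base) // 4]
        trace.append((pc, op))
        next_pc = pc + 4
        if op == "addi":
            regs[a] = regs[b] + c
        elif op == "addu":
            regs[a] = regs[b] + regs[c]
        elif op == "bge" and regs[a] >= regs[b]:
            next_pc = labels[c]
        elif op == "beq" and regs[a] == regs[b]:
            next_pc = labels[c]
        elif op == "jr":
            next_pc = regs[a]
        pc = next_pc
    return regs["v0"], trace

result, trace = run_power_2(0)
for pc, op in trace:
    print(f"{pc:08x} {op}")
```

For power_2(0) the branch at 00400028 is taken immediately, so the trace jumps from 00400028 to 00400038, matching the rows of the table.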
10Von Neumann and Harvard CPU Architectures
[Figure: a Von Neumann machine (ALU, I/O, address bus, data bus, one memory for instructions and data) and a Harvard machine (ALU, I/O, separate memories for instructions and for data)]
Harvard architecture: the term was coined to describe machines with separate instruction and data memories. Speed efficient: increased parallelism.
Von Neumann architecture: area efficient, but requires higher bus bandwidth because instructions and data must compete for memory.
11Multi-cycle Processor Datapath
12Multi-cycle Datapath with controller
13Multi-cycle using Finite State Machine
Finite State Machine (hardwired control)
[Figure: combinational control logic and a state register. Inputs: the opcode field from the instruction register and the current state. Outputs: the datapath control outputs and the next state.]
14Finite State Machine program overview
Cycle  States
T1     Fetch
T2     Decode
T3     Mem1, Rformat1, BEQ1, JUMP1
T4     Rformat11, LW2, SW2
T5     LW21
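The overview above can be read as a next-state function. The sketch below is an assumption drawn from the state names and the T1 to T5 columns, not the lecture's own controller: Decode dispatches on the instruction class, Mem1 splits into the load and store paths, and every final state returns to Fetch. The class names "alu", "lw", "sw", "beq" and "j" are illustrative labels.

```python
# Assumed transitions, inferred from the state names in the overview.
DISPATCH = {"alu": "Rformat1", "lw": "Mem1", "sw": "Mem1", "beq": "BEQ1", "j": "JUMP1"}
MEM_NEXT = {"lw": "LW2", "sw": "SW2"}

def next_state(state, instr):
    """Return the state entered on the next clock edge."""
    if state == "Fetch":
        return "Decode"
    if state == "Decode":
        return DISPATCH[instr]
    if state == "Mem1":
        return MEM_NEXT[instr]
    if state == "Rformat1":
        return "Rformat11"
    if state == "LW2":
        return "LW21"
    return "Fetch"  # BEQ1, JUMP1, Rformat11, SW2 and LW21 finish the instruction

def states_for(instr):
    """List the states one instruction passes through, starting at Fetch."""
    seq, state = ["Fetch"], "Fetch"
    while True:
        state = next_state(state, instr)
        if state == "Fetch":
            return seq
        seq.append(state)

for instr in ("alu", "lw", "sw", "beq", "j"):
    print(instr, states_for(instr))
```

The length of each list is the number of clock cycles that instruction class needs.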
15The Four Stages of R-Format
- Fetch
  - Fetch the instruction from the Instruction Memory
- Decode
  - Registers Fetch and Instruction Decode
- Exec ALU
  - ALU operates on the two register operands
  - Update PC
- Write Reg
  - Write the ALU output back to the register file
16R-Format State Machine
[State diagram: each transition is taken when Clock=1]
17The Five Stages of Load Instruction
Load: Cycle 1 = Fetch, Cycle 2 = Decode, Cycle 3 = Exec, Cycle 4 = Mem, Cycle 5 = Wr
- Fetch
  - Fetch the instruction from the Instruction Memory
- Decode
  - Registers Fetch and Instruction Decode
- Exec Offset
  - Calculate the memory offset
- Mem
  - Read the data from the Data Memory
- Wr
  - Write the data back to the register file
18R-Format and I-Format State Machine
[State diagram. Transition labels: Clock=1 AND R-Format=1; Clock=1; Clock=1]
19Multi-Instruction sequence
20State machine stepping T1 Fetch
(Done in parallel)
IR ← MEMORY[PC]
PC ← PC + 4
21T1 Fetch State machine
Start → Instruction Fetch
MemRead=1, MemWrite=0
IorD=0 (MemAddr ← PC)
IRWrite=1 (IR ← Mem[PC])
ALUSrcA=0 (PC)
ALUSrcB=1 (4)
ALUOP=ADD (PC ← 4 + PC)
PCWrite=1, PCSource=0 (ALU)
RegWrite=0, MemtoReg=X, RegDst=X
22T2 Decode (read rs and rt and offset+pc)
A ← Reg[IR[25-21]]
B ← Reg[IR[20-16]]
23T2 Decode State machine
Instr. Decode & Register Fetch
MemRead=0, MemWrite=0
IorD=X
IRWrite=0
ALUSrcA=0 (PC)
ALUSrcB=3 (signext(IR[15-0]) << 2)
ALUOP=0 (add)
PCWrite=0, PCSource=X
RegWrite=0, MemtoReg=X, RegDst=X
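The decode step can be sketched in a few lines: extract the rs and rt fields and compute the speculative branch target with the ALU (ALUSrcA = PC, ALUSrcB = shifted offset). The helper names sign_extend16 and decode are hypothetical, and the PC passed in is assumed to be the value already incremented in T1.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def decode(ir, pc):
    """Return (rs, rt, ALUOut) for instruction word ir; pc is already PC + 4."""
    rs = (ir >> 21) & 0x1F           # IR[25-21], selects A
    rt = (ir >> 16) & 0x1F           # IR[20-16], selects B
    offset = sign_extend16(ir)       # IR[15-0]
    alu_out = (pc + (offset << 2)) & 0xFFFFFFFF
    return rs, rt, alu_out

# Low 16 bits 0xFFFD encode an offset of -3 words.
print(decode(0x0000FFFD, 0x00400038))
```

The target is computed for every instruction, whether or not it turns out to be a branch, because the ALU is otherwise idle in this cycle.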
24T3 ExecALU (ALU instruction)
ALUOut ← A op(IR[31-26]) B
25T3 ExecALU State machine
R-Format Execution
MemRead=0, MemWrite=0
IorD=X
IRWrite=0
ALUSrcA=1 (A = Reg[rs])
ALUSrcB=0 (B = Reg[rt])
ALUOP=2 (IR[28-26])
PCWrite=0, PCSource=X
RegWrite=0, MemtoReg=X, RegDst=X
26T4 WrReg (ALU instruction)
Reg[IR[15-11]] ← ALUOut
27T4 WrReg State machine
R-Format Write Register (entered from Exec)
MemRead=0, MemWrite=0
IorD=X
IRWrite=0
ALUSrcA=X
ALUSrcB=X
ALUOP=X
PCWrite=0, PCSource=X
RegWrite=1 (Reg[rd] ← ALUOut)
MemtoReg=0 (ALUOut)
RegDst=1 (rd)
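Putting T1 through T4 together, the register transfers of one ALU instruction can be sketched as below. The sketch follows the lecture's simplified scheme in which IR[31-26] selects the ALU operation and uses the addu opcode quoted earlier (001001); the ALU_OPS table and the function name execute_r_format are assumptions made for illustration.

```python
# Assumed opcode-to-operation table; only addu (opcode from the slides) is filled in.
ALU_OPS = {0b001001: lambda a, b: (a + b) & 0xFFFFFFFF}

def execute_r_format(memory, regs, pc):
    """Run one ALU instruction through T1..T4 and return the updated PC."""
    # T1 Fetch: IR <- Memory[PC], PC <- PC + 4
    ir = memory[pc]
    pc = pc + 4
    # T2 Decode: A <- Reg[IR[25-21]], B <- Reg[IR[20-16]]
    a = regs[(ir >> 21) & 0x1F]
    b = regs[(ir >> 16) & 0x1F]
    # T3 ExecALU: ALUOut <- A op(IR[31-26]) B
    alu_out = ALU_OPS[(ir >> 26) & 0x3F](a, b)
    # T4 WrReg: Reg[IR[15-11]] <- ALUOut
    regs[(ir >> 11) & 0x1F] = alu_out
    return pc

# addu $8, $8, $8 with register 8 holding 4
ir = (0b001001 << 26) | (8 << 21) | (8 << 16) | (8 << 11)
regs = [0] * 32
regs[8] = 4
new_pc = execute_r_format({0x0040002C: ir}, regs, 0x0040002C)
print(regs[8], hex(new_pc))
```

Each commented step corresponds to one clock cycle, so the instruction completes in four cycles.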
28Review Moore Machine
Next State
29Moore Output State Tables O(State)
State   MemRead  MemWrite  IorD    IRWrite  ALUOP   ALUSrcA     ALUSrcB     PCWrite  PCSource  RegWrite  MemtoReg    RegDst
T1      1        0         0 (PC)  1        0       0 (PC)      1 (4)       1        0 (ALU)   0         X           X
T2      0        0         X       0        0       0 (PC)      3 (offset)  0        X         0         X           X
T3-R    0        0         X       0        2 (op)  1 (A = rs)  0 (B = rt)  0        X         0         X           X
T4-R    0        0         X       0        X       X           X           0        X         1         0 (ALUOut)  1 (rd)
IorD, ALUSrcA, ALUSrcB, PCSource, MemtoReg and RegDst are MUX select signals.
30Review The Five Stages of Load Instruction
Load: Cycle 1 = Fetch, Cycle 2 = Decode, Cycle 3 = Exec, Cycle 4 = Mem, Cycle 5 = Wr
- Fetch
  - Fetch the instruction from the Instruction Memory
- Decode
  - Registers Fetch and Instruction Decode
- Exec Offset
  - Calculate the memory offset
- Mem
  - Read the data from the Data Memory
- Wr
  - Write the data back to the register file
31Review R-Format and I-Format State Machine
[State diagram. Transition labels: Clock=1 AND R-Format=1; Clock=1; Clock=1]
32T3I Mem1 (common to both load & store)
ALUOut ← A + sign_extend(IR[15-0])
33T3 Mem1 I-Format State Machine rs + offset
I-Format Execution: ALUout = rs + offset
MemRead=0, MemWrite=0
IorD=X
IRWrite=0
ALUOP=0
ALUSrcA=1 (A = Reg[rs])
ALUSrcB=2 (signext(IR[15-0]))
PCWrite=0, PCSource=X
RegWrite=0, MemtoReg=X, RegDst=X
Transition labels in the diagram: Clock=1 AND R-Format=1; Clock=1 AND opcode=LW
34T4 LW1 load instruction, read memory
MDR ← Memory[ALUOut]
35T4 LW2 I-Format State Machine Mem[ALU]
I-Format Memory Read
MemRead=1, MemWrite=0
IorD=1
IRWrite=0
ALUOP=X
ALUSrcA=X
ALUSrcB=X
PCWrite=0, PCSource=X
RegWrite=0, MemtoReg=X, RegDst=X
Transition labels in the diagram: Clock=1 AND I-Format=1; Clock=1 AND R-Format=1; Clock=1 AND opcode=LW
36T5 LW2 Load instruction, write to register
Reg[IR[20-16]] ← MDR
37T5 LW2 I-Format State Machine rt = MDR
I-Format Register Write
MemRead=1, MemWrite=0
IorD=1
IRWrite=0
ALUOP=X
ALUSrcA=X
ALUSrcB=X
PCWrite=0, PCSource=X
RegWrite=1
MemtoReg=1 (MDR)
RegDst=0 (rt; RegDst=1 selects rd, as in the R-format write)
Transition labels in the diagram: Clock=1 AND I-Format=1; Clock=1 AND R-Format=1; Clock=1 AND opcode=LW
38T4 SW2 Store instruction, write to memory
Memory[ALUOut] ← B
39T4 SW2 I-Format State Machine Mem[ALU]
I-Format Memory Write (Store not Load!)
MemRead=0, MemWrite=1
IorD=1
IRWrite=0
ALUOP=X
ALUSrcA=X
ALUSrcB=X
PCWrite=0, PCSource=X
RegWrite=0, MemtoReg=X, RegDst=X
Transition labels in the diagram: Clock=1 AND I-Format=1; Clock=1 AND R-Format=1; Clock=1 AND opcode=SW
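The load and store steps above share the address calculation in T3 and then diverge. The following sketch models them with a Python list for the register file and a dict for memory; the function names are hypothetical, and the opcode bits of the example words are left at zero because only the rs, rt and offset fields matter here.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def mem1(ir, regs):
    """T3: ALUOut <- A + sign_extend(IR[15-0]), with A = Reg[IR[25-21]]."""
    return (regs[(ir >> 21) & 0x1F] + sign_extend16(ir)) & 0xFFFFFFFF

def load_word(ir, regs, memory):
    alu_out = mem1(ir, regs)                    # T3: address calculation
    mdr = memory[alu_out]                       # T4: MDR <- Memory[ALUOut]
    regs[(ir >> 16) & 0x1F] = mdr               # T5: Reg[IR[20-16]] <- MDR

def store_word(ir, regs, memory):
    alu_out = mem1(ir, regs)                    # T3: address calculation
    memory[alu_out] = regs[(ir >> 16) & 0x1F]   # T4: Memory[ALUOut] <- B

regs = [0] * 32
regs[8] = 0x10010000
regs[10] = 7
memory = {0x10010008: 42}
load_word((8 << 21) | (9 << 16) | 8, regs, memory)         # lw $9, 8($8)
store_word((8 << 21) | (10 << 16) | 0xFFFC, regs, memory)  # sw $10, -4($8)
print(regs[9], memory[0x1000FFFC])
```

The store finishes one cycle earlier than the load because it has no register write-back step.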
40T3 BEQ1 (Conditional branch instruction)
If (A - B == 0) PC ← ALUOut   (the ALU Zero output indicates A - B == 0)
ALUOut = address computed in T2!
41T3 BEQ1 I-Format State Machine rs offset
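B-Format Execution
MemRead=0, MemWrite=0
IorD=X
IRWrite=0
ALUOP=1 (subtract; ALUOP=0 is already used for add)
ALUSrcA=1 (A = Reg[rs])
ALUSrcB=0 (B = Reg[rt])
PCWrite=0, PCWriteCond=1
PCSource=1 (ALUout)
RegWrite=0, MemtoReg=X, RegDst=X
Transition labels in the diagram: Clock=1 AND opcode=branch; Clock=1 AND R-Format=1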
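The branch completion step can be sketched as a conditional PC write. The function name execute_beq is hypothetical; pc is assumed to be the value already incremented in T1, and alu_out the target computed in T2.

```python
def execute_beq(a, b, pc, alu_out):
    """T3 of beq: the ALU subtracts; PC is written only if the Zero output is set."""
    zero = ((a - b) & 0xFFFFFFFF) == 0   # ALU Zero output
    return alu_out if zero else pc       # PCWriteCond=1, PCSource=1 (ALUOut)

print(hex(execute_beq(0, 0, 0x00400038, 0x00400028)))  # taken
print(hex(execute_beq(1, 0, 0x00400038, 0x00400028)))  # not taken
```

Because the target was already computed in T2, the branch needs only three cycles.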
42T3 Jump1 (Jump Address)
PC ← PC[31-28] || IR[25-0] << 2
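The jump address concatenation can be checked with a few bit operations. This is a minimal sketch; jump_target is a hypothetical helper name, and the opcode bits of the example word are left at zero since only IR[25-0] is used.

```python
def jump_target(pc, ir):
    """PC <- PC[31-28] || (IR[25-0] << 2)."""
    upper = pc & 0xF0000000              # keep the top four bits of PC
    lower = (ir & 0x03FFFFFF) << 2       # 26-bit field becomes a 28-bit byte address
    return upper | lower

# A 26-bit field of 0x0010000F addresses the word at 0x0040003c.
print(hex(jump_target(0x00400024, 0x0010000F)))
```

The shift by two restores the byte address, since instructions are word aligned and the two low bits are always zero.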
43Moore Output State Tables O(State)
State   MemRead  MemWrite  IorD     IRWrite  ALUOP    ALUSrcA     ALUSrcB   PCWrite  PCSource  RegWrite  MemtoReg  RegDst
T1      1        0         0 (PC)   1        0        0 (PC)      1 (4)     1        0 (ALU)   0         X         X
T2      0        0         X        0        0        0           3         0        X         0         X         X
T3-R    0        0         X        0        2 (op)   1 (A = rs)  0 (B = rt) 0       X         0         X         X
T4-R    0        0         X        0        X        X           X         0        X         1         0 (ALU)   1 (rd)
T3-I    0        0         X        0        0 (add)  1 (A = rs)  2 (sign)  0        X         0         X         X
T4-LW   1        0         1 (ALU)  0        X        X           X         0        X         0         X         X
T5-LW   0        0         X        0        X        X           X         0        X         1         1 (MDR)   0 (rt)
T4-SW   0        1         1 (ALU)  0        X        X           X         0        X         0         X         X
IorD, ALUSrcA, ALUSrcB, PCSource, MemtoReg and RegDst are MUX select signals.
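In a Moore machine the outputs depend only on the current state, so the table above amounts to a lookup. The sketch below stores it as a Python dict, with None standing for a don't-care (X). One value is an assumption: RegDst for T5-LW is entered as 0 (rt), because RegDst=1 already selects rd in T4-R and a one-bit select cannot mean both.

```python
SIGNALS = ("MemRead", "MemWrite", "IorD", "IRWrite", "ALUOP", "ALUSrcA",
           "ALUSrcB", "PCWrite", "PCSource", "RegWrite", "MemtoReg", "RegDst")
X = None  # don't care

OUTPUTS = {
    "T1":    (1, 0, 0, 1, 0, 0, 1, 1, 0, 0, X, X),
    "T2":    (0, 0, X, 0, 0, 0, 3, 0, X, 0, X, X),
    "T3-R":  (0, 0, X, 0, 2, 1, 0, 0, X, 0, X, X),
    "T4-R":  (0, 0, X, 0, X, X, X, 0, X, 1, 0, 1),
    "T3-I":  (0, 0, X, 0, 0, 1, 2, 0, X, 0, X, X),
    "T4-LW": (1, 0, 1, 0, X, X, X, 0, X, 0, X, X),
    "T5-LW": (0, 0, X, 0, X, X, X, 0, X, 1, 1, 0),  # RegDst=0 (rt) is an assumption
    "T4-SW": (0, 1, 1, 0, X, X, X, 0, X, 0, X, X),
}

def control(state):
    """Moore output function O(State): control signals for the given state."""
    return dict(zip(SIGNALS, OUTPUTS[state]))

writers = [s for s in OUTPUTS if control(s)["RegWrite"] == 1]
print(control("T1"))
print("states that write a register:", writers)
```

Only the two write-back states assert RegWrite, and only T4-SW asserts MemWrite, which is a quick sanity check on the table.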
44Multi-cycle 5 execution steps
- T1 (a,lw,sw,beq,j) Instruction Fetch
- T2 (a,lw,sw,beq,j) Instruction Decode and Register Fetch
- T3 (a,lw,sw,beq,j) Execution, Memory Address Calculation, or Branch Completion
- T4 (a,lw,sw) Memory Access or R-type instruction completion
- T5 (lw) Write-back step
INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!
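Since instructions take between 3 and 5 cycles, the total cycle count of a program depends on its instruction mix. A small sketch, assuming 4 cycles for ALU instructions (the four R-format stages), 5 for lw, 4 for sw and 3 for beq and j; the instruction mix below is made up purely for illustration.

```python
# Cycles per instruction class on the multi-cycle datapath (see assumptions above).
CYCLES = {"alu": 4, "lw": 5, "sw": 4, "beq": 3, "j": 3}

def total_cycles(instructions):
    """Sum the clock cycles needed by a sequence of instruction classes."""
    return sum(CYCLES[i] for i in instructions)

mix = ["lw", "alu", "sw", "beq"]   # hypothetical sequence
cycles = total_cycles(mix)
print(cycles, "cycles, average", cycles / len(mix), "cycles per instruction")
```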
45Multi-cycle Approach
All operations in each clock cycle Ti are done in parallel, not sequentially! For example, in T1, IR ← Memory[PC] and PC ← PC + 4 are done simultaneously!
Between clock cycles T2 and T3 the microcode sequencer will do a dispatch 1.
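One way to model "done in parallel" in software is to compute every next value from the current register contents and only then commit them, as a clock edge would. The sketch below does this for T1; the function name fetch_cycle and the dict-based state are assumptions made for illustration.

```python
def fetch_cycle(state, memory):
    """T1: both transfers read the OLD PC; results appear together at the clock edge."""
    nxt = dict(state)                  # next-state copy, committed as a whole
    nxt["IR"] = memory[state["PC"]]    # IR <- Memory[PC]
    nxt["PC"] = state["PC"] + 4        # PC <- PC + 4
    return nxt

memory = {0x00400020: 0x20080001, 0x00400024: 0x00000000}
before = {"PC": 0x00400020, "IR": None}
after = fetch_cycle(before, memory)
print(hex(after["IR"]), hex(after["PC"]))
```

Had the PC been updated first and then used for the fetch, IR would hold the word at 0x00400024 instead, which is exactly the sequential reading the slide warns against.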