Title: The single cycle CPU
1 The single cycle CPU
2 Performance of Single-Cycle Machines
- Memory Unit: 2 ns
- ALU and Adders: 2 ns
- Register file (Read or Write): 1 ns

Class      Fetch  Decode  ALU  Memory  Write Back  Total
R-format   2      1       2    0       1           6 ns
LW         2      1       2    2       1           8 ns
SW         2      1       2    2       -           7 ns
Branch     2      1       2    -       -           5 ns
Jump       2      -       -    -       -           2 ns
3 What if we had a variable CK cycle?
- Let's check the following scenario:
- R-type 44%, LW 24%, SW 12%
- BRANCH 18%, JUMP 2%
- I - number of instructions in the program
- T - time of the CK cycle
- CPI - number of CK cycles per instruction (= 1)
- Execution = I × T × CPI, where the average T = 8×24% + 7×12% + 6×44% + 5×18% + 2×2% ≈ 6.3 ns
4 The result
- EXE (single cycle) = I × T(single clock) = I × 8 ns
- EXE (variable) = I × T(variable clock) = I × 6.3 ns
- We get a ratio of 8 / 6.3 = 1.27. The ratio is higher when more complicated instructions, e.g., floating point instructions, are also implemented.
- Since building a variable CK circuit is too complicated, we instead want each instruction to take as many shorter CKs as it requires.
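The arithmetic of slides 2 to 4 can be checked with a short Python sketch. The dictionary and function names are illustrative only; the numbers are the per-class totals and the instruction mix given on the slides.

```python
# Time per instruction class in ns (the "Total" column of slide 2).
CLASS_TIME_NS = {"R-type": 6, "lw": 8, "sw": 7, "branch": 5, "jump": 2}
# Instruction mix of slide 3, as fractions of all executed instructions.
MIX = {"R-type": 0.44, "lw": 0.24, "sw": 0.12, "branch": 0.18, "jump": 0.02}


def single_clock_ns(times):
    """A fixed clock must fit the slowest instruction class."""
    return max(times.values())


def variable_clock_ns(times, mix):
    """Average clock period if each class could use its own period."""
    return sum(times[name] * mix[name] for name in times)


t_single = single_clock_ns(CLASS_TIME_NS)
t_variable = variable_clock_ns(CLASS_TIME_NS, MIX)
# I and CPI (= 1) are the same for both machines, so they cancel in the ratio.
ratio = t_single / t_variable
print(f"T single = {t_single} ns, T variable = {t_variable:.2f} ns, ratio = {ratio:.2f}")
```

The unrounded average is 6.34 ns, which gives a ratio of about 1.26. The 1.27 on the slide comes from rounding T to 6.3 ns before dividing.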
5 Multicycle Approach
- The idea of the multi-cycle approach:
- We'll save time, since each instruction takes only the necessary number of CK cycles (which are about 5 times shorter than the original CK cycle).
- We also save on components, since we can use the same component in different phases of the same instruction.
6 Building a Multi-Cycle CPU
- Split the instruction into steps (phases).
- Make sure that the steps are balanced (same time required).
- Reduce the job done at each step: in each step only one chore is done.
- At the end of each CK cycle, store the result of the current step to be used by the next step. So, add more internal registers for storing the intermediate results.
7 A single cycle CPU capable of R-type and lw/sw instructions (data and control)
[Figure: PC, +4 adder, instruction memory, register file (Rd), ALU, data memory (Address, D.In, D.Out), sign extension of IR[15-0] (Sext 16->32), and ALU control fed by funct = IR[5-0] and opcode = IR[31-26]. Control signals shown: MemWrite, RegWrite, add.]
8 A single cycle CPU capable of R-type and lw/sw instructions - data path only
[Figure: PC, +4 adder, instruction memory, register file (Rd), ALU, data memory (Address, D.In, D.Out), sign extension of IR[15-0] (Sext 16->32). The lw and sw paths are marked.]
9 Timing of a single cycle CPU
10 Timing of a lw instruction in a single cycle CPU
[Timing diagram: PC (0x400000); I.Mem data (memory output); Rs, Rt (ALU inputs); D.Mem adrs (ALU output, the address); D.Mem data (Mem data). Phases: fetch 2 ns, decode 1 ns, execute 2 ns, memory 2 ns, write back 1 ns.]
We want to replace a long single CK cycle with 5 short ones.
Timing of a lw instruction in a multi-cycle CPU
[Timing diagram over clock edges 0 to 5 (0): fetch (PC = 0x400000, instruction into IR), decode (A, B), execute (the ALU calculates something, ALUout), memory (Mem data into MDR), write back.]
11 Therefore we should add registers to the single cycle CPU shown below
[Figure: PC, +4 adder, instruction memory, register file (IR[25-21] = Rs, IR[20-16] = Rt, Rd), ALU, data memory (Address, D.In, D.Out), sign extension of IR[15-0] (Sext 16->32).]
12 Adding registers to split the instruction into 5 stages
[Figure: the same data path with the registers IR, A, B, ALUout and MDR added and a PCWrite signal on the PC; the stage boundaries are numbered 0 to 5.]
13 Here is the book's version of the multi-cycle CPU
Only PC and IR have write enable signals. All other registers hold data for a single cycle.
14 Here is our version of a multi-cycle CPU capable of R-type, lw/sw and branch instructions
[Figure: PC, instruction and data memory, IR, MDR, register file (Rd), A, B, ALU with the constant 4 as an input, ALUout, sign extension of IR[15-0] (Sext 16->32), and two << 2 shifters.]
15 Let us explain the multi-cycle CPU
- First we'll look at a CPU capable of performing only R-type instructions
- Then, we'll add the lw instruction
- And the sw instruction
- Then, the beq instruction
- And finally, the j instruction
16 Let us remind ourselves how a single cycle CPU capable of performing R-type instructions works. Here you see the data path and the timing of an R-type instruction.
[Figure: PC, +4 adder, instruction memory, register file, ALU; opcode = IR[31-26] and funct = IR[5-0] feed the control.]
17 A single cycle CPU demo: R-type instruction
[Figure: PC, +4, instruction memory, register file, ALU.]
18 A multi cycle CPU capable of performing R-type instructions
[Figure: PC, instruction and data memory, IR, register file (IR[25-21] = Rs, IR[20-16] = Rt, Rd), A, B, ALU, ALUout.]
19 A multi cycle CPU capable of R-type instructions: fetch
[Same data path; clock edges 0 and 1 marked.]
20 A multi cycle CPU capable of R-type instructions: decode
[Same data path; clock edges 1 and 2 marked.]
21 A multi cycle CPU capable of R-type instructions: execute
[Same data path; clock edges 2 and 3 marked.]
22 A multi cycle CPU capable of R-type instructions: write back
[Same data path; clock edges 3 and 4 marked.]
23 Timing of an R-type instruction in a single cycle CPU
[Timing diagram: PC (0x400000); Inst. Mem data (memory output, the instruction); Rs, Rt (ALU inputs); ALU output (data result of the calculation); GPR input. Phases: fetch, decode, execute, write back.]
Timing of an R-type instruction in a multi-cycle CPU
[Timing diagram over clock edges 0 to 4 (0): fetch (PC; Mem data; IR changes from the previous inst. to the current instruction), decode (A, B), execute (ALUout), write back.]
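24 [Timing diagram and state diagram: an R-type instruction in a multi-cycle CPU]
- fetch: IR = M(PC). Mem data carries the current instruction; IR changes from the previous inst. to the current instruction (IRWrite); the memory then goes on to the next inst.
- decode: A = Rs, B = Rt (the GPR outputs are stored in A, B)
- execute: ALUout = A op B (the ALU output is stored in ALUout)
- write back: Rd = ALUout. At the rising edge of CK, Rd = ALUout.
An R-type instruction takes 4 CKs.
The state diagram:
IR = M(PC)
A = Rs, B = Rt
ALUout = A op B
Rd = ALUout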
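The four register transfers of an R-type instruction can be traced with a small Python sketch, one transfer per CK. This is an illustration rather than the slides' own model: `run_r_type`, the dictionary used as memory and the `alu_op` callback are hypothetical names, and the Rd field is taken from IR[15-11] as in the standard MIPS R-format.

```python
def run_r_type(mem, regs, pc, alu_op):
    """Run one R-type instruction as four clocked steps; returns the step log."""
    log = []

    # CK 1, fetch: IR = M(PC)
    ir = mem[pc]
    log.append(("fetch", f"IR = {ir:#010x}"))

    # CK 2, decode: A = Rs, B = Rt (register numbers are IR[25-21] and IR[20-16])
    rs, rt, rd = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F, (ir >> 11) & 0x1F
    a, b = regs[rs], regs[rt]
    log.append(("decode", f"A = {a}, B = {b}"))

    # CK 3, execute: ALUout = A op B
    alu_out = alu_op(a, b) & 0xFFFFFFFF
    log.append(("execute", f"ALUout = {alu_out}"))

    # CK 4, write back: Rd = ALUout (Rd is IR[15-11])
    regs[rd] = alu_out
    log.append(("write back", f"R{rd} = {alu_out}"))
    return log


regs = [0] * 32
regs[1], regs[2] = 5, 7
mem = {0x400000: 0x00221820}  # add $3, $1, $2
steps = run_r_type(mem, regs, 0x400000, lambda a, b: a + b)
```

Each entry of `steps` corresponds to one state of the state diagram, so the log has four entries for the four CKs.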
25 A multi-cycle CPU capable of R-type instructions (PC calc.)
[Figure: PC, instruction and data memory, IR, register file (Rd), A, B, ALU with the constant 4 as an input, ALUout.]
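26 [Timing diagram: an R-type instruction in a multi-cycle CPU, including the PC calculation]
- fetch: PC = PC + 4 (PCWrite), so next PC = current PC + 4. Mem data carries the current instruction; IR changes from the previous inst. to the current instruction; the memory then goes on to the next inst.
- decode: the GPR outputs are stored in A, B
- execute: ALUout = A op B (the ALU output is stored in ALUout)
- write back: at the rising edge of CK, Rd = ALUout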
27 A multi cycle CPU capable of R-type instructions: fetch
[Figure: PC, instruction memory, IR, register file (IR[25-21] = Rs, IR[20-16] = Rt, Rd), A, B, ALU with the constant 4 as an input, ALUout.]
28 The state diagram of a CPU capable of R-type instructions only
IR = M(PC), PC = PC + 4
A = Rs, B = Rt
ALUout = A op B
Rd = ALUout
29 The state diagram of a CPU capable of R-type and lw instructions
ALUout = A + sext(imm)
MDR = M(ALUout)
Rt = MDR
30 We added registers to split the instruction into 5 stages. Let's discuss the lw instruction
All parts related to lw only are blue.
[Figure: the single cycle data path with the registers IR, A, B, ALUout and MDR added and a PCWrite signal on the PC; register file fields IR[25-21] = Rs, IR[20-16] = Rt, Rd; data memory (Address, D.In, D.Out); sign extension of IR[15-0] (Sext 16->32); stage boundaries numbered 0 to 5.]
In the single-cycle CPU we kept the data flow from left to right. Here we change that a little since, as we'll see, we use some parts of the CPU more than once during the same instruction. So we prefer to move the data memory.
31 First we draw a multi-cycle CPU capable of R-type and lw instructions
[Figure: PC, instruction memory, IR, register file (Rd), A, B, ALU with the constant 4 as an input, ALUout, sign extension of IR[15-0] (Sext 16->32), data memory, MDR.]
We just moved the data memory.
All parts related to lw only are blue.
32 A multi-cycle CPU capable of R-type and lw instructions: fetch
[Figure: PC, instruction memory, IR, register file (Rd), A, B, ALU with the constant 4 as an input, ALUout, sign extension of IR[15-0] (Sext 16->32), data memory, MDR.]
33 A multi-cycle CPU capable of R-type and lw instructions: decode
[Same data path, showing IR[25-21] = Rs, IR[20-16] = Rt and a << 2 shifter.]
34 A multi-cycle CPU capable of R-type and lw instructions: AdrCmp
[Same data path.]
35 A multi-cycle CPU capable of R-type and lw instructions: memory
[Same data path, showing a << 2 shifter and the label "Branch Address".]
36 A multi-cycle CPU capable of R-type and lw instructions: WB
[Same data path, with Rt as the register file write address.]
37 Can we unite the instruction and data memories? (They are not used simultaneously, as they are in the single cycle CPU.)
[Figure: PC, instruction memory, IR, register file (Rd), A, B, ALU with the constant 4 as an input, ALUout, sign extension of IR[15-0] (Sext 16->32), data memory, MDR.]
38 So here is a multi-cycle CPU capable of R-type and lw instructions, using a single memory for instructions and data
[Same data path, with one instruction and data memory in place of the two separate memories.]
39 Timing of a lw instruction in a single cycle CPU
[Timing diagram: PC (0x400000); I.Mem data (memory output); Rs, Rt (ALU inputs); D.Mem adrs (ALU output, the address); D.Mem data (Mem data). Phases: fetch, decode, execute, memory, write back.]
Timing of a lw instruction in a multi-cycle CPU
[Timing diagram: fetch (PC becomes PC+4; IR changes from the previous inst. to the current instruction), decode (A, B), execute (data address in ALUout), memory (Mem data into MDR), write back (data to Rt).]
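40 [Timing diagram: the steps of a lw instruction in a multi-cycle CPU]
- fetch: IR = M(PC), PC = PC + 4 (PCWrite, IRWrite). Mem data carries the current instruction; IR changes from the previous inst. to the current instruction.
- decode: A = Rs, B = Rt (the GPR outputs are stored in A, B)
- execute: ALUout = A + sext(imm) (the ALU output is the data address, stored in ALUout)
- memory: MDR = M(ALUout) (Mem data is stored in MDR)
- write back: Rt = MDR (data to Rt). At the rising edge of CK, Rt = MDR.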
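The five steps of lw can be followed in a minimal Python sketch. `run_lw`, `sext16` and the dictionary used as a unified instruction and data memory are hypothetical names introduced for illustration; the field positions follow the IR[25-21], IR[20-16] and IR[15-0] labels of the data path.

```python
def sext16(imm16):
    """Sign-extend a 16-bit field to a signed Python int (Sext 16->32)."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_lw(mem, regs, pc):
    """Run one lw instruction as five clocked steps; returns the new PC."""
    # fetch: IR = M(PC), PC = PC + 4
    ir = mem[pc]
    pc = pc + 4
    # decode: A = Rs, B = Rt
    rs, rt = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F
    a, b = regs[rs], regs[rt]  # B is latched but lw does not use it
    # execute: ALUout = A + sext(imm)
    alu_out = (a + sext16(ir & 0xFFFF)) & 0xFFFFFFFF
    # memory: MDR = M(ALUout)
    mdr = mem[alu_out]
    # write back: Rt = MDR
    regs[rt] = mdr
    return pc


regs = [0] * 32
regs[1] = 0x1000
mem = {0x400000: 0x8C220008, 0x1008: 99}  # lw $2, 8($1)
new_pc = run_lw(mem, regs, 0x400000)
```

Compared with an R-type instruction there is one extra step, because the loaded word has to wait in MDR for a cycle before it can be written to Rt.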
41 The state diagram of a CPU capable of R-type and lw instructions
State 0, Fetch: IR = M(PC), PC = PC + 4
State 1, Decode: A = Rs, B = Rt
lw path:
State 2, AdrCmp: ALUout = A + sext(imm)
State 3, Load: MDR = M(ALUout)
State 4: Rt = MDR
R-type path:
State 6, ALU: ALUout = A op B
State 7, WBR: Rd = ALUout
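The state diagram for R-type and lw can be written as a next-state function. The sketch below is in Python and assumes the state numbering read off this slide (0 fetch, 1 decode, 2 AdrCmp, 3 load, 4 lw write back, 6 ALU, 7 R-type write back); the constant and function names are illustrative.

```python
FETCH, DECODE, ADRCMP, LOAD, WB_LW, ALU, WBR = 0, 1, 2, 3, 4, 6, 7


def next_state(state, instr_class):
    """Next-state function for a CPU that knows only R-type and lw."""
    if state == FETCH:
        return DECODE
    if state == DECODE:
        # The opcode decides which path leaves the decode state.
        return ADRCMP if instr_class == "lw" else ALU
    if state == ADRCMP:
        return LOAD
    if state == LOAD:
        return WB_LW
    if state == ALU:
        return WBR
    return FETCH  # states 4 and 7 finish the instruction


def states_of(instr_class):
    """List the states one instruction visits, starting at fetch."""
    state, visited = FETCH, [FETCH]
    while True:
        state = next_state(state, instr_class)
        if state == FETCH:
            return visited
        visited.append(state)
```

`states_of("lw")` visits five states and `states_of("R-type")` visits four, matching the 5 CKs of lw and the 4 CKs of an R-type instruction.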
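42 A multi-cycle CPU capable of R-type, lw and sw instructions
[Figure: PC, instruction and data memory, IR, MDR, register file (Rd), A, B, ALU with the constant 4 as an input, ALUout, sign extension of IR[15-0] (Sext 16->32), a << 2 shifter and the label "Branch Address". The lw and sw paths are marked.]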
43 The state diagram of a CPU capable of R-type, lw and sw instructions
IR = M(PC), PC = PC + 4
A = Rs, B = Rt
ALUout = A + sext(imm)
ALUout = A op B
M(ALUout) = B
MDR = M(ALUout)
Rd = ALUout
Rt = MDR
44 A multi-cycle CPU capable of R-type, lw/sw and branch instructions
[Figure: PC, instruction and data memory, IR, register file (Rd), A, B, ALU with the constant 4 as an input, ALUout, sign extension of IR[15-0] (Sext 16->32), a << 2 shifter.]
45 Adding the instruction beq to the state diagram
Calc Rs - Rt (just to produce the zero signal)
Calc PC = PC + sext(imm) << 2
46 Adding the instruction beq to the state diagram, a more efficient way
Let's use the decode state, in which the ALU is doing nothing, to compute the branch address. We'll have to store it for 1 more CK cycle, until we know whether to branch or not! (We store it in the ALUout reg.)
Calc Rs - Rt. If zero, load the PC with ALUout data, else do not load the PC.
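A minimal Python sketch of the more efficient beq handling follows. It assumes the PC has already been advanced to PC + 4 in the fetch state, and the names `beq_next_pc` and `sext16` are illustrative.

```python
def sext16(imm16):
    """Sign-extend a 16-bit field to a signed Python int."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def beq_next_pc(pc_plus_4, imm16, rs_val, rt_val):
    """Return the PC value after a beq, following the two-step scheme."""
    # Decode state: the otherwise idle ALU computes the branch address,
    # which is kept in ALUout for one more CK cycle.
    alu_out = (pc_plus_4 + (sext16(imm16) << 2)) & 0xFFFFFFFF
    # Next state: the ALU computes Rs - Rt just to produce the zero signal.
    zero = ((rs_val - rt_val) & 0xFFFFFFFF) == 0
    # Load the PC with ALUout only if zero is set; otherwise keep PC + 4.
    return alu_out if zero else pc_plus_4
```

For example, `beq_next_pc(0x400004, 3, 2, 2)` gives 0x400010 (branch taken), while `beq_next_pc(0x400004, 3, 1, 2)` leaves the PC at 0x400004.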
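47 A multi-cycle CPU capable of R-type, lw/sw and branch instructions
[Figure: PC, instruction and data memory, IR, register file (Rd), A, B, ALU with the constant 4 as an input, ALUout, sign extension of IR[15-0] (Sext 16->32), a << 2 shifter. The PC inputs are labelled "PC+4" and "Branch Address".]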
48 Adding the instruction j to the state diagram
PC = PC[31-28] || IR[25-0] << 2
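The jump address is a concatenation of the upper 4 bits of the PC with the 26-bit field of IR shifted left by 2. A small Python sketch, with the hypothetical helper name `jump_target`:

```python
def jump_target(pc, ir):
    """Build the jump address from PC[31-28] and IR[25-0] << 2."""
    upper = pc & 0xF0000000           # PC[31-28], kept in place
    lower = (ir & 0x03FFFFFF) << 2    # IR[25-0] shifted left by 2 (28 bits)
    return upper | lower


# j 0x00400020 encodes as opcode 000010 followed by 0x00400020 >> 2.
ir = (0b000010 << 26) | (0x00400020 >> 2)
target = jump_target(0x00400004, ir)
```

Because only 28 bits come from the instruction, a jump cannot leave the 256 MB region selected by the upper 4 bits of the PC.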
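49 A multi-cycle CPU capable of R-type, lw/sw, branch and jump instructions
[Figure: PC, instruction and data memory, IR, register file (Rd), A, B, ALU with the constant 4 as an input, ALUout, sign extension of IR[15-0] (Sext 16->32), << 2 shifters. The PC inputs are labelled "PC+4 next address", "Branch Address" and "Jump address" (PC[31-28] joined with IR[25-0] << 2).]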
50 The phases (steps) of all instructions
[State diagram with ten states, numbered 0 to 9.]
51 MultiCycle implementation with Control
52 Final State Machine
53 The final state diagram
54 (No Transcript)
55 Finite State Machine for Control (the book's version)
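56 The Control Finite State Machine
[Block diagram: the state reg (clocked by ck) holds the current state; the outputs decoder produces the control signals from the current state; the next state calculation block produces the next state from the current state and the inputs (Opcode = IR[31-26], zero, neg, etc.).]
For 10 states coded 0-9, we need 4 bits, i.e., S3, S2, S1, S0.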
57 The control signals decoder
We just implement the table of slide 54. Let's look at ALUSrcA: it is 0 in states 0 and 1, and it is 1 in states 2, 6 and 8. In all other states we don't care. Let's look at PCWrite: it is 1 in states 0 and 9. In all other states it must be 0. And so, we'll fill the table below and build the decoder.
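The two decoder columns described above can be sketched in Python as functions of the state number. Only ALUSrcA and PCWrite are covered, because those are the only two signals whose values this slide spells out; the function names and the use of `None` for "don't care" are choices made for illustration.

```python
NUM_STATES = 10
# Number of bits needed to code the states 0-9.
STATE_BITS = (NUM_STATES - 1).bit_length()


def state_bits(state):
    """Return the state number as the tuple (S3, S2, S1, S0)."""
    return tuple((state >> i) & 1 for i in range(STATE_BITS - 1, -1, -1))


def alu_src_a(state):
    """ALUSrcA: 0 in states 0 and 1, 1 in states 2, 6 and 8, else don't care."""
    if state in (0, 1):
        return 0
    if state in (2, 6, 8):
        return 1
    return None  # don't care


def pc_write(state):
    """PCWrite: 1 in states 0 and 9, and it must be 0 in all other states."""
    return 1 if state in (0, 9) else 0
```

The don't-care entries matter when the decoder is built from gates: they can be set to whichever value gives the simpler logic, whereas PCWrite has no such freedom.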
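58 The state machine next state calc. logic
[State diagram with the branches out of the decode state labelled R-type, lw/sw, lw and sw.]
R-type = 000000, lw = 100011, sw = 101011, beq = 000100, bne = 000101, lui = 001111, j = 000010, jal = 000011, addi = 001000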
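The opcode list is what the next-state logic uses to pick a path out of the decode state. As a sketch, the opcode field IR[31-26] can be looked up in a Python table; `OPCODES` and `instr_class` are illustrative names, and "undefined" is simply the fallback chosen here for opcodes not in the list.

```python
# Opcode values (IR[31-26]) as listed on the slide.
OPCODES = {
    0b000000: "R-type",
    0b100011: "lw",
    0b101011: "sw",
    0b000100: "beq",
    0b000101: "bne",
    0b001111: "lui",
    0b000010: "j",
    0b000011: "jal",
    0b001000: "addi",
}


def instr_class(ir):
    """Classify a 32-bit instruction word by its opcode field."""
    opcode = (ir >> 26) & 0x3F
    return OPCODES.get(opcode, "undefined")
```

For example, `instr_class(0x8C220008)` returns "lw" and `instr_class(0x00221820)` returns "R-type".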
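59 The Control Finite State Machine
[Block diagram: the state reg (clocked by ck), the outputs decoder producing the control signals from the current state (Moore machine), and the next state calculation block fed by Opcode = IR[31-26]. The write signal to the PC is formed from PCWrite, PCWriteCond and zero (Mealy machine).]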
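The slide shows only the labels PCWrite, PCWriteCond and zero feeding the PC. A minimal Python sketch of that signal follows, assuming the usual textbook gating in which the PC is written when PCWrite is set, or when PCWriteCond is set and the ALU reports zero; the function name is hypothetical.

```python
def pc_load_enable(pc_write, pc_write_cond, zero):
    """Write enable of the PC: PCWrite OR (PCWriteCond AND zero).

    pc_write and pc_write_cond depend only on the current state (Moore
    outputs); zero is an input from the data path, so the combined signal
    depends on state and input together (Mealy behaviour).
    """
    return pc_write | (pc_write_cond & zero)
```

With this gating, a beq state can set PCWriteCond unconditionally and let the zero signal decide whether the branch address is actually loaded.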
60 Microprogramming
61 Microinstruction
62 Microinstruction format
63 Interrupt and exception
Type of event                                From where?   MIPS terminology
I/O device request                           External      Interrupt
Invoke operating system from user program    Internal      Exception
Arithmetic overflow                          Internal      Exception
Using an undefined instruction               Internal      Exception
Hardware malfunctions                        Either        Exception or interrupt
64 Exceptions handling
Exception type           Exception vector address (in hex)
Undefined instruction    c0 00 00 00
Arithmetic overflow      c0 00 00 20
We have 2 ways to handle exceptions: Cause register or vectored interrupts.
MIPS: Cause register
65 Handling exceptions
66 Handling exceptions
67 Handling interrupts
[Figure labels: int, int, iret.]
68 Handling an interrupt: remembering it in a FF until it is serviced
[Circuit diagram labels: a D flip-flop (D, Q), the constant 1, irq (the interrupt source), clr_irq, eint, and int (to the state machine).]
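The idea of remembering an interrupt request until it is serviced can be sketched in Python. The wiring is an assumption, since only the labels of the circuit survive here: the sketch assumes irq clocks a constant 1 into the flip-flop, clr_irq resets it, and int is Q gated by eint. The class and method names are hypothetical.

```python
class IrqLatch:
    """Models a flip-flop that remembers a pending interrupt request."""

    def __init__(self):
        self.q = 0  # no request pending

    def irq_edge(self):
        """Assumed: a rising edge of irq loads D = 1 into the flip-flop."""
        self.q = 1

    def clr_irq(self):
        """Assumed: clr_irq resets the flip-flop once the request is serviced."""
        self.q = 0

    def int_out(self, eint):
        """Assumed: int (to the state machine) is Q AND eint."""
        return self.q & eint


latch = IrqLatch()
latch.irq_edge()                        # the interrupt source fires
pending_masked = latch.int_out(eint=0)  # interrupts disabled: request waits
pending_seen = latch.int_out(eint=1)    # interrupts enabled: int is raised
latch.clr_irq()                         # serviced: request is cleared
after_clear = latch.int_out(eint=1)
```

Under these assumptions a short irq pulse is not lost while eint is 0; it is seen as soon as interrupts are enabled again.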
69 Interrupt
[Figure: jumping to the interrupt routine at C0000000; returning from the interrupt with Iret.]
70 Interrupt
[Same figure, with the irq and eint signals shown switching between 0 and 1.]
71 The state machine in action during interrupt
Fetch -> decode -> ex -> wb
Fetch -> decode -> ex -> wb
Fetch -> Save_PC -> JumpInt (to C0000000)
Fetch -> decode -> ex -> wb
Fetch -> decode -> ex -> wb
Fetch -> decode -> Iret
72 End of multi-cycle implementation