Title: The single cycle CPU
Slide 1: The single cycle CPU
Slide 2: Performance of Single-Cycle Machines
- Memory unit: 2 ns
- ALU and adders: 2 ns
- Register file (read or write): 1 ns

Class     | Fetch | Decode | ALU | Memory | Write back | Total
R-format  |   2   |   1    |  2  |   0    |     1      | 6 ns
LW        |   2   |   1    |  2  |   2    |     1      | 8 ns
SW        |   2   |   1    |  2  |   2    |            | 7 ns
Branch    |   2   |   1    |  2  |        |            | 5 ns
Jump      |   2   |        |     |        |            | 2 ns
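The Total column is simply the sum of the stage latencies each class uses. A minimal Python sketch that recomputes it from the three unit latencies, assuming (as the table does) that the stages of one instruction run back to back with no extra overhead; the names are illustrative:

```python
# Unit latencies from the slide, in ns.
MEM_NS, ALU_NS, REG_NS = 2, 2, 1

# (fetch, decode, ALU, memory, write back) for each class; 0 means the stage is unused.
STAGES_NS = {
    "R-format": (MEM_NS, REG_NS, ALU_NS, 0, REG_NS),
    "LW": (MEM_NS, REG_NS, ALU_NS, MEM_NS, REG_NS),
    "SW": (MEM_NS, REG_NS, ALU_NS, MEM_NS, 0),
    "Branch": (MEM_NS, REG_NS, ALU_NS, 0, 0),
    "Jump": (MEM_NS, 0, 0, 0, 0),
}


def total_ns(cls: str) -> int:
    """Critical-path latency of one instruction class."""
    return sum(STAGES_NS[cls])


def single_cycle_clock_ns() -> int:
    """A single-cycle clock must be long enough for the slowest class."""
    return max(total_ns(cls) for cls in STAGES_NS)


if __name__ == "__main__":
    for cls in STAGES_NS:
        print(f"{cls:<8} {total_ns(cls)} ns")
    print("clock:", single_cycle_clock_ns(), "ns")  # clock: 8 ns
```

The single-cycle clock is set by LW, so every other class wastes part of its 8 ns cycle.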
Slide 3: What if the cycle length could vary per instruction?
- Instruction mix (share of executed instructions):
  - R-type 44%, LW 24%, SW 12%
  - Branch 18%, Jump 2%
- I: number of instructions executed
- T: clock cycle time
- CPI: cycles per instruction = 1
- Execution time = I × T × CPI, where the average T = 8×24% + 7×12% + 6×44% + 5×18% + 2×2% = 6.34 ≈ 6.3 ns
Slide 4: Conclusion
- EXE single cycle = T single clock × I = 8 ns × I
- EXE variable = T variable clock × I = 6.3 ns × I
- The variable clock is 8 / 6.3 ≈ 1.27 times faster, and the gain would be even larger with floating-point instructions.
- Implementing a variable-length clock is very difficult.
- The alternative: let instructions take a different number of cycles.
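The arithmetic of slides 3 and 4 as a short Python sketch: the per-class times are the Total column of slide 2, the mix is the one listed on slide 3, and the function names are illustrative only.

```python
# Per-class instruction time in ns (the Total column) and instruction mix from the slides.
LATENCY_NS = {"R-type": 6, "LW": 8, "SW": 7, "Branch": 5, "Jump": 2}
MIX = {"R-type": 0.44, "LW": 0.24, "SW": 0.12, "Branch": 0.18, "Jump": 0.02}


def average_time_ns(latency: dict, mix: dict) -> float:
    """Average time per instruction when each class gets a clock as long as it needs (CPI = 1)."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("the instruction mix must add up to 100%")
    return sum(latency[c] * mix[c] for c in mix)


def speedup(latency: dict, mix: dict) -> float:
    """Single-cycle time per instruction (the slowest class) over the variable-clock average."""
    return max(latency.values()) / average_time_ns(latency, mix)


if __name__ == "__main__":
    print(f"T variable = {average_time_ns(LATENCY_NS, MIX):.2f} ns")  # 6.34 ns
    print(f"speedup    = {speedup(LATENCY_NS, MIX):.2f}")             # 1.26
```

With the unrounded 6.34 ns the ratio is about 1.26; the slide's 1.27 comes from rounding T to 6.3 ns first (8 / 6.3 ≈ 1.27).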
Slide 5: Multicycle Approach
?????? ?????? ???? ?- Multicycle ?????? ????
?? ????? ??? ?? ???? ??????? ????? ???????
??. ?????? ??????? ????? ????? ???? ??????
????? ?? ??????.
6???? ????? ?? ?????????? ?- Multicycle
??? ?? ?????? ??????. ?? ??? cycle - ??? ??
???? ?????? ?????? ??? ???. - ???? ?? ????
?????? ?????? ??? ??? - ?? ??? ???? ?? ????? ???
???????????. ????? ?? ????? ???? - ???? ??
?????? ???? ?????? ?????. - ???? ?????? ????? ??
???????? ??????? ??????.
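Slide 7: Timing of a lw instruction in a single-cycle CPU and in a multi-cycle CPU
[Figure, single-cycle CPU: starting from PC = 0x400000, the signal passes through the instruction memory output, the Rs/Rt register reads, the ALU inputs, the ALU output (the data address) and the data memory output, ending as the memory data written to the register file; the stages take fetch 2 ns, decode 1 ns, execute 2 ns, memory 2 ns and write back 1 ns.]
[Figure, multi-cycle CPU: clock edges 0 to 5 (edge 5 is edge 0 of the next instruction); fetch puts the instruction in IR, decode loads A and B, execute has the ALU calculate a result into ALUout, memory loads MDR, and write back ends the instruction.]
We want to replace one long clock cycle with 5 short ones.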
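A rough Python sketch of what the split buys for lw, assuming the short clock is set by the slowest stage and ignoring the setup time of the added registers (an idealization, not a figure from the slides):

```python
# lw stage latencies in ns (fetch, decode, execute, memory, write back) from the figure.
LW_STAGES_NS = {"fetch": 2, "decode": 1, "execute": 2, "memory": 2, "write back": 1}

# Single-cycle: one long cycle must cover every stage back to back.
single_cycle_clock_ns = sum(LW_STAGES_NS.values())

# Multi-cycle: each short cycle must fit the slowest single stage.
multi_cycle_clock_ns = max(LW_STAGES_NS.values())

# lw uses all five short cycles.
lw_multi_cycle_ns = len(LW_STAGES_NS) * multi_cycle_clock_ns

print(single_cycle_clock_ns, multi_cycle_clock_ns, lw_multi_cycle_ns)  # 8 2 10
```

In this idealized model lw itself gets slower (10 ns instead of 8 ns); the benefit comes from instructions that need fewer short cycles, such as R-type, which finishes in 4.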
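Slide 8: Therefore we should add registers to the single-cycle CPU shown below
[Figure: single-cycle datapath with PC, +4 adder, instruction memory, register file (Rs = bits [25:21], Rt = bits [20:16], Rd), ALU, data memory (address, D.In, D.Out) and sign extension 16→32 of bits [15:0].]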
Slide 9: Adding registers to split the instruction into 5 stages
[Figure: the same datapath with IR, A, B, ALUout and MDR inserted between the stages and a PCWrite enable on the PC; the stage boundaries are marked 0 to 5.]
Slide 10: Here is the book's version of the multi-cycle CPU
Only PC and IR have write-enable signals. All other registers hold data for a single cycle.
Slide 11: Here is our version of a multi-cycle CPU capable of R-type, lw/sw and branch instructions
[Figure: multi-cycle datapath with a single instruction/data memory, PC, IR, register file, A, B, ALU, ALUout, sign extension 16→32 of IR[15:0] and two shift-left-by-2 units.]
Slide 12: Let us explain the multi-cycle CPU
- First we'll look at a CPU capable of performing only R-type instructions.
- Then, we'll add the lw instruction,
- and the sw instruction,
- then the beq instruction,
- and finally, the j instruction.
Slide 13: Let us remind ourselves how a single-cycle CPU capable of performing R-type instructions works. Here you see the data-path and the timing of an R-type instruction.
[Figure: PC, +4 adder, instruction memory, register file and ALU; the 6-bit opcode (bits [31:26]) and funct (bits [5:0]) fields are shown.]
Slide 14: A single-cycle CPU demo: R-type instruction
[Figure: PC, +4 adder, instruction memory, register file and ALU.]
Slide 15: A multi-cycle CPU capable of performing R-type instructions
[Figure: PC, instruction/data memory, IR, register file (Rs = IR[25:21], Rt = IR[20:16], Rd), A, B, ALU and ALUout.]
Slide 16: A multi-cycle CPU capable of R-type instructions: fetch
[Figure: the R-type multi-cycle datapath during fetch, between clock edges 0 and 1.]
Slide 17: A multi-cycle CPU capable of R-type instructions: decode
[Figure: the R-type multi-cycle datapath during decode, between clock edges 1 and 2.]
Slide 18: A multi-cycle CPU capable of R-type instructions: execute
[Figure: the R-type multi-cycle datapath during execute, between clock edges 2 and 3.]
Slide 19: A multi-cycle CPU capable of R-type instructions: write back
[Figure: the R-type multi-cycle datapath during write back, between clock edges 3 and 4.]
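Slide 20: Timing of an R-type instruction in a single-cycle CPU and in a multi-cycle CPU
[Figure, single-cycle CPU: starting from PC = 0x400000, the instruction memory output (the instruction) drives the Rs/Rt reads, the ALU inputs and the ALU output (the data result of the calculation), which becomes the GPR input; fetch, decode, execute and write back all happen in one cycle.]
[Figure, multi-cycle CPU: clock edges 0 to 4 (edge 4 is edge 0 of the next instruction); during fetch IR changes from the previous instruction to the current one, decode loads A and B, execute loads ALUout, then write back.]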
Slide 21:
- fetch: IR ← M(PC) (IRWrite); IR switches from the previous instruction to the current one
- decode: A ← Rs, B ← Rt (the GPR outputs)
- execute: ALUout ← A op B
- write back: Rd ← ALUout, at the rising edge of CK
An R-type instruction takes 4 CKs.
The state diagram: IR ← M(PC) → A ← Rs, B ← Rt → ALUout ← A op B → Rd ← ALUout
Slide 22: A multi-cycle CPU capable of R-type instructions (PC calc.)
[Figure: the R-type multi-cycle datapath, now including the PC + 4 calculation.]
Slide 23:
- fetch: IR ← M(PC) and PC ← PC + 4 (PCWrite); next PC = current PC + 4
- decode: A, B ← GPR outputs
- execute: ALUout ← A op B
- write back: Rd ← ALUout, at the rising edge of CK
Slide 24: A multi-cycle CPU capable of R-type instructions: fetch
[Figure: the R-type multi-cycle datapath during fetch, computing PC + 4.]
Slide 25: The state diagram of a CPU capable of R-type instructions only
- IR ← M(PC), PC ← PC + 4
- A ← Rs, B ← Rt
- ALUout ← A op B
- Rd ← ALUout
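A behavioral Python sketch of the four states above, one register transfer per state. The class name `RTypeMultiCycle` and the table `ALU_OPS` are hypothetical; memory is modeled as a dict from word-aligned byte addresses to 32-bit words, and only add and sub (MIPS funct codes 0x20 and 0x22) are handled.

```python
from dataclasses import dataclass, field

MASK32 = 0xFFFFFFFF

# Only add and sub are modeled in this sketch.
ALU_OPS = {
    0x20: lambda a, b: (a + b) & MASK32,  # add
    0x22: lambda a, b: (a - b) & MASK32,  # sub
}


@dataclass
class RTypeMultiCycle:
    mem: dict  # byte address -> 32-bit word; one memory for instructions and data
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)
    ir: int = 0
    a: int = 0
    b: int = 0
    alu_out: int = 0

    def step_rtype(self) -> int:
        """Run one R-type instruction, one register transfer per state; return the cycle count."""
        # Fetch: IR <- M(PC), PC <- PC + 4
        self.ir = self.mem[self.pc]
        self.pc = (self.pc + 4) & MASK32
        # Decode: A <- Rs, B <- Rt
        rs, rt = (self.ir >> 21) & 0x1F, (self.ir >> 16) & 0x1F
        self.a, self.b = self.regs[rs], self.regs[rt]
        # Execute: ALUout <- A op B (op selected by the funct field IR[5:0])
        self.alu_out = ALU_OPS[self.ir & 0x3F](self.a, self.b)
        # Write back: Rd <- ALUout ($0 stays zero in MIPS)
        rd = (self.ir >> 11) & 0x1F
        if rd != 0:
            self.regs[rd] = self.alu_out
        return 4
```

For example, `RTypeMultiCycle(mem={0: (1 << 21) | (2 << 16) | (3 << 11) | 0x20})` holds `add $3, $1, $2` at address 0; after setting `regs[1]` and `regs[2]`, one call to `step_rtype()` leaves their sum in `regs[3]` and PC at 4.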
Slide 26: The state diagram of a CPU capable of R-type and lw instructions
New states for lw:
- ALUout ← A + sext(imm)
- MDR ← M(ALUout)
- Rt ← MDR
Slide 27: We added registers to split the instruction into 5 stages. Let's discuss the lw instruction.
[Figure: datapath with PC, instruction memory, IR, register file, A, B, ALU, ALUout, data memory, MDR and sign extension 16→32 of IR[15:0]; PCWrite enable; stage boundaries marked 0 to 5.]
Slide 28: First we draw a multi-cycle CPU capable of R-type and lw instructions
[Figure: the R-type multi-cycle datapath plus the lw-only parts: sign extension 16→32 of IR[15:0], data memory and MDR.]
We just moved the data memory. All parts related to lw only are blue.
Slide 29: A multi-cycle CPU capable of R-type and lw instructions: fetch
[Figure: the R-type/lw datapath during fetch.]
Slide 30: A multi-cycle CPU capable of R-type and lw instructions: decode
[Figure: the R-type/lw datapath during decode: Rs = IR[25:21] and Rt = IR[20:16] are read into A and B.]
Slide 31: A multi-cycle CPU capable of R-type and lw instructions: AdrCmp (address computation)
[Figure: the R-type/lw datapath during address computation: the ALU adds A and the sign-extended IR[15:0].]
Slide 32: A multi-cycle CPU capable of R-type and lw instructions: memory
[Figure: the R-type/lw datapath during the memory access: the data memory is read at ALUout into MDR.]
Slide 33: A multi-cycle CPU capable of R-type and lw instructions: WB
[Figure: the R-type/lw datapath during write back: MDR is written to Rt.]
Slide 34: Can we unite the instruction and data memories? (Unlike in the single-cycle CPU, they are never used in the same cycle.)
[Figure: the R-type/lw datapath with separate instruction and data memories.]
Slide 35: So here is a multi-cycle CPU capable of R-type and lw instructions using a single memory for instructions and data
[Figure: the R-type/lw datapath with one instruction/data memory feeding both IR and MDR.]
Slide 36: Timing of a lw instruction in a single-cycle CPU and in a multi-cycle CPU
[Figure, single-cycle CPU: PC = 0x400000 → instruction memory data → Rs, Rt → ALU inputs → ALU output (the data address) → data memory data → register file; fetch, decode, execute, memory and write back in one cycle.]
[Figure, multi-cycle CPU: fetch loads the current instruction into IR and sets PC ← PC + 4; decode loads A and B; execute puts the data address in ALUout; memory puts the memory data in MDR; write back sends the data to Rt.]
Slide 37:
- fetch: IR ← M(PC), PC ← PC + 4 (PCWrite, IRWrite)
- decode: A ← Rs, B ← Rt (the GPR outputs)
- execute: ALUout ← A + sext(imm), the data address
- memory: MDR ← M(ALUout)
- write back: Rt ← MDR, at the rising edge of CK
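The five lw register transfers above, as a Python sketch. The function names are hypothetical, memory is a dict from byte addresses to 32-bit words, and the 16-bit immediate is sign-extended exactly as sext(imm) does in the datapath.

```python
MASK32 = 0xFFFFFFFF


def sext16(imm: int) -> int:
    """Sign-extend a 16-bit field to a (possibly negative) Python int."""
    return imm - 0x10000 if imm & 0x8000 else imm


def run_lw(mem: dict, regs: list, pc: int):
    """Execute lw rt, imm(rs) as five register transfers; return (next_pc, cycles)."""
    # Fetch: IR <- M(PC), PC <- PC + 4
    ir = mem[pc]
    pc = (pc + 4) & MASK32
    rs, rt, imm = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F, ir & 0xFFFF
    # Decode: A <- Rs, B <- Rt (lw never uses B)
    a, _b = regs[rs], regs[rt]
    # Execute (address computation): ALUout <- A + sext(imm)
    alu_out = (a + sext16(imm)) & MASK32
    # Memory: MDR <- M(ALUout)
    mdr = mem[alu_out]
    # Write back: Rt <- MDR
    if rt != 0:
        regs[rt] = mdr
    return pc, 5
```

A negative offset works because `sext16` turns 0xFFFC into -4 before the addition, so `lw $2, -4($1)` with `$1 = 0x104` reads address 0x100.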
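Slide 38: The state diagram of a CPU capable of R-type and lw instructions
- State 0, Fetch: IR ← M(PC), PC ← PC + 4
- State 1, Decode: A ← Rs, B ← Rt; go to state 2 for lw, or to state 6 for R-type
- lw: state 2, AdrCmp: ALUout ← A + sext(imm); state 3, Load: MDR ← M(ALUout); state 4, WB: Rt ← MDR
- R-type: state 6, ALU: ALUout ← A op B; state 7, WBR: Rd ← ALUout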
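The diagram as a next-state function, sketched in Python with the state numbers above. Names are hypothetical; any opcode other than lw and R-type raises `KeyError`, since this diagram does not handle it yet.

```python
R_TYPE, LW = 0b000000, 0b100011

# Transitions that do not depend on the opcode (state numbers as in the diagram).
FIXED = {0: 1, 2: 3, 3: 4, 4: 0, 6: 7, 7: 0}
# Decode (state 1) dispatches on the opcode.
DECODE_DISPATCH = {LW: 2, R_TYPE: 6}


def next_state(state: int, opcode: int) -> int:
    if state == 1:
        return DECODE_DISPATCH[opcode]  # KeyError: opcode not supported yet
    return FIXED[state]


def trace(opcode: int) -> list:
    """States visited by one instruction, starting at Fetch and stopping before it returns there."""
    path = [0]
    state = next_state(0, opcode)
    while state != 0:
        path.append(state)
        state = next_state(state, opcode)
    return path
```

`trace(LW)` visits five states and `trace(R_TYPE)` four, matching the 5 and 4 clock cycles of the two instructions.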
Slide 39: A multi-cycle CPU capable of R-type, lw and sw instructions
[Figure: the single-memory multi-cycle datapath with the lw and sw data paths marked.]
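Slide 40: The state diagram of a CPU capable of R-type, lw and sw instructions
- Fetch: IR ← M(PC), PC ← PC + 4
- Decode: A ← Rs, B ← Rt
- lw/sw address computation: ALUout ← A + sext(imm)
- lw: MDR ← M(ALUout), then Rt ← MDR
- sw: M(ALUout) ← B
- R-type: ALUout ← A op B, then Rd ← ALUout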
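A Python sketch of the sw path, with the same hypothetical helper names and dict-based memory as a plain behavioral model. sw shares fetch, decode and the address computation with lw, then writes B to memory and stops, so it needs only four cycles.

```python
MASK32 = 0xFFFFFFFF


def sext16(imm: int) -> int:
    """Sign-extend a 16-bit field to a (possibly negative) Python int."""
    return imm - 0x10000 if imm & 0x8000 else imm


def run_sw(mem: dict, regs: list, pc: int):
    """Execute sw rt, imm(rs); return (next_pc, cycles)."""
    # Fetch: IR <- M(PC), PC <- PC + 4
    ir = mem[pc]
    pc = (pc + 4) & MASK32
    rs, rt, imm = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F, ir & 0xFFFF
    # Decode: A <- Rs, B <- Rt (B holds the value to store)
    a, b = regs[rs], regs[rt]
    # Address computation: ALUout <- A + sext(imm), shared with lw
    alu_out = (a + sext16(imm)) & MASK32
    # Memory write: M(ALUout) <- B; sw has no write-back state
    mem[alu_out] = b
    return pc, 4
```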
Slide 41: A multi-cycle CPU capable of R-type, lw/sw and branch instructions
[Figure: the single-memory datapath with sign extension 16→32 of IR[15:0] and shift-left-by-2 units for the branch offset.]
Slide 42: Adding the instruction beq to the state diagram
- Calc Rs - Rt (just to produce the zero signal)
- Calc PC ← PC + (sext(imm) << 2)
Slide 43: Adding the instruction beq to the state diagram, a more efficient way
Let's use the decode state, in which the ALU is doing nothing, to compute the branch address. We'll have to store it for 1 more CK cycle, until we know whether to branch or not! (We store it in the ALUout reg.)
Calc Rs - Rt. If zero, load the PC with the ALUout data; otherwise, do not load the PC.
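The efficient beq schedule as a Python sketch (hypothetical names, dict-based memory). The branch target is computed speculatively in decode and parked in ALUout; execute only produces the zero signal and decides whether PC is loaded from ALUout. Note that PC already holds PC + 4 when the target is computed.

```python
MASK32 = 0xFFFFFFFF


def sext16(imm: int) -> int:
    """Sign-extend a 16-bit field to a (possibly negative) Python int."""
    return imm - 0x10000 if imm & 0x8000 else imm


def run_beq(mem: dict, regs: list, pc: int):
    """Execute beq rs, rt, imm in three cycles; return (next_pc, cycles)."""
    # Fetch: IR <- M(PC), PC <- PC + 4
    ir = mem[pc]
    pc = (pc + 4) & MASK32
    rs, rt, imm = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F, ir & 0xFFFF
    # Decode: A <- Rs, B <- Rt, and the otherwise idle ALU computes
    # ALUout <- PC + (sext(imm) << 2), the branch target, in case we branch
    a, b = regs[rs], regs[rt]
    alu_out = (pc + (sext16(imm) << 2)) & MASK32
    # Execute: the ALU computes A - B only to produce the zero signal
    zero = ((a - b) & MASK32) == 0
    if zero:
        pc = alu_out  # load the PC from ALUout
    return pc, 3
```

When the branch is not taken, nothing is written to PC in execute, so it keeps the PC + 4 value written during fetch.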
Slide 44: A multi-cycle CPU capable of R-type, lw/sw and branch instructions
[Figure: the datapath with two PC sources: PC + 4 and the branch address.]
Slide 45: Adding the instruction j to the state diagram
PC ← PC[31:28] || (IR[25:0] << 2), where || denotes concatenation
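The jump-address formula as a Python bit-manipulation sketch (the function name is hypothetical). The upper four bits come from the PC, which already holds PC + 4 after fetch, and the 26-bit field shifted left by 2 fills bits 27 to 0.

```python
def jump_target(pc_plus_4: int, ir: int) -> int:
    """PC <- PC[31:28] || (IR[25:0] << 2).

    pc_plus_4 is the PC after fetch has already added 4.
    """
    upper4 = pc_plus_4 & 0xF0000000      # PC[31:28], kept in place
    offset = (ir & 0x03FFFFFF) << 2      # IR[25:0] << 2 fills bits 27..0
    return upper4 | offset
```

For example, a j instruction (opcode 000010) whose target field is 0x100000, fetched from 0x400000, jumps to 0x400000.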
Slide 46: A multi-cycle CPU capable of R-type, lw/sw, branch and jump instructions
[Figure: the multi-cycle datapath with three PC sources: PC + 4 (next address), the branch address, and the jump address formed from PC[31:28] and IR[25:0] << 2.]
Slide 47:
[Figure: the complete state diagram, with states 0 to 9.]
Slide 48: Multi-cycle implementation with control
Slide 49: Final State Machine
Slide 50: The final state diagram
Slide 52: Multi-cycle implementation with control
Slide 53: Finite State Machine for Control (the book's version)
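Slide 54: The Control Finite State Machine
[Figure: a state register, clocked by ck, holds the current state; an outputs decoder turns it into the control signals, and the next-state calculation uses the current state, the opcode IR[31:26] and status flags (zero, neg, etc.).]
For 10 states coded 0-9, we need 4 bits, i.e., S3, S2, S1, S0.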
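Why 4 bits: n states need the smallest number of bits whose 2^bits codes cover 0 to n - 1. A tiny Python sketch (function names are hypothetical):

```python
def state_bits(num_states: int) -> int:
    """Minimum number of flip-flops needed to encode num_states states."""
    return max(1, (num_states - 1).bit_length())


def encode(state: int, width: int) -> str:
    """State code as bits S(width-1) ... S0."""
    return format(state, f"0{width}b")


if __name__ == "__main__":
    n = state_bits(10)
    print(n)             # 4
    print(encode(9, n))  # 1001
```

Six of the 16 codes (10 to 15) are never used, which gives the decoder extra don't cares.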
Slide 55: The control signals decoder
We just implement the table of slide 54.
Let's look at ALUSrcA: it is 0 in states 0 and 1, and it is 1 in states 2, 6 and 8. In all other states we don't care.
Let's look at PCWrite: it is 1 in states 0 and 9. In all other states it must be 0.
And so, we'll fill in the table below and build the decoder.
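A Python sketch of the two decoder columns described above. The specification uses `None` for don't care; `alusrca_logic` is one possible choice (an assumption, not the slides' circuit) that fills every don't care with 1, which reduces ALUSrcA to S3 | S2 | S1 with the 4-bit state code.

```python
DONT_CARE = None

# Specification from the slide: ALUSrcA is 0 in states 0 and 1, 1 in states 2, 6 and 8,
# and a don't care everywhere else.
ALUSRCA_SPEC = {0: 0, 1: 0, 2: 1, 6: 1, 8: 1}


def alusrca_spec(state: int):
    return ALUSRCA_SPEC.get(state, DONT_CARE)


def alusrca_logic(state: int) -> int:
    """One possible gate-level choice: ALUSrcA = S3 | S2 | S1, filling don't cares with 1."""
    s3, s2, s1 = (state >> 3) & 1, (state >> 2) & 1, (state >> 1) & 1
    return s3 | s2 | s1


def pcwrite_logic(state: int) -> int:
    """PCWrite has no don't cares: 1 only in states 0 and 9."""
    return 1 if state in (0, 9) else 0


if __name__ == "__main__":
    for s in range(10):
        spec = alusrca_spec(s)
        assert spec is DONT_CARE or spec == alusrca_logic(s), s
        print(s, alusrca_logic(s), pcwrite_logic(s))
```

The loop checks that the simplified logic agrees with the specification in every state where the specification cares.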
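Slide 56: The state machine next-state calculation logic
[Figure: next-state logic with dispatch branches for R-type, lw/sw, lw and sw.]
Opcodes: R-type = 000000, lw = 100011, sw = 101011, beq = 000100, bne = 000101, lui = 001111, j = 000010, jal = 000011, addi = 001000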
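A Python sketch of the next-state logic for the five opcodes the 10-state machine handles. The numbering of states 5, 8 and 9 (sw memory write, branch, jump) is an assumption consistent with slides 38 and 55; bne, lui, jal and addi from the list are not handled by this sketch.

```python
R_TYPE, LW, SW, BEQ, J = 0b000000, 0b100011, 0b101011, 0b000100, 0b000010

# Assumed numbering: 0 fetch, 1 decode, 2 AdrCmp, 3 load, 4 lw write back,
# 5 sw memory write, 6 ALU, 7 R-type write back, 8 branch, 9 jump.
DISPATCH_DECODE = {R_TYPE: 6, LW: 2, SW: 2, BEQ: 8, J: 9}  # leaving state 1
DISPATCH_ADRCMP = {LW: 3, SW: 5}                            # leaving state 2
SEQUENTIAL = {0: 1, 3: 4, 4: 0, 5: 0, 6: 7, 7: 0, 8: 0, 9: 0}


def next_state(state: int, opcode: int) -> int:
    if state == 1:
        return DISPATCH_DECODE[opcode]
    if state == 2:
        return DISPATCH_ADRCMP[opcode]
    return SEQUENTIAL[state]


def cycles(opcode: int) -> int:
    """Clock cycles from Fetch back to Fetch for one instruction."""
    count, state = 1, next_state(0, opcode)
    while state != 0:
        count += 1
        state = next_state(state, opcode)
    return count
```

Only states 1 and 2 look at the opcode; every other state has a single successor, which is why the opcode is ignored for most rows of the table.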
Slide 57: The Control Finite State Machine
[Figure: the state register, outputs decoder and next-state calculation (input: opcode IR[31:26]) form a Moore machine; the PC write signal, formed from PCWrite, PCWriteCond and zero and sent to the PC, is the Mealy part.]
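A one-line Python sketch of the PC load enable, assuming the usual textbook combination of the three signals: load unconditionally when PCWrite is set, or conditionally (beq) when PCWriteCond is set and zero is 1. Because zero comes straight from the datapath rather than from the state, this output is the Mealy part of the machine.

```python
def pc_load_enable(pc_write: int, pc_write_cond: int, zero: int) -> int:
    """Load the PC unconditionally (PCWrite) or, for a branch, only when zero is set."""
    return pc_write | (pc_write_cond & zero)


if __name__ == "__main__":
    for pw in (0, 1):
        for pwc in (0, 1):
            for z in (0, 1):
                print(pw, pwc, z, pc_load_enable(pw, pwc, z))
```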
Slide 58: Finite State Machine for Control
[Figure: the control truth table.]
Slide 59: ROM Implementation
- ROM = "Read Only Memory"
  - values of memory locations are fixed ahead of time
- A ROM can be used to implement a truth table
  - if the address is m bits, we can address 2^m entries in the ROM
  - our outputs are the bits of data that the address points to
  - 2^m is the "height" and n (the number of output bits) is the "width"
[Figure: the control truth table stored in a ROM.]
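A Python sketch of "a ROM implements a truth table": precompute the output for every one of the 2^m addresses, then reading is plain indexing. The helper names are hypothetical, and the example stores only the PCWrite column (1 in states 0 and 9), addressed by the 4 state bits.

```python
def build_rom(m: int, truth_table):
    """Store truth_table(address) for every m-bit address: a 2**m-entry ROM."""
    return [truth_table(address) for address in range(2 ** m)]


def read_rom(rom, address: int):
    """Reading the ROM is just indexing; no logic is evaluated at run time."""
    return rom[address]


# Example: PCWrite as a function of the 4-bit state (1 in states 0 and 9, else 0).
pc_write_rom = build_rom(4, lambda state: 1 if state in (0, 9) else 0)
```

The ROM stores all 16 rows, including the unused codes 10 to 15, which is where the waste discussed next comes from.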
Slide 60: ROM Implementation
- How many inputs are there? 6 bits for the opcode + 4 bits for the state = 10 address lines (i.e., 2^10 = 1024 different addresses)
- How many outputs are there? 16 datapath-control outputs + 4 state bits = 20 outputs
- The ROM is 2^10 × 20 = 20K bits (and a rather unusual size)
- Rather wasteful, since for lots of the entries the outputs are the same, i.e., the opcode is often ignored
Slide 61: ROM vs PLA
- Break up the table into two parts:
  - 4 state bits tell you the 16 outputs: 2^4 × 16 bits of ROM
  - 10 bits tell you the 4 next-state bits: 2^10 × 4 bits of ROM
  - Total: 4.3K bits of ROM
- A PLA is much smaller:
  - it can share product terms
  - it only needs entries that produce an active output
  - it can take into account don't cares
- Size is (inputs × product terms) + (outputs × product terms). For this example: (10 × 17) + (20 × 17) = 510 PLA cells
- PLA cells are usually about the size of a ROM cell (slightly bigger)
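The size arithmetic of slides 60 and 61 in a short Python sketch (function names are hypothetical; the 17 product terms are the slide's figure):

```python
def rom_bits(address_bits: int, width: int) -> int:
    """A ROM with an m-bit address stores 2**m words of `width` bits."""
    return (2 ** address_bits) * width


def pla_cells(inputs: int, outputs: int, product_terms: int) -> int:
    """(inputs x product terms) + (outputs x product terms)."""
    return inputs * product_terms + outputs * product_terms


one_rom = rom_bits(6 + 4, 16 + 4)               # opcode + state in, control + next state out
split_rom = rom_bits(4, 16) + rom_bits(10, 4)   # outputs from state; next state from all 10 bits
pla = pla_cells(10, 20, 17)

print(one_rom, split_rom, pla)  # 20480 4352 510
```

Splitting the table cuts the ROM from 20K to about 4.3K bits because the 16 control outputs depend only on the 4 state bits.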
Slide 62: End of multi-cycle implementation