Title: Lecture 9. MIPS Processor Design
1. 2010 R&E Computer System Education & Research
Lecture 9. MIPS Processor Design: Single-Cycle Processor Design
Prof. Taeweon Suh
Computer Science Education, Korea University
2. Single-Cycle MIPS Processor
- Again, the microarchitecture (CPU implementation) is divided into 2 interacting parts
  - Datapath
  - Control
3. Single-Cycle Processor Design
- Let's start with a memory access instruction: lw
- Example: lw $2, 80($0)
4. Single-Cycle Processor Design
- STEP 2: Decoding
  - Read source operands from the register file
Example: lw $2, 80($0)
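The register-file read in this step can be sketched in Verilog as follows. This is a minimal sketch rather than the lecture's own code: the module name `regfile` and the port names (`ra1`, `ra2`, `wa3`, `wd3`, `we3`, `rd1`, `rd2`) are assumptions, reads are modeled as combinational, and register 0 is assumed to be hardwired to zero.

```verilog
// Hypothetical three-ported register file (all names are assumptions):
// two combinational read ports and one clocked write port.
module regfile(input         clk,
               input         we3,           // write enable
               input  [4:0]  ra1, ra2, wa3, // read/write register numbers
               input  [31:0] wd3,           // write data
               output [31:0] rd1, rd2);     // read data

  reg [31:0] rf[31:0];

  // Write on the rising clock edge when enabled.
  always @(posedge clk)
    if (we3) rf[wa3] <= wd3;

  // Register 0 always reads as zero.
  assign rd1 = (ra1 != 5'd0) ? rf[ra1] : 32'b0;
  assign rd2 = (ra2 != 5'd0) ? rf[ra2] : 32'b0;
endmodule
```

For lw $2, 80($0), the base register field selects register 0, so `rd1` would be 0 in this sketch.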
5. Single-Cycle Processor Design
- STEP 2: Decoding
  - Sign-extend the immediate
Example: lw $2, 80($0)
module signext(input  [15:0] a,
               output [31:0] y);
  // Replicate the sign bit a[15] into the upper 16 bits
  assign y = {{16{a[15]}}, a};
endmodule
6. Single-Cycle Processor Design
- STEP 3: Execution
  - Compute the memory address
Example: lw $2, 80($0)
7. Single-Cycle Processor Design
- STEP 4: Execution
  - Read data from memory and write it back to the register file
Example: lw $2, 80($0)
8. Single-Cycle Processor Design
- We are done with lw
- The CPU starts fetching the next instruction from PC+4
module adder(input  [31:0] a, b,
             output [31:0] y);
  assign y = a + b;
endmodule

// Instantiation: pcplus4 = pc + 4
adder pcadd1(pc, 32'b100, pcplus4);
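The PC itself has to be held in a register that loads the next address on every clock edge. A minimal sketch of that register for purely sequential fetch is shown below; the module names `pcreg` and `fetch_seq`, the asynchronous reset, and the reset address of 0 are assumptions, not taken from the slides.

```verilog
// Hypothetical PC register with asynchronous reset (names are assumptions).
module pcreg(input             clk, reset,
             input      [31:0] pcnext,
             output reg [31:0] pc);
  always @(posedge clk or posedge reset)
    if (reset) pc <= 32'b0;   // start fetching at address 0 (an assumption)
    else       pc <= pcnext;  // otherwise load the next PC
endmodule

// Sequential fetch only: the next PC is always PC + 4.
module fetch_seq(input         clk, reset,
                 output [31:0] pc);
  wire [31:0] pcplus4;
  assign pcplus4 = pc + 32'd4;
  pcreg pcr(clk, reset, pcplus4, pc);
endmodule
```

Branches and jumps later replace the fixed `pcplus4` input with a selected next-PC value.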
9. Single-Cycle Processor Design
- Let's consider another memory access instruction: sw
  - The sw instruction needs to write data to the data memory
Example: sw $2, 84($0)
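The write path that sw needs can be sketched as a data memory with a write enable. This is an illustrative sketch under stated assumptions: the module name `dmem`, the 64-word size, and the port names are not from the slides, and only word-aligned accesses are modeled.

```verilog
// Hypothetical word-addressed data memory (name, size, and ports are assumptions).
module dmem(input         clk,
            input         we,      // MemWrite
            input  [31:0] a,       // byte address from the ALU
            input  [31:0] wd,      // write data (register rt)
            output [31:0] rd);     // read data

  reg [31:0] RAM[63:0];

  // Combinational read; a[7:2] is the word index for a 64-word memory.
  assign rd = RAM[a[7:2]];

  // Clocked write, only when MemWrite is asserted.
  always @(posedge clk)
    if (we) RAM[a[7:2]] <= wd;
endmodule
```

For sw $2, 84($0), the byte address 84 maps to word index 21 in this sketch.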
10. Single-Cycle Processor Design
- Let's consider arithmetic and logical instructions: add, sub, and, or
  - Write ALUResult to the register file
  - Note that R-type instructions write to the rd field of the instruction (instead of rt)
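Choosing between rt and rd as the destination is a 2-to-1 multiplexer on the write-register number. The sketch below assumes a select signal named `regdst` (1 selects rd) and hypothetical module names `mux2` and `writereg_sel`.

```verilog
// Generic 2-to-1 multiplexer (hypothetical helper).
module mux2 #(parameter WIDTH = 32)
             (input  [WIDTH-1:0] d0, d1,
              input              s,
              output [WIDTH-1:0] y);
  assign y = s ? d1 : d0;
endmodule

// Destination register select: rt (instr[20:16]) for lw,
// rd (instr[15:11]) for R-type instructions.
module writereg_sel(input  [31:0] instr,
                    input         regdst,
                    output [4:0]  writereg);
  mux2 #(5) wrmux(instr[20:16], instr[15:11], regdst, writereg);
endmodule
```

For add $t0, $t1, $t2 (encoding 32'h012A4020), `regdst = 1` selects rd = 8 ($t0), while `regdst = 0` would select rt = 10 ($t2).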
11. Single-Cycle Processor Design
- Let's consider a branch instruction: beq
  - Determine whether the register values are equal
  - Calculate the branch target address (BTA) from the sign-extended immediate and PC+4
Example: beq $4, $0, around
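The two beq steps above can be sketched as a small combinational block. The module name `branch_target` and the signal names are assumptions; `zero` stands for the ALU's equality flag and `branch` for the control signal that marks a beq.

```verilog
// Hypothetical branch logic (names are assumptions).
module branch_target(input  [31:0] pcplus4,   // PC + 4
                     input  [31:0] signimm,   // sign-extended immediate
                     input         branch,    // 1 for beq
                     input         zero,      // 1 when the two registers are equal
                     output [31:0] pcbranch,  // branch target address (BTA)
                     output        pcsrc);    // 1 = take the branch

  // The immediate counts words, so shift left by 2 to get a byte offset.
  wire [31:0] signimmsh = {signimm[29:0], 2'b00};

  assign pcbranch = pcplus4 + signimmsh;
  assign pcsrc    = branch & zero;
endmodule
```

With `pcplus4 = 4` and an immediate of 3, the target in this sketch is 4 + 12 = 16.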
12. Single-Cycle Datapath Example
- We are done with the implementation of the basic instructions
- Let's see how the or instruction works out in the implementation
13. Single-Cycle Processor: Control
- As mentioned, the CPU is designed with datapath and control
- Now, let's delve into the control part design
14. Control Unit
15. ALU Implementation and Control
F[2:0]  Function
000     A & B
001     A | B
010     A + B
011     not used
100     A & ~B
101     A | ~B
110     A - B
111     SLT
slt: set less than
Example: slt $t0, $t1, $t2 // $t0 = 1 if $t1 < $t2
16. Control Unit: ALU Control
- The implementation is entirely up to the hardware designers
- But the designers should make sure the implementation is reasonable
ALUOp[1:0]  Meaning
00          Add
01          Subtract
10          Look at Funct
11          Not Used
ALUOp[1:0]  Funct         ALUControl[2:0]
00          X             010 (Add)
X1          X             110 (Subtract)
1X          100000 (add)  010 (Add)
1X          100010 (sub)  110 (Subtract)
1X          100100 (and)  000 (And)
1X          100101 (or)   001 (Or)
1X          101010 (slt)  111 (SLT)
17. Control Unit: Main Decoder
Instruction  Op[5:0]  RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp[1:0]
R-type       000000   1         1       0       0       0         0         10
lw           100011   1         0       1       0       0         1         00
sw           101011   0         X       1       0       1         X         00
beq          000100   0         X       0       1       0         X         01
ALUOp[1:0]  Meaning
00          Add
01          Subtract
10          Look at Funct field
11          Not Used
18. How about Other Instructions?
- Hmm... Now we are done with the control part design
- Let's examine whether the design is able to execute other instructions
  - addi
Example: addi $t0, $t1, -14
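To see what the existing datapath does with this addi, the sketch below decodes its machine encoding and sign-extends the immediate. The encoding 32'h2128FFF2 follows from the standard I-type format with $t0 = register 8 and $t1 = register 9; the module name `addi_example` is hypothetical.

```verilog
// Worked example for addi $t0, $t1, -14 (module name is an assumption).
module addi_example;
  wire [31:0] instr   = 32'h2128FFF2;          // addi $t0, $t1, -14
  wire [5:0]  op      = instr[31:26];          // 6'b001000 (addi)
  wire [4:0]  rs      = instr[25:21];          // 9  ($t1, source)
  wire [4:0]  rt      = instr[20:16];          // 8  ($t0, destination)
  wire [15:0] imm     = instr[15:0];           // 16'hFFF2 = -14
  wire [31:0] signimm = {{16{imm[15]}}, imm};  // 32'hFFFFFFF2 = -14

  initial
    #1 $display("op=%b rs=%0d rt=%0d signimm=%h", op, rs, rt, signimm);
endmodule
```

The ALU then adds `signimm` to the value of $t1, which is the same operation lw uses for its address, so no new datapath hardware is needed, only a new decoder row.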
19. Control Unit: Main Decoder
Instruction  Op[5:0]  RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp[1:0]
R-type       000000   1         1       0       0       0         0         10
lw           100011   1         0       1       0       0         1         00
sw           101011   0         X       1       0       1         X         00
beq          000100   0         X       0       1       0         X         01
addi         001000   1         0       1       0       0         0         00
20. How about Other Instructions?
- OK. So far, so good
- How about jump instructions?
  - j
21. How about Other Instructions?
- We need to add some hardware to support the j instruction
  - Logic to compute the target address
  - A mux and a control signal
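The extra hardware for j can be sketched as follows, assuming the standard MIPS jump-target rule (upper 4 bits of PC+4, the 26-bit address field, then two zero bits). The module name `pcnext_sel` and the signal names are assumptions.

```verilog
// Hypothetical next-PC selection with jump support (names are assumptions).
module pcnext_sel(input  [31:0] pcplus4,   // PC + 4
                  input  [31:0] pcbranch,  // branch target address
                  input  [25:0] jaddr,     // instr[25:0] of the j instruction
                  input         pcsrc,     // 1 = branch taken
                  input         jump,      // 1 = j instruction
                  output [31:0] pcnext);

  // Jump target: {PC+4[31:28], addr, 00}
  wire [31:0] pcjump   = {pcplus4[31:28], jaddr, 2'b00};

  // Branch mux first, then the jump mux takes priority.
  wire [31:0] pcnextbr = pcsrc ? pcbranch : pcplus4;
  assign pcnext = jump ? pcjump : pcnextbr;
endmodule
```

For example, with `pcplus4 = 32'h10000004` and `jaddr = 26'h10`, the jump target in this sketch is 32'h10000040.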
22. Control Unit: Main Decoder
- There is one more output in the main decoder to support the jump instruction: Jump
Instruction  Op[5:0]  RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp[1:0]  Jump
R-type       000000   1         1       0       0       0         0         10          0
lw           100011   1         0       1       0       0         1         00          0
sw           101011   0         X       1       0       1         X         00          0
beq          000100   0         X       0       1       0         X         01          0
addi         001000   1         0       1       0       0         0         00          0
j            000010   0         X       X       X       0         X         XX          1
23. Verilog Code: Main Decoder and ALU Control
module maindec(input  [5:0] op,
               output       memtoreg, memwrite,
               output       branch, alusrc,
               output       regdst, regwrite,
               output       jump,
               output [1:0] aluop);

  reg [8:0] controls;

  assign {regwrite, regdst, alusrc, branch,
          memwrite, memtoreg, jump, aluop} = controls;

  always @(*)
    case(op)
      6'b000000: controls <= 9'b110000010; // R-type
      6'b100011: controls <= 9'b101001000; // lw
      6'b101011: controls <= 9'b001010000; // sw
      6'b000100: controls <= 9'b000100001; // beq
      6'b001000: controls <= 9'b101000000; // addi
      6'b000010: controls <= 9'b000000100; // j
      default:   controls <= 9'bxxxxxxxxx; // ???
    endcase
endmodule
module aludec(input      [5:0] funct,
              input      [1:0] aluop,
              output reg [2:0] alucontrol);

  always @(*)
    case(aluop)
      2'b00: alucontrol <= 3'b010;  // add
      2'b01: alucontrol <= 3'b110;  // sub
      default: case(funct)          // RTYPE
          6'b100000: alucontrol <= 3'b010; // ADD
          6'b100010: alucontrol <= 3'b110; // SUB
          6'b100100: alucontrol <= 3'b000; // AND
          6'b100101: alucontrol <= 3'b001; // OR
          6'b101010: alucontrol <= 3'b111; // SLT
          default:   alucontrol <= 3'bxxx; // ???
        endcase
    endcase
endmodule
24. Verilog Code: ALU
module alu(input      [31:0] a, b,
           input      [2:0]  alucont,
           output reg [31:0] result,
           output            zero);

  wire [31:0] b2, sum, slt;

  assign b2  = alucont[2] ? ~b : b;
  assign sum = a + b2 + alucont[2];
  assign slt = sum[31];

  always @(*)
    case(alucont[1:0])
      2'b00: result <= a & b2;
      2'b01: result <= a | b2;
      2'b10: result <= sum;
      2'b11: result <= slt;
    endcase

  assign zero = (result == 32'b0);
endmodule
F[2:0]  Function
000     A & B
001     A | B
010     A + B
011     not used
100     A & ~B
101     A | ~B
110     A - B
111     SLT
25. Single-Cycle Processor Performance
- How fast is the single-cycle processor?
- Clock cycle time (and hence frequency) is limited by the critical path
  - The critical path is the path that takes the longest time
- What do you think the critical path is?
  - The path that the lw instruction goes through
26. Single-Cycle Processor Performance
- Single-cycle critical path
  - Tc = tpcq_PC + tmem + max(tRFread, tsext) + tmux + tALU + tmem + tmux + tRFsetup
- In most implementations, the limiting paths are memory (instruction and data), ALU, and register file. Thus,
  - Tc = tpcq_PC + 2tmem + tRFread + 2tmux + tALU + tRFsetup
Elements               Parameter
Register clock-to-Q    tpcq_PC
Multiplexer            tmux
ALU                    tALU
Memory read            tmem
Register file read     tRFread
Register file setup    tRFsetup
27. Single-Cycle Processor Performance Example
Elements               Parameter  Delay (ps)
Register clock-to-Q    tpcq_PC    30
Multiplexer            tmux       25
ALU                    tALU       200
Memory read            tmem       250
Register file read     tRFread    150
Register file setup    tRFsetup   20
Tc = tpcq_PC + 2tmem + tRFread + 2tmux + tALU + tRFsetup
   = [30 + 2(250) + 150 + 2(25) + 200 + 20] ps
   = 950 ps
fc = 1/Tc = 1/(950 ps) ≈ 1.05 GHz
- Assuming that the CPU executes 100 billion instructions to run your program, what is the execution time of the program on a single-cycle MIPS processor?
  - Execution Time = (instructions)(cycles/instruction)(seconds/cycle)
                   = (100 × 10^9)(1)(950 × 10^-12 s)
                   = 95 seconds
  - The cycles/instruction factor is 1 because every instruction completes in one clock cycle