Title: CSE 341
1CSE 341
- The Processor: Datapath and Control
- As always, my grateful acknowledgment for diagrams and slides to:
  - Kevin Bolding, Associate Professor of Electrical Engineering, Seattle Pacific University
  - and Kris Schindler, PhD, SUNYAB
2Datapath Design
- Simplified MIPS design for the core MIPS instruction set
- Memory reference instructions
  - lw, sw (use immediate data)
- Arithmetic/logic instructions
  - add, sub, and, or, slt (but not NOR, yet)
- Branch (the book calls them control operations)
  - branch equal, jump (use immediate data)
- As we explore computer operation, control signals will just appear as needed
3Branches
- Branches conditionally change the next instruction
  - BEQ s2, s1, 42
- The offset is specified as the number of words to be added to the address of the next instruction (PC+4)
  - Take the offset, multiply by 4
  - Shift left two
  - Add this to PC+4 (from PC logic)
- Control logic has to decide if the branch is taken
  - Uses the Zero output of the ALU
5.2 / new 5.3
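The branch-address arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`sign_extend16`, `branch_target`, `next_pc`) are hypothetical, and the 32-bit wraparound mask is an assumption about how the PC adder behaves.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc: int, offset16: int) -> int:
    """PC+4 plus the word offset shifted left by two (that is, times 4)."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


def next_pc(pc: int, offset16: int, is_beq: bool, zero: bool) -> int:
    """Take the branch only for BEQ when the ALU Zero output is set."""
    if is_beq and zero:
        return branch_target(pc, offset16)
    return (pc + 4) & 0xFFFFFFFF
```

For example, with the offset 42 from the BEQ above and a PC of 0x1000, the taken target is 0x1004 + 168.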
4Integrating the R-types and Memory
[Figure: memory datapath. Annotations: "Is the source in the instruction, or in a register?", "Does the result come from memory, or from the ALU?", "Control will be needed."]
- R-types and Load/Stores are similar in many respects
- Differences
  - 2nd ALU source: R-types use a register, I-types use the immediate
  - Write data: R-types use the ALU result, I-types use memory
- Mux the conflicting datapaths together
- Defer the control logic for now
5.3
5Adding the instruction memory
Simply add the instruction memory and PC to the beginning of the datapath.
Instruction fetch
Separate instruction and data memories are needed in order to allow the entire datapath to complete its job in a single clock cycle.
5.3
6Adding the Branch Datapath
Now we have the datapath for R-type, I-type, and
branch instructions.
On to the control logic!
5.3
7Basic Architecture
[Figure: basic single-cycle datapath. PC and instruction memory (Instruction 31-0); register file (read reg. num A/B, read reg. data A/B, write reg. num, write reg. data); ALU (Result, Zero); data memory (read address, write address, write data, read data); 16-to-32-bit sign extend; shift left 2; adders for PC+4 and the branch target; 0/1 multiplexors.]
5.3
8Datapath Overview
[Figure annotation: control needed here]
- R-Type: add s1, s2, s3
- The data in s2 is added to the data in s3, and the result is placed in s1
5.1
9more Combinational Logic
[Figure: combinational logic elements: a two-input multiplexor (inputs 0 and 1) and the ALU.]
10Datapath Overview
[Figure: datapath for lw t1, test(t0), labeled with t0, data in t0, test, test(t0), and t1. Annotation: control needed here.]
- I-Type 1: lw t1, test(t0)
- Move data from the location given by the sum of (label test plus data in t0) to register t1
  - Read register A: t0
  - Immediate data: test
  - Write register: t1
5.1
11Datapath Overview
[Figure: datapath for sw s0, test(s1), labeled with s1, data in s1, s0, test, and test(s1). Annotation: control needed here.]
- I-Type 2: sw s0, test(s1)
- Move data from register s0 to the location given by the sum of (label test plus data in s1)
  - Read register B: s0
  - Immediate data: test
  - Write location: (test + data in s1)
5.1
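The two I-type walkthroughs above can be sketched as follows. This is a rough model, not the hardware: registers and memory are plain dictionaries, the helper names `lw` and `sw` mirror the mnemonics, and the value 100 chosen for the label `test` (along with the register and memory contents) is purely hypothetical.

```python
def lw(regs, memory, dest, base, offset):
    """Load: regs[dest] receives the word at address offset + regs[base]."""
    address = offset + regs[base]      # the ALU adds the immediate to the base register
    regs[dest] = memory[address]       # data memory read, then register write
    return address


def sw(regs, memory, src, base, offset):
    """Store: the word in regs[src] is written to address offset + regs[base]."""
    address = offset + regs[base]      # same address calculation as lw
    memory[address] = regs[src]        # data memory write, no register write
    return address


TEST = 100                             # hypothetical address of the label "test"
regs = {"t0": 8, "t1": 0, "s0": 99, "s1": 4}
memory = {108: 7}

lw(regs, memory, "t1", "t0", TEST)     # lw t1, test(t0)
sw(regs, memory, "s0", "s1", TEST)     # sw s0, test(s1)
```

Both instructions share the address calculation; they differ only in whether memory or the register file is written.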
12MIPS Addressing Modes/Instruction Formats
13
[Figure: the basic datapath annotated with control signals: PCSrc, MemRd or MemWr, RegWrite, MemToReg, and ALUSrc.]
5.3
14As architects, our job....
- Create a single datapath
- Start with memory reference, R-type, and branch instructions: the basic architecture
  - Old Figure 5-11, page 352
  - New Figure 5-10, page 299
- Add the instruction fetch
  - Old Figure 5-12, page 353
  - New Figure 5-11, page 300
- Add a control unit
  - The control unit uses inputs to generate write signals for each state element, selector control for each multiplexor, and ALU control
15Single Cycle Datapath Design - ALU Control
- ALU function depends upon instruction class
- Memory reference: LW s0, test(t1)
  - Memory address calculation (addition)
- R-type: ADD s1, s2, s3
  - Depends on the 6-bit function field (lower 6 bits of the instruction)
- Branch: BEQ s1, s2, test
  - Subtraction for comparison
- A small control unit will be used to determine the operation
  - 2-bit input (ALUOp) generates a 3-bit output to directly control the ALU
  - Memory reference -> add (00)
  - Branch -> sub (01)
  - R-type -> operation determined by function code (10)
- Figure 5-14, page 355 / New Figure 5-12, page 302
16
- Remember: Ainvert, Binvert, and 2 bits of OP select
  - And: 0000
  - Or: 0001
  - Add: 0010
  - Subtract: 0110
  - Set on Less Than: 0111
  - NOR: 1100
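A behavioral sketch of an ALU driven by these 4-bit control values might look like the following. The function name `alu`, the 32-bit masking, and the choice to return the Zero flag alongside the result are assumptions made for illustration; the real ALU derives these operations from Ainvert, Binvert, and the 2-bit operation select rather than from a lookup.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control: int, a: int, b: int):
    """Return (result, zero) for the 4-bit control values listed above."""
    if control == 0b0000:        # And
        result = a & b
    elif control == 0b0001:      # Or
        result = a | b
    elif control == 0b0010:      # Add
        result = a + b
    elif control == 0b0110:      # Subtract
        result = a - b
    elif control == 0b0111:      # Set on Less Than (signed comparison)
        result = 1 if to_signed(a) < to_signed(b) else 0
    elif control == 0b1100:      # NOR
        result = ~(a | b)
    else:
        raise ValueError(f"unsupported ALU control {control:04b}")
    result &= MASK32
    return result, result == 0
```

Subtracting two equal operands sets the Zero flag, which is what the branch logic relies on.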
17ALU Control
Sub, op1, op2
instr function code
Where's NOR?
18Setting the ALU controls
- The instruction Opcode and Function can be used together.
  - For R-type: Opcode is zero, function code determines ALU controls.
  - For I-type: Opcode determines ALU controls, function code is ignored.

Instruction   Opcode     ALUOp  Funct. Code  ALU action  ALU control
add           00-Rtype   10     100000       add         0 10
sub           00-Rtype   10     100010       subtract    1 10
and           00-Rtype   10     100100       and         0 00
or            00-Rtype   10     100101       or          0 01
SLT           00-Rtype   10     101010       SLT         1 11
load word     LW         00     xxxxxx       add         0 10
store word    SW         00     xxxxxx       add         0 10
branch equal  BEQ        01     xxxxxx       subtract    1 10

New control signal: ALUOp (that is, the ALU "personality" as specified by the Opcode) is 00 for memory, 01 for Branch, and 10 for R-type.
5.3
19Controlling the ALU
ALUOp (from Opcode)  F5 F4 F3 F2 F1 F0  Function  ALU Ctrl
00                   x  x  x  x  x  x   Add       0 10
x1                   x  x  x  x  x  x   Sub       1 10
1x                   x  x  0  0  0  0   Add       0 10
1x                   x  x  0  0  1  0   Sub       1 10
1x                   x  x  0  1  0  0   And       0 00
1x                   x  x  0  1  0  1   Or        0 01
1x                   x  x  1  0  1  0   SLT       1 11
Note: there is no row for NOR.
- ALUOp is determined by the Opcode
- For ALUOp 00 or 01, the function code is unused
- Since ALUOp can only be 00, 01, or 10, we don't care what ALUOp2 is when ALUOp1 is 1
5.3
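The truth table above, including its don't-care entries, can be transcribed as a small decoder. This is a sketch under assumptions: the name `alu_control` and the integer encoding of the 3-bit ALU control (for example `0b010` for "0 10") are illustrative choices, and unsupported function codes simply raise an error.

```python
ADD, SUB, AND, OR, SLT = 0b010, 0b110, 0b000, 0b001, 0b111

# Rows with ALUOp = 1x: only F3..F0 are examined (F5 and F4 are don't-cares).
FUNCT_LOW4 = {
    0b0000: ADD,
    0b0010: SUB,
    0b0100: AND,
    0b0101: OR,
    0b1010: SLT,
}


def alu_control(alu_op: int, funct: int) -> int:
    """Map the 2-bit ALUOp and 6-bit function code to the 3-bit ALU control."""
    if alu_op == 0b00:             # memory reference: always add
        return ADD
    if alu_op & 0b01:              # row x1: branch, always subtract
        return SUB
    low4 = funct & 0b1111          # row 1x: R-type, decode the function code
    if low4 not in FUNCT_LOW4:
        raise ValueError(f"function code {funct:06b} not in the table")
    return FUNCT_LOW4[low4]
```

For instance, `alu_control(0b10, 0b101010)` selects SLT, while any function code paired with ALUOp 00 selects add.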
20Decoding the Instruction
[Figure: instruction fields by format]
- R-type: opcode (to control logic), Read reg. A, Read reg. B, Write reg., not used, function code (to ALU control)
- Memory, Branch: opcode (to control logic), Read reg. A, Write/Read reg. B, memory address or branch PC offset
- Jump: opcode, bits 31-26 (to control logic); immediate data, bits 25-0 (pseudodirect concatenation offset)
One problem: the write register number must come from two different places (RT or RD).
5.3
21Control Signals
- RegDst
  - 0: RT is the write register number for Mem/Branch instructions
  - 1: RD is the write register number for R-type instructions
- ALUSrc
  - 0: Read Register B selected as 2nd operand
  - 1: Sign-extended immediate data selected as 2nd operand
- PCSrc
  - 0: PC incremented by 4
  - 1: PC incremented by 4 + (sign-extended branch offset x 4)
- MemWrite / MemRead
  - 1 / 0: Data written to memory
  - 0 / 1: Data read from memory
- RegWrite
  - 0: Data read from registers to Read Data A, B
  - 1: Data written to registers
- MemtoReg
  - 0: Register Write data input supplied by ALU
  - 1: Register Write data input supplied by memory
- Three ALU control signals
22Instruction Decoding
[Figure: the datapath with the instruction bus (Instruction 31-0) split into fields: Op 31-26 to Ctrl, Rs 25-21, Rt 20-16, Rd 15-11, and Imm 15-0 to the 16-to-32-bit sign extender.]
- Opcode: 31-26
- Read Reg A: Rs
- Read Reg B: Rt
- Write Reg: either Rd or Rt
- Immediate data: 15-0
(We can decode the data simply by dividing up the instruction bus.)
5.3
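"Dividing up the instruction bus" amounts to shifting and masking fixed bit ranges. A minimal sketch follows; the function name `decode` and the dictionary keys are hypothetical, and the example word is the standard encoding of add s1, s2, s3 (register numbers 17, 18, 19).

```python
def decode(instruction: int) -> dict:
    """Slice a 32-bit MIPS instruction word into its fixed bit fields."""
    return {
        "opcode": (instruction >> 26) & 0x3F,     # bits 31-26, to the control logic
        "rs": (instruction >> 21) & 0x1F,         # bits 25-21, read register A
        "rt": (instruction >> 16) & 0x1F,         # bits 20-16, read register B (or write reg)
        "rd": (instruction >> 11) & 0x1F,         # bits 15-11, write register for R-type
        "funct": instruction & 0x3F,              # bits 5-0, to the ALU control
        "imm": instruction & 0xFFFF,              # bits 15-0, immediate data
        "jump_target": instruction & 0x03FFFFFF,  # bits 25-0, jump field
    }


ADD_S1_S2_S3 = 0x02538820   # add s1, s2, s3
fields = decode(ADD_S1_S2_S3)
```

Every field is extracted for every instruction; the control signals decide which of them are actually used.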
23Control Signals
[Figure: the datapath with the control unit (Ctrl, driven by Op 31-26) and the ALU control (ALUCtrl, driven by FC 5-0 and ALUOp). Control signals shown: PCSrc (BEQ and Zero), MemWrite, MemRead, RegWrite, MemToReg, ALUSrc (Reg or Imm?), RegDest, and ALUOp (00 Memory, 01 Branch, 10 R-type). Other annotations name the instruction classes that assert a signal: Load, Store, R-type.]
ALU Control: a function of ALUOp and the function code
5.3
24Inside the control oval
Instruction  Opcode  RegWrite  ALUSrc  MemToReg  RegDest  MemRead  MemWrite  PCSrc  ALUOp
R-format     000000  1         0       0         1        0        0         0      10
LW           100011  1         1       1         0        1        0         0      00
SW           101011  0         1       x         x        0        1         0      00
BEQ          000100  0         0       x         x        0        0         1      01
Encodings: ALUSrc: 0 = Reg, 1 = Imm. MemToReg: 0 = ALU, 1 = Mem. RegDest: 0 = Rt, 1 = Rd. PCSrc: 1 = Branch. ALUOp: 00 = Mem, 01 = Branch, 10 = R-type.
- This control logic can be decoded in several ways
  - Random logic, PLA, PAL
- Just build hardware that looks for the 4 opcodes
  - For each opcode, assert the appropriate signals
5.3
Note BEQ must also check the zero output of the
ALU...
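As one way to picture "hardware that looks for the 4 opcodes", the control table can be written as a lookup keyed by opcode. This is a software sketch only: the names `CONTROL` and `control` are hypothetical, don't-care entries are represented as `None`, and the final PCSrc is formed by combining the table's branch bit with the ALU Zero output, as the note above requires.

```python
# One row per opcode; None marks a don't-care (x) entry in the table.
CONTROL = {
    0b000000: dict(RegWrite=1, ALUSrc=0, MemToReg=0, RegDest=1,
                   MemRead=0, MemWrite=0, PCSrc=0, ALUOp=0b10),        # R-format
    0b100011: dict(RegWrite=1, ALUSrc=1, MemToReg=1, RegDest=0,
                   MemRead=1, MemWrite=0, PCSrc=0, ALUOp=0b00),        # LW
    0b101011: dict(RegWrite=0, ALUSrc=1, MemToReg=None, RegDest=None,
                   MemRead=0, MemWrite=1, PCSrc=0, ALUOp=0b00),        # SW
    0b000100: dict(RegWrite=0, ALUSrc=0, MemToReg=None, RegDest=None,
                   MemRead=0, MemWrite=0, PCSrc=1, ALUOp=0b01),        # BEQ
}


def control(opcode: int, zero: bool) -> dict:
    """Return the control signals for an opcode, gating PCSrc with Zero."""
    if opcode not in CONTROL:
        raise ValueError(f"opcode {opcode:06b} not in the control table")
    signals = dict(CONTROL[opcode])     # copy so the table itself is never modified
    signals["PCSrc"] = 1 if (signals["PCSrc"] and zero) else 0
    return signals
```

A BEQ whose operands differ (Zero = 0) therefore leaves PCSrc at 0 and the PC simply advances by 4.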
25Control Signals
[Figure: complete single-cycle datapath with control. Ctrl decodes Op 31-26 and drives BEQ (combined with Zero to form PCSrc), MemToReg, MemRead, MemWrite, ALUOp, ALUSrc, RegWrite, and RegDest; ALUCtrl takes ALUOp and FC 5-0; instruction fields Rs 25-21, Rt 20-16, Rd 15-11, and Imm 15-0 feed the register file and the sign extender.]
5.3
26Jumping
[Figure: the datapath extended for jumps. J 25-0 is shifted left 2 and concatenated with bits 31-28 of PC+4; a multiplexor controlled by the Jump signal from Ctrl selects this address for the PC. The rest of the datapath and control signals are unchanged.]
5.3
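The shift-and-concatenate step for jumps can be sketched as below. The function name `jump_target` is hypothetical; the bit positions follow the figure labels (J 25-0, Sh.Left2, 31-28, Concat.).

```python
def jump_target(pc: int, target26: int) -> int:
    """Concatenate bits 31-28 of PC+4 with the 26-bit jump field shifted left 2."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper = pc_plus_4 & 0xF0000000            # bits 31-28 of PC+4
    lower = (target26 & 0x03FFFFFF) << 2      # J 25-0 shifted left two: a 28-bit value
    return upper | lower
```

Because only 28 bits come from the instruction, a jump stays within the 256 MB region selected by the upper four bits of PC+4.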
27Performance
- Determination of Cycle Time
- Based on delays through functional units
- Example
- Memory 3 ns
- ALU, adders 2 ns
- Register file 1 ns
- Assume all other delays are zero
28Performance
- Determination of Cycle Time
- Functions Performed by Various Instruction Classes
  - Instruction Fetch (IF): 3 ns
  - Register Access (R): 1 ns
  - ALU (ALU): 2 ns
  - Memory Access (M): 3 ns
  - Arith/Logic: IF, R, ALU, R = 7 ns
  - Load: IF, R, ALU, M, R = 10 ns
  - Store: IF, R, ALU, M = 9 ns
  - Branch: IF, R, ALU = 6 ns
  - Jump: IF = 3 ns
- 10 ns must be allowed every time
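The per-class delays above are just sums of the functional-unit delays along each instruction's path, and the single-cycle clock must cover the longest one. A small sketch of that arithmetic, with hypothetical names `UNIT_DELAY_NS`, `STEPS`, `class_delays`, and `single_cycle_time`:

```python
UNIT_DELAY_NS = {"IF": 3, "R": 1, "ALU": 2, "M": 3}

# Functional units used, in order, by each instruction class.
STEPS = {
    "Arith/Logic": ["IF", "R", "ALU", "R"],
    "Load": ["IF", "R", "ALU", "M", "R"],
    "Store": ["IF", "R", "ALU", "M"],
    "Branch": ["IF", "R", "ALU"],
    "Jump": ["IF"],
}


def class_delays() -> dict:
    """Total delay in ns for each instruction class."""
    return {name: sum(UNIT_DELAY_NS[unit] for unit in units)
            for name, units in STEPS.items()}


def single_cycle_time() -> int:
    """A fixed clock period must accommodate the slowest class."""
    return max(class_delays().values())
```

The load path is the critical one here, so every instruction pays for it.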
29Performance
- Determination of Cycle Time
- Total delay based on instruction type (ns)
  - Arith/Logic: 7
  - Load: 10
  - Store: 9
  - Branch: 6
  - Jump: 3
- Compare this to the use of a variable-length cycle time.
- Consider the following instruction mix (for every 100 instructions...)
  - Arith/Logic: 50 x 7 = 350
  - Load: 25 x 10 = 250
  - Store: 10 x 9 = 90
  - Branch: 14 x 6 = 84
  - Jump: 3 x 3 = 9
  - Total: 783
- Average cycle time: 7.83 ns
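The weighted total can be checked with a few lines of arithmetic. In this sketch the names `DELAY_NS`, `MIX`, and `FIXED_CYCLE_NS` are hypothetical. The division by 100 follows the slide's "for every 100 instructions" wording, although the listed counts add up to 102. The ratio at the end compares the fixed 10 ns cycle against the variable-length average.

```python
DELAY_NS = {"Arith/Logic": 7, "Load": 10, "Store": 9, "Branch": 6, "Jump": 3}
MIX = {"Arith/Logic": 50, "Load": 25, "Store": 10, "Branch": 14, "Jump": 3}
FIXED_CYCLE_NS = 10          # the single-cycle design always allows for the load path

# Total time for the mix when each instruction takes only as long as it needs.
total_ns = sum(MIX[name] * DELAY_NS[name] for name in MIX)

# The slide divides by 100 instructions (note the counts above sum to 102).
average_ns = total_ns / 100

# Speedup of a variable-length cycle over the fixed 10 ns cycle.
speedup = FIXED_CYCLE_NS / average_ns
```

Rounded to two decimals, the ratio comes out at about 1.28.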
30Performance
- Determination of Cycle Time
- Compare two approaches
  - Using a variable-length cycle time: 1.28 speedup
- This becomes significantly greater if instructions requiring larger instruction times are incorporated
  - More powerful operations (FP, multiplication, division)
  - Additional addressing modes
  - Cycle time could increase from 3-4 functional unit delays to 10s or 100s
- Violates key design principle: Make the common case fast!
- In addition, each functional unit is used only once per cycle, leading to redundancy (duplication of hardware)
- A More Efficient Design
  - Multicycle instructions: use only the clock cycles necessary
  - Pipelining: overlapping instructions