The Processor: Datapath and Control

Slides: 30

Provided by: Daniel985

1
The Processor: Datapath and Control

We will design a microprocessor that implements a
subset of the MIPS instruction set:
Memory access: load/store word (lw, sw).
Arithmetic-logical (AL) instructions: add, sub,
and, or, and slt.
Control-flow instructions: branch equal (beq) and
jump (j).
The subset includes neither all of the integer
instructions nor any floating-point instructions,
but the principles are the same.
For every instruction the first two steps are
identical:
Fetch the instruction from the memory location
the PC points to.
Decode the instruction and read the registers or
memory contents specified.

2
Abstract View of the DataPath

The datapath contains 2 types of logic elements:
Combinational elements, which operate on data
values. Their outputs depend only on their
current inputs. The ALU is a combinational
element.
State elements, which have internal storage.
Their state is defined by the values they contain
(memory and registers).

3
Clocking Methodology

A state element has at least two inputs and one
output. The inputs are the data value to be
written into the element and the clock signal,
which determines when the value is written.
The output is the data value stored in the
element. Thus a state element can be read at any
time, but it is written only when the clock
allows it.
A clocking methodology defines when signals can
be read and written. This is crucial to the
correct design of a computer.
We will assume an edge-triggered clocking
methodology: any values stored in the machine are
updated only on a clock edge.

4
Edge-Triggered Clocking

Because only state elements can store values, any
collection of combinational logic must have its
inputs coming from a set of state elements and
its outputs written to a set of state elements.
The time necessary for the signals to reach state
element 2 defines the length of the clock cycle.
An edge-triggered methodology allows us to read
the contents of a register, send the value
through some combinational logic, and write that
register in the same clock cycle. We assume that
state elements have implicit clock signals.
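
The read, compute, write pattern within one clock cycle can be sketched in Python. This is only an illustrative model, not part of the slides: the class name Register and its methods set_input and tick are assumptions made for the example.

```python
class Register:
    """Edge-triggered state element (hypothetical model).

    It can be read at any time, but its stored value changes
    only when tick() simulates a clock edge.
    """

    def __init__(self, value=0):
        self.value = value   # value currently stored (the output)
        self._next = value   # value waiting at the data input

    def read(self):
        return self.value

    def set_input(self, value):
        # Combinational logic drives the data input; nothing is stored yet.
        self._next = value

    def tick(self):
        # Clock edge: the pending input becomes the stored value.
        self.value = self._next


counter = Register(0)
for _ in range(3):
    old = counter.read()
    # Read the register, send it through combinational logic (+1),
    # and write the same register in the same cycle.
    counter.set_input(old + 1)
    assert counter.read() == old   # unchanged until the clock edge
    counter.tick()

print(counter.read())  # 3
```

Separating set_input from tick is what makes the same-cycle read and write safe: the old value stays visible until the edge.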

5
Fetching an Instruction

A memory unit holds the instructions that are to
be executed. The address of the next instruction
is in the PC. We need an adder (an ALU that
performs only addition) to calculate the address
of the next instruction to fetch.
Thick arrows symbolize 32-bit buses unless
specified differently. Thin arrows specify 1-bit
lines; colored lines specify control lines.

6
The Register File

The R-type instructions (also called the
arithmetic-logical instructions) read the
contents of 2 registers, perform an ALU
operation, and write the result back into a third
register.
The 32 registers are stored in the register file.
The register file has three 5-bit inputs to
specify the registers, two 32-bit outputs for the
data read, one 32-bit input for the data written,
and one control signal to decide whether data
should be written. In addition we will need an
ALU to perform the operations.

7
Data Memory

The 2 elements needed to implement load and store
instructions are the data memory and a unit that
sign-extends the 16-bit constant in an I-type
instruction. In addition we use the existing ALU
to compute the address to access.
The data memory has two 32-bit inputs, the
address and the write data, and one 32-bit
output, the read data. In addition it has 2
control lines, MemWrite and MemRead.
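
Sign extension and the lw/sw address calculation can be sketched as follows. The function names sign_extend16 and effective_address are hypothetical, chosen for this example; registers are modeled as 32-bit unsigned Python integers.

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit field to a 32-bit two's-complement value."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:               # bit 15 set: the offset is negative
        return imm16 | 0xFFFF0000    # copy the sign bit into bits 31-16
    return imm16


def effective_address(base, imm16):
    """lw/sw address: base register plus sign-extended offset, mod 2**32."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


print(hex(sign_extend16(0xFFFC)))              # 0xfffffffc  (that is, -4)
print(hex(effective_address(0x1000, 0xFFFC)))  # 0xffc
print(hex(effective_address(0x1000, 0x0008)))  # 0x1008
```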

8
Branch Equal

The beq instruction has 3 operands: two registers
that are compared for equality and a 16-bit
offset used to compute the branch address
relative to the PC. To implement this instruction
we must add the sign-extended offset to the PC.
There are 2 important details:
1. The base for the address calculation is the
address of the instruction after the current one.
Since we compute PC + 4 when fetching, we already
have this address.
2. The offset is in words, not bytes, so we have
to shift the offset left by 2.
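
A small sketch of the branch target calculation, combining both details (the PC + 4 base and the word-to-byte shift). The function name branch_target and the example addresses are assumptions for illustration.

```python
def branch_target(pc, offset16):
    """Branch target = (PC + 4) + (sign-extended offset << 2), mod 2**32."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    offset = offset16 & 0xFFFF
    if offset & 0x8000:          # negative 16-bit offset
        offset -= 0x10000
    # The offset counts words, so multiply by 4 to get bytes.
    return (pc_plus_4 + (offset << 2)) & 0xFFFFFFFF


print(hex(branch_target(0x00400000, 3)))       # 0x400010
print(hex(branch_target(0x00400000, 0xFFFF)))  # 0x400000 (offset -1)
```

An offset of -1 branches back to the beq itself, because the base is already PC + 4.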

9
Combining ALU and Memory Instructions

The ALU datapath (slide 6) and the memory
datapath (slide 7) are similar. The differences
are:
The second input to the ALU is a register
(R-type) or the sign-extended offset (I-type).
The value stored into the destination register
comes from the ALU (R-type) or from memory
(I-type).
Using 2 multiplexors (MUXs) we can combine both
datapaths.

10
The Complete Datapath

This simple processor can execute ALU
instructions, access memory, or compute the next
instruction's address in a single cycle.

11
ALU Control

The ALU has 3 control inputs. We use 5 of the 8
possible input combinations:
000  AND
001  OR
010  add
110  subtract
111  slt
The ALU control uses as its inputs the funct
field of the instruction and a 2-bit control
field called ALUOp.
For lw/sw the ALU computes the address using
addition (ALUOp = 00). For the R-type
instructions the ALU performs one of 5 actions,
depending on the funct field of the instruction
(ALUOp = 10). For beq the ALU performs a
subtraction (ALUOp = 01).
The ALU control is a large truth table that,
given the funct field and ALUOp, outputs the
3-bit control for the ALU.
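
The truth table can be sketched as a lookup in Python. The funct encodings below (0x20 add, 0x22 sub, 0x24 and, 0x25 or, 0x2A slt) are the standard MIPS values and are not given in the slides; the names FUNCT_TO_ALU and alu_control are hypothetical.

```python
# Standard MIPS funct codes mapped to the 3-bit ALU control values above.
FUNCT_TO_ALU = {
    0x20: 0b010,  # add
    0x22: 0b110,  # sub
    0x24: 0b000,  # and
    0x25: 0b001,  # or
    0x2A: 0b111,  # slt
}


def alu_control(alu_op, funct=0):
    """Return the 3-bit ALU control for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:      # lw/sw: add to form the address
        return 0b010
    if alu_op == 0b01:      # beq: subtract to compare
        return 0b110
    if alu_op == 0b10:      # R-type: the funct field decides
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError("unused ALUOp value")


print(format(alu_control(0b00), "03b"))        # 010
print(format(alu_control(0b01), "03b"))        # 110
print(format(alu_control(0b10, 0x2A), "03b"))  # 111
```

For ALUOp 00 and 01 the funct field is a don't-care, which is why it defaults to 0 here.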

12
Main Control

Look at the formats of the R-type and I-type
instructions:
R-type
Field  opcode  rs     rt     rd     shamt  funct
Bits   31-26   25-21  20-16  15-11  10-6   5-0
I-type
Field  opcode  rs     rt     address
Bits   31-26   25-21  20-16  15-0
The following observations can be made:
The opcode is always in bits 31-26.
The 2 registers to be read are always the rs
(25-21) and rt (20-16) fields (R-type, beq, and
store).
The base register for load/store instructions is
always rs (25-21).
The 16-bit offset for beq, lw, and sw is always
in bits 15-0.
The destination register is in one of two places:
for a lw it is rt (20-16), for an R-type it is rd
(15-11). Thus we need a MUX to select which field
of the instruction supplies the register to be
written.
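
Because the fields sit at fixed bit positions, they can all be extracted before the opcode is known. A minimal sketch, assuming the hypothetical helper names decode_fields and write_register; the register numbers 9, 10, and 11 for $t1, $t2, and $t3 follow the usual MIPS convention and are not stated in the slides.

```python
def decode_fields(instr):
    """Slice a 32-bit instruction word into its fixed-position fields."""
    return {
        "opcode":  (instr >> 26) & 0x3F,  # bits 31-26
        "rs":      (instr >> 21) & 0x1F,  # bits 25-21
        "rt":      (instr >> 16) & 0x1F,  # bits 20-16
        "rd":      (instr >> 11) & 0x1F,  # bits 15-11 (R-type only)
        "shamt":   (instr >> 6) & 0x1F,   # bits 10-6  (R-type only)
        "funct":   instr & 0x3F,          # bits 5-0   (R-type only)
        "address": instr & 0xFFFF,        # bits 15-0  (I-type only)
    }


def write_register(fields, reg_dst):
    """The destination MUX: rd when reg_dst is 1, rt when it is 0."""
    return fields["rd"] if reg_dst else fields["rt"]


# add $t1, $t2, $t3 built field by field: opcode 0, rs=10, rt=11, rd=9.
add_word = (0 << 26) | (10 << 21) | (11 << 16) | (9 << 11) | (0 << 6) | 0x20
f = decode_fields(add_word)
print(hex(add_word))         # 0x14b4820
print(write_register(f, 1))  # 9  (rd, as for an R-type)
print(write_register(f, 0))  # 11 (rt, as for a lw)
```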

13
The Main Control Signals

There are 7 control signals in our
microprocessor. Let's see what happens when they
are asserted (set to 1) and deasserted (set to
0):
RegDst
  Deasserted: the Write register is rt.
  Asserted: the Write register is rd.
RegWrite
  Deasserted: none.
  Asserted: the Write register is written with
  the Write data.
ALUSrc
  Deasserted: the 2nd ALU operand comes from the
  register file.
  Asserted: the 2nd ALU operand is the
  sign-extended 16-bit address field.
PCSrc
  Deasserted: PC = PC + 4.
  Asserted: PC = branch target.
MemRead
  Deasserted: none.
  Asserted: memory contents at the address input
  are put on the Read data output.
MemWrite
  Deasserted: none.
  Asserted: memory contents at the address input
  are replaced by the Write data input.
MemtoReg
  Deasserted: the value of the register Write
  data input is from the ALU.
  Asserted: the value of the register Write data
  input is from memory.

14
Main Control Diagram

15
Opcode to Control

The control lines are determined by the opcodes
of the instructions. The exception is the PCSrc
line, which also depends on the result of the beq
comparison (x means don't care).
Line      R-type  lw  sw  beq
RegDst    1       0   x   x
ALUSrc    0       1   1   0
MemtoReg  0       1   x   x
RegWrite  1       1   0   0
MemRead   0       1   0   0
MemWrite  0       0   1   0
Branch    0       0   0   1
ALUOp     10      00  00  01
At this stage the control is a black box, which
receives inputs and gives outputs.
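
As a sketch, the control "black box" is just a table indexed by opcode. The opcode values used as keys (0x00 R-type, 0x23 lw, 0x2B sw, 0x04 beq) are the standard MIPS encodings, which the slides do not list; the names CONTROL and pc_src are hypothetical, and None stands for a don't-care entry.

```python
X = None  # don't care

CONTROL = {
    # opcode: control line settings
    0x00: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
               MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),  # R-type
    0x23: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
               MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),  # lw
    0x2B: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,
               MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),  # sw
    0x04: dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,
               MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),  # beq
}


def pc_src(opcode, alu_zero):
    """PCSrc is asserted only for a beq whose operands compared equal."""
    return CONTROL[opcode]["Branch"] & alu_zero


print(CONTROL[0x23]["MemRead"])  # 1
print(pc_src(0x04, 1))           # 1: beq taken
print(pc_src(0x04, 0))           # 0: beq not taken
print(pc_src(0x00, 1))           # 0: not a branch
```

Here alu_zero models the ALU reporting that its subtraction result was zero, which is how the equality test of beq reaches the PCSrc line.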

16
Operation of the Datapath

Let's see the stages of execution of an R-type
instruction, add $t1,$t2,$t3:
1. An instruction is fetched from memory and the
PC is incremented.
2. Two registers, $t2 and $t3, are read from the
register file.
3. The ALU operates on the data read from the
register file.
4. The result of the ALU is written into the
register $t1.
This doesn't really happen in 4 steps because the
implementation is combinational, but at the end
of the clock cycle the result is written into the
destination register.
Let's look at lw $t1,offset($t2):
1. An instruction is fetched from memory and the
PC is incremented.
2. The register $t2 is read from the register
file.
3. The ALU computes the sum of $t2 and the
sign-extended offset.
4. The sum from the ALU is used as the address
for the data memory.
5. The data from memory is written into register
$t1.

17
Adding the Jump Instruction

The j instruction uses pseudodirect addressing:
the upper 4 bits of PC + 4 are concatenated with
the 26-bit address field of the J-type
instruction, shifted left by 2.
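
The concatenation can be sketched with masks and shifts. The function name jump_target and the example addresses are assumptions for illustration; opcode 2 for j is the standard MIPS encoding.

```python
def jump_target(pc, instr):
    """Pseudodirect address: upper 4 bits of PC + 4, then IR[25-0] << 2."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper4 = pc_plus_4 & 0xF0000000          # bits 31-28 of PC + 4
    low28 = (instr & 0x03FFFFFF) << 2        # 26-bit field becomes 28 bits
    return upper4 | low28


# j with address field 0x0100004, fetched from 0x00400018.
j_word = (2 << 26) | 0x0100004
print(hex(jump_target(0x00400018, j_word)))  # 0x400010
```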

18
Performance of Single-Cycle Machines

Let's assume that the operation time for the
following units is: Memory - 2 nanoseconds (ns),
ALU and adders - 2 ns, Register file - 1 ns. We
will assume that MUXs, control, sign-extension,
PC accesses, and wires have no delays.
Which implementation is faster?
1. Every instruction operates in 1 clock cycle of
fixed length.
2. Every instruction operates in a clock cycle of
varying length.
Let's look at the time needed by each instruction
(in ns):
Inst.   Fetch  Reg. Rd  ALU op  Memory  Reg. Wr  Total
R-type  2      1        2       0       1        6 ns
Load    2      1        2       2       1        8 ns
Store   2      1        2       2                7 ns
Branch  2      1        2                        5 ns
Jump    2                                        2 ns
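
The totals follow from summing the delays of the units each instruction class passes through. A short sketch, where the names DELAY_NS, PATH, and latency_ns are hypothetical:

```python
# Unit delays in nanoseconds, as assumed on this slide.
DELAY_NS = {"mem": 2, "alu": 2, "reg": 1}

# Functional units used by each instruction class, in order.
PATH = {
    "R-type": ["mem", "reg", "alu", "reg"],         # fetch, read, ALU, write
    "load":   ["mem", "reg", "alu", "mem", "reg"],  # plus data memory read
    "store":  ["mem", "reg", "alu", "mem"],         # no register write
    "branch": ["mem", "reg", "alu"],                # compare only
    "jump":   ["mem"],                              # fetch only
}

latency_ns = {name: sum(DELAY_NS[u] for u in units)
              for name, units in PATH.items()}

print(latency_ns)
# {'R-type': 6, 'load': 8, 'store': 7, 'branch': 5, 'jump': 2}
print(max(latency_ns.values()))  # 8: the fixed cycle must fit the load
```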

19
Fixed vs. Variable Cycle Length

Let's assume a program has the following
instruction mix: 24% loads, 12% stores, 44%
R-type, 18% branches, 2% jumps.
CPU execution time = Instruction count * Cycle
time
For the fixed cycle length the cycle time is 8
ns, long enough for the longest instruction
(load). Thus each instruction takes 8 ns to
execute.
For the variable cycle time the average CPU clock
cycle is
8*24% + 7*12% + 6*44% + 5*18% + 2*2% = 6.3 ns
The variable clock implementation is clearly
faster, but it is extremely hard to implement.
So why not use the single-cycle implementation,
which is only 8/6.3 = 1.27 times slower?
When we add instructions such as multiply and
divide, which can take tens of cycles, the fixed
cycle must be long enough for the slowest of
them, and this scheme becomes too slow.
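
The average can be checked with a few lines of Python. This sketch reads the instruction mix as percentages (the five figures sum to 100) and takes the per-class times of 8, 7, 6, 5, and 2 ns from the previous slide; the variable names are hypothetical.

```python
# Instruction mix in percent and execution time in ns per class.
MIX_PERCENT = {"load": 24, "store": 12, "R-type": 44, "branch": 18, "jump": 2}
TIME_NS = {"load": 8, "store": 7, "R-type": 6, "branch": 5, "jump": 2}

assert sum(MIX_PERCENT.values()) == 100

# Weighted average cycle time for the variable-length clock.
avg_ns = sum(MIX_PERCENT[k] * TIME_NS[k] for k in MIX_PERCENT) / 100
fixed_ns = max(TIME_NS.values())   # fixed clock must fit the load

print(avg_ns)                        # 6.34
print(round(fixed_ns / avg_ns, 2))   # 1.26
```

The exact average is 6.34 ns; rounding it to 6.3 ns before dividing gives the ratio 8/6.3, about 1.27.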

20
A Multicycle Implementation

We broke each instruction into several steps. We
can use these steps to build a multicycle
implementation in which each step takes 1 cycle.
The multicycle implementation allows a functional
unit to be used more than once in each
instruction, as long as it is used on different
clock cycles.

We now have only a single memory unit and a
single ALU. In addition we need registers to hold
the output of each stage.

21
New Registers and MUXs

We have now added several new registers (which
are transparent to the programmer) and some new
MUXs:
Instruction Register (IR) - the instruction
fetched
Memory Data Register (MDR) - data read from
memory
A, B - registers read from the register file
ALUOut - result of the ALU operation
The new MUXs added are:
An additional MUX on the 1st ALU input, which
chooses between the A register and the PC.
The MUX on the 2nd ALU input is changed from a
2-way to a 4-way MUX. The additional inputs are
the constant 4 (used to increment the PC) and the
sign-extended and shifted offset field (used in
beq).

22
Multicycle Diagram

There are 3 possible sources for the PC value:
1. The output of the ALU, which is PC + 4.
2. The register ALUOut, which holds the address
of the computed branch target.
3. The lower 26 bits of the IR shifted left by 2,
concatenated with the 4 upper bits of the PC.

23
The Instruction Execution Stages (1,2)

1. Instruction Fetch (IF) - Fetch the instruction
from memory and compute the address of the next
sequential instruction.
IR = Memory[PC]
PC = PC + 4
2. Instruction Decode (ID) and register fetch -
Get the registers from the register file and
compute the potential branch address (even if it
isn't needed in the future).
A = Reg[IR[25-21]]
B = Reg[IR[20-16]]
ALUOut = PC + (sign-extend(IR[15-0]) << 2)

24
The Instruction Execution Stages (3)

3. Execution (EX), memory address computation, or
branch completion - In this stage the operation
is determined by the instruction class.
A. Memory reference
ALUOut = A + sign-extend(IR[15-0])
B. R-type
ALUOut = A op B
C. Branch
if (A == B) PC = ALUOut
D. Jump
PC = PC[31-28] cat (IR[25-0] << 2)

25
The Instruction Execution Stages (4,5)

4. Memory access (Mem) or R-type completion -
During this step the load/store instruction
accesses memory or the AL instruction writes its
result.
A. Memory reference
MDR = Memory[ALUOut] (load)
Memory[ALUOut] = B (store)
B. R-type
Reg[IR[15-11]] = ALUOut
5. Memory read completion step - The load
completes by writing the value from memory into a
register.
Reg[IR[20-16]] = MDR
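
The five steps can be traced with a small Python model that executes one instruction and reports how many steps (cycles) it used. This is a sketch under several assumptions that are not in the slides: the function name execute, the dict-based cpu state, a single memory modeled as a dict keyed by byte address, the standard MIPS opcode and funct encodings, and the omission of slt for brevity.

```python
def sign_extend16(x):
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def execute(cpu):
    """Run one instruction through the steps; return cycles used."""
    mem, reg = cpu["mem"], cpu["reg"]
    # Step 1 (IF): IR = Memory[PC]; PC = PC + 4
    ir = mem[cpu["pc"]]
    cpu["pc"] += 4
    # Step 2 (ID): read A and B, precompute the branch target
    opcode = ir >> 26
    a = reg[(ir >> 21) & 0x1F]
    b = reg[(ir >> 16) & 0x1F]
    alu_out = cpu["pc"] + (sign_extend16(ir) << 2)
    # Step 3 (EX) and later steps depend on the instruction class
    if opcode == 0x04:                       # beq completes in step 3
        if a == b:
            cpu["pc"] = alu_out
        return 3
    if opcode == 0x02:                       # j completes in step 3
        cpu["pc"] = (cpu["pc"] & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
        return 3
    if opcode == 0x00:                       # R-type (slt omitted)
        ops = {0x20: a + b, 0x22: a - b, 0x24: a & b, 0x25: a | b}
        alu_out = ops[ir & 0x3F] & 0xFFFFFFFF
        reg[(ir >> 11) & 0x1F] = alu_out     # step 4: write rd
        return 4
    alu_out = (a + sign_extend16(ir)) & 0xFFFFFFFF   # memory address
    if opcode == 0x2B:                       # sw: step 4 writes memory
        mem[alu_out] = b
        return 4
    if opcode == 0x23:                       # lw: step 4 reads, step 5 writes rt
        mdr = mem[alu_out]
        reg[(ir >> 16) & 0x1F] = mdr
        return 5
    raise ValueError("opcode not modeled in this sketch")


# lw $t1, 8($t2) followed by add $t3, $t1, $t1 ($t1=9, $t2=10, $t3=11).
lw_word = (0x23 << 26) | (10 << 21) | (9 << 16) | 8
add_word = (9 << 21) | (9 << 16) | (11 << 11) | 0x20
cpu = {"pc": 0, "reg": [0] * 32, "mem": {0: lw_word, 4: add_word, 0x108: 42}}
cpu["reg"][10] = 0x100

print(execute(cpu), cpu["reg"][9])    # 5 42
print(execute(cpu), cpu["reg"][11])   # 4 84
```

Instructions and data share one dict here, mirroring the single memory unit of the multicycle design.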

26
Cycles Per Instruction (CPI)

The CPI of a program is the number of cycles an
average instruction takes. Assuming an
instruction mix (for the gcc compiler) of 22%
loads, 11% stores, 49% R-type, 16% branches, and
2% jumps, what is the CPI, assuming each state
requires one clock cycle?
The number of clock cycles for each instruction
class is:
Loads 5, Stores 4, R-type 4, Branches 3, Jumps 3
Thus the CPI is
0.22*5 + (0.11 + 0.49)*4 + (0.16 + 0.02)*3 = 4.04
This is better than the worst-case CPI of 5, in
which every instruction would take the same
number of clock cycles as a load.
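
The CPI calculation can be reproduced as a weighted sum. This sketch reads the mix as percentages (the figures sum to 100); the variable names are hypothetical.

```python
# gcc instruction mix in percent and cycles per instruction class.
MIX_PERCENT = {"load": 22, "store": 11, "R-type": 49, "branch": 16, "jump": 2}
CYCLES = {"load": 5, "store": 4, "R-type": 4, "branch": 3, "jump": 3}

assert sum(MIX_PERCENT.values()) == 100

# Weighted average number of cycles per instruction.
cpi = sum(MIX_PERCENT[k] * CYCLES[k] for k in MIX_PERCENT) / 100
worst_case_cpi = max(CYCLES.values())

print(cpi)             # 4.04
print(worst_case_cpi)  # 5
```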

27
Exceptions

One of the hardest parts of control is
implementing exceptions and interrupts: events
other than branches and jumps that change the
normal flow of instruction execution.
An exception is an unexpected event that happens
during program execution such as an arithmetic
overflow or an illegal instruction (which are the
only 2 in our design).
An interrupt is an event that is external to the
processor, such as requests by I/O devices.
When an exception occurs the machine must save
the address of the offending instruction in the
exception program counter (EPC), and then
transfer execution to the OS. The OS might
service the exception and return control to the
program or terminate execution.

28
Causes of Exceptions

In order for the OS to handle the exception it
must know the cause of the exception. MIPS has a
register called the Cause register, which holds
the reason for the exception.
A second method is called vectored interrupts. In
a vectored interrupt the address to which control
is transferred is determined by the exception
cause. The OS knows the cause of the exception
from the address that is jumped to.
We need two additional registers: the EPC, which
holds the address of the offending instruction,
and the Cause register, which holds 0 for an
undefined instruction and 1 for arithmetic
overflow.
We will need 2 control signals to write to the
EPC and Cause registers (EPCWrite and CauseWrite)
and a signal to set the LSB of the Cause register
(IntCause).
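
A minimal sketch of what the control does when an exception is detected, assuming the Cause register method rather than vectored interrupts. The function name take_exception, the dict-based cpu state, and the handler_addr parameter are hypothetical; the slides do not give the handler address, so the value used below is only a placeholder.

```python
UNDEFINED_INSTRUCTION = 0   # Cause value for an undefined instruction
ARITHMETIC_OVERFLOW = 1     # Cause value for arithmetic overflow


def take_exception(cpu, offending_pc, int_cause, handler_addr):
    """Save the offending address and cause, then enter the OS handler."""
    cpu["epc"] = offending_pc       # EPCWrite: remember where it happened
    cpu["cause"] = int_cause & 1    # CauseWrite: IntCause sets the LSB
    cpu["pc"] = handler_addr        # transfer control to the OS


cpu = {"pc": 0x00400024, "epc": 0, "cause": 0}
# 0x80000000 is a placeholder handler address chosen for this example.
take_exception(cpu, 0x00400020, ARITHMETIC_OVERFLOW, 0x80000000)
print(hex(cpu["epc"]), cpu["cause"], hex(cpu["pc"]))
# 0x400020 1 0x80000000
```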

29
Datapath with Exceptions

The control sets IntCause when it can't decode
the instruction (0) or when the ALU signals an
overflow (1). The next-PC MUX now has 4 inputs:
the exception handler address is added.
