The Processor: Datapath and Control - PowerPoint PPT Presentation

About This Presentation
Title:

The Processor: Datapath and Control

Description:

The Processor: Datapath and Control. We will design a microprocessor that includes a subset of the MIPS instruction set: Memory access: load/store word (lw, sw)

Slides: 30
Provided by: Daniel985


Transcript and Presenter's Notes



1
The Processor: Datapath and Control
  • We will design a microprocessor that includes a
    subset of the MIPS instruction set:
  • Memory access: load/store word (lw, sw).
  • Arithmetic-logical instructions: add, sub, and,
    or, and slt.
  • Branch and jump instructions: branch if equal
    (beq) and jump (j).
  • The subset includes neither all the integer
    instructions nor any floating-point instructions,
    but the principles are the same.
  • For every instruction the first two steps are
    identical:
  • Fetch the instruction from the memory location
    the PC points to.
  • Decode the instruction and read the one or two
    registers it specifies.

2
Abstract View of the DataPath
  • The datapath contains 2 types of logic elements:
  • Combinational elements, which operate on data
    values. Their outputs depend only on their
    current inputs. The ALU is a combinational
    element.
  • State elements, which have internal storage.
    Their state is defined by the values they contain
    (memory and registers).

3
Clocking Methodology
  • A state element has at least two inputs and one
    output. The inputs are the data value to be
    written into the element and the clock signal,
    which determines when the value is written.
    The output is the data value stored in the
    element. Thus a state element can be read at
    any time, but it is written only when the clock
    allows it.
  • A clocking methodology defines when signals can
    be read and written. This is crucial to the
    correct design of a computer.
  • We will assume an edge-triggered clocking
    methodology. Any values stored in the machine are
    updated only on a clock edge.

4
Edge-Triggered Clocking
  • Because only state elements can store values, any
    collection of combinational logic must have its
    inputs coming from a set of state elements and
    its outputs written to a set of state elements.
    The time necessary for the signals to reach
    state element 2 defines the length of the clock
    cycle.
  • An edge-triggered methodology allows us to read
    the contents of a register, send the value
    through some combinational logic, and write that
    register in the same clock cycle. We assume that
    state elements have implicit clock signals.
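Here is a minimal Python sketch of an edge-triggered state element. The class `EdgeTriggeredRegister` and its methods are hypothetical names chosen for illustration. A read always returns the value captured at the last clock edge. A write is only staged until the next call to `clock_edge()`. This is why the register can be read, passed through combinational logic, and written back in the same cycle without a race.

```python
class EdgeTriggeredRegister:
    """Hypothetical model of a state element with an implicit clock."""

    def __init__(self, value: int = 0) -> None:
        self._value = value   # value currently visible on the output
        self._next = None     # value staged for the next clock edge

    def read(self) -> int:
        # The output can be read at any time during the cycle.
        return self._value

    def write(self, value: int) -> None:
        # The input is captured, but the stored value only changes on the edge.
        self._next = value

    def clock_edge(self) -> None:
        # On the edge, the staged value (if any) becomes the stored value.
        if self._next is not None:
            self._value = self._next
            self._next = None


# One clock cycle: read the register, add 1 ("combinational logic"), write back.
r = EdgeTriggeredRegister(5)
r.write(r.read() + 1)
print(r.read())   # 5: the old value is still visible during the cycle
r.clock_edge()
print(r.read())   # 6: the new value appears after the edge
```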

5
Fetching an Instruction
  • A memory unit will hold the instructions that are
    to be executed. The address of the next
    instruction is in the PC. We need an adder (an
    ALU that performs only addition) to calculate the
    address of the next instruction to fetch.
  • Thick arrows symbolize 32-bit buses unless
    specified differently. Thin arrows specify 1-bit
    lines, and colored lines specify control lines.
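A minimal sketch of the fetch step follows. It assumes the instruction memory is modeled as a Python list of 32-bit words, so byte address `pc` selects word `pc // 4`. The function name `fetch` and the example instruction words are illustrative only.

```python
def fetch(instruction_memory: list[int], pc: int) -> tuple[int, int]:
    """Return (instruction, next_pc) for the byte address held in the PC."""
    if pc % 4 != 0:
        raise ValueError("the PC must hold a word-aligned address")
    # Assumption: one list entry per 32-bit word, so byte address pc -> word pc // 4.
    instruction = instruction_memory[pc // 4]
    # The dedicated adder computes the address of the next sequential instruction.
    next_pc = (pc + 4) & 0xFFFFFFFF
    return instruction, next_pc


program = [0x014B4820, 0x8D490008]   # two example 32-bit instruction words
instr, next_pc = fetch(program, 4)
print(hex(instr), next_pc)           # 0x8d490008 8
```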

6
The Register File
  • The R-type instructions (also called the
    arithmetic-logical instructions) read the
    contents of 2 registers, perform an ALU
    operation, and write the result back into a
    third register.
  • The 32 registers are stored in the register file.
    The register file has three 5-bit inputs to
    specify the registers, two 32-bit outputs for the
    data read, one 32-bit input for the data to be
    written, and one control signal that decides
    whether the data is written. In addition, we need
    an ALU to perform the operations.
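The register file's interface can be sketched in Python as two read ports and one write port gated by a control signal. The control signal is called `reg_write` here, after the RegWrite signal introduced later in these slides. The clock edge is left implicit, as on the slide.

One assumption goes beyond the slide: register 0 is hard-wired to zero, as it is in MIPS.

```python
class RegisterFile:
    """32 x 32-bit registers with two read ports and one write port."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, read_reg1: int, read_reg2: int) -> tuple[int, int]:
        # Two 5-bit register numbers select the two 32-bit outputs.
        return self.regs[read_reg1 & 0x1F], self.regs[read_reg2 & 0x1F]

    def write(self, write_reg: int, write_data: int, reg_write: bool) -> None:
        # The third 5-bit number and the 32-bit data are used only when the
        # write control signal is asserted. Assumption: register 0 is
        # hard-wired to 0 (as in MIPS), so writes to it are ignored.
        if reg_write and (write_reg & 0x1F) != 0:
            self.regs[write_reg & 0x1F] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(9, 7, reg_write=True)
rf.write(10, 5, reg_write=False)   # ignored: the write signal is deasserted
print(rf.read(9, 10))              # (7, 0)
```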

7
Data Memory
  • The 2 elements needed to implement load and store
    instructions are the data memory and a unit that
    sign-extends the 16-bit constant in an I-type
    instruction. In addition, we use the existing ALU
    to compute the address to access.
  • The data memory has two 32-bit inputs, the
    address and the write data, and one 32-bit
    output, the read data. In addition, it has 2
    control lines, MemWrite and MemRead.
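The two new elements can be sketched in Python: a sign-extension unit and a data memory with MemRead/MemWrite lines. Modeling the memory as a dictionary that returns 0 for never-written addresses is an assumption of this sketch. The names `sign_extend16` and `DataMemory` are illustrative.

```python
from typing import Optional


def sign_extend16(imm16: int) -> int:
    """Sign-extend a 16-bit field to 32 bits (result kept as an unsigned word)."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:               # bit 15 set: the constant is negative
        return imm16 | 0xFFFF0000    # copy the sign bit into bits 31-16
    return imm16


class DataMemory:
    """Data memory with MemRead and MemWrite control lines."""

    def __init__(self) -> None:
        self.words: dict[int, int] = {}   # assumption: unwritten words read as 0

    def access(self, address: int, write_data: int,
               mem_read: bool, mem_write: bool) -> Optional[int]:
        if mem_write:                     # sw: replace the word at the address
            self.words[address] = write_data & 0xFFFFFFFF
        if mem_read:                      # lw: put the word on the read data output
            return self.words.get(address, 0)
        return None


# lw/sw address = base register + sign-extended offset (computed by the ALU).
address = (0x1000 + sign_extend16(0xFFFC)) & 0xFFFFFFFF   # offset -4
mem = DataMemory()
mem.access(address, 42, mem_read=False, mem_write=True)
print(hex(address), mem.access(address, 0, mem_read=True, mem_write=False))  # 0xffc 42
```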

8
Branch Equal
  • The beq instruction has 3 operands: two registers
    that are compared for equality and a 16-bit
    offset used to compute the branch address
    relative to the PC. To implement this instruction
    we must add the sign-extended offset to the PC.
  • There are 2 important details:
    1. The base for the address calculation is the
    address after the current instruction's address.
    Since we already compute PC + 4 when fetching,
    we already have this address.
    2. The offset is in words, not bytes, so we have
    to shift the offset left by 2.
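The two details combine into a single formula: target = (PC + 4) + (sign-extended offset << 2). The short Python function below is an illustrative sketch of that arithmetic. The name `branch_target` is not from the slides.

```python
def branch_target(pc: int, imm16: int) -> int:
    """beq target: (PC + 4) + (sign-extended 16-bit word offset << 2)."""
    offset = imm16 & 0xFFFF
    if offset & 0x8000:              # negative offset in two's complement
        offset -= 1 << 16
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF


print(hex(branch_target(0x00400000, 3)))       # 0x400010: three words past PC + 4
print(hex(branch_target(0x00400010, 0xFFFF)))  # 0x400010: offset -1 branches to itself
```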

9
Combining ALU and Memory Instructions
  • The ALU datapath (slide 6) and the memory
    datapath (slide 7) are similar. The differences
    are:
  • The second input to the ALU is a register
    (R-type) or the sign-extended offset (I-type).
  • The value stored into the destination register
    comes from the ALU (R-type) or from memory
    (I-type).
  • Using 2 multiplexors (MUXs) we can combine both
    datapaths.
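The two multiplexors can be sketched as plain selection functions. The select signals are named after ALUSrc and MemtoReg, which the slides define a few slides later. The helper names themselves (`mux2`, `alu_second_operand`, `register_write_data`) are hypothetical.

```python
def mux2(select: int, input0: int, input1: int) -> int:
    """A 2-way multiplexor: input0 when select is 0, input1 when it is 1."""
    return input1 if select else input0


def alu_second_operand(alu_src: int, read_data2: int, sign_extended_imm: int) -> int:
    # ALUSrc = 0: R-type uses the second register; ALUSrc = 1: lw/sw use the offset.
    return mux2(alu_src, read_data2, sign_extended_imm)


def register_write_data(mem_to_reg: int, alu_result: int, mem_read_data: int) -> int:
    # MemtoReg = 0: the value comes from the ALU; MemtoReg = 1: it comes from memory.
    return mux2(mem_to_reg, alu_result, mem_read_data)


print(alu_second_operand(0, 7, 100), alu_second_operand(1, 7, 100))  # 7 100
```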

10
The Complete Datapath
  • This simple processor can execute ALU
    instructions, access memory, or compute the next
    instruction's address in a single cycle.

11
ALU Control
  • The ALU has 3 control inputs; we use 5 of the 8
    possible input combinations:
    000 AND
    001 OR
    010 add
    110 subtract
    111 slt
  • The ALU control uses as its inputs the funct
    field of the instruction and a 2-bit control
    field called ALUOp.
  • For lw/sw the ALU computes the address using
    addition (ALUOp = 00); for the R-type
    instructions the ALU performs one of 5 actions
    depending on the funct field of the instruction
    (ALUOp = 10); for beq the ALU performs a
    subtraction (ALUOp = 01).
  • The ALU control is a truth table that, given the
    funct field and ALUOp, outputs the 3-bit control
    for the ALU.
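The ALU control truth table and the five ALU operations can be written as a small Python sketch. Two assumptions go beyond this slide:

- The funct codes are the standard MIPS encodings: add 100000, sub 100010, and 100100, or 100101, slt 101010.
- The ALU also produces a Zero output, which beq needs to decide whether to branch.

```python
# Assumption: standard MIPS funct codes for the R-type instructions in the subset.
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_control(alu_op: int, funct: int) -> int:
    """Map the 2-bit ALUOp and the 6-bit funct field to the 3-bit ALU control."""
    if alu_op == 0b00:                 # lw/sw: add to form the address
        return 0b010
    if alu_op == 0b01:                 # beq: subtract to compare
        return 0b110
    if alu_op == 0b10:                 # R-type: the funct field decides
        return FUNCT_TO_ALU_CONTROL[funct & 0x3F]
    raise ValueError(f"unused ALUOp value {alu_op:#04b}")


def to_signed(x: int) -> int:
    """Interpret a 32-bit word as a two's complement integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control: int, a: int, b: int) -> tuple[int, bool]:
    """Return the 32-bit result and a Zero flag for the five operations."""
    if control == 0b000:
        result = a & b
    elif control == 0b001:
        result = a | b
    elif control == 0b010:
        result = a + b
    elif control == 0b110:
        result = a - b
    elif control == 0b111:
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError(f"unused ALU control value {control:#05b}")
    result &= 0xFFFFFFFF
    return result, result == 0


print(alu(alu_control(0b10, 0b101010), 0xFFFFFFFF, 1))  # (1, False): slt, -1 < 1
print(alu(alu_control(0b01, 0), 7, 7))                  # (0, True): beq operands equal
```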

12
Main Control
  • Look at the formats of the R-type and I-type
    instructions:
    R-type:  Field  opcode  rs     rt     rd     shamt  funct
             Bits   31-26   25-21  20-16  15-11  10-6   5-0
    I-type:  Field  opcode  rs     rt     address
             Bits   31-26   25-21  20-16  15-0
  • The following observations can be made:
  • The opcode is always in bits 31-26.
  • The 2 registers to be read are always the rs
    (25-21) and rt (20-16) fields (R-type, beq, and
    store).
  • The base register for load/store instructions is
    always rs (25-21).
  • The 16-bit offset for beq, lw, and sw is always
    in bits 15-0.
  • The destination register is in one of two places:
    for lw it is rt (20-16); for an R-type
    instruction it is rd (15-11). Thus we need a MUX
    to select which field of the instruction names
    the register to be written.
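The field layout can be checked with a short Python decoder. The example words are standard MIPS encodings, which are not given on the slide:

- 0x014B4820 is add t1, t2, t3.
- 0x8D490008 is lw t1, 8(t2).

Both use register numbers t1 = 9, t2 = 10, t3 = 11. The `reg_dst` argument stands in for the destination MUX select (named RegDst on the next slide).

```python
def decode(instruction: int) -> dict[str, int]:
    """Split a 32-bit MIPS word into its R-type and I-type fields."""
    return {
        "opcode": (instruction >> 26) & 0x3F,  # bits 31-26
        "rs": (instruction >> 21) & 0x1F,      # bits 25-21
        "rt": (instruction >> 16) & 0x1F,      # bits 20-16
        "rd": (instruction >> 11) & 0x1F,      # bits 15-11 (R-type)
        "shamt": (instruction >> 6) & 0x1F,    # bits 10-6 (R-type)
        "funct": instruction & 0x3F,           # bits 5-0 (R-type)
        "imm16": instruction & 0xFFFF,         # bits 15-0 (I-type)
    }


def write_register(fields: dict[str, int], reg_dst: int) -> int:
    # The destination MUX: rt for lw (select 0), rd for R-type (select 1).
    return fields["rd"] if reg_dst else fields["rt"]


add_fields = decode(0x014B4820)   # add t1, t2, t3
lw_fields = decode(0x8D490008)    # lw t1, 8(t2)
print(add_fields["rs"], add_fields["rt"], write_register(add_fields, 1))  # 10 11 9
print(lw_fields["opcode"], lw_fields["imm16"], write_register(lw_fields, 0))  # 35 8 9
```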

13
The Main Control Signals
  • There are 7 control signals in our
    microprocessor. Let's see what happens when they
    are asserted (set to 1) and deasserted (set to
    0):
    Signal    Deasserted (0)                      Asserted (1)
    RegDst    Write register is rt                Write register is rd
    RegWrite  None                                Write register is written
                                                  with the Write data
    ALUSrc    2nd ALU operand comes from          2nd ALU operand is the
              the register file                   sign-extended 16-bit offset
    PCSrc     PC <= PC + 4                        PC <= branch target
    MemRead   None                                Memory contents at the address
                                                  input are put on the Read data
                                                  output
    MemWrite  None                                Memory contents at the address
                                                  input are replaced by the Write
                                                  data input
    MemtoReg  Register Write data comes from      Register Write data comes from
              the ALU                             memory

14
Main Control Diagram
15
Opcode to Control
  • The control lines are determined by the opcodes
    of the instructions. The exception is the PCSrc
    line, which also depends on the outcome of the
    beq comparison (x means don't care).
    Signal    R-type  lw  sw  beq
    RegDst    1       0   x   x
    ALUSrc    0       1   1   0
    MemtoReg  0       1   x   x
    RegWrite  1       1   0   0
    MemRead   0       1   0   0
    MemWrite  0       0   1   0
    Branch    0       0   0   1
    ALUOp     10      00  00  01
  • At this stage the control is a black box, which
    receives inputs and produces outputs.
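The main control table maps directly to a lookup, sketched below in Python. Two assumptions go beyond the slide:

- The numeric opcodes are the standard MIPS values (R-type 0, lw 35, sw 43, beq 4).
- PCSrc is formed as Branch AND the ALU's Zero output, which is one common way to realize "depends on the outcome of beq".

`None` stands for the table's "x" (don't care).

```python
# Assumption: standard MIPS opcodes for the instructions in the subset.
OPCODES = {0b000000: "R-type", 0b100011: "lw", 0b101011: "sw", 0b000100: "beq"}

SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp")

# Rows of the control table above; None means "x" (don't care).
CONTROL = {
    "R-type": (1,    0, 0,    1, 0, 0, 0, 0b10),
    "lw":     (0,    1, 1,    1, 1, 0, 0, 0b00),
    "sw":     (None, 1, None, 0, 0, 1, 0, 0b00),
    "beq":    (None, 0, None, 0, 0, 0, 1, 0b01),
}


def main_control(opcode: int) -> dict:
    """Return the control signals for an opcode (the 'black box')."""
    return dict(zip(SIGNALS, CONTROL[OPCODES[opcode]]))


def pc_src(branch: int, zero: bool) -> int:
    # Assumption: PCSrc = Branch AND Zero, so beq branches only when equal.
    return 1 if branch and zero else 0


lw_signals = main_control(0b100011)
print(lw_signals["MemRead"], lw_signals["MemtoReg"], lw_signals["ALUOp"])  # 1 1 0
```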

16
Operation of the Datapath
  • Let's see the stages of execution of an R-type
    instruction, add t1, t2, t3:
  • 1. An instruction is fetched from memory and the
    PC is incremented.
  • 2. Two registers, t2 and t3, are read from the
    register file.
  • 3. The ALU operates on the data read from the
    register file.
  • 4. The result of the ALU is written into the
    register t1.
  • This doesn't really happen in 4 separate steps,
    because the implementation is combinational, but
    at the end of the clock cycle the result is
    written into the destination register.
  • Let's look at lw t1, offset(t2):
  • 1. An instruction is fetched from memory and the
    PC is incremented.
  • 2. The register t2 is read from the register
    file.
  • 3. The ALU computes the sum of t2 and the
    sign-extended offset.
  • 4. The sum from the ALU is used as the address
    for the data memory.
  • 5. The data from memory is written into register
    t1.
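Both walkthroughs can be traced in one small single-cycle sketch that handles only add and lw. The instruction memory and data memory are modeled as dictionaries keyed by byte address, which is an assumption of this sketch.

The instruction words are the standard MIPS encodings of add t1, t2, t3 and lw t1, 8(t2), using register numbers t1 = 9, t2 = 10, t3 = 11. The function name `single_cycle_step` is hypothetical.

```python
MASK = 0xFFFFFFFF


def sign_extend16(imm16: int) -> int:
    """Interpret the low 16 bits as a signed (two's complement) value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def single_cycle_step(pc: int, regs: list[int],
                      imem: dict[int, int], dmem: dict[int, int]) -> int:
    """Execute one add or lw in a single 'cycle' and return the new PC."""
    instr = imem[pc]                                 # 1. fetch the instruction
    next_pc = (pc + 4) & MASK                        #    and increment the PC
    opcode = instr >> 26
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    rd = (instr >> 11) & 0x1F
    a, b = regs[rs], regs[rt]                        # 2. read the register file
    if opcode == 0 and (instr & 0x3F) == 0x20:       # add rd, rs, rt
        result = (a + b) & MASK                      # 3. ALU adds the registers
        dest = rd                                    # 4. destination is rd
    elif opcode == 0x23:                             # lw rt, offset(rs)
        address = (a + sign_extend16(instr)) & MASK  # 3. ALU forms the address
        result = dmem.get(address, 0)                # 4. data memory is read
        dest = rt                                    # 5. destination is rt
    else:
        raise NotImplementedError("this sketch only handles add and lw")
    # The register write takes effect at the end of the cycle.
    if dest != 0:                                    # register 0 stays 0
        regs[dest] = result
    return next_pc


regs = [0] * 32
regs[10], regs[11] = 0x100, 5              # t2 = 0x100, t3 = 5
imem = {0: 0x014B4820, 4: 0x8D490008}      # add t1, t2, t3 ; lw t1, 8(t2)
dmem = {0x108: 99}                         # the word at t2 + 8
pc = single_cycle_step(0, regs, imem, dmem)
print(pc, hex(regs[9]))                    # 4 0x105
pc = single_cycle_step(pc, regs, imem, dmem)
print(pc, regs[9])                         # 8 99
```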

17
Adding the Jump Instruction
  • The j instruction uses pseudodirect addressing:
    the upper 4 bits of PC + 4 are concatenated with
    the 26-bit address field of the J-type
    instruction, shifted left by 2.
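The pseudodirect calculation is easy to get wrong at a 256 MB boundary, because the upper bits come from PC + 4 rather than from PC. The illustrative function below shows both the normal case and the boundary case. The name `jump_target` is not from the slides.

```python
def jump_target(pc: int, address26: int) -> int:
    """Pseudodirect address: bits 31-28 of PC + 4, then address26 << 2."""
    upper4 = (pc + 4) & 0xF0000000           # top 4 bits of PC + 4
    return upper4 | ((address26 & 0x03FFFFFF) << 2)


print(hex(jump_target(0x00400000, 0x0100004)))  # 0x400010
print(hex(jump_target(0x1FFFFFFC, 0)))          # 0x20000000: PC + 4 is in the next region
```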

18
Performance of Single-Cycle Machines
  • Let's assume that the operation times for the
    following units are: memory, 2 nanoseconds (ns);
    ALU and adders, 2 ns; register file, 1 ns. We
    will assume that MUXs, control, sign extension,
    PC accesses, and wires have no delay.
  • Which implementation is faster?
    1. Every instruction operates in 1 clock cycle of
    fixed length.
    2. Every instruction operates in 1 clock cycle of
    varying length.
  • Let's look at the time needed by each
    instruction (in ns):
    Class    Inst.  Reg.  ALU  Memory  Reg.  Total
             Fetch  Rd    op           Wr
    R-type   2      1     2    0       1     6 ns
    Load     2      1     2    2       1     8 ns
    Store    2      1     2    2             7 ns
    Branch   2      1     2                  5 ns
    Jump     2                               2 ns
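The totals in the table follow from summing the delays of the units each instruction class uses. The Python sketch below recomputes them from the slide's unit delays. As on the slide, MUXs, control, sign extension, PC accesses and wires are treated as free.

```python
# Unit delays from the slide, in ns.
MEMORY, ALU, REGFILE = 2, 2, 1

# Functional units each instruction class passes through, in order.
STEPS = {
    "R-type": [MEMORY, REGFILE, ALU, REGFILE],          # fetch, read, ALU, write
    "load":   [MEMORY, REGFILE, ALU, MEMORY, REGFILE],  # plus a data memory read
    "store":  [MEMORY, REGFILE, ALU, MEMORY],           # no register write
    "branch": [MEMORY, REGFILE, ALU],                   # compare in the ALU
    "jump":   [MEMORY],                                 # fetch only
}

times = {kind: sum(units) for kind, units in STEPS.items()}
print(times)                # {'R-type': 6, 'load': 8, 'store': 7, 'branch': 5, 'jump': 2}
print(max(times.values()))  # 8: the fixed clock cycle must fit the load
```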

19
Fixed vs. Variable Cycle Length
  • Let's assume a program has the following
    instruction mix: 24% loads, 12% stores, 44%
    R-type, 18% branches, 2% jumps.
  • $\text{CPU execution time} = \text{Instruction count} \times \text{Cycle time}$
    (every instruction takes one clock cycle).
  • For the fixed cycle length the cycle time is 8
    ns, long enough for the longest instruction
    (load). Thus each instruction takes 8 ns to
    execute.
  • For the variable cycle time the average CPU clock
    cycle is
    $8 \times 0.24 + 7 \times 0.12 + 6 \times 0.44 + 5 \times 0.18 + 2 \times 0.02 = 6.34 \approx 6.3$ ns.
  • The variable clock implementation is clearly
    faster, but it is extremely hard to implement.
  • So why not use the fixed-length single-cycle
    implementation, which is only about 26% slower
    ($8 / 6.34 \approx 1.26$)?
  • Because the fixed cycle must be as long as the
    slowest instruction, adding instructions such as
    multiply and divide, which can take tens of
    cycles, makes this scheme too slow.
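The average cycle time is a weighted sum of the per-class times, with the instruction-mix fractions as weights. The sketch below repeats that arithmetic in Python using only the slide's numbers. The function name `average_cycle_time` is illustrative.

```python
def average_cycle_time(mix: dict[str, float], times_ns: dict[str, float]) -> float:
    """Weighted average of per-class times; weights are instruction-mix fractions."""
    return sum(fraction * times_ns[kind] for kind, fraction in mix.items())


mix = {"load": 0.24, "store": 0.12, "R-type": 0.44, "branch": 0.18, "jump": 0.02}
times_ns = {"load": 8, "store": 7, "R-type": 6, "branch": 5, "jump": 2}

variable = average_cycle_time(mix, times_ns)
fixed = max(times_ns.values())       # cycle long enough for the slowest class
print(round(variable, 2))            # 6.34
print(round(fixed / variable, 2))    # 1.26: the fixed cycle is about 26% slower
```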

20
A Multicycle Implementation
  • We broke each instruction into several steps; we
    can use these steps to build a multicycle
    implementation. Each step takes 1 cycle. The
    multicycle implementation allows a functional
    unit to be used more than once per instruction,
    as long as it is used on different clock cycles.

We now have only a single memory unit and a
single ALU. In addition we need registers to hold
the output of each stage.
21
New Registers and MUXs
  • We have now added several new registers (which
    are transparent to the programmer) and some new
    MUXs:
  • Instruction Register (IR) - the instruction
    fetched
  • Memory Data Register (MDR) - data read from
    memory
  • A, B - registers read from the register file
  • ALUOut - result of ALU operation
  • The new MUXs added are:
  • An additional MUX on the 1st ALU input, which
    chooses between the A register and the PC.
  • The MUX on the 2nd ALU input is changed from a
    2-way to a 4-way MUX. The additional inputs are
    the constant 4 (used to increment the PC) and the
    sign-extended and shifted offset field (used in
    beq).

22
Multicycle Diagram
  • There are 3 possible sources for the PC value:
    1. The output of the ALU, which is PC + 4.
    2. The register ALUOut, which holds the address
    of the computed branch target.
    3. The lower 26 bits of the IR shifted left by 2,
    concatenated with the 4 upper bits of the PC.

23
The Instruction Execution Stages (1,2)
  • 1. Instruction Fetch (IF): fetch the instruction
    from memory and compute the address of the next
    sequential instruction.
    IR <= Memory[PC]
    PC <= PC + 4
  • 2. Instruction Decode (ID) and register fetch:
    read the registers from the register file and
    compute the potential branch address (even if it
    turns out not to be needed).
    A <= Reg[IR[25-21]]
    B <= Reg[IR[20-16]]
    ALUOut <= PC + (sign-extend(IR[15-0]) << 2)

24
The Instruction Execution Stages (3)
  • 3. Execution (EX), memory address computation, or
    branch completion: in this stage the operation
    is determined by the instruction class.
    A. Memory reference: ALUOut <= A + sign-extend(IR[15-0])
    B. R-type: ALUOut <= A op B
    C. Branch: if (A == B) PC <= ALUOut
    D. Jump: PC <= PC[31-28] concatenated with (IR[25-0] << 2)

25
The Instruction Execution Stages (4,5)
  • 4. Memory access (Mem) or R-type completion:
    during this step the load/store instruction
    accesses memory or the arithmetic-logical
    instruction writes its result.
    A. Memory reference: MDR <= Memory[ALUOut] (load)
       or Memory[ALUOut] <= B (store)
    B. R-type: Reg[IR[15-11]] <= ALUOut
  • 5. Memory read completion: the load completes by
    writing the value from memory into a register.
    Reg[IR[20-16]] <= MDR
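The five steps can be traced in a Python sketch that executes one instruction at a time and reports how many cycles it used. This is not a cycle-accurate model of the control finite-state machine. Its assumptions:

- A single dictionary holds both instructions and data, mirroring the single memory unit.
- Only lw, sw, add, beq and j are handled.
- The instruction words are standard MIPS encodings, with register numbers t1 = 9, t2 = 10, t3 = 11.

```python
MASK = 0xFFFFFFFF


def sign_extend16(imm16: int) -> int:
    """Interpret the low 16 bits as a signed (two's complement) value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def multicycle_execute(pc: int, regs: list[int], mem: dict[int, int]) -> tuple[int, int]:
    """Run one instruction through the steps above; return (new PC, cycles)."""
    # Step 1 (IF): IR <= Memory[PC]; PC <= PC + 4
    ir = mem[pc]
    pc = (pc + 4) & MASK
    # Step 2 (ID): read A and B; compute the branch target speculatively.
    a = regs[(ir >> 21) & 0x1F]
    b = regs[(ir >> 16) & 0x1F]
    alu_out = (pc + (sign_extend16(ir) << 2)) & MASK
    opcode = ir >> 26
    # Step 3 (EX): depends on the instruction class.
    if opcode == 0x04:                                  # beq completes
        if a == b:
            pc = alu_out
        return pc, 3
    if opcode == 0x02:                                  # j completes
        return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2), 3
    if opcode in (0x23, 0x2B):                          # lw / sw address
        alu_out = (a + sign_extend16(ir)) & MASK
        if opcode == 0x2B:                              # Step 4: sw completes
            mem[alu_out] = b
            return pc, 4
        mdr = mem.get(alu_out, 0)                       # Step 4: MDR <= Memory[ALUOut]
        rt = (ir >> 16) & 0x1F
        if rt:                                          # Step 5: Reg[rt] <= MDR
            regs[rt] = mdr
        return pc, 5
    if opcode == 0 and (ir & 0x3F) == 0x20:             # add
        alu_out = (a + b) & MASK
        rd = (ir >> 11) & 0x1F
        if rd:                                          # Step 4: Reg[rd] <= ALUOut
            regs[rd] = alu_out
        return pc, 4
    raise NotImplementedError("instruction not covered by this sketch")


regs = [0] * 32
regs[10], regs[11] = 0x100, 5   # t2 = 0x100, t3 = 5
mem = {
    0: 0x014B4820,    # add t1, t2, t3
    4: 0x8D490008,    # lw  t1, 8(t2)
    8: 0xAD4B0000,    # sw  t3, 0(t2)
    12: 0x114A0001,   # beq t2, t2, 1 (taken: skips the word at 16)
    20: 0x08000000,   # j   0
    0x108: 99,        # data word loaded by the lw
}
pc, total = 0, 0
for _ in range(5):
    pc, cycles = multicycle_execute(pc, regs, mem)
    total += cycles
print(pc, total)      # 0 19
```

The per-instruction counts this sketch returns (add 4, lw 5, sw 4, beq 3, j 3) are the same counts used for the CPI calculation on slide 26.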

26
Cycles Per Instruction (CPI)
  • The CPI of a program is the number of clock
    cycles an average instruction takes. Assuming an
    instruction mix (for the gcc compiler) of 22%
    loads, 11% stores, 49% R-type, 16% branches, and
    2% jumps, what is the CPI if each state requires
    one clock cycle?
  • The number of clock cycles for each instruction
    class is: loads 5, stores 4, R-type 4,
    branches 3, jumps 3.
  • Thus
    $\text{CPI} = 0.22 \times 5 + (0.11 + 0.49) \times 4 + (0.16 + 0.02) \times 3 = 4.04$.
  • This is better than the worst-case CPI of 5, in
    which every instruction would take as many clock
    cycles as a load.
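The CPI is the same kind of weighted average as the single-cycle timing calculation, with cycle counts in place of nanoseconds. The snippet below reproduces the slide's numbers.

```python
# Clock cycles per instruction class in the multicycle design (from the slide).
cycles = {"load": 5, "store": 4, "R-type": 4, "branch": 3, "jump": 3}
# gcc instruction mix from the slide, as fractions of all instructions.
gcc_mix = {"load": 0.22, "store": 0.11, "R-type": 0.49, "branch": 0.16, "jump": 0.02}

cpi = sum(gcc_mix[kind] * cycles[kind] for kind in cycles)
worst_case = max(cycles.values())   # every instruction as slow as a load

print(round(cpi, 2), worst_case)    # 4.04 5
```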

27
Exceptions
  • One of the hardest parts of control is
    implementing exceptions and interrupts: events
    other than branches and jumps that change the
    normal flow of instruction execution.
  • An exception is an unexpected event that happens
    during program execution such as an arithmetic
    overflow or an illegal instruction (which are the
    only 2 in our design).
  • An interrupt is an event that is external to the
    processor, such as requests by I/O devices.
  • When an exception occurs the machine must save
    the address of the offending instruction in the
    exception program counter (EPC), and then
    transfer execution to the OS. The OS might
    service the exception and return control to the
    program or terminate execution.

28
Causes of Exceptions
  • In order for the OS to handle the exception, it
    must know the cause of the exception. MIPS has a
    register called the Cause register, which holds
    the reason for the exception.
  • A second method is called vectored interrupts. In
    a vectored interrupt the address to which control
    is transferred is determined by the exception
    cause. The OS knows the cause of the exception by
    the address that is jumped to.
  • We need two additional registers: the EPC, which
    holds the address of the offending instruction,
    and the Cause register, which holds 0 for an
    undefined instruction and 1 for arithmetic
    overflow.
  • We will need 2 control signals to write to the
    EPC and cause registers (EPCWrite and CauseWrite)
    and a signal to set the LSB of the Cause register
    (IntCause).
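The steps for entering an exception can be sketched in Python together with a signed-overflow check for add. The overflow rule is: the operands share a sign bit that the 32-bit sum lacks.

Several names and values are assumptions of this sketch:

- `HANDLER_ADDRESS` is a hypothetical handler address, since the slides do not give one.
- The dataclass `ExceptionState` and the functions `add_overflows` and `take_exception` are illustrative names.
- The example PC values are made up.

```python
from dataclasses import dataclass

MASK = 0xFFFFFFFF
CAUSE_UNDEFINED_INSTRUCTION = 0   # Cause register values used in these slides
CAUSE_ARITHMETIC_OVERFLOW = 1
HANDLER_ADDRESS = 0x80000180      # hypothetical OS handler address


@dataclass
class ExceptionState:
    pc: int = 0
    epc: int = 0
    cause: int = 0


def add_overflows(a: int, b: int) -> bool:
    """Signed 32-bit overflow: the operands share a sign that the sum lacks."""
    total = (a + b) & MASK
    return bool(~(a ^ b) & (a ^ total) & 0x80000000)


def take_exception(state: ExceptionState, offending_pc: int, int_cause: int) -> None:
    state.epc = offending_pc & MASK   # EPCWrite: save the offending address
    state.cause = int_cause & 1       # CauseWrite: IntCause sets the LSB
    state.pc = HANDLER_ADDRESS        # the PC MUX selects the handler input


cpu = ExceptionState(pc=0x00400008)   # PC already incremented past the add
if add_overflows(0x7FFFFFFF, 1):      # largest positive value + 1 overflows
    take_exception(cpu, 0x00400004, CAUSE_ARITHMETIC_OVERFLOW)
print(hex(cpu.epc), cpu.cause)        # 0x400004 1
```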

29
Datapath with Exceptions
  • IntCause is set by the control unit when it
    cannot decode the instruction or when the ALU
    signals an overflow. The next-PC MUX now has 4
    inputs; the new one is the exception handler
    address.