CSC 4250 Computer Architectures - PowerPoint PPT Presentation

Provided by: stude6
Learn more at: http://www.cs.rpi.edu

1
CSC 4250 Computer Architectures
  • September 29, 2006
  • Appendix A. Pipelining

2
Static Pipeline Scheduling
  • Simple pipeline fetches an instruction, decodes
    it, and checks for hazards (structural and data)
  • If no hazard, then issue instruction
  • If there is a hazard, then stall the pipeline - no new
    instructions will be fetched or issued
  • Compiler may schedule instructions to avoid the
    hazard - static scheduling

3
Dynamic Pipeline Scheduling
  • Hardware rearranges instruction execution to
    reduce stalls
  • Scoreboarding technique of the CDC 6600
  • Tomasulo's algorithm (Chapter 3)
  • We do in-order instruction issue - if an
    instruction is stalled in the pipeline, then no
    later instructions can proceed
  • What if later instructions are independent?
  • Example:
      DIV.D F0,F2,F4
      ADD.D F10,F0,F8
      MUL.D F6,F6,F14
  • We want to issue and execute MUL instruction
    while ADD instruction waits for the result of DIV

4
Scoreboarding
  • In a dynamically scheduled pipeline, all
    instructions pass through the issue stage in
    order (in-order issue); however, they can be
    stalled or they can bypass each other in the
    second stage (read operands) and enter execution
    out of order
  • Scoreboarding is a technique for allowing
    instructions to execute out of order when there
    are sufficient resources and no data dependences;
    it is named after the CDC 6600 scoreboard, which
    developed this capability

5
First Supercomputer
  • CDC: Control Data Corporation
  • In 1964 CDC delivered the first CDC 6600
  • The machine was unique in many ways
  • It introduced scoreboarding
  • It was the first processor to make extensive use
    of multiple functional units. It had 16 separate
    FUs, including 4 FP units, 5 units for memory
    references and 7 units for integer operations
  • It had peripheral processors that used
    multithreading
  • The interaction between pipelining and instruction
    set design was understood, and a simple, load-store
    instruction set was used to promote pipelining

6
Structural and Data Hazards
  • Before: no instruction issue if there is either
    a structural or a data hazard
  • Data hazards include WAW, RAW and WAR
  • Now: issue the instruction if there is no structural
    hazard and no WAW data hazard
  • Example:
      DIV.D F0,F2,F4
      ADD.D F10,F0,F8
      MUL.D F6,F6,F14
  • So, all three instructions will be issued
  • Read operands when no RAW hazards

7
Record Keeping
  • Every instruction goes through the scoreboard,
    where a record of the data dependences is
    constructed; this step corresponds to instruction
    issue and replaces part of the ID step in the
    MIPS pipeline
  • The scoreboard determines when the instruction
    can read its operands and begin operation (RAW
    hazards)
  • If the scoreboard decides that the instruction
    cannot execute immediately, it monitors every
    change in the hardware and decides when the
    instruction can execute
  • The scoreboard controls when an instruction can
    write its result into the destination register
    (WAR hazards)

8
Split ID Stage into Two Stages
  • Issue - Decode instructions; check for structural
    and WAW hazards
  • Read operands - Wait until no RAW hazards; then
    read operands
  • No issue:
      DIV.D F0,F2,F4
      ADD.D F10,F0,F8
      SUB.D F6,F6,F14   (why no issue?)
  • No issue:
      DIV.D F0,F2,F4
      ADD.D F10,F0,F8
      MUL.D F0,F6,F14   (why no issue?)
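
One way to work through the two "why no issue?" questions is to code the issue test directly. The Python sketch below is illustrative only. It assumes the functional units listed on the later table slides (one FP adder shared by ADD.D and SUB.D, two multipliers, one divider); the names `FU_CLASS`, `FU_COUNT` and `issue_stall_reason` are hypothetical.

```python
# Assumed mapping of opcodes to functional-unit classes and unit counts.
FU_CLASS = {"ADD.D": "Add", "SUB.D": "Add", "MUL.D": "Mult", "DIV.D": "Divide"}
FU_COUNT = {"Add": 1, "Mult": 2, "Divide": 1}

def issue_stall_reason(active, op, dest):
    """active: (op, dest) pairs for instructions issued but not yet written back.

    Returns "structural", "WAW", or None if the instruction may issue.
    """
    unit = FU_CLASS[op]
    in_use = sum(1 for a_op, _ in active if FU_CLASS[a_op] == unit)
    if in_use >= FU_COUNT[unit]:
        return "structural"          # no free functional unit of this class
    if any(a_dest == dest for _, a_dest in active):
        return "WAW"                 # an active instruction writes the same register
    return None

active = [("DIV.D", "F0"), ("ADD.D", "F10")]
print(issue_stall_reason(active, "SUB.D", "F6"))   # structural
print(issue_stall_reason(active, "MUL.D", "F0"))   # WAW
print(issue_stall_reason(active, "MUL.D", "F6"))   # None
```

Under these assumptions, SUB.D stalls because the single adder is busy with ADD.D, and MUL.D F0 stalls because DIV.D is still going to write F0.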

9
MIPS Processor with Scoreboard
10
Four Steps in Execution
  • Issue - if no structural or WAW hazards
  • Read operands - if no RAW hazards
  • Execute - if both operands are received
  • Write result - if no WAR hazards
  • We concentrate on FP operations and do not
    consider a step for memory access

11
Step One. Issue
  • If a functional unit (FU) for the instruction is
    free and no other active instruction has the same
    destination register, the scoreboard issues the
    instruction to the FU and updates its internal
    data structure
  • By ensuring that no other active FU wants to
    write its result into the destination register,
    we guarantee that WAW hazards cannot be present
  • If a structural or WAW hazard exists, then the
    instruction issue stalls, and no further
    instructions will issue until these hazards are
    cleared

12
Step Two. Read Operands
  • The scoreboard monitors the availability of the
    source operands. A source operand is available if
    no earlier issued active instruction is going to
    write it.
  • When the source operands are available, the
    scoreboard tells the FU to proceed to read the
    operands from the registers and begin execution.
  • The scoreboard resolves RAW hazards dynamically
    in this step, and instructions may be sent into
    execution out of order.
  • The operands for an instruction are read only
    when both operands are available in the register
    file. The scoreboard does not take advantage of
    forwarding.
  • Issue and Read Operands together replace the ID
    stage of the simple MIPS pipeline.
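
The monitoring described in this step can be sketched in a few lines of Python. This is an illustrative model, not the CDC 6600 logic: each waiting instruction records which unit will produce each source (Qj, Qk) and a ready flag per source (Rj, Rk), following the field names used on the later scoreboard slides. The helper names `ready_to_read` and `broadcast_write` are hypothetical.

```python
def ready_to_read(entry):
    """Operands are read only when both sources are available in the register file."""
    return entry["Rj"] and entry["Rk"]

def broadcast_write(entries, fu):
    """When unit `fu` writes its result, sources that waited on it become available."""
    for e in entries.values():
        if e["Qj"] == fu:
            e["Qj"], e["Rj"] = None, True
        if e["Qk"] == fu:
            e["Qk"], e["Rk"] = None, True

# State after issuing DIV.D F0,F2,F4; ADD.D F10,F0,F8; MUL.D F6,F6,F14.
entries = {
    "Add":   {"op": "ADD.D", "Qj": "Divide", "Qk": None, "Rj": False, "Rk": True},
    "Mult1": {"op": "MUL.D", "Qj": None,     "Qk": None, "Rj": True,  "Rk": True},
}

print(ready_to_read(entries["Mult1"]))   # True
print(ready_to_read(entries["Add"]))     # False
broadcast_write(entries, "Divide")       # DIV.D writes F0
print(ready_to_read(entries["Add"]))     # True
```

MUL.D proceeds ahead of ADD.D, which is how instructions end up entering execution out of order.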

13
Step Three. Execution
  • The FU begins execution upon receiving operands
  • When the result is ready, the FU notifies the
    scoreboard that it has completed execution
  • This step replaces the EX stage in the MIPS
    pipeline and takes multiple cycles in the MIPS FP
    pipeline

14
Step Four. Write Result
  • Once it is aware that the FU has completed
    execution, the scoreboard checks for WAR hazards
    and stalls the completing instruction, if
    necessary
  • In general, a completing instruction cannot be
    allowed to write its results when:
      • There is an instruction that has not read its
        operands that precedes (i.e., in order of issue)
        the completing instruction, and
      • One of the operands is the same register as the
        result of the completing instruction
  • If no WAR hazard exists, or when it clears,
    the scoreboard tells the FU to store its result
    to the destination register
  • This step replaces the WB step in the simple MIPS
    pipeline

15
Example (p. A-72)
  • L.D F6,34(R2)
  • L.D F2,45(R3)
  • MUL.D F0,F2,F4
  • SUB.D F8,F6,F2
  • DIV.D F10,F0,F6
  • ADD.D F6,F8,F2

16
Scoreboard
  • Three parts:
  • Instruction status -
    indicates which of the four steps the instruction is in
  • Functional unit status -
    Busy, Op, Fi, Fj, Fk, Qj, Qk, Rj, Rk
  • Register result status -
    indicates which functional unit will write
    each register, if an instruction is active
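
The three parts map naturally onto three small data structures. The Python sketch below is one possible representation under stated assumptions: the class names `FUStatus` and `Scoreboard` and the `issue` method are hypothetical, and `issue` only does the bookkeeping (it does not check for structural or WAW hazards).

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class FUStatus:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None    # destination register
    fj: Optional[str] = None    # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None    # units that will produce fj / fk
    qk: Optional[str] = None
    rj: bool = False            # source ready and not yet read
    rk: bool = False

class Scoreboard:
    def __init__(self, fu_names):
        self.instr_status: Dict[str, str] = {}     # instruction -> current step
        self.fu_status = {name: FUStatus() for name in fu_names}
        self.result_status: Dict[str, str] = {}    # register -> unit that will write it

    def issue(self, text, fu, op, fi, fj, fk):
        # Look up pending producers before recording this instruction's own result.
        qj = self.result_status.get(fj)
        qk = self.result_status.get(fk)
        self.fu_status[fu] = FUStatus(busy=True, op=op, fi=fi, fj=fj, fk=fk,
                                      qj=qj, qk=qk, rj=qj is None, rk=qk is None)
        self.result_status[fi] = fu
        self.instr_status[text] = "issue"

sb = Scoreboard(["Integer", "Mult1", "Mult2", "Add", "Divide"])
sb.issue("L.D F2,45(R3)", "Integer", "Load", "F2", "R3", None)
sb.issue("MUL.D F0,F2,F4", "Mult1", "Mult", "F0", "F2", "F4")
print(sb.fu_status["Mult1"])
print(sb.result_status)
```

After these two calls, the Mult1 entry has Qj set to Integer and Rj false, because F2 is still to be written by the load.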

17
Example
  • Code
  • L.D F6,34(R2)
  • L.D F2, 45(R3)
  • MUL.D F0,F2,F4
  • SUB.D F8,F6,F2
  • DIV.D F10,F0,F6
  • ADD.D F6,F8,F2

18
Scoreboard Tables 1 (Fill in blanks)

Instruction status:
Instruction       Issue  Read operands  Exec. complete  Write result
L.D F6,34(R2)     v      v
L.D F2,45(R3)
MUL.D F0,F2,F4
SUB.D F8,F6,F2
DIV.D F10,F0,F6
ADD.D F6,F8,F2

Functional unit status:
Name     Busy  Op  Fi  Fj  Fk  Qj  Qk  Rj  Rk
Integer
Mult1
Mult2
Add
Divide

Register result status:
     F0  F2  F4  F6  F8  F10  F12  ...  F30
FU
19
Scoreboard Tables 2 (Fill in blanks)

Instruction status:
Instruction       Issue  Read operands  Exec. complete  Write result
L.D F6,34(R2)     v      v              v               v
L.D F2,45(R3)     v      v              v
MUL.D F0,F2,F4    v
SUB.D F8,F6,F2    v
DIV.D F10,F0,F6   v
ADD.D F6,F8,F2

Functional unit status:
Name     Busy  Op  Fi  Fj  Fk  Qj  Qk  Rj  Rk
Integer
Mult1
Mult2
Add
Divide

Register result status:
     F0  F2  F4  F6  F8  F10  F12  ...  F30
FU
20
Scoreboard Tables 3 (Fill in blanks)

Instruction status:
Instruction       Issue  Read operands  Exec. complete  Write result
L.D F6,34(R2)     v      v              v               v
L.D F2,45(R3)     v      v              v               v
MUL.D F0,F2,F4    v      v              v
SUB.D F8,F6,F2    v      v              v               v
DIV.D F10,F0,F6   v
ADD.D F6,F8,F2    v      v              v

Functional unit status:
Name     Busy  Op  Fi  Fj  Fk  Qj  Qk  Rj  Rk
Integer
Mult1
Mult2
Add
Divide

Register result status:
     F0  F2  F4  F6  F8  F10  F12  ...  F30
FU
21
Scoreboard Tables 4 (Fill in blanks)

Instruction status:
Instruction       Issue  Read operands  Exec. complete  Write result
L.D F6,34(R2)     v      v              v               v
L.D F2,45(R3)     v      v              v               v
MUL.D F0,F2,F4    v      v              v               v
SUB.D F8,F6,F2    v      v              v               v
DIV.D F10,F0,F6   v      v              v
ADD.D F6,F8,F2    v      v              v               v

Functional unit status:
Name     Busy  Op  Fi  Fj  Fk  Qj  Qk  Rj  Rk
Integer
Mult1
Mult2
Add
Divide

Register result status:
     F0  F2  F4  F6  F8  F10  F12  ...  F30
FU
22
Required Checks
Instruction status   Wait until
Issue                Not Busy[FU] and not Result[D]
Read operands        Rj and Rk
Execution complete   Functional unit done
Write result         For every f: (Fj[f] ≠ Fi[FU] or Rj[f] = No) and (Fk[f] ≠ Fi[FU] or Rk[f] = No)
23
WAR Hazard
  • A WAR hazard exists
  • if another instruction f has this instruction's
    destination (Fi[FU]) as a source (Fj[f] or Fk[f]), and
  • if that instruction's source is ready but has not
    yet been read (Rj[f] = Yes or Rk[f] = Yes)
  • The test at write result prevents the write while
    the WAR hazard exists
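
The write-result test translates almost word for word into code. The Python sketch below is illustrative; `can_write_result` is a hypothetical helper name, and the state shown assumes the point in the running example where ADD.D F6,F8,F2 has finished executing while DIV.D F10,F0,F6 is still waiting for F0 and has not yet read F6.

```python
def can_write_result(fu_status, fu):
    """True when no unit f still has to read the old value of Fi[fu]."""
    fi = fu_status[fu]["Fi"]
    return all(
        (f["Fj"] != fi or not f["Rj"]) and (f["Fk"] != fi or not f["Rk"])
        for f in fu_status.values()
    )

fu_status = {
    "Add":    {"Fi": "F6",  "Fj": "F8", "Fk": "F2", "Rj": False, "Rk": False},
    "Divide": {"Fi": "F10", "Fj": "F0", "Fk": "F6", "Rj": False, "Rk": True},
}

print(can_write_result(fu_status, "Add"))   # False: DIV.D has not read F6 yet
fu_status["Divide"]["Rk"] = False           # DIV.D has now read its operands
print(can_write_result(fu_status, "Add"))   # True
```

The flags Rj and Rk are cleared once an instruction reads its operands, which is why the check stops blocking the write at that point.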

24
Costs and Benefits of Scoreboarding
  • Reported performance improvement of a factor of 1.7
    for FORTRAN programs and 2.5 for hand-coded assembly
    language.
  • Scoreboard had about as much logic as a FU -
    surprisingly low.
  • Main cost was large number of buses - about four
    times as many as would be required if CPU only
    executed instructions in order.

25
Factors Limiting Scoreboarding
  1. Amount of Parallelism available among the
    instructions - This determines whether
    independent instructions can be found to execute.
    If each instruction depends on its predecessor,
    no dynamic scheduling scheme can reduce stalls.
  2. Number of Scoreboard Entries - This determines
    how far ahead the pipeline can look for
    independent instructions. The set of instructions
    examined as candidates for potential execution is
    called the window. The size of the scoreboard
    determines the size of the window.
  3. Number and Types of FUs - This determines the
    importance of structural hazards.
  4. Presence of Antidependences and Output
    Dependences - These lead to WAR and WAW stalls.

26
A.9. Fallacies and Pitfalls
  • Unexpected execution sequences may cause unexpected
    hazards. It might seem that WAW hazards should
    never occur in a code sequence because no
    compiler would ever generate two writes to the
    same register without an intervening read. But
    they can occur when the sequence is unexpected.
    For example, the first write might be in the
    delay slot of a taken branch. Here is an example:
          BNEZ  R1,foo
          DIV.D F0,F2,F4   ; moved into delay slot
                           ; from fall-through
          ...
    foo:  L.D   F0,qrs
  • If the branch is taken, then before DIV.D can
    complete, the L.D will reach WB, causing a WAW
    hazard.
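
The timing behind this pitfall can be checked with a little arithmetic. The Python sketch below is a rough model under loudly stated assumptions: a five-stage IF/ID/EX/MEM/WB pipeline, one instruction fetched per cycle, no WAW interlock, a 1-cycle EX for L.D, and an assumed 25-cycle EX for DIV.D. The function names are hypothetical and the cycle counts are illustrative, not measured.

```python
def writeback_cycle(fetch_cycle, ex_cycles):
    """IF at fetch_cycle, ID next, then ex_cycles of EX, then MEM, then WB."""
    return fetch_cycle + ex_cycles + 3

def waw_violations(trace):
    """trace: (text, dest, fetch_cycle, ex_cycles) in executed program order.

    Reports pairs where a later write to a register lands before an earlier one.
    """
    last = {}          # dest -> (text, write-back cycle) of the previous writer
    violations = []
    for text, dest, fetch, ex in trace:
        wb = writeback_cycle(fetch, ex)
        if dest in last and last[dest][1] >= wb:
            violations.append((last[dest][0], text))
        last[dest] = (text, wb)
    return violations

# Branch taken: BNEZ fetched in cycle 1, DIV.D (delay slot) in 2, L.D at foo in 3.
trace = [
    ("DIV.D F0,F2,F4", "F0", 2, 25),   # assumed 25 EX cycles
    ("L.D F0,qrs",     "F0", 3, 1),
]
print(writeback_cycle(3, 1))    # 7
print(writeback_cycle(2, 25))   # 30
print(waw_violations(trace))    # [('DIV.D F0,F2,F4', 'L.D F0,qrs')]
```

With these numbers L.D writes F0 in cycle 7 and DIV.D overwrites it in cycle 30, so F0 ends up holding the divide result instead of the loaded value unless the hardware detects the hazard.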

27
How Extensive Pipelining Affects Performance
  • Extensive pipelining can impact other aspects of
    a design, leading to overall worse
    cost-performance
  • The best example of this phenomenon comes from
    two implementations of the VAX, the 8600 and the
    8700
  • When the 8600 was initially delivered, it had a
    cycle time of 80 ns. Subsequently, a redesigned
    version called the 8650 with a 55 ns clock was
    introduced.
  • The 8700 had a much simpler pipeline that
    operated at the microinstruction level, yielding
    a smaller CPU with a faster clock cycle of 45 ns
  • The overall outcome is that the 8650 had a CPI
    advantage of about 20%, but the 8700 had a clock
    rate that was about 20% faster. Thus, the 8700
    achieved the same performance with much less
    hardware
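
The claim of equal performance follows from CPU time = instruction count x CPI x cycle time. A quick check in Python, using the cycle times from the slide and reading "about 20%" as an assumed CPI ratio of 1.2 (the exact CPI values are not given in the slides):

```python
cycle_8650_ns = 55.0
cycle_8700_ns = 45.0

# Clock rate of the 8700 relative to the 8650.
clock_rate_ratio = cycle_8650_ns / cycle_8700_ns

# Assumption: the 8700 needs about 20% more cycles per instruction.
cpi_ratio = 1.20

# Execution time of the 8700 relative to the 8650 for the same instruction count.
time_ratio = cpi_ratio * cycle_8700_ns / cycle_8650_ns

print(round(clock_rate_ratio, 2))   # 1.22
print(round(time_ratio, 2))         # 0.98
```

The faster clock almost exactly cancels the higher CPI, so the two machines come out at roughly the same execution time.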