Lecture 6: Advanced Pipelines - PowerPoint PPT Presentation

About This Presentation
Title:

Lecture 6: Advanced Pipelines

Description:

Lecture 6: Advanced Pipelines Multi-cycle in-order pipelines and out-of-order pipelines (Appendix A, Sections 3.5-3.6) – PowerPoint PPT presentation

Number of Views:108
Avg rating:3.0/5.0
Slides: 21
Provided by: RajeevB65
Category:

less

Transcript and Presenter's Notes

Title: Lecture 6: Advanced Pipelines


1
Lecture 6 Advanced Pipelines
  • Multi-cycle in-order pipelines and out-of-order
  • pipelines (Appendix A, Sections 3.5-3.6)

2
Control Hazards
  • Simple techniques to handle control hazard
    stalls
  • for every branch, introduce a stall cycle (note
    every
  • 6th instruction is a branch!)
  • assume the branch is not taken and start
    fetching the
  • next instruction if the branch is taken,
    need hardware
  • to cancel the effect of the wrong-path
    instruction
  • fetch the next instruction (branch delay slot)
    and
  • execute it anyway if the instruction turns
    out to be
  • on the correct path, useful work was done
    if the
  • instruction turns out to be on the wrong
    path,
  • hopefully program state is not lost

3
Branch Delay Slots
4
Slowdowns from Stalls
  • Perfect pipelining with no hazards ? an
    instruction
  • completes every cycle (total cycles num
    instructions)
  • ? speedup increase in clock speed num
    pipeline stages
  • With hazards and stalls, some cycles ( stall
    time) go by
  • during which no instruction completes, and then
    the stalled
  • instruction completes
  • Total cycles number of instructions stall
    cycles
  • Slowdown because of stalls 1/ (1 stall
    cycles per instr)

5
Pipeline Implementation
  • Signals for the muxes have to be generated
    some of this can happen during ID
  • Need look-up tables to identify situations that
    merit bypassing/stalling the
  • number of inputs to the muxes goes up

6
Detecting Control Signals
Situation Example code Action
No dependence LD R1, 45(R2) DADD R5, R6, R7 DSUB R8, R6, R7 OR R9, R6, R7 No hazards
Dependence requiring stall LD R1, 45(R2) DADD R5, R1, R7 DSUB R8, R6, R7 OR R9, R6, R7 Detect use of R1 during ID of DADD and stall
Dependence overcome by forwarding LD R1, 45(R2) DADD R5, R6, R7 DSUB R8, R1, R7 OR R9, R6, R7 Detect use of R1 during ID of DSUB and set mux control signal that accepts result from bypass path
Dependence with accesses in order LD R1, 45(R2) DADD R5, R6, R7 DSUB R8, R6, R7 OR R9, R1, R7 No action required
7
Multicycle Instructions
Functional unit Latency Initiation interval
Integer ALU 1 1
Data memory 2 1
FP add 4 1
FP multiply 7 1
FP divide 25 25
8
Effects of Multicycle Instructions
  • Structural hazards if the unit is not fully
    pipelined (divider)
  • Frequent RAW hazard stalls
  • Potentially multiple writes to the register file
    in a cycle
  • WAW hazards because of out-of-order instr
    completion
  • Imprecise exceptions because of o-o-o instr
    completion

9
Precise Exceptions
  • On an exception
  • must save PC of instruction where program must
    resume
  • all instructions after that PC that might be in
    the pipeline
  • must be converted to NOPs (other instructions
    continue
  • to execute and may raise exceptions of their
    own)
  • temporary program state not in memory (in other
    words,
  • registers) has to be stored in memory
  • potential problems if a later instruction has
    already
  • modified memory or registers
  • A processor that fulfils all the above
    conditions is said to
  • provide precise exceptions (useful for
    debugging and of
  • course, correctness)

10
Dealing with these Effects
  • Multiple writes to the register file increase
    the number of
  • ports, stall one of the writers during ID,
    stall one of the
  • writers during WB (the stall will propagate)
  • WAW hazards detect the hazard during ID and
    stall the
  • later instruction
  • Imprecise exceptions buffer the results if they
    complete
  • early or save more pipeline state so that you
    can return to
  • exactly the same state that you left at

11
ILP
  • Instruction-level parallelism overlap among
    instructions
  • pipelining or multiple instruction execution
  • What determines the degree of ILP?
  • dependences property of the program
  • hazards property of the pipeline

12
Types of Dependences
  • Data dependences an instr produces a result for
    another
  • (true dependence, results in RAW hazards in a
    pipeline)
  • Name dependences two instrs that use the same
    names
  • (anti and output dependences, result in WAR and
    WAW
  • hazards in a pipeline)
  • Control dependences an instructions execution
    depends
  • on the result of a branch re-ordering should
    preserve
  • exception behavior and dataflow

13
An Out-of-Order Processor Implementation
Reorder Buffer (ROB)
Branch prediction and instr fetch
Instr 1 Instr 2 Instr 3 Instr 4 Instr 5 Instr 6
T1 T2 T3 T4 T5 T6
Register File R1-R32
R1 ? R1R2 R2 ? R1R3 BEQZ R2 R3 ? R1R2 R1 ?
R3R2
Decode Rename
T1 ? R1R2 T2 ? T1R3 BEQZ T2 T4 ? T1T2 T5 ?
T4T2
ALU
ALU
ALU
Instr Fetch Queue
Results written to ROB and tags broadcast to IQ
Issue Queue (IQ)
14
Design Details - I
  • Instructions enter the pipeline in order
  • No need for branch delay slots if prediction
    happens in time
  • Instructions leave the pipeline in order all
    instructions
  • that enter also get placed in the ROB the
    process of an
  • instruction leaving the ROB (in order) is
    called commit
  • an instruction commits only if it and all
    instructions before
  • it have completed successfully (without an
    exception)
  • To preserve precise exceptions, a result is
    written into the
  • register file only when the instruction commits
    until then,
  • the result is saved in a temporary register in
    the ROB

15
Design Details - II
  • Instructions get renamed and placed in the issue
    queue
  • some operands are available (T1-T6 R1-R32),
    while
  • others are being produced by instructions in
    flight (T1-T6)
  • As instructions finish, they write results into
    the ROB (T1-T6)
  • and broadcast the operand tag (T1-T6) to the
    issue queue
  • instructions now know if their operands are
    ready
  • When a ready instruction issues, it reads its
    operands from
  • T1-T6 and R1-R32 and executes (out-of-order
    execution)
  • Can you have WAW or WAR hazards? By using more
  • names (T1-T6), name dependences can be avoided

16
Design Details - III
  • If instr-3 raises an exception, wait until it
    reaches the top
  • of the ROB at this point, R1-R32 contain
    results for all
  • instructions up to instr-3 save registers,
    save PC of instr-3,
  • and service the exception
  • If branch is a mispredict, flush all
    instructions after the
  • branch and start on the correct path
    mispredicted instrs
  • will not have updated registers (the branch
    cannot commit
  • until it has completed and the flush happens as
    soon as the
  • branch completes)
  • Potential problems ?

17
Managing Register Names
Temporary values are stored in the register file
and not the ROB
Logical Registers R1-R32
Physical Registers P1-P64
At the start, R1-R32 can be found in
P1-P32 Instructions stop entering the pipeline
when P64 is assigned
R1 ? R1R2 R2 ? R1R3 BEQZ R2 R3 ? R1R2
P33 ? P1P2 P34 ? P33P3 BEQZ P34 P35 ? P33P34
What happens on commit?
18
The Commit Process
  • On commit, no copy is required
  • The register map table is updated the
    committed value
  • of R1 is now in P33 and not P1 on an
    exception, P33 is
  • copied to memory and not P1
  • An instruction in the issue queue need not
    modify its
  • input operand when the producer commits
  • When instruction-1 commits, we no longer have
    any use
  • for P1 it is put in a free pool and a new
    instruction can
  • now enter the pipeline ? for every instr that
    commits, a
  • new instr can enter the pipeline ? number of
    in-flight
  • instrs is a constant number of extra (rename)
    registers

19
The Alpha 21264 Out-of-Order Implementation
Reorder Buffer (ROB)
Branch prediction and instr fetch
Instr 1 Instr 2 Instr 3 Instr 4 Instr 5 Instr 6
Register File P1-P64
Register Map Table R1?P1 R2?P2
R1 ? R1R2 R2 ? R1R3 BEQZ R2 R3 ? R1R2 R1 ?
R3R2
Decode Rename
P33 ? P1P2 P34 ? P33P3 BEQZ P34 P35 ?
P33P34 P36 ? P35P34
ALU
ALU
ALU
Instr Fetch Queue
Results written to regfile and tags broadcast to
IQ
Issue Queue (IQ)
20
Title
  • Bullet
Write a Comment
User Comments (0)
About PowerShow.com