Pipelining - PowerPoint PPT Presentation

Slides: 47
Provided by: toda76

Title: Pipelining


1
Pipelining
  • Multiple instructions are overlapped in
    execution.
  • Instruction fetch and execution is divided into
    steps.
  • A stage in the pipeline takes care of a step.
  • All stages in the pipeline operate concurrently.
  • We must have separate resources for each stage.
  • In MIPS:
      • five steps
      • each step takes from 1 to 2 ns
      • the nonpipelined execution of an instruction
        takes from 5 to 8 ns
      • the pipelined execution of an instruction takes
        10 ns (all instructions are executed in a
        similar way)
      • if the pipeline is full, there is a new output
        every 2 ns
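
The timing figures above can be turned into a small calculation. This is only a sketch: it assumes a 2 ns clock cycle, five stages, and the worst-case 8 ns for a nonpipelined instruction, and the function names are made up for illustration.

```python
def nonpipelined_time_ns(n_instructions, instr_time_ns=8):
    """Total time when each instruction finishes before the next one starts.

    instr_time_ns=8 is the worst case from the slide (assumed for every instruction).
    """
    return n_instructions * instr_time_ns


def pipelined_time_ns(n_instructions, stages=5, cycle_ns=2):
    """Total time in an ideal pipeline without stalls.

    The first instruction needs `stages` cycles; every later instruction
    completes one cycle after its predecessor.
    """
    if n_instructions == 0:
        return 0
    return (stages + n_instructions - 1) * cycle_ns


# One instruction alone is slower in the pipeline (10 ns instead of 8 ns),
# but a long sequence approaches one result every 2 ns.
single = pipelined_time_ns(1)
many = pipelined_time_ns(1000)
```

For 1000 instructions this gives 2008 ns pipelined against 8000 ns nonpipelined under these assumptions, which is why throughput rather than single-instruction latency is what improves.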

2
Pipelining
  • Improve performance by increasing instruction
    throughput
  • The ideal speedup equals the number of stages in
    the pipeline. We don't always achieve it.

3
Pipelining
  • Notice that:
      • the register file operations take 1 ns
      • writing is done during the first half of the
        clock cycle
      • reading is done during the second half of the
        clock cycle
  • This will help us later

4
Pipelining
  • What makes it easy:
      • all instructions are the same length
      • just a few instruction formats
      • memory operands appear only in loads and stores
      • aligned data: one memory access for one data
        item
  • Hazards make it hard:
      • the next instruction cannot execute in the
        following clock cycle
      • structural, control and data hazards
  • We'll build a simple pipeline and look at these
    issues
  • We'll talk about modern processors and what
    really makes it hard:
      • exception handling
      • trying to improve performance with out-of-order
        execution, etc.

5
Hazards
  • structural hazards
      • competition in accessing hardware resources
      • e.g. accessing the memory at the same time
  • control hazards
      • problems in controlling the program flow
      • e.g. branch instructions
  • data hazards
      • accessing data that is not yet complete
      • e.g. an instruction depends on a previous one

6
Resolving Structural Hazard
  • Suppose that we had a single memory instead of
    two memories.
  • Data accesses to the memory would then coincide
    with instruction fetches.
  • Some structural hazards can be resolved with
    extra hardware. If not, stall the pipeline.
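
To see how often a single memory would be contested, the sketch below counts the clashes in an ideal schedule. It assumes the usual five-stage order in which memory access is the fourth stage, so a load or store fetched in cycle i accesses memory in cycle i + 3, exactly when instruction i + 3 is fetched. The function name and the input encoding are hypothetical.

```python
def memory_conflicts(uses_data_memory):
    """Count cycles in which instruction fetch and data access collide.

    uses_data_memory[i] is True if instruction i is a load or a store.
    Instruction i is fetched in cycle i and reaches the memory stage in
    cycle i + 3 (assumption: memory access is stage 4 of 5, no stalls).
    """
    n = len(uses_data_memory)
    return sum(
        1
        for i, is_mem in enumerate(uses_data_memory)
        if is_mem and i + 3 < n  # instruction i + 3 is fetched in the same cycle
    )


# A load followed by three other instructions: one collision.
example = memory_conflicts([True, False, False, False])
```

Each counted collision would cost one stall cycle with a single memory; separate instruction and data memories remove them.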

7
Resolving Control Hazard by Pipeline Stall
  • Assumption: enough extra hardware so that all
    branch computations are ready in stage 2.
  • The next instruction is stalled one extra clock
    cycle before starting.
  • For longer pipelines we often cannot resolve the
    branch in the second stage, so we need a better
    solution.

8
Resolving Control Hazard by Prediction
  • Simple approach: always predict that branches
    will not be taken
      • right: the pipeline proceeds at full speed
      • wrong: the pipeline stalls
  • Another approach: predict that branches to an
    earlier address are taken
      • usually right in the case of a loop
  • Dynamic prediction: keep a history for each
    branch as taken or not taken.
  • When the guess is wrong, instructions following
    the wrongly guessed branch must have no effect.
    The pipeline must be restarted from the proper
    address.
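
The two static approaches can be compared on a branch trace. The following is a minimal sketch; the trace format, the addresses, and the function names are invented for illustration.

```python
def mispredictions(branches, predict):
    """Count wrong guesses.

    branches: list of (branch_pc, target_pc, taken) tuples.
    predict:  function (branch_pc, target_pc) -> predicted 'taken' flag.
    """
    return sum(1 for pc, target, taken in branches if predict(pc, target) != taken)


def predict_not_taken(pc, target):
    """Simple approach: every branch is assumed to fall through."""
    return False


def predict_backward_taken(pc, target):
    """Branches to an earlier address (typical loop branches) are assumed taken."""
    return target < pc


# Hypothetical loop: the branch at address 40 jumps back to 16 nine times
# and falls through on the tenth execution.
loop_trace = [(40, 16, True)] * 9 + [(40, 16, False)]

always_not_taken = mispredictions(loop_trace, predict_not_taken)
backward_taken = mispredictions(loop_trace, predict_backward_taken)
```

On this trace the always-not-taken guess is wrong nine times, while the backward-taken guess is wrong only once, at the loop exit.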

9
Resolving Control Hazard by Prediction
  • Prediction: branch not taken

[Figure: pipeline behaviour when the prediction is correct and when it is wrong]

10
Resolving Control Hazard by Delayed Branch
  • The next sequential instruction is always
    executed.
  • Assemblers and compilers usually fill the branch
    delay slots.
      • an earlier instruction is moved into the delay
        slot
      • if none is found, a NOP is inserted

11
Data Hazard
  • An example:
      add $s0, $t0, $t1
      sub $t2, $s0, $t3
      • add writes in stage 5
      • sub reads data in stage 2
      • three stalls required
  • We cannot rely on compilers to avoid data hazards
    by rearranging the instruction sequence
      • these dependencies happen just too often
      • the delay is just too long
  • Solution: forwarding or bypassing
      • we don't need to wait for the instruction to
        complete
      • get the missing item early from the internal
        resources
  • Stalls are still needed in some instruction
    sequences

12
Forwarding
  • with forwarding: no stalls
  • load-use data hazard: one stall

13
Instruction steps mapped onto the datapath

14
Pipelined Datapath
  • Reuse of functional units in every clock cycle
  • Additional hardware:
      • separation of pipeline stages by pipeline
        registers
      • extra functional units where several
        instructions use them at the same time (for
        removing structural hazards)
  • Extended control:
      • strict sequentialisation of instructions (every
        instruction goes through all stages)
      • check for hazards
      • introduce stalls to remove hazards

15
Problems
  • Usually data moves from left to right; data
    moving from right to left affects later
    instructions
  • Write back into the register file can lead to
    data hazards
  • Selection of the next value of the PC leads to
    control hazards

16
Pipelined Datapath
17
Pipelined Datapath
  • We must add wide enough pipeline registers to
    store all the data.
  • The write register number must be passed from the
    instruction.
  • Adders:
      • single cycle: present
      • multicycle: absent (the ALU took care of
        calculations)
      • pipeline: present

18
Graphically Representing Pipelines
  • Can help with answering questions like:
      • how many cycles does it take to execute this
        code?
      • what is the ALU doing during cycle 4?
      • use this representation to help understand
        datapaths

19
Traditional Pipeline Diagram
  • Not as informative as the previous one

20
Pipelined Control
21
Pipelined control
  • Data travels through the pipeline stages
  • All data belonging to an instruction must be kept
    together
  • Information transfer only through pipeline
    registers
  • Control information must travel with the
    instruction

22
Pipelined control
  • Instruction fetch / PC increment
      • identical for all instructions
      • read instruction memory
      • write PC
  • Instruction decode / register file read
      • identical for all instructions
  • Execution / address calculation
      • signals: RegDst, ALUOp, ALUSrc
  • Memory access
      • signals: Branch, MemRead, MemWrite
  • Write back
      • signals: MemtoReg, RegWrite
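
The grouping of signals by stage suggests how control can travel with an instruction: all groups are produced at decode, and each stage uses its own group and hands the rest on. The sketch below models this with dictionaries. The helper names are hypothetical, and the control values for lw are the usual textbook values, assumed here because the slides do not list them.

```python
EX_SIGNALS = ("RegDst", "ALUOp", "ALUSrc")
MEM_SIGNALS = ("Branch", "MemRead", "MemWrite")
WB_SIGNALS = ("MemtoReg", "RegWrite")


def decode(control):
    """ID stage: split the control word into per-stage groups (ID/EX contents)."""
    return {
        "EX": {name: control[name] for name in EX_SIGNALS},
        "MEM": {name: control[name] for name in MEM_SIGNALS},
        "WB": {name: control[name] for name in WB_SIGNALS},
    }


def advance(pipeline_reg, consumed_stage):
    """Hand on every group except the one the current stage has just used."""
    return {stage: group for stage, group in pipeline_reg.items()
            if stage != consumed_stage}


# Assumed control values for a load word instruction.
LW_CONTROL = {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1,
              "Branch": 0, "MemRead": 1, "MemWrite": 0,
              "MemtoReg": 1, "RegWrite": 1}

id_ex = decode(LW_CONTROL)       # holds EX, MEM and WB groups
ex_mem = advance(id_ex, "EX")    # holds MEM and WB groups
mem_wb = advance(ex_mem, "MEM")  # holds only the WB group
```

The pipeline registers therefore get narrower from left to right: each one carries only the signals that later stages still need.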

23
Pipelined Control
  • Pass control signals along just like the data

24
Datapath with Control
25
Dependencies
  • Problem with starting the next instruction before
    the present one is finished:
      • dependencies that go backward in time are data
        hazards

26
Software Solution
  • Have the compiler guarantee no hazards
  • Insert no-operations (nop):
      sub $2, $1, $3
      nop
      nop
      and $12, $2, $5
      or  $13, $6, $2
      add $14, $2, $2
      sw  $15, 100($2)
  • Problem: this really slows us down!

27
Dependency Detection
  • Hazard conditions:
      • EX/MEM.RegisterRd = ID/EX.RegisterRs
        (next instruction)
      • EX/MEM.RegisterRd = ID/EX.RegisterRt
        (next instruction)
      • MEM/WB.RegisterRd = ID/EX.RegisterRs
        (after two instructions)
      • MEM/WB.RegisterRd = ID/EX.RegisterRt
        (after two instructions)
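
The four comparisons can be written out directly. This is only a sketch of the conditions as listed: the function name is made up, register fields are plain integers, and None is used (as an assumption of this sketch) when an older instruction writes no register.

```python
def data_hazards(id_ex_rs, id_ex_rt, ex_mem_rd=None, mem_wb_rd=None):
    """Return the hazard conditions that hold for the instruction in ID/EX.

    ex_mem_rd: destination of the instruction one ahead (None if it has none).
    mem_wb_rd: destination of the instruction two ahead (None if it has none).
    """
    found = []
    if ex_mem_rd is not None:
        if ex_mem_rd == id_ex_rs:
            found.append("EX/MEM.RegisterRd = ID/EX.RegisterRs")
        if ex_mem_rd == id_ex_rt:
            found.append("EX/MEM.RegisterRd = ID/EX.RegisterRt")
    if mem_wb_rd is not None:
        if mem_wb_rd == id_ex_rs:
            found.append("MEM/WB.RegisterRd = ID/EX.RegisterRs")
        if mem_wb_rd == id_ex_rt:
            found.append("MEM/WB.RegisterRd = ID/EX.RegisterRt")
    return found


# An instruction reading registers 2 and 5 right after one that writes register 2.
next_instruction = data_hazards(2, 5, ex_mem_rd=2)

# An instruction reading registers 6 and 2, two instructions after a write to register 2.
after_two = data_hazards(6, 2, ex_mem_rd=12, mem_wb_rd=2)
```

A real detection unit would also check that the older instruction actually writes a register; that refinement is left out here to match the conditions as listed.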

28
Forwarding
  • register file forwarding to handle read/write to
    same register
  • ALU forwarding

29
ALU without Forwarding
30
ALU with Forwarding
31
Forwarding MUX Control Values
  MUX control     Source   Explanation
  ForwardA = 00   ID/EX    1st ALU operand comes from the register file
  ForwardA = 10   EX/MEM   1st ALU operand is forwarded from the prior ALU result
  ForwardA = 01   MEM/WB   1st ALU operand is forwarded from data memory or an earlier ALU result
  ForwardB = 00   ID/EX    2nd ALU operand comes from the register file
  ForwardB = 10   EX/MEM   2nd ALU operand is forwarded from the prior ALU result
  ForwardB = 01   MEM/WB   2nd ALU operand is forwarded from data memory or an earlier ALU result
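
A forwarding unit that produces these MUX values might look like the sketch below. The RegWrite checks and the exclusion of register 0 are not stated on the slides; they are assumed here from the common textbook treatment, and the function name is hypothetical.

```python
def forwarding_unit(id_ex_rs, id_ex_rt,
                    ex_mem_rd, ex_mem_regwrite,
                    mem_wb_rd, mem_wb_regwrite):
    """Return (ForwardA, ForwardB) as 2-bit values: 0b00, 0b10 or 0b01."""

    def select(source_register):
        # The most recent result (EX/MEM) takes priority over the older one (MEM/WB).
        if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == source_register:
            return 0b10
        if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == source_register:
            return 0b01
        return 0b00  # no hazard: operand comes from the register file

    return select(id_ex_rs), select(id_ex_rt)


# First operand (register 2) was written by the immediately preceding instruction.
forward_a, forward_b = forwarding_unit(2, 5, 2, True, 0, False)
```

Checking EX/MEM before MEM/WB matters when both older instructions write the same register: the operand must come from the newer result.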

32
Data Hazards and Stalls
  • Forwarding does not solve all problems.
  • Load word can still cause a hazard:
      lw  $2, 20($1)
      and $4, $2, $5
  • An instruction tries to read a register following
    a load instruction that writes to the same
    register.
  • We need a hazard detection unit to stall the
    pipeline.

33
Data Hazards and Stalls
34
Hazard Detection
  • if (ID/EX.MemRead and
        ((ID/EX.RegisterRt = IF/ID.RegisterRs) or
         (ID/EX.RegisterRt = IF/ID.RegisterRt)))
      stall the pipeline
      • check for load instructions
      • check if the register to be loaded is part of
        the current instruction
  • We can stall the pipeline by keeping an
    instruction in the same stage.
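
The load-use test above translates almost literally into code. A minimal sketch, with a made-up function name and register fields passed as integers:

```python
def load_use_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True if the instruction in IF/ID must wait for a load in ID/EX.

    id_ex_memread: the instruction in ID/EX is a load.
    id_ex_rt:      the register that load will write.
    if_id_rs/rt:   the source registers of the following instruction.
    """
    return bool(id_ex_memread and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt))


# A load into register 2 followed by an instruction reading registers 2 and 5.
must_stall = load_use_stall(True, 2, 2, 5)
```

When the test is true, the instruction in IF/ID is held in place for one cycle; after that, forwarding can supply the loaded value.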

35
Stalling
36
Hazard Detection Unit
  • The hazard detection unit stalls if the load-use
    hazard test is true.

37
Branch Hazards
  • When we decide to branch, other instructions are
    in the pipeline!
  • We are predicting branch not taken
  • need to add hardware for flushing instructions if
    we are wrong

38
Reducing the Delay of Branches
  • Move branch decision earlier in the pipeline, so
    that fewer instructions need be flushed.
  • Select branch address either at
      • end of EX stage (two cycle penalty) or at
      • end of ID stage (one cycle penalty)
  • Move the branch address adder to ID stage
  • Branch detection in ID stage
      • EXCLUSIVE-OR of the bits of the registers
      • OR of the results
  • Clear the instruction field in the IF/ID pipeline
    register, which creates a NOP
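
The equality test for branch detection (XOR of the register bits, then OR of the results) can be sketched as follows. The 32-bit width and the function name are assumptions for illustration.

```python
def registers_equal(a, b, width=32):
    """Compare two register values the way the slide describes.

    Step 1: bitwise EXCLUSIVE-OR; a result bit is 1 where the inputs differ.
    Step 2: OR all result bits together; 0 means every bit matched.
    """
    diff = (a ^ b) & ((1 << width) - 1)
    any_difference = 0
    for bit in range(width):
        any_difference |= (diff >> bit) & 1
    return any_difference == 0


same = registers_equal(5, 5)
different = registers_equal(5, 4)
```

This avoids a full ALU subtraction, which is what makes it cheap enough to perform in the ID stage.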

39
Flushing Instructions

40
Dynamic Branch Prediction
  • Analyse the branch history
      • keep a list of recent branch instructions
      • save low order bits of the address only; this
        limits the precision, but it's only a prediction
  • Action
      • if branch is taken, set mark
      • if branch is not taken, reset mark
  • Prediction accuracy is limited (twice wrong
    in a loop)
  • Improvement: 2-bit prediction scheme
      • prediction must be wrong twice before it is
        changed
      • better prediction for loops (once wrong in a
        loop)
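
The difference between the two schemes shows up when a loop is executed repeatedly. The sketch below models the 1-bit mark directly and uses a saturating counter as one common way to realise a 2-bit scheme (an assumption; the slides do not fix the state machine). The trace is hypothetical.

```python
def one_bit_mispredictions(outcomes, mark=False):
    """1-bit scheme: predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if mark != taken:
            misses += 1
        mark = taken  # set the mark if taken, reset it if not taken
    return misses


def two_bit_mispredictions(outcomes, counter=0):
    """2-bit scheme as a saturating counter 0..3; predict taken when counter >= 2."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# Hypothetical trace: a loop branch taken 9 times, then not taken; the loop runs 10 times.
trace = ([True] * 9 + [False]) * 10

one_bit = one_bit_mispredictions(trace)
two_bit = two_bit_mispredictions(trace)
```

With the 1-bit mark, every execution of the loop costs two wrong guesses (the exit and the next first iteration). With the 2-bit counter, only the exit is mispredicted once the counter has warmed up.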

41
Exceptions
  • Some exception types:
      • Overflow
      • Illegal opcode
      • Invoking an operating system service
      • I/O device request
  • Actions:
      • Load PC with exception handling address
      • Flush instructions from the pipeline
      • Leave registers untouched
      • Save offending instruction address in EPC

42
Exceptions
43
Final Data Path
44
Superscalar and Dynamic Pipelining
  • Superpipelining
      • Increased number of pipeline stages
  • Superscalar
      • Increased number of parallel units
      • Multiple instructions issued in one cycle
      • Parallel instructions must be independent
      • Usually not all units are replicated, which
        imposes limitations
  • Dynamic pipeline scheduling
      • Rescheduling of instructions by hardware to
        avoid pipeline stalls
      • Out-of-order execution is possible
      • Speculative execution and dynamic branch
        prediction

45
Superscalar MIPS
  • Two instructions in parallel: (ALU operation OR
    branch) AND (load OR store)
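
The pairing rule can be expressed as a small check. This sketch looks only at the instruction classes; the opcode lists are illustrative rather than complete, and the independence of the two instructions, which is also required, is not checked here.

```python
# Illustrative opcode classes (assumed, not a complete MIPS list).
ALU_OR_BRANCH = {"add", "sub", "and", "or", "slt", "beq"}
LOAD_OR_STORE = {"lw", "sw"}


def can_dual_issue(first, second):
    """True if one instruction is ALU/branch and the other is a load/store."""
    pair = (first, second)
    has_alu_or_branch = any(op in ALU_OR_BRANCH for op in pair)
    has_load_or_store = any(op in LOAD_OR_STORE for op in pair)
    return has_alu_or_branch and has_load_or_store


pairs_ok = can_dual_issue("add", "lw")
pairs_blocked = can_dual_issue("lw", "sw")
```

Two loads, or two ALU operations, cannot be issued together under this rule because each kind of unit exists only once.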

46
Dynamically Scheduled Pipeline