CSCE 212 Chapter 6 Enhancing Performance with Pipelining - PowerPoint PPT Presentation

Slides: 22
Provided by: Jaso1231
Learn more at: https://www.cse.sc.edu


1
CSCE 212 Chapter 6: Enhancing Performance with Pipelining
  • Instructor: Jason D. Bakos

2
Pipelining
3
MIPS Pipeline
  • Basic idea
  • Execute multiple instructions in parallel
  • Split instruction execution into 5 stages
  • Instructions move through the stages like an assembly line

[Figure: five-stage MIPS pipeline datapath (fetch, decode, execute, memory, write back) with PC, memory, register file, sign-extended immediate, shift amount, ALU, and a control unit driven by op/func that can emit control signals or a NOOP. Pipeline registers: instruction register; A and B registers (carry control for execute/memory/write back and rs/rt/rd); R register (carries control for memory/write back and rs/rt/rd); MDR register (carries control for write back and rs/rt/rd).]
4
Pipelined MIPS
5
Pipelined MIPS
6
Pipelined Control
7
Pipelined Control
8
Pipelined Control
9
MIPS ISA
  • MIPS pipeline stages
  • Fetch (F)
  • read the next instruction from memory, increment
    the program counter (PC)
  • assume 1 cycle to access memory
  • Decode (D)
  • read register operands, decode the instruction into
    control signals, compute the branch target
  • Execute (E)
  • perform arithmetic, resolve branches
  • Memory (M)
  • perform load/store accesses to memory, take
    branches
  • assume 1 cycle to access memory
  • Write back (W)
  • write arithmetic and load results to the register file
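The stage sequence above can be sketched as an ideal pipeline timing diagram. This is a minimal Python sketch, assuming no hazards and no stalls, so instruction i enters F in cycle i and advances one stage per cycle; the function name `pipeline_diagram` is hypothetical.

```python
# Minimal sketch of an ideal 5-stage pipeline (assumption: no hazards, no stalls).
STAGES = ["F", "D", "E", "M", "W"]


def pipeline_diagram(instrs):
    """Return (name, row) pairs; row[c] is the stage occupied in cycle c, or '.'."""
    total_cycles = len(instrs) + len(STAGES) - 1  # n instructions need n + 4 cycles
    rows = []
    for i, name in enumerate(instrs):
        row = ["."] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i is in stage s during cycle i + s
        rows.append((name, row))
    return rows


if __name__ == "__main__":
    for name, row in pipeline_diagram(["add", "sub", "lw", "sw"]):
        print(f"{name:4}", " ".join(row))
```

Four instructions finish in 4 + 4 = 8 cycles; as the instruction count grows, the 4 fill cycles are amortized and the CPI approaches 1.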

10
Hazards
  • Hazards are dependences and resource conflicts that
    arise as a result of pipelining
  • They limit the amount of parallelism and sometimes
    induce penalties (stalls) that prevent completing one
    instruction per clock cycle
  • Structural hazards
  • Two operations require the same piece of hardware
  • Structural hazards can be overcome by adding
    hardware
  • Control hazards
  • Conditional branches are not resolved until late in
    the pipeline, so the instructions fetched after them
    must be predicted
  • Predicted instructions are flushed if the prediction
    does not hold (they must not change any state)
  • Branch hazards can be handled with dynamic
    prediction/speculation or a branch delay slot
  • Data hazards
  • An instruction in one pipeline stage depends on
    data computed by an instruction in another pipeline
    stage

11
Hazards
  • Data hazards
  • Register values are read in decode and written during
    write-back
  • A RAW (read-after-write) hazard occurs when dependent
    instructions are separated by fewer than 2 instructions
  • Examples (producer writes $2; consumer reads $2 in D)
  • ADD $2,$X,$X in E, ADD $X,$2,$X in D
  • ADD $2,$X,$X in M, ADD $X,$2,$X in D
  • ADD $2,$3,$4 in W, ADD $X,$2,$3 in D
  • In most cases, the data is generated in the same stage
    in which it is required (EX)
  • Data forwarding
  • ADD $2,$X,$X in M, ADD $X,$2,$X in E
  • ADD $2,$X,$X in W, ADD $X,$2,$X in E
  • ADD $2,$3,$4 out of the pipeline, ADD $X,$2,$3 in E
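The forwarding cases above can be sketched as a small forwarding-unit model. This Python sketch assumes ALU instructions only (no loads, no stalls), encodes each instruction as a hypothetical `(dest, srcs)` tuple of register numbers, and reports where the E stage gets each operand: from the instruction now in M, the one in W, or the register file. It also assumes a split-cycle register file (a value written in W can be read in D in the same cycle) and the MIPS convention that $0 is hardwired to zero and never forwarded.

```python
def forwarding_sources(program):
    """For each instruction, map each source register to the stage that supplies it.

    program: list of (dest, srcs) tuples, e.g. (2, (3, 4)) for ADD $2,$3,$4.
    Assumes no stalls, so while instruction j is in E, j-1 is in M and j-2 is in W.
    """
    result = []
    for j, (_, srcs) in enumerate(program):
        sources = {}
        for reg in srcs:
            sources[reg] = "regfile"  # default: the value read in D is already correct
            if reg == 0:
                continue  # $0 is hardwired to zero, so it is never forwarded
            # Check the nearest older instruction first: the newest value must win.
            for dist, stage in ((1, "M"), (2, "W")):
                i = j - dist
                if i >= 0 and program[i][0] == reg:
                    sources[reg] = stage
                    break
        result.append(sources)
    return result


if __name__ == "__main__":
    prog = [
        (2, (3, 4)),  # ADD $2,$3,$4
        (5, (2, 6)),  # ADD $5,$2,$6  -> $2 forwarded from M
        (7, (2, 5)),  # ADD $7,$2,$5  -> $2 from W, $5 from M
        (8, (2, 9)),  # ADD $8,$2,$9  -> $2 read normally from the register file
    ]
    for sources in forwarding_sources(prog):
        print(sources)
```

Checking the closest producer first matters when two in-flight instructions write the same register: the consumer must see the younger value.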

12
Load Hazards
  • Stalls are required when data is produced in a later
    stage than the one in which a subsequent instruction
    needs it
  • Example
  • LW $2, 0($X) in M
  • ADD $X, $2 in E (loaded data not yet available)
  • When this occurs, insert a bubble into the EX
    stage and stall F and D
  • LW $2, 0($X) in W
  • NOOP in M
  • ADD $X, $2 in E
  • Forward from W to E
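The load-use case above can be sketched as a stall counter. This Python sketch assumes every other dependence is covered by forwarding, so only an instruction that reads a load's destination in the very next slot needs a one-cycle bubble; the `(op, dest, srcs)` encoding is a hypothetical convenience, and a store that uses the loaded value right away is conservatively counted as stalling too.

```python
def load_use_stalls(program):
    """Count one-cycle bubbles caused by load-use hazards.

    program: list of (op, dest, srcs) tuples; dest is None for stores and branches.
    A load's data is ready only at the end of M, so the next instruction cannot use
    it in E: a bubble goes into E while F and D stall, then W forwards to E.
    """
    stalls = 0
    for (op, dest, _), (_, _, srcs) in zip(program, program[1:]):
        if op == "lw" and dest not in (None, 0) and dest in srcs:
            stalls += 1
    return stalls


if __name__ == "__main__":
    # The loop body from the later example slide, encoded by hand.
    loop = [
        ("add", 6, (5, 2)),     # add  $6,$5,$2
        ("lw", 7, (6,)),        # lw   $7,0($6)
        ("addi", 7, (7,)),      # addi $7,$7,10  <- uses $7 right after the load
        ("add", 6, (4, 2)),     # add  $6,$4,$2
        ("sw", None, (7, 6)),   # sw   $7,0($6)
        ("addi", 2, (2,)),      # addi $2,$2,4
        ("blt", None, (2, 3)),  # blt  $2,$3,loop
    ]
    print(load_use_stalls(loop))  # 1
```

A compiler can often remove such a stall by scheduling an independent instruction between the load and its use.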

13
Data Hazards Forwarding
14
Data Hazards Stalling for Load Hazard
15
Control Hazards
  • Need to make a branch decision based on data that
    has yet to be produced
  • add $2,$3,$4
  • beqz $2,loop
  • In which stage is the branch resolved?
  • Approaches
  • stall
  • insert bubbles after all branches
  • always predict untaken
  • if taken, instructions entering DEC and EX (and
    MEM?) are converted into NOOPs
  • branch delay slot
  • instruction following branch is always executed
  • dynamic branch predictors

16
Control Hazards
  • Instructions are fetched every clock cycle
  • Branch decisions happen in the EX stage
  • Solutions
  • Assume branch not taken (if it is taken, flush IF,
    ID, EX by inserting a NOP into the pipeline
    registers on the clock edge)
  • Reduce the delay by moving the branch decision earlier
  • Requires additional hardware (comparators, etc.)
  • Might increase cycle time, since the register read
    and the branch comparison are now in series and must
    fit in half a cycle, leaving the other half for
    register writes!
  • Requires forwarding and stall hardware for new
    data hazards
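The cost of resolving branches late, and the gain from moving the decision up, can be sketched with simple cycle counting. This Python sketch assumes predict-not-taken, one fetch per cycle, no other stalls, and that a taken branch redirects the PC at the end of `resolve_stage`, so every instruction fetched behind it (one per earlier stage) is flushed; the function names are hypothetical.

```python
# Position of each stage in the 5-stage pipeline (F is stage 0).
STAGE_INDEX = {"F": 0, "D": 1, "E": 2, "M": 3, "W": 4}


def taken_branch_penalty(resolve_stage):
    """Cycles lost per taken branch: the instructions fetched behind it are flushed."""
    return STAGE_INDEX[resolve_stage]


def total_cycles(n_instrs, taken_branches, resolve_stage, stages=5):
    """Ideal n + (stages - 1) cycles plus the flush penalty of each taken branch.

    Assumes no data-hazard stalls; only taken-branch flushes are counted.
    """
    return n_instrs + (stages - 1) + taken_branches * taken_branch_penalty(resolve_stage)


if __name__ == "__main__":
    for stage in ("M", "E", "D"):
        print(stage, taken_branch_penalty(stage), total_cycles(100, 10, stage))
```

Under these assumptions, taking the branch in M costs three flushed instructions (IF, ID, EX), while deciding in D costs only one.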

17
Example
[Figure: pipeline timing diagram for the code below, clock cycles 1 to 15]
  • add $6,$5,$2
  • lw $7,0($6)
  • addi $7,$7,10
  • add $6,$4,$2
  • sw $7,0($6)
  • addi $2,$2,4
  • blt $2,$3,loop
  • add $6,$5,$2

8 instructions in 15 cycles; excluding the 4 pipeline-fill cycles, 15 - 4 = 11 cycles, so CPI = 11/8
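The CPI arithmetic above can be checked directly. This Python sketch assumes, as the slide's 15 - 4 suggests, that the 4 cycles needed to fill a 5-stage pipeline are excluded; the function name `cpi_excluding_fill` is hypothetical, and `Fraction` keeps the result exact.

```python
from fractions import Fraction


def cpi_excluding_fill(total_cycles, n_instrs, stages=5):
    """CPI counted from the first completion: the (stages - 1) fill cycles are dropped."""
    return Fraction(total_cycles - (stages - 1), n_instrs)


if __name__ == "__main__":
    cpi = cpi_excluding_fill(15, 8)
    print(cpi, float(cpi))  # 11/8 1.375
```

With no stalls or flushes, 8 instructions would take 12 cycles and give a CPI of exactly 1, so the 3 extra cycles account for the 3/8 above 1.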
18
Moving up Branch Resolution
19
Moving up Branch Resolution
20
Scheduling the Branch Delay Slot
21
Dynamic Branch Prediction
  • Assume taken/not-taken (static)
  • Loops have branches that are usually taken
  • When wrong, we flush pipeline stages
  • Deeper pipelines have higher branch penalties
    (misprediction penalty)
  • Solution
  • Look up the branch address to check whether the branch
    was previously taken
  • One-bit schemes
  • Two-bit schemes (must be wrong twice to change
    prediction)
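The one-bit and two-bit schemes above can be sketched as a branch history table keyed by branch address. This Python sketch is illustrative only: a dict stands in for the hardware table (so there is no aliasing between branches), unseen branches are assumed to predict not taken, and the two-bit scheme shown is the common saturating-counter variant. The class and function names are hypothetical.

```python
class OneBitPredictor:
    """Remembers only the last outcome of each branch, keyed by branch address."""

    def __init__(self):
        self.last = {}  # branch PC -> last outcome (True = taken)

    def predict(self, pc):
        return self.last.get(pc, False)  # assumption: unseen branches predicted not taken

    def update(self, pc, taken):
        self.last[pc] = taken


class TwoBitPredictor:
    """2-bit saturating counter per branch: 0-1 predict not taken, 2-3 predict taken.

    From a strong state (0 or 3) the prediction flips only after two misses in a row.
    """

    def __init__(self):
        self.counter = {}  # branch PC -> counter in 0..3 (assumption: starts at 0)

    def predict(self, pc):
        return self.counter.get(pc, 0) >= 2

    def update(self, pc, taken):
        c = self.counter.get(pc, 0)
        self.counter[pc] = min(c + 1, 3) if taken else max(c - 1, 0)


def mispredictions(predictor, trace):
    """Count wrong predictions over a trace of (pc, taken) pairs."""
    wrong = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


if __name__ == "__main__":
    # Hypothetical loop branch at address 0x40: taken 9 times, then falls through;
    # the whole loop is entered 10 times.
    trace = [(0x40, i % 10 != 9) for i in range(100)]
    print(mispredictions(OneBitPredictor(), trace))  # 20
    print(mispredictions(TwoBitPredictor(), trace))  # 12
```

The one-bit scheme misses twice per loop (at the exit and again on re-entry), while the two-bit counter, after warming up, misses only at the exit.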