CSCE 212 Chapter 6 Enhancing Performance with Pipelining - PowerPoint PPT Presentation

About This Presentation

Description:

CSCE 212 Chapter 6: Enhancing Performance with Pipelining. Instructor: Jason D. Bakos

Slides: 22

Provided by: Jaso1231

Learn more at: https://www.cse.sc.edu

Transcript and Presenter's Notes

1
CSCE 212 Chapter 6: Enhancing Performance with Pipelining

Instructor: Jason D. Bakos

2
Pipelining
3
MIPS Pipeline

Basic idea
Execute multiple instructions in parallel by overlapping them
Split instruction execution into 5 stages
Instructions execute in assembly-line fashion

[Figure: five-stage MIPS pipeline (fetch, decode, execute, memory, write back) with PC (+4), register file (rs/rt), ALU, sign extension/immediate, shift amount, memory (MemRead, MemWrite, address, data in/out), and a control unit driven by op/func that can inject NOOPs]
Pipeline registers:
instruction register
A, B registers: control for execute/memory/write back, rs/rt/rd
R register: control for memory/write back, rs/rt/rd
MDR register: control for write back, rs/rt/rd
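The assembly-line idea can be sketched as an ideal timing diagram: with five stages and no stalls, n instructions finish in n + 4 cycles instead of 5n. This is a minimal Python sketch assuming one instruction enters Fetch every cycle and nothing ever stalls; the helper name `timing_diagram` is hypothetical.

```python
STAGES = ["F", "D", "E", "M", "W"]


def timing_diagram(n_instructions: int) -> list[list[str]]:
    """Return one row per instruction; column c holds the stage occupied in cycle c + 1.

    Assumes an ideal pipeline: one instruction enters F per cycle, no stalls.
    """
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = ["."] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i occupies stage s during cycle i + s + 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(timing_diagram(4)):
        print(f"I{i + 1}: " + " ".join(row))
```

For four instructions this prints a staircase of F D E M W rows spanning 8 cycles; once the pipeline is full, one instruction completes per cycle.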
4
Pipelined MIPS
5
Pipelined MIPS
6
Pipelined Control
7
Pipelined Control
8
Pipelined Control
9
MIPS ISA

MIPS pipeline stages
Fetch (F)
read the next instruction from memory and increment
the program counter (PC)
assume 1 cycle to access memory
Decode (D)
read register operands, decode the instruction into
control signals, compute the branch target
Execute (E)
execute arithmetic operations/resolve branch conditions
Memory (M)
perform load/store accesses to memory, take
branches
assume 1 cycle to access memory
Write back (W)
write arithmetic and load results to the register file
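To make the division of work concrete, here is a non-pipelined Python sketch that pushes one instruction at a time through the five stages described above. The `Machine` class, the tuple encodings, and the three-instruction subset (add, lw, sw) are hypothetical simplifications; data memory is a dictionary that answers in one step, in line with the 1-cycle memory assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    regs: list = field(default_factory=lambda: [0] * 32)
    mem: dict = field(default_factory=dict)  # hypothetical model: address -> word value
    pc: int = 0


def step(m: Machine, program: list) -> None:
    """Run one instruction through F, D, E, M, W in order (no overlap).

    Hypothetical encodings: ("add", rd, rs, rt), ("lw", rt, offset, rs), ("sw", rt, offset, rs).
    """
    # F: read the next instruction from memory and increment the PC
    instr = program[m.pc // 4]
    m.pc += 4

    # D: read register operands and turn the opcode into control signals
    op = instr[0]
    if op == "add":
        _, dest, rs, rt = instr
        imm = 0
    else:
        _, rt, imm, rs = instr
        dest = rt
    a, b = m.regs[rs], m.regs[rt]
    mem_read, mem_write, reg_write = op == "lw", op == "sw", op != "sw"

    # E: the ALU computes rs + rt for add, or the effective address rs + offset for lw/sw
    alu_out = a + b if op == "add" else a + imm

    # M: only loads and stores access data memory
    result = alu_out
    if mem_read:
        result = m.mem.get(alu_out, 0)
    elif mem_write:
        m.mem[alu_out] = b

    # W: write the ALU or load result to the register file ($0 is hardwired to 0)
    if reg_write and dest != 0:
        m.regs[dest] = result


if __name__ == "__main__":
    m = Machine()
    m.regs[1], m.regs[2] = 5, 7
    prog = [("add", 3, 1, 2), ("sw", 3, 8, 0), ("lw", 4, 8, 0)]
    for _ in prog:
        step(m, prog)
    print(m.regs[3], m.mem[8], m.regs[4], m.pc)
```

Note that a store does no useful work in W and an add does no useful work in M; pipelining still routes every instruction through all five stages so that the stage timing stays uniform.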

10
Hazards

Hazards are data flow problems that arise as a
result of pipelining
They limit the amount of parallelism and sometimes
induce penalties that prevent completing one
instruction per clock cycle
Structural hazards
Two operations require the same piece of hardware
in the same cycle
Structural hazards can be overcome by adding
hardware
Control hazards
Conditional control instructions are not resolved
until late in the pipeline, so subsequent
instruction fetches must be predicted
Wrongly fetched instructions are flushed if the
prediction does not hold (make sure they cause no
state change)
Branch hazards can be handled with dynamic
prediction/speculation or a branch delay slot
Data hazards
An instruction in one pipeline stage is
dependent on data computed in another pipeline
stage
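A structural hazard can be illustrated by assuming, hypothetically, a single memory shared by instruction fetch and data accesses: whenever a load or store reaches M, the instruction three slots behind it is being fetched in the same cycle. The Python sketch below (the helper `structural_conflict_cycles` is hypothetical, and the pipeline is assumed to run without stalls) lists those conflict cycles; splitting the memory into separate instruction and data memories, i.e. adding hardware, removes them.

```python
def structural_conflict_cycles(ops: list[str], split_memories: bool = False) -> list[int]:
    """Cycles in which a load/store in M and an instruction fetch in F both need memory.

    Assumes an ideal 5-stage pipeline (no stalls): instruction i (0-based) is fetched
    in cycle i + 1 and reaches M in cycle i + 4.
    """
    if split_memories:
        return []  # separate instruction and data memories: the extra hardware removes the hazard
    conflicts = []
    for i, op in enumerate(ops):
        if op in ("lw", "sw") and i + 3 < len(ops):
            # instruction i + 3 is being fetched in the same cycle i + 4
            conflicts.append(i + 4)
    return conflicts


if __name__ == "__main__":
    print(structural_conflict_cycles(["lw", "sw", "add", "add", "add"]))
```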

11
Hazards

Data hazards
Register values are read in decode and written
during write-back
A RAW hazard occurs when dependent instructions are
separated by fewer than 2 slots
Examples (producer on top, dependent instruction below, stage in parentheses)
ADD 2,X,X (E)    ADD 2,X,X (M)    ADD 2,3,4 (W)
ADD X,2,X (D)    ADD X,2,X (D)    ADD X,2,3 (D)
In most cases, data is generated in the same stage
in which it is required (EX)
Data forwarding
ADD 2,X,X (M)    ADD 2,X,X (W)    ADD 2,3,4 (out-of-pipe)
ADD X,2,X (E)    ADD X,2,X (E)    ADD X,2,3 (E)
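These timing rules fit in a few lines of Python. The sketch assumes the register file writes in the first half of a cycle and reads in the second half (so an instruction three slots behind the producer already sees the new value), and that an ALU result is ready at the end of E. The function names are hypothetical; `distance` is 1 for back-to-back instructions.

```python
def raw_stalls(distance: int, forwarding: bool) -> int:
    """Stall cycles when an ALU result is consumed by a later ALU instruction.

    Assumes operands are read in D, results are written in W, and the register
    file writes in the first half of a cycle and reads in the second half.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if forwarding:
        return 0  # the result leaves E just in time for the consumer's E stage
    # Without forwarding, the consumer's D must line up with (or follow) the producer's W.
    return max(0, 3 - distance)


def forwarding_source(distance: int) -> str:
    """Where the consumer's E stage obtains the operand when forwarding is enabled."""
    if distance == 1:
        return "EX/MEM register (producer in M)"
    if distance == 2:
        return "MEM/WB register (producer in W)"
    return "register file (read during D)"


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, raw_stalls(d, forwarding=False), forwarding_source(d))
```

The three distances correspond to the three columns of the examples: two stalls, one stall, and no stall without forwarding; with forwarding, the first two columns take the operand from the M and W pipeline registers.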

12
Load Hazards

Stalls are required when data is not produced in the
same stage in which it is needed by a subsequent
instruction
Example
LW 2, 0(X) (M)
ADD X, 2 (E)
When this occurs, insert a bubble into the EX
stage and stall F and D
LW 2, 0(X) (W)
NOOP (M)
ADD X, 2 (E)
Forward from W to E
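The check that triggers the bubble compares the load currently in E against the source registers of the instruction in D. Below is a minimal Python sketch assuming a hypothetical `Instr` record with a destination register and a tuple of source registers; register 0 is excluded because MIPS hardwires it to zero.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    op: str                # e.g. "lw", "add" (hypothetical encoding)
    dest: Optional[int]    # destination register, None if the instruction writes none
    srcs: Tuple[int, ...]  # registers read in D


def load_use_stall(in_ex: Instr, in_id: Instr) -> bool:
    """True if the instruction in D must wait one cycle behind a load in E.

    The load's data exists only at the end of M, so after one bubble it can be
    forwarded from W to the consumer's E stage.
    """
    return (
        in_ex.op == "lw"
        and in_ex.dest not in (None, 0)  # writes to $0 never create a dependence
        and in_ex.dest in in_id.srcs
    )


if __name__ == "__main__":
    # LW 2, 0(X) followed by an ADD that reads register 2 (X taken as 9 for illustration)
    print(load_use_stall(Instr("lw", 2, (9,)), Instr("add", 9, (2, 9))))
```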

13
Data Hazards Forwarding
14
Data Hazards Stalling for Load Hazard
15
Control Hazards

Need to make a branch decision based on data that
has yet to be produced
add 2,3,4
beqz 2,loop
In which stage is the branch resolved?
Approaches
stall
insert bubbles after all branches
always predict untaken
if taken, the instructions that entered DEC and EX
(and MEM?) are turned into NOOPs
branch delay slot
the instruction following the branch is always executed
dynamic branch predictors

16
Control Hazards

Instructions are fetched every clock cycle
Branch decisions happen in the EX stage
Solutions
Assume branch not taken (if it is taken, flush IF,
ID, and EX by inserting NOPs into the pipeline
registers on the clock edge)
Reduce the delay by moving the branch decision earlier
Requires additional hardware (comparators, etc.)
Might increase cycle time, since register read
and branch resolution are now in series and must be
performed in half a cycle to allow for parallel
register writes
Requires forwarding and stall hardware for new
data hazards
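The trade-off of moving the decision earlier can be sketched in Python under these assumptions (helper names are hypothetical): predict-not-taken, one fetch per cycle, and, once the comparison moves into D, forwarding paths from the EX/MEM and MEM/WB registers into the comparator. A taken branch that redirects fetch from M squashes the instructions in F, D, and E; redirecting from D squashes only the one in F, but a value the branch needs from an immediately preceding ALU instruction or load now costs extra stalls.

```python
STAGES = ["F", "D", "E", "M", "W"]


def flushed_stages(decide_stage: str) -> list[str]:
    """Stages holding wrong-path instructions when a taken branch redirects fetch
    from decide_stage (assumes predict-not-taken and one fetch per cycle)."""
    return STAGES[:STAGES.index(decide_stage)]


def branch_operand_stalls(producer: str, distance: int) -> int:
    """Stalls when a branch compares registers in D and needs a value produced by an
    instruction `distance` slots earlier (1 = immediately before the branch).

    producer is "alu" or "lw". Cycles are counted from the producer's F (cycle 1):
    an ALU result can be forwarded into D from EX/MEM in cycle 4, a load result
    from MEM/WB in cycle 5.
    """
    ready = {"alu": 4, "lw": 5}[producer]
    branch_decode = distance + 2  # cycle in which the branch sits in D
    return max(0, ready - branch_decode)


if __name__ == "__main__":
    print(flushed_stages("M"), flushed_stages("D"))
    print(branch_operand_stalls("alu", 1), branch_operand_stalls("lw", 1))
```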

17
Example
[Pipeline timing diagram over clock cycles 1-15]

add 6,5,2
lw 7,0(6)
addi 7,7,10
add 6,4,2
sw 7,0(6)
addi 2,2,4
blt 2,3,loop
add 6,5,2

8 instructions, 15 - 4 = 11 cycles (not counting the 4-cycle pipeline fill), CPI = 11/8
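The CPI figure follows from simple arithmetic on the timing diagram: the first 4 cycles only fill the pipeline, so the remaining 11 cycles are charged to the 8 instructions. A short Python check (variable names are illustrative) also shows how many cycles the example loses relative to an ideal 12-cycle run.

```python
from fractions import Fraction

instructions = 8
total_cycles = 15
fill_cycles = 5 - 1                          # cycles before the first instruction completes
ideal_cycles = instructions + fill_cycles    # 12 cycles with no stalls or flushes
lost_cycles = total_cycles - ideal_cycles    # 3 cycles lost to stalls and flushes
cpi = Fraction(total_cycles - fill_cycles, instructions)

print(ideal_cycles, lost_cycles, cpi, float(cpi))  # 12 3 11/8 1.375
```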
18
Moving up Branch Resolution
19
Moving up Branch Resolution
20
Scheduling the Branch Delay Slot
21
Dynamic Branch Prediction