EECS 470 - PowerPoint PPT Presentation

EECS 470: Pipeline Hazards, Lecture 4 (coverage: Appendix A)

Provided by: GaryT158

1
EECS 470
  • Pipeline Hazards
  • Lecture 4
  • Coverage: Appendix A

2
Basic Pipelining
  • Data hazards
  • What are they?
  • How do you detect them?
  • How do you deal with them?
  • Micro-architectural changes
  • Pipeline depth
  • Pipeline width
  • Forwarding ISA

3
Fetch Decode Execute Memory WB
[Figure: five-stage pipelined datapath separated by the IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline registers. Components: PC with PC+1 and branch-target mux, instruction memory, register file (R0-R7), ALU with an eq? output, data memory, and writeback mux. A mux chooses dest from instruction bits 0-2 or bits 16-18, bits 22-24 supply the op, and valA, valB, offset, dest, and op are carried down the pipeline.]
4
Fetch Decode Execute Memory WB
[Figure: five-stage pipelined datapath with IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline registers, instruction memory, register file (R0-R7), ALU, and data memory.]
5
Fetch Decode Execute Memory WB
[Figure: five-stage pipelined datapath with IF/ID, ID/EX, EX/Mem, and Mem/WB pipeline registers, instruction memory, register file (R0-R7), ALU, and data memory.]
6
Pipeline function for ADD
  • Fetch: read instruction from memory
  • Decode: read source operands from the register file
  • Execute: calculate sum
  • Memory: pass results to next stage
  • Writeback: write sum into register file
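As a rough functional sketch (not the course simulator), the five steps for add can be written out one after another. Timing and pipeline registers are not modeled; the helper name `pipeline_add` and the dict used as a register file are hypothetical, and the starting values R1 = 14, R2 = 7, R3 = 10 come from the cycle-by-cycle slides later in this lecture.

```python
def pipeline_add(regs: dict, reg_a: int, reg_b: int, dest: int) -> dict:
    """Walk one add through the five stages and record what each produced.
    Hypothetical helper: stages run back to back, no overlap is modeled."""
    trace = {}
    # Fetch: the instruction is read from instruction memory.
    trace["fetch"] = f"add {reg_a} {reg_b} {dest}"
    # Decode: read the source operands from the register file.
    val_a, val_b = regs[reg_a], regs[reg_b]
    trace["decode"] = (val_a, val_b)
    # Execute: the ALU calculates the sum.
    alu_result = val_a + val_b
    trace["execute"] = alu_result
    # Memory: add does not touch data memory; the result is passed along.
    trace["memory"] = alu_result
    # Writeback: write the sum into the destination register.
    regs[dest] = alu_result
    trace["writeback"] = (dest, alu_result)
    return trace

regs = {0: 0, 1: 14, 2: 7, 3: 10}
print(pipeline_add(regs, 1, 2, 3))
print(regs[3])  # 21
```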

7
Data Hazards
add 1 2 3
nand 3 4 5
time →
add    fetch     decode    execute   memory    writeback
nand             fetch     decode    execute   memory    writeback
If not careful, you will read the wrong value of R3.
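A small timing model makes the problem concrete. It assumes an ideal five-stage pipeline that fetches one instruction per cycle and never stalls; `stage_cycle` is a hypothetical helper. Under that assumption add 1 2 3 writes R3 in cycle 5, but nand 3 4 5 already reads R3 in cycle 3.

```python
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]

def stage_cycle(position: int, stage: str) -> int:
    """Cycle (1-based) in which the instruction at program position `position`
    (0-based) occupies `stage`, assuming one fetch per cycle and no stalls."""
    return position + STAGES.index(stage) + 1

add_writes_r3 = stage_cycle(0, "writeback")  # add 1 2 3 writes R3 here
nand_reads_r3 = stage_cycle(1, "decode")     # nand 3 4 5 reads R3 here

print(f"add writes R3 in cycle {add_writes_r3}")  # cycle 5
print(f"nand reads R3 in cycle {nand_reads_r3}")  # cycle 3
if nand_reads_r3 < add_writes_r3:
    print("RAW hazard: nand would read the old value of R3")
```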
8
Three approaches to handling data hazards
  • Avoidance
  • Make sure there are no hazards in the code
  • Detect and Stall
  • If hazards exist, stall the processor until they
    go away.
  • Detect and Forward
  • If hazards exist, fix up the pipeline to get the
    correct value (if possible)

9
Handling data hazards: avoid all hazards
  • Assume the programmer (or the compiler) knows
    about the processor implementation.
  • Make sure no hazards exist.
  • Put noops between any dependent instructions.

add 1 2 3
noop
noop
nand 3 4 5
(add writes R3 in cycle 5; nand reads R3 in cycle 6)
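The avoidance approach can be sketched as a compiler pass that pads dependent instructions with noops. This sketch assumes registers are read in decode (stage 2) and written in writeback (stage 5), and that a read in the same cycle as the write sees the new value, which is how the stall walkthrough later in the lecture behaves; `Inst` and `insert_noops` are hypothetical names. Under those assumptions add 1 2 3 followed by nand 3 4 5 needs exactly two noops, and a longer pipeline needs more.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Inst:
    op: str
    reg_a: Optional[int] = None  # first source register
    reg_b: Optional[int] = None  # second source register
    dest: Optional[int] = None   # destination register (None: writes nothing)

NOOP = Inst("noop")

def insert_noops(program: List[Inst], read_stage: int = 2,
                 write_stage: int = 5) -> List[Inst]:
    """Insert just enough noops that every instruction sits at least
    (write_stage - read_stage) slots behind any producer of its sources."""
    min_distance = write_stage - read_stage
    out: List[Inst] = []
    for inst in program:
        sources = {r for r in (inst.reg_a, inst.reg_b) if r is not None}
        needed = 0
        # Only producers closer than min_distance can cause a hazard.
        for distance in range(1, min(min_distance, len(out) + 1)):
            producer = out[-distance]
            if producer.dest is not None and producer.dest in sources:
                needed = max(needed, min_distance - distance)
        out.extend([NOOP] * needed)
        out.append(inst)
    return out

program = [Inst("add", 1, 2, 3), Inst("nand", 3, 4, 5)]
print([i.op for i in insert_noops(program)])
# A deeper pipeline (writeback in stage 7) needs more padding.
print([i.op for i in insert_noops(program, write_stage=7)])
```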
10
Problems with this solution
  • Old programs (legacy code) may not run correctly
    on new implementations
  • Longer pipelines need more noops
  • Programs get larger as noops are included
  • Especially a problem for machines that try to
    execute more than one instruction every cycle
  • Intel EPIC: often 25-40% of instructions are
    noops
  • Program execution is slower
  • CPI is one, but some instructions are noops

11
Handling data hazards: detect and stall
  • Detection
  • Compare regA with previous DestRegs
  • 3-bit operand fields
  • Compare regB with previous DestRegs
  • 3-bit operand fields
  • Stall
  • Keep current instructions in fetch and decode
  • Pass a noop to execute
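The detection step amounts to four 3-bit comparisons: regA and regB of the instruction in decode against the destReg of the instructions in ID/EX and EX/Mem. This is a minimal sketch, assuming (as the cycle-by-cycle slides that follow suggest) that the instruction in Mem/WB writes the register file early enough in the cycle for decode to read the new value, so it is not checked. `stall_needed` is a hypothetical name, and `None` stands for a noop's missing destReg.

```python
from typing import Optional

def stall_needed(reg_a: int, reg_b: int,
                 id_ex_dest: Optional[int], ex_mem_dest: Optional[int]) -> bool:
    """Return True if the instruction in decode must stall (detect-and-stall)."""
    hazards = [
        id_ex_dest is not None and reg_a == id_ex_dest,
        id_ex_dest is not None and reg_b == id_ex_dest,
        ex_mem_dest is not None and reg_a == ex_mem_dest,
        ex_mem_dest is not None and reg_b == ex_mem_dest,
    ]
    # When stalling, any single match is enough: OR the four signals together.
    return any(hazards)

# nand 3 4 5 in decode, cycle by cycle (add 1 2 3 has dest 3; noops have none):
print(stall_needed(3, 4, id_ex_dest=3, ex_mem_dest=None))     # True  (add in ID/EX)
print(stall_needed(3, 4, id_ex_dest=None, ex_mem_dest=3))     # True  (add in EX/Mem)
print(stall_needed(3, 4, id_ex_dest=None, ex_mem_dest=None))  # False (noops only)
```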

12
End of Cycle 1
[Figure: pipelined datapath at the end of cycle 1: add 1 2 3 has been fetched into IF/ID; the register file holds R1 = 14, R2 = 7, R3 = 10.]
13
End of Cycle 2
[Figure: end of cycle 2: nand 3 4 5 is in IF/ID; add is in ID/EX with valA = 14 (R1), valB = 7 (R2), and dest = 3.]
14
First half of cycle 3
[Figure: first half of cycle 3: nand 3 4 5 in IF/ID reads regA = 3 from the register file, which still holds the old value R3 = 10; add is in ID/EX with dest = 3.]
15
Hazard detected
[Figure: compare logic checks the regA and regB fields in IF/ID against the dest field in ID/EX; nand's regA (3) matches add's dest (3).]
16
Hazard detected
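[Figure: inside one comparator, regA = 011 is compared bit by bit with dest = 011; every bit position agrees (0 0 0), so the comparator outputs 1.]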
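One common way to build such a 3-bit comparator, shown here only as an illustration, is to XOR corresponding bits and NOR the results: all-zero XOR outputs mean the two register numbers are equal. `compare3` is a hypothetical name.

```python
from typing import Tuple

def compare3(a: int, b: int) -> Tuple[Tuple[int, ...], int]:
    """Gate-level model of a 3-bit equality comparator.
    Returns the per-bit XOR outputs (MSB first) and the match signal."""
    xor_bits = tuple(((a >> i) & 1) ^ ((b >> i) & 1) for i in (2, 1, 0))
    match = int(not any(xor_bits))  # NOR of the three XOR outputs
    return xor_bits, match

print(compare3(0b011, 0b011))  # ((0, 0, 0), 1): regA = R3 matches dest = R3
print(compare3(0b100, 0b011))  # ((1, 1, 1), 0): R4 does not match R3
```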
17
Handling data hazards: detect and stall the pipeline until ready
  • Detection
  • Compare regA with previous DestReg
  • 3-bit operand fields
  • Compare regB with previous DestReg
  • 3-bit operand fields
  • Stall
  • Keep current instructions in fetch and decode
  • Pass a noop to execute

18
First half of cycle 3
[Figure: first half of cycle 3 in the detect-and-stall datapath: nand 3 4 5 is in IF/ID and add is in ID/EX when the hazard is detected.]
20
End of cycle 3
[Figure: end of cycle 3: nand 3 4 5 is held in IF/ID, a noop has been passed into ID/EX, and add has computed 14 + 7 = 21 in EX/Mem.]
21
First half of cycle 4
[Figure: first half of cycle 4: nand 3 4 5 is still in IF/ID and again reads the old R3 = 10; a noop is in ID/EX and add (ALU result 21) is in EX/Mem.]
22
End of cycle 4
[Figure: end of cycle 4: nand 3 4 5 is still held in IF/ID; noops occupy ID/EX and EX/Mem, and add (result 21) is in Mem/WB.]
23
First half of cycle 5
[Figure: first half of cycle 5: add in Mem/WB writes 21 into R3 while nand 3 4 5 decodes again; noops occupy ID/EX and EX/Mem.]
24
End of cycle 5
[Figure: end of cycle 5: add 3 7 7 has been fetched into IF/ID; nand has moved to ID/EX with valA = 21 and valB = 11; two noops follow it. The register file now holds R0 = 0, R1 = 14, R2 = 7, R3 = 21, R4 = 11, R5 = 77, R6 = 1, R7 = 8.]
25
No more hazard stalling
add 1 2 3
nand 3 4 5
time →
add    fetch     decode    execute   memory    writeback
nand             fetch     decode    decode    decode    execute
                           (hazard)  (hazard)
We are careful to get the right value of R3.
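The cost of stalling can be estimated with a small timing model. It assumes in-order issue, at most one instruction finishing decode per cycle, and (as in the walkthrough) that an instruction may finish decode in the same cycle its producer writes back; `Inst` and `decode_cycles` are hypothetical names. For add 1 2 3 followed by nand 3 4 5 it reproduces the timeline of this slide: nand finishes decode in cycle 5 and executes in cycle 6, two cycles later than it would without the hazard.

```python
from typing import List, NamedTuple, Optional, Tuple

class Inst(NamedTuple):
    op: str
    srcs: Tuple[int, ...]   # source registers read in decode
    dest: Optional[int]     # destination register, or None

def decode_cycles(program: List[Inst]) -> List[int]:
    """Cycle in which each instruction completes decode under detect-and-stall."""
    done: List[int] = []
    for i, inst in enumerate(program):
        d = i + 2  # fetched in cycle i + 1 at the earliest
        if done:
            d = max(d, done[-1] + 1)  # in order, one instruction per cycle
        for j, producer in enumerate(program[:i]):
            if producer.dest is not None and producer.dest in inst.srcs:
                # The producer writes back 3 cycles after its own decode,
                # and decode may read the new value in that same cycle.
                d = max(d, done[j] + 3)
        done.append(d)
    return done

program = [Inst("add", (1, 2), 3), Inst("nand", (3, 4), 5)]
dec = decode_cycles(program)
ideal_last = len(program) + 1        # last decode cycle with no stalls
print(dec)                           # [2, 5]
print("nand executes in cycle", dec[-1] + 1)
print("stall cycles:", dec[-1] - ideal_last)
print("total cycles:", dec[-1] + 3)
```

Every stall cycle adds one cycle to the total, which is the CPI increase the next slide complains about.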
26
Problems with detect and stall
  • CPI increases every time a hazard is detected!
  • Is that necessary? Not always!
  • Re-route the result of the add to the nand
  • nand no longer needs to read R3 from reg file
  • It can get the data later (when it is ready)
  • This lets us complete the decode this cycle
  • But we need more control to remember that the
    data that we aren't getting from the reg file at
    this time will be found elsewhere in the pipeline
    at a later cycle.

27
Handling data hazards: detect and forward
  • Detection same as detect and stall
  • Except that all 4 hazards are treated differently
  • i.e., you can't logical-OR the 4 hazard signals
  • Forward
  • New datapaths to route computed data to where it
    is needed
  • New Mux and control to pick the right data
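The four signals cannot simply be OR-ed because each one steers a different mux input. This is a minimal sketch, assuming forwarding into the ALU inputs in the execute stage and the same four comparisons (regA and regB against the ID/EX and EX/Mem destRegs) used for detect-and-stall; `Source` and `forward_select` are hypothetical names.

```python
from enum import Enum
from typing import Optional

class Source(Enum):
    REG_FILE = "value read from the register file in decode"
    EX_MEM = "EX/Mem ALU result"
    MEM_WB = "Mem/WB write data"

def forward_select(src_reg: int, id_ex_dest: Optional[int],
                   ex_mem_dest: Optional[int]) -> Source:
    """Choose the ALU-input mux setting for one source operand.

    Computed while the consumer is in decode: the instruction now in ID/EX
    will be in EX/Mem next cycle, and the one in EX/Mem will be in Mem/WB.
    """
    if id_ex_dest is not None and src_reg == id_ex_dest:
        return Source.EX_MEM   # nearest producer first: it has the newest value
    if ex_mem_dest is not None and src_reg == ex_mem_dest:
        return Source.MEM_WB
    return Source.REG_FILE

# nand 3 4 5 decoding while add 1 2 3 is in ID/EX:
print(forward_select(3, id_ex_dest=3, ex_mem_dest=None))  # Source.EX_MEM (regA)
print(forward_select(4, id_ex_dest=3, ex_mem_dest=None))  # Source.REG_FILE (regB)
# add 6 3 7 decoding while nand is in ID/EX and add 1 2 3 is in EX/Mem:
print(forward_select(3, id_ex_dest=5, ex_mem_dest=3))     # Source.MEM_WB
```

The sketch ignores loads: when the producer in ID/EX is a lw, its data is not ready until after the Memory stage, so one stall is still needed, as in the lw/sw pair later in the walkthrough.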

28
First half of cycle 3
[Figure: detect-and-forward datapath, first half of cycle 3: nand 3 4 5 is in IF/ID and add is in ID/EX; the register file holds R1 = 14, R2 = 7, R3 = 10, R4 = 11, R5 = 77, R6 = 1, R7 = 8.]
29
End of cycle 3
[Figure: end of cycle 3: add 6 3 7 is in IF/ID; nand is in ID/EX with valA = 10 (the stale R3), valB = 11, and dest = 5; add is in EX/Mem with ALU result 21 and dest = 3.]
30
First half of cycle 4
[Figure: first half of cycle 4: a new mux at the ALU input selects the forwarded EX/Mem ALU result (21) for nand's regA instead of the stale value 10; add 6 3 7 is being decoded.]
31
End of cycle 4
[Figure: end of cycle 4: lw 3 6 10 is in IF/ID; add 6 3 7 is in ID/EX; nand has produced -2 in EX/Mem; add 1 2 3 (result 21) is in Mem/WB.]
32
First half of cycle 5
No Hazard
[Figure: first half of cycle 5: lw 3 6 10 is decoded; its register fields (3 and 6) match neither the dest in ID/EX (7) nor the dest in EX/Mem (5), so no hazard is signaled, while add 1 2 3 in Mem/WB writes R3.]
33
End of cycle 5
[Figure: end of cycle 5: sw 6 2 12 is in IF/ID; lw is in ID/EX with valA = 21 and offset 10; add 6 3 7 has computed 22 in EX/Mem; nand (result -2) is in Mem/WB; R3 now holds 21.]
34
First half of cycle 6
Hazard
[Figure: first half of cycle 6: sw 6 2 12 in IF/ID reads R6, but lw 3 6 10 in ID/EX has not loaded R6 yet; add 6 3 7 is in EX/Mem and nand is in Mem/WB.]
35
End of cycle 6
[Figure: end of cycle 6: sw 6 2 12 is held in IF/ID and a bubble enters ID/EX; lw has computed its address, 21 + 10 = 31, in EX/Mem; add 6 3 7 (result 22) is in Mem/WB; R5 now holds -2.]
36
First half of cycle 7
Hazard
[Figure: first half of cycle 7: sw 6 2 12 decodes again; its regA (6) still matches the dest of lw, now in EX/Mem; a noop is in ID/EX and add 6 3 7 is in Mem/WB.]
37
End of cycle 7
[Figure: end of cycle 7: sw has moved to ID/EX with offset 12; a noop is in EX/Mem; lw has loaded 99 into Mem/WB; R7 now holds 22.]
38
First half of cycle 8
[Figure: first half of cycle 8: sw executes, with the loaded value 99 forwarded from Mem/WB for its regA (R6); a noop is in EX/Mem.]
39
End of cycle 8
[Figure: end of cycle 8: sw is in EX/Mem with address 99 + 12 = 111 and store data 7 (R2); a noop is in Mem/WB; R6 now holds 99.]
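The timing of the whole forwarding walkthrough can be checked with a small model. Assumptions: in-order issue with one instruction finishing decode per cycle; ALU results can be forwarded to the very next instruction; a lw result exists only after its Memory stage, so a dependent instruction must finish decode at least two cycles after the lw does. Operands follow the slides' register fields (lw 3 6 10 reads R3 and writes R6; sw 6 2 12 reads R6 and R2). `Inst` and `decode_cycles_fwd` are hypothetical names. Only sw is delayed, by one cycle, and it executes in cycle 8, which agrees with the slides.

```python
from typing import List, NamedTuple, Optional, Tuple

class Inst(NamedTuple):
    op: str
    srcs: Tuple[int, ...]   # registers the instruction reads
    dest: Optional[int]     # register it writes, or None

def decode_cycles_fwd(program: List[Inst]) -> List[int]:
    """Cycle in which each instruction completes decode with full forwarding."""
    done: List[int] = []
    for i, inst in enumerate(program):
        d = i + 2 if not done else max(i + 2, done[-1] + 1)
        for j, producer in enumerate(program[:i]):
            if producer.op == "lw" and producer.dest in inst.srcs:
                # Load data exists at the end of the lw's Memory stage
                # (cycle done[j] + 2); the consumer executes in cycle d + 1.
                d = max(d, done[j] + 2)
        done.append(d)
    return done

program = [
    Inst("add", (1, 2), 3),    # add 1 2 3
    Inst("nand", (3, 4), 5),   # nand 3 4 5
    Inst("add", (6, 3), 7),    # add 6 3 7
    Inst("lw", (3,), 6),       # lw 3 6 10
    Inst("sw", (6, 2), None),  # sw 6 2 12
]
dec = decode_cycles_fwd(program)
print(dec)                                  # [2, 3, 4, 5, 7]
print("sw executes in cycle", dec[-1] + 1)  # 8
```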
40
FP pipeline support
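[Figure: after fetch and decode, instructions enter one of four execution paths (an integer unit, a pipelined FP multiplier with stages M1-M7, a pipelined FP adder with stages A1-A4, or a non-pipelined divider) and then pass through Mem and WB.]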
41
Adding pipeline stages
  • Pipeline frontend
  • Fetch, Decode
  • Pipeline middle
  • Execute
  • Pipeline backend
  • Memory, Writeback

42
Adding stages to fetch, decode
  • Delays hazard detection
  • No change in forwarding paths
  • No performance penalty with respect to data
    hazards

43
Adding stages to execute
  • Check for structural hazards
  • ALU not pipelined
  • Multiple ALU ops completing at same time
  • Data hazards may cause delays
  • If multicycle op hasn't computed data before the
    dependent instruction is ready to execute
  • Performance penalty for each stall
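When execution paths have different lengths, two instructions can reach writeback in the same cycle. This is a minimal sketch of one way to handle that, assuming the stage counts from the FP pipeline slide (1 cycle for integer ops, 4 for the FP adder A1-A4, 7 for the FP multiplier M1-M7), a single register-file write port, and in-order issue that stalls in decode until the instruction's writeback cycle is free. Data hazards and the non-pipelined divider are ignored, and `issue_cycles` is a hypothetical name.

```python
from typing import Dict, List, Set

# Execute-stage latency in cycles (assumed from the FP pipeline slide).
LATENCY: Dict[str, int] = {"int": 1, "fadd": 4, "fmul": 7}

def issue_cycles(ops: List[str]) -> List[int]:
    """Cycle in which each op leaves decode, stalling on writeback conflicts."""
    wb_reserved: Set[int] = set()  # cycles in which the write port is taken
    result: List[int] = []
    cycle = 0
    for op in ops:
        cycle += 1  # earliest issue: one cycle after the previous instruction
        # Execute takes LATENCY[op] cycles, then one Memory cycle, then WB.
        while cycle + LATENCY[op] + 2 in wb_reserved:
            cycle += 1  # structural hazard: two writes in one cycle
        wb_reserved.add(cycle + LATENCY[op] + 2)
        result.append(cycle)
    return result

ops = ["fmul"] + ["int"] * 6
print(issue_cycles(ops))  # [1, 2, 3, 4, 5, 6, 8]: the last int would collide with fmul
```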

44
Adding stages to memory, writeback
  • Instructions ready to execute may need to wait
    longer for multi-cycle memory stage
  • Adds more pipeline registers
  • Thus more source registers to forward
  • More complex hazard detection
  • Wider muxes
  • More control bits to manage muxes

45
Wider pipelines
[Figure: two five-stage pipelines (fetch, decode, execute, mem, WB) running side by side.]
More complex hazard detection:
  • 2X pipeline registers to forward from
  • 2X more instructions to check
  • 2X more destinations (muxes)
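A rough count shows why the detection logic grows faster than the width. Assume each of the W instructions in decode compares its two sources against the destRegs of all W instructions in ID/EX and in EX/Mem (the four comparisons of the scalar pipeline when W = 1), plus the destRegs of older instructions decoded in the same cycle. `comparators` is a hypothetical helper, and the count ignores everything else a real design would need.

```python
def comparators(width: int, checked_stages: int = 2) -> int:
    """Number of 3-bit register-number comparators in hazard detection."""
    sources = 2 * width                       # regA and regB per decoding instruction
    older_in_flight = checked_stages * width  # destRegs in ID/EX and EX/Mem
    same_group = width * (width - 1)          # 2 sources x older peers, summed over the group
    return sources * older_in_flight + same_group

for w in (1, 2, 4):
    print(w, comparators(w))  # 1 4, then 2 18, then 4 76
```

Under these assumptions, going from one to two instructions per cycle raises the count from 4 to 18, because the 2X factors on this slide multiply.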
46
Making forwarding explicit
  • add r1 ← r2, EX/Mem ALU result
  • Include direct mux controls into the ISA
  • Hazard detection is now a compiler task
  • New micro-architecture leads to new ISA
  • Can reduce some resources
  • No longer need to build a heavily ported reg file

Ref: "TTAs: Missing the ILP complexity wall"
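With forwarding exposed in the ISA, the compiler rather than the hardware decides where each operand comes from. This is a minimal sketch of such a pass, assuming a hypothetical encoding in which each source operand names either a register or one of two forwarding locations (the EX/Mem ALU result or the Mem/WB value), and assuming a lw result cannot be used by the very next instruction. `Inst` and `assign_sources` are illustrative names, not part of any real ISA.

```python
from typing import List, NamedTuple, Optional, Tuple

class Inst(NamedTuple):
    op: str
    srcs: Tuple[int, ...]
    dest: Optional[int]

def assign_sources(program: List[Inst]) -> List[Tuple[str, ...]]:
    """For each instruction, name the location every source operand is read from."""
    selected: List[Tuple[str, ...]] = []
    for i, inst in enumerate(program):
        operands = []
        for reg in inst.srcs:
            choice = f"r{reg}"  # default: read the register file
            for distance, location in ((1, "EX/Mem ALU result"), (2, "Mem/WB")):
                j = i - distance
                if j >= 0 and program[j].dest == reg:
                    if program[j].op == "lw" and distance == 1:
                        raise ValueError(f"instruction {i}: load-use, insert a noop")
                    choice = location
                    break  # the nearest producer holds the newest value
            operands.append(choice)
        selected.append(tuple(operands))
    return selected

program = [
    Inst("add", (1, 2), 3),   # add 1 2 3
    Inst("nand", (3, 4), 5),  # nand 3 4 5
    Inst("add", (6, 3), 7),   # add 6 3 7
]
for inst, ops in zip(program, assign_sources(program)):
    print(inst.op, ops)
```

Because the selectors are fixed at compile time, the hardware needs no comparators for these operands, which is the resource saving the slide points to.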