Chapter Six - PowerPoint PPT Presentation

About This Presentation

Title:

Chapter Six

Description:

Multiple tasks operating simultaneously using different resources ... Exec: Calculate the memory address. Mem: Read the data from the Data Memory ... – PowerPoint PPT presentation

Slides: 33

Provided by: toda52

Learn more at: http://www.engr.newpaltz.edu

Transcript and Presenter's Notes

1
Chapter Six
Enhancing Performance with Pipelining
2
Sequential Laundry
[Figure: laundry timeline from 6 PM to 2 AM. Axes: Time, Task Order. Each step takes 30 minutes.]

Sequential laundry takes 8 hours for 4 loads
If they learned pipelining, how long would
laundry take?

3
Pipelined Laundry

Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
Multiple tasks operate simultaneously using different resources
Potential speedup = number of pipe stages
Pipeline rate is limited by the slowest pipeline stage
Unbalanced lengths of pipe stages reduce speedup
Time to fill the pipeline and time to drain it reduce speedup
Stall for dependencies

[Figure: pipelined laundry timeline from 6 PM to 9 PM. Axes: Time, Task Order.]
4
Single Stage VS. Pipeline Performance

Ideal speedup is number of stages in
the pipeline.
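
The laundry arithmetic and the ideal-speedup claim can be checked with a short calculation. This is a minimal sketch that assumes perfectly balanced stages and no stalls. The function names are illustrative, and the laundry parameters (4 stages of 30 minutes each) are an assumption chosen to match the 8 hours quoted for 4 sequential loads.

```python
def sequential_time(tasks, stages, stage_time):
    """Total time when each task finishes every stage before the next task starts."""
    return tasks * stages * stage_time


def pipelined_time(tasks, stages, stage_time):
    """Total time with balanced stages: the first task fills the pipeline,
    then one task completes per stage time."""
    return (stages + tasks - 1) * stage_time


def speedup(tasks, stages, stage_time=1):
    """Ratio of sequential time to pipelined time for the same workload."""
    return sequential_time(tasks, stages, stage_time) / pipelined_time(tasks, stages, stage_time)


# Assumed laundry parameters: 4 loads, 4 stages, 30 minutes per stage.
print(sequential_time(4, 4, 30) / 60)  # 8.0 hours
print(pipelined_time(4, 4, 30) / 60)   # 3.5 hours
print(speedup(4, 4))                   # below 4 because of fill and drain time
print(speedup(1_000_000, 4))           # approaches the number of stages
```

With only four loads the fill and drain time keeps the speedup well under 4; it approaches the number of stages only as the number of tasks grows.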

5
Pipelining

What makes it easy in MIPS?
all instructions are the same length
just a few instruction formats
memory operands appear only in loads and stores
What makes it hard?
structural hazards: suppose we had only one memory
control hazards: need to worry about branch instructions
data hazards: an instruction depends on a previous instruction
We'll build a simple pipeline and look at these issues

6
The Five Stages of Load

Ifetch: Instruction Fetch
Fetch the instruction from the Instruction Memory
Reg/Dec: Registers Fetch and Instruction Decode
Exec: Calculate the memory address
Mem: Read the data from the Data Memory
Wr: Write the data back to the register file

7
Single Cycle, Multiple Cycle, vs. Pipeline
[Figure: clock timing for three implementations. Single Cycle Implementation: Load and Store over Cycle 1 and Cycle 2, with wasted time marked. Multiple Cycle Implementation: Load, Store, R-type over Cycle 1 to Cycle 10. Pipeline Implementation: Load, Store, R-type.]
8
Basic Idea
[Figure: datapath divided into pipeline stages. Surviving stage labels: Execute/address calculation; MEM: Memory access; WB: Write back.]
9
Pipelined Datapath

Walk through lw instruction
Walk through sw instruction
The design

10
Corrected Datapath
11
Graphically Representing Pipelines

Can help with answering questions like:
how many cycles does it take to execute this code?
what is the ALU doing during cycle 4?
Use this representation to help understand datapaths
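
Both questions can be answered mechanically from a multiple-clock-cycle diagram. The sketch below builds one for an ideal five-stage pipeline, assuming one instruction enters per cycle and nothing stalls. The stage abbreviations, the function names, and the three-instruction program are illustrative assumptions, not taken from the slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def schedule(instructions):
    """Map each instruction to {cycle: stage}, assuming one issue per cycle and no stalls."""
    table = []
    for i, text in enumerate(instructions):
        row = {i + 1 + s: name for s, name in enumerate(STAGES)}
        table.append((text, row))
    return table


def total_cycles(instructions):
    """Cycles needed to run the whole sequence through the pipeline."""
    return len(instructions) + len(STAGES) - 1


def stage_occupant(instructions, stage, cycle):
    """Return the instruction in `stage` during `cycle`, or None if the stage is idle."""
    for text, row in schedule(instructions):
        if row.get(cycle) == stage:
            return text
    return None


def render(instructions):
    """Return the diagram as text: one row per instruction, one column per cycle."""
    n = total_cycles(instructions)
    width = max(len(t) for t in instructions)
    lines = [" " * width + "".join(f"{c:>5}" for c in range(1, n + 1))]
    for text, row in schedule(instructions):
        cells = "".join(f"{row.get(c, ''):>5}" for c in range(1, n + 1))
        lines.append(text.ljust(width) + cells)
    return "\n".join(lines)


# Hypothetical program used only for illustration.
program = ["lw $10, 20($1)", "sub $11, $2, $3", "add $12, $3, $4"]
print(render(program))
print(total_cycles(program))
print(stage_occupant(program, "EX", 4))
```

Reading down the column for cycle 4 shows which instruction occupies the ALU (the EX stage) in that cycle.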

12
Why Pipeline? Because the resources are there!
[Figure: pipeline diagram of Inst 0 through Inst 4. Axes: Time (clock cycles), Instr. Order.]
13
Pipeline Control
14
Pipeline Control

Pass control signals along just like the data

15
Datapath with Control
16
Designing a Pipelined Processor

Go back and examine your datapath and control diagram
associate resources with states
ensure that flows do not conflict, or figure out how to resolve conflicts
assert control in the appropriate stage

17
Can pipelining get us into trouble?

Yes: pipeline hazards
structural hazards: attempt to use the same resource two different ways at the same time
data hazards: attempt to use an item before it is ready
instruction depends on the result of a prior instruction still in the pipeline
control hazards: attempt to make a decision before the condition is evaluated
branch instructions
Can always resolve hazards by waiting
pipeline control must detect the hazard
take action (or delay action) to resolve hazards

18
Data Hazards

Problem with starting next instruction before
first is finished
dependencies that go backward in time are data
hazards

[Figure: pipeline diagram of the dependences on register 2.]
19
Software Solution

Have the compiler guarantee no hazards
Where should the compiler insert nop instructions?
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
Problem:
It happens too often to rely on the compiler
It really slows us down!
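
One way to answer the nop question for the five-instruction sequence is to scan the program and pad each dependent instruction until its producer has written back. This is a minimal sketch, not a real compiler pass. It assumes a five-stage pipeline without forwarding in which the register file is written in the first half of a cycle and read in the second half, so a consumer must sit at least three slots after its producer. The tuple encoding and the name `insert_nops` are hypothetical.

```python
NOP = ("nop", None, ())

# Assumption: a value written in WB can be read in ID during the same cycle,
# so a consumer must be at least 3 instruction slots after its producer.
MIN_DISTANCE = 3


def insert_nops(program, min_distance=MIN_DISTANCE):
    """Return a copy of `program` padded with nops so that no instruction
    reads a register before its producer has written it back.

    Each instruction is (text, dest_register_or_None, source_registers).
    """
    out = []
    window = min_distance - 1
    for instr in program:
        sources = instr[2]
        recent = out[-window:] if window > 0 else []
        needed = 0
        # back = 1 is the instruction directly before, back = 2 the one before that.
        for back, prev in enumerate(reversed(recent), start=1):
            dest = prev[1]
            # Register 0 is hardwired to zero in MIPS, so it never causes a hazard.
            if dest is not None and dest != 0 and dest in sources:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append(instr)
    return out


program = [
    ("sub $2, $1, $3", 2, (1, 3)),
    ("and $12, $2, $5", 12, (2, 5)),
    ("or $13, $6, $2", 13, (6, 2)),
    ("add $14, $2, $2", 14, (2, 2)),
    ("sw $15, 100($2)", None, (15, 2)),
]
for text, _, _ in insert_nops(program):
    print(text)
```

Under these assumptions the two nops land directly after the sub; the or, add, and sw are then far enough from it to read register 2 safely.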

20
Data Hazard Solution Forwarding

Use temporary results; don't wait for them to be written
Also, write the register file during the 1st half of the clock cycle and read during the 2nd half

[Figure: pipeline diagram with forwarding, showing the value of register 2 and the values of EX/MEM and MEM/WB in each cycle.]
21
Hazard Conditions

Steer the result from the previous instruction to the ALU
EX hazard
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
ForwardA = 10
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
ForwardB = 10
MEM hazard
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
ForwardA = 01
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
ForwardB = 01
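
The four conditions translate almost directly into code. The sketch below models only the pipeline-register fields that the conditions read; the class and field names are hypothetical. One assumption goes beyond the slide: when both an EX hazard and a MEM hazard match the same operand, the EX/MEM value is taken because it is the more recent result.

```python
from dataclasses import dataclass


@dataclass
class PipelineState:
    """Only the pipeline-register fields that the forwarding conditions read."""
    id_ex_rs: int
    id_ex_rt: int
    ex_mem_reg_write: bool
    ex_mem_rd: int
    mem_wb_reg_write: bool
    mem_wb_rd: int


def forwarding_unit(s):
    """Return (ForwardA, ForwardB) as 2-bit strings; "00" means no forwarding."""
    forward_a = "00"
    forward_b = "00"

    # MEM hazard: the result sits in the MEM/WB pipeline register.
    if s.mem_wb_reg_write and s.mem_wb_rd != 0:
        if s.mem_wb_rd == s.id_ex_rs:
            forward_a = "01"
        if s.mem_wb_rd == s.id_ex_rt:
            forward_b = "01"

    # EX hazard: evaluated last so that the newer EX/MEM result wins
    # when both hazards match (an assumption, see the note above).
    if s.ex_mem_reg_write and s.ex_mem_rd != 0:
        if s.ex_mem_rd == s.id_ex_rs:
            forward_a = "10"
        if s.ex_mem_rd == s.id_ex_rt:
            forward_b = "10"

    return forward_a, forward_b


# "and $12, $2, $5" in EX while "sub $2, $1, $3" is in MEM.
print(forwarding_unit(PipelineState(2, 5, True, 2, False, 0)))
```

The check against register 0 matters because a write to register 0 is discarded, so there is nothing valid to forward.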

22
Forwarding
[Figure: datapath with forwarding unit. Signals shown: IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd, EX/MEM.RegisterRd, MEM/WB.RegisterRd.]
Forwarding mux control values:
00: Register file
01: Memory or earlier ALU result
10: Prior ALU result
23
Can't always forward

lw can still cause a hazard:
an instruction tries to read a register immediately after a load instruction that writes the same register
Thus, we need a hazard detection unit to stall the instruction that follows the load

24
Stalling

We can stall the pipeline by keeping an
instruction in the same stage
Repeat in clock cycle 4 what they did in clock
cycle 3

25
Hazard Detection Unit

Stall by letting an instruction that won't write anything go forward
The unit controls the writing of the PC and IF/ID registers, plus the MUX
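
The load-use check can be sketched as a small function. The condition used here (the instruction in EX is a load whose destination matches a source of the instruction in ID) is the usual textbook test and is an assumption, since the slide does not spell it out. The signal names in the returned dictionary are hypothetical.

```python
def hazard_detection(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Decide whether the instruction in ID must stall for one cycle.

    id_ex_mem_read: True when the instruction in EX is a load.
    id_ex_rt:       destination register of that load.
    if_id_rs/rt:    source registers of the instruction in ID.
    """
    stall = bool(id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt))
    return {
        "pc_write": not stall,     # hold the PC so the same instruction is fetched again
        "if_id_write": not stall,  # hold IF/ID so the same instruction is decoded again
        "bubble": stall,           # MUX sends an instruction that writes nothing forward
    }


# A load into $2 followed directly by an instruction that reads $2 and $5.
print(hazard_detection(True, 2, 2, 5))
```

Holding the PC and IF/ID repeats the same fetch and decode in the next cycle, while the bubble moves down the pipeline in place of the stalled instruction.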

26
Branch Hazards

When we decide to branch, other instructions are
in the pipeline!
We are predicting branch not taken
need to add hardware for flushing instructions if
we are wrong

27
Flushing Instructions

Reduce branch delay

28
Improving Performance

Superpipelining: ideal maximum speedup is related to the number of stages
Superscalar: start more than one instruction in the same cycle
Dynamic pipeline scheduling
Try to avoid stalls! E.g., reorder these instructions:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)
sw $t0, 4($t1)
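
A quick way to see why reordering helps is to count load-use stalls before and after. The sketch below assumes that any instruction reading a register loaded by the lw directly before it costs one stall cycle, and that a store's data register counts as a read at that point. The tuple encoding and the function name are hypothetical. Swapping the two stores is safe here because they write different addresses, 0($t1) and 4($t1).

```python
def load_use_stalls(program):
    """Count stalls where an instruction reads a register loaded by the lw
    directly before it.

    Each instruction is (op, dest_register_or_None, source_registers).
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in cur[2]:
            stalls += 1
    return stalls


original = [
    ("lw", "$t0", ("$t1",)),        # lw $t0, 0($t1)
    ("lw", "$t2", ("$t1",)),        # lw $t2, 4($t1)
    ("sw", None, ("$t2", "$t1")),   # sw $t2, 0($t1)
    ("sw", None, ("$t0", "$t1")),   # sw $t0, 4($t1)
]

# Same four instructions with the two stores swapped.
reordered = [original[0], original[1], original[3], original[2]]

print(load_use_stalls(original))   # 1
print(load_use_stalls(reordered))  # 0
```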

29
Dynamic Scheduling

The hardware performs the scheduling
hardware tries to find instructions to execute
out-of-order execution is possible
speculative execution and dynamic branch prediction
All modern processors are very complicated
DEC Alpha 21264: 9-stage pipeline, 6-instruction issue
PowerPC and Pentium: branch history table
Compiler technology is important

30
Dynamic Scheduling in PowerPC 604 and Pentium Pro

Both: in-order issue, out-of-order execution, in-order commit
Pentium Pro: central reservation station for any functional unit, with one bus shared by a branch unit and an integer unit

31
Dynamic Scheduling in Pentium Pro

PPro doesn't pipeline 80x86 instructions
PPro decode unit translates the Intel instructions into 72-bit micro-operations (similar to MIPS instructions)
Sends micro-operations to the reorder buffer and reservation stations
Takes 1 clock cycle to determine the length of 80x86 instructions, plus 2 more to create the micro-operations
Most instructions translate to 1 to 4 micro-operations
Complex 80x86 instructions are executed by a conventional microprogram (8K x 72 bits) that issues long sequences of micro-operations