Chapter Six - PowerPoint PPT Presentation

Slides: 33
Provided by: toda52

1
Chapter Six
Enhancing Performance with Pipelining
2
Sequential Laundry
[Figure: four loads of laundry done one after another between 6 PM and 2 AM, each load made of four 30-minute steps; axes: Time, Task Order.]
  • Sequential laundry takes 8 hours for 4 loads
  • If they learned pipelining, how long would
    laundry take?
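The question above can be answered with a little arithmetic. A minimal Python sketch, assuming (as the figure suggests) 4 loads, 4 steps per load, and 30 minutes per step; the function names are illustrative only:

```python
def sequential_minutes(loads: int, steps: int, step_minutes: int) -> int:
    # Each load runs all of its steps before the next load starts.
    return loads * steps * step_minutes


def pipelined_minutes(loads: int, steps: int, step_minutes: int) -> int:
    # The first load needs `steps` slots to finish; after that, one load
    # completes every slot, so each remaining load adds only one slot.
    return (steps + loads - 1) * step_minutes


seq = sequential_minutes(4, 4, 30)
pipe = pipelined_minutes(4, 4, 30)
print(f"sequential: {seq / 60} h, pipelined: {pipe / 60} h")
# sequential: 8.0 h, pipelined: 3.5 h
```

Under these assumptions the total drops from 8 hours to 3.5 hours, even though each individual load still takes 2 hours from start to finish.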

3
Pipelined Laundry
  • Pipelining doesn't help latency of a single task;
    it helps throughput of the entire workload
  • Multiple tasks operating simultaneously using
    different resources
  • Potential speedup = number of pipe stages
  • Pipeline rate is limited by the slowest pipeline stage
  • Unbalanced lengths of pipe stages reduce speedup
  • Time to fill the pipeline and time to drain it
    reduce speedup
  • Stall for dependencies
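The effect of unbalanced stages and of fill/drain time can be made concrete. This sketch assumes every time slot must fit the slowest stage. The stage lengths 30, 40 and 20 are hypothetical and do not come from the slides:

```python
from typing import Sequence


def pipeline_speedup(stage_times: Sequence[float], tasks: int) -> float:
    """Speedup of a pipeline over doing each task start to finish."""
    k = len(stage_times)
    # Unpipelined: every task pays the sum of all its stage times.
    unpipelined = tasks * sum(stage_times)
    # Pipelined: every slot must fit the slowest stage, and the first task
    # needs k slots (fill) before one task completes per slot.
    pipelined = (k + tasks - 1) * max(stage_times)
    return unpipelined / pipelined


print(pipeline_speedup([30, 40, 20], 4))      # 1.5
print(pipeline_speedup([30, 40, 20], 1000))   # approaches 90/40 = 2.25, not 3
print(pipeline_speedup([30, 30, 30], 1000))   # balanced stages approach 3
```

With these hypothetical times, 4 tasks give a speedup of only 1.5. Even a very long workload tops out near 2.25 instead of the 3 that the stage count alone suggests.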

[Figure: the same four loads overlapped in a pipeline, on a time axis marked 6 PM to 9 PM; axes: Time, Task Order.]
4
Single Stage vs. Pipeline Performance
  • Ideal speedup equals the number of stages in
    the pipeline.
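Under ideal conditions (perfectly balanced stages, no pipeline-register overhead, no stalls), this can be written as:

```latex
\text{Time between instructions}_{\text{pipelined}}
  = \frac{\text{Time between instructions}_{\text{nonpipelined}}}
         {\text{Number of pipe stages}}
```

For example, suppose an instruction takes a hypothetical 1000 ps unpipelined. Split into 5 balanced stages, it could complete one instruction every 1000 / 5 = 200 ps once the pipeline is full. Real stages are rarely balanced, so the actual gain is smaller.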

5
Pipelining
  • What makes it easy in MIPS?
  • all instructions are the same length
  • just a few instruction formats
  • memory operands appear only in loads and stores
  • What makes it hard?
  • structural hazards: suppose we had only one
    memory
  • control hazards: need to worry about branch
    instructions
  • data hazards: an instruction depends on a
    previous instruction
  • We'll build a simple pipeline and look at these
    issues
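To see why a single memory causes a structural hazard, look at which instructions need memory in each cycle. This minimal sketch makes three assumptions. One instruction enters the five-stage pipeline per clock cycle. There are no stalls. Only lw and sw access data memory, in their MEM stage. The function name is hypothetical:

```python
def single_memory_conflicts(ops: list[str]) -> list[tuple[int, int, int]]:
    """Return (cycle, mem_instr, fetch_instr) for every clock cycle in which
    a load/store in MEM and an instruction fetch in IF would both need the
    one shared memory. Instructions are numbered from 0."""
    conflicts = []
    for i, op in enumerate(ops):
        if op not in ("lw", "sw"):
            continue
        # Instruction i is fetched in cycle i + 1, so it reaches MEM
        # (the 4th stage) in cycle i + 4 ...
        mem_cycle = i + 4
        # ... which is exactly the cycle in which instruction i + 3 is fetched.
        fetcher = i + 3
        if fetcher < len(ops):
            conflicts.append((mem_cycle, i, fetcher))
    return conflicts


print(single_memory_conflicts(["lw", "add", "sub", "or", "and"]))
# [(4, 0, 3)]
```

Separate instruction and data memories, as in the MIPS pipelined datapath, remove this conflict.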

6
The Five Stages of Load
  • Ifetch: Instruction Fetch
  • Fetch the instruction from the Instruction Memory
  • Reg/Dec: Register Fetch and Instruction Decode
  • Exec: Calculate the memory address
  • Mem: Read the data from the Data Memory
  • Wr: Write the data back to the register file
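The five steps can be traced in software. This is a behavioral sketch, not a hardware model, and it runs on a hypothetical toy machine. Instructions are Python tuples `("lw", rt, offset, rs)`, registers are a dict, and data memory is a dict indexed by byte address. All names are illustrative:

```python
def execute_load(instr_mem: dict, pc: int, regs: dict, data_mem: dict) -> None:
    """Trace one lw through the five steps on a toy machine."""
    # Ifetch: fetch the instruction from the instruction memory.
    instr = instr_mem[pc]
    # Reg/Dec: decode the fields and read the base register.
    op, rt, offset, rs = instr
    if op != "lw":
        raise ValueError("this sketch only handles lw")
    base = regs[rs]
    # Exec: calculate the memory address (base + signed offset).
    address = base + offset
    # Mem: read the data from the data memory.
    value = data_mem[address]
    # Wr: write the data back to the register file.
    regs[rt] = value


# lw $8, 4($9) with $9 = 100 and Mem[104] = 42
regs = {8: 0, 9: 100}
execute_load({0: ("lw", 8, 4, 9)}, 0, regs, {104: 42})
print(regs[8])  # 42
```

In the pipeline, each of these steps runs in its own stage, so up to five different instructions can be at different steps at the same time.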

7
Single Cycle, Multiple Cycle, vs. Pipeline
[Figure: Single Cycle Implementation: Load and Store each take one long clock cycle (Cycle 1, Cycle 2), with wasted time in the Store cycle. Multiple Cycle Implementation: Load, Store, and R-type run over Cycles 1 to 10 of a shorter clock. Pipeline Implementation: Load, Store, and R-type overlap.]
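The comparison in the figure can be reproduced with rough numbers. All of the following figures are illustrative assumptions:
  • a short clock of 1 time unit
  • a single-cycle clock 5 units long, since it must fit the slowest instruction, the load
  • multicycle step counts of 5 for lw and 4 for sw and R-type
  • a 5-stage pipeline with no stalls

```python
MULTICYCLE_STEPS = {"lw": 5, "sw": 4, "R-type": 4}  # assumed step counts


def single_cycle_time(program: list[str], long_clock: int = 5) -> int:
    # One long cycle per instruction; shorter instructions waste the rest.
    return len(program) * long_clock


def multi_cycle_time(program: list[str], clock: int = 1) -> int:
    # Each instruction takes only as many short cycles as it needs.
    return sum(MULTICYCLE_STEPS[op] for op in program) * clock


def pipelined_time(program: list[str], clock: int = 1, stages: int = 5) -> int:
    # Fill the pipeline once, then finish one instruction per short cycle.
    return (stages + len(program) - 1) * clock


program = ["lw", "sw", "R-type"]
print(single_cycle_time(program), multi_cycle_time(program), pipelined_time(program))
# 15 13 7
```

Under these assumptions the three instructions take 15, 13, and 7 time units respectively. The gap widens as the program gets longer, because the pipeline pays its fill time only once.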
8
Basic Idea
[Figure: the datapath divided into pipeline stages, with labels including Execute/address calculation, EX/MEM, Memory access (MEM), and Write back (WB).]
9
Pipelined Datapath
  • Walk through lw instruction
  • Walk through sw instruction
  • The design

10
Corrected Datapath
11
Graphically Representing Pipelines
  • Can help with answering questions like
  • how many cycles does it take to execute this
    code?
  • what is the ALU doing during cycle 4?
  • use this representation to help understand
    datapaths
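Questions like these can be answered mechanically from a multiple-clock-cycle diagram. This minimal sketch assumes the five MIPS stages IF, ID, EX, MEM, WB, one instruction issued per cycle, and no stalls. The helper names are hypothetical:

```python
from typing import Optional

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_of(instr: int, cycle: int) -> Optional[str]:
    """Stage occupied by instruction `instr` (0-based) during clock cycle
    `cycle` (1-based), or None if it is not in the pipeline then."""
    pos = cycle - 1 - instr  # instruction i enters IF in cycle i + 1
    return STAGES[pos] if 0 <= pos < len(STAGES) else None


def total_cycles(n: int) -> int:
    # The first instruction needs one cycle per stage; each later one
    # finishes a single cycle after its predecessor.
    return len(STAGES) + n - 1


def print_diagram(n: int) -> None:
    cycles = range(1, total_cycles(n) + 1)
    print("       " + " ".join(f"CC{c:<2}" for c in cycles))
    for i in range(n):
        cells = (stage_of(i, c) or "" for c in cycles)
        print(f"Inst {i} " + " ".join(f"{s:<4}" for s in cells))


print_diagram(5)
print(total_cycles(5))                                   # 9
print([i for i in range(5) if stage_of(i, 4) == "EX"])   # [1]
```

So five instructions take 9 cycles. During cycle 4, the ALU is working on Inst 1.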

12
Why Pipeline? Because the resources are there!
[Figure: Inst 0 to Inst 4 overlapped in the pipeline; axes: Time (clock cycles), Instruction Order.]
13
Pipeline Control
14
Pipeline Control
  • Pass control signals along just like the data
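One way to picture this is to group the control values generated in ID by the stage that uses them. Each pipeline register then carries only the groups still needed downstream. This minimal sketch assumes the EX/M/WB grouping of the MIPS pipelined control and shows only a few representative signals, set for lw:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExControl:          # consumed in the EX stage
    reg_dst: int
    alu_op: int
    alu_src: int


@dataclass(frozen=True)
class MemControl:         # consumed in the MEM stage
    branch: int
    mem_read: int
    mem_write: int


@dataclass(frozen=True)
class WbControl:          # consumed in the WB stage
    reg_write: int
    mem_to_reg: int


# Control values decoded in ID for lw: the ALU adds (ALUOp = 00) the base
# register and the immediate (ALUSrc = 1), data memory is read, and the
# loaded value (MemtoReg = 1) is written to register rt (RegDst = 0).
lw_control = {
    "EX": ExControl(reg_dst=0, alu_op=0b00, alu_src=1),
    "M": MemControl(branch=0, mem_read=1, mem_write=0),
    "WB": WbControl(reg_write=1, mem_to_reg=1),
}

# Each pipeline register carries only the groups later stages still need.
id_ex = dict(lw_control)                        # EX, M, WB
ex_mem = {k: id_ex[k] for k in ("M", "WB")}     # EX group used up in EX
mem_wb = {k: ex_mem[k] for k in ("WB",)}        # M group used up in MEM

print(list(id_ex), list(ex_mem), list(mem_wb))
# ['EX', 'M', 'WB'] ['M', 'WB'] ['WB']
```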

15
Datapath with Control
16
Designing a Pipelined Processor
  • Go back and examine your datapath and control
    diagram
  • associate resources with states
  • ensure that flows do not conflict, or figure out
    how to resolve them
  • assert control in the appropriate stage

17
Can pipelining get us into trouble?
  • Yes: pipeline hazards
  • structural hazards: attempt to use the same
    resource two different ways at the same time
  • data hazards: attempt to use an item before it is
    ready
  • an instruction depends on the result of a prior
    instruction still in the pipeline
  • control hazards: attempt to make a decision
    before the condition is evaluated
  • branch instructions
  • Can always resolve hazards by waiting
  • pipeline control must detect the hazard
  • take action (or delay action) to resolve hazards

18
Data Hazards
  • Problem with starting the next instruction before
    the first is finished
  • dependencies that go backward in time are data
    hazards

[Figure: pipeline diagram of instructions that depend on register $2, with register file (Reg) and data memory (DM) stages shown.]
19
Software Solution
  • Have the compiler guarantee no hazards
  • Where should the compiler insert nop
    instructions?
      sub $2, $1, $3
      and $12, $2, $5
      or  $13, $6, $2
      add $14, $2, $2
      sw  $15, 100($2)
  • Problem:
  • It happens too often to rely on the compiler
  • It really slows us down!
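A compiler pass that answers this question might look like the sketch below. It assumes the five-stage pipeline without forwarding. It also assumes the register file is written in the first half of a cycle and read in the second half, so a consumer must issue at least three slots after its producer. The `(text, dest, sources)` encoding and the function name are hypothetical; sw writes no register and reads both its data and base registers:

```python
NOP = ("nop", None, ())


def insert_nops(program):
    """program: list of (text, dest_reg, src_regs) tuples, dest_reg None if
    the instruction writes no register. Returns a copy with nops inserted
    so that every consumer issues at least 3 slots after its producer."""
    out = []
    for instr in program:
        srcs = instr[2]
        needed = 0
        # Producers 3 or more slots back have already written the register
        # file by the time this instruction reads it, so only check 2 back.
        for back in (1, 2):
            if len(out) >= back:
                dest = out[-back][1]
                if dest not in (None, 0) and dest in srcs:
                    needed = max(needed, 3 - back)
        out.extend([NOP] * needed)
        out.append(instr)
    return out


program = [
    ("sub $2, $1, $3", 2, (1, 3)),
    ("and $12, $2, $5", 12, (2, 5)),
    ("or  $13, $6, $2", 13, (6, 2)),
    ("add $14, $2, $2", 14, (2, 2)),
    ("sw  $15, 100($2)", None, (15, 2)),
]
for text, _, _ in insert_nops(program):
    print(text)
```

For the sequence above the sketch inserts two nops right after sub. Every such dependence costs cycles, which is why relying on the compiler slows things down.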

20
Data Hazard Solution: Forwarding
  • Use temporary results; don't wait for them to be
    written
  • Also, write the register file during the first half
    of the clock cycle and read it during the second half

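[Figure: timeline for register $2 showing the value held in the EX/MEM and MEM/WB pipeline registers in each clock cycle (X = don't care); the new result is in EX/MEM in cycle 4 and in MEM/WB in cycle 5, from where it can be forwarded. Stages drawn include Reg and DM.]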
21
Hazard Conditions
  • Steer the result from the previous instruction to the
    ALU
  • EX hazard
  • if (EX/MEM.RegWrite
      and (EX/MEM.RegisterRd ≠ 0)
      and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 10
  • if (EX/MEM.RegWrite
      and (EX/MEM.RegisterRd ≠ 0)
      and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 10
  • MEM hazard
  • if (MEM/WB.RegWrite
      and (MEM/WB.RegisterRd ≠ 0)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
    ForwardA = 01
  • if (MEM/WB.RegWrite
      and (MEM/WB.RegisterRd ≠ 0)
      and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
    ForwardB = 01
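The conditions above translate directly into a combinational function. This sketch assumes the pipeline-register fields are passed in as plain integers and booleans, with names that mirror the slide. The sketch also goes beyond the slide in one respect: it checks the EX hazard first. When both hazards match the same register, the more recent result in EX/MEM therefore takes precedence:

```python
def forwarding_unit(ex_mem_reg_write: bool, ex_mem_rd: int,
                    mem_wb_reg_write: bool, mem_wb_rd: int,
                    id_ex_rs: int, id_ex_rt: int) -> tuple[str, str]:
    """Return (ForwardA, ForwardB) mux selects:
    "00" = register file, "10" = prior ALU result (EX/MEM),
    "01" = data memory or earlier ALU result (MEM/WB)."""

    def select(src: int) -> str:
        # EX hazard: the instruction one ahead will write the register we read.
        if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src:
            return "10"
        # MEM hazard: the instruction two ahead will write the register we read.
        if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src:
            return "01"
        return "00"

    return select(id_ex_rs), select(id_ex_rt)


# and $12, $2, $5 in EX while sub $2, $1, $3 is in MEM:
print(forwarding_unit(True, 2, False, 0, 2, 5))    # ('10', '00')
# or $13, $6, $2 in EX; and $12 is in MEM, sub $2 is in WB:
print(forwarding_unit(True, 12, True, 2, 6, 2))    # ('00', '01')
```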

22
Forwarding
[Figure: pipelined datapath with the forwarding unit. Signals shown: IF/ID.RegisterRs, IF/ID.RegisterRt, IF/ID.RegisterRd, EX/MEM.RegisterRd, MEM/WB.RegisterRd. Forwarding mux inputs: 00 = register file, 01 = data memory or earlier ALU result, 10 = prior ALU result.]
23
Can't always forward
  • lw can still cause a hazard:
  • an instruction tries to read a register following
    a load instruction that writes to the same
    register.
  • Thus, we need a hazard detection unit to stall
    the instruction that follows the load
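The hazard detection unit operates in ID. Its test follows the standard condition for the MIPS five-stage pipeline. It stalls when the instruction in EX is a load (ID/EX.MemRead) whose destination ID/EX.RegisterRt matches either source register of the instruction in ID. Field names mirror the pipeline registers; the function name is hypothetical:

```python
def load_use_stall(id_ex_mem_read: bool, id_ex_rt: int,
                   if_id_rs: int, if_id_rt: int) -> bool:
    """True if the instruction now in ID must wait one cycle for a load."""
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)


# lw $2, 20($1) in EX, and $4, $2, $5 in ID: must stall.
print(load_use_stall(True, 2, 2, 5))   # True
```

After the one-cycle stall, the loaded value can be forwarded from MEM/WB as on the earlier forwarding slides.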

24
Stalling
  • We can stall the pipeline by keeping an
    instruction in the same stage
  • The stalled instructions repeat in clock cycle 4
    what they did in clock cycle 3

25
Hazard Detection Unit
  • Stall by letting an instruction that won't write
    anything go forward
  • controls writing of the PC and the IF/ID register,
    plus the MUX that zeroes the control signals
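The effect of a stall on pipeline state can be sketched as a single clock-edge update. This is a simplified behavioral model with a hypothetical dict layout. When the hazard unit fires, PCWrite and IF/IDWrite are deasserted, so the PC and IF/ID keep their values. The control mux feeds zeros into ID/EX, so a bubble that writes nothing moves forward:

```python
def clock_edge(state: dict, next_instr: str, decoded_control: dict,
               stall: bool) -> dict:
    """Return the pipeline state after one clock edge.

    state keys (hypothetical): 'pc', 'if_id' (instruction text held in IF/ID)
    and 'id_ex_control' (control values held in ID/EX).
    next_instr: the instruction fetched at state['pc'] this cycle.
    decoded_control: control values produced in ID for state['if_id'].
    """
    new = dict(state)
    if stall:
        # PCWrite = 0 and IF/IDWrite = 0: PC and IF/ID keep their values, so
        # the same fetch and decode are repeated in the next cycle.
        # The control mux selects 0, sending a bubble that writes nothing.
        new["id_ex_control"] = {name: 0 for name in decoded_control}
    else:
        new["pc"] = state["pc"] + 4
        new["if_id"] = next_instr
        new["id_ex_control"] = dict(decoded_control)
    return new


before = {"pc": 8, "if_id": "and $4, $2, $5", "id_ex_control": {}}
after = clock_edge(before, "or $8, $2, $6", {"RegWrite": 1, "MemWrite": 0}, stall=True)
print(after["pc"], after["if_id"], after["id_ex_control"])
# 8 and $4, $2, $5 {'RegWrite': 0, 'MemWrite': 0}
```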

26
Branch Hazards
  • When we decide to branch, other instructions are
    in the pipeline!
  • We are predicting branch not taken
  • need to add hardware for flushing instructions if
    we are wrong
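The cost of predicting not taken can be estimated with a simple CPI model. The numbers below are hypothetical. The model assumes every taken branch flushes a fixed number of wrongly fetched instructions and that all other instructions take one cycle:

```python
def cpi_predict_not_taken(branch_fraction: float, taken_fraction: float,
                          flush_penalty: int, base_cpi: float = 1.0) -> float:
    """Average cycles per instruction when branches are predicted not taken.

    Only taken branches are mispredicted; each one wastes `flush_penalty`
    cycles on instructions that were fetched and then flushed."""
    return base_cpi + branch_fraction * taken_fraction * flush_penalty


# Hypothetical mix: 20% branches, half of them taken.
print(round(cpi_predict_not_taken(0.2, 0.5, 3), 2))  # 1.3
print(round(cpi_predict_not_taken(0.2, 0.5, 1), 2))  # 1.1
```

Under these assumed numbers, cutting the flush penalty from 3 cycles to 1 lowers CPI from 1.3 to 1.1. That is the payoff of reducing the branch delay.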

27
Flushing Instructions
  • Reduce branch delay

28
Improving Performance
  • Superpipelining: ideal maximum speedup is
    related to number of stages
  • Superscalar: start more than one instruction in
    the same cycle
  • Dynamic pipeline scheduling
  • Try to avoid stalls! E.g., reorder these
    instructions:
  • lw $t0, 0($t1)
  • lw $t2, 4($t1)
  • sw $t2, 0($t1)
  • sw $t0, 4($t1)
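To see why reordering helps, count load-use stalls. Even with forwarding, a stall is needed when an instruction reads a register loaded by the instruction immediately before it. This sketch assumes one stall per such pair, following the hazard-detection rule from the earlier slides. It uses a hypothetical `(op, dest, sources)` encoding; for sw the sources are the data and base registers:

```python
def count_load_use_stalls(program):
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_srcs = curr
        # A load followed directly by a reader of its destination register.
        if prev_op == "lw" and prev_dest in curr_srcs:
            stalls += 1
    return stalls


original = [
    ("lw", "$t0", ("$t1",)),          # lw $t0, 0($t1)
    ("lw", "$t2", ("$t1",)),          # lw $t2, 4($t1)
    ("sw", None, ("$t2", "$t1")),     # sw $t2, 0($t1)
    ("sw", None, ("$t0", "$t1")),     # sw $t0, 4($t1)
]
# Swap the two stores: same values go to the same addresses, but the
# store of $t2 is now two instructions after the load of $t2.
reordered = [original[0], original[1], original[3], original[2]]

print(count_load_use_stalls(original), count_load_use_stalls(reordered))
# 1 0
```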

29
Dynamic Scheduling
  • The hardware performs the scheduling
  • hardware tries to find instructions to execute
  • out-of-order execution is possible
  • speculative execution and dynamic branch
    prediction
  • All modern processors are very complicated
  • DEC Alpha 21264: 9-stage pipeline, 6-instruction
    issue
  • PowerPC and Pentium: branch history table
  • Compiler technology important
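A branch history table stores a little prediction state per branch. The slide does not say which scheme these processors use. As an illustration only, here is a common 2-bit saturating-counter predictor indexed by the low bits of the branch address; the table size and names are assumptions:

```python
class TwoBitPredictor:
    """Branch history table of 2-bit saturating counters.
    Counter values 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, entries: int = 1024):
        self.entries = entries
        self.table = [1] * entries  # start every entry weakly not taken

    def _index(self, pc: int) -> int:
        # Drop the 2 byte-offset bits of a word-aligned MIPS address.
        return (pc >> 2) % self.entries

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


# A loop branch taken 9 times and then not taken once, run twice.
bp = TwoBitPredictor()
outcomes = ([True] * 9 + [False]) * 2
correct = 0
for taken in outcomes:
    correct += bp.predict(0x400100) == taken
    bp.update(0x400100, taken)
print(correct, "of", len(outcomes))  # 17 of 20
```

On the second pass only the loop exit is mispredicted. With two bits, a single not-taken outcome does not flip the prediction for the next entry into the loop.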

30
Dynamic Scheduling in PowerPC 604 and Pentium Pro
  • Both In-order Issue, Out-of-order execution,
    In-order Commit
  • Pentium Pro: central reservation station for any
    functional unit, with one bus shared by a branch
    unit and an integer unit
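"Out-of-order execution, in-order commit" can be illustrated with a toy reorder buffer. Instructions may finish in any order, but they retire only from the head, in program order. This is a minimal sketch with hypothetical names. Real reorder buffers also hold result values, exception status, and speculative state:

```python
from collections import OrderedDict


class ReorderBuffer:
    def __init__(self):
        self.entries = OrderedDict()  # tag -> finished?, kept in issue order

    def issue(self, tag: str) -> None:
        self.entries[tag] = False      # issued in order, not yet finished

    def finish(self, tag: str) -> None:
        self.entries[tag] = True       # execution may complete out of order

    def commit(self) -> list[str]:
        """Retire finished instructions from the head, stopping at the
        first one that has not finished yet."""
        retired = []
        while self.entries:
            tag, done = next(iter(self.entries.items()))
            if not done:
                break
            self.entries.popitem(last=False)
            retired.append(tag)
        return retired


rob = ReorderBuffer()
for t in ("i1", "i2", "i3"):
    rob.issue(t)
rob.finish("i3")          # i3 completes first ...
print(rob.commit())       # [] ... but cannot commit before i1
rob.finish("i1")
print(rob.commit())       # ['i1']
rob.finish("i2")
print(rob.commit())       # ['i2', 'i3']
```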

31
Dynamic Scheduling in Pentium Pro
  • PPro doesn't pipeline 80x86 instructions
  • PPro decode unit translates the Intel
    instructions into 72-bit micro-operations (similar
    to MIPS instructions)
  • Sends micro-operations to the reorder buffer and
    reservation stations
  • Takes 1 clock cycle to determine the length of 80x86
    instructions, plus 2 more to create the
    micro-operations
  • Most instructions translate to 1 to 4
    micro-operations
  • Complex 80x86 instructions are executed by a
    conventional microprogram (8K x 72 bits) that
    issues long sequences of micro-operations

32
FYI MIPS R3000 clocking discipline
  • 2-phase non-overlapping clocks
[Figure: two-phase clock waveforms phi1 and phi2, compared with edge-triggered clocking.]