Title: Pipelining
1 Pipelining
- By
- S. Priya
- M.Swarna lakshmi
2 What is Pipelining?
- An implementation technique in which multiple instructions are overlapped in execution.
3 Architecture vs implementation
- Architecture: the design of the operations the machine performs.
- Implementation: how those operations are carried out.
- If the machine changes but the same instruction set gives the same results, the change has only been one of implementation.
- Pipelining is implementation only.
4 Steps in Pipelining
- Fetch the instruction.
- Read the registers.
- Execute the operation.
- Access an operand in data memory.
- Write the result into a register.
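In a pipeline these five steps overlap: each instruction is one step behind the one before it. A minimal sketch that prints the resulting timing diagram, assuming an ideal pipeline with no hazards and one clock cycle per step (the short stage names IF, RR, ALU, MEM, WB and the function name are chosen for illustration):

```python
# The five steps, in order (short names chosen for this sketch)
STAGES = ["IF", "RR", "ALU", "MEM", "WB"]

def pipeline_grid(n_instrs):
    """Ideal pipeline: instruction k enters IF in cycle k and moves one stage per cycle.
    Returns one row per instruction; '.' marks cycles where it is not in the pipeline."""
    n_cycles = n_instrs + len(STAGES) - 1
    grid = []
    for k in range(n_instrs):
        row = ["."] * n_cycles
        for s, stage in enumerate(STAGES):
            row[k + s] = stage
        grid.append(row)
    return grid

# Print a small diagram for three instructions
for k, row in enumerate(pipeline_grid(3)):
    print(f"I{k + 1}: " + " ".join(f"{cell:<3}" for cell in row))
```

Three instructions finish in 3 + 5 - 1 = 7 cycles instead of 15, because in every cycle after the first few, each stage is working on a different instruction.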
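5 Each instruction has stages
MIPS instruction stages and times (nsec)
Instr | Fetch | RegRead | ALU | MemData | RegWrite | Total
LW    | 2     | 1       | 2   | 2       | 1        | 8
SW    | 2     | 1       | 2   | 2       | -        | 7
ADD   | 2     | 1       | 2   | -       | 1        | 6
BEQ   | 2     | 1       | 2   | -       | -        | 5
Times in nsec for each stage
[Figure: stage diagram labeled Fet, RgR, ALU, RgW, Accs]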
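Each total in the table is the sum of the stages that instruction uses, while a pipeline must give every stage the same time slot. A small sketch recomputing both, assuming (as the table does) that SW skips the register write, ADD skips the data-memory access, and BEQ skips both; the dictionary names are illustrative:

```python
# Stage times in nsec from the table; a stage an instruction skips is simply absent.
STAGE_NS = {"Fetch": 2, "RegRead": 1, "ALU": 2, "MemData": 2, "RegWrite": 1}
USES = {
    "LW":  ["Fetch", "RegRead", "ALU", "MemData", "RegWrite"],
    "SW":  ["Fetch", "RegRead", "ALU", "MemData"],
    "ADD": ["Fetch", "RegRead", "ALU", "RegWrite"],
    "BEQ": ["Fetch", "RegRead", "ALU"],
}

def total_ns(instr):
    """Non-pipelined time: add up only the stages this instruction uses."""
    return sum(STAGE_NS[s] for s in USES[instr])

# In a pipeline every stage gets the same slot, set by the slowest stage.
pipelined_stage_ns = max(STAGE_NS.values())

for instr in USES:
    print(instr, total_ns(instr))
print("pipelined stage time:", pipelined_stage_ns)
```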
6 Three LWs, sequential and pipelined execution
7 Notes about the previous pipeline slide
- Individual instructions are actually slower!
- Non-pipelined case: 8 nsec per instruction, 24 nsec total for the three LWs.
- With pipelining, every stage takes the time of the longest stage (2 nsec), so each instruction takes 5 × 2 = 10 nsec; any one instruction is actually slower.
- But there is a great improvement in throughput!
- The time to start the 4th instruction drops from 24 nsec to 6 nsec.
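The numbers above can be reproduced from two constants. A sketch, assuming back-to-back LWs with no hazards, 8 nsec per instruction without pipelining, and a 2 nsec slot per stage with pipelining (function names are illustrative):

```python
NONPIPE_NS = 8    # one LW from start to finish, non-pipelined (slide 5)
STAGE_NS = 2      # pipelined: every stage padded to the longest stage
N_STAGES = 5

def start_time_sequential(k):
    """Instruction k (0-based) must wait for all k earlier instructions to finish."""
    return k * NONPIPE_NS

def start_time_pipelined(k):
    """A new instruction enters the pipeline every stage time."""
    return k * STAGE_NS

latency_pipelined = N_STAGES * STAGE_NS   # one instruction now takes 10 nsec

# The 4th instruction has index 3
print(start_time_sequential(3), start_time_pipelined(3), latency_pipelined)
```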
8 Ideal speedup by pipelining
$$\text{I-I time}_{\text{pipelined}} = \frac{\text{I-I time}_{\text{nonpipelined}}}{\text{number of stages}}$$
- I-I means instruction-to-instruction.
- Assumes the pipeline is full.
- Assumes no hazards.
True speedup will be less than ideal because the stages are not perfectly balanced (they have different execution times).
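As a worked check, plug in the numbers from slide 5: a non-pipelined instruction-to-instruction time of 8 nsec (LW's total, as used on slide 7), five stages, and a longest stage of 2 nsec.

$$\text{I-I time}_{\text{ideal}} = \frac{8\ \text{nsec}}{5} = 1.6\ \text{nsec}$$

$$\text{I-I time}_{\text{actual}} = \max(2, 1, 2, 2, 1)\ \text{nsec} = 2\ \text{nsec}, \qquad \text{speedup} = \frac{8\ \text{nsec}}{2\ \text{nsec}} = 4 < 5$$

The gap between 4 and the ideal 5 comes entirely from the unbalanced stages; hazards would reduce it further.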
9 Increasing throughput is worthwhile
- A program consists of many instructions.
- Speeding up the program is therefore more important than speeding up one instruction.
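Throughput is what a whole program sees. A sketch comparing the total time for n back-to-back LWs, using the 8 nsec non-pipelined time and the 2 nsec pipelined stage from slide 5 and assuming no hazards (function names are illustrative):

```python
def sequential_ns(n, instr_ns=8):
    """Non-pipelined: instructions run one after another."""
    return n * instr_ns

def pipelined_ns(n, stage_ns=2, n_stages=5):
    """The first instruction takes n_stages slots; each later one finishes one slot after the previous."""
    return (n_stages + n - 1) * stage_ns

for n in (3, 10, 1000):
    print(n, sequential_ns(n), pipelined_ns(n), round(sequential_ns(n) / pipelined_ns(n), 2))
```

For three instructions the pipeline barely helps (24 vs 14 nsec), but as n grows the ratio approaches 8/2 = 4, the steady-state speedup.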
10 What makes pipelining easy
- All instructions are the same length.
- There are just a few instruction formats.
- Memory operands appear only in loads and stores (LW and SW).
11 The major hurdle of pipelining: pipeline hazards
- Problems in pipelining that make it impossible to achieve the ideal speedup.
- Situations in which the next instruction cannot execute in its designated clock cycle.
- Hazards stall the pipeline; eliminating a hazard often requires letting some instructions proceed while others are delayed.
12 Pipeline hazards
- Three types
- structural hazards
- control hazards
- data hazards
13
- Structural hazards: resources are not available to do all stages simultaneously.
- Data hazards: an instruction depends on the result of a prior instruction that is still in the pipeline.
- Control hazards: pipelining branches and other instructions that change the flow of control stalls the pipeline until the hazard is resolved.
16 Structural Hazard - Computers
[Figure: four instructions overlapped in the pipeline, each passing through Fet, RgR, ALU, RgW, Accs]
Instruction fetch (for a later instruction) and data memory access (for an earlier instruction) both use memory at the same time.
17 Overcoming this structural hazard
- Build two memories: one for instructions and one for data.
- MIPS does this.
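The conflict on slide 16 and its fix can be checked by listing which memory each instruction touches in each cycle. A sketch, assuming for illustration that every instruction is a load (so it uses memory both in IF and in MEM) and an ideal pipeline where instruction k is in stage s during cycle k + s; the stage names and function name are hypothetical:

```python
STAGES = ["IF", "RR", "ALU", "MEM", "WB"]

def memory_conflicts(n_instrs, split_memory):
    """Return the clock cycles in which two instructions need the same memory unit.

    Assumption: every instruction is a load, so it touches memory in IF
    (instruction fetch) and in MEM (data access)."""
    users = {}  # (cycle, memory unit) -> instructions using that unit in that cycle
    for k in range(n_instrs):
        for s, stage in enumerate(STAGES):
            if stage not in ("IF", "MEM"):
                continue
            if split_memory:
                unit = "instr_mem" if stage == "IF" else "data_mem"
            else:
                unit = "memory"  # one shared memory for instructions and data
            users.setdefault((k + s, unit), []).append(k)
    return sorted(cycle for (cycle, _unit), ks in users.items() if len(ks) > 1)

print(memory_conflicts(4, split_memory=False))  # one shared memory
print(memory_conflicts(4, split_memory=True))   # separate instruction and data memories
```

With one memory, the 1st instruction's data access and the 4th instruction's fetch collide in the same cycle; with separate instruction and data memories the list of conflicts is empty.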
18 [Figure: datapath with separate Instruction Memory and Data Memory]
- What do we need to add to actually split the datapath into stages?
19 Control Hazards
- Need to deal with branch instructions.
- Since you cannot predict with certainty whether a branch will be taken, you cannot know which instruction to load into the pipeline next.
- What to do?
- Conservative solution or progressive solution
20 Control Hazard - Conservative Solution
[Figure: ADD, BEQ and LW in the pipeline (Fet, RgR, ALU, RgW, Accs); once the branch decision is made, a stall (bubble) is inserted and then the correct instruction is fetched]
Result: accept a one-stage stall at every branch instruction.
21 Conservative result
- Accept a one-stage stall at every branch.
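The cost of this policy depends on how often branches occur. A sketch, assuming a one-cycle stall per branch as on slide 20; the 20% branch fraction below is a hypothetical value for illustration, not a measurement:

```python
def cpi_conservative(branch_fraction, stall_cycles=1):
    """Average clock cycles per instruction when every branch costs stall_cycles bubbles.
    An ideal, full pipeline completes one instruction per cycle (CPI = 1)."""
    if not 0.0 <= branch_fraction <= 1.0:
        raise ValueError("branch_fraction must be between 0 and 1")
    return 1.0 + branch_fraction * stall_cycles

# Hypothetical mix: one instruction in five is a branch
print(cpi_conservative(0.2))
```

Every branch pays the stall whether or not it is taken, which is exactly what the prediction schemes on the following slides try to avoid.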
22 Progressive concept -- prediction
- Predict that a branch will not be taken.
- Start the instruction following the branch.
- If the prediction is good, the pipeline runs at full speed.
- If it is bad (the branch is taken), undo the results of the instruction following the branch.
- Its entire instruction time is lost.
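Predict-not-taken only pays when the guess is wrong, while the conservative scheme pays at every branch. A sketch comparing the two over a branch history, assuming (consistent with the diagrams on slides 20 and 24) that a stall or a discarded instruction delays the instructions behind the branch by one cycle; the outcome list is hypothetical:

```python
def lost_cycles(outcomes, scheme, penalty=1):
    """Cycles lost to branches for a sequence of branch outcomes (True = taken).

    Assumption: stalling, or discarding the wrongly started instruction,
    delays everything behind the branch by `penalty` cycles."""
    if scheme == "conservative":        # stall after every branch
        return penalty * len(outcomes)
    if scheme == "predict_not_taken":   # lose time only when the guess is wrong
        return penalty * sum(1 for taken in outcomes if taken)
    raise ValueError(f"unknown scheme: {scheme}")

outcomes = [False, False, True, False]   # hypothetical branch history
print(lost_cycles(outcomes, "conservative"), lost_cycles(outcomes, "predict_not_taken"))
```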
23 Control Hazard - Progressive Success
[Figure: ADD, BEQ and LW in consecutive pipeline slots (Fet, RgR, ALU, RgW, Accs), with no bubbles]
Pipeline runs at full speed, no bubbles. We
guessed correctly that the branch would not be
taken.
24 Control Hazard - Progressive Failure
[Figure: ADD and BEQ in the pipeline; the instruction after BEQ is turned into five bubbles (bub), and OR is fetched next]
The instruction following the branch was started here but must be undone (a total of 5 bubbles in the pipeline).
25 Sophisticated branch prediction
- Assume backward branches will be taken (a likely scenario, since these usually close loops, which branch back on every iteration except the last).
- Dynamic prediction, based on the past behavior of this code.
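Both ideas fit in a few lines. The static rule compares the branch address with its target. For the dynamic case the slide names no specific scheme, so this sketch assumes one common choice, a 2-bit saturating counter per branch address; the addresses and outcome history are hypothetical:

```python
def predict_backward_taken(branch_pc, target_pc):
    """Static rule from the slide: a backward branch (target before the branch) is predicted taken."""
    return target_pc < branch_pc

class TwoBitPredictor:
    """Dynamic prediction from past behavior (assumed scheme): a 2-bit saturating
    counter per branch address. Values 0-1 predict 'not taken', 2-3 predict 'taken'."""

    def __init__(self, initial=1):
        self.initial = initial
        self.counters = {}

    def predict(self, pc):
        return self.counters.get(pc, self.initial) >= 2

    def update(self, pc, taken):
        c = self.counters.get(pc, self.initial)
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)

def mispredictions(pc, outcomes):
    """Count wrong guesses for one branch over a sequence of outcomes (True = taken)."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses

# A loop branch at hypothetical address 0x40, run twice: taken 9 times, then falls through
loop = ([True] * 9 + [False]) * 2
print(mispredictions(0x40, loop))
```

Because one not-taken exit only moves the counter from 3 to 2, the predictor still guesses "taken" when the loop is entered again, so after warming up it misses only the last iteration of each run.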
26 Flushing the pipeline -- control hazard
[Figure: ADD, BEQ, SUB, ADDI and SW in the pipeline (stages F, R, A, W, C)]
With a full pipeline, if the branch is not taken, everything runs fast. If the branch is taken, several instructions must be undone. The easiest technique is simply to empty (flush) all instructions that follow the branch and start to reload the pipeline with the branched-to instruction.
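Flushing amounts to discarding everything younger than the branch. A sketch, assuming the in-flight instructions are kept in a list from oldest to youngest; how many are flushed in a real machine depends on the stage where the branch is resolved, and the name `TARGET` stands in for the branched-to instruction (hypothetical):

```python
def flush_taken_branch(in_flight, branch_pos, target):
    """in_flight: instructions currently in the pipeline, oldest first.
    If the branch at branch_pos is taken, every younger instruction came from the
    wrong path: replace each with a bubble, then fetch the branched-to instruction."""
    kept = in_flight[: branch_pos + 1]
    flushed = len(in_flight) - len(kept)
    return kept + ["bubble"] * flushed + [target], flushed

pipe, n_flushed = flush_taken_branch(["ADD", "BEQ", "SUB", "ADDI"], branch_pos=1, target="TARGET")
print(pipe, n_flushed)
```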
27 Pipeline Data Hazard
- An instruction depends on the results of a previous, not-yet-finished instruction.
- ADD R0, R1 // Result to R0
- SUB R3, R0 // Result to R3, but uses R0
- If you consider each instruction as a unit, you can't start the SUB until the ADD is done.
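28 Data Hazard
[Figure: ADD followed by SUB in the pipeline (Fet, RgR, ALU, RgW, Accs); the result is written to R0 in ADD's register-write stage, and R0 is used in SUB's register-read stage, which must wait]
Result: 3 bubbles.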
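29 Data hazard: let the compiler separate dependent instructions with independent ones
[Figure: pipeline diagram (stages F, R, A, W, C) for the sequence LW R0,Mem; SUB R1,R0; CLR R2; SW R2,Mem, with 3 bubbles (b) before SUB and 3 before SW]
Example shown: instructions 1 and 2 are dependent, and instructions 3 and 4 are dependent. Result: 14 stage times (clock cycles) required, 6 bubbles.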
30 Read After Write (RAW): InstrJ tries to read an operand before InstrI writes it, so J gets the old value.
Write After Read (WAR): InstrJ tries to write an operand before InstrI reads it, so I gets the wrong operand.
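Whether a pair of instructions risks RAW or WAR comes down to overlapping register sets. A sketch, assuming each instruction is described by the sets of registers it reads and writes, and that the two-operand instructions on these slides (e.g. ADD R0, R1) both read and write their first operand; the function name is illustrative:

```python
def classify(i_writes, i_reads, j_writes, j_reads):
    """Data hazards between an earlier instruction I and a later instruction J.
    Each argument is a set of register names."""
    found = []
    if i_writes & j_reads:
        found.append("RAW")   # J reads a register that I writes
    if i_reads & j_writes:
        found.append("WAR")   # J writes a register that I reads
    return found

# Slide 27 pair: I = ADD R0,R1 writes R0, reads R0 and R1; J = SUB R3,R0 writes R3, reads R3 and R0
print(classify({"R0"}, {"R0", "R1"}, {"R3"}, {"R3", "R0"}))
```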
31 Rearrange code to separate dependent instructions
[Figure: pipeline diagram (stages F, R, A, W, C) for the reordered sequence LW R0,Mem; CLR R2; SUB R1,R0; SW R2,Mem, with 2 bubbles (b) before SUB]
Example shown: instructions 1 and 3 are dependent, and instructions 2 and 4 are dependent. Result: 10 stage times (clock cycles) required, 2 bubbles.
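The cycle counts on slides 28, 29 and 31 follow from one timing rule, so they can be recomputed. A sketch, assuming the no-forwarding timing implied by the 3-bubble diagrams (a register written back in one cycle can be read no earlier than the next cycle), one instruction issued per cycle, and a made-up `(text, dest, sources)` encoding:

```python
def simulate(program):
    """Count clock cycles and bubbles for an in-order 5-stage pipeline WITHOUT forwarding.

    program: list of (text, dest, sources); dest is None if nothing is written.
    Assumed timing: fetch in cycle R-1, register read R, ALU R+1, memory R+2,
    write-back R+3; a register can be read only after it has been written back."""
    readable_from = {}   # register -> first cycle in which its new value can be read
    read_cycle = 0       # register-read cycle of the previous instruction
    for _text, dest, sources in program:
        read_cycle += 1                                   # next instruction, one cycle later
        for reg in sources:
            read_cycle = max(read_cycle, readable_from.get(reg, 0))  # stall (bubbles)
        if dest is not None:
            readable_from[dest] = read_cycle + 4          # write-back at R+3, readable at R+4
    total_cycles = read_cycle + 4                         # last write-back is cycle R+3
    bubbles = total_cycles - (len(program) + 4)           # ideal: n + 4 cycles
    return total_cycles, bubbles

# Two-operand form as on the slides: "SUB R1,R0" writes R1 and reads R1 and R0
original = [("LW R0,Mem", "R0", []), ("SUB R1,R0", "R1", ["R1", "R0"]),
            ("CLR R2", "R2", []), ("SW R2,Mem", None, ["R2"])]
reordered = [("LW R0,Mem", "R0", []), ("CLR R2", "R2", []),
             ("SUB R1,R0", "R1", ["R1", "R0"]), ("SW R2,Mem", None, ["R2"])]
print(simulate(original), simulate(reordered))
```

The reordering helps twice: SUB is now two slots after LW, and SW inherits SUB's delay, so its operand R2 is already written back when it gets there.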
32 Code rearrangement saves little
- There are just too many dependencies
- The work is too sophisticated
33 Compiler can insert No Operations (NOPs)
[Figure, first panel: data hazard, 3 bubbles (b) between ADD R1,R0 and SUB R2,R1]
[Figure, second panel: with NOPs, 0 bubbles: ADD R1,R0; NOP; NOP; NOP; SUB R2,R1]
But it slows the pipeline: the three NOPs occupy the same three cycles the bubbles did.
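The NOP padding can be automated. A sketch, assuming the same timing as the bubble diagrams (an instruction must sit at least four slots after the instruction producing one of its sources, which is why adjacent dependent instructions need three NOPs); the `(text, dest, sources)` encoding and the function name are made up for this example:

```python
def insert_nops(program, gap=4):
    """program: list of (text, dest, sources). Returns the instruction texts with NOPs
    added so each instruction is at least `gap` slots after the producer of any source."""
    out = []
    produced_at = {}   # register -> slot in `out` of the instruction that last wrote it
    for text, dest, sources in program:
        earliest = max((produced_at[r] + gap for r in sources if r in produced_at),
                       default=len(out))
        while len(out) < earliest:
            out.append("NOP")
        if dest is not None:
            produced_at[dest] = len(out)
        out.append(text)
    return out

padded = insert_nops([("ADD R1,R0", "R1", ["R1", "R0"]), ("SUB R2,R1", "R2", ["R2", "R1"])])
print(padded)
```

The padded sequence of five instructions still needs 5 + 4 = 9 cycles, the same as two instructions plus three bubbles: the hazard is handled in software, but no time is saved.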
34 Best solution: forward the data
- Given: I2 depends on data from I1.
- As soon as the value is calculated in I1, pass it to I2.
- (Don't wait to store it in a register or memory.)
36 Two data-dependent instructions
[Figure, first panel: no forwarding, 3 bubbles (b) between ADD R1,R0 and SUB R2,R1]
[Figure, second panel: with forwarding, 0 bubbles: SUB R2,R1 follows ADD R1,R0 directly]
Forwarding works, but requires more hardware!
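Forwarding changes the timing rule: a dependent instruction needs the value at its ALU stage, not at register read. A sketch of the same five one-cycle stages with forwarding; the encoding is made up for this example, and the load case (a loaded value exists only after the memory stage, costing one bubble even with forwarding) is a standard limit of forwarding that the slide does not show:

```python
def simulate_forwarding(program):
    """Cycles and bubbles for an in-order 5-stage pipeline WITH forwarding.

    program: list of (op, dest, sources). Assumed timing: fetch in cycle A-2,
    register read A-1, ALU A, memory A+1, write-back A+2.
    An ALU result is forwarded straight to the next ALU stage (usable in cycle A+1);
    a loaded value ('LW') is usable only from cycle A+2."""
    usable_from = {}   # register -> first ALU cycle that can consume its new value
    alu_cycle = 1      # ALU cycle of an imaginary instruction before the first one
    for op, dest, sources in program:
        alu_cycle += 1
        for reg in sources:
            alu_cycle = max(alu_cycle, usable_from.get(reg, 0))   # stall if not ready
        if dest is not None:
            usable_from[dest] = alu_cycle + (2 if op == "LW" else 1)
    total_cycles = alu_cycle + 3        # last write-back is cycle A+2
    bubbles = total_cycles - (len(program) + 4)
    return total_cycles, bubbles

print(simulate_forwarding([("ADD", "R1", ["R1", "R0"]), ("SUB", "R2", ["R2", "R1"])]))
print(simulate_forwarding([("LW", "R0", []), ("SUB", "R1", ["R1", "R0"])]))
```

The ADD/SUB pair from this slide drops from 3 bubbles to 0; the extra hardware is the path that carries the ALU output back to the ALU input.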
37 Pipelining summary
- Increases throughput (the speed of many instructions), not the execution time of one.
- Performs stages of different instructions in parallel, so execution is more complex.
- Creates structural, control, and data hazards.
- To work, it MUST get the right answer.
- To be valuable, it must increase throughput.
38 Auto-increment to increase speed
(Available in many assembly languages, but not MIPS)
MIPS code to store data in an array:
Loop: ADDI $t1, $t1, 4    # make $t1 point to the next element
      SW   $t0, 0($t1)    # store data to the array element
Non-MIPS code to store data in an array:
Loop: MOV  R0, (R1)+      # store data, increment pointer
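The saving is one instruction per array element. A behavioral sketch counting executed instructions for both versions, assuming 4-byte elements, memory modeled as a Python dict from address to value, and the loop-control instructions left out; the register names follow the slide and the function names are hypothetical:

```python
def store_mips(values, mem, t1):
    """MIPS style, in the slide's order: ADDI bumps the pointer, then SW stores.
    Returns the number of instructions executed (loop control not counted)."""
    executed = 0
    for t0 in values:
        t1 += 4          # ADDI $t1, $t1, 4 : point at the next element
        executed += 1
        mem[t1] = t0     # SW $t0, 0($t1)   : store the element
        executed += 1
    return executed

def store_autoincrement(values, mem, r1):
    """Auto-increment style: one MOV R0,(R1)+ stores and then advances the pointer."""
    executed = 0
    for r0 in values:
        mem[r1] = r0     # store R0 at the address held in R1 ...
        r1 += 4          # ... and increment R1 as part of the same instruction
        executed += 1
    return executed

mips_mem, auto_mem = {}, {}
print(store_mips([7, 8, 9], mips_mem, 100), store_autoincrement([7, 8, 9], auto_mem, 100))
```

Halving the instruction count only pays off if the single instruction does not take longer, which is why the datapath needs the extra hardware for the increment.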
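39 Datapath additions for auto-increment
[Figure: datapath with PC, IR (opcode), general registers R1, R2, R3, temporaries Ta and Tb, MAR, MEM, ALU, two multiplexers (one labeled Select Address), a constant 4 input, and a Data path to memory]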
40 Thank You!!!