Lecture 13 Instruction Execution Pipeline

Title: Lecture 12 Instruction Execution Pipeline Author: Last modified by: jwcho Created Date: 1/23/2001 8:23:30 AM Document presentation format – PowerPoint PPT presentation

1
Lecture 13 Instruction Execution Pipeline
2
Lecture 13 Instruction Execution Pipeline

In this lecture, we will study
Principle of pipeline
Characteristics of pipeline
Number of pipeline stages and the performance
Delays of pipeline stages and the performance
Instruction execution steps in RISC-S
5-stage instruction execution pipeline for RISC-S
Ideal pipeline
Hazards
Improving RISC-S pipeline for hazards

3
Car Wash Station

Car wash stations
1 S (Spray water)
2 W (Wash with detergent and brush)
3 R (Rinse)
4 B (Blow dry)
Each stage takes 1 minute (identical delay)

1st car   S W R B
2nd car           S W R B
3rd car                   S W R B
. . .

To improve the profit
Improve the speed of the wash stations - expensive solution
Improve the throughput - parallel wash stations - expensive solution
Improve the effective wash time - pipeline - a less expensive solution

4
Pipeline Principle
Ordinary car wash station
1 car/4 min
Parallel car wash station
4 cars/4 min
1st car   S W R B
2nd car   S W R B
3rd car   S W R B
4th car   S W R B
5th car           S W R B
. . .
Pipeline car wash station
1 car/1 min
1st car   S W R B
2nd car     S W R B
3rd car       S W R B
4th car         S W R B
5th car           S W R B
. . .
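
The pipelined schedule can be reproduced with a few lines of Python. This is only a sketch: the function name pipeline_diagram and the two-character column width are choices made here for illustration, not part of the lecture.

```python
def pipeline_diagram(stages, n_tasks):
    """Return one text row per task; task i enters the first stage i cycles late."""
    rows = []
    for i in range(n_tasks):
        # Each earlier task shifts this one right by one stage slot (2 characters).
        rows.append("  " * i + " ".join(stages))
    return rows


rows = pipeline_diagram(["S", "W", "R", "B"], 5)
for number, row in enumerate(rows, start=1):
    print(f"car {number}: {row}")
```

Reading one column of the output top to bottom shows which cars occupy the stations during that minute.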
5
Pipeline Terminology

Pipeline Stage
A pipeline consists of a finite number of pipeline stages
Pipeline Cycle
The delay of a pipeline stage is called the pipeline cycle
In practice, the delays of the pipeline stages are not necessarily identical, which complicates control
The pipeline cycle can be set to the longest pipeline stage delay, at the cost of performance (a longer pipeline cycle time)
Pipeline Latency
Time from the beginning of a task to its completion
Ideal Pipeline
The delays of all pipeline stages are identical and equal to the pipeline cycle
All the pipeline stages are occupied with tasks to be executed
Simple to control, and provides the best performance: 1 instruction/cycle

6
Pipeline Characteristics
(Each row is one pipeline stage; time runs from left to right.)
I0  I1  I2  I3  I4  I5  I6  I7  . . .  In-1  In
    I0  I1  I2  I3  I4  I5  I6  . . .  In-2  In-1  In
        I0  I1  I2  I3  I4  I5  . . .  In-3  In-2  In-1  In
            I0  I1  I2  I3  I4  . . .  In-4  In-3  In-2  In-1  In

Assuming that there are plenty of tasks (instructions) to be executed
All of the pipeline stages are busy most of the time
Pipeline Filling
In the initial phase of execution, the pipeline stages are not fully occupied with tasks
For an n-stage pipeline, the first (n-1) pipeline cycles are filling time
Pipeline Draining
In the final phase of execution, the pipeline stages are not fully occupied with tasks
For an n-stage pipeline, the last (n-1) pipeline cycles are draining time
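
Filling and draining can be put into numbers with a short sketch. It assumes an ideal pipeline in which one task enters per cycle; the function names total_cycles and stage_utilization are made up here for illustration.

```python
def total_cycles(n_stages, n_tasks):
    """Cycles needed to push n_tasks through an ideal n_stages pipeline."""
    if n_tasks <= 0:
        return 0
    # The first task needs n_stages cycles; every later task finishes one cycle after its predecessor.
    return n_stages + n_tasks - 1


def stage_utilization(n_stages, n_tasks):
    """Fraction of stage-cycles that do useful work, counting filling and draining."""
    busy = n_stages * n_tasks
    available = n_stages * total_cycles(n_stages, n_tasks)
    return busy / available


print(total_cycles(5, 100))        # 104 cycles for 100 instructions in 5 stages
print(stage_utilization(5, 100))   # close to 1 when there are plenty of tasks
print(stage_utilization(5, 5))     # much lower for a short run
```

With plenty of tasks, the (n-1) filling cycles and (n-1) draining cycles become negligible, which is the assumption stated at the top of this slide.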

7
Number of Pipeline Stages

Comparison of car wash stations with a 4-stage (S, W, R, B) pipeline and a 2-stage (SW, RB) pipeline, both with the same pipeline latency (4 minutes)
4-stage pipeline with a 1-minute pipeline cycle: 1 car/1 min once the pipeline is full

2-stage pipeline with a 2-minute pipeline cycle: 1 car/2 min once the pipeline is full

For a fixed latency, the more pipeline stages, the better the performance
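
A rough way to check this claim is to compute the speedup over a non-pipelined station. The sketch below assumes the fixed latency is split evenly across the stages, as in the car wash example; the function name speedup and the count of 100 cars are illustrative choices.

```python
def speedup(n_stages, latency, n_tasks):
    """Speedup of an evenly divided pipeline over doing the tasks one after another."""
    cycle = latency / n_stages                      # equal stage delays
    unpipelined = n_tasks * latency                 # each task waits for the previous one
    pipelined = (n_stages + n_tasks - 1) * cycle    # filling time plus one task per cycle
    return unpipelined / pipelined


four_stage = speedup(4, 4.0, 100)   # S, W, R, B with a 1-minute cycle
two_stage = speedup(2, 4.0, 100)    # SW, RB with a 2-minute cycle
print(round(four_stage, 2), round(two_stage, 2))
```

For a long run of cars the speedup approaches the number of stages, so the 4-stage station comes close to twice the rate of the 2-stage one.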
8
Delay of Pipeline Stages

Comparison of 4-stage car wash stations with different pipeline stage delays
Identical 1-minute delay in every stage

S (0.5 min) - W (1.5 min) - R (0.5 min) - B (1.5 min) pipeline

Identical pipeline stage delays give better performance, because the pipeline cycle must match the slowest stage
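
The effect of unequal stage delays can be sketched as follows, assuming the pipeline cycle is set to the longest stage delay as described under Pipeline Terminology. The function name pipeline_timing and the run of 100 cars are illustrative.

```python
def pipeline_timing(stage_delays, n_tasks):
    """Return (cycle, latency, total time) when the cycle equals the slowest stage."""
    cycle = max(stage_delays)
    n_stages = len(stage_delays)
    latency = cycle * n_stages                     # every stage is held for a full cycle
    total = (n_stages + n_tasks - 1) * cycle       # filling time plus one task per cycle
    return cycle, latency, total


equal = pipeline_timing([1.0, 1.0, 1.0, 1.0], 100)
unequal = pipeline_timing([0.5, 1.5, 0.5, 1.5], 100)
print(equal)     # (1.0, 4.0, 103.0)
print(unequal)   # (1.5, 6.0, 154.5)
```

Both stations do 4 minutes of actual work per car, but the unbalanced one is clocked at 1.5 minutes and so delivers a car only every 1.5 minutes.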

9
Instruction Execution Steps
10
Instruction Execution Pipeline: RISC-S

A 5-stage pipeline
IF-DR-A-M-SR pipeline
In the instruction execution pipeline, information has to be passed to the succeeding pipeline stage
This requires inter-stage buffers made of latches
I/D buffer, D/A buffer, A/M buffer, M/S buffer

11
IF Stage
Instruction fetch and PC update stage
12
DR Stage

Instruction decoding and register read stage
OP <- OP-code
A(Rs1) <- R[IR14..18]
t <- IR13
B(Rs2) <- R[IR0..4]
D(S2) <- (IR12)^19 ## IR0..12 (IR0..12 sign-extended to 32 bits)
C(Cond) <- IR19..22
cc(SCC) <- IR24
(NPC <- NPC)
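
The field extraction done in this stage can be sketched in Python. The bit positions are the ones listed on this slide; the helper names bits, sign_extend_13 and decode are made up for illustration, and the opcode field is left out because its bit position is not given here. The register-file reads (A and B) are not modelled, only the register numbers.

```python
def bits(word, lo, hi):
    """Extract bits lo..hi (inclusive, bit 0 = least significant) from a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend_13(value):
    """Interpret a 13-bit field as a signed number (bit 12 is the sign bit)."""
    return value - (1 << 13) if value & (1 << 12) else value


def decode(ir):
    """Split an instruction word into the fields read in the DR stage."""
    return {
        "rs1": bits(ir, 14, 18),                  # register number for A
        "t": bits(ir, 13, 13),                    # t bit
        "rs2": bits(ir, 0, 4),                    # register number for B
        "imm": sign_extend_13(bits(ir, 0, 12)),   # D(S2), sign-extended
        "cond": bits(ir, 19, 22),                 # C(Cond)
        "scc": bits(ir, 24, 24),                  # cc(SCC)
    }


# Hypothetical instruction word: SCC set, Rs1 = 3, t = 1, immediate = -1.
word = (1 << 24) | (3 << 14) | (1 << 13) | 0x1FFF
fields = decode(word)
print(fields)
```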

13
A Stage

ALU operations on the operands, effective address computation, and condition test for conditional branches
Memory reference instruction (t=1): AO <- NPC+D (imm32)
LD instruction: C <- C
Functional instruction (t=0): AO <- A op B
C <- C
Control instruction: AO <- NPC+D (imm32)
T <- (flag(C) op 0)
(OP <- OP)
(NPC <- NPC)

14
M Stage

Memory access for read and write, and decision of the final PC value for branch instructions
LD: DATA <- M[AO]
ST: M[AO] <- B
Functional instruction: AO <- AO
Branch instruction: if T=0, PC <- AO
if T=1, PC <- NPC
(OP <- OP)
(C <- C)

15
SR Stage

Store the result of the operation in a register for a functional instruction, and store the data read from memory in a register for a load instruction
Functional instruction: R[C] <- AO
LD: R[C] <- DATA

16
Time Out


17
Ideal Pipeline

Ideal Pipeline
The delays of the pipeline stages are identical and equal to the pipeline cycle
All the pipeline stages are occupied with tasks, except during the filling time and draining time
One task completes every pipeline cycle after the filling time
Hazards keep a pipeline from operating as an ideal pipeline even when the delays of the pipeline stages are identical
Structural Hazard
Data Hazard
Control Hazard

18
Structural Hazard

Cases in which structural hazards take place
More than one instruction requires the same pipeline stage in the same clock cycle
This never happens when the delays of the pipeline stages are identical

More than one pipeline stage tries to use the same hardware resource in the same clock cycle
IF and A stages: operation with the adder
DR and SR stages: access to the register file
IF and M stages: access to memory

19
Example Structural Hazard

Structural hazard due to the adder: IF and A stages in the same cycle

Structural hazard due to the register file

Structural hazard due to memory

20
Hardware Solution for Structural Hazards

Adder hazard in the IF and A stages
Include a simple +4 adder in the IF stage, so that calculating PC+4 does not use the ALU of the A stage
Register hazard
The register file can be made to perform write access in the first half of the clock cycle and read access in the second half

Memory hazard
Dedicated memories, i.e., separate instruction memory and data memory
2-port memory
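
The resource conflicts and their hardware fixes can be checked with a small sketch. It assumes all five stages are active in every cycle (a full pipeline), so any resource needed by two stages is a conflict. The resource names and the function structural_conflicts are illustrative labels chosen here.

```python
from collections import defaultdict


def structural_conflicts(stage_resources):
    """Return every resource that more than one stage needs in the same cycle."""
    users = defaultdict(list)
    for stage, resources in stage_resources.items():
        for resource in resources:
            users[resource].append(stage)
    return {res: stages for res, stages in users.items() if len(stages) > 1}


# Resource use before the fixes, as listed on the Structural Hazard slide.
baseline = {
    "IF": ["adder", "memory"],
    "DR": ["register file"],
    "A": ["adder"],
    "M": ["memory"],
    "SR": ["register file"],
}

# After the fixes: own +4 adder in IF, split memories, half-cycle register access.
improved = {
    "IF": ["+4 adder", "instruction memory"],
    "DR": ["register file, read in second half-cycle"],
    "A": ["ALU"],
    "M": ["data memory"],
    "SR": ["register file, write in first half-cycle"],
}

print(structural_conflicts(baseline))
print(structural_conflicts(improved))   # {} means no structural hazard remains
```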

21
Data Hazard

A data hazard is possible when more than one instruction in a sequence shares the same data
SLL R5, R1        IF DR A  M  SR
ADD R1, R2, R3       IF DR A  M  SR
AND R1, R4, R4          IF DR A  M  SR
SUB R5, R1, R6             IF DR A  M  SR
XOR R1, R7, R8                IF DR A  M  SR

Read After Write (RAW) Hazard
An instruction is supposed to read the written data, but the read takes place first
Write After Read (WAR) Hazard
The data is supposed to be read first and then written, but the write takes place first
Write After Write (WAW) Hazard
Data are written to the same location in the wrong order

22
Data Hazards

RAW Hazard
Ii precedes Ij, and Ij tries to read a register or data memory location before Ii stores its data there.
ADD R2, R3, R1
AND R1, R4, R4

WAR Hazard
Ii precedes Ij, Ii reads and Ij writes the same location, and the write takes place earlier than the read
This never happens if all the instructions go through the same pipeline stages with the same delays, because each instruction goes through the SR stage (for writing) later than the DR stage (for reading)
WAW Hazard
Ii precedes Ij, and both Ii and Ij write data to the same location, but in the wrong order
This also never happens if the assumption stated for WAR holds
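
The three definitions can be turned into a simple register-level check. The sketch assumes the destination register is written last in the assembly syntax (so ADD R2, R3, R1 writes R1), which is how the RAW example above reads; the dictionary layout and the function name dependences are illustrative. It only reports which orderings must be preserved. Whether one actually becomes a hazard depends on the pipeline, as noted above for WAR and WAW.

```python
def dependences(first, second):
    """Classify the register dependences of `second` on the earlier `first`."""
    kinds = set()
    if first["dest"] in second["srcs"]:
        kinds.add("RAW")   # second reads what first writes
    if second["dest"] in first["srcs"]:
        kinds.add("WAR")   # second writes what first reads
    if first["dest"] == second["dest"]:
        kinds.add("WAW")   # both write the same register
    return kinds


# ADD R2, R3, R1 followed by AND R1, R4, R4 (destination assumed to be the last operand).
add = {"srcs": ["R2", "R3"], "dest": "R1"}
and_ = {"srcs": ["R1", "R4"], "dest": "R4"}
print(dependences(add, and_))   # {'RAW'}
```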

23
Forwarding Circuit For RAW Data Hazard

Circuit that forwards the data to be stored in the SR stage to the ALU input
MUX in the A stage
Data to be stored in a register in the SR stage:
DATA and AO in the M/S buffer
AO in the A/M buffer
These values in the inter-stage buffers are forwarded to the ALU input MUX

24
Instruction Scheduling with Forwarding Circuit

Resolving register data hazards by forwarding: no delay
SLL R5, R1        IF DR A  M  SR
ADD R1, R2, R3       IF DR A  M  SR
AND R1, R4, R4          IF DR A  M  SR
SUB R5, R1, R6             IF DR A  M  SR
XOR R1, R7, R8                IF DR A  M  SR

25
Load Delay Due To RAW: Improvement by Forwarding Circuit

Load delay: 2 cycles
LD  R1, X         IF DR A  M  SR
                     stall
                        stall
ADD R1, R2, R3             IF DR A  M  SR
AND R1, R4, R4                IF DR A  M  SR
SUB R5, R1, R6                   IF DR A  M  SR
XOR R1, R7, R8                      IF DR A  M  SR

Load delay with forwarding: 1 cycle
LD  R1, X         IF DR A  M  SR
                     stall
ADD R1, R2, R3          IF DR A  M  SR
AND R1, R4, R4             IF DR A  M  SR
SUB R5, R1, R6                IF DR A  M  SR
XOR R1, R7, R8                   IF DR A  M  SR
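
The stall counts can be derived from the stage in which a value is produced and the stage in which it is needed. The sketch below assumes one stage per cycle, that a register written in SR can be read by DR in the same cycle (write in the first half, read in the second half), and that a forwarded value is usable in the cycle after it is produced. The function name raw_stalls and its parameters are illustrative.

```python
STAGES = ["IF", "DR", "A", "M", "SR"]


def raw_stalls(produce_stage, consume_stage, same_cycle_ok, distance=1):
    """Stall cycles for a consumer issued `distance` instructions after the producer."""
    produced = STAGES.index(produce_stage)            # cycle in which the value is produced
    needed = distance + STAGES.index(consume_stage)   # cycle in which it is needed, unstalled
    earliest = produced if same_cycle_ok else produced + 1
    return max(0, earliest - needed)


print(raw_stalls("SR", "DR", same_cycle_ok=True))   # no forwarding: 2
print(raw_stalls("M", "A", same_cycle_ok=False))    # load with forwarding: 1
print(raw_stalls("A", "A", same_cycle_ok=False))    # ALU result with forwarding: 0
```

The load keeps one stall even with forwarding because its data leaves memory only at the end of the M stage, one cycle after the next instruction would enter the A stage.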

26
Load Delay Due To RAW: Improvement by Software Scheduling

Original order: 1 stall
LD  R1, X         IF DR A  M  SR
                     stall
ADD R1, R2, R3          IF DR A  M  SR
SUB R1, R5, R4             IF DR A  M  SR
LD  R6, Y                     IF DR A  M  SR

Software scheduling: no stall
LD  R1, X         IF DR A  M  SR
LD  R6, Y            IF DR A  M  SR
ADD R1, R2, R3          IF DR A  M  SR
SUB R1, R5, R4             IF DR A  M  SR
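
The benefit of reordering can be checked by counting load-use pairs. The sketch assumes a forwarding circuit, so the only remaining stall is one cycle when the instruction immediately after an LD reads the loaded register. The tuple layout (opcode, destination, sources) and the function name load_use_stalls are illustrative, and the destination of each instruction is written out by hand.

```python
def load_use_stalls(program):
    """Count 1-cycle stalls caused by an instruction reading the result of the LD just before it."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        opcode, dest, _ = previous
        if opcode == "LD" and dest in current[2]:
            stalls += 1
    return stalls


original = [
    ("LD", "R1", []),             # LD  R1, X
    ("ADD", "R3", ["R1", "R2"]),  # ADD R1, R2, R3
    ("SUB", "R4", ["R1", "R5"]),  # SUB R1, R5, R4
    ("LD", "R6", []),             # LD  R6, Y
]
# Move the independent LD R6, Y into the slot right after LD R1, X.
scheduled = [original[0], original[3], original[1], original[2]]

print(load_use_stalls(original), load_use_stalls(scheduled))   # 1 0
```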
27
Control Hazard

The address of the instruction after a branch instruction is determined in the M stage. Therefore, the next instruction fetch must be delayed until the branch instruction completes the M stage.
ADD R1, R2, R3    IF DR A  M  SR
JMP COND, X          IF DR A  M  SR
                        stall
                           stall
                              stall
next instruction                 IF DR A  M  SR

Branch delay of 3 cycles
The value of PC is decided by the value of T, which selects between the two input addresses of the MUX in the M stage: AO (branch address) or NPC (next PC)
The value of T is decided by testing the condition in the A stage
The branch address can be decided earlier if the branch condition can be tested earlier

28
Reduction of Branch Effect

If the branch address calculation and the condition test are made earlier, the branch delay can be reduced.
Move these operations to the DR stage
Include an adder for branch address calculation in the DR stage
Move the circuit that tests the branch condition forward to the DR stage
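
The branch delay follows directly from the stage in which the next PC becomes known. The sketch below assumes one stage per cycle and that the next fetch can start in the cycle after that stage. The function names and the instruction mix (100 instructions, 20 of them branches) are hypothetical values chosen for illustration.

```python
STAGES = ["IF", "DR", "A", "M", "SR"]


def branch_delay(resolve_stage):
    """Stall cycles after a branch whose next PC is known at the end of resolve_stage."""
    # Without stalls the next fetch would happen while the branch is in DR (index 1),
    # so it is pushed back by the distance from IF to the resolving stage.
    return STAGES.index(resolve_stage)


def execution_cycles(n_instructions, n_branches, resolve_stage):
    """Total cycles if every branch pays the full branch delay and nothing else stalls."""
    ideal = len(STAGES) + n_instructions - 1
    return ideal + n_branches * branch_delay(resolve_stage)


print(branch_delay("M"), branch_delay("DR"))   # 3 1
print(execution_cycles(100, 20, "M"))          # 164
print(execution_cycles(100, 20, "DR"))         # 124
```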

29
Branch Delay: Improvement by Software Rescheduling

Branch delay: 1 cycle
ADD R1, R2, R3    IF DR A  M  SR
JMP COND, X          IF DR A  M  SR
                        stall
next instruction           IF DR A  M  SR

Rescheduling: no branch delay
JMP COND, X       IF DR A  M  SR
ADD R1, R2, R3       IF DR A  M  SR
next instruction        IF DR A  M  SR
This is possible only if COND is set by an instruction that still precedes the JMP instruction after rescheduling.
A conditional branch on the COND set by the ADD is not possible, because the ADD now follows the JMP.
30
Branch Delay: Improvement by Hardware Branch Predictor

Example instruction sequence
ADD R1, R2, R3
JMP COND, X
LD R1, Y
SUB R3, R4, R5
X: ADD R1, R6, R5

Four cases (pipeline timing diagrams not reproduced in this transcript)
Predict TAKEN, and actually TAKEN
Predict TAKEN, and actually NOT TAKEN
Predict NOT TAKEN, and actually NOT TAKEN
Predict NOT TAKEN, and actually TAKEN
Delay labels in the diagrams: "1 Cycle Delay" for three of the cases, "No Delay" for one
31
Branch Prediction Penalty
