PIPELINING - PowerPoint PPT Presentation

About This Presentation

Title:

PIPELINING

Description:

Each stage is connected with each other to form a pipe. ... For any system to be free from hazards, pipelining of functional units and ... – PowerPoint PPT presentation

Slides: 48

Provided by: Crea199

Transcript and Presenter's Notes

Title: PIPELINING

1
PIPELINING

-Deepak Haran
(2000B5A3710)

2
WHAT IS PIPELINING??

Pipelining is an implementation technique where
multiple instructions are overlapped in execution
to make fast CPUs.
It is an implementation which exploits
parallelism among the instructions in a
sequential instruction stream.

3
THE METHODOLOGY

In a pipeline, each step is called a pipe
stage or pipe segment and completes part of an
instruction.
The stages are connected one to the next to form
a pipe.
Instructions enter at one end, progress through
each stage and exit at the other end.

4
THE NEED FOR PIPELINING

TO MAKE FAST CPUS.
This is accomplished by increasing the CPU
throughput (the number of instructions completed
per unit time).
It yields a reduction in the average execution
time per instruction. For a machine with multiple
clock cycles per instruction, pipelining is
viewed as a reduction in the CPI (clock cycles
per instruction).

5
Contd

Time per instruction on a pipelined machine =
(time per instruction on the unpipelined machine) /
(number of pipe stages)
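The relation above can be sketched in a few lines of Python. This is only an illustration of the ideal case (perfectly balanced stages, no hazards); the function name and the 50 ns figure are hypothetical, not taken from the slides.

```python
def ideal_pipelined_time(unpipelined_time_ns: float, num_stages: int) -> float:
    """Ideal time per instruction on a pipelined machine.

    Assumes perfectly balanced stages and no stalls, so the result is a
    lower bound rather than a measured value.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    return unpipelined_time_ns / num_stages


# Hypothetical numbers: a 50 ns unpipelined instruction split over 5 stages.
print(ideal_pipelined_time(50.0, 5))  # 10.0
```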

6
IMPLEMENTATION OF THE DLX INSTRUCTION SET

The DLX architecture has been chosen because its
simplicity makes it easy to demonstrate the
principles of pipelining.
Each DLX instruction can be implemented in at
most 5 clock cycles. The implementation requires
the use of several temporary registers, which
simplify pipelining.

7
IMPLEMENTATION OF THE DLX INSTRUCTION SET

The five clock cycles are as follows.
Instruction Fetch cycle (IF): the instruction
stored in the memory at the address in the PC is
loaded into the IR, and (PC + 4) is stored in NPC.
Instruction Decode/Register Fetch cycle (ID):
decoding is done in parallel with reading the
registers because the register fields are at a
fixed location in the format (fixed-field
decoding).

8
IMPLEMENTATION OF THE DLX INSTRUCTION SET

Execution/Effective Address cycle (EX):
the ALU operates on the operands prepared in
the prior cycle, performing a function that
depends on the DLX instruction type.
Memory access/branch completion cycle (MEM):
the only instructions that are active are the
loads, stores and branches.
Memory reference: if the instruction is a
load, then data from the memory is placed in the
LMD register. If the instruction is a store,

9
IMPLEMENTATION OF THE DLX INSTRUCTION SET

then data from the B register is written into
the memory location whose address is held in
register ALUOutput.
Branch: if the instruction branches, the PC
is replaced with the branch destination address
in ALUOutput; otherwise, it is replaced with the
incremented PC in register NPC.

10
IMPLEMENTATION OF THE DLX INSTRUCTION SET

Write Back cycle (WB):
the result is written into the register file,
whether it comes from the memory system or from
the ALU.
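The five cycles described above can be traced with a small unpipelined interpreter. This is a minimal sketch under loud assumptions: instructions are Python tuples rather than 32-bit DLX encodings, only four opcodes are modelled, and the `Instr` type and `run` function are invented for illustration. The temporary names (`ir`, `npc`, `a`, `b`, `imm`, `alu_output`, `lmd`) mirror the registers named in the slides.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str          # "ADD", "LW", "SW" or "BEQZ" (illustrative subset)
    rd: int = 0      # destination register number
    rs1: int = 0     # first source register (base register for LW/SW)
    rs2: int = 0     # second source register (store data for SW)
    imm: int = 0     # immediate / offset field


def run(program, regs, mem):
    """Execute each instruction as five sequential steps (not pipelined)."""
    pc = 0
    while 0 <= pc // 4 < len(program):
        # IF: IR <- Mem[PC], NPC <- PC + 4
        ir = program[pc // 4]
        npc = pc + 4
        # ID: read both source registers and the immediate together
        a, b, imm = regs[ir.rs1], regs[ir.rs2], ir.imm
        # EX: the ALU function depends on the instruction type
        cond = False
        if ir.op == "ADD":
            alu_output = a + b
        elif ir.op in ("LW", "SW"):
            alu_output = a + imm            # effective address
        elif ir.op == "BEQZ":
            alu_output = npc + imm          # branch target
            cond = (a == 0)
        else:
            raise ValueError(f"unsupported op: {ir.op}")
        # MEM: only loads, stores and branches do anything here
        lmd = None
        if ir.op == "LW":
            lmd = mem.get(alu_output, 0)
        elif ir.op == "SW":
            mem[alu_output] = b
        pc = alu_output if cond else npc
        # WB: write the ALU result or the loaded data to the register file
        if ir.op == "ADD":
            regs[ir.rd] = alu_output
        elif ir.op == "LW":
            regs[ir.rd] = lmd
    return regs, mem


regs = [0] * 8
regs[2] = 100
mem = {100: 7}
program = [
    Instr("LW", rd=1, rs1=2, imm=0),     # R1 <- Mem[R2 + 0]
    Instr("ADD", rd=3, rs1=1, rs2=1),    # R3 <- R1 + R1
    Instr("SW", rs1=2, rs2=3, imm=4),    # Mem[R2 + 4] <- R3
]
run(program, regs, mem)
print(regs[1], regs[3], mem[104])  # 7 14 14
```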

11
IMPLEMENTATION OF THE DLX INSTRUCTION SET

Single Cycle vs Multiple Cycle Implementation
Multiple cycle implementation: each
instruction takes multiple clock cycles to
execute. In the DLX set, each instruction takes
at most five clock cycles.
Single cycle implementation: each instruction
takes one long clock cycle.

12
IMPLEMENTATION OF THE DLX INSTRUCTION SET

However, the single cycle implementation is not
used, for two reasons.
1. It is inefficient for machines whose
instructions vary appreciably in the amount of
work, and hence in the clock cycle time, that
they need.
2. It requires the duplication of functional
units that could be shared in a multicycle
implementation.

13
THE BASIC PIPELINE FOR DLX

Each instruction takes 5 clock cycles to
complete, but the hardware initiates a new
instruction every clock cycle, so during each
clock cycle it is executing some part of five
different instructions.
Two different operations must not use the same
datapath resource during the same clock cycle.
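The overlap described above is easiest to see as a stage-occupancy table. The sketch below is an illustration only: it assumes an ideal pipeline with no stalls, and the helper name `pipeline_diagram` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions):
    """Return one row per instruction; column c holds the stage used in cycle c+1.

    Assumes one instruction is issued per clock cycle and nothing stalls.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage      # instruction i enters stage s in cycle i+s+1
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_diagram(5)):
    print(f"i+{i}: " + " ".join(f"{cell:>3}" for cell in row))
```

Reading any column from top to bottom shows that no stage name appears twice, which is the "no shared resource in the same cycle" condition stated above.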

14
THE BASIC PIPELINE FOR DLX

Furthermore, pipelining the datapath requires
that values passed from one pipe stage to the
next be placed in registers called pipeline
registers.
These registers convey values and control
information from one stage to another.

15
THE BASIC PIPELINE FOR DLX

In the DLX pipeline, the major functional units
such as ALU etc. are used in different cycles and
hence overlapping the execution of multiple
instructions introduces relatively few conflicts.
This is possible due to the following reasons.

16
THE BASIC PIPELINE FOR DLX

Using separate instruction and data memories
eliminates the conflict over a single memory
that would otherwise arise between the
instruction fetch and the data memory access of
different instructions.
The register file is used in two stages: for
reading in the ID stage and for writing in the WB
stage, both during the same clock cycle.

17
THE BASIC PIPELINE FOR DLX

To start a new instruction every clock cycle, the
PC must be incremented and stored every clock
cycle. This is done in the IF stage, where the
incremented PC or the branch target of an
earlier branch is written into the PC.

18
PIPELINE HAZARDS

WHAT ARE PIPELINE HAZARDS ???
Hazards are situations that prevent the
next instruction in the instruction stream from
executing during its designated clock cycle. They
reduce the performance from the ideal speedup
gained by pipelining.

19
CLASSIFICATION OF HAZARDS

Structural hazards arise from resource
conflicts when the hardware cannot support all
possible combinations of instructions in
simultaneous overlapped execution.
Data hazards arise when an instruction depends
upon the results of a previous instruction in a
way that is exposed by the overlapping of
instructions in the pipeline.

20
CLASSIFICATION OF HAZARDS

Control hazards arise from the pipelining of
branches and other instructions that change the
PC.

21
STRUCTURAL HAZARDS

For any system to be free from structural
hazards, functional units must be pipelined and
resources duplicated so that all possible
combinations of instructions can be in the
pipeline.
Structural hazards arise for the following
reasons.

22
STRUCTURAL HAZARDS

When a functional unit is not fully pipelined,
a sequence of instructions using that unit
cannot proceed at the rate of one per clock
cycle.
When a resource is not duplicated enough to
allow all possible combinations of instructions.
Example: a machine may have only one register
file write port, but it may want to perform 2
writes during the same clock cycle.

23
STRUCTURAL HAZARDS

Consider a machine with a single memory shared
by data and instructions. An instruction that
makes a data memory reference will conflict with
the instruction fetch of a later instruction.
This is resolved by stalling the pipeline for one
clock cycle when the data memory access occurs.
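The cost of the shared-memory conflict can be estimated with a small counting model. This is a sketch under stated assumptions: a five-stage pipeline in which MEM is three stages after IF, no other hazards, and a hypothetical function name `total_cycles_single_memory`.

```python
def total_cycles_single_memory(is_mem_ref):
    """Cycles to run a sequence on a 5-stage pipeline with one memory port.

    is_mem_ref[i] is True when instruction i is a load or store.
    Assumption: the only stalls are IF cycles that collide with an
    earlier instruction's MEM cycle on the single shared memory.
    """
    if not is_mem_ref:
        return 0
    busy = set()      # cycles in which the memory serves a data access
    cycle = 1         # earliest cycle in which the next IF may happen
    last_fetch = 0
    for mem_ref in is_mem_ref:
        while cycle in busy:        # IF collides with a data access: stall
            cycle += 1
        last_fetch = cycle
        if mem_ref:
            busy.add(cycle + 3)     # MEM happens three cycles after IF
        cycle += 1
    return last_fetch + 4           # WB is four cycles after IF


# One load followed by four ALU instructions: one stall cycle is inserted.
print(total_cycles_single_memory([True, False, False, False, False]))   # 10
print(total_cycles_single_memory([False, False, False, False, False]))  # 9
```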

24
DATA HAZARDS

Data hazards occur when the pipeline changes
the order of read/write accesses to operands so
that it differs from the order seen when the
instructions execute sequentially on an
unpipelined machine.

25
CLASSIFICATION OF DATA HAZARDS

RAW (read after write): consider two
instructions i and j, with i occurring before j.
j tries to read a source before i actually
writes it; as a result, j gets the old
value.
Ex
ADD R1,R2,R3
SUB R4,R1,R5
AND R6,R1,R7
OR R8,R1,R9
XOR R10,R1,R11

26
CLASSIFICATION OF DATA HAZARDS

This hazard is overcome by a simple hardware
technique called forwarding.
In forwarding, the ALU result from the EX/MEM
register is always fed back to the ALU input
latches.
If the forwarding hardware detects that the
previous ALU operation has written the register
corresponding to a source for the current ALU
operation, then the control logic selects the
forwarded result as the ALU input rather than the
value read from the register file.
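The selection made by the forwarding control logic can be sketched as a simple multiplexer. This is an illustration, not the DLX control equations: the dictionary layout of the EX/MEM register and the name `select_alu_input` are assumptions, and only forwarding from the immediately preceding ALU operation is modelled.

```python
def select_alu_input(src_reg, regfile_value, ex_mem):
    """Pick the ALU operand for source register src_reg.

    ex_mem is a hypothetical view of the EX/MEM pipeline register:
    {"dest": register number written by the previous ALU op (or None),
     "alu_result": the value that op computed}.
    R0 is treated as hardwired to zero and is never forwarded.
    """
    dest = ex_mem["dest"]
    if dest is not None and dest != 0 and dest == src_reg:
        return ex_mem["alu_result"]     # forwarded value
    return regfile_value                # value read during ID


# ADD R1,R2,R3 has just computed 1 + 2 = 3 but has not written R1 yet.
ex_mem = {"dest": 1, "alu_result": 3}
stale_r1, r5 = 10, 1                    # register-file contents seen in ID

# SUB R4,R1,R5: the first operand is forwarded, the second is not.
op1 = select_alu_input(1, stale_r1, ex_mem)
op2 = select_alu_input(5, r5, ex_mem)
print(op1 - op2)  # 2
```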

27
CLASSIFICATION OF DATA HAZARDS

WAW (write after write):
j tries to write an operand before it is
written by i. Thus the writes are performed in
the wrong order, leaving the value written by i,
rather than by j, as the final value.
This hazard is present only in pipelines that
write in more than one pipe stage. In DLX this is
not a hazard, as it writes only in the WB stage.

28
CLASSIFICATION OF DATA HAZARDS

Ex
LW R1,0(R2)
ADD R1,R2,R3

29
CLASSIFICATION OF DATA HAZARDS

WAR (write after read):
j tries to write a destination before it is
read by i, so i incorrectly gets the new value.
This does not happen in DLX, as all reads occur
early (in the ID stage) and all writes occur late
(in the WB stage).
Ex
SW 0(R1),R2
ADD R2,R3,R4
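The three classes can be told apart by comparing the registers each instruction reads and writes. The sketch below is a hedged illustration: it finds register dependences between an earlier instruction i and a later instruction j, and whether a dependence becomes an actual hazard depends on the pipeline, as noted above for DLX. The function name and argument layout are hypothetical.

```python
def classify_dependences(i_dest, i_srcs, j_dest, j_srcs):
    """Return the set of dependence kinds between i (earlier) and j (later).

    i_dest / j_dest: register written, or None if the instruction writes none.
    i_srcs / j_srcs: collection of registers read.
    """
    kinds = set()
    if i_dest is not None and i_dest in j_srcs:
        kinds.add("RAW")    # j reads what i writes
    if j_dest is not None and j_dest in i_srcs:
        kinds.add("WAR")    # j writes what i reads
    if i_dest is not None and i_dest == j_dest:
        kinds.add("WAW")    # both write the same register
    return kinds


# The examples from the slides:
print(classify_dependences("R1", {"R2", "R3"}, "R4", {"R1", "R5"}))  # ADD / SUB
print(classify_dependences("R1", {"R2"}, "R1", {"R2", "R3"}))        # LW / ADD
print(classify_dependences(None, {"R1", "R2"}, "R2", {"R3", "R4"}))  # SW / ADD
```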

30
CLASSIFICATION OF DATA HAZARDS

HAZARDS REQUIRING STALLS
Consider the situation where a load and a sub
instruction are consecutive, and the
destination register of the load is a source
register for the sub.
This hazard cannot be removed by forwarding
alone, because the loaded data is not available
until the end of the load's MEM cycle.
Hence a pipeline interlock is introduced to
detect the hazard and stall the pipeline until
the hazard is cleared. The hazard is checked
during the ID phase, and the interlock stalls the
instruction that wants to use the data until the
source instruction produces it.
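The check made by the interlock in the ID phase can be sketched as a single predicate. This is an illustrative simplification with hypothetical names: it looks only at the instruction currently in EX and the source registers of the instruction in ID.

```python
def needs_load_stall(ex_op, ex_dest, id_srcs):
    """True if the instruction in ID must stall for one cycle.

    ex_op / ex_dest: opcode and destination register number of the
    instruction now in EX; id_srcs: source register numbers of the
    instruction now in ID. R0 is assumed hardwired to zero.
    """
    return ex_op == "LW" and ex_dest != 0 and ex_dest in id_srcs


# LW R1,0(R2) followed by SUB R4,R1,R5: the interlock must stall.
print(needs_load_stall("LW", 1, (1, 5)))   # True
# ADD R1,R2,R3 followed by SUB R4,R1,R5: forwarding is enough.
print(needs_load_stall("ADD", 1, (1, 5)))  # False
```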

31
CONTROL HAZARDS

Control hazards can cause a greater performance
loss than data hazards.
The simplest method of dealing with branches
is to stall the pipeline as soon as the
branch is detected in the ID stage, and to keep
it stalled until the MEM stage, where the new PC
is finally determined.

32
CONTROL HAZARDS

Each branch causes a 3 cycle stall in the DLX
pipeline, which is a significant loss as 30%
of the instructions used are branch instructions.
The number of stall cycles per branch is
reduced by testing the branch condition in
the ID stage and computing the destination
address in the ID stage using a separate adder.
Thus there is only a one clock cycle stall on
branches.
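The effect of the branch stall on performance can be worked through numerically. The sketch below assumes an ideal CPI of 1 with no other stalls, and reads the slide's figure as a branch frequency of 30%; both are assumptions made for illustration, as is the function name.

```python
def cpi_with_branch_stalls(branch_frequency, stall_cycles, ideal_cpi=1.0):
    """Effective CPI when every branch adds a fixed number of stall cycles."""
    return ideal_cpi + branch_frequency * stall_cycles


slow = cpi_with_branch_stalls(0.30, 3)   # branch resolved in MEM
fast = cpi_with_branch_stalls(0.30, 1)   # branch resolved in ID
print(f"{slow:.2f} {fast:.2f}")          # 1.90 1.30
print(f"speedup from early resolution: {slow / fast:.2f}")
```

Under these assumptions, moving the branch test and target computation into ID makes the pipeline about 1.46 times faster on the same instruction mix.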

33
WHAT MAKES PIPELINING HARD TO IMPLEMENT???

EXCEPTIONAL SITUATIONS are those situations in
which the normal order of execution is changed.
This is due to instructions that raise exceptions
that may force the machine to abort the
instructions in the pipeline before they complete.

34
WHAT MAKES PIPELINING HARD TO IMPLEMENT???

Some of the exceptions include
Integer arithmetic overflow/underflow.
Power failure
Hardware malfunctions.
I/O device request.

35
WHAT MAKES PIPELINING HARD TO IMPLEMENT???

The five categories that are used to define what
action is needed for the different exception
types are
Synchronous versus asynchronous
User requested versus coerced
User maskable versus non maskable
Within versus between instructions
Resume versus terminate

36
WHAT MAKES PIPELINING HARD TO IMPLEMENT???

EXCEPTIONS IN DLX
IF- page-fault on instruction fetch, misaligned
memory access
ID- undefined/illegal opcode.
EX-arithmetic exceptions.
MEM- page-fault on data fetch, misaligned memory
access.
WB-none

37
DLX FP PIPELINE

THE FLOATING POINT PIPELINE HAS THE SAME
STRUCTURE AS THE INTEGER PIPELINE EXCEPT FOR THE
FOLLOWING TWO IMPORTANT CHANGES.
The EX cycle can be repeated as many times as
needed to complete the operation.

38
DLX FP PIPELINE

There are four separate functional units,
including multiple floating point units.
1. The main integer unit, which handles loads
and stores, integer ALU operations and branches.
2. FP and integer multiplier.
3. FP adder.
4. FP and integer divider.

39
DLX FP PIPELINE

All the execution stages of these functional
units are not pipelined.
THE FLOATING POINT PIPELINE HAS A LONGER LATENCY
FOR OPERATIONS.
Latency is defined as the number of cycles that
elapse between an instruction producing a
result and an instruction using that result.

40
DLX FP PIPELINE

Equivalently, latency is the number of stages
after EX at which an instruction produces its
result.
Using the above definition, the various
functional units have different latencies, as
shown below.
1. Integer ALU: 0
2. Data memory: 1

41
DLX FP PIPELINE

3. FP add: 3
4. FP multiply: 6
5. FP divide: 24
The pipeline structure has been implemented
with the above latencies with the introduction of
additional pipeline registers between the
additional pipe-stages.
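The latencies listed above translate directly into stall cycles for a dependent instruction. The sketch below assumes full forwarding and a consumer that needs its operand at the start of its own EX stage; the table keys and the function name are hypothetical labels for the units in the slides.

```python
# Latencies as listed in the slides (cycles between producer and consumer).
LATENCY = {"int_alu": 0, "load": 1, "fp_add": 3, "fp_mul": 6, "fp_div": 24}


def stall_cycles(producer_unit, distance):
    """Stall cycles for a consumer issued `distance` instructions after the producer.

    distance=1 means the consumer immediately follows the producer.
    Each independent instruction placed in between hides one cycle of latency.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    return max(0, LATENCY[producer_unit] - (distance - 1))


print(stall_cycles("int_alu", 1))  # 0
print(stall_cycles("load", 1))     # 1
print(stall_cycles("fp_add", 1))   # 3
print(stall_cycles("fp_mul", 4))   # 3
```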

42
DLX FP PIPELINE

FEATURES
FP multiplier is pipelined with 7 stages.
FP adder is pipelined with 4 stages.
FP divider is not pipelined and requires 24
clock cycles to complete an operation.
Structural hazards and both RAW and WAW data
hazards are possible.

43
INTERDEPENDENCE OF INSTRUCTION SET DESIGN AND
PIPELINING

Variable instruction length and execution times
lead to imbalance among pipeline stages, thus
complicating hazard detection.
Sophisticated addressing modes such as
post-increment that update registers complicate
hazard detection.
Architectures such as the 80x86 that allow
writes into instruction space complicate
pipelining.

44
MIPS R4000 PIPELINE

FEATURES
MIPS-3 INSTRUCTION SET-64 BIT
DEEPER PIPELINE THAN DLX-8 STAGE
HIGHER CLOCK RATE ACHIEVED.
BOTH LOAD AND BRANCH DELAYS ARE INCREASED
BASIC BRANCH DELAY 3 CYCLES

45
MIPS R4000 PIPELINE

The MIPS R4000 floating point unit consists of 3
functional units: a floating point divider, a
floating point multiplier and a floating point
adder.
The primary reasons for stalls in the MIPS R4000
pipeline have been attributed to the following.

46
MIPS R4000 PIPELINE

Load stalls: delays arising from the use of a
load result one or two cycles after the load.
Branch stalls: a two cycle stall on every
taken branch.
FP result stalls: stalls due to RAW hazards for
an FP operand.
FP structural stalls: stalls arising from
conflicts for functional units.