Title: PIPELINED PROCESSORS
1 PIPELINED PROCESSORS
- Chapter No. 5
- By Najma Ismat
2 Pipeline Evolution in Processors
- Pipelining first appeared at the end of the 1960s in the first supercomputers of that time, such as the IBM 360/91 (1967) and the CDC 7600 (1970).
- In 1970, pipelining was used at the instruction level in the B7700 mainframe.
3 Principle of Pipelining
- A number of functional units are employed in sequence to perform a single computation.
- Each functional unit represents a certain stage of the computation.
- Pipelining allows overlapped execution of instructions, that is, temporal overlapping of processing.
- It increases the overall throughput of the processor.
- In pipelined operation, each task is divided into a number of subtasks.
4 Principle of Pipelining
- Each pipeline stage is associated with one subtask and performs the required operation.
- In a basic pipeline, the same amount of time is available in each stage for performing its task.
- All pipeline stages operate like an assembly line: each stage typically receives its input from the previous stage and delivers its output to the next stage.
- The basic pipeline is clocked (synchronous), that is, each stage accepts a new input at the start of each clock cycle.
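The throughput gain described above can be made concrete with a small cycle count. The sketch below assumes a basic k-stage pipeline in which every stage takes exactly one clock cycle and no stalls occur; the function names are hypothetical.

```python
def unpipelined_cycles(n_tasks: int, k_stages: int) -> int:
    """Cycles needed when each task runs all k subtasks before the next task starts."""
    return n_tasks * k_stages


def pipelined_cycles(n_tasks: int, k_stages: int) -> int:
    """Cycles needed when the k stages overlap like an assembly line.

    The first task needs k cycles to fill the pipeline; after that, one task
    leaves the last stage in every cycle (assumes no stalls).
    """
    if n_tasks == 0:
        return 0
    return k_stages + n_tasks - 1


def speedup(n_tasks: int, k_stages: int) -> float:
    """Ratio of unpipelined to pipelined cycle counts."""
    return unpipelined_cycles(n_tasks, k_stages) / pipelined_cycles(n_tasks, k_stages)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, unpipelined_cycles(n, 5), pipelined_cycles(n, 5), round(speedup(n, 5), 2))
```

For long instruction streams the speedup approaches the number of stages k, which is why deeper pipelines look attractive before dependencies are taken into account.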
5 Principle of Pipelining
6 Pipelined Operation
7 Pipelined Operation
8 Pipelined and Unpipelined Processing
9 Processor Pipelines in Reality
- A real pipeline may include a few extensions to the basic pipeline.
- Pipelined execution is also often performed using half-cycles, and in certain cases one or more pipeline stages may have to be recycled (used for more than one cycle) to accomplish a given task.
- These additional cycles may be required to perform certain arithmetic operations.
10 Logical Layout of Pentium Pipeline
11 Logical Layout of PowerPC 604 Pipeline
12 General Structure of Pipelines
- A pipeline consists of a number of stages, one for each subtask. The stages are decoupled from each other by registers called latches.
- At the end of each clock cycle, the latches gate in their inputs and forward them to the associated stage, where the required operation is performed.
- In practice, each stage is often implemented by a number of different functional units (FUs) or execution units (EUs) that perform the required operations.
- The latches are extended with multiplexers that select and transfer data from the outputs of the preceding EUs to the inputs of the subsequent EUs.
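A minimal sketch of this latch structure, assuming a single linear pipeline with one latch in front of each stage, no multiplexers and no stalls; `simulate` and the instruction labels are hypothetical. Each loop iteration models one clock edge at which every latch gates in the output of the preceding stage.

```python
from collections import deque
from typing import Iterable, List, Optional


def simulate(instructions: Iterable[str], n_stages: int) -> List[List[Optional[str]]]:
    """Return the contents of every pipeline latch for each clock cycle.

    latches[i] holds the item that stage i processes during the current cycle;
    None marks an empty stage (a bubble).
    """
    pending = deque(instructions)
    latches: List[Optional[str]] = [None] * n_stages
    timeline: List[List[Optional[str]]] = []
    while True:
        # Clock edge: each latch gates in the output of the preceding stage,
        # and the first latch gates in the next instruction from the stream.
        incoming = pending.popleft() if pending else None
        latches = [incoming] + latches[:-1]
        if all(slot is None for slot in latches):
            break  # the pipeline has drained
        timeline.append(list(latches))
    return timeline


if __name__ == "__main__":
    for cycle, row in enumerate(simulate(["i1", "i2", "i3"], 4), start=1):
        print(cycle, ["--" if s is None else s for s in row])
```

With three instructions and four stages the run takes six cycles, and from the fourth cycle on all stages hold a different instruction at the same time.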
13 General Structure of Pipelines
15 Pipeline Performance Measures
- Non-pipelined processor
  - characterized by the instruction cycle time and the execution time
- Pipelined processor
  - the execution time is of little importance
  - three different measures are used: cycle time, latency and repetition rate
- Cycle time
  - specifies the time available for each stage to accomplish the required operations
16 Pipeline Performance Measures
- The cycle time is determined by the worst-case processing time of the longest stage.
- Latency
  - specifies the amount of time it takes for the result of a particular instruction to become available in the pipeline for a subsequent dependent instruction
  - used in the context of processing a subsequent RAW-dependent instruction
  - two kinds of latency, define-use latency and load-use latency, corresponding to the two types of RAW dependency (define-use and load-use dependency)
17 Pipeline Performance Measures
- Define-use latency
  - mul r1, r2, r3
  - add r5, r1, r4
- Define-use delay
  - the time a subsequent RAW-dependent instruction has to be stalled in the pipeline
- Load-use latency
  - r1, x
  - add r5, r1, r2
- Load-use delay
  - interpreted in the same way as the define-use delay
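The relation between latency and delay can be illustrated with a simple in-order, single-issue pipeline with a repetition rate of one cycle. In this sketch, which is an assumption-laden model rather than a description of a particular processor, a result with a latency of L cycles can be used by an instruction issued L cycles after its producer, so a back-to-back RAW-dependent instruction is stalled for L - 1 cycles. The function `schedule` and the 3-cycle latency assumed for `mul` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (assembly text, destination register, source registers, result latency in cycles)
Instr = Tuple[str, str, Sequence[str], int]


def schedule(program: List[Instr]) -> List[Tuple[str, int, int]]:
    """Return (text, issue cycle, stall cycles) for each instruction.

    Model (an assumption): in-order, single issue, repetition rate of one cycle.
    A result with latency L issued in cycle c can be used from cycle c + L on.
    """
    ready: Dict[str, int] = {}  # register -> first cycle its new value is usable
    next_free = 0               # earliest issue cycle imposed by the repetition rate
    out: List[Tuple[str, int, int]] = []
    for text, dest, srcs, latency in program:
        issue = max([next_free] + [ready.get(r, 0) for r in srcs])
        out.append((text, issue, issue - next_free))  # stall = the define-use delay
        ready[dest] = issue + latency
        next_free = issue + 1
    return out


if __name__ == "__main__":
    program = [
        ("mul r1, r2, r3", "r1", ("r2", "r3"), 3),  # hypothetical 3-cycle multiply
        ("add r5, r1, r4", "r5", ("r1", "r4"), 1),
    ]
    for text, issue, stall in schedule(program):
        print(f"{text:16} issue={issue} stall={stall}")
```

The same function covers the load-use case if the load is given its own (again hypothetical) latency.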
18 Pipeline Performance Measures
- Repetition rate
  - closely related to throughput (the throughput is its reciprocal)
  - specifies the shortest possible time interval between subsequent instructions in the pipeline
  - the repetition rate of a basic pipeline is one cycle
  - the repetition rate determines the performance potential of a pipeline
- If no define-use or load-use delays exist between instructions, the performance potential of a pipeline can be calculated as $P = \frac{1}{R \cdot t_c}$
19 Pipeline Performance Measures
- where
  - $R$ is the repetition rate of the pipeline in cycles
  - $t_c$ is the cycle time of the pipeline
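A worked example of $P = \frac{1}{R \cdot t_c}$, combined with the rule that the longest stage sets the cycle time. The stage delays below are made-up numbers and the function names are hypothetical.

```python
from typing import Sequence


def cycle_time_ns(stage_delays_ns: Sequence[float]) -> float:
    """The clock must accommodate the worst-case delay of the longest stage."""
    return max(stage_delays_ns)


def performance_potential_mips(r_cycles: float, tc_ns: float) -> float:
    """P = 1 / (R * t_c) instructions per ns, i.e. 1000 / (R * t_c) MIPS."""
    return 1000.0 / (r_cycles * tc_ns)


if __name__ == "__main__":
    stages = [8.0, 10.0, 7.0, 9.0]  # hypothetical stage delays in ns
    tc = cycle_time_ns(stages)
    print(tc, performance_potential_mips(1, tc))  # basic pipeline: R = 1 cycle
```

With a 10 ns cycle and R = 1 the potential is 100 MIPS; halving the repetition rate (R = 2) halves it to 50 MIPS, and shortening only the non-critical stages would not change it at all.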
20 Application Scenarios of Pipelines
21 Design Space of Pipelines
Key aspects of the design space of pipelines
22 Basic Pipeline Layout
23 Basic Pipeline Layout
- The number of pipeline stages
  - when more pipeline stages are used, more parallel execution and thus higher performance can be expected
  - disadvantage: a larger number of stages results in more frequent data and control dependencies, which decrease performance
- Specification of the subtasks to be performed in each stage
  - the subtasks are specified at a number of levels of increasing detail
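The trade-off between more stages and more frequent dependency stalls can be explored with a deliberately simple toy model. Every parameter here is an assumption, not a figure from the text: the unpipelined logic delay is split evenly over k stages, each latch adds a fixed overhead, and a fraction of instructions incurs a stall costing k - 1 cycles. The function names are hypothetical.

```python
def time_per_instruction_ns(k: int, logic_ns: float, latch_ns: float, stall_fraction: float) -> float:
    """Average time per instruction in a hypothetical k-stage pipeline.

    Toy assumptions: the logic delay splits evenly over k stages, every stage
    pays a fixed latch overhead, and a fraction of instructions stalls for
    k - 1 cycles because of data or control dependencies.
    """
    cycle_ns = logic_ns / k + latch_ns
    cycles_per_instruction = 1 + stall_fraction * (k - 1)
    return cycle_ns * cycles_per_instruction


def best_depth(logic_ns: float, latch_ns: float, stall_fraction: float, max_k: int = 30) -> int:
    """Number of stages (1..max_k) that minimises the average time per instruction."""
    return min(range(1, max_k + 1),
               key=lambda k: time_per_instruction_ns(k, logic_ns, latch_ns, stall_fraction))


if __name__ == "__main__":
    for k in (1, 3, 5, 9, 15, 25):
        print(k, round(time_per_instruction_ns(k, 10.0, 0.5, 0.2), 3))
    print("best k:", best_depth(10.0, 0.5, 0.2))
```

With these made-up numbers the minimum lies at an intermediate depth (k = 9); with no dependency stalls (`stall_fraction = 0`) the model keeps improving up to `max_k`, mirroring the advantage and disadvantage listed above.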
24 Number of Pipeline Stages
25 Number of Pipeline Stages
26 Basic Pipeline Layout
- Layout of the stage sequence
  - concerns how the pipeline stages are used
- Use of bypassing
  - intended to reduce or eliminate pipeline stalls due to RAW dependencies
  - Problem: unless special arrangements are made, the result of an operation instruction is first written into the register file (or into memory) and only then fetched from there as a source operand by a subsequent instruction
  - Solution: the result of the EU is immediately forwarded to its input for use in the next pipeline cycle
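To see what bypassing saves, the sketch below counts RAW stall cycles in a hypothetical four-stage pipeline (F: fetch, D: decode and register read, E: execute, W: write back), assuming one instruction enters per cycle and every stage takes one cycle. The optional `split_cycle_rf` flag models a register file that is written in the first half-cycle and read in the second, one possible use of half-cycle operation; the stage names, the function and the flag are assumptions for illustration.

```python
# Hypothetical 4-stage pipeline: F (fetch), D (decode + register read),
# E (execute), W (write back). Stage offsets relative to the fetch cycle:
D, E, W = 1, 2, 3


def raw_stall_cycles(distance: int, bypass: bool, split_cycle_rf: bool = False) -> int:
    """Stall cycles for a consumer fetched `distance` instructions after its producer."""
    producer_fetch = 0
    consumer_fetch = producer_fetch + distance
    natural_execute = consumer_fetch + E  # when E would happen with no stall
    if bypass:
        # The EU output is forwarded straight back to its input for the next cycle.
        earliest_execute = producer_fetch + E + 1
    else:
        # The operand must be read from the register file in D, after W has written it.
        if split_cycle_rf:
            earliest_read = producer_fetch + W      # write first half, read second half
        else:
            earliest_read = producer_fetch + W + 1  # read only in the cycle after W
        earliest_execute = earliest_read + (E - D)
    return max(0, earliest_execute - natural_execute)


if __name__ == "__main__":
    for bypass in (False, True):
        print("bypass" if bypass else "no bypass",
              [raw_stall_cycles(d, bypass) for d in (1, 2, 3)])
```

In this model bypassing removes the stall for back-to-back dependent instructions entirely, at the cost of the extra forwarding path and multiplexer inputs.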
27 Layout of the Stage Sequence
28 Bypassing
29 Basic Pipeline Layout
- The implementation of bypassing requires an additional data bus for forwarding the results of the execution stage to its input, and an appropriate extension of the associated multiplexers and latches.
- Timing of the pipeline operations
  - self-timed (asynchronous)
  - clocked (synchronous)
30 Timing of Pipeline Operations
31 Dependency Resolution
Methods of dependency resolution:
- Static resolution, performed by the compiler
- Dynamic resolution, performed by extra hardware
- Combined resolution, performed partly by the compiler and partly by the hardware
Trend
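Static resolution can be pictured as the compiler padding the instruction stream so that no RAW-dependent instruction enters the pipeline too early. The sketch below is one simple variant, assuming a single fixed define-use delay and NOP insertion only (a compiler could also try to move independent instructions into the gap); `insert_nops` and the tuple format are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (assembly text, destination register or None, source registers)
Instr = Tuple[str, Optional[str], Sequence[str]]


def insert_nops(program: List[Instr], delay: int) -> List[str]:
    """Pad the program with NOPs so that at least `delay` slots separate each
    producer from any later instruction that reads its result."""
    out: List[str] = []
    usable_at: Dict[str, int] = {}  # register -> first slot where it may be read
    for text, dest, srcs in program:
        earliest = max([usable_at.get(r, 0) for r in srcs], default=0)
        while len(out) < earliest:
            out.append("nop")
        out.append(text)
        if dest is not None:
            # The producer sits at slot len(out) - 1; a consumer must wait `delay` extra slots.
            usable_at[dest] = len(out) + delay
    return out


if __name__ == "__main__":
    prog = [
        ("mul r1, r2, r3", "r1", ("r2", "r3")),
        ("sub r6, r7, r8", "r6", ("r7", "r8")),  # independent of r1
        ("add r5, r1, r4", "r5", ("r1", "r4")),
    ]
    print(insert_nops(prog, delay=2))
```

Here the independent `sub` already fills one of the two delay slots, so only one `nop` is needed before `add`.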
32 Overview of Pipelined Instructions
33 Logical Layout
- It specifies the tasks to be accomplished, which includes:
  - the declaration of the pipelines to be implemented
    - usually separate pipelines: one for processing FX and logical data (the FX pipeline), one for FP data (the FP pipeline), one for loads and stores (the L/S pipeline) and one for branches (the B pipeline)
    - the DEC α 21164 provides two types of FX (integer) pipelines
  - a detailed specification of the subtasks to be performed and their execution sequence for each pipeline
  - a detailed description of the subtasks to be performed in each stage
34 PowerPC 601 Example
35 Detailed Description of FX Pipeline
36 Implementation of Instruction Pipeline
37 Layout of the Physical Pipelines
38 Layout of the Physical Pipelines
- Multifunction pipelines
  - only one published design of a multifunction pipeline is available, the MIPS R4200, which implements all the FX, FP, L/S and B instructions
  - the classical (master pipeline) approach is implemented in the IBM 801, MIPS, MIPS-X, MIPS R-series (up to the R6000), i486 and Pentium
- Dedicated pipelines
  - dedicated pipelines are implemented in the PowerPC 603, PowerPC 604, DEC α and others
39 Multiplicity of Pipelines
- Multiplicity refers to whether a single instance or multiple instances of a physical pipeline are used.
- Two aspects should be considered when deciding on pipeline multiplicity:
  - the frequency of instructions
  - out-of-order execution of instructions due to multiple pipelines
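The second aspect can be illustrated with two dedicated pipelines of different lengths. The sketch below assumes hypothetical latencies (1 cycle for an FX pipeline, 4 cycles for an FP pipeline) and in-order issue of one instruction per cycle. It shows that results then complete out of program order, and when they could be committed if commits must follow program order (at most one per cycle, another assumption). The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical result latencies (cycles) of two physical pipelines.
LATENCY: Dict[str, int] = {"FX": 1, "FP": 4}


def completion_order(program: List[Tuple[str, str]]) -> List[str]:
    """Order in which results appear when one instruction issues per cycle."""
    finish = [(issue + LATENCY[pipe], issue, name)
              for issue, (name, pipe) in enumerate(program)]
    return [name for _, _, name in sorted(finish)]


def in_order_commit(program: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Cycle in which each result is committed if commits follow program order."""
    commits: List[Tuple[str, int]] = []
    last = -1
    for issue, (name, pipe) in enumerate(program):
        done = issue + LATENCY[pipe]
        last = max(last + 1, done)  # never before an older instruction, one per cycle
        commits.append((name, last))
    return commits


if __name__ == "__main__":
    prog = [("fmul", "FP"), ("add", "FX"), ("sub", "FX")]
    print(completion_order(prog))
    print(in_order_commit(prog))
```

The two FX instructions finish before the older FP multiply, so some mechanism is needed if the processor must present results in sequential program order.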
40 Multiplicity of Pipelines
41 Preserving Sequential Consistency
42 Implementation of Pipelined Instruction Processing
43 Implementation of Pipelined Instruction Processing