PIPELINING - PowerPoint PPT Presentation

Provided by: Crea199

1
PIPELINING
  • -Deepak Haran
  • (2000B5A3710)

2
WHAT IS PIPELINING??
  • Pipelining is an implementation technique where
    multiple instructions are overlapped in execution
    to make fast CPUs.
  • It is an implementation which exploits
    parallelism among the instructions in a
    sequential instruction stream.

3
THE METHODOLOGY
  • In a pipeline, each step is called a pipe
    stage or pipe segment and completes one part of an
    instruction.
  • The stages are connected to one another to form a
    pipe.
  • Instructions enter at one end, progress through
    each stage and exit at the other end.

4
THE NEED FOR PIPELINING
  • TO MAKE FAST CPUS.
  • This is accomplished by increasing the CPU
    throughput (the number of instructions completed
    per unit time).
  • It yields a reduction in the average execution
    time per instruction. For a machine with multiple
    clock cycles per instruction, pipelining can be
    viewed as reducing the CPI (clock cycles per
    instruction).

5
Contd
  • In the ideal case (perfectly balanced stages, no
    stalls):
  • time per instruction on a pipelined machine =
    (time per instruction on the unpipelined machine) /
    (number of pipe stages)
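
The ideal relation above can be checked numerically. The following is a minimal Python sketch; the function name and the 50 ns / 5 stage figures are illustrative assumptions, not values from the slides.

```python
def ideal_pipelined_time(unpipelined_time_ns: float, num_stages: int) -> float:
    """Ideal time per instruction: unpipelined time divided by the stage count.

    Assumes perfectly balanced stages and no stalls (the ideal case only).
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    return unpipelined_time_ns / num_stages


if __name__ == "__main__":
    # Hypothetical machine: 50 ns per instruction unpipelined, 5 pipe stages.
    print(ideal_pipelined_time(50.0, 5))
```

Real pipelines fall short of this bound because of stage imbalance, pipeline register overhead and hazards.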

6
IMPLEMENTATION OF THE DLX INSTRUCTION SET
  • The DLX architecture has been chosen because its
    simplicity makes it easy to demonstrate the
    principles of pipelining.
  • Each DLX instruction can be implemented in at
    most 5 clock cycles. The implementation uses
    several temporary registers, which simplify
    pipelining.

7
IMPLEMENTATION OF THE DLX INSTRUCTION SET
  • The five clock cycles are as follows:
  • Instruction Fetch cycle (IF): the instruction
    stored in memory at the address in the PC is
    loaded into the IR, and (PC + 4) is stored in NPC.
  • Instruction Decode/Register Fetch cycle (ID):
  • Decoding is done in parallel with reading
    registers because the register fields are at a fixed
    location in the format (fixed-field decoding).


8
IMPLEMENTATION OF THE DLX INSTRUCTION SET
  • Execution/Effective Address cycle (EX):
  • The ALU operates on the operands prepared in
    the prior cycle, performing a function that depends
    on the DLX instruction type.
  • Memory Access/Branch Completion cycle (MEM):
  • The only instructions that are active are
    loads, stores and branches.
  • Memory reference: if the instruction is a
    load, then data from memory is placed in the
    LMD register. If the instruction is a store,

9
IMPLEMENTATION OF THE DLX INSTRUCTION SET
  • then data from the B register is written into
    memory at the address held in
    register ALUOutput.
  • Branch: if the branch is taken, the PC
    is replaced with the branch destination address
    in ALUOutput; otherwise, it is replaced with the
    incremented PC in register NPC.

10
IMPLEMENTATION OF THE DLX INSTRUCTION SET
  • Write Back cycle (WB):
  • The result is written into the register file,
    whether it comes from the memory system or from
    the ALU.
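
The five cycles can be sketched as a tiny unpipelined interpreter. This is an illustrative Python sketch, not the DLX hardware: the tuple encoding of instructions, the `Machine` class and the four-opcode subset are assumptions made here, while the temporary register names (IR, NPC, A, B, ALUOutput, LMD) follow the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)
    imem: dict = field(default_factory=dict)  # address -> instruction tuple
    dmem: dict = field(default_factory=dict)  # address -> data word


def step(m: Machine) -> None:
    """Walk one instruction through IF, ID, EX, MEM and WB."""
    # IF: IR <- instruction memory at PC; NPC <- PC + 4.
    ir = m.imem[m.pc]
    npc = m.pc + 4

    # ID: decode and read register operands into A and B (hypothetical encoding).
    op, x, y, z = ir
    a = b = imm = 0
    if op == "ADD":      # ADD: rd=x, rs1=y, rs2=z
        a, b = m.regs[y], m.regs[z]
    elif op == "LW":     # LW: rd=x, base=y, offset=z
        a, imm = m.regs[y], z
    elif op == "SW":     # SW: source=x, base=y, offset=z
        a, b, imm = m.regs[y], m.regs[x], z
    elif op == "BEQZ":   # BEQZ: tested register=x, offset=y
        a, imm = m.regs[x], y
    else:
        raise ValueError(f"unsupported opcode: {op}")

    # EX: the ALU computes a result, an effective address or a branch target.
    cond = False
    if op == "ADD":
        alu_output = a + b
    elif op in ("LW", "SW"):
        alu_output = a + imm
    else:
        alu_output = npc + imm
        cond = (a == 0)

    # MEM: loads fill LMD, stores write B, branches choose the next PC.
    lmd = 0
    m.pc = alu_output if cond else npc
    if op == "LW":
        lmd = m.dmem.get(alu_output, 0)
    elif op == "SW":
        m.dmem[alu_output] = b

    # WB: write the ALU result or the loaded data to the register file.
    if op == "ADD":
        m.regs[x] = alu_output
    elif op == "LW":
        m.regs[x] = lmd
    m.regs[0] = 0  # R0 always reads as zero in DLX


def demo() -> Machine:
    m = Machine()
    m.regs[1] = 8
    m.dmem[12] = 5
    m.imem = {
        0: ("LW", 2, 1, 4),     # R2 <- Mem[R1 + 4]
        4: ("ADD", 3, 2, 2),    # R3 <- R2 + R2
        8: ("SW", 3, 1, 0),     # Mem[R1 + 0] <- R3
        12: ("BEQZ", 0, 8, 0),  # R0 is zero, so the branch is taken
    }
    for _ in range(4):
        step(m)
    return m
```

Calling `demo()` runs a four-instruction program that exercises the load, ALU, store and branch paths once each.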

11
IMPLEMENTATION OF THE DLX INSTRUCTION SET
  • Single Cycle vs Multiple Cycle Implementation
  • Multiple cycle implementation: each
    instruction takes multiple clock cycles to
    execute. In the DLX set, each instruction takes
    at most five clock cycles.
  • Single cycle implementation: each instruction
    takes one long clock cycle.

12
IMPLEMENTATION OF THE DLX INSTRUCTION SET
  • However, the single cycle implementation is not
    used, for two reasons:
  • 1. It is inefficient for machines in which the
    amount of work, and hence the clock cycle time
    needed, varies appreciably between different
    instructions.
  • 2. It requires the duplication of functional
    units that could be shared in a multicycle
    implementation.

13
THE BASIC PIPELINE FOR DLX
  • Each instruction takes 5 clock cycles to
    complete, but during each clock cycle the hardware
    initiates a new instruction and executes
    some part of five different instructions.
  • The same datapath resource cannot be asked to
    perform two different operations in the same
    clock cycle.
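
The overlap can be made concrete by building the usual pipeline timing chart. This is a minimal Python sketch assuming an ideal five-stage pipeline with no stalls; the function names are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(num_instructions: int) -> list:
    """Return one row per instruction; each row lists the stage occupied in each clock cycle."""
    if num_instructions < 1:
        raise ValueError("num_instructions must be at least 1")
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i enters IF in cycle i + 1
        rows.append(row)
    return rows


def format_chart(rows: list) -> str:
    """Render the chart with clock cycle numbers across the top."""
    header = " " * 6 + "".join(f"{c:>5}" for c in range(1, len(rows[0]) + 1))
    lines = [header]
    for i, row in enumerate(rows):
        lines.append(f"i+{i:<4}" + "".join(f"{name:>5}" for name in row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_chart(pipeline_chart(5)))
```

With five instructions, clock cycle 5 is the first cycle in which all five stages are busy, each with a different instruction.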

14
THE BASIC PIPELINE FOR DLX
  • Furthermore, pipelining the datapath requires
    that values passed from one pipe stage to the
    next be placed in registers called pipeline
    registers.
  • These registers convey values and control
    information from one stage to another.

15
THE BASIC PIPELINE FOR DLX
  • In the DLX pipeline, the major functional units
    (such as the ALU) are used in different cycles, so
    overlapping the execution of multiple
    instructions introduces relatively few conflicts.
  • This is possible for the following reasons.

16
THE BASIC PIPELINE FOR DLX
  • Using separate instruction and data
    memories eliminates the conflict over a single
    memory that would otherwise arise between the
    instruction fetch and the data memory access of
    different instructions.
  • The register file is used in two stages in the
    same clock cycle: for reading in the ID stage and
    for writing in the WB stage. The write is performed
    in the first half of the cycle and the read in the
    second half, so the two do not conflict.

17
THE BASIC PIPELINE FOR DLX
  • To start a new instruction every clock, the PC
    must be incremented and stored every clock.
    This is done in the IF stage, where either the
    incremented PC or the target of an earlier
    branch is written into the PC.

18
PIPELINE HAZARDS
  • WHAT ARE PIPELINE HAZARDS ???
  • Hazards are situations that prevent the
    next instruction in the instruction stream from
    executing during its designated clock cycle. They
    reduce the performance from the ideal speedup
    gained by pipelining.

19
CLASSIFICATION OF HAZARDS
  • Structural hazards arise from resource
    conflicts when the hardware cannot support all
    possible combinations of instructions in
    simultaneous overlapped execution.
  • Data hazards arise when an instruction depends
    upon the results of a previous instruction in a
    way that is exposed by the overlapping of
    instructions in the pipeline.

20
CLASSIFICATION OF HAZARDS
  • Control Hazards arise from the pipelining of
    branches and other instructions that change the PC

21
STRUCTURAL HAZARDS
  • For a system to be free from structural hazards,
    its functional units must be pipelined and its
    resources duplicated enough to allow all possible
    combinations of instructions in the pipeline.
  • Structural hazards arise for the following
    reasons:

22
STRUCTURAL HAZARDS
  • When a functional unit is not fully pipelined,
    a sequence of instructions using that unit
    cannot proceed at the rate of one per clock
    cycle.
  • When a resource is not duplicated enough to
    allow all possible combinations of instructions.
  • Example: a machine may have one register file
    write port, but it may want to perform 2 writes
    during the same clock cycle.

23
STRUCTURAL HAZARDS
  • Consider a machine with a single memory shared
    by data and instructions. An instruction that makes
    a data memory reference will conflict with the
    instruction fetch of a later instruction.
  • This is resolved by stalling the pipeline for one
    clock cycle when the data memory access occurs.
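
The one-cycle stall can be reproduced with a small model of a single shared memory port. This Python sketch assumes a five-stage pipeline in which MEM comes three cycles after IF and stalls occur only at instruction fetch; the function name and the boolean encoding of the program are illustrative.

```python
def fetch_cycles(is_mem_op: list) -> list:
    """Return the clock cycle in which each instruction is fetched.

    is_mem_op[k] is True when instruction k is a load or store. A fetch is
    delayed whenever the single memory port is busy with a data access.
    """
    busy = set()   # cycles in which the port serves a load/store in MEM
    fetched = []
    cycle = 0
    for mem_op in is_mem_op:
        cycle += 1
        while cycle in busy:
            cycle += 1              # stall: the data access owns the port
        fetched.append(cycle)
        if mem_op:
            busy.add(cycle + 3)     # MEM is three stages after IF
    return fetched


if __name__ == "__main__":
    # A load followed by four ALU instructions.
    print(fetch_cycles([True, False, False, False, False]))
```

In this example the fourth instruction would normally be fetched in cycle 4, but that is the cycle in which the load accesses memory, so its fetch slips to cycle 5.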

24
DATA HAZARDS
  • Data hazards occur when the pipeline changes
    the order of read/write accesses to operands so
    that it differs from the order seen when
    instructions execute sequentially on an
    unpipelined machine.

25
CLASSIFICATION OF DATA HAZARDS
  • RAW (read after write): consider two
    instructions i and j, with i occurring before j.
  • j tries to read a source before i
    writes it, so j incorrectly gets the old
    value.
  • Example (every instruction after the ADD reads R1):
  • ADD R1,R2,R3
  • SUB R4,R1,R5
  • AND R6,R1,R7
  • OR R8,R1,R9
  • XOR R10,R1,R11

26
CLASSIFICATION OF DATA HAZARDS
  • This hazard is overcome by a simple hardware
    technique called forwarding.
  • In forwarding, the ALU result from the EX/MEM
    register is always fed back to the ALU input
    latches.
  • If the forwarding hardware detects that the
    previous ALU operation has written the register
    corresponding to a source for the current ALU
    operation, the control logic selects the
    forwarded result as the ALU input rather than the
    value read from the register file.
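
The selection made by the forwarding control logic can be sketched in a few lines. This Python sketch covers only forwarding from the EX/MEM register described above; the function and parameter names are assumptions, and treating R0 as never forwarded reflects the DLX convention that R0 always reads as zero.

```python
def select_alu_inputs(src_regs, regfile_values, ex_mem_dest, ex_mem_result):
    """Pick each ALU input: the forwarded EX/MEM result or the register file value.

    src_regs       -- register numbers of the current instruction's sources
    regfile_values -- the (possibly stale) values read from the register file
    ex_mem_dest    -- destination register of the previous ALU operation, or None
    ex_mem_result  -- ALU result held in the EX/MEM pipeline register
    """
    inputs = []
    for reg, stale_value in zip(src_regs, regfile_values):
        forward = ex_mem_dest is not None and reg == ex_mem_dest and reg != 0
        inputs.append(ex_mem_result if forward else stale_value)
    return inputs


if __name__ == "__main__":
    # ADD R1,R2,R3 produced 30; SUB R4,R1,R5 read a stale R1 (0) and R5 (7).
    print(select_alu_inputs([1, 5], [0, 7], ex_mem_dest=1, ex_mem_result=30))
```

A fuller design would also forward from later pipeline registers, so that instructions two or more slots after the producer still receive the new value.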

27
CLASSIFICATION OF DATA HAZARDS
  • WAW (write after write):
  • j tries to write an operand before it is
    written by i. The writes are thus performed in
    the wrong order, leaving the value written by i
    as the final value.
  • This hazard is present only in pipelines that write
    in more than one pipe stage. In the DLX integer
    pipeline it is not a hazard, as writes occur only
    in the WB stage.

28
CLASSIFICATION OF DATA HAZARDS
  • Example (both instructions write R1):
  • LW R1,0(R2)
  • ADD R1,R2,R3

29
CLASSIFICATION OF DATA HAZARDS
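  • WAR (write after read):
  • j tries to write a destination before it is
    read by i, so i incorrectly gets the new value.
  • This does not happen in DLX, as all reads occur
    early (in the ID stage) and all writes occur late
    (in the WB stage).
  • Example (the SW reads R2, which the ADD then writes):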
  • SW 0(R1),R2
  • ADD R2,R3,R4
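
The three definitions reduce to set intersections over the registers each instruction reads and writes. The Python sketch below classifies the dependences between an earlier instruction i and a later instruction j; whether a dependence becomes an actual hazard depends on the pipeline, as the slides note. The function name and the register-set encoding are illustrative assumptions.

```python
def classify_hazards(i_writes: set, i_reads: set, j_writes: set, j_reads: set) -> set:
    """Return the potential data hazards between instruction i and a later instruction j."""
    hazards = set()
    if i_writes & j_reads:
        hazards.add("RAW")  # j reads what i writes
    if i_writes & j_writes:
        hazards.add("WAW")  # j writes what i writes
    if i_reads & j_writes:
        hazards.add("WAR")  # j writes what i reads
    return hazards


if __name__ == "__main__":
    # ADD R1,R2,R3 followed by SUB R4,R1,R5
    print(classify_hazards({1}, {2, 3}, {4}, {1, 5}))
    # LW R1,0(R2) followed by ADD R1,R2,R3
    print(classify_hazards({1}, {2}, {1}, {2, 3}))
    # SW 0(R1),R2 followed by ADD R2,R3,R4
    print(classify_hazards(set(), {1, 2}, {2}, {3, 4}))
```

The three calls correspond to the RAW, WAW and WAR examples given on the slides.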

30
CLASSIFICATION OF DATA HAZARDS
  • HAZARDS REQUIRING STALLS
  • Consider the situation where a load and a SUB
    instruction are consecutive, and the
    destination register of the load is a source
    register of the SUB.
  • This hazard cannot be removed by forwarding alone,
    because the loaded data is not available until the
    end of the load's MEM cycle. Hence a pipeline
    interlock is introduced: it detects the hazard and
    stalls the pipeline until the hazard is cleared.
    The check is made during the ID phase, and the
    instruction that wants to use the data is stalled
    until the source instruction produces it.
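
The interlock check can be sketched as a comparison between the load in EX and the instruction in ID. This Python sketch assumes full forwarding, so only a load immediately followed by a consumer of its result stalls; the `Instr` tuple and function names are hypothetical.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str
    dest: Optional[int]        # destination register, or None
    sources: Tuple[int, ...]   # source register numbers


def needs_load_interlock(in_ex: Instr, in_id: Instr) -> bool:
    """True when the instruction in ID reads the register a load in EX will write."""
    return (
        in_ex.op == "LW"
        and in_ex.dest not in (None, 0)   # R0 is never a real destination
        and in_ex.dest in in_id.sources
    )


def count_load_stalls(program) -> int:
    """Count one stall cycle for every load immediately followed by a user of its result."""
    return sum(needs_load_interlock(prev, cur) for prev, cur in zip(program, program[1:]))


if __name__ == "__main__":
    program = [
        Instr("LW", 1, (2,)),     # LW  R1,0(R2)
        Instr("SUB", 4, (1, 5)),  # SUB R4,R1,R5  (stalls one cycle)
        Instr("AND", 6, (1, 7)),  # AND R6,R1,R7  (no stall; R1 is available by then)
    ]
    print(count_load_stalls(program))
```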

31
CONTROL HAZARDS
  • Control hazards can cause a greater performance
    loss than data hazards.
  • The simplest method of dealing with branches
    is to stall the pipeline as soon as the
    branch is detected in the ID phase, until the
    MEM stage, where the new PC is finally determined.

32
CONTROL HAZARDS
  • Each branch causes a 3 cycle stall in the DLX
    pipeline, which is a significant loss, as 30%
    of the instructions used are branch instructions.
  • The number of stall cycles per branch is
    reduced by testing the branch condition in
    the ID stage and computing the destination
    address in the ID stage using a separate adder.
    Thus there is only a one clock cycle stall on
    branches.

33
WHAT MAKES PIPELINING HARD TO IMPLEMENT???
  • EXCEPTIONAL SITUATIONS are those in
    which the normal order of execution is changed.
    They arise from instructions that raise exceptions,
    which may force the machine to abort the
    instructions in the pipeline before they complete.

34
WHAT MAKES PIPELINING HARD TO IMPLEMENT???
  • Some of the exceptions include
  • Integer arithmetic overflow/underflow.
  • Power failure
  • Hardware malfunctions.
  • I/O device request.

35
WHAT MAKES PIPELINING HARD TO IMPLEMENT???
  • The five categories used to define what
    action is needed for the different exception
    types are:
  • Synchronous versus asynchronous
  • User requested versus coerced
  • User maskable versus nonmaskable
  • Within versus between instructions
  • Resume versus terminate

36
WHAT MAKES PIPELINING HARD TO IMPLEMENT???
  • EXCEPTIONS IN DLX
  • IF: page fault on instruction fetch, misaligned
    memory access
  • ID: undefined/illegal opcode
  • EX: arithmetic exceptions
  • MEM: page fault on data fetch, misaligned memory
    access
  • WB: none

37
DLX FP PIPELINE
  • THE FLOATING POINT PIPELINE HAS THE SAME
    STRUCTURE AS THE INTEGER PIPELINE, EXCEPT FOR THE
    FOLLOWING TWO IMPORTANT CHANGES.
  • The EX cycle can be repeated as many times as
    needed to complete the operation.

38
DLX FP PIPELINE
  • There are multiple functional units, including
    floating point units:
  • 1. The main integer unit, which handles loads
    and stores, integer ALU operations and branches.
  • 2. FP and integer multiplier.
  • 3. FP adder.
  • 4. FP and integer divider.

39
DLX FP PIPELINE
  • All the execution stages of these functional
    units are not pipelined.
  • THE FLOATING POINT PIPELINE HAS A LONGER LATENCY
    FOR OPERATIONS.
  • Latency is defined as the number of intervening
    cycles between an instruction that produces a
    result and an instruction that uses the result.
40
DLX FP PIPELINE
  • Latency is also the number of stages after the EX
    stage at which the result is produced.
  • Using the above definition, the various functional
    units have different latencies, as shown below.
  • 1. Integer ALU: 0
  • 2. Data memory: 1

41
DLX FP PIPELINE
  • 3. FP add: 3
  • 4. FP multiply: 6
  • 5. FP divide: 24
  • The pipeline structure is implemented
    with the above latencies by introducing
    additional pipeline registers between the
    additional pipe stages.
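
Under the latency definition above, the number of stall cycles a dependent instruction sees follows directly from the table. This Python sketch is a simplification: it assumes full forwarding, no structural hazards, and a consumer that needs the value at the start of its EX stage. The dictionary keys and function name are illustrative.

```python
# Latencies from the slides: intervening cycles needed between producer and consumer.
LATENCY = {
    "integer ALU": 0,
    "data memory": 1,
    "FP add": 3,
    "FP multiply": 6,
    "FP divide": 24,
}


def raw_stall_cycles(producer_unit: str, distance: int = 1) -> int:
    """Stall cycles for a consumer issued `distance` instructions after its producer.

    distance=1 means the consumer immediately follows the producer, so there
    are distance - 1 independent instructions between them.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    return max(0, LATENCY[producer_unit] - (distance - 1))


if __name__ == "__main__":
    for unit in LATENCY:
        print(unit, raw_stall_cycles(unit))
```

Placing independent instructions between the producer and the consumer reduces the stall by one cycle per instruction until it reaches zero.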

42
DLX FP PIPELINE
  • FEATURES
  • FP multiplier is pipelined with 7 stages.
  • FP adder is pipelined with 4 stages.
  • FP divider is not pipelined and requires 24
    clock cycles to complete an operation.
  • Structural hazards and both RAW and WAW data
    hazards are possible.

43
INTERDEPENDENCE OF INSTRUCTION SET DESIGN AND
PIPELINING
  • Variable instruction lengths and execution times
    lead to imbalance among pipeline stages and
    complicate hazard detection.
  • Sophisticated addressing modes, such as
    post-increment, that update registers complicate
    hazard detection.
  • Architectures such as the 80x86 that allow writes
    into the instruction space complicate pipelining.

44
MIPS R4000 PIPELINE
  • FEATURES
  • MIPS-3 INSTRUCTION SET (64-BIT)
  • DEEPER PIPELINE THAN DLX (8 STAGES)
  • HIGHER CLOCK RATE ACHIEVED.
  • BOTH LOAD AND BRANCH DELAYS ARE INCREASED
  • BASIC BRANCH DELAY: 3 CYCLES

45
MIPS R4000 PIPELINE
  • The MIPS R4000 floating point unit consists of 3
    functional units: a floating point divider, a
    floating point multiplier and a floating point adder.
  • The primary reasons for stalls in the MIPS R4000
    pipeline are the following:
46
MIPS R4000 PIPELINE
  • Load stalls: delays arising from the use of a
    load result one or two cycles after the load.
  • Branch stalls: a two cycle stall on every
    taken branch.
  • FP result stalls: due to RAW hazards for an FP
    operand.
  • FP structural stalls: arising from conflicts for
    functional units.

47
  • THANK YOU