Pipelining Recap - PowerPoint PPT Presentation

Slides: 30
Provided by: mot112

Transcript and Presenter's Notes

1
Pipelining (Recap)
2
MIPS 5-stage pipeline
  • The MIPS processor (like the DLX processor) needs 5 stages to
    execute instructions
  • Pipeline stages:
      • IF - Instruction Fetch
      • ID - Instruction Decode
      • EX - Execute / Address Calculation
      • MEM - Memory Access (read / write)
      • WB - Write Back (results into the register file)
  • Not all instructions need all the stages (e.g., an add
    instruction does not need the MEM stage)

3
Basic MIPS Pipelined Processor
[Figure: pipelined datapath divided by the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB]
4
Pipelined Example - Executing Multiple
Instructions
  • Consider the following instruction sequence:
      • lw r0, 10(r1)
      • sw r3, 20(r4)
      • add r5, r6, r7
      • sub r8, r9, r10

5
Executing Multiple Instructions: Clock Cycle 1
LW
6
Executing Multiple Instructions: Clock Cycle 2
LW
SW
7
Executing Multiple Instructions: Clock Cycle 3
LW
SW
ADD
8
Executing Multiple Instructions: Clock Cycle 4
LW
SW
ADD
SUB
9
Executing Multiple Instructions: Clock Cycle 5
LW
SW
ADD
SUB
10
Executing Multiple Instructions: Clock Cycle 6
SW
ADD
SUB
11
Executing Multiple Instructions: Clock Cycle 7
ADD
SUB
12
Executing Multiple Instructions: Clock Cycle 8
SUB
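The cycle-by-cycle slides above can be reproduced with a short simulation. This is a minimal sketch that assumes an ideal pipeline (one instruction enters IF per cycle, no stalls); the function name `pipeline_schedule` is hypothetical and not part of the original slides.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_schedule(instructions):
    """Map each clock cycle (1-based) to {stage: instruction} for an ideal pipeline."""
    total_cycles = len(instructions) + len(STAGES) - 1
    schedule = {}
    for cycle in range(1, total_cycles + 1):
        occupancy = {}
        for index, name in enumerate(instructions):
            # Instruction number `index` enters IF in cycle index + 1,
            # then advances one stage per cycle.
            stage_index = cycle - 1 - index
            if 0 <= stage_index < len(STAGES):
                occupancy[STAGES[stage_index]] = name
        schedule[cycle] = occupancy
    return schedule

program = ["LW", "SW", "ADD", "SUB"]
schedule = pipeline_schedule(program)
for cycle, occupancy in schedule.items():
    row = "  ".join(f"{stage}:{occupancy.get(stage, '--'):<3}" for stage in STAGES)
    print(f"Cycle {cycle}: {row}")
```

With four instructions and five stages, the last instruction leaves WB in cycle 4 + 5 - 1 = 8, which matches the eight clock-cycle slides.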
13
Alternative View - Multicycle Diagram
14
Processor Pipelining
  • There are two ways that pipelining can help:
      • Reduce the clock cycle time, and keep the same CPI
      • Reduce the CPI, and keep the same clock cycle time
  • CPU time = Instruction count x CPI x Clock cycle time

15
Reduce the clock cycle time, and keep the same CPI
CPI = 1, Clock = X Hz
16
Reduce the clock cycle time, and keep the same CPI
CPI = 1, Clock = 5X Hz
[Figure: datapath with labels PC, 4, <<2, Instruction Memory (ADDR, RD), Register File (RN1, RN2, WN, WD, RD1, RD2), ALU, Data Memory (ADDR, WD, RD)]
17
Reduce the CPI, and keep the same cycle time
CPI = 5, Clock = 5X Hz
18
Reduce the CPI, and keep the same cycle time
CPI = 1, Clock = 5X Hz
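The two views of pipelining in the preceding slides can be checked against the CPU time equation. The sketch below is illustrative only: the instruction count and the base clock rate X are made-up values, and reading the three configurations as single-cycle, multicycle, and pipelined designs is an assumption about what the slide figures show.

```python
def cpu_time(instruction_count, cpi, clock_hz):
    """CPU time in seconds = instruction count x CPI x clock cycle time."""
    clock_cycle_time = 1.0 / clock_hz
    return instruction_count * cpi * clock_cycle_time

X = 100e6        # hypothetical base clock rate in Hz
N = 1_000_000    # hypothetical instruction count

single_cycle = cpu_time(N, cpi=1, clock_hz=X)      # CPI = 1, Clock = X Hz
multicycle = cpu_time(N, cpi=5, clock_hz=5 * X)    # CPI = 5, Clock = 5X Hz
pipelined = cpu_time(N, cpi=1, clock_hz=5 * X)     # CPI = 1, Clock = 5X Hz

print(f"vs. single-cycle: {single_cycle / pipelined:.1f}x faster")
print(f"vs. multicycle:   {multicycle / pipelined:.1f}x faster")
```

Either way the ideal 5-stage pipeline comes out 5 times faster: against the first design it shortens the clock cycle at the same CPI, and against the second it lowers the CPI at the same clock cycle.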
19
Pipeline performance
  • Ideally, we get a speedup (by reducing the clock cycle time or
    reducing the CPI) equal to the number of stages.
  • In practice, we get close to that but do not achieve it, because of:
      • Additional pipelining overhead (e.g., pipeline registers)
      • Pipeline hazards
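The effect of pipeline-register overhead on the ideal speedup can be sketched numerically. The model below is an assumption, not something the slides state: it takes perfectly balanced stages, no hazards, and a fixed delay added to every stage by its pipeline register. The function name and the delay values are hypothetical.

```python
def pipelined_speedup(unpipelined_delay_ns, stages, register_overhead_ns):
    """Speedup of a balanced, hazard-free pipeline with per-stage register overhead."""
    # Each stage does 1/stages of the work, plus the pipeline register delay.
    cycle_time_ns = unpipelined_delay_ns / stages + register_overhead_ns
    return unpipelined_delay_ns / cycle_time_ns

ideal = pipelined_speedup(10.0, stages=5, register_overhead_ns=0.0)
with_overhead = pipelined_speedup(10.0, stages=5, register_overhead_ns=0.5)
print(ideal, with_overhead)
```

With zero overhead the speedup equals the number of stages (5); with 0.5 ns of register delay per stage it drops to 4, before any hazards are considered.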

20
Pipeline Hazards
  • Hazards are situations in pipelining that prevent the next
    instruction in the instruction stream from executing during its
    designated clock cycle.
  • Hazards reduce the ideal speedup gained from pipelining (ideal
    CPI = 1) and are classified into three classes:
      • Structural hazards
      • Data hazards
      • Control hazards

21
Structural Hazards
  • A structural hazard occurs when a hardware resource is required
    by more than one instruction in a single cycle and one or more
    of those instructions cannot be accommodated, for example:
      • when a machine has only one register file write port
      • when a pipelined machine has a single memory shared by data
        and instructions
  • Remedy: stall the pipeline for one cycle for register writes or
    memory data access
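A resource conflict of this kind can be found mechanically by counting, for each cycle, how many instructions want each resource. The sketch below is a simplified model under stated assumptions: one instruction is fetched per cycle, instruction fetch and data access use the same unified memory, and register reads and writes are treated as separate resources. All names (`STAGE_RESOURCE`, `structural_conflicts`) are hypothetical.

```python
from collections import defaultdict

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Assumed mapping from pipeline stage to the hardware resource it uses.
STAGE_RESOURCE = {
    "IF": "memory",          # instruction fetch
    "ID": "regfile-read",
    "EX": "alu",
    "MEM": "memory",         # data access (loads and stores only)
    "WB": "regfile-write",
}

def structural_conflicts(program, ports):
    """Return {(cycle, resource): [instructions]} where demand exceeds the port count.

    program: list of (name, set of stages in which the instruction uses hardware)
    ports:   dict of resource name -> number of simultaneous accesses allowed
    """
    usage = defaultdict(list)
    for index, (name, active_stages) in enumerate(program):
        for stage_index, stage in enumerate(STAGES):
            if stage in active_stages:
                cycle = index + stage_index + 1  # 1-based clock cycle
                usage[(cycle, STAGE_RESOURCE[stage])].append(name)
    return {key: names for key, names in usage.items()
            if len(names) > ports.get(key[1], 1)}

ALU_OP = {"IF", "ID", "EX", "WB"}           # no data memory access
LOAD = {"IF", "ID", "EX", "MEM", "WB"}
program = [("Load", LOAD), ("Instr 1", ALU_OP), ("Instr 2", ALU_OP),
           ("Instr 3", ALU_OP), ("Instr 4", ALU_OP)]

print(structural_conflicts(program, ports={"memory": 1}))
print(structural_conflicts(program, ports={"memory": 2}))
```

With one memory port, the load's data access in cycle 4 collides with the fetch of Instr 3; with two ports the conflict disappears.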

22
Register File/Structural Hazards
Two different instructions operate on the register file in the same
clock cycle.
[Figure: pipeline diagram, time in clock cycles (Cycle 1 to Cycle 7) against instruction order (Load, Instr 1 to Instr 4); stage labels Ifetch, Reg, DMem, Reg]
23
Register File/Structural Hazards
We need 3 stall cycles in order to solve this hazard.
[Figure: pipeline diagram, time in clock cycles (Cycle 1 to Cycle 7) against instruction order (Load, Instr 1 to Instr 4), with 3 stall cycles inserted]
24
Register File/Structural Hazards
Allow writing registers in the first half of the cycle and reading in
the second half of the cycle. No stalls are required.
[Figure: pipeline diagram, time in clock cycles (Cycle 1 to Cycle 7) against instruction order (Load, Instr 1 to Instr 4); stage labels Ifetch, Reg, DMem, Reg]
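The split-cycle idea can be illustrated with a toy register file in which the write half of a cycle happens before the read half. This is only a behavioral sketch under that assumption; the class and method names are hypothetical, and real hardware details (such as a hardwired zero register) are ignored.

```python
class RegisterFile:
    """Toy register file: write in the first half of a cycle, read in the second."""

    def __init__(self, size=32):
        self.regs = [0] * size

    def clock(self, write=None, reads=()):
        """Simulate one clock cycle.

        write: optional (register number, value) from the instruction in WB
        reads: register numbers requested by the instruction in ID
        """
        # First half of the cycle: perform the write.
        if write is not None:
            number, value = write
            self.regs[number] = value
        # Second half of the cycle: perform the reads, which see the new value.
        return [self.regs[number] for number in reads]

rf = RegisterFile()
values = rf.clock(write=(5, 42), reads=(5, 6))
print(values)
```

Because the write completes before the read starts, an instruction in ID sees the value written by the instruction in WB during the same cycle, so no stall is needed for this case.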
25
1 Memory Port/Structural Hazards
Two different instructions operate on memory in the same clock cycle.
[Figure: pipeline diagram, time in cycles against instruction order (Load, Instruction 1 to Instruction 4), with the Mem accesses of each instruction marked]
26
Inserting Bubbles (Stalls)
3 stall cycles with 1-port memory.
[Figure: pipeline diagram, time in cycles, for Load, Instruction 1, Instruction 2, and Instruction 3, with stall cycles (bubbles) inserted before Instruction 3]
27
2 Memory Ports / Structural Hazards (Read and Write at the Same Time)
No stalls with 2 memory ports.
[Figure: pipeline diagram, time in cycles against instruction order (Load, Instruction 1 to Instruction 4), with the Mem accesses of each instruction marked]
28
Performance of Pipelines with Stalls
  • Hazards in pipelines may make it necessary to stall the pipeline
    by one or more cycles, thus degrading performance from the ideal
    CPI of 1.
  • CPI pipelined = Ideal CPI + Pipeline stall clock cycles per
    instruction
  • Speedup = CPI unpipelined / (1 + Pipeline stall cycles per
    instruction)
  • Speedup = Pipeline depth / (1 + Pipeline stall cycles per
    instruction)
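The two speedup expressions above follow from the CPU time equation. The derivation below is a sketch in our own notation; the last step relies on assumptions the slides do not state explicitly, namely that the unpipelined machine takes one cycle per stage for every instruction and that both machines have the same clock cycle time.

```latex
\begin{align*}
\text{Speedup}
  &= \frac{\text{CPI}_{\text{unpipelined}} \times \text{Cycle}_{\text{unpipelined}}}
          {\text{CPI}_{\text{pipelined}} \times \text{Cycle}_{\text{pipelined}}} \\[4pt]
  &= \frac{\text{CPI}_{\text{unpipelined}}}
          {1 + \text{Stall cycles per instruction}}
     \times \frac{\text{Cycle}_{\text{unpipelined}}}{\text{Cycle}_{\text{pipelined}}}
     && \text{(ideal CPI} = 1\text{)} \\[4pt]
  &= \frac{\text{Pipeline depth}}
          {1 + \text{Stall cycles per instruction}}
     && \text{(CPI}_{\text{unpipelined}} = \text{Pipeline depth, equal cycle times)}
\end{align*}
```

The instruction count cancels because both machines run the same program. When the cycle times differ, the ratio of cycle times stays in the expression as an extra factor.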

29
Example: Dual-port vs. Single-port Memory
  • Machine A: Dual-ported memory (0 stalls)
  • Machine B: Single-ported memory (3 stalls per load), but its
    pipelined implementation has a 1.05 times faster clock rate
  • Ideal CPI = 1 for both
  • Loads are 40% of instructions executed
  • SpeedUpA = Pipeline Depth / (1 + 0) x (clockunpipe / clockpipe)
             = Pipeline Depth
  • SpeedUpB = Pipeline Depth / (1 + 0.4 x 3)
               x (clockunpipe / (clockunpipe / 1.05))
             = (Pipeline Depth / 2.2) x 1.05
             = 0.48 x Pipeline Depth
  • SpeedUpA / SpeedUpB = Pipeline Depth / (0.48 x Pipeline Depth)
                        = 2.1
  • Machine A is 2.1 times faster
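The arithmetic in this example can be checked with a few lines of code. This is a minimal sketch: the function name `speedup` and the pipeline depth of 5 are assumptions for illustration (the depth cancels in the final ratio), while the load fraction, stall count, and clock ratio come from the slide.

```python
def speedup(pipeline_depth, stalls_per_instruction, clock_ratio=1.0):
    """Speedup = depth / (1 + stall cycles per instruction) x clock cycle ratio."""
    return pipeline_depth / (1.0 + stalls_per_instruction) * clock_ratio

DEPTH = 5               # hypothetical; cancels out of the A/B ratio
LOAD_FRACTION = 0.40    # loads are 40% of instructions executed
STALLS_PER_LOAD = 3     # Machine B, single-ported memory

speedup_a = speedup(DEPTH, 0.0)
speedup_b = speedup(DEPTH, LOAD_FRACTION * STALLS_PER_LOAD, clock_ratio=1.05)
ratio = speedup_a / speedup_b

print(f"SpeedUpB / depth = {speedup_b / DEPTH:.2f}")
print(f"SpeedUpA / SpeedUpB = {ratio:.1f}")
```

The ratio works out to 2.2 / 1.05, about 2.1, so the faster clock of Machine B does not make up for the stalls caused by its single memory port.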