Chapter 14 Instruction Level Parallelism and Superscalar Processors

Slides: 28
Provided by: adria216

1
Chapter 14 Instruction Level Parallelism and
Superscalar Processors

2
What is Superscalar?
A Superscalar machine executes multiple
independent instructions in parallel.
  • Common instructions (arithmetic, load/store,
    conditional branch) can be executed
    independently.
  • Equally applicable to RISC and CISC, but more
    straightforward in RISC machines.
  • The program order of instructions is determined
    by the compiler; the hardware decides which of
    them execute in parallel.

3
Example Superscalar Organization
4
Superpipelined Machines
Superpipelined machines overlap pipe stages: they
rely on a stage being able to begin its operation
before the previous one is complete.
5
Superscalar v Superpipeline
6
Limitations of Superscalar
  • Dependent upon
      • Instruction level parallelism
      • Compiler based optimization
      • Hardware support
  • Limited by
      • True data dependency
      • Procedural dependency
      • Resource conflicts
      • Output dependency
      • Antidependency (one instruction can overwrite a
        value that an earlier instruction has not yet
        read)

7
True Data Dependency
  • ADD r1, r2 (r1 ← r1 + r2)
  • MOVE r3, r1 (r3 ← r1)
  • Can fetch and decode second instruction in
    parallel with first
  • Can NOT execute second instruction until first is
    finished
  • Compare with the following:
  • LOAD r1, X (r1 ← X)
  • MOVE r3, r1 (r3 ← r1)
  • What additional problem do we have here?
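
As a rough illustration of the two cases above, the sketch below computes the earliest cycle in which the second instruction can execute. The latency table is an assumption made for illustration (the slides give no cycle counts). The point is only that a dependent instruction waits for the full latency of its producer, and a load from memory typically takes longer than a register-to-register add.

```python
# Assumed execute latencies in cycles (illustrative values, not from the slides).
LATENCY = {"ADD": 1, "LOAD": 3}


def earliest_execute_cycle(producer_op, producer_start, dependent=True):
    """Earliest cycle in which the instruction after the producer can execute."""
    if not dependent:
        # No true data dependency: both may execute in the same cycle.
        return producer_start
    # True data dependency: wait until the producer's result exists.
    return producer_start + LATENCY[producer_op]


after_add = earliest_execute_cycle("ADD", producer_start=1)    # MOVE r3, r1 after ADD r1, r2
after_load = earliest_execute_cycle("LOAD", producer_start=1)  # MOVE r3, r1 after LOAD r1, X
print(after_add, after_load)  # prints: 2 4
```

With these assumed numbers the MOVE stalls one cycle behind the ADD but three cycles behind the LOAD.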

8
Procedural Dependency
  • Cannot execute instructions after a branch in
    parallel with instructions before a branch.
    Why?
  • Also, if instruction length is not fixed,
    instructions have to be decoded to find out how
    many fetches are needed

9
Resource Conflict
  • Two or more instructions requiring access to the
    same resource at the same time
      • e.g. two arithmetic instructions
  • Solution: possibly duplicate resources
      • e.g. have two arithmetic units
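
The check described above can be sketched as a simple count of demand against supply for one cycle. The unit names ("alu", "load_store") and the function name are hypothetical, chosen only for this example.

```python
from collections import Counter


def resource_conflicts(needed_units, available):
    """Return the units requested more often in one cycle than the machine has."""
    demand = Counter(needed_units)
    return {unit: n for unit, n in demand.items() if n > available.get(unit, 0)}


# Two arithmetic instructions and one load want to execute in the same cycle.
same_cycle = ["alu", "alu", "load_store"]

one_alu = {"alu": 1, "load_store": 1}
two_alus = {"alu": 2, "load_store": 1}   # duplicated arithmetic unit

print(resource_conflicts(same_cycle, one_alu))   # prints: {'alu': 2}
print(resource_conflicts(same_cycle, two_alus))  # prints: {}
```

Duplicating the arithmetic unit removes the conflict in this example, at the cost of extra hardware.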

10
Antidependency
  • ADD R4, R3, 1
  • ADD R3, R5, 1
  • Cannot complete the second instruction before the
    first has read R3
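
A minimal sketch of how the three register dependency kinds from the limitations slide could be told apart, assuming each instruction is reduced to a destination register and a set of source registers. The function name and the tuple encoding are assumptions of this example, and the three-operand ADD is read as "destination first".

```python
def classify(first, second):
    """Return the dependency kinds that the later instruction has on the earlier one.

    Each instruction is a (destination, sources) pair.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    kinds = set()
    if dest1 in srcs2:
        kinds.add("true")     # second reads what first writes
    if dest2 in srcs1:
        kinds.add("anti")     # second overwrites what first still has to read
    if dest1 == dest2:
        kinds.add("output")   # both write the same register
    return kinds


i1 = ("R4", {"R3"})   # ADD R4, R3, 1
i2 = ("R3", {"R5"})   # ADD R3, R5, 1
print(classify(i1, i2))  # prints: {'anti'}
```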

11
Effect of Dependencies
12
Instruction-level Parallelism
  • Consider
      • LOAD R1, R2
      • ADD R3, 1
      • ADD R4, R2
  • These can be handled in parallel
  • Consider
      • ADD R3, 1
      • ADD R4, R3
      • STO (R4), R0
  • These cannot
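
One way to quantify the difference between the two sequences is the length of the longest chain of true dependencies. The sketch below assumes a two-operand reading of the instructions (the first operand is both a source and the destination, and STO writes no register), and it ignores output and antidependencies; both are assumptions of this example rather than statements from the slides.

```python
def dependency_depth(program):
    """Length of the longest chain of true dependencies.

    program is a list of (destination, sources); destination may be None.
    """
    level_of_writer = {}   # register -> level of the last instruction that wrote it
    depth = 0
    for dest, sources in program:
        level = 1 + max((level_of_writer.get(r, 0) for r in sources), default=0)
        if dest is not None:
            level_of_writer[dest] = level
        depth = max(depth, level)
    return depth


parallel = [
    ("R1", ["R2"]),         # LOAD R1, R2
    ("R3", ["R3"]),         # ADD R3, 1
    ("R4", ["R4", "R2"]),   # ADD R4, R2
]
serial = [
    ("R3", ["R3"]),         # ADD R3, 1
    ("R4", ["R4", "R3"]),   # ADD R4, R3
    (None, ["R4", "R0"]),   # STO (R4), R0
]
print(dependency_depth(parallel), dependency_depth(serial))  # prints: 1 3
```

A depth of 1 means all three instructions could execute together; a depth of 3 means each one must wait for the one before it.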

13
Instruction Issue Policies
  • Order in which instructions are fetched
  • Order in which instructions are executed
  • Order in which instructions change registers and
    memory

14
In-Order Issue In-Order Completion
  • Issue instructions in the order they occur
  • Not very efficient
  • May fetch >1 instruction
  • Instructions must stall if necessary

15
In-Order Issue In-Order Completion (Diagram)
  • Assume
      • I1 requires 2 cycles to execute
      • I3 and I4 conflict for the same functional unit
      • I5 depends upon value produced by I4
      • I5 and I6 conflict for a functional unit

16
In-Order Issue Out-of-Order Completion
How does this affect interrupts?
17
Out-of-Order Issue Out-of-Order Completion
  • Decouple decode pipeline from execution pipeline
  • Can continue to fetch and decode until this
    pipeline is full
  • When a functional unit becomes available an
    instruction can be executed
  • Since instructions have been decoded, processor
    can look ahead
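
To make the look-ahead idea concrete, here is a simplified scheduler sketch that can run either in order or out of order. It uses the assumptions from the in-order diagram slide (I1 takes 2 cycles, I3 and I4 share a unit, I5 depends on I4, I5 and I6 share a unit). The unit letters, the issue width of 2, and the fact that only the execute stage is modelled are assumptions of this example, so the cycle counts will not match the diagrams.

```python
def schedule(program, width=2, in_order=True):
    """Return ({name: issue cycle}, total execute cycles)."""
    finish = {}          # name -> last cycle in which the instruction executes
    issued = {}          # name -> cycle in which it was issued
    unit_free_at = {}    # unit -> first cycle in which the unit is free again
    pending = list(program)
    cycle = 0
    while pending:
        cycle += 1
        slots = width
        for ins in list(pending):
            ready = all(d in finish and finish[d] < cycle for d in ins["deps"])
            free = unit_free_at.get(ins["unit"], 1) <= cycle
            if slots and ready and free:
                issued[ins["name"]] = cycle
                finish[ins["name"]] = cycle + ins["latency"] - 1
                unit_free_at[ins["unit"]] = cycle + ins["latency"]
                pending.remove(ins)
                slots -= 1
            elif in_order:
                break    # in-order issue cannot look past a stalled instruction
    return issued, max(finish.values())


PROGRAM = [
    {"name": "I1", "unit": "A", "latency": 2, "deps": []},
    {"name": "I2", "unit": "B", "latency": 1, "deps": []},
    {"name": "I3", "unit": "C", "latency": 1, "deps": []},
    {"name": "I4", "unit": "C", "latency": 1, "deps": []},
    {"name": "I5", "unit": "B", "latency": 1, "deps": ["I4"]},
    {"name": "I6", "unit": "B", "latency": 1, "deps": []},
]

in_order_issue, in_order_cycles = schedule(PROGRAM, in_order=True)
ooo_issue, ooo_cycles = schedule(PROGRAM, in_order=False)
print(in_order_cycles, ooo_cycles)  # prints: 5 4
```

In this model the out-of-order machine issues I6 in cycle 2, ahead of the stalled I4 and I5, which is where the saved cycle comes from.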

18
Out-of-Order Issue Out-of-Order Completion
(Diagram)
Note: I5 depends upon I4, but I6 does not.
19
Register Renaming
  • Output and antidependencies occur because
    register contents may not reflect the correct
    ordering from the program
  • Could result in a pipeline stall
  • One solution: registers allocated dynamically

20
Register Renaming example
  • R3b ← R3a + R5a (I1)
  • R4b ← R3b + 1 (I2)
  • R3c ← R5a + 1 (I3)
  • R7b ← R3c + R4b (I4)
  • A name without a subscript refers to the logical
    register in the instruction
  • A name with a subscript is the hardware register
    allocated
  • Note: R3a, R3b and R3c are distinct hardware
    registers
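
The allocation in this example can be reproduced with a small renaming sketch. It assumes every logical register starts in version "a" and that each write allocates the next letter; the function name and the (destination, sources) encoding are hypothetical, and a real processor would draw from a finite pool of physical registers instead of letters.

```python
from string import ascii_lowercase


def rename(program):
    """Map logical registers to subscripted hardware-register names.

    program is a list of (destination, sources) over logical registers.
    """
    version = {}   # logical register -> index of its current version

    def current(reg):
        return reg + ascii_lowercase[version.setdefault(reg, 0)]

    renamed = []
    for dest, sources in program:
        new_sources = [current(r) for r in sources]       # read the latest versions
        version[dest] = version.setdefault(dest, 0) + 1   # allocate a fresh register
        renamed.append((current(dest), new_sources))
    return renamed


program = [
    ("R3", ["R3", "R5"]),   # I1
    ("R4", ["R3"]),         # I2
    ("R3", ["R5"]),         # I3
    ("R7", ["R3", "R4"]),   # I4
]
for dest, sources in rename(program):
    print(dest, "<-", sources)
```

Because I3 writes R3c instead of overwriting R3b, it no longer has to wait for I2 to read R3b: the antidependency between I2 and I3 and the output dependency between I1 and I3 disappear.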

21
Machine Parallelism Support
  • Duplication of Resources
  • Out of order issue
  • Renaming
  • Windowing

22
Speedups of Machine Organizations Without
Procedural Dependencies
23
Study Conclusions
  • Not worth duplicating functional units without
    register renaming
  • Need instruction window large enough (more than
    8, probably not more than 32)

24
Branch Prediction in Superscalar Machines
  • Delayed branch not used much. Why?
  • Branch prediction used - Branch history may still
    be useful
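
As one illustration of prediction from branch history, the sketch below uses a two-bit saturating counter per branch. This is a common textbook scheme chosen for the example; the slides do not say which predictor is meant, and the class name, the starting state and the branch address 0x40 are assumptions.

```python
class TwoBitPredictor:
    """Per-branch counter in 0..3; values 2 and 3 predict 'taken'."""

    def __init__(self):
        self.counters = {}   # branch address -> counter, assumed to start at 1

    def predict(self, pc):
        return self.counters.get(pc, 1) >= 2

    def update(self, pc, taken):
        c = self.counters.get(pc, 1)
        self.counters[pc] = min(3, c + 1) if taken else max(0, c - 1)


def accuracy(outcomes, pc=0x40):
    """Fraction of outcomes predicted correctly for a single branch."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(outcomes)


# A loop branch: taken nine times, then falls through once.
loop_branch = [True] * 9 + [False]
print(accuracy(loop_branch))  # prints: 0.8
```

The two mispredictions are the first iteration (no history yet) and the loop exit.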

25
Superscalar Execution
26
Committing or Retiring Instructions
  • Sometimes results must be held in temporary
    storage until it is certain they can be placed in
    permanent storage.
  • This temporary storage needs to be regularly
    cleaned up.
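
A minimal sketch of this idea, assuming the temporary storage is organised as a queue in program order (often called a reorder buffer, a term the slides do not use). Results may arrive out of order, but they are copied to the permanent registers only from the head of the queue. All names here are hypothetical.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # oldest instruction on the left
        self.registers = {}      # permanent (committed) register state

    def issue(self, name, dest):
        entry = {"name": name, "dest": dest, "value": None, "done": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value):
        # The result is held in temporary storage until it can be committed.
        entry["value"], entry["done"] = value, True

    def commit(self):
        """Retire finished instructions from the head, in program order."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            self.registers[entry["dest"]] = entry["value"]
            retired.append(entry["name"])
        return retired


rob = ReorderBuffer()
e1 = rob.issue("I1", "R1")
e2 = rob.issue("I2", "R2")
rob.complete(e2, 7)        # I2 finishes first
early = rob.commit()       # nothing retires: I1 is still outstanding
rob.complete(e1, 5)
late = rob.commit()        # now both retire, in program order
print(early, late, rob.registers)
```

Removing entries at commit time is the "clean up" step: the queue never holds results that are already in permanent storage.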

27
Superscalar Hardware Support
  • Simultaneously fetch multiple instructions
  • Logic to determine true dependencies involving
    register values and mechanisms to communicate
    these values
  • Mechanisms to initiate multiple instructions in
    parallel
  • Resources for parallel execution of multiple
    instructions
  • Mechanisms for committing process state in
    correct order