14 Superscalar Processors - PowerPoint PPT Presentation

Provided by: adria216
1
Chapter 14
Superscalar Processors

2
What is Superscalar?
A Superscalar machine executes multiple
independent instructions in parallel.
  • Common instructions (arithmetic, load/store,
    conditional branch) can be executed
    independently.
  • Equally applicable to RISC and CISC, but more
    straightforward in RISC machines.
  • The order of execution is usually determined by
    the compiler.

3
Example Superscalar Organization
  • 2 Integer ALU pipelines,
  • 2 FP ALU pipelines,
  • 1 memory pipeline (?)

4
Superpipelined Machines
Superpipelined machines overlap pipe stages: they
rely on stages being able to begin operations
before the previous one is complete. Superscalar
machines have multiple instruction pipelines and
process multiple instructions in parallel.
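
The difference can be put in numbers with an idealized timing model. The sketch below is an assumption, not something taken from the slides: it uses the usual textbook formulas for N independent instructions on a k-stage pipeline, with no stalls of any kind. The function names are illustrative.

```python
from math import ceil

def base_cycles(n_instr: int, stages: int) -> float:
    """Base scalar pipeline: one instruction enters per cycle."""
    return stages + n_instr - 1

def superscalar_cycles(n_instr: int, stages: int, degree: int) -> float:
    """Superscalar: `degree` instructions enter together each cycle."""
    return stages + ceil(n_instr / degree) - 1

def superpipelined_cycles(n_instr: int, stages: int, degree: int) -> float:
    """Superpipelined: one instruction enters every 1/degree of a base cycle."""
    return stages + (n_instr - 1) / degree

n, k = 6, 4  # six instructions, four-stage pipeline (arbitrary example values)
print(base_cycles(n, k))               # 9
print(superscalar_cycles(n, k, 2))     # 6
print(superpipelined_cycles(n, k, 2))  # 6.5
```

Under these assumptions both degree-2 machines come close to halving the time, with the superpipelined one finishing half a base cycle later.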
5
Superscalar v Superpipeline
6
Limitations of Superscalar
  • Dependent upon:
      • Instruction-level parallelism
      • Compiler-based optimization
      • Hardware support
  • Limited by:
      • True data dependency
      • Procedural dependency
      • Resource conflicts
      • Output dependency
      • Antidependency (another form of data
        dependency)

7
True Data Dependency
  • ADD r1, r2 (r1 + r2 → r1)
  • MOVE r3, r1 (r1 → r3)
  • Can fetch and decode second instruction in
    parallel with first
  • Can NOT execute second instruction until first is
    finished
  • Compare with the following:
  • LOAD r1, X (X → r1)
  • MOVE r3, r1 (r1 → r3)
  • What additional problem do we have here?

8
Procedural Dependency
  • Cannot execute instructions after a branch in
    parallel with instructions before a branch,
    because?
  • Note: also, if instruction length is not
    fixed, instructions have to be decoded to find
    out how many fetches are needed

9
Resource Conflict
  • Two or more instructions requiring access to the
    same resource at the same time
  • e.g. two arithmetic instructions
  • Solution - Can possibly duplicate resources
  • e.g. have two arithmetic units

10
Antidependency
  • ADD R4, R3, 1 (R3 + 1 → R4)
  • ADD R3, R5, 1 (R5 + 1 → R3)
  • Cannot complete the second instruction before the
    first has read R3
  • Why?

11
True Data Dependency vs. Antidependency
  • True data dependency
      • Result of 1st instr is used in 2nd instr
        (the 2nd cannot execute until the 1st completes)
  • Antidependency
      • Out-of-order completion of 2nd instr can
        write over a value still to be used by the 1st instr
        (the 1st must read its operand before the 2nd
        changes it)
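
The two cases can be told apart mechanically from each instruction's read and write sets. The following is a minimal sketch, assuming that ADD r1, r2 reads both r1 and r2 and writes r1; the `Instr` type and the `classify` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    text: str
    writes: frozenset
    reads: frozenset

def classify(first: Instr, second: Instr) -> set:
    """Return the dependencies of `second` on `first` (program order)."""
    found = set()
    if first.writes & second.reads:
        found.add("true")    # second reads what first writes
    if first.reads & second.writes:
        found.add("anti")    # second overwrites what first still reads
    if first.writes & second.writes:
        found.add("output")  # both write the same register
    return found

add_r1 = Instr("ADD r1, r2", frozenset({"r1"}), frozenset({"r1", "r2"}))
move_r3 = Instr("MOVE r3, r1", frozenset({"r3"}), frozenset({"r1"}))
add_r4 = Instr("ADD R4, R3, 1", frozenset({"R4"}), frozenset({"R3"}))
add_r3 = Instr("ADD R3, R5, 1", frozenset({"R3"}), frozenset({"R5"}))

print(classify(add_r1, move_r3))  # {'true'}
print(classify(add_r4, add_r3))   # {'anti'}
```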

12
Effect of Dependencies
13
Instruction-level Parallelism
  • Consider:
      • LOAD R1, R2
      • ADD R3, 1
      • ADD R4, R2
  • These can be handled in parallel. Why?
  • Consider:
      • ADD R3, 1
      • ADD R4, R3
      • STO (R4), R0
  • These cannot. Why?
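
One way to check the two groups is to compute the earliest cycle in which each instruction could execute if only true data dependencies mattered. This is a sketch under stated assumptions: every instruction takes one cycle, functional units are unlimited, and the read/write sets are a guess at the slide's notation (ADD R3, 1 reads and writes R3; LOAD R1, R2 reads R2 and writes R1; STO (R4), R0 reads R4 and R0). `earliest_cycles` is a hypothetical helper.

```python
def earliest_cycles(program):
    """program: list of (text, writes, reads).
    Return the earliest cycle each instruction could execute, given only
    true data dependencies, unit latency and unlimited functional units."""
    ready = {}   # register -> first cycle in which its newest value is readable
    cycles = []
    for _text, writes, reads in program:
        cycle = max((ready.get(r, 0) for r in reads), default=0)
        cycles.append(cycle)
        for r in writes:
            ready[r] = cycle + 1
    return cycles

group_a = [
    ("LOAD R1, R2", {"R1"}, {"R2"}),
    ("ADD R3, 1",   {"R3"}, {"R3"}),
    ("ADD R4, R2",  {"R4"}, {"R4", "R2"}),
]
group_b = [
    ("ADD R3, 1",    {"R3"}, {"R3"}),
    ("ADD R4, R3",   {"R4"}, {"R4", "R3"}),
    ("STO (R4), R0", set(),  {"R4", "R0"}),
]
print(earliest_cycles(group_a))  # [0, 0, 0]
print(earliest_cycles(group_b))  # [0, 1, 2]
```

The first group has no instruction that reads a register written by an earlier one, so all three can start together; in the second group each instruction needs the result of the one before it.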

14
Instruction Issue Policies
  • Order in which instructions are fetched
  • Order in which instructions are executed
  • Order in which instructions update registers and
    memory values
  • Note: there is also the issue of instruction
    completion policy

15
In-Order Issue -- In-Order Completion
  • Issue instructions in the order they occur
  • Not very efficient
  • Instructions must stall if necessary

16
In-Order Issue -- In-Order Completion (Example)
  • Assume:
      • I1 requires 2 cycles to execute
      • I3 and I4 conflict for the same functional unit
      • I5 depends upon the value produced by I4
      • I5 and I6 conflict for a functional unit

17
In-Order Issue -- Out-of-Order Completion (Example)
  • Again:
      • I1 requires 2 cycles to execute
      • I3 and I4 conflict for the same functional unit
      • I5 depends upon the value produced by I4
      • I5 and I6 conflict for a functional unit

How does this affect interrupts?
18
Out-of-Order Issue -- Out-of-Order Completion
  • Decouple decode pipeline from execution pipeline
  • Can continue to fetch and decode until this
    pipeline is full
  • When a functional unit becomes available an
    instruction can be executed
  • Since instructions have been decoded, processor
    can look ahead

19
Out-of-Order Issue -- Out-of-Order Completion (Example)
  • Again:
      • I1 requires 2 cycles to execute
      • I3 and I4 conflict for the same functional unit
      • I5 depends upon the value produced by I4
      • I5 and I6 conflict for a functional unit

Note: I5 depends upon I4, but I6 does not.
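
The effect of the issue policy on this six-instruction example can be sketched with a small simulation. It is a simplified model, not a reproduction of the slide's figure: all six instructions are assumed to be decoded and waiting already, at most two start per cycle, write-back is not modelled, and the unit names (`alu0`, `alu1`, `mul`) are invented to express the stated conflicts. `issue_cycles` is a hypothetical helper.

```python
def issue_cycles(program, in_order, width=2):
    """program: list of (name, unit, latency, deps).
    Return {name: cycle in which the instruction starts executing}."""
    unit_free = {}   # unit -> first cycle in which it is free again
    done = {}        # name -> first cycle in which its result is available
    issued = {}
    pending = list(program)
    cycle = 1
    while pending:
        started = 0
        for instr in list(pending):
            name, unit, latency, deps = instr
            ready = (all(d in done and done[d] <= cycle for d in deps)
                     and unit_free.get(unit, 1) <= cycle)
            if ready and started < width:
                issued[name] = cycle
                done[name] = cycle + latency
                unit_free[unit] = cycle + latency
                pending.remove(instr)
                started += 1
            elif in_order:
                break   # an older instruction stalls everything behind it
        cycle += 1
    return issued

program = [
    ("I1", "alu0", 2, []),       # I1 requires 2 cycles
    ("I2", "alu1", 1, []),
    ("I3", "mul",  1, []),       # I3 and I4 share a unit
    ("I4", "mul",  1, []),
    ("I5", "alu1", 1, ["I4"]),   # I5 needs I4's result
    ("I6", "alu1", 1, []),       # I5 and I6 share a unit
]
print(issue_cycles(program, in_order=True))
print(issue_cycles(program, in_order=False))
```

In this model, in-order issue starts I6 in cycle 5 because it waits behind I5, while out-of-order issue starts it in cycle 2, since I6 does not depend on I4.
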
20
Register Renaming
  • Output and antidependencies occur because
    register contents may not reflect the correct
    ordering from the program
  • Can result in a pipeline stall
  • One solution: allocate registers dynamically
    (renaming registers)

21
Register Renaming example
  • R3b := R3a + R5a (I1)
  • R4b := R3b + 1 (I2)
  • R3c := R5a + 1 (I3)
  • R7b := R3c + R4b (I4)
  • Without subscript: refers to the logical register in
    the instruction
  • With subscript: the hardware register allocated
  • R3a, R3b and R3c are different hardware registers
  • Note: R3c avoids the antidependency on I2 and the
    output dependency on I1
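
The renaming in this example can be reproduced with a per-register version counter: every write to a logical register allocates a fresh hardware copy, and every read uses the newest copy. This is a minimal sketch, assuming that every operation is an addition and that each register starts out as copy "a"; `rename` is a hypothetical helper, not a description of any particular processor.

```python
import re
from collections import defaultdict

def rename(program):
    """program: list of (dest, [operands]). Registers look like 'R3';
    anything else (e.g. '1') is treated as a literal."""
    version = defaultdict(int)   # logical register -> index of current copy

    def current(operand):
        if re.fullmatch(r"R\d+", operand):
            return operand + chr(ord("a") + version[operand])
        return operand

    renamed = []
    for dest, operands in program:
        sources = [current(op) for op in operands]   # read the current copies
        version[dest] += 1                           # allocate a fresh copy
        renamed.append(f"{current(dest)} := {' + '.join(sources)}")
    return renamed

program = [
    ("R3", ["R3", "R5"]),  # I1
    ("R4", ["R3", "1"]),   # I2
    ("R3", ["R5", "1"]),   # I3
    ("R7", ["R3", "R4"]),  # I4
]
for line in rename(program):
    print(line)
```

Because I3 writes a new copy of R3, it no longer has to wait for I2 to read the old one or for I1 to finish writing it.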

22
Machine Parallelism Support
  • Duplication of Resources
  • Out of order issue
  • Renaming
  • Windowing

23
Speedups of Machine Organizations (Without
Procedural Dependencies)
  • Not worth duplication of functional units
    without register renaming
  • Need instruction window large enough (more than
    8, probably not more than 32)

24
Branch Prediction in Superscalar Machines
  • Delayed branch not used much. Why?
      • Multiple instructions need to execute in
        the delay slot.
      • This leads to much complexity in
        recovery.
  • Branch prediction may be used - Branch history
    MAY still be useful
  • Are there any alternatives?
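
One common way to use branch history is a two-bit saturating counter per branch. The sketch below illustrates that general technique; the slides do not say which scheme they have in mind. The class name, the initial counter value and the branch address are all assumptions.

```python
class TwoBitPredictor:
    """One saturating counter per branch address:
    0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counters = {}

    def predict(self, address: int) -> bool:
        return self.counters.get(address, 1) >= 2

    def update(self, address: int, taken: bool) -> None:
        count = self.counters.get(address, 1)
        self.counters[address] = min(count + 1, 3) if taken else max(count - 1, 0)

def accuracy(outcomes, address=0x40):
    """Fraction of outcomes predicted correctly for a single branch."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        correct += predictor.predict(address) == taken
        predictor.update(address, taken)
    return correct / len(outcomes)

# A loop branch: taken nine times, then falls through, repeated three times.
loop_branch = ([True] * 9 + [False]) * 3
print(accuracy(loop_branch))
```

For this pattern the predictor is wrong once while warming up and once at each loop exit, so it gets 26 of the 30 outcomes right.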

25
Superscalar Execution
26
Committing or Retiring Instructions
  • Results need to be put into order (commit or
    retire)
  • Results sometimes must be held in temporary
    storage until it is certain they can be placed in
    permanent storage.
  • Temporary storage requires regular clean-up,
    which adds overhead.
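
The temporary storage is often organised as a queue in program order, commonly called a reorder buffer. This is a minimal sketch of that idea; the class and method names are hypothetical, and the real hardware structure is considerably more involved.

```python
from collections import deque

class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # program order: [name, dest, value or None]
        self.registers = {}      # permanent (architectural) state

    def issue(self, name, dest):
        """Reserve a slot when the instruction is issued."""
        self.entries.append([name, dest, None])

    def complete(self, name, value):
        """Record a result in temporary storage; may happen out of order."""
        for entry in self.entries:
            if entry[0] == name:
                entry[2] = value
                return
        raise KeyError(name)

    def retire(self):
        """Commit finished results from the head only, in program order."""
        retired = []
        while self.entries and self.entries[0][2] is not None:
            name, dest, value = self.entries.popleft()
            self.registers[dest] = value
            retired.append(name)
        return retired

rob = ReorderBuffer()
rob.issue("I1", "R1")
rob.issue("I2", "R2")
rob.complete("I2", 7)   # I2 finishes first
print(rob.retire())     # [] because I1 is still outstanding
rob.complete("I1", 3)
print(rob.retire())     # ['I1', 'I2']
```

Until I1 completes, I2's result stays in the buffer and the permanent registers are untouched, which is what lets the machine discard results that turn out not to be needed.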

27
Superscalar Hardware Support
  • Facilities to simultaneously fetch multiple
    instructions
  • Logic to determine true dependencies involving
    register values, and mechanisms to communicate
    these values
  • Mechanisms to initiate multiple instructions in
    parallel
  • Resources for parallel execution of multiple
    instructions
  • Mechanisms for committing process state in
    correct order

28
Conclusions
  • What are the relative benefits of:
      • Superscalar
      • Superpipelining?

29
Superscalar CISC machines
  • Can superscalar design be applied to CISC
    machines?

30
javax.comm
  • Basically, javax.comm is no longer supported
    on Windows (hasn't been since 2002), so we
    switched to RxTx, which is nearly identical.
  • According to
    http://en.wikibooks.org/wiki/Serial_Programming
    Serial_JavaRxTx,
    "Converting a JavaComm Application to RxTx", all
    that is required to convert a JavaComm
    application to an RxTx application is simply
    changing the import statement import
    javax.comm.* to import gnu.io.*.
    Everything else in the program can remain
    exactly the same because the package gnu.io
    apparently encompasses the same classes as
    javax.comm.
  • Indeed, the RxTx version of SimpleWrite is
    identical to the JavaComm version of SimpleWrite
    except that it imports gnu.io.* rather than
    javax.comm.*.

31
Basic Concepts of the IA-64 Architecture
  • Instruction-level parallelism
      • Explicit in machine instructions rather than
        determined at run time by the processor
  • Long or very long instruction words (LIW/VLIW)
      • Fetch bigger chunks, already preprocessed
  • Branch predication (not the same as branch
    prediction)
      • Go ahead and fetch and decode instructions, but
        keep track of them so the decision to issue
        them, or not, can practically be made later
  • Speculative loading
      • Go ahead and load data so it is ready when needed,
        and have a practical way to recover if the
        speculation proves wrong
  • Software pipelining
      • Allows multiple iterations of a loop to execute
        in parallel