Superscalar Design for Large Instruction Windows: the ROB and LSQ

Provided by: sams4

1
Superscalar Design for Large Instruction Windows
the ROB and LSQ

2
Reorder Buffer
3
Classical ROB Uses

Recovery from misspeculation: branch, memory
State retirement: register, memory
Precise Interrupts and Exceptions
Resource Management
Storage for renamed registers, in the case where
the ROB doubles as the physical register file
Mechanism whereby physical registers are
reclaimed when the physical register file is
separate from the ROB
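
The uses listed above can be illustrated with a toy model. This is a minimal sketch, not any particular processor's design: the class name `ReorderBuffer`, its methods, and the dictionary entry layout are all hypothetical, and register renaming and exception state are left out.

```python
from collections import deque


class ReorderBuffer:
    """Toy ROB: a bounded FIFO of in-flight instructions in program order."""

    def __init__(self, size):
        self.size = size
        self.entries = deque()  # head (oldest) on the left, tail on the right
        self.next_seq = 0

    def allocate(self, name):
        """Dispatch: claim a tail entry, or return None if the ROB is full."""
        if len(self.entries) >= self.size:
            return None
        seq = self.next_seq
        self.next_seq += 1
        self.entries.append({"seq": seq, "name": name, "done": False})
        return seq

    def complete(self, seq):
        """Mark an instruction as executed (may happen out of order)."""
        for entry in self.entries:
            if entry["seq"] == seq:
                entry["done"] = True

    def retire(self, width=1):
        """Retire up to `width` completed instructions from the head, in order."""
        retired = []
        while self.entries and self.entries[0]["done"] and len(retired) < width:
            retired.append(self.entries.popleft()["name"])
        return retired

    def squash_after(self, seq):
        """Misspeculation recovery: drop every entry younger than `seq`."""
        squashed = []
        while self.entries and self.entries[-1]["seq"] > seq:
            squashed.append(self.entries.pop()["name"])
        return squashed
```

In this model a completed instruction cannot leave the buffer until everything older has completed, which is what gives precise state at any retirement point.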

4
Debunking the ROB

Does ROB size limit large-instruction-window,
high-ILP processors?
Large FIFOs are not necessarily hard to build,
not the limiting factor
Why are we not building larger ROBs today?
The real performance limiter is not the ROB
itself but the mechanisms that use the ROB
Misspeculation recovery, state retirement,
precise interrupts, and physical resource
reclamation all use the ROB in a serialized
manner
Performance bound by rate of retirement, not ROB
size
Key design goal for large instruction window
machines:
Break the unneeded serial dependencies on the ROB
and serialize those that remain at
granularities greater than a single instruction

5
Extending the ROB

Classical ROB Extensions
Retire more than one instruction per cycle.
Addresses the ability to retire multiple
instructions per cycle by allowing sequential
instructions to retire concurrently.
Speculative Early Retirement of Instructions
Addresses sequential retirement of instructions,
early resource reclamation and, to a second
order, the ability to retire multiple
instructions per cycle.
Misspeculation recovery sequentialized by a
History Buffer or similar structure
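
A history buffer can be sketched as an undo log. The version below is an illustrative assumption about how such a structure behaves, not a description of a specific design: `HistoryBuffer`, `write`, `commit_through`, and `rollback_to` are hypothetical names, and registers are modeled as a plain dictionary.

```python
class HistoryBuffer:
    """Toy undo log: state is updated eagerly, old values are kept for repair."""

    def __init__(self, regs):
        self.regs = dict(regs)
        self.log = []  # (seq, reg, old_value) records, oldest first

    def write(self, seq, reg, value):
        """Update the register immediately and remember what it held before."""
        self.log.append((seq, reg, self.regs[reg]))
        self.regs[reg] = value

    def commit_through(self, seq):
        """Instructions up to `seq` are safe, so their undo records are dropped."""
        self.log = [rec for rec in self.log if rec[0] > seq]

    def rollback_to(self, seq):
        """Undo writes younger than `seq`, youngest first; return the count."""
        undone = 0
        while self.log and self.log[-1][0] > seq:
            _, reg, old_value = self.log.pop()
            self.regs[reg] = old_value
            undone += 1
        return undone
```

The rollback loop undoes one record per step, which is the sequential recovery cost the slide attributes to this approach.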

6
Checkpoint Repair

Partial Checkpointing of State
Addresses non-sequential recovery from
misspeculation as well as early resource
reclamation.
MIPS R10000, Alpha 21264
Example: Cherry
Point of No Return (PNR): oldest instruction
that can suffer a branch or memory misprediction

[Figure: ROB with HEAD, PNR, and TAIL pointers marking IRREVERSIBLE and REVERSIBLE regions]
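
The PNR definition can be sketched as a scan over the window. This is a minimal illustration under stated assumptions: the `unresolved` flag stands for "may still suffer a branch or memory misprediction", the function names are hypothetical, and treating the instructions older than the PNR as the irreversible region is a reading of the figure labels rather than something the slide spells out.

```python
def point_of_no_return(window):
    """Return the index of the oldest instruction that may still misspeculate.

    `window` is a list of dicts in program order, oldest first. If nothing is
    unresolved, the PNR sits at the tail (index len(window)).
    """
    for i, inst in enumerate(window):
        if inst["unresolved"]:
            return i
    return len(window)


def split_at_pnr(window):
    """Split instruction names into (irreversible, reversible) at the PNR."""
    pnr = point_of_no_return(window)
    irreversible = [inst["name"] for inst in window[:pnr]]
    reversible = [inst["name"] for inst in window[pnr:]]
    return irreversible, reversible
```

Under this reading, resources held by the first list are candidates for early reclamation, since no misspeculation can roll those instructions back.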
7
Full Checkpoint Repair

Checkpointed Processors
Addresses sequential instruction retirement,
early resource reclamation, the ability to retire
multiple instructions per cycle, and
non-sequential recovery from misspeculation.
Problem: cannot checkpoint at every instruction

[Figure: instruction window associated with checkpoints CHKPT 1, CHKPT 2, and CHKPT 3; labels: ACTIVE, COMPLETE, COMPLETE NOT ASSOCIATED, OLDER INSTRUCTIONS]
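
The cost of not checkpointing at every instruction can be sketched as follows. This is a toy model under stated assumptions: checkpoints are taken every `interval` instructions (real designs choose checkpoint points more selectively), state is a register dictionary, and `CheckpointedCore`, `execute`, and `recover` are hypothetical names.

```python
import bisect


class CheckpointedCore:
    """Toy model: snapshot register state every `interval` instructions."""

    def __init__(self, regs, interval):
        self.regs = dict(regs)
        self.interval = interval
        self.checkpoints = []  # (seq, snapshot) pairs in increasing seq order
        self.seq = 0

    def execute(self, reg, value):
        """Run one register-writing instruction; return its sequence number."""
        if self.seq % self.interval == 0:
            self.checkpoints.append((self.seq, dict(self.regs)))
        self.regs[reg] = value
        self.seq += 1
        return self.seq - 1

    def recover(self, bad_seq):
        """Restore the latest checkpoint at or before `bad_seq`.

        Returns how many correct instructions between that checkpoint and
        `bad_seq` must be re-executed because no closer checkpoint exists.
        """
        seqs = [s for s, _ in self.checkpoints]
        i = bisect.bisect_right(seqs, bad_seq) - 1
        ckpt_seq, snapshot = self.checkpoints[i]
        self.regs = dict(snapshot)
        del self.checkpoints[i:]  # the checkpoint is retaken on re-execution
        self.seq = ckpt_seq
        return bad_seq - ckpt_seq
```

Recovery here is a single state restore rather than a walk over individual instructions, at the price of redoing the work between the checkpoint and the misspeculated instruction.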
8
Load-Store Queues
9
Conventional LSQ

10
Scaling the LSQ

Memory disambiguation and store-to-load
forwarding do not scale as the instruction window
grows
High latency
High power consumption
Insufficient bandwidth
Search filtering decreases power consumption and
increases bandwidth
Segmented or hierarchical LSQs decrease the
latency of LSQ searches
Address-indexed LSQs decrease power and latency

11
Search Filtering (UT-Austin)

12
Search Filtering (UT-Austin)

13
A Segmented LSQ

14
A Hierarchical Store Queue

15
Want More??
16
Address-Indexed Structures

The LSQ tracks load-store dependences by renaming
the memory space.
Bypass values and recovery checkpoints are
generated via searches.
Power consumption and latency increase
dramatically as the number of in-flight loads and
stores increases.
Might there be a low-power, low-latency
alternative to the FIFO organization of the LSQ?

17
Address-Indexed Structures

Address-Indexed Memory Disambiguation Tables
(MDTs)
Loads and stores use low-order bits of their
addresses to index into the MDTs.
The MDTs store the sequence numbers of the
latest in-flight load and store to each address.
Memory ordering violations are detected by
comparing the sequence number of the issued
load/store to the sequence number in the
corresponding MDT.
Recovery with the MDT is more conservative than
recovery with the LSQ.
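
The MDT bullets above can be sketched as a small table keyed by low-order address bits. This is an illustrative assumption about the comparison rules, not the exact published mechanism: the class and method names are hypothetical, the table is untagged, and a violation is reported whenever a younger conflicting access has already issued to the same entry.

```python
class MemoryDisambiguationTable:
    """Toy MDT: per-entry sequence numbers of the latest load and store."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.last_load = {}   # entry index -> seq of latest issued load
        self.last_store = {}  # entry index -> seq of latest issued store

    def _index(self, addr):
        return addr & self.mask  # low-order address bits select the entry

    def issue_load(self, seq, addr):
        """Return True if a younger store to this entry has already issued."""
        i = self._index(addr)
        violation = self.last_store.get(i, -1) > seq
        self.last_load[i] = max(self.last_load.get(i, -1), seq)
        return violation

    def issue_store(self, seq, addr):
        """Return True if a younger load or store to this entry already issued."""
        i = self._index(addr)
        violation = (self.last_load.get(i, -1) > seq
                     or self.last_store.get(i, -1) > seq)
        self.last_store[i] = max(self.last_store.get(i, -1), seq)
        return violation
```

Because the table in this sketch is untagged, two different addresses that share low-order bits (for example 0x14 and 0x24 with four index bits) land in the same entry and can trigger a violation that an exact address comparison would not, which is one way recovery becomes more conservative than with an LSQ.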

18
Address-Indexed Structures

Address-Indexed Forwarding Cache
Forwarding cache holds the speculative values of
in-flight memory addresses.
Each load accesses the forwarding cache as a
level-0 cache in the cache-memory hierarchy.
The state of the forwarding cache may become
corrupt (branch mispredictions, memory
misspeculations). A policy for recovering from
corrupt state is essential to high performance.
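
A forwarding cache can be sketched as a small direct-mapped structure probed before the rest of the hierarchy. This is a minimal sketch under stated assumptions: the names are hypothetical, memory is modeled as a dictionary, and flushing the whole cache on misspeculation is only the simplest conceivable recovery policy, not necessarily the one the presentation has in mind.

```python
class ForwardingCache:
    """Toy direct-mapped cache of speculative store values."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.lines = {}  # entry index -> (addr, store_seq, value)

    def store(self, seq, addr, value):
        """An in-flight store deposits its speculative value."""
        self.lines[addr & self.mask] = (addr, seq, value)

    def load(self, addr, memory):
        """Probe the forwarding cache first, then fall back to `memory`."""
        line = self.lines.get(addr & self.mask)
        if line is not None and line[0] == addr:
            return line[2]
        return memory.get(addr, 0)

    def flush(self):
        """Recovery (assumed policy): discard all speculative values."""
        self.lines.clear()
```

A full flush is cheap to implement but also throws away values from stores that were not affected by the misspeculation, which hints at why the choice of recovery policy matters for performance.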
