Title: Commit out of order
1. Commit out of order
- PhD student: Adrián Cristal
- Advisors: Josep Llosa, Antonio González and Mateo Valero
2. Commit out of order: Why?
- Tolerate long-latency instructions with the following features (compared with a large-ROB design):
  - A reduced ROB
  - A reduced physical register file
3. Commit out of order: How?
- Checkpoints: the processor creates a checkpoint at a conflictive long-latency instruction and retires it (virtually) from the ROB, but not from the issue queues.
- The processor also retires (virtually) all dependent instructions.
- The processor retires the rest of the instructions in the normal way.
- If a virtually retired branch is mispredicted, or an exception occurs, the processor recovers its state from the checkpoint.
4. Hardware Scheme
- R10000- or 21264-like
- Minor changes in the IQ and FPQ
- Somewhat more changes in the LSQ and store buffer
- More physical registers
Size: less than 4 KB (8 checkpoint entries, for 128 integer physical registers and 128 FP physical registers)
5. Example
6. Some Definitions
- When an instruction reaches retirement, the processor must:
  - Retire (commit) it if the instruction is completed
  - Retire (commit) it virtually if the instruction is not ready
  - Create a checkpoint and retire it virtually if the instruction has a long latency and is ready but not completed
  - Wait for it to complete if the instruction has a short latency and is ready but not completed
- A physical register is free only if:
  - Its busy flag is clear
  - Its reference counter in the commit state is zero
  - Its blocking counter in the commit state is zero
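The retirement rules and the free-register condition above can be sketched as two small functions. This is only an illustrative sketch: the names `Action`, `retirement_action` and `is_free`, and the boolean flags they take, are hypothetical and do not come from the original design.

```python
from enum import Enum


class Action(Enum):
    COMMIT = "commit"
    COMMIT_VIRTUALLY = "commit virtually"
    CHECKPOINT_AND_COMMIT_VIRTUALLY = "create checkpoint and commit virtually"
    WAIT = "wait to complete"


def retirement_action(completed: bool, ready: bool, long_latency: bool) -> Action:
    """Choose what to do with the instruction that has reached retirement."""
    if completed:
        return Action.COMMIT
    if not ready:
        return Action.COMMIT_VIRTUALLY
    # Ready but not completed: the latency class decides.
    if long_latency:
        return Action.CHECKPOINT_AND_COMMIT_VIRTUALLY
    return Action.WAIT


def is_free(busy: bool, ref_count: int, block_count: int) -> bool:
    """A physical register is free only if all three conditions hold."""
    return not busy and ref_count == 0 and block_count == 0
```

For instance, a ready long-latency load that has not completed maps to `Action.CHECKPOINT_AND_COMMIT_VIRTUALLY`, while a register with a non-zero blocking counter is never reported as free.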
7. Commit State
- It is the committed (virtually or not) state of the processor.
- When the processor creates a checkpoint, it copies this state to the checkpoint entry.
- Its information is used to determine which physical registers are free.
8. Commit State
- Map commit table: the processor saves the committed (virtually or not) map table here.
- Reference counters: for each physical register, they count the number of pending operations (reads or frees) on the register.
- Blocking counters: for each physical register, they count the number of blockings on the register. When the processor creates a checkpoint, it blocks all physical registers included in the map commit table, plus the destination register of the instruction. Some stores block registers too.
9. Checkpoint Table
- It is a set of checkpoints, where each entry contains:
  - A map table
  - A set of reference counters
  - A virtually retired instruction counter
  - Information about the first virtually retired instruction
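As a rough picture of these structures, the commit state and a checkpoint entry could be modeled as two records. This is a minimal sketch under assumptions: the class and field names are hypothetical, the map table is modeled as a logical-to-physical dictionary, and `NUM_PHYS_REGS` simply adds the 128 integer and 128 FP physical registers mentioned in the hardware scheme.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

NUM_PHYS_REGS = 256  # assumption: 128 integer + 128 FP physical registers


@dataclass
class CommitState:
    # Logical register -> physical register, committed (virtually or not).
    map_commit: Dict[int, int] = field(default_factory=dict)
    # One reference counter and one blocking counter per physical register.
    ref_count: List[int] = field(default_factory=lambda: [0] * NUM_PHYS_REGS)
    block_count: List[int] = field(default_factory=lambda: [0] * NUM_PHYS_REGS)


@dataclass
class CheckpointEntry:
    map_table: Dict[int, int]
    ref_count: List[int]
    virtually_retired: int = 0
    # Information about the first virtually retired instruction (hypothetical shape).
    first_instruction: Optional[dict] = None


# Creating an entry copies the commit state instead of sharing it.
state = CommitState(map_commit={0: 5, 1: 9})
entry = CheckpointEntry(map_table=dict(state.map_commit),
                        ref_count=list(state.ref_count))
```

Copying the map table and counters matters here: later commits change the commit state, and the entry has to keep its own version for recovery.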
10. Example: Create Checkpoint I
Copy the map commit table to the new entry in the checkpoint table.
11. Example: Create Checkpoint II
- Update the reference counters (add 1 to the entries corresponding to the destination physical register and the source registers), and
- Copy the reference counters from the commit state to the new entry in the checkpoint table
12. Example: Create Checkpoint III
- Copy the instruction information, and
- Set the virtually retired counter to 1
13. Example: Create Checkpoint IV
Update the blocking counters: add 1 to the corresponding entry for each physical register in the map commit table, and add 1 to the entry corresponding to the destination register of the instruction.
14. Example: Create Checkpoint V
Send a signal to the store buffer to block future stores until the checkpoint is removed.
15. Example: Create Checkpoint VI
- Mark the instruction in the LSQ as virtually retired
- Free the ROB entry, but not the LSQ entry
- Update the map commit table and the busy flag
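Steps I to VI can be followed in a small software model. The sketch below is not the hardware implementation: the dictionary layout, the `blocked` list stored in the entry and the `stores_blocked` flag are all assumptions made for illustration, and the ROB, LSQ and busy flags are left out.

```python
def create_checkpoint(state, checkpoints, instr):
    """Apply steps I to VI for one long-latency instruction (hypothetical model)."""
    # I: copy the map commit table into the new entry.
    entry = {"map_table": dict(state["map_commit"])}
    # II: add 1 for the destination and source registers, then copy the counters.
    for reg in [instr["dest"], *instr["sources"]]:
        state["ref_count"][reg] += 1
    entry["ref_count"] = list(state["ref_count"])
    # III: copy the instruction information; the counter starts at 1.
    entry["instr"] = dict(instr)
    entry["virtually_retired"] = 1
    # IV: block every register in the map commit table plus the destination.
    blocked = list(state["map_commit"].values()) + [instr["dest"]]
    for reg in blocked:
        state["block_count"][reg] += 1
    entry["blocked"] = blocked  # assumption: remembered so it can be undone later
    # V: tell the store buffer to hold back future stores.
    state["stores_blocked"] = True
    # VI: update the map commit table (ROB, LSQ and busy flags are not modeled).
    state["map_commit"][instr["logical_dest"]] = instr["dest"]
    checkpoints.append(entry)
    return entry


# Hypothetical example: logical r0 -> p3, r1 -> p4; a load writes r0 (new p5) from p4.
state = {"map_commit": {0: 3, 1: 4}, "ref_count": [0] * 8,
         "block_count": [0] * 8, "stores_blocked": False}
checkpoints = []
load = {"logical_dest": 0, "dest": 5, "sources": [4]}
entry = create_checkpoint(state, checkpoints, load)
```

After the call, the entry still holds the old mapping of r0 to p3 while the commit state maps r0 to p5, and p3, p4 and p5 each have a blocking count of 1.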
16. Example: Commit
- Update the busy flag
- Update the map commit table
- Reference counters[current]++
- Reference counters[old]--
17. Example: Commit Virtually
- Update the busy flag
- Update the map commit table
- Reference counters[current]++
- Reference counters[sources]++
- Virtually retired counter++ in the last checkpoint entry
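The difference between the two kinds of commit lies in which counters move. A minimal sketch, assuming a dictionary-based commit state with hypothetical field names and leaving the busy flag out of the model:

```python
def commit(state, instr):
    """Normal commit of a completed instruction."""
    old = state["map_commit"].get(instr["logical_dest"])
    state["map_commit"][instr["logical_dest"]] = instr["dest"]
    state["ref_count"][instr["dest"]] += 1   # counters[current]++
    if old is not None:
        state["ref_count"][old] -= 1         # counters[old]--


def commit_virtually(state, last_checkpoint, instr):
    """Virtual commit of an instruction that has not completed yet."""
    state["map_commit"][instr["logical_dest"]] = instr["dest"]
    # counters[current]++ and counters[sources]++; the old destination is
    # left untouched here and only released at writeback.
    for reg in [instr["dest"], *instr["sources"]]:
        state["ref_count"][reg] += 1
    last_checkpoint["virtually_retired"] += 1
    instr["virtually_retired"] = True


# Hypothetical example with four physical registers; r0 is mapped to p1.
state = {"map_commit": {0: 1}, "ref_count": [0, 1, 0, 0]}
commit(state, {"logical_dest": 0, "dest": 2, "sources": []})
last = {"virtually_retired": 1}
pending = {"logical_dest": 0, "dest": 3, "sources": [2]}
commit_virtually(state, last, pending)
```

In this example the normal commit moves the single reference from p1 to p2, whereas the virtual commit leaves p2 with two pending references (one as the old mapping, one as a source) until the instruction writes back.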
18. Example: Writeback I
- In all entries of the checkpoint table created after or with the instruction:
  - Reference counters[old]--
  - Reference counters[sources]--
- Virtually retired counter-- in the instruction's checkpoint entry
19. Example: Writeback II
- Virtually retired counter-- in the instruction's checkpoint entry. If it reaches 0, then:
  - Unblock registers
  - Clear the entry in the checkpoint table
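The writeback of a virtually retired instruction can be sketched as follows. Several things here are assumptions made for illustration: the `created_at` and `retired_at` sequence numbers used to decide which entries were created with or after the instruction, the `blocked` list kept in each entry, and the choice to decrement the commit state's own counters as well.

```python
def writeback(state, checkpoints, instr):
    """Release the references held by a virtually retired instruction."""
    if not instr["virtually_retired"]:
        return
    regs = list(instr["sources"])
    if instr["old_dest"] is not None:
        regs.append(instr["old_dest"])
    # Commit state (assumption) plus entries created with or after the instruction.
    targets = [state["ref_count"]] + [
        e["ref_count"] for e in checkpoints
        if e["created_at"] >= instr["retired_at"]
    ]
    for counters in targets:
        for reg in regs:
            counters[reg] -= 1
    # Decrement the counter of the instruction's own checkpoint entry.
    entry = next(e for e in checkpoints if e["id"] == instr["checkpoint_id"])
    entry["virtually_retired"] -= 1
    if entry["virtually_retired"] == 0:
        for reg in entry["blocked"]:
            state["block_count"][reg] -= 1   # unblock registers
        checkpoints.remove(entry)            # clear the entry


# Hypothetical example: the instruction that created checkpoint 0 writes back.
state = {"ref_count": [0, 2, 1, 0], "block_count": [0, 1, 1, 0]}
entry = {"id": 0, "created_at": 10, "ref_count": [0, 2, 1, 0],
         "virtually_retired": 1, "blocked": [1, 2]}
checkpoints = [entry]
load = {"virtually_retired": True, "sources": [1], "old_dest": 2,
        "retired_at": 10, "checkpoint_id": 0}
writeback(state, checkpoints, load)
```

Because the load was the only virtually retired instruction of its checkpoint, the entry's counter drops to 0, its registers are unblocked and the entry leaves the table.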
20. Example: Mispredicted Branch I
Copy the reference counters and the map commit table from the checkpoint entry.
21. Example: Mispredicted Branch II
- Unblock the registers of all checkpoint entries that will be freed
- Unblock the registers corresponding to aborted stores
22. Example: Mispredicted Branch III
- Purge the IQ, FPQ, LSQ and SB, and erase all entries in the ROB
- Set the PC to the next PC of the instruction saved in the checkpoint entry
- Purge the entries in the checkpoint table
23. Operation: On retirement
- Completed instruction: update the busy flag, the reference counters (+1 destination, -1 old destination) and the map commit table.
- Ready, not completed, short-latency instruction: wait until it completes.
- Ready, not completed, long-latency instruction: create a checkpoint.
- Not-ready instruction: virtual retirement.
24. Operation: On retirement, create a checkpoint
- The map commit table is copied to the map table of the entry.
- The reference counters are copied to the reference counters of the entry.
- The instruction information is copied to the entry.
- The virtually committed instruction counter is set to 0.
- The blocking counters are updated.
- The store buffer is signaled to avoid further progress.
- The instruction is retired virtually.
25. Operation: On retirement, virtual commit
- We say that an instruction is retired virtually, or committed virtually, if it has not completed at the moment it is retired.
- Add 1 to the reference counter entries corresponding to the source registers and the destination register.
- Add 1 to the virtually retired instruction counter of the last created entry.
- Clear the busy flag of the old destination register.
- Mark the instruction as virtually retired.
26. Operation: On writeback
- If the instruction is marked as virtually retired:
  - Subtract 1 from the reference counters of the source operands and the old destination, in the commit state and in all entries created before the commit of the instruction.
  - Subtract 1 from the virtually retired instruction counter of the corresponding entry. If this counter reaches 0, unblock the registers blocked by the entry, free the entry and signal the store buffer.
27. Operation: On mispredict (virtually committed)
- Copy the map commit table and the reference counters from the corresponding entry to the commit state.
- Unblock the registers corresponding to purged checkpoint entries.
- Unblock the registers corresponding to aborted stores.
- Purge the IQ, FPQ, LSQ and SB, and erase all entries in the ROB.
- Set the PC to the next PC of the instruction saved in the checkpoint entry.
28. Exception
- If the instruction that generates the exception is not virtually committed:
  - The processor waits until the checkpoint table is empty, to ensure that no prior exception occurs.
  - It then acts as in a normal processor exception.
29. Exception I (virtually committed)
- If the instruction is the one that generated the checkpoint entry:
  - The processor waits until this entry is the only entry in the checkpoint table.
  - It acts as in a mispredict, but sets the PC to the exception handler PC.
30. Exception II (virtually committed)
- If the instruction is not the one that generated the checkpoint entry:
  - It acts as in a mispredicted branch until the instruction that generates the exception is no longer virtually committed, and then acts as in a normal exception.
- This is a precise exception model, but a relaxed, more efficient model is also allowed.
31. Load/Store
- Loads can execute ahead of stores.
- LSQ entries are freed at commit for completed loads, or at writeback for virtually committed loads.
- Stores cannot be virtually retired. To retire a store, the processor needs to know the address; if the value has not yet been calculated, the store is retired and the value register is blocked.
- A store is always retired to the store buffer, where it remains until it is safe to send it to memory.
32. Simulations
- Highly modified SimpleScalar 3.0 simulator
- 10 SPEC2000 benchmarks
- First 500 million instructions of the test set
- Branch predictor is updated at writeback
- Speedup = (IPC / IPC_base) - 1
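The speedup metric is a relative IPC gain over the baseline. As a small worked sketch (the function name and the IPC values are made up for illustration and are not results from the simulations):

```python
def speedup(ipc: float, ipc_base: float) -> float:
    """Relative gain over the baseline: (IPC / IPC_base) - 1."""
    return ipc / ipc_base - 1


# Hypothetical numbers: an IPC of 1.5 against a baseline of 1.0 is a 50% gain.
example = speedup(1.5, 1.0)
```

With this definition, a value of 0 means no change with respect to the baseline and a negative value means a slowdown.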
33. Simulations
34. Simulations
35. Simulations: Swim
36. Simulations: Swim
37. Detected problems
- Branch predictor
  - Sometimes the processor mispredicts the same branch several times. This can be solved fairly easily.
38. Space design
- In the commit state, the reference counters and the blocking counters can be stored in a single array.
- The checkpoint entry can store only the instruction information and the map table. The reference counters and the blocking counters can be calculated from information stored in the checkpoint table entries, the store buffer and the issue queues.
- To allow fast unblocking, it may be better to add a new array of 1 bit per physical register to the commit state and to each checkpoint entry. A bit in this array is set if the physical register is included in the map table.
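The one-bit-per-register array suggested in the last point can be sketched as follows. This is only an illustration of the idea: the function names, the dictionary form of the map table and the use of a Python list as the bit array are assumptions.

```python
def map_bits(map_table, num_regs):
    """Build the bit array: bit i is 1 if physical register i is in the map table."""
    bits = [0] * num_regs
    for phys in map_table.values():
        bits[phys] = 1
    return bits


def unblock(block_count, bits):
    """Unblock in one pass: decrement the counter of every register whose bit is set."""
    for reg, bit in enumerate(bits):
        if bit:
            block_count[reg] -= 1


# Hypothetical example with eight physical registers.
bits = map_bits({0: 2, 1: 5}, num_regs=8)
block_count = [0, 0, 1, 0, 0, 2, 0, 0]
unblock(block_count, bits)
```

With the bits stored in the entry, unblocking no longer needs to walk the map table; in hardware the per-register decrements could presumably proceed in parallel.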
39. Future work
- Add virtual registers.
- Add a waiting queue associated with each issue queue. Virtually retired instructions are moved to it when they are virtually committed. When the first instruction of a checkpoint completes, these instructions are moved back to the issue queue.
- Study better checkpoint creation policies, perhaps based on branch confidence and the number of virtually committed instructions.
- Study branch predictors suited to this processor.
- Develop a model without a ROB, based only on checkpoints.