Homework 1 is out! Due Tuesday next week!

Provided by: ccGa

1
Homework 1 is out! Due Tuesday next week!
2
CS6290 Reorder Buffer
3
Out-of-Order Execution
  • We're now executing instructions in data-flow
    order
  • Great! More performance
  • But the outside world can't know about this
  • Must maintain the illusion of sequentiality

4
Remember the Toll Booth?
[Figure: toll booth analogy (OOO, 30s)]
We'll add the equivalent of the shoulder to the
CPU: the Re-Order Buffer (ROB)
5
Re-Order Buffer (ROB)
  • Separates architected vs. physical registers
  • Tracks program order of all in-flight insts
  • Enables in-order completion or commit
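A minimal Python sketch of the ROB as a circular buffer, assuming a fixed number of entries and 0-based slot indices as tags (the slides number entries ROB1, ROB2, ...). Entries are allocated at the tail in program order and leave only from the head, and only once finished; that is what produces in-order commit. `ROBEntry` and `ReorderBuffer` are illustrative names, not from the course.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ROBEntry:
    op: str                       # e.g. "DIV"
    dest: Optional[str]           # architected destination, e.g. "R2" (None for a store)
    value: Optional[int] = None   # filled in at write-result time
    finished: bool = False        # the ready/finished bit


class ReorderBuffer:
    """Circular buffer: allocate at the tail in program order, retire from the head."""

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[ROBEntry]] = [None] * size
        self.head = 0   # oldest in-flight instruction
        self.tail = 0   # next slot to allocate
        self.count = 0

    def is_full(self) -> bool:
        return self.count == len(self.slots)

    def allocate(self, entry: ROBEntry) -> int:
        """Called at issue; returns the slot index used as this instruction's tag."""
        if self.is_full():
            raise RuntimeError("ROB full: issue must stall")
        tag = self.tail
        self.slots[tag] = entry
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return tag

    def retire_head(self) -> Optional[ROBEntry]:
        """Remove and return the oldest entry if it has finished, else None."""
        entry = self.slots[self.head]
        if entry is None or not entry.finished:
            return None
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return entry
```

If an ADD allocated after a DIV finishes first, `retire_head()` keeps returning `None` until the DIV finishes, then returns the DIV and the ADD on successive calls.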

6
Hardware Organization
[Figure: Instruction Buffers feed the RAT and the Architected Register File; Reservation Stations (fields op, Qj, Qk, Vj, Vk) sit in front of the Add and Mult units; the ROB (fields type, dest, value, fin) has a head pointer]
7
Issue
  • Read inst from inst buffer
  • Check if resources are available: an appropriate
    RS entry and a ROB entry
  • Read RAT, read (available) sources, update RAT
  • Write to RS and ROB
[Figure: same hardware organization (Architected Register File, RAT, Instruction Buffers, Reservation Stations and ALUs, ROB with head pointer)]
Stall issue if any needed resource is not available
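A Python sketch of the issue step under simplifying assumptions: one RS pool of `rs_capacity` entries (2 here, which matches how the Mul/Div RS behaves in the upcoming example), two-source instructions, and ROB tags that count up (ROB1, ROB2, ...) and are never reused. `Machine`, `read_source`, and `issue` are hypothetical names. Sources are read before the destination is renamed, so an instruction like MUL R1, R1, R3 reads the older producer of R1.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class RSEntry:
    op: str
    dst_tag: str                  # ROB entry that will receive this result
    tags: List[Optional[str]]     # per source: producer's ROB tag, or None if the value is present
    vals: List[Optional[int]]     # per source: the value, or None while waiting


@dataclass
class Machine:
    arf: Dict[str, int]
    rs_capacity: int = 2          # assumed size of this unit's RS
    rob_size: int = 8
    rat: Dict[str, str] = field(default_factory=dict)    # reg -> ROB tag; missing means "in the ARF"
    rob: Dict[str, dict] = field(default_factory=dict)   # ROB tag -> {"dest", "value", "finished"}
    rs: List[RSEntry] = field(default_factory=list)
    next_tag: int = 1


def read_source(m: Machine, reg: str) -> Tuple[Optional[str], Optional[int]]:
    tag = m.rat.get(reg)
    if tag is None:                       # not renamed: architected value
        return None, m.arf[reg]
    if m.rob[tag]["finished"]:            # done but not yet committed: read it from the ROB
        return None, m.rob[tag]["value"]
    return tag, None                      # still in flight: wait for its CDB broadcast


def issue(m: Machine, op: str, dest: str, srcs: Tuple[str, str]) -> bool:
    """Issue one instruction, or return False (stall) if an RS or ROB entry is unavailable."""
    if len(m.rs) == m.rs_capacity or len(m.rob) == m.rob_size:
        return False
    sources = [read_source(m, r) for r in srcs]   # read sources BEFORE renaming dest
    tag = f"ROB{m.next_tag}"
    m.next_tag += 1
    m.rob[tag] = {"dest": dest, "value": None, "finished": False}
    m.rat[dest] = tag                              # later readers of dest now wait on this entry
    m.rs.append(RSEntry(op, tag, [t for t, _ in sources], [v for _, v in sources]))
    return True
```

With the example's initial registers, DIV R2, R3, R4 and MUL R1, R5, R6 both issue with ready values, and a third Mul/Div instruction stalls because both RS entries are taken.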
8
Exec
  • Same as before
  • Wait for all operands to arrive
  • Compete to use functional unit
  • Execute!

9
Write Result
  • Broadcast result on CDB
  • (any dependents will grab the value)
  • Write result back to your ROB entry
  • The ARF holds the official register state,
    which we will only update in program order
  • Mark ready/finished bit in ROB (note that this
    inst has completed execution)
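A Python sketch of write-result, assuming each RS entry keeps a per-source tag/value pair and the ROB is a dict keyed by tag (both representations are illustrative). The test mirrors the example later in the deck: when the second MUL (ROB4) broadcasts 36, the SUB waiting on ROB4 captures it, while the ADD waiting on ROB5 and ROB1 does not.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RSEntry:
    op: str
    dst_tag: str
    tags: List[Optional[str]]   # ROB tag each source waits on, or None once captured
    vals: List[Optional[int]]

    def ready(self) -> bool:
        return all(t is None for t in self.tags)


def write_result(tag: str, value: int, rs: List[RSEntry], rob: Dict[str, dict]) -> None:
    """Broadcast (tag, value) on the CDB, then record it in the producer's ROB entry."""
    for entry in rs:                       # every waiting RS source snoops the CDB
        for i, waiting_on in enumerate(entry.tags):
            if waiting_on == tag:
                entry.tags[i] = None
                entry.vals[i] = value
    rob[tag]["value"] = value
    rob[tag]["finished"] = True            # the ARF is not touched until commit
```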

10
New Commit
  • When an inst is the oldest in the ROB
    (i.e., ROB-head points to it):
  • Write result (if ready/finished bit is set)
  • If register-producing instruction: write to the
    architected register file
  • If store: write to memory
  • Advance ROB-head to the next instruction
  • This is what the outside world sees
  • And it's all in order
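A minimal commit step in Python, assuming the ROB is a deque of dicts ordered oldest-first, each with `tag`, `type`, `dest` (or `addr` for stores), `value`, and `finished` keys; these field names are hypothetical. The RAT maps a register to a ROB tag, and a missing key means the value is in the ARF. The final check clears the RAT mapping only if no younger instruction has renamed the same register since.

```python
from collections import deque
from typing import Deque, Dict


def commit_one(rob: Deque[dict], arf: Dict[str, int], rat: Dict[str, str],
               memory: Dict[int, int]) -> bool:
    """Commit the ROB head if it has finished; return True if something committed."""
    if not rob or not rob[0]["finished"]:
        return False                     # head not done: younger finished entries wait too
    head = rob.popleft()                 # advance ROB-head
    if head["type"] == "store":
        memory[head["addr"]] = head["value"]
    else:
        arf[head["dest"]] = head["value"]
        if rat.get(head["dest"]) == head["tag"]:
            del rat[head["dest"]]        # no younger writer: later readers use the ARF
    return True
```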

11
Commit Illustrated
  • Make instruction execution visible to the
    outside world
  • Commit the changes to the architected state

[Figure: a ROB holding instructions A through K commits into the ARF; the outside world sees, in order: A executed, B executed, C executed, D executed, E executed]
Instructions execute out of program order, but the
outside world still believes it's in order
12
Revisiting Register Renaming
Instructions in the ROB:
  ROB1: R1 = R2 op R3
  ROB2: R3 = R5 - R6
  ROB3: R1 = R1 op R7
  ROB4: R1 = R4 op R8
Next to issue: R2 = R9 op R3
RAT: R1 → ROB1, then ROB3, then ROB4 (newest); R2 → ARF; R3 → ROB2
[Figure: ROB entries ROB1 through ROB8]
If we issue R2 = R9 op R3 to the ROB now, R3 comes
from ROB2.
However, if R3 = R5 - R6 commits first, then we update the RAT so that when we issue R2 = R9 op R3, it
will read its source from the ARF.
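The renaming scenario can be traced with a few hypothetical Python helpers: the RAT keeps only the newest producer of each register, and at commit an entry falls back to the ARF only if it still names the committing ROB entry. Commits happen in order (ROB1 before ROB2), and "op" stands for an unspecified operator as on the slide.

```python
from typing import Dict, List


def rename_sequence(dests: List[str], first_tag: int = 1) -> Dict[str, str]:
    """Give each instruction the next ROB entry; the RAT keeps the newest producer per register."""
    rat: Dict[str, str] = {}
    for i, dest in enumerate(dests):
        rat[dest] = f"ROB{first_tag + i}"
    return rat


def on_commit(rat: Dict[str, str], dest: str, tag: str) -> None:
    """Reset the RAT entry to the ARF only if it still names the committing ROB entry."""
    if rat.get(dest) == tag:
        del rat[dest]


def source_of(reg: str, rat: Dict[str, str]) -> str:
    return rat.get(reg, "ARF")


# Destinations of ROB1..ROB4 on the slide: R1, R3, R1, R1
rat = rename_sequence(["R1", "R3", "R1", "R1"])
print(source_of("R3", rat))                         # ROB2: R2 = R9 op R3 would read ROB2
on_commit(rat, "R1", "ROB1")                        # oldest commits first; ROB4 still owns R1
on_commit(rat, "R3", "ROB2")                        # then R3 = R5 - R6 commits
print(source_of("R1", rat), source_of("R3", rat))   # ROB4 ARF
```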
13
Example
Latencies: Add 2 cycles, Mult 10 cycles, Divide 40 cycles

Inst   Operands
DIV    R2, R3, R4
MUL    R1, R5, R6
ADD    R3, R7, R8
MUL    R1, R1, R3
SUB    R4, R1, R5
ADD    R1, R4, R2

Sequentially, this would take 40 + 10 + 2 + 10 + 2 + 2 =
66 cycles (+ other pipeline stages)

Initial register values: R1 = -23, R2 = 16, R3 = 45, R4 = 5, R5 = 3, R6 = 4, R7 = 1, R8 = 2
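Executing the program one instruction at a time gives both the 66-cycle figure and the final values the out-of-order run must reproduce: R1 = 42, R2 = 9, R3 = 3, R4 = 33, the same values that appear in the ROB during the walkthrough. This Python sketch assumes SUB has the adder's 2-cycle latency and uses integer division; it counts only execution latency, not other pipeline stages.

```python
from typing import Callable, Dict, List, Tuple

# The example program as (op, dest, src1, src2), in program order.
PROGRAM: List[Tuple[str, str, str, str]] = [
    ("DIV", "R2", "R3", "R4"),
    ("MUL", "R1", "R5", "R6"),
    ("ADD", "R3", "R7", "R8"),
    ("MUL", "R1", "R1", "R3"),
    ("SUB", "R4", "R1", "R5"),
    ("ADD", "R1", "R4", "R2"),
]
LATENCY = {"ADD": 2, "SUB": 2, "MUL": 10, "DIV": 40}   # SUB assumed to take the adder's latency
OPS: Dict[str, Callable[[int, int], int]] = {
    "ADD": lambda a, b: a + b,
    "SUB": lambda a, b: a - b,
    "MUL": lambda a, b: a * b,
    "DIV": lambda a, b: a // b,   # integer division; 45 / 5 is exact here
}


def run_sequentially(regs: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Execute one instruction at a time; return final registers and total execute cycles."""
    regs = dict(regs)
    cycles = 0
    for op, dest, s1, s2 in PROGRAM:
        regs[dest] = OPS[op](regs[s1], regs[s2])
        cycles += LATENCY[op]
    return regs, cycles


initial = {"R1": -23, "R2": 16, "R3": 45, "R4": 5, "R5": 3, "R6": 4, "R7": 1, "R8": 2}
final, cycles = run_sequentially(initial)
print(cycles, final["R1"], final["R2"], final["R3"], final["R4"])   # 66 42 9 3 33
```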
14
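In Detail (Cycle 1)
Assume you can bypass and execute in the same cycle
RS fields: Dst-Tag, Tag1, Tag2, Val1, Val2, Op
ROB fields: Type, Dest, Value, Finished
Timing columns for instructions 1-6: I (issue), E (execute begins), W (write result), C (commit)
RS entries are written as Dst-Tag: Op src1, src2, where each source is either a value or the ROB tag it waits on
  • DIV R2, R3, R4 issues (I = 1); both sources come from the ARF
  • RS (Mul/Div): ROB1: DIV 45, 5
  • ROB1: Dest R2
  • RAT: R2 → ROB1; R1 and R3-R8 → ARF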
15
In Detail (Cycle 2)
  • DIV (ROB1) starts executing (E = 2)
  • MUL R1, R5, R6 issues (I = 2); sources 3 and 4 come from the ARF
  • RS (Mul/Div): ROB1: DIV 45, 5; ROB2: MUL 3, 4
  • ROB2: Dest R1
  • RAT: R1 → ROB2, R2 → ROB1; all others → ARF
16
In Detail (Cycle 3)
  • MUL (ROB2) starts executing (E = 3)
  • ADD R3, R7, R8 issues (I = 3); sources 1 and 2 come from the ARF
  • RS (Adder): ROB3: ADD 1, 2
  • RS (Mul/Div): ROB1: DIV 45, 5; ROB2: MUL 3, 4
  • ROB3: Dest R3
  • RAT: R1 → ROB2, R2 → ROB1, R3 → ROB3; all others → ARF
17
In Detail (Cycle 4)
  • ADD (ROB3) starts executing (E = 4)
  • MUL R1, R1, R3 does not issue: the Mul/Div RS is full (DIV and MUL), so it waits until cycle 14
  • RS and RAT are unchanged from cycle 3
18
In Detail (Cycle 6)
  • ADD (ROB3) writes its result (W = 6): ROB3 Value = 3, Finished = Y
  • RS (Adder): ROB3: ADD 1, 2; RS (Mul/Div): ROB1: DIV 45, 5; ROB2: MUL 3, 4
  • RAT: R1 → ROB2, R2 → ROB1, R3 → ROB3; all others → ARF
19
In Detail (Cycle 13)
  • MUL (ROB2) writes its result (W = 13): ROB2 Value = 12, Finished = Y
  • RS (Adder) is empty; RS (Mul/Div): ROB1: DIV 45, 5; ROB2: MUL 3, 4
  • RAT: R1 → ROB2, R2 → ROB1, R3 → ROB3; all others → ARF
20
In Detail (Cycle 14)
  • MUL R1, R1, R3 issues (I = 14); both sources are read from finished ROB entries: R1 = 12 from ROB2, R3 = 3 from ROB3
  • RS (Mul/Div): ROB1: DIV 45, 5; ROB4: MUL 12, 3
  • ROB4: Dest R1
  • RAT: R1 → ROB4, R2 → ROB1, R3 → ROB3; all others → ARF
21
In Detail (Cycle 15)
  • MUL (ROB4) starts executing (E = 15)
  • SUB R4, R1, R5 issues (I = 15); R1 is still pending, so it waits on ROB4; R5 = 3 comes from the ARF
  • RS (Adder): ROB5: SUB ROB4, 3
  • ROB5: Dest R4
  • RAT: R1 → ROB4, R2 → ROB1, R3 → ROB3, R4 → ROB5; all others → ARF
22
In Detail (Cycle 16)
  • ADD R1, R4, R2 issues (I = 16); it waits on ROB5 (R4) and ROB1 (R2)
  • RS (Adder): ROB5: SUB ROB4, 3; ROB6: ADD ROB5, ROB1
  • ROB6: Dest R1
  • RAT: R1 → ROB6, R2 → ROB1, R3 → ROB3, R4 → ROB5; all others → ARF
23
In Detail (cycle number not shown)
  • Same state as cycle 16: all six instructions are in flight, waiting on the long-latency DIV (ROB1) and MUL (ROB4)
24
In Detail (Cycle 26)
  • MUL (ROB4) has written its result (W = 25): ROB4 Value = 36, Finished = Y
  • SUB captured 36 from the CDB: RS (Adder): ROB5: SUB 36, 3; ROB6: ADD ROB5, ROB1
  • RS (Mul/Div): ROB1: DIV 45, 5
25
In Detail (Cycle 28)
  • SUB (ROB5) has started executing (E = 26)
  • RS (Adder): ROB5: SUB 36, 3; ROB6: ADD ROB5, ROB1; RS (Mul/Div): ROB1: DIV 45, 5
26
In Detail (cycle number not shown)
  • SUB (ROB5) writes its result (W = 28): ROB5 Value = 33, Finished = Y
  • ADD captured 33: RS (Adder): ROB6: ADD 33, ROB1
  • RS (Mul/Div): ROB1: DIV 45, 5
27
In Detail (Cycle 43)
  • DIV (ROB1) writes its result (W = 42): ROB1 Value = 9, Finished = Y
  • ADD captured 9: RS (Adder): ROB6: ADD 33, 9; RS (Mul/Div) is empty
28
In Detail (Cycle 44)
  • DIV (ROB1) is at the ROB head and finished, so it commits (C = 43): R2 = 9 is written to the ARF
  • RAT: R2 → ARF2 again, because the RAT still pointed at ROB1
  • ADD (ROB6) starts executing (E = 43)
29
In Detail (Cycle 45)
  • MUL (ROB2) commits (C = 44): R1 = 12 is written to the ARF
  • RAT: R1 stays → ROB6, because a younger instruction has renamed R1
30
In Detail (Cycle 46)
  • ADD (ROB3) commits (C = 45): R3 = 3 is written to the ARF; RAT: R3 → ARF3
  • ADD (ROB6) writes its result (W = 45): ROB6 Value = 42, Finished = Y; RS (Adder) is empty
31
In Detail (Cycle 47)
  • MUL (ROB4) commits (C = 46): R1 = 36 is written to the ARF
  • RAT: R1 stays → ROB6; R4 → ROB5
32
In Detail (Cycle 48)
  • SUB (ROB5) commits (C = 47): R4 = 33 is written to the ARF; RAT: R4 → ARF4
  • ROB6 (Dest R1, Value = 42, Finished = Y) is now at the ROB head

Inst              I    E    W    C
DIV R2, R3, R4    1    2    42   43
MUL R1, R5, R6    2    3    13   44
ADD R3, R7, R8    3    4    6    45
MUL R1, R1, R3    14   15   25   46
SUB R4, R1, R5    15   26   28   47
ADD R1, R4, R2    16   43   45
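The commit cycles in the walkthrough follow a simple rule that a few lines of Python can check, assuming at most one commit per cycle and that an instruction commits no earlier than the cycle after it writes its result: C_i = max(W_i + 1, C_{i-1} + 1). Applied to the write-result cycles 42, 13, 6, 25, 28, 45, it reproduces commit cycles 43 through 47 and predicts cycle 48 for the final ADD, which the transcript leaves at the ROB head.

```python
from typing import List


def commit_cycles(write_cycles: List[int]) -> List[int]:
    """In-order commit, one instruction per cycle, no earlier than the cycle after write-back."""
    commits: List[int] = []
    for w in write_cycles:
        c = w + 1
        if commits:
            c = max(c, commits[-1] + 1)   # must wait for the previous (older) instruction
        commits.append(c)
    return commits


# Write-result cycles in program order (DIV, MUL, ADD, MUL, SUB, ADD)
print(commit_cycles([42, 13, 6, 25, 28, 45]))   # [43, 44, 45, 46, 47, 48]
```

The long DIV at the head holds back every younger instruction, even the ADD that finished in cycle 6.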
33
Timing Example
Add 1 cycle, Mult 10 cycles, Divide 40 cycles
  • Assume you can bypass and execute in the same
    cycle

Inst   Operands     Is   Exec   Wr   Commit   Comments
DIV    R2, R3, R4
MUL    R1, R5, R6
ADD    R3, R7, R8
MUL    R1, R1, R3
SUB    R4, R1, R5
ADD    R1, R4, R2
34
Unified Reservation Stations (1)
  • If MULT RSs are full, and we need to issue
    another MULT, then we have to stall
  • But there may be other RSs (e.g., add) that are
    available
  • The proper number of RSs per ALU needs to be matched
    to the program's instruction distribution
  • But different programs have different
    distributions

35
Unified Reservation Stations (2)
[Figure: RS (Adder) and RS (Mul/Div) combined into a single pool that feeds both the Add and Mult units]
Can hold 5 adds
Or 5 multiplies
Or any combination
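A toy Python comparison of split versus unified reservation stations. The split of 3 adder entries plus 2 Mul/Div entries is a hypothetical choice; the slide only gives the unified size of 5. Entries are never freed in this model, so it simply counts how many instructions issue before the first RS stall.

```python
from collections import Counter
from typing import Dict, List, Optional


def issue_until_stall(ops: List[str], split: Optional[Dict[str, int]] = None,
                      unified: Optional[int] = None) -> int:
    """Count how many instructions issue before the first RS stall.

    Pass `split` for per-unit RS pools (e.g. {"ADD": 3, "MUL": 2}) or `unified`
    for one shared pool. Nothing completes in this model, so entries are never freed.
    """
    used: Counter = Counter()
    for n, op in enumerate(ops):
        if split is not None and used[op] >= split[op]:
            return n
        if unified is not None and sum(used.values()) >= unified:
            return n
        used[op] += 1
    return len(ops)


burst = ["MUL", "MUL", "MUL", "ADD"]
print(issue_until_stall(burst, split={"ADD": 3, "MUL": 2}))  # 2: third MUL stalls
print(issue_until_stall(burst, unified=5))                   # 4: all issue
```

With the split pools, the third MUL stalls even though three adder entries sit idle; the unified pool absorbs the burst.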
36
Unified Reservation Stations (3)
  • Arbitration and execution paths are a little more
    complex (not too bad though)

For N functional units, the RS needs to support N
read ports
[Figure: unified RS entries (e.g., ADD rdy, MUL x, ADD x, MUL rdy; rdy = ready, x = not ready) feed an Adder Arbiter and a Multiplier Arbiter, which send instructions to the Add and Mult units]
Ready adds compete for the execution unit
Ready muls do the same
Arbitration logic picks insts to execute
Insts go and do their thing
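A Python sketch of per-unit arbitration over a unified RS: each arbiter picks at most one ready entry of its own type per cycle, which is why N functional units need N RS read ports. Oldest-first priority is an assumed policy, and the entry layout below only resembles the figure; its exact order is not recoverable.

```python
from typing import Dict, List

FU_TYPES = ("ADD", "MUL")   # one arbiter per functional-unit type


def arbitrate(rs: List[dict]) -> Dict[str, int]:
    """Each arbiter picks at most one ready entry of its type from the unified RS.

    Oldest-first priority (smallest "age") is an assumed policy.
    Returns {fu_type: index of the chosen RS entry}.
    """
    picks: Dict[str, int] = {}
    for idx in sorted(range(len(rs)), key=lambda i: rs[i]["age"]):
        entry = rs[idx]
        if entry["ready"] and entry["op"] in FU_TYPES and entry["op"] not in picks:
            picks[entry["op"]] = idx
    return picks


# rdy = ready (True), x = still waiting for an operand (False); age = position here
layout = [("ADD", True), ("MUL", False), ("ADD", False), ("ADD", True),
          ("MUL", True), ("MUL", False), ("ADD", False), ("ADD", True)]
rs = [{"op": op, "ready": r, "age": i} for i, (op, r) in enumerate(layout)]
print(arbitrate(rs))   # {'ADD': 0, 'MUL': 4}
```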
37
Out-of-Order, but not Superscalar
  • As described, this Tomasulo (ROB) CPU can sustain
    a throughput of at most 1 IPC
  • Limitations
  • Need superscalar fetch, decode, etc.
  • There's only one CDB, so only one inst per cycle
    can write back its result to the ROB
  • Must also commit > 1 IPC

38
Getting > 1 IPC
  • Must be able to issue > 1 IPC to RS/ROB
  • Must be able to send > 1 IPC to functional units
  • The original Tomasulo can already do this (if an inst
    is ready and an FU is available, go and execute!)
  • Must be able to write back > 1 IPC to the ROB (and
    reservation stations)

39
Dual-Issue
  • Need to check resource availability for two
    instructions (RS/ROB entries)
  • Depending on resources, may issue 0, 1, or 2
  • Read RAT/ARF/ROB for operands
  • Renaming is a little trickier (next slide)
  • Update RS/ROB entries (not too hard)
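The resource check for dual issue can be sketched in Python, assuming in-order issue (an instruction that cannot get its RS or ROB entry blocks the one behind it) and free-entry counts per RS type; `dual_issue_count` is a hypothetical helper.

```python
from typing import Dict, List


def dual_issue_count(rs_types: List[str], free_rs: Dict[str, int], free_rob: int,
                     width: int = 2) -> int:
    """How many of the next `width` instructions can issue this cycle (0, 1, or 2 for width 2).

    rs_types[i] is the RS kind instruction i needs. Issue is in order, so the
    first instruction that lacks an RS or ROB entry blocks the ones behind it.
    """
    free_rs = dict(free_rs)   # don't modify the caller's counts
    issued = 0
    for kind in rs_types[:width]:
        if free_rob == 0 or free_rs.get(kind, 0) == 0:
            break
        free_rs[kind] -= 1
        free_rob -= 1
        issued += 1
    return issued
```

For example, with one free adder RS entry and no free Mul/Div entry, the pair (ADD, MUL) issues 1 instruction, while (MUL, ADD) issues 0.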

40
Dual-Rename
All registers renamed!
RAT before: R1 → ARF1, R2 → ROB17, R3 → ARF3, R4 → ROB6
Issue group:
  R1 = R2 op R3  →  ROB21 (sources ROB17, ARF3)
  R4 = R2 op R4  →  ROB22 (sources ROB17, ROB6)
RAT after: R1 → ROB21, R4 → ROB22
New destinations are just the next two ROB entries
to be allocated: ROB-tail and ROB-tail + 1
[Figure: a renaming marked X (incorrect): ROB21 ← ROB17, ARF3; ROB22 ← ARF1, ROB6]
Need to check for RAW dependencies within
your issue group
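A Python sketch of renaming one issue group, assuming two-source instructions and a RAT dict where a missing register means "in the ARF". All RAT reads happen before the group updates the RAT, so a source written by an earlier member of the same group must be patched with that member's new tag; this is the intra-group RAW check. The second group in the test (R4 = R1 op R4) is a hypothetical case showing why reading ARF1 would be wrong.

```python
from typing import Dict, List, Tuple

Inst = Tuple[str, str, str]   # (dest, src1, src2)


def rename_group(group: List[Inst], rat: Dict[str, str], tail: int
                 ) -> Tuple[List[Tuple[str, List[str]]], Dict[str, str]]:
    """Rename an issue group in one step; returns ([(dest_tag, [src tags])], new RAT)."""
    tags = [f"ROB{tail + i}" for i in range(len(group))]   # ROB-tail, ROB-tail + 1, ...
    # 1) All RAT reads happen in parallel, before this group updates the RAT.
    reads = [[rat.get(s, "ARF" + s[1:]) for s in srcs] for _, *srcs in group]
    renamed = []
    for i, (_, *srcs) in enumerate(group):
        sources = reads[i]
        # 2) Intra-group RAW check: an earlier group member that writes the source
        #    overrides the stale RAT value (the latest such member wins).
        for j in range(i):
            for k, s in enumerate(srcs):
                if s == group[j][0]:
                    sources[k] = tags[j]
        renamed.append((tags[i], sources))
    # 3) RAT update; for a shared destination the later instruction wins.
    new_rat = dict(rat)
    for (dest, _, _), tag in zip(group, tags):
        new_rat[dest] = tag
    return renamed, new_rat
```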
41
Multiple CDBs
RS Entries
  • It works (we do this today in CPUs)
  • But there's a cost
  • Each RS entry must compare each source with each
    CDB
  • More area, logic, and power

[Figure: RS entries (A1/ROB7, B1/ROB19, C1/ROB2, D1/ROB3, E1/ROB4, A2/ROB10) sent by arbiters to four FUs, whose results return on two CDBs (CDB1, CDB2)]
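A Python sketch of operand wakeup with several CDBs, assuming each RS entry holds a tag and value per source; the tags and values in the test are hypothetical. The returned count is one comparison per (entry, source, CDB), which is the comparator cost the slide refers to: it grows linearly with the number of CDBs.

```python
from typing import List, Optional, Tuple

Broadcast = Optional[Tuple[str, int]]   # (ROB tag, value), or None if the bus is idle


def wakeup(rs: List[dict], cdbs: List[Broadcast]) -> int:
    """Apply one cycle of broadcasts on several CDBs to every RS entry.

    Returns the number of tag comparisons, one per (entry, source, CDB), which is
    the comparator count the hardware needs every cycle.
    """
    comparisons = 0
    for entry in rs:
        for k, waiting_on in enumerate(entry["tags"]):
            for bus in cdbs:
                comparisons += 1
                if bus is not None and waiting_on is not None and waiting_on == bus[0]:
                    entry["tags"][k] = None      # operand captured
                    entry["vals"][k] = bus[1]
    return comparisons
```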
42
Committing > 1 IPC
  • Must be able to write back multiple results from
    ROB → ARF (or memory for stores)
  • ROB needs extra read ports
  • ARF needs extra write ports
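A Python sketch of a commit stage that retires up to `width` instructions per cycle, assuming register-writing instructions only (stores omitted) and a deque of ROB entries ordered oldest-first. The test replays the example's ROB contents with width 2: the unfinished final ADD stops commit even though a port is free.

```python
from collections import deque
from typing import Deque, Dict


def commit_cycle(rob: Deque[dict], arf: Dict[str, int], width: int) -> int:
    """Retire up to `width` finished instructions from the ROB head in one cycle.

    Each retirement needs its own ROB read port and ARF write port, so `width`
    bounds both. Stops at the first unfinished entry to keep commit in order.
    """
    retired = 0
    while rob and retired < width and rob[0]["finished"]:
        entry = rob.popleft()
        arf[entry["dest"]] = entry["value"]   # program order: a later write to the same reg wins
        retired += 1
    return retired
```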

43
Terminology is Inconsistent, Overloaded
  • Issue, Dispatch, Commit, etc.
  • Text uses terms w.r.t. Tomasulo's algorithm
  • Other usage is different (many academics)
  • Issue/Alloc/Dispatch
  • Exec/Issue/Dispatch
  • Commit/Complete/Retire/Graduate

Blue: Intel's convention
Orange: some academics