Lecture 16: Core Design - PowerPoint PPT Presentation

Slides: 18
Provided by: RajeevBalas160
Learn more at: https://my.eng.utah.edu
Transcript and Presenter's Notes

1
Lecture 16: Core Design
  • Today: basics of implementing a correct
    out-of-order (ooo) core
  • Register renaming, commit, LSQ, issue queue

2
The Alpha 21264 Out-of-Order Implementation
[Figure: pipeline block diagram. Recoverable labels and contents:]
  • Blocks: Branch prediction and instr fetch, Instr Fetch Queue,
    Decode & Rename, Issue Queue (IQ), ALU ×3,
    Register File (P1-P64), Reorder Buffer (ROB)
  • Reorder Buffer (ROB): Instr 1, Instr 2, Instr 3, Instr 4,
    Instr 5, Instr 6
  • Committed Reg Map: R1→P1, R2→P2
  • Speculative Reg Map: R1→P36, R2→P34
  • Instr Fetch Queue (before renaming):
      R1 ← R1+R2
      R2 ← R1+R3
      BEQZ R2
      R3 ← R1+R2
      R1 ← R3+R2
  • Issue Queue (after renaming):
      P33 ← P1+P2
      P34 ← P33+P3
      BEQZ P34
      P35 ← P33+P34
      P36 ← P35+P34
  • Results written to regfile and tags broadcast to IQ
3
Rename
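Before renaming:
  A: lr1 ← lr2 + lr3
  B: lr2 ← lr4 + lr5
  C: lr6 ← lr1 + lr3
  D: lr6 ← lr1 + lr2
  Dependences: RAR on lr3, RAW on lr1, WAR on lr2, WAW on lr6
  Execution order: A, then B and C, then D
After renaming:
  A: pr7 ← pr2 + pr3
  B: pr8 ← pr4 + pr5
  C: pr9 ← pr7 + pr3
  D: pr10 ← pr7 + pr8
  Dependences: RAR on pr3, RAW on pr7, WAR removed, WAW removed
  Execution order: A and B, then C and D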
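
The renaming on this slide can be reproduced with a short Python sketch. It assumes a map table that initially maps lrN to prN and a free list holding pr7 to pr10; the function name `rename` and the tuple encoding of instructions are illustrative choices, not part of the lecture.

```python
from collections import deque

def rename(instrs, num_logical=6, num_physical=10):
    """Rename (dest, src1, src2) logical-register triples to physical registers."""
    # Assumed initial state: lr_i maps to pr_i; the remaining physical regs are free.
    map_table = {f"lr{i}": f"pr{i}" for i in range(1, num_logical + 1)}
    free_list = deque(f"pr{i}" for i in range(num_logical + 1, num_physical + 1))
    renamed = []
    for dest, src1, src2 in instrs:
        # Sources are looked up before the destination is remapped.
        p1, p2 = map_table[src1], map_table[src2]
        new_dest = free_list.popleft()  # IndexError here means no free register
        map_table[dest] = new_dest
        renamed.append((new_dest, p1, p2))
    return renamed

program = [
    ("lr1", "lr2", "lr3"),  # A
    ("lr2", "lr4", "lr5"),  # B
    ("lr6", "lr1", "lr3"),  # C
    ("lr6", "lr1", "lr2"),  # D
]
renamed_program = rename(program)
for name, instr in zip("ABCD", renamed_program):
    print(name, instr)
```

Because every destination gets a fresh physical register, only the true (RAW) dependences remain in the renamed program.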
4
Commit Example
Assume a processor with 6 logical regs and 10 physical regs.

  Instr  Original            Renamed               Dest  Old map  New map
  A      lr1 ← lr2 + lr3     pr7  ← pr2  + pr3     lr1   pr1      pr7
  B      lr2 ← lr4 + lr5     pr8  ← pr4  + pr5     lr2   pr2      pr8
  C      lr6 ← lr1 + lr3     pr9  ← pr7  + pr3     lr6   pr6      pr9
  D      lr6 ← lr1 + lr2     pr10 ← pr7  + pr8     lr6   pr9      pr10
  E      lr3 ← lr6 + lr2     pr1  ← pr10 + pr8     lr3   pr3      pr1
  F      lr4 ← lr3 + lr4     pr2  ← pr1  + pr4     lr4   pr4      pr2
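
The table above can be traced with a small model in which each ROB entry remembers the old and new mapping of its destination, and committing an instruction returns the old physical register to the free list. This is a minimal sketch under those assumptions; the class `RenameUnit` and its method names are hypothetical.

```python
from collections import deque

class RenameUnit:
    def __init__(self, num_logical=6, num_physical=10):
        # Assumed initial state: lr_i maps to pr_i; the rest are free.
        self.map_table = {f"lr{i}": f"pr{i}" for i in range(1, num_logical + 1)}
        self.free_list = deque(f"pr{i}" for i in range(num_logical + 1, num_physical + 1))
        self.rob = deque()  # in program order: (logical dest, old phys, new phys)

    def can_rename(self):
        return bool(self.free_list)

    def rename(self, dest, src1, src2):
        p1, p2 = self.map_table[src1], self.map_table[src2]
        old = self.map_table[dest]
        new = self.free_list.popleft()
        self.map_table[dest] = new
        self.rob.append((dest, old, new))
        return (new, p1, p2)

    def commit(self):
        # The oldest instruction commits; its OLD mapping can no longer be read.
        _dest, old, _new = self.rob.popleft()
        self.free_list.append(old)
        return old

ru = RenameUnit()
for instr in [("lr1", "lr2", "lr3"), ("lr2", "lr4", "lr5"),
              ("lr6", "lr1", "lr3"), ("lr6", "lr1", "lr2")]:  # A, B, C, D
    ru.rename(*instr)
stalled = not ru.can_rename()           # all 10 physical regs are in use
freed_by_a = ru.commit()                # A commits and frees pr1
e = ru.rename("lr3", "lr6", "lr2")      # E can now be renamed
freed_by_b = ru.commit()                # B commits and frees pr2
f = ru.rename("lr4", "lr3", "lr4")      # F
print(stalled, freed_by_a, e, freed_by_b, f)
```

In this model E and F cannot be renamed until A and B commit, which is why their destinations reuse pr1 and pr2.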
5
Out-of-Order Loads/Stores
  Ld  R1 ← [R2]
  Ld  R3 ← [R4]
  St  R5 → [R6]
  Ld  R7 ← [R8]
  Ld  R9 ← [R10]
6
Memory Dependence Checking
  • The issue queue checks for register dependences and
    executes instructions as soon as registers are ready
  • Loads/stores access memory as well, so we must also check
    for RAW, WAW, and WAR hazards through memory
  • Hence, first check for register dependences to compute
    effective addresses, then check for memory dependences

[Figure: load/store entries in program order; entries shown without an address presumably have not computed it yet]
  Ld  0xabcdef
  Ld
  St
  Ld
  Ld  0xabcdef
  St  0xabcd00
  Ld  0xabc000
  Ld  0xabcd00
7
Memory Dependence Checking
  • Load and store addresses are maintained in program order
    in the Load/Store Queue (LSQ)
  • Loads can issue if they are guaranteed to not have true
    dependences with earlier stores
  • Stores can issue only if we are ready to modify memory
    (cannot recover if an earlier instr raises an exception)

[Figure: LSQ contents in program order; entries shown without an address presumably have not computed it yet]
  Ld  0xabcdef
  Ld
  St
  Ld
  Ld  0xabcdef
  St  0xabcd00
  Ld  0xabc000
  Ld  0xabcd00
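
The load-issue rule can be sketched as a scan over older LSQ entries. This is a conservative sketch, not the Alpha's actual policy: it assumes exact-match address comparison, no store-to-load forwarding, and that an unknown store address blocks every later load. The function name and the address 0xabcd04 used for the resolved store are hypothetical.

```python
def load_may_issue(lsq, idx):
    """True if the load at lsq[idx] cannot depend on any earlier store."""
    kind, addr = lsq[idx]
    if kind != "Ld":
        raise ValueError("entry is not a load")
    if addr is None:                      # own address not computed yet
        return False
    for older_kind, older_addr in lsq[:idx]:
        if older_kind != "St":
            continue
        # Unknown store address: might alias. Same address: true dependence.
        if older_addr is None or older_addr == addr:
            return False
    return True

def issuable_loads(lsq):
    return [i for i, (kind, _) in enumerate(lsq)
            if kind == "Ld" and load_may_issue(lsq, i)]

# LSQ contents from the slide, oldest first; None = address not yet known.
lsq = [("Ld", 0xabcdef), ("Ld", None), ("St", None), ("Ld", None),
       ("Ld", 0xabcdef), ("St", 0xabcd00), ("Ld", 0xabc000), ("Ld", 0xabcd00)]
before = issuable_loads(lsq)

# Hypothetical: the first store resolves to an address no load uses.
resolved = list(lsq)
resolved[2] = ("St", 0xabcd04)
after = issuable_loads(resolved)
print(before, after)
```

With the slide's contents only the oldest load may issue; once the unresolved store's address is known, the loads to 0xabcdef and 0xabc000 are released, while the last load still conflicts with the store to 0xabcd00.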
8
The Alpha 21264 Out-of-Order Implementation
[Figure: pipeline block diagram with loads and stores. Recoverable labels and contents:]
  • Blocks: Branch prediction and instr fetch, Instr Fetch Queue,
    Decode & Rename, Issue Queue (IQ), ALU ×4, LSQ, D-Cache,
    Register File (P1-P64), Reorder Buffer (ROB)
  • Reorder Buffer (ROB): Instr 1, Instr 2, Instr 3, Instr 4,
    Instr 5, Instr 6, Instr 7
  • Committed Reg Map: R1→P1, R2→P2
  • Speculative Reg Map: R1→P36, R2→P34
  • Instr Fetch Queue (before renaming):
      R1 ← R1+R2
      R2 ← R1+R3
      BEQZ R2
      R3 ← R1+R2
      R1 ← R3+R2
      LD R4 ← 8[R3]
      ST R4 → 8[R1]
  • Issue Queue (after renaming):
      P33 ← P1+P2
      P34 ← P33+P3
      BEQZ P34
      P35 ← P33+P34
      P36 ← P35+P34
      P37 ← 8[P35]
      P37 → 8[P36]
  • LSQ:
      P37 ← [P35 + 8]
      P37 → [P36 + 8]
  • Results written to regfile and tags broadcast to IQ
9
Speculative Issue
  • Instr I1 leaves the issue queue at the start of cycle 6;
    the instr then reads operands from the regfile, wires are
    traversed, the instruction executes, and the result is
    available at the end of cycle 8
  • If operand availability is broadcast to the issue queue
    in cycle 9, the dependent instruction leaves in cycle 10
  • This causes a 4-cycle gap between successive dependent
    instrs
  • Hence, if we know that the instruction takes one cycle to
    execute, the operand is broadcast to the issue queue in
    cycle 6 and the dependent instr leaves the issue queue in
    cycle 7; the input operand is correctly bypassed at the FU
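
The cycle counts above follow from simple arithmetic. The sketch below assumes two cycles between leaving the issue queue and starting execution (regfile read and wire traversal) and one cycle between a tag broadcast and the dependent leaving the queue; the function and parameter names are illustrative.

```python
def dependent_leave_cycle(issue_cycle, exec_latency=1,
                          issue_to_exec_delay=2, speculative=True):
    """Cycle in which a dependent instruction leaves the issue queue."""
    if speculative:
        # Broadcast early, timed so the dependent meets the result at the bypass.
        broadcast = issue_cycle + exec_latency - 1
    else:
        # Wait until the result actually exists, then broadcast the next cycle.
        result_ready = issue_cycle + issue_to_exec_delay + exec_latency - 1
        broadcast = result_ready + 1
    return broadcast + 1

conservative = dependent_leave_cycle(6, speculative=False)
early = dependent_leave_cycle(6, speculative=True)
print(conservative, conservative - 6)  # leave cycle and gap without early broadcast
print(early, early - 6)                # leave cycle and gap with early broadcast
```

With these assumptions the dependent leaves in cycle 10 (gap of 4) without early broadcast and in cycle 7 (gap of 1) with it, matching the slide.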

10
Load Hit Speculation
  • The previous optimization assumes that we know the exact
    latency for every operation
  • This is true for all ops except loads (cache hit or miss?)
  • Assume a hit and schedule accordingly; on a cache miss, we
    must squash all speculatively issued instructions
  • An instruction therefore sits in the issue queue until load
    hits are determined
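
One way to picture this is to keep each issued instruction in the queue, tagged with the load whose hit it assumed, until that load resolves. This is a minimal sketch under that assumption; the class, field, and state names are hypothetical and the model tracks only one assumed load per entry.

```python
class IssueQueueEntry:
    def __init__(self, name, assumed_hit_of):
        self.name = name
        self.assumed_hit_of = assumed_hit_of  # load this entry was scheduled behind
        self.state = "waiting"                # waiting -> issued -> released

def speculative_issue(entries):
    """Issue every waiting entry, assuming its load hits in the cache."""
    for e in entries:
        if e.state == "waiting":
            e.state = "issued"

def resolve_load(entries, load, hit):
    """Release entries on a hit; squash them back to 'waiting' on a miss."""
    released = []
    for e in entries:
        if e.assumed_hit_of == load and e.state == "issued":
            if hit:
                e.state = "released"      # entry may now leave the queue
                released.append(e.name)
            else:
                e.state = "waiting"       # squashed; must issue again later
    return released

queue = [IssueQueueEntry("add", "L1"), IssueQueueEntry("sub", "L1")]
speculative_issue(queue)
first = resolve_load(queue, "L1", hit=False)   # miss: nothing released
states_after_miss = [e.state for e in queue]
speculative_issue(queue)                        # reissue once the data returns
second = resolve_load(queue, "L1", hit=True)
print(first, states_after_miss, second)
```

The cost visible here is queue occupancy: entries cannot free their slots at issue time, only once the hit is confirmed.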

11
Register Rename Logic
[Figure: rename logic block diagram. Labels: Logical Source Regs, Logical Dest Regs, Map Table, Free Pool, Dependence Check Logic, Mux, Physical Source Regs, Physical Dest Regs]
12
Map Table RAM
[Figure: RAM-based map table]
  • Each entry holds a 7-bit phys reg id
  • Num entries = Num logical regs
  • Shadow copies (shift register)
13
Map Table CAM
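[Figure: CAM-based map table. Field widths shown: 5-bits (logical reg id), 1-bit (valid), 1-bit]
  • Each entry holds a logical reg id and a valid bit
  • Num entries = Num phys regs
  • Shadow copies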
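
The two map-table organizations differ in what indexes the table. The sketch below contrasts them, assuming 32 logical and 128 physical registers (consistent with the 5-bit and 7-bit fields on the slides) and 0-based register ids; shadow copies are not modeled, and the class names are hypothetical.

```python
class RamMapTable:
    """One entry per logical register; each entry stores a physical reg id."""
    def __init__(self, num_logical):
        self.table = list(range(num_logical))   # assumed initial map: lr i -> pr i

    def lookup(self, lr):
        return self.table[lr]                   # direct index, no search

    def remap(self, lr, pr):
        self.table[lr] = pr

class CamMapTable:
    """One entry per physical register; each stores a logical reg id and valid bit."""
    def __init__(self, num_logical, num_physical):
        self.logical = [None] * num_physical
        self.valid = [False] * num_physical
        for r in range(num_logical):
            self.logical[r], self.valid[r] = r, True

    def lookup(self, lr):
        # Hardware compares all entries in parallel; this loop stands in for that.
        for pr, (tag, v) in enumerate(zip(self.logical, self.valid)):
            if v and tag == lr:
                return pr
        raise KeyError(lr)

    def remap(self, lr, pr):
        self.valid[self.lookup(lr)] = False     # retire the previous mapping
        self.logical[pr], self.valid[pr] = lr, True

ram, cam = RamMapTable(32), CamMapTable(32, 128)
for table in (ram, cam):
    table.remap(1, 33)
    table.remap(2, 34)
    table.remap(1, 36)
print(ram.lookup(1), cam.lookup(1), ram.lookup(2), cam.lookup(2))
```

Both organizations return the same mappings; they trade a small indexed array against a larger associative search.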
14
Wakeup Logic
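[Figure: wakeup circuit. Result tags tag1 ... tagIW are broadcast to every issue-queue entry; each entry compares them against its tagL and tagR, and the comparator outputs are ORed to set rdyL and rdyR]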
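
The wakeup step can be sketched as a tag comparison per operand. The example assumes each entry is a dictionary with the fields named in the figure (tagL, tagR, rdyL, rdyR) and that an entry requests issue when both operands are ready; the function name is illustrative, and issued entries are not removed from the list.

```python
def wakeup(entries, broadcast_tags):
    """Update ready bits from this cycle's broadcast tags; return request bits."""
    for e in entries:
        # Each ready bit is the OR of its old value and all tag comparisons.
        e["rdyL"] = e["rdyL"] or e["tagL"] in broadcast_tags
        e["rdyR"] = e["rdyR"] or e["tagR"] in broadcast_tags
    return [e["rdyL"] and e["rdyR"] for e in entries]

# Two renamed instructions waiting in the queue (destinations omitted):
entries = [
    {"tagL": "P33", "tagR": "P3",  "rdyL": False, "rdyR": True},   # P34 <- P33+P3
    {"tagL": "P33", "tagR": "P34", "rdyL": False, "rdyR": False},  # P35 <- P33+P34
]
after_p33 = wakeup(entries, {"P33"})
after_p34 = wakeup(entries, {"P34"})
print(after_p33, after_p34)
```

The number of comparisons per entry grows with the issue width (tags broadcast per cycle), which is one reason this structure affects cycle time.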
15
Selection Logic
[Figure: selection tree over the issue window, built from arbiter cells. Signals: req, grant, enable, anyreq]
  • For multiple FUs, will need sequential selectors
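
A single selector and the sequential arrangement for several FUs can be sketched as follows. The sketch assumes fixed priority by position in the issue window (lowest index wins), which is one possible policy rather than the one the lecture specifies; the function names are hypothetical.

```python
def select_one(req, enable=True):
    """Grant at most one request; anyreq reports whether any request is pending."""
    anyreq = any(req)
    grant = [False] * len(req)
    if enable and anyreq:
        grant[req.index(True)] = True   # assumed policy: lowest index wins
    return grant, anyreq

def select_many(req, num_fus):
    """Chain selectors: each one sees the requests not yet granted."""
    remaining = list(req)
    winners = []
    for _ in range(num_fus):
        grant, anyreq = select_one(remaining)
        if not anyreq:
            break
        winner = grant.index(True)
        winners.append(winner)
        remaining[winner] = False       # mask the granted request for the next selector
    return winners

requests = [False, True, True, False, True]
one_fu = select_many(requests, 1)
two_fus = select_many(requests, 2)
idle_grant, idle_anyreq = select_one([False, False, False])
print(one_fu, two_fus, idle_grant, idle_anyreq)
```

Because each selector depends on the previous one's grant, the delay of this arrangement grows with the number of FUs.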

16
Structure Complexities
  • Critical structures: register map tables, issue queue,
    LSQ, register file, register bypass
  • Cycle time is heavily influenced by window size
    (physical register size) and issue width (FUs)
  • Conflict between the desire to increase IPC and the
    desire to increase clock speed
