Title: Half Price Architecture


1
Half Price Architecture
  • Authors: Ilhyun Kim and Mikko H. Lipasti

2
Motivation
  • Processors are overdesigned
  • Handle 0-, 1-, and 2-operand instructions equally
  • Simplifies control
  • Requires multi-port register files
  • Requires large broadcast busses for wakeup logic
  • Results in slower clock frequency

3
Scarcity of 2-source instructions
  • Characterize frequency of 2-source instructions
  • SimpleScalar/Alpha 3.0 (sim-outorder)
  • SPEC 2000 integer benchmark suite

4
Dynamic 2-source instructions
  • 18-36% of dynamic instructions use the 2-source format
  • But some use the zero register or the same operand twice

5
Dynamic 2-source instructions
  • 6-23% use the 2-source format without the zero register
    or duplicate operands
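
The filtering described on slides 4 and 5 can be sketched as a small classifier. This is an illustrative sketch, not the authors' tooling: the `Instr` record and the function names are hypothetical, and `r31` is used as the zero register on the assumption of Alpha integer register naming.

```python
from dataclasses import dataclass
from typing import Optional

ZERO_REG = "r31"  # assumption: Alpha-style hardwired zero register


@dataclass(frozen=True)
class Instr:
    """Hypothetical decoded instruction with up to two register sources."""
    op: str
    src1: Optional[str] = None
    src2: Optional[str] = None


def effective_sources(instr: Instr) -> int:
    """Count distinct source registers, ignoring the zero register."""
    srcs = {s for s in (instr.src1, instr.src2)
            if s is not None and s != ZERO_REG}
    return len(srcs)


def is_true_two_source(instr: Instr) -> bool:
    """True only if two different, non-zero registers must be read."""
    return effective_sources(instr) == 2
```

An instruction such as `Instr("add", "r1", "r1")` is encoded in the 2-source format but counts as a single effective source here, which is the distinction the slides draw between the two measurements.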

6
Deeper Study
  • 2-source instructions not very dominant
  • Justifies further study into overdesign
  • Scheduling logic: wakeup logic
  • Register file access

7
Processor Model
8
Wakeup Logic
  • Wakeup logic: the logic that notifies a queued
    instruction that its operands will be available
    from the bypass bus
  • Once both operands are ready, the instruction may
    be selected for issue
  • Destination tags are broadcast to all queue entries
  • Broadcast is slow (high fanout)
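
A minimal sketch of conventional wakeup, under simplifying assumptions, shows where the fanout comes from: every broadcast tag is compared against both source fields of every queue entry. The `Entry` class and the `broadcast` and `selectable` helpers are hypothetical names for illustration, not structures from the presentation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Entry:
    """Hypothetical issue queue entry with up to two source tags."""
    op: str
    srcs: Tuple[str, ...]
    ready: List[bool] = field(default_factory=list)

    def __post_init__(self):
        # If no ready bits were supplied, every source starts as pending.
        if not self.ready:
            self.ready = [False] * len(self.srcs)


def broadcast(queue: List[Entry], dest_tags: List[str]) -> int:
    """Compare each broadcast tag with each source of each entry.

    Returns the number of tag comparisons performed, a rough proxy
    for the load placed on the broadcast bus.
    """
    comparisons = 0
    for entry in queue:
        for i, src in enumerate(entry.srcs):
            for tag in dest_tags:
                comparisons += 1
                if src == tag:
                    entry.ready[i] = True
    return comparisons


def selectable(queue: List[Entry]) -> List[str]:
    """Entries whose operands are all ready may be selected for issue."""
    return [e.op for e in queue if all(e.ready)]
```

In this model the comparison count grows with queue size times sources per entry times tags per cycle, so halving the comparators attached to each bus roughly halves that product.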

9
Wakeup Logic System
  • [Figure: dispatch, issue queue, and 4-way selector]
  • Instructions shown: Add r6, r7, r2; Sub r4, r5, r1;
    Add r1, r2, r3

10
Wakeup Logic System
  • [Figure: dispatch, issue queue, and 4-way selector]
  • Instructions shown: Add r1, r2, r3; Add r6, r7, r2;
    Sub r4, r5, r1

11
Wakeup Logic System
  • [Figure: dispatch, issue queue, and 4-way selector]
  • Instruction shown: Add r1, r2, r3

12
Wakeup Logic - overdesign
  • Destination tags are broadcast simultaneously to
    both operands
  • Useful only when:
  • Both operands are fetched from the bypass
  • Both operands become ready in the same cycle

13
1. Both operands requiring bypass
  • Some operands are already ready (don't need
    wakeup)
  • 4-16% of instructions have 2 pending operands in
    the scheduler

14
2. Both operands ready in same cycle
  • Operands become ready in different cycles
  • <3% become ready in the same cycle

15
Previous Work: Tag Elimination
  • Ernst and Austin
  • Predict latest arriving operand
  • Use only one comparator for it
  • Incurs penalty for mispredictions
  • Implementing with selective recovery is
    impractical

16
Sequential Wakeup
  • Less bus loading, different timing

17
Example
  • Cycle 1; tag on fast bus: r2

  src (fast)  op   rdy  rdy  src (slow)  dest
  r1          ADD  0    0    r2          r3
  r3          SUB  0    0    r4          r5
  r5          XOR  0    0    _           r6

18
Example
  • Cycle 2; tags on fast bus: r1, r4; tag on slow bus: r2

  src (fast)  op   rdy  rdy  src (slow)  dest  status
  r1          ADD  1    1    r2          r3    issue
  r3          SUB  0    0    r4          r5
  r5          XOR  0    0    _           r6

19
Example
  • Cycle 3; tag on fast bus: r3; tags on slow bus: r1, r4

  src (fast)  op   rdy  rdy  src (slow)  dest  status
  r1          ADD  1    1    r2          r3    issued
  r3          SUB  1    1    r4          r5    issue
  r5          XOR  0    0    _           r6

20
Example
  • Cycle 4; tag on fast bus: r5; tag on slow bus: r3

  src (fast)  op   rdy  rdy  src (slow)  dest  status
  r1          ADD  1    1    r2          r3    issued
  r3          SUB  1    1    r4          r5    issued
  r5          XOR  1    0    _           r6    issue

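
The four-cycle example above can be replayed with a small simulation. This is a sketch under stated assumptions rather than the authors' simulator: each entry has one source on a fast bus and one on a slow bus, the slow bus repeats the fast bus tags one cycle later, and an instruction's destination tag goes onto the fast bus in the cycle after it issues. The `Entry` class, the `simulate` function, and the `external` tag schedule are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    op: str                   # used as a unique key in this sketch
    fast: str                 # source compared against the fast bus
    slow: Optional[str]       # source compared against the slow bus
    dest: str
    fast_rdy: bool = False
    slow_rdy: bool = False


def simulate(entries: List[Entry],
             external: Dict[int, List[str]],
             cycles: int) -> Dict[str, int]:
    """Return the cycle in which each entry issues."""
    issued_at: Dict[str, int] = {}
    prev_fast: List[str] = []   # becomes the slow bus next cycle
    next_dests: List[str] = []  # tags of instructions issued last cycle
    for cycle in range(1, cycles + 1):
        fast_bus = external.get(cycle, []) + next_dests
        slow_bus = prev_fast
        next_dests = []
        for e in entries:
            if e.op in issued_at:
                continue
            if e.fast in fast_bus:
                e.fast_rdy = True
            if e.slow is not None and e.slow in slow_bus:
                e.slow_rdy = True
            # An unused slow-side operand never blocks issue.
            if e.fast_rdy and (e.slow is None or e.slow_rdy):
                issued_at[e.op] = cycle
                next_dests.append(e.dest)
        prev_fast = fast_bus
    return issued_at


queue = [
    Entry("ADD", fast="r1", slow="r2", dest="r3"),
    Entry("SUB", fast="r3", slow="r4", dest="r5"),
    Entry("XOR", fast="r5", slow=None, dest="r6"),
]
# r2 is broadcast in cycle 1, r1 and r4 in cycle 2, as in the example.
print(simulate(queue, {1: ["r2"], 2: ["r1", "r4"]}, cycles=4))
# {'ADD': 2, 'SUB': 3, 'XOR': 4}
```

Swapping the `fast` and `slow` fields of an entry so that its last-arriving operand sits on the slow side delays issue by one cycle in this model, which corresponds to the misprediction cost discussed later in the deck.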
21
Last Operand Predictability
22
Last Operand Predictor
  • PC-based, direct-mapped table of 2-bit saturating
    counters
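
A predictor of this shape might look like the following sketch. The table size, the word-aligned index function, and the "left"/"right" labels are assumptions chosen for illustration; the slide specifies only that the predictor is PC-based, direct-mapped, and uses 2-bit saturating counters.

```python
class LastOperandPredictor:
    """Direct-mapped table of 2-bit saturating counters, indexed by PC.

    Counter values 0-1 predict the left operand arrives last,
    values 2-3 predict the right operand arrives last.
    """

    def __init__(self, entries: int = 1024):  # table size is an assumption
        self.entries = entries
        self.counters = [1] * entries  # start weakly biased toward "left"

    def _index(self, pc: int) -> int:
        # Assumes 4-byte instructions, so the low two PC bits are dropped.
        return (pc >> 2) % self.entries

    def predict(self, pc: int) -> str:
        return "right" if self.counters[self._index(pc)] >= 2 else "left"

    def update(self, pc: int, last: str) -> None:
        """Train with the operand ("left" or "right") that arrived last."""
        i = self._index(pc)
        if last == "right":
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

The predicted-last operand would be placed on the fast side of the wakeup logic, so a correct prediction hides the extra cycle of the slow bus.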

23
Advantages/Disadvantages of Sequential Wakeup
  • Advantages:
  • No recovery needed on a mispredict
  • Integrates easily with selective recovery
  • Reduces bus load capacitance
  • 26.4% delay speedup for a 4-way, 64-entry scheduler
  • Disadvantages:
  • All mispredictions and simultaneous arrivals are
    issued one cycle later

24
Register File Access
  • 2 read ports and 1 write port per issue slot
  • More ports cause:
  • Quadratic growth in area
  • Linear growth in latency
  • Having 2 read ports per slot is overdesign
  • Instructions often have 0 or 1 sources, or use the bypass
  • <4% of instructions need 2 read-port accesses

25
Need for Two Read Ports
26
Sequential Register Access
  • Have only one read port per issue slot
  • Structural hazard when 2 ports are needed
  • Perform both reads sequentially
  • CACTI 3.0 model in 0.18 µm technology
  • 160-entry register file: going from 24 to 16 ports
    reduces latency by 20.5%
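
The port counts quoted above are consistent with an 8-wide machine, which is an inference from the numbers rather than something the slide states. A short sketch of that arithmetic, with hypothetical function names, also shows the cost of serializing reads through a single port:

```python
import math


def total_ports(issue_width: int,
                read_ports_per_slot: int,
                write_ports_per_slot: int = 1) -> int:
    """Total register file ports for a given issue width."""
    return issue_width * (read_ports_per_slot + write_ports_per_slot)


def register_read_cycles(register_reads: int,
                         read_ports_per_slot: int = 1) -> int:
    """Cycles spent reading when reads share the slot's read ports."""
    if register_reads == 0:
        return 0
    return math.ceil(register_reads / read_ports_per_slot)


print(total_ports(8, 2))  # 24 ports: 2 read + 1 write per slot
print(total_ports(8, 1))  # 16 ports: 1 read + 1 write per slot
```

With one read port per slot, `register_read_cycles(2)` returns 2, which is the structural hazard that the sequential access scheme resolves by reading the operands one after the other.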

27
Example
28
Sequential Register Access and Scheduling Logic
  • Speculative scheduling does not allow variable
    latencies
  • Scheduling logic must detect sequential register
    access
  • Authors use a conservative approach
  • Only instructions issued back-to-back with their
    producer don't require 2 cycles
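
One reading of the conservative rule can be written as a small decision function. This is an interpretation of the slide, not the authors' logic: "back-to-back" is taken to mean that the consumer issues in the cycle immediately after its producer, and both function names are hypothetical.

```python
def is_back_to_back(producer_issue_cycle: int,
                    consumer_issue_cycle: int) -> bool:
    """Assumed definition: consumer issues right after its producer."""
    return consumer_issue_cycle == producer_issue_cycle + 1


def assumed_read_cycles(num_sources: int, back_to_back: bool) -> int:
    """Register read latency the scheduler assumes for an instruction.

    Instructions with fewer than two sources always fit one read port.
    A 2-source instruction is assumed to need two sequential reads
    unless it issued back-to-back with its producer.
    """
    if num_sources < 2:
        return 1
    return 1 if back_to_back else 2
```

Under this reading, a 2-source consumer that issues two cycles after its producer is charged two read cycles even if one operand could still have come from the bypass, which is why the approach is described as conservative.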

29
Scheduling Logic with Sequential Register Access
  • [Figure: wakeup logic and select logic]

30
Performance of Sequential Wakeup
  • IPC degradation: 0.4% (4-way), 0.6% (8-way)
  • Outperforms tag elimination, even without a predictor

31
Performance of Sequential Register Access
  • IPC degradation: 1.1% (4-way), 0.7% (8-way)

32
Performance of Combined
  • IPC degradation: 2.2% on average
  • Worse than the sum of both
  • A mispredict leads to 2 sequential register reads

33
Negative interference
  • Cycle 1

  src (fast)  op   rdy  rdy  src (slow)  dest  status
  r1          ADD  1    1    r2          r3    issue
  r4          SUB  1    0    r3          r5

34
Negative interference
  • Cycle 2; tag on fast bus: r3

  src (fast)  op   rdy  rdy  src (slow)  dest  status
  r1          ADD  1    1    r2          r3    issued
  r4          SUB  1    0    r3          r5

  • Mispredicted: r3 was put on the slow side

35
Negative interference
  • Cycle 3; tags shown: r5 (fast bus), r3 (slow bus)

  src (fast)  op   rdy  rdy  src (slow)  dest  status
  r1          ADD  1    1    r2          r3    issued
  r4          SUB  1    1    r3          r5    issue

  • ADD and SUB weren't issued back-to-back
  • Conservatively assume 2 register reads

36
Conclusion
  • Established overdesign of:
  • Wakeup logic
  • Register file multi-porting
  • Wakeup logic sped up (26.4%)
  • <1% IPC reduction
  • Register file ports reduced and latency
    decreased (20.5%)
  • About 1% IPC reduction
  • Together: 2.2% IPC reduction