Inherently Workload-Balanced Clustered Microarchitecture - PowerPoint PPT Presentation

About This Presentation
Title:

Inherently Workload-Balanced Clustered Microarchitecture

Description:

Jaume Abella1, Antonio González1,2. 1 Computer Architecture ... I5 is sent to 1, 2 or 3. Cluster 3 writes results in cluster 0, which has more free registers ... – PowerPoint PPT presentation

Slides: 25
Provided by: jaumea

Transcript and Presenter's Notes

Title: Inherently Workload-Balanced Clustered Microarchitecture


1
Inherently Workload-Balanced Clustered
Microarchitecture
  • Jaume Abella1, Antonio González1,2

1 Computer Architecture Dept. UPC-Barcelona
2 Intel Barcelona Research Center, Intel Labs-UPC, Barcelona
2
Motivation (I)
Conventional microarchitectures are complex and
power-hungry
3
Motivation (II)
  • Features of conventional clustered
    microarchitectures
  • Workload balance and communications
  • Ideal balancing dramatically increases
    communications
  • No communications implies huge imbalance
  • A tradeoff must be found: it is not possible to
    minimize both
  • This work's main contribution
  • Is it possible to avoid this tradeoff?
  • Yes, using Ring Clustered Microarchitectures.

4
Objectives
  • Low complexity and low power using clustered
    microarchitectures
  • Reduce imbalance and communications
    simultaneously
  • Achieve high performance

5
Contents
  • Motivation
  • Background
  • Conventional clustered microarchitectures
  • Tradeoff implications
  • Our approach: Ring Clustered Microarchitecture
  • Results
  • Conclusions

6
Background (I)
  • Conventional clustered microarchitectures
  • In case of high imbalance, instructions are sent to
    the least loaded cluster
  • Hence, communications are required
  • Otherwise, instructions are sent to the cluster
    where their operands are available
  • Hence, instructions concentrate in a few clusters
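The tradeoff above can be illustrated with a toy Python model (a sketch, not the authors' simulator). It steers a chain of dependent instructions with the conventional policy, assuming that imbalance is the difference between the most and least loaded clusters' instruction counts and that `threshold` is a hypothetical tuning parameter; the slides define neither.

```python
def steer_chain_conventional(chain_len: int, n_clusters: int, threshold: int):
    """Steer a chain of dependent instructions (each reads the previous result).

    Returns (instructions per cluster, number of inter-cluster communications).
    """
    load = [0] * n_clusters
    comms = 0
    producer = None  # cluster holding the previous instruction's result

    def least_loaded() -> int:
        return min(range(n_clusters), key=lambda c: (load[c], c))

    for _ in range(chain_len):
        if producer is None or max(load) - min(load) > threshold:
            target = least_loaded()  # balance: ignore where the operand lives
        else:
            target = producer  # locality: follow the operand
        if producer is not None and target != producer:
            comms += 1  # operand must be copied from another cluster
        load[target] += 1
        producer = target
    return load, comms


print(steer_chain_conventional(8, 4, threshold=10**9))  # ([8, 0, 0, 0], 0)
print(steer_chain_conventional(8, 4, threshold=0))      # ([2, 2, 2, 2], 6)
```

With no rebalancing the chain stays in cluster 0 and needs no communications; with aggressive rebalancing the load is even, but 6 of the 7 dependences cross clusters.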

7
Background (II)
[Figure: Cluster 1 and Cluster 2, each showing its instructions and registers; instructions i1 to i4 produce R1 to R4 (plus R0), and i5 reads R1 and R2 to produce R5]
8
Background (III)
[Figure: steering logic dispatching instructions to Cluster 1, Cluster 2 and Cluster 3]
High activity during some cycles. Hence, high
temperature!!!
10
Ring Clustered Microarchitecture (I)
  • Clusters interconnected in a ring topology
  • Data produced in a cluster is available ONLY in
    the following cluster in the ring
  • There are no fast bypasses within a cluster
  • There are fast bypasses between one cluster and
    the following one in the ring

11
Ring Clustered Microarchitecture (II)
[Figure: clusters C1 to C4 in the conventional and in the ring organization]
  • Conventional bypass to the same cluster in 0
    cycles (a simple integer operation and bypass
    take 1 cycle)
  • Ring bypass to the following cluster in 0 cycles
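The two bypass rules can be captured in a small Python function. This is a sketch under assumptions: clusters are numbered 0 to n-1, and any producer/consumer pair off the fast path pays a hypothetical fixed penalty `comm_latency`, whose value the slides do not give.

```python
def extra_latency(producer: int, consumer: int, n_clusters: int,
                  ring: bool, comm_latency: int = 2) -> int:
    """Extra cycles a consumer waits beyond the producer's 1-cycle operation.

    Conventional: 0-cycle bypass only within the same cluster.
    Ring: 0-cycle bypass only to the following cluster, (producer + 1) mod n.
    Any other placement pays the assumed communication penalty.
    """
    fast_target = (producer + 1) % n_clusters if ring else producer
    return 0 if consumer == fast_target else comm_latency


for ring in (False, True):
    name = "ring" if ring else "conventional"
    print(name, [extra_latency(1, c, 4, ring) for c in range(4)])
# conventional [2, 0, 2, 2]
# ring [2, 2, 0, 2]
```

Note the wrap-around: in the ring, cluster 3 bypasses to cluster 0, and a consumer placed in its producer's own cluster does not get the fast path.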

12
Ring Clustered Microarchitecture (III)
[Figure: register file write/read and data/wakeup bypass paths. Conventional: Cluster K writes its own register file and bypasses to itself. Ring: Cluster K writes the register file of, and bypasses to, Cluster K+1]
13
Ring Clustered Microarchitecture (IV)
  • New designs are required for issue queues,
    register files and functional units, so that
    data can be sent quickly to the following
    cluster instead of the same one

14
Ring Clustered Microarchitecture (V)
  • Operation (destination ← source operands):
    I1: R1 ← 1
    I2: R2 ← R1, 1
    I3: R3 ← R1, R2
    I4: R4 ← R1, R3
    I5: R5 ← R1, 3
[Figure: registers for each cluster, clusters 0 to 3]
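The following Python sketch traces the operation example on a 4-cluster ring. It assumes a steering rule that minimizes the number of operands to copy, a tie-break on the fewest instructions already steered (then lowest index), that I1 lands in cluster 0, and that a copied operand stays available in the destination cluster; these are illustrative assumptions, not necessarily the paper's exact mechanism.

```python
from dataclasses import dataclass, field

N_CLUSTERS = 4


@dataclass
class Instr:
    name: str
    dest: str
    srcs: list[str] = field(default_factory=list)  # register sources only


# The example sequence; immediates (1, 3) are omitted since they need no register.
PROGRAM = [
    Instr("I1", "R1"),
    Instr("I2", "R2", ["R1"]),
    Instr("I3", "R3", ["R1", "R2"]),
    Instr("I4", "R4", ["R1", "R3"]),
    Instr("I5", "R5", ["R1"]),
]


def missing(ins: Instr, cluster: int, avail: dict[str, set[int]]) -> int:
    """Number of source registers that must be copied into `cluster`."""
    return sum(1 for r in ins.srcs if cluster not in avail[r])


def steer_ring(program, n=N_CLUSTERS):
    avail = {}  # register -> clusters whose register file holds it
    load = [0] * n
    placement = {}
    for ins in program:
        best = min(missing(ins, c, avail) for c in range(n))
        candidates = [c for c in range(n) if missing(ins, c, avail) == best]
        # Assumed tie-break: fewest instructions so far, then lowest index.
        target = min(candidates, key=lambda c: (load[c], c))
        for r in ins.srcs:
            avail[r].add(target)  # a copied operand stays available there
        load[target] += 1
        placement[ins.name] = (target, candidates)
        # Ring rule: the result is written only into the following cluster.
        avail[ins.dest] = {(target + 1) % n}
    return placement, avail


placement, avail = steer_ring(PROGRAM)
for name, (cluster, candidates) in placement.items():
    print(name, "-> cluster", cluster, "candidates", candidates)
print("R4 lives in cluster(s)", avail["R4"])
```

Under these assumptions, I1 to I4 go to clusters 0, 1, 2 and 3, cluster 3 writes R4 into cluster 0, and R1 ends up readable in clusters 1, 2 and 3, so I5 could be sent to any of them.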
16
Evaluation Framework
  • Processor
  • 8 clusters
  • Reorder buffer size: 256 instructions
  • Each cluster
  • Issue queue size: 16 integer and 16 FP instructions
  • Registers: 48 integer and 48 FP registers
  • Number of buses and issue width per cluster:

Configuration   Buses   Issue width
1bus_1IW        1       1
1bus_2IW        1       2
2bus_1IW        2       1
2bus_2IW        2       2
17
Steering Algorithms (I)
  • Smart heuristic takes into account
  • Workload imbalance (only conventional
    microarchitecture)
  • Number of required communications
  • Distance of communications

Conventional microarchitecture:
    if workload imbalance is high then
        send the instruction to the least loaded cluster
    else
        send the instruction to the cluster requiring
        fewer and shorter communications
    endif
Ring microarchitecture:
    send the instruction to the cluster requiring fewer
    and shorter communications
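A possible scoring function for the smart heuristic, sketched in Python. Each operand is described by the set of clusters where it is (or will be) available: its producer's cluster in the conventional design, the following cluster in the ring. The distance model is an assumption for both organizations (values travel forward around a ring, one hop per cluster), and `imbalance_threshold` is a hypothetical stand-in for "workload imbalance is high".

```python
def comm_cost(target: int, operand_locs: list[set[int]], n: int) -> tuple[int, int]:
    """Return (communications needed, total hop distance) to execute at `target`."""
    count = dist = 0
    for holders in operand_locs:
        if target in holders:
            continue  # operand already readable in the target cluster
        count += 1
        # Assumed model: a value moves forward around the ring, one hop per cluster.
        dist += min((target - h) % n for h in holders)
    return count, dist


def steer_smart(operand_locs, load, n, ring, imbalance_threshold):
    """Pick a cluster for one instruction using the smart heuristic."""
    if not ring and max(load) - min(load) > imbalance_threshold:
        # Conventional only: high imbalance overrides communication cost.
        return min(range(n), key=lambda c: (load[c], c))
    # Otherwise: fewest communications, then shortest, then lowest index.
    return min(range(n), key=lambda c: (comm_cost(c, operand_locs, n), c))


ops = [{1}, {2}]  # first operand in cluster 1, second in cluster 2
print(steer_smart(ops, [0, 0, 0, 0], 4, ring=True, imbalance_threshold=4))   # 2
print(steer_smart(ops, [5, 0, 0, 0], 4, ring=False, imbalance_threshold=4))  # 1
```

Note that the ring variant never consults the load vector, matching the slide: only the conventional design needs an explicit imbalance check.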
18
Steering Algorithms (II)
  • Simple heuristic takes into account
  • First operand

Ring and conventional microarchitectures:
    send the instruction to the cluster where its first
    operand is (or will be) available
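The effect of the simple heuristic on a chain of dependent instructions can be sketched as follows, assuming the chain starts in cluster 0 and nothing else is in flight:

```python
def first_operand_steering(chain_len: int, n: int, ring: bool) -> list[int]:
    """Steer a chain where each instruction's first operand is the previous result.

    Returns the number of instructions steered to each cluster.
    """
    load = [0] * n
    available_at = 0  # assumed: the chain's first instruction goes to cluster 0
    for _ in range(chain_len):
        target = available_at  # follow the first operand: no communication needed
        load[target] += 1
        # Conventional: the result stays in `target`; ring: it lands in the next cluster.
        available_at = (target + 1) % n if ring else target
    return load


print(first_operand_steering(8, 4, ring=False))  # [8, 0, 0, 0]
print(first_operand_steering(8, 4, ring=True))   # [2, 2, 2, 2]
```

Neither organization needs a communication here, but only the ring spreads the chain across all clusters, which is the inherent balancing referred to in the title.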
19
Performance (I)
  • Smart steering algorithm

20
Performance (II)
  • Simple steering algorithm

21
Activity Distribution
  • Average, over dispatch blocks, of the maximum
    number of instructions sent to the same cluster
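One way to compute this metric, assuming a dispatch block is a group of `block_size` consecutive instructions (a hypothetical parameter) and that the input lists the cluster chosen for each instruction in program order:

```python
from collections import Counter


def avg_max_per_block(assignments: list[int], block_size: int) -> float:
    """Average over dispatch blocks of the most instructions sent to one cluster."""
    blocks = [assignments[i:i + block_size]
              for i in range(0, len(assignments), block_size)]
    if not blocks:
        return 0.0
    return sum(max(Counter(b).values()) for b in blocks) / len(blocks)


chain_in_one_cluster = [0] * 8
chain_around_ring = [0, 1, 2, 3, 0, 1, 2, 3]
print(avg_max_per_block(chain_in_one_cluster, 4))  # 4.0
print(avg_max_per_block(chain_around_ring, 4))     # 1.0
```

Lower values mean activity is spread more evenly within each dispatch block; the two toy inputs correspond to a dependence chain kept in one cluster versus one rotating around the ring.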

23
Conclusions
  • The ring microarchitecture
  • achieves higher performance than the conventional
    one, especially for simple steering algorithms
  • balances the workload inherently
  • requires few communication resources
  • distributes the activity much better than the
    conventional one

24
Q&A