Overview
Mircea Stan, Kevin Skadron, David Brooks, Lev Finkelstein, Antonio González, 2004

1
Overview
  1. Motivation (Kevin)
  2. Thermal issues (Kevin)
  3. Power modeling (David)
  4. Thermal management (David)
  5. Optimal DTM (Lev)
  6. Clustering (Antonio)
  7. Power distribution (David)
  8. What current chips do (Lev)
  9. HotSpot (Kevin)

2
The clustering approach
  • Reduce complexity by partitioning
    • Less latency, area, power and temperature
    • Fast, simple, distributed units
  • Communication latency is heterogeneous and exposed to the microarchitecture
    • Localize critical communication within clusters (fast wires)

3
The clustering approach (...)
  • Smaller structures consume less power
    • Higher power efficiency (Zyuban, IEEE Transactions 01)
  • Partitioning simplifies power management
    • Via clock/power gating techniques (Bahar, ISCA 01)
    • Via dynamic cluster resizing (González, ICCD 03)
    • Via DVS/DFS
  • Partitioning reduces temperature
    • Activity is distributed (Chaparro, TACS 04)
    • Hopping schemes can be applied (Chaparro, TACS 04)
    • Adds flexibility for temperature-effective layouts
  • IPC overheads due to communication/imbalance
    • Compensated by shorter latency/clock period (Palacharla, ISCA 97; Canal, HPCA 00)

4
Clustered microarchitecture
  • Dynamic steering
  • Distributed Issue, Registers, FUs
  • Inter-cluster register communication

5
On-demand communication
  • Map table tracks the locations of register values
  • At rename
    • Allocate a register for the result in the assigned cluster
    • If a source operand is in a remote cluster
      • Insert a copy instruction in the remote cluster
      • Allocate a register for the copy
  • At commit
    • Free the register(s) allocated by the previous mapping
(Canal, PACT 99)
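The rename steps above can be sketched in Python as a map table with one column per cluster. This is a minimal illustration under stated assumptions, not the design from Canal, PACT 99. The class names, the FIFO free lists and the rule of copying from the lowest-numbered remote cluster are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    free: list  # free physical registers in this cluster's register file


class RenameMap:
    """Map table: for each logical register, the physical register that holds
    its value in each cluster (None = no copy in that cluster)."""

    def __init__(self, n_logical, clusters):
        self.clusters = clusters
        self.table = [[None] * len(clusters) for _ in range(n_logical)]

    def _alloc(self, c):
        # Real hardware would stall rename when the free list is empty;
        # here pop() simply raises IndexError.
        return self.clusters[c].free.pop(0)

    def rename(self, srcs, dst, c):
        """Rename an instruction steered to cluster c.
        Returns (physical sources, physical dst, copy instructions, old mapping)."""
        copies, phys_srcs = [], []
        for s in srcs:
            if self.table[s][c] is None:
                # Operand lives only in remote cluster(s). Assumed policy: copy it
                # from the lowest-numbered cluster that holds it, into a new
                # register allocated in cluster c.
                remote = next(k for k, p in enumerate(self.table[s]) if p is not None)
                new = self._alloc(c)
                copies.append((s, remote, self.table[s][remote], c, new))
                self.table[s][c] = new
            phys_srcs.append(self.table[s][c])
        old = self.table[dst]  # every copy of the old value is freed at commit
        self.table[dst] = [None] * len(self.clusters)
        self.table[dst][c] = self._alloc(c)
        return phys_srcs, self.table[dst][c], copies, old

    def commit(self, old):
        # Free the register(s) allocated by the previous mapping, in every cluster.
        for c, p in enumerate(old):
            if p is not None:
                self.clusters[c].free.append(p)
```

Consider an instruction with sources r2 and r3 and destination r1, steered to cluster 2, where r3 has no copy in cluster 2. In that case `rename` returns one copy instruction, and it returns the old r1 mapping so that `commit` can release it later.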
6
Rename
[Figure: renaming table (logical register by cluster, CL0 to CL3) before and after renaming an instruction with sources r2, r3 and destination r1, steered to Cluster 1. Both sources already have copies in CL1, so no copy is needed. r1 is remapped to register 14 in CL1, and its mappings in CL0 and CL3 are dropped.]
7
Copy instructions
[Figure: the same instruction steered to Cluster 2. r2 already has a copy in CL2 (register 15) but r3 does not, so a copy instruction brings r3 into CL2 (register 27). r1 is then remapped to register 14 in CL2.]
8
Broadcast communication
  • Values sent to all register files
    • Local file is updated earlier than remote ones
  • Registers are replicated in all files
    • Register storage waste
    • Increase in power
  • Values are written multiple times
    • Increase in power
  • May reduce communication penalties
    • Values are present everywhere
    • But not at the same time
  • E.g. Alpha 21264
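The timing and write-count trade-off of broadcast communication can be sketched as follows. The single fixed remote latency and the cycle numbers are assumptions for illustration only.

```python
def broadcast_result(producer, n_clusters, t_done, remote_latency):
    """Broadcast model: a result is written into every cluster's register file.

    The producer's local file sees the value at t_done; remote files see it
    remote_latency cycles later (assumed to be the same for every remote cluster).
    Returns (cycle at which each cluster can read the value, register writes)."""
    ready = [t_done if c == producer else t_done + remote_latency
             for c in range(n_clusters)]
    writes = n_clusters  # one write per replicated register file: the power cost
    return ready, writes
```

For example, `broadcast_result(1, 4, 100, 2)` gives the value in cluster 1 at cycle 100 and in the other three clusters at cycle 102, at the cost of four register writes instead of one.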

9
Cluster assignment schemes
  • Main goals
    • Minimize inter-cluster communication penalty
    • Maximize workload balance
  • Main approaches
    • Static approaches (Farkas, MICRO 97; Sastry, PLDI 98)
      • Less flexible than dynamic ones: poor load balancing
    • Dynamic, dependence-based (Palacharla, ISCA 97; Alpha 21264; Kemp, ICPP 96)
      • Only consider dependences through unavailable operands
      • Lack specific balancing mechanisms
    • Dynamic, workload-balance oriented (Baniasadi, 00)
      • Only suitable for architectures with a low communication penalty
    • Dynamic, dependence-based and workload-balance oriented (Canal, HPCA 00; Parcerisa, PACT 02)
      • Tries to find the best trade-off between communications and workload balance

10
Cluster assignment schemes
  • Accurate-Rebalancing Priority RMB
    • 1. To minimize communication penalties:
      • If a source register is unavailable, choose its producer's cluster
      • Otherwise, select the clusters with the highest number of source registers mapped
    • 2. Choose the least loaded of the above
    • Exception: if imbalance > threshold, exclude clusters with positive workload before applying rules 1 and 2
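The two rules and the rebalancing exception can be sketched as one steering function. The slide does not define how imbalance or "positive workload" is measured, so this sketch assumes two things. First, imbalance is the gap between the most and least loaded cluster. Second, a cluster has positive workload when its load is above the mean. All names are hypothetical.

```python
from statistics import mean


def steer(srcs, unavailable, producer, mapped_in, load, threshold):
    """Sketch of Accurate-Rebalancing Priority RMB steering.

    srcs:        logical source registers of the instruction
    unavailable: set of source registers whose value is not yet produced
    producer:    register -> cluster that will produce it
    mapped_in:   register -> set of clusters holding a copy
    load:        per-cluster workload (e.g. instructions in the scheduler)
    """
    candidates = list(range(len(load)))
    # Exception (assumed metrics): if too imbalanced, drop above-average clusters.
    if max(load) - min(load) > threshold:
        avg = mean(load)
        candidates = [c for c in candidates if load[c] <= avg]
    # Rule 1a: go where the unavailable operands will be produced.
    pool = [c for c in candidates
            if any(r in unavailable and producer[r] == c for r in srcs)]
    if not pool:
        # Rule 1b: clusters with the highest number of source registers mapped.
        counts = {c: sum(c in mapped_in.get(r, set()) for r in srcs)
                  for c in candidates}
        best = max(counts.values())
        pool = [c for c in candidates if counts[c] == best]
    # Rule 2: least loaded of the above (ties broken by lowest cluster id).
    return min(pool, key=lambda c: (load[c], c))
```

With loads `[3, 1, 5, 2]`, an unavailable source produced in cluster 2 normally pulls the instruction to cluster 2. With a tight threshold, cluster 2 is excluded as overloaded, and the instruction falls back to the least loaded cluster that holds a mapped source.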

11
Evaluation
  • SpecInt95

12
Dynamic vs. static steering
S. Sastry, S. Palacharla and J. E. Smith, PLDI 1998
13
Data cache architectures
González, WMPI 04
  • Centralized
    • Dcache is a cluster
    • Single load/store queue
    • Simple disambiguation
[Figure: four backends sharing a centralized L1 Dcache, backed by UL2]
14
Data cache architecture (II)
  • Attraction caches
  • Lines are copied on demand
  • A coherence scheme is needed
  • Steering must exploit data locality

15
Data cache architecture (III)
  • Replicated
  • Area cost
  • Traffic due to store broadcast

[Figure: four backends (BE 1 to BE 4), each with its own replicated DL1, backed by UL2]
16
Data cache architecture (IV)
  • Interleaved
  • Word/line interleaved
  • Steering needs to predict the bank
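Word and line interleaving differ only in the address granule used to select a bank. A sketch follows, assuming 4 banks, 8-byte words and 32-byte lines. The last-bank predictor is a hypothetical stand-in for whatever bank predictor the steering logic actually uses.

```python
def bank_of(addr, n_banks, granule):
    """Bank holding a byte address; granule = interleaving unit in bytes
    (e.g. an assumed 8-byte word or an assumed 32-byte line)."""
    return (addr // granule) % n_banks


class LastBankPredictor:
    """Hypothetical per-PC bank predictor: guess the bank the load used last time,
    so the load can be steered to the cluster that owns that bank."""

    def __init__(self, default=0):
        self.table = {}
        self.default = default

    def predict(self, pc):
        return self.table.get(pc, self.default)

    def update(self, pc, actual_bank):
        self.table[pc] = actual_bank
```

The same address can map to different banks under the two schemes: 0x48 is in bank 1 when word-interleaved but bank 2 when line-interleaved.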

17
Memory issues
  • Disambiguation
    • Load/store queues are distributed
    • Stores are allocated in all clusters
      • Address is computed in one cluster and broadcast
    • Loads go to memory once previous stores know their addresses
  • Memory coherence
    • Write-invalidate / write-update protocols
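The disambiguation rule can be sketched with one store list per cluster. This is a simplification under stated assumptions: the address broadcast is modelled as instantaneous, and store entries are keyed by a program-order sequence number. The class and method names are hypothetical.

```python
class DistributedLSQ:
    """Each cluster keeps an entry for every store (stores are allocated in all
    clusters); a store's address is computed in one cluster and broadcast."""

    def __init__(self, n_clusters):
        self.stores = [dict() for _ in range(n_clusters)]  # seq -> address or None

    def dispatch_store(self, seq):
        for q in self.stores:
            q[seq] = None  # address not yet known

    def broadcast_address(self, seq, addr):
        # Assumed: the broadcast reaches every cluster in the same cycle.
        for q in self.stores:
            q[seq] = addr

    def load_may_issue(self, cluster, load_seq):
        # Conservative rule: a load goes to memory only once every older store
        # (lower sequence number) knows its address.
        q = self.stores[cluster]
        return all(a is not None for s, a in q.items() if s < load_seq)
```

A load with sequence number 4 waits until both older stores (1 and 3) have broadcast their addresses. A load with sequence number 2 only waits for store 1.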

18
Performance comparison
19
Thermal benefits of clustering
Example layout for a quad-cluster architecture
20
Temperature metrics
  • AbsMax
    • Maximum sensed temperature
  • Average
    • Average temperature across time and area
  • AverageMax
    • Average across time of the maximum sensed temperature
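The three metrics can be computed from a trace of per-block temperatures sampled over time. The sketch assumes uniform time sampling. It also assumes that "across area" means an area-weighted average over blocks. The input layout is hypothetical.

```python
def abs_max(T):
    """AbsMax: hottest temperature seen in any block at any sample.
    T[t][b] = temperature of block b at sample t."""
    return max(max(row) for row in T)


def average(T, areas):
    """Average: area-weighted mean over blocks, then mean over time samples."""
    total = sum(areas)
    per_sample = [sum(t * a for t, a in zip(row, areas)) / total for row in T]
    return sum(per_sample) / len(per_sample)


def average_max(T):
    """AverageMax: mean over time of the hottest block at each sample."""
    return sum(max(row) for row in T) / len(T)
```

For two samples of two equal-area blocks, `T = [[50, 60], [70, 40]]`, the results are AbsMax = 70, Average = 55 and AverageMax = 65. AverageMax tracks how hot the hottest spot runs most of the time, not just its worst instant.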

21
Clustering reduces temperature
  • If clustering is smart

22
Clustering effects
  • May end up with higher power densities!
  • Simpler and smaller units may create hotspots
  • Layout must be thermal-effective
  • Surround hotspots by cold areas
  • Activity steering must be smart
  • Other techniques (e.g. throttling) can be applied
    at smaller granularity
  • Aim at particular clusters without affecting
    others

23
Dynamic cluster resizing
González, ICCD 03
  • Motivation

24
Dynamic cluster resizing
  • Proposal
    • Dynamically compute the energy of blocks
      • Schedulers, FUs, DL0s, etc.
    • Dynamically compute the energy×delay² product (ED²P) of the processor
      • Use different configurations for different intervals
      • Measure the optimal configuration
    • Gate off (disable) useless units
      • Scheduler level
      • Backend level
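Interval-based selection of the number of active clusters can be sketched as a hill-climbing step on ED²P. Here the current configuration is compared with one cluster more and one cluster fewer. This is an assumed reading of the decision shown on the next slide, and the energy and delay values are hypothetical measurements.

```python
def ed2p(energy, delay):
    """Energy x delay^2 for one sampling interval."""
    return energy * delay ** 2


def next_config(current, measured, lo, hi):
    """Hill-climbing step over the number of active clusters.

    measured: configuration -> (energy, delay) from recent sampling intervals.
    Keeps the current configuration unless a neighbour (one cluster more or
    one fewer, within [lo, hi]) has a lower ED^2P."""
    options = [c for c in (current - 1, current, current + 1)
               if lo <= c <= hi and c in measured]
    return min(options, key=lambda c: ed2p(*measured[c]))
```

With `measured = {1: (10.0, 3.0), 2: (14.0, 2.0), 3: (20.0, 1.9)}`, the ED²P values are 90, 56 and about 72. The controller therefore settles on two clusters and gates off the rest.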

25
Dynamic cluster resizing
[Figure: front end (Decode, Rename, Steer) feeding backends BE1 to BEn, backed by the UL2 cache; resizing test: $ED^2P_x < ED^2P_{x+1} < ED^2P_{x-1}$ ?]
26
Dynamic cluster resizing
27
Cluster hopping
  • Motivation
  • Power and average temperature savings when
    statically Vdd gating clusters

Temperatures in the backend area when all but the indicated cluster(s) are gated, given as reductions in temperature over the in-box ambient (45 °C) with respect to a baseline quad-cluster architecture.
28
Cluster hopping
  • Based on activity migration (Heo, ISLPED 03)
    • Vdd-gate a subset of clusters
    • Rotate clusters to spread activity over time
  • Gated clusters cannot provide any register value
    • Before gating, some register values must be evicted
  • Cache/DTLB contents are lost
    • Unless some low-power (e.g. drowsy) mode is used
  • Proactive and/or reactive behavior
    • Proactive: on a per-interval basis
    • Reactive: on thermal events
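The proactive and reactive triggers can be sketched as one controller step that rotates the set of powered clusters. The one-position rotation, the single trigger temperature and the function names are assumptions. Evicting register values and losing cache/DTLB contents are only flagged in comments, not modelled.

```python
def rotate(active, n_clusters):
    """Shift every active cluster index by one position, wrapping around."""
    return sorted((c + 1) % n_clusters for c in active)


def hopping_step(active, n_clusters, temps, interval_ended, t_trigger,
                 proactive=True, reactive=True):
    """One cluster-hopping decision. Returns (new_active, clusters_to_gate).

    Proactive: hop at the end of every interval.
    Reactive: hop when an active cluster exceeds t_trigger (a thermal event)."""
    thermal_event = any(temps[c] > t_trigger for c in active)
    if (proactive and interval_ended) or (reactive and thermal_event):
        new_active = rotate(active, n_clusters)
        # Before these clusters are Vdd-gated, register values mapped only there
        # must be evicted, and their cache/DTLB contents are lost (not modelled).
        to_gate = sorted(set(active) - set(new_active))
        return new_active, to_gate
    return list(active), []
```

With two of four clusters active, each hop gates one cluster and wakes another. Over time, this spreads activity across all four clusters.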

29
Cluster hopping schemes
Effective at reducing average temperature (thus
leakage) but not max temperature
30
Thermal-aware steering
  • Try to minimize max temperature
    • Take cluster temperature into account when deciding the destination
  • Some examples
    • Cold
      • Dispatch to the coldest cluster with available resources
      • Lowest average temperature
      • Lowest peak temperature
    • T-Cold
      • Like Cold, but discard clusters that are too hot
      • A cluster is too hot if its temperature exceeds that of the previous cluster (ordered by temperature) by more than a threshold
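Cold and the T-Cold filter can be sketched as follows. `temps` may hold either average or peak temperatures, which correspond to the two Cold variants. The slide does not say how T-Cold chooses among the clusters that survive the filter. This sketch therefore only returns the surviving set and assumes the normal dependence/balance steering picks among them.

```python
def cold(temps, has_room):
    """Cold: coldest cluster that still has available resources (or None)."""
    cands = [c for c in range(len(temps)) if has_room[c]]
    return min(cands, key=lambda c: temps[c]) if cands else None


def t_cold_candidates(temps, has_room, gap):
    """T-Cold filter: walk clusters from coldest to hottest and discard a
    cluster (and every hotter one) once it is more than `gap` degrees
    hotter than the previous cluster in that order."""
    cands = sorted((c for c in range(len(temps)) if has_room[c]),
                   key=lambda c: temps[c])
    kept = cands[:1]
    for prev, cur in zip(cands, cands[1:]):
        if temps[cur] - temps[prev] > gap:
            break
        kept.append(cur)
    return kept
```

With temperatures `[70, 60, 62, 80]` and a 5-degree gap, clusters 1 and 2 survive. Cluster 0 is 8 degrees hotter than cluster 2, so it and everything hotter are discarded.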

31
Thermal-aware steering
  • T-Thermal
    • Minimize communications unless the candidate cluster is too hot
    • If the temperature difference > threshold → priority to the colder cluster
    • Otherwise → priority to the one that minimizes communications; in case of a tie, maximize workload balance (instructions in the schedulers)
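T-Thermal's priority order can be sketched as a pairwise comparison applied across the candidate clusters. In this sketch, `comms` is the number of copies an assignment would need, and `load` is the scheduler occupancy. The sequential scan is an assumption.

```python
def t_thermal(candidates, temps, comms, load, threshold):
    """Pick a destination cluster with T-Thermal's priorities.

    temps: per-cluster temperature; comms: communications needed if the
    instruction goes there; load: instructions in each cluster's scheduler."""
    def better(a, b):
        if abs(temps[a] - temps[b]) > threshold:
            return temps[a] < temps[b]   # one is too hot: prefer the colder
        if comms[a] != comms[b]:
            return comms[a] < comms[b]   # otherwise minimize communications
        return load[a] < load[b]         # tie: better workload balance

    best = candidates[0]
    for c in candidates[1:]:
        if better(c, best):
            best = c
    return best
```

Because the threshold test makes the comparison non-transitive, the result can depend on the order in which candidates are scanned. A hardware implementation would have to fix that order.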

32
Thermal-aware steering
  • Thermal-aware steering standalone

33
Hopping thermal steering
  • Putting it all together

34
Clustering the front-end
Parcerisa, TR 02
Distributed Back-end
35
Distributed branch predictor
  • Broadcast every prediction (next PC) to all clusters
  • Hardware loop: the predictor uses the PC as index
    • Insert a bubble when switching the predictor cluster (2)
    • If interleaving by low-order bits: frequent bubbles
  • Solution
    • Pipeline the prediction ahead of the I-cache; interleave by high-order bits
    • Bubble only when a high-level interleave boundary is crossed (2)
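The effect of the interleaving choice can be sketched by counting predictor-bank switches along a stream of fetch PCs. Each switch costs a bubble. The bank-selection shifts and the PC streams are assumptions chosen for illustration.

```python
def predictor_bank(pc, n_banks, shift):
    """Bank (cluster) holding the predictor entry for this PC.
    Small shift = low-order interleaving; large shift = high-order interleaving."""
    return (pc >> shift) % n_banks


def count_bank_switches(pcs, n_banks, shift):
    """Number of times consecutive predictions come from different banks,
    i.e. the number of bubbles inserted when switching predictor cluster."""
    banks = [predictor_bank(pc, n_banks, shift) for pc in pcs]
    return sum(1 for a, b in zip(banks, banks[1:]) if a != b)
```

For 16 sequential 4-byte instructions and two banks, interleaving on PC bit 2 switches banks on every instruction (15 switches). Interleaving on 1 KB chunks (`shift=10`) switches only when a chunk boundary is crossed.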

36
Impact of distributing branch predictor
  • Bank switching
    • SpecInt95: every 24 instructions
    • Mbench: every 133 instructions
  • IPC loss
    • SpecInt95: 0.5%
    • Mbench: no loss

37
Distributed cluster assignment
  • Make local assignments and broadcast them to all clusters
  • Loop: steering logic uses assignments made by other clusters
    • Partial solution: use outdated information (2 cycles old)
    • Problem: outdated dependences → extra communications
  • Solution
    • Anticipate dependence checking, and
    • Override the assignment if a dependence was violated
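Assignment overriding can be sketched as a late pass over assignments made with stale information. Any instruction placed away from its producer is moved to the producer's cluster. This is a hypothetical simplification: each instruction has at most one in-flight producer, and workload balance is ignored during overriding.

```python
def communications(assigned, producer):
    """Copies needed: consumers placed in a different cluster from their producer."""
    return sum(1 for i, p in producer.items()
               if p in assigned and assigned[p] != assigned[i])


def override_assignments(assigned, producer):
    """assigned: instruction -> cluster chosen with outdated (2-cycle-old) info.
    producer: instruction -> the instruction producing its source, if any.
    Instructions are processed in program order (ascending ids)."""
    fixed = dict(assigned)
    for inst in sorted(assigned):
        p = producer.get(inst)
        if p in fixed and fixed[p] != fixed[inst]:
            fixed[inst] = fixed[p]  # violated dependence: follow the producer
    return fixed
```

In a chain 0 → 1 → 2 where the stale view placed instructions 1 and 2 in cluster 1 but their producer 0 in cluster 0, overriding moves both to cluster 0. The required communications then drop from one to zero.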

38
Impact of distributing assignment
  • Without assignment overriding
    • 0.42 communications / instruction
    • More than 10% IPC loss
  • With assignment overriding
    • 0.17 communications / instruction
    • Less than 2% IPC loss

39
Thermal benefits
  • Clustering the rename table and the reorder
    buffer Chaparro, 04

40
Summary
  • Clustering is thermal-effective (in addition to
    complexity-effective)
  • Reduces power
  • Distributes activity
  • Clustering enables effective temperature control
    schemes
  • Adaptive configuration
  • DVS/DFS
  • Cluster hopping
  • Thermal steering