Title: Overview
1Overview
- Motivation (Kevin)
- Thermal issues (Kevin)
- Power modeling (David)
- Thermal management (David)
- Optimal DTM (Lev)
- Clustering (Antonio)
- Power distribution (David)
- What current chips do (Lev)
- HotSpot (Kevin)
2PowerPC G3 Microprocessor
- On-chip temperature sensor (junction temperature)
- Based on differential voltage change across 2 diodes of different sizes
- Implemented in PowerPC G3/G4 processors
- OS required for control
- Instruction Cache Throttling used to dynamically
lower junction temperature
3Pentium III
- On-die thermal diode
- Coupled with board-level thermal diode sensor
- Uses
- Monitor long-term temperature and environmental trends
- Provide indication of catastrophic failure
4Pentium 4
- Thermal ramp rates of 50 °C/second (over whole package)
- Much too high for coarse-grained solutions
- Thermal Monitor
- Highly accurate on-die temperature sensing circuit
- Fast-acting temperature control circuit (50 ns)
[Figure: Thermal Monitor circuit, with labels: temperature sensing diode, reference current source, current comparator, PROCHOT]
5Pentium 4 -- Thermal Monitor
- Trip Point is calibrated at manufacturing time
- Simple response
- Turn processor clocks on/off at 50% duty cycle
- For 1.5 GHz processor, 2 µs on, 2 µs off?
6Pentium 4 -- Results
- For 200 traces (TPC-C, SPEC, Microsoft)
- Thermal design point can be reduced to 75% of true max power with minimal performance loss
7Pentium 4
- Thermal monitors allow
- Tradeoff between cost and performance
- Cheaper package
- More triggers, Less Performance
- Expensive package
- No triggers, no performance loss
8Architecture-level Thermal Management
- Dynamically adjust execution to control temperature
- Avoid catastrophic failure (heat sink, fan)
- Permit use of less expensive package
- Design for less than the worst case
- Package costs $1/W above 40 W
- Heat sinks, heat pipes, thinned wafers, fans
- Fans reduce battery life
- Peak power as high as 150 W now and > 200 W in 1-2 generations
- Temperatures over 100 °C
- More fundamentally, there is a need for architecture-level thermal modeling
- What's actually going on in there?
9HotSpot project
- Collaboration between HPLP and LAVA Labs (ECE and CS depts., UVa)
- Deal with hot spots
- Localized heating occurs much faster than chip-wide
- microsec. to millisec.
- Chip-wide treatment is too conservative
- seconds to minutes
- but there is significant lateral thermal coupling through the package
- How do we model this?
- Prove temperature will be
- safely bounded
10Hot spots in Power4
Temperature landscape: space and time
How to estimate early in the design cycle?
11Thermal modeling
- Want a fine-grained, dynamic model of temperature
- At a granularity architects can reason about
- That accounts for adjacency and package
- That does not require detailed designs
- That is fast enough for practical use
- HotSpot: a compact model based on thermal R, C
- Parameterized to automatically derive a model based on various
- Architectures
- Power models
- Floorplans
- Thermal Packages
12Dynamic compact thermal model
- Electrical-thermal duality
- V ↔ temperature (T)
- I ↔ power (P)
- R ↔ thermal resistance ($R_{th}$)
- C ↔ thermal capacitance ($C_{th}$)
- RC time constant ($R_{th} C_{th}$)
- Kirchhoff's Current Law
- differential eq.: $I = C \, dV/dt + V/R$
- thermal domain: $P = C_{th} \, dT/dt + T/R_{th}$
- where $T = T_{hot} - T_{amb}$
- At higher granularities of P, $R_{th}$, $C_{th}$
- P, T are vectors and $R_{th}$, $C_{th}$ are circuit matrices
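A single-node version of this duality can be checked numerically. The sketch below assumes one lumped thermal resistance and capacitance; the function names and all numeric values are hypothetical, chosen only for illustration. It uses the closed-form step response of $P = C_{th} \, dT/dt + T/R_{th}$ starting from ambient.

```python
import math

def steady_state_rise(power_w, r_th):
    """Temperature rise above ambient (K) once dT/dt = 0, i.e. T = P * Rth."""
    return power_w * r_th

def step_response(power_w, r_th, c_th, t):
    """Temperature rise (K) at time t after a power step, starting at ambient.

    Closed-form solution of P = Cth*dT/dt + T/Rth with T(0) = 0:
        T(t) = P*Rth * (1 - exp(-t / (Rth*Cth)))
    """
    tau = r_th * c_th  # RC time constant in seconds
    return power_w * r_th * (1.0 - math.exp(-t / tau))

# Hypothetical values, not taken from the slides.
R_TH = 0.8   # K/W
C_TH = 0.05  # J/K
P = 50.0     # W

tau = R_TH * C_TH
final_rise = steady_state_rise(P, R_TH)
rise_at_tau = step_response(P, R_TH, C_TH, tau)
print(f"tau = {tau:.3f} s, final rise = {final_rise:.1f} K, "
      f"rise after one tau = {rise_at_tau:.2f} K")
```

After one time constant the node has covered about 63% of its final rise, which is the same behavior as an electrical RC circuit charged from a current source.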
13Package we model
Heat sink
IC Package
Heat spreader
PCB
Pin
Die
Interface material
14Modeling the package
- Thermal management allows for packaging alternatives/shortcuts/interactions
- HotSpot needs a model of packaging
- Basic thermal model
- Heat spreader
- Heatsink
- Interface materials (e.g. phase-change films)
- Fan/Active cooler (TEC)
- Thermal resistance due to convection
- Constriction and bulk resistance for fins
- Spreading constriction and bulk resistance for heatsink base and heat spreader
- Thermal resistance for bonding material
- Thermal capacitance of heat spreader and heatsink
15Optimal package
- Default package is found using
- Power dissipation
- Target temperature on chip
- Chip area
- Clock speed: high or low performance
- Power dissipation and target temperature used to determine resistance value needed
- Needs more work: modern packages are incredibly complex, yet there is still a need to model at higher levels
- Now what can we do with HotSpot?
16Equivalent vertical network
- Diagram is simplified peripheral nodes
[Figure: equivalent vertical network, with labels: chip, interface, spreader, peripheral spreader nodes, interface, sink, convection]
17Vertical network parameters
- Resistances
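- Determined by the corresponding areas and their cross-sectional thickness
- $R = \text{resistivity} \times \text{thickness} / \text{area}$
- Capacitances
- $C = \text{specific heat} \times \text{thickness} \times \text{area}$
- Peripheral node areas
[Figure: spreader and chip, with peripheral node areas labeled North, East, West, South]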
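The two formulas on this slide can be evaluated directly for one layer. This is a minimal sketch under stated assumptions: "specific heat" is taken to be volumetric (J per m³ per K) so that the units work out to J/K, and the silicon properties and die dimensions below are illustrative values rather than numbers from the slides.

```python
def vertical_resistance(resistivity, thickness, area):
    """R = resistivity * thickness / area, in K/W.

    resistivity is thermal resistivity in m*K/W (1 / conductivity).
    """
    return resistivity * thickness / area

def thermal_capacitance(vol_specific_heat, thickness, area):
    """C = specific heat * thickness * area, in J/K.

    vol_specific_heat is assumed to be volumetric, in J/(m^3*K).
    """
    return vol_specific_heat * thickness * area

# Assumed, illustrative material and geometry values.
SI_CONDUCTIVITY = 100.0      # W/(m*K), rough figure for hot silicon
SI_VOL_SPECIFIC_HEAT = 1.75e6  # J/(m^3*K)
DIE_THICKNESS = 0.5e-3       # m
BLOCK_AREA = 4e-3 * 4e-3     # m^2, a 4 mm x 4 mm block

r_block = vertical_resistance(1.0 / SI_CONDUCTIVITY, DIE_THICKNESS, BLOCK_AREA)
c_block = thermal_capacitance(SI_VOL_SPECIFIC_HEAT, DIE_THICKNESS, BLOCK_AREA)
print(f"R = {r_block:.4f} K/W, C = {c_block:.4f} J/K, "
      f"RC = {r_block * c_block * 1e3:.3f} ms")
```

With these assumed values the block's own RC product is a few milliseconds, in line with the millisecond-scale localized heating mentioned earlier in the deck.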
18Lateral resistances
- Determined by the floorplan and the length of shared edges between adjacent blocks
- "Heat Spreading and Conduction in Compressed Heatsinks", Jaana Behm and Jari Huttunen, in Proceedings of the 10th International Flotherm User Conference, May 2001.
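Since lateral resistances depend on the length of shared edges, a floorplan tool needs a way to compute that length for any pair of rectangular blocks. The sketch below shows one way to do it; the `Block` type and function name are hypothetical, and the slide's actual resistance formula is not reproduced here, only the shared-edge geometry.

```python
from typing import NamedTuple

class Block(NamedTuple):
    name: str
    width: float
    height: float
    left_x: float
    bottom_y: float

def shared_edge_length(a, b, eps=1e-9):
    """Length of the boundary two axis-aligned blocks have in common.

    Returns 0.0 if the blocks do not abut along an edge.
    """
    a_right, a_top = a.left_x + a.width, a.bottom_y + a.height
    b_right, b_top = b.left_x + b.width, b.bottom_y + b.height

    # Blocks touch along a vertical line: measure the overlap in y.
    if abs(a_right - b.left_x) < eps or abs(b_right - a.left_x) < eps:
        return max(0.0, min(a_top, b_top) - max(a.bottom_y, b.bottom_y))
    # Blocks touch along a horizontal line: measure the overlap in x.
    if abs(a_top - b.bottom_y) < eps or abs(b_top - a.bottom_y) < eps:
        return max(0.0, min(a_right, b_right) - max(a.left_x, b.left_x))
    return 0.0

# Hypothetical blocks, arbitrary length units.
blk_a = Block("A", 2.0, 2.0, 0.0, 0.0)
blk_b = Block("B", 1.0, 3.0, 2.0, 1.0)   # right of A, overlapping 1 unit in y
blk_c = Block("C", 2.0, 1.0, 0.0, 2.0)   # on top of A, full width
blk_d = Block("D", 1.0, 1.0, 10.0, 10.0) # far away

print(shared_edge_length(blk_a, blk_b), shared_edge_length(blk_a, blk_c))
```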
19Lateral resistances contd...
- Lengths used for silicon
- Lengths used in the spreader
20Our model (lateral and vertical)
Interface material (not shown)
21Temperature equations
- Fundamental RC differential equation
- $P = C \, dT/dt + T/R$
- Steady state
- $dT/dt = 0$
- $P = T/R$
- When R and C are network matrices
- Steady state: $T = R \times P$
- Modified transient equation
- $dT/dt + (RC)^{-1} \times T = C^{-1} \times P$
- HotSpot software mainly solves these two equations
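The matrix forms can be illustrated on a two-node network. This is a sketch, not HotSpot's code: the network (two nodes, each tied to ambient and to each other), the conductance values, and all helper names are assumptions. It builds the conductance matrix G, takes R as its inverse, computes the steady state T = R x P, and confirms that the transient derivative vanishes there.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def dT_dt(temps, power, g, caps):
    """Transient equation rearranged as dT/dt = C^-1 * (P - G*T).

    With R = G^-1 this is the same as dT/dt + (RC)^-1 T = C^-1 P.
    caps holds the diagonal of the capacitance matrix.
    """
    flow = matvec(g, temps)
    return [(p - f) / c for p, f, c in zip(power, flow, caps)]

# Hypothetical network: conductances in W/K, capacitances in J/K.
G1, G2, G12 = 2.0, 1.0, 0.5
G = [[G1 + G12, -G12],
     [-G12, G2 + G12]]
CAPS = [0.02, 0.05]
POWER = [10.0, 2.0]  # W injected at each node

R = inv2(G)
t_steady = matvec(R, POWER)            # steady state: T = R x P
slope_at_steady = dT_dt(t_steady, POWER, G, CAPS)
slope_at_ambient = dT_dt([0.0, 0.0], POWER, G, CAPS)
print(t_steady, slope_at_steady, slope_at_ambient)
```

Note that the second node settles well above ambient even though it dissipates little power itself, because of its coupling to the hotter neighbor.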
22HotSpot
- Time evolution of temperature is driven by unit activities and power dissipations averaged over 10K cycles
- Power dissipations can come from any power simulator, act as current sources in RC circuit ('P' vector in the equations)
- Simulation overhead in Wattch/SimpleScalar < 1%
- Requires models of
- Floorplan: important for adjacency
- Package: important for spreading and time constants
- R and C matrices are derived from the above
23Implementation
- Primarily a circuit solver
- Steady state solution
- Mainly matrix inversion, done in two steps
- Decomposition of the matrix into lower and upper triangular matrices
- Successive backward substitution of solved variables
- Implements the pseudocode from CLR
- Transient solution
- Inputs: current temperature and power
- Output: temperature for the next interval
- Computed using a fourth-order Runge-Kutta (RK4) method
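The two-step steady-state solve can be sketched as follows. This is an illustrative textbook-style LU decomposition without pivoting, followed by forward and then backward substitution; it is not HotSpot's source, and skipping pivoting is an assumption that is reasonable only for well-conditioned, diagonally dominant matrices such as the hypothetical conductance matrix used here.

```python
def lu_decompose(matrix):
    """Split a square matrix into unit-lower L and upper U with L*U = matrix."""
    n = len(matrix)
    a = [row[:] for row in matrix]          # work on a copy
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for k in range(n):
        upper[k][k] = a[k][k]
        for i in range(k + 1, n):
            lower[i][k] = a[i][k] / upper[k][k]
            upper[k][i] = a[k][i]
        # Update the remaining submatrix (Schur complement).
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                a[i][j] -= lower[i][k] * upper[k][j]
    return lower, upper

def lu_solve(lower, upper, rhs):
    """Solve L*U*x = rhs by forward then backward substitution."""
    n = len(rhs)
    y = [0.0] * n
    for i in range(n):
        y[i] = rhs[i] - sum(lower[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(upper[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - tail) / upper[i][i]
    return x

# Hypothetical 3-node conductance matrix (W/K) and power vector (W).
G = [[3.0, -1.0, 0.0],
     [-1.0, 3.0, -1.0],
     [0.0, -1.0, 3.0]]
POWER = [10.0, 0.0, 5.0]

L, U = lu_decompose(G)
temps = lu_solve(L, U, POWER)
residual = [sum(G[i][j] * temps[j] for j in range(3)) - POWER[i] for i in range(3)]
print(temps, residual)
```

Once L and U are stored, each new power vector costs only the two substitution passes, which is why the decomposition is done once up front.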
24Transient solution
- Solves differential equations of the form $dT/dt + AT = B$, where A and B are constants
- In HotSpot, A is constant but B depends on the power dissipation
- Solution: assume constant average power dissipation within an interval (10K cycles) and call RK4 at the end of each interval
- In RK4, current temperature (at t) is advanced in very small steps (t+h, t+2h, ...) till the next interval (10K cycles)
- RK4 because error term is 4th order, i.e., $O(h^4)$
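A scalar version of this scheme can be sketched as below. It reads the slide's equation as dT/dt + A*T = B, which is an assumption chosen to match the transient equation earlier in the deck, and it uses hypothetical values for A, B, and the interval length. The classical RK4 update is standard; the comparison against the closed-form solution is only there to show the error shrinking as the step size is halved.

```python
import math

def rk4_step(slope, y, h):
    """One classical fourth-order Runge-Kutta step for dy/dt = slope(y)."""
    k1 = slope(y)
    k2 = slope(y + 0.5 * h * k1)
    k3 = slope(y + 0.5 * h * k2)
    k4 = slope(y + h * k3)
    return y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def advance_interval(temp, a, b, interval, n_steps):
    """Advance dT/dt + a*T = b over one interval in n_steps small steps.

    b is held constant, mirroring constant average power in the interval.
    """
    h = interval / n_steps

    def slope(y):
        return b - a * y

    for _ in range(n_steps):
        temp = rk4_step(slope, temp, h)
    return temp

def exact(temp0, a, b, t):
    """Closed-form solution, used only to check the numerical result."""
    return b / a + (temp0 - b / a) * math.exp(-a * t)

# Hypothetical constants: a = 1/(R*C) in 1/s, b = P/C in K/s.
A = 25.0
B = 1000.0
INTERVAL = 0.01  # s, an illustrative interval length

coarse = advance_interval(0.0, A, B, INTERVAL, 10)
fine = advance_interval(0.0, A, B, INTERVAL, 20)
reference = exact(0.0, A, B, INTERVAL)
print(abs(coarse - reference), abs(fine - reference))
```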
25Transient solution contd...
- 4th order error has to be within the required precision
- The step size (h) has to be small enough even for the maximum slope of the temperature evolution curve
- Transient solution for the differential equation is of the form $Ae^{-Bt}$, where A and B depend on the RC network
- Thus, the maximum value of the slope ($A \times B$) and the step size are computed accordingly
26Validation
- Validated and calibrated using MICRED test chips
- 9x9 array of power dissipators and sensors
- Compared to HotSpot configured with same grid, package
- Within 7% for both steady-state and transient step-response
- Interface material (chip/spreader) matters
27Current features
- Specification of arbitrary floorplans
- Format of floorplan file
- One line per unit
- Line format: <unit-name> \t <width> \t <height> \t <left-x> \t <bottom-y> \n
- Takes a power trace file as an input and outputs the corresponding temperature trace
- Ability to modify package specifications (type of interface material, size and type of heat spreader and heat sink, etc.)
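A reader for the floorplan line format described above might look like the following sketch. Only the five tab-separated fields come from the slide; the `Unit` type, the function name, the blank-line handling, the unit names, and the sample dimensions are all hypothetical.

```python
from typing import NamedTuple

class Unit(NamedTuple):
    name: str
    width: float
    height: float
    left_x: float
    bottom_y: float

def parse_floorplan(text):
    """Parse one unit per line: name, width, height, left-x, bottom-y (tab-separated)."""
    units = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines (an assumption, not from the slides)
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(
                f"line {line_no}: expected 5 tab-separated fields, got {len(fields)}"
            )
        width, height, left_x, bottom_y = (float(f) for f in fields[1:])
        units.append(Unit(fields[0].strip(), width, height, left_x, bottom_y))
    return units

# Hypothetical two-unit floorplan; dimensions are illustrative.
SAMPLE = (
    "icache\t0.004\t0.002\t0.0\t0.0\n"
    "dcache\t0.004\t0.002\t0.004\t0.0\n"
)

units = parse_floorplan(SAMPLE)
total_area = sum(u.width * u.height for u in units)
for u in units:
    print(u.name, u.width * u.height)
```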
28Current floorplan
29Current floorplan CPU core
30Soon to be features
- Grid model: RC network per grid cell instead of per block
- Temperature models for wires, pads and interface material between heat sink and spreader
- Better (more user-friendly) floorplan specification
- Automatic floorplan generation using classical floorplanning algorithms
31Better floorplan specification
- Floorplans of current microprocessors have a structural similarity
- Floorplans similar to MIPS R10K, Pentium and the Alpha 21264
- Pipeline order corresponds to floorplan adjacency
32Better floorplan specification
- Sample specification (with areas) that takes
advantage of pipeline order
33Automatic floorplan for architects
- Why develop an architectural floorplanning tool?
- Thermal modeling requires adjacency information.
- Wire delays make performance depend on the floorplan.
- Goal
- Derive a realistic floorplan using only microarchitectural information
- Trade off thermal efficiency against latency
- Simulated annealing based floorplan optimization for thermal, delay and combined metrics
- Current work. Results will be available soon
34Sensors
- Caveat emptor
- We are not well-versed in sensor design; the following is a digest of information we have been able to collect from industry sources and the research literature.
35Desirable Sensor Characteristics
- Small area
- Low Power
- High accuracy and linearity
- Easy access and low access time
- Fast response time (slew rate)
- Easy calibration
- Low sensitivity to process and supply noise
36PowerPC G3
- (Sanchez et al., Symp. on VLSI Circuits 97, COMPCON 97)
- 0.35 µm, 2.5 V
- Area: 0.2 mm²
- Power: 10 mW
- Precision: 4.5
- Offset: 12 at process corners
- Linearity: < 4
- Based on thermal diodes and current mirrors
37Types of Sensors
- (In approx. order of increasing ease to build)
- Thermocouples: voltage output
- Junction between wires of different materials; voltage at terminals is proportional to $T_{ref} - T_{junction}$
- Often used for external measurements
- Thermal diodes: voltage output
- Biased p-n junction; voltage drop for a known current is temperature-dependent
- Biased resistors (thermistors): voltage output
- Voltage drop for a known current is temperature-dependent
- You can also think of this as varying R
- Example: 1 kΩ metal snake
- BiCMOS, CMOS: voltage or current output
- Rely on reference voltage or current generated from a reference band-gap circuit; current-based designs often depend on temperature dependence of threshold
38Thermal Sensors in PowerPC
- On-chip temperature sensor (junction temperature)
- Based on differential voltage change across 2 diodes of different sizes
- Implemented in PowerPC G3/G4 processors
- Instruction Cache Throttling used to dynamically
lower junction temperature
39Typical Sensor Configuration
PTAT: Proportional to Absolute Temperature
40Absolute Sensor 1
Syal, Lee, Ivanov, Altet, Online Testing
Workshop, 2001
Schematics of Delta Vgs Current Reference (left)
Generator and Delay Cell (right)
41Sensors Problem Issues
- Poor control of CMOS transistor parameters
- Noisy environment
- Cross talk
- Ground noise
- Power supply noise
- These can be reduced by making the sensor larger
- This increases power dissipation
- But we may want many sensors
42Reasonable Values
- Based on conversations with engineers at Sun, Intel, and HP (Alpha)
- Linearity: not a problem for range of temperatures of interest
- Slew rate: < 1 µs
- This is the time it takes for the physical sensing process (e.g., current) to reach equilibrium
- Sensor bandwidth: << 1 MHz, probably 100-200 kHz
- This is the sampling rate; 100 kHz corresponds to a 10 µs period
- Limited by slew rate but also A/D
- Consider digitization using a counter
43Reasonable Values Precision
- Mid 1980s: < 0.1° was possible
- Precision
- 3° is very reasonable
- 2° is reasonable
- 1° is feasible but expensive
- < 1° is really hard
- The limited precision of the G3 sensor seems to have been a design choice involving the digitization
- Power: 10s of mW
44Calibration
- Accuracy vs. Precision
- Analogous to mean vs. stdev
- Calibration deals with accuracy
- The main issue is to reduce inter-die variations in offset
- Typically requires per-part testing and configuration
- Basic idea: measure offset, store it, then subtract this from dynamic measurements
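The measure, store, subtract idea, along with the mean-versus-stdev analogy, can be sketched in a few lines. All readings, the reference temperature, and the function names below are hypothetical; a real part would store the offset in some per-die configuration during test.

```python
from statistics import mean, stdev

def measure_offset(raw_readings, reference_temp):
    """Accuracy error: how far the average reading sits from a known reference."""
    return mean(raw_readings) - reference_temp

def measure_spread(raw_readings):
    """Precision: scatter of repeated readings, which calibration cannot remove."""
    return stdev(raw_readings)

def calibrated(raw_reading, stored_offset):
    """Apply the stored per-part offset to a dynamic measurement."""
    return raw_reading - stored_offset

# Hypothetical test-time readings taken at a known 85.0 degree reference.
TEST_READINGS = [87.1, 86.9, 87.0]
REFERENCE = 85.0

offset = measure_offset(TEST_READINGS, REFERENCE)   # stored once per part
spread = measure_spread(TEST_READINGS)
corrected = calibrated(92.0, offset)                # later, at run time
print(f"offset = {offset:.2f}, spread = {spread:.2f}, corrected = {corrected:.2f}")
```

Subtracting the offset shifts every reading by the same amount, so the spread is unchanged; that remaining spread is the precision the later slides reason about.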
45Dynamic Offset Cancelation
- Rich area of research
- Build circuit to continuously, dynamically detect offset and cancel it
- Typically uses an op-amp
- Has the advantage that it adapts to changing offsets
- Has the disadvantage of more complex circuitry
46Role of Precision
- Suppose
- Junction temperature is J
- Max variation in sensor is S
- Thermal emergency is T
- $T = J - S$
- Spatial gradients
- If sensors cannot be located exactly at hotspots, measured temperature may be G lower than true hotspot
- $T = J - S - G$
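A quick worked example shows how sensor variation and spatial gradient eat into the usable temperature range. The sketch reads the slide's relations as T = J - S and T = J - S - G, which is an interpretation of the extracted text, and the numbers and function name are hypothetical.

```python
def trigger_threshold(junction_limit, sensor_variation, spatial_gradient=0.0):
    """Temperature reading at which a thermal emergency must be declared.

    The sensor may read up to sensor_variation low, and may sit in a spot
    up to spatial_gradient cooler than the true hotspot, so both margins
    are subtracted from the junction limit.
    """
    return junction_limit - sensor_variation - spatial_gradient

# Hypothetical values in degrees.
J = 85.0  # junction temperature limit
S = 2.0   # maximum sensor variation
G = 3.0   # worst-case gradient between sensor and hotspot

print(trigger_threshold(J, S))      # sensor located at the hotspot
print(trigger_threshold(J, S, G))   # sensor away from the hotspot
```

Every degree of S or G lowers the trigger point by a degree, which is why the later slides argue that sensor precision affects performance.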
47Rate of change of temperature
- Our FEM simulations suggest maximum 0.1° in about 25-100 µs
- This is for power density < 1 W/mm², die thickness between 0.2 and 0.7 mm, and contemporary packaging
- This means slew rate is not an issue
- But sampling rate is!
48Sensors Summary
- Sensor precision cannot be ignored
- Reducing operating threshold by 1-2 degrees will affect performance
- Precision of 1° is conceivable but expensive
- Maybe reasonable for a single sensor or a few
- Precision of 2-3° is reasonable even for a moderate number of sensors
- Power and area are probably negligible from the architecture standpoint
- Sampling period < 10-20 µs
49HotSpot Summary
- HotSpot is a simple, accurate and fast architecture-level thermal model for microprocessors
- Over 90 downloads so far
- Ongoing active development: architecture-level floorplanning will be available soon
- Download site
- http://lava.cs.virginia.edu/HotSpot
- Mailing list
- www.cs.virginia.edu/mailman/listinfo/hotspot
50- Temperature-aware computing
- Optimize performance subject to a thermal
constraint