Title: Tortola: Addressing Tomorrow's Computing Challenges through Hardware/Software Symbiosis
1. Tortola: Addressing Tomorrow's Computing Challenges through Hardware/Software Symbiosis
- Kim Hazelwood
- September 29, 2006
2. Modern Computing Challenges
- Performance
- Power
  - Energy consumption, max instantaneous power, di/dt
- Temperature
  - Total heat output, hot spots
- Reliability
  - Neutron strikes, alpha particles, MTBF, design flaws
- Approaches: circuit, microarchitecture, compiler
- Constraint: fixed HW-SW interface (e.g., x86)
3. Typical Approaches
- Optimize using SW or HW techniques in isolation
  - Performance
    - SW: compile-time optimizations
    - HW: architectural improvements, VLSI technology
  - Reliability: code/data duplication (HW or SW)
  - Power & Temperature
    - HW control mechanisms
    - Profile-recompile cycle
4. Modern Design Constraints
- Compilers: compile once, run anywhere
  - Cannot ship MS Office built for the 1Q05 batch of Pentium 4 3 GHz machines with > 1 GB RAM and a BrandX power supply, located at high altitudes
- Microarchitecture: limited window of application knowledge (the past must predict the future)
- VLSI: guaranteed correctness, reliability
- We currently must optimize for the common case (but must design for the worst case)
5. The Power of Virtualization
[Figure: layered stack of SW Applications, Binary Modifier, HW]
6. Dynamic Binary Modification
- Creates a modified code image at run time
- Examples
  - Dynamo (HP)
  - DAISY/BOA (IBM)
  - CMS (Transmeta)
  - Mojo (Microsoft)
  - Strata (UVa)
  - Pin (Intel)
[Figure: run-time flow with stages EXE, Transform, Code Cache, Profile, Execute]
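To make the Transform / Code Cache / Profile / Execute loop concrete, here is a minimal Python sketch of a dynamic binary modifier for a hypothetical four-instruction toy ISA (`addi`, `dec`, `jnz`, `halt`; none of these come from the systems listed above). Each basic block is translated into a Python closure the first time it is reached, kept in a code cache keyed by its start address, and reused afterwards; a per-block counter stands in for the profile that real systems use to choose code worth re-optimizing.

```python
from collections import Counter

# Hypothetical toy ISA (not from any real system):
#   ("addi", reg, imm)    reg += imm
#   ("dec", reg)          reg -= 1
#   ("jnz", reg, target)  branch to target if reg != 0
#   ("halt",)             stop


def translate_block(program, start):
    """Decode from `start` through the next branch/halt and return a closure
    that executes the whole basic block and yields the next pc (None = halt)."""
    ops = []
    pc = start
    while True:
        ins = program[pc]
        ops.append(ins)
        pc += 1
        if ins[0] in ("jnz", "halt"):
            break
    fallthrough = pc

    def block(regs):
        for ins in ops:
            if ins[0] == "addi":
                regs[ins[1]] += ins[2]
            elif ins[0] == "dec":
                regs[ins[1]] -= 1
            elif ins[0] == "jnz":
                return ins[2] if regs[ins[1]] != 0 else fallthrough
            elif ins[0] == "halt":
                return None
            else:
                raise ValueError(f"unknown opcode {ins[0]!r}")
        return fallthrough  # not reached: every block ends in jnz or halt

    return block


def run(program, regs):
    code_cache = {}      # block start pc -> translated closure
    profile = Counter()  # block start pc -> times executed
    pc = 0
    while pc is not None:
        if pc not in code_cache:          # transform only on first execution
            code_cache[pc] = translate_block(program, pc)
        profile[pc] += 1                  # profile data for later re-optimization
        pc = code_cache[pc](regs)         # execute out of the code cache
    return regs, code_cache, profile


program = [
    ("addi", "r1", 2),   # 0: r1 += 2
    ("dec", "r0"),       # 1: r0 -= 1
    ("jnz", "r0", 0),    # 2: loop while r0 != 0
    ("halt",),           # 3
]
regs, cache, profile = run(program, {"r0": 5, "r1": 0})
```

With `r0 = 5`, only two blocks (addresses 0 and 3) are ever translated even though the loop block runs five times; that reuse is what makes run-time translation affordable.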
7. Dynamic Instrumentation Demo
- Pin
  - Four architectures: IA32, EM64T, IPF, XScale
  - Four OSes: Linux, FreeBSD, MacOS, Windows
  - http://rogue.colorado.edu/pin/
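Dynamic instrumentation inserts analysis calls into running code without changing what that code computes. The sketch below is a toy Python illustration of that idea, not Pin's C++ API: `instrument` wraps every instruction of a hypothetical toy program so that a user-supplied analysis routine runs just before it, here counting executed instructions by mnemonic.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Regs = Dict[str, int]
Semantics = Callable[[Regs, int], Optional[int]]  # returns next pc, or None to halt
Instr = Tuple[str, Semantics]                     # (mnemonic, semantics)


# Constructors for a hypothetical toy instruction set
def add(dst: str, src: str) -> Instr:
    def sem(regs: Regs, pc: int) -> Optional[int]:
        regs[dst] += regs[src]
        return pc + 1
    return ("add", sem)


def dec(reg: str) -> Instr:
    def sem(regs: Regs, pc: int) -> Optional[int]:
        regs[reg] -= 1
        return pc + 1
    return ("dec", sem)


def jnz(reg: str, target: int) -> Instr:
    def sem(regs: Regs, pc: int) -> Optional[int]:
        return target if regs[reg] != 0 else pc + 1
    return ("jnz", sem)


def halt() -> Instr:
    return ("halt", lambda regs, pc: None)


def instrument(program: List[Instr], analysis: Callable[[int, str], None]) -> List[Instr]:
    """Return a new program in which analysis(pc, mnemonic) runs before every instruction."""
    def wrap(name: str, sem: Semantics) -> Semantics:
        def instrumented(regs: Regs, pc: int) -> Optional[int]:
            analysis(pc, name)     # inserted analysis call
            return sem(regs, pc)   # original instruction, unchanged
        return instrumented
    return [(name, wrap(name, sem)) for name, sem in program]


def execute(program: List[Instr], regs: Regs) -> Regs:
    pc: Optional[int] = 0
    while pc is not None:
        pc = program[pc][1](regs, pc)
    return regs


counts: Counter = Counter()
prog = [add("acc", "n"), dec("n"), jnz("n", 0), halt()]
regs = execute(instrument(prog, lambda pc, name: counts.update([name])), {"acc": 0, "n": 4})
```

The instrumented program still computes `acc = 4 + 3 + 2 + 1`, while the analysis routine observes all 13 executed instructions.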
8. Dynamic Optimization Demo
- DynamoRIO
  - Windows and Linux for IA32
  - http://www.cag.lcs.mit.edu/dynamorio/
9. Dynamic Binary Modification
- Creates a modified code image at run time
- Always triggered by software events until now
- Examples
  - Dynamo (HP)
  - DAISY/BOA (IBM)
  - CMS (Transmeta)
  - Mojo (Microsoft)
  - Strata (UVa)
  - Pin (Intel)
[Figure: run-time flow with stages EXE, Transform, Code Cache, Profile, Execute]
10. Tortola: Symbiotic Optimization
- Enable HW/SW communication
[Figure: layered stack of SW Applications, Binary Modifier, HW]
11. Simulation Methodology
- SimpleScalar 4.0 for x86
- Wattch 1.02 power extensions
- Pin dynamic instrumentation system (x86/Linux version)
[Figure: SW Application (benchmarks) on Binary Modifier (Pin) on HW (Wattch, SimpleScalar/x86)]
12. Tortola Applications
- Combine global program information with run-time feedback
  - System-specific power usage
  - Application-specific heat anomalies
  - Workload/input-specific performance optimization
- Reduce hardware complexity
  - No more backwards-compatibility warts
  - Fix bugs after shipment
  - Reduce time to market for new architectures
- One such application: the di/dt problem
13. The Di/dt Problem
- Voltage stability is important for reliability and performance
- Low-power techniques have a negative side effect: current variation
- Dips (undershoots) in supply voltage can cause incorrect values to be calculated or stored
- Spikes (overshoots) in supply voltage can cause reliability problems
14. The Di/dt Problem
- ITRS cites noise management as a Grand Challenge for the 5-10 year time frame
- Several trends are aggravating the issue
  - Voltage is scaling down with technology
  - Current draw is increasing
  - Package impedance is not scaling as quickly
  - Aggressive clock gating causes large swings in processor current draw (di/dt)
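A common first-order way to see why these trends compound is a lumped model of the power-delivery path with resistance R and inductance L (a simplification assumed here, not a model taken from the talk). The on-die voltage sags by a resistive term plus an inductive term, and what matters for correctness is that sag relative to the supply:

$$
V_{\text{die}}(t) \approx V_{dd} - I(t)\,R - L\,\frac{dI}{dt},
\qquad
\text{relative noise} = \frac{I\,R + L\,\frac{dI}{dt}}{V_{dd}}
$$

Worked example with hypothetical numbers: for L = 5 pH and a current ramp of 10 A/ns,

$$
L\,\frac{dI}{dt} = (5\times10^{-12}\,\text{H})\,(10^{10}\,\text{A/s}) = 0.05\,\text{V},
$$

which is about 4.2% of a 1.2 V supply but 5% of a 1.0 V supply. Clock gating raises dI/dt, package inductance that does not scale keeps L roughly fixed, and lower supply voltage shrinks the denominator, so all three trends push the relative noise up.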
15. Di/dt Solutions
- Software: compiler optimizations
- Microarchitecture: sensor/actuator mechanisms
- Circuit-level: decoupling capacitors, more Vdd/Gnd pins on the package
16. Sensor-Actuator Mechanisms
- On-chip voltage sensors detect abnormally high/low voltage levels
- An on-chip actuator then attempts to quickly raise/lower the processor's current draw
  - Phantom firing: increases current (at the expense of power)
  - Resource throttling: reduces current (at the expense of performance)
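The sensor/actuator loop boils down to a per-sample decision. The Python sketch below is a hypothetical policy: the 0.97 V and 1.03 V trip points are assumed values (chosen to match the control-threshold voltages drawn on the next slide), and a real controller must also cope with sensor delay and actuator response time, which this sketch ignores.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    THROTTLE = "throttle"          # lower current draw; costs performance
    PHANTOM_FIRE = "phantom_fire"  # raise current draw; costs power


def actuator_response(sensed_v: float, low_trip: float = 0.97, high_trip: float = 1.03) -> Action:
    """Hypothetical policy mapping a sensed supply voltage (volts) to an actuator action."""
    if sensed_v < low_trip:
        # Undershoot: current rose too quickly, so reduce it.
        return Action.THROTTLE
    if sensed_v > high_trip:
        # Overshoot: current fell too quickly, so raise it.
        return Action.PHANTOM_FIRE
    return Action.NONE


trace = [1.00, 0.99, 0.96, 0.98, 1.04, 1.01]
actions = [actuator_response(v) for v in trace]
```

Each engagement trades performance or power for voltage safety, which is why a loop that trips the sensor every iteration is expensive.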
17. Detecting Imminent Emergencies
[Figure: operating voltage range around a 1 V nominal supply, marking 1.05 V, 1.03 V, 0.97 V, and 0.95 V; labels: hard emergency, soft emergency, control threshold]
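One way to make these thresholds measurable is to classify a sampled supply-voltage trace. The sketch below assumes (our reading of the figure, not a definition given in the talk) that samples outside 0.95 V to 1.05 V are hard emergencies and samples outside 0.97 V to 1.03 V but inside the wider band are soft ones. Each contiguous excursion is counted once, at the worst severity it reached.

```python
from typing import Dict, Iterable, Optional

# Assumed trip points in volts (nominal supply 1 V); hypothetical reading of the figure.
HARD_LOW, SOFT_LOW, SOFT_HIGH, HARD_HIGH = 0.95, 0.97, 1.03, 1.05


def classify(v: float) -> str:
    """Severity of a single voltage sample: 'ok', 'soft', or 'hard'."""
    if v < HARD_LOW or v > HARD_HIGH:
        return "hard"
    if v < SOFT_LOW or v > SOFT_HIGH:
        return "soft"
    return "ok"


def count_emergencies(trace: Iterable[float]) -> Dict[str, int]:
    """Count contiguous excursions, each at the worst severity it reached."""
    counts = {"soft": 0, "hard": 0}
    worst: Optional[str] = None   # severity of the excursion in progress, if any
    for v in trace:
        level = classify(v)
        if level == "ok":
            if worst is not None:  # an excursion just ended
                counts[worst] += 1
                worst = None
        elif worst != "hard":      # upgrade soft -> hard, never downgrade
            worst = level
    if worst is not None:          # trace ended mid-excursion
        counts[worst] += 1
    return counts


result = count_emergencies([1.0, 0.98, 0.96, 0.99, 1.0, 1.04, 1.06, 1.02, 0.94, 0.96])
# -> {'soft': 1, 'hard': 2}
```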
18. Targeting Mid-Frequency Di/dt
- Problematic: wide current spike
- Worst case: pulses at the resonant frequency
[Figure (from Joseph et al., HPCA-9): processor current (A) and supply voltage (V) vs. time (cycles)]
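Why pulses at the resonant frequency are the worst case can be reproduced with a lumped R-L-C power-delivery model driven by a square-wave load current. All parameter values below are hypothetical, chosen only so the effect is visible (their L-C resonance sits near 100 MHz); the model is a standard second-order simplification, not the network used by Joseph et al.

```python
import math

# Hypothetical lumped power-delivery parameters, for illustration only
VDD = 1.0      # regulator output (V)
R = 1.27e-3    # series resistance (ohm)
L = 10e-12     # package inductance (H)
C = 250e-9     # on-die decoupling capacitance (F)


def resonant_period(l: float = L, c: float = C) -> float:
    """Period of the L-C resonance, 2*pi*sqrt(L*C), in seconds."""
    return 2 * math.pi * math.sqrt(l * c)


def square_wave(period: float, i_low: float = 20.0, i_high: float = 30.0):
    """Load current (A): i_low for the first half of each period, then i_high."""
    return lambda t: i_low if (t % period) < period / 2 else i_high


def worst_droop(load, t_end: float = 400e-9, dt: float = 5e-11) -> float:
    """Integrate the R-L-C network and return the deepest dip below VDD (V).

    State: i_l (inductor current) and v (on-die voltage across C):
        L * di_l/dt = VDD - R * i_l - v
        C * dv/dt   = i_l - load(t)
    """
    i_l = load(0.0)            # start in DC steady state at the initial load
    v = VDD - R * i_l
    v_min = v
    for n in range(int(t_end / dt)):
        t = n * dt
        # Semi-implicit Euler: update the current, then use it for the voltage.
        i_l += dt * (VDD - R * i_l - v) / L
        v += dt * (i_l - load(t)) / C
        v_min = min(v_min, v)
    return VDD - v_min


t_res = resonant_period()     # about 9.9 ns, i.e. roughly 100 MHz
for scale in (1 / 8, 1, 8):
    droop = worst_droop(square_wave(t_res * scale))
    print(f"load period = {scale:g} x resonant period: worst droop = {droop * 1e3:.0f} mV")
```

The absolute droop values are not meant to be realistic; the point is the comparison. Driving the load at the resonant period lets each current pulse reinforce the ringing of the inductance and decoupling capacitance, so the dip is much deeper than when the same current swing repeats 8x faster or 8x slower.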
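19. A Di/dt Stressmark

    BEGIN_LOOP:
        ldt     $f1, ($4)         # load a double from memory
        divt    $f1, $f2, $f3     # two dependent long-latency divides:
        divt    $f3, $f2, $f3     #   execution is serialized, so current stays low
        stt     $f3, 8($4)        # store the quotient
        ldq     $7, 8($4)         # reload it as an integer (waits on the divides)
        cmovne  $31, $7, $3       # never moves ($31 reads as zero) but makes $3 depend on $7
        stq     $3, ($4)          # four stores released together once $3 is ready:
        stq     $3, ($4)          #   parallel activity, so current spikes
        stq     $3, ($4)
        stq     $3, ($4)
        br      $31, BEGIN_LOOP   # unconditional branch back to the loop head

- But: the actuator engages every loop iteration, degrading performance
- Why not correct the problem in the code?
[Figure annotations: sequential, low current; parallel, high current]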
20. Proposed Solution
- Leverage our additional software layer to supplement existing solutions
- Microarchitecture provides feedback to our software-based virtual layer (VL)
[Figure: Executable, Binary Modifier (VL), and Altered Executable in SW; Microprocessor with sensor/actuator extensions in HW]
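What the microarchitecture should report to the virtual layer is an open question in this talk. Purely as an assumption for illustration, the sketch below has the hardware deliver the program counter and sensed voltage with each emergency; the virtual layer counts events per code address and asks a hypothetical `reoptimize` callback to rewrite a site once it has triggered `threshold` times. None of these names come from Tortola itself.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class EmergencyEvent:
    """Hypothetical record passed from the hardware to the virtual layer."""
    pc: int          # address of the code running when the sensor tripped (assumed)
    voltage: float   # sensed supply voltage in volts (assumed)


class VirtualLayer:
    """Hypothetical virtual-layer hook that reacts to hardware feedback."""

    def __init__(self, reoptimize: Callable[[int], None], threshold: int = 3) -> None:
        self.reoptimize = reoptimize   # rewrites the code region around a pc
        self.threshold = threshold     # emergencies before a site is treated as hot
        self.counts: Counter = Counter()
        self.patched: Set[int] = set()

    def on_emergency(self, event: EmergencyEvent) -> None:
        self.counts[event.pc] += 1
        if self.counts[event.pc] >= self.threshold and event.pc not in self.patched:
            self.patched.add(event.pc)
            self.reoptimize(event.pc)  # rewrite once; later events are only counted


rewritten: List[int] = []
vl = VirtualLayer(reoptimize=rewritten.append, threshold=3)
for pc in [0x400, 0x400, 0x520, 0x400, 0x400, 0x520]:
    vl.on_emergency(EmergencyEvent(pc=pc, voltage=0.96))
```

Here only the site at `0x400` crosses the threshold and is handed to the rewriter; the rarer site at `0x520` keeps relying on the hardware actuator.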
21. Required Investigations
- Characterizing emergencies
  - How often do we see di/dt emergency loops?
- Communication between the microarchitecture and the virtual layer
  - What information should be passed to the virtual layer during an emergency?
- Fixing di/dt via binary modification
  - Will existing techniques help?
  - New algorithms?
22. Static vs. Dynamic Instances
- Data suggests that modifying a few code sequences will eliminate many voltage emergencies
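The static-versus-dynamic observation can be quantified by asking how many distinct code sites (static instances) account for a given share of all emergencies seen at run time (dynamic instances). The helper below computes that from a list of event addresses; the example counts are invented for illustration and are not the data behind this slide.

```python
from collections import Counter
from typing import Iterable


def sites_needed(event_pcs: Iterable[int], coverage: float) -> int:
    """Smallest number of static code sites whose dynamic emergencies
    account for at least `coverage` (a fraction in (0, 1]) of all events."""
    per_site = sorted(Counter(event_pcs).values(), reverse=True)  # hottest sites first
    total = sum(per_site)
    covered = 0
    for k, n in enumerate(per_site, start=1):
        covered += n
        if covered >= coverage * total:
            return k
    return len(per_site)


# Hypothetical profile: 100 emergencies spread over 5 static sites
events = [1] * 60 + [2] * 25 + [3] * 10 + [4] * 3 + [5] * 2
```

With this made-up profile, fixing the single hottest site removes 60% of the emergencies and fixing three sites removes 95%, which is the kind of skew that makes targeted binary modification attractive.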
23. Possible Compiler Optimizations
- Our goal is to
  - Smooth out the current profile, or
  - Knock pulses off of the resonant frequency
- Some existing options
  - Software pipelining, code motion, instruction padding
[Figure: Binary Modifier applies optimizations to the Executable, producing an Altered Executable; Microprocessor with sensor/actuator extensions]
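Instruction padding is the simplest way to knock a loop's current pulses off the resonant frequency: lengthening the loop body shifts its period. The sketch below finds the fewest padding cycles that move a loop's period outside a guard band around the resonant period; the 25% band and the assumption that each padding instruction adds exactly one cycle are both hypothetical.

```python
def padding_cycles(loop_cycles: int, resonant_cycles: float, band: float = 0.25) -> int:
    """Fewest padding cycles that move the loop period outside the guard band
    [resonant * (1 - band), resonant * (1 + band)].
    Assumes each padding instruction adds exactly one cycle (hypothetical)."""
    low = resonant_cycles * (1 - band)
    high = resonant_cycles * (1 + band)
    pad = 0
    while low <= loop_cycles + pad <= high:
        pad += 1
    return pad
```

For example, a 100-cycle loop facing a 100-cycle resonance would need 26 extra cycles under these assumptions, a 26% slowdown of that loop. A binary modifier would weigh that cost against alternatives such as software pipelining, which can reshape the current profile without adding cycles.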
24. Loop Unrolling and SW Pipelining
[Figure: problematic loop with phases A A B B; current vs. time]
25. Unrolling the Di/dt Stressmark
[Figure: stressmark current phases H (high) and L (low), and the unrolled phases H1, H2, L1, L2]
26. Summary
- Symbiotic program optimization is a powerful approach
- The di/dt problem is well suited for a symbiotic solution
- The Tortola design can also target power reduction, temperature reduction, reliability, etc.
- http://www.tortolaproject.com/