Title: Automated Microprocessor Stressmark Generation
1. Automated Microprocessor Stressmark Generation
- Ajay M. Joshi
- Lieven Eeckhout
- Lizy K. John
- Ciji Isen
- The University of Texas at Austin
- Ghent University, Belgium
- IBM Technical Contacts: Alex Mericas, Alan MacKay
2. Energy, power, power density, temperature, voltage variation
- First-class design constraints
- Embedded processors
- High-performance processors
- Understanding and analysis of primary importance
- Average: typical
- Maximum: worst-case
3. Why care about worst-case?
- Processor must operate properly under extreme conditions
- Examples
- Max power → power supply, DPM
- Max temperature → thermal package, DTM
- Max dI/dt → power delivery
- Localized max power → hot spots → circuit failure, timing errors, etc.
- Max temperature differentials → sensor placement
4. How to characterize worst-case?
- Stressmarks
- Hand-coded synthetic stress codes
- Examples
- Max power: Alpha's Toast
- Max dI/dt: Alpha's Thumper
- Limitations
- Time-consuming to develop
- Requires intimate understanding of system
- Tied to a specific processor
- Difficult to do in early design stages
5. A possible solution
- Automatic stressmark generation
- In two steps
- BenchMaker
- Generate synthetic benchmark from abstract workload model
- StressMaker
- Explore workload space by turning knobs using BenchMaker and search for stressmarks
6. BenchMaker
[Figure: BenchMaker flow. An abstract workload model (instruction mix, ILP, I and D footprint, D stream strides, branch transition, BB size) feeds the benchmark synthesizer, which produces a synthetic benchmark that runs on hardware or a simulator.]
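As a rough illustration of the idea on this slide, the sketch below turns a handful of workload knobs into a synthetic instruction stream. It is not BenchMaker's implementation: the `WorkloadModel` fields, their defaults, and the `synthesize` function are hypothetical names chosen for this example, the branch transition knob is omitted, and the output is a list of tuples rather than real code.

```python
import random
from dataclasses import dataclass, field


@dataclass
class WorkloadModel:
    """Hypothetical knob set; names and defaults are illustrative only."""
    # Relative weights of the non-branch instruction types.
    instruction_mix: dict = field(
        default_factory=lambda: {"int_alu": 0.5, "fp_alu": 0.2, "load": 0.2, "store": 0.1}
    )
    basic_block_size: int = 8      # instructions per basic block, branch included
    dependency_distance: int = 4   # larger distance leaves more independent work (ILP)
    data_stride: int = 64          # bytes between consecutive memory accesses
    data_footprint: int = 4096     # bytes touched before the address wraps around


def synthesize(model, num_blocks, seed=0):
    """Return a list of (kind, depends_on, address) tuples."""
    rng = random.Random(seed)
    kinds = list(model.instruction_mix)
    weights = list(model.instruction_mix.values())
    program = []
    address = 0
    for _ in range(num_blocks):
        for slot in range(model.basic_block_size):
            index = len(program)
            # Each instruction reads the result produced dependency_distance earlier.
            depends_on = index - model.dependency_distance
            if depends_on < 0:
                depends_on = None
            if slot == model.basic_block_size - 1:
                program.append(("branch", depends_on, None))  # block terminator
                continue
            kind = rng.choices(kinds, weights=weights)[0]
            if kind in ("load", "store"):
                program.append((kind, depends_on, address))
                address = (address + model.data_stride) % model.data_footprint
            else:
                program.append((kind, depends_on, None))
    return program
```

Calling `synthesize(WorkloadModel(basic_block_size=4), num_blocks=2)` yields eight instructions with a branch in every fourth slot. Changing a single field changes the generated stream, which is what makes the model searchable.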
7. Experimental setup
- sim-alpha: validated Alpha 21264 simulator
- Wattch for power modeling
- HotSpot for thermal modeling
- SPEC CPU2000
- Simulation points of 100M instructions
- Commercial workloads
- SPECjbb2005, DBT2, DBMS
8. Synthetic clone benchmark preserves characteristics
[Figure: two bar charts comparing each original benchmark with its synthetic clone. Top: IPC (y-axis 0.0 to 2.0). Bottom: EPI (y-axis 0 to 35). Benchmarks: bzip2, crafty, gcc, gzip, mcf, perlbmk, twolf, vortex, vpr, dbms, dbt2, jbb2005.]
9. A possible solution
- Automatic stressmark generation
- In two steps
- BenchMaker
- Generate synthetic benchmark from abstract workload model
- StressMaker
- Explore workload space by turning knobs using BenchMaker and search for stressmarks
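10. StressMaker
[Figure: StressMaker flow. Abstract workload space exploration proposes an abstract workload configuration; BenchMaker turns it into a synthetic benchmark; the microprocessor model evaluates the benchmark against an objective function (e.g., max power); the outcome of the search is a stressmark.]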
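The loop on this slide can be sketched as a search that turns one knob at a time and keeps any change that raises the objective. Everything below is an assumption made for illustration: `toy_power` is a made-up scoring function standing in for "BenchMaker plus microprocessor model" (it is not Wattch or any real power model), the knob names and values are invented, and the one-knob-at-a-time strategy is a simple local search, not necessarily the search the authors use.

```python
def toy_power(config):
    """Made-up objective standing in for synthesize + simulate + measure power."""
    return (
        10.0
        + 3.0 * config["ilp"]
        + 20.0 * config["branch_predictability"]
        - 15.0 * abs(config["alu_fraction"] - 0.5)
    )


# Hypothetical knobs and the settings each one may take.
KNOBS = {
    "ilp": [1, 2, 4, 8],
    "branch_predictability": [0.5, 0.75, 0.9, 1.0],
    "alu_fraction": [0.2, 0.4, 0.6, 0.8],
}


def explore(objective, knobs):
    """Turn one knob at a time; keep a setting only if the objective improves."""
    config = {name: values[0] for name, values in knobs.items()}
    best = objective(config)
    improved = True
    while improved:
        improved = False
        for name, values in knobs.items():
            for value in values:
                candidate = dict(config, **{name: value})
                score = objective(candidate)
                if score > best:
                    config, best, improved = candidate, score, True
    return config, best
```

Swapping `toy_power` for a different function (for example one that scores dI/dt or a temperature difference) redirects the same loop toward a different kind of stressmark, which is the role the objective function plays in the figure.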
11. Max-power stressmark
[Figure: bar chart of power (Watts, y-axis 0 to 30) per processor component (fetch, bpred, icache, rename, dispatch, window, issue, regfile, alu, lsq, dcache, dcache2, resultbus, clock), comparing the StressMaker stressmark with SPEC CPU / commercial benchmarks. The benchmark bars are labeled with benchmark names (art, mesa, SPECjbb2005, perlbmk, gzip, dbt2, eon, mcf).]
- 8-wide OOO processor: 81.5 Watts in total
- assuming Wattch (0.18 µm, 1.2 GHz, aggressive clock gating)
12. Max-power stressmark characteristics
- Keep functional units busy
- Uniform mix of instruction types
- Keep issue logic busy
- High ILP
- No pipeline flushes
- High branch predictability
- Keep caches busy
- Good locality
- → similar to hand-crafted stressmarks
- Gowan et al., DAC'98; Vishwanath, Intel Tech Journal, 2000
13. Evaluation of genetic algorithm
- Speed
- Three orders of magnitude faster than exhaustive search
- Effectiveness
- Max-power stressmark through StressMaker achieves 99% of the max-power stressmark found through exhaustive search (48 Watts for 4-wide OOO processor)
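To make the speed argument concrete, here is a minimal genetic-algorithm sketch that counts how many configurations it evaluates, next to the size of the full search space. The objective, the knob ranges, and all parameters (`population_size`, `generations`, `mutation_rate`) are assumptions for illustration; the numbers it produces say nothing about the results reported on this slide.

```python
import math
import random

# Six hypothetical knobs with eight settings each.
KNOB_VALUES = [list(range(8)) for _ in range(6)]


def toy_objective(config):
    """Made-up score that peaks when every knob equals 5."""
    return -sum((value - 5) ** 2 for value in config)


def search_space_size(knob_values):
    """Number of configurations an exhaustive search would have to evaluate."""
    return math.prod(len(values) for values in knob_values)


def genetic_search(objective, knob_values, population_size=20,
                   generations=30, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.choice(v) for v in knob_values)
                  for _ in range(population_size)]
    best, best_score, evaluations = None, float("-inf"), 0
    for _ in range(generations):
        scored = sorted(((objective(c), c) for c in population), reverse=True)
        evaluations += len(population)
        if scored[0][0] > best_score:
            best_score, best = scored[0]
        # Selection: the fitter half survives unchanged (elitism).
        parents = [c for _, c in scored[: population_size // 2]]
        children = []
        while len(children) < population_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(knob_values))  # one-point crossover
            child = list(a[:cut] + b[cut:])
            for i, values in enumerate(knob_values):
                if rng.random() < mutation_rate:      # random mutation
                    child[i] = rng.choice(values)
            children.append(tuple(child))
        population = parents + children
    return best, best_score, evaluations
```

With these defaults the search performs 20 × 30 = 600 evaluations, while `search_space_size(KNOB_VALUES)` is 8^6 = 262,144. The gap widens quickly as knobs are added, because the exhaustive count grows exponentially while the genetic search budget is fixed by its parameters.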
14. Thermal stressmarks
- Thermal hotspots
- Max component power
- Thermal differentials
- Thermal sensor placement
- Lee et al., ICCD'05
- Examples
- L2 vs. I-fetch: 44.6°C difference
- No stress on L2, high ILP, high branch predictability
- L2 vs. register remap: 48.4°C difference
- Lots of L2 accesses stress the L2, with minimal stress on register remap
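A thermal-differential stressmark can be framed as maximizing the temperature gap between two chosen units. The sketch below shows one way such an objective might be written. The function names are hypothetical, and the temperatures in the usage note are invented placeholders, not HotSpot output or values from this talk.

```python
def thermal_differential(temps, hot_unit, cold_unit):
    """Objective: how much hotter hot_unit is than cold_unit (same units as temps)."""
    return temps[hot_unit] - temps[cold_unit]


def largest_differential(temps):
    """Return (hottest unit, coldest unit, gap) for a per-unit temperature map."""
    hottest = max(temps, key=temps.get)
    coldest = min(temps, key=temps.get)
    return hottest, coldest, temps[hottest] - temps[coldest]
```

For example, with the made-up map `{"ifetch": 95.0, "l2": 55.0, "rename": 70.0}`, `thermal_differential(temps, "ifetch", "l2")` scores a workload that heats instruction fetch while leaving the L2 idle. A search would favor workloads that push this value up.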
15. Why automate the process?
[Figure: bar chart of power (Watts, y-axis 0 to 100) for the 2-wide, 4-wide, and 8-wide OOO max-power stressmarks, each run on the 2-wide, 4-wide, and 8-wide OOO processors.]
- Stressmark is processor-specific
16. Conclusion: two contributions
- BenchMaker
- Abstract workload model
- Generates proxies for real-life benchmarks
- High accuracy
- StressMaker
- Automated stressmark generation
- Case studies: max-power, max single-cycle power, dI/dt, thermal hotspots, etc.
17. Thank you. Questions?