Automated Microprocessor Stressmark Generation

1
Automated Microprocessor Stressmark Generation
  • Ajay M. Joshi
  • Lieven Eeckhout
  • Lizy K. John
  • Ciji Isen
  • The University of Texas at Austin
  • Ghent University, Belgium
  • IBM technical contacts: Alex Mericas, Alan MacKay

2
Energy, power, power density, temperature, voltage variation, etc.
  • First-class design constraints
      • Embedded processors
      • High-performance processors
  • Understanding and analysis of primary importance
      • Average: typical case
      • Maximum: worst case

3
Why care about worst-case?
  • Processor must operate properly under extreme conditions
  • Examples
      • Max power → power supply, DPM (dynamic power management)
      • Max temperature → thermal package, DTM (dynamic thermal management)
      • Max dI/dt → power delivery
      • Localized max power → hot spots → circuit failure, timing errors, etc.
      • Max temperature differentials → sensor placement

4
How to characterize worst-case?
  • Stressmarks
      • Hand-coded synthetic stress codes
  • Examples
      • Max power: Alpha's Toast
      • Max dI/dt: Alpha's Thumper
  • Limitations
      • Time-consuming to develop
      • Requires intimate understanding of system
      • Tied to a specific processor
      • Difficult to do in early design stages

5
A possible solution
  • Automatic stressmark generation
  • In two steps
      • BenchMaker
          • Generate synthetic benchmark from abstract workload model
      • StressMaker
          • Explore workload space by turning knobs using BenchMaker and search for stressmarks

6
BenchMaker
[Diagram: abstract workload model (instruction mix, ILP, I/D footprint, D-stream strides, branch transition, BB size) → benchmark synthesizer → synthetic benchmark → hardware or simulator]

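
As a rough illustration of the abstract workload model, the knobs named on this slide could be held in a small record and sampled to emit an instruction stream. This is a minimal sketch, not BenchMaker itself: the `WorkloadModel` fields, their units, and the `synthesize` function are hypothetical names chosen for the example, and ILP and branch transition behavior are left out to keep it short.

```python
import random
from dataclasses import dataclass


@dataclass
class WorkloadModel:
    """Hypothetical container for a few of the knobs listed on the slide."""
    instruction_mix: dict   # instruction class -> fraction, sums to 1.0
    bb_size: int            # instructions per basic block, including the branch
    data_stride: int        # bytes between consecutive memory accesses
    data_footprint: int     # bytes touched before the data stream wraps


def synthesize(model, n_blocks, seed=0):
    """Emit a toy instruction list that follows the model's statistics."""
    rng = random.Random(seed)
    classes = list(model.instruction_mix)
    weights = [model.instruction_mix[c] for c in classes]
    program, address = [], 0
    for _ in range(n_blocks):
        # Fill the block body by sampling the instruction mix.
        for kind in rng.choices(classes, weights=weights, k=model.bb_size - 1):
            if kind in ("load", "store"):
                program.append((kind, address))
                address = (address + model.data_stride) % model.data_footprint
            else:
                program.append((kind, None))
        # Every basic block ends in a branch.
        program.append(("branch", None))
    return program


# Illustrative knob values only; not taken from the presentation.
model = WorkloadModel(
    instruction_mix={"int_alu": 0.5, "fp_alu": 0.2, "load": 0.2, "store": 0.1},
    bb_size=8, data_stride=64, data_footprint=4096,
)
program = synthesize(model, n_blocks=100)
```

Changing a single field, such as `data_stride` or the instruction mix, yields a different synthetic program, which is the sense in which the model exposes "knobs".
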
7
Experimental setup
  • sim-alpha: validated Alpha 21264 simulator
  • Wattch for power modeling
  • HotSpot for thermal modeling
  • SPEC CPU2000
      • 100M-instruction simulation points
  • Commercial workloads
      • SPECjbb2005, DBT2, DBMS

8
Synthetic clone benchmark preserves characteristics
[Figure: two bar charts comparing each original benchmark with its synthetic clone, one for IPC (axis 0.0 to 2.0) and one for EPI (axis 0 to 35). Benchmarks: vpr, gcc, mcf, gzip, twolf, dbt2, bzip2, crafty, dbms, perlbmk, vortex, jbb2005]

9
A possible solution
  • Automatic stressmark generation
  • In two steps
      • BenchMaker
          • Generate synthetic benchmark from abstract workload model
      • StressMaker
          • Explore workload space by turning knobs using BenchMaker and search for stressmarks

10
StressMaker
[Diagram: abstract workload space exploration → abstract workload configuration → BenchMaker → synthetic benchmark → microprocessor model → objective function (e.g., max power) → back to exploration; output: stressmark]

11
Max-power stressmark
[Figure: bar chart of per-component power (Watts, axis 0 to 30) for the StressMaker stressmark versus SPEC CPU / commercial workloads; bars are annotated with benchmark names (art, mesa, SPECjbb2005, perlbmk, gzip, dbt2, eon, mcf). Components: lsq, alu, fetch, icache, clock, issue, bpred, regfile, dcache, window, rename, dispatch, dcache2, resultbus]
  • 8-wide OOO processor: 81.5 Watts in total
  • Assuming Wattch (0.18 µm, 1.2 GHz, aggressive clock gating)

12
Max-power stressmark characteristics
  • Keep functional units busy
      • Uniform mix of instruction types
  • Keep issue logic busy
      • High ILP
  • No pipeline flushes
      • High branch predictability
  • Keep caches busy
      • Good locality
  • → Similar to hand-crafted stressmarks
      • Gowan et al., DAC'98; Vishwanath, Intel Tech Journal, 2000

13
Evaluation of genetic algorithm
  • Speed
      • Three orders of magnitude faster than exhaustive search
  • Effectiveness
      • Max-power stressmark through StressMaker achieves 99% of max-power stressmark through exhaustive search: 48 Watts for a 4-wide OOO processor
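
The search step can be pictured with a small genetic algorithm over workload knobs. The sketch below is a toy under stated assumptions: the knob names, the `toy_power` stand-in for a simulator run, and the population settings are all hypothetical and are not the parameters used by StressMaker.

```python
import random

# Hypothetical knobs, each normalized to the range [0, 1].
KNOBS = ["ilp", "branch_predictability", "cache_locality", "mix_uniformity"]


def toy_power(config):
    """Stand-in for 'run the synthetic benchmark and measure power'."""
    return sum(config[k] for k in KNOBS)


def mutate(config, rng, rate=0.3):
    """Perturb some knobs with Gaussian noise, clamped to [0, 1]."""
    child = dict(config)
    for k in KNOBS:
        if rng.random() < rate:
            child[k] = min(1.0, max(0.0, child[k] + rng.gauss(0, 0.2)))
    return child


def crossover(a, b, rng):
    """Take each knob from one of the two parents at random."""
    return {k: (a if rng.random() < 0.5 else b)[k] for k in KNOBS}


def search(objective, generations=40, pop_size=20, seed=1):
    """Elitist GA: keep the best quarter, refill with mutated offspring."""
    rng = random.Random(seed)
    pop = [{k: rng.random() for k in KNOBS} for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=objective, reverse=True)
        history.append(objective(pop[0]))   # best score of this generation
        elite = pop[: pop_size // 4]
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    best = max(pop, key=objective)
    return best, history


best, history = search(toy_power)
```

With these settings the sketch evaluates 20 configurations per generation, whereas an exhaustive sweep grows exponentially with the number of knobs; that difference is the intuition behind the speed claim, although the toy says nothing about the actual speedup reported.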

14
Thermal stressmarks
  • Thermal hotspots
      • Max component power
  • Thermal differentials
      • Thermal sensor placement
      • Lee et al., ICCD'05
  • Examples
      • L2 vs. I-fetch: 44.6 °C difference
          • No stress on L2, high ILP, high branch predictability
      • L2 vs. register remap: 48.4 °C difference
          • Lots of L2 accesses stress L2, with minimal stress on register remap
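
For a thermal stressmark, only the objective being maximized changes. As a hedged sketch, assuming the thermal model returns a per-unit temperature map (the `temps` dictionary, the unit names, and the temperature values are made up for illustration):

```python
def thermal_differential(temps, hot_unit, cold_unit):
    """Objective: temperature gap in degrees C between two units."""
    return temps[hot_unit] - temps[cold_unit]


def hottest_unit(temps):
    """Helper for hotspot stressmarks: the unit with the highest temperature."""
    return max(temps, key=temps.get)


# Illustrative temperatures only; not measurements from the presentation.
temps = {"ifetch": 100.0, "l2": 60.0, "remap": 70.0}
gap = thermal_differential(temps, "ifetch", "l2")
```

A search that maximizes `thermal_differential` for a chosen pair of units would favor workloads that stress one unit while leaving the other idle, which matches the two examples listed above.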

15
Why automate the process?
[Figure: bar chart of power (Watts, axis 0 to 100) for the 2-wide, 4-wide, and 8-wide OOO max-power stressmarks, grouped by 2-wide, 4-wide, and 8-wide OOO processors]
  • Stressmark is processor-specific

16
Conclusion: two contributions
  • BenchMaker
      • Abstract workload model
      • Generates proxies for real-life benchmarks
      • High accuracy
  • StressMaker
      • Automated stressmark generation
      • Case studies: max power, max single-cycle power, dI/dt, thermal hotspots, etc.

17
Thank You. Questions?