Statistical Simulation of Superscalar Architectures using Commercial Workloads

1
Statistical Simulation of Superscalar
Architectures using Commercial Workloads
  • Lieven Eeckhout and Koen De Bosschere
  • Dept. of Electronics and Information Systems
    (ELIS)
  • Ghent University, Belgium
  • CAECW01, January 21, 2001

2
Outline
  • Introduction
  • Statistical Simulation
  • Statistical profiling
  • Synthetic trace generation
  • Methodology
  • Evaluation
  • Conclusion

3
Introduction
  • Architectural simulation
  • trace-driven or execution-driven
  • accurate
  • long simulation times
  • long traces to be stored
  • Need for fast simulation techniques
  • take part of a full trace
  • analytical modeling
  • trace sampling
  • statistical simulation

4
Goal
  • Previous work used SPEC benchmarks to evaluate
    statistical simulation
  • In this talk we use both commercial and
    scientific workloads
  • SPECint, SPECfp, system traces, multimedia, X
    graphics, database

5
Statistical Simulation
  • Three steps
  • extract statistical profile from a program
    execution
  • generate synthetic trace from it
  • simulate on a trace-driven simulator
  • Two major advantages
  • statistical profile is more compact than full
    trace
  • fast simulation due to statistical nature
  • design space exploration in limited time

6
Statistical Simulation
[Diagram: a real trace (e.g. a SPEC benchmark) goes through branch profiling, cache profiling, and instruction profiling, which yield branch statistics, cache statistics, and instruction statistics.]

7
Statistical Profiling
  • Microarchitecture-independent statistics
  • instruction statistics
  • Microarchitecture-dependent statistics
  • branch statistics
  • cache statistics
  • Result: statistical simulation can only explore
    design options of the processor core (cache and
    branch predictor are fixed)

8
Statistical Profiling: Instruction Statistics
  • Instruction mix (13 classes)
  • Number of register operands
  • Age of register operands
  • probability that a register operand was produced n
    instructions before it in the trace (only RAW)
  • Memory dependencies
  • probability that a load is memory-dependent on the
    n-th store before it in the trace (only RAW)
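
The instruction statistics on this slide could be gathered in a single pass over a trace. The following is a minimal sketch, not the authors' profiler: the trace format (tuples of class, destination register, source registers) and all names are assumptions made for illustration, and memory dependencies are left out for brevity.

```python
from collections import Counter, defaultdict

def profile_instructions(trace):
    """Collect instruction mix, operand counts, and RAW register ages.

    `trace` is a hypothetical list of (iclass, dest_reg, src_regs) tuples;
    dest_reg is None for instructions that write no register.
    """
    mix = Counter()                         # instruction class -> count
    operand_counts = defaultdict(Counter)   # class -> number of sources -> count
    ages = Counter()                        # RAW distance in instructions -> count
    last_writer = {}                        # register -> index of latest producer

    for i, (iclass, dest, srcs) in enumerate(trace):
        mix[iclass] += 1
        operand_counts[iclass][len(srcs)] += 1
        for reg in srcs:
            if reg in last_writer:          # only RAW dependencies are recorded
                ages[i - last_writer[reg]] += 1
        if dest is not None:
            last_writer[dest] = i
    return mix, operand_counts, ages

def normalize(counter):
    """Turn raw counts into a probability distribution."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}

# Tiny made-up trace, only to exercise the functions.
trace = [
    ("load",  "r1", []),
    ("add",   "r2", ["r1"]),        # r1 was produced 1 instruction earlier
    ("add",   "r3", ["r1", "r2"]),  # ages 2 and 1
    ("store", None, ["r3"]),        # age 1
]
mix, operand_counts, ages = profile_instructions(trace)
age_distribution = normalize(ages)
```

Normalizing the counters gives the probabilities that the trace generator later samples from.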

9
Statistical Profiling: Branch Statistics
  • Six branch types
  • conditional branch, unconditional branch, call
    with offset, indirect jump, indirect call, return
  • Distinction
  • branch prediction accuracy: refill pipeline on
    branch misprediction
  • branch target prediction accuracy: single-cycle
    bubble in pipeline on correct branch prediction
    but target misprediction
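
The distinction between the two kinds of misprediction can be kept per branch type while profiling. This is a hedged sketch under assumptions: the record format (branch type, direction predicted correctly, target predicted correctly) and the type names are hypothetical, and the slides do not show how the statistics were actually collected.

```python
# The six branch types from the slide; the identifiers are illustrative.
BRANCH_TYPES = (
    "conditional", "unconditional", "call_offset",
    "indirect_jump", "indirect_call", "return",
)

def profile_branches(records):
    """records: iterable of (branch_type, direction_ok, target_ok) tuples."""
    stats = {
        t: {"count": 0, "mispredicted": 0, "target_mispredicted": 0}
        for t in BRANCH_TYPES
    }
    for btype, direction_ok, target_ok in records:
        entry = stats[btype]
        entry["count"] += 1
        if not direction_ok:
            entry["mispredicted"] += 1         # would cost a pipeline refill
        elif not target_ok:
            entry["target_mispredicted"] += 1  # would cost a single-cycle bubble
    return stats

def prediction_accuracy(entry):
    """Fraction of branches whose direction was predicted correctly."""
    if entry["count"] == 0:
        return None
    return 1 - entry["mispredicted"] / entry["count"]

# Made-up records, only to exercise the functions.
records = [
    ("conditional", True, True),
    ("conditional", False, True),
    ("conditional", True, True),
    ("return", True, False),
]
stats = profile_branches(records)
```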

10
Statistical Profiling: Cache Statistics
  • D-cache statistics
  • L1 D-cache miss rate
  • L2 D-cache miss rate
  • I-cache statistics
  • L1 I-cache miss rate
  • L2 I-cache miss rate

11
Synthetic Trace Generation
  • Instruction-by-instruction
  • through random number generation
  • Determine
  • instruction type
  • number of operands
  • age of register operands
  • memory dependency
  • branch behavior
  • D-cache behavior
  • I-cache behavior
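
Instruction-by-instruction generation through random numbers might look like the sketch below. It is an illustration under stated assumptions, not the authors' generator: the profile layout, the four instruction classes (the slides use 13), and every probability are made up, and memory dependencies and L2 behavior are omitted for brevity.

```python
import random

def sample(dist, rng):
    """Draw one key from a {value: probability} dictionary."""
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]

def generate_trace(profile, length, seed=0):
    """Generate `length` synthetic instructions from a statistical profile."""
    rng = random.Random(seed)
    trace = []
    for _ in range(length):
        iclass = sample(profile["mix"], rng)              # instruction type
        n_ops = sample(profile["operands"][iclass], rng)  # number of operands
        inst = {
            "class": iclass,
            # age of each register operand (RAW distance in instructions)
            "ages": [sample(profile["age"], rng) for _ in range(n_ops)],
            "icache_miss": rng.random() < profile["l1_icache_miss_rate"],
        }
        if iclass == "load":
            inst["dcache_miss"] = rng.random() < profile["l1_dcache_miss_rate"]
        if iclass == "branch":
            inst["mispredicted"] = rng.random() < profile["branch_mispredict_rate"]
        trace.append(inst)
    return trace

# Hypothetical profile; all numbers are placeholders, not measured data.
profile = {
    "mix": {"alu": 0.5, "load": 0.25, "store": 0.1, "branch": 0.15},
    "operands": {
        "alu": {1: 0.3, 2: 0.7},
        "load": {1: 1.0},
        "store": {2: 1.0},
        "branch": {0: 0.2, 1: 0.8},
    },
    "age": {1: 0.5, 2: 0.3, 4: 0.2},
    "l1_icache_miss_rate": 0.02,
    "l1_dcache_miss_rate": 0.05,
    "branch_mispredict_rate": 0.08,
}
synthetic = generate_trace(profile, 1000)
```

Fixing the seed makes a generated trace reproducible, which helps when comparing processor configurations on the same synthetic input.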

[Diagram: synthetic trace with individual instructions labeled "I-cache miss", "D-cache miss", and "mispredicted".]

12
Methodology: microarchitecture
  • Out-of-order processor
  • 8 and 16 issue
  • windows of 64 and 128 instructions
  • McFarling branch predictor
  • small cache configuration
  • 8KB DM L1 I-cache, 8KB DM L1 D-cache, 64KB 2WSA
    unified L2 cache
  • large cache configuration
  • 32KB DM L1 I-cache, 64KB 2WSA L1 D-cache, 512KB
    4WSA unified L2 cache
  • Access time
  • L1 I-cache (1 cycle), L1 D-cache (2 cycles), L2
    cache (10 cycles), main memory (80 cycles)

13
Methodology: benchmarks
  • 8 SPECint95 benchmarks
  • 5 SPECfp95 benchmarks (hydro2d, su2cor, swim,
    tomcatv, wave5)
  • 8 IBS system traces (mpeg, jpeg, gs, verilog,
    gcc, sdet, nroff, groff)
  • 4 MediaBench applications (g721, gs, gsm, mpeg2)
  • 4 X graphics benchmarks (DooM, POVRay, Xanim,
    Quake)
  • 2 TPC-D queries running on Postgres 6.3
  • 200 million instructions / trace

14
Evaluation
  • IPC prediction error
  • (IPC real trace - IPC synthetic trace) / IPC real trace
  • IPC real trace: IPC when running the real trace on
    a trace-driven simulator
  • IPC synthetic trace: IPC when running the synthetic
    trace generated from the statistical profile of
    the real trace
  • Simulation speed: sIPC/xIPC less than 1 after
    simulating 1 million instructions
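
The error metric on this slide is a relative difference with the real-trace IPC as the reference. A small sketch, with made-up IPC values and a hypothetical function name:

```python
def ipc_prediction_error(ipc_real, ipc_synthetic):
    """Relative IPC prediction error: (real - synthetic) / real."""
    if ipc_real <= 0:
        raise ValueError("ipc_real must be positive")
    return (ipc_real - ipc_synthetic) / ipc_real

# Illustrative values only; these are not results from the talk.
error = ipc_prediction_error(2.0, 1.5)
```

With this sign convention a positive error means the synthetic trace underestimates the IPC of the real trace, and a negative error means it overestimates it.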

15
IPC prediction error (1)
[Bar chart: IPC prediction error per benchmark, grouped by suite (SPECint95, SPECfp95, IBS, MediaBench, X graphics, TPC-D), for the 16-issue, 128-entry window, small cache configuration. The y-axis runs from -30 to 40; two off-scale bars are labeled 157 and 135 and annotated "high D-cache miss rate".]

16
IPC prediction error (2)
[Bar chart: IPC prediction error per benchmark, grouped by suite (SPECint95, SPECfp95, IBS, MediaBench, X graphics, TPC-D), for the 16-issue, 128-entry window, large cache configuration. The y-axis runs from -30 to 30.]

17
IPC prediction error vs. static instruction count
[Scatter plot: IPC prediction error (y-axis, -40 to 160) vs. static instruction count, i.e. the number of instructions executed at least once (x-axis, 0 to 160000). Four series: window 64 / issue 8 and window 128 / issue 16, each with the 'small' and the 'large' cache configuration. Labeled points: nroff, jpeg (IBS), verilog, sdet, mpeg (IBS), groff, gcc, DooM, Quake, gs (IBS), gcc (IBS), vortex, go, TPC-D.]

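
The static instruction count, defined here as the number of instructions executed at least once, can be computed from the instruction addresses in a trace. A minimal sketch, assuming a hypothetical trace given as a sequence of program-counter values:

```python
def static_instruction_count(addresses):
    """Number of distinct instruction addresses that appear in the trace."""
    return len(set(addresses))

# Made-up addresses: a three-instruction loop body executed twice, then one more.
pcs = [0x400, 0x404, 0x408, 0x400, 0x404, 0x408, 0x40C]
count = static_instruction_count(pcs)
```

The dynamic instruction count is simply the trace length, so a small static count relative to the trace length indicates a small instruction footprint.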
18
Conclusion (1)
  • Higher IPC prediction errors for applications
    with smaller static instruction count
  • MediaBench applications
  • SPECfp95 benchmarks
  • 2 X graphics benchmarks (POVRay and Xanim)
  • 5 SPECint95 benchmarks

19
Conclusion (2)
  • Smaller IPC prediction errors for applications
    with larger instruction footprint
  • IBS system traces
  • TPC-D traces
  • 2 X graphics benchmarks (DooM and Quake)
  • 3 SPECint95 benchmarks (go, gcc, vortex)
  • IPC prediction error between -1 and 25

20
Conclusion (3)
  • Statistical simulation is a useful fast
    simulation technique for commercial workloads
  • due to higher variability in instructions
  • since commercial workloads have larger
    instruction footprint
  • which makes a statistical technique more powerful