Wake Up and Smell the Coffee: Performance Analysis Methodologies for the 21st Century

Transcript and Presenter's Notes

Title: Wake Up and Smell the Coffee: Performance Analysis Methodologies for the 21st Century


1
Wake Up and Smell the Coffee: Performance
Analysis Methodologies for the 21st Century
  • Kathryn S. McKinley
  • Department of Computer Sciences
  • University of Texas at Austin

2
Shocking News!
  • In 2000, Java overtook C and C++ as the most
    popular programming language
  • TIOBE index, 2000-2008

3
Systems Research in Industry and Academia
  • ISCA 2006
  • 20 papers use C and/or C++
  • 5 papers are orthogonal to the programming
    language
  • 2 papers use specialized programming languages
  • 2 papers use Java and C from SPEC
  • 1 paper uses only Java from SPEC

4
What is Experimental Computer Science?
5
What is Experimental Computer Science?
  • An idea
  • An implementation in some system
  • An evaluation

6
The success of most systems innovation hinges on
evaluation methodologies.
  1. Benchmarks reflect current and, ideally, future
    reality
  2. Experimental design is appropriate
  3. Statistical data analysis

7
The success of most systems innovation hinges on
experimental methodologies.
  1. Benchmarks reflect current and, ideally, future
    reality [DaCapo Benchmarks, 2006]
  2. Experimental design is appropriate.
  3. Statistical data analysis [Georges et al., 2006]

8
Experimental Design
  • We're not in Kansas anymore!
  • JIT compilation, GC, dynamic checks, etc.
  • Methodology has not adapted
  • Needs to be updated and institutionalized

"This sophistication provides a significant
challenge to understanding complete system
performance, not found in traditional languages
such as C or C++." [Hauswirth et al., OOPSLA '04]
9
Experimental Design
  • Comprehensive comparison (see the timing sketch below)
  • 3 state-of-the-art JVMs
  • Best of 5 executions
  • 19 benchmarks
  • Platform: 2 GHz Pentium M, 1 GB RAM, Linux 2.6.15
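
To make the "best of 5 executions" design concrete, here is a minimal Java sketch of a timing harness (not the harness used in the talk); the Benchmark interface and the stand-in workload are hypothetical, and a real study would start each execution in a fresh JVM per configuration.

```java
// Minimal sketch of a "best of N executions" timing harness.
// The Benchmark interface and the stand-in workload are hypothetical;
// a real study would start each execution in a fresh JVM invocation.
public class BestOfNHarness {
    interface Benchmark { void run(); }

    static long timeOnce(Benchmark b) {
        long start = System.nanoTime();
        b.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Benchmark bench = () -> {                 // stand-in workload
            long sum = 0;
            for (int i = 0; i < 10_000_000; i++) sum += i;
            if (sum == 42) System.out.println("unlikely");
        };
        int n = 5;
        long best = Long.MAX_VALUE;
        for (int i = 0; i < n; i++) {
            long t = timeOnce(bench);
            best = Math.min(best, t);
            System.out.printf("execution %d: %.1f ms%n", i + 1, t / 1e6);
        }
        System.out.printf("best of %d: %.1f ms%n", n, best / 1e6);
    }
}
```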

10
Experimental Design
11
Experimental Design
12
Experimental Design
13
Experimental Design
[Charts comparing results for the first, second, and third iterations]
14
Experimental Design
  • Another Experiment
  • Compare two garbage collectors
  • Semispace Full Heap Garbage Collector
  • Marksweep Full Heap Garbage Collector

15
Experimental Design
  • Another Experiment
  • Compare two garbage collectors
  • Semispace Full Heap Garbage Collector
  • Marksweep Full Heap Garbage Collector
  • Experimental design (see the sketch below)
  • Same JVM, same compiler settings
  • Second iteration for both
  • Best of 5 executions
  • One benchmark: SPEC _209_db
  • Platform: 2 GHz Pentium M, 1 GB RAM, Linux 2.6.15
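
A minimal sketch of this design, assuming a HotSpot-style JVM: it launches the same benchmark in separate JVM processes under two different collectors and times each run. The -XX collector flags and benchmark.jar are illustrative placeholders; the talk's experiment compared semispace and marksweep full-heap collectors, not these.

```java
import java.util.List;

// Sketch: run the same benchmark in separate JVM processes under two
// collectors and compare wall-clock times. The flags and jar name are
// placeholders; wall time here also includes JVM start-up.
public class GCComparison {
    static void run(String label, String gcFlag) throws Exception {
        List<String> cmd = List.of(
            "java", gcFlag, "-Xmx256m",      // same fixed heap for both runs
            "-jar", "benchmark.jar");        // hypothetical benchmark jar
        long start = System.nanoTime();
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        p.waitFor();
        System.out.printf("%s: %.1f ms%n", label, (System.nanoTime() - start) / 1e6);
    }

    public static void main(String[] args) throws Exception {
        run("collector A", "-XX:+UseSerialGC");
        run("collector B", "-XX:+UseParallelGC");
    }
}
```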

16
Marksweep vs Semispace
17
Marksweep vs Semispace
18
Marksweep vs Semispace
19
Experimental Design
20
Experimental Design: Best Practices
  • Measuring JVM innovations
  • Measuring JIT innovations
  • Measuring GC innovations
  • Measuring Architecture innovations

21
JVM Innovation: Best Practices
  • Examples
  • Thread scheduling
  • Performance monitoring
  • Workload triggers differences
  • Real workloads, perhaps microbenchmarks
  • e.g., force the frequency of thread switching
  • Measure & report multiple iterations (see the
    sketch below)
  • Start-up
  • Steady state (aka server mode)
  • Never configure the VM to use completely
    unoptimized code!
  • Use a modest heap size, or multiple heap sizes, computed
    as a function of the maximum live size of the application
  • Use & report multiple architectures
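
As a sketch of "measure & report multiple iterations", the code below runs a stand-in workload several times in one JVM and reports the first iteration as start-up and the average of the later iterations as steady state; the workload and iteration counts are assumptions, not those of the talk.

```java
// Sketch: report start-up (first iteration) and steady state (average of
// later iterations) separately instead of a single number.
// The workload and iteration count are stand-ins.
public class IterationReport {
    static long iteration() {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 20_000_000; i++) sum += i * 31L;
        if (sum == 42) System.out.println("unlikely");
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int iterations = 10;
        long[] times = new long[iterations];
        for (int i = 0; i < iterations; i++) times[i] = iteration();

        System.out.printf("start-up (iteration 1): %.1f ms%n", times[0] / 1e6);

        double steady = 0;
        int tail = iterations / 2;                 // last half of the runs
        for (int i = iterations - tail; i < iterations; i++) steady += times[i];
        steady /= tail;
        System.out.printf("steady state (avg of last %d): %.1f ms%n",
                          tail, steady / 1e6);
    }
}
```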

22
Best Practices
23
JIT Innovation Best Practices
  • Example: new compiler optimization
  • Code quality: Does it improve the application
    code?
  • Compile time: How much compile time does it add?
  • Total time: compiler and application time
    together
  • Problem: adaptive compilation responds to
    compilation load
  • Question: How do we tease all these effects apart?

24
JIT Innovation Best Practices
  • Teasing apart compile time and code quality
    requires multiple experiments
  • Total time: Mix methodology
  • Run the adaptive system as intended
  • Result: a mixture of optimized and unoptimized code
  • First & second iterations (which include compile
    time)
  • Set and/or report the heap size as a function of
    the maximum live size of the application
  • Report the average and show statistical error (see
    the sketch below)
  • Code quality
  • OK: Run iterations until performance stabilizes
    on the best, or
  • Better: Run several iterations of the benchmark,
    turn off the compiler, and measure a run
    guaranteed to have no compilation
  • Best: Replay mix compilation
  • Compile time
  • Requires the compiler to be deterministic
  • Replay mix compilation
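
For "report the average and show statistical error", a minimal sketch: compute the mean and an approximate 95% confidence interval over several run times. The sample data and the normal-approximation constant 1.96 are assumptions; with few runs a Student's t critical value is more appropriate (cf. Georges et al.).

```java
// Sketch: report the mean and an approximate 95% confidence interval over
// several measured run times, rather than a single "best" number.
public class ReportStats {
    public static void main(String[] args) {
        // Hypothetical run times (ms) for one configuration.
        double[] runsMs = {1021.4, 1015.9, 1030.2, 1018.7, 1024.1};

        double mean = 0;
        for (double t : runsMs) mean += t;
        mean /= runsMs.length;

        double var = 0;
        for (double t : runsMs) var += (t - mean) * (t - mean);
        var /= (runsMs.length - 1);               // sample variance
        double stderr = Math.sqrt(var / runsMs.length);

        // 1.96 assumes a normal approximation; with this few runs a
        // Student's t critical value would be the better choice.
        double halfWidth = 1.96 * stderr;
        System.out.printf("mean = %.1f ms, 95%% CI = +/- %.1f ms%n",
                          mean, halfWidth);
    }
}
```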

25
Replay Compilation
  • Force the JIT to produce a deterministic result
  • Make a compilation profiler & replayer (see the
    sketch below)
  • Profiler
  • Profile first or later iterations with the adaptive
    JIT; pick the best or the average
  • Record the profiling information used in compilation
    decisions, e.g., dynamic profiles of edges,
    paths, and/or the dynamic call graph
  • Record compilation decisions, e.g., compile
    method bar at level two, inline method foo into
    bar
  • Mix of optimized and unoptimized code, or all
    optimized/unoptimized
  • Replayer
  • Reads in the profile
  • As the system loads each class, apply the profile +/-
    the innovation
  • Result
  • Controlled experiments with deterministic
    compiler behavior
  • Reduces statistical variance in measurements
  • Still not a perfect methodology for inlining
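
The sketch below illustrates the profiler/replayer bookkeeping only: record each compilation decision during a profiling run to a simple text profile, then parse it back so a replay run can re-apply the same decisions. The Decision record, the file format, and the method names are hypothetical; real systems (e.g., Jikes RVM's compiler advice files) have their own formats and hooks into the JIT.

```java
import java.io.*;
import java.util.*;

// Sketch of replay-compilation bookkeeping: record compilation decisions
// during a profiling run, then read them back for a deterministic replay run.
// The Decision record and the one-line-per-decision format are hypothetical.
public class ReplayProfile {
    record Decision(String method, int optLevel, List<String> inlinedCallees) {}

    // Profiling run: append one line per compilation decision.
    static void recordDecision(Decision d, PrintWriter out) {
        out.println(d.method() + ";" + d.optLevel() + ";" +
                    String.join(",", d.inlinedCallees()));
    }

    // Replay run: parse the profile so the JIT can re-apply each decision.
    static List<Decision> load(BufferedReader in) throws IOException {
        List<Decision> decisions = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            String[] parts = line.split(";", -1);
            List<String> inlined = parts[2].isEmpty()
                    ? List.of() : List.of(parts[2].split(","));
            decisions.add(new Decision(parts[0], Integer.parseInt(parts[1]), inlined));
        }
        return decisions;
    }

    public static void main(String[] args) throws IOException {
        StringWriter buf = new StringWriter();
        try (PrintWriter out = new PrintWriter(buf)) {
            recordDecision(new Decision("Foo.bar()", 2, List.of("Foo.baz()")), out);
        }
        List<Decision> replayed =
            load(new BufferedReader(new StringReader(buf.toString())));
        System.out.println("replaying: " + replayed);
    }
}
```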

26
GC Innovation Best Practices
  • Requires more than one experiment...
  • Use & report a range of fixed heap sizes (see the
    sketch below)
  • Explore the space-time tradeoff
  • Measure heap size with respect to the maximum
    live size of the application
  • VMs should report total memory, not just
    application memory
  • Different GC algorithms vary in the meta-data
    they require
  • The JIT and VM use memory...
  • Measure time with a constant workload
  • Do not measure throughput
  • Best: run two experiments
  • mix with the adaptive methodology: what users are
    likely to see in practice
  • replay: hold the compiler activity constant
  • Choose a profile with the best application
    performance in order to keep from hiding mutator
    overheads in bad code.
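
A minimal sketch of "use & report a range of fixed heap sizes": given a maximum live size measured separately (assumed here), it generates fixed -Xms/-Xmx settings at several multiples of that size and launches one run per setting. benchmark.jar and the constants are placeholders.

```java
import java.util.List;

// Sketch: sweep fixed heap sizes expressed as multiples of the application's
// maximum live size. The live size is an assumed, separately measured input;
// benchmark.jar and the multiples are placeholders.
public class HeapSweep {
    public static void main(String[] args) throws Exception {
        int maxLiveSizeMb = 60;                        // assumed measurement
        double[] multiples = {1.5, 2.0, 2.5, 3.0, 4.0};

        for (double m : multiples) {
            int heapMb = (int) Math.ceil(maxLiveSizeMb * m);
            List<String> cmd = List.of(
                "java", "-Xms" + heapMb + "m", "-Xmx" + heapMb + "m",
                "-jar", "benchmark.jar");              // hypothetical benchmark
            long start = System.nanoTime();
            new ProcessBuilder(cmd).inheritIO().start().waitFor();
            System.out.printf("heap = %.1fx live (%d MB): %.1f ms%n",
                              m, heapMb, (System.nanoTime() - start) / 1e6);
        }
    }
}
```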

27
Architecture Innovation Best Practices
  • Requires more than one experiment...
  • Use more than one VM
  • Set a modest heap size and/or report the heap size
    as a function of the maximum live size
  • Use a mixture of optimized and uncompiled code
  • The simulator needs the same code in many cases to
    perform comparisons
  • Best for microarchitecture-only changes
  • Multiple traces from a live system with the adaptive
    methodology
  • Start-up and steady state with the compiler turned
    off
  • What users are likely to see in practice
  • Won't work if the architecture change requires
    recompilation, e.g., a new sampling mechanism
  • Use replay to make the code as similar as possible

28
"There are lies, damn lies, and statistics."
  • Disraeli
  • ... and benchmarks

29
Conclusions
  • Methodology includes
  • Benchmarks
  • Experimental design
  • Statistical analysis [OOPSLA 2007]
  • Poor methodology
  • can focus or misdirect innovation and energy
  • We have a unique opportunity
  • Transactional memory, multicore performance,
    dynamic languages, ...
  • What we can do
  • Enlist VM builders to include replay
  • Fund and broaden participation in benchmarking
  • Research and industrial partnerships
  • Funding through NSF, ACM, SPEC, industry or ??
  • Participate in building community workloads

30
Thank you!
www.dacapobench.org