1
CPSC 318 Computer Structures, Lecture 3
Performance 2
  • Dr. Son Vuong
  • (vuong@cs.ubc.ca)
  • January 15, 2004

2
Course Overview (topics)
  • Introduction (Lecture 1)
  • Performance (Lectures 2 and 3) (Today)
  • Assembly programming (MIPS) (Next
    lecture)
  • Instruction set architecture
  • Processor design (pipelining, branch
    prediction)
  • Caches, virtual memory, I/O
  • Compare and contrast current processor designs

3
Overview
  • We've looked at
  • How do we measure performance?
  • Metrics
  • Benchmarking
  • Now
  • Review
  • Fallacies
  • Amdahl's Law
  • MIPS
  • Arithmetic and Geometric means
  • A few examples

Readings: Chapter 2 (sections 2.7 to end)
4
Benchmarking games
Benchmark v. trans. To subject (a system) to a
series of tests in order to obtain prearranged
results not available on competitive systems.
-- S. Kelly-Bootle, The Devil's DP Dictionary
  • Differing configurations may be used to run the
    same workload on two systems.
  • The compilers may be wired to optimize the
    workload.
  • Test specifications may be written so that they
    are biased toward one machine.
  • A synchronized job sequence may be used.
  • The workload may be arbitrarily picked.
  • Very small benchmarks may be used.
  • Benchmarks may be manually translated to
    optimize the performance.

-- R. Jain, The Art of Computer Systems Performance Analysis
5
Basis of Evaluation
Actual Target Workload
  • Pros: representative
  • Cons: very specific; non-portable; difficult to run or measure; hard to identify cause

Full Application Benchmarks
  • Pros: portable; widely used; improvements useful in reality
  • Cons: less representative

Small Kernel Benchmarks
  • Pros: easy to run, early in design cycle
  • Cons: easy to fool

Microbenchmarks
  • Pros: identify peak capability and potential bottlenecks
  • Cons: peak may be a long way from application performance
6
Aspects of CPU Performance
                  instr. count   CPI   clock rate
  Program              X
  Compiler             X          X
  Instr. Set           X          X
  Organization                    X        X
  Technology                               X
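For reference, these three factors are the terms of the standard CPU-time equation (the equation itself did not survive this transcript; this is the textbook form):

```latex
\text{CPU time}
= \frac{\text{Instructions}}{\text{Program}}
\times \frac{\text{Clock cycles}}{\text{Instruction}}
\times \frac{\text{Seconds}}{\text{Clock cycle}}
= \text{IC} \times \text{CPI} \times \text{CycleTime}
```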

7
Machine Organization
  • Capabilities and performance characteristics of
    principal Functional Units (FUs)
    (e.g., Registers, ALU, Shifters, Logic Units, ...)
  • Ways in which these components are interconnected
  • Information flows between components
  • Logic and means by which such information flow is
    controlled.
  • Choreography of FUs to realize the ISA
  • Register Transfer Level (RTL) Description

Logic Designer's View
8
UltraSPARC chip
9
Example Organization
  • TI SuperSPARC(tm) TMS390Z50 in Sun SPARCstation 20

[Block diagram: the MBus module carries the SuperSPARC processor (Integer Unit, Floating-point Unit, instruction cache, data cache, reference MMU, store buffer, bus interface) plus an L2 cache controller (CC) and a DRAM controller on the MBus; an M-S adapter (L64852) bridges to the SBus, which hosts STDIO (serial, keyboard, mouse), SCSI, Ethernet, audio, RTC, SBus DMA, SBus cards, the boot PROM, and the floppy interface.]
10
Example (RISC processor)
Base Machine (Reg / Reg), typical instruction mix:

  Op       Freq   Cycles   CPI(i)   % Time
  ALU       50%     1       0.5      23%
  Load      20%     5       1.0      45%
  Store     10%     3       0.3      14%
  Branch    20%     2       0.4      18%
  Total CPI                 2.2

  • How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
  • How does this compare with using branch prediction to save a cycle off the branch time?
  • What if two ALU instructions could be executed at once?
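A minimal sketch (mine, not from the slides) that answers all three questions; instruction count and clock rate are unchanged, so each speedup is just the ratio of old to new CPI:

```python
# Base machine: op class -> (frequency, cycles per op), from the table above.
base = {"ALU": (0.50, 1.0), "Load": (0.20, 5.0),
        "Store": (0.10, 3.0), "Branch": (0.20, 2.0)}

def cpi(mix):
    """Average CPI = sum over op classes of frequency * cycles."""
    return sum(freq * cycles for freq, cycles in mix.values())

base_cpi = cpi(base)  # 0.5 + 1.0 + 0.3 + 0.4 = 2.2

what_ifs = {
    "loads in 2 cycles":   dict(base, Load=(0.20, 2.0)),    # CPI 1.6  -> 1.375x
    "branches in 1 cycle": dict(base, Branch=(0.20, 1.0)),  # CPI 2.0  -> 1.10x
    "2 ALU ops at once":   dict(base, ALU=(0.50, 0.5)),     # CPI 1.95 -> ~1.13x
}
for name, mix in what_ifs.items():
    print(f"{name}: speedup = {base_cpi / cpi(mix):.3f}")
```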
11
Fallacies and Pitfalls
Common Misconceptions about Performance
Wise men learn by other men's mistakes, fools by
their own

-- H. G. Wells
12
Fallacies and Pitfalls
  • Expecting the improvement in one aspect of a
    machine to increase performance by an amount
    proportional to the size of the improvement
    (Amdahl's Law) (pitfall)
  • Using MIPS to predict performance (fallacy)
  • Using the arithmetic mean of normalized execution
    time to predict performance (pitfall)
  • The geometric mean of execution time ratios is
    proportional to total execution time. (fallacy)

13
Amdahl's Law
Consider an enhancement to a system that
accelerates a fraction f of the task by a speedup
factor s. Suppose the remainder of the task is
unaffected by the change.
Without enhancement:  T_old
With enhancement:     T_new = T_old × ((1 − f) + f / s)
Speedup = T_old / T_new = 1 / ((1 − f) + f / s)
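A minimal sketch of the law as code (the function name is mine):

```python
def amdahl_speedup(f, s):
    """Overall speedup when a fraction f of the task is sped up by factor s."""
    return 1.0 / ((1.0 - f) + f / s)

print(amdahl_speedup(0.5, 10))   # 1.818...: 10x on half the task gives < 2x overall
print(amdahl_speedup(0.5, 1e9))  # ~2.0: even a huge speedup is capped at 1 / (1 - f)
```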
14
Amdahl's Law

You can only go as fast as the slowest part
15
Example of Amdahl's Law
Suppose we have a program that takes 100 seconds
to execute, with the multiply taking 80 seconds
of this time. How much do I have to improve the
speed of multiplication if I want my program to
run 5 times faster?
The 20 seconds spent outside the multiply are
unaffected, BUT time_new must be 100 / 5 = 20
seconds in total (to be 5 times faster), so no
finite speedup of multiplication is enough.
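Written out with the slide's numbers:

```latex
T_{\text{new}} \;=\; 20 \;+\; \frac{80}{n} \;=\; \frac{100}{5} \;=\; 20
\quad\Longrightarrow\quad \frac{80}{n} = 0
```

No finite speedup n of the multiply satisfies this, so a 5x overall speedup is unattainable.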
16
Misuse of MIPS - Example
Assume the clock rate is 500MHz. Compare the
two, first using execution time and then MIPS.
ExecutionTime = CycleTime × CPI × InstructionCount
MIPS = InstructionCount / (ExecutionTime × 10^6)
     = 1 / (CycleTime × CPI × 10^6) = ClockRate / (CPI × 10^6)
(CPI x 106)
17
MIPS example cont.
Compiler1 is 1.5 times faster than Compiler2
18
MIPS example cont.
Compiler2 has a higher MIPS rating than Compiler1
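The instruction-mix tables behind this example were images and did not survive the transcript. The sketch below uses P&H-style counts chosen to reproduce the two stated conclusions (Compiler1 is 1.5 times faster, yet Compiler2 has the higher MIPS rating); the specific numbers are my reconstruction, not the slide's:

```python
CLOCK_HZ = 500e6  # 500 MHz, as given on the slide

# Instruction counts by class, with CPI 1/2/3 for classes A/B/C (assumed mix).
cpi_by_class = {"A": 1, "B": 2, "C": 3}
compiler1 = {"A": 5e9, "B": 1e9, "C": 1e9}
compiler2 = {"A": 10e9, "B": 1e9, "C": 1e9}

def exec_time(counts):
    cycles = sum(n * cpi_by_class[c] for c, n in counts.items())
    return cycles / CLOCK_HZ

def mips(counts):
    return sum(counts.values()) / (exec_time(counts) * 1e6)

for name, counts in [("Compiler1", compiler1), ("Compiler2", compiler2)]:
    print(f"{name}: {exec_time(counts):.0f} s, {mips(counts):.0f} MIPS")
# Compiler1: 20 s, 350 MIPS  <- 1.5x faster (30 s / 20 s), but lower MIPS
# Compiler2: 30 s, 400 MIPS
```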
19
Arithmetic Mean
Normalized to machine A or to machine B?
Does this make any sense?
The problem is that the (arithmetic) mean of the
normalized performance is not a quantity that
makes sense!!
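The slide's table was an image; a small demonstration with hypothetical times (my numbers) shows why the quantity is meaningless: whichever machine you normalize to, the other one comes out "5x slower":

```python
# Hypothetical execution times (s) of two programs on machines A and B.
times = {"A": [1.0, 1000.0], "B": [10.0, 100.0]}

def mean_normalized_to(ref):
    """Arithmetic mean of each machine's times normalized to machine `ref`."""
    return {m: sum(t / r for t, r in zip(ts, times[ref])) / len(ts)
            for m, ts in times.items()}

print(mean_normalized_to("A"))  # {'A': 1.0, 'B': 5.05} -> "B is 5x slower"
print(mean_normalized_to("B"))  # {'A': 5.05, 'B': 1.0} -> "A is 5x slower"
```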
20
Geometric Mean
Normalized to machine B
Normalized to machine A
SPECmarks use the geometric mean.
The product is meaningful: it's the product of
the speedups!!
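The reason normalization doesn't matter here: the geometric mean of ratios equals the ratio of geometric means,

```latex
\mathrm{GM}\!\left(\frac{x_1}{y_1},\dots,\frac{x_n}{y_n}\right)
= \left(\prod_{i=1}^{n} \frac{x_i}{y_i}\right)^{1/n}
= \frac{\bigl(\prod_i x_i\bigr)^{1/n}}{\bigl(\prod_i y_i\bigr)^{1/n}}
= \frac{\mathrm{GM}(x_1,\dots,x_n)}{\mathrm{GM}(y_1,\dots,y_n)}
```

so comparing two machines gives the same answer whichever one you normalize to.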
21
Example of Geometric Mean
Performance improvements in the latest versions
of seven layers of a new networking protocol were
measured separately for each layer.
What is the average improvement per layer?
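The per-layer figures were an image; with hypothetical speedups (my numbers), the right "average" is the geometric mean, because improvements to layers in series compound multiplicatively:

```python
import math

# Hypothetical speedup ratios for the seven layers (illustrative only).
layer_speedups = [1.18, 1.13, 1.11, 1.23, 1.19, 1.12, 1.18]

# The average g must satisfy g**7 == overall speedup (the product).
g = math.prod(layer_speedups) ** (1 / len(layer_speedups))
print(f"average improvement per layer: {g:.3f}x")  # ~1.16x
assert math.isclose(g ** len(layer_speedups), math.prod(layer_speedups))
```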
22
Disadvantages of Geometric Mean
It does not track execution time!
But by the geometric mean, these two machines are
equal. This is true only for a workload that runs
program 1 100 times for every run of program 2:
100 × 1 + 1000 = 1100 and 100 × 10 + 100 = 1100
Improving program 1 by 50% is, according to the
geometric mean, equivalent to improving program 2
by 50%.
The only true measure is EXECUTION TIME!!!
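The machines' times were shown as an image; times of 1 s/1000 s (machine A) and 10 s/100 s (machine B) are consistent with the arithmetic above, and a quick check (my reconstruction, not the slide's data) makes the point concrete:

```python
import math

A = {"prog1": 1.0, "prog2": 1000.0}   # reconstructed times (s)
B = {"prog1": 10.0, "prog2": 100.0}

def gm(d): return math.sqrt(d["prog1"] * d["prog2"])

print(gm(A), gm(B))                       # 31.6 vs 31.6: geometric mean says "tie"
print(sum(A.values()), sum(B.values()))   # 1001 vs 110: B is ~9x faster in total
# They only truly tie when prog1 runs 100x for each run of prog2:
print(100 * A["prog1"] + A["prog2"],
      100 * B["prog1"] + B["prog2"])      # 1100.0 1100.0
```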
23
SPEC
Standard Performance Evaluation Corporation
(www.specbench.org)
The System Performance Evaluation Cooperative
(SPEC) was founded in 1988 by a small number of
workstation vendors who realized that the
marketplace was in desperate need of realistic,
standardized performance tests. Their key
realization was that an ounce of honest data was
worth more than a pound of marketing hype. SPEC
has grown to become one of the more successful
performance standardization bodies with more than
40 member companies. SPEC publishes several
hundred different performance results each
quarter spanning across a variety of system
performance disciplines. www.specbench.org
24
SPEC95 Benchmarks
Name      Application
go        Artificial intelligence; plays the game of "Go"
m88ksim   Motorola 88K chip simulator; runs test program
gcc       New version of GCC; builds SPARC code
compress  Compresses and decompresses file in memory
li        LISP interpreter
ijpeg     Graphic compression and decompression
perl      Manipulates strings (anagrams) and prime numbers in Perl
vortex    A database program
tomcatv   A mesh-generation program
swim      Shallow water model with 513 x 513 grid
su2cor    Quantum physics; Monte Carlo simulation
hydro2d   Astrophysics; hydrodynamical Navier-Stokes equations
mgrid     Multi-grid solver in 3D potential field
applu     Parabolic/elliptic partial differential equations
turb3d    Simulates isotropic, homogeneous turbulence in a cube
apsi      Solves problems regarding temperature, wind, velocity, and distribution of pollutants
fpppp     Quantum chemistry
wave5     Plasma physics; electromagnetic particle simulation
25
SPEC CPU2000
  • 12 integer (gzip, gcc, crafty, perl, bzip, ...)
  • 14 floating-point (swim, mesa, art, apsi, ...)
  • Separate averages for integer (CINT2000) and FP
    (CFP2000), relative to the base machine, a Sun
    Ultra5_10 (300 MHz, 256 MB RAM), which gets a score of 100
  • www.spec.org/osg/cpu2000/
  • They measure
  • System speed (SPECint2000)
  • System throughput (SPECint_rate2000)

26
And in Conclusion
  • Benchmarking is essential
  • Fallacies
  • Amdahl's Law: you can't do better than the unimproved part
  • MIPS can be misleading
  • Arithmetic and Geometric means
  • Use geometric mean for ratios (normalized
    performance)
  • Execution time is the true measure

27
Course Overview (topics)
  • Introduction (Lecture 1)
  • Performance (Lectures 2 and 3)
  • Assembly programming (MIPS) (Next
    lecture)
  • Instruction set architecture
  • Processor design (pipelining, branch
    prediction)
  • Caches, virtual memory, I/O
  • Compare and contrast current processor designs