Performance II
CS 141, Chien, April 2, 1999

Transcript and Presenter's Notes
1
Performance II
  • Last Time
  • Computer Architecture definition and drivers
  • Basic notions of Performance and Relative
    Performance
  • Today
  • Quiz
  • Time bases and Performance Metrics
  • Amdahl's Law
  • Reminders/Announcements
  • Read PH Chapter 2, Performance

2
Comparing Computers using Metrics
  • Run programs, record execution times
  • How can we describe the relative performance of
    machines with such a metric?

3
Relative Performance
  • Can be confusing
  • A runs in 12 seconds
  • B runs in 20 seconds
  • A/B = 0.6, so A is 40% faster, or 1.4X faster, or
    B is 40% slower
  • B/A = 1.67, so A is 67% faster, or 1.67X faster,
    or B is 67% slower
  • Needs a precise definition

4
Relative Performance Statements
  • Performance Ratio (A/B)
    ExecTime_B / ExecTime_A = 75 / 50 = 1.5
  • A is 1.5 times faster than B
  • Performance Ratio (A/B)
    Perf_A / Perf_B = (1/ExecTime_A) / (1/ExecTime_B) = 75 / 50 = 1.5
  • Performance Ratio (B/A)
    ExecTime_A / ExecTime_B = 50 / 75 = 0.67
  • B is 0.67 times the performance of A
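The ratios above can be checked directly; a minimal Python sketch (the 50 s and 75 s values are just the execution times from this slide):

    def relative_performance(exec_time_a, exec_time_b):
        # Performance = 1 / execution time, so the ratio of performances
        # is the inverse ratio of execution times.
        return exec_time_b / exec_time_a

    print(relative_performance(50.0, 75.0))  # 1.5  -> A is 1.5 times faster than B
    print(relative_performance(75.0, 50.0))  # 0.67 -> B has 0.67x the performance of A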
5
Performance
Performance_X = 1 / Execution Time_X , for program X
  • only has meaning in the context of a program or
    workload
  • Not very intuitive as an absolute measure

6
Defining Relative Performance
Relative Performance: Performance_X / Performance_Y = Execution Time_Y / Execution Time_X = n
  • We can remove all ambiguity by always
    constraining n to be > 1 => machine X is n times
    faster than Y.

7
Performance Measurements
  • => Of metrics, not computer architectures!
  • Other Metrics

  • Millions of Insts/Sec (MIPS) = # of Instructions / (ExecTime x 10^6)
    (for a particular program)
  • Cycles per Instruction (CPI) = (ExecTime x Clock Rate) / # of Instructions
    measures the complexity of instructions (for a particular program)
  • Clock Rate (Megahertz)
    a hardware implementation characteristic: the complexity of a cycle
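A small sketch of how the first two metrics fall out of a measured run; the instruction count, execution time, and clock rate below are hypothetical (chosen to line up with the 450 MHz / 400 MIPS example later in this deck):

    instructions = 200e6     # instructions executed by the program (hypothetical)
    exec_time    = 0.5       # measured execution time in seconds (hypothetical)
    clock_rate   = 450e6     # 450 MHz clock (hypothetical)

    mips = instructions / (exec_time * 1e6)        # millions of instructions per second
    cpi  = exec_time * clock_rate / instructions   # average clock cycles per instruction

    print(mips, cpi)   # 400.0 MIPS, CPI = 1.125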
8
Performance Summary
  • Many metrics, basis for comparison
  • Relative comparison
  • Quantitative comparison
  • Execution time is the preferred metric.
  • Cannot hide anything; includes everything by default
  • Easiest to avoid errors in comparison
  • Corresponds to user waiting time, resource usage
  • What metrics?

9
What is Time?
  • CPU Execution Time = CPU clock cycles x Clock
    cycle time
  • = CPU clock cycles / Clock rate
  • Every conventional processor has a clock with an
    associated clock cycle time or clock rate.
  • Every program runs in an integral number of clock
    cycles.
  • x MHz = x million cycles/second (clock rate)
  • 1/(x MHz) = cycle time, e.g. 1/(500 MHz) = 2 ns

10
How many clock cycles?
  • Number of CPU cycles = Instructions executed x
    Average Clock Cycles per Instruction (CPI)
  • or
  • CPI = CPU clock cycles / Instruction count

11
All Together Now
CPU Execution Time (seconds) = Instructions x (cycles/instruction) x (seconds/cycle)
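Written as code, the chain of terms above is just a product; a minimal sketch with hypothetical workload numbers:

    def cpu_time(instruction_count, cpi, clock_rate_hz):
        # seconds = instructions x (cycles/instruction) x (seconds/cycle)
        return instruction_count * cpi * (1.0 / clock_rate_hz)

    # Hypothetical: 1 billion instructions, CPI of 1.5, 500 MHz clock.
    print(cpu_time(1e9, 1.5, 500e6))   # 3.0 seconds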
12
Who Affects Performance?
CPU Execution Time = Instruction Count x CPI x Clock Cycle Time
  • programmer
  • compiler
  • instruction-set architect
  • machine architect
  • hardware designer
  • materials scientist/physicist/silicon engineer

13
Performance Variation
CPU Execution Time = Instruction Count x CPI x Clock Cycle Time
14
Speedup
  • Speedup is just relative performance on the same
    machine with something changed.
  • speedup = relative performance

15
Amdahl's Law
  • The impact of a performance improvement is
    limited by the percent of execution time affected
    by the improvement
  • Make the common case fast!!

Execution time after improvement =
    (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected
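A minimal sketch of the formula; the 80 s affected / 20 s unaffected split and the 10x improvement are made-up numbers for illustration:

    def time_after_improvement(affected, unaffected, improvement):
        # Execution time after improvement =
        #   (execution time affected / amount of improvement) + execution time unaffected
        return affected / improvement + unaffected

    new_time = time_after_improvement(80.0, 20.0, 10.0)   # 28 s instead of 100 s
    print((80.0 + 20.0) / new_time)                       # overall speedup ~3.6x, not 10x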
16
MIPS and MFLOPS
  • MIPS - million instructions per second
    MIPS = number of instructions executed in program /
           (execution time in seconds x 10^6)
         = Clock rate / (CPI x 10^6)
  • MFLOPS - million floating point operations per second
    MFLOPS = number of floating point operations executed in program /
             (execution time in seconds x 10^6)
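The second form of the MIPS equation follows from ExecTime = (# insts x CPI) / clock rate; a quick check with hypothetical numbers:

    insts, flops = 200e6, 50e6        # instruction and FP-operation counts (hypothetical)
    clock_rate, cpi = 450e6, 1.125    # hypothetical machine

    exec_time = insts * cpi / clock_rate   # 0.5 s
    print(insts / (exec_time * 1e6))       # 400.0 MIPS (from execution time)
    print(clock_rate / (cpi * 1e6))        # 400.0 MIPS (from clock rate and CPI)
    print(flops / (exec_time * 1e6))       # 100.0 MFLOPS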

17
Millions of Instructions per Second (MIPS)
  • MIPS = # of insts / (Time x 10^6)   (insts/sec)
  • All rate measures of performance are
    (Units of work) / (Time unit), e.g. Xs / Sec
  • Problem: to make these measures representative,
    units of work must be conserved.
  • They must correspond to real work that is
    irreducible!
  • (i.e. work that is conserved over ANY
    implementation)

18
Units of Work
  • Instructions, Floating Point Operations, Window
    updates, answers, etc.
  • Are these things conserved in a computation?
  • Instructions: compiler, architecture
  • Floating Point operations: compiler, algorithm
  • Window updates: algorithm
  • Answers to real problems: ??
  • Depends on compiler, architecture, algorithm,
    implementation, etc. => all of this is part of
    the benchmark

19
Example: Instructions are not always conserved
addi R1, R1, 1      # three separate increments of R1...
load R2, 16(R1)
addi R1, R1, 1
load R3, 16(R1)
addi R1, R1, 1
load R4, 16(R1)     # ...could be a single add of 3, with adjusted load offsets

load R1, 16(R2)
add  R3, R4, R1
load R1, 16(R2)     # redundant: the same word is loaded twice
add  R5, R6, R1
  • Loads and stores may be redundant
  • data motion, not computation, not real work
  • Arithmetic operations may be redundant
  • 3 adds can be reduced to one
  • add 3, and fix all of the other offsets
  • => Many things which seem like work can be
    optimized away...

20
Example: Floating point operations are not always
conserved
  • Matrix multiplication: Strassen's algorithm
  • Algorithmic improvements
  • Iterative algorithms that converge, different
    precision or subtle arithmetic differences can
    have a major effect.
  • Precision, arithmetic details
  • Errors (flaws) can require workarounds
  • Intel Pentium bug, numerous operations to replace
    each FDIV (multiple floating point operations)
  • Others?

21
A Benchmarking Example
  • Pentium II 450 MHz system, Microsoft C compiler
  • Compile the program, execute, count instructions
  • Measure at 400 MIPS
  • What does this tell you about performance?
  • Compile again, this time with optimization ON!
  • Compile takes a lot longer, execute, count
    instructions
  • Measure performance at 350 MIPS
  • What happened?

22
Benchmarking Example (cont.)
  • # of Insts_A vs # of Insts_B
  • ExecTime_A vs ExecTime_B
  • There are fewer instructions executed in the
    optimized program!
  • MIPS rating depends on compiler
  • Quality of code generated
  • Optimized for instruction execution time, not
    MIPS rating
  • Compilers are always benchmarked with the machine
  • How could you cheat to get a high MIPS rating?
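One way to see the effect, with concrete but hypothetical numbers consistent with the slide: optimization removes many cheap instructions, so the program runs faster even though the MIPS rating drops.

    unopt_insts, unopt_time = 200e6, 0.50    # unoptimized build (hypothetical)
    opt_insts,   opt_time   = 105e6, 0.30    # optimized build: fewer instructions, less time

    print(unopt_insts / (unopt_time * 1e6))  # 400.0 MIPS
    print(opt_insts   / (opt_time   * 1e6))  # 350.0 MIPS
    print(unopt_time / opt_time)             # ~1.67x faster despite the lower MIPS rating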

23
Benchmarking Example II
  • Power Macintosh, 500 MHz, PowerPC 603
  • Compile same program, optimized
  • Execute, assuming no obvious cheating
  • Experiment produces 450 MIPS rating
  • Is this faster than the Pentium II?
  • => There's no easy way to tell from this
    information!
  • Why?
  • The unit of work has changed.
  • Pentium instruction != PowerPC instruction
  • Hard to compare MIPS across architectures, of
    little use for comparing architectures. Resort
    to execution time.

24
Other Measures of Work
  • Floating Point Operations
  • Window Updates
  • Frames/Polygons (rendering)
  • Megabytes (communication)
  • Limitations of each of these?
  • How can you cheat/reduce each of these?

25
Performance Metrics Summary
  • Many possible measures of work / performance
    metrics
  • Choosing is rife with potential errors
  • Because it includes everything, Execution time is
    the safest choice.
  • Still need to analyze the other influences
    carefully before you can draw any conclusions
    about the causes.
  • Amdahl's Law