Cpsc 318 Computer Structures Lecture 2 Performance

Provided by: davepat4
1
Cpsc 318 Computer Structures Lecture 2
Performance
  • Dr. Son Vuong
  • (vuong_at_cs.ubc.ca)
  • January 6, 2004

2
Overview
  • Performance
  • How do we measure performance?
  • Metrics
  • Benchmarking
  • Fallacies: how to avoid getting fooled by
    performance calculations and demos!

Readings: Chapter 4
3
Performance
  • Purchasing perspective: given a collection of
    machines (or upgrade options), which has the
  • best performance?
  • least cost?
  • best performance / cost?
  • Computer designer perspective: faced with design
    options, which has the
  • best performance improvement?
  • least cost?
  • best performance / cost?
  • All require a basis for comparison and a metric for
    evaluation
  • Solid metrics lead to solid progress!

4
Cost-performance
Smaller is better
  • Divide the cost of the machine by its
    performance
  • (equivalent to) multiply cost by execution time
  • Units of cost / performance: dollar-seconds

5
Measurement and Evaluation
Architecture is an iterative process --
searching the space of possible designs --
at all levels of computer systems
[Figure labels: Creativity; Cost / Performance Analysis; Good Ideas, Mediocre Ideas, Bad Ideas]
6
Example
For $400, we can add 256 MBytes of memory and
reduce the execution time from 12 down to 8.5
seconds. Does this improve cost-performance?
  • Old cost-performance: $3000 x 12 = 36000 dollar-sec
  • New cost-performance: $3400 x 8.5 = 28900 dollar-sec
  • Yes: 28900 is smaller than 36000

For an additional $300, we can upgrade to a
wider disk interface and decrease execution time
to 8 seconds. Does this improve
cost-performance?
  • Old cost-performance: $3400 x 8.5 = 28900 dollar-sec
  • New cost-performance: $3700 x 8.0 = 29600 dollar-sec
  • No: 29600 is larger than 28900

7
Example
For $200, we can add 1 GByte of memory and
reduce the execution time from 12 down to 8.5
seconds. Does this improve cost-performance?
  • Old cost-performance: $2000 x 12 = 24000 dollar-sec
  • New cost-performance: $2200 x 8.5 = 18700 dollar-sec
  • Yes: 18700 is smaller than 24000

For an additional $175, we can upgrade to a
wider disk interface and decrease execution time
to 8 seconds. Does this improve
cost-performance?
  • Old cost-performance: $2200 x 8.5 = 18700 dollar-sec
  • New cost-performance: $2375 x 8.0 = 19000 dollar-sec
  • No: 19000 is larger than 18700
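
The arithmetic in these examples can be checked with a few lines of Python. This is only a sketch: the function name cost_performance is made up for illustration, and the figures are the ones from the second example (base machine, memory upgrade, disk upgrade).

```python
def cost_performance(cost_dollars, exec_time_s):
    """Cost multiplied by execution time, in dollar-seconds (smaller is better)."""
    return cost_dollars * exec_time_s

# Figures from the second example: base machine, memory upgrade, disk upgrade.
base = cost_performance(2000, 12)
memory = cost_performance(2200, 8.5)
disk = cost_performance(2375, 8.0)

print(base, memory, disk)
print("memory upgrade helps:", memory < base)
print("disk upgrade helps:", disk < memory)
```

The disk upgrade does make the machine faster (8 seconds instead of 8.5), but the extra cost outweighs the gain, so the cost-performance figure gets slightly worse.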

8
Metrics of performance
[Figure: system layers (Application, Programming Language, Compiler, ISA, Datapath, Control, Function Units, Transistors, Wires, Pins) annotated with metrics ranging from answers per month and useful operations per second to cycles per second (clock rate)]
Each metric has a place and a purpose, and each
can be misused
9
Two Notions of Performance
[Table: Plane | Top Speed | DC to Paris | Passengers | Throughput (pmph, passenger-miles per hour), with rows for the Boeing 747 and the BAC/Sud Concorde]
  • Which has higher performance?
  • Time to deliver 1 passenger?
  • Time to deliver 400 passengers?
  • In a computer, the time for 1 job is called Response
    Time or Execution Time
  • In a computer, jobs per day is called Throughput or
    Bandwidth

10
Definitions
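  • Performance is in units of things per sec
  • bigger is better
  • If we are primarily concerned with response time:
    performance(X) = 1 / execution_time(X)

"X is n times faster than Y" means
performance(X) / performance(Y) = execution_time(Y) / execution_time(X) = n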
11
Example of Response Time v. Throughput
  • Time of Concorde vs. Boeing 747?
  • Concorde is 6.5 hours / 3 hours = 2.2 times
    faster
  • Throughput of Boeing vs. Concorde?
  • Boeing 747: 286,700 pmph / 178,200 pmph = 1.6
    times faster
  • Boeing is 1.6 times (60%) faster in terms of
    throughput
  • Concorde is 2.2 times (120%) faster in terms of
    flying time (response time)
  • We will focus primarily on execution time for a
    single job
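
As a quick check of the two ratios above, here is a minimal Python sketch. The helper names are invented for illustration; the numbers are the flying times and throughputs quoted on this slide.

```python
def times_faster_by_time(time_x, time_y):
    """How many times faster X is than Y, given their execution times."""
    return time_y / time_x

def times_faster_by_rate(rate_x, rate_y):
    """How many times faster X is than Y, given their throughputs."""
    return rate_x / rate_y

# Response time: Concorde (3 hours) vs. Boeing 747 (6.5 hours).
concorde_vs_747 = times_faster_by_time(3.0, 6.5)

# Throughput in passenger-miles per hour: Boeing 747 vs. Concorde.
b747_vs_concorde = times_faster_by_rate(286_700, 178_200)

print(f"Concorde: {concorde_vs_747:.1f} times faster (response time)")
print(f"Boeing 747: {b747_vs_concorde:.1f} times faster (throughput)")
```

Note that the time ratio puts the slower machine's time on top, while the throughput ratio puts the faster machine's rate on top; both follow from performance being the inverse of execution time.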

12
Comparisons
Definitions
  • Speed-up = time_old / time_new
  • Improvement = 100 x (time_old - time_new) /
    time_new
  • Example
  • Consider the enhancement in a previous example
    that improved the execution time from 12 down to
    8.5 seconds
  • Speed-up of 12 / 8.5 = 1.41
  • Improvement of 100 x (12 - 8.5) / 8.5 = 41%

Execute 41% more applications in the same time.
If the old machine could execute 300
programs/hour, this one executes 423 programs/hour
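
The two definitions can be written out as a short Python sketch, using the 12 second and 8.5 second times from the example. The function names are illustrative only.

```python
def speedup(time_old, time_new):
    """Speed-up = time_old / time_new."""
    return time_old / time_new

def improvement_percent(time_old, time_new):
    """Improvement = 100 x (time_old - time_new) / time_new."""
    return 100.0 * (time_old - time_new) / time_new

s = speedup(12, 8.5)
imp = improvement_percent(12, 8.5)

# Throughput scales with the speed-up; the slide rounds the speed-up
# to 1.41 before multiplying by 300 programs/hour.
new_rate = 300 * round(s, 2)

print(f"speed-up: {s:.2f}")
print(f"improvement: {imp:.0f}%")
print(f"programs/hour: {new_rate:.0f}")
```

The improvement is just (speed-up - 1) x 100, which is why a speed-up of 1.41 corresponds to a 41% improvement.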
13
Confusing Wording on Performance
  • Will (try to) stick to "n times faster"; it's less
    confusing than "m% faster"
  • As "faster" means both increased performance and
    decreased execution time, to reduce confusion
    will use "improve performance" or "improve
    execution time"

14
What is Time?
  • Straightforward definition of time:
  • Total time to complete a task, including disk
    accesses, memory accesses, I/O activities,
    operating system overhead, ... (User Time)
  • "real time", "response time" or "elapsed time"
  • Alternative: just the time the processor (CPU) is
    working only on your program (since multiple
    processes are running at the same time)
  • "CPU execution time" or "CPU time"
  • Often divided into system CPU time (in OS) and
    user CPU time (in user program)

15
Different aspects
cascade time myprogram
90.7u 12.9s 2:39 65%
  • User CPU time: 90.7 seconds
  • System CPU time: 12.9 seconds
  • Elapsed time is 2 minutes and 39 seconds
  • Percentage of elapsed time that is CPU time is
    65%
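
The percentage follows directly from the other three numbers. A minimal Python sketch, with a made-up helper name, using the user time, system time, and elapsed time reported above:

```python
def cpu_share_percent(user_s, system_s, elapsed_s):
    """Percentage of elapsed (wall-clock) time that was CPU time."""
    return 100.0 * (user_s + system_s) / elapsed_s

elapsed_s = 2 * 60 + 39  # 2 minutes 39 seconds = 159 seconds
cpu_s = 90.7 + 12.9      # user CPU time + system CPU time
share = cpu_share_percent(90.7, 12.9, elapsed_s)

print(f"CPU time: {cpu_s:.1f} s of {elapsed_s} s elapsed ({share:.0f}%)")
```

The remaining third or so of the elapsed time went to things other than running this program on the CPU, such as waiting for I/O or for other processes.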

16
How to Measure Time?
  • User Time: seconds
  • CPU Time: computers are constructed using a clock
    that runs at a constant rate and determines when
    events take place in the hardware
  • These discrete time intervals are called clock
    cycles (or informally clocks or cycles)
  • Length of clock period: clock cycle time (e.g.,
    2 nanoseconds or 2 ns) and clock rate (e.g., 500
    megahertz, or 500 MHz), which is the inverse of
    the clock period; use these!
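
Since clock rate and clock cycle time are inverses, converting between them is a one-liner. The sketch below (helper names are illustrative) confirms that the two example values on this slide describe the same clock.

```python
def cycle_time_from_rate(rate_hz):
    """Clock cycle time in seconds for a given clock rate in hertz."""
    return 1.0 / rate_hz

def rate_from_cycle_time(period_s):
    """Clock rate in hertz for a given clock cycle time in seconds."""
    return 1.0 / period_s

period = cycle_time_from_rate(500e6)   # 500 MHz
rate = rate_from_cycle_time(2e-9)      # 2 ns

print(f"500 MHz -> {period * 1e9:.1f} ns per cycle")
print(f"2 ns    -> {rate / 1e6:.0f} MHz")
```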

17
Clock Cycle
[Figure: timing diagram labeling Clock Period, Start of Cycle, and End of Cycle]
1 MHz (MegaHertz) = 1 million cycles/sec, or 1
cycle/microsecond
18
Measuring Time using Clock Cycles (1/2)
CPU execution time for program = Clock
Cycles for a program x Clock Cycle Time
  • or
  • = Clock Cycles for a program / Clock Rate

19
Measuring Time using Clock Cycles (2/2)
  • One way to define clock cycles:
  • Clock Cycles for program
  • = Instructions for a program (called
    "Instruction Count")
  • x Average Clock Cycles Per Instruction
    (abbreviated "CPI")
  • CPI is one way to compare two machines with the same
    instruction set, since Instruction Count would be
    the same

20
Performance Calculation
  • CPU execution time for program = Clock Cycles
    for program x Clock Cycle Time
  • Substituting for clock cycles:
  • CPU execution time for program
    = (Instruction Count x CPI)
    x Clock Cycle Time
  • = Instruction Count x CPI x Clock Cycle Time
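
A small Python sketch of this product. The instruction count and CPI below are hypothetical values chosen only to make the arithmetic easy to follow; the 2 ns cycle time matches the 500 MHz example used earlier.

```python
def cpu_time_s(instruction_count, cpi, cycle_time_s):
    """CPU time = Instruction Count x CPI x Clock Cycle Time."""
    return instruction_count * cpi * cycle_time_s

# Hypothetical program: 10 million instructions at an average CPI of 2.0
# on a machine with a 2 ns clock cycle (500 MHz).
instructions = 10_000_000
cpi = 2.0
cycle_time = 2e-9

clock_cycles = instructions * cpi
t = cpu_time_s(instructions, cpi, cycle_time)

print(f"clock cycles: {clock_cycles:.0f}")
print(f"CPU time: {t * 1000:.0f} ms")
```

Halving any one of the three factors halves the CPU time, provided the other two stay the same, which is rarely the case in practice since design changes tend to trade one factor against another.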

21
How to Calculate the 3 Components?
  • Clock Cycle Time: in specification of computer
    (Clock Rate in advertisements)
  • Instruction Count:
  • Count instructions in loop of small program
  • Use simulator to count instructions
  • Hardware counter in special register (Pentium II)

22
Example
Two implementations of the same ISA
Which is faster?
Assuming I instructions
23
Another way of calculating CPI
CPU time = Instruction Count x CPI x Clock Cycle Time
"instruction frequency"
Invest Resources where time is Spent!
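
One common way to use instruction frequencies is to compute the overall CPI as a weighted average of per-class CPIs. The sketch below assumes that formulation; the instruction classes, frequencies, and cycle counts are hypothetical and are not taken from the slide.

```python
# Hypothetical instruction mix: class -> (frequency, cycles per instruction).
mix = {
    "ALU":    (0.5, 1),
    "load":   (0.2, 2),
    "store":  (0.1, 2),
    "branch": (0.2, 2),
}

def average_cpi(mix):
    """Overall CPI as the frequency-weighted sum of per-class CPIs."""
    return sum(freq * cycles for freq, cycles in mix.values())

def time_shares(mix):
    """Fraction of total cycles spent in each instruction class."""
    total = average_cpi(mix)
    return {name: freq * cycles / total for name, (freq, cycles) in mix.items()}

cpi = average_cpi(mix)
shares = time_shares(mix)

print(f"average CPI: {cpi:.2f}")
for name, share in shares.items():
    print(f"{name:>6}: {share:.0%} of cycles")
```

With these made-up numbers, ALU instructions are half of all instructions but only a third of the cycles, so speeding up loads and branches could pay off as much as speeding up the most frequent class.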
24
Example
Choice of code sequence to generate for some
high-level language construct.
Which is the best choice?
Best
25
What Programs Measure for Comparison?
  • Ideally run typical programs with typical input
    before purchase, or before even building the machine
  • Called a "workload". For example:
  • Engineer uses compiler, spreadsheet
  • Author uses word processor, drawing program,
    compression software
  • In some situations it's hard to do
  • Don't have access to machine to "benchmark"
    before purchase
  • Don't know workload in future

26
Benchmarks
  • Obviously, apparent speed of processor depends on
    code used to test it
  • Need industry standards so that different
    processors can be fairly compared
  • Companies exist that create these benchmarks:
    typical code used to evaluate systems
  • Need to be changed every 2 or 3 years since
    designers could (and do!) target for these
    standard benchmarks

27
Example Standardized Benchmarks 1/2
  • Workstations: Standard Performance Evaluation
    Corporation (SPEC)
  • SPEC95
  • 8 integer (gcc, compress, li, ijpeg, perl, ...)
  • 10 floating-point programs (hydro2d, mgrid,
    applu, turb3d, ...)
  • www.spec.org
  • Separate avg for integer (CINT95) and FP (CFP95)
    relative to base machine SPARC10/40
  • Benchmarks distributed in source code
  • Company representatives select workload
  • Compiler, machine designers target benchmarks, so
    try to change every 3 years

28
Example Standardized Benchmarks 2/2
  • SPEC CPU2000
  • 12 integer (gzip, gcc, crafty, perl, bzip, ...)
  • 14 floating-point (swim, mesa, art, apsi, ...)
  • Separate average for integer (CINT2000) and FP
    (CFP2000) relative to base machine (Sun Ultra5_10,
    300 MHz, 256 MB RAM), which gets a score of 100
  • www.spec.org/osg/cpu2000/
  • They measure:
  • System speed (SPECint2000)
  • System throughput (SPECint_rate2000)
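
To make "relative to base machine, which gets a score of 100" concrete, here is a rough Python sketch. It assumes each benchmark's score is 100 x (reference time / measured time) and that the summary is a geometric mean of the per-benchmark scores, which is how SPEC reports its averages; the run times below are hypothetical and are not published results.

```python
import math

def spec_ratio(reference_time_s, measured_time_s):
    """Score relative to the reference machine, which scores 100."""
    return 100.0 * reference_time_s / measured_time_s

def geometric_mean(values):
    """Geometric mean, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical (reference time, measured time) pairs in seconds.
runs = {
    "gzip": (1400, 700),
    "gcc":  (1100, 275),
}

ratios = {name: spec_ratio(ref, t) for name, (ref, t) in runs.items()}
score = geometric_mean(list(ratios.values()))

print(ratios)
print(f"summary score: {score:.1f}")
```

A machine that matches the reference time on every benchmark gets exactly 100, and the geometric mean keeps one unusually fast benchmark from dominating the summary the way it would in an arithmetic mean.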

29
SPECint95base Performance (Oct. 1997)
[Chart: SPECint95base results for Compaq/DEC Alpha, HP PA, and Intel Pentium Pro]
30
SPECfp95base Performance (Oct. 1997)
[Chart: SPECfp95base results for Compaq/DEC Alpha, HP PA, and Intel Pentium Pro]
31
Example PC Workload Benchmark
  • PCs: Ziff-Davis Benchmark Suite
  • "Business Winstone is a system-level,
    application-based benchmark that measures a PC's
    overall performance when running today's
    top-selling Windows-based 32-bit applications; it
    doesn't mimic what these packages do; it runs
    real applications through a series of scripted
    activities and uses the time a PC takes to
    complete those activities to produce its
    performance scores."
  • Also tests for CDs, Content-creation, Audio, 3D
    graphics, battery life
  • http://www.etestinglabs.com/benchmarks/

32
Performance Evaluation
  • Good products are created when we have:
  • Good benchmarks
  • Good ways to summarize performance
  • Given that sales is a function of performance relative
    to competition, should we invest in improving the
    product as reported by the performance summary?
  • If benchmarks/summary are inadequate, then choose
    between improving product for real programs vs.
    improving product to get more sales; sales almost
    always wins!

33
Performance Evaluation: The Demo
  • If we're talking about performance, let's
    discuss the ways shady salespeople have fooled
    consumers (so that you don't get taken!)
  • 5. Never let the user touch it
  • 4. Only run the demo through a script
  • 3. Run it on a stock machine in which no expense
    was spared
  • 2. Preprocess all available data
  • 1. Play a movie

34
Megahertz Myth Marketing Movie
35
And in conclusion 1/2
  • Latency v. Throughput
  • Performance doesn't depend on any single factor:
    need to know Instruction Count, Clocks Per
    Instruction and Clock Rate to get valid
    estimations
  • User Time: time the user needs to wait for a program to
    execute; depends heavily on how the OS switches
    between tasks
  • CPU Time: time spent executing a single program;
    depends solely on design of processor (datapath,
    pipelining effectiveness, caches, etc.)
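
To see why no single factor is enough, consider two hypothetical machines running the same program (same instruction count). The clock rates and CPIs below are invented for illustration.

```python
def cpu_time_s(instruction_count, cpi, clock_rate_hz):
    """CPU time = Instruction Count x CPI / Clock Rate."""
    return instruction_count * cpi / clock_rate_hz

instructions = 1_000_000_000  # same program on both machines

# Hypothetical machine A: faster clock, but more cycles per instruction.
time_a = cpu_time_s(instructions, cpi=2.0, clock_rate_hz=2.0e9)

# Hypothetical machine B: slower clock, fewer cycles per instruction.
time_b = cpu_time_s(instructions, cpi=1.2, clock_rate_hz=1.5e9)

print(f"A (2.0 GHz, CPI 2.0): {time_a:.2f} s")
print(f"B (1.5 GHz, CPI 1.2): {time_b:.2f} s")
print(f"B is {time_a / time_b:.2f} times faster than A")
```

Machine B wins despite its lower clock rate, because its lower CPI more than makes up the difference.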

36
And in conclusion 2/2
  • Benchmarks
  • Attempt to predict performance
  • Updated every few years
  • Measure everything from simulation of desktop
    graphics programs to battery life
  • Megahertz Myth
  • MHz ≠ performance; it's just one factor