Performance - PowerPoint PPT Presentation

About This Presentation
Title:

Performance

Slides: 31
Provided by: erk9

Transcript and Presenter's Notes

Title: Performance


1
Performance
Computer Architecture CS401
Erkay Savas, Sabanci University
2
Performance
  • What is performance?
  • How to measure performance?
  • Performance metrics
  • Performance evaluation
  • Why does some hardware perform better than others for
    different programs?
  • Which hardware factors are related to overall system
    performance?
  • How does the machine's instruction set affect
    performance?

3
Airplane Analogy
  • Which of these airplanes has the best performance?

4
Computer Performance
  • Response time (latency)
  • How long does it take for my job to run?
  • How long does it take to execute a program?
  • How long must I wait for a database query?
  • Throughput
  • How many jobs can the machine run at once?
  • What is the average execution rate?
  • How much work is getting done?
  • If we upgrade a machine with a new processor what
    do we increase?
  • If we add a new machine what do we increase?

5
Which Time to Measure?
  • Elapsed Time (Wall clock time, response time)
  • Counts everything (disk and memory access, I/O,
    operating system overhead, work on other
    processes)
  • Useful but not always good for comparison
    purposes
  • CPU (execution) time
  • The time the CPU spends computing for the user task
  • Does not include time spent waiting for I/O or
    running other programs
  • User CPU time: CPU time spent within the program
  • System CPU time: CPU time spent in the operating
    system performing tasks on behalf of the program

6
CPU Time
  • The Unix time command reflects this breakdown by
    returning the following when invoked
  • 90.7u 12.9s 2:39 65%
  • Interpretation
  • User CPU time is 90.7 s
  • System CPU time is 12.9 s
  • Elapsed time is 2 min 39 s = 159 s (≠ 90.7 + 12.9)
  • CPU time is 65% of total elapsed time
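
As a quick check of the interpretation above, the following Python sketch recomputes the CPU share from the three reported figures. The function name `cpu_share` is made up for illustration, and the elapsed time is read as 2 min 39 s = 159 s.

```python
def cpu_share(user_s: float, system_s: float, elapsed_s: float) -> float:
    """Return the fraction of elapsed (wall-clock) time spent on CPU work."""
    return (user_s + system_s) / elapsed_s


# Figures from the slide, in seconds.
user, system, elapsed = 90.7, 12.9, 159.0

cpu_time = user + system                  # total CPU time, about 103.6 s
share = cpu_share(user, system, elapsed)  # about 0.65

print(f"CPU time = {cpu_time:.1f} s, share of elapsed time = {share:.0%}")
```

The remaining 35% or so of the elapsed time went to waiting for I/O or to other processes, which is why elapsed time exceeds the sum of user and system time.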

7
A Definition of Performance
  • For some program running on machine X
  • Performance_X = 1 / Execution_time_X
  • Machine X is said to be n times faster than
    machine Y if
  • Performance_X / Performance_Y = n
  • Execution_time_Y / Execution_time_X = n
  • Example: Machine A runs a program in 10 seconds
    and machine B runs the same program in 15
    seconds. How much faster is A than B?
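
The example can be worked directly from the definition. This is a minimal Python sketch; the helper name `speedup` is an illustrative choice, not something from the slides.

```python
def speedup(time_slow: float, time_fast: float) -> float:
    """n such that the faster machine is n times faster:
    n = Execution_time_slow / Execution_time_fast."""
    return time_slow / time_fast


time_a = 10.0  # seconds on machine A
time_b = 15.0  # seconds on machine B

n = speedup(time_slow=time_b, time_fast=time_a)
print(f"A is {n:.1f} times faster than B")
```

Under this definition A is 1.5 times faster than B, since 15 / 10 = 1.5.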

8
Metrics of Performance
  • Time to execute a program is the ultimate
    metric in determining the performance
  • However, it is convenient to inspect other
    metrics as well when we examine the details of a
    machine.
  • Computers use a clock that runs at a constant
    rate and determines when an event takes place in
    hardware.
  • These discrete time intervals are called clock
    cycles (or ticks, clock ticks, clock periods).
  • Clock rate (frequency) is the inverse of clock
    period.

9
Clock Cycles
  • Clock ticks indicate when to start activities
  • Instead of reporting execution time in seconds,
    we often use cycles

10
Clock Cycle
  • Cycle time (CT) = time between ticks = seconds
    per cycle
  • Cycle count (CC) = the number of clock cycles to
    execute a program
  • Clock rate (frequency) = cycles per second (1
    Hz = 1 cycle/sec)
  • A 200 MHz clock has a 1/(200 × 10^6) s = 5 nanosecond
    cycle time
  • A 4 GHz clock has a 1/(4 × 10^9) s = 0.25 nanosecond
    cycle time
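
The two conversions follow from cycle time = 1 / clock rate. A small Python sketch, with `cycle_time_ns` as a hypothetical helper name:

```python
def cycle_time_ns(clock_rate_hz: float) -> float:
    """Cycle time in nanoseconds: (1 / clock rate) seconds, scaled by 10^9."""
    return 1e9 / clock_rate_hz


for label, rate_hz in [("200 MHz", 200e6), ("4 GHz", 4e9)]:
    print(f"{label}: {cycle_time_ns(rate_hz)} ns per cycle")
```

This gives 5 ns for the 200 MHz clock and 0.25 ns for the 4 GHz clock.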

11
CPI
  • CPI = Clock cycles Per Instruction
  • Number of cycles spent on an instruction on
    average.
  • CC = IC × CPI
  • Hard to compute.
  • It is useful when comparing the performance of
    two machines with the same ISA. (Why?)
  • Example: two machines with the same ISA. For a
    certain program we have
  • Machine A: CPI = 2.0
  • Machine B: CPI = 1.2
  • Which machine is faster?
  • What if machine A uses a 250 ps and machine B a 500
    ps cycle time?
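
One way to approach the last question: with the same ISA and the same program, both machines execute the same instruction count, so it is enough to compare the average time per instruction, CPI × cycle time. The sketch below assumes exactly that; the function name is illustrative.

```python
def time_per_instruction_ps(cpi: float, cycle_time_ps: float) -> float:
    """Average time per instruction in picoseconds: CPI x cycle time."""
    return cpi * cycle_time_ps


# Same ISA and same program, so IC is identical and cancels out.
a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250.0)
b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500.0)

ratio = b / a  # how many times faster A is than B
print(f"A: {a:.0f} ps, B: {b:.0f} ps, A is {ratio:.2f}x faster")
```

With equal cycle times, B would win because its CPI is lower. With the cycle times given, A needs about 500 ps per instruction against about 600 ps for B, so A is about 1.2 times faster.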

12
Improving Performance
  • So, to improve performance:
  • Increase the clock frequency (i.e., decrease the
    clock period)
  • Reduce the number of clock cycles per program
    (IC × CPI)

13
# Instructions = # Cycles?
  • No!
  • The number of cycles per instruction depends on
    the implementations of the instructions in
    hardware
  • The number differs for each processor (even with
    the same ISA)

14
The Reason
  • Different operations take different numbers of cycles
  • Multiplication takes longer than addition
  • Floating point operations take longer than
    integer operations
  • The access time to a register is much shorter
    than access to the main memory.

15
Simple Formulae for CPU Time
  • CPU execution time = CPU clock cycles for a
    program × Clock cycle time (CC × CT)
  • CPU execution time = CPU clock cycles for a
    program / Clock rate
  • We can write: CPU clock cycles for a program = IC
    × CPI
  • Then: CPU execution time = (IC × CPI) / Clock rate
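
The formulas above translate into a few lines of Python. This is only a sketch: the function name and the sample numbers (10^9 instructions, CPI of 2.0, 500 MHz) are hypothetical and chosen to make the arithmetic easy to follow.

```python
def cpu_time_seconds(instruction_count: float, cpi: float,
                     clock_rate_hz: float) -> float:
    """CPU execution time = (IC x CPI) / clock rate."""
    clock_cycles = instruction_count * cpi  # CC = IC x CPI
    return clock_cycles / clock_rate_hz     # CC / clock rate = CC x CT


# Hypothetical program: 10^9 instructions, CPI 2.0, 500 MHz clock.
t = cpu_time_seconds(instruction_count=1e9, cpi=2.0, clock_rate_hz=500e6)
print(f"CPU time = {t} s")
```

Here 2 × 10^9 cycles at 500 × 10^6 cycles per second take 4 seconds.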

16
Example
  • Computer A has an 800 MHz clock
  • It runs our favorite program in 15 s
  • Our goal
  • Design computer B with the same ISA
  • It will run the same program in 8 s.
  • We will use a new technology
  • It can increase the clock rate
  • However, it will also increase CPI by a factor of 1.25.
  • What clock rate should we aim for?
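
A possible solution, sketched in Python. It assumes that "increase CPI by 1.25" means CPI is multiplied by 1.25, and that the instruction count is unchanged because both computers share the ISA and run the same program; both are readings of the slide, not statements from it.

```python
clock_rate_a_hz = 800e6   # computer A: 800 MHz
time_a_s = 15.0           # computer A runs the program in 15 s
time_b_target_s = 8.0     # goal for computer B
cpi_factor = 1.25         # assumption: CPI_B = 1.25 x CPI_A

# Cycles on A: execution time x clock rate.
cycles_a = time_a_s * clock_rate_a_hz

# Same IC, CPI scaled by 1.25, so the cycle count scales by 1.25 too.
cycles_b = cpi_factor * cycles_a

# Clock rate B must reach: cycles / target time.
clock_rate_b_hz = cycles_b / time_b_target_s
print(f"Target clock rate for B: {clock_rate_b_hz / 1e9:.3f} GHz")
```

Under these assumptions B needs 1.5 × 10^10 cycles in 8 s, which is a clock rate of 1.875 GHz.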

17
Performance
  • Performance is determined by execution time (CPU
    time)
  • We also have other indicators
  • # of cycles to execute the program
  • # of instructions in the program (IC)
  • # of cycles per second
  • Average # of cycles per instruction (CPI)
  • Average # of instructions per second
  • Common pitfall: thinking one of the variables is
    indicative of performance when it really isn't.

18
Number of Instructions Example
  • A compiler designer has the following two
    alternatives for generating a certain piece of code
    with instructions A (1 cycle), B (2 cycles), and
    C (3 cycles)
  • 2 × 10^6 of A, 10^6 of B, and 2 × 10^6 of C (IC =
    5 × 10^6)
  • 4 × 10^6 of A, 10^6 of B, and 10^6 of C (IC = 6 × 10^6)
  • Which code sequence is faster?
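
Because both sequences run on the same machine, the one with fewer total clock cycles is faster. A short Python sketch that counts cycles for each alternative (the dictionary layout and names are illustrative):

```python
# Cycles per instruction class, as given on the slide.
CYCLES = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts: dict) -> int:
    """Sum of (instruction count x cycles per instruction) over all classes."""
    return sum(CYCLES[cls] * n for cls, n in counts.items())


seq1 = {"A": 2_000_000, "B": 1_000_000, "C": 2_000_000}
seq2 = {"A": 4_000_000, "B": 1_000_000, "C": 1_000_000}

for name, seq in [("Sequence 1", seq1), ("Sequence 2", seq2)]:
    ic = sum(seq.values())
    cc = total_cycles(seq)
    print(f"{name}: IC = {ic}, cycles = {cc}, CPI = {cc / ic:.2f}")
```

Sequence 1 needs 10 × 10^6 cycles (CPI 2.0) and sequence 2 needs 9 × 10^6 cycles (CPI 1.5), so the second sequence is faster even though it executes more instructions.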

19
MIPS
  • Million Instructions Per Second
  • MIPS = IC / (Execution_time × 10^6)
  • MIPS = IC / (# of clocks × cycle time × 10^6)
  • MIPS = (IC × clock rate) / (IC × CPI × 10^6)
  • MIPS = clock rate / (CPI × 10^6)
  • A faster machine has a higher MIPS
  • Equivalently, Execution_time = IC / (MIPS × 10^6)

20
A MIPS Example
  • A computer with a 500 MHz clock
  • Three different classes of instructions
  • A (1 cycle), B (2 cycles), C (3 cycles)
  • Two compilers are used to produce code for a large
    piece of software.
  • Compiler 1
  • 5 billion A, 1 billion B, and 1 billion C
    instructions.
  • Compiler 2
  • 10 billion A, 1 billion B, and 1 billion C
    instructions.
  • Which sequence will be faster according to
    execution time?
  • Which sequence will be faster according to MIPS?
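
Both questions can be answered with the definitions MIPS = IC / (Execution_time × 10^6) and Execution_time = cycles / clock rate. The Python sketch below applies them to the two compilers; the function and variable names are illustrative.

```python
CLOCK_RATE_HZ = 500e6              # 500 MHz clock
CYCLES = {"A": 1, "B": 2, "C": 3}  # cycles per instruction class


def evaluate(counts: dict) -> tuple:
    """Return (execution time in seconds, MIPS rating) for an instruction mix."""
    ic = sum(counts.values())
    cycles = sum(CYCLES[cls] * n for cls, n in counts.items())
    exec_time = cycles / CLOCK_RATE_HZ
    mips = ic / (exec_time * 1e6)
    return exec_time, mips


compiler1 = {"A": 5_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000}
compiler2 = {"A": 10_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000}

time1, mips1 = evaluate(compiler1)
time2, mips2 = evaluate(compiler2)
print(f"Compiler 1: {time1:.0f} s, {mips1:.0f} MIPS")
print(f"Compiler 2: {time2:.0f} s, {mips2:.0f} MIPS")
```

Compiler 1 yields 20 s at 350 MIPS, while compiler 2 yields 30 s at 400 MIPS: the code with the higher MIPS rating is the slower one.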

21
Problems of MIPS
  • MIPS specifies instruction execution rate
  • MIPS does not take into account the capabilities
    of the instructions
  • Thus, it is impossible to compare computers with
    different ISAs using MIPS.
  • MIPS is not constant, even on a single machine; it
    depends on the application.
  • As we saw in the previous example, MIPS can vary
    inversely with performance.

22
CPI example
  • CPI
  • Machine A: CPI = 10/7 = 1.43
  • Machine B: CPI = 15/12 = 1.25
  • CPU time
  • CPU time = (IC × CPI) / clock rate
  • Let us assume both machines use a 200 MHz clock

23
Overview
  • A given program will require
  • Some number of instructions
  • Some number of clock cycles
  • Some number of seconds
  • Vocabulary
  • Cycle time: (micro- or nano-) seconds per cycle
  • Clock rate (frequency): cycles per second
  • CPI: clock cycles per instruction
  • MIPS: millions of instructions per second
  • MFLOPS: millions of floating-point operations per
    second

24
Performance
  • Performance is ultimately determined by execution
    time
  • Is any of the following metrics a good measure of
    performance by itself? Why?
  • # of cycles to execute a program
  • # of instructions in a program
  • # of cycles per second
  • Average # of cycles per instruction
  • Average # of instructions per second

25
Question
  • Assuming two machines have the same ISA, which of
    the following quantities are identical?
  • Clock rate
  • CPI
  • Execution time
  • # of instructions
  • MIPS

26
Program Performance
Component              Affects
Algorithm              IC, possibly CPI
Programming language   IC, CPI
Compiler               IC, CPI
ISA                    IC, clock rate, CPI

27
Benchmarks
  • Programs specifically chosen to measure
    performance
  • Must reflect the typical workload of the user
  • Benchmark types
  • Real applications
  • Small benchmarks
  • Benchmark suites
  • Synthetic benchmarks

28
Real Applications
  • Workload: the set of programs a typical user runs day
    in and day out.
  • Using these real applications as metrics is a
    direct way of comparing the execution time of the
    workload on two machines.
  • Using real applications as metrics has certain
    restrictions
  • They are usually big
  • They take time to port to different machines
  • They take considerable time to execute
  • It is hard to observe the outcome of a certain
    improvement technique

29
Comparing and Summarizing Performance
             Computer A   Computer B
Program 1    1 s          100 s
Program 2    1000 s       100 s
Total time   1001 s       200 s
  • A is 100 times faster than B for program 1
  • B is 10 times faster than A for program 2
  • For total performance, arithmetic mean is used
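
For the table above, the arithmetic mean is simply total time divided by the number of programs. A minimal Python sketch, assuming each program counts equally (names are illustrative):

```python
def arithmetic_mean(values: list) -> float:
    """Plain (unweighted) arithmetic mean of execution times."""
    return sum(values) / len(values)


# Execution times in seconds for Program 1 and Program 2.
times_a = [1.0, 1000.0]
times_b = [100.0, 100.0]

am_a = arithmetic_mean(times_a)
am_b = arithmetic_mean(times_b)
print(f"AM(A) = {am_a} s, AM(B) = {am_b} s, B is {am_a / am_b:.3f}x faster")
```

The means are 500.5 s for A and 100 s for B, so by this summary B is about 5 times faster overall, the same ratio as the total times 1001 s and 200 s.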

30
Arithmetic Mean
  • If the programs in the workload do not run equally
    often, then we have to use the weighted
    arithmetic mean
  • Suppose that program 1 runs 10 times as often
    as program 2. Which machine is faster?

                      Weight   Computer A   Computer B
Program 1 (seconds)   10       1            100
Program 2 (seconds)   1        1000         100
Weighted AM           -        ?            ?
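
One way to fill in the last row is sketched below in Python. It assumes the weights are relative run frequencies and normalizes by their sum (10 + 1 = 11); the function name is illustrative.

```python
def weighted_mean(values: list, weights: list) -> float:
    """Weighted arithmetic mean: sum(w_i * t_i) / sum(w_i)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


weights = [10, 1]          # program 1 runs 10 times as often as program 2
times_a = [1.0, 1000.0]    # seconds on computer A
times_b = [100.0, 100.0]   # seconds on computer B

wam_a = weighted_mean(times_a, weights)
wam_b = weighted_mean(times_b, weights)
print(f"Weighted AM: A = {wam_a:.2f} s, B = {wam_b:.2f} s")
```

Under this weighting A averages about 91.82 s and B averages 100 s, so A comes out ahead, reversing the unweighted comparison.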