THE ROLE OF PERFORMANCE

Slides: 24
Provided by: JamiiruLu8
Learn more at: http://nia.ecsu.edu


1
CHAPTER 2
  • THE ROLE OF PERFORMANCE

2
Performance
  • Measure, Report, and Summarize
  • Make intelligent choices
  • Why is some hardware better than others for
    different programs?
  • What factors of system performance are hardware
    related? (e.g., do we need a new machine, or a
    new operating system?)
  • How does the machine's instruction set affect
    performance?

3
Objectives: Performance and Benchmarks
  • What do we mean by the performance of a computer
    and why are we concerned with it?
  • What's the best way to compare the performance of
    two machines?
  • What are benchmarks?  How useful are they?
  • Performance can be used to
  • Guide design decisions
  • Compare architectures/implementations/compilers
  • However, performance is in the eye of the
    beholder!
  • Response/Execution time: time between start and
    completion of a task
  • Throughput: total amount of work done in a given
    time (number of jobs processed per unit time)

4
Computer Performance: TIME, TIME, TIME
  • Response time (latency): How long does it take
    for my job to run? How long does it take to
    execute a job? How long must I wait for the
    database query?
  • Throughput: How many jobs can the machine run
    at once? What is the average execution
    rate? How much work is getting done?
  • If we upgrade a machine with a new processor what
    do we increase?
  • If we add a new machine to the lab what do we
    increase?

5
Measuring Performance
  • Factors that affect performance
  • How well the program uses the instructions of the
    machine
  • How well the underlying hardware implements the
    instructions
  • How well the memory and I/O systems perform
  • We will compare performance of different machines
    on the same task
  • Performance of machine X for a given program is
    defined as Performance(X) = 1 / Execution Time(X)
  • If the performance of X is better than that of Y:
    Execution Time(Y) > Execution Time(X), so
    Performance(X) > Performance(Y), because
    1 / Execution Time(X) > 1 / Execution Time(Y)
  • Speedup of architecture X over Y:
    Performance(X) / Performance(Y)
    = Execution Time(Y) / Execution Time(X) = n,
    meaning X is n times faster than Y
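
The definitions of performance and speedup can be sketched in a few lines of Python. The function names `performance` and `speedup` and the timings are illustrative assumptions, not part of the slides.

```python
def performance(execution_time_s: float) -> float:
    """Performance is the reciprocal of execution time (units: 1/seconds)."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return 1.0 / execution_time_s


def speedup(time_x_s: float, time_y_s: float) -> float:
    """How many times faster X is than Y: Perf(X)/Perf(Y) = Time(Y)/Time(X)."""
    return performance(time_x_s) / performance(time_y_s)


if __name__ == "__main__":
    # Hypothetical timings: X finishes the task in 20 s, Y in 25 s.
    print(f"X is {speedup(20.0, 25.0):.2f} times faster than Y")
```

A result above 1 means X is faster than Y; a result below 1 means it is slower.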

6
Examples
  • Example 1
  • Machine A does a task in 20 s; machine B does the
    same task in 25 s.
  • What is the performance of each machine?
    (P_A = 1/20, P_B = 1/25)
  • How much faster is A than B? (What is the
    speedup?) (T_B / T_A = 25/20 = 5/4 = 1.25)
  • Is "performance" a meaningful metric? (No, it
    depends on the task.)
  • Example 2: Machine A executes a program in 10 s.
  • If machine B is 1.3x faster than A, what is the
    execution time on machine B?
    (1.3 = P_B / P_A = T_A / T_B, so T_B = 10/1.3 ≈ 7.7 s)
  • If machine C is 1.5x slower than A, what is the
    execution time on machine C?
    (1.5 = P_A / P_C = T_C / T_A, so T_C = 15 s)
  • But how do we measure time?

7
Measuring Computer Time
  • The Unix time command, run on a program, reports
  • Real time: elapsed time from invocation to
    termination
  • User CPU time: time the CPU spends executing
    within this task
  • System CPU time: time spent on O/S tasks performed
    on behalf of this task
  • These measures (especially elapsed time) are what
    users perceive.  Is this response time or
    throughput?
  • How do you measure portions of a program? How do
    you measure time on Windows?
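
One possible answer to both questions, sketched in Python: wrap the portion of the program you care about between timer reads. `time.perf_counter()` and `os.times()` are standard-library calls available on Unix and Windows; `busy_work` and `measure` are hypothetical names used only for this illustration.

```python
import os
import time


def busy_work(n: int) -> int:
    """Stand-in workload (hypothetical): sum of squares below n."""
    return sum(i * i for i in range(n))


def measure(func, *args):
    """Return (result, wall-clock s, user CPU s, system CPU s) for one call."""
    wall_start = time.perf_counter()   # elapsed ("real") time
    cpu_start = os.times()             # user/system CPU time of this process
    result = func(*args)
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return (
        result,
        wall_end - wall_start,
        cpu_end.user - cpu_start.user,
        cpu_end.system - cpu_start.system,
    )


if __name__ == "__main__":
    _, real_s, user_s, sys_s = measure(busy_work, 1_000_000)
    print(f"real {real_s:.3f}s  user {user_s:.3f}s  sys {sys_s:.3f}s")
```

The three numbers correspond to the real, user, and system times that the Unix time command reports, but for one function call rather than the whole program.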

8
Clock cycles, Clock Rate and Execution Time
  • Computers are constructed using a clock that runs
    at a constant rate and determines when events
    take place in hardware.  These discrete time
    intervals are called clock cycles, ticks, clock
    periods, or cycles.
  • The length of a clock period is the time for a
    complete clock cycle (e.g., 2 nanoseconds, 2 ns).
  • Clock rate is the number of cycles per second,
    often expressed in megahertz (MHz).  Clock rate
    is the inverse of clock period:
    clock rate = 1 / cycle time.
  • What is the clock rate for a 2 ns cycle?
    1 / (2 × 10^-9 sec) = 500 × 10^6 Hz = 500 MHz
  • What is the clock period for a machine with a
    clock rate of 800 MHz?
  • What is the clock period for a machine with a
    clock rate of 400 MHz?
  • (Answer: 1 / (800 × 10^6) = 1.25 × 10^-9 sec;
    1 / (400 × 10^6) = 2.5 × 10^-9 sec)
  • Relationship: the faster the clock rate, the
    shorter the clock period.
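
The inverse relationship between clock rate and clock period can be checked with a small Python sketch. The helper names are assumptions made for this example.

```python
def clock_rate_hz(period_s: float) -> float:
    """Clock rate (cycles per second) is the inverse of the clock period."""
    return 1.0 / period_s


def clock_period_s(rate_hz: float) -> float:
    """Clock period (seconds per cycle) is the inverse of the clock rate."""
    return 1.0 / rate_hz


if __name__ == "__main__":
    # A 2 ns cycle, expressed in MHz.
    print(f"{clock_rate_hz(2e-9) / 1e6:.0f} MHz")
    # Periods for 800 MHz and 400 MHz clocks, expressed in ns.
    for mhz in (800, 400):
        print(f"{mhz} MHz -> {clock_period_s(mhz * 1e6) * 1e9:.2f} ns")
```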

9
Clock cycles, Clock Rate and Execution Time
  • Instead of reporting execution time in seconds,
    we often use cycles
  • Clock ticks indicate when to start activities
    (one abstraction)
  • cycle time (clock period) = time between ticks
    = seconds per cycle
  • clock rate (frequency) = cycles per second
    (1 Hz = 1 cycle/sec)
  • A 200 MHz clock ticks 200 × 10^6 times per second
  • A 200 MHz clock has a cycle time of
    1 / (200 × 10^6) sec = 5 × 10^-9 sec = 5 ns
10
Clock cycles, Clock Rate and Execution Time
  • How do we calculate execution time?
  • Factors
  • How many cycles to do all the work?
  • How long each cycle takes (Clock Period)?
  • Calculation of time using clock period (cycle
    period, cycle length):
    CPU Exec Time = clock cycles × clock period
    Units: seconds = cycles × seconds/cycle
  • Example: Assume a program requires 200 × 10^6
    cycles on a machine where each cycle takes 2 ns.
    What is the execution time?
    (200 × 10^6 × 2 × 10^-9 = 0.4 sec)
  • Calculation of time using clock rate (cycle
    frequency, clock frequency):
    Clock period = 1 / clock rate, therefore
    Execution Time = clock cycles / clock rate
    Units: seconds = cycles / (cycles/second)
  • Example: Assume a program requires 200 × 10^6
    cycles on a machine with a clock rate of 500 MHz.
    What is the execution time?
    (200 × 10^6 / (500 × 10^6) = 0.4 sec)
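
The two ways of computing execution time should agree whenever the clock period and clock rate describe the same clock. A minimal Python sketch, with function names chosen only for this illustration:

```python
def exec_time_from_period(cycles: float, period_s: float) -> float:
    """Execution time = clock cycles x clock period (seconds per cycle)."""
    return cycles * period_s


def exec_time_from_rate(cycles: float, rate_hz: float) -> float:
    """Execution time = clock cycles / clock rate (cycles per second)."""
    return cycles / rate_hz


if __name__ == "__main__":
    cycles = 200e6
    # A 2 ns period and a 500 MHz rate are the same clock.
    print(f"{exec_time_from_period(cycles, 2e-9):.2f} s")
    print(f"{exec_time_from_rate(cycles, 500e6):.2f} s")
```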

11
Examples
  • Example 1
  • Machine A runs at 500 MHz. Machine B runs at 650
    MHz. Program1 requires 100 × 10^6 clock cycles on
    machine A and 1.2 times that many on machine B.
    Which machine is faster?  By how much?
    Exec(A) = 100 × 10^6 / (500 × 10^6) = 0.2 seconds
     OR 100 × 10^6 × 2 × 10^-9 = 200 × 10^-3 = 0.2 s
    Exec(B) = 120 × 10^6 / (650 × 10^6) ≈ 0.185 seconds
    Machine B is 0.2 / 0.185 ≈ 1.08 times faster than A
  • Compare: B's clock rate is 650/500 = 1.3 times
    A's clock rate
  • Example 2
  • A program takes 10 seconds on a 500 MHz
    machine.
  • How many cycles must it require?
    Cycles = 10 seconds × 500 × 10^6 cycles/second
    = 5000 × 10^6 cycles
  • What clock rate would be needed to achieve a 1.2
    times speedup? (assuming clock cycles can stay
    the same)
  • Target execution time = 10 / 1.2 ≈ 8.33 sec
  • 5000 × 10^6 cycles / 8.33 sec ≈ 600 MHz
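
Example 2 can be replayed in Python without intermediate rounding, which gives a required clock rate of 500 MHz × 1.2 = 600 MHz. The function name and parameters are assumptions for this sketch, and the cycle count is assumed to stay fixed, as in the example.

```python
def required_clock_rate_hz(
    current_time_s: float, current_rate_hz: float, target_speedup: float
) -> float:
    """Clock rate needed for a target speedup, assuming the cycle count is fixed."""
    cycles = current_time_s * current_rate_hz       # cycles the program needs
    target_time_s = current_time_s / target_speedup
    return cycles / target_time_s


if __name__ == "__main__":
    rate = required_clock_rate_hz(10.0, 500e6, 1.2)
    print(f"required clock rate: {rate / 1e6:.0f} MHz")
```

With a fixed cycle count, the required clock rate scales linearly with the target speedup.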

12
How many cycles are required for a program?
  • Could assume that # of cycles = # of
    instructions
  • This assumption is incorrect: different
    instructions take different amounts of time on
    different machines. Why? Hint: remember that
    these are machine instructions, not lines of C
    code.

13
Different numbers of cycles for different
instructions
  • Multiplication takes more time than addition
  • Floating point operations take longer than
    integer ones
  • Accessing memory takes more time than accessing
    registers
  • Important point: changing the cycle time often
    changes the number of cycles required for various
    instructions (more later)

14
Cycles per Instruction (CPI)
  • Knowing the number of cycles per instruction
    (CPI) helps software designers avoid instructions
    with a high CPI in favor of those with a low CPI.
  • Program CPI: average number of clock cycles per
    instruction.
  • CPI depends on hardware implementation and
    instruction mix.  We may calculate it based on
    instruction counts OR based on relative
    instruction frequencies.
  • Example 1: Assume 3 types of instructions
  • Arithmetic (=, +, -, *, /) takes 4 cycles
  • Conditional (if) takes 3 cycles
  • I/O takes 5 cycles
  • Consider the following code segment:
      cin >> num1;
      cin >> num2;
      num3 = num1 + num2;
      if (num3 > 10) cout << "yes";
      else cout << "no";
  • a) How many cycles to complete?
    (5 + 5 + 8 + 3 + 5 = 26 cycles; the assignment
    statement uses two arithmetic operations)
  • b) What's the average number of cycles per
    instruction, counting each of the five executed
    statements as one instruction?
    (26/5 = 5.2 cycles)

15
Program Cycles per Instruction (CPI)
  • CPI calculation with instruction count: since
    CPI = CPU clock cycles / instruction count, the
    overall program CPU clock cycles = Σ(CPI_i ×
    Count_i), so that CPI = overall program cycles /
    number of instructions
  • Example 2: Assume Class A CPI = 1, Class B CPI = 2,
    Class C CPI = 3. A program requires 5 A, 3 B, 2 C
    instructions.  What is the CPI?
    CPU cycles = 5 × 1 + 3 × 2 + 2 × 3 = 17
    Instructions = 5 + 3 + 2 = 10
    Therefore CPI = 17 cycles / 10 instructions
    = 1.7 cycles/instruction
  • CPI calculation with relative frequencies: let f_i
    be the relative frequency of instruction class i
    with CPI_i cycles per instruction. Then
    Program CPI = Σ(CPI_i × f_i)
  • Example 3: Assume Class A CPI = 1, Class B CPI = 2,
    Class C CPI = 3, and the program uses 50% A, 30% B,
    20% C instructions.  What is the CPI?
    CPI = 0.5 × 1 + 0.3 × 2 + 0.2 × 3 = 1.7
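
Both ways of computing program CPI can be sketched in Python and checked against each other. The function names and the dictionary layout are assumptions made for this illustration.

```python
def cpi_from_counts(cpi_by_class: dict, count_by_class: dict) -> float:
    """Program CPI = total cycles / total instructions."""
    total_cycles = sum(cpi_by_class[c] * n for c, n in count_by_class.items())
    total_instructions = sum(count_by_class.values())
    return total_cycles / total_instructions


def cpi_from_frequencies(cpi_by_class: dict, freq_by_class: dict) -> float:
    """Program CPI = sum of CPI_i x f_i, where the f_i sum to 1."""
    if abs(sum(freq_by_class.values()) - 1.0) > 1e-9:
        raise ValueError("relative frequencies must sum to 1")
    return sum(cpi_by_class[c] * f for c, f in freq_by_class.items())


if __name__ == "__main__":
    cpi = {"A": 1, "B": 2, "C": 3}
    print(f"{cpi_from_counts(cpi, {'A': 5, 'B': 3, 'C': 2}):.2f}")
    print(f"{cpi_from_frequencies(cpi, {'A': 0.5, 'B': 0.3, 'C': 0.2}):.2f}")
```

The two calls agree because the relative frequencies are the instruction counts divided by the total count.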

16
Program Cycles per Instruction (CPI)
  • Why is CPI = Σ(CPI_i × f_i) true?
  • CPI = CPU clock cycles / instr. count
    = Σ(CPI_i × Count_i) / instr. count
    = Σ(CPI_i × Count_i / instr. count)
    = Σ(CPI_i × f_i)
  • Execution Time
  • Execution Time = cycles × cycle time
    = (CPI × instr. count) × cycle time
    = instruction count × CPI × cycle time
    = (instruction count × CPI) / clock rate
  • Example 1: How long would it take to execute a
    program with 100 × 10^6 instructions if CPI is 3
    and clock rate is 500 MHz?
  • (Answer: Time = 100 × 10^6 × 3 / (500 × 10^6)
    = 3/5 = 0.6 sec)
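
The execution time formula translates directly into code. This is a minimal Python sketch; the function name `cpu_time_s` is an assumption for illustration.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Execution time = (instruction count x CPI) / clock rate."""
    return instruction_count * cpi / clock_rate_hz


if __name__ == "__main__":
    # 100 x 10^6 instructions, CPI of 3, 500 MHz clock.
    print(f"{cpu_time_s(100e6, 3, 500e6):.2f} s")
```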

17
Improving Computer Performance
  • Time = Instruction Count × CPI × cycle time
  • Time = (Instructions / Program) × (Cycles /
    Instruction) × (Seconds / Cycle)
  • For a given instruction set architecture,
    increases in CPU performance come from three
    sources
  • Increases in clock rate
  • Improvements in processor organization that lower
    the CPI
  • Compiler enhancements that lower instruction
    count or generate lower average CPI
  • Which source was used to improve performance by
  • Using Intel Pentium III 933 MHz instead of Intel
    Pentium III 800 MHz.
  • Using Intel Pentium IV instead of Intel Pentium
    III.
  • Using release versions instead of debug versions
    of programs.
  • Very important: when comparing two machines, you
    must consider all three components of execution
    time.  If some factors are identical, then the
    comparison can be based on just the non-identical
    factors.
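
The last point can be illustrated with a short Python sketch: when two runs share the same instruction count and CPI, the speedup reduces to the ratio of clock rates. The `Run` class and the instruction count and CPI values are hypothetical; only the 800 MHz and 933 MHz clock rates come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One program on one machine (values are illustrative)."""
    instruction_count: float
    cpi: float
    clock_rate_hz: float

    @property
    def time_s(self) -> float:
        # Time = instructions x cycles/instruction x seconds/cycle
        return self.instruction_count * self.cpi / self.clock_rate_hz


def speedup(new: Run, old: Run) -> float:
    """How many times faster the new run is than the old one."""
    return old.time_s / new.time_s


if __name__ == "__main__":
    # Same program and same CPI assumed; only the clock rate changes.
    old = Run(instruction_count=1e9, cpi=2.0, clock_rate_hz=800e6)
    new = Run(instruction_count=1e9, cpi=2.0, clock_rate_hz=933e6)
    print(f"speedup:          {speedup(new, old):.3f}")
    print(f"clock-rate ratio: {933e6 / 800e6:.3f}")
```

If the instruction count or CPI also differed between the two runs, the clock-rate ratio alone would no longer predict the speedup.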

18
Improving Computer Performance: RISC vs. CISC
  • Time = (Instructions / Program) × (Cycles /
    Instruction) × (Seconds / Cycle)
  • Computer Architectures can be categorized as RISC
    or CISC (Reduced Instruction Set Computer vs.
    Complex Instruction Set Computer).
  • The CISC approach attempts to minimize the number
    of instructions per program, sacrificing the
    number of cycles per instruction.
  • Emphasizes improving hardware
  • Includes multi-clock complex instructions
  • RISC does the opposite, reducing the cycles per
    instruction at the cost of the number of
    instructions per program.
  • Emphasis on software
  • Includes only single-clock, reduced instructions
  • Modern architectures emphasize RISC

19
Improving Computer Performance
  • Example 2: Machine 1 and Machine 2 both have
    clock speeds of 500 MHz. On Machine 1, program P
    requires 100 × 10^6 instructions and has a CPI of
    2.5. On Machine 2, program P requires 90 × 10^6
    instructions and has a CPI of 3.
  • Which machine is faster?  By how much?
    (T1 = 0.5 sec, T2 = 0.54 sec; Machine 1 is 1.08
    times faster)
  • Evaluating Computer Performance
  • A company that runs the same set of programs day
    in, day out can use those programs (its workload)
    to compare systems (e.g. old vs. new)
  • What if a company does not fall into this
    category? Use some kind of rating.

20
Evaluating Computer Performance
  • Goal: a simple metric where a higher rating means
    better performance.
  • Some ratings are
  • Native MIPS
  • Peak MIPS
  • Relative MIPS
  • MOPS, MFLOPS
  • For all these measures, there is a tendency to
    generalize, which is not valid.
  • Benchmarks: programs specifically chosen to
    measure performance. The organization in charge
    of benchmarks is the System Performance
    Evaluation Cooperative (SPEC). The rating is the
    SPEC ratio with respect to some standard machine.
  • The higher the SPEC ratio, the better the
    machine.
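
The slides name these ratings without giving formulas. The Python sketch below uses the common textbook definitions, which are an assumption here rather than something the slides state: native MIPS is the instruction count divided by (execution time × 10^6), a SPEC ratio is the reference machine's time divided by the measured time, and several ratios are summarized with a geometric mean. All function names and numbers are illustrative.

```python
import math


def native_mips(instruction_count: float, exec_time_s: float) -> float:
    """Millions of instructions per second (assumed textbook definition)."""
    return instruction_count / (exec_time_s * 1e6)


def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Reference machine time / measured time; higher means faster."""
    return reference_time_s / measured_time_s


def geometric_mean(ratios: list) -> float:
    """Summary of several ratios that does not depend on the reference machine."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    # Hypothetical program: 100 x 10^6 instructions in 0.5 s.
    print(f"{native_mips(100e6, 0.5):.0f} MIPS")
    # Hypothetical benchmark times (reference, measured) in seconds.
    ratios = [spec_ratio(100.0, 50.0), spec_ratio(80.0, 10.0)]
    print(f"overall ratio: {geometric_mean(ratios):.2f}")
```

A MIPS figure depends on the instruction set and the program, so it only compares machines fairly when both run the same instructions, which is one reason generalizing from these ratings is not valid.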

21
SPEC 89 for IBM Powerstation 550
  • Compiler enhancements and performance

22
Summary
  • Performance of a computer can be measured by
    response/execution time (time between start and
    completion of a task) and throughput (total
    amount of work done in a given time).
  • Factors determining execution time are the number
    of cycles needed to do all the work and how long
    each cycle takes (clock period).
  • CPI helps software designers avoid Instructions
    with a high CPI in favor of those with a low CPI
    where possible.
  • Program CPI can be obtained from Instruction
    Count or from the instruction relative
    frequencies.
  • Improving performance means decreasing
    Time = Instruction Count × CPI × cycle time
    = (Instr. / Program) × (Cycles / Instr.) ×
    (Seconds / Cycle) by
  • Increases in clock rate
  • Improvements in processor organization that lower
    the CPI
  • Compiler enhancements that lower instruction
    count or generate lower average CPI
  • Ratings of computer performance include MIPS,
    MOPS, and MFLOPS, as well as benchmark results.

23
Performance Formulas