Title: THE ROLE OF PERFORMANCE
1. Chapter 2
2. Performance
- Measure, report, and summarize
- Make intelligent choices
- Why is some hardware better than others for different programs?
- What factors of system performance are hardware related? (e.g., do we need a new machine, or a new operating system?)
- How does the machine's instruction set affect performance?
3. Objectives: Performance and Benchmarks
- What do we mean by the performance of a computer, and why are we concerned with it?
- What's the best way to compare the performance of two machines?
- What are benchmarks? How useful are they?
- Performance can be used to
  - Guide design decisions
  - Compare architectures/implementations/compilers
- However, performance is in the eye of the beholder!
  - Response/execution time: time between the start and completion of a task
  - Throughput: total amount of work done in a given time (number of jobs processed per unit time)
4. Computer Performance: TIME, TIME, TIME
- Response time (latency)
  - How long does it take for my job to run?
  - How long does it take to execute a job?
  - How long must I wait for the database query?
- Throughput
  - How many jobs can the machine run at once?
  - What is the average execution rate?
  - How much work is getting done?
- If we upgrade a machine with a new processor, what do we increase?
- If we add a new machine to the lab, what do we increase?
5. Measuring Performance
- Factors that affect performance
  - How well the program uses the instructions of the machine
  - How well the underlying hardware implements the instructions
  - How well the memory and I/O systems perform
- We will compare the performance of different machines on the same task
- Performance of machine X for a given program is defined as
  Performance(X) = 1 / Execution Time(X)
- If the performance of X is better than that of Y:
  Execution Time(Y) > Execution Time(X)
  Performance(X) > Performance(Y), because 1 / Execution Time(X) > 1 / Execution Time(Y)
- Speedup of architecture X over Y:
  Performance(X) / Performance(Y) = Execution Time(Y) / Execution Time(X) = n, meaning X is n times faster than Y
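The definitions of performance and speedup can be sketched in a few lines of Python. The function names and the sample times (10 s and 15 s) are illustrative assumptions, not part of the slides.

```python
def performance(execution_time_s: float) -> float:
    """Performance is the reciprocal of execution time."""
    if execution_time_s <= 0:
        raise ValueError("execution time must be positive")
    return 1.0 / execution_time_s


def speedup(time_y_s: float, time_x_s: float) -> float:
    """Speedup of X over Y = Performance(X) / Performance(Y) = Time(Y) / Time(X)."""
    return performance(time_x_s) / performance(time_y_s)


# Assumed sample values: X finishes the task in 10 s, Y in 15 s.
n = speedup(time_y_s=15.0, time_x_s=10.0)
print(f"X is {n:.2f} times faster than Y")
```

A result greater than 1 means X is faster; a result below 1 means Y is faster.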
6. Examples
- Example 1
  - Machine A does a task in 20 s; machine B does the same task in 25 s.
  - What is the performance of each machine? (P_A = 1/20, P_B = 1/25)
  - How much faster is A than B? (What is the speedup?) (25/20 = 5/4 = 1.25)
  - Is "performance" a meaningful metric? (No; it depends on the task)
- Example 2: Machine A executes a program in 10 s.
  - If machine B is 1.3x faster than A, what is the execution time on machine B? (1.3 = P_B/P_A = T_A/T_B, so T_B = 10/1.3 ≈ 7.7 s)
  - If machine C is 1.5x slower than A, what is the execution time on machine C? (1.5 = P_A/P_C = T_C/T_A, so T_C = 15 s)
- But how do we measure time?
7. Measuring Computer Time
- The Unix time command, run on a program, reports
  - Real time: elapsed time from invocation to termination
  - User CPU time: time the CPU spends executing within this task
  - System CPU time: time spent on O/S tasks performed on behalf of this task
- These measures (especially elapsed time) are what users perceive. Is this response time or throughput?
- How do you measure portions of a program? How do you measure time on Windows?
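One possible way to time a portion of a program from inside the program itself is sketched below in Python, using the standard library calls time.perf_counter() for elapsed (real) time and os.times() for user and system CPU time; both are also available on Windows. The helper measure() and the workload busy_sum() are hypothetical names introduced only for this illustration.

```python
import os
import time


def measure(func, *args):
    """Run func(*args) once and report real, user CPU, and system CPU seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    result = func(*args)
    cpu_end = os.times()
    wall_end = time.perf_counter()
    timings = {
        "real": wall_end - wall_start,              # elapsed wall-clock time
        "user": cpu_end.user - cpu_start.user,      # CPU time in user code
        "sys": cpu_end.system - cpu_start.system,   # CPU time in the O/S
    }
    return result, timings


def busy_sum(n):
    """Hypothetical CPU-bound workload used only for this illustration."""
    return sum(i * i for i in range(n))


result, times = measure(busy_sum, 200_000)
print(times)
```

The reported values vary from run to run and from machine to machine, and the resolution of the CPU-time counters depends on the operating system.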
8. Clock Cycles, Clock Rate, and Execution Time
- Computers are constructed using a clock that runs at a constant rate and determines when events take place in hardware. These discrete time intervals are called clock cycles (also ticks, clock periods, or cycles).
- The length of a clock period is the time for a complete clock cycle (e.g., 2 nanoseconds, 2 ns).
- Clock rate is the number of cycles per second, often expressed in megahertz (MHz). Clock rate is the inverse of the clock period: clock rate = 1 / cycle time.
- What is the clock rate for a 2 ns cycle? 1 / (2 × 10^-9 s) = 500 × 10^6 Hz = 500 MHz
- What is the clock period for a machine with a clock rate of 800 MHz?
- What is the clock period for a machine with a clock rate of 400 MHz?
- (Answer: 1 / (800 × 10^6) = 1.25 × 10^-9 s; 1 / (400 × 10^6) = 2.5 × 10^-9 s)
- Relationship: the faster the clock rate, the shorter the clock period.
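The inverse relationship between clock rate and clock period can be checked with a short Python sketch. The function names are assumptions made for this illustration; the numeric values are the ones used in the questions above.

```python
def rate_from_period(period_s: float) -> float:
    """Clock rate in Hz for a given clock period in seconds."""
    return 1.0 / period_s


def period_from_rate(rate_hz: float) -> float:
    """Clock period in seconds for a given clock rate in Hz."""
    return 1.0 / rate_hz


# A 2 ns cycle corresponds to a 500 MHz clock.
print(f"{rate_from_period(2e-9) / 1e6:.0f} MHz")

# Clock periods, in nanoseconds, for 800 MHz and 400 MHz clocks.
for mhz in (800, 400):
    print(f"{mhz} MHz -> {period_from_rate(mhz * 1e6) * 1e9:.2f} ns")
```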
9. Clock Cycles, Clock Rate, and Execution Time
- Instead of reporting execution time in seconds, we often use cycles
- Clock ticks indicate when to start activities (one abstraction)
- Cycle time (clock period) = time between ticks = seconds per cycle
- Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/s)
- A 200 MHz clock ticks 200 × 10^6 times per second
- A 200 MHz clock has a cycle time of 1 / (200 × 10^6) s = 5 ns
10. Clock Cycles, Clock Rate, and Execution Time
- How do we calculate execution time?
- Factors
  - How many cycles does it take to do all the work?
  - How long does each cycle take (clock period)?
- Calculation of time using clock period (cycle period, cycle length)
  - CPU execution time = clock cycles × clock period
  - Units: seconds = cycles × seconds/cycle
  - Example: Assume a program requires 200 × 10^6 cycles on a machine where each cycle takes 2 ns. What is the execution time? (200 × 10^6 × 2 × 10^-9 = 0.4 s)
- Calculation of time using clock rate (cycle frequency, clock frequency)
  - Clock period = 1 / clock rate
  - Therefore, execution time = clock cycles / clock rate
  - Units: seconds = cycles / (cycles/second)
  - Example: Assume a program requires 200 × 10^6 cycles on a machine with a clock rate of 500 MHz. What is the execution time? (200 × 10^6 / (500 × 10^6) = 0.4 s)
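Both forms of the execution-time calculation give the same answer, which a minimal Python sketch can confirm. The function names are assumptions for illustration; the inputs are the values from the two examples above.

```python
def time_from_period(cycles: float, period_s: float) -> float:
    """Execution time = clock cycles x clock period."""
    return cycles * period_s


def time_from_rate(cycles: float, rate_hz: float) -> float:
    """Execution time = clock cycles / clock rate."""
    return cycles / rate_hz


cycles = 200e6
print(f"{time_from_period(cycles, 2e-9):.1f} s")   # 2 ns per cycle
print(f"{time_from_rate(cycles, 500e6):.1f} s")    # 500 MHz clock
```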
11. Examples
- Example 1
  - Machine A runs at 500 MHz. Machine B runs at 650 MHz. Program1 requires 100 × 10^6 clock cycles on machine A and 1.2 times that many on machine B. Which machine is faster? By how much?
  - Exec(A) = 100 × 10^6 / (500 × 10^6) = 0.2 s, or equivalently 100 × 10^6 × 2 × 10^-9 = 200 × 10^-3 = 0.2 s
  - Exec(B) = 120 × 10^6 / (650 × 10^6) ≈ 0.185 s
  - Machine B is 0.2/0.185 ≈ 1.08 times faster than A
  - Compare: the clock rate of B is 650/500 = 1.3 times that of A
- Example 2
  - A program takes 10 seconds on a 500 MHz machine.
  - How many cycles must it require? Cycles = 10 s × 500 × 10^6 cycles/s = 5000 × 10^6 cycles
  - What clock rate would be needed to achieve a 1.2 times speedup (assuming the number of clock cycles stays the same)?
  - Target execution time = 10/1.2 ≈ 8.33 s
  - 5000 × 10^6 / 8.33 ≈ 600 MHz
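The two examples can be reproduced with the following Python sketch, which carries full precision through the calculation instead of rounding intermediate results. The function names are assumptions introduced for illustration.

```python
def execution_time_s(cycles: float, clock_rate_hz: float) -> float:
    """Execution time = clock cycles / clock rate."""
    return cycles / clock_rate_hz


def required_clock_rate_hz(cycles: float, target_time_s: float) -> float:
    """Clock rate needed to finish the given cycles in the target time."""
    return cycles / target_time_s


# Example 1: B needs 1.2 times as many cycles as A but has the faster clock.
time_a = execution_time_s(100e6, 500e6)
time_b = execution_time_s(1.2 * 100e6, 650e6)
print(f"A: {time_a:.3f} s, B: {time_b:.3f} s")
print(f"B is {time_a / time_b:.2f} times faster than A")

# Example 2: clock rate for a 1.2 times speedup with the cycle count unchanged.
cycles = 10 * 500e6
rate = required_clock_rate_hz(cycles, 10 / 1.2)
print(f"required clock rate: {rate / 1e6:.0f} MHz")
```

In Example 1 the speedup is smaller than the 1.3 ratio of the clock rates because B executes more cycles for the same program.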
12. How many cycles are required for a program?
- Could assume that # of cycles = # of instructions
- This assumption is incorrect: different instructions take different amounts of time on different machines. Why? Hint: remember that these are machine instructions, not lines of C code.
13. Different numbers of cycles for different instructions
- Multiplication takes more time than addition
- Floating-point operations take longer than integer ones
- Accessing memory takes more time than accessing registers
- Important point: changing the cycle time often changes the number of cycles required for various instructions (more later)
14. Cycles per Instruction (CPI)
- Knowing the number of cycles per instruction (CPI) helps software designers avoid instructions with a high CPI in favor of those with a low CPI.
- Program CPI: average number of clock cycles per instruction.
- CPI depends on the hardware implementation and the instruction mix. We may calculate it based on instruction counts OR based on relative instruction frequencies.
- Example 1: Assume 3 types of instructions
  - Arithmetic (=, +, -, *, /) takes 4 cycles
  - Conditional (if) takes 3 cycles
  - I/O takes 5 cycles
- Consider the following code segment
  - cin >> num1; cin >> num2; num3 = num1 + num2; if (num3 > 10) cout << "yes"; else cout << "no";
  - a) How many cycles to complete? (5 + 5 + 8 + 3 + 5 = 26 cycles; the statement num3 = num1 + num2 uses two arithmetic operations)
  - b) What's the average number of cycles per instruction? (26/5 = 5.2 cycles, counting each of the 5 executed statements as one instruction)
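The cycle count for Example 1 can be tallied with a small Python sketch. It assumes, as the answer of 26 cycles implies, that the statement computing num3 costs two arithmetic operations (the assignment and the addition) and that only one branch of the if statement executes. The table layout and variable names are illustrative assumptions.

```python
# Cycle cost per instruction type, taken from Example 1.
CYCLES = {"arithmetic": 4, "conditional": 3, "io": 5}

# Statements executed in one run of the code segment, with the operation
# types each one uses. Only one branch of the if statement runs.
executed = [
    ("cin >> num1", ["io"]),
    ("cin >> num2", ["io"]),
    ("num3 = num1 + num2", ["arithmetic", "arithmetic"]),  # '=' and '+'
    ("if (num3 > 10)", ["conditional"]),
    ("cout << ...", ["io"]),
]

total_cycles = sum(CYCLES[op] for _, ops in executed for op in ops)
statements = len(executed)

print(total_cycles)               # 26
print(total_cycles / statements)  # 5.2
```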
15. Program Cycles per Instruction (CPI)
- CPI calculation with instruction count: since CPI = CPU clock cycles / instruction count, the overall program CPU clock cycles = Σ(CPI_i × Count_i), so that CPI = overall program cycles / number of instructions
- Example 2: Assume Class A CPI = 1, Class B CPI = 2, Class C CPI = 3. The program requires 5 A, 3 B, and 2 C instructions. What is the CPI?
  - CPU cycles = 5 × 1 + 3 × 2 + 2 × 3 = 17
  - Instructions = 5 + 3 + 2 = 10
  - Therefore CPI = 17 cycles / 10 instructions = 1.7 cycles/instruction
- CPI calculation with relative frequencies: let f_i be the relative frequency of instruction class i with CPI_i cycles per instruction. Then program CPI = Σ(CPI_i × f_i)
- Example 3: Assume Class A CPI = 1, Class B CPI = 2, Class C CPI = 3, and the program uses 50% A, 30% B, and 20% C instructions. What is the CPI?
  - CPI = 0.5 × 1 + 0.3 × 2 + 0.2 × 3 = 1.7
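Both ways of computing program CPI can be sketched in Python and checked against Examples 2 and 3. The function names and the list-of-pairs input format are assumptions chosen for this illustration.

```python
def cpi_from_counts(classes):
    """classes: list of (cpi, instruction_count) pairs, one per instruction class."""
    total_cycles = sum(cpi * count for cpi, count in classes)
    total_instructions = sum(count for _, count in classes)
    return total_cycles / total_instructions


def cpi_from_frequencies(classes):
    """classes: list of (cpi, relative_frequency) pairs; frequencies sum to 1."""
    return sum(cpi * freq for cpi, freq in classes)


# Example 2: 5 class A, 3 class B, and 2 class C instructions.
print(f"{cpi_from_counts([(1, 5), (2, 3), (3, 2)]):.1f}")

# Example 3: the same mix expressed as 50%, 30%, and 20%.
print(f"{cpi_from_frequencies([(1, 0.5), (2, 0.3), (3, 0.2)]):.1f}")
```

The two results agree because each relative frequency is the class count divided by the total instruction count.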
16. Program Cycles per Instruction (CPI)
- Why is CPI = Σ(CPI_i × f_i) true?
  - CPI = CPU clock cycles / instr. count = Σ(CPI_i × Count_i) / instr. count = Σ(CPI_i × Count_i / instr. count) = Σ(CPI_i × f_i)
- Execution time
  - Execution time = cycles × cycle time = (CPI × instr. count) × cycle time = instruction count × CPI × cycle time = (instruction count × CPI) / clock rate
- Example 1: How long would it take to execute a program with 100 × 10^6 instructions if the CPI is 3 and the clock rate is 500 MHz?
  - (Answer: time = 100 × 10^6 × 3 / (500 × 10^6) = 3/5 = 0.6 s)
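The execution-time formula can be written as a one-line Python function and checked against Example 1. The function name is an assumption for illustration.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_rate_hz: float) -> float:
    """Execution time = (instruction count x CPI) / clock rate."""
    return instruction_count * cpi / clock_rate_hz


# Example 1: 100 x 10^6 instructions, CPI of 3, 500 MHz clock.
print(f"{cpu_time_s(100e6, 3, 500e6):.1f} s")
```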
17. Improving Computer Performance
- Time = instruction count × CPI × cycle time
- Time = (instructions / program) × (cycles / instruction) × (seconds / cycle)
- For a given instruction set architecture, increases in CPU performance come from three sources
  - Increases in clock rate
  - Improvements in processor organization that lower the CPI
  - Compiler enhancements that lower the instruction count or generate a lower average CPI
- Which source was used to improve performance by
  - Using an Intel Pentium III 933 MHz instead of an Intel Pentium III 800 MHz?
  - Using an Intel Pentium IV instead of an Intel Pentium III?
  - Using release versions instead of debug versions of programs?
- Very important: when comparing two machines, you must consider all three components of execution time. If some factors are identical, the comparison can be based on just the non-identical factors.
18. Improving Computer Performance: RISC vs. CISC
- Time = (instructions / program) × (cycles / instruction) × (seconds / cycle)
- Computer architectures can be categorized as RISC or CISC (Reduced Instruction Set Computer vs. Complex Instruction Set Computer).
- The CISC approach attempts to minimize the number of instructions per program, sacrificing the number of cycles per instruction.
  - Emphasizes improving hardware
  - Includes multi-clock complex instructions
- RISC does the opposite, reducing the cycles per instruction at the cost of the number of instructions per program.
  - Emphasizes software
  - Includes only single-clock reduced instructions
- Modern architectures emphasize RISC
19. Improving Computer Performance
- Example 2: Machine 1 and Machine 2 both have clock speeds of 500 MHz. On Machine 1, program P requires 100 × 10^6 instructions and has a CPI of 2.5. On Machine 2, program P requires 90 × 10^6 instructions and has a CPI of 3.
  - Which machine is faster? By how much? (T1 = 0.5 s, T2 = 0.54 s; Machine 1 is 1.08 times faster)
- Evaluating computer performance
  - A company that runs the same set of programs day in, day out can use those same programs (its workload) to compare systems (e.g., old vs. new)
  - What if a company does not fall into this category? Use some kind of rating.
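The comparison of Machine 1 and Machine 2 in Example 2 uses all three factors of execution time. A minimal Python sketch follows; the Run class and its field names are hypothetical and introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One program run, described by the three factors of execution time."""
    instruction_count: float
    cpi: float
    clock_rate_hz: float

    def time_s(self) -> float:
        # Time = instruction count x CPI / clock rate
        return self.instruction_count * self.cpi / self.clock_rate_hz


m1 = Run(instruction_count=100e6, cpi=2.5, clock_rate_hz=500e6)
m2 = Run(instruction_count=90e6, cpi=3.0, clock_rate_hz=500e6)

t1, t2 = m1.time_s(), m2.time_s()
faster, ratio = ("Machine 1", t2 / t1) if t1 < t2 else ("Machine 2", t1 / t2)
print(f"T1 = {t1:.2f} s, T2 = {t2:.2f} s; {faster} is {ratio:.2f} times faster")
```

Because the clock rates are identical, the outcome depends only on the product of instruction count and CPI: Machine 2 executes fewer instructions, but its higher CPI more than cancels that advantage.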
20. Evaluating Computer Performance
- Goal: a simple metric where a higher rating means better performance.
- Some ratings are
  - Native MIPS
  - Peak MIPS
  - Relative MIPS
  - MOPS, MFLOPS
- For all these measures, there is a tendency to generalize, which is not valid.
- Benchmarks: programs specifically chosen to measure performance. The organization in charge of these benchmarks is the System Performance Evaluation Cooperative (SPEC). The rating is the SPEC ratio with respect to some standard machine.
- The higher the SPEC ratio, the better the machine.
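Two of these ratings can be sketched in Python. The slide does not give formulas, so the sketch assumes the common textbook definitions: native MIPS = instruction count / (execution time × 10^6), and a SPEC ratio for one benchmark = execution time on the reference machine / execution time on the machine being rated. The function names and the sample times passed to spec_ratio are hypothetical.

```python
def native_mips(instruction_count: float, execution_time_s: float) -> float:
    """Millions of instructions per second (assumed textbook definition)."""
    return instruction_count / (execution_time_s * 1e6)


def spec_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Ratio of reference-machine time to measured time; higher is better."""
    return reference_time_s / measured_time_s


# A program with 100 x 10^6 instructions that runs in 0.6 s.
print(f"{native_mips(100e6, 0.6):.1f} MIPS")

# Hypothetical benchmark times: 500 s on the reference machine, 25 s measured.
print(f"SPEC ratio: {spec_ratio(500.0, 25.0):.1f}")
```

A MIPS figure describes only the program it was measured on, which is one reason generalizing from such a rating is not valid.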
21. SPEC 89 for IBM Powerstation 550
- Compiler enhancements and performance
22. Summary
- The performance of a computer can be measured by response/execution time (time between the start and completion of a task) and throughput (total amount of work done in a given time).
- The factors determining execution time are the number of cycles needed to do all the work and how long each cycle takes (clock period).
- CPI helps software designers avoid instructions with a high CPI in favor of those with a low CPI where possible.
- Program CPI can be obtained from instruction counts or from relative instruction frequencies.
- Improving performance means decreasing Time = instruction count × CPI × cycle time = (instructions / program) × (cycles / instruction) × (seconds / cycle) by
  - Increases in clock rate
  - Improvements in processor organization that lower the CPI
  - Compiler enhancements that lower the instruction count or generate a lower average CPI
- Ratings of computer performance include MIPS, MOPS, and MFLOPS, as well as benchmarks.
23. Performance Formulas