Title: CS1104: Computer Organisation http://www.comp.nus.edu.sg/~cs1104
1CS1104 Computer Organisation http://www.comp.nus.edu.sg/~cs1104
- School of Computing
- National University of Singapore
2PII Lecture 2 Performance
- Performance Definition
- Factors Affecting Performance
- Measurement Parameters for Performance
- Co-relation Among Performance Parameters
- Chapter 2 of Patterson's book: 2.1, 2.2, 2.3,
first part of 2.7 and 2.9.
3Performance Definition
- Purchasing perspective
- Given a collection of machines, which has the
- best performance?
- least cost?
- best performance/cost?
- Design perspective
- Faced with design options, which has the
- best performance improvement?
- least cost?
- best performance/cost?
4Performance Definition (2)
- Both require
- basis for comparison
- metric for evaluation
- Our goal is to understand the performance of a
machine's architectural design.
5Two Notions of Performance
- Which has higher performance?
- Time to do ONE task (execution time)
- Execution time, response time, latency
- Tasks per day, hour, week, (performance)
- Throughput, bandwidth.
- Response time and throughput might be in
opposition.
6Two Notions of Performance (2)
- Flying time of AirBus vs. Boeing 747?
- AirBus is 1350 mph / 610 mph = 2.2 times faster (= 6.5 hours / 3 hours)
- Throughput of AirBus vs. Boeing 747 (pmph = passenger-miles per hour)?
- AirBus is 178,200 pmph / 286,700 pmph = 0.62 times as fast
- Boeing is 286,700 pmph / 178,200 pmph = 1.6 times faster
- Boeing is 1.6 times (60%) faster in terms of throughput.
- AirBus is 2.2 times (120%) faster in terms of flying time.
7Definitions
- Performance is in units of things-per-second
- Bigger is better
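- If we are primarily concerned with response time:
- performance(X) = 1 / execution_time(X)
- "X is n times faster than Y" means the speedup n is:
- n = performance(X) / performance(Y) = execution_time(Y) / execution_time(X)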
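A minimal sketch of these definitions, assuming the usual convention that performance is the reciprocal of execution time. The function names are illustrative, and the flying times are taken from the aircraft comparison earlier in the lecture.

```python
def performance(execution_time: float) -> float:
    """Performance taken as the reciprocal of execution time (bigger is better)."""
    return 1.0 / execution_time


def speedup(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y."""
    return performance(time_x) / performance(time_y)


# Flying times from the aircraft comparison: 3 hours vs. 6.5 hours.
n = speedup(3.0, 6.5)
print(f"AirBus is about {n:.1f} times faster in flying time")
```

Note that the speedup reduces to execution_time(Y) / execution_time(X), so the slower machine's time goes in the numerator.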
8Computer Performance
- Response Time (latency)
- Time between start and end of an event (execution time).
- How long does it take for my job to run?
- How long does it take to execute a job?
- How long must I wait for the database query?
- Throughput
- Total amount of work (or number of jobs) done in a given time.
- How many jobs can the machine run at once?
- What is the average execution rate?
- How much work is getting done?
9Computer Performance (2)
- If we upgrade a machine with a new processor,
what do we improve? Answer: Response time (it decreases).
- If we add a new machine to the lab, what do we
increase? Answer: Throughput.
10Execution Time
- Elapsed Time
- counts everything (disk and memory accesses, I/O, etc.)
- a useful number, but often not good for comparison purposes.
- CPU time
- doesn't count I/O or time spent running other programs.
- can be broken up into system time and user time.
11Execution Time (2)
- Our focus: user CPU time
- time spent executing the lines of code that are
"in" our program.
12Execution Time (3)
- Instead of reporting execution time in seconds,
we often use clock cycles (basic time unit in
machine).
- Cycle time (or cycle period): time between two
consecutive rising edges, measured in seconds.
- Clock rate (or clock frequency): cycles per
second (1 Hz = 1 cycle/second).
- Example: A 200 MHz clock has a cycle time of
1 / (200 × 10^6) = 5 × 10^-9 seconds = 5 nanoseconds.
...
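The relation between clock rate, cycle time and execution time can be sketched as follows. This assumes the standard relation execution time = cycles × cycle time. The function names and the cycle count in the second call are hypothetical values chosen only for illustration.

```python
def cycle_time_s(clock_rate_hz: float) -> float:
    """Cycle time in seconds is the reciprocal of the clock rate in Hz."""
    return 1.0 / clock_rate_hz


def cpu_time_s(cycles: float, clock_rate_hz: float) -> float:
    """Execution time = number of cycles x cycle time."""
    return cycles * cycle_time_s(clock_rate_hz)


# The 200 MHz example from the slide.
t = cycle_time_s(200e6)
print(f"cycle time: {t * 1e9:.1f} ns")

# Hypothetical program needing 4e9 cycles on a 400 MHz machine.
print(f"cpu time: {cpu_time_s(4e9, 400e6):.1f} s")
```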
13Execution Time (4)
- Therefore, to improve performance (everything
else being equal), you can do the following:
- Reduce the number of cycles for a program, or
- Reduce the clock cycle time, or said in another way,
- Increase the clock rate.
14Cycles in a Program
- Could we assume that number of cycles = number
of instructions? Or that number of cycles is
proportional to number of instructions?
15Cycles in a Program (2)
- No, the assumption is incorrect.
- Different instructions take different amounts of time to finish.
- For example, a Multiply instruction may take more
cycles than an Add instruction, and floating-point
operations take longer than integer operations.
- Accessing memory takes more time than accessing
registers.
16Example 1 Clock Rate
- Our favorite program runs in 10 seconds on
computer A, which has a 400 MHz clock. We are
trying to help a computer designer build a new
machine B that will run this program in 6
seconds. The designer can use new (or perhaps
more expensive) technology to substantially
increase the clock rate, but has informed us that
this increase will affect the rest of the CPU
design, causing machine B to require 1.2 times as
many clock cycles as machine A for the same
program. What clock rate should we tell the
designer to target?
- ANSWER
- For A: Time = 10 sec = cycles × 1 / 400 MHz
- For B: Time = 6 sec = (1.2 × cycles) × 1 / clock_rate
- => clock_rate = 10 × 400 × 1.2 / 6 MHz = 800 MHz
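The arithmetic in this answer can be checked with a short sketch. The variable names are illustrative; the numbers come from the example.

```python
# Machine A: known execution time and clock rate.
time_a_s = 10.0
clock_a_hz = 400e6

# Machine B: target time, and 1.2x the cycles of A for the same program.
time_b_s = 6.0
cycle_ratio = 1.2

cycles_a = time_a_s * clock_a_hz   # cycles = time x clock rate
cycles_b = cycle_ratio * cycles_a
clock_b_hz = cycles_b / time_b_s   # clock rate = cycles / time

print(f"Target clock rate for B: {clock_b_hz / 1e6:.0f} MHz")
```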
17Cycles per Instruction (CPI)
- A given program will require
- some number of instructions (machine instructions)
- some number of cycles (= instructions × CPI)
- some number of seconds (= cycles × cycle time)
18Cycles per Instruction (CPI) (2)
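Average cycles per instruction: CPI = (CPU Time ×
Clock Rate) / Instruction Count = Clock Cycles /
Instruction Count
CPI = Σ (CPI_k × I_k), summed over instruction classes k,
where CPI_k = cycles per instruction of class k and
I_k = instruction frequency of class k
- Invest resources where time is spent!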
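A minimal sketch of the two equivalent ways of computing overall CPI: total cycles divided by instruction count, and a frequency-weighted sum of per-class CPIs. The class names and numbers below are hypothetical, chosen only to illustrate the calculation.

```python
def overall_cpi(cpi_by_class, count_by_class):
    """Overall CPI = total clock cycles / total instruction count."""
    total_cycles = sum(cpi_by_class[k] * n for k, n in count_by_class.items())
    return total_cycles / sum(count_by_class.values())


def overall_cpi_from_frequencies(cpi_by_class, freq_by_class):
    """Overall CPI = sum of (class CPI x class frequency); frequencies sum to 1."""
    return sum(cpi_by_class[k] * f for k, f in freq_by_class.items())


# Hypothetical instruction classes and dynamic instruction counts.
cpi = {"load": 2, "alu": 1, "branch": 3}
counts = {"load": 30, "alu": 50, "branch": 20}
freqs = {k: n / sum(counts.values()) for k, n in counts.items()}

print(overall_cpi(cpi, counts))
print(overall_cpi_from_frequencies(cpi, freqs))
```

Both calls should agree up to floating-point rounding. The per-class products also show which class contributes the most cycles, which is where optimisation effort pays off.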
19Aspects of CPU Performance
- Performance is determined by execution time.
- Do any of these variables equal performance?
- number of cycles to execute program?
- number of instructions in program?
- number of cycles per second?
- average number of cycles per instruction?
- average number of instructions per second?
- Answer: No to all.
- Common pitfall: thinking that one of the
variables is indicative of performance when it
really isn't.
20Aspects of CPU Performance (2)
- CPU performance depends on
- Clock cycle time: hardware technology and organization
- CPI: organization and ISA
- Instruction count: ISA and compiler
- Be careful of the following concepts
- Machine => ISA and hardware organization
- Machine => cycle time
- ISA + hardware organization => number of cycles
for any instruction (this is not average CPI)
- ISA + compiler + program => number of instructions
executed
- Therefore, ISA + compiler + program + hardware
organization + cycle time => total CPU time.
21Example 2 CPI
- Suppose we have two implementations of the same
instruction set architecture (ISA). For some
program:
- Machine A has a clock cycle time of 10 ns and a CPI of 2.0.
- Machine B has a clock cycle time of 20 ns and a CPI of 1.2.
- Which machine is faster for this program, and by how much?
- ANSWER
- Machine A time = #Inst × 2.0 (CPI) × 10 ns (cycle time)
- Machine B time = #Inst × 1.2 (CPI) × 20 ns (cycle time)
- Performance(A) / Performance(B) = 1.2 × 20 / (2.0 × 10) = 1.2,
so A is 1.2 times faster.
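Because both machines run the same program on the same ISA, the instruction count cancels and only CPI × cycle time matters. A small sketch of that comparison, with illustrative names:

```python
def time_per_instruction_ns(cpi: float, cycle_time_ns: float) -> float:
    """Average time per instruction = CPI x cycle time."""
    return cpi * cycle_time_ns


time_a = time_per_instruction_ns(2.0, 10.0)  # machine A
time_b = time_per_instruction_ns(1.2, 20.0)  # machine B

# Performance(A) / Performance(B) = time(B) / time(A)
ratio = time_b / time_a
print(f"A is {ratio:.1f} times faster than B")
```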
22Example 3 Number of Instructions
- A compiler designer is trying to decide between
two code sequences for a particular machine.
Based on the hardware implementation, there are
three different classes of instructions: Class
A, Class B, and Class C, and they require one,
two, and three cycles (respectively).
- First code seq. has 5 instructions: 2 of A, 1 of B, and 2 of C.
- Second seq. has 6 instructions: 4 of A, 1 of B, and 1 of C.
- Which sequence will be faster? By how much?
- What is the CPI for each sequence?
- ANSWER
- CPI(S1) = (2×1 + 1×2 + 2×3) / (2 + 1 + 2) = 2
- CPI(S2) = (4×1 + 1×2 + 1×3) / (4 + 1 + 1) = 1.5
- Time(S1) = (2×1 + 1×2 + 2×3) × cycle_time = 10 × cycle_time
- Time(S2) = (4×1 + 1×2 + 1×3) × cycle_time = 9 × cycle_time
- Time(S1) / Time(S2) = 10/9 = 1.1, and S2 is faster.
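The same calculation as a short sketch. The dictionary layout and function names are illustrative; the cycle costs and instruction counts are those given in the example.

```python
# Cycles required by each instruction class.
CYCLES = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for the two code sequences.
seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}


def total_cycles(seq):
    """Sum of (instruction count x cycles) over all classes."""
    return sum(CYCLES[c] * n for c, n in seq.items())


def cpi(seq):
    """Average cycles per instruction for the sequence."""
    return total_cycles(seq) / sum(seq.values())


for name, seq in (("S1", seq1), ("S2", seq2)):
    print(name, "cycles:", total_cycles(seq), "CPI:", cpi(seq))
print("Time(S1) / Time(S2) =", round(total_cycles(seq1) / total_cycles(seq2), 2))
```

The sequence with more instructions (S2) is still faster, since it uses fewer cycles in total.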
23Sample Question
- You are given two machine designs IA1 and IA2 for
performance benchmarking. Both IA1 and IA2 have
the same ISA, but different hardware
implementations and compilers. Assuming that the
clock cycle times for IA1 and IA2 are the same, a
performance study gives the following
measurements for the two designs.
- For design IA1
Instruction Class   CPI   No. of Instructions Executed
A                   1     3,000,000,000,000,000,000
B                   2     2,000,000,000,000,000,000
C                   3     2,000,000,000,000,000,000
D                   4     1,000,000,000,000,000,000
- For design IA2
Instruction Class   CPI   No. of Instructions Executed
A                   2     2,700,000,000,000,000,000
B                   3     1,800,000,000,000,000,000
C                   3     1,800,000,000,000,000,000
D                   2       900,000,000,000,000,000
24Sample Question (2)
- What is the CPI for each machine (IA1 and IA2)?
- Let Y = 1,000,000,000,000,000,000.
- CPI(IA1) = (1×3Y + 2×2Y + 3×2Y + 4×1Y) / (3Y + 2Y + 2Y + 1Y) = 17Y / 8Y = 2.125
- CPI(IA2) = (2×3Y + 3×2Y + 3×2Y + 2×1Y) × 0.9 / ((3Y + 2Y + 2Y + 1Y) × 0.9) = 20Y / 8Y = 2.5
- Which machine is faster? By how much?
- Let C = clock cycle time.
- Time(IA1) = 2.125 × (8Y × C)
- Time(IA2) = 2.5 × 0.9 × (8Y × C) = 2.25 × (8Y × C)
- IA1 is faster than IA2 by a factor of 2.25 / 2.125 = 1.0588
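A sketch that reproduces these figures with exact integer and rational arithmetic. The data layout and function names are illustrative; the per-class CPIs and instruction counts are those given in the question. Since both designs share the same clock cycle time, the time ratio equals the ratio of total cycles.

```python
from fractions import Fraction

Y = 10**18

# class -> (CPI, number of instructions executed)
IA1 = {"A": (1, 3 * Y), "B": (2, 2 * Y), "C": (3, 2 * Y), "D": (4, 1 * Y)}
IA2 = {"A": (2, 27 * Y // 10), "B": (3, 18 * Y // 10),
       "C": (3, 18 * Y // 10), "D": (2, 9 * Y // 10)}


def total_cycles(design):
    """Total clock cycles = sum of (class CPI x class instruction count)."""
    return sum(cpi * count for cpi, count in design.values())


def instruction_count(design):
    return sum(count for _, count in design.values())


def overall_cpi(design):
    return Fraction(total_cycles(design), instruction_count(design))


time_ratio = Fraction(total_cycles(IA2), total_cycles(IA1))
print("CPI(IA1) =", float(overall_cpi(IA1)))
print("CPI(IA2) =", float(overall_cpi(IA2)))
print("Time(IA2) / Time(IA1) =", round(float(time_ratio), 4))
```

IA2 has the higher CPI but executes 10% fewer instructions, which is why the time ratio is much smaller than the CPI ratio.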
25Sample Question (3)
- To further improve the performance of the
machines, a new compiler technique is
introduced. The compiler can simply eliminate all
class D instructions from the benchmark program
without any side effects. That is, there is no
change to the number of class A, B, and C
instructions executed in the two machines. With
this new technique, which machine is faster? By
how much?
- Let Y = 1,000,000,000,000,000,000 and C = clock cycle time.
- CPI(IA1) = (1×3Y + 2×2Y + 3×2Y) / (3Y + 2Y + 2Y) = 13Y / 7Y = 1.857
- CPI(IA2) = (2×3Y + 3×2Y + 3×2Y) × 0.9 / ((3Y + 2Y + 2Y) × 0.9) = 18Y / 7Y = 2.571
- Time(IA1) = 1.857 × (7Y × C)
- Time(IA2) = 2.571 × 0.9 × (7Y × C) = 2.314 × (7Y × C)
- IA1 is faster than IA2 by a factor of 2.314 / 1.857 = 1.246
26Sample Question (4)
- Alternatively, to further improve the performance
of the machines, a new hardware technique is
introduced. The hardware can simply execute all
class D instructions in zero time without any
side effects. (Class D instructions are still
executed.) With this new technique, which
machine is faster? By how much?
- Let Y = 1,000,000,000,000,000,000 and C = clock cycle time.
- CPI(IA1) = (1×3Y + 2×2Y + 3×2Y + 0×Y) / (3Y + 2Y + 2Y + Y) = 13Y / 8Y = 1.625
- CPI(IA2) = (2×3Y + 3×2Y + 3×2Y + 0×Y) × 0.9 / ((3Y + 2Y + 2Y + Y) × 0.9) = 18Y / 8Y = 2.25
- Time(IA1) = 1.625 × (8Y × C)
- Time(IA2) = 2.25 × 0.9 × (8Y × C) = 2.025 × (8Y × C)
- IA1 is faster than IA2 by a factor of 2.025 / 1.625 = 1.2461
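The compiler and hardware variants change CPI differently but leave the same total cycle count, so the speed ratio is the same in both. A sketch under the question's assumptions, with hypothetical helper names `drop_class` and `zero_cycle_class`:

```python
from fractions import Fraction

Y = 10**18

# class -> (CPI, number of instructions executed), as given in the question
IA1 = {"A": (1, 3 * Y), "B": (2, 2 * Y), "C": (3, 2 * Y), "D": (4, 1 * Y)}
IA2 = {"A": (2, 27 * Y // 10), "B": (3, 18 * Y // 10),
       "C": (3, 18 * Y // 10), "D": (2, 9 * Y // 10)}


def total_cycles(design):
    return sum(cpi * count for cpi, count in design.values())


def overall_cpi(design):
    return Fraction(total_cycles(design), sum(n for _, n in design.values()))


def drop_class(design, name):
    """Compiler what-if: the class is no longer executed at all."""
    return {k: v for k, v in design.items() if k != name}


def zero_cycle_class(design, name):
    """Hardware what-if: the class is still executed but costs 0 cycles."""
    return {k: ((0, n) if k == name else (cpi, n))
            for k, (cpi, n) in design.items()}


for label, change in (("compiler", drop_class), ("hardware", zero_cycle_class)):
    a, b = change(IA1, "D"), change(IA2, "D")
    ratio = Fraction(total_cycles(b), total_cycles(a))
    print(label, float(overall_cpi(a)), float(overall_cpi(b)), float(ratio))
```

The instruction count differs between the two variants (7Y vs. 8Y for IA1), which is why the CPI values differ even though the execution times do not.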
27Key Concepts
- Performance is specific to a particular program.
- Total execution time is a consistent summary of
performance.
- For a given architecture, performance increase
comes from
- increase in clock rate (without adverse CPI effects)
- improvement in processor organisation that lowers CPI
- compiler enhancement that lowers CPI and/or
instruction count
- Pitfall: expecting improvement in one aspect of
a machine's performance to affect the total
performance.
28Key Concepts (2)
- Basic formula for execution time.
- Performance factors: CPI, total number of
instructions executed, cycle time.
- Factors affecting their values
- Calculation of individual values
- Note: Change of one factor might change more than
one component (e.g. reduction of instruction types
executed)
- Different performance metrics: what each can tell
and how it might mislead people.
29Sample Question (1)
- Which of the following will affect the
program execution time?
- (i) Compiler technology
- (ii) Instruction set architecture
- (iii) Integrated circuit chip technology
- (iv) Application software
- a) (i) and (iii)
- b) (ii) and (iii)
- c) (i), (ii), and (iii)
- d) All of the above
- e) None of the above
Answer
30Sample Question (2)
- Execution time can always be improved by
- (i) Combining simple instructions into complex
instructions in the given ISA
- (ii) Improving the clock speed of the processor
- (iii) Using a new compiler that produces a shorter
sequence of program code
- a) (ii) only
- b) (i) and (ii)
- c) (ii) and (iii)
- d) All of the above
- e) None of the above
Answer
31Sample Question (3)
- The overall (average) CPI of a given program is
- a) decreased by decreasing the total number of
instructions executed in a program, assuming all
other hardware and software parameters remain the
same.
- b) the sum of the CPIs of the individual types of
instructions executed in the program.
- c) the average of the CPIs of the individual classes
of instructions executed in the program. That is,
if instruction classes A, B, C, D, with
corresponding CPIs 1, 2, 3, 4, appear in a
program, the overall CPI will be (1+2+3+4)/4.
- d) decreased by doubling the clock speed of the
processor, assuming all other hardware and
software parameters remain the same.
- e) None of the above.
Answer
32Sample Question (4)
- Assuming all other hardware and software
parameters remain the same, the total number of
instructions executed in a program can be changed
by
- a) changing the ISA of the processor.
- b) reducing the number of cycles needed to execute
each instruction type in a program.
- c) increasing the number of cycles needed to execute
each instruction type in a program.
- d) doubling the individual CPI of each instruction
class executed in a program.
- e) None of the above.
Answer
33Sample Question (5)
- Execution time can always be improved by
- (i) changing the instruction set architecture of the
processor from a complex one to a simple one.
- (ii) changing the instruction set architecture of the
processor from a simple one to a complex one.
- (iii) changing the compiler so that a lower overall CPI
value (and different program code sequence) is
obtained.
- a) (i) and (iii)
- b) (i) only
- c) (ii) only
- d) (iii) only
- e) None of the above
Answer
34Sample Question (6)
- Which of the following affects the overall CPI of
a program on a given ISA?
- (i) The CPIs of individual classes of instructions
defined in the ISA.
- (ii) The number of instructions executed in the
program.
- (iii) The clock speed of the processor.
- a) (i) only
- b) (ii) only
- c) (iii) only
- d) (i) and (ii) only
- e) (i), (ii) and (iii)
Answer
35Sample Question (7)
- A program takes 5 seconds to execute on machine
MM1 and 10 seconds to execute on machine MM2.
Which of the following conclusions must be true?
- a) The number of instructions executed on MM1 is
definitely fewer than that on MM2.
- b) The overall CPI on MM1 is definitely smaller than
that on MM2.
- c) The clock speed of MM1 is definitely higher than
that of MM2.
- d) All of (a), (b), (c).
- e) None of the above.
Answer
36Sample Question (8)
- To improve the performance of a program, state what
(increase/decrease/do nothing) you would do to
the following, assuming that everything else
is equal:
- a) The number of required cycles.
- b) The number of instructions executed.
- c) The clock cycle time.
- d) The clock rate.
a) Decrease
b) Decrease
c) Decrease
d) Increase
37Sample Question (9)
- Given a machine at 500 MHz, the measurement of a
program execution is as follows:
Instr. Class   CPI   Frequency
A              1     10%
B              2     20%
C              3     30%
D              4     40%
- What happens to (i) overall CPI, (ii) execution
time, if
- a) Instruction distribution is A 40%, B 30%, C 20%, D 10%.
- b) The CPIs of all instruction classes are doubled.
- Answer: increase/decrease/unchanged.
a) CPI decreases, execution time decreases.
b) CPI increases, execution time increases.
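These answers can be checked numerically. The sketch below assumes the Frequency column gives percentages of executed instructions (the values sum to 100) and that the instruction count and clock rate stay fixed, so execution time scales with overall CPI. The instruction count `INSTRUCTIONS` is a hypothetical value used only to turn CPI into seconds.

```python
CLOCK_HZ = 500e6
INSTRUCTIONS = 1_000_000_000  # hypothetical instruction count, for illustration

CPI = {"A": 1, "B": 2, "C": 3, "D": 4}
ORIGINAL_MIX = {"A": 10, "B": 20, "C": 30, "D": 40}  # percent of instructions
NEW_MIX = {"A": 40, "B": 30, "C": 20, "D": 10}


def overall_cpi(cpi, mix_percent):
    """Frequency-weighted average of the per-class CPIs."""
    return sum(cpi[k] * mix_percent[k] for k in cpi) / 100


def execution_time_s(cpi_value):
    """Execution time = instruction count x CPI / clock rate."""
    return INSTRUCTIONS * cpi_value / CLOCK_HZ


base = overall_cpi(CPI, ORIGINAL_MIX)
case_a = overall_cpi(CPI, NEW_MIX)
case_b = overall_cpi({k: 2 * v for k, v in CPI.items()}, ORIGINAL_MIX)

for label, value in (("base", base), ("a", case_a), ("b", case_b)):
    print(label, "CPI:", value, "time:", execution_time_s(value), "s")
```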
38Sample Question (10)
- State whether these statements are true or false:
- a) Instruction set architecture is mainly about the
digital and circuit design of a microprocessor.
- b) My 400 MHz PC is always faster than the SoC's
SUN450 system, which runs at 336 MHz.
- c) CPU execution time of a program is defined as the
elapsed time from the program's submission to
the generation of the result.
a) False.
b) False.
c) False.
39Sample Question (11)
- Given a machine at 500 MHz, the measurement of a
program execution is as follows:
Instr. Class   CPI   Frequency
A              1     10%
B              2     20%
C              3     30%
D              4     40%
- What happens to (i) overall CPI, (ii) execution
time, if
- a) Instruction distribution is equal among the 4 classes
(25% each).
- b) Clock rate is increased to 800 MHz.
- Answer: increase/decrease/unchanged.
a) CPI decreases, execution time decreases.
b) CPI unchanged, execution time decreases.
40Sample Question (12)
- Given a machine at 500 MHz, the measurement of a
program execution is as follows:
Instr. Class   CPI   Frequency
A              1     10%
B              2     20%
C              3     30%
D              4     40%
- What happens to (i) overall CPI, (ii)
instruction count, (iii) clock speed, if
- a) A different compiler is used to produce the
executable code of a program.
- b) A new hardware implementation of the given ISA is
used.
- Answer: changed/unchanged.
a) CPI changed, IC changed, CS unchanged.
b) CPI changed, IC unchanged, CS changed.
41End of file