CS1104: Computer Organisation http://www.comp.nus.edu.sg/~cs1104

1
CS1104: Computer Organisation
http://www.comp.nus.edu.sg/~cs1104
  • School of Computing
  • National University of Singapore

2
PII Lecture 2: Performance
  • Performance Definition
  • Factors Affecting Performance
  • Measurement Parameters for Performance
  • Correlation Among Performance Parameters
  • Chapter 2 of Patterson's book: 2.1, 2.2, 2.3,
    first part of 2.7, and 2.9.

3
Performance Definition
  • Purchasing perspective
  • Given a collection of machines, which has the
  • best performance?
  • least cost?
  • best performance/cost?
  • Design perspective
  • Faced with design options, which has the
  • best performance improvement?
  • least cost?
  • best performance/cost?

4
Performance Definition (2)
  • Both require
  • basis for comparison
  • metric for evaluation
  • Our goal is to understand how a machine's
    architectural design affects its performance.

5
Two Notions of Performance
  • Which has higher performance?
  • Time to do ONE task (execution time)
  • Execution time, response time, latency
  • Tasks per day, hour, week, (performance)
  • Throughput, bandwidth.
  • Response time and throughput might be in
    opposition.

6
Two Notions of Performance (2)
  • Time of AirBus vs. Boeing 747?
  • AirBus is 1350 mph / 610 mph = 2.2 times faster
    (= 6.5 hours / 3 hours)
  • Throughput of AirBus vs. Boeing 747, in
    passenger-miles per hour (pmph)?
  • AirBus is 178,200 pmph / 286,700 pmph = 0.62
    times as fast
  • Boeing is 286,700 pmph / 178,200 pmph = 1.6
    times faster
  • Boeing is 1.6 times (60%) faster in terms of
    throughput.
  • AirBus is 2.2 times (120%) faster in terms of
    flying time.
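
The two comparisons can be checked with a few lines of Python. This is a minimal sketch that uses only the speed and pmph figures quoted on the slide; the dictionary layout and variable names are illustrative assumptions.

```python
# Figures quoted on the slide: speed in mph, throughput in pmph.
airbus = {"speed_mph": 1350, "throughput_pmph": 178_200}
boeing = {"speed_mph": 610, "throughput_pmph": 286_700}

# Each ratio is written so that a value above 1 favours the plane named first.
speed_ratio = airbus["speed_mph"] / boeing["speed_mph"]
throughput_ratio = boeing["throughput_pmph"] / airbus["throughput_pmph"]

print(f"AirBus is {speed_ratio:.1f} times faster in flying time")
print(f"Boeing is {throughput_ratio:.1f} times faster in throughput")
```

The same two machines rank differently depending on which metric is used, which is why response time and throughput have to be stated separately.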

7
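Definitions
  • Performance is in units of things-per-second
  • Bigger is better
  • If we are primarily concerned with response time:
    performance(X) = 1 / execution_time(X)
  • "X is n times faster than Y" means the speedup n
    is
    n = performance(X) / performance(Y)
      = execution_time(Y) / execution_time(X)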
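
As a small illustration, speedup can be computed as the ratio of the two execution times (time of Y divided by time of X). The function name and the 10 s and 15 s timings below are hypothetical, chosen only to show the direction of the ratio.

```python
def speedup(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Both times must be in the same unit. Performance is the reciprocal of
    execution time, so the ratio of performances is time_y / time_x.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Hypothetical timings: X finishes in 10 s, Y in 15 s.
n = speedup(time_x=10.0, time_y=15.0)
print(f"X is {n:.2f} times faster than Y")
```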

8
Computer Performance
  • Response Time (latency)
  • Time between start and end of an event (execution
    time).
  • How long does it take for my job to run?
  • How long does it take to execute a job?
  • How long must I wait for the database query?
  • Throughput
  • Total amount of work (or number of jobs) done in
    a given time.
  • How many jobs can the machine run at once?
  • What is the average execution rate?
  • How much work is getting done?

9
Computer Performance (2)
  • If we upgrade a machine with a new processor,
    what do we improve? Answer: response time
    (it decreases).
  • If we add a new machine to the lab, what do we
    improve? Answer: throughput (it increases).

10
Execution Time
  • Elapsed Time
  • counts everything (disk and memory accesses,
    I/O, etc.)
  • a useful number, but often not good for
    comparison purposes.
  • CPU time
  • doesn't count I/O or time spent running other
    programs.
  • can be broken up into system time, and user time.

11
Execution Time (2)
  • Our focus: user CPU time
  • time spent executing the lines of code that are
    "in" our program.

12
Execution Time (3)
  • Instead of reporting execution time in seconds,
    we often use clock cycles (basic time unit in
    machine).
  • Cycle time (or cycle period): time between two
    consecutive rising edges, measured in seconds.
  • Clock rate (or clock frequency): cycles per
    second (1 Hz = 1 cycle/second).
  • Example: A 200 MHz clock has a cycle time of
    1 / (200 × 10^6) = 5 × 10^-9 seconds = 5 nanoseconds.
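
The relation between clock rate and cycle time is a plain reciprocal. A minimal sketch, with a hypothetical helper name, reproducing the 200 MHz example:

```python
def cycle_time_seconds(clock_rate_hz: float) -> float:
    """Cycle time is the reciprocal of the clock rate."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return 1.0 / clock_rate_hz


t = cycle_time_seconds(200e6)          # 200 MHz clock from the slide
print(f"{t * 1e9:.1f} ns per cycle")   # cycle time expressed in nanoseconds
```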

...
13
Execution Time (4)
  • Execution time = number of cycles × cycle time
    = number of cycles / clock rate.
  • Therefore, to improve performance (everything
    else being equal), you can do the following:
  • Reduce the number of cycles for a program, or
  • Reduce the clock cycle time, or said in
    another way,
  • Increase the clock rate.

14
Cycles in a Program
  • Could we assume that number of cycles = number
    of instructions? Or that the number of cycles is
    proportional to the number of instructions?

15
Cycles in a Program (2)
  • No, the assumption is incorrect.
  • The tasks of different instructions take
    different amount of time to finish.
  • For example, a Multiply instruction may take more
    cycles than an Add instruction; floating-point
    operations take longer than integer operations.
  • Accessing memory takes more time than accessing
    registers.

16
Example 1: Clock Rate
  • Our favorite program runs in 10 seconds on
    computer A, which has a 400 MHz clock. We are
    trying to help a computer designer build a new
    machine B, that will run this program in 6
    seconds. The designer can use new (or perhaps
    more expensive) technology to substantially
    increase the clock rate, but has informed us that
    this increase will affect the rest of the CPU
    design, causing machine B to require 1.2 times as
    many clock cycles as machine A for the same
    program. What clock rate should we tell the
    designer to target?
  • ANSWER
  • For A: Time = 10 sec = cycles × 1 / 400 MHz
  • For B: Time = 6 sec = (1.2 × cycles) ×
    1 / clock_rate
  • => clock_rate = 10 × 400 × 1.2 / 6 MHz = 800 MHz
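
The same calculation, written out step by step. This is a sketch that uses only the numbers given in the example; the variable names are illustrative.

```python
# Values stated in the example.
clock_rate_a_hz = 400e6   # machine A: 400 MHz
time_a_s = 10.0           # program takes 10 s on A
time_b_s = 6.0            # target time on B
cycle_growth = 1.2        # B needs 1.2 times as many cycles as A

# cycles = time * clock rate, so A's cycle count follows from its timing.
cycles_a = time_a_s * clock_rate_a_hz
cycles_b = cycle_growth * cycles_a

# clock rate = cycles / time, so B's required rate follows from its target time.
clock_rate_b_hz = cycles_b / time_b_s
print(f"Target clock rate for B: {clock_rate_b_hz / 1e6:.0f} MHz")
```

Doubling the clock rate buys less than a 2x speedup here because the cycle count grows by 20% at the same time.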

17
Cycles per Instruction (CPI)
  • A given program will require
  • some number of instructions (machine
    instructions)
  • some number of cycles
  • some number of seconds
  • Number of instructions × CPI = number of cycles
  • Number of cycles × cycle time = number of seconds

18
Cycles per Instruction (CPI) (2)
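Average cycles per instruction: CPI = (CPU Time ×
Clock Rate) / Instruction Count = Clock Cycles /
Instruction Count

CPI = Σ (CPI_k × I_k), summed over instruction classes k = 1..n
where
CPI_k = cycles per instruction for class k
I_k = instruction frequency of class k (fraction of
executed instructions that belong to class k)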
  • Invest resources where time is spent!
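
A minimal sketch of the average-CPI calculation: total cycles divided by total instructions, with each class weighted by how often it executes. The function name and the three-class mix are hypothetical and serve only as an illustration.

```python
def overall_cpi(classes):
    """Average CPI for a list of (cpi, instruction_count) pairs.

    Equivalent to summing cpi * frequency over all classes, where
    frequency = instruction_count / total instruction count.
    """
    total_cycles = sum(cpi * count for cpi, count in classes)
    total_instructions = sum(count for _, count in classes)
    if total_instructions == 0:
        raise ValueError("no instructions executed")
    return total_cycles / total_instructions


# Hypothetical mix: 50 one-cycle, 30 two-cycle, 20 three-cycle instructions.
mix = [(1, 50), (2, 30), (3, 20)]
print(overall_cpi(mix))
```

Because the classes are weighted by frequency, the class that accounts for the most cycles is the one worth optimising first.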

19
Aspects of CPU Performance
  • Performance is determined by execution time.
  • Do any of these variables equal performance?
  • number of cycles to execute program?
  • number of instructions in program?
  • number of cycles per second?
  • average number of cycles per instruction?
  • average number of instructions per second?
  • Answer: No to all.
  • Common pitfall: thinking that one of the
    variables is indicative of performance when it
    really isn't.

20
Aspects of CPU Performance (2)
  • CPU performance depends on
  • Clock cycle time: set by hardware technology and
    organization
  • CPI: set by organization and ISA
  • Instruction count: set by ISA and compiler
  • Be careful of the following concepts
    (read "→" as "determines")
  • Machine → ISA and hardware organization
  • Machine → cycle time
  • ISA + hardware organization → number of cycles
    for any instruction (this is not average CPI)
  • ISA + compiler + program → number of instructions
    executed
  • Therefore, ISA + compiler + program + hardware
    organization + cycle time → total CPU time.

21
Example 2: CPI
  • Suppose we have two implementations of the same
    instruction set architecture (ISA). For some
    program, Machine A has a clock cycle time of 10
    ns and a CPI of 2.0. Machine B has a clock cycle
    time of 20 ns and a CPI of 1.2. Which machine is
    faster for this program, and by how much?
  • ANSWER
  • Machine A time = #Inst × 2.0 (CPI) × 10 ns
    (cycle time)
  • Machine B time = #Inst × 1.2 (CPI) × 20 ns
    (cycle time)
  • Performance(A) / Performance(B) = Time(B) / Time(A)
    = (1.2 × 20) / (2.0 × 10) = 1.2, so Machine A is
    1.2 times faster.
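
Since both machines run the same program on the same ISA, the instruction count cancels and only CPI × cycle time matters. A short sketch of that comparison, with an illustrative helper name:

```python
def time_per_instruction_ns(cpi: float, cycle_time_ns: float) -> float:
    """Average time per instruction = CPI * cycle time."""
    return cpi * cycle_time_ns


a_ns = time_per_instruction_ns(cpi=2.0, cycle_time_ns=10.0)  # Machine A
b_ns = time_per_instruction_ns(cpi=1.2, cycle_time_ns=20.0)  # Machine B

# Performance(A) / Performance(B) = Time(B) / Time(A)
ratio = b_ns / a_ns
faster = "A" if ratio > 1 else "B"
print(f"Machine {faster} is faster by a factor of {max(ratio, 1 / ratio):.1f}")
```

Machine B has the lower CPI but still loses, because its cycle time is twice as long.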

22
Example 3: Number of Instructions
  • A compiler designer is trying to decide between
    two code sequences for a particular machine.
    Based on the hardware implementation, there are
    three different classes of instructions: Class
    A, Class B, and Class C, and they require one,
    two, and three cycles (respectively). The first
    code sequence has 5 instructions: 2 of A, 1 of B,
    and 2 of C. The second sequence has 6
    instructions: 4 of A, 1 of B, and 1 of C. Which
    sequence will be faster? By how much? What is the
    CPI for each sequence?
  • ANSWER
  • CPI(S1) = (2×1 + 1×2 + 2×3) / (2 + 1 + 2)
    = 10 / 5 = 2
  • CPI(S2) = (4×1 + 1×2 + 1×3) / (4 + 1 + 1)
    = 9 / 6 = 1.5
  • Time(S1) = (2×1 + 1×2 + 2×3) × cycle_time
    = 10 × cycle_time
  • Time(S2) = (4×1 + 1×2 + 1×3) × cycle_time
    = 9 × cycle_time
  • Time(S1) / Time(S2) = 10 / 9 ≈ 1.1, so S2 is
    faster.
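
The example can be reproduced directly from the class costs and the two instruction mixes. A minimal sketch; the dictionary layout and function names are illustrative assumptions.

```python
# Cycles required by each instruction class, as stated in the example.
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for the two code sequences.
s1 = {"A": 2, "B": 1, "C": 2}
s2 = {"A": 4, "B": 1, "C": 1}


def total_cycles(sequence):
    """Total cycles = sum over classes of (cycles per class * count)."""
    return sum(CYCLES_PER_CLASS[cls] * n for cls, n in sequence.items())


def cpi(sequence):
    """Average cycles per instruction for the sequence."""
    return total_cycles(sequence) / sum(sequence.values())


for name, seq in (("S1", s1), ("S2", s2)):
    print(name, "cycles:", total_cycles(seq), "CPI:", cpi(seq))

# Same machine, same cycle time: the time ratio equals the cycle ratio.
print("Time(S1) / Time(S2) =", round(total_cycles(s1) / total_cycles(s2), 2))
```

S2 executes more instructions yet finishes sooner, so instruction count alone does not predict execution time.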

23
Sample Question
  • You are given two machine designs IA1 and IA2 for
    performance benchmarking. Both IA1 and IA2 have
    the same ISA, but different hardware
    implementations and compilers. Assuming that the
    clock cycle times for IA1 and IA2 are the same,
    a performance study gives the following
    measurements for the two designs
  • For design IA1
    Instruction Class   CPI   No. of Instructions Executed
    A                   1     3,000,000,000,000,000,000
    B                   2     2,000,000,000,000,000,000
    C                   3     2,000,000,000,000,000,000
    D                   4     1,000,000,000,000,000,000
  • For design IA2
    Instruction Class   CPI   No. of Instructions Executed
    A                   2     2,700,000,000,000,000,000
    B                   3     1,800,000,000,000,000,000
    C                   3     1,800,000,000,000,000,000
    D                   2       900,000,000,000,000,000

24
Sample Question (2)
  • What is the CPI for each machine (IA1 and IA2)?
  • Let Y = 1,000,000,000,000,000,000.
  • CPI(IA1) = (1×3Y + 2×2Y + 3×2Y + 4×1Y) /
    (3Y + 2Y + 2Y + 1Y)
    = 17Y / 8Y = 2.125
  • CPI(IA2) = (2×3Y + 3×2Y + 3×2Y + 2×1Y) × 0.9 /
    ((3Y + 2Y + 2Y + 1Y) × 0.9)
    = 20Y / 8Y = 2.5
  • Which machine is faster? By how much?
  • Let C = clock cycle.
  • Time(IA1) = 2.125 × (8Y × C)
  • Time(IA2) = 2.5 × 0.9 × (8Y × C)
    = 2.25 × (8Y × C)
  • IA1 is faster than IA2 by 2.25 / 2.125
    = 1.0588
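
The figures can be verified from the raw instruction counts without factoring out 0.9 by hand. This is a sketch under the question's assumption that both designs share the same cycle time; the data layout and function names are illustrative.

```python
Y = 10**18

# class -> (CPI, instructions executed), copied from the two tables.
IA1 = {"A": (1, 3 * Y), "B": (2, 2 * Y), "C": (3, 2 * Y), "D": (4, 1 * Y)}
IA2 = {"A": (2, 27 * Y // 10), "B": (3, 18 * Y // 10),
       "C": (3, 18 * Y // 10), "D": (2, 9 * Y // 10)}


def total_cycles(mix):
    """Sum of CPI * instruction count over all classes."""
    return sum(cpi * count for cpi, count in mix.values())


def instruction_count(mix):
    return sum(count for _, count in mix.values())


def overall_cpi(mix):
    return total_cycles(mix) / instruction_count(mix)


cpi_ia1 = overall_cpi(IA1)
cpi_ia2 = overall_cpi(IA2)

# Equal cycle times, so Time(IA2) / Time(IA1) equals the ratio of total cycles.
time_ratio = total_cycles(IA2) / total_cycles(IA1)
print(cpi_ia1, cpi_ia2, round(time_ratio, 4))
```

IA2 has the higher CPI but executes 10% fewer instructions, which is why the gap in execution time is smaller than the gap in CPI.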

25
Sample Question (3)
  • To further improve the performance of the
    machines, a new compiler technique is
    introduced. The compiler can simply eliminate all
    class D instructions from the benchmark program
    without any side effects. That is, there is no
    change to the number of class A, B, and C
    instructions executed in the two machines. With
    this new technique, which machine is faster? By
    how much?
  • Let Y = 1,000,000,000,000,000,000 and C = clock
    cycle.
  • CPI(IA1) = (1×3Y + 2×2Y + 3×2Y) / (3Y +
    2Y + 2Y)
    = 13Y / 7Y = 1.857
  • CPI(IA2) = (2×3Y + 3×2Y + 3×2Y) × 0.9 / ((3Y +
    2Y + 2Y) × 0.9)
    = 18Y / 7Y = 2.571
  • Time(IA1) = 1.857 × (7Y × C)
  • Time(IA2) = 2.571 × 0.9 × (7Y × C) = 2.314 ×
    (7Y × C)
  • IA1 is faster than IA2 by 2.314 / 1.857 = 1.246

26
Sample Question (4)
  • Alternatively, to further improve the performance
    of the machines, a new hardware technique is
    introduced. The hardware can simply execute all
    class D instructions in zero time without any
    side effects. (Class D instructions are still
    executed, so they still count as instructions.)
    With this new technique, which machine is faster?
    By how much?
  • Let Y = 1,000,000,000,000,000,000 and C = clock
    cycle.
  • CPI(IA1) = (1×3Y + 2×2Y + 3×2Y + 0×Y) /
    (3Y + 2Y + 2Y + Y)
    = 13Y / 8Y = 1.625
  • CPI(IA2) = (2×3Y + 3×2Y + 3×2Y + 0×Y) × 0.9 /
    ((3Y + 2Y + 2Y + Y) × 0.9)
    = 18Y / 8Y = 2.25
  • Time(IA1) = 1.625 × (8Y × C)
  • Time(IA2) = 2.25 × 0.9 × (8Y × C) = 2.025 ×
    (8Y × C)
  • IA1 is faster than IA2 by 2.025 / 1.625 = 1.2461
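
The compiler variant from the previous slide and the hardware variant here can be compared side by side. The sketch below uses the instruction counts from the original tables and assumes equal cycle times on both designs; the helper names are hypothetical.

```python
Y = 10**18

# class -> (CPI, instructions executed), from the original tables.
IA1 = {"A": (1, 3 * Y), "B": (2, 2 * Y), "C": (3, 2 * Y), "D": (4, 1 * Y)}
IA2 = {"A": (2, 27 * Y // 10), "B": (3, 18 * Y // 10),
       "C": (3, 18 * Y // 10), "D": (2, 9 * Y // 10)}


def total_cycles(mix):
    return sum(cpi * count for cpi, count in mix.values())


def overall_cpi(mix):
    return total_cycles(mix) / sum(count for _, count in mix.values())


def without_class(mix, name):
    """Compiler variant: the class disappears from the program."""
    return {cls: v for cls, v in mix.items() if cls != name}


def zero_cycle_class(mix, name):
    """Hardware variant: the class still executes but costs 0 cycles."""
    return {cls: ((0, n) if cls == name else (cpi, n))
            for cls, (cpi, n) in mix.items()}


compiler_ratio = (total_cycles(without_class(IA2, "D"))
                  / total_cycles(without_class(IA1, "D")))
hardware_ratio = (total_cycles(zero_cycle_class(IA2, "D"))
                  / total_cycles(zero_cycle_class(IA1, "D")))

print("compiler variant:", round(compiler_ratio, 4),
      "CPI(IA1) =", round(overall_cpi(without_class(IA1, "D")), 3))
print("hardware variant:", round(hardware_ratio, 4),
      "CPI(IA1) =", round(overall_cpi(zero_cycle_class(IA1, "D")), 3))
```

Both variants remove the same cycles, so the execution-time ratio is identical; only the reported CPI differs, because the hardware variant still counts class D instructions in the denominator.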

27
Key Concepts
  • Performance is specific to a particular program.
  • Total execution time is a consistent summary of
    performance
  • For a given architecture, performance increase
    comes from
  • increase in clock rate (without adverse CPI
    effects)
  • improvement in processor organisation that lowers
    CPI
  • compiler enhancement that lowers CPI and/or
    instruction count
  • Pitfall: expecting an improvement in one aspect of
    a machine's performance to improve total
    performance by the same proportion.

28
Key Concepts (2)
  • Basic formula for execution time.
  • Performance factors: CPI, total number of
    instructions executed, cycle time.
  • Factors affecting their values
  • Calculation of individual values
  • Note: A change in one factor might change more
    than one component (e.g. reduction of instruction
    types executed)
  • Different performance metrics: what each can tell
    us and how each might mislead people.

29
Sample Question (1)
  • Which of the following will affect the
    program execution time?
  • (i) Compiler technology
  • (ii) Instruction set architecture
  • (iii) Integrated circuit chip technology
  • (iv) Application software
  • a) (i) and (iii)
  • b) (ii) and (iii)
  • c) (i), (ii), and (iii)
  • d) All of the above
  • e) None of the above
Answer
30
Sample Question (2)
  • Execution time can always be improved by
  • (i) Combining simple instructions into complex
    instructions in the given ISA
  • (ii) Improving the clock speed of the processor
  • (iii) Using a new compiler that produces a shorter
    sequence of program code
  • a) (ii) only
  • b) (i) and (ii)
  • c) (ii) and (iii)
  • d) All of the above
  • e) None of the above

Answer
31
Sample Question (3)
  • The overall (average) CPI of a given program is
  • a) decreased by decreasing the total number of
    instructions executed in a program, assuming all
    other hardware and software parameters remain the
    same.
  • b) the sum of the CPIs of the individual types of
    instructions executed in the program.
  • c) the average of the CPIs of the individual
    classes of instructions executed in the program.
    That is, if instruction classes A, B, C, D, with
    corresponding CPIs 1, 2, 3, 4, appear in a
    program, the overall CPI will be
    (1 + 2 + 3 + 4) / 4.
  • d) decreased by doubling the clock speed of the
    processor, assuming all other hardware and
    software parameters remain the same.
  • e) None of the above.

Answer
32
Sample Question (4)
  • Assuming all other hardware and software
    parameters remain the same, the total number of
    instructions executed in a program can be changed
    by
  • a) changing the ISA of the processor.
  • b) reducing the number of cycles needed to execute
    each instruction type in a program.
  • c) increasing the number of cycles needed to
    execute each instruction type in a program.
  • d) doubling the individual CPI of each instruction
    class executed in a program.
  • e) None of the above.

Answer
33
Sample Question (5)
  • Execution time can always be improved by
  • (i) changing the instruction set architecture of
    the processor from a complex one to a simple one.
  • (ii) changing the instruction set architecture of
    the processor from a simple one to a complex one.
  • (iii) changing the compiler so that a lower overall
    CPI value (and different program code sequence) is
    obtained.
  • a) (i) and (iii)
  • b) (i) only
  • c) (ii) only
  • d) (iii) only
  • e) None of the above

Answer
34
Sample Question (6)
  • Which of the following affects the overall CPI of
    a program on a given ISA?
  • (i) The CPIs of individual classes of instructions
    defined in the ISA.
  • (ii) The number of instructions executed in the
    program.
  • (iii) The clock speed of the processor.
  • a) (i) only
  • b) (ii) only
  • c) (iii) only
  • d) (i) and (ii) only
  • e) (i), (ii) and (iii)

Answer
35
Sample Question (7)
  • A program takes 5 seconds to execute on machine
    MM1 and 10 seconds to execute on machine MM2.
    Which of the following conclusions must be true?
  • a) The number of instructions executed on MM1 is
    definitely fewer than that on MM2.
  • b) The overall CPI on MM1 is definitely smaller
    than that on MM2.
  • c) The clock speed of MM1 is definitely higher than
    that of MM2.
  • d) All of (a), (b), (c).
  • e) None of the above.

Answer
36
Sample Question (8)
  • To improve the performance of a program, state
    what (increase/decrease/do nothing) you would do
    to the following, assuming that everything else
    is equal
  • a) The number of required cycles.
  • b) The number of instructions executed.
  • c) The clock cycle time.
  • d) The clock rate.

a) Decrease
b) Decrease
c) Decrease
d) Increase
37
Sample Question (9)
  • Given a machine at 500 MHz, the measurement of a
    program execution is as follows
    Instr. Class   CPI   Frequency
    A              1     10%
    B              2     20%
    C              3     30%
    D              4     40%
  • What happens to (i) overall CPI, (ii) execution
    time, if
  • a) Instruction distribution is A 40%, B 30%,
    C 20%, D 10%.
  • b) The CPIs of all instruction classes are doubled.
  • Answer increase/decrease/unchanged.

a) CPI decreases, execution time decreases.
b) CPI increases, execution time increases.
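
Both answers can be checked numerically. The sketch below assumes the total instruction count and the 500 MHz clock stay fixed, so execution time moves in the same direction as overall CPI; the function name is hypothetical.

```python
BASE_CPI = {"A": 1, "B": 2, "C": 3, "D": 4}
BASE_MIX = {"A": 10, "B": 20, "C": 30, "D": 40}   # percent of instructions


def overall_cpi(cpi_by_class, percent_by_class):
    """Weighted CPI: sum of (class CPI * class frequency in percent) / 100."""
    if sum(percent_by_class.values()) != 100:
        raise ValueError("frequencies must sum to 100 percent")
    return sum(cpi_by_class[c] * percent_by_class[c] for c in cpi_by_class) / 100


baseline = overall_cpi(BASE_CPI, BASE_MIX)

# Case a): new instruction distribution, same per-class CPIs.
case_a = overall_cpi(BASE_CPI, {"A": 40, "B": 30, "C": 20, "D": 10})

# Case b): every per-class CPI doubled, original distribution.
case_b = overall_cpi({c: 2 * v for c, v in BASE_CPI.items()}, BASE_MIX)

print("baseline:", baseline, "case a:", case_a, "case b:", case_b)
```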
38
Sample Question (10)
  • State whether these statements are true or false
  • a) Instruction set architecture is mainly about the
    digital and circuit design of microprocessor.
  • b) My 400 MHz PC is always faster than the SoC's
    SUN450 system, which runs at 336 MHz.
  • c) CPU execution time of a program is defined as
    the elapsed time between the program submission
    and the generation of the result.

a) False.
b) False.
c) False.
39
Sample Question (11)
  • Given a machine at 500 MHz, the measurement of a
    program execution is as follows
    Instr. Class   CPI   Frequency
    A              1     10%
    B              2     20%
    C              3     30%
    D              4     40%
  • What happens to (i) overall CPI, (ii) execution
    time, if
  • a) Equal instruction distribution among 4 classes
    (25% each)
  • b) Clock rate is increased to 800 MHz.
  • Answer increase/decrease/unchanged.

a) CPI decreases, execution time decreases.
b) CPI unchanged, execution time decreases.
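
To see the effect on execution time as well as CPI, the basic formula time = instruction count × CPI / clock rate can be applied to each case. The instruction count of one billion is a hypothetical value chosen only to give concrete times; the question does not state one.

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    """CPU time = instruction count * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


INSTRUCTIONS = 1_000_000_000   # hypothetical; not given in the question
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3, "D": 4}


def weighted_cpi(percent_by_class):
    return sum(CPI_BY_CLASS[c] * p for c, p in percent_by_class.items()) / 100


cpi_base = weighted_cpi({"A": 10, "B": 20, "C": 30, "D": 40})
cpi_equal = weighted_cpi({"A": 25, "B": 25, "C": 25, "D": 25})

baseline = cpu_time_seconds(INSTRUCTIONS, cpi_base, 500e6)
case_a = cpu_time_seconds(INSTRUCTIONS, cpi_equal, 500e6)   # a) equal mix
case_b = cpu_time_seconds(INSTRUCTIONS, cpi_base, 800e6)    # b) faster clock

print(f"baseline: CPI {cpi_base}, {baseline:.2f} s")
print(f"case a:   CPI {cpi_equal}, {case_a:.2f} s")
print(f"case b:   CPI {cpi_base}, {case_b:.2f} s")
```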
40
Sample Question (12)
  • Given a machine at 500 MHz, the measurement of a
    program execution is as follows
    Instr. Class   CPI   Frequency
    A              1     10%
    B              2     20%
    C              3     30%
    D              4     40%
  • What happens to (i) overall CPI, (ii)
    instruction count (IC), (iii) clock speed (CS), if
  • a) A different compiler is used to produce the
    executable code of a program.
  • b) A new hardware implementation of the given ISA
    is used.
  • Answer changed/unchanged.

a) CPI changed, IC changed, CS unchanged.
b) CPI changed, IC unchanged, CS changed.
41
End of file