Computer Architecture Lecture Notes, Spring 2005, Dr. Michael P. Frank


1
Computer Architecture Lecture Notes, Spring 2005
Dr. Michael P. Frank

Competency Area 2
Performance Metrics
Lecture 1

2
Performance Metrics

Why is it necessary for us to study performance?
Performance is usually the key to the
effectiveness of a system (hardware + software).
Performance is critical to customers
(purchasers); thus, we as designers and
architects must also make it a priority.
Performance must be assessed and understood in
order for a system to communicate efficiently
with peripheral devices.

3
Performance Metrics

How can we determine performance?

Consider this example from the transportation
industry:
4
Performance Example

Fuel capacity in liters
Range in kilometers
Speed in kilometers/hour
Throughput is defined as
(number of passengers) × (cruising speed)
Cost is given as
(fuel capacity) / (passengers × range)
Which mode of transportation has the best
performance?

5
Performance Example

It depends on how we define performance.
Consider raw speed:
getting from one place to another quickly.

6
Performance Example

What if we're interested in the rate at which
people are carried (throughput)?

7
Performance Example

Oftentimes we relate performance and cost. Thus,
we can consider the amount of fuel used per
passenger per kilometer of range.
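To make "it depends on how we define performance" concrete, here is a small sketch that ranks transportation modes under the three metrics from these slides: raw speed, throughput = passengers × cruising speed, and cost = fuel capacity / (passengers × range). The slide's own table is not reproduced here, so the mode names and numbers below are hypothetical placeholders chosen only so that each metric picks a different winner; the `Mode` class and `best` helper are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class Mode:
    name: str
    passengers: int
    fuel_capacity_l: float  # liters
    range_km: float         # kilometers
    speed_kmh: float        # cruising speed in km/h

    def throughput(self) -> float:
        # (number of passengers) x (cruising speed), in passenger-km per hour
        return self.passengers * self.speed_kmh

    def cost(self) -> float:
        # (fuel capacity) / (passengers x range), in liters per passenger-km
        return self.fuel_capacity_l / (self.passengers * self.range_km)


def best(modes, metric, lower_is_better=False):
    """Name of the mode that wins under the given metric."""
    pick = min if lower_is_better else max
    return pick(modes, key=metric).name


# HYPOTHETICAL placeholder values, not the data from the original slide.
modes = [
    Mode("mode_1", passengers=100, fuel_capacity_l=120_000, range_km=6_000, speed_kmh=2_000),
    Mode("mode_2", passengers=400, fuel_capacity_l=200_000, range_km=10_000, speed_kmh=900),
    Mode("mode_3", passengers=50, fuel_capacity_l=400, range_km=1_000, speed_kmh=100),
]

print("fastest:", best(modes, lambda m: m.speed_kmh))                     # mode_1
print("highest throughput:", best(modes, Mode.throughput))                # mode_2
print("lowest fuel cost:", best(modes, Mode.cost, lower_is_better=True))  # mode_3
```

With these made-up numbers, each of the three definitions of "best" selects a different mode, which is exactly the point of the example.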

8
Performance Metrics

Similar measures of performance are used for
computers.
Number of computations done per unit of time
Cost of computations
Possibly several aspects of cost can be
considered including initial purchase price,
operating cost, cost of training users of system,
etc.
Common performance measures are
RESPONSE TIME the amount of time it takes a
program to complete (a.k.a execution time)
THROUGHPUT the total amount of work done in a
given amount of time

9
Performance Metrics

Example
Given the following actions
1. Replacing processor with a faster version
2. Adding additional processors to perform
separate tasks in a multiprocessor system
do they (a) increase throughput, (b) decrease
response time, or (c) both?

10
Defining Performance

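Our focus will be primarily on execution time.
Maximizing performance means minimizing
execution time:
$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$
For two machines X and Y, if
$\text{Performance}_Y > \text{Performance}_X$ (equivalently, $\text{Execution time}_Y < \text{Execution time}_X$),
we say that machine Y is faster than machine X.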

11
Performance Metrics

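Notes:

(1) If X is n times faster than Y, then
$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$

To avoid confusion, we'll use the following
terminology:
We say                    We mean
improve performance    →  increase performance
improve execution time →  decrease execution time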

12
Performance Example
If machine A runs a program in 10 seconds and
machine B runs the same program in 15 seconds,
how much faster is A than B?
13
Performance Example
Machine A runs the program in 10 seconds and
machine B runs it in 15 seconds, so
$n = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{15}{10} = 1.5$
Machine A is 1.5 times faster than machine B.
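A minimal sketch of the "n times faster" definition, applied to the 10 s vs. 15 s example. Performance is taken as the reciprocal of execution time; the function names are illustrative, not from the slides.

```python
def performance(execution_time_s: float) -> float:
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s: float, time_y_s: float) -> float:
    """n such that X is n times faster than Y.

    n = Performance_X / Performance_Y = Execution time_Y / Execution time_X
    (computed from the time ratio to avoid an extra rounding step).
    """
    return time_y_s / time_x_s


n = times_faster(10.0, 15.0)  # machine A: 10 s, machine B: 15 s
print(f"A is {n} times faster than B")  # A is 1.5 times faster than B
```

Note the argument order: the faster machine's time goes in the denominator, which is the usual source of confusion that the "We say / We mean" terminology is meant to avoid.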
14
Measuring Performance

Quite simply, TIME is the measure of computer
performance!
The most straightforward definition of time is
wall-clock time, also called elapsed time or response time.

This is the total time to complete a task, including system
overhead activities such as input/output tasks,
disk and memory accesses, etc.
15
Measuring Performance

CPU time is the time it takes to complete a task,
excluding the time spent waiting for I/O.

CPU TIME has two components:
USER CPU TIME: the time the CPU is busy executing the
user's code.
SYSTEM CPU TIME: the time the CPU spends performing
operating system tasks.
Note: Sometimes system and user CPU times are
difficult to distinguish, since it is hard to
assign responsibility for OS activities.
16
Measuring Performance

Example:
To understand the concept of CPU time, consider
the UNIX command time. Running a program under it
may return a response similar to:
90.7u 12.9s 2:39 65%
What do these numbers mean?

17
Measuring Performance

Example:
To understand the concept of CPU time, consider
the UNIX command time. Running a program under it
may return a response similar to:
90.7u 12.9s 2:39 65%

90.7u: user CPU time (seconds)
12.9s: system CPU time (seconds)
2:39: elapsed time (minutes:seconds)
65%: percentage of elapsed time that is CPU time
18
Measuring Performance

Example:
To understand the concept of CPU time, consider
the UNIX command time. Running a program under it
may return a response similar to:
90.7u 12.9s 2:39 65%
What is the total CPU time?
What percentage of time is spent on I/O and
other programs?

19
Measuring Performance

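Example:
To understand the concept of CPU time, consider
the UNIX command time. Running a program under it
may return a response similar to:
90.7u 12.9s 2:39 65%
What is the total CPU time?
$\text{CPUTime} = 90.7\ \text{s} + 12.9\ \text{s} = 103.6\ \text{s}$
What percentage of time is spent on I/O and
other programs?
Elapsed time is 2:39 = 159 s, and $103.6 / 159 \approx 65\%$ of it is CPU time,
so about 35% is spent on I/O and running other programs.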
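The arithmetic above can be sketched in a few lines. This assumes the four-field summary format shown on the slides (user seconds with a `u` suffix, system seconds with an `s` suffix, elapsed time as minutes:seconds, and a CPU percentage), like the one printed by the C shell's `time` builtin; the parser `parse_csh_time` is a hypothetical helper, does not handle hour-long elapsed times, and ignores any extra fields after the fourth.

```python
def parse_csh_time(line: str) -> dict:
    """Parse a summary such as '90.7u 12.9s 2:39 65%'.

    Assumes fields: user CPU seconds, system CPU seconds,
    elapsed time as minutes:seconds, and CPU percentage.
    """
    user_f, sys_f, elapsed_f, pct_f = line.split()[:4]
    minutes, seconds = elapsed_f.split(":")
    return {
        "user_s": float(user_f.rstrip("u")),
        "system_s": float(sys_f.rstrip("s")),
        "elapsed_s": int(minutes) * 60 + float(seconds),
        "reported_cpu_pct": float(pct_f.rstrip("%")),
    }


stats = parse_csh_time("90.7u 12.9s 2:39 65%")
cpu_time = stats["user_s"] + stats["system_s"]  # user + system CPU time
cpu_frac = cpu_time / stats["elapsed_s"]        # fraction of elapsed time on the CPU
print(f"CPU time = {cpu_time:.1f} s")
print(f"CPU share = {cpu_frac:.1%}, I/O and other programs = {1 - cpu_frac:.1%}")
```

The computed CPU share (about 65%) agrees with the percentage the command itself reports.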

20
Measuring Performance

Other notes:
SYSTEM PERFORMANCE: reciprocal of elapsed time
on an unloaded system (e.g., no user applications)
CPU PERFORMANCE: reciprocal of user CPU time
CLOCK CYCLES (CC): discrete time intervals
measured by the processor clock running at a
constant rate
CLOCK PERIOD: time it takes to complete a clock
cycle
CLOCK RATE: inverse of clock period

21
Measuring Performance

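Consider CPU performance. For a program that takes CC clock cycles:
$\text{CPUTime} = CC \times t_{CC}$
Also, since the clock rate is the inverse of the clock period ($f_{CC} = 1/t_{CC}$):
$\text{CPUTime} = CC / f_{CC}$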

22
Measuring Performance

Since the execution time clearly depends on the
number of instructions for a program, we must
also define another performance metric:
CPI: the average number of clock cycles
per instruction

23
Measuring Performance

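Now we have two more equations that we can define
for CPUTime (using $CC = IC \times CPI$):
$\text{CPUTime} = IC \times CPI \times t_{CC}$
$\text{CPUTime} = IC \times CPI / f_{CC}$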

24
Measuring Performance

In summary, performance metrics include:

Component of performance   Unit of measure
CPUTime                    Seconds per program
IC                         Number of instructions per program
CPI                        Average number of clock cycles per instruction
$t_{CC}$                   Seconds per clock cycle
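The CPU performance equation from these slides, written as two small functions. The program parameters in the usage line (2 billion instructions, CPI of 1.5, 3 GHz clock) are hypothetical values chosen only to make the arithmetic easy to check.

```python
def cpu_time_s(ic: float, cpi: float, clock_hz: float) -> float:
    """CPUTime = IC x CPI x t_CC, with t_CC = 1 / f_CC."""
    cycles = ic * cpi            # CC = IC x CPI
    return cycles / clock_hz     # dividing by f_CC == multiplying by t_CC


def cpu_performance(ic: float, cpi: float, clock_hz: float) -> float:
    """Performance = f_CC / (IC x CPI), the reciprocal of CPUTime."""
    return clock_hz / (ic * cpi)


# Hypothetical program: 2 billion instructions, CPI of 1.5, 3 GHz clock.
t = cpu_time_s(2e9, 1.5, 3e9)
print(f"CPU time = {t:.3f} s, performance = {cpu_performance(2e9, 1.5, 3e9):.3f} (1/s)")
```

Each factor enters the product independently, which is why the later slides can talk about improving IC, CPI, or clock rate separately (and warn that improving one may worsen another).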
25
Measuring Performance

Example:
Suppose Machine A implements the same ISA as
Machine B. Given the clock cycle time and CPI of Machine A
for some program, and the clock cycle time and CPI of
Machine B for the same program, determine
which machine is faster and by how much.

26
Breakdown by Instruction Category

Recall: CPI = clock cycles (CC) per instruction.
But CPI depends on many factors, including:
Memory system behavior
Processor structure
Availability of special processor features
E.g., floating point, graphics, etc.
To characterize the effect of changing specific
aspects of the architecture, we find it helpful
to break down CC into components due to different
classes (categories) of instructions:
$CC = \sum_{i=1}^{n} (CPI_i \times IC_i)$
where
$IC_i$: instruction count for class i
$CPI_i$: avg. cycles for instructions in class i
n: the number of instruction classes

27
Example

Suppose a processor has 3 categories of
instructions, A, B, and C, with the following CPIs.
And suppose a compiler designer is comparing two
code sequences for a given program that have the
following instruction counts.
Determine:
(i) Which code sequence executes the most
instructions?
(ii) Which will be faster?
(iii) What is the average CPI for each code
sequence?

Instr. class   $CPI_i$
A              1
B              2
C              3

Code seq.   $IC_A$   $IC_B$   $IC_C$   (instruction counts)
1           2        1        2
2           4        1        1
28
Solution to Example

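Part (i)
$IC_{seq1} = 2 + 1 + 2 = 5$ instructions
$IC_{seq2} = 4 + 1 + 1 = 6$ instructions
⇒ Code sequence 2 executes more instructions.
Part (ii)
$CC_{seq1} = \sum_i (CPI_i \times IC_i) = 1 \times 2 + 2 \times 1 + 3 \times 2 = 10$ cycles
$CC_{seq2} = \sum_i (CPI_i \times IC_i) = 1 \times 4 + 2 \times 1 + 3 \times 1 = 9$ cycles
⇒ Code sequence 2 takes fewer cycles ⇒ it is faster!
Part (iii)
$CPI_{seq1} = CC_{seq1} / IC_{seq1} = 10\ \text{cycles} / 5\ \text{instr.} = 2$
$CPI_{seq2} = CC_{seq2} / IC_{seq2} = 9\ \text{cycles} / 6\ \text{instr.} = 1.5$
Which part should we consult to tell us which
code sequence has better performance?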
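The three parts of the solution can be checked mechanically. This sketch encodes the two tables (CPI per instruction class, and instruction counts per code sequence) as dictionaries and computes IC, CC = Σ CPI_i × IC_i, and average CPI for each sequence; the names `CPI`, `SEQUENCES`, and `summarize` are illustrative.

```python
CPI = {"A": 1, "B": 2, "C": 3}  # cycles per instruction for each class

SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},  # instruction counts IC_i for sequence 1
    2: {"A": 4, "B": 1, "C": 1},  # instruction counts IC_i for sequence 2
}


def summarize(counts: dict) -> tuple:
    ic = sum(counts.values())                            # total instruction count
    cc = sum(CPI[cls] * n for cls, n in counts.items())  # CC = sum_i CPI_i * IC_i
    return ic, cc, cc / ic                               # average CPI = CC / IC


for seq, counts in SEQUENCES.items():
    ic, cc, avg_cpi = summarize(counts)
    print(f"sequence {seq}: IC = {ic}, CC = {cc}, CPI = {avg_cpi}")
```

On the same processor (same clock rate), execution time is proportional to CC, so the cycle count is the figure that ranks the two sequences; IC and average CPI alone each point the "wrong" way here.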

29
Importance of Benchmarks

How do we evaluate and compare the performance of
different architectures?
We use benchmarks
Programs that are specifically chosen to measure
performance.
A workload is a set of programs.
Benchmarks consist of workloads that (the user hopes)
will predict the performance of the actual
workload.
It is important that benchmarks consist of
realistic workloads,
not simple toy programs or code fragments.
Manufacturers often try to fine-tune their
machines to do well on popular benchmarks that
are too simple.
This does not always mean the machine will do
well on real programs!

30
SPEC benchmark

A popular source of benchmarks is SPEC
Standard Performance Evaluation Corporation
General CPU benchmarks: CPU2000.
Includes programs such as
gzip (compression), vpr (FPGA place & route), gcc
(compiler), crafty (chess), vortex (database).
SPEC also offers specialized benchmarks for
graphics, parallel computing, Java, mail servers,
network file servers, and web servers.
They publish reports on benchmark results for
various systems.
Main metric: SPECRatio, proportional to average
inverse execution time. The bigger, the better!
Reproducibility of results is very important!

31
Summarizing Performance

How do we summarize performance in a way that
accurately compares different machines?
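One common approach: Total Execution Time (TET),
based on
$\text{TET} = \sum_{i} \text{Time}_i$ (summed over the programs in the workload)
Or, if the workload includes n different
programs, we can calculate the average, or
Arithmetic Mean (AM):
$\text{AM} = \frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$
Smaller AM ⇒ improved performance.
Other methods are also used:
weighted arithmetic mean, geometric mean of ratios.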
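A sketch of the summary statistics named above. The per-program times and weights in the usage lines are hypothetical. The weights stand for relative execution frequencies in the workload, and the geometric mean is shown on ratios (for example, reference time divided by measured time, as SPEC-style ratios are formed), which is an assumption about how "geometric mean ratio" is meant here.

```python
import math


def total_execution_time(times):
    """TET: sum of the execution times of all programs in the workload."""
    return sum(times)


def arithmetic_mean(times):
    """AM = TET / n."""
    return total_execution_time(times) / len(times)


def weighted_arithmetic_mean(times, weights):
    """Each time is weighted by its program's relative frequency in the workload."""
    return sum(w * t for t, w in zip(times, weights)) / sum(weights)


def geometric_mean(ratios):
    """n-th root of the product of the ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


times = [4.0, 1.0, 9.0]  # hypothetical per-program times in seconds
print(total_execution_time(times))                 # 14.0
print(arithmetic_mean(times))                      # about 4.667
print(weighted_arithmetic_mean(times, [1, 2, 1]))  # 3.75
print(geometric_mean([2.0, 8.0]))                  # about 4.0
```

For the time-based means, smaller is better; for a geometric mean of "reference / measured" ratios, bigger is better.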

32
Performance Improvement

Recall the formula $\text{CPUTime} = IC \times CPI / f_{CC}$.
Thus, CPU performance is $\text{Perf} = f_{CC} / (IC \times CPI)$.
Thus we can see 3 basic ways to improve CPU
performance on a given task:
Increase clock frequency
Decrease CPI,
by improved processor organization
Decrease instruction count,
by compiler enhancement,
change in ISA design (new instructions), or
a more efficient application algorithm.
However, we have to be careful!
Sometimes, improving one of these can hurt others!

33
Generalized Cost Measures

In this course, we will often be focusing on ways
to minimize execution time of programs.
Either CPU time, or number of clock cycles.
Execution time is one example of what we may call
a generalized cost measure (GCM).
A GCM is any property of a HW/SW design that
tells us how much of some valued resource is used
up when the system is manufactured or used.
Other examples of important GCMs include
Energy consumed by a computation
Silicon chip area used up by a circuit design
Dollar cost to manufacture a computer component
We will study some general engineering principles
that apply to the minimization of any GCM in any
system.

34
Additive Cost Measures

Let us suppose we have a GCM C for a system.
Many times, the total cost C can be represented
as a sum of independent cost components
E.g., $C = C_1 + C_2 + \cdots + C_n$, or $C = \sum_{i=1}^{n} C_i$.
These could correspond to the resources used by
individual subsystems of the whole system.
Or, used in doing particular categories of tasks.
For example, execution time T can be broken down
as the sum of time Tfp taken by floating-point
instructions and the time Toth for others.
That is, $T = T_{fp} + T_{oth}$.

35
Improving Part of a System

Suppose a GCM is broken down as $C = A + B$.
The total cost is the sum of two components, A and
B.
Now suppose you are considering making an
improvement to the system design that affects
only cost component B.
Suppose you reduce it by a factor f, to $B' = B/f$.
The new total cost is then $C' = A + B'$.
The cost of component A is unaffected.
Overall (total) cost has therefore been reduced
by the factor
$f_{overall} = \frac{C}{C'} = \frac{A + B}{A + B/f}$
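A minimal sketch of the overall reduction factor C / C', where C = A + B and only B is reduced to B/f. The function name `f_overall` mirrors the slide's symbol; the costs in the usage line are hypothetical.

```python
def f_overall(a: float, b: float, f: float) -> float:
    """Overall cost-reduction factor C / C' for C = A + B when only
    component B is reduced by the factor f (B' = B / f)."""
    if f <= 0:
        raise ValueError("improvement factor f must be positive")
    c = a + b          # original total cost
    c_new = a + b / f  # A unchanged, B reduced to B / f
    return c / c_new


# Hypothetical costs: A = 1, B = 1, and B is made twice as cheap (f = 2).
print(f_overall(1.0, 1.0, 2.0))  # 2 / 1.5, about 1.333
```

Halving a component that is half of the total therefore gives only about a 1.33× overall improvement, not 2×.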

36
Diminishing Returns

Suppose we continue improving (reducing) a cost
component by larger and larger factors.
Does this mean the system's total cost will be
reduced by correspondingly large factors? NO!
Even if we improved one cost component (B in our
example) by a factor of $f \to \infty$, note that
$\lim_{f \to \infty} f_{overall} = \lim_{f \to \infty} \frac{A + B}{A + B/f} = \frac{A + B}{A} = 1 + \frac{B}{A}$
Even here, the overall cost reduction factor
$f_{overall}$ would still be only the finite value
$1 + B/A$!
The system can only be improved by at most this
factor if we improve just the one component B.

37
Diminishing Returns Example

Suppose a particular chip contains $B = 1\ \text{cm}^2$ of
logic circuits, and $A = 2\ \text{cm}^2$ of cache memory.
The total cost (in terms of area) is $C = A + B = 3\ \text{cm}^2$.
Now, let's go crazy trying to simplify and shrink
the design of just the logic circuit.
What is the maximum factor by which this tactic
can reduce the area cost of the whole design
(logic + memory)?
Obviously, this can reduce the total area from 3 cm²
to no less than 2 cm² (the area of memory alone),
or shrink it by a factor of $f_{overall} = 3/2 = 1.5$.
Note: we could have obtained this same answer
using the equation $f_{overall,max} = 1 + B/A$ as well.

[Figure: chip area breakdown, logic 1 cm², memory 2 cm²]
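Diminishing returns on the chip-area example can be seen numerically: shrinking only the logic (B = 1 cm²) by ever larger factors f while the memory (A = 2 cm²) stays fixed. This is a sketch using the same C / C' expression; the chosen values of f are arbitrary.

```python
def f_overall(a, b, f):
    # C / C' with C = A + B and C' = A + B / f
    return (a + b) / (a + b / f)


A_MEMORY_CM2 = 2.0  # cache memory area, left unchanged
B_LOGIC_CM2 = 1.0   # logic area, shrunk by the factor f
LIMIT = 1 + B_LOGIC_CM2 / A_MEMORY_CM2  # f_overall,max = 1 + B/A

for f in (1, 2, 4, 10, 100, 1000):
    print(f"f = {f:>4}: f_overall = {f_overall(A_MEMORY_CM2, B_LOGIC_CM2, f):.4f}")
print(f"upper bound 1 + B/A = {LIMIT}")
```

The printed factors climb toward, but never reach, 1.5: once the logic is already much smaller than the memory, further shrinking it barely changes the total.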
38
Graph Showing Diminishing Returns
[Figure: curves labeled by "Part/rest (initial) (B/A)", plotted against f]
39
Important Lessons to Take from This

It's probably not worth spending significant
design time extensively improving just a single
component of a system,
unless that component accounts for a dominant
part of the total cost (by some measure) to begin
with ($B/A \gg 1$).
It's only worth improving a given component up to
the point where it is no longer dominant.
Reducing it further won't make a lot of
difference.
Therefore, all components with significant costs
must be improved together in order to
significantly improve an entire design.
Well-engineered systems will tend to have roughly
comparable costs in all of their major components.

40
Other Ways to Calculate foverall

Earlier, we saw this formula:
$f_{overall} = \frac{A + B}{A + B/f}$
for the overall improvement factor $f_{overall}$
resulting from improving component B by the
factor f.
But what if we don't know the values of A and B?
What if we only know their relative sizes?
Fortunately, it turns out that we can still
calculate $f_{overall}$.
Let us define $\text{frac}_{enh} = B/C = B/(A + B)$ to be the
fraction of the original total system cost that
is accounted for by the particular part B that is
going to be enhanced.
Then, the fraction of cost accounted for by A
(the rest of the system) is
$1 - \text{frac}_{enh} = A/C = A/(A + B)$
Our equation for $f_{overall}$ can then be re-expressed
in terms of the quantities $\text{frac}_{enh}$ and $1 - \text{frac}_{enh}$,
as follows.

41
Calculating foverall in terms of fracenh

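Let's re-express $f_{overall}$ in terms of $\text{frac}_{enh}$ by dividing
the numerator and denominator by $C = A + B$:
$f_{overall} = \frac{A + B}{A + B/f} = \frac{1}{\frac{A}{A + B} + \frac{B}{A + B} \cdot \frac{1}{f}} = \frac{1}{(1 - \text{frac}_{enh}) + \text{frac}_{enh}/f}$
We will call this form for $f_{overall}$ the
Generalized Amdahl's Law. (We'll see why in a
moment.)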

42
Amdahls Law Proper

We saw that execution time is one valid cost
measure.
In such a case, note that the factor by which a
cost is reduced is the speedup, or the factor by
which performance is improved.
We thus rename the improvement factor f of B
(the enhanced part) to $\text{speedup}_{enh}$, and the
overall improvement factor $f_{overall}$ becomes
$\text{speedup}_{overall}$, and we get
$\text{speedup}_{overall} = \frac{1}{(1 - \text{frac}_{enh}) + \text{frac}_{enh}/\text{speedup}_{enh}}$
This is called Amdahl's Law, and it is one of the
most widely hyped quantitative principles of
processor design.
But as we can see, it is not a special law of CPU
architecture, but just an application of the
universal engineering principle of diminishing
returns, which we discussed earlier.
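A sketch of Amdahl's Law in its fraction form, checked against the absolute-cost form (A + B) / (A + B/f) from the earlier slides. Both function names are illustrative; the example values reuse the A = 2, B = 1 split (so frac_enh = 1/3) with an arbitrary f = 10.

```python
def speedup_overall(frac_enh: float, speedup_enh: float) -> float:
    """Amdahl's Law: 1 / ((1 - frac_enh) + frac_enh / speedup_enh).

    frac_enh is the fraction of the ORIGINAL execution time spent in the
    enhanced part; speedup_enh is the factor by which that part is sped up.
    """
    if not 0.0 <= frac_enh <= 1.0:
        raise ValueError("frac_enh must lie in [0, 1]")
    return 1.0 / ((1.0 - frac_enh) + frac_enh / speedup_enh)


def f_overall(a: float, b: float, f: float) -> float:
    """Same quantity from the absolute costs: (A + B) / (A + B / f)."""
    return (a + b) / (a + b / f)


# Both forms agree, e.g. A = 2, B = 1 (so frac_enh = 1/3) with f = 10:
a, b, f = 2.0, 1.0, 10.0
print(speedup_overall(b / (a + b), f), f_overall(a, b, f))
```

The fraction form is handy precisely when only relative sizes are known: as speedup_enh grows without bound, the result approaches 1 / (1 - frac_enh), which equals 1 + B/A.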

43
Key Points from This Module

Throughput vs. Response Time
Performance as Inverse Execution Time
Speedup Factors
Averaging Benchmark Results
CPU Performance Equation
$\text{Execution time} = IC \times CPI \times t_{CC}$
$\text{Performance} = f_{CC} / (IC \times CPI)$
Amdahl's Law
$C' = A + B/f$
Implies
$f_{overall} = \frac{C}{C'} = \frac{A + B}{A + B/f} < 1 + \frac{B}{A}$

C': execution time after improvement
B: part of execution time affected by improvement
f: factor of improvement (speedup of enhanced part)
A: part of execution time unaffected by improvement
44
Example Performance Calculation

Suppose a program takes 10 seconds on computer A,
and suppose computer A has a 4 GHz clock.
We want a new computer B to run the program in 6 seconds.
Suppose that increasing the clock speed is only
possible with a substantial processor redesign,
which will result in 1.2 times as many clock cycles
being needed to execute the program.
What clock rate is needed?
Answer: $4\ \text{GHz} \times (10/6) \times 1.2 = 8\ \text{GHz}$
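The same calculation as a short sketch: the cycle count on computer A is time × clock rate, the redesign multiplies it by 1.2, and the required clock rate is the new cycle count divided by the 6-second target. The function name `required_clock_hz` is hypothetical.

```python
def required_clock_hz(old_time_s, old_clock_hz, new_time_s, cycle_factor):
    """Clock rate that runs the program in new_time_s after a redesign that
    multiplies the number of clock cycles by cycle_factor."""
    old_cycles = old_time_s * old_clock_hz  # CC on computer A
    new_cycles = cycle_factor * old_cycles  # CC on computer B
    return new_cycles / new_time_s          # f_CC = CC / CPUTime


f_b = required_clock_hz(old_time_s=10.0, old_clock_hz=4e9, new_time_s=6.0, cycle_factor=1.2)
print(f"{f_b / 1e9:.1f} GHz")  # 8.0 GHz
```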

45
Another Example

Consider two different implementations of a given
ISA, running a given benchmark
Processor A has a cycle time of 250 ps
And a CPI of 2.0
Processor B has a cycle time of 500 ps
And a CPI of 1.2
Which computer is faster on this benchmark, and
by what factor?
Processor A takes $250\ \text{ps} \times 2.0 = 500$ ps/instr.
Processor B takes $500\ \text{ps} \times 1.2 = 600$ ps/instr.
Thus, A is faster by a factor of $6/5 = 1.2$.
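Because both processors implement the same ISA and run the same benchmark, the instruction count is identical and cancels out, so comparing average time per instruction (cycle time × CPI) is enough. A minimal sketch with an illustrative helper name:

```python
def time_per_instruction_ps(cycle_time_ps: float, cpi: float) -> float:
    # average time per instruction = t_CC x CPI
    return cycle_time_ps * cpi


# Same ISA and benchmark, so IC is identical and cancels out of the comparison.
per_instr_a = time_per_instruction_ps(250, 2.0)  # 500.0 ps
per_instr_b = time_per_instruction_ps(500, 1.2)  # 600.0 ps
print(f"A is faster by a factor of {per_instr_b / per_instr_a:.2f}")  # 1.20
```

Processor B's lower CPI does not make up for its doubled cycle time; neither factor alone decides the comparison.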

46
Another example

Suppose some Java application takes 15 seconds on
a certain machine.
A new Java compiler is released that requires
only 0.6 times as many dynamic instructions to run the
application.
Unfortunately, it also increases the CPI by a factor of 1.1.
(Presumably, it uses more multi-cycle instructions.)
How fast will the application run when compiled
using the new compiler?
It will take $15 \times 0.6 \times 1.1 = 9.9$ seconds to run.
It will be $15/9.9 = 50/33 \approx 1.515$ times faster:
only slightly more than 50% faster than before.
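A sketch of this calculation, assuming (as the slide implicitly does) that the clock rate is unchanged, so that CPU time scales with IC × CPI. The function name is illustrative.

```python
def new_cpu_time(old_time_s, ic_factor, cpi_factor):
    # CPUTime = IC x CPI / f_CC; with f_CC unchanged, time scales with IC x CPI
    return old_time_s * ic_factor * cpi_factor


old = 15.0
new = new_cpu_time(old, ic_factor=0.6, cpi_factor=1.1)
print(f"new time = {new:.1f} s, speedup = {old / new:.3f}")  # new time = 9.9 s, speedup = 1.515
```

The 40% reduction in instruction count is partly eaten by the 10% CPI increase, which is why the net speedup is about 1.515 rather than 1/0.6 ≈ 1.667.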

47
Another Example

Consider the following measurements of execution
time:
Which of the following statements are true?
A is faster than B for program 1.
A is faster than B for program 2.
A is faster than B for a workload with equal
numbers of executions of programs 1 and 2.
A is faster than B for a workload with twice as
many executions of program 1 as of program 2.

Program   Computer A   Computer B
1         2 sec.       4 sec.
2         5 sec.       2 sec.
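One way to check the four statements is to compare total execution time for each workload, with each program's time multiplied by how many times it runs. Using total execution time as the criterion for "faster for a workload" is an assumption here (it matches the TET approach discussed earlier). The dictionary `TIMES` transcribes the measurements; `workload_time` is an illustrative helper.

```python
TIMES = {  # execution times in seconds, from the measurements table
    "A": {1: 2.0, 2: 5.0},
    "B": {1: 4.0, 2: 2.0},
}


def workload_time(computer, runs):
    """Total execution time for a workload given the number of runs per program."""
    return sum(TIMES[computer][prog] * n for prog, n in runs.items())


workloads = {
    "program 1 only": {1: 1},
    "program 2 only": {2: 1},
    "equal mix of 1 and 2": {1: 1, 2: 1},
    "program 1 twice as often as 2": {1: 2, 2: 1},
}

for name, runs in workloads.items():
    ta, tb = workload_time("A", runs), workload_time("B", runs)
    print(f"{name}: A = {ta} s, B = {tb} s, A faster: {ta < tb}")
```

The answer for a mixed workload depends on the mix, which is why the choice of weights matters when summarizing benchmark results.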
