1
HSN - lecture 8 CPU operation and some
performance improvements
  • In this lecture we will cover
  • Instruction format
  • Fetch-Execute cycle
  • micro-operation implementation of fetch-execute
    cycle
  • performance improvements through
  • caching
  • pipelining

2
  • Before looking at the overall structure of a
    simple CPU in more detail, and at the basic
    cycle of actions in a CPU, it is first necessary
    to look very simply at how instructions are held
    in memory

3
Instructions
  • you have seen how bits can be used to encode
    values for data, e.g. integers (positive or
    negative), floating point numbers, characters,
    etc.
  • sequences of bits are also used to represent
    instructions, with a given pattern of bits
    assigned to each specific instruction
  • bit values - whether instructions or data - are
    usually held in memory in several consecutive
    bytes called a word
  • a word is the basic unit of bits that is used to
    transfer data and instructions around the
    computer, e.g. a 32 bit word size or a 64 bit
    word size

4
  • In order to carry out an operation the CPU needs
    to know not only the operation to be carried out,
    but also what bit values are going to be
    operated on (called the operands), e.g. which 2
    numbers are to be added together in an ADD
    operation
  • so in memory each instruction is normally
    followed by some bit values which enable the CPU
    to work out where to find the operands for the
    operation - in fact the instruction bit pattern
    tells the CPU how many of the bits that follow
    the instruction itself are needed for the
    calculation and how to use those bit values to
    locate the operands. Note that a few instructions
    do not need any operands

5
  • So the typical format of an instruction in memory
    is

    | Instruction bits | Operand location bits | Operand location bits |

  • there may be 0, 1, 2 or even more operand location
    words, depending upon the instruction and the type
    of CPU
  • remember words may be more than 1 byte in size
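
The sketch below shows the idea in Python, assuming a hypothetical 32 bit word with an 8 bit opcode followed by a 24 bit operand address; the opcode values and field widths are made up for illustration, not taken from the lecture.

  # Hypothetical encoding: 8-bit opcode in the top byte,
  # 24-bit operand address in the remaining bits.
  OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03}  # made-up values

  def encode(mnemonic, address):
      """Pack an opcode and operand address into one 32-bit word."""
      return (OPCODES[mnemonic] << 24) | (address & 0xFFFFFF)

  def decode(word):
      """Split a 32-bit word back into opcode and operand address."""
      return word >> 24, word & 0xFFFFFF

  word = encode("ADD", 200)
  print(hex(word))     # 0x20000c8
  print(decode(word))  # (2, 200)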
6
(No Transcript)
7
Fetch-Execute cycle
  • So to execute a program it is first necessary to
    fetch each instruction from memory, to determine
    what instruction to execute, and to fetch any
    operand values needed by a given instruction from
    memory, so that the CPU has the data to be
    operated on - this is called the Fetch
  • when both the instruction and the data to be
    operated on are in the CPU, the CPU can execute
    the operation on the data to give the result -
    this is called the Execute.

8
  • Fetching and then executing have to be done for
    each instruction that is to be executed during a
    run of a program - thus this defines the basic
    cycle of activities for a CPU
  • This basic cycle of actions of a CPU is called
    the Fetch-Execute cycle

9
Fetch-Execute expressed as an algorithm
  • repeat
  • Fetch instruction from main memory at address
    specified in Program Counter to Instruction
    Register
  • Update Program Counter so that the Program Counter
    has the address of the word that follows the
    instruction - this might be needed for operand
    location information (an address)

10
  • Decode instruction - the instruction word is
    decoded to determine the nature of the operation
    and whether further fetches of operand location
    information are needed (address info of the
    operands); if they are, then the relevant words
    will be fetched, updating the Program Counter as
    required
  • Execute instruction - access the actual operand
    values (very often this involves fetching them
    from memory), perform the operation and place the
    results in the designated location
  • until halt signal
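
A minimal Python sketch of this cycle, for a hypothetical one-address machine; the opcodes and memory layout are made up for illustration (they match the 3-instruction example used later for pipelining).

  # Instructions and data share one memory, addressed by the PC.
  memory = {0: ("LOAD", 100), 1: ("ADD", 200), 2: ("STORE", 104),
            3: ("HALT", None), 100: 5, 200: 7, 104: 0}

  pc, acc = 0, 0
  while True:
      ir = memory[pc]         # Fetch: instruction at address in PC -> IR
      pc += 1                 # Update PC to the word that follows
      op, addr = ir           # Decode: operation and operand location
      if op == "LOAD":
          acc = memory[addr]  # Execute: fetch operand value from memory
      elif op == "ADD":
          acc += memory[addr]
      elif op == "STORE":
          memory[addr] = acc
      elif op == "HALT":
          break               # until halt signal

  print(memory[104])          # 12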

11
Micro-operations
  • Each phase in the Fetch-Execute cycle consists of
    a sequence of smaller actions called
    micro-operations
  • a micro-operation is a distinct operation that
    can be effected by a set of control signals from
    the control unit
  • the specific control signals sent out that
    determine the micro-operations may be
  • hardwired - which means that the signals required
    are determined by the wiring of digital circuitry
    alone
  • micro-programmed - which means that the signals
    required are determined by the contents of a
    special memory that contains the set of signals
    for each step - like a set of instructions in a
    program

12
Schematic of simple CPU - 2nd version (the area
inside the dotted line is the CPU)
13
  • The following gives an example of a series of
    micro-operations that might implement a
    fetch/execute cycle for our simple CPU - we will
    need to refer back to our previous diagram of a
    simple CPU
  • FETCH instruction
  • issue control signals 9 and 11 - MAR ← PC
  • issue control signals 1 and 14 - MDR ←
    MEMORY[MAR], PC ← PC + 1
  • issue control signals 8 and 12 - IR ← MDR

14
  • FETCH operand
  • issue 9 and 11 - MAR ← PC
  • issue 1 and 14 - MDR ← MEMORY[MAR], PC ← PC + 1
  • issue 11 and 12 - MAR ← MDR
  • issue 1 - MDR ← MEMORY[MAR]
  • operation decoded was ADD value to accumulator
  • EXECUTE instruction
  • issue 4 and 12 - TEMP ← MDR
  • issue 3, 6 and 15 - ACC ← ALU(ACC + TEMP)
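
A minimal Python sketch of the FETCH instruction phase as micro-operations, each step being a set of control signals applied to a simulated datapath - in effect a tiny micro-program. The signal numbers follow the slides, but what each signal does here is an assumption, since the control-signal table itself is not in this transcript.

  regs = {"PC": 0, "MAR": 0, "MDR": 0, "IR": 0}
  MEMORY = {0: 0x020000C8}                     # one made-up instruction word

  def step(signals):
      if {9, 11} <= signals:                   # MAR <- PC
          regs["MAR"] = regs["PC"]
      if {1, 14} <= signals:                   # MDR <- MEMORY[MAR], PC <- PC + 1
          regs["MDR"] = MEMORY[regs["MAR"]]
          regs["PC"] += 1
      if {8, 12} <= signals:                   # IR <- MDR
          regs["IR"] = regs["MDR"]

  for signals in [{9, 11}, {1, 14}, {8, 12}]:  # the FETCH sequence above
      step(signals)

  print(hex(regs["IR"]), regs["PC"])           # 0x20000c8 1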

15
List of some operations and associated control
signals for our simple CPU
16
Cache memory
  • normally the CPU has to fetch data from RAM, but
    accessing RAM is very slow compared with the speed
    at which the CPU can make requests for data
  • to increase speed a cache is located between the
    CPU and RAM
  • Caching - keeping a local copy of some of the data
    from a slower storage medium in a faster storage
    medium, close to the point of use of that data -
    reduces the time required to access items, partly
    because they are physically closer to the point of
    use and fewer actions are required to access them
  • essentially a fast local copy

17
  • Cache normally implemented in static RAM because
    static RAM is faster than Dynamic RAM - so it is
  • expensive
  • thus of small size compared to main memory
  • but fast
  • when the cache is full, some policy is needed to
    decide which part of the cached information will
    be replaced with the newly required information
  • the LRU (Least Recently Used) algorithm removes
    the least recently referenced item from the cache
    (see the sketch after this list)
  • also need to provide a mapping between the copy in
    the cache and the copy in main memory, and to
    maintain consistency between the 2 copies
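
A minimal Python sketch of LRU replacement, assuming a tiny fully associative cache of 3 entries; the capacity and addresses are made up for illustration.

  from collections import OrderedDict

  class LRUCache:
      def __init__(self, capacity=3):
          self.capacity = capacity
          self.lines = OrderedDict()          # address -> data, oldest first

      def access(self, address, memory):
          if address in self.lines:           # hit: mark as most recently used
              self.lines.move_to_end(address)
              return self.lines[address]
          if len(self.lines) >= self.capacity:
              self.lines.popitem(last=False)  # evict the least recently used
          self.lines[address] = memory[address]
          return self.lines[address]

  memory = {a: a * 10 for a in range(8)}
  cache = LRUCache()
  for addr in [1, 2, 3, 1, 4]:                # address 2 is evicted, not 1
      cache.access(addr, memory)
  print(list(cache.lines))                    # [3, 1, 4]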

18
Cache levels
  • Level 1 - inside the processor chip, close to
    instruction execution - very fast, but very
    expensive - thus very small
  • Level 2 - used to be outside the processor chip,
    between the processor and ordinary memory, but is
    now usually incorporated as a separate chip next
    to the CPU on a module (called a processor module)
    that contains the CPU and a dedicated cache chip -
    level 2 is slower than level 1, although still
    much faster than main memory, but it is cheaper
    than level 1 and can be larger, although not as
    large as main memory.

19
Cache operation
(Diagram: the CPU is connected via the Address Bus
and Data Bus to the Cache, which sits between the CPU
and RAM and performs pre-fetch and write-back.)
20
Cache operation
  • Prefetch guesses what data or instructions are
    going to be read from main memory next
  • fetches data/instructions from main memory into
    the cache if they are not already in the cache
  • When the CPU requests information (sketched after
    this list),
  • if the L1 cache has a copy, then it is given to
    the CPU by the cache; otherwise the L2 cache is
    checked
  • if the L2 cache has a copy, then it is given to
    the CPU and a copy is placed in the L1 cache
  • otherwise the data is requested from main memory,
    a copy is placed in the L1 and L2 caches for
    future use, and it is passed on to the CPU
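
A minimal Python sketch of this lookup order, with plain dictionaries standing in for the caches (no sizes, no eviction, no blocks - just the search path and the copy-back into the faster levels).

  def read(address, l1, l2, ram):
      if address in l1:            # L1 hit: given to the CPU by the cache
          return l1[address]
      if address in l2:            # L2 hit: a copy is also placed in L1
          l1[address] = l2[address]
          return l1[address]
      value = ram[address]         # miss: request main memory for the data
      l2[address] = value          # copy placed in L2 ...
      l1[address] = value          # ... and in L1 for future use
      return value

  l1, l2, ram = {}, {}, {100: 42}
  print(read(100, l1, l2, ram))    # 42, fetched from RAM and cached
  print(read(100, l1, l2, ram))    # 42, now served from L1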

21
  • A block of data/instructions (often called a
    page) from main memory is placed in the cache -
    so that instructions and data that are close in
    memory to the one currently being used (the ones
    most likely to be used next) are already in the
    cache
  • if the cache copy is modified, then the modified
    data is written back to main memory - to maintain
    consistency of the data (see the sketch below).
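
A minimal Python sketch of write-back, assuming each cache entry carries a dirty flag that marks it as modified since it was fetched; block/page granularity is ignored, so one word stands in for a block.

  ram = {100: 5}
  cache = {}                           # address -> [value, dirty]

  def write(address, value):
      cache[address] = [value, True]   # modify the cache copy, mark it dirty

  def evict(address):
      value, dirty = cache.pop(address)
      if dirty:                        # write modified data back to main
          ram[address] = value         # memory to keep the copies consistent

  write(100, 9)
  evict(100)
  print(ram[100])                      # 9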

22
Pipelines
  • The Fetch-execute cycle requires a number of
    micro-operations to complete, but this is a fixed
    sequence of events where one thing cannot be done
    until another is completed
  • this means that for a large part of the
    fetch-execute cycle the various components of the
    CPU are actually idle, e.g.
  • the ALU only does some useful work at the end of
    the execution phase, but is idle while the
    instructions, operand location information and
    the operand values themselves are being fetched
  • the instruction decode mechanism only does actual
    work at the time the instruction is first fetched,
    but after that it is idle

23
  • Analogy - imagine passing a piece of paper down a
    line of people where each person has to look at
    the paper and do something with it before
    passing it on to the next person, but only one
    piece of paper is allowed in the line at one time
    - so the first person in the line cannot start a
    new job until the last person has finished - a
    lot of wasted time - this is like the normal
    fetch-execute cycle
  • compare that with a line of people where after
    first person has processed the piece of paper
    they start on the next piece of paper and so on,
    so eventually everyone in the line is busy
    processing some job - this is like the pipeline
    approach

24
  • Example - simplistic 3 stage pipeline to
    illustrate idea
  • 3 instructions in code
  • load 100 - load into the Accumulator the value at
    address 100
  • add 200 - add to the accumulator the value at
    address 200
  • store 104 - store the value in the accumulator
    into the location at address 104
  • without a pipeline each will execute in turn with
    no overlap

25
  • Example instruction execution without a pipeline

26
  • Illustration of overlap in instruction execution
    with the simplistic 3 stage pipeline - it is more
    efficient (a sketch of the timing follows)
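
A minimal Python sketch that prints the overlap for the 3 instructions above, assuming an idealised 3 stage pipeline (fetch, decode, execute) with no stalls; the 3 instructions finish in 5 cycles instead of 9.

  STAGES = ["fetch", "decode", "execute"]
  program = ["load 100", "add 200", "store 104"]

  cycles = len(program) + len(STAGES) - 1      # 5 cycles, not 3 * 3 = 9
  for cycle in range(cycles):
      row = []
      for s, stage in enumerate(STAGES):
          i = cycle - s                        # which instruction is here
          slot = program[i] if 0 <= i < len(program) else "-"
          row.append(f"{stage}: {slot:<9}")
      print(f"cycle {cycle + 1}: " + " | ".join(row))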

27
  • So there is the possibility of increasing the
    work done by overlapping the various
    micro-operations required in the fetch-execute
    sequence for one instruction with the
    fetch-execute micro-operations of another
    instruction
  • there are problems with the approach which lead
    to some waste, because in a CPU we do not, in
    general, know for certain which instruction will
    need to be executed next after a given instruction
    until we have completed that instruction

28
  • so for the pipeline idea to work it is necessary
    for the CPU to predict which instruction will be
    executed next (done by a predictor) in order to
    start processing it before the earlier
    instructions have completed. If it turns out that
    the execution of a particular instruction is not
    going to be needed, it simply means that the
    work done processing that instruction has to be
    abandoned - however, the predictor gets it right
    much of the time - so overall there is still a
    great increase in efficiency
  • the prediction and the duplication of operations
    will require the duplication of some of the
    components on the CPU, but space on the CPU is
    comparatively cheap, whereas the performance
    benefits are great
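
The lecture does not say how the predictor works; a common scheme is a 2 bit saturating counter per branch, sketched below in Python with made-up outcomes for a loop branch.

  counter = 2                      # 0-1 predict not-taken, 2-3 predict taken

  def predict():
      return counter >= 2          # True means "predict branch taken"

  def update(taken):
      global counter
      counter = min(counter + 1, 3) if taken else max(counter - 1, 0)

  hits = 0
  outcomes = [True] * 9 + [False]  # a loop branch: taken 9 times, then exits
  for actual in outcomes:
      hits += (predict() == actual)
      update(actual)
  print(f"{hits}/{len(outcomes)} predicted correctly")   # 9/10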