1
HSN - lecture 8 CPU operation and some
performance improvements
  • In this lecture we will cover
  • Instruction format
  • Fetch-Execute cycle
  • micro-operation implementation of fetch-execute
    cycle
  • performance improvements through
  • caching
  • pipelining

2
  • Before looking at the overall structure of a
    simple CPU in more detail, and at the basic
    cycle of actions in a CPU, it is first necessary
    to look very simply at how instructions are held
    in memory

3
Instructions
  • you have seen how bits can be used to encode
    values for data, e.g. integers (positive or
    negative), floating point numbers, characters,
    etc.
  • sequences of bits are also used to represent
    instructions, with a given pattern of bits
    assigned to each specific instruction
  • bit values - whether instructions or data - are
    usually held in memory in several consecutive
    bytes called a word
  • a word is the basic unit of bits that is used to
    transfer data and instructions around the
    computer, e.g. a 32 bit word size or a 64 bit
    word size

4
  • In order to carry out an operation the CPU needs
    to know not only the operation to be carried out,
    but also what bit values are going to be
    operated on (called the operands), e.g. which 2
    numbers are to be added together in an ADD
    operation
  • so in memory each instruction is normally
    followed by some bit values which enable the CPU
    to work out where to find the operands for the
    operation - in fact the instruction bit pattern
    tells the CPU how many of the bits that follow
    the instruction itself are needed for the
    calculation and how to use those bit values to
    locate the operands. Note that a few instructions
    do not need any operands

5
  • So the typical format of an instruction in memory
    is

    | Instruction bits | Operand location bits | Operand location bits |

  • there may be 0, 1, 2 or even more operand location
    words, depending upon the instruction and the type
    of CPU
  • remember words may be more than 1 byte in size
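
The sketch below shows the idea in Python, assuming a hypothetical 32 bit word with an 8 bit opcode followed by a 24 bit operand address; the opcode values and field widths are made up for illustration, not taken from the lecture.

  # Hypothetical encoding: 8-bit opcode in the top byte,
  # 24-bit operand address in the remaining bits.
  OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03}  # made-up values

  def encode(mnemonic, address):
      """Pack an opcode and operand address into one 32-bit word."""
      return (OPCODES[mnemonic] << 24) | (address & 0xFFFFFF)

  def decode(word):
      """Split a 32-bit word back into opcode and operand address."""
      return word >> 24, word & 0xFFFFFF

  word = encode("ADD", 200)
  print(hex(word))     # 0x20000c8
  print(decode(word))  # (2, 200)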
6
(No Transcript)
7
Fetch-Execute cycle
  • So to execute a program it is first necessary to
    fetch each instruction from memory, to determine
    what instruction to execute, and to fetch any
    operand values needed by a given instruction from
    memory, so that the CPU has the data to be
    operated on - this is called the Fetch
  • when both the instruction and the data to be
    operated on are in the CPU, the CPU can execute
    the operation on the data to give the result -
    this is called the Execute.

8
  • Fetching and then executing have to be done for
    each instruction that is to be executed during a
    run of a program - thus this defines the basic
    cycle of activities for a CPU
  • This basic cycle of actions of a CPU is called
    the Fetch-Execute cycle

9
Fetch-Execute expressed as an algorithm
  • repeat
  • Fetch instruction from main memory at address
    specified in Program Counter to Instruction
    Register
  • Update Program Counter so that the Program Counter
    has the address of the word that follows the
    instruction - this might be needed for operand
    location information (an address)

10
  • Decode instruction - the instruction word is
    decoded to determine the nature of the operation
    and whether further fetches of operand location
    information are needed (address info of the
    operands); if they are, then the relevant words
    will be fetched, updating the Program Counter as
    required
  • Execute instruction - access the actual operand
    values (very often this involves fetching them
    from memory), perform the operation and place the
    results in the designated location
  • until halt signal
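
A minimal Python sketch of this cycle, for a hypothetical one-address machine; the opcodes and memory layout are made up for illustration (they match the 3-instruction example used later for pipelining).

  # Instructions and data share one memory, addressed by the PC.
  memory = {0: ("LOAD", 100), 1: ("ADD", 200), 2: ("STORE", 104),
            3: ("HALT", None), 100: 5, 200: 7, 104: 0}

  pc, acc = 0, 0
  while True:
      ir = memory[pc]         # Fetch: instruction at address in PC -> IR
      pc += 1                 # Update PC to the word that follows
      op, addr = ir           # Decode: operation and operand location
      if op == "LOAD":
          acc = memory[addr]  # Execute: fetch operand value from memory
      elif op == "ADD":
          acc += memory[addr]
      elif op == "STORE":
          memory[addr] = acc
      elif op == "HALT":
          break               # until halt signal

  print(memory[104])          # 12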

11
Micro-operations
  • Each phase in the Fetch-Execute cycle consists of
    a sequence of smaller actions called
    micro-operations
  • a micro-operation is a distinct operation that
    can be effected by a set of control signals from
    the control unit
  • the specific control signals sent out that
    determine the micro-operations may be
  • hardwired - which means that the signals required
    are determined by the wiring of digital circuitry
    alone
  • micro-programmed - which means that the signals
    required are determined by the contents of a
    special memory that contains the set of signals
    for each step - like a set of instructions in a
    program

12
Schematic of simple CPU - 2nd version (the area
inside the dotted line is the CPU)
13
  • The following gives an example of a series of
    micro-operations that might implement a
    fetch/execute cycle for our simple CPU - we will
    need to refer back to our previous diagram of a
    simple CPU
  • FETCH instruction
  • issue control signals 9 and 11 - MAR ← PC
  • issue control signals 1 and 14 - MDR ←
    MEMORY[MAR], PC ← PC + 1
  • issue control signals 8 and 12 - IR ← MDR

14
  • FETCH operand
  • issue 9 and 11 - MAR ← PC
  • issue 1 and 14 - MDR ← MEMORY[MAR], PC ← PC + 1
  • issue 11 and 12 - MAR ← MDR
  • issue 1 - MDR ← MEMORY[MAR]
  • operation decoded was ADD value to accumulator
  • EXECUTE instruction
  • issue 4 and 12 - TEMP ← MDR
  • issue 3, 6 and 15 - ACC ← ALU(ACC + TEMP)
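
A minimal Python sketch of the FETCH instruction phase as micro-operations, each step being a set of control signals applied to a simulated datapath - in effect a tiny micro-program. The signal numbers follow the slides, but what each signal does here is an assumption, since the control-signal table itself is not in this transcript.

  regs = {"PC": 0, "MAR": 0, "MDR": 0, "IR": 0}
  MEMORY = {0: 0x020000C8}                     # one made-up instruction word

  def step(signals):
      if {9, 11} <= signals:                   # MAR <- PC
          regs["MAR"] = regs["PC"]
      if {1, 14} <= signals:                   # MDR <- MEMORY[MAR], PC <- PC + 1
          regs["MDR"] = MEMORY[regs["MAR"]]
          regs["PC"] += 1
      if {8, 12} <= signals:                   # IR <- MDR
          regs["IR"] = regs["MDR"]

  for signals in [{9, 11}, {1, 14}, {8, 12}]:  # the FETCH sequence above
      step(signals)

  print(hex(regs["IR"]), regs["PC"])           # 0x20000c8 1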

15
List of some operations and associated control
signals for our simple CPU
16
Cache memory
  • normally the CPU has to fetch data from RAM, but
    accessing RAM is very slow compared with the speed
    at which the CPU can make requests for data
  • to increase speed a cache is located between the
    CPU and RAM
  • Caching - keeping a local copy of some of the data
    from a slower storage medium in a faster storage
    medium, close to the point of use of that data -
    reduces the time required to access items, partly
    because they are physically closer to the point of
    use and fewer actions are required to access them
  • essentially a fast local copy

17
  • Cache normally implemented in static RAM because
    static RAM is faster than Dynamic RAM - so it is
  • expensive
  • thus of small size compared to main memory
  • but fast
  • when the cache is full, some policy is needed to
    decide which part of the cached information will
    be replaced with the newly required information
  • the LRU (Least Recently Used) algorithm removes
    the least recently referenced item from the cache
    (see the sketch after this list)
  • also need to provide a mapping between the copy in
    the cache and the copy in main memory, and to
    maintain consistency between the 2 copies
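
A minimal Python sketch of LRU replacement, assuming a tiny fully associative cache of 3 entries; the capacity and addresses are made up for illustration.

  from collections import OrderedDict

  class LRUCache:
      def __init__(self, capacity=3):
          self.capacity = capacity
          self.lines = OrderedDict()          # address -> data, oldest first

      def access(self, address, memory):
          if address in self.lines:           # hit: mark as most recently used
              self.lines.move_to_end(address)
              return self.lines[address]
          if len(self.lines) >= self.capacity:
              self.lines.popitem(last=False)  # evict the least recently used
          self.lines[address] = memory[address]
          return self.lines[address]

  memory = {a: a * 10 for a in range(8)}
  cache = LRUCache()
  for addr in [1, 2, 3, 1, 4]:                # address 2 is evicted, not 1
      cache.access(addr, memory)
  print(list(cache.lines))                    # [3, 1, 4]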

18
Cache levels
  • Level 1 - inside the processor chip, close to
    instruction execution - very fast, but very
    expensive - thus very small
  • Level 2 - used to be outside the processor chip,
    between the processor and ordinary memory, but is
    now usually incorporated as a separate chip next
    to the CPU on a module (called a processor module)
    that contains the CPU and a dedicated cache chip -
    level 2 is slower than level 1, although still
    much faster than main memory, but it is cheaper
    than level 1 and can be larger, although not as
    large as main memory.

19
Cache operation
(Diagram: the CPU is connected via the Address Bus
and Data Bus to the Cache, which sits between the CPU
and RAM and performs pre-fetch and write-back.)
20
Cache operation
  • Prefetch guesses what data or instructions are
    going to be read from main memory next
  • fetches data/instructions from main memory into
    the cache if they are not already in the cache
  • When the CPU requests information (sketched after
    this list),
  • if the L1 cache has a copy, then it is given to
    the CPU by the cache; otherwise the L2 cache is
    checked
  • if the L2 cache has a copy, then it is given to
    the CPU and a copy is placed in the L1 cache
  • otherwise the data is requested from main memory,
    a copy is placed in the L1 and L2 caches for
    future use, and it is passed on to the CPU
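
A minimal Python sketch of this lookup order, with plain dictionaries standing in for the caches (no sizes, no eviction, no blocks - just the search path and the copy-back into the faster levels).

  def read(address, l1, l2, ram):
      if address in l1:            # L1 hit: given to the CPU by the cache
          return l1[address]
      if address in l2:            # L2 hit: a copy is also placed in L1
          l1[address] = l2[address]
          return l1[address]
      value = ram[address]         # miss: request main memory for the data
      l2[address] = value          # copy placed in L2 ...
      l1[address] = value          # ... and in L1 for future use
      return value

  l1, l2, ram = {}, {}, {100: 42}
  print(read(100, l1, l2, ram))    # 42, fetched from RAM and cached
  print(read(100, l1, l2, ram))    # 42, now served from L1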

21
  • A block of data/instructions (often called a
    page) from main memory is placed in the cache -
    so that instructions and data that are close in
    memory to the one currently being used (the ones
    most likely to be used next) are already in the
    cache
  • if the cache copy is modified, then the modified
    data is written back to main memory - to maintain
    consistency of the data (see the sketch below).
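
A minimal Python sketch of write-back, assuming each cache entry carries a dirty flag that marks it as modified since it was fetched; block/page granularity is ignored, so one word stands in for a block.

  ram = {100: 5}
  cache = {}                           # address -> [value, dirty]

  def write(address, value):
      cache[address] = [value, True]   # modify the cache copy, mark it dirty

  def evict(address):
      value, dirty = cache.pop(address)
      if dirty:                        # write modified data back to main
          ram[address] = value         # memory to keep the copies consistent

  write(100, 9)
  evict(100)
  print(ram[100])                      # 9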

22
Pipelines
  • The Fetch-execute cycle requires a number of
    micro-operations to complete, but this is a fixed
    sequence of events where one thing cannot be done
    until another is completed
  • this means that for a large part of the
    fetch-execute cycle the various components of the
    CPU are actually idle, e.g.
  • the ALU only does some useful work at the end of
    the execution phase, but is idle while the
    instructions, operand location information and
    the operand values themselves are being fetched
  • the instruction decode mechanism only does actual
    work at the time the instruction is first fetched,
    but after that it is idle

23
  • Analogy - imagine passing a piece of paper down a
    line of people where each person has to look at
    the paper and do something with it before
    passing it on to the next person, but only one
    piece of paper is allowed in the line at one time
    - so the first person in the line cannot start a
    new job until the last person has finished - a
    lot of wasted time - this is like the normal
    fetch-execute cycle
  • compare that with a line of people where after
    first person has processed the piece of paper
    they start on the next piece of paper and so on,
    so eventually everyone in the line is busy
    processing some job - this is like the pipeline
    approach

24
  • Example - simplistic 3 stage pipeline to
    illustrate idea
  • 3 instructions in code
  • load 100 - load into the Accumulator the value at
    address 100
  • add 200 - add to the accumulator the value at
    address 200
  • store 104 - store the value in the accumulator
    into the location at address 104
  • without a pipeline each will execute in turn with
    no overlap

25
  • Example instruction execution without a pipeline

26
  • Illustration of overlap in instruction execution
    with the simplistic 3 stage pipeline - it is more
    efficient (a sketch of the timing follows)
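
A minimal Python sketch that prints the overlap for the 3 instructions above, assuming an idealised 3 stage pipeline (fetch, decode, execute) with no stalls; the 3 instructions finish in 5 cycles instead of 9.

  STAGES = ["fetch", "decode", "execute"]
  program = ["load 100", "add 200", "store 104"]

  cycles = len(program) + len(STAGES) - 1      # 5 cycles, not 3 * 3 = 9
  for cycle in range(cycles):
      row = []
      for s, stage in enumerate(STAGES):
          i = cycle - s                        # which instruction is here
          slot = program[i] if 0 <= i < len(program) else "-"
          row.append(f"{stage}: {slot:<9}")
      print(f"cycle {cycle + 1}: " + " | ".join(row))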

27
  • So there is the possibility of increasing the
    work done by overlapping the various
    micro-operations required in the fetch-execute
    sequence for one instruction with the
    fetch-execute micro-operations of another
    instruction
  • there are problems with the approach which lead
    to some waste, because in a CPU we do not, in
    general, know for certain which instruction will
    need to be executed next after a given instruction
    until we have completed that instruction

28
  • so for the pipeline idea to work it is necessary
    for the CPU to predict which instruction will be
    executed next (done by a predictor) in order to
    start processing it before the earlier
    instructions have completed. If it turns out that
    the execution of a particular instruction is not
    going to be needed, it simply means that the
    work done processing that instruction has to be
    abandoned - however, the predictor gets it right
    much of the time - so overall there is still a
    great increase in efficiency
  • the prediction and the duplication of operations
    will require the duplication of some of the
    components on the CPU, but space on the CPU is
    comparatively cheap, whereas the performance
    benefits are great
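
The lecture does not say how the predictor works; a common scheme is a 2 bit saturating counter per branch, sketched below in Python with made-up outcomes for a loop branch.

  counter = 2                      # 0-1 predict not-taken, 2-3 predict taken

  def predict():
      return counter >= 2          # True means "predict branch taken"

  def update(taken):
      global counter
      counter = min(counter + 1, 3) if taken else max(counter - 1, 0)

  hits = 0
  outcomes = [True] * 9 + [False]  # a loop branch: taken 9 times, then exits
  for actual in outcomes:
      hits += (predict() == actual)
      update(actual)
  print(f"{hits}/{len(outcomes)} predicted correctly")   # 9/10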