Chapter 5: ISAs - PowerPoint PPT Presentation

1
Chapter 5: ISAs
  • In MARIE, we had simple instructions: a 4-bit op code followed by either
  • a 12-bit address for Load, Store, Add, Subt, and Jump
  • a 2-bit condition code for Skipcond
  • twelve 0s for instructions that did not need a datum
  • However, most ISAs are much more complex, so there are many more op codes and possibly more than 1 operand
  • How do we specify the operation?
  • Each operation has a unique op code, although op codes might not be of equal length (in MARIE, all were 4 bits; in some ISAs, op codes range from 8 bits to 16 or more)
  • How do we specify the number of operands?
  • This is usually determined by the op code, although it could also be specified in the instruction as an added piece of instruction information
  • How do we specify the location of each operand?
  • We need addressing information
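As a sketch of how such fixed fields pack into a word, the 16-bit MARIE format (4-bit op code followed by a 12-bit address) can be modeled in Python; the op-code value used here is hypothetical, for illustration only:

```python
def encode_marie(opcode: int, address: int = 0) -> int:
    """Pack a 4-bit op code and a 12-bit address into one 16-bit word."""
    assert 0 <= opcode < 2**4 and 0 <= address < 2**12
    return (opcode << 12) | address

# Hypothetical op-code value; the real MARIE table assigns its own codes.
LOAD = 0b0001
word = encode_marie(LOAD, 0x0DC)
print(format(word, "016b"))  # 0001000011011100: op code, then 12-bit address
```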

2
Instruction Formats
PDP-10: fixed-length instructions, with a 9-bit op code (512 operations) followed by 2 operands; one operand is in a register, the other in memory.
PDP-11: variable-length instructions with 13 different formats; op codes vary from 4 bits to 16 bits, and 0, 1, 2, or 3 operands can be specified based on the format.
3
2 More Formats
The variable-length Intel (Pentium) format: instructions can vary from 1 byte to 17 bytes, with op codes being 1 or 2 bytes long, and all instructions having up to 2 operands.
The fixed-length PowerPC format: all instructions are 32 bits, but there are five basic forms, with up to 3 operands as long as the 3 operands are stored in registers.
4
Instruction Format Decisions
  • Length decisions
  • Fixed length
  • makes instruction fetching predictable (which
    helps out in pipelining)
  • Variable length
  • flexible instructions, can accommodate up to 3
    operands including 3 memory references and length
    is determined by need, so does not waste memory
    space
  • Number of addressing modes
  • Fewer addressing modes makes things easier on the
    architect, but possibly harder on the programmer
  • Simple addressing modes makes pipelining easier
  • How many registers?
  • Generally, the more the better, but with more registers there is less space available for other circuits or cache (more registers means more expense)

5
Alignment
  • Another question is what alignment should be
    used?
  • Recall that most machines today have word sizes
    of 32 bits or 64 bits and the CPU fetches or
    stores 1 word at a time
  • Yet memory is organized in bytes
  • Should we allow the CPU to access something
    smaller than a word?
  • If so, we have to worry about alignment
  • Two methods are used
  • Big Endian: bytes are placed in order in the word (most significant byte at the lowest address)
  • Little Endian: bytes are placed in the opposite order (least significant byte at the lowest address)
  • For example, for the word 12345678 (hex), big endian stores the bytes as 12 34 56 78, little endian as 78 56 34 12
  • Different architectures use different byte orders between these two

Intel uses little Endian, and bitmaps were
developed this way, so a bitmap must be
converted before it can be viewed on a big Endian
machine!
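The byte-order difference is easy to see with Python's struct module, packing the slide's word 12345678 (hex) both ways:

```python
import struct

word = 0x12345678

little = struct.pack("<I", word)  # little endian: least significant byte first
big = struct.pack(">I", word)     # big endian: most significant byte first
print(little.hex())  # 78563412
print(big.hex())     # 12345678
```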
6
Type of CPU Storage
  • Although all architectures today use register
    storage, other approaches have been tried
  • Accumulator-based: a single data register, the accumulator (MARIE is like this)
  • This was common in early computers, when register storage was very expensive
  • General-purpose registers: many data registers are available for the programmer's use
  • Most RISC architectures are of this form
  • Special-purpose registers: many data registers, but each has its own implied use (e.g., a counter register for loops, an I/O register for I/O operations, a base register for arrays, etc.)
  • Pentium is of this form
  • Stack-based: instead of general-purpose registers, storage is a stack, and operations are rearranged to be performed in postfix order
  • An early alternative to accumulator-based architectures, obsolete now
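To illustrate the postfix order a stack machine relies on, here is a minimal Python evaluator (a sketch; the sample expression is hypothetical):

```python
def eval_postfix(tokens):
    """Evaluate a postfix expression the way a stack machine would:
    push operands; an operator pops two values and pushes the result."""
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    stack = []
    for tok in tokens:
        if tok in ops:
            b = stack.pop()
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))
    return stack.pop()

# (8 - 2) / (1 + 1 * 2) rearranged into postfix order:
print(eval_postfix("8 2 - 1 1 2 * + /".split()))  # 2.0
```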

7
Load-Store Architectures
  • When deciding on the number of registers to make
    available, architects also decide whether to
    support a load-store instruction set
  • In a load-store instruction set, the only
    operations that are allowed to reference memory
    are loads and stores
  • All other operations (ALU operations, branches)
    must reference only values in registers or
    immediate data (data in the instruction itself)
  • This makes programming more difficult, because a simple operation like inc X must now first load X into a register and then store the result back to X after the inc
  • But it is necessary to support a pipeline, which
    ultimately speeds up processing!
  • All RISC architectures are load-store instruction
    sets and require at least 16 registers (hopefully
    more!)
  • Many CISC architectures permit memory-memory and memory-register ALU operations, so these machines can get by with fewer registers
  • Intel has 4 general-purpose data registers
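The inc X expansion described above can be sketched against a toy machine state in Python (memory and register names here are illustrative, not any real ISA's):

```python
# Toy machine state: memory addressed by symbolic name, plus a register file.
memory = {"X": 41}
regs = {"R1": 0}

# On a load-store ISA, "inc X" expands into three instructions:
regs["R1"] = memory["X"]     # LOAD  R1, X      (only loads touch memory)
regs["R1"] = regs["R1"] + 1  # ADD   R1, R1, 1  (register/immediate only)
memory["X"] = regs["R1"]     # STORE R1, X      (only stores touch memory)
print(memory["X"])  # 42
```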

8
Number of Operands
  • The number of operands that an instruction can
    specify has an impact on instruction sizes
  • Consider the instruction Add R1, R2, R3
  • Add op code is 6 bits
  • Assume 32 registers, each takes 5 bits
  • This instruction is 21 bits long
  • Consider Add X, Y, Z
  • Assume 256 MBytes of memory
  • Each memory reference is 28 bits
  • This instruction is 90 bits long!
  • However, we do not necessarily want to limit our
    instructions to having 1 or 2 operands, so we
    must either permit long instructions or find a
    compromise
  • The load-store instruction set is a compromise: 3 operands can be referenced as long as they are all in registers, and 1 operand can be referenced in memory as long as it is in an instruction by itself (a load or store uses only 1 memory reference)
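The bit counts above follow directly from the field widths; a quick Python check, using the slide's assumptions of a 6-bit op code, 32 registers, and 256 MB of memory:

```python
# Register form: 6-bit op code plus three 5-bit register fields (32 registers).
reg_bits = 6 + 3 * 5

# Memory form: 256 MB = 2**28 bytes, so each address field needs 28 bits.
addr_bits = (256 * 2**20 - 1).bit_length()  # 28
mem_bits = 6 + 3 * addr_bits

print(reg_bits, mem_bits)  # 21 90
```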

9
1, 2 and 3 Operand Examples
Using three addresses:
  SUB Y, A, B     Y ← A − B
  MPY T, D, E     T ← D × E
  ADD T, T, C     T ← T + C
  DIV Y, Y, T     Y ← Y / T
Using two addresses:
  MOVE Y, A       Y ← A
  SUB Y, B        Y ← Y − B
  MOVE T, D       T ← D
  MPY T, E        T ← T × E
  ADD T, C        T ← T + C
  DIV Y, T        Y ← Y / T
Using one address:
  LOAD D          AC ← D
  MPY E           AC ← AC × E
  ADD C           AC ← AC + C
  STOR Y          Y ← AC
  LOAD A          AC ← A
  SUB B           AC ← AC − B
  DIV Y           AC ← AC / Y
  STOR Y          Y ← AC
Here we compare the length of the code if we have one-address, two-address, or three-address instructions; each sequence computes Y = (A − B) / (C + D × E). Notice that the one- and two-address instructions write over a source operand, thus destroying data.
See pages 206-207 for another example
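As a sanity check, the one-address (accumulator) sequence can be traced in Python; the values chosen for A through E are hypothetical, any values with C + D × E ≠ 0 work:

```python
# Sample values (hypothetical).
A, B, C, D, E = 9.0, 3.0, 4.0, 1.0, 2.0
expected = (A - B) / (C + D * E)

# Trace the one-address (accumulator) sequence step by step:
mem_Y = 0.0
AC = D           # LOAD D
AC = AC * E      # MPY  E
AC = AC + C      # ADD  C
mem_Y = AC       # STOR Y
AC = A           # LOAD A
AC = AC - B      # SUB  B
AC = AC / mem_Y  # DIV  Y
mem_Y = AC       # STOR Y

print(mem_Y, expected)  # both 1.0
```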
10
Addressing Modes
  • In our instruction, how do we specify the data?
  • We have different modes to specify how to find
    the data
  • Most modes generate memory addresses, some modes
    reference registers instead
  • The most common modes follow (we have already used Direct and Indirect in our MARIE examples of the last chapter)

11
Computing These Modes
In Register, the operand is stored in a register, and the register is specified in the instruction. Example: Add R1, R2, R3.
In Immediate, the operand is in the instruction itself, such as Add 5. This is used when the datum is known at compile time; it is the quickest form of addressing.
In Direct, the operand is in memory, and the instruction contains a reference to the memory location; because there is a memory access, this method is slower than Register. Examples: Add Y (in assembly), Add 110111000 (in machine language).
In Indirect, the memory reference is to a pointer; this requires two memory accesses and so is the slowest of all addressing modes.
12
Continued
Indexed or Based is like Direct except that the address referenced is computed as a combination of a base value stored in a register and an offset in the instruction. Example: Add R3(300). This is also called Displacement or Base Displacement.
Register Indirect mode is like Indirect except that the instruction references a pointer in a register, not in memory, so one memory access is saved. Notice that Register and Register Indirect can permit shorter instructions, because a register specification is shorter than a memory address specification.
In Stack, the operand is at the top of the stack, where the stack is pointed to by a special register called the Stack Pointer; this is like Register Indirect in that it accesses a register followed by memory.
13
Example
Assume memory stores the values as shown to the left and register R1 stores 800.
Assume our instruction is Load 800. The value loaded into the accumulator depends on the addressing mode used:
Immediate: the data is 800 itself
Direct: the data's location is at address 800
Indirect: the data's location is pointed to by the value stored at address 800
Indexed: the data's location is at R1 + 800 (= 1600)
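With a small Python model of memory and R1, each mode resolves differently; the stored values here are hypothetical stand-ins for the slide's figure:

```python
# Hypothetical memory contents (the slide's figure defines its own values).
memory = {800: 900, 900: 700, 1600: 500}
R1 = 800
operand = 800  # the 800 in "Load 800"

immediate = operand                 # the data is 800 itself
direct = memory[operand]            # value stored at address 800
indirect = memory[memory[operand]]  # follow the pointer stored at 800
indexed = memory[R1 + operand]      # value at R1 + 800 = 1600
print(immediate, direct, indirect, indexed)  # 800 900 700 500
```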
14
Instruction Types
  • Now that we have explored some of the issues in designing an instruction set, let's consider the types of instructions
  • Data movement (load, store)
  • I/O
  • Arithmetic (+, −, ×, /, etc.)
  • Boolean (AND, OR, NOT, XOR, Compare)
  • Bit manipulation (rotate, shift)
  • Transfer of control (conditional branch, unconditional branch, branch and link, trap)
  • Special purpose (halt, interrupt, others)
  • The arithmetic, Boolean, and bit-manipulation instructions use the ALU
  • Note: branches add to, subtract from, or otherwise change the PC, so these also use the ALU
  • The data movement and I/O instructions use memory or I/O

15
Instruction-Level Pipelining
  • We have already covered the fetch-execute process
  • It turns out that, if we are clever about
    designing our architecture, we can design the
    fetch-execute cycle so that each phase uses
    different hardware
  • we can overlap instruction execution in a
    pipeline
  • the CPU becomes like an assembly line,
    instructions are fetched from memory and sent
    down the pipeline one at a time
  • the first instruction is at stage 2 when the
    second instruction is at stage 1
  • or, instruction j is at stage 1 when instruction j − 1 is at stage 2 and instruction j − 2 is at stage 3, etc.
  • The length of the pipeline determines how many
    overlapping instructions we can have
  • The longer the pipeline, the greater the overlap
    and so the greater the potential for speedup
  • It turns out that long pipelines are difficult to
    keep running efficiently though, so smaller
    pipelines are often used

16
A 6-stage Pipeline
  • Stage 1: Fetch instruction
  • Stage 2: Decode op code
  • Stage 3: Calculate operand addresses
  • Stage 4: Fetch operands (from registers, usually)
  • Stage 5: Execute instruction (includes computing the new PC for branches, or doing loads and stores)
  • Stage 6: Store result (in a register, if an ALU operation or load)

This is a pipeline timing diagram showing how
instructions overlap
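The overlap in the timing diagram can be sketched by printing which stage each instruction occupies on each clock cycle:

```python
def timing_diagram(n_instructions, n_stages):
    """One row per instruction; instruction i enters stage 1 on cycle i + 1."""
    rows = []
    for i in range(n_instructions):
        row = ["  "] * i + [f"S{s}" for s in range(1, n_stages + 1)]
        rows.append(" ".join(row))
    return rows

for line in timing_diagram(3, 6):
    print(line)
# S1 S2 S3 S4 S5 S6
#    S1 S2 S3 S4 S5 S6
#       S1 S2 S3 S4 S5 S6
```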
17
Pipeline Performance
  • Assume a machine has 6 steps in the fetch-execute cycle
  • A non-pipelined machine will take 6 × n clock cycles to execute a program of n instructions
  • A pipelined machine will take n + (6 − 1) clock cycles to execute the same program!
  • If n is 1000, the pipelined machine is 6000 / 1005 times faster, almost a speedup of 6 times!
  • In general, a pipeline's performance is computed as
  • Time = (k + n − 1) × tp, where k = the number of stages, n = the number of instructions, and tp is the time per stage (plus delays caused by moving instructions down the pipeline)
  • The non-pipelined machine's time is k × n × tp
  • So the speedup is (k × n × tp) / ((k + n − 1) × tp) = (k × n) / (k + n − 1)
  • However, because it overlaps the execution of instructions, a pipeline faces problems that cause it to slow down
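The speedup formula above, evaluated with the slide's numbers (k = 6 stages, n = 1000 instructions):

```python
def pipeline_cycles(n, k):
    """k - 1 cycles to fill the pipeline, then one instruction completes per cycle."""
    return n + (k - 1)

n, k = 1000, 6
speedup = (k * n) / pipeline_cycles(n, k)  # the common factor tp cancels
print(pipeline_cycles(n, k), round(speedup, 2))  # 1005 5.97
```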

18
Pipeline Problems
  • Pipelines are impacted by
  • Resource conflicts
  • If it takes more than 1 cycle to perform a stage,
    then the next instruction cannot move into that
    stage
  • For instance, floating point operations often
    take 2-10 cycles to execute rather than the
    single cycle of most integer operations
  • Data dependences
  • Consider
  • Load R1, X
  • Add R3, R2, R1
  • We want to add in the 5th stage, but the datum in R1 is not available until the previous instruction reaches the 6th stage, so the Add must be postponed by at least 1 cycle
  • Branches
  • In a branch, the PC is changed, but in a
    pipeline, we may have already fetched one or more
    instructions before we reach the stage in the
    pipeline where the PC is changed!

19
Impact of Branches
  • In our 6-stage pipeline
  • we compute the new PC at the 5th stage, so we
    would have loaded 4 wrong instructions (these 4
    instructions are the branch penalty)
  • thus, every branch slows the pipeline down by 4
    cycles because 4 wrong instructions were already
    fetched
  • consider the four-stage pipeline below
  • S1: fetch instruction
  • S2: decode instruction, compute operand addresses
  • S3: fetch operands
  • S4: execute instruction, store result (this includes computing the PC value)
  • So, every branch instruction is followed by 3
    incorrectly fetched instructions, or a branch
    penalty of 3
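The pattern generalizes: if the new PC is computed at stage s, one wrong-path instruction has been fetched for each earlier stage. A quick Python check against the slide's two pipelines:

```python
def branch_penalty(resolve_stage):
    """Wrong-path instructions fetched before the new PC is known: one per
    stage between the fetch stage and the stage that computes the PC."""
    return resolve_stage - 1

print(branch_penalty(5))  # 6-stage pipeline, PC computed at stage 5 -> penalty 4
print(branch_penalty(4))  # 4-stage pipeline, PC computed at stage 4 -> penalty 3
```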

20
Other Ideas
  • In order to improve performance, architects have come up with all kinds of interesting ideas to maintain a pipeline's performance
  • Superscalar: have multiple pipelines so that the CPU can fetch, decode, and execute 2 or more instructions at a time
  • Branch Prediction: when it comes to branching, try to guess in advance whether the branch is taken, and if so, where, to lower or remove the branch penalty; if you guess wrong, start over from where you guessed incorrectly
  • Compiler Optimizations: let the compiler rearrange your assembly code so that data dependencies are broken up and branch penalties are removed by filling the slots after a branch with neutral instructions
  • SuperPipeline: divide pipeline stages into substages to obtain a greater overlap without necessarily changing the clock speed
  • We study these ideas in 462

21
Real ISAs
  • Intel: 2 operands, variable length, register-memory operations (but not memory-memory), pipelined, superscalar with speculation but at the microcode level
  • MIPS: fixed length, 3-operand instructions if operands are in registers, load-store otherwise, 8-stage superpipeline, very simple instruction set