Title: Chapter 5: ISAs
- In MARIE, we had simple instructions:
  - a 4-bit op code followed by either
    - a 12-bit address for Load, Store, Add, Subt, Jump
    - a 2-bit condition code for Skipcond
    - twelve 0s for instructions that did not need a datum
- However, most ISAs are much more complex, so there are many more op codes and possibly more than 1 operand
- How do we specify the operation?
  - Each operation will have a unique op code, although op codes might not be of equal length (in MARIE, all were 4 bits; in some ISAs, op codes range from 8 bits to 16 or more)
- How do we specify the number of operands?
  - This is usually determined by the op code, although it could also be specified in the instruction as an added piece of instruction information
- How do we specify the location of each operand?
  - We need addressing information
2. Instruction Formats
PDP-10: fixed-length instructions, with a 9-bit op code (512 operations) followed by 2 operands; one operand is in a register, the other in memory.

PDP-11: variable-length instructions with 13 different formats; the op code varies from 4 bits to 16 bits, and 0, 1, 2, or 3 operands can be specified based on the format.
3. Two More Formats
The variable-length Intel (Pentium) format is shown above: instructions can vary from 1 byte to 17 bytes, with op codes 1 or 2 bytes long, and all instructions have up to 2 operands.

The fixed-length PowerPC format is shown on the right: all instructions are 32 bits, but there are five basic forms, with up to 3 operands as long as all 3 operands are stored in registers.
4. Instruction Format Decisions
- Length decisions
  - Fixed length: makes instruction fetching predictable (which helps out in pipelining)
  - Variable length: flexible instructions that can accommodate up to 3 operands, including 3 memory references; length is determined by need, so memory space is not wasted
- Number of addressing modes
  - Fewer addressing modes makes things easier on the architect, but possibly harder on the programmer
  - Simple addressing modes make pipelining easier
- How many registers?
  - Generally, the more the better, but with more registers there is less space available for other circuits or cache (more registers, more expense)
5. Alignment
- Another question is what alignment should be used
- Recall that most machines today have word sizes of 32 bits or 64 bits, and the CPU fetches or stores 1 word at a time
- Yet memory is organized in bytes
- Should we allow the CPU to access something smaller than a word?
  - If so, we have to worry about alignment
- Two methods are used for ordering the bytes within a word:
  - Big Endian: bytes are placed in order in the word
  - Little Endian: bytes are placed in the opposite order
  - See below, where the word is 12345678
- Different architectures choose differently between these two orderings
  - Intel uses little Endian, and bitmaps were developed this way, so a bitmap must be converted before it can be viewed on a big Endian machine!
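Byte ordering is easy to see with Python's struct module; this sketch packs the slide's word 0x12345678 both ways (illustrative, not tied to any particular machine):

```python
import struct

word = 0x12345678  # the example word from the slide

big = struct.pack(">I", word)     # big endian: most significant byte first
little = struct.pack("<I", word)  # little endian: least significant byte first

print(big.hex())     # 12345678
print(little.hex())  # 78563412
```

A big Endian machine reading the little Endian bytes in order would see 0x78563412, which is why data such as bitmaps must be byte-swapped when moved between the two conventions.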
6. Types of CPU Storage
- Although all architectures today use register storage, other approaches have been tried
- Accumulator-based: a single data register, the accumulator (MARIE is like this)
  - This was common in early computers, when register storage was very expensive
- General-purpose registers: many data registers are available for the programmer's use
  - Most RISC architectures are of this form
- Special-purpose registers: many data registers, but each has its own implied use (e.g., a counter register for loops, an I/O register for I/O operations, a base register for arrays, etc.)
  - Pentium is of this form
- Stack-based: instead of general-purpose registers, storage is a stack, and operations are rearranged to be performed in postfix order
  - An early alternative to accumulator-based architectures, obsolete now
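The postfix ordering used by stack machines can be sketched with a small evaluator (a hypothetical illustration of the idea, not any real ISA): operands are pushed, and each operator pops two values and pushes the result.

```python
def eval_postfix(tokens):
    """Evaluate a postfix token list the way a stack machine would."""
    stack = []
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b,
           "/": lambda a, b: a / b}
    for tok in tokens:
        if tok in ops:
            b = stack.pop()  # the right operand was pushed last
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(float(tok))
    return stack.pop()

# The infix expression (2 + 3) * 4 becomes "2 3 + 4 *" in postfix
print(eval_postfix("2 3 + 4 *".split()))  # 20.0
```

Because the operator always acts on the top of the stack, no parentheses and no explicit operand addresses are needed in the instructions.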
7. Load-Store Architectures
- When deciding on the number of registers to make available, architects also decide whether to support a load-store instruction set
- In a load-store instruction set, the only operations allowed to reference memory are loads and stores
- All other operations (ALU operations, branches) must reference only values in registers or immediate data (data in the instruction itself)
- This makes programming more difficult, because a simple operation like inc X must now load X into a register, increment the register, and store the result back to X
- But it is necessary to support a pipeline, which ultimately speeds up processing!
- All RISC architectures are load-store instruction sets and provide at least 16 registers (hopefully more!)
- Many CISC architectures permit memory-memory and memory-register ALU operations, so these machines can get by with fewer registers
  - Intel has 4 general-purpose data registers
8. Number of Operands
- The number of operands that an instruction can specify has an impact on instruction sizes
- Consider the instruction Add R1, R2, R3
  - The Add op code is 6 bits
  - Assume 32 registers; each register specifier takes 5 bits
  - This instruction is 21 bits long
- Consider Add X, Y, Z
  - Assume 256 MBytes of memory
  - Each memory reference is 28 bits
  - This instruction is 90 bits long!
- However, we do not necessarily want to limit our instructions to 1 or 2 operands, so we must either permit long instructions or find a compromise
- The load-store instruction set is a compromise: 3 operands can be referenced as long as they are all in registers, and 1 operand can be referenced in memory as long as it is in an instruction by itself (load and store use 1 memory reference only)
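The two instruction sizes quoted above follow from simple arithmetic, sketched here (the 6-bit op code, 32 registers, and 256 MBytes of memory are the slide's assumptions):

```python
import math

def instr_bits(opcode_bits, operand_bits, n_operands):
    """Total instruction length in bits."""
    return opcode_bits + n_operands * operand_bits

reg_bits = math.ceil(math.log2(32))           # 32 registers -> 5 bits each
mem_bits = math.ceil(math.log2(256 * 2**20))  # 256 MBytes -> 28-bit addresses

print(instr_bits(6, reg_bits, 3))  # Add R1, R2, R3 -> 21 bits
print(instr_bits(6, mem_bits, 3))  # Add X, Y, Z    -> 90 bits
```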
9. 1, 2 and 3 Operand Examples
Using three addresses:
    Instruction     Comment
    SUB Y, A, B     Y ← A - B
    MPY T, D, E     T ← D * E
    ADD T, T, C     T ← T + C
    DIV Y, Y, T     Y ← Y / T

Using one address:
    Instruction     Comment
    LOAD D          AC ← D
    MPY E           AC ← AC * E
    ADD C           AC ← AC + C
    STOR Y          Y ← AC
    LOAD A          AC ← A
    SUB B           AC ← AC - B
    DIV Y           AC ← AC / Y
    STOR Y          Y ← AC

Using two addresses:
    Instruction     Comment
    MOVE Y, A       Y ← A
    SUB Y, B        Y ← Y - B
    MOVE T, D       T ← D
    MPY T, E        T ← T * E
    ADD T, C        T ← T + C
    DIV Y, T        Y ← Y / T
Here we compare the length of the code when we have one-address, two-address, and three-address instructions; each sequence computes Y ← (A - B) / (C + D * E). Notice that the one- and two-address instructions write over a source operand, thus destroying data.
See pages 206-207 for another example
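As a sanity check, the one-address accumulator sequence can be traced in Python with sample values (the values chosen for A through E are arbitrary):

```python
# Sample values for the variables in Y <- (A - B) / (C + D * E)
A, B, C, D, E = 20.0, 8.0, 1.0, 2.0, 2.5

mem = {"A": A, "B": B, "C": C, "D": D, "E": E, "Y": 0.0}

ac = mem["D"]        # LOAD D
ac = ac * mem["E"]   # MPY  E
ac = ac + mem["C"]   # ADD  C
mem["Y"] = ac        # STOR Y   (Y now holds C + D*E)
ac = mem["A"]        # LOAD A
ac = ac - mem["B"]   # SUB  B
ac = ac / mem["Y"]   # DIV  Y
mem["Y"] = ac        # STOR Y

print(mem["Y"])               # 2.0
print((A - B) / (C + D * E))  # 2.0, the same result
```

Note how the one-address form needs a temporary trip through Y to hold the denominator, which is exactly the kind of source-operand overwriting mentioned above.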
10. Addressing Modes
- In our instruction, how do we specify the data?
- We have different modes to specify how to find the data
- Most modes generate memory addresses; some modes reference registers instead
- Below are the most common formats (we have already used Direct and Indirect in our MARIE examples of the last chapter)
11. Computing These Modes
In Register mode, the operand is stored in a register, and the register is specified in the instruction. Example: Add R1, R2, R3.

In Immediate mode, the operand is in the instruction itself, such as Add 5. This is used when the datum is known at compile time; it is the quickest form of addressing.

In Direct mode, the operand is in memory and the instruction contains a reference to the memory location. Because there is a memory access, this method is slower than Register. Examples: Add Y (in assembly), Add 110111000 (in machine language).

In Indirect mode, the memory reference is to a pointer; this requires two memory accesses and so is the slowest of all addressing modes.
12. Continued
Indexed or Based mode is like Direct except that the address referenced is computed as a combination of a base value stored in a register and an offset in the instruction. Example: Add R3(300). This is also called Displacement or Base Displacement.

Register Indirect mode is like Indirect except that the instruction references a pointer in a register, not in memory, so one memory access is saved. Notice that Register and Register Indirect can permit shorter instructions, because a register specification is shorter than a memory address specification.

In Stack mode, the operand is at the top of the stack, where the stack is pointed to by a special register called the Stack Pointer; this is like Register Indirect in that it accesses a register followed by memory.
13. Example
Assume memory stores the values as shown to the left, and that register R1 stores 800.

Assume our instruction is Load 800. The value loaded into the accumulator depends on the addressing mode used, as shown below:
- Immediate: the datum is 800 itself
- Direct: the datum's location is address 800
- Indirect: the datum's location is pointed to by the value stored at address 800
- Indexed: the datum's location is R1 + 800 (1600)
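Since the slide's memory figure is not reproduced here, the snippet below uses made-up memory contents to show how each mode interprets Load 800:

```python
# Hypothetical memory: address -> value (stands in for the slide's figure)
memory = {800: 900, 900: 1000, 1600: 700}
R1 = 800
operand = 800  # the address/constant field of "Load 800"

immediate = operand                 # the datum is 800 itself
direct = memory[operand]            # datum at address 800        -> 900
indirect = memory[memory[operand]]  # memory[800] points to datum -> 1000
indexed = memory[R1 + operand]      # datum at R1 + 800 = 1600    -> 700

print(immediate, direct, indirect, indexed)  # 800 900 1000 700
```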
14. Instruction Types
- Now that we have explored some of the issues in designing an instruction set, let's consider the types of instructions:
- Data movement (load, store)
- I/O
- Arithmetic (+, -, *, /)
- Boolean (AND, OR, NOT, XOR, Compare)
- Bit manipulation (rotate, shift)
- Transfer of control (conditional branch, unconditional branch, branch and link, trap)
- Special purpose (halt, interrupt, others)
- The arithmetic, Boolean, bit manipulation, and transfer-of-control instructions use the ALU
  - Note: branches add to, subtract from, or otherwise change the PC, so these use the ALU
- The data movement and I/O instructions use memory or I/O
15. Instruction-Level Pipelining
- We have already covered the fetch-execute process
- It turns out that, if we are clever about designing our architecture, we can design the fetch-execute cycle so that each phase uses different hardware
  - we can overlap instruction execution in a pipeline
  - the CPU becomes like an assembly line: instructions are fetched from memory and sent down the pipeline one at a time
  - the first instruction is at stage 2 when the second instruction is at stage 1
  - or, instruction j is at stage 1 when instruction j - 1 is at stage 2 and instruction j - 2 is at stage 3, etc.
- The length of the pipeline determines how many overlapping instructions we can have
  - The longer the pipeline, the greater the overlap, and so the greater the potential for speedup
  - It turns out that long pipelines are difficult to keep running efficiently, though, so shorter pipelines are often used
16. A 6-stage Pipeline
- Stage 1: Fetch instruction
- Stage 2: Decode op code
- Stage 3: Calculate operand addresses
- Stage 4: Fetch operands (usually from registers)
- Stage 5: Execute instruction (includes computing the new PC for branches, and doing loads and stores)
- Stage 6: Store result (in a register, for an ALU operation or a load)
This is a pipeline timing diagram showing how
instructions overlap
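The timing diagram itself is not reproduced here, but the overlap it depicts can be regenerated with a short script: instruction i enters stage 1 at cycle i + 1, so at cycle t it occupies stage t - i.

```python
K = 6  # pipeline stages
N = 4  # instructions shown

def stage_at(instr, cycle):
    """Stage occupied by instruction `instr` (0-based) at clock `cycle` (1-based), or None."""
    s = cycle - instr  # instruction i enters stage 1 at cycle i + 1
    return s if 1 <= s <= K else None

# Print one row per instruction, one column per clock cycle.
for i in range(N):
    cells = [f"S{stage_at(i, t)}" if stage_at(i, t) else "--"
             for t in range(1, N + K)]
    print(f"inst{i}: " + " ".join(cells))
```

Each row is shifted one cycle right of the row above it, which is exactly the staircase pattern of a pipeline timing diagram.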
17. Pipeline Performance
- Assume a machine has 6 steps in the fetch-execute cycle
- A non-pipelined machine will take 6 * n clock cycles to execute a program of n instructions
- A pipelined machine will take n + (6 - 1) clock cycles to execute the same program!
- If n is 1000, the pipelined machine is 6000 / 1005 times faster, or a speedup of almost 6 times!
- In general, a pipeline's performance is computed as
  - Time = (k + n - 1) * tp, where k = number of stages, n = number of instructions, and tp is the time per stage (plus delays caused by moving instructions down the pipeline)
- The non-pipelined machine's time is k * n * tp
- So the speedup is (k * n * tp) / ((k + n - 1) * tp)
- However, a pipeline faces problems, caused by overlapping the execution of instructions, that slow it down
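Once tp cancels, the speedup formula above reduces to k * n / (k + n - 1), which a one-line function makes concrete:

```python
def pipeline_speedup(k, n):
    """Speedup of a k-stage pipeline over a non-pipelined machine for n instructions."""
    return (k * n) / (k + n - 1)  # the per-stage time tp cancels out

print(pipeline_speedup(6, 1000))  # about 5.97, approaching 6 as n grows
```

For a single instruction (n = 1) the speedup is exactly 1, since pipelining only pays off when instructions overlap.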
18. Pipeline Problems
- Pipelines are impacted by:
- Resource conflicts
  - If it takes more than 1 cycle to perform a stage, then the next instruction cannot move into that stage
  - For instance, floating point operations often take 2-10 cycles to execute, rather than the single cycle of most integer operations
- Data dependences
  - Consider:
    - Load R1, X
    - Add R3, R2, R1
  - Since we want to add in the 5th stage, but the datum in R1 is not available until the previous instruction reaches the 6th stage, the Add must be postponed by at least 1 cycle
- Branches
  - In a branch, the PC is changed, but in a pipeline we may have already fetched one or more instructions before we reach the stage in the pipeline where the PC is changed!
19. Impact of Branches
- In our 6-stage pipeline:
  - we compute the new PC at the 5th stage, so we would have fetched 4 wrong instructions (these 4 instructions are the branch penalty)
  - thus, every branch slows the pipeline down by 4 cycles, because 4 wrong instructions were already fetched
- Consider the four-stage pipeline below:
  - S1: fetch instruction
  - S2: decode instruction, compute operand addresses
  - S3: fetch operands
  - S4: execute instruction, store result (this includes computing the PC value)
- Here, every branch instruction is followed by 3 incorrectly fetched instructions, or a branch penalty of 3
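Both penalties above follow from the same rule: if the PC is updated in stage s, then s - 1 wrong-path instructions have already been fetched behind the branch.

```python
def branch_penalty(resolve_stage):
    """Wrong-path instructions fetched before a branch resolves in `resolve_stage`."""
    return resolve_stage - 1

print(branch_penalty(5))  # 6-stage pipeline, PC computed in stage 5 -> penalty 4
print(branch_penalty(4))  # 4-stage pipeline, PC computed in stage 4 -> penalty 3
```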
20. Other Ideas
- In order to improve performance, architects have come up with all kinds of interesting ideas to maintain a pipeline's performance
- Superscalar: have multiple pipelines so that the CPU can fetch, decode, and execute 2 or more instructions at a time
- Branch prediction: when it comes to branching, try to guess in advance whether the branch will be taken, and if so, where it goes, to lower or remove the branch penalty; if you guess wrong, start over from where you guessed incorrectly
- Compiler optimizations: let the compiler rearrange your assembly code so that data dependencies are broken up and branch penalties are removed by filling the slots after a branch with neutral instructions
- Superpipelining: divide pipeline stages into substages to obtain greater overlap without necessarily changing the clock speed
- We study these ideas in 462
21. Real ISAs
- Intel: 2 operands, variable-length instructions, register-memory operations (but not memory-memory); pipelined and superscalar, with speculation, but at the microcode level
- MIPS: fixed-length, 3-operand instructions when the operands are in registers, load-store otherwise; 8-stage superpipeline; very simple instruction set