Title: Henri Casanova
1. ICS431 Computer Architecture: Instruction Set Architectures (ISA)
- Henri Casanova
- henric@hawaii.edu
2. Components of Architecture
- Three main components to computer architecture
- Instruction set architecture
- Computer organization
- Hardware
- In this course we'll mostly focus on the first two, starting with the first one
3. What is an ISA?
- The Instruction Set Architecture, or ISA, of a computer is the interface between the software and the hardware
- It is the portion of the computer visible to the programmer/compiler
- In some sense, the instruction set architecture is defined by the set of assembly instructions that can be used and by what they do
- Basic Model
- [Figure: basic model. The CPU repeatedly FETCHes, DECODEs, and EXECUTEs instructions; the program counter (PC) points to the current instruction (e.g., ... LOAD R1, (R2); LOAD R3, (R4); ADD R5, R1, R3 ...). The CPU is connected to memory over the external data bus. Memory contains both code and data, and instructions can refer to memory cells by address.]
4. ISA Categories
- Although ISAs come in many flavors, a typical distinction is the way in which operands to instructions are stored in a computer
- Three main methods have been used
- A stack architecture
- Implicitly, all operands are at the top of the stack
- An accumulator architecture
- Implicitly, one operand is the accumulator, and the other must be referenced explicitly
- A register architecture
- All operands must be referenced explicitly
- They can be either in registers or in memory
- For now we're only talking about operations on one or two operands (e.g., additions, comparisons)
- These operations are performed in an ALU...
5. ALU
- The CPU component that performs arithmetic and logic operations is called an Arithmetic Logic Unit (ALU)
- Performs additions, subtractions, multiplications, divisions, comparisons, bit shifting
- You have seen how to implement such things in ICS331
- Basically, tons of gates
- There can be integer ALUs or floating point ALUs
- If there is no floating point hardware, it can be emulated in software
- This has not been the case for several decades
- Floating point emulation used to be the overwhelming bottleneck
- A CPU may have multiple ALUs
- Usual representation: [Figure: ALU block with inputs Op1 and Op2, an Operation selector, a Result output, and an Overflow flag]
6. ALU Design: a bird's eye view
- Basically, an 8-bit ALU takes 16 bits of input
- A0,...,A7 and B0,...,B7
- Let's say the ALU does ADD, SUB, OR, and AND
- These can be encoded with 2 bits X and Y
- 00 ADD, 01 SUB, 10 OR, 11 AND
- The output is an 8-bit result plus an overflow bit (could also have a carry-out output)
- [Figure: ALU with data inputs A0...A7 and B0...B7, control bits X and Y, outputs C0...C7 and Overflow] (a C model of such an ALU is sketched below)
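A minimal sketch of such an ALU modeled in C; the function name is ours, and we use the unsigned carry/borrow as the "overflow" output, which is one of the two options the slide mentions:

  #include <stdio.h>
  #include <stdint.h>

  /* op encoding from above: 00 ADD, 01 SUB, 10 OR, 11 AND */
  uint8_t alu(uint8_t a, uint8_t b, int op, int *overflow) {
      uint16_t wide;
      *overflow = 0;
      switch (op) {
      case 0: wide = (uint16_t)a + b; *overflow = wide > 0xFF; return (uint8_t)wide;
      case 1: wide = (uint16_t)a - b; *overflow = a < b;       return (uint8_t)wide;
      case 2: return a | b;
      default: return a & b;
      }
  }

  int main(void) {
      int ov;
      uint8_t r = alu(200, 100, 0, &ov);   /* 200 + 100 = 300 does not fit in 8 bits */
      printf("result=%u overflow=%d\n", r, ov);
      return 0;
  }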
7. ALU Design: a bird's eye view
- In the end, it's all a big Karnaugh map, and can thus be implemented with (many) gates
- In the real world
- There are conventions for binary representations of numbers
- two's complement
- sign bits
- Large ALUs are built using 1-bit ALUs as building blocks
- See ICS331 for all the details
8. Back to ISAs: a Stack ISA
- The hardware implements a stack
- Reserve a zone in memory that will store the stack
- Use a TOS (Top of Stack) register that contains the address of the top of the stack
- The ISA provides a PUSH and a POP instruction, as well as, say, an ADD instruction
- The operands to the ADD are implicitly the two elements at the top of the stack; they are popped, and then the result is placed at the top of the stack

  PUSH A   // push the data at address A onto the stack
  PUSH B   // push the data at address B onto the stack
  ADD      // pop the two top elements, add them, push the result
  POP C    // store the top of the stack at address C

- Every time the stack is modified, the TOS pointer is updated
9. An Accumulator ISA
- The hardware implements an accumulator
- A memory cell
- The ISA provides a LOAD and a STORE instruction, as well as, say, an ADD instruction
- LOAD and STORE operate between the accumulator and the memory
- One operand to the ADD is implicitly the content of the accumulator, the other is in memory, and the result is placed in the accumulator

  LOAD A    // load the data at address A into the accumulator
  ADD B     // add the data at address B to the accumulator
  STORE C   // store the content of the accumulator at address C

- (Fewer instructions than with a stack)
10. Register ISAs
- In register ISAs, operations have only explicit operands, which come from
- A set of registers, i.e., memory cells located on the CPU that can hold data
- or memory cells in RAM (or cache), away from the CPU
- Option 1: register-memory ISAs
- Instructions can have some operands in memory
- Output is stored in a register
- Option 2: register-register ISAs
- Instructions can only have operands in registers
- Output is stored in a register
- Both ISAs provide ways to load data from memory into registers and store data from registers into memory
- Both are called General Purpose Register (GPR) ISAs
- Memory-memory ISAs are no longer found in computers today
- The point is that the use of registers is fast
11. Register ISAs
- [Figure: register-memory vs. register-register organization, each with a set of registers on the CPU and a memory below]
12. Register ISAs
- Since the 1980s, no computer has been designed with a stack or accumulator ISA
- Reason 1: Registers are faster than memory and have become cheap(er) to build
- Reason 2: Stack and accumulator ISAs are inefficient because they make the compiler's job difficult
- Example: (A*B) - (B*C) - (A*D)
- With a stack or accumulator, the values of A and B must be loaded from memory multiple times, which reduces performance
- With a stack or accumulator, there is only one operand evaluation order possible, which may prevent cool optimizations that we've come to take for granted in modern computers
- Registers can be used to store frequently used variables and thereby reduce memory traffic
- Registers can be named with fewer bits than a memory cell, which improves code density
- Therefore, we are left with only register ISAs
13. How many registers?
- Newer ISAs tend to have more registers
- Having more registers gives the compiler more opportunities to try to be as efficient as possible
- Typically, compilers reserve some registers for special purposes
- to hold parameters to a function call
- as helpers for expression evaluation
- The remaining registers can be used to hold program variables
14. Operands and ALU
- GPR (General Purpose Register) ISAs are characterized precisely by two characteristics
- Whether the ALU has 2 or 3 operands
- 2 operands: one of the operands is both an input and an output
- 3 operands: the third operand is an output
- The number of operands that may be memory addresses
- Figure 2.3 in the book shows examples of architectures that have taken different approaches
- The only memory-memory ISA is the VAX, which is now obsolete
- We will mostly focus on the (3,0) configuration, that is, three operands, all of which must be in registers
- corresponds to modern trends
- fits the general principle that all instructions should take the same number of cycles, for better performance
15. ISAs: more characteristics
- Defining the format of ALU operations is only one small part of defining the ISA
- Other considerations include
- Memory addressing
- Type and size of operands
- What are the operations in the ISA?
- Control flow
- ISA encoding
- Compilers?
- Let's look at all these issues, which are covered in a bit more detail in the book
- Ignore the DSP content in the book
16. Memory Addressing
- Every ISA must define how memory addresses
- are interpreted
- are specified in instructions (Load / Store)
- Most ISAs are byte addressed
- i.e., the smallest piece of memory that can be addressed is a byte
- Meaning that each byte has an address represented by a binary string
- If I have 128 bytes of addressable RAM, then each byte has an address that's composed of 7 bits, since 2^7 = 128 (see the sketch after this list)
- Typically, an ISA allows access to
- bytes (8 bits)
- half words (16 bits)
- Supported in C (short integers)
- Useful when the size of data structures is a concern (e.g., O/S)
- Useful for the Unicode character set (Java)
- words (32 bits)
- double words (64 bits)
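A quick sketch of the address-size arithmetic above (the helper function is ours): the number of address bits needed for N addressable bytes is the smallest b with 2^b >= N.

  #include <stdio.h>

  /* smallest number of bits b such that 2^b >= n_bytes */
  unsigned address_bits(unsigned long n_bytes) {
      unsigned bits = 0;
      while ((1UL << bits) < n_bytes) bits++;
      return bits;
  }

  int main(void) {
      printf("%u\n", address_bits(128));   /* 7 */
      return 0;
  }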
17. Registers
- The CPU has several 64-bit registers
- These registers can hold bytes, half words, words, or double words
- [Figure: a 64-bit register holding a byte, half word, or word in its low-order bits; the unused high-order bits are "don't care"]
18. Memory Alignment
- In most computers, access to objects larger than a byte (half words, words, double words) must be aligned
- Definition of aligned: an object of size S bytes is aligned if its address is a multiple of S (i.e., address mod S == 0)
- Example
- Accessing 1 byte at address 000000: aligned
- Accessing 1 half word at address 000001: NOT aligned
- 1 mod 2 == 1
- Accessing 1 half word at address 000010: aligned
- Accessing 1 double word at address 011000: aligned
- 24 mod 8 == 0
- See Figure 2.5 (a minimal C check of the rule is sketched below)
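A minimal C check of the alignment rule above (the function name is ours):

  #include <stdio.h>
  #include <stdint.h>

  /* An object of size S bytes at address A is aligned iff A mod S == 0 */
  int is_aligned(uintptr_t addr, size_t size) {
      return addr % size == 0;
  }

  int main(void) {
      printf("%d\n", is_aligned(0x00, 1));  /* 1: bytes are always aligned */
      printf("%d\n", is_aligned(0x01, 2));  /* 0: 1 mod 2 == 1 */
      printf("%d\n", is_aligned(0x02, 2));  /* 1: 2 mod 2 == 0 */
      printf("%d\n", is_aligned(0x18, 8));  /* 1: 24 mod 8 == 0 */
      return 0;
  }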
19. Why Alignment?
- The main reason is simpler hardware
- The memory hardware is built with alignment on large-object boundaries (e.g., double word)
- Therefore, accessing an object that's misaligned would increase the number of memory accesses
- Example
- If you have a computer that allows misaligned accesses, then a program that does only aligned accesses would run faster
- Therefore, imposing memory alignment is not such a terrible thing
20. Alignment to registers
- When you load something from memory you actually get a double word, which is placed in a 64-bit register
- The memory hardware is simpler to design (and therefore cheaper) if you assume that there is only one size of data that you can load from it
- Besides, it's not like loading 64 bits takes longer than loading 8 bits, as long as we have 64 wires
- Note that storing can be done at the byte level
- Let's say I want to load 1 byte whose address is x..x100
- That byte is located within a double word of memory (let's assume Big Endian)
- Therefore, when I load the whole double word into a 64-bit register, the byte will be in the middle of the register!
- What I need is to right-shift the whole double word by three byte positions so that the byte is where it should be: the 8 least-significant bits of the register
- [Figure: a big-endian double word with bytes numbered 0-7; after a 3-byte right shift, the byte at offset 4 ends up in the least-significant byte of the register. A C sketch of this shift follows.]
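A hedged sketch of that shift in C (the function name is ours; offsets are numbered 0-7 from the most significant byte, as in the Big Endian figure):

  #include <stdio.h>
  #include <stdint.h>

  /* Extract the byte at offset `off` from a loaded 64-bit double word.
     For the byte at address x..x100 (offset 4), the shift is (7-4)*8 = 24 bits. */
  uint8_t extract_byte_big_endian(uint64_t dword, int off) {
      int shift = (7 - off) * 8;   /* number of bits below the target byte */
      return (uint8_t)(dword >> shift);
  }

  int main(void) {
      uint64_t dword = 0x0011223344556677ULL;   /* byte at offset 4 is 0x44 */
      printf("0x%02x\n", extract_byte_big_endian(dword, 4));
      return 0;
  }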
21. Byte ordering
- There are two conventions for ordering the bytes within a half word, word, or double word
- Let's look at a double word (8 bytes)
- The bytes within this double word are contiguous in memory
- Therefore their addresses are
- x...x000, x...x001, x...x010, x...x011, x...x100, x...x101, x...x110, x...x111
- Question: does the left-most byte have address x...x000 or x...x111?
- The notion of left/right is based on the fact that the memory is linear, with increasing addresses
22. Byte Ordering
- Two conventions for byte ordering in a double word
- Little Endian: the byte whose address is x...x000 is placed at the least significant position, the "little end"; drawn with the most significant byte on the left, the bytes are numbered 7 6 5 4 3 2 1 0
- Pentium
- Big Endian: the byte whose address is x...x000 is placed at the most significant position, the "big end"; drawn the same way, the bytes are numbered 0 1 2 3 4 5 6 7
- SPARC
- Some architectures can be configured either way (bi-endian)
- Either in software (at boot up) or on the motherboard
- PowerPC, IA64
23. Byte Ordering
- Why should one care about byte ordering?
- Most of the time, one doesn't
- it's all hidden
- But it can be an issue when one accesses the same location both as a double word and as a byte
24. Byte ordering in action
- Let's see how one can write a C program that gives different results on computers that use different byte orderings
- To expose the different behaviors one needs to interpret the bytes of a word as individual bytes
- Therefore, one can cast an integer to a 4-character string, as in the code below

  #include <stdio.h>
  int main() {
    struct {
      int x;
      char y;
    } data;
    data.x = 0x61626364;
    data.y = '\0';
    printf("--> %s\n", (char *)&data);
  }
25. Byte ordering in action
- On a Little Endian machine, the bytes of data.x are stored in memory as
  d c b a \0
  so the program prints "--> dcba"
26. Byte ordering in action
- On a Big Endian machine, the bytes of data.x are stored in memory as
  a b c d \0
  so the program prints "--> abcd"
27. Byte ordering
- Little Endian doesn't match the normal ordering of characters in strings
- Therefore, many people think that Little Endian is "backwards"
- But note that it matches the way we interpret the bits in a byte!!
- Differences in endianness cause problems when exchanging data among computers
- IP defines a "network byte order", which is Big Endian
- UNIX provides the htons()/ntohs() and htonl()/ntohl() functions to enable easy conversions (a usage sketch follows)
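A small usage sketch, assuming a POSIX system (the variable names are ours):

  #include <stdio.h>
  #include <arpa/inet.h>   /* htonl, ntohl, htons, ntohs */

  int main(void) {
      unsigned int host_value = 0x61626364;
      /* htonl: host byte order -> network byte order (Big Endian).
         On a Big Endian host this is the identity; on a Little Endian
         host it swaps the four bytes. ntohl converts back. */
      unsigned int net_value = htonl(host_value);
      printf("host: 0x%08x  back from network order: 0x%08x\n",
             host_value, ntohl(net_value));
      return 0;
  }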
28. Exercise 2.3
- The value represented by the hexadecimal number 434F 4D50 5554 4552 is to be stored in an aligned 64-bit double word
- Question a: Using the physical arrangement of the first row in Figure 2.5, write the value to be stored using Big Endian byte order

  byte offset: 0  1  2  3  4  5  6  7
29. Exercise 2.3. Question a
- Answer

  hex: 43 4F 4D 50 55 54 45 52

- Next, interpret each byte as an ASCII character (codes given in decimal below) and under each byte write the corresponding character, forming the character string as it would be stored in Big Endian order
30. Exercise 2.3. Question b
- Answer to the previous step

  hex:     43 4F 4D 50 55 54 45 52
  decimal: 67 79 77 80 85 84 69 82
  chars:   C  O  M  P  U  T  E  R

- Question b: Same question, but using Little Endian byte order

  hex: 52 45 54 55 50 4D 4F 43
31. Exercise 2.3. Question b
- Answer

  hex:     52 45 54 55 50 4D 4F 43
  decimal: 82 69 84 85 80 77 79 67
  chars:   R  E  T  U  P  M  O  C
32. Exercise 2.3. Question c
- Question c: What are the hexadecimal values of all misaligned 2-byte words that can be read from the given 64-bit double word when stored in Big Endian byte order?

  hex: 43 4F 4D 50 55 54 45 52
33. Exercise 2.3. Question c
- Answer: the misaligned 2-byte words are those starting at the odd offsets 1, 3, and 5

  hex:   43 4F 4D 50 55 54 45 52
  words: 4F4D, 5055, 5445
34. Exercise 2.3. Question d
- Question d: What are the hexadecimal values of all misaligned 4-byte words that can be read from the given 64-bit double word when stored in Little Endian byte order?

  hex: 52 45 54 55 50 4D 4F 43
35. Exercise 2.3. Question d
- Answer: the misaligned 4-byte words are those starting at offsets 1, 2, and 3

  hex:   52 45 54 55 50 4D 4F 43
  words: 45545550, 5455504D, 55504D4F
36. Exercise 2.3
- This exercise is actually not very well formulated for the Little Endian scheme
- The way in which the quantities are written and read matters, and the exercise does not say which way is used
- In this class we will just consider C code fragments, so that everything is crystal clear
- Remember the C program I showed earlier

  #include <stdio.h>
  int main() {
    struct {
      int x;
      char y;
    } data;
    data.x = 0x61626364;
    data.y = '\0';
    printf("--> %s\n", (char *)&data);
  }

- Big Endian: "abcd"; Little Endian: "dcba"
37. Exercise 2.3
- What happens if the same constant is stored as two short integers (i.e., 2-byte integers)?

  #include <stdio.h>
  int main() {
    struct {
      short x1;
      short x2;
      char y;
    } data;
    data.x1 = 0x6162;
    data.x2 = 0x6364;
    data.y = '\0';
    printf("--> %s\n", (char *)&data);
  }

- Little Endian: "badc"!
38. Exercise 2.3
- What's happening here?
- Bytes are ordered within each object that is stored from the program
- Consider these two code fragments

  int a;
  *((short *)&a) = 0x6162;
  *(((short *)&a) + 1) = 0x6364;

- and

  int a;
  a = 0x61626364;

- On a Big Endian machine they are equivalent
- On a Little Endian machine they are not
39. Exercise 2.3
- Important: endianness also matters when data is read into objects larger than bytes
- Example

  struct {
    char x;
    char y;
  } data;
  data.x = 0x04;
  data.y = 0x01;
  printf("--> %d\n", *((short *)&data));

- Memory contents: 04 01
- On a Big Endian machine
- 0x0401 (decimal 1025)
- On a Little Endian machine
- 0x0104 (decimal 260)
40. In-class Exercise
- Consider the following fragment of code (ignoring any padding the compiler may insert between fields)

  #include <stdio.h>
  int main() {
    struct {
      short x1;
      int x2;
      char y;
    } data;
    data.x1 = 0x6162;
    data.x2 = 0x63646566;
    data.y = '\0';
    printf("--> %s\n", (char *)&data);
  }

- The output in Big Endian mode is "abcdef"; what's the output in Little Endian mode?
41. Answer
- The struct is laid out in memory as x1 (bytes 0-1), then x2 (bytes 2-5), then y (byte 6)
42-45. Answer (continued)
- Filling in memory byte by byte, Little Endian within each object
- x1 = 0x6162 is stored as 62 61
- x2 = 0x63646566 is stored as 66 65 64 63
- y = '\0' is stored as 00
- Final memory contents: 62 61 66 65 64 63 00
- Interpreted as characters: b a f e d c \0
- The output in Little Endian mode is therefore "bafedc"
46. Addressing modes
- Question: how are addresses specified in the instructions of the ISA?
- Example
- Consider 2-operand ALUs
- ADD R1, (R2)
- take the value of register R2
- interpret it as an address
- fetch the memory cell at that address
- add it to the value of register R1
- store the result into register R1
- This addressing mode is called "register indirect"
- It happens to be only one of MANY addressing modes that have been used in real-world computers
- We will see a few of them and their associated C-like pseudo-code (a sketch of the notation follows)
- Notation
- Mem: an array of memory cells
- Regs: an array of registers
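A sketch of that notation in plain C (the array sizes and the function are ours, for illustration only):

  /* The book's C-like pseudo-code notation, modeled as C arrays */
  int Mem[1 << 16];   /* memory cells */
  int Regs[32];       /* registers */

  /* ADD R1, (R2) -- register indirect, as described above:
     Regs[R1] <- Regs[R1] + Mem[Regs[R2]] */
  void add_register_indirect(int r1, int r2) {
      Regs[r1] = Regs[r1] + Mem[Regs[r2]];
  }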
47. Addressing Modes
48. Addressing Modes
- Even more complex addressing modes
49. Addressing Modes
- Some ISAs include all of these addressing modes: the VAX
- Some ISAs include only a few: MIPS
- What are the trade-offs?
- The more addressing modes, the lower the IC (instruction count)
- The programmer/compiler can use a single instruction to do several things at once
- The more addressing modes, the more complex the hardware
- The cost of the system increases
- The more addressing modes, the higher the CPI
- Some instructions will take many more clock cycles than others
- We will see that nowadays this is considered a bad thing
- (Note that VAXes are extinct)
- Goal: find the right balance
- Simple hardware
- Enough addressing modes so that writing code isn't too painful
50. VAX Addressing Modes
- Figure 2.7
- Lesson: only a few addressing modes are heavily used
- Immediate
- Displacement
51. Displacement range
- Add R4, 100(R1)   // add R4 and Mem[100 + Regs[R1]]
- Question: how big can the displacement be?
- Trade-off
- the bigger the maximum displacement, the wider the use of the instruction
- the smaller the maximum displacement, the shorter the instruction encoding
- instructions are encoded as bit strings
- the displacement will be encoded as a binary number
- the larger the maximum displacement, the larger the number of bits
- Figure 2.8
52. Immediates
- Add R4, #3   // add R4 and the number 3
- Question: how big can the immediate number be?
- Trade-off
- the bigger the maximum number, the wider the use of the instruction
- the smaller the maximum number, the shorter the instruction encoding
- instructions are encoded as bit strings
- the maximum number will be encoded as a binary number
- the larger the maximum number, the larger the number of bits
- Figure 2.10
53. Type and Size of Operands
- Question: how is the type of an operand designated?
- Typically, types are encoded as part of the instruction code
- The type generally gives the size of an operand
- characters/bytes: 8 bits
- half words / Unicode characters: 16 bits
- words / single-precision floats: 32 bits
- double words / double-precision floats: 64 bits
- SPEC benchmark
- Figure 2.12
- indicates that it is important to support efficient access to 64-bit objects
- 64-bit access path? (1 cycle)
- 32-bit access path? (2 cycles)
54. X-bit architecture?
- What do we mean when we say "an X-bit architecture"?
- A 32-bit architecture?
- A 64-bit architecture?
- Unfortunately, there is no clear definition
- Can be used to describe an architecture in which integers, memory addresses (and sometimes other data units) are encoded with at most X bits
- On a 32-bit architecture
- 8-, 16-, and 32-bit integers
- 32- and 64-bit floating point numbers
- On a 64-bit architecture
- 8-, 16-, 32-, and 64-bit integers
- 32- and 64-bit floating point numbers
- Can be used to describe a CPU (or ALU) that uses X-bit registers
- Can be used to describe a computer that has an X-bit wide external data bus
55. X-bit Architecture
- In general, when one says "an X-bit architecture" one refers to a system that can deal with X-bit chunks of integer data (integers and addresses) internally (registers) and externally (data busses)
- Although a CPU may be X-bit internally, its external data bus or address bus may have a different size, either larger or smaller, and the term is often used to describe the size of these buses as well
- [Table: Intel-compatible processors and their internal/external bus widths]
56. Storing Integers
- True binary representation
- 1101 means 13
- Problem with true binary representation
- No way to encode negative numbers
- One's complement representation
- The Most Significant Bit (MSB), i.e., the leftmost bit, indicates the sign of the number
- 0 means positive
- 1 means negative
- If the bit string starts with 0, then just interpret the string as true binary representation
- If the bit string starts with 1, then take its complement (change all 0s to 1s and all 1s to 0s) and interpret the result as true binary representation
- Example
- 1101 means -2
- 0010 means 2
- It is symmetric: as many positive numbers as negative numbers
57. One's Complement
- Problem with one's complement
- There are 2 ways to represent zero (0000 and 1111)
- 0011 = 3
- 1100 = -3
- 0011 + 1100 = 1111 = -0
- A machine using one's complement must recognize that the two zeros are identical
- Option 1: deal with the two representations
- All instructions that deal with zero must be augmented with two versions
- Way too complicated and costly
- Option 2: transform 1111 into 0000
- This is done with an AND gate over all bits and an inversion of all bits when a string of all 1s is computed
- Possible, but slows down arithmetic operations (addition)
- Computers haven't used one's complement in decades (early Cray computers did)
58. Two's Complement
- Goal: avoid the two-zeros problem
- Principle
- Add one to the complement of a number to make it negative
- Example
- 01001 means 9
- 10111 means -9
- If we try to compute the negative of 00000
- Take the complement: 11111
- Add 1: 100000 (6 bits)
- Dropping the overflow bit: 00000, so there is only one zero
- Addition
- 0011 = 3
- 1101 = -3
- 0011 + 1101 = (1) 0000 = 0 (the carry out is discarded)
- Minor problem: there is one more negative number than there are positive numbers
- All computers today use two's complement (a quick C check is sketched below)
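A quick C check of the two's complement facts above, using fixed-width 8-bit integers for clarity:

  #include <stdio.h>
  #include <stdint.h>

  int main(void) {
      /* negation = complement, then add one */
      int8_t x = 9;
      printf("%d\n", (int8_t)(~x + 1));       /* -9 */
      /* negating zero yields zero: there is no -0 */
      printf("%d\n", (int8_t)(~0 + 1));       /* 0 */
      /* one more negative number than positive numbers */
      printf("%d %d\n", INT8_MIN, INT8_MAX);  /* -128 127 */
      return 0;
  }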
59. Storing Floating Point Numbers
- IEEE Standard for Floating Point Arithmetic: IEEE 754
- General layout: a sign bit, an exponent (e bits), and a mantissa (f bits)
- Single precision (32-bit): exponent = 8 bits, mantissa = 23 bits
- Double precision (64-bit): exponent = 11 bits, mantissa = 52 bits
- Sign bit
- 0 = positive, 1 = negative
- Exponent
- base 2
- both positive and negative exponents
- Number = ± mantissa × 2^exponent (a C sketch of unpacking these fields follows)
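A sketch of pulling apart the single-precision fields in C (the example value and names are ours; note that IEEE 754 stores the exponent biased by 127, a detail the slide omits):

  #include <stdio.h>
  #include <stdint.h>
  #include <string.h>

  int main(void) {
      float f = -6.25f;                 /* -1.5625 * 2^2 */
      uint32_t bits;
      memcpy(&bits, &f, sizeof bits);   /* reinterpret the 32 bits */

      uint32_t sign     = bits >> 31;           /* 1 bit */
      uint32_t exponent = (bits >> 23) & 0xFF;  /* 8 bits, biased by 127 */
      uint32_t mantissa = bits & 0x7FFFFF;      /* 23 bits */

      printf("sign=%u exponent=%u (unbiased %d) mantissa=0x%06x\n",
             sign, exponent, (int)exponent - 127, mantissa);
      return 0;
  }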
60. ISA Operations
- What fundamental operations are provided by an ISA?
- There is a generally accepted taxonomy of operation types (Figure 2.15); categories are labeled universal, varies, or optional
61. Most popular instructions
- Question: which instructions end up being used the most?
- SPEC benchmark on the 80x86
- (Figure 2.16)
- Observation: 10 simple instructions account for 96% of all instructions executed
- Lesson 1: one should make sure that these go fast, because they are the common case
- Lesson 2: it is dubious that it's worth implementing many other sophisticated instructions
- Motivation to move from CISC to RISC (more on this later)
62. Control Flow
- Control flow instructions: instructions that allow the program counter (PC) to "jump" to an instruction other than the next one
- The PC contains the address of the instruction being executed
- By default, PC = PC + 1 after executing an instruction
- Terminology
- 1950s: transfer
- 1960s: branch
- conditional branches (if test then goto label)
- unconditional branches (goto label)
- The book uses
- "jump" when unconditional
- "branch" when conditional
- Four types of control flow instructions
- Conditional branches
- Jumps
- Procedure calls
- Procedure returns
63. Control Flow
- Conditional branches dominate the other control flow instructions
- Figure 2.19 (SPEC benchmark)
64. Control Flow and Addressing
- Question: how does one specify the target address, i.e., the address of the instruction that needs to be executed next?
- The target address must be encoded in the instruction
- As bits in the instruction encoding
- One exception: returning from a procedure call
- Reason: the return address from a procedure call is not known at compile time
- known at compile time: the compiler can just look at the code and figure it out
- example: the size of array x in "int x[3];"
- known only at runtime: the compiler cannot just look at the code and figure it out
- Instead, one must wait until the program runs to figure it out
- But the compiler can generate code to figure it out at runtime
- example: the size of array x in "int *x; x = (int *)calloc(n, sizeof(int));"
- Question: why don't we know the return address at compile time?
65. Procedure returns
- Reason why procedure return addresses are not known at compile time

  int main() { int answer; scanf("%d", &answer);
               if (answer > 0) foo1(); else foo2(); }
  void foo1() { ... f(); ... }
  void f()    { printf("hello\n"); return; }
  void foo2() { ... f(); ... }

- f() returns to a different address depending on whether it was called from foo1() or from foo2(), and which one calls it depends on user input
66. Control Flow and Addressing
- Most common way to specify the address of a change in control flow: relative to the Program Counter (PC)
- Called "PC-relative"
- Advantages
- The target is often near the control flow instruction, and the PC offset (+2 in the example below) can be encoded with few bits
- The code can be loaded at any address and run correctly: position independence

  ...
  LOAD R1, 0(R2)
  ADD R4, R1
  JMP to PC+2       <- PC
  ADD R3, R6
  LOAD R3, 0(R5)    <- the target, PC+2
  ...
67. Non-PC-relative addressing
- When the target address is not known at compile time, PC-relative addressing is not applicable
- Example: procedure call return
- What's needed is a way to specify the target address at runtime
- e.g., this could be done by putting the target address in a register and then encoding the register in the control flow instruction's encoding
- Other examples of uses in which the target address is not known at compile time (a C sketch follows)
- switch statements
- virtual methods
- function pointers
- dynamically linked libraries
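A small C sketch of the function-pointer case (foo1/foo2 echo the earlier slide; the rest is our own illustration):

  #include <stdio.h>

  void foo1(void) { printf("foo1\n"); }
  void foo2(void) { printf("foo2\n"); }

  int main(void) {
      int n;
      scanf("%d", &n);
      /* The call target depends on user input, so the compiler cannot
         encode it: the generated code jumps through a register holding
         the target address at runtime. */
      void (*fp)(void) = (n > 0) ? foo1 : foo2;
      fp();
      return 0;
  }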
68. PC-relative branches
- Question: how far off are PC-relative branches from the PC?
- Important to know, to determine how many bits are needed to encode the offset in the instruction
- Figure 2.20
- Message: more than 8 bits is probably not necessary
69. Conditional Branches
- Question: how does one specify the condition on which branching occurs?
- Three options (Figure 2.21)
- Condition code
- The ALU sets special bits based on the last operation it has executed
- Drawback: these bits can be overwritten
- Condition register
- A register contains the result of a comparison
- Drawback: uses up a register
- Compare and branch
- Do a comparison between a register and another register or a constant, and depending on the result branch or not
- Drawback: a lot of work for one instruction
- Many comparisons are with 0, and many ISAs provide special operations for testing/comparing to 0
70. Conditional Branches
71. Procedure Invocations
- What happens when a procedure is called?
- Caller: procedure placing the call
- Callee: procedure being called
- Sequence of actions
- The context of the caller is saved
- return address (sometimes to a special register)
- registers
- The callee is executed in a new context
- fresh registers
- The callee returns
- to the address saved earlier
- The caller's context is restored
- old register values are copied back into registers
72. Procedure Invocations
- How does all this happen?
- Answer: the compiler just generates code (i.e., assembly code) that deals with all the logistics
- Generates loads and stores so that the contexts are managed correctly
- Example
- reserve a zone of memory for register saving
- every time a procedure is called, save ALL register values into that zone
- every time a procedure returns, load ALL register values from that zone
- create a stack of such zones for subsequent procedure calls
- This approach is naive and inefficient
- Compilers go to great lengths to eliminate unneeded loads and stores during procedure calls
- When you write the compiler you have tons of options for the assembly code you generate
73. Encoding an ISA
- We have mentioned the notion of binary encoding of the instructions of the ISA
- Look at the specification of the ISA
- For each possible instruction, come up with a binary string that represents it (called the opcode)
- Represent operands via binary strings as well
- Then build the decoding/execution of the instructions in hardware (with gates and other ICS331 things)
- A key concern: how to encode addressing modes?
- Some (older) ISAs have instructions with many operands and many addressing modes
- In this case, one encodes an "address specifier" for each operand
- Just come up with a binary representation of the addressing modes in Figure 2.6
- Some (more recent) ISAs are so-called "load-store" architectures, with only one memory operand and only one or two addressing modes
- everything can be encoded easily as part of the instruction
74. Encoding an ISA
- Three important competing forces
- 1. The desire to have as many registers and addressing modes as possible, which makes instruction encodings longer
- 2. The desire to reduce the average instruction code size, and thus the average program size
- 3. The desire to have instructions that are easy/fast to decode
- in multiples of bytes, not arbitrary bit lengths
- fixed-length instructions are the easiest, although they may preclude some optimizations
- Let's look at how some popular ISAs encode their instructions
- Variable format (forces 1 and 2)
- Fixed format (force 3)
- Hybrid format (an attempt at a good balance)
75. ISA encoding
- Figure 2.23
76. ISA Encoding
- Variable (vs. Fixed)
- Since the instruction format is variable, instruction codes can just be as big as needed
- Much less extra padding just to comply with a fixed format when in fact not all bits are used
- All instructions look different, and decoding can be more time consuming
- Question: what do real computers do?
77. CISC vs. RISC
- CISC: Complex Instruction Set Computer
- Each instruction is "complex", meaning that it can do many things at once
- e.g., load something from memory, perform an arithmetic operation, and store something to memory, all in the same instruction
- The idea was to bridge the gap between high-level programming languages and the machine
- Assembly code was close(r) to a high-level programming language
- Many machines were designed in the pre-compiler age
- RISC: Reduced Instruction Set Computer
- Came after CISC systems, and was motivated by several observations
- Most programs used only a small fraction of the instructions in the ISAs, leaving the most complex ones out
- The most complex ones were thus slower, because of the "make the common case fast" principle employed by the designers!
- Complex instructions were difficult to decode
- Complex instructions took many clock cycles to execute
- Registers became cheaper, and using them to hold many temporary values became conceivable
- Compilers were better at using simpler instructions
- The idea of pipelining became prevalent (see upcoming lecture)
78. CISC vs. RISC
- Fallacy: "Reduced Instruction Set" doesn't mean that there are fewer instructions in a RISC ISA than in a CISC ISA
- It just means that the instructions are all simple
- The key philosophy of RISC, which differs from CISC, is
- Do operations in registers
- which are all identical and not scarce
- Use load and store to communicate between registers and memory
- using simple addressing modes
- Code is implemented as a (much longer) series of these simple operations
- Therefore, many people prefer the term "load-store architecture"
- Note that the label "CISC" was given to systems after the fact, so as to differentiate them from RISC
- By the late 1980s, RISCs were outperforming most CISCs
- Transistors saved by implementing simpler instructions can be used for other things
- Rule of thumb: at the same number of transistors, a RISC system is faster than a CISC system
- Note that RISC is not a good idea when code size is an issue
- Almost never the case for desktops and servers
- Definitely an issue for current embedded systems
79. Back to ISA Encoding
- CISC machines use the variable encoding
- RISC machines use the fixed format
80. The x86 ISA
- Fallacy: today, most computers are RISC
- The x86 ISA is the most widely used ISA today
- for instance, all Pentium processors
- The x86 is a CISC architecture
- x86 early history
- 1978: Intel creates the 8086 processor (16-bit)
- Sort of like an accumulator machine, because registers were not general purpose
- 1980: Intel 8087
- Pretty much a stack architecture
- 1982: 80286 (24-bit)
- New instructions
- backward compatible (in a special mode) with the 8086
- 1985: 80386 (32-bit)
- New addressing modes and instructions
- backward compatible (in a special mode) with the 8086
- The concern for backward compatibility, due to an existing software base, kept each step incremental, without true architectural changes
81. The x86 ISA
- x86 more recent history
- 1989: 80486; 1992: Pentium; 1995: P6
- aimed at higher performance
- only tiny additions to the ISA
- 1997: MMX extension
- 57 new instructions to perform operations on narrow data types (8-bit, 16-bit, 32-bit) in parallel
- used for multimedia
- we will talk more about this
- 1999: SSE extension (in the Pentium III)
- 70 new instructions
- 4 32-bit floating-point operations in parallel
- new cache prefetch instructions
- 2001: SSE2 extension
- 144 new instructions
- Basically MMX and SSE instructions, but for 64-bit floating-point numbers
82. The x86 ISA
- Bottom line
- The x86 ISA is not "orthogonal"
- i.e., there are many special cases and exceptions to rules
- Mastering ways in which to determine which registers and which addressing modes are available for a given task is difficult
- The "ugliness" of it all stems from the antiquated 8086 processor, with pieces glued on
- How come it was successful?
- Intel had 16-bit processors before everybody else
- Much more elegant architectures got there a bit later (Motorola 68000)
- BUT this head start led to the selection of the 8086 for the IBM PC
- Similar phenomenon with FORTRAN, for instance
- How come it's still successful?
- The x86 ISA is not too difficult to implement
- After all, Intel improved performance steadily over the years
- easy for integer programs
- more problematic for floating point programs
- But what about this RISC idea?
- Current x86 processors decode x86 instructions into smaller instructions (called micro-ops), which are then executed by a RISC-like core
- Goal: performance, while still exposing the familiar ISA to the programmer
- See Appendix D for more details on x86
83. IA64
- New ISA developed by Intel and HP
- Not to be confused with IA32, which is basically a 32-bit x86 ISA
- Not much in common with the x86 ISA
- Used in the Intel Itanium processor family
- Places its main emphasis on Instruction Level Parallelism (ILP)
- See the upcoming lecture on ILP
- Is it RISC?
- Supposed to be a "post-RISC era" ISA
- Called Explicitly Parallel Instruction Computing (EPIC)
- But it borrows many RISC concepts
- some say calling it by a new name is a gimmick
- We'll talk at length about related issues in the ILP lecture
84. Some Historical Perspective
- Earliest computers: accumulator-based
- The only feasible approach at a time when hardware was incredibly bulky and expensive
- Stack and register architectures fought it out until the late 1970s, with the register architecture winning in the end
- Architecture and programming languages
- VAX: complex instructions to have a better mapping between programming language and assembly language
- In the 1980s the trend went towards RISC architectures
- memory is cheap and code size is no longer a concern
- compiler technology has improved, and almost nobody writes assembly by hand any longer
- RISC was proven to be much faster and cheaper to manufacture (Figure 2.41)
- More details in Section 2.16
85. In-class Exercise
86. Exercise 2.4
- Your task is to compare the memory efficiency of four different styles of instruction set architectures
- Accumulator: all operations occur between a single register and a memory location
- Memory-memory: all instruction addresses reference only memory locations
- Stack: all operations occur on top of the stack. Push and pop are the only instructions that access memory; all others remove their operands from the stack and replace them with the result. Only the top two stack entries are kept near the processor (with circuitry). Lower stack positions are kept in memory locations, and accesses to these stack positions require memory references
- Load-store: all operations occur in registers, and register-to-register instructions have three register names per instruction
87. Exercise 2.4
- Make the following assumptions
- All instructions are an integral number of bytes in length
- The opcode is always 1 byte
- Memory accesses use direct addressing
- A, B, C, and D are initially in memory
- Question a: Invent your own assembly language mnemonics (Figure 2.2), and for each architecture write the best equivalent assembly language code for
- A = B + C
- B = A + C
- D = A - B
- Example from Figure 2.2 (computing C = A + B in each style):

  Accumulator:    Load A;  Add B;  Store C
  Memory-Memory:  Add C, A, B
  Stack:          Push A;  Push B;  Add;  Pop C
  Load-Store:     Load R1, A;  Load R2, B;  Add R3, R1, R2;  Store R3, C

- First architecture: Accumulator
88. Exercise 2.4. Question a
- Code for the accumulator architecture

  Load B     // Acc <- B
  Add C      // Acc <- Acc + C
  Store A    // A <- Acc
  Add C      // Acc <- Acc + C
  Store B    // B <- Acc
  Negate     // Acc <- -Acc
  Add A      // Acc <- Acc + A
  Store D    // D <- Acc

- Next: memory-memory architecture
89. Exercise 2.4. Question a
- Code for the memory-memory architecture

  Add A, B, C    // A <- B + C
  Add B, A, C    // B <- A + C
  Sub D, A, B    // D <- A - B

- Next: stack architecture
90. Exercise 2.4. Question a
- Code for the stack architecture

  Push B
  Push C
  Add
  Pop A
  Push A
  Push C
  Add
  Pop B
  Push A
  Push B
  Sub
  Pop D

- Next: load-store architecture
91. Exercise 2.4. Question a
- Code for the load-store architecture

  Load R1, B        // R1 <- B
  Load R2, C        // R2 <- C
  Add R3, R1, R2    // R3 <- R1 + R2
  Store R3, A       // A <- R3
  Add R4, R3, R2    // R4 <- R3 + R2
  Store R4, B       // B <- R4
  Sub R4, R3, R4    // R4 <- R3 - R4
  Store R4, D       // D <- R4

- Let's skip Question b and go directly to Question c
92. Exercise 2.4. Question c
- Assume the given code sequence is from a small, embedded computer application, such as a microwave oven controller, that uses 16-bit memory addresses and data operands. (A load-store architecture for this system would use 16-bit registers.) For each architecture answer
- How many instruction bytes are fetched?
- How many bytes of data are transferred from/to memory?
93. Exercise 2.4. Question c
- First: accumulator architecture
- How many instruction bytes are fetched?
- How many bytes of data are transferred from/to memory?

  Load B
  Add C
  Store A
  Add C
  Store B
  Negate
  Add A
  Store D

- Remember that opcodes are 8-bit and data/addresses are 16-bit
94. Exercise 2.4. Question c
- First: accumulator architecture
- How many instruction bytes are fetched?
- How many bytes of data are transferred from/to memory?

  Load B     I: 1+2 = 3   D: 2
  Add C      I: 1+2 = 3   D: 2
  Store A    I: 1+2 = 3   D: 2
  Add C      I: 1+2 = 3   D: 2
  Store B    I: 1+2 = 3   D: 2
  Negate     I: 1         D: 0
  Add A      I: 1+2 = 3   D: 2
  Store D    I: 1+2 = 3   D: 2
  -----------------------------
  Total      I: 22        D: 14   (36 bytes)
95. Exercise 2.4. Question c
- Second: memory-memory architecture
- How many instruction bytes are fetched?
- How many bytes of data are transferred from/to memory?

  Add A, B, C
  Add B, A, C
  Sub D, A, B

- Remember that opcodes are 8-bit and data/addresses are 16-bit
96. Exercise 2.4. Question c
- Second: memory-memory architecture
- How many instruction bytes are fetched?
- How many bytes of data are transferred from/to memory?

  Add A, B, C    I: 1+6 = 7   D: 6
  Add B, A, C    I: 1+6 = 7   D: 6
  Sub D, A, B    I: 1+6 = 7   D: 6
  ---------------------------------
  Total          I: 21        D: 18   (39 bytes)
97. Exercise 2.4. Question c
- Third: stack architecture
- How many instruction bytes are fetched?
- How many bytes of data are transferred from/to memory?

  Push B
  Push C
  Add
  Pop A
  Push A
  Push C
  Add
  Pop B
  Push A
  Push B
  Sub
  Pop D

- Remember that opcodes are 8-bit and data/addresses are 16-bit
- Remember that the stack is not empty when the code starts executing, and in fact most likely has more than two elements on it
98. Exercise 2.4. Question c