Title: Henri Casanova
1. ICS431 Computer Architecture: Instruction Set Architectures (ISA)
- Henri Casanova
- henric@hawaii.edu
2. Components of Architecture
- Three main components to computer architecture
- Instruction set architecture
- Computer organization
- Hardware
- In this course we'll mostly focus on the first two, starting with the first one
3. What is an ISA?
- The Instruction Set Architecture, or ISA, of a computer is the interface between the software and the hardware
- It is the portion of the computer visible to the programmer/compiler
- In some sense, the instruction set architecture is defined by the set of assembly instructions that can be used and by what they do
- Basic Model
- [Figure: basic model. The CPU repeatedly FETCHes, DECODEs, and EXECUTEs instructions; the program counter (PC) points to the current instruction (e.g., ... LOAD R1, (R2); LOAD R3, (R4); ADD R5, R1, R3 ...). The CPU is connected to memory over the external data bus. Memory contains both code and data, and instructions can refer to memory cells by address.]
4. ISA Categories
- Although ISAs come in many flavors, a typical distinction is the way in which operands to instructions are stored in a computer
- Three main methods have been used
- A stack architecture
- Implicitly, all operands are at the top of the stack
- An accumulator architecture
- Implicitly, one operand is the accumulator, and the other must be referenced explicitly
- A register architecture
- All operands must be referenced explicitly
- They can be either in registers or in memory
- For now we're only talking about operations on one or two operands (e.g., additions, comparisons)
- These operations are performed in an ALU...
5. ALU
- The CPU component that performs arithmetic and logic operations is called an Arithmetic Logic Unit (ALU)
- Performs additions, subtractions, multiplications, divisions, comparisons, bit shifting
- You have seen how to implement such things in ICS331
- Basically, tons of gates
- There can be integer ALUs or floating point ALUs
- If there is no floating point hardware, it can be emulated in software
- This has not been the case for several decades
- Floating point emulation used to be the overwhelming bottleneck
- A CPU may have multiple ALUs
- Usual representation: [Figure: ALU block with inputs Op1 and Op2, an Operation selector, a Result output, and an Overflow flag]
6. ALU Design: a bird's eye view
- Basically, an 8-bit ALU takes 16 bits of input
- A0,...,A7 and B0,...,B7
- Let's say the ALU does ADD, SUB, OR, and AND
- These can be encoded with 2 bits X and Y
- 00 ADD, 01 SUB, 10 OR, 11 AND
- The output is an 8-bit result plus an overflow bit (could also have a carry-out output)
- [Figure: ALU with data inputs A0...A7 and B0...B7, control bits X and Y, outputs C0...C7 and Overflow] (a C model of such an ALU is sketched below)
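A minimal sketch of such an ALU modeled in C; the function name is ours, and we use the unsigned carry/borrow as the "overflow" output, which is one of the two options the slide mentions:

  #include <stdio.h>
  #include <stdint.h>

  /* op encoding from above: 00 ADD, 01 SUB, 10 OR, 11 AND */
  uint8_t alu(uint8_t a, uint8_t b, int op, int *overflow) {
      uint16_t wide;
      *overflow = 0;
      switch (op) {
      case 0: wide = (uint16_t)a + b; *overflow = wide > 0xFF; return (uint8_t)wide;
      case 1: wide = (uint16_t)a - b; *overflow = a < b;       return (uint8_t)wide;
      case 2: return a | b;
      default: return a & b;
      }
  }

  int main(void) {
      int ov;
      uint8_t r = alu(200, 100, 0, &ov);   /* 200 + 100 = 300 does not fit in 8 bits */
      printf("result=%u overflow=%d\n", r, ov);
      return 0;
  }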
7. ALU Design: a bird's eye view
- In the end, it's all a big Karnaugh map, and can thus be implemented with (many) gates
- In the real world
- There are conventions for binary representations of numbers
- two's complement
- sign bits
- Large ALUs are built using 1-bit ALUs as building blocks
- See ICS331 for all the details
8. Back to ISAs: a Stack ISA
- The hardware implements a stack
- Reserve a zone in memory that will store the stack
- Use a TOS (Top of Stack) register that contains the address of the top of the stack
- The ISA provides a PUSH and a POP instruction, as well as, say, an ADD instruction
- The operands to the ADD are implicitly the two elements at the top of the stack; they are popped, and then the result is placed at the top of the stack

  PUSH A   // push the data at address A onto the stack
  PUSH B   // push the data at address B onto the stack
  ADD      // pop the two top elements, add them, push the result
  POP C    // store the top of the stack at address C

- Every time the stack is modified, the TOS pointer is updated
9. An Accumulator ISA
- The hardware implements an accumulator
- A memory cell
- The ISA provides a LOAD and a STORE instruction, as well as, say, an ADD instruction
- LOAD and STORE operate between the accumulator and the memory
- One operand to the ADD is implicitly the content of the accumulator, the other is in memory, and the result is placed in the accumulator

  LOAD A    // load the data at address A into the accumulator
  ADD B     // add the data at address B to the accumulator
  STORE C   // store the content of the accumulator at address C

- (Fewer instructions than with a stack)
10. Register ISAs
- In register ISAs, operations have only explicit operands, which come from
- A set of registers, i.e., memory cells located on the CPU that can hold data
- or memory cells in RAM (or cache), away from the CPU
- Option 1: register-memory ISAs
- Instructions can have some operands in memory
- Output is stored in a register
- Option 2: register-register ISAs
- Instructions can only have operands in registers
- Output is stored in a register
- Both ISAs provide ways to load data from memory into registers and store data from registers into memory
- Both are called General Purpose Register (GPR) ISAs
- Memory-memory ISAs are no longer found in computers today
- The point is that the use of registers is fast
11. Register ISAs
- [Figure: register-memory vs. register-register organization, each with a set of registers on the CPU and a memory below]
12. Register ISAs
- Since the 1980s, no computer has been designed with a stack or accumulator ISA
- Reason 1: Registers are faster than memory and have become cheap(er) to build
- Reason 2: Stack and accumulator ISAs are inefficient because they make the compiler's job difficult
- Example: (A*B) - (B*C) - (A*D)
- With a stack or accumulator, the values of A and B must be loaded from memory multiple times, which reduces performance
- With a stack or accumulator, there is only one operand evaluation order possible, which may prevent cool optimizations that we've come to take for granted in modern computers
- Registers can be used to store frequently used variables and thereby reduce memory traffic
- Registers can be named with fewer bits than a memory cell, which improves code density
- Therefore, we are left with only register ISAs
13. How many registers?
- Newer ISAs tend to have more registers
- Having more registers gives the compiler more opportunities to try to be as efficient as possible
- Typically, compilers reserve some registers for special purposes
- to hold parameters to a function call
- as helpers for expression evaluation
- The remaining registers can be used to hold program variables
14. Operands and ALU
- GPR (General Purpose Register) ISAs are characterized precisely by two characteristics
- Whether the ALU has 2 or 3 operands
- 2 operands: one of the operands is both an input and an output
- 3 operands: the third operand is an output
- The number of operands that may be memory addresses
- Figure 2.3 in the book shows examples of architectures that have taken different approaches
- The only memory-memory ISA is the VAX, which is now obsolete
- We will mostly focus on the (3,0) configuration, that is, three operands, all of which must be in registers
- corresponds to modern trends
- fits the general principle that all instructions should take the same number of cycles, for better performance
15. ISAs: more characteristics
- Defining the format of ALU operations is only one small part of defining the ISA
- Other considerations include
- Memory addressing
- Type and size of operands
- What are the operations in the ISA?
- Control flow
- ISA encoding
- Compilers?
- Let's look at all these issues, which are covered in a bit more detail in the book
- Ignore the DSP content in the book
16. Memory Addressing
- Every ISA must define how memory addresses
- are interpreted
- are specified in instructions (Load / Store)
- Most ISAs are byte addressed
- i.e., the smallest piece of memory that can be addressed is a byte
- Meaning that each byte has an address represented by a binary string
- If I have 128 bytes of addressable RAM, then each byte has an address that's composed of 7 bits, since 2^7 = 128 (see the sketch after this list)
- Typically, an ISA allows access to
- bytes (8 bits)
- half words (16 bits)
- Supported in C (short integers)
- Useful when the size of data structures is a concern (e.g., O/S)
- Useful for the Unicode character set (Java)
- words (32 bits)
- double words (64 bits)
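A quick sketch of the address-size arithmetic above (the helper function is ours): the number of address bits needed for N addressable bytes is the smallest b with 2^b >= N.

  #include <stdio.h>

  /* smallest number of bits b such that 2^b >= n_bytes */
  unsigned address_bits(unsigned long n_bytes) {
      unsigned bits = 0;
      while ((1UL << bits) < n_bytes) bits++;
      return bits;
  }

  int main(void) {
      printf("%u\n", address_bits(128));   /* 7 */
      return 0;
  }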
17. Registers
- The CPU has several 64-bit registers
- These registers can hold bytes, half words, words, or double words
- [Figure: a 64-bit register holding a byte, half word, or word in its low-order bits; the unused high-order bits are "don't care"]
18. Memory Alignment
- In most computers, access to objects larger than a byte (half words, words, double words) must be aligned
- Definition of aligned: an object of size S bytes is aligned if its address is a multiple of S (i.e., address mod S == 0)
- Example
- Accessing 1 byte at address 000000: aligned
- Accessing 1 half word at address 000001: NOT aligned
- 1 mod 2 == 1
- Accessing 1 half word at address 000010: aligned
- Accessing 1 double word at address 011000: aligned
- 24 mod 8 == 0
- See Figure 2.5 (a minimal C check of the rule is sketched below)
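A minimal C check of the alignment rule above (the function name is ours):

  #include <stdio.h>
  #include <stdint.h>

  /* An object of size S bytes at address A is aligned iff A mod S == 0 */
  int is_aligned(uintptr_t addr, size_t size) {
      return addr % size == 0;
  }

  int main(void) {
      printf("%d\n", is_aligned(0x00, 1));  /* 1: bytes are always aligned */
      printf("%d\n", is_aligned(0x01, 2));  /* 0: 1 mod 2 == 1 */
      printf("%d\n", is_aligned(0x02, 2));  /* 1: 2 mod 2 == 0 */
      printf("%d\n", is_aligned(0x18, 8));  /* 1: 24 mod 8 == 0 */
      return 0;
  }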
19. Why Alignment?
- The main reason is simpler hardware
- The memory hardware is built with alignment on large-object boundaries (e.g., double word)
- Therefore, accessing an object that's misaligned would increase the number of memory accesses
- Example
- If you have a computer that allows misaligned accesses, then a program that does only aligned accesses would run faster
- Therefore, imposing memory alignment is not such a terrible thing
20. Alignment to registers
- When you load something from memory you actually get a double word, which is placed in a 64-bit register
- The memory hardware is simpler to design (and therefore cheaper) if you assume that there is only one size of data that you can load from it
- Besides, it's not like loading 64 bits takes longer than loading 8 bits, as long as we have 64 wires
- Note that storing can be done at the byte level
- Let's say I want to load 1 byte whose address is x..x100
- That byte is located within a double word of memory (let's assume Big Endian)
- Therefore, when I load the whole double word into a 64-bit register, the byte will be in the middle of the register!
- What I need is to right-shift the whole double word by three byte positions so that the byte is where it should be: the 8 least-significant bits of the register
- [Figure: a big-endian double word with bytes numbered 0-7; after a 3-byte right shift, the byte at offset 4 ends up in the least-significant byte of the register. A C sketch of this shift follows.]
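A hedged sketch of that shift in C (the function name is ours; offsets are numbered 0-7 from the most significant byte, as in the Big Endian figure):

  #include <stdio.h>
  #include <stdint.h>

  /* Extract the byte at offset `off` from a loaded 64-bit double word.
     For the byte at address x..x100 (offset 4), the shift is (7-4)*8 = 24 bits. */
  uint8_t extract_byte_big_endian(uint64_t dword, int off) {
      int shift = (7 - off) * 8;   /* number of bits below the target byte */
      return (uint8_t)(dword >> shift);
  }

  int main(void) {
      uint64_t dword = 0x0011223344556677ULL;   /* byte at offset 4 is 0x44 */
      printf("0x%02x\n", extract_byte_big_endian(dword, 4));
      return 0;
  }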
21. Byte ordering
- There are two conventions for ordering the bytes within a half word, word, or double word
- Let's look at a double word (8 bytes)
- The bytes within this double word are contiguous in memory
- Therefore their addresses are
- x...x000, x...x001, x...x010, x...x011, x...x100, x...x101, x...x110, x...x111
- Question: does the left-most byte have address x...x000 or x...x111?
- The notion of left/right is based on the fact that the memory is linear, with increasing addresses
22. Byte Ordering
- Two conventions for byte ordering in a double word
- Little Endian: the byte whose address is x...x000 is placed at the least significant position, the "little end"; drawn with the most significant byte on the left, the bytes are numbered 7 6 5 4 3 2 1 0
- Pentium
- Big Endian: the byte whose address is x...x000 is placed at the most significant position, the "big end"; drawn the same way, the bytes are numbered 0 1 2 3 4 5 6 7
- SPARC
- Some architectures can be configured either way (bi-endian)
- Either in software (at boot up) or on the motherboard
- PowerPC, IA64
23. Byte Ordering
- Why should one care about byte ordering?
- Most of the time, one doesn't
- it's all hidden
- But it can be an issue when one accesses the same location both as a double word and as a byte
24. Byte ordering in action
- Let's see how one can write a C program that gives different results on computers that use different byte orderings
- To expose the different behaviors one needs to interpret the bytes of a word as individual bytes
- Therefore, one can cast an integer to a 4-character string, as in the code below

  #include <stdio.h>
  int main() {
    struct {
      int x;
      char y;
    } data;
    data.x = 0x61626364;
    data.y = '\0';
    printf("--> %s\n", (char *)&data);
  }
25. Byte ordering in action
- On a Little Endian machine, the bytes of data.x are stored in memory as
  d c b a \0
  so the program prints "--> dcba"
26. Byte ordering in action
- On a Big Endian machine, the bytes of data.x are stored in memory as
  a b c d \0
  so the program prints "--> abcd"
27. Byte ordering
- Little Endian doesn't match the normal ordering of characters in strings
- Therefore, many people think that Little Endian is "backwards"
- But note that it matches the way we interpret the bits in a byte!!
- Differences in endianness cause problems when exchanging data among computers
- IP defines a "network byte order", which is Big Endian
- UNIX provides the htons()/ntohs() and htonl()/ntohl() functions to enable easy conversions (a usage sketch follows)
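A small usage sketch, assuming a POSIX system (the variable names are ours):

  #include <stdio.h>
  #include <arpa/inet.h>   /* htonl, ntohl, htons, ntohs */

  int main(void) {
      unsigned int host_value = 0x61626364;
      /* htonl: host byte order -> network byte order (Big Endian).
         On a Big Endian host this is the identity; on a Little Endian
         host it swaps the four bytes. ntohl converts back. */
      unsigned int net_value = htonl(host_value);
      printf("host: 0x%08x  back from network order: 0x%08x\n",
             host_value, ntohl(net_value));
      return 0;
  }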
28. Exercise 2.3
- The value represented by the hexadecimal number 434F 4D50 5554 4552 is to be stored in an aligned 64-bit double word
- Question a: Using the physical arrangement of the first row in Figure 2.5, write the value to be stored using Big Endian byte order

  byte offset: 0  1  2  3  4  5  6  7
29. Exercise 2.3. Question a
- Answer

  hex: 43 4F 4D 50 55 54 45 52

- Next, interpret each byte as an ASCII character (codes given in decimal below) and under each byte write the corresponding character, forming the character string as it would be stored in Big Endian order
30. Exercise 2.3. Question b
- Answer to the previous step

  hex:     43 4F 4D 50 55 54 45 52
  decimal: 67 79 77 80 85 84 69 82
  chars:   C  O  M  P  U  T  E  R

- Question b: Same question, but using Little Endian byte order

  hex: 52 45 54 55 50 4D 4F 43
31. Exercise 2.3. Question b
- Answer

  hex:     52 45 54 55 50 4D 4F 43
  decimal: 82 69 84 85 80 77 79 67
  chars:   R  E  T  U  P  M  O  C
32. Exercise 2.3. Question c
- Question c: What are the hexadecimal values of all misaligned 2-byte words that can be read from the given 64-bit double word when stored in Big Endian byte order?

  hex: 43 4F 4D 50 55 54 45 52
33. Exercise 2.3. Question c
- Answer: the misaligned 2-byte words are those starting at the odd offsets 1, 3, and 5

  hex:   43 4F 4D 50 55 54 45 52
  words: 4F4D, 5055, 5445
34. Exercise 2.3. Question d
- Question d: What are the hexadecimal values of all misaligned 4-byte words that can be read from the given 64-bit double word when stored in Little Endian byte order?

  hex: 52 45 54 55 50 4D 4F 43
35. Exercise 2.3. Question d
- Answer: the misaligned 4-byte words are those starting at offsets 1, 2, and 3

  hex:   52 45 54 55 50 4D 4F 43
  words: 45545550, 5455504D, 55504D4F
36. Exercise 2.3
- This exercise is actually not very well formulated for the Little Endian scheme
- The way in which the quantities are written and read matters, and the exercise does not say which way is used
- In this class we will just consider C code fragments, so that everything is crystal clear
- Remember the C program I showed earlier

  #include <stdio.h>
  int main() {
    struct {
      int x;
      char y;
    } data;
    data.x = 0x61626364;
    data.y = '\0';
    printf("--> %s\n", (char *)&data);
  }

- Big Endian: "abcd"; Little Endian: "dcba"
37. Exercise 2.3
- What happens if the same constant is stored as two short integers (i.e., 2-byte integers)?

  #include <stdio.h>
  int main() {
    struct {
      short x1;
      short x2;
      char y;
    } data;
    data.x1 = 0x6162;
    data.x2 = 0x6364;
    data.y = '\0';
    printf("--> %s\n", (char *)&data);
  }

- Little Endian: "badc"!
38. Exercise 2.3
- What's happening here?
- Bytes are ordered within each object that is stored from the program
- Consider these two code fragments

  int a;
  *((short *)&a) = 0x6162;
  *(((short *)&a) + 1) = 0x6364;

- and

  int a;
  a = 0x61626364;

- On a Big Endian machine they are equivalent
- On a Little Endian machine they are not
39. Exercise 2.3
- Important: endianness also matters when data is read into objects larger than bytes
- Example

  struct {
    char x;
    char y;
  } data;
  data.x = 0x04;
  data.y = 0x01;
  printf("--> %d\n", *((short *)&data));

- Memory contents: 04 01
- On a Big Endian machine
- 0x0401 (decimal 1025)
- On a Little Endian machine
- 0x0104 (decimal 260)
40. In-class Exercise
- Consider the following fragment of code (ignoring any padding the compiler may insert between fields)

  #include <stdio.h>
  int main() {
    struct {
      short x1;
      int x2;
      char y;
    } data;
    data.x1 = 0x6162;
    data.x2 = 0x63646566;
    data.y = '\0';
    printf("--> %s\n", (char *)&data);
  }

- The output in Big Endian mode is "abcdef"; what's the output in Little Endian mode?
41. Answer
- The struct is laid out in memory as x1 (bytes 0-1), then x2 (bytes 2-5), then y (byte 6)
42-45. Answer (continued)
- Filling in memory byte by byte, Little Endian within each object
- x1 = 0x6162 is stored as 62 61
- x2 = 0x63646566 is stored as 66 65 64 63
- y = '\0' is stored as 00
- Final memory contents: 62 61 66 65 64 63 00
- Interpreted as characters: b a f e d c \0
- The output in Little Endian mode is therefore "bafedc"
46. Addressing modes
- Question: how are addresses specified in the instructions of the ISA?
- Example
- Consider 2-operand ALUs
- ADD R1, (R2)
- take the value of register R2
- interpret it as an address
- fetch the memory cell at that address
- add it to the value of register R1
- store the result into register R1
- This addressing mode is called "register indirect"
- It happens to be only one of MANY addressing modes that have been used in real-world computers
- We will see a few of them and their associated C-like pseudo-code (a sketch of the notation follows)
- Notation
- Mem: an array of memory cells
- Regs: an array of registers
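A sketch of that notation in plain C (the array sizes and the function are ours, for illustration only):

  /* The book's C-like pseudo-code notation, modeled as C arrays */
  int Mem[1 << 16];   /* memory cells */
  int Regs[32];       /* registers */

  /* ADD R1, (R2) -- register indirect, as described above:
     Regs[R1] <- Regs[R1] + Mem[Regs[R2]] */
  void add_register_indirect(int r1, int r2) {
      Regs[r1] = Regs[r1] + Mem[Regs[r2]];
  }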
47. Addressing Modes
48. Addressing Modes
- Even more complex addressing modes
49. Addressing Modes
- Some ISAs include all of these addressing modes: the VAX
- Some ISAs include only a few: MIPS
- What are the trade-offs?
- The more addressing modes, the lower the IC (instruction count)
- The programmer/compiler can use a single instruction to do several things at once
- The more addressing modes, the more complex the hardware
- The cost of the system increases
- The more addressing modes, the higher the CPI
- Some instructions will take many more clock cycles than others
- We will see that nowadays this is considered a bad thing
- (Note that VAXes are extinct)
- Goal: find the right balance
- Simple hardware
- Enough addressing modes so that writing code isn't too painful
50. VAX Addressing Modes
- Figure 2.7
- Lesson: only a few addressing modes are heavily used
- Immediate
- Displacement
51. Displacement range
- Add R4, 100(R1)   // add R4 and Mem[100 + Regs[R1]]
- Question: how big can the displacement be?
- Trade-off
- the bigger the maximum displacement, the wider the use of the instruction
- the smaller the maximum displacement, the shorter the instruction encoding
- instructions are encoded as bit strings
- the displacement will be encoded as a binary number
- the larger the maximum displacement, the larger the number of bits
- Figure 2.8
52. Immediates
- Add R4, #3   // add R4 and the number 3
- Question: how big can the immediate number be?
- Trade-off
- the bigger the maximum number, the wider the use of the instruction
- the smaller the maximum number, the shorter the instruction encoding
- instructions are encoded as bit strings
- the maximum number will be encoded as a binary number
- the larger the maximum number, the larger the number of bits
- Figure 2.10
53. Type and Size of Operands
- Question: how is the type of an operand designated?
- Typically, types are encoded as part of the instruction code
- The type generally gives the size of an operand
- characters/bytes: 8 bits
- half words / Unicode characters: 16 bits
- words / single-precision floats: 32 bits
- double words / double-precision floats: 64 bits
- SPEC benchmark
- Figure 2.12
- indicates that it is important to support efficient access to 64-bit objects
- 64-bit access path? (1 cycle)
- 32-bit access path? (2 cycles)
54. X-bit architecture?
- What do we mean when we say "an X-bit architecture"?
- A 32-bit architecture?
- A 64-bit architecture?
- Unfortunately, there is no clear definition
- Can be used to describe an architecture in which integers, memory addresses (and sometimes other data units) are encoded with at most X bits
- On a 32-bit architecture
- 8-, 16-, and 32-bit integers
- 32- and 64-bit floating point numbers
- On a 64-bit architecture
- 8-, 16-, 32-, and 64-bit integers
- 32- and 64-bit floating point numbers
- Can be used to describe a CPU (or ALU) that uses X-bit registers
- Can be used to describe a computer that has an X-bit wide external data bus
55. X-bit Architecture
- In general, when one says "an X-bit architecture" one refers to a system that can deal with X-bit chunks of integer data (integers and addresses) internally (registers) and externally (data busses)
- Although a CPU may be X-bit internally, its external data bus or address bus may have a different size, either larger or smaller, and the term is often used to describe the size of these buses as well
- [Table: Intel-compatible processors and their internal/external bus widths]
56. Storing Integers
- True binary representation
- 1101 means 13
- Problem with true binary representation
- No way to encode negative numbers
- One's complement representation
- The Most Significant Bit (MSB), i.e., the leftmost bit, indicates the sign of the number
- 0 means positive
- 1 means negative
- If the bit string starts with 0, then just interpret the string as true binary representation
- If the bit string starts with 1, then take its complement (change all 0s to 1s and all 1s to 0s) and interpret the result as true binary representation
- Example
- 1101 means -2
- 0010 means 2
- It is symmetric: as many positive numbers as negative numbers
57. One's Complement
- Problem with one's complement
- There are 2 ways to represent zero (0000 and 1111)
- 0011 = 3
- 1100 = -3
- 0011 + 1100 = 1111 = -0
- A machine using one's complement must recognize that the two zeros are identical
- Option 1: deal with the two representations
- All instructions that deal with zero must be augmented with two versions
- Way too complicated and costly
- Option 2: transform 1111 into 0000
- This is done with an AND gate over all bits and an inversion of all bits when a string of all 1s is computed
- Possible, but slows down arithmetic operations (addition)
- Computers haven't used one's complement in decades (early Cray computers did)
58. Two's Complement
- Goal: avoid the two-zeros problem
- Principle
- Add one to the complement of a number to make it negative
- Example
- 01001 means 9
- 10111 means -9
- If we try to compute the negative of 00000
- Take the complement: 11111
- Add 1: 100000 (6 bits)
- Dropping the overflow bit: 00000, so there is only one zero
- Addition
- 0011 = 3
- 1101 = -3
- 0011 + 1101 = (1) 0000 = 0 (the carry out is discarded)
- Minor problem: there is one more negative number than there are positive numbers
- All computers today use two's complement (a quick C check is sketched below)
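A quick C check of the two's complement facts above, using fixed-width 8-bit integers for clarity:

  #include <stdio.h>
  #include <stdint.h>

  int main(void) {
      /* negation = complement, then add one */
      int8_t x = 9;
      printf("%d\n", (int8_t)(~x + 1));       /* -9 */
      /* negating zero yields zero: there is no -0 */
      printf("%d\n", (int8_t)(~0 + 1));       /* 0 */
      /* one more negative number than positive numbers */
      printf("%d %d\n", INT8_MIN, INT8_MAX);  /* -128 127 */
      return 0;
  }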
59. Storing Floating Point Numbers
- IEEE Standard for Floating Point Arithmetic: IEEE 754
- General layout: a sign bit, an exponent (e bits), and a mantissa (f bits)
- Single precision (32-bit): exponent = 8 bits, mantissa = 23 bits
- Double precision (64-bit): exponent = 11 bits, mantissa = 52 bits
- Sign bit
- 0 = positive, 1 = negative
- Exponent
- base 2
- both positive and negative exponents
- Number = ± mantissa × 2^exponent (a C sketch of unpacking these fields follows)
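A sketch of pulling apart the single-precision fields in C (the example value and names are ours; note that IEEE 754 stores the exponent biased by 127, a detail the slide omits):

  #include <stdio.h>
  #include <stdint.h>
  #include <string.h>

  int main(void) {
      float f = -6.25f;                 /* -1.5625 * 2^2 */
      uint32_t bits;
      memcpy(&bits, &f, sizeof bits);   /* reinterpret the 32 bits */

      uint32_t sign     = bits >> 31;           /* 1 bit */
      uint32_t exponent = (bits >> 23) & 0xFF;  /* 8 bits, biased by 127 */
      uint32_t mantissa = bits & 0x7FFFFF;      /* 23 bits */

      printf("sign=%u exponent=%u (unbiased %d) mantissa=0x%06x\n",
             sign, exponent, (int)exponent - 127, mantissa);
      return 0;
  }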
60. ISA Operations
- What fundamental operations are provided by an ISA?
- There is a generally accepted taxonomy of operation types (Figure 2.15); categories are labeled universal, varies, or optional
61. Most popular instructions
- Question: which instructions end up being used the most?
- SPEC benchmark on the 80x86
- (Figure 2.16)
- Observation: 10 simple instructions account for 96% of all instructions executed
- Lesson 1: one should make sure that these go fast, because they are the common case
- Lesson 2: it is dubious that it's worth implementing many other sophisticated instructions
- Motivation to move from CISC to RISC (more on this later)
62. Control Flow
- Control flow instructions: instructions that allow the program counter (PC) to "jump" to an instruction other than the next one
- The PC contains the address of the instruction being executed
- By default, PC = PC + 1 after executing an instruction
- Terminology
- 1950s: transfer
- 1960s: branch
- conditional branches (if test then goto label)
- unconditional branches (goto label)
- The book uses
- "jump" when unconditional
- "branch" when conditional
- Four types of control flow instructions
- Conditional branches
- Jumps
- Procedure calls
- Procedure returns
63. Control Flow
- Conditional branches dominate the other control flow instructions
- Figure 2.19 (SPEC benchmark)
64. Control Flow and Addressing
- Question: how does one specify the target address, i.e., the address of the instruction that needs to be executed next?
- The target address must be encoded in the instruction
- As bits in the instruction encoding
- One exception: returning from a procedure call
- Reason: the return address from a procedure call is not known at compile time
- known at compile time: the compiler can just look at the code and figure it out
- example: the size of array x in "int x[3];"
- known only at runtime: the compiler cannot just look at the code and figure it out
- Instead, one must wait until the program runs to figure it out
- But the compiler can generate code to figure it out at runtime
- example: the size of array x in "int *x; x = (int *)calloc(n, sizeof(int));"
- Question: why don't we know the return address at compile time?
65. Procedure returns
- Reason why procedure return addresses are not known at compile time

  int main() { int answer; scanf("%d", &answer);
               if (answer > 0) foo1(); else foo2(); }
  void foo1() { ... f(); ... }
  void f()    { printf("hello\n"); return; }
  void foo2() { ... f(); ... }

- f() returns to a different address depending on whether it was called from foo1() or from foo2(), and which one calls it depends on user input
66. Control Flow and Addressing
- Most common way to specify the address of a change in control flow: relative to the Program Counter (PC)
- Called "PC-relative"
- Advantages
- The target is often near the control flow instruction, and the PC offset (+2 in the example below) can be encoded with few bits
- The code can be loaded at any address and run correctly: position independence

  ...
  LOAD R1, 0(R2)
  ADD R4, R1
  JMP to PC+2       <- PC
  ADD R3, R6
  LOAD R3, 0(R5)    <- the target, PC+2
  ...
67. Non-PC-relative addressing
- When the target address is not known at compile time, PC-relative addressing is not applicable
- Example: procedure call return
- What's needed is a way to specify the target address at runtime
- e.g., this could be done by putting the target address in a register and then encoding the register in the control flow instruction's encoding
- Other examples of uses in which the target address is not known at compile time (a C sketch follows)
- switch statements
- virtual methods
- function pointers
- dynamically linked libraries
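A small C sketch of the function-pointer case (foo1/foo2 echo the earlier slide; the rest is our own illustration):

  #include <stdio.h>

  void foo1(void) { printf("foo1\n"); }
  void foo2(void) { printf("foo2\n"); }

  int main(void) {
      int n;
      scanf("%d", &n);
      /* The call target depends on user input, so the compiler cannot
         encode it: the generated code jumps through a register holding
         the target address at runtime. */
      void (*fp)(void) = (n > 0) ? foo1 : foo2;
      fp();
      return 0;
  }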
68. PC-relative branches
- Question: how far off are PC-relative branches from the PC?
- Important to know, to determine how many bits are needed to encode the offset in the instruction
- Figure 2.20
- Message: more than 8 bits is probably not necessary
69. Conditional Branches
- Question: how does one specify the condition on which branching occurs?
- Three options (Figure 2.21)
- Condition code
- The ALU sets special bits based on the last operation it has executed
- Drawback: these bits can be overwritten
- Condition register
- A register contains the result of a comparison
- Drawback: uses up a register
- Compare and branch
- Do a comparison between a register and another register or a constant, and depending on the result branch or not
- Drawback: a lot of work for one instruction
- Many comparisons are with 0, and many ISAs provide special operations for testing/comparing to 0
70. Conditional Branches
71. Procedure Invocations
- What happens when a procedure is called?
- Caller: procedure placing the call
- Callee: procedure being called
- Sequence of actions
- The context of the caller is saved
- return address (sometimes to a special register)
- registers
- The callee is executed in a new context
- fresh registers
- The callee returns
- to the address saved earlier
- The caller's context is restored
- old register values are copied back into registers
72. Procedure Invocations
- How does all this happen?
- Answer: the compiler just generates code (i.e., assembly code) that deals with all the logistics
- Generates loads and stores so that the contexts are managed correctly
- Example
- reserve a zone of memory for register saving
- every time a procedure is called, save ALL register values into that zone
- every time a procedure returns, load ALL register values from that zone
- create a stack of such zones for subsequent procedure calls
- This approach is naive and inefficient
- Compilers go to great lengths to eliminate unneeded loads and stores during procedure calls
- When you write the compiler you have tons of options for the assembly code you generate
73. Encoding an ISA
- We have mentioned the notion of binary encoding of the instructions of the ISA
- Look at the specification of the ISA
- For each possible instruction, come up with a binary string that represents it (called the opcode)
- Represent operands via binary strings as well
- Then build the decoding/execution of the instructions in hardware (with gates and other ICS331 things)
- A key concern: how to encode addressing modes?
- Some (older) ISAs have instructions with many operands and many addressing modes
- In this case, one encodes an "address specifier" for each operand
- Just come up with a binary representation of the addressing modes in Figure 2.6
- Some (more recent) ISAs are so-called "load-store" architectures, with only one memory operand and only one or two addressing modes
- everything can be encoded easily as part of the instruction
74. Encoding an ISA
- Three important competing forces
- 1. The desire to have as many registers and addressing modes as possible, which makes instruction encodings longer
- 2. The desire to reduce the average instruction code size, and thus the average program size
- 3. The desire to have instructions that are easy/fast to decode
- in multiples of bytes, not arbitrary bit lengths
- fixed-length instructions are the easiest, although they may preclude some optimizations
- Let's look at how some popular ISAs encode their instructions
- Variable format (forces 1 and 2)
- Fixed format (force 3)
- Hybrid format (an attempt at a good balance)
75. ISA encoding
- Figure 2.23
76. ISA Encoding
- Variable (vs. Fixed)
- Since the instruction format is variable, instruction codes can just be as big as needed
- Much less extra padding just to comply with a fixed format when in fact not all bits are used
- All instructions look different, and decoding can be more time consuming
- Question: what do real computers do?
77. CISC vs. RISC
- CISC: Complex Instruction Set Computer
- Each instruction is "complex", meaning that it can do many things at once
- e.g., load something from memory, perform an arithmetic operation, and store something to memory, all in the same instruction
- The idea was to bridge the gap between high-level programming languages and the machine
- Assembly code was close(r) to a high-level programming language
- Many machines were designed in the pre-compiler age
- RISC: Reduced Instruction Set Computer
- Came after CISC systems, and was motivated by several observations
- Most programs used only a small fraction of the instructions in the ISAs, leaving the most complex ones out
- The most complex ones were thus slower, because of the "make the common case fast" principle employed by the designers!
- Complex instructions were difficult to decode
- Complex instructions took many clock cycles to execute
- Registers became cheaper, and using them to hold many temporary values became conceivable
- Compilers were better at using simpler instructions
- The idea of pipelining became prevalent (see upcoming lecture)
78. CISC vs. RISC
- Fallacy: "Reduced Instruction Set" doesn't mean that there are fewer instructions in a RISC ISA than in a CISC ISA
- It just means that the instructions are all simple
- The key philosophy of RISC, which differs from CISC, is
- Do operations in registers
- which are all identical and not scarce
- Use load and store to communicate between registers and memory
- using simple addressing modes
- Code is implemented as a (much longer) series of these simple operations
- Therefore, many people prefer the term "load-store architecture"
- Note that the label "CISC" was given to systems after the fact, so as to differentiate them from RISC
- By the late 1980s, RISCs were outperforming most CISCs
- Transistors saved by implementing simpler instructions can be used for other things
- Rule of thumb: at the same number of transistors, a RISC system is faster than a CISC system
- Note that RISC is not a good idea when code size is an issue
- Almost never the case for desktops and servers
- Definitely an issue for current embedded systems
79. Back to ISA Encoding
- CISC machines use the variable encoding
- RISC machines use the fixed format
80. The x86 ISA
- Fallacy: today, most computers are RISC
- The x86 ISA is the most widely used ISA today
- for instance, all Pentium processors
- The x86 is a CISC architecture
- x86 early history
- 1978: Intel creates the 8086 processor (16-bit)
- Sort of like an accumulator machine, because registers were not general purpose
- 1980: Intel 8087
- Pretty much a stack architecture
- 1982: 80286 (24-bit)
- New instructions
- backward compatible (in a special mode) with the 8086
- 1985: 80386 (32-bit)
- New addressing modes and instructions
- backward compatible (in a special mode) with the 8086
- The concern for backward compatibility, due to an existing software base, kept each step incremental, without true architectural changes
81. The x86 ISA
- x86 more recent history
- 1989: 80486; 1992: Pentium; 1995: P6
- aimed at higher performance
- only tiny additions to the ISA
- 1997: MMX extension
- 57 new instructions to perform operations on narrow data types (8-bit, 16-bit, 32-bit) in parallel
- used for multimedia
- we will talk more about this
- 1999: SSE extension (in the Pentium III)
- 70 new instructions
- 4 32-bit floating-point operations in parallel
- new cache prefetch instructions
- 2001: SSE2 extension
- 144 new instructions
- Basically MMX and SSE instructions, but for 64-bit floating-point numbers
82. The x86 ISA
- Bottom line
- The x86 ISA is not "orthogonal"
- i.e., there are many special cases and exceptions to rules
- Mastering ways in which to determine which registers and which addressing modes are available for a given task is difficult
- The "ugliness" of it all stems from the antiquated 8086 processor, with pieces glued on
- How come it was successful?
- Intel had 16-bit processors before everybody else
- Much more elegant architectures got there a bit later (Motorola 68000)
- BUT this head start led to the selection of the 8086 for the IBM PC
- Similar phenomenon with FORTRAN, for instance
- How come it's still successful?
- The x86 ISA is not too difficult to implement
- After all, Intel improved performance steadily over the years
- easy for integer programs
- more problematic for floating point programs
- But what about this RISC idea?
- Current x86 processors decode x86 instructions into smaller instructions (called micro-ops), which are then executed by a RISC-like core
- Goal: performance, while still exposing the familiar ISA to the programmer
- See Appendix D for more details on x86
83. IA64
- New ISA developed by Intel and HP
- Not to be confused with IA32, which is basically a 32-bit x86 ISA
- Not much in common with the x86 ISA
- Used in the Intel Itanium processor family
- Places its main emphasis on Instruction Level Parallelism (ILP)
- See the upcoming lecture on ILP
- Is it RISC?
- Supposed to be a "post-RISC era" ISA
- Called Explicitly Parallel Instruction Computing (EPIC)
- But it borrows many RISC concepts
- some say calling it by a new name is a gimmick
- We'll talk at length about related issues in the ILP lecture
84. Some Historical Perspective
- Earliest computers: accumulator-based
- The only feasible approach at a time when hardware was incredibly bulky and expensive
- Stack and register architectures fought it out until the late 1970s, with the register architecture winning in the end
- Architecture and programming languages
- VAX: complex instructions to have a better mapping between programming language and assembly language
- In the 1980s the trend went towards RISC architectures
- memory is cheap and code size is no longer a concern
- compiler technology has improved, and almost nobody writes assembly by hand any longer
- RISC was proven to be much faster and cheaper to manufacture (Figure 2.41)
- More details in Section 2.16
85. In-class Exercise
86. Exercise 2.4
- Your task is to compare the memory efficiency of four different styles of instruction set architectures
- Accumulator: all operations occur between a single register and a memory location
- Memory-memory: all instruction addresses reference only memory locations
- Stack: all operations occur on top of the stack. Push and pop are the only instructions that access memory; all others remove their operands from the stack and replace them with the result. Only the top two stack entries are kept near the processor (with circuitry). Lower stack positions are kept in memory locations, and accesses to these stack positions require memory references
- Load-store: all operations occur in registers, and register-to-register instructions have three register names per instruction
87. Exercise 2.4
- Make the following assumptions
- All instructions are an integral number of bytes in length
- The opcode is always 1 byte
- Memory accesses use direct addressing
- A, B, C, and D are initially in memory
- Question a: Invent your own assembly language mnemonics (Figure 2.2), and for each architecture write the best equivalent assembly language code for
- A = B + C
- B = A + C
- D = A - B
- Example from Figure 2.2 (computing C = A + B in each style):

  Accumulator:    Load A;  Add B;  Store C
  Memory-Memory:  Add C, A, B
  Stack:          Push A;  Push B;  Add;  Pop C
  Load-Store:     Load R1, A;  Load R2, B;  Add R3, R1, R2;  Store R3, C

- First architecture: Accumulator
88. Exercise 2.4. Question a
- Code for the accumulator architecture

  Load B     // Acc <- B
  Add C      // Acc <- Acc + C
  Store A    // A <- Acc
  Add C      // Acc <- Acc + C
  Store B    // B <- Acc
  Negate     // Acc <- -Acc
  Add A      // Acc <- Acc + A
  Store D    // D <- Acc

- Next: memory-memory architecture
89. Exercise 2.4. Question a
- Code for the memory-memory architecture

  Add A, B, C    // A <- B + C
  Add B, A, C    // B <- A + C
  Sub D, A, B    // D <- A - B

- Next: stack architecture
90. Exercise 2.4. Question a
- Code for the stack architecture

  Push B
  Push C
  Add
  Pop A
  Push A
  Push C
  Add
  Pop B
  Push A
  Push B
  Sub
  Pop D

- Next: load-store architecture
91. Exercise 2.4. Question a
- Code for the load-store architecture

  Load R1, B        // R1 <- B
  Load R2, C        // R2 <- C
  Add R3, R1, R2    // R3 <- R1 + R2
  Store R3, A       // A <- R3
  Add R4, R3, R2    // R4 <- R3 + R2
  Store R4, B       // B <- R4
  Sub R4, R3, R4    // R4 <- R3 - R4
  Store R4, D       // D <- R4

- Let's skip Question b and go directly to Question c
92. Exercise 2.4. Question c
- Assume the given code sequence is from a small, embedded computer application, such as a microwave oven controller, that uses 16-bit memory addresses and data operands. (A load-store architecture for this system would use 16-bit registers.) For each architecture answer
- How many instruction bytes are fetched?
- How many bytes of data are transferred from/to memory?
93. Exercise 2.4. Question c
- First: accumulator architecture
- How many instruction bytes are fetched?
- How many bytes of data are transferred from/to memory?

  Load B
  Add C
  Store A
  Add C
  Store B
  Negate
  Add A
  Store D

- Remember that opcodes are 8-bit and data/addresses are 16-bit
94. Exercise 2.4. Question c
- First: accumulator architecture
- How many instruction bytes are fetched?
- How many bytes of data are transferred from/to memory?

  Load B     I: 1+2 = 3   D: 2
  Add C      I: 1+2 = 3   D: 2
  Store A    I: 1+2 = 3   D: 2
  Add C      I: 1+2 = 3   D: 2
  Store B    I: 1+2 = 3   D: 2
  Negate     I: 1         D: 0
  Add A      I: 1+2 = 3   D: 2
  Store D    I: 1+2 = 3   D: 2
  -----------------------------
  Total      I: 22        D: 14   (36 bytes)
95. Exercise 2.4. Question c
- Second: memory-memory architecture
- How many instruction bytes are fetched?
- How many bytes of data are transferred from/to memory?

  Add A, B, C
  Add B, A, C
  Sub D, A, B

- Remember that opcodes are 8-bit and data/addresses are 16-bit
96. Exercise 2.4. Question c
- Second: memory-memory architecture
- How many instruction bytes are fetched?
- How many bytes of data are transferred from/to memory?

  Add A, B, C    I: 1+6 = 7   D: 6
  Add B, A, C    I: 1+6 = 7   D: 6
  Sub D, A, B    I: 1+6 = 7   D: 6
  ---------------------------------
  Total          I: 21        D: 18   (39 bytes)
97. Exercise 2.4. Question c
- Third: stack architecture
- How many instruction bytes are fetched?
- How many bytes of data are transferred from/to memory?

  Push B
  Push C
  Add
  Pop A
  Push A
  Push C
  Add
  Pop B
  Push A
  Push B
  Sub
  Pop D

- Remember that opcodes are 8-bit and data/addresses are 16-bit
- Remember that the stack is not empty when the code starts executing, and in fact most likely has more than two elements on it
98. Exercise 2.4. Question c