1 Computer Architecture Slide Sets, WS 2011/2012
Prof. Dr. Uwe Brinkschulte, Prof. Dr. Klaus Waldschmidt
Part 7: Instruction Set Architecture (ISA)
2 Programming model
The Instruction Set Architecture (ISA) is the
programming model needed for programming
a processor. Details of the processor's
implementation are outside the scope of the ISA.
The ISA can therefore be regarded
as an abstract interface between the compiler and
the microarchitecture of the processor.
3 Programming model
The following key questions lead us to the specification of this interface:
- How is data represented?
- Where is data stored?
- How is data accessed?
- How are instructions coded?
- Which instructions are available to process data?
4 Programming model
Therefore, the ISA defines:
- machine data types
- address space organisation
- register model
- addressing modes
- machine instruction set
5 Programming model
Since the programming model abstracts from
implementation details, it can be realized either in
hardware (real processors) or in software
(virtual processors). The ISA still places requirements
on the implementation: for instance, if the
instruction set includes an instruction for
multiplication, a hardware realization needs a
digital combinatorial circuit for multiplication.
In this sense, a relation between the abstract
ISA and the microarchitecture exists.
6 Machine data types
- A data type is a tuple of values and operations which can be performed on these values.
- The operations are implemented by the machine instructions.
- Machine data types (like data types in high-level languages) are classified into structured and unstructured data types.
- The primitive data types form an additional class.
7 Primitive machine data types
- Bit: value set {0, 1}; operations: AND, OR, XOR, negation, compare
- Byte: value set: bit pattern (8 bit); normally the smallest addressable unit; operations: same as for bit, additionally ADD, SUB, MUL, DIV, SHIFT, ROTATE, ...
- Word: value set: normally a multiple of bytes; largest addressable unit (in a single operation); operations: same as for byte
- (Sometimes the following convention is used: Half-Word = 16 bit, Word = 32 bit, Double Word = 64 bit)
8 Examples for more data types
(taken from MC680x0)
- Bit vector: bits n-1 ... 0, with bit i selectable; n = 8, 16, 32
- BCD number (binary coded decimal): 2 or 4 digits in 8 or 16 bit, 8 digits (7 ... 0) in 32 bit
- Unsigned binary number: bits n-1 (MSB) ... 0 (LSB); n = 8, 16, 32
- Two's complement number: bits n-1 (MSB = sign bit) ... 0 (LSB); n = 8, 16, 32
- Floating point number (32 bit): sign s (bit 31), biased exponent (bits 30-23), fraction (bits 22-0)
- String: sequence of elements with bits n-1 ... 0 each; n = 8, 16, 32
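The encodings listed on this slide can be tried out with a few helper functions. This is only a sketch: the function names are made up for illustration, the BCD packing assumes 4 bits per decimal digit, and the floating point layout is assumed to be IEEE 754 single precision, which matches the sign / biased exponent / fraction split shown above.

```python
import struct


def to_twos_complement(value: int, n: int) -> int:
    """Encode a signed integer as an n-bit two's complement bit pattern."""
    if not -(1 << (n - 1)) <= value < (1 << (n - 1)):
        raise ValueError("value does not fit into n bits")
    return value & ((1 << n) - 1)


def from_twos_complement(pattern: int, n: int) -> int:
    """Decode an n-bit two's complement pattern (the MSB is the sign bit)."""
    sign_bit = 1 << (n - 1)
    return pattern - (1 << n) if pattern & sign_bit else pattern


def to_packed_bcd(value: int, n: int) -> int:
    """Pack the decimal digits of value into n bits, 4 bits per digit."""
    digits = n // 4
    if not 0 <= value < 10 ** digits:
        raise ValueError("value does not fit into the available digits")
    pattern = 0
    for position in range(digits):
        value, digit = divmod(value, 10)
        pattern |= digit << (4 * position)
    return pattern


def float32_fields(x: float):
    """Split a 32 bit float into (sign, biased exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
```

For example, `to_packed_bcd(1234, 16)` yields the bit pattern `0x1234`, and `float32_fields(1.0)` yields sign 0, biased exponent 127 and fraction 0.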
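9 Address space organisation
Physical organisation: depends on the processor
[Figure: memory drawn as a column of physical words at physical addresses 0 ... n; each word is 8 bits wide (bits 7-0) on an 8 bit processor, 16 bits wide (bits 15-0) on a 16 bit processor and 32 bits wide (bits 31-0) on a 32 bit processor]
n: physical address; the address bus width determines the number of physical addresses, 2^(address bus width)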
10 Address space organisation
Logical organisation: byte oriented access for most processor types
[Figure: memory drawn as a row of bytes (bits 7-0) at logical addresses 0, 1, 2, 3, ..., m; a physical word covers one byte on an 8 bit processor, two bytes on a 16 bit processor and four bytes on a 32 bit processor]
m: logical address, m = n * bit width / 8
11 Address space organisation
Physical to logical mapping
[Figure: 32 bit wide memory (bits 31-0) with physical addresses 0 ... n; each physical word holds four consecutive logical (byte) addresses: 0-3, 4-7, 8-11, 12-15, ..., m-3 to m]
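The mapping between logical (byte) addresses and physical words can be written as an integer division. A minimal sketch, assuming that logical addresses are assigned to physical words in ascending order as in the figure; the function name is hypothetical.

```python
def logical_to_physical(logical_address: int, bit_width: int):
    """Return (physical address, byte offset within the physical word)."""
    bytes_per_word = bit_width // 8
    return divmod(logical_address, bytes_per_word)
```

On a 32 bit processor, logical address 13 lies in physical word 3 at byte offset 1; on an 8 bit processor logical and physical addresses coincide.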
12 Address space organisation
Aligned access: the accessed word is aligned according to its length in the physical address space:
(logical address mod length) = 0
[Figure: 32 bit wide memory (bits 31-0) with physical addresses 0 ... n, showing bytes aligned to byte boundaries (four bytes per physical word), half-words aligned to half-word boundaries (two half-words per physical word) and words aligned to word boundaries (one word per physical word)]
13 Address space organisation
Unaligned (misaligned) access: the accessed word is not aligned according to its length in the physical address space:
(logical address mod length) ≠ 0
[Figure: 32 bit wide memory (bits 31-0) with physical addresses 0 ... n, showing a half-word and a word that each straddle the boundary between two physical words]
Some processors do not support unaligned access (e.g. SPARC)
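The alignment condition translates directly into a modulo test. The sketch below also counts how many physical words an access touches, which indicates why unaligned accesses are more expensive; the function names and the default 32 bit memory width are assumptions for illustration.

```python
def is_aligned(logical_address: int, length: int) -> bool:
    """True if an access of `length` bytes is aligned to its own length."""
    return logical_address % length == 0


def physical_words_touched(logical_address: int, length: int, bit_width: int = 32) -> int:
    """Number of physical words covered by an access of `length` bytes."""
    bytes_per_word = bit_width // 8
    first = logical_address // bytes_per_word
    last = (logical_address + length - 1) // bytes_per_word
    return last - first + 1
```

A word (4 bytes) at logical address 6 is unaligned and spans two physical words of a 32 bit memory, so it needs two memory accesses instead of one.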
14 Byte order in words
Two different formats
Big endian byte ordering:
- 8 bit byte: at address N
- 16 bit word: bits 15-8 at N, bits 7-0 at N+1
- 32 bit word: bits 31-24 at N, bits 23-16 at N+1, bits 15-8 at N+2, bits 7-0 at N+3
Word address is the address of the most significant byte (used e.g. in MC680x0 or SPARC)
Little endian byte ordering:
- 8 bit byte: at address N
- 16 bit word: bits 15-8 at N+1, bits 7-0 at N
- 32 bit word: bits 31-24 at N+3, bits 23-16 at N+2, bits 15-8 at N+1, bits 7-0 at N
Word address is the address of the least significant byte (used e.g. in the Pentium family)
N: least significant byte, N+3: most significant byte
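15 Byte order in words
Logical (byte oriented) memory organization of a 32 bit word consisting of the bytes b0 (least significant), b1, b2 and b3 (most significant)
Big endian byte ordering, byte addresses:
- b0 at N+3
- b1 at N+2
- b2 at N+1
- b3 at N
Little endian byte ordering, byte addresses:
- b0 at N
- b1 at N+1
- b2 at N+2
- b3 at N+3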
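Both byte orders can be reproduced with Python's built-in integer conversions. A small sketch, assuming a byte-addressed memory modelled as a dictionary; `store_word` and `load_word` are hypothetical helper names.

```python
def store_word(memory: dict, address: int, value: int, byte_order: str) -> None:
    """Store a 32 bit word at `address`; byte_order is "big" or "little"."""
    for offset, byte in enumerate(value.to_bytes(4, byte_order)):
        memory[address + offset] = byte


def load_word(memory: dict, address: int, byte_order: str) -> int:
    """Load a 32 bit word from `address` using the given byte order."""
    data = bytes(memory[address + offset] for offset in range(4))
    return int.from_bytes(data, byte_order)


big, little = {}, {}
store_word(big, 100, 0x11223344, "big")        # most significant byte at N
store_word(little, 100, 0x11223344, "little")  # least significant byte at N
```

Reading a word with the wrong byte order reverses its bytes, which is why data exchanged between big endian and little endian machines has to be converted.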
16 Register model
- The number of registers in a processor typically varies between 20 and 200.
- The advantages of storing data in registers rather than in DRAM or SRAM memories are:
  - faster access time
  - shorter register addresses in the instruction format
- An ISA is called a load-store ISA if all machine instructions except register load and store instructions operate on the register file only.
17 Register model
- Registers are classified into hidden registers and programmer visible registers.
- The visible registers are the workplace of the programmer and are often organized as register files.
- Hidden registers are supply registers needed for the internal functionality of the processing unit (CPU).
- Both visible and hidden registers are designed for various purposes and functionality.
18 Register model
- A register model defines which processor registers are visible (addressable) to the programmer.
- Usually these are the working registers and the state register.
- The state register monitors the state of the processor through conditional flags.
- It shows, for example, whether the processor operates in system or user mode.
- The state register is mostly read-only.
- Commonly existing hidden registers are the instruction register and the memory interface registers.
19 Register implementation
[Figure: 32 bit register built from D-latches with data inputs D0 ... D31, outputs Q0 ... Q31 and a common clock clk, shown together with its block symbol]
[Figure: asynchronous counter built from D-latches with outputs Q0 ... Q3, driven by clk]
20 Common visible registers
- program counter (PC): contains the next instruction address
- state register (SR): monitors the state of the processor
- stackpointer (SP): stores the top of the stack
- accumulator (ACCU): stores computation results (in older or simple processors)
- data registers (DXi): store operands for computations
- address registers (AXi): store operand addresses
- general purpose registers (GPi): store either operands or operand addresses
21 Common hidden registers
- instruction register (IR): contains the currently processed instruction
- instruction queue (IQ): contains the next instructions to be processed
- memory address register (MAR): buffers the address of a memory access (e.g. to save or load a general purpose register)
- memory data register (MDR): buffers the content of a memory access (e.g. to save or load a general purpose register)
22 Program counter register
- Pointer to the next instruction to be executed
- Normally incremented
- Set by a jump, jump subroutine, interrupt, return
or return from interrupt instruction
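[Figure: 32 bit PC (bits 31-0) pointing into a memory with instructions at addresses N-4, N, N+4, ..., M; after a sequential instruction (Add A, B) the PC advances to the next address, a Jump M sets it to M]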
23 Stackpointer register
- Addresses a location in the memory which is organized as a stack (LIFO).
- Elements can be pushed (written) and popped (read) only at the top of the stack.
- Consequence: data are stored in sequential order.
- Used e.g. by jump subroutine/return operations to save and restore the PC.
[Figure: 32 bit SP (bits 31-0) pointing into a memory with entries at addresses N-4, N, N+4; Push writes an element X and moves the SP, Pop reads it and moves the SP back]
Some processors distinguish between a user stackpointer (e.g. for jump subroutine/return) and a supervisor stackpointer (e.g. for interrupt/return from interrupt).
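The interplay of stackpointer and program counter during a subroutine call can be sketched as follows. The class is purely illustrative: it assumes 4 byte stack entries and instructions, a stack that grows towards lower addresses, and a PC that holds the address of the calling instruction when the jump is executed. None of these details are fixed by the slides.

```python
class TinyMachine:
    """Toy model with a program counter, a stackpointer and a memory."""

    def __init__(self, stack_top: int = 0x1000):
        self.memory = {}
        self.sp = stack_top  # assumption: the stack grows downwards
        self.pc = 0

    def push(self, value: int) -> None:
        self.sp -= 4                 # reserve the next entry
        self.memory[self.sp] = value

    def pop(self) -> int:
        value = self.memory[self.sp]
        self.sp += 4                 # release the entry
        return value

    def jump_subroutine(self, target: int) -> None:
        self.push(self.pc + 4)       # return address: instruction after the call
        self.pc = target

    def return_from_subroutine(self) -> None:
        self.pc = self.pop()
```

Because of the LIFO order, nested subroutine calls return in the reverse order of their calls without any further bookkeeping.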
24 Sample CISC register set
[Figure: register set of the Intel Pentium]
25 Sample RISC register set
[Figure: register set of the PowerPC (extract)]
- The register file of a RISC processor has to be much bigger than that of a CISC processor.
- A RISC needs more registers because the register file is the source and destination of all arithmetic and logic instructions.
26 Multiple register sets
27 Multiple register sets
Processors with multiple register sets: a step towards multithreaded processors
- Processor with multiple register sets
  - Each register set can store the program counter (PC) and the state register (SR)
  - PC and SR exist only once
  - => several contexts can be stored, fast context switching
- Multithreaded processor
  - multiple PCs and SRs exist
  - instructions from several threads can be executed at the same time in the pipeline
  - => several contexts can be processed
28 Multiple overlapping register sets, register windows
- The registers of a register file are grouped into blocks called windows.
- These overlapping windows are used by the subroutines of a program.
- MORS (multiple overlapping register set)
[Figure: the overall register set is divided into register windows 1, 2, 3, ..., n; a jump to a subroutine switches to the next window, a return switches back]
29 Multiple overlapping register sets, register windows
- Simplifies parameter passing on jumping to subroutines
- Each subroutine has its own working space within the register file
- Parameters can be passed directly, with no need to copy registers or pass parameters via memory
- => mainly used in RISC processors
- Two possible approaches:
  - Fixed size register window
  - Variable size register window
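30 Fixed size window
(based on SPARC architecture)
[Figure: three overlapping register windows, the preceding window (i+1), the current window (i) selected by the current window pointer (CWP), and the succeeding window (i-1). Each window sees r31-r24 as in registers, r23-r16 as local registers and r15-r8 as out registers; r7-r0 are the global registers shared by all windows. The out registers of window i+1 are the in registers of window i, and the out registers of window i are the in registers of window i-1. save moves to the succeeding window, restore moves back to the preceding window.]
Alternative register naming: r31-r24 = i7-i0, r23-r16 = l7-l0, r15-r8 = o7-o0, r7-r0 = g7-g0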
31 Fixed size window
- In case of the SPARC architecture, a window consists of 32 registers: 8 global registers, 8 in registers which also belong to the preceding window, 8 local registers and 8 out registers which also belong to the succeeding window.
- The registers are addressed relative to the current window pointer (CWP).
- A subroutine call is performed by switching the CWP to the succeeding window and saving the PC.
- The parameters are passed through the overlapping registers of the two windows.
- The content of the program counter is saved (return address) into one of these registers.
- A time-consuming save and reload of registers is omitted.
- In case of an overflow of the MORS, the window contents have to be saved to a stack.
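How window-relative register numbers map onto one physical register file can be sketched as follows. The sketch assumes the SPARC-style layout used on these slides (r0-r7 global, r8-r15 out, r16-r23 local, r24-r31 in) and assumes that a call switches from window i to window i-1. The number of windows and the physical numbering are arbitrary choices for illustration.

```python
NUM_WINDOWS = 8              # assumption: not stated in the slides
GLOBALS = 8                  # r0-r7, shared by all windows
WINDOWED = 16 * NUM_WINDOWS  # each window adds 8 out and 8 local registers


def physical_register(cwp: int, r: int) -> int:
    """Map register r0..r31 of window `cwp` to a physical register index."""
    if not 0 <= r < 32:
        raise ValueError("register number out of range")
    if r < 8:                # global registers
        return r
    if r < 24:               # r8-r15 out, r16-r23 local
        offset = cwp * 16 + (r - 8)
    else:                    # r24-r31 in: the out registers of window cwp + 1
        offset = (cwp + 1) * 16 + (r - 24)
    return GLOBALS + offset % WINDOWED


caller = 3
callee = (caller - 1) % NUM_WINDOWS  # window after the call
```

With this mapping, a value the caller writes to its out register r8 is visible to the callee as in register r24 without any copying, while the local registers of both windows stay separate.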
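32 Variable size window
(based on AMD 29000 architecture)
[Figure: register file with global registers gr0, gr1, ..., gr63 (indices 0-63) and local registers r64, r65, r66, ... (up to index 127). The previous RSP marks the start of the preceding window (Local, Out), the current RSP marks the start of the current window (In, Local, Out); within each window the registers are addressed as r0, r1, ... relative to the RSP. The size of a window is variable.]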
33 Register size of processors with 3-address architecture
processor/architecture (vendor) | # of GPRs overall | # of GPRs directly accessible | register width | bit width of register address | bit width of immediate operands | bit width of instr.
Alpha 21364 (Compaq) | 32 | 32 | 64 Bit | 5 Bit | 8 Bit | 32 Bit
Am29000 (AMD) | 192 | 192 | 32 Bit | 8 Bit | 8 Bit | 32 Bit
ARM7TDMI (ARM) | 16 | 16 | 32 Bit | 4 Bit | 8 Bit | 32 Bit
Crusoe TM5800 (Transmeta) | 64 | 64 | 32 Bit | 6 Bit | - | -
PA-8700 (HP) | 32 | 32 | 64 Bit | 5 Bit | 11 Bit | 32 Bit
Itanium 2 (Intel, HP) | 128 | 128 | 64 Bit | 7 Bit | 8 Bit | 41 Bit
MC88100 (Motorola) | 32 | 32 | 32 Bit | 5 Bit | 16 Bit | 32 Bit
MIPS64 20Kc (MIPS) | 32 | 32 | 64 Bit | 5 Bit | 16 Bit | 32 Bit
Nemesis C (TU Berlin) | 96 | 16 | 32 Bit | 4 Bit | 1 Bit | 16 Bit
PowerPC 970 (IBM) | 32 | 32 | 64 Bit | 5 Bit | 16 Bit | 32 Bit
UltraSPARC III Cu (SUN) | 160 | 32 | 64 Bit | 5 Bit | 13 Bit | 32 Bit
(GPR: general purpose register)
34 Register size of processors with 2-address architecture
processor (vendor) | # of GPRs overall | # of GPRs directly accessible | register width | bit width of register address | bit width of immediate operands | bit width of smallest instr.
Athlon (AMD X86-64) | 16 | 16 | 64 Bit | 4 Bit | 8-32 Bit | 8 Bit
ColdFire MCF5206 (Motorola) | 8 + 8 | 8 + 8 | 32 Bit | 3 Bit | 8-32 Bit | 16 Bit
MC680xx (Motorola) | 8 + 8 | 8 + 8 | 32 Bit | 3 Bit | 8-32 Bit | 16 Bit
Pentium X (Intel X86) | 8 | 8 | 32 Bit | 3 Bit | 8-32 Bit | 8 Bit
(GPR: general purpose register)
35 Addressing modes
- Machine instructions normally hold information about the operand addresses.
- This can either be a physical address, e.g. a register number or the address of a memory location, or it can be an address specification.
- An address specification defines how to calculate the address.
- Thus, the address information determines the location of the operand(s) belonging to the instruction using one of many addressing modes.
36 Addressing modes
- Instruction format, e.g. of an arithmetic instruction:
  opcode | target | source | source
- The operands needed for the execution are defined by the opcode.
- Each operand field holds one of the following: the operand itself, a register number, a memory location, or an address specification (dynamic address calculation).
- The result of the dynamic address calculation is called the effective address.
37 Addressing modes
- immediate: The operand is part of the instruction.
- memory direct and register direct: The instruction contains the operand address.
- register indirect: The instruction contains a register number pointing to a register holding the address of the operand. In assembler code this addressing mode is typically denoted by (register name).
38 Addressing modes
- memory indirect: A register addressed in the instruction contains the address of a memory cell which holds the operand address.
- register offset: The instruction contains a register number and an offset. The operand address is the sum of the register's content and the offset.
- implicit: The instruction implicitly targets a single register (like the ACCU).
39 Effective address
The address is calculated at runtime from several parts found in the instruction and in registers or memory cells (dynamic address calculation). The calculated address is called the effective address.
- Reasons for using dynamic address calculation:
  - Addresses of data structure elements are composed of the first address of the data structure and the offset of the element from the beginning. Often this offset is unknown at compile time, therefore the effective address has to be calculated at runtime.
  - Repeated execution of the same instruction, e.g. in a loop, often accesses successive memory addresses which have to be calculated at runtime.
40 Effective address (cont.)
- An operand address often is unknown at compile time, because it is calculated during program execution.
- The partitioning of addresses into a base address stored in a register and an offset simplifies the handling of relocatable (shiftable) variables and relocatable program code.
41 Addressing modes 1
- immediate: the instruction contains the operand; e.g. LOAD 8, r1
- memory direct: the instruction contains the effective address of the operand in memory; e.g. LOAD (2000), r1
- register direct: the instruction contains the effective address (register number) of the register holding the operand; e.g. LOAD r2, r1
42 Addressing modes 2
- register indirect: the instruction contains a register address; the register holds the effective address of the operand in memory; e.g. LOAD (r2), r1
- register indirect with predecrement: the instruction contains a register address; the register holds a memory address, which is first decremented and then used as the effective address of the operand in memory; e.g. LOAD -(r2), r1
43 Addressing modes 3
- register indirect with displacement (indexed): the instruction contains a register address, a displacement and an index register; the effective address of the operand in memory is the sum of the memory address held in the register, the displacement and the index multiplied by a scaling factor of 1, 2 or 4; e.g. LOAD.B 126(r3)(r2), r1
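44 Addressing modes 4
- memory indirect: the instruction contains a register address and two displacements; the memory address held in the register plus displacement1 selects a memory cell holding the indirect memory address; the indirect memory address plus displacement2 is the effective address of the operand in memory; e.g. LOAD 28(126(r2)), r1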
45 Addressing modes 5
- memory indirect (post indexed): the memory address held in the register plus displacement1 selects a memory cell holding the indirect memory address; the effective address of the operand in memory is the sum of the indirect memory address, displacement2 and the index register multiplied by a scaling factor of 1, 2 or 4; e.g. LOAD.B 28(r3)(126(r2)), r1
46 Addressing modes 6
- memory indirect (preindexed): the memory address held in the register plus displacement1 plus the index register multiplied by a scaling factor of 1, 2 or 4 selects a memory cell holding the indirect memory address; the indirect memory address plus displacement2 is the effective address of the operand in memory; e.g. LOAD.B 28(126(r3)(r2)), r1
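The effective address calculations of the addressing modes on slides 42 to 46 can be written down as small functions. This is a sketch of one reading of the figures: registers and memory are modelled as dictionaries, the function and parameter names are hypothetical, and the assignment of the inner displacement to displacement1 and the outer one to displacement2 is an assumption.

```python
def ea_register_indirect(regs, base):
    return regs[base]


def ea_predecrement(regs, base, size):
    regs[base] -= size  # the register is decremented before it is used
    return regs[base]


def ea_indexed(regs, base, index, disp, scale=1):
    return regs[base] + disp + regs[index] * scale


def ea_memory_indirect(regs, mem, base, disp1, disp2):
    return mem[regs[base] + disp1] + disp2


def ea_post_indexed(regs, mem, base, index, disp1, disp2, scale=1):
    # the index is added after the indirect memory access
    return mem[regs[base] + disp1] + disp2 + regs[index] * scale


def ea_pre_indexed(regs, mem, base, index, disp1, disp2, scale=1):
    # the index is added before the indirect memory access
    return mem[regs[base] + disp1 + regs[index] * scale] + disp2


regs = {"r2": 1000, "r3": 5}
mem = {1126: 4000, 1136: 6000}  # memory cells holding indirect addresses
```

With these values and a scaling factor of 2, the post indexed mode yields 4000 + 28 + 10 = 4038, whereas the preindexed mode reads the cell at 1000 + 126 + 10 = 1136 and yields 6000 + 28 = 6028.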
47 Access to branch target table by PC relative addressing
e.g. JMP disp(PC)(rn)
[Figure: branch target table access through program counter relative addressing; the table with the entries target 0, target 1, target 2, ... lies in memory at (PC) + displacement, and the index register rn selects the entry]
48 Machine instruction set
- The machine instruction set of a computer normally includes instructions of different formats, e.g. 0-address instructions, 1-address instructions, 2-address instructions and 3-address instructions.
- An instruction is divided into so-called fields.
- The more address fields an instruction contains, the smaller the number of addressable memory cells and/or the number of operations encoded in the opcode field becomes (if we assume a constant instruction length).
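The trade-off between the number of address fields and their width is simple arithmetic. The following sketch assumes a fixed-width opcode and equally wide address fields; the function names and the 16 bit example are chosen for illustration only.

```python
def opcode_bits(instruction_length: int, address_fields: int, field_width: int) -> int:
    """Bits left for the opcode and other fields after the address fields."""
    remaining = instruction_length - address_fields * field_width
    if remaining < 0:
        raise ValueError("address fields do not fit into the instruction")
    return remaining


def max_field_width(instruction_length: int, opcode_width: int, address_fields: int) -> int:
    """Widest address field that still fits next to a fixed-width opcode."""
    return (instruction_length - opcode_width) // address_fields


for fields in (1, 2, 3):
    width = max_field_width(16, 4, fields)
    print(fields, "address field(s):", width, "bits each,", 2 ** width, "addressable cells")
```

For a 16 bit instruction with a 4 bit opcode, one address field can be 12 bits wide, two fields 6 bits each, and three fields only 4 bits each.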
49 Variable length vs. constant length instruction format
- Variable length (e.g. 16 - 256 Bit), mostly used in CISC architectures
  - flexible instruction format
  - high code density
  - long immediate and displacement values
- Constant length (e.g. 32 Bit), mostly used in RISC architectures
  - simple and fast fetch
  - simple and fast decode
  - simplified pipelining
50 Scheme of basic operations of common processors
[Figure: classification tree of basic operations, covering conditional and unconditional operations; combinatorial operations (transport operations: load operations, store operations; arithmetic logic operations: arithmetic operations, logic and shift operations); control flow operations (simple branches; subroutine branches: call, return; system branches: call, return); semaphore operations; state and control operations]
51 Instruction classes
- Instruction sets are divided into groups combining instructions with similar functionality.
- Typical instruction groups:
- transport instructions
- arithmetic instructions
- logic instructions
- shift and rotate instructions
- bitwise instructions
- string and array instructions
- branch instructions
- system instructions
- synchronization instructions
52 Load store architecture
- All instructions, except load and store instructions, address registers only.
- Load and store instructions are needed to transfer data to and from main memory.
- Mainly used in RISC ISAs; combined with pipelining, it allows most instructions to complete in one cycle.
- Furthermore, the address fields of instructions become shorter, as they only have to hold a register number instead of a memory address.
- A load store ISA accelerates a machine if there are only small caches or no caches at all and a big register file is available.
53 Two examples for an instruction format
Example: an arithmetic instruction
SUBc r3, r7, r21
binary code: 11010 10101 00111 00011 1 00000000000
hex code: D54E3800
instruction format: OP (bits 31-27) | TR (bits 26-22) | SR1 (bits 21-17) | SR2 (bits 16-12) | c/x (bit 11) | unused (bits 10-0)
OP: opcode, TR: target register, SRn: source register, c/x: set/do not set condition code
Example: a store instruction
STORE r24, 126(r5)
binary code: 00111 11000 00101 00000000001111110
hex code: 3E0A007E
instruction format: OP (bits 31-27) | SR (bits 26-22) | BR (bits 21-17) | DP (bits 16-0)
OP: opcode, SR: source register, BR: base register, DP: displacement (signed)
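Both encodings can be checked by assembling the fields with shifts and masks. A minimal sketch: the field positions (5 bit opcode and register fields, condition code bit at bit 11, 17 bit displacement) are read off the binary codes on this slide, and the function names are hypothetical.

```python
def encode_arithmetic(op: int, tr: int, sr1: int, sr2: int, set_cc: bool) -> int:
    """OP | TR | SR1 | SR2 | c/x, with the remaining low bits left at zero."""
    return (op << 27) | (tr << 22) | (sr1 << 17) | (sr2 << 12) | (int(set_cc) << 11)


def encode_store(op: int, sr: int, br: int, dp: int) -> int:
    """OP | SR | BR | DP, with DP as a 17 bit two's complement displacement."""
    return (op << 27) | (sr << 22) | (br << 17) | (dp & 0x1FFFF)


subc = encode_arithmetic(0b11010, tr=21, sr1=7, sr2=3, set_cc=True)  # SUBc r3, r7, r21
store = encode_store(0b00111, sr=24, br=5, dp=126)                   # STORE r24, 126(r5)
print(f"{subc:08X} {store:08X}")
```

The two words reproduce the hex codes D54E3800 and 3E0A007E given above.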
54 State register of a RISC processor (based on SPARC architecture)
[Figure: 32 bit state register SR (bits 31-0) with the following fields]
- S: supervisor/user
- PS: previous S-bit
- IE: interrupt enable
- IM: interrupt mask
- N, Z, V, C: conditional bits (negative, zero, overflow, carry)
- CWP: current window pointer
55 Conditional codes dependent on the conditional bits Z (zero), N (negative), C (carry) and V (overflow).
Mnemonics according to Motorola's ColdFire MCF5206 processor.
conditional value | mnemonic | operation | expression | operand type
equal | eq | = | Z | independent
not equal | ne | ≠ | ¬Z | independent
higher than | hi | > | ¬C ∧ ¬Z | unsigned
higher than or same | hs | ≥ | ¬C | unsigned
lower than | lo | < | C | unsigned
lower than or same | ls | ≤ | C ∨ Z | unsigned
greater than | gt | > | (N = V) ∧ ¬Z | signed
greater than or equal | ge | ≥ | (N = V) | signed
less than | lt | < | (N ≠ V) | signed
less than or equal | le | ≤ | (N ≠ V) ∨ Z | signed
arithmetic overflow | vs | | V | signed
no arithmetic overflow | vc | | ¬V | signed
negative | mi | | N | signed
positive | pl | | ¬N | signed
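The conditions can be evaluated from the four conditional bits after a compare (a subtraction a - b). The sketch below assumes that C is set when the subtraction produces a borrow, as on the Motorola processors the mnemonics come from; the helper names are hypothetical and the dictionary keys are the ColdFire mnemonics.

```python
def flags_after_compare(a: int, b: int, n: int = 32):
    """Return the flags (N, Z, V, C) after computing a - b on n-bit operands."""
    mask = (1 << n) - 1
    sign = 1 << (n - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    negative = bool(result & sign)
    zero = result == 0
    # overflow: the operands differ in sign and the result sign differs from a
    overflow = bool((a ^ b) & (a ^ result) & sign)
    carry = a < b  # assumption: C acts as the borrow flag
    return negative, zero, overflow, carry


CONDITIONS = {
    "eq": lambda N, Z, V, C: Z,
    "ne": lambda N, Z, V, C: not Z,
    "hi": lambda N, Z, V, C: not C and not Z,
    "hs": lambda N, Z, V, C: not C,
    "lo": lambda N, Z, V, C: C,
    "ls": lambda N, Z, V, C: C or Z,
    "gt": lambda N, Z, V, C: N == V and not Z,
    "ge": lambda N, Z, V, C: N == V,
    "lt": lambda N, Z, V, C: N != V,
    "le": lambda N, Z, V, C: N != V or Z,
}


def holds(mnemonic: str, a: int, b: int, n: int = 32) -> bool:
    """True if the condition holds after comparing a with b."""
    return bool(CONDITIONS[mnemonic](*flags_after_compare(a, b, n)))
```

The 8 bit pattern 0xFF compared with 0x01 satisfies both hi (255 > 1 as unsigned numbers) and lt (-1 < 1 as signed numbers), which shows why unsigned and signed comparisons need different conditions.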
56 Multimedia instructions
- Typical SIMD instructions process a single operation on a set of data (e.g. changing the brightness of image pixels)
- Operations can be on packed integers (e.g. MMX on Pentium) or packed floats (e.g. SSE2 on Pentium)
- Typical operations: arithmetic (saturated or overflow), logic, compare, pack, unpack
- Example
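The difference between saturated and overflowing arithmetic on packed data can be illustrated with the brightness example. The sketch only models the semantics on packed unsigned bytes, element by element; a real SIMD instruction processes all elements of a packed operand at once. The function names are hypothetical.

```python
def packed_add_saturated(pixels: bytes, delta: int) -> bytes:
    """Add delta to every byte and clamp each result to the range 0..255."""
    return bytes(min(255, max(0, p + delta)) for p in pixels)


def packed_add_wraparound(pixels: bytes, delta: int) -> bytes:
    """Add delta to every byte and keep only the low 8 bits of each result."""
    return bytes((p + delta) & 0xFF for p in pixels)


pixels = bytes([10, 200, 250, 255])
brighter = packed_add_saturated(pixels, 10)
wrapped = packed_add_wraparound(pixels, 10)
```

With saturation, bright pixels stay at the maximum value 255; with wrap-around, 250 + 10 becomes 4, so a bright pixel would suddenly turn dark.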