Title: Instruction Set Principles
1 Instruction Set Principles
- An ISA should reflect the characteristics of its applications
  - Desktop computing is compute-intensive, so it favors features for integer and FP operations
  - Server computing is data-intensive, focusing on integers and character strings (yet FP operations are still standard in server ISAs)
  - Embedded computing is time-sensitive and memory- and power-conscious, so it focuses on code density, real-time behavior, and media data streams
2 Instruction Set Principles
- Taxonomy of ISA
  - Stack: both operands are implicit on the top of the stack, a data structure in which items are accessed in a last-in, first-out fashion.
  - Accumulator: one operand is implicit in the accumulator, a special-purpose register.
  - General-Purpose Register: all operands are explicit in specified registers or memory locations. Depending on where operands are specified and stored, there are three different ISA groups:
    - Register-Memory: one operand in a register and one in memory. Examples: IBM 360/370, Intel 80x86 family, Motorola 68000.
    - Memory-Memory: both operands are in memory. Example: VAX.
    - Register-Register (load-store): all operands, except for those in load and store instructions, are in registers. Examples: SPARC (Sun Microsystems), MIPS, Precision Architecture (HP), PowerPC (IBM), Alpha (DEC).
3 Instruction Set Principles
- Operand locations for the five ISA classes, with the code sequence for C = A + B in each (figure: the ALU takes its operands from the top of stack (TOS), the accumulator, the register set, or memory)
  - (a) Stack: Push A; Push B; Add; Pop C
  - (b) Accumulator: Load A; Add B; Store C
  - (c) Register-Memory: Load R1,A; Add R1,B; Store R1,C
  - (d) Reg-Reg/Load-Store: Load R1,A; Load R2,B; Add R3,R1,R2; Store R3,C
  - (e) Memory-Memory: Add C,A,B or Add A,B
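The code sequences for C = A + B can be traced with a few toy interpreters. This is only a minimal sketch: the function names, the tuple-based instruction format, and the dictionary used as memory are all illustrative assumptions, not part of any real ISA.

```python
# Toy interpreters for three ISA classes. Memory is a dict of named cells.
# All names and the tuple instruction format are hypothetical.

def run_stack(program, mem):
    """Stack machine: ALU operands are implicit on the top of the stack."""
    stack = []
    for op, *args in program:
        if op == "Push":
            stack.append(mem[args[0]])
        elif op == "Add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "Pop":
            mem[args[0]] = stack.pop()
    return mem


def run_accumulator(program, mem):
    """Accumulator machine: one operand is implicit in the accumulator."""
    acc = 0
    for op, addr in program:
        if op == "Load":
            acc = mem[addr]
        elif op == "Add":
            acc += mem[addr]
        elif op == "Store":
            mem[addr] = acc
    return mem


def run_load_store(program, mem):
    """Load-store machine: only Load and Store touch memory."""
    regs = {}
    for op, *args in program:
        if op == "Load":      # Load Rd, addr
            regs[args[0]] = mem[args[1]]
        elif op == "Add":     # Add Rd, Rs1, Rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "Store":   # Store Rs, addr
            mem[args[1]] = regs[args[0]]
    return mem


STACK_PROG = [("Push", "A"), ("Push", "B"), ("Add",), ("Pop", "C")]
ACC_PROG = [("Load", "A"), ("Add", "B"), ("Store", "C")]
LS_PROG = [("Load", "R1", "A"), ("Load", "R2", "B"),
           ("Add", "R3", "R1", "R2"), ("Store", "R3", "C")]
```

All three programs leave the same value in C; the load-store version simply needs more instructions (four here versus three for the accumulator machine), which is the instruction-count cost of keeping ALU operands in registers.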
4 Instruction Set Principles
- Advantages and disadvantages of the three GPR ISA types; the pair (m, n) gives the number of memory addresses and the maximum number of operands
- Register-register (0,3)
  - Advantages: simple, fixed-length instruction encoding; simple code-generation model; instructions take similar numbers of cycles to execute.
  - Disadvantages: higher instruction count (IC) than ISAs with memory references in instructions; the higher instruction count and lower instruction density lead to larger programs.
- Register-memory (1,2)
  - Advantages: data can be accessed without a separate load instruction first; the instruction format tends to be easy to encode and yields good code density.
  - Disadvantages: operands are not equivalent, since a source operand in a binary operation is destroyed; encoding a register number and a memory address in each instruction may restrict the number of registers; CPI_i varies by operand location.
- Memory-memory (2,2) or (3,3)
  - Advantages: most compact; does not waste registers for temporaries.
  - Disadvantages: large variation in instruction size, especially for three-operand instructions; large variation in CPI_i as well; memory accesses create a memory bottleneck (this type is no longer used today).
5 Instruction Set Principles
- Addressing Memory: how memory addresses are specified and interpreted is important, since all data are initially in memory.
- Interpreting Memory Addresses
  - All computers, except DSPs, are byte-addressed, providing access to bytes, half-words (2 bytes), words (4 bytes), and double words (8 bytes)
  - Ordering of bytes within a larger object (8 bytes in a double word):
    - Little Endian: bytes numbered 7 6 5 4 3 2 1 0 (the byte at the lowest address is the least significant)
    - Big Endian: bytes numbered 0 1 2 3 4 5 6 7 (the byte at the lowest address is the most significant)
  - Byte ordering can be a problem when exchanging data between computers with different ordering conventions
  - Alignment: an access to an object of size s bytes at byte address A is aligned if A mod s = 0. Memory is typically aligned on a multiple of a word or double-word boundary
  - Misalignment causes extra memory accesses and hardware costs
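Byte ordering and the alignment rule can be checked with a short sketch using Python's standard `struct` module. The helper names `is_aligned` and `words_touched`, and the 8-byte memory word width, are assumptions made for illustration.

```python
import struct

VALUE = 0x0102030405060708  # one 8-byte double word

# "<Q" packs an unsigned 64-bit integer little-endian, ">Q" big-endian.
little = struct.pack("<Q", VALUE)  # least significant byte at the lowest address
big = struct.pack(">Q", VALUE)     # most significant byte at the lowest address


def is_aligned(address, size):
    """An access of `size` bytes at byte address `address` is aligned if address mod size == 0."""
    return address % size == 0


def words_touched(address, size, word_bytes=8):
    """Number of word_bytes-wide memory words that an access spans (hypothetical helper)."""
    first = address // word_bytes
    last = (address + size - 1) // word_bytes
    return last - first + 1
```

Reading the little-endian bytes back with the big-endian convention, `struct.unpack(">Q", little)`, yields a different number, which is the data-exchange problem between machines with different conventions. A 4-byte access at address 6 is misaligned and, with 8-byte memory words, spans two words instead of one.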
6 Instruction Set Principles
- Addressing Modes: how an ISA specifies the address of an object to be accessed (fig. 2.6-2.7)
  - Operands: they can be found in registers, memory locations, and the instructions themselves (the instruction stream)
  - Effective Address: the actual memory address specified by the addressing mode when a memory location is used for an operand
  - PC-Relative Addressing: addressing modes that depend on the program counter
  - Immediates/Literals: considered memory addressing modes, even though the value they access is in the instruction stream
  - Displacement Mode: the range of displacements must be chosen judiciously (via quantitative studies, fig. 2.8)
  - Immediate/Literal Mode: must decide the level of support (all operations or a subset) and the range of values (fig. 2.9-2.10)
  - Modulo/Circular Mode: for DSPs, handling an infinite, continuous stream of data relies on circular buffers
  - Bit-Reverse Mode: used exclusively for FFTs
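A few of these modes can be written down as effective-address calculations. The sketch below is illustrative only: the function names and the dictionary standing in for a register file are assumptions, and real ISAs differ in details such as offset scaling.

```python
def displacement_address(regs, base, disp):
    """Displacement mode: effective address = base register + constant displacement."""
    return regs[base] + disp


def pc_relative_address(pc, offset):
    """PC-relative mode: effective address = program counter + signed offset."""
    return pc + offset


def circular_next(addr, start, length):
    """Modulo/circular mode: step to the next address, wrapping at the buffer end."""
    return start + (addr - start + 1) % length


def bit_reverse(index, bits):
    """Bit-reverse mode: reverse the low `bits` bits of an index (FFT data reordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result
```

For an 8-point FFT, `[bit_reverse(i, 3) for i in range(8)]` gives the access order 0, 4, 2, 6, 1, 5, 3, 7, and `circular_next(103, 100, 4)` wraps back to 100 for a four-entry buffer starting at address 100.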
7 Instruction Set Principles
- Type and Size of Operands: the encoding in the opcode designates operand types in all modern computers, whereas tags were used to indicate types in old machines
- Desktop and Server architectures
  - Character: 8 bits, usually in ASCII
  - 16-bit Unicode, used in Java, is gaining popularity
  - Integers are almost universally represented as two's complement binary numbers: short integer (half-word), integer (word), long integer (double-word)
  - Floating point: single precision (1 word) and double precision (2 words), following the IEEE floating-point standard, IEEE 754
- Architectures supporting business applications
  - Packed decimal/binary-coded decimal: 4 bits encode each of the values 0-9, and two decimal digits are packed into each byte, giving results that exactly match decimal numbers (some decimal fractions do not have an exact representation in binary)
- Frequency of access to types helps determine which types are most important to support efficiently (fig. 2.12)
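The packed-decimal layout, and the reason business applications want exact decimal results, can be sketched as follows. The helper names are hypothetical, and Python's standard `decimal` module stands in for hardware decimal arithmetic.

```python
from decimal import Decimal


def to_packed_bcd(n):
    """Pack a non-negative integer into bytes, two decimal digits per byte."""
    digits = str(n)
    if len(digits) % 2:          # pad to an even number of digits
        digits = "0" + digits
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))


def from_packed_bcd(data):
    """Inverse of to_packed_bcd: rebuild the integer from its 4-bit digits."""
    value = 0
    for byte in data:
        value = value * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return value


# Binary floating point cannot represent 0.1 exactly, so the sum is inexact.
binary_sum_is_exact = (0.1 + 0.2 == 0.3)
# Decimal arithmetic keeps the digits exactly.
decimal_sum_is_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```

Here `to_packed_bcd(1234)` produces the two bytes 0x12 and 0x34, so the hexadecimal dump of the encoding reads the same as the decimal number.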
8 Instruction Set Principles
- Operands for Media and Signal Processing
- Graphics applications deal with 2D and 3D images
  - Vertex: a data structure with four components, usually 32-bit floating-point values, for representing 3D images: x-coordinate, y-coordinate, z-coordinate, and w-coordinate (for color or hidden surfaces)
  - Pixel: consists of four 8-bit channels: R (red), G (green), B (blue), and A (transparency of the surface or pixel)
- DSPs add a unique data type
  - Fixed point: a binary point just to the right of the sign bit, thus representing a fraction between -1 and +1
  - Blocked floating point: a fixed-point word does not include an exponent, so the exponent is often shared among many fixed-point variables; the DSP programmer keeps the exponent in a separate variable and ensures that each result is shifted left or right to keep it aligned
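The fixed-point format with the binary point just right of the sign bit can be sketched for a 16-bit word, often called Q15. The 16-bit width, the function names, and the `block_to_floats` helper for a shared exponent are assumptions chosen for illustration.

```python
Q15_ONE = 1 << 15  # scale factor: 15 fraction bits after the sign bit


def to_q15(x):
    """Convert a fraction in [-1, 1) to a 16-bit fixed-point integer, clamping at the ends."""
    return max(-Q15_ONE, min(Q15_ONE - 1, round(x * Q15_ONE)))


def from_q15(q):
    """Convert a Q15 integer back to a Python float."""
    return q / Q15_ONE


def q15_mul(a, b):
    """Multiply two Q15 values; the product has 30 fraction bits, so shift right by 15."""
    return (a * b) >> 15


def block_to_floats(mantissas, exponent):
    """Blocked floating point: many Q15 mantissas share one separately stored exponent."""
    return [from_q15(m) * 2 ** exponent for m in mantissas]
```

For example, `from_q15(q15_mul(to_q15(0.5), to_q15(0.5)))` is 0.25, and the shift inside `q15_mul` is the kind of realignment the programmer must do by hand when the hardware word carries no exponent.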
9 Instruction Set Principles
- Operations in the Instruction Set (fig. 2.15)
  - Rule of thumb: the most widely executed instructions are the simple operations of an instruction set (fig. 2.16)
- Operations for Media and Signal Processing: less precision and narrower data widths suffice because of the tolerance of human perception
  - Partitioned add: four 16-bit adds performed on a single 64-bit ALU in a single cycle (SIMD or vector instructions, fig. 2.17)
  - Paired operations: one instruction can launch two 32-bit operations on operands found side by side in a double-precision register
  - Saturating arithmetic: because of real-time requirements, DSPs cannot afford exception handling on overflow and instead replace an overflowed result with the largest representable number
  - Multiply-accumulate (MAC): key to dot-product operations for vector and matrix multiplies (MACs/second is the primary peak-performance metric for DSPs)
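These three media operations can be modeled in a few lines. This is a software sketch under stated assumptions: Python integers stand in for 64-bit registers, the function names are hypothetical, and the saturating add assumes signed 16-bit values.

```python
MASK16 = 0xFFFF


def partitioned_add16(x, y):
    """Add four independent 16-bit lanes packed in 64-bit words; carries do not cross lanes."""
    result = 0
    for lane in range(4):
        shift = 16 * lane
        lane_sum = ((x >> shift) & MASK16) + ((y >> shift) & MASK16)
        result |= (lane_sum & MASK16) << shift  # drop the carry out of the lane
    return result


def saturating_add16(a, b):
    """Signed 16-bit add that clamps to the representable range instead of overflowing."""
    return max(-32768, min(32767, a + b))


def mac_dot(xs, ys):
    """Dot product as a chain of multiply-accumulate steps."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y  # one MAC per element pair
    return acc
```

With `x = 0x0001FFFF00030004` and `y = 0x0001000100010001`, the partitioned add wraps the 0xFFFF lane to 0 without disturbing its neighbor, whereas an ordinary 64-bit add would carry into the next lane.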
10 Instruction Set Principles
- Instructions for Control Flow
- There are four different types of control flow change, shown with their relative frequencies in integer and FP programs (fig. 2.19)
  - Conditional branches: 75% (integer) and 82% (FP)
    - How to specify branch conditions? (fig. 2.21-2.22)
  - Jumps (unconditional branches): 6% (integer) and 10% (FP)
  - Procedure calls and procedure returns: 19% (integer) and 8% (FP)
    - Caller saving vs. callee saving
- Addressing Modes for Control Flow Instructions
  - PC-relative: advantageous when targets are near the branch instruction, and has the desirable property of position independence (fig. 2.20)
  - Register indirect jumps: if the target is not known at compile time, PC-relative addressing cannot be used; rather, a location is used to specify the target dynamically. Typical uses:
    - Case or switch statements in most languages
    - Virtual functions or methods in OO languages
    - Higher-order functions or function pointers in C or C++
    - Dynamically shared libraries
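Position independence and run-time target selection can both be illustrated with a small sketch. The function names are hypothetical, the branch target is simplified to the branch address plus a signed offset (real ISAs often add the offset to the address of the next instruction and scale it), and a Python dictionary of callables stands in for a jump table.

```python
def branch_target(pc, offset):
    """PC-relative: target = address of the branch + signed offset (simplified)."""
    return pc + offset


def relocated_targets(load_addresses, branch_at, offset):
    """Target position relative to the start of the code, for several load addresses."""
    return [branch_target(base + branch_at, offset) - base
            for base in load_addresses]


def dispatch(jump_table, selector, default):
    """Register-indirect style: the target is looked up at run time, then jumped to."""
    target = jump_table.get(selector, default)
    return target()


# A switch statement compiled into a table of targets (illustrative handlers).
SWITCH_TABLE = {0: lambda: "zero", 1: lambda: "one"}
```

`relocated_targets([0x1000, 0x8000], 0x20, -8)` returns the same relative position for both load addresses, which is why PC-relative code needs no patching when it is moved. `dispatch(SWITCH_TABLE, 1, lambda: "other")` picks its target from data that is only known at run time, as a switch statement or virtual call would.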
11 Instruction Set Principles
- Encoding an Instruction Set: there are three choices
  - Variable: allows virtually all addressing modes to be used with all operations, enabling the smallest code representation
    - Examples: VAX and Intel 80x86 (1-5 operands, each with 10 addressing modes)
  - Fixed: load-store ISA, with only one memory operand and only one or two addressing modes, so the addressing mode can be encoded as part of the opcode
    - Examples: Alpha, ARM, MIPS, PowerPC, SPARC, SuperH
    - Largest code size
  - Hybrid: IBM 360/370, MIPS16, Thumb, TI TMS320C54x (fig. 2.23)
- Competing forces: the number of registers and addressing modes, their effect on code size, and ease of pipelined implementation
- Instruction formats (figure):
  - Variable: Operation and number of operands | Address specifier 1 | Address field 1 | ... | Address specifier n | Address field n
  - Fixed: Operation | Address field 1 | Address field 2 | Address field 3
12 Instruction Set Principles
- The Role of Compilers
- The Structure of Recent Compilers: multi-phased (fig. 2.24)
  - Difficulty: each phase must make gross assumptions about the abilities of later phases, hence the phase-ordering problem. For instance, an early phase cannot guarantee that registers will be allocated where they are most desirable.
  - Example: global common subexpression elimination replaces multiple computations of the same expression with a single computation and a temporary location for storing the value. If this temporary is not allocated to a register, the slow memory accesses may actually negate the gain from the optimization!
- Register Allocation: plays a central role in compiler optimization, both in speeding up the code and in making other optimizations useful.
  - Graph coloring (with at least 16 general-purpose registers) for simple cases, and heuristics for more complicated cases
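Register allocation by graph coloring can be sketched with a simple greedy heuristic. This is not the algorithm of any particular compiler: the function name, the highest-degree-first ordering, and the spill-on-failure policy are assumptions chosen to keep the example short.

```python
def color_registers(interference, num_regs):
    """Greedy graph coloring.

    interference maps each variable to the set of variables live at the same
    time (its neighbors). Returns (assignment, spilled), where assignment maps
    variables to register numbers and spilled lists variables left in memory.
    """
    assignment, spilled = {}, []
    # Heuristic: color the most constrained (highest-degree) variables first.
    order = sorted(interference, key=lambda v: len(interference[v]), reverse=True)
    for var in order:
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(num_regs) if r not in taken]
        if free:
            assignment[var] = free[0]
        else:
            spilled.append(var)  # no register left: keep the variable in memory
    return assignment, spilled


# a, b, c are all live together; d only overlaps with a.
GRAPH = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
```

With three registers every variable in `GRAPH` gets a register; with two, one of the mutually interfering variables must be spilled, which is how a temporary created by an earlier optimization can end up in slow memory.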
13 Instruction Set Principles
- Impact of Optimizations on Performance
  - Major types of optimizations and examples in each class
  - Change in instruction count for the programs lucas and mcf from SPEC2000 as compiler optimization levels vary:
    - Level 0: unoptimized
    - Level 1: local optimizations, code scheduling, and local register allocation
    - Level 2: global optimizations, loop transformations, and global register allocation
    - Level 3: procedure integration
14 Instruction Set Principles
- The Impact of Compiler Technology on the Architect's Decisions
  - How are variables allocated and addressed?
  - How many registers are needed to allocate variables appropriately?
- Three areas in which data are allocated:
  - Stack: grows on procedure calls and shrinks on returns; holds activation records; register allocation is most effective here
  - Global data area: statically declared objects, often arrays or other aggregate data structures; difficult, if not impossible, to allocate to registers if the objects are aliased
  - Heap: dynamic objects, accessed through pointers and typically non-scalar; almost impossible to allocate to registers because of pointers
- Because of aliasing, a compiler must be conservative, for it is impossible to know what a pointer may refer to or, inversely, which pointers may refer to an object.
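Why aliasing forces a compiler to be conservative can be seen in a tiny sketch. Python lists stand in for memory objects reached through pointers, and the function name is hypothetical.

```python
def read_store_read(p, q):
    """Read p[0], store through q, then read p[0] again."""
    first = p[0]    # a compiler would like to keep this value in a register
    q[0] = 99       # store through another reference
    second = p[0]   # must be reloaded from memory if p and q may alias
    return first, second
```

When `p` and `q` are distinct objects, both reads return the same value and the second load could be replaced by the register copy. When the same list is passed for both, the second read returns 99, so reusing the register copy would be wrong. Without proof that the two references never alias, the compiler has to reload.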
15 Instruction Set Principles
- How the Architect Can Help the Compiler Writer
- Guiding principle for the compiler designer: make the frequent cases fast and the rare cases correct.
- Other guidelines
  - Regularity: orthogonality (independence among the three components of an ISA: operation, data type, and addressing mode) helps the compiler make decisions early and correctly
  - Provide primitives, not solutions: support for HLLs should not be language dependent
  - Simplify trade-offs among alternatives (optimization objectives): help the compiler writer understand the costs of the various alternatives
  - Provide instructions that bind the quantities known at compile time as constants
- It is better to err on the side of simplicity: less is more!
16 Instruction Set Principles
- The MIPS Architecture
  - MIPS is a simple 64-bit load-store architecture.
- 32 64-bit general-purpose registers
  - R0, R1, ..., R31: integer registers. The value of R0 is always 0.
- 32 64-bit floating-point registers
  - F0, F1, ..., F31: floating-point registers
- Data types
  - 8-bit bytes, 16-bit half words, 32-bit words, and 64-bit double words for integers
  - 32-bit single precision and 64-bit double precision for floating point.
- Addressing modes
  - Register, immediate, and displacement with a 16-bit field.
  - Byte-addressable memory, with a mode bit that allows software to select either Big Endian or Little Endian
- Instruction encoding: fixed
17 Instruction Set Principles
- The MIPS Instruction Format
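As a rough illustration of fixed-length encoding, the sketch below packs and unpacks one 32-bit instruction word with a 16-bit immediate. The field layout used here (6-bit opcode, two 5-bit register fields, 16-bit immediate) is the usual MIPS I-type layout, but it is an assumption in this context because the slide's figure is not reproduced; the function names are hypothetical.

```python
def encode_i_type(opcode, rs, rt, imm):
    """Pack opcode (6 bits), rs (5), rt (5), and a signed 16-bit immediate into 32 bits."""
    assert 0 <= opcode < 64 and 0 <= rs < 32 and 0 <= rt < 32
    assert -(1 << 15) <= imm < (1 << 15)
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def decode_i_type(word):
    """Unpack a 32-bit word into (opcode, rs, rt, sign-extended immediate)."""
    imm = word & 0xFFFF
    if imm & 0x8000:          # sign-extend the 16-bit field
        imm -= 0x10000
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F, imm
```

With opcode 0x23, base register 29, target register 8, and displacement 4, `encode_i_type` yields 0x8FA80004. Because every field sits at a fixed bit position, decoding needs only shifts and masks, which is part of what makes fixed encodings easy to pipeline.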
18 Instruction Set Principles
- The MIPS Operations
  - Load and store instructions
19 Instruction Set Principles
- The MIPS Operations
  - ALU instructions
20 Instruction Set Principles
- The MIPS Operations
  - Control flow instructions
21 Instruction Set Principles: MIPS Example
22 Instruction Set Principles: MIPS/DLX Example
23 Instruction Set Principles: MIPS Example
24 Instruction Set Principles: MIPS Example
25 Instruction Set Principles: MIPS Example
41 Instruction Set Principles: MIPS/DLX Example