Instruction Set Principles and Examples

Slides: 78

Provided by: ccNct

1
Instruction Set Principles and Examples
2
Outline

Introduction
Classifying instruction set architectures
Instruction set measurements
Memory addressing
Addressing modes for signal processing
Type and size of operands
Operations in the instruction set
Operations for media and signal processing
Instructions for control flow
Encoding an instruction set
Role of compilers
MIPS architecture

3
Brief Introduction to ISA

Instruction Set Architecture (ISA): a set of
instructions
Each instruction is directly executed by the
CPU's hardware
How is it represented?
By a binary format since the hardware understands
only bits
Concatenate together binary encoding for
instructions, registers, constants, memories
Typical physical blobs are bits, bytes, words,
n-words
Word size is typically 16, 32, 64 bits today
Options - fixed or variable length formats
Fixed - each instruction encoded in same size
field (typically 1 word)
Variable - half-word, whole-word, and multiple-word
instructions are possible

4
Example of Program Execution

Command
1 Load AC from Memory
2 Store AC to memory
5 Add to AC from memory
Example: add the contents of memory 940 to the contents of
memory 941 and store the result at 941
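
The fetch and execute steps for this example can be traced with a short simulation. This is only a sketch under stated assumptions: the slide gives the three opcodes and the addresses 940 and 941, while the 16-bit instruction word (4-bit opcode, 12-bit address), the reading of the addresses as hexadecimal, the program location 0x300, and the data values 3 and 2 are made up for illustration.

```python
# Opcodes from the slide; the 4-bit opcode / 12-bit address split is an assumption.
LOAD, STORE, ADD = 0x1, 0x2, 0x5

def run(memory, pc, steps):
    """Fetch and execute `steps` instructions; returns the final accumulator."""
    ac = 0
    for _ in range(steps):
        instr = memory[pc]                      # fetch
        pc += 1
        opcode, addr = instr >> 12, instr & 0xFFF
        if opcode == LOAD:                      # execute
            ac = memory[addr]
        elif opcode == STORE:
            memory[addr] = ac
        elif opcode == ADD:
            ac = (ac + memory[addr]) & 0xFFFF
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")
    return ac

# Hypothetical memory image: program at 0x300, data at 0x940 and 0x941.
memory = {
    0x300: 0x1940,  # Load AC from memory 940
    0x301: 0x5941,  # Add memory 941 to AC
    0x302: 0x2941,  # Store AC to memory 941
    0x940: 3,
    0x941: 2,
}
final_ac = run(memory, pc=0x300, steps=3)
print(final_ac, memory[0x941])  # 5 5
```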

Fetch
Execution
5
A Note on Measurements

We're taking the quantitative approach
BUT measurements will vary
Due to application selection or application mix
Due to the particular compiler being used
Also dependent on compiler optimization selection
And the target ISA
Hence the measurements we'll talk about
Are useful to understand the method
Are a typical yet small sample derived from
benchmark codes
To do it for real
You would want lots of real applications
Plus - your compiler and ISA

6
Classifying Instruction Set Architecture
7
Instruction Set Design
The instruction set influences everything
8
Instruction Characteristics

Usually a simple operation
Which operation is identified by the op-code
field
But operations require operands - 0, 1, or 2
To identify where they are, they must be
addressed
Address is to some piece of storage
Typical storage possibilities are main memory,
registers, or a stack
2 options: explicit or implicit addressing
Implicit - the op-code implies the address of the
operands
ADD on a stack machine - pops the top 2 elements
of the stack, then pushes the result
HP calculators work this way
Explicit - the address is specified in some field
of the instruction
Note the potential for 3 addresses - 2 operands +
the destination

9
Classifying Instruction Set Architectures
Based on CPU internal storage options AND the number of
operands
These choices critically affect the number of instructions,
CPI, and cycle time
10
Operand Locations for Four ISA Classes
11
C = A + B

Stack
Push A
Push B
Add
Pop the top 2 values of the stack (A, B) and push
the result onto the stack
Pop C
Accumulator (AC)
Load A
Add B
Add AC (A) with B and store the result into AC
Store C

Register (register-memory)
Load R1, A
Add R3, R1, B
Store R3, C
Register (load-store)
Load R1, A
Load R2, B
Add R3, R1, R2
Store R3, C
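
The difference between implicit and explicit operands shows up clearly when two of these sequences are interpreted. The following is a minimal sketch, not any real machine: the tuple-based instruction format, the function names, and the values 4 and 7 for A and B are all assumptions made for illustration.

```python
def run_stack(program, mem):
    """Interpret a stack-machine program; Add finds its operands implicitly."""
    stack = []
    for op, *args in program:
        if op == "Push":
            stack.append(mem[args[0]])
        elif op == "Add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "Pop":
            mem[args[0]] = stack.pop()
    return mem

def run_load_store(program, mem):
    """Interpret a load-store program; only Load and Store touch memory."""
    regs = {}
    for op, *args in program:
        if op == "Load":
            regs[args[0]] = mem[args[1]]
        elif op == "Add":
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "Store":
            mem[args[1]] = regs[args[0]]
    return mem

stack_prog = [("Push", "A"), ("Push", "B"), ("Add",), ("Pop", "C")]
ls_prog = [("Load", "R1", "A"), ("Load", "R2", "B"),
           ("Add", "R3", "R1", "R2"), ("Store", "R3", "C")]

stack_result = run_stack(stack_prog, {"A": 4, "B": 7})
ls_result = run_load_store(ls_prog, {"A": 4, "B": 7})
print(stack_result["C"], ls_result["C"])  # 11 11
```

Both programs compute the same C, but the stack Add names no operands at all, while the load-store Add names three registers.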

12
Pros and Cons of Stack, Accumulator, Register
Machine
13
Modern Choice: Load-store Register (GPR)
Architecture

Reasons for choosing GPR (general-purpose
registers) architecture
Registers (stacks and accumulators) are faster
than memory
Registers are easier and more effective for a
compiler to use
(A*B) - (C*D) - (E*F)
May be evaluated in any order (for pipelining
concerns or ...)
But on a stack machine → must be evaluated left to right
Registers can be used to hold variables
Reduce memory traffic
Speed up programs
Improve code density (fewer bits are used to name
a register)
Compiler writers prefer that all registers be
equivalent and unreserved
Number of GPRs: at least 16

14
Characteristics Divide GPR Architectures

Number of operands
Three-operand: 1 result and 2 source operands
Two-operand: 1 combined source/result and 1 source
How many operands are memory addresses
0 to 3 (two sources + 1 result)

Load-store
Register-memory
Memory-memory
15
Pros and Cons of Three Most Common GPR Computers
16
Short Summary: Classifying Instruction Set
Architectures

Expect the use of general-purpose registers
Figure 2.4 pipelining (Appendix A)
Expect the use of Register-Register (load-store)
GPR architecture

17
Memory Addressing
18
Memory Addressing Basics
All architectures must address memory

What is accessed - byte, word, multiple words?
Today's machines are byte addressable
Main memory is organized in 32- to 64-byte lines
Big-Endian or Little-Endian addressing
Hence there is a natural alignment problem
An access of size s bytes at byte address A is aligned if
A mod s = 0
Misaligned access takes multiple aligned memory
references
Memory addressing mode influences instruction
counts (IC) and clock cycles per instruction (CPI)
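
The alignment rule and the two byte orders can be checked directly. This is a small sketch; the helper name `is_aligned` and the sample value 0x11223344 are illustrative choices, not part of the slides.

```python
def is_aligned(address, size):
    """An access of `size` bytes at `address` is aligned if address mod size == 0."""
    return address % size == 0

value = 0x11223344
big = value.to_bytes(4, "big")        # most significant byte at the lowest address
little = value.to_bytes(4, "little")  # least significant byte at the lowest address

print(is_aligned(8, 4), is_aligned(6, 4))  # True False
print(big.hex(), little.hex())             # 11223344 44332211
```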

19
Typical Address Modes (I)
20
Typical Address Modes (II)
21
Use of Memory Addressing Mode (Figure 2.7)
Based on a VAX which supported everything
Not counting Register mode (50% of all)
22
Displacement Field Size
At least 12-16 bits (captures 75% to 99% of the
displacements)
23
Immediate Operands
24
Distribution of Immediate Values
25
Addressing Modes for Signal Processing

Because DSPs deal with infinite, continuous streams of
data, they routinely rely on circular buffers
Modulo or circular addressing mode
Support data shuffling in Fast Fourier Transform
(FFT)
Bit reverse addressing
011 → 110 (binary)
However, the two fancy addressing modes are not
used heavily
Mismatch between what programmers and compilers
actually use versus what architects expect
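
Both DSP addressing modes reduce to simple index arithmetic. The sketch below shows what the hardware address generator would compute; the function names and the buffer length of 8 are assumptions for illustration.

```python
def circular_next(index, step, length):
    """Modulo (circular) addressing: wrap the index inside a buffer of `length`."""
    return (index + step) % length

def bit_reverse(index, bits):
    """Bit-reverse addressing: reverse the low `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

print(circular_next(7, 1, 8))                 # 0
print(format(bit_reverse(0b011, 3), "03b"))   # 110
print([bit_reverse(i, 3) for i in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```

The last line is the access order an 8-point FFT uses when shuffling its data.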

26
Frequency of Addressing Modes for TI TMS320C54x
DSP
27
Short Summary: Memory Addressing

Need to support at least three addressing modes
Displacement, immediate, and register deferred
(register indirect)
They represent 75% to 99% of the addressing modes
in benchmarks
The size of the address for displacement mode should
be at least 12-16 bits (75% to 99%)
The size of the immediate field should be at least 8-16
bits (50% to 80%)

28
Operand Type and Size

Specified by instruction (opcode) or by hardware
tag
Tagged machines are extinct
Typical types (assuming word = 32 bits)
Character - byte - ASCII or EBCDIC (IBM) - 4 per
word
Short integer - 2 bytes, 2's complement
Integer - one word - 2's complement
Float - one word - usually IEEE 754 these days
Double precision float - 2 words - IEEE 754
BCD or packed decimal - 4-bit values packed 8
per word
Instructions will be needed for common
conversions -- software can do the rare ones
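
Packed decimal is the least familiar of these types, so here is a sketch of how eight 4-bit digits fit into one 32-bit word. The function names and the sample digits are hypothetical; real machines with decimal support do this in hardware.

```python
def pack_bcd(digits):
    """Pack up to 8 decimal digits into one 32-bit word, 4 bits per digit."""
    if len(digits) > 8 or any(not 0 <= d <= 9 for d in digits):
        raise ValueError("need at most 8 decimal digits")
    word = 0
    for d in digits:
        word = (word << 4) | d
    return word

def unpack_bcd(word, count):
    """Recover `count` digits from a packed word, most significant first."""
    return [(word >> (4 * i)) & 0xF for i in reversed(range(count))]

word = pack_bcd([1, 9, 9, 9, 0, 4, 2, 7])
print(hex(word))            # 0x19990427
print(unpack_bcd(word, 8))  # [1, 9, 9, 9, 0, 4, 2, 7]
```

Each hex digit of the packed word is one decimal digit, which is why conversions to and from character form are cheap.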

29
Data Access Patterns
30
Operands for Media and Signal Processing

Graphics applications
Vertex: (x, y, z) + w to help with color or hidden
surfaces; pixel: (R, G, B, A)
32-bit floating-point values
DSPs
Fixed point: a binary point just to the right of
the sign bit
Represent fractions between -1 and +1
Have a separate exponent variable
Blocked floating point: a block of variables has
a common exponent
Need some registers that are wider to guard
against round-off error
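
A sketch of the fixed-point format may help here. It assumes 16-bit operands with 15 fractional bits (often called Q15), which matches "a binary point just to the right of the sign bit"; the function names and sample values are illustrative.

```python
Q = 15  # fractional bits: binary point just right of the sign bit of a 16-bit word

def to_q15(x):
    """Convert a fraction in [-1, 1) to a 16-bit fixed-point integer, saturating."""
    return max(-(1 << Q), min((1 << Q) - 1, round(x * (1 << Q))))

def from_q15(n):
    """Interpret a fixed-point integer as a fraction."""
    return n / (1 << Q)

def dot_q15(xs, ys):
    """Multiply-accumulate in a wide accumulator, truncating only once at the end."""
    acc = 0                   # Python ints are unbounded; stands in for a wide register
    for x, y in zip(xs, ys):
        acc += x * y          # each product carries 30 fractional bits
    return acc >> Q           # scale back to 15 fractional bits

a = [to_q15(0.5), to_q15(0.25)]
b = [to_q15(0.5), to_q15(-0.5)]
print(from_q15(dot_q15(a, b)))  # 0.125
```

Keeping the full-width products in the accumulator is what the wider registers are for: precision is lost once, at the final shift, instead of after every multiply.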

31
Operand Type and Size in DSP
32
Short Summary: Type and Size of Operand

The future - as we go to 64 bit machines
Decimal's future is unclear
Larger offsets, immediates, etc. are likely
Usage of 64 and 128 bit values will increase
DSPs need wider accumulating registers than the
size in memory to aid accuracy in fixed-point
arithmetic

33
What Operations are Needed

Arithmetic and Logical
Integer arithmetic: ADD, SUB, MULT, DIV, SHIFT
Logical operations: AND, OR, XOR, NOT
Data Transfer - copy, load, store
Control - branch, jump, call, return, trap
System - OS and memory management
We'll ignore these for now - but remember they
are needed
Floating Point
Same as arithmetic but usually take bigger
operands
Decimal - if you go for it what else do you need?
legacy from COBOL and the commercial application
domain
String - move, compare, search
Graphics - pixel and vertex,
compression/decompression operations

34
Top 10 Instructions for 80x86

load 22%
conditional branch 20%
compare 16%
store 12%
add 8%
and 6%
sub 5%
move register-register 4%
call 1%
return 1%

The most widely executed instructions are the
simple operations of an instruction set
The top-10 instructions for 80x86 account for 96%
of instructions executed
Make them fast, as they are the common case

35
Control Instructions are a Big Deal

Jumps - unconditional transfer
Conditional Branches
How is condition code set? by flag or part of
the instruction
How is target specified? How far away is it?
Calls
How is target specified? How far away is it?
Where is return address kept?
How are the arguments passed? Callee vs. Caller
save!
Returns
Where is the return address? How far away is it?
How are the results passed?

36
Breakdown of Control Flows

Call/Returns
Integer 19%, FP 8%
Jump
Integer 6%, FP 10%
Conditional Branch
Integer 75%, FP 82%

37
Branch Address Specification

Known at compile time for unconditional and
conditional branches - hence specified in the
instruction
As a register containing the target address
As a PC-relative offset
Consider word length addresses, registers, and
instructions
Full address desired? Then pick the register
option.
BUT - setup and effective address will take
longer.
If you can deal with smaller offset then PC
relative works
PC relative is also position independent - so
simple linker duty
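
Position independence follows from the arithmetic: the offset depends only on the distance between branch and target. The sketch below assumes fixed 4-byte instructions and an offset counted in instructions from the instruction after the branch; those conventions, the function names, and the addresses are assumptions rather than something the slides specify.

```python
INSTR_BYTES = 4  # assumption: fixed 4-byte instructions

def encode_offset(branch_pc, target_pc):
    """Offset in instructions, relative to the instruction after the branch."""
    return (target_pc - (branch_pc + INSTR_BYTES)) // INSTR_BYTES

def branch_target(branch_pc, offset):
    """Recover the target address from the branch address and its offset."""
    return branch_pc + INSTR_BYTES + offset * INSTR_BYTES

offset = encode_offset(0x1000, 0x1020)
moved = branch_target(0x8000, offset)  # same code loaded at a different base address
print(offset, hex(moved))              # 7 0x8020
```

The encoded offset stays 7 wherever the code is loaded, so the linker has nothing to patch.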

38
Returns and Indirect Jumps

Branch target is not known at compile time
Need a way to specify the target dynamically
Use a register
Permit any addressing mode
Regs[R4] ← Regs[R4] + Mem[Regs[R1]]
Also useful for
case or switch
Dynamically shared libraries
Higher-order functions or function pointers
Virtual functions in OO

39
Branch Stats - 90% are PC Relative

Call/Return
TeX 16%, Spice 13%, GCC 10%
Jump
TeX 18%, Spice 12%, GCC 12%
Conditional
TeX 66%, Spice 75%, GCC 78%

40
Branch Distances
41
Condition Testing Options
42
What kinds of compares do Branches Use?
43
Direction, Frequency, and real Change
Key points
75% are forward branches
Most backward branches are loops - taken about 90%
Branch statistics are both compiler and
application dependent
Any loop optimizations may have a large effect
44
Short Summary: Operations in the Instruction Set

Branch addressing should be able to jump to about
100 instructions either above or below the
branch
Implies a PC-relative branch displacement of at
least 8 bits
Register-indirect and PC-relative addressing for
jump instructions to support returns as well as
many other features of current systems
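
The step from "about 100 instructions either way" to "at least 8 bits" is a short calculation. This sketch assumes the displacement is a two's-complement count of instructions; the function name is illustrative.

```python
def signed_bits_needed(reach):
    """Smallest two's-complement width that holds every offset in [-reach, +reach]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= -reach and reach <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits

print(signed_bits_needed(100))  # 8
```

Seven bits cover only -64 to +63, so 8 bits (-128 to +127) is the smallest field that reaches 100 instructions in both directions.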

45
Encoding an Instruction Set
46
Encoding the ISA

Encode instructions into a binary representation
for execution by CPU
Can pick anything but
Affects the size of code - so it should be tight
Affects the CPU design - in particular the
instruction decode
So it may have a big influence on the CPI or
cycle-time
Must balance several competing forces
Desire for lots of addressing modes and registers
Desire to make average program size compact
Desire to have instructions encoded into lengths
that will be easy to handle in a pipelined
implementation (multiple of bytes)

47
3 Popular Encoding Choices

Variable (compact code but difficult to encode)
Primary opcode is fixed in size, but opcode
modifiers may exist
Opcode specifies number of arguments - each used
as address fields
Best when there are many addressing modes and
operations
Use as few bits as possible, but individual
instructions can vary widely in length
e.g. VAX - integer ADD versions vary between 3
and 19 bytes
Fixed (easy to encode, but lengthy code)
Every instruction looks the same - some fields may
be interpreted differently
Combine the operation and the addressing mode
into the opcode
e.g. all modern RISC machines
Hybrid
Set of fixed formats
e.g. IBM 360 and Intel 80x86

Trade-off between size of program vs. ease of
decoding
48
3 Popular Encoding Choices (Cont.)
49
An Example of Variable Encoding -- VAX

addl3 r1, 737(r2), (r3): 32-bit integer add
instruction with 3 operands → needs 6 bytes to
represent it
Opcode for addl3: 1 byte
A VAX address specifier is 1 byte (4-bit
addressing mode, 4-bit register)
r1: 1 byte (register addressing mode, r1)
737(r2):
1 byte for address specifier (displacement
addressing, r2)
2 bytes for displacement 737
(r3): 1 byte for address specifier (register
indirect, r3)
Length of VAX instructions: 1 to 53 bytes
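
The 6-byte total is just the sum of the pieces listed above. The sketch below repeats that arithmetic; the constant and function names are illustrative, and the sizes are the ones given on this slide.

```python
OPCODE_BYTES = 1
SPECIFIER_BYTES = 1  # 4-bit addressing mode + 4-bit register

def operand_bytes(displacement_bytes=0):
    """One address specifier plus any displacement bytes that follow it."""
    return SPECIFIER_BYTES + displacement_bytes

# addl3 r1, 737(r2), (r3)
operands = [
    operand_bytes(),                      # r1: register mode
    operand_bytes(displacement_bytes=2),  # 737(r2): 2-byte displacement
    operand_bytes(),                      # (r3): register indirect
]
total = OPCODE_BYTES + sum(operands)
print(total)  # 6
```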

50
Short Summary: Encoding the Instruction Set

Choice between variable and fixed instruction
encoding
Code size more important than performance → variable encoding
Performance more important than code size → fixed encoding

51
Role of Compilers

Critical goals in ISA from the compiler viewpoint
What features will lead to high-quality code
What makes it easy to write efficient compilers
for an architecture

52
Compiler and ISA

ISA decisions are no longer made just to ease
assembly language (AL) programming
Due to high-level languages (HLL), the ISA is a compiler target today
Performance of a computer will be significantly
affected by compiler
Understanding compiler technology today is
critical to designing and efficiently
implementing an instruction set
Architecture choice affects the code quality and
the complexity of building a compiler for it

53
Goal of the Compiler

Primary goal is correctness
Second goal is speed of the object code
Others
Speed of the compilation
Ease of providing debug support
Inter-operability among languages
Flexibility of the implementation - languages may
not change much but they do evolve - e.g.
Fortran 66 → HPF

Make the frequent cases fast and the rare case
correct
54
Typical Modern Compiler Structure
Common Intermediate Representation
Somewhat language dependent, largely machine
independent
Small language dependencies, slight machine dependencies
Language independent, highly machine dependent
55
Typical Modern Compiler Structure (Cont.)

Multi-pass structure → easier to write bug-free
compilers
Transform high-level, more abstract representations into
progressively lower-level representations,
eventually reaching the instruction set
Compilers must make assumptions about the ability
of later steps to deal with certain problems
Ex. 1: choose which procedure calls to expand
inline before knowing the exact size of the
procedure being called
Ex. 2: global common sub-expression elimination
Find two instances of an expression that compute
the same value and save the result of the first
one in a temporary
Temporary must be register, not memory
(Performance)
Assume register allocator will allocate temporary
into register

56
Optimization Types

High level - done at source code level
Procedure called only once - so put it in-line
and save CALL
Local - done on basic sequential block
(straight-line code)
Common sub-expressions produce same value
Constant propagation - replace constant valued
variable with the constant - saves multiple
variable accesses with same value
Global - same as local but done across branches
Code motion - remove code from loops that compute
same value on each pass and put it before the
loop
Simplify or eliminate array addressing
calculations in loop
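
Code motion is easiest to see as a before and after pair. The sketch below applies the transformation by hand in Python; the function names and the expression `a * b + 1` are made up for illustration, and a real compiler would do this on its intermediate representation rather than on source text.

```python
def scale_all_before(values, a, b):
    """Original loop: recomputes the same factor on every pass."""
    out = []
    for v in values:
        k = a * b + 1      # loop-invariant: same value on every pass
        out.append(v * k)
    return out

def scale_all_after(values, a, b):
    """After code motion: the invariant computation sits before the loop."""
    k = a * b + 1          # hoisted, computed once
    out = []
    for v in values:
        out.append(v * k)
    return out

print(scale_all_before([1, 2, 3], 2, 3))  # [7, 14, 21]
print(scale_all_after([1, 2, 3], 2, 3))   # [7, 14, 21]
```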

57
Optimization Types (Cont.)

Register allocation
Use graph coloring (graph theory) to allocate
registers
NP-complete
Heuristic algorithm works best when there are at
least 16 (and preferably more) registers
Processor-dependent optimization
Strength reduction: replace multiply with shift
and add sequence
Pipeline scheduling: reorder instructions to
minimize pipeline stalls
Branch offset optimization: reorder code to
minimize branch offsets
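
Register allocation by graph coloring can be sketched with a simple greedy heuristic. This is an illustration of the idea only, not the algorithm any particular compiler uses; the interference graph, the variable names, and the "most neighbors first" ordering are assumptions.

```python
def color_registers(interference, num_registers):
    """Greedy coloring: give each variable the lowest register no neighbor uses.

    Returns (assignment, spilled); variables that cannot be colored are spilled.
    """
    assignment, spilled = {}, []
    # Heuristic: color the most-constrained variables (most neighbors) first.
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    for var in order:
        taken = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(num_registers) if r not in taken]
        if free:
            assignment[var] = free[0]
        else:
            spilled.append(var)
    return assignment, spilled

# Hypothetical interference graph: an edge means both variables are live at once.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
assignment, spilled = color_registers(graph, num_registers=3)
print(assignment, spilled)
```

With 3 registers every variable gets one; with only 2, one of the three mutually interfering variables has to be spilled to memory, which is why more registers make the heuristic work better.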

58
Major Types of Optimizations and Example in Each
Class
59
Change in IC Due to Optimization

Level 1: local optimizations, code scheduling,
and local register allocation
Level 2: global optimization, loop transformation
(software pipelining), global register allocation
Level 3: procedure integration

60
Optimization Observations

Hard to reduce branches
Biggest reduction is often memory references
Some ALU operation reduction happens but it is
usually small
Implication
Branch, Call, and Return become a larger relative
% of the instruction mix
Control instructions are among the hardest to speed up

61
Impact of Compiler Technology on Architect's
Decisions

Important questions
How are variables allocated and addressed?
How many registers will be needed?
We must look at 3 areas to allocate data

62
Where to allocate data?

Stack
Local variable access in activation records,
almost no push/pop
Addressing is relative to the stack pointer
Grown or shrunk on calls and returns
Global data area - the easy one
Constants and global static structures
For arrays addressing may be indexed off head
Heap
Used for dynamic objects
Access usually by pointers
Data is typically not scalar

63
Register Allocation Data

Reasonably simple for stack objects
Hard for global data due to aliasing opportunity
Must be conservative
Heap objects and pointers in general are even
harder
Computed pointers make it impossible to
register-allocate the target data
Any structured data - string, array, etc. - is too
big to keep in registers
Since register allocation is a major optimization
source
The effect is clearly important

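Aliasing example: p = &a (p gets the address of a), a = ... (assigns to a directly), *p = ... (assigns to a through p), ...a... (accesses a)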
64
How can Architects Help Compiler Writers

Provide Regularity
Address modes, operations, and data types should
be orthogonal (independent) of each other
Simplify code generation, especially multi-pass
Counterexample: restricting which registers can be
used for certain classes of instructions
Provide primitives - not solutions
Special features that match a HLL construct are
often un-usable
What works in one language may be detrimental to
others

65
How can Architects Help Compiler Writers (Cont.)

Simplify trade-offs among alternatives
How to write good code? What is good code?
Metric: IC or code size (no longer true) → caches
and pipelining
Anything that makes code sequence performance
obvious is a definite win!
How many times a variable should be referenced
before it is cheaper to load it into a register
Provide instructions that bind the quantities
known at compile time as constants
Don't hide compile-time constants
Instructions that work off of something the
compiler thinks could be a run-time determined
value handcuff the optimizer

66
Short Summary -- Compilers

ISA has at least 16 GPR (not counting FP
registers) to simplify allocation of registers
using graph coloring
Orthogonality suggests all supported addressing
modes apply to all instructions that transfer
data
Simplicity: understand that less is more in ISA
design
Provide primitives instead of solutions
Simplify trade-offs between alternatives
Don't bind constants at runtime
Counterexample: lack of compiler support for
multimedia instructions

67
The MIPS Architecture
68
Expectations for New ISA

Use general-purpose registers, with a load-store
architecture
Support displacement (offset size 12-16 bits),
immediate (size 8 to 16 bits), and register
indirect
Support 8-, 16-, 32-, and 64-bit integers and
64-bit IEEE 754 floating-point numbers
Support the following simple instructions: load,
store, add, subtract, move register-register,
and, shift, compare equal, compare not equal,
branch (with a PC-relative address at least 8
bits long), jump, call, return
Use fixed instruction encoding if interested in
performance and use variable instruction encoding
if interested in code size
Provide at least 16 general-purpose registers
(GPRs) + separate floating-point registers, be
sure all addressing modes apply to all data
transfer instructions, and aim for a minimalist
instruction set

69
MIPS

Simple load-store ISA
Enable efficient pipeline implementation
Fixed instruction set encoding
Efficiency as a compiler target
MIPS64 variant is discussed here

70
Register for MIPS

32 64-bit integer GPRs - R0, R1, ... R31; R0 = 0
always
32 FPRs - used for single or double precision
For single precision: F0, F1, ... , F31 (32-bit)
For double precision: F0, F2, ... , F30 (64-bit)
Extra status registers - moves via GPRs
Instructions for moving between an FPR and a GPR

71
Data Types for MIPS

8-bit byte, 16-bit half words, 32-bit word, and
64-bit double words for integer data
32-bit single precision and 64-bit double
precision for FP
MIPS64 operations work on 64-bit integer and 32-
or 64-bit floating point
Bytes, half words, and words are loaded into the
GPRs with zeros or the sign bit replicated to
fill the 64 bits of the GPRs
All references between memory and either GPRs or
FPRs are through load or stores
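
The two ways of filling the upper bits of a 64-bit GPR can be written out explicitly. This is a sketch of the arithmetic only; the function names are illustrative, and Python integers stand in for 64-bit registers.

```python
MASK64 = (1 << 64) - 1

def zero_extend(value, bits):
    """Fill the upper bits of the 64-bit register with zeros."""
    return value & ((1 << bits) - 1)

def sign_extend(value, bits):
    """Replicate the sign bit of a `bits`-wide value through bit 63."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & MASK64

print(hex(zero_extend(0x80, 8)))  # 0x80
print(hex(sign_extend(0x80, 8)))  # 0xffffffffffffff80
print(hex(sign_extend(0x7F, 8)))  # 0x7f
```

The byte 0x80 loads as 128 when zero-extended and as -128 when sign-extended, which is why loads come in signed and unsigned variants.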

72
Addressing Modes for MIPS

Data addressing: immediate and displacement (16
bits)
Displacement: Add R4, 100(R1)
(Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]])
Register-indirect: placing 0 in displacement
field
Add R4, (R1) (Regs[R4] ← Regs[R4] + Mem[Regs[R1]])
Absolute addressing (16 bits): using R0 as the
base register
Add R1, (1001) (Regs[R1] ← Regs[R1] + Mem[1001])
Byte addressable with 64-bit address
Mode selection for Big Endian or Little Endian
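
All three data addressing forms above are the same displacement calculation with different inputs. The sketch below follows the slide's Add notation; the dictionaries standing in for the register file and memory, the function names, and the sample values are assumptions.

```python
def effective_address(regs, base, displacement=0):
    """Displacement mode: address = displacement + Regs[base]; R0 always reads 0."""
    base_value = 0 if base == 0 else regs[base]
    return displacement + base_value

def add_from_memory(regs, mem, dest, base, displacement=0):
    """Regs[dest] <- Regs[dest] + Mem[displacement + Regs[base]]."""
    regs[dest] += mem[effective_address(regs, base, displacement)]

regs = {1: 200, 4: 10}
mem = {300: 5, 200: 7, 1001: 9}
add_from_memory(regs, mem, dest=4, base=1, displacement=100)   # displacement: 100(R1)
add_from_memory(regs, mem, dest=4, base=1)                     # register indirect: (R1)
add_from_memory(regs, mem, dest=4, base=0, displacement=1001)  # absolute: (1001)
print(regs[4])  # 31
```

Register indirect is displacement 0, and absolute is base R0, so one hardware mode covers all three.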

73
MIPS Instruction Format

Encode addressing mode into the opcode
All instructions are 32 bits with 6-bit primary
opcode

74
MIPS Instruction Format (Cont.)

I-Type Instruction

Loads and stores: LW R1, 30(R2); S.S F0, 40(R4)
ALU ops on immediates: DADDIU R1, R2, 3
rt <-- rs op immediate
Conditional branches: BEQZ R3, offset
rs is the register checked
rt unused
immediate specifies the offset
Jump register, jump and link register: JR R3
rs is target register
rt and immediate are unused but 011

75
MIPS Instruction Format (Cont.)
R-Type Instruction

Register-register ALU operations: rd ← rs funct rt
DADDU R1, R2, R3
Function encodes the data path operation: Add,
Sub, ...
read/write special registers
Moves

J-Type Instruction: Jump, Jump and Link, Trap and
return from exception
Format: 6-bit opcode, 26-bit offset added to PC
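
Decoding the R-type and J-type formats is the reverse of encoding: shift and mask. The sketch assumes the standard MIPS R-type field order of opcode (6), rs (5), rt (5), rd (5), shamt (5), funct (6); the funct value 0x2D used for DADDU comes from the MIPS64 manuals, not from this deck, and the function names are illustrative.

```python
def decode_r_type(word):
    """Split a 32-bit word into opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }

def decode_j_type(word):
    """Split a 32-bit word into opcode(6) and offset(26)."""
    return {"opcode": (word >> 26) & 0x3F, "offset": word & 0x3FFFFFF}

# DADDU R1, R2, R3: rd = 1, rs = 2, rt = 3; funct 0x2D is an assumed value.
r_word = (0 << 26) | (2 << 21) | (3 << 16) | (1 << 11) | (0 << 6) | 0x2D
fields = decode_r_type(r_word)
print(fields["rs"], fields["rt"], fields["rd"], fields["funct"])  # 2 3 1 45
```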
76
MIPS instruction MIX
SPECint2000
77
MIPS instruction MIX (Cont.)
SPECfp2000
