INSTRUCTION SET DESIGN

Title: COSC3330/6308 Computer Architecture. Author: Jehan-François Pâris. Last modified by: Jehan-François Pâris. Created: 8/29/2001 4:04:21 AM.

1
INSTRUCTION SET DESIGN
  • Jehan-François Pâris
  • jparis_at_uh.edu

2
Chapter Organization
  • General Overview
  • Objectives
  • Realizations
  • The MIPS instruction set

3
Importance
  • The instruction set of a processor is its
    interface with the outside world
  • Defined by the hardware
  • Used by assemblers, compilers and interpreters
  • Remained very visible to the users up to the 80s
  • Early PC programs were written in assembler

4
GENERAL OVERVIEW
5
Common features
  • A machine instruction normally consists of
  • An operation code
  • One, two or three operand addresses
  • Various flags
  • Some operands can be immediate
  • Address field contains the value of the operand
    instead of its address

6
Common features
  • One or more operands can be in high-speed
    registers
  • Dedicated registers
  • Very old solution
  • Register address can be specified in the opcode
  • General purpose registers

7
Common features
  • Memory operand addresses are represented in a
    compact form
  • Base displacement
  • The address is specified by the contents of a
    base register plus a displacement
  • Saves space because displacement is generally
    small

8
Objectives
  • IS should be
  • Expressive
  • Powerful instructions
  • Designed for speed
  • Should be able to run fast and allow extensive
    prefetching
  • Compact
  • Faster fetches from disk and from main memory

9
Objectives
  • User friendly
  • Very important when people were expected to
    program in assembler
  • Manufacturers loved that because
  • Instruction sets were mostly proprietary (IBM
    360/370 was the exception)
  • Programs could not be ported to a different
    architecture

10
The story of Gene Amdahl
  • Raised on a farm without electricity until he
    went to high school
  • PhD from U Wisconsin-Madison
  • Became one of the top architects of the IBM/360
    series
  • Had his next design rejected by IBM
  • Started his own company (Amdahl) building big
    mainframes and selling them at a much lower cost
    than comparable IBM machines

11
How could they do that? (II)
  • Amdahl could undersell IBM by focusing on larger
    "mainframes"
  • Amdahl's computers were air-cooled while IBM's
    were water-cooled
  • "Decreased installation costs by $50,000 to
    $250,000."
  • http://www.fundinguniverse.com/company-histories/amdahl-corporation-history/

12
How could they do that? (I)
  • IBM 360 series was first series of computers
    with
  • Very different capacities
  • Same instruction set
  • IBM pricing policy was keeping computer prices
    proportional to their capacity
  • Did not reflect proportionally lower
    manufacturing cost of high-end machines

13
The end of the story
  • Left Amdahl in 1979 to pursue several ventures,
    without success
  • Amdahl Computers is now part of Fujitsu and
    focuses on services

14
Inherent conflicts
  • Expressiveness vs. Speed
  • CISC instructions were powerful but microcoded
  • Compactness vs. Speed
  • Many instruction sets had instructions of
    different length
  • Cannot know start of next instruction before
    decoding the opcode of current one

15
What is microcode?
  • Some machine language below the instruction set
  • Invisible to the programmer/compiler
  • Each instruction corresponded to one or more
    microinstructions
  • Some architectures allowed the user to program
    new instructions or a whole new instruction set

16
AN EXAMPLE: IBM 360/370
17
The 360 architecture (I)
  • Developed in the 60s but kept almost unchanged
    for 30 years
  • Thirty-two bit words and 8-bit bytes
  • Memory was byte-addressable
  • Had 24-bit addresses restricting main memory size
    to 16 MB (enormous at that time!)
  • Later extended to 32 then 64 bits

18
The 360 architecture (II)
  • Instruction set included
  • 32-bit operations for scientific and engineering
    computing (FORTRAN)
  • Byte-oriented operations for character processing
    and decimal arithmetic then judged essential for
    business applications
  • Name of series referred to wide range of
    applications that could run on the machine

19
IBM 360 instruction set (I)
  • Had multiple instruction formats, all with
  • Mostly 8-bit opcodes
  • 16 general purpose registers
  • RR (register to register)
  • 16 bits

20
IBM 360 instruction set (II)
  • RX (register to/from indexed storage)
  • 32 bits
  • Address of memory operand was contents of base
    and index registers plus 12-bit displacement D

21
IBM 360 instruction set (III)
  • SI (storage and immediate)
  • 32 bits
  • Has an 8-bit wide immediate field I
  • Address of memory operand was contents of base
    register B plus 12-bit displacement D

22
IBM 360 instruction set (IV)
  • SS (storage to storage)
  • 48 bits
  • Two memory operands
  • First addresses of fields of length L

23
IBM 360 instruction set (V)
  • S (storage)
  • 32 bits with a 16-bit opcode
  • Mostly for privileged instructions

24
Discussion (I)
  • Flexible and compact
  • Multiple instruction sizes
  • Must decode current instruction to know start of
    the next one
  • Regular design
  • Many operations can be RR, RS, RX, SI and SS
    (character manipulation and decimal arithmetic)

25
Discussion (II)
  • RX format
  • Memory address is indexed by base register and
    index register
  • Address of a[i] can be decomposed into
  • Current base register
  • Offset of a[0] relative to base register
  • Index i multiplied by size of array element in
    index register

26
Discussion (III)
  • Why such a complex addressing format?
  • Index register was used to access arrays
  • Base register allowed for a much shorter address
    field
  • 4 bits for base register + 12 bits for
    displacement
  • vs.
  • 24 bits for a full address
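
A minimal sketch of the RX address computation described in the last two slides. The register contents, the array offset and the 4-byte element size are made-up values, and the special treatment of register 0 on the real machine is not modeled.

```python
ADDRESS_MASK = (1 << 24) - 1   # original 360 addresses were 24 bits wide


def rx_effective_address(base: int, index: int, displacement: int) -> int:
    """Effective address = base register + index register + 12-bit displacement."""
    if not 0 <= displacement < (1 << 12):
        raise ValueError("displacement must fit in 12 bits")
    return (base + index + displacement) & ADDRESS_MASK


# Address of a[i] for an array of 4-byte elements (all values hypothetical):
base_register = 0x012000        # current base register
offset_of_a0 = 0x100            # displacement of a[0] relative to the base
i = 5
index_register = i * 4          # index multiplied by element size

address = rx_effective_address(base_register, index_register, offset_of_a0)
print(hex(address))  # 0x12114
```

The instruction itself only carries 4 + 4 + 12 = 20 bits of addressing information (base register number, index register number, displacement) instead of a full 24-bit address.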

27
THE MIPS INSTRUCTION SET
28
MIPS (I)
  • Originally stood for Microprocessor without
    Interlocked Pipeline Stages
  • First RISC microprocessor
  • Development started in 1981 under John Hennessy
    at Stanford University
  • Started a company: MIPS Computer Systems, Inc.

29
MIPS (II)
  • Owned by SGI from 1992 to 1998
  • Until SGI switched to the Intel Itanium
    architecture
  • Used by DEC, NEC, Pyramid Technology,
    Siemens Nixdorf, Tandem and others during the
    late 80s and 90s
  • Until Intel Pentium took over
  • Now primarily used in embedded systems

30
Overview
  • Two versions
  • MIPS32 with 32-bit addresses (discussed here)
  • MIPS64 with 64-bit addresses
  • Both MIPS architectures have
  • Thirty-two registers (32 bits on MIPS32)
  • A byte-addressable memory
  • Byte, half-word and word-oriented operations

31
Bit ordering
  • All examples assume that byte-ordering
    is little-endian
  • Bits are numbered from right to left

[Figure: a 32-bit word, with bit 31 at the left end and bit 0 at the right end]
32
Number representation (I)
  • MIPS uses two's complement representation for
    negative numbers
  • 00…0 represents 0
  • 00…1 represents 1
  • 01…1 represents 2^(n-1) - 1
  • 10…0 represents -2^(n-1)
  • 11…1 represents -1

33
Two's complement representation
  • Assume n-bit integers
  • All non-negative integers have first bit equal to
    zero
  • All negative integers have first bit equal to one
  • To negate an integer, we compute its complement
    to 2^n in unsigned arithmetic

34
Example
  • Assume n = 4
  • 0000, 0001, 0010, 0011, 0100, 0101, 0110 and 0111
    represent integers 0 to 7
  • To find the representation of -3, we do
  • 16 - 3 = 13, that is, 1101
  • More generally 1000, 1001, 1010, 1011, 1100,
    1101, 1110, 1111 represent negative integers -8
    to -1
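
The rule "complement to 2^n in unsigned arithmetic" can be checked with a short sketch. The helper names are illustrative; Python's `%` operator returns a non-negative result for a positive modulus, which is exactly the complement we need.

```python
def to_twos_complement(value: int, n: int) -> int:
    """Return the n-bit pattern (as an unsigned integer) representing value."""
    if not -(1 << (n - 1)) <= value < (1 << (n - 1)):
        raise ValueError("value does not fit in n bits")
    return value % (1 << n)          # e.g. -3 becomes 16 - 3 = 13 when n = 4


def from_twos_complement(pattern: int, n: int) -> int:
    """Interpret an n-bit pattern as a signed integer."""
    if pattern & (1 << (n - 1)):     # first bit is one: negative number
        return pattern - (1 << n)
    return pattern


print(format(to_twos_complement(-3, 4), "04b"))   # 1101
print(from_twos_complement(0b1000, 4))            # -8
```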

35
Another way to look at it (I)
Unsigned  Pattern    Pattern  Unsigned
0         0000       1000     8
1         0001       1001     9
2         0010       1010     10
3         0011       1011     11
4         0100       1100     12
5         0101       1101     13
6         0110       1110     14
7         0111       1111     15
36
Another way to look at it (II)
U and S  Pattern    Pattern  Unsigned     Signed
0        0000       1000     8  = 16 - 8  -8
1        0001       1001     9  = 16 - 7  -7
2        0010       1010     10 = 16 - 6  -6
3        0011       1011     11 = 16 - 5  -5
4        0100       1100     12 = 16 - 4  -4
5        0101       1101     13 = 16 - 3  -3
6        0110       1110     14 = 16 - 2  -2
7        0111       1111     15 = 16 - 1  -1
37
Number representation (II)
  • Can create problems when we fetch a byte or
    half-word into a 32 bit register
  • If we fetch a 16-bit half-word whose first bit is one,
    we have two possible outcomes

[Figure: the half-word zero-extended to 32 bits (unsigned) and sign-extended to 32 bits (signed)]
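
The two outcomes can be sketched as follows. The half-word value 0x8001 is an arbitrary example with its first bit set; the function names are illustrative.

```python
def zero_extend_16(halfword: int) -> int:
    """Unsigned load: the upper 16 bits of the register are filled with zeros."""
    return halfword & 0xFFFF


def sign_extend_16(halfword: int) -> int:
    """Signed load: the upper 16 bits are copies of the half-word's first bit."""
    halfword &= 0xFFFF
    if halfword & 0x8000:
        return halfword | 0xFFFF0000
    return halfword


print(f"{zero_extend_16(0x8001):#010x}")  # 0x00008001
print(f"{sign_extend_16(0x8001):#010x}")  # 0xffff8001
```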
38
MIPS instruction set
  • Designed for speed and prefetching ease
  • All instructions are 32 bits long
  • Five instruction formats
  • Three basic formats
  • R, I and J
  • Two floating point formats
  • FR and FI

39
The R format
  • Six-bit opcode
  • R instructions have three operands
  • Five bits per register → 32 registers
  • Shamt specifies a shift amount (5 bits)
  • Funct selects the specific variant of operation
    defined in opcode (6 bits)
  • Many R instructions have an all-zero opcode
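
A sketch of how the R-format fields pack into one 32-bit word, assuming the standard MIPS32 field order (opcode, rs, rt, rd, shamt, funct). The encoder name is illustrative; the example instruction, register numbers and funct value are those of the usual `add $t0, $s1, $s2` illustration.

```python
def encode_r_format(opcode: int, rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack the six R-format fields (6 + 5 + 5 + 5 + 5 + 6 bits) into a 32-bit word."""
    fields = [(opcode, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


# add $t0, $s1, $s2: $t0 is register 8, $s1 is 17, $s2 is 18.
# The opcode is all zeros; funct 0x20 selects the add operation.
word = encode_r_format(opcode=0, rs=17, rt=18, rd=8, shamt=0, funct=0x20)
print(f"{word:#010x}")  # 0x02324020
```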

40
Register naming conventions
  • $s0, $s1, …, $s7 for saved registers
  • Saved when we do a procedure call
  • $t0, …, $t9 for temporary registers
  • Not saved when you do a procedure call
  • $0 is the zero register
  • Always contains zeroes
  • Other conventions are used for registers used in
    procedure calls

41
R format instructions (I)
  • Arithmetic instructions
  • add $sa, $sb, $sc    # a = b + c
  • sub $sd, $se, $sf    # d = e - f
  • Logical instructions
  • and $sa, $sb, $sc    # a = b & c
  • or $sd, $se, $sf     # d = e | f
  • nor $sg, $sh, $si    # g = ~(h | i)
  • xor $sj, $sk, $sl    # j = (k & ~l) | (~k & l)

42
Notes
  • MIPS logical instructions are bitwise operations
  • Implement bitwise & and | operations of C
  • MIPS has no negation instruction
  • Use NOR and specify register $0 as one of the two
    input registers
  • $0 is hard-wired to contain zero
  • nor $sk, $sl, $0     # k = ~l
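
Why NOR with the zero register acts as a negation can be verified on 32-bit patterns. This is a sketch; `nor32` and `not32` are illustrative names, and the mask keeps Python's unbounded integers within 32 bits.

```python
MASK32 = 0xFFFFFFFF


def nor32(a: int, b: int) -> int:
    """Bitwise NOR of two 32-bit values."""
    return ~(a | b) & MASK32


def not32(a: int) -> int:
    """Bitwise negation obtained the MIPS way: NOR with the zero register."""
    return nor32(a, 0)


print(f"{not32(0x0000FFFF):#010x}")  # 0xffff0000
```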

43
R format instructions (II)
  • More arithmetic instructions
  • addu $s1, $s2, $s3
  • subu $s1, $s2, $s3
  • Unsigned versions of add and sub
  • Multiply and divide instructions will be covered
    later

44
R format instructions (III)
  • Shift instructions
  • sll $s1, $s0, n
  • srl $s1, $s0, n
  • Shift contents of source register $s0 by n bits
    to the left (sll) or to the right (srl)
  • Fill the emptied bits with zero
  • Store results in destination register $s1

45
Notes (II)
  • A right shift followed by a logical and can be
    used to extract some specific bits
  • If we are interested in bits 14-15 of register
    $s0
  • We set up a register $s2 containing 011 (binary)
  • We do
  • srl $t0, $s0, 14     # use temporary register $t0
  • and $s1, $t0, $s2    # answer is in $s1

46
Notes (III)
  • Want to extract bits xy in positions 14 and 15
  • srl $t0, $s0, 14
  • and $s1, $t0, $s2    # $s2 contains mask 011

????????????????xy??????????????   ($s0)
00000000000000????????????????xy   ($t0 after the shift)
000000000000000000000000000000xy   ($s1 after the and)
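
The shift-then-mask idiom generalizes to any bit field. A minimal sketch, with illustrative helper names: a logical right shift brings the field down to bit 0 and a mask discards everything above it.

```python
MASK32 = 0xFFFFFFFF


def shift_right_logical(value: int, amount: int) -> int:
    """Logical right shift of a 32-bit value: emptied bits are filled with zero."""
    return (value & MASK32) >> amount


def extract_bits(value: int, low: int, width: int) -> int:
    """Return the width-bit field of value whose least significant bit is at position low."""
    mask = (1 << width) - 1              # width = 2 gives the mask 011 (binary)
    return shift_right_logical(value, low) & mask


print(extract_bits(0x0000C000, 14, 2))   # 3: bits 15 and 14 are both one
print(extract_bits(0xFFFF7FFF, 14, 2))   # 1: bit 15 is zero, bit 14 is one
```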
47
R format instructions (IV)
  • Register comparison instructions
  • slt $t0, $s3, $s4
  • sltu $t0, $s3, $s4
  • Sets register $t0 to 1 if $s3 < $s4 and to 0
    otherwise
  • slt does a signed comparison
  • sltu does an unsigned comparison
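
The same two bit patterns can compare differently depending on the interpretation. A sketch with illustrative function names mirroring the two instructions:

```python
MASK32 = 0xFFFFFFFF


def as_signed32(pattern: int) -> int:
    """Interpret a 32-bit pattern as a two's complement integer."""
    pattern &= MASK32
    return pattern - (1 << 32) if pattern & 0x80000000 else pattern


def slt(a: int, b: int) -> int:
    """Signed set-on-less-than."""
    return 1 if as_signed32(a) < as_signed32(b) else 0


def sltu(a: int, b: int) -> int:
    """Unsigned set-on-less-than."""
    return 1 if (a & MASK32) < (b & MASK32) else 0


# 0xFFFFFFFF is -1 when signed but the largest value when unsigned.
print(slt(0xFFFFFFFF, 1), sltu(0xFFFFFFFF, 1))  # 1 0
```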

48
R format instructions (V)
  • Jump instructions (for big jumps)
  • jr $s0
  • Jump to address contained in register $s0
  • Since $s0 can contain a 32-bit address, the jump
    can go anywhere
  • jalr $s0, $s1
  • Jump to address contained in register $s0 and
    save address of next instruction in register $s1
    (defaults to register 31)

49
The I format
[Figure: I-format fields, the last of which holds a constant or an address]
  • Last field can be
  • a 16-bit displacement: Address of memory operand
    is the sum of the contents of register rt and
    this displacement
  • a 16-bit constant: Register rt can then specify
    a second register operand

50
Discussion
  • I format contains instructions involving
  • One register and a memory location
  • Two registers and an immediate value
  • Two registers and a jump address relative to the
    current value of the program counter (PC)
  • MIPS instruction set uses the same format for
    three very different instruction styles
  • Simplifies decoding hardware

51
Register and memory instructions
  • Transfer data
  • To a register load
  • From a register store
  • No memory-operand arithmetic instructions, unlike the IBM/360 IS
  • All arithmetic and logical operations are either
    register to register or immediate to register

52
Register and memory instructions
  • Load instructions
  • lbu $s1, a($s2)
  • Load unsigned byte into register $s1 from memory
    address obtained by adding the contents of
    register $s2 to the address field a
  • lhu $s1, a($s2)
  • Load half-word unsigned

53
Register and memory instructions
  • Load instructions (contd)
  • lw $s1, a($s2)
  • Load word
  • ll $t1, a($s2)
  • Load linked: used with store conditional to
    implement atomic locks
  • The famous spinlocks

Wait for COSC 4330
54
Register and memory instructions
  • Store instructions
  • sb $s1, a($s2)
  • Store least significant byte
  • sh $s1, a($s2)
  • Store least significant half-word
  • sw $s1, a($s2)
  • Store word

55
Register and memory instructions
  • Store instructions (contd)
  • sc $t1, a($s2)
  • Store conditional
  • Always follows a load linked
  • Fails if value in memory changed since the load
    linked instruction
  • Saves contents of $t1 in memory and sets $t1 to
    one if store conditional succeeds or to zero
    otherwise

Wait for COSC 4330
56
Immediate instructions
  • Immediate value is 16 bits wide
  • addi $s0, $s1, n
  • Store in register $s0 the sum of the contents of
    register $s1 and the decimal value n
  • addiu $s0, $s1, n
  • Unsigned version of addi
  • andi $s0, $s1, n
  • ori $s0, $s1, n

57
Three missing instructions
  • No subi or subiu
  • Instead of subtracting an immediate value n, we
    can add a negative value -n
  • No load immediate
  • Replaced by ori with the zero register
  • ori $s0, $0, n       # load value n into register $s0

(Typo in textbook?)
58
Loading 32-bit constants (I)
  • lui $s0, n
  • load upper immediate
  • Loads the decimal value n into the 16 most
    significant bits of register $s0

[Figure: lui $s0, n places n in the upper 16 bits of $s0]
59
Loading 32-bit constants (II)
  • Works in combination with ori
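
How the two instructions combine can be sketched by splitting a 32-bit constant into two 16-bit halves. The function names mirror the instructions but are only a model of their effect on a register value; the constants are arbitrary examples.

```python
MASK32 = 0xFFFFFFFF


def lui(imm16: int) -> int:
    """Load upper immediate: imm16 goes in bits 31-16, bits 15-0 become zero."""
    return (imm16 & 0xFFFF) << 16


def ori(register: int, imm16: int) -> int:
    """OR immediate: the 16-bit immediate is zero-extended before the OR."""
    return (register | (imm16 & 0xFFFF)) & MASK32


def load_constant(value: int) -> int:
    """Build a 32-bit constant with a lui followed by an ori."""
    upper = (value >> 16) & 0xFFFF
    lower = value & 0xFFFF
    return ori(lui(upper), lower)


print(f"{lui(0x1234):#010x}")                  # 0x12340000
print(f"{load_constant(0x12345678):#010x}")    # 0x12345678
```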

60
Other immediate instructions
  • Comparison instructions
  • slti $t0, $s3, n
  • sltiu $t0, $s3, n
  • Both set register $t0
  • To one if $s3 < sign-extended value of n
  • To zero otherwise
  • slti does a signed comparison
  • sltiu does an unsigned comparison

61
Notes
  • We should use immediate instructions whenever
    possible
  • They are faster and use one register less than
    the equivalent R instructions
  • To isolate bits 14-15 of register $s0, use
  • srl $t0, $s0, 14
  • andi $s1, $t0, 3     # decimal 3 = binary 011

62
Immediate branch instructions (I)
  • All immediate jump and branch instructions use a
    very specific addressing scheme
  • Use the program counter as base register
  • Makes sense
  • Both register fields can be used for conditional
    branch instructions

63
Immediate branch instructions (II)
  • Immediate field contains a signed number
  • Can jump ahead or behind the address indicated by
    the current value of the PC + 4
  • Address is multiplied by four before being added
    to the value of the PC

New PC = Old PC + 4 + 4n
64
Why PC + 4?
  • Because
  • By the time the MIPS CPU adds 4n to the PC
    register, it will contain the address of the next
    instruction

65
Immediate branch instructions (III)
  • Works well because
  • PC value is a multiple of four as instructions
    are properly aligned at addresses that are
    multiple of 4
  • Solution extends the range of the jump from ±
    2^15 B = 32 KB to ± 2^17 B = 128 KB
  • Can use J or JR for bigger jumps
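
The branch address computation of the last three slides (new PC = old PC + 4 + 4n, with n a signed 16-bit immediate) can be sketched as follows. The function names and the sample addresses are illustrative.

```python
MASK32 = 0xFFFFFFFF


def signed_16(imm16: int) -> int:
    """Interpret a 16-bit immediate field as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - (1 << 16) if imm16 & 0x8000 else imm16


def branch_target(pc: int, imm16: int) -> int:
    """Address reached by a taken branch located at address pc."""
    return (pc + 4 + 4 * signed_16(imm16)) & MASK32


print(hex(branch_target(0x1000, 3)))        # 0x1010: three instructions past PC + 4
print(hex(branch_target(0x1000, 0xFFFF)))   # 0x1000: n = -1 branches back to itself
```

Because n counts instructions rather than bytes, the reachable byte offsets run from -2^17 to 2^17 - 4 around PC + 4.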

66
Immediate branch instructions (IV)
  • beq $s0, $s1, a
  • Branch on equal
  • Jump to address computed by adding 4a to the
    current value of the PC + 4 if the values of
    registers $s0 and $s1 are equal
  • bne $s0, $s1, a
  • Branch on not equal

67
The J format (I)
  • Sole operand is a 26-bit address
  • Jump instructions
  • j a
  • Unconditionally jump to a new address
  • jal a
  • Unconditionally jump to a new address and store
    address of next instruction in $ra

68
The J format (II)
  • Note that jal has an implicit operand
  • Register $ra (stands for return address)
  • Always register 31
  • In general, implicit operands
  • Allow more compact instruction formats
  • Complicate register management

69
Computing the new address
  • Obtained as follows
  • Bits 1:0 are zero (address is multiple of 4)
  • Bits 27:2 come from jump operand (26 bits)
  • Bits 31:28 come from PC + 4

Allows jumps anywhere within the current 256 MB region of memory
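
A sketch of the J-format address computation, assuming the standard MIPS32 layout: the 26-bit operand is shifted left by two bits and the four high-order bits are taken from PC + 4. The function name and the sample addresses are illustrative.

```python
def jump_target(pc: int, address26: int) -> int:
    """Address reached by a j or jal instruction located at address pc."""
    upper = (pc + 4) & 0xF0000000              # bits 31:28 come from PC + 4
    middle = (address26 & 0x03FFFFFF) << 2     # 26-bit operand fills bits 27:2
    return upper | middle                      # bits 1:0 stay zero


print(f"{jump_target(0x00400000, 0x0100000):#010x}")  # 0x00400000
print(f"{jump_target(0x10000008, 1):#010x}")          # 0x10000004
```

Since only 28 bits of the target are chosen by the instruction, a jump stays within the 256 MB region selected by the top four bits of PC + 4; jr remains available for unrestricted targets.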
70
Observations
  • The overall philosophy used to design the MIPS
    instruction set was
  • Favor simplicity and regularity
  • Only three instruction formats
  • Optimize for the more frequent cases
  • Immediate to register instructions
  • Make good compromises

71
Comparison with IBM 360 IS
IBM 360                              MIPS
Variable length instructions         Fixed-size instructions
Sixteen GP registers                 Thirty-two GP registers
RR format has two register operands  R format has three register operands
RS format                            An option of I format
SI format                            No, but the equivalent of a non-existing RI format
SS format                            No equivalent

72
The stack
  • Used to store saved registers
  • LIFO structure starting at a high address value
    and growing downwards
  • MIPS software reserves register 29 for the stack
    pointer ($sp)
  • We can push registers to the stack and pop them
    later

73
The stack
[Figure: the stack, with the stack pointer $sp marking its top]
74
Handling procedure calls
  • MIPS software uses the following conventions
  • $a0, …, $a3 are the four argument registers that
    can be used to pass parameters
  • $v0 and $v1 are the two value registers that can
    be used to return values
  • $ra is the register containing the return address

75
Simple procedure call (I)
  • Before starting the new procedure, we must save
    the registers used by the calling procedure
  • Not all of them
  • Save the eight registers $s0, $s1, …, $s7
  • Do not save the ten temporary registers $t0, …,
    $t9
  • Must also restore these registers when we exit
    the procedure

76
Simple procedure call (II)
  • At call time
  • addi $sp, $sp, -32   # eight times four bytes
    sw $s7, 28($sp)
    sw $s6, 24($sp)
    sw $s5, 20($sp)
    sw $s4, 16($sp)
    sw $s3, 12($sp)
    sw $s2, 8($sp)
    sw $s1, 4($sp)
    sw $s0, 0($sp)

Reality check: We will only save the registers
that will be reused by the callee
77
Simple procedure call (III)
  • At return time
  • lw $s0, 0($sp)       # restore the registers $s0 to $s7
    lw $s1, 4($sp)
    lw $s2, 8($sp)
    lw $s3, 12($sp)
    lw $s4, 16($sp)
    lw $s5, 20($sp)
    lw $s6, 24($sp)
    lw $s7, 28($sp)
    addi $sp, $sp, 32    # eight times four bytes
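
The save and restore sequences can be simulated to check that they are symmetric. This is only a model: registers live in a dictionary, memory is a dictionary indexed by byte address, and the initial stack pointer value is hypothetical.

```python
SAVED_REGISTERS = [f"$s{i}" for i in range(8)]


def push_saved_registers(regs: dict, memory: dict) -> None:
    """Model of the call-time sequence: grow the stack, then store $s0 to $s7."""
    regs["$sp"] -= 32                                   # addi $sp, $sp, -32
    for i, name in enumerate(SAVED_REGISTERS):
        memory[regs["$sp"] + 4 * i] = regs[name]        # sw $si, 4*i($sp)


def pop_saved_registers(regs: dict, memory: dict) -> None:
    """Model of the return-time sequence: reload $s0 to $s7, then shrink the stack."""
    for i, name in enumerate(SAVED_REGISTERS):
        regs[name] = memory[regs["$sp"] + 4 * i]        # lw $si, 4*i($sp)
    regs["$sp"] += 32                                   # addi $sp, $sp, 32


regs = {name: 100 + i for i, name in enumerate(SAVED_REGISTERS)}
regs["$sp"] = 0x7FFFF000            # hypothetical initial stack pointer
memory = {}
before = dict(regs)

push_saved_registers(regs, memory)
for name in SAVED_REGISTERS:        # the callee overwrites the saved registers
    regs[name] = 0
pop_saved_registers(regs, memory)
print(regs == before)  # True
```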

78
Nested procedures (I)
  • Procedures that call other procedures
  • Including themselves (recursive procedures)
  • Likely to reuse argument registers and temporary
    registers
  • Caller will save the argument registers and
    temporary registers it will need after the call
  • Callee will save the saved registers of the
    caller before reusing them

79
Nested procedures (II)
  • All saved registers are restored when the
    procedure returns and the stack is shrunk

80
The assembler (I)
  • Helps the programmer with
  • Symbolic names
  • Arguments of jump and branch instructions are
    labels rather than numerical constants
  • beq $s0, $s1, done
  • …
  • done:

81
The assembler (II)
  • Pseudo-instructions
  • To reserve memory locations for data
  • To create user-friendly specialized versions of
    very general instructions
  • bzero $s0, address
  • for
  • beq $s0, $0, address
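
A toy sketch of how an assembler might rewrite the pseudo-instruction above into the real instruction. `bzero` is the name used on the slide; the function `expand_pseudo` and its line format are hypothetical and do not describe any actual assembler.

```python
def expand_pseudo(line: str) -> str:
    """Rewrite 'bzero reg, label' as 'beq reg, $0, label'; pass other lines through."""
    text = line.strip()
    mnemonic, _, operands = text.partition(" ")
    if mnemonic == "bzero":
        register, label = [part.strip() for part in operands.split(",")]
        return f"beq {register}, $0, {label}"
    return text


print(expand_pseudo("bzero $s0, done"))       # beq $s0, $0, done
print(expand_pseudo("add $s0, $s1, $s2"))     # add $s0, $s1, $s2
```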

82
A review question
  • Which MIPS instructions involve
  • Three explicit operands?
  • Two explicit operands?
  • One explicit operand?

83
Answer (I)
  • Three explicit operands
  • All R format instructions but jr and jalr
  • All I format instructions involving an immediate
    value but lui
  • All I format instructions involving a
    conditional branch

84
Answer (II)
  • Two explicit operands
  • All I format instructions involving a load or a
    store
  • Load upper immediate (lui)
  • Jump and link to register (jalr)

85
Answer (III)
  • One explicit operand
  • Jump register instruction (jr)
  • Both J format instructions
  • jal has an implicit second operand that it uses
    to store the return address ($ra)

86
OTHER EXAMPLES
87
Another RISC IS: ARM (I)
  • ARM stands for Advanced RISC Machine
  • Originally developed as a CPU architecture for a
    desktop computer manufactured by Acorn in
    Britain
  • Now dominates the embedded system market
  • Company has no foundry
  • It licenses its products to manufacturers

88
Another RISC IS: ARM (II)
  • Used in cell phones, embedded systems, …
  • Same 32-bit instruction formats
  • Only 16 registers
  • Nine addressing modes
  • Including autoincrement
  • Branches use condition codes of arithmetic unit

89
Another RISC IS: ARM (III)
  • All instructions have a 4-bit condition code
    allowing their conditional execution
  • Claimed to be faster than using a branch
  • No zero register
  • Includes instructions like logical not, load
    immediate, move (R to R)

91
The x86 instruction set (I)
  • Not the result of a well-thought-out design process
  • Evolved over the years
  • 8086 was the first x86 processor architecture
  • Announced in 1978
  • Sixteen-bit architecture
  • Assembly language-compatible extension to Intel
    8080 (8-bit microprocessor)

92
The x86 instruction set (II)
  • New instructions were added over the years,
    mostly by Intel, but also by AMD
  • Floating-point instructions
  • Multimedia instructions
  • More floating-point instructions
  • Vector computing
  • 80386 was the first 32-bit processor
  • Introduced more interesting instructions

93
The x86 instruction set (III)
  • AMD extended the x86 architecture to 64-bit
    address space and registers in 2003
  • Intel followed AMD's lead the next year

94
The x86 instruction set (IV)
  • The result is a huge instruction set combining
  • Old instructions kept to ensure backward
    compatibility
  • Golden handcuff
  • Newer 32-bit instructions that are in use today

95
A curious tradeoff
  • Complexity of x86 instruction set
  • Complicates the design of x86 microprocessors
  • More work for Intel and AMD architects
  • Effectively prevents other manufacturers from
    designing and selling x86-compatible microprocessors
  • A mixed blessing for the Intel/AMD duopoly

96
Stack-oriented architectures (I)
  • A rarity in microprocessor architecture
  • Bad solution
  • Used in many interpreted languages
  • Java virtual machine, Python
  • Big exceptions are Dalvik and Lua
  • Register-based virtual machines

97
Interpreted vs. Compiled (I)
  • Compiled language
  • Machine code is directly executable

[Figure: Source code → Compiler → Machine code]
98
Interpreted vs. Compiled (II)
  • Interpreted language

[Figure: Source code → Compiler → Bytecode, then Bytecode → Interpreter → Results]
99
Stack-oriented architectures (II)
  • Named registers replaced by a stack of registers
  • Two basic operations
  • PUSH <address>: push on stack contents of memory
    address
  • POP <address>: pop top of stack and store it at
    specified memory address

100
Stack-oriented architecture (III)
  • Binary arithmetic and logical operations
  • Have both operands on the top of the stack
  • Replace them by the result
  • Example

[Figure: stack contents before and after an operation on operands A and B]
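
A minimal sketch of such a stack machine. The PUSH and POP operations follow the slides; the ADD operation, the tuple-based program format and the variable names A, B and C are assumptions made for the illustration.

```python
def run(program: list[tuple], memory: dict) -> dict:
    """Execute a list of (operation, *arguments) tuples on a stack machine."""
    stack = []
    for operation, *arguments in program:
        if operation == "PUSH":          # push contents of a memory address
            stack.append(memory[arguments[0]])
        elif operation == "POP":         # pop top of stack into a memory address
            memory[arguments[0]] = stack.pop()
        elif operation == "ADD":         # replace the two top operands by their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown operation {operation}")
    return memory


# C = A + B: note that the ADD instruction itself carries no operand addresses.
memory = run([("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")],
             {"A": 3, "B": 4})
print(memory["C"])  # 7
```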
101
Tradeoff
  • Advantages
  • Simple compilers and interpreters
  • Very compact object code
  • Main disadvantage
  • More memory references
  • Cannot save partial results into a register

102
CONCLUSION
103
Good design principles (I)
  • Simplicity favors regularity
  • As few instruction formats as possible for easier
    decoding
  • Smaller is faster
  • No complicated instructions

104
Good design principles (II)
  • We should make the common case faster
  • PC-relative addressing for branches
  • Good design requires good compromises
  • Limiting address displacements and immediate
    values to 16 bits ensures that all instructions
    can fit in a 32-bit word
  • Weird J format addressing mode

105
Myths (I)
  • Adding more powerful instructions will increase
    performance
  • IBM 360 has an instruction saving multiple
    registers
  • DEC VAX has an instruction computing polynomial
    terms
  • These complex instructions limit pipelining
    options

106
Myths (II)
  • Writing programs in assembly language produces
    faster code than that generated by a compiler
  • Compilers are now better than humans for register
    allocations
  • In addition, programs written in assembly
    language cannot be ported to other CPU
    architectures
  • Think of Macintosh programs

107
Myths (III)
  • Backward compatibility prevents instruction sets
    from evolving
  • Not true for x86 architecture
  • Can add new instructions
  • Write compilers that avoid the older instructions