Henri Casanova

1
ICS431 Computer Architecture: Instruction Set
Architectures (ISA)
  • Henri Casanova
  • henric@hawaii.edu

2
Components of Architecture
  • Three main components to computer architecture
  • Instruction set architecture
  • Computer organization
  • Hardware
  • In this course we'll mostly focus on the first
    two, starting with the first one

3
What is an ISA?
  • The Instruction Set Architecture, or ISA, of a
    computer is the interface between the software
    and the hardware
  • It is the portion of the computer visible to the
    programmer/compiler
  • In some sense, the instruction set architecture
    is defined by the set of assembly instructions
    that can be used and by what they do
  • Basic Model

[Diagram: the CPU repeats a FETCH-DECODE-EXECUTE cycle; the program
counter (PC) points to the current instruction; an External Data Bus
connects the CPU to memory. Memory contains both code (e.g., "LOAD R1,
(R2); LOAD R3, (R4); ADD R5, R1, R3" stored as bit strings at
addresses 0x13A3B3FC, 0x13A3B400, 0x13A3B404, ...) and data (e.g.,
"hello", 2122666, "world" at addresses 0x23E453BB, ...). Instructions
can refer to memory cells.]
4
ISA Categories
  • Although ISAs can come in many flavors, a typical
    distinction is the way in which operands to
    instructions are stored in a computer
  • Three main methods have been used
  • A stack architecture
  • Implicitly, all operands are at the top of the
    stack
  • An accumulator architecture
  • Implicitly, one operand is the accumulator, and
    the other must be referenced explicitly
  • A register architecture
  • All operands must be referenced explicitly
  • They can be either in registers or in memory
  • For now we're only talking about operations on
    one or two operands (e.g., additions,
    comparisons)
  • These operations are performed in an ALU...

5
ALU
  • The CPU component that performs arithmetic and
    logic operations is called an Arithmetic Logic
    Unit (ALU)
  • Performs additions, subtractions,
    multiplications, divisions, comparisons, bit
    shifting
  • You have seen how to implement such things in
    ICS331
  • Basically, tons of gates
  • There can be integer ALUs or floating point ALUs
  • If no floating point hardware, then it can be
    emulated in software
  • No longer the case for several decades
  • Floating Point emulation used to be the
    overwhelming bottleneck
  • A CPU may have multiple ALUs
  • Usual Representation

[Diagram: ALU with operand inputs Op1 and Op2, an Operation selector,
and outputs Result and Overflow]
6
ALU Design: a bird's eye view
  • Basically an 8-bit ALU takes 16 bits of input
  • A0,...,A7 and B0,..., B7
  • Let's say the ALU does ADD, SUB, OR, and AND
  • Can be encoded with 2 bits X and Y
  • 00 = ADD, 01 = SUB, 10 = OR, 11 = AND
  • The output is 8-bit output plus an overflow bit
    (could also have a carry out output)

[Diagram: ALU inputs A0...A7, B0...B7, and opcode bits X, Y;
outputs C0...C7 and an Overflow bit]
7
ALU Design: a bird's eye view
  • In the end, it's all a big Karnaugh map, and can
    thus be implemented with (many) gates
  • In the real world
  • There are conventions for binary representations
    of numbers
  • 2's complement
  • sign bits
  • Large ALUs are built using 1-bit ALUs as building
    blocks
  • See ICS331 for all details

8
Back to ISAs: a Stack ISA
  • The hardware implements a Stack
  • Reserve a zone in memory that will store the
    Stack
  • Use a TOS (Top of Stack) register that contains
    the address of the top of the Stack
  • The ISA provides a PUSH and a POP instruction, as
    well as, say, an ADD instruction
  • The operands to the ADD are implicitly the two
    elements at the top of the Stack, they are
    popped, and then the result is placed at the top
    of the Stack.

TOS
PUSH A  // load the data at @ A at the top
PUSH B  // load the data at @ B at the top
ADD     // pop the two top elements, add them, push the result
POP C   // store the top at @ C
Every time the Stack is modified, the TOS pointer is updated
9
An Accumulator ISA
  • The hardware implements an accumulator
  • A memory cell
  • The ISA provides a LOAD and a STORE instruction,
    as well as, say, an ADD instruction
  • LOAD and STORE operate between the accumulator
    and the memory
  • One operand to the ADD is implicitly the content
    of the accumulator, the other is in memory, and
    the result is placed in the accumulator

LOAD A   // load the data at @ A in the acc.
ADD B    // add B to the acc.
STORE C  // store the content of the acc. at @ C
(Fewer instructions than with a Stack)
[Diagram: the accumulator and memory]
10
Register ISAs
  • In register ISAs, operations have only explicit
    operands, taken from
  • a set of registers, i.e., memory cells located on
    the CPU that can hold data
  • or memory cells in RAM (or cache), away from the
    CPU
  • Option 1: register-memory ISAs
  • Instructions can have some operands in memory
  • Output is stored in a register
  • Option 2: register-register ISAs
  • Instructions can only have operands in registers
  • Output is stored in a register
  • Both ISAs provide ways to load data from memory
    into registers and store data from registers into
    memory
  • Both are called General Purpose Register (GPR)
    ISAs
  • Memory-memory ISAs are no longer found in
    computers today
  • The point is that using registers is fast

11
Register ISAs
  • Register-Memory vs. Register-Register
[Diagram: in a register-memory ISA the ALU takes operands from
registers or from memory; in a register-register ISA the ALU takes
operands from registers only]
12
Register ISAs
  • Since the 1980s, no computer has been designed
    with a stack or accumulator ISA.
  • Reason 1: Registers are faster than memory and
    have become cheap(er) to build
  • Reason 2: Stack and accumulator ISAs are
    inefficient because they make the compiler's job
    difficult
  • Example: (A*B) - (B*C) - (A*D)
  • With a stack or accumulator, the values of A and
    B must be loaded from memory multiple times,
    which reduces performance
  • With a stack or accumulator, there is only one
    operand evaluation order possible, which may
    prevent cool optimizations that we've come to
    take for granted in modern computers
  • Registers can be used to store frequently used
    variables and thereby reduce memory traffic
  • Registers can be named with fewer bits than a
    memory cell, which improves code density
  • Therefore, we are left with only register ISAs

13
How many registers?
  • Newer ISAs tend to have more registers
  • Having more registers gives the compiler more
    opportunities to try to be as efficient as
    possible
  • Typically, compilers reserve some registers for
    special purposes
  • to hold parameters to a function call
  • as helper for expression evaluation
  • The remaining registers can be used to hold
    program variables

14
Operands and ALU
  • GPR (General purpose register) ISAs are
    characterized precisely by two characteristics
  • Whether the ALU has 2 or 3 operands
  • 2 operands: one of the operands is both an input
    and an output
  • 3 operands: the third operand is an output
  • The number of operands that may be memory
    addresses
  • Figure 2.3 in the book shows examples of
    architectures that have taken different approaches
  • The only memory-memory ISA is the VAX, which is
    now obsolete
  • We will mostly focus on the (3,0) configuration,
    that is three operands, which must be in
    registers
  • corresponds to modern trends
  • fits the general principle that all instructions
    should take the same number of cycles, for better
    performance

15
ISAs: more characteristics
  • Defining the format of ALU operations is only one
    small part of defining the ISA
  • Other considerations include
  • Memory addressing
  • Type and size of operands
  • What are the operations in the ISA?
  • Control Flow
  • ISA encoding
  • Compilers?
  • Let's look at all these issues, which are covered
    in a bit more detail in the book
  • Ignore the DSP content in the book

16
Memory Addressing
  • Every ISA must define how memory addresses
  • are interpreted
  • are specified in instructions (Load / Store)
  • Most ISAs are byte addressed
  • i.e., the smallest piece of memory that can be
    addressed is a byte
  • Meaning that each byte has an address represented
    by a binary string
  • If I have 128 bytes of addressable RAM, then each
    byte has an address that's composed of 7 bits.
  • Typically, an ISA allows access to
  • bytes (8 bits)
  • half words (16 bits)
  • Supported in C (short integers)
  • Useful when size of data structures is a concern
    (e.g., O/S)
  • Useful for the Unicode character set (Java)
  • words (32 bits)
  • double words (64 bits)

17
Registers
  • The CPU has several 64-bit registers
  • These registers can hold bytes, half words,
    words, or double words

[Diagram: a 64-bit register holding a byte, half word, word, or
double word in its low-order bits; the remaining bits are "don't
care"]
18
Memory Alignment
  • In most computers access to objects larger than a
    byte (half words, words, double words) must be
    aligned
  • Definition of "aligned": An object of size S
    bytes is aligned if its address is a multiple of
    S (i.e., @ mod S = 0)
  • Example
  • Accessing 1 byte at @ 000000: aligned
  • Accessing 1 half word at @ 000001: NOT aligned
  • 1 mod 2 = 1
  • Accessing 1 half word at @ 000010: aligned
  • Accessing 1 double word at @ 011000: aligned
  • 24 mod 8 = 0
  • See Figure 2.5

19
Why Alignment?
  • The main reason is simpler hardware
  • The memory hardware is built with alignment on
    large-object boundaries (e.g., double word)
  • Therefore, accessing an object that's misaligned
    would increase the number of memory accesses
  • Example
  • Even on a computer that allows misaligned
    accesses, a program that does only aligned
    accesses will run faster
  • Therefore, imposing memory alignment is not such
    a terrible thing

20
Alignment to registers
  • When you load something from memory you actually
    get a double word, that is placed in a 64-bit
    register
  • The memory hardware is simpler to design (and
    therefore cheaper) if you assume that there is
    only one size of data that you can load from it
  • Besides, it's not like loading 64 bits takes
    longer than loading 8 bits, as long as we have 64
    wires
  • Note that storing can be done at the byte level
  • Let's say I want to load 1 byte whose address is
    x..x100
  • That byte is located within a double word of
    memory (let's assume Big Endian)
  • Therefore, when I load the whole double word into
    a 64-bit register, the byte will be in the middle
    of the register!
  • What I need is to right-shift the whole double
    word three times (by three byte positions) so
    that the byte ends up where it should be: in the
    8 least-significant bits of the register

[Diagram: bytes 0-7 of the double word are loaded into a 64-bit
register; the byte at offset 4 (address x..x100) is then shifted
right into the register's least-significant byte position]
21
Byte ordering
  • There are two conventions for the bytes within a
    half word, word, or double word
  • Let's look at a double word (8 bytes)
  • The bytes within this double word are contiguous
    in memory
  • Therefore their addresses are
  • x...x000, x...x001, x...x010, x...x011, x...x100,
    x...x101, x...x110, x...x111
  • Question: does the left-most byte have address
    x...x000 or x...x111?
  • The notion of left/right is based on the fact
    that the memory is linear, with increasing
    addresses

22
Byte Ordering
  • Two conventions for byte ordering in a double
    word
  • Little Endian: The byte whose address is x...x000
    is placed at the least significant position,
    the little end, meaning that the bytes are
    numbered
  • Pentium
  • Big Endian: The byte whose address is x...x000 is
    placed at the most significant position, the
    big end, meaning that the bytes are numbered
  • SPARC
  • Some architectures can be configured either way
    (bi-endian)
  • Either in software (at boot up) or on the
    motherboard
  • PowerPC, IA64

Little Endian byte numbering: 7 6 5 4 3 2 1 0
Big Endian byte numbering:    0 1 2 3 4 5 6 7
23
Byte Ordering
  • Why should one care about byte ordering?
  • Most of the time, one doesn't
  • it's all hidden
  • But it can be an issue when one accesses the same
    location both as a double word and as a byte

24
Byte ordering in action
  • Let's see how one can write a C program that
    gives different results on computers that use
    different byte orderings
  • To expose different behaviors one needs to
    interpret the bytes of a word as individual bytes
  • Therefore, one can cast an integer to a
    4-character string as in the code below
    #include <stdio.h>
    int main() {
      struct {
        int x;
        char y;
      } data;
      data.x = 0x61626364;
      data.y = '\0';
      printf("--> %s\n", (char *)&data);
    }

25
Byte ordering in action
  • On a Little Endian machine
    #include <stdio.h>
    int main() {
      struct {
        int x;
        char y;
      } data;
      data.x = 0x61626364;
      data.y = '\0';
      printf("--> %s\n", (char *)&data);
    }

Memory bytes, low address to high: d c b a \0  (prints "dcba")
26
Byte ordering in action
  • On a Big Endian machine
    #include <stdio.h>
    int main() {
      struct {
        int x;
        char y;
      } data;
      data.x = 0x61626364;
      data.y = '\0';
      printf("--> %s\n", (char *)&data);
    }

Memory bytes, low address to high: a b c d \0  (prints "abcd")
27
Byte ordering
  • Little Endian doesn't match the normal ordering
    of characters in strings
  • Therefore, many people think that Little Endian
    is "backwards"
  • But note that it matches the way we interpret
    bits in a byte!!
  • Differences in Endianness cause problems when
    exchanging data among computers
  • IP defines a "network byte order" as Big Endian
  • UNIX provides the htons() and ntohl() functions
    to enable easy conversions

28
Exercise 2.3
  • The value represented by the hexadecimal number
    434F 4D50 5554 4552 is to be stored in an aligned
    64-bit double word
  • Question a: Using the physical arrangement of the
    first row in Figure 2.5, write the value to be
    stored using Big Endian order

0 1 2 3 4 5 6 7
29
Exercise 2.3. Question a
  • Answer
  • Next, interpret each byte as an ASCII character
    and below each byte write the corresponding
    character, forming the character string as it
    would be stored in Big Endian order

43 4F 4D 50 55 54 45 52
ASCII codes in decimal
30
Exercise 2.3. Question b
  • Answer
  • Question b: Same question but using Little Endian
    byte order

Big Endian:             43 4F 4D 50 55 54 45 52
ASCII codes in decimal: 67 79 77 80 85 84 69 82
Characters:              C  O  M  P  U  T  E  R
Little Endian:          52 45 54 55 50 4D 4F 43
31
Exercise 2.3. Question b
  • Answer
  • Question b: Same question but using Little Endian
    byte order

ASCII codes in decimal: 67 79 77 80 85 84 69 82
Characters:              C  O  M  P  U  T  E  R
Little Endian:          52 45 54 55 50 4D 4F 43
ASCII codes in decimal: 82 69 84 85 80 77 79 67
Characters:              R  E  T  U  P  M  O  C
32
Exercise 2.3. Question c
  • Question c: What are the hexadecimal values of
    all misaligned 2-byte words that can be read from
    the given 64-bit double word when stored in Big
    Endian byte order?

43 4F 4D 50 55 54 45 52
33
Exercise 2.3. Question c
  • Question c: What are the hexadecimal values of
    all misaligned 2-byte words that can be read from
    the given 64-bit double word when stored in Big
    Endian byte order?

43 4F 4D 50 55 54 45 52
Misaligned 2-byte words: 4F4D  5055  5445
34
Exercise 2.3. Question d
  • Question d: What are the hexadecimal values of
    all misaligned 4-byte words that can be read from
    the given 64-bit double word when stored in
    Little Endian byte order?

52 45 54 55 50 4D 4F 43
35
Exercise 2.3. Question d
  • Question d: What are the hexadecimal values of
    all misaligned 4-byte words that can be read from
    the given 64-bit double word when stored in
    Little Endian byte order?

52 45 54 55 50 4D 4F 43
Misaligned 4-byte words: 45545550  5455504D  55504D4F
36
Exercise 2.3
  • This exercise is actually not very well
    formulated for the Little Endian scheme
  • The way in which the quantities are written and
    read matters, and the exercise does not say which
    way is used.
  • In this class we will just consider C code
    fragments, so that everything is crystal clear.
  • Remember the C program I showed earlier
    #include <stdio.h>
    int main() {
      struct {
        int x;
        char y;
      } data;
      data.x = 0x41424344;
      data.y = '\0';
      printf("--> %s\n", (char *)&data);
    }

Big Endian: "ABCD"    Little Endian: "DCBA"
37
Exercise 2.3
  • What happens if the same constant is stored as
    two short integers (i.e., 2-byte integers)?
    #include <stdio.h>
    int main() {
      struct {
        short x1;
        short x2;
        char y;
      } data;
      data.x1 = 0x4142;
      data.x2 = 0x4344;
      data.y = '\0';
      printf("--> %s\n", (char *)&data);
    }
  • Little Endian: "BADC"!

38
Exercise 2.3
  • What's happening here?
  • Bytes are ordered within each object that was
    stored by the program
  • Consider these two code fragments
    int a;
    ((short *)&a)[0] = 0x4142;
    ((short *)&a)[1] = 0x4344;
  • and
    int a;
    a = 0x41424344;
  • On a Big Endian machine they are equivalent
  • On a Little Endian machine they are not

39
Exercise 2.3
  • Important: The endianness also matters when data
    is read into objects larger than a byte
  • Example
    struct {
      char x;
      char y;
    } data;
    data.x = 0x04;
    data.y = 0x01;
    printf("--> %d\n", *((short *)&data));
  • On a Big Endian machine
  • 0x0401 (decimal 1025)
  • On a Little Endian machine
  • 0x0104 (decimal 260)

Memory bytes: 04 01
40
In-class Exercise
  • Consider the following fragment of code
    #include <stdio.h>
    int main() {
      struct {
        short x1;
        int x2;
        char y;
      } data;
      data.x1 = 0x4142;
      data.x2 = 0x43444546;
      data.y = '\0';
      printf("--> %s\n", (char *)&data);
    }
  • The output in Big Endian mode is "ABCDEF"; what's
    the output in Little Endian mode?

41
Answer
    #include <stdio.h>
    int main() {
      struct {
        short x1;
        int x2;
        char y;
      } data;
      data.x1 = 0x4142;
      data.x2 = 0x43444546;
      data.y = '\0';
      printf("--> %s\n", (char *)&data);
    }

[Diagram: memory layout of data, low address to high: x1, then x2,
then y]
42
Answer
    #include <stdio.h>
    int main() {
      struct {
        short x1;
        int x2;
        char y;
      } data;
      data.x1 = 0x4142;
      data.x2 = 0x43444546;
      data.y = '\0';
      printf("--> %s\n", (char *)&data);
    }

Memory so far: 42 41  (the bytes of x1)
43
Answer
    #include <stdio.h>
    int main() {
      struct {
        short x1;
        int x2;
        char y;
      } data;
      data.x1 = 0x4142;
      data.x2 = 0x43444546;
      data.y = '\0';
      printf("--> %s\n", (char *)&data);
    }

Memory so far: 42 41 46 45 44 43  (x1, then x2)
44
Answer
    #include <stdio.h>
    int main() {
      struct {
        short x1;
        int x2;
        char y;
      } data;
      data.x1 = 0x4142;
      data.x2 = 0x43444546;
      data.y = '\0';
      printf("--> %s\n", (char *)&data);
    }

Memory so far: 42 41 46 45 44 43 0  (x1, x2, then y)
45
Answer
    #include <stdio.h>
    int main() {
      struct {
        short x1;
        int x2;
        char y;
      } data;
      data.x1 = 0x4142;
      data.x2 = 0x43444546;
      data.y = '\0';
      printf("--> %s\n", (char *)&data);
    }

Memory: 42 41 46 45 44 43 0
As characters: B A F E D C \0  (the output is "BAFEDC")
46
Addressing modes
  • Question: how are addresses specified in the
    instructions of the ISA?
  • Example
  • Consider 2-operand ALUs
  • ADD R1, (R2)
  • take the value of register R2
  • interpret it as an address
  • fetch the memory cell at that address
  • add it to the value of register R1
  • store the result into register R1
  • This addressing mode is called "register
    indirect"
  • It happens to be only one of MANY addressing
    modes that have been used in real-world computers
  • We will see a few of them and their associated
    C-like pseudo-code
  • Notation
  • Mem: an array of memory cells
  • Regs: an array of registers

47
Addressing Modes
48
Addressing Modes
  • Even more complex addressing modes

49
Addressing Modes
  • Some ISAs include all of these addressing modes:
    the VAX
  • Some ISAs include only a few: MIPS
  • What are the trade-offs?
  • The more addressing modes, the lower the IC
    (instruction count)
  • The programmer/compiler can use a single
    instruction to do several things at once
  • The more addressing modes the more complex the
    hardware
  • The cost of the system increases
  • The more addressing modes the higher the CPI
  • Some instructions will take many more clock
    cycles than others
  • We will see that nowadays this is considered a
    bad thing
  • (Note that VAXes are extinct)
  • Goal: Find the right balance
  • Simple hardware
  • Enough addressing modes so that writing code
    isn't too painful

50
VAX Addressing Modes
Figure 2.7
  • Lesson: Only a few addressing modes are heavily
    used
  • Immediate
  • Displacement

51
Displacement range
  • Add R4, 100(R1) // add R4 and Mem[100 + R1]
  • Question: How big can the displacement be?
  • Trade-off
  • the bigger the maximum displacement, the wider
    the use of the instruction
  • the smaller the maximum displacement, the shorter
    the instruction encoding
  • instructions are encoded as bit strings
  • the displacement will be encoded as a binary
    number
  • the larger the maximum displacement, the larger
    the number of bits

Figure 2.8
52
Immediates
  • Add R4, 3 // add R4 and the number 3
  • Question: How big can the immediate number be?
  • Trade-off
  • the bigger the maximum number, the wider the use
    of the instruction
  • the smaller the maximum number, the shorter the
    instruction encoding
  • instructions are encoded as bit strings
  • the maximum number will be encoded as a binary
    number
  • the larger the maximum number, the larger the
    number of bits

Figure 2.10
53
Type and Size of Operands
  • Question: How is the type of an operand
    designated?
  • Typically types are encoded as part of the
    instruction code
  • The type generally gives the size of an operand
  • characters/bytes: 8 bits
  • half words / Unicode characters: 16 bits
  • words / single-precision floats: 32 bits
  • double words / double-precision floats: 64 bits
  • SPEC benchmark
  • Figure 2.12
  • indicates that it is important to support
    efficient access to 64-bit objects
  • 64-bit access path? (1 cycle)
  • 32-bit access path? (2 cycles)

54
X-bit architecture?
  • What do we mean when we say "an X-bit
    architecture"?
  • A 32-bit architecture?
  • A 64-bit architecture?
  • Unfortunately, there is no clear definition
  • Can be used to describe an architecture in which
    integers, memory addresses (and sometimes other
    data units) are at most encoded with X bits
  • On a 32-bit architecture
  • 8-, 16-, and 32-bit integers
  • 32- and 64-bit floating point numbers
  • On a 64-bit architecture
  • 8-, 16-, 32-, and 64-bit integers
  • 32- and 64-bit floating point numbers
  • Can be used to describe a CPU (or ALU) that uses
    X-bit registers
  • Can be used to describe a computer that has an
    X-bit wide external data bus

55
X-bit Architecture
  • In general when one says an X-bit architecture
    one refers to a system that can deal with X-bit
    chunks of integer data (integers and addresses)
    internally (registers) and externally (data
    busses)
  • Although a CPU may be X-bit internally, its
    external data bus or address bus may have a
    different size, either larger or smaller, and the
    term is often used to describe the size of these
    buses as well

Intel-Compatible Processors
56
Storing Integers
  • True binary representation
  • 1101 means 13
  • Problem with true binary representation
  • No way to encode negative numbers
  • One's complement representation
  • The Most Significant Bit (MSB), i.e., the
    leftmost bit, indicates the sign of the number
  • 0 means positive
  • 1 means negative
  • If the bit string starts with 0, then just
    interpret the string as true binary
    representation
  • If the bit string starts with 1, then take its
    complement (change all 0s to 1s and all 1s to 0s)
    and interpret the string as true binary
    representation
  • Example
  • 1101 means -2
  • 0010 means 2
  • It is symmetric: as many positive numbers as
    negative numbers

57
One's Complement
  • Problem with one's complement
  • There are 2 ways to represent zero (0000 and
    1111)
  • 0011 =  3
  • 1100 = -3
  • 1111 = -0
  • A machine using one's complement must recognize
    that the two zeros are identical
  • Option 1: deal with the two representations
  • All instructions that deal with zero must be
    augmented with two versions
  • Way too complicated and costly
  • Option 2: transform 1111 into 0000
  • This is done with an AND gate over all bits and
    an inversion of all bits when a string of all 1s
    is computed
  • Possible but slows down arithmetic operations
    (addition)
  • Computers haven't used one's complement in
    decades (early Cray computers)

58
Two's Complement
  • Goal: avoid the two-zeros problem
  • Principle
  • Add one to the complement of a number to make it
    negative
  • Example
  • 01001 means 9
  • 10111 means -9
  • If we try to compute the negative of 00000
  • Take the complement: 11111
  • Add 1: 100000 (6 bits)
  • Dropping the overflow bit: 00000
  • Addition
  • 0011 =  3
  • 1101 = -3
  • (1) 0000 = 0 (the carry out is discarded)
  • Minor problem: there is one more negative number
    than there are positive numbers
  • All computers today use two's complement

59
Storing Floating Point Numbers
  • IEEE Standard for Floating Point Arithmetic:
    IEEE 754
  • General Layout

exponent (e bits)
sign bit
mantissa (f bits)
  • Single precision (32-bit): exp = 8 bits,
    mantissa = 23 bits
  • Double precision (64-bit): exp = 11 bits,
    mantissa = 52 bits
  • Sign bit
  • 0 = positive, 1 = negative
  • Exponent
  • base 2
  • both positive and negative exponents
  • Number = +/- mantissa x 2^exponent

60
ISA Operations
  • What fundamental operations are provided by the
    ISA?
  • There is a generally accepted taxonomy of
    operation types (Figure 2.15)

universal
varies
optional
61
Most popular instructions
  • Question: Which instructions end up being used
    the most?
  • SPEC Benchmark on 80x86
  • (Figure 2.16)
  • Observation: 10 simple instructions account for
    96% of all instructions executed
  • Lesson: One should make sure that these go fast
    because they are the common case
  • Lesson 2: It is dubious whether it's worth
    implementing many other sophisticated functions
  • Motivation to move from CISC to RISC (more on
    this later)

62
Control Flow
  • Control flow instructions: instructions that
    allow the program counter (PC) to "jump" to an
    instruction other than the next one
  • The PC contains the address of the instruction
    being executed
  • By default, PC = PC + 1 after executing an
    instruction
  • Terminology
  • 1950s: "transfer"
  • 1960s: "branch"
  • conditional branches (if test then goto label)
  • unconditional branches (goto label)
  • The book
  • "jump" when unconditional
  • "branch" when conditional
  • Four types of control flow instructions
  • Conditional branches
  • Jumps
  • Procedure calls
  • Procedure returns

63
Control Flow
  • Conditional branches dominate other control flow
    instructions
  • Figure 2.19 (SPEC benchmark)

64
Control Flow and Addressing
  • Question: How does one specify the target
    address, i.e., the address of the instruction
    that needs to be executed next?
  • The target address must be encoded in the
    instruction
  • As bits in the instruction encoding
  • One exception: returning from a procedure call
  • Reason: the return address from a procedure call
    is not known at compile time
  • "known at compile time": The compiler can just
    look at the code and figure it out
  • example: the size of array x in int x[3]
  • "known only at runtime": The compiler cannot just
    look at the code and figure it out.
  • Instead, one must wait until the program runs to
    figure it out.
  • But, the compiler can generate code to figure it
    out at runtime.
  • example: the size of array x in int *x =
    (int *)calloc(1, n * sizeof(int))
  • Question: Why don't we know the return address
    at compile-time?

65
Procedure returns
  • Reason why procedure return addresses are not
    known at compile-time

int main() { int answer; scanf("%d", &answer);
  if (answer > 0) foo1(); else foo2(); }
void foo1() { ... f(); ... }
void f() { printf("hello\n"); return; }
void foo2() { ... f(); ... }
66
Control Flow and Addressing
  • Most common way to specify the address of a
    change in control flow: relative to the Program
    Counter (PC)
  • Called PC-relative
  • Advantages
  • The target is often near the control flow
    instruction, and the PC offset (+2 in the
    example) can be encoded with few bits
  • The code can be loaded at any address and run
    correctly
  • position independence

...
LOAD R1, 0(R2)
ADD R4, R1
JMP to PC+2
ADD R3, R6
LOAD R3, 0(R5)
...
PC
67
Non-PC-relative addressing
  • When the target address is not known at compile
    time, PC-relative addressing is not
    applicable
  • Example: returning from a procedure call
  • What's needed is a way to specify the target
    address at runtime
  • e.g., this could be done by putting the target
    address in a register and then encoding the
    register in the control flow instruction's
    encoding
  • Other examples of cases in which the target
    address is not known at compile-time
  • switch statements
  • virtual methods
  • function pointers
  • dynamically linked libraries

68
PC-relative branches
  • Question: How far off are PC-relative branch
    targets from the PC?
  • Important to know to determine how many bits are
    needed to encode the offset in the instruction
  • Figure 2.20
  • Message: more than 8 bits is probably not
    necessary

69
Conditional Branches
  • Question: How does one specify the condition on
    which branching occurs?
  • Three options (Figure 2.21)
  • Condition code
  • The ALU sets special bits based on the last
    operation it has executed
  • Drawback: these bits can be overwritten
  • Condition register
  • A register contains the result of a comparison
  • Drawback: uses up a register
  • Compare and branch
  • Do a comparison between a register and another
    register or a constant, and depending on the
    result branch or not
  • Drawback: a lot of work for the instruction
  • Many comparisons are with 0, and many ISAs
    provide special operations for testing/comparing
    to 0.

70
Conditional Branches
  • Figure 2.22

71
Procedure Invocations
  • What happens when a procedure is called?
  • Caller: the procedure placing the call
  • Callee: the procedure being called
  • Sequence of actions
  • The context of the caller is saved
  • return address (sometimes to a special register)
  • registers
  • The callee is executed in a new context
  • fresh registers
  • The callee returns
  • to the address saved earlier
  • The caller's context is restored
  • old register values are copied back into registers

72
Procedure Invocations
  • How does all this happen?
  • Answer: the compiler just generates code (i.e.,
    assembly code) that deals with all the logistics
  • Generates loads and stores so that the contexts
    are managed correctly
  • Example
  • reserve a zone of memory for register saving
  • every time a procedure is called, save ALL
    register values into that zone
  • every time a procedure returns, load ALL register
    values from that zone
  • create a stack of such zones for subsequent
    procedure calls
  • This approach is naive and inefficient
  • Compilers go to great lengths to eliminate
    unneeded loads and stores during procedure calls
  • When you write the compiler you have tons of
    options for the assembly code you generate

73
Encoding an ISA
  • We have mentioned the notion of binary encoding
    of the instructions of the ISA
  • Look at the specification of the ISA
  • For each possible instruction come up with a
    binary string that represents it (called the
    opcode)
  • Represent operands via binary strings as well
  • Then build the decoding/execution of the
    instructions in hardware (with gates and other
    ICS331 things)
  • A key concern: how to encode addressing modes?
  • Some (older) ISAs have instructions with many
    operands and many addressing modes
  • In this case, one encodes an "address specifier"
    for each operand
  • Just come up with a binary representation of the
    addressing modes in Figure 2.6
  • Some (more recent) ISAs are so-called
    load-store architectures, with only one memory
    operand and only one or two addressing modes
  • everything can be encoded easily as part of the
    instruction
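To make the "everything encoded in the instruction" point concrete, here is a hypothetical fixed 32-bit encoding sketched in Python; the field widths and the ADD opcode value are invented for illustration, not taken from any real ISA:

```python
# Hypothetical fixed 32-bit format for a load-store ISA:
#   opcode: 6 bits | rd: 5 bits | rs1: 5 bits | rs2: 5 bits | unused: 11 bits

def encode(opcode, rd, rs1, rs2):
    # Each field goes at a fixed bit position in the 32-bit word.
    assert opcode < 64 and rd < 32 and rs1 < 32 and rs2 < 32
    return (opcode << 26) | (rd << 21) | (rs1 << 16) | (rs2 << 11)

def decode(word):
    # Decoding is just shifts and masks at those same fixed positions.
    return ((word >> 26) & 0x3F, (word >> 21) & 0x1F,
            (word >> 16) & 0x1F, (word >> 11) & 0x1F)

ADD = 0x01                       # made-up opcode for ADD
word = encode(ADD, 5, 1, 3)      # ADD R5, R1, R3
print(hex(word), decode(word))
```

Because every field sits at a fixed bit position, decoding is a handful of shifts and masks, which is exactly why fixed formats are easy and fast to decode in hardware.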

74
Encoding an ISA
  • Three important competing forces
  • 1 The desire to have as many registers and
    addressing modes as possible, which makes
    instruction encodings longer
  • 2 The desire to reduce the average instruction
    code size, and thus the average program size
  • 3 The desire to have instructions that are
    easy/fast to decode
  • in multiples of bytes, not arbitrary bit lengths
  • fixed-length instructions are the easiest,
    although they may preclude some optimizations
  • Let's look at how some popular ISAs encode their
    instructions
  • Variable format (force 1 and 2)
  • Fixed format (force 3)
  • Hybrid format (an attempt at a good balance)

75
ISA encoding
Figure 2.23
76
ISA Encoding
  • Variable (vs. Fixed)
  • Since the instruction format is variable,
    instruction codes can just be as big as needed
  • Much less extra padding just to comply with a
    fixed format when in fact not all bits are used
  • All instructions look different and decoding can
    be more time consuming
  • Question: What do real computers do?

77
CISC vs. RISC
  • CISC: Complex Instruction Set Computer
  • Each instruction is complex, meaning that it
    can do many things at once
  • e.g., load something from memory, perform an
    arithmetic operation, and store something to
    memory, all in the same instruction
  • The idea was to bridge the gap between high-level
    programming languages and the machine
  • Assembly code was close(r) to a high-level
    programming language
  • Many machines were designed in the pre-compiler
    age
  • RISC: Reduced Instruction Set Computer
  • Came after CISC systems, and motivated by several
    observations
  • Most programs used only a small fraction of the
    instructions in the ISAs, leaving the most
    complex ones out
  • The most complex ones were thus slower because of
    the "make the common case fast" principle
    employed by the designers!
  • Complex instructions were difficult to decode
  • Complex instructions took many clock cycles to
    execute
  • Registers became cheaper and using them to hold
    many temporary values was becoming conceivable
  • Compilers were better at using simpler
    instructions
  • The idea of pipelining became prevalent (see
    upcoming lecture)

78
CISC vs. RISC
  • Fallacy: "Reduced Instruction Set" doesn't mean
    that there are fewer instructions in a RISC ISA
    than in a CISC ISA
  • It just means that the instructions are all
    simple
  • The key philosophy of RISC, which differs from
    CISC is
  • Do operations in registers
  • which are all identical and not scarce
  • Use load and store to communicate between
    registers and memory
  • using simple addressing modes
  • Code is implemented as a (much longer) series of
    these simple operations
  • Therefore, many people prefer the term
    load-store architecture
  • Note that the label CISC was given to systems
    after the fact, so as to differentiate them from
    RISC
  • By the late 1980s RISCs were outperforming most
    CISCs
  • Transistors saved by implementing simpler
    instructions can be used for other things
  • Rule of Thumb: At the same number of transistors,
    a RISC system is faster than a CISC system
  • Note that RISC is not a good idea when code size
    is an issue
  • Almost never the case for desktop and servers
  • Definitely an issue for current embedded systems

79
Back to ISA Encoding
  • CISC machines use the variable encoding
  • RISC machines use the fixed format

80
The x86 ISA
  • Fallacy: Today, most computers are RISC
  • The x86 ISA is the most widely used ISA today
  • for instance, all Pentium processors
  • The x86 is a CISC architecture
  • x86 early history
  • 1978 Intel creates the 8086 processor (16-bit)
  • Sort of like an accumulator machine because
    registers were not general purpose
  • 1980 Intel 8087
  • Pretty much a stack architecture
  • 1982 80286 (24-bit)
  • New instructions
  • backward compatible (in a special mode) with the
    8086
  • 1985 80386 (32-bit)
  • New addressing modes and instructions
  • backward compatible (in a special mode) with the
    8086
  • The concern for backward compatibility due to an
    existing software base kept each step
    incremental without true architectural changes

81
The x86 ISA
  • x86 more recent history
  • 1989 80486, 1992 Pentium, 1995 P6
  • aimed at higher performance
  • only tiny additions to the ISA
  • 1997 MMX Extension
  • 57 new instructions to perform operations on
    narrow data types (8-bit, 16-bit, 32-bit) in
    parallel
  • used for multi-media
  • we will talk more about this
  • 1999 SSE Extension (in the Pentium III)
  • 70 new instructions
  • four 32-bit floating-point operations in parallel
  • new cache prefetch instructions
  • 2001 SSE2 Extension
  • 144 new instructions
  • Basically MMX and SSE instructions but for 64-bit
    floating-point numbers
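The packed-narrow-data idea behind MMX/SSE can be illustrated in miniature with plain Python integers (this models the concept only; in real SIMD hardware all lanes are processed by a single instruction):

```python
# One wide word holds several narrow lanes, and one "instruction"
# operates on all lanes at once.  Here: four independent 8-bit values
# packed into one 32-bit integer, added lane by lane.

def pack4(a, b, c, d):
    return (a << 24) | (b << 16) | (c << 8) | d

def padd_bytes(x, y):
    # Lane-wise add with per-lane wraparound (no carries across lanes).
    out = 0
    for shift in (0, 8, 16, 24):
        lane = ((x >> shift) & 0xFF) + ((y >> shift) & 0xFF)
        out |= (lane & 0xFF) << shift
    return out

x = pack4(10, 20, 30, 40)
y = pack4(1, 2, 3, 4)
print(hex(padd_bytes(x, y)))   # lanes become 11, 22, 33, 44
```

The key design point is that the adder is prevented from propagating carries across lane boundaries, so four (or eight, or sixteen) narrow additions cost one operation, which is what makes these extensions attractive for multimedia code.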

82
The x86 ISA
  • Bottom line
  • The x86 ISA is not orthogonal
  • i.e., there are many special cases and exceptions
    to rules
  • Mastering ways in which to determine which
    registers and which addressing modes are
    available for a given task is difficult
  • The "ugliness" of it all is that it stems from
    the antiquated 8086 processor, with pieces glued
    on
  • How come it was successful?
  • Intel had 16-bit processors before everybody else
  • Much more elegant architectures got there a bit
    later (e.g., the Motorola 68000)
  • BUT, this head start led to the selection of the
    8086 for the IBM PC
  • Similar phenomenon with FORTRAN, for instance
  • How come it's still successful?
  • The x86 ISA is not too difficult to implement
  • After all, Intel improved performance steadily
    over the years
  • easy for integer programs
  • more problematic for floating point programs
  • But what about this RISC idea?
  • Current x86 processors decode x86 instructions
    into smaller instructions (called micro-ops),
    which are then executed by a RISC-like
    architecture
  • Goal: performance, but still expose the familiar
    ISA to the programmer
  • See Appendix D for more details on x86

83
IA64
  • New ISA developed by Intel and HP
  • Not to be mistaken for IA32, which is basically
    a 32-bit x86 ISA
  • Not much in common with the x86 ISA
  • Used in the Intel Itanium processor family
  • Places its main emphasis on Instruction Level
    Parallelism (ILP)
  • See upcoming lecture on ILP
  • Is it RISC?
  • Supposed to be a "post-RISC era" ISA
  • Called Explicitly Parallel Instruction Computing
    (EPIC)
  • But it borrows many RISC concepts
  • some say calling it by a new name is a gimmick
  • We'll talk at length about related issues in the
    ILP lecture

84
Some Historical Perspective
  • Earliest computers: accumulator-based
  • The only feasible approach at a time when
    hardware was incredibly bulky and expensive
  • Stack and register architectures fought it out
    until the late 1970s, with the register
    architecture winning in the end
  • Architecture and programming languages
  • VAX: complex instructions to have a better
    mapping between programming language and assembly
    language
  • In the 1980s the trend went towards RISC
    architectures
  • memory is cheap and code size is no longer a
    concern
  • compiler technology has improved and almost
    nobody writes assembly by hand any longer
  • RISC was proven to be much faster and cheaper to
    manufacture (Figure 2.41).
  • More details in Section 2.16

85
In-class Exercise
  • Exercise 2.4

86
Exercise 2.4
  • Your task is to compare the memory efficiency of
    four different styles of instruction set
    architectures:
  • Accumulator: All operations occur between a
    single register and a memory location
  • Memory-memory: All instruction addresses
    reference only memory locations
  • Stack: All operations occur on top of the stack.
    Push and pop are the only instructions that
    access memory; all others remove their operands
    from the stack and replace them with the result.
    Only the top two stack entries are kept near the
    processor (with circuitry). Lower stack positions
    are kept in memory locations, and accesses to
    these stack positions require memory references
  • Load-Store: All operations occur in registers,
    and register-to-register instructions have three
    register names per instruction.

87
Exercise 2.4
  • Make the following assumptions
  • All instructions are an integral number of bytes
    in length
  • The opcode is always 1 byte
  • Memory accesses use direct addressing
  • A, B, C, and D are initially in memory
  • Question a: Invent your own assembly language
    mnemonics (Figure 2.2), and for each architecture
    write the best equivalent assembly language code
    for:
  • A = B + C
  • B = A + C
  • D = A - B
  • Example: code for C = A + B in each of the four
    styles

      Accumulator    Memory-Memory    Stack     Load-Store
      -----------    -------------    ------    ---------------
      Load  A        Add C, A, B      Push A    Load  R1, A
      Add   B                         Push B    Load  R2, B
      Store C                         Add       Add   R3, R1, R2
                                      Pop C     Store R3, C

  • First architecture: Accumulator

88
Exercise 2.4. Question a
  • Code for the accumulator architecture
  • Load B       Acc ← B
  • Add C        Acc ← Acc + C
  • Store A      A ← Acc
  • Add C        Acc ← Acc + C
  • Store B      B ← Acc
  • Negate       Acc ← -Acc
  • Add A        Acc ← Acc + A
  • Store D      D ← Acc
  • Next: memory-memory architecture
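As a sanity check, the accumulator sequence can be run through a tiny Python interpreter (the dictionary-based machine model and the sample values B = 5, C = 3 are ours, not from the exercise):

```python
# Tiny interpreter for the accumulator code above, checking that the
# sequence really computes A = B + C, then B = A + C, then D = A - B
# (using the updated values of A and B).

mem = {"A": 0, "B": 5, "C": 3}
acc = 0

program = [("Load", "B"), ("Add", "C"), ("Store", "A"),
           ("Add", "C"), ("Store", "B"),
           ("Negate", None), ("Add", "A"), ("Store", "D")]

for op, addr in program:
    if op == "Load":
        acc = mem[addr]
    elif op == "Add":
        acc += mem[addr]
    elif op == "Store":
        mem[addr] = acc
    elif op == "Negate":
        acc = -acc

print(mem)   # A = 8, B = 11, D = -3
```

Note the Negate/Add trick: the accumulator machine has no direct subtract-from-memory here, so D = A - B is computed as -(B) + A.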

89
Exercise 2.4. Question a
  • Code for the memory-memory architecture
  • Add A, B, C
  • Add B, A, C
  • Sub D, A, B
  • Next: stack architecture

90
Exercise 2.4. Question a
  • Code for the stack architecture
  • Push B
  • Push C
  • Add
  • Pop A
  • Push A
  • Push C
  • Add
  • Pop B
  • Push A
  • Push B
  • Sub
  • Pop D
  • Next: load-store architecture

91
Exercise 2.4. Question a
  • Code for the load-store architecture
  • Load R1, B       R1 ← B
  • Load R2, C       R2 ← C
  • Add R3, R1, R2   R3 ← R1 + R2
  • Store R3, A      A ← R3
  • Add R4, R3, R2   R4 ← R3 + R2
  • Store R4, B      B ← R4
  • Sub R4, R3, R4   R4 ← R3 - R4
  • Store R4, D      D ← R4
  • Let's skip Question b and go directly to Question
    c
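The load-store version can be checked the same way with a few lines of Python, reading each arrow above as an assignment (the sample values B = 5, C = 3 are ours):

```python
# Direct transcription of the load-store code: registers in a dict,
# explicit loads and stores against memory.

mem = {"A": 0, "B": 5, "C": 3}
R = {}

R[1] = mem["B"]            # Load R1, B
R[2] = mem["C"]            # Load R2, C
R[3] = R[1] + R[2]         # Add R3, R1, R2
mem["A"] = R[3]            # Store R3, A
R[4] = R[3] + R[2]         # Add R4, R3, R2
mem["B"] = R[4]            # Store R4, B
R[4] = R[3] - R[4]         # Sub R4, R3, R4
mem["D"] = R[4]            # Store R4, D

print(mem)   # A = 8, B = 11, D = -3
```

The final memory contents match the accumulator version, and notice how keeping A in R3 avoids reloading it from memory: the register file is doing the job that Store/Load pairs did on the accumulator machine.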

92
Exercise 2.4. Question c
  • Assume the given code sequence is from a small,
    embedded computer application, such as a
    microwave oven controller, that uses 16-bit
    memory addresses and data operands. (A load-store
    architecture for this system would use 16-bit
    registers.) For each architecture, answer:
  • How many instruction bytes are fetched?
  • How many bytes of data are transferred from/to
    memory?

93
Exercise 2.4. Question c
  • First Accumulator architecture
  • How many instruction bytes are fetched?
  • How many bytes of data are transferred from/to
    memory?
  • Load B
  • Add C
  • Store A
  • Add C
  • Store B
  • Negate
  • Add A
  • Store D

Remember that opcodes are 8-bit and
data/addresses are 16-bit
94
Exercise 2.4. Question c
  • First Accumulator architecture
  • How many instruction bytes are fetched?
  • How many bytes of data are transferred from/to
    memory?
  • Load B      I: 1 + 2 = 3    D: 2
  • Add C       I: 1 + 2 = 3    D: 2
  • Store A     I: 1 + 2 = 3    D: 2
  • Add C       I: 1 + 2 = 3    D: 2
  • Store B     I: 1 + 2 = 3    D: 2
  • Negate      I: 1            D: 0
  • Add A       I: 1 + 2 = 3    D: 2
  • Store D     I: 1 + 2 = 3    D: 2
  • -------------------
  • I: 22 bytes, D: 14 bytes, total: 36 bytes
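The totals above can be recomputed in a couple of lines of Python, under the stated assumptions (1-byte opcodes, 2-byte addresses and operands; Negate is the only opcode-only instruction):

```python
# Accumulator byte counts: each Load/Add/Store is a 1-byte opcode plus
# a 2-byte address (3 instruction bytes) and transfers a 2-byte operand;
# Negate is a lone opcode byte with no memory traffic.

insns = ["Load B", "Add C", "Store A", "Add C",
         "Store B", "Negate", "Add A", "Store D"]

i_bytes = sum(1 if insn == "Negate" else 1 + 2 for insn in insns)
d_bytes = sum(0 if insn == "Negate" else 2 for insn in insns)
print(i_bytes, d_bytes, i_bytes + d_bytes)   # 22 14 36
```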

95
Exercise 2.4. Question c
  • Second Memory-memory architecture
  • How many instruction bytes are fetched?
  • How many bytes of data are transferred from/to
    memory?
  • Add A, B, C
  • Add B, A, C
  • Sub D, A, B

Remember that opcodes are 8-bit and
data/addresses are 16-bit
96
Exercise 2.4. Question c
  • Second Memory-memory architecture
  • How many instruction bytes are fetched?
  • How many bytes of data are transferred from/to
    memory?
  • Add A, B, C    I: 1 + 6 = 7    D: 6
  • Add B, A, C    I: 1 + 6 = 7    D: 6
  • Sub D, A, B    I: 1 + 6 = 7    D: 6
  • ---------------
  • I: 21 bytes, D: 18 bytes, total: 39 bytes

97
Exercise 2.4. Question c
  • Third Stack architecture
  • How many instruction bytes are fetched?
  • How many bytes of data are transferred from/to
    memory?
  • Push B
  • Push C
  • Add
  • Pop A
  • Push A
  • Push C
  • Add
  • Pop B
  • Push A
  • Push B
  • Sub
  • Pop D

Remember that opcodes are 8-bit and
data/addresses are 16-bit. Also remember that the
stack is not empty when the code starts executing,
and in fact most likely has more than two elements
in it.
98
Exercise 2.4. Question c