Title: Lecture 3: Instruction Sets
1. Lecture 3: Instruction Sets
- Section 1.3, Sections 2.1-2.8
- Technology trends
- Design issues in defining an instruction set
- Register and memory access
- Instruction and operand types
2. Processor Technology Trends
- Shrinking of transistor sizes: 250nm (1997) → 130nm (2002) → 70nm (2008) → 35nm (2014)
- Transistor density increases by 35% per year and die size increases by 10-20% per year: functionality improvements!
- Transistor speed improves linearly with size (complex equation involving voltages, resistances, capacitances): clock speed improvements!
- Wire delays do not scale down at the same rate as logic delays
- The Pentium 4 has pipeline stages for wire delays
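
To get a feel for how these annual rates compound, here is a minimal sketch. It assumes the quoted rates stay constant over the whole period, which is a simplification; the function name `compound_growth` and the 5-year horizon are chosen only for illustration.

```python
def compound_growth(rate_per_year: float, years: int) -> float:
    """Cumulative growth factor after `years` at a fixed annual rate."""
    return (1.0 + rate_per_year) ** years


YEARS = 5  # illustrative horizon, not taken from the slides

density_factor = compound_growth(0.35, YEARS)   # transistor density, 35% per year
die_factor_low = compound_growth(0.10, YEARS)   # die size, 10% per year
die_factor_high = compound_growth(0.20, YEARS)  # die size, 20% per year

print(f"density grows x{density_factor:.2f} in {YEARS} years")
print(f"die size grows x{die_factor_low:.2f} to x{die_factor_high:.2f} in {YEARS} years")
```

Under these assumptions, density and die size growth multiply, which is why the transistor budget per chip grows faster than either rate alone suggests.
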
3. Technology Trends
- DRAM density increases by 40-60% per year, latency has reduced by 33% in 10 years (the memory wall!), bandwidth improves twice as fast as latency decreases
- Disk density improves by 100% every year, latency improvement similar to DRAM
- Networks: primary focus on bandwidth; 10Mb → 100Mb in 10 years; 100Mb → 1Gb in 5 years
4. Power Consumption Trends
- Dynamic power ∝ activity × capacitance × voltage² × frequency
- Capacitance per transistor and voltage are decreasing, but number of transistors and frequency are increasing at a faster rate
- Leakage power is also rising and will soon match dynamic power
- Power consumption is already between 100-150W in high-performance processors today
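
The proportionality for dynamic power can be explored with a small sketch. The scaling factors below (voltage down 10%, frequency up 50%) are hypothetical values picked for illustration, and the function returns power only up to a constant factor.

```python
def relative_dynamic_power(activity: float, capacitance: float,
                           voltage: float, frequency: float) -> float:
    """Dynamic power up to a constant: activity * capacitance * voltage^2 * frequency."""
    return activity * capacitance * voltage ** 2 * frequency


# Baseline design, with all quantities normalized to 1.
base = relative_dynamic_power(1.0, 1.0, 1.0, 1.0)

# Hypothetical next generation: voltage drops by 10%, frequency rises by 50%.
scaled = relative_dynamic_power(1.0, 1.0, 0.9, 1.5)

ratio = scaled / base
print(f"relative dynamic power: {ratio:.3f}")
```

Because voltage enters squared, a modest voltage reduction helps a lot, but in this made-up scenario the frequency increase still outweighs it.
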
5. Notable Points
- Complexity-effective design is important: a complex design takes longer to build and verify, and consumes more power
- Don't forget about software cost while evaluating a system's cost-performance
- Similarly, power-performance of a single component is misleading
- Can't use CPI or IPC while comparing different ISAs
- Don't rely on peak performance metrics or on results obtained with synthetic benchmarks
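
One way to see why CPI alone is misleading across ISAs is the usual relation: execution time = instruction count × CPI × cycle time. The sketch below uses two made-up machines; all instruction counts, CPI values, and the 1 GHz clock are hypothetical.

```python
def execution_time(instr_count: float, cpi: float, clock_hz: float) -> float:
    """Execution time in seconds: instruction count * CPI / clock frequency."""
    return instr_count * cpi / clock_hz


CLOCK_HZ = 1e9  # both hypothetical machines run at 1 GHz

# Hypothetical ISA A: fewer, more complex instructions, higher CPI.
cpi_a = 2.0
time_a = execution_time(1.0e9, cpi_a, CLOCK_HZ)

# Hypothetical ISA B: more, simpler instructions, lower CPI.
cpi_b = 1.5
time_b = execution_time(1.5e9, cpi_b, CLOCK_HZ)

print(f"ISA A: CPI {cpi_a}, time {time_a:.2f} s")
print(f"ISA B: CPI {cpi_b}, time {time_b:.2f} s")
```

In this example the machine with the better CPI is the slower one, because the same program needs more instructions on it.
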
6. The Effect of Clock Speed
- Even with the same instruction set, performance does not closely track clock speed: it depends on the benchmark set and processor functionalities
- Even within the same processor family, performance improvements are slower than clock speed improvements
7. ISAs for Different Segments
- Instruction sets for all three segments are very similar
- Desktops: equal emphasis for int and fp, little regard for code size and power
- Servers: little need for high floating-point performance
- Embedded: emphasis on low cost and power; code size is important, floating-point may be optional
- Desktops and embedded also care about multimedia apps; hence, use special media extension instructions
8. RISC Vs. CISC
- Complex Instruction Set Computer: if you do it in hardware, it's fast → hence, implement every functionality in hardware
  - rich instruction set
  - complex decoding
  - complex analysis to identify dependences
- Reduced Instruction Set Computer: by using a few simple instruction primitives, the hardware is simpler
  - easy to extract parallelism
  - easy to effect high clock speeds
- x86 is CISC and is popular for compatibility reasons; CISC instrs are converted to RISC instrs in hardware
9. Accessing Internal Storage
- Implicit or explicit operands? Compact or flexible?
- Representing C = A + B
Stack  | Accumulator | Reg (reg-mem) | Reg (load-store)
Push A | Load A      | Load R1, A    | Load R1, A
Push B | Add B       | Add R3, R1, B | Load R2, B
Add    | Store C     | Store R3, C   | Add R3, R1, R2
Pop C  |             |               | Store R3, C
- Registers: fast, exploit locality, reduced memory traffic, easier to re-order
10. Register Architectures
Type | Advantages | Disadvantages | Examples
Register-Register (0 mem, 3 ops) | Simple, fixed-length, simple code generation, easy pipelining and parallelism extraction | High instr count and code size | Alpha, MIPS, ARM, PowerPC, SPARC
Register-Memory (1 mem, 2 ops) | Can access data without doing a load, small code size | One of the operands is destroyed, instr latency is variable | Intel 80x86, Motorola 68000
Memory-Memory (2 mem, 2 ops) or (3, 3) | Most compact code size, doesn't waste registers | Variation in instr size (hard to decode), frequent memory accesses, variable instr latency | VAX
11. Addressing Modes for Memory
Addressing mode | Example instr | Meaning
Register | Add R4, R3 | Regs[R4] ← Regs[R4] + Regs[R3]
Immediate | Add R4, 3 | Regs[R4] ← Regs[R4] + 3
Displacement | Add R4, 100(R1) | Regs[R4] ← Regs[R4] + Mem[100 + Regs[R1]]
Register indirect | Add R4, (R1) | Regs[R4] ← Regs[R4] + Mem[Regs[R1]]
Direct/absolute | Add R1, (1001) | Regs[R1] ← Regs[R1] + Mem[1001]
Memory indirect | Add R1, @(R3) | Regs[R1] ← Regs[R1] + Mem[Mem[Regs[R3]]]
- More addressing modes → low instr counts, more complexity (CISC-like)
- Most common modes: immediate and displacement
- Displacement and immediate values often require fewer than 8 bits, but also often require 16 bits
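
The "Meaning" column of the addressing-mode table can be written out as a small operand-fetch function. This is only a sketch: `fetch_operand`, the mode names used as strings, and the register and memory contents are hypothetical and chosen to make each mode return a distinct, easy-to-check value.

```python
def fetch_operand(mode, regs, mem, reg=None, const=None):
    """Return the second source operand of an Add under the given addressing mode.

    regs maps register names to values; mem maps addresses to values.
    """
    if mode == "register":           # Regs[reg]
        return regs[reg]
    if mode == "immediate":          # the constant itself
        return const
    if mode == "displacement":       # Mem[const + Regs[reg]]
        return mem[const + regs[reg]]
    if mode == "register_indirect":  # Mem[Regs[reg]]
        return mem[regs[reg]]
    if mode == "direct":             # Mem[const]
        return mem[const]
    if mode == "memory_indirect":    # Mem[Mem[Regs[reg]]]
        return mem[mem[regs[reg]]]
    raise ValueError(f"unknown addressing mode: {mode}")


# Hypothetical machine state, for illustration only.
regs = {"R1": 8, "R3": 16}
mem = {8: 40, 16: 8, 108: 7, 1001: 9}

examples = {
    "register": fetch_operand("register", regs, mem, reg="R3"),
    "immediate": fetch_operand("immediate", regs, mem, const=3),
    "displacement": fetch_operand("displacement", regs, mem, reg="R1", const=100),
    "register_indirect": fetch_operand("register_indirect", regs, mem, reg="R1"),
    "direct": fetch_operand("direct", regs, mem, const=1001),
    "memory_indirect": fetch_operand("memory_indirect", regs, mem, reg="R3"),
}
for mode, value in examples.items():
    print(f"{mode:18s} -> {value}")
```

Note how memory indirect needs two memory reads for one operand, which hints at why richer addressing modes add hardware complexity.
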
12. Interpreting Memory Addresses
- Most computers are byte addressed and also allow access to half words (16 bits), words (32), and double words (64)
- Accesses are usually required to be aligned: a half word cannot have an odd address, a double word must have an address A, where A mod 8 = 0, etc.
- Misalignment increases hardware complexity and worsens performance (if data cross cache line boundaries)
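
The alignment rule (address mod size = 0) and the cache-line-crossing case can be checked in a few lines. A minimal sketch: the helper names are made up, and the 64-byte cache line is an assumed size, since the slides do not specify one.

```python
def is_aligned(address: int, size: int) -> bool:
    """An access of `size` bytes is aligned when address mod size == 0."""
    return address % size == 0


def crosses_line(address: int, size: int, line_size: int = 64) -> bool:
    """True if bytes [address, address + size) span two cache lines.

    line_size=64 is an assumption for illustration.
    """
    first_line = address // line_size
    last_line = (address + size - 1) // line_size
    return first_line != last_line


print(is_aligned(16, 8))    # double word at address 16
print(is_aligned(3, 2))     # half word at an odd address
print(crosses_line(60, 8))  # misaligned double word near the end of a line
print(crosses_line(56, 8))  # aligned double word in the same line
```

An aligned access whose size divides the line size can never straddle two lines, which is one reason hardware prefers aligned accesses.
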
13. Little and Big Endian
- Consider a 64-bit quantity, composed of bytes 0-7 (LSB-MSB)
- In Little-Endian format, memory address A will contain byte 0, address A+1 will contain byte 1, ..., address A+7 will contain byte 7
- Advantage: easier to organize bytes, half-words, words, double words, etc. into registers (Alpha, x86)
- In Big-Endian format, memory address A will contain byte 7, address A+1 will contain byte 6, ..., address A+7 will contain byte 0
- Advantage: values are stored in the order they are printed out, the sign is available early (Motorola)
14. Endianness Example
- Consider the hexadecimal number
- MSB 0x43fa27c77156ab91 LSB
- Two options
- Address:       7  6  5  4  3  2  1  0
- Little-endian: 43 fa 27 c7 71 56 ab 91
- Big-endian:    91 ab 56 71 c7 27 fa 43
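
The two byte layouts for 0x43fa27c77156ab91 can be reproduced with Python's built-in `int.to_bytes`. In this sketch, index i of each `bytes` object stands for memory address A + i, so the bytes are listed from address 0 upward, the reverse of the slide's 7-to-0 ordering.

```python
value = 0x43FA27C77156AB91

# Index i of each bytes object corresponds to memory address A + i.
little = value.to_bytes(8, byteorder="little")
big = value.to_bytes(8, byteorder="big")

print("address:      ", " ".join(f"{i:2d}" for i in range(8)))
print("little-endian:", " ".join(f"{b:02x}" for b in little))
print("big-endian:   ", " ".join(f"{b:02x}" for b in big))

# Reading the bytes back with the matching byte order recovers the value.
assert int.from_bytes(little, byteorder="little") == value
assert int.from_bytes(big, byteorder="big") == value
```

In the little-endian layout the LSB (0x91) sits at the lowest address; in the big-endian layout the MSB (0x43) does.
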
16. Common Operations
Operator Type | Examples
Arithmetic/Logical | Add, sub, and, or, mult, div
Data transfer | Loads/stores
Control | Branch, jump, call, return
System | OS call, virtual memory management
Floating point | FP add, sub, mult, div
Decimal | Decimal add, sub, mult, decimal to character conversions
String | Move, compare, search
Graphics | Compression/decompression, vertex/pixel ops
17. Common Operations
80x86 instruction | Integer average (% total executed)
Load | 22%
Conditional branch | 20%
Compare | 16%
Store | 12%
Add | 8%
And | 6%
Sub | 5%
Move register-register | 4%
Call/Return | 2%
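
Adding up the table's percentages shows how much of the integer instruction stream these few simple instructions account for. A small sketch using the figures from the table; the grouping into "memory" and "control" categories is an assumption made here for illustration.

```python
# Percent of total executed instructions, as listed in the table.
mix = {
    "Load": 22,
    "Conditional branch": 20,
    "Compare": 16,
    "Store": 12,
    "Add": 8,
    "And": 6,
    "Sub": 5,
    "Move register-register": 4,
    "Call/Return": 2,
}

total = sum(mix.values())
memory_share = mix["Load"] + mix["Store"]
control_share = mix["Conditional branch"] + mix["Call/Return"]

print(f"listed instructions cover {total}% of executed instructions")
print(f"loads and stores: {memory_share}%, branches and calls: {control_share}%")
```
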