Title: ECE6130: Computer Architecture: Instruction Set Architecture
1ECE6130 Computer Architecture: Instruction Set Architecture
- Dr. Xubin He
- http://iweb.tntech.edu/hexb
- Email hexb_at_tntech.edu
- Tel 931-3723462, Brown Hall 319
2- Previous Class
- Fundamentals of computer design
- Today
- Instruction Set Architecture
3Summary of Chapter 1
- Quantitative principles
- Technology trends
- Cost trends
- Measure, report and summarize performance
4Summary of Chapter 1
- Time to run the task
- Execution time, response time, latency, CPU time
- Tasks per day, hour, week, sec, ns, ...
- Throughput, bandwidth
- "X is n times faster than Y" means
$$\frac{\text{ExTime}(Y)}{\text{ExTime}(X)} = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = n$$
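As a small worked example of the ratio above, the following Python sketch computes n from two execution times. The function name `speedup` and the sample timings are illustrative assumptions, not figures from the course.

```python
def speedup(ex_time_x: float, ex_time_y: float) -> float:
    """Return n such that X is n times faster than Y (n = ExTime(Y) / ExTime(X))."""
    if ex_time_x <= 0 or ex_time_y <= 0:
        raise ValueError("execution times must be positive")
    return ex_time_y / ex_time_x


# Hypothetical timings: X finishes the task in 10 s, Y in 15 s.
n = speedup(10.0, 15.0)
print(f"X is {n:.2f} times faster than Y")
```

Because performance is the reciprocal of execution time, the same n is obtained from Performance(X) / Performance(Y).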
5Summary of Chapter 1
- Amdahl's Law
- Iron CPI Law
- Execution time is the REAL measure of computer
performance!
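The two laws are only named on this slide. The sketch below uses their standard textbook forms, which is an assumption about what the slide intends: Amdahl's Law as overall speedup = 1 / ((1 - f) + f / s), and the CPU performance equation as CPU time = instruction count x CPI x clock cycle time. The function names and all numbers are made up for illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction f of execution time is sped up by a factor s."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def cpu_time(instruction_count: int, cpi: float, clock_cycle_time_s: float) -> float:
    """CPU time in seconds = instruction count x cycles per instruction x cycle time."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical case: 40% of the run time is made 10x faster.
print(amdahl_speedup(0.4, 10.0))

# Hypothetical case: one million instructions, CPI of 2, 1 ns clock cycle.
print(cpu_time(1_000_000, 2.0, 1e-9))
```

Both functions return execution-time quantities, in line with the slide's point that execution time is the measure that matters.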
6ISA Outline
- ISA principles and classifications
- CISC and RISC
- Addressing modes
- Operations in ISA (control flow instructions)
- Compilers for an ISA and the ISA's impact on compilers
7Instruction Set Architecture (ISA)
software
instruction set
hardware
8Evolution of Instruction Sets
- Major advances in computer architecture are typically associated with landmark instruction set designs
- Ex: Stack vs GPR (System 360)
- Design decisions must take into account
- technology
- machine organization
- programming languages
- compiler technology
- operating systems
- And they in turn influence these
9Instruction Set Principles
- ISA should reflect application characteristics
- Desktop computing is compute-intensive, thus focusing on features favoring integer and FP ops
- Server computing is data-intensive, focusing on integers and char-strings (yet FP ops are still standard in them)
- Embedded computing is time-sensitive, with memory and power concerns, thus focusing on code density, real-time and media data streams
10Classifying ISA
- Taxonomy of ISA
- Stack: both operands are implicit on the top of the stack, a data structure in which items are accessed in a last-in, first-out fashion.
- Accumulator: one operand is implicit in the accumulator, a special-purpose register.
- General Purpose Register: all operands are explicit in specified registers or memory locations. Depending on where operands are specified and stored, there are three different ISA groups:
- Register-Memory: one operand in a register and one in memory. Examples: IBM 360/370, Intel 80x86 family, Motorola 68000.
- Memory-Memory: both operands are in memory. Example: VAX.
- Register-Register (load-store): all operands, except for those in load and store instructions, are in registers. Examples: SPARC (Sun Microsystems), MIPS, Precision Architecture (HP), PowerPC (IBM), Alpha (DEC).
11Instruction Set Principles
Code sequences for C = A + B
- (a) Stack: Push A; Push B; Add; Pop C
- (b) Accumulator: Load A; Add B; Store C
- (c) Register-Memory: Load R1,A; Add R3,R1,B; Store R3,C
- (d) Reg-Reg/Load-Store: Load R1,A; Load R2,B; Add R3,R1,R2; Store R3,C
- (e) Memory-Memory: Add C,A,B
(Figure: for each class, a diagram of the ALU, memory, and where operands are held: stack with TOS, accumulator, or register set.)
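The five code sequences for C = A + B can be compared with a toy interpreter. This is a minimal sketch, not a model of any real machine: the runner functions, the rule that operand names starting with "R" are registers, and the values A = 2 and B = 3 are all assumptions made for illustration. It counts instructions executed and data memory references for each class.

```python
def run_stack(program, mem):
    """Stack machine: operands are implicit on top of the stack."""
    stack, mem_refs = [], 0
    for op, *args in program:
        if op == "Push":
            stack.append(mem[args[0]])
            mem_refs += 1
        elif op == "Pop":
            mem[args[0]] = stack.pop()
            mem_refs += 1
        elif op == "Add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            raise ValueError(op)
    return len(program), mem_refs


def run_accumulator(program, mem):
    """Accumulator machine: one operand is implicit, and every instruction touches memory."""
    acc, mem_refs = 0, 0
    for op, addr in program:
        if op == "Load":
            acc = mem[addr]
        elif op == "Add":
            acc += mem[addr]
        elif op == "Store":
            mem[addr] = acc
        else:
            raise ValueError(op)
        mem_refs += 1
    return len(program), mem_refs


def run_gpr(program, mem):
    """GPR machine: names starting with 'R' are registers, anything else is memory."""
    regs, mem_refs = {}, 0

    def read(name):
        nonlocal mem_refs
        if name.startswith("R"):
            return regs[name]
        mem_refs += 1
        return mem[name]

    def write(name, value):
        nonlocal mem_refs
        if name.startswith("R"):
            regs[name] = value
        else:
            mem_refs += 1
            mem[name] = value

    for op, *args in program:
        if op == "Load":          # Load Rd, addr
            write(args[0], read(args[1]))
        elif op == "Store":       # Store Rs, addr
            write(args[1], read(args[0]))
        elif op == "Add":         # Add dest, src1, src2
            write(args[0], read(args[1]) + read(args[2]))
        else:
            raise ValueError(op)
    return len(program), mem_refs


PROGRAMS = {
    "stack": (run_stack, [("Push", "A"), ("Push", "B"), ("Add",), ("Pop", "C")]),
    "accumulator": (run_accumulator, [("Load", "A"), ("Add", "B"), ("Store", "C")]),
    "register-memory": (run_gpr, [("Load", "R1", "A"), ("Add", "R3", "R1", "B"),
                                  ("Store", "R3", "C")]),
    "load-store": (run_gpr, [("Load", "R1", "A"), ("Load", "R2", "B"),
                             ("Add", "R3", "R1", "R2"), ("Store", "R3", "C")]),
    "memory-memory": (run_gpr, [("Add", "C", "A", "B")]),
}


def compare():
    """Run every sequence on fresh memory; return {class: (instructions, memory refs)}."""
    results = {}
    for name, (runner, program) in PROGRAMS.items():
        mem = {"A": 2, "B": 3, "C": 0}
        results[name] = runner(program, mem)
        assert mem["C"] == 5, name
    return results


for name, (count, refs) in compare().items():
    print(f"{name:16s} instructions={count} data memory refs={refs}")
```

In this toy model every class makes three data memory references for C = A + B; what differs is the instruction count and how many memory operands a single instruction carries, which is the property that matters for pipelining.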
12Instruction Set Principles
13ISA Principles about Pipelining
- Register-Register: RISC
- Load-store machine with no memory references in ALU instructions
- Easy to pipeline, higher IC (why?)
- Register-Memory: IBM 360, Intel 80x86
- Harder to pipeline, but reduces IC
- Memory-Memory: VAX
- Hardest to pipeline, lowest IC (most compact code)
14Evolution of Instruction Sets
- Single Accumulator (EDSAC 1950)
- Accumulator + Index Registers (Manchester Mark I, IBM 700 series 1953)
- Separation of Programming Model from Implementation: High-level Language Based (B5000 1963); Concept of a Family (IBM 360 1964)
- General Purpose Register Machines: Complex Instruction Sets (VAX, Intel 432 1977-80); Load/Store Architecture (CDC 6600, Cray 1 1963-76)
- RISC (MIPS, SPARC, HP-PA, IBM RS6000, ... 1987)
- Post-RISC (Intel Pentium 3, 4 1998, 2001)
15RISC/CISC
- Complex Instruction Set Computer
- Intel x86
- DEC VAX, PDP11
- Motorola 68k
- IBM 360, 370
- Complex instructions bring the hardware closer to high-level languages
- Memory was expensive
- Fewer, more powerful instructions
- Smaller programs
- More space for data
16CISC - ISA
- Instruction type count
- Usually close to 256
- Maximum number of 8-bit opcodes!
- Powerful instructions
- Many microcode steps
- Multiple cycle latency
- Faster in microcode than in the user's program!
- Added some complexity to interrupt handling, page faulting, etc.
- Instructions too long to be uninterruptible!
- Variable instruction length, multiple formats
- 1 to 10 bytes
17CISC - ISA critique
- Studies of compilers showed
- Many instructions unused
- DEC even dropped an indexed memory access, post-decrement (y = x[i--]), from the ISA going from PDP -> VAX!
- Compiler writers were sometimes simply not using complex instructions when they were appropriate
- because they could write faster sequences of simple instructions for the most common cases!
18CISC - the death knell
- Irrespective of its performance ...
- Complex hardware is expensive!
- Speed improvements
- Change microcode to hard-wired control
- Irregularity of instruction OP format, length (long design times)
- Long lead times to market
19CISC - the death knell
- A prediction (qualified!)
- x86 is probably the last major CISC architecture
- and it probably would have died a well-deserved death a few years ago if it hadn't been for Bill Gates
- It's too expensive to design another one!
- Even Intel are abandoning it in favor of a RISC architecture
- Joint Intel/HP project
- If they can't afford a 786 design, nobody can!
20Reduced Instruction Set Computer
- also Load-store architecture
- Only a few instruction types
- No memory-memory instructions
- Data loaded to registers
- lw $3, 0($2)
- Data stored from registers
- st $4, 40($5)
- Arithmetic, logical, etc operations are all
- Register -> Register
- Mostly 3-operand type
- op dest_reg, src_regA, src_regB
- Mostly 1-cycle in ALU
- Throughput 1 instruction/cycle
21RISC
- Simplicity of RISC instructions
- permits high clock rates
- long-latency ALU instructions are divided further as necessary
- super-pipelined processors
- MIPS R4000: 8-stage pipeline
- All instructions are 32 bits
- Simplifies
- pre-fetch,
- superscalar issue,
- branch management
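To illustrate why a fixed 32-bit format simplifies fetch and decode, the sketch below packs and unpacks a register-register instruction with fixed shifts and masks. The field layout is the standard MIPS R-type format (6-bit opcode, three 5-bit register fields, 5-bit shift amount, 6-bit function code); it is assumed here and does not come from the slides. The function names are hypothetical.

```python
# Field widths of the MIPS R-type format, most significant field first.
R_TYPE_FIELDS = (("opcode", 6), ("rs", 5), ("rt", 5), ("rd", 5), ("shamt", 5), ("funct", 6))


def encode_r_type(rs: int, rt: int, rd: int, shamt: int = 0,
                  funct: int = 0x20, opcode: int = 0) -> int:
    """Pack the six fields into one 32-bit word (funct 0x20 is MIPS 'add')."""
    values = {"opcode": opcode, "rs": rs, "rt": rt, "rd": rd, "shamt": shamt, "funct": funct}
    word = 0
    for name, bits in R_TYPE_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
        word = (word << bits) | value
    return word


def decode_r_type(word: int) -> dict:
    """Unpack a 32-bit word; every field sits at a fixed bit position."""
    fields = {}
    shift = 32
    for name, bits in R_TYPE_FIELDS:
        shift -= bits
        fields[name] = (word >> shift) & ((1 << bits) - 1)
    return fields


# add $3, $1, $2  (rd = 3, rs = 1, rt = 2)
word = encode_r_type(rs=1, rt=2, rd=3)
print(f"{word:#010x}")
print(decode_r_type(word))
```

Because the field positions never change, the register numbers can be extracted before the opcode is fully decoded, which is one reason fixed-length formats ease pre-fetch and superscalar issue.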
22RISC - Simple Hardware, Complex Compiler
- Basic hardware is simple
- and hard-wired
- i.e., no microcode
- but
- Pipeline stalls can reduce throughput
- Optimizing Compiler needed
- Fully exploit capabilities
- Dependence Analysis
- Instruction re-ordering
- Avoid pipeline stalls
- Instruction grouping in super-scalar, VLIW
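The effect of instruction re-ordering on stalls can be sketched with a very small hazard model. The assumption here, which is not stated on the slide, is a classic pipeline in which an instruction that uses the result of the immediately preceding load stalls for one cycle. The tuple format (opcode, destination, sources) and the register names are hypothetical.

```python
def count_load_use_stalls(program):
    """Count one stall for each instruction that reads the register
    written by the load directly before it."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        if prev_op == "lw" and prev_dest in curr_sources:
            stalls += 1
    return stalls


# Naive order: each add immediately follows the load it depends on.
original = [
    ("lw", "r1", ("r10",)),
    ("add", "r2", ("r1", "r5")),
    ("lw", "r3", ("r11",)),
    ("add", "r4", ("r3", "r6")),
]

# Compiler-scheduled order: the two independent loads are issued first.
reordered = [
    ("lw", "r1", ("r10",)),
    ("lw", "r3", ("r11",)),
    ("add", "r2", ("r1", "r5")),
    ("add", "r4", ("r3", "r6")),
]

print("stalls before re-ordering:", count_load_use_stalls(original))
print("stalls after re-ordering:", count_load_use_stalls(reordered))
```

The re-ordering is legal because neither add depends on the other load; establishing that is the job of the dependence analysis listed above.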
23Post-RISC Era
- RISC was not a specific technology as much as it was a design strategy that developed in reaction to a particular school of thought in computer design. It was a rebellion against prevailing norms--norms that no longer prevail in today's world.
- David Ditzel, the chief architect of Sun SPARC family and CEO of Transmeta
- "Today in RISC we have large design teams and long design cycles," he said. "The performance story is also much less clear now. The die sizes are no longer small. It just doesn't seem to make as much sense." The result is the current crop of complex RISC chips. "Superscalar and out-of-order execution are the biggest problem areas that have impeded performance leaps," Ditzel said. "The MIPS R10,000 and HP PA-8000 seem much more complex to me than today's standard CISC architecture, which is the Pentium II. So where is the advantage of RISC, if the chips aren't as simple anymore?"
- "CISC" was invented retroactively as a catch-all term for the type of thinking against which RISC was a reaction.
- We now live in a "post-RISC" world, where the terms RISC and CISC have lost their relevance (except to marketing departments and platform advocates). In a post-RISC world, each architecture and implementation must be judged on its own merits, and not in terms of a narrow, bipolar, compartmentalized worldview that tries to cram all designs into one of two "camps."
- http://www.arstechnica.com/cpu/4q99/risc-cisc/rvc-1.html (Hannibal)
24Memory Addressing
- Addressing memory: how to specify and interpret a memory address is important, since all data are initially in memory.
- Interpreting memory addresses
- All computers, except DSPs, are byte-addressed, providing access for bytes, half-words (2 bytes), words (4 bytes), and double words (8 bytes)
- Ordering bytes within a larger object (4 bytes in a word)
- Little Endian
- Big Endian
- Alignment of bytes
- An access to an object of size s bytes at byte address A is aligned if A mod s = 0.
- Memory is aligned on a multiple of a word or double-word boundary
- In general
- an object larger than 1 byte must be aligned to reduce HW complexity
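Byte ordering and the alignment rule can be checked directly. In this sketch the helper name `is_aligned` and the sample word 0x0A0B0C0D are illustrative assumptions; `int.to_bytes` is the standard Python call for laying out an integer in a given byte order.

```python
def is_aligned(address: int, size: int) -> bool:
    """An access of `size` bytes at byte address `address` is aligned if address mod size == 0."""
    if size <= 0:
        raise ValueError("size must be positive")
    return address % size == 0


value = 0x0A0B0C0D  # a hypothetical 4-byte word

# Little Endian: the least significant byte is stored at the lowest address.
little = value.to_bytes(4, "little")
# Big Endian: the most significant byte is stored at the lowest address.
big = value.to_bytes(4, "big")

print("little endian:", little.hex(" "))
print("big endian:   ", big.hex(" "))

for address, size in [(8, 4), (6, 4), (6, 2), (12, 8)]:
    print(f"address {address}, size {size}: aligned = {is_aligned(address, size)}")
```

Reading the same four bytes back with the other byte order yields a different integer, which is why the ordering has to be agreed on when machines exchange binary data.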
26Addressing Mode
- How architectures specify the address of an object to be accessed
- An addressing mode can specify
- Constant
- Register
- Location in memory (effective address)
- See figure B.6
- PC-relative addressing: a displacement mode using the PC as the register, used primarily for specifying code addresses in branches
- Complex addressing modes
- Reduce instruction counts a lot
- Add complexity (may increase the cycle time)
- Increase average CPI
- Knowing which modes to include is important
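How an effective address is formed can be sketched for a few memory addressing modes. Figure B.6 is not reproduced in this transcript, so the mode names below follow common textbook usage and are an assumption, as are the function signature and the register contents.

```python
def effective_address(mode: str, regs: dict, *, base: str = None, index: str = None,
                      disp: int = 0, pc: int = 0) -> int:
    """Return the memory address an operand refers to under the given mode."""
    if mode == "register_indirect":   # address is held in a register
        return regs[base]
    if mode == "displacement":        # register + constant offset
        return regs[base] + disp
    if mode == "indexed":             # sum of two registers
        return regs[base] + regs[index]
    if mode == "pc_relative":         # displacement mode with the PC as the register
        return pc + disp
    raise ValueError(f"unknown addressing mode: {mode}")


regs = {"R1": 100, "R2": 8}  # hypothetical register contents

print(effective_address("register_indirect", regs, base="R1"))
print(effective_address("displacement", regs, base="R1", disp=16))
print(effective_address("indexed", regs, base="R1", index="R2"))
print(effective_address("pc_relative", regs, pc=4000, disp=-12))
```

Immediate and register operands need no effective address at all, since the value comes from the instruction itself or from a register rather than from memory.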
27Figure B.6 addressing modes
28What Addressing Modes are Common?
- Measured on a VAX, which supports all the addressing modes in figure B.6, using three SPEC89 programs
- Immediate and displacement dominate (not including PC-relative and register)
- See figure B.7
29Figure B.7 summary of memory addressing mode
30Immediate or Literal AM
- Use of immediate
- Arithmetic operations
- Comparisons (primarily for branches)
- Moves where a constant is wanted in a register
- Constants written in code (small)
- Add reg , 2
- Address constants (large)
- Need to know whether they need to be supported by
all Ops or for only a subset
31Frequency of Immediate for Different Instructions
- Very high in ALU operations and comparisons => many are 0
- See figure B.9
- Data taken on Alpha architecture with full optimization for SPEC CPU 2000, integer (CINT2000) and FP (CFP2000)
32Figure B.9 CINT2000 in SPEC CPU 2000
33Range of Immediate Values
- Range affects instruction length
- Small values are most common
- Large values are sometimes used in address calculations
- See figure B.10
- Data taken on Alpha architecture with full optimization for SPEC CPU 2000, integer (CINT2000) and FP (CFP2000)
34Figure B.10 distributions of immediate values
35Displacement Addressing Mode
- What range of displacements is used?
- The range of displacement sizes determines the length of the displacement field -> length of instructions
- Displacement for data accesses (not branches)
- See figure B.8
36Figure B.8 displacement values
37Memory Addressing Summary
- Use a GPR machine, a reg-reg one
- A new ISA should support at least the following addressing modes, which account for 75%-99% of addressing modes used
- Displacement
- Immediate
- Reg indirect
- Size of the address for displacement modes should be at least 12-16 bits => captures 75%-99% of displacements
- Immediate field should be at least 8-16 bits => captures 50%-80% of immediates
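The link between field width and the share of values captured can be sketched as follows. The sample values are invented for illustration and are not the measured distributions from figures B.8 or B.10; treating the field as a signed two's-complement number is also an assumption.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if `value` can be encoded in a signed two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def coverage(values, bits: int) -> float:
    """Fraction of the given values that fit in a signed field of `bits` bits."""
    return sum(fits_signed(v, bits) for v in values) / len(values)


# Hypothetical immediates or displacements seen in a program.
samples = [0, 1, -1, 4, 100, 2000, 40000, -30000]

for bits in (8, 12, 16):
    print(f"{bits:2d}-bit field captures {coverage(samples, bits):.1%} of the samples")
```

Run against real measurements, the same calculation is what supports choosing a 12-16 bit displacement field or an 8-16 bit immediate field.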