Title: Instruction Set Architectures: RISC, CISC,
1. Instruction Set Architectures: RISC, CISC, 64-bit Processors
2. CISC (Complex Instruction Set Computers)
3. The Rationale for CISC
- One of the most visible forms of evolution associated with computers is that of programming languages.
- As the cost of hardware has dropped, the relative cost of software has risen.
- The complexity of modern software has increased the prevalence of faults (bugs).
- Thus, the major cost in the lifecycle of a system is software, not hardware.
4. The Rationale for CISC
- The response from researchers and industry has been to develop ever more powerful and complex high-level languages.
- These high-level languages (HLLs) allow the programmer to express algorithms more concisely, take care of much of the detail, and naturally support structured programming and object-oriented design.
- This solution gave rise to another problem, known as the semantic gap: the difference between the operations provided in HLLs and those provided in computer architecture.
5. The Rationale for CISC
- Symptoms of this gap include:
  - Execution inefficiency
  - Excessive program size
  - Compiler complexity
- Designers responded with architectures intended to close this gap. Key features include:
  - Large instruction sets
  - Dozens of addressing modes
  - Various HLL statements implemented in hardware
6. The Rationale for CISC
- Such complex instruction sets are intended to:
  - Ease the task of the compiler writer
  - Improve execution efficiency, because complex sequences of operations can be implemented in microcode
  - Provide support for even more complex and sophisticated HLLs
7. Motivations for CISC
- Compiler simplification.
  - The task of the compiler writer is to generate machine instructions for each HLL statement. If there are machine instructions that resemble HLL statements, this task is simplified.
  - This reasoning has been disputed by RISC researchers. They have found that CISC instructions are often hard to exploit because the compiler must find those cases that exactly fit the construct.
  - The task of optimizing the generated code to minimize code size, reduce instruction execution count, and enhance pipelining is much more difficult with a complex instruction set.
  - Most of the instructions in a compiled program are the relatively simple ones.
8. Motivations for CISC
- Smaller programs.
  - Because the program takes up less memory, there is a savings in that resource. Memory today is relatively inexpensive, so this advantage is no longer compelling.
  - Smaller programs should improve performance. This will happen in two ways:
    - Fewer instructions means fewer instruction bytes to be fetched.
    - In a paging environment, smaller programs occupy fewer pages, reducing page faults.
  - The problem with this line of reasoning is that it is not obvious that a CISC program will be smaller than a corresponding RISC program. In many cases, the CISC program, expressed in symbolic machine language, may be shorter (i.e., fewer instructions), but the number of bits of memory occupied may not be noticeably smaller.
9. Motivations for CISC
- Improved performance.
  - It seems to make sense that a complex HLL operation will execute more quickly as a single machine instruction than as a set of more primitive instructions.
  - Because compiled code is biased toward the simpler instructions, this may not be so.
  - The entire control unit must be made more complex, and/or the microprogram control store must be made larger, to accommodate a richer instruction set. Both of these factors increase the execution time of the simple instructions.
10. Motivations for CISC
- It is far from clear that CISC is the appropriate
solution. This has led a number of groups to
pursue the opposite path.
11. RISC (Reduced Instruction Set Computers)
12. The Rationale for RISC
- Meanwhile, a number of studies have been done to determine the characteristics and patterns of execution of machine instructions generated from HLL programs.
- The results of these studies inspired some researchers to look for a different approach.
- Namely, to make the architecture that supports the HLL simpler, rather than more complex.
13. RISC
- RISC systems have been defined and designed in a variety of ways, but the key elements shared by most designs are:
  - A limited and simple instruction set.
  - A large number of general-purpose registers, and the use of compiler technology to optimize register usage.
  - An emphasis on optimizing the instruction pipeline.
14. Characteristics of RISC Architectures
- Although there are a variety of approaches taken to RISC architectures, certain characteristics are common to all of them:
  - One instruction per cycle: RISC machine instructions take only one cycle of fetch, execute, store. With simple, one-cycle instructions, there is no need for microcode (as in CISC); machine instructions can be hardwired. Such instructions should execute faster than comparable machine instructions on CISC machines, as it is not necessary to access a microprogram control store.
  - Register-to-register operation: If most operations are register-to-register, this simplifies the instruction set and therefore the control unit. For example, a RISC instruction set may include only one or two ADD instructions; the VAX has 25 different ADD instructions. This also encourages the optimization of register use.
15. Characteristics of RISC Architectures
- Simple addressing modes: Almost all RISC instructions use simple register addressing. Complex addressing modes can be synthesized in software from simple ones. Again, this design feature simplifies the instruction set and the control unit.
- Simple instruction formats: Generally, only one or a few formats are used. Instruction length is fixed and aligned on word boundaries. Field locations, especially the opcode, are fixed. This generates a number of benefits:
  - With fixed fields, opcode decoding and register operand accessing can occur simultaneously.
  - Simplified formats simplify the control unit.
  - Instruction fetching is optimized because word-length units are fetched.
  - Alignment on a word boundary also means that a single instruction does not cross page boundaries.
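The benefit of fixed field locations can be sketched in a few lines. The 32-bit layout below (a 6-bit opcode followed by three 5-bit register fields) is a hypothetical format chosen for illustration, not the encoding of any particular RISC machine. The point is that every field is extracted with a constant shift and mask, so register fields can be read without first knowing the opcode.

```python
# Hypothetical fixed 32-bit format (an assumption for illustration only):
#   bits 31-26: opcode | 25-21: rd | 20-16: rs1 | 15-11: rs2 | 10-0: unused

def encode(opcode, rd, rs1, rs2):
    """Pack the four fields into one 32-bit instruction word."""
    return (opcode << 26) | (rd << 21) | (rs1 << 16) | (rs2 << 11)

def decode(word):
    """Extract every field with a fixed shift and mask.

    No field position depends on the opcode value, so all four
    extractions are independent of one another.
    """
    return {
        "opcode": (word >> 26) & 0x3F,
        "rd": (word >> 21) & 0x1F,
        "rs1": (word >> 16) & 0x1F,
        "rs2": (word >> 11) & 0x1F,
    }

word = encode(opcode=0x20, rd=3, rs1=1, rs2=2)
fields = decode(word)
print(fields)
```

In a variable-length format, by contrast, the position of an operand field can depend on earlier bytes, so the extractions cannot all be written as constants.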
16. Potential Benefits of RISC
- These characteristics can be assessed to determine the potential benefits of RISC. These benefits fall into two main categories: performance and VLSI implementation.
17. Performance
- More effective optimizing compilers can be developed. With more primitive instructions, there are more opportunities for moving functions out of loops, reorganizing code for efficiency, maximizing register utilization, etc.
- With simple instructions (and little or no microcode), a relatively simple control unit is required. It is likely that a simple control unit could be made to execute faster than a more complex one.
- Instruction pipelining. RISC researchers feel that the instruction pipelining technique can be applied much more effectively with a reduced instruction set.
18. VLSI Implementation
- Chip real estate. A CISC processor typically devotes about half of its area to the control unit. A RISC processor typically uses only about 10% of the area for the control unit, using precious real estate for registers instead.
- Design and implementation time. The simple control unit and circuitry of RISC result in faster design cycles.
19. CISC vs. RISC Characteristics
- The RISC vs. CISC controversy is now 20 years old.
- After the initial enthusiasm for RISC machines, there has been a growing realization that:
  - RISC designs may benefit from the inclusion of some CISC features, and
  - Vice versa.
- The result is that more recent RISC designs, PowerPC and SPARC, are no longer "pure" RISC, and the more recent CISC designs, notably the Pentium, incorporate core RISC characteristics.
20. Example CISC ISA: Intel X86, 386/486/Pentium
- Operand sizes
  - Can be 8, 16, 32, 48, 64, or 80 bits long.
  - Also supports string operations.
- Instruction encoding
  - The smallest instruction is one byte.
  - The longest instruction is 12 bytes long.
  - The first bytes generally contain the opcode, mode specifiers, and register fields.
  - The remaining bytes are for address displacement and immediate data.
- 12 addressing modes
  - Register.
  - Immediate.
  - Direct.
  - Base.
  - Base + Displacement.
  - Index + Displacement.
  - Scaled Index + Displacement.
  - Based Index.
  - Based Scaled Index.
  - Based Index + Displacement.
  - Based Scaled Index + Displacement.
  - Relative.
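Most of the memory addressing modes in this list are special cases of one formula: effective address = base + index × scale + displacement, with unused terms set to zero. The sketch below illustrates that arithmetic only; the function name and its keyword parameters are hypothetical, and the scale is assumed to be restricted to 1, 2, 4, or 8 with 32-bit wraparound.

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    """Compute base + index * scale + disp, wrapped to 32 bits.

    Leaving a term at its default selects a simpler mode, e.g.
    base only (Base) or base + disp (Base + Displacement).
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF

# Based Scaled Index + Displacement: element 5 of an array of
# 4-byte integers that starts 16 bytes past the base address.
ea = effective_address(base=0x1000, index=5, scale=4, disp=16)
print(hex(ea))  # 0x1024
```

A load/store RISC machine would typically synthesize the same address with a shift and one or two add instructions before the memory access.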
21. Example RISC ISA: PowerPC
- Operand sizes
  - Four operand sizes: 1, 2, 4, or 8 bytes.
- Instruction encoding
  - Instruction set has 15 different formats with many minor variations.
  - All are 32 bits in length.
- 8 addressing modes
  - Register direct.
  - Immediate.
  - Register indirect.
  - Register indirect with immediate index (loads and stores).
  - Register indirect with register index (loads and stores).
  - Absolute (jumps).
  - Link register indirect (calls).
  - Count register indirect (branches).
22. Example RISC ISA: HP Precision Architecture, HP-PA
- Operand sizes
  - Five operand sizes ranging in powers of two from 1 to 16 bytes.
- Instruction encoding
  - Instruction set has 12 different formats.
  - All are 32 bits in length.
- 7 addressing modes
  - Register
  - Immediate
  - Base with displacement
  - Base with scaled index and displacement
  - Predecrement
  - Postincrement
  - PC-relative
23. Example RISC ISA: SPARC
- Operand sizes
  - Four operand sizes: 1, 2, 4, or 8 bytes.
- Instruction encoding
  - Instruction set has 3 basic instruction formats with 3 minor variations.
  - All are 32 bits in length.
- 5 addressing modes
  - Register indirect with immediate displacement.
  - Register indirect indexed by another register.
  - Register direct.
  - Immediate.
  - PC-relative.
24. Example RISC ISA: Compaq Alpha AXP
- 4 addressing modes
  - Register direct.
  - Immediate.
  - Register indirect with displacement.
  - PC-relative.
- Operand sizes
  - Four operand sizes: 1, 2, 4, or 8 bytes.
- Instruction encoding
  - Instruction set has 7 different formats.
  - All are 32 bits in length.
25. Which is winning?
- It turns out to be a non-issue.
- Intel clearly can get their machines to run fast (3 GHz).
- How?
  - By making the microarchitecture RISC-like and converting CISC to RISC during decode.
26. Another Example
- Transmeta Crusoe
  - Unknown architecture
  - You can't buy the chip without the software
  - Converts IA-32 to an intermediate machine ISA
  - Executes that machine ISA
27. 64-Bit Processors
28. 32-bit Computing
- In computer architecture, a word is defined as a unit of data that can be addressed and moved between the computer processor and the storage area.
- In 32-bit computing a word is 32 bits.
- Usually, the defined bit-length of a word is equivalent to the width of the computer's data bus (and registers), so that a word can be moved in a single operation from storage to the processor registers.
29. 32-bit Computing
- In a 32-bit microprocessor:
  - There are 32-bit general-purpose registers in the processor.
  - There are $2^{32}$ bytes = 4 GB of memory that can be addressed.
30. 64-bit Computing
- The simplest definition is that the processing word of the architecture is widened to 64 bits.
- The addressable memory increases from 4 GB to $2^{64}$ bytes, about 18 billion GB.
- The size of the registers is extended to 64 bits.
- Integer and address data up to 64 bits in length can now be operated on.
- $2^{64} \approx 1.8 \times 10^{19}$ integers can be represented with 64 bits vs. $2^{32} \approx 4.3 \times 10^{9}$ with 32 bits.
- Dynamic range has increased by a factor of 4.3 billion!
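The figures on this slide can be checked with exact integer arithmetic. The short sketch below assumes byte-addressable memory and uses decimal gigabytes ($10^9$ bytes) for the 64-bit figure, which is what the "18 billion GB" estimate corresponds to; the variable names are illustrative.

```python
# Exact integer arithmetic; byte-addressable memory is assumed.
bytes_32 = 2 ** 32
bytes_64 = 2 ** 64

gib_32 = bytes_32 // 2 ** 30      # 32-bit address space in binary gigabytes
gb_64 = bytes_64 / 10 ** 9        # 64-bit address space in decimal gigabytes
growth = bytes_64 // bytes_32     # how many times larger the 64-bit range is

print(gib_32)
print(f"{gb_64:.3g}")
print(growth)
```

The growth factor is itself $2^{32}$, which is why the increase in dynamic range matches the number of values a 32-bit word can hold.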
31. 64-bit Processor Basics
- Stepping up from 32 to 64 bits does not mean doubling performance.
- Certain applications will benefit, others will not.
32. What Applications Can Benefit Most From 64-bit?
- Large databases
- Business and scientific simulation and modeling programs
- Highly graphics-intensive software (CAD, 3-D games)
- Cryptography
- Etc.
33. Benefits of 64-bit Computing
- Allowing applications to store vast amounts of data in main memory.
- Allowing complex calculations with a high level of precision.
- Manipulating data and executing instructions in chunks that are twice as large as in 32-bit computing.
34. Intel Strategy
35. Intel's Approach to the Market
- Only producing a 64-bit processor for servers and workstations: Itanium.
- Intel believes there is not currently enough market demand for 64-bit in PCs.
- There is still room to continue to improve the Pentium 4 for desktop customers.
36. 64-bit Computing: Two industry-standard architectures, different usages
- X86: The platform of choice just got better
  - Broadest software choice
  - Versatile 32- and 64-bit support
  - Enterprise proven
- EPIC: Intel's highest-performance, most reliable server platform for RISC replacement
  - High-end performance
  - Reliability/data integrity
  - OS, HW, SW choice
37. Migration To 64-bit
[Flowchart: Step 1, validate IA32 binaries to run on a 64-bit OS (OK? yes/no); Step 2, make the code 64-bit clean; Step 3, compile; Step 4, optimize for X86 or optimize for EPIC.]
38. The Move to Intel Architecture: 64-bit and Multi-core
39. AMD Strategy
40. AMD's Approach
- Provide a bridge between the 32-bit present and the 64-bit future.
- Design processors for the server, workstation, and personal computing markets.
- Beyond 64 bits: improve the interaction of the processor with memory and I/O.
41. Windows for x64-based Systems: 32-bit and 64-bit on a single platform
- An AMD64-based processor can run both 32- and 64-bit Windows operating systems.
[Flowchart: START; boot up using the 32-bit BIOS; look at the OS. 32-bit path: load the 32-bit OS, then run 32-bit applications. 64-bit path: load the 64-bit OS, then run 32- and 64-bit applications.]
42. Before AMD64: Computing infrastructure islands on either side of the wall
- Platform A: 32-bit native-only system
- Platform B: 64-bit native-only system
43. AMD's Industry Vision: Compatible systems that bridge from 32- to 64-bit
AMD Single Platform
- Leverages existing infrastructure
- Runs existing 32-bit applications natively with unsurpassed performance
  - No tools or O/S work needed
- Runs existing 32-bit applications on a 64-bit O/S
  - Takes full advantage of 4 GB local memory
- Allows customers to migrate to 64-bit performance according to their schedule
- Low learning curve for users and support staff
44. The Role of Compilers
45. Compiler and ISA
- ISA decisions are no longer made just to ease assembly language (AL) programming.
- Because of HLLs, the ISA is a compiler target today.
- The performance of a computer is significantly affected by the compiler.
- Understanding today's compiler technology is critical to designing and efficiently implementing an instruction set.
- Architecture choice affects the code quality and the complexity of building a compiler for it.
46. Goal of the Compiler
- Primary goal is correctness
- Second goal is speed of the object code
- Others
  - Speed of the compilation
  - Ease of providing debug support
  - Interoperability among languages
  - Flexibility of the implementation: languages may not change much, but they do evolve (e.g., Fortran 66 → HPF)
Make the frequent cases fast and the rare case correct.
47. Typical Modern Compiler Structure
[Figure: compiler passes connected through a common intermediate representation. Dependency labels on successive passes: somewhat language dependent, largely machine independent; small language dependence, slight machine dependence; language independent, highly machine dependent.]
48. Typical Modern Compiler Structure (Cont.)
- Multi-pass structure: easier to write bug-free compilers
  - Transforms high-level, more abstract representations into progressively lower-level representations, eventually reaching the instruction set
- Compilers must make assumptions about the ability of later steps to deal with certain problems
  - Ex. 1: choose which procedure calls to expand inline before knowing the exact size of the procedure being called
  - Ex. 2: global common sub-expression elimination
    - Find two instances of an expression that compute the same value and save the result of the first one in a temporary
    - The temporary must be in a register, not memory (performance)
    - Assume the register allocator will allocate the temporary to a register
49. Optimization Types
- High level: done at source code level
  - Procedure called only once, so put it in-line and save the CALL
- Local: done on a basic sequential block (straight-line code)
  - Common sub-expressions produce the same value
  - Constant propagation: replace a constant-valued variable with the constant; saves multiple variable accesses with the same value
- Global: same as local but done across branches
  - Code motion: remove code from loops that computes the same value on each pass and put it before the loop
  - Simplify or eliminate array addressing calculations in loops
50. Optimization Types (Cont.)
- Register allocation
  - Use graph coloring (graph theory) to allocate registers: NP-complete
  - Heuristic algorithm works best when there are at least 16 (and preferably more) registers
- Processor-dependent optimization
  - Strength reduction: replace a multiply with a shift-and-add sequence
  - Pipeline scheduling: reorder instructions to minimize pipeline stalls
  - Branch offset optimization: reorder code to minimize branch offsets
51. Register Allocation
- One of the most important optimizations
- Based on graph coloring techniques
  - Construct a graph of possible allocations to a register
  - Use the graph to allocate registers efficiently
  - Goal is to achieve 100% register allocation for all active variables.
  - Graph coloring works best when there are at least 16 general-purpose registers available for integers and more for floating-point variables.
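The idea behind graph-coloring allocation can be sketched with a simple greedy heuristic. This is a minimal illustration, not the allocator of any particular compiler: the interference graph, the variable names, and the function `color_registers` are all hypothetical. Two variables are connected when they are live at the same time, and connected variables must not share a register.

```python
def color_registers(interference, num_regs):
    """Greedy coloring of an interference graph.

    Each variable gets the lowest-numbered register not already used
    by a colored neighbor. None marks a variable that must be spilled
    to memory because no register is free.
    """
    assignment = {}
    # Visit the most constrained variables (highest degree) first.
    for var in sorted(interference, key=lambda v: -len(interference[v])):
        taken = {assignment[n] for n in interference[var]
                 if assignment.get(n) is not None}
        free = [r for r in range(num_regs) if r not in taken]
        assignment[var] = free[0] if free else None
    return assignment

# Hypothetical interference graph: an edge means "live at the same time".
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

with_three = color_registers(graph, num_regs=3)
with_two = color_registers(graph, num_regs=2)
print(with_three)
print(with_two)
```

With three registers every variable in this graph is allocated; with only two, the triangle a-b-c cannot be colored and one variable is spilled, which is the pressure that a larger register file relieves.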
52. Constant propagation
    a = 5;
    ...              // no change to a so far
    if (a < b) { . . . }
The condition (a < b) can be replaced by (5 < b). This could free a register when the comparison is executed. When applied systematically, constant propagation can improve the code significantly.
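How a compiler might apply this within a single basic block can be sketched as follows. The tuple-based intermediate form `(dest, op, left, right)` and the operation names `"const"` and `"lt"` are assumptions made for this illustration, not a real compiler's representation.

```python
def propagate_constants(block):
    """Replace uses of variables whose constant value is known.

    block: list of (dest, op, left, right) tuples for straight-line code.
    Operands are variable names (str), integer constants, or None.
    The op "const" assigns `left` to `dest`; `right` is unused.
    """
    known = {}   # variable name -> constant value currently known
    out = []
    for dest, op, left, right in block:
        left = known.get(left, left)
        right = known.get(right, right)
        out.append((dest, op, left, right))
        if op == "const" and isinstance(left, int):
            known[dest] = left
        else:
            known.pop(dest, None)   # dest no longer holds a known constant
    return out

block = [
    ("a", "const", 5, None),   # a = 5
    ("t", "lt", "a", "b"),     # t = (a < b)
]
optimized = propagate_constants(block)
print(optimized)
```

After the pass, the comparison reads the literal 5 instead of the variable `a`, so `a` need not occupy a register at that point.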
53. Strength reduction
Example:
    for (j = 0; j < n; j++) A[j] = 2*j;
    for (i = 0; 4*i < n; i++) A[4*i] = 0;
An optimizing compiler can replace multiplication by 4 with addition of 4. This is an example of strength reduction. In general, scalar multiplications can be replaced by additions.
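The replacement of a multiplication by 4 with repeated addition of 4 can be sketched as two loops that visit the same array indices. The function names are illustrative; the second function maintains a running index `k` that always equals `4 * i`, which is the transformation an optimizing compiler would perform.

```python
def indices_with_multiply(n):
    """Visit indices 0, 4, 8, ... below n, multiplying on every pass."""
    out = []
    i = 0
    while 4 * i < n:
        out.append(4 * i)   # one multiplication per iteration
        i += 1
    return out

def indices_with_addition(n):
    """Same indices, with the multiplication replaced by an addition."""
    out = []
    k = 0                   # invariant: k == 4 * i
    while k < n:
        out.append(k)
        k += 4              # cheaper operation, same sequence of values
    return out

print(indices_with_multiply(18))
print(indices_with_addition(18))
```

Both calls produce the same list, which is the correctness condition for the transformation.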
54. Major Types of Optimizations and Example in Each Class
55. Change in IC Due to Optimization
- Level 1: local optimizations, code scheduling, and local register allocation
- Level 2: global optimization, loop transformation (software pipelining), global register allocation
- Level 3: procedure integration
56. How Can Architects Help Compiler Writers?
- Provide regularity
  - Address modes, operations, and data types should be orthogonal (independent) of each other
  - Simplifies code generation, especially multi-pass
  - Counterexample: restricting which registers can be used for certain classes of instructions
- Provide primitives, not solutions
  - Special features that match an HLL construct are often unusable
  - What works in one language may be detrimental to others
57. How Can Architects Help Compiler Writers? (Cont.)
- Simplify trade-offs among alternatives
  - How to write good code? What is good code?
  - Metric: IC or code size (no longer true, because of caches and pipelining)
  - Anything that makes code sequence performance obvious is a definite win!
  - How many times should a variable be referenced before it is cheaper to load it into a register?
- Provide instructions that bind the quantities known at compile time as constants
  - Don't hide compile-time constants
  - Instructions that work off of something the compiler thinks could be a run-time determined value handcuff the optimizer
58. Short Summary -- Compilers
- The ISA should have at least 16 GPRs (not counting FP registers) to simplify allocation of registers using graph coloring
- Orthogonality suggests all supported addressing modes apply to all instructions that transfer data
- Simplicity: understand that less is more in ISA design
  - Provide primitives instead of solutions
  - Simplify trade-offs between alternatives
  - Don't bind constants at runtime
- Counterexample: lack of compiler support for multimedia instructions