Title: EE37E2005
1Lesson 5 Processor Design
- Topic 1 Methods and Concepts
2 Introduction
- References
- - Modern Processor Design Book (pp. 1-16)
- - Computer Organization and Design Book (pp. 54-89)
3- While introducing this topic we will focus on these points:
- Evolution of microprocessors
- Instruction set processor design
- Principles
- Microprocessors are instruction set processors (ISPs).
- An ISP executes instructions from a predefined instruction set.
- A microprocessor's functionality is fully characterized by the instruction set it is capable of executing.
- This predefined instruction set is also called the instruction set architecture (ISA).
4- An ISA serves as an interface between software and hardware.
- In terms of processor design methodology, an ISA is the specification of the design, while the microprocessor or ISP is the implementation of a design.
5Computer System Components
[Figure: block diagram of computer system components. Recoverable labels follow.]
- CPU: 1000 MHz - 3 GHz (a multiple of system bus speed); pipelined (7-21 stages); superscalar (max 4 instructions/cycle), single-threaded; dynamically scheduled or VLIW; dynamic and static branch prediction
- Caches: L1, L2, L3
- System Bus: support for one or more CPUs
- - Examples: Alpha, AMD K7: EV6, 400 MHz; Intel PII, PIII: GTL, 133 MHz; Intel P4: 800 MHz
- Memory Bus
- - SDRAM PC100/PC133: 100-133 MHz, 64-128 bits wide, 2-way interleaved, 900 MBytes/sec
- - Double Data Rate (DDR) SDRAM PC3200: 400 MHz (effective 200x2), 64-128 bits wide, 4-way interleaved, 3.2 GBytes/sec (second half 2002)
- - RAMbus DRAM (RDRAM) PC800, PC1060: 400-533 MHz (DDR), 16-32 bits wide channel, 1.6 - 3.2 GBytes/sec (per channel)
- I/O Buses (through adapters)
- - Example: PCI-X 133 MHz; PCI 33-66 MHz, 32-64 bits wide, 133-1024 MBytes/sec
- Controllers and I/O Devices: disks, displays, keyboards
- Networks: Fast Ethernet, Gigabit Ethernet, ATM, Token Ring ..
- Chipset: North Bridge, South Bridge
6Computer System Components
Enhanced CPU Performance Capabilities
- Support for Simultaneous Multithreading (SMT): Alpha EV8.
- VLIW and intelligent compiler techniques: Intel/HP EPIC IA-64.
- More advanced branch prediction techniques.
- Chip Multiprocessors (CMPs): The Hydra Project, IBM Power 4, 5.
- Vector processing capability: Vector Intelligent RAM (VIRAM), or multimedia ISA extension.
- Digital Signal Processing (DSP) capability in system.
- Re-configurable computing hardware capability in system.
[Figure: the block diagram of the previous slide (CPU; L1, L2, L3 caches; System Bus; Memory Bus; North Bridge and South Bridge chipset; adapters, I/O Buses, controllers, I/O devices), annotated as follows.]
- CPU: SMT, CMP
- Caches: memory latency reduction; conventional and block-based trace cache
- Memory: integrate memory controller and a portion of main memory with the CPU (Intelligent RAM); integrated memory controller: AMD Opteron, IBM Power5
- I/O Devices: disks (RAID), displays, keyboards, networks
7Recent Trends in Computer Design
- The cost/performance ratio of computing systems has seen a steady decline due to advances in:
- - Integrated circuit technology: decreasing feature size, λ
- - - Clock rate improves roughly proportional to improvement in λ
- - - Number of transistors improves proportional to λ² (or faster)
- - Architectural improvements in CPU design.
- Microprocessor systems directly reflect IC improvement in terms of a yearly 35% to 55% improvement in performance.
- Assembly language has been mostly eliminated and replaced by other alternatives such as C or C++.
- Standard operating systems (UNIX, NT) lowered the cost of introducing new architectures.
- Emergence of RISC architectures and RISC-core architectures.
- Adoption of quantitative approaches to computer design based on empirical performance observations.
8Microprocessor Architecture Trends
[Figure: microprocessor architecture trends. Recoverable labels: CMPs; SMT; SMT/CMPs (e.g. IBM Power5 in 2004)]
9 Evolution of microprocessors
[Figure 1: Evolution of microprocessors. Recoverable labels follow.]
- Graduation Window
- Transistors: Alpha 21264: 15 million; Pentium Pro: 5.5 million; PowerPC 620: 6.9 million; Alpha 21164: 9.3 million; Sparc Ultra: 5.2 million
- Moore's Law
- - CMOS improvements
- - Die size: 2X every 3 yrs
- - Line width: halve / 4-7 yrs
10- Three decades of the history of microprocessors
tell a truly remarkable story of advances in the
computer industry (Table 1).
Table 1. The amazing decades of the evolution of
microprocessors
11Hierarchy of Computer Architecture
High-Level Language Programs
Assembly Language Programs
Software
Machine Language Program
Software/Hardware Boundary
Hardware
Microprogram
Register Transfer Notation (RTN)
Logic Diagrams
Circuit Diagrams
12 Instruction Set Processor Design
- Critical to an ISP is the instruction set
architecture, which specifies the functionality
that must be implemented by the instruction set
processor (ISP).
13The Design Process
- "To Design Is To Represent"
- - Design activity yields description/representation of an object
- - Traditional craftsman does not distinguish between the conceptualization and the artifact
- - Separation comes about because of complexity
- - Concept is captured in one or more representation languages
- - This process IS design
- Design Begins With Requirements
- - Functional Capabilities: what it will do
- - Performance Characteristics: Speed, Power, Area, Cost, . . .
14Design Process (cont.)
- Design Finishes As Assembly
- - Design understood in terms of components and how they have been assembled
- - Top-down decomposition of complex functions (behaviors) into more primitive functions
- - Bottom-up composition of primitive building blocks into more complex assemblies
[Figure: component hierarchy with labels CPU; Datapath, Control; ALU, Regs, Shifter; Nand Gate]
Design is a "creative process," not a simple method
15Design as Search
[Figure: search tree with labels Problem A; Strategy 1, Strategy 2; SubProb 1, SubProb2, SubProb3; BB1, BB2, BB3, ..., BBn]
16 Instruction Set Architecture (subset of Computer Architecture)
- "... the attributes of a computing system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation." (Amdahl, Blaauw, and Brooks, 1964)
- Organization of Programmable Storage
- Data Types and Data Structures
- Encodings and Representations
- Instruction Set
- Instruction Formats
- Modes of Addressing and Accessing Data Items and Instructions
- Exceptional Conditions
17The Instruction Set a Critical Interface
software
instruction set
hardware
Figure 2 ISA
18Dynamic Static Interface
- We have discussed two critical roles played by the ISA:
- - Contract between software and hardware, which facilitates the development of programs and machines
- - Specification for microprocessor design
- The third role is an associated definition of an interface that separates what is done statically at compile time from what is done dynamically at run time. This interface is called the dynamic-static interface (DSI).
19(Software)
Program
Compiler complexity
Exposed to software
Static
Architecture (DSI)
Hardware complexity
Hidden in hardware
Dynamic
Machine
(Hardware)
Figure 3 The dynamic-static feature
20Computer Architecture Topics
Input/Output and Storage
Disks, WORM, Tape
RAID
Emerging Technologies Interleaving Bus protocols
DRAM
Coherence, Bandwidth, Latency
Memory Hierarchy
L2 Cache
L1 Cache
Addressing, Protection, Exception Handling
VLSI
Instruction Set Architecture
Pipelining and Instruction Level Parallelism
Pipelining, Hazard Resolution, Superscalar,
Reordering, Prediction, Speculation, Vector, DSP
21 Principles of Processor Performance
22Definitions
- Performance is in units of things per second
- - bigger is better
- If we are primarily concerned with response time, "X is n times faster than Y" means
  ExTime(Y) / ExTime(X) = Performance(X) / Performance(Y) = n
23Cycles Per Instruction
IC = Instruction Count; CPI = Clock cycles Per Instruction
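A minimal sketch of how IC and CPI combine with the clock cycle time, using the standard relation CPU time = IC x CPI x clock cycle time. The function name and the sample numbers are illustrative assumptions, not values from the lecture.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = IC * CPI * clock cycle time."""
    cycle_time = 1.0 / clock_rate_hz  # seconds per clock cycle
    return instruction_count * cpi * cycle_time


# Hypothetical program: 2 million instructions, CPI of 1.5, 1 GHz clock.
t = cpu_time(2_000_000, 1.5, 1e9)
print(f"CPU time: {t * 1e3:.3f} ms")
```

Halving any one of the three factors halves the CPU time, which is why IC, CPI and clock rate are usually discussed together.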
24Cycles Per Instruction
We may separate the contribution of each type of instruction to the execution time by defining, for each instruction type i, an instruction count IC_i and an average cycle count CPI_i, so that
  CPU time = Cycle time x SUM over i of (CPI_i x IC_i)
Processor pipelining and memory interactions limit the accuracy of this approach, but it's a good first guess. For accuracy, it is necessary to simulate the instructions of an entire program with issue, pipeline and memory interactions.
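The per-type breakdown can be sketched as below. This is a first-guess calculation only, as the slide warns; the instruction classes, their CPI values and their counts are made-up numbers for illustration.

```python
def total_cycles(mix):
    """Sum CPI_i * IC_i over all instruction types."""
    return sum(cpi * count for cpi, count in mix.values())


def average_cpi(mix):
    """Overall CPI = total cycles / total instruction count."""
    instructions = sum(count for _, count in mix.values())
    return total_cycles(mix) / instructions


# Hypothetical mix: instruction type -> (CPI_i, IC_i)
mix = {
    "alu": (1, 500),
    "load": (2, 200),
    "store": (2, 100),
    "branch": (3, 200),
}

print("total cycles:", total_cycles(mix))
print("average CPI:", average_cpi(mix))
```

Multiplying the total cycle count by the clock cycle time gives the estimated CPU time.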
25Aspects of CPU Performance (CPU Law)
26Amdahl's Law
- Speedup due to enhancement E:
  Speedup(E) = ExTime without E / ExTime with E = Performance with E / Performance without E
- Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected
- E.g. special instructions, memory, I/O, parallel processing
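27Amdahl's Law
  ExTime_new = ExTime_old x ((1 - F) + F / S)
  Speedup_overall = ExTime_old / ExTime_new = 1 / ((1 - F) + F / S)
28Amdahl's Law
- Example: floating point instructions are improved to run 2X faster, but only 10% of actual instructions are FP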
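The floating point example can be checked with a short calculation. This sketch assumes the usual form of Amdahl's Law, Speedup = 1 / ((1 - F) + F / S), with F the enhanced fraction and S the speedup of that fraction; the function name is an illustrative choice.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction F of the task is accelerated by a factor S."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# FP instructions run 2X faster, but only 10% of instructions are FP.
overall = amdahl_speedup(0.10, 2.0)
print(f"overall speedup: {overall:.3f}")  # 1 / 0.95, about 1.053
```

Even an infinitely fast FP unit could not push the overall speedup past 1 / 0.9, because 90% of the work is unaffected.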
29Topic 2 Instruction Set Architecture Design
- Adapted from Prof. Jerry Breecher's notes and my CS21Q notes
- (http://babbage.clarku.edu/jbreecher/arch/arch.html)
30Introduction
- 7.1 Introduction
- 7.2 Classifying Instruction Set Architectures
- 7.3 Memory Addressing
- 7.4 Operations in the Instruction Set
- 7.5 Type and Size of Operands
- 7.6 Encoding an Instruction Set
- 7.7 The Role of Compilers
- 7.8 The MIPS Architecture and Bonus
- 7.9 Endianness
31Introduction
- The Instruction Set Architecture is that portion of the machine visible to the assembly-level programmer or to the compiler writer.
Questions:
- What are the advantages and disadvantages of various instruction set alternatives?
- How do languages and compilers affect ISA?
32Classifying Instruction Set Architectures
- Classifications can be by
- Stack/accumulator/register
- Number of memory operands.
- Number of total operands.
33Instruction Set Architectures
Basic ISA Classes
- Accumulator
- - 1 address: add A (acc ← acc + mem[A])
- - 1+x address: addx A (acc ← acc + mem[A + x])
- Stack
- - 0 address: add (tos ← tos + next)
- General Purpose Register
- - 2 address: add A B (EA(A) ← EA(A) + EA(B))
- - 3 address: add A B C (EA(A) ← EA(B) + EA(C))
- Load/Store
- - 0 memory operands: load R1, Mem1; load R2, Mem2; add R1, R2
- - 1 memory operand: add R1, Mem2
General purpose register ALU instructions can have two or three operands.
ALU instructions can have 0, 1, 2, or 3 memory operands. Shown here are cases of 0 and 1.
34Instruction Set Architectures
Basic ISA Classes
The results of the different address classes are easiest to see with the examples here, all of which implement the sequences for C = A + B.
Registers are the class that won out. The more registers on the CPU, the better.
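As a rough illustration of how the classes differ, the sketch below mimics the usual textbook sequences for C = A + B on four machine styles. The instruction sequences in the comments and the function names are assumptions made for illustration, not a transcription of the slide's figure.

```python
def stack_machine(mem):
    """Stack: Push A; Push B; Add; Pop C (operands are implicit)."""
    stack = []
    stack.append(mem["A"])            # Push A
    stack.append(mem["B"])            # Push B
    top, nxt = stack.pop(), stack.pop()
    stack.append(top + nxt)           # Add: tos <- tos + next
    mem["C"] = stack.pop()            # Pop C
    return mem["C"]


def accumulator_machine(mem):
    """Accumulator: Load A; Add B; Store C (one implicit register)."""
    acc = mem["A"]                    # Load A
    acc = acc + mem["B"]              # Add B: acc <- acc + mem[B]
    mem["C"] = acc                    # Store C
    return mem["C"]


def register_memory_machine(mem):
    """Register-memory: Load R1, A; Add R1, B; Store C, R1."""
    r1 = mem["A"]                     # Load R1, A
    r1 = r1 + mem["B"]                # Add R1, B (one memory operand)
    mem["C"] = r1                     # Store C, R1
    return mem["C"]


def load_store_machine(mem):
    """Load/store: ALU instructions touch registers only."""
    regs = {}
    regs["R1"] = mem["A"]                  # Load R1, A
    regs["R2"] = mem["B"]                  # Load R2, B
    regs["R3"] = regs["R1"] + regs["R2"]   # Add R3, R1, R2
    mem["C"] = regs["R3"]                  # Store C, R3
    return mem["C"]


for machine in (stack_machine, accumulator_machine,
                register_memory_machine, load_store_machine):
    print(machine.__name__, machine({"A": 3, "B": 4}))
```

All four compute the same result; what changes is how many operands each instruction names explicitly and how many of them may live in memory.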
35Instruction Set Architectures
Intel 80x86 Integer Registers
36Memory Addressing
- Sections Include
- Interpreting Memory Addresses
- Addressing Modes
- Displacement Address Mode
- Immediate Address Mode
37Memory Addressing
Interpreting Memory Addresses
- What object is accessed as a function of the address and length?
- Objects have byte addresses: an address refers to the number of bytes counted from the beginning of memory.
- Little Endian puts the byte whose address is xx00 at the least significant position in the word.
- Big Endian puts the byte whose address is xx00 at the most significant position in the word.
- Alignment: data must be aligned on a boundary equal to its size. Misalignment typically results in an alignment fault that must be handled by the Operating System.
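The alignment rule (an object must sit on a boundary equal to its size) reduces to a remainder test. A minimal sketch, with a hypothetical helper name and example addresses:

```python
def is_aligned(address, size):
    """True if an object of `size` bytes at byte address `address` is aligned."""
    return address % size == 0


print(is_aligned(0x100, 4))  # a 4-byte word at 0x100
print(is_aligned(0x102, 4))  # a 4-byte word at 0x102
print(is_aligned(0x102, 2))  # a 2-byte half word at 0x102
```

On a machine that enforces alignment, the second access is the kind that would raise an alignment fault.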
38Memory Addressing
Addressing Modes
- This table shows the most common modes. A more
complete set is in Figure 2.6
39Memory Addressing
Displacement Addressing Mode
- How big should the displacement be?
- For addresses that do fit in the displacement size:
- - Add R4, 10000(R0)
- For addresses that don't fit in the displacement size, the compiler must do the following:
- - Load R1, address
- - Add R4, 0(R1)
- How big this should be depends on typical displacements.
- On both IA32 and DLX, the space allocated is 16 bits.
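The compiler's choice between the two sequences can be sketched as a range check. This assumes a signed two's complement displacement field and a base register R0 that reads as zero; both are assumptions for illustration, and the helper names are hypothetical.

```python
def fits_displacement(value, bits=16):
    """True if `value` fits in a signed two's complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def access_sequence(address, bits=16):
    """Return the pseudo-assembly needed to add the word at `address` to R4."""
    if fits_displacement(address, bits):
        return [f"Add R4, {address}(R0)"]
    # Too large for the displacement field: build the address in a register first.
    return [f"Load R1, {address}", "Add R4, 0(R1)"]


print(access_sequence(10000))
print(access_sequence(100000))
```

A wider displacement field makes the one-instruction case more common, at the cost of instruction bits.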
40Memory Addressing
Immediate Address Mode
- Used where we want a numerical value encoded directly in an instruction.
At high level:
  a = b + 3;
  if (a > 17) goto Addr
At assembler level:
  Load R2, 3
  Add R0, R1, R2
  Load R2, 17
  CMPBGT R1, R2
  Load R1, Address
  Jump (R1)
So how would you get a 32-bit value into a register?
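One common answer on machines with 16-bit immediates (MIPS does this with a load-upper-immediate followed by an OR-immediate) is to build the constant in two halves. The sketch below only models the arithmetic; the function names are illustrative and the slide itself does not give an answer.

```python
def split_imm32(value):
    """Split a 32-bit constant into (upper 16 bits, lower 16 bits)."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF


def build_imm32(value):
    """Rebuild the constant the way a two-instruction sequence would."""
    upper, lower = split_imm32(value)
    reg = upper << 16   # step 1: place the upper half, low 16 bits zero
    reg |= lower        # step 2: OR in the lower half
    return reg


upper, lower = split_imm32(0x12345678)
print(hex(upper), hex(lower), hex(build_imm32(0x12345678)))
```

Each half fits in a 16-bit immediate field, so two fixed-length instructions suffice.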
41Operations In The Instruction Set
- Sections Include
- Detailed information about types of instructions.
- Instructions for Control Flow (conditional
branches, jumps)
42Operations In The Instruction Set
Operator Types
- Arithmetic and logical: and, add
- Data transfer: move, load
- Control: branch, jump, call
- System: system call, traps
- Floating point: add, mul, div, sqrt
- Decimal: add, convert
- String: move, compare
- Multimedia: 2D, 3D? e.g., Intel MMX and Sun VIS
43Operations In The Instruction Set
Control Instructions
Conditional branches are 20% of all instructions!!
- Control instruction issues:
- - taken or not
- - where is the target
- - link return address
- - save or restore
- Instructions that change the PC:
- - (conditional) branches, (unconditional) jumps
- - function calls, function returns
- - system calls, system returns
44Type And Size of Operands
- The type of the operand is usually encoded in the opcode: an LDW implies loading of a word.
- Common sizes are:
- - Character (1 byte)
- - Half word (16 bits)
- - Word (32 bits)
- - Single Precision Floating Point (1 word)
- - Double Precision Floating Point (2 words)
- Integers are two's complement binary.
- Floating point is IEEE 754.
- Some languages (like COBOL) use packed decimal.
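These sizes and encodings can be inspected from Python's standard `struct` module, whose "<" prefix selects fixed standard sizes. This is only an illustration of the listed operand types; the dictionary keys and the helper name are assumptions.

```python
import struct

# Standard sizes in bytes for the operand types listed above.
SIZES = {
    "character": struct.calcsize("<b"),
    "half word": struct.calcsize("<h"),
    "word": struct.calcsize("<i"),
    "single precision": struct.calcsize("<f"),
    "double precision": struct.calcsize("<d"),
}


def twos_complement(value, bits):
    """Bit pattern of a signed integer in `bits`-bit two's complement."""
    return value & ((1 << bits) - 1)


print(SIZES)
print(hex(twos_complement(-1, 16)))    # all ones in 16 bits
print(struct.pack(">f", 1.0).hex())    # IEEE 754 single precision encoding of 1.0
```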
45The MIPS Architecture
- MIPS is very RISC oriented.
46The MIPS Architecture
- Addressing Modes
- - Immediate
- - Displacement
- - (Register Mode used only for ALU)
There's MIPS 32, which we learned in CS140:
- 32-bit byte addresses, aligned
- Load/store only, displacement addressing
- Standard datatypes
- 3 fixed-length formats
- 32 32-bit GPRs (r0 = 0)
- 16 64-bit (32 32-bit) FPRs
- FP status register
- No condition codes
- Data transfer
- - load/store word, load/store byte/halfword signed?
- - load/store FP single/double
- - moves between GPRs and FPRs
- ALU
- - add/subtract signed? immediate?
- - multiply/divide signed?
- - and, or, xor immediate?; shifts: ll, rl, ra immediate?
- - sets immediate?
There's MIPS 64, the current arch.:
- Standard datatypes
- 4 fixed-length formats (8, 16, 32, 64)
- 32 64-bit GPRs (r0 = 0)
- 64 64-bit FPRs
47The MIPS Architecture
- Control
- - branches == 0, <> 0
- - conditional branch testing FP bit
- - jump, jump register
- - jump and link, jump and link register
- - trap, return-from-exception
- Floating Point
- - add/sub/mul/div
- - single/double
- - fp converts, fp set
48The MIPS Architecture
49Byte Ordering
- How should bytes within a multi-byte word be ordered in memory?
- Conventions
- - Suns, Macs are Big Endian machines
- - - Least significant byte has highest address
- - Alphas, PCs are Little Endian machines
- - - Least significant byte has lowest address
50Byte Ordering Example
- Big Endian
- - Least significant byte has highest address
- Little Endian
- - Least significant byte has lowest address
- Example
- - Variable x has 4-byte representation 0x01234567
- - Address given by &x is 0x100

Address        0x100  0x101  0x102  0x103
Big Endian     01     23     45     67
Little Endian  67     45     23     01
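The same layout can be reproduced with Python's built-in `int.to_bytes`. The `layout` helper and the base address 0x100 mirror the example above; the helper name is an illustrative choice.

```python
x = 0x01234567


def layout(data, base=0x100):
    """Map each byte of `data` to its memory address, starting at `base`."""
    return {hex(base + i): f"{byte:02x}" for i, byte in enumerate(data)}


big = x.to_bytes(4, byteorder="big")        # most significant byte first
little = x.to_bytes(4, byteorder="little")  # least significant byte first

print("Big Endian:   ", layout(big))
print("Little Endian:", layout(little))
```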
51Machine-Level Code Representation
- Encode Program as Sequence of Instructions
- - Each simple operation
- - - Arithmetic operation
- - - Read or write memory
- - - Conditional branch
- Instructions encoded as bytes
- - Alphas, Suns, Macs use 4-byte instructions
- - - Reduced Instruction Set Computer (RISC)
- - PCs use variable-length instructions
- - - Complex Instruction Set Computer (CISC)
- Different instruction types and encodings for different machines
- - Most code not binary compatible
- Programs are Byte Sequences Too!
52Classification of Processors
- We can classify processors according to the areas in which they are mostly used.
- We can identify four different groups of processors:
- - General purpose processors, which are used in building computers
- - Digital signal processors, which are designed specifically for signal processing
- - Microcontrollers, which are small microcomputers that integrate on the same chip a processor core plus I/O elements and a small amount of memory
- - Application-specific processors, which are designed to perform a specific function (e.g. network processors)
53 General Purpose Processors
- These processors are used to build major computer platforms.
- We can name:
- - Intel / AMD based computers, also called IBM compatible
- - Macintosh computers built using PowerPC processors
- - Sun machines that use UltraSPARC processors
54Examples of General Purpose Processors
55DSP
- Digital Signal Processing (DSP) is used in a wide variety of applications, and it is hard to find a good definition that is general.
- We can start with dictionary definitions of the words:
- - Digital: operating by the use of discrete signals to represent data in the form of numbers
- - Signal: a variable parameter by which information is conveyed through an electronic circuit
- - Processing: to perform operations on data according to programmed instructions
- Which leads us to a simple definition of Digital Signal Processing:
- - changing or analyzing information which is measured as discrete sequences of numbers
56- Note two unique features of Digital Signal Processing as opposed to plain old ordinary digital processing:
- - signals come from the real world: this intimate connection with the real world leads to many unique needs, such as the need to react in real time and a need to measure signals and convert them to digital numbers
- - signals are discrete, which means the information in between discrete samples is lost
- The advantages of DSP are common to many digital systems and include:
- - Versatility
- - - digital systems can be reprogrammed for other applications (at least where programmable DSP chips are used)
- - - digital systems can be ported to different hardware (for example a different DSP chip or board-level product)
- - Repeatability
- - - digital systems can be easily duplicated
- - - digital systems do not depend on strict component tolerances
- - - digital system responses do not drift with temperature
- - Simplicity
- - - some things can be done more easily digitally than with analogue systems
57- DSP is used in a very wide variety of applications.
- But most share some common features:
- - they use a lot of math (multiplying and adding signals)
- - they deal with signals that come from the real world
- - they require a response in a certain time
- Where general purpose DSP processors are concerned, most applications deal with signal frequencies that are in the audio range.
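The "multiplying and adding signals" pattern is typified by the multiply-accumulate loop of a finite impulse response (FIR) filter. The slides do not name a specific algorithm, so the FIR filter, the function name and the sample values below are illustrative assumptions.

```python
def fir_filter(samples, coefficients):
    """Direct-form FIR: y[n] = sum over k of h[k] * x[n - k], with x[n] = 0 for n < 0."""
    output = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(coefficients):
            if n - k >= 0:
                acc += h * samples[n - k]  # one multiply-accumulate step
        output.append(acc)
    return output


# Hypothetical input: a short ramp smoothed by a 2-tap moving average.
smoothed = fir_filter([1.0, 3.0, 5.0, 7.0], [0.5, 0.5])
print(smoothed)
```

Each output sample costs one multiply and one add per coefficient, and in a real-time system that work must finish before the next sample arrives.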