Computer Architecture presentation

Title: Computer Architecture

1
Computer Architecture

Lecture 3
Basic Fundamentals
and
Instruction Sets

2
The Task of a Computer Designer

1.1 Introduction
1.2 The Task of a Computer Designer
1.3 Technology and Computer Usage Trends
1.4 Cost and Trends in Cost
1.5 Measuring and Reporting Performance
1.6 Quantitative Principles of Computer Design
1.7 Putting It All Together: The Concept of Memory Hierarchy

[Figure: the designer's cycle. Evaluate Existing Systems for Bottlenecks (Benchmarks), Simulate New Designs and Organizations (Workloads), Implement Next Generation System (Implementation Complexity), with Technology Trends as an input.]
3
Technology and Computer Usage Trends

When building a cathedral, numerous very practical considerations need to be taken into account:
available materials
worker skills
willingness of the client to pay the price

Similarly, Computer Architecture is about working within constraints:
What will the market buy?
Cost/Performance
Tradeoffs in materials and processes

4
Trends

Gordon Moore (co-founder of Intel) observed in 1965 that the number of transistors that could be crammed on a chip was doubling roughly every year.
In 1975 he revised the rate to a doubling about every two years, and the trend has broadly CONTINUED since then.

5
Measuring And Reporting Performance

This section talks about:
Metrics: how do we describe in a numerical way the performance of a computer?
What tools do we use to find those metrics?

6
Metrics

Time to run the task (ExTime):
Execution time, response time, latency
Tasks per day, hour, week, sec, ns (Performance):
Throughput, bandwidth

7
Metrics - Comparisons

"X is n times faster than Y" means
\[ n = \frac{\mathrm{ExTime}(Y)}{\mathrm{ExTime}(X)} = \frac{\mathrm{Performance}(X)}{\mathrm{Performance}(Y)} \]
Speed of Concorde vs. Boeing 747
Throughput of Boeing 747 vs. Concorde

8
Metrics - Comparisons

Pat has developed a new product, "rabbit", whose performance she wishes to determine. There is special interest in comparing the new product, rabbit, to the old product, turtle, since the product was rewritten for performance reasons. (Pat had used Performance Engineering techniques and thus knew that rabbit was "about twice as fast" as turtle.) The measurements showed:

Performance Comparisons
Product   Transactions/second   Seconds/transaction   Seconds to process transaction
Turtle    30                    0.0333                3
Rabbit    60                    0.0166                1

Which of the following statements reflect the performance comparison of rabbit and turtle?

o Rabbit is 100% faster than turtle.
o Rabbit is twice as fast as turtle.
o Rabbit takes 1/2 as long as turtle.
o Rabbit takes 1/3 as long as turtle.
o Rabbit takes 100% less time than turtle.
o Rabbit takes 200% less time than turtle.
o Turtle is 50% as fast as rabbit.
o Turtle is 50% slower than rabbit.
o Turtle takes 200% longer than rabbit.
o Turtle takes 300% longer than rabbit.
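The statements above can be checked numerically. The following is a minimal sketch that assumes the definition from the previous slide (n is the ratio of execution times, or equivalently of performance); the function and variable names are illustrative and not part of the original slides.

```python
def times_faster(ex_time_slow, ex_time_fast):
    """Return n such that the fast system is 'n times faster' (ratio of execution times)."""
    return ex_time_slow / ex_time_fast


def percent_less_time(ex_time_old, ex_time_new):
    """Percentage reduction in execution time when moving from old to new."""
    return 100.0 * (ex_time_old - ex_time_new) / ex_time_old


# Measurements taken from the Performance Comparisons table.
turtle = {"tps": 30, "latency_s": 3.0}
rabbit = {"tps": 60, "latency_s": 1.0}

# Throughput and latency give different answers for the same two products.
throughput_ratio = rabbit["tps"] / turtle["tps"]
latency_ratio = times_faster(turtle["latency_s"], rabbit["latency_s"])
saved = percent_less_time(turtle["latency_s"], rabbit["latency_s"])

print(f"Throughput: rabbit handles {throughput_ratio:.1f}x the transactions per second")
print(f"Latency: rabbit is {latency_ratio:.1f}x faster per transaction")
print(f"Rabbit takes {saved:.1f}% less time per transaction")
```

Note that "takes 100% less time" would mean zero time, so a statement of that form cannot hold for any finite speedup.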
9
Metrics - Throughput
10
Methods For Predicting Performance

Benchmarks, Traces, Mixes
Hardware: cost, delay, area, power estimation
Simulation (many levels):
ISA, RT, Gate, Circuit
Queuing Theory
Rules of Thumb
Fundamental Laws/Principles

11
Benchmarks
SPEC: System Performance Evaluation Cooperative

First Round, 1989:
10 programs yielding a single number (SPECmarks)
Second Round, 1992:
SPECint92 (6 integer programs) and SPECfp92 (14 floating point programs)
Compiler flags unlimited. Flags from the March 93 report for the DEC 4000 Model 610:
spice: unix.c:/def=(sysv,has_bcopy,"bcopy(a,b,c)=memcpy(b,a,c)"
wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
Third Round, 1995:
New set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
Benchmarks useful for 3 years
Single flag setting for all programs: SPECint_base95, SPECfp_base95

12
Benchmarks
CINT2000 (Integer Component of SPEC CPU2000)

Program       Language   What Is It
164.gzip      C          Compression
175.vpr       C          FPGA Circuit Placement and Routing
176.gcc       C          C Programming Language Compiler
181.mcf       C          Combinatorial Optimization
186.crafty    C          Game Playing: Chess
197.parser    C          Word Processing
252.eon       C++        Computer Visualization
253.perlbmk   C          PERL Programming Language
254.gap       C          Group Theory, Interpreter
255.vortex    C          Object-oriented Database
256.bzip2     C          Compression
300.twolf     C          Place and Route Simulator

http://www.spec.org/osg/cpu2000/CINT2000/
13
Benchmarks
CFP2000 (Floating Point Component of SPEC CPU2000)

Program        Language     What Is It
168.wupwise    Fortran 77   Physics / Quantum Chromodynamics
171.swim       Fortran 77   Shallow Water Modeling
172.mgrid      Fortran 77   Multi-grid Solver: 3D Potential Field
173.applu      Fortran 77   Parabolic / Elliptic Differential Equations
177.mesa       C            3-D Graphics Library
178.galgel     Fortran 90   Computational Fluid Dynamics
179.art        C            Image Recognition / Neural Networks
183.equake     C            Seismic Wave Propagation Simulation
187.facerec    Fortran 90   Image Processing: Face Recognition
188.ammp       C            Computational Chemistry
189.lucas      Fortran 90   Number Theory / Primality Testing
191.fma3d      Fortran 90   Finite-element Crash Simulation
200.sixtrack   Fortran 77   High Energy Physics Accelerator Design
301.apsi       Fortran 77   Meteorology: Pollutant Distribution

http://www.spec.org/osg/cpu2000/CFP2000/
14
Benchmarks
Sample Results For SPECint2000
http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc

               Base       Base       Base    Peak       Peak       Peak
Benchmarks     Ref Time   Run Time   Ratio   Ref Time   Run Time   Ratio
164.gzip       1400       277        505     1400       270        518
175.vpr        1400       419        334     1400       417        336
176.gcc        1100       275        399     1100       272        405
181.mcf        1800       621        290     1800       619        291
186.crafty     1000       191        522     1000       191        523
197.parser     1800       500        360     1800       499        361
252.eon        1300       267        486     1300       267        486
253.perlbmk    1800       302        596     1800       302        596
254.gap        1100       249        442     1100       248        443
255.vortex     1900       268        710     1900       264        719
256.bzip2      1500       389        386     1500       375        400
300.twolf      3000       784        382     3000       776        387
SPECint_base2000                     438
SPECint2000                                                        442

Intel OR840 (1 GHz Pentium III processor)
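As a rough check on how the single summary number relates to the per-benchmark rows, the sketch below recomputes one ratio and the geometric mean of the base ratios. The ratio is reference time divided by run time, scaled by 100, and SPEC CPU2000 summarizes with a geometric mean. The helper names are illustrative, and because the table lists rounded ratios the result only approximates the reported 438.

```python
import math

# Base ratios copied from the sample result table.
base_ratios = {
    "164.gzip": 505, "175.vpr": 334, "176.gcc": 399, "181.mcf": 290,
    "186.crafty": 522, "197.parser": 360, "252.eon": 486, "253.perlbmk": 596,
    "254.gap": 442, "255.vortex": 710, "256.bzip2": 386, "300.twolf": 382,
}


def spec_ratio(ref_time, run_time):
    """Ratio for one benchmark: reference time over measured time, scaled by 100."""
    return 100.0 * ref_time / run_time


def geometric_mean(values):
    """Geometric mean computed through logarithms to avoid a huge product."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


gzip_ratio = spec_ratio(1400, 277)
overall = geometric_mean(base_ratios.values())

print(f"164.gzip base ratio: {gzip_ratio:.0f}")
print(f"Geometric mean of base ratios: {overall:.1f}")
```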
15
Benchmarks
Performance Evaluation

For better or worse, benchmarks shape a field.
Good products are created when we have:
Good benchmarks
Good ways to summarize performance
Given that sales is a function, in part, of performance relative to the competition, companies invest in improving the product as reported by the performance summary.
If the benchmarks/summary are inadequate, then the choice is between improving the product for real programs vs. improving the product to get more sales. Sales almost always wins!
Execution time is the measure of computer performance!

16
Benchmarks
How to Summarize Performance

Management would like to have one number.
Technical people want more:
They want to have evidence of reproducibility: there should be enough information so that you or someone else can repeat the experiment.
There should be consistency when doing the measurements multiple times.

How would you report these results?
                    Computer A   Computer B   Computer C
Program P1 (secs)   1            10           20
Program P2 (secs)   1000         100          20
Total Time (secs)   1001         110          40
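One way to explore the question is to compute several candidate summaries side by side. The sketch below uses the times from the table; the choice of summaries (total time, arithmetic mean, and geometric mean of times normalized to Computer A) is an assumption made for illustration, not something the slide prescribes.

```python
import math

# Execution times in seconds, copied from the table.
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}


def total_time(machine):
    return sum(times[machine].values())


def arithmetic_mean(machine):
    return total_time(machine) / len(times[machine])


def normalized_geometric_mean(machine, reference):
    """Geometric mean of per-program times divided by the reference machine's times."""
    ratios = [times[machine][p] / times[reference][p] for p in times[machine]]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


for name in times:
    print(
        name,
        total_time(name),
        arithmetic_mean(name),
        round(normalized_geometric_mean(name, "A"), 3),
    )
```

With these numbers the normalized geometric mean rates A and B as equal even though B finishes the two programs about nine times sooner in total, which is one reason a single summary number needs to be read with care.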
17
Quantitative Principles of Computer Design

Make the common case fast.
Amdahl's Law relates the total speedup of a system to the speedup of some portion of that system.
18
Amdahl's Law
Quantitative Design
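Speedup due to enhancement E:
\[ \mathrm{Speedup}(E) = \frac{\text{ExTime without } E}{\text{ExTime with } E} = \frac{\text{Performance with } E}{\text{Performance without } E} \]

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then
\[ \mathrm{ExTime}_{new} = \mathrm{ExTime}_{old} \times \left[ (1 - F) + \frac{F}{S} \right] \]
\[ \mathrm{Speedup}(E) = \frac{1}{(1 - F) + F/S} \]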
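Amdahl's Law for this setup gives an overall speedup of 1 / ((1 - F) + F / S). The following is a small sketch of that calculation; the example values of F and S are hypothetical and chosen only to show how quickly the unenhanced fraction limits the gain.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction F of the task is accelerated by a factor S."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical example: 40% of the task is made 10 times faster.
overall = amdahl_speedup(0.4, 10)

# Upper bound as S grows without limit: 1 / (1 - F).
upper_bound = 1.0 / (1.0 - 0.4)

print(f"Overall speedup: {overall:.4f}")
print(f"Best possible speedup for F = 0.4: {upper_bound:.4f}")
```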

19
Quantitative Design
Cycles Per Instruction
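\[ \mathrm{CPI} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \frac{\text{Cycles}}{\text{Instruction Count}} \]
\[ \text{CPU Time} = \text{Cycle Time} \times \sum_{i=1}^{n} \mathrm{CPI}_i \times I_i \]
where \(I_i\) is the number of instructions of type \(i\).
Instruction Frequency:
\[ \mathrm{CPI} = \sum_{i=1}^{n} \mathrm{CPI}_i \times F_i \quad \text{where} \quad F_i = \frac{I_i}{\text{Instruction Count}} \]

Invest Resources where time is Spent!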

20
Quantitative Design
Cycles Per Instruction
Suppose we have a machine where we can count the
frequency with which instructions are executed.
We also know how many cycles it takes for each
instruction type.

Base Machine (Reg / Reg)
Op       Freq   Cycles   CPI(i)   (% Time)
ALU      50%    1        0.5      (33%)
Load     20%    2        0.4      (27%)
Store    10%    2        0.2      (13%)
Branch   20%    2        0.4      (27%)
Total CPI                1.5
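The table can be reproduced with a few lines of arithmetic: each instruction type contributes its frequency times its cycle count to the total CPI, and its share of execution time is that contribution divided by the total. The sketch below uses the frequencies and cycle counts from the table; the variable names are illustrative.

```python
# (frequency, cycles) per instruction type, from the Base Machine table.
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}

# Contribution of each type to the overall CPI: frequency x cycles.
contribution = {op: freq * cycles for op, (freq, cycles) in mix.items()}
total_cpi = sum(contribution.values())

# Share of execution time spent in each instruction type, in percent.
time_share = {op: 100.0 * c / total_cpi for op, c in contribution.items()}

for op in mix:
    print(f"{op:<7} CPI(i) = {contribution[op]:.1f}  time = {time_share[op]:.0f}%")
print(f"Total CPI = {total_cpi:.1f}")
```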

21
Quantitative Design
Locality of Reference

Programs access a relatively small portion of the
address space at any instant of time.
There are two different types of locality:
Temporal Locality (locality in time): if an item is referenced, it will tend to be referenced again soon (loops, reuse, etc.)
Spatial Locality (locality in space/location): if an item is referenced, items whose addresses are close by tend to be referenced soon (straight-line code, array access, etc.)

22
The Concept of Memory Hierarchy

Fast memory is expensive. Slow memory is
cheap. The goal is to minimize the
price/performance for a particular price point.
23
Memory Hierarchy
                        Registers         Level 1 cache   Level 2 cache   Memory           Disk
Typical Size            4 - 64            < 16K bytes     < 2 Mbytes      < 16 Gigabytes   > 5 Gigabytes
Access Time             1 nsec            3 nsec          15 nsec         150 nsec         5,000,000 nsec
Bandwidth (in MB/sec)   10,000 - 50,000   2000 - 5000     500 - 1000      500 - 1000       100
Managed By              Compiler          Hardware        Hardware        OS               OS/User
24
Memory Hierarchy

Hit: data appears in some block in the upper level (example: Block X)
Hit Rate: the fraction of memory accesses found in the upper level
Hit Time: time to access the upper level, which consists of
RAM access time + time to determine hit/miss
Miss: data needs to be retrieved from a block in the lower level (Block Y)
Miss Rate = 1 - (Hit Rate)
Miss Penalty: time to replace a block in the upper level +
time to deliver the block to the processor
Hit Time << Miss Penalty (500 instructions on 21264!)

25
Memory Hierarchy
Registers
Level 1 cache
Level 2 Cache
Memory
Disk

What is the cost of executing a program if:
Stores are free (there's a write pipe)
Loads are 20% of all instructions
80% of loads hit (are found) in the Level 1 cache
97% of loads hit in the Level 2 cache.
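One possible way to work this question is sketched below. It rests on several assumptions that the slide does not state: the access times come from the earlier memory hierarchy table (3, 15 and 150 nsec), each access time is treated as the full latency of a load served from that level, the 97% figure is read as the cumulative fraction of loads found in Level 1 or Level 2, and disk accesses are ignored.

```python
# ASSUMPTION: access times in nanoseconds, taken from the memory hierarchy table.
L1_TIME_NS = 3.0
L2_TIME_NS = 15.0
MEMORY_TIME_NS = 150.0

LOAD_FRACTION = 0.20       # loads as a fraction of all instructions
L1_HIT = 0.80              # fraction of loads found in Level 1
L2_CUMULATIVE_HIT = 0.97   # ASSUMPTION: fraction of loads found in Level 1 or Level 2


def average_load_time_ns():
    """Weighted average latency of a load across the three levels."""
    served_by_l2 = L2_CUMULATIVE_HIT - L1_HIT
    served_by_memory = 1.0 - L2_CUMULATIVE_HIT
    return (
        L1_HIT * L1_TIME_NS
        + served_by_l2 * L2_TIME_NS
        + served_by_memory * MEMORY_TIME_NS
    )


avg_load = average_load_time_ns()
memory_cost_per_instruction = LOAD_FRACTION * avg_load

print(f"Average load latency: {avg_load:.2f} nsec")
print(f"Memory cost per instruction: {memory_cost_per_instruction:.2f} nsec")
```

Under these assumptions the 3% of loads that reach main memory account for almost half of the average load latency, which illustrates why Hit Time << Miss Penalty matters.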

26
The Instruction Set

2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Operations in the Instruction Set
2.5 Type and Size of Operands
2.6 Encoding and Instruction Set
2.7 The Role of Compilers
2.8 The MIPS Architecture
Bonus

27
Introduction

The Instruction Set Architecture is that portion
of the machine visible to the assembly level
programmer or to the compiler writer.

What are the advantages and disadvantages of various instruction set alternatives?
How do languages and compilers affect the ISA?
Use the DLX architecture as an example of a RISC architecture.

28
Classifying Instruction Set Architectures
Classifications can be by
Stack/accumulator/register
Number of memory operands.
Number of total operands.

29
Instruction Set Architectures
Basic ISA Classes

Accumulator:
1 address      add A        acc <- acc + mem[A]
1+x address    addx A       acc <- acc + mem[A + x]
Stack:
0 address      add          tos <- tos + next
General Purpose Register:
2 address      add A B      EA(A) <- EA(A) + EA(B)
3 address      add A B C    EA(A) <- EA(B) + EA(C)
Load/Store:
0 Memory       load R1, Mem1
               load R2, Mem2
               add R1, R2
1 Memory       add R1, Mem2

General purpose register ALU instructions can have two or three operands.
ALU instructions can have 0, 1, 2, or 3 memory operands. Shown here are cases of 0 and 1.
30
Instruction Set Architectures
Basic ISA Classes
The results of the different address classes are easiest to see with the examples here, all of which implement the sequence for C = A + B.

Stack     Accumulator   Register (register-memory)   Register (load-store)
Push A    Load A        Load R1, A                   Load R1, A
Push B    Add B         Add R1, B                    Load R2, B
Add       Store C       Store C, R1                  Add R3, R1, R2
Pop C                                                Store C, R3

Registers are the class that won out. The more registers on the CPU, the better.
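To make the difference between the classes concrete, the sketch below interprets the stack and accumulator sequences for C = A + B. The two tiny interpreters and the sample values for A and B are hypothetical and support only the instructions used in the example.

```python
def run_stack(program, memory):
    """Execute a zero-address program: operands are implicit on the stack."""
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "Push":
            stack.append(memory[args[0]])
        elif op == "Add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "Pop":
            memory[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown stack instruction: {instr}")
    return memory


def run_accumulator(program, memory):
    """Execute a one-address program: the accumulator is the implicit operand."""
    acc = 0
    for instr in program:
        op, addr = instr.split()
        if op == "Load":
            acc = memory[addr]
        elif op == "Add":
            acc += memory[addr]
        elif op == "Store":
            memory[addr] = acc
        else:
            raise ValueError(f"unknown accumulator instruction: {instr}")
    return memory


stack_mem = run_stack(["Push A", "Push B", "Add", "Pop C"], {"A": 2, "B": 5})
acc_mem = run_accumulator(["Load A", "Add B", "Store C"], {"A": 2, "B": 5})
print(stack_mem["C"], acc_mem["C"])
```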
31
Instruction Set Architectures
Intel 80x86 Integer Registers
GPR0   EAX      Accumulator
GPR1   ECX      Count register: string, loop
GPR2   EDX      Data Register: multiply, divide
GPR3   EBX      Base Address Register
GPR4   ESP      Stack Pointer
GPR5   EBP      Base Pointer, for base of stack seg.
GPR6   ESI      Index Register
GPR7   EDI      Index Register
       CS       Code Segment Pointer
       SS       Stack Segment Pointer
       DS       Data Segment Pointer
       ES       Extra Data Segment Pointer
       FS       Data Seg. 2
       GS       Data Seg. 3
PC     EIP      Instruction Counter
       Eflags   Condition Codes
32
Memory Addressing
Sections Include
Interpreting Memory Addresses
Addressing Modes
Displacement Address Mode
Immediate Address Mode

33
Memory Addressing
Interpreting Memory Addresses

What object is accessed as a function of the address and length?
Objects have byte addresses: an address refers to the number of bytes counted from the beginning of memory.
Little Endian: puts the byte whose address is xx00 at the least significant position in the word.
Big Endian: puts the byte whose address is xx00 at the most significant position in the word.
Alignment: data must be aligned on a boundary equal to its size. Misalignment typically results in an alignment fault that must be handled by the Operating System.
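The byte-order and alignment rules can be illustrated in a few lines. In the sketch below the 32-bit value and the sample addresses are arbitrary, and `is_aligned` is an illustrative helper rather than anything defined in the slides.

```python
value = 0x11223344

# Byte at the lowest address comes first in each sequence.
little = value.to_bytes(4, byteorder="little")
big = value.to_bytes(4, byteorder="big")


def is_aligned(address, size):
    """True when the address is a multiple of the object's size in bytes."""
    return address % size == 0


print("little endian:", little.hex())  # least significant byte at the lowest address
print("big endian:   ", big.hex())     # most significant byte at the lowest address
print("word at 0x1004 aligned:", is_aligned(0x1004, 4))
print("word at 0x1006 aligned:", is_aligned(0x1006, 4))
```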

34
Memory Addressing
Addressing Modes

This table shows the most common modes.

Addressing Mode     Example Instruction   Meaning                           When Used
Register            Add R4, R3            R[R4] <- R[R4] + R[R3]            When a value is in a register.
Immediate           Add R4, #3            R[R4] <- R[R4] + 3                For constants.
Displacement        Add R4, 100(R1)       R[R4] <- R[R4] + M[100 + R[R1]]   Accessing local variables.
Register Deferred   Add R4, (R1)          R[R4] <- R[R4] + M[R[R1]]         Using a pointer or a computed address.
Absolute            Add R4, (1001)        R[R4] <- R[R4] + M[1001]          Used for static data.
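The modes in the table differ only in how the second operand is fetched. The sketch below models that with a register file and a memory held in dictionaries; the function name, mode strings, and sample contents are all hypothetical.

```python
def operand_value(mode, regs, mem, reg=None, const=0):
    """Fetch the source operand of an Add according to the addressing mode."""
    if mode == "register":
        return regs[reg]
    if mode == "immediate":
        return const
    if mode == "displacement":
        return mem[const + regs[reg]]
    if mode == "register_deferred":
        return mem[regs[reg]]
    if mode == "absolute":
        return mem[const]
    raise ValueError(f"unknown addressing mode: {mode}")


# Hypothetical machine state.
regs = {"R1": 200, "R3": 7, "R4": 1}
mem = {200: 40, 300: 50, 1001: 60}

# Add R4, 100(R1): the operand comes from memory address 100 + R[R1] = 300.
regs["R4"] += operand_value("displacement", regs, mem, reg="R1", const=100)
print(regs["R4"])
```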
35
Memory Addressing
Displacement Addressing Mode

How big should the displacement be?
For addresses that do fit in the displacement size:
Add R4, 10000 (R0)
For addresses that don't fit in the displacement size, the compiler must do the following:
Load R1, address
Add R4, 0 (R1)
How big the displacement field should be depends on typical displacements.
On DLX, the space allocated is 16 bits. IA-32 encodes displacements of 8 or 32 bits (8 or 16 bits in 16-bit addressing mode).
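The compiler's decision can be sketched as a range check on the displacement field. The code below assumes a signed 16-bit field and a register R0 that always reads as zero, as on DLX; the helper names and the second sample address are hypothetical.

```python
def fits_in_signed_field(value, bits=16):
    """True when value is representable as a signed two's complement field of this width."""
    low = -(1 << (bits - 1))
    high = (1 << (bits - 1)) - 1
    return low <= value <= high


def choose_sequence(address, bits=16):
    """Pick the one-instruction form when the address fits, else load it into a register first."""
    if fits_in_signed_field(address, bits):
        return [f"Add R4, {address}(R0)"]
    return [f"Load R1, {address}", "Add R4, 0(R1)"]


print(choose_sequence(10000))
print(choose_sequence(100000))
```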

36
Memory Addressing
Immediate Address Mode

Used where we want to get to a numerical value in an instruction.

At high level:
a = b + 3;
if ( a > 17 ) goto Addr
At Assembler level:
Load R2, #3
Add R0, R1, R2
Load R2, #17
CMPBGT R1, R2
Load R1, Address
Jump (R1)
37
Operations In The Instruction Set
Sections Include
Detailed information about types of instructions.
Instructions for Control Flow (conditional
branches, jumps)

38
Operations In The Instruction Set
Operator Types

Arithmetic and logical: and, add
Data transfer: move, load
Control: branch, jump, call
System: system call, traps
Floating point: add, mul, div, sqrt
Decimal: add, convert
String: move, compare
Multimedia: 2D, 3D? e.g., Intel MMX and Sun VIS

39
Operations In The Instruction Set
Control Instructions
Conditional branches are 20% of all instructions!!

Control Instructions Issues:
taken or not
where is the target
link return address
save or restore
Instructions that change the PC:
(conditional) branches, (unconditional) jumps
function calls, function returns
system calls, system returns

40
Operations In The Instruction Set
Control Instructions

There are numerous tradeoffs:
Compare and branch
+ no extra compare, no state passed between instructions
-- requires ALU op, restricts code scheduling opportunities
Implicitly set condition codes: Z, N, V, C
+ can be set "for free"
-- constrains code reordering, extra state to save/restore
Explicitly set condition codes
+ can be set "for free", decouples branch/fetch from pipeline
-- extra state to save/restore

There are numerous tradeoffs:
Condition in general-purpose register
+ no special state, but uses up a register
-- branch condition separate from branch logic in pipeline
Some data for MIPS
> 80% of branches use immediate data, > 80% of those zero
50% of branches use == 0 or <> 0
Compromise in MIPS
branch == 0, branch <> 0
compare instructions for all other compares
41
Operations In The Instruction Set
Control Instructions

Link Return Address:
implicit register: many recent architectures use this
+ fast, simple
-- s/w save register before next call, surprise traps?
explicit register
+ may avoid saving register
-- register must be specified
processor stack
+ recursion direct
-- complex instructions

Save or restore state:
What state?
function calls: registers
system calls: registers, flags, PC, PSW, etc
Hardware need not save registers
Caller can save registers in use
Callee save registers it will use
Hardware register save
IBM STM, VAX CALLS
Faster?
Many recent architectures do no register saving
Or do implicit register saving with register windows (SPARC)
42
Type And Size of Operands
The type of the operand is usually encoded in the opcode: an LDW implies loading of a word.
Common sizes are:
Character (1 byte)
Half word (16 bits)
Word (32 bits)
Single Precision Floating Point (1 Word)
Double Precision Floating Point (2 Words)
Integers are two's complement binary.
Floating point is IEEE 754.
Some languages (like COBOL) use packed decimal.

43
Encoding And Instruction Set

This section has to do with how an assembly level instruction is encoded into binary.
Ultimately, it's the binary that is read and interpreted by the machine.

44
Encoding And Instruction Set
80x86 Instruction Encoding
Here's some sample code that's been disassembled. It was compiled with the debugger option so is not optimized.

for ( index = 0; index < iterations; index++ )
0040D3AF C7 45 F0 00 00 00 00 mov         dword ptr [ebp-10h],0
0040D3B6 EB 09                jmp         main+0D1h (0040d3c1)
0040D3B8 8B 4D F0             mov         ecx,dword ptr [ebp-10h]
0040D3BB 83 C1 01             add         ecx,1
0040D3BE 89 4D F0             mov         dword ptr [ebp-10h],ecx
0040D3C1 8B 55 F0             mov         edx,dword ptr [ebp-10h]
0040D3C4 3B 55 F8             cmp         edx,dword ptr [ebp-8]
0040D3C7 7D 15                jge         main+0EEh (0040d3de)
long_temp = (*alignment + long_temp) % 47;
0040D3C9 8B 45 F4             mov         eax,dword ptr [ebp-0Ch]
0040D3CC 8B 00                mov         eax,dword ptr [eax]
0040D3CE 03 45 EC             add         eax,dword ptr [ebp-14h]
0040D3D1 99                   cdq
0040D3D2 B9 2F 00 00 00       mov         ecx,2Fh
0040D3D7 F7 F9                idiv        eax,ecx
0040D3D9 89 55 EC             mov         dword ptr [ebp-14h],edx
0040D3DC EB DA                jmp         main+0C8h (0040d3b8)

This code was produced using Visual Studio.
45
Encoding And Instruction Set
80x86 Instruction Encoding
Here's some sample code that's been disassembled. It was compiled with optimization.

for ( index = 0; index < iterations; index++ )
00401000 8B 0D 40 54 40 00    mov         ecx,dword ptr ds:[405440h]
00401006 33 D2                xor         edx,edx
00401008 85 C9                test        ecx,ecx
0040100A 7E 14                jle         00401020
0040100C 56                   push        esi
0040100D 57                   push        edi
0040100E 8B F1                mov         esi,ecx
long_temp = (*alignment + long_temp) % 47;
00401010 8D 04 11             lea         eax,[ecx+edx]
00401013 BF 2F 00 00 00       mov         edi,2Fh
00401018 99                   cdq
00401019 F7 FF                idiv        eax,edi
0040101B 4E                   dec         esi
0040101C 75 F2                jne         00401010
0040101E 5F                   pop         edi
0040101F 5E                   pop         esi
00401020 C3                   ret

This code was produced using Visual Studio.
46
Encoding And Instruction Set
80x86 Instruction Encoding
Here's some sample code that's been disassembled. It was compiled with optimization.

for ( index = 0; index < iterations; index++ )
0x804852f <main+143>:  add    $0x10,%esp
0x8048532 <main+146>:  lea    0xfffffff8(%ebp),%edx
0x8048535 <main+149>:  test   %esi,%esi
0x8048537 <main+151>:  jle    0x8048543 <main+163>
0x8048539 <main+153>:  mov    %esi,%eax
0x804853b <main+155>:  nop
0x804853c <main+156>:  lea    0x0(%esi,1),%esi
long_temp = (*alignment + long_temp) % 47;
0x8048540 <main+160>:  dec    %eax
0x8048541 <main+161>:  jne    0x8048540 <main+160>
0x8048543 <main+163>:  add    $0xfffffff4,%esp

This code was produced using gcc and gdb.
Note that the representation of the code is dependent on the compiler/debugger!
47
Encoding And Instruction Set

80x86 Instruction Encoding

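A Morass of disjoint encoding!!

[Figure: instruction formats, field widths in bits]
ADD:    ADD (4), Reg (3), W (1), Disp. (8)
SHL:    SHL (6), V/w (2), postbyte (8), Disp. (8)
TEST:   TEST (7), W (1), postbyte (8), Immediate (8)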
48
Encoding And Instruction Set

80x86 Instruction Encoding

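[Figure: instruction formats, field widths in bits]
JE:      JE (4), Cond (4), Disp. (8)
CALLF:   CALLF (8), Offset (16), Segment Number (16)
MOV:     MOV (6), D/w (2), postbyte (8), Disp. (8)
PUSH:    PUSH (5), Reg (3)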
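The PUSH format (a 5-bit opcode followed by a 3-bit register field) is small enough to decode by hand. The sketch below does so for the single-byte `push` and `pop` instructions that appear in the optimized Visual Studio listing (56, 57, 5F, 5E). The register numbering follows the 80x86 integer register slide; the opcode bit patterns 01010 for PUSH and 01011 for POP are the standard one-byte encodings, and the function name is illustrative.

```python
# Register numbers GPR0..GPR7 as listed on the 80x86 integer register slide.
REGISTERS = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

# Upper five bits of the one-byte register forms.
OPCODES = {0b01010: "PUSH", 0b01011: "POP"}


def decode(byte):
    """Split one instruction byte into its 5-bit opcode and 3-bit register field."""
    opcode = byte >> 3
    reg = byte & 0b111
    if opcode not in OPCODES:
        raise ValueError(f"not a one-byte PUSH/POP: {byte:#04x}")
    return OPCODES[opcode], REGISTERS[reg]


for byte in (0x56, 0x57, 0x5F, 0x5E):
    mnemonic, register = decode(byte)
    print(f"{byte:02X} -> {mnemonic} {register}")
```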
49
The Role of Compilers
Compiler goals
All correct programs execute correctly
Most compiled programs execute fast
(optimizations)
Fast compilation
Debugging support

50
The Role of Compilers

Steps In Compilation

Parsing -> intermediate representation
Jump Optimization
Loop Optimizations
Register Allocation
Code Generation -> assembly code

Common SubExpression
Procedure in-lining
Constant Propagation
Strength Reduction
Pipeline Scheduling
51
The Role of Compilers

Steps In Compilation

Optimization Name   Explanation                                        % of the total number of optimizing transformations
High Level          At or near the source level; machine-independent   Not Measured
Local               Within Straight Line Code                          40%
Global              Across A Branch                                    42%
Machine Dependent   Depends on Machine Knowledge                       Not Measured
52
The Role of Compilers

What compiler writers want:

regularity
orthogonality
composability
Compilers perform a giant case analysis:
too many choices make it hard
Orthogonal instruction sets:
operation, addressing mode, data type

One solution or all possible solutions:
2 branch conditions: eq, lt
or all six: eq, ne, lt, gt, le, ge
not 3 or 4
There are advantages to having instructions that
are primitives.
Let the compiler put the instructions together to
make more complex sequences.

53
The MIPS Architecture

MIPS is very RISC oriented.
MIPS will be used for many examples throughout
the course.

MIPS (originally an acronym for Microprocessor
without Interlocked Pipeline Stages) is a RISC
microprocessor architecture developed by MIPS
Technologies. We will look at the Pipeline
concept in our next lecture.
The acronym RISC (pronounced risk), for reduced
instruction set computer represents a CPU design
strategy emphasizing the insight that simplified
instructions which "do less" may still provide
for higher performance if this simplicity can be
utilized to make instructions execute very fast.
Well known RISC families include DEC Alpha, ARC,
ARM, AVR, MIPS, PA-RISC, Power Architecture
(including PowerPC), and SPARC.
54
The MIPS Architecture

MIPS Characteristics

Addressing Modes
Immediate
Displacement
(Register Mode used only for ALU)

32-bit byte addresses, aligned
Load/store: only displacement addressing
Standard data types
3 fixed length formats
32 32-bit GPRs (r0 = 0)
16 64-bit (32 32-bit) FPRs
FP status register
No Condition Codes

Data transfer
load/store word, load/store byte/half word
signed?
load/store FP single/double
moves between GPRs and FPRs
ALU
add/subtract signed? immediate?
multiply/divide signed?
and, or, xor immediate?, shifts ll, rl, ra
immediate?
sets immediate?

There's MIPS 64, the current arch.
Standard datatypes
4 fixed length formats (8, 16, 32, 64)
32 64-bit GPRs (r0 = 0)
64 64-bit FPRs
55
The MIPS Architecture

MIPS Characteristics

Control
branches == 0, <> 0
conditional branch testing FP bit
jump, jump register
jump link, jump link register
trap, return-from-exception
Floating Point
add/sub/mul/div
single/double
fp converts, fp set

56
The DLX Architecture
The DLX is a RISC processor architecture designed by the principal designers of the MIPS and the Berkeley RISC designs, the two benchmark examples of RISC design. The DLX is essentially a cleaned up and simplified MIPS with a simple 32-bit load/store architecture. Intended primarily for teaching purposes, the DLX design is widely used in university-level computer architecture courses.
The next couple of lectures will use the MIPS and DLX architectures as examples to demonstrate concepts.
57
End of Lecture
