Computer Architecture - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Computer Architecture


1
Computer Architecture
  • Lecture 3
  • Basic Fundamentals
  • and
  • Instruction Sets

2
The Task of a Computer Designer
  • 1.1 Introduction
  • 1.2 The Task of a Computer Designer
  • 1.3 Technology and Computer Usage Trends
  • 1.4 Cost and Trends in Cost
  • 1.5 Measuring and Reporting Performance
  • 1.6 Quantitative Principles of Computer Design
  • 1.7 Putting It All Together The Concept of
    Memory Hierarchy

Design-cycle figure: Evaluate existing systems for bottlenecks (using benchmarks and workloads) --> Simulate new designs and organizations --> Implement next-generation system (subject to implementation complexity and technology trends) --> repeat.
3
Technology and Computer Usage Trends
  • 1.1 Introduction
  • 1.2 The Task of a Computer Designer
  • 1.3 Technology and Computer Usage Trends
  • 1.4 Cost and Trends in Cost
  • 1.5 Measuring and Reporting Performance
  • 1.6 Quantitative Principles of Computer Design
  • 1.7 Putting It All Together The Concept of
    Memory Hierarchy
  • When building a cathedral, numerous very practical
    considerations need to be taken into account:
  • available materials
  • worker skills
  • willingness of the client to pay the price.
  • Similarly, Computer Architecture is about working
    within constraints:
  • What will the market buy?
  • Cost/Performance
  • Tradeoffs in materials and processes

4
Trends
  • Gordon Moore (Founder of Intel) observed in 1965
    that the number of transistors that could be
    crammed on a chip doubles every year.
  • This has CONTINUED to be true since then.

5
Measuring And Reporting Performance
  • 1.1 Introduction
  • 1.2 The Task of a Computer Designer
  • 1.3 Technology and Computer Usage Trends
  • 1.4 Cost and Trends in Cost
  • 1.5 Measuring and Reporting Performance
  • 1.6 Quantitative Principles of Computer Design
  • 1.7 Putting It All Together The Concept of
    Memory Hierarchy
  • This section talks about:
  • Metrics: how do we describe, in a numerical way,
    the performance of a computer?
  • What tools do we use to find those metrics?

6
Metrics
  • Time to run the task (ExTime)
  • Execution time, response time, latency
  • Tasks per day, hour, week, sec, ns
    (Performance)
  • Throughput, bandwidth

7
Metrics - Comparisons
  • "X is n times faster than Y" means
  • ExTime(Y) Performance(X)
  • --------- ---------------
  • ExTime(X) Performance(Y)
  • Speed of Concorde vs. Boeing 747
  • Throughput of Boeing 747 vs. Concorde
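The same ratio is easy to compute directly; a minimal C sketch with made-up execution times (not from the slides):

    #include <stdio.h>

    int main(void) {
        double extime_x = 2.0;   /* seconds on machine X (assumed) */
        double extime_y = 6.0;   /* seconds on machine Y (assumed) */

        /* "X is n times faster than Y" means ExTime(Y)/ExTime(X) = n */
        double n = extime_y / extime_x;
        printf("X is %.1f times faster than Y\n", n);   /* prints 3.0 */
        return 0;
    }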

8
Metrics - Comparisons
  • Pat has developed a new product, "rabbit" about
    which she wishes to determine performance. There
    is special interest in comparing the new product,
    rabbit to the old product, turtle, since the
    product was rewritten for performance reasons.
    (Pat had used Performance Engineering techniques
    and thus knew that rabbit was "about twice as
    fast" as turtle.) The measurements showed
  •  
  • Performance Comparisons
  •  
  • Product   Transactions/second   Seconds/transaction   Seconds to process transaction
  • Turtle    30                    0.0333                3
  • Rabbit    60                    0.0166                1
  • Which of the following statements reflect the
    performance comparison of rabbit and turtle?
  •  

o Rabbit is 100% faster than turtle.
o Rabbit is twice as fast as turtle.
o Rabbit takes 1/2 as long as turtle.
o Rabbit takes 1/3 as long as turtle.
o Rabbit takes 100% less time than turtle.
o Rabbit takes 200% less time than turtle.
o Turtle is 50% as fast as rabbit.
o Turtle is 50% slower than rabbit.
o Turtle takes 200% longer than rabbit.
o Turtle takes 300% longer than rabbit.
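As a hedged aid for working the exercise, a small C sketch that turns the measured rates above into the quantities these statements talk about (it does not say which statements are correct):

    #include <stdio.h>

    int main(void) {
        double turtle_tps = 30.0, rabbit_tps = 60.0;   /* transactions per second */
        double turtle_t = 1.0 / turtle_tps;            /* seconds per transaction */
        double rabbit_t = 1.0 / rabbit_tps;

        double times_as_fast = turtle_t / rabbit_t;                   /* "n times as fast"      */
        double pct_faster    = (times_as_fast - 1.0) * 100.0;         /* "% faster"             */
        double pct_less_time = (1.0 - rabbit_t / turtle_t) * 100.0;   /* "% less time (rabbit)" */
        double pct_longer    = (turtle_t / rabbit_t - 1.0) * 100.0;   /* "% longer (turtle)"    */

        printf("times as fast = %.2f, %% faster = %.0f, %% less time = %.0f, %% longer = %.0f\n",
               times_as_fast, pct_faster, pct_less_time, pct_longer);
        return 0;
    }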
9
Metrics - Throughput
10
Methods For Predicting Performance
  • Benchmarks, Traces, Mixes
  • Hardware Cost, delay, area, power estimation
  • Simulation (many levels)
  • ISA, RT, Gate, Circuit
  • Queuing Theory
  • Rules of Thumb
  • Fundamental Laws/Principles

11
Benchmarks
SPEC System Performance Evaluation Cooperative
  • First Round 1989
  • 10 programs yielding a single number
    (SPECmarks)
  • Second Round 1992
  • SPECInt92 (6 integer programs) and SPECfp92 (14
    floating point programs)
  • Compiler flags unlimited. Example: March 93 flags for a DEC 4000
    Model 610:
  • spice: unix.c/def=(sysv,has_bcopy,bcopy(a,b,c)=memcpy(b,a,c))
  • wave5: /ali=(all,dcom=nat)/ag=a/ur=4/ur=200
  • nasa7: /norecu/ag=a/ur=4/ur2=200/lc=blas
  • Third Round 1995
  • new set of programs SPECint95 (8 integer
    programs) and SPECfp95 (10 floating point)
  • benchmarks useful for 3 years
  • Single flag setting for all programs
    SPECint_base95, SPECfp_base95

12
Benchmarks
CINT2000 (Integer Component of SPEC CPU2000)
  • Program Language What Is It
  • 164.gzip C Compression
  • 175.vpr C FPGA Circuit Placement and Routing
  • 176.gcc C C Programming Language Compiler
  • 181.mcf C Combinatorial Optimization
  • 186.crafty C Game Playing Chess
  • 197.parser C Word Processing
  • 252.eon C++ Computer Visualization
  • 253.perlbmk C PERL Programming Language
  • 254.gap C Group Theory, Interpreter
  • 255.vortex C Object-oriented Database
  • 256.bzip2 C Compression
  • 300.twolf C Place and Route Simulator

http://www.spec.org/osg/cpu2000/CINT2000/
13
Benchmarks
CFP2000 (Floating Point Component of SPEC
CPU2000)
  • Program Language What Is It
  • 168.wupwise Fortran 77 Physics / Quantum
    Chromodynamics
  • 171.swim Fortran 77 Shallow Water Modeling
  • 172.mgrid Fortran 77 Multi-grid Solver 3D
    Potential Field
  • 173.applu Fortran 77 Parabolic / Elliptic
    Differential Equations
  • 177.mesa C 3-D Graphics Library
  • 178.galgel Fortran 90 Computational Fluid
    Dynamics
  • 179.art C Image Recognition / Neural Networks
  • 183.equake C Seismic Wave Propagation Simulation
  • 187.facerec Fortran 90 Image Processing Face
    Recognition
  • 188.ammp C Computational Chemistry
  • 189.lucas Fortran 90 Number Theory / Primality
    Testing
  • 191.fma3d Fortran 90 Finite-element Crash
    Simulation
  • 200.sixtrack Fortran 77 High Energy Physics
    Accelerator Design
  • 301.apsi Fortran 77 Meteorology Pollutant
    Distribution

http://www.spec.org/osg/cpu2000/CFP2000/
14
Benchmarks
Sample Results For SpecINT2000
http://www.spec.org/osg/cpu2000/results/res2000q3/cpu2000-20000718-00168.asc
                    Base      Base      Base     Peak      Peak      Peak
Benchmarks          Ref Time  Run Time  Ratio    Ref Time  Run Time  Ratio
164.gzip            1400      277       505      1400      270       518
175.vpr             1400      419       334      1400      417       336
176.gcc             1100      275       399      1100      272       405
181.mcf             1800      621       290      1800      619       291
186.crafty          1000      191       522      1000      191       523
197.parser          1800      500       360      1800      499       361
252.eon             1300      267       486      1300      267       486
253.perlbmk         1800      302       596      1800      302       596
254.gap             1100      249       442      1100      248       443
255.vortex          1900      268       710      1900      264       719
256.bzip2           1500      389       386      1500      375       400
300.twolf           3000      784       382      3000      776       387
SPECint_base2000                        438
SPECint2000                                                          442

Intel OR840(1 GHz Pentium III processor)
15
Benchmarks
Performance Evaluation
  • For better or worse, benchmarks shape a field.
  • Good products are created when you have:
  • Good benchmarks
  • Good ways to summarize performance
  • Since sales are, in part, a function of performance
    relative to the competition, companies invest in
    improving the product as reported by the performance summary.
  • If the benchmarks/summary are inadequate, you must choose
    between improving the product for real programs vs.
    improving the product to get more sales. Sales almost
    always wins!
  • Execution time is the measure of computer
    performance!

16
Benchmarks
How to Summarize Performance
  • Management would like to have one number.
  • Technical people want more:
  • They want evidence of reproducibility: there should be
    enough information so that you or someone else can
    repeat the experiment.
  • There should be consistency when the measurements are
    made multiple times.

How would you report these results?
                      Computer A   Computer B   Computer C
Program P1 (secs)          1           10           20
Program P2 (secs)       1000          100           20
Total Time (secs)       1001          110           40
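The slide leaves the question open; one common approach (an assumption here, not prescribed by the slide) is to report total time plus a ratio summary such as the geometric mean of speedups relative to a reference machine. A minimal C sketch using the table values (link with -lm):

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        /* Times (secs) for programs P1 and P2 from the table above. */
        double a[2] = {1, 1000}, b[2] = {10, 100}, c[2] = {20, 20};

        printf("Total time:  A=%.0f  B=%.0f  C=%.0f\n",
               a[0] + a[1], b[0] + b[1], c[0] + c[1]);

        /* Geometric mean of per-program speedups relative to machine A. */
        double gm_b = sqrt((a[0] / b[0]) * (a[1] / b[1]));
        double gm_c = sqrt((a[0] / c[0]) * (a[1] / c[1]));
        printf("Geometric-mean speedup vs. A:  B=%.2f  C=%.2f\n", gm_b, gm_c);
        return 0;
    }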
17
Quantitative Principles of Computer Design
  • 1.1 Introduction
  • 1.2 The Task of a Computer Designer
  • 1.3 Technology and Computer Usage Trends
  • 1.4 Cost and Trends in Cost
  • 1.5 Measuring and Reporting Performance
  • 1.6 Quantitative Principles of Computer Design
  • 1.7 Putting It All Together The Concept of
    Memory Hierarchy

Make the common case fast. Amdahl's Law relates the total
speedup of a system to the speedup of some portion of that
system.
18
Amdahl's Law
Quantitative Design
Speedup due to enhancement E:
    Speedup(E) = ExTime(without E) / ExTime(with E)
               = Performance(with E) / Performance(without E)
  • Suppose that enhancement E accelerates a fraction F of
    the task by a factor S, and the remainder of the task is
    unaffected. Then:
    ExTime(new) = ExTime(old) x ((1 - F) + F/S)
    Speedup(overall) = 1 / ((1 - F) + F/S)
    (F is "this fraction enhanced"; a worked sketch follows.)
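A minimal C sketch of this calculation, with illustrative (assumed) values for F and S:

    #include <stdio.h>

    /* Amdahl's Law: overall speedup when a fraction F of the task
     * is accelerated by a factor S and the rest is unchanged.      */
    static double amdahl(double F, double S) {
        return 1.0 / ((1.0 - F) + F / S);
    }

    int main(void) {
        /* Assumed example: 40% of the task is sped up 10x. */
        printf("Overall speedup = %.2f\n", amdahl(0.40, 10.0));   /* ~1.56 */
        return 0;
    }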

19
Quantitative Design
Cycles Per Instruction
CPI = (CPU Time x Clock Rate) / Instruction Count
    = Cycles / Instruction Count
CPU Time = Cycle Time x SUM over i of (CPI(i) x IC(i)),
    where IC(i) = number of instructions of type i
CPI = SUM over i of (CPI(i) x F(i)),
    where F(i) = IC(i) / Instruction Count  (the instruction frequency)
  • Invest resources where time is spent!

20
Quantitative Design
Cycles Per Instruction
Suppose we have a machine where we can count the
frequency with which instructions are executed.
We also know how many cycles it takes for each
instruction type.
  • Base Machine (Reg / Reg)
  • Op       Freq   Cycles   CPI(i)   (% Time)
  • ALU      50%    1        0.5      (33%)
  • Load     20%    2        0.4      (27%)
  • Store    10%    2        0.2      (13%)
  • Branch   20%    2        0.4      (27%)
  • Total CPI = 1.5
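The same table can be recomputed mechanically; a small C sketch using the frequencies and cycle counts above:

    #include <stdio.h>

    int main(void) {
        /* Base machine instruction mix from the table above. */
        const char *op[]   = {"ALU", "Load", "Store", "Branch"};
        double      freq[] = {0.50, 0.20, 0.10, 0.20};   /* instruction frequency F(i) */
        double      cyc[]  = {1, 2, 2, 2};               /* cycles for instruction type i */

        double cpi = 0.0;
        for (int i = 0; i < 4; i++)
            cpi += freq[i] * cyc[i];                     /* CPI = sum of CPI(i) = F(i) x cycles(i) */

        for (int i = 0; i < 4; i++)
            printf("%-6s  %% of time = %4.1f%%\n", op[i],
                   100.0 * freq[i] * cyc[i] / cpi);
        printf("Total CPI = %.2f\n", cpi);               /* 1.5, matching the slide */
        return 0;
    }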

21
Quantitative Design
Locality of Reference
  • Programs access a relatively small portion of the
    address space at any instant of time.
  • There are two different types of locality
  • Temporal Locality (locality in time) If an item
    is referenced, it will tend to be referenced
    again soon (loops, reuse, etc.)
  • Spatial Locality (locality in space/location)
    If an item is referenced, items whose addresses
    are close by tend to be referenced soon (straight
    line code, array access, etc.)
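A tiny illustrative C loop (not from the slides) showing both kinds of locality:

    #include <stdio.h>

    #define N 1024

    int main(void) {
        static int a[N];                  /* zero-initialized array */
        int sum = 0;                      /* 'sum' is reused every iteration: temporal locality */

        for (int i = 0; i < N; i++)       /* the loop code itself is re-fetched: temporal locality */
            sum += a[i];                  /* a[0], a[1], ... are adjacent in memory: spatial locality */

        printf("%d\n", sum);
        return 0;
    }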

22
The Concept of Memory Hierarchy
  • 1.1 Introduction
  • 1.2 The Task of a Computer Designer
  • 1.3 Technology and Computer Usage Trends
  • 1.4 Cost and Trends in Cost
  • 1.5 Measuring and Reporting Performance
  • 1.6 Quantitative Principles of Computer Design
  • 1.7 Putting It All Together The Concept of
    Memory Hierarchy

Fast memory is expensive. Slow memory is
cheap. The goal is to minimize the
price/performance for a particular price point.
23
Memory Hierarchy
                    Registers        Level 1 cache   Level 2 cache   Memory          Disk
Typical Size        4 - 64           <16K bytes      <2 Mbytes       <16 Gigabytes   >5 Gigabytes
Access Time         1 nsec           3 nsec          15 nsec         150 nsec        5,000,000 nsec
Bandwidth (MB/sec)  10,000 - 50,000  2000 - 5000     500 - 1000      500 - 1000      100
Managed By          Compiler         Hardware        Hardware        OS              OS/User
24
Memory Hierarchy
  • Hit: data appears in some block in the upper
    level (example: Block X)
  • Hit Rate: the fraction of memory accesses found in
    the upper level
  • Hit Time: time to access the upper level, which
    consists of
  • RAM access time + time to determine hit/miss
  • Miss: data needs to be retrieved from a block in
    the lower level (Block Y)
  • Miss Rate = 1 - (Hit Rate)
  • Miss Penalty: time to replace a block in the
    upper level +
  • time to deliver the block to the processor
  • Hit Time << Miss Penalty (500 instructions on
    21264!)
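These terms combine into the usual average-memory-access-time expression, AMAT = Hit Time + Miss Rate x Miss Penalty; a minimal sketch with assumed numbers (not from the slides):

    #include <stdio.h>

    int main(void) {
        /* Assumed example values. */
        double hit_time     = 1.0;     /* cycles to access the upper level             */
        double miss_rate    = 0.05;    /* 1 - hit rate                                  */
        double miss_penalty = 100.0;   /* cycles to fetch the block from the lower level */

        /* AMAT = Hit Time + Miss Rate x Miss Penalty */
        double amat = hit_time + miss_rate * miss_penalty;
        printf("Average memory access time = %.1f cycles\n", amat);   /* 6.0 */
        return 0;
    }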

25
Memory Hierarchy
Registers
Level 1 cache
Level 2 Cache
Memory
Disk
  • What is the cost of executing a program if
    (a worked sketch follows):
  • Stores are free (there's a write pipe)
  • Loads are 20% of all instructions
  • 80% of loads hit (are found) in the Level 1 cache
  • 97% of loads hit in the Level 2 cache.
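A hedged sketch of one way to work this exercise, using the access times from the slide-23 table as assumed latencies and charging only loads; the reading of the 97% figure is an assumption noted in the comments:

    #include <stdio.h>

    int main(void) {
        /* Assumptions: latencies taken from the slide-23 table (3/15/150 ns),
         * and "97% of loads hit in the Level 2 cache" read as cumulative,
         * i.e. 97% of all loads are satisfied by L1 or L2.                  */
        double load_frac = 0.20;                 /* loads are 20% of instructions */
        double l1_hit = 0.80, l2_hit = 0.97;
        double t_l1 = 3.0, t_l2 = 15.0, t_mem = 150.0;   /* nanoseconds */

        double per_load = l1_hit * t_l1                    /* served by L1       */
                        + (l2_hit - l1_hit) * t_l2         /* served by L2       */
                        + (1.0 - l2_hit) * t_mem;          /* go to main memory  */

        printf("average ns per load         = %.2f\n", per_load);              /* 9.45 */
        printf("average memory ns per instr = %.2f\n", load_frac * per_load);  /* 1.89 */
        return 0;
    }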

26
The Instruction Set
  • 2.1 Introduction
  • 2.2 Classifying Instruction Set Architectures
  • 2.3 Memory Addressing
  • 2.4 Operations in the Instruction Set
  • 2.5 Type and Size of Operands
  • 2.6 Encoding and Instruction Set
  • 2.7 The Role of Compilers
  • 2.8 The MIPS Architecture
  • Bonus

27
Introduction
  • The Instruction Set Architecture is that portion
    of the machine visible to the assembly level
    programmer or to the compiler writer.
  1. What are the advantages and disadvantages of
    various instruction set alternatives?
  2. How do languages and compilers affect the ISA?
  3. Use the DLX architecture as an example of a RISC
    architecture.

28
Classifying Instruction Set Architectures
2.1 Introduction
2.2 Classifying Instruction Set
Architectures 2.3 Memory Addressing 2.4
Operations in the Instruction Set 2.5 Type and
Size of Operands 2.6 Encoding and Instruction
Set 2.7 The Role of Compilers 2.8 The DLX
Architecture
  • Classifications can be by
  • Stack/accumulator/register
  • Number of memory operands.
  • Number of total operands.

29
Instruction Set Architectures
Basic ISA Classes
  • Accumulator
  • 1 address:    add A          acc <- acc + mem[A]
  • 1+x address:  addx A         acc <- acc + mem[A + x]
  • Stack
  • 0 address:    add            tos <- tos + next
  • General Purpose Register
  • 2 address:    add A, B       EA(A) <- EA(A) + EA(B)
  • 3 address:    add A, B, C    EA(A) <- EA(B) + EA(C)
  • Load/Store
  • 0 Memory:     load R1, Mem1
  •               load R2, Mem2
  •               add R1, R2
  • 1 Memory:     add R1, Mem2

ALU instructions can have two or three operands, and can have
0, 1, 2, or 3 memory operands. Shown here are the cases of
0 and 1 memory operands.
30
Instruction Set Architectures
Basic ISA Classes
The effect of the different address classes is easiest to see
with the examples here, all of which implement the sequence
C = A + B.
Stack      Accumulator   Register (register-memory)   Register (load-store)
Push A     Load A        Load R1, A                   Load R1, A
Push B     Add B         Add R1, B                    Load R2, B
Add        Store C       Store C, R1                  Add R3, R1, R2
Pop C                                                 Store C, R3
Registers are the class that won out. The more
registers on the CPU, the better.
31
Instruction Set Architectures
Intel 80x86 Integer Registers
GPR0 EAX Accumulator
GPR1 ECX Count register, string, loop
GPR2 EDX Data Register multiply, divide
GPR3 EBX Base Address Register
GPR4 ESP Stack Pointer
GPR5 EBP Base Pointer for base of stack seg.
GPR6 ESI Index Register
GPR7 EDI Index Register
CS Code Segment Pointer
SS Stack Segment Pointer
DS Data Segment Pointer
ES Extra Data Segment Pointer
FS Data Seg. 2
GS Data Seg. 3
PC EIP Instruction Counter
Eflags Condition Codes
32
Memory Addressing
2.1 Introduction
2.2 Classifying Instruction Set
Architectures 2.3 Memory Addressing 2.4
Operations in the Instruction Set 2.5 Type and
Size of Operands 2.6 Encoding and Instruction
Set 2.7 The Role of Compilers 2.8 The DLX
Architecture
  • Sections Include
  • Interpreting Memory Addresses
  • Addressing Modes
  • Displacement Address Mode
  • Immediate Address Mode

33
Memory Addressing
Interpreting Memory Addresses
  • What object is accessed as a function of the
    address and length?
  • Objects have byte addresses: an address refers
    to the number of bytes counted from the beginning
    of memory.
  • Little Endian puts the byte whose address is
    xx00 at the least significant position in the
    word.
  • Big Endian puts the byte whose address is xx00
    at the most significant position in the word.
  • Alignment: data must be aligned on a boundary
    equal to its size. Misalignment typically
    results in an alignment fault that must be
    handled by the Operating System.
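A standard C idiom (not from the slides) for checking which byte-ordering convention the host machine uses:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t word = 0x11223344;
        uint8_t *p = (uint8_t *)&word;    /* look at the byte with the lowest address */

        /* Little endian stores the least significant byte (0x44) first;
         * big endian stores the most significant byte (0x11) first.    */
        printf("byte at lowest address: 0x%02x -> %s endian\n",
               p[0], p[0] == 0x44 ? "little" : "big");
        return 0;
    }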

34
Memory Addressing
Addressing Modes
  • This table shows the most common modes.

Addressing Mode     Example Instruction   Meaning                            When Used
Register            Add R4, R3            R[R4] <- R[R4] + R[R3]             When a value is in a register.
Immediate           Add R4, 3             R[R4] <- R[R4] + 3                 For constants.
Displacement        Add R4, 100(R1)       R[R4] <- R[R4] + M[100 + R[R1]]    Accessing local variables.
Register Deferred   Add R4, (R1)          R[R4] <- R[R4] + M[R[R1]]          Using a pointer or a computed address.
Absolute            Add R4, (1001)        R[R4] <- R[R4] + M[1001]           Used for static data.
35
Memory Addressing
Displacement Addressing Mode
  • How big should the displacement be?
  • For addresses that do fit in displacement size
  • Add R4, 10000 (R0)
  • For addresses that don't fit in the displacement
    size, the compiler must do the following:
  • Load R1, address
  • Add R4, 0 (R1)
  • How big the field should be depends on the
    displacements typical programs use.
  • On both IA32 and DLX, the space allocated is 16
    bits.

36
Memory Addressing
Immediate Address Mode
  • Used where we want to get to a numerical value in
    an instruction.

At the high level:
    a = b + 3
    if ( a > 17 ) goto Addr
At the assembler level:
    Load R2, 3
    Add R0, R1, R2
    Load R2, 17
    CMPBGT R1, R2
    Load R1, Address
    Jump (R1)
37
Operations In The Instruction Set
2.1 Introduction
2.2 Classifying Instruction Set
Architectures 2.3 Memory Addressing 2.4
Operations in the Instruction Set 2.5 Type and
Size of Operands 2.6 Encoding and Instruction
Set 2.7 The Role of Compilers 2.8 The DLX
Architecture
  • Sections Include
  • Detailed information about types of instructions.
  • Instructions for Control Flow (conditional
    branches, jumps)

38
Operations In The Instruction Set
Operator Types
  • Arithmetic and logical and, add
  • Data transfer move, load
  • Control branch, jump, call
  • System system call, traps
  • Floating point add, mul, div, sqrt
  • Decimal add, convert
  • String move, compare
  • Multimedia - 2D, 3D? e.g., Intel MMX and Sun
    VIS

39
Operations In The Instruction Set
Control Instructions
Conditional branches are 20% of all instructions!!
  • Control Instructions Issues
  • taken or not
  • where is the target
  • link return address
  • save or restore
  • Instructions that change the PC
  • (conditional) branches, (unconditional) jumps
  • function calls, function returns
  • system calls, system returns

40
Operations In The Instruction Set
Control Instructions
  • There are numerous tradeoffs
  • Compare and branch
  •  + no extra compare, no state passed between
     instructions
  •  -- requires ALU op, restricts code scheduling
     opportunities
  • Implicitly set condition codes (Z, N, V, C)
  •  + can be set "for free"
  •  -- constrains code reordering, extra state to
     save/restore
  • Explicitly set condition codes
  •  + can be set "for free", decouples branch/fetch
     from pipeline
  •  -- extra state to save/restore

There are numerous tradeoffs. Condition in general-purpose
register:
 + no special state, but uses up a register
 -- branch condition separate from branch logic in pipeline
Some data for MIPS: > 80% of branches use immediate data,
and > 80% of those compare against zero; 50% of branches
use = 0 or <> 0. Compromise in MIPS: branch=0, branch<>0,
plus compare instructions for all other compares.
41
Operations In The Instruction Set
Control Instructions
  • Link Return Address
  • implicit register: many recent architectures use
    this
  •  + fast, simple
  •  -- s/w must save the register before the next call;
     surprise traps?
  • explicit register
  •  + may avoid saving a register
  •  -- register must be specified
  • processor stack
  •  + recursion is direct
  •  -- complex instructions

Save or restore state: what state?
  function calls: registers
  system calls: registers, flags, PC, PSW, etc.
Hardware need not save registers:
  the caller can save the registers in use, or
  the callee can save the registers it will use.
Hardware register save: IBM STM, VAX CALLS. Faster?
Many recent architectures do no register saving, or do
implicit register saving with register windows (SPARC).
42
Type And Size of Operands
2.1 Introduction
2.2 Classifying Instruction Set
Architectures 2.3 Memory Addressing 2.4
Operations in the Instruction Set 2.5 Type and
Size of Operands 2.6 Encoding and Instruction
Set 2.7 The Role of Compilers 2.8 The DLX
Architecture
  • The type of the operand is usually encoded in the
    opcode: a LDW implies loading of a word.
  • Common sizes are
  • Character (1 byte)
  • Half word (16 bits)
  • Word (32 bits)
  • Single Precision Floating Point (1 Word)
  • Double Precision Floating Point (2 Words)
  • Integers are two's complement binary.
  • Floating point is IEEE 754.
  • Some languages (like COBOL) use packed decimal.
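As a rough, assumed mapping onto C types on a typical 32-bit target (exact sizes are implementation-defined, not something the slide states):

    #include <stdio.h>

    int main(void) {
        /* Typical ILP32 sizes; C only guarantees minimum widths. */
        printf("char   %zu byte(s)   (character)\n",           sizeof(char));
        printf("short  %zu byte(s)   (half word, 16 bits)\n",  sizeof(short));
        printf("int    %zu byte(s)   (word, 32 bits)\n",       sizeof(int));
        printf("float  %zu byte(s)   (single precision)\n",    sizeof(float));
        printf("double %zu byte(s)   (double precision)\n",    sizeof(double));
        return 0;
    }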

43
Encoding And Instruction Set
  • This section has to do with how an assembly level
    instruction is encoded into binary.
  • Ultimately, it's the binary that is read and
    interpreted by the machine.

2.1 Introduction
2.2 Classifying Instruction Set
Architectures 2.3 Memory Addressing 2.4
Operations in the Instruction Set 2.5 Type and
Size of Operands 2.6 Encoding and Instruction
Set 2.7 The Role of Compilers 2.8 The DLX
Architecture
44
Encoding And Instruction Set
80x86 Instruction Encoding
Here's some sample code that's been disassembled.
It was compiled with the debug option, so it is
not optimized.
  • for ( index = 0; index < iterations; index++ )
  • 0040D3AF C7 45 F0 00 00 00 00   mov   dword ptr [ebp-10h],0
  • 0040D3B6 EB 09                  jmp   main+0D1h (0040d3c1)
  • 0040D3B8 8B 4D F0               mov   ecx,dword ptr [ebp-10h]
  • 0040D3BB 83 C1 01               add   ecx,1
  • 0040D3BE 89 4D F0               mov   dword ptr [ebp-10h],ecx
  • 0040D3C1 8B 55 F0               mov   edx,dword ptr [ebp-10h]
  • 0040D3C4 3B 55 F8               cmp   edx,dword ptr [ebp-8]
  • 0040D3C7 7D 15                  jge   main+0EEh (0040d3de)
  • long_temp = (alignment + long_temp) % 47;
  • 0040D3C9 8B 45 F4               mov   eax,dword ptr [ebp-0Ch]
  • 0040D3CC 8B 00                  mov   eax,dword ptr [eax]
  • 0040D3CE 03 45 EC               add   eax,dword ptr [ebp-14h]
  • 0040D3D1 99                     cdq
  • 0040D3D2 B9 2F 00 00 00         mov   ecx,2Fh
  • 0040D3D7 F7 F9                  idiv  eax,ecx
  • 0040D3D9 89 55 EC               mov   dword ptr [ebp-14h],edx
  • 0040D3DC EB DA                  jmp   main+0C8h (0040d3b8)

This code was produced using Visual Studio
45
Encoding And Instruction Set
80x86 Instruction Encoding
Here's some sample code that's been disassembled.
It was compiled with optimization.
  • for ( index = 0; index < iterations; index++ )
  • 00401000 8B 0D 40 54 40 00      mov   ecx,dword ptr ds:[405440h]
  • 00401006 33 D2                  xor   edx,edx
  • 00401008 85 C9                  test  ecx,ecx
  • 0040100A 7E 14                  jle   00401020
  • 0040100C 56                     push  esi
  • 0040100D 57                     push  edi
  • 0040100E 8B F1                  mov   esi,ecx
  • long_temp = (alignment + long_temp) % 47;
  • 00401010 8D 04 11               lea   eax,[ecx+edx]
  • 00401013 BF 2F 00 00 00         mov   edi,2Fh
  • 00401018 99                     cdq
  • 00401019 F7 FF                  idiv  eax,edi
  • 0040101B 4E                     dec   esi
  • 0040101C 75 F2                  jne   00401010
  • 0040101E 5F                     pop   edi
  • 0040101F 5E                     pop   esi
  • 00401020 C3                     ret

This code was produced using Visual Studio
46
Encoding And Instruction Set
80x86 Instruction Encoding
Here's some sample code that's been disassembled.
It was compiled with optimization.
  • for ( index = 0; index < iterations; index++ )
  • 0x804852f <main+143>   add    $0x10,%esp
  • 0x8048532 <main+146>   lea    0xfffffff8(%ebp),%edx
  • 0x8048535 <main+149>   test   %esi,%esi
  • 0x8048537 <main+151>   jle    0x8048543 <main+163>
  • 0x8048539 <main+153>   mov    %esi,%eax
  • 0x804853b <main+155>   nop
  • 0x804853c <main+156>   lea    0x0(%esi,1),%esi
  • long_temp = (alignment + long_temp) % 47;
  • 0x8048540 <main+160>   dec    %eax
  • 0x8048541 <main+161>   jne    0x8048540 <main+160>
  • 0x8048543 <main+163>   add    $0xfffffff4,%esp

This code was produced using gcc and gdb.
Note that the representation of the code is
dependent on the compiler/debugger!
47
Encoding And Instruction Set
  • 80x86 Instruction Encoding

A morass of disjoint encoding!!  (field widths in bits)
ADD  :  ADD (4)   | Reg (3)        | w (1)        | Disp. (8)
SHL  :  SHL (6)   | v/w (2)        | postbyte (8) | Disp. (8)
TEST :  TEST (7)  | w (1)          | postbyte (8) | Immediate (8)
48
Encoding And Instruction Set
  • 80x86 Instruction Encoding

(field widths in bits)
JE   :  JE (4)    | Cond (4)       | Disp. (8)
CALLF:  CALLF (8) | Offset (16)    | Segment Number (16)
MOV  :  MOV (6)   | d/w (2)        | postbyte (8) | Disp. (8)
PUSH :  PUSH (5)  | Reg (3)
49
The Role of Compilers
2.1 Introduction
2.2 Classifying Instruction Set
Architectures 2.3 Memory Addressing 2.4
Operations in the Instruction Set 2.5 Type and
Size of Operands 2.6 Encoding and Instruction
Set 2.7 The Role of Compilers 2.8 The DLX
Architecture
  • Compiler goals
  • All correct programs execute correctly
  • Most compiled programs execute fast
    (optimizations)
  • Fast compilation
  • Debugging support

50
The Role of Compilers
  • Steps In Compilation

Parsing --> intermediate representation
Optimizations: jump optimization, loop optimizations,
common subexpression elimination, procedure in-lining,
constant propagation, strength reduction
Register allocation
Code generation (pipeline scheduling) --> assembly code
51
The Role of Compilers
  • Steps In Compilation

Optimization Name   Explanation                                         % of total optimizing transformations
High Level          At or near the source level; machine-independent    Not Measured
Local               Within straight-line code                           40%
Global              Across a branch                                     42%
Machine Dependent   Depends on machine knowledge                        Not Measured
52
The Role of Compilers
  • What compiler writers want
  • regularity
  • orthogonality
  • composability
  • Compilers perform a giant case analysis
  • too many choices make it hard
  • Orthogonal instruction sets
  • operation, addressing mode, data type
  • One solution or all possible solutions
  • 2 branch conditions eq, lt
  • or all six eq, ne, lt, gt, le, ge
  • not 3 or 4
  • There are advantages to having instructions that
    are primitives.
  • Let the compiler put the instructions together to
    make more complex sequences.

53
The MIPS Architecture
  • MIPS is very RISC oriented.
  • MIPS will be used for many examples throughout
    the course.

2.1 Introduction
2.2 Classifying Instruction Set
Architectures 2.3 Memory Addressing 2.4
Operations in the Instruction Set 2.5 Type and
Size of Operands 2.6 Encoding and Instruction
Set 2.7 The Role of Compilers 2.8 The MIPS
Architecture
MIPS (originally an acronym for Microprocessor
without Interlocked Pipeline Stages) is a RISC
microprocessor architecture developed by MIPS
Technologies. We will look at the Pipeline
concept in our next lecture.
The acronym RISC (pronounced "risk"), for Reduced
Instruction Set Computer, represents a CPU design
strategy emphasizing the insight that simplified
instructions which "do less" may still provide
higher performance if this simplicity can be
used to make instructions execute very fast.
Well known RISC families include DEC Alpha, ARC,
ARM, AVR, MIPS, PA-RISC, Power Architecture
(including PowerPC), and SPARC.
54
The MIPS Architecture
  • MIPS Characteristics
  • Addressing Modes
  • Immediate
  • Displacement
  • (Register Mode used only for ALU)

32-bit byte addresses, aligned
Load/store: only displacement addressing
Standard data types, 3 fixed-length formats
32 32-bit GPRs (r0 = 0)
16 64-bit (or 32 32-bit) FPRs + FP status register
No condition codes
  • Data transfer
  • load/store word, load/store byte/half word
    signed?
  • load/store FP single/double
  • moves between GPRs and FPRs
  • ALU
  • add/subtract signed? immediate?
  • multiply/divide signed?
  • and, or, xor immediate?, shifts ll, rl, ra
    immediate?
  • sets immediate?

There's MIPS64, the current architecture: standard data
types, 4 fixed-length formats (8, 16, 32, 64 bits),
32 64-bit GPRs (r0 = 0), 64 64-bit FPRs.
55
The MIPS Architecture
  • MIPS Characteristics
  • Control
  • branches =0, <>0
  • conditional branch testing FP bit
  • jump, jump register
  • jump & link, jump & link register
  • trap, return-from-exception
  • Floating Point
  • add/sub/mul/div
  • single/double
  • fp converts, fp set

56
The DLX Architecture
The DLX is a RISC processor architecture designed
by the principal designers of the MIPS and the
Berkeley RISC designs, the two benchmark examples
of RISC design. The DLX is essentially a cleaned-up
and simplified MIPS with a simple 32-bit
load/store architecture. Intended primarily for
teaching purposes, the DLX design is widely used
in university-level computer architecture courses.
The next couple of lectures will use the MIPS and
DLX architectures as examples to demonstrate
concepts.
57
End of Lecture