Title: Central Processing Unit Architecture
Central Processing Unit Architecture
- Architecture overview
- Machine organization
  - von Neumann
- Speeding up CPU operations
  - multiple registers
  - pipelining
  - superscalar and VLIW
  - CISC vs. RISC
Computer Architecture
- Major components of a computer
  - Central Processing Unit (CPU)
  - memory
  - peripheral devices
- Architecture is concerned with
  - internal structures of each
  - interconnections
    - speed and width
  - relative speeds of components
- Want maximum execution speed
- Balance is often the critical issue
Computer Architecture (continued)
- CPU
  - performs arithmetic and logical operations
  - synchronous operation
- May consider the instruction set architecture
  - how the machine looks to a programmer
- Or the detailed hardware design
Computer Architecture (continued)
- Memory
  - stores programs and data
  - organized as
    - bit
    - byte: 8 bits (smallest addressable location)
    - word: 4 bytes (typically; machine dependent)
  - instructions consist of operation codes and addresses
[Instruction formats: one-address (oprn | addr 1); two-address (oprn | addr 2 | addr 1); three-address (oprn | addr 3 | addr 2 | addr 1)]
Computer Architecture (continued)
- Numeric data representations
- integer (exact representation)
  - sign-magnitude
  - 2's complement
    - negate a value by changing each 0 to 1 (and each 1 to 0), then adding 1
- floating point (approximate representation)
  - scientific notation: 0.3481 x 10^6
  - inherently imprecise
  - IEEE Standard 754-1985
[Formats: sign-magnitude: s | magnitude; floating point: s | exp | significand]
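To make the two representations concrete, here is a minimal C sketch (the values are arbitrary examples): it negates an integer using the invert-and-add-1 rule, then pulls the sign, exponent, and significand fields out of an IEEE 754 single-precision value.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void) {
        /* 2's complement: negate by inverting every bit, then adding 1 */
        int8_t x = 5;
        int8_t neg = (int8_t)(~x + 1);
        printf("%d negated: %d (bit pattern 0x%02X)\n", x, neg, (uint8_t)neg);

        /* IEEE 754 single precision: s (1 bit) | exp (8 bits) | significand (23 bits) */
        float f = 0.3481e6f;             /* the slide's 0.3481 x 10^6 */
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);  /* view the float's raw bit pattern */
        printf("s=%u exp=%u significand=0x%06X\n",
               (unsigned)(bits >> 31), (unsigned)((bits >> 23) & 0xFF),
               (unsigned)(bits & 0x7FFFFF));
        return 0;
    }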
Simple Machine Organization
- Institute for Advanced Study (IAS) machine (1947)
  - von Neumann machine
- ALU performs transfers between memory and I/O devices
- note: two instructions per memory word
[Diagram: Program Control Unit, Arithmetic-Logic Unit, Main Memory, Input-Output Equipment. Word format: op code (bits 0-7) | address (bits 8-19) | op code (bits 20-27) | address (bits 28-39)]
Simple Machine Organization (continued)
- ALU does arithmetic and logical comparisons
- AC (accumulator) holds results
- MQ (multiplier-quotient) holds second portion of long results
- MBR (memory buffer register) holds data while operation executes
Simple Machine Organization (continued)
- Program control determines what the computer does based on the instruction read from memory
- MAR (memory address register) holds address of memory cell to be read
- PC (program counter) holds address of next instruction to be read
- IR (instruction register) holds instruction being executed
- IBR (instruction buffer register) holds right half of instruction read from memory
Simple Machine Organization (continued)
- Machine operates on a fetch-execute cycle
- Fetch
  - PC → MAR
  - read M(MAR) into MBR
  - copy left and right instructions into IR and IBR
- Execute
  - address part of IR → MAR
  - read M(MAR) into MBR
  - execute opcode
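A minimal C sketch of the fetch-execute cycle on a toy one-address machine; the opcodes, word layout, and memory contents here are invented for illustration, not the IAS encoding.

    #include <stdio.h>
    #include <stdint.h>

    enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2 };

    int main(void) {
        /* each word: high byte = opcode, low byte = address (toy layout) */
        uint16_t mem[8] = { 0x0104, 0x0205, 0x0000, 0, 7, 35, 0, 0 };
        uint16_t pc = 0, mar, mbr, ir, ac = 0;

        for (;;) {
            mar = pc;               /* fetch: PC -> MAR */
            mbr = mem[mar];         /* read M(MAR) into MBR */
            ir  = mbr;              /* copy instruction into IR */
            pc++;
            uint8_t op = (uint8_t)(ir >> 8);
            if (op == OP_HALT) break;
            mar = ir & 0xFFu;       /* execute: address part of IR -> MAR */
            mbr = mem[mar];         /* read M(MAR) into MBR */
            if (op == OP_LOAD) ac = mbr;                    /* execute opcode */
            else if (op == OP_ADD) ac = (uint16_t)(ac + mbr);
        }
        printf("AC = %u\n", (unsigned)ac);  /* LOAD 7, ADD 35 -> AC = 42 */
        return 0;
    }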
Architecture Families
- Before the mid-60s, every new machine had a different instruction set architecture
  - programs from the previous generation didn't run on the new machine
  - cost of replacing software became too large
- IBM System/360 created the family concept
  - single instruction set architecture
  - wide range of price and performance with the same software
- Performance improvements based on different detailed implementations
  - memory path width (1 byte to 8 bytes)
  - faster, more complex CPU design
  - greater I/O throughput and overlap
- Software compatibility now a major issue
  - partially offset by high-level language (HLL) software
Multiple Register Machines
- Initially, machines had only a few registers
  - 2 to 8 or 16 common
  - registers more expensive than memory
- Most instructions operated between memory locations
  - results had to start from and end up in memory, so fewer instructions (although more complex ones)
  - means smaller programs and (supposedly) faster execution
    - fewer instructions and data to move between memory and ALU
- But registers are much faster than memory
  - about 30 times faster
Multiple Register Machines (continued)
- Also, many operands are reused within a short time
  - waste time loading an operand again the next time it's needed
- Depending on the mix of instructions and operand use, having many registers may lead to less memory traffic and faster execution (see the sketch below)
- Most modern machines use a multiple register architecture
  - maximum number about 512; 32 integer plus 32 floating point is common
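A small C sketch of the operand-reuse point (the function and data are hypothetical): the scale factor is loaded from memory once into a local variable, which the compiler can keep in a register across the loop instead of reloading it on every iteration.

    #include <stdio.h>

    long sum_scaled(const int *a, int n, const int *scale) {
        long s = 0;
        int k = *scale;              /* load the operand from memory once */
        for (int i = 0; i < n; i++)
            s += (long)a[i] * k;     /* reuse k from a register; no reload */
        return s;
    }

    int main(void) {
        int a[4] = {1, 2, 3, 4}, scale = 10;
        printf("%ld\n", sum_scaled(a, 4, &scale));   /* prints 100 */
        return 0;
    }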
Pipelining
- One way to speed up the CPU is to increase the clock rate
  - limitations on how fast the clock can run and still complete an instruction
- Another way is to execute more than one instruction at a time
Pipelining
- Pipelining breaks instruction execution down into several stages
  - put registers between stages to buffer data and control
- execute one instruction
  - as the first enters its second stage, start executing the second instruction, etc.
- speedup equals the number of stages as long as the pipe is full
Pipelining (continued)
- Consider an example with 6 stages
  - FI: fetch instruction
  - DI: decode instruction
  - CO: calculate location of operand
  - FO: fetch operand
  - EI: execute instruction
  - WO: write operand (store result)
Pipelining Example
- Executes 9 instructions in 14 cycles rather than 54 for sequential execution
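The cycle counts follow from the usual ideal-pipeline formula; a one-line C check, assuming k = 6 stages and n = 9 instructions with no stalls:

    #include <stdio.h>

    int main(void) {
        int k = 6, n = 9;                                /* stages, instructions */
        printf("pipelined:  %d cycles\n", k + (n - 1));  /* 6 + 8 = 14 */
        printf("sequential: %d cycles\n", k * n);        /* 6 * 9 = 54 */
        return 0;
    }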
Pipelining (continued)
- Hazards to pipelining
  - conditional jump
    - instruction 3 branches to instruction 15
    - pipeline must be flushed and restarted
  - later instruction needs an operand being calculated by an instruction still in the pipeline
    - pipeline stalls until the result is ready (see the sketch below)
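In source terms the second hazard is a read-after-write dependence; a trivial C sketch (the variables are arbitrary):

    #include <stdio.h>

    int main(void) {
        int x = 3, y = 4;
        int a = x * y;   /* result not written back until late in the pipe */
        int b = a + 1;   /* needs a before it is ready: pipeline stalls */
        printf("%d\n", b);
        return 0;
    }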
Pipelining Problem Example
- Is this really a problem?
Real-life Problem
- Not all instructions execute in one clock cycle
  - floating point takes longer than integer
  - fp divide takes longer than fp multiply, which takes longer than fp add
  - typical values (cycles):
    - integer add/subtract: 1
    - memory reference: 1
    - fp add: 2 (make 2 stages)
    - fp (or integer) multiply: 6 (make 6 stages)
    - fp (or integer) divide: 15
- Break the floating point unit into a sub-pipeline
  - execute up to 6 instructions at once
Pipelining (continued)
- This is not simple to implement
  - note: all 6 instructions could finish at the same time!
More Speedup
- Pipelined machines issue one instruction each clock cycle
  - how to speed up the CPU even more?
- Issue more than one instruction per clock cycle
Superscalar Architectures
- Superscalar machines issue a variable number of instructions each clock cycle, up to some maximum
  - instructions must satisfy some criteria of independence (see the sketch below)
  - a simple choice is a maximum of one fp and one integer instruction per clock
  - need separate execution paths for each possible simultaneous instruction issue
  - compiled code from a non-superscalar implementation of the same architecture runs unchanged, but slower
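A C sketch of the independence criterion (the values are arbitrary): the first two statements below share no operands, so a one-fp-plus-one-integer superscalar could issue them in the same cycle, while the dependent third statement could not issue with the second.

    #include <stdio.h>

    int main(void) {
        int a = 2, b = 3;
        double x = 1.5, y = 2.5;

        int    i = a + b;    /* integer unit  \ independent: can issue  */
        double d = x * y;    /* fp unit       / in the same clock cycle */

        double e = d + 1.0;  /* depends on d: cannot issue with the line above */
        printf("%d %f %f\n", i, d, e);
        return 0;
    }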
Superscalar Example
[Figure: superscalar instruction issue across clock cycles 0-8]
- Each instruction path may be pipelined
Superscalar Problem
- Instruction-level parallelism
  - what if two successive instructions can't be executed in parallel?
    - data dependencies, or two instructions of the slow type
- Design the machine to increase multiple-execution opportunities
VLIW Architectures
- Very Long Instruction Word (VLIW) architectures store several simple instructions in one long instruction fetched from memory
- number and type are fixed
  - e.g., 2 memory reference, 2 floating point, 1 integer
- need one functional unit for each possible instruction
  - 2 fp units, 1 integer unit, 2 MBRs
  - all run synchronized
- each instruction is stored in a single word
  - requires wider memory communication paths
- many instructions may be empty, meaning wasted code space (see the sketch below)
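A C sketch of one fixed-format long instruction word, assuming the slot mix above (2 memory-reference, 2 floating-point, 1 integer); the field widths and NOP encoding are hypothetical.

    #include <stdio.h>
    #include <stdint.h>

    #define NOP 0u                 /* an empty slot: wasted code space */

    typedef struct {               /* fetched from memory as one unit */
        uint32_t mem_ref[2];       /* 2 memory-reference slots */
        uint32_t fp[2];            /* 2 floating-point slots */
        uint32_t integer;          /* 1 integer slot */
    } vliw_word;                   /* all 5 operations issue together */

    int main(void) {
        vliw_word w = { { 0x11u, NOP }, { 0x22u, 0x23u }, NOP };
        printf("%zu bytes per instruction word\n", sizeof w);  /* 20 */
        return 0;
    }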
Instruction Level Parallelism
- Success of superscalar and VLIW machines depends on the number of instructions that occur together that can be issued in parallel
  - no dependencies
  - no branches
- Compilers can help create parallelism
- Speculation techniques try to overcome branch problems
  - assume the branch is taken
  - execute instructions but don't let them store results until the status of the branch is known (see the sketch below)
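A C sketch of the speculation idea at the source level (the computation is made up): both paths are evaluated early, but the result is only committed once the branch condition is known.

    #include <stdio.h>

    int main(void) {
        int x = 7;
        int taken     = x + 1;   /* speculatively executed down the assumed path */
        int not_taken = x - 1;   /* work a misprediction would discard */
        int branch    = (x > 0); /* branch outcome resolved here */
        int result    = branch ? taken : not_taken;  /* commit only now */
        printf("%d\n", result);
        return 0;
    }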
CISC vs. RISC
- CISC: Complex Instruction Set Computer
- RISC: Reduced Instruction Set Computer
CISC vs. RISC (continued)
- Historically, machines tend to add features over time
  - instruction opcodes
    - IBM 70X, 70X0 series went from 24 opcodes to 185 in 10 years
    - over the same period, performance increased 30 times
  - addressing modes
  - special purpose registers
- Motivations are to
  - improve efficiency, since complex instructions can be implemented in hardware and execute faster
  - make life easier for compiler writers
  - support more complex higher-level languages
CISC vs. RISC
- Examination of actual code indicated many of these features were not used
- RISC advocates proposed
  - simple, limited instruction set
  - large number of general purpose registers
    - and mostly register operations
  - optimized instruction pipeline
- Benefits should include
  - faster execution of commonly used instructions
  - faster design and implementation
CISC vs. RISC
- Comparing some architectures
CISC vs. RISC
- Which approach is right?
  - Typically, RISC takes about 1/5 the design time
  - but CISC designs have adopted RISC techniques