Title: Processors
1 Processors
- CS227
- Western Washington University
2 von Neumann Model
Recall the von Neumann model of a computer: a CPU
(Central Processing Unit), main memory, and I/O devices,
all attached via a bus, which is a series of wires. What
makes up the CPU? Usually a CPU has a control
unit, an ALU (Arithmetic Logic Unit), and a
collection of registers. The control unit is
responsible for fetching instructions from memory
and determining their type.
3 The ALU is responsible for performing arithmetic
and boolean operations. The registers provide a
small amount of high-speed memory. All registers
are the same size, usually 16, 32, or 64 bits,
but each one has a particular function. One
important register is the PC (Program Counter),
which contains the address of the next
instruction to be executed. The IR (Instruction
Register) contains the instruction currently being
executed.
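A minimal sketch in Python of the CPU state just described: a few general-purpose registers plus the special-purpose PC and IR. The class name and register count are invented for illustration, not taken from any real machine.
```python
# Hypothetical model of CPU state: general-purpose registers, PC, and IR.
class CPUState:
    def __init__(self, num_registers=8):
        self.registers = [0] * num_registers  # general-purpose registers
        self.pc = 0     # Program Counter: address of the next instruction
        self.ir = None  # Instruction Register: the instruction being executed

cpu = CPUState()
print(cpu.registers, cpu.pc, cpu.ir)
```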
4 The Data Path
The registers of the CPU communicate with the ALU
via several buses called the data path. The ALU
also has registers for input and output that
receive data from and return data to the CPU
registers. One possible categorization of
instructions is register-memory versus
register-register. Register-memory instructions
load a word from memory into a register or store
a register's contents into memory. Register-register
instructions perform operations between
registers, such as addition.
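A rough illustration in Python, with made-up mnemonics and a toy memory: the register-memory instructions move data between memory and registers, while the register-register ADD operates entirely on registers.
```python
# Toy machine state; register names and instruction mnemonics are invented.
memory = [0] * 16
registers = {"R1": 0, "R2": 0, "R3": 0}

def load(reg, addr):       # register-memory: memory -> register
    registers[reg] = memory[addr]

def store(reg, addr):      # register-memory: register -> memory
    memory[addr] = registers[reg]

def add(dst, src1, src2):  # register-register: operates on registers only
    registers[dst] = registers[src1] + registers[src2]

memory[0], memory[1] = 5, 7
load("R1", 0)
load("R2", 1)
add("R3", "R1", "R2")
store("R3", 2)
print(memory[2])  # 12
```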
5 The data path cycle is the process of running two
operands through the ALU and storing the result in
a register. The speed of a processor is
highly dependent on the speed of the data path
cycle.
6 Instruction Execution
Instructions are executed by the CPU using a
fetch-decode-execute cycle.
- Step 1: Fetch the next instruction from memory into the IR.
- Step 2: Change the PC to point to the next instruction.
- Step 3: Determine the type of instruction just fetched.
- Step 4: If the instruction needs a word from memory, determine where it is.
- Step 5: Fetch the word into a register.
- Step 6: Execute the instruction.
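The six steps can be mimicked by a short interpreter loop. This is only a sketch in Python: the instruction set (LOADI, ADD, PRINT, HALT) and the tuple encoding are hypothetical, chosen to make the cycle visible.
```python
# Tiny fetch-decode-execute sketch with an invented instruction format.
memory = [
    ("LOADI", "R0", 2),           # put the constant 2 into R0
    ("LOADI", "R1", 3),           # put the constant 3 into R1
    ("ADD",   "R2", "R0", "R1"),  # R2 = R0 + R1
    ("PRINT", "R2"),
    ("HALT",),
]
registers = {"R0": 0, "R1": 0, "R2": 0}
pc = 0

while True:
    ir = memory[pc]               # Step 1: fetch the next instruction into the IR
    pc += 1                       # Step 2: advance the PC
    opcode = ir[0]                # Step 3: determine the instruction type
    if opcode == "LOADI":         # Steps 4-5 would locate and fetch a word from
        registers[ir[1]] = ir[2]  # memory; here the operand is in the instruction
    elif opcode == "ADD":         # Step 6: execute
        registers[ir[1]] = registers[ir[2]] + registers[ir[3]]
    elif opcode == "PRINT":
        print(registers[ir[1]])   # prints 5
    elif opcode == "HALT":
        break
```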
7 Instructions
All instructions are executed using the
fetch-decode-execute cycle. The speed of this
cycle has a great effect on the overall speed of
a program's completion. Some more complex
operations, such as floating-point operations and array
indexing, can take longer to execute, but these
operations need to be done frequently. So how
about defining instructions that explicitly
support these operations?
This solution requires hardware support.
Hardware support is justifiable for high-performance
machines, but not so much for low-end
machines.
8 Interpreted Instructions
IBM decided to develop families of computers
based on a particular architecture. However, the
architecture could be implemented in various
ways, differing in price and speed. But
how? Interpreted instructions are a
cost-effective alternative to hardware support
for all instructions. So some IBM computers
could be high end and have hardware support for
all instructions. Others could be low end, with
some instructions supported by an interpreter in
addition to the hardware. What are some other
advantages of interpreted support for instructions?
9 Advantages of Interpreted Instructions
- Incorrect instructions can be corrected in the field.
- New instructions can be added after the machine has been built.
- Cost limitations for complex instructions are alleviated.
- Processor development could actually be simplified, because complex instructions could be supported by the interpreter.
- These advantages required fast read-only memory, called a control store, to hold the interpreter.
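As a rough sketch of the interpreted-instruction idea above (Python, with hypothetical operation names): a low-end machine's interpreter might realize a complex MUL instruction as a loop of simpler ADDs that the hardware does support directly.
```python
# Hypothetical sketch: a complex instruction carried out by an interpreter
# in terms of a simpler hardware-supported operation.
def hardware_add(a, b):
    return a + b  # stands in for an instruction the hardware executes directly

def interpreted_mul(a, b):
    """MUL emulated as repeated ADDs (for nonnegative b), microcode-style."""
    result = 0
    for _ in range(b):
        result = hardware_add(result, a)
    return result

print(interpreted_mul(6, 7))  # 42, but slower than a hardware multiplier
```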
10 RISC
Rather than continuing to add more and more
complex instructions, some folks thought that
simpler might be better. If the instructions are
simpler, they may no longer require the support
of an interpreter. The term RISC was coined for
VLSI chips that did not require interpretation.
RISC stands for Reduced Instruction Set
Computer. One RISC chip was developed at
Berkeley by Patterson, and it later evolved into
the SPARC chip. A similar RISC chip, called MIPS,
was developed at Stanford. Previous chips
earned the description CISC, for Complex
Instruction Set Computer.
11 Who Won the War?
A battle between the RISC camp and the CISC camp
ensued. But no one ever really wins a war! RISC
definitely had performance advantages: all
instructions were supported by hardware, and the
chip could be designed cleanly, without any backward
compatibility issues. CISC already had a fan
club: it had dominated the market, and companies
had heavily invested in software applications for
CISC chips. The solution is a hybrid
approach: more complex instructions are
interpreted, while simpler, more common instructions are
directly supported by hardware.
12 Modern Computer Design
- Modern computer design is based on the RISC design principles:
- All instructions are directly executable by the hardware.
- Maximize the rate at which instructions are issued.
- Instructions should be easy to decode (see the sketch after this list).
- Only loads and stores should reference memory.
- Provide plenty of registers.
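One way to see the "easy to decode" principle: fixed-width instructions with fields at fixed bit positions can be decoded with a few shifts and masks. The 6/5/5/5/5/6-bit layout below follows the classic MIPS R-format; treat it as an illustrative assumption rather than a spec.
```python
# Decode a 32-bit R-format-style instruction word into fixed-position fields.
def decode(word):
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs":     (word >> 21) & 0x1F,
        "rt":     (word >> 16) & 0x1F,
        "rd":     (word >> 11) & 0x1F,
        "shamt":  (word >> 6)  & 0x1F,
        "funct":  word & 0x3F,
    }

# A MIPS-style add of registers 9 and 10 into register 8:
# opcode 0, rs 9, rt 10, rd 8, shamt 0, funct 0x20.
print(decode(0x012A4020))
```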
13 Parallelism
There are some physical limitations on clock
speed. Another approach to improving performance
is to try to get multiple things done at
once. This is parallelism. Parallelism can be
implemented at the instruction level and at the
processor level. Instruction-level parallelism
results in more instructions per second being issued by
the computer, rather than improving the execution
speed of any particular instruction. Processor-level
parallelism involves multiple processors
working together on the same problem.
14 Pipelining
The biggest bottleneck in the fetch-decode-execute
cycle is fetching the instruction from memory.
One solution is to prefetch instructions and
store them in a buffer. Then they'll already be
loaded in registers when it's time for
execution. Pipelining is an extension of
prefetching in that it divides instruction
execution into multiple parts. Each part of
execution is handled by a dedicated piece of
hardware, and these can run in
parallel. This is essentially an assembly-line
approach to executing an instruction. A single
instruction may actually take longer to execute,
but overall processor bandwidth is better.
15 A Sample Pipeline
Instruction Fetch Unit
Instruction Decode Unit
Operand Fetch Unit
Instruction Execution Unit
Write Back Unit
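A hedged sketch in Python of how the five stages listed above overlap across clock cycles. The timing model (one cycle per stage, no stalls or hazards) is an assumption made for illustration.
```python
# Print which instruction occupies each pipeline stage on each clock cycle,
# assuming an ideal pipeline: one cycle per stage, no stalls.
STAGES = ["Fetch", "Decode", "Operand Fetch", "Execute", "Write Back"]
NUM_INSTRUCTIONS = 7

for cycle in range(NUM_INSTRUCTIONS + len(STAGES) - 1):
    occupancy = []
    for stage_index, stage in enumerate(STAGES):
        instr = cycle - stage_index          # which instruction is in this stage
        if 0 <= instr < NUM_INSTRUCTIONS:
            occupancy.append(f"{stage}: I{instr}")
    print(f"cycle {cycle:2d}  " + "  ".join(occupancy))

# Each instruction still takes five cycles of latency, but once the pipeline
# is full one instruction completes every cycle -- the bandwidth win.
```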
16 Two pipelines are surely better than one! Pairs of
instructions can be fetched and then fed into the
pipelines to be completed simultaneously. Does
this always work? What if there are dependencies
between the two instructions? It is the
responsibility of the compiler to determine and
keep track of dependencies between the
instructions. Pipelines are mostly used on RISC
machines, but the Intel family introduced
pipelines in the 486 processor. One pipeline
handled any arbitrary instruction and the other
pipeline could execute only simple integer
instructions. Programs optimized for this architecture
showed a 100% speedup.
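A rough sketch of the kind of dependency check the compiler (or issue logic) must perform before pairing two instructions: if the second instruction reads a register the first one writes, the pair cannot safely complete simultaneously. The instruction representation is invented, and only the read-after-write case is checked.
```python
# Hypothetical instruction tuples: (destination register, source registers).
def can_pair(first, second):
    """True if the second instruction does not read the first one's result."""
    first_dest, _ = first
    _, second_sources = second
    return first_dest not in second_sources

add_1 = ("R1", ("R2", "R3"))   # R1 = R2 + R3
add_2 = ("R4", ("R1", "R5"))   # R4 = R1 + R5  -- needs the new R1
add_3 = ("R6", ("R7", "R8"))   # independent of add_1

print(can_pair(add_1, add_2))  # False: add_2 must wait for add_1's result
print(can_pair(add_1, add_3))  # True: these could go down the two pipelines together
```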
17 Superscalar Approach
What about four pipelines? Too much
architecture! How about a single pipeline with
multiple functional units? This is called a
superscalar architecture. This approach was
rooted in the work of Seymour Cray on the CDC
6600. Some sample functional units: ALU, LOAD
unit, STORE unit, and floating-point unit.
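A sketch of the superscalar idea under a simplifying assumption (one instruction per functional unit per cycle, no data dependencies): instructions from a single stream are issued together as long as each needs a different functional unit. Instruction names and the unit mapping are invented for illustration.
```python
# Issue instructions in program order each cycle until a functional unit
# would be needed twice; then start the next cycle.
UNIT_FOR = {"ADD": "ALU", "LOAD": "LOAD unit",
            "STORE": "STORE unit", "FMUL": "FP unit"}
program = ["LOAD", "FMUL", "ADD", "ADD", "STORE", "LOAD"]

cycle = 0
i = 0
while i < len(program):
    busy = set()
    issued = []
    while i < len(program) and UNIT_FOR[program[i]] not in busy:
        busy.add(UNIT_FOR[program[i]])   # claim the functional unit
        issued.append(program[i])
        i += 1
    print(f"cycle {cycle}: issue {issued}")
    cycle += 1
```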
18 Processor Parallelism
Two approaches to processor parallelism are array
processors and vector processors. In an array
processor, there is a large number of
identical processors. Each processor performs
the same operations on a different set of data,
and each processor has its own memory. Instructions
are broadcast by the control unit. What are some
of the challenges of this approach? In a vector
processor, all addition operations are performed
by a single, heavily pipelined adder.
19 How would each kind of processor handle element-wise
addition of two vectors? The array processor has an
adder for each element of the vector. The vector
processor uses vector registers, which can load
and store entire vectors of data from memory in a
single instruction; addition is then performed
by the pipelined adder using the data stored in
the vector registers. Which approach to
processor parallelism requires more
hardware? Which do you think is more difficult
to program? Which do you think is faster?
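A hedged sketch of the two styles of element-wise addition. The "array processor" version stands in for N identical adders working in lockstep; the "vector processor" version models loading vector registers and streaming the element pairs through one adder. Both are plain Python models, not real hardware.
```python
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# Array-processor style: conceptually, one adder per element pair,
# all operating at the same time (modeled here with a comprehension).
array_style = [x + y for x, y in zip(a, b)]

# Vector-processor style: load whole vectors into vector registers, then
# stream the element pairs through a single (heavily pipelined) adder.
vreg1, vreg2 = list(a), list(b)    # vector-register loads
vector_style = []
for x, y in zip(vreg1, vreg2):     # the one pipelined adder, one pair at a time
    vector_style.append(x + y)

print(array_style == vector_style == [11, 22, 33, 44])  # True
```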
20 Multiprocessors
A multiprocessor is made up of a collection of
CPUs sharing a common memory. How does each
processor access memory? How does a
multiprocessor differ from an array
processor? One possible organization of the
multiple CPUs within a multiprocessor is around a
single bus. But the bus quickly becomes a
bottleneck! One solution is to give each CPU a
local memory, where it can cache information.
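A rough model of the shared-memory idea using Python threads: every "CPU" reads and writes one common memory, and a lock stands in for the serialization imposed by the shared bus. Names and sizes are illustrative.
```python
import threading

shared_memory = [0] * 8          # one memory visible to every CPU
bus_lock = threading.Lock()      # stands in for arbitration on the shared bus

def cpu(cpu_id):
    for i in range(len(shared_memory)):
        with bus_lock:           # every access goes through the shared path
            shared_memory[i] += cpu_id

threads = [threading.Thread(target=cpu, args=(n,)) for n in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared_memory)             # every cell sees updates from all CPUs: all 6s
```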
21 Multicomputers
A multicomputer is similar to a multiprocessor in
that it is made up of a collection of CPUs, but
it differs in that there is no shared
memory. How do the individual CPUs communicate?
By sending messages. What are some of the risks of
a message-based system? What is an advantage of
a multicomputer over a multiprocessor?
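A minimal message-passing sketch using Python processes and a queue: each "computer" works on its own private data, and the only way to share a result is to send it as a message. The worker and data are hypothetical.
```python
import multiprocessing as mp

def worker(my_data, out_queue):
    partial_sum = sum(my_data)   # compute on private, local memory
    out_queue.put(partial_sum)   # communicate only by sending a message

if __name__ == "__main__":
    out_queue = mp.Queue()
    chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # each node's private data
    procs = [mp.Process(target=worker, args=(c, out_queue)) for c in chunks]
    for p in procs:
        p.start()
    total = sum(out_queue.get() for _ in chunks)  # assemble results from messages
    for p in procs:
        p.join()
    print(total)  # 45
```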