1
William Stallings Computer Organization and
Architecture
  • Chapter 12
  • CPU Structure
  • and Function

2
Topics
  • Processor Organization
  • Register Organization
  • Instruction Cycle
  • Instruction Pipelining

3
CPU Organization
  • What does the CPU do?
  • Fetch instructions
  • Interpret instructions
  • Fetch data
  • Process data
  • Write data
  • Major components of the CPU
  • ALU
  • Control unit
  • Registers

4
Internal Structure of CPU
5
Registers
  • CPU must have some working space (temporary
    storage)
  • e.g., store data temporarily, remember where to
    get next instruction
  • Registers = temporary storage in the CPU
  • Number and function vary between processor
    designs
  • One of the major design decisions
  • Top level of memory hierarchy
  • Small set of high-speed (expensive) storage

6
Register Organization
  • Two types of registers
  • User-visible registers
  • may be referenced by assembly-level instructions
  • Control and status registers
  • used by the control unit to control CPU operations
    and by OS programs

7
User Visible Registers
  • General-Purpose
  • Data
  • Address
  • Condition Codes

8
General-Purpose Registers
  • May be true general purpose
  • e.g., PDP-11 R0–R7
  • May be restricted
  • e.g., stack pointer, floating-point registers
  • May be used for data or addressing
  • Data
  • e.g., accumulator, Motorola 68000 D0–D7
  • Addressing
  • e.g., segment registers, stack pointer, Motorola
    68000 A0–A7

9
Design Issues of General-Purpose Registers (1)
  • General purpose vs. specialized
  • General purpose
  • Increases flexibility and programmer options
  • Increases instruction size and complexity
  • Specialized (with implicit specification in opcode)
  • Smaller (faster) instructions, since implicit
    specification saves bits
  • Less flexibility
  • Which is better?
  • No final and best answer
  • Trend toward specialized

10
Design Issues of General-Purpose Registers (2)
  • How many?
  • Between 8 and 32
  • Fewer → more memory references
  • More does not noticeably reduce memory references
    and takes up processor real estate
  • See also RISC (later)
  • Advantages of using hundreds of registers?

11
Design Issues of General-Purpose Registers (3)
  • How big?
  • Large enough to hold full address
  • Large enough to hold full word
  • Often possible to combine two data registers
  • e.g., PDP-11: two registers hold one long integer

12
Condition Code Registers
  • Sets of individual bits
  • Set by the CPU as the result of an operation
  • e.g., result of last operation was zero → Z bit
    set to 1
  • Can be read (implicitly) by programs
  • e.g., branch if zero
  • Might be set by programs
  • Some instructions can set or clear condition codes
    (see the sketch below)
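A minimal Python sketch of this behavior (the toy ALU and flag names are
illustrative assumptions, not any real processor's condition-code set):

    # Toy ALU operation that updates condition codes as a side effect.
    def alu_sub(a, b, flags):
        result = a - b
        flags["Z"] = (result == 0)   # Z bit set to 1 if the result was zero
        flags["N"] = (result < 0)    # sign of the last result
        return result

    flags = {"Z": False, "N": False}
    alu_sub(5, 5, flags)             # result is zero ...
    if flags["Z"]:                   # ... read implicitly by "branch if zero"
        print("BEQ would be taken")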

13
Control and Status Registers
  • Program Counter
  • Instruction Register
  • Memory Address Register
  • Memory Buffer Register
  • Refresh your memory: what do these all do?
  • Program Status Word (PSW)

14
Program Status Word
  • A set of bits
  • Includes Condition Codes
  • Sign of last result
  • Zero
  • Carry
  • Overflow
  • Extension
  • Equal
  • Interrupt enable/disable, interrupt mask
  • Supervisor

15
Supervisor Mode
  • Kernel mode
  • Monitor mode
  • Allows privileged instructions to execute
  • e.g., system (service) call
  • Used by operating system
  • Not available to user programs

16
Program Status Word - Example
  • Motorola 68000's PSW
  • System byte: trace mode (T), supervisor status (S),
    interrupt mask (I2 I1 I0)
  • User byte: condition codes (X N Z V C)

Bit:  15 14 13 12 11 10  9  8 |  7  6  5  4  3  2  1  0
Flag:  T  -  S  -  - I2 I1 I0 |  -  -  -  X  N  Z  V  C
      (system byte)           | (user byte)
17
Other Registers
  • May have registers pointing to (see OS)
  • Process control blocks
  • Interrupt vector register
  • Page table pointer
  • Design issues
  • OS Support
  • Allocation between registers and memory
  • CPU design and operating system design are
    closely linked!!

18
Example Register Organizations
19
Instruction Cycle
  • Revisit

20
Indirect Cycle
  • May require memory access to fetch operands
  • Indirect addressing requires more memory accesses
  • Can be thought of as an additional instruction subcycle

21
Instruction Cycle with Indirect
22
Instruction Cycle State Diagram
23
Data Flow (Instruction Fetch)
  • Depends on CPU design
  • In general
  • Fetch
  • 1. PC contains address of next instruction
  • 2. Address moved to MAR
  • 3. Address placed on address bus
  • 4. Control unit requests memory read
  • 5. Result placed on data bus, copied to MBR, then
    to IR
  • 6. Meanwhile, PC incremented by 1 (see the sketch below)
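A register-transfer sketch of these steps in Python (addresses and memory
contents are made up for illustration):

    memory = {100: "ADD A, B", 101: "BEQ NEXT"}   # toy instruction memory

    PC = 100                 # 1. PC holds address of next instruction
    MAR = PC                 # 2. address moved to MAR, then onto address bus
    MBR = memory[MAR]        # 3-5. control unit requests read; result to MBR ...
    IR = MBR                 #      ... then copied to IR
    PC += 1                  # 6. meanwhile PC is incremented
    print(IR, PC)            # -> ADD A, B 101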

24
Data Flow (Fetch Diagram)
25
Data Flow (Data Fetch)
  • IR is examined
  • If indirect addressing is used, an indirect cycle
    is performed
  • 1. Rightmost N bits of MBR (the address field of
    the instruction format: opcode | address)
    transferred to MAR
  • 2. Control unit requests memory read
  • 3. Result (address of operand) moved to MBR
    (see the sketch below)
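Continuing the sketch in Python, the indirect cycle turns the instruction's
address field into the operand's real address (all values are made up):

    memory = {200: 300, 300: 42}   # location 200 holds the operand's address

    MBR = 200                # address field of the fetched instruction
    MAR = MBR                # 1. rightmost N bits of MBR transferred to MAR
    MBR = memory[MAR]        # 2-3. memory read; MBR now holds the operand's
                             #      address (300)
    operand = memory[MBR]    # the later data fetch then uses that address
    print(operand)           # -> 42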

26
Data Flow (Indirect Diagram)
27
Data Flow (Execute)
  • May take many forms
  • Depends on instruction being executed
  • May include
  • Memory read/write
  • Input/Output
  • Register transfers
  • ALU operations

28
Data Flow (Interrupt)
  • Simple, predictable
  • Current PC saved to allow resumption after the
    interrupt
  • Steps
  • 1. Contents of PC copied to MBR
  • 2. Special memory location (e.g., stack pointer)
    loaded to MAR
  • 3. Control unit requests memory WRITE
  • 4. MBR written to memory
  • 5. PC loaded with address of interrupt-handling
    routine
  • (Next instruction (first of interrupt handler)
    can then be fetched; see the sketch below)
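A Python sketch of these steps (the addresses, handler location, and the use
of a plain dict as memory are all assumptions):

    memory = {}
    SP  = 0xFF               # special memory location (stack pointer)
    PC  = 0x20               # where execution would have continued
    ISR = 0x80               # address of the interrupt-handling routine

    MBR = PC                 # 1. contents of PC copied to MBR
    MAR = SP                 # 2. special memory location loaded into MAR
    memory[MAR] = MBR        # 3-4. control unit requests WRITE; MBR to memory
    PC = ISR                 # 5. PC loaded with the handler's address
    # the next fetch gets the handler's first instruction; the PC saved at
    # memory[SP] allows resumption afterwards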

29
Data Flow (Interrupt Diagram)
30
Instruction Pipelining
  • Similar to an assembly line in a manufacturing plant
  • Products at various stages can be worked on
    simultaneously
  • ⇒ performance improved
  • First attempt: 2 stages
  • Fetch
  • Execution

31
Prefetch
  • Fetch: accesses main memory
  • Execution: usually does not access main memory
  • Can fetch next instruction during execution of
    current instruction
  • Called instruction prefetch or fetch overlap
  • Ideally, instruction cycle time would be halved
  • (if duration of Fetch = duration of Execution)

32
Improved Performance (1)
  • But not doubled in reality, why?
  • Fetch usually shorter than execution
  • Prefetch more than one instruction?
  • Any jump or branch (branching) means that
    prefetched instructions are not the required
    instructions
  • e.g., ADD A, B
  •       BEQ NEXT
  •       ADD B, C
  • NEXT: SUB C, D

33
Improved Performance (2)
  • Add more stages to improve performance
  • Reduce time loss due to branching by guessing
  • Prefetch instruction after branching instruction
  • If not branched
  • use the prefetched instruction
  • else
  • discard the prefetched instruction
  • fetch new instruction

34
Two Stage Instruction Pipeline
35
Pipelining
  • More stages → more speedup (the more the merrier)
  • FI: Fetch instruction
  • DI: Decode instruction
  • CO: Calculate operands (i.e., effective addresses)
  • FO: Fetch operands
  • EI: Execute instruction
  • WO: Write result
  • Various stages are of nearly equal duration
  • Overlap these operations (see the sketch below)
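A short Python sketch that prints this overlap as a timing diagram (it
assumes no branches and equal-duration stages):

    STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

    def timing(n, stages=STAGES):
        k = len(stages)
        for i in range(n):                  # instruction i+1 enters at cycle i+1
            row = ["  "] * i + stages + ["  "] * (n - 1 - i)
            print(f"I{i + 1:<2} " + " ".join(row))
        print("total time units:", k + (n - 1))

    timing(9)   # 9 instructions, 6 stages -> 14 time units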

36
Timing of Pipeline
37
Speedup of Pipelining (1)
  • 9 instructions, 6 stages
  • w/o pipelining: 9 × 6 = 54 time units
  • w/ pipelining: 6 + (9 − 1) = 14 time units
  • speedup: 54 / 14 ≈ 3.9
  • Q: 100 instructions, 6 stages, speedup = 600 / 105 ≈ 5.7
  • Q: n instructions, k stages, speedup = nk / (k + n − 1)
  • Can you prove it (formally)?

38
Speedup of Pipelining (2)
  • Parameters
  • τ = pipeline cycle time = time to advance a set of
    instructions one stage
  • k = number of stages
  • n = number of instructions
  • Assuming no branches, time to execute n instructions:
    Tk = [k + (n − 1)]τ
  • Time to execute n instructions without pipelining:
    T1 = nkτ

39
Speedup of Pipelining (3)
  • Speedup of k-stage pipelining compared to no
    pipelining:
    Sk = T1 / Tk = nkτ / [k + (n − 1)]τ = nk / (k + n − 1)
  • Q: n instructions, k stages: as n → ∞, Sk → k
  • Can you prove it (formally)? (see the sketch below)
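The formula evaluated in Python for the quiz numbers (a sketch; with
equal-duration stages and no branches, τ cancels out):

    def speedup(n, k):
        """S_k = T1 / Tk = n * k / (k + (n - 1))"""
        return (n * k) / (k + (n - 1))

    print(speedup(9, 6))     # 54 / 14   -> about 3.86
    print(speedup(100, 6))   # 600 / 105 -> about 5.71
    # as n grows, the speedup approaches k (here, 6)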

40
Pipelining - Discussion
  • Not all stages are needed by every instruction
  • e.g., LOAD: WO not needed
  • Timing is set up assuming all stages are needed
    by each instruction
  • ⇒ simplifies pipeline hardware
  • Also assumes all stages can be performed in parallel
  • but e.g., FI, FO, and WO all access memory
    → memory conflicts

41
Limitation by Branching
  • Conditional branch instructions can invalidate
    several instruction prefetches
  • In our example (see next slide)
  • Instruction 3 is a conditional branch to
    instruction 15
  • Next instruction's address won't be known until
    instruction 3 is executed (at time unit 7)
  • ⇒ pipeline must be cleared
  • No instruction finishes from time units 9 to 12
  • ⇒ performance penalty

42
Branch in a Pipeline
43
Six Stage Instruction Pipeline
44
Alternative Pipeline Depiction
45
Speedup Factors with Instruction Pipelining
46
Limitation by Data Dependencies
  • Data needed by the current instruction may depend on
    a previous instruction that is still in the pipeline
  • e.g., A ← B + C
  •       D ← A + E (see the sketch below)
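A tiny Python sketch that detects this read-after-write dependency (the
(destination, sources) representation is an illustration only):

    i1 = ("A", {"B", "C"})    # A <- B + C
    i2 = ("D", {"A", "E"})    # D <- A + E

    def raw_hazard(first, second):
        # second reads a register that first has not yet written back
        return first[0] in second[1]

    print(raw_hazard(i1, i2)) # True: i2's FO must wait for i1's WO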

47
Performance of Pipeline
  • Ideally: more stages → more speedup
  • However,
  • more overhead in moving data between buffers
  • more overhead in preparation
  • more complex pipeline hardware circuitry

48
Dealing with Branches
  • Multiple Streams
  • Prefetch Branch Target
  • Loop buffer
  • Branch prediction
  • Delayed branching

49
Multiple Streams
  • Have two pipelines
  • Prefetch each branch into a separate pipeline
  • Use appropriate pipeline
  • Problems
  • Leads to bus and register contention delays
  • Multiple branches (i.e., additional branches
    entering the pipeline before the original branch
    decision is made) lead to further pipelines being
    needed
  • Can still improve performance
  • e.g., IBM 370/168

50
Prefetch Branch Target
  • Target of branch is prefetched in addition to
    instructions following branch
  • Keep target until branch is executed
  • If branch is taken, target is already prefetched
  • Used by IBM 360/91

51
Loop Buffer (1)
  • Small, very fast memory
  • Maintained by fetch (IF) stage of pipeline
  • Contains the n most recently fetched instructions
    in sequence
  • If a branch is to be taken
  • hardware checks whether the target is in the buffer
  • If YES then
  • next instruction is fetched from the buffer
  • else
  • fetch from memory (see the sketch below)
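A Python sketch of this check (buffer size and interface are assumptions;
real hardware matches on address bits):

    from collections import deque

    class LoopBuffer:
        def __init__(self, size=8):
            self.buf = deque(maxlen=size)   # n most recently fetched instructions

        def remember(self, addr, instr):    # maintained by the fetch stage
            self.buf.append((addr, instr))

        def fetch(self, target, memory):
            for addr, instr in self.buf:    # hardware checks the buffer first
                if addr == target:
                    return instr            # hit: no memory access needed
            instr = memory[target]          # miss: fetch from memory ...
            self.remember(target, instr)    # ... and keep it for the next pass
            return instr

    lb = LoopBuffer()
    mem = {0: "CMP", 1: "BNE 0"}
    print(lb.fetch(0, mem), lb.fetch(0, mem))   # second call hits the buffer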

52
Loop Buffer (2)
  • Reduce memory access time
  • Very good for small loops or jumps
  • If buffer is big enough to contain entire loop,
    instructions in the loop need to be fetched from
    memory only once at the first iteration
  • cf. cache
  • Used by CRAY-1

53
Loop Buffer Diagram
54
Branch Prediction (1)
  • Predict whether a branch will be taken
  • If the prediction is right
  • ⇒ no branch penalty
  • If the prediction is wrong
  • ⇒ empty the pipeline
  • fetch the correct instruction
  • ⇒ branch penalty

55
Branch Prediction (2)
  • Prediction techniques
  • Static
  • Predict never taken
  • Predict always taken
  • Predict by opcode
  • Dynamic
  • Taken/not taken switch
  • Branch history table

56
Branch Prediction (3)
  • Predict never taken
  • Assume the jump will not happen
  • Always fetch the next instruction
  • e.g., 68020, VAX 11/780
  • VAX will not prefetch after a branch if a page
    fault would result (OS vs. CPU design)
  • Predict always taken
  • Assume the jump will happen
  • Always fetch the target instruction

57
Branch Prediction (4)
  • Predict by opcode
  • Some instructions are more likely to result in a
    jump than others
  • Can achieve up to 75% success
  • Taken/not-taken switch
  • Based on previous history
  • Good for loops
  • Branch history table
  • Like a cache to look up (see the sketch below)
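A Python sketch of the taken/not-taken switch as the usual 2-bit saturating
counter (the concrete state encoding here is an assumption):

    class TwoBitPredictor:
        def __init__(self):
            self.state = 0          # 0,1: predict not taken; 2,3: predict taken

        def predict(self):
            return self.state >= 2

        def update(self, taken):    # step one state toward the actual outcome
            self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

    p = TwoBitPredictor()
    for outcome in [True, True, False, True]:   # loop-like, mostly taken
        print("predict taken?", p.predict(), "actual:", outcome)
        p.update(outcome)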

58
Branch Prediction Flowchart
59
Branch Prediction State Diagram
60
Dealing With Branches
61
Delayed Branching
  • Do not take the jump until you have to
  • Rearrange instructions so that the branch instruction
    occurs later than actually desired (Chapter 13)

62
Intel 80486 Pipelining
  • Fetch
  • From cache or external memory
  • Put in one of two 16-byte prefetch buffers
  • Fill buffer with new data as soon as old data is
    consumed
  • Average 5 instructions fetched per load
  • Independent of other stages, to keep buffers full
  • Decode stage 1
  • Opcode and address-mode info
  • At most the first 3 bytes of the instruction
  • Can direct the D2 stage to get the rest of the
    instruction
  • Decode stage 2
  • Expands opcode into control signals
  • Computes complex address modes
  • Execute
  • ALU operations, cache access, register update
  • Writeback
  • Updates registers and flags
  • Results sent to cache and to bus-interface write
    buffers

63
80486 Instruction Pipeline Examples
64
Foreground Reading
  • Processor examples
  • Stallings Chapter 12
  • Manufacturer web sites and specs
  • Web pages etc.