Title: William Stallings Computer Organization and Architecture
1 William Stallings, Computer Organization and Architecture
- Chapter 12
- CPU Structure and Function
2 Topics
- Processor Organization
- Register Organization
- Instruction Cycle
- Instruction Pipelining
3 CPU Organization
- What does the CPU do?
- Fetch instructions
- Interpret instructions
- Fetch data
- Process data
- Write data
- Major components of CPU
- ALU
- Control unit
- Registers
4 Internal Structure of CPU
5 Registers
- CPU must have some working space (temporary storage)
- e.g., store data temporarily, remember where to get the next instruction
- Registers provide temporary storage in the CPU
- Number and function vary between processor designs
- One of the major design decisions
- Top level of the memory hierarchy
- Small set of high-speed (expensive) storage
6 Register Organization
- Two types of registers
- User-visible registers
- may be referenced by assembly-level instructions
- Control and status registers
- used by the control unit to control CPU operations
- and by OS programs
7 User-Visible Registers
- General-Purpose
- Data
- Address
- Condition Codes
8 General-Purpose Registers
- May be true general purpose
- E.g., PDP-11 R0-R7
- May be restricted
- E.g., stack pointer, floating-point registers
- May be used for data or addressing
- Data
- e.g., accumulator; Motorola 68000 D0-D7
- Addressing
- e.g., segment registers, stack pointer; Motorola 68000 A0-A7
9 Design Issues of General-Purpose Registers (1)
- General purpose vs. specialized
- General purpose
- Increases flexibility and programmer options
- Increases instruction size and complexity
- Specialized (with implicit specification in the opcode)
- Smaller (faster) instructions, since implicit registers save bits
- Less flexibility
- Which is better?
- No final and best answer
- Trend is toward specialized registers
10 Design Issues of General-Purpose Registers (2)
- How many?
- Between 8 and 32
- Fewer → more memory references
- More does not reduce memory references, and takes up processor real estate
- See also RISC (later)
- Advantages of using hundreds of registers
11 Design Issues of General-Purpose Registers (3)
- How big?
- Large enough to hold a full address
- Large enough to hold a full word
- Often possible to combine two data registers
- e.g., PDP-11 uses two registers to hold one long integer
12 Condition Code Registers
- Sets of individual bits
- Set by CPU as the result of operations
- e.g., result of last operation was zero → Z bit ← 1
- Can be read (implicitly) by programs
- e.g., Branch if zero
- Might be set by programs
- Some instructions can set or clear condition codes explicitly
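The idea can be sketched in Python for a hypothetical 8-bit machine (not any specific ISA): an ALU operation sets the flag bits as a side effect, and a conditional branch reads them implicitly.

```python
# Sketch of condition-code setting on a hypothetical 8-bit machine:
# flags are set as a side effect of an ALU operation, then read
# implicitly by a conditional branch.

def sub_and_set_flags(a, b):
    """Subtract b from a, returning (result, flags) for an 8-bit datapath."""
    result = (a - b) & 0xFF
    flags = {
        "Z": result == 0,           # result of last operation was zero
        "N": bool(result & 0x80),   # sign of last result
        "C": a < b,                 # borrow out (carry flag)
    }
    return result, flags

def branch_if_zero(flags, target_pc, next_pc):
    """BEQ-style branch: reads the Z bit implicitly."""
    return target_pc if flags["Z"] else next_pc

result, flags = sub_and_set_flags(5, 5)
print(result, flags["Z"])             # 0 True
print(branch_if_zero(flags, 100, 2))  # 100: branch taken because Z is set
```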
13 Control and Status Registers
- Program Counter (PC)
- Instruction Register (IR)
- Memory Address Register (MAR)
- Memory Buffer Register (MBR)
- Refresh your memory: what do these all do?
- Program Status Word (PSW)
14 Program Status Word
- A set of bits
- Includes Condition Codes
- Sign of last result
- Zero
- Carry
- Overflow
- Extension
- Equal
- Interrupt enable/disable, interrupt mask
- Supervisor
15 Supervisor Mode
- Kernel mode
- Monitor mode
- Allows privileged instructions to execute
- e.g., system (service) call
- Used by operating system
- Not available to user programs
16 Program Status Word - Example
- Motorola 68000's PSW
- System Byte (bits 15-8): T (Trace Mode, bit 15), S (Supervisor Status, bit 13), I2 I1 I0 (Interrupt Mask, bits 10-8)
- User Byte (bits 7-0): X (Extend, bit 4), N (Negative, bit 3), Z (Zero, bit 2), V (Overflow, bit 1), C (Carry, bit 0)
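As a sketch, the 68000 status-register fields can be unpacked from a 16-bit value in Python (bit positions: T at 15, S at 13, I2-I0 at 10-8 in the system byte; X N Z V C at bits 4-0 in the user byte):

```python
# Sketch: unpack the Motorola 68000 status register from a 16-bit value.
# Bit positions: T=15, S=13, I2-I0=10..8 (system byte);
# X=4, N=3, Z=2, V=1, C=0 (user byte).

def decode_68000_sr(sr):
    return {
        "T": (sr >> 15) & 1,     # trace mode
        "S": (sr >> 13) & 1,     # supervisor status
        "I": (sr >> 8) & 0b111,  # interrupt mask I2 I1 I0
        "X": (sr >> 4) & 1,      # extend
        "N": (sr >> 3) & 1,      # negative (sign of last result)
        "Z": (sr >> 2) & 1,      # zero
        "V": (sr >> 1) & 1,      # overflow
        "C": sr & 1,             # carry
    }

# Supervisor mode, interrupt mask 7, Z flag set:
print(decode_68000_sr(0x2704))
```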
17 Other Registers
- May have registers pointing to (see O/S):
- Process control blocks
- Interrupt vector register
- Page table pointer
- Design issues
- OS support
- Allocation between registers and memory
- CPU design and operating system design are closely linked!
18 Example Register Organizations
19 Instruction Cycle
20 Indirect Cycle
- May require memory access to fetch operands
- Indirect addressing requires more memory accesses
- Can be thought of as an additional instruction subcycle
21 Instruction Cycle with Indirect
22 Instruction Cycle State Diagram
23 Data Flow (Instruction Fetch)
- Depends on CPU design
- In general
- Fetch
- 1. PC contains address of next instruction
- 2. Address moved to MAR
- 3. Address placed on address bus
- 4. Control unit requests memory read
- 5. Result placed on data bus, copied to MBR, then to IR
- 6. Meanwhile, PC incremented by 1
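The six fetch steps can be sketched as register transfers in Python (hypothetical word-addressed machine; register names follow the slides):

```python
# Sketch of the fetch cycle as register transfers on a hypothetical
# word-addressed machine (register names PC, MAR, MBR, IR as in the slides).

def fetch(cpu, memory):
    cpu["MAR"] = cpu["PC"]     # 2. address moved to MAR
    data = memory[cpu["MAR"]]  # 3-4. address on bus; control unit reads memory
    cpu["MBR"] = data          # 5. result copied to MBR...
    cpu["IR"] = cpu["MBR"]     #    ...then to IR
    cpu["PC"] += 1             # 6. meanwhile, PC incremented
    return cpu

cpu = {"PC": 0x10, "MAR": 0, "MBR": 0, "IR": 0}
memory = {0x10: 0xA1B2}        # instruction word at address 0x10
fetch(cpu, memory)
print(hex(cpu["IR"]), hex(cpu["PC"]))  # 0xa1b2 0x11
```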
24 Data Flow (Fetch Diagram)
- (diagram showing numbered steps 1-6 above)
25 Data Flow (Data Fetch)
- IR is examined
- If indirect addressing is used, an indirect cycle is performed:
- 1. Rightmost N bits of MBR transferred to MAR
- 2. Control unit requests memory read
- 3. Result (address of operand) moved to MBR
- (Instruction format: op-code | address field)
26 Data Flow (Indirect Diagram)
- (diagram showing numbered steps 1-3 above)
27 Data Flow (Execute)
- May take many forms
- Depends on instruction being executed
- May include
- Memory read/write
- Input/Output
- Register transfers
- ALU operations
28 Data Flow (Interrupt)
- Simple and predictable
- Current PC saved to allow resumption after the interrupt
- Steps include:
- 1. Contents of PC copied to MBR
- 2. Special memory location (e.g., stack pointer) loaded to MAR
- 3. Control unit requests memory WRITE
- 4. MBR written to memory
- 5. PC loaded with address of interrupt-handling routine
- (Next instruction (first of the interrupt handler) can then be fetched)
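The interrupt-entry steps can be sketched the same way (hypothetical machine; whether the stack pointer decrements after the push is an assumption here, not from the slides):

```python
# Sketch of interrupt entry as register transfers on a hypothetical machine:
# the PC is saved at the address held in SP, then the PC is loaded with the
# handler address. The post-decrement of SP is an assumption for this sketch.

def enter_interrupt(cpu, memory, handler_addr):
    cpu["MBR"] = cpu["PC"]           # 1. PC copied to MBR
    cpu["MAR"] = cpu["SP"]           # 2. save location (stack pointer) to MAR
    memory[cpu["MAR"]] = cpu["MBR"]  # 3-4. control unit WRITE; MBR to memory
    cpu["SP"] -= 1                   # (push; direction is an assumption)
    cpu["PC"] = handler_addr         # 5. PC <- interrupt handler address
    return cpu

cpu = {"PC": 0x42, "SP": 0xFF, "MAR": 0, "MBR": 0}
memory = {}
enter_interrupt(cpu, memory, handler_addr=0x80)
print(hex(cpu["PC"]), hex(memory[0xFF]))  # 0x80 0x42
```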
29 Data Flow (Interrupt Diagram)
- (diagram showing numbered steps 1-5 above)
30 Instruction Pipelining
- Similar to an assembly line in a manufacturing plant
- Products at various stages can be worked on simultaneously
- → Performance improved
- First attempt: 2 stages
- Fetch
- Execute
31 Prefetch
- Fetch: accesses main memory
- Execution: usually does not access main memory
- Can fetch next instruction during execution of current instruction
- Called instruction prefetch or fetch overlap
- Ideally, instruction cycle time would be halved
- (if duration(Fetch) = duration(Execute))
32 Improved Performance (1)
- But not doubled in reality, why?
- Fetch is usually shorter than execution
- Prefetch more than one instruction?
- Any jump or branch means that prefetched instructions are not the required instructions
- e.g., ADD A, B
- BEQ NEXT
- ADD B, C
- NEXT: SUB C, D
33 Improved Performance (2)
- Add more stages to improve performance
- Reduce time loss due to branching by guessing
- Prefetch instruction after branching instruction
- If not branched
- use the prefetched instruction
- else
- discard the prefetched instruction
- fetch new instruction
34 Two-Stage Instruction Pipeline
35 Pipelining
- More stages → more speedup (the more the merrier)
- FI: Fetch instruction
- DI: Decode instruction
- CO: Calculate operands (i.e., effective addresses)
- FO: Fetch operands
- EI: Execute instruction
- WO: Write result
- Various stages are of nearly equal duration
- Overlap these operations
36 Timing of Pipeline
37 Speedup of Pipelining (1)
- 9 instructions, 6 stages
- w/o pipelining: __ time units
- w/ pipelining: __ time units
- speedup: _____
- Q: 100 instructions, 6 stages, speedup ____
- Q: n instructions, k stages, speedup ____
- Can you prove it (formally)?
38 Speedup of Pipelining (2)
- Parameters
- τ: pipeline cycle time, the time to advance a set of instructions one stage
- k: number of stages
- n: number of instructions
- Assume no branches; time to execute n instructions: Tk = [k + (n - 1)]τ
- Time to execute n instructions without pipelining: T1 = nkτ
39 Speedup of Pipelining (3)
- Speedup of k-stage pipelining compared to no pipelining: Sk = T1 / Tk = nk / [k + (n - 1)]
- Q: n instructions, k stages, speedup ____
- Can you prove it (formally)?
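The formulas from the previous slide, T1 = nkτ and Tk = [k + (n - 1)]τ, can be checked numerically (τ cancels in the ratio):

```python
# Speedup of a k-stage pipeline, using T1 = n*k*tau and
# Tk = (k + (n - 1)) * tau from the slides (no branches, equal stage times).

def speedup(n, k):
    t1 = n * k        # without pipelining (in units of tau)
    tk = k + (n - 1)  # with pipelining
    return t1 / tk

print(speedup(9, 6))      # 54 / 14  ~= 3.86
print(speedup(100, 6))    # 600 / 105 ~= 5.71
# As n grows, the speedup approaches k:
print(speedup(10**6, 6))  # ~= 6.0
```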
40 Pipelining - Discussion
- Not all stages are needed by every instruction
- e.g., LOAD: WO not needed
- Timing is set up assuming all stages are needed by each instruction
- → Simplifies pipeline hardware
- Assume all stages can be performed in parallel
- e.g., FI, FO, and WO → memory conflicts
41 Limitation by Branching
- Conditional branch instructions can invalidate several instruction prefetches
- In our example (see next slide)
- Instruction 3 is a conditional branch to instruction 15
- Next instruction's address won't be known until instruction 3 is executed (at time unit 7)
- → pipeline must be cleared
- No instruction finishes from time units 9 to 12
- → performance penalty
42 Branch in a Pipeline
43 Six-Stage Instruction Pipeline
44 Alternative Pipeline Depiction
45 Speedup Factors with Instruction Pipelining
46 Limitation by Data Dependencies
- Data needed by the current instruction may depend on a previous instruction that is still in the pipeline
- E.g., A ← B + C
- D ← A + E
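The dependency in this example can be sketched as a simple read-after-write check (illustrative only; real pipelines do this comparison in hardware between stages):

```python
# Sketch: detecting the read-after-write dependency in the slide's example.
# A <- B + C writes A, which D <- A + E reads, so the second instruction
# must wait until the first has produced A.

def raw_hazard(writes, reads):
    """True if an earlier instruction's destination is a later one's source."""
    return bool(set(writes) & set(reads))

i1 = {"dest": ["A"], "src": ["B", "C"]}   # A <- B + C
i2 = {"dest": ["D"], "src": ["A", "E"]}   # D <- A + E
print(raw_hazard(i1["dest"], i2["src"]))  # True: i2 depends on i1
```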
47 Performance of Pipeline
- Ideally, more stages, more speedup
- However,
- more overhead in moving data between buffers
- more overhead in preparation
- more complex circuit for pipeline hardware
48 Dealing with Branches
- Multiple Streams
- Prefetch Branch Target
- Loop buffer
- Branch prediction
- Delayed branching
49 Multiple Streams
- Have two pipelines
- Prefetch each branch path into a separate pipeline
- Use the appropriate pipeline
- Problems
- Leads to bus and register contention delays
- Multiple branches (i.e., an additional branch entering the pipelines before the original branch decision is made) lead to further pipelines being needed
- Can still improve performance
- e.g., IBM 370/168
50 Prefetch Branch Target
- Target of branch is prefetched in addition to the instructions following the branch
- Keep target until branch is executed
- If branch is taken, target is already prefetched
- Used by IBM 360/91
51 Loop Buffer (1)
- Small, very fast memory
- Maintained by the fetch (IF) stage of the pipeline
- Contains the n most recently fetched instructions, in sequence
- If a branch is to be taken
- Hardware checks whether the target is in the buffer
- If YES then
- next instruction is fetched from the buffer
- else
- fetch from memory
52 Loop Buffer (2)
- Reduces memory access time
- Very good for small loops or jumps
- If the buffer is big enough to contain an entire loop, the loop's instructions need to be fetched from memory only once, at the first iteration
- c.f. cache
- Used by CRAY-1
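The loop-buffer lookup above can be sketched in Python (a small FIFO of recent instructions checked before going to memory; sizes and names are illustrative):

```python
# Sketch of a loop buffer: a small FIFO holding the n most recently fetched
# instructions, checked before going to memory (sizes are illustrative).
from collections import OrderedDict

class LoopBuffer:
    def __init__(self, size=8):
        self.size = size
        self.buf = OrderedDict()  # address -> instruction word

    def record_fetch(self, addr, instr):
        self.buf[addr] = instr
        if len(self.buf) > self.size:  # keep only the n most recent
            self.buf.popitem(last=False)

    def fetch(self, addr, memory):
        if addr in self.buf:           # hit: serve from the buffer
            return self.buf[addr], "buffer"
        instr = memory[addr]           # miss: fetch from memory
        self.record_fetch(addr, instr)
        return instr, "memory"

memory = {a: f"instr@{a}" for a in range(16)}
lb = LoopBuffer(size=4)
for a in (4, 5, 6, 7):        # fall through a small loop body once
    lb.fetch(a, memory)
print(lb.fetch(4, memory))    # branch back to the top: served from buffer
```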
53 Loop Buffer Diagram
54 Branch Prediction (1)
- Predict whether a branch will be taken
- If the prediction is right
- → No branch penalty
- If the prediction is wrong
- → Empty the pipeline
- Fetch correct instruction
- → Branch penalty
55 Branch Prediction (2)
- Prediction techniques
- Static
- Predict never taken
- Predict always taken
- Predict by opcode
- Dynamic
- Taken/not taken switch
- Branch history table
56 Branch Prediction (3)
- Predict never taken
- Assume that the jump will not happen
- Always fetch next instruction
- 68020, VAX 11/780
- VAX will not prefetch after a branch if a page fault would result (O/S vs. CPU design)
- Predict always taken
- Assume that the jump will happen
- Always fetch target instruction
57 Branch Prediction (4)
- Predict by opcode
- Some instructions are more likely to result in a jump than others
- Can achieve up to 75% success
- Taken/not taken switch
- Based on previous history
- Good for loops
- Branch history table
- Like a cache to look up
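The taken/not-taken switch with two bits of history (the scheme behind the state diagram two slides on) can be sketched as a saturating counter; the starting state here is an arbitrary choice:

```python
# Sketch of a 2-bit saturating-counter predictor (taken/not-taken switch
# with two bits of history). States 0-1 predict not taken, 2-3 taken;
# the prediction only flips after two consecutive mispredictions.

class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start at "weakly taken" (an arbitrary choice)

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
# A loop branch: taken on every iteration except the single loop exit.
outcomes = [True] * 4 + [False] + [True] * 4
hits = 0
for taken in outcomes:
    hits += (p.predict() == taken)
    p.update(taken)
print(hits, "of", len(outcomes))  # 8 of 9: only the loop exit mispredicts
```

The point of the second bit is visible in the trace: the single not-taken exit does not flip the prediction, so the next loop iteration is still predicted correctly.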
58 Branch Prediction Flowchart
59 Branch Prediction State Diagram
60 Dealing with Branches
61 Delayed Branching
- Do not take the jump until you have to
- Rearrange instructions so that the branch instruction occurs later than actually desired (Chapter 13)
62 Intel 80486 Pipelining
- Fetch
- From cache or external memory
- Put in one of two 16-byte prefetch buffers
- Fill buffer with new data as soon as old data is consumed
- Average 5 instructions fetched per load
- Independent of other stages, to keep buffers full
- Decode stage 1
- Opcode and address-mode info
- At most the first 3 bytes of the instruction
- Can direct the D2 stage to get the rest of the instruction
- Decode stage 2
- Expand opcode into control signals
- Computation of complex addressing modes
- Execute
- ALU operations, cache access, register update
- Writeback
- Update registers and flags
- Results sent to cache and bus-interface write buffers
63 80486 Instruction Pipeline Examples
64 Foreground Reading
- Processor examples
- Stallings, Chapter 12
- Manufacturers' web sites and specs
- Web pages, etc.